What can R do about a messy data format?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
31
down vote

favorite
6












Sometimes I see data posted in a Stack Overflow question formatted like in this question. This is not the first time, so I have decided to ask a question about it, and answer the question with a way to make the posted data palatable.



I will post the dataset example here just in case the question is deleted.



+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+


As you can see this is not the right way to post data. As a user wrote in a comment,




It must've taken a bit of time to format the data the way you're
showing it here. Unfortunately this is not a good format for us to
copy & paste.




I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.



What can R code do to make that table usable, if anything? Will it take a great deal of trouble?










share|improve this question























  • This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
    – nicola
    Aug 26 at 6:39










  • @nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
    – Rui Barradas
    Aug 26 at 6:48










  • I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
    – Moody_Mudskipper
    Aug 29 at 20:25














up vote
31
down vote

favorite
6












Sometimes I see data posted in a Stack Overflow question formatted like in this question. This is not the first time, so I have decided to ask a question about it, and answer the question with a way to make the posted data palatable.



I will post the dataset example here just in case the question is deleted.



+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+


As you can see this is not the right way to post data. As a user wrote in a comment,




It must've taken a bit of time to format the data the way you're
showing it here. Unfortunately this is not a good format for us to
copy & paste.




I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.



What can R code do to make that table usable, if anything? Will it take a great deal of trouble?










share|improve this question























  • This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
    – nicola
    Aug 26 at 6:39










  • @nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
    – Rui Barradas
    Aug 26 at 6:48










  • I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
    – Moody_Mudskipper
    Aug 29 at 20:25












up vote
31
down vote

favorite
6









up vote
31
down vote

favorite
6






6





Sometimes I see data posted in a Stack Overflow question formatted like in this question. This is not the first time, so I have decided to ask a question about it, and answer the question with a way to make the posted data palatable.



I will post the dataset example here just in case the question is deleted.



+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+


As you can see this is not the right way to post data. As a user wrote in a comment,




It must've taken a bit of time to format the data the way you're
showing it here. Unfortunately this is not a good format for us to
copy & paste.




I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.



What can R code do to make that table usable, if anything? Will it take a great deal of trouble?










share|improve this question















Sometimes I see data posted in a Stack Overflow question formatted like in this question. This is not the first time, so I have decided to ask a question about it, and answer the question with a way to make the posted data palatable.



I will post the dataset example here just in case the question is deleted.



+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+


As you can see this is not the right way to post data. As a user wrote in a comment,




It must've taken a bit of time to format the data the way you're
showing it here. Unfortunately this is not a good format for us to
copy & paste.




I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.



What can R code do to make that table usable, if anything? Will it take a great deal of trouble?







r dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Aug 26 at 11:13









Peter Mortensen

13k1983111




13k1983111










asked Aug 26 at 6:21









Rui Barradas

12.4k31628




12.4k31628











  • This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
    – nicola
    Aug 26 at 6:39










  • @nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
    – Rui Barradas
    Aug 26 at 6:48










  • I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
    – Moody_Mudskipper
    Aug 29 at 20:25
















  • This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
    – nicola
    Aug 26 at 6:39










  • @nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
    – Rui Barradas
    Aug 26 at 6:48










  • I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
    – Moody_Mudskipper
    Aug 29 at 20:25















This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
– nicola
Aug 26 at 6:39




This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
– nicola
Aug 26 at 6:39












@nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
– Rui Barradas
Aug 26 at 6:48




@nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
– Rui Barradas
Aug 26 at 6:48












I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
– Moody_Mudskipper
Aug 29 at 20:25




I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
– Moody_Mudskipper
Aug 29 at 20:25












5 Answers
5






active

oldest

votes

















up vote
22
down vote













Using data.table::fread:



x = '
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
'

fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

# Date Emp1 Case Priority PriorityCountinLast7days
# 1: 2018-06-01 A A1 0 0
# 2: 2018-06-03 A A2 0 1
# 3: 2018-06-03 A A3 0 2
# 4: 2018-06-03 A A4 1 1
# 5: 2018-06-03 A A5 2 1
# 6: 2018-06-04 A A6 0 3
# 7: 2018-06-01 B B1 0 1
# 8: 2018-06-02 B B2 0 2
# 9: 2018-06-03 B B3 0 3


The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.






share|improve this answer





























    up vote
    16
    down vote













    The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.



    The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.



    Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.



    dat <- read.table(text = "
    +------------+------+------+----------+--------------------------+
    | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
    +------------+------+------+----------+--------------------------+
    | 2018-06-01 | A | A1 | 0 | 0 |
    | 2018-06-03 | A | A2 | 0 | 1 |
    | 2018-06-03 | A | A3 | 0 | 2 |
    | 2018-06-03 | A | A4 | 1 | 1 |
    | 2018-06-03 | A | A5 | 2 | 1 |
    | 2018-06-04 | A | A6 | 0 | 3 |
    | 2018-06-01 | B | B1 | 0 | 1 |
    | 2018-06-02 | B | B2 | 0 | 2 |
    | 2018-06-03 | B | B3 | 0 | 3 |
    +------------+------+------+----------+--------------------------+
    ", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)


    But as you can see there are some issues with the result.



    dat
    X Date Emp1 Case Priority PriorityCountinLast7days X.1
    1 NA 2018-06-01 A A1 0 0 NA
    2 NA 2018-06-03 A A2 0 1 NA
    3 NA 2018-06-03 A A3 0 2 NA
    4 NA 2018-06-03 A A4 1 1 NA
    5 NA 2018-06-03 A A5 2 1 NA
    6 NA 2018-06-04 A A6 0 3 NA
    7 NA 2018-06-01 B B1 0 1 NA
    8 NA 2018-06-02 B B2 0 2 NA
    9 NA 2018-06-03 B B3 0 3 NA


    To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.



    So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.



    dat <- dat[-c(1, ncol(dat))]
    dat
    Date Emp1 Case Priority PriorityCountinLast7days
    1 2018-06-01 A A1 0 0
    2 2018-06-03 A A2 0 1
    3 2018-06-03 A A3 0 2
    4 2018-06-03 A A4 1 1
    5 2018-06-03 A A5 2 1
    6 2018-06-04 A A6 0 3
    7 2018-06-01 B B1 0 1
    8 2018-06-02 B B2 0 2
    9 2018-06-03 B B3 0 3


    That wasn't too hard, much better.

    In this case there is still a problem, to coerce column Date to class Date.



    dat$Date <- as.Date(dat$Date)


    And the result is satisfactory.



    str(dat)
    'data.frame': 9 obs. of 5 variables:
    $ Date : Date, format: "2018-06-01" "2018-06-03" ...
    $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
    $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
    $ Priority : int 0 0 0 1 2 0 0 0 0
    $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3


    Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.



    The whole process took only 3 lines of base R code.



    Finally, the end result in dput format, like it should be in the first place.



    dat <-
    structure(list(Date = structure(c(17683, 17685, 17685, 17685,
    17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A",
    "A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2",
    "A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0,
    0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1,
    1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")





    share|improve this answer


















    • 1




      @李哲源 I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
      – Rui Barradas
      Aug 26 at 6:46






    • 1




      I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
      – nicola
      Aug 26 at 8:13

















    up vote
    5
    down vote













    md_table <- scan(text = "
    +------------+------+------+----------+--------------------------+
    | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
    +------------+------+------+----------+--------------------------+
    | 2018-06-01 | A | A1 | 0 | 0 |
    | 2018-06-03 | A | A2 | 0 | 1 |
    | 2018-06-03 | A | A3 | 0 | 2 |
    | 2018-06-03 | A | A4 | 1 | 1 |
    | 2018-06-03 | A | A5 | 2 | 1 |
    | 2018-06-04 | A | A6 | 0 | 3 |
    | 2018-06-01 | B | B1 | 0 | 1 |
    | 2018-06-02 | B | B2 | 0 | 2 |
    | 2018-06-03 | B | B3 | 0 | 3 |
    +------------+------+------+----------+--------------------------+",
    what = "", sep = "", comment.char = "+", quiet = TRUE)

    ## it is clear that there are 5 columns
    mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
    # [,1] [,2] [,3] [,4] [,5]
    # [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
    # [2,] "2018-06-01" "A" "A1" "0" "0"
    # [3,] "2018-06-03" "A" "A2" "0" "1"
    # [4,] "2018-06-03" "A" "A3" "0" "2"
    # [5,] "2018-06-03" "A" "A4" "1" "1"
    # [6,] "2018-06-03" "A" "A5" "2" "1"
    # [7,] "2018-06-04" "A" "A6" "0" "3"
    # [8,] "2018-06-01" "B" "B1" "0" "1"
    # [9,] "2018-06-02" "B" "B2" "0" "2"
    #[10,] "2018-06-03" "B" "B3" "0" "3"




    ## a data frame with all character columns
    dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
    # Date Emp1 Case Priority PriorityCountinLast7days
    #1 2018-06-01 A A1 0 0
    #2 2018-06-03 A A2 0 1
    #3 2018-06-03 A A3 0 2
    #4 2018-06-03 A A4 1 1
    #5 2018-06-03 A A5 2 1
    #6 2018-06-04 A A6 0 3
    #7 2018-06-01 B B1 0 1
    #8 2018-06-02 B B2 0 2
    #9 2018-06-03 B B3 0 3




    ## or maybe just use `type.convert` on some columns?
    dat <- lapply(dat, type.convert)





    share|improve this answer





























      up vote
      3
      down vote













      Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.



      1. Copy the dataset into the Notepad file.

      2. Replace all | characters with ,


      3. Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

      But, if you mean use the R to fully understand it in one step, then I have no idea.






      share|improve this answer






















      • Notepad? Are you assuming Microsoft Windows?
        – Peter Mortensen
        Aug 26 at 11:16










      • @PeterMortensen Yes, but it could be any other editor as well.
        – Salman Lashkarara
        Aug 26 at 11:26

















      up vote
      -2
      down vote













      As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:



      dput(mtcars %>% head(10), file = 'reproducible.txt')


      The content of reproducible.txt can be used to make a dataframe/tibble as shown below. In such a case data the format is machine readable, but it is hard to be undestood by human at first glance (without pasting into R).



      df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
      24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
      160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
      110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
      3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
      2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
      17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
      0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
      0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
      1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
      "drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
      "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
      "Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")





      share|improve this answer


















      • 4




        Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
        – Rui Barradas
        Aug 26 at 6:53










      Your Answer





      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "1"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52023709%2fwhat-can-r-do-about-a-messy-data-format%23new-answer', 'question_page');

      );

      Post as a guest






























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      22
      down vote













      Using data.table::fread:



      x = '
      +------------+------+------+----------+--------------------------+
      | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
      +------------+------+------+----------+--------------------------+
      | 2018-06-01 | A | A1 | 0 | 0 |
      | 2018-06-03 | A | A2 | 0 | 1 |
      | 2018-06-03 | A | A3 | 0 | 2 |
      | 2018-06-03 | A | A4 | 1 | 1 |
      | 2018-06-03 | A | A5 | 2 | 1 |
      | 2018-06-04 | A | A6 | 0 | 3 |
      | 2018-06-01 | B | B1 | 0 | 1 |
      | 2018-06-02 | B | B2 | 0 | 2 |
      | 2018-06-03 | B | B3 | 0 | 3 |
      +------------+------+------+----------+--------------------------+
      '

      fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

      # Date Emp1 Case Priority PriorityCountinLast7days
      # 1: 2018-06-01 A A1 0 0
      # 2: 2018-06-03 A A2 0 1
      # 3: 2018-06-03 A A3 0 2
      # 4: 2018-06-03 A A4 1 1
      # 5: 2018-06-03 A A5 2 1
      # 6: 2018-06-04 A A6 0 3
      # 7: 2018-06-01 B B1 0 1
      # 8: 2018-06-02 B B2 0 2
      # 9: 2018-06-03 B B3 0 3


      The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.






      share|improve this answer


























        up vote
        22
        down vote













        Using data.table::fread:



        x = '
        +------------+------+------+----------+--------------------------+
        | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
        +------------+------+------+----------+--------------------------+
        | 2018-06-01 | A | A1 | 0 | 0 |
        | 2018-06-03 | A | A2 | 0 | 1 |
        | 2018-06-03 | A | A3 | 0 | 2 |
        | 2018-06-03 | A | A4 | 1 | 1 |
        | 2018-06-03 | A | A5 | 2 | 1 |
        | 2018-06-04 | A | A6 | 0 | 3 |
        | 2018-06-01 | B | B1 | 0 | 1 |
        | 2018-06-02 | B | B2 | 0 | 2 |
        | 2018-06-03 | B | B3 | 0 | 3 |
        +------------+------+------+----------+--------------------------+
        '

        fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

        # Date Emp1 Case Priority PriorityCountinLast7days
        # 1: 2018-06-01 A A1 0 0
        # 2: 2018-06-03 A A2 0 1
        # 3: 2018-06-03 A A3 0 2
        # 4: 2018-06-03 A A4 1 1
        # 5: 2018-06-03 A A5 2 1
        # 6: 2018-06-04 A A6 0 3
        # 7: 2018-06-01 B B1 0 1
        # 8: 2018-06-02 B B2 0 2
        # 9: 2018-06-03 B B3 0 3


        The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.






        share|improve this answer
























          up vote
          22
          down vote










          up vote
          22
          down vote









          Using data.table::fread:



          x = '
          +------------+------+------+----------+--------------------------+
          | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
          +------------+------+------+----------+--------------------------+
          | 2018-06-01 | A | A1 | 0 | 0 |
          | 2018-06-03 | A | A2 | 0 | 1 |
          | 2018-06-03 | A | A3 | 0 | 2 |
          | 2018-06-03 | A | A4 | 1 | 1 |
          | 2018-06-03 | A | A5 | 2 | 1 |
          | 2018-06-04 | A | A6 | 0 | 3 |
          | 2018-06-01 | B | B1 | 0 | 1 |
          | 2018-06-02 | B | B2 | 0 | 2 |
          | 2018-06-03 | B | B3 | 0 | 3 |
          +------------+------+------+----------+--------------------------+
          '

          fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

          # Date Emp1 Case Priority PriorityCountinLast7days
          # 1: 2018-06-01 A A1 0 0
          # 2: 2018-06-03 A A2 0 1
          # 3: 2018-06-03 A A3 0 2
          # 4: 2018-06-03 A A4 1 1
          # 5: 2018-06-03 A A5 2 1
          # 6: 2018-06-04 A A6 0 3
          # 7: 2018-06-01 B B1 0 1
          # 8: 2018-06-02 B B2 0 2
          # 9: 2018-06-03 B B3 0 3


          The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.






          share|improve this answer














          Using data.table::fread:



          x = '
          +------------+------+------+----------+--------------------------+
          | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
          +------------+------+------+----------+--------------------------+
          | 2018-06-01 | A | A1 | 0 | 0 |
          | 2018-06-03 | A | A2 | 0 | 1 |
          | 2018-06-03 | A | A3 | 0 | 2 |
          | 2018-06-03 | A | A4 | 1 | 1 |
          | 2018-06-03 | A | A5 | 2 | 1 |
          | 2018-06-04 | A | A6 | 0 | 3 |
          | 2018-06-01 | B | B1 | 0 | 1 |
          | 2018-06-02 | B | B2 | 0 | 2 |
          | 2018-06-03 | B | B3 | 0 | 3 |
          +------------+------+------+----------+--------------------------+
          '

          fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

          # Date Emp1 Case Priority PriorityCountinLast7days
          # 1: 2018-06-01 A A1 0 0
          # 2: 2018-06-03 A A2 0 1
          # 3: 2018-06-03 A A3 0 2
          # 4: 2018-06-03 A A4 1 1
          # 5: 2018-06-03 A A5 2 1
          # 6: 2018-06-04 A A6 0 3
          # 7: 2018-06-01 B B1 0 1
          # 8: 2018-06-02 B B2 0 2
          # 9: 2018-06-03 B B3 0 3


          The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Aug 26 at 18:59

























          answered Aug 26 at 7:42









          dww

          13.1k22550




          13.1k22550






















              up vote
              16
              down vote













              The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.



              The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.



              Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.



              dat <- read.table(text = "
              +------------+------+------+----------+--------------------------+
              | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
              +------------+------+------+----------+--------------------------+
              | 2018-06-01 | A | A1 | 0 | 0 |
              | 2018-06-03 | A | A2 | 0 | 1 |
              | 2018-06-03 | A | A3 | 0 | 2 |
              | 2018-06-03 | A | A4 | 1 | 1 |
              | 2018-06-03 | A | A5 | 2 | 1 |
              | 2018-06-04 | A | A6 | 0 | 3 |
              | 2018-06-01 | B | B1 | 0 | 1 |
              | 2018-06-02 | B | B2 | 0 | 2 |
              | 2018-06-03 | B | B3 | 0 | 3 |
              +------------+------+------+----------+--------------------------+
              ", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)


              But as you can see there are some issues with the result.



              dat
              X Date Emp1 Case Priority PriorityCountinLast7days X.1
              1 NA 2018-06-01 A A1 0 0 NA
              2 NA 2018-06-03 A A2 0 1 NA
              3 NA 2018-06-03 A A3 0 2 NA
              4 NA 2018-06-03 A A4 1 1 NA
              5 NA 2018-06-03 A A5 2 1 NA
              6 NA 2018-06-04 A A6 0 3 NA
              7 NA 2018-06-01 B B1 0 1 NA
              8 NA 2018-06-02 B B2 0 2 NA
              9 NA 2018-06-03 B B3 0 3 NA


              To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.



              So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.



              dat <- dat[-c(1, ncol(dat))]
              dat
              Date Emp1 Case Priority PriorityCountinLast7days
              1 2018-06-01 A A1 0 0
              2 2018-06-03 A A2 0 1
              3 2018-06-03 A A3 0 2
              4 2018-06-03 A A4 1 1
              5 2018-06-03 A A5 2 1
              6 2018-06-04 A A6 0 3
              7 2018-06-01 B B1 0 1
              8 2018-06-02 B B2 0 2
              9 2018-06-03 B B3 0 3


              That wasn't too hard, much better.

              In this case there is still a problem, to coerce column Date to class Date.



              dat$Date <- as.Date(dat$Date)


              And the result is satisfactory.



              str(dat)
              'data.frame': 9 obs. of 5 variables:
              $ Date : Date, format: "2018-06-01" "2018-06-03" ...
              $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
              $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
              $ Priority : int 0 0 0 1 2 0 0 0 0
              $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3


              Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.



              The whole process took only 3 lines of base R code.



              Finally, the end result in dput format, like it should be in the first place.



              dat <-
              structure(list(Date = structure(c(17683, 17685, 17685, 17685,
              17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A",
              "A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2",
              "A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0,
              0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1,
              1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")





              share|improve this answer


















              • 1




                @李哲源 I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
                – Rui Barradas
                Aug 26 at 6:46






              • 1




                I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
                – nicola
                Aug 26 at 8:13














              up vote
              16
              down vote













              The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.



              The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.



              Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.



              dat <- read.table(text = "
              +------------+------+------+----------+--------------------------+
              | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
              +------------+------+------+----------+--------------------------+
              | 2018-06-01 | A | A1 | 0 | 0 |
              | 2018-06-03 | A | A2 | 0 | 1 |
              | 2018-06-03 | A | A3 | 0 | 2 |
              | 2018-06-03 | A | A4 | 1 | 1 |
              | 2018-06-03 | A | A5 | 2 | 1 |
              | 2018-06-04 | A | A6 | 0 | 3 |
              | 2018-06-01 | B | B1 | 0 | 1 |
              | 2018-06-02 | B | B2 | 0 | 2 |
              | 2018-06-03 | B | B3 | 0 | 3 |
              +------------+------+------+----------+--------------------------+
              ", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)


              But as you can see there are some issues with the result.



              dat
              X Date Emp1 Case Priority PriorityCountinLast7days X.1
              1 NA 2018-06-01 A A1 0 0 NA
              2 NA 2018-06-03 A A2 0 1 NA
              3 NA 2018-06-03 A A3 0 2 NA
              4 NA 2018-06-03 A A4 1 1 NA
              5 NA 2018-06-03 A A5 2 1 NA
              6 NA 2018-06-04 A A6 0 3 NA
              7 NA 2018-06-01 B B1 0 1 NA
              8 NA 2018-06-02 B B2 0 2 NA
              9 NA 2018-06-03 B B3 0 3 NA


              To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.



              So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.



              dat <- dat[-c(1, ncol(dat))]
              dat
              Date Emp1 Case Priority PriorityCountinLast7days
              1 2018-06-01 A A1 0 0
              2 2018-06-03 A A2 0 1
              3 2018-06-03 A A3 0 2
              4 2018-06-03 A A4 1 1
              5 2018-06-03 A A5 2 1
              6 2018-06-04 A A6 0 3
              7 2018-06-01 B B1 0 1
              8 2018-06-02 B B2 0 2
              9 2018-06-03 B B3 0 3


              That wasn't too hard, much better.

              In this case there is still a problem, to coerce column Date to class Date.



              dat$Date <- as.Date(dat$Date)


              And the result is satisfactory.



              str(dat)
              'data.frame': 9 obs. of 5 variables:
              $ Date : Date, format: "2018-06-01" "2018-06-03" ...
              $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
              $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
              $ Priority : int 0 0 0 1 2 0 0 0 0
              $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3


              Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.



              The whole process took only 3 lines of base R code.



              Finally, the end result in dput format, like it should be in the first place.



              dat <-
              structure(list(Date = structure(c(17683, 17685, 17685, 17685,
              17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A",
              "A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2",
              "A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0,
              0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1,
              1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")





              share|improve this answer


















              • 1




                @李哲源 I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
                – Rui Barradas
                Aug 26 at 6:46






              • 1




                I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
                – nicola
                Aug 26 at 8:13












              up vote
              16
              down vote










              up vote
              16
              down vote









              The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.



              The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.



              Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.



              dat <- read.table(text = "
              +------------+------+------+----------+--------------------------+
              | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
              +------------+------+------+----------+--------------------------+
              | 2018-06-01 | A | A1 | 0 | 0 |
              | 2018-06-03 | A | A2 | 0 | 1 |
              | 2018-06-03 | A | A3 | 0 | 2 |
              | 2018-06-03 | A | A4 | 1 | 1 |
              | 2018-06-03 | A | A5 | 2 | 1 |
              | 2018-06-04 | A | A6 | 0 | 3 |
              | 2018-06-01 | B | B1 | 0 | 1 |
              | 2018-06-02 | B | B2 | 0 | 2 |
              | 2018-06-03 | B | B3 | 0 | 3 |
              +------------+------+------+----------+--------------------------+
              ", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)


              But as you can see there are some issues with the result.



              dat
              X Date Emp1 Case Priority PriorityCountinLast7days X.1
              1 NA 2018-06-01 A A1 0 0 NA
              2 NA 2018-06-03 A A2 0 1 NA
              3 NA 2018-06-03 A A3 0 2 NA
              4 NA 2018-06-03 A A4 1 1 NA
              5 NA 2018-06-03 A A5 2 1 NA
              6 NA 2018-06-04 A A6 0 3 NA
              7 NA 2018-06-01 B B1 0 1 NA
              8 NA 2018-06-02 B B2 0 2 NA
              9 NA 2018-06-03 B B3 0 3 NA


              To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.



              So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.



              dat <- dat[-c(1, ncol(dat))]
              dat
              Date Emp1 Case Priority PriorityCountinLast7days
              1 2018-06-01 A A1 0 0
              2 2018-06-03 A A2 0 1
              3 2018-06-03 A A3 0 2
              4 2018-06-03 A A4 1 1
              5 2018-06-03 A A5 2 1
              6 2018-06-04 A A6 0 3
              7 2018-06-01 B B1 0 1
              8 2018-06-02 B B2 0 2
              9 2018-06-03 B B3 0 3


              That wasn't too hard, much better.

              In this case there is still a problem, to coerce column Date to class Date.



              dat$Date <- as.Date(dat$Date)


              And the result is satisfactory.



              str(dat)
              'data.frame': 9 obs. of 5 variables:
              $ Date : Date, format: "2018-06-01" "2018-06-03" ...
              $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
              $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
              $ Priority : int 0 0 0 1 2 0 0 0 0
              $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3


              Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.



              The whole process took only 3 lines of base R code.



              Finally, the end result in dput format, like it should be in the first place.



              dat <-
              structure(list(Date = structure(c(17683, 17685, 17685, 17685,
              17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A",
              "A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2",
              "A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0,
              0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1,
              1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")





              share|improve this answer














              The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.



              The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.



              Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.



              dat <- read.table(text = "
              +------------+------+------+----------+--------------------------+
              | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
              +------------+------+------+----------+--------------------------+
              | 2018-06-01 | A | A1 | 0 | 0 |
              | 2018-06-03 | A | A2 | 0 | 1 |
              | 2018-06-03 | A | A3 | 0 | 2 |
              | 2018-06-03 | A | A4 | 1 | 1 |
              | 2018-06-03 | A | A5 | 2 | 1 |
              | 2018-06-04 | A | A6 | 0 | 3 |
              | 2018-06-01 | B | B1 | 0 | 1 |
              | 2018-06-02 | B | B2 | 0 | 2 |
              | 2018-06-03 | B | B3 | 0 | 3 |
              +------------+------+------+----------+--------------------------+
              ", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)


              But as you can see there are some issues with the result.



              dat
              X Date Emp1 Case Priority PriorityCountinLast7days X.1
              1 NA 2018-06-01 A A1 0 0 NA
              2 NA 2018-06-03 A A2 0 1 NA
              3 NA 2018-06-03 A A3 0 2 NA
              4 NA 2018-06-03 A A4 1 1 NA
              5 NA 2018-06-03 A A5 2 1 NA
              6 NA 2018-06-04 A A6 0 3 NA
              7 NA 2018-06-01 B B1 0 1 NA
              8 NA 2018-06-02 B B2 0 2 NA
              9 NA 2018-06-03 B B3 0 3 NA


              To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.



              So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.



              dat <- dat[-c(1, ncol(dat))]
              dat
              Date Emp1 Case Priority PriorityCountinLast7days
              1 2018-06-01 A A1 0 0
              2 2018-06-03 A A2 0 1
              3 2018-06-03 A A3 0 2
              4 2018-06-03 A A4 1 1
              5 2018-06-03 A A5 2 1
              6 2018-06-04 A A6 0 3
              7 2018-06-01 B B1 0 1
              8 2018-06-02 B B2 0 2
              9 2018-06-03 B B3 0 3


              That wasn't too hard, much better.

              In this case there is still a problem, to coerce column Date to class Date.



              dat$Date <- as.Date(dat$Date)


              And the result is satisfactory.



              str(dat)
              'data.frame': 9 obs. of 5 variables:
              $ Date : Date, format: "2018-06-01" "2018-06-03" ...
              $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
              $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
              $ Priority : int 0 0 0 1 2 0 0 0 0
              $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3


              Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.



              The whole process took only 3 lines of base R code.



              Finally, the end result in dput format, like it should be in the first place.



              dat <-
              structure(list(Date = structure(c(17683, 17685, 17685, 17685,
              17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A",
              "A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2",
              "A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0,
              0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1,
              1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Aug 27 at 2:01

























              answered Aug 26 at 6:41









              Rui Barradas

              12.4k31628




              12.4k31628







              • 1




                @李哲源 I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
                – Rui Barradas
                Aug 26 at 6:46






              • 1




                I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
                – nicola
                Aug 26 at 8:13












              • 1




                @李哲源 I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
                – Rui Barradas
                Aug 26 at 6:46






              • 1




                I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
                – nicola
                Aug 26 at 8:13







              1




              1




              @李哲源 I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
              – Rui Barradas
              Aug 26 at 6:46




              @李哲源 I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
              – Rui Barradas
              Aug 26 at 6:46




              1




              1




              I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
              – nicola
              Aug 26 at 8:13




              I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
              – nicola
              Aug 26 at 8:13










              up vote
              5
              down vote













              md_table <- scan(text = "
              +------------+------+------+----------+--------------------------+
              | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
              +------------+------+------+----------+--------------------------+
              | 2018-06-01 | A | A1 | 0 | 0 |
              | 2018-06-03 | A | A2 | 0 | 1 |
              | 2018-06-03 | A | A3 | 0 | 2 |
              | 2018-06-03 | A | A4 | 1 | 1 |
              | 2018-06-03 | A | A5 | 2 | 1 |
              | 2018-06-04 | A | A6 | 0 | 3 |
              | 2018-06-01 | B | B1 | 0 | 1 |
              | 2018-06-02 | B | B2 | 0 | 2 |
              | 2018-06-03 | B | B3 | 0 | 3 |
              +------------+------+------+----------+--------------------------+",
              what = "", sep = "", comment.char = "+", quiet = TRUE)

              ## it is clear that there are 5 columns
              mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
              # [,1] [,2] [,3] [,4] [,5]
              # [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
              # [2,] "2018-06-01" "A" "A1" "0" "0"
              # [3,] "2018-06-03" "A" "A2" "0" "1"
              # [4,] "2018-06-03" "A" "A3" "0" "2"
              # [5,] "2018-06-03" "A" "A4" "1" "1"
              # [6,] "2018-06-03" "A" "A5" "2" "1"
              # [7,] "2018-06-04" "A" "A6" "0" "3"
              # [8,] "2018-06-01" "B" "B1" "0" "1"
              # [9,] "2018-06-02" "B" "B2" "0" "2"
              #[10,] "2018-06-03" "B" "B3" "0" "3"




              ## a data frame with all character columns
              dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
              # Date Emp1 Case Priority PriorityCountinLast7days
              #1 2018-06-01 A A1 0 0
              #2 2018-06-03 A A2 0 1
              #3 2018-06-03 A A3 0 2
              #4 2018-06-03 A A4 1 1
              #5 2018-06-03 A A5 2 1
              #6 2018-06-04 A A6 0 3
              #7 2018-06-01 B B1 0 1
              #8 2018-06-02 B B2 0 2
              #9 2018-06-03 B B3 0 3




              ## or maybe just use `type.convert` on some columns?
              dat <- lapply(dat, type.convert)





              share|improve this answer


























                up vote
                5
                down vote













                md_table <- scan(text = "
                +------------+------+------+----------+--------------------------+
                | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
                +------------+------+------+----------+--------------------------+
                | 2018-06-01 | A | A1 | 0 | 0 |
                | 2018-06-03 | A | A2 | 0 | 1 |
                | 2018-06-03 | A | A3 | 0 | 2 |
                | 2018-06-03 | A | A4 | 1 | 1 |
                | 2018-06-03 | A | A5 | 2 | 1 |
                | 2018-06-04 | A | A6 | 0 | 3 |
                | 2018-06-01 | B | B1 | 0 | 1 |
                | 2018-06-02 | B | B2 | 0 | 2 |
                | 2018-06-03 | B | B3 | 0 | 3 |
                +------------+------+------+----------+--------------------------+",
                what = "", sep = "", comment.char = "+", quiet = TRUE)

                ## it is clear that there are 5 columns
                mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
                # [,1] [,2] [,3] [,4] [,5]
                # [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
                # [2,] "2018-06-01" "A" "A1" "0" "0"
                # [3,] "2018-06-03" "A" "A2" "0" "1"
                # [4,] "2018-06-03" "A" "A3" "0" "2"
                # [5,] "2018-06-03" "A" "A4" "1" "1"
                # [6,] "2018-06-03" "A" "A5" "2" "1"
                # [7,] "2018-06-04" "A" "A6" "0" "3"
                # [8,] "2018-06-01" "B" "B1" "0" "1"
                # [9,] "2018-06-02" "B" "B2" "0" "2"
                #[10,] "2018-06-03" "B" "B3" "0" "3"




                ## a data frame with all character columns
                dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
                # Date Emp1 Case Priority PriorityCountinLast7days
                #1 2018-06-01 A A1 0 0
                #2 2018-06-03 A A2 0 1
                #3 2018-06-03 A A3 0 2
                #4 2018-06-03 A A4 1 1
                #5 2018-06-03 A A5 2 1
                #6 2018-06-04 A A6 0 3
                #7 2018-06-01 B B1 0 1
                #8 2018-06-02 B B2 0 2
                #9 2018-06-03 B B3 0 3




                ## or maybe just use `type.convert` on some columns?
                dat <- lapply(dat, type.convert)





                share|improve this answer
























                  up vote
                  5
                  down vote










                  up vote
                  5
                  down vote









                  md_table <- scan(text = "
                  +------------+------+------+----------+--------------------------+
                  | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
                  +------------+------+------+----------+--------------------------+
                  | 2018-06-01 | A | A1 | 0 | 0 |
                  | 2018-06-03 | A | A2 | 0 | 1 |
                  | 2018-06-03 | A | A3 | 0 | 2 |
                  | 2018-06-03 | A | A4 | 1 | 1 |
                  | 2018-06-03 | A | A5 | 2 | 1 |
                  | 2018-06-04 | A | A6 | 0 | 3 |
                  | 2018-06-01 | B | B1 | 0 | 1 |
                  | 2018-06-02 | B | B2 | 0 | 2 |
                  | 2018-06-03 | B | B3 | 0 | 3 |
                  +------------+------+------+----------+--------------------------+",
                  what = "", sep = "", comment.char = "+", quiet = TRUE)

                  ## it is clear that there are 5 columns
                  mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
                  # [,1] [,2] [,3] [,4] [,5]
                  # [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
                  # [2,] "2018-06-01" "A" "A1" "0" "0"
                  # [3,] "2018-06-03" "A" "A2" "0" "1"
                  # [4,] "2018-06-03" "A" "A3" "0" "2"
                  # [5,] "2018-06-03" "A" "A4" "1" "1"
                  # [6,] "2018-06-03" "A" "A5" "2" "1"
                  # [7,] "2018-06-04" "A" "A6" "0" "3"
                  # [8,] "2018-06-01" "B" "B1" "0" "1"
                  # [9,] "2018-06-02" "B" "B2" "0" "2"
                  #[10,] "2018-06-03" "B" "B3" "0" "3"




                  ## a data frame with all character columns
                  dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
                  # Date Emp1 Case Priority PriorityCountinLast7days
                  #1 2018-06-01 A A1 0 0
                  #2 2018-06-03 A A2 0 1
                  #3 2018-06-03 A A3 0 2
                  #4 2018-06-03 A A4 1 1
                  #5 2018-06-03 A A5 2 1
                  #6 2018-06-04 A A6 0 3
                  #7 2018-06-01 B B1 0 1
                  #8 2018-06-02 B B2 0 2
                  #9 2018-06-03 B B3 0 3




                  ## or maybe just use `type.convert` on some columns?
                  dat <- lapply(dat, type.convert)





                  share|improve this answer














                  md_table <- scan(text = "
                  +------------+------+------+----------+--------------------------+
                  | Date | Emp1 | Case | Priority | PriorityCountinLast7days |
                  +------------+------+------+----------+--------------------------+
                  | 2018-06-01 | A | A1 | 0 | 0 |
                  | 2018-06-03 | A | A2 | 0 | 1 |
                  | 2018-06-03 | A | A3 | 0 | 2 |
                  | 2018-06-03 | A | A4 | 1 | 1 |
                  | 2018-06-03 | A | A5 | 2 | 1 |
                  | 2018-06-04 | A | A6 | 0 | 3 |
                  | 2018-06-01 | B | B1 | 0 | 1 |
                  | 2018-06-02 | B | B2 | 0 | 2 |
                  | 2018-06-03 | B | B3 | 0 | 3 |
                  +------------+------+------+----------+--------------------------+",
                  what = "", sep = "", comment.char = "+", quiet = TRUE)

                  ## it is clear that there are 5 columns
                  mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
                  # [,1] [,2] [,3] [,4] [,5]
                  # [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
                  # [2,] "2018-06-01" "A" "A1" "0" "0"
                  # [3,] "2018-06-03" "A" "A2" "0" "1"
                  # [4,] "2018-06-03" "A" "A3" "0" "2"
                  # [5,] "2018-06-03" "A" "A4" "1" "1"
                  # [6,] "2018-06-03" "A" "A5" "2" "1"
                  # [7,] "2018-06-04" "A" "A6" "0" "3"
                  # [8,] "2018-06-01" "B" "B1" "0" "1"
                  # [9,] "2018-06-02" "B" "B2" "0" "2"
                  #[10,] "2018-06-03" "B" "B3" "0" "3"




                  ## a data frame with all character columns
                  dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
                  # Date Emp1 Case Priority PriorityCountinLast7days
                  #1 2018-06-01 A A1 0 0
                  #2 2018-06-03 A A2 0 1
                  #3 2018-06-03 A A3 0 2
                  #4 2018-06-03 A A4 1 1
                  #5 2018-06-03 A A5 2 1
                  #6 2018-06-04 A A6 0 3
                  #7 2018-06-01 B B1 0 1
                  #8 2018-06-02 B B2 0 2
                  #9 2018-06-03 B B3 0 3




                  ## or maybe just use `type.convert` on some columns?
                  dat <- lapply(dat, type.convert)






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Aug 26 at 8:05

























                  answered Aug 26 at 6:32









                  李哲源

                  44.4k1481128




                  44.4k1481128




















                      up vote
                      3
                      down vote













                      Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.



                      1. Copy the dataset into the Notepad file.

                      2. Replace all | characters with ,


                      3. Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

                      But, if you mean use the R to fully understand it in one step, then I have no idea.






                      share|improve this answer






















                      • Notepad? Are you assuming Microsoft Windows?
                        – Peter Mortensen
                        Aug 26 at 11:16










                      • @PeterMortensen Yes, but it could be any other editor as well.
                        – Salman Lashkarara
                        Aug 26 at 11:26














                      up vote
                      3
                      down vote













                      Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.



                      1. Copy the dataset into the Notepad file.

                      2. Replace all | characters with ,


                      3. Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

                      But, if you mean use the R to fully understand it in one step, then I have no idea.






                      share|improve this answer






















                      • Notepad? Are you assuming Microsoft Windows?
                        – Peter Mortensen
                        Aug 26 at 11:16










                      • @PeterMortensen Yes, but it could be any other editor as well.
                        – Salman Lashkarara
                        Aug 26 at 11:26












                      up vote
                      3
                      down vote










                      up vote
                      3
                      down vote









                      Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.



                      1. Copy the dataset into the Notepad file.

                      2. Replace all | characters with ,


                      3. Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

                      But, if you mean use the R to fully understand it in one step, then I have no idea.






                      share|improve this answer














                      Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.



                      1. Copy the dataset into the Notepad file.

                      2. Replace all | characters with ,


                      3. Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

                      But, if you mean use the R to fully understand it in one step, then I have no idea.







                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Aug 26 at 11:15









                      Peter Mortensen

                      13k1983111




                      13k1983111










                      answered Aug 26 at 6:38









                      Salman Lashkarara

                      4,3991454104




                      4,3991454104











                      • Notepad? Are you assuming Microsoft Windows?
                        – Peter Mortensen
                        Aug 26 at 11:16










                      • @PeterMortensen Yes, but it could be any other editor as well.
                        – Salman Lashkarara
                        Aug 26 at 11:26
















                      • Notepad? Are you assuming Microsoft Windows?
                        – Peter Mortensen
                        Aug 26 at 11:16










                      • @PeterMortensen Yes, but it could be any other editor as well.
                        – Salman Lashkarara
                        Aug 26 at 11:26















                      Notepad? Are you assuming Microsoft Windows?
                      – Peter Mortensen
                      Aug 26 at 11:16




                      Notepad? Are you assuming Microsoft Windows?
                      – Peter Mortensen
                      Aug 26 at 11:16












                      @PeterMortensen Yes, but it could be any other editor as well.
                      – Salman Lashkarara
                      Aug 26 at 11:26




                      @PeterMortensen Yes, but it could be any other editor as well.
                      – Salman Lashkarara
                      Aug 26 at 11:26










                      up vote
                      -2
                      down vote













                      As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:



                      dput(mtcars %>% head(10), file = 'reproducible.txt')


                      The content of reproducible.txt can be used to make a dataframe/tibble as shown below. In such a case data the format is machine readable, but it is hard to be undestood by human at first glance (without pasting into R).



                      df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
                      24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
                      160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
                      110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
                      3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
                      2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
                      17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
                      0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
                      0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
                      1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
                      "drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
                      "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
                      "Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")





                      share|improve this answer


















                      • 4




                        Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
                        – Rui Barradas
                        Aug 26 at 6:53














                      up vote
                      -2
                      down vote













                      As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:



                      dput(mtcars %>% head(10), file = 'reproducible.txt')


                      The content of reproducible.txt can be used to make a dataframe/tibble as shown below. In such a case data the format is machine readable, but it is hard to be undestood by human at first glance (without pasting into R).



                      df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
                      24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
                      160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
                      110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
                      3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
                      2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
                      17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
                      0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
                      0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
                      1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
                      "drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
                      "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
                      "Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")





                      share|improve this answer


















                      • 4




                        Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
                        – Rui Barradas
                        Aug 26 at 6:53












                      up vote
                      -2
                      down vote










                      up vote
                      -2
                      down vote









                      As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:



                      dput(mtcars %>% head(10), file = 'reproducible.txt')


                      The content of reproducible.txt can be used to make a dataframe/tibble as shown below. In such a case data the format is machine readable, but it is hard to be undestood by human at first glance (without pasting into R).



                      df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
                      24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
                      160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
                      110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
                      3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
                      2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
                      17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
                      0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
                      0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
                      1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
                      "drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
                      "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
                      "Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")





                      share|improve this answer














                      As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:



                      dput(mtcars %>% head(10), file = 'reproducible.txt')


                      The content of reproducible.txt can be used to make a dataframe/tibble as shown below. In such a case data the format is machine readable, but it is hard to be undestood by human at first glance (without pasting into R).



                      df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
                      24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
                      160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
                      110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
                      3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
                      2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
                      17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
                      0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
                      0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
                      1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
                      "drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
                      "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
                      "Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Aug 26 at 11:19









                      Peter Mortensen

                      13k1983111




                      13k1983111










                      answered Aug 26 at 6:39









                      Pawel Stradowski

                      8617




                      8617







                      • 4




                        Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
                        – Rui Barradas
                        Aug 26 at 6:53












                      • 4




                        Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
                        – Rui Barradas
                        Aug 26 at 6:53







                      4




                      4




                      Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
                      – Rui Barradas
                      Aug 26 at 6:53




                      Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
                      – Rui Barradas
                      Aug 26 at 6:53

















                       

                      draft saved


                      draft discarded















































                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function ()
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52023709%2fwhat-can-r-do-about-a-messy-data-format%23new-answer', 'question_page');

                      );

                      Post as a guest













































































                      Popular posts from this blog

                      How to check contact read email or not when send email to Individual?

                      Displaying single band from multi-band raster using QGIS

                      How many registers does an x86_64 CPU actually have?