What can R do about a messy data format?

up vote
31
down vote

favorite

Sometimes I see data posted in a StackÂ Overflow question formatted like in this question. This is not the first time, so I have decided to ask a question about it, and answer the question with a way to make the posted data palatable.

I will post the dataset example here just in case the question is deleted.

+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+

As you can see this is not the right way to post data. As a user wrote in a comment,

It must've taken a bit of time to format the data the way you're
showing it here. Unfortunately this is not a good format for us to
copy & paste.

I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.

What can R code do to make that table usable, if anything? Will it take a great deal of trouble?

edited Aug 26 at 11:13

Peter Mortensen

13k1983111

asked Aug 26 at 6:21

Rui Barradas

12.4k31628

This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
â€“Â nicola
Aug 26 at 6:39

@nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
â€“Â Rui Barradas
Aug 26 at 6:48

I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
â€“Â Moody_Mudskipper
Aug 29 at 20:25

add a commentÂ |Â

up vote
31
down vote

favorite

I will post the dataset example here just in case the question is deleted.

+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+

As you can see this is not the right way to post data. As a user wrote in a comment,

It must've taken a bit of time to format the data the way you're
showing it here. Unfortunately this is not a good format for us to
copy & paste.

I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.

What can R code do to make that table usable, if anything? Will it take a great deal of trouble?

edited Aug 26 at 11:13

Peter Mortensen

13k1983111

asked Aug 26 at 6:21

Rui Barradas

12.4k31628

This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
â€“Â nicola
Aug 26 at 6:39

@nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
â€“Â Rui Barradas
Aug 26 at 6:48

I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
â€“Â Moody_Mudskipper
Aug 29 at 20:25

add a commentÂ |Â

up vote
31
down vote

favorite

I will post the dataset example here just in case the question is deleted.

+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+

As you can see this is not the right way to post data. As a user wrote in a comment,

It must've taken a bit of time to format the data the way you're
showing it here. Unfortunately this is not a good format for us to
copy & paste.

I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.

What can R code do to make that table usable, if anything? Will it take a great deal of trouble?

edited Aug 26 at 11:13

Peter Mortensen

13k1983111

asked Aug 26 at 6:21

Rui Barradas

12.4k31628

I will post the dataset example here just in case the question is deleted.

+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+

As you can see this is not the right way to post data. As a user wrote in a comment,

It must've taken a bit of time to format the data the way you're
showing it here. Unfortunately this is not a good format for us to
copy & paste.

I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.

What can R code do to make that table usable, if anything? Will it take a great deal of trouble?

r dataframe

edited Aug 26 at 11:13

Peter Mortensen

13k1983111

asked Aug 26 at 6:21

Rui Barradas

12.4k31628

edited Aug 26 at 11:13

Peter Mortensen

13k1983111

asked Aug 26 at 6:21

Rui Barradas

12.4k31628

edited Aug 26 at 11:13

Peter Mortensen

13k1983111

edited Aug 26 at 11:13

Peter Mortensen

13k1983111

edited Aug 26 at 11:13

Peter Mortensen

13k1983111

asked Aug 26 at 6:21

Rui Barradas

12.4k31628

asked Aug 26 at 6:21

Rui Barradas

12.4k31628

asked Aug 26 at 6:21

Rui Barradas

12.4k31628

This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
â€“Â nicola
Aug 26 at 6:39

@nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
â€“Â Rui Barradas
Aug 26 at 6:48

I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
â€“Â Moody_Mudskipper
Aug 29 at 20:25

add a commentÂ |Â

This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
â€“Â nicola
Aug 26 at 6:39

@nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
â€“Â Rui Barradas
Aug 26 at 6:48

I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
â€“Â Moody_Mudskipper
Aug 29 at 20:25

This is an interesting question by itself. However, I fear that a good answer might encourage people in presenting their dataset like that.
â€“Â nicola
Aug 26 at 6:39

@nicola I surely hope not! It's already bad to see it every now and then, maybe it will have the opposite effect :).
â€“Â Rui Barradas
Aug 26 at 6:48

I would enjoy a general smart messy format reader for SO. that would read those, and printed data.frames/ data.tables / tibbles, commented or not, with row numbers or not, recognize dates, and ideally try to guess a way to sort out strings containing spaces, I thought of building it in fact but never got around it.
â€“Â Moody_Mudskipper
Aug 29 at 20:25

add a commentÂ |Â

5 Answers
5

active

oldest

votes

up vote
22
down vote

Using data.table::fread:

x = '
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
'

fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

# Date Emp1 Case Priority PriorityCountinLast7days
# 1: 2018-06-01 A A1 0 0
# 2: 2018-06-03 A A2 0 1
# 3: 2018-06-03 A A3 0 2
# 4: 2018-06-03 A A4 1 1
# 5: 2018-06-03 A A5 2 1
# 6: 2018-06-04 A A6 0 3
# 7: 2018-06-01 B B1 0 1
# 8: 2018-06-02 B B2 0 2
# 9: 2018-06-03 B B3 0 3

The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.

edited Aug 26 at 18:59

answered Aug 26 at 7:42

dww

13.1k22550

add a commentÂ |Â

up vote
16
down vote

The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.

The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.

Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.

dat <- read.table(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)

But as you can see there are some issues with the result.

dat
 X Date Emp1 Case Priority PriorityCountinLast7days X.1
1 NA 2018-06-01 A A1 0 0 NA
2 NA 2018-06-03 A A2 0 1 NA
3 NA 2018-06-03 A A3 0 2 NA
4 NA 2018-06-03 A A4 1 1 NA
5 NA 2018-06-03 A A5 2 1 NA
6 NA 2018-06-04 A A6 0 3 NA
7 NA 2018-06-01 B B1 0 1 NA
8 NA 2018-06-02 B B2 0 2 NA
9 NA 2018-06-03 B B3 0 3 NA

To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.

So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.

dat <- dat[-c(1, ncol(dat))]
dat
 Date Emp1 Case Priority PriorityCountinLast7days
1 2018-06-01 A A1 0 0
2 2018-06-03 A A2 0 1
3 2018-06-03 A A3 0 2
4 2018-06-03 A A4 1 1
5 2018-06-03 A A5 2 1
6 2018-06-04 A A6 0 3
7 2018-06-01 B B1 0 1
8 2018-06-02 B B2 0 2
9 2018-06-03 B B3 0 3

That wasn't too hard, much better.

In this case there is still a problem, to coerce column Date to class Date.

dat$Date <- as.Date(dat$Date)

And the result is satisfactory.

str(dat)
'data.frame': 9 obs. of 5 variables:
 $ Date : Date, format: "2018-06-01" "2018-06-03" ...
 $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
 $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
 $ Priority : int 0 0 0 1 2 0 0 0 0
 $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3

Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.

The whole process took only 3 lines of base R code.

Finally, the end result in dput format, like it should be in the first place.

dat <-
structure(list(Date = structure(c(17683, 17685, 17685, 17685, 
17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A", 
"A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2", 
"A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0, 
0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1, 
1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")

edited Aug 27 at 2:01

answered Aug 26 at 6:41

Rui Barradas

12.4k31628

1

@Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
â€“Â Rui Barradas
Aug 26 at 6:46

1

I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
â€“Â nicola
Aug 26 at 8:13

add a commentÂ |Â

up vote
5
down vote

md_table <- scan(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+",
what = "", sep = "", comment.char = "+", quiet = TRUE)

## it is clear that there are 5 columns
mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
# [,1] [,2] [,3] [,4] [,5] 
# [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
# [2,] "2018-06-01" "A" "A1" "0" "0" 
# [3,] "2018-06-03" "A" "A2" "0" "1" 
# [4,] "2018-06-03" "A" "A3" "0" "2" 
# [5,] "2018-06-03" "A" "A4" "1" "1" 
# [6,] "2018-06-03" "A" "A5" "2" "1" 
# [7,] "2018-06-04" "A" "A6" "0" "3" 
# [8,] "2018-06-01" "B" "B1" "0" "1" 
# [9,] "2018-06-02" "B" "B2" "0" "2" 
#[10,] "2018-06-03" "B" "B3" "0" "3"

## a data frame with all character columns
dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
# Date Emp1 Case Priority PriorityCountinLast7days
#1 2018-06-01 A A1 0 0
#2 2018-06-03 A A2 0 1
#3 2018-06-03 A A3 0 2
#4 2018-06-03 A A4 1 1
#5 2018-06-03 A A5 2 1
#6 2018-06-04 A A6 0 3
#7 2018-06-01 B B1 0 1
#8 2018-06-02 B B2 0 2
#9 2018-06-03 B B3 0 3

## or maybe just use `type.convert` on some columns?
dat <- lapply(dat, type.convert)

edited Aug 26 at 8:05

answered Aug 26 at 6:32

Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ

44.4k1481128

add a commentÂ |Â

up vote
3
down vote

Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.

Copy the dataset into the Notepad file.

Replace all | characters with ,

Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

But, if you mean use the R to fully understand it in one step, then I have no idea.

edited Aug 26 at 11:15

Peter Mortensen

13k1983111

answered Aug 26 at 6:38

Salman Lashkarara

4,3991454104

Notepad? Are you assuming Microsoft Windows?
â€“Â Peter Mortensen
Aug 26 at 11:16

@PeterMortensen Yes, but it could be any other editor as well.
â€“Â Salman Lashkarara
Aug 26 at 11:26

add a commentÂ |Â

up vote
-2
down vote

As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:

dput(mtcars %>% head(10), file = 'reproducible.txt')

The content of reproducible.txt can be used to make a dataframe/tibble as shown below. In such a case data the format is machine readable, but it is hard to be undestood by human at first glance (without pasting into R).

df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
"drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
"Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
"Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")

edited Aug 26 at 11:19

Peter Mortensen

13k1983111

answered Aug 26 at 6:39

Pawel Stradowski

8617

4

Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
â€“Â Rui Barradas
Aug 26 at 6:53

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52023709%2fwhat-can-r-do-about-a-messy-data-format%23new-answer', 'question_page');

);

Post as a guest

Name

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

up vote
22
down vote

Using data.table::fread:

x = '
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
'

fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

# Date Emp1 Case Priority PriorityCountinLast7days
# 1: 2018-06-01 A A1 0 0
# 2: 2018-06-03 A A2 0 1
# 3: 2018-06-03 A A3 0 2
# 4: 2018-06-03 A A4 1 1
# 5: 2018-06-03 A A5 2 1
# 6: 2018-06-04 A A6 0 3
# 7: 2018-06-01 B B1 0 1
# 8: 2018-06-02 B B2 0 2
# 9: 2018-06-03 B B3 0 3

The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.

edited Aug 26 at 18:59

answered Aug 26 at 7:42

dww

13.1k22550

add a commentÂ |Â

up vote
22
down vote

Using data.table::fread:

x = '
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
'

fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

# Date Emp1 Case Priority PriorityCountinLast7days
# 1: 2018-06-01 A A1 0 0
# 2: 2018-06-03 A A2 0 1
# 3: 2018-06-03 A A3 0 2
# 4: 2018-06-03 A A4 1 1
# 5: 2018-06-03 A A5 2 1
# 6: 2018-06-04 A A6 0 3
# 7: 2018-06-01 B B1 0 1
# 8: 2018-06-02 B B2 0 2
# 9: 2018-06-03 B B3 0 3

The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.

edited Aug 26 at 18:59

answered Aug 26 at 7:42

dww

13.1k22550

add a commentÂ |Â

up vote
22
down vote

Using data.table::fread:

x = '
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
'

fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

# Date Emp1 Case Priority PriorityCountinLast7days
# 1: 2018-06-01 A A1 0 0
# 2: 2018-06-03 A A2 0 1
# 3: 2018-06-03 A A3 0 2
# 4: 2018-06-03 A A4 1 1
# 5: 2018-06-03 A A5 2 1
# 6: 2018-06-04 A A6 0 3
# 7: 2018-06-01 B B1 0 1
# 8: 2018-06-02 B B2 0 2
# 9: 2018-06-03 B B3 0 3

The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.

edited Aug 26 at 18:59

answered Aug 26 at 7:42

dww

13.1k22550

Using data.table::fread:

x = '
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
'

fread(gsub('\+.+\n' ,'', x, perl = T), drop=c(1,7))

# Date Emp1 Case Priority PriorityCountinLast7days
# 1: 2018-06-01 A A1 0 0
# 2: 2018-06-03 A A2 0 1
# 3: 2018-06-03 A A3 0 2
# 4: 2018-06-03 A A4 1 1
# 5: 2018-06-03 A A5 2 1
# 6: 2018-06-04 A A6 0 3
# 7: 2018-06-01 B B1 0 1
# 8: 2018-06-02 B B2 0 2
# 9: 2018-06-03 B B3 0 3

The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.

edited Aug 26 at 18:59

answered Aug 26 at 7:42

dww

13.1k22550

edited Aug 26 at 18:59

answered Aug 26 at 7:42

dww

13.1k22550

answered Aug 26 at 7:42

dww

13.1k22550

answered Aug 26 at 7:42

dww

13.1k22550

add a commentÂ |Â

up vote
16
down vote

The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.

The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.

Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.

dat <- read.table(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)

But as you can see there are some issues with the result.

dat
 X Date Emp1 Case Priority PriorityCountinLast7days X.1
1 NA 2018-06-01 A A1 0 0 NA
2 NA 2018-06-03 A A2 0 1 NA
3 NA 2018-06-03 A A3 0 2 NA
4 NA 2018-06-03 A A4 1 1 NA
5 NA 2018-06-03 A A5 2 1 NA
6 NA 2018-06-04 A A6 0 3 NA
7 NA 2018-06-01 B B1 0 1 NA
8 NA 2018-06-02 B B2 0 2 NA
9 NA 2018-06-03 B B3 0 3 NA

To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.

So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.

dat <- dat[-c(1, ncol(dat))]
dat
 Date Emp1 Case Priority PriorityCountinLast7days
1 2018-06-01 A A1 0 0
2 2018-06-03 A A2 0 1
3 2018-06-03 A A3 0 2
4 2018-06-03 A A4 1 1
5 2018-06-03 A A5 2 1
6 2018-06-04 A A6 0 3
7 2018-06-01 B B1 0 1
8 2018-06-02 B B2 0 2
9 2018-06-03 B B3 0 3

That wasn't too hard, much better.

In this case there is still a problem, to coerce column Date to class Date.

dat$Date <- as.Date(dat$Date)

And the result is satisfactory.

str(dat)
'data.frame': 9 obs. of 5 variables:
 $ Date : Date, format: "2018-06-01" "2018-06-03" ...
 $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
 $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
 $ Priority : int 0 0 0 1 2 0 0 0 0
 $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3

Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.

The whole process took only 3 lines of base R code.

Finally, the end result in dput format, like it should be in the first place.

dat <-
structure(list(Date = structure(c(17683, 17685, 17685, 17685, 
17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A", 
"A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2", 
"A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0, 
0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1, 
1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")

edited Aug 27 at 2:01

answered Aug 26 at 6:41

Rui Barradas

12.4k31628

1

@Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
â€“Â Rui Barradas
Aug 26 at 6:46

1

I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
â€“Â nicola
Aug 26 at 8:13

add a commentÂ |Â

up vote
16
down vote

The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.

The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.

Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.

dat <- read.table(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)

But as you can see there are some issues with the result.

dat
 X Date Emp1 Case Priority PriorityCountinLast7days X.1
1 NA 2018-06-01 A A1 0 0 NA
2 NA 2018-06-03 A A2 0 1 NA
3 NA 2018-06-03 A A3 0 2 NA
4 NA 2018-06-03 A A4 1 1 NA
5 NA 2018-06-03 A A5 2 1 NA
6 NA 2018-06-04 A A6 0 3 NA
7 NA 2018-06-01 B B1 0 1 NA
8 NA 2018-06-02 B B2 0 2 NA
9 NA 2018-06-03 B B3 0 3 NA

To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.

So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.

dat <- dat[-c(1, ncol(dat))]
dat
 Date Emp1 Case Priority PriorityCountinLast7days
1 2018-06-01 A A1 0 0
2 2018-06-03 A A2 0 1
3 2018-06-03 A A3 0 2
4 2018-06-03 A A4 1 1
5 2018-06-03 A A5 2 1
6 2018-06-04 A A6 0 3
7 2018-06-01 B B1 0 1
8 2018-06-02 B B2 0 2
9 2018-06-03 B B3 0 3

That wasn't too hard, much better.

In this case there is still a problem, to coerce column Date to class Date.

dat$Date <- as.Date(dat$Date)

And the result is satisfactory.

str(dat)
'data.frame': 9 obs. of 5 variables:
 $ Date : Date, format: "2018-06-01" "2018-06-03" ...
 $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
 $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
 $ Priority : int 0 0 0 1 2 0 0 0 0
 $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3

Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.

The whole process took only 3 lines of base R code.

Finally, the end result in dput format, like it should be in the first place.

dat <-
structure(list(Date = structure(c(17683, 17685, 17685, 17685, 
17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A", 
"A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2", 
"A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0, 
0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1, 
1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")

edited Aug 27 at 2:01

answered Aug 26 at 6:41

Rui Barradas

12.4k31628

1

@Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
â€“Â Rui Barradas
Aug 26 at 6:46

1

I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
â€“Â nicola
Aug 26 at 8:13

add a commentÂ |Â

up vote
16
down vote

The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.

The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.

Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.

dat <- read.table(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)

But as you can see there are some issues with the result.

dat
 X Date Emp1 Case Priority PriorityCountinLast7days X.1
1 NA 2018-06-01 A A1 0 0 NA
2 NA 2018-06-03 A A2 0 1 NA
3 NA 2018-06-03 A A3 0 2 NA
4 NA 2018-06-03 A A4 1 1 NA
5 NA 2018-06-03 A A5 2 1 NA
6 NA 2018-06-04 A A6 0 3 NA
7 NA 2018-06-01 B B1 0 1 NA
8 NA 2018-06-02 B B2 0 2 NA
9 NA 2018-06-03 B B3 0 3 NA

To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.

So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.

dat <- dat[-c(1, ncol(dat))]
dat
 Date Emp1 Case Priority PriorityCountinLast7days
1 2018-06-01 A A1 0 0
2 2018-06-03 A A2 0 1
3 2018-06-03 A A3 0 2
4 2018-06-03 A A4 1 1
5 2018-06-03 A A5 2 1
6 2018-06-04 A A6 0 3
7 2018-06-01 B B1 0 1
8 2018-06-02 B B2 0 2
9 2018-06-03 B B3 0 3

That wasn't too hard, much better.

In this case there is still a problem, to coerce column Date to class Date.

dat$Date <- as.Date(dat$Date)

And the result is satisfactory.

str(dat)
'data.frame': 9 obs. of 5 variables:
 $ Date : Date, format: "2018-06-01" "2018-06-03" ...
 $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
 $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
 $ Priority : int 0 0 0 1 2 0 0 0 0
 $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3

Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.

The whole process took only 3 lines of base R code.

Finally, the end result in dput format, like it should be in the first place.

dat <-
structure(list(Date = structure(c(17683, 17685, 17685, 17685, 
17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A", 
"A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2", 
"A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0, 
0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1, 
1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")

edited Aug 27 at 2:01

answered Aug 26 at 6:41

Rui Barradas

12.4k31628

The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.

The first step after copying & pasting the table into an R session is to read it in with read.table setting the header, sep, comment.char and strip.white arguments.

Credits for reminding me of arguments comment.char and strip.white go to @nicola, and his comment.

dat <- read.table(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)

But as you can see there are some issues with the result.

dat
 X Date Emp1 Case Priority PriorityCountinLast7days X.1
1 NA 2018-06-01 A A1 0 0 NA
2 NA 2018-06-03 A A2 0 1 NA
3 NA 2018-06-03 A A3 0 2 NA
4 NA 2018-06-03 A A4 1 1 NA
5 NA 2018-06-03 A A5 2 1 NA
6 NA 2018-06-04 A A6 0 3 NA
7 NA 2018-06-01 B B1 0 1 NA
8 NA 2018-06-02 B B2 0 2 NA
9 NA 2018-06-03 B B3 0 3 NA

To have separators start and end each data row made R believe those separators mark extra columns, which is not what is meant by the original question's OP.

So the second step is to keep only the real columns. I will do this subsetting the columns by their numbers, easily done, they usually are the first and last columns.

dat <- dat[-c(1, ncol(dat))]
dat
 Date Emp1 Case Priority PriorityCountinLast7days
1 2018-06-01 A A1 0 0
2 2018-06-03 A A2 0 1
3 2018-06-03 A A3 0 2
4 2018-06-03 A A4 1 1
5 2018-06-03 A A5 2 1
6 2018-06-04 A A6 0 3
7 2018-06-01 B B1 0 1
8 2018-06-02 B B2 0 2
9 2018-06-03 B B3 0 3

That wasn't too hard, much better.

In this case there is still a problem, to coerce column Date to class Date.

dat$Date <- as.Date(dat$Date)

And the result is satisfactory.

str(dat)
'data.frame': 9 obs. of 5 variables:
 $ Date : Date, format: "2018-06-01" "2018-06-03" ...
 $ Emp1 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
 $ Case : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
 $ Priority : int 0 0 0 1 2 0 0 0 0
 $ PriorityCountinLast7days: int 0 1 2 1 1 3 1 2 3

Note that I have not set the more or less standard argument stringsAsFactors = FALSE. If needed, this should be done when running read.table.

The whole process took only 3 lines of base R code.

Finally, the end result in dput format, like it should be in the first place.

dat <-
structure(list(Date = structure(c(17683, 17685, 17685, 17685, 
17685, 17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A", 
"A", "A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2", 
"A3", "A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0, 
0, 1, 2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1, 
1, 3, 1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")

edited Aug 27 at 2:01

answered Aug 26 at 6:41

Rui Barradas

12.4k31628

edited Aug 27 at 2:01

answered Aug 26 at 6:41

Rui Barradas

12.4k31628

answered Aug 26 at 6:41

Rui Barradas

12.4k31628

answered Aug 26 at 6:41

Rui Barradas

12.4k31628

1

@Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
â€“Â Rui Barradas
Aug 26 at 6:46

1

I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
â€“Â nicola
Aug 26 at 8:13

add a commentÂ |Â

1

@Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
â€“Â Rui Barradas
Aug 26 at 6:46

1

I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
â€“Â nicola
Aug 26 at 8:13

@Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ I don't know about "safer", I find it more natural to use read table functions since after all that does look like a table.
â€“Â Rui Barradas
Aug 26 at 6:46

I was about to post utils:::head.default(read.table("clipboard",comment.char="+",sep="|",strip.white=TRUE,header=TRUE,flush=TRUE)[-1],-1), which is not different from what you did (in my line, you manage the first and third line and remove the undesired columns).
â€“Â nicola
Aug 26 at 8:13

add a commentÂ |Â

up vote
5
down vote

md_table <- scan(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+",
what = "", sep = "", comment.char = "+", quiet = TRUE)

## it is clear that there are 5 columns
mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
# [,1] [,2] [,3] [,4] [,5] 
# [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
# [2,] "2018-06-01" "A" "A1" "0" "0" 
# [3,] "2018-06-03" "A" "A2" "0" "1" 
# [4,] "2018-06-03" "A" "A3" "0" "2" 
# [5,] "2018-06-03" "A" "A4" "1" "1" 
# [6,] "2018-06-03" "A" "A5" "2" "1" 
# [7,] "2018-06-04" "A" "A6" "0" "3" 
# [8,] "2018-06-01" "B" "B1" "0" "1" 
# [9,] "2018-06-02" "B" "B2" "0" "2" 
#[10,] "2018-06-03" "B" "B3" "0" "3"

## a data frame with all character columns
dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
# Date Emp1 Case Priority PriorityCountinLast7days
#1 2018-06-01 A A1 0 0
#2 2018-06-03 A A2 0 1
#3 2018-06-03 A A3 0 2
#4 2018-06-03 A A4 1 1
#5 2018-06-03 A A5 2 1
#6 2018-06-04 A A6 0 3
#7 2018-06-01 B B1 0 1
#8 2018-06-02 B B2 0 2
#9 2018-06-03 B B3 0 3

## or maybe just use `type.convert` on some columns?
dat <- lapply(dat, type.convert)

edited Aug 26 at 8:05

answered Aug 26 at 6:32

Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ

44.4k1481128

add a commentÂ |Â

up vote
5
down vote

md_table <- scan(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+",
what = "", sep = "", comment.char = "+", quiet = TRUE)

## it is clear that there are 5 columns
mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
# [,1] [,2] [,3] [,4] [,5] 
# [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
# [2,] "2018-06-01" "A" "A1" "0" "0" 
# [3,] "2018-06-03" "A" "A2" "0" "1" 
# [4,] "2018-06-03" "A" "A3" "0" "2" 
# [5,] "2018-06-03" "A" "A4" "1" "1" 
# [6,] "2018-06-03" "A" "A5" "2" "1" 
# [7,] "2018-06-04" "A" "A6" "0" "3" 
# [8,] "2018-06-01" "B" "B1" "0" "1" 
# [9,] "2018-06-02" "B" "B2" "0" "2" 
#[10,] "2018-06-03" "B" "B3" "0" "3"

## a data frame with all character columns
dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
# Date Emp1 Case Priority PriorityCountinLast7days
#1 2018-06-01 A A1 0 0
#2 2018-06-03 A A2 0 1
#3 2018-06-03 A A3 0 2
#4 2018-06-03 A A4 1 1
#5 2018-06-03 A A5 2 1
#6 2018-06-04 A A6 0 3
#7 2018-06-01 B B1 0 1
#8 2018-06-02 B B2 0 2
#9 2018-06-03 B B3 0 3

## or maybe just use `type.convert` on some columns?
dat <- lapply(dat, type.convert)

edited Aug 26 at 8:05

answered Aug 26 at 6:32

Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ

44.4k1481128

add a commentÂ |Â

up vote
5
down vote

md_table <- scan(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+",
what = "", sep = "", comment.char = "+", quiet = TRUE)

## it is clear that there are 5 columns
mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
# [,1] [,2] [,3] [,4] [,5] 
# [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
# [2,] "2018-06-01" "A" "A1" "0" "0" 
# [3,] "2018-06-03" "A" "A2" "0" "1" 
# [4,] "2018-06-03" "A" "A3" "0" "2" 
# [5,] "2018-06-03" "A" "A4" "1" "1" 
# [6,] "2018-06-03" "A" "A5" "2" "1" 
# [7,] "2018-06-04" "A" "A6" "0" "3" 
# [8,] "2018-06-01" "B" "B1" "0" "1" 
# [9,] "2018-06-02" "B" "B2" "0" "2" 
#[10,] "2018-06-03" "B" "B3" "0" "3"

## a data frame with all character columns
dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
# Date Emp1 Case Priority PriorityCountinLast7days
#1 2018-06-01 A A1 0 0
#2 2018-06-03 A A2 0 1
#3 2018-06-03 A A3 0 2
#4 2018-06-03 A A4 1 1
#5 2018-06-03 A A5 2 1
#6 2018-06-04 A A6 0 3
#7 2018-06-01 B B1 0 1
#8 2018-06-02 B B2 0 2
#9 2018-06-03 B B3 0 3

## or maybe just use `type.convert` on some columns?
dat <- lapply(dat, type.convert)

edited Aug 26 at 8:05

answered Aug 26 at 6:32

Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ

44.4k1481128

md_table <- scan(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+",
what = "", sep = "", comment.char = "+", quiet = TRUE)

## it is clear that there are 5 columns
mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
# [,1] [,2] [,3] [,4] [,5] 
# [1,] "Date" "Emp1" "Case" "Priority" "PriorityCountinLast7days"
# [2,] "2018-06-01" "A" "A1" "0" "0" 
# [3,] "2018-06-03" "A" "A2" "0" "1" 
# [4,] "2018-06-03" "A" "A3" "0" "2" 
# [5,] "2018-06-03" "A" "A4" "1" "1" 
# [6,] "2018-06-03" "A" "A5" "2" "1" 
# [7,] "2018-06-04" "A" "A6" "0" "3" 
# [8,] "2018-06-01" "B" "B1" "0" "1" 
# [9,] "2018-06-02" "B" "B2" "0" "2" 
#[10,] "2018-06-03" "B" "B3" "0" "3"

## a data frame with all character columns
dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
# Date Emp1 Case Priority PriorityCountinLast7days
#1 2018-06-01 A A1 0 0
#2 2018-06-03 A A2 0 1
#3 2018-06-03 A A3 0 2
#4 2018-06-03 A A4 1 1
#5 2018-06-03 A A5 2 1
#6 2018-06-04 A A6 0 3
#7 2018-06-01 B B1 0 1
#8 2018-06-02 B B2 0 2
#9 2018-06-03 B B3 0 3

## or maybe just use `type.convert` on some columns?
dat <- lapply(dat, type.convert)

edited Aug 26 at 8:05

answered Aug 26 at 6:32

Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ

44.4k1481128

edited Aug 26 at 8:05

answered Aug 26 at 6:32

Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ

44.4k1481128

answered Aug 26 at 6:32

Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ

44.4k1481128

answered Aug 26 at 6:32

Ã¦ÂÂŽÃ¥Â“Â²Ã¦ÂºÂ

44.4k1481128

add a commentÂ |Â

up vote
3
down vote

Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.

Copy the dataset into the Notepad file.

Replace all | characters with ,

Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

But, if you mean use the R to fully understand it in one step, then I have no idea.

edited Aug 26 at 11:15

Peter Mortensen

13k1983111

answered Aug 26 at 6:38

Salman Lashkarara

4,3991454104

Notepad? Are you assuming Microsoft Windows?
â€“Â Peter Mortensen
Aug 26 at 11:16

@PeterMortensen Yes, but it could be any other editor as well.
â€“Â Salman Lashkarara
Aug 26 at 11:26

add a commentÂ |Â

up vote
3
down vote

Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.

Copy the dataset into the Notepad file.

Replace all | characters with ,

Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

But, if you mean use the R to fully understand it in one step, then I have no idea.

edited Aug 26 at 11:15

Peter Mortensen

13k1983111

answered Aug 26 at 6:38

Salman Lashkarara

4,3991454104

Notepad? Are you assuming Microsoft Windows?
â€“Â Peter Mortensen
Aug 26 at 11:16

@PeterMortensen Yes, but it could be any other editor as well.
â€“Â Salman Lashkarara
Aug 26 at 11:26

add a commentÂ |Â

up vote
3
down vote

Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.

Copy the dataset into the Notepad file.

Replace all | characters with ,

Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

But, if you mean use the R to fully understand it in one step, then I have no idea.

edited Aug 26 at 11:15

Peter Mortensen

13k1983111

answered Aug 26 at 6:38

Salman Lashkarara

4,3991454104

Well, about this specific dataset I used the import feature in RStudio, but I took one additional step beforehand.

Copy the dataset into the Notepad file.

Replace all | characters with ,

Import the Notepad file using read.csv to RStudio using this code (seperate columns by ,).

But, if you mean use the R to fully understand it in one step, then I have no idea.

edited Aug 26 at 11:15

Peter Mortensen

13k1983111

answered Aug 26 at 6:38

Salman Lashkarara

4,3991454104

edited Aug 26 at 11:15

Peter Mortensen

13k1983111

edited Aug 26 at 11:15

Peter Mortensen

13k1983111

edited Aug 26 at 11:15

Peter Mortensen

13k1983111

answered Aug 26 at 6:38

Salman Lashkarara

4,3991454104

answered Aug 26 at 6:38

Salman Lashkarara

4,3991454104

answered Aug 26 at 6:38

Salman Lashkarara

4,3991454104

Notepad? Are you assuming Microsoft Windows?
â€“Â Peter Mortensen
Aug 26 at 11:16

@PeterMortensen Yes, but it could be any other editor as well.
â€“Â Salman Lashkarara
Aug 26 at 11:26

add a commentÂ |Â

Notepad? Are you assuming Microsoft Windows?
â€“Â Peter Mortensen
Aug 26 at 11:16

@PeterMortensen Yes, but it could be any other editor as well.
â€“Â Salman Lashkarara
Aug 26 at 11:26

Notepad? Are you assuming Microsoft Windows?
â€“Â Peter Mortensen
Aug 26 at 11:16

@PeterMortensen Yes, but it could be any other editor as well.
â€“Â Salman Lashkarara
Aug 26 at 11:26

add a commentÂ |Â

up vote
-2
down vote

As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:

dput(mtcars %>% head(10), file = 'reproducible.txt')

df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
"drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
"Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
"Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")

edited Aug 26 at 11:19

Peter Mortensen

13k1983111

answered Aug 26 at 6:39

Pawel Stradowski

8617

4

Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
â€“Â Rui Barradas
Aug 26 at 6:53

add a commentÂ |Â

up vote
-2
down vote

As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:

dput(mtcars %>% head(10), file = 'reproducible.txt')

df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
"drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
"Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
"Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")

edited Aug 26 at 11:19

Peter Mortensen

13k1983111

answered Aug 26 at 6:39

Pawel Stradowski

8617

4

Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
â€“Â Rui Barradas
Aug 26 at 6:53

add a commentÂ |Â

up vote
-2
down vote

As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:

dput(mtcars %>% head(10), file = 'reproducible.txt')

df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
"drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
"Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
"Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")

edited Aug 26 at 11:19

Peter Mortensen

13k1983111

answered Aug 26 at 6:39

Pawel Stradowski

8617

As it was suggested, you could use dput to save the content of a dataframe to a file, open the file in a text editor and paste its content. An example of mtcar's dataset limited to first 10 rows:

dput(mtcars %>% head(10), file = 'reproducible.txt')

df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
24.4, 22.8, 19.2), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6), disp = c(160,
160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), hp = c(110,
110, 93, 110, 175, 105, 245, 62, 95, 123), drat = c(3.9, 3.9,
3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), wt = c(2.62,
2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44), qsec = c(16.46,
17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3), vs = c(0,
0, 1, 1, 0, 1, 0, 1, 1, 1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0,
0), gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4), carb = c(4, 4, 1,
1, 2, 1, 4, 2, 2, 4)), .Names = c("mpg", "cyl", "disp", "hp",
"drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4",
"Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
"Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")

edited Aug 26 at 11:19

Peter Mortensen

13k1983111

answered Aug 26 at 6:39

Pawel Stradowski

8617

edited Aug 26 at 11:19

Peter Mortensen

13k1983111

edited Aug 26 at 11:19

Peter Mortensen

13k1983111

edited Aug 26 at 11:19

Peter Mortensen

13k1983111

answered Aug 26 at 6:39

Pawel Stradowski

8617

answered Aug 26 at 6:39

Pawel Stradowski

8617

answered Aug 26 at 6:39

Pawel Stradowski

8617

4

Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
â€“Â Rui Barradas
Aug 26 at 6:53

add a commentÂ |Â

4

Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
â€“Â Rui Barradas
Aug 26 at 6:53

Thanks for answering, but I think you have not understood what I am asking. In the case of a dataframe already existing in an R session, there is no problem, the problem is to have a table written in a text document with plus signs, dashes an pipe signs. How can that type of table be read in by R?
â€“Â Rui Barradas
Aug 26 at 6:53

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

mjhjmtu