Count separators in CSV rows with Pandas

I have a csv file as follows:

name,age
something
tom,20

And when I put it into a dataframe it looks like:

df = pd.read_csv('file', header=None)

 0 1
1 name age
2 something NaN
3 tom 20

How would I get the count of a comma in the raw row data. For example, the answer should look like:

# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))

 0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1

edited Dec 20 '18 at 12:30

coldspeed

120k19119194

asked Dec 20 '18 at 5:15

David L

2549

do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22

@OmkarSabade preferably just to get the number of separators that pandas inferred -- but either way is acceptable.
– David L
Dec 20 '18 at 5:24

I was hoping for a format like this: 1🐼tom🐼20🐼tom@doe.com
– Martijn
Dec 20 '18 at 14:17

add a comment |

I have a csv file as follows:

name,age
something
tom,20

And when I put it into a dataframe it looks like:

df = pd.read_csv('file', header=None)

 0 1
1 name age
2 something NaN
3 tom 20

How would I get the count of a comma in the raw row data. For example, the answer should look like:

# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))

 0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1

edited Dec 20 '18 at 12:30

coldspeed

120k19119194

asked Dec 20 '18 at 5:15

David L

2549

do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22

@OmkarSabade preferably just to get the number of separators that pandas inferred -- but either way is acceptable.
– David L
Dec 20 '18 at 5:24

I was hoping for a format like this: 1🐼tom🐼20🐼tom@doe.com
– Martijn
Dec 20 '18 at 14:17

add a comment |

I have a csv file as follows:

name,age
something
tom,20

And when I put it into a dataframe it looks like:

df = pd.read_csv('file', header=None)

 0 1
1 name age
2 something NaN
3 tom 20

How would I get the count of a comma in the raw row data. For example, the answer should look like:

# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))

 0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1

edited Dec 20 '18 at 12:30

coldspeed

120k19119194

asked Dec 20 '18 at 5:15

David L

2549

I have a csv file as follows:

name,age
something
tom,20

And when I put it into a dataframe it looks like:

df = pd.read_csv('file', header=None)

 0 1
1 name age
2 something NaN
3 tom 20

How would I get the count of a comma in the raw row data. For example, the answer should look like:

# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))

 0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1

python python-3.x pandas csv dataframe

edited Dec 20 '18 at 12:30

coldspeed

120k19119194

asked Dec 20 '18 at 5:15

David L

2549

edited Dec 20 '18 at 12:30

coldspeed

120k19119194

asked Dec 20 '18 at 5:15

David L

2549

edited Dec 20 '18 at 12:30

coldspeed

120k19119194

edited Dec 20 '18 at 12:30

coldspeed

120k19119194

edited Dec 20 '18 at 12:30

coldspeed

120k19119194

asked Dec 20 '18 at 5:15

David L

2549

asked Dec 20 '18 at 5:15

David L

2549

asked Dec 20 '18 at 5:15

David L

2549

do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22

@OmkarSabade preferably just to get the number of separators that pandas inferred -- but either way is acceptable.
– David L
Dec 20 '18 at 5:24

I was hoping for a format like this: 1🐼tom🐼20🐼tom@doe.com
– Martijn
Dec 20 '18 at 14:17

add a comment |

do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22

@OmkarSabade preferably just to get the number of separators that pandas inferred -- but either way is acceptable.
– David L
Dec 20 '18 at 5:24

I was hoping for a format like this: 1🐼tom🐼20🐼tom@doe.com
– Martijn
Dec 20 '18 at 14:17

do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22

@OmkarSabade preferably just to get the number of separators that pandas inferred -- but either way is acceptable.
– David L
Dec 20 '18 at 5:24

I was hoping for a format like this: 1🐼tom🐼20🐼tom@doe.com
– Martijn
Dec 20 '18 at 14:17

add a comment |

4 Answers
4

active

oldest

votes

Very simply, read your data as a single column series, then split on comma and concatenate with separator count.

# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)

pd.concat([
 s.str.split(',', expand=True), 
 s.str.count(',').rename('_count_sep')
 ], axis=1)

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

Another solution for concatenation is to join on the index (this is a neat one liner):

s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

edited Dec 20 '18 at 8:37

answered Dec 20 '18 at 5:35

coldspeed

120k19119194

We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38

@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39

1

I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39

But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40

1

Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44

add a comment |

Doing this

df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again 

df2['0'].str.findall(',').str.len() # then one row into one cell , using str find 
0 1
1 0
2 1
3 5
Name: 0, dtype: int64

df['_count_separators']=df2['0'].str.findall(',').str.len()

Data

name,age
something
tom,20
something,,,,,somethingelse

answered Dec 20 '18 at 5:31

W-B

101k73163

add a comment |

You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.

from io import StringIO
import csv, pandas as pd, numpy as np

x = """name,age
something
tom,20"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
 delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1

print(df)

 0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1

answered Dec 20 '18 at 23:39

jpp

91.7k2052102

add a comment |

One line of code: len(df) - df[1].isna().sum()

answered Dec 20 '18 at 5:31

Quang Hoang

1,7471913

Ohk if the nan itself is a part of the dataset then? like something,,,something?
– Dark
Dec 20 '18 at 5:32

i'm not sure in which instance would df = pd.read_csv('file.csv', header=None) give a nan in his sample.
– Quang Hoang
Dec 20 '18 at 5:35

This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53862765%2fcount-separators-in-csv-rows-with-pandas%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Very simply, read your data as a single column series, then split on comma and concatenate with separator count.

# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)

pd.concat([
 s.str.split(',', expand=True), 
 s.str.count(',').rename('_count_sep')
 ], axis=1)

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

Another solution for concatenation is to join on the index (this is a neat one liner):

s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

edited Dec 20 '18 at 8:37

answered Dec 20 '18 at 5:35

coldspeed

120k19119194

We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38

@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39

1

I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39

But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40

1

Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44

add a comment |

Very simply, read your data as a single column series, then split on comma and concatenate with separator count.

# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)

pd.concat([
 s.str.split(',', expand=True), 
 s.str.count(',').rename('_count_sep')
 ], axis=1)

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

Another solution for concatenation is to join on the index (this is a neat one liner):

s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

edited Dec 20 '18 at 8:37

answered Dec 20 '18 at 5:35

coldspeed

120k19119194

We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38

@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39

1

I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39

But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40

1

Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44

add a comment |

Very simply, read your data as a single column series, then split on comma and concatenate with separator count.

# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)

pd.concat([
 s.str.split(',', expand=True), 
 s.str.count(',').rename('_count_sep')
 ], axis=1)

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

Another solution for concatenation is to join on the index (this is a neat one liner):

s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

edited Dec 20 '18 at 8:37

answered Dec 20 '18 at 5:35

coldspeed

120k19119194

Very simply, read your data as a single column series, then split on comma and concatenate with separator count.

# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)

pd.concat([
 s.str.split(',', expand=True), 
 s.str.count(',').rename('_count_sep')
 ], axis=1)

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

Another solution for concatenation is to join on the index (this is a neat one liner):

s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))

 0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1

edited Dec 20 '18 at 8:37

answered Dec 20 '18 at 5:35

coldspeed

120k19119194

edited Dec 20 '18 at 8:37

answered Dec 20 '18 at 5:35

coldspeed

120k19119194

answered Dec 20 '18 at 5:35

coldspeed

120k19119194

answered Dec 20 '18 at 5:35

coldspeed

120k19119194

We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38

@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39

1

I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39

But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40

1

Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44

add a comment |

We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38

@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39

1

I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39

But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40

1

Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44

We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38

@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39

I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39

But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40

Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44

add a comment |

Doing this

df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again 

df2['0'].str.findall(',').str.len() # then one row into one cell , using str find 
0 1
1 0
2 1
3 5
Name: 0, dtype: int64

df['_count_separators']=df2['0'].str.findall(',').str.len()

Data

name,age
something
tom,20
something,,,,,somethingelse

answered Dec 20 '18 at 5:31

W-B

101k73163

add a comment |

Doing this

df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again 

df2['0'].str.findall(',').str.len() # then one row into one cell , using str find 
0 1
1 0
2 1
3 5
Name: 0, dtype: int64

df['_count_separators']=df2['0'].str.findall(',').str.len()

Data

name,age
something
tom,20
something,,,,,somethingelse

answered Dec 20 '18 at 5:31

W-B

101k73163

add a comment |

Doing this

df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again 

df2['0'].str.findall(',').str.len() # then one row into one cell , using str find 
0 1
1 0
2 1
3 5
Name: 0, dtype: int64

df['_count_separators']=df2['0'].str.findall(',').str.len()

Data

name,age
something
tom,20
something,,,,,somethingelse

answered Dec 20 '18 at 5:31

W-B

101k73163

Doing this

df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again 

df2['0'].str.findall(',').str.len() # then one row into one cell , using str find 
0 1
1 0
2 1
3 5
Name: 0, dtype: int64

df['_count_separators']=df2['0'].str.findall(',').str.len()

Data

name,age
something
tom,20
something,,,,,somethingelse

answered Dec 20 '18 at 5:31

W-B

101k73163

answered Dec 20 '18 at 5:31

W-B

101k73163

answered Dec 20 '18 at 5:31

W-B

101k73163

answered Dec 20 '18 at 5:31

W-B

101k73163

add a comment |

You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.

from io import StringIO
import csv, pandas as pd, numpy as np

x = """name,age
something
tom,20"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
 delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1

print(df)

 0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1

answered Dec 20 '18 at 23:39

jpp

91.7k2052102

add a comment |

You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.

from io import StringIO
import csv, pandas as pd, numpy as np

x = """name,age
something
tom,20"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
 delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1

print(df)

 0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1

answered Dec 20 '18 at 23:39

jpp

91.7k2052102

add a comment |

You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.

from io import StringIO
import csv, pandas as pd, numpy as np

x = """name,age
something
tom,20"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
 delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1

print(df)

 0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1

answered Dec 20 '18 at 23:39

jpp

91.7k2052102

You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.

from io import StringIO
import csv, pandas as pd, numpy as np

x = """name,age
something
tom,20"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
 delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1

print(df)

 0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1

answered Dec 20 '18 at 23:39

jpp

91.7k2052102

answered Dec 20 '18 at 23:39

jpp

91.7k2052102

answered Dec 20 '18 at 23:39

jpp

91.7k2052102

answered Dec 20 '18 at 23:39

jpp

91.7k2052102

add a comment |

One line of code: len(df) - df[1].isna().sum()

answered Dec 20 '18 at 5:31

Quang Hoang

1,7471913

Ohk if the nan itself is a part of the dataset then? like something,,,something?
– Dark
Dec 20 '18 at 5:32

i'm not sure in which instance would df = pd.read_csv('file.csv', header=None) give a nan in his sample.
– Quang Hoang
Dec 20 '18 at 5:35

This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40

add a comment |

One line of code: len(df) - df[1].isna().sum()

answered Dec 20 '18 at 5:31

Quang Hoang

1,7471913

Ohk if the nan itself is a part of the dataset then? like something,,,something?
– Dark
Dec 20 '18 at 5:32

i'm not sure in which instance would df = pd.read_csv('file.csv', header=None) give a nan in his sample.
– Quang Hoang
Dec 20 '18 at 5:35

This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40

add a comment |

One line of code: len(df) - df[1].isna().sum()

answered Dec 20 '18 at 5:31

Quang Hoang

1,7471913

One line of code: len(df) - df[1].isna().sum()

answered Dec 20 '18 at 5:31

Quang Hoang

1,7471913

answered Dec 20 '18 at 5:31

Quang Hoang

1,7471913

answered Dec 20 '18 at 5:31

Quang Hoang

1,7471913

answered Dec 20 '18 at 5:31

Quang Hoang

1,7471913

Ohk if the nan itself is a part of the dataset then? like something,,,something?
– Dark
Dec 20 '18 at 5:32

i'm not sure in which instance would df = pd.read_csv('file.csv', header=None) give a nan in his sample.
– Quang Hoang
Dec 20 '18 at 5:35

This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40

add a comment |

Ohk if the nan itself is a part of the dataset then? like something,,,something?
– Dark
Dec 20 '18 at 5:32

i'm not sure in which instance would df = pd.read_csv('file.csv', header=None) give a nan in his sample.
– Quang Hoang
Dec 20 '18 at 5:35

This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40

Ohk if the nan itself is a part of the dataset then? like something,,,something?
– Dark
Dec 20 '18 at 5:32

i'm not sure in which instance would df = pd.read_csv('file.csv', header=None) give a nan in his sample.
– Quang Hoang
Dec 20 '18 at 5:35

This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

搜尋此網誌

mjhjmtu