Count separators in CSV rows with Pandas

Clash Royale CLAN TAG#URR8PPP
I have a csv file as follows:
name,age
something
tom,20
And when I put it into a dataframe it looks like:
df = pd.read_csv('file', header=None)
0 1
1 name age
2 something NaN
3 tom 20
How would I get the count of a comma in the raw row data. For example, the answer should look like:
# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
python python-3.x pandas csv dataframe
add a comment |
I have a csv file as follows:
name,age
something
tom,20
And when I put it into a dataframe it looks like:
df = pd.read_csv('file', header=None)
0 1
1 name age
2 something NaN
3 tom 20
How would I get the count of a comma in the raw row data. For example, the answer should look like:
# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
python python-3.x pandas csv dataframe
do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22
@OmkarSabade preferably just to get the number of separators thatpandasinferred -- but either way is acceptable.
– David L
Dec 20 '18 at 5:24
I was hoping for a format like this:1🐼tom🐼20🐼tom@doe.com
– Martijn
Dec 20 '18 at 14:17
add a comment |
I have a csv file as follows:
name,age
something
tom,20
And when I put it into a dataframe it looks like:
df = pd.read_csv('file', header=None)
0 1
1 name age
2 something NaN
3 tom 20
How would I get the count of a comma in the raw row data. For example, the answer should look like:
# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
python python-3.x pandas csv dataframe
I have a csv file as follows:
name,age
something
tom,20
And when I put it into a dataframe it looks like:
df = pd.read_csv('file', header=None)
0 1
1 name age
2 something NaN
3 tom 20
How would I get the count of a comma in the raw row data. For example, the answer should look like:
# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
python python-3.x pandas csv dataframe
python python-3.x pandas csv dataframe
edited Dec 20 '18 at 12:30
coldspeed
120k19119194
120k19119194
asked Dec 20 '18 at 5:15
David L
2549
2549
do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22
@OmkarSabade preferably just to get the number of separators thatpandasinferred -- but either way is acceptable.
– David L
Dec 20 '18 at 5:24
I was hoping for a format like this:1🐼tom🐼20🐼tom@doe.com
– Martijn
Dec 20 '18 at 14:17
add a comment |
do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22
@OmkarSabade preferably just to get the number of separators thatpandasinferred -- but either way is acceptable.
– David L
Dec 20 '18 at 5:24
I was hoping for a format like this:1🐼tom🐼20🐼tom@doe.com
– Martijn
Dec 20 '18 at 14:17
do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22
do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22
@OmkarSabade preferably just to get the number of separators that
pandas inferred -- but either way is acceptable.– David L
Dec 20 '18 at 5:24
@OmkarSabade preferably just to get the number of separators that
pandas inferred -- but either way is acceptable.– David L
Dec 20 '18 at 5:24
I was hoping for a format like this:
1🐼tom🐼20🐼tom@doe.com– Martijn
Dec 20 '18 at 14:17
I was hoping for a format like this:
1🐼tom🐼20🐼tom@doe.com– Martijn
Dec 20 '18 at 14:17
add a comment |
4 Answers
4
active
oldest
votes
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
Another solution for concatenation is to join on the index (this is a neat one liner):
s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39
1
I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39
But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40
1
Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44
add a comment |
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
add a comment |
You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.
from io import StringIO
import csv, pandas as pd, numpy as np
x = """name,age
something
tom,20"""
# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)
# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1
print(df)
0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1
add a comment |
One line of code: len(df) - df[1].isna().sum()
Ohk if the nan itself is a part of the dataset then? likesomething,,,something?
– Dark
Dec 20 '18 at 5:32
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)give ananin his sample.
– Quang Hoang
Dec 20 '18 at 5:35
This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53862765%2fcount-separators-in-csv-rows-with-pandas%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
Another solution for concatenation is to join on the index (this is a neat one liner):
s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39
1
I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39
But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40
1
Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44
add a comment |
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
Another solution for concatenation is to join on the index (this is a neat one liner):
s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39
1
I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39
But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40
1
Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44
add a comment |
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
Another solution for concatenation is to join on the index (this is a neat one liner):
s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
Another solution for concatenation is to join on the index (this is a neat one liner):
s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
edited Dec 20 '18 at 8:37
answered Dec 20 '18 at 5:35
coldspeed
120k19119194
120k19119194
We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39
1
I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39
But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40
1
Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44
add a comment |
We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39
1
I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39
But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40
1
Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44
We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38
We are on the same road:-) cheers
– W-B
Dec 20 '18 at 5:38
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
Dec 20 '18 at 5:39
1
1
I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39
I read your mind hahahaha:-)
– W-B
Dec 20 '18 at 5:39
But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40
But learn new strcount:-) thanks man
– W-B
Dec 20 '18 at 5:40
1
1
Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44
Your answers stopped me from thinking otherwise
– Dark
Dec 20 '18 at 5:44
add a comment |
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
add a comment |
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
add a comment |
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
answered Dec 20 '18 at 5:31
W-B
101k73163
101k73163
add a comment |
add a comment |
You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.
from io import StringIO
import csv, pandas as pd, numpy as np
x = """name,age
something
tom,20"""
# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)
# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1
print(df)
0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1
add a comment |
You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.
from io import StringIO
import csv, pandas as pd, numpy as np
x = """name,age
something
tom,20"""
# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)
# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1
print(df)
0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1
add a comment |
You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.
from io import StringIO
import csv, pandas as pd, numpy as np
x = """name,age
something
tom,20"""
# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)
# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1
print(df)
0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1
You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.
from io import StringIO
import csv, pandas as pd, numpy as np
x = """name,age
something
tom,20"""
# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)
# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1
print(df)
0 1 _count_separators
0 name age 1
1 something NaN 0
2 tom 20 1
answered Dec 20 '18 at 23:39
jpp
91.7k2052102
91.7k2052102
add a comment |
add a comment |
One line of code: len(df) - df[1].isna().sum()
Ohk if the nan itself is a part of the dataset then? likesomething,,,something?
– Dark
Dec 20 '18 at 5:32
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)give ananin his sample.
– Quang Hoang
Dec 20 '18 at 5:35
This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40
add a comment |
One line of code: len(df) - df[1].isna().sum()
Ohk if the nan itself is a part of the dataset then? likesomething,,,something?
– Dark
Dec 20 '18 at 5:32
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)give ananin his sample.
– Quang Hoang
Dec 20 '18 at 5:35
This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40
add a comment |
One line of code: len(df) - df[1].isna().sum()
One line of code: len(df) - df[1].isna().sum()
answered Dec 20 '18 at 5:31
Quang Hoang
1,7471913
1,7471913
Ohk if the nan itself is a part of the dataset then? likesomething,,,something?
– Dark
Dec 20 '18 at 5:32
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)give ananin his sample.
– Quang Hoang
Dec 20 '18 at 5:35
This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40
add a comment |
Ohk if the nan itself is a part of the dataset then? likesomething,,,something?
– Dark
Dec 20 '18 at 5:32
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)give ananin his sample.
– Quang Hoang
Dec 20 '18 at 5:35
This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40
Ohk if the nan itself is a part of the dataset then? like
something,,,something?– Dark
Dec 20 '18 at 5:32
Ohk if the nan itself is a part of the dataset then? like
something,,,something?– Dark
Dec 20 '18 at 5:32
i'm not sure in which instance would
df = pd.read_csv('file.csv', header=None) give a nan in his sample.– Quang Hoang
Dec 20 '18 at 5:35
i'm not sure in which instance would
df = pd.read_csv('file.csv', header=None) give a nan in his sample.– Quang Hoang
Dec 20 '18 at 5:35
This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40
This assumes there are only two columns...?
– coldspeed
Dec 20 '18 at 5:40
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53862765%2fcount-separators-in-csv-rows-with-pandas%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
do you also want to count the commas if they're in the column value?
– Omkar Sabade
Dec 20 '18 at 5:22
@OmkarSabade preferably just to get the number of separators that
pandasinferred -- but either way is acceptable.– David L
Dec 20 '18 at 5:24
I was hoping for a format like this:
1🐼tom🐼20🐼tom@doe.com– Martijn
Dec 20 '18 at 14:17