Groupby and append lists and strings

I am trying to group-by the values in my "value_1" column. But my last column is made up of lists. When I try to group-by using my "value_1" column, the column made up of lists disappears.

Dataframe:

value_1:   value_2:          value_3:         list:
american   california, nyc   walmart, kmart   [supermarket, connivence]
canadian   toronto           dunkinDonuts     [coffee]
american   texas                              [state]
canadian                     walmart          [supermarket]
...        ...               ...              ...

My expected output is:

value_1:   value_2:                 value_3:                list:
american   california, nyc, texas   walmart, kmart          [supermarket, connivence, state]
canadian   toronto                  dunkinDonuts, walmart   [coffee, supermarket]

Thanks!

edited Mar 3 at 16:02 by yatu

asked Mar 1 at 12:07 by user11076352

python pandas

2 Answers

Build the aggregation dictionary dynamically: map every column except value_1 and list to a function that joins strings, and map list to a lambda function that flattens the grouped lists with a list comprehension:

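# f1: join the non-missing strings of each group with ', '
f1 = lambda x: ', '.join(x.dropna())
# alternative: join only the string values
#f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
# f2: flatten the group's lists into a single list
f2 = lambda x: [z for y in x for z in y]
# every column except the key and the list column is aggregated with f1
d = dict.fromkeys(df.columns.difference(['value_1','list']), f1)
d['list'] = f2

df = df.groupby('value_1', as_index=False).agg(d)
print(df)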
    value_1                 value_2                value_3                              list
0  american  california, nyc, texas         walmart, kmart  [supermarket, connivence, state]
1  canadian                 toronto  dunkinDonuts, walmart             [coffee, supermarket]
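
For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is a reconstruction of the question's table and assumes the blank cells are NaN (the question does not say whether they are NaN or empty strings); the names join_strings, flatten, agg_map and out are only illustrative stand-ins for f1, f2, d and the result.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data; blanks are NaN here
# so that dropna() has something to drop.
df = pd.DataFrame({
    "value_1": ["american", "canadian", "american", "canadian"],
    "value_2": ["california, nyc", "toronto", "texas", np.nan],
    "value_3": ["walmart, kmart", "dunkinDonuts", np.nan, "walmart"],
    "list": [["supermarket", "connivence"], ["coffee"], ["state"], ["supermarket"]],
})

join_strings = lambda x: ", ".join(x.dropna())   # f1 in the answer
flatten = lambda x: [z for y in x for z in y]    # f2 in the answer

# Every column except the key and the list column gets the string joiner.
agg_map = dict.fromkeys(df.columns.difference(["value_1", "list"]), join_strings)
agg_map["list"] = flatten

out = df.groupby("value_1", as_index=False).agg(agg_map)
print(out)
```

Building the dictionary from df.columns means extra string columns are picked up automatically, at the cost of assuming that every remaining column really holds strings.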

Explanation:

f1 and f2 are lambda functions.

The default f1 drops missing values (if there are any) and joins the remaining strings with a separator:

f1 = lambda x: ', '.join(x.dropna())

Alternatively, keep only the string values (this skips missing values, because NaN is a float and not a str) and join them with a separator:

f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])

If the blanks are empty strings rather than NaN, filter out the empty strings before joining:

f1 = lambda x: ', '.join([y for y in x if y != ''])
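
To see how the three variants differ, here is a small sketch on a made-up column that holds both a NaN and an empty string. The Series s and the variable names are illustrative. The last line combines dropna() with the empty-string filter, which goes slightly beyond the one-liners above, because str.join would raise a TypeError if a NaN reached it.

```python
import numpy as np
import pandas as pd

# A made-up column containing a missing value and an empty string.
s = pd.Series(["toronto", np.nan, "", "texas"], dtype=object)

by_dropna = ", ".join(s.dropna())                            # NaN removed, '' survives
by_type = ", ".join([y for y in s if isinstance(y, str)])    # NaN removed, '' survives
by_nonempty = ", ".join([y for y in s.dropna() if y != ""])  # NaN and '' removed

print(repr(by_dropna))
print(repr(by_type))
print(repr(by_nonempty))
```

Which variant fits depends on how the blanks are stored in the real data, so it is worth checking with df.isna() first.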

f2 flattens the lists: each group holds a sequence of lists such as [['a','b'], ['c']], which has to be merged into a single list:

f2 = lambda x: [z for y in x for z in y]
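
The flattening step can be tried on its own in plain Python. This is only a sketch: nested stands in for the lists of one group, and the itertools variant is an equivalent standard-library alternative, not something the answer uses.

```python
from itertools import chain

# Stand-in for the lists collected in one group.
nested = [["a", "b"], ["c"]]

# Same comprehension as f2: the outer loop walks the lists, the inner loop their items.
flat = [z for y in nested for z in y]

# Equivalent alternative from the standard library.
flat_chain = list(chain.from_iterable(nested))

print(flat)        # ['a', 'b', 'c']
print(flat_chain)  # ['a', 'b', 'c']
```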

edited Mar 1 at 13:12

answered Mar 1 at 12:15 by jezrael

You could group by value_1 and aggregate the columns containing strings with the following function:

def str_cat(x):
    # concatenate the strings of one group, separated by ', '
    return x.str.cat(sep=', ')
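
As a quick check of what str_cat does with blanks, here is a sketch on a made-up Series (s and joined are illustrative names). When Series.str.cat is called without others and without na_rep, missing values are left out of the result.

```python
import numpy as np
import pandas as pd

def str_cat(x):
    # Concatenate the strings of one group, separated by ', '.
    return x.str.cat(sep=", ")

# Made-up group with one missing value.
s = pd.Series(["toronto", np.nan, "texas"], dtype=object)

joined = str_cat(s)
print(joined)
```

Empty strings are not treated as missing, which is why they need to be converted to NaN before grouping.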

And use GroupBy.sum to concatenate the lists in the list column:

import numpy as np
# NaN (not None) as the replacement value: str.cat then skips the blanks
df.replace('', np.nan).groupby('value_1').agg({'list': 'sum', 'value_2': str_cat,
                                               'value_3': str_cat})

                                      list                 value_2                value_3
value_1
american  [supermarket, connivence, state]  california, nyc, texas         walmart, kmart
canadian             [coffee, supermarket]                 toronto  dunkinDonuts, walmart
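
A self-contained run of this approach could look like the sketch below. The sample frame is a reconstruction of the question's table and assumes the blank cells are empty strings. It also uses np.nan rather than None as the replacement value, because passing None to replace can fill the blank with the previous row's value in some pandas versions instead of marking it as missing.

```python
import numpy as np
import pandas as pd

def str_cat(x):
    # Concatenate the strings of one group; NaN entries are skipped.
    return x.str.cat(sep=", ")

# Assumed reconstruction of the question's data; blanks are empty strings.
df = pd.DataFrame({
    "value_1": ["american", "canadian", "american", "canadian"],
    "value_2": ["california, nyc", "toronto", "texas", ""],
    "value_3": ["walmart, kmart", "dunkinDonuts", "", "walmart"],
    "list": [["supermarket", "connivence"], ["coffee"], ["state"], ["supermarket"]],
})

out = (
    df.replace("", np.nan)   # empty strings become NaN
      .groupby("value_1")
      .agg({"list": "sum", "value_2": str_cat, "value_3": str_cat})
)
print(out)
```

Summing works here because adding two Python lists concatenates them; for large groups a flattening function may be faster than repeated list addition.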

edited Mar 3 at 15:53

answered Mar 1 at 12:14 by yatu
