Groupby and append lists and strings




























I am trying to group by the values in my "value_1" column, but my last column is made up of lists. When I group by "value_1", the column of lists disappears.



DataFrame:

value_1   value_2          value_3         list
american  california, nyc  walmart, kmart  [supermarket, connivence]
canadian  toronto          dunkinDonuts    [coffee]
american  texas                            [state]
canadian                   walmart         [supermarket]
...       ...              ...             ...
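To make the example reproducible, the sample data can be rebuilt as a DataFrame. This is a minimal sketch that assumes the blank cells (value_3 in the texas row, value_2 in the second canadian row) are missing values (NaN); if they are empty strings in the real data, use '' instead of np.nan.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
# Assumption: the blank cells are NaN rather than empty strings.
df = pd.DataFrame({
    'value_1': ['american', 'canadian', 'american', 'canadian'],
    'value_2': ['california, nyc', 'toronto', 'texas', np.nan],
    'value_3': ['walmart, kmart', 'dunkinDonuts', np.nan, 'walmart'],
    # each cell of this column holds a Python list
    'list': [['supermarket', 'connivence'], ['coffee'], ['state'], ['supermarket']],
})
print(df)
```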


My expected output is:



value_1   value_2                 value_3                list
american  california, nyc, texas  walmart, kmart         [supermarket, connivence, state]
canadian  toronto                 dunkinDonuts, walmart  [coffee, supermarket]


Thanks!










      python pandas














asked Mar 1 at 12:07 by user11076352 (edited Mar 3 at 16:02 by yatu)





























          2 Answers















Build the aggregation dictionary dynamically: every column except value_1 and list is joined as a string, and list gets a lambda with a nested list comprehension that flattens the lists:



# f1 joins the non-missing strings of each group with ', '
f1 = lambda x: ', '.join(x.dropna())
# alternative that joins only str values:
#f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
# f2 flattens each group's lists into a single list
f2 = lambda x: [z for y in x for z in y]
# every column except value_1 and list is aggregated with f1
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2

df = df.groupby('value_1', as_index=False).agg(d)
print(df)
    value_1                 value_2                value_3                              list
0  american  california, nyc, texas         walmart, kmart  [supermarket, connivence, state]
1  canadian                 toronto  dunkinDonuts, walmart             [coffee, supermarket]
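A self-contained version of the steps above, handy for checking the result end to end. It rebuilds the question's sample data under the assumption that the blank cells are NaN (which is what f1's dropna() expects); the name out is introduced here only so the input df is not overwritten.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells assumed to be NaN
df = pd.DataFrame({
    'value_1': ['american', 'canadian', 'american', 'canadian'],
    'value_2': ['california, nyc', 'toronto', 'texas', np.nan],
    'value_3': ['walmart, kmart', 'dunkinDonuts', np.nan, 'walmart'],
    'list': [['supermarket', 'connivence'], ['coffee'], ['state'], ['supermarket']],
})

# f1: join the non-missing strings of one group with ', '
f1 = lambda x: ', '.join(x.dropna())
# f2: flatten one group's lists, e.g. [['a', 'b'], ['c']] -> ['a', 'b', 'c']
f2 = lambda x: [z for y in x for z in y]

# Map every column except the key and the list column to f1, then add f2
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2

out = df.groupby('value_1', as_index=False).agg(d)
print(out)
```

Because the dictionary is built from df.columns, any further string columns are picked up automatically without editing the agg call.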


Explanation:

f1 and f2 are lambda functions.

f1 first removes missing values (if there are any), then joins the remaining strings with the separator:

f1 = lambda x: ', '.join(x.dropna())

Alternatively, keep only string values (this also skips missing values, because NaN is a float) and join them with the separator:

f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])

If missing values are stored as empty strings instead of NaN, filter out the empty strings and join the rest:

f1 = lambda x: ', '.join([y for y in x if y != ''])
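The three f1 variants behave differently when a group contains both NaN and empty strings. The sketch below runs each one on a small hypothetical Series; the last variant, which combines the isinstance check with the empty-string check, is a suggested addition rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical group: two real strings, one NaN and one empty string
s = pd.Series(['walmart', np.nan, '', 'kmart'])

# Variant 1: dropna() removes NaN, but '' survives and leaves an empty slot
v1 = ', '.join(s.dropna())

# Variant 2: NaN is a float, so the isinstance check skips it; '' is a str and stays
v2 = ', '.join([y for y in s if isinstance(y, str)])

# Variant 3: removes '' but not NaN, so join() raises TypeError when NaN is present
try:
    v3 = ', '.join([y for y in s if y != ''])
except TypeError:
    v3 = None

# Combined check (an addition): keep only non-empty str values
v4 = ', '.join([y for y in s if isinstance(y, str) and y != ''])

print(repr(v1), repr(v2), repr(v3), repr(v4))
```

So the third variant is only safe when the gaps are guaranteed to be empty strings; the combined check handles both kinds of gap.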


f2 flattens the lists: after grouping, each group of the list column is a collection of lists such as [['a','b'], ['c']], and the nested comprehension turns it into a single list ['a', 'b', 'c']:

f2 = lambda x: [z for y in x for z in y]
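The nested comprehension in f2 reads as "for each list y in the group, for each item z in y". A quick check on plain Python lists, with itertools.chain.from_iterable from the standard library shown as an equivalent alternative (f2 = lambda x: list(chain.from_iterable(x)) should behave the same inside agg):

```python
from itertools import chain

# What one group of the list column looks like after grouping
nested = [['a', 'b'], ['c']]

# f2's nested comprehension: outer loop over the lists, inner loop over their items
f2 = lambda x: [z for y in x for z in y]
flat = f2(nested)

# Standard-library equivalent
flat_chain = list(chain.from_iterable(nested))

print(flat, flat_chain)  # both are ['a', 'b', 'c']
```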














answered Mar 1 at 12:15 by jezrael (edited Mar 1 at 13:12)






































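You could group by value_1 and aggregate the string columns with the following function:

def str_cat(x):
    return x.str.cat(sep=', ')

Then use GroupBy.sum (the 'sum' string in agg) to concatenate the lists in the list column, since adding Python lists joins them. Empty strings are replaced with NaN first, because str.cat skips missing values but not empty strings:

import numpy as np

# np.nan rather than None: in older pandas versions, replace('', None) forward-fills instead of inserting None
df.replace('', np.nan).groupby('value_1').agg(
    {'list': 'sum', 'value_2': str_cat, 'value_3': str_cat})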

                                      list                 value_2                value_3
value_1
american  [supermarket, connivence, state]  california, nyc, texas         walmart, kmart
canadian             [coffee, supermarket]                 toronto  dunkinDonuts, walmart
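A self-contained sketch of this approach, assuming the gaps in the data are empty strings (which is what the replace('', ...) call implies). To avoid depending on how a given pandas version handles 'sum' on an object column of lists, it swaps in a small hypothetical helper, list_cat, that concatenates the lists with Python's built-in sum(x, []); the string columns use str_cat as above.

```python
import numpy as np
import pandas as pd

# Sample data; assumption: the gaps are empty strings, as replace('', ...) implies
df = pd.DataFrame({
    'value_1': ['american', 'canadian', 'american', 'canadian'],
    'value_2': ['california, nyc', 'toronto', 'texas', ''],
    'value_3': ['walmart, kmart', 'dunkinDonuts', '', 'walmart'],
    'list': [['supermarket', 'connivence'], ['coffee'], ['state'], ['supermarket']],
})

def str_cat(x):
    # With no `others`, Series.str.cat joins all elements and omits NaN
    return x.str.cat(sep=', ')

def list_cat(x):
    # Hypothetical helper: concatenate one group's lists with the built-in sum
    return sum(x, [])

out = (df.replace('', np.nan)  # turn gaps into NaN so str_cat skips them
         .groupby('value_1')
         .agg({'list': list_cat, 'value_2': str_cat, 'value_3': str_cat}))
print(out)
```

sum(x, []) copies the accumulated list at every step, so for groups with many rows the flattening comprehension used in the other answer scales better.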














answered Mar 1 at 12:14 by yatu (edited Mar 3 at 15:53)


























