A pythonic and uFunc-y way to turn pandas column into “increasing” index? [duplicate]

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP












7















This question already has an answer here:



  • Count each group sequentially pandas

    3 answers



  • Add numeric column to pandas dataframe based on other textual column

    5 answers



Let's say I have a pandas df like so:



Index A B
0 foo 3
1 foo 2
2 foo 5
3 bar 3
4 bar 4
5 baz 5


What's a good fast way to add a column like so:



Index A B Aidx
0 foo 3 0
1 foo 2 0
2 foo 5 0
3 bar 3 1
4 bar 4 1
5 baz 5 2


I.e. adding an increasing index for each unique value?



I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










share|improve this question













marked as duplicate by coldspeed pandas
Users with the  pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Dec 14 at 3:54


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















    7















    This question already has an answer here:



    • Count each group sequentially pandas

      3 answers



    • Add numeric column to pandas dataframe based on other textual column

      5 answers



    Let's say I have a pandas df like so:



    Index A B
    0 foo 3
    1 foo 2
    2 foo 5
    3 bar 3
    4 bar 4
    5 baz 5


    What's a good fast way to add a column like so:



    Index A B Aidx
    0 foo 3 0
    1 foo 2 0
    2 foo 5 0
    3 bar 3 1
    4 bar 4 1
    5 baz 5 2


    I.e. adding an increasing index for each unique value?



    I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










    share|improve this question













    marked as duplicate by coldspeed pandas
    Users with the  pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

    StackExchange.ready(function()
    if (StackExchange.options.isMobile) return;

    $('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
    var $hover = $(this).addClass('hover-bound'),
    $msg = $hover.siblings('.dupe-hammer-message');

    $hover.hover(
    function()
    $hover.showInfoMessage('',
    messageElement: $msg.clone().show(),
    transient: false,
    position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
    dismissable: false,
    relativeToBody: true
    );
    ,
    function()
    StackExchange.helpers.removeMessages();

    );
    );
    );
    Dec 14 at 3:54


    This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

















      7












      7








      7


      2






      This question already has an answer here:



      • Count each group sequentially pandas

        3 answers



      • Add numeric column to pandas dataframe based on other textual column

        5 answers



      Let's say I have a pandas df like so:



      Index A B
      0 foo 3
      1 foo 2
      2 foo 5
      3 bar 3
      4 bar 4
      5 baz 5


      What's a good fast way to add a column like so:



      Index A B Aidx
      0 foo 3 0
      1 foo 2 0
      2 foo 5 0
      3 bar 3 1
      4 bar 4 1
      5 baz 5 2


      I.e. adding an increasing index for each unique value?



      I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










      share|improve this question














      This question already has an answer here:



      • Count each group sequentially pandas

        3 answers



      • Add numeric column to pandas dataframe based on other textual column

        5 answers



      Let's say I have a pandas df like so:



      Index A B
      0 foo 3
      1 foo 2
      2 foo 5
      3 bar 3
      4 bar 4
      5 baz 5


      What's a good fast way to add a column like so:



      Index A B Aidx
      0 foo 3 0
      1 foo 2 0
      2 foo 5 0
      3 bar 3 1
      4 bar 4 1
      5 baz 5 2


      I.e. adding an increasing index for each unique value?



      I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?





      This question already has an answer here:



      • Count each group sequentially pandas

        3 answers



      • Add numeric column to pandas dataframe based on other textual column

        5 answers







      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Dec 14 at 0:48









      Lagerbaer

      2,6351124




      2,6351124




      marked as duplicate by coldspeed pandas
      Users with the  pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

      StackExchange.ready(function()
      if (StackExchange.options.isMobile) return;

      $('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
      var $hover = $(this).addClass('hover-bound'),
      $msg = $hover.siblings('.dupe-hammer-message');

      $hover.hover(
      function()
      $hover.showInfoMessage('',
      messageElement: $msg.clone().show(),
      transient: false,
      position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
      dismissable: false,
      relativeToBody: true
      );
      ,
      function()
      StackExchange.helpers.removeMessages();

      );
      );
      );
      Dec 14 at 3:54


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.






      marked as duplicate by coldspeed pandas
      Users with the  pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

      StackExchange.ready(function()
      if (StackExchange.options.isMobile) return;

      $('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
      var $hover = $(this).addClass('hover-bound'),
      $msg = $hover.siblings('.dupe-hammer-message');

      $hover.hover(
      function()
      $hover.showInfoMessage('',
      messageElement: $msg.clone().show(),
      transient: false,
      position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
      dismissable: false,
      relativeToBody: true
      );
      ,
      function()
      StackExchange.helpers.removeMessages();

      );
      );
      );
      Dec 14 at 3:54


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
























          3 Answers
          3






          active

          oldest

          votes


















          7














          No need groupby using




          Method 1factorize



          pd.factorize(df.A)[0]
          array([0, 0, 0, 1, 1, 2], dtype=int64)
          #df['Aidx']=pd.factorize(df.A)[0]



          Method 2 sklearn



          from sklearn import preprocessing
          le = preprocessing.LabelEncoder()
          le.fit(df.A)
          LabelEncoder()
          le.transform(df.A)
          array([2, 2, 2, 0, 0, 1])



          Method 3 cat.codes



          df.A.astype('category').cat.codes



          Method 4 map + unique



          l=df.A.unique()
          df.A.map(dict(zip(l,range(len(l)))))
          0 0
          1 0
          2 0
          3 1
          4 1
          5 2
          Name: A, dtype: int64



          Method 5 np.unique



          x,y=np.unique(df.A.values,return_inverse=True)
          y
          array([2, 2, 2, 0, 0, 1], dtype=int64)


          EDIT: Some timings with OP's dataframe



          '''



          %timeit pd.factorize(view.Company)[0]

          The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
          10000 loops, best of 3: 155 µs per loop

          %timeit view.Company.astype('category').cat.codes

          The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
          1000 loops, best of 3: 449 µs per loop

          from itertools import izip

          %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

          1000 loops, best of 3: 666 µs per loop

          import numpy as np

          %timeit np.unique(view.Company.values, return_inverse=True)

          The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
          10000 loops, best of 3: 32.7 µs per loop


          Seems like numpy wins.






          share|improve this answer






















          • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
            – Vaishali
            Dec 14 at 2:11











          • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
            – W-B
            Dec 14 at 2:23


















          7














          One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



          df['Aidx'] = df.groupby('A',sort=False).ngroup()
          >>> df
          Index A B Aidx
          0 0 foo 3 0
          1 1 foo 2 0
          2 2 foo 5 0
          3 3 bar 3 1
          4 4 bar 4 1
          5 5 baz 5 2





          share|improve this answer






























            5














            One more method of doing so could be.



            df['C'] = i.ne(df.A.shift()).cumsum()-1
            df


            When we print df value it will be as follows.



             Index A B C
            0 0 foo 3 0
            1 1 foo 2 0
            2 2 foo 5 0
            3 3 bar 3 1
            4 4 bar 4 1
            5 5 baz 5 2


            Explanation of solution: Let's break above solution into parts for understanding purposes.



            1st step: Compare df's A column by shifting its value down to itself as follows.



            i.ne(df.A.shift())


            Output we will get is:



            0 True
            1 False
            2 False
            3 True
            4 False
            5 True


            2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



            i.ne(df.A.shift()).cumsum()-1
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int32


            3rd step: Save command's value into df['C'] which will create a new column named C in df.






            share|improve this answer


















            • 1




              Nice method ve++ for you
              – W-B
              Dec 14 at 1:32










            • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
              – RavinderSingh13
              Dec 14 at 1:34


















            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            7














            No need groupby using




            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]



            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])



            Method 3 cat.codes



            df.A.astype('category').cat.codes



            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64



            Method 5 np.unique



            x,y=np.unique(df.A.values,return_inverse=True)
            y
            array([2, 2, 2, 0, 0, 1], dtype=int64)


            EDIT: Some timings with OP's dataframe



            '''



            %timeit pd.factorize(view.Company)[0]

            The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 155 µs per loop

            %timeit view.Company.astype('category').cat.codes

            The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
            1000 loops, best of 3: 449 µs per loop

            from itertools import izip

            %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

            1000 loops, best of 3: 666 µs per loop

            import numpy as np

            %timeit np.unique(view.Company.values, return_inverse=True)

            The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 32.7 µs per loop


            Seems like numpy wins.






            share|improve this answer






















            • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
              – Vaishali
              Dec 14 at 2:11











            • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
              – W-B
              Dec 14 at 2:23















            7














            No need groupby using




            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]



            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])



            Method 3 cat.codes



            df.A.astype('category').cat.codes



            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64



            Method 5 np.unique



            x,y=np.unique(df.A.values,return_inverse=True)
            y
            array([2, 2, 2, 0, 0, 1], dtype=int64)


            EDIT: Some timings with OP's dataframe



            '''



            %timeit pd.factorize(view.Company)[0]

            The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 155 µs per loop

            %timeit view.Company.astype('category').cat.codes

            The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
            1000 loops, best of 3: 449 µs per loop

            from itertools import izip

            %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

            1000 loops, best of 3: 666 µs per loop

            import numpy as np

            %timeit np.unique(view.Company.values, return_inverse=True)

            The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 32.7 µs per loop


            Seems like numpy wins.






            share|improve this answer






















            • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
              – Vaishali
              Dec 14 at 2:11











            • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
              – W-B
              Dec 14 at 2:23













            7












            7








            7






            No need groupby using




            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]



            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])



            Method 3 cat.codes



            df.A.astype('category').cat.codes



            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64



            Method 5 np.unique



            x,y=np.unique(df.A.values,return_inverse=True)
            y
            array([2, 2, 2, 0, 0, 1], dtype=int64)


            EDIT: Some timings with OP's dataframe



            '''



            %timeit pd.factorize(view.Company)[0]

            The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 155 µs per loop

            %timeit view.Company.astype('category').cat.codes

            The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
            1000 loops, best of 3: 449 µs per loop

            from itertools import izip

            %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

            1000 loops, best of 3: 666 µs per loop

            import numpy as np

            %timeit np.unique(view.Company.values, return_inverse=True)

            The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 32.7 µs per loop


            Seems like numpy wins.






            share|improve this answer














            No need groupby using




            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]



            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])



            Method 3 cat.codes



            df.A.astype('category').cat.codes



            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64



            Method 5 np.unique



            x,y=np.unique(df.A.values,return_inverse=True)
            y
            array([2, 2, 2, 0, 0, 1], dtype=int64)


            EDIT: Some timings with OP's dataframe



            '''



            %timeit pd.factorize(view.Company)[0]

            The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 155 µs per loop

            %timeit view.Company.astype('category').cat.codes

            The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
            1000 loops, best of 3: 449 µs per loop

            from itertools import izip

            %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

            1000 loops, best of 3: 666 µs per loop

            import numpy as np

            %timeit np.unique(view.Company.values, return_inverse=True)

            The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 32.7 µs per loop


            Seems like numpy wins.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Dec 14 at 17:14









            Lagerbaer

            2,6351124




            2,6351124










            answered Dec 14 at 1:30









            W-B

            99.8k73163




            99.8k73163











            • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
              – Vaishali
              Dec 14 at 2:11











            • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
              – W-B
              Dec 14 at 2:23
















            • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
              – Vaishali
              Dec 14 at 2:11











            • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
              – W-B
              Dec 14 at 2:23















            Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
            – Vaishali
            Dec 14 at 2:11





            Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
            – Vaishali
            Dec 14 at 2:11













            @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
            – W-B
            Dec 14 at 2:23




            @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
            – W-B
            Dec 14 at 2:23













            7














            One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



            df['Aidx'] = df.groupby('A',sort=False).ngroup()
            >>> df
            Index A B Aidx
            0 0 foo 3 0
            1 1 foo 2 0
            2 2 foo 5 0
            3 3 bar 3 1
            4 4 bar 4 1
            5 5 baz 5 2





            share|improve this answer



























              7














              One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



              df['Aidx'] = df.groupby('A',sort=False).ngroup()
              >>> df
              Index A B Aidx
              0 0 foo 3 0
              1 1 foo 2 0
              2 2 foo 5 0
              3 3 bar 3 1
              4 4 bar 4 1
              5 5 baz 5 2





              share|improve this answer

























                7












                7








                7






                One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



                df['Aidx'] = df.groupby('A',sort=False).ngroup()
                >>> df
                Index A B Aidx
                0 0 foo 3 0
                1 1 foo 2 0
                2 2 foo 5 0
                3 3 bar 3 1
                4 4 bar 4 1
                5 5 baz 5 2





                share|improve this answer














                One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



                df['Aidx'] = df.groupby('A',sort=False).ngroup()
                >>> df
                Index A B Aidx
                0 0 foo 3 0
                1 1 foo 2 0
                2 2 foo 5 0
                3 3 bar 3 1
                4 4 bar 4 1
                5 5 baz 5 2






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Dec 14 at 1:49

























                answered Dec 14 at 0:50









                sacul

                29.9k41740




                29.9k41740





















                    5














                    One more method of doing so could be.



                    df['C'] = i.ne(df.A.shift()).cumsum()-1
                    df


                    When we print df value it will be as follows.



                     Index A B C
                    0 0 foo 3 0
                    1 1 foo 2 0
                    2 2 foo 5 0
                    3 3 bar 3 1
                    4 4 bar 4 1
                    5 5 baz 5 2


                    Explanation of solution: Let's break above solution into parts for understanding purposes.



                    1st step: Compare df's A column by shifting its value down to itself as follows.



                    i.ne(df.A.shift())


                    Output we will get is:



                    0 True
                    1 False
                    2 False
                    3 True
                    4 False
                    5 True


                    2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                    i.ne(df.A.shift()).cumsum()-1
                    0 0
                    1 0
                    2 0
                    3 1
                    4 1
                    5 2
                    Name: A, dtype: int32


                    3rd step: Save command's value into df['C'] which will create a new column named C in df.






                    share|improve this answer


















                    • 1




                      Nice method ve++ for you
                      – W-B
                      Dec 14 at 1:32










                    • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                      – RavinderSingh13
                      Dec 14 at 1:34
















                    5














                    One more method of doing so could be.



                    df['C'] = i.ne(df.A.shift()).cumsum()-1
                    df


                    When we print df value it will be as follows.



                     Index A B C
                    0 0 foo 3 0
                    1 1 foo 2 0
                    2 2 foo 5 0
                    3 3 bar 3 1
                    4 4 bar 4 1
                    5 5 baz 5 2


                    Explanation of solution: Let's break above solution into parts for understanding purposes.



                    1st step: Compare df's A column by shifting its value down to itself as follows.



                    i.ne(df.A.shift())


                    Output we will get is:



                    0 True
                    1 False
                    2 False
                    3 True
                    4 False
                    5 True


                    2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                    i.ne(df.A.shift()).cumsum()-1
                    0 0
                    1 0
                    2 0
                    3 1
                    4 1
                    5 2
                    Name: A, dtype: int32


                    3rd step: Save command's value into df['C'] which will create a new column named C in df.






                    share|improve this answer


















                    • 1




                      Nice method ve++ for you
                      – W-B
                      Dec 14 at 1:32










                    • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                      – RavinderSingh13
                      Dec 14 at 1:34














                    5












                    5








                    5






                    One more method of doing so could be.



                    df['C'] = i.ne(df.A.shift()).cumsum()-1
                    df


                    When we print df value it will be as follows.



                     Index A B C
                    0 0 foo 3 0
                    1 1 foo 2 0
                    2 2 foo 5 0
                    3 3 bar 3 1
                    4 4 bar 4 1
                    5 5 baz 5 2


                    Explanation of solution: Let's break above solution into parts for understanding purposes.



                    1st step: Compare df's A column by shifting its value down to itself as follows.



                    i.ne(df.A.shift())


                    Output we will get is:



                    0 True
                    1 False
                    2 False
                    3 True
                    4 False
                    5 True


                    2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                    i.ne(df.A.shift()).cumsum()-1
                    0 0
                    1 0
                    2 0
                    3 1
                    4 1
                    5 2
                    Name: A, dtype: int32


                    3rd step: Save command's value into df['C'] which will create a new column named C in df.






                    share|improve this answer














                    One more method of doing so could be.



                    df['C'] = i.ne(df.A.shift()).cumsum()-1
                    df


                    When we print df value it will be as follows.



                     Index A B C
                    0 0 foo 3 0
                    1 1 foo 2 0
                    2 2 foo 5 0
                    3 3 bar 3 1
                    4 4 bar 4 1
                    5 5 baz 5 2


                    Explanation of solution: Let's break above solution into parts for understanding purposes.



                    1st step: Compare df's A column by shifting its value down to itself as follows.



                    i.ne(df.A.shift())


                    Output we will get is:



                    0 True
                    1 False
                    2 False
                    3 True
                    4 False
                    5 True


                    2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                    i.ne(df.A.shift()).cumsum()-1
                    0 0
                    1 0
                    2 0
                    3 1
                    4 1
                    5 2
                    Name: A, dtype: int32


                    3rd step: Save command's value into df['C'] which will create a new column named C in df.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Dec 14 at 1:45

























                    answered Dec 14 at 1:26









                    RavinderSingh13

                    25.5k41438




                    25.5k41438







                    • 1




                      Nice method ve++ for you
                      – W-B
                      Dec 14 at 1:32










                    • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                      – RavinderSingh13
                      Dec 14 at 1:34













                    • 1




                      Nice method ve++ for you
                      – W-B
                      Dec 14 at 1:32










                    • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                      – RavinderSingh13
                      Dec 14 at 1:34








                    1




                    1




                    Nice method ve++ for you
                    – W-B
                    Dec 14 at 1:32




                    Nice method ve++ for you
                    – W-B
                    Dec 14 at 1:32












                    @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                    – RavinderSingh13
                    Dec 14 at 1:34





                    @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                    – RavinderSingh13
                    Dec 14 at 1:34



                    Popular posts from this blog

                    How to check contact read email or not when send email to Individual?

                    Displaying single band from multi-band raster using QGIS

                    How many registers does an x86_64 CPU actually have?