Pandas dataframe get value of last nonzero column

I have a pandas dataframe which contains 3 columns, each containing a site that a user has visited during a session.



In some cases, a user may not have visited 3 sites in a single session. A 0 denotes that no site was visited in that slot.



import pandas as pd

df = pd.DataFrame(data=[[5, 8, 1], [8, 0, 0], [1, 17, 0]],
                  columns=['site1', 'site2', 'site3'])
print(df)

   site1  site2  site3
0      5      8      1
1      8      0      0
2      1     17      0


In the example above, user 0 has visited sites 5, 8 and 1. User 1 has visited site 8 only, and user 2 has visited sites 1 and 17.



I would like to create a new column, last_site, which shows the last site visited by the user in that session.



The result I want is this:



   site1  site2  site3  last_site
0      5      8      1          1
1      8      0      0          8
2      1     17      0         17


How can I do this in a concise way using pandas?










      python pandas dataframe






      asked 1 hour ago









      kskyriacou

          3 Answers
Accepted answer







Replace the 0 values with NaN, forward-fill the resulting missing values across each row, and then select the last column with iloc:



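import numpy as np

# Treat 0 as "no site": replace it with NaN, forward-fill across each row,
# so the last column ends up holding the last visited site
df['last'] = df.replace(0, np.nan).ffill(axis=1).iloc[:, -1].astype(int)
print(df)
   site1  site2  site3  last
0      5      8      1     1
1      8      0      0     8
2      1     17      0    17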
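A self-contained sketch of the forward-fill approach, using the question's data plus one extra all-zero row (an assumption added for illustration). For a row with no site, the forward fill leaves NaN and astype(int) would fail, so this version fills such rows with 0 first; that default is an assumption, not part of the answer above.

```python
import numpy as np
import pandas as pd

# The question's data plus an assumed session with no visits at all
df = pd.DataFrame(data=[[5, 8, 1], [8, 0, 0], [1, 17, 0], [0, 0, 0]],
                  columns=['site1', 'site2', 'site3'])

# 0 means "no site visited": turn it into NaN so ffill can carry real sites rightwards
filled = df.replace(0, np.nan).ffill(axis=1)

# The last column now holds the last real site; an all-zero row is still NaN,
# so fill it with 0 (assumed convention) before converting back to int
df['last_site'] = filled.iloc[:, -1].fillna(0).astype(int)
print(df)
```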


If performance is important, you can use NumPy instead:



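import numpy as np

a = df.values  # assumes df holds only the site columns
m = a != 0

# argmax on the reversed mask finds the last nonzero entry in each row;
# convert that position back to a left-to-right column index
df['last'] = a[np.arange(m.shape[0]), m.shape[1] - m[:, ::-1].argmax(1) - 1]
print(df)
   site1  site2  site3  last
0      5      8      1     1
1      8      0      0     8
2      1     17      0    17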
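The same argmax trick can be wrapped in a small helper, sketched here as last_nonzero (a hypothetical name). Because it looks for the last True in the mask rather than the first zero, it also copes with a zero in the middle of a row; the extra row in the sample data is an assumption added to show that. Rows with no nonzero value return 0 via np.where, another assumed convention.

```python
import numpy as np
import pandas as pd

def last_nonzero(a):
    """Return the last nonzero value in each row of a 2-D array (0 if none)."""
    m = a != 0
    # argmax on the reversed mask gives the distance of the last True from the right end
    idx = a.shape[1] - 1 - m[:, ::-1].argmax(axis=1)
    out = a[np.arange(a.shape[0]), idx]
    # For an all-zero row argmax returns 0, so make the 0 result explicit
    return np.where(m.any(axis=1), out, 0)

# The question's data plus an assumed row with a zero before a visited site
df = pd.DataFrame(data=[[5, 8, 1], [8, 0, 0], [1, 17, 0], [0, 3, 0]],
                  columns=['site1', 'site2', 'site3'])
df['last_site'] = last_nonzero(df[['site1', 'site2', 'site3']].to_numpy())
print(df)
```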














            edited 52 mins ago

























            answered 1 hour ago









            jezrael

• This forward filling logic here is excellent across the rows :) +1
              – pygo
              1 hour ago










            • Yes forward filling across rows is out of the box thinking
              – Vishnudev
              58 mins ago











            • Great and Instant logic indeed :-) .
              – pygo
              58 mins ago













            Code:



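# Series.nonzero() was deprecated and later removed from pandas,
# so select each row's nonzero entries with a boolean mask instead
df['last_site'] = df.apply(lambda x: x[x != 0].iloc[-1], axis=1)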


            Output:



   site1  site2  site3  last_site
0      5      8      1          1
1      8      0      0          8
2      1     17      0         17
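The row-wise apply raises IndexError when a row contains no nonzero value, because iloc[-1] is then called on an empty Series. A sketch with a named function and an assumed default of 0 for such rows; last_visited is a hypothetical helper. Row-wise apply calls Python once per row, so the vectorized answers are likely the better choice on large frames.

```python
import pandas as pd

def last_visited(row, default=0):
    # Keep only the visited sites (nonzero entries) of this row
    visited = row[row != 0]
    # iloc[-1] on an empty Series would raise IndexError, so fall back to a default
    return visited.iloc[-1] if len(visited) else default

# The question's data plus an assumed session with no visits
df = pd.DataFrame(data=[[5, 8, 1], [8, 0, 0], [1, 17, 0], [0, 0, 0]],
                  columns=['site1', 'site2', 'site3'])
df['last_site'] = df.apply(last_visited, axis=1)
print(df)
```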
















            answered 1 hour ago









            Vishnudev












            • Good one @Vishnudev +1 !
              – pygo
              58 mins ago
















                mask + ffill



                A "pure Pandas" solution:



df['last'] = df.mask(df.eq(0)).ffill(axis=1).iloc[:, -1].astype(int)


                numba



For efficiency over a large number of rows or columns, numba can help. To see why this works better than argmax, see "Efficiently return the index of the first value satisfying condition in array".



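from numba import njit

@njit
def get_last_val(A):
    # Assumes zeros only appear as trailing padding after the visited sites
    m, n = A.shape
    # Copy, so writing into res does not modify A (and therefore df) in place
    res = A[:, -1].copy()
    for i in range(m):
        for j in range(n):
            if A[i, j] == 0:
                # First zero found: the previous column holds the last visited site
                res[i] = A[i, max(0, j-1)]
                break
    return res

df['last'] = get_last_val(df.values)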
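The loop above stops at the first zero, so it relies on zeros only appearing as trailing padding, as in the question; a row such as [0, 3, 0] would give 0. A minimal sketch of a variant that scans each row from the right instead, under the same "nonzero means visited" convention. The name last_val_from_right is hypothetical, and the @njit decorator is left off so the sketch runs without numba installed; it uses only plain loops over array indices, the style numba is designed to compile.

```python
import numpy as np

def last_val_from_right(A):
    """Last nonzero value per row, scanning each row right to left (0 if none)."""
    m, n = A.shape
    res = np.zeros(m, dtype=A.dtype)
    for i in range(m):
        # Walk backwards; the first nonzero seen is the last visited site
        for j in range(n - 1, -1, -1):
            if A[i, j] != 0:
                res[i] = A[i, j]
                break
    return res

# The question's data plus two assumed rows: an interior zero and no visits at all
A = np.array([[5, 8, 1], [8, 0, 0], [1, 17, 0], [0, 3, 0], [0, 0, 0]])
print(last_val_from_right(A))
```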















                answered 4 mins ago









                jpp
