Pandas dataframe get value of last nonzero column

I have a pandas dataframe which contains 3 columns, each containing a site that a user has visited during a session.

In some cases, a user may not have visited 3 sites in a single session. This is shown by a 0, denoting that no site was visited.

import pandas as pd

df = pd.DataFrame(data=[[5, 8, 1], [8, 0, 0], [1, 17, 0]],
                  columns=['site1', 'site2', 'site3'])
print(df)

   site1  site2  site3
0      5      8      1
1      8      0      0
2      1     17      0

In the example above, user 0 has visited sites 5, 8 and 1. User 1 has visited site 8 only, and user 2 has visited sites 1 and 17.

I would like to create a new column, last_site, which shows the last site visited by the user in that session.

The result I want is this:

   site1  site2  site3  last_site
0      5      8      1          1
1      8      0      0          8
2      1     17      0         17

How can I do this in a concise way using pandas?

asked 1 hour ago

kskyriacou

python pandas dataframe


3 Answers

accepted

Replace the 0 values with missing values, forward fill them along each row, and then select the last column with iloc:

import numpy as np

df['last'] = df.replace(0, np.nan).ffill(axis=1).iloc[:, -1].astype(int)
print(df)
   site1  site2  site3  last
0      5      8      1     1
1      8      0      0     8
2      1     17      0    17
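To see why the one-liner works, it can help to split it into its three steps. The sketch below rebuilds the question's sample frame; the intermediate names `masked`, `filled` and `last_site` are only illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[5, 8, 1], [8, 0, 0], [1, 17, 0]],
                  columns=['site1', 'site2', 'site3'])

# Step 1: turn the 0 placeholders into missing values (NaN).
masked = df.replace(0, np.nan)

# Step 2: along each row, fill every NaN with the nearest value to its left.
filled = masked.ffill(axis=1)

# Step 3: the last column now holds each row's last nonzero value.
last_site = filled.iloc[:, -1].astype(int)
print(last_site.tolist())  # [1, 8, 17]
```

Note that the final `astype(int)` assumes every row has at least one nonzero value; a row of all zeros would stay NaN and the cast would fail.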

If performance is important, it is possible to use NumPy instead:

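a = df.values
m = a != 0  # True where a site was visited

# Reverse each row so argmax finds the last True, then map back to a column index
df['last'] = a[np.arange(m.shape[0]), m.shape[1] - m[:, ::-1].argmax(1) - 1]
print(df)
   site1  site2  site3  last
0      5      8      1     1
1      8      0      0     8
2      1     17      0    17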
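The indexing expression is dense, so here is the same idea as a step-by-step sketch on a bare array. The names `first_true_reversed`, `last_col` and `last_vals` are illustrative and not part of the answer.

```python
import numpy as np

a = np.array([[5, 8, 1], [8, 0, 0], [1, 17, 0]])
m = a != 0  # boolean mask of visited sites

# argmax returns the position of the first True, so reverse each row
# to make the last nonzero entry come first.
first_true_reversed = m[:, ::-1].argmax(axis=1)

# Convert the position in the reversed row back to a normal column index.
last_col = m.shape[1] - 1 - first_true_reversed

# Pick one value per row using paired row and column indices.
last_vals = a[np.arange(a.shape[0]), last_col]
print(last_col.tolist())   # [2, 0, 1]
print(last_vals.tolist())  # [1, 8, 17]
```

For a row that is all zeros, argmax over an all-False mask returns 0, so the last column is selected and the result for that row is 0.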

edited 52 mins ago

answered 1 hour ago

jezrael


This forward filling logic here is excellent across the rows :) +1
– pygo
1 hour ago

Yes forward filling across rows is out of the box thinking
– Vishnudev
58 mins ago

Great and Instant logic indeed :-) .
– pygo
58 mins ago


Code:

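# Series.nonzero() was removed in pandas 1.0; call nonzero() on the NumPy array instead
df['last_site'] = df.apply(lambda x: x.iloc[x.to_numpy().nonzero()[0]].iloc[-1], axis=1)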

Output:

   site1  site2  site3  last_site
0      5      8      1          1
1      8      0      0          8
2      1     17      0         17
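The row-wise approach raises an IndexError for a session with no visited sites, because there is no last element to take. The question does not say whether such rows can occur, so the following is only a sketch of one way to guard against them. The helper `last_nonzero` and its `default` parameter are hypothetical names introduced here, and the third sample row is changed to all zeros to show the fallback.

```python
import pandas as pd

def last_nonzero(row, default=0):
    """Return the last nonzero value in a row, or `default` if there is none."""
    nonzero = row[row != 0]
    return nonzero.iloc[-1] if len(nonzero) else default

df = pd.DataFrame([[5, 8, 1], [8, 0, 0], [0, 0, 0]],
                  columns=['site1', 'site2', 'site3'])

# apply with axis=1 passes each row to the helper as a Series.
df['last_site'] = df.apply(last_nonzero, axis=1)
print(df['last_site'].tolist())  # [1, 8, 0]
```

Like any `apply` over rows, this runs a Python function per row, so it is likely to be slower than the vectorised answers on large frames.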

answered 1 hour ago

Vishnudev


Good one @Vishnudev +1 !
– pygo
58 mins ago


`mask` + `ffill`

A "pure Pandas" solution:

df['last'] = df.mask(df.eq(0)).ffill(axis=1).iloc[:, -1].astype(int)
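One caveat with the final cast: if a row contains only zeros, the forward fill leaves NaN in the last column and `astype(int)` raises an error. A minimal sketch of one workaround, assuming 0 is an acceptable result for "no site visited" (that choice is an assumption, not something the question specifies):

```python
import pandas as pd

# Same layout as the question, but the third session has no visits at all.
df = pd.DataFrame([[5, 8, 1], [8, 0, 0], [0, 0, 0]],
                  columns=['site1', 'site2', 'site3'])

# Mask zeros as NaN and forward fill along each row, as in the answer.
last = df.mask(df.eq(0)).ffill(axis=1).iloc[:, -1]

# Rows with no nonzero value are still NaN; map them back to 0 before casting.
df['last'] = last.fillna(0).astype(int)
print(df['last'].tolist())  # [1, 8, 0]
```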

`numba`

For efficiency over a large number of rows / columns, numba can help. To see why this works better than argmax, see Efficiently return the index of the first value satisfying condition in array.

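from numba import njit

@njit
def get_last_val(A):
    m, n = A.shape
    # Copy so that writing to res does not overwrite the last column of A
    res = A[:, -1].copy()
    for i in range(m):
        for j in range(n):
            # Assumes zeros only appear after the visited sites in each row
            if A[i, j] == 0:
                res[i] = A[i, max(0, j-1)]
                break
    return res

df['last'] = get_last_val(df.values)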
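The loop above stops at the first zero in a row, so it relies on zeros appearing only after the visited sites. If a zero could sit between two visited sites, scanning each row from the right avoids that assumption. The sketch below is plain Python and NumPy with a hypothetical function name, `last_nonzero_per_row`. It uses only simple loops and array indexing, so decorating it with `njit` may well work, but that is an assumption and has not been checked here.

```python
import numpy as np

def last_nonzero_per_row(A):
    """Return the last nonzero value of each row, or 0 for an all-zero row."""
    m, n = A.shape
    res = np.zeros(m, dtype=A.dtype)
    for i in range(m):
        # Walk from the last column towards the first and stop at the first hit.
        for j in range(n - 1, -1, -1):
            if A[i, j] != 0:
                res[i] = A[i, j]
                break
    return res

A = np.array([[5, 8, 1], [8, 0, 0], [1, 17, 0], [3, 0, 4], [0, 0, 0]])
print(last_nonzero_per_row(A).tolist())  # [1, 8, 17, 4, 0]
```

Because the result is written into a freshly allocated array, the input array is left unchanged.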

answered 4 mins ago

jpp
