Only copy one key-column into merged DataFrame
Consider the following DataFrames:
    import pandas as pd
    df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
    df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})
In this instance, df1['b'] and df2['c'] are the key columns. So when merging:
    df1.merge(df2, left_on='b', right_on='c')
       a  b  c     d
    0  0  a  a  Alex
    1  1  b  b  Alex
    2  2  c  c  Alex
    3  3  d  d  Alex
I end up with both key columns in the resultant DataFrame when I only need one. I've been using:
    df1.merge(df2, left_on='b', right_on='c').drop('c', axis='columns')
Is there a way to only keep one key column?
python pandas merge
asked 1 hour ago
Alex
5 Answers
One way is to set b and c as the index of their respective frames, then use join followed by reset_index:
    df1.set_index('b').join(df2.set_index('c')).reset_index()
       b  a     d
    0  a  0  Alex
    1  b  1  Alex
    2  c  2  Alex
    3  d  3  Alex
This will be faster than the merge/drop method on large dataframes, mostly because drop is slow. @Bill's method is faster than my suggestion, and @W-B's and @piRSquared's methods easily outpace the other suggestions:
    import timeit
    import pandas as pd
    # Repeat the toy frames 1000 times to get larger inputs
    df1 = pd.concat((df1 for _ in range(1000)))
    df2 = pd.concat((df2 for _ in range(1000)))
    def index_method(df1=df1, df2=df2):
        return df1.set_index('b').join(df2.set_index('c')).reset_index()
    def merge_method(df1=df1, df2=df2):
        return df1.merge(df2, left_on='b', right_on='c').drop('c', axis='columns')
    def rename_method(df1=df1, df2=df2):
        return df1.rename({'b': 'c'}, axis=1).merge(df2)
    def index_method2(df1=df1, df2=df2):
        return df1.join(df2.set_index('c'), on='b')
    def assign_method(df1=df1, df2=df2):
        return df1.set_index('b').assign(c=df2.set_index('c').d).reset_index()
    def map_method(df1=df1, df2=df2):
        return df1.assign(d=df1.b.map(dict(df2.values)))
    >>> timeit.timeit(index_method, number=10) / 10
    0.7853091600998596
    >>> timeit.timeit(merge_method, number=10) / 10
    1.1696729859002517
    >>> timeit.timeit(rename_method, number=10) / 10
    0.4291436871004407
    >>> timeit.timeit(index_method2, number=10) / 10
    0.5037374985004135
    >>> timeit.timeit(assign_method, number=10) / 10
    0.0038641377999738325
    >>> timeit.timeit(map_method, number=10) / 10
    0.006620216699957382
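One caveat when reading these timings: repeating the frames with pd.concat also repeats every key, so the merge and join variants perform a many-to-many match and return far more rows than the assign and map variants. The sketch below checks the row counts for two of the methods, assuming the question's toy frames and a smaller, hypothetical repeat factor n = 50; the names big1, big2, merged and mapped are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})

n = 50  # hypothetical repeat factor, smaller than the 1000 used above
big1 = pd.concat([df1] * n, ignore_index=True)
big2 = pd.concat([df2] * n, ignore_index=True)

# Each of the 4 keys now occurs n times on both sides, so an inner merge
# pairs every left occurrence with every right occurrence.
merged = big1.merge(big2, left_on='b', right_on='c').drop('c', axis='columns')

# map does a plain lookup per row, so the row count of big1 is preserved.
mapped = big1.assign(d=big1['b'].map(dict(big2.values)))

print(len(merged))  # 4 * n * n = 10000
print(len(mapped))  # 4 * n = 200
```

With unique keys in df2, both approaches return one row per row of df1, so a like-for-like timing would probably need inputs built that way.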
edited 38 mins ago
answered 1 hour ago
sacul
3
df1.join(df2.set_index('c'), on='b') – piRSquared, 46 mins ago
2
Would you like to test the speed of my method? – W-B, 42 mins ago
2
@W-B, I just did, it's far faster! – sacul, 40 mins ago
@sacul thank you :-) – W-B, 38 mins ago
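The one-liner from the first comment can be tried on the question's toy frames. A small sketch, assuming the key column c of df2 holds unique values; joined is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})

# join matches df1['b'] against the index of the other frame, so only df2
# needs set_index; 'b' stays an ordinary column of df1.
joined = df1.join(df2.set_index('c'), on='b')

print(joined.columns.tolist())  # ['a', 'b', 'd']
```

Note that join defaults to a left join, so rows of df1 without a match in df2 would be kept with NaN in d, whereas merge defaults to an inner join.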
Another way is to give b and c the same name, at least for the merge operation:
    df1.rename({'b': 'c'}, axis=1).merge(df2)
       a  c     d
    0  0  a  Alex
    1  1  b  Alex
    2  2  c  Alex
    3  3  d  Alex
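If the key should keep its name from df1 (b) rather than take the name used in df2 (c), the same idea can be applied to the right-hand frame instead. A minimal sketch, assuming the question's toy frames; the variable name result is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})

# Rename the right-hand key to match the left-hand one, then merge on it.
# rename returns a new frame, so df2 itself is left unchanged.
result = df1.merge(df2.rename(columns={'c': 'b'}), on='b')

print(result.columns.tolist())  # ['a', 'b', 'd']
```

Passing on='b' explicitly also guards against the merge silently picking up other columns that happen to share a name in both frames.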
answered 56 mins ago
Bill
Or use a single set_index together with the left_index=True and right_on parameters:
    df1.set_index('b').merge(df2, left_index=True, right_on='c')
Output:
       a  c     d
    0  0  a  Alex
    1  1  b  Alex
    2  2  c  Alex
    3  3  d  Alex
answered 50 mins ago
Scott Boston
After set_index, you can assign the values directly:
    df1.set_index('b').assign(c=df2.set_index('c').d).reset_index()
Out[233]:
       b  a     c
    0  a  0  Alex
    1  b  1  Alex
    2  c  2  Alex
    3  d  3  Alex
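Note that in this result the new column is called c although it holds the values of df2['d']. A variant that keeps the name d is sketched below, assuming the keys in df2['c'] are unique (assign aligns the Series on the index); out is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})

# Index both sides by their key, attach df2's 'd' values under the name 'd',
# then turn the key back into a regular column.
out = df1.set_index('b').assign(d=df2.set_index('c')['d']).reset_index()

print(out.columns.tolist())  # ['b', 'a', 'd']
```

Using df2.set_index('c')['d'] rather than attribute access is a small safety choice, since attribute access breaks for column names that collide with DataFrame methods.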
answered 43 mins ago
W-B
map
An obnoxious (not recommended) method that I was compelled to put down because I accidentally posted a duplicate of someone else's answer.
    df1.assign(d=df1.b.map(dict(df2.values)))
       a  b     d
    0  0  a  Alex
    1  1  b  Alex
    2  2  c  Alex
    3  3  d  Alex
edited 42 mins ago
answered 49 mins ago
piRSquared
Wait, why not use map in this case, where only one column is being brought over? – ALollz, 27 mins ago
1
Because it isn't generalized. It's very specific to this toy problem. If we truly were bringing over one column, then I'd agree. – piRSquared, 26 mins ago
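To illustrate the point about generality: dict(df2.values) only works while df2 has exactly two columns. The sketch below shows a slightly more general lookup, assuming a hypothetical extra column e in df2 and unique keys in c; lookup, result and dict_worked are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
# Hypothetical third column 'e' to show where dict(df2.values) stops working
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex', 'e': [10, 20, 30, 40]})

try:
    dict(df2.values)  # rows now have three elements, not (key, value) pairs
    dict_worked = True
except ValueError:
    dict_worked = False

# A Series indexed by the key works as a lookup table regardless of
# how many other columns df2 has.
lookup = df2.set_index('c')['d']
result = df1.assign(d=df1['b'].map(lookup))

print(dict_worked)              # False
print(result.columns.tolist())  # ['a', 'b', 'd']
```

This still brings over a single column at a time; for several columns, one of the merge or join answers is likely the better fit.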