Understanding the FutureWarning on using join_axes when concatenating with Pandas - python

I have two DataFrames:
df1:
A B C
1 A1 B1 C1
2 A2 B2 C2
df2:
B C D
3 B3 C3 D3
4 B4 C4 D4
Both DataFrames share columns B and C.
I'd like to concatenate them vertically and keep the columns of the first DataFrame:
pd.concat([df1, df2], join_axes=[df1.columns]):
A B C
1 A1 B1 C1
2 A2 B2 C2
3 NaN B3 C3
4 NaN B4 C4
This works, but raises a
FutureWarning: The join_axes-keyword is deprecated. Use .reindex or .reindex_like on the result to achieve the same functionality.
I couldn't find (either in the documentation or through Google) how to "Use .reindex or .reindex_like on the result to achieve the same functionality".
Colab notebook illustrating issue: https://colab.research.google.com/drive/13EBq2z0Nh05JY7ovrdnLGtfeqdKVvZq0

As the warning suggests, reindex df2 to df1's columns before concatenating:
pd.concat([df1,df2.reindex(columns=df1.columns)])
Out[286]:
A B C
1 A1 B1 C1
2 A2 B2 C2
3 NaN B3 C3
4 NaN B4 C4

df1 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2'], 'C': ['C1', 'C2']})
df2 = pd.DataFrame({'B': ['B3', 'B4'], 'C': ['C3', 'C4'], 'D': ['D1', 'D2']})
pd.concat([df1, df2], sort=False)[df1.columns]
yields the desired result.

Alternatively, reindex the concatenated result along the columns:
pd.concat([df1, df2], sort=False).reindex(df1.columns, axis=1)
Output:
A B C
1 A1 B1 C1
2 A2 B2 C2
3 NaN B3 C3
4 NaN B4 C4
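
For reference, here is a minimal self-contained sketch comparing the two reindex-based approaches from the answers above. The index values (1 to 4) and the D values are taken from the question's tables; the variable names `before` and `after` are only illustrative. This is one way to reproduce the old join_axes behaviour, assuming a pandas version where that keyword is deprecated or no longer accepted.

```python
import pandas as pd

# Data as shown in the question (index labels 1-4 taken from its tables).
df1 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2'], 'C': ['C1', 'C2']},
                   index=[1, 2])
df2 = pd.DataFrame({'B': ['B3', 'B4'], 'C': ['C3', 'C4'], 'D': ['D3', 'D4']},
                   index=[3, 4])

# Option 1: align df2 to df1's columns first, then concatenate.
before = pd.concat([df1, df2.reindex(columns=df1.columns)])

# Option 2: concatenate everything, then keep only df1's columns.
after = pd.concat([df1, df2], sort=False).reindex(columns=df1.columns)

print(before)
print(after)
```

Both variants drop column D and fill column A with NaN for the rows that come from df2. Option 1 avoids building the wider intermediate frame.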

Related

How to transform tables using pandas

---I have a CSV dataset---
import pandas as pd
df = pd.DataFrame({'A':['a','a','a','a1','a1','a1','a1','a1','a1'], 'B':['b','b','b','b1','b1','b1','b1','b1','b1'], 'C':['c','c','c','c1','c1','c1','c1','c1','c1'], 'D':['d','d1','d2','d3','d4','d5','d6','d7','d8'], 'Rank':[1,2,3,1,2,3,4,5,6]})
---I want to transform it into the following table---
pd.DataFrame({'A':['a','a1'], 'B':['b','b1'], 'C':['c','c1'], '1':['d','d3'], '2':['d1','d4'], '3':['d2','d5'], '4':['NaN','d6'], '5':['NaN','d7'], '6':['NaN','d8'], '7':['NaN','NaN']})
---I tried pivot_table but didn't get what I want---
pd.pivot_table(df, values=['D'], index=['A','B','C'], columns='Rank').reset_index()
You have to use pivot, not pivot_table in this case:
df.pivot(index=['A', 'B', 'C'], columns='Rank', values='D').reset_index()
Output:
Rank A B C 1 2 3 4 5 6
0 a b c d d1 d2 NaN NaN NaN
1 a1 b1 c1 d3 d4 d5 d6 d7 d8
pivot_table aggregates values that share the same index/column combination, whereas pivot only reshapes the data, which is what you want here.
To remove axis name:
df.pivot(index=['A', 'B', 'C'], columns='Rank', values='D').reset_index().rename_axis(columns=None)
Output:
A B C 1 2 3 4 5 6
0 a b c d d1 d2 NaN NaN NaN
1 a1 b1 c1 d3 d4 d5 d6 d7 d8
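
If pivot_table is preferred anyway, one possible workaround (a sketch, not part of the original answer) is to pass an aggregation that works on strings, such as aggfunc='first'. This assumes each (A, B, C, Rank) combination occurs at most once, as it does in the sample data, so "first" simply picks the only value. The name `wide` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a'] * 3 + ['a1'] * 6,
    'B': ['b'] * 3 + ['b1'] * 6,
    'C': ['c'] * 3 + ['c1'] * 6,
    'D': ['d', 'd1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8'],
    'Rank': [1, 2, 3, 1, 2, 3, 4, 5, 6],
})

# aggfunc='first' keeps the single string value per cell instead of
# trying to compute a mean, which is the pivot_table default.
wide = (
    pd.pivot_table(df, values='D', index=['A', 'B', 'C'],
                   columns='Rank', aggfunc='first')
    .reset_index()
    .rename_axis(columns=None)  # drop the "Rank" label on the column axis
)
print(wide)
```

If duplicates can occur, 'first' silently discards all but one value per cell, so pivot (which raises on duplicates) is the safer choice.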

Merging two DataFrames horizontally without reindexing the first

I want to stack two DataFrames horizontally without re-indexing the first DataFrame (df1), because its indices contain important information. The indices of the second DataFrame (df2), however, have no significance and can be modified.
I could not find a way to do this without converting df2 to a NumPy array and passing the indices of df1 when re-creating it. The example below illustrates the problem.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 2, 3,4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D2': ['D4', 'D5', 'D6', 'D7']},
index=[ 4, 5, 6 ,7])
print(df1)
print(df2)
A B D
-------------
0 A0 B0 D0
2 A1 B1 D1
3 A2 B2 D2
4 A3 B3 D3
A1 C D2
-------------
4 A4 C4 D4
5 A5 C5 D5
6 A6 C6 D6
7 A7 C7 D7
Result I want:
A B D A1 C D2
--------------------------
0 A0 B0 D0 A4 C4 D4
2 A1 B1 D1 A5 C5 D5
3 A2 B2 D2 A6 C6 D6
4 A3 B3 D3 A7 C7 D7
PS: I would prefer a "one-shot" command to achieve this instead of using loops and adding each value.
Change the index of df2 to the index of df1 and then concatenate the DataFrames:
df2.index = df1.index
pd.concat([df1, df2], axis=1)
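
The assignment above modifies df2 in place. A possible one-expression variant, sketched here with the question's data, uses DataFrame.set_axis to relabel a copy of df2 instead. It assumes both frames have the same number of rows; the name `result` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])

# set_axis returns a relabelled copy, so df2 itself keeps its own index.
result = pd.concat([df1, df2.set_axis(df1.index, axis=0)], axis=1)
print(result)
```

Rows are paired by position here, not by label, which matches the requested output.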

Concat two dataframes with common columns [duplicate]

I have two dataframes with same columns. Only one column has different values. I want to concatenate the two without duplication.
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],'cat': ['C0', 'C1', 'C2'],'B': ['B0', 'B1', 'B2']})
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],'cat': ['C0', 'C1', 'C2'],'B': ['A0', 'A1', 'A2']})
df1
Out[630]:
key cat B
0 K0 C0 A0
1 K1 C1 A1
2 K2 C2 A2
df2
Out[631]:
key cat B
0 K0 C0 B0
1 K1 C1 B1
2 K2 C2 B2
I tried:
result = pd.concat([df1, df2], axis=1)
result
Out[633]:
key cat B key cat B
0 K0 C0 A0 K0 C0 B0
1 K1 C1 A1 K1 C1 B1
2 K2 C2 A2 K2 C2 B2
The desired output:
key cat B_df1 B_df2
0 K0 C0 A0 B0
1 K1 C1 A1 B1
2 K2 C2 A2 B2
NOTE: I could drop duplicates afterwards and rename columns but that doesn't seem efficient
pd.merge will do the job
pd.merge(df1,df2, on=['key','cat'])
Output
key cat B_x B_y
0 K0 C0 A0 B0
1 K1 C1 A1 B1
2 K2 C2 A2 B2
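
To get the column names from the desired output (B_df1 and B_df2) rather than the default B_x and B_y, the suffixes parameter of pd.merge can be used. A short sketch with the question's data; the name `result` is only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                    'cat': ['C0', 'C1', 'C2'],
                    'B': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                    'cat': ['C0', 'C1', 'C2'],
                    'B': ['B0', 'B1', 'B2']})

# suffixes are appended to overlapping non-key columns from the
# left and right frames respectively.
result = pd.merge(df1, df2, on=['key', 'cat'], suffixes=('_df1', '_df2'))
print(result)
```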

Combine pandas dataframes eliminating common columns with python

I have 3 dataframes:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\
'B': ['B0', 'B1', 'B2', 'B3'],\
'C': ['C0', 'C1', 'C2', 'C3'],\
'D': ['D0', 'D1', 'D2', 'D3']},\
index=[0,1,2,3])
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\
'E': ['E0', 'E1', 'E2', 'E3']},\
index=[0,1,2,3])
df3 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\
'F': ['F0', 'F1', 'F2', 'F3']},\
index=[0,1,2,3])
I want to combine them together to get the following results:
A B C D E F
0 A0 B0 C0 D0 E0 F0
1 A1 B1 C1 D1 E1 F1
2 A2 B2 C2 D2 E2 F2
3 A3 B3 C3 D3 E3 F3
When I try to combine them, I keep getting:
A B C D A E A F
0 A0 B0 C0 D0 A0 E0 A0 F0
1 A1 B1 C1 D1 A1 E1 A1 F1
2 A2 B2 C2 D2 A2 E2 A2 F2
3 A3 B3 C3 D3 A3 E3 A3 F3
The common column (A) is duplicated once for each dataframe used in the concat call. I have tried various combinations on:
df4 = pd.concat([df1, df2, df3], axis=1, sort=False)
Some variations have been disastrous while some keep giving the undesired result. Any suggestions would be much appreciated. Thanks.
Try setting A as the index of each DataFrame before concatenating, then turning it back into a column:
df4 = (pd.concat((df.set_index('A') for df in (df1,df2,df3)), axis=1)
.reset_index()
)
Output:
A B C D E F
0 A0 B0 C0 D0 E0 F0
1 A1 B1 C1 D1 E1 F1
2 A2 B2 C2 D2 E2 F2
3 A3 B3 C3 D3 E3 F3
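
Another option, sketched here as an alternative to the concat approach, is to merge the frames pairwise on the common column with functools.reduce. Note that pd.merge defaults to an inner join, so only values of A present in every frame are kept; with the sample data that is all four rows.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'E': ['E0', 'E1', 'E2', 'E3']})
df3 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'F': ['F0', 'F1', 'F2', 'F3']})

# Fold the list of frames into one by merging each onto the running result.
df4 = reduce(lambda left, right: pd.merge(left, right, on='A'), [df1, df2, df3])
print(df4)
```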

Pandas str alphabetically then numerically

This is probably a simple question, but I couldn't find the answer. In a pandas DataFrame like the one below, how can the values be sorted first alphabetically and then numerically?
START:
import pandas as pd
d ={'col1': ['A1','B2','A10','A7','C4','C2','C22','B4']}
df = pd.DataFrame(data=d)
df
col1
0 A1
1 A7
2 A10
3 B2
4 B4
5 C2
6 C4
7 C22
WHAT I WANT TO GET:
col1
0 A1
1 A7
2 A10
3 B2
4 B4
5 C2
6 C4
7 C22
WHAT I GET:
>>>df.sort_values(by='col1')
col1
0 A1
2 A10
1 A7
3 B2
4 B4
5 C2
7 C22
6 C4
Using pandas just to sort a list is overkill:
lot_file = pd.DataFrame()
lot_file['SPOOL'] = ['A39','B34','A3','B37','A6','B18','A48','B15','A47']
group_lots = lot_file.sort_values(by=['SPOOL'])
group_lots['SPOOL'].tolist()
Output:
['A3', 'A39', 'A47', 'A48', 'A6', 'B15', 'B18', 'B34', 'B37']
Or use sorted
spool_list = ['A39','B34','A3','B37','A6','B18','A48','B15','A47']
sorted(spool_list)
Output:
['A3', 'A39', 'A47', 'A48', 'A6', 'B15', 'B18', 'B34', 'B37']
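
Both outputs above are sorted lexicographically, so 'A39' still comes before 'A6'. To sort by the letter part first and then by the numeric part, one possible approach is to split each value into two helper columns and sort on those. This is a sketch that assumes every value consists of letters followed by digits; the helper column names `prefix` and `number` and the result name `ordered` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A1', 'B2', 'A10', 'A7', 'C4', 'C2', 'C22', 'B4']})

# Split each label into its leading letters and trailing digits.
parts = df['col1'].str.extract(r'^([A-Za-z]+)(\d+)$')

ordered = (
    df.assign(prefix=parts[0], number=parts[1].astype(int))
      .sort_values(['prefix', 'number'])       # letters first, then numbers
      .drop(columns=['prefix', 'number'])      # remove the helper columns
      .reset_index(drop=True)
)
print(ordered['col1'].tolist())
# ['A1', 'A7', 'A10', 'B2', 'B4', 'C2', 'C4', 'C22']
```

Values that do not match the pattern would produce NaN in the helper columns and make the integer conversion fail, so the input should be validated first if that can happen.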
