I have a dataframe made up of groups of 3 rows, like:
group value1 value2 value3
1 A1 A2 A3
1 B1 B2 B3
1 C1 C2 C3
2 D1 D2 D3
2 E1 E2 E3
2 F1 F2 F3
...
I'd like to re-order the cells within each group according to a fixed rule based on their 'positions', and repeat the same operation over all groups.
The 'fixed' rule works like this:
Input:
group value1 value2 value3
1 position1 position2 position3
1 position4 position5 position6
1 position7 position8 position9
Output:
group value1 value2 value3
1 position1 position8 position6
1 position4 position2 position9
1 position7 position5 position3
Eventually the dataframe should look like this (if that makes sense):
group value1 value2 value3
1 A1 C2 B3
1 B1 A2 C3
1 C1 B2 A3
2 D1 F2 E3
2 E1 D2 F3
2 F1 E2 D3
...
I know how to re-order them if the dataframe only has one group: basically, create a temporary variable to store the values, get each cell with .loc, and overwrite each cell with the desired value.
However, even for a single group of 3 rows, this is a clumsy and tedious approach.
My question is: can we
find a general operation to rearrange cells by their relative position within a group, and
repeat this operation over all groups?
Here is a proposal which uses numpy indexing with reshaping on each group.
Setup:
Let's assume your original df and the position dataframe are as below:
import re

import numpy as np
import pandas as pd

d = {'group': [1, 1, 1, 2, 2, 2],
     'value1': ['A1', 'B1', 'C1', 'D1', 'E1', 'F1'],
     'value2': ['A2', 'B2', 'C2', 'D2', 'E2', 'F2'],
     'value3': ['A3', 'B3', 'C3', 'D3', 'E3', 'F3']}

out_d = {'group': [1, 1, 1, 2, 2, 2],
         'value1': ['position1', 'position4', 'position7',
                    'position1', 'position4', 'position7'],
         'value2': ['position8', 'position2', 'position5',
                    'position8', 'position2', 'position5'],
         'value3': ['position6', 'position9', 'position3',
                    'position6', 'position9', 'position3']}
df = pd.DataFrame(d)
out = pd.DataFrame(out_d)
print("Original dataframe :\n\n",df,"\n\n Position dataframe :\n\n",out)
Original dataframe :
group value1 value2 value3
0 1 A1 A2 A3
1 1 B1 B2 B3
2 1 C1 C2 C3
3 2 D1 D2 D3
4 2 E1 E2 E3
5 2 F1 F2 F3
Position dataframe :
group value1 value2 value3
0 1 position1 position8 position6
1 1 position4 position2 position9
2 1 position7 position5 position3
3 2 position1 position8 position6
4 2 position4 position2 position9
5 2 position7 position5 position3
Working Solution:
Method 1: Create a function and use it with df.groupby(...).apply
# Remove letters, extract only the position numbers, and subtract 1
# since Python indexing starts at 0.
o = out.applymap(lambda x: int(''.join(re.findall(r'\d+', x))) - 1 if type(x) == str else x)

# Attach the position columns to the original dataframe with a '_pos' suffix.
df1 = df.join(o.drop(columns='group').add_suffix('_pos'))

# Build a function which rearranges each group based on the position columns:
def fun(x):
    # Drop the grouping column if pandas passes it into apply (version-dependent).
    x = x.drop(columns='group', errors='ignore')
    c = x.columns.str.contains('_pos')
    return pd.DataFrame(np.ravel(x.loc[:, ~c])[np.ravel(x.loc[:, c])]
                        .reshape(x.loc[:, ~c].shape),
                        columns=x.columns[~c])

output = (df1.groupby('group').apply(fun)
             .reset_index('group').reset_index(drop=True))
print(output)
group value1 value2 value3
0 1 A1 C2 B3
1 1 B1 A2 C3
2 1 C1 B2 A3
3 2 D1 F2 E3
4 2 E1 D2 F3
5 2 F1 E2 D3
Method 2: Iterate through each group and re-arrange:
o = out.applymap(lambda x: int(''.join(re.findall(r'\d+', x))) - 1 if type(x) == str else x)
df1 = (df.join(o.drop(columns='group').add_suffix('_pos'))
         .set_index('group'))
idx = df1.index.unique()

l = []
for i in idx:
    v = df1.loc[i]
    c = v.columns.str.contains('_pos')
    l.append(np.ravel(v.loc[:, ~c])[np.ravel(v.loc[:, c])].reshape(v.loc[:, ~c].shape))

final = pd.DataFrame(np.concatenate(l), index=df1.index,
                     columns=df1.columns[~c]).reset_index()
print(final)
group value1 value2 value3
0 1 A1 C2 B3
1 1 B1 A2 C3
2 1 C1 B2 A3
3 2 D1 F2 E3
4 2 E1 D2 F3
5 2 F1 E2 D3
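If the rule really is the same fixed 3x3 permutation for every group, a shorter variant can permute the whole value block with plain numpy reshaping and skip the position dataframe entirely. This is only a sketch, assuming each group has exactly 3 contiguous rows and the value columns are value1-value3:
import numpy as np

# 0-based flat positions of the source cells, row by row, for the target layout.
perm = np.array([0, 7, 5,
                 3, 1, 8,
                 6, 4, 2])

value_cols = ['value1', 'value2', 'value3']
vals = df[value_cols].to_numpy().reshape(-1, 9)   # one row per group of 3
df[value_cols] = vals[:, perm].reshape(-1, 3)     # apply the permutation, restore the shape
Note this mutates df in place; work on a copy if the original order is still needed.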
Related
Given df1 and df2:
df1 = pd.DataFrame({
    'Key': ['k1', 'k1', 'k1', 'k2', 'k3'],
    'Num': [1, 2, 3, 1, 2],
    'A': ['a1', 'a2', 'a3', 'a4', 'a5']
})
display(df1)

df2 = pd.DataFrame({
    'Key': ['k1', 'k1', 'k2', 'k3'],
    'Num': [1, 2, 1, 1],
    'X': ['x1', 'x2', 'x3', 'x4']
})
display(df2)
display(df2)
df1:
Key Num A
0 k1 1 a1
1 k1 2 a2
2 k1 3 a3
3 k2 1 a4
4 k3 2 a5
df2:
Key Num X
0 k1 1 x1
1 k1 2 x2
2 k2 1 x3
3 k3 1 x4
Expected Output:
Key Num A X
0 k1 1 a1 x1
1 k1 2 a2 x2
2 k1 3 a3 x1
3 k2 1 a4 x3
4 k3 2 a5 x4
I would like to merge df2 into df1 on columns 'Key' and 'Num', such that when Num doesn't match, the value from df2 with the same Key and Num == 1 is used instead, if available.
IIUC, you can merge, then fillna with a second lookup (done here as a map):
s = df1['Key'].map(df2.drop_duplicates('Key').set_index('Key')['X'])
df3 = (df1
       .merge(df2, on=['Key', 'Num'], how='left')
       .fillna({'X': s})
      )
output:
Key Num A X
0 k1 1 a1 x1
1 k1 2 a2 x2
2 k1 3 a3 x1
3 k2 1 a4 x3
4 k3 2 a5 x4
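drop_duplicates('Key') happens to keep the Num == 1 rows here because they come first within each Key; if that ordering is not guaranteed, an explicit filter is safer. A small variant of the same idea:
fallback = df1['Key'].map(df2.loc[df2['Num'] == 1].set_index('Key')['X'])
df3 = (df1
       .merge(df2, on=['Key', 'Num'], how='left')
       .fillna({'X': fallback})
      )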
Right-merge df2 onto df1, then merge that result with the Num == 1 rows of df2 on Key alone.
Fill in the missing values in X_x with the X_y values.
Drop excess columns and restore naming:
df3 = df2.merge(df1, how='right').merge(df2[df2.Num==1], on='Key')
df3['X_x'] = df3[['X_x', 'X_y']].bfill(axis=1)['X_x']
df3.drop(['Num_y', 'X_y'], axis=1, inplace=True)
df3.columns = ['Key', 'Num', 'X', 'A']
display(df3)
Output:
Key Num X A
0 k1 1 x1 a1
1 k1 2 x2 a2
2 k1 3 x1 a3
3 k2 1 x3 a4
4 k3 2 x4 a5
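As an aside, with only two columns involved, the bfill step above can also be written as a plain fillna between the suffixed columns (an equivalent one-liner, used in place of the bfill line):
df3['X_x'] = df3['X_x'].fillna(df3['X_y'])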
I have a text file with my data, formatted as a single list. The data actually represents a number of rows and columns, but it is stored as a single column. I have imported this into a pandas dataframe, and I would like to reshape it.
This is the format of the data as a list:
a1
b1
c1
d1
e1
a2
b2
c2
d2
e2
a3
b3
c3
d3
e3
etc...
The desired format is:
"Heading 1" "Heading 2" "Heading 3" "Heading 4" "Heading 5"
a1 b1 c1 d1 e1
a2 b2 c2 d2 e2
a3 b3 c3 d3 e3
I have tried the pandas stack and unstack functions, but with no luck. I also tried using a numpy array, but my data contains both numbers and strings, so that does not work well.
You can create a list of tuples first and pass it to the DataFrame constructor:
import itertools

import pandas as pd

L = ['a1', 1, 'c1', 'd1', 'e1', 'a2', 2, 'c2', 'd2', 'e2', 'a3', 3, 'c3', 'd3', 'e3']

# https://stackoverflow.com/a/1625013
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

print (list(grouper(5, L)))
[('a1', 1, 'c1', 'd1', 'e1'), ('a2', 2, 'c2', 'd2', 'e2'), ('a3', 3, 'c3', 'd3', 'e3')]
df = pd.DataFrame(list(grouper(5, L))).rename(columns = lambda x: f'Heading {x + 1}')
print (df)
Heading 1 Heading 2 Heading 3 Heading 4 Heading 5
0 a1 1 c1 d1 e1
1 a2 2 c2 d2 e2
2 a3 3 c3 d3 e3
print (df.dtypes)
Heading 1 object
Heading 2 int64
Heading 3 object
Heading 4 object
Heading 5 object
dtype: object
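Since the data originally comes from a text file with one value per line, the same grouper can be fed straight from the file. A sketch, where 'data.txt' is a placeholder filename; everything read this way comes back as a string, so numeric columns need converting afterwards:
# 'data.txt' is a placeholder for the real file, one value per line.
with open('data.txt') as f:
    raw = [line.strip() for line in f if line.strip()]

df = pd.DataFrame(list(grouper(5, raw))).rename(columns=lambda x: f'Heading {x + 1}')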
My first idea was reshape, but then it is necessary to convert the numeric column back afterwards:
import numpy as np

df = pd.DataFrame(np.array(L).reshape(-1, 5)).rename(columns = lambda x: f'Heading {x + 1}')
print (df)
Heading 1 Heading 2 Heading 3 Heading 4 Heading 5
0 a1 1 c1 d1 e1
1 a2 2 c2 d2 e2
2 a3 3 c3 d3 e3
print (df.dtypes)
Heading 1 object
Heading 2 object <- converted to object
Heading 3 object
Heading 4 object
Heading 5 object
dtype: object
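To restore the numeric dtype after the reshape (a small sketch, assuming 'Heading 2' is the only column holding numbers):
df['Heading 2'] = pd.to_numeric(df['Heading 2'])
print(df.dtypes)   # Heading 2 is int64 again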
I have two dataframes with the same columns. Only one column has different values. I want to concatenate the two without duplicating the shared columns.
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],'cat': ['C0', 'C1', 'C2'],'B': ['B0', 'B1', 'B2']})
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],'cat': ['C0', 'C1', 'C2'],'B': ['A0', 'A1', 'A2']})
df1
Out[630]:
key cat B
0 K0 C0 A0
1 K1 C1 A1
2 K2 C2 A2
df2
Out[631]:
key cat B
0 K0 C0 B0
1 K1 C1 B1
2 K2 C2 B2
I tried:
result = pd.concat([df1, df2], axis=1)
result
Out[633]:
key cat B key cat B
0 K0 C0 A0 K0 C0 B0
1 K1 C1 A1 K1 C1 B1
2 K2 C2 A2 K2 C2 B2
The desired output:
key cat B_df1 B_df2
0 K0 C0 A0 B0
1 K1 C1 A1 B1
2 K2 C2 A2 B2
NOTE: I could drop the duplicated columns afterwards and rename, but that doesn't seem efficient.
pd.merge will do the job
pd.merge(df1,df2, on=['key','cat'])
Output
key cat B_x B_y
0 K0 C0 A0 B0
1 K1 C1 A1 B1
2 K2 C2 A2 B2
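If the exact B_df1 / B_df2 names from the desired output are wanted, merge's suffixes parameter handles the renaming (a small variant of the same call):
result = pd.merge(df1, df2, on=['key', 'cat'], suffixes=('_df1', '_df2'))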
I have a dataframe that looks like this:
df = pd.DataFrame({'key': ['K0', 'K0', 'K0', 'K1'],'cat': ['C0', 'C0', 'C1', 'C1'],'B': ['A0', 'A1', 'A2', 'A3']})
df
Out[15]:
key cat B
0 K0 C0 A0
1 K0 C0 A1
2 K0 C1 A2
3 K1 C1 A3
Is it possible to convert it to:
key cat B
0 K0 C0 A0
1 A1
2 K0 C1 A2
3 K1 C1 A3
I want to avoid showing the same key & cat values again and again; the key should reappear once cat changes.
It's for Excel output, so I need it to be compatible with:
style.apply(f)
to_excel()
You can use duplicated over a subset of the columns to look for duplicate values:
cols = ['key', 'cat']
df.loc[df.duplicated(subset=cols), cols] = ''
key cat B
0 K0 C0 A0
1 A1
2 K0 C1 A2
3 K1 C1 A3
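Since the blanking above overwrites the real values, and the question mentions style.apply and to_excel, a variant that keeps the original data intact is to blank only a display copy before exporting. A sketch, where my_style_fn and 'out.xlsx' are placeholders:
cols = ['key', 'cat']
display_df = df.copy()
display_df.loc[display_df.duplicated(subset=cols), cols] = ''

# Style and export the display copy; df itself is left untouched.
(display_df.style
           .apply(my_style_fn, axis=1)   # my_style_fn is a hypothetical styling function
           .to_excel('out.xlsx', index=False))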
This is probably a simple question and I just couldn't find the answer. In a pandas DataFrame like the one below, how can the values be sorted first alphabetically and then numerically?
START:
import pandas as pd
d ={'col1': ['A1','B2','A10','A7','C4','C2','C22','B4']}
df = pd.DataFrame(data=d)
df
col1
0 A1
1 B2
2 A10
3 A7
4 C4
5 C2
6 C22
7 B4
WHAT I WANT TO GET:
col1
0 A1
1 A7
2 A10
3 B2
4 B4
5 C2
6 C4
7 C22
WHAT I GET:
>>>df.sort_values(by='col1')
col1
0 A1
2 A10
1 A7
3 B2
4 B4
5 C2
7 C22
6 C4
Using pandas just to sort a list is overkill:
lot_file = pd.DataFrame()
lot_file['SPOOL'] = ['A39','B34','A3','B37','A6','B18','A48','B15','A47']
group_lots = lot_file.sort_values(by=['SPOOL'])
group_lots['SPOOL'].tolist()
Output:
['A3', 'A39', 'A47', 'A48', 'A6', 'B15', 'B18', 'B34', 'B37']
Or use sorted
spool_list = ['A39','B34','A3','B37','A6','B18','A48','B15','A47']
sorted(spool_list)
Output:
['A3', 'A39', 'A47', 'A48', 'A6', 'B15', 'B18', 'B34', 'B37']
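Note that none of the above gives the natural order shown under "WHAT I WANT TO GET", because plain string sorting puts 'A10' before 'A7'. A sketch that splits each value into its letter prefix and numeric part and sorts on both (it assumes every entry is letters followed by digits):
parts = df['col1'].str.extract(r'(?P<alpha>\D+)(?P<num>\d+)')
parts['num'] = parts['num'].astype(int)

df_sorted = df.loc[parts.sort_values(['alpha', 'num']).index].reset_index(drop=True)
print(df_sorted)
The third-party natsort package offers ready-made natural-sort keys as well.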