Swapping Column values based on condition - python

this is my dataframe
S.No Column1 Column2
0 7 A B
1 2 D F
2 5 C H
3 9 NaN J
4 1 T G
5 4 Z True
6 10 S Y
7 3 G V
8 10 R Y
9 8 T X
df.replace([np.nan,True],'A',inplace=True)
S.No Column1 Column2
0 7 A B
1 2 D F
2 5 C H
3 9 A J
4 1 T G
5 4 Z A
6 10 S Y
7 3 G V
8 10 R Y
9 8 T X
required output be like
S.No Column1 Column2
0 7 B A
1 2 F D
2 5 H C
3 9 J A
4 1 T G
5 4 Z A
6 10 Y S
7 3 V G
8 10 Y V
9 8 X T
HOW TO WRITE CODE

Use rename:
>>> df
S.No Column1 Column2
0 7 A B
1 2 D F
2 5 C H
3 9 NaN J
4 1 T G
5 4 Z True
6 10 S Y
7 3 G V
8 10 R Y
9 8 T X
>>> df.rename(columns={'Column1': 'Column2', 'Column2': 'Column1'})[df.columns]
S.No Column1 Column2
0 7 B A
1 2 F D
2 5 H C
3 9 J NaN
4 1 G T
5 4 True Z
6 10 Y S
7 3 V G
8 10 Y R
9 8 X T

If you want to swap the contents of the 2 columns, you can use:
df[['Column1', 'Column2']] = list(zip(df['Column2'], df['Column1']))
or use .to_numpy() or .values, as follows:
df[['Column1', 'Column2']] = df[['Column2', 'Column1']].to_numpy()
or
df[['Column1', 'Column2']] = df[['Column2', 'Column1']].values
Result:
Based on your data after df.replace:
print(df)
S.No Column1 Column2
0 7 B A
1 2 F D
2 5 H C
3 9 J A
4 1 G T
5 4 A Z
6 10 Y S
7 3 V G
8 10 Y R
9 8 X T

Related

How to compute nested group proportions using python pandas without losing original row count

Given the following data:
df = pd.DataFrame({
'where': ['a','a','a','a','a','a'] + ['b','b','b','b','b','b'],
'what': ['x','y','z','x','y','z'] + ['x','y','z','x','y','z'],
'val' : [1,3,2,5,4,3] + [5,6,3,4,5,3]
})
Which looks as:
where what val
0 a x 1
1 a y 3
2 a z 2
3 a x 5
4 a y 4
5 a z 3
6 b x 5
7 b y 6
8 b z 3
9 b x 4
10 b y 5
11 b z 3
I would like to compute the proportion of what in where, and create a new
column that represented this.
The column will have duplicates, If I consider what = x in the above, and
add that column in then the data would be as follows
where what val what_where_prop
0 a x 1 6/18
1 a y 3
2 a z 2
3 a x 5 6/18
4 a y 4
5 a z 3
6 b x 5 9/26
7 b y 6
8 b z 3
9 b x 4 9/26
10 b y 5
11 b z 3
Here 6/18 is computed by finding the total x (6 = 1 + 5) in a over the total of val in a. The same process is taken for 9/26
The full solution will be filled similarly for y and z in the final column.
IIUC,
df['what_where_group'] = (df.groupby(['where', 'what'], as_index=False)['val']
.transform('sum')
.div(df.groupby('where')['val']
.transform('sum'),
axis=0))
df
Output:
where what val what_where_prop what_where_group
0 a x 1 6 0.333333
1 a y 3 7 0.388889
2 a z 2 5 0.277778
3 a x 5 6 0.333333
4 a y 4 7 0.388889
5 a z 3 5 0.277778
6 b x 5 9 0.346154
7 b y 6 11 0.423077
8 b z 3 6 0.230769
9 b x 4 9 0.346154
10 b y 5 11 0.423077
11 b z 3 6 0.230769
Details:
First groupby two levels using what and where, by using index=False, I am not setting the index as the groups, and transform sum. Next, groupby only where and transform sum. Lastly, divide, using div, the first groupby by the second groupby using the direction as rows with axis=0.
Another way:
g = df.set_index(['where', 'what'])['val']
num = g.sum(level=[0,1])
denom = g.sum(level=0)
ww_group = num.div(denom, level=0).rename('what_where_group')
df.merge(ww_group, left_on=['where','what'], right_index=True)
Output:
where what val what_where_prop what_where_group
0 a x 1 6 0.333333
3 a x 5 6 0.333333
1 a y 3 7 0.388889
4 a y 4 7 0.388889
2 a z 2 5 0.277778
5 a z 3 5 0.277778
6 b x 5 9 0.346154
9 b x 4 9 0.346154
7 b y 6 11 0.423077
10 b y 5 11 0.423077
8 b z 3 6 0.230769
11 b z 3 6 0.230769
Details:
Basically the same as before just using steps. And, merge results to apply division to each line.

Display pandas columns - wide to long

Say I have a row of column headers, and associated values in a Pandas Dataframe:
print df
A B C D E F G H I J K
1 2 3 4 5 6 7 8 9 10 11
how do I go about displaying them like the following:
print df
A B C D E
1 2 3 4 5
F G H I J
6 7 8 9 10
K
11
custom function
def new_repr(self):
g = self.groupby(np.arange(self.shape[1]) // 5, axis=1)
return '\n\n'.join([d.to_string() for _, d in g])
print(new_repr(df))
A B C D E
0 1 2 3 4 5
F G H I J
0 6 7 8 9 10
K
0 11
pd.set_option('display.width', 20)
pd.set_option('display.expand_frame_repr', True)
df
A B C D E \
0 1 2 3 4 5
F G H I J \
0 6 7 8 9 10
K
0 11

pandas select FROM .... TO

I have a dataframe
C V S D LOC
1 2 3 4 X
5 6 7 8
1 2 3 4
5 6 7 8 Y
9 10 11 12
how can i select rows from loc X to Y and inport them in another csv
Use idxmax for first values of index where True in condition:
df = df.loc[(df['LOC'] == 'X').idxmax():(df['LOC'] == 'Y').idxmax()]
print (df)
C V S D LOC
0 1 2 3 4 X
1 5 6 7 8 NaN
2 1 2 3 4 NaN
3 5 6 7 8 Y
In [133]: df.loc[df.index[df.LOC=='X'][0]:df.index[df.LOC=='Y'][0]]
Out[133]:
C V S D LOC
0 1 2 3 4 X
1 5 6 7 8 NaN
2 1 2 3 4 NaN
3 5 6 7 8 Y
PS this will select all rows between first occurence of X and first occurence of Y

How two merge two dataframes without any index being based

Suppose I have two dataframes X and Y:
import pandas as pd
X = pd.DataFrame({'A':[1,4,7],'B':[2,5,8],'C':[3,6,9]})
Y = pd.DataFrame({'D':[1],'E':[11]})
In [4]: X
Out[4]:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
In [6]: Y
Out[6]:
D E
0 1 11
and then, I want to get the following result dataframe:
A B C D E
0 1 2 3 1 11
1 4 5 6 1 11
2 7 8 9 1 11
how?
Assuming that Y contains only one row:
In [9]: X.assign(**Y.to_dict('r')[0])
Out[9]:
A B C D E
0 1 2 3 1 11
1 4 5 6 1 11
2 7 8 9 1 11
or a much nicer alternative from #piRSquared:
In [27]: X.assign(**Y.iloc[0])
Out[27]:
A B C D E
0 1 2 3 1 11
1 4 5 6 1 11
2 7 8 9 1 11
Helper dict:
In [10]: Y.to_dict('r')[0]
Out[10]: {'D': 1, 'E': 11}
Here is another way
Y2 = pd.concat([Y]*3, ignore_index = True) #This duplicates the rows
Which produces:
D E
0 1 11
0 1 11
0 1 11
Then concat once again:
pd.concat([X,Y2], axis =1)
A B C D E
0 1 2 3 1 11
1 4 5 6 1 11
2 7 8 9 1 11

Handling duplicate rows in python

I have a date frame df, let's say with 5 columns : a, b, c, d, e.
a b c d e
1 6 x 8 3
2 3 y 2 3
3 5 d 1 1
3 4 g 3 4
5 3 z 3 1
This is what I want to do, for all the rows with same value of column a, I want to drop duplicates, but value of column b should be summed across those rows, and for rest of the columns, I want to keep the first value.
Final Data frame will be :
a b c d e
1 6 x 8 3
2 3 y 2 3
3 9 d 1 1
5 3 z 3 1
How to do this?
I'd assign to column 'b' the result of grouping on 'a' and summing, you can then drop the duplicates:
In [171]:
df['b'] = df.groupby('a')['b'].transform('sum')
df
Out[171]:
a b c d e
0 1 6 x 8 3
1 2 3 y 2 3
2 3 9 d 1 1
3 3 9 g 3 4
4 5 3 z 3 1
In [172]:
df.drop_duplicates('a')
Out[172]:
a b c d e
0 1 6 x 8 3
1 2 3 y 2 3
2 3 9 d 1 1
4 5 3 z 3 1

Categories

Resources