Need to convert all columns into rows with their unique values in front? - python

I have a data frame in which I need to turn every column into rows, listing each column name next to its unique values:
A B C
1 2 2
1 2 3
5 2 9
Desired output
X1 V1
A 1
A 5
B 2
C 2
C 3
C 9
I can get the unique values with the unique() function, but I don't know how to get the desired output in pandas.

You can use melt and drop_duplicates:
df.melt(var_name='X1', value_name='V1').drop_duplicates()
Output:
X1 V1
0 A 1
2 A 5
3 B 2
6 C 2
7 C 3
8 C 9
P.S. You can add .reset_index(drop=True) if you want sequential integers for the index.
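For reference, here is a self-contained sketch of the answer above, with the sample frame rebuilt from the question and the optional reset_index step included. The variable name out is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"A": [1, 1, 5], "B": [2, 2, 2], "C": [2, 3, 9]})

# melt stacks every column into (column name, value) pairs,
# drop_duplicates keeps the first occurrence of each pair,
# reset_index renumbers the surviving rows from 0.
out = (
    df.melt(var_name="X1", value_name="V1")
      .drop_duplicates()
      .reset_index(drop=True)
)
print(out)
```

Because drop_duplicates compares whole rows, the value 2 is kept once for B and once for C.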

Related

summing columns from different dataframes Pandas

I have 3 DataFrames, all with over 100 rows and 1000 columns. I am trying to combine all these DataFrames into one in such a way that common columns from each DataFrame are summed up. I understand there is a method of summation called "pd.DataFrame.sum()", but remember, I have over 1000 columns and I can not add each common column manually. I am attaching sample DataFrames and the result I want. Help will be appreciated.
#Sample DataFrames.
df_1 = pd.DataFrame({'a':[1,2,3],'b':[2,1,0],'c':[1,3,5]})
df_2 = pd.DataFrame({'a':[1,1,0],'b':[2,1,4],'c':[1,0,2],'d':[2,2,2]})
df_3 = pd.DataFrame({'a':[1,2,3],'c':[1,3,5], 'x':[2,3,4]})
#Result.
df_total = pd.DataFrame({'a':[3,5,6],'b':[4,2,4],'c':[3,6,12],'d':[2,2,2], 'x':[2,3,4]})
df_total
a b c d x
0 3 4 3 2 2
1 5 2 6 2 3
2 6 4 12 2 4
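Let us do pd.concat, then sum by column label (sum(level=0, axis=1) was removed in pandas 2.0; grouping the transposed frame by label gives the same result):
out = pd.concat([df_1,df_2,df_3],axis=1).T.groupby(level=0).sum().T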
Out[7]:
a b c d x
0 3 4 3 2 2
1 5 2 6 2 3
2 6 4 12 2 4
You can add with fill_value=0:
df_1.add(df_2, fill_value=0).add(df_3, fill_value=0).astype(int)
Output:
a b c d x
0 3 4 3 2 2
1 5 2 6 2 3
2 6 4 12 2 4
Note: pandas intrinsically aligns most operations along indexes (index and column headers).
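If the number of frames is not fixed at three, the chained add calls above can be folded over a list. This is a minimal sketch under that assumption; frames and total are illustrative names, not part of the original answers.

```python
from functools import reduce

import pandas as pd

df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 1, 0], 'c': [1, 3, 5]})
df_2 = pd.DataFrame({'a': [1, 1, 0], 'b': [2, 1, 4], 'c': [1, 0, 2], 'd': [2, 2, 2]})
df_3 = pd.DataFrame({'a': [1, 2, 3], 'c': [1, 3, 5], 'x': [2, 3, 4]})

frames = [df_1, df_2, df_3]

# Fold the list pairwise; fill_value=0 treats a column that is
# missing from one of the two frames as a column of zeros.
total = reduce(lambda left, right: left.add(right, fill_value=0), frames)

# Alignment produces floats, so cast back to integers at the end.
total = total.astype(int)
print(total)
```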

Finding difference between two columns of a dataframe along with groupby

I saw a simpler version of this question here,
but my dataframe has different group names and I want to calculate the difference separately for each of them.
A B C
0 a 3 5
1 a 6 9
2 b 3 8
3 b 11 19
I want to group by A and then find the difference between alternate B and C values, something like this:
A B C dA
0 a 3 5 6
1 a 6 9 NaN
2 b 3 8 16
3 b 11 19 NaN
I tried:
df['dA']=df.groupby('A')(['C']-['B'])
df['dA']=df.groupby('A')['C']-df.groupby('A')['B']
Neither of them worked.
What mistake am I making?
IIUC, here is one way to perform the calculation:
# create the data frame
from io import StringIO
import pandas as pd
data = '''idx A B C
0 a 3 5
1 a 6 9
2 b 3 8
3 b 11 19
'''
df = pd.read_csv(StringIO(data), sep=r'\s+', engine='python').set_index('idx')
Now, compute dA. I take the last value of C less the first value of B within each group of A. (Is this right? Or is it max(C) less min(B)?) If you're guaranteed to have the A values in pairs, then @BenT's shift() would be more concise.
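# Last C minus first B per group of A, kept only on each group's first row.
# Masking on duplicated A (rather than dropping duplicate values) keeps a
# group whose difference happens to equal another group's.
dA = (
    (df.groupby('A')['C'].transform('last') -
     df.groupby('A')['B'].transform('first'))
    .where(~df['A'].duplicated())
    .rename('dA'))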
print(pd.concat([df, dA], axis=1))
A B C dA
idx
0 a 3 5 6.0
1 a 6 9 NaN
2 b 3 8 16.0
3 b 11 19 NaN
I used groupby().transform() to preserve index values, to support the concat operation.
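The shift() idea mentioned above could look roughly like the sketch below. This is one possible reading of it and is not necessarily identical to the referenced answer. It assumes each value of A appears in exactly two consecutive rows.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'a', 'b', 'b'],
    'B': [3, 6, 3, 11],
    'C': [5, 9, 8, 19],
})

# Within each group of A, pull the next row's C up by one position,
# then subtract the current row's B. The last row of each group has
# no following row, so it becomes NaN.
df['dA'] = df.groupby('A')['C'].shift(-1) - df['B']
print(df)
```

With more than two rows per group this yields a difference on every row except the last one in the group, which may or may not be what is wanted.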

Calculate Quantiles on Groupby Dataframe and add value back to DF

What I'm looking to do is group my DataFrame on a categorical column, compute quantiles using a second column, and store the result in a third column. For simplicity, let's just do the P50. Example below:
Original DF:
Col1 Col2
A 2
B 4
C 2
A 6
B 12
C 10
Desired DF:
Col1 Col2 Col3_P50
A 2 4
B 4 8
C 2 6
A 6 4
B 12 8
C 10 6
One easy way would be to create a small dataframe for each category (A, B, C), compute the quantile, and merge it back into the existing DF, but my actual dataset has hundreds of categories, so this isn't an option. Any suggestions would be much appreciated!
You can do transform with quantile
df['Col3_P50'] = df.groupby("Col1")['Col2'].transform('quantile',0.5)
print(df)
Col1 Col2 Col3_P50
0 A 2 4
1 B 4 8
2 C 2 6
3 A 6 4
4 B 12 8
5 C 10 6
If you need several quantiles, one way is to create a dictionary that maps the new column names to the quantile values and loop over it:
d = {'P_50':0.5,'P_90':0.9}
for k,v in d.items():
    df[k]=df.groupby("Col1")['Col2'].transform('quantile',v)
print(df)
Col1 Col2 P_50 P_90
0 A 2 4 5.6
1 B 4 8 11.2
2 C 2 6 9.2
3 A 6 4 5.6
4 B 12 8 11.2
5 C 10 6 9.2
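As an alternative to looping, all quantiles can be computed in one groupby pass and joined back on the category column. This is a sketch rather than part of the original answer; q and out are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Col2': [2, 4, 2, 6, 12, 10],
})

# quantile with a list returns one value per (category, quantile) pair;
# unstack turns the quantile level into columns.
q = df.groupby('Col1')['Col2'].quantile([0.5, 0.9]).unstack()
q.columns = ['P_50', 'P_90']

# Broadcast the per-category values back onto the original rows.
out = df.join(q, on='Col1')
print(out)
```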

Melting values of multiple columns to single column based on another column value in pandas

I have a dataframe which looks like :
A B 1 4
alpha 1 2 3
beta 4 5 6
gamma 4 8 9
df= pd.DataFrame([['alpha',1,2,3], ['beta', 4,5,6], ['gamma',4,8,9]], columns=['A','B', 1, 4])
I am now trying to use the value in column 'B' to pick the matching column (1 or 4) in each row. The resulting dataframe should look like:
A B value
alpha 1 2
beta 4 6
gamma 4 9
I tried melt and stack but couldn't figure it out.
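Let us try lookup (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this only runs on older versions):
df['value']=df.lookup(df.index,df.B.astype(str))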
df
A B 1 4 value
0 alpha 1 2 3 2
1 beta 4 5 6 6
2 gamma 4 8 9 9
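On pandas versions without DataFrame.lookup, the same row-by-row selection can be done with factorize and NumPy indexing. The sketch below uses the integer column labels from the question's constructor, so no astype(str) is needed. With string labels, convert B the same way the answer above does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['alpha', 1, 2, 3], ['beta', 4, 5, 6], ['gamma', 4, 8, 9]],
    columns=['A', 'B', 1, 4],
)

# codes[i] is the position of row i's B value within uniques
codes, uniques = pd.factorize(df['B'])

# Keep only the columns named by B, in the order of uniques,
# then pick one cell per row by (row position, column position).
candidates = df.reindex(uniques, axis=1).to_numpy()
df['value'] = candidates[np.arange(len(df)), codes]
print(df)
```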

How to change values of dataframe cells based on following row cell value?

I'm working in Python, with Pandas DataFrames.
I have a problem where my dataframe looks like this:
Index A B Copy_of_B
1 a 0 0
2 a 1 1
3 a 5 5
4 b 0 0
5 b 4 4
6 c 6 6
My expected output is:
Index A B Copy_of_B
1 a 0 1
2 a 1 1
3 a 5 5
4 b 0 4
5 b 4 4
6 c 6 6
I would like to replace the 0 values in the Copy_of_B column with the values in the following row, but I don't want to use a for loop to iterate.
Is there an easy solution for this?
Thanks,
Barna
This makes use of the fact that your DataFrame's index consists of consecutive numbers.
Start by creating two indices:
ind = df[df.Copy_of_B == 0].index
ind2 = ind + 1
The first contains index values of rows where Copy_of_B == 0.
The second contains indices of subsequent rows.
Then, to "copy" data from subsequent rows to rows containing zeroes, run:
df.loc[ind, 'Copy_of_B'] = df.loc[ind2, 'Copy_of_B'].tolist()
As you can see, this needs no loop over the whole DataFrame.
You can use mask and bfill:
df['Copy_of_B'] = df['B'].mask(df['B'].eq(0)).bfill()
Output:
Index A B Copy_of_B
0 1 a 0 1.0
1 2 a 1 1.0
2 3 a 5 5.0
3 4 b 0 4.0
4 5 b 4 4.0
5 6 c 6 6.0
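If the float output is unwanted, a variant with shift and where keeps the integer dtype. This is a sketch, not part of the answers above, and next_b is an illustrative name. Unlike bfill, it copies the value of the immediately following row, so a run of consecutive zeros would still leave a zero behind.

```python
import pandas as pd

df = pd.DataFrame({
    'Index': [1, 2, 3, 4, 5, 6],
    'A': ['a', 'a', 'a', 'b', 'b', 'c'],
    'B': [0, 1, 5, 0, 4, 6],
})

# Value of the following row; fill_value=0 avoids introducing NaN
# for the last row, so the column stays integer.
next_b = df['B'].shift(-1, fill_value=0)

# Keep B where it is non-zero, otherwise take the following row's value.
df['Copy_of_B'] = df['B'].where(df['B'].ne(0), next_b)
print(df)
```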
