pandas switch rows with columns and preserve data - python

How can I switch rows with columns in the data below so that all data is preserved?
Test data:
import pandas as pd
data_dic = {
"x": ['a','b','a','b','b'],
"y": [1,2,3,4,5]
}
df = pd.DataFrame(data_dic)
x y
0 a 1
1 b 2
2 a 3
3 b 4
4 b 5
Expected Output:
a b
0 1 2
1 3 4
2 NaN 5

Use GroupBy.cumcount with pivot:
df = df.assign(g = df.groupby('x').cumcount()).pivot(index='g', columns='x', values='y')
Or DataFrame.set_index with Series.unstack:
df = df.set_index([df.groupby('x').cumcount(),'x'])['y'].unstack()
print (df)
x a b
g
0 1.0 2.0
1 3.0 4.0
2 NaN 5.0
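To see why the counter is needed, here is a minimal self-contained sketch of the pivot variant, using the frame as it is displayed in the question. The names counter and wide are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["a", "b", "a", "b", "b"],
    "y": [1, 2, 3, 4, 5],
})

# Number each row within its group: the first "a" gets 0, the second "a" gets 1, ...
counter = df.groupby("x").cumcount()

# The counter becomes the new row index and the values of x become the columns.
# Without it, pivot would have no unique row label for each value of y.
wide = df.assign(g=counter).pivot(index="g", columns="x", values="y")
print(wide)
```

Group a has only two rows, so its third slot is filled with NaN, which is also why the values are shown as floats.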

Related

Division in pandas dataframe

I am trying to divide my data frame with one of its columns:
Here is my data frame:
A B C
1 10 10
2 20 30
3 15 33
Now I want to divide columns "B" and "C" by column "A". My desired output looks like this:
A B C
1 10 10
2 10 15
3 5 11
df/df['a']
Use DataFrame.div:
df[['B','C']] = df[['B','C']].div(df['A'], axis=0)
print (df)
A B C
0 1 10.0 10.0
1 2 10.0 15.0
2 3 5.0 11.0
If you need to divide all columns except A:
cols = df.columns.difference(['A'])
df[cols] = df[cols].div(df['A'], axis=0)
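For context, a small sketch of why plain division does not give the desired result while div with axis=0 does. The variable names naive and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 15], "C": [10, 30, 33]})

# df / df["A"] matches the Series index (0, 1, 2) against the column
# labels (A, B, C); nothing lines up, so every cell comes out as NaN.
naive = df / df["A"]

# axis=0 matches the Series against the row index instead.
cols = df.columns.difference(["A"])
result = df.copy()
result[cols] = df[cols].div(df["A"], axis=0)
print(result)
```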
Try this:
import pandas as pd
d = {
'A': [1,2,3],
'B': [10,20,15],
'C': [10,30,33]
}
df = pd.DataFrame(d)
df['B'] = df['B']/df['A']
df['C'] = df['C']/df['A']
print(df)
Output:
A B C
0 1 10.0 10.0
1 2 10.0 15.0
2 3 5.0 11.0

Pandas Dataframe multiple rows with same index

I have a dictionary that looks like this:
dict = {
"A": [1,2,3],
"B": [4]
}
When I try to create a pandas DataFrame, I use:
output_df = pd.DataFrame.from_dict(dict, orient='index')
Output:
- 1 2 3
A 1 2 3
B 4
What I want:
- 1
A 1
A 2
A 3
B 4
Thanks for your help! :)
Try:
df.stack().swaplevel(0,1)
1 A 1.0
2 A 2.0
3 A 3.0
1 B 4.0
dtype: float64
df.stack().swaplevel(0,1).reset_index(level=[1], name='a').reset_index(drop=True)
level_1 a
0 A 1.0
1 A 2.0
2 A 3.0
3 B 4.0
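A different route, not the one used in the answer above, is to skip the wide frame and build the long shape directly with Series.explode. This is only a sketch: the dictionary is renamed to data so it does not shadow the built-in dict, and the column label 1 is taken from the desired output in the question.

```python
import pandas as pd

# Renamed from "dict" (assumption for illustration) to avoid shadowing the built-in.
data = {"A": [1, 2, 3], "B": [4]}

# A Series built from the dict holds one list per key; explode() gives each
# list element its own row and repeats the key as the index label.
long = pd.Series(data).explode().astype(int).to_frame(name=1)
print(long)
```

Because no padding with NaN ever happens, the values stay integers.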

Create new variables from row for each existing variable in pandas dataframe

I have a dataframe which looks like:
0 target_year ID v1 v2
1 2000 1 0.3 1
2 2000 2 1.2 4
...
10 2001 1 3 2
11 2001 2 2 2
And I would like the following output:
0 ID v1_1 v2_1 v1_2 v2_2
1 1 0.3 1 3 2
2 2 1.2 4 2 2
Do you have any idea how to do that?
You could use pd.pivot_table, using the GroupBy.cumcount of ID as columns.
Then we can use a list comprehension with f-strings to merge the MultiIndex header into a single level:
cols = df.groupby('ID').ID.cumcount() + 1
df_piv = (pd.pivot_table(data = df.drop('target_year', axis=1)[['v1','v2']],
index = df.ID,
columns = cols))
df_piv.columns = [f'{i}_{j}' for i,j in df_piv.columns]
v1_1 v1_2 v2_1 v2_2
ID
1 0.3 3.0 1 2
2 1.2 2.0 4 2
Use GroupBy.cumcount to create a counter column, reshape with DataFrame.set_index and DataFrame.unstack, and finally flatten the MultiIndex columns with a list comprehension and f-strings:
g = df.groupby('ID').ID.cumcount() + 1
df = df.drop('target_year', axis=1).set_index(['ID', g]).unstack()
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print (df)
ID v1_1 v1_2 v2_1 v2_2
0 1 0.3 3.0 1 2
1 2 1.2 2.0 4 2
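A self-contained version of the steps above, as a sketch. It uses only the four rows visible in the question (the elided rows are left out) and assumes the rows of each ID are already in year order.

```python
import pandas as pd

# Only the rows shown in the question; the rows behind "..." are not reproduced.
df = pd.DataFrame({
    "target_year": [2000, 2000, 2001, 2001],
    "ID": [1, 2, 1, 2],
    "v1": [0.3, 1.2, 3.0, 2.0],
    "v2": [1, 4, 2, 2],
})

# 1 for the first row of each ID, 2 for the second, and so on.
g = df.groupby("ID").cumcount() + 1

# Move the counter into the columns, then flatten ("v1", 1) into "v1_1".
wide = df.drop(columns="target_year").set_index(["ID", g]).unstack()
wide.columns = [f"{name}_{n}" for name, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

If the rows are not sorted by year, sorting with df.sort_values(['ID', 'target_year']) first keeps the suffixes aligned with the years.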
If your data cover only two years, you can also merge:
cols = ['ID','v1', 'v2']
df[df.target_year.eq(2000)][cols].merge(df[df.target_year.eq(2001)][cols],
on='ID',
suffixes=['_1','_2'])
Output
ID v1_1 v2_1 v1_2 v2_2
0 1 0.3 1 3.0 2
1 2 1.2 4 2.0 2

Shift rows in a Column when equal to specific value

I want to shift rows in a pandas df when values are equal to a specific value in a Column. For the df below, I'm trying to shift the values in Column B to Column A when values in A == x.
import pandas as pd
df = pd.DataFrame({
'A' : [1,'x','x','x',5],
'B' : ['x',2,3,4,'x'],
})
This is my attempt:
df = df.loc[df.A.shift(-1) == df.A.shift(1), 'x'] = df.A.shift(1)
Intended Output:
A B
0 1 x
1 2
2 3
3 4
4 5 x
You can use:
m = df.A.eq('x')
df[m]=df[m].shift(-1,axis=1)
print(df)
A B
0 1 x
1 2 NaN
2 3 NaN
3 4 NaN
4 5 x
You can use:
df[df.A=='x'] = df.shift(-1,axis=1)
print(df)
A B
0 1 x
1 2 NaN
2 3 NaN
3 4 NaN
4 5 x
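If the shift along axis=1 feels too implicit, the same result can be spelled out with two .loc assignments. This is a sketch of an equivalent, not the method from the answers above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, "x", "x", "x", 5],
    "B": ["x", 2, 3, 4, "x"],
})

# Rows where column A holds the placeholder value.
m = df["A"].eq("x")

# Copy B into A on those rows, then blank out B there.
df.loc[m, "A"] = df.loc[m, "B"]
df.loc[m, "B"] = np.nan
print(df)
```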

Why Python Pandas append to DataFrame like this?

I want to add l to column 'A', but it creates a new column and adds l to that one instead. Why is this happening, and how can I do what I want?
import pandas as pd
l=[1,2,3]
df = pd.DataFrame(columns =['A'])
df = df.append(l, ignore_index=True)
df = df.append(l, ignore_index=True)
print(df)
A 0
0 NaN 1.0
1 NaN 2.0
2 NaN 3.0
3 NaN 1.0
4 NaN 2.0
5 NaN 3.0
Edited
Is this what you want to do:
In[6]:df=pd.concat([df.A, pd.Series(l)]).reset_index().drop(columns='index').rename(columns={0:'A'})
In[7]:df
Out[7]:
A
0 1
1 2
2 3
Then you can add any list of different length.
Suppose:
a=[9,8,7,6,5]
In[11]:df=pd.concat([df.A, pd.Series(a)]).reset_index().drop(columns='index').rename(columns={0:'A'})
In[12]:df
Out[12]:
A
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
Previously
Are you looking for this:
df=pd.DataFrame(l,columns=['A'])
df
Out[5]:
A
0 1
1 2
2 3
You can just pass a dictionary to the DataFrame constructor, if I understand your question correctly.
l = [1,2,3]
df = pd.DataFrame({'A': l})
df
A
0 1
1 2
2 3
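As for why the extra column appears: a bare list has no column label, so pandas puts it into a new column named 0 instead of 'A'. DataFrame.append was also removed in pandas 2.0, so on current versions a sketch along these lines with pd.concat is probably what you want. Labelling the new rows with 'A' is what keeps them in the same column.

```python
import pandas as pd

l = [1, 2, 3]
df = pd.DataFrame({"A": l})

# Wrap the new values in a frame that uses the same column label,
# then stack the two frames and renumber the rows.
df = pd.concat([df, pd.DataFrame({"A": l})], ignore_index=True)
print(df)
```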
