Dynamically create columns in a dataframe - python

I have a Dataframe like the following:
a b a1 b1
0 1 6 10 20
1 2 7 11 21
2 3 8 12 22
3 4 9 13 23
4 5 2 14 24
where a1 and b1 are dynamically created from a and b. Can we create percentage columns dynamically as well?
The one thing that is constant is that the created columns will have 1 suffixed to the name.
Expected output:
a b a1 b1 a% b%
0 0 6 10 20 0 30
1 2 7 11 21 29 33
2 3 8 12 22 38 36
3 4 9 13 23 44 39
4 5 2 14 24 250 8

Create a new DataFrame by dividing the base columns by their suffixed counterparts, rename its columns with DataFrame.add_suffix, and finally append it to the original with DataFrame.join:
cols = ['a','b']
new = [f'{x}1' for x in cols]
df = df.join(df[cols].div(df[new].to_numpy()).mul(100).add_suffix('%'))
print (df)
a b a1 b1 a% b%
0 1 6 10 20 10.000000 30.000000
1 2 7 11 21 18.181818 33.333333
2 3 8 12 22 25.000000 36.363636
3 4 9 13 23 30.769231 39.130435
4 5 2 14 24 35.714286 8.333333
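The list of base columns does not have to be written by hand. Here is a minimal sketch, assuming every derived column is named as its base column plus the suffix 1 (as the question states) and using the sample data above. The helper name add_percent_columns is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [6, 7, 8, 9, 2],
    "a1": [10, 11, 12, 13, 14],
    "b1": [20, 21, 22, 23, 24],
})


def add_percent_columns(frame, suffix="1"):
    """Hypothetical helper: add '<col>%' = col / col<suffix> * 100 for each pair."""
    # A base column is any column whose suffixed counterpart also exists.
    base = [c for c in frame.columns if f"{c}{suffix}" in frame.columns]
    derived = [f"{c}{suffix}" for c in base]
    # to_numpy() drops the labels so the division is positional; without it
    # pandas would try to align 'a' with 'a1' by name and return only NaN.
    pct = frame[base].div(frame[derived].to_numpy()).mul(100).add_suffix("%")
    return frame.join(pct)


result = add_percent_columns(df)
print(result.round(2))
```

This only works while no base column name itself ends in the suffix, so treat it as a starting point rather than a general solution.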

Related

Concatenate dataframes along columns in a pandas dataframe

I want to concatenate two DataFrames along the columns. Both have the same number of rows.
df1
A B C
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
df2
D E F
0 13 14 15
1 16 17 18
2 19 20 21
3 22 23 24
Expected:
A B C D E F
0 1 2 3 13 14 15
1 4 5 6 16 17 18
2 7 8 9 19 20 21
3 10 11 12 22 23 24
I have done:
df_combined = pd.concat([df1,df2], axis=1)
But df_combined has new rows with NaN values in some columns.
I can't find my error. What do I have to do? Thanks in advance!
In this case, merge() works.
pd.merge(df1, df2, left_index=True, right_index=True)
output
A B C D E F
0 1 2 3 13 14 15
1 4 5 6 16 17 18
2 7 8 9 19 20 21
3 10 11 12 22 23 24
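This works only if both DataFrames have the same index values: merging on the index is an inner join by default, so rows whose labels appear in only one frame are dropped.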
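A likely cause of the NaN rows, although the question does not show the index of either frame: pd.concat with axis=1 aligns rows by index label, not by position. Here is a minimal sketch, assuming df2 carries a non-default index (the labels 10 to 13 are made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 4, 7, 10], "B": [2, 5, 8, 11], "C": [3, 6, 9, 12]})
# Assumed for illustration: df2 has different index labels than df1.
df2 = pd.DataFrame(
    {"D": [13, 16, 19, 22], "E": [14, 17, 20, 23], "F": [15, 18, 21, 24]},
    index=[10, 11, 12, 13],
)

# Label-based alignment: the result has one row per distinct label,
# and each row is half filled with NaN.
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned.shape)

# Resetting both indexes makes the labels match, so rows pair up by position.
combined = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)
print(combined)
```

If the original index carries meaning, resetting it loses that information, so check first why the two indexes differ.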

How to copy the current row and the next row value in a new dataframe using python?

The df looks like below:
A B C
1 8 23
2 8 22
3 8 45
4 9 45
5 6 12
6 8 10
7 11 12
8 9 67
I want to create a new df containing every row where 'B' is 8, plus the row immediately after each such row.
New df:
A B C
1 8 23
2 8 22
3 8 45
4 9 45
6 8 10
7 11 12
Use boolean indexing: compare both the column and its shifted values with 8, and combine the two masks with | (bitwise OR):
df = df[df.B.shift().eq(8) | df.B.eq(8)]
print (df)
A B C
0 1 8 23
1 2 8 22
2 3 8 45
3 4 9 45
5 6 8 10
6 7 11 12
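To see why the mask selects the right rows, it can help to build the two conditions separately. Here is a minimal sketch using the sample data from the question; the names is_eight and after_eight are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [8, 8, 8, 9, 6, 8, 11, 9],
    "C": [23, 22, 45, 45, 12, 10, 12, 67],
})

# Rows where B itself is 8.
is_eight = df["B"].eq(8)
# shift() moves each value down one row, so this marks rows whose
# predecessor has B == 8; the first row gets NaN, which compares as False.
after_eight = df["B"].shift().eq(8)

result = df[is_eight | after_eight]
print(result)
```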

Python Dataframe: Create columns based on another column

I have a dataframe with repeated values for one column (here column 'A') and I want to convert this dataframe so that new columns are formed based on values of column 'A'.
Example
df = pd.DataFrame({'A': list(range(4))*3, 'B': range(12), 'C': range(12,24)})
df
A B C
0 0 0 12
1 1 1 13
2 2 2 14
3 3 3 15
4 0 4 16
5 1 5 17
6 2 6 18
7 3 7 19
8 0 8 20
9 1 9 21
10 2 10 22
11 3 11 23
Note that the values of "A" column are repeated 3 times.
Now I want the simplest solution to convert it to another dataframe with this configuration (please ignore the naming of the columns; it is for description purposes only, and the names could be anything):
     B               C
    A0  A1  A2  A3  A0  A1  A2  A3
0    0   1   2   3  12  13  14  15
1    4   5   6   7  16  17  18  19
2    8   9  10  11  20  21  22  23
This is a pivot problem, so use
df.assign(idx=df.groupby('A').cumcount()).pivot(index='idx', columns='A', values=['B', 'C'])
       B               C
A      0   1   2   3   0   1   2   3
idx
0      0   1   2   3  12  13  14  15
1      4   5   6   7  16  17  18  19
2      8   9  10  11  20  21  22  23
If the headers are important, you can use MultiIndex.set_levels to fix them.
u = df.assign(idx=df.groupby('A').cumcount()).pivot(index='idx', columns='A', values=['B', 'C'])
u.columns = u.columns.set_levels(
    ['A' + u.columns.levels[1].astype(str)], level=[1])
u
       B               C
A     A0  A1  A2  A3  A0  A1  A2  A3
idx
0      0   1   2   3  12  13  14  15
1      4   5   6   7  16  17  18  19
2      8   9  10  11  20  21  22  23
Alternatively, assign a helper group key with cumcount, then just unstack:
yourdf=df.assign(D=df.groupby('A').cumcount(),A='A'+df.A.astype(str)).set_index(['D','A']).unstack()
       B               C
A     A0  A1  A2  A3  A0  A1  A2  A3
D
0      0   1   2   3  12  13  14  15
1      4   5   6   7  16  17  18  19
2      8   9  10  11  20  21  22  23
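For reference, here is a self-contained sketch of the cumcount + pivot approach, assuming a pandas version in which pivot takes index, columns and values as keyword arguments. Relabeling with rename(..., level=1) is swapped in here as one alternative to set_levels, and the name wide is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "A": list(range(4)) * 3,
    "B": range(12),
    "C": range(12, 24),
})

# cumcount() numbers the occurrences of each A value (0, 1, 2); that
# counter becomes the row label of the reshaped frame.
wide = (
    df.assign(idx=df.groupby("A").cumcount())
      .pivot(index="idx", columns="A", values=["B", "C"])
)
# Optional: relabel the second column level as A0..A3.
wide = wide.rename(columns=lambda a: f"A{a}", level=1)
print(wide)
```

A single column is then available through its tuple key, for example wide[("B", "A0")].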

Pivot column and column values in pandas dataframe

I have a dataframe that looks like this, but with 26 rows and 110 columns:
index/io 1 2 3 4
0 42 53 23 4
1 53 24 6 12
2 63 12 65 34
3 13 64 23 43
Desired output:
index io value
0 1 42
0 2 53
0 3 23
0 4 4
1 1 53
1 2 24
1 3 6
1 4 12
2 1 63
2 2 12
...
I have tried with dicts and lists: I converted the dataframe to a dict, built a list of the index values, and then tried to fill a new dict with the io values.
indx = []
# collect one index entry per cell
for key, value in mydict.items():
    for k, v in value.items():
        indx.append(key)
indxio = {}
# map each index entry to an io value
for element in indx:
    for key, value in mydict.items():
        for k, v in value.items():
            indxio.update({element: k})
I know this is too far probably, but it's the only thing I could think of. The process was too long, so I stopped.
You can use set_index, stack, and reset_index().
df.set_index("index/io").stack().reset_index(name="value")\
.rename(columns={'index/io':'index','level_1':'io'})
Output:
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
You need set_index + stack + rename_axis + reset_index:
df = df.set_index('index/io').stack().rename_axis(('index','io')).reset_index(name='value')
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
A solution with melt and rename is also possible, but melt returns the values in a different order (column by column), so sort_values is necessary:
d = {'index/io':'index'}
df = df.melt('index/io', var_name='io', value_name='value') \
.rename(columns=d).sort_values(['index','io']).reset_index(drop=True)
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
An alternative solution for NumPy lovers:
import numpy as np
df = df.set_index('index/io')
a = np.repeat(df.index, len(df.columns))
b = np.tile(df.columns, len(df.index))
c = df.values.ravel()
cols = ['index','io','value']
df = pd.DataFrame(np.column_stack([a,b,c]), columns = cols)
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
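To check that the stack and melt variants really agree, here is a minimal sketch on the four sample rows from the question. It assumes the io column labels are the integers 1 to 4; the names stacked and melted are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "index/io": [0, 1, 2, 3],
    1: [42, 53, 63, 13],
    2: [53, 24, 12, 64],
    3: [23, 6, 65, 23],
    4: [4, 12, 34, 43],
})

# stack() walks the frame row by row, so the result is already ordered
# by index and then by io.
stacked = (
    df.set_index("index/io")
      .stack()
      .rename_axis(["index", "io"])
      .reset_index(name="value")
)

# melt() walks the frame column by column, hence the explicit sort.
melted = (
    df.melt("index/io", var_name="io", value_name="value")
      .rename(columns={"index/io": "index"})
      .sort_values(["index", "io"])
      .reset_index(drop=True)
)

print(stacked.head())
# Compare cell by cell rather than with equals(), which also compares dtypes.
print(stacked.values.tolist() == melted.values.tolist())
```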

Changing structure of pandas dataframe

Is there a function that can convert between the following two DataFrames (df1, df2):
import random
import pandas as pd
numbers = random.sample(range(1,50), 10)
d = {'num': list(range(1,6)) + list(range(1,6)),'values':numbers,'type':['a']*5 + ['b']*5}
df = pd.DataFrame(d)
e = {'num': list(range(1,6)) ,'a':numbers[:5],'b':numbers[5:]}
df2 = pd.DataFrame(e)
Dataframe df1:
#df1
num type values
0 1 a 18
1 2 a 26
2 3 a 34
3 4 a 21
4 5 a 48
5 1 b 1
6 2 b 19
7 3 b 36
8 4 b 42
9 5 b 30
Dataframe df2:
a b num
0 18 1 1
1 26 19 2
2 34 36 3
3 21 42 4
4 48 30 5
Starting from the first df, each value of the type column becomes a column name holding the corresponding values. Is there a function that can do this (from df1 to df2), and the reverse (from df2 to df1)?
You can use stack and pivot:
print(df)
num type values
0 1 a 20
1 2 a 25
2 3 a 2
3 4 a 27
4 5 a 29
5 1 b 39
6 2 b 40
7 3 b 6
8 4 b 17
9 5 b 47
print(df2)
a b num
0 20 39 1
1 25 40 2
2 2 6 3
3 27 17 4
4 29 47 5
df1 = df2.set_index('num').stack().reset_index()
df1.columns = ['num','type','values']
df1 = df1.sort_values('type')
print(df1)
num type values
0 1 a 20
2 2 a 25
4 3 a 2
6 4 a 27
8 5 a 29
1 1 b 39
3 2 b 40
5 3 b 6
7 4 b 17
9 5 b 47
df3 = df.pivot(index='num', columns='type', values='values').reset_index()
df3.columns.name = None
df3 = df3[['a','b','num']]
print(df3)
a b num
0 20 39 1
1 25 40 2
2 2 6 3
3 27 17 4
4 29 47 5
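The round trip can also be sketched with fixed numbers instead of a random sample, so the output is reproducible. Here melt is swapped in as an alternative to set_index + stack for the wide-to-long direction; the names wide_df, long_df and back are only illustrative:

```python
import pandas as pd

wide_df = pd.DataFrame({
    "num": [1, 2, 3, 4, 5],
    "a": [20, 25, 2, 27, 29],
    "b": [39, 40, 6, 17, 47],
})

# Wide -> long: one row per (num, type) pair.
long_df = wide_df.melt(id_vars="num", var_name="type", value_name="values")

# Long -> wide: each distinct 'type' becomes a column again.
back = long_df.pivot(index="num", columns="type", values="values").reset_index()
# Drop the leftover 'type' label on the column axis.
back.columns.name = None

print(long_df)
print(back)
```

pivot raises an error if a (num, type) pair occurs more than once, so for data with duplicates something like pivot_table with an aggregation function would be needed instead.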
