Is there a function that can convert between the following two dataframes (df1, df2)?
import random
import pandas as pd
numbers = random.sample(range(1,50), 10)
d = {'num': list(range(1,6)) + list(range(1,6)),'values':numbers,'type':['a']*5 + ['b']*5}
df = pd.DataFrame(d)
e = {'num': list(range(1,6)) ,'a':numbers[:5],'b':numbers[5:]}
df2 = pd.DataFrame(e)
Dataframe df1:
#df1
num type values
0 1 a 18
1 2 a 26
2 3 a 34
3 4 a 21
4 5 a 48
5 1 b 1
6 2 b 19
7 3 b 36
8 4 b 42
9 5 b 30
Dataframe df2:
a b num
0 18 1 1
1 26 19 2
2 34 36 3
3 21 42 4
4 48 30 5
Starting from the first dataframe, each value of the type column becomes its own column holding the corresponding values. Is there a function that can do this (from df1 to df2), and one for the reverse (from df2 to df1)?
You can use stack and pivot:
print(df)
num type values
0 1 a 20
1 2 a 25
2 3 a 2
3 4 a 27
4 5 a 29
5 1 b 39
6 2 b 40
7 3 b 6
8 4 b 17
9 5 b 47
print(df2)
a b num
0 20 39 1
1 25 40 2
2 2 6 3
3 27 17 4
4 29 47 5
df1 = df2.set_index('num').stack().reset_index()
df1.columns = ['num','type','values']
df1 = df1.sort_values('type')
print(df1)
num type values
0 1 a 20
2 2 a 46
4 3 a 21
6 4 a 33
8 5 a 10
1 1 b 45
3 2 b 39
5 3 b 38
7 4 b 37
9 5 b 34
df3 = df.pivot(index='num', columns='type', values='values').reset_index()
df3.columns.name = None
df3 = df3[['a','b','num']]
print(df3)
a b num
0 46 23 1
1 38 6 2
2 36 47 3
3 33 34 4
4 15 1 5
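As a self-contained check of the round trip, here is a minimal sketch with fixed data instead of random numbers. It uses pivot for long to wide as above, but swaps in melt (rather than stack) for wide to long; the names long_df, wide_df and round_trip are made up for this illustration.

```python
import pandas as pd

# Long format: one row per (num, type) pair.
long_df = pd.DataFrame({
    "num": [1, 2, 3, 1, 2, 3],
    "type": ["a", "a", "a", "b", "b", "b"],
    "values": [18, 26, 34, 1, 19, 36],
})

# Long -> wide: each distinct 'type' becomes its own column.
wide_df = long_df.pivot(index="num", columns="type", values="values").reset_index()
wide_df.columns.name = None  # drop the leftover 'type' label on the columns

# Wide -> long: melt the 'a' and 'b' columns back into (type, values) pairs.
round_trip = wide_df.melt(id_vars="num", var_name="type", value_name="values")

print(wide_df)
print(round_trip)
```

melt emits the rows column by column (all of a, then all of b), which here happens to match the original row order, so no extra sort is needed.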
I have a dataframe-
data={'a':[1,2,3,6],'b':[5,6,7,6],'c':[45,77,88,99]}
df=pd.DataFrame(data)
Now I want to add a column whose values start two rows down in the dataframe.
The updated dataframe should look like-
l=[4,5] #column to add
a b c d
0 1 5 45 0
1 2 6 77 0
2 3 7 88 4
3 6 6 99 5
I did this-
df.loc[:2,'f'] = pd.Series(l)
The idea is to build a Series from the list, indexed by the last len(l) labels of the dataframe's index:
df['d'] = pd.Series(l, index=df.index[-len(l):])
print (df)
a b c d
0 1 5 45 NaN
1 2 6 77 NaN
2 3 7 88 4.0
3 6 6 99 5.0
Finally, to get 0 instead of NaN, add Series.reindex with the original index and fill_value=0:
df['d'] = pd.Series(l, index=df.index[-len(l):]).reindex(df.index, fill_value=0)
print (df)
a b c d
0 1 5 45 0
1 2 6 77 0
2 3 7 88 4
3 6 6 99 5
Another idea is to repeat 0 for the difference in lengths and then append l:
df['d'] = [0] * (len(df) - len(l)) + l
print (df)
a b c d
0 1 5 45 0
1 2 6 77 0
2 3 7 88 4
3 6 6 99 5
You can add a column of 0s and then set the last rows by position:
>>> df
a b c
0 1 5 45
1 2 6 77
2 3 7 88
3 6 6 99
>>> df['d'] = 0
>>> df.iloc[-2:, df.columns.get_loc('d')] = [4,5]
>>> df
a b c d
0 1 5 45 0
1 2 6 77 0
2 3 7 88 4
3 6 6 99 5
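The approaches above can be wrapped in a small helper. This is only a sketch: the function name add_bottom_aligned and its parameters are invented for illustration, and it assumes the list is never longer than the dataframe.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 6], "b": [5, 6, 7, 6], "c": [45, 77, 88, 99]})
l = [4, 5]

def add_bottom_aligned(frame, values, name, fill_value=0):
    """Return a copy of `frame` with `values` placed in the last rows of a new column."""
    if len(values) > len(frame):
        raise ValueError("more values than rows")
    # Index the values by the last len(values) labels of the frame's index.
    tail = pd.Series(values, index=frame.index[len(frame) - len(values):])
    # Reindex to the full index; rows without a value get fill_value.
    return frame.assign(**{name: tail.reindex(frame.index, fill_value=fill_value)})

out = add_bottom_aligned(df, l, "d")
print(out)
```

Slicing with len(frame) - len(values) instead of -len(values) also behaves sensibly for an empty list, where index[-0:] would select every row.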
I have a dataframe that looks like this, but with 26 rows and 110 columns:
index/io 1 2 3 4
0 42 53 23 4
1 53 24 6 12
2 63 12 65 34
3 13 64 23 43
Desired output:
index io value
0 1 42
0 2 53
0 3 23
0 4 4
1 1 53
1 2 24
1 3 6
1 4 12
2 1 63
2 2 12
...
I have tried with dicts and lists: I transformed the dataframe to a dict, then created a new list with the index values and updated a new dict with io.
indx = []
for key, value in mydict.items():
    for k, v in value.items():
        indx.append(key)
indxio = {}
for element in indx:
    for key, value in mydict.items():
        for k, v in value.items():
            indxio.update({element: k})
I know this is probably far off, but it's the only thing I could think of. The process took too long, so I stopped it.
You can use set_index, stack, and reset_index().
df.set_index("index/io").stack().reset_index(name="value")\
.rename(columns={'index/io':'index','level_1':'io'})
Output:
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
You need set_index + stack + rename_axis + reset_index:
df = df.set_index('index/io').stack().rename_axis(('index','io')).reset_index(name='value')
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
A solution with melt and rename. melt returns the values in a different order (column by column), so sort_values is necessary:
d = {'index/io':'index'}
df = df.melt('index/io', var_name='io', value_name='value') \
.rename(columns=d).sort_values(['index','io']).reset_index(drop=True)
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
An alternative solution for numpy lovers:
import numpy as np
df = df.set_index('index/io')
a = np.repeat(df.index, len(df.columns))
b = np.tile(df.columns, len(df.index))
c = df.values.ravel()
cols = ['index','io','value']
df = pd.DataFrame(np.column_stack([a,b,c]), columns = cols)
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
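To compare the stack and melt routes side by side, here is a minimal sketch on a cut-down version of the data. It assumes the io column labels are strings; the names wide, stacked and melted are made up for this example.

```python
import pandas as pd

# Cut-down wide frame; the io labels are assumed to be strings.
wide = pd.DataFrame({
    "index/io": [0, 1],
    "1": [42, 53],
    "2": [53, 24],
    "3": [23, 6],
})

# Route 1: stack keeps the rows grouped by the original index.
stacked = (
    wide.set_index("index/io")
        .stack()
        .rename_axis(["index", "io"])
        .reset_index(name="value")
)

# Route 2: melt goes column by column, so it needs an explicit sort.
melted = (
    wide.melt("index/io", var_name="io", value_name="value")
        .rename(columns={"index/io": "index"})
        .sort_values(["index", "io"])
        .reset_index(drop=True)
)

print(stacked)
print(melted)
```

With string labels the sort on io is lexicographic, so with ten or more io columns the melt route would order "10" before "2" unless the labels are converted to integers first.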
Hi, I am trying to do a transpose operation in pandas, with the condition that the value of one column stays associated with the transposed rows.
The example below explains it better.
The data looks like this:
A 1 2 3 4 51 52 53 54
B 11 22 23 24 71 72 73 74
The result I am trying to get looks like this:
A 1 51
A 2 52
A 3 53
A 4 54
B 11 71
B 22 72
B 23 73
B 24 74
In the first row, the data is in a single row; I want to transpose the values 1 to 4 with the value 'A' repeated in another column. Can anyone suggest how I can do this?
It seems you need melt or stack:
print (df)
0 1 2 3 4
0 A 1 2 3 4
1 B 11 22 23 24
df1 = pd.melt(df, id_vars=0).drop('variable', axis=1).sort_values(0)
df1.columns = list('ab')
print (df1)
a b
0 A 1
2 A 2
4 A 3
6 A 4
1 B 11
3 B 22
5 B 23
7 B 24
df2 = df.set_index(0).stack().reset_index(level=1, drop=True).reset_index(name='a')
df2.columns = list('ab')
print (df2)
a b
0 A 1
1 A 2
2 A 3
3 A 4
4 B 11
5 B 22
6 B 23
7 B 24
EDIT by comment:
#set index with first column
df = df.set_index(0)
#create MultiIndex
cols = np.arange(len(df.columns))
df.columns = [ cols // 4, cols % 4]
print (df)
0 1
0 1 2 3 0 1 2 3
0
A 1 2 3 4 51 52 53 54
B 11 22 23 24 71 72 73 74
#stack, reset index names, remove level and reset index
df1 = df.stack().rename_axis((None, None)).reset_index(level=1, drop=True).reset_index()
#set new columns names
df1.columns = ['a','b','c']
print (df1)
a b c
0 A 1 51
1 A 2 52
2 A 3 53
3 A 4 54
4 B 11 71
5 B 22 72
6 B 23 73
7 B 24 74
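The whole MultiIndex approach can be run end to end as in the sketch below. The block size width = 4 is an assumption taken from the example, and the axis names key and pos are invented here just to avoid a name clash when the index is reset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["A", 1, 2, 3, 4, 51, 52, 53, 54],
                   ["B", 11, 22, 23, 24, 71, 72, 73, 74]])

width = 4  # assumed number of values per block
wide = df.set_index(0)

# Two-level columns: (block number, position within the block).
pos = np.arange(wide.shape[1])
wide.columns = pd.MultiIndex.from_arrays([pos // width, pos % width])

long_df = (
    wide.stack()                             # inner level (position) moves into the rows
        .rename_axis(index=["key", "pos"])   # rename so reset_index does not collide with column 0
        .reset_index(level="pos", drop=True)
        .reset_index()
)
long_df.columns = ["a", "b", "c"]
print(long_df)
```

Each block of width columns ends up as one output column, so a row with three blocks would produce three value columns and the final column list would need a fourth name.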
I have two dataframes. The first one, df1, contains only one row:
A B C D E
0 5 8 9 5 0
and the second one has multiple rows, but the same number of columns:
D C E A B
0 5 0 3 3 7
1 9 3 5 2 4
2 7 6 8 8 1
3 6 7 7 8 1
4 5 9 8 9 4
5 3 0 3 5 0
6 2 3 8 1 3
7 3 3 7 0 1
8 9 9 0 4 7
9 3 2 7 2 0
In the real example I have many more columns (more than 100). Both dataframes have the same number of columns and the same column names, but the order of the columns is different, as shown in the example.
I need to multiply the two dataframes element-wise, row by row, but I can't simply do df2.values * df1.values because the columns are not ordered in the same way. For instance, the second column of df1, B, can't be multiplied by the second column of df2, because that column is C; column B is the 5th column in df2.
Is there a simple and pythonic solution to multiply the dataframes that takes the column names into account and not the column positions?
df1[df2.columns] returns a dataframe where the columns are ordered as in df2:
df1
Out[91]:
A B C D E
0 3 8 9 5 0
df1[df2.columns]
Out[92]:
D C E A B
0 5 9 0 3 8
So, you just need:
df2.values * df1[df2.columns].values
This will raise a key error if you have additional columns in df2; and it will only select df2's columns even if you have more columns in df1.
As @MaxU noted, since you are operating on numpy arrays, in order to go back to the dataframe structure you will need:
pd.DataFrame(df2.values * df1[df2.columns].values, columns = df2.columns)
You can use mul; df1 is converted to a Series with iloc (the older ix indexer has been removed from pandas):
print(df1.iloc[0])
A 5
B 8
C 9
D 5
E 0
Name: 0, dtype: int64
print(df2.mul(df1.iloc[0]))
A B C D E
0 15 56 0 25 0
1 10 32 27 45 0
2 40 8 54 35 0
3 40 8 63 30 0
4 45 32 81 25 0
5 25 0 0 15 0
6 5 24 27 10 0
7 0 8 27 15 0
8 20 56 81 45 0
9 10 0 18 15 0
If you need to change the column order of the final DataFrame, use reindex (reindex_axis has been removed from pandas):
print(df2.mul(df1.iloc[0]).reindex(columns=df2.columns))
D C E A B
0 25 0 0 15 56
1 45 27 0 10 32
2 35 54 0 40 8
3 30 63 0 40 8
4 25 81 0 45 32
5 15 0 0 25 0
6 10 27 0 5 24
7 15 27 0 0 8
8 45 81 0 20 56
9 15 18 0 10 0
Another solution is to reorder the Series first, by reindexing its index with df2.columns:
print(df2.mul(df1.iloc[0].reindex(df2.columns)))
D C E A B
0 25 0 0 15 56
1 45 27 0 10 32
2 35 54 0 40 8
3 30 63 0 40 8
4 25 81 0 45 32
5 15 0 0 25 0
6 10 27 0 5 24
7 15 27 0 0 8
8 45 81 0 20 56
9 15 18 0 10 0
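Here is a self-contained sketch of the label-aligned multiplication, using the question's df1 and only the first two rows of df2. The names weights and result are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [5], "B": [8], "C": [9], "D": [5], "E": [0]})
df2 = pd.DataFrame({"D": [5, 9], "C": [0, 3], "E": [3, 5], "A": [3, 2], "B": [7, 4]})

# Turn the one-row frame into a Series indexed by column name.
weights = df1.iloc[0]

# mul aligns the Series index with df2's column labels, not with positions.
result = df2.mul(weights)

# Restore df2's column order, since alignment may reorder the columns.
result = result[df2.columns]
print(result)
```

Because the alignment is by label, a column present in only one of the two objects would come out as NaN rather than raising an error, which is worth checking for when the real data has more than 100 columns.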
Say I have a Pandas DataFrame whose data look like
import numpy as np
import pandas as pd
n = 30
df = pd.DataFrame({'a': np.arange(n),
'b': np.random.choice([0, 1, 2], n),
'c': np.arange(n)})
Question: how can I permute the groups (grouped by column b)?
I don't mean a permutation within each group, but a permutation at the group level.
Example
Before
a b c
1 0 1
2 0 2
3 1 3
4 1 4
5 2 5
6 2 6
After
a b c
3 1 3
4 1 4
1 0 1
2 0 2
5 2 5
6 2 6
Basically, before the permutation df['b'].unique() == [0, 1, 2], and after the permutation df['b'].unique() == [1, 0, 2].
Here's an answer inspired by the accepted answer to this SO post, which uses a temporary Categorical column as a sorting key to do custom sort orderings. In this answer, I produce all permutations, but you can just take the first one if you are looking for only one.
import itertools
df_results = list()
orderings = itertools.permutations(df["b"].unique())
for ordering in orderings:
    df_2 = df.copy()
    df_2["b_key"] = pd.Categorical(df_2["b"], [i for i in ordering])
    df_2.sort_values("b_key", inplace=True)
    df_2.drop(["b_key"], axis=1, inplace=True)
    df_results.append(df_2)
for df in df_results:
    print(df)
The idea here is that we create a new categorical variable each time, with a slightly different enumerated order, then sort by it. We discard it at the end once we no longer need it.
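If only one specific ordering is needed, the same Categorical idea can be sketched without the loop. The target order group_order is an assumption taken from the question's example, and the stable mergesort is chosen here so rows keep their original order within each group.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6],
                   "b": [0, 0, 1, 1, 2, 2],
                   "c": [1, 2, 3, 4, 5, 6]})

group_order = [1, 0, 2]  # assumed target order of the groups

# An ordered categorical sorts by position in `categories`, not by value.
key = pd.Categorical(df["b"], categories=group_order, ordered=True)

permuted = (
    df.assign(b_key=key)
      .sort_values("b_key", kind="mergesort")  # stable sort keeps within-group order
      .drop(columns="b_key")
)
print(permuted)
```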
If I understood your question correctly, you can do it this way:
n = 30
df = pd.DataFrame({'a': np.arange(n),
'b': np.random.choice([0, 1, 2], n),
'c': np.arange(n)})
order = pd.Series([1,0,2])
cols = df.columns
df['idx'] = df.b.map(order)
index = df.index
df = df.reset_index().sort_values(['idx', 'index'])[cols]
Step by step:
In [103]: df['idx'] = df.b.map(order)
In [104]: df
Out[104]:
a b c idx
0 0 2 0 2
1 1 0 1 1
2 2 1 2 0
3 3 0 3 1
4 4 1 4 0
5 5 1 5 0
6 6 1 6 0
7 7 2 7 2
8 8 0 8 1
9 9 1 9 0
10 10 0 10 1
11 11 1 11 0
12 12 0 12 1
13 13 2 13 2
14 14 0 14 1
15 15 2 15 2
16 16 1 16 0
17 17 2 17 2
18 18 1 18 0
19 19 1 19 0
20 20 0 20 1
21 21 0 21 1
22 22 1 22 0
23 23 1 23 0
24 24 2 24 2
25 25 0 25 1
26 26 0 26 1
27 27 0 27 1
28 28 1 28 0
29 29 1 29 0
In [105]: df.reset_index().sort_values(['idx', 'index'])
Out[105]:
index a b c idx
2 2 2 1 2 0
4 4 4 1 4 0
5 5 5 1 5 0
6 6 6 1 6 0
9 9 9 1 9 0
11 11 11 1 11 0
16 16 16 1 16 0
18 18 18 1 18 0
19 19 19 1 19 0
22 22 22 1 22 0
23 23 23 1 23 0
28 28 28 1 28 0
29 29 29 1 29 0
1 1 1 0 1 1
3 3 3 0 3 1
8 8 8 0 8 1
10 10 10 0 10 1
12 12 12 0 12 1
14 14 14 0 14 1
20 20 20 0 20 1
21 21 21 0 21 1
25 25 25 0 25 1
26 26 26 0 26 1
27 27 27 0 27 1
0 0 0 2 0 2
7 7 7 2 7 2
13 13 13 2 13 2
15 15 15 2 15 2
17 17 17 2 17 2
24 24 24 2 24 2
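For a random group order rather than a fixed one, the mapping idea can be sketched as follows. This is a variation, not the code above: it draws the order with numpy's random generator and uses a stable argsort in place of the reset_index and two-key sort. The names rng, shuffled, rank and permuted are invented for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": np.arange(12),
                   "b": rng.integers(0, 3, 12),
                   "c": np.arange(12)})

# Draw a random order for the group labels.
shuffled = rng.permutation(df["b"].unique())

# Map each group label to its position in the shuffled order.
rank = pd.Series(np.arange(len(shuffled)), index=shuffled)

# A stable argsort on the mapped ranks keeps rows in their original order inside each group.
order = df["b"].map(rank).to_numpy().argsort(kind="stable")
permuted = df.iloc[order]
print(permuted)
```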