Combine multiple columns into one in pandas [duplicate] - python

I have a table like the one below
Column 1 Column 2 Column 3 ...
0 a 1 2
1 b 1 3
2 c 2 1
and I want to convert it to look like this
Column 1 Column 2
0 a 1
1 a 2
2 b 1
3 b 3
4 c 2
5 c 1
...
I want to pair each value from Column 2 (and every later column) with the value in Column 1 from the same row. I have no idea how to do this in pandas, or even where to start.

You can use pd.melt to do this:
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
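Note that melt stacks its output one variable at a time, whereas the question asks for the values grouped row by row. A minimal sketch of one way to get that ordering, assuming pandas 1.1 or later (for the ignore_index argument) and using the question's data under the illustrative column names col1, col2 and col3:

```python
import pandas as pd

# The question's table, with illustrative column names
df = pd.DataFrame({"col1": ["a", "b", "c"],
                   "col2": [1, 1, 2],
                   "col3": [2, 3, 1]})

long_df = (
    pd.melt(df, id_vars=["col1"], ignore_index=False)  # keep the original row labels
    .sort_index(kind="mergesort")      # stable sort: values from the same row stay in column order
    .drop(columns="variable")          # the source column name is not needed here
    .rename(columns={"value": "col2"})
    .reset_index(drop=True)
)
print(long_df)
```

The stable sort matters: it keeps the col2 value ahead of the col3 value for each original row.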

Here's my approach, hope it helps:
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 1, 2], 'col3': [2, 3, 1]})
new_df = pd.DataFrame(columns=['col1', 'col2'])
for index, row in df.iterrows():
    # pair the first column's value with every remaining value in the row
    for element in row.values[1:]:
        new_df.loc[len(new_df)] = [row.iloc[0], element]
print(new_df)
Output:
col1 col2
0 a 1
1 a 2
2 b 1
3 b 3
4 c 2
5 c 1

Related

How do I flatten A Python DataFrame by an index and a list of values? [duplicate]

Suppose that I have the following code
import pandas as pd
cars = {'Index': [1, 2, 3, 4],
'Values': ['A, B, C, D', 'A, B', 'C', 'D']
}
df = pd.DataFrame(cars, columns = ['Index', 'Values'])
print (df)
which creates a DataFrame that looks like this...
Index Values
0 1 A, B, C, D
1 2 A, B
2 3 C
3 4 D
How do I take that Dataframe, and create a new one which looks like this...
Index Values
0 1 A
1 1 B
2 1 C
3 1 D
4 2 A
5 2 B
6 3 C
7 4 D
df['Values'] = df['Values'].str.split(", ")
df = df.explode('Values').reset_index(drop=True)
Index Values
0 1 A
1 1 B
2 1 C
3 1 D
4 2 A
5 2 B
6 3 C
7 4 D
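If the spacing around the commas is not consistent, the separator cannot be matched exactly. One possible variation, offered as a sketch rather than the answer's own method, is to split on the bare comma and strip the whitespace from each piece after exploding:

```python
import pandas as pd

df = pd.DataFrame({"Index": [1, 2, 3, 4],
                   "Values": ["A, B, C, D", "A, B", "C", "D"]})

# Split each string into a list, then give every list element its own row
out = (
    df.assign(Values=df["Values"].str.split(","))
    .explode("Values")
    .reset_index(drop=True)
)
# Remove the spaces left over around each piece
out["Values"] = out["Values"].str.strip()
print(out)
```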

Filling a column with its header value

How can I create a new column D and fill it with its own header value (i.e. not hard-coded as 'D', but whatever name is passed as the column header)?
import pandas as pd
df = pd.DataFrame({'B': [1, 2, 3], 'C': [4, 5, 6]})
Desired output:
index B C D
0 1 4 D
1 2 5 D
2 3 6 D
One way is the following (if you know the header in advance):
df['D'] = 'D'
>>> df
B C D
0 1 4 D
1 2 5 D
2 3 6 D
Or if your 'D' column is initially empty, e.g.
>>> df
B C D
0 1 4
1 2 5
2 3 6
then the following works too:
header = list(df.columns)[-1]
df[header] = header
>>> df
B C D
0 1 4 D
1 2 5 D
2 3 6 D
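When the header arrives as a variable, the same idea can be wrapped in a small helper. This is only a sketch; add_header_column is a hypothetical name, not a pandas function:

```python
import pandas as pd

def add_header_column(df, name):
    """Return a copy of df with a new column called `name`, filled with that name."""
    # assign() takes keyword arguments, so unpack a one-item dict
    # to use a column name that is only known at run time
    return df.assign(**{name: name})

df = pd.DataFrame({"B": [1, 2, 3], "C": [4, 5, 6]})
out = add_header_column(df, "D")
print(out)
```

Using assign returns a new DataFrame and leaves the original untouched; df[name] = name does the same thing in place.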

How can I add a column to a pandas DataFrame that uniquely identifies grouped data? [duplicate]

Given the following data frame:
import pandas as pd
import numpy as np
df=pd.DataFrame({'A':['A','A','A','B','B','B'],
'B':['a','a','b','a','a','a'],
})
df
A B
0 A a
1 A a
2 A b
3 B a
4 B a
5 B a
I'd like to create column 'C', which numbers the rows within each group in columns A and B like this:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
I've tried this so far:
df['C']=df.groupby(['A','B'])['B'].transform('rank')
...but it doesn't work!
Use groupby/cumcount:
In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
Use the groupby.rank function.
Here is a working example.
df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df
C1 C2
a 1
a 2
a 3
b 4
b 5
df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df
C1 C2 RANK
a 1 1.0
a 2 2.0
a 3 3.0
b 4 1.0
b 5 2.0
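One detail worth knowing: rank returns floating-point values. If an integer counter is wanted, a cast can be added; the sketch below assumes the group has no missing values, since NaN ranks cannot be cast to int:

```python
import pandas as pd

df = pd.DataFrame({"C1": ["a", "a", "a", "b", "b"], "C2": [1, 2, 3, 4, 5]})

# rank() yields float64; method="first" breaks ties by order of appearance
ranks = df.groupby("C1")["C2"].rank(method="first", ascending=True)

# Cast to int for a plain 1, 2, 3, ... counter within each group
df["RANK"] = ranks.astype(int)
print(df)
```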

in Pandas, how to create a variable that is n for the nth observation within a group?

consider this
df = pd.DataFrame({'B': ['a', 'a', 'b', 'b'], 'C': [1, 2, 6,2]})
df
Out[128]:
B C
0 a 1
1 a 2
2 b 6
3 b 2
I want to create a variable that simply corresponds to the ordering of observations after sorting by 'C' within each groupby('B') group.
df.sort_values(['B','C'])
Out[129]:
B C order
0 a 1 1
1 a 2 2
3 b 2 1
2 b 6 2
How can I do that? I am thinking about creating a column that is one, and using cumsum but that seems too clunky...
I think you can use range with len(df):
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3],
'B': ['a', 'a', 'b'],
'C': [5, 3, 2]})
print(df)
A B C
0 1 a 5
1 2 a 3
2 3 b 2
df.sort_values(by='C', inplace=True)
#or without inplace
#df = df.sort_values(by='C')
print(df)
A B C
2 3 b 2
1 2 a 3
0 1 a 5
df['order'] = range(1,len(df)+1)
print(df)
A B C order
2 3 b 2 1
1 2 a 3 2
0 1 a 5 3
EDIT by comment:
I think you can use groupby with cumcount:
import pandas as pd
df = pd.DataFrame({'B': ['a', 'a', 'b', 'b'], 'C': [1, 2, 6,2]})
df.sort_values(['B','C'], inplace=True)
#or without inplace
#df = df.sort_values(['B','C'])
print(df)
B C
0 a 1
1 a 2
3 b 2
2 b 6
df['order'] = df.groupby('B', sort=False).cumcount() + 1
print(df)
B C order
0 a 1 1
1 a 2 2
3 b 2 1
2 b 6 2
Nothing wrong with Jezrael's answer but there's a simpler (though less general) method in this particular example. Just add groupby to JohnGalt's suggestion of using rank.
>>> df['order'] = df.groupby('B')['C'].rank()
B C order
0 a 1 1.0
1 a 2 2.0
2 b 6 2.0
3 b 2 1.0
In this case, you don't really need the ['C'] but it makes the ranking a little more explicit and if you had other unrelated columns in the dataframe then you would need it.
But if you are ranking by more than 1 column, you should use Jezrael's method.
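As a sketch of that multi-column case, the example below sorts by two columns inside each group and then numbers the rows with cumcount. The extra column D and its values are made up for illustration:

```python
import pandas as pd

# 'D' is a hypothetical tie-breaker column added for this example
df = pd.DataFrame({"B": ["a", "a", "b", "b"],
                   "C": [1, 1, 6, 2],
                   "D": [5, 3, 0, 9]})

# Sort by the group key first, then by every column that defines the order
ordered = df.sort_values(["B", "C", "D"])

# Number rows within each group; assignment aligns on the original index,
# so df keeps its original row order
df["order"] = ordered.groupby("B", sort=False).cumcount() + 1
print(df)
```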

Pandas moving window over rows

Is there a way to apply a function over a moving window centered on the current row? For example:
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
C
0 2
1 4
2 6
The desired result adds a column D that is the average of column C over the previous, current, and following rows, that is:
row 0 => D = (2 + 4)/2 = 3
row 1 => D = (2 + 4 + 6)/3 = 4
row 2 => D = (4 + 6)/2 = 5
>>> df_final
C D
0 2 3
1 4 4
2 6 5
It looks like you just want a rolling mean, with a centred window of 3. For example:
>>> df["D"] = pd.rolling_mean(df["C"], window=3, center=True, min_periods=2)
>>> df
A B C D
0 a 1 2 3
1 b 3 4 4
2 c 5 6 5
Updated answer: pd.rolling_mean was deprecated in 0.18 and is no longer available as of pandas=0.23.4.
Window functions are now methods
Window functions have been refactored to be methods on Series/DataFrame objects, rather than top-level functions, which are now deprecated. This allows these window-type functions, to have a similar API to that of .groupby.
Either call it as a method on the Series (or DataFrame):
In [55]: df['D'] = df['C'].rolling(window=3, center=True, min_periods=2).mean()
In [56]: df
Out[56]:
A B C D
0 a 1 2 3.0
1 b 3 4 4.0
2 c 5 6 5.0
Or construct it directly from pandas.core.window.Rolling (an internal class rather than part of the public API):
In [57]: df['D'] = pd.core.window.Rolling(df['C'], window=3, center=True, min_periods=2).mean()
In [58]: df
Out[58]:
A B C D
0 a 1 2 3.0
1 b 3 4 4.0
2 c 5 6 5.0
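Since the question asks about applying a function in general, here is a self-contained sketch that repeats the centred mean and then passes a custom function through apply. The spread function (max minus min) is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({"C": [2, 4, 6]})

# Centred window of 3 rows; min_periods=2 lets the edge rows use 2 values
window = df["C"].rolling(window=3, center=True, min_periods=2)

# Built-in aggregation: the mean of previous, current and next row
df["D"] = window.mean()

# Arbitrary function: raw=True passes each window as a NumPy array
df["spread"] = window.apply(lambda w: w.max() - w.min(), raw=True)
print(df)
```

At the first and last rows the window holds only two values, which is why min_periods=2 is needed to avoid NaN there.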
