How to convert the header row into a new column in Python pandas? - python

I have the following DataFrame:
A,B,C
1,2,3
I need to convert it to the following format:
cols,vals
A,1
B,2
C,3
How can I turn the column names into a new column in pandas?

You can transpose the DataFrame with T:
import pandas as pd
df = pd.DataFrame({'A': {0: 1}, 'B': {0: 2}, 'C': {0: 3}})
print (df)
A B C
0 1 2 3
print (df.T)
0
A 1
B 2
C 3
df1 = df.T.reset_index()
df1.columns = ['cols','vals']
print (df1)
cols vals
0 A 1
1 B 2
2 C 3
If the DataFrame has more rows, you can use:
import pandas as pd
df = pd.DataFrame({'A': {0: 1, 1: 9, 2: 1},
                   'B': {0: 2, 1: 4, 2: 8},
                   'C': {0: 3, 1: 6, 2: 7}})
print (df)
A B C
0 1 2 3
1 9 4 6
2 1 8 7
df.index = 'vals' + df.index.astype(str)
print (df.T)
vals0 vals1 vals2
A 1 9 1
B 2 4 8
C 3 6 7
df1 = df.T.reset_index().rename(columns={'index':'cols'})
print (df1)
cols vals0 vals1 vals2
0 A 1 9 1
1 B 2 4 8
2 C 3 6 7
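For the single-row case, DataFrame.melt is another option that skips the transpose and the manual column renaming. This is a minimal sketch, assuming the same one-row frame as in the question; the names cols and vals come from the desired output:

```python
import pandas as pd

# One-row frame from the question
df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})

# With no id_vars, melt unpivots every column:
# var_name labels the former column names, value_name labels the values
df1 = df.melt(var_name='cols', value_name='vals')
print(df1)
```

With more than one row, melt stacks all the values into a single vals column, so the transpose approach above is the one to use if each original row should become its own column.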

Related

Combine multiple columns into one in pandas [duplicate]

I have a table like the one below:
Column 1 Column 2 Column 3 ...
0 a 1 2
1 b 1 3
2 c 2 1
and I want to convert it to be like below
Column 1 Column 2
0 a 1
1 a 2
2 b 1
3 b 3
4 c 2
5 c 1
...
I want to take each value from Column 2 (and so on) and pair it with the value in Column 1. I have no idea how to do this in pandas or even where to start.
You can use pd.melt to do this:
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
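The melt output above is grouped by source column (all B rows, then all C rows), whereas the question asks for the values interleaved row by row. The following is a minimal sketch of one way to get that order, assuming pandas 1.1 or later for the ignore_index parameter. The column names are taken from the question; the variable names long and out are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Column 1': ['a', 'b', 'c'],
                   'Column 2': [1, 1, 2],
                   'Column 3': [2, 3, 1]})

# ignore_index=False keeps the original row labels (0, 1, 2, 0, 1, 2)
long = df.melt(id_vars='Column 1', ignore_index=False)

out = (long.sort_index(kind='mergesort')           # stable sort regroups by original row
           .drop(columns='variable')               # the source column name is not needed
           .rename(columns={'value': 'Column 2'})
           .reset_index(drop=True))
print(out)
```

The stable sort matters here: it keeps the Column 2 value ahead of the Column 3 value within each original row.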
Here's my approach, hope it helps:
import pandas as pd
df=pd.DataFrame({'col1':['a','b','c'],'col2':[1,1,2],'col3':[2,3,1]})
new_df=pd.DataFrame(columns=['col1','col2'])
for index, row in df.iterrows():
    # pair the first column's value with every remaining value in the row
    for element in row.values[1:]:
        new_df.loc[len(new_df)] = [row.iloc[0], element]
print(new_df)
Output:
col1 col2
0 a 1
1 a 2
2 b 1
3 b 3
4 c 2
5 c 1

Taking rows from dataframe until a condition is met

I have a dataframe with two columns:
A B
0 1 3
1 2 2
2 3 2
3 9 3
4 1 1
...
For a given index i, I want the rows from row i through the first row j for which df.at[j, 'A'] - df.at[i, 'B'] > 5. I don't want any rows after row j.
For example, let i=1, the output should be:
[out]
A B
2 2
3 2
9 3
Is there a simple way to do this without using loops?
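import pandas as pd
df = pd.DataFrame({'A': [10, 1, 2, 3, 9], 'B': [1, 3, 2, 2, 3]})
i = 2
base = df.at[i, 'B']
df = df.loc[i:]
# rows from i onward where A exceeds B at row i by more than 5
j = df[df['A'] - base > 5]
if not j.empty:
    # .loc slices by label and includes the end label
    print(df.loc[:j.index[0]])
else:
    print('Condition not found')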
Prints:
A B
2 2 2
3 3 2
4 9 3
You could try as follows:
import pandas as pd
data = {'A': {0: 10, 1: 2, 2: 3, 3: 9}, 'B': {0: 3, 1: 2, 2: 2, 3: 3}}
df = pd.DataFrame(data)
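i = 1
# True where A exceeds B at row i by more than 5
s = df.loc[i:, 'A'] - df.loc[i, 'B'] > 5
trues = s[s]
if not trues.empty:
    # idxmax gives the first True label; iloc works here because the index is 0..n-1
    subset = df.iloc[i:trues.idxmax() + 1]
else:
    subset = pd.DataFrame()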
print(subset)
A B
1 2 2
2 3 2
3 9 3
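Both answers can be wrapped into a small reusable helper. This is a sketch only: the function name rows_until and its threshold parameter are hypothetical, and it works by position, so it does not depend on the index being 0..n-1. It uses the data from the question:

```python
import pandas as pd

def rows_until(df, i, threshold=5):
    """Rows from position i through the first row j where A[j] - B[i] > threshold."""
    tail = df.iloc[i:]
    hit = (tail['A'] - df['B'].iloc[i] > threshold).to_numpy()
    if not hit.any():
        return df.iloc[0:0]              # empty frame with the same columns
    # argmax on a boolean array returns the position of the first True
    return tail.iloc[:hit.argmax() + 1]

df = pd.DataFrame({'A': [1, 2, 3, 9, 1], 'B': [3, 2, 2, 3, 1]})
result = rows_until(df, 1)
print(result)
```

Returning an empty frame when the condition is never met is a design choice made for this sketch; returning all remaining rows would be an equally reasonable reading of the question.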

How to convert a time-series ranking table to an individual rank table in a pandas DataFrame (Python)

For example, ranktable is:
time/rank 1 2 3
1 a b c
2 b c a
and I want to convert this to each individual's rank by time:
time/individual a b c
1 1 2 3
2 3 1 2
As pandas DataFrames, the code is below:
ranktable = pd.DataFrame([{
'time': 1,
1: 'a',
2: 'b',
3: 'c'
},{
'time': 2,
1: 'b',
2: 'c',
3: 'a'
}])
resultIWant = pd.DataFrame([{
'time': 1,
'a': 1,
'b': 2,
'c': 3
}, {
'time': 2,
'a': 3,
'b': 1,
'c': 2
}])
Is there an easy way to convert it?
Use DataFrame.melt with DataFrame.pivot:
df = (ranktable.melt('time')
               .pivot(index='time', columns='value', values='variable')
               .rename_axis(None, axis=1)
               .reset_index())
print (df)
time a b c
0 1 1 2 3
1 2 3 1 2
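To see why melt followed by pivot works, it can help to look at the intermediate long-format table. This is a minimal sketch using the ranktable from the question; the names rank, individual, long and wide are illustrative choices, not part of the original answer:

```python
import pandas as pd

ranktable = pd.DataFrame([{'time': 1, 1: 'a', 2: 'b', 3: 'c'},
                          {'time': 2, 1: 'b', 2: 'c', 3: 'a'}])

# Long format: one row per (time, rank, individual) triple
long = ranktable.melt('time', var_name='rank', value_name='individual')
print(long)

# Pivot swaps the roles: individuals become the columns, ranks become the values
wide = long.pivot(index='time', columns='individual', values='rank')
print(wide)
```

The pivot arguments are passed by keyword because recent pandas versions no longer accept them positionally.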
Use pandas.DataFrame.apply:
new_df = ranktable.set_index("time").apply(lambda x: pd.Series(x.index, index=x), axis=1)
print(new_df)
Output:
a b c
time
1 1 2 3
2 3 1 2

Remove multivalued columns

I have a DataFrame:
A B
0 2 3
1 2 4
2 3 5
If a column has more than 2 distinct values, I want to remove it.
Expected output:
A
0 2
1 2
2 3
You can use .nunique() and .loc, passing a boolean mask:
df = pd.DataFrame({'A': {0: 2, 1: 2, 2: 3}, 'B': {0: 3, 1: 4, 2: 5}})
df.loc[:, (df.nunique() <= 2)]
A
0 2
1 2
2 3
An alternative approach (credit to this answer):
criteria = df.nunique() <= 2
df[criteria.index[criteria]]
Use a for loop and value_counts to get the result:
df = pd.DataFrame(data={'A': [2, 2, 3], 'B': [3, 4, 5]})
for var in df.columns:
    # value_counts has one entry per distinct value
    result = df[var].value_counts()
    if len(result) > 2:
        df.drop(var, axis=1, inplace=True)
df
Output
A
0 2
1 2
2 3
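The nunique approach can be packaged as a helper that leaves the original frame untouched. This is a sketch under stated assumptions: the function name drop_multivalued and the max_unique parameter are hypothetical, and nunique ignores missing values by default:

```python
import pandas as pd

def drop_multivalued(df, max_unique=2):
    """Keep only the columns with at most max_unique distinct non-null values."""
    keep = df.nunique() <= max_unique   # boolean Series indexed by column name
    return df.loc[:, keep]

df = pd.DataFrame({'A': [2, 2, 3], 'B': [3, 4, 5]})
result = drop_multivalued(df)
print(result)
```

Unlike the loop with inplace=True, this returns a new frame, so df still has both columns afterwards.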

New column in dataframe based on location of values in another column

I am trying to create a new column 'ratioA' in a dataframe df whereby the values are related to a column A:
For a given row, df['ratioA'] is equal to the ratio between df['A'] in that row and the next row.
I iterated over the index column as a reference, but I am not sure why all the values appear as NaN. Technically, only the last row should be NaN.
import numpy as np
import pandas as pd
series1 = pd.Series({'A': 1, 'B': 2})
series2 = pd.Series({'A': 3, 'B': 4})
series3 = pd.Series({'A': 5, 'B': 6})
series4 = pd.Series({'A': 7, 'B': 8})
df = pd.DataFrame([series1, series2, series3, series4], index=[0,1,2,3])
df = df.reset_index()
for i in df['index']:
    df['ratioA'] = df['A'][df['index']==i]/df['A'][df['index']==i+1]
print (df)
The output is:
index A B ratioA
0 0 1 2 NaN
1 1 3 4 NaN
2 2 5 6 NaN
3 3 7 8 NaN
The desired output should be:
index A B ratioA
0 0 1 2 0.33
1 1 3 4 0.60
2 2 5 6 0.71
3 3 7 8 NaN
You can use a vectorized solution: divide column A by its shifted version with div:
print (df['A'].shift(-1))
0 3.0
1 5.0
2 7.0
3 NaN
Name: A, dtype: float64
df['ratioA'] = df['A'].div(df['A'].shift(-1))
print (df)
index A B ratioA
0 0 1 2 0.333333
1 1 3 4 0.600000
2 2 5 6 0.714286
3 3 7 8 NaN
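As for why the loop in the question produces only NaN: pandas aligns Series on their index labels before dividing. Each iteration divides a one-element Series labelled i by a one-element Series labelled i+1, so no labels match and the result is all NaN, which then overwrites the whole column. A minimal sketch of that alignment behaviour, using a simplified frame with only column A (the names num, den and ratio are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7]})

num = df['A'][df.index == 0]   # one value, index label 0
den = df['A'][df.index == 1]   # one value, index label 1

# Division aligns on labels; 0 and 1 never match, so every entry is NaN
ratio = num / den
print(ratio)

# shift(-1) moves the next row's value onto the current label, so labels match
fixed = df['A'] / df['A'].shift(-1)
print(fixed.round(2))
```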
Loops in pandas are very slow, so it is best to avoid them (Jeff, a pandas developer, explains it better). For comparison, the same result with a loop:
for i, row in df.iterrows():
    if i != df.index[-1]:
        df.loc[i, 'ratioA'] = df.loc[i,'A'] / df.loc[i+1, 'A']
print (df)
index A B ratioA
0 0 1 2 0.333333
1 1 3 4 0.600000
2 2 5 6 0.714286
3 3 7 8 NaN
Timings:
series1 = pd.Series({'A': 1, 'B': 2})
series2 = pd.Series({'A': 3, 'B': 4})
series3 = pd.Series({'A': 5, 'B': 6})
series4 = pd.Series({'A': 7, 'B': 8})
df = pd.DataFrame([series1, series2, series3, series4], index=[0,1,2,3])
#[4000 rows x 3 columns]
df = pd.concat([df]*1000).reset_index(drop=True)
df = df.reset_index()
In [49]: %timeit df['ratioA1'] = df['A'].div(df['A'].shift(-1))
1000 loops, best of 3: 431 µs per loop
In [50]: %%timeit
...: for i, row in df.iterrows():
...:     if i != df.index[-1]:
...:         df.loc[i, 'ratioA'] = df.loc[i,'A'] / df.loc[i+1, 'A']
...:
1 loop, best of 3: 2.15 s per loop
