how to convert header row into new columns in python pandas?

I have the following dataframe:
A,B,C
1,2,3
I need to convert it to the following format:
cols,vals
A,1
B,2
C,3
How can I turn the column names into a new column in pandas?

You can transpose with T:
import pandas as pd
df = pd.DataFrame({'A': {0: 1}, 'B': {0: 2}, 'C': {0: 3}})
print (df)
   A  B  C
0  1  2  3
print (df.T)
   0
A  1
B  2
C  3
df1 = df.T.reset_index()
df1.columns = ['cols','vals']
print (df1)
  cols  vals
0    A     1
1    B     2
2    C     3
If the DataFrame has more rows, you can use:
import pandas as pd
df = pd.DataFrame({'A': {0: 1, 1: 9, 2: 1},
                   'B': {0: 2, 1: 4, 2: 8},
                   'C': {0: 3, 1: 6, 2: 7}})
print (df)
   A  B  C
0  1  2  3
1  9  4  6
2  1  8  7
# prefix the row labels so they become readable column names after transposing
df.index = 'vals' + df.index.astype(str)
print (df.T)
   vals0  vals1  vals2
A      1      9      1
B      2      4      8
C      3      6      7
df1 = df.T.reset_index().rename(columns={'index':'cols'})
print (df1)
  cols  vals0  vals1  vals2
0    A      1      9      1
1    B      2      4      8
2    C      3      6      7
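As an alternative to transposing, DataFrame.melt can produce the same two-column layout in one call. This is a minimal sketch, assuming the single-row frame from the question; the names cols and vals are taken from the desired output.

```python
import pandas as pd

# Single-row frame from the question
df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})

# With no id_vars, melt unpivots every column: the old column names go
# into the `var_name` column and the cell values into `value_name`
df1 = df.melt(var_name='cols', value_name='vals')
print(df1)
```

With more than one row, melt stacks the values of each column under one another rather than creating vals0, vals1, ... columns, so the transpose approach remains the better fit for that layout.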

Related

combine multiple column into one in pandas [duplicate]

I have a table like the one below:
  Column 1  Column 2  Column 3  ...
0        a         1         2
1        b         1         3
2        c         2         1
and I want to convert it to look like this:
  Column 1  Column 2
0        a         1
1        a         2
2        b         1
3        b         3
4        c         2
5        c         1
...
I want to take each value from Column 2 (and so on) and pair it with the value in Column 1. I have no idea how to do this in pandas or even where to start.

You can use pd.melt to do this:
>>> import pandas as pd
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

Here's my approach, hope it helps:
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 1, 2], 'col3': [2, 3, 1]})
new_df = pd.DataFrame(columns=['col1', 'col2'])
for index, row in df.iterrows():
    # pair the first column's value with every remaining value in the row
    for element in row.values[1:]:
        # append one row at the end of new_df
        new_df.loc[len(new_df)] = [row.iloc[0], element]
print(new_df)
Output:
  col1  col2
0    a     1
1    a     2
2    b     1
3    b     3
4    c     2
5    c     1
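The loop above appends one row at a time, which gets slow on larger frames. A loop-free sketch that yields the same row order might look like the following; it assumes the col1/col2/col3 frame from this answer and relies on melt's ignore_index parameter (available since pandas 1.1) to keep the original row labels for sorting.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 1, 2], 'col3': [2, 3, 1]})

new_df = (
    df.melt(id_vars='col1', ignore_index=False)  # keep original row labels 0, 1, 2
      .sort_index(kind='mergesort')              # stable sort: col2 stays before col3
      .drop(columns='variable')                  # the source column name is not needed
      .rename(columns={'value': 'col2'})
      .reset_index(drop=True)
)
print(new_df)
```

The value column is renamed after melting because melt refuses a value_name that matches an existing column such as col2.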

Taking rows from dataframe until a condition is met

I have a dataframe with two columns:
   A  B
0  1  3
1  2  2
2  3  2
3  9  3
4  1  1
...
For a given index i, I want the rows from row i up to the first row j for which df.at[j, 'A'] - df.at[i, 'B'] > 5. I don't want any rows after row j.
For example, with i=1, the output should be:
[out]
A  B
2  2
3  2
9  3
Is there a simple way to do this without using loops?

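import pandas as pd
df = pd.DataFrame({'A': [10, 1, 2, 3, 9], 'B': [1, 3, 2, 2, 3]})
i = 2
base = df.at[i, 'B']
# keep only the rows from i onward
df = df.iloc[i:]
# rows where A exceeds B at row i by more than 5
j = df[df['A'] - base > 5]
if not j.empty:
    # label-based slice, so row j itself is included
    print(df.loc[:j.index[0]])
else:
    print('Condition not found')
Prints:
   A  B
2  2  2
3  3  2
4  9  3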

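You could try as follows:
import pandas as pd
data = {'A': {0: 10, 1: 2, 2: 3, 3: 9}, 'B': {0: 3, 1: 2, 2: 2, 3: 3}}
df = pd.DataFrame(data)
i = 1
# boolean Series: True where A exceeds B at row i by more than 5
s = df.loc[i:, 'A'] - df.loc[i, 'B'] > 5
trues = s[s]
if not trues.empty:
    # idxmax gives the label of the first True; with the default
    # RangeIndex it doubles as a position, and +1 makes the slice inclusive
    subset = df.iloc[i:trues.idxmax() + 1]
else:
    subset = pd.DataFrame()
print(subset)
   A  B
1  2  2
2  3  2
3  9  3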
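The same idea can be wrapped in a small helper. This is only a sketch: the function name rows_until and the threshold parameter are made up for illustration, and returning an empty frame when no row qualifies is an assumption, since the question does not say what should happen in that case.

```python
import pandas as pd

def rows_until(df, i, threshold=5):
    """Rows from label i up to the first row j with A[j] - B[i] > threshold."""
    tail = df.loc[i:]
    hit = (tail['A'] - df.at[i, 'B']) > threshold
    if not hit.any():
        # assumption: no qualifying row means an empty result
        return df.iloc[0:0]
    j = hit.idxmax()       # label of the first True
    return df.loc[i:j]     # .loc slicing includes both endpoints

# Data from the question
df = pd.DataFrame({'A': [1, 2, 3, 9, 1], 'B': [3, 2, 2, 3, 1]})
subset = rows_until(df, 1)
print(subset)
```

Because it slices by label with .loc, the helper does not depend on the index being the default 0..n-1 range, as long as the index is sorted and unique.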

how to convert timeseries ranking table to individual rank table in pandas dataframe python

For example, the rank table is
time/rank  1  2  3
1          a  b  c
2          b  c  a
and I want to convert it to each individual's rank by time:
time/individual  a  b  c
1                1  2  3
2                3  1  2
As pandas dataframes, the code is below:
ranktable = pd.DataFrame([{
    'time': 1,
    1: 'a',
    2: 'b',
    3: 'c'
}, {
    'time': 2,
    1: 'b',
    2: 'c',
    3: 'a'
}])
resultIWant = pd.DataFrame([{
    'time': 1,
    'a': 1,
    'b': 2,
    'c': 3
}, {
    'time': 2,
    'a': 3,
    'b': 1,
    'c': 2
}])
Is there an easy way to do this conversion?

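Use DataFrame.melt with DataFrame.pivot (the pivot arguments are passed by keyword, which pandas 2.0 and later require):
df = (ranktable.melt('time')
               .pivot(index='time', columns='value', values='variable')
               .rename_axis(None, axis=1)
               .reset_index())
print (df)
   time  a  b  c
0     1  1  2  3
1     2  3  1  2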
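To see why melting and then pivoting inverts the table, it can help to run the two steps separately. The sketch below assumes the ranktable from the question; the intermediate names long_df and wide are just for illustration, and the final cast to int is an extra step added here because the ranks originate as column labels.

```python
import pandas as pd

ranktable = pd.DataFrame([
    {'time': 1, 1: 'a', 2: 'b', 3: 'c'},
    {'time': 2, 1: 'b', 2: 'c', 3: 'a'},
])

# Step 1: long format, one row per (time, rank, individual).
# melt puts the old column labels (the ranks) in 'variable'
# and the cell contents (the individuals) in 'value'.
long_df = ranktable.melt(id_vars='time')

# Step 2: spread the individuals out as columns and fill in their ranks
wide = long_df.pivot(index='time', columns='value', values='variable')

# Cast the ranks to integers and flatten the result
wide = wide.astype(int).rename_axis(None, axis=1).reset_index()
print(wide)
```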

Use pandas.DataFrame.apply row by row, swapping each row's index (the ranks) with its values (the individuals):
new_df = ranktable.set_index("time").apply(lambda x: pd.Series(x.index, index=x), axis=1)
print(new_df)
Output:
      a  b  c
time
1     1  2  3
2     3  1  2

Remove multivalued columns

I have a dataframe:
   A  B
0  2  3
1  2  4
2  3  5
If a column has more than 2 distinct values, I want to remove it.
Expected output:
   A
0  2
1  2
2  3

You can use .nunique() and .loc, passing a boolean mask over the columns:
df = pd.DataFrame({'A': {0: 2, 1: 2, 2: 3}, 'B': {0: 3, 1: 4, 2: 5}})
df.loc[:, (df.nunique() <= 2)]
   A
0  2
1  2
2  3
An alternative approach (credit to this answer):
criteria = df.nunique() <= 2
df[criteria.index[criteria]]

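Use a for loop and value_counts to get the result:
import pandas as pd
df = pd.DataFrame(data={'A': [2, 2, 3], 'B': [3, 4, 5]})
# iterate over a copy of the column names, since columns are dropped inside the loop
for var in list(df.columns):
    # value_counts has one entry per distinct value
    result = df[var].value_counts()
    if len(result) > 2:
        df = df.drop(columns=var)
df
Output
   A
0  2
1  2
2  3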
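For reuse, the nunique mask can be wrapped in a small function. This is a sketch only; the name drop_multivalued and the max_unique parameter are hypothetical, not part of pandas.

```python
import pandas as pd

def drop_multivalued(df, max_unique=2):
    """Return a new frame without columns that have more than max_unique distinct values."""
    # nunique counts distinct values per column (NaN is ignored by default)
    keep = df.nunique() <= max_unique
    return df.loc[:, keep]

df = pd.DataFrame({'A': [2, 2, 3], 'B': [3, 4, 5]})
result = drop_multivalued(df)
print(result)
```

Unlike the in-place loop, this leaves the original frame untouched, which makes it easier to try different thresholds.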

New column in dataframe based on location of values in another column

I am trying to create a new column 'ratioA' in a dataframe df whereby the values are related to a column A:
For a given row, df['ratioA'] is equal to the ratio between df['A'] in that row and the next row.
I iterated over the index column as a reference, but I'm not sure why all the values appear as NaN. Only the last row should be NaN.
import numpy as np
import pandas as pd
series1 = pd.Series({'A': 1, 'B': 2})
series2 = pd.Series({'A': 3, 'B': 4})
series3 = pd.Series({'A': 5, 'B': 6})
series4 = pd.Series({'A': 7, 'B': 8})
df = pd.DataFrame([series1, series2, series3, series4], index=[0,1,2,3])
df = df.reset_index()
for i in df['index']:
    df['ratioA'] = df['A'][df['index']==i]/df['A'][df['index']==i+1]
print (df)
The output is:
   index  A  B  ratioA
0      0  1  2     NaN
1      1  3  4     NaN
2      2  5  6     NaN
3      3  7  8     NaN
The desired output should be:
   index  A  B  ratioA
0      0  1  2    0.33
1      1  3  4    0.60
2      2  5  6    0.71
3      3  7  8     NaN

You can use a vectorized solution: divide column A (with div) by column A shifted up one row:
print (df['A'].shift(-1))
0    3.0
1    5.0
2    7.0
3    NaN
Name: A, dtype: float64
df['ratioA'] = df['A'].div(df['A'].shift(-1))
print (df)
   index  A  B    ratioA
0      0  1  2  0.333333
1      1  3  4  0.600000
2      2  5  6  0.714286
3      3  7  8       NaN
Loops are very slow in pandas, so it is best to avoid them (Jeff, a pandas developer, explains it better). For comparison, a loop version:
for i, row in df.iterrows():
    if i != df.index[-1]:
        df.loc[i, 'ratioA'] = df.loc[i,'A'] / df.loc[i+1, 'A']
print (df)
   index  A  B    ratioA
0      0  1  2  0.333333
1      1  3  4  0.600000
2      2  5  6  0.714286
3      3  7  8       NaN
Timings:
series1 = pd.Series({'A': 1, 'B': 2})
series2 = pd.Series({'A': 3, 'B': 4})
series3 = pd.Series({'A': 5, 'B': 6})
series4 = pd.Series({'A': 7, 'B': 8})
df = pd.DataFrame([series1, series2, series3, series4], index=[0,1,2,3])
#[4000 rows x 3 columns]
df = pd.concat([df]*1000).reset_index(drop=True)
df = df.reset_index()
In [49]: %timeit df['ratioA1'] = df['A'].div(df['A'].shift(-1))
1000 loops, best of 3: 431 µs per loop
In [50]: %%timeit
    ...: for i, row in df.iterrows():
    ...:     if i != df.index[-1]:
    ...:         df.loc[i, 'ratioA'] = df.loc[i,'A'] / df.loc[i+1, 'A']
    ...:
1 loop, best of 3: 2.15 s per loop
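As for why the loop in the question produced only NaN: pandas aligns Series on their index labels before dividing, so a one-row Series labelled i divided by a one-row Series labelled i+1 has no label in common, and each iteration then overwrites the whole ratioA column with that result. The sketch below illustrates the alignment on a simplified frame (no reset_index) and rounds the vectorized result to two decimals, purely to match the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [2, 4, 6, 8]})

left = df['A'][df.index == 0]    # one element, label 0
right = df['A'][df.index == 1]   # one element, label 1

# Division aligns on labels: the result has labels 0 and 1, both NaN
aligned = left / right
print(aligned)

# Dropping the labels divides by position instead
by_position = left.to_numpy() / right.to_numpy()
print(by_position)

# shift(-1) moves the next row's value onto the current label, so labels match
df['ratioA'] = (df['A'] / df['A'].shift(-1)).round(2)
print(df)
```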
