Display pandas columns - wide to long - python

Say I have a row of column headers, and associated values in a Pandas Dataframe:
print(df)
A B C D E F G H I J K
1 2 3 4 5 6 7 8 9 10 11
How do I go about displaying them like the following:
print(df)
A B C D E
1 2 3 4 5
F G H I J
6 7 8 9 10
K
11

custom function
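import numpy as np

def new_repr(self):
    # Group columns into blocks of 5 (groupby with axis=1 is deprecated since pandas 2.1)
    g = self.groupby(np.arange(self.shape[1]) // 5, axis=1)
    return '\n\n'.join([d.to_string() for _, d in g])

print(new_repr(df))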
A B C D E
0 1 2 3 4 5
F G H I J
0 6 7 8 9 10
K
0 11
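Since grouping along columns is deprecated in recent pandas releases, the same blocked display can be sketched with positional slicing instead. This is a minimal sketch, not the answerer's code: the name chunked_repr and the width parameter are hypothetical, and the sample frame is rebuilt from the question.

```python
import pandas as pd


def chunked_repr(df, width=5):
    """Render df in blocks of `width` columns, separated by blank lines."""
    blocks = [
        df.iloc[:, start:start + width].to_string()
        for start in range(0, df.shape[1], width)
    ]
    return "\n\n".join(blocks)


# Sample frame from the question: one row, columns A..K holding 1..11.
df = pd.DataFrame([range(1, 12)], columns=list("ABCDEFGHIJK"))
text = chunked_repr(df)
print(text)
```

Slicing with iloc leaves the column dtypes untouched and does not depend on groupby at all, so it should behave the same across pandas versions.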

pd.set_option('display.width', 20)
pd.set_option('display.expand_frame_repr', True)
df
A B C D E \
0 1 2 3 4 5
F G H I J \
0 6 7 8 9 10
K
0 11
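If changing the display options globally is not wanted, the same settings can be applied temporarily with pd.option_context. A minimal sketch, assuming the sample frame from the question; display.max_columns is set to None here only as a precaution against column truncation and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([range(1, 12)], columns=list("ABCDEFGHIJK"))

width_before = pd.get_option("display.width")

# The options only apply inside the with-block and are restored afterwards.
with pd.option_context(
    "display.width", 20,
    "display.expand_frame_repr", True,
    "display.max_columns", None,
):
    wrapped = repr(df)

print(wrapped)
width_after = pd.get_option("display.width")
```

The exact number of columns per block depends on the column widths, since pandas wraps by character width rather than by column count.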

Related

keeping first column value .melt func

I want to use the DataFrame.melt function in pandas to reshape the data from wide to long format while keeping the first column's value. I also tried .pivot, but it did not work well. Please look at the example below:
ID Alphabet Unspecified: 1 Unspecified: 2
0 1 A G L
1 2 B NaN NaN
2 3 C H NaN
3 4 D I M
4 5 E J NaN
5 6 F K O
Into this:
ID Alphabet
0 1 A
1 1 G
2 1 L
3 2 B
4 3 C
5 3 H
6 4 D
7 4 I
8 4 M
9 5 E
10 5 J
11 6 F
12 6 K
13 6 O
Try (assuming ID is unique and sorted):
df = (
pd.melt(df, "ID")
.sort_values("ID", kind="stable")
.drop(columns="variable")
.dropna()
.reset_index(drop=True)
.rename(columns={"value": "Alphabet"})
)
print(df)
Prints:
ID Alphabet
0 1 A
1 1 G
2 1 L
3 2 B
4 3 C
5 3 H
6 4 D
7 4 I
8 4 M
9 5 E
10 5 J
11 6 F
12 6 K
13 6 O
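If ID is not sorted, one possible variation is to keep the original row index through the melt and sort on that index instead. This is only a sketch under that assumption; the three-row frame below is made up for illustration and is not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with unsorted IDs.
df = pd.DataFrame({
    "ID": [3, 1, 2],
    "Alphabet": ["C", "A", "B"],
    "Unspecified: 1": ["H", "G", np.nan],
})

out = (
    df.melt("ID", ignore_index=False)   # keep the original row labels
      .sort_index(kind="mergesort")     # stable sort: rows stay grouped in column order
      .drop(columns="variable")
      .dropna()
      .reset_index(drop=True)
      .rename(columns={"value": "Alphabet"})
)
print(out)
```

Sorting on the preserved index keeps the rows in their original order, so the result no longer depends on ID being unique or sorted.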
Don't melt but rather stack; this directly drops the NaNs and keeps the order per row:
out = (df
.set_index('ID')
.stack().droplevel(1)
.reset_index(name='Alphabet')
)
Output:
ID Alphabet
0 1 A
1 1 G
2 1 L
3 2 B
4 3 C
5 3 H
6 4 D
7 4 I
8 4 M
9 5 E
10 5 J
11 6 F
12 6 K
13 6 O
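For a self-contained run of the stack approach, the sketch below rebuilds the sample frame from the question. The explicit .dropna() is an addition, not part of the answer: as far as I know, the newer stack implementation (future_stack=True in pandas 2.1+) keeps NaN values that were present in the input, so dropping them explicitly should make the result independent of the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Alphabet": list("ABCDEF"),
    "Unspecified: 1": ["G", np.nan, "H", "I", "J", "K"],
    "Unspecified: 2": ["L", np.nan, np.nan, "M", np.nan, "O"],
})

out = (
    df.set_index("ID")
      .stack()        # one value per (ID, column) pair, row by row
      .dropna()       # explicit, in case NaNs survive the stack
      .droplevel(1)   # discard the former column names
      .reset_index(name="Alphabet")
)
print(out)
```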
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
index = 'ID',
names_to = 'Alphabet',
names_pattern = ['.+'],
sort_by_appearance = True)
.dropna()
)
ID Alphabet
0 1 A
1 1 G
2 1 L
3 2 B
6 3 C
7 3 H
9 4 D
10 4 I
11 4 M
12 5 E
13 5 J
15 6 F
16 6 K
17 6 O
In the code above, names_pattern accepts a list of regular expressions to match the desired columns; all the matches are collated into one column, named Alphabet in names_to.

Swapping Column values based on condition

This is my dataframe:
S.No Column1 Column2
0 7 A B
1 2 D F
2 5 C H
3 9 NaN J
4 1 T G
5 4 Z True
6 10 S Y
7 3 G V
8 10 R Y
9 8 T X
df.replace([np.nan,True],'A',inplace=True)
S.No Column1 Column2
0 7 A B
1 2 D F
2 5 C H
3 9 A J
4 1 T G
5 4 Z A
6 10 S Y
7 3 G V
8 10 R Y
9 8 T X
The required output should look like:
S.No Column1 Column2
0 7 B A
1 2 F D
2 5 H C
3 9 J A
4 1 T G
5 4 Z A
6 10 Y S
7 3 V G
8 10 Y V
9 8 X T
How do I write code to do this?
Use rename:
>>> df
S.No Column1 Column2
0 7 A B
1 2 D F
2 5 C H
3 9 NaN J
4 1 T G
5 4 Z True
6 10 S Y
7 3 G V
8 10 R Y
9 8 T X
>>> df.rename(columns={'Column1': 'Column2', 'Column2': 'Column1'})[df.columns]
S.No Column1 Column2
0 7 B A
1 2 F D
2 5 H C
3 9 J NaN
4 1 G T
5 4 True Z
6 10 Y S
7 3 V G
8 10 Y R
9 8 X T
If you want to swap the contents of the 2 columns, you can use:
df[['Column1', 'Column2']] = list(zip(df['Column2'], df['Column1']))
or use .to_numpy() or .values, as follows:
df[['Column1', 'Column2']] = df[['Column2', 'Column1']].to_numpy()
or
df[['Column1', 'Column2']] = df[['Column2', 'Column1']].values
Result:
Based on your data after df.replace:
print(df)
S.No Column1 Column2
0 7 B A
1 2 F D
2 5 H C
3 9 J A
4 1 G T
5 4 A Z
6 10 Y S
7 3 V G
8 10 Y R
9 8 X T
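The question title mentions a condition, but the condition itself is never stated. If only some rows should be swapped, a boolean mask can be combined with .loc. The sketch below uses a made-up condition (swap where Column1 sorts before Column2) purely for illustration, on a three-row sample taken from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "S.No": [7, 2, 1],
    "Column1": ["A", "D", "T"],
    "Column2": ["B", "F", "G"],
})

# Hypothetical condition: swap only where Column1 sorts before Column2.
mask = df["Column1"] < df["Column2"]

# .to_numpy() drops the column labels; without it pandas would align the
# right-hand side on column names and the assignment would change nothing.
df.loc[mask, ["Column1", "Column2"]] = (
    df.loc[mask, ["Column2", "Column1"]].to_numpy()
)
print(df)
```

Rows where the mask is False (here the T/G row) are left untouched.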

How to match list in multiple columns

Here is an example of my dataframe:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ...
0 A B C D E F G H I J K L M N O P Q R S T U V
1 B C D E F G H I J K L M N O P Q R S T U V A
2 V A B C D E F G H I J K L M N O P Q R S T U
and my list
mylist = ['A', 'B', 'C']
I want to match the columns and the list so that only the characters in the list exist in the column.
The output I want:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ...
0 A B C
1 B C A
2 A B C
I'm not sure what to do, so I ask a question.
Thank you for reading.
Use DataFrame.isin with DataFrame.where:
mylist = ['A', 'B', 'C']
df = df.where(df.isin(mylist), '')
print (df)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
0 A B C
1 B C A
2 A B C
Or, if modifying the DataFrame in place with an inverted mask is acceptable, use:
df[~df.isin(mylist)] = ''
This might also work:
df = df[df.isin(mylist)].fillna('')
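A small self-contained run of the isin/where approach, using a shortened four-column frame (an assumption made to keep the example compact) instead of the full 22-column one:

```python
import pandas as pd

# Shortened stand-in for the question's frame.
df = pd.DataFrame([list("ABCD"), list("BCDA"), list("DABC")])
mylist = ["A", "B", "C"]

# isin builds a boolean frame; where keeps values where it is True
# and substitutes '' everywhere else.
masked = df.where(df.isin(mylist), "")
print(masked)
```

The positions of the kept values are preserved, so each row still has the same number of columns as the input.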

Separate pandas df by repeating range in a column

Problem:
I'm trying to split a pandas data frame by the repeating ranges in column A. My data and output are as follows. The ranges in columns A are always increasing and do not skip values. The values in column A do start and stop arbitrarily, however.
Data:
import pandas as pd
data = {"A": [1,2,3,2,3,4,3,4,5,6],
        "B": ["a","b","c","d","e","f","g","h","i","k"]}
df = pd.DataFrame(data)
df
A B
0 1 a
1 2 b
2 3 c
3 2 d
4 3 e
5 4 f
6 3 g
7 4 h
8 5 i
9 6 k
Desired output:
df1
A B
0 1 a
1 2 b
2 3 c
df2
A B
0 2 d
1 3 e
2 4 f
df3
A B
0 3 g
1 4 h
2 5 i
3 6 k
Thanks for advice!
Answer times:
from timeit import default_timer as timer
start = timer()
for x, y in df.groupby(df.A.diff().ne(1).cumsum()):
    print(y)
end = timer()
aa = end - start
start = timer()
s = (df.A.diff() != 1).cumsum()
g = df.groupby(s)
for _, g_ in g:
    print(g_)
end = timer()
bb = end - start
start = timer()
[*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))]
print(*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum())), sep='\n\n')
end = timer()
cc = end - start
print(aa,bb,cc)
0.0176649530000077 0.018132143000002543 0.018715283999995336
Create the groupby key by using diff and cumsum
for x, y in df.groupby(df.A.diff().ne(1).cumsum()):
    print(y)
A B
0 1 a
1 2 b
2 3 c
A B
3 2 d
4 3 e
5 4 f
A B
6 3 g
7 4 h
8 5 i
9 6 k
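To see why this key works, it can help to lay the intermediate steps side by side. The sketch below rebuilds the question's data; the helper column names diff, new_run and key are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 3, 4, 3, 4, 5, 6],
                   "B": list("abcdefghik")})

step = df["A"].diff()      # difference to the previous row (NaN for the first row)
new_run = step.ne(1)       # True wherever the +1 sequence is broken
key = new_run.cumsum()     # running count of breaks, used as the group label

steps = pd.DataFrame({"A": df["A"], "diff": step, "new_run": new_run, "key": key})
print(steps)
```

Each True in new_run marks the first row of a new range, and the cumulative sum turns those markers into the labels 1, 2, 3 that groupby then splits on.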
Just groupby by the difference
s = (df.A.diff() != 1).cumsum()
g = df.groupby(s)
for _, g_ in g:
    print(g_)
Outputs
A B
0 1 a
1 2 b
2 3 c
A B
3 2 d
4 3 e
5 4 f
A B
6 3 g
7 4 h
8 5 i
9 6 k
One-liner
because that's important
[*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))]
Print it
print(*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum())), sep='\n\n')
A B
0 1 a
1 2 b
2 3 c
A B
3 2 d
4 3 e
5 4 f
A B
6 3 g
7 4 h
8 5 i
9 6 k
Assign it
df1, df2, df3 = (d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))
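Unpacking into exactly three names only works when the data happens to contain three ranges. A sketch that collects the pieces into a list instead, and also resets each index so the pieces start at 0 as in the desired output; the name parts is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 3, 4, 3, 4, 5, 6],
                   "B": list("abcdefghik")})

key = df["A"].diff().ne(1).cumsum()

# One frame per range, each renumbered from 0.
parts = [g.reset_index(drop=True) for _, g in df.groupby(key)]

for part in parts:
    print(part, end="\n\n")
```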

Creating python function to create categorical bins in pandas

I'm trying to create a reusable function in python 2.7(pandas) to form categorical bins, i.e. group less-value categories as 'other'. Can someone help me to create a function for the below: col1, col2, etc. are different categorical variable columns.
##Reducing categories by binning categorical variables - column1
a = df.col1.value_counts()
#get top 5 values of index
vals = a[:5].index
df['col1_new'] = df.col1.where(df.col1.isin(vals), 'other')
df = df.drop(['col1'],axis=1)
##Reducing categories by binning categorical variables - column2
a = df.col2.value_counts()
#get top 6 values of index
vals = a[:6].index
df['col2_new'] = df.col2.where(df.col2.isin(vals), 'other')
df = df.drop(['col2'],axis=1)
You can use:
df = pd.DataFrame({'A':list('abcdefabcdefabffeg'),
'D':[1,3,5,7,1,0,1,3,5,7,1,0,1,3,5,7,1,0]})
print (df)
A D
0 a 1
1 b 3
2 c 5
3 d 7
4 e 1
5 f 0
6 a 1
7 b 3
8 c 5
9 d 7
10 e 1
11 f 0
12 a 1
13 b 3
14 f 5
15 f 7
16 e 1
17 g 0
def replace_under_top(df, c, n):
    a = df[c].value_counts()
    # get top n values of index
    vals = a[:n].index
    # assign column back (note: this modifies the passed-in df in place)
    df[c] = df[c].where(df[c].isin(vals), 'other')
    # rename processed column
    df = df.rename(columns={c : c + '_new'})
    return df
Test:
df1 = replace_under_top(df, 'A', 3)
print (df1)
A_new D
0 other 1
1 b 3
2 other 5
3 other 7
4 e 1
5 f 0
6 other 1
7 b 3
8 other 5
9 other 7
10 e 1
11 f 0
12 other 1
13 b 3
14 f 5
15 f 7
16 e 1
17 other 0
df2 = replace_under_top(df, 'D', 4)
print (df2)
A D_new
0 other 1
1 b 3
2 other 5
3 other 7
4 e 1
5 f other
6 other 1
7 b 3
8 other 5
9 other 7
10 e 1
11 f other
12 other 1
13 b 3
14 f 5
15 f 7
16 e 1
17 other other
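Note that column A already shows 'other' in the second test: the function assigns into the DataFrame it receives, so the first call changed the original df. If that side effect is not wanted, a variant that leaves the input untouched might look like the sketch below; the name bin_rare and the other parameter are hypothetical, and the small frame is made up for illustration.

```python
import pandas as pd


def bin_rare(df, col, n, other="other"):
    """Return a new frame where all but the n most frequent values of col are binned."""
    top = df[col].value_counts().index[:n]
    binned = df[col].where(df[col].isin(top), other)
    # drop() and assign() both return new frames, so the input is not modified.
    return df.drop(columns=col).assign(**{col + "_new": binned})


df = pd.DataFrame({"A": list("aaabbc"), "D": [1, 2, 3, 4, 5, 6]})
out = bin_rare(df, "A", 2)
print(out)
```

Unlike the rename-based version, the new column ends up as the last column here.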
