Reordering DataFrame in Pandas - python

I have a DataFrame that looks like this:
A B C D E
1 a a a a a
2 b b b b b
3 c c c c c
4 d d d d d
5 e e e e e
6 f f f f f
Does anyone know how to reorder it using pandas so that it looks like this:
A B C D E F G H I J
1 a a a a a b b b b b
3 c c c c c d d d d d
5 e e e e e f f f f f
I tried reading the documentation at https://pandas.pydata.org/docs/user_guide/reshaping.html. It's difficult to understand as a beginner, so I would appreciate any help.

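Assuming that you want to combine every two rows into a single row, use the underlying numpy array and reshape it:
import pandas as pd
from string import ascii_uppercase
# Reshape so that each pair of consecutive rows becomes one row
# (len(df) must be even), then label the columns A, B, C, ...
out = (pd.DataFrame(df.to_numpy().reshape(len(df)//2, -1),
                    index=df.index[::2])
         .rename(columns=dict(enumerate(ascii_uppercase)))
      )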
Output:
A B C D E F G H I J
1 a a a a a b b b b b
3 c c c c c d d d d d
5 e e e e e f f f f f
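For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same reshape idea. The sample frame is rebuilt from the question, and rows_per_group is a name introduced here for illustration, not something from the answer above.

```python
import pandas as pd
from string import ascii_uppercase

# Rebuild the question's frame: six rows, each filled with one letter.
df = pd.DataFrame(
    [[letter] * 5 for letter in "abcdef"],
    index=range(1, 7),
    columns=list("ABCDE"),
)

rows_per_group = 2  # hypothetical parameter: how many rows to merge into one
if len(df) % rows_per_group != 0:
    raise ValueError("number of rows must be a multiple of rows_per_group")

# Flatten each block of consecutive rows into a single wide row.
wide = pd.DataFrame(
    df.to_numpy().reshape(len(df) // rows_per_group, -1),
    index=df.index[::rows_per_group],
    columns=list(ascii_uppercase[: df.shape[1] * rows_per_group]),
)
print(wide)
```

The reshape only works when the row count divides evenly, which is why the sketch checks for that first.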

Related

Pairwise matrix counts of two columns using pandas [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
I am trying to obtain pairwise counts of two column variables using pandas. I have a dataframe of two columns in the following format:
col1 col2
a e
b g
c h
d f
a g
b h
c f
d e
a f
b g
c g
d h
a e
b e
c g
d h
b h
What I would like to get as output would be the following matrix of counts, for e.g.:
e f g h
a 2 1 1 0
b 1 0 2 2
c 0 1 2 1
d 1 1 0 2
I am getting totally confused with pandas iterating over columns, rows, indexes and such. Appreciate some guidance here.
Pandas often has simple functions built in - in this case, you want crosstab:
pd.crosstab(dat['col1'], dat['col2'])
Full code:
import pandas as pd
from io import StringIO
x = '''col1 col2
a e
b g
c h
d f
a g
b h
c f
d e
a f
b g
c g
d h
a e
b e
c g
d h
b h'''
dat = pd.read_csv(StringIO(x), sep=r'\s+')  # raw string avoids an invalid-escape warning
pd.crosstab(dat['col1'], dat['col2'])
You're looking for a crosstab:
count_matrix = pd.crosstab(index=df["col1"], columns=df["col2"])
print(count_matrix)
col2 e f g h
col1
a 2 1 1 0
b 1 0 2 2
c 0 1 2 1
d 1 1 0 2
If you don't like the column/index names in the output (e.g. still seeing "col1" and "col2"), you can remove them with rename_axis:
count_matrix = count_matrix.rename_axis(index=None, columns=None)
print(count_matrix)
e f g h
a 2 1 1 0
b 1 0 2 2
c 0 1 2 1
d 1 1 0 2
If you want that all together in one snippet:
count_matrix = (pd.crosstab(index=df["col1"], columns=df["col2"])
.rename_axis(index=None, columns=None))
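If it helps to see what crosstab is counting, the sketch below builds the same table by hand with groupby, size and unstack. It is only an illustration of the idea, not how crosstab is implemented internally; the pairs list is just the question's data retyped.

```python
import pandas as pd

# The 17 (col1, col2) pairs from the question, written as two-letter strings.
pairs = ["ae", "bg", "ch", "df", "ag", "bh", "cf", "de", "af",
         "bg", "cg", "dh", "ae", "be", "cg", "dh", "bh"]
df = pd.DataFrame({"col1": [p[0] for p in pairs],
                   "col2": [p[1] for p in pairs]})

# Count each (col1, col2) combination, then spread col2 across the columns.
# Combinations that never occur are filled with 0.
counts = df.groupby(["col1", "col2"]).size().unstack(fill_value=0)
print(counts)

# The built-in one-liner, for comparison.
print(pd.crosstab(df["col1"], df["col2"]))
```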

Pandas Shift Row Value to Match Column Name

I have a sample dataset with a fixed list of column names. After moving data around, each row holds a set of letters, as seen below.
I am trying to shift the values in each row so that each one sits under the column with the matching name. I tried DataFrame.shift() but have not had much success. What I am trying to get is shown under "After" below. Any thoughts?
import pandas as pd
df = pd.DataFrame({'A': list('AAAAA'),
'B': list('CBBDE'),
'C': list('DDCEG'),
'D': list('EEDF '),
'E': list('FFE  '),
'F': list('GGF  '),
'G': list('  G  ')})
   A  B  C  D  E  F  G
0  A  C  D  E  F  G
1  A  B  D  E  F  G
2  A  B  C  D  E  F  G
3  A  D  E  F
4  A  E  G
After:
   A  B  C  D  E  F  G
0  A     C  D  E  F  G
1  A  B     D  E  F  G
2  A  B  C  D  E  F  G
3  A        D  E  F
4  A           E     G
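Here's a broadcasted comparison approach. It is quite fast, but it uses more memory because it builds a rows x columns x columns boolean array.
import numpy as np
a = df.to_numpy()
b = df.columns.to_numpy()
# For each row, flag the column labels that occur anywhere in that row,
# then multiply the boolean mask by the labels ('' where the flag is False).
pd.DataFrame(np.equal.outer(a, b).any(1) * b, columns=b)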
   A  B  C  D  E  F  G
0  A     C  D  E  F  G
1  A  B     D  E  F  G
2  A  B  C  D  E  F  G
3  A        D  E  F
4  A           E     G
This is more of a pivot problem:
s=df.mask(df=='').stack().reset_index()
s.pivot(index='level_0',columns=0,values=0)
Out[34]:
0 A B C D
level_0
0 A B C D
1 A NaN C NaN
2 A NaN C D
Here's one way with stack, merge, pivot:
new_df = df.stack().reset_index()
(new_df.merge(new_df, left_on=['level_0', 'level_1'],
right_on=['level_0',0],
how='left')
.pivot(index='level_0', columns='level_1', values='0_y')  # keyword arguments are required in pandas 2.0+
)
Output:
level_1 A B C D E F G
level_0
0 A NaN C D E F G
1 A B NaN D E F G
2 A B C D E F G
3 A NaN NaN D E F NaN
4 A NaN NaN NaN E NaN G
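All of the answers above boil down to the same test: for each row, does a given column label occur anywhere in that row? A plain-Python sketch of that test is below. It is slower than the numpy version but easy to follow. It assumes the blanks in the sample are single spaces and that every list in the sample frame has five entries; aligned is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "A": list("AAAAA"),
    "B": list("CBBDE"),
    "C": list("DDCEG"),
    "D": list("EEDF "),
    "E": list("FFE  "),   # assumed: padded with spaces to five entries
    "F": list("GGF  "),
    "G": list("  G  "),
})

# For each row, keep a column's label only if that label occurs in the row;
# otherwise leave the cell empty.
aligned = pd.DataFrame(
    [[col if col in set(row) else "" for col in df.columns]
     for row in df.to_numpy()],
    index=df.index,
    columns=df.columns,
)
print(aligned)
```

Replace the empty string with a missing-value marker if you would rather get NaN-style gaps as in the stack/merge/pivot output.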

Expand pandas dataframe by replacing cell value with a list

I have a pandas DataFrame like the one below:
A B C
a b c
d e f
where A, B and C are column names. Now I have a list:
mylist = [1,2,3]
I want to replace the c in column C with the list, so that the DataFrame expands to one row per list value, like below:
A B C
a b 1
a b 2
a b 3
d e f
Any help would be appreciated!
I tried this,
mylist = [1,2,3]
x=pd.DataFrame({'mylist':mylist})
x['C']='c'
res= pd.merge(df,x,on=['C'],how='left')
res['mylist']=res['mylist'].fillna(res['C'])
Then drop the helper column and rename:
del res['C']
res.rename(columns={"mylist":"C"},inplace=True)
print(res)
Output:
A B C
0 a b 1
1 a b 2
2 a b 3
3 d e f
You can use:
print (df)
A B C
0 a b c
1 d e f
2 a b c
3 t e w
mylist = [1,2,3]
idx1 = df.index[df.C == 'c']
# DataFrame.append was removed in pandas 2.0, so combine the two parts with pd.concat
df = pd.concat([df.loc[idx1.repeat(len(mylist))].assign(C=mylist * len(idx1)), df[df.C != 'c']])
print (df)
A B C
0 a b 1
0 a b 2
0 a b 3
2 a b 1
2 a b 2
2 a b 3
1 d e f
3 t e w
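Another way to get there, offered as a sketch rather than a replacement for the answers above, is to swap each matching cell for the whole list and let DataFrame.explode do the expansion. This keeps the original row order. It assumes pandas 1.1 or newer for ignore_index; expanded is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({"A": list("adat"), "B": list("bebe"), "C": list("cfcw")})
mylist = [1, 2, 3]

# Put the whole list into every cell that equals 'c'; other cells stay scalar.
with_lists = df.assign(C=df["C"].map(lambda v: mylist if v == "c" else v))

# explode() turns each list element into its own row and leaves scalars alone.
expanded = with_lists.explode("C", ignore_index=True)
print(expanded)
```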

Selecting columns from a pandas dataframe based on columns conditions

I want to select into a new DataFrame the columns that contain the value 'C':
protein 1 2 3 4 5
prot1 C M D F A
prot2 C D A M A
prot3 C C D F A
prot4 S D F C L
prot5 S D A I L
So I want to get this:
protein 1 2 4
prot1 C M F
prot2 C D M
prot3 C C F
prot4 S D C
prot5 S D I
The number of columns can be anything. I only found examples where the column names must be specified, which I can't do here. The script should check column by column.
In [22]: df[['protein']].join(df[df.columns[df.eq('C').any()]])
Out[22]:
protein 1 2 4
0 prot1 C M F
1 prot2 C D M
2 prot3 C C F
3 prot4 S D C
4 prot5 S D I
Use:
import numpy as np
import pandas as pd
np.random.seed(123)
n = np.random.choice(['C','M','D', '-'], size=(3,10))
n[:,0] = ['a','b','w']
foo = pd.DataFrame(n)
print (foo)
0 1 2 3 4 5 6 7 8 9
0 a M D D C D D M - D
1 b M D M C M D - M C
2 w C - M - D M C C C
mask = foo.eq('C').any()
# always keep the first (label) column in the output
mask.loc[0] = True
# select only the columns where the mask is True
print (foo.loc[:,mask])
0 1 4 7 8 9
0 a M C M - D
1 b M C - M C
2 w C - C C C
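Here is the same mask idea applied to the question's own data, as a self-contained sketch. The column labels are assumed to be the strings '1' to '5'; selected is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "protein": ["prot1", "prot2", "prot3", "prot4", "prot5"],
    "1": list("CCCSS"),
    "2": list("MDCDD"),
    "3": list("DADFA"),
    "4": list("FMFCI"),
    "5": list("AAALL"),
})

# One boolean per column: True if any cell in that column equals 'C'.
mask = df.eq("C").any()
# Keep the label column regardless of its contents.
mask["protein"] = True

selected = df.loc[:, mask]
print(selected)
```

No column names are hard-coded apart from the label column, so this works for any number of columns.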

how to replace certain elements in the pandas dataframe index

I have a dictionary and want to use .replace to replace only the index labels that appear as keys in the dictionary with the corresponding values.
dicts = {"certain index element1": "changed element1",
"certain index element2": "changed element2",
"certain index element3": "changed element3",
}
This does not work:
df.replace(dicts,regex=False,inplace=True)
The df is huge so I can not reassign all of the index from scratch. I only need to change certain elements and everything else remains the same.
If I wanted to replace certain elements within the df (not indices) it would work but for indices it does not.
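Use rename(index=dicts). Index labels that are not keys in the dictionary are left unchanged.
Example (needs import numpy as np and import pandas as pd):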
df = pd.DataFrame(
np.random.choice(list('abcd'), (10, 10)),
list('ABCDEFGHIJ')
)
dc = {'A': '_A_', 'B': '_B_'}
df.rename(index=dc, inplace=True)
print(df)
0 1 2 3 4 5 6 7 8 9
_A_ a a a a b a d c c b
_B_ b c c a a a b b a b
C b b d b d c c a a b
D d d d d c b b a a d
E d c c c a d a d d a
F d c d c d d d d b d
G b c d c c b c a a b
H c c c b b a a b c a
I c a a b a d c c a a
J a a a c a b d c c c
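A smaller, deterministic sketch of the same call is below, in case the random data above makes it hard to check. The frame and the mapping are made up for illustration. Keys that are not present in the index are ignored by default.

```python
import pandas as pd

df = pd.DataFrame({"value": ["p", "q", "r"]}, index=["x", "y", "z"])

# Only 'x' and 'z' exist in the index; 'missing' is silently ignored.
mapping = {"x": "x_new", "z": "z_new", "missing": "ignored"}

renamed = df.rename(index=mapping)
print(renamed)

# rename also accepts a function, which gives the same result here.
renamed_fn = df.rename(index=lambda label: mapping.get(label, label))
```

rename returns a new DataFrame unless inplace=True is passed, and it only relabels the index rather than rebuilding the data.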
