Selecting columns from a pandas DataFrame based on column conditions

I want to select, into a new dataframe, the columns that contain the value 'C' in at least one row:
protein 1 2 3 4 5
prot1 C M D F A
prot2 C D A M A
prot3 C C D F A
prot4 S D F C L
prot5 S D A I L
So I want to end up with this:
protein 1 2 4
prot1 C M F
prot2 C D M
prot3 C C F
prot4 S D C
prot5 S D I
The number of columns can be arbitrary (n). The examples I found all require specifying the column names, which I can't do here. The script should check the columns one by one.

In [22]: df[['protein']].join(df[df.columns[df.eq('C').any()]])
Out[22]:
protein 1 2 4
0 prot1 C M F
1 prot2 C D M
2 prot3 C C F
3 prot4 S D C
4 prot5 S D I

Use:
import numpy as np
import pandas as pd
np.random.seed(123)
n = np.random.choice(['C','M','D', '-'], size=(3,10))
n[:,0] = ['a','b','w']
foo = pd.DataFrame(n)
print (foo)
0 1 2 3 4 5 6 7 8 9
0 a M D D C D D M - D
1 b M D M C M D - M C
2 w C - M - D M C C C
mask = foo.eq('C').any()
# always keep column 0 (the identifier column) in the output
mask.loc[0] = True
# keep only the selected columns
print (foo.loc[:,mask])
0 1 4 7 8 9
0 a M C M - D
1 b M C - M C
2 w C - C C C
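Both answers above build a boolean mask over the columns with eq('C').any(). A minimal, self-contained sketch of that idea on the question's data follows. The frame is retyped by hand from the question, and the names has_c and result are only illustrative.

```python
import pandas as pd

# Sample data retyped from the question; column labels are kept as strings.
df = pd.DataFrame({
    "protein": ["prot1", "prot2", "prot3", "prot4", "prot5"],
    "1": list("CCCSS"),
    "2": list("MDCDD"),
    "3": list("DADFA"),
    "4": list("FMFCI"),
    "5": list("AAALL"),
})

# One boolean per column: True if at least one cell in it equals 'C'.
has_c = df.eq("C").any()

# Always keep the identifier column, whatever it contains.
has_c["protein"] = True

# Boolean column selection; no column names are hard-coded.
result = df.loc[:, has_c]
print(result)
```

Because the mask is computed from the data, the same lines should work for any number of columns.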

Related

Reordering DataFrame in Pandas

I have a DataFrame that looks like this:
A B C D E
1 a a a a a
2 b b b b b
3 c c c c c
4 d d d d d
5 e e e e e
6 f f f f f
Does anyone know how to reorder it using pandas so that it looks like this:
A B C D E F G H I J
1 a a a a a b b b b b
3 c c c c c d d d d d
5 e e e e e f f f f f
I tried reading the documentation at https://pandas.pydata.org/docs/user_guide/reshaping.html, but it's difficult to understand as a beginner. I'd appreciate any help.
Assuming that you want to group every two rows in a single row, use the underlying numpy array and reshape it:
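import pandas as pd
from string import ascii_uppercase
# reshape so that every two consecutive rows become one row
out = (pd.DataFrame(df.to_numpy().reshape(len(df)//2, -1),
                    index=df.index[::2])
         # relabel columns 0, 1, 2, ... as A, B, C, ...
         .rename(columns=dict(enumerate(ascii_uppercase)))
      )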
Output:
A B C D E F G H I J
1 a a a a a b b b b b
3 c c c c c d d d d d
5 e e e e e f f f f f
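The reshape answer above can be tried end to end with the sketch below. The sample frame is rebuilt from the question, and rows_per_group is a hypothetical name for the "two rows per output row" assumption. The length check is an addition, since reshape raises an error when the row count does not divide evenly.

```python
import pandas as pd
from string import ascii_uppercase

# Sample frame rebuilt from the question: rows 1..6, columns A..E.
df = pd.DataFrame([[c] * 5 for c in "abcdef"],
                  columns=list("ABCDE"), index=range(1, 7))

rows_per_group = 2  # assumption: every two consecutive rows are merged

if len(df) % rows_per_group:
    raise ValueError("number of rows must be a multiple of rows_per_group")

# Row-major reshape: each output row is rows_per_group input rows laid end to end.
values = df.to_numpy().reshape(len(df) // rows_per_group, -1)

out = pd.DataFrame(values,
                   index=df.index[::rows_per_group],
                   columns=list(ascii_uppercase[:values.shape[1]]))
print(out)
```

Taking the column labels from ascii_uppercase limits this sketch to 26 output columns.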

Pairwise matrix counts of two columns using pandas [duplicate]

I am trying to obtain pairwise counts of two column variables using pandas. I have a dataframe of two columns in the following format:
col1 col2
a e
b g
c h
d f
a g
b h
c f
d e
a f
b g
c g
d h
a e
b e
c g
d h
b h
What I would like to get as output would be the following matrix of counts, for e.g.:
e f g h
a 2 1 1 0
b 1 0 2 2
c 0 1 2 1
d 1 1 0 2
I am getting totally confused with pandas iterating over columns, rows, indexes and such. Appreciate some guidance here.
Pandas often has simple functions built in - in this case, you want crosstab:
pd.crosstab(dat['col1'], dat['col2'])
Full code:
import pandas as pd
from io import StringIO
x = '''col1 col2
a e
b g
c h
d f
a g
b h
c f
d e
a f
b g
c g
d h
a e
b e
c g
d h
b h'''
dat = pd.read_csv(StringIO(x), sep=r'\s+')  # raw string avoids an invalid-escape warning
pd.crosstab(dat['col1'], dat['col2'])
You're looking for a crosstab:
count_matrix = pd.crosstab(index=df["col1"], columns=df["col2"])
print(count_matrix)
col2 e f g h
col1
a 2 1 1 0
b 1 0 2 2
c 0 1 2 1
d 1 1 0 2
If you don't like the column/index names (e.g. you still see "col1" and "col2" in the output), you can remove them with rename_axis:
count_matrix = count_matrix.rename_axis(index=None, columns=None)
print(count_matrix)
e f g h
a 2 1 1 0
b 1 0 2 2
c 0 1 2 1
d 1 1 0 2
If you want that all together in one snippet:
count_matrix = (pd.crosstab(index=df["col1"], columns=df["col2"])
                  .rename_axis(index=None, columns=None))

How to replace row values in a particular column using index?

I have the following data frames,
data frame- 1 (named as df1)
index A B C
1 q a w
2 e d q
3 r f r
4 t g t
5 y j o
6 i k p
7 j w k
8 i o u
9 a p v
10 o l a
data frame- 2 (named as df2)
index C
3 a
7 b
9 c
10 d
I tried to replace the values of column "C" in df1 at specific index values with the values from df2, but the code below gave the following result:
df1['C'] = df2
Output:
index A B C
1 q a NaN
2 e d NaN
3 r f a
4 t g NaN
5 y j NaN
6 i k NaN
7 j w b
8 i o NaN
9 a p c
10 o l d
But I want something like this,
Expected output:
index A B C
1 q a w
2 e d q
3 r f a
4 t g t
5 y j o
6 i k p
7 j w b
8 i o u
9 a p c
10 o l d
Clearly I don't want NaN values in column "C". The other values should stay as they are, and only the rows at those particular index values should change.
Please let me know the solution.
Thanks in advance!
Assuming index is the actual index of both frames, you can assign with loc:
df1.loc[df2.index, 'C'] = df2['C']
Or, even more simply:
df1.update(df2)
Output:
A B C
index
1 q a w
2 e d q
3 r f a
4 t g t
5 y j o
6 i k p
7 j w b
8 i o u
9 a p c
10 o l d
Try this:
for idx, row in df2.iterrows():
    df1.at[idx, 'C'] = row['C']
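The loc and update answers above can be checked side by side with the sketch below. Both frames are rebuilt from the question, assuming "index" is the real index of each frame. The copies via_loc and via_update are illustrative names, used so the original df1 stays untouched.

```python
import pandas as pd

# Frames rebuilt from the question; "index" is assumed to be the actual index.
df1 = pd.DataFrame(
    {"A": list("qertyijiao"), "B": list("adfgjkwopl"), "C": list("wqrtopkuva")},
    index=pd.RangeIndex(1, 11, name="index"),
)
df2 = pd.DataFrame({"C": list("abcd")},
                   index=pd.Index([3, 7, 9, 10], name="index"))

# Option 1: label-based assignment, touching only the rows present in df2.
via_loc = df1.copy()
via_loc.loc[df2.index, "C"] = df2["C"]

# Option 2: update() aligns on index and column labels and modifies in place.
via_update = df1.copy()
via_update.update(df2)

print(via_loc)
```

Note that update() skips NaN values in df2, so it cannot be used to overwrite a cell with a missing value.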

Pandas Shift Row Value to Match Column Name

I have a sample dataset with a fixed list of column names. After shifting data around, each row holds its letters packed to the left, as seen below.
I am trying to shift the values in each row so that each one sits under the column with the matching name. I tried DataFrame.shift() but have not had much success. The goal is the "After" table below. Any thoughts?
import pandas as pd
# every string has 5 characters; blank cells are single spaces
df = pd.DataFrame({'A': list('AAAAA'),
                   'B': list('CBBDE'),
                   'C': list('DDCEG'),
                   'D': list('EEDF '),
                   'E': list('FFE  '),
                   'F': list('GGF  '),
                   'G': list('  G  ')})
   A  B  C  D  E  F  G
0  A  C  D  E  F  G
1  A  B  D  E  F  G
2  A  B  C  D  E  F  G
3  A  D  E  F
4  A  E  G
After:
   A  B  C  D  E  F  G
0  A     C  D  E  F  G
1  A  B     D  E  F  G
2  A  B  C  D  E  F  G
3  A        D  E  F
4  A           E     G
Here's a broadcasted comparison approach. This will be quite fast, but does have a higher memory complexity.
import numpy as np
a = df.to_numpy()
b = df.columns.to_numpy()
pd.DataFrame(np.equal.outer(a, b).any(1) * b, columns=b)
   A  B  C  D  E  F  G
0  A     C  D  E  F  G
1  A  B     D  E  F  G
2  A  B  C  D  E  F  G
3  A        D  E  F
4  A           E     G
This is more of a pivot problem:
s=df.mask(df=='').stack().reset_index()
s.pivot(index='level_0',columns=0,values=0)
Out[34]:
0 A B C D
level_0
0 A B C D
1 A NaN C NaN
2 A NaN C D
Here's one way with stack, merge, pivot:
new_df = df.stack().reset_index()
(new_df.merge(new_df, left_on=['level_0', 'level_1'],
right_on=['level_0',0],
how='left')
.pivot('level_0', 'level_1', '0_y')
)
Output:
level_1 A B C D E F G
level_0
0 A NaN C D E F G
1 A B NaN D E F G
2 A B C D E F G
3 A NaN NaN D E F NaN
4 A NaN NaN NaN E NaN G
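For comparison with the answers above, the same alignment can be written as a plain membership test per row: a cell keeps its column's name if that letter occurs anywhere in the row, and is left empty otherwise. This is only a sketch. The frame is rebuilt row by row from the question's table, assuming blanks are single spaces, and rows, records and aligned are illustrative names.

```python
import pandas as pd

cols = list("ABCDEFG")

# Left-packed rows from the question, padded with spaces to the full width.
rows = ["ACDEFG", "ABDEFG", "ABCDEFG", "ADEF", "AEG"]
df = pd.DataFrame([list(r.ljust(len(cols))) for r in rows], columns=cols)

records = []
for row in df.to_numpy():
    present = set(row)  # letters that occur somewhere in this row
    # Put each letter under the column that carries its name.
    records.append([name if name in present else "" for name in cols])

aligned = pd.DataFrame(records, columns=cols, index=df.index)
print(aligned)
```

This loops in Python rather than vectorizing, so it trades speed for readability on large frames.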

Expand pandas dataframe by replacing cell value with a list

I have a pandas dataframe like this below:
A B C
a b c
d e f
where A, B and C are column names. Now I have a list:
mylist = [1,2,3]
I want to replace the c in column C with the list, so that the dataframe expands to one row per value in the list, like below:
A B C
a b 1
a b 2
a b 3
d e f
Any help would be appreciated!
I tried this:
mylist = [1,2,3]
x=pd.DataFrame({'mylist':mylist})
x['C']='c'
res= pd.merge(df,x,on=['C'],how='left')
res['mylist']=res['mylist'].fillna(res['C'])
Then drop the old column and rename the new one:
del res['C']
res.rename(columns={"mylist":"C"},inplace=True)
print(res)
Output:
A B C
0 a b 1
1 a b 2
2 a b 3
3 d e f
You can use:
print (df)
A B C
0 a b c
1 d e f
2 a b c
3 t e w
mylist = [1,2,3]
idx1 = df.index[df.C == 'c']
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df.loc[idx1.repeat(len(mylist))].assign(C=mylist * len(idx1)),
                df[df.C != 'c']])
print (df)
A B C
0 a b 1
0 a b 2
0 a b 3
2 a b 1
2 a b 2
2 a b 3
1 d e f
3 t e w
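Another way to get the same expansion, shown here only as a sketch, is to put the list itself into the matching cells and let DataFrame.explode turn each list element into its own row. This assumes pandas 1.1 or later for ignore_index. The name expanded is illustrative.

```python
import pandas as pd

# Two-row frame from the question.
df = pd.DataFrame({"A": ["a", "d"], "B": ["b", "e"], "C": ["c", "f"]})
mylist = [1, 2, 3]

expanded = df.copy()

# Replace every 'c' cell with the whole list; other cells stay scalar.
expanded["C"] = expanded["C"].map(lambda v: mylist if v == "c" else v)

# explode() emits one row per list element and leaves scalar cells alone.
expanded = expanded.explode("C", ignore_index=True)
print(expanded)
```

Unlike the repeat-and-concat approach, this keeps the original row order, so unmatched rows are not moved to the end.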
