python pandas index of ones (1s) at row-wise


From a pandas DataFrame, how do I get the column names of all ones, row by row?
My data frame has around a hundred columns. Here is an example:
a b c d
0 1 0 1 0
1 0 0 0 1
2 1 1 0 1
3 1 1 0 0
4 1 1 1 1
The expected result is
0 a,c
1 d
2 a,b,d
3 a,b
4 a,b,c,d
I found this question on Stack Overflow,
index of non "NaN" values in Pandas,
but it works at the column level.
Thanks in advance.

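If there are only 1 and 0 values, use DataFrame.dot for matrix multiplication with the column names plus a separator, then remove the trailing separator with Series.str.rstrip. This works because multiplying a string by 1 keeps it and multiplying it by 0 gives an empty string, so each row's dot product concatenates the names of the columns holding 1:
df['e'] = df.dot(df.columns + ', ').str.rstrip(', ')
# if there are other values besides 0 and 1, compare with 1 first
#df['e'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')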
print (df)
a b c d e
0 1 0 1 0 a, c
1 0 0 0 1 d
2 1 1 0 1 a, b, d
3 1 1 0 0 a, b
4 1 1 1 1 a, b, c, d
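To get a Series instead, run this on the original frame (before column e is added, because the string column e would break dot):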
s = df.dot(df.columns + ', ').str.rstrip(', ')
print (s)
0 a, c
1 d
2 a, b, d
3 a, b
4 a, b, c, d
dtype: object
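A self-contained sketch of the same dot trick, wrapped in a small helper. The function name ones_to_labels is hypothetical, and it uses the eq(1) variant so cells other than 1 are ignored; a row with no 1s comes back as an empty string.

```python
import pandas as pd

# Example frame from the question (values are only 0 and 1)
df = pd.DataFrame(
    {"a": [1, 0, 1, 1, 1], "b": [0, 0, 1, 1, 1],
     "c": [1, 0, 0, 0, 1], "d": [0, 1, 1, 0, 1]}
)


def ones_to_labels(frame: pd.DataFrame, sep: str = ", ") -> pd.Series:
    """Hypothetical helper: per row, the names of columns holding 1, joined by sep."""
    # eq(1) gives a boolean mask; True * "a, " == "a, " and False * "a, " == ""
    mask = frame.eq(1)
    joined = mask.dot(frame.columns + sep)
    # Drop the separator left after the last label
    return joined.str.rstrip(sep)


print(ones_to_labels(df))
```

Note that rstrip removes any trailing commas or spaces, so it would also trim a column name that itself ends in one of those characters.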

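Try stacking the frame into a long Series, keeping only the 1s, and joining the column labels per row (selecting the level_1 column before aggregating avoids trying to join the integer values, which fails in recent pandas):
s = df.stack()
out = s[s.eq(1)].reset_index(level=1)['level_1'].groupby(level=0).agg(', '.join)
print(out)
Outputs:
0 a, c
1 d
2 a, b, d
3 a, b
4 a, b, c, d
Name: level_1, dtype: object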
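The stack approach drops rows that contain no 1s at all. A sketch that restores them with reindex, assuming an empty string is an acceptable placeholder (the small frame below is made up for illustration):

```python
import pandas as pd

# Small illustrative frame; row 1 has no 1s
df = pd.DataFrame({"a": [1, 0, 0], "b": [0, 0, 1], "c": [1, 0, 1]})

s = df.stack()                             # long Series indexed by (row, column)
ones = s[s.eq(1)]                          # keep only cells equal to 1
rows = ones.index.get_level_values(0)      # row label of each kept cell
cols = ones.index.get_level_values(1)      # column label of each kept cell

out = (
    pd.Series(cols, index=rows)
    .groupby(level=0)
    .agg(", ".join)                        # join the column labels per row
    .reindex(df.index, fill_value="")      # rows without 1s become ""
)
print(out)
```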

Related

How to convert binary columns with multiple occurrences into categorical data in Pandas

I have the following example data set:
A B C D
foo 0 1 1
bar 0 0 1
baz 1 1 0
How could I extract the column names of each 1 occurrence in a row and put them into another column E, so that I get the following table:
A B C D E
foo 0 1 1 C, D
bar 0 0 1 D
baz 1 1 0 B, C
Note that there can be more than two 1s per row.

You can use DataFrame.dot.
df['E'] = df[['B', 'C', 'D']].dot(df.columns[1:] + ', ').str.rstrip(', ')
df
A B C D E
0 foo 0 1 1 C, D
1 bar 0 0 1 D
2 baz 1 1 0 B, C
Inspired by jezrael's answer in this post.
Another way is to convert each row to boolean and use it as a selection mask to filter the column names.
cols = pd.Index(['B', 'C', 'D'])
df['E'] = df[cols].astype('bool').apply(lambda row: ", ".join(cols[row]), axis=1)
df
A B C D E
0 foo 0 1 1 C, D
1 bar 0 0 1 D
2 baz 1 1 0 B, C
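A sketch that avoids hard-coding the column list, assuming that every numeric column is a 0/1 indicator column (select_dtypes picks them, so the text column A is skipped automatically):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "baz"],
                   "B": [0, 0, 1], "C": [1, 0, 1], "D": [1, 1, 0]})

# Assumption: every numeric column is a 0/1 indicator column
flags = df.select_dtypes("number")
# Boolean mask times "name, " strings, concatenated across each row
df["E"] = flags.eq(1).dot(flags.columns + ", ").str.rstrip(", ")
print(df)
```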

Pandas select on multiple columns then replace

I am trying to do a multiple-column select and then a replace in pandas.
df:
a b c d e
0 1 1 0 none
0 0 0 1 none
1 0 0 0 none
0 0 0 0 none
Select where any or all of a, b, c, d are nonzero:
i, j = np.where(df)
s=pd.Series(dict(zip(zip(i, j),
df.columns[j]))).reset_index(-1, drop=True)
s:
0 b
0 c
1 d
2 a
Now I want to replace the values in column e with the series:
df['e'] = s.values
so that e looks like:
e:
b, c
d
a
none
But the problem is that the length of the series differs from the number of rows in the dataframe.
Any idea how I can do this?

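Use DataFrame.dot for the product with the column names, strip the trailing separator with rstrip, and finally use numpy.where to replace empty strings with None:
import numpy as np
sub = df[['a', 'b', 'c', 'd']]  # only the 0/1 columns; the string column e would break dot
e = sub.dot(sub.columns + ', ').str.rstrip(', ')
df['e'] = np.where(e.astype(bool), e, None)
print (df)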
a b c d e
0 0 1 1 0 b, c
1 0 0 0 1 d
2 1 0 0 0 a
3 0 0 0 0 None
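A runnable sketch of the same idea using the question's frame. Instead of None, it keeps the original "none" placeholder for rows without any 1 via Series.mask; that choice of placeholder is an assumption about what the asker wants.

```python
import pandas as pd

# Question's frame; column e starts as the placeholder string "none"
df = pd.DataFrame({"a": [0, 0, 1, 0], "b": [1, 0, 0, 0],
                   "c": [1, 0, 0, 0], "d": [0, 1, 0, 0],
                   "e": ["none"] * 4})

flags = df[["a", "b", "c", "d"]]           # only the indicator columns
e = flags.eq(1).dot(flags.columns + ", ").str.rstrip(", ")
# An empty string means the row had no 1s; keep the original "none" there
df["e"] = e.mask(e.eq(""), df["e"])
print(df)
```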

You can locate the 1s and use their locations as boolean indexes into the dataframe columns:
df['e'] = (df==1).apply(lambda x: df.columns[x], axis=1)\
.str.join(",").replace('','none')
# a b c d e
#0 0 1 1 0 b,c
#1 0 0 0 1 d
#2 1 0 0 0 a
#3 0 0 0 0 none

Append values of row with same values

I have the following data frame:
1 A a
1 A b
2 B c
1 A d
How do I combine the values of all rows that share the same values in the first two columns, to get this data frame:
1 A a,b,d
2 B c

You can use groupby and apply the join function:
df.columns = ['a','b','c']
print (df)
a b c
0 1 A a
1 1 A b
2 2 B c
3 1 A d
print (df.groupby(['a', 'b'])['c'].apply(', '.join).reset_index())
a b c
0 1 A a, b, d
1 2 B c
Or, if the first column is the index:
df.columns = ['a','b']
print (df)
a b
1 A a
1 A b
2 B c
1 A d
df1 = df.b.groupby([df.index, df.a]).apply(', '.join).reset_index(name='c')
df1.columns = ['a','b','c']
print (df1)
a b c
0 1 A a, b, d
1 2 B c
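The same grouping can also be spelled with named aggregation, which scales to aggregating several columns at once. A sketch rebuilding the frame from the first answer:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1], "b": ["A", "A", "B", "A"],
                   "c": ["a", "b", "c", "d"]})

# Named aggregation: output column c = ", ".join applied to input column c
out = df.groupby(["a", "b"]).agg(c=("c", ", ".join)).reset_index()
print(out)
```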

Display two columns with a condition in Pandas

Suppose I have a dataframe df such as
A B C
0 a 1
1 b 1
2 c 2
I would like to return columns B and C where C == 1, like so:
B C
a 1
b 1
I've got as far as df.B[df.C==1], which returns
B
a
b
which is right (count-wise) but is missing column C. How do I get C as well?

You can use loc or query:
print(df.loc[df.C==1, ['B','C']])
B C
0 a 1
1 b 1
print(df[['B','C']].query('C == 1'))
B C
0 a 1
1 b 1
Or if you need only column C:
print(df.loc[df.C==1, 'C'])
0 1
1 1
Name: C, dtype: int64
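The difference between the question's attempt and the loc answer is that a single column label selects a Series, while a list of labels selects a DataFrame. A small runnable sketch rebuilding the question's frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2], "B": ["a", "b", "c"], "C": [1, 1, 2]})

only_b = df.B[df.C == 1]                    # single label -> Series of B only
subset = df.loc[df["C"].eq(1), ["B", "C"]]  # loc[row mask, column list] -> DataFrame
print(type(only_b).__name__)
print(subset)
```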

Convert Two column data frame to occurrence matrix in pandas

Hi all, I have a CSV file that contains data in the format below:
A a
A b
B f
B g
B e
B h
C d
C e
C f
The first column contains items; the second column contains the available features from the feature vector [a,b,c,d,e,f,g,h].
I want to convert this to an occurrence matrix like the one below:
a,b,c,d,e,f,g,h
A 1,1,0,0,0,0,0,0
B 0,0,0,0,1,1,1,1
C 0,0,0,1,1,1,0,0
Can anyone tell me how to do this using pandas?

Here is another way to do it using pd.get_dummies().
import pandas as pd
# your data
# =======================
df
col1 col2
0 A a
1 A b
2 B f
3 B g
4 B e
5 B h
6 C d
7 C e
8 C f
# processing
# ===================================
pd.get_dummies(df.col2, dtype=int).groupby(df.col1).max()
a b d e f g h
col1
A 1 1 0 0 0 0 0
B 0 0 0 1 1 1 1
C 0 0 1 1 1 0 0

It's unclear whether your data has a typo, but you can use crosstab for this:
In [95]:
pd.crosstab(index=df['A'], columns = df['a'])
Out[95]:
a b d e f g h
A
A 1 0 0 0 0 0
B 0 0 1 1 1 1
C 0 1 1 1 0 0
In your sample data, a appears as the name of the second column (it was read as the header), but in your expected output it is one of the values in that column.
EDIT
OK, I fixed your input data so that it generates the correct result:
In [98]:
import pandas as pd
import io
t="""A a
A b
B f
B g
B e
B h
C d
C e
C f"""
df = pd.read_csv(io.StringIO(t), sep=r'\s+', header=None, names=['A','a'])
df
Out[98]:
A a
0 A a
1 A b
2 B f
3 B g
4 B e
5 B h
6 C d
7 C e
8 C f
In [99]:
ct = pd.crosstab(index=df['A'], columns = df['a'])
ct
Out[99]:
a a b d e f g h
A
A 1 1 0 0 0 0 0
B 0 0 0 1 1 1 1
C 0 0 1 1 1 0 0
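None of the outputs above include column c, because no item has that feature. A sketch that matches the expected matrix exactly by reindexing against the full feature vector from the question and capping repeated pairs at 1 (the column names item and feature are assumptions):

```python
import pandas as pd

df = pd.DataFrame({"item": list("AABBBBCCC"),
                   "feature": list("abfgehdef")})
features = list("abcdefgh")  # full feature vector from the question

ct = (
    pd.crosstab(df["item"], df["feature"])
    .clip(upper=1)                             # count a repeated (item, feature) pair once
    .reindex(columns=features, fill_value=0)   # add absent features such as "c"
)
print(ct)
```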

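This approach builds the same occurrence matrix as a SciPy sparse COO matrix, which is much faster on large inputs:
from scipy import sparse
# Encode items (col1) and features (col2) as categories so each gets an integer code
df['col1'] = df['col1'].astype("category")
df['col2'] = df['col2'].astype("category")
df['ones'] = 1
# Rows are item codes, columns are feature codes, with a 1 at each (item, feature) pair
user_items = sparse.coo_matrix((df.ones.astype(float),
                                (df.col1.cat.codes,
                                 df.col2.cat.codes)))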
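A runnable sketch of the sparse approach that also maps the codes back to labels. As an assumption, it fixes the feature categories to the full vector a to h so that feature c gets a (zero) column, and passes an explicit shape:

```python
import pandas as pd
from scipy import sparse

df = pd.DataFrame({"col1": list("AABBBBCCC"), "col2": list("abfgehdef")})

items = df["col1"].astype("category")
# Assumption: features use the full vector from the question as categories
feats = pd.Categorical(df["col2"], categories=list("abcdefgh"))

m = sparse.coo_matrix(
    ([1.0] * len(df), (items.cat.codes.to_numpy(), feats.codes)),
    shape=(len(items.cat.categories), len(feats.categories)),
)
# Densify only for display; keep m sparse for large data
dense = pd.DataFrame(m.toarray(), index=items.cat.categories,
                     columns=feats.categories)
print(dense)
```

Duplicate (item, feature) rows are summed when the COO matrix is converted, so the result holds counts rather than 0/1 flags if the input repeats pairs.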
