Display two columns with a condition in Pandas

Display two columns with a condition in Pandas - python

Suppose I have a dataframe df such as
A B C
0 a 1
1 b 1
2 c 2
I would like to return B, and C when C==1, like so
B C
a 1
b 1
I've got as far as df.B[df.C==1], which returns
B
a
b
which is right (countwise) but wrong in the slice. How do I get C?

You can use loc or query:
print df.loc[df.C==1, ['B','C']]
B C
0 a 1
1 b 1
print df[['B','C']].query('C == 1')
B C
0 a 1
1 b 1
Or if you need only column C:
print df.loc[df.C==1, 'C']
0 1
1 1
Name: C, dtype: int64

Related

How to convert binary columns with multiple occurrences into categorical data in Pandas

I have the following example data set
A
B
C
D
foo
0
1
1
bar
0
0
1
baz
1
1
0
How could extract the column names of each 1 occurrence in a row and put that into another column E so that I get the following table:
A
B
C
D
E
foo
0
1
1
C, D
bar
0
0
1
D
baz
1
1
0
B, C
Note that there can be more than two 1s per row.

You can use DataFrame.dot.
df['E'] = df[['B', 'C', 'D']].dot(df.columns[1:] + ', ').str.rstrip(', ')
df
A B C D E
0 foo 0 1 1 C, D
1 bar 0 0 1 D
2 baz 1 1 0 B, C
Inspired by jezrael's answer in this post.
Another way is that you can convert each row to boolean and use it as a selection mask to filter the column names.
cols = pd.Index(['B', 'C', 'D'])
df['E'] = df[cols].astype('bool').apply(lambda row: ", ".join(cols[row]), axis=1)
df
A B C D E
0 foo 0 1 1 C, D
1 bar 0 0 1 D
2 baz 1 1 0 B, C

Join an array to every row in the pandas dataframe

I have a data frame and an array as follows:
df = pd.DataFrame({'x': range(0,5), 'y' : range(1,6)})
s = np.array(['a', 'b', 'c'])
I would like to attach the array to every row of the data frame, such that I got a data frame as follows:
What would be the most efficient way to do this?

Just plain assignment:
# replace the first `s` with your desired column names
df[s] = [s]*len(df)

Try this:
for i in s:
df[i] = i
Output:
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

You could use pandas.concat:
pd.concat([df, pd.DataFrame(s).T], axis=1).ffill()
output:
x y 0 1 2
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

You can try using df.loc here.
df.loc[:, s] = s
print(df)
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

python pandas index of ones (1s) at row-wise

From Pandas Dataframe, how to get the index of all ones at the row level?
My data frame has around a hundred columns. here is an example:
a b c d
0 1 0 1 0
1 0 0 0 1
2 1 1 0 1
3 1 1 0 0
4 1 1 1 1
The expected result is
0 a,c
1 d
2 a,b,d
3 a,b
4 a,b,c,d
I found this question on stackoverflow
index of non "NaN" values in Pandas
but it works at the column level
Thanks in advance.

If there are only 1 and 0 values use DataFrame.dot for matrix multiplication with columns names and separator, last remove separator with Series.str.rstrip:
df['e'] = df.dot(df.columns + ', ').str.rstrip(', ')
#if exist another values like 0,1 and compare 1
#df['e'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')
print (df)
a b c d e
0 1 0 1 0 a, c
1 0 0 0 1 d
2 1 1 0 1 a, b, d
3 1 1 0 0 a, b
4 1 1 1 1 a, b, c, d
Also for Series use:
s = df.dot(df.columns + ', ').str.rstrip(', ')
print (s)
0 a, c
1 d
2 a, b, d
3 a, b
4 a, b, c, d
dtype: object

Try:
df=df.stack()
df=df.loc[df.eq(1)].reset_index(level=1).groupby(level=0).agg(', '.join)
Outputs:
level_1
0 a, c
1 d
2 a, b, d
3 a, b
4 a, b, c, d

merging rows with repeating column values

I have a dataframe as follows:
data
0 a
1 a
2 a
3 a
4 a
5 b
6 b
7 b
8 b
9 b
I want to group the repeating values of a and b into a single row element as follows:
data
0 a
a
a
a
a
1 b
b
b
b
b
How do I go about doing this? I tried the following but it puts each repeating value in its own column
df.groupby('data')

Seems like a pivot problem, but since you missing the column(create by cumcount) and index(create by factorize) columns , it is hard to figure out
pd.crosstab(pd.factorize(df.data)[0],df.groupby('data').cumcount(),df.data,aggfunc='sum')
Out[358]:
col_0 0 1 2 3 4
row_0
0 a a a a a
1 b b b b b

Something like
index = ((df['data'] != df['data'].shift()).cumsum() - 1).rename(columns= {'data':''})
df = df.set_index(index)
data
0 a
0 a
0 a
0 a
0 a
1 b
1 b
1 b
1 b
1 b

You can use pd.factorize followed by set_index:
df = df.assign(key=pd.factorize(df['data'], sort=False)[0]).set_index('key')
print(df)
data
key
0 a
0 a
0 a
0 a
0 a
1 b
1 b
1 b
1 b
1 b

Append values of row with same values

I have following data frame:
1 A a
1 A b
2 B c
1 A d
How do I append all the values of a row with same values to data frame:
1 A a,c,d
2 B c

You can use groupby and apply function join :
df.columns = ['a','b','c']
print (df)
a b c
0 1 A a
1 1 A b
2 2 B c
3 1 A d
print (df.groupby(['a', 'b'])['c'].apply(', '.join).reset_index())
a b c
0 1 A a, b, d
1 2 B c
Or if first column is index:
df.columns = ['a','b']
print (df)
a b
1 A a
1 A b
2 B c
1 A d
df1 = df.b.groupby([df.index, df.a]).apply(', '.join).reset_index(name='c')
df1.columns = ['a','b','c']
print (df1)
a b c
0 1 A a, b, d
1 2 B c

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Display two columns with a condition in Pandas - python

Suppose I have a dataframe df such as A B C 0 a 1 1 b 1 2 c 2 I would like to return B, and C when C==1, like so B C a 1 b 1 I've got as far as df.B[df.C==1], which returns B a b which is right (countwise) but wrong in the slice. How do I get C?

You can use loc or query: print df.loc[df.C==1, ['B','C']] B C 0 a 1 1 b 1 print df[['B','C']].query('C == 1') B C 0 a 1 1 b 1 Or if you need only column C: print df.loc[df.C==1, 'C'] 0 1 1 1 Name: C, dtype: int64

Related

How to convert binary columns with multiple occurrences into categorical data in Pandas

Join an array to every row in the pandas dataframe

python pandas index of ones (1s) at row-wise

merging rows with repeating column values

Append values of row with same values

Categories

Resources