print columns names if row equal to value in python - python

How can i iterate between rows and print the columns names in one column if the value = 1
mydata = [{'a' : '0', 'b': 1, 'c': 0}, {'a' : 1, 'b': 0, 'c':1}, {'a' : '0', 'b': 1, 'c':1}]
df = pd.DataFrame(mydata)
a b c Result
0 1 0 b
1 0 1 a , c
0 1 1 b , c
The result only shows the columns name that are equal to 1

Using dot
df['New']=df.astype(int).dot(df.columns+',').str[:-1]
df
Out[44]:
a b c New
0 0 1 0 b
1 1 0 1 a,c
2 0 1 1 b,c

Use boolean indexing to index the row, and join column names
df['new'] = df.eq(1).apply(lambda x: ', '.join(x[x].index), axis = 1)
a b c new
0 0 1 0 b
1 1 0 1 a, c
2 0 1 1 b, c

You an also do this:
for i in range(len(df)):
df.set_value(i,'Result',[df.columns[(df == 1).iloc[i]]])

More beginners solutions would be
for index, row in df.iterrows():
for key in row.keys():
print(key if row.get(key) == 1 else None)

Related

How to sum up pandas columns where the number is determined by the values of another column? [duplicate]

I have a data frame as:
a b c d......
1 1
3 3 3 5
4 1 1 4 6
1 0
I want to select number of columns based on value given in column "a". In this case for first row it would only select column b.
How can I achieve something like:
df.iloc[:,column b:number of columns corresponding to value in column a]
My expected output would be:
a b c d e
1 1 0 0 1 # 'e' contains value in column b because colmn a = 1
3 3 3 5 335 # 'e' contains values of column b,c,d because colm a
4 1 1 4 1 # = 3
1 0 NAN
Define a little function for this:
def select(df, r):
return df.iloc[r, 1:1 + df.iat[r, 0]]
The function uses iat to query the a column for that row, and iloc to select columns from the same row.
Call it as such:
select(df, 0)
b 1.0
Name: 0, dtype: float64
And,
select(df, 1)
b 3.0
c 3.0
d 5.0
Name: 1, dtype: float64
Based on your edit, consider this -
df
a b c d e
0 1 1 0 0 0
1 3 3 3 5 0
2 4 1 1 4 6
3 1 0 0 0 0
Use where/mask (with numpy broadcasting) + agg here -
df['e'] = df.iloc[:, 1:]\
.astype(str)\
.where(np.arange(df.shape[1] - 1) < df.a[:, None], '')\
.agg(''.join, axis=1)
df
a b c d e
0 1 1 0 0 1
1 3 3 3 5 335
2 4 1 1 4 1146
3 1 0 0 0 0
If nothing matches, then those entries in e will have an empty string. Just use replace -
df['e'] = df['e'].replace('', np.nan)
A numpy slicing approach
a = v[:, 0]
b = v[:, 1:]
n, m = b.shape
b = b.ravel()
b = np.where(b == 0, '', b.astype(str))
r = np.arange(n) * m
f = lambda t: b[t[0]:t[1]]
df.assign(g=list(map(''.join, map(f, zip(r, r + a)))))
a b c d e g
0 1 1 0 0 0 1
1 3 3 3 5 0 335
2 4 1 1 4 6 1146
3 1 0 0 0 0
Edit: one line solution with slicing.
df["f"] = df.astype(str).apply(lambda r: "".join(r[1:int(r["a"])+1]), axis=1)
# df["f"] = df["f"].astype(int) if you need `f` to be integer
df
a b c d e f
0 1 1 X X X 1
1 3 3 3 5 X 335
2 4 1 1 4 6 1146
3 1 0 X X X 0
Dataset used:
df = pd.DataFrame({'a': {0: 1, 1: 3, 2: 4, 3: 1},
'b': {0: 1, 1: 3, 2: 1, 3: 0},
'c': {0: 'X', 1: '3', 2: '1', 3: 'X'},
'd': {0: 'X', 1: '5', 2: '4', 3: 'X'},
'e': {0: 'X', 1: 'X', 2: '6', 3: 'X'}})
Suggestion for improvement would be appreciated!

How to conditionally add one hot vector to a Pandas DataFrame

I have the following Pandas DataFrame in Python:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1, 2, 3], [3, 2, 1], [2, 1, 1]]),
columns=['a', 'b', 'c'])
df
It looks as the following when you output it:
a b c
0 1 2 3
1 3 2 1
2 2 1 1
I need to add 3 new columns, as column "d", column "e", and column "f".
Values in each new column will be determined based on the values of column "b" and column "c".
In a given row:
If the value of column "b" is bigger than the value of column "c", columns [d, e, f] will have the values [1, 0, 0].
If the value of column "b" is equal to the value of column "c", columns [d, e, f] will have the values [0, 1, 0].
If the value of column "b" is smaller than the value of column "c", columns [d, e, f] will have the values [0, 0, 1].
After this operation, the DataFrame needs to look as the following:
a b c d e f
0 1 2 3 0 0 1 # Since b smaller than c
1 3 2 1 1 0 0 # Since b bigger than c
2 2 1 1 0 1 0 # Since b = c
My original DataFrame is much bigger than the one in this example.
Is there a good way of doing this in Python without looping through the DataFrame?
You can use np.where to create condition vector and use str.get_dummies to create dummies
df['vec'] = np.where(df.b>df.c, 'd', np.where(df.b == df.c, 'e', 'f'))
df = df.assign(**df['vec'].str.get_dummies()).drop('vec',1)
a b c d e f
0 1 2 3 0 0 1
1 3 2 1 1 0 0
2 2 1 1 0 1 0
Let us try np.sign with get_dummies, -1 is c<b, 0 is c=b, 1 is c>b
df=df.join(np.sign(df.eval('c-b')).map({-1:'d',0:'e',1:'f'}).astype(str).str.get_dummies())
df
Out[29]:
a b c d e f
0 1 2 3 0 0 1
1 3 2 1 1 0 0
2 2 1 1 0 1 0
You simply harness the Boolean conditions you've already specified.
df["d"] = np.where(df.b > df.c, 1, 0)
df["e"] = np.where(df.b == df.c, 1, 0)
df["f"] = np.where(df.b < df.c, 1, 0)

How do I delete a column that contains only zeros from a given row in pandas

I've found how to remove column with zeros for all the rows using the command df.loc[:, (df != 0).any(axis=0)], and I need to do the same but given the row number.
For example, for the folowing df
In [75]: df = pd.DataFrame([[1,1,0,0], [1,0,1,0]], columns=['a','b','c','d'])
In [76]: df
Out[76]:
a b c d
0 1 1 0 0
1 1 0 1 0
Give me the columns with non-zeros for the row 0 and I would expect the result:
a b
0 1 1
And for the row 1 get:
a c
1 1 1
I tried a lot of combinations of commands but I couldn't find a solution.
UPDATE:
I have a 300x300 matrix, I need to better visualize its result.
Below a pseudo-code trying to show what I need
for i in range(len(df[rows])):
_df = df.iloc[i]
_df = _df.filter(remove_zeros_columns)
print('Row: ', i)
print(_df)
Result:
Row: 0
a b
0 1 1
Row: 1
a c f
1 1 5 10
Row: 2
e
2 20
Best Regards.
Kleyson Rios.
You can change data structure:
df = df.reset_index().melt('index', var_name='columns').query('value != 0')
print (df)
index columns value
0 0 a 1
1 1 a 1
2 0 b 1
5 1 c 1
If need new column by values joined by , compare values for not equal by DataFrame.ne and use matrix multiplication by DataFrame.dot:
df['new'] = df.ne(0).dot(df.columns + ', ').str.rstrip(', ')
print (df)
a b c d new
0 1 1 0 0 a, b
1 1 0 1 0 a, c
EDIT:
for i in df.index:
row = df.loc[[i]]
a = row.loc[:, (row != 0).any()]
print ('Row {}'.format(i))
print (a)
Or:
def f(x):
print ('Row {}'.format(x.name))
print (x[x!=0].to_frame().T)
df.apply(f, axis=1)
Row 0
a b
0 1 1
Row 1
a c
1 1 1
df = pd.DataFrame([[1, 1, 0, 0], [1, 0, 1, 0]], columns=['a', 'b', 'c', 'd'])
def get(row):
return list(df.columns[row.ne(0)])
df['non zero column'] = df.apply(lambda x: get(x), axis=1)
print(df)
also if you want single liner use this
df['non zero column'] = [list(df.columns[i]) for i in df.ne(0).values]
output
a b c d non zero column
0 1 1 0 0 [a, b]
1 1 0 1 0 [a, c]
I think this answers your question more strictly.
Just change the value of given_row as needed.
given_row = 1
mask_all_rows = df.apply(lambda x: x!=0, axis=0)
mask_row = mask_all_rows.loc[given_row]
cols_to_keep = mask_row.index[mask_row == True].tolist()
df_filtered = df[cols_to_keep]
# And if you only want to keep the given row
df_filtered = df_filtered[df_filtered.index == given_row]

Selecting columns based on value given in other column using pandas in python

I have a data frame as:
a b c d......
1 1
3 3 3 5
4 1 1 4 6
1 0
I want to select number of columns based on value given in column "a". In this case for first row it would only select column b.
How can I achieve something like:
df.iloc[:,column b:number of columns corresponding to value in column a]
My expected output would be:
a b c d e
1 1 0 0 1 # 'e' contains value in column b because colmn a = 1
3 3 3 5 335 # 'e' contains values of column b,c,d because colm a
4 1 1 4 1 # = 3
1 0 NAN
Define a little function for this:
def select(df, r):
return df.iloc[r, 1:1 + df.iat[r, 0]]
The function uses iat to query the a column for that row, and iloc to select columns from the same row.
Call it as such:
select(df, 0)
b 1.0
Name: 0, dtype: float64
And,
select(df, 1)
b 3.0
c 3.0
d 5.0
Name: 1, dtype: float64
Based on your edit, consider this -
df
a b c d e
0 1 1 0 0 0
1 3 3 3 5 0
2 4 1 1 4 6
3 1 0 0 0 0
Use where/mask (with numpy broadcasting) + agg here -
df['e'] = df.iloc[:, 1:]\
.astype(str)\
.where(np.arange(df.shape[1] - 1) < df.a[:, None], '')\
.agg(''.join, axis=1)
df
a b c d e
0 1 1 0 0 1
1 3 3 3 5 335
2 4 1 1 4 1146
3 1 0 0 0 0
If nothing matches, then those entries in e will have an empty string. Just use replace -
df['e'] = df['e'].replace('', np.nan)
A numpy slicing approach
a = v[:, 0]
b = v[:, 1:]
n, m = b.shape
b = b.ravel()
b = np.where(b == 0, '', b.astype(str))
r = np.arange(n) * m
f = lambda t: b[t[0]:t[1]]
df.assign(g=list(map(''.join, map(f, zip(r, r + a)))))
a b c d e g
0 1 1 0 0 0 1
1 3 3 3 5 0 335
2 4 1 1 4 6 1146
3 1 0 0 0 0
Edit: one line solution with slicing.
df["f"] = df.astype(str).apply(lambda r: "".join(r[1:int(r["a"])+1]), axis=1)
# df["f"] = df["f"].astype(int) if you need `f` to be integer
df
a b c d e f
0 1 1 X X X 1
1 3 3 3 5 X 335
2 4 1 1 4 6 1146
3 1 0 X X X 0
Dataset used:
df = pd.DataFrame({'a': {0: 1, 1: 3, 2: 4, 3: 1},
'b': {0: 1, 1: 3, 2: 1, 3: 0},
'c': {0: 'X', 1: '3', 2: '1', 3: 'X'},
'd': {0: 'X', 1: '5', 2: '4', 3: 'X'},
'e': {0: 'X', 1: 'X', 2: '6', 3: 'X'}})
Suggestion for improvement would be appreciated!

np.where multiple return values

Using pandas and numpy I am trying to process a column in a dataframe, and want to create a new column with values relating to it. So if in column x the value 1 is present, in the new column it would be a, for value 2 it would be b etc
I can do this for single conditions, i.e
df['new_col'] = np.where(df['col_1'] == 1, a, n/a)
And I can find example of multiple conditions i.e if x = 3 or x = 4 the value should a, but not to do something like if x = 3 the value should be a and if x = 4 the value be c.
I tried simply running two lines of code such as :
df['new_col'] = np.where(df['col_1'] == 1, a, n/a)
df['new_col'] = np.where(df['col_1'] == 2, b, n/a)
But obviously the second line overwrites. Am I missing something crucial?
I think you can use loc:
df.loc[(df['col_1'] == 1, 'new_col')] = a
df.loc[(df['col_1'] == 2, 'new_col')] = b
Or:
df['new_col'] = np.where(df['col_1'] == 1, a, np.where(df['col_1'] == 2, b, np.nan))
Or numpy.select:
df['new_col'] = np.select([df['col_1'] == 1, df['col_1'] == 2],[a, b], default=np.nan)
Or use Series.map, if no match get NaN by default:
d = { 0 : 'a', 1 : 'b'}
df['new_col'] = df['col_1'].map(d)
I think numpy choose() is the best option for you.
import numpy as np
choices = 'abcde'
N = 10
np.random.seed(0)
data = np.random.randint(1, len(choices) + 1, size=N)
print(data)
print(np.choose(data - 1, choices))
Output:
[5 1 4 4 4 2 4 3 5 1]
['e' 'a' 'd' 'd' 'd' 'b' 'd' 'c' 'e' 'a']
you could define a dict with your desired transformations.
Then loop through the a DataFrame column and fill it.
There may a more elegant ways, but this will work:
# create a dummy DataFrame
df = pd.DataFrame( np.random.randint(2, size=(6,4)), columns=['col_1', 'col_2', 'col_3', 'col_4'], index=range(6) )
# create a dict with your desired substitutions:
swap_dict = { 0 : 'a',
1 : 'b',
999 : 'zzz', }
# introduce new column and fill with swapped information:
for i in df.index:
df.loc[i, 'new_col'] = swap_dict[ df.loc[i, 'col_1'] ]
print df
returns something like:
col_1 col_2 col_3 col_4 new_col
0 1 1 1 1 b
1 1 1 1 1 b
2 0 1 1 0 a
3 0 1 0 0 a
4 0 0 1 1 a
5 0 0 1 0 a
Use the pandas Series.map instead of where.
import pandas as pd
df = pd.DataFrame({'col_1' : [1,2,4,2]})
print(df)
def ab_ify(v):
if v == 1:
return 'a'
elif v == 2:
return 'b'
else:
return None
df['new_col'] = df['col_1'].map(ab_ify)
print(df)
# output:
#
# col_1
# 0 1
# 1 2
# 2 4
# 3 2
# col_1 new_col
# 0 1 a
# 1 2 b
# 2 4 None
# 3 2 b

Categories

Resources