Python - Pandas - create "first fail" column from other column data

I have a data frame that represents fail-data for a series of parts, showing which of 3 tests (A, B, C) pass (0) or fail (1).
   A  B  C
1  0  1  1
2  0  0  0
3  1  0  0
4  0  0  1
5  0  0  0
6  0  1  0
7  1  1  0
8  1  1  1
I'd like to add a final column to the dataframe showing the First Fail (FF) of each part, or a default (P) if no fails.
   A  B  C | FF
1  0  1  1 | B
2  0  0  0 | P
3  1  0  0 | A
4  0  0  1 | C
5  0  0  0 | P
6  0  1  0 | B
7  1  1  0 | A
8  1  1  1 | A
Is there an easy way to do this in pandas? Does it require iterating over each row?

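Maybe take the dot product of the 0/1 values with the column names, which concatenates the names of the failing tests, and keep the first character (this assumes single-character column names):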
>>> df['FF'] = df.dot(df.columns).str.slice(0, 1).replace('', 'P')
>>> df
A B C FF
1 0 1 1 B
2 0 0 0 P
3 1 0 0 A
4 0 0 1 C
5 0 0 0 P
6 0 1 0 B
7 1 1 0 A
8 1 1 1 A
Alternatively, with NumPy imported as np:
>>> df['FF'] = np.where(df.any(axis=1), df.idxmax(axis=1), 'P')
>>> df
A B C FF
1 0 1 1 B
2 0 0 0 P
3 1 0 0 A
4 0 0 1 C
5 0 0 0 P
6 0 1 0 B
7 1 1 0 A
8 1 1 1 A
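The idxmax approach above can be sketched as a self-contained script. This is a minimal sketch, assuming the test columns are exactly A, B and C and rebuilding the frame from the question's data; the names tests and fails are illustrative and not part of the original answer. Restricting the computation to the test columns means it still works once FF has been added and does not depend on single-character column names.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data: 1 = fail, 0 = pass, parts numbered 1..8.
df = pd.DataFrame(
    {
        "A": [0, 0, 1, 0, 0, 0, 1, 1],
        "B": [1, 0, 0, 0, 0, 1, 1, 1],
        "C": [1, 0, 0, 1, 0, 0, 0, 1],
    },
    index=range(1, 9),
)

tests = ["A", "B", "C"]      # assumed list of test columns, in test order
fails = df[tests].eq(1)      # boolean frame: True where a test failed

# idxmax returns the first column holding the row maximum, i.e. the first
# True; rows with no failure at all are replaced by the default "P".
df["FF"] = np.where(fails.any(axis=1), fails.idxmax(axis=1), "P")
print(df)
```

No explicit row iteration is needed; both any and idxmax are evaluated row-wise by pandas.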

Related

is there any way to convert the columns in Pandas Dataframe using its mirror image Dataframe structure

The df I have is:
0 1 2
0 0 0 0
1 0 0 1
2 0 1 0
3 0 1 1
4 1 0 0
5 1 0 1
6 1 1 0
7 1 1 1
I want to obtain a DataFrame with the columns reversed (a mirror image), keeping the original column labels:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
Is there any way to do that?
You can try:
df[:] = df.iloc[:,::-1]
df
Out[959]:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
Here is a slightly more verbose but likely more efficient solution, since it doesn't rewrite the data. It only renames and reorders the columns:
cols = df.columns
df.columns = df.columns[::-1]
df = df.loc[:,cols]
Or a shorter variant:
df = df.iloc[:,::-1].set_axis(df.columns, axis=1)
Output:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
There are other ways, but here's one solution:
df[df.columns] = df[reversed(df.columns)]
Output:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
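For reference, the reorder-then-relabel idea from the answers can be run end to end as below. This is a minimal sketch that rebuilds the question's frame from the 3-bit binary pattern it shows; the names mirrored and mirrored_np are illustrative.

```python
import pandas as pd

# Rebuild the question's frame: row i holds the 3-bit binary form of i.
df = pd.DataFrame([[(i >> 2) & 1, (i >> 1) & 1, i & 1] for i in range(8)])

# Reverse the column order positionally, then put the original labels back.
mirrored = df.iloc[:, ::-1].set_axis(df.columns, axis=1)

# Equivalent NumPy route: reverse the underlying array along axis 1.
mirrored_np = pd.DataFrame(
    df.to_numpy()[:, ::-1], index=df.index, columns=df.columns
)
print(mirrored)
```

Both variants return a new frame and leave df untouched, unlike the in-place assignments shown above.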

How to one-hot-encode matrix of sentences at the character level?

There is a dataframe:
0 1 2 3
0 a c e NaN
1 b d NaN NaN
2 b c NaN NaN
3 a b c d
4 a b NaN NaN
5 b c NaN NaN
6 a b NaN NaN
7 a b c e
8 a b c NaN
9 a c e NaN
I would like to one-hot encode it like this:
a c e b d
0 1 1 1 0 0
1 0 0 0 1 1
2 0 1 0 1 0
3 1 1 0 1 1
4 1 0 0 1 0
5 0 1 0 1 0
6 1 0 0 1 0
7 1 1 1 1 0
8 1 1 0 1 0
9 1 1 1 0 0
pd.get_dummies does not work here, because it actually encodes each column independently. How can I get this? By the way, the order of the columns doesn't matter.
Try this:
df.stack().str.get_dummies().groupby(level=0).max()
Out[129]:
a b c d e
0 1 0 1 0 1
1 0 1 0 1 0
2 0 1 1 0 0
3 1 1 1 1 0
4 1 1 0 0 0
5 0 1 1 0 0
6 1 1 0 0 0
7 1 1 1 0 1
8 1 1 1 0 0
9 1 0 1 0 1
One way using str.join and str.get_dummies:
one_hot = df.apply(lambda x: "|".join([i for i in x if pd.notna(i)]), axis=1).str.get_dummies()
print(one_hot)
Output:
a b c d e
0 1 0 1 0 1
1 0 1 0 1 0
2 0 1 1 0 0
3 1 1 1 1 0
4 1 1 0 0 0
5 0 1 1 0 0
6 1 1 0 0 0
7 1 1 1 0 1
8 1 1 1 0 0
9 1 0 1 0 1
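The stack-based answer can be checked with a self-contained script. This is a minimal sketch, assuming the frame is rebuilt from the question's rows (ragged lists are padded with missing values by the DataFrame constructor); the names rows and one_hot are illustrative. It uses groupby(level=0).max() to collapse the per-cell dummies back to one row per original row, since passing level to max is not available on pandas 2.0 and later.

```python
import pandas as pd

# Rebuild the question's frame; shorter rows are padded with missing values.
rows = [list("ace"), list("bd"), list("bc"), list("abcd"), list("ab"),
        list("bc"), list("ab"), list("abce"), list("abc"), list("ace")]
df = pd.DataFrame(rows)

# 1. stack() turns the frame into one long Series indexed by (row, column);
#    dropna() makes sure the padding cells are gone on every pandas version.
# 2. str.get_dummies() creates one indicator column per distinct character.
# 3. groupby(level=0).max() merges the indicators of each original row.
one_hot = df.stack().dropna().str.get_dummies().groupby(level=0).max()
print(one_hot)
```

The columns come out in sorted order (a, b, c, d, e), which the question says is acceptable.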

Loop column names in get_dummies for pandas?

For pandas I have written the code below in order to convert all categorical features. However after I run it on my data set and check data types, nothing changes.
Thank you in advance.
Code:
def dummy_conv(data):
    names=data.select_dtypes(exclude=['number']).columns
    for c in names:
        data=pd.get_dummies(data,columns=[c],drop_first=True)
dummy_conv(data_train)
data_train.dtypes # object features are not converted
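Looping is not necessary: select the columns with a list, and don't forget to return the result. The original function only rebinds its local variable data and returns nothing, so data_train is never changed: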
data_train = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (data_train)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
def dummy_conv(data):
    names=data.select_dtypes(exclude=['number']).columns
    return pd.get_dummies(data[names], drop_first=True)
df = dummy_conv(data_train)
print (df)
A_b A_c A_d A_e A_f F_b
0 0 0 0 0 0 0
1 1 0 0 0 0 0
2 0 1 0 0 0 0
3 0 0 1 0 0 1
4 0 0 0 1 0 1
5 0 0 0 0 1 1
If you want to convert only the non-numeric columns and keep the numeric ones as they are:
def dummy_conv(data):
    return pd.get_dummies(data,drop_first=True)
    #same output like
    #names=data.select_dtypes(exclude=['number']).columns
    #return pd.get_dummies(data,columns=names,drop_first=True)
df = dummy_conv(data_train)
print (df)
B C D E A_b A_c A_d A_e A_f F_b
0 4 7 1 5 0 0 0 0 0 0
1 5 8 3 3 1 0 0 0 0 0
2 4 9 5 6 0 1 0 0 0 0
3 5 4 7 9 0 0 1 0 0 1
4 5 2 1 2 0 0 0 1 0 1
5 4 3 0 4 0 0 0 0 1 1
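A compact, runnable version of the corrected function is sketched below, using a reduced copy of the example data (columns A, B and F only). One assumption worth flagging: on recent pandas versions get_dummies returns boolean columns by default, so dtype=int is passed here to reproduce the 0/1 output shown above; drop that argument if booleans are fine. The name encoded is illustrative.

```python
import pandas as pd

# Reduced version of the example data: two object columns, one numeric.
data_train = pd.DataFrame({
    "A": list("abcdef"),
    "B": [4, 5, 4, 5, 5, 4],
    "F": list("aaabbb"),
})

def dummy_conv(data):
    """Return a new frame with every non-numeric column one-hot encoded."""
    names = data.select_dtypes(exclude=["number"]).columns
    # dtype=int keeps the indicators as 0/1 instead of True/False.
    return pd.get_dummies(data, columns=list(names), drop_first=True, dtype=int)

# The result must be assigned: get_dummies never modifies its input.
encoded = dummy_conv(data_train)
print(encoded.dtypes)
```

Because the function returns a new frame, data_train itself keeps its object columns; assign the result back to data_train if that is the frame you want converted.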

Python pandas: add new columns based on the existed a column value, and set the value of new columns as 1 or 0

I have a dataframe named df as follows:
ticker class_n
1 a
2 b
3 c
4 d
5 e
6 f
7 a
8 b
............................
I want to add new columns to this dataframe, one for each unique value of class_n. The value of a new column should be 1 if the row's class_n equals the column name, and 0 otherwise.
For example, I want to get a new dataframe like the following:
ticker class_n a b c d e f
1 a 1 0 0 0 0 0
2 b 0 1 0 0 0 0
3 c 0 0 1 0 0 0
4 d 0 0 0 1 0 0
5 e 0 0 0 0 1 0
6 f 0 0 0 0 0 1
7 a 1 0 0 0 0 0
8 b 0 1 0 0 0 0
My code is as follows:
lst_class = list(set(list(df['class_n'])))
for cla in lst_class:
    df[c] = 0
    df.loc[df['class_n'] is cla, cla] =1
but there is an error:
KeyError: 'cannot use a single bool to index into setitem'
Thanks!
Use pd.get_dummies
df.join(pd.get_dummies(df.class_n))
ticker class_n a b c d e f
0 1 a 1 0 0 0 0 0
1 2 b 0 1 0 0 0 0
2 3 c 0 0 1 0 0 0
3 4 d 0 0 0 1 0 0
4 5 e 0 0 0 0 1 0
5 6 f 0 0 0 0 0 1
6 7 a 1 0 0 0 0 0
7 8 b 0 1 0 0 0 0
Or the same thing done a little more manually, with NumPy imported as np:
f, u = pd.factorize(df.class_n.values)
d = pd.DataFrame(np.eye(u.size, dtype=int)[f], df.index, u)
df.join(d)
ticker class_n a b c d e f
0 1 a 1 0 0 0 0 0
1 2 b 0 1 0 0 0 0
2 3 c 0 0 1 0 0 0
3 4 d 0 0 0 1 0 0
4 5 e 0 0 0 0 1 0
5 6 f 0 0 0 0 0 1
6 7 a 1 0 0 0 0 0
7 8 b 0 1 0 0 0 0
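The KeyError in the question comes from the expression df['class_n'] is cla: the is operator tests object identity and yields a single False, whereas == compares element-wise and yields a boolean Series that can be used as a mask. The sketch below shows the loop with that fixed next to the get_dummies one-liner. It assumes the eight rows shown in the question; the names looped and dummies are illustrative, and dtype=int is passed because recent pandas versions return boolean indicators by default.

```python
import pandas as pd

# The eight rows shown in the question.
df = pd.DataFrame({"ticker": range(1, 9), "class_n": list("abcdefab")})

# Fixed version of the loop: `==` builds a boolean Series, which is then
# converted to 0/1. Sorting the unique values gives a stable column order.
looped = df.copy()
for cla in sorted(looped["class_n"].unique()):
    looped[cla] = (looped["class_n"] == cla).astype(int)

# Same result without a loop.
dummies = df.join(pd.get_dummies(df["class_n"], dtype=int))
print(dummies)
```

The loop is mainly useful for seeing what get_dummies does; the one-liner is the more idiomatic choice.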

Sequence number groupby ID with reset

I'm looking for a way to generate a sequence of numbers that resets at every break.
Example
ID VAR
A 0
A 0
A 1
A 1
A 0
A 0
A 1
A 1
B 1
B 1
B 1
B 0
B 0
B 0
B 0
Each time VAR is 1 and the ID is the same as in the previous row, the counter increments.
If the ID changes or VAR is 0, the counter starts again from 0.
Desired output
ID VAR DESIRED
A 0 0
A 0 0
A 1 1
A 1 2
A 0 0
A 0 0
A 1 1
A 1 2
B 1 1
B 1 2
B 1 3
B 0 0
B 0 0
B 0 0
B 0 0
You can create an intermediate index that increases every time VAR changes, then group by this index and ID and take the cumulative sum of VAR:
df['ix'] = df['VAR'].diff().fillna(0).abs().cumsum()
df['DESIRED'] = df.groupby(['ID','ix'])['VAR'].cumsum()
In [21]: df
Out[21]:
ID VAR ix DESIRED
0 A 0 0 0
1 A 0 0 0
2 A 1 1 1
3 A 1 1 2
4 A 0 2 0
5 A 0 2 0
6 A 1 3 1
7 A 1 3 2
8 B 1 3 1
9 B 1 3 2
10 B 1 3 3
11 B 0 4 0
12 B 0 4 0
13 B 0 4 0
14 B 0 4 0
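The same idea can be written without storing the helper column, by comparing each value with the previous one instead of taking diff().abs(). This is a minimal sketch that rebuilds the question's data; run_id is an illustrative name, and the approach relies on VAR containing only 0 and 1, so that the cumulative sum inside a run of ones counts 1, 2, 3 and stays 0 inside a run of zeros.

```python
import pandas as pd

# Rebuild the question's data: 8 rows for ID A, 7 rows for ID B.
df = pd.DataFrame({
    "ID": list("A" * 8 + "B" * 7),
    "VAR": [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
})

# A new run starts whenever VAR differs from the previous row; the cumulative
# sum of those change flags labels each consecutive run of equal values.
run_id = df["VAR"].ne(df["VAR"].shift()).cumsum().rename("run")

# Grouping by ID as well splits a run that continues across an ID change
# (rows 6 to 10 here), so the counter restarts for B.
df["DESIRED"] = df.groupby(["ID", run_id])["VAR"].cumsum()
print(df)
```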
