How to apply ffill to 1? - python

I have a dataframe like below,
A B C D
0 1 0 0 0
1 0 1 0 0
2 0 1 0 0
3 0 0 1 0
I want to convert it into this:
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 1 0 0
3 1 1 1 0
So far I have tried:
df = df.replace('0', np.nan)
df = df.ffill().fillna('0')
The code above works, but I think there is a better approach to this problem.

Use cumsum with data converted to numeric and then replace by DataFrame.mask:
df = df.mask(df.astype(int).cumsum() >= 1, '1')
print (df)
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 1 0 0
3 1 1 1 0
Detail:
print (df.astype(int).cumsum())
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 2 0 0
3 1 2 1 0
Or the same principle in NumPy with numpy.where:
arr = df.values.astype(int)
df = pd.DataFrame(np.where(np.cumsum(arr, axis=0) >= 1, '1', '0'),
                  index=df.index,
                  columns=df.columns)
print (df)
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 1 0 0
3 1 1 1 0
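
As a self-contained sketch of the same idea, a running maximum can stand in for the cumulative sum: once a 1 has appeared in a column, the maximum stays at 1. This swaps in DataFrame.cummax rather than the cumsum/mask combination above, and it assumes the cells are the strings '0' and '1', which is what the replace('0', ...) call in the question suggests.

```python
import pandas as pd

# Sample frame with string cells (an assumption based on the question's code).
df = pd.DataFrame(
    {
        "A": ["1", "0", "0", "0"],
        "B": ["0", "1", "1", "0"],
        "C": ["0", "0", "0", "1"],
        "D": ["0", "0", "0", "0"],
    }
)

# cummax keeps the running maximum down each column,
# so every row after the first 1 also becomes 1.
filled = df.astype(int).cummax().astype(str)
print(filled)
```

If the frame already holds integers, the two astype calls can be dropped.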

Related

How to create new column based on value in a set of columns

I have a pandas df like this:
time a b c
1 0 1 0
1 0 1 0
1 1 0 0
1 0 1 0
1 0 0 1
1 0 0 0
I want to create a new column, df.code based on the following logic:
if df.a == 1, return 4
if df.b == 1, return 2
if df.c == 1, return 1
if none of a, b, or c == 1, return 0
time a b c code
1 0 1 0 2
1 0 1 0 2
1 1 0 0 4
1 0 1 0 2
1 0 0 1 1
1 0 0 0 0
How do I do this? I'm essentially trying to compress selected dummy columns into a single multiclass column.
We can take the dot product of the dummy columns with the codes.
df['code'] = df[['a','b','c']].dot([4,2,1])
df
Output
time a b c code
0 1 0 1 0 2
1 1 0 1 0 2
2 1 1 0 0 4
3 1 0 1 0 2
4 1 0 0 1 1
5 1 0 0 0 0
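
A runnable sketch of the dot-product approach, rebuilding the question's data inline. The variable name weights is only illustrative. Note that this relies on at most one of a, b, c being 1 per row: if several are set, the codes are summed rather than chosen by priority.

```python
import pandas as pd

# Data from the question, rebuilt inline.
df = pd.DataFrame(
    {
        "time": [1, 1, 1, 1, 1, 1],
        "a": [0, 0, 1, 0, 0, 0],
        "b": [1, 1, 0, 1, 0, 0],
        "c": [0, 0, 0, 0, 1, 0],
    }
)

# One weight per dummy column, in the same order as the column selection.
weights = [4, 2, 1]
df["code"] = df[["a", "b", "c"]].dot(weights)
print(df)
```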
This example should work as is:
stack.csv
time a b c
1 0 1 0
1 0 1 0
1 1 0 0
1 0 1 0
1 0 0 1
1 0 0 0
main.py
import pandas as pd
df = pd.read_csv('stack.csv', sep=' ', index_col=False)
df['code'] = 0
df.loc[df['a'] == 1, 'code'] = 4
df.loc[df['b'] == 1, 'code'] = 2
df.loc[df['c'] == 1, 'code'] = 1
print(df)
output:
time a b c code
0 1 0 1 0 2
1 1 0 1 0 2
2 1 1 0 0 4
3 1 0 1 0 2
4 1 0 0 1 1
5 1 0 0 0 0
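
If more than one of the dummy columns can be 1 in the same row, the order of the rules in the question matters. One possible sketch, assuming the rules are meant as a priority chain (a before b before c), uses numpy.select, which takes the choice for the first condition that is true. The sample rows here are made up to include such an overlap.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, including one where a, b and c are all 1.
df = pd.DataFrame(
    {
        "a": [0, 1, 1, 0],
        "b": [1, 0, 1, 0],
        "c": [0, 0, 1, 0],
    }
)

# Conditions are checked in order; the first match decides the code.
conditions = [df["a"] == 1, df["b"] == 1, df["c"] == 1]
choices = [4, 2, 1]
df["code"] = np.select(conditions, choices, default=0)
print(df)
```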

Unravelling a DataFrame

I need to transform one df into another. The original (df1) looks like this:
value
A--A 4
A--B 2
A--C 1
B--B 2
C--C 3
D--B 2
E--E 6
Then I have this other df2, filled with 0:
A B C D E
A 0 0 0 0 0
B 0 0 0 0 0
C 0 0 0 0 0
D 0 0 0 0 0
E 0 0 0 0 0
F 0 0 0 0 0
G 0 0 0 0 0
I need to convert it to a final df3, taking the values from the pairs in the index of df1 (separated by "--") and filling it like this:
A B C D E
A 4 2 1 0 0
B 2 2 0 2 0
C 1 0 3 0 0
D 0 2 0 0 0
E 0 0 0 0 6
F 0 0 0 0 0
G 0 0 0 0 0
There can be pairs in df2 that do not exist in df1. In that case the cell remains 0. Any suggestions?
You can create this from df1 itself (called df below). First, set df.index to a MultiIndex using str.split, and then unstack and reindex.
df.index = pd.MultiIndex.from_arrays(zip(*df.index.str.split('--')))
(df['value'].unstack()
   .reindex(index=df2.index, columns=df2.columns)
   .fillna(0)
   .astype(int))
A B C D E
A 4 2 1 0 0
B 0 2 0 0 0
C 0 0 3 0 0
D 0 2 0 0 0
E 0 0 0 0 6
F 0 0 0 0 0
G 0 0 0 0 0
If you know what rows and columns you want to use, you don't even need df2.
(df['value'].unstack()
   .reindex(index=list('ABCDEFG'), columns=list('ABCDE'))
   .fillna(0)
   .astype(int))
A B C D E
A 4 2 1 0 0
B 0 2 0 0 0
C 0 0 3 0 0
D 0 2 0 0 0
E 0 0 0 0 6
F 0 0 0 0 0
G 0 0 0 0 0
As per the OP's comment, to maintain symmetry, unstack and reindex the table so that the NaNs are preserved, then fill them from the transpose:
v = (df['value'].unstack()
       .reindex(index=df2.index, columns=df2.columns))
v.fillna(v.T.reindex_like(v)).fillna(0).astype(int)
A B C D E
A 4 2 1 0 0
B 2 2 0 2 0
C 1 0 3 0 0
D 0 2 0 0 0
E 0 0 0 0 6
F 0 0 0 0 0
G 0 0 0 0 0
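
The symmetric version can be put together as one self-contained sketch. It rebuilds df1 inline, takes the row and column labels from the question's df2, and builds the MultiIndex with from_tuples instead of from_arrays; the names s, wide and result are only illustrative.

```python
import pandas as pd

# df1 from the question, rebuilt inline.
df1 = pd.DataFrame(
    {"value": [4, 2, 1, 2, 3, 2, 6]},
    index=["A--A", "A--B", "A--C", "B--B", "C--C", "D--B", "E--E"],
)
rows, cols = list("ABCDEFG"), list("ABCDE")  # labels of df2

# Split each "X--Y" key into an (X, Y) pair and use the pairs as a MultiIndex.
s = df1["value"].copy()
s.index = pd.MultiIndex.from_tuples([tuple(key.split("--")) for key in df1.index])

# Pivot to a grid, keeping NaN where no pair exists.
wide = s.unstack().reindex(index=rows, columns=cols)

# Fill gaps from the mirrored cell, then set whatever is still missing to 0.
result = wide.fillna(wide.T.reindex_like(wide)).fillna(0).astype(int)
print(result)
```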

Is it possible to do a boolean operation row by row in pandas?

I would like to apply an 'OR' between each row and the next one (row and row+1).
For example,
A B C D E F G
r0 0 1 1 0 0 1 0
r1 0 0 0 0 0 0 0
r2 0 0 1 0 1 0 1
and the expected output would be like this:
result 0 1 1 0 1 1
I only know how to sum the rows:
df.loc['result'] = df.sum()
but in this case I would like to do an OR instead.
Thank you in advance.
You can apply any over the first axis (axis=0), which ORs the values down each column.
>>> df
A B C D E F G
r0 0 1 1 0 0 1 0
r1 0 0 0 0 0 0 0
r2 0 0 1 0 1 0 1
>>> df.loc['result'] = df.any(axis=0).astype(int)
>>> df
A B C D E F G
r0 0 1 1 0 0 1 0
r1 0 0 0 0 0 0 0
r2 0 0 1 0 1 0 1
result 0 1 1 0 1 1 1
... assuming that in your output you forgot the last column.
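
The session above can be reproduced as a standalone script, sketched here with the question's data rebuilt inline. Keeping the OR in a separate variable (result, a name chosen for illustration) makes it easy to inspect before appending it as a row.

```python
import pandas as pd

# Data from the question, rebuilt inline.
df = pd.DataFrame(
    [
        [0, 1, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1, 0, 1],
    ],
    index=["r0", "r1", "r2"],
    columns=list("ABCDEFG"),
)

# any(axis=0) is True for a column if at least one row is non-zero,
# which is the OR of all rows; astype(int) turns it back into 0/1.
result = df.any(axis=0).astype(int)
df.loc["result"] = result
print(df)
```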

Is there a way to break a pandas column with categories into separate true/false columns with the category name as the column name?

I have a dataframe with the following column:
df = pd.DataFrame({"A": [1,2,1,2,2,2,0,1,0]})
and I want:
df2 = pd.DataFrame({"0": [0,0,0,0,0,0,1,0,1],"1": [1,0,1,0,0,0,0,1,0],"2": [0,1,0,1,1,1,0,0,0]})
Is there an elegant way of doing this in a one-liner?
NOTE
I can do this using df['0'] = df['A'].apply(find_zeros)
I don't mind if 'A' is included in the final result.
Use get_dummies (dtype=int keeps the output as 0/1 on pandas versions that default to booleans):
df2 = pd.get_dummies(df.A, dtype=int)
print (df2)
0 1 2
0 0 1 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 1
6 1 0 0
7 0 1 0
8 1 0 0
In [50]: df.A.astype(str).str.get_dummies()
Out[50]:
0 1 2
0 0 1 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 1
6 1 0 0
7 0 1 0
8 1 0 0
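
To get exactly the string column names "0", "1", "2" from the question together with 0/1 integers, one possible sketch is to cast the column to str first and pass dtype=int. This combines the two answers above rather than reproducing either of them.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2, 2, 2, 0, 1, 0]})

# Casting to str makes the new column labels strings;
# dtype=int yields 0/1 instead of booleans.
df2 = pd.get_dummies(df["A"].astype(str), dtype=int)
print(df2)
```

Use df.join(df2) if the original column A should be kept alongside the indicator columns.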

Python pandas: add new columns based on the values of an existing column, and set the values of the new columns to 1 or 0

I have a dataframe named df as follows:
ticker class_n
1 a
2 b
3 c
4 d
5 e
6 f
7 a
8 b
............................
I want to add new columns to this dataframe. The new column names are the unique values of class_n (no repeats). The value of a new column is 1 if the row's class_n equals the column name, and 0 otherwise.
For example, I want to get a new dataframe like the following:
ticker class_n a b c d e f
1 a 1 0 0 0 0 0
2 b 0 1 0 0 0 0
3 c 0 0 1 0 0 0
4 d 0 0 0 1 0 0
5 e 0 0 0 0 1 0
6 f 0 0 0 0 0 1
7 a 1 0 0 0 0 0
8 b 0 1 0 0 0 0
My code is as follows:
lst_class = list(set(list(df['class_n'])))
for cla in lst_class:
    df[cla] = 0
    df.loc[df['class_n'] is cla, cla] =1
but it raises an error:
KeyError: 'cannot use a single bool to index into setitem'
Thanks!
Use pd.get_dummies (dtype=int keeps the output as 0/1 on pandas versions that default to booleans):
df.join(pd.get_dummies(df.class_n, dtype=int))
ticker class_n a b c d e f
0 1 a 1 0 0 0 0 0
1 2 b 0 1 0 0 0 0
2 3 c 0 0 1 0 0 0
3 4 d 0 0 0 1 0 0
4 5 e 0 0 0 0 1 0
5 6 f 0 0 0 0 0 1
6 7 a 1 0 0 0 0 0
7 8 b 0 1 0 0 0 0
Or the same thing, done a little more manually:
f, u = pd.factorize(df.class_n.values)
d = pd.DataFrame(np.eye(u.size, dtype=int)[f], df.index, u)
df.join(d)
ticker class_n a b c d e f
0 1 a 1 0 0 0 0 0
1 2 b 0 1 0 0 0 0
2 3 c 0 0 1 0 0 0
3 4 d 0 0 0 1 0 0
4 5 e 0 0 0 0 1 0
5 6 f 0 0 0 0 0 1
6 7 a 1 0 0 0 0 0
7 8 b 0 1 0 0 0 0
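
For completeness, the loop from the question can also be repaired directly. The error comes from `df['class_n'] is cla`: `is` tests object identity and produces a single bool, whereas `==` compares element-wise and produces a boolean Series. A minimal sketch, with the sample data rebuilt inline:

```python
import pandas as pd

df = pd.DataFrame({"ticker": range(1, 9), "class_n": list("abcdefab")})

# == compares every row with cla and gives a boolean Series;
# astype(int) converts True/False to 1/0.
for cla in sorted(df["class_n"].unique()):
    df[cla] = (df["class_n"] == cla).astype(int)

print(df)
```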
