Shift boolean horizontally in pandas dataframe - python

Say I have a dataframe of booleans, called original:
original = pd.DataFrame([
[True, False, False, True, False],
[False, True, False, False, False]
])
0 1 2 3 4
0 True False False True False
1 False True False False False
I want to create the following boolean dataframe, where every cell to the right of a True becomes True:
0 1 2 3 4
0 False True True True True
1 False False True True True
I've accomplished this as follows, but was wondering if anyone had a less cumbersome method:
original.shift(axis=1).fillna(False).astype(int) \
.T.replace(to_replace=0, method='ffill').T.astype(bool)

cummax
original.cummax(axis=1).shift(axis=1).fillna(False)
0 1 2 3 4
0 False True True True True
1 False False True True True
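To see why this works, here is a small sketch of the intermediate steps (the names step1/step2/result are just for illustration): cummax(axis=1) marks every cell from the first True onwards in each row, and the shift pushes that mark one column to the right, so only cells strictly to the right of a True end up True.
import pandas as pd

original = pd.DataFrame([
    [True, False, False, True, False],
    [False, True, False, False, False]
])

step1 = original.cummax(axis=1)   # True from the first True onwards in each row
step2 = step1.shift(axis=1)       # move one column right; first column becomes NaN
result = step2.fillna(False)      # treat the leading NaN column as False
print(result)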

IIUC
original[original].shift(1, axis=1).ffill(axis=1).fillna(0).astype(bool)
Out[77]:
0 1 2 3 4
0 False True True True True
1 False False True True True

Related

How to satisfy the condition of 2 columns different rows at the same time

My logic is like this:
cond2 is True on some row before the expected row, and cond1 is True on some row before that cond2 row; only then can the expected column be True.
Input:
import pandas as pd
import numpy as np
d = {'cond1': [False, False, True, False, False, False, False, True, False, False],
     'cond2': [False, True, False, True, True, False, False, False, True, False]}
df = pd.DataFrame(d)
expected result table
cond1 cond2 expected
0 FALSE FALSE
1 FALSE TRUE
2 TRUE FALSE
3 FALSE TRUE
4 FALSE TRUE
5 FALSE FALSE TRUE
6 FALSE FALSE TRUE
7 TRUE FALSE
8 FALSE TRUE
9 FALSE FALSE TRUE
I have an idea:
count the rows from the last row where cond1 is True up to the current row, and then use cumsum to check whether the number of rows in that span where cond2 is True is greater than 0.
But how do I count the rows from the last True in cond1 up to the current row?
The description is not fully clear. It looks like you need a cummax per group, where each group starts at a True in cond1:
m = df.groupby(df['cond1'].cumsum())['cond2'].cummax()
df['expected'] = df['cond2'].ne(m)
Output:
cond1 cond2 expected
0 False False False
1 False True False
2 True False False
3 False True False
4 False True False
5 False False True
6 False False True
7 True False False
8 False True False
9 False False True
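To make the grouping in the answer above visible, here is a sketch of the intermediates (assuming the df from the question): cumsum over cond1 produces a group id that increases at every True in cond1, and the per-group cummax of cond2 records whether cond2 has been True yet inside the current group.
# group ids start a new group at every True in cond1
print(df['cond1'].cumsum().tolist())
# [0, 0, 1, 1, 1, 1, 1, 2, 2, 2]

m = df.groupby(df['cond1'].cumsum())['cond2'].cummax()
print(m.tolist())
# [False, True, False, True, True, True, True, False, True, True]
# expected is True where cond2 differs from m, i.e. cond2 is currently False
# but has already been True since the last cond1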
It's not very clear what you're looking for~
df['expected'] = ((df.index > df.idxmax().max())
                  & ~df.any(axis=1))
# Output:
cond1 cond2 expected
0 False False False
1 False True False
2 True False False
3 False True False
4 False True False
5 False False True
6 False False True
7 True False False
8 False True False
9 False False True
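The idxmax trick above relies on these intermediates (again a sketch against the question's df): idxmax returns the position of the first True in each column, and rows after the later of the two are candidates whenever neither condition is currently True.
print(df.idxmax().to_dict())   # {'cond1': 2, 'cond2': 1} -> first True per column
print(df.idxmax().max())       # 2
# expected: rows strictly after index 2 on which neither cond1 nor cond2 is True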

Selecting Dataframe row from a condition to another condition

I have a dataframe with two columns:
A B
0 False False
1 False False
2 False False
3 True False
4 False False
5 False False
6 False True
7 False False
8 False False
9 False False
10 True False
11 False False
12 False False
I would like to create a new boolean column "C" that turns on (=True) each time B turns on and turns off each time A turns on (e.g. here from index 6 to index 10).
Ex: for this df, the output will be:
A B C
0 False False False
1 False False False
2 False False False
3 True False False
4 False False False
5 False False False
6 False True True
7 False False True
8 False False True
9 False False True
10 True False True
11 False False False
12 False False False
I wrote this code with a for loop and a "switch", but I'm pretty sure there is a faster and easier way to do the same thing for large dataframes. I appreciate your help.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [False, False, False, True, False, False, False, False, False, False, True, False, False],
    'B': [False, False, False, False, False, False, True, False, False, False, False, False, False]
})

df["C"] = False   # start with the switch off
switch = False
for i in df.index:
    if df.B.iloc[i]:
        switch = True
    if switch:
        df.C.iloc[i] = True
    else:
        df.C.iloc[i] = False
    if df.A.iloc[i]:
        switch = False
print(df)
Alternative approach using ffill
df.loc[df['A'],'C'] = False
df.loc[df['B'],'C'] = True
df['C'] = df['C'].ffill().fillna(False) #start "off"
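As a sketch of what this does (assuming the df as defined in the question): before the ffill, column C holds a marker only on the rows where A or B fired, and NaN everywhere else; ffill then carries the last marker forward, and fillna(False) makes the leading gap read as "off".
# reproduce the intermediate state before ffill (illustration only)
tmp = pd.Series(index=df.index, dtype=object)
tmp[df['A']] = False   # A rows switch the state off
tmp[df['B']] = True    # B rows switch the state on
print(tmp.tolist())
# [nan, nan, nan, False, nan, nan, True, nan, nan, nan, False, nan, nan]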
Combine the two columns, subtract 1, filter out negative and even numbers:
x = (df['A'] | df['B']).cumsum().sub(1)
df['C'] = (x >= 0) & (x % 2 == 1)
Output:
>>> df
A B C
0 False False False
1 False False False
2 False False False
3 True False False
4 False False False
5 False False False
6 False True True <
7 False False True <
8 False False True <
9 False False True <
10 True False False
11 False False False
12 False False False
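The counter that drives this answer looks like the following (a sketch using the question's df): each A-or-B event bumps the running count, so odd, non-negative values mark the stretches that began with a B event.
x = (df['A'] | df['B']).cumsum().sub(1)
print(x.tolist())
# [-1, -1, -1, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
# (x >= 0) & (x % 2 == 1) is True exactly on rows 6-9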

Is there a way to select interior True values for portions of a DataFrame?

I have a DataFrame that looks like the following:
df = pd.DataFrame({'a': [True]*5 + [False]*5 + [True]*5,
                   'b': [False] + [True]*3 + [False] + [True]*5 + [False]*4 + [True]})
a b
0 True False
1 True True
2 True True
3 True True
4 True False
5 False True
6 False True
7 False True
8 False True
9 False True
10 True False
11 True False
12 True False
13 True False
14 True True
How can I select the blocks of consecutive True values in column a for which the interior values of column b (the rows of the block excluding its first and last row) are all True?
I know that I could break the DataFrame apart into consecutive True regions and apply a function to each chunk, but this is for a much larger problem with 10 million+ rows, and I don't think such a solution would scale well.
My expected output would be the following:
a b c
0 True False True
1 True True True
2 True True True
3 True True True
4 True False True
5 False True False
6 False True False
7 False True False
8 False True False
9 False True False
10 True False False
11 True False False
12 True False False
13 True False False
14 True True False
You can do a groupby on the a values and then look at the b values in a function, like this:
groupby_consec_a = df.groupby(df.a.diff().ne(0).cumsum())
all_interior = lambda x: x.iloc[1:-1].all()
df['c'] = df.a & groupby_consec_a.b.transform(all_interior)
Try it out to see whether it's fast enough on your data. If not, the lambda will have to be replaced by pandas functions, but that will be more code.
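If the Python-level lambda does turn out to be too slow, one possible fully vectorized variant (a sketch of my own, not from the answer above; the names grp/pos/interior are just illustrative) is to mark the endpoints of each run explicitly and require b on every remaining row:
# label consecutive runs of equal values in column a
grp = df.a.ne(df.a.shift()).cumsum()
pos = df.groupby(grp).cumcount()                       # position within the run
size = df.groupby(grp)['a'].transform('size')          # length of the run
interior = (pos > 0) & (pos < size - 1)                # everything but the run's endpoints
ok = (df.b | ~interior).groupby(grp).transform('all')  # b must hold on every interior row
df['c'] = df.a & ok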

Pandas get one hot encodings from a column as booleans

I have a pandas DataFrame, shown below, and would like to find an efficient way to create the second DataFrame from it.
import pandas as pd
data = {"column":[0,1,2,0,1,2,0]}
df = pd.DataFrame(data)
column
0
1
2
0
1
2
0
Desired output:
column0 column1 column2
true false false
false true false
false false true
true false false
false true false
false false true
true false false
This is a get_dummies problem, but you will additionally need to specify dtype=bool to get columns of bools:
pd.get_dummies(df['column'], dtype=bool)
0 1 2
0 True False False
1 False True False
2 False False True
3 True False False
4 False True False
5 False False True
6 True False False
pd.get_dummies(df['column'], dtype=bool).dtypes
0 bool
1 bool
2 bool
dtype: object
# carbon copy of expected output
import numpy as np
dummies = pd.get_dummies(df['column'], dtype=bool)
dummies[:] = np.where(dummies, 'true', 'false')
dummies.add_prefix('column')
column0 column1 column2
0 true false false
1 false true false
2 false false true
3 true false false
4 false true false
5 false false true
6 true false false
I also use get_dummies, like cs95. However, I use str.get_dummies and concatenate the word 'column' in front before calling get_dummies. Finally, I replace the 1/0 values:
('column'+df.column.astype(str)).str.get_dummies().replace({1:'true', 0:'false'})
Out[2164]:
column0 column1 column2
0 true false false
1 false true false
2 false false true
3 true false false
4 false true false
5 false false true
6 true false false
factorize and slice assignment
i, u = pd.factorize(df.column)
a = np.empty((len(i), len(u)), '<U5')
a.fill('false')
a[np.arange(len(i)), i] = 'true'
pd.DataFrame(a).add_prefix('column')
column0 column1 column2
0 true false false
1 false true false
2 false false true
3 true false false
4 false true false
5 false false true
6 true false false
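If actual booleans (rather than the strings 'true'/'false') are acceptable, the two ideas from the answers above combine into a one-liner; add_prefix supplies the column0/column1/column2 names from the expected output:
out = pd.get_dummies(df['column'], dtype=bool).add_prefix('column')
print(out)
#    column0  column1  column2
# 0     True    False    False
# 1    False     True    False
# 2    False    False     True
# ...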

Broadcasting boolean operations in Pandas

Let's say I have an NxM boolean DataFrame x and an Nx1 boolean DataFrame y. I would like to perform a boolean operation with each column, returning a new DataFrame that is NxM. For example:
x = pd.DataFrame([[True, True, True], [True, False, True], [False, False, True]])
y = pd.DataFrame([[False], [True], [True]])
I would like x & y to return:
0 1 2
0 False False False
1 True False True
2 False False True
But instead it returns:
0 1 2
0 False NaN NaN
1 True NaN NaN
2 False NaN NaN
Instead, treating y as a Series with
x & y[0]
gives:
0 1 2
0 False True True
1 False False True
2 False False True
This appears to broadcast by row. Is there a correct way to do this other than transposing, applying the operation with the Series, and then transposing back?
(x.T & y[0]).T
0 1 2
0 False False False
1 True False True
2 False False True
It seems that this fails when the row index is not the same as the column labels.
You could call apply, pass a lambda, and call squeeze to flatten the single-column DataFrame y into a Series:
In [152]:
x.apply(lambda s: s & y.squeeze())
Out[152]:
0 1 2
0 False False False
1 True False True
2 False False True
I'm not sure whether this is quicker, though. Here we apply the mask column-wise by calling apply on the DataFrame, which is why transposing is unnecessary.
Actually you could use np.logical_and:
In [156]:
np.logical_and(x,y)
Out[156]:
0 1 2
0 False False False
1 True False True
2 False False True
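One more option, as a sketch that sidesteps pandas alignment entirely: do the broadcasting in NumPy and rebuild the frame with x's labels. This behaves the same no matter how the index and columns are labelled.
import numpy as np
import pandas as pd

x = pd.DataFrame([[True, True, True], [True, False, True], [False, False, True]])
y = pd.DataFrame([[False], [True], [True]])

# (N, M) & (N, 1) broadcasts row-wise in NumPy
result = pd.DataFrame(x.to_numpy() & y.to_numpy(),
                      index=x.index, columns=x.columns)
print(result)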
