Find first 'True' value in blocks in pandas data frame - python

I have a dataframe, where one column contains only True or False values in blocks. For example:
df =
b
0 False
1 True
2 True
3 False
4 True
5 True
6 True
7 True
8 False
9 False
10 False
11 False
12 False
13 True
14 True
15 True
I need to find the beginning of each block of True values:
>> find_first_true(df)
>> array([1, 4, 13])
Any elegant solutions?
EDIT
Thanks for the proposed solution. I am wondering, what's the easiest way to extract blocks of a certain length, starting from the indices I found?
For example, I need to take blocks of 4 rows before each index. So, if my previously found indices are
index = array([1, 4, 13])
then I need blocks:
[df.loc[0:4], df.loc[9:13]]
or
b
0 False
1 True
2 True
3 False
4 True
9 False
10 False
11 False
12 False
13 True
I am looping over the indices, but wonder whether there is a more pandas-like solution.
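A possible pandas-style sketch (not from an answer above; it assumes a default RangeIndex and that each block is the loc slice from i - 4 to i, as in your example, with indices that lack four preceding rows skipped):
n = 4
# keep only indices that have at least n rows in front of them,
# then take the n preceding rows plus the row at the index itself
blocks = [df.loc[i - n:i] for i in index if i - n >= df.index[0]]
# for index = array([1, 4, 13]) this gives [df.loc[0:4], df.loc[9:13]]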

In [2]: df = pd.read_clipboard()
In [3]: df
Out[3]:
b
0 False
1 True
2 True
3 False
4 True
5 True
6 True
7 True
8 False
9 False
10 False
11 False
12 False
13 True
14 True
15 True
In [11]: np.where(((df.b != df.b.shift(1)) & df.b).values)[0]
Out[11]: array([ 1, 4, 13], dtype=int64)
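Wrapped in a function, a roughly equivalent sketch (the name find_block_starts is only illustrative; it assumes a boolean column b and a pandas version where shift accepts fill_value):
import numpy as np

def find_block_starts(df):
    # a row starts a block if it is True and the previous row is not
    starts = df.b & ~df.b.shift(1, fill_value=False)
    return np.where(starts.values)[0]

find_block_starts(df)  # array([ 1,  4, 13])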

def find_first_true(df):
    # indexes of True elements (1-based so that index 0 survives the bool filter)
    a = list(map(lambda e: e[0] + 1 if e[1] else 0, enumerate(df['b'])))
    a = list(filter(bool, a))
    a = list(map(lambda x: x - 1, a))
    # keep only the first index of each consecutive run
    ta = [0] + list(filter(lambda x: a[x] - a[x - 1] != 1, range(1, len(a))))
    return list(map(lambda x: a[x], ta))

find_first = []
# the first row needs special handling, since it has no predecessor
if df.loc[0, 'b']:
    find_first.append(0)
for i in range(len(df) - 1):
    # a block starts where a False row is followed by a True row
    if not df.loc[i, 'b'] and df.loc[i + 1, 'b']:
        find_first.append(i + 1)

Related

Count occurrences of strings in a row in Pandas

I'm trying to count the number of instances of a certain string in each row of a pandas dataframe.
In the example here I used a lambda function and pandas .count() to try to count the number of times 'True' exists in each row.
However, instead of a count of 'True' it just returns a boolean indicating whether or not it exists in the row...
#create dataframe
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
'Result': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result1': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result2': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
# count instances of 'True' in each row
df['Count'] = df.apply(lambda row: row.astype(str).str.count('True').any(), axis=1)
print(df)
The desired outcome is:
Period Result Result1 Result2 Count
1 True True True 3
2 None None None 0
3 False False False 0
4 True True True 3
1 False False False 0
2 True True True 3
3 False False False 0
... ... ... ... ......
You can use np.where:
df['count'] = np.where(df == 'True', 1, 0).sum(axis=1)
Regarding why your apply returns a boolean: both any and all return booleans, not numbers.
Edit: You can include df.isin for multiple conditions:
df['count'] = np.where(df.isin(['True', 'False']), 1, 0).sum(axis=1)
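As a small aside (not part of the original answer), the isin mask can also be summed directly, since True counts as 1:
df['count'] = df.isin(['True', 'False']).sum(axis=1)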
Use eq with sum:
df.eq("True").sum(axis=1)
Use apply with a lambda function:
df.apply(lambda x: x.eq("True").sum(), axis=1)
For matching more than one string, try
df.iloc[:,1:].apply(lambda x: x.str.contains("True|False")).sum(axis=1)
Avoiding the apply function, as it can be slow:
df[["Result", "Result1", "Result2"]].sum(axis=1).str.count("True")
This also works when you have strings like "this sentence contains True".
Your lambda is not working correctly; try this:
import pandas as pd
#create dataframe
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
'Result': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result1': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result2': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
# count instances of 'True' in each row
df['Count'] = df.apply(lambda row: sum(row[1:4] == 'True'), axis=1)
print(df)
# Output:
# >> Period Result Result1 Result2 Count
# >> 0 1 True True True 3
# >> 1 2 None None None 0
# >> 2 3 False False False 0
# >> 3 4 True True True 3
# >> 4 1 False False False 0
# >> 5 2 True True True 3
# >> 6 3 False False False 0
# >> 7 4 True True True 3
# >> 8 1 True True True 3
# >> 9 2 False False False 0
# >> 10 3 False False False 0
# >> 11 4 True True True 3
# >> 12 1 False False False 0
# >> 13 2 True True True 3
# >> 14 3 False False False 0
# >> 15 4 False False False 0

find groups of neighboring True in pandas series

I have a series with True and False and need to find all groups of True.
This means that I need to find the start index and end index of neighboring True values.
The following code gives the intended result but is very slow, inefficient and clumsy.
import pandas as pd
def groups(ser):
    g = []
    flag = False
    start = None
    for idx, s in ser.items():
        if flag and not s:
            g.append((start, idx - 1))
            flag = False
        elif not flag and s:
            start = idx
            flag = True
    if flag:
        g.append((start, idx))
    return g

if __name__ == "__main__":
    ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
    print(ser)
    g = groups(ser)
    print("\ngroups of True:")
    for start, end in g:
        print("from {} until {}".format(start, end))
output is:
0 True
1 True
2 False
3 False
4 True
5 False
6 False
7 True
8 True
9 True
10 True
11 False
12 True
13 False
14 True
groups of True:
from 0 until 1
from 4 until 4
from 7 until 10
from 12 until 12
from 14 until 14
There are similar questions out there, but none is looking to find the indices of the group starts/ends.
Label contiguous groups of True elements within a pandas Series
Streaks of True or False in pandas Series
A common trick is to take the cumsum of the negated series to label consecutive blocks. For example (using ser from the question):
for _, x in ser[ser].groupby((1 - ser).cumsum()):
    print(f'from {x.index[0]} to {x.index[-1]}')
Output:
from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
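A loop-free sketch of the same idea (not from the answers here; it assumes ser is the boolean series from the question and a pandas version where shift accepts fill_value): a run starts where a value is True and its predecessor is not, and ends where a value is True and its successor is not.
starts = ser.index[ser & ~ser.shift(1, fill_value=False)]
ends = ser.index[ser & ~ser.shift(-1, fill_value=False)]
list(zip(starts, ends))  # [(0, 1), (4, 4), (7, 10), (12, 12), (14, 14)]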
You can use itertools:
In [478]: from operator import itemgetter
...: from itertools import groupby
In [489]: a = ser[ser].index.tolist() # Create a list of indexes having `True` in `ser`
In [498]: for k, g in groupby(enumerate(a), lambda ix : ix[0] - ix[1]):
...: l = list(map(itemgetter(1), g))
...: print(f'from {l[0]} to {l[-1]}')
...:
from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14

In Python, how to shift and fill with specific values for all the shifted rows in a DataFrame?

I have the following dataframe.
y = pd.DataFrame(np.zeros((10,1), dtype = 'bool'), columns = ['A'])
y.iloc[[3,5], 0] = True
A
0 False
1 False
2 False
3 True
4 False
5 True
6 False
7 False
8 False
9 False
And I want to make the rows following each 'True' in the above dataframe 'True' as well, so that each original 'True' becomes a window of three True rows. The expected result is shown below.
A
0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 False
9 False
I can do that in the following way, but I wonder if there is a smarter way to do so.
y['B'] = y['A'].shift()
y['C'] = y['B'].shift()
y['D'] = y.any(axis = 1)
y['A'] = y['D']
y = y['A']
Thank you for the help in advance.
Use the limit parameter when forward filling: first replace False with missing values, forward-fill with limit=2, and finally replace the remaining NaNs with False:
y.A = y.A.replace(False, np.nan).ffill(limit=2).fillna(False)
print (y)
A
0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 False
9 False
Another idea uses Rolling.apply with any to test for at least one True per window:
y.A = y.A.rolling(3, min_periods=1).apply(lambda x: x.any()).astype(bool)
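A further minimal sketch (not from the answers above; it assumes the column is named 'A' and a pandas version where shift accepts fill_value): OR the column with its shifted copies to open a three-row window at each True:
y['A'] = y['A'] | y['A'].shift(1, fill_value=False) | y['A'].shift(2, fill_value=False)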

identify blocks of consecutive True values with tolerance

I have a boolean pandas DataFrame:
w=pd.DataFrame(data=[True,False,True,True,True,False,False,True,False,True,True,False,True])
I am trying to identify the blocks of True values that are at least N long.
I can do that (as suggested elsewhere on SO) by
N=3.0
b = w.ne(w.shift()).cumsum() *w
m = b[0].map(b[0].mask(b[0] == 0).value_counts()) >= N
which works fine and returns
m
0 False
1 False
2 True
3 True
4 True
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
Now I need to do the same but allow for some tolerance when determining the blocks: I would like to identify all blocks that are at least N long, while allowing up to M values (arbitrarily placed within the block) to be False.
For the example w, with N=3 and M=1, the input (shown again for reference) is
w
0 True
1 False
2 True
3 True
4 True
5 False
6 False
7 True
8 False
9 True
10 True
11 False
12 True
and the desired result, which differs from the previous result at the positions marked with **:
desired =
0 **True**
1 **True**
2 True
3 True
4 True
5 False
6 False
7 True
8 **True**
9 True
10 True
11 **True**
12 True
I believe you can re-use the solution: invert m with ~, apply the same block labelling to the inverted mask to find the short False gaps, and finally chain both conditions with |:
N = 3.0
M = 1
b = w.ne(w.shift()).cumsum() *w
m = b[0].map(b[0].mask(b[0] == 0).value_counts()) <= N
w1 = ~m
b1 = w1.ne(w1.shift()).cumsum() * w1
m1 = b1.map(b1.mask(b1 == 0).value_counts()) == M
m = m | m1
print (m)
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 True
8 True
9 True
10 True
11 True
12 True
Name: 0, dtype: bool

What's the fastest way to loop through a DataFrame and count occurrences within the DataFrame whilst some condition is fulfilled (in Python)?

I have a dataframe with two Boolean fields (as below).
import pandas as pd
d = [{'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':False}, {'a1':False, 'a2':True},
{'a1': False, 'a2': False}, {'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':True}, {'a1':False, 'a2':False},]
df = pd.DataFrame(d)
df
Out[1]:
a1 a2
0 False False
1 True False
2 True False
3 False False
4 False True
5 False False
6 False False
7 True False
8 False True
9 False False
I am trying to find the fastest and most "Pythonic" way of achieving the following:
If a1==True, count instances from current row where a2==False (e.g. row 1: a1=True, a2 is False for three rows from row 1)
At first instance of a2==True, stop counting (e.g. row 4, count = 3)
Set value of 'count' to new df column 'a3' on row where counting began (e.g. 'a3' = 3 on row 1)
Target result set as follows.
a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0
I have been trying to accomplish this using for loops, iterrows and while loops and so far haven't been able to produce a good nested combination which provides the results I want. Any help appreciated. I apologize if the problem is not totally clear.
How about this:
df['a3'] = df.apply(lambda x: 0 if not x.a1 else len(df.a2[x.name:df.a2.tolist()[x.name:].index(True)+x.name]), axis=1)
So, if a1 is False, write 0; otherwise write the length of the slice that runs from that row until the next True in a2.
This will do the trick:
df['a3'] = 0
# loop throught every value of 'a1'
for i in xrange(df['a1'].__len__()):
# if 'a1' at position i is 'True'...
if df['a1'][i] == True:
count = 0
# loop over the remaining items in 'a2'
# remaining: __len__() - i
# i: position of 'True' value in 'a1'
for j in xrange(df['a2'].__len__() - i):
# if the value of 'a2' is 'False'...
if df['a2'][j + i] == False:
# count the occurances of 'False' values in a row...
count += 1
else:
# ... if it's not 'False' break the loop
break
# write the number of occurances on the right position (i) in 'a3'
df['a3'][i] = count
and produce the following output:
a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0
Edit: added comments in the code
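For completeness, a loop-free sketch (not taken from the answers above; it assumes the columns are named a1 and a2 as in the question, and that a trailing run of False in a2 simply counts to the end of the frame):
import numpy as np

# number of a2 == True values at or below each row; rows sharing this value
# all run up to the same "next True" boundary
nxt = df['a2'][::-1].cumsum()[::-1]
# distance from each row down to that boundary (0 at the True row itself)
dist = df.groupby(nxt).cumcount(ascending=False)
# keep the distance only where counting should start, i.e. where a1 is True
df['a3'] = np.where(df['a1'], dist, 0)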
