I have a dataframe as follows
df = pd.DataFrame({
    'Values': [False, False, True, False, False, True, True, False, False, True]
})
df
Values
0 False
1 False
2 True
3 False
4 False
5 True
6 True
7 False
8 False
9 True
I would like to add another column named 'count' which increments by one whenever a True is detected in the 'Values' column.
My expected output is as follows
Values Count
0 False 0
1 False 0
2 True 1
3 False 1
4 False 1
5 True 2
6 True 3
7 False 3
8 False 3
9 True 4
Now I am doing as follows
counter = [0]
def handleOneRow(row):
    if row['Values'] == True:
        counter[0] = counter[0] + 1
    return counter[0]

df['count'] = df.apply(handleOneRow, axis=1)
Is there a simpler way to do this in pandas?
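Since the count only ever accumulates at each True, and True is treated as 1 in arithmetic, the whole thing reduces to a cumulative sum. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Values': [False, False, True, False, False, True, True, False, False, True]
})

# True counts as 1 and False as 0, so a cumulative sum gives the running count
df['Count'] = df['Values'].cumsum()
print(df['Count'].tolist())  # [0, 0, 1, 1, 1, 2, 3, 3, 3, 4]
```

This avoids both the mutable-closure counter and the row-by-row apply.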
I have a pandas dataframe like this:
close low max_close higher_low
0 2 1 True False
1 3 4 False True
2 1 2 True False
3 0 3 False False
4 5 2 False True
5 4 5 False True
6 3 3 True False
7 6 7 False True
and could be created with the code:
import pandas as pd
df = pd.DataFrame(
    {
        'close': [2, 3, 1, 0, 5, 4, 3, 6],
        'low': [1, 4, 2, 3, 2, 5, 3, 7],
        'max_close': [True, False, True, False, False, False, True, False],
        'higher_low': [False, True, False, False, True, True, False, True]
    }
)
For any row where max_close is True, I want to find the first subsequent row where higher_low is True and low is greater than close; this matching row must also be at most 2 rows after the row where max_close was True.
So the output should be :
close low max_close higher_low
1 3 4 False True
7 6 7 False True
(Index 4 is not in the output because in this row: low < close. Also, index 5 is not in the output because it's three rows after index 2, while we have a condition that it should be at most in the next 2 rows.)
Also, it's my priority not to use any for-loops in the code.
Have you any idea about this?
Use:

lookup = 2
indices = []
true_idx = df.index[df['max_close']]
for i in range(1, lookup + 1):
    # shift each max_close index forward by i, dropping any that fall past the end
    shifted = true_idx + i
    shifted = shifted[shifted <= df.index[-1]]
    tmp = df.loc[shifted]
    tmp_ind = tmp[tmp['higher_low'] & (tmp['low'] > tmp['close'])].index
    indices += tmp_ind.tolist()

df.loc[sorted(set(indices))]
Output
close low max_close higher_low
1 3 4 False True
7 6 7 False True
Create virtual groups from the max_close column, then keep the first 3 rows of each group (1 row for max_close and the 2 following). Finally, filter on your 2 conditions:
out = (df.groupby(df['max_close'].cumsum()).head(3)
         .query("higher_low & (close < low)"))
print(out)
# Output
close low max_close higher_low
1 3 4 False True
7 6 7 False True
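An alternative loop-free sketch uses shift: a candidate row qualifies if max_close was True one or two rows earlier. Note that, like the groupby approach, this keeps every match inside the window rather than only the first; on this data the two coincide.

```python
import pandas as pd

df = pd.DataFrame({
    'close': [2, 3, 1, 0, 5, 4, 3, 6],
    'low': [1, 4, 2, 3, 2, 5, 3, 7],
    'max_close': [True, False, True, False, False, False, True, False],
    'higher_low': [False, True, False, False, True, True, False, True]
})

cond = df['higher_low'] & (df['low'] > df['close'])
# True where max_close was True in either of the 2 preceding rows
recent_max = (df['max_close'].shift(1, fill_value=False)
              | df['max_close'].shift(2, fill_value=False))
out = df[cond & recent_max]
print(out.index.tolist())  # [1, 7]
```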
I work with the following column in a pandas df:
A
True
True
True
False
True
True
I want to add column B that counts the number of consecutive "True" values in A, restarting every time a "False" comes up. Desired output:
A B
True 1
True 2
True 3
False 0
True 1
True 2
Using cumsum, identify the blocks of rows where the value in column A stays True, then group column A by these blocks and take the cumulative sum within each group to assign the ordinal numbers:
df['B'] = df['A'].groupby((~df['A']).cumsum()).cumsum()
A B
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
5 True 2
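To see why this works, here is a sketch of the intermediate values: (~df['A']).cumsum() increments at every False, so each run of Trues shares one group id and the cumulative sum restarts at each run.

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True, True, False, True, True]})

# each False bumps the counter, so every run of Trues gets its own group id
group_id = (~df['A']).cumsum()
print(group_id.tolist())  # [0, 0, 0, 1, 1, 1]

df['B'] = df['A'].groupby(group_id).cumsum()
print(df['B'].tolist())   # [1, 2, 3, 0, 1, 2]
```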
Using a simple, native approach (it worked fine for a small code sample):
import pandas as pd
df = pd.DataFrame({'A': [True, False, True, True, True, False, True, True]})
class ToNums:
    counter = 0

    @staticmethod
    def convert(bool_val):
        if bool_val:
            ToNums.counter += 1
        else:
            ToNums.counter = 0
        return ToNums.counter

df['B'] = df.A.map(ToNums.convert)
df
A B
0 True 1
1 False 0
2 True 1
3 True 2
4 True 3
5 False 0
6 True 1
7 True 2
Here's an example using a plain loop:

v = 0
for i, val in enumerate(df['A']):
    if val:  # 'val' is already a bool; comparing it to the string "True" would always be False
        df.loc[i, "C"] = v = v + 1
    else:
        df.loc[i, "C"] = v = 0
# the column is created as float during the loop, so cast it back to int
df["C"] = df["C"].astype(int)
df.head()

This gives the desired output:
A C
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
You can use a combination of groupby, cumsum, and cumcount
df['B'] = (df.groupby((df['A'] &
                       ~df['A'].shift(1, fill_value=False)  # row is True and the previous row is False (start of a run)
                       ).cumsum()                           # cumulative sum of run starts -> group id
                      )
             .cumcount().add(1)  # cumulative count within each group, starting at 1
           * df['A']             # multiply by 0 where 'A' is False, 1 otherwise
           )
output:
A B
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
5 True 2
Say I have a dataframe of booleans, called original:
original = pd.DataFrame([
    [True, False, False, True, False],
    [False, True, False, False, False]
])
0 1 2 3 4
0 True False False True False
1 False True False False False
And I want to create the following boolean dataframe (all to the right of a True should now be True):
0 1 2 3 4
0 False True True True True
1 False False True True True
I've accomplished this as follows, but was wondering if anyone had a less cumbersome method:
original.shift(axis=1).fillna(False).astype(int) \
.T.replace(to_replace=0, method='ffill').T.astype(bool)
cummax
original.cummax(axis=1).shift(1, axis=1, fill_value=False)
0 1 2 3 4
0 False True True True True
1 False False True True True
IIUC
original[original].shift(1, axis=1).ffill(axis=1).fillna(0).astype(bool)
Out[77]:
0 1 2 3 4
0 False True True True True
1 False False True True True
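For reference, the same cummax-then-shift idea can be sketched in plain NumPy, which sidesteps the pandas dtype quirks around shifting boolean columns:

```python
import numpy as np
import pandas as pd

original = pd.DataFrame([
    [True, False, False, True, False],
    [False, True, False, False, False]
])

arr = original.to_numpy()
# running maximum along each row: True from the first True onward
acc = np.maximum.accumulate(arr, axis=1)
# prepend a False column and drop the last one, i.e. shift right by one
shifted = np.column_stack([np.zeros(len(arr), dtype=bool), acc[:, :-1]])
result = pd.DataFrame(shifted, index=original.index, columns=original.columns)
print(result.values.tolist())
# [[False, True, True, True, True], [False, False, True, True, True]]
```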
I have a dataframe with two Boolean fields (as below).
import pandas as pd
d = [{'a1': False, 'a2': False}, {'a1': True, 'a2': False}, {'a1': True, 'a2': False},
     {'a1': False, 'a2': False}, {'a1': False, 'a2': True}, {'a1': False, 'a2': False},
     {'a1': False, 'a2': False}, {'a1': True, 'a2': False}, {'a1': False, 'a2': True},
     {'a1': False, 'a2': False}]
df = pd.DataFrame(d)
df
Out[1]:
a1 a2
0 False False
1 True False
2 True False
3 False False
4 False True
5 False False
6 False False
7 True False
8 False True
9 False False
I am trying to find the fastest and most "Pythonic" way of achieving the following:
If a1==True, count instances from current row where a2==False (e.g. row 1: a1=True, a2 is False for three rows from row 1)
At first instance of a2==True, stop counting (e.g. row 4, count = 3)
Set value of 'count' to new df column 'a3' on row where counting began (e.g. 'a3' = 3 on row 1)
Target result set as follows.
a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0
I have been trying to accomplish this using for-loops, iterrows, and while-loops, but so far I haven't been able to produce a nested combination that gives the results I want. Any help is appreciated; I apologize if the problem isn't totally clear.
How about this:
df['a3'] = df.apply(lambda x: 0 if not x.a1 else len(df.a2[x.name:df.a2.tolist()[x.name:].index(True)+x.name]), axis=1)
So, if a1 is False, write 0; otherwise write the length of the slice of a2 running from that row up to the next True. (Note that this raises a ValueError if no True follows in a2.)
This will do the trick:
df['a3'] = 0
# loop through every row of 'a1'
for i in range(len(df)):
    # if 'a1' at position i is True...
    if df['a1'][i]:
        count = 0
        # loop over the remaining items in 'a2', starting at position i
        for j in range(len(df) - i):
            # count the consecutive False values in 'a2'...
            if not df['a2'][j + i]:
                count += 1
            else:
                # ...and stop at the first True
                break
        # write the count at position i in 'a3'
        df.loc[i, 'a3'] = count

and produces the following output:
a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0
Edit: added comments in the code
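A loop-free sketch with NumPy's searchsorted is also possible: take the positions of the True values in each column and measure the gap from each a1 hit to the next a2 hit. This assumes every a1==True row is eventually followed by an a2==True row, as in the sample data; otherwise the lookup below would run past the end.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a1': [False, True, True, False, False, False, False, True, False, False],
    'a2': [False, False, False, False, True, False, False, False, True, False]
})

a1_pos = np.flatnonzero(df['a1'].to_numpy())
a2_pos = np.flatnonzero(df['a2'].to_numpy())
# for each a1==True position, find the next a2==True position at or after it
nxt = a2_pos[np.searchsorted(a2_pos, a1_pos)]
df['a3'] = 0
df.loc[a1_pos, 'a3'] = nxt - a1_pos
print(df['a3'].tolist())  # [0, 3, 2, 0, 0, 0, 0, 1, 0, 0]
```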
Let's say I have an NxM boolean dataframe X and an Nx1 boolean dataframe Y. I would like to perform a boolean operation on each column, returning a new dataframe that is NxM. For example:
x = pd.DataFrame([[True, True, True], [True, False, True], [False, False, True]])
y = pd.DataFrame([[False], [True], [True]])
I would like x & y to return:
0 1 2
0 False False False
1 True False True
2 False False True
But instead it returns:
0 1 2
0 False NaN NaN
1 True NaN NaN
2 False NaN NaN
Instead, treating y as a Series with
x & y[0]
gives:
0 1 2
0 False True True
1 False False True
2 False False True
This appears to broadcast by row. Is there a correct way to do this other than transposing, applying the operation with the Series, and then transposing back?
(x.T & y[0]).T
0 1 2
0 False False False
1 True False True
2 False False True
It seems that this fails when the row index is not the same as the column labels.
You could call apply with a lambda, using squeeze to collapse the single-column DataFrame y into a Series:
In [152]:
x.apply(lambda s: s & y.squeeze())
Out[152]:
0 1 2
0 False False False
1 True False True
2 False False True
I'm not sure if this is quicker, though. Here we're applying the mask column-wise by calling apply on the df, which is why transposing is unnecessary.
Actually you could use np.logical_and:
In [156]:
np.logical_and(x,y)
Out[156]:
0 1 2
0 False False False
1 True False True
2 False False True
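Another option, sketched here, is to strip the operands down to NumPy arrays, where the (N, 1) shape broadcasts across the columns of the (N, M) frame without pandas label alignment getting in the way:

```python
import pandas as pd

x = pd.DataFrame([[True, True, True], [True, False, True], [False, False, True]])
y = pd.DataFrame([[False], [True], [True]])

# in NumPy, (3, 3) & (3, 1) broadcasts row-wise, which is the desired behavior
result = pd.DataFrame(x.to_numpy() & y.to_numpy(),
                      index=x.index, columns=x.columns)
print(result.values.tolist())
# [[False, False, False], [True, False, True], [False, False, True]]
```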