I have a dataframe as follows
df = pd.DataFrame({
    'Values': [False, False, True, False, False, True, True, False, False, True]
})
df
Values
0 False
1 False
2 True
3 False
4 False
5 True
6 True
7 False
8 False
9 True
I would like to add another column named 'count' which increments by one whenever a True is detected in the 'Values' column.
My expected output is as follows
Values Count
0 False 0
1 False 0
2 True 1
3 False 1
4 False 1
5 True 2
6 True 3
7 False 3
8 False 3
9 True 4
Now I am doing as follows
counter = [0]
def handleOneRow(row):
    if row['Values'] == True:
        counter[0] = counter[0] + 1
    return counter[0]

df['count'] = df.apply(handleOneRow, axis=1)
Is there a simpler way to do this in pandas?
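Since the count only ever accumulates at each True, and True is treated as 1 in arithmetic, the whole thing reduces to a cumulative sum. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Values': [False, False, True, False, False, True, True, False, False, True]
})

# True counts as 1 and False as 0, so a cumulative sum gives the running count
df['Count'] = df['Values'].cumsum()
print(df['Count'].tolist())  # [0, 0, 1, 1, 1, 2, 3, 3, 3, 4]
```

This avoids both the mutable-closure counter and the row-by-row apply.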
I have a pandas dataframe like this:
close low max_close higher_low
0 2 1 True False
1 3 4 False True
2 1 2 True False
3 0 3 False False
4 5 2 False True
5 4 5 False True
6 3 3 True False
7 6 7 False True
and could be created with the code:
import pandas as pd
df = pd.DataFrame(
    {
        'close': [2, 3, 1, 0, 5, 4, 3, 6],
        'low': [1, 4, 2, 3, 2, 5, 3, 7],
        'max_close': [True, False, True, False, False, False, True, False],
        'higher_low': [False, True, False, False, True, True, False, True]
    }
)
For any row where max_close is True, I want to find the first subsequent row where higher_low is True and low is greater than close; this matching row must also be at most 2 rows after the row where max_close was True.
So the output should be :
close low max_close higher_low
1 3 4 False True
7 6 7 False True
(Index 4 is not in the output because in this row: low < close. Also, index 5 is not in the output because it's three rows after index 2, while we have a condition that it should be at most in the next 2 rows.)
Also, it's my priority not to use any for-loops in the code.
Have you any idea about this?
Use:

lookup = 2
indices = []
true_idx = df.index[df['max_close']]
for i in range(1, lookup + 1):
    # shift each max_close index forward by i, dropping any that fall past the end
    shifted = true_idx + i
    shifted = shifted[shifted <= df.index[-1]]
    tmp = df.loc[shifted]
    tmp_ind = tmp[tmp['higher_low'] & (tmp['low'] > tmp['close'])].index
    indices += tmp_ind.tolist()

df.loc[sorted(set(indices))]
Output
close low max_close higher_low
1 3 4 False True
7 6 7 False True
Create virtual groups from the max_close column, then keep the first 3 rows of each group (1 row for max_close and the 2 following). Finally, filter on your 2 conditions:
out = (df.groupby(df['max_close'].cumsum()).head(3)
         .query("higher_low & (close < low)"))
print(out)
# Output
close low max_close higher_low
1 3 4 False True
7 6 7 False True
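An alternative loop-free sketch uses shift: a candidate row qualifies if max_close was True one or two rows earlier. Note that, like the groupby approach, this keeps every match inside the window rather than only the first; on this data the two coincide.

```python
import pandas as pd

df = pd.DataFrame({
    'close': [2, 3, 1, 0, 5, 4, 3, 6],
    'low': [1, 4, 2, 3, 2, 5, 3, 7],
    'max_close': [True, False, True, False, False, False, True, False],
    'higher_low': [False, True, False, False, True, True, False, True]
})

cond = df['higher_low'] & (df['low'] > df['close'])
# True where max_close was True in either of the 2 preceding rows
recent_max = (df['max_close'].shift(1, fill_value=False)
              | df['max_close'].shift(2, fill_value=False))
out = df[cond & recent_max]
print(out.index.tolist())  # [1, 7]
```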
I work with the following column in a pandas df:
A
True
True
True
False
True
True
I want to add column B that counts the number of consecutive "True" values in A, restarting every time a "False" comes up. Desired output:
A B
True 1
True 2
True 3
False 0
True 1
True 2
Using cumsum, identify the blocks of rows where the value in column A stays True, then group column A by these blocks and take the cumulative sum within each group to assign the ordinal numbers:
df['B'] = df['A'].groupby((~df['A']).cumsum()).cumsum()
A B
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
5 True 2
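To see why this works, here is a sketch of the intermediate values: (~df['A']).cumsum() increments at every False, so each run of Trues shares one group id and the cumulative sum restarts at each run.

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True, True, False, True, True]})

# each False bumps the counter, so every run of Trues gets its own group id
group_id = (~df['A']).cumsum()
print(group_id.tolist())  # [0, 0, 0, 1, 1, 1]

df['B'] = df['A'].groupby(group_id).cumsum()
print(df['B'].tolist())   # [1, 2, 3, 0, 1, 2]
```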
Using a simple, native approach (it worked fine for a small code sample):
import pandas as pd
df = pd.DataFrame({'A': [True, False, True, True, True, False, True, True]})
class ToNums:
    counter = 0

    @staticmethod
    def convert(bool_val):
        if bool_val:
            ToNums.counter += 1
        else:
            ToNums.counter = 0
        return ToNums.counter

df['B'] = df.A.map(ToNums.convert)
df
A B
0 True 1
1 False 0
2 True 1
3 True 2
4 True 3
5 False 0
6 True 1
7 True 2
Here's an example using a plain loop:

v = 0
for i, val in enumerate(df['A']):
    if val:  # 'val' is already a bool; comparing it to the string "True" would always be False
        df.loc[i, "C"] = v = v + 1
    else:
        df.loc[i, "C"] = v = 0
# the column is created as float during the loop, so cast it back to int
df["C"] = df["C"].astype(int)
df.head()

This gives the desired output:
A C
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
You can use a combination of groupby, cumsum, and cumcount
df['B'] = (df.groupby((df['A'] &
                       ~df['A'].shift(1, fill_value=False)  # row is True and the previous row is False (start of a run)
                       ).cumsum()                           # cumulative sum of run starts -> group id
                      )
             .cumcount().add(1)  # cumulative count within each group, starting at 1
           * df['A']             # multiply by 0 where 'A' is False, 1 otherwise
           )
output:
A B
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
5 True 2
Say I have a dataframe of booleans, called original:
original = pd.DataFrame([
    [True, False, False, True, False],
    [False, True, False, False, False]
])
0 1 2 3 4
0 True False False True False
1 False True False False False
And I want to create the following boolean dataframe (all to the right of a True should now be True):
0 1 2 3 4
0 False True True True True
1 False False True True True
I've accomplished this as follows, but was wondering if anyone had a less cumbersome method:
original.shift(axis=1).fillna(False).astype(int) \
.T.replace(to_replace=0, method='ffill').T.astype(bool)
cummax
original.cummax(axis=1).shift(1, axis=1, fill_value=False)
0 1 2 3 4
0 False True True True True
1 False False True True True
IIUC
original[original].shift(1, axis=1).ffill(axis=1).fillna(0).astype(bool)
Out[77]:
0 1 2 3 4
0 False True True True True
1 False False True True True
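For reference, the same cummax-then-shift idea can be sketched in plain NumPy, which sidesteps the pandas dtype quirks around shifting boolean columns:

```python
import numpy as np
import pandas as pd

original = pd.DataFrame([
    [True, False, False, True, False],
    [False, True, False, False, False]
])

arr = original.to_numpy()
# running maximum along each row: True from the first True onward
acc = np.maximum.accumulate(arr, axis=1)
# prepend a False column and drop the last one, i.e. shift right by one
shifted = np.column_stack([np.zeros(len(arr), dtype=bool), acc[:, :-1]])
result = pd.DataFrame(shifted, index=original.index, columns=original.columns)
print(result.values.tolist())
# [[False, True, True, True, True], [False, False, True, True, True]]
```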
I have a dataframe with two Boolean fields (as below).
import pandas as pd
d = [{'a1': False, 'a2': False}, {'a1': True, 'a2': False}, {'a1': True, 'a2': False},
     {'a1': False, 'a2': False}, {'a1': False, 'a2': True}, {'a1': False, 'a2': False},
     {'a1': False, 'a2': False}, {'a1': True, 'a2': False}, {'a1': False, 'a2': True},
     {'a1': False, 'a2': False}]
df = pd.DataFrame(d)
df
Out[1]:
a1 a2
0 False False
1 True False
2 True False
3 False False
4 False True
5 False False
6 False False
7 True False
8 False True
9 False False
I am trying to find the fastest and most "Pythonic" way of achieving the following:
If a1==True, count instances from current row where a2==False (e.g. row 1: a1=True, a2 is False for three rows from row 1)
At first instance of a2==True, stop counting (e.g. row 4, count = 3)
Set value of 'count' to new df column 'a3' on row where counting began (e.g. 'a3' = 3 on row 1)
Target result set as follows.
a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0
I have been trying to accomplish this using for-loops, iterrows, and while-loops, but so far I haven't been able to produce a nested combination that gives the results I want. Any help is appreciated; I apologize if the problem isn't totally clear.
How about this:
df['a3'] = df.apply(lambda x: 0 if not x.a1 else len(df.a2[x.name:df.a2.tolist()[x.name:].index(True)+x.name]), axis=1)
So, if a1 is False, write 0; otherwise write the length of the slice of a2 running from that row up to the next True. (Note that this raises a ValueError if no True follows in a2.)
This will do the trick:
df['a3'] = 0
# loop through every row of 'a1'
for i in range(len(df)):
    # if 'a1' at position i is True...
    if df['a1'][i]:
        count = 0
        # loop over the remaining items in 'a2', starting at position i
        for j in range(len(df) - i):
            # count the consecutive False values in 'a2'...
            if not df['a2'][j + i]:
                count += 1
            else:
                # ...and stop at the first True
                break
        # write the count at position i in 'a3'
        df.loc[i, 'a3'] = count

and produces the following output:
a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0
Edit: added comments in the code
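A loop-free sketch with NumPy's searchsorted is also possible: take the positions of the True values in each column and measure the gap from each a1 hit to the next a2 hit. This assumes every a1==True row is eventually followed by an a2==True row, as in the sample data; otherwise the lookup below would run past the end.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a1': [False, True, True, False, False, False, False, True, False, False],
    'a2': [False, False, False, False, True, False, False, False, True, False]
})

a1_pos = np.flatnonzero(df['a1'].to_numpy())
a2_pos = np.flatnonzero(df['a2'].to_numpy())
# for each a1==True position, find the next a2==True position at or after it
nxt = a2_pos[np.searchsorted(a2_pos, a1_pos)]
df['a3'] = 0
df.loc[a1_pos, 'a3'] = nxt - a1_pos
print(df['a3'].tolist())  # [0, 3, 2, 0, 0, 0, 0, 1, 0, 0]
```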
Let's say I have an NxM boolean dataframe X and an Nx1 boolean dataframe Y. I would like to perform a boolean operation on each column, returning a new dataframe that is NxM. For example:
x = pd.DataFrame([[True, True, True], [True, False, True], [False, False, True]])
y = pd.DataFrame([[False], [True], [True]])
I would like x & y to return:
0 1 2
0 False False False
1 True False True
2 False False True
But instead it returns:
0 1 2
0 False NaN NaN
1 True NaN NaN
2 False NaN NaN
Instead, treating y as a Series with
x & y[0]
gives:
0 1 2
0 False True True
1 False False True
2 False False True
This appears to broadcast by row. Is there a correct way to do this other than transposing, applying the operation with the Series, and then transposing back?
(x.T & y[0]).T
0 1 2
0 False False False
1 True False True
2 False False True
It seems that this fails when the row index is not the same as the column labels.
You could call apply with a lambda, using squeeze to collapse the single-column DataFrame y into a Series:
In [152]:
x.apply(lambda s: s & y.squeeze())
Out[152]:
0 1 2
0 False False False
1 True False True
2 False False True
I'm not sure if this is quicker, though. Here we're applying the mask column-wise by calling apply on the df, which is why transposing is unnecessary.
Actually you could use np.logical_and:
In [156]:
np.logical_and(x,y)
Out[156]:
0 1 2
0 False False False
1 True False True
2 False False True
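Another option, sketched here, is to strip the operands down to NumPy arrays, where the (N, 1) shape broadcasts across the columns of the (N, M) frame without pandas label alignment getting in the way:

```python
import pandas as pd

x = pd.DataFrame([[True, True, True], [True, False, True], [False, False, True]])
y = pd.DataFrame([[False], [True], [True]])

# in NumPy, (3, 3) & (3, 1) broadcasts row-wise, which is the desired behavior
result = pd.DataFrame(x.to_numpy() & y.to_numpy(),
                      index=x.index, columns=x.columns)
print(result.values.tolist())
# [[False, False, False], [True, False, True], [False, False, True]]
```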