Python - Check values in consecutive columns (i.e. across rows)

I have a 6x4 DataFrame containing numerical values. I would like to check whether the value in the current column is the same as the value in the next column, i.e. are there any equal values in consecutive columns within each row? How do I express the result of this check as a new column?
import itertools as it
import pandas as pd

# build the 6x4 frame of unique 0/1 permutations
x_list = list(set(it.permutations([1, 1, 0, 0])))
x_df = pd.DataFrame(x_list)
x_df.columns = ['one', 'two', 'three', 'four']

If I understood you correctly:
# difference between each column and the column to its right
x = x_df.diff(periods=-1, axis=1)
# the last column has no right-hand neighbour, so compare it with the previous one
x['four'] = x_df['four'] - x_df['three']
print(x == 0)
Input:
one two three four
0 1 0 1 0
1 1 1 0 0
2 1 0 0 1
3 0 1 1 0
4 0 1 0 1
5 0 0 1 1
Output:
one two three four
0 False False False False
1 True False True True
2 False True False False
3 False True False False
4 False False False False
5 True False True True
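If the goal is literally a single new column per row, one option is to reduce the element-wise comparison above with any(). A minimal sketch building on x_df; the column name has_consecutive_equal is just an illustrative choice:
# True where a column equals the column immediately to its right
equal_next = x_df.diff(periods=-1, axis=1).iloc[:, :-1] == 0
# one boolean per row: does any pair of consecutive columns match?
x_df['has_consecutive_equal'] = equal_next.any(axis=1)
print(x_df)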

Related

Create a column counting the consecutive True values on a multi-index

Let df be a DataFrame of boolean values with a two-level index (id, Week). For every id I want to count the consecutive True values, restarting at each False. For example, this is how it would look in this specific case:
         value  consecutive
id Week
1  1     True   1
1  2     True   2
1  3     False  0
1  4     True   1
1  5     True   2
2  1     False  0
2  2     False  0
2  3     True   1
This is my solution:
def func(id, week):
    # rows for this id, up to the given week
    M = df.loc[id][:week + 1]
    consecutive_list = list()
    S = 0
    for index, row in M.iterrows():
        if row['value']:
            S += 1
        else:
            S = 0
        consecutive_list.append(S)
    return consecutive_list[-1]
Then we generate the column "consecutive" as a list in the following way:
Consecutive_list = list()
for k in df.index:
    id = k[0]
    week = k[1]
    Consecutive_list.append(func(id, week))
df['consecutive'] = Consecutive_list
I would like to know if there is a more Pythonic way to do this.
EDIT: I wrote the "consecutive" column in order to show what I expect this to be.
If you are trying to add the consecutive column to the df, this should work:
df.assign(consecutive = df['value'].groupby(df['value'].diff().ne(0).cumsum()).cumsum())
Output:
     value  consecutive
1 a  True   1
  b  True   2
2 a  False  0
  b  True   1
3 a  True   2
  b  False  0
4 a  False  0
  b  True   1
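Note that the one-liner above groups over the whole column. If the count should also restart at each new id, as the question's expected output suggests, one possibility is to group by the id level together with the block id. A minimal sketch, assuming the index levels are named 'id' and 'Week' as in the printed frame:
import pandas as pd

df = pd.DataFrame(
    {'value': [True, True, False, True, True, False, False, True]},
    index=pd.MultiIndex.from_tuples(
        [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3)],
        names=['id', 'Week']),
)

# block id that increments at every False, computed separately per id
blocks = (~df['value']).astype(int).groupby(level='id').cumsum()

# cumulative count of True values within each (id, block) pair
df['consecutive'] = (df['value'].astype(int)
                       .groupby([df.index.get_level_values('id'), blocks])
                       .cumsum())
print(df)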

Count number of consecutive True in column, restart when False

I work with the following column in a pandas df:
A
True
True
True
False
True
True
I want to add a column B that counts the number of consecutive "True" values in A. I want to restart every time a "False" comes up. Desired output:
A B
True 1
True 2
True 3
False 0
True 1
True 2
Using cumsum, identify the blocks of rows where the value in column A stays True, then group column A by these blocks and take the cumulative sum within each block to assign the ordinal numbers:
df['B'] = df['A'].groupby((~df['A']).cumsum()).cumsum()
A B
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
5 True 2
Using a simple, plain-Python approach (for a small code sample it worked fine):
import pandas as pd

df = pd.DataFrame({'A': [True, False, True, True, True, False, True, True]})

class ToNums:
    counter = 0

    @staticmethod
    def convert(bool_val):
        if bool_val:
            ToNums.counter += 1
        else:
            ToNums.counter = 0
        return ToNums.counter

df['B'] = df.A.map(ToNums.convert)
df
A B
0 True 1
1 False 0
2 True 1
3 True 2
4 True 3
5 False 0
6 True 1
7 True 2
Here's an example
v = 0
for i, val in enumerate(df['A']):
    if val:
        df.loc[i, "C"] = v = v + 1
    else:
        df.loc[i, "C"] = v = 0
df.head()
This will give the desired output
A C
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
You can use a combination of groupby, cumsum, and cumcount
df['B'] = (df.groupby((df['A'] &
                       ~df['A'].shift(1).fillna(False)  # row is True and the previous row is False: start of a run
                       ).cumsum()                       # make group id
                      )
           .cumcount().add(1)                           # cumulative count within each group
           * df['A']                                    # multiply by 0 where A is False, 1 otherwise
           )
Output:
A B
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
5 True 2

What's the fastest way to loop through a DataFrame and count occurrences within the DataFrame whilst some condition is fulfilled (in Python)?

I have a dataframe with two Boolean fields (as below).
import pandas as pd
d = [{'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':False}, {'a1':False, 'a2':True},
{'a1': False, 'a2': False}, {'a1':False, 'a2':False}, {'a1':True, 'a2':False}, {'a1':False, 'a2':True}, {'a1':False, 'a2':False},]
df = pd.DataFrame(d)
df
Out[1]:
a1 a2
0 False False
1 True False
2 True False
3 False False
4 False True
5 False False
6 False False
7 True False
8 False True
9 False False
I am trying to find the fastest and most "Pythonic" way of achieving the following:
If a1==True, count instances from current row where a2==False (e.g. row 1: a1=True, a2 is False for three rows from row 1)
At first instance of a2==True, stop counting (e.g. row 4, count = 3)
Set value of 'count' to new df column 'a3' on row where counting began (e.g. 'a3' = 3 on row 1)
The target result set is as follows.
a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0
I have been trying to accomplish this using for loops, iterrows and while loops and so far haven't been able to produce a good nested combination which provides the results I want. Any help appreciated. I apologize if the problem is not totally clear.
How about this:
df['a3'] = df.apply(lambda x: 0 if not x.a1 else len(df.a2[x.name:df.a2.tolist()[x.name:].index(True)+x.name]), axis=1)
So, if a1 is False we write 0; otherwise we write the length of the slice of a2 that runs from that row up to the next True.
This will do the trick:
df['a3'] = 0
# loop through every value of 'a1'
for i in range(len(df['a1'])):
    # if 'a1' at position i is True...
    if df['a1'][i]:
        count = 0
        # loop over the remaining items in 'a2', starting at position i
        for j in range(len(df['a2']) - i):
            # if the value of 'a2' is False...
            if not df['a2'][j + i]:
                # ...count the consecutive False values
                count += 1
            else:
                # ...otherwise break the loop
                break
        # write the count at the position (i) where counting began
        df.loc[i, 'a3'] = count
and it produces the following output:
a1 a2 a3
0 False False 0
1 True False 3
2 True False 2
3 False False 0
4 False True 0
5 False False 0
6 False False 0
7 True False 1
8 False True 0
9 False False 0
Edit: added comments in the code
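For larger frames a fully vectorized version is also possible. Here is a minimal sketch (not one of the original answers): it counts, within the blocks delimited by True values in a2, how far each row is from the next True, and keeps that distance only where a1 is True. It assumes, as in the example, that a1 and a2 are never True on the same row; if there is no later True in a2, the count simply runs to the end of the frame.
import pandas as pd

d = [{'a1': False, 'a2': False}, {'a1': True,  'a2': False},
     {'a1': True,  'a2': False}, {'a1': False, 'a2': False},
     {'a1': False, 'a2': True},  {'a1': False, 'a2': False},
     {'a1': False, 'a2': False}, {'a1': True,  'a2': False},
     {'a1': False, 'a2': True},  {'a1': False, 'a2': False}]
df = pd.DataFrame(d)

# block id that increments at every True in a2
blocks = df['a2'].cumsum()

# distance from the current row to the next True in a2: the remaining
# rows of the block below the current one, plus the current row itself
distance = df.groupby(blocks).cumcount(ascending=False) + 1

# keep the distance only where a1 is True, 0 elsewhere
df['a3'] = distance * df['a1']
print(df)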

Plotting multidimensional binary data as horizontal bars

I'm trying to nicely plot a multidimensional array (see below) as parallel horizontal bars, so that each bar is filled where the value is True and blank where it is False.
Here's my data:
bar_1 bar_2 bar_3
0 True False False
1 True False False
2 True False True
3 False True False
4 False True False
5 False True False
6 False False False
7 False False False
8 False False False
9 False True False
10 False True False
11 False True False
12 False True False
13 False True False
14 False True False
15 False True False
16 True False False
17 True False False
18 True False True
19 False True False
20 False True False
21 False True False
22 True False True
23 False True False
24 False True False
25 False True False
Here's how I'd like to display it:
I was looking through the matplotlib docs for something similar, but no luck. Perhaps I'm missing the keywords for this type of plotting. What'd be the name of this type of plot? Is it possible to generate this with matplotlib?
Thanks to ImportanceOfBeingErnest I came up with a solution. I am going to use broken_barh, which was not exactly designed for this purpose, but with a little bit of tweaking it can be used here.
For the sake of simplicity, I'll only display a function and mark in red the places where it's above 0.5.
import numpy as np
import pandas as pd

index = np.linspace(start=-1, stop=5, num=100)
sin = np.sin(index**2)
df = pd.DataFrame({'sin': sin, 'filtered': sin > .5}, index=index)
This produces a plot of the curve. Note that I'm not going to use the same data as in the plots above, because that would take up too much space.
In the next step, I compute the flag points where my function crosses the threshold. Just to visualize it:
0 0 0 1 1 1 0 0 1 1 0 0 1 0 0
I compute the shifted values:
0 0 1 1 1 0 0 1 1 0 0 1 0 0 0   <- shifted one step to the left
0 0 0 1 1 1 0 0 1 1 0 0 1 0 0   <- the original values
0 0 0 0 1 1 1 0 0 1 1 0 0 1 0   <- shifted one step to the right
And I keep only the positions where the original array is True and the XOR of the two shifted arrays is True (one of them is True, but not both):
0 0 0 1 0 1 0 0 1 1 0 0 0 0 0
Note that a single spike that merely touches the threshold without crossing it will be 0.
This can be easily achieved with
flags = df.filtered & (df.filtered.shift().fillna(False) ^ df.filtered.shift(-1).fillna(False))
Now I multiply the flag points by the index (not necessarily an integer index).
flag_values = flags * flags.index
0 0 0 3 0 5 0 0 8 9 0 0 0 0 0
And drop the 0 values:
flag_values = flag_values[flag_values != 0]
[3, 5, 8, 9]
I still need to reshape it:
value_pairs = flag_values.values.reshape((-1, 2))
[[3, 5],
[8, 9]]
And now I need to subtract the first column from the second one:
value_pairs[:, 1] = value_pairs[:, 1] - value_pairs[:, 0]
And I can plot it as follows:
ax = df.sin.plot()
# the second argument is (ymin, yheight): the vertical position of the bar and its thickness
ax.broken_barh(value_pairs, (0.49, 0.02), facecolors='red')
Here's the result
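Putting the snippets above together, here is a minimal end-to-end sketch of this approach, assuming only numpy, pandas and matplotlib. It uses shift(fill_value=False) instead of fillna to keep the boolean dtype, and it relies on the crossings pairing up into an even number of flags (which they do for this curve):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

index = np.linspace(start=-1, stop=5, num=100)
sin = np.sin(index**2)
df = pd.DataFrame({'sin': sin, 'filtered': sin > .5}, index=index)

# flag the rows where the curve crosses the threshold (start or end of a True run)
prev = df.filtered.shift(1, fill_value=False)
nxt = df.filtered.shift(-1, fill_value=False)
flags = df.filtered & (prev ^ nxt)

# x positions of the crossings, dropping the unflagged rows
flag_values = flags * flags.index
flag_values = flag_values[flag_values != 0]

# pair the crossings up and convert (start, end) into (start, width), as broken_barh expects
value_pairs = flag_values.values.reshape((-1, 2))
value_pairs[:, 1] = value_pairs[:, 1] - value_pairs[:, 0]

ax = df.sin.plot()
ax.broken_barh(value_pairs, (0.49, 0.02), facecolors='red')
plt.show()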

Adding a count to prior cell value in Pandas

In pandas I am looking to add a value in one column 'B' depending on the boolean values in another column 'A'. When 'A' is True, start counting (i.e. add one on each new line) for as long as 'A' is False; when 'A' is True again, reset and start counting from 1. I managed to do this with a for loop, but it is very time consuming. I am wondering whether there is a more time-efficient solution.
The result should look like this:
Date A B
01.2010 False 0
02.2010 True 1
03.2010 False 2
04.2010 False 3
05.2010 True 1
06.2010 False 2
You can use cumsum with groupby and cumcount:
print(df)
Date A
0 1.201 False
1 1.201 True
2 1.201 False
3 2.201 True
4 3.201 False
5 4.201 False
6 5.201 True
7 6.201 False
roll = df.A.cumsum()
print(roll)
0 0
1 1
2 1
3 2
4 2
5 2
6 3
7 3
Name: A, dtype: int32
df['B'] = df.groupby(roll).cumcount() + 1
# if the first values are False, the output there is 0
df.loc[roll == 0 , 'B'] = 0
print(df)
Date A B
0 1.201 False 0
1 1.201 True 1
2 1.201 False 2
3 2.201 True 1
4 3.201 False 2
5 4.201 False 3
6 5.201 True 1
7 6.201 False 2
Thanks, I got the solution from another post similar to this one:
rolling_count = 0

def set_counter(val):
    # 'global' applies to the whole function body, so both branches update the module-level counter
    global rolling_count
    if not val:
        rolling_count += 1
    else:
        rolling_count = 1
    return rolling_count

df['B'] = df['A'].map(set_counter)
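For reference, a minimal sketch applying the groupby/cumcount answer above to the question's own example data (the dates are kept as plain strings here):
import pandas as pd

df = pd.DataFrame({
    'Date': ['01.2010', '02.2010', '03.2010', '04.2010', '05.2010', '06.2010'],
    'A': [False, True, False, False, True, False],
})

roll = df['A'].cumsum()            # block id that increments at every True
df['B'] = df.groupby(roll).cumcount() + 1
df.loc[roll == 0, 'B'] = 0         # rows before the first True stay 0
print(df)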
