Essentially, as the title suggests, I want to convert consecutive duplicates of Trues to False.
For example, say I have an array of 0s and 1s:
x = pd.Series([1,0,0,1,1])
should become:
y = pd.Series([0,0,0,0,1])
# where the 1st element of x becomes 0 since it is not part of a consecutive run
# and the 4th element becomes 0 because it is the first instance of the consecutive duplicate
# Everything else should remain the same.
This should also apply to runs of more than two. Say I have a much longer array, e.g.
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
becomes:
y = pd.Series([0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1])
The posts I have found mostly delete the consecutive duplicates and do not retain the original length. In this case, the original length should be retained.
It is something like the following code:
for i in range(len(x)):
    if x[i] == x[i+1]:
        x[i] = True
    else:
        x[i] = False
but this gives me a never-ending run, and it does not accommodate runs of more than two.
Pandas solution - create the Series, build consecutive groups with shift and cumsum, then keep only the last 1 of each group of duplicates with Series.duplicated:
s = pd.Series(x)
g = s.ne(s.shift()).cumsum()
s1 = (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
print (s1.tolist())
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
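To see what the grouping step produces: g labels every run of equal values with its own number, ~g.duplicated(keep='last') keeps the last row of each group, g.duplicated(keep=False) restricts to groups of size at least 2, and s.eq(1) restricts to runs of 1s. An illustrative look at g for the s above:
print (g.tolist())
[1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9, 9]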
EDIT:
For multiple columns, apply the function per column:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a':x, 'b':x})
def f(s):
    g = s.ne(s.shift()).cumsum()
    return (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
df = df.apply(f)
print (df)
a b
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 1 1
6 0 0
7 0 0
8 1 1
9 0 0
10 0 0
11 0 0
12 0 0
13 1 1
14 0 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 1 1
Vanilla Python:
x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
counter = 0
for i, e in enumerate(x):
    if not e:
        # a 0 resets the run counter
        counter = 0
        continue
    # zero out this 1 if it starts a run or if the next element is also a 1,
    # so only the last 1 of a run of two or more survives
    if not counter or (i < len(x) - 1 and x[i+1]):
        counter += 1
        x[i] = 0
print(x)
Prints:
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
Related
I have some acceleration data, and I have set up a new column that is 1 if the value in the accelpos column is >= 2.5, using the following code:
frame["new3"] = np.where((frame.accelpos >=2.5), '1', '0')
I end up getting data in sequences like so:
0,0,0,0,1,1,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0
I want to add a second column that has a 1 just at the start of each sequence, as follows:
0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
Any help would be much appreciated.
You can compare each value with the shifted Series via Series.shift and keep only the '1' values, so chain the two conditions with & for bitwise AND and finally cast to integers to map True/False to 1/0:
df = pd.DataFrame({'col':'0,0,0,0,1,1,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0'.split(',')})
df['new'] = (df['col'].ne(df['col'].shift()) & df['col'].eq('1')).astype(int)
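A quick illustrative look at the two intermediates on the first few rows of the df defined above:
print (df['col'].ne(df['col'].shift()).head().tolist())
[True, False, False, False, True]
print (df['col'].eq('1').head().tolist())
[False, False, False, False, True]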
Or test the difference with diff; because diff leaves a missing value in the first position, fill it with the original value via fillna so a possible leading 1 is not lost:
s = df['col'].astype(int)
df['new'] = s.diff().fillna(s).eq(1).astype(int)
print (df)
col new
0 0 0
1 0 0
2 0 0
3 0 0
4 1 1
5 1 0
6 1 0
7 1 0
8 1 0
9 0 0
10 0 0
11 0 0
12 1 1
13 1 0
14 0 0
15 0 0
16 0 0
17 1 1
18 1 0
19 1 0
20 1 0
21 1 0
22 1 0
23 1 0
24 1 0
25 1 0
26 1 0
27 0 0
28 0 0
29 0 0
30 0 0
I am not familiar with the where function, but I can try to help from an algorithmic point of view.
Assume we have a list a = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, ..., 0]
From an algorithmic POV, if you want to replace each sequence of 1s with a single 1 at the beginning of the sequence, here is what you want to do:
parse the list
check whether each item is a one or a zero
if it is a one, then each following item must be set to 0 until you actually reach a zero
You might want to have something like this:
a = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
for i in range(len(a)-1):
    if a[i] == 1:
        for j in range(1, len(a)-i):
            if a[i+j] == 1:
                a[i+j] = 0
            else:
                break
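Running the snippet above should leave only the leading 1 of each run, i.e.:
print(a)
[0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]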
I have two dataframe columns containing sequences of 0 and -1.
Using the Python method count I can calculate the number of times the 1st column equals -1 (3 times) and the number of times the 2nd column equals -1 (2 times). What I would actually like is the number of times both columns x and y are equal to -1 simultaneously (which should be 1 in the given example). I tried something like count = df1['x'][df1['x'] == df1['y'] == -1].count(), but I cannot put two conditions directly into count like that.
Is there a simple way to do it (using count or some other workaround)?
Thanks in advance!
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
df1 = pd.DataFrame({
"x": [0, 0, 0, -1 , 0, -1, 0, 0, 0, 0 , 0, 0, 0, -1, 0],
"y": [0, 0, 0, 0 , 0, 0, -1, 0, 0, 0 , 0, 0, 0, -1, 0],
})
df1
x y
0 0 0
1 0 0
2 0 0
3 -1 0
4 0 0
5 -1 0
6 0 -1
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 -1 -1
14 0 0
count = df1['x'][df1['x'] == -1].count()
count
3
count = df1['y'][df1['y'] == -1].count()
count
2
You can use eq + all to get a boolean Series that returns True if both columns are equal to -1 at the same time. Then sum fetches the total:
out = df1[['x','y']].eq(-1).all(axis=1).sum()
Output:
1
Sum x and y and count the rows where they add to -2, i.e. both are -1:
(df1.x + df1.y).eq(-2).sum()
1
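An equivalent, slightly more explicit spelling combines the two comparisons directly (same result on the df1 above):
count = ((df1['x'] == -1) & (df1['y'] == -1)).sum()
count
1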
I have a binary numpy array, mostly zero-valued, and I want to fill the gaps between non-zero values with a given value, but only in every other gap.
For example:
[0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0]
should result in either
[0,0,1,1,1,1,1,1,0,0,1,1,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,0,0]
or
[1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,0,0,1,1,1]
The idea is: while scanning the array left to right, fill the 0 values with 1 up to the next 1, but only if you did not do so up to the previous 1.
I can do it iteratively in this way:
A = np.array([0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0])
ones_index = np.where(A == 1)[0]
begins = ones_index[::2] # beginnings of filling section
ends = ones_index[1::2] # ends of filling sections
from itertools import zip_longest
# fill those sections
for begin, end in zip_longest(begins, ends, fillvalue=len(A)):
    A[begin:end] = 1
but I'm looking for a more efficient solution, maybe with numpy broadcasting. Any ideas?
One nice answer to this question: you can produce the first result via np.logical_xor.accumulate(A) | A and the second via ~np.logical_xor.accumulate(A) | A. A quick demonstration:
A = np.array([0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0])
print(np.logical_xor.accumulate(A) | A)
print(~np.logical_xor.accumulate(A) | A)
The resulting output:
[0 0 1 1 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 0 0]
[1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1]
The cumulative sum of A flips parity at every 1, so the odd-parity positions are exactly the gaps to fill for the first variant:
np.where(A.cumsum() % 2 == 1, 1, A)
# array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,
#        0, 0, 1, 1, 1, 1, 0, 0])
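Flipping the parity test should give the second variant (using the same A as above):
np.where(A.cumsum() % 2 == 0, 1, A)
# array([1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
#        1, 1, 1, 0, 0, 1, 1, 1])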
Is there any way to make the below code more efficient?
for i in range(0, len(df)):
    current_row = df.iloc[i]
    if i > 0:
        previous_row = df.iloc[i-1]
    else:
        previous_row = current_row
    if current_row['A'] != 1:
        if (current_row['C'] < 55) and (current_row['D'] >= -1):
            df.loc[i, 'F'] = previous_row['F'] + 1
        else:
            df.loc[i, 'F'] = previous_row['F']
For example, if the dataframe is like below:
df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 0, 1, 0, 0],
                   'C': [1, 1, 1, 0, 0, 0, 1, 1, 1],
                   'D': [1, 1, 1, 0, 0, 0, 1, 1, 1],
                   'F': [1, 1, 1, 0, 0, 0, 1, 1, 1]})
My output should look like this:
>>> df
A C D F
0 1 1 1 1
1 1 1 1 1
2 1 1 1 1
3 0 0 0 2
4 0 0 0 3
5 0 0 0 4
6 1 1 1 1
7 0 1 1 2
8 0 1 1 3
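A possible vectorized rewrite (a sketch, not necessarily the intended approach; it assumes F keeps its original value on rows where A == 1, as in the expected output, and the helper names cond, grp, inc and start are just for illustration): every A == 1 row starts a new group, and within a group F is the group's starting F plus a running count of the rows that satisfy the condition.
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 0, 1, 0, 0],
                   'C': [1, 1, 1, 0, 0, 0, 1, 1, 1],
                   'D': [1, 1, 1, 0, 0, 0, 1, 1, 1],
                   'F': [1, 1, 1, 0, 0, 0, 1, 1, 1]})

cond = (df['C'] < 55) & (df['D'] >= -1)          # rows that add 1 to the running F
grp = df['A'].eq(1).cumsum()                     # a new group starts at every A == 1 row
inc = (cond & df['A'].ne(1)).astype(int)         # increments apply only where A != 1
start = df.groupby(grp)['F'].transform('first')  # F value at the start of each group
df['F'] = start + inc.groupby(grp).cumsum()
# F becomes [1, 1, 1, 2, 3, 4, 1, 2, 3], matching the expected output above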
I basically want to reorder (I don't think this is a shuffling task) a list of 100 binary numbers. The following properties should hold after the reorder: the number of 1s stays fixed at 10, and the 1s should be roughly evenly spread apart, as shown below, so every 9th, 10th, or 11th digit is a 1. I want this reordering to be random. The trivial approach I had in mind is to track the index of the first 1 in the input list and generate a new start index. Any ideas for other solutions?
x = [1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
Code as follows:
def main():
    from random import shuffle
    from random import randint
    from itertools import chain
    num_of_10th = randint(0, 5) * 2
    num_of_11th = num_of_9th = int((10 - num_of_10th) / 2)
    lsts = []
    for i in range(num_of_10th):
        lsts.append([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    for i in range(num_of_9th):
        lsts.append([1, 0, 0, 0, 0, 0, 0, 0, 0])
    for i in range(num_of_11th):
        lsts.append([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    shuffle(lsts)
    lsts = list(chain.from_iterable(lsts))
    print(lsts)
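Since every block starts with a single 1 and has length 9, 10, or 11, and the block lengths always sum to 100, the flattened list keeps exactly ten 1s spaced 9 to 11 apart. A quick sanity check along those lines (a sketch that rebuilds the same blocks outside main; the names blocks and flat are just for illustration):
from random import randint, shuffle
from itertools import chain

num_of_10th = randint(0, 5) * 2
num_of_9th = num_of_11th = (10 - num_of_10th) // 2
blocks = ([[1] + [0] * 9] * num_of_10th
          + [[1] + [0] * 8] * num_of_9th
          + [[1] + [0] * 10] * num_of_11th)
shuffle(blocks)
flat = list(chain.from_iterable(blocks))
assert len(flat) == 100 and sum(flat) == 10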
You can use Python's list multiplication.
My solution generates a random size between 1 and 10 using random.randint. From this size I create repeated_part, which starts with a 1 and fills the rest with zeros. For example,
when size is 5, repeated_part will be [1, 0, 0, 0, 0].
From the size we can calculate how many times it fits into a list of 100 (100 // size) and we add one block of overflow. The list may now be too long; for example, with a size of 3 the total length is ((100 // 3) + 1) * 3 = 102, so we truncate it back to 100 elements with [:100].
import random
size = random.randint(1, 10)
repeated_part = [1] + [0]*(size-1)
result = (repeated_part * (100 // size + 1))[:100]
Note: if you don't want the 1 to always be the first element, you could use random.shuffle(repeated_part) and still satisfy all your other requirements.