I have some acceleration data, and I have set up a new column that holds a 1 wherever the value in the accelpos column is >= 2.5, using the following code:
frame["new3"] = np.where((frame.accelpos >=2.5), '1', '0')
I end up getting data in sequences like so
0,0,0,0,1,1,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0
I want to add a second column to give a 1 just at the start of each sequence as follows
0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
Any help would be much appreciated.
You can compare each value with the previous one using Series.shift and keep only the rows equal to '1': chain the two conditions with & (bitwise AND), then cast to integers to map True/False to 1/0:
df = pd.DataFrame({'col':'0,0,0,0,1,1,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0'.split(',')})
df['new'] = (df['col'].ne(df['col'].shift()) & df['col'].eq('1')).astype(int)
Or test the difference. Because the very first value may itself be 1, replace the missing first difference with the original value using fillna:
s = df['col'].astype(int)
df['new'] = s.diff().fillna(s).eq(1).astype(int)
print (df)
col new
0 0 0
1 0 0
2 0 0
3 0 0
4 1 1
5 1 0
6 1 0
7 1 0
8 1 0
9 0 0
10 0 0
11 0 0
12 1 1
13 1 0
14 0 0
15 0 0
16 0 0
17 1 1
18 1 0
19 1 0
20 1 0
21 1 0
22 1 0
23 1 0
24 1 0
25 1 0
26 1 0
27 0 0
28 0 0
29 0 0
30 0 0
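As a rough sketch, the thresholding and the start-of-run detection can also be done directly on booleans, without going through '1'/'0' strings. The sample accelpos values and the column name start below are made up for illustration; only the 2.5 threshold and the new3 column come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 2.5 threshold comes from the question.
frame = pd.DataFrame({"accelpos": [0.1, 2.6, 3.0, 1.0, 2.5, 2.7, 0.0]})

above = frame["accelpos"] >= 2.5           # boolean Series
frame["new3"] = above.astype(int)          # 1 wherever the threshold is met

# A run starts where the current row is above the threshold
# and the previous row is not (the first row has no previous row).
previous_above = above.shift(fill_value=False)
frame["start"] = (above & ~previous_above).astype(int)

print(frame)
```

Passing fill_value=False to shift keeps the shifted Series boolean, so no NaN handling is needed for the first row.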
I am not familiar with the where function, so I will try to help from an algorithmic point of view.
Assume we have a list a = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, ..., 0]
From an algorithmic POV, if you want to replace each sequence of 1s with a single 1 at the beginning of that sequence, here is what you want to do:
parse the list
assess whether each item is a one or a zero
if it is a one, set each following item to 0 until you reach a zero
You might want to have something like this:
a = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
for i in range(len(a)-1):
    if a[i] == 1 :
        for j in range(1,len(a)-i):
            if a[i+j] == 1:
                a[i+j] = 0
            else :
                break
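The same steps can also be written as a single pass that only remembers the previous item and builds a new list instead of modifying the input. This is a minimal sketch; the function name mark_run_starts is made up for illustration.

```python
def mark_run_starts(values):
    """Return a new list with a 1 only at the first item of each run of 1s."""
    result = []
    previous = 0  # treat the position before the list as a zero
    for value in values:
        # A run starts when the current item is 1 and the previous one is not.
        result.append(1 if value == 1 and previous != 1 else 0)
        previous = value
    return result


a = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
print(mark_run_starts(a))
```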
I have a binary numpy array, mostly zero-valued, and I want to fill the gaps between non-zero values with a given value, but in an alternating way.
For example:
[0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0]
should result in either
[0,0,1,1,1,1,1,1,0,0,1,1,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,0,0]
or
[1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,0,0,1,1,1]
The idea is: while scanning the array left to right, fill 0 values with 1 up to the next 1, if you didn't do so up to the previous 1.
I can do this iteratively, like this:
A = np.array([0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0])
ones_index = np.where(A == 1)[0]
begins = ones_index[::2] # beginnings of filling section
ends = ones_index[1::2] # ends of filling sections
from itertools import zip_longest
# fill those sections
for begin, end in zip_longest(begins, ends, fillvalue=len(A)):
    A[begin:end] = 1
but I'm looking for a more efficient solution, maybe with numpy broadcasting. Any ideas?
One nice answer to this question is that we can produce the first result via np.logical_xor.accumulate(arr) | arr and the second via ~np.logical_xor.accumulate(arr) | arr. A quick demonstration:
A = np.array([0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0])
print(np.logical_xor.accumulate(A) | A)
print(~np.logical_xor.accumulate(A) | A)
The resulting output:
[0 0 1 1 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 0 0]
[1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1]
np.where(arr.cumsum() % 2 == 1, 1, arr)
# array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,
# 0, 0, 1, 1, 1, 1, 0, 0])
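The cumulative sum counts how many 1s have been seen so far, so its parity tells whether the scan is currently inside a section to fill. As a sketch, flipping the parity test appears to give the second variant from the question as well; the names first and second are only illustrative.

```python
import numpy as np

A = np.array([0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0])

ones_seen = A.cumsum()  # number of 1s at or before each position

# Odd count: a filling section is open, so write 1; otherwise keep A.
first = np.where(ones_seen % 2 == 1, 1, A)

# Even count: fill the complementary gaps instead.
second = np.where(ones_seen % 2 == 0, 1, A)

print(first)
print(second)
```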
The problem description is simple, but I cannot figure out how to make this work in Pandas. Basically, I'm trying to replace consecutive values (except the first) with some replacement value. For example:
data = {
"A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame.from_dict(data)
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 2
10 2
11 2
12 3
If I run this through some function foo(df, 2, 0) I would get the following:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
This replaces all values of 2 with 0, except for the first one. Is this possible?
You can find all the rows where A = 2 and A is also equal to the previous A value and set them to 0:
data = {
"A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame.from_dict(data)
df[(df.A == 2) & (df.A == df.A.shift(1))] = 0
Output:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
If you have more than one column in the dataframe, use df.loc to just set the A values:
df.loc[(df.A == 2) & (df.A == df.A.shift(1)), 'A'] = 0
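Wrapped up as the foo(df, 2, 0) call from the question, the shift comparison might look like the sketch below. The column parameter and the decision to return a copy rather than modify df in place are assumptions, not part of the original answer.

```python
import pandas as pd


def foo(df, val, repl, column="A"):
    """Replace `val` with `repl` in `column`, except at the start of each run."""
    out = df.copy()  # assumption: leave the caller's frame untouched
    col = out[column]
    # True where the value is `val` and the row above holds the same value.
    repeated = col.eq(val) & col.eq(col.shift())
    out.loc[repeated, column] = repl
    return out


df = pd.DataFrame({"A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]})
print(foo(df, 2, 0))
```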
Try this, if 'A' is duplicated further down the dataframe and is monotonically increasing:
def foo(df, val=2, repl=0):
    return df.mask((df.groupby('A').transform('cumcount') > 0) & (df['A'] == val), repl)
foo(df, 2, 0)
Output:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
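The two approaches differ when the same value comes back in a later run: comparing with shift keeps the first row of every run, whereas counting occurrences per value keeps only the first occurrence in the whole column. A small sketch on made-up data, using groupby(...).cumcount() for the occurrence count:

```python
import pandas as pd

# Hypothetical data: the value 2 appears in two separate runs.
df = pd.DataFrame({"A": [2, 2, 0, 2, 2]})
is_val = df["A"].eq(2)

# Keep the first row of each run of 2s.
per_run = df["A"].mask(is_val & df["A"].eq(df["A"].shift()), 0)

# Keep only the very first 2 in the column.
overall = df["A"].mask(is_val & (df.groupby("A").cumcount() > 0), 0)

print(per_run.tolist())
print(overall.tolist())
```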
I'm not sure if this is the best way, but I came up with this solution. I hope it helps:
import pandas as pd
data = {
"A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame(data)
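def replecate(df, number, replacement):
    i = 1  # stays 1 until the first occurrence of `number` has been seen
    for column in df.columns:
        for index, value in enumerate(df[column]):
            if i == 1 and value == number:
                i = 0  # first occurrence: keep it
            elif value == number and i != 1:
                # later occurrence: overwrite it, without chained assignment
                df.loc[df.index[index], column] = replacement
        i = 1  # reset the flag for the next column
    return df
replecate(df, 2 , 0)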
Output
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
I've managed a solution to this problem by shifting the column down by one row and checking whether the values align. I also included a function that can take multiple values to check for (not just 2).
import pandas as pd
data = {
"A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame(data)
def replace_recurring(df,key,offset=1,values=[2]):
    df['offset'] = df[key].shift(offset)
    df.loc[(df[key]==df['offset']) & (df[key].isin(values)),key] = 0
    df = df.drop(['offset'],axis=1)
    return df
df = replace_recurring(df,'A',offset=1,values=[2])
Giving the output:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
I have a column A where 1 means signal on and a column B where 1 means signal off; all other values are zero.
data = {'A': [1, 0, 0, 0, 0, 1, 0],
'B': [1, 0, 1, 1, 0, 0, 1]}
df = pd.DataFrame.from_dict(data)
I need to create a column C where:
when A == 1 (whether B is 0 or 1), C = 1
C stays 1 until B == 1, then C = 0
Here is what the result should be:
df['C'] = [1, 1, 0, 0, 0, 1, 0]
I used
df.loc[df['A'] == 1, 'C'] = 1
to set C to 1 in the rows where A == 1, but I cannot find a way to get the first non-zero value in column B after the 1 signal in A, and set the following rows to zero until the next 1 in A.
You can use mask together with transform('idxmax'). The mask sets B to 0 wherever A equals 1, since C will be 1 there no matter what the value of B is.
df['C']=(df.index<df.B.mask(df.A.eq(1),0).groupby(df.A.cumsum()).transform('idxmax')).astype(int)
df
A B C
0 1 1 1
1 0 0 1
2 0 1 0
3 0 1 0
4 0 0 0
5 1 0 1
6 0 1 0
Update
s=df.B.mask(df.A.eq(1),0)
s=(s==1)&(s.shift(-1)==0)
df['C']=(df.index<s.groupby(df.A.cumsum()).transform('idxmax')).astype(int)
df.loc[df.A==1,'C']=1
Hello and welcome to Stack Overflow.
This is a case you usually wouldn't use pandas for, as the value of C depends on previous rows, and pandas is more about using "split-apply-combine" on independent measurements.
If it is not runtime-critical, I would probably write a plain old loop for this:
In [4]: C = []
   ...: signal = 0
   ...: for _, row in df.iterrows():
   ...:     if ((signal == 1) and (row.B == 1)):
   ...:         signal = 0
   ...:     elif(row.A == 1):
   ...:         signal = 1
   ...:     C.append(signal)
   ...:
In [5]: C
Out[5]: [1, 1, 0, 0, 0, 1, 0]
In [6]: df['C'] = C
In [7]: df
Out[7]:
A B C
0 1 1 1
1 0 0 1
2 0 1 0
3 0 1 0
4 0 0 0
5 1 0 1
6 0 1 0
This won't perform well, but IMHO it is worth it to cleanly express the intent of your code if it is still "fast enough".
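A slightly lighter variant of the same loop iterates over the two columns with zip instead of iterrows, which avoids building a Series for every row. This is only a sketch of the same state machine; the function name on_off_state is made up.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 0, 0, 0, 0, 1, 0],
                   "B": [1, 0, 1, 1, 0, 0, 1]})


def on_off_state(a_values, b_values):
    """Carry an on/off state down the rows: A switches it on, B switches it off."""
    state = 0
    out = []
    for a, b in zip(a_values, b_values):
        if state == 1 and b == 1:
            state = 0      # signal was on and B says "off"
        elif a == 1:
            state = 1      # A says "on"
        out.append(state)
    return out


df["C"] = on_off_state(df["A"], df["B"])
print(df)
```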
A solution based on iterrows (as proposed in one of the other answers)
may be too slow.
Define the following function, which computes the output signal for a group
of input rows (each group starting at a row with A == 1):
def signal(grp):
    return pd.Series(np.equal(np.where(grp.A == 1, 0, grp.B)
        .cumsum(), 0).astype(int), index=grp.index)
Then group df and apply this function:
df['C'] = df.groupby(df.A.cumsum()).apply(signal)\
    .reset_index(level=0, drop=True)
Edit
An even faster solution, without grouping, is:
sig = df.A.replace(0, np.nan)
sig.update(df.A.lt(df.B).astype(int).replace(0, np.nan) - 1)
df['C'] = sig.ffill().fillna(0).astype(int)
For a sample of 7000 rows (your data repeated 1000 times) the execution
time of this solution is 14 times shorter than the solution by YOBEN_S.
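The forward-fill idea may be easier to follow when the "on" and "off" events are written out explicitly. The sketch below is a restatement under that reading, not the answer's exact code; the name events is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 0, 0, 0, 0, 1, 0],
                   "B": [1, 0, 1, 1, 0, 0, 1]})

# NaN means "no event on this row": the state is carried over from above.
events = pd.Series(np.nan, index=df.index)
events[df["A"].eq(1)] = 1.0                     # "on" event (A wins over B)
events[df["A"].eq(0) & df["B"].eq(1)] = 0.0     # "off" event

# Propagate the last event downwards; rows before any event count as off.
df["C"] = events.ffill().fillna(0).astype(int)
print(df)
```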
Essentially, as the title suggests, I want to convert consecutive duplicates of True to False, keeping only the last one in each run.
For example, say I have an array of 0s and 1s:
x = pd.Series([1,0,0,1,1])
should become:
y = pd.Series([0,0,0,0,1])
# where the 1st element of x becomes 0 since it is not part of a consecutive run
# and the 4th element becomes 0 because it is the first instance of the consecutive duplicate
# And everything else should remain the same.
This also applies to runs of more than two. Say I have a much longer array, e.g.:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
becomes:
y = pd.Series([0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1])
The posts I have found mostly delete consecutive duplicates and do not retain the original length. In this case, the original length should be retained.
It is something like the following code:
for i in range(len(x)):
    if x[i] == x[i+1]:
        x[i] = True
    else:
        x[i] = False
but this never finishes, and it does not accommodate runs of more than two.
Pandas solution: create a Series, then build groups of consecutive values with shift and cumsum, and keep only the last 1 in each run of duplicates using Series.duplicated:
s = pd.Series(x)
g = s.ne(s.shift()).cumsum()
s1 = (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
print (s1.tolist())
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
EDIT:
For multiple columns, use a function:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a':x, 'b':x})
def f(s):
    g = s.ne(s.shift()).cumsum()
    return (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
df = df.apply(f)
print (df)
a b
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 1 1
6 0 0
7 0 0
8 1 1
9 0 0
10 0 0
11 0 0
12 0 0
13 1 1
14 0 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 1 1
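An alternative way to express "last 1 of a run of at least two" is to look at both neighbours with shift: the element is 1, the previous element is 1, and the next element is not. This is a sketch of that idea rather than the answer's method; the names prev_is_one and next_is_one are illustrative.

```python
import pandas as pd

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])

# Neighbours outside the Series are treated as 0.
prev_is_one = x.shift(1, fill_value=0).eq(1)
next_is_one = x.shift(-1, fill_value=0).eq(1)

# Last element of a run of 1s that is at least two long.
y = (x.eq(1) & prev_is_one & ~next_is_one).astype(int)
print(y.tolist())
```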
Vanilla Python :
x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
counter = 0
for i, e in enumerate(x):
    if not e:
        counter = 0
        continue
    if not counter or (i < len(x) - 1 and x[i+1]):
        counter += 1
        x[i] = 0
print(x)
Prints :
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]