Create a New Column after every for loop iteration
proba = [12, 65, 1, 54]
tau = []
for i in range(len(proba)):
    for j in range(len(proba)):
        if proba[j] >= proba[i]:
            tau.append(1)
        else:
            tau.append(0)
print(tau)
I'm getting output like this:
[1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]
But the output I need looks like this:
proba  tau1  tau2  tau3  tau4
   12     1     0     1     0
   65     1     1     1     1
    1     0     0     1     0
   54     1     0     1     1
We can also use pandas and numpy to make the code more generic.
You could use a combination of pandas and numpy:
import numpy as np
import pandas as pd

proba = np.array([12, 65, 1, 54])
df = pd.DataFrame(proba, columns=['proba'])
for i in range(len(proba)):
    df = pd.concat([df, pd.Series(proba >= proba[i], name=f'tau{i}').astype(int)], axis=1)
Output:
   proba  tau0  tau1  tau2  tau3
0     12     1     0     1     0
1     65     1     1     1     1
2      1     0     0     1     0
3     54     1     0     1     1
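If you want to avoid the Python loop entirely, NumPy broadcasting can build the whole 0/1 comparison matrix in one step and hand it to pandas. This is a minimal sketch along the same lines, not part of the answer above; the column names tau0..tau3 match the output above:
import numpy as np
import pandas as pd

proba = np.array([12, 65, 1, 54])

# Entry [j, i] of the matrix is 1 when proba[j] >= proba[i],
# so column i reproduces the loop's tau{i} column.
tau = (proba[:, None] >= proba[None, :]).astype(int)

df = pd.DataFrame(tau, columns=[f'tau{i}' for i in range(len(proba))])
df.insert(0, 'proba', proba)
print(df)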
Built-in data structures such as dictionaries and lists also serve well for creating DataFrames:
import pandas as pd

proba = [12, 65, 1, 54]
taus = {}
for idx, i in enumerate(proba):
    vals = []
    for j in proba:
        if j >= i:
            vals.append(1)
        else:
            vals.append(0)
    taus[f"tau{idx}"] = vals

df = pd.DataFrame(taus)
df["proba"] = proba
I have a binary numpy array, mostly zero-valued, and I want to fill the gaps between the non-zero values with a given value, but in an alternating way.
For example:
[0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0]
should result in either
[0,0,1,1,1,1,1,1,0,0,1,1,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,0,0]
or
[1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,0,0,1,1,1]
The idea is: while scanning the array left to right, fill the 0 values with 1 up to the next 1, if you didn't do so up to the previous 1.
I can do this iteratively in this way:
import numpy as np
from itertools import zip_longest

A = np.array([0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0])
ones_index = np.where(A == 1)[0]
begins = ones_index[::2]   # beginnings of filling sections
ends = ones_index[1::2]    # ends of filling sections

# fill those sections
for begin, end in zip_longest(begins, ends, fillvalue=len(A)):
    A[begin:end] = 1
but I'm looking for a more efficient solution, maybe with numpy broadcasting. Any ideas?
One nice answer to this question is that we can produce the first result via np.logical_xor.accumulate(arr) | arr and the second via ~np.logical_xor.accumulate(arr) | arr. The accumulated XOR toggles at every 1, so it is true from each odd-numbered 1 up to (but not including) the following even-numbered 1, and OR-ing with the original array restores those closing 1s. A quick demonstration:
A = np.array([0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0])
print(np.logical_xor.accumulate(A) | A)
print(~np.logical_xor.accumulate(A) | A)
The resulting output:
[0 0 1 1 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 0 0]
[1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1]
Another option for the first variant uses the parity of the running count of ones: a position gets filled wherever the cumulative sum of arr is odd, and the closing 1 of each section is kept by arr itself.
np.where(arr.cumsum() % 2 == 1, 1, arr)
# array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,
#        0, 0, 1, 1, 1, 1, 0, 0])
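For the second fill pattern, the same parity idea should work with the condition flipped; this is a small sketch under that reading, not taken from the answers above:
import numpy as np

A = np.array([0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0])

# An even running count of 1s marks the regions that are filled in the second variant.
print(np.where(A.cumsum() % 2 == 0, 1, A))
# [1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1]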
The problem description is simple, but I cannot figure out how to make this work in Pandas. Basically, I'm trying to replace consecutive values (except the first) with some replacement value. For example:
import pandas as pd

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame.from_dict(data)
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 2
10 2
11 2
12 3
If I run this through some function foo(df, 2, 0) I would get the following:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
Which replaces all values of 2 with 0, except for the first one. Is this possible?
You can find all the rows where A = 2 and A is also equal to the previous A value and set them to 0:
data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame.from_dict(data)
df[(df.A == 2) & (df.A == df.A.shift(1))] = 0
Output:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
If you have more than one column in the dataframe, use df.loc to just set the A values:
df.loc[(df.A == 2) & (df.A == df.A.shift(1)), 'A'] = 0
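Wrapped into the foo(df, 2, 0) interface asked for in the question, the same shift idea could look like the sketch below; note that the column name 'A' is hardcoded here as an assumption:
import pandas as pd

def foo(df, val, repl):
    # Replace each occurrence of `val` that directly follows another `val` in column 'A'.
    out = df.copy()
    mask = (out['A'] == val) & (out['A'] == out['A'].shift(1))
    out.loc[mask, 'A'] = repl
    return out

data = {"A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]}
print(foo(pd.DataFrame(data), 2, 0))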
Try this if 'A' is duplicated further down the DataFrame and is monotonically increasing:
def foo(df, val=2, repl=0):
    return df.mask((df.groupby('A').transform('cumcount') > 0) & (df['A'] == val), repl)

foo(df, 2, 0)
Output:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
I'm not sure if this is the best way, but I came up with this solution; I hope it's helpful:
import pandas as pd

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame(data)

def replecate(df, number, replacement):
    i = 1
    for column in df.columns:
        for index, value in enumerate(df[column]):
            if i == 1 and value == number:
                i = 0
            elif value == number and i != 1:
                df[column][index] = replacement
        i = 1
    return df

replecate(df, 2, 0)
Output
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
I managed a solution to this problem by shifting the column down by one and checking whether the values align. I've also included a function which can take multiple values to check for (not just 2).
import pandas as pd

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame(data)

def replace_recurring(df, key, offset=1, values=[2]):
    df['offset'] = df[key].shift(offset)
    df.loc[(df[key] == df['offset']) & (df[key].isin(values)), key] = 0
    df = df.drop(['offset'], axis=1)
    return df

df = replace_recurring(df, 'A', offset=1, values=[2])
Giving the output:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
Essentially, I want to convert consecutive duplicate Trues to False, as the title suggests.
For example, say, i have an array of 0s and 1s
x = pd.Series([1,0,0,1,1])
should become:
y = pd.Series([0,0,0,0,1])
# where the 1st element of x becomes 0 since it's not part of a consecutive run,
# and the 4th element becomes 0 because it's the first instance of the consecutive duplicate.
# Everything else should remain the same.
This should also apply to runs of more than two. For example, say I have a much longer array:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
becomes;
y = pd.Series([0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1])
The posts I have found mostly deal with deleting consecutive duplicates, which does not retain the original length. In this case, the original length should be retained.
My attempt is something like the following code:
for i in range(len(x)):
    if x[i] == x[i+1]:
        x[i] = True
    else:
        x[i] = False
but this gives me a never-ending run, and it does not accommodate runs of more than two.
Pandas solution: create a Series, then build consecutive groups with shift and cumsum, and keep only the last 1 of each run of duplicates using Series.duplicated:
s = pd.Series(x)
g = s.ne(s.shift()).cumsum()
s1 = (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
print (s1.tolist())
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
EDIT:
For multiple columns, use a function:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a':x, 'b':x})
def f(s):
    g = s.ne(s.shift()).cumsum()
    return (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)

df = df.apply(f)
print(df)
a b
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 1 1
6 0 0
7 0 0
8 1 1
9 0 0
10 0 0
11 0 0
12 0 0
13 1 1
14 0 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 1 1
Vanilla Python:
x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
counter = 0
for i, e in enumerate(x):
    if not e:
        counter = 0
        continue
    if not counter or (i < len(x) - 1 and x[i+1]):
        counter += 1
        x[i] = 0
print(x)
Prints:
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
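If the rule is strictly "keep a 1 only where it closes a run of at least two", a shift-only variant is another option. This is a sketch under that assumption, not taken from the answers above:
import pandas as pd

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])

# A 1 survives only if the previous value is also 1 (run length >= 2)
# and the next value is not 1 (it is the last element of its run).
y = (x.eq(1) & x.shift(1, fill_value=0).eq(1) & x.shift(-1, fill_value=0).ne(1)).astype(int)
print(y.tolist())
# [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]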
I have two large matrices (1800 rows by 1800 columns), epeq and triax, whose columns look like this:
epeq=
0
1
1
2
1
0
3
3
1
1
0
2
1
1
1
triax=
-1
1
3
1
-2
-3
-1
1
2
3
2
1
-1
-3
-1
1
As you can see, the triax columns have cycles of positive and negative elements. I want the cumulative sum of epeq at the beginning of each cycle in triax, and that value should stay constant during the cycle, like this:
epeq_cr=
0
1
1
1
1
1
1
11
11
11
11
11
11
11
11
17
and then apply this procedure to all columns of the epeq matrix. I have this code, but something is missing:
epeq_cr = np.copy(epeq)
for g in range(1, len(epeq_cr)):
    for h in range(len(epeq_cr[g])):
        if (triax[g-1][h] < 0 and triax[g][h] > 0):
            epeq_cr[g][h] = np.cumsum()...
I've run out of time to look at this now but I'd start by figuring out where the cycles start in the triax:
import numpy as np

epeq = np.array([1, 1, 2, 1, 0, 3, 3, 1, 1, 0, 2, 1, 1, 1])
triax = np.array([-1, 1, 3, 1, -2, -3, -1, 1, 2, 3, 2, 1, -1, -3, -1, 1])

t_shift = np.roll(triax, 1)
t_shift[0] = 0
cycle_starts = np.argwhere((triax > 0) & (t_shift < 0)).flatten()
array([ 1, 7, 15])
So, for any position i in epeq_cr, you need to find the largest value in cycle_starts that is not greater than i and take the running sum of epeq up to and including that position.
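A vectorized per-column sketch of that idea, assuming epeq and triax columns of equal length (the helper name cycle_hold and the forward fill via np.maximum.accumulate are my additions, not from the answer):
import numpy as np

def cycle_hold(epeq_col, triax_col):
    # Running sum of epeq for this column.
    c = np.cumsum(epeq_col)
    # A cycle starts where triax goes from negative to positive (never at index 0).
    is_start = np.zeros(len(triax_col), dtype=bool)
    is_start[1:] = (triax_col[:-1] < 0) & (triax_col[1:] > 0)
    # Index of the most recent cycle start at or before each position (-1 if none yet).
    last_start = np.maximum.accumulate(np.where(is_start, np.arange(len(triax_col)), -1))
    # Hold the running sum sampled at that start; before the first start keep epeq_col[0].
    return np.where(last_start >= 0, c[np.maximum(last_start, 0)], epeq_col[0])

For comparison, the explicit double loop below works element by element over all columns: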
epeq_cr = np.copy(epeq)
for g in range(1, len(epeq_cr)):
    for h in range(len(epeq_cr[g])):
        if (triax[g-1][h] <= 0 and triax[g][h] >= 0):
            epeq_cr[g][h] = sum(epeq[v][h] for v in range(g+1))
        else:
            epeq_cr[g][h] = epeq_cr[g-1][h]