Multiple conditional statements on list comprehension

This is my code. I want to know whether a list comprehension can do the same thing: number the clusters row by row and output a list of length df.shape[0]. Each cluster number spans at least two consecutive rows (possibly more), and the cluster numbers cycle back to 0. I tried but couldn't figure it out.
Any suggestions?
My code:
import pandas as pd
cluster_global = 0
cluster_relativo = 0
cluster_index = []
for index, row in df.iterrows():
    if row['cluster'] == cluster_relativo:
        cluster_index.append(cluster_global)
    elif row['cluster'] == (cluster_relativo + 1):
        cluster_global += 1
        cluster_relativo += 1
        cluster_index.append(cluster_global)
    elif row['cluster'] == 0:
        cluster_global += 1
        cluster_relativo = 0
        cluster_index.append(cluster_global)
The DataFrame looks like:
index  cluster
0      0
1      0
2      1
3      1
4      1
5      2
6      2
7      0
8      0
...    ...
n      m<40

Do you want this?
from itertools import groupby
result = [0 if index == 0 and key == 0
          else index
          for index, (key, group) in enumerate(groupby(my_values))
          for _ in group
          ]
print(result)
To test, replace my_values in the list comprehension with df['cluster'].values.
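If a pandas-only route is acceptable, the same numbering can probably be produced without an explicit loop by counting how often the cluster value changes from one row to the next. This is a minimal sketch, assuming the rule in the question reduces to "start a new global index whenever the value differs from the previous row". The sample values come from the table in the question, and the column name cluster_index is made up for illustration.

```python
import pandas as pd

# Sample values copied from the table in the question
df = pd.DataFrame({"cluster": [0, 0, 1, 1, 1, 2, 2, 0, 0]})

# True wherever a row's cluster differs from the previous row
# (the first row always counts as a change because shift() yields NaN there)
changed = df["cluster"].ne(df["cluster"].shift())

# Running count of changes, shifted so the first run gets index 0
df["cluster_index"] = changed.cumsum() - 1  # hypothetical column name

print(df["cluster_index"].tolist())
```

Unlike the original loop, this sketch also assigns an index when the cluster value jumps by more than one, so the two only agree on data that follows the pattern described in the question.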

Related

Python inner loop and outer loop with counter to iterate over list

I have a specific issue: I want the inner loop to execute three times for each run of equal values, and then have the outer for loop continue processing the rest of the list:
strings = ['A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B',\
           'A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B']
i=0
for string in strings:
    global i
    if string == 'A':
        while i < 3:
            print(string, i)
            i+=1
            if i==3: continue
    elif string== 'B':
        while i < 3:
            print(string,i)
            i+=1
            if i==3: continue
    # print(string)
Current result:
A 0
A 1
A 2
Expected result: once the inner loop completes, the outer loop should continue through the list and start counting again at the next group:
A 0
A 1
A 2
B 0
B 1
B 2
A 0
A 1
A 2
B 0
B 1
B 2

If I understand the logic correctly, you could use itertools.groupby to help you form the groups:
variant #1
from itertools import groupby
MAX = 3
for k,g in groupby(strings):
    for i in range(min(len(list(g)), MAX)):
        print(f'{k} {i}')
    print()
variant #2
from itertools import groupby
MAX = 3
for k,g in groupby(strings):
    for i,_ in enumerate(g):
        if i >= MAX:
            break
        print(f'{k} {i}')
    print()
output:
A 0
A 1
A 2
B 0
B 1
B 2
A 0
A 1
A 2
B 0
B 1
B 2
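variant #3: without import
# note: the first item of each run only sets prev, so a run needs
# MAX + 1 items to print MAX lines
prev = None
count = 0
MAX = 3
for s in strings:
    if s == prev:
        if count < MAX:
            print(f'{s} {count}')
            count += 1
    elif prev:
        count = 0
        print()
    prev = s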
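Another way to express variant #2 is to let itertools.islice cut each group off after MAX items, which removes the explicit break. This is a sketch along the same lines as the variants above, not a different algorithm; the lines list is introduced here only so the result can be inspected.

```python
from itertools import groupby, islice

strings = ['A'] * 10 + ['B'] * 10 + ['A'] * 10 + ['B'] * 10
MAX = 3

lines = []  # collected output, one entry per printed line
for key, group in groupby(strings):
    # islice stops after MAX items; groupby skips the rest of the run
    # on its own when the outer loop asks for the next group
    for i, _ in enumerate(islice(group, MAX)):
        lines.append(f'{key} {i}')

print('\n'.join(lines))
```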

how to merge rows summing over each column without iteration

I have the following df:
   prevent  _p  _n    _id
0        1   0   0  83135
0        0   1   0  83135
0        0   1   0  82238
I would like to merge all rows that have the same _id by summing over each column, giving the desired output in a dataframe, final (note that if the sum is greater than 1, the value should just be 1):
   prevent  _p  _n    _id
0        1   1   0  83135
0        0   1   0  82238
I can easily do this using the following code iterating over the dataframe:
final = pd.DataFrame()
for id_ in _ids:
    out = df[df._id == id_]
    prevent = 0
    _p = 0
    _n = 0
    d = {}
    if len(out) > 0:
        for row in out.itertuples():
            if prevent == 0:
                prevent += row.prevent
            if _p == 0:
                _p += row._p
            if _n == 0:
                _n += row._n
        d['_p'] = _p
        d['_n'] = _n
        d['prevent'] = prevent
        t=pd.DataFrame([d])
        t['_id'] = id_
        final=pd.concat([final, t])
I have several hundred thousand rows, so this will be very inefficient. Is there a way to vectorize this?

Treat 0 and 1 as boolean with any, then convert them back to integers:
df.groupby("_id").any().astype("int").reset_index()

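Check groupby (a plain sum does not cap values at 1, so clip the summed columns afterwards if a column can contain more than one 1 per _id):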
out = df.groupby('_id',as_index=False).sum()
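For 0/1 columns, "sum capped at 1" is the same as taking the maximum per group, so the capping requirement from the question can also be met with max. The following is a minimal sketch on the three sample rows from the question; sort=False is used on the assumption that the first-seen order of _id should be kept, as in the desired output.

```python
import pandas as pd

# The three sample rows from the question
df = pd.DataFrame({
    "prevent": [1, 0, 0],
    "_p": [0, 1, 1],
    "_n": [0, 0, 0],
    "_id": [83135, 83135, 82238],
})

# For 0/1 columns, the per-group maximum equals the per-group sum capped at 1.
# sort=False keeps the groups in the order their _id first appears.
final = df.groupby("_id", sort=False, as_index=False).max()

# Restore the original column order (groupby puts the key column first)
final = final[["prevent", "_p", "_n", "_id"]]

print(final)
```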

Vectorized function with counter on pandas dataframe column

Consider this pandas dataframe where the condition column is 1 when value is below 5 (any threshold).
import pandas as pd
d = {'value': [30,100,4,0,80,0,1,4,70,70],'condition':[0,0,1,1,0,1,1,1,0,0]}
df = pd.DataFrame(data=d)
df
Out[1]:
   value  condition
0     30          0
1    100          0
2      4          1
3      0          1
4     80          0
5      0          1
6      1          1
7      4          1
8     70          0
9     70          0
What I want is for all consecutive values below 5 to share the same id, and for all other values to get 0 (or NA or a negative value; it doesn't matter, as long as they are all the same). I want to create a new column called new_id that contains these cumulative ids as follows:
   value  condition  new_id
0     30          0       0
1    100          0       0
2      4          1       1
3      0          1       1
4     80          0       0
5      0          1       2
6      1          1       2
7      4          1       2
8     70          0       0
9     70          0       0
In a very inefficient for loop I would do this (which works):
counter = 1  # next id to hand out
for i in range(0,df.shape[0]):
    if (df.loc[df.index[i],'condition'] == 1) & (df.loc[df.index[i-1],'condition']==0):
        new_id = counter # assign new id
        counter += 1
    elif (df.loc[df.index[i],'condition']==1) & (df.loc[df.index[i-1],'condition']!=0):
        new_id = counter-1 # assign current id
    elif (df.loc[df.index[i],'condition']==0):
        new_id = df.loc[df.index[i],'condition'] # assign 0
    df.loc[df.index[i],'new_id'] = new_id
df
But this is very inefficient and I have a very big dataset. Therefore I tried different kinds of vectorization, but so far I have failed to keep it from counting up inside each "cluster" of consecutive points:
# First try using cumsum():
df['new_id'] = 0
df['new_id_temp'] = ((df['condition'] == 1)).astype(int).cumsum()
df.loc[(df['condition'] == 1), 'new_id'] = df['new_id_temp']
df[['value', 'condition', 'new_id']]
# Another try using list comprehension but this just does +1:
[row+1 for ind, row in enumerate(df['condition']) if (row != row-1)]
I also tried using apply() with a custom if else function but it seems like this does not allow me to use a counter.
There is already a ton of similar posts about this but none of them keep the same id for consecutive rows.
Example posts are:
Maintain count in python list comprehension
Pandas cumsum on a separate column condition
Python - keeping counter inside list comprehension
python pandas conditional cumulative sum
Conditional count of cumulative sum Dataframe - Loop through columns

You can use cumsum(), as you did in your first try; just modify it a bit:
# calculate delta: +1 where a run of 1s starts, -1 where it ends (NaN in the first row)
df['delta'] = df['condition']-df['condition'].shift(1)
# get rid of -1 for the cumsum (replace it by 0)
df['delta'] = df['delta'].replace(-1,0)
# cumulative sum conditional: multiply with condition column
df['cumsum_x'] = df['delta'].cumsum()*df['condition']
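A variation on the same cumsum idea, sketched here under the assumption that the row before the first one should be treated as 0: filling the shifted column with 0 avoids the NaN in the first row and also counts a run that starts at the very top of the frame. The data is the condition column from the question.

```python
import pandas as pd

# condition column from the question
df = pd.DataFrame({"condition": [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]})

# Previous row's condition; fill_value=0 treats the row before the first as 0
prev = df["condition"].shift(fill_value=0)

# A run of 1s starts where the current row is 1 and the previous row is 0
starts = df["condition"].eq(1) & prev.eq(0)

# Count the run starts, then zero out the rows that are outside any run
df["new_id"] = starts.cumsum() * df["condition"]

print(df["new_id"].tolist())
```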

Welcome to SO! Why not just rely on base Python for this?
def counter_func(l):
    new_id = [0] # First value is zero in any case
    counter = 0
    for i in range(1, len(l)):
        if l[i] == 0:
            new_id.append(0)
        elif l[i] == 1 and l[i-1] == 0:
            counter += 1
            new_id.append(counter)
        elif l[i] == l[i-1] == 1:
            new_id.append(counter)
        else: new_id.append(None)
    return new_id
df["new_id"] = counter_func(df["condition"])
Looks like this:
   value  condition  new_id
0     30          0       0
1    100          0       0
2      4          1       1
3      0          1       1
4     80          0       0
5      0          1       2
6      1          1       2
7      4          1       2
8     70          0       0
9     70          0       0
Edit:
You can also use numba, which sped up the function quite a lot for me: from about 1 s to ~60 ms.
You need to pass NumPy arrays to the function, which means using df["condition"].values.
from numba import njit
import numpy as np
@njit
def func(arr):
    res = np.empty(arr.shape[0])
    counter = 0
    res[0] = 0 # First value is zero anyway
    for i in range(1, arr.shape[0]):
        if arr[i] == 0:
            res[i] = 0
        elif arr[i] and arr[i-1] == 0:
            counter += 1
            res[i] = counter
        elif arr[i] == arr[i-1] == 1:
            res[i] = counter
        else: res[i] = np.nan
    return res
df["new_id"] = func(df["condition"].values)

Looping over a pandas column and creating a new column if it meets conditions

I have a pandas dataframe and I want to loop over the last column "n" times based on a condition.
import random
import pandas as pd
p = 0.5
df = pd.DataFrame()
start = []
for i in range(5):
    if random.random() < p:
        start.append("0")
    else:
        start.append("1")
df['start'] = start
print(df['start'])
Essentially, I want to loop over the final column "n" times and if the value is 0, change it to 1 with probability p so the results become the new final column. (I am simulating on-off every time unit with probability p).
e.g. after one iteration, the dataframe would look something like:
0 0
0 1
1 1
0 0
0 1
after two:
0 0 1
0 1 1
1 1 1
0 0 0
0 1 1
What is the best way to do this?
Sorry if I am asking this wrong, I have been trying to google for a solution for hours and coming up empty.

Like this: append columns named 1, 2, ...
# continue from question code ...
# colname is 1, 2, ...
for col in range(1, 5):
    tmp = []
    for i in range(5):
        # check final col
        if df.iloc[i, col-1] == "0":
            if random.random() < p:
                tmp.append("0")
            else:
                tmp.append("1")
        else: # == 1
            tmp.append("1")
    # append new col
    df[str(col)] = tmp
print(df)
# initial
s
0 0
1 1
2 0
3 0
4 0
# result
s 1 2 3 4
0 0 0 1 1 1
1 0 0 0 0 1
2 0 0 1 1 1
3 1 1 1 1 1
4 0 0 0 0 0
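The same simulation can probably be written with one random draw per row and step instead of a Python loop over the rows. The sketch below is an illustration under a few assumptions that are not in the original: states are stored as integers 0/1 rather than the strings "0"/"1", an "off" row turns on with probability p (as described in the question), the starting column is drawn the same way, and the names states, n_rows and n_steps are made up.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable
p = 0.5
n_rows, n_steps = 5, 4  # hypothetical sizes matching the 5x5 example above

# states[:, 0] is the starting column; 1 means "on", 0 means "off"
states = np.zeros((n_rows, n_steps + 1), dtype=int)
states[:, 0] = rng.random(n_rows) < p

for step in range(1, n_steps + 1):
    switch_on = rng.random(n_rows) < p  # one draw per row
    # a row that is already on stays on; an off row turns on with probability p
    states[:, step] = states[:, step - 1] | switch_on

print(states)
```

If a DataFrame is needed afterwards, pd.DataFrame(states) would wrap the array with columns 0, 1, 2, ...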

Python DataFrame Accumulator Based on Flag

I have a logic-driven flag column and I need to create a column that increments by 1 when the flag is true and decrements by 1 when the flag is false down to a floor of zero.
I've tried a few different methods and I can't get the Accumulator 'shift' to reference the new value created by the process. I know the method below wouldn't stop at zero anyway, but I was just trying to work through the concept before and this is the most to-the-point example to explain the goal. Do I need a for loop to iterate line-by-line?
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(2,size=10), columns=['flag'])
df['accum'] = 0
df['accum'] = np.where(df['flag'] == 1, df['accum'].shift(1) + 1, df['accum'].shift(1) - 1)
df['dOutput'] = [1,0,1,2,1,2,3,2,1,0] #desired output
df

As far as I know, there's no numpy or pandas vectorized operation to do this, so, you should iterate line-by-line:
def cumsum_with_floor(series):
    acc = 0
    output = []
    accum_list = []
    for val in series:
        val = 1 if val else -1
        acc += val
        accum_list.append(val)
        acc = acc if acc > 0 else 0
        output.append(acc)
    return pd.Series(output, index=series.index), pd.Series(accum_list, index=series.index)
series = pd.Series([1,0,1,1,0,0,0,1])
dOutput, accum = cumsum_with_floor(series)
dOutput
Out:
0 1
1 0
2 1
3 2
4 1
5 0
6 0
7 1
dtype: int64
accum # shifted one step forward compared with your example
Out:
0 1
1 -1
2 1
3 1
4 -1
5 -1
6 -1
7 1
dtype: int64
But maybe somebody knows a suitable combination of Series.clip and Series.cumsum, or of other vectorized operations.
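A vectorized combination does seem possible for this particular case. A running total that starts at zero and is floored at zero equals the plain cumulative sum minus the lowest value that sum has reached so far (counting only values below zero). The sketch below applies that identity with NumPy; the function name cumsum_with_floor_vectorized is made up, and it assumes the accumulator starts at 0 and the floor is exactly 0.

```python
import numpy as np
import pandas as pd

def cumsum_with_floor_vectorized(flags):
    """Running +1/-1 total that never drops below zero (hypothetical helper)."""
    # +1 where the flag is set, -1 otherwise
    steps = np.where(flags.to_numpy() == 1, 1, -1)
    total = np.cumsum(steps)
    # Lowest point of the running total so far, capped at zero from above
    lowest = np.minimum(np.minimum.accumulate(total), 0)
    # Subtracting the lowest point lifts the total back onto the floor
    return pd.Series(total - lowest, index=flags.index)

series = pd.Series([1, 0, 1, 1, 0, 0, 0, 1])
print(cumsum_with_floor_vectorized(series).tolist())
```

On the example series used in the loop version this gives the same values as dOutput there.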
