Python checking if row is empty after slicing

I'm working on a program that pulls data from an Excel file and enters it into a calendar. In the Excel file, dates are on the y-axis. I'm iterating cell by cell across each row, but if the entire row is empty (aside from the date), I want to perform a different action than if just some of the cells in the row were empty. I cannot reference the header names directly, as they will not always be the same.
In the example below, for row 1 I'm iterating over 0, 2, nan, 4, nan.
For row 2, I want to print('empty row') before moving on to row 3.
Date  bx1  bx2  bx3  bx4  bx5
1     0    2         4
2
3     0    1    2    3    4
4     1    2    3    4    5
5
6
7     0    1    2    3    4
I've tried this:

if pd.isnull(m):
    print('emptyrow')

and this:

if pd.isna(df[1]):
    print('empty row')
Here's code for context:
layout = [[sg.In(key='-CAL-', enable_events=True, visible=False),
           sg.CalendarButton('Calendar', target='-CAL-', pad=None, font=('MS Sans Serif', 10, 'bold'),
                             button_color=('red', 'white'), format='%m/%d/%Y')],
          [sg.OK(), sg.Cancel()]]
window = sg.Window('Data Collector', layout, grab_anywhere=False, size=(400, 280), return_keyboard_events=True,
                   finalize=True)
event, values = window.read()
adate = values['-CAL-']
stu = values[0]
window.close()
df = pd.read_excel('C:\\Users\\aelfont\\Documents\\python_date_test.xlsx', sheet_name=0, header=None)
x = len(df.columns)  # length of bx
z = 1  # used to determine when at end of row
b = 1  # location of column to start summing
c = len(df.index)  # number of days in the month
r = 1  # used to stop once last day of month reached
y = 1
# while date < last day in month, do action
# while the above is true, enter data until end of row
# once at end of row, submit and move to next row
while y < c:
    while z < x:
        n = int(values['-CAL-'][3:5])
        m = df.iloc[n, b]
        z = z + 1
        b = b + 1
        if pd.isnull(m):
            ActionChains(browser) \
                .send_keys(Keys.TAB) \
                .perform()
            continue
        else:
            ActionChains(browser) \
                .send_keys(str(m)) \
                .perform()
        if z == x:
            z = 1
            b = 1
            n = n + 1
            y = y + 1
            time.sleep(5)
            yes = browser.find_element_by_css_selector('button.publishBottom:nth-child(2)')
            time.sleep(5)
            yes.click()
        else:
            ActionChains(browser) \
                .send_keys(Keys.TAB) \
                .perform()
    if y == c:
        break
    if pd.isnull(m):
        print('emptyrow')

Can you try the following:

if pd.isnull(m).all():
    print('emptyrow')
Full Code:

df = df.set_index('Date')
print(df)
for ind, row in df.iterrows():
    print(pd.isnull(row).all())
Output:
bx1 bx2 bx3 bx4 bx5
Date
1 0.0 2.0 4.0 NaN NaN
2 NaN NaN NaN NaN NaN
3 0.0 1.0 2.0 3.0 4.0
4 1.0 2.0 3.0 4.0 5.0
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 0.0 1.0 2.0 3.0 4.0
False
True
False
False
True
True
False
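If you need this check inside your original cell-by-cell loop, a minimal sketch of the combined logic might look like this (assuming the same frame with 'Date' as the index; the path is shortened and the browser actions are placeholders):

import pandas as pd

df = pd.read_excel('python_date_test.xlsx', sheet_name=0)  # placeholder path
df = df.set_index('Date')

for ind, row in df.iterrows():
    if pd.isnull(row).all():
        print('empty row')          # the whole row (aside from the date) is empty
        continue
    for m in row:                   # otherwise handle the row cell by cell
        if pd.isnull(m):
            pass                    # e.g. send Keys.TAB via ActionChains
        else:
            pass                    # e.g. send str(m) via ActionChains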

Related

Rolling and mode function to get the majority vote for rows in a pandas DataFrame

I have a pandas DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'Close': np.random.uniform(0, 100, size=10)})
lbound, ubound = 0, 1
change = df["Close"].diff()
df["Change"] = change
df["Result"] = np.select(
    [np.isclose(change, 1) | np.isclose(change, 0) | np.isclose(change, -1),
     # The other conditions
     (change > 0) & (change > ubound),
     (change < 0) & (change < lbound),
     change.between(lbound, ubound)],
    [0, 1, -1, 0])
Close Change Result
0 54.881350 NaN 0
1 71.518937 16.637586 1
2 60.276338 -11.242599 -1
3 54.488318 -5.788019 -1
4 42.365480 -12.122838 -1
5 64.589411 22.223931 1
6 43.758721 -20.830690 -1
7 89.177300 45.418579 1
8 96.366276 7.188976 1
9 38.344152 -58.022124 -1
Problem statement: now I want the majority vote over indexes 1,2,3,4 assigned to index 0, indexes 2,3,4,5 assigned to index 1 of the Result column, and so on for all subsequent indexes.
I tried:

df['Voting'] = df['Result'].rolling(window=4, min_periods=1).apply(lambda x: x.mode()[0]).shift()

But this doesn't give the result I intend: it takes the trailing rolling window of 4 and applies the mode function.
Close Change Result Voting
0 54.881350 NaN 0 NaN
1 71.518937 16.637586 1 0.0
2 60.276338 -11.242599 -1 0.0
3 54.488318 -5.788019 -1 -1.0
4 42.365480 -12.122838 -1 -1.0
5 64.589411 22.223931 1 -1.0
6 43.758721 -20.830690 -1 -1.0
7 89.177300 45.418579 1 -1.0
8 96.366276 7.188976 1 -1.0
9 38.344152 -58.022124 -1 1.0
Result I intend: a rolling window of 4 (indexes 1,2,3,4) should be set and the mode function applied, with the result assigned to index 0; then the next rolling window (indexes 2,3,4,5), with the result assigned to index 1; and so on.
You have to reverse your DataFrame, apply the rolling mode, and then shift by 1 (because you don't want the current index included in the result):

majority = lambda x: 0 if len((m := x.mode())) > 1 else m[0]
df['Voting'] = (df[::-1].rolling(4, min_periods=1)['Result']
                        .apply(majority).shift())
print(df)
# Output
Close Change Result Voting
0 54.881350 NaN 0 -1.0
1 71.518937 16.637586 1 -1.0
2 60.276338 -11.242599 -1 -1.0
3 54.488318 -5.788019 -1 0.0
4 42.365480 -12.122838 -1 1.0
5 64.589411 22.223931 1 0.0
6 43.758721 -20.830690 -1 1.0
7 89.177300 45.418579 1 0.0
8 96.366276 7.188976 1 -1.0
9 38.344152 -58.022124 -1 NaN
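As a sanity check (not part of the original answer), the same forward-looking majority can be computed with an explicit loop over the Result values; ties resolve to 0 and the last index, which has no following window, gets NaN:

import numpy as np

def forward_majority(result, window=4):
    # for each index i, take the mode of result[i+1 : i+1+window]
    out = []
    for i in range(len(result)):
        chunk = result[i + 1:i + 1 + window]
        if len(chunk) == 0:
            out.append(np.nan)      # nothing after the last index
            continue
        vals, counts = np.unique(chunk, return_counts=True)
        winners = vals[counts == counts.max()]
        out.append(winners[0] if len(winners) == 1 else 0)
    return out

df['Voting_check'] = forward_majority(df['Result'].to_numpy())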

How to insert n random consecutive and non-consecutive NaNs by month?

I have this df:
CODE DATE PP
0 000130 1991-01-01 0.0
1 000130 1991-01-02 1.0
2 000130 1991-01-03 2.0
3 000130 1991-01-04 2.0
4 000130 1991-01-05 1.1
... ... ...
10861 000142 2020-12-27 2.1
10862 000142 2020-12-28 2.2
10863 000142 2020-12-29 2.1
10864 000142 2020-12-30 0.4
10865 000142 2020-12-31 1.1
I want to have at least 3 consecutive NaNs and 5 non-consecutive NaNs in df['PP'] for each df['CODE'], within each corresponding df['DATE'].dt.year and df['DATE'].dt.month,
so I must convert random values of df['PP'] to NaN to reach those 3 consecutive and 5 non-consecutive NaNs. Expected result:
CODE DATE PP
0 000130 1991-01-01 0.0
1 000130 1991-01-02 NaN
2 000130 1991-01-03 NaN
3 000130 1991-01-04 NaN
4 000130 1991-01-05 1.1
5 000130 1991-01-06 2.1
6 000130 1991-01-07 NaN
7 000130 1991-01-08 2.1
8 000130 1991-01-09 0.4
9 000130 1991-01-10 NaN
... ... ... ...
Important: consecutive NaNs + alternate NaNs = 5, so I can have 3 consecutive NaNs per month inside the 5 NaNs. And if I already have n NaNs in a month, I should only add the difference needed to reach 5 NaNs. For example, if I already have 2 NaNs in a month, I should only add 3 consecutive NaNs. If I already have 5 NaNs in the month, the code should do nothing with that month.
I tried this:

df['PPNEW'] = df['PP'].groupby([df['CODE'], df['DATE'].dt.month]).sample(frac=0.984)

But I can't get the exact quantity of NaNs (only a percentage, and months sometimes have 30-31 days), and I can't get consecutive NaNs.
Would you mind helping me?
Thanks in advance.
This is not exactly beautiful code, but it does the job, assuming that there are no existing NaNs in your data:
import numpy as np
import pandas as pd

def add_nans(df, n_consecutive=3, n_alternate=5):
    seq = list(df["PP"].values)
    indexes = list(range(len(seq)))
    # place the consecutive run at a random position
    idx = np.random.randint(0, len(seq) - n_consecutive + 1)
    seq[idx : idx + n_consecutive] = ["nan"] * n_consecutive  # note: the string "nan", not np.nan
    # drop the run and its immediate neighbours from the candidate positions
    if 0 < idx < len(seq) - n_consecutive:
        indexes = indexes[:idx - 1] + indexes[idx + n_consecutive + 1:]
    elif idx == 0:
        indexes = indexes[n_consecutive + 1:]
    elif idx == len(seq) - n_consecutive:
        indexes = indexes[:idx - 1]
    # scatter the remaining NaNs, keeping them non-adjacent
    for i in range(n_alternate):
        choice = np.random.randint(0, len(indexes))
        idx = indexes.pop(choice)
        try:
            indexes.pop(choice)
        except IndexError:
            pass
        try:
            indexes.pop(choice - 1)
        except IndexError:
            pass
        seq[idx] = "nan"
    df["PP"] = seq
    return df
Here is the dataframe I tested this function on:
>>> df
CODE DATE PP
0 130 1991-01-01 0.0
1 130 1991-01-02 1.0
2 130 1991-01-03 2.0
3 130 1991-01-04 2.0
4 130 1991-01-05 1.1
5 142 2020-12-27 2.1
6 142 2020-12-28 2.2
7 142 2020-12-29 2.1
8 142 2020-12-30 0.4
9 142 2020-12-31 1.1
Here is the final result once you apply the function to each group:

>>> (df
...  .groupby(["CODE", df["DATE"].dt.month])
...  .apply(add_nans, n_consecutive=2, n_alternate=1))
CODE DATE PP
0 130 1991-01-01 nan
1 130 1991-01-02 nan
2 130 1991-01-03 2
3 130 1991-01-04 2
4 130 1991-01-05 nan
5 142 2020-12-27 nan
6 142 2020-12-28 2.2
7 142 2020-12-29 2.1
8 142 2020-12-30 nan
9 142 2020-12-31 nan
In your case n_consecutive = 3 and n_alternate = 2.
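So for the original data the call would look something like this (a sketch; the fixed seed is only an assumption so that repeated runs match):

import numpy as np

np.random.seed(42)  # assumption: seed chosen arbitrarily for reproducibility
out = (df
       .groupby(["CODE", df["DATE"].dt.month], group_keys=False)
       .apply(add_nans, n_consecutive=3, n_alternate=2))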
from random import randint
import pandas as pd
from numpy import NaN

def main():
    df = pd.read_csv('row_data.csv')
    df["DATE"] = pd.to_datetime(df['DATE'])
    df["Year"] = df['DATE'].dt.year
    df['Month'] = df["DATE"].dt.month
    final_df = pd.DataFrame(columns=['CODE', 'DATE', 'PP'])
    for year in df["Year"].unique().tolist():
        year_df = df[df["Year"] == year]
        for code in year_df["CODE"].unique().tolist():
            specific_code_df = year_df[year_df['CODE'] == code]
            for month in specific_code_df['Month'].unique().tolist():
                specific_by_month_df = specific_code_df[specific_code_df['Month'] == month]
                full_months = len(specific_by_month_df)
                if full_months < 28:
                    final_df = pd.concat([final_df, specific_by_month_df])
                    continue
                null_counts = specific_by_month_df['PP'].isna().sum()
                if null_counts == 0:
                    pp_column = list(specific_by_month_df['PP'])
                    for i in range(1, 4):
                        pp_column[i] = NaN
                    rand_index_1 = randint(5, (len(pp_column) // 2) - 1)
                    rand_index_2 = randint(len(pp_column) // 2, len(pp_column) - 1)
                    pp_column[rand_index_1] = NaN
                    pp_column[rand_index_2] = NaN
                    specific_by_month_df['PP'] = pp_column
                elif null_counts == 1:
                    pass
                elif null_counts == 2:
                    pass
                elif null_counts == 3:
                    pass
                elif null_counts == 4:
                    pass
                else:
                    pass
                final_df = pd.concat([final_df, specific_by_month_df])
    final_df.to_csv("final_df.csv", index=False)
This is not complete, but I just wanted to try it out.
Try the next one; it will work as long as the main conditions are not broken, that is, as long as there are no more than 3 consecutive NaNs and no more than 5 in total. If you want to fix those cases, use a bunch of ifs and it will do the job, but this is enough, I think. If you want to group only by 'CODE', just change df.groupby([df['CODE'], df['date'].dt.month]) to df.groupby(['CODE']). Let me know if it works ;)
from random import randrange
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
from pandas import Series

def rand(start, stop, *exclude):
    # random int in [start, stop) avoiding the excluded positions
    start_date = datetime.now()
    x = randrange(start, stop)
    exc = [0 if i < 0 else i for i in exclude]
    while x in exc:
        x = rand(start, stop)
        assert datetime.now() - start_date < timedelta(seconds=5), 'Took TOO long!!!'
    return x

def getPositions(L: list) -> list:
    # flatten a nested list of positions
    tot = []
    for x in L:
        if not isinstance(x, list):
            tot.append(x)
        else:
            tot += getPositions(x)
    return tot

def addNan(values: list, cons_nan: list = [], non_cons_nan: list = []) -> list:
    temp_non_cons = non_cons_nan[:]
    temp_cons = cons_nan[:]
    while len(temp_non_cons) + len(temp_cons) < 5:
        while len(temp_non_cons) < 2:
            exc = temp_non_cons[:]
            for i in temp_cons:
                exc += [i - 1, i, i + 1]
            temp_non_cons.append(rand(0, len(values), *exc))
        while len(temp_cons) < 3:
            exc = []
            for i in temp_non_cons:
                exc += [i - 1, i, i + 1]
            exc += temp_cons
            non_exc = []
            for i in temp_cons:
                non_exc += [i - 1, i, i + 1]
            if non_exc:
                for i in range(0, len(values)):
                    if i not in non_exc:
                        exc.append(i)
            try:
                temp_cons.append(rand(0, len(values), *exc))
            except AssertionError:
                break
        else:
            break
        temp_non_cons, temp_cons = cons_nan[:], non_cons_nan[:]
    for i in temp_non_cons + temp_cons:
        values[i] = np.nan
    return values

def countNan(values: Series, x: int = 0) -> list:
    # positions of the leading run of NaNs, counted from x
    if values.empty or not np.isnan(values.iloc[0]):
        return []
    else:
        temp = x + 1
        return [x] + countNan(values[1:], temp)

def runColumn(values: Series, count: int = 0) -> list:
    # collect the positions of every existing NaN run in the column
    if values.empty:
        return []
    else:
        x = countNan(values)
        if not x:
            i = 0
            temp = count + 1
            return runColumn(values[i + 1:], temp)
        else:
            i = x[-1]
            temp = count + i + 1
            return [[i + count for i in x]] + runColumn(values[i + 1:], temp)

if __name__ == '__main__':
    # Create the DF to analyze:
    # ...
    df['PP'] = df['PP'].apply(lambda x: np.nan if x == 0 else x)
    groups = df.groupby([df['CODE'], df['date'].dt.month])
    list2concat = []
    for i, group in groups:
        nans_positions = runColumn(group['PP'])
        # 'nans_positions' will be a list like this: [[x1, x1+1], [x2], [x3, x3+1, x3+2]]
        # where each 'x' is a position that already holds a NaN
        nans_positions.sort(reverse=True)
        if len(nans_positions) > 2:
            nans_positions = (nans_positions[0], getPositions(nans_positions[1:]))
        group['PP'] = addNan(list(group['PP']), *nans_positions)
        list2concat.append(group)
    nan_df = pd.concat(list2concat, ignore_index=True)
    print(nan_df)
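For testing, a small frame like the one in the question could be built as follows (an assumption, since the construction step is omitted above; note this answer expects the date column to be named 'date'):

import pandas as pd

df = pd.DataFrame({
    'CODE': ['000130'] * 5 + ['000142'] * 5,
    'date': pd.to_datetime(['1991-01-01', '1991-01-02', '1991-01-03', '1991-01-04', '1991-01-05',
                            '2020-12-27', '2020-12-28', '2020-12-29', '2020-12-30', '2020-12-31']),
    'PP': [0.0, 1.0, 2.0, 2.0, 1.1, 2.1, 2.2, 2.1, 0.4, 1.1],
})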

Missing values replaced with the average of their neighbors (time series)

I want all missing values in the dataset to be replaced with the average of their two nearest neighbors, except the first and last cells, and except when a neighbor is 0 (then I fix values manually). I coded this and it works, but the solution is not very smart. Is there another way to do it faster? Is the interpolate method suitable for that? I'm not quite sure how it works.
Input:
0 1 2 3 4 5
0 0.0 1596.0 1578.0 1567.0 1580.0 1649.0
1 1554.0 1506.0 0.0 1466.0 1469.0 1503.0
2 1588.0 1510.0 1495.0 1485.0 1489.0 0.0
3 1592.0 0.0 0.0 1571.0 1647.0 0.0
Output:
0 1 2 3 4 5
0 0.0 1596.0 1578.0 1567.0 1580.0 1649.0
1 1554.0 1506.0 1486.0 1466.0 1469.0 1503.0
2 1588.0 1510.0 1495.0 1485.0 1489.0 1540.5
3 1592.0 0.0 0.0 1571.0 1647.0 0.0
Code:
data_len = len(df)
first_col = str(df.columns[0])
last_col = str(df.columns[len(df.columns) - 1])
d = df.apply(lambda s: pd.to_numeric(s, errors="coerce"))
m = d.eq(0) | d.isna()
s = m.stack()
list = s[s].index.tolist() #list of indeces of missing values
count = len(list)
for el in list:
if (el == ('0', first_col) or el == (str(data_len - 1), last_col)):
continue
next = df.at[str(int(el[0]) + 1), first_col] if el[1] == last_col else df.at[el[0], str(int(el[1]) + 1)]
prev = df.at[str(int(el[0]) - 1), last_col] if el[1] == first_col else df.at[el[0], str(int(el[1]) - 1)]
if prev == 0 or next == 0:
continue
df.at[el[0],el[1]] = (prev + next)/2
JSON of example:
{"0":{"0":0.0,"1":1554.0,"2":1588.0,"3":0.0},"1":{"0":1596.0,"1":1506.0,"2":1510.0,"3":0.0},"2":{"0":1578.0,"1":0.0,"2":1495.0,"3":1561.0},"3":{"0":1567.0,"1":1466.0,"2":1485.0,"3":1571.0},"4":{"0":1580.0,"1":1469.0,"2":1489.0,"3":1647.0},"5":{"0":1649.0,"1":1503.0,"2":0.0,"3":0.0}}
Here's one approach using shift to average the neighbours' values, slice-assigning back to the dataframe:

m = df == 0
r = (df.shift(axis=1) + df.shift(-1, axis=1)) / 2
df.iloc[1:-1, 1:-1] = df.mask(m, r)
print(df)
0 1 2 3 4 5
0 0.0 1596.0 1578.0 1567.0 1580.0 1649.0
1 1554.0 1506.0 1486.0 1466.0 1469.0 1503.0
2 1588.0 1510.0 1495.0 1485.0 1489.0 0.0
3 0.0 0.0 1561.0 1571.0 1647.0 0.0
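On the interpolate question: for a single interior gap, linear interpolation equals the mean of the two neighbours, so a sketch like the following could replace the loop (an assumption about the intent; it works row-wise and only fills zeros whose left and right neighbours are non-zero):

import pandas as pd

m = df.eq(0)
filled = df.mask(m).interpolate(axis=1, limit_area='inside')  # row-wise linear interpolation
# only accept isolated zeros, i.e. those whose horizontal neighbours are both non-zero
single = m & ~m.shift(1, axis=1, fill_value=False) & ~m.shift(-1, axis=1, fill_value=False)
df_out = df.where(~single, filled)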

In pandas, how can I count consecutive positives and negatives in a row?

In Python pandas or NumPy, is there a built-in function, or a combination of functions, that can count the number of positive or negative values in a row?
This could be thought of as similar to a roulette wheel with the number of blacks or reds in a row.
Example input series data:
Date
2000-01-07 -3.550049
2000-01-10 28.609863
2000-01-11 -2.189941
2000-01-12 4.419922
2000-01-13 17.690185
2000-01-14 41.219971
2000-01-18 0.000000
2000-01-19 -16.330078
2000-01-20 7.950195
2000-01-21 0.000000
2000-01-24 38.370117
2000-01-25 6.060059
2000-01-26 3.579834
2000-01-27 7.669922
2000-01-28 2.739991
2000-01-31 -8.039795
2000-02-01 10.239990
2000-02-02 -1.580078
2000-02-03 1.669922
2000-02-04 7.440186
2000-02-07 -0.940185
Desired output:
- in a row 5 times
+ in a row 4 times
++ in a row once
++++ in a row once
+++++++ in a row once
Nonnegatives:
from functools import reduce # For Python 3.x
ser = df['x'] >= 0
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
c[ser & (ser != ser.shift(-1))].value_counts()
Out:
1.0 2
7.0 1
4.0 1
2.0 1
Name: x, dtype: int64
Negatives:
ser = df['x'] < 0
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
c[ser & (ser != ser.shift(-1))].value_counts()
Out:
1.0 6
Name: x, dtype: int64
Basically, it creates a boolean series and takes the cumulative count between the turning points (when the sign changes, it starts over). For example, for nonnegatives, c is:
Out:
0 0.0
1 1.0 # turning point
2 0.0
3 1.0
4 2.0
5 3.0
6 4.0 # turning point
7 0.0
8 1.0
9 2.0
10 3.0
11 4.0
12 5.0
13 6.0
14 7.0 # turning point
15 0.0
16 1.0 # turning point
17 0.0
18 1.0
19 2.0 # turning point
20 0.0
Name: x, dtype: float64
Now, in order to identify the turning points, the condition is that the current value is different from the next one and is True. If you select those, you have the counts.
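A shorter route to the same run lengths (not from the original answer) is to label each run by cumulatively counting sign changes and then group; note that zeros form runs of their own here:

import numpy as np

sign = np.sign(df['x'])
run_id = (sign != sign.shift()).cumsum()   # new id every time the sign flips
runs = sign.groupby(run_id).agg(['first', 'size'])
print(runs.loc[runs['first'] > 0, 'size'].value_counts())  # positive run lengths
print(runs.loc[runs['first'] < 0, 'size'].value_counts())  # negative run lengths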
You can use the itertools.groupby() function.

import itertools

l = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971, 0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059, 3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078, 1.669922, 7.440186, -0.940185]
r_pos = {}
r_neg = {}
for k, v in itertools.groupby(l, lambda e: e > 0):
    count = len(list(v))
    r = r_pos if k else r_neg
    if count not in r:
        r[count] = 0
    r[count] += 1
for k, v in r_neg.items():
    print('%s in a row %s time(s)' % ('-' * k, v))
for k, v in r_pos.items():
    print('%s in a row %s time(s)' % ('+' * k, v))
output
- in a row 6 time(s)
+ in a row 2 time(s)
++ in a row 1 time(s)
++++ in a row 1 time(s)
+++++++ in a row 1 time(s)
Depending on what you consider a positive value, you can change the line lambda e: e > 0.
So far this is what I've come up with; it works and outputs a count of how many times each of the negative, positive, and zero values occurs in a row. Maybe someone can make it more concise using some of the suggestions posted by ayhan and Ghilas above.
from collections import Counter
import numpy as np

ser = [-3.550049, 28.609863, -2.1, 89941, 4.419922, 17.690185, 41.219971, 0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059, 3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078, 1.669922, 7.440186, -0.940185]
c = 0
zeros, neg_counts, pos_counts = [], [], []
for i in range(len(ser)):
    c += 1
    s = np.sign(ser[i])
    try:
        if s != np.sign(ser[i + 1]):
            if s == 0:
                zeros.append(c)
            elif s == -1:
                neg_counts.append(c)
            elif s == 1:
                pos_counts.append(c)
            c = 0
    except IndexError:
        pos_counts.append(c) if s == 1 else neg_counts.append(c) if s == -1 else zeros.append(c)
print(Counter(neg_counts))
print(Counter(pos_counts))
print(Counter(zeros))
Out:
Counter({1: 5})
Counter({1: 3, 2: 1, 4: 1, 5: 1})
Counter({1: 2})

Pandas DataFrame use previous row value for complicated 'if' conditions to determine current value

I want to know if there is any faster way to do the following loop. Maybe use apply or a rolling apply function to realize this?
Basically, I need to access the previous row's value to determine the current cell value.
df.ix[0] = (np.abs(df.ix[0]) >= So) * np.sign(df.ix[0])
for i in range(1, len(df)):
    for col in list(df.columns.values):
        if (df[col].ix[i] > 1.25) & (df[col].ix[i-1] == 0):
            df[col].ix[i] = 1
        elif (df[col].ix[i] < -1.25) & (df[col].ix[i-1] == 0):
            df[col].ix[i] = -1
        elif ((df[col].ix[i] <= -0.75) & (df[col].ix[i-1] < 0)) | ((df[col].ix[i] >= 0.5) & (df[col].ix[i-1] > 0)):
            df[col].ix[i] = df[col].ix[i-1]
        else:
            df[col].ix[i] = 0
As you can see, in the function I am updating the dataframe, and I need to access the most recently updated previous row, so using shift will not work.
For example:
Input:
A B C
1.3 -1.5 0.7
1.1 -1.4 0.6
1.0 -1.3 0.5
0.4 1.4 0.4
Output:
A B C
1 -1 0
1 -1 0
1 -1 0
0 1 0
You can use the .shift() function for accessing previous or next values.
Previous value for the col column:

df['col'].shift()

Next value for the col column:

df['col'].shift(-1)
Example:
In [38]: df
Out[38]:
a b c
0 1 0 5
1 9 9 2
2 2 2 8
3 6 3 0
4 6 1 7
In [39]: df['prev_a'] = df['a'].shift()
In [40]: df
Out[40]:
a b c prev_a
0 1 0 5 NaN
1 9 9 2 1.0
2 2 2 8 9.0
3 6 3 0 2.0
4 6 1 7 6.0
In [43]: df['next_a'] = df['a'].shift(-1)
In [44]: df
Out[44]:
a b c prev_a next_a
0 1 0 5 NaN 9.0
1 9 9 2 1.0 2.0
2 2 2 8 9.0 6.0
3 6 3 0 2.0 6.0
4 6 1 7 6.0 NaN
I am surprised there isn't a native pandas solution to this as well, because shift and rolling do not get it done. I have devised a way to do this using standard pandas syntax, but I am not sure if it performs any better than your loop; my purposes just required this for consistency (not speed).
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})
new_col = 'c'

def apply_func_decorator(func):
    prev_row = {}
    def wrapper(curr_row, **kwargs):
        val = func(curr_row, prev_row)
        prev_row.update(curr_row)
        prev_row[new_col] = val
        return val
    return wrapper

@apply_func_decorator
def running_total(curr_row, prev_row):
    return curr_row['a'] + curr_row['b'] + prev_row.get('c', 0)

df[new_col] = df.apply(running_total, axis=1)
print(df)
# Output will be:
#    a   b   c
# 0  0   0   0
# 1  1  10  11
# 2  2  20  33
Disclaimer: I used pandas 0.16 but with only slight modification this will work for the latest versions too.
Others had similar questions and I posted this solution on those as well:
Reference previous row when iterating through dataframe
Reference values in the previous row with map or apply
@maxU has it right with shift; I think you can even compare dataframes directly, something like this:

df_prev = df.shift()  # previous row
df_out = pd.DataFrame(index=df.index, columns=df.columns)
df_out[(df > 1.25) & (df_prev == 0)] = 1
df_out[(df < -1.25) & (df_prev == 0)] = -1
df_out[(df < -0.75) & (df_prev < 0)] = df_prev
df_out[(df > 0.5) & (df_prev > 0)] = df_prev

The syntax may be off, but if you provide some test data, I think this could work.
It saves you having to loop at all.
EDIT - Update based on comment below
I would try my absolute best not to loop through the DF itself. You're better off going column by column, sending to a list, doing the updating, and then just importing it back again. Something like this:
df.ix[0] = (np.abs(df.ix[0]) >= 1.25) * np.sign(df.ix[0])
for col in df.columns.tolist():
    currData = df[col].tolist()
    for currRow in range(1, len(currData)):
        if currData[currRow] > 1.25 and currData[currRow - 1] == 0:
            currData[currRow] = 1
        elif currData[currRow] < -1.25 and currData[currRow - 1] == 0:
            currData[currRow] = -1
        elif currData[currRow] <= -.75 and currData[currRow - 1] < 0:
            currData[currRow] = currData[currRow - 1]
        elif currData[currRow] >= .5 and currData[currRow - 1] > 0:
            currData[currRow] = currData[currRow - 1]
        else:
            currData[currRow] = 0
    df[col] = currData
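For reference, the same sequential rule can be written as a standalone function over a NumPy array (a sketch following the question's written conditions literally, and avoiding the now-deprecated .ix indexer):

import numpy as np
import pandas as pd

def signal(col, enter=1.25, keep_short=-0.75, keep_long=0.5):
    # enter +1/-1 when the value crosses +/-enter from a flat (0) state,
    # hold the position while beyond the keep thresholds, otherwise go flat
    out = np.zeros(len(col))
    out[0] = np.sign(col[0]) if abs(col[0]) >= enter else 0.0
    for i in range(1, len(col)):
        prev = out[i - 1]
        if prev == 0 and col[i] > enter:
            out[i] = 1
        elif prev == 0 and col[i] < -enter:
            out[i] = -1
        elif (prev < 0 and col[i] <= keep_short) or (prev > 0 and col[i] >= keep_long):
            out[i] = prev
        else:
            out[i] = 0
    return out

result = df.apply(lambda c: pd.Series(signal(c.to_numpy()), index=c.index))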
