constrain a series or array to a range of values

I have a series of values that I want to constrain to the range -1 to +1.
s = pd.Series(np.random.randn(10000))
I know I can use apply, but is there a simple vectorized approach?
s_ = s.apply(lambda x: min(max(x, -1), 1))
s_.head()
0 -0.256117
1 0.879797
2 1.000000
3 -0.711397
4 -0.400339
dtype: float64

Use clip:
s = s.clip(-1,1)
Example Input:
s = pd.Series([-1.2, -0.5, 1, 1.1])
0 -1.2
1 -0.5
2 1.0
3 1.1
Example Output:
0 -1.0
1 -0.5
2 1.0
3 1.0
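A minimal, self-contained sketch of the clip approach on the example input above. np.clip is shown alongside it on the assumption that the "array" in the title means a plain NumPy array:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.2, -0.5, 1, 1.1])

# Series.clip(lower, upper) returns a new Series; s itself is not modified
clipped = s.clip(-1, 1)

# The same operation on a plain NumPy array
clipped_arr = np.clip(s.to_numpy(), -1, 1)

print(clipped.tolist())
print(clipped_arr.tolist())
```

Both calls leave values already inside the range untouched and replace the rest with the nearest bound.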

You can use the between Series method:
In [11]: s[s.between(-1, 1)]
Out[11]:
0 -0.256117
1 0.879797
3 -0.711397
4 -0.400339
5 0.667196
...
Note: This discards the values outside of the between range.
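To make the difference from clipping concrete, here is a small sketch on a four-element series (the data is illustrative, not the random series from the question). between builds a boolean mask, inclusive at both ends by default, so indexing with it filters rows rather than capping them:

```python
import pandas as pd

s = pd.Series([-1.2, -0.5, 1, 1.1])

mask = s.between(-1, 1)    # True where -1 <= value <= 1
filtered = s[mask]         # rows outside the range are dropped
clipped = s.clip(-1, 1)    # rows outside the range are capped instead

print(filtered.index.tolist())
print(len(filtered), len(clipped))
```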

Use nested np.where
pd.Series(np.where(s < -1, -1, np.where(s > 1, 1, s)))

One more suggestion:
s[s<-1] = -1
s[s>1] = 1
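Note that the boolean-mask assignment changes s in place. A short sketch, working on a copy so the original survives, that checks the mask assignment, the nested np.where and clip against each other on an illustrative four-element series:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.2, -0.5, 1, 1.1])

# Nested np.where returns an ndarray, so wrap it back into a Series
via_where = pd.Series(np.where(s < -1, -1, np.where(s > 1, 1, s)), index=s.index)

# Boolean-mask assignment, done on a copy to keep s intact
via_mask = s.copy()
via_mask[via_mask < -1] = -1
via_mask[via_mask > 1] = 1

via_clip = s.clip(-1, 1)

print(via_where.equals(via_clip), via_mask.equals(via_clip))
```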

Related

simple mapping of pandas series to 0 and 1s given threshold

I am sorry for asking such a simple question (yes, I googled). Do I really need two steps to map a simple pandas series of floats between 0 and 1 to 0s and 1s, given a threshold? This is the reproducible example:
series = pd.Series([0.0, 0.3, 0.6, 1.0])
threshold = 0.5
print(series)
series[series > threshold] = 1.0
series[series <= threshold] = 0.0
print(series)
It works producing:
0 0.0
1 0.0
2 1.0
3 1.0
from:
0 0.0
1 0.3
2 0.6
3 1.0

You can use the > operator.
series = (series > threshold).astype(int)
print(series)
Output:
0 0
1 0
2 1
3 1
dtype: int32

You could also apply a function on all elements using map() like
series = series.map(lambda x: 1.0 if x > threshold else 0.0)

I'd use numpy.where:
np.where(series > threshold, 1, 0)
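A quick sketch comparing the one-step variants on the example series. One thing to keep in mind: np.where returns a NumPy array, not a Series, so the index has to be reattached if it matters:

```python
import numpy as np
import pandas as pd

series = pd.Series([0.0, 0.3, 0.6, 1.0])
threshold = 0.5

as_int = (series > threshold).astype(int)       # Series of 0/1 integers
as_float = (series > threshold).astype(float)   # keeps the float dtype of the input
as_array = np.where(series > threshold, 1, 0)   # plain ndarray
as_series = pd.Series(as_array, index=series.index)

print(as_int.tolist(), as_float.tolist(), type(as_array).__name__)
```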

Pandas column that depends on its previous value (row)?

I would like to create a 3rd column in my dataframe, which depends on both the new column and the existing columns in the previous row.
This new column should start at 0.
If its previous value was 0, its next value is its previous value plus df.below_lo[i].
If its previous value was 1, its next value is its previous value plus df.above_hi[i].
I think I have two issues: how to initialize this 3rd column and how to make it dependent on itself.
import pandas as pd
import math
data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data)
df['pos'] = math.nan
df['pos'][0] = 0
for i in range(len(df.below_lo)):
    if df.pos[i] == 0:
        df.pos[i+1] = df.pos[i] + df.below_lo[i]
    if df.pos[i] == 1:
        df.pos[i+1] = df.pos[i] + df.above_hi[i]
print(df)
The desired output is:
below_lo above_hi pos
0 0.0 0.0 0.0
1 1.0 0.0 0.0
2 0.0 -1.0 1.0
3 0.0 0.0 0.0
4 0.0 -1.0 0.0
5 0.0 0.0 0.0
6 0.0 -1.0 0.0
7 0.0 0.0 0.0
8 0.0 0.0 0.0
9 1.0 0.0 0.0
10 0.0 0.0 1.0
11 0.0 0.0 1.0
12 0.0 0.0 1.0
13 NaN NaN 1.0
The above code produces the correct output, except that I also get this warning several times:
A value is trying to be set on a copy of a slice from a DataFrame
How do I clean this code up so that it runs without raising this warning?

Use .loc:
df.loc[0, 'pos'] = 0
for i in range(len(df.below_lo)):
    if df.loc[i, 'pos'] == 0:
        df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'below_lo']
    if df.loc[i, 'pos'] == 1:
        df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'above_hi']
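If the per-cell .loc writes are too slow, one option is to run the recurrence over plain Python lists and assign the column once. This is only a sketch: build_pos is a hypothetical helper name, the else branch (carry the value forward when it is neither 0 nor 1) is my assumption since the question does not define that case, and unlike the loop above it does not append an extra 14th row.

```python
import pandas as pd

data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data)

def build_pos(below_lo, above_hi):
    """Hypothetical helper: pos[i+1] depends on pos[i] and row i of the inputs."""
    pos = [0]                          # the column starts at 0
    for lo, hi in zip(below_lo[:-1], above_hi[:-1]):
        prev = pos[-1]
        if prev == 0:
            pos.append(prev + lo)
        elif prev == 1:
            pos.append(prev + hi)
        else:
            pos.append(prev)           # assumption: hold the value otherwise
    return pos

# Single column assignment, so no chained indexing and no warning
df['pos'] = build_pos(df['below_lo'].tolist(), df['above_hi'].tolist())
print(df)
```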

I appreciate there is already an accepted, and perfectly good, answer by @Michael O., but if you dislike iterating over rows as not quite Pandas-esque, here is a solution without explicit looping over rows:
from functools import reduce
res = reduce(lambda d, _:
             d.fillna({'pos': d['pos'].shift(1)
                       + (d['pos'].shift(1) == 0) * d['below_lo']
                       + (d['pos'].shift(1) == 1) * d['above_hi']}),
             range(len(df)), df)
res
produces
below_lo above_hi pos
-- ---------- ---------- -----
0 0 0 0
1 1 0 1
2 0 -1 0
3 0 0 0
4 0 -1 0
5 0 0 0
6 0 -1 0
7 0 0 0
8 0 0 0
9 1 0 1
10 0 0 1
11 0 0 1
12 0 0 1
It is, admittedly, somewhat less efficient and the syntax is a bit more obscure. But it could be written on a single line (even if I split it over a few for clarity)!
The idea is to call fillna(..) with a value calculated from the previous value of 'pos' (hence shift(1)) and the current values of 'below_lo' and 'above_hi'. The extra complication is that one such call only fills the NaN in the row directly below a row that already has a non-NaN value. Hence we need to apply this function repeatedly until all NaNs are filled, and this is where reduce comes into play.
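To see why the repetition is needed, here is a small sketch of a single pass applied twice. The helper name one_pass and the four-row frame are made up for illustration, and it uses Series.fillna directly instead of the dict form above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'below_lo': [0, 1, 0, 0], 'above_hi': [0, 0, -1, 0]})
df['pos'] = np.nan
df.loc[0, 'pos'] = 0      # only the first row is known at the start

def one_pass(d):
    """Hypothetical helper: fill each NaN whose previous row is already known."""
    prev = d['pos'].shift(1)
    candidate = prev + (prev == 0) * d['below_lo'] + (prev == 1) * d['above_hi']
    return d.assign(pos=d['pos'].fillna(candidate))

step1 = one_pass(df)      # fills row 1 only
step2 = one_pass(step1)   # fills row 2 only

print(step1['pos'].isna().sum(), step2['pos'].isna().sum())
```

Each pass moves the frontier of known values down by one row, which is why reduce runs it len(df) times.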

How to create a column using a function based of previous values in the column in python

My Problem
I have a loop that creates a value for x in time period t based on x in time period t-1. The loop is really slow, so I wanted to try to turn it into a vectorized function. I tried to use np.where with shift() but had no joy. Any idea how I might be able to get around this problem?
Thanks!
My Code
import numpy as np
import pandas as pd
csv1 = pd.read_csv('y_list.csv', delimiter = ',')
df = pd.DataFrame(csv1)
df.loc[df.index[0], 'var'] = 0
for x in range(1,len(df.index)):
    if df["LAST"].iloc[x] > 0:
        df["var"].iloc[x] = ((df["var"].iloc[x - 1] * 2) + df["LAST"].iloc[x]) / 3
    else:
        df["var"].iloc[x] = (df["var"].iloc[x - 1] * 2) / 3
df
Input Data
Dates,LAST
03/09/2018,-7
04/09/2018,5
05/09/2018,-4
06/09/2018,5
07/09/2018,-6
10/09/2018,6
11/09/2018,-7
12/09/2018,7
13/09/2018,-9
Output
Dates,LAST,var
03/09/2018,-7,0.000000
04/09/2018,5,1.666667
05/09/2018,-4,1.111111
06/09/2018,5,2.407407
07/09/2018,-6,1.604938
10/09/2018,6,3.069959
11/09/2018,-7,2.046639
12/09/2018,7,3.697759
13/09/2018,-9,2.465173

You are looking at ewm:
arg = df.LAST.clip(lower=0)
arg.iloc[0] = 0
arg.ewm(alpha=1/3, adjust=False).mean()
Output:
0 0.000000
1 1.666667
2 1.111111
3 2.407407
4 1.604938
5 3.069959
6 2.046639
7 3.697759
8 2.465173
Name: LAST, dtype: float64
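Why this works: with adjust=False, ewm computes y[0] = x[0] and y[t] = (1 - alpha) * y[t-1] + alpha * x[t]. With alpha = 1/3 that is (2 * y[t-1] + x[t]) / 3, which is the loop's update once negative LAST values are clipped to 0. A sketch that checks this against the explicit recursion, using the LAST values from the question typed in by hand rather than read from the CSV:

```python
import numpy as np
import pandas as pd

last = pd.Series([-7, 5, -4, 5, -6, 6, -7, 7, -9], name='LAST')

# Explicit recursion, as in the question's loop
var = [0.0]
for x in last.iloc[1:]:
    var.append((2 * var[-1] + max(x, 0)) / 3)

# Vectorized equivalent
arg = last.clip(lower=0)   # negative values contribute nothing
arg.iloc[0] = 0            # the recursion starts from 0
smoothed = arg.ewm(alpha=1/3, adjust=False).mean()

print(np.allclose(smoothed, var))
```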

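You can use df.shift to shift the dataframe by a default of 1 row, and convert the if-else block into a vectorized np.where. Note that this reads the previous row's var from a column that already exists, so it needs var to be available beforehand: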
In [36]: df
Out[36]:
Dates LAST var
0 03/09/2018 -7 0.0
1 04/09/2018 5 1.7
2 05/09/2018 -4 1.1
3 06/09/2018 5 2.4
4 07/09/2018 -6 1.6
5 10/09/2018 6 3.1
6 11/09/2018 -7 2.0
7 12/09/2018 7 3.7
8 13/09/2018 -9 2.5
In [37]: (df.shift(1)['var']*2 + np.where(df['LAST']>0, df['LAST'], 0)) / 3
Out[37]:
0 NaN
1 1.666667
2 1.133333
3 2.400000
4 1.600000
5 3.066667
6 2.066667
7 3.666667
8 2.466667
Name: var, dtype: float64

Reclassification by column name in pandas

I would like to apply a test to a pandas dataframe, and create flags in a corresponding dataframe based on the test results. I've gotten this far:
import numpy as np
import pandas as pd
matrix = pd.DataFrame({'a': [1, 11, 2, 3, 4], 'b': [5, 6, 22, 8, 9]})
flags = pd.DataFrame(np.zeros(matrix.shape), columns=matrix.columns)
flag_values = pd.Series({"a": 100, "b": 200})
flags[matrix > 10] = flag_values
but this raises the error
ValueError: Must specify axis=0 or 1
Where can I specify the axis in this situation? Is there a better way to accomplish this?
Edit:
The result I'm looking for in this example for "flags" is
a b
0 0
100 0
0 200
0 0
0 0

You could define flags = (matrix > 10) * flag_values:
In [35]: (matrix > 10) * flag_values
Out[35]:
a b
0 0 0
1 100 0
2 0 200
3 0 0
4 0 0
This relies on True having numeric value 1 and False having numeric value 0.
It also relies on Pandas' nifty automatic alignment of DataFrames (and Series) based on labels before performing arithmetic operations.
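A small sketch of both points. The reordered Series is my own addition to show that the multiplication matches flag values to columns by label, not by position:

```python
import pandas as pd

matrix = pd.DataFrame({'a': [1, 11, 2, 3, 4], 'b': [5, 6, 22, 8, 9]})
flag_values = pd.Series({"a": 100, "b": 200})

# True/False act as 1/0; the Series index is aligned with the column labels
flags = (matrix > 10) * flag_values

# Same flag values given in a different order: alignment is by label
reordered = pd.Series({"b": 200, "a": 100})
flags_reordered = (matrix > 10) * reordered

print(flags['a'].tolist(), flags['b'].tolist())
```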

mask with mul
flags.mask(matrix > 10,1).mul(flag_values,axis=1)
Out[566]:
a b
0 0.0 0.0
1 100.0 0.0
2 0.0 200.0
3 0.0 0.0
4 0.0 0.0
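Yet another way to write the same thing, offered only as a sketch: np.where broadcasts a row of per-column flag values down the rows. The reindex call is my own addition so that the flag values follow the column order of matrix.

```python
import numpy as np
import pandas as pd

matrix = pd.DataFrame({'a': [1, 11, 2, 3, 4], 'b': [5, 6, 22, 8, 9]})
flag_values = pd.Series({"a": 100, "b": 200})

# One flag value per column, in the same order as matrix.columns
per_column = flag_values.reindex(matrix.columns).to_numpy()

# A (5, 2) condition against a (2,) row of values broadcasts row by row
flags = pd.DataFrame(np.where(matrix > 10, per_column, 0),
                     index=matrix.index, columns=matrix.columns)
print(flags)
```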

Pandas DataFrame use previous row value for complicated 'if' conditions to determine current value

Is there a faster way to do the following loop? Maybe apply or a rolling apply function could do it.
Basically, I need to access previous row's value to determine current cell value.
df.ix[0] = (np.abs(df.ix[0]) >= So) * np.sign(df.ix[0])
for i in range(1, len(df)):
    for col in list(df.columns.values):
        if ((df[col].ix[i] > 1.25) & (df[col].ix[i-1] == 0)) | :
            df[col].ix[i] = 1
        elif ((df[col].ix[i] < -1.25) & (df[col].ix[i-1] == 0)):
            df[col].ix[i] = -1
        elif ((df[col].ix[i] <= -0.75) & (df[col].ix[i-1] < 0)) | ((df[col].ix[i] >= 0.5) & (df[col].ix[i-1] > 0)):
            df[col].ix[i] = df[col].ix[i-1]
        else:
            df[col].ix[i] = 0
As you can see, the loop updates the dataframe as it goes, and each step needs the already-updated previous row, so using shift will not work.
For example:
Input:
A B C
1.3 -1.5 0.7
1.1 -1.4 0.6
1.0 -1.3 0.5
0.4 1.4 0.4
Output:
A B C
1 -1 0
1 -1 0
1 -1 0
0 1 0

You can use the .shift() function to access previous or next values:
previous value for col column:
df['col'].shift()
next value for col column:
df['col'].shift(-1)
Example:
In [38]: df
Out[38]:
a b c
0 1 0 5
1 9 9 2
2 2 2 8
3 6 3 0
4 6 1 7
In [39]: df['prev_a'] = df['a'].shift()
In [40]: df
Out[40]:
a b c prev_a
0 1 0 5 NaN
1 9 9 2 1.0
2 2 2 8 9.0
3 6 3 0 2.0
4 6 1 7 6.0
In [43]: df['next_a'] = df['a'].shift(-1)
In [44]: df
Out[44]:
a b c prev_a next_a
0 1 0 5 NaN 9.0
1 9 9 2 1.0 2.0
2 2 2 8 9.0 6.0
3 6 3 0 2.0 6.0
4 6 1 7 6.0 NaN

I am surprised there isn't a native pandas solution to this as well, because shift and rolling do not get it done. I have devised a way to do this using the standard pandas syntax but I am not sure if it performs any better than your loop... My purposes just required this for consistency (not speed).
import pandas as pd
df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})
new_col = 'c'
def apply_func_decorator(func):
    prev_row = {}
    def wrapper(curr_row, **kwargs):
        val = func(curr_row, prev_row)
        prev_row.update(curr_row)
        prev_row[new_col] = val
        return val
    return wrapper
@apply_func_decorator
def running_total(curr_row, prev_row):
    return curr_row['a'] + curr_row['b'] + prev_row.get('c', 0)
df[new_col] = df.apply(running_total, axis=1)
print(df)
# Output will be:
# a b c
# 0 0 0 0
# 1 1 10 11
# 2 2 20 33
Disclaimer: I used pandas 0.16 but with only slight modification this will work for the latest versions too.
Others had similar questions and I posted this solution on those as well:
Reference previous row when iterating through dataframe
Reference values in the previous row with map or apply
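For the specific running_total example above, the recurrence c[t] = a[t] + b[t] + c[t-1] is just a cumulative sum, so a sketch without any row-wise apply would be the following. This shortcut only applies when the dependence on the previous row is a plain sum; the decorator remains the general tool.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})

# c[t] = a[t] + b[t] + c[t-1]  is the cumulative sum of (a + b)
df['c'] = (df['a'] + df['b']).cumsum()
print(df)
```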

@maxU has it right with shift. I think you can even compare dataframes directly, something like this:
df_prev = df.shift(1)
df_out = pd.DataFrame(index=df.index,columns=df.columns)
df_out[(df>1.25) & (df_prev == 0)] = 1
df_out[(df<-1.25) & (df_prev == 0)] = -1
df_out[(df<-.75) & (df_prev <0)] = df_prev
df_out[(df>.5) & (df_prev >0)] = df_prev
The syntax may be off, but if you provide some test data I think this could work.
Saves you having to loop at all.
EDIT - Update based on comment below
I would try my absolute best not to loop through the DF itself. You're better off going column by column, sending to a list and doing the updating, then just importing back again. Something like this:
df.iloc[0] = (np.abs(df.iloc[0]) >= 1.25) * np.sign(df.iloc[0])
for col in df.columns.tolist():
    currData = df[col].tolist()
    for currRow in range(1,len(currData)):
        if currData[currRow]> 1.25 and currData[currRow-1]== 0:
            currData[currRow] = 1
        elif currData[currRow] < -1.25 and currData[currRow-1]== 0:
            currData[currRow] = -1
        elif currData[currRow] <=-.75 and currData[currRow-1]< 0:
            currData[currRow] = currData[currRow-1]
        elif currData[currRow]>= .5 and currData[currRow-1]> 0:
            currData[currRow] = currData[currRow-1]
        else:
            currData[currRow] = 0
    df[col] = currData
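The same column-by-column idea can be packaged as a function over NumPy arrays. This is a sketch only: latch_column and its parameter names are hypothetical, and the rules are the four conditions exactly as written in the loop above, run on the example input from the question.

```python
import numpy as np
import pandas as pd

def latch_column(values, enter=1.25, hold_pos=0.5, hold_neg=-0.75):
    """Hypothetical helper: each output depends on the previous *output*."""
    out = np.zeros(len(values))
    out[0] = np.sign(values[0]) if abs(values[0]) >= enter else 0
    for i in range(1, len(values)):
        x, prev = values[i], out[i - 1]
        if x > enter and prev == 0:
            out[i] = 1
        elif x < -enter and prev == 0:
            out[i] = -1
        elif (x <= hold_neg and prev < 0) or (x >= hold_pos and prev > 0):
            out[i] = prev          # stay in the current state
        else:
            out[i] = 0
    return out

df = pd.DataFrame({'A': [1.3, 1.1, 1.0, 0.4],
                   'B': [-1.5, -1.4, -1.3, 1.4],
                   'C': [0.7, 0.6, 0.5, 0.4]})

result = pd.DataFrame({col: latch_column(df[col].to_numpy()) for col in df.columns},
                      index=df.index)
print(result)
```

With the rules as written, the last value of B comes out as 0: 1.4 is above the entry level, but the previous state is -1 rather than 0, so none of the first four branches applies.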
