I have the following boundary conditions for a time series in python.
The notation I use here is t_x, where x describe the time in milliseconds (this is not my code, I just thought this notation is good to explain my issue).
t_0 = 0
t_440 = -1.6
t_830 = 0
mean_value = -0.6
I want to create a list that contains 84 values (indices 0 to 83, so the spacing is 10 ms per value).
The list should describe a "curve" that starts at zero, reaches its minimum value of -1.6 at 440 ms (index 44 in the list), ends with 0 at 830 ms (index 83 in the list), and has an overall mean value of -0.6.
I absolutely could not come up with an idea how to "fit" the boundaries to create such a list.
I would really appreciate help.
It is a quick and dirty approach, but it works:
X = list(range(0, 830 +1, 10))
Y = [0.0 for x in X]
Y[44] = -1.6
b = 12.3486
for x in range(44):
    Y[x] = -1.6*(b*x+x**2)/(b*44+44**2)
for x in range(83, 44, -1):
    Y[x] = -1.6*(b*(83-x)+(83-x)**2)/(b*38+38**2)
print(f'{sum(Y)/len(Y)=:8.6f}, {Y[0]=}, {Y[44]=}, {Y[83]=}')
from matplotlib import pyplot as plt
plt.plot(X,Y)
plt.show()
The code gives the following output:
sum(Y)/len(Y)=-0.600000, Y[0]=-0.0, Y[44]=-1.6, Y[83]=-0.0
And it shows the following diagram:
The first step in coming up with the above approach was to create a linearly sloping 'curve' from the minimum to the zeros. It turned out that the linear approach gives a mean Y value that is too large in magnitude (roughly -0.8 instead of -0.6), which means the 'curve' must have a sharp peak at its minimum and needs to be approximated with a polynomial. To keep things simple I decided to use a quadratic polynomial and to approach the minimum from the left and right sides separately, as the curve isn't symmetric. The b-value was found by trial and error; its precision can be increased manually or by writing a small function that finds it iteratively.
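The trial-and-error search for b can be automated. The following is only a sketch, assuming the same curve formula as in the code above; build_curve, mean_of and find_b are hypothetical helper names, and bisection is just one possible iteration scheme (it works here because the mean gets steadily more negative as b grows).

```python
def build_curve(b, i_min=44, i_end=83, y_min=-1.6):
    """Rebuild the curve from the code above for a given b (hypothetical helper)."""
    n_right = i_end - i_min - 1          # 38, the divisor used in the original formula
    y = [0.0] * (i_end + 1)
    y[i_min] = y_min
    for x in range(i_min):
        y[x] = y_min * (b * x + x**2) / (b * i_min + i_min**2)
    for x in range(i_end, i_min, -1):
        d = i_end - x
        y[x] = y_min * (b * d + d**2) / (b * n_right + n_right**2)
    return y


def mean_of(b):
    y = build_curve(b)
    return sum(y) / len(y)


def find_b(target_mean=-0.6, lo=0.0, hi=1000.0, tol=1e-10):
    """Bisection on b; assumes the target mean is bracketed by lo and hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_of(mid) > target_mean:
            lo = mid                     # mean not negative enough yet: b must grow
        else:
            hi = mid
    return (lo + hi) / 2


b = find_b()
print(f"b = {b:.6f}, mean = {mean_of(b):.9f}")
```

The result should agree with the hand-tuned value of b to the digits given above.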
Update: a generic solution, as requested in a comment
The code below provides a function
meanYboundaryXY(lbc = [(0,0), (440,-1.6), (830,0), -0.6], shape='saw')
that returns the X and Y lists of the time series, calculated from the boundary values passed in lbc:
def meanYboundaryXY(lbc = [(0,0), (440,-1.6), (830,0), -0.6], shape='saw'):
    lbcXY = lbc[0:3] ; meanY_boundary = lbc[3]
    minX = min(x for x,y in lbcXY)
    maxX = max(x for x,y in lbcXY)
    minY = lbc[1][1]
    step = 10
    X = list(range(minX, maxX + 1, step))
    lenX = len(X)
    Y = [None for x in X]
    sumY = 0
    # place the three boundary points
    for x, y in lbcXY:
        Y[x//step] = y
        sumY += y
    target_sumY = meanY_boundary*lenX
    if shape == 'rect':
        # every remaining point gets the same value
        subY = (target_sumY-sumY)/(lenX-3)
        for i, y in enumerate(Y):
            if y is None:
                Y[i] = subY
    elif shape == 'saw':
        # two straight lines rising to peakNextY next to the minimum
        peakNextY = 2*(target_sumY-sumY)/(lenX-1)
        iYleft = lbc[1][0]//step-1
        iYrght = iYleft+2
        iYstart = lbc[0][0] // step
        iYend = lbc[2][0] // step
        for i in range(iYstart, iYleft+1, 1):
            Y[i] = peakNextY * i / iYleft
        for i in range(iYend, iYrght-1, -1):
            Y[i] = peakNextY * (iYend-i)/(iYend-iYrght)
    else:
        raise ValueError(f'meanYboundaryXY() EXIT, {shape=} not in ["saw","rect"]')
    return (X, Y)
X, Y = meanYboundaryXY()
print(f'{sum(Y)/len(Y)=:8.6f}, {Y[0]=}, {Y[44]=}, {Y[83]=}')
from matplotlib import pyplot as plt
plt.plot(X,Y)
plt.show()
The code outputs:
sum(Y)/len(Y)=-0.600000, Y[0]=0, Y[44]=-1.6, Y[83]=0
and creates the following two diagrams for shape='rect' and shape='saw':
As an old geek, I tried to solve the question with a simple algorithm.
First, calculate the points as two symmetric lines, from index 0 to 44 and from 44 to 88 (orange on the graph).
Then compute the sum of those points, excluding the middle point, and its ratio to the sum the points must have for a mean of -0.6, again excluding the middle point.
Apply that ratio to all points except the middle point (blue curve on the graph).
This yields the curve that Claudio called "saw".
Personally, I think Claudio's quadratic interpolation gives a better curve, but it needs trial-and-error loops.
import matplotlib
# define goals
nbPoints = 89
msPerPoint = 10
midPoint = nbPoints//2
valueMidPoint = -1.6
meanGoal = -0.6
def createSerieLinear():
    # two lines 0 up to 44, 44 down to 88 (89 values centered on 44)
    serie=[0 for i in range(0,nbPoints)]
    interval =valueMidPoint/midPoint
    for i in range(0,midPoint+1):
        serie[i]=i*interval
        serie[nbPoints-1-i]=i*interval
    return serie
# keep an original to plot
orange = createSerieLinear()
# work on a base
base = createSerieLinear()
# total except midPoint
totalBase = (sum(base)-valueMidPoint)
#total goal except 44
totalGoal = meanGoal*nbPoints - valueMidPoint
# apply ratio to reduce
reduceRatio = totalGoal/totalBase
for i in range(0,midPoint):
    base[i] *= reduceRatio
    base[nbPoints-1-i] *= reduceRatio
# verify
meanBase = sum(base)/nbPoints
print("new mean:",meanBase)
# draw
from matplotlib import pyplot as plt
X =[i*msPerPoint for i in range(0,nbPoints)]
plt.plot(X,base)
plt.plot(X,orange)
plt.show()
new mean: -0.5999999999999998
Hope you enjoy simple things :)
I have a piece of code that worked well when I optimized an advertising budget with 2 variables (channels), but when I added additional channels, it stopped optimizing, with no error messages.
import numpy as np
import scipy.optimize as sco
# setup variables
media_budget = 100000 # total media budget
media_labels = ['launchvideoviews', 'conversion', 'traffic', 'videoviews', 'reach'] # channel names
media_coefs = [0.3524764781, 5.606903166, -0.1761937775, 5.678596017, 10.50445914] #
# model coefficients
media_drs = [-1.15, 2.09, 6.7, -0.201, 1.21] # diminishing returns
const = -243.1018144
# the function for our model
def model_function(x, media_coefs, media_drs, const):
    # transform variables and multiply them by coefficients to get contributions
    channel_1_contrib = media_coefs[0] * x[0]**media_drs[0]
    channel_2_contrib = media_coefs[1] * x[1]**media_drs[1]
    channel_3_contrib = media_coefs[2] * x[2]**media_drs[2]
    channel_4_contrib = media_coefs[3] * x[3]**media_drs[3]
    channel_5_contrib = media_coefs[4] * x[4]**media_drs[4]
    # sum contributions and add constant
    y = channel_1_contrib + channel_2_contrib + channel_3_contrib + channel_4_contrib + channel_5_contrib + const
    # return negative conversions for the minimize function to work
    return -y
# set up guesses, constraints and bounds
num_media_vars = len(media_labels)
guesses = num_media_vars*[media_budget/num_media_vars,] # starting guesses: divide budget evenly
args = (media_coefs, media_drs, const) # pass non-optimized values into model_function
con_1 = {'type': 'eq', 'fun': lambda x: np.sum(x) - media_budget} # so we can't go over budget
constraints = (con_1)
bound = (0, media_budget) # spend for a channel can't be negative or higher than budget
bounds = tuple(bound for x in range(5))
# run the SciPy Optimizer
solution = sco.minimize(model_function, x0=guesses, args=args, method='SLSQP', constraints=constraints, bounds=bounds)
# print out the solution
print(f"Spend: ${round(float(media_budget),2)}\n")
print(f"Optimized CPA: ${round(media_budget/(-1 * solution.fun),2)}")
print("Allocation:")
for i in range(len(media_labels)):
    print(f"-{media_labels[i]}: ${round(solution.x[i],2)} ({round(solution.x[i]/media_budget*100,2)}%)")
And the result is
Spend: $100000.0
Optimized CPA: $-0.0
Allocation:
-launchvideoviews: $20000.0 (20.0%)
-conversion: $20000.0 (20.0%)
-traffic: $20000.0 (20.0%)
-videoviews: $20000.0 (20.0%)
-reach: $20000.0 (20.0%)
Which is the same as the initial guesses argument.
Thank you very much!
Update: Following @joni's comment, I passed the gradient function explicitly, but still got no result.
I don't yet know how to change the constraints to test @chthonicdaemon's comment.
import numpy as np
import scipy.optimize as sco
# setup variables
media_budget = 100000 # total media budget
media_labels = ['launchvideoviews', 'conversion', 'traffic', 'videoviews', 'reach'] # channel names
media_coefs = [0.3524764781, 5.606903166, -0.1761937775, 5.678596017, 10.50445914] #
# model coefficients
media_drs = [-1.15, 2.09, 6.7, -0.201, 1.21] # diminishing returns
const = -243.1018144
# the function for our model
def model_function(x, media_coefs, media_drs, const):
    # transform variables and multiply them by coefficients to get contributions
    channel_1_contrib = media_coefs[0] * x[0]**media_drs[0]
    channel_2_contrib = media_coefs[1] * x[1]**media_drs[1]
    channel_3_contrib = media_coefs[2] * x[2]**media_drs[2]
    channel_4_contrib = media_coefs[3] * x[3]**media_drs[3]
    channel_5_contrib = media_coefs[4] * x[4]**media_drs[4]
    # sum contributions and add constant (objective function)
    y = channel_1_contrib + channel_2_contrib + channel_3_contrib + channel_4_contrib + channel_5_contrib + const
    # return negative conversions for the minimize function to work
    return -y
# partial derivative of the objective function
def fun_der(x, media_coefs, media_drs, const):
    d_chan1 = 1
    d_chan2 = 1
    d_chan3 = 1
    d_chan4 = 1
    d_chan5 = 1
    return np.array([d_chan1, d_chan2, d_chan3, d_chan4, d_chan5])
# set up guesses, constraints and bounds
num_media_vars = len(media_labels)
guesses = num_media_vars*[media_budget/num_media_vars,] # starting guesses: divide budget evenly
args = (media_coefs, media_drs, const) # pass non-optimized values into model_function
con_1 = {'type': 'eq', 'fun': lambda x: np.sum(x) - media_budget} # so we can't go over budget
constraints = (con_1)
bound = (0, media_budget) # spend for a channel can't be negative or higher than budget
bounds = tuple(bound for x in range(5))
# run the SciPy Optimizer
solution = sco.minimize(model_function, x0=guesses, args=args, method='SLSQP', constraints=constraints, bounds=bounds, jac=fun_der)
# print out the solution
print(f"Spend: ${round(float(media_budget),2)}\n")
print(f"Optimized CPA: ${round(media_budget/(-1 * solution.fun),2)}")
print("Allocation:")
for i in range(len(media_labels)):
    print(f"-{media_labels[i]}: ${round(solution.x[i],2)} ({round(solution.x[i]/media_budget*100,2)}%)")
The reason you are not able to solve this exact problem turns out to be the specific coefficients you have. For the problem as specified, the optimum appears to lie near allocations where some spends are zero. However, at spends near zero, the negative exponents in media_drs make the objective function blow up towards infinity. I believe this is what causes the issues you are experiencing. I can get a solution with success = True by changing the 6.7 to 0.7 in the coefficients and setting a lower bound larger than 0 to stop the objective function from exploding. So this isn't so much a programming issue as a problem formulation issue.
I cannot imagine it would be true that you would see more payoff when you reduce the budget on a particular item, so all the negative powers in media_drs seem off to me.
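To see the blow-up near zero concretely, here is a small sketch that evaluates only the first channel's term from the question at ever smaller spends. The helper name contribution is hypothetical; the coefficient and exponent are the first entries of media_coefs and media_drs.

```python
# First channel from the question: coefficient and (negative) exponent
coef, exponent = 0.3524764781, -1.15


def contribution(spend):
    """Contribution of a single channel, coef * spend**exponent."""
    return coef * spend ** exponent


# The smaller the spend, the larger the modelled contribution becomes
for spend in (20000.0, 1.0, 1e-3, 1e-6):
    print(f"spend={spend:>10g}  contribution={contribution(spend):.6g}")
```

With a negative exponent the term grows without limit as the spend approaches zero, which is why a lower bound above 0 helps the optimizer.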
I will also post some improvements I made while debugging this issue. Notice that I'm using numpy arrays more, which makes some of the functions easier to read. Also notice how I have calculated a correct Jacobian:
import numpy as np
import scipy.optimize as sco
# setup variables
media_budget = 100000 # total media budget
media_labels = ['launchvideoviews', 'conversion', 'traffic', 'videoviews', 'reach'] # channel names
media_coefs = np.array([0.3524764781, 5.606903166, -0.1761937775, 5.678596017, 10.50445914]) #
# model coefficients
media_drs = np.array([-1.15, 2.09, 1.7, -0.201, 1.21]) # diminishing returns
const = -243.1018144
# the function for our model
def model_function(x, media_coefs, media_drs, const):
    # transform variables and multiply them by coefficients to get contributions
    channel_contrib = media_coefs * x**media_drs
    # sum contributions and add constant
    y = channel_contrib.sum() + const
    # return negative conversions for the minimize function to work
    return -y
def model_function_jac(x, media_coefs, media_drs, const):
    dy_dx = media_coefs * media_drs * x**(media_drs-1)
    return -dy_dx
# set up guesses, constraints and bounds
num_media_vars = len(media_labels)
guesses = num_media_vars*[media_budget/num_media_vars,] # starting guesses: divide budget evenly
args = (media_coefs, media_drs, const) # pass non-optimized values into model_function
con_1 = {'type': 'ineq', 'fun': lambda x: media_budget - sum(x)} # so we can't go over budget
constraints = (con_1,)
bound = (10, media_budget) # spend for a channel can't be negative or higher than budget
bounds = tuple(bound for x in range(5))
# run the SciPy Optimizer
solution = sco.minimize(
    model_function, x0=guesses, args=args,
    method='SLSQP',
    jac=model_function_jac,
    constraints=constraints,
    bounds=bounds
)
# print out the solution
print(solution)
print(f"Spend: ${round(float(media_budget),2)}\n")
print(f"Optimized CPA: ${round(media_budget/(-1 * solution.fun),2)}")
print("Allocation:")
for i in range(len(media_labels)):
    print(f"-{media_labels[i]}: ${round(solution.x[i],2)} ({round(solution.x[i]/media_budget*100,2)}%)")
This solution at least "works" in the sense that it reports a successful solve and returns an answer different from the initial guess.
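One way to gain some confidence in an analytic Jacobian is to compare it with central finite differences. The sketch below does that in plain Python for the modified coefficients used above; objective, analytic_grad and numeric_grad are hypothetical names, and the step size h=1.0 is an assumption that suits spends of this magnitude.

```python
media_coefs = [0.3524764781, 5.606903166, -0.1761937775, 5.678596017, 10.50445914]
media_drs = [-1.15, 2.09, 1.7, -0.201, 1.21]
const = -243.1018144


def objective(x):
    """Negative model output, as passed to the minimizer."""
    return -(sum(c * xi ** d for c, xi, d in zip(media_coefs, x, media_drs)) + const)


def analytic_grad(x):
    """d(objective)/dx_i = -coef_i * exponent_i * x_i**(exponent_i - 1)."""
    return [-c * d * xi ** (d - 1) for c, xi, d in zip(media_coefs, x, media_drs)]


def numeric_grad(x, h=1.0):
    """Central differences, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((objective(up) - objective(down)) / (2 * h))
    return grad


x0 = [20000.0] * 5
for a, n in zip(analytic_grad(x0), numeric_grad(x0)):
    print(f"analytic={a: .6e}  numeric={n: .6e}")
```

The two columns should agree closely, whereas a gradient of all ones, like the fun_der in the question, is far from the numeric values.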
I have two sets of frequency data, one from experiment and one from a theoretical formula. I want to use the minimize function of scipy.
Here's my code snippet,
where g is the coupling, which I want to find,
and ind is the inductance, plotted on the x-axis.
from scipy.optimize import minimize
def eigenfreq1_func(ind,w_q,w_r,g):
    return (w_q+w_r)+np.sqrt((w_q+w_r)**2.0-4*(w_q+w_r-g**2.0))/2
def eigenfreq2_func(ind,w_q,w_r,g):
    return (w_q+w_r)-np.sqrt((w_q+w_r)**2.0-4*(w_q+w_r-g**2))/2.0
def err_func(y1,y1_fit,y2,y2_fit):
    return np.sqrt((y1-y1_fit)**2+(y2-y2_fit)**2)
g_init=80e6
res1=eigenfreq1_func(ind,qubit_freq,readout_freq,g_init)
print res1
res2=eigenfreq2_func(ind,qubit_freq,readout_freq,g_init)
print res2
fit=minimize(err_func,args=[qubit_freq,res1,readout_freq,res2])
But it's showing the following error :
"TypeError: minimize() takes at least 2 arguments (2 given)"
First, the indentation in your example is messed up. I hope you don't try to run it like this.
Second, here is a baby example to minimize the chi2 with the function scipy.optimize.minimize (note you can minimize what you want: likelihood, |chi|**?, toto, etc.):
import numpy as np
import scipy.optimize as opt
def functionyouwanttofit(x,y,z,t,u):
    return np.array([x+y+z+t+u , x+y+z+t-u , x+y+z-t-u , x+y-z-t-u ]) # baby test here but put what you want
def calc_chi2(parameters):
    x,y,z,t,u = parameters
    data = np.array([100,250,300,500])
    # call the model under the same name it was defined with above
    chi2 = sum( (data-functionyouwanttofit(x,y,z,t,u))**2 )
    return chi2
# baby example for init, min & max values
x_init = 0
x_min = -1
x_max = 10
y_init = 1
y_min = -2
y_max = 9
z_init = 2
z_min = 0
z_max = 1000
t_init = 10
t_min = 1
t_max = 100
u_init = 10
u_min = 1
u_max = 100
parameters = [x_init,y_init,z_init,t_init,u_init]
bounds = [[x_min,x_max],[y_min,y_max],[z_min,z_max],[t_min,t_max],[u_min,u_max]]
result = opt.minimize(calc_chi2,parameters,bounds=bounds)
In your example you don't give initial values, and that is what the error is about: minimize needs both the function and an initial guess. This, on top of the indentation... Were you waiting for someone to do the job for you?
Third, note that the optimizers offered by scipy are not always suited to your needs. You may prefer a package such as lmfit.
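For a single fitted parameter, the call pattern can be sketched as follows. This is only an illustration under stated assumptions: the linear model is a hypothetical stand-in for the eigenfrequency formulas, and the "measured" data are synthetic. What carries over is the shape of the call: the objective takes the parameter vector first, the initial guess is the second positional argument, and the fixed data go in through args.

```python
import numpy as np
from scipy.optimize import minimize


def model(x, g):
    """Hypothetical stand-in for the theoretical formula: y = g * x."""
    return g * x


def err_func(params, x, y_measured):
    """Sum of squared residuals; the parameter vector must come first."""
    g = params[0]
    return np.sum((y_measured - model(x, g)) ** 2)


x = np.linspace(0.0, 1.0, 20)
y_measured = model(x, 3.0)        # synthetic "experiment" generated with g = 3

g_init = [1.0]                    # the initial guess that the question left out
fit = minimize(err_func, g_init, args=(x, y_measured))
print(fit.x[0], fit.success)
```

The fitted value is read from fit.x, and fit.success and fit.message report whether the optimizer considers itself converged.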
How can I generate random walk data between given start and end values
without going over the maximum value or under the minimum value?
Here is my attempt, but for some reason the series sometimes goes over the max or under the min value. It seems that the start and end values are respected, but not the minimum and maximum values. How can this be fixed? I would also like to specify the standard deviation of the fluctuations, but I don't know how. I use randomPerc for the fluctuation, but this is wrong, as I would like to specify the std instead.
import numpy as np
import matplotlib.pyplot as plt
def generateRandomData(length,randomPerc, min,max,start, end):
    data_np = (np.random.random(length) - randomPerc).cumsum()
    data_np *= (max - min) / (data_np.max() - data_np.min())
    data_np += np.linspace(start - data_np[0], end - data_np[-1], len(data_np))
    return data_np
randomData=generateRandomData(length = 1000, randomPerc = 0.5, min = 50, max = 100, start = 66, end = 80)
## print values
print("Max Value",randomData.max())
print("Min Value",randomData.min())
print("Start Value",randomData[0])
print("End Value",randomData[-1])
print("Standard deviation",np.std(randomData))
## plot values
plt.figure()
plt.plot(range(randomData.shape[0]), randomData)
plt.show()
plt.close()
Here is a simple loop which checks for series that go under the minimum or over the maximum value. This is exactly what I am trying to avoid. The series should stay between the given limits for the min and max values.
## generate 1000 series and check if there are any values over the maximum limit or under the minimum limit
for i in range(1000):
    randomData = generateRandomData(length = 1000, randomPerc = 0.5, min = 50, max = 100, start = 66, end = 80)
    if(randomData.min() < 50):
        print(i, "Value Lower than Min limit")
    if(randomData.max() > 100):
        print(i, "Value Higher than Max limit")
As you impose conditions on your walk, it can not be considered purely random. Anyway, one way is to generate the walk iteratively, and check the boundaries on each iteration. But if you wanted a vectorized solution, here it is:
def bounded_random_walk(length, lower_bound, upper_bound, start, end, std):
    assert (lower_bound <= start and lower_bound <= end)
    assert (start <= upper_bound and end <= upper_bound)
    bounds = upper_bound - lower_bound
    rand = (std * (np.random.random(length) - 0.5)).cumsum()
    rand_trend = np.linspace(rand[0], rand[-1], length)
    rand_deltas = (rand - rand_trend)
    rand_deltas /= np.max([1, (rand_deltas.max()-rand_deltas.min())/bounds])
    trend_line = np.linspace(start, end, length)
    upper_bound_delta = upper_bound - trend_line
    lower_bound_delta = lower_bound - trend_line
    upper_slips_mask = (rand_deltas-upper_bound_delta) >= 0
    upper_deltas = rand_deltas - upper_bound_delta
    rand_deltas[upper_slips_mask] = (upper_bound_delta - upper_deltas)[upper_slips_mask]
    lower_slips_mask = (lower_bound_delta-rand_deltas) >= 0
    lower_deltas = lower_bound_delta - rand_deltas
    rand_deltas[lower_slips_mask] = (lower_bound_delta + lower_deltas)[lower_slips_mask]
    return trend_line + rand_deltas
randomData = bounded_random_walk(1000, lower_bound=50, upper_bound =100, start=50, end=100, std=10)
You can see it as the solution of a geometric problem. The trend_line connects your start and end points and has margins defined by lower_bound and upper_bound. rand is your random walk, rand_trend is its trend line, and rand_deltas is its deviation from that trend line. We align the two trend lines and want to make sure that the deltas don't exceed the margins. When rand_deltas exceeds the allowed margin, we "fold" the excess back inside the bounds.
At the end, the resulting random deltas are added to the start=>end trend line, giving the desired bounded random walk.
The std parameter controls the amount of variance of the random walk.
Update: fixed assertions.
In this version, "std" is not guaranteed to be the "interval".
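The iterative alternative mentioned at the start of this answer could look roughly like the sketch below. It is not the vectorized method above: iterative_bounded_walk is a hypothetical name, the pull toward the end value and the clipping are my own assumptions, and std here is the standard deviation of each step's Gaussian noise (clipping and the pull will distort it somewhat).

```python
import random


def iterative_bounded_walk(length, lower_bound, upper_bound, start, end, std, seed=None):
    """Build the walk step by step, checking the bounds on every iteration."""
    rng = random.Random(seed)
    walk = [float(start)]
    for i in range(1, length - 1):
        steps_left = length - i                      # steps remaining until the end point
        drift = (end - walk[-1]) / steps_left        # gentle pull toward `end`
        value = walk[-1] + drift + rng.gauss(0.0, std)
        walk.append(min(max(value, lower_bound), upper_bound))   # clip to the bounds
    walk.append(float(end))                          # pin the final value
    return walk


walk = iterative_bounded_walk(1000, 50, 100, 66, 80, std=1.0, seed=0)
print(min(walk), max(walk), walk[0], walk[-1])
```

Because every value is clipped as it is generated, the bounds hold by construction; the price is a Python-level loop instead of array operations.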
I noticed you used the names of built-in functions (min and max) as arguments, which is not recommended (I changed these to min_1 and max_1). Other than this, your code should work as expected:
def generateRandomData(length,randomPerc, min_1,max_1,start, end):
    data_np = (np.random.random(length) - randomPerc).cumsum()
    data_np *= (max_1 - min_1) / (data_np.max() - data_np.min())
    data_np += np.linspace(start - data_np[0], end - data_np[-1],len(data_np))
    return data_np
randomData=generateRandomData(1000, 0.5, 50, 100, 66, 80)
If you are willing to modify your code this will work:
import random
for_fill=[]
# generate 1000 samples within the specified range and save them in for_fill
for x in range(1000):
    generate_rnd_df=random.uniform(50,100)
    for_fill.append(generate_rnd_df)
#set starting and end point manually
for_fill[0]=60
for_fill[999]=80
Here is one way, very crudely expressed in code.
>>> import random
>>> steps = 1000
>>> start = 66
>>> end = 80
>>> step_size = (50,100)
Generate 1,000 steps assured to be within the required range.
>>> crude_walk_steps = [random.uniform(*step_size) for _ in range(steps)]
>>> import numpy as np
Turn these steps into a walk but notice that they fail to meet the requirements.
>>> crude_walk = np.cumsum(crude_walk_steps)
>>> min(crude_walk)
57.099056617839288
>>> max(crude_walk)
75048.948693623403
Calculate a simple linear transformation to scale the steps.
>>> from sympy import *
>>> var('a b')
(a, b)
>>> solve([57.099056617839288*a+b-66,75048.948693623403*a+b-80])
{b: 65.9893403510312, a: 0.000186686954219243}
Scale the walk.
>>> walk = [0.000186686954219243*_+65.9893403510312 for _ in crude_walk]
Verify that the walk now starts and stops where intended.
>>> min(walk)
65.999999999999986
>>> max(walk)
79.999999999999986
You can also generate a stream of random walks and filter out those that do not meet your constraints. Just be aware that by filtering they are not really 'random' anymore.
The code below creates an infinite stream of 'valid' random walks. Be careful with
very tight constraints, the 'next' call might take a while ;).
import itertools
import numpy as np
def make_random_walk(first, last, min_val, max_val, size):
    # Generate a sequence of random steps of length `size-2`
    # that will be taken between the start and stop values.
    steps = np.random.normal(size=size-2)
    # The walk is the cumsum of those steps
    walk = steps.cumsum()
    # Performing the walk from the start value gives you your series.
    series = walk + first
    # Compare the target min and max values with the observed ones.
    target_min_max = np.array([min_val, max_val])
    observed_min_max = np.array([series.min(), series.max()])
    # Calculate the absolute 'overshoot' for min and max values
    f = np.array([-1, 1])
    overshoot = (observed_min_max*f - target_min_max*f)
    # Calculate the scale factor to constrain the walk within the
    # target min/max values.
    # Don't upscale.
    correction_base = [walk.min(), walk.max()][np.argmax(overshoot)]
    scale = min(1, (correction_base - overshoot.max()) / correction_base)
    # Generate the scaled series
    new_steps = steps * scale
    new_walk = new_steps.cumsum()
    new_series = new_walk + first
    # Check the size of the final step necessary to reach the target endpoint.
    last_step_size = abs(last - new_series[-1]) # step needed to reach desired end
    # Is it larger than the largest previously observed step?
    if last_step_size > np.abs(new_steps).max():
        # If so, consider this series invalid.
        return None
    else:
        # Else, we found a valid series that meets the constraints.
        return np.concatenate((np.array([first]), new_series, np.array([last])))
start = 66
stop = 80
max_val = 100
min_val = 50
size = 1000
# Create an infinite stream of candidate series
candidate_walks = (
    (i, make_random_walk(first=start, last=stop, min_val=min_val, max_val=max_val, size=size))
    for i in itertools.count()
)
# Filter out the invalid ones.
valid_walks = ((i, w) for i, w in candidate_walks if w is not None)
idx, walk = next(valid_walks) # Get the next valid series
print(
    "Walk #{}: min/max({:.2f}/{:.2f})"
    .format(idx, walk.min(), walk.max())
)
I have a range of dates and a measurement on each of those dates. I'd like to calculate an exponential moving average for each of the dates. Does anybody know how to do this?
I'm new to python. It doesn't appear that averages are built into the standard python library, which strikes me as a little odd. Maybe I'm not looking in the right place.
So, given the following code, how could I calculate the moving weighted average of IQ points for calendar dates?
from datetime import date
days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
IQ = [110, 105, 90]
(there's probably a better way to structure the data, any advice would be appreciated)
EDIT:
It seems that mov_average_expw() function from scikits.timeseries.lib.moving_funcs submodule from SciKits (add-on toolkits that complement SciPy) better suits the wording of your question.
To calculate an exponential smoothing of your data with a smoothing factor alpha (it is (1 - alpha) in Wikipedia's terms):
>>> alpha = 0.5
>>> assert 0 < alpha <= 1.0
>>> today = max(days)
>>> sum(alpha**(today - day).days * iq for day, iq in zip(days, IQ))
95.0
The above is not pretty, so let's refactor it a bit:
from collections import namedtuple
from operator import itemgetter
def smooth(iq_data, alpha=1, today=None):
    """Perform exponential smoothing with factor `alpha`.

    Time period is a day.
    Each time period the value of `iq` drops `alpha` times.
    The most recent data is the most valuable one.
    """
    assert 0 < alpha <= 1
    if alpha == 1: # no smoothing
        return sum(map(itemgetter(1), iq_data))
    if today is None:
        today = max(map(itemgetter(0), iq_data))
    return sum(alpha**((today - date).days) * iq for date, iq in iq_data)
IQData = namedtuple("IQData", "date iq")
if __name__ == "__main__":
    from datetime import date
    days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
    IQ = [110, 105, 90]
    iqdata = list(map(IQData, days, IQ))
    print("\n".join(map(str, iqdata)))
    print(smooth(iqdata, alpha=0.5))
Example:
$ python26 smooth.py
IQData(date=datetime.date(2008, 1, 1), iq=110)
IQData(date=datetime.date(2008, 1, 2), iq=105)
IQData(date=datetime.date(2008, 1, 7), iq=90)
95.0
I always calculate EMAs with pandas.
Here is an example of how to do it:
import pandas as pd
def ema(values, period):
    # pd.ewma() has been removed from pandas; Series.ewm(...).mean() replaces it
    return pd.Series(values).ewm(span=period).mean().iloc[-1]
values = [9, 5, 10, 16, 5]
period = 5
print(ema(values, period))
More info about pandas EWMA:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ewma.html
I did a bit of googling and I found the following sample code (http://osdir.com/ml/python.matplotlib.general/2005-04/msg00044.html):
from numpy import array
def ema(s, n):
    """
    returns an n period exponential moving average for
    the time series s
    s is a list ordered from oldest (index 0) to most
    recent (index -1)
    n is an integer
    returns a numeric array of the exponential
    moving average
    """
    s = array(s)
    ema = []
    j = 1
    #get n sma first and calculate the next n period ema
    sma = sum(s[:n]) / n
    multiplier = 2 / float(1 + n)
    ema.append(sma)
    #EMA(current) = ( (Price(current) - EMA(prev) ) x Multiplier) + EMA(prev)
    ema.append(( (s[n] - sma) * multiplier) + sma)
    #now calculate the rest of the values
    for i in s[n+1:]:
        tmp = ( (i - ema[j]) * multiplier) + ema[j]
        j = j + 1
        ema.append(tmp)
    return ema
You can also use the SciPy filter method because the EMA is an IIR filter. This will have the benefit of being approximately 64 times faster as measured on my system using timeit on large data sets when compared to the enumerate() approach.
import numpy as np
from scipy.signal import lfilter
x = np.random.normal(size=1234)
alpha = .1 # smoothing coefficient
zi = [x[0]] # seed the filter state with first value
# filter can process blocks of continuous data if <zi> is maintained
y, zi = lfilter([1.-alpha], [1., -alpha], x, zi=zi)
I don't know Python, but for the averaging part, do you mean an exponentially decaying low-pass filter of the form
y_new = y_old + (input - y_old)*alpha
where alpha = dt/tau, dt = the timestep of the filter, tau = the time constant of the filter? (the variable-timestep form of this is as follows, just clip dt/tau to not be more than 1.0)
y_new = y_old + (input - y_old)*dt/tau
If you want to filter something like a date, make sure you convert to a floating-point quantity like # of seconds since Jan 1 1970.
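Applied to the dates in the question, the variable-timestep form might be sketched like this. The function name ema_irregular and the time constant of 10 days are assumptions made for illustration; the update rule and the clipping of dt/tau to 1.0 are the ones described above.

```python
from datetime import date


def ema_irregular(days, values, tau_days):
    """Low-pass filter with variable timestep: y += (input - y) * min(dt / tau, 1)."""
    y = float(values[0])                  # start the filter at the first measurement
    out = [y]
    for prev_day, day, value in zip(days, days[1:], values[1:]):
        dt = (day - prev_day).days        # elapsed time between samples, in days
        alpha = min(dt / tau_days, 1.0)   # clip so a long gap cannot overshoot
        y = y + (value - y) * alpha
        out.append(y)
    return out


days = [date(2008, 1, 1), date(2008, 1, 2), date(2008, 1, 7)]
IQ = [110, 105, 90]
print(ema_irregular(days, IQ, tau_days=10))
```

A longer gap between two dates gives the newer measurement more weight, which is the point of scaling alpha by dt.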
My python is a little bit rusty (anyone can feel free to edit this code to make corrections, if I've messed up the syntax somehow), but here goes....
def movingAverageExponential(values, alpha, epsilon = 0):
    if not 0 < alpha < 1:
        raise ValueError("out of range, alpha='%s'" % alpha)
    if not 0 <= epsilon < alpha:
        raise ValueError("out of range, epsilon='%s'" % epsilon)
    result = [None] * len(values)
    for i in range(len(result)):
        currentWeight = 1.0
        numerator = 0
        denominator = 0
        for value in values[i::-1]:
            numerator += value * currentWeight
            denominator += currentWeight
            currentWeight *= alpha
            if currentWeight < epsilon:
                break
        result[i] = numerator / denominator
    return result
For each position i, this function works backward from values[i] toward the beginning of the list, accumulating the weighted values until the weight coefficient for an element is less than the given epsilon.
The results are written straight into a preallocated list, so they are already in the correct order for the caller and nothing has to be reversed at the end.
(SIDE NOTE: the preallocation is done with [None] * len(values). In python lists, appending is much less expensive than prepending, so the alternative would be to build the list in reverse order and reverse it at the end. Please correct me if I'm wrong.)
The 'alpha' argument is the decay factor on each iteration. For example, if you used an alpha of 0.5, then today's moving average value would be composed of the following weighted values:
today: 1.0
yesterday: 0.5
2 days ago: 0.25
3 days ago: 0.125
...etc...
Of course, if you've got a huge array of values, the values from ten or fifteen days ago won't contribute very much to today's weighted average. The 'epsilon' argument lets you set a cutoff point, below which you will cease to care about old values (since their contribution to today's value will be insignificant).
You'd invoke the function something like this:
result = movingAverageExponential(values, 0.75, 0.0001)
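To get a feel for what the epsilon cutoff means in practice, this small sketch counts how many values contribute to each average for a given alpha and epsilon. The helper contributing_terms is hypothetical; it mirrors the order of operations in the inner loop above (add the value, decay the weight, then test against epsilon) and assumes the list is long enough not to run out first.

```python
import math


def contributing_terms(alpha, epsilon):
    """Number of values summed before the decayed weight drops below epsilon."""
    if not 0 < alpha < 1 or not 0 < epsilon < alpha:
        raise ValueError("need 0 < epsilon < alpha < 1")
    weight, count = 1.0, 0
    while True:
        count += 1                # the value carrying the current weight is included
        weight *= alpha
        if weight < epsilon:
            return count


alpha, epsilon = 0.75, 0.0001
print(contributing_terms(alpha, epsilon))
# Closed-form estimate of the same cutoff
print(math.ceil(math.log(epsilon) / math.log(alpha)))
```

So with the call shown above, only the most recent few dozen values enter each average, no matter how long the input list is.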
The matplotlib.org examples (http://matplotlib.org/examples/pylab_examples/finance_work2.html) include a good example of an Exponential Moving Average (EMA) function using numpy:
import numpy as np
def moving_average(x, n, type):
    x = np.asarray(x)
    if type=='simple':
        weights = np.ones(n)
    else:
        weights = np.exp(np.linspace(-1., 0., n))
    weights /= weights.sum()
    a = np.convolve(x, weights, mode='full')[:len(x)]
    a[:n] = a[n]
    return a
I found the above code snippet by @earino pretty useful, but I needed something that could continuously smooth a stream of values, so I refactored it to this:
def exponential_moving_average(period=1000):
    """ Exponential moving average. Smooths the values sent in over the period. Send in values - at first it'll return a simple average, but as soon as it has gathered 'period' values, it'll start to use the Exponential Moving Average to smooth the values.
    period: int - how many values to smooth over (default=1000). """
    multiplier = 2 / float(1 + period)
    cum_temp = yield None  # We are being primed
    # Start by just returning the simple average until we have enough data.
    # The loop stops one short of `period` so that exactly `period` values
    # have been accumulated when the simple average below is taken.
    for i in range(1, period):
        cum_temp += yield cum_temp / float(i)
    # Grab the simple average
    ema = cum_temp / period
    # and start calculating the exponentially smoothed average
    while True:
        ema = (((yield ema) - ema) * multiplier) + ema
and I use it like this:
def temp_monitor(pin):
    """ Read from the temperature monitor - and smooth the value out. The sensor is noisy, so we use exponential smoothing. """
    ema = exponential_moving_average()
    next(ema)  # Prime the generator
    while True:
        yield ema.send(val_to_temp(pin.read()))
(where pin.read() produces the next value I'd like to consume).
Possibly the shortest:
#Specify decay in terms of span
#data_series should be a DataFrame
ema=data_series.ewm(span=5, adjust=False).mean()
import pandas_ta as ta
data["EMA3"] = ta.ema(data["close"], length=3)
pandas_ta is a technical analysis library: https://github.com/twopirllc/pandas-ta. The code above calculates the Exponential Moving Average (EMA) for a series. You can specify the lag value using 'length'. Specifically, the code above calculates the 3-day EMA.
Here is a simple sample I worked up based on http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages
Note that unlike in their spreadsheet, I don't calculate the SMA, and I don't wait to generate the EMA after 10 samples. This means my values differ slightly, but if you chart it, it follows exactly after 10 samples. During the first 10 samples, the EMA I calculate is appropriately smoothed.
def emaWeight(numSamples):
    return 2 / float(numSamples + 1)
def ema(close, prevEma, numSamples):
    return ((close-prevEma) * emaWeight(numSamples) ) + prevEma
samples = [
    22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29,
    22.15, 22.39, 22.38, 22.61, 23.36, 24.05, 23.75, 23.83, 23.95, 23.63,
    23.82, 23.87, 23.65, 23.19, 23.10, 23.33, 22.68, 23.10, 22.40, 22.17,
]
emaCap = 10
e=samples[0]
for s in range(len(samples)):
    # use a growing window until emaCap samples have been seen
    numSamples = emaCap if s > emaCap else s
    e = ema(samples[s], e, numSamples)
    print(e)
I'm a little late to the party here, but none of the solutions given were what I was looking for. Nice little challenge using recursion and the exact formula given in investopedia.
No numpy or pandas required.
prices = [{'i': 1, 'close': 24.5}, {'i': 2, 'close': 24.6}, {'i': 3, 'close': 24.8}, {'i': 4, 'close': 24.9},
{'i': 5, 'close': 25.6}, {'i': 6, 'close': 25.0}, {'i': 7, 'close': 24.7}]
def rec_calculate_ema(n):
    k = 2 / (n + 1)
    price = prices[n]['close']
    if n == 1:
        return price
    res = (price * k) + (rec_calculate_ema(n - 1) * (1 - k))
    return res
print(rec_calculate_ema(3))
A fast way (copy-pasted from here) is the following:
def ExpMovingAverage(values, window):
    """ Numpy implementation of EMA
    """
    weights = np.exp(np.linspace(-1., 0., window))
    weights /= weights.sum()
    a = np.convolve(values, weights, mode='full')[:len(values)]
    a[:window] = a[window]
    return a
I am using a list and a rate of decay as inputs. I hope this little function with just two lines may help you here, considering deep recursion is not stable in python.
def expma(aseries, ratio):
    return sum([ratio*aseries[-x-1]*((1-ratio)**x) for x in range(len(aseries))])
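The same number can also be obtained in a single pass without computing powers, which may be handy for long lists. This is a sketch for comparison only; expma_single_pass is a hypothetical name, and starting the running value at 0 is what makes it match the sum above term for term.

```python
def expma(aseries, ratio):
    """Closed-form sum, as in the two-line function above."""
    return sum(ratio * aseries[-x - 1] * (1 - ratio) ** x for x in range(len(aseries)))


def expma_single_pass(aseries, ratio):
    """Equivalent recursion: y = ratio * value + (1 - ratio) * y, starting from 0."""
    y = 0.0
    for value in aseries:
        y = ratio * value + (1 - ratio) * y
    return y


data = [1, 2, 3]
print(expma(data, 0.5), expma_single_pass(data, 0.5))
```

Both versions implicitly treat everything before the first element as 0, so results for short lists are pulled toward zero.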
More simply, using pandas:
def EMA(tw):
    for x in tw:
        data["EMA{}".format(x)] = data['close'].ewm(span=x, adjust=False).mean()
EMA([10,50,100])
Papahaba's answer was almost what I was looking for (thanks!) but I needed to match initial conditions. Using an IIR filter with scipy.signal.lfilter is certainly the most efficient. Here's my redux:
Given a NumPy vector, x
import numpy as np
from scipy import signal
period = 12
b = np.array((1,), 'd')
a = np.array((period, 1-period), 'd')
zi = signal.lfilter_zi(b, a)
y, zi = signal.lfilter(b, a, x, zi=zi*x[0:1])
Get the N-point EMA (here, 12) returned in the vector y