Related
I want to calculate the BD-Rate for two different video encoding settings using the python script below.
Using 4 RD Points (R1 and PSNR1 are the reference RD Points of the Video1 while R2 and PSNR2 are the new tests with different video settings of Video2) the script works fine ie
from bjontegaard_metric import *
R1 = np.array([686.76, 309.58, 157.11, 85.95])
PSNR1 = np.array([40.28, 37.18, 34.24, 31.42])
R2 = np.array([893.34, 407.8, 204.93, 112.75])
PSNR2 = np.array([40.39, 37.21, 34.17, 31.24])
print 'BD-PSNR: ', BD_PSNR(R1, PSNR1, R2, PSNR2)
print 'BD-RATE: ', BD_RATE(R1, PSNR1, R2, PSNR2)
But with just 1 RD Point ie
from bjontegaard_metric import *
R1 = np.array([686.76])
PSNR1 = np.array([40.28])
R2 = np.array([893.34])
PSNR2 = np.array([40.39])
print 'BD-PSNR: ', BD_PSNR(R1, PSNR1, R2, PSNR2)
print 'BD-RATE: ', BD_RATE(R1, PSNR1, R2, PSNR2)
I get a warning: RankWarning: Polyfit may be poorly conditioned. Each video encoder run, returns just one pair of PSNR and Bitrate as a result. So I want to compare two pairs of PSNR/BitRate (Reference video & modified video). Is there any way to fix this warning? The results I get using only 1 RD point are reliable?
import numpy as np
import scipy.interpolate
def BD_PSNR(R1, PSNR1, R2, PSNR2, piecewise=0):
lR1 = np.log(R1)
lR2 = np.log(R2)
PSNR1 = np.array(PSNR1)
PSNR2 = np.array(PSNR2)
p1 = np.polyfit(lR1, PSNR1, 3)
p2 = np.polyfit(lR2, PSNR2, 3)
# integration interval
min_int = max(min(lR1), min(lR2))
max_int = min(max(lR1), max(lR2))
# find integral
if piecewise == 0:
p_int1 = np.polyint(p1)
p_int2 = np.polyint(p2)
int1 = np.polyval(p_int1, max_int) - np.polyval(p_int1, min_int)
int2 = np.polyval(p_int2, max_int) - np.polyval(p_int2, min_int)
else:
# See https://chromium.googlesource.com/webm/contributor-guide/+/master/scripts/visual_metrics.py
lin = np.linspace(min_int, max_int, num=100, retstep=True)
interval = lin[1]
samples = lin[0]
v1 = scipy.interpolate.pchip_interpolate(np.sort(lR1), PSNR1[np.argsort(lR1)], samples)
v2 = scipy.interpolate.pchip_interpolate(np.sort(lR2), PSNR2[np.argsort(lR2)], samples)
# Calculate the integral using the trapezoid method on the samples.
int1 = np.trapz(v1, dx=interval)
int2 = np.trapz(v2, dx=interval)
# find avg diff
avg_diff = (int2-int1)/(max_int-min_int)
return avg_diff
def BD_RATE(R1, PSNR1, R2, PSNR2, piecewise=0):
lR1 = np.log(R1)
lR2 = np.log(R2)
# rate method
p1 = np.polyfit(PSNR1, lR1, 3)
p2 = np.polyfit(PSNR2, lR2, 3)
# integration interval
min_int = max(min(PSNR1), min(PSNR2))
max_int = min(max(PSNR1), max(PSNR2))
# find integral
if piecewise == 0:
p_int1 = np.polyint(p1)
p_int2 = np.polyint(p2)
int1 = np.polyval(p_int1, max_int) - np.polyval(p_int1, min_int)
int2 = np.polyval(p_int2, max_int) - np.polyval(p_int2, min_int)
else:
lin = np.linspace(min_int, max_int, num=100, retstep=True)
interval = lin[1]
samples = lin[0]
v1 = scipy.interpolate.pchip_interpolate(np.sort(PSNR1), lR1[np.argsort(PSNR1)], samples)
v2 = scipy.interpolate.pchip_interpolate(np.sort(PSNR2), lR2[np.argsort(PSNR2)], samples)
# Calculate the integral using the trapezoid method on the samples.
int1 = np.trapz(v1, dx=interval)
int2 = np.trapz(v2, dx=interval)
# find avg diff
avg_exp_diff = (int2-int1)/(max_int-min_int)
avg_diff = (np.exp(avg_exp_diff)-1)*100
return avg_diff
According to IETF at https://tools.ietf.org/id/draft-ietf-netvc-testing-06.html#rfc.section.4.2 number 2 At least four points must be computed. These points should be the same quantizers when comparing two versions of the same codec. So any lesser points than 4 are not valid for reliable results.
1. Rate/distortion points are calculated for the reference and test codec.
2. At least four points must be computed. These points should be the same quantizers when comparing two versions of the same codec.
3. Additional points outside of the range should be discarded.
4. The rates are converted into log-rates.
5. A piecewise cubic hermite interpolating polynomial is fit to the points for each codec to produce functions of log-rate in terms of distortion.
Metric score ranges are computed:
1. If comparing two versions of the same codec, the overlap is the intersection of the two curves, bound by the chosen quantizer points.
2. If comparing dissimilar codecs, a third anchor codec’s metric scores at fixed quantizers are used directly as the bounds.
3. The log-rate is numerically integrated over the metric range for each curve, using at least 1000 samples and trapezoidal integration.
4. The resulting integrated log-rates are converted back into linear rate, and then the percent difference is calculated from the reference to the test codec.
I'm trying to calculate the Fourier transform of three muon polarization signals, which are simply cosine functions multiplied by an exponential decay.
So, doing the Fourier transform, we are going to see broadened peaks centered at the corresponding frequency.
The problem is that I have already tried to do the Fourier transform, but I do not know if it's correct; furthermore, I'm trying to calculate the FWHM using the scipy.stats.moment function, using the 2-nd moment: is it correct?
Can you tell me if the code is correct?
I put here the three signals in .npy file and the code used for the Fourier analysis.
The signals are signal[0], signal[1] and signal[2], arrays of 10 dimension.
Each signal[k] contains 10 polarization functions (1 for each applied magnetic field), which are signals of 400 points.
The corresponding files are signal_100, signal_110, signal_111, provided here:
https://github.com/JonathanFrassineti/UNDI-examples.
Ah, the frequencies range from 0 Hz to 40 MHz.
Thank you!
N = 400 # Number of signal points.
N1 = 40000000
T = 1./800. # Sampling spacing.
xf = np.fft.rfftfreq(N1, T)
yf1 = FWHM1 = sigma1 = delta1 = bhar1 = np.zeros(fields, dtype = object)
yf2 = FWHM2 = sigma2 = delta2 = bhar2 = np.zeros(fields, dtype = object)
yf3 = FWHM3 = sigma3 = delta3 = bhar3 = np.zeros(fields, dtype = object)
for j in range(fields):
# Fourier transform.
yf1[j] = np.fft.rfft(signal[0][j])
yf2[j] = np.fft.rfft(signal[1][j])
yf3[j] = np.fft.rfft(signal[2][j])
FWHM1[j] = moment(yf1[j], moment=2)
FWHM2[j] = moment(yf2[j], moment=2)
FWHM3[j] = moment(yf3[j], moment=2)
sigma1[j] = np.sqrt(np.abs(FWHM3[j]))/2.355
sigma2[j] = np.sqrt(np.abs(FWHM2[j]))/2.355
sigma3[j] = np.sqrt(np.abs(FWHM3[j]))/2.355
delta1[j] = sigma1[j]/gamma_Cu
delta2[j] = sigma2[j]/gamma_Cu
delta3[j] = sigma3[j]/gamma_Cu
bhar1[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta1[j]
bhar2[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta2[j]
bhar3[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta3[j]
Currently i work in a python project with same object. I've a set of data of magnetic field B(x,y,z), i think ideal would be to organize your data periodically at event and deduce Fe (sampling_rate).
f(A, t)=A*( cos(2*pi*fe*t) - sin(2*pi*fe*t)
B=[ 50, 50, 10, 3 ] # where each data is |B| normal at second
res=[ f(a, time) for time, a in enumerate(B) ]
fourrier_transform=np.fft.fft( res )
frequency= fftfreq([ time for time in range(len(B)) ]) # U can use fftfreq provide by scipy
Please star this project, research ressource to contribute
RFSignalToolkit github project
Any help would be much appreciated. I am trying to extract the BPM from any .wav file that is loaded onto the python script by using the pyAudioAnalysis library. For some reason it is not outputting the correct BPM? I tried to change the window size in the beat_extraction() function but it only allows numbers under 1 second and when I change the window size, the BPM seem to change. But when kept at 1 second window it outputs 30 every time.
The following is my code:
from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
from pyAudioAnalysis import MidTermFeatures
import matplotlib.pyplot as plt
import numpy as np
import warnings
file_name = "unity_alan.wav"
# Extract the Sampling Rate (Fs) and the Raw Signal Data (signal)
[Fs, signal] = audioBasicIO.read_audio_file(file_name)
# Uncomment if signal has two channels
# signal = signal[:,0]
# Function to Normalize the Signal
def normalize_signal(signal):
signal = np.double(signal)
return (signal - signal.mean()) / ((np.abs(signal)).max() + 0.0000000001)
# Short Term Features
# For each short-term window a set of features is extracted. This would result in a sequence of feature
# vectors stored in a np matrix (in this case Features_midTerm)
signal = normalize_signal(signal)
# Fs - frequency sampling Rate
print("The Sampling Freq: ",Fs)
# Total Signal Len
signal_len = len(signal)
print("Total Signal Length: ",signal_len)
# Total Time Len of Song in Seconds
len_of_song = signal_len / Fs
print("Total Song Time (s): ",len_of_song)
# Window Size in mseconds
windowSize_in_ms = 50
windowSize_in_s = windowSize_in_ms/1000
# Window Size in Samples
windowSize_in_samples = Fs * (windowSize_in_ms / 1000) #divide by 1000 to turn to seconds
print("Window Size (samples): ",windowSize_in_samples)
# Window Step in mseconds
wStep_in_ms = 25
# Window Step in Samples
wStep_in_samples = Fs * (wStep_in_ms / 1000) #divide by 1000 to turn to seconds
print("Window Step in Samples: ", wStep_in_samples)
# Oversampling Percentage
oversampling_Percentage = (wStep_in_ms / windowSize_in_ms) * 100
print("Oversampling Percentage (overlap of windows): ",oversampling_Percentage)
# Total Number of Windows in Signal
total_number_windows = signal_len / windowSize_in_samples
print("Total Number of Windows in Signal: ",total_number_windows)
# Total number of feature samples produced (windows/size * total windows)
feature_samples_points_total_cal = int(total_number_windows * (windowSize_in_ms/wStep_in_ms))
print("Calculated Total of Points Produced per Short Term Feature: ",feature_samples_points_total_cal)
# Extract features and their names. Each index has its own vector of features. The total number should be the same
# as the calculated total of points produced per feature.
Features_shortTerm, feature_names = ShortTermFeatures.feature_extraction(signal, Fs, windowSize_in_samples, wStep_in_samples)
# Exact Number of points in the Features
feature_samples_points_total_exact = len(Features_shortTerm[0])
print("Exact Total of Points Produced per Short Term Feature: ",feature_samples_points_total_exact)
# Mid-window (in seconds)
mid_window_seconds = int(1 * Fs)
# Mid-step (in seconds)
mid_step_seconds = int(1 * Fs)
# MID FEATURE Extraction
Features_midTerm, short_Features_ignore, mid_feature_names = MidTermFeatures.mid_feature_extraction(signal,Fs,mid_window_seconds,mid_step_seconds,windowSize_in_samples,wStep_in_samples)
# Exact Mid-Term Feature Total Number of Points
midTerm_features_total_points = len(Features_midTerm)
print("Exact Mid-Term Total Number of Feature Points: ",midTerm_features_total_points)
# Beats per min
# The Tempo of music determins the speed at which it is played (measured in BPM)
bpm,confidence_ratio = MidTermFeatures.beat_extraction(Features_shortTerm,1)
print("Beats per Minute (bpm): ",bpm)
print("Confidence ratio for BPM: ", confidence_ratio)
# Figure out why the BPM does not match the actual reading
# of 115 BPM. It is showing 30 BPM which is for sure wrong.
The output of my script is as follows:
The Sampling Freq: 48000
Total Signal Length: 10992884
Total Song Time (s): 229.01841666666667
Window Size (samples): 2400.0
Window Step in Samples: 1200.0
Oversampling Percentage (overlap of windows): 50.0
Total Number of Windows in Signal: 4580.368333333333
Calculated Total of Points Produced per Short Term Feature: 9160
Exact Total of Points Produced per Short Term Feature: 9159
Exact Mid-Term Total Number of Feature Points: 136
Beats per Minute (bpm): 30.0
Confidence ratio for BPM: 1.0
The library's def of the function is as follows:
def beat_extraction(short_features, window_size, plot=False):
"""
This function extracts an estimate of the beat rate for a musical signal.
ARGUMENTS:
- short_features: a np array (n_feats x numOfShortTermWindows)
- window_size: window size in seconds
RETURNS:
- bpm: estimates of beats per minute
- ratio: a confidence measure
"""
# Features that are related to the beat tracking task:
selected_features = [0, 1, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18]
max_beat_time = int(round(2.0 / window_size))
hist_all = np.zeros((max_beat_time,))
# for each feature
for ii, i in enumerate(selected_features):
# dif threshold (3 x Mean of Difs)
dif_threshold = 2.0 * (np.abs(short_features[i, 0:-1] -
short_features[i, 1::])).mean()
if dif_threshold <= 0:
dif_threshold = 0.0000000000000001
# detect local maxima
[pos1, _] = utilities.peakdet(short_features[i, :], dif_threshold)
position_diffs = []
# compute histograms of local maxima changes
for j in range(len(pos1)-1):
position_diffs.append(pos1[j+1]-pos1[j])
histogram_times, histogram_edges = \
np.histogram(position_diffs, np.arange(0.5, max_beat_time + 1.5))
hist_centers = (histogram_edges[0:-1] + histogram_edges[1::]) / 2.0
histogram_times = \
histogram_times.astype(float) / short_features.shape[1]
hist_all += histogram_times
if plot:
plt.subplot(9, 2, ii + 1)
plt.plot(short_features[i, :], 'k')
for k in pos1:
plt.plot(k, short_features[i, k], 'k*')
f1 = plt.gca()
f1.axes.get_xaxis().set_ticks([])
f1.axes.get_yaxis().set_ticks([])
if plot:
plt.show(block=False)
plt.figure()
# Get beat as the argmax of the agregated histogram:
max_indices = np.argmax(hist_all)
bpms = 60 / (hist_centers * window_size)
bpm = bpms[max_indices]
# ... and the beat ratio:
ratio = hist_all[max_indices] / hist_all.sum()
if plot:
# filter out >500 beats from plotting:
hist_all = hist_all[bpms < 500]
bpms = bpms[bpms < 500]
plt.plot(bpms, hist_all, 'k')
plt.xlabel('Beats per minute')
plt.ylabel('Freq Count')
plt.show(block=True)
return bpm, ratio
How can i generate a random walk data between a start-end values
while not passing over the maximum value and not going under the minimum value?
Here is my attempt to do this but for some reason sometimes the series goes over the max or under the min values. It seems that the Start and the End value are respected but not the minimum and the maximum value. How can this be fixed? Also i would like to give the standard deviation for the fluctuations but don't know how. I use a randomPerc for fluctuation but this is wrong as i would like to specify the std instead.
import numpy as np
import matplotlib.pyplot as plt
def generateRandomData(length,randomPerc, min,max,start, end):
data_np = (np.random.random(length) - randomPerc).cumsum()
data_np *= (max - min) / (data_np.max() - data_np.min())
data_np += np.linspace(start - data_np[0], end - data_np[-1], len(data_np))
return data_np
randomData=generateRandomData(length = 1000, randomPerc = 0.5, min = 50, max = 100, start = 66, end = 80)
## print values
print("Max Value",randomData.max())
print("Min Value",randomData.min())
print("Start Value",randomData[0])
print("End Value",randomData[-1])
print("Standard deviation",np.std(randomData))
## plot values
plt.figure()
plt.plot(range(randomData.shape[0]), randomData)
plt.show()
plt.close()
Here is a simple loop which checks for series that go under the minimum or over the maximum value. This is exactly what i am trying to avoid. The series should be distributed between the given limits for min and max values.
## generate 1000 series and check if there are any values over the maximum limit or under the minimum limit
for i in range(1000):
randomData = generateRandomData(length = 1000, randomPerc = 0.5, min = 50, max = 100, start = 66, end = 80)
if(randomData.min() < 50):
print(i, "Value Lower than Min limit")
if(randomData.max() > 100):
print(i, "Value Higher than Max limit")
As you impose conditions on your walk, it can not be considered purely random. Anyway, one way is to generate the walk iteratively, and check the boundaries on each iteration. But if you wanted a vectorized solution, here it is:
def bounded_random_walk(length, lower_bound, upper_bound, start, end, std):
assert (lower_bound <= start and lower_bound <= end)
assert (start <= upper_bound and end <= upper_bound)
bounds = upper_bound - lower_bound
rand = (std * (np.random.random(length) - 0.5)).cumsum()
rand_trend = np.linspace(rand[0], rand[-1], length)
rand_deltas = (rand - rand_trend)
rand_deltas /= np.max([1, (rand_deltas.max()-rand_deltas.min())/bounds])
trend_line = np.linspace(start, end, length)
upper_bound_delta = upper_bound - trend_line
lower_bound_delta = lower_bound - trend_line
upper_slips_mask = (rand_deltas-upper_bound_delta) >= 0
upper_deltas = rand_deltas - upper_bound_delta
rand_deltas[upper_slips_mask] = (upper_bound_delta - upper_deltas)[upper_slips_mask]
lower_slips_mask = (lower_bound_delta-rand_deltas) >= 0
lower_deltas = lower_bound_delta - rand_deltas
rand_deltas[lower_slips_mask] = (lower_bound_delta + lower_deltas)[lower_slips_mask]
return trend_line + rand_deltas
randomData = bounded_random_walk(1000, lower_bound=50, upper_bound =100, start=50, end=100, std=10)
You can see it as a solution of geometric problem. The trend_line is connecting your start and end points, and have margins defined by lower_bound and upper_bound. rand is your random walk, rand_trend it's trend line and rand_deltas is it's deviation from the rand trend line. We collocate the trend lines, and want to make sure that deltas don't exceed margins. When rand_deltas exceeds the allowed margin, we "fold" the excess back to the bounds.
At the end you add the resulting random deltas to the start=>end trend line, thus receiving the desired bounded random walk.
The std parameter corresponds to the amount of variance of the random walk.
update : fixed assertions
In this version "std" is not promised to be the "interval".
I noticed you used built in functions as arguments (min and max) which is not reccomended (I changed these to max_1 and min_1). Other than this your code should work as expected:
def generateRandomData(length,randomPerc, min_1,max_1,start, end):
data_np = (np.random.random(length) - randomPerc).cumsum()
data_np *= (max_1 - min_1) / (data_np.max() - data_np.min())
data_np += np.linspace(start - data_np[0], end - data_np[-1],len(data_np))
return data_np
randomData=generateRandomData(1000, 0.5, 50, 100, 66, 80)
If you are willing to modify your code this will work:
import random
for_fill=[]
# generate 1000 samples within the specified range and save them in for_fill
for x in range(1000):
generate_rnd_df=random.uniform(50,100)
for_fill.append(generate_rnd_df)
#set starting and end point manually
for_fill[0]=60
for_fill[999]=80
Here is one way, very crudely expressed in code.
>>> import random
>>> steps = 1000
>>> start = 66
>>> end = 80
>>> step_size = (50,100)
Generate 1,000 steps assured to be within the required range.
>>> crude_walk_steps = [random.uniform(*step_size) for _ in range(steps)]
>>> import numpy as np
Turn these steps into a walk but notice that they fail to meet the requirements.
>>> crude_walk = np.cumsum(crude_walk_steps)
>>> min(crude_walk)
57.099056617839288
>>> max(crude_walk)
75048.948693623403
Calculate a simple linear transformation to scale the steps.
>>> from sympy import *
>>> var('a b')
(a, b)
>>> solve([57.099056617839288*a+b-66,75048.948693623403*a+b-80])
{b: 65.9893403510312, a: 0.000186686954219243}
Scales the steps.
>>> walk = [0.000186686954219243*_+65.9893403510312 for _ in crude_walk]
Verify that the walk now starts and stops where intended.
>>> min(walk)
65.999999999999986
>>> max(walk)
79.999999999999986
You can also generate a stream of random walks and filter out those that do not meet your constraints. Just be aware that by filtering they are not really 'random' anymore.
The code below creates an infinite stream of 'valid' random walks. Be careful with
very tight constraints, the 'next' call might take a while ;).
import itertools
import numpy as np
def make_random_walk(first, last, min_val, max_val, size):
# Generate a sequence of random steps of lenght `size-2`
# that will be taken bewteen the start and stop values.
steps = np.random.normal(size=size-2)
# The walk is the cumsum of those steps
walk = steps.cumsum()
# Performing the walk from the start value gives you your series.
series = walk + first
# Compare the target min and max values with the observed ones.
target_min_max = np.array([min_val, max_val])
observed_min_max = np.array([series.min(), series.max()])
# Calculate the absolute 'overshoot' for min and max values
f = np.array([-1, 1])
overshoot = (observed_min_max*f - target_min_max*f)
# Calculate the scale factor to constrain the walk within the
# target min/max values.
# Don't upscale.
correction_base = [walk.min(), walk.max()][np.argmax(overshoot)]
scale = min(1, (correction_base - overshoot.max()) / correction_base)
# Generate the scaled series
new_steps = steps * scale
new_walk = new_steps.cumsum()
new_series = new_walk + first
# Check the size of the final step necessary to reach the target endpoint.
last_step_size = abs(last - new_series[-1]) # step needed to reach desired end
# Is it larger than the largest previously observed step?
if last_step_size > np.abs(new_steps).max():
# If so, consider this series invalid.
return None
else:
# Else, we found a valid series that meets the constraints.
return np.concatenate((np.array([first]), new_series, np.array([last])))
start = 66
stop = 80
max_val = 100
min_val = 50
size = 1000
# Create an infinite stream of candidate series
candidate_walks = (
(i, make_random_walk(first=start, last=stop, min_val=min_val, max_val=max_val, size=size))
for i in itertools.count()
)
# Filter out the invalid ones.
valid_walks = ((i, w) for i, w in candidate_walks if w is not None)
idx, walk = next(valid_walks) # Get the next valid series
print(
"Walk #{}: min/max({:.2f}/{:.2f})"
.format(idx, walk.min(), walk.max())
)
I have a range of dates and a measurement on each of those dates. I'd like to calculate an exponential moving average for each of the dates. Does anybody know how to do this?
I'm new to python. It doesn't appear that averages are built into the standard python library, which strikes me as a little odd. Maybe I'm not looking in the right place.
So, given the following code, how could I calculate the moving weighted average of IQ points for calendar dates?
from datetime import date
days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
IQ = [110, 105, 90]
(there's probably a better way to structure the data, any advice would be appreciated)
EDIT:
It seems that mov_average_expw() function from scikits.timeseries.lib.moving_funcs submodule from SciKits (add-on toolkits that complement SciPy) better suits the wording of your question.
To calculate an exponential smoothing of your data with a smoothing factor alpha (it is (1 - alpha) in Wikipedia's terms):
>>> alpha = 0.5
>>> assert 0 < alpha <= 1.0
>>> av = sum(alpha**n.days * iq
... for n, iq in map(lambda (day, iq), today=max(days): (today-day, iq),
... sorted(zip(days, IQ), key=lambda p: p[0], reverse=True)))
95.0
The above is not pretty, so let's refactor it a bit:
from collections import namedtuple
from operator import itemgetter
def smooth(iq_data, alpha=1, today=None):
"""Perform exponential smoothing with factor `alpha`.
Time period is a day.
Each time period the value of `iq` drops `alpha` times.
The most recent data is the most valuable one.
"""
assert 0 < alpha <= 1
if alpha == 1: # no smoothing
return sum(map(itemgetter(1), iq_data))
if today is None:
today = max(map(itemgetter(0), iq_data))
return sum(alpha**((today - date).days) * iq for date, iq in iq_data)
IQData = namedtuple("IQData", "date iq")
if __name__ == "__main__":
from datetime import date
days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
IQ = [110, 105, 90]
iqdata = list(map(IQData, days, IQ))
print("\n".join(map(str, iqdata)))
print(smooth(iqdata, alpha=0.5))
Example:
$ python26 smooth.py
IQData(date=datetime.date(2008, 1, 1), iq=110)
IQData(date=datetime.date(2008, 1, 2), iq=105)
IQData(date=datetime.date(2008, 1, 7), iq=90)
95.0
I'm always calculating EMAs with Pandas:
Here is an example how to do it:
import pandas as pd
import numpy as np
def ema(values, period):
values = np.array(values)
return pd.ewma(values, span=period)[-1]
values = [9, 5, 10, 16, 5]
period = 5
print ema(values, period)
More infos about Pandas EWMA:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ewma.html
I did a bit of googling and I found the following sample code (http://osdir.com/ml/python.matplotlib.general/2005-04/msg00044.html):
def ema(s, n):
"""
returns an n period exponential moving average for
the time series s
s is a list ordered from oldest (index 0) to most
recent (index -1)
n is an integer
returns a numeric array of the exponential
moving average
"""
s = array(s)
ema = []
j = 1
#get n sma first and calculate the next n period ema
sma = sum(s[:n]) / n
multiplier = 2 / float(1 + n)
ema.append(sma)
#EMA(current) = ( (Price(current) - EMA(prev) ) x Multiplier) + EMA(prev)
ema.append(( (s[n] - sma) * multiplier) + sma)
#now calculate the rest of the values
for i in s[n+1:]:
tmp = ( (i - ema[j]) * multiplier) + ema[j]
j = j + 1
ema.append(tmp)
return ema
You can also use the SciPy filter method because the EMA is an IIR filter. This will have the benefit of being approximately 64 times faster as measured on my system using timeit on large data sets when compared to the enumerate() approach.
import numpy as np
from scipy.signal import lfilter
x = np.random.normal(size=1234)
alpha = .1 # smoothing coefficient
zi = [x[0]] # seed the filter state with first value
# filter can process blocks of continuous data if <zi> is maintained
y, zi = lfilter([1.-alpha], [1., -alpha], x, zi=zi)
I don't know Python, but for the averaging part, do you mean an exponentially decaying low-pass filter of the form
y_new = y_old + (input - y_old)*alpha
where alpha = dt/tau, dt = the timestep of the filter, tau = the time constant of the filter? (the variable-timestep form of this is as follows, just clip dt/tau to not be more than 1.0)
y_new = y_old + (input - y_old)*dt/tau
If you want to filter something like a date, make sure you convert to a floating-point quantity like # of seconds since Jan 1 1970.
My python is a little bit rusty (anyone can feel free to edit this code to make corrections, if I've messed up the syntax somehow), but here goes....
def movingAverageExponential(values, alpha, epsilon = 0):
if not 0 < alpha < 1:
raise ValueError("out of range, alpha='%s'" % alpha)
if not 0 <= epsilon < alpha:
raise ValueError("out of range, epsilon='%s'" % epsilon)
result = [None] * len(values)
for i in range(len(result)):
currentWeight = 1.0
numerator = 0
denominator = 0
for value in values[i::-1]:
numerator += value * currentWeight
denominator += currentWeight
currentWeight *= alpha
if currentWeight < epsilon:
break
result[i] = numerator / denominator
return result
This function moves backward, from the end of the list to the beginning, calculating the exponential moving average for each value by working backward until the weight coefficient for an element is less than the given epsilon.
At the end of the function, it reverses the values before returning the list (so that they're in the correct order for the caller).
(SIDE NOTE: if I was using a language other than python, I'd create a full-size empty array first and then fill it backwards-order, so that I wouldn't have to reverse it at the end. But I don't think you can declare a big empty array in python. And in python lists, appending is much less expensive than prepending, which is why I built the list in reverse order. Please correct me if I'm wrong.)
The 'alpha' argument is the decay factor on each iteration. For example, if you used an alpha of 0.5, then today's moving average value would be composed of the following weighted values:
today: 1.0
yesterday: 0.5
2 days ago: 0.25
3 days ago: 0.125
...etc...
Of course, if you've got a huge array of values, the values from ten or fifteen days ago won't contribute very much to today's weighted average. The 'epsilon' argument lets you set a cutoff point, below which you will cease to care about old values (since their contribution to today's value will be insignificant).
You'd invoke the function something like this:
result = movingAverageExponential(values, 0.75, 0.0001)
In matplotlib.org examples (http://matplotlib.org/examples/pylab_examples/finance_work2.html) is provided one good example of Exponential Moving Average (EMA) function using numpy:
def moving_average(x, n, type):
x = np.asarray(x)
if type=='simple':
weights = np.ones(n)
else:
weights = np.exp(np.linspace(-1., 0., n))
weights /= weights.sum()
a = np.convolve(x, weights, mode='full')[:len(x)]
a[:n] = a[n]
return a
I found the above code snippet by #earino pretty useful - but I needed something that could continuously smooth a stream of values - so I refactored it to this:
def exponential_moving_average(period=1000):
""" Exponential moving average. Smooths the values in v over ther period. Send in values - at first it'll return a simple average, but as soon as it's gahtered 'period' values, it'll start to use the Exponential Moving Averge to smooth the values.
period: int - how many values to smooth over (default=100). """
multiplier = 2 / float(1 + period)
cum_temp = yield None # We are being primed
# Start by just returning the simple average until we have enough data.
for i in xrange(1, period + 1):
cum_temp += yield cum_temp / float(i)
# Grab the timple avergae
ema = cum_temp / period
# and start calculating the exponentially smoothed average
while True:
ema = (((yield ema) - ema) * multiplier) + ema
and I use it like this:
def temp_monitor(pin):
""" Read from the temperature monitor - and smooth the value out. The sensor is noisy, so we use exponential smoothing. """
ema = exponential_moving_average()
next(ema) # Prime the generator
while True:
yield ema.send(val_to_temp(pin.read()))
(where pin.read() produces the next value I'd like to consume).
May be shortest:
#Specify decay in terms of span
#data_series should be a DataFrame
ema=data_series.ewm(span=5, adjust=False).mean()
import pandas_ta as ta
data["EMA3"] = ta.ema(data["close"], length=3)
pandas_ta is a Technical Analysis Library: https://github.com/twopirllc/pandas-ta. Above code calculates the Exponential Moving Average (EMA) for a series. You can specify the lag value using 'length'. Spesifically, above code calculates '3-day EMA'.
Here is a simple sample I worked up based on http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages
Note that unlike in their spreadsheet, I don't calculate the SMA, and I don't wait to generate the EMA after 10 samples. This means my values differ slightly, but if you chart it, it follows exactly after 10 samples. During the first 10 samples, the EMA I calculate is appropriately smoothed.
def emaWeight(numSamples):
return 2 / float(numSamples + 1)
def ema(close, prevEma, numSamples):
return ((close-prevEma) * emaWeight(numSamples) ) + prevEma
samples = [
22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29,
22.15, 22.39, 22.38, 22.61, 23.36, 24.05, 23.75, 23.83, 23.95, 23.63,
23.82, 23.87, 23.65, 23.19, 23.10, 23.33, 22.68, 23.10, 22.40, 22.17,
]
emaCap = 10
e=samples[0]
for s in range(len(samples)):
numSamples = emaCap if s > emaCap else s
e = ema(samples[s], e, numSamples)
print e
I'm a little late to the party here, but none of the solutions given were what I was looking for. Nice little challenge using recursion and the exact formula given in investopedia.
No numpy or pandas required.
prices = [{'i': 1, 'close': 24.5}, {'i': 2, 'close': 24.6}, {'i': 3, 'close': 24.8}, {'i': 4, 'close': 24.9},
{'i': 5, 'close': 25.6}, {'i': 6, 'close': 25.0}, {'i': 7, 'close': 24.7}]
def rec_calculate_ema(n):
k = 2 / (n + 1)
price = prices[n]['close']
if n == 1:
return price
res = (price * k) + (rec_calculate_ema(n - 1) * (1 - k))
return res
print(rec_calculate_ema(3))
A fast way (copy-pasted from here) is the following:
def ExpMovingAverage(values, window):
""" Numpy implementation of EMA
"""
weights = np.exp(np.linspace(-1., 0., window))
weights /= weights.sum()
a = np.convolve(values, weights, mode='full')[:len(values)]
a[:window] = a[window]
return a
I am using a list and a rate of decay as inputs. I hope this little function with just two lines may help you here, considering deep recursion is not stable in python.
def expma(aseries, ratio):
return sum([ratio*aseries[-x-1]*((1-ratio)**x) for x in range(len(aseries))])
more simply, using pandas
def EMA(tw):
for x in tw:
data["EMA{}".format(x)] = data['close'].ewm(span=x, adjust=False).mean()
EMA([10,50,100])
Papahaba's answer was almost what I was looking for (thanks!) but I needed to match initial conditions. Using an IIR filter with scipy.signal.lfilter is certainly the most efficient. Here's my redux:
Given a NumPy vector, x
import numpy as np
from scipy import signal
period = 12
b = np.array((1,), 'd')
a = np.array((period, 1-period), 'd')
zi = signal.lfilter_zi(b, a)
y, zi = signal.lfilter(b, a, x, zi=zi*x[0:1])
Get the N-point EMA (here, 12) returned in the vector y