numpy.random.uniform from a discontinuous set - python

I want to generate a random number uniformly from a set of the form (a,b) ∪ (c,d) ∪ ... ∪ (e,f), where 0 < a < b < c < ... < e < f < 1. Is this possible with the numpy.random.uniform function?

If you only need to pick it once you can use np.random.choice:
import numpy as np
a, b, c, d = 0, 0.3, 0.7, 1
# Specify relative probabilities
prob = np.array([b-a, d-c])
prob = prob/prob.sum() # Normalize to sum up to one
r = np.random.choice([np.random.uniform(a, b), np.random.uniform(c, d)],
                     p=prob)
r
0.9662186527199109
If you need to generate many values:
n = 10
R = np.array([np.random.choice([np.random.uniform(a, b), np.random.uniform(c, d)],
                               p=prob)
              for _ in range(n)])
R
array([0.19130148, 0.24858629, 0.75106557, 0.11057559, 0.9276096 ,
0.01849698, 0.89433504, 0.99455349, 0.10128313, 0.23325187])
We can see that adding the probability parameter yields the expected result:
a, b, c, d, e, f = 0, .1, .2, .25, .5, 1
prob = np.array([b-a, d-c, f-e])
prob = prob/prob.sum()
n = 10_000
R = np.array([np.random.choice([np.random.uniform(a, b),
                                np.random.uniform(c, d),
                                np.random.uniform(e, f)],
                               p=prob)
              for _ in range(n)])
print(prob)
[0.15384615 0.07692308 0.76923077]
print(R[np.logical_and(R>a, R<b)].size/n, R[np.logical_and(R>c, R<d)].size/n, R[np.logical_and(R>e, R<f)].size/n)
0.1537 0.0709 0.7754
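If you need many samples, a vectorized variant of the same idea (my sketch, not part of the original answer) is to first pick an interval index for each draw with np.random.choice and then draw a uniform value inside the chosen interval; this avoids generating a value from every interval for every sample:
import numpy as np

rng = np.random.default_rng()
intervals = np.array([[0.0, 0.1], [0.2, 0.25], [0.5, 1.0]])  # the (a,b), (c,d), (e,f) above
lengths = intervals[:, 1] - intervals[:, 0]
prob = lengths / lengths.sum()

n = 10_000
idx = rng.choice(len(intervals), size=n, p=prob)       # pick an interval per sample
R = rng.uniform(intervals[idx, 0], intervals[idx, 1])  # then draw uniformly inside it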

Note: This answer was written for the original version of the question, which asked for uniform samples from the set (0, 0.3) ∪ (0.7, 1).
There are many ways you could do this. Here's one that's very concise, but it depends on the particular form of the intervals you have given:
In [16]: rng = np.random.default_rng()
In [17]: n = 1000
In [18]: x = rng.uniform(-0.3, 0.3, size=n) % 1
x is the array of n samples.
The trick is that the samples are generated on the interval (-0.3, 0.3). Then by mod'ing the values with 1, the negative values "wrap around" to the interval (0.7, 1).
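A quick sanity check (a hypothetical snippet, regenerating x with the same trick) confirms that every sample lands in (0, 0.3) ∪ (0.7, 1):
import numpy as np

rng = np.random.default_rng()
x = rng.uniform(-0.3, 0.3, size=1000) % 1
assert np.all(((0 <= x) & (x < 0.3)) | ((0.7 <= x) & (x < 1)))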

Related

How to divide an array in several sections?

I have an array of length approximately 12000, something like array([0.3, 0.6, 0.3, 0.5, 0.1, 0.9, 0.4, ...]). I also have a column in a dataframe with values like 2, 3, 7, 3, 2, 7, .... The column has length 48, and the sum of those values is 36.
I want to split the array proportionally to those values. For example, the first value in the column (= 2) gets its own slice of length 12000*(2/36) (maybe [0.3, 0.6, 0.3]), the second value (= 3) gets a slice of length 12000*(3/36) that continues where the first one ended (something like [0.5, 0.1, 0.9, 0.4]), and so on.
import pandas as pd
import numpy as np

# mock some data
a = np.random.random(12000)
df = pd.DataFrame({'col': np.random.randint(1, 5, 48)})

indices = (len(a) * df.col.to_numpy() / sum(df.col)).cumsum()
indices = np.concatenate(([0], indices)).round().astype(int)

res = []
for s, e in zip(indices[:-1], indices[1:]):
    res.append(a[s:e])
# some tests
target_pcts = df.col.to_numpy() / sum(df.col)
realized_pcts = np.array([len(sl) / len(a) for sl in res])
diffs = target_pcts / realized_pcts
assert 0.99 < np.min(diffs) and np.max(diffs) < 1.01
assert all(np.concatenate([*res]) == a)
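As an alternative sketch (my addition, reusing a and indices from the snippet above), np.split can produce the list of sections directly from the interior boundaries:
# np.split expects only the interior cut points, not the 0 and len(a) endpoints.
res2 = np.split(a, indices[1:-1])
assert all(np.array_equal(r1, r2) for r1, r2 in zip(res, res2))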

Fastest way to find the nearest pairs between two numpy arrays without duplicates

Given two large numpy arrays A and B with different numbers of rows (len(B) > len(A)) but the same number of columns (A.shape[1] == B.shape[1] == 3), I want to know the fastest way to get a subset C of B that has the minimum total distance (sum of all pairwise distances) to A without duplicates (each row of B may be paired with at most one row of A). This means C should have the same shape as A.
Below is my code, but there are two main issues:
I cannot tell if this gives the minimum total distance
In reality I have a much more expensive distance-calculating function than np.linalg.norm (it needs to take care of periodic boundary conditions). I think this is definitely not the fastest way to go, since the code below calls the distance-calculating function one pair at a time. There is a significant overhead when I call the more expensive distance-calculating function and it will run forever. Any suggestions?
import numpy as np
from operator import itemgetter
import random
import time

A = 100.*np.random.rand(1000, 3)
B = A.copy()
for (i, j), _ in np.ndenumerate(B):
    B[i, j] += np.random.rand()
B = np.vstack([B, 100.*np.random.rand(500, 3)])

def calc_dist(x, y):
    return np.linalg.norm(x - y)

t0 = time.time()
taken = []
for rowi in A:
    res = min(((k, calc_dist(rowi, rowj)) for k, rowj in enumerate(B)
               if k not in taken), key=itemgetter(1))
    taken.append(res[0])
C = B[taken]

print(A.shape, B.shape, C.shape)
>>> (1000, 3) (1500, 3) (1000, 3)
print(time.time() - t0)
>>> 12.406389951705933
Edit: for those who are interested in the expensive distance-calculating function, it uses the ase package (can be installed by pip install ase)
from ase.geometry import find_mic

def calc_mic_dist(x, y):
    return find_mic(np.array([x]) - np.array([y]),
                    cell=np.array([[50., 0.0, 0.0],
                                   [25., 45., 0.0],
                                   [0.0, 0.0, 100.]]))[1][0]
If you're OK with calculating the whole N² distances, which isn't that expensive for the sizes you've given, scipy.optimize has a function that will solve this directly.
import scipy.optimize
cost = np.linalg.norm(A[:, np.newaxis, :] - B, axis=2)
_, indexes = scipy.optimize.linear_sum_assignment(cost)
C = B[indexes]
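If plain Euclidean distance is all you need, the cost matrix can also be built with scipy.spatial.distance.cdist, which avoids allocating the intermediate (len(A), len(B), 3) array. A small self-contained variation (my sketch, mirroring the setup from the question):
import numpy as np
import scipy.optimize
from scipy.spatial.distance import cdist

A = 100.*np.random.rand(1000, 3)
B = np.vstack([A + np.random.rand(1000, 3), 100.*np.random.rand(500, 3)])

cost = cdist(A, B)                                       # all pairwise Euclidean distances
_, indexes = scipy.optimize.linear_sum_assignment(cost)  # optimal one-to-one assignment
C = B[indexes]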
Using the power of numpy broadcasting and vectorization: the find_mic method in ase.geometry can handle 2D numpy arrays.
from ase.geometry import find_mic

def calc_mic_dist(x, y):
    return find_mic(x - y,
                    cell=np.array([[50., 0.0, 0.0],
                                   [25., 45., 0.0],
                                   [0.0, 0.0, 100.]]))[1]
Test:
x = np.random.randn(1, 3)
y = np.random.randn(5, 3)
print(calc_mic_dist(x, y).shape)
# It is a distance metric, so:
assert np.allclose(calc_mic_dist(x, y), calc_mic_dist(y, x))
Output:
(5,)
As you can see, the metric is calculated for each value of x against each value of y, because x - y in numpy does the magic of broadcasting.
Solution:
def calc_mic_dist(x, y):
    return find_mic(x - y,
                    cell=np.array([[50., 0.0, 0.0],
                                   [25., 45., 0.0],
                                   [0.0, 0.0, 100.]]))[1]

t0 = time.time()
A = 100.*np.random.rand(1000, 3)
B = 100.*np.random.rand(5000, 3)
selected = [np.argmin(calc_mic_dist(a, B)) for a in A]
C = B[selected]
print(A.shape, B.shape, C.shape)
print(f"Time: {time.time()-t0}")
Output:
(1000, 3) (5000, 3) (1000, 3)
Time: 9.817562341690063
This takes around 10 seconds on Google Colab.
Testing:
We know that calc_mic_dist(x, x) == 0, so if A is a subset of B, then C should be exactly A.
A = 100.*np.random.rand(1000, 3)
B = np.vstack([100.*np.random.rand(500, 3), A, 100.*np.random.rand(500, 3)])
selected = [np.argmin(calc_mic_dist(a, B)) for a in A]
C = B[selected]
print (A.shape, B.shape, C.shape)
print (np.allclose(A,C))
Output:
(1000, 3) (2000, 3) (1000, 3)
True
Edit 1: Avoid duplicates
Once a vector in B is selected, it cannot be selected again for other values of A.
This can be achieved by removing the selected vector from B once it is chosen, so that it does not appear as a candidate for the next rows of A.
A = 100.*np.random.rand(1000, 3)
B = np.vstack([100.*np.random.rand(500, 3), A, 100.*np.random.rand(500, 3)])

B_ = B.copy()
C = np.zeros_like(A)
for i, a in enumerate(A):
    s = np.argmin(calc_mic_dist(a, B_))
    C[i] = B_[s]
    # Remove the paired vector so it cannot be selected again
    B_ = np.delete(B_, (s), axis=0)

print(A.shape, B.shape, C.shape)
print(np.allclose(A, C))
Output:
(1000, 3) (2000, 3) (1000, 3)
True
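If the periodic (find_mic) distance is required and you also want an optimal, duplicate-free assignment, the two answers can be combined: build the full cost matrix with a single vectorized find_mic call, then hand it to linear_sum_assignment. This is only a sketch, assuming the (len(A)*len(B), 3) array of displacement vectors fits in memory:
import numpy as np
import scipy.optimize
from ase.geometry import find_mic

cell = np.array([[50., 0.0, 0.0],
                 [25., 45., 0.0],
                 [0.0, 0.0, 100.]])

A = 100.*np.random.rand(1000, 3)
B = np.vstack([100.*np.random.rand(500, 3), A, 100.*np.random.rand(500, 3)])

# All pairwise displacement vectors, flattened to shape (len(A)*len(B), 3).
diffs = (A[:, None, :] - B[None, :, :]).reshape(-1, 3)
_, lengths = find_mic(diffs, cell=cell)
cost = lengths.reshape(len(A), len(B))

# Optimal one-to-one matching that minimizes the total periodic distance.
_, indexes = scipy.optimize.linear_sum_assignment(cost)
C = B[indexes]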

Is there a vectorized way to sample multiple times with np.random.choice() with different p?

I'm trying to implement a variation ratio, and I need T samples from an array C, but each sample has different weights p_t.
I'm using this:
import numpy as np
from scipy import stats
batch_size = 1
T = 3
C = np.array(['A', 'B', 'C'])
# p_batch_T dimensions: (batch, sample, class)
p_batch_T = np.array([[[0.01, 0.98, 0.01],
                       [0.3, 0.15, 0.55],
                       [0.85, 0.1, 0.05]]])

def variation_ratio(C, p_T):
    # This function works only with one sample from the batch.
    Y_T = np.array([np.random.choice(C, size=1, p=p_t) for p_t in p_T])  # vectorize this
    C_mode, frequency = stats.mode(Y_T)
    T = len(Y_T)
    return 1.0 - (frequency/T)

def variation_ratio_batch(C, p_batch_T):
    return np.array([variation_ratio(C, p_T) for p_T in p_batch_T])  # and vectorize this
Is there a way to implement these functions without any for loop?
Instead of sampling with the given distribution p_T, we can sample uniformly in [0, 1] and compare that to the cumulative distribution:
Let's start with Y_T, say for p_T = p_batch_T[0]
cum_dist = p_batch_T.cumsum(axis=-1)
idx_T = (np.random.rand(len(C),1) < cum_dist[0]).argmax(-1)
Y_T = C[idx_T[...,None]]
_, f = stats.mode(Y_T) # here axis=0 is default
Now let's take that to variation_ratio_batch:
idx_T = (np.random.rand(len(p_batch_T), len(C), 1) < cum_dist).argmax(-1)
Y = C[idx_T[..., None]]
_, f = stats.mode(Y, axis=1)  # notice axis 0 is the batch
out = 1 - (f/T)
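Putting the cumulative-distribution idea together as a self-contained sketch (my consolidation, not the answerer's exact code; the modal count is computed with a one-hot sum instead of stats.mode to sidestep SciPy version differences):
import numpy as np

C = np.array(['A', 'B', 'C'])
p_batch_T = np.array([[[0.01, 0.98, 0.01],
                       [0.3, 0.15, 0.55],
                       [0.85, 0.1, 0.05]]])
batch, T, n_classes = p_batch_T.shape

cum_dist = p_batch_T.cumsum(axis=-1)             # (batch, T, classes)
u = np.random.rand(batch, T, 1)
idx = (u < cum_dist).argmax(-1)                  # inverse-CDF sampling, shape (batch, T)
Y = C[idx]                                       # sampled labels

# Frequency of the modal class per batch element, then the variation ratio.
counts = (idx[..., None] == np.arange(n_classes)).sum(axis=1)  # (batch, classes)
f = counts.max(axis=1)
out = 1.0 - f / T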
You could do it this way:
First, create a 2D weights array of shape (T, len(C)) and take the cumulative sum:
n_rows = 5
n_cols = 3
weights = np.random.rand(n_rows, n_cols)
cum_weights = (weights / weights.sum(axis=1, keepdims=True)).cumsum(axis=1)
cum_weights might look like this:
array([[0.09048919, 0.58962127, 1. ],
[0.36333997, 0.58380885, 1. ],
[0.28761923, 0.63413879, 1. ],
[0.39446498, 0.98760834, 1. ],
[0.27862476, 0.79715149, 1. ]])
Next, we can compare cum_weights to the appropriately sized output of np.random.rand. Taking argmin of the boolean comparison gives, for each row, the first index at which the cumulative weight reaches the generated random number:
indices = (cum_weights < np.random.rand(n_rows, 1)).argmin(axis=1)
We can then use indices to index an array of values of shape (n_cols,), which is len(C) in your original example.
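For instance, continuing the hypothetical snippet above (with C as in the question and indices from the previous step):
C = np.array(['A', 'B', 'C'])  # n_cols == len(C) == 3
samples = C[indices]           # one weighted draw per row of cum_weights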
np.vectorize should work:
from functools import partial
import numpy as np
@partial(np.vectorize, excluded=['rng'], signature='(),(k)->()')
def choice_batched(rng, probs):
    return rng.choice(a=probs.shape[-1], p=probs)
then
num_classes = 3
batch_size = 5
alpha = .5 # Dirichlet prior hyperparameter.
rng = np.random.default_rng()
probs = np.random.dirichlet(alpha=np.full(fill_value=alpha, shape=num_classes), size=batch_size)
# Check each row sums to 1.
assert np.allclose(probs.sum(axis=-1), 1)
print(choice_batched(rng, probs))
print(choice_batched(rng, probs))
print(choice_batched(rng, probs))
print(choice_batched(rng, probs))
gives
[2 0 0 0 1]
[1 0 0 0 1]
[2 0 2 0 1]
[1 0 0 0 0]
Here is my implementation of Quang's and gmds' solutions:
def sample(ws, k):
    """Weighted sample of k elements along the last axis.

    ws -- Tensor of probabilities, shape (*, n)
    k  -- Number of elements to sample.

    Returns a tensor of shape (*, k) with values in {0, ..., n-1}.
    """
    assert np.allclose(ws.sum(-1), 1)
    cs = ws.cumsum(-1)
    ps = np.random.random(ws.shape[:-1] + (k,))
    return (cs[..., None, :] < ps[..., None]).sum(-1)
Say we have some stuff
>>> stuff = np.array([[0, 1, 2],
                      [3, 4, 5],
                      [6, 7, 8]])
And some weights / sampling probabilities.
>>> ws = np.array([[0.41296038, 0.36070229, 0.22633733],
                   [0.37576672, 0.14518771, 0.47904557],
                   [0.14742326, 0.29182459, 0.56075215]])
And we want to sample 2 elements along each row. Then we do
>>> ids = sample(ws, 2)
[[2, 0],
[1, 2],
[2, 2]]
And we can retrieve the sampled values from stuff using np.take_along_axis:
>>> np.take_along_axis(stuff, ids, axis=1)
[[2, 0],
[4, 5],
[8, 8]]
The code could be generalized to sampling along an axis other than the last one, but I got confused about broadcasting, so somebody else should have a stab at it!
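One way to handle an arbitrary axis (my sketch, building on the sample function above) is to move the target axis to the end before sampling; the returned indices then refer to positions along that axis:
def sample_along(ws, k, axis=-1):
    """Weighted sample of k indices along `axis` of the probability tensor `ws`."""
    ws_last = np.moveaxis(ws, axis, -1)  # put the sampling axis last
    return sample(ws_last, k)

# To gather the corresponding values from an array `data` shaped like `ws`:
# np.take_along_axis(np.moveaxis(data, axis, -1), sample_along(ws, k, axis), axis=-1)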

Partial convolution / correlation with numpy [duplicate]

I am learning numpy/scipy, coming from a MATLAB background. The xcorr function in Matlab has an optional argument "maxlag" that limits the lag range from –maxlag to maxlag. This is very useful if you are looking at the cross-correlation between two very long time series but are only interested in the correlation within a certain time range. The performance increases are enormous considering that cross-correlation is incredibly expensive to compute.
In numpy/scipy it seems there are several options for computing cross-correlation. numpy.correlate, numpy.convolve, scipy.signal.fftconvolve. If someone wishes to explain the difference between these, I'd be happy to hear, but mainly what is troubling me is that none of them have a maxlag feature. This means that even if I only want to see correlations between two time series with lags between -100 and +100 ms, for example, it will still calculate the correlation for every lag between -20000 and +20000 ms (which is the length of the time series). This gives a 200x performance hit! Do I have to recode the cross-correlation function by hand to include this feature?
Here are a couple functions to compute auto- and cross-correlation with limited lags. The order of multiplication (and conjugation, in the complex case) was chosen to match the corresponding behavior of numpy.correlate.
import numpy as np
from numpy.lib.stride_tricks import as_strided

def _check_arg(x, xname):
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError('%s must be one-dimensional.' % xname)
    return x

def autocorrelation(x, maxlag):
    """
    Autocorrelation with a maximum number of lags.

    `x` must be a one-dimensional numpy array.

    This computes the same result as
        numpy.correlate(x, x, mode='full')[len(x)-1:len(x)+maxlag]

    The return value has length maxlag + 1.
    """
    x = _check_arg(x, 'x')
    p = np.pad(x.conj(), maxlag, mode='constant')
    T = as_strided(p[maxlag:], shape=(maxlag+1, len(x) + maxlag),
                   strides=(-p.strides[0], p.strides[0]))
    return T.dot(p[maxlag:].conj())

def crosscorrelation(x, y, maxlag):
    """
    Cross correlation with a maximum number of lags.

    `x` and `y` must be one-dimensional numpy arrays with the same length.

    This computes the same result as
        numpy.correlate(x, y, mode='full')[len(x)-maxlag-1:len(x)+maxlag]

    The return value has length 2*maxlag + 1.
    """
    x = _check_arg(x, 'x')
    y = _check_arg(y, 'y')
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    return T.dot(px)
For example,
In [367]: x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
In [368]: autocorrelation(x, 3)
Out[368]: array([ 20.5, 5. , -3.5, -1. ])
In [369]: np.correlate(x, x, mode='full')[7:11]
Out[369]: array([ 20.5, 5. , -3.5, -1. ])
In [370]: y = np.arange(8)
In [371]: crosscorrelation(x, y, 3)
Out[371]: array([ 5. , 23.5, 32. , 21. , 16. , 12.5, 9. ])
In [372]: np.correlate(x, y, mode='full')[4:11]
Out[372]: array([ 5. , 23.5, 32. , 21. , 16. , 12.5, 9. ])
(It will be nice to have such a feature in numpy itself.)
Until numpy implements the maxlag argument, you can use the function ucorrelate from the pycorrelate package. ucorrelate operates on numpy arrays and has a maxlag keyword. It implements the correlation using a for loop and optimizes the execution speed with numba.
Example - autocorrelation with 3 time lags:
import numpy as np
import pycorrelate as pyc
x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
c = pyc.ucorrelate(x, x, maxlag=3)
c
Result:
Out[1]: array([20, 5, -3])
The pycorrelate documentation contains a notebook showing a perfect match between pycorrelate.ucorrelate and numpy.correlate.
matplotlib.pyplot provides MATLAB-like syntax for computing and plotting cross-correlation, auto-correlation, etc.
You can use xcorr, which allows you to define the maxlags parameter.
import matplotlib.pyplot as plt
import numpy as np
data = np.arange(0,2*np.pi,0.01)
y1 = np.sin(data)
y2 = np.cos(data)
coeff = plt.xcorr(y1,y2,maxlags=10)
print(*coeff)
[-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9  10]
[-9.81991753e-02 -8.85505028e-02 -7.88613080e-02 -6.91325329e-02
 -5.93651264e-02 -4.95600447e-02 -3.97182508e-02 -2.98407146e-02
 -1.99284126e-02 -9.98232812e-03 -3.45104289e-06  9.98555430e-03
  1.99417667e-02  2.98641953e-02  3.97518558e-02  4.96037706e-02
  5.94189688e-02  6.91964864e-02  7.89353663e-02  8.86346584e-02
  9.82934198e-02]
<matplotlib.collections.LineCollection object at 0x00000000074A9E80>
Line2D(_line0)
@Warren Weckesser's answer is the best, as it leverages numpy to get performance savings (and not just calling corr for each lag). Nonetheless, it returns the cross-product (i.e. the dot product between the inputs at various lags). To get the actual cross-correlation I modified his answer with an optional mode argument, which if set to 'corr' returns the cross-correlation as such:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def crosscorrelation(x, y, maxlag, mode='corr'):
    """
    Cross correlation with a maximum number of lags.

    `x` and `y` must be one-dimensional numpy arrays with the same length.

    With mode='dot', this computes the same result as
        numpy.correlate(x, y, mode='full')[len(x)-maxlag-1:len(x)+maxlag]

    The return value has length 2*maxlag + 1.
    """
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    if mode == 'dot':       # gets the lagged dot product
        return T.dot(px)
    elif mode == 'corr':    # gets the Pearson correlation at each lag
        return (T.dot(px)/px.size - (T.mean(axis=1)*px.mean())) / \
               (np.std(T, axis=1) * np.std(px))
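A quick check of the two modes (assuming the function above, with numpy as np and as_strided in scope, and reusing the example arrays from earlier in the thread):
x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
y = np.arange(8.0)

print(crosscorrelation(x, y, 3, mode='dot'))   # should match np.correlate(x, y, 'full')[4:11]
print(crosscorrelation(x, y, 3, mode='corr'))  # Pearson-style correlation at each lag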
I encountered the same problem some time ago and paid more attention to the efficiency of the calculation. Referring to the source code of MATLAB's function xcorr.m, I made a simple version of it:
import numpy as np
from scipy import fftpack
import math
import time

def nextpow2(x):
    if x == 0:
        y = 0
    else:
        y = math.ceil(math.log2(x))
    return y

def xcorr(x, y, maxlag):
    m = max(len(x), len(y))
    mx1 = min(maxlag, m - 1)
    ceilLog2 = nextpow2(2 * m - 1)
    m2 = 2 ** ceilLog2

    X = fftpack.fft(x, m2)
    Y = fftpack.fft(y, m2)
    c1 = np.real(fftpack.ifft(X * np.conj(Y)))

    index1 = np.arange(1, mx1+1, 1) + (m2 - mx1 - 1)
    index2 = np.arange(1, mx1+2, 1) - 1
    c = np.hstack((c1[index1], c1[index2]))
    return c

if __name__ == "__main__":
    s = time.perf_counter()
    a = [1, 2, 3, 4, 5]
    b = [6, 7, 8, 9, 10]
    c = xcorr(a, b, 3)
    e = time.perf_counter()
    print(c)
    print(e - s)  # elapsed time
Take the results of a certain run as an example:
[ 29. 56. 90. 130. 110. 86. 59.]
0.0001745000000001884
Comparing with the MATLAB code:
clear;close all;clc
tic
a = [1, 2, 3, 4, 5];
b = [6, 7, 8, 9, 10];
c = xcorr(a, b, 3)
toc
29.0000 56.0000 90.0000 130.0000 110.0000 86.0000 59.0000
Elapsed time is 0.000279 seconds.
If anyone can give a strict mathematical derivation of this, that would be very helpful.
I think I have found a solution, as I was facing the same problem:
If you have two vectors x and y of any length N, and want a cross-correlation with a window of fixed length window, you can do:
x = <some_data>
y = <some_data>

# Trim your variables
x_short = x[window:]
y_short = y[window:]

# do two cross-correlations, lagging x and y respectively
left_xcorr = np.correlate(x, y_short)   # defaults to 'valid'
right_xcorr = np.correlate(x_short, y)  # defaults to 'valid'

# combine the cross-correlations
# note the first value of right_xcorr is the same as the last of left_xcorr
xcorr = np.concatenate((left_xcorr, right_xcorr[1:]))
Remember you might need to normalise the variables if you want a bounded correlation
Here is another answer, sourced from here; it seems marginally faster than np.correlate and has the benefit of returning a normalised correlation:
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def xcorr(x, y):
    N = len(x)
    M = len(y)
    meany = np.mean(y)
    stdy = np.std(np.asarray(y))
    tmp = rolling_window(np.asarray(x), M)
    c = np.sum((y - meany) * (tmp - np.reshape(np.mean(tmp, -1), (N - M + 1, 1))), -1) \
        / (M * np.std(tmp, -1) * stdy)
    return c
As I answered here, https://stackoverflow.com/a/47897581/5122657
matplotlib.xcorr has the maxlags param. It is actually a wrapper of numpy.correlate, so there is no performance saving. Nevertheless it gives exactly the same result as Matlab's cross-correlation function. Below I edited the code from matplotlib so that it will return only the correlation. The reason is that if we use matplotlib.xcorr as it is, it will return the plot as well. The problem is, if we put complex data into it, we will get a "casting complex to real datatype" warning when matplotlib tries to draw the plot.
import numpy as np
import matplotlib.pyplot as plt

def xcorr(x, y, maxlags=10):
    Nx = len(x)
    if Nx != len(y):
        raise ValueError('x and y must be equal length')

    c = np.correlate(x, y, mode=2)  # mode=2 is 'full'

    if maxlags is None:
        maxlags = Nx - 1
    if maxlags >= Nx or maxlags < 1:
        raise ValueError('maxlags must be None or strictly positive < %d' % Nx)

    c = c[Nx - 1 - maxlags:Nx + maxlags]
    return c
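For example (a hypothetical usage, reusing the sine/cosine signals from the matplotlib answer above):
data = np.arange(0, 2*np.pi, 0.01)
y1 = np.sin(data)
y2 = np.cos(data)

c = xcorr(y1, y2, maxlags=10)
print(c.shape)  # (21,): lags -10 .. +10
Note that, unlike plt.xcorr with its default normed=True, this stripped-down version returns the raw lagged products rather than normalised coefficients.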

python recursive vectorization with timeseries

I have a Timeseries (s) which need to be processed recursively to get a timeseries result (res). Here is my sample code:
res = s.copy()*0
res[1] = k  # k is a constant
for i in range(2, len(s)):
    res[i] = c1*(s[i]+s[i-1])/2 + c2*res[i-1] + c3*res[i-2]
where c1, c2, c3 are constants. It works properly, but I'd like to use vectorization, so I tried:
res[2:]=c1*(s[2:]+s[1:-1])/2+c2*res[1:-1]+c3*res[0:-2]
but I get "ValueError: operands could not be broadcast together with shapes (1016) (1018) "
If I try
res = c1*(s[2:]+s[1:-1])/2 + c2*res[1:-1] + c3*res[0:-2]
it doesn't give any error, but I don't get a correct result, because res[0] and res[1] have to be initialized before the calculation takes place.
Is there a way to process it with vectorization?
Any help will be appreciated, thanks!
This expression
res[i] = c1*(s[i] + s[i-1])/2 + c2*res[i-1] + c3*res[i-2]
says that res is the output of a linear filter (or ARMA process) with input s. Several libraries have functions for computing this. Here's how you can use the scipy function scipy.signal.lfilter.
From inspection of the recurrence relation, we get the coefficients of the numerator (b) and denominator (a) of the filter's transfer function:
b = c1 * np.array([0.5, 0.5])
a = np.array([1, -c2, -c3])
We'll also need an appropriate initial condition for lfilter to handle res[:2] == [0, k]. For this, we use scipy.signal.lfiltic:
zi = lfiltic(b, a, [k, 0], x=s[1::-1])
In the simplest case, one would call lfilter like this:
y = lfilter(b, a, s)
With an initial condition zi, we use:
y, zo = lfilter(b, a, s, zi=zi)
However, to exactly match the calculation provided in the question, we need the output y to start with [0, k]. So we'll allocate an array y, initialize the first two elements with [0, k], and assign the output of lfilter to y[2:]:
y = np.empty_like(s)
y[:2] = [0, k]
y[2:], zo = lfilter(b, a, s[2:], zi=zi)
Here's a complete script with the original loop and with lfilter:
import numpy as np
from scipy.signal import lfilter, lfiltic
c1 = 0.125
c2 = 0.5
c3 = 0.25
np.random.seed(123)
s = np.random.rand(8)
k = 3.0
# Original version (edited lightly)
res = np.zeros_like(s)
res[1] = k  # k is a constant
for i in range(2, len(s)):
    res[i] = c1*(s[i] + s[i-1])/2 + c2*res[i-1] + c3*res[i-2]
# Using scipy.signal.lfilter
# Coefficients of the filter's transfer function.
b = c1 * np.array([0.5, 0.5])
a = np.array([1, -c2, -c3])
# Create the initial condition of the filter such that
# y[:2] == [0, k]
zi = lfiltic(b, a, [k, 0], x=s[1::-1])
y = np.empty_like(s)
y[:2] = [0, k]
y[2:], zo = lfilter(b, a, s[2:], zi=zi)
np.set_printoptions(precision=5)
print "res:", res
print "y: ", y
The output is:
res: [ 0. 3. 1.53206 1.56467 1.24477 1.08496 0.94142 0.84605]
y: [ 0. 3. 1.53206 1.56467 1.24477 1.08496 0.94142 0.84605]
lfilter accepts an axis argument, so you can filter an array of signals with a single call. lfiltic does not have an axis argument, so setting up the initial conditions requires a loop. The following script shows an example.
import numpy as np
from scipy.signal import lfilter, lfiltic
import matplotlib.pyplot as plt
# Parameters
c1 = 0.2
c2 = 1.1
c3 = -0.5
k = 1
# Create an array of signals for the demonstration.
np.random.seed(123)
nsamples = 50
nsignals = 4
s = np.random.randn(nsamples, nsignals)
# Coefficients of the filter's transfer function.
b = c1 * np.array([0.5, 0.5])
a = np.array([1, -c2, -c3])
# Create the initial condition of the filter for each signal
# such that
# y[:2] == [0, k]
# We need a loop here, because lfiltic is not vectorized.
zi = np.empty((2, nsignals))
for i in range(nsignals):
    zi[:, i] = lfiltic(b, a, [k, 0], x=s[1::-1, i])
# Create the filtered signals.
y = np.empty_like(s)
y[:2, :] = np.array([0, k]).reshape(-1, 1)
y[2:, :], zo = lfilter(b, a, s[2:], zi=zi, axis=0)
# Plot the filtered signals.
plt.plot(y, linewidth=2, alpha=0.6)
ptp = y.ptp()
plt.ylim(y.min() - 0.05*ptp, y.max() + 0.05*ptp)
plt.grid(True)
plt.show()
Plot: [figure of the four filtered signals omitted]
