How to solve real-life difference equations using Python

I want to solve a difference equation using python.
y(n) = x(n - 1) - 0.5*(x(n - 2) + x(n))
x here is a long array of values. I want to plot y with respect to another time sequence array t using Plotly. I can plot x with t, but I am not able to generate the filtered signal y. I have tried the code below, but it seems I'm missing something. I am not getting the desired output.
from numpy import array
from scipy import signal
from plotly.offline import plot, iplot
x = array(...)
t = array(...) # x and t are big arrays
b = [-0.5, 1, -0.5]
a = 0
y = signal.lfilter(b, a, x, axis=-1, zi=None)
iplot([{"x": t, "y": y}])
However, the output is something like this.
>>> y
array([-inf, ..., nan])
Therefore, I am getting a blank graph.
UPDATE with examples of x and t (9 values each):
x = [3.1137561664814495,
-1.4589810840917137,
-0.12631870857936914,
-1.2695030212226599,
2.7600637824592158,
-1.7810937909691049,
0.050527483431747656,
0.27158522344564368,
0.48001109260160274]
t = [0.0035589523041146265,
0.011991765409288035,
0.020505576424579175,
0.028935389041247817,
0.037447199517441021,
0.045880011487565042,
0.054462819797731044,
0.062835632533346342,
0.071347441874490158]

It appears that your problem is that you define a = 0. When running your example, SciPy emits the following warning:
/usr/local/lib/python2.7/site-packages/scipy/signal/signaltools.py:1353: RuntimeWarning:
divide by zero encountered in true_divide
[-inf inf nan nan nan inf -inf nan nan]
This division by zero is caused by the value of a. If you look at the documentation of scipy.signal.lfilter, it points out the following:
a : array_like
The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0].
If you change a = 0 to a = 1 you should get the output you desire, though consider whether you want to normalize the output by a different factor.
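For example, a minimal sketch using the nine sample values from your update; essentially the only change from your code is a = 1:

import numpy as np
from scipy import signal

x = np.array([3.1137561664814495, -1.4589810840917137, -0.12631870857936914,
              -1.2695030212226599, 2.7600637824592158, -1.7810937909691049,
              0.050527483431747656, 0.27158522344564368, 0.48001109260160274])
t = np.array([0.0035589523041146265, 0.011991765409288035, 0.020505576424579175,
              0.028935389041247817, 0.037447199517441021, 0.045880011487565042,
              0.054462819797731044, 0.062835632533346342, 0.071347441874490158])

b = [-0.5, 1, -0.5]   # y(n) = x(n-1) - 0.5*(x(n-2) + x(n))
a = 1                 # FIR filter: the denominator is just 1
y = signal.lfilter(b, a, x)

print(y)
# iplot([{"x": t, "y": y}])  # now plots a finite signal instead of inf/nan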

Related

What is an efficient way to calculate the mean of values in the bin with maximum frequency for large number of numpy arrays?

I am looking for an efficient way to do the following calculation on millions of arrays. For each array, I want to calculate the mean of the values that fall in the bin with the highest frequency, as demonstrated below. Some of the arrays may contain NaN values; the other values are floats. The loop over my actual data takes too long to finish.
import numpy as np

array = np.array([np.random.uniform(0, 10) for i in range(800)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)

bin_values = np.linspace(0, 10, 21)
# frequency of each bin, row by row
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
# left edge of the most frequent bin in each row
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + abs(bin_values[1] - bin_values[0])

values = np.zeros(array.shape[0])
for i in range(array.shape[0]):
    values[i] = np.nanmean(array[i][(array[i] >= bin_start[i]) * (array[i] < bin_end[i])])
Also, when I run the above code I get three warnings. The first is 'RuntimeWarning: Mean of empty slice' for the line where I calculate the values variable. I added a condition to skip that line when a row is all NaN, but the warning did not go away, and I was wondering why. The other two warnings come from the less-than and greater-than-or-equal comparisons, which makes sense to me since they involve NaN values.
The arrays that I want to run this algorithm on are independent, but I am already processing them with 12 separate scripts. Running the code in parallel would be an option, however, for now I am looking to improve the algorithm itself.
The reason I am using a lambda function is to run numpy.histogram over an axis, since the histogram function does not seem to take an axis as an option. I was able to use a mask and remove the loop from the code. The code is now twice as fast, but I think it can still be improved.
I can explain what I want to do in more detail with an example. Imagine I have 36 numbers which are greater than 0 and smaller than 20, and bins of equal width 0.5 over the same interval (0.0-0.5, 0.5-1.0, 1.0-1.5, ..., 19.5-20.0). If I put the 36 numbers into their corresponding bins, I want the mean of the numbers inside the bin that contains the most of them.
Please post your solution if you can think of a faster algorithm.
import numpy as np
# creating an array to test the algorithm
array = np.array([np.random.uniform(0, 10) for i in range(800)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)
# the algorithm
bin_values=np.linspace(0, 10, 21)
# calculating the frequency of each bin
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + (abs(bin_values[1]-bin_values[0]))
# creating a mask to get the mean over the bin with maximum frequency
mask = (array>=bin_start) * (array<bin_end)
mask_nan = np.tile(np.nan, (mask.shape[0], mask.shape[1]))
mask_nan[mask] = 1
v = np.nanmean(array * mask_nan, axis = 1)
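For reference, here is a compact restatement of the same masked approach as a reusable function; np.where replaces the np.tile/NaN-mask construction but should give the same values (modal_bin_mean is just a hypothetical helper name):

import numpy as np

def modal_bin_mean(arr, bins):
    """Mean of the values in each row's most frequent bin (NaNs ignored)."""
    counts = np.apply_along_axis(lambda a: np.histogram(a, bins=bins)[0], 1, arr)
    start = bins[counts.argmax(axis=1)][:, None]   # left edge of the modal bin per row
    end = start + (bins[1] - bins[0])
    in_bin = (arr >= start) & (arr < end)
    # rows whose modal bin is empty (e.g. all-NaN rows) still trigger the
    # 'Mean of empty slice' warning and come back as NaN
    return np.nanmean(np.where(in_bin, arr, np.nan), axis=1)

v = modal_bin_mean(array, bin_values)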

Why does Matlab interp1 produce different results than numpy interp?

EDIT: Code edited to produce results consistent with Matlab. See below.
I am converting Matlab scripts to Python and the linear interpolation results are different in certain cases. I wonder why and if there is any way to fix this?
Here is the code example in both Matlab and Python and the resulting output (Note that t just so happens to be equal to tin in this case):
MATLAB:
t= [ 736696., 736696.00208333, 736696.00416667, 736696.00625, 736696.00833333, 736696.01041667, 736696.0125];
tin =[ 736696., 736696.00208333, 736696.00416667, 736696.00625, 736696.00833333, 736696.01041667, 736696.0125];
xin = [ nan , 1392., 1406. , 1418. , nan , 1442. , nan];
interp1(tin,xin,t)
ans =
NaN 1392 1406 1418 NaN 1442 NaN
Python (numpy):
(scipy interpolate.interp1d produces the same result as numpy)
import numpy as np

t = np.array([736696., 736696.00208333, 736696.00416667, 736696.00625,
              736696.00833333, 736696.01041667, 736696.0125])
tin = np.array([736696., 736696.00208333, 736696.00416667, 736696.00625,
                736696.00833333, 736696.01041667, 736696.0125])
xin = np.array([np.nan, 1392., 1406., 1418., np.nan, 1442., np.nan])
x = np.interp(t, tin, xin)
# array([  nan, 1392., 1406.,   nan,   nan,   nan,   nan])

# Edit
# Find the indices where t == tin and, where the np.interp output does not
# match the xin array, overwrite the np.interp output at those indices
same = np.where(t == tin)[0]
not_same = np.where(xin[same] != x[same])[0]
x[same[not_same]] = xin[same[not_same]]
It appears as if Matlab includes an additional equality check in its interpolation.
Linear 1-D interpolation is generally done by finding the two sample points x1 and x2 that bracket the query point x and then calculating the result as:
y = y1 + (y2 - y1)*(x - x1)/(x2 - x1)
If you pass in an x value which is exactly equal to one of the input x coordinates, the routine will generally calculate the correct value, since x - x1 is zero. However, if your input array has a nan as y1 or y2, that nan will propagate to the result.
Based on the code you posted, my best guess would be that Matlab's interpolation function has an additional check that is something like:
if x == x1:
return y1
and that the numpy function does not have this check.
To achieve the same effect in numpy you could do:
np.where(t == tin, xin, np.interp(t, tin, xin))
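A quick sketch checking this against the data from the question (with the lists converted to arrays, as above, so the comparison is element-wise):

import numpy as np

t = np.array([736696., 736696.00208333, 736696.00416667, 736696.00625,
              736696.00833333, 736696.01041667, 736696.0125])
tin = t.copy()
xin = np.array([np.nan, 1392., 1406., 1418., np.nan, 1442., np.nan])

y = np.where(t == tin, xin, np.interp(t, tin, xin))
print(y)  # [  nan 1392. 1406. 1418.   nan 1442.   nan] -- matches Matlab's interp1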

pearson correlation using np.random.rand failing

I have the following code to calculate the correlation coefficient using two different ways of generating number series. It fails for the first way (corr_coeff_pearson) but works for the second way (corr_coeff_pearson_1). Why is this so? In both cases the variables are of class 'numpy.ndarray'.
import numpy as np
np.random.seed(1000)
inp_vct_lngt = 5
X = 2 * np.random.rand(inp_vct_lngt, 1)
y = 4 + 3 * X + np.random.randn(inp_vct_lngt, 1)
print(type(X))
corr_coeff_pearson = 0
corr_coeff_pearson = np.corrcoef(X, y)
print("Pearson Correlation:")
print(corr_coeff_pearson)
X_1 = np.random.randint(0, 50, 5)
y_1 = X_1 + np.random.normal(0, 10, 5)
print(type(X_1))
corr_coeff_pearson_1 = np.corrcoef(X_1,y_1)
print("Pearson Correlation:")
print(corr_coeff_pearson_1)
Is there some way to "convert" the numbers in the first way of generating the series that I am missing?
The issue is that X and y are 2 dimensional:
>>> X
array([[1.9330627 ],
[0.19204405],
[0.21168505],
[0.65018234],
[0.83079548]])
>>> y
array([[8.60619212],
[6.09210226],
[5.33097283],
[5.71649684],
[5.18771916]])
So corrcoef is thinking
Each row of x represents a variable, and each column a single observation of all those variables
(quoted from the docs)
What you can do is either flatten the two to one dimension:
>>> np.corrcoef(X.flatten(),y.flatten())
array([[1. , 0.84196446],
[0.84196446, 1. ]])
Or use rowvar=False:
>>> np.corrcoef(X,y,rowvar=False)
array([[1. , 0.84196446],
[0.84196446, 1. ]])
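If you only need the scalar coefficient rather than the full 2x2 matrix, here is a small follow-up sketch using the same seeded data as in the question:

import numpy as np

np.random.seed(1000)
X = 2 * np.random.rand(5, 1)
y = 4 + 3 * X + np.random.randn(5, 1)

# ravel() flattens the (5, 1) columns to 1-D; [0, 1] picks the off-diagonal
# entry of the correlation matrix, i.e. the correlation between X and y
r = np.corrcoef(X.ravel(), y.ravel())[0, 1]
print(r)  # approximately 0.842, matching the matrix shown above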

Finite Difference Function Index Error

Below is a function for the finite difference method; it's a very standard way of approximating the derivative given some function f(x), a mesh (np.linspace), and a uniform distance h between the grid points.
The problem comes when I try a known function (say x**3) on a mesh from 0 to 10: I receive a specific error. After the code I will post the error that is encountered.
import numpy as np

def finitedifference(f, x, h, n):
    """f : function values you are attempting to differentiate.
    x : grid/domain over which you will differentiate.
    h : distance between points of the uniform mesh.
    n : number of grid points (required for the loop?)."""
    df = np.zeros_like(x)
    for i in range(1, n):
        df[i] = (f[i+1] - f[i-1]) / (2*h)
    # end points
    df[0] = (f[1] - f[0]) / h
    df[-1] = (f[-1] - f[-2]) / h
    return print(df)
What I use:
x = np.linspace(0, 10, 11)
f = x**3
h = x[1] - x[0]
finitedifference(f, x, h, 11)
I receive the error:
"IndexError: index 11 is out of bounds for axis 0 with size 11"
Unfortunately I am not sure what this means, so maybe some clarification on the error/remedies for it? Thank you!
An array (or list) of size 11 has indices 0, 1, 2, ..., 10.
If you loop over range(1, 11), i takes the values 1, 2, ..., 10. When you then access index i+1, you step outside the array as soon as i reaches 10.
The two endpoints are already handled by the one-sided differences after the loop, so the central-difference loop should only cover the interior points: use for i in range(1, n-1) to stay within bounds.
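Putting that together, here is a sketch of a corrected version of the function, keeping the same call signature as in the question:

import numpy as np

def finitedifference(f, x, h, n):
    """Central differences on the interior, one-sided differences at the ends."""
    df = np.zeros_like(f)
    for i in range(1, n - 1):                  # interior points only
        df[i] = (f[i + 1] - f[i - 1]) / (2 * h)
    df[0] = (f[1] - f[0]) / h                  # forward difference at the left end
    df[-1] = (f[-1] - f[-2]) / h               # backward difference at the right end
    return df

x = np.linspace(0, 10, 11)
h = x[1] - x[0]
print(finitedifference(x**3, x, h, len(x)))    # approximates the derivative 3*x**2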

Why does numpy random normal generate a random matrix with the wrong mean value?

I am a newbie to numpy and recently I got very confused by the random.normal method.
I would like to generate a 2 by 2 matrix where the mean is zero, so I wrote the following. However, as you can see, the abs(0 - np.mean(b)) < 0.01 line outputs False. Why? I expected it to output True.
>>> import numpy as np
>>> b = np.random.normal(0.0, 1.0, (2,2))
>>> b
array([[-1.44446094, -0.3655891 ],
[-1.15680584, -0.56890335]])
>>> abs(0 - np.mean(b)) < 0.01
False
If you want a generator that produces samples whose mean and standard deviation exactly match the requested values, you'll need to fix them manually:
def normal_gen(m, s, shape=(2,2)):
b = np.random.normal(0, s, shape)
b = (b - np.mean(b)) * (s / np.std(b)) + m
return b
Sampling from a normal distribution does not guarantee that the mean of your sample equals the mean of the distribution. As the sample size grows, the sample mean converges to the distribution mean (by the law of large numbers), but obviously you can't take an infinite number of samples.
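A small sketch illustrating both points, reusing the normal_gen helper from the answer (the large-sample comparison at the end is only for illustration):

import numpy as np

np.random.seed(0)

def normal_gen(m, s, shape=(2, 2)):
    # shift and rescale the sample so its empirical mean/std match exactly
    b = np.random.normal(0, s, shape)
    return (b - np.mean(b)) * (s / np.std(b)) + m

b = normal_gen(0.0, 1.0)
print(abs(0 - np.mean(b)) < 0.01)   # True: the mean is forced to 0 exactly

# A plain sample only gets close to the requested mean as the sample grows:
big = np.random.normal(0.0, 1.0, (1000, 1000))
print(abs(np.mean(big)))            # small, but not exactly 0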
