It seems that scipy.signal.resample() introduces errors when downsampling to an even number of points. For example, if we upsample a function to a multiple of its original number of points and then downsample again, we should get the original function back.
from scipy import signal
import numpy as np

def test_resample(n1, n2):  # upsample from n1 to n2 points and back down to n1
    x1 = np.arange(n1)
    y1 = np.sin(x1)
    y2, x2 = signal.resample(y1, n2, x1)
    y3, x3 = signal.resample(y2, n1, x2)
    print(np.allclose(y1, y3))
But this fails when the lower number of points is even:
test_resample(10,20)
False
test_resample(11,22)
True
test_resample(11,33)
True
The problem occurs at the downsampling step. The errors are large, at least several percent for functions I tested.
Update of 4/8/17: This really seems to be a coding error. I reported details of the bug here.
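To put a number on "several percent", a small sketch along the same lines as the test above (my own extension, using the same sine test function) prints the maximum absolute round-trip deviation for each pair of sizes:

from scipy import signal
import numpy as np

def roundtrip_error(n1, n2):
    # maximum absolute deviation after resampling n1 -> n2 -> n1
    x1 = np.arange(n1)
    y1 = np.sin(x1)
    y2, x2 = signal.resample(y1, n2, x1)
    y3, _ = signal.resample(y2, n1, x2)
    return np.max(np.abs(y3 - y1))

for n1, n2 in [(10, 20), (11, 22), (11, 33)]:
    print(n1, n2, roundtrip_error(n1, n2))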
Related
I am having issues implementing, in Python, a strategy described in the article "Blind identification strategies for room occupancy estimation".
To make my question as easy to answer as possible, I will keep the level of detail to the bare minimum.
Here is the mathematics of the problem.
My aim is to minimize L, which represents the negative log-likelihood function. The problem: the implementation produces astronomically high values that do not seem coherent (even infinite for certain inputs), and my results do not match the ones given in the article. I cannot find out why.
Here is the L function I defined in Python:
import numpy as np
from numpy.linalg import inv, det
from numpy import dot, transpose, log

def L(ARX, O, u, y):
    # Preliminaries: break theta down into a, bu, bo, s2, o(1), ..., o(N)
    N = O.shape[0]
    [a, bu, bo, s2] = ARX          # the coefficients of the ARX system to be identified
    I = np.eye(N)                  # the identity matrix for later calculations
    # Delta matrix (sub-diagonal shift)
    Delta = np.zeros([N, N])
    Delta[1:N, 0:N-1] = np.eye(N-1)
    # y_barre
    y_barre = dot(I - a*Delta, y) - dot(bu*Delta, u) - dot(bo*Delta, O)
    # Sigma_y (covariance matrix)
    Sy = s2 * dot(inv(I - a*Delta), inv(I - a*Delta).T)
    # finally, the negative log likelihood is given by
    return log(det(Sy)) + 1/s2 * dot(y_barre.T, y_barre)
I am new to Python, so I am not sure if this problem is due to my inexperience or whether this is a glitch.
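One possible source of the non-finite values (this is my assumption, not something established above) is that det(Sy) itself overflows or underflows for larger N, so log(det(Sy)) comes out as inf or -inf. A minimal sketch of the same quantity using numpy.linalg.slogdet, which returns the log-determinant directly, would look like this:

import numpy as np
from numpy.linalg import slogdet

def L_stable(ARX, O, u, y):
    # Same quantity as L() above, but log(det(Sy)) is computed via slogdet
    # to avoid overflow/underflow of det() itself (assumed cause of the infs).
    N = O.shape[0]
    a, bu, bo, s2 = ARX
    I = np.eye(N)
    Delta = np.zeros((N, N))
    Delta[1:N, 0:N-1] = np.eye(N-1)
    A = I - a * Delta
    y_barre = A @ y - bu * (Delta @ u) - bo * (Delta @ O)
    # Sy = s2 * inv(A) @ inv(A).T, hence log(det(Sy)) = N*log(s2) - 2*log|det(A)|
    sign, logabsdet_A = slogdet(A)
    log_det_Sy = N * np.log(s2) - 2.0 * logabsdet_A
    return log_det_Sy + (y_barre @ y_barre) / s2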
I am running this code multiple times on the same data (no random number generation) and getting different results. This has occurred with more than one variable so far, and obviously I cannot proceed with the analysis until I figure out which results are trustworthy. Here is a short sample of the results I have obtained after running the code four times. Why is there such a discrepancy between these outputs? I am puzzled and greatly appreciate your advice.
Linear Regression
from scipy.stats import linregress
import scipy.stats
from scipy.signal import welch
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as signal
part_022_o = pd.read_excel(r'C:\Users\Me\Desktop\Behavioral Data Processed\part_022_combined_other.xlsx')
distance_o = part_022_o["distance"]
fs = 200
f, Pwelch_spec = signal.welch(distance_o, fs=fs, window='hanning',nperseg=400, noverlap=200, scaling='density', average='mean')
log_f = np.log(f, where=f>0)
log_pwelch = np.log(Pwelch_spec, where=Pwelch_spec>0)
idx = np.isfinite(log_f) & np.isfinite(log_pwelch)
polynomial_coefficients = np.polyfit(log_f[idx],log_pwelch[idx],1)
print(polynomial_coefficients)
scipy.stats.linregress(log_f[idx], log_pwelch[idx])
Results First Attempt
[ 0.00324568 -2.82962602]
Results Second Attempt
[-2.70137164 6.97117509]
Results Third Attempt
[-2.70137164 6.97117509]
Results Fourth Attempt
[-2.28028005 5.53839502]
The same thing happens when I use scipy.stats.linregress().
Thank you,
Confused
Edit: full code added.
Also, the issue appears to be related to np.log(), since only the values of "log_f" array seem to be changing with the different outputs. It is hard to be certain that nothing else is changing (e.g. log_pwelch), but differences in output clearly correspond to differences in the first value of the "log_f" array.
Edit: I have narrowed the issue down to np.log(f, where=f>0). The first value in the f array is zero. According to the numpy log documentation, "...Note that if an uninitialized out array is created via the default out=None, locations within it where the condition is False will remain uninitialized." Apparently this means that the corresponding entry is unpredictable and can vary from run to run, which is exactly what I am observing. Given my inexperience with Python, I am not sure what the best solution is (e.g. specifying the out array in the log function, using a random seed, or only keeping the regression coefficients when the zero value is left unchanged after the log, etc.).
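For what it is worth, a simple sketch of the first option (based on the numpy documentation quoted above, and assuming the f and Pwelch_spec arrays from the script earlier) is to pass an explicitly initialized out array, or to mask out the zero bins before taking the log:

import numpy as np

# Option 1: give np.log an explicitly initialized out array, so entries where
# the condition is False hold a known value (NaN) instead of leftover memory.
log_f = np.log(f, out=np.full_like(f, np.nan, dtype=float), where=f > 0)
log_pwelch = np.log(Pwelch_spec, out=np.full_like(Pwelch_spec, np.nan, dtype=float),
                    where=Pwelch_spec > 0)
idx = np.isfinite(log_f) & np.isfinite(log_pwelch)
polynomial_coefficients = np.polyfit(log_f[idx], log_pwelch[idx], 1)

# Option 2 (alternative): drop the zero bins first, then take the log of the rest.
mask = (f > 0) & (Pwelch_spec > 0)
polynomial_coefficients = np.polyfit(np.log(f[mask]), np.log(Pwelch_spec[mask]), 1)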
Try using a random seed to reproduce your results. Do this with the following code at the top of your program:

import numpy as np
np.random.seed(123)  # or any number you want

See here for more info: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
A random seed ensures you get repeatable results when some part of your program is generating numbers at random.
Try finding out what the functions (np.polyfit(), np.log()) are actually doing by reading their documentation.
Using a seed value is standard practice in scikit-learn and machine learning in general.
I'm writing a program for a friend of mine who is currently studying aeronautical engineering. I'm trying to test whether the math I've implemented works. For those who know, I'm trying to calculate the divergence (I think; I'm not an engineer and I'm not going to pretend that I am).
He sent me a Stack Overflow link to how he thinks this should be done (the thread can be found here). His version doesn't work for me, as it gives me the NumPy error seen below:
numpy.core._internal.AxisError: axis 1 is out of bounds for array of dimension 1
Now I've tried a different method that gives me a different error, seen below:
ValueError: operands could not be broadcast together with shapes (60,58) (60,59)
I'm not entirely sure how to fix this. Here is the code that produces the error:
velocity = np.diff(c_flow)/np.diff(zex)
ucom = velocity.real
vcom = -(velocity.imag)
deltau = np.divide((np.diff(ucom)),(np.diff(x)))
deltav = np.divide((np.diff(vcom)),np.diff(y))
print(deltau + deltav)
Note: c_flow is defined earlier in the program and is the complex potential. zex is also defined earlier as an early form of the complex variable. x and y are two coordinate matrices made from coordinate vectors.
The expected results from the print statement should be zero or a value that is very close to zero. (I'm not entirely sure what the value should be but as I've said, I'm not an engineer)
Thank you in advance
EDIT:
After following BenT's advice I used np.gradient and np.sum, but this was adding along the wrong axis, so to counteract this I separated the two steps as seen below:
velocity = np.diff(c_flow)/np.diff(z)
grad = (np.gradient(velocity))
divergence = np.sum(grad, axis=0)
print(np.average(divergence))
print(np.average(velocity))
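For reference, here is a minimal sketch of how the divergence of a 2-D velocity field could be computed with np.gradient; the grid spacing, field values, and layout (rows = y, columns = x) are assumptions for illustration, not taken from the program above. Taking the x-derivative of u along axis 1 and the y-derivative of v along axis 0 keeps both arrays the same shape, so they can be added without a broadcast error:

import numpy as np

nx, ny = 60, 60
x = np.linspace(0.0, 1.0, nx)
y = np.linspace(0.0, 1.0, ny)
X, Y = np.meshgrid(x, y)            # arrays of shape (ny, nx)

# example velocity field: uniform flow, which should have zero divergence
u = np.ones_like(X)
v = np.zeros_like(X)

du_dx = np.gradient(u, x, axis=1)   # d/dx varies along the columns
dv_dy = np.gradient(v, y, axis=0)   # d/dy varies along the rows
divergence = du_dx + dv_dy

print(np.max(np.abs(divergence)))   # ~0 for this field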
I am trying to write a script in python to detect the existence of a simple alarm sound in any given input audio file. I explain my solution and I appreciate it if anyone can confirm it is a good solution. Any other solution implementable in python is appreciated.
The way I do this is by calculating the cross-correlation of the two signals: I compute the FFT of both signals (one of them reversed), multiply them together, and then calculate the IFFT of the result. Finding the peak of the result and comparing it with a pre-specified threshold then determines whether the alarm sound is detected.
This is my code:
import scipy.fftpack as fftpack

def similarity(template, test):
    corr = fftpack.irfft(fftpack.rfft(test, 2 * test.size) *
                         fftpack.rfft(template[:-1], 2 * template.size))
    return max(abs(corr))
template and test are the 1-D arrays of signal data. The second argument to rfft is used to zero-pad the signals before the FFT; however, I am not sure how many zeros should be added. Also, should I do any normalisation of the given signals before applying the FFT, for example normalising based on the peak of the template signal?
Solved!
I just needed to use scipy.signal.fftconvolve which takes care of zero padding itself. No normalization was required. So the working code for me is:
from scipy.signal import fftconvolve

def similarity(template, test):
    corr = fftconvolve(template, test, mode='same')
    return max(abs(corr))
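A small usage sketch of how this could be wired into a detector; the file names and threshold value are placeholders, not values from my project. One caveat to verify: fftconvolve computes a convolution, and a true cross-correlation would reverse one input (e.g. template[::-1]); whether that distinction matters here depends on the template, so I flag it as an assumption:

import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

def similarity(template, test):
    # correlation obtained by convolving with the reversed template
    corr = fftconvolve(test, template[::-1], mode='same')
    return np.max(np.abs(corr))

THRESHOLD = 1e6  # placeholder; calibrate on known positive and negative clips

_, template = wavfile.read('alarm_template.wav')   # placeholder file names
_, recording = wavfile.read('input_audio.wav')

score = similarity(template.astype(float), recording.astype(float))
print('alarm detected' if score > THRESHOLD else 'no alarm')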
I'm trying to calculate the lag between two signals in Python using cross-correlation. The two signals are almost identical except for a very small time lag. I've tried numpy.correlate and scipy.convolve (a lot faster), and both work relatively well but give a small error. I'm starting to suspect that the error is the result of Python/scipy/numpy truncating a float somewhere. Has anyone been able to get high-accuracy signal delay calculations working in Python?
Best regards
Fredrik
Depending on the power spectrum of the two signals, you do get a small error due to the fact that the cross-correlation is not properly normalised at each lag. Here is a little function that I use; it normalises the overlap region at each lag, and I found it gives accurate results:
import numpy as np

def NormCrossCorrSlow(x1, x2, nlags=400):
    res = []
    for i in range(-(nlags // 2), nlags // 2):
        if i < 0:
            xx1 = x1[:i]
            xx2 = x2[-i:]
        elif i == 0:
            xx1 = x1
            xx2 = x2
        else:
            xx1 = x1[i:]
            xx2 = x2[:-i]
        # normalise by the energy of the two overlapping segments at this lag
        res.append((xx1 * xx2).sum() / ((xx1**2).sum() * (xx2**2).sum())**0.5)
    return np.array(res)
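A short usage sketch (the sample rate and test signals are made up for illustration): the lag corresponds to the index of the correlation peak, offset by the centre of the lag window.

import numpy as np

fs = 1000.0                              # assumed sample rate
t = np.arange(0, 1, 1 / fs)
x1 = np.sin(2 * np.pi * 5 * t)
x2 = np.roll(x1, 7)                      # x2 is x1 delayed by 7 samples

nlags = 400
res = NormCrossCorrSlow(x1, x2, nlags=nlags)
i_peak = np.argmax(res) - nlags // 2     # lag index i at the correlation peak
delay = -i_peak                          # delay of x2 relative to x1, in samples
print(delay, delay / fs)                 # expect 7 samples, i.e. 0.007 s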