Reconstructing the original data from detrended data -- Python

I have obtained detrended data with the following Python code:
Detrended_Data = signal.detrend(Original_Data)
Is there a function in Python with which "Original_Data" can be reconstructed from "Detrended_Data" and some "correction factor"?

Are you referring to scipy.signal.detrend? If so, the answer is no -- there is no (and can never be an) un-detrend function. detrend maps many arrays to the same array. For example,
import numpy as np
import scipy.signal as signal

t = np.linspace(0, 5, 100)
# a ramp and a ramp with twice the slope both detrend to (numerically) zero
assert np.allclose(signal.detrend(t), signal.detrend(2*t))
If there were an undetrend function, it would have to map signal.detrend(t) back to t, and also map signal.detrend(2*t) back to 2*t. That's impossible, since signal.detrend(t) is the same array as signal.detrend(2*t).

I guess you could fit a trend to your data with NumPy and add it back. That wouldn't exactly recover the original data, but it would make the result less 'noisy'; see the sketch below.
Read this question, as it goes into much more detail on this.
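A minimal sketch of the idea, assuming you fit and store the linear trend yourself before (or alongside) detrending; the stored trend is the "correction factor", and it cannot be recovered from the detrended data alone:
import numpy as np
import scipy.signal as signal

original = np.linspace(0, 5, 100) + 0.1*np.random.randn(100)
detrended = signal.detrend(original)

# fit and keep the linear trend separately -- this is the "correction factor"
x = np.arange(original.size)
slope, intercept = np.polyfit(x, original, 1)
trend = slope*x + intercept

reconstructed = detrended + trend
print(np.allclose(reconstructed, original))  # True, up to floating-point error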

Related

How to deal with large integers in NumPy?

I'm doing a data analysis project where I'm working with really large numbers. I originally did everything in pure Python, but I'm now trying to do it with numpy and pandas. However, it seems I've hit a roadblock: it is not possible to handle integers larger than 64 bits in numpy (if I use Python ints in numpy they max out at 9223372036854775807). Do I just throw away numpy and pandas completely, or is there a way to use them with Python-style arbitrarily large integers? I'm okay with a performance hit.
By default numpy stores elements in fixed-size numeric dtypes, but you can force the dtype to object, which makes each element a Python int with arbitrary precision:
import numpy as np
# object dtype stores plain Python ints, which never overflow
x = np.array([10, 20, 30, 40], dtype=object)
x_exp2 = 1000**x
print(x_exp2)
the output is
[1000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
The drawback is that the execution is much slower.
Later edit, to show that np.sum() (and np.prod()) also work on object arrays. There could be some limitations with other functions, of course.
import numpy as np
x = np.array([10,20,30,40], dtype=object)
x_exp2 = 1000**x
print(x_exp2)
print(np.sum(x_exp2))
print(np.prod(x_exp2))
and the output is:
[1000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
1000000000000000000000000000001000000000000000000000000000001000000000000000000000000000001000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
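Since the question also mentions pandas: the same object-dtype trick works there too. A minimal sketch with the same values, assuming a reasonably recent pandas:
import pandas as pd

s = pd.Series([10, 20, 30, 40], dtype=object)
big = 1000**s            # element-wise power on Python ints, no overflow
print(big)
print(big.sum())         # the sum is still an exact Python int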

Why does netCDF4 give different results depending on how data is read?

I am coding in Python, and trying to use netCDF4 to read in some floating point netCDF data. My original code looked like
from netCDF4 import Dataset
import numpy as np
infile='blahblahblah'
ds = Dataset(infile)
start_pt = 5 # or whatever
x = ds.variables['thedata'][start_pt:start_pt+2,:,:,:]
Because of various and sundry other things, I now have to read 'thedata' one slice at a time:
x = np.zeros([2,I,J,K]) # I,J,K match size of input array
for n in range(2):
    x[n,:,:,:] = ds.variables['thedata'][start_pt+n,:,:,:]
The thing is that the two methods of reading give slightly different results. Nothing big, like one part in 10 to the fifth, but still ....
So can anyone tell me why this is happening and how I can guarantee the same results from the two methods? My thought was that the first method perhaps automatically establishes x as being the same type as the input data, while the second method establishes x as the default type for a numpy array. However, the input data is 64 bit and I thought the default for a numpy array was also 64 bit. So that doesn't explain it. Any ideas? Thanks.
The first example pulls the data into a NetCDF4 Variable object, while the second example pulls the data into a numpy array. Is it possible that the Variable object is just displaying the data with a different amount of precision?
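One way to check, sketched below with the same placeholder file and variable names as in the question: compare the dtypes of the two reads, and pre-allocate the slice-by-slice array with the variable's own dtype so no implicit cast can creep in.
from netCDF4 import Dataset
import numpy as np

ds = Dataset('blahblahblah')
var = ds.variables['thedata']
start_pt = 5

# read both ways and compare dtype and contents
x_all = var[start_pt:start_pt+2, :, :, :]
x_sliced = np.zeros((2,) + var.shape[1:], dtype=var.dtype)
for n in range(2):
    x_sliced[n, :, :, :] = var[start_pt + n, :, :, :]

print(x_all.dtype, x_sliced.dtype)
print(np.array_equal(np.asarray(x_all), x_sliced))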

DSP : audio processing : square or log to leverage fft?

Context:
I am discovering the vast field of DSP. Yes, I'm a beginner.
My goal:
Apply the FFT to an audio array given by audiolab to get the different frequencies of the signal.
Question:
I just cannot work out what to do with the numpy array of audio data that audiolab gives me:
import numpy as np
from scikits.audiolab import Sndfile
f = Sndfile('first.ogg', 'r')
# Sndfile instances can be queried for the audio file meta-data
fs = f.samplerate
nc = f.channels
enc = f.encoding
print(fs,nc,enc)
# Reading is straightforward
data = f.read_frames(10)
print(data)
print(np.fft.fft(data))
Now I have got my data.
Readings
I read these two nice articles:
Analyze audio using Fast Fourier Transform (the accepted answer is wonderful)
and
http://www.onlamp.com/pub/a/python/2001/01/31/numerically.html?page=2
Now there are two techniques: apparently one suggests squaring the FFT values (first link), whereas the other takes a log, specifically 10*log10(abs(1e-20 + value)) (the tiny offset avoids log(0)).
Which one is best?
SUM UP:
I would like to get the Fourier analysis of my array, but either of those two answers seems only to emphasise the signal rather than isolate its components.
I may be wrong; I am still a noob.
What should I really do, then?
Thanks,
UPDATE:
I asked this question:
DSP - get the amplitude of all the frequencies, which is related to this one.
Your question seems pretty confused, but you've obviously tried something, which is great. Let me take a step back and suggest an overall route for you:
Start by breaking your audio into chunks of some size, say N.
Perform the FFT on each chunk of N samples.
THEN worry about displaying the data as RMS (the square approach) or dB (the log-based approach); a small chunking sketch follows at the end of this answer.
Really, you can think of those values as scaling factors for display.
If you need help with the FFT itself, my blog post on pitch detection with the FFT may help: http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
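A minimal sketch of the chunking step, assuming a mono signal array data and a chunk size N (both names are placeholders):
import numpy as np

def chunked_spectra(data, N):
    # split the signal into non-overlapping chunks of N samples and
    # return the magnitude spectrum of each chunk
    spectra = []
    for start in range(0, len(data) - N + 1, N):
        chunk = data[start:start + N]
        spectra.append(np.abs(np.fft.rfft(chunk)))  # rfft: real input, positive frequencies only
    return np.array(spectra)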
Adding to the answer given by Bjorn Roche.
Here is a simple code for plotting frequency spectrum, using dB scale.
It uses matplotlib for plotting.
import numpy as np
import pylab

# for a real signal
def plotfftspectrum(signal, dt):  # dt is the sample spacing, i.e. 1/sample rate
    n = signal.size
    spectrum = np.abs(np.fft.fft(signal))
    spectrum = 20*np.log10(spectrum/spectrum.max())  # dB scale
    frequencies = np.fft.fftfreq(n, dt)
    # plot only the first n//2 bins: the spectrum of a real signal is symmetric
    pylab.plot(frequencies[:n//2], spectrum[:n//2])
    pylab.show()
You can use it after reading at least some samples of your data, e.g. 1024:
data = f.read_frames(1024)
plotfftspectrum(data, 1./f.samplerate)
where 1./f.samplerate is the sample spacing in seconds.

Why does the numpy angle function also return values for the masked elements of a masked array?

If you try the following code segment
import numpy as np
import numpy.ma as ma
a = np.random.random(100) + 1j*np.random.random(100)
mask = np.ones_like(a, dtype='bool')
mask[0:9] = False
a = ma.masked_array(a, mask)
phase = np.angle(a)
The phase array will not be masked. The angle function will return values for the whole array, even for the masked out values. Am I doing something wrong here or is this the way it should be? If so, why?
Had a quick look at the numpy source, and it might be a bug/not implemented yet.
It's listed as a "missing feature (work in progress)" on the numpy.ma page, issue #1: http://projects.scipy.org/numpy/wiki/MaskedArray.
The problem is that a number of unary functions such as np.angle, np.quantile call [np.]asarray in the source, which strips out the mask.
As the devs explain in the page I linked to, if these functions used ma.asarray instead of np.asarray they'd work, but they don't :(.
I guess this is a patch yet to be submitted?
As a temporary workaround, np.angle basically calls np.arctan2(a.imag,a.real) (optionally multiplying by 180/pi to get degrees), so you could use that.
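A minimal sketch of that workaround, rebuilding an array like the one in the question; ma.arctan2 preserves the mask where np.angle does not:
import numpy as np
import numpy.ma as ma

a = np.random.random(100) + 1j*np.random.random(100)
mask = np.ones_like(a, dtype='bool')
mask[0:9] = False
a = ma.masked_array(a, mask)

# ma.arctan2 respects the mask, unlike np.angle
phase = ma.arctan2(a.imag, a.real)
print(phase.mask.any())   # True -- the mask survives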

Is it possible to reproduce randn() of MATLAB with NumPy?

I wonder if it is possible to exactly reproduce the whole sequence of randn() of MATLAB with NumPy. I coded my own routine with Python/NumPy, and it is giving me slightly different results from the MATLAB code somebody else wrote, and I am having a hard time finding out where the difference comes from because of the different random draws.
I have found a numpy.random.seed value which produces the same number for the first draw, but from the second draw onward it is completely different. I'm making multivariate normal draws about 20,000 times, so I don't want to just save the MATLAB draws and read them in Python.
The user asked if it was possible to reproduce the output of randn() of Matlab, not rand. I have not been able to set the algorithm or seed to reproduce the exact number for randn(), but the solution below works for me.
In Matlab: Generate your normal distributed random numbers as follows:
rng(1);
norminv(rand(1,5),0,1)
ans =
-0.2095 0.5838 -3.6849 -0.5177 -1.0504
In Python: Generate your normal distributed random numbers as follows:
import numpy as np
from scipy.stats import norm
np.random.seed(1)
norm.ppf(np.random.rand(1,5))
array([[-0.2095, 0.5838, -3.6849, -0.5177,-1.0504]])
It is quite convenient to have functions, which can reproduce equal random numbers, when moving from Matlab to Python or vice versa.
If you set both random number generators to the same algorithm and seed, they should in principle create the same numbers. I am not quite sure how best to do it, but this seems to work. In MATLAB do:
rand('twister', 5489)
and the corresponding call in numpy:
np.random.seed(5489)
to (re)initialize your random number generators. For me this gives the same numbers for rand() and np.random.random(), but not for randn(); I am not sure whether there is an easy method for that.
With newer MATLAB versions you can probably set up a RandStream with the same properties as numpy; for older ones you can reproduce numpy's randn in MATLAB (or vice versa). Numpy uses the polar form to create the normally distributed numbers from the uniform numbers given by np.random.random() (the second algorithm given here: http://www.taygeta.com/random/gaussian.html). You could write that algorithm in MATLAB to create the same randn numbers as numpy does from MATLAB's rand function.
If you don't need a huge amount of random numbers, just save them in a .mat file and read them with scipy.io, though...
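For illustration, a minimal sketch of the polar (Marsaglia) method referred to above; note that NumPy's legacy randn additionally caches the second deviate of each pair, so this sketch is not guaranteed to reproduce its output stream bit for bit:
import numpy as np

def polar_normal(n, uniform):
    # draw n standard-normal deviates with the Marsaglia polar method,
    # given a callable uniform() returning U(0, 1) samples
    out = []
    while len(out) < n:
        u = 2.0*uniform() - 1.0
        v = 2.0*uniform() - 1.0
        s = u*u + v*v
        if 0.0 < s < 1.0:                 # reject points outside the unit circle
            f = np.sqrt(-2.0*np.log(s)/s)
            out.extend([u*f, v*f])
    return np.array(out[:n])

np.random.seed(5489)
print(polar_normal(5, np.random.random))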
Just to clarify further on the twister/seeding method: MATLAB and numpy generate the same sequence with this seeding, but fill matrices with it differently.
MATLAB fills a matrix down its columns, while numpy fills it across rows. So in order to get the same matrices in both, you have to transpose:
MATLAB:
rand('twister', 1337);
A = rand(3,5)
A =
Columns 1 through 2
0.262024675015582 0.459316887214567
0.158683972154466 0.321000540520167
0.278126519494360 0.518392820597537
Columns 3 through 4
0.261942925565145 0.115274226683149
0.976085284877434 0.386275068634359
0.732814552690482 0.628501179539712
Column 5
0.125057926335599
0.983548605143641
0.443224868645128
python:
import numpy as np
np.random.seed(1337)
A = np.random.random((5,3))
A.T
array([[ 0.26202468, 0.45931689, 0.26194293, 0.11527423, 0.12505793],
[ 0.15868397, 0.32100054, 0.97608528, 0.38627507, 0.98354861],
[ 0.27812652, 0.51839282, 0.73281455, 0.62850118, 0.44322487]])
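An equivalent way to match MATLAB's column-major fill, as a small sketch, is to draw a flat vector and reshape it with order='F' instead of transposing:
import numpy as np

np.random.seed(1337)
flat = np.random.random(15)
A = flat.reshape((3, 5), order='F')   # fill down columns, as MATLAB does
print(A)  # same 3x5 matrix as the transposed array above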
Note: I also placed this answer on this similar question: Comparing Matlab and Numpy code that uses random number generation
