How to prewhiten univariate time series in Python?

I am trying to find the cross-correlation between two time series, and it turns out that both are autocorrelated and non-stationary. From what I have read, you have to prewhiten the series before computing the correlation, but I could not find a clear description anywhere of how to prewhiten a time series. I'm going to do it in Python, so even a list of the steps would be great. R has a nice prewhiten() function for this purpose, and I was wondering how to implement it in Python.
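R's prewhiten() fits an autoregressive (AR) model to the first series and passes both series through that same filter before computing the cross-correlation. Below is a minimal NumPy sketch of that idea, not a port of R's code. It assumes you choose the AR order p yourself (for example from the PACF or by AIC) and optionally difference both series d times first to deal with non-stationarity. The function names are hypothetical, and statsmodels' AutoReg could replace the hand-rolled least-squares fit.

```python
import numpy as np


def _lag_matrix(z, p):
    """Design matrix [1, z[t-1], ..., z[t-p]] for t = p .. len(z)-1 (hypothetical helper)."""
    n = len(z)
    return np.column_stack([np.ones(n - p)] + [z[p - i:n - i] for i in range(1, p + 1)])


def fit_ar(x, p):
    """Least-squares AR(p) fit with intercept; returns [c, phi_1, ..., phi_p]."""
    x = np.asarray(x, dtype=float)
    coef, *_ = np.linalg.lstsq(_lag_matrix(x, p), x[p:], rcond=None)
    return coef


def prewhiten(x, y, p, d=0):
    """Difference both series d times, fit AR(p) to x, and filter BOTH series with it."""
    x = np.diff(np.asarray(x, dtype=float), n=d)  # n=0 returns the input unchanged
    y = np.diff(np.asarray(y, dtype=float), n=d)
    coef = fit_ar(x, p)
    x_w = x[p:] - _lag_matrix(x, p) @ coef  # residuals of x: roughly white noise
    y_w = y[p:] - _lag_matrix(y, p) @ coef  # y passed through the same AR filter
    return x_w, y_w, coef


def cross_corr(a, b, max_lag):
    """Sample cross-correlation; the value at lag k correlates a[t] with b[t + k]."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    out = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            out[k] = np.mean(a[:n - k] * b[k:])
        else:
            out[k] = np.mean(a[-k:] * b[:n + k])
    return out
```

Usage: `x_w, y_w, _ = prewhiten(x, y, p=1)` followed by `cc = cross_corr(x_w, y_w, 10)`; the lag with the largest absolute value suggests how far y trails x. Filtering y with the model fitted to x (rather than fitting a separate model to y) keeps the relationship between the two series intact while removing x's own autocorrelation.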

Related

Detect Seasonality in Time Series with Python

I've been trying to identify seasonal customers in a dataset. As a first approach, I used the seasonal_decompose() function from the statsmodels package. This is useful for visualizing specific customers, but it won't work for the whole dataset, as I have almost 8000 different time series, one for each client.
Then I tried the ADF test. The problem here was that it only detects whether my series is stationary or not, and because of the trend it doesn't work in my case. I also tried combining it with the KPSS test (which tests for trend-stationarity), but the results were still bad.
I have now considered four alternatives:
Find a way to evaluate it manually using a mean/variance approach
Try using CHTest
Try using the Darts package
Detrend my data and apply those tests (or others) again
The thing is that I couldn't find good examples of any of these in Python; most of the solutions I found for my problem are written in R. Is there a suitable way of doing this in Python, or should I give up, export my series and try using R?
Could you help me with some tips? I would also really appreciate reading suggestions. Thanks!
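One way to act on the last alternative (detrend, then test) is to difference each series and look at its autocorrelation at the seasonal lag. The NumPy sketch below is a heuristic, not a formal test such as Canova-Hansen (CHTest): it assumes the period is known (12 for monthly data is an assumption) and flags a series when the lag-`period` autocorrelation of the differenced series exceeds the approximate white-noise band z/sqrt(n). The function names are hypothetical.

```python
import numpy as np


def seasonal_acf(series, period):
    """Hypothetical helper: autocorrelation of the first-differenced series at lag `period`."""
    x = np.diff(np.asarray(series, dtype=float))  # first difference removes a linear trend
    x = x - x.mean()
    return np.dot(x[:-period], x[period:]) / np.dot(x, x)


def looks_seasonal(series, period, z=1.96):
    """Heuristic flag: seasonal-lag ACF above the approximate white-noise band z / sqrt(n)."""
    n = len(series) - 1  # length after differencing
    return seasonal_acf(series, period) > z / np.sqrt(n)
```

With about 8000 customers you would apply `looks_seasonal` to each series (for example in a loop or via a pandas groupby). At z = 1.96 some non-seasonal series will be flagged by chance, so a larger z may be preferable when screening that many series.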

Solving an ODE with odeint using an arbitrary array of time points instead of np.linspace

I would like my dynamic system to compute and plot the solution only at certain time points. But as far as I have learned, the time is always set up with np.linspace, which produces a dense, evenly spaced range.
How do I change the time setting in this case?
I would like the times t that are plotted to be multiples of 2*pi:
t = n*pi (n = 0, 1, 2, 3, ...)
How can I set up the time in this case?
Would any array work, or only one generated by np.linspace?
Thank you very much
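scipy.integrate.odeint does not require np.linspace: its t argument can be any monotonically increasing (or decreasing) sequence of time points, with the first element being the time of the initial condition. The solver chooses its own internal steps and only reports the solution at the times you request. A minimal sketch, with a simple harmonic oscillator standing in for your (unknown) system:

```python
import numpy as np
from scipy.integrate import odeint


def oscillator(y, t):
    """Stand-in system: x'' = -x written as a first-order system [x, v]."""
    x, v = y
    return [v, -x]


step = 2 * np.pi              # spacing between output times; use np.pi for t = n*pi
n_points = 10
t = step * np.arange(n_points)  # t = 0, 2*pi, 4*pi, ... (any monotonic array works)

sol = odeint(oscillator, [1.0, 0.0], t)  # one row per requested time
print(sol[:, 0])  # x(t) = cos(t), so at multiples of 2*pi every entry is close to 1
```

Because accuracy is controlled by the solver's tolerances rather than by the spacing of t, a sparse array of output times does not make the integration less accurate.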

How to reduce the time needed for a very large number of curve fits

Dear all,
Hello. I am trying to do curve fitting at each voxel (in a 2D image, this would be a pixel) in MATLAB. In short, I need to run a huge number of curve fits (e.g., 256 x 256 x 500, about 33 million). As expected, this takes a very long time: if a single fit takes roughly 1 s, the whole job takes roughly a year (about 380 days).
Apart from CUDA programming, is there a way to solve this in MATLAB? My code currently uses triple for-loops around lsqcurvefit (the built-in fitting function). Could I get some help reducing the computation time (e.g., how to vectorize the fitting)?
Any comments would be really helpful. Thanks.
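Whether the fits can be vectorized depends on the model. As an illustration only, if the model can be linearized (here a hypothetical mono-exponential S(t) = A*exp(-b*t), made linear by taking logs), every voxel shares the same design matrix, so all the fits collapse into a single least-squares solve with one right-hand side per voxel. The sketch is in Python/NumPy; in MATLAB the same idea is the backslash operator applied to a matrix with one column per voxel. Note that fitting in log space weights errors differently from lsqcurvefit's nonlinear least squares, so results will not match exactly on noisy data.

```python
import numpy as np


def fit_monoexp_all_voxels(t, signals):
    """
    Hypothetical example model S(t) = A * exp(-b * t), fitted at every voxel at once.
    t:       shape (T,)   sample times shared by all voxels
    signals: shape (T, V) one column per voxel, strictly positive
    Returns A and b, each of shape (V,).
    """
    # Taking logs makes the model linear in (log A, b): log S = log A - b * t
    design = np.column_stack([np.ones_like(t), -t])  # (T, 2), identical for every voxel
    # One solve handles all V right-hand sides; coef has shape (2, V)
    coef, *_ = np.linalg.lstsq(design, np.log(signals), rcond=None)
    log_A, b = coef
    return np.exp(log_A), b
```

For a 256 x 256 x 500 volume you would reshape the data so that time runs along the first axis and all voxels along the second, then call the function once instead of looping.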

Wavelet for time series

I am trying to use wavelet coefficients as features for a neural network on time series data, and I am a bit confused about how to use them. Should I compute the coefficients on the entire time series at once, or use a sliding window? In other words, would computing the coefficients on the entire series at once let future data points influence them? How should I use wavelets on time series data without look-ahead bias?
It is hard to provide you with a detailed answer without knowing what you are trying to achieve.
In a nutshell, you first need to decide whether you want to apply a discrete (DWT) or a continuous (CWT) wavelet transform to your time series.
A DWT decomposes your input data into a set of discrete levels, giving you information about the frequency content of the signal, i.e. whether it contains high-frequency variations or low-frequency trends. Think of it as applying several band-pass filters to your input data.
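I do not think you should apply a DWT to your entire time series at once: each coefficient is computed by a filter that spans samples on both sides of its position, so transforming the whole series lets later values leak into the coefficients for earlier times. Since you are working with financial data, decomposing your input signal into 1-day windows and applying a DWT to each subset might do the trick.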
In any case, I would suggest:
Installing PyWavelets (pywt) and playing with a dummy time series to understand how wavelet decomposition works.
Checking out the abundant literature on wavelet analysis of financial data. For instance, if you are interested in financial time series forecasting, you might want to read this paper.
Posting your future questions on the DSP Stack Exchange, unless you have a specific coding-related question.
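To avoid look-ahead bias, each feature vector can be computed only from samples up to the current time. A minimal sketch with PyWavelets, assuming a rolling window of 64 samples, the db4 wavelet and 3 decomposition levels (all arbitrary choices); the function name is hypothetical. It uses a rolling window ending at each time step rather than the non-overlapping 1-day windows suggested above.

```python
import numpy as np
import pywt


def causal_wavelet_features(series, window=64, wavelet="db4", level=3):
    """Hypothetical helper: DWT coefficients of the trailing window ending at each time step."""
    series = np.asarray(series, dtype=float)
    features = []
    for end in range(window, len(series) + 1):
        # Only samples series[end - window : end] are used, i.e. nothing after time end - 1
        coeffs = pywt.wavedec(series[end - window:end], wavelet, level=level)
        features.append(np.concatenate(coeffs))  # [cA3, cD3, cD2, cD1] flattened
    return np.array(features)
```

Row i of the result depends only on samples up to index window - 1 + i, so it can be paired with a target that lies strictly after that index. The coefficients nearest the right edge of each window are still affected by pywt's boundary extension, so they tend to be noisier than the rest.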

Time series - correlation and lag time

I am studying the correlation between a set of input variables and a response variable, price. These are all in time series.
1) Is it necessary to smooth out the curve where the input variable is cyclical (autoregressive)? If so, how?
2) Once a correlation is established, I would like to quantify exactly how the input variable affects the response variable.
E.g.: "Once X increases by more than 10%, there is a 2% increase in y 6 months later."
Which python libraries should I be looking at to implement this - in particular to figure out the lag time between two correlated occurrences?
I already looked at statsmodels.tsa.ARMA, but it seems to deal only with predicting one variable over time. In scipy, the covariance matrix can tell me about the correlation, but it does not help with figuring out the lag time.
While part of the question is more statistics-based, the bit about how to do it in Python seems at home here. I see from your question on Cross Validated that you've since decided to do this in R, but in case you decide to move back to Python, or for the benefit of anyone else finding this question:
I think you were in the right area looking at statsmodels.tsa, but there's a lot more to it than just the ARMA model:
http://statsmodels.sourceforge.net/devel/tsa.html
In particular, have a look at statsmodels.tsa.vector_ar for modelling multivariate time series. The documentation for it is available here:
http://statsmodels.sourceforge.net/devel/vector_ar.html
The page above specifies that it's for working with stationary time series; I presume this means removing both trend and any seasonality or periodicity. The following link is ultimately about preparing a model for forecasting, but it discusses the Box-Jenkins approach to building a model, including making the series stationary:
http://www.colorado.edu/geography/class_homepages/geog_4023_s11/Lecture16_TS3.pdf
You'll notice that link discusses looking for autocorrelations (ACF) and partial autocorrelations (PACF), and then using the Augmented Dickey-Fuller test to test whether the series is now stationary. Tools for all three can be found in statsmodels.tsa.stattools. Likewise, statsmodels.tsa.arma_process has ACF and PACF.
The above link also discusses using metrics like AIC to determine the best model; both statsmodels.tsa.var_model and statsmodels.tsa.ar_model include AIC (amongst other measures). The same measures seem to be used for calculating lag order in var_model, using select_order.
In addition, the pandas library is at least partially integrated into statsmodels and has a lot of time series and data analysis functionality itself, so will probably be of interest. The time series documentation is located here:
http://pandas.pydata.org/pandas-docs/stable/timeseries.html
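For a quick first look at the lag, pandas' Series.shift and Series.corr are enough. The sketch below scans lags 0 to 12 on simulated data where y trails x by 3 steps (a hypothetical setup). On autocorrelated or trending series, difference or prewhiten both series first, because raw lagged correlations of such series can be large at many lags at once.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 1000
x = pd.Series(rng.standard_normal(n))
y = x.shift(3) + 0.5 * pd.Series(rng.standard_normal(n))  # y follows x 3 steps later

# corr(y[t], x[t - k]) for each candidate lag k; the NaNs introduced by shift() are skipped by corr()
lagged = {k: y.corr(x.shift(k)) for k in range(0, 13)}
best_lag = max(lagged, key=lagged.get)
print(best_lag, round(lagged[best_lag], 2))
```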
