Autocorrelation plot intuition - Python

I am analyzing a time series dataset, and I used the seasonal_decompose function in the statsmodels library to obtain its trend and seasonal behavior. The decomposition of the time series should leave a “remainder” component that is uncorrelated. By looking at the autocorrelation plot, how can we tell that the autocorrelation function indicates the remainder is indeed uncorrelated?
I am attaching the code I used to obtain the autocorrelation plot and the resulting plot.
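import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
# data: the series whose autocorrelation is being plotted
fig, ax = plt.subplots(figsize=(20, 5))
plot_acf(data, ax=ax)
plt.show()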
[Image: autocorrelation plot]

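If the autocorrelation values are close to zero, the remainder is not correlated. I use a lag of 40, but you will need to adjust this value depending on your data.
import matplotlib.pyplot as plt
from statsmodels.graphics import tsaplots
# Set the style before creating the figure; before Matplotlib 3.6 it was called 'seaborn-pastel'
plt.style.use('seaborn-v0_8-pastel')
fig, ax = plt.subplots(figsize=(12, 4))
tsaplots.plot_acf(df['value'], lags=40, ax=ax)
plt.show()
Values close to 1 show strong positive correlation. The blue shaded region is the confidence interval (95% by default); bars that stay inside it are not significantly different from zero, which is what you expect from an uncorrelated remainder.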

Related

Seaborn distplot() won't display frequency in the y-axis

I am trying to display the weighted frequency on the y-axis of a seaborn.distplot() graph, but it keeps displaying the density (which is the default in distplot()).
I read the documentation and many similar questions here on Stack Overflow.
The common answer is to set norm_hist=False and also to pass the weights as a NumPy array, as in a standard histogram. However, it keeps showing the density and not the probability/frequency of each bin.
My code is:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 4))
plt.xlim(-0.145, 0.145)
plt.axvline(0, color='grey')
data = df['col1']
x = np.random.normal(data.mean(), scale=data.std(), size=100000)
normal_dist = sns.distplot(x, hist=False, color="red", label="Gaussian")
data_viz = sns.distplot(data, color="blue", bins=31, label="data", norm_hist=False)
# I also tried adding the weights inside the argument:
# hist_kws={'weights': np.ones(len(data)) / len(data)}
plt.legend(bbox_to_anchor=(1, 1), loc=1)
plt.show()
And I keep receiving this output:
Does anyone have an idea of what could be the problem here?
Thanks!
[EDIT]: The problem is that the y-axis shows the KDE values and not those from the weighted histogram. If I set kde=False, I can display the frequency on the y-axis. However, I still want to keep the KDE, so I am not considering that option.
Keeping the KDE and the frequency/count on one y-axis in one plot will not work because they have different scales. It might be better to create a plot with two y-axes, one showing the KDE and the other the histogram.
From the documentation of norm_hist: "If True, the histogram height shows a density rather than a count. **This is implied if a KDE or fitted density is plotted**."
User versusnja in https://github.com/mwaskom/seaborn/issues/479 has a workaround:
import matplotlib.pyplot as plt
import seaborn as sns
# Plot hist without kde.
first_ax = sns.distplot(data, kde=False)
# Create another Y axis sharing the same x-axis.
second_ax = first_ax.twinx()
# Plot kde without hist on the second Y axis.
sns.distplot(data, ax=second_ax, kde=True, hist=False)
# Remove Y ticks from the second axis.
second_ax.set_yticks([])
plt.show()
If you need this just for visualization, it should be good enough.

How to plot gamma distribution for a dataframe in python

I have a population dataframe. How can I plot a gamma distribution diagram?
I am using Python, and I can plot a diagram using seaborn or a bar plot, but I need a gamma distribution diagram. I don't know how to fit the gamma distribution's parameters to my data.
import seaborn as sns
plot = sns.distplot(df["Percent"])
fig = plot.get_figure()
plot2 = rows['state'].value_counts().plot.bar()
fig2 = plot2.get_figure()
It only shows me two generic diagrams. I'm not sure how to get a gamma distribution.
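One common approach is to estimate the gamma parameters by maximum likelihood with `scipy.stats.gamma.fit` and overlay the fitted PDF on a density-normalised histogram. A minimal sketch, assuming the values are strictly positive (the location is fixed at 0 with `floc=0`) and using synthetic samples as a hypothetical stand-in for `df["Percent"]`; `fit_and_plot_gamma` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def fit_and_plot_gamma(values, bins=30):
    """Fit a gamma distribution (loc fixed at 0) and overlay its PDF
    on a density histogram of the values."""
    values = np.asarray(values, dtype=float)
    shape, loc, scale = stats.gamma.fit(values, floc=0)
    fig, ax = plt.subplots()
    ax.hist(values, bins=bins, density=True, alpha=0.5, label="data")
    x = np.linspace(values.min(), values.max(), 200)
    ax.plot(x, stats.gamma.pdf(x, shape, loc=loc, scale=scale), "r-",
            label=f"gamma fit (a={shape:.2f}, scale={scale:.2f})")
    ax.legend()
    return shape, loc, scale, ax

# Synthetic stand-in for df["Percent"] (assumption).
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=3.0, size=5000)
shape, loc, scale, ax = fit_and_plot_gamma(sample)
plt.show()
```

With the real data this would be called as `fit_and_plot_gamma(df["Percent"].dropna())`. Fixing `floc=0` gives the usual two-parameter gamma; leaving the location free fits a shifted three-parameter version, which tends to be harder to estimate reliably.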

Discrete Fourier Transform using scipy.fftpack

I have a question regarding the scipy.fftpack package and how I can use it to generate a Fourier transform of a pulse.
I am trying to do this for an arbitrary pulse in the future, but I wanted to make it as simple as possible, so I have been attempting to FFT a time-domain rectangular pulse, which should produce a frequency-domain sinc function. You can see more information here: https://en.wikipedia.org/wiki/Rectangular_function
From my understanding of FFTs, a signal needs to be repeating and periodic, so for a rectangular pulse I need to shift it in order for the FFT algorithm to 'see' it as a symmetric pulse.
My problem arises when I look at the real and imaginary components of my Fourier transform. I expect the transform of a rectangular pulse (as it is a real and even function) to be real and symmetric. However, I notice that there are imaginary components, even though I don't use complex numbers.
My approach to this has been the following:
1. Define my input pulse.
2. Shift my input pulse so that the function is symmetric around the origin.
3. Fourier transform this and shift it so negative frequencies are shown first.
4. Separate the imaginary and real components.
5. Plot the amplitude and phase of my frequencies.
I have attached graphs showing what I have attempted and outlining these steps.
This is my first question on Stack Overflow, so I am unable to post images, but a link to the imgur album is here: https://imgur.com/a/geufY
I am having trouble with the phase information of my frequency components. As the images in the imgur album show, I get a linearly increasing phase, which in the ideal case should be flat.
I expect it is a problem with how I am shifting my input pulse, and I have tried several other methods (I can post them if that would help).
Any help with this would be much appreciated. I have been poring over examples, but these mostly refer to infinite sinusoidal functions rather than pulses.
My Code is shown below:
import numpy as np
import scipy.fftpack as fft
import matplotlib.pyplot as plt
'''Numerical code starts here'''
#Define number of points and time/freq arrays
npts = 2**12
time_array = np.linspace(-1, 1, npts)
freq_array = fft.fftshift(fft.fftfreq(len(time_array), time_array[1]-time_array[0]))
#Define a rectangular pulse
pulse = np.zeros(npts)
pulse_width = 100
pulse[npts//2 - pulse_width//2:npts//2 + pulse_width//2] = 1  # integer division: float slice indices raise TypeError in Python 3
#Shift the pulse so that the function is symmetrical about origin
shifted_pulse = fft.fftshift(pulse)
#Calculate the fourier transform of the shifted pulse
pulse_frequencies = fft.fftshift(fft.fft(shifted_pulse))
'''Plotting code starts here'''
#Plot the pulse in the time domain
fig, ax = plt.subplots()
ax.plot(time_array, pulse)
ax.set_title('Time domain pulse', fontsize=22)
ax.set_ylabel('Field Strength', fontsize=22)
ax.set_xlabel('Time', fontsize=22)
#Plot the shifted pulse in the time domain
fig, ax = plt.subplots()
ax.plot(time_array, shifted_pulse)
ax.set_title('Shifted Time domain pulse', fontsize=22)
ax.set_ylabel('Field Strength', fontsize=22)
ax.set_xlabel('Time', fontsize=22)
#Plot the frequency components in the frequency domain
fig, ax = plt.subplots()
ax.plot(freq_array, np.real(pulse_frequencies), 'b-', label='real')
ax.plot(freq_array, np.imag(pulse_frequencies), 'r-', label='imaginary')
ax.set_title('Pulse Frequencies real and imaginary', fontsize=22)
ax.set_ylabel('Spectral Density', fontsize=22)
ax.set_xlabel('Frequency', fontsize=22)
ax.legend()
#Plot the amplitude and phase of the frequency components in the frequency domain
fig, ax = plt.subplots()
ax.plot(freq_array, np.abs(pulse_frequencies), 'b-', label='amplitude')
ax.plot(freq_array, np.angle(pulse_frequencies), 'r-', label='phase')
ax.set_title('Pulse Frequencies intensity and phase', fontsize=22)
ax.set_ylabel('Spectral Density', fontsize=22)
ax.set_xlabel('Frequency', fontsize=22)
ax.legend()
plt.show()
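A sketch of where the phase ramp likely comes from, assuming the pulse is built as in the code above. With an even width of 100 samples starting at `npts//2 - 50`, the pulse covers offsets -50 to 49 relative to the centre index, so after shifting it is symmetric about -0.5 samples rather than 0. That half-sample offset multiplies an otherwise real spectrum by the linear phase factor e^{iπk/N}. Using an odd width (2·half_width + 1 samples) centred on index `npts//2`, and `ifftshift` to move that index to position 0, gives a circularly even sequence whose DFT is real up to rounding. For an even `npts`, `fftshift` and `ifftshift` do the same thing, so the shift function itself is not the problem; `ifftshift` is used for the corrected version because it is the correct inverse for any length. The helper names are hypothetical.

```python
import numpy as np
from scipy import fftpack

def original_spectrum(npts, width):
    """Pulse as in the question: even width, offsets -width/2 .. width/2 - 1."""
    pulse = np.zeros(npts)
    pulse[npts // 2 - width // 2: npts // 2 + width // 2] = 1.0
    return fftpack.fftshift(fftpack.fft(fftpack.fftshift(pulse)))

def rect_spectrum(npts, half_width):
    """Odd-width pulse (2*half_width + 1 samples) centred exactly on index npts//2."""
    pulse = np.zeros(npts)
    centre = npts // 2
    pulse[centre - half_width: centre + half_width + 1] = 1.0
    # ifftshift moves index `centre` to index 0, making the pulse circularly even:
    # x[n] == x[-n mod npts], so its DFT is purely real.
    return fftpack.fftshift(fftpack.fft(fftpack.ifftshift(pulse)))

npts = 2**12
before = original_spectrum(npts, 100)
after = rect_spectrum(npts, 50)
print("max |imag| before:", np.max(np.abs(before.imag)))  # roughly 1 (half-sample offset)
print("max |imag| after: ", np.max(np.abs(after.imag)))   # rounding error only
```

With the symmetric pulse the phase no longer ramps; it only jumps between 0 and ±π where the sinc changes sign.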

Plot Markers on Curve where Value of X is known in matplotlib

I plotted a curve against time from data I got from an experiment. The data is collected at 10 ms intervals and is a single-row array.
I have also calculated an array which contains the times at which a certain device is triggered, and I drew axvlines at these trigger locations.
Now I want to show markers where my curve crosses these axvlines. How can I do it?
The trigger times (x) are known. The curve is drawn but has no equation (irregular experimental data). The trigger interval is also not always the same.
Thanks.
P.S. I also use multiple parasite axes on the figure. Not that it really matters, but just in case.
[Image: want markers on the curve where the axvline crosses]
You can use numpy.interp() to interpolate the data.
import numpy as np
import matplotlib.pyplot as plt
trig = np.array([0.4,1.3,2.1])
time = np.linspace(0,3,9)
signal = np.sin(time)+1.3
fig, ax = plt.subplots()
ax.plot(time, signal)
for x in trig:
    ax.axvline(x, color="limegreen")
# interpolate the signal at the trigger times:
y = np.interp(trig, time, signal)
ax.plot(trig, y, ls="", marker="*", ms=15, color="crimson")
plt.show()
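Since the data is sampled every 10 ms, the time axis can be rebuilt from the sample index, and `np.interp` can then be evaluated at arbitrary, irregularly spaced trigger times. By default `np.interp` clamps x values outside the data range to the first or last y value; passing `left=np.nan, right=np.nan` makes such triggers produce NaN instead, and Matplotlib simply skips NaN points. `np.interp` also assumes the time values are increasing, which holds for a uniformly sampled series. A sketch with synthetic data; `markers_at` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt

def markers_at(trigger_times, signal, dt=0.01):
    """Interpolate a uniformly sampled signal at arbitrary trigger times.
    Triggers outside the recorded range get NaN instead of being clamped."""
    signal = np.asarray(signal, dtype=float)
    time = np.arange(len(signal)) * dt  # 10 ms sampling -> dt = 0.01 s
    y = np.interp(trigger_times, time, signal, left=np.nan, right=np.nan)
    return time, y

# Synthetic data (assumption): 3 s of a noisy signal and irregular triggers.
rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0, 6, 300)) + 0.05 * rng.normal(size=300)
trig = np.array([0.25, 0.8, 1.95, 2.2, 3.5])  # 3.5 s lies past the last sample

time, y = markers_at(trig, signal)
fig, ax = plt.subplots()
ax.plot(time, signal)
for x in trig:
    ax.axvline(x, color="limegreen")
ax.plot(trig, y, ls="", marker="*", ms=15, color="crimson")  # NaN points are not drawn
plt.show()
```

With parasite axes, plotting the markers on the same axes object as the curve keeps them on that curve's y scale.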

Plot a line graph over a histogram for residual plot in python

I have created a script to plot a histogram of NO2 vs. temperature residuals in a dataframe called nighttime.
The histogram shows the (roughly normal) distribution of the residuals from a regression fitted elsewhere in the Python script.
I am struggling to find a way to plot a bell curve over the histogram, like this example:
Plot Normal distribution with Matplotlib
How can I get a fitting normal distribution for my residual histogram?
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
plt.suptitle('NO2 and Temperature Residuals night-time', fontsize=20)
WSx_rm = nighttime['Temperature']
WSx_rm = sm.add_constant(WSx_rm)
NO2_WS_RM_mod = sm.OLS(nighttime.NO2, WSx_rm, missing='drop').fit()
NO2_WS_RM_mod_sr = (NO2_WS_RM_mod.resid / np.std(NO2_WS_RM_mod.resid))
# Histogram of residuals
ax = plt.hist(NO2_WS_RM_mod.resid)
plt.xlim(-40, 50)
plt.xlabel('Residuals')
plt.show()
You can use seaborn to plot the distribution together with a smooth bell-shaped (KDE) curve. The residual variable is not clear to me in the example you have provided, so see the code snippet below just for reference.
import matplotlib.pyplot as plt
import seaborn as sns
# y_actual and y_predicted are arbitrary placeholders for explaining this example
residuals = y_actual - y_predicted
sns.distplot(residuals, bins=10)  # you may select the number of bins
plt.title('Error Terms', fontsize=20)
plt.xlabel('Residuals', fontsize=15)
plt.show()
Does the following work for you? (using some adapted code from the link you gave)
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.api as sm
plt.suptitle('NO2 and Temperature Residuals night-time', fontsize=20)
WSx_rm = nighttime['Temperature']
WSx_rm = sm.add_constant(WSx_rm)
NO2_WS_RM_mod = sm.OLS(nighttime.NO2, WSx_rm, missing='drop').fit()
NO2_WS_RM_mod_sr = (NO2_WS_RM_mod.resid / np.std(NO2_WS_RM_mod.resid))
# Histogram of residuals; density=True puts the bars on the same scale as the PDF
ax = plt.hist(NO2_WS_RM_mod.resid, density=True)
plt.xlim(-40, 50)
plt.xlabel('Residuals')
# New code: draw the fitted normal distribution
residuals = sorted(NO2_WS_RM_mod.resid)  # sort so the line is drawn left to right
normal_distribution = stats.norm.pdf(residuals, np.mean(residuals), np.std(residuals))
plt.plot(residuals, normal_distribution)
plt.show()
