Frequency of power spectrum in Python

I want to plot a power spectrum from my data set (array of about 2000 values, the data is recorded every minute).
I've gotten so far as:
y= np.fft.fft(data)
abs = np.abs(y) #absolute value
p = np.square(abs) #power
but am confused about setting the frequency.
I've tried using freqs = np.fft.fftfreq(len(y)), but the resulting plot can't be right.
What am I doing wrong?

Here is an example to plot the power spectrum:
import matplotlib.pyplot as plt
import numpy as np
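# Sample for 1 s at 1 kHz so that both tones (42 Hz and 60 Hz) lie below the Nyquist frequency
fs = 1000
t = np.arange(fs) / fs
data = 2 * np.sin(2*np.pi *60*t) + 2 * np.sin(2*np.pi *42*t)
spectrum = np.fft.fft(data)
power_spectrum = np.square(np.abs(spectrum))
# Frequency of each FFT bin in Hz; d is the spacing between samples
freqs = np.fft.fftfreq(len(data), d=1/fs)
half = len(data) // 2 # keep only the non-negative frequencies
fig, ax = plt.subplots()
ax.plot(freqs[:half], power_spectrum[:half])
ax.set_xlabel('frequency [Hz]')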
plt.show()
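For the data in the question (one sample per minute), the missing piece is the d argument of np.fft.fftfreq, which is the spacing between samples; without it the frequencies come back in cycles per sample. A minimal sketch, assuming a synthetic stand-in for the real data (the 100-minute period and the variable names are illustrative, not from the question):

```python
import numpy as np

# Hypothetical stand-in for the real data: 2000 samples, one per minute,
# containing a single oscillation with a 100-minute period plus an offset.
n = 2000
dt = 60.0                      # sample spacing in seconds (one minute)
t = np.arange(n) * dt
data = 3.0 + np.sin(2 * np.pi * t / (100 * 60.0))

# Remove the mean so the zero-frequency bin does not dominate the spectrum.
spectrum = np.fft.rfft(data - data.mean())
power = np.abs(spectrum) ** 2

# rfftfreq returns only the non-negative frequencies; they are in Hz
# because dt is given in seconds.
freqs = np.fft.rfftfreq(n, d=dt)

peak_freq = freqs[np.argmax(power)]
peak_period_minutes = 1.0 / peak_freq / 60.0
print(peak_freq, peak_period_minutes)
```

Plotting power against freqs (for example ax.plot(freqs, power)) then gives an axis in Hz. Passing d=1.0 instead would give cycles per minute.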

Related

Recover the time shift from numpy.correlate result in Python

This is not a duplicate question since other answers only explain how to plot the cross-correlation function and do not explain how you can get the time difference.
Given a sine signal and a shifted version of it, we should be able to get the time delay between them.
I have created a sin signal and shifted it by t_d=0.05. The following is my code and its output:
import numpy as np
import matplotlib.pyplot as plt
fs = 1000
x = np.linspace(0, 1, fs)
f = 5
t_shift = 0.05
y = np.sin(2*np.pi*f*x)
y_shifted = np.sin(2*np.pi*f*(x-t_shift))
fig, ax = plt.subplots()
ax.plot(x, y, x, y_shifted)
plt.show()
By normalizing signals and applying numpy.correlate we get the following:
y_norm = (y-y.mean())/y.std()
y_shifted_norm = (y_shifted - y_shifted.mean())/y_shifted.std()
cc = np.correlate(y_norm, y_shifted_norm, 'full')
fig, ax = plt.subplots()
ax.plot(range(len(cc)), cc)
plt.show()
Question
From the indices of cross-correlation function, how can I get t_shift=0.05?
@Sepide, it seems to me that you are trying to maximise the correlation between the signal y and a shifted version of y_shifted. This might be accomplished using np.correlate(), but recovering the time shift that way seems nontrivial. In the solution below I shift the time series manually and compute the correlation coefficient using np.corrcoef. Once this Pearson correlation coefficient reaches 1, the two signals are aligned.
import numpy as np
import matplotlib.pyplot as plt
# Setting
fs = 1000
x = np.linspace(0, 1, fs)
f = 5
t_shift = 0.05
t_step = 1/fs
# Data
y = np.sin(2*np.pi*f*x)
y_shifted = np.sin(2*np.pi*f*(x-t_shift))
# Compute correlation
MaxTimeShift = 200
CorrelationList = np.full((MaxTimeShift, 1), np.nan)
# Compute correlation for various shifts
for i in range(MaxTimeShift):
    CorrelationList[i] = np.corrcoef(y[0:801], y_shifted[i:(801+i)])[0,1]
# Plot 1
plt.figure(1)
plt.plot(x, y, x, y_shifted)
plt.show()
# Plot 2
plt.figure(2)
ShiftList = t_step*np.arange(MaxTimeShift)
plt.plot(ShiftList, CorrelationList)
plt.title("Correlation coefficient")
plt.show()
print("The time shift between the signals is: ", ShiftList[np.argmax(CorrelationList)])
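The shift can also be read directly from the np.correlate output by pairing each element of cc with its lag. A sketch, assuming the same signals as in the question but with the time axis built as np.arange(fs) / fs so that the step is exactly 1/fs; the names lags and delay are illustrative:

```python
import numpy as np

fs = 1000                       # samples per second
x = np.arange(fs) / fs          # time axis with a step of exactly 1/fs
f = 5
t_shift = 0.05
y = np.sin(2 * np.pi * f * x)
y_shifted = np.sin(2 * np.pi * f * (x - t_shift))

# In 'full' mode, cc[i] corresponds to lag k = i - (len(y_shifted) - 1),
# where cc_k = sum_n y[n + k] * y_shifted[n].
cc = np.correlate(y - y.mean(), y_shifted - y_shifted.mean(), mode='full')
lags = np.arange(-(len(y_shifted) - 1), len(y))

# y_shifted lags behind y, so the peak sits at a negative lag;
# flip the sign to report the delay of y_shifted relative to y.
delay = -lags[np.argmax(cc)] / fs
print("Estimated time shift:", delay)
```

Because the overlap between the two signals shrinks as the lag grows, the peak can land a sample or so away from the true shift. For a pure sine the shift is also only determined up to a whole number of periods (0.2 s here).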

How to stack multiple histograms in a single figure in Python?

I have a numpy array with shape [30, 10000], where the first axis is the time step, and the second contains the values observed for a series of 10000 variables. I would like to visualize the data in a single figure, similar to an example from the seaborn tutorial.
Basically, what I would like is to draw a histogram of 30 to 40 bins for each of the 30 time steps, and then somehow concatenate these histograms so that they share a common axis and appear in the same figure.
My data look like a gaussian that moves and gets wider in time. You can reproduce something similar using the following code:
import numpy as np
mean = 0.0
std = 1.0
data = []
for t in range(30):
    mean = mean + 0.01
    std = std + 0.1
    data.append(np.random.normal(loc=mean, scale=std, size=[10000]))
data = np.array(data)
A figure similar to the picture shown above would be best, but any help is appreciated!
Thank you,
G.
Use histogram? You could do this with np.histogram2d, but this way is a little clearer...
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(30, 10000)
H = np.zeros((30, 40))
bins = np.linspace(-3, 3, 41)
for i in range(30):
    H[i, :], _ = np.histogram(data[i, :], bins)
fig, ax = plt.subplots()
# pcolormesh expects cell edges, so use 31 time edges for the 30 rows of H
times = np.arange(31) * 0.1
pc = ax.pcolormesh(bins, times, H)
ax.set_xlabel('data bins')
ax.set_ylabel('time [s]')
fig.colorbar(pc, label='count')
plt.show()
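The two-dimensional histogram route can be sketched with np.histogram2d as follows, assuming each value is paired with the index of the time step it belongs to; the names step_index, time_edges, and value_edges are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((30, 10000))     # stand-in for the [30, 10000] array

n_steps, n_vars = data.shape
# Pair every value with the index of the time step (row) it came from.
step_index = np.repeat(np.arange(n_steps), n_vars)

time_edges = np.arange(n_steps + 1) - 0.5   # one bin per time step
value_edges = np.linspace(-3, 3, 41)        # 40 value bins

H, _, _ = np.histogram2d(step_index, data.ravel(),
                         bins=[time_edges, value_edges])
print(H.shape)
```

H has one row per time step and one column per value bin, so it can be drawn with ax.pcolormesh(value_edges, time_edges, H).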

How to use Python to draw a normal probability plot by using certain column data in dataFrame

I have a DataFrame that contains two columns named "thousands of dollars per year" and "EMPLOY".
I created a new variable in this DataFrame named "cubic_Root", computed from the data in df['thousands of dollars per year']:
df['cubic_Root'] = -1 / df['thousands of dollars per year'] ** (1. / 3)
The data in df['cubic_Root'] look like this:
ID cubic_Root
1 -0.629961
2 -0.405480
3 -0.329317
4 -0.480750
5 -0.305711
6 -0.449644
7 -0.449644
8 -0.480750
Now, how can I draw a normal probability plot using the data in df['cubic_Root']?
You want the "Probability" Plots.
So for a single plot, you'd have something like below.
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt
# 100 values from a normal distribution with a std of 3 and a mean of 0.5
data = 3.0 * np.random.randn(100) + 0.5
counts, start, dx, _ = scipy.stats.cumfreq(data, numbins=20)
x = np.arange(counts.size) * dx + start
plt.plot(x, counts, 'ro')
plt.xlabel('Value')
plt.ylabel('Cumulative Frequency')
plt.show()
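For a normal probability plot in the strict sense, scipy.stats.probplot pairs the sorted data with theoretical normal quantiles. A sketch using the eight cubic_Root values listed in the question (a plain array stands in for df['cubic_Root'] here):

```python
import numpy as np
from scipy import stats

# The eight cubic_Root values shown in the question, standing in for df['cubic_Root'].
cubic_root = np.array([-0.629961, -0.405480, -0.329317, -0.480750,
                       -0.305711, -0.449644, -0.449644, -0.480750])

# osm: theoretical normal quantiles, osr: the data sorted in ascending order.
# slope and intercept describe the least-squares line through the points,
# and r is its correlation coefficient.
(osm, osr), (slope, intercept, r) = stats.probplot(cubic_root, dist="norm")

for q, v in zip(osm, osr):
    print(f"{q: .3f}  {v: .6f}")
print("r =", r)
```

Passing plot=plt (or a Matplotlib Axes) to probplot draws the points and the fitted line directly. Points that fall close to the line suggest the data are roughly normal.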
If you want to plot a known distribution, define it as a function and plot it like so:
import numpy as np
from matplotlib import pyplot as plt
def my_dist(x):
    return np.exp(-x ** 2)
x = np.arange(-100, 100)
p = my_dist(x)
plt.plot(x, p)
plt.show()
If you don't have the exact distribution as an analytical function, perhaps you can generate a large sample, take a histogram and somehow smooth the data:
import numpy as np
from scipy.interpolate import UnivariateSpline
from matplotlib import pyplot as plt
N = 1000
n = N // 10 # the number of bins must be an integer
s = np.random.normal(size=N) # generate your data sample with N elements
p, x = np.histogram(s, bins=n) # bin it into n = N//10 bins
x = x[:-1] + (x[1] - x[0])/2 # convert bin edges to centers
f = UnivariateSpline(x, p, s=n)
plt.plot(x, f(x))
plt.show()
You can increase or decrease s (the smoothing factor) in the UnivariateSpline call to increase or decrease the smoothing.
Probability density Function (PDF) of inter-arrival time of events.
import numpy as np
import scipy.stats
# generate data samples
data = scipy.stats.expon.rvs(loc=0, scale=1, size=1000, random_state=123)
A kernel density estimation can then be obtained by simply calling
scipy.stats.gaussian_kde(data,bw_method=bw)
where bw is an (optional) parameter for the estimation procedure. For this data set, and considering three values for bw the fit is as shown below
# test values for the bw_method option ('None' is the default value)
bw_values = [None, 0.1, 0.01]
# generate a list of kde estimators for each bw
kde = [scipy.stats.gaussian_kde(data,bw_method=bw) for bw in bw_values]
# plot (normalized) histogram of the data
import matplotlib.pyplot as plt
plt.hist(data, 50, density=True, facecolor='green', alpha=0.5)
# plot density estimates
t_range = np.linspace(-2,8,200)
for i, bw in enumerate(bw_values):
    plt.plot(t_range,kde[i](t_range),lw=2, label='bw = '+str(bw))
plt.xlim(-1,6)
plt.legend(loc='best')
plt.show()
Reference:
Python: Matplotlib - probability plot for several data set
how to plot Probability density Function (PDF) of inter-arrival time of events?

2d fft numpy/python confusion

I have data in the form x-y-z and want to create a power spectrum along x-y. Here is a basic example I am posting to check where I might be going wrong with my actual data:
import numpy as np
from matplotlib import pyplot as plt
fq = 10; N = 20
x = np.linspace(0,8,N); y = x
space = x[1] -x[0]
xx, yy = np.meshgrid(x,y)
fnc = np.sin(2*np.pi*fq*xx)
ft = np.fft.fft2(fnc)
ft = np.fft.fftshift(ft)
freq_x = np.fft.fftfreq(ft.shape[0], d=space)
freq_y = np.fft.fftfreq(ft.shape[1], d=space)
plt.imshow(
abs(ft),
aspect='auto',
extent=(freq_x.min(),freq_x.max(),freq_y.min(),freq_y.max())
)
plt.figure()
plt.imshow(fnc)
The resulting frequency figure shows an incorrect frequency. Thanks.
One of your problems is that matplotlib's imshow uses a different coordinate system from the one you expect. Pass an origin='lower' argument, and the peaks now appear at y=0, as expected.
Another problem is that fftfreq needs to be told the sample spacing, which in your case is 8 / (N - 1), the same value as x[1] - x[0].
import numpy as np
from matplotlib import pyplot as plt
fq = 10; N = 20
x = np.linspace(0,8,N); y = x
xx, yy = np.meshgrid(x,y)
fnc = np.sin(2*np.pi*fq*xx)
ft = np.fft.fft2(fnc)
ft = np.fft.fftshift(ft)
freq_x = np.fft.fftfreq(ft.shape[0], d=8 / (N - 1)) # this takes an argument for the timestep
freq_y = np.fft.fftfreq(ft.shape[1], d=8 / (N - 1))
plt.imshow(
abs(ft),
aspect='auto',
extent=(freq_x.min(),freq_x.max(),freq_y.min(),freq_y.max()),
origin='lower' , # this fixes your problem
interpolation='nearest', # this makes it easier to see what is happening
cmap='viridis' # let's use a better color map too
)
plt.grid()
plt.show()
You may say "but the frequency is 10, not 0.5!" However, to resolve a frequency of 10 you need to sample much more finely than a spacing of 8/19. The Nyquist criterion says the sampling rate must exceed 20 for that frequency to be recoverable at all.
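The 0.5 is an alias: after sampling, a frequency above the Nyquist limit shows up at its distance from the nearest multiple of the sampling rate. A quick check, assuming the same grid as above (the names spacing, fs, and alias are illustrative):

```python
import numpy as np

fq = 10
N = 20
x = np.linspace(0, 8, N)
spacing = x[1] - x[0]          # 8 / (N - 1)
fs = 1.0 / spacing             # sampling rate, 19 / 8 = 2.375

# Apparent frequency after sampling: distance from fq to the nearest multiple of fs.
alias = abs(fq - fs * np.round(fq / fs))

# Cross-check with a 1-D FFT of one row of the sampled signal.
row = np.sin(2 * np.pi * fq * x)
freqs = np.fft.rfftfreq(N, d=spacing)
peak = freqs[np.argmax(np.abs(np.fft.rfft(row)))]
print(alias, peak)
```

The FFT peak lands on the bin nearest to the alias frequency, so the two values agree to within one bin spacing, fs / N.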

Visualizing time series in spirals using R or Python?

Does anyone know how to do this in R? That is, represent this cyclical data from the left plot to the right plot?
http://cs.lnu.se/isovis/courses/spring07/dac751/papers/TimeSpiralsInfoVis2001.pdf
Here is some example data.
Day = c(rep(1,5),rep(2,5),rep(3,5))
Hour = rep(1:5,3)
Sunlight = c(0,1,2,3,0,1,2,3,2,1,0,0,4,2,1)
data = cbind(Day,Hour,Sunlight)
This seems pretty close:
# sample data - hourly for 10 days; daylight from roughly 6:00am to 6:00pm
set.seed(1) # for reproducibility
Day <- c(rep(1:10,each=24))
Hour <- rep(1:24)
data <- data.frame(Day,Hour)
data$Sunlight <- with(data,-10*cos(2*pi*(Hour-1+abs(rnorm(240)))/24))
data$Sunlight[data$Sunlight<0] <- 0
library(ggplot2)
ggplot(data,aes(x=Hour,y=10+24*Day+Hour-1))+
geom_tile(aes(color=Sunlight),size=2)+
scale_color_gradient(low="black",high="yellow")+
ylim(0,250)+ labs(y="",x="")+
coord_polar(theta="x")+
theme(panel.background=element_rect(fill="black"),panel.grid=element_blank(),
axis.text.y=element_blank(), axis.text.x=element_text(color="white"),
axis.ticks.y=element_blank())
I know how to do this in Python. I find the scatter plot from matplotlib good for this sort of thing. Here's an example:
import matplotlib.pyplot as plt
import numpy as np
period = 0.5
f = np.arange(0, 100, 0.03) # Data range
z = np.sin(f) # Data
a = f*np.sin(period*f)
b = f*np.cos(period*f)
fig, ax = plt.subplots()
ax.scatter(a, b, c=z, s=100, edgecolors='none')
plt.show()
You can change period to change the number of revolutions in the spiral. a and b plot the spiral whilst z contains the actual data (in this example, a sine wave).
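The same idea can be applied to the Day/Hour/Sunlight sample from the question by letting the hour set the angle and the elapsed time set the radius. A sketch, assuming 5 hours per cycle as in the sample data; the mapping and the names phase, theta, and radius are one possible choice, not the method of the linked paper:

```python
import numpy as np

# Sample data from the question: 3 days with 5 hourly readings each.
day = np.repeat([1, 2, 3], 5)
hour = np.tile(np.arange(1, 6), 3)
sunlight = np.array([0, 1, 2, 3, 0, 1, 2, 3, 2, 1, 0, 0, 4, 2, 1])

hours_per_cycle = 5
# Fraction of the way through the cycle, in [0, 1).
phase = (hour - 1) / hours_per_cycle
theta = 2 * np.pi * phase      # one full turn per day
radius = day + phase           # grows steadily, so successive days nest outward

a = radius * np.cos(theta)
b = radius * np.sin(theta)
print(np.column_stack([a, b, sunlight])[:5])
```

Calling ax.scatter(a, b, c=sunlight) then colours each point by its value, so readings taken at the same hour on different days line up along the same direction from the centre.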
