I am implementing the method from this paper:
https://dspace.mit.edu/bitstream/handle/1721.1/66243/Picard_Noncontact%20Automated.pdf?sequence=1&isAllowed=y
The main idea is cardiac pulse measurement using a set of frames (N=300) from a 10s video so the frame rate equals 30 fps.
red = [item[:,:,0] for item in imgs]
green = [item[:,:,1] for item in imgs]
blue = [item[:,:,2] for item in imgs]
red_avg = [item.mean() for item in red]
green_avg = [item.mean() for item in green]
blue_avg = [item.mean() for item in blue]
red_mean, red_std = np.array(red_avg).mean(), np.array(red_avg).std()
green_mean, green_std = np.array(green_avg).mean(), np.array(green_avg).std()
blue_mean, blue_std = np.array(blue_avg).mean(), np.array(blue_avg).std()
red_avg = [(item - red_mean)/red_std for item in red_avg]
green_avg = [(item - green_mean)/green_std for item in green_avg]
blue_avg = [(item - blue_mean)/blue_std for item in blue_avg]
data = np.vstack([signal.detrend(red_avg), signal.detrend(green_avg), signal.detrend(blue_avg)]).reshape(300,3)
from sklearn.decomposition import FastICA
transformer = FastICA(n_components=3)
X_transformed = transformer.fit_transform(data)
from scipy.fftpack import fft
first = X_transformed.T[0]
second = X_transformed.T[1]
third = X_transformed.T[2]
ff = np.fft.fft(first)
fs = np.fft.fft(second)
ft = np.fft.fft(third)
imgs - is the initial list of arrays with 300 image pixel values.
As you can see, I split all frames into RGB channels and thus have traces x_i(t), where i = 1,2,3
After standardization, we detrend all the traces and stack them to further apply ICA and then FFT all the three components.
The method then claims that we need to plot power vs frequency (Hz) and select the component that is most likely to be heart pulse.
Finally, we applied the fast Fourier transform (FFT) on the selected source signal to
obtain the power spectrum. The pulse frequency was designated as the frequency that
corresponded to the highest power of the spectrum within an operational frequency band. For
our experiments, we set the operational range to [0.75, 4] Hz (corresponding to [45, 240]
bpm) to provide a wide range of heart rate measurements.
Here's how I try to visualize the frequencies:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
data = ft
print(fs.size)
ps = np.abs(np.fft.fft(data))**2
sampling_rate = 30
freqs = np.fft.fftfreq(data.size, 1/sampling_rate)
idx = np.argsort(freqs)
#print(idx)
plt.plot(freqs[idx], ps[idx])
What I get is totally different since the range of frequencies is from $-15$ to $15$ and I have no idea whether this is in Hz or not.
The three images above are what I get when I execute the code to visualize frequencies and signal power.
I would appreciate any help or suggestions.
You should really learn how to work with images/videos as nD-tensors. Doing that you can replace all your data wrangling with much more concise code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft
from sklearn.decomposition import FastICA
images = [np.random.rand(640, 480, 3) for _ in range(30)]
# Create tensor with all images
images = np.array(images)
images.shape
# Take average of all pixels, for each image and each channel individually
avgs = np.mean(images, axis=(1, 2))
mean, std = np.mean(avgs), np.std(avgs)
# Normalize all average channels
avgs = (avgs - mean) / std
# Detrend across images
avgs = scipy.signal.detrend(avgs, axis=0)
transformer = FastICA(n_components=3)
X_transformed = transformer.fit_transform(avgs)
X_ff = np.fft.fft(X_transformed, axis=0)
plt.plot(np.abs(X_ff) ** 2)
To answer your question a little bit: I think you are mistakenly taking the fourier spectrum of the fourier spectrum of the third PCA component.
FFT(FFT(PCA[:, 2]))
while you meant to take one FFT only:
FFT(PCA[:, 2]))
Regarding the -15...15 axis: You have set your sampling frequency as 30Hz (or 30fps in video-terms). That means you can detect anything up to 15Hz in your video.
In fourier theory, there exists a thing as called "negative frequency". Now since we're mostly analyzing real signals (as opposed to complex signals), that negative frequency is always the same as the positive frequency. This means your spectrum is always symmetric and you can disregard the left half.
However since you've taken the FFT twice, you're looking at the FFT of a complex signal, which does have negative frequencies. That's the reason why your spectra are asymmetric and confusing.
Additionally, I believe you're confusing reshaping and transposing. Before PCA, you're assembling your data like
np.vstack([red, green, blue]) # shape = (3, 300)
which you want to transpose to get (300, 3). If you reshape instead, you're not swapping rows and columns but are interpreting the same data in a different shape.
Related
This is my first post so apologies for any formatting related issues.
So I have a dataset which was obtained from an atomic microscope. The data looks like a 1024x1024 matrix which is composed of different measurements taken from the sample in units of meters, eg.
data = [[1e-07 ... 4e-08][ ... ... ... ][3e-09 ... 12e-06]]
np.size(data) == (1024,1024)
From this data, I was hoping to 1) derive some statistics about the real data; and 2) using the power spectrum density (PSD) distribution, hopefully create a new dataset which is different, but statistically similar to the characteristics of the original data. My plan to do this was 2a) take a 2d fft of data, calculate the power spectrum density 2b) some method?, 2c) take the 2d ifft of the modified signal to turn it back into a new sample with the same power spectrum density as the original.
Moreover, regarding part 2b) this was the closest link I could find regarding a time series based solution; however, I am not understanding exactly how to implement this so far, since I am not exactly sure what the phase, frequency, and amplitudes of the fft data represent in this 2d case, and also since we are now talking about a 2d ifft I'm not exactly sure how to construct this complex matrix while incorporating the random number generation, and amplitude/phase shifts in a way that will translate back to something meaningful.
So basically, I have been having some trouble with my intuition. For this problem, we are working with a 2d Fourier transform of spatial data with no temporal component, so I believe that methods which are applied to time series data could be applied here as well. Since the fft of the original data is the 'frequency in the spatial domain', the x-axis of the PSD should be either pixels or meters, but then what is the 'power' in the y-axis describing? I was hoping that someone could help me figure this problem out.
My code is below, hopefully someone could help me solve my problem. Bonus if someone could help me understand what this shifted frequency vs amplitude plot is saying:
here is the image with the fft, shifted fft, and freq. vs aplitude plots.
Fortunately the power spectrum density function is a bit easier to understand
Thank you all for your time.
data = np.genfromtxt('asample3.0_00001-filter.txt')
x = np.arange(0,int(np.size(data,0)),1)
y = np.arange(0,int(np.size(data,1)),1)
z = data
npix = data.shape[0]
#taking the fourier transform
fourier_image = np.fft.fft2(data)
#Get power spectral density
fourier_amplitudes = np.abs(fourier_image)**2
#calculate sampling frequency fs (physical distance between pixels)
fs = 92e-07/npix
freq_shifted = fs/2 * np.linspace(-1,1,npix)
freq = fs/2 * np.linspace(0,1,int(npix/2))
print("Plotting 2d Fourier Transform ...")
fig, axs = plt.subplots(2,2,figsize=(15, 15))
axs[0,0].imshow(10*np.log10(np.abs(fourier_image)))
axs[0,0].set_title('fft')
axs[0,1].imshow(10*np.log10(np.abs(np.fft.fftshift(fourier_image))))
axs[0,1].set_title('shifted fft')
axs[1,0].plot(freq,10*np.log10(np.abs(fourier_amplitudes[:npix//2])))
axs[1,0].set_title('freq vs amplitude')
for ii in list(range(npix//2)):
axs[1,1].plot(freq_shifted,10*np.log10(np.fft.fftshift(np.abs(fourier_amplitudes[ii]))))
axs[1,1].set_title('shifted freq vs amplitude')
#constructing a wave vector array
## Get frequencies corresponding to signal PSD
kfreq = np.fft.fftfreq(npix) * npix
kfreq2D = np.meshgrid(kfreq, kfreq)
knrm = np.sqrt(kfreq2D[0]**2 + kfreq2D[1]**2)
knrm = knrm.flatten()
fourier_amplitudes = fourier_amplitudes.flatten()
#creating the power spectrum
kbins = np.arange(0.5, npix//2+1, 1.)
kvals = 0.5 * (kbins[1:] + kbins[:-1])
Abins, _, _ = stats.binned_statistic(knrm, fourier_amplitudes,
statistic = "mean",
bins = kbins)
Abins *= np.pi * (kbins[1:]**2 - kbins[:-1]**2)
print("Plotting power spectrum of surface ...")
fig = plt.figure(figsize=(10, 10))
plt.loglog(fs/kvals, Abins)
plt.xlabel("Spatial Frequency $k$ [meters]")
plt.ylabel("Power per Spatial Frequency $P(k)$")
plt.tight_layout()
Problem
I am trying to remove a frequency from a set of data obtained from an audio file.
To simplify down my problem, I have created the code below which creates a set of waves and merges them into a complex wave. Then it finds the fourier transform of this complex wave and inverses it.
I am expecting to see the original wave as a result since there should be no data loss, however I receive a very different wave instead.
Code:
import numpy as np
import matplotlib.pyplot as plt
import random
#Get plots
fig, c1 = plt.subplots()
c2 = c1.twinx()
fs = 100 # sample rate
f_list = [5,10,15,20,100] # the frequency of the signal
x = np.arange(fs) # the points on the x axis for plotting
# compute the value (amplitude) of the sin wave for each sample
wave = []
for f in f_list:
wave.append(list(np.sin(2*np.pi*f * (x/fs))))
#Adds the sine waves together into a single complex wave
wave4 = []
for i in range(len(wave[0])):
data = 0
for ii in range(len(wave)):
data += wave[ii][i]
wave4.append(data)
#Get frequencies from complex wave
fft = np.fft.rfft(wave4)
fft = np.abs(fft)
#Note: Here I will add some code to remove specific frequencies
#Get complex wave from frequencies
waveV2 = np.fft.irfft(fft)
#Plot the complex waves, should be the same
c1.plot(wave4, color="orange")
c1.plot(waveV2)
plt.show()
Results: (Orange is created wave, blue is original wave)
Expected results:
The blue and orange lines (the original and new wave created) should have the exact same values
You took the absolute value of the FFT before you do the inverse FFT. That changes things, and is probably the cause of your problem.
I would like to filter out unwanted frequencies and keep an only 60Hz signal.
Here is what I have done so far:
import numpy as np
from scipy.fftpack import rfft, irfft, fftfreq
#
time = np.linspace(0,1,1000)
in_sig = np.cos(54*np.pi*time) + np.cos(60*np.pi*time) + np.sin(66*np.pi*time);
high_freq = 62;
low_freq = 58;
freqs = fftfreq(len(in_sig), d=time[1]-time[0])
filt_sig = rfft(in_sig)
cut_filt_sig = filt_sig.copy()
cut_filt_sig[(freqs<low_freq)] = 0
cut_filt_sig[(freqs>high_freq)] = 0
cut_in_sig = irfft(cut_filt_sig)
from pylab import *
figure(figsize=(10, 6))
subplot(221);plot(time,in_sig); title('Input signal');
subplot(222);plot(freqs,filt_sig);xlim(0,100);title('FFT of the input signal');
subplot(223);plot(time,cut_in_sig); title('Filtered signal');
xlabel('Time (s)')
subplot(224);plot(freqs,cut_filt_sig);xlim(0,100); title('FFT of the filtered signal');
xlabel('Freq. (Hz)')
show()
Plotted results
As I can see the filtered signal has lower amplitudes at the edges, I assume it could be due to applied rectangular window. What windows would you recommend to use to improve to output?
The issue likely comes from numpy's linspace(). The default mode is to include the endpoint stop. So time is 0, 1/999, 2/999, ..., 1. On the contrary, fft, handles of signal of length N as a periodic signal sampled at 0, T/N, ... , T(N-1)/N, thus avoiding the redundancy of the endpoint.
The computed DFT therefore use a frame of length T=1000/999. Hence the frequencies of the DFT are k*999/1000, not k. Since the length of the frame is not a multiple of the period of the signal (1/6s), a problem named spectral leakage occurs.
To avoid the spectral leakage, the length of the frame can be shortened to a multiple of the period, by removing the endpoint:
time = np.linspace(0,1,1000,endpoint=False)
It returns time as 0, 1/1000, ....999/1000, handled by the DFT as a frame of length 1, that is a multiple of the period of the input signal (1/6s).
If the length of the frame is not a multiple of the period of the signal, the input signal can be windowed so as to partly mitigate the effect related to the discontinuity at the edge of the frame, but spurous frequencies still exist.
Finally, the actual frequencies can be properly computed by estimating the frequency of a peak as its mean frequency wih respect to power density. See my answer to
Why are frequency values rounded in signal using FFT?
I'm trying to create a home made spectrum analyzer with 8 strips of LED's.
The part i'm struggling with is performing the FFT and understanding how to use the results.
So far this is what I have:
import opc
import time
import pyaudio
import wave
import sys
import numpy
import math
CHUNK = 1024
# Gets the pitch from the audio
def pitch(signal):
# NOT SURE IF ANY OF THIS IS CORRECT
signal = numpy.fromstring(signal, 'Int16');
print "signal = ", signal
testing = numpy.fft.fft(signal)
print "testing = ", testing
wf = wave.open(sys.argv[1], 'rb')
RATE = wf.getframerate()
p = pyaudio.PyAudio() # Instantiate PyAudio
# Open Stream
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
# Read data
data = wf.readframes(CHUNK)
# Play Stream
while data != '':
stream.write(data)
data = wf.readframes(CHUNK)
frequency = pitch(data)
print "%f frequency" %frequency
I'm struggling with what to do in the pitch method. I know i need to perform FFT on the data that is passed in, but am really unsure how to do it.
Also should be using this function?
Because of the way np.fft.fft works, if you use 1024 data points you will get values for 512 frequencies (plus a value zero Hz, DC offset). If you only want 8 frequencies you have to feed it 16 data points.
You might be able to do what you want by down sampling by a factor of 64 - then 16 down sampled points would be time-equivalent to 1024 original points. I've never explored this so I don't know what this entails or what the pitfalls might be.
You're going to have to do some learning - The Scientist and Engineer's Guide to Digital Signal Processing really is an excellant resource, at least it was for me.
Keep in mind that for an audio cd .wav file the sample frequency is 44100 Hz - a 1024 sample chunk is only 23 mS of the sound.
scipy.io.wavfile.read makes getting the data easy.
samp_rate, data = scipy.io.wavfile.read(filename)
data is a 2-d numpy array with one channel in in column zero, data[:,0], and the other in column 1, data[:,1]
Matplotlib's specgram and psd functions can give you the data you want. A graphing analog to what you are trying to do would be.
from matplotlib import pyplot as plt
import scipy.io.wavfile
samp_rate, data = scipy.io.wavfile.read(filename)
Pxx, freqs, bins, im = plt.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
plt.show()
plt.close()
Since you aren't doing any plotting just use matplolib.mlab.specgram.
Pxx, freqs, t = matplolib.mlab.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
Its return values (Pxx, freqs, t) are
- *Pxx*: 2-D array, columns are the periodograms of successive segments
- *freqs*: 1-D array of frequencies corresponding to the rows in Pxx
- *t*: 1-D array of times corresponding to midpoints of segments.
Pxx[1:, 0] would be the values for the frequencies for T0, Pxx[1:, 1] for T1, Pxx[1:, 2] for T2, ... This is what you would feed to your display. You don't use Pxx[0, :] because it is for 0 Hz.
power spectral density - matplotlib.mlab.psd()
Maybe another strategy to get down to 8 bands would be to use large chunks and normalize the values. Then you could break the values up into eight segments and get the sum of each segments. I think this is valid - maybe only for the power spectral density. sklearn.preprocessing.normalize
w = sklearn.preprocessing.normalize(Pxx[1:,:], norm = 'l1', axis = 0)
But then again, I just made all that up.
I don't know about the scipy.io.wavfile.read function that #wwii mentions in his answer, but it seems that his suggestion is the way to go to handle the signal loading. However, I just wanted to comment on the fourier transform.
What I imagine that you intend to do with your LED setup is to change each of the LED's brightnesses according to the power of the spectra in each of the 8 frequency bands that you intend to use. Thus, what I understood that you need, is to compute in some way the power as time goes by. The first complication is "how to compute the spectral power?"
The best way to do this is with the numpy.fft.rfft, which computes the fourier transform for signals that only have real numbers (not complex numbers). On the other hand, the function numpy.fft.fft is a general purpose function that can compute the fast fourier transform for signals with complex numbers. The conceptual difference is that numpy.fft.fft can be used to study travelling waves and their propagation direction. This is seen because the returned amplitudes correspond to positive or negative frequencies that indicate how the wave travels. numpy.fft.rfft yields the amplitude for real-valued frequencies as seen in numpy.fft.rfftfreq, which is what you need.
The last issue is to choose appropriate frequency bands in which to compute the spectral power. The human ear has a huge frequency response range and the width of each band will vary very much, with the low frequency band being very narrow and the high frequency band being very wide. Googling around, I found this nice resource that defines 7 relevant frequency bands
Sub-bass: 20 to 60 Hz
Bass: 60 to 250 Hz
Low midrange: 250 to 500 Hz
Midrange: 500 Hz to 2 kHz
Upper midrange: 2 to 4 kHz
Presence: 4 to 6 kHz
Brilliance: 6 to 20 kHz
I would suggest to use these bands, but split the upper midrange into 2-3 kHz and 3-4kHz. That way you'll be able to use your 8 LED setup. I'm uploading an updated pitch function for you to use
wf = wave.open(sys.argv[1], 'rb')
CHUNK = 1024
RATE = wf.getframerate()
DT = 1./float(RATE) # time between two successive audio frames
FFT_FREQS = numpy.fft.nfftfreq(CHUNCK,DT)
FFT_FREQS_INDS = -numpy.ones_like(FFT_FREQS)
bands_bounds = [[20,60], # Sub-bass
[60,250], # Bass
[250,500], # Low midrange
[500,2000], # Midrange
[2000,3000], # Upper midrange 0
[3000,4000], # Upper midrange 1
[4000,6000], # Presence
[6000,20000]] # Brilliance
for f_ind,freq in enumerate(FFT_FREQS):
for led_ind,bounds in enumerate(bands_bounds):
if freq<bounds[1] and freq>=bounds[0]:
FFT_FREQS_INDS[ind] = led_ind
# Returns the spectral power in each of the 8 bands assigned to the LEDs
def pitch(signal):
# CONSIDER SWITCHING TO scipy.io.wavfile.read TO GET SIGNAL
signal = numpy.fromstring(signal, 'Int16');
amplitude = numpy.fft.rfft(signal.astype(numpy.float))
power = [np.sum(np.abs(amplitude[FFT_FREQS_INDS==led_ind])**2) for led_ind in range(len(bands_bounds))]
return power
The first part of the code computes the fft frequencies and constructs the array FFT_FREQS_INDS that indicates to which of the 8 frequency bands the fft frequency corresponds to. Then, in pitch the power of the spectra in each of the bands is computed. Of course, this can be optimized but I tried to make the code self-explanatory.
The figure I plot via the code below is just a peak around ZERO, no matter how I change the data. My data is just one column which records every timing points of some kind of signal. Is the time_step a value I should define according to the interval of two neighbouring points in my data?
data=np.loadtxt("timesequence",delimiter=",",usecols=(0,),unpack=True)
ps = np.abs(np.fft.fft(data))**2
time_step = 1
freqs = np.fft.fftfreq(data.size, time_step)
idx = np.argsort(freqs)
pl.plot(freqs[idx], ps[idx])
pl.show()
As others have hinted at your signals must have a large nonzero component. A peak at 0 (DC) indicates the average value of your signal. This is derived from the Fourier transform itself. This cosine function cos(0)*ps(0) indicates a measure of the average value of the signal. Other Fourier transform components are cosine waves of varying amplitude which show frequency content at those values.
Note that stationary signals will not have a large DC component as they are already zero mean signals. If you do not want a large DC component then you should compute the mean of your signal and subtract values from that. Regardless of whether your data is 0,...,999 or 1,...,1000, or even 1000, ..., 2000 you will get a peak at 0Hz. The only difference will be the magnitude of the peak since it measures the average value.
data1 = arange(1000)
data2 = arange(1000)+1000
dataTransformed3 = data - mean(data)
data4 = numpy.zeros(1000)
data4[::10] = 1 #simulate a photon counter where a 1 indicates a photon came in at time indexed by array.
# we could assume that the sample rate was 10 Hz for example
ps1 = np.abs(np.fft.fft(data))**2
ps2 = np.abs(np.fft.fft(data))**2
ps3 = np.abs(np.fft.fft(dataTransformed))**2
figure()
plot(ps1) #shows the peak at 0 Hz
figure()
plot(ps2) #shows the peak at 0 Hz
figure()
plot(ps3) #shows the peak at 1 Hz this is because we removed the mean value but since
#the function is a step function the next largest component is the 1 Hz cosine wave.
#notice the order of magnitude difference in the two plots.
Here is a bare-bones example that shows input and output with a peak as you'd expect it:
import numpy as np
from scipy.fftpack import rfft, irfft, fftfreq
time = np.linspace(0,10,2000)
signal = np.cos(5*np.pi*time)
W = fftfreq(signal.size, d=time[1]-time[0])
f_signal = rfft(signal)
import pylab as plt
plt.subplot(121)
plt.plot(time,signal)
plt.subplot(122)
plt.plot(W,f_signal)
plt.xlim(0,10)
plt.show()
I use rfft since, more than likely, your input signal is from a physical data source and as such is real.
If you make your data all positive:
ps = np.abs(np.fft.fft(data))**2
time_step = 1
then most probably you will create a large 'DC', or 0 Hz component. So if your actual data has little amplitude, compared to that component, it will disappear from the plot, by the autoscaling feature.