On my Raspberry Pi 4, I have an xlsx file which I read with df = pd.read_excel('').
This is a dataset from a pressure sensor with 146651 rows x 1 column. I want to compute an FFT of this dataset, but when I plot the FFT I get a curve that looks exactly like the time signal (only shifted along the x-axis?).
So what is the problem?
import numpy as np
import matplotlib.pyplot as plt
from scipy import fftpack
import pandas as pd
# Import xlsx file
df = pd.read_excel('/home/pi/Downloads/test_11.12.19_500_neuHz.xlsx', skiprows=1)
print(df)
sig_fft = fftpack.fft(df)
power = np.abs(sig_fft)
print (power)
sample_freq = fftpack.fftfreq(df.size, 0.02)
plt.figure(figsize=(6,5))
plt.plot(sample_freq, power)
plt.show()
Your input data must be in a single row (or a 1-D array). fftpack.fft transforms along the last axis by default, so on a DataFrame with many rows and one column it is applied to each row individually. Each row is then a one-sample signal, and the FFT of a single value is that value itself; therefore your output is the same as your input.
#data is one column
df = pd.DataFrame([4,5,4])
df
fft = fftpack.fft(df)
fft
# output = input => wrong
#data as one row
df = pd.DataFrame({'0':[4],'1':[5],'2':[4]})
df
fft = fftpack.fft(df)
fft
# right
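A minimal sketch of the same point using numpy only, with a made-up three-value column standing in for the sensor data: transforming along the last axis of an (N, 1) array returns the input, while flattening the column to 1-D first gives the spectrum of the whole signal.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame standing in for the sensor data
df = pd.DataFrame([4.0, 5.0, 4.0])

# Default axis is the last one: every row is treated as a one-sample signal
per_row = np.fft.fft(df.to_numpy())      # shape (3, 1), numerically equal to the input

# Flatten to 1-D so the whole column is transformed as one signal
signal = df.to_numpy().ravel()
spectrum = np.fft.fft(signal)            # shape (3,)

print(per_row.ravel())
print(spectrum)
```

Passing axis=0 to the FFT call should be an equivalent alternative to flattening when the frame has a single column.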
Now I am performing the FFT in another way. But how can I scale the frequency and magnitude axes?
import numpy as np
import matplotlib.pyplot as plt
import scipy
from scipy.fftpack import fft
import pandas as pd
import math
from tkinter import filedialog
from tkinter import *
#choose csv file
root = Tk()
root.filename = filedialog.askopenfilename ( initialdir = "/home/pi", title = "Select file", filetypes = (("Comma Separated Values (CSV)", "*.csv"), ("All files", "*.*")) )
# Import csv file
df = pd.read_csv(root.filename, delimiter = ';')
#convert Voltage to Bar
df_echt = df/0.01
#preparation for fft
df_neu = df.to_numpy()  # as_matrix() was removed in pandas 1.0
time = df_neu[:,0]
voltage = df_neu[:,0]/df_neu[:,0].max()
df_tr = df_neu.T
#fft
amplitude = np.fft.rfft (voltage)
freq = np.fft.rfftfreq(len(time),np.diff(time)[0])
#plot time signal and fft
plt.plot( np.absolute(amplitude), lw = 0.5)
plt.figure (2)
plt.plot (df_echt)
plt.legend (df)
plt.show()
FFT graph
So how do I scale the axes?
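One common way to scale both axes, sketched on a synthetic signal rather than the CSV data: the frequency axis comes from rfftfreq with the sample spacing in seconds, and the magnitude is divided by the number of samples and doubled for the one-sided spectrum. The sampling rate, tone frequency and amplitude below are assumptions chosen for illustration.

```python
import numpy as np

fs = 1000.0                       # assumed sampling rate in Hz
n = 1000                          # number of samples (one second of data)
t = np.arange(n) / fs
signal = 3.0 * np.sin(2 * np.pi * 50.0 * t)   # synthetic 50 Hz tone, amplitude 3

spectrum = np.fft.rfft(signal)
freq = np.fft.rfftfreq(n, d=1.0 / fs)         # frequency axis in Hz

# One-sided amplitude spectrum: divide by n, then double every bin
# except DC and (because n is even here) the Nyquist bin
amplitude = np.abs(spectrum) / n
amplitude[1:-1] *= 2

peak = np.argmax(amplitude)
print(freq[peak], amplitude[peak])
```

Plotting amplitude against freq (instead of against the bin index) then gives an x-axis in Hz and a y-axis in the units of the input signal.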
I've been trying to get the Fourier transform of the data in file S1L1E here: https://github.com/gergobes/data-for-fft/blob/c0b664379c0eeab04abdc541e34d6e636e841eb0/S1L1E. The first column is time, the 2nd column is the amplitude of the first wave, and the 3rd column is the amplitude of another wave. The code I tried is:
import matplotlib.pyplot as plt
import numpy as np
from scipy.fft import fft, fftfreq
data = np.loadtxt("S1L1E")
time = data[:,0]
amp_left = data[:,1]
amp_right = data[:,2]
plt.plot(time, amp_left)
plt.plot(time, amp_right)
plt.show()
# fft attempt
samplerate = 5000
duration = time[-1]
N = int(samplerate * duration)
x = time
y = amp_left
yf = fft(y)
xf = fftfreq(len(y))
plt.plot(xf, abs(yf))
plt.show()
I tried for the first wave but got only a spike at 0. What am I doing wrong? It's my first time trying a fft so I'm kinda lost here. I would appreciate any help.
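A spike at 0 is often the DC component (the mean of the signal) dwarfing everything else, and calling fftfreq without the sample spacing yields frequencies in cycles per sample rather than Hz. Whether that is what happens with S1L1E cannot be confirmed without the data, so the sketch below uses a synthetic signal with a constant offset; it uses numpy.fft rather than scipy.fft, and the 120 Hz tone and the offset are assumptions.

```python
import numpy as np

samplerate = 5000.0               # Hz, the value used in the question
n = 5000
time = np.arange(n) / samplerate
# Synthetic stand-in: constant offset plus a small 120 Hz tone
y = 10.0 + 0.5 * np.sin(2 * np.pi * 120.0 * time)

xf = np.fft.rfftfreq(n, d=1.0 / samplerate)   # frequency axis in Hz
yf_raw = np.fft.rfft(y)                        # dominated by the bin at 0 Hz
yf = np.fft.rfft(y - y.mean())                 # mean removed before the transform

raw_peak = xf[np.argmax(np.abs(yf_raw))]
clean_peak = xf[np.argmax(np.abs(yf))]
print(raw_peak, clean_peak)
```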
I am trying to calculate the power spectral density step by step using the numpy FFT and compare it to the PSD calculated by pylab's psd. The difference between the plots from the two methods can be seen in the figures below.
#*************************PSD using pylab PSD**************************
import math
from rtlsdr import RtlSdr
import pylab as mpl
import numpy as np
import pandas as pd
sdr = RtlSdr()
sdr.rs = 2.4e6
sdr.fc = 935e6
sdr.gain = 40
samplest = ()
psd_scant = ()
psd_logt = ()
samples = sdr.read_samples(256*1024)
psd_scan, f=mpl.psd(samples, NFFT=1024, Fc=sdr.fc/1e6, Fs=sdr.rs/1e6)
psd_log = 10*np.log10(psd_scan)
#*************************Step by Step PSD using FFT**************************
n=256
x= np.array_split(samples, n)
xdf = pd.DataFrame(x)
xdft =xdf.transpose()
NFFT = 1024
xfft = np.fft.fft(xdft, n=1024, axis=0)[:1024, :]
xfftt=xfft.transpose()
PSD = np.abs(xfftt)**2
PSD_log = 10*np.log10(PSD)
PSD_shifted = np.fft.fftshift(PSD_log)
Final = PSD_shifted.mean(axis=0)
# pylab was imported as mpl above; plt is not defined in this script
mpl.xlabel('Frequency'); mpl.ylabel('PSD')
mpl.grid(True)
mpl.plot(f, Final)
Pylab PSD
PSD using FFT
Hints / help required:
Any help making the manual PSD plot match the pylab PSD would be appreciated.
How can the y-axis be corrected?
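A hedged sketch of a manual estimate that should come close to what pylab's psd does with its defaults, as far as I understand them (Hann window, no overlap, scaling by the sampling frequency). Two differences from the step-by-step code above matter for the y-axis: each segment is windowed and divided by Fs times the sum of the squared window, and the segments are averaged in linear power before taking the logarithm. The complex white noise below is a synthetic stand-in for sdr.read_samples().

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 2.4                          # sample rate in MHz, as in the question
nfft = 1024
n_segments = 256

# Synthetic stand-in for the SDR samples: complex white noise with unit power
samples = (rng.normal(size=n_segments * nfft)
           + 1j * rng.normal(size=n_segments * nfft)) / np.sqrt(2)

segments = samples.reshape(n_segments, nfft)   # one row per segment
window = np.hanning(nfft)

# Windowed FFT of every segment, scaled to a power density
spectra = np.fft.fft(segments * window, axis=1)
power = np.abs(spectra) ** 2 / (fs * np.sum(window ** 2))

# Average in linear units first, then shift and convert to dB
psd = np.fft.fftshift(power.mean(axis=0))
psd_db = 10 * np.log10(psd)
freq = np.fft.fftshift(np.fft.fftfreq(nfft, d=1.0 / fs))

# Sanity check: integrating the PSD over frequency recovers the signal power
total_power = np.sum(psd) * fs / nfft
print(total_power)
```

To reproduce the x-axis of the pylab plot, the centre frequency Fc would still need to be added to freq.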
I have two tables, one 10 by 110 and one 35 by 110, and both contain random numbers from an exponential distribution function provided by my professor. The assignment is to prove the central limit theorem in statistics.
What I thought I would try is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# importing data
df1 = pd.read_excel(r'C:\Users\Henry\Desktop\n10.xlsx')
df2 = pd.read_excel(r'C:\Users\Henry\Desktop\n30.xlsx')
df1avg = pd.read_excel(r'C:\Users\Henry\Desktop\n10avg.xlsx')
df2avg = pd.read_excel(r'C:\Users\Henry\Desktop\n30avg.xlsx')
# plotting n10 histogram
plt.hist(df1, bins=34)
plt.hist(df1avg, bins=11)
# plotting n30 histogram
plt.hist(df2, bins=63)
plt.hist(df2avg, bins=11)
Is that OK, or do I need to reshape the tables into a single column, and if so, what is the most efficient way to do that?
I suspect that you will want to flatten your dataframe first, as illustrated below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
N = np.random.exponential(1, [40, 5])
df = pd.DataFrame(N) # convert to dataframe
bin_edges = np.linspace(0,6,30)
plt.figure()
plt.hist(df, bins = bin_edges, density = True)
plt.xlabel('Value')
plt.ylabel('Probability density')
The multiple (5) colours of bars per bin show the histograms for each column of the data frame.
Fortunately, this is not hard to adjust. You can convert the data frame to a numpy array and flatten it:
plt.hist(df.to_numpy().flatten(), bins = bin_edges, density = True)
plt.ylabel('Probability density')
plt.xlabel('Value')
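For the central limit theorem part of the assignment, a small simulation may help to see what the histograms of the averages should look like. This is only a sketch on synthetic exponential data (scale 1, an assumption; the professor's tables are not used): the spread of the sample means should shrink like 1/sqrt(n), and their skewness should drop as n grows.

```python
import numpy as np

rng = np.random.default_rng(42)
n_experiments = 20000             # number of simulated sample means per n

mean_std = {}
mean_skew = {}
for n in (10, 30):
    # Each row is one sample of size n from an exponential distribution
    samples = rng.exponential(scale=1.0, size=(n_experiments, n))
    means = samples.mean(axis=1)
    centred = means - means.mean()
    mean_std[n] = means.std()
    mean_skew[n] = np.mean(centred ** 3) / means.std() ** 3
    print(n, mean_std[n], 1 / np.sqrt(n), mean_skew[n])
```

A histogram of means for each n (for example with plt.hist(means, density=True)) should look increasingly bell-shaped as n goes from 10 to 30.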
I have a plot with some outliers (wrong measurements):
The base data is good though. I want to just delete everything that is too far off the "current average". I tried using pd.rolling().mean() but with no satisfactory result:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = np.genfromtxt('shard_height_plot.csv', delimiter = ',')
df = pd.DataFrame(data)
df.set_index(0, inplace = True)
df2 = df.rolling(20).mean()
plt.plot(df)
plt.plot(df2)
plt.show()
I tried to search the web for a good solution but couldn't find one. It shouldn't be that hard to delete data points that jump through the roof, should it?
Edit:
data file can be downloaded here: https://ufile.io/pviuc
Edit2:
I tackled this problem of too many outliers by improving my data set creation.
The core of it:
if abs(D - D_List[-2]) > 30:
    D = D_List[-2]
    D_List.pop()
    D_List.append(D)
Basically, this checks whether the change in a value is larger than 30; if so, it deletes the last value and replaces it with the second-to-last one. Not very spectacular, but just what I need. I used one of the answers though, because it is so much prettier. Thank you guys very much.
Let's try using scipy.signal (see the docs):
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal
data = np.genfromtxt('shard_height_plot.csv', delimiter = ',')
df = pd.DataFrame(data)
df.set_index(0, inplace = True)
df2 = df.rolling(20).mean()
b, a = signal.butter(3, 0.05)
y = signal.filtfilt(b,a, df[1].values)
df3 = pd.DataFrame(y, index=df2.index)
plt.plot(df, alpha=.3)
plt.plot(df2, alpha=.3)
plt.plot(df3)
plt.show()
Output:
Use medfilt:
y = signal.medfilt(df[1].values)
Output:
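If the goal is to delete the outliers rather than smooth over them, a rolling median can serve as the "current average" and points too far from it can be dropped. The following is a sketch on synthetic data; the window of 21 samples, the threshold of 30 and the helper name drop_outliers are assumptions for illustration, not values tuned for shard_height_plot.csv.

```python
import numpy as np
import pandas as pd

def drop_outliers(series, window=21, threshold=30.0):
    """Keep only points within `threshold` of the centred rolling median."""
    median = series.rolling(window, center=True, min_periods=1).median()
    keep = (series - median).abs() <= threshold
    return series[keep]

# Synthetic stand-in for the measurements: a noisy baseline with three spikes
rng = np.random.default_rng(0)
values = pd.Series(100 + rng.normal(0, 1, 200))
values.iloc[[20, 75, 150]] += 500

cleaned = drop_outliers(values)
print(len(values), len(cleaned))
```

The median is used instead of the mean because a single spike inside the window barely moves it.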
There are many ways to smooth a curve (rolling mean, GAM, smoothing spline etc.), my favorite one is the Savitzky–Golay method.
It works as follows: a polynomial is fitted by least squares to a small window around a data point y, and that polynomial is evaluated at the point to give the estimate ŷ. The window is then shifted forward by one data point and the process is repeated.
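To make the description concrete, here is a slow, hand-written sketch of the same idea using np.polyfit on each window. It is meant as an illustration of the principle, not as a replacement for savgol_filter; the function name local_poly_smooth is made up, and unlike the library routine this sketch leaves the first and last half-window of points unsmoothed.

```python
import numpy as np

def local_poly_smooth(y, window=7, order=3):
    """Fit a polynomial to each centred window and evaluate it at the centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1)       # positions relative to the centre
    smoothed = y.astype(float).copy()          # edge points stay as they are
    for i in range(half, len(y) - half):
        coeffs = np.polyfit(offsets, y[i - half:i + half + 1], order)
        smoothed[i] = np.polyval(coeffs, 0.0)  # value of the fit at the centre
    return smoothed

x = np.linspace(0, 5, 150)
rng = np.random.default_rng(1)
y = np.cos(x) + rng.random(150) * 0.15
yhat = local_poly_smooth(y, window=49, order=3)
```

A quick sanity check of the idea: a signal that is itself a cubic polynomial passes through an order-3 fit unchanged.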
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
x = np.linspace(0,5,150)
y = np.cos(x) + np.random.random(150) * 0.15
yhat = savgol_filter(y, 49, 3)
plt.plot(x,y)
plt.plot(x,yhat, color='red')
plt.show()
Note that a rolling mean can't work in your case with a window as small as 20, since each outlier will carry a non-negligible weight (5%) and will always induce a big bias...
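The 5% weight can be checked with a tiny numeric example. The numbers are made up: a flat baseline of 100 and one outlier 500 above it inside a window of 20 points.

```python
import numpy as np

window = 20
values = np.full(window, 100.0)   # flat baseline
values[10] = 600.0                # hypothetical outlier, 500 above the baseline

# The mean is pulled up by outlier_size / window; the median is unaffected
mean_bias = values.mean() - 100.0
median_bias = np.median(values) - 100.0
print(mean_bias, median_bias)
```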
I am producing the probability distribution function (PDF) of my variable, which is temperature:
I am going to produce several plots showing how the temperature PDF evolves.
For this reason, I would like to link the color of each plot (rainbow-style) to the temperature at the peak of the distribution.
That way, it is easy to read off the typical temperature just by looking at the color.
Here's the code I have written to produce the plots of the PDF evolution:
from netCDF4 import Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import seaborn as sns
from scipy.stats import gaussian_kde
my_file = 'tas/tas.nc'
fh = Dataset(my_file, mode='r')
lons = (fh.variables['rlon'][:])
lats = (fh.variables['rlat'][:])
t = (fh.variables['tas'][:])-273
step = len(t[:,0,0])
t_units = fh.variables['tas'].units
fh.close()
len_lon = len(t[0,0,:])
len_lat = len(t[0,:,0])
len_tot = len_lat*len_lon
temperature = np.zeros(len_tot)
for i in range(step):
    temperature=t[i,:,:]
    temperature_array = temperature.ravel()
    density = gaussian_kde(temperature_array)
    xs = np.linspace(-80,50,200)
    density.covariance_factor = lambda : .25
    density._compute_covariance()
    plt.title(str(1999+i))
    plt.xlabel("Temperature (C)")
    plt.ylabel("Frequency")
    plt.plot(xs,density(xs))
    plt.savefig('temp_'+str(i))
Because the question is lacking a working snippet, I had to come up with some sample data. This creates three datasets, where each one is colored with a specific color between blue (cold) and red (hot) according to their maximum value.
import matplotlib.pyplot as plt
import random
from colour import Color
nrange = 20
mydata1 = random.sample(range(nrange), 3)
mydata2 = random.sample(range(nrange), 3)
mydata3 = random.sample(range(nrange), 3)
colorlist = list(Color('blue').range_to(Color('red'), nrange))
# print(mydata1) print(mydata2) print(mydata3)
plt.plot(mydata1, color='{}'.format(colorlist[max(mydata1)]))
plt.plot(mydata2, color='{}'.format(colorlist[max(mydata2)]))
plt.plot(mydata3, color='{}'.format(colorlist[max(mydata3)]))
plt.show()
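An alternative sketch that stays within matplotlib instead of the colour package: the temperature at the peak of each density curve is normalised to the plotting range and looked up in the rainbow colormap. The helper name peak_color, the Gaussian-shaped curves and the range of -80 to 50 °C (taken from the xs grid in the question) are assumptions for illustration.

```python
import numpy as np
from matplotlib import cm, colors

def peak_color(xs, pdf_values, vmin=-80.0, vmax=50.0):
    """Return the peak temperature and the rainbow colour assigned to it."""
    peak_temperature = xs[np.argmax(pdf_values)]
    norm = colors.Normalize(vmin=vmin, vmax=vmax)
    return peak_temperature, cm.rainbow(norm(peak_temperature))

xs = np.linspace(-80, 50, 200)
# Two synthetic density curves standing in for density(xs) in the question
pdf_cold = np.exp(-0.5 * ((xs + 20.0) / 5.0) ** 2)
pdf_warm = np.exp(-0.5 * ((xs - 25.0) / 5.0) ** 2)

peak_cold, color_cold = peak_color(xs, pdf_cold)
peak_warm, color_warm = peak_color(xs, pdf_warm)
print(peak_cold, color_cold)
print(peak_warm, color_warm)
```

The returned RGBA tuple can be passed directly as the color argument of plt.plot, for example plt.plot(xs, density(xs), color=color_cold).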