Signal data denoising or removing artifacts - python

The original dataset contains noise that I want to remove. Spline interpolation or wavelet filtering could be used.
Please find the dataset sample here
>t1=1583516027000;t2=1583516028000
>t3=1583515991000;t4=1583515993000
>u1=d5[(d5['time']>=t1) & (d5['time']<=t2)]
>u2=d5[(d5['time']>=t3) & (d5['time']<=t4)]
t1, t2, t3, t4 are the timestamps of the intervals where the noise occurred. To denoise them, I tried:
>u1['ch1'].interpolate(method='spline', order=2)
It gives me an error, and in any case interpolation only fills in missing observations, not existing values.
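A minimal sketch of the workaround I have in mind (assuming the signal column is 'ch1' and 'time' is in epoch milliseconds): first mark the noisy interval as missing, so the spline actually has gaps to fill from the surrounding clean samples.
>import numpy as np
>mask = (d5['time'] >= t1) & (d5['time'] <= t2)
>d5.loc[mask, 'ch1'] = np.nan  # discard the noisy samples
>d5['ch1'] = d5['ch1'].interpolate(method='spline', order=2)  # fit a spline through the neighbouring values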
Also, for wavelet denoise filtering I wrote this code
>import pywt
>import numpy as np
>from scipy.misc import electrocardiogram
>import scipy.signal as signal
>import matplotlib.pyplot as plt
>from skimage.restoration import denoise_wavelet
>wavelet_type='db6'
>x_denoise = denoise_wavelet(u1.iloc[:,0], method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma=True)
However, it does not change the results at all. How can I do this task? This is my first question here, so I may not have explained the problem clearly. My goal is to produce a denoised dataset using spline interpolation and wavelet denoising. The problem is that I cannot filter the whole dataset: the artifacts only occur in certain time intervals, and filtering everything would also distort the clean parts of the signal. Therefore, I have to filter based on the time intervals only.
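For the wavelet part, here is a minimal sketch of what I have in mind (again assuming the signal column is 'ch1'): denoise only the selected slice and write the result back, so the rest of the data stays untouched.
>from skimage.restoration import denoise_wavelet
>clean = denoise_wavelet(u1['ch1'].to_numpy(), method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma=True)
>d5.loc[(d5['time'] >= t1) & (d5['time'] <= t2), 'ch1'] = clean  # overwrite only the noisy interval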

Related

How can we generate non-Gaussian random noise in python?

I have to add non-Gaussian random noise to synthetic seismic data in my project, but apparently I can only find methods for adding Gaussian noise, like this:
import numpy as np
noise = np.random.normal(0,1,100)
Is there a way to generate non-Gaussian noise?
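One option, sketched here on the assumption that any heavy-tailed or flat distribution would do: NumPy's random generator exposes several non-Gaussian distributions directly, so the noise can be drawn from whichever one fits your model.
import numpy as np

rng = np.random.default_rng(0)
uniform_noise = rng.uniform(-1, 1, 100)           # flat distribution
laplace_noise = rng.laplace(0, 1, 100)            # heavier tails than Gaussian
student_t_noise = rng.standard_t(df=3, size=100)  # heavy-tailed, tail weight set by df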

How to get and stack frames from live video and use LSI algorithm

I have a XIMEA camera and I want to write an algorithm that grabs and stacks a number of frames with my processing in real time.
For example, with a live video: after the 15th frame I want to collect frames [1, 15] in one list (a list comprehension?) and do something with them. After that I want a list with frames [2, 16], then [3, 17], and so on until I stop it. How can I do that?
Here is the code I have for the camera:
import cv2
import time
import numpy as np
from matplotlib import pyplot as plt
from scipy import ndimage
from ximea import xiapi  # XIMEA Python API, needed for the calls below

cam = xiapi.Camera()
cam.open_device()
img = xiapi.Image()
Now you can see which libraries I am using.
The LSI algorithm (a temporal filter) takes the mean value of each pixel over a number of frames (15 in my case). Should I use NumPy functions on the images as arrays, or OpenCV on the frames?
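One way the sliding window could look, sketched with a placeholder get_frame() standing in for whatever XIMEA call returns the current frame as a NumPy array:
from collections import deque
import numpy as np

WINDOW = 15
frames = deque(maxlen=WINDOW)        # automatically drops the oldest frame

while True:
    frame = get_frame()              # placeholder, not part of the XIMEA API
    frames.append(frame)
    if len(frames) == WINDOW:
        # temporal (LSI-style) filter: per-pixel mean over the last 15 frames
        mean_frame = np.stack(frames, axis=0).mean(axis=0)
        # ... process mean_frame here ...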

Amplitude units in FFT

I'm completely new to python, scipy, matplotlib and programming in general.
I'm using the following code, which I came across online, to apply FFT to .wav files:
import scipy.io.wavfile as wavfile
import scipy
import scipy.fftpack as fftpk
import numpy as np
from matplotlib import pyplot as plt
s_rate, signal = wavfile.read("file.wav")
FFT = abs(scipy.fft.fft(signal))
freqs = fftpk.fftfreq(len(FFT), (1.0/s_rate))
plt.plot(freqs[range(len(FFT)//2)], FFT[range(len(FFT)//2)])
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.show()
The resulting graphs give amplitude values ranging from 0 to a few thousand, depending on the file, and I have no idea what unit these are in. I'm guessing they might be relative amplitudes, and I was wondering if there is a way to turn that into decibels, as I need specific values.
Thank you
Tanguy
They are amplitudes relative to the quantization units used for the samples in your input signal. So, without calibrating your input signal against a known level of source input (to get Volts per 1 bit change, etc.), the actual units are unknown. If calibrated, you may still need to divide the magnitudes of the FFT output by N (the FFT length), depending on your particular FFT implementation.
To get decibels, take 20*log10(abs(...)) of the FFT results and offset by your 0 dB calibration level.
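A minimal sketch of that conversion, continuing from the code in the question and using the spectrum's own peak as the 0 dB reference (the absolute calibration is still unknown):
import numpy as np

eps = 1e-12  # avoid log10(0)
FFT_dB = 20 * np.log10(FFT / (FFT.max() + eps) + eps)

plt.plot(freqs[:len(FFT)//2], FFT_dB[:len(FFT)//2])
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude (dB re. peak)')
plt.show()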

Dask visualize method image size too small

I am trying to use the visualize method to visualize a Dask graph. However, the resulting image is too small (because there are a lot of nodes in my graph). How can I increase its size?
Here is the code:
import dask.dataframe as dd
from dask.diagnostics import ProgressBar
from matplotlib import pyplot as plt

df = dd.read_csv('nyc_parking_tickets_2017.csv')
missing_values = df.isnull().sum()
missing_count = ((missing_values / df.index.size) * 100)
missing_count.visualize()
This code is taken from Data Science with Python and Dask by Jesse Daniel. The dataset comes from this Kaggle dataset on NYC parking tickets.
Dask uses relatively sane defaults for graphviz. It's surprising that the image is small. However, if you want to modify the graph itself you can pass graph-level attributes to the visualize method (see the docstring). These will be passed to the GraphViz library.
You might also mean that the nodes in the graph are small, perhaps because there are very many of them. I don't recommend relying on the visualize method to gain insight if you have more than a few hundred partitions.
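A minimal sketch of that suggestion; 'rankdir' and 'size' are ordinary Graphviz graph attributes, so treat the exact values as an example rather than a guaranteed recipe:
# extra keyword arguments to visualize() are forwarded to Graphviz as graph attributes
missing_count.visualize(filename='missing_count.png', rankdir='LR', size='24,12!')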

Outlier detection for time series

I wanted to put together a very simple example of anomaly detection for time series, so I created sample data with one very obvious outlier, but so far none of the methods I tried detects it reliably. I tried local outlier factor, isolation forest and k-nearest neighbors. From what I read, at least one of those methods should be suitable. I also tried tweaking the parameters, but that didn't really help.
What mistake am I making here? Are the methods not appropriate?
Below is a code example.
Thanks in advance!
import numpy as np
import matplotlib.pyplot as plt

# quadratic trend plus Gaussian noise, with one obvious outlier at index 10
np.random.seed(1)
t = np.linspace(0, 10, 101).reshape(-1, 1)
y_test = 0.5 + t + t**2 + 2*np.random.randn(len(t), 1)
y_test[10] = y_test[10]*7

plt.figure(1)
plt.plot(t, y_test)
plt.show()

# local outlier factor
from sklearn.neighbors import LocalOutlierFactor
clf = LocalOutlierFactor(contamination='auto')
pred = clf.fit_predict(y_test)
plt.figure(2)
plt.plot(t[pred == 1], y_test[pred == 1], 'bx')    # inliers
plt.plot(t[pred == -1], y_test[pred == -1], 'ro')  # outliers
plt.show()

# isolation forest (behaviour='new' is the default in recent scikit-learn and the
# parameter has been removed, so it is omitted here)
from sklearn.ensemble import IsolationForest
clf = IsolationForest(contamination='auto')
pred = clf.fit_predict(y_test)
plt.figure(3)
plt.plot(t[pred == 1], y_test[pred == 1], 'bx')
plt.plot(t[pred == -1], y_test[pred == -1], 'ro')
plt.show()

# k-nearest neighbors from pyod (0 = inlier, 1 = outlier)
from pyod.models.knn import KNN
clf = KNN()
clf.fit(y_test)
pred = clf.predict(y_test)
plt.figure(4)
plt.plot(t[pred == 0], y_test[pred == 0], 'bx')
plt.plot(t[pred == 1], y_test[pred == 1], 'ro')
plt.show()
I guess it is because of two reasons:
- these algorithms are not designed specifically for 1-d data
- they are not designed for time-series problems
You may need to use a time-series-oriented tool for this, as sketched below.
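For instance, a simple rolling-median plus MAD check flags the spike in this toy series; the sketch below is just one illustration of such a time-series approach:
import numpy as np
import pandas as pd

# same toy data as in the question
np.random.seed(1)
t = np.linspace(0, 10, 101)
y = 0.5 + t + t**2 + 2*np.random.randn(len(t))
y[10] *= 7

s = pd.Series(y)
trend = s.rolling(window=11, center=True, min_periods=1).median()  # robust local trend
resid = s - trend

mad = np.median(np.abs(resid - resid.median()))  # robust scale estimate (MAD)
score = np.abs(resid) / (1.4826 * mad)           # roughly a z-score

print(np.where(score > 5)[0])  # should flag index 10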
