I'm trying to implement the following formulas in a Jupyter notebook:
)
Below, I have the code for loading the Noisy (Y) and Noise files (D).
# Read audio data from file
noisy_speech = AudioSegment.from_wav('NoisySignal/Station/sp01_station_sn5.wav')
y = noisy_speech.get_array_of_samples() # samples x(t)
y_f = noisy_speech.frame_rate # sampling rate f
#window size: the number of samples per frame, each frame is of 30ms
win_length = int(y_f * 0.03)
#number of samples between two consecutive frames, by default, hop_length = win_length / 4
hop_length = int(win_length / 2)
Y = librosa.stft(np.float32(y), n_fft = 2048, window = 'hann', hop_length = hop_length, win_length = win_length)
mag_Y = abs(Y)
angle = np.angle(Y)
print(Y.shape)
# Read audio data from file
n_speech = AudioSegment.from_wav('Noise/Station/Station_1.wav')
d = n_speech.get_array_of_samples() # samples x(t)
d_f = n_speech.frame_rate # sampling rate f
#window size: the number of samples per frame, each frame is of 30ms
win_length = int(d_f * 0.03)
#number of samples between two consecutive frames, by default, hop_length = win_length / 4
hop_length = int(win_length / 2)
D = librosa.stft(np.float32(y), n_fft = 2048, window = 'hann', hop_length = hop_length, win_length = win_length)
mag_D = abs(D)
means_mag_D = np.mean(mag_D, axis = 1)
So the array in Y & D, each column is a frame.
How would I implement the above formula for S-hat?
Is there a library that can do it for me, if not, how would I write it from scratch?
Also, if anyone has a link to a video or document for writing formulas to code, that would be helpful as well.
Thank you.
The formula doesn't involve any calculus or anything like that so you can simply use numpy. Work out how to do each of the individual mathematical operations using numpy and then put them all together.
import numpy as np
y = np.random.random(20)
d = np.random.random(20)
right = 1 - (d**2)/(y**2) # Calculate the right side of the max equation
right[right < 0] = 0 # Equivalent to max(0, right)
h = right**0.5
s_hat = h*y
Related
I have a specific python issue, that desperately needs to be sped up by avoiding the use of a loop, yet, I am at a loss as to how to do this. I need to read in a fits image, convert this to a numpy array (roughly, 2000 x 2000 elements in size), then for each element compute the statistics of a ring of elements around it.
As I have my code now, the statistics of the ring around the element is computed with a function using masks. This is fast but, of course, I call this function 2000x2000 times (the slow part).
I am relatively new to python. I think that using the mask function is clever, but I cannot find a way around individually addressing each element. Best of thanks for any help you can provide.
# First, the function computing the statistics within a ring
around the central pixel:<br/>
# flux = image intensity at pixel (i,j)<br/>
# rad1, rad2 = inner and outer radii<br/>
# array = image array<br/>_
def snr(flux, i, j, rad1, rad2, array):
a, b = i, j
nx, ny = array.shape
y, x = np.ogrid[-a:nx-a, -b:ny-b]
mask = (x*x + y*y >= rad1*rad1) & (x*x + y*y <= rad2*rad2)
Nmask = np.count_nonzero(mask)
noise = 0.6052697 * abs(Nmask * flux - sum(array[mask]))
return noise
# Now, the call to snr for each pixel in the array data1:<br/>_
frame1 = fits.open(in_frame, mode='readonly') # read in fits file
data1 = frame1[ext].data # convert to np array
ny, nx = data1.shape # array dimensions
noise1 = zeros((ny, nx), float) # empty array
r1 = 5 # inner radius (pixels)
r2 = 7 # outer radius (pixels)
# The function is fast, but calling it 2k x 2k times is not:
for j in range(ny):
for i in range(nx):
noise1[i,j] = der_snr(data1[i,j], i, j, r1, r2, data1)
The operation that you are trying to do can be expressed as an image convolution. Try something like this:
import numpy as np
import scipy.ndimage
from astropy.io import fits
def make_kernel(inner_radius, outer_radius):
if inner_radius > outer_radius:
raise ValueError
x, y = np.ogrid[-outer_radius:outer_radius + 1, -outer_radius:outer_radius + 1]
r2 = x * x + y * y
kernel = (r2 >= inner_radius * inner_radius) & (r2 <= outer_radius * outer_radius)
return kernel
in_frame = '<file path>'
ext = '...'
frame1 = fits.open(in_frame, mode='readonly')
data1 = frame1[ext].data
inner_radius = 5
outer_radius = 7
kernel = make_kernel(inner_radius, outer_radius)
n_kernel = np.count_nonzero(kernel)
conv = scipy.ndimage.convolve(data1, kernel, mode='constant')
noise1 = 0.6052697 * np.abs(n_kernel * data1 - conv)
I have data geographically scattered without any kind of pattern and I need to create an image where the value of each pixel is an average of the neighbors of that pixel that are less than X meters.
For this I use the library scipy.spatial to generate a KDTree with the data (cKDTree). Once the data structure is generated, I locate the pixel geographically and locate the geographic points that are closest.
# Generate scattered data points
coord_cart= [
[
feat.geometry().GetY(),
feat.geometry().GetX(),
feat.GetField(feature),
] for feat in layer
]
# Create KDTree structure
tree = cKDTree(coord_cart)
# Get raster image dimensions
pixel_size = 5
source_layer = shapefile.GetLayer()
x_min, x_max, y_min, y_max = source_layer.GetExtent()
x_res = int((x_max - x_min) / pixel_size)
y_res = int((y_max - y_min) / pixel_size)
# Create grid
x = np.linspace(x_min, x_max, x_res)
y = np.linspace(y_min, y_max, y_res)
X, Y = np.meshgrid(x, y)
grid = np.array(zip(Y.ravel(), X.ravel()))
# Get points that are less than 10 meters away
inds = tree.query_ball_point(grid, 10)
# inds is an np.array of lists of different length, so I need to convert it into an array of n_points x maximum number of neighbors
ll = np.array([len(l) for l in inds])
maxlen = max(ll)
arr = np.zeros((len(ll), maxlen), int)
# I don't know why but inds is an array of list, so I convert it into an array of array to use grid[inds]
# I THINK THIS IS A LITTLE INEFFICIENT
for i in range(len(inds)):
inds[i].extend([i] * (maxlen - len(inds[i])))
arr[i] = np.array(inds[i], dtype=int)
# AND THIS DOESN'T WORK
d = np.linalg.norm(grid - grid[inds])
Is there a better way to do this? I'm trying to use IDW to perform the interpolation between the points. I found this snippet that uses a function that gets the N nearest points but it does not work for me because I need that if there is no point in a radius R, the value of the pixel is 0.
d, inds = tree.query(zip(xt, yt, zt), k = 10)
w = 1.0 / d**2
air_idw = np.sum(w * air.flatten()[inds], axis=1) / np.sum(w, axis=1)
air_idw.shape = lon_curv.shape
Thanks in advance!
This may be one of the cases where KDTrees are not a good solution. This is because you are mapping to a grid, which is a very simple structure meaning there is nothing to gain from the KDTree's sophistication. Nearest grid point and distance can be found by simple arithmetic.
Below is a simple example implementation. I'm using a Gaussian kernel but changing that to IDW if you prefer should be straight-forward.
import numpy as np
from scipy import stats
def rasterize(coords, feature, gu, cutoff, kernel=stats.norm(0, 2.5).pdf):
# compute overlap (filter size / grid unit)
ovlp = int(np.ceil(cutoff/gu))
# compute raster dimensions
mn, mx = coords.min(axis=0), coords.max(axis=0)
reso = np.ceil((mx - mn) / gu).astype(int)
base = (mx + mn - reso * gu) / 2
# map coordinates to raster, the residual is the distance
grid_res = coords - base
grid_coords = np.rint(grid_res / gu).astype(int)
grid_res -= gu * grid_coords
# because of overlap we must add neighboring grid points to the nearest
gcovlp = np.c_[-ovlp:ovlp+1, np.zeros(2*ovlp+1, dtype=int)]
grid_coords = (gcovlp[:, None, None, :] + gcovlp[None, :, None, ::-1]
+ grid_coords).reshape(-1, 2)
# the corresponding residuals have the same offset with opposite sign
gdovlp = -gu * (gcovlp+1/2)
grid_res = (gdovlp[:, None, None, :] + gdovlp[None, :, None, ::-1]
+ grid_res).reshape(-1, 2)
# discard off fov grid points and points outside the cutoff
valid, = np.where(((grid_coords>=0) & (grid_coords<=reso)).all(axis=1) & (
np.einsum('ij,ij->i', grid_res, grid_res) <= cutoff*cutoff))
grid_res = grid_res[valid]
feature = feature[valid // (2*ovlp+1)**2]
# flatten grid so we can use bincount
grid_flat = np.ravel_multi_index(grid_coords[valid].T, reso+1)
return np.bincount(
grid_flat,
feature * kernel(np.sqrt(np.einsum('ij,ij->i', grid_res, grid_res))),
(reso + 1).prod()).reshape(reso+1)
gu = 5
cutoff = 10
coords = np.random.randn(10_000, 2) * (100, 20)
coords[:, 1] += 80 * np.sin(coords[:, 0] / 40)
feature = np.random.uniform(0, 1000, (10_000,))
from timeit import timeit
print(timeit("rasterize(coords, feature, gu, cutoff)", globals=globals(), number=100)*10, 'ms')
pic = rasterize(coords, feature, gu, cutoff)
import pylab
pylab.pcolor(pic, cmap=pylab.cm.jet)
pylab.colorbar()
pylab.show()
I was reading this paper "Self-Invertible 2D Log-Gabor Wavelets" it defines 2D log gabor filter as such:
The paper also states that the filter only covers one side of the frequency space and shows that in this image
On my attempt to implement the filter I get results that do not match with what is said in the paper. Let me start with my implementation then I will state the problems.
Implementation:
I created a 2d array that contains the filter and transformed each index so that the origin of the frequency domain is at the center of the array with positive x-axis going right and positive y-axis going up.
number_scales = 5 # scale resolution
number_orientations = 9 # orientation resolution
N = constantDim # image dimensions
def getLogGaborKernal(scale, angle, logfun=math.log2, norm = True):
# setup up filter configuration
center_scale = logfun(N) - scale
center_angle = ((np.pi/number_orientations) * angle) if (scale % 2) \
else ((np.pi/number_orientations) * (angle+0.5))
scale_bandwidth = 0.996 * math.sqrt(2/3)
angle_bandwidth = 0.996 * (1/math.sqrt(2)) * (np.pi/number_orientations)
# 2d array that will hold the filter
kernel = np.zeros((N, N))
# get the center of the 2d array so we can shift origin
middle = math.ceil((N/2)+0.1)-1
# calculate the filter
for x in range(0,constantDim):
for y in range(0,constantDim):
# get the transformed x and y where origin is at center
# and positive x-axis goes right while positive y-axis goes up
x_t, y_t = (x-middle),-(y-middle)
# calculate the filter value at given index
kernel[y,x] = logGaborValue(x_t,y_t,center_scale,center_angle,
scale_bandwidth, angle_bandwidth,logfun)
# normalize the filter energy
if norm:
Kernel = kernel / np.sum(kernel**2)
return kernel
To calculate the filter value at each index another transform is made where we go to the log-polar space
def logGaborValue(x,y,center_scale,center_angle,scale_bandwidth,
angle_bandwidth, logfun):
# transform to polar coordinates
raw, theta = getPolar(x,y)
# if we are at the center, return 0 as in the log space
# zero is not defined
if raw == 0:
return 0
# go to log polar coordinates
raw = logfun(raw)
# calculate (theta-center_theta), we calculate cos(theta-center_theta)
# and sin(theta-center_theta) then use atan to get the required value,
# this way we can eliminate the angular distance wrap around problem
costheta, sintheta = math.cos(theta), math.sin(theta)
ds = sintheta * math.cos(center_angle) - costheta * math.sin(center_angle)
dc = costheta * math.cos(center_angle) + sintheta * math.sin(center_angle)
dtheta = math.atan2(ds,dc)
# final value, multiply the radial component by the angular one
return math.exp(-0.5 * ((raw-center_scale) / scale_bandwidth)**2) * \
math.exp(-0.5 * (dtheta/angle_bandwidth)**2)
Problems:
The angle: the paper stated that indexing the angles from 1->8 would produce good coverage of the orientation, but in my implementation angles from 1->n don't cover except for half orientations. Even the vertical orientation is not correctly covered. This can be shown in this figure which contains sets of filters of scale 3 and orientations ranging from 1->8:
The coverage: from filters above it is clear the filter covers both sides of the space which is not what the paper says. This can be made more explicit by using 9 orientations ranging from -4 -> 4. The following image contains all the filters in one image to show how it covers both sides of the spectrum (this image is created by taking the maximum at each location from all filters):
Middle Column (orientation $\pi / 2$): in the first figure in orientation from 3 -> 8 it can be seen that the filter vanishes at orientation $ \pi / 2$. Is this normal? This can be seen too when I combine all the filters(of all 5 scales and 9 orientations) in one image:
Update:
Adding the impulse response of the filter in spatial domain, as you can see there is an obvious distortion in -4 & 4 orientations:
After a lot of code analysis, I found that my implementation was correct but the getPolar function was messed up, so the code above should work just fine. This is the a new code without the getPolar function if any one was looking for it:
number_scales = 5 # scale resolution
number_orientations = 8 # orientation resolution
N = 128 # image dimensions
def getFilter(f_0, theta_0):
# filter configuration
scale_bandwidth = 0.996 * math.sqrt(2/3)
angle_bandwidth = 0.996 * (1/math.sqrt(2)) * (np.pi/number_orientations)
# x,y grid
extent = np.arange(-N/2, N/2 + N%2)
x, y = np.meshgrid(extent,extent)
mid = int(N/2)
## orientation component ##
theta = np.arctan2(y,x)
center_angle = ((np.pi/number_orientations) * theta_0) if (f_0 % 2) \
else ((np.pi/number_orientations) * (theta_0+0.5))
# calculate (theta-center_theta), we calculate cos(theta-center_theta)
# and sin(theta-center_theta) then use atan to get the required value,
# this way we can eliminate the angular distance wrap around problem
costheta = np.cos(theta)
sintheta = np.sin(theta)
ds = sintheta * math.cos(center_angle) - costheta * math.sin(center_angle)
dc = costheta * math.cos(center_angle) + sintheta * math.sin(center_angle)
dtheta = np.arctan2(ds,dc)
orientation_component = np.exp(-0.5 * (dtheta/angle_bandwidth)**2)
## frequency componenet ##
# go to polar space
raw = np.sqrt(x**2+y**2)
# set origin to 1 as in the log space zero is not defined
raw[mid,mid] = 1
# go to log space
raw = np.log2(raw)
center_scale = math.log2(N) - f_0
draw = raw-center_scale
frequency_component = np.exp(-0.5 * (draw/ scale_bandwidth)**2)
# reset origin to zero (not needed as it is already 0?)
frequency_component[mid,mid] = 0
return frequency_component * orientation_component
I like to prototype algorithms in Matlab, but I have the requirement of putting them on a server that also runs quite a bit of Python code. Hence I quickly converted the code to Python and compared the two. The Matlab implementation runs ~1000 times faster (from timing function calls - no profiling). Anyone know off hand why the performance of Python is so slow?
Matlab
% init random data
w = 800;
h = 1200;
hmap = zeros(w,h);
npts = 250;
for i=1:npts
hmap(randi(w),randi(h)) = hmap(randi(w),randi(h))+1;
end
% Params
disksize = 251;
nBreaks = 25;
saturation = .9;
floorthresh =.05;
fh = fspecial('gaussian', disksize, disksize/7);
hmap = conv2(hmap, fh, 'same');
% Scaling, paritioning etc
hmap = hmap/(max(max(hmap)));
hmap(hmap<floorthresh) = 0;
hmap = round(nBreaks * hmap)/nBreaks;
hmap = hmap * (1/saturation);
% Show the image
imshow(hmap, [0,1])
colormap('jet')
Python
import numpy as np
from scipy.signal import convolve2d as conv2
# Test data parameters
w = 800
h = 1200
npts = 250
# generate data
xvals = np.random.randint(w, size=npts)
yvals = np.random.randint(h, size=npts)
# Heatmap parameters
gaussianSize = 250
nbreaks = 25
# Preliminary function definitions
def populateMat(w, h, xvals, yvals):
container = np.zeros((w,h))
for idx in range(0,xvals.size):
x = xvals[idx]
y = yvals[idx]
container[x,y] += 1
return container
def makeGaussian(size, fwhm):
x = np.arange(0, size, 1, float)
y = x[:,np.newaxis]
x0 = y0 = size // 2
return np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / fwhm**2)
# Create the data matrix
dmat = populateMat(w,h,xvals,yvals)
h = makeGaussian(gaussianSize, fwhm=gaussianSize/2)
# Convolve
dmat2 = conv2(dmat, h, mode='same')
# Scaling etc
dmat2 = dmat2 / dmat2.max()
dmat2 = np.round(nbreaks*dmat2)/nbreaks
# Show
imshow(dmat2)
Ok, problem solved for me thanks to suggestion from #Yves Daust's comments;
The filter scipy.ndimage.filters.gaussian_filter utilises the separability of the kernel and reduces the running time to within a single order of magnitude of the matlab implementation.
import numpy as np
from scipy.ndimage.filters import gaussian_filter as gaussian
# Test data parameters
w = 800
h = 1200
npts = 250
# generate data
xvals = np.random.randint(w, size=npts)
yvals = np.random.randint(h, size=npts)
# Heatmap parameters
gaussianSize = 250
nbreaks = 25
# Preliminary function definitions
def populateMat(w, h, xvals, yvals):
container = np.zeros((w,h))
for idx in range(0,xvals.size):
x = xvals[idx]
y = yvals[idx]
container[x,y] += 1
return container
# Create the data matrix
dmat = populateMat(w,h,xvals,yvals)
# Convolve
dmat2 = gaussian(dmat, gaussianSize/7)
# Scaling etc
dmat2 = dmat2 / dmat2.max()
dmat2 = np.round(nbreaks*dmat2)/nbreaks
# Show
imshow(dmat2)
Main Problem: How can the scipy.signal.cwt() function be inversed.
I have seen where Matlab has an inverse continuous wavelet transform function which will return the original form of the data by inputting the wavelet transform, although you can filter out the slices you don't want.
MATALAB inverse cwt funciton
Since scipy doesn't appear to have the same function, I have been trying to figure out how to get the data back in the same form, while removing the noise and background.
How do I do this?
I tried squaring it to remove negative values, but this gives me values way to large and not quite right.
Here is what I have been trying:
# Compute the wavelet transform
widths = range(1,11)
cwtmatr = signal.cwt(xy['y'], signal.ricker, widths)
# Maybe we multiple by the original data? and square?
WT_to_original_data = (xy['y'] * cwtmatr)**2
And here is a fully compilable short script to show you the type of data I am trying to get and what I have etc.:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
# Make some random data with peaks and noise
def make_peaks(x):
bkg_peaks = np.array(np.zeros(len(x)))
desired_peaks = np.array(np.zeros(len(x)))
# Make peaks which contain the data desired
# (Mid range/frequency peaks)
for i in range(0,10):
center = x[-1] * np.random.random() - x[0]
amp = 60 * np.random.random() + 10
width = 10 * np.random.random() + 5
desired_peaks += amp * np.e**(-(x-center)**2/(2*width**2))
# Also make background peaks (not desired)
for i in range(0,3):
center = x[-1] * np.random.random() - x[0]
amp = 40 * np.random.random() + 10
width = 100 * np.random.random() + 100
bkg_peaks += amp * np.e**(-(x-center)**2/(2*width**2))
return bkg_peaks, desired_peaks
x = np.array(range(0, 1000))
bkg_peaks, desired_peaks = make_peaks(x)
y_noise = np.random.normal(loc=30, scale=10, size=len(x))
y = bkg_peaks + desired_peaks + y_noise
xy = np.array( zip(x,y), dtype=[('x',float), ('y',float)])
# Compute the wavelet transform
# I can't figure out what the width is or does?
widths = range(1,11)
# Ricker is 2nd derivative of Gaussian
# (*close* to what *most* of the features are in my data)
# (They're actually Lorentzians and Breit-Wigner-Fano lines)
cwtmatr = signal.cwt(xy['y'], signal.ricker, widths)
# Maybe we multiple by the original data? and square?
WT = (xy['y'] * cwtmatr)**2
# plot the data and results
fig = plt.figure()
ax_raw_data = fig.add_subplot(4,3,1)
ax = {}
for i in range(0, 11):
ax[i] = fig.add_subplot(4,3, i+2)
ax_desired_transformed_data = fig.add_subplot(4,3,12)
ax_raw_data.plot(xy['x'], xy['y'], 'g-')
for i in range(0,10):
ax[i].plot(xy['x'], WT[i])
ax_desired_transformed_data.plot(xy['x'], desired_peaks, 'k-')
fig.tight_layout()
plt.show()
This script will output this image:
Where the first plot is the raw data, the middle plots are the wavelet transforms and the last plot is what I want to get out as the processed (background and noise removed) data.
Does anyone have any suggestions? Thank you so much for the help.
I ended up finding a package which provides an inverse wavelet transform function called mlpy. The function is mlpy.wavelet.uwt. This is the compilable script I ended up with which may interest people if they are trying to do noise or background removal:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
import mlpy.wavelet as wave
# Make some random data with peaks and noise
############################################################
def gen_data():
def make_peaks(x):
bkg_peaks = np.array(np.zeros(len(x)))
desired_peaks = np.array(np.zeros(len(x)))
# Make peaks which contain the data desired
# (Mid range/frequency peaks)
for i in range(0,10):
center = x[-1] * np.random.random() - x[0]
amp = 100 * np.random.random() + 10
width = 10 * np.random.random() + 5
desired_peaks += amp * np.e**(-(x-center)**2/(2*width**2))
# Also make background peaks (not desired)
for i in range(0,3):
center = x[-1] * np.random.random() - x[0]
amp = 80 * np.random.random() + 10
width = 100 * np.random.random() + 100
bkg_peaks += amp * np.e**(-(x-center)**2/(2*width**2))
return bkg_peaks, desired_peaks
# make x axis
x = np.array(range(0, 1000))
bkg_peaks, desired_peaks = make_peaks(x)
avg_noise_level = 30
std_dev_noise = 10
size = len(x)
scattering_noise_amp = 100
scat_center = 100
scat_width = 15
scat_std_dev_noise = 100
y_scattering_noise = np.random.normal(scattering_noise_amp, scat_std_dev_noise, size) * np.e**(-(x-scat_center)**2/(2*scat_width**2))
y_noise = np.random.normal(avg_noise_level, std_dev_noise, size) + y_scattering_noise
y = bkg_peaks + desired_peaks + y_noise
xy = np.array( zip(x,y), dtype=[('x',float), ('y',float)])
return xy
# Random data Generated
#############################################################
xy = gen_data()
# Make 2**n amount of data
new_y, bool_y = wave.pad(xy['y'])
orig_mask = np.where(bool_y==True)
# wavelet transform parameters
levels = 8
wf = 'h'
k = 2
# Remove Noise first
# Wave transform
wt = wave.uwt(new_y, wf, k, levels)
# Matrix of the difference between each wavelet level and the original data
diff_array = np.array([(wave.iuwt(wt[i:i+1], wf, k)-new_y) for i in range(len(wt))])
# Index of the level which is most similar to original data (to obtain smoothed data)
indx = np.argmin(np.sum(diff_array**2, axis=1))
# Use the wavelet levels around this region
noise_wt = wt[indx:indx+1]
# smoothed data in 2^n length
new_y = wave.iuwt(noise_wt, wf, k)
# Background Removal
error = 10000
errdiff = 100
i = -1
iter_y_dict = {0:np.copy(new_y)}
bkg_approx_dict = {0:np.array([])}
while abs(errdiff)>=1*10**-24:
i += 1
# Wave transform
wt = wave.uwt(iter_y_dict[i], wf, k, levels)
# Assume last slice is lowest frequency (background approximation)
bkg_wt = wt[-3:-1]
bkg_approx_dict[i] = wave.iuwt(bkg_wt, wf, k)
# Get the error
errdiff = error - sum(iter_y_dict[i] - bkg_approx_dict[i])**2
error = sum(iter_y_dict[i] - bkg_approx_dict[i])**2
# Make every peak higher than bkg_wt
diff = (new_y - bkg_approx_dict[i])
peak_idxs_to_remove = np.where(diff>0.)[0]
iter_y_dict[i+1] = np.copy(new_y)
iter_y_dict[i+1][peak_idxs_to_remove] = np.copy(bkg_approx_dict[i])[peak_idxs_to_remove]
# new data without noise and background
new_y = new_y[orig_mask]
bkg_approx = bkg_approx_dict[len(bkg_approx_dict.keys())-1][orig_mask]
new_data = diff[orig_mask]
##############################################################
# plot the data and results
fig = plt.figure()
ax_raw_data = fig.add_subplot(121)
ax_WT = fig.add_subplot(122)
ax_raw_data.plot(xy['x'], xy['y'], 'g')
for bkg in bkg_approx_dict.values():
ax_raw_data.plot(xy['x'], bkg[orig_mask], 'k')
ax_WT.plot(xy['x'], new_data, 'y')
fig.tight_layout()
plt.show()
And here is the output I am getting now:
As you can see, there is still a problem with the background removal (it shifts to the right after each iteration), but it is a different question which I will address here.