I am trying to rotate an image clockwise by 45 degrees and then translate it by (-50, -50).
The rotation works fine (I referred to this page: How do I rotate an image manually without using cv2.getRotationMatrix2D):
import numpy as np
import math
from scipy import ndimage
from PIL import Image

# inputs
img = ndimage.imread("A.png")
rotation_amount_degree = 45

# convert rotation amount to radians
rotation_amount_rad = rotation_amount_degree * np.pi / 180.0

# get dimension info
height, width, num_channels = img.shape

# create output image, for worst case size (45 degrees)
max_len = int(math.sqrt(height*height + width*width))
rotated_image = np.zeros((max_len, max_len, num_channels))
#rotated_image = np.zeros((img.shape))
rotated_height, rotated_width, _ = rotated_image.shape

mid_row = int( (rotated_height+1)/2 )
mid_col = int( (rotated_width+1)/2 )

# for each pixel in the output image, find which pixel
# it corresponds to in the input image
for r in range(rotated_height):
    for c in range(rotated_width):
        # apply rotation matrix, the other way
        y = (r-mid_col)*math.cos(rotation_amount_rad) + (c-mid_row)*math.sin(rotation_amount_rad)
        x = -(r-mid_col)*math.sin(rotation_amount_rad) + (c-mid_row)*math.cos(rotation_amount_rad)

        # add offset
        y += mid_col
        x += mid_row

        # get nearest index
        # a better way is linear interpolation
        x = round(x)
        y = round(y)

        #print(r, " ", c, " corresponds to-> " , y, " ", x)

        # check if x/y corresponds to a valid pixel in the input image
        if (x >= 0 and y >= 0 and x < width and y < height):
            rotated_image[r][c][:] = img[y][x][:]

# save output image
output_image = Image.fromarray(rotated_image.astype("uint8"))
output_image.save("rotated_image.png")
However, when I try to translate the image, I edit the above code to this:

if (x >= 0 and y >= 0 and x < width and y < height):
    rotated_image[r-50][c-50][:] = img[y][x][:]
But I got something like this:
It seems the right and bottom edges do not show the right pixels. How can I solve this?
Any suggestions would be highly appreciated.
The translation needs to be handled as a wholly separate step. Translating while copying from the source image does not account for the (0, 0, 0)-valued (if RGB) pixels that the rotation newly creates.
Further, simply subtracting 50 from the rotated array indices, without validating at that stage that they remain non-negative, allows negative-valued indices, which Python fully supports (they count from the end of the array). That is why you are getting a "wrap" effect instead of a translation.
You said your script rotated the image as intended, so while perhaps not the most efficient approach, the most intuitive one is to simply shift the values of the image assembled after you rotate. You could test that the indices for the new image remain valid after subtracting 50 and only keep the ones >= 0, or be cognizant of the fact that by shifting the values by 50, anything within 50 pixels of the border is discarded.
Starting from the rotated_image you said was produced correctly:
translated_image = np.zeros((max_len, max_len, num_channels))
for i in range(0, rotated_height-50):    # range(start, stop[, step])
    for j in range(0, rotated_width-50):
        translated_image[i+50][j+50][:] = rotated_image[i][j][:]

# save output image
output_image = Image.fromarray(translated_image.astype("uint8"))
output_image.save("rotated_translated_image.png")
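For reference, the translation can also be folded directly into the inverse mapping, so no second pass over the image is needed: to shift the output by (-50, -50), sample the rotation at the destination coordinates offset by +50. A minimal sketch reusing the variables from the rotation code above (my rearrangement of the original loop, not part of the original answer):

for r in range(rotated_height):
    for c in range(rotated_width):
        # destination coordinates shifted by +50, which moves
        # the visible content by (-50, -50)
        rs, cs = r + 50, c + 50
        # apply rotation matrix, the other way
        y = (rs-mid_col)*math.cos(rotation_amount_rad) + (cs-mid_row)*math.sin(rotation_amount_rad)
        x = -(rs-mid_col)*math.sin(rotation_amount_rad) + (cs-mid_row)*math.cos(rotation_amount_rad)
        y = round(y + mid_col)
        x = round(x + mid_row)
        # the same bounds check also discards out-of-range shifted pixels
        if (x >= 0 and y >= 0 and x < width and y < height):
            rotated_image[r][c][:] = img[y][x][:]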
I am trying to write code for the orbit of the Earth in SI units using a symplectic integrator. My attempt is as follows:
import numpy as np
import matplotlib.pyplot as plt

# Set parameters
G = 6.67348e-11
mEar = 5.972e24
mSun = 1.989e30

def earth_orbit(x0, y0, vx0, vy0, N):
    dt = 1/N                      # timestep
    pos_arr = np.zeros((N,2))     # empty array to store positions
    vel_arr = np.zeros((N,2))     # empty array to store velocities

    # Initial conditions
    # x0 = x
    # y0 = y
    # vx0 = vx
    # vy0 = vy
    pos_arr[0] = (x0,y0)          # set the initial positions in the array
    vel_arr[0] = (vx0,vy0)        # set the initial velocities in the array

    # Implement Verlet algorithm
    for k in range(N-1):
        pos_arr[k+1] = pos_arr[k] + vel_arr[k]*dt   # update positions
        force = -G * mSun * mEar * pos_arr[k+1] / (np.linalg.norm(pos_arr[k+1])**3)   # force calculation
        vel_arr[k+1] = vel_arr[k] + (force/mEar) * dt   # update velocities

    # Plot:
    plt.plot(pos_arr, 'go', markersize = 1, label = 'Earth trajectory')
    # plt.plot(0,0,'yo', label = 'Sun position')                  # yellow marker
    # plt.plot(pos_arr[0],'bo', label = 'Earth initial position') # dark blue marker
    plt.axis('equal')
    plt.xlabel('x')
    plt.ylabel('y')

    return pos_arr, vel_arr

earth_orbit(149.59787e9, 0, 0, 29800, 1000)
The output is just two dots, and I can't figure out whether this is a unit issue or a calculation issue.
Display the trajectory
pos_arr contains the x and y coordinates in its columns. To display the whole trajectory, plt.plot(pos_arr[:,0], pos_arr[:,1]) can thus be used. I would prefer to use plt.plot(*pos_arr.T) as a shorter alternative. The line that displays the trajectory must be replaced by:
plt.plot(*pos_arr.T, 'g', label = 'Earth trajectory')
Change the timestep
Here the timestep (in seconds) is chosen as 1/N, where N is the number of iterations. So the total duration of the simulation is timestep * N = 1 second! For N = 1000, you can instead try timestep = 3600*12 (half a day), so that the total duration is a little less than 1.5 years. I suggest passing the duration as a parameter of the function earth_orbit and then setting timestep as duration / N.
def earth_orbit(x0, y0, vx0, vy0, N=1000, duration=3.15e7):
    dt = duration / N
    ...
As said in the comments, this is not the Verlet algorithm but the symplectic Euler algorithm. The difference is in the initialization; when comparing against a more exact reference solution and across several step sizes, the difference in the orders, 2 vs. 1, will be quite visible.
A short change to the time loop ensuring that the velocities are at the half-time steps as required for Leapfrog Verlet could look like this:
def force(pos):
    return -G * mSun * mEar * pos / (np.linalg.norm(pos)**3)   # force calculation

pos_arr[0] = (x0,y0)     # set the initial positions in the array
vel_arr[0] = (vx0,vy0)   # set the initial velocities in the array
vel_arr[0] += (force(pos_arr[0])/mEar) * (0.5*dt)   # correct to the velocity at the half-time step

# Implement Verlet algorithm
for k in range(N-1):
    pos_arr[k+1] = pos_arr[k] + vel_arr[k] * dt                   # update positions
    vel_arr[k+1] = vel_arr[k] + (force(pos_arr[k+1])/mEar) * dt   # update velocities
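Putting both corrections together, a self-contained sketch (the duration parameter and the half-step velocity initialization are taken from the answers above; the remaining glue is mine):

import numpy as np
import matplotlib.pyplot as plt

G = 6.67348e-11
mEar = 5.972e24
mSun = 1.989e30

def force(pos):
    # gravitational pull of the Sun on the Earth at position pos
    return -G * mSun * mEar * pos / np.linalg.norm(pos)**3

def earth_orbit(x0, y0, vx0, vy0, N=1000, duration=3.15e7):
    dt = duration / N
    pos_arr = np.zeros((N, 2))
    vel_arr = np.zeros((N, 2))
    pos_arr[0] = (x0, y0)
    vel_arr[0] = (vx0, vy0)
    # shift the initial velocity to the half-time step (leapfrog)
    vel_arr[0] += (force(pos_arr[0]) / mEar) * (0.5 * dt)
    for k in range(N - 1):
        pos_arr[k+1] = pos_arr[k] + vel_arr[k] * dt
        vel_arr[k+1] = vel_arr[k] + (force(pos_arr[k+1]) / mEar) * dt
    plt.plot(*pos_arr.T, 'g', label='Earth trajectory')
    plt.plot(0, 0, 'yo', label='Sun position')
    plt.axis('equal')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()
    plt.show()
    return pos_arr, vel_arr

earth_orbit(149.59787e9, 0, 0, 29800)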
The goal here is to construct the one-particle distribution function of a system evolving under Brownian dynamics (at each step one draws a random number from a Gaussian distribution). To construct this quantity, I am thinking of running several simulations, saving the distances of each particle from the center of the 2D square at specific times in each simulation, and only at the end creating a histogram of all the values.
My problem is that in each simulation time begins from zero and advances with a certain time step, at each of which the particles move randomly, so the distances to be saved have to be labeled correctly with their corresponding times.
So my thought was to create an array that has 5 sub-arrays in each row, one for each time at which I want to save the distances of the particles from the center of the square. I am trying to make this work with numpy, but without success: for each simulation, at the specific times, I create an array with all the distances and try to append it to the corresponding sub-array with numpy.append, but this does not work correctly. As I understand it, the problem is that I don't know how to index the sub-arrays properly (and across all the simulations).
Beyond that, I think the approach is not the best: either I will have to abandon numpy and figure out how to index the array properly with two indices, or figure out how to use numpy more effectively.
So, to the point: the general question is how I can add/append values to specific sub-arrays of an array (whether pre-constructed with numpy or not, and treated as a list).
Alternatively, it would be really helpful if someone could suggest a more efficient way of creating the one-particle distribution function for a Brownian motion problem.
I am adding the relevant code below. Thank you all in advance.
Code:
import random
import math
import matplotlib.pyplot as plt
import numpy as np

# def dump(particles, step, n):
#     fileoutput = open('coord.txt', 'a')
#     fileoutput.write("ITEM: TIMESTEP \n")
#     fileoutput.write("%i \n" % step)
#     fileoutput.write("ITEM: NUMBER OF ATOMS \n")
#     fileoutput.write("%i \n" % n)
#     fileoutput.write("ITEM: BOX BOUNDS \n")
#     fileoutput.write("%e %e xlo xhi \n" % (0.0, 100))
#     fileoutput.write("%e %e xlo xhi \n" % (0.0, 100))
#     fileoutput.write("%e %e xlo xhi \n" % (-0.25, 0.25))
#     fileoutput.write("ITEM: ATOMS id type x y z \n")
#     i = 0
#     while i < n:
#         x = particles[i][0]
#         y = particles[i][1]
#         #fileoutput.write("%i %i %f %f %f \n" % (i, 1, x*1e10, y*1e10, z*1e10))
#         fileoutput.write("%i %i %f %f %f \n" % (i, 1, x, y, 0))
#         i += 1
#     fileoutput.close()

num_sims = 2
N = 49
L = 10
meanz = 0
varz = 1
sigma = 1
# tau = sigma**2*ksi/(kT)

# Starting time
t_0 = 0
# Time increments
dt = 10**(-4)   # dt/tau
# Ending time
T = 10**2       # T/tau

# Produce random particles and avoid overlap:
particles = np.full((N, 2), L/2)
times = np.arange(t_0, T, dt)
check = 0

distances = np.empty([50*num_sims, 5])
for sim in range(0, num_sims):
    step = 0
    t_index = 0
    for t in times:
        r = []
        for i in range(0, N):
            z = np.random.normal(meanz, varz)
            particles[i][0] = particles[i][0] + ((2*dt*sigma**2)**(1/2))*z
            z = random.gauss(meanz, varz)
            particles[i][1] = particles[i][1] + ((2*dt*sigma**2)**(1/2))*z
        if (t % (2*(10**5)*dt) == 0):
            for j in range(0, N):
                rj = ((particles[j][0]-L/2)**2 + (particles[j][1]-L/2)**2)**(1/2)
                r.append(rj)
            distances[t_index] = np.append(distances[t_index], r)
            t_index += 1
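For what it's worth, one way to do the bookkeeping described in the question is to keep a plain list per save time and extend it with each simulation's distances, converting to an array only for the final histogram. A minimal, self-contained sketch of that idea (with smaller step counts than the question's, for a quick run; not the original code):

import numpy as np
import matplotlib.pyplot as plt

num_sims, N, L = 2, 49, 10.0
dt = 1e-4
save_every = 1000        # the question saves every 2*10**5 steps instead
n_save_times = 5

# one growing list per save time, shared across all simulations
distances = [[] for _ in range(n_save_times)]

for sim in range(num_sims):
    particles = np.full((N, 2), L/2)
    t_index = 0
    for step in range(save_every * n_save_times):
        # Brownian step for all particles at once
        particles += np.sqrt(2*dt) * np.random.normal(0, 1, size=(N, 2))
        if step % save_every == 0:
            r = np.hypot(particles[:, 0] - L/2, particles[:, 1] - L/2)
            distances[t_index].extend(r)   # append this run's N distances
            t_index += 1

# e.g. histogram of all distances recorded at the first save time
plt.hist(np.asarray(distances[0]), bins=50)
plt.show()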
I have an application that requires a disk populated with 'n' points in a quasi-random fashion. I want the points to be somewhat random, but still have a more or less regular density over the disk.
My current method is to place a point, check if it's inside the disk, and then check if it is also far enough away from all other points already kept. My code is below:
import os
import random
import math

# ------------------------------------------------ #
# geometric constants
center_x = -1188.2
center_y = -576.9
center_z = -3638.3

disk_distance = 2.0*5465.6
disk_diam = 5465.6
# ------------------------------------------------ #

pts_per_disk = 256
closeness_criteria = 200.0
min_closeness_criteria = disk_diam/closeness_criteria

disk_center = [(center_x-disk_distance), center_y, center_z]
pts_in_disk = []
while len(pts_in_disk) < (pts_per_disk):
    potential_pt_x = disk_center[0]
    potential_pt_dy = random.uniform(-disk_diam/2.0, disk_diam/2.0)
    potential_pt_y = disk_center[1]+potential_pt_dy
    potential_pt_dz = random.uniform(-disk_diam/2.0, disk_diam/2.0)
    potential_pt_z = disk_center[2]+potential_pt_dz

    potential_pt_rad = math.sqrt((potential_pt_dy)**2+(potential_pt_dz)**2)
    if potential_pt_rad < (disk_diam/2.0):
        far_enough_away = True
        for pt in pts_in_disk:
            if math.sqrt((potential_pt_x - pt[0])**2+(potential_pt_y - pt[1])**2+(potential_pt_z - pt[2])**2) > min_closeness_criteria:
                pass
            else:
                far_enough_away = False
                break
        if far_enough_away:
            pts_in_disk.append([potential_pt_x, potential_pt_y, potential_pt_z])

outfile_name = "pt_locs_x_lo_"+str(pts_per_disk)+"_pts.txt"
outfile = open(outfile_name, 'w')
for pt in pts_in_disk:
    outfile.write(" ".join([("%.5f" % (pt[0]/1000.0)), ("%.5f" % (pt[1]/1000.0)), ("%.5f" % (pt[2]/1000.0))])+'\n')
outfile.close()
To get the most even point density, what I do is basically run this script iteratively from another script, reducing the 'closeness' criterion for each successive iteration. At some point, the script cannot finish, and I just use the points of the last successful iteration.
So my question is rather broad: is there a better way to do this? My method is ok for now, but my gut says that there is a better way to generate such a field of points.
An illustration of the output is graphed below, one with a high closeness criteria, and another with a 'lowest found' closeness criteria (what I want).
A simple solution based on Disk Point Picking from MathWorld (taking the square root of the uniformly drawn radius compensates for the disk's area growing quadratically with radius, which keeps the point density uniform):
import numpy as np
import matplotlib.pyplot as plt
n = 1000
r = np.random.uniform(low=0, high=1, size=n) # radius
theta = np.random.uniform(low=0, high=2*np.pi, size=n) # angle
x = np.sqrt(r) * np.cos(theta)
y = np.sqrt(r) * np.sin(theta)
# for plotting circle line:
a = np.linspace(0, 2*np.pi, 500)
cx,cy = np.cos(a), np.sin(a)
fg, ax = plt.subplots(1, 1)
ax.plot(cx, cy,'-', alpha=.5) # draw unit circle line
ax.plot(x, y, '.') # plot random points
ax.axis('equal')
ax.grid(True)
fg.canvas.draw()
plt.show()
It gives:
Alternatively, you also could create a regular grid and distort it randomly:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri
n = 20
tt = np.linspace(-1, 1, n)
xx, yy = np.meshgrid(tt, tt) # create unit square grid
s_x, s_y = xx.ravel(), yy.ravel()
ii = np.argwhere(s_x**2 + s_y**2 <= 1).ravel() # mask off unwanted points
x, y = s_x[ii], s_y[ii]
triang = tri.Triangulation(x, y) # create triangular grid
# distort the grid
g = .5 # distortion factor
rx = x + np.random.uniform(low=-g/n, high=g/n, size=x.shape)
ry = y + np.random.uniform(low=-g/n, high=g/n, size=y.shape)
rtri = tri.Triangulation(rx, ry, triang.triangles) # distorted grid
# for circle:
a = np.linspace(0, 2*np.pi, 500)
cx,cy = np.cos(a), np.sin(a)
fg, ax = plt.subplots(1, 1)
ax.plot(cx, cy,'k-', alpha=.2) # circle line
ax.triplot(triang, "g-", alpha=.4)
ax.triplot(rtri, 'b-', alpha=.5)
ax.axis('equal')
ax.grid(True)
fg.canvas.draw()
plt.show()
It gives:
The triangles are just there for visualization. The obvious disadvantage is that, depending on your choice of grid, there will be more or less large "holes" due to the grid discretization, either in the middle or at the borders (as shown here).
If you have a defined area like a disk (circle) within which you wish to generate random points, you are better off using the equation of a circle and limiting on the radius:
x^2 + y^2 = r^2 (0 < r < R)
or parametrized to two variables
cos(a) = x/r
sin(a) = y/r
sin^2(a) + cos^2(a) = 1
To generate something like the pseudo-random distribution with low density you should take the following approach:
For randomly distributed ranges of r and a choose n points.
This allows you to generate your distribution to roughly meet your density criteria.
To understand why this works, imagine your circle first divided into small rings of thickness dr, and then imagine it divided into pie slices of angle da. Your randomness now has equal probability over the whole boxed area around the circle. If you divide the areas of allowed randomness throughout your circle, you will get a more even distribution around the overall circle, with small random variation within the individual areas giving you the pseudo-random look and feel you are after.
Now your job is just to generate n points for each given area. You will want n to be dependent on r, as the area of each division changes as you move outward in the circle. You can proportion this to the exact change in area each division brings:
From the (n-1)-th to the n-th ring:

d(Area, n, n-1) = Area(n) - Area(n-1)

The area of any given ring is:

Area(n) = pi*(dr*n)^2 - pi*(dr*(n-1))^2

So the difference becomes:

d(Area, n, n-1) = [pi*(dr*n)^2 - pi*(dr*(n-1))^2] - [pi*(dr*(n-1))^2 - pi*(dr*(n-2))^2]
                = pi*[(dr*n)^2 - 2*(dr*(n-1))^2 + (dr*(n-2))^2]
You could expand this to gain some insight into how much n should increase, but it may be faster to just guess at some percentage increase (30%) or something.
The example I have provided covers a small subset of parameters; decreasing da and dr will dramatically improve your results.
Here is some rough code for generating such points:
import random
import math

R = 10.
n_rings = 10
n_angles = 10
dr = R/n_rings
da = 2*math.pi/n_angles
base_points_per_division = 3
increase_per_level = 1.1

points = []
ring = 0
while ring < n_rings:
    angle = 0
    while angle < n_angles:
        for i in range(int(base_points_per_division)):
            ra = angle*da + da*random.random()
            rr = ring*dr + dr*random.random()
            x = rr*math.cos(ra)
            y = rr*math.sin(ra)
            points.append((x, y))
        angle += 1
    base_points_per_division = base_points_per_division*increase_per_level
    ring += 1
I tested it with the parameters:
n_rings = 20
n_angles = 20
base_points = .9
increase_per_level = 1.1
And got the following results:
It looks more dense than your provided image, but I imagine further tweaking of those variables could be beneficial.
You can add an additional part to scale the density properly by calculating the number of points per ring.
points_per_ring = density*math.pi*(dr**2)*(2*n+1)
points_per_division = points_per_ring/n_angles
This will provide an even better scaled distribution:
density = .03
points = []
ring = 0
while ring < n_rings:
    angle = 0
    base_points_per_division = density*math.pi*(dr**2)*(2*ring+1)/n_angles
    while angle < n_angles:
        for i in range(int(base_points_per_division)):
            ra = angle*da + min(da, da*random.random())
            rr = ring*dr + dr*random.random()
            x = rr*math.cos(ra)
            y = rr*math.sin(ra)
            points.append((x, y))
        angle += 1
    ring += 1
Giving better results using the following parameters
R = 1.
n_rings = 10.
n_angles = 10.
density = 10/(dr*da) # ~ ten points per unit area
With a graph...
and for fun you can graph the divisions to see how well they match your distribution, and adjust.
Depending on how random the points need to be, it may be simple enough to just make a grid of points within the disk and then displace each point by some small but random amount, as in the sketch below.
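A minimal sketch of that grid-plus-jitter idea (the parameter choices are mine):

import numpy as np
import matplotlib.pyplot as plt

n = 20                                  # grid resolution
t = np.linspace(-1, 1, n)
xx, yy = np.meshgrid(t, t)
x, y = xx.ravel(), yy.ravel()
inside = x**2 + y**2 <= 1               # keep only points inside the unit disk
x, y = x[inside], y[inside]

h = t[1] - t[0]                         # grid spacing
# displace each point by a small random amount (up to half a cell)
x = x + np.random.uniform(-h/2, h/2, x.shape)
y = y + np.random.uniform(-h/2, h/2, y.shape)

plt.plot(x, y, '.')
plt.axis('equal')
plt.show()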
It may be that you want more randomness, but if you just want to fill your disc with an even-looking distribution of points that aren't on an obvious grid, you could try a spiral with a random phase.
import math
import random
import pylab

n = 300
alpha = math.pi * (3 - math.sqrt(5))   # the "golden angle"
phase = random.random() * 2 * math.pi
points = []
for k in range(n):
    theta = k * alpha + phase
    r = math.sqrt(float(k)/n)
    points.append((r * math.cos(theta), r * math.sin(theta)))
pylab.scatter(*zip(*points))
pylab.show()
Probability theory ensures that the rejection method is an appropriate method to generate uniformly distributed points within the disk D(0, r), centered at the origin and of radius r. Namely, one generates points within the square [-r, r] x [-r, r] until a point falls within the disk:

do {
    generate P in [-r,r]x[-r,r];
} while (P[0]**2 + P[1]**2 > r**2);
return P;
unif_rnd_disk is a generator function implementing this rejection method:
import matplotlib.pyplot as plt
import numpy as np
import itertools

def unif_rnd_disk(r=1.0):
    pt = np.zeros(2)
    while True:
        yield pt
        while True:
            pt = -r + 2*r*np.random.random(2)
            if pt[0]**2 + pt[1]**2 <= r**2:
                break

G = unif_rnd_disk()   # generator of points in the disk D(0, r=1)
X, Y = zip(*[pt for pt in itertools.islice(G, 1, 1000)])

plt.scatter(X, Y, color='r', s=3)
plt.axis('equal')
If we want to generate points in a disk centered at C(a,b), we have to apply a translation to the points in the disk D(0,r):
C=[2.0, -3.5]
plt.scatter(C[0]+np.array(X), C[1]+np.array(Y), color='r', s=3)
plt.axis('equal')
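For large point counts, the same rejection idea vectorizes well; a sketch of a batched variant (my addition, equivalent in distribution to the generator above):

import numpy as np

def unif_rnd_disk_batch(n, r=1.0, a=0.0, b=0.0):
    """Return n uniform points in the disk of radius r centered at (a, b)."""
    pts = np.empty((0, 2))
    while len(pts) < n:
        cand = np.random.uniform(-r, r, size=(2*n, 2))   # batch of candidates
        cand = cand[np.sum(cand**2, axis=1) <= r**2]     # rejection step
        pts = np.vstack([pts, cand])
    return pts[:n] + np.array([a, b])

pts = unif_rnd_disk_batch(1000, r=1.0, a=2.0, b=-3.5)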
Main problem: how can the scipy.signal.cwt() function be inverted?
I have seen that Matlab has an inverse continuous wavelet transform function which will return the original form of the data from the wavelet transform, with the option of filtering out the slices you don't want.
MATLAB inverse cwt function
Since scipy doesn't appear to have the same function, I have been trying to figure out how to get the data back in the same form, while removing the noise and background.
How do I do this?
I tried squaring it to remove negative values, but this gives me values that are way too large and not quite right.
Here is what I have been trying:
# Compute the wavelet transform
widths = range(1, 11)
cwtmatr = signal.cwt(xy['y'], signal.ricker, widths)

# Maybe we multiply by the original data and square?
WT_to_original_data = (xy['y'] * cwtmatr)**2
And here is a fully runnable short script to show you the type of data I am trying to get, what I have, etc.:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

# Make some random data with peaks and noise
def make_peaks(x):
    bkg_peaks = np.zeros(len(x))
    desired_peaks = np.zeros(len(x))
    # Make peaks which contain the data desired
    # (mid range/frequency peaks)
    for i in range(0, 10):
        center = x[-1] * np.random.random() - x[0]
        amp = 60 * np.random.random() + 10
        width = 10 * np.random.random() + 5
        desired_peaks += amp * np.e**(-(x-center)**2/(2*width**2))
    # Also make background peaks (not desired)
    for i in range(0, 3):
        center = x[-1] * np.random.random() - x[0]
        amp = 40 * np.random.random() + 10
        width = 100 * np.random.random() + 100
        bkg_peaks += amp * np.e**(-(x-center)**2/(2*width**2))
    return bkg_peaks, desired_peaks

x = np.array(range(0, 1000))
bkg_peaks, desired_peaks = make_peaks(x)
y_noise = np.random.normal(loc=30, scale=10, size=len(x))
y = bkg_peaks + desired_peaks + y_noise
xy = np.array(list(zip(x, y)), dtype=[('x', float), ('y', float)])

# Compute the wavelet transform
# I can't figure out what the width is or does?
widths = range(1, 11)
# Ricker is the 2nd derivative of a Gaussian
# (*close* to what *most* of the features are in my data)
# (they're actually Lorentzians and Breit-Wigner-Fano lines)
cwtmatr = signal.cwt(xy['y'], signal.ricker, widths)

# Maybe we multiply by the original data and square?
WT = (xy['y'] * cwtmatr)**2

# plot the data and results
fig = plt.figure()
ax_raw_data = fig.add_subplot(4, 3, 1)
ax = {}
for i in range(0, 10):
    ax[i] = fig.add_subplot(4, 3, i+2)
ax_desired_transformed_data = fig.add_subplot(4, 3, 12)

ax_raw_data.plot(xy['x'], xy['y'], 'g-')
for i in range(0, 10):
    ax[i].plot(xy['x'], WT[i])
ax_desired_transformed_data.plot(xy['x'], desired_peaks, 'k-')

fig.tight_layout()
plt.show()
This script will output this image:
Where the first plot is the raw data, the middle plots are the wavelet transforms and the last plot is what I want to get out as the processed (background and noise removed) data.
Does anyone have any suggestions? Thank you so much for the help.
I ended up finding a package, mlpy, which provides an inverse wavelet transform (the forward transform is mlpy.wavelet.uwt and the inverse is mlpy.wavelet.iuwt). This is the runnable script I ended up with, which may interest people trying to do noise or background removal:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
import mlpy.wavelet as wave

# Make some random data with peaks and noise
############################################################
def gen_data():
    def make_peaks(x):
        bkg_peaks = np.zeros(len(x))
        desired_peaks = np.zeros(len(x))
        # Make peaks which contain the data desired
        # (mid range/frequency peaks)
        for i in range(0, 10):
            center = x[-1] * np.random.random() - x[0]
            amp = 100 * np.random.random() + 10
            width = 10 * np.random.random() + 5
            desired_peaks += amp * np.e**(-(x-center)**2/(2*width**2))
        # Also make background peaks (not desired)
        for i in range(0, 3):
            center = x[-1] * np.random.random() - x[0]
            amp = 80 * np.random.random() + 10
            width = 100 * np.random.random() + 100
            bkg_peaks += amp * np.e**(-(x-center)**2/(2*width**2))
        return bkg_peaks, desired_peaks

    # make x axis
    x = np.array(range(0, 1000))
    bkg_peaks, desired_peaks = make_peaks(x)
    avg_noise_level = 30
    std_dev_noise = 10
    size = len(x)
    scattering_noise_amp = 100
    scat_center = 100
    scat_width = 15
    scat_std_dev_noise = 100
    y_scattering_noise = np.random.normal(scattering_noise_amp, scat_std_dev_noise, size) * np.e**(-(x-scat_center)**2/(2*scat_width**2))
    y_noise = np.random.normal(avg_noise_level, std_dev_noise, size) + y_scattering_noise
    y = bkg_peaks + desired_peaks + y_noise
    xy = np.array(list(zip(x, y)), dtype=[('x', float), ('y', float)])
    return xy

# Random data generated
#############################################################
xy = gen_data()

# Pad the data to a length of 2**n
new_y, bool_y = wave.pad(xy['y'])
orig_mask = np.where(bool_y == True)

# wavelet transform parameters
levels = 8
wf = 'h'
k = 2

# Remove noise first
# Wavelet transform
wt = wave.uwt(new_y, wf, k, levels)
# Matrix of the difference between each wavelet level and the original data
diff_array = np.array([(wave.iuwt(wt[i:i+1], wf, k) - new_y) for i in range(len(wt))])
# Index of the level which is most similar to the original data (to obtain smoothed data)
indx = np.argmin(np.sum(diff_array**2, axis=1))
# Use the wavelet levels around this region
noise_wt = wt[indx:indx+1]
# smoothed data in 2**n length
new_y = wave.iuwt(noise_wt, wf, k)

# Background removal
error = 10000
errdiff = 100
i = -1
iter_y_dict = {0: np.copy(new_y)}
bkg_approx_dict = {0: np.array([])}
while abs(errdiff) >= 1*10**-24:
    i += 1
    # Wavelet transform
    wt = wave.uwt(iter_y_dict[i], wf, k, levels)
    # Assume the last slice is the lowest frequency (background approximation)
    bkg_wt = wt[-3:-1]
    bkg_approx_dict[i] = wave.iuwt(bkg_wt, wf, k)
    # Get the error
    errdiff = error - sum(iter_y_dict[i] - bkg_approx_dict[i])**2
    error = sum(iter_y_dict[i] - bkg_approx_dict[i])**2
    # Make every peak higher than bkg_wt
    diff = (new_y - bkg_approx_dict[i])
    peak_idxs_to_remove = np.where(diff > 0.)[0]
    iter_y_dict[i+1] = np.copy(new_y)
    iter_y_dict[i+1][peak_idxs_to_remove] = np.copy(bkg_approx_dict[i])[peak_idxs_to_remove]

# new data without noise and background
new_y = new_y[orig_mask]
bkg_approx = bkg_approx_dict[len(bkg_approx_dict.keys())-1][orig_mask]
new_data = diff[orig_mask]

##############################################################
# plot the data and results
fig = plt.figure()
ax_raw_data = fig.add_subplot(121)
ax_WT = fig.add_subplot(122)

ax_raw_data.plot(xy['x'], xy['y'], 'g')
for bkg in bkg_approx_dict.values():
    ax_raw_data.plot(xy['x'], bkg[orig_mask], 'k')
ax_WT.plot(xy['x'], new_data, 'y')

fig.tight_layout()
plt.show()
And here is the output I am getting now:
As you can see, there is still a problem with the background removal (it shifts to the right after each iteration), but that is a different question, which I will address here.