Edge of a curve based out of numpy array - python

I am looking for some mathematical guidance to help me find the index locations (red circles) on a curve, as shown in the image below. The curve is just a 1D numpy array. The gradient changes abruptly at those locations, so I expected a first-order gradient (np.gradient) to give me what I am looking for, but it did not get me anywhere close. Then I realized the data is not smooth, so I tried smoothing it with scipy's gaussian_filter1d. Even then, I am unable to pick up where the curve changes. I have various types of these numpy arrays (same size, values ranging from 0 to 1), so the solution has to be generally applicable and must not depend on the given data set; I cannot hardcode anything. Any thoughts would be much appreciated.
CSV file

First, get a smooth function from your data using scipy's UnivariateSpline. Then mark the region where the absolute slope is at least, say, 1/4 of its maximum.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
x = np.arange(len(y))  # sample indices; y is the 1D data array (5500 samples here)
f = UnivariateSpline(x, y, k=3, s=0.3)  # cubic smoothing spline
df = f.derivative()
plt.plot(x, f(x))
cond = np.abs(df(x)) > 0.25*np.max(np.abs(df(x)))  # steep part of the curve
plt.scatter(x[cond], f(x[cond]), c='r')
Looks like what you are looking for is the first and last point of the marked ones. So you do
(x[cond].min(),f(x[cond].min()).item()), (x[cond].max(), f(x[cond].max()).item())
And your points are:
((1455, 0.20595740349084446), (4230, 0.1722999962943679))
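For reuse across several arrays, the same idea can be wrapped in a small function. This is only a sketch: the name find_edges, the synthetic test curve, and the values of frac and s are my own assumptions, not taken from the original data, and the smoothing factor s will likely need tuning for real measurements.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def find_edges(y, frac=0.25, s=0.3):
    """Return (first, last) indices where the absolute slope of a smoothing
    spline exceeds frac times its maximum. Hypothetical helper."""
    x = np.arange(len(y))
    spline = UnivariateSpline(x, y, k=3, s=s)
    slope = np.abs(spline.derivative()(x))
    idx = np.flatnonzero(slope > frac * slope.max())
    return int(idx[0]), int(idx[-1])

# Synthetic stand-in for the real data (assumption): a plateau between two
# smooth edges centered at indices 300 and 700, plus a little noise.
rng = np.random.default_rng(0)
x = np.arange(1000)
y = 0.5 * (np.tanh((x - 300) / 20) - np.tanh((x - 700) / 20))
y = y + rng.normal(0.0, 0.002, x.size)

first, last = find_edges(y, frac=0.25, s=0.01)
print(first, last)
```

Lowering frac moves the two indices outward, toward the flat parts of the curve; raising it moves them toward the steepest points.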

Related

Have I applied the Fourier Transformation correctly to this Dataframe? [EXAFS X-Ray Absorption Dataframe]

I have a dataset with a signal and a 1/distance (Angstrom^-1) column.
This is the dataset (fourier.csv): https://pastebin.com/ucFekzc6
After applying these steps:
import pandas as pd
import numpy as np
from numpy.fft import fft
df = pd.read_csv (r'fourier.csv')
df.plot(x ='1/distance', y ='signal', kind = 'line')
I generated this plot:
To generate the Fast Fourier Transformation data, I used the numpy library for its fft function and I applied it like this:
df['signal_fft'] = fft(df['signal'])
df.plot(x ='1/distance', y ='signal_fft', kind = 'line')
Now the plot looks like this, with the FFT data plotted instead of the initial "signal" data:
What I hoped to generate is something like this (This signal is extremely similar to mine, yet yields a vastly different FFT picture):
Theory Signal before windowing:
Theory FFT:
As you can see, my initial plot looks somewhat similar to graphic (a), but my FFT plot doesn't look anywhere near as clear as graphic (b). I'm still using the 1/distance data for both horizontal axes, but I don't see anything wrong with it, only that it should be interpreted as distance (Angstrom) instead of 1/distance (1/Angstrom) in the FFT plot.
How should I apply FFT in order to get a result that resembles the theoretical FFT curve?
Here's another slide that shows a similar initial signal to mine and a yet again vastly different FFT:
Addendum: I have been asked to provide some additional information on this problem, so I hope this helps.
The origin of the dataset that I have linked is an XAS (X-Ray Absorption Spectroscopy) spectrum of iron oxide. Such an experimentally obtained spectrum looks similar to the one shown in the "Schematic of XAFS data processing" on the top left, i.e. absorbance [a.u.] plotted against the photon energy [eV]. Firstly I processed the spectrum (pre-edge baseline correction + post-edge normalization). Then I converted the data on the x-axis from energy E to wavenumber k (thus dimension 1/Angstrom) and cut off the signal at the L-edge jump, leaving only the signal in the post-edge EXAFS region, referred to as fine structure function χ(k). The mentioned dataset includes k^2 weighted χ(k) (to emphasize oscillations at large k). All of this is not entirely relevant as the only thing I want to do now is a Fourier transformation on this signal ( k^2 χ(k) vs. k). In theory, as we are dealing with photoelectrons and (back)scattering phenomena, the EXAFS region of the XAS spectrum can be approximated using a superposition of many sinusoidal waves such as described in this equation with f(k) being the amplitude and δ(k) the phase shift of the scattered wave.
The aim is to gain an understanding of the chemical environment and the coordination spheres around the absorbing atom. The goal of the Fourier transform is to obtain some sort of signal in dependence of the "radius" R [Angstrom], which could later on be correlated to e.g. an oxygen being in ~2 Angstrom distance to the Mn atom (see "Schematic of XAFS data processing" on the right).
I only want to be able to reproduce the theoretically expected output after the FFT. My main concern is to get rid of the weird output signal and produce something that in some way resembles a curve with somewhat distinct local maxima (as shown in the 4th picture).
I don't have a 100% solution for you, but here's part of the problem.
The fft function you're using assumes that your X values are equally spaced. I checked this assumption by taking the difference between each 1/distance value, and graphing it:
df['1/distance'].diff().plot()
(Y is the difference, X is the index in the dataframe.)
This is supposed to be a constant line.
In order to fix this, one solution is to resample the signal through linear interpolation so that the timestep is constant.
from scipy import interpolate
rs_df = df.drop_duplicates().copy() # Needed because 0 is present twice in dataset
x = rs_df['1/distance']
y = rs_df['signal']
flinear = interpolate.interp1d(x, y, kind='linear')
xnew = np.linspace(np.min(x), np.max(x), rs_df.index.size)
ylinear = flinear(xnew)
rs_df['signal'] = ylinear
rs_df['1/distance'] = xnew
df.plot(x ='1/distance', y ='signal', kind = 'line')
rs_df.plot(x ='1/distance', y ='signal', kind = 'line')
The new line looks visually identical, but has a constant timestep.
I still don't get your intended result from the FFT, so this is only a partial solution.
MCVE
We import required dependencies:
import numpy as np
import pandas as pd
from scipy import fft, interpolate, signal
import matplotlib.pyplot as plt
And we load your dataset:
raw = pd.read_csv("https://pastebin.com/raw/ucFekzc6", sep="\t",
names=["k", "wchi"], header=0)
We clean the dataset a bit as it contains duplicates and a problematic point with null wave number (or infinite distance) and ensure a zero mean signal:
raw = raw.drop_duplicates()
raw = raw.iloc[1:, :]
raw["wchi"] = raw["wchi"] - raw["wchi"].mean()
The signal is about:
As noticed by @NickODell, the signal is not equally sampled, which is a problem if you aim to perform FFT signal processing.
We can resample your signal to have equally spaced sampling:
N = 65536
k = np.linspace(raw["k"].min(), raw["k"].max(), N)
interpolant = interpolate.interp1d(raw["k"], raw["wchi"], kind="linear")
g = interpolant(k)
Notice that the FFT output is not ordered the way it is usually presented in books: the zero-frequency component comes first, followed by the positive and then the negative frequencies (that is why your FFT signal looks unfamiliar). This can be corrected by using the classic fftshift method or by ad hoc indexing; here we keep only the positive-frequency half.
R = 2*np.pi*fft.fftfreq(N, np.diff(k)[0])[:N//2]
G = (1/N)*fft.fft(g)[0:N//2]
Mind the 2π factor which is involved in the units scaling of your transformation.
You also mentioned windowing (at least in a picture), which is not referenced anywhere else. This kind of filtering may help a lot when performing signal processing, as it filters out artifacts and unwanted noise. I leave it up to you.
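To make the resampling, windowing and FFT steps concrete without downloading the dataset, here is a minimal sketch on a synthetic signal. Everything in it is an assumption for illustration: the test signal sin(R_true * k), the sample counts, the use of np.interp for the linear resampling, and the Hann window (np.hanning), which is just one common choice. It plots nothing; it only reports where the magnitude spectrum peaks.

```python
import numpy as np

# Synthetic, irregularly sampled stand-in for k^2 * chi(k) (assumption):
# a single oscillation whose angular frequency in k-space is R_true.
rng = np.random.default_rng(1)
R_true = 4.0
k_irregular = np.sort(rng.uniform(2.0, 14.0, 400))
chi = np.sin(R_true * k_irregular)

# 1) Resample onto an equally spaced grid (linear interpolation).
N = 4096
k = np.linspace(k_irregular.min(), k_irregular.max(), N)
g = np.interp(k, k_irregular, chi)

# 2) Remove the mean and apply a window to reduce spectral leakage.
g = g - g.mean()
g = g * np.hanning(N)

# 3) Real-input FFT: only non-negative frequencies are returned.
dk = k[1] - k[0]
R = 2 * np.pi * np.fft.rfftfreq(N, d=dk)  # same 2*pi scaling as above
G = np.abs(np.fft.rfft(g)) / N            # magnitude, not the complex values

R_peak = R[np.argmax(G)]
print(R_peak)
```

The peak lands on the grid point nearest to R_true; the grid spacing is about 2π divided by the k-range, so a longer k-range gives a finer resolution in R. Note that the magnitude is taken before looking for peaks, since the raw FFT output is complex.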
Least Squares Spectral Analysis
An alternative way to process your signal comes from linear algebra: the periodogram of an irregularly sampled signal can be estimated by a method called Least Squares Spectral Analysis (LSSA).
You are looking for the square root of the periodogram of your signal and scipy does implement an easy way to compute it by the Lomb-Scargle method.
To do so, we simply create a frequency vector (in this case they are desired output distances) and perform the regression for each of those distances w.r.t. your signal:
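# NOTE: raw as loaded above only has the columns "k" and "wchi"; a column "R" of candidate distances is assumed here
Rhat = np.linspace(raw["R"].min(), raw["R"].max()*2, 5000)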
Ghat = signal.lombscargle(raw["k"], raw["wchi"], freqs=Rhat, normalize=True)
Graphically it leads to:
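As a self-contained check of the Lomb-Scargle step, the sketch below runs it on a synthetic, irregularly sampled sine. The test signal and the grid of candidate distances R_grid are my own assumptions; scipy.signal.lombscargle expects angular frequencies, which matches the 2π convention used in the FFT section.

```python
import numpy as np
from scipy import signal

# Synthetic, irregularly sampled oscillation (assumption, not the EXAFS data).
rng = np.random.default_rng(2)
R_true = 4.0
k = np.sort(rng.uniform(2.0, 14.0, 300))
chi = np.sin(R_true * k)
chi = chi - chi.mean()  # zero-mean signal, as in the cleaning step above

# Candidate angular frequencies, i.e. the desired output distances.
R_grid = np.linspace(0.1, 10.0, 1000)
power = signal.lombscargle(k, chi, R_grid, normalize=True)
amplitude = np.sqrt(power)  # square root of the periodogram

R_peak = R_grid[np.argmax(power)]
print(R_peak)
```

No resampling is needed here, because the method fits sinusoids directly to the irregular samples.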
Comparison
If we compare both methodologies, we can confirm that the major peaks definitely match.
LSSA gives a smoother curve, but do not assume it is more accurate, as this is a statistical smoothing of an interpolated curve. Anyway, it fits the bill for your requirement:
I only want to be able to reproduce the theoretically expected output
after the FFT. My main concern is to get rid of the weird output
signal and produce something that in some way resembles a curve with
somewhat distinct local maxima (as shown in the 4th picture).
Conclusions
I think you have enough information to process your signal, either by resampling and using the FFT or by using LSSA. Both methods have advantages and drawbacks.
Of course, this needs to be validated against well-known cases. Why not reproduce it with the data from the experiment in the paper you are working on, to check that you can reconstruct the figures you posted?
You also need to dig in the signal conditioning before performing post processing (resampling, windowing, filtering).

Function diverging at boundaries: Schrödinger 2D, explicit method

I'm trying to simulate the 2D Schrödinger equation using the explicit algorithm proposed by Askar and Cakmak (1977). I define a 100x100 grid with a complex function u+iv, null at the boundaries. The problem is, after just a few iterations the absolute value of the complex function explodes near the boundaries.
I post here the code so, if interested, you can check it:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
#Initialization+meshgrid
Ntsteps=30
dx=0.1
dt=0.005
alpha=dt/(2*dx**2)
x=np.arange(0,10,dx)
y=np.arange(0,10,dx)
X,Y=np.meshgrid(x,y)
#Initial Gaussian wavepacket centered in (5,5)
vargaussx=1.
vargaussy=1.
kx=10
ky=10
upre=np.zeros((100,100))
ucopy=np.zeros((100,100))
u=(np.exp(-(X-5)**2/(2*vargaussx**2)-(Y-5)**2/(2*vargaussy**2))/(2*np.pi*(vargaussx*vargaussy)**2))*np.cos(kx*X+ky*Y)
vpre=np.zeros((100,100))
vcopy=np.zeros((100,100))
v=(np.exp(-(X-5)**2/(2*vargaussx**2)-(Y-5)**2/(2*vargaussy**2))/(2*np.pi*(vargaussx*vargaussy)**2))*np.sin(kx*X+ky*Y)
#For the simple scenario, null potential
V=np.zeros((100,100))
#Boundary conditions
u[0,:]=0
u[:,0]=0
u[99,:]=0
u[:,99]=0
v[0,:]=0
v[:,0]=0
v[99,:]=0
v[:,99]=0
#Evolution with Askar-Cakmak algorithm
for n in range(1,Ntsteps):
    upre=np.copy(ucopy)
    vpre=np.copy(vcopy)
    ucopy=np.copy(u)
    vcopy=np.copy(v)
    #For the first iteration, simple Euler method: without this I cannot have the two steps backwards wavefunction at the second iteration
    #I use ucopy to make sure that for example u[i,j] is calculated not using the already modified version of u[i-1,j] and u[i,j-1]
    if(n==1):
        upre=np.copy(ucopy)
        vpre=np.copy(vcopy)
    for i in range(1,len(x)-1):
        for j in range(1,len(y)-1):
            u[i,j]=upre[i,j]+2*((4*alpha+V[i,j]*dt)*vcopy[i,j]-alpha*(vcopy[i+1,j]+vcopy[i-1,j]+vcopy[i,j+1]+vcopy[i,j-1]))
            v[i,j]=vpre[i,j]-2*((4*alpha+V[i,j]*dt)*ucopy[i,j]-alpha*(ucopy[i+1,j]+ucopy[i-1,j]+ucopy[i,j+1]+ucopy[i,j-1]))
#Calculate absolute value and plot
abspsi=np.sqrt(np.square(u)+np.square(v))
fig=plt.figure()
ax=fig.add_subplot(projection='3d')
surf=ax.plot_surface(X,Y,abspsi)
plt.show()
As you can see, the code is extremely simple: I cannot see where this error is coming from (I don't think it is a stability problem, because alpha<1/2). Have you ever encountered anything similar in your past simulations?
I'd try setting your dt to a smaller value (e.g. 0.001) and increasing the number of integration steps (e.g. fivefold).
When I run your code with dt=0.001, the wavefunction keeps its shape at Ntsteps=150 and well beyond.
Checking integrals of the motion (e.g. kinetic energy here?) should also confirm that things are going OK (or not) for different choices of dt.
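To compare different choices of dt quickly, the double loop can be replaced by array slicing. The sketch below is my own vectorized rewrite of the update from the question (null potential, same initial wavepacket); the function names are hypothetical. As a rough guide, a standard von Neumann estimate for this kind of leapfrog scheme suggests stability for dt*E_max <= 1, which for V=0 on a 2D grid works out to alpha <= 1/8 rather than 1/2. That estimate is mine, not part of the answer above, but it is consistent with dt=0.005 (alpha=0.25) blowing up and dt=0.001 (alpha=0.05) behaving well.

```python
import numpy as np

def h_times_dt(f, alpha):
    """dt*H applied to the interior points of f, for V=0 (5-point stencil)."""
    return alpha * (4 * f[1:-1, 1:-1] - f[2:, 1:-1] - f[:-2, 1:-1]
                    - f[1:-1, 2:] - f[1:-1, :-2])

def evolve(dt, nsteps, dx=0.1, n=100):
    """Return (initial peak, final peak) of |psi| for the leapfrog scheme."""
    alpha = dt / (2 * dx**2)
    x = np.arange(n) * dx
    X, Y = np.meshgrid(x, x)
    envelope = np.exp(-((X - 5)**2 + (Y - 5)**2) / 2) / (2 * np.pi)
    u = envelope * np.cos(10 * X + 10 * Y)
    v = envelope * np.sin(10 * X + 10 * Y)
    for a in (u, v):  # null boundary conditions
        a[0, :] = a[-1, :] = a[:, 0] = a[:, -1] = 0
    peak0 = np.sqrt(u**2 + v**2).max()
    # First step: previous state equals current state, as in the question.
    u_prev, v_prev = u.copy(), v.copy()
    for _ in range(nsteps):
        hu, hv = h_times_dt(u, alpha), h_times_dt(v, alpha)
        u_new, v_new = u_prev.copy(), v_prev.copy()  # boundaries stay zero
        u_new[1:-1, 1:-1] += 2 * hv
        v_new[1:-1, 1:-1] -= 2 * hu
        u_prev, v_prev, u, v = u, v, u_new, v_new
    return peak0, np.sqrt(u**2 + v**2).max()

peak0, peak_small_dt = evolve(dt=0.001, nsteps=150)
_, peak_large_dt = evolve(dt=0.005, nsteps=30)
print(peak0, peak_small_dt, peak_large_dt)
```

Tracking the peak of |psi| is only a crude diagnostic; the norm (the sum of |psi|^2 times dx^2) or the kinetic energy would be better quantities to monitor.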

scipy interp1d extrapolation method

I am trying to extrapolate values from some endpoints as shown in the image below
extrapolated value illustration
I have tried using the scipy interp1d method as shown below
from scipy import interpolate
x = [1,2,3,4]
y = [0,1,2,0]
f = interpolate.interp1d(x,y,fill_value='extrapolate')
print(f(4.3))
output : -0.5999999999999996
Though this is correct, I also need a second extrapolated value, which is the intersection of X with the extension of segment i=1. The estimated value I am expecting is ~3.3, as seen from the graph in the image above. But I need to get this programmatically. I am hoping there is a way of returning multiple values from interp1d(.....) or something similar. Any help will be much appreciated. Thanks in advance.
If you want to extrapolate based on all but the last pair of values, you can just build a second interpolator, using x[:-1] and y[:-1].
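With the data from the question, that suggestion looks like this (the variable names f_last and f_prev are mine):

```python
from scipy import interpolate

x = [1, 2, 3, 4]
y = [0, 1, 2, 0]

# Extrapolates along the last segment, (3, 2) -> (4, 0).
f_last = interpolate.interp1d(x, y, fill_value='extrapolate')
# Built without the last point, so it extrapolates along (2, 1) -> (3, 2).
f_prev = interpolate.interp1d(x[:-1], y[:-1], fill_value='extrapolate')

value_last = float(f_last(4.3))
value_prev = float(f_prev(4.3))
print(value_last, value_prev)
```

Up to floating-point rounding, the two values are -0.6 and 3.3, the latter being the ~3.3 read off the graph.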

vectorized interpolation on array with nans

I am trying to interpolate an image cube NDIM=(dim_frequ, dim_spaxel1, dim_spaxel2) along the frequency axis. The aim is to oversample the frequency space. The array may contain nans. It would, of course, be possible to run two for loops over the array, but that's definitely too slow.
What I want in pseudo code:
import numpy as np
from scipy.interpolate import interp1d
dim_frequ, dim_spaxel1, dim_spaxel2 = 2559, 70, 70
cube = np.random.rand(dim_frequ, dim_spaxel1, dim_spaxel2)
cube.ravel()[np.random.choice(cube.size, 1000, replace=False)] = np.nan
wavelength = np.arange(1.31, 2.5894999999, 5e-4) # deltaf so that len(wavelength)==DIMfrequ
wavelength_over = np.arange(1.31, 2.5894999999, 5e-5)
cube_over = interp1d(wavelength, cube, axis=0, kind='quadratic', fill_value="extrapolate")(wavelength_over)
cube_over[np.isnan(cube_over)] # array([], dtype=float64)
I've tried np.interp, which can only handle 1D data (?)
I've tried scipy.interpolate.interp1d, which can in principle handle arrays along a given axis, but returns nans (I assume because of the nans in the array).
This actually works in the case where kind is 'linear'. I'd actually like it a bit fancier though, and as soon as I set kind to 'quadratic' it returns nans.
I've tried scipy.interpolate.CubicSpline, which raises a ValueError, again because of the nans.
Any ideas what else to try? I am quite free in terms of the type of the interpolation, but it shouldn't be too fancy, i.e. nothing crazier than a spline or a low order polynomial
So a couple of things.
First
This returns an empty array because cube_over has no nan in it after the above:
cube_over[np.isnan(cube_over)]
since np.isnan(cube_over) is all False.
Otherwise, it appears to be interpolating everything in the wavelength_over array.
Second
scipy doesn't like nans (see the docs). Typical practice is to drop the nans from your set of points to interpolate, since they typically will not add any value to the interpolation function.
That said, it appears to be working with your interp1d example above. I am guessing it drops them along the axis when it builds the interpolation function, but I am not sure.
Third
What value do you actually want to interpolate? I am not sure what your desired output / endpoint is. It appears that your code is working more or less as expected. You are interpolating onto your wavelength_over array, whose values are very similar to (if not the same as) those of the wavelength array. I think you might benefit from a 2d interpolation method, but again, I do not have a good understanding of your goal.
See 2d interpolation options in scipy docs
Hope this helps.
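One possible way to act on the second point, sketched under my own assumptions: fill each nan from its neighbours along the frequency axis with np.interp first, then run the quadratic interp1d on the nan-free cube. The helper name fill_nans_along_axis0 is hypothetical, the cube below is a small synthetic one, and the fill is a swapped-in technique rather than something scipy does for you. It still loops, but only over spectra that actually contain nans, and it assumes every spectrum has at least one finite value.

```python
import numpy as np
from scipy.interpolate import interp1d

def fill_nans_along_axis0(cube, coords):
    """Replace nans by linear interpolation along axis 0 (hypothetical helper)."""
    filled = np.array(cube, dtype=float, order="C")  # contiguous copy
    flat = filled.reshape(filled.shape[0], -1)       # 2D view onto the copy
    for j in np.flatnonzero(np.isnan(flat).any(axis=0)):
        col = flat[:, j]                             # view: edits reach filled
        bad = np.isnan(col)
        col[bad] = np.interp(coords[bad], coords[~bad], col[~bad])
    return filled

# Small synthetic cube (assumption): the same smooth spectrum in every spaxel.
rng = np.random.default_rng(4)
wavelength = np.linspace(1.31, 2.589, 50)
cube = np.sin(wavelength)[:, None, None] * np.ones((50, 4, 4))
cube.ravel()[rng.choice(cube.size, 20, replace=False)] = np.nan

wavelength_over = np.linspace(wavelength[0], wavelength[-1], 500)
cube_filled = fill_nans_along_axis0(cube, wavelength)
cube_over = interp1d(wavelength, cube_filled, axis=0,
                     kind="quadratic")(wavelength_over)
print(cube_over.shape, np.isnan(cube_over).any())
```

At the ends of a spectrum np.interp repeats the nearest finite value instead of extrapolating, so nans at the very first or last frequency are filled only crudely.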

How to get correct parameters for 2D Gaussian fit to an image with noise

I have multiple images of objects where the object of interest (one per image) is near the center of said image. However, it's not necessarily the brightest source in the image, because some of the images also include sources that I am not interested in, but happen to be brighter/more intense. There is also usually a considerable amount of noise.
I would like to fit Gaussians to these 2D numpy image arrays, but am unsure how to effectively do so with these bright sources I don't want. In the end, I'll be stacking all the images (median) with the source of interest centered, so these other bright sources will just disappear. Below is my attempted code so far, where data is a list of multiple 2D arrays (images).
import numpy as np
import scipy
def Gaussian2D(x, y, x_0, y_0, theta, sigma_x, sigma_y, amp):
    a = np.cos(theta)**2/(2*sigma_x**2) + np.sin(theta)**2/(2*sigma_y**2)
    b = -np.sin(2*theta)/(4*sigma_x**2) + np.sin(2*theta)/(4*sigma_y**2)
    c = np.sin(theta)**2/(2*sigma_x**2) + np.cos(theta)**2/(2*sigma_y**2)
    exp_term = a * (x-x_0)**2
    exp_term += 2*b*(x-x_0)*(y-y_0)
    exp_term += c * (y-y_0)**2
    return amp * np.exp(-exp_term)
def GaussianFit(data):
    for data_set in data:
        y_0, x_0 = (np.shape(data_set)[0]//2, np.shape(data_set)[1]//2)
        sigma_x, sigma_y = np.std(data_set, axis=1), np.std(data_set, axis=0)
        fit = scipy.optimize.curve_fit(Gaussian2D(x, y, x_0, y_0, 0, sigma_x, sigma_y, amp), data_set)
    return fit
I've never done function fitting in a code, so I feel pretty lost. My specific questions are:
How can I define my parameters correctly? Do I need to flatten my array to get the sigma parameters? Also, I noticed in some example code that people made x and y arrays with linspace, so I'm not sure if I need to do that, and I'm also not sure what to put for the amplitude.
How would I handle the fact that I have multiple bright sources per image but only want to fit the one closest to the center? Can I somehow specify to look near the center of the image?
I will also need the coordinates of the center source after fitting. How can I do this while ensuring it doesn't give me the coordinates of other sources instead?
Any other help or advice is also appreciated. Thank you!
You can do this using a Gaussian Mixture Model. I don't think there is a function in SciPy, but there is one in scikit-learn
Here is a tutorial on this.
(from my answer to this question)
Then just remove the unwanted distribution from the image and fit to it.
Or there is skimage's blob detection.
On fitting a 2d Gaussian, read here. To use this you have to flatten the array as scipy's curve_fit only takes a 1d array. But it works fine.
Another approach is described here. A fit function with already three Gaussians in it is used. This would work if you know that there are always three (or in your case two) peaks on the image.
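As a concrete illustration of the flattened curve_fit approach, here is a minimal sketch that fits only a cutout around the image center, so that brighter off-center sources are ignored. Everything here is an assumption for illustration: the axis-aligned Gaussian without a rotation angle, the added constant offset, the cutout half-width of 8 pixels, the initial guesses, and the synthetic test image. It also assumes the image is larger than the cutout.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian2d(xy, x0, y0, sigma_x, sigma_y, amp, offset):
    """Axis-aligned 2D Gaussian plus constant background."""
    x, y = xy
    r2 = (x - x0)**2 / (2 * sigma_x**2) + (y - y0)**2 / (2 * sigma_y**2)
    return amp * np.exp(-r2) + offset

def fit_central_source(image, half_width=8):
    """Fit a Gaussian to a cutout around the image center (hypothetical helper).
    Returns (x0, y0, sigma_x, sigma_y, amp, offset) in full-image pixels."""
    ny, nx = image.shape
    cy, cx = ny // 2, nx // 2
    rows = slice(cy - half_width, cy + half_width + 1)
    cols = slice(cx - half_width, cx + half_width + 1)
    ys, xs = np.mgrid[rows, cols]       # pixel coordinates of the cutout
    cutout = image[rows, cols]
    background = np.median(cutout)
    amp0 = max(cutout.max() - background, 1e-3)
    p0 = (cx, cy, 2.0, 2.0, amp0, background)
    lower = (cx - half_width, cy - half_width, 0.5, 0.5, 0.0, -np.inf)
    upper = (cx + half_width, cy + half_width, half_width, half_width,
             np.inf, np.inf)
    # curve_fit needs 1D data, so coordinates and pixels are flattened.
    xy = np.vstack((xs.ravel(), ys.ravel()))
    popt, _ = curve_fit(gaussian2d, xy, cutout.ravel(), p0=p0,
                        bounds=(lower, upper))
    return popt

# Synthetic image (assumption): faint central source plus a brighter one
# far from the center, with Gaussian noise.
rng = np.random.default_rng(3)
yy, xx = np.mgrid[:41, :41]
image = (gaussian2d((xx, yy), 21.3, 19.6, 2.0, 2.0, 1.0, 0.0)
         + gaussian2d((xx, yy), 8.0, 33.0, 2.0, 2.0, 5.0, 0.0)
         + rng.normal(0.0, 0.05, (41, 41)))

x0, y0 = fit_central_source(image)[:2]
print(x0, y0)
```

The bounds keep the fitted center inside the cutout, which is one way to make sure the returned coordinates belong to the central source. If a bright neighbour overlaps the cutout, a mixture model or blob detection as mentioned above is probably the better route.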
