Can I get first derivative for kernel density estimation in python?

I have data in a 2D array, and I used gaussian_kde to estimate its distribution. Now I want the first derivative of the resulting density estimate so that I can find its zero crossings. Is it possible to get this from the estimated density? If so, is there a built-in function in Python that can help?

Following the example in the gaussian_kde documentation, once you have Z, or more generally the density estimate evaluated on a grid along the X axis, you can calculate its derivative using standard NumPy functions:
diff = np.gradient(Z)
Note that np.gradient computes central differences. If you would like forward differences you could do something like:
diff = np.r_[Z[1:] - Z[:-1], 0]
To find the zero-crossings you can do:
sdiff = np.sign(diff)
zc = np.where(sdiff[:-1] != sdiff[1:])
You can extend the above to 2D with dy, dx = np.gradient(Z), where Z is a 2D array, and then look for zero crossings along both the Y and X directions.
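As a rough end-to-end sketch of the steps above for the 1D case: the bimodal sample, the grid limits, and the variable names are made up for illustration and are not from the original answer.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical 1D sample with two clusters (illustrative data only).
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 500),
                          rng.normal(2.0, 0.5, 500)])

# Evaluate the density estimate on a regular grid.
kde = gaussian_kde(samples)
x = np.linspace(-5.0, 5.0, 1001)
density = kde(x)

# Passing x makes np.gradient use the real grid spacing.
slope = np.gradient(density, x)

# A sign change between neighbouring samples marks a zero crossing.
sign = np.sign(slope)
crossings = np.where(sign[:-1] != sign[1:])[0]
x_cross = x[crossings]
print(x_cross)
```

Each entry of x_cross is the left edge of a grid cell in which the slope changes sign, so it locates a local maximum or minimum of the density only to within one grid step.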

Related

Weighted 1D interpolation of cloud data point

I have a cloud of data points (x,y) that I would like to interpolate and smooth.
Currently, I am using SciPy:
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter
spl = interp1d(Cloud[:,1], Cloud[:,0]) # interpolation
x = np.linspace(Cloud[:,1].min(), Cloud[:,1].max(), 1000)
smoothed = savgol_filter(spl(x), 21, 1) #smoothing
This works pretty well, except that I would like to assign weights to the data points passed to interp1d. Is there another function that handles this?
I thought I could simply repeat each point of the cloud according to its weight, but that is inefficient: it greatly increases the number of points to interpolate and slows the algorithm down.

By default, interp1d uses linear interpolation, i.e., it simply draws a straight line between each pair of neighbouring points. A weighted interpolation does not make much sense mathematically in that scenario: in Euclidean space there is only one straight line between two points.
Depending on your goal, you can look into other methods, e.g., B-splines. In that case you can use scipy.interpolate.splrep and set the w argument:
w - Strictly positive rank-1 array of weights the same length as x and y. The weights are used in computing the weighted least-squares spline fit. If the errors in the y values have standard-deviation given by the vector d, then w should be 1/d. Default is ones(len(x)).
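A minimal sketch of how the w argument might be used, assuming each point comes with a known noise level. The data, the sigma values, and the choice s=len(x) are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Hypothetical noisy samples of sin(x); every other point is less noisy.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)          # splrep expects x in increasing order
sigma = np.where(np.arange(x.size) % 2 == 0, 0.02, 0.1)
y = np.sin(x) + rng.normal(0.0, sigma)

# Weights are the inverse of the per-point standard deviation,
# as described in the splrep documentation quoted above.
w = 1.0 / sigma

# s controls smoothing; with w = 1/sigma a value near len(x) is a common start.
tck = splrep(x, y, w=w, s=len(x))

# Evaluate the weighted smoothing spline on a finer grid.
x_new = np.linspace(0.0, 10.0, 200)
y_new = splev(x_new, tck)
```

Because the spline is a least-squares fit rather than an exact interpolant, it already smooths the data, so a separate savgol_filter pass may not be needed.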

What are other methods to find curvature of a polynomial apart from np.polyfit()

I am trying to find the curvature of a polynomial. X and Y are Python lists of X and Y coordinates respectively. I use scipy.interpolate because it gives better-looking curves in my image. But once I find the coefficients of a second-degree polynomial and re-plot it onto the image, the replotted curve looks way off.
How do I find accurate coefficients of a polynomial curve?
interpolate = interpolate.interp1d(X, Y)
z = np.polyfit(X, interpolate(X),2) #coefficients
poly_y = [z[0]*x*x + z[1]*x + z[2] for x in X] #Recompute Y coordinates
plt.plot(X, poly_y)

Use np.polyval instead of a list comprehension to evaluate the polynomial at the given coordinates. It is usually faster, and less error-prone than typing out the terms by hand. The result will be an ndarray instead of a Python list.
poly_y = np.polyval(z, X)
If the curve looks way off, it could be that the chosen degree (2) is insufficient to reproduce your interpolated points accurately. You should also plot the interpolation result interpolate(X) itself to check that nothing went wrong in interp1d.
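A short sketch of checking the fit numerically and then reading a curvature off the fitted coefficients. The sample data and the helper name curvature are hypothetical; the curvature expression is the standard one for a plane curve y(x), namely |y''| / (1 + y'^2)^(3/2).

```python
import numpy as np

# Hypothetical data lying exactly on a parabola (illustrative only).
X = np.linspace(0.0, 10.0, 30)
Y = 0.5 * X**2 - 2.0 * X + 1.0

# Fit a degree-2 polynomial and evaluate it with np.polyval.
z = np.polyfit(X, Y, 2)
poly_y = np.polyval(z, X)

# A large residual suggests the chosen degree is too low for the data.
max_residual = np.max(np.abs(poly_y - Y))

def curvature(coeffs, x):
    """Curvature of the fitted polynomial y(x) at x (hypothetical helper)."""
    d1 = np.polyval(np.polyder(coeffs, 1), x)   # first derivative y'
    d2 = np.polyval(np.polyder(coeffs, 2), x)   # second derivative y''
    return np.abs(d2) / (1.0 + d1**2) ** 1.5

kappa = curvature(z, X)
```

Comparing max_residual with the scale of Y gives a quick indication of whether degree 2 is adequate before trusting the curvature values.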

Reconstructing curve from gradient

Suppose I have a curve, and I estimate its gradient via finite differences using np.gradient. Given an initial point x[0] and the gradient vector, how can I reconstruct the original curve? Mathematically I see it's possible given this system of equations, but I'm not certain how to do it programmatically.
Here is a simple example of my problem, where I have sin(x) and I compute the numerical difference, which matches cos(x).
test = np.vectorize(np.sin)(x)
numerical_grad = np.gradient(test, 30./100)
analytical_grad = np.vectorize(np.cos)(x)
## Plot data.
ax.plot(test, label='data', marker='o')
ax.plot(numerical_grad, label='gradient')
ax.plot(analytical_grad, label='proof', alpha=0.5)
ax.legend();

I found out how to do it, using NumPy's trapz function (trapezoidal rule integration).
Following up on the code in the question, to reproduce the input array test, we do:
x = np.linspace(1, 30, 100)
integral = list()
for t in range(len(x)):
    integral.append(test[0] + np.trapz(numerical_grad[:t+1], x[:t+1]))
The integral array then contains the results of the numerical integration.
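The loop above re-integrates from the start for every point. As a hedged alternative sketch, SciPy's cumulative_trapezoid computes the same running trapezoidal integral in one call. Here the gradient is taken with respect to the x array itself, which is an assumption that differs slightly from the fixed step used in the question.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(1, 30, 100)
test = np.sin(x)

# Gradient with respect to the actual x coordinates.
numerical_grad = np.gradient(test, x)

# Running trapezoidal integral; initial=0 keeps the output the same length as x.
reconstructed = test[0] + cumulative_trapezoid(numerical_grad, x, initial=0)

# The reconstruction is approximate because both steps are discretised.
max_error = np.max(np.abs(reconstructed - test))
print(max_error)
```

The reconstruction is not exact: differencing followed by trapezoidal integration acts like a mild smoothing of the original samples, so a small error that shrinks with the grid spacing should be expected.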

You can restore the initial curve using integration.
As a real-life example: if you have the position function for 1D motion, you get the velocity function as its derivative (gradient):
v(t) = s'(t) = ds / dt
Given the velocity, you can recover the position by integrating (not all functions can be integrated analytically; in that case numerical integration is used), up to an unknown additive constant (shift). With the initial position you can recover the exact value:
s(T) = Integral[from 0 to T](v(t)dt) + s(0)

Power Spectrum and Autocorrelation of Data in Numpy

I am interested in computing the power spectrum of a system of particles (~100,000) in 3D space with Python. What I have found so far is a group of functions in NumPy (fft, fftn, ...) which compute the discrete Fourier transform, of which the square of the absolute value is the power spectrum. My question is about how my data should be represented, and truthfully may be fairly simple to answer.
The data structure I have is an array with shape (n, 3), n being the number of particles, with the columns holding the x, y, and z coordinates of the n particles. The function I believe I should be using is fftn(), which takes the discrete Fourier transform of an n-dimensional array, but the documentation says nothing about the format. How should the data be represented as a data structure to be fed into fftn?
Here is what I've tried so far to test the function:
import numpy as np
import random
import matplotlib.pyplot as plt
DATA = np.zeros((100,3))
for i in range(len(DATA)):
    DATA[i,0] = random.uniform(-1,1)
    DATA[i,1] = random.uniform(-1,1)
    DATA[i,2] = random.uniform(-1,1)
FFT = np.fft.fftn(DATA)
PS = abs(FFT)**2
plt.plot(PS)
plt.show()
The array DATA is a mock array; the real one will ultimately be 100,000 by 3 in shape. The output of the code gives me something like:
As you can see, I think this is giving me three 1D power spectra (one for each column of my data), but really I'd like a power spectrum as a function of radius.
Does anybody have any advice, or know of alternative methods or packages to compute the power spectrum? (I'd even settle for the two-point autocorrelation function.)

It doesn't quite work the way you are setting it out...
You need a function, let's call it f(x, y, z), that describes the density of mass in space. In your case, you can consider the galaxies as point masses, so you will have a delta function centered at the location of each galaxy. It is for this function that you can calculate the three-dimensional autocorrelation, from which you could calculate the power spectrum.
If you want to use numpy to do that for you, you are first going to have to discretize your function. A possible mock example would be:
import numpy as np
import matplotlib.pyplot as plt
space = np.zeros((100, 100, 100), dtype=np.uint8)
x, y, z = np.random.randint(100, size=(3, 1000))
space[x, y, z] += 1
space_ps = np.abs(np.fft.fftn(space))
space_ps *= space_ps
space_ac = np.fft.ifftn(space_ps).real.round()
space_ac /= space_ac[0, 0, 0]
And now space_ac holds the three-dimensional autocorrelation function for the data set. This is not quite what you are after; to get your one-dimensional correlation function you would have to average the values on spherical shells around the origin:
dist = np.minimum(np.arange(100), np.arange(100, 0, -1))
dist *= dist
dist_3d = np.sqrt(dist[:, None, None] + dist[:, None] + dist)
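distances, idx = np.unique(dist_3d, return_inverse=True)
idx = idx.ravel()  # bincount needs a 1D index array; newer NumPy returns it in dist_3d's shape
values = np.bincount(idx, weights=space_ac.ravel()) / np.bincount(idx)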
plt.plot(distances[1:], values[1:])
There is another issue with doing things yourself this way: when you compute the power spectrum as above, mathematically it is as if your three-dimensional array wrapped around at the borders, i.e. point [99, y, z] is a neighbour of [0, y, z]. So your autocorrelation could show two very distant galaxies as close neighbours. The simplest way to deal with this is to make your array twice as large along every dimension, padding with extra zeros, and then discard the extra data.
Alternatively, you could use scipy.ndimage.correlate (scipy.ndimage.filters.correlate in older SciPy versions) with mode='constant' to do all the dirty work for you.
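A small sketch of the zero-padding idea, on a deliberately tiny grid so it runs quickly. The grid size, the number of points, and the variable names are illustrative assumptions. The s argument of np.fft.fftn pads the input with zeros up to the requested shape.

```python
import numpy as np

# Hypothetical small occupancy grid (illustrative size only).
rng = np.random.default_rng(0)
n = 8
space = np.zeros((n, n, n))
x, y, z = rng.integers(n, size=(3, 40))
space[x, y, z] = 1.0

# Zero-pad to twice the size along every axis before transforming,
# so the implied wrap-around no longer mixes opposite borders.
ft = np.fft.fftn(space, s=(2 * n, 2 * n, 2 * n))
ac_padded = np.fft.ifftn(np.abs(ft) ** 2).real

# Lags 0..n-1 along each axis now hold the non-periodic autocorrelation.
ac = ac_padded[:n, :n, :n]

# Sanity check against a direct sum for a shift of one cell along axis 0.
direct_lag1 = np.sum(space[:-1] * space[1:])
print(ac[1, 0, 0], direct_lag1)
```

The slice ac keeps only non-negative lags; negative lags sit at the far end of each padded axis and are mirror images for real input.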

Calculating gradient in 3D

I have the following set of points in 3D space and I'd like to calculate the gradient everywhere, i.e. have a vector field returned.
points = []
for i in np.linspace(-20,20,100):
    for j in np.linspace(-20,20,100):
        points.append([i,j,i**2+j**2])
points = np.array(points)
It's an elliptic paraboloid.
Using np.gradient(points),
http://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html
I neither get the correct values nor the dimension I would expect. Can anyone give me a hint?

You are mixing together the indices and the values in 'points', so the gradient is giving you wrong results. Here is a better way to construct the points with numpy and calculate the gradient:
x, y = np.mgrid[-20:20:100j, -20:20:100j]
z = x**2 + y**2
grad = np.gradient(z)
The resulting gradient holds two arrays, one for the gradient along the first axis and one for the gradient along the second. Note that this call assumes unit spacing between points (i.e., it ignores delta x and delta y), so to get the derivative you need to divide each array by the actual grid spacing, which is 40/99 for 100 points spanning -20 to 20:
deriv = [g / (40. / 99.) for g in grad]
If you want to reconstruct your 'points' as before, you just need to do:
points = np.array([x.ravel(), y.ravel(), z.ravel()]).T
You may also be interested in NumPy's diff function, which gives the discrete difference along a given axis.
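As a sketch of an alternative to dividing afterwards, np.gradient also accepts the sample spacing for each axis directly. The names dx, dy, dz_dx, and dz_dy are illustrative; the comparison against 2x and 2y uses the known analytic gradient of this paraboloid.

```python
import numpy as np

x, y = np.mgrid[-20:20:100j, -20:20:100j]
z = x**2 + y**2

# Actual spacing along each axis, read off the grid itself.
dx = x[1, 0] - x[0, 0]
dy = y[0, 1] - y[0, 0]

# One output array per axis, already scaled by the spacing.
dz_dx, dz_dy = np.gradient(z, dx, dy)

# Compare with the analytic gradient (2x, 2y) away from the borders,
# where np.gradient falls back to one-sided differences.
err_x = np.max(np.abs(dz_dx[1:-1, :] - 2 * x[1:-1, :]))
err_y = np.max(np.abs(dz_dy[:, 1:-1] - 2 * y[:, 1:-1]))
print(err_x, err_y)
```

Central differences are exact for a quadratic, so the interior errors are at rounding level; only the border rows and columns deviate noticeably.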
