2d interpolators of scipy for scattered data going crazy - python

I want to find the derivatives of some scattered data. I have tried two different methods:
projecting the scattered data on a regular grid using scipy.interpolate.griddata, then computing the gradients with numpy.gradients, and then projecting values back to the scattered locations.
creating a CloughTocher2DInterpolater (but I have the same issue with others) and getting the gradients out of it
The second one is an order of magnitude faster than the first one but unfortunately, it also goes crazy quite quickly when data are a bit complex. For instance starting with this signal (called F and which is a simple addition of tanh stepwise functions along x and y):
When I process F using the two methods, I get:
Method 1 gives a good approximation. Method 2 is also good but I need force the colormap because of the existence of some extreme values.
Now, if I add a small noise (i.e. of amplitude 0.1 while the signal has amplitudes between -3 and 3), the interpolator just goes crazy giving very large extreme values:
I don't know how to deal with this. I understand the interpolator won't like irregular function or noise, but I was not expecting such discrepancy. My first idea was to smooth data first but strangely I can't find any method that would help me on this. Another idea would be to make a 2d fit of F to try to remove noise but I'm dry here too...any idea ?
Here is the corresponding python example (working on python3.6.9):
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
plt.interactive(True)
# scattered data
N = 200
coordu = np.random.rand(N**2,2)
Xu=coordu[:,0]
Yu=coordu[:,1]
noise = 0.
noise = np.random.rand(Xu.shape[0])*0.1
Zu=np.tanh((Xu-0.25)/0.01+(Yu-0.25)/0.001)+np.tanh((Xu-0.5)/0.01+(Yu-0.5)/0.001)+np.tanh((Xu-0.75)/0.001+(Yu-0.75)/0.001)+noise
plt.figure();plt.scatter(Xu,Yu,1,Zu)
plt.title('Data signal F')
#plt.savefig('signalF_noisy.png')
### get the gradient
# using griddata np.gradients
Xs,Ys=np.meshgrid(np.linspace(0,1,N),np.linspace(0,1,N))
coords = np.array([Xs,Ys]).T
Zs = interpolate.griddata(coordu,Zu,coords)
nearest = interpolate.griddata(coordu,Zu,coords,method='nearest')
znan = np.isnan(Zs)
Zs[znan] = nearest[znan]
dZs = np.gradient(Zs,np.min(np.diff(Xs[0,:])))
dZus = interpolate.griddata(coords.reshape(N*N,2),dZs[0].reshape(N*N),coordu)
hist_dzus = np.histogram(dZus,100)
plt.figure();plt.scatter(Xu,Yu,1,dZus)
plt.colorbar()
plt.clim([0 ,10])
plt.title('dF/dx using griddata and np.gradients')
#plt.savefig('dxF_griddata_noisy.png')
# using interpolation method Clough
interp = interpolate.CloughTocher2DInterpolator(coordu,Zu)
dZuCT = interp.grad
hist_dzct = np.histogram(dZuCT[:,0,0],100)
plt.figure();plt.scatter(Xu,Yu,1,dZuCT[:,0,0])
plt.colorbar()
plt.clim([0 ,10])
plt.title('dF/dx using CloughTocher2DInterpolator')
#plt.savefig('dxF_CT2D_noisy.png')
# histograms
plt.figure()
plt.semilogy(hist_dzus[1][:-1],hist_dzus[0],'.-')
plt.semilogy(hist_dzct[1][:-1],hist_dzct[0],'.-')
plt.title('histogram of dF/dx')
plt.legend(('griddata','ClouhTocher'))
#plt.savefig('dxF_hist_noisy.png')

Related

Python fastKDE beyond limits of data points

I'm trying to use the fastKDE package (https://pypi.python.org/pypi/fastkde/1.0.8) to find the KDE of a point in a 2D plot. However, I want to know the KDE beyond the limits of the data points, and cannot figure out how to do this.
Using the code listed on the site linked above;
#!python
import numpy as np
from fastkde import fastKDE
import pylab as PP
#Generate two random variables dataset (representing 100000 pairs of datapoints)
N = 2e5
var1 = 50*np.random.normal(size=N) + 0.1
var2 = 0.01*np.random.normal(size=N) - 300
#Do the self-consistent density estimate
myPDF,axes = fastKDE.pdf(var1,var2)
#Extract the axes from the axis list
v1,v2 = axes
#Plot contours of the PDF should be a set of concentric ellipsoids centered on
#(0.1, -300) Comparitively, the y axis range should be tiny and the x axis range
#should be large
PP.contour(v1,v2,myPDF)
PP.show()
I'm able to find the KDE for any point within the limits of the data, but how do I find the KDE for say the point (0,300), without having to include it into var1 and var2. I don't want the KDE to be calculated with this data point, I want to know the KDE at that point.
I guess what I really want to be able to do is give the fastKDE a histogram of the data, so that I can set its axes myself. I just don't know if this is possible?
Cheers
I, too, have been experimenting with this code and have run into the same issues. What I've done (in lieu of a good N-D extrapolator) is to build a KDTree (with scipy.spatial) from the grid points that fastKDE returns and find the nearest grid point to the point I was to evaluate. I then lookup the corresponding pdf value at that point (it should be small near the edge of the pdf grid if not identically zero) and assign that value accordingly.
I came across this post while searching for a solution of this problem. Similiar to the building of a KDTree you could just calculate your stepsize in every griddimension, and then get the index of your query point by just subtracting the point value with the beginning of your axis and divide by the stepsize of that dimension, finally round it off, turn it to integer and voila. So for example in 1D:
def fastkde_test(test_x):
kde, axes = fastKDE.pdf(test_x, numPoints=num_p)
x_step = (max(axes)-min(axes)) / len(axes)
x_ind = np.int32(np.round((test_x-min(axes)) / x_step))
return kde[x_ind]
where test_x in this case is both the set for defining the KDE and the query set. Doing it this way is marginally faster by a factor of 10 in my case (at least in 1D, higher dimensions not yet tested) and does basically the same thing as the KDTree query.
I hope this helps anyone coming across this problem in the future, as I just did.
Edit: if your querying points outside of the range over which the KDE was calculated this method of course can only give you the same result as the KDTree query, namely the corresponding border of your KDE-grid. You would however have to hardcode this by cutting the resulting x_ind at the highest index, i.e. `len(axes)-1'.

How to normalise plotted points and get a circle?

Given 2000 random points in a unit circle (using numpy.random.normal(0,1)), I want to normalize them such that the output is a circle, how do I do that?
I was requested to show my efforts. This is part of a larger question: Write a program that samples 2000 points uniformly from the circumference of a unit circle. Plot and show it is indeed picked from the circumference. To generate a point (x,y) from the circumference, sample (x,y) from std normal distribution and normalise them.
I'm almost certain my code isn't correct, but this is where I am up to. Any advice would be helpful.
This is the new updated code, but it still doesn't seem to be working.
import numpy as np
import matplotlib.pyplot as plot
def plot():
xy = np.random.normal(0,1,(2000,2))
for i in range(2000):
s=np.linalg.norm(xy[i,])
xy[i,]=xy[i,]/s
plot.plot(xy)
plot.show()
I think the problem is in
plot.plot(xy)
even if I use
plot.plot(xy[:,0],xy[:,1])
it doesn't work.
Connected lines are not a good visualization here. You essentially connect random points on the circle. Since you do this quite often, you will get a filled circle. Try drawing points instead.
Also avoid name space mangling. You import matplotlib.pyplot as plot and also name your function plot. This will lead to name conflicts.
import numpy as np
import matplotlib.pyplot as plt
def plot():
xy = np.random.normal(0,1,(2000,2))
for i in range(2000):
s=np.linalg.norm(xy[i,])
xy[i,]=xy[i,]/s
fig, ax = plt.subplots(figsize=(5,5))
# scatter draws dots instead of lines
ax.scatter(xy[:,0], xy[:,1])
If you use dots instead, you will see that your points indeed lie on the unit circle.
Your code has many problems:
Why using np.random.normal (a gaussian distribution) when the problem text is about uniform (flat) sampling?
To pick points on a circle you need to correlate x and y; i.e. randomly sampling x and y will not give a point on the circle as x**2+y**2 must be 1 (for example for the unit circle centered in (x=0, y=0)).
A couple of ways to get the second point is to either "project" a random point from [-1...1]x[-1...1] on the unit circle or to pick instead uniformly the angle and compute a point on that angle on the circle.
First of all, if you look at the documentation for numpy.random.normal (and, by the way, you could just use numpy.random.randn), it takes an optional size parameter, which lets you create as large of an array as you'd like. You can use this to get a large number of values at once. For example: xy = numpy.random.normal(0,1,(2000,2)) will give you all the values that you need.
At that point, you need to normalize them such that xy[:,0]**2 + xy[:,1]**2 == 1. This should be relatively trivial after computing what xy[:,0]**2 + xy[:,1]**2 is. Simply using norm on each dimension separately isn't going to work.
Usual boilerplate
import numpy as np
import matplotlib.pyplot as plt
generate the random sample with two rows, so that it's more convenient to refer to x's and y's
xy = np.random.normal(0,1,(2,2000))
normalize the random sample using a library function to compute the norm, axis=0 means consider the subarrays obtained varying the first array index, the result is a (2000) shaped array that can be broadcasted to xy /= to have points with unit norm, hence lying on the unit circle
xy /= np.linalg.norm(xy, axis=0)
Eventually, the plot... here the key is the add_subplot() method, and in particular the keyword argument aspect='equal' that requires that the scale from user units to output units it's the same for both axes
plt.figure().add_subplot(111, aspect='equal').scatter(xy[0], xy[1])
pt.show()
to have

Efficiently and accurately interpolate between finite-element node stress points in python

I'd like to interpolate some 3D finite-element stress field data from a bunch of known nodes at points where nodes don't exist. I realise that node stresses are already extrapolated from gauss points, but it is the best I can do with the data I have available. The image below gives a 2D representation. The red and pink points would represent locations where I'd like to interpolate the value.
Initially I thought I could find the smallest bounding box (hull) or simplex that contained the point of interest and no other known points. Visualising this in 2D I realised that this might lead to ignoring data from a close-by value, incorrectly. I was planning on using the scipy LindearNDInterpolator but I notice there is some unexpected behaviour, and I'm worried it will exclude nearby points in the way that I just described. Notice how the pink point would not reference from the green triangle but ignore the point outside the orange triangle, although it is probably more relevant.
As far as I can tell the best way is to take the nearest surrounding nodes, and interpolating by weighted averaging on distance. I'm not sure if there is something readily available or if it needs to be written. I'd imagine this is a fairly common problem so I'd presume the wheel has already been invented...
Actually my final goal is to interpolate/regress values for a 3D line through the set of points.
You can try Inverse distance weighting. Here is an example in 1D (easily generalizable to 3D):
from pylab import *
# imaginary samples
xmax=10
Npoints=10
x=0.1*randint(0,10*xmax,Npoints)
y=sin(2*x)+x
plot(x,y,ls="",marker="x",color="red",label="samples",ms=9,mew=2)
# interpolation
x2=linspace(0,xmax,150) # new sampling
def weight(x,x0,p): # modify this function in 3D
return 1/(((x-x0)**2)**(p/2)+0.00001) # 0.00001 to avoid infinity
y2=zeros_like(x2)
for p in range(1,4):
for i in range(len(y2)):
y2[i]=sum(y*weight(x,x2[i],p))/sum(weight(x,x2[i],p))
plot(x2,y2,label="Interpolation p="+str(p))
legend(loc=2)
show()
Here is the result
As you can see, it's not really fantastic. The best results are, I think, for p=2, but it will be different in 3D. I have obtained better curves with a gaussian weight, but have no theorical background for such a choice.
https://stackoverflow.com/a/36337428/2372254
The first answer here was helpful but the 1-D example shows that the approach actually does some strange things with p=1 (wildy different from the data) and with p=3 we get some weird plateaux.
I took a look at Radial Basis Functions which are implemented in SciPy, and modified JPG's code as follows.
Modified Code
from pylab import *
from scipy.interpolate import Rbf, InterpolatedUnivariateSpline
# imaginary samples
xmax=10
Npoints=10
x=0.1*randint(0,10*xmax,Npoints)
Rbf requires sorted lists:
x.sort()
y=sin(2*x)+x
plot(x,y,ls="",marker="x",color="red",label="samples",ms=9,mew=2)
# interpolation
x2=linspace(0,xmax,150) # new sampling
def weight(x,x0,p): # modify this function in 3D
return 1/(((x-x0)**2)**(p/2)+0.00001) # 0.00001 to avoid infinity
y2=zeros_like(x2)
for p in range(1,4):
for i in range(len(y2)):
y2[i]=sum(y*weight(x,x2[i],p))/sum(weight(x,x2[i],p))
plot(x2,y2,label="Interpolation p="+str(p))
yrbf = Rbf(x, y)
fi = yrbf(x2)
plot(x2, fi, label="Radial Basis Function")
ius = InterpolatedUnivariateSpline(x, y)
yius = ius(x2)
plot(x2, yius, label="Univariate Spline")
legend(loc=2)
show()
The results are interesting and probably more suitable to my intended usage. The following figure was produced.
But the RBF implementation in SciPy (google for alternatives) has a major problem when points are repeated - not likely in a real scenario - and goes completely ballistic:
When smoothed (smooth=0.1 was used) it goes normal again. This might show some programming weirdness.

Trying to plot a quadratic regression, getting multiple lines

I'm making a demonstration of a different types of regression in numpy with ipython, and so far, I've been able to plot a simple linear regression without difficulty. Now, when I go on to make a quadratic fit to my data and go to plot it, I don't get a quadratic curve but instead get many lines. Here's the code I'm running that generates the problem:
import numpy
from numpy import random
from matplotlib import pyplot as plt
import math
# Generate random data
X = random.random((100,1))
epsilon=random.randn(100,1)
f = 3+5*X+epsilon
# least squares system
A =numpy.array([numpy.ones((100,1)),X,X**2])
A = numpy.squeeze(A)
A = A.T
quadfit = numpy.linalg.solve(numpy.dot(A.transpose(),A),numpy.dot(A.transpose(),f))
# plot the data and the fitted parabola
qdbeta0,qdbeta1,qdbeta2 = quadfit[0][0],quadfit[1][0],quadfit[2][0]
plt.scatter(X,f)
plt.plot(X,qdbeta0+qdbeta1*X+qdbeta2*X**2)
plt.show()
What I get is this picture (zoomed in to show the problem):
You can see that rather than having a single parabola that fits the data, I have a huge number of individual lines doing something that I'm not sure of. Any help would be greatly appreciated.
Your X is ordered randomly, so it's not a good set of x values to use to draw one continuous line, because it has to double back on itself. You could sort it, I guess, but TBH I'd just make a new array of x coordinates and use those:
plt.scatter(X,f)
x = np.linspace(0, 1, 1000)
plt.plot(x,qdbeta0+qdbeta1*x+qdbeta2*x**2)
gives me

How do I limit the interpolation region in the InterpolatedUnivariateSpline in Python when given non-uniform samples?

I'm trying to get a nice upsampler using Python when I have non-uniform spaced inputs. Any suggestions would be helpful. I've tried a number of interp functions. Here's an example:
from scipy.interpolate import InterpolatedUnivariateSpline
from numpy import linspace, arange, append
from matplotlib.pyplot import plot
F=[0, 1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,22050]
M=[0.,2.85,2.49,1.65,1.55,1.81,1.35,1.00,1.13,1.58,1.21,0.]
ff=linspace(F[0],F[1],10)
for i in arange(2, len(F)):
ff=append(ff,linspace(F[i-1],F[i], 10))
aa=InterpolatedUnivariateSpline(x=F,y=M,k=2);
mm=aa(ff)
plot(F,M,'r-o'); plot(ff,mm,'bo'); show()
This is the plot I get:
I need to get interpolated values that don't go below 0. Note that the blue dots go below zero. The red line represents the original F vs. M data. If I use k=1 (piece-wise linear interp) then I get good values as shown here:
aa=InterpolatedUnivariateSpline(x=F,y=M,k=1)
mm=aa(ff); plot(F,M,'r-o');plot(ff,mm,'bo'); show()
The problem is that I need to have a "smooth" interpolation and not the piece-wise value. Does anyone know if the bbox argument in InterpolatedUnivarientSpline helps to fix that? I cant find any documentation on what bbox does. Is there another easier way to accomplish this?
Thanks in advance for any help.
Positivity-preserving interpolation is hard (if it wasn't, there wouldn't be a bunch of papers written about it). The splines of low degree (2, 3) usually do pretty well in this regard, but your data has that large gap in it, and it happens to be at the end of data range, making things worse.
One solution is to do interpolation in two steps: first upsample the data by piecewise linear interpolation, then interpolate new data with a smooth spline (I'll use cubic spline below, though quadratic also works).
The gap_size array records how large each gap is, relative to the smallest one. In subsequent loop, uniformly spaced points are replaced in large gaps (those that are at least twice the size of smallest one). The result is F_new, a nearly-uniform better grid that still includes the original points. The corresponding M values for it are generated by a piecewise linear spline.
Subsequent cubic interpolation produces a smooth curve that stays positive.
F = [0, 1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,22050]
M = [0.,2.85,2.49,1.65,1.55,1.81,1.35,1.00,1.13,1.58,1.21,0.]
gap_size = np.diff(F) // np.diff(F).min()
F_new = []
for i in range(len(F)-1):
F_new.extend(np.linspace(F[i], F[i+1], gap_size[i], endpoint=False))
F_new.append(F[-1])
pl_spline = InterpolatedUnivariateSpline(F, M, k=1);
M_new = pl_spline(F_new)
smooth_spline = InterpolatedUnivariateSpline(F_new, M_new, k=3)
ff = np.linspace(F[0], F[-1], 100)
plt.plot(F, M, 'ro')
plt.plot(ff, smooth_spline(ff), 'b')
plt.show()
Of course, no tricks can hide the truth that we don't know what happens between 5500 and 22050 (Hz, I presume), the nearly-linear part is just a placeholder.

Categories

Resources