How to quickly calculate a 1D integral over an interpolated 2D array? - python

Given is a geometrical object, for simplification a semisphere with a certain radius. This is displayed as a 2D matrix with the Z data being the height. Assuming that I cut the object along any line, I want to calculate the area of the cut. My solution is to interpolate the semisphere using scipys RectBivariateSpline to accurately display it.
import numpy as np
import scipy.interpolate as intp
radius = 15.
gridsize = 0.5
spectrum = np.arange(-radius,radius+gridsize,gridsize)
X,Y = np.meshgrid(spectrum,spectrum)
Z = np.where(np.sqrt(X**2+Y**2)<=radius, np.sqrt(radius**2-np.sqrt(X**2+Y**2)**2), 0)
spline = intp.RectBivariateSpline(x = X[0,:], y = Y[:,0], z = Z)
#Example coordinates of the cut
x0 = -4.78
x = -6.73
y0 = -15.
y = 15.
However, the RectBivariateSpline only offers an area integral (which can be quickly checked by setting x0 = x or y0 = y). On the other hand the UnivariateSpline only takes in 1D array, which would only work if my cut happened to be along one specific vector of the matrix Z.
Since I want to perform this operation a few thousand times, I would need a comparably quick way to solve the integral (numerically or analytically doesn't matter as long as the error is somewhat negligible). Does anyone have an idea on how to do this?

It turned out, that, for my case, it was sufficient to sample the spline along my cut (using numpy's arange to gather equally spaced points) and then by integrating via the Simpson rule, which only requires a number of points with a sufficiently low distance (which can be controlled via arange's step parameter).

Related

What is the correct way to reproject a raster from a CRS to another using Python?

I have a raster of Land Cover data (specifically this one /eodata/auxdata/S2GLC/2017/S2GLC_T32TMS_2017 in https://finder.creodias.eu) that uses 'epsg:32632' as CRS. I want to reproject this raster on 'epsg:21781'. This is what the raster looks like when I open it with xarray.
fn = 'data/S2GLC_T32TMS_2017/S2GLC_T32TMS_2017.tif'
da = xr.open_rasterio(fn).sel(band=1, drop=True)
da
<xarray.DataArray (y: 10980, x: 10980)>
[120560400 values with dtype=uint8]
Coordinates:
* y (y) float64 5.2e+06 5.2e+06 5.2e+06 ... 5.09e+06 5.09e+06 5.09e+06
* x (x) float64 4e+05 4e+05 4e+05 ... 5.097e+05 5.097e+05 5.098e+05
Attributes:
transform: (10.0, 0.0, 399960.0, 0.0, -10.0, 5200020.0)
crs: +init=epsg:32632
res: (10.0, 10.0)
is_tiled: 0
nodatavals: (nan,)
scales: (1.0,)
offsets: (0.0,)
AREA_OR_POINT: Area
INTERLEAVE: BAND
My usual workflow was to transform all the point coordinates, create my destination grid and interpolate using nearest neighbors. Something that looks like this:
import numpy as np
import xarray as xr
import pyproj
from scipy.interpolate import griddata
y = da.y.values
x = da.x.values
xx, yy = np.meshgrid(x,y)
# (n,2) point coordinates in the original CRS
src_coords = np.column_stack([xx.flatten(), yy.flatten()])
transformer = pyproj.transformer.Transformer.from_crs('epsg:32632', 'epsg:21781')
xx, yy = transformer.transform(src_coords[:,0], src_coords[:,1])
# (n,2) point coordinates in the destination CRS, which are not on a regular grid
dst_coords = np.column_stack([xx.flatten(), yy.flatten()])
# I define my destination **regular** grid coordinates
x = np.linspace(620005,719995,10)
y = np.linspace(199995,100005,10)
xx, yy = np.meshgrid(x,y)
dst_grid = np.column_stack([xx.flatten(), yy.flatten()])
# I interpolate onto the grid
reprojected_array = griddata(
src_coords, da.values.flatten(), dst_coords, method='nearest'
).reshape(dst_shape)
Although this method is fairly transparent and (apparently) error-free, it can take very long when dealing with billions of points. Recently, I discovered rasterio's reproject function, and I was blown away by how fast it is. This is how I implemented it:
source = da.values
destination = np.zeros(dst_shape, np.int16)
res, aff = reproject(
source,
destination,
src_transform=src_transform, # affine transformation from original data
src_crs=src_crs,
dst_transform=dst_transform, # affine transformation that corresponds to the grid defined in the other approach
dst_crs=dst_crs,
resampling=Resampling.nearest) # using nearest neighbors just like with scope's griddata
Naturally I wanted to compare the results expecting them to be the same, but they were not, as you can see in the figure.
The resolution is 10 meters so the differences are not large, but after careful comparison with precise satellite data in the 'epsg:21781' coordinates, it looks like the old approach yields better results.
So my questions are:
why do these results differ?
is one approach better than the other? Are there specific conditions where one should prefer one or the other?
Griddata find nearest points in Euclidean distance,
on whatever map projection you give it.
Thus the nearest neighbors from a pipeline like
  4326 data points --> reproject --> nearest-Euclidean griddata
  query points
depend on the "reproject". Could you try +proj=sinu +lon_0= middle lon
for both data and query ?
What one really wants is a nearest-neighbor engine with great-circle distance,
not Euclidean distance.
The difference may be insignificant for small grids, or near the equator,
but less so in Finland -- cos 61° / cos 60° is ~ 97 %.
TL;DR
Is pyproj.transformer.Transformer.from_crs('epsg:32632', 'epsg:21781')
"correct" ? Don't know.
I see no test suite, and a couple of issues:
warp.reproject() generates the wrong result
roundtrip test \
"Nearest neighbor" is ill-defined / sensitive halfway between data points,
e.g. along the lines x or y = int + 0.5 on an int grid.
This is easy to test with KDTree.
xarray makes regular (Cartesian) grids easy, but afaik does not do
curvilinear (2d) grids.

Find two points/derivatives on curves between which the line is straight/constant

I'm plotting x and y points. This results in a curved line, the line is first bending and then after a point its straight and after some time it bends again. I want to retrieve those two points. Though x is linear and y is plotted against x but y is not linearly dependent on x.
I tried matplotlib for plotting and numpy polynomial functions, and am currently looking into splines, but it seems that for these y needs to be directly dependent on x.
Your data is noisy, so you can't use a simple numerical derivative. Instead, as you may have found already, you should fit it with a spline and then check the curvature of the spline.
Keying off this answer, you can fit a spline and calculate the second derivative (curvature) like this:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
x = file['n']
y = file['Ds/2']
y_spline = UnivariateSpline(x, y)
x_range = np.linspace(x[0], x[-1], 1000) # or could use x_range = x
y_spline_deriv = y_spl.derivative(n=2)
curvature = y_spline_deriv(x_range)
Then you can find the start and end of the straight region like this:
straight_points = np.where(curvature.abs() <= 0.1)[0] # pick your threshold
start_idx = straight_points[0]
end_idx = straight_points[-1]
start_x = x_range[start_idx]
end_x = x_range[end_idx]
Alternatively, if you're mainly interested in finding the flattest part of the curve (as shown in your graphic), you could try calculating the first derivative and then finding regions where the slope is within some small amount of the minimum slope anywhere in the data. In that case, just substitute y_spline_deriv = y_spl.derivative(n=1) in the code above.

Inverse FFT returns negative values when it should not

I have several points (x,y,z coordinates) in a 3D box with associated masses. I want to draw an histogram of the mass-density that is found in spheres of a given radius R.
I have written a code that, providing I did not make any errors which I think I may have, works in the following way:
My "real" data is something huge thus I wrote a little code to generate non overlapping points randomly with arbitrary mass in a box.
I compute a 3D histogram (weighted by mass) with a binning about 10 times smaller than the radius of my spheres.
I take the FFT of my histogram, compute the wave-modes (kx, ky and kz) and use them to multiply my histogram in Fourier space by the analytic expression of the 3D top-hat window (sphere filtering) function in Fourier space.
I inverse FFT my newly computed grid.
Thus drawing a 1D-histogram of the values on each bin would give me what I want.
My issue is the following: given what I do there should not be any negative values in my inverted FFT grid (step 4), but I get some, and with values much higher that the numerical error.
If I run my code on a small box (300x300x300 cm3 and the points of separated by at least 1 cm) I do not get the issue. I do get it for 600x600x600 cm3 though.
If I set all the masses to 0, thus working on an empty grid, I do get back my 0 without any noted issues.
I here give my code in a full block so that it is easily copied.
import numpy as np
import matplotlib.pyplot as plt
import random
from numba import njit
# 1. Generate a bunch of points with masses from 1 to 3 separated by a radius of 1 cm
radius = 1
rangeX = (0, 100)
rangeY = (0, 100)
rangeZ = (0, 100)
rangem = (1,3)
qty = 20000 # or however many points you want
# Generate a set of all points within 1 of the origin, to be used as offsets later
deltas = set()
for x in range(-radius, radius+1):
for y in range(-radius, radius+1):
for z in range(-radius, radius+1):
if x*x + y*y + z*z<= radius*radius:
deltas.add((x,y,z))
X = []
Y = []
Z = []
M = []
excluded = set()
for i in range(qty):
x = random.randrange(*rangeX)
y = random.randrange(*rangeY)
z = random.randrange(*rangeZ)
m = random.uniform(*rangem)
if (x,y,z) in excluded: continue
X.append(x)
Y.append(y)
Z.append(z)
M.append(m)
excluded.update((x+dx, y+dy, z+dz) for (dx,dy,dz) in deltas)
print("There is ",len(X)," points in the box")
# Compute the 3D histogram
a = np.vstack((X, Y, Z)).T
b = 200
H, edges = np.histogramdd(a, weights=M, bins = b)
# Compute the FFT of the grid
Fh = np.fft.fftn(H, axes=(-3,-2, -1))
# Compute the different wave-modes
kx = 2*np.pi*np.fft.fftfreq(len(edges[0][:-1]))*len(edges[0][:-1])/(np.amax(X)-np.amin(X))
ky = 2*np.pi*np.fft.fftfreq(len(edges[1][:-1]))*len(edges[1][:-1])/(np.amax(Y)-np.amin(Y))
kz = 2*np.pi*np.fft.fftfreq(len(edges[2][:-1]))*len(edges[2][:-1])/(np.amax(Z)-np.amin(Z))
# I create a matrix containing the values of the filter in each point of the grid in Fourier space
R = 5
Kh = np.empty((len(kx),len(ky),len(kz)))
#njit(parallel=True)
def func_njit(kx, ky, kz, Kh):
for i in range(len(kx)):
for j in range(len(ky)):
for k in range(len(kz)):
if np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2) != 0:
Kh[i][j][k] = (np.sin((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)-(np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R*np.cos((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R))*3/((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)**3
else:
Kh[i][j][k] = 1
return Kh
Kh = func_njit(kx, ky, kz, Kh)
# I multiply each point of my grid by the associated value of the filter (multiplication in Fourier space = convolution in real space)
Gh = np.multiply(Fh, Kh)
# I take the inverse FFT of my filtered grid. I take the real part to get back floats but there should only be zeros for the imaginary part.
Density = np.real(np.fft.ifftn(Gh,axes=(-3,-2, -1)))
# Here it shows if there are negative values the magnitude of the error
print(np.min(Density))
D = Density.flatten()
N = np.mean(D)
# I then compute the histogram I want
hist, bins = np.histogram(D/N, bins='auto', density=True)
bin_centers = (bins[1:]+bins[:-1])*0.5
plt.plot(bin_centers, hist)
plt.xlabel('rho/rhom')
plt.ylabel('P(rho)')
plt.show()
Do you know why I'm getting these negative values? Do you think there is a simpler way to proceed?
Sorry if this is a very long post, I tried to make it very clear and will edit it with your comments, thanks a lot!
-EDIT-
A follow-up question on the issue can be found [here].1
The filter you create in the frequency domain is only an approximation to the filter you want to create. The problem is that we are dealing with the DFT here, not the continuous-domain FT (with its infinite frequencies). The Fourier transform of a ball is indeed the function you describe, however this function is infinitely large -- it is not band-limited!
By sampling this function only within a window, you are effectively multiplying it with an ideal low-pass filter (the rectangle of the domain). This low-pass filter, in the spatial domain, has negative values. Therefore, the filter you create also has negative values in the spatial domain.
This is a slice through the origin of the inverse transform of Kh (after I applied fftshift to move the origin to the middle of the image, for better display):
As you can tell here, there is some ringing that leads to negative values.
One way to overcome this ringing is to apply a windowing function in the frequency domain. Another option is to generate a ball in the spatial domain, and compute its Fourier transform. This second option would be the simplest to achieve. Do remember that the kernel in the spatial domain must also have the origin at the top-left pixel to obtain a correct FFT.
A windowing function is typically applied in the spatial domain to avoid issues with the image border when computing the FFT. Here, I propose to apply such a window in the frequency domain to avoid similar issues when computing the IFFT. Note, however, that this will always further reduce the bandwidth of the kernel (the windowing function would work as a low-pass filter after all), and therefore yield a smoother transition of foreground to background in the spatial domain (i.e. the spatial domain kernel will not have as sharp a transition as you might like). The best known windowing functions are Hamming and Hann windows, but there are many others worth trying out.
Unsolicited advice:
I simplified your code to compute Kh to the following:
kr = np.sqrt(kx[:,None,None]**2 + ky[None,:,None]**2 + kz[None,None,:]**2)
kr *= R
Kh = (np.sin(kr)-kr*np.cos(kr))*3/(kr)**3
Kh[0,0,0] = 1
I find this easier to read than the nested loops. It should also be significantly faster, and avoid the need for njit. Note that you were computing the same distance (what I call kr here) 5 times. Factoring out such computation is not only faster, but yields more readable code.
Just a guess:
Where do you get the idea that the imaginary part MUST be zero? Have you ever tried to take the absolute values (sqrt(re^2 + im^2)) and forget about the phase instead of just taking the real part? Just something that came to my mind.

Higher order local interpolation of implicit curves in Python

Given a set of points describing some trajectory in the 2D plane, I would like to provide a smooth representation of this trajectory with local high order interpolation.
For instance, say we define a circle in 2D with 11 points in the figure below. I would like to add points in between each consecutive pair of points in order or produce a smooth trace. Adding points on every segment is easy enough, but it produces slope discontinuities typical for a "local linear interpolation". Of course it is not an interpolation in the classical sense, because
the function can have multiple y values for a given x
simply adding more points on the trajectory would be fine (no continuous representation is needed).
so I'm not sure what would be the proper vocabulary for this.
The code to produce this figure can be found below. The linear interpolation is performed with the lin_refine_implicit function. I'm looking for a higher order solution to produce a smooth trace and I was wondering if there is a way of achieving it with classical functions in Scipy? I have tried to use various 1D interpolations from scipy.interpolate without much success (again because of multiple y values for a given x).
The end goals is to use this method to provide a smooth GPS trajectory from discrete measurements, so I would think this should have a classical solution somewhere.
import numpy as np
import matplotlib.pyplot as plt
def lin_refine_implicit(x, n):
"""
Given a 2D ndarray (npt, m) of npt coordinates in m dimension, insert 2**(n-1) additional points on each trajectory segment
Returns an (npt*2**(n-1), m) ndarray
"""
if n > 1:
m = 0.5*(x[:-1] + x[1:])
if x.ndim == 2:
msize = (x.shape[0] + m.shape[0], x.shape[1])
else:
raise NotImplementedError
x_new = np.empty(msize, dtype=x.dtype)
x_new[0::2] = x
x_new[1::2] = m
return lin_refine_implicit(x_new, n-1)
elif n == 1:
return x
else:
raise ValueError
n = 11
r = np.arange(0, 2*np.pi, 2*np.pi/n)
x = 0.9*np.cos(r)
y = 0.9*np.sin(r)
xy = np.vstack((x, y)).T
xy_highres_lin = lin_refine_implicit(xy, n=3)
plt.plot(xy[:,0], xy[:,1], 'ob', ms=15.0, label='original data')
plt.plot(xy_highres_lin[:,0], xy_highres_lin[:,1], 'dr', ms=10.0, label='linear local interpolation')
plt.legend(loc='best')
plt.plot(x, y, '--k')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('GPS trajectory')
plt.show()
This is called parametric interpolation.
scipy.interpolate.splprep provides spline approximations for such curves. This assumes you know the order in which the points are on the curve.
If you don't know which point comes after which on the curve, the problem becomes more difficult. I think in this case, the problem is called manifold learning, and some of the algorithms in scikit-learn may be helpful in that.
I would suggest you try to transform your cartesian coordinates into polar coordinates, that should allow you to use the standard scipy.interpolation without issues as you won't have the ambiguity of the x->y mapping anymore.

2d interpolation in python with random spot

I checked the available interpolation method in scipy, but could not get the proper solution for my case.
assume i have 100 points whose coordinates are random,
e.g., their x and y positions are:
x=np.random.rand(100)*100
y=np.random.rand(100)*100
z = f(x,y) #the point value calculated by certain function
now i want to get the point value z of a new evenly sampled coordinates (xnew and y new)
xnew = range(100)
ynew = range(100)
how should i do this using bilinear sampling?
i know it is possible to do it point by point, e.g., find the 4 nearest random points, and do the interpolation, but there got to be some easier existing functions to do this
thanks alot!
Use scipy.interpolate.griddata. It does the exact thing you need
# griddata expects an ndarray for the interpolant coordinates
interpolants = numpy.array([xnew, ynew])
# defaults to linear interpolation
znew = scipy.interpolate.griddata((x, y), z, interpolants)
http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata

Categories

Resources