I'm a MATLAB user trying to translate some code into Python as an assignment. Since I noticed some differences between the two languages in 3D interpolation results from my original code, I am trying to pin down the issue by analysing a simple example.
I set up a 2x2x2 matrix (named blocc below) with some values, and its coordinates in three vectors (X, Y, Z). Given a query point, I use 3D linear interpolation to find the interpolated value. Again, I get different results in MATLAB and Python (code below).
Python
import numpy as np
import scipy.interpolate as si
X, Y, Z = (np.array([1, 2]), np.array([1, 2]), np.array([1, 2]))
a = np.ones((2, 2, 1))
b = np.ones((2, 2, 1)) * 2
blocc = np.concatenate((a, b), axis=2)  # Matrix with values
blocc[1, 0, 0] = 7
blocc[0, 1, 1] = 7
qp = np.array([2, 1.5, 1.5])  # My query point
value = si.interpn((X, Y, Z), blocc, qp, method='linear')
print(value)
Here I get value=3
MATLAB
blocc = zeros(2,2,2);
blocc(:,:,1) = ones(2,2);
blocc(:,:,2) = ones(2,2)*2;
blocc(2,1,1)=7;
blocc(1,2,2)=7;
X=[1,2];
Y=[1,2];
Z=[1,2];
qp = [2 1.5 1.5];
value=interp3(X,Y,Z,blocc,qp(1),qp(2),qp(3),'linear')
And here value=2.75
I can't understand why: I think there is something I don't get about how interpolation and/or matrix indexing work in Python. Can you please clarify this for me? Thanks!
Apparently, when X, Y, and Z are vectors, MATLAB considers the order of the dimensions in the values array to be (Y, X, Z). From the documentation:
V — Sample values
array
Sample values, specified as a real or complex array. The size requirements for V depend on the size of X, Y, and Z:
If X, Y, and Z are arrays representing a full grid (in meshgrid format), then the size of V matches the size of X, Y, or Z.
If X, Y, and Z are grid vectors, then size(V) = [length(Y) length(X) length(Z)].
If V contains complex numbers, then interp3 interpolates the real and imaginary parts separately.
Example: rand(10,10,10)
Data Types: single | double
Complex Number Support: Yes
This means that, to get the same result in Python, you just need to swap the first and second values in the query:
qp = np.array([1.5, 2, 1.5])
f = si.interpn((X, Y, Z), blocc, qp, method='linear')
print(f)
# [2.75]
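Equivalently, you could leave the query point as it was and swap the first two axes of the values array instead; a minimal sketch, reusing the arrays from the question:
qp = np.array([2, 1.5, 1.5])  # the original query point
value = si.interpn((X, Y, Z), np.swapaxes(blocc, 0, 1), qp, method='linear')
print(value)
# [2.75]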
This will be a pretty basic question but I am a bit stuck on two things.
I have some data stored in a 2D array, let's just call it z. I have two separate 2D arrays, nxp and nyp that hold mapping information for every element in z. nxp and nyp therefore currently hold Cartesian co-ordinates and I want to transform this to polar co-ordinates.
Following this, I have defined polar to convert a given (x,y) to (r, theta) as:
import numpy as np
import math
def polar(x, y):
    '''
    Args:
        x (double): x-coordinate.
        y (double): y-coordinate.

    Returns:
        r, theta (in degrees).
    '''
    r = np.hypot(x, y)
    theta = math.degrees(math.atan2(y, x))
    return r, theta
But from this point on I think everything I am doing is a really bad approach to this problem. Ideally I would like to just feed in the Cartesian arrays and get back the polar arrays, but this doesn't seem to work with my defined function (probably because I've implicitly defined the input type as double, but I was hoping Python would be able to overload here).
r, theta = polar(nxp, nyp)
The traceback is:
.... in polar
theta = math.degrees(math.atan2(y,x))
TypeError: only size-1 arrays can be converted to Python scalars
So I am now converting everything to 1D arrays and iterating to populate r and theta. E.g.
nxp_1D = nxp.ravel()
nyp_1D = nyp.ravel()
for counter, value in enumerate(nxp_1D):
    r, theta = polar(value, nyp_1D[counter])
This exact implementation is faulty, as it overwrites r and theta on every iteration and so returns just a single value for each, rather than populating lists of values.
More generally though I really don't like this approach for a few reasons. It looks to be a very heavy-handed solution to this problem. On top of this, I might want to do some contourf plots later and this would necessitate converting r and theta back to their original array shapes.
Is there a much easier and more efficient way for me to create the 2D arrays r and theta? Is it possible to create them either by changing my polar function definition or maybe by using list comprehension?
Thanks for any responses.
Yep, OK, so that was a very easy fix. Thank you to @user202729 and @Igor Raush. It was as simple as:
def polar(x, y):
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    return r, theta

.....

r, theta = polar(nxp, nyp)
Sorry for how daft that question was but thanks for your responses.
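For anyone landing here later, a quick self-contained check that the vectorized version preserves the input shape (handy for the contourf plots mentioned above; the 2x2 inputs below are made up):
import numpy as np
def polar(x, y):
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)  # radians; wrap in np.degrees(...) if you want degrees
    return r, theta
nxp = np.array([[1.0, 0.0], [-1.0, 0.0]])  # stand-ins for the real mapping arrays
nyp = np.array([[0.0, 1.0], [0.0, -1.0]])
r, theta = polar(nxp, nyp)
print(r.shape, theta.shape)  # (2, 2) (2, 2)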
So I'm trying to get the second derivative of the following formula using numpy.gradient, differentiating once by S[:,0] and then by S[:,1]:
S = np.random.multivariate_normal(mean, covariance, N)  # mean, covariance, N defined elsewhere
formula = (S[:,0]**2) * (S[:,1]**2)
But the thing is when I use spacings as the second argument of numpy.gradient
dx = np.diff(S[:,0])
dy = np.diff(S[:,1])
dfdx = np.gradient(formula,dx)
I get the error saying
ValueError: when 1d, distances must match the length of the corresponding dimension
And I get that that's because the spacings vector is one element shorter than the formula array, but I didn't know how to fix that.
I've also read that you can pass the coordinates of the points rather than the spacings as the second argument. But when I tried checking that by differentiating the formula first by S[:,0] and then by S[:,1], and then again in the opposite order (first by S[:,1], then by S[:,0]), and comparing the two results, which should be similar, there was a huge difference between them.
Can anybody explain to me what I'm doing wrong here?
When passing the coordinate vectors of your function's sample points to NumPy's gradient, you have to be careful to either pass one coordinate array per dimension of your function, or to specify at which axis (as an argument of gradient) you want to calculate the gradient.
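For a hypothetical 2-D array f2d sampled on coordinate vectors x and y, the two equivalent forms would be:
dfdx, dfdy = np.gradient(f2d, x, y)  # both coordinate arrays at once, one gradient per axis
dfdx = np.gradient(f2d, x, axis=0)   # or one axis at a time
dfdy = np.gradient(f2d, y, axis=1)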
As for the check between both orders of differentiation, I think the problem is that your formula isn't actually two-dimensional but one-dimensional (even though you use data from two variables, note your formula array has only one dimension).
Take a look at this little script in which we verify that, indeed, the order of differentiation doesn't alter the result (assuming your function is well-behaved).
import numpy as np

# Dummy arrays and function
x = np.linspace(0, 1, 50)
y = np.linspace(0, 2, 50)
f = np.sin(2*np.pi*x[:, None]) * np.cos(2*np.pi*y)

# Differentiate first by x, then by y
dfdx = np.gradient(f, x, axis=0)
df2dy = np.gradient(dfdx, y, axis=1)

# Differentiate first by y, then by x
dfdy = np.gradient(f, y, axis=1)
df2dx = np.gradient(dfdy, x, axis=0)

# Check how many values are essentially different
print(np.sum(~np.isclose(df2dx, df2dy)))
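Applied to the setup in the question, the same idea would mean evaluating the formula on a genuine 2-D grid first (a sketch; the grid bounds and sizes below are made up):
s0 = np.linspace(-3, 3, 50)                # grid for the first variable
s1 = np.linspace(-3, 3, 50)                # grid for the second variable
F = (s0[:, None]**2) * (s1[None, :]**2)    # a 2-D array, unlike the 1-D formula
d2F = np.gradient(np.gradient(F, s0, axis=0), s1, axis=1)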
Does this apply to your problem?
I have two arrays of the same length, say array x and array y. I want to find the value of y corresponding to x=0.56. This is not a value present in array x.
I would like Python to find by itself the closest value larger than 0.56 (and its corresponding y value) and the closest value smaller than 0.56 (and its corresponding y value), then simply interpolate to find the value of y when x = 0.56.
This is easily done when I find the indices of the two x values and corresponding y values by myself and input them into Python (see following bit of code).
But is there any way for python to find the indices by itself?
#interpolation:
def effective_height(h1, h2, g1, g2):
    return h1 + ((0.56 - g1) / (g2 - g1)) * (h2 - h1)

eff_alt1 = effective_height(x[12], x[13], y[12], y[13])
In this bit of code, I had to find the indices [12] and [13] corresponding to the closest smaller value to 0.56 and the closest larger value to 0.56.
Now I am looking for a similar technique where I would just tell python to interpolate between the two values of x for x=0.56 and print the corresponding value of y when x=0.56.
I have looked at scipy's interpolate but don't think it would help in this case, although further clarification on how I can use it in my case would be helpful too.
Does NumPy's interp do what you want?
import numpy as np
x = [0,1,2]
y = [2,3,4]
np.interp(0.56, x, y)
Out[81]: 2.56
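If you also want the bracketing indices themselves (for example, to reuse your effective_height function), np.searchsorted can find them, assuming x is sorted in increasing order:
i = np.searchsorted(x, 0.56)  # first index with x[i] >= 0.56
# x[i-1] is the closest value smaller than 0.56, x[i] the closest larger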
Given your two arrays, x and y, you can do something like the following using SciPy.
from scipy.interpolate import InterpolatedUnivariateSpline
spline = InterpolatedUnivariateSpline(x, y, k=5)
spline(0.56)
The keyword k must be between 1 and 5, and controls the degree of the spline.
Example:
>>> x = range(10)
>>> y = range(0, 100, 10)
>>> spline = InterpolatedUnivariateSpline(x, y, k=5)
>>> spline(0.56)
array(5.6000000000000017)
I am trying to make a 2D 5850x5850 array from two 1D arrays by putting them into this equation for a 2D Gaussian:
psf = 1/(2*np.pi*sigma_x*sigma_y) * np.exp(-(x**2/(2*sigma_x**2) + y**2/(2*sigma_y**2)))
However, it gives back a 1D array. What am I doing wrong?
If I understand your question correctly:
All you need to do is alter the shape of your arrays.
E.g.
x.shape = (5850, 1)  # now it is a column array
y.shape = (1, 5850)  # now it is a row array
Then you can proceed as in your original post. The result will be a 5850 by 5850 array. Each row will correspond to a different x and each column to a different y.
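Putting that together, a minimal self-contained sketch (the sigma values and grid range are made up for illustration):
import numpy as np
sigma_x, sigma_y = 1.0, 2.0
x = np.linspace(-3, 3, 5850)[:, None]  # column array, shape (5850, 1)
y = np.linspace(-3, 3, 5850)[None, :]  # row array, shape (1, 5850)
# Broadcasting (5850, 1) against (1, 5850) yields a (5850, 5850) grid
psf = 1/(2*np.pi*sigma_x*sigma_y) * np.exp(-(x**2/(2*sigma_x**2) + y**2/(2*sigma_y**2)))
print(psf.shape)  # (5850, 5850)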
However, I would change a few things in your code to make it look like this:
psf = 1/(2*np.pi*sigma_x*sigma_y) * np.exp(-(x*x/(2*sigma_x*sigma_x) + y*y/(2*sigma_y*sigma_y)))
Squaring values with ** is usually inefficient (unless your compiler translates it to multiplication, but in CPython there is no such compiler to rely on); squaring is much slower than multiplication. When you raise a value to a power, the interpreter has to allow for the exponent being negative or not an integer. There is no such overhead when you multiply values.
Try:
for i in range(0, 1000000):
    z = i**2

for i in range(0, 1000000):
    z = i*i
The former ran in 0.975 s on my machine whereas the latter took only 0.267 s.
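A reproducible version of that comparison with timeit (a sketch; absolute timings depend on the machine):
import timeit
print(timeit.timeit("z = i**2", setup="i = 12345", number=1000000))
print(timeit.timeit("z = i*i", setup="i = 12345", number=1000000))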
NumPy doesn't understand that x and y together mean "for every x, do this for each y". If you can't find a library to create 2D functions/Gaussians more conveniently, try:
z = np.empty((len(x), len(y)))
for idx, yval in enumerate(y):
    z[:, idx] = f(x, yval)
where f(x, yval) is your 2D function, but wherever you had y, use yval. There has got to be more support for 2D function creation somewhere; maybe search for SciPy 2D Gaussian functions?
The proper expression to make a 2d Gaussian would be
x = np.arange(0, size, 1, float)  # size: desired width of the square grid
y = x[:, np.newaxis]              # column vector, so x and y broadcast to 2D
x0 = y0 = 0  # your center
np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / radius**2)  # radius acts as the FWHM
I am interested in computing the power spectrum of a system of particles (~100,000) in 3D space with Python. What I have found so far is a group of functions in Numpy (fft,fftn,..) which compute the discrete Fourier transform, of which the square of the absolute value is the power spectrum. My question is a matter of how my data are being represented - and truthfully may be fairly simple to answer.
The data structure I have is an array with a shape of (n, 3), n being the number of particles I have, and each column representing the x, y, or z coordinate of the n particles. The function I believe I should be using is fftn(), which takes the discrete Fourier transform of an n-dimensional array, but it says nothing about the format. How should the data be represented as a data structure to be fed into fftn?
Here is what I've tried so far to test the function:
import numpy as np
import random
import matplotlib.pyplot as plt
DATA = np.zeros((100, 3))
for i in range(len(DATA)):
    DATA[i, 0] = random.uniform(-1, 1)
    DATA[i, 1] = random.uniform(-1, 1)
    DATA[i, 2] = random.uniform(-1, 1)
FFT = np.fft.fftn(DATA)
PS = abs(FFT)**2
plt.plot(PS)
plt.show()
The array entitled DATA is a mock array; ultimately it will be 100,000 by 3 in shape. The output of the code gives me a plot of three curves, one per column of DATA.
As you can see, I think this is giving me three 1D power spectra (one for each column of my data), but really I'd like a power spectrum as a function of radius.
Does anybody have any advice, or know of alternative methods/packages, to compute the power spectrum? (I'd even settle for the two-point autocorrelation function.)
It doesn't quite work the way you are setting it out...
You need a function, let's call it f(x, y, z), that describes the density of mass in space. In your case, you can treat the galaxies as point masses, so you will have a delta function centered at the location of each galaxy. It is for this function that you can calculate the three-dimensional autocorrelation, from which you can calculate the power spectrum.
If you want to use numpy to do that for you, you are first going to have to discretize your function. A possible mock example would be:
import numpy as np
import matplotlib.pyplot as plt
space = np.zeros((100, 100, 100), dtype=np.uint8)

# Drop 1000 point masses at random grid positions
x, y, z = np.random.randint(100, size=(3, 1000))
space[x, y, z] += 1

# Power spectrum: squared magnitude of the DFT of the density field
space_ps = np.abs(np.fft.fftn(space))
space_ps *= space_ps

# By the Wiener-Khinchin theorem, the inverse FFT of the power
# spectrum is the autocorrelation of the original field
space_ac = np.fft.ifftn(space_ps).real.round()
space_ac /= space_ac[0, 0, 0]  # normalize by the zero-lag value
And now space_ac holds the three-dimensional autocorrelation function for the data set. This is not quite what you are after, and to get your one-dimensional correlation function you would have to average the values over spherical shells around the origin:
# Wrap-around distance from the origin along one axis, then squared
dist = np.minimum(np.arange(100), np.arange(100, 0, -1))
dist *= dist

# Euclidean distance of every cell from the origin (with wrap-around)
dist_3d = np.sqrt(dist[:, None, None] + dist[:, None] + dist)

# Average the autocorrelation over all cells at each unique distance
distances, inverse = np.unique(dist_3d, return_inverse=True)
values = np.bincount(inverse, weights=space_ac.ravel()) / np.bincount(inverse)
plt.plot(distances[1:], values[1:])
There is another issue with doing things yourself this way: when you compute the power spectrum as above, it is mathematically as if your three-dimensional array wrapped around at the borders, i.e. point [99, y, z] is a neighbour of [0, y, z]. So your autocorrelation could show two very distant galaxies as close neighbours. The simplest way to deal with this is to make your array twice as large along every dimension, pad with zeros, and discard the extra data afterwards.
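A minimal sketch of that padding step, assuming the 100x100x100 space array from above:
# Double the array along every axis, padding with zeros, so the FFT's
# implicit wrap-around never pairs points from opposite edges
padded = np.pad(space, (0, 100))  # pad 0 cells before and 100 after, on each axis
padded_ps = np.abs(np.fft.fftn(padded))**2
padded_ac = np.fft.ifftn(padded_ps).real.round()
padded_ac = padded_ac[:100, :100, :100]  # keep only the physical region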
Alternatively, you could use scipy.ndimage.correlate (formerly scipy.ndimage.filters.correlate) with mode='constant' to do all the dirty work for you.
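A sketch of that route on a deliberately small grid (direct correlation with a kernel this large is far too slow for a 100^3 array, so this uses 20^3 just to show the call):
import numpy as np
from scipy import ndimage
small = np.zeros((20, 20, 20))
x, y, z = np.random.randint(20, size=(3, 50))
small[x, y, z] += 1
# mode='constant' pads with zeros, so distant points are never
# treated as neighbours the way the periodic FFT does
ac = ndimage.correlate(small, small, mode='constant')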