Improving efficiency of creating multidimensional array from function - python

This will be a pretty basic question but I am a bit stuck on two things.
I have some data stored in a 2D array, let's just call it z. I have two separate 2D arrays, nxp and nyp that hold mapping information for every element in z. nxp and nyp therefore currently hold Cartesian co-ordinates and I want to transform this to polar co-ordinates.
Following this, I have defined polar to convert a given (x,y) to (r, theta) as:
import numpy as np
import math
def polar(x, y):
    '''
    Args:
        x (double): x-coordinate.
        y (double): y-coordinate.
    Returns:
        r, theta (in degrees).
    '''
    r = np.hypot(x, y)
    theta = math.degrees(math.atan2(y, x))
    return r, theta
But from this point on I think everything I am doing is a really bad approach to this problem. Ideally I would like to just feed in the Cartesian arrays and get back the polar arrays but this doesn't seem to work with my defined function (which is probably because I've defined input type as double implicitly but I was hoping python would be able to overload here).
r, theta = polar(nxp, nyp)
The traceback is:
.... in polar
theta = math.degrees(math.atan2(y,x))
TypeError: only size-1 arrays can be converted to Python scalars
So I am now implementing transforming everything to a 1D list and iterating to populate r and theta. E.g.
nxp_1D = nxp.ravel()
nyp_1D = nyp.ravel()
for counter, value in enumerate(nxp_1D):
    r, theta = polar(value, nyp_1D[counter])
This exact implementation is faulty as it returns just a single value for r and theta, rather than populating a list of values.
More generally though I really don't like this approach for a few reasons. It looks to be a very heavy-handed solution to this problem. On top of this, I might want to do some contourf plots later and this would necessitate converting r and theta back to their original array shapes.
Is there a much easier and more efficient way for me to create the 2D arrays r and theta? Is it possible to create them either by changing my polar function definition or maybe by using list comprehension?
Thanks for any responses.

Yep, OK, so that was a very easy fix. Thank you to @user202729 and @Igor Raush. It was as simple as:
def polar(x, y):
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    return r, theta
.....
r, theta = polar(nxp, nyp)
Sorry for how daft that question was but thanks for your responses.
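Note that np.arctan2 returns radians, while the docstring in the question promised degrees. A minimal sketch that keeps the degree convention and works element-wise on arrays of any shape (the small test arrays are made up for illustration):

```python
import numpy as np

def polar(x, y):
    """Convert Cartesian arrays to polar; theta is returned in degrees."""
    r = np.hypot(x, y)                    # element-wise sqrt(x**2 + y**2)
    theta = np.degrees(np.arctan2(y, x))  # element-wise angle, radians -> degrees
    return r, theta

# Hypothetical 2x2 coordinate arrays standing in for nxp and nyp
nxp = np.array([[1.0, 0.0], [0.0, -1.0]])
nyp = np.array([[0.0, 1.0], [-1.0, 0.0]])
r, theta = polar(nxp, nyp)
print(r.shape, theta.shape)  # both keep the input shape
```

Because the outputs keep the shape of the inputs, no ravel or reshape step is needed before plotting with contourf.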

Related

Interpolate rows simultaneously in Python

I am trying to vectorize my code and have reached a roadblock. I have :
nxd array of x values [[x1],[...],[xn]] (where each row [x1] has many points [x11, ..., x1d])
nxd array of y values [[y1],[...],[yn]] (where each row [y1] has many points [y11, ..., y1d])
nx1 array of x' values [[x'1],[...],[x'n]] that I would like to interpolate a y value for based on the corresponding row of x and y
The only thing I can think to use is a list comprehension like [np.interp(x'[i,:], x[i,:], y[i,:]) for i in range(n)]. I'd like a faster vectorized option if one exists. Thanks for the help!
This is hardly an answer, but I guess it may still be useful for someone (if not, feel free to delete this); and by the way,
I think I misunderstood your question at first. What you have is a collection of n different one-dimensional datasets or functions y(x) that you want to interpolate (correct me otherwise).
As such, it turns out doing this by multidimensional interpolation is a terrible approach.
The idea I had was to add a new dimension to the data so that your datasets are mapped into one single dataset, in which this new dimension is what distinguishes between the different xi, where i=1,2,...,n. In other words, you assign a value in this new dimension, let's say z, to every row of x; this way, the different functions are correctly mapped to this higher-dimensional space.
However, this approach is slower than the np.interp list comprehension solution, by at least one order of magnitude on my computer. I guess it has to do with two-dimensional interpolation algorithms being at best of order O(n log(n)) (this is a guess); in this sense, it would seem more efficient to perform multiple interpolations on different datasets rather than one big interpolation.
Anyways, the approach is shown in the following snippet:
import numpy as np
from scipy.interpolate import LinearNDInterpolator
def vectorized_interpolation(x, y, xq):
    """
    Vectorized option using LinearNDInterpolator
    """
    # Dummy new data points in added dimension
    z = np.arange(x.shape[0])
    # We must repeat every z value for every row of x
    interpolant = LinearNDInterpolator(list(zip(x.ravel(), np.repeat(z, x.shape[1]))), y.ravel())
    return interpolant(xq, z)
def non_vectorized_interpolation(x, y, xq):
    """
    Your non-vectorized solution
    """
    return np.array([np.interp(xq[i], x[i], y[i]) for i in range(x.shape[0])])
if __name__ == "__main__":
    n, d = 100, 500
    x = np.linspace(0, 2*np.pi, n*d).reshape((n, d))
    y = np.sin(x)
    xq = np.linspace(0, 2*np.pi, n)
    yq1 = vectorized_interpolation(x, y, xq)
    yq2 = non_vectorized_interpolation(x, y, xq)
The only advantage of the vectorized solution is that LinearNDInterpolator (and some of the other scipy.interpolate functions) explicitly calculates the interpolant, so you can reuse it if you plan on interpolating the same datasets several times and avoid repetitive calculations. Another thing you could try is using multiprocessing if you have several cores in your machine, but this is not vectorizing which is what you asked for. Sorry I can't be of more help.
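For completeness, here is a sketch of a different, purely NumPy route that avoids both the Python loop and the N-dimensional interpolator. It is not taken from the answer above: the helper name rowwise_interp is hypothetical, and it assumes every row of x is strictly increasing and that there is exactly one query point per row. It locates the bracketing samples in each row with a comparison and then applies the usual linear interpolation formula, clamping outside the data range the way np.interp does.

```python
import numpy as np

def rowwise_interp(x, y, xq):
    """Linear interpolation of one query per row (rows of x strictly increasing)."""
    n, d = x.shape
    xq = np.asarray(xq, dtype=float).reshape(n)
    # Number of samples strictly below each query = index of the right neighbour
    hi = np.clip(np.sum(x < xq[:, None], axis=1), 1, d - 1)
    lo = hi - 1
    rows = np.arange(n)
    x0, x1 = x[rows, lo], x[rows, hi]
    y0, y1 = y[rows, lo], y[rows, hi]
    # Fractional position inside the bracket, clamped so that queries
    # outside the data range return the end values (as np.interp does)
    t = np.clip((xq - x0) / (x1 - x0), 0.0, 1.0)
    return y0 + t * (y1 - y0)

# Made-up test data: 5 datasets with 20 samples each
rng = np.random.default_rng(0)
x = np.sort(rng.random((5, 20)), axis=1)
y = np.sin(x)
xq = rng.random(5)
yq = rowwise_interp(x, y, xq)
```

The comparison x < xq[:, None] costs O(n*d) memory and time, so for very long rows a per-row binary search may still be preferable.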

non uniform spacing, multivariate derivative with numpy.gradient

I'm trying to get the second derivative of the following formula using numpy.gradient, differentiating it once with respect to S[:,0] and then with respect to S[:,1]:
S = np.random.multivariate_normal(mean, covariance, N)
formula = (S[:,0]**2) * (S[:,1]**2)
But the thing is when I use spacings as the second argument of numpy.gradient
dx = np.diff(S[:,0])
dy = np.diff(S[:,1])
dfdx = np.gradient(formula,dx)
I get the error saying
ValueError: when 1d, distances must match the length of the corresponding dimension
And I get that's because the spacings vector length is one element less than the formula's, but I didn't know what to do to fix that.
I've also read somewhere that you can pass the coordinates of the points, rather than the spacing, as the second argument. I tried checking the result of that by differentiating the formula first by S[:,0] and then by S[:,1], and then differentiating it in the opposite order, first by S[:,1] and then by S[:,0]. The two results should be similar, but there was a huge difference between them.
Can anybody explain to me what I'm doing wrong here?
When you pass the coordinates of your function's sample points to NumPy's gradient, you have to be careful either to pass one coordinate array per dimension of your function, or to specify (with the axis argument of gradient) the axis along which you want to calculate the gradient.
When you checked both orders of differentiation, I think the problem is that your formula isn't actually two-dimensional, but one-dimensional (even though you use data from two variables, note that your formula array has only one dimension).
Take a look at this little script in which we verify that, indeed, the order of differentiation doesn't alter the result (assuming your function is well-behaved).
import numpy as np
# Dummy arrays and function
x = np.linspace(0,1,50)
y = np.linspace(0,2,50)
f = np.sin(2*np.pi*x[:,None]) * np.cos(2*np.pi*y)
dfdx = np.gradient(f, x, axis = 0)
df2dy = np.gradient(dfdx, y, axis = 1)
dfdy = np.gradient(f, y, axis = 1)
df2dx = np.gradient(dfdy, x, axis = 0)
# Check how many values are essentially different
print(np.sum(~np.isclose(df2dx, df2dy)))
Does this apply to your problem?
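As for the ValueError in the question, a small sketch (with a made-up non-uniform grid and f = x**2 as the test function) shows the difference between passing spacings from np.diff and passing the coordinates themselves:

```python
import numpy as np

# Non-uniformly spaced sample positions (hypothetical example grid)
x = np.linspace(0.0, 1.0, 50) ** 2
f = x ** 2

# Coordinates have the same length as f, so this is accepted
dfdx = np.gradient(f, x)

# np.diff(x) is one element shorter than f, which np.gradient rejects
try:
    np.gradient(f, np.diff(x))
    diff_rejected = False
except ValueError:
    diff_rejected = True

print(diff_rejected)
print(np.max(np.abs(dfdx[1:-1] - 2 * x[1:-1])))  # interior error vs. exact 2x
```

This only covers a one-dimensional array sampled along a single sorted coordinate; scattered samples such as those drawn from multivariate_normal do not lie on a grid, so np.gradient cannot treat them as a two-dimensional function.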

Any differences in 3d interpolation between MATLAB and Numpy/Scipy?

I'm a MATLAB user and I'm trying to translate some code in Python as an assignment. Since I noticed some differences between the two languages in 3d interpolation results from my original code, I am trying to address the issue by analysing a simple example.
I set a 2x2x2 matrix (named blocc below) with some values, and its coordinates in three vectors (X,Y,Z). Given a query point, I use 3D linear interpolation to find the interpolated value. Again, I get different results in MATLAB and Python (code below).
Python
import numpy as np
import scipy.interpolate as si
X,Y,Z =(np.array([1, 2]),np.array([1, 2]),np.array([1, 2]))
a = np.ones((2,2,1))
b = np.ones((2,2,1))*2
blocc = np.concatenate((a,b),axis=2) # Matrix with values
blocc[1,0,0]=7
blocc[0,1,1]=7
qp = np.array([2,1.5,1.5]) #My query point
value=si.interpn((X,Y,Z),blocc,qp,'linear')
print(value)
Here I get value=3
MATLAB
blocc = zeros(2,2,2);
blocc(:,:,1) = ones(2,2);
blocc(:,:,2) = ones(2,2)*2;
blocc(2,1,1)=7;
blocc(1,2,2)=7;
X=[1,2];
Y=[1,2];
Z=[1,2];
qp = [2 1.5 1.5];
value=interp3(X,Y,Z,blocc,qp(1),qp(2),qp(3),'linear')
And here value=2.75
I can't understand why: I think there is something I don't get about how interpolation and/or matrix indexing works in Python. Can you please make it clear for me? Thanks!
Apparently, when X, Y and Z are grid vectors, MATLAB's interp3 assumes that the dimensions of the values array are ordered (Y, X, Z). From the documentation:
V — Sample values
array
Sample values, specified as a real or complex array. The size requirements for V depend on the size of X, Y, and Z:
If X, Y, and Z are arrays representing a full grid (in meshgrid format), then the size of V matches the size of X, Y, or Z .
If X, Y, and Z are grid vectors, then size(V) = [length(Y) length(X) length(Z)].
If V contains complex numbers, then interp3 interpolates the real and imaginary parts separately.
Example: rand(10,10,10)
Data Types: single | double
Complex Number Support: Yes
This means that, to get the same result in Python, you just need to swap the first and second values in the query:
qp = np.array([1.5, 2, 1.5])
f = si.interpn((X, Y, Z), blocc, qp, 'linear')
print(f)
# [2.75]
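An equivalent option, sketched here as an alternative to swapping the query coordinates, is to keep the query in MATLAB's (x, y, z) order and instead reorder the axes of the values array from (Y, X, Z) to (X, Y, Z). The variable name blocc_xyz is introduced only for this illustration.

```python
import numpy as np
import scipy.interpolate as si

X, Y, Z = np.array([1, 2]), np.array([1, 2]), np.array([1, 2])
blocc = np.concatenate((np.ones((2, 2, 1)), 2 * np.ones((2, 2, 1))), axis=2)
blocc[1, 0, 0] = 7
blocc[0, 1, 1] = 7

qp = np.array([2, 1.5, 1.5])                 # same query as in the MATLAB code
blocc_xyz = np.transpose(blocc, (1, 0, 2))   # swap the first two axes
value = si.interpn((X, Y, Z), blocc_xyz, qp, method='linear')
print(value)
```

Transposing once is convenient when many query points ported from MATLAB have to be evaluated against the same array.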

Why is the output of linspace and interp1d always the same?

So I was doing my assignment and we are required to use interpolation (linear interpolation) for the same. We have been asked to use the interp1d package from scipy.interpolate and use it to generate new y values given new x values and old coordinates (x1,y1) and (x2,y2).
To get new x coordinates (let's call this x_new) I used np.linspace between (x1,x2), and I found the new y coordinates (let's call this y_new) by applying the interp1d function to x_new.
However, I also noticed that applying np.linspace to (y1,y2) generates exactly the same values as the y_new we got from interp1d on x_new.
Can anyone please explain to me why this is so? And if this is true, is it always true?
And if this is always true, why do we need the interp1d function at all when we can use np.linspace in its place?
Here is the code I wrote:
import scipy.interpolate as ip
import numpy as np
x = [-1.5, 2.23]
y = [0.1, -11]
x_new = np.linspace(start=x[0], stop=x[-1], num=10)
print(x_new)
y_new = np.linspace(start=y[0], stop=y[-1], num=10)
print(y_new)
f = ip.interp1d(x, y)
y_new2 = f(x_new)
print(y_new2) # y_new2 values always the same as y_new
You stumbled upon this because you use only two points for a linear interpolation. Your input is two different x values with their corresponding y values. You then ask interp1d for a linear function f(x)=m*x+b that passes through your input data. With only two input points there is exactly one such function, because a straight line is fully determined by two points. To see this, take a piece of paper, draw two dots, and then think about how many straight lines you can draw through both of them.
The linear function that you get from two input points is defined by the parameters m=(y1-y2)/(x1-x2) and b=y1-m*x1, where (x1,y1) and (x2,y2) are your two input points (the elements of your x and y arrays in your code snippet).
So, what does np.linspace(start, stop, num,...) do? It gives you num evenly spaced points between start and stop. These points are start, start + delta, ..., stop. The step width delta is given by delta=(stop-start)/(num - 1). The -1 comes from the fact that the endpoint is included. So the nth point in your interval (counting from n=0) lies at xn=x1+n*(x2-x1)/(num-1). At what y values do these points end up after we apply our linear function from interp1d? Let's plug it in:
f(xn)=m*xn+b=(y1-y2)/(x1-x2)*(x1+n/(num-1)*(x2-x1)) + y1-(y1-y2)/(x1-x2)*x1. The two terms in m*x1 cancel, and simplifying the rest gives f(xn)=(y2-y1)*n/(num - 1) + y1. And this is exactly what you get from np.linspace(y1,y2,num), i.e. f(xn)=yn!
Now, does this always work? No! We made use of the fact that our linear function is defined by the two endpoints of the intervals we use in np.linspace. So this will not work in general. Try to add one more x value and one more y value in your input list and then compare the results.
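A quick sketch of that experiment is shown below. It uses np.interp as a stand-in for interp1d (both perform piecewise-linear interpolation by default), and the extra point (0.0, 5.0) as well as the helper name matches_linspace are made up for illustration.

```python
import numpy as np

def matches_linspace(x, y, num=10):
    """Compare linear interpolation on x_new with np.linspace over the y endpoints."""
    x_new = np.linspace(x[0], x[-1], num)
    y_lin = np.linspace(y[0], y[-1], num)
    y_int = np.interp(x_new, x, y)   # piecewise-linear interpolation
    return np.allclose(y_lin, y_int)

two_points = matches_linspace([-1.5, 2.23], [0.1, -11])
three_points = matches_linspace([-1.5, 0.0, 2.23], [0.1, 5.0, -11])
print(two_points)    # True: the data define a single straight line
print(three_points)  # False: the middle point bends the interpolant
```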

2d array for 2d function from 2 1d arrays (Python)

I am trying to make a 2D 5850x5850 array from two 1D arrays by putting them into this equation for a 2D Gaussian.
psf = 1/(2*np.pi*sigma_x*sigma_y) * np.exp(-(x**2/(2*sigma_x**2) + y**2/(2*sigma_y**2)))
However, it gives back a 1D array. What am I doing wrong?
If I understand your question correctly:
All you need to do is change the shape of your arrays.
E.g.
x.shape=(5850,1) # now it is column array
y.shape=(1,5850) # now it is row array
Then you can proceed as in your original post. The result will be a 5850 by 5850 array: each row corresponds to a different x and each column to a different y.
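The same broadcasting idea can be sketched without modifying x and y in place, by indexing with np.newaxis. The array sizes and sigma values below are small made-up stand-ins for the 5850-element arrays in the question.

```python
import numpy as np

sigma_x, sigma_y = 2.0, 3.0        # hypothetical widths
x = np.linspace(-5.0, 5.0, 11)     # stand-in for the first 1D array
y = np.linspace(-6.0, 6.0, 13)     # stand-in for the second 1D array

xc = x[:, np.newaxis]              # shape (11, 1): column
yr = y[np.newaxis, :]              # shape (1, 13): row

# Broadcasting (11, 1) against (1, 13) yields an (11, 13) array
psf = 1 / (2*np.pi*sigma_x*sigma_y) * np.exp(-(xc**2/(2*sigma_x**2) + yr**2/(2*sigma_y**2)))
print(psf.shape)
```

Indexing with np.newaxis creates views, so the original 1D arrays keep their shape for later use.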
However, I would change a few things in your code so that it looks like this:
psf = 1/(2*np.pi*sigma_x*sigma_y) * np.exp(-(x*x/(2*sigma_x*sigma_x) + y*y/(2*sigma_y*sigma_y)))
Squaring values with ** is usually inefficient (unless your compiler translates it to multiplication, but in Python there is no compiler to rely on). Raising to a power is much slower than multiplication: when you take a value to a power, your computer has to be prepared for the exponent being negative or not an integer. There is no such overhead when you multiply values.
Try:
for i in range(0, 1000000):
    z = i**2
for i in range(0, 1000000):
    z = i*i
The former ran in 0.975 s on my machine, whereas the latter took only 0.267 s.
NumPy doesn't understand that x and y are meant as "for every x, do this for each y". If you can't find a library that creates 2D functions/Gaussians more conveniently, try:
z = np.empty((len(x), len(y)))
for idx, yval in enumerate(y):
    z[:,idx] = f(x, yval)
Here f(x, yval) is your 2D function, with yval used wherever you have y. There's got to be more support for 2D function creation somewhere; maybe try searching for SciPy 2D Gaussian functions?
The proper expression to make a 2d Gaussian would be
x = np.arange(0, size, 1, float)
y = x[:,np.newaxis]
x0 = y0 = 0 # your center
np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / radius**2)
