Updated: Apply (vectorized) function on each cell to interpolate grid - python

I've got a question. I've used these SO threads this, this and that to get where I'm now.
I've got a DEM file and coordinates + data from weather-stations. Now, I would like to interpolate the air temperature data following the GIDS model (Model 12 in this article) using my DEM. For selection of stations I want to use the 8 nearest neighbors using KDTree.
In short, (I think) I want to evaluate a function at every cell using the coordinates and elevation of my DEM.
I've developed a working function which uses x,y as input to evaluate each value of my grid. See details my IPython Notebook.
But now for a whole numpy array. I somehow understand that I have to vectorize my function so that I can apply it on a Numpy array instead of using a double loop. See my simplified code to evaluate my function on the array using both for-loop and a trial with a vectorized function using a numpy meshgrid. Is this the way forward?
>>> data = [[0.8,0.7,5,25],[2.1,0.71,6,35],[0.75,2.2,8,20],[2.2,2.1,4,18]]
>>> columns = ['Long', 'Lat', 'H', 'T']
>>> df = pd.DataFrame(data, columns=columns)
>>> tree = KDTree(zip(df.ix[:,0],df.ix[:,1]), leafsize=10)
>>> dem = np.array([[5,7,6],[7,9,7],[8,7,4]])
>>> print 'Ground points\n', df
Ground points
Long Lat H T
0 0.80 0.70 5 25
1 2.10 0.71 6 35
2 0.75 2.20 8 20
3 2.20 2.10 4 18
>>> print 'Grid to evaluate\n', dem
Grid to evaluate
[[5 7 6]
[7 9 7]
[8 7 4]]
>>> def f(x,y):
... [see IPython Notebook for details]
... return m( sum((p((d(1,di[:,0])),2)))**-1 ,
... sum(m(tp+(m(b1,(s(pix.ix[0,0],longp))) + m(b2,(s(pix.ix[0,1],latp))) + m(b3,(s(pix.ix[0,2],hp)))), (p((d(1,di[:,0])),2)))) )
...
>>> #Double for-loop
...
>>> tp = np.zeros([dem.shape[0],dem.shape[1]])
>>> for x in range(dem.shape[0]):
... for y in range(dem.shape[1]):
... tp[x][y] = f(x,y)
...
>>> print 'T predicted\n', tp
T predicted
[[ 24.0015287 18.54595636 19.60427132]
[ 28.90354881 20.72871172 17.35098489]
[ 54.69499782 43.79200925 15.33702417]]
>>> # Evaluation of vectorized function using meshgrid
...
>>> x = np.arange(0,3,1)
>>> y = np.arange(0,3,1)
>>> xx, yy = np.meshgrid(x,y, sparse=True)
>>> f_vec = np.vectorize(f) # vectorization of function f
>>> tp_vec = f_vec(xx,yy).T
>>> print 'meshgrid\nx\n', xx,'\ny\n',yy
meshgrid
x
[[0 1 2]]
y
[[0]
[1]
[2]]
>>> print 'T predicted using vectorized function\n', tp_vec
T predicted using vectorized function
[[ 24.0015287 18.54595636 19.60427132]
[ 28.90354881 20.72871172 17.35098489]
[ 54.69499782 43.79200925 15.33702417]]
EDIT
I used %%timeit to check on real data with grid with size of 100,100 and results were as follow:
#double loop
for x in range(100):
for y in range(100):
tp[x][y] = f(x,y)
1 loops, best of 3: 29.6 s per loop
#vectorized
tp_vec = f_vec(xx,yy).T
1 loops, best of 3: 29.5 s per loop
Both not so great..

If using vectorized function for a grid, try building a meshgrid with the shape of the dependent array. Use the components derived from the meshgrid to evaluate each grid cell using the vectorized function. Something like this
def f(x,y):
'...some code...'
single_value = array[x,y] # = dependent array (e.g. DEM)
'...some code...'
return z
x = np.arange(array.shape[0])
y = np.arange(array.shape[1])
xx, yy = np.meshgrid(x,y, sparse=True)
f_vec = np.vectorize(f) # vectorization of function f
tp_vec = f_vec(xx,yy).T

Related

Multiplication for a 3d array and slicing

I have a matrix of size 5 x 98 x 3. I want to find the transpose of each block of 98 x 3 and multiply it with itself to find the standard deviation.
Hence, I want my final answer to be of the size 5 x 3 x 3.
What would be an efficient way of doing this using numpy.
I can currently do this using the following code:
MU.shape[0] = 5
rows = 98
SIGMA = []
for i in np.arange(MU.shape[0]):
SIGMA.append([])
SIGMA[i] = np.matmul(np.transpose(diff[i]),diff[i])
SIGMA = np.array(SIGMA)
SIGMA = SIGMA/rows
Here diff is of the size 5 x 98 x 3.
Use np.einsum to sum-reduce the last axes off against each other -
SIGMA = np.einsum('ijk,ijl->ikl',diff,diff)
SIGMA = SIGMA/rows
Use optimize flag with True value in np.einsum to leverage BLAS.
We can also use np.matmul to get those sum-reductions -
SIGMA = np.matmul(diff.swapaxes(1,2),diff)
You can use this:
my_result = arr1.swapaxes(1,2) # arr1
Testing it out:
import numpy as np
NINETY_EIGHT = 10
arr1 = np.arange(5*NINETY_EIGHT*3).reshape(5,NINETY_EIGHT,3)
my_result = arr1.swapaxes(1,2) # arr1
print (my_result.shape)
Output:
(5, 3, 3)

Python: Counting Points from 2 1D Arrays

I have a data set of 2 1D arrays. My goal is to count the points in each section of a grid (with a size of my choosing).
plt.figure(figsize=(8,7))
np.random.seed(5)
x = np.random.random(100)
y = np.random.random(100)
plt.plot(x,y,'bo')
plt.grid(True)
My Plot
I would like to be able to split each section into is own unique set of 2 1D or 1 2D arrays.
import numpy as np
def split(arr, cond):
return [arr[cond], arr[~cond]]
a = np.array([1,3,5,7,2,4,6,8])
print split(a, a<5)
this will return a list of two arrays containing [1,2,3,4] and [5,6,7,8].
Try using this function based on the conditions you set (intervals of 0.2 it seems)
NOTE: to implement this correctly for your problem, you'll have to modify the split function seeing that you want to split the data into more than two sections. I'll leave that as an exercise for you to do :)
This function takes in two 1D arrays and returns a 2D matrix, in which each element is the number of points in the grid section corresponding to your image:
import numpy as np
def count_points(arr1, arr2, bin_width):
x = np.floor(arr1/bin_width).astype(int) # Bin number for each value
y = np.floor(arr2/bin_width).astype(int) # Bin number for each value
counts = np.zeros(shape=(max(x)+1, max(y)+1), dtype=int)
for i in range(x.shape[0]):
row = max(y) - y[i]
col = x[i]
counts[row, col] += 1
return counts
Note that x and y don't line up with the column and row index, since the origin is at the bottom left in the plot but the "origin" (index [0,0]`) of the matrix is the top left. I rearranged the matrix so that the elements line up with what you see in the photo.
Example:
np.random.seed(0)
x = np.random.random(100)
y = np.random.random(100)
print count_points(x, y, 0.2) # 0.2 matches the default gridlines in matplotlib
# Output:
#[[8 4 5 4 0]
# [2 5 5 7 4]
# [7 1 3 8 3]
# [4 2 5 3 4]
# [4 4 3 1 4]]
Which matches the counts here:

Python two arrays, get all points within radius

I have two arrays, lets say x and y that contain a few thousand datapoints.
Plotting a scatterplot gives a beautiful representation of them. Now I'd like to select all points within a certain radius. For example r=10
I tried this, but it does not work, as it's not a grid.
x = [1,2,4,5,7,8,....]
y = [-1,4,8,-1,11,17,....]
RAdeccircle = x**2+y**2
r = 10
regstars = np.where(RAdeccircle < r**2)
This is not the same as an nxn array, and RAdeccircle = x**2+y**2 does not seem to work as it does not try all permutations.
You can only perform ** on a numpy array, But in your case you are using lists, and using ** on a list returns an error,so you first need to convert the list to numpy array using np.array()
import numpy as np
x = np.array([1,2,4,5,7,8])
y = np.array([-1,4,8,-1,11,17])
RAdeccircle = x**2+y**2
print RAdeccircle
r = 10
regstars = np.where(RAdeccircle < r**2)
print regstars
>>> [ 2 20 80 26 170 353]
>>> (array([0, 1, 2, 3], dtype=int64),)

Using scipy's kmeans2 function in python

I found this example for using kmeans2 algorithm in python. I can't get the following part
# make some z vlues
z = numpy.sin(xy[:,1]-0.2*xy[:,1])
# whiten them
z = whiten(z)
# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)
The points are zip(xy[:,0],xy[:,1]), so what is the third value z doing here?
Also what is whitening?
Any explanation is appreciated. Thanks.
First:
# make some z vlues
z = numpy.sin(xy[:,1]-0.2*xy[:,1])
The weirdest thing about this is that it's equivalent to:
z = numpy.sin(0.8*xy[:, 1])
So I don't know why it's written that way. maybe there's a typo?
Next,
# whiten them
z = whiten(z)
whitening is simply normalizing the variance of the population. See here for a demo:
>>> z = np.sin(.8*xy[:, 1]) # the original z
>>> zw = vq.whiten(z) # save it under a different name
>>> zn = z / z.std() # make another 'normalized' array
>>> map(np.std, [z, zw, zn]) # standard deviations of the three arrays
[0.42645, 1.0, 1.0]
>>> np.allclose(zw, zn) # whitened is the same as normalized
True
It's not obvious to me why it is whitened. Anyway, moving along:
# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)
Let's break that into two parts:
data = np.array(zip(xy[:, 0], xy[:, 1], z))
which is a weird (and slow) way of writing
data = np.column_stack([xy, z])
In any case, you started with two arrays and merge them into one:
>>> xy.shape
(30, 2)
>>> z.shape
(30,)
>>> data.shape
(30, 3)
Then it's data that is passed to the kmeans algorithm:
res, idx = vq.kmeans2(data, 3)
So now you can see that it's 30 points in 3d space that are passed to the algorithm, and the confusing part is how the set of points were created.

2 dimensional interpolation problem

I have DATA on x and y axes and the output is on z
for example
y = 10
x = [1,2,3,4,5,6]
z = [2.3,3.4,5.6,7.8,9.6,11.2]
y = 20
x = [1,2,3,4,5,6]
z = [4.3,5.4,7.6,9.8,11.6,13.2]
y = 30
x = [1,2,3,4,5,6]
z = [6.3,7.4,8.6,10.8,13.6,15.2]
how can i find the value of z when y = 15 x = 3.5
I was trying to use scipy but i am very new at it
Thanks a lot for the help
vibhor
scipy.interpolate.bisplrep
Reference:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.bisplrep.html
import scipy
import math
import numpy
from scipy import interpolate
x= [1,2,3,4,5,6]
y= [10,20,30]
Y = numpy.array([[i]*len(x) for i in y])
X = numpy.array([x for i in y])
Z = numpy.array([[2.3,3.4,5.6,7.8,9.6,11.2],
[4.3,5.4,7.6,9.8,11.6,13.2],
[6.3,7.4,8.6,10.8,13.6,15.2]])
tck = interpolate.bisplrep(X,Y,Z)
print interpolate.bisplev(3.5,15,tck)
7.84921875
EDIT:
Upper solution does not give you perfect fit.
check
print interpolate.bisplev(x,y,tck)
[[ 2.2531746 4.2531746 6.39603175]
[ 3.54126984 5.54126984 7.11269841]
[ 5.5031746 7.5031746 8.78888889]
[ 7.71111111 9.71111111 10.9968254 ]
[ 9.73730159 11.73730159 13.30873016]
[ 11.15396825 13.15396825 15.2968254 ]]
to overcome this interpolate whit polyinomials of 5rd degree in x and 2nd degree in y direction
tck = interpolate.bisplrep(X,Y,Z,kx=5,ky=2)
print interpolate.bisplev(x,y,tck)
[[ 2.3 4.3 6.3]
[ 3.4 5.4 7.4]
[ 5.6 7.6 8.6]
[ 7.8 9.8 10.8]
[ 9.6 11.6 13.6]
[ 11.2 13.2 15.2]]
This yield
print interpolate.bisplev(3.5,15,tck)
7.88671875
Plotting:
reference http://matplotlib.sourceforge.net/examples/mplot3d/surface3d_demo.html
fig = plt.figure()
ax = Axes3D(fig)
ax.plot_surface(X, Y, Z,rstride=1, cstride=1, cmap=cm.jet)
plt.show()
Given (not as Python code, since the second assignment would obliterate the first in each case, of course;-):
y = 10
x = [1,2,3,4,5,6]
z = [2.3,3.4,5.6,7.8,9.6,11.2]
y = 20
x = [1,2,3,4,5,6]
z = [4.3,5.4,7.6,9.8,11.6,13.2]
you ask: "how can i find the value of z when y = 15 x = 3.5"?
Since you're looking at a point exactly equidistant in both x and y from the given "grid", you just take the midpoint between the grid values (if you had values not equidistant, you'd take a proportional midpoint, see later). So for y=10, the z values for x 3 and 4 are 5.6 and 7.8, so for x 3.5 you estimate their midpoint, 6.7; and similarly for y=20 you estimate the midpoint between 7.6 and 9.8, i.e., 8.7. Finally, since you have y=15, the midpoint between 6.7 and 8.7 is your final interpolated value for z: 7.7.
Say you had y=13 and x=3.8 instead. Then for x you'd take the values 80% of the way, i.e.:
for y=10, 0.2*5.6+0.8*7.8 -> 7.36
for y=20, 0.2*7.6+0.8*9.8 -> 9.46
Now you want the z 30% of the way between these, 0.3*7.36 + 0.7*9.46 -> 8.83, that's z.
This is linear interpolation, and it's really very simple. Do you want to compute it by hand, or find routines that do it for you (given e.g. numpy arrays as "the grids")? Even in the latter case, I hope this "manual" explanation (showing what you're doing in the most elementary of arithmetical terms) can help you understand what you're doing...;-).
There are more advanced forms of interpolation, of course -- do you need those, or does linear interpolation suffice for your use case?
I would say just take the average of the values around it. So if you need X=3.5 and Y=15 (3.5,15), you average (3,10), (3,20), (4,10) and (4,20). Since I have no idea what the data is you are dealing with, I am not sure if the exact proximity would matter - in which case you can just stick w/the average - or if you need to do some sort of inverse distance weighting.

Categories

Resources