Selecting multiple values from 3d numpy in efficient way - python

I have a very large 3D NumPy array from which I want to extract many values at given (x, y, z) indices.
For the sake of simplicity, let's say this is the array:
import numpy as np
a = np.arange(64).reshape(4,4,4)
From which I want to extract the values of the following collection of points:
points = [[3,0,0],[0,1,0],[3,0,1],[2,3,1]]
In this example, the expected result should be:
[48,4,49,45]
Because performance matters, I want to avoid iterating as in the following code:
points = [[3,0,0],[0,1,0],[3,0,1],[2,3,1]]
for i in points:
    print(a[i[0],i[1],i[2]])

Try this. It uses NumPy advanced (fancy) indexing: passing one index array per axis selects the elements point by point.
>>> import numpy as np
>>> a = np.arange(64).reshape(4,4,4)
>>> points = [[3,0,0],[0,1,0],[3,0,1],[2,3,1]]
>>> points = np.array(points)
>>> i = points[:, 0]
>>> j = points[:, 1]
>>> k = points[:, 2]
>>> a[i, j, k]
array([48, 4, 49, 45])
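The three per-axis index arrays above can also be built in one step by transposing the points array and passing it as a tuple. A minimal sketch, assuming every point is a valid in-bounds integer index (the name values is only illustrative):

```python
import numpy as np

a = np.arange(64).reshape(4, 4, 4)
points = np.array([[3, 0, 0], [0, 1, 0], [3, 0, 1], [2, 3, 1]])

# points.T has shape (3, n_points); as a tuple it provides one index
# array per axis, which is what advanced indexing expects.
values = a[tuple(points.T)]
print(values)  # [48  4 49 45]
```

The tuple matters here: indexing with the 2D array itself (a[points.T]) would select whole sub-arrays along the first axis instead of single elements.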

Related

2D indexing of scipy sparse matrix

import numpy as np
import scipy.sparse
x = np.random.randint(0, 1000, (1000, 100))
# prob better way to do this
d = np.random.random((1000,1000))
d[d < 0.99] = 0
y = scipy.sparse.csr_matrix(d)
What I would like to do is create a new matrix z containing the values of y at the indices in x,
i.e. z[0, 0] should contain y[0, x[0, 0]] and
z[0, 1] should contain y[0, x[0, 1]].
%time for i in range(1000): y[i, x[i]].todense()
~247ms
%time for i in range(1000): np.take(y[i].todense(), x[i])
~150ms
Both of the above work, but I am looking for a faster method; this is currently the bottleneck in my code.
Please assume that representing the whole scipy.sparse matrix as dense isn't feasible.
Edit:
%time z = np.vstack([q.todense()[0, p] for q, p in zip(y, x)])
is ~110ms
The answer seems to be to use an appropriately shaped broadcasting index, as outlined here: How to generate multi-dimensional 2D numpy index using a sub-index for one dimension
(that answer deserves more upvotes!)
%time res = y[np.arange(0, 1000).reshape((-1, 1)), x].todense()
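To see why the broadcasting index gives the wanted result, here is a small sketch on a toy matrix. The sizes, the seed, and the names rows, dense and z are assumptions chosen for illustration; the dense copy exists only to check the result and would not be built for a large matrix.

```python
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
n_rows, n_cols, n_pick = 6, 8, 3

dense = rng.random((n_rows, n_cols))
dense[dense < 0.5] = 0                      # make the matrix sparse-ish
y = scipy.sparse.csr_matrix(dense)
x = rng.integers(0, n_cols, size=(n_rows, n_pick))

# A column of row numbers, shape (n_rows, 1), broadcasts against x,
# shape (n_rows, n_pick), so that z[r, c] == y[r, x[r, c]].
rows = np.arange(n_rows).reshape(-1, 1)
z = y[rows, x].toarray()

# Check against the same indexing on the dense copy.
assert np.array_equal(z, dense[rows, x])
```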

Python - Library function that given a X,Y pair of point find Xn, Yn which is the closest pair to that pair

Disclaimer: I'm looking for a library or pre-existing function that accomplishes this. Similar questions ask about the fundamental algorithm, whereas I am looking for a quick implementation. So I apologize if this appears to be a duplicate question, as I'm just looking for a black-box answer.
Given a pair of geo coordinate points:
[34.232,-119.123]
And an array of other points:
[ [36.232,-117.123], [35.232,-119.123], [33.232,-112.123] ]
I'm looking for an existing function that returns the pair from the list above that is closest to the original coordinate.
Edited from simple integers to float values
Per comment:
from scipy.spatial.distance import cdist
import numpy as np
def closest(point, ref):
    dist = cdist(ref, [point])
    return ref[np.argmin(dist)]
point = [1,2]
ref = [ [3,1], [4,1], [2,5] ]
closest(point,ref)
# out [3,1]
My two cents:
from scipy.spatial.distance import euclidean
from functools import partial
key = partial(euclidean, [1,2])
lst = [[3, 1], [4, 1], [2, 5]]
res = min(lst, key=key)
print(res)
Output
[3, 1]
One more:
from sklearn.neighbors import KDTree
import numpy as np
X = np.array([[3,1], [4,1], [2,5]])
tree = KDTree(X, leaf_size=2)
dist, ind = tree.query(np.array([1,2]).reshape(1,-1), k=1)
X[ind][0][0]
# array([3, 1])
Using NumPy's norm for the Euclidean distance:
import numpy as np
def fun(x, points):
    points = np.array(points)
    return points[np.argmin(np.linalg.norm(points-np.array(x), axis=1))]
print (fun([1,2], [[3,1], [4,1], [2,5]]))
print (fun([1,2], [[3,1], [2,1], [2,5]]))
Output:
[3 1]
[2 1]
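All of the answers above measure Euclidean distance on the raw numbers. Since the question is about geo coordinates, that is only an approximation, because a degree of longitude covers less ground away from the equator. A minimal sketch that ranks candidates by great-circle distance instead, assuming the pairs are (latitude, longitude) in degrees; the function name closest_geo is hypothetical and not from any library:

```python
import numpy as np

def closest_geo(point, ref):
    """Return the row of ref nearest to point by great-circle distance.

    point is (lat, lon) in degrees; ref is a sequence of (lat, lon) pairs.
    """
    ref = np.asarray(ref, dtype=float)
    lat1, lon1 = np.radians(point)
    lat2, lon2 = np.radians(ref).T
    # Haversine formula; the Earth radius is omitted because only the
    # ordering of the distances matters for argmin.
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    central_angle = 2 * np.arcsin(np.sqrt(h))
    return ref[np.argmin(central_angle)]

nearest = closest_geo([34.232, -119.123],
                      [[36.232, -117.123], [35.232, -119.123], [33.232, -112.123]])
print(nearest)
```

For the example points from the question this picks [35.232, -119.123], which is one degree of latitude away from the query point.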

How can I use a 3D NumPy array of indices to retrieve the corresponding values in a 4D array?

I have a 4d numpy array temperature of data with the measured temperature at points x,y,z and time t. Assuming I have an array indices with the indices where the first instance of a condition is met, say temperature < 0, how do I extract a 3d array with the first temperatures satisfying this condition? That is I'm looking for the equivalent of numpy's 1d version (import numpy as np tacitly assumed)
>>> temperatures = np.arange(10,-10,-1)
>>> ind = np.argmax(temperatures < 0)
>>> T = temperatures[ind]
I have tried the analogous
In [1]: temperatures = np.random.random((11,8,5,200)) * 1000
In [2]: temperatures.shape
Out[2]: (11, 8, 5, 200)
In [3]: indices= np.argmax(temperatures > 900,axis=3)
In [4]: indices.shape
Out[4]: (11, 8, 5)
In [5]: T = temperatures[:,:,:,indices]
In [6]: T.shape
Out[6]: (11, 8, 5, 11, 8, 5)
However, T then has 6 dimensions instead of 3.
I could of course do it with a for loop:
indices = np.argmax(temperatures > 900,axis=3)
x,y,z = temperatures.shape[:-1]
T = np.zeros((x,y,z))
for indx in range(x):
    for indy in range(y):
        for indz in range(z):
            T[indx,indy,indz] = temperatures[indx,indy,indz,indices[indx,indy,indz]]
but I'm looking for something more elegant and more Pythonic. Is there someone more skilled with NumPy out there who can help me out on this?
P.S. For the sake of clarity, I'm not just looking for the temperature at the points given by indices, I'm also looking for other quantities in arrays of the same shape as temperatures, e.g. the time derivative. Also, in reality the arrays are much larger than this minimal example.
NumPy advanced indexing always works:
import numpy as np
temperatures = np.random.random((11,8,5, 200)) * 1000
indices = np.argmax(temperatures > 900, axis=3)
x, y, z = temperatures.shape[:-1]
T = temperatures[np.arange(x)[:, np.newaxis, np.newaxis],
                 np.arange(y)[np.newaxis, :, np.newaxis],
                 np.arange(z)[np.newaxis, np.newaxis, :],
                 indices]
As jdehesa pointed out, this can be made more concise:
x, y, z = np.ogrid[:x, :y, :z]
T = temperatures[x, y, z, indices]
I think you need:
axis = 3
indices = np.argmax(temperatures > 900, axis=axis)
result = np.take_along_axis(temperatures, np.expand_dims(indices, axis), axis)
result = result.squeeze(axis)
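The approaches above can be checked against the explicit loop on a small array. This is a sketch under assumed sizes and a fixed seed; the array other stands in for a hypothetical second quantity (such as the time derivative mentioned in the P.S.) and is not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
temperatures = rng.random((3, 4, 2, 50)) * 1000
other = rng.random(temperatures.shape)  # hypothetical second quantity

indices = np.argmax(temperatures > 900, axis=3)

# Reference result from the explicit triple loop.
nx, ny, nz = temperatures.shape[:-1]
T_loop = np.zeros((nx, ny, nz))
for ix in range(nx):
    for iy in range(ny):
        for iz in range(nz):
            T_loop[ix, iy, iz] = temperatures[ix, iy, iz, indices[ix, iy, iz]]

# Open grids broadcast against indices, which has shape (nx, ny, nz).
gx, gy, gz = np.ogrid[:nx, :ny, :nz]
T_grid = temperatures[gx, gy, gz, indices]

# take_along_axis needs an index array with the same number of dimensions.
T_take = np.take_along_axis(temperatures, indices[..., np.newaxis], axis=3)[..., 0]

# The same index arrays can be reused for any array of the same shape.
other_at_first = other[gx, gy, gz, indices]

assert np.array_equal(T_loop, T_grid) and np.array_equal(T_loop, T_take)
```

One caveat worth keeping in mind: np.argmax returns 0 when the condition is never true along the axis, so positions without any match silently pick the first time step.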

Is it possible to index numpy array with sympy symbols?

Hello, I want to do some summation on a NumPy array like this:
import numpy as np
import sympy as sy
import cv2
i, j = sy.symbols('i j', Integer=True)
#next read some grayscale image to create a numpy array of pixels
a = cv2.imread(filename)
b = sy.summation(sy.summation(a[i][j], (i,0,1)), (j,0,1)) #double summation
but I'm getting an error. Is it possible to use SymPy symbols as NumPy array indexes? If not, can you suggest a solution?
Thanks.
You can't use NumPy objects directly in SymPy expressions, because NumPy objects don't know how to deal with symbolic variables.
Instead, create the thing you want symbolically using SymPy objects, and then lambdify it. The SymPy version of a numpy array is IndexedBase, but it seems there is a bug with it, so, since your array is 2-dimensional, you can also use MatrixSymbol.
In [49]: a = MatrixSymbol('a', 2, 2) # Replace 2, 2 with the size of the array
In [53]: i, j = symbols('i j', integer=True)
In [50]: f = lambdify(a, Sum(a[i, j], (i, 0, 1), (j, 0, 1)))
In [51]: b = numpy.array([[1, 2], [3, 4]])
In [52]: f(b)
Out[52]: 10
(also note that the correct syntax for creating integer symbols is symbols('i j', integer=True), not symbols('i j', Integer=True)).
Note that you have to use a[i, j] instead of a[i][j], which isn't supported.
MatrixSymbol is limited to 2-dimensional matrices. To generalize to arrays of
any dimension, you can generate the expression with IndexedBase. lambdify is
currently incompatible with IndexedBase, but it can be used with
DeferredVectors. So the trick is pass a DeferredVector to lambdify:
import sympy as sy
import numpy as np
a = sy.IndexedBase('a')
i, j, k = sy.symbols('i j k', integer=True)
s = sy.Sum(a[i, j, k], (i, 0, 1), (j, 0, 1), (k, 0, 1))
f = sy.lambdify(sy.DeferredVector('a'), s)
b = np.arange(24).reshape(2,3,4)
result = f(b)
expected = b[:2,:2,:2].sum()
assert expected == result
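When the summation bounds are concrete integers, as in these examples, the symbolic step may not be needed at all: the same total can be obtained by slicing. A minimal sketch, assuming the bounds 0..1 from the answers above (the names total_2d and total_3d are illustrative):

```python
import numpy as np

b = np.array([[1, 2], [3, 4]])
# Sum of b[i, j] for i in 0..1 and j in 0..1. SymPy's Sum bounds are
# inclusive, so the upper bound 1 corresponds to the slice 0:2.
total_2d = b[0:2, 0:2].sum()

c = np.arange(24).reshape(2, 3, 4)
# Same idea in three dimensions, matching the DeferredVector example.
total_3d = c[:2, :2, :2].sum()
print(total_2d, total_3d)
```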

Intersection of 2D polygons

I have two numpy arrays that are OpenCV convex hulls and I want to check for intersection without creating for loops or creating images and performing numpy.bitwise_and on them, both of which are quite slow in Python. The arrays look like this:
[[[x1 y1]]
[[x2 y2]]
[[x3 y3]]
...
[[xn yn]]]
Considering [[x1 y1]] as one single element, I want to perform intersection between two numpy ndarrays. How can I do that? I have found a few questions of similar nature, but I could not figure out the solution to this from there.
You can pass a one-dimensional structured view of each array to numpy.intersect1d, like this:
import numpy
def multidim_intersect(arr1, arr2):
    arr1_view = arr1.view([('',arr1.dtype)]*arr1.shape[1])
    arr2_view = arr2.view([('',arr2.dtype)]*arr2.shape[1])
    intersected = numpy.intersect1d(arr1_view, arr2_view)
    return intersected.view(arr1.dtype).reshape(-1, arr1.shape[1])
This creates a view of each array, changing each row to a tuple of values. It then performs the intersection, and changes the result back to the original format. Here's an example of using it:
test_arr1 = numpy.array([[0, 2],
                         [1, 3],
                         [4, 5],
                         [0, 2]])
test_arr2 = numpy.array([[1, 2],
                         [0, 2],
                         [3, 1],
                         [1, 3]])
print(multidim_intersect(test_arr1, test_arr2))
This prints:
[[0 2]
[1 3]]
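An alternative that avoids structured views is to compare all pairs of points by broadcasting. This is a sketch, not the answer's method: the function name shared_points is hypothetical, and it assumes the inputs are OpenCV-style arrays of shape (N, 1, 2) or plain (N, 2) arrays. Like the view-based version, it finds vertices the two arrays have in common, not the geometric overlap of the two polygons.

```python
import numpy as np

def shared_points(hull1, hull2):
    """Return the unique (x, y) points present in both point arrays."""
    # OpenCV hulls have shape (N, 1, 2); flatten them to (N, 2).
    p1 = np.asarray(hull1).reshape(-1, 2)
    p2 = np.asarray(hull2).reshape(-1, 2)
    # Compare every point of p1 with every point of p2: shape (N1, N2).
    equal = (p1[:, np.newaxis, :] == p2[np.newaxis, :, :]).all(axis=2)
    return np.unique(p1[equal.any(axis=1)], axis=0)

test_arr1 = np.array([[0, 2], [1, 3], [4, 5], [0, 2]]).reshape(-1, 1, 2)
test_arr2 = np.array([[1, 2], [0, 2], [3, 1], [1, 3]]).reshape(-1, 1, 2)
common = shared_points(test_arr1, test_arr2)
print(common)
```

The intermediate comparison array grows with N1 * N2, so this suits hulls with a modest number of vertices rather than very large point sets.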
You can use the Polygon package (http://pypi.python.org/pypi/Polygon/2.0.4). Here is an example:
>>> import Polygon
>>> a = Polygon.Polygon([(0,0),(1,0),(0,1)])
>>> b = Polygon.Polygon([(0.3,0.3), (0.3, 0.6), (0.6, 0.3)])
>>> a & b
Polygon:
<0:Contour: [0:0.60, 0.30] [1:0.30, 0.30] [2:0.30, 0.60]>
To convert the result of cv2.findContours to Polygon point format, you can:
points1 = contours[0].reshape(-1,2)
This will convert the shape from (N, 1, 2) to (N, 2)
Following is a full example:
import Polygon
import cv2
import numpy as np
from scipy.misc import bytescale
y, x = np.ogrid[-2:2:100j, -2:2:100j]
f1 = bytescale(np.exp(-x**2 - y**2), low=0, high=255)
f2 = bytescale(np.exp(-(x+1)**2 - y**2), low=0, high=255)
c1, hierarchy = cv2.findContours((f1>120).astype(np.uint8),
                                 cv2.cv.CV_RETR_EXTERNAL,
                                 cv2.CHAIN_APPROX_SIMPLE)
c2, hierarchy = cv2.findContours((f2>120).astype(np.uint8),
                                 cv2.cv.CV_RETR_EXTERNAL,
                                 cv2.CHAIN_APPROX_SIMPLE)
points1 = c1[0].reshape(-1,2) # convert shape (n, 1, 2) to (n, 2)
points2 = c2[0].reshape(-1,2)
import pylab as pl
poly1 = pl.Polygon(points1, color="blue", alpha=0.5)
poly2 = pl.Polygon(points2, color="red", alpha=0.5)
pl.figure(figsize=(8,3))
ax = pl.subplot(121)
ax.add_artist(poly1)
ax.add_artist(poly2)
pl.xlim(0, 100)
pl.ylim(0, 100)
a = Polygon.Polygon(points1)
b = Polygon.Polygon(points2)
intersect = a&b # calculate the intersect polygon
poly3 = pl.Polygon(intersect[0], color="green") # intersect[0] are the points of the polygon
ax = pl.subplot(122)
ax.add_artist(poly3)
pl.xlim(0, 100)
pl.ylim(0, 100)
pl.show()
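Output: (figure not preserved; per the code above, the left panel shows the two contour polygons and the right panel shows their intersection)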
So this is what I did to get the job done:
import Polygon, numpy
# Here I extracted and combined some contours and created a convex hull from it.
# Now I wanna check whether a contour acquired differently intersects with this hull or not.
for contour in contours: # The result of cv2.findContours is a list of contours
    contour1 = contour.flatten()
    contour1 = numpy.reshape(contour1, (int(contour1.shape[0]/2),-1))
    poly1 = Polygon.Polygon(contour1)
    hull = hull.flatten() # This is the hull that was constructed previously
    hull = numpy.reshape(hull, (int(hull.shape[0]/2),-1))
    poly2 = Polygon.Polygon(hull)
    if (poly1 & poly2).area()<= some_max_val:
        some_operations
I had to use for loop, and this altogether looks a bit tedious, although it gives me expected results. Any better methods would be greatly appreciated!
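If only a yes/no answer is needed and both shapes really are convex (as convex hulls are), a separating-axis test can be written with NumPy alone. This is a sketch of that swapped-in technique, not something taken from the answers above; the function name convex_polygons_intersect is hypothetical, and it assumes each input holds the hull vertices in order, as (N, 2) or OpenCV-style (N, 1, 2) arrays.

```python
import numpy as np

def convex_polygons_intersect(p, q):
    """Separating-axis test for two convex polygons given as vertex arrays."""
    p = np.asarray(p, dtype=float).reshape(-1, 2)
    q = np.asarray(q, dtype=float).reshape(-1, 2)
    # Edge vectors of both polygons (the closing edge comes from the roll).
    edges = np.vstack([np.roll(p, -1, axis=0) - p,
                       np.roll(q, -1, axis=0) - q])
    # The edge normals are the candidate separating axes.
    axes = np.column_stack([-edges[:, 1], edges[:, 0]])
    # Project every vertex onto every axis: shape (n_axes, n_vertices).
    proj_p = axes @ p.T
    proj_q = axes @ q.T
    # The polygons are disjoint if the projections miss each other on any axis.
    separated = ((proj_p.max(axis=1) < proj_q.min(axis=1))
                 | (proj_q.max(axis=1) < proj_p.min(axis=1)))
    return not separated.any()

tri_a = [(0, 0), (1, 0), (0, 1)]
tri_b = [(0.3, 0.3), (0.3, 0.6), (0.6, 0.3)]   # lies inside tri_a
tri_c = [(2, 2), (3, 2), (2, 3)]               # far away from tri_a
print(convex_polygons_intersect(tri_a, tri_b))
print(convex_polygons_intersect(tri_a, tri_c))
```

Because the comparison is strict, polygons that merely touch are reported as intersecting. The test gives no overlap area, so it cannot replace the area threshold in the loop above; it can only be used to skip pairs that do not overlap at all.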
Inspired by jiterrace's answer.
I came across this post while working through the Udacity deep learning class (trying to find the overlap between training and test data).
I am not familiar with "view" and found the syntax a bit hard to understand, and it would probably be just as hard to explain to my friends who think in terms of tables.
My approach is basically to flatten/reshape the ndarray of shape (N, X, Y) into shape (N, X*Y).
print(train_dataset.shape)
print(test_dataset.shape)
#(200000L, 28L, 28L)
#(10000L, 28L, 28L)
1). INNER JOIN (easier to understand, slow)
import pandas as pd
%%timeit -n 1 -r 1
def multidim_intersect_df(arr1, arr2):
    p1 = pd.DataFrame([r.flatten() for r in arr1]).drop_duplicates()
    p2 = pd.DataFrame([r.flatten() for r in arr2]).drop_duplicates()
    res = p1.merge(p2)
    return res
inters_df = multidim_intersect_df(train_dataset, test_dataset)
print(inters_df.shape)
#(1153, 784)
#1 loop, best of 1: 2min 56s per loop
2). SET INTERSECTION (fast)
%%timeit -n 1 -r 1
def multidim_intersect(arr1, arr2):
    arr1_new = arr1.reshape((-1, arr1.shape[1]*arr1.shape[2])) # -1 means row counts are inferred from other dimensions
    arr2_new = arr2.reshape((-1, arr2.shape[1]*arr2.shape[2]))
    intersected = set(map(tuple, arr1_new)).intersection(set(map(tuple, arr2_new))) # list is not hashable, go tuple
    return list(intersected) # in shape of (N, 28*28)
inters = multidim_intersect(train_dataset, test_dataset)
print(len(inters))
# 1153
#1 loop, best of 1: 34.6 s per loop
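A variation on the set approach is to hash each sample's raw bytes instead of building a tuple of 784 Python numbers, which may reduce the per-sample overhead. This is an untimed sketch on toy data; the function name count_shared_samples and the arrays train and test are assumptions for illustration.

```python
import numpy as np

def count_shared_samples(arr1, arr2):
    """Count distinct samples (along axis 0) that occur in both arrays."""
    # Assumes both arrays share dtype and per-sample shape, so equal
    # samples produce equal byte strings.
    seen1 = {sample.tobytes() for sample in arr1}
    seen2 = {sample.tobytes() for sample in arr2}
    return len(seen1 & seen2)

train = np.arange(12, dtype=np.uint8).reshape(3, 2, 2)
test = np.stack([train[1], np.full((2, 2), 99, dtype=np.uint8)])
n_shared = count_shared_samples(train, test)
print(n_shared)  # 1
```

Byte-level comparison is exact, so for floating-point data values such as 0.0 and -0.0 count as different even though they compare equal numerically.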
