Find n-dimensional point in numpy array - python

I am investigating whether storing points in a numpy array helps me search for points, and I have several questions about it.
I have a Point class that represents a 3-dimensional point.
class Point( object ):
    def __init__( self, x, y, z ):
        self.x = x
        self.y = y
        self.z = z

    def __repr__( self ):
        return "<Point (%r, %r, %r)>" % ( self.x, self.y, self.z )
I build a list of Point objects. Notice that the coordinates (1, 2, 3) deliberately occur twice; that is what I am going to search for.
>>> points = [Point(1, 2, 3), Point(4, 5, 6), Point(1, 2, 3), Point(7, 8, 9)]
I store the Point objects in a numpy array.
>>> import numpy
>>> npoints = numpy.array( points )
>>> npoints
array([<Point (1, 2, 3)>, <Point (4, 5, 6)>, <Point (1, 2, 3)>,
       <Point (7, 8, 9)>], dtype=object)
I search for all points with coordinates (1, 2, 3) in the following manner.
>>> numpy.where( npoints == Point(1, 2, 3) )
(array([], dtype=int64),)
But the result is not useful, so that does not seem to be the correct way to do it. Is numpy.where the thing to use? Is there another way to express the condition for numpy.where that would be successful?
The next thing I try is to store just the coordinates of the points in a numpy array.
>>> npoints = numpy.array( [(p.x, p.y, p.z) for p in points ])
>>> npoints
array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [7, 8, 9]])
I search for all points with coordinates (1,2,3) in the following manner.
>>> numpy.where( npoints == [1,2,3] )
(array([0, 0, 0, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
The result is, at least, something I can deal with. The array of row indexes in the first return value, array([0, 0, 0, 2, 2, 2]), does indeed tell me that the coordinates I am searching for are in rows 0 and 2 of npoints. I could get away with doing something like the following.
>>> rows, cols = numpy.where( npoints == [1,2,3] )
>>> rows
array([0, 0, 0, 2, 2, 2])
>>> cols
array([0, 1, 2, 0, 1, 2])
>>> foundRows = set( rows )
>>> foundRows
set([0, 2])
>>> for r in foundRows:
...     # Do something with npoints[r]
However, I feel that I am not really using numpy.where appropriately, and that I am just getting lucky in this particular situation.
What is the appropriate way to find all occurrences of an n-dimensional point (i.e., a row with particular values) in a numpy array?
Preserving the order of the array is essential.

You can define the rich-comparison method object.__eq__(self, other) in your Point class to be able to use == among Point objects:
class Point( object ):
    def __init__( self, x, y, z ):
        self.x = x
        self.y = y
        self.z = z

    def __repr__( self ):
        return "<Point (%r, %r, %r)>" % ( self.x, self.y, self.z )

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y and self.z == other.z
import numpy
points = [Point(1, 2, 3), Point(4, 5, 6), Point(1, 2, 3), Point(7, 8, 9)]
npoints = numpy.array( points )
found = numpy.where(npoints == Point(1, 2, 3))
print(found) # => (array([0, 2]),)
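For the coordinate-array layout from the question, a more direct alternative to picking row numbers out of numpy.where's raw output is to compare row-wise and reduce with all(axis=1). A minimal sketch using plain numpy:

import numpy

npoints = numpy.array([(1, 2, 3), (4, 5, 6), (1, 2, 3), (7, 8, 9)])
mask = (npoints == [1, 2, 3]).all(axis=1)  # True exactly for matching rows
rows = numpy.flatnonzero(mask)             # indices of those rows
print(rows)  # => [0 2]

Since flatnonzero returns indices in increasing order, the order of the array is preserved.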

Related

Sorting according to clockwise point coordinates

Given a list in Python containing the 8 x, y coordinate values (all positive) of 4 points as [x1, x2, x3, x4, y1, y2, y3, y4] ((xi, yi) are the x and y coordinates of the ith point),
how can I sort it into a new list [a1, a2, a3, a4, b1, b2, b3, b4] such that the coordinates (ai, bi) of points 1, 2, 3, 4 are in clockwise order, with point 1 closest to the origin of the xy plane, i.e. something like
2--------3
|        |
|        |
|        |
1--------4
Points will roughly form a parallelogram.
Currently, I am thinking of finding the point with the least value of (x+y) as point 1, then point 2 as the point with the least x among the remaining coordinates, point 3 as the point with the largest value of (x+y), and point 4 as the remaining point (a sketch of this follows below).
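A minimal sketch of that heuristic, assuming the points are given as plain (x, y) tuples (the answers below use more robust angle-based approaches):

def heuristic_sort(pts):
    # pts: four (x, y) tuples forming a rough parallelogram
    rest = list(pts)
    p1 = min(rest, key=lambda p: p[0] + p[1])   # least x+y: closest to origin
    rest.remove(p1)
    p2 = min(rest, key=lambda p: p[0])          # least x among the remaining
    rest.remove(p2)
    p3 = max(rest, key=lambda p: p[0] + p[1])   # largest x+y
    rest.remove(p3)
    p4 = rest[0]                                # the remaining point
    return [p1, p2, p3, p4]

print(heuristic_sort([(0, 1), (1, 0), (1, 1), (0, 0)]))
# [(0, 0), (0, 1), (1, 1), (1, 0)]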
You should use a list of 2-item tuples as your data structure to represent a variable number of coordinates in a meaningful way.
from functools import reduce
import operator
import math
coords = [(0, 1), (1, 0), (1, 1), (0, 0)]
center = tuple(map(operator.truediv, reduce(lambda x, y: map(operator.add, x, y), coords), [len(coords)] * 2))
print(sorted(coords, key=lambda coord: (-135 - math.degrees(math.atan2(*tuple(map(operator.sub, coord, center))[::-1]))) % 360))
This outputs:
[(0, 0), (0, 1), (1, 1), (1, 0)]
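The same computation unrolled for readability; a sketch assuming a plain list of (x, y) tuples:

import math

def sort_points_clockwise(coords):
    # centroid of the points
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)

    def angle_key(coord):
        dx, dy = coord[0] - cx, coord[1] - cy
        # angle around the centroid, negated so the sort runs clockwise and
        # shifted so the point nearest the lower-left direction comes first
        return (-135 - math.degrees(math.atan2(dy, dx))) % 360

    return sorted(coords, key=angle_key)

print(sort_points_clockwise([(0, 1), (1, 0), (1, 1), (0, 0)]))
# [(0, 0), (0, 1), (1, 1), (1, 0)]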
import math

def centeroidpython(data):
    x, y = zip(*data)
    l = len(x)
    return sum(x) / l, sum(y) / l

xy = [405952.0, 408139.0, 407978.0, 405978.0, 6754659.0, 6752257.0, 6754740.0, 6752378.0]
xy_pairs = list(zip(xy[:int(len(xy)/2)], xy[int(len(xy)/2):]))
centroid_x, centroid_y = centeroidpython(xy_pairs)
xy_sorted = sorted(xy_pairs, key = lambda x: math.atan2((x[1]-centroid_y),(x[0]-centroid_x)))
xy_sorted_x_first_then_y = [coord for pair in list(zip(*xy_sorted)) for coord in pair]
# P4=8,10 P1=3,5 P2=8,5 P3=3,10
points=[8,3,8,3,10,5,5,10]
k=0
# we know these numbers are extreme and data won't be bigger than these
xmin=1000
xmax=-1000
ymin=1000
ymax=-1000
# finding min and max values of x and y
for i in points:
    if k<4:
        if (xmin>i): xmin=i
        if (xmax<i): xmax=i
    else:
        if (ymin>i): ymin=i
        if (ymax<i): ymax=i
    k +=1
sortedlist=[xmin,xmin,xmax,xmax,ymin,ymax,ymax,ymin]
print(sortedlist)
output: [3, 3, 8, 8, 5, 10, 10, 5]
For other regions you need to change the sortedlist line. If the center is inside the box, it will require more condition handling.
What we want to sort by is the angle from the start coordinate. I've used numpy here to interpret each vector from the starting coordinate as a complex number, for which there is an easy way of computing the angle (counterclockwise along the unit circle):
def angle_with_start(coord, start):
    vec = coord - start
    return np.angle(complex(vec[0], vec[1]))
Full code:
import itertools
import numpy as np

def angle_with_start(coord, start):
    vec = coord - start
    return np.angle(complex(vec[0], vec[1]))

def sort_clockwise(points):
    # convert into a coordinate system
    # (1, 1, 1, 2) -> (1, 1), (1, 2)
    coords = [np.array([points[i], points[i+4]]) for i in range(len(points) // 2)]
    # find the point closest to the origin,
    # this becomes our starting point
    coords = sorted(coords, key=lambda coord: np.linalg.norm(coord))
    start = coords[0]
    rest = coords[1:]
    # sort the remaining coordinates by angle
    # with reverse=True because we want to sort by clockwise angle
    rest = sorted(rest, key=lambda coord: angle_with_start(coord, start), reverse=True)
    # our first coordinate should be our starting point
    rest.insert(0, start)
    # convert into the proper coordinate format
    # (1, 1), (1, 2) -> (1, 1, 1, 2)
    return list(itertools.chain.from_iterable(zip(*rest)))
Behavior on some sample inputs:
In [1]: a
Out[1]: [1, 1, 2, 2, 1, 2, 1, 2]
In [2]: sort_clockwise(a)
Out[2]: [1, 1, 2, 2, 1, 2, 2, 1]
In [3]: b
Out[3]: [1, 2, 0, 2, 1, 2, 3, 1]
In [4]: sort_clockwise(b)
Out[4]: [1, 0, 2, 2, 1, 3, 2, 1]
Based on BERA's answer but as a class:
code
import math

class Sorter:
    @staticmethod
    def centerXY(xylist):
        x, y = zip(*xylist)
        l = len(x)
        return sum(x) / l, sum(y) / l

    @staticmethod
    def sortPoints(xylist):
        cx, cy = Sorter.centerXY(xylist)
        xy_sorted = sorted(xylist, key = lambda x: math.atan2((x[1]-cy),(x[0]-cx)))
        return xy_sorted
test
def test_SortPoints():
    points=[(0,0),(0,1),(1,1),(1,0)]
    center=Sorter.centerXY(points)
    assert center==(0.5,0.5)
    sortedPoints=Sorter.sortPoints(points)
    assert sortedPoints==[(0, 0), (1, 0), (1, 1), (0, 1)]
As suggested by IgnacioVazquez-Abrams, we can also do sorting according to atan2 angles:
Code:
import math
import copy
import matplotlib.pyplot as plt

a = [2, 4, 5, 1, 0.5, 4, 0, 4]
print(a)

def clock(a):
    angles = []
    (x0, y0) = ((a[0]+a[1]+a[2]+a[3])/4, (a[4]+a[5]+a[6]+a[7])/4)  # centroid
    for j in range(4):
        (dx, dy) = (a[j] - x0, a[j+4] - y0)
        angles.append(math.degrees(math.atan2(float(dy), float(dx))))
    for k in range(4):
        angles.append(angles[k] + 800)
    # print(angles)
    z = [copy.copy(x) for (y,x) in sorted(zip(angles,a), key=lambda pair: pair[0])]
    print("z is: ", z)
    plt.scatter(a[:4], a[4:8])
    plt.show()

clock(a)
Output is:
[2, 4, 5, 1, 0.5, 4, 0, 4]
[-121.60750224624891, 61.92751306414704, -46.73570458892839, 136.8476102659946, 678.3924977537511, 861.9275130641471, 753.2642954110717, 936.8476102659946]
z is: [2, 5, 4, 1, 0.5, 0, 4, 4]
Try this function, which orders four corner points by their coordinate sums and differences:
import numpy as np

def sort_clockwise(pts):
    # pts: a (4, 2) array of corner points
    rect = np.zeros((4, 2), dtype="float32")
    # in image coordinates (y increasing downward), the top-left point has
    # the smallest x+y sum and the bottom-right has the largest
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # the top-right point has the smallest difference y-x,
    # the bottom-left the largest
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect
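A usage sketch with the corner points from the earlier answer (image coordinates, y increasing downward):

import numpy as np

pts = np.array([(8, 10), (3, 5), (8, 5), (3, 10)], dtype="float32")
print(sort_clockwise(pts))
# [[ 3.  5.]   top-left
#  [ 8.  5.]   top-right
#  [ 8. 10.]   bottom-right
#  [ 3. 10.]]  bottom-left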

Python filter points in 2D space

I have two arrays x=[1,2,3,4] and y=[1,0,0,1] describing 2D points (x,y). I want to know how many elements have x>2 and y==1. What is the simplest way to do this (without any loops)?
Is it possible to do something like x[x>2], but for two conditions?
Assuming these are numpy arrays, since your x[x>2] is numpy syntax, you just need the element-wise and operator (&); the parentheses around each comparison are required because & binds more tightly than the comparison operators:
meet_cond = (x > 2) & (y == 1)
how_many = meet_cond.sum()
which_x = x[meet_cond]
which_y = y[meet_cond]
If x and y belong together as points, you might want to pack them into a np 2D array:
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4])
>>> y = np.array([1, 0, 0, 1])
>>> xy = np.array([x, y]).T
>>> xy[(x > 2) & (y == 1)]
array([[4, 1]])
>>> xy[(xy[:, 0] > 2) & (xy[:, 1] == 1)]
array([[4, 1]])
>>> np.count_nonzero((xy[:, 0] > 2) & (xy[:, 1] == 1))
1
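The same filter can also be written with numpy.logical_and, which some find more explicit than the & operator:
>>> np.logical_and(x > 2, y == 1)
array([False, False, False,  True])
>>> x[np.logical_and(x > 2, y == 1)]
array([4])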

Computing average for numpy array

I have a 2d numpy array (6 x 6 elements). I want to create another 2D array out of it, where each block is the average of all elements within a blocksize window. Currently, I have the following code:
import numpy

def avg_func(data, blocksize = 2):
    # Takes data, and averages all positive (only numerical) numbers in blocks
    dimensions = data.shape
    height = int(numpy.floor(dimensions[0]/blocksize))
    width = int(numpy.floor(dimensions[1]/blocksize))
    averaged = numpy.zeros((height, width))
    for i in range(0, height):
        print(i*1.0/height)
        for j in range(0, width):
            block = data[i*blocksize:(i+1)*blocksize, j*blocksize:(j+1)*blocksize]
            if block.any():
                averaged[i][j] = numpy.average(block[block>0])
    return averaged

arr = numpy.random.random((6,6))
avgd = avg_func(arr, 3)
Is there any way I can make it more pythonic? Perhaps numpy has something which does it already?
UPDATE
Based on M. Massias's solution below, here is an update with the fixed values replaced by variables. I am not sure it is coded right, but it does seem to work:
dimensions = data.shape
height = int(numpy.floor(dimensions[0]/block_size))
width = int(numpy.floor(dimensions[1]/block_size))
t = data.reshape([height, block_size, width, block_size])
avrgd = numpy.mean(t, axis=(1, 3))
To compute some operation slice by slice in numpy, it is very often useful to reshape your array and use extra axes.
To explain the process we'll use here: you can reshape your array, take the mean, reshape it again and take the mean again.
Here I assume blocksize is 2
import numpy as np

t = np.array([[0, 1, 2, 3, 4, 5],
              [0, 1, 2, 3, 4, 5],
              [0, 1, 2, 3, 4, 5],
              [0, 1, 2, 3, 4, 5],
              [0, 1, 2, 3, 4, 5],
              [0, 1, 2, 3, 4, 5]])
t = t.reshape([6, 3, 2])
t = np.mean(t, axis=2)
t = t.reshape([3, 2, 3])
np.mean(t, axis=1)
outputs
array([[ 0.5,  2.5,  4.5],
       [ 0.5,  2.5,  4.5],
       [ 0.5,  2.5,  4.5]])
Now that it's clear how this works, you can do it in one pass only:
t = t.reshape([3, 2, 3, 2])
np.mean(t, axis=(1, 3))
works too (and should be quicker, since the means are computed only once, I guess). I'll let you substitute height/blocksize, width/blocksize and blocksize accordingly.
See @askewcan's nice remark on how to generalize this to any dimension.
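Putting the pieces together, here is a minimal general version. Note that, unlike the question's avg_func, this sketch averages all values rather than only the positive ones, and it trims any remainder rows/columns so the reshape is always valid:

import numpy

def block_average(data, block_size=2):
    height = data.shape[0] // block_size
    width = data.shape[1] // block_size
    # drop any trailing rows/columns that don't fill a whole block
    trimmed = data[:height * block_size, :width * block_size]
    # split each axis into (block index, position within block)
    t = trimmed.reshape(height, block_size, width, block_size)
    # average over the two within-block axes
    return t.mean(axis=(1, 3))

arr = numpy.random.random((6, 6))
print(block_average(arr, 3).shape)  # (2, 2)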

How to make a multidimension numpy array with a varying row size?

I would like to create a two dimensional numpy array of arrays that has a different number of elements on each row.
Trying
cells = numpy.array([[0,1,2,3], [2,3,4]])
gives an error
ValueError: setting an array element with a sequence.
We are now almost 7 years after the question was asked, and your code
cells = numpy.array([[0,1,2,3], [2,3,4]])
executed with numpy 1.12.0 and python 3.5 doesn't produce any error; cells contains:
array([[0, 1, 2, 3], [2, 3, 4]], dtype=object)
You access the elements of cells as cells[0][2] (= 2). Note that recent numpy versions again raise an error for such ragged input unless you pass dtype=object explicitly.
An alternative to tom10's solution if you want to build your list of numpy arrays on the fly as new elements (i.e. arrays) become available is to use append:
d = [] # initialize an empty list
a = np.arange(3) # array([0, 1, 2])
d.append(a) # [array([0, 1, 2])]
b = np.arange(3,-1,-1) #array([3, 2, 1, 0])
d.append(b) #[array([0, 1, 2]), array([3, 2, 1, 0])]
While Numpy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, better use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.
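For the masked-array suggestion, a minimal sketch with hypothetical data, using a fixed-size array where the missing cell is marked invalid:

import numpy as np

data = np.ma.masked_invalid(np.array([[0., 1., 2., 3.],
                                      [2., 3., 4., np.nan]]))
print(data.mean(axis=1))  # row means skip the masked entry: [1.5 3.0]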
If you really want flexible Numpy arrays, use something like this:
numpy.array([[0,1,2,3], [2,3,4]], dtype=object)
However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
This isn't well supported in Numpy (by definition, almost everywhere, a "two dimensional array" has all rows of equal length). A Python list of Numpy arrays may be a good solution for you, as this way you'll get the advantages of Numpy where you can use them:
cells = [numpy.array(a) for a in [[0,1,2,3], [2,3,4]]]
Another option would be to store your arrays as one contiguous array and also store their sizes or offsets. This takes a little more conceptual thought around how to operate on your arrays, but a surprisingly large number of operations can be made to work as if you had a two dimensional array with different sizes. In the cases where they can't, then np.split can be used to create the list that calocedrus recommends. The easiest operations are ufuncs, because they require almost no modification. Here are some examples:
cells_flat = numpy.array([0, 1, 2, 3, 2, 3, 4])
# One of these is required, it's pretty easy to convert between them,
# but having both makes the examples easy
cell_lengths = numpy.array([4, 3])
cell_starts = numpy.insert(cell_lengths[:-1].cumsum(), 0, 0)
cell_lengths2 = numpy.diff(numpy.append(cell_starts, cells_flat.size))
assert numpy.all(cell_lengths == cell_lengths2)
# Copy prevents shared memory
cells = numpy.split(cells_flat.copy(), cell_starts[1:])
# [array([0, 1, 2, 3]), array([2, 3, 4])]
numpy.array([x.sum() for x in cells])
# array([6, 9])
numpy.add.reduceat(cells_flat, cell_starts)
# array([6, 9])
[a + v for a, v in zip(cells, [1, 3])]
# [array([1, 2, 3, 4]), array([5, 6, 7])]
cells_flat + numpy.repeat([1, 3], cell_lengths)
# array([1, 2, 3, 4, 5, 6, 7])
[a.astype(float) / a.sum() for a in cells]
# [array([ 0. , 0.16666667, 0.33333333, 0.5 ]),
# array([ 0.22222222, 0.33333333, 0.44444444])]
cells_flat.astype(float) / numpy.add.reduceat(cells_flat, cell_starts).repeat(cell_lengths)
# array([ 0. , 0.16666667, 0.33333333, 0.5 , 0.22222222,
# 0.33333333, 0.44444444])
def complex_modify(array):
    """Some complicated function that modifies array
    pretend this is more complex than it is"""
    array *= 3

for arr in cells:
    complex_modify(arr)
cells
# [array([0, 3, 6, 9]), array([ 6,  9, 12])]
for arr in numpy.split(cells_flat, cell_starts[1:]):
    complex_modify(arr)
cells_flat
# array([ 0,  3,  6,  9,  6,  9, 12])
In numpy 1.14.3, using append:
d = [] # initialize an empty list
a = np.arange(3) # array([0, 1, 2])
d.append(a) # [array([0, 1, 2])]
b = np.arange(3,-1,-1) #array([3, 2, 1, 0])
d.append(b) #[array([0, 1, 2]), array([3, 2, 1, 0])]
what you get is a list of arrays (that can be of different lengths) and you can do operations like d[0].mean(). On the other hand,
cells = numpy.array([[0,1,2,3], [2,3,4]])
results in an array of lists.
You may want to do this (recent numpy versions require an explicit dtype=object for ragged input):
a1 = np.array([1,2,3])
a2 = np.array([3,4])
a3 = np.array([a1,a2], dtype=object)
a3 # array([array([1, 2, 3]), array([3, 4])], dtype=object)
type(a3) # numpy.ndarray
type(a2) # numpy.ndarray
Slightly off-topic, but not as much as one would think, because of eager mode, which is now the default:
If you are using Tensorflow, you can do:
a = tf.ragged.constant([[0, 1, 2, 3]])
b = tf.ragged.constant([[2, 3, 4]])
c = tf.concat([a, b], axis=0)
And you can then do all the mathematical operations still, like tf.math.reduce_mean, etc.
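A quick usage sketch (casting to float first, since reduce_mean on integer tensors truncates):
means = tf.math.reduce_mean(tf.cast(c, tf.float32), axis=1)
print(means)  # tf.Tensor([1.5 3. ], shape=(2,), dtype=float32)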
np.array([[0,1,2,3], [2,3,4]], dtype=object) returns an "array" of lists.
a = np.array([np.array([0,1,2,3]), np.array([2,3,4])], dtype=object) returns an array of arrays. It already allows for operations such as a+1.
Building up on this, the functionality can be enhanced by subclassing.
import numpy as np

class Arrays(np.ndarray):
    def __new__(cls, input_array, dims=None):
        obj = np.array(list(map(np.array, input_array))).view(cls)
        return obj

    def __getitem__(self, ij):
        if isinstance(ij, tuple) and len(ij) > 1:
            # handle twodimensional slicing
            if isinstance(ij[0], slice) or hasattr(ij[0], '__iter__'):
                # [1:4,:] or [[1,2,3],[1,2]]
                return Arrays(arr[ij[1]] for arr in self[ij[0]])
            return self[ij[0]][ij[1]]  # [1,:] np.array
        return super(Arrays, self).__getitem__(ij)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        axis = kwargs.pop('axis', None)
        dimk = [len(arg) if hasattr(arg, '__iter__') else 1 for arg in inputs]
        dim = max(dimk)
        pad_inputs = [([i]*dim if (d<dim) else i) for d,i in zip(dimk, inputs)]
        result = [np.ndarray.__array_ufunc__(self, ufunc, method, *x, **kwargs) for x in zip(*pad_inputs)]
        if method == 'reduce':
            # handle sum, min, max, etc.
            if axis == 1:
                return np.array(result)
            else:
                # repeat over remaining axis
                return np.ndarray.__array_ufunc__(self, ufunc, method, result, **kwargs)
        return Arrays(result)
Now this works:
a = Arrays([[0,1,2,3], [2,3,4]])
a[0:1,0:-1]
# Arrays([[0, 1, 2]])
np.sin(a)
# Arrays([array([0. , 0.84147098, 0.90929743, 0.14112001]),
# array([ 0.90929743, 0.14112001, -0.7568025 ])], dtype=object)
a + 2*a
# Arrays([array([0, 3, 6, 9]), array([ 6, 9, 12])], dtype=object)
To get nanfunctions working, this can be done
# patch for nanfunctions that cannot handle the object-ndarrays along with second axis=-1
def nanpatch(func):
    def wrapper(a, axis=None, **kwargs):
        if isinstance(a, Arrays):
            rowresult = [func(x, **kwargs) for x in a]
            if axis == 1:
                return np.array(rowresult)
            else:
                # repeat over remaining axis
                return func(rowresult)
        # otherwise keep the original version
        return func(a, axis=axis, **kwargs)
    return wrapper

np.nanmean = nanpatch(np.nanmean)
np.nansum = nanpatch(np.nansum)
np.nanmin = nanpatch(np.nanmin)
np.nanmax = nanpatch(np.nanmax)
np.nansum(a)
# 15
np.nansum(a, axis=1)
# array([6, 9])

A 3-D grid of regularly spaced points

I want to create a list containing the 3-D coords of a grid of regularly spaced points, each as a 3-element tuple. I'm looking for advice on the most efficient way to do this.
In C++ for instance, I simply loop over three nested loops, one for each coordinate. In Matlab, I would probably use the meshgrid function (which would do it in one command). I've read about meshgrid and mgrid in Python, and I've also read that using numpy's broadcasting rules is more efficient. It seems to me that using the zip function in combination with the numpy broadcast rules might be the most efficient way, but zip doesn't seem to be overloaded in numpy.
Use ndindex:
import numpy as np
ind = np.ndindex(3, 3, 2)
for i in ind:
    print(i)
# (0, 0, 0)
# (0, 0, 1)
# (0, 1, 0)
# (0, 1, 1)
# (0, 2, 0)
# (0, 2, 1)
# (1, 0, 0)
# (1, 0, 1)
# (1, 1, 0)
# (1, 1, 1)
# (1, 2, 0)
# (1, 2, 1)
# (2, 0, 0)
# (2, 0, 1)
# (2, 1, 0)
# (2, 1, 1)
# (2, 2, 0)
# (2, 2, 1)
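np.ndindex yields integer index tuples; to turn them into regularly spaced coordinates, scale and offset each index. A sketch with hypothetical origin and spacing values:

origin = (0.0, 0.0, 0.0)
step = (0.5, 0.5, 1.0)
points = [tuple(o + i * s for o, i, s in zip(origin, idx, step))
          for idx in np.ndindex(3, 3, 2)]
# points[:3] -> [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.5, 0.0)]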
Instead of meshgrid and mgrid, you can use ogrid, which is a "sparse" version of mgrid. That is, only the dimension along which the values change is filled in; the others are simply broadcast. This uses much less memory for large grids than the non-sparse alternatives.
For example:
>>> import numpy as np
>>> x, y = np.ogrid[-1:2, -2:3]
>>> x
array([[-1],
       [ 0],
       [ 1]])
>>> y
array([[-2, -1, 0, 1, 2]])
>>> x**2 + y**2
array([[5, 2, 1, 2, 5],
       [4, 1, 0, 1, 4],
       [5, 2, 1, 2, 5]])
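To get the flat list of 3-element tuples the question asks for, you can also flatten a dense grid into an (N, 3) array of points; a minimal sketch:
>>> X, Y, Z = np.mgrid[0:3, 0:3, 0:2]
>>> pts = np.stack([X.ravel(), Y.ravel(), Z.ravel()], axis=-1)
>>> pts.shape
(18, 3)
>>> pts[:3]
array([[0, 0, 0],
       [0, 0, 1],
       [0, 1, 0]])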
I would say go with meshgrid or mgrid, in particular if you need non-integer coordinates. I'm surprised that Numpy's broadcasting rules would be more efficient, as meshgrid was designed especially for the problem that you want to solve.
For multi-d (greater than 2) meshgrids, use numpy.lib.index_tricks.nd_grid like so:
import numpy
grid = numpy.lib.index_tricks.nd_grid()
g1 = grid[:3,:3,:3]
g2 = grid[0:1:0.5, 0:1, 0:2]
g3 = grid[0:1:3j, 0:1:2j, 0:2:2j]
where g1 has x values of [0, 1, 2], g2 has x values of [0, 0.5], and g3 has x values of [0.0, 0.5, 1.0] (the 3j defines the step count instead of the step increment; see the documentation for more details).
Here's an efficient option similar to your C++ solution, which I've used for exactly the same purpose:
import numpy, itertools, collections

def grid(xmin, xmax, xstep, ymin, ymax, ystep, zmin, zmax, zstep):
    "return nested tuples of grid-sampled coordinates that include maxima"
    return collections.deque( itertools.product(
        numpy.arange(xmin, xmax+xstep, xstep).tolist(),
        numpy.arange(ymin, ymax+ystep, ystep).tolist(),
        numpy.arange(zmin, zmax+zstep, zstep).tolist() ) )
Performance is best (in my tests) when using a.tolist(), as shown above, but you can use a.flat instead and drop the deque() to get an iterator that will sip memory. Of course, you can also use a plain old tuple() or list() instead of deque() for a slight performance penalty (again, in my tests).
