A very simple question: how can I compute the following quantity efficiently in Python (or Cython)?
There is a list of polygons in 3D, given in the following form:
import numpy as np
vertex = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0.5, 0.5, 0.5]], order='F').T
polygons = np.array([3, 0, 1, 2, 4, 1, 2, 3, 4])
i.e. polygons is a 1D array containing entries of the form [N, i1, i2, i3, i4, ...], where
N is the number of vertices in a polygon, followed by the id numbers of those vertices in the vertex array (in the example above there is one triangle with the 3 vertices [0, 1, 2] and one polygon with the 4 vertices [1, 2, 3, 4]).
I need to compute the following information: a list of all edges and, for each edge, the faces that contain it.
And I need to do it fast: the number of vertices can be large.
Update
Each polygon is closed, i.e. a polygon [4, 0, 1, 5, 7] means that there are 4 vertices and the edges are 0-1, 1-5, 5-7 and 7-0.
"Face" is in fact a synonym for "polygon".
I don't know if this is the fastest option, most probably not, but it works. I think the slowest part is edges.index((v, polygon[i + 1])), where we have to check whether this edge is already in the list. The vertex array is not really needed, since an edge is just a pair of vertex indexes. I used face_index as a reference to the polygon index, since you didn't write what a face is.
vertex = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0.5, 0.5, 0.5]]
polygons = [3, 0, 1, 2, 4, 1, 2, 3, 4]

_polygons = polygons
edges = []
faces = []
face_index = 0
while _polygons:
    # The leading entry is the vertex count of the next polygon.
    polygon = _polygons[1:_polygons[0] + 1]
    polygon.append(polygon[0])  # close the polygon
    _polygons = _polygons[_polygons[0] + 1:]
    for i, v in enumerate(polygon[0:-1]):
        if (v, polygon[i + 1]) not in edges:
            edges.append((v, polygon[i + 1]))
            faces.append([face_index])
        else:
            faces[edges.index((v, polygon[i + 1]))].append(face_index)
    face_index += 1

edges = list(zip(edges, faces))
print(edges)
# [((0, 1), [0]), ((1, 2), [0, 1]), ((2, 0), [0]), ((2, 3), [1]), ((3, 4), [1]), ((4, 1), [1])]
You can make it faster by removing the polygon.append(polygon[0]) line and appending the first vertex of each polygon to its vertex list manually, which shouldn't be a problem.
I mean changing polygons = [3, 0, 1, 2, 4, 1, 2, 3, 4] into polygons = [3, 0, 1, 2, 0, 4, 1, 2, 3, 4, 1] (the slicing then has to account for the extra entry per polygon).
PS: Try to follow PEP 8. It is a coding style guide. Among other things, it says you should put a space after every comma in iterables, so the code is easier to read.
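Since the bottleneck is the linear edges.index lookup, a dictionary keyed on the vertex pair makes each lookup O(1) on average. A minimal sketch, with the assumption that edge orientation should be ignored (each edge is normalized to (min, max)):
import numpy as np

vertex = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0.5, 0.5, 0.5]]
polygons = [3, 0, 1, 2, 4, 1, 2, 3, 4]

edge_faces = {}  # undirected edge (min_id, max_id) -> list of face indices
pos = 0
face_index = 0
while pos < len(polygons):
    n = polygons[pos]
    ids = polygons[pos + 1:pos + 1 + n]
    for a, b in zip(ids, ids[1:] + ids[:1]):  # consecutive vertex pairs, wrapping around
        key = (a, b) if a < b else (b, a)     # normalize orientation
        edge_faces.setdefault(key, []).append(face_index)
    pos += n + 1
    face_index += 1

print(edge_faces)
# {(0, 1): [0], (1, 2): [0, 1], (0, 2): [0], (2, 3): [1], (3, 4): [1], (1, 4): [1]}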
Goal
I am writing a "colocalization" script to identify unique co-localized pairs of coordinates between two sets of data. My data is quite large, with <100k points in each set, so performance is pretty important.
For example, I have two sets of points:
import numpy as np
points_a = np.array([[1, 1],[2, 2],[3, 3],[6, 6]])
points_b = np.array([[1, 1],[2, 3],[3, 5],[6, 6], [7,6]]) # may be longer than points_a
For each point in points_a I want to find the nearest point in points_b. However, I don't want any point in points_b used in more than one pair. I can easily find the nearest neighbors using NearestNeighbors or one of the similar routines:
from sklearn.neighbors import NearestNeighbors
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(points_b)
distances, indices = neigh.kneighbors(points_a)
print(indices.ravel())
>>> [0 1 1 3]
As above, this can give me a solution where a point in points_b is used twice. I would like to instead find the solution where each point is used once while minimizing the total distance across all pairs. In the above case:
[0, 1, 2, 3]
I figure a start would be to use NearestNeighbors or similar to find nearest neighbor candidates:
from sklearn.neighbors import NearestNeighbors

max_search_radius = 3
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(points_b)
distances, indices = neigh.radius_neighbors(points_a, max_search_radius)
print(distances)
print(indices)
>>> [[0, 2.24], [1.41, 1], [2.83, 1, 2], [0, 1]]
>>> [[0, 1], [0, 1], [0, 1, 2], [3, 4]]
This shrinks down the overall search space, but I am unclear how I can then compute the global optimum. I stumbled across this post: Find optimal unique neighbour pairs based on closest distance,
but that solution is for a single set of points, and I am unclear how to translate the method to my case.
Any advice would be greatly appreciated!
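For a small example like this I can get the globally optimal pairing with a dense cost matrix and scipy.optimize.linear_sum_assignment, but a dense matrix will not scale to ~100k points per set:
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

cost = cdist(points_a, points_b)                # full pairwise Euclidean distance matrix
row_ind, col_ind = linear_sum_assignment(cost)  # Hungarian-style optimal assignment
print(col_ind)
# [0 1 2 3]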
Update
Hey all. With everyone's advice I found a somewhat working solution:
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix, csgraph

def colocalize_points(points_a: np.ndarray, points_b: np.ndarray, r: int):
    """ Find pairs that minimize global distance. Filters out anything outside radius `r` """
    neigh = NearestNeighbors(n_neighbors=1)
    neigh.fit(points_b)
    distances, b_indices = neigh.radius_neighbors(points_a, radius=r)

    # flatten and get indices for A; this will also drop points in A with no matches in range
    # (the +1 keeps zero distances from vanishing as implicit zeros in the sparse matrix)
    d_flat = np.hstack(distances) + 1
    b_flat = np.hstack(b_indices)
    a_flat = np.array([i for i, neighbors in enumerate(distances) for n in neighbors])

    # filter out A points that cannot be matched
    sm = csr_matrix((d_flat, (a_flat, b_flat)))
    a_matchable = csgraph.maximum_bipartite_matching(sm, perm_type='column')
    sm_filtered = sm[a_matchable != -1]

    # now run the distance-minimizing matching
    row_match, col_match = csgraph.min_weight_full_bipartite_matching(sm_filtered)
    return row_match, col_match
The only issue I have is that by filtering the matrix with maximum_bipartite_matching I cannot be sure I truly have the best result, since it just returns the first match. For example, if I have 2 points in A, [[2, 2], [3, 3]], whose only candidate match is [3, 3], maximum_bipartite_matching will keep whichever appears first. So if [2, 2] appears first in the matrix, [3, 3] will be dropped despite being a better match.
Update 1
To address comments below, here is my reasoning why maximum_bipartite_matching does not give me the desired solution. Consider points:
points_a = np.array([(1, 1), (2, 2), (3, 3)])
points_b = np.array([(1, 1), (2, 2), (3, 5), (2, 3)])
The optimal a,b point pairing that minimizes distance will be:
{(1, 1): (1, 1),
 (2, 2): (2, 2),
 (3, 3): (2, 3)}
However if I run the following:
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(points_b)
distances, b_indices = neigh.radius_neighbors(points_a, radius=3)
# flatten and get indices for A; this will also drop points in A with no matches in range
d_flat = np.hstack(distances) + 1
b_flat = np.hstack(b_indices)
a_flat = np.array([i for i, neighbors in enumerate(distances) for n in neighbors])
# build the sparse matrix and run the cardinality matching
sm = csr_matrix((d_flat, (a_flat, b_flat)))
a_matchable = csgraph.maximum_bipartite_matching(sm, perm_type='column')
# a_matchable[i] is the column (index into points_b) matched to row i
print([(points_a[i], points_b[j]) for i, j in enumerate(a_matchable)])
I get the solution:
{(1, 1): (1, 1),
 (2, 2): (2, 2),
 (3, 3): (3, 5)}
Swapping the last two points in points_b will give me the expected solution. This indicates to me that the algorithm is not taking the distance (weight) into account and instead just tries to maximize the number of connections. I could very well have made a mistake though so please let me know.
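For what it's worth, maximum_bipartite_matching only maximizes the number of matched pairs and ignores the weights entirely, so it is only useful as a feasibility filter here. Running the weight-aware solver directly on the same matrix (a sketch reusing sm, points_a, points_b and the csgraph import from the snippets above) does return the expected pairing:
rows, cols = csgraph.min_weight_full_bipartite_matching(sm)
print([(points_a[i].tolist(), points_b[j].tolist()) for i, j in zip(rows, cols)])
# [([1, 1], [1, 1]), ([2, 2], [2, 2]), ([3, 3], [2, 3])]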
So I'm trying to render a 2D mesh using VTK (in Python). I have a list
of tuples containing all the points, and also a list of tuples containing the
points of each cell. Just to experiment, I tried to create a polydata object
of a square with 4 elements and render it, but I ended up with a single solid square.
I would like it to show the lines connecting the nodes (like a wireframe)
instead of a solid square.
This is the code that produces that result:
import vtk
import numpy as np

def mkVtkIdList(it):
    # Helper from the standard VTK examples: build a vtkIdList from a Python iterable.
    vil = vtk.vtkIdList()
    for i in it:
        vil.InsertNextId(int(i))
    return vil

def main2():
    # Array of vectors containing the coordinates of each point
    nodes = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [2, 1, 0], [2, 2, 0],
                      [1, 2, 0], [0, 2, 0], [0, 1, 0], [1, 1, 0]])
    # Array of tuples containing the nodes corresponding to each element
    elements = np.array([(0, 1, 8, 7), (7, 8, 5, 6), (1, 2, 3, 8), (8, 3, 4, 5)])

    # Make the building blocks of the polydata attributes
    Mesh = vtk.vtkPolyData()
    Points = vtk.vtkPoints()
    Cells = vtk.vtkCellArray()

    # Load the point and cell attributes
    for i in range(len(nodes)):
        Points.InsertPoint(i, nodes[i])
    for i in range(len(elements)):
        Cells.InsertNextCell(mkVtkIdList(elements[i]))

    # Assign pieces to the vtkPolyData
    Mesh.SetPoints(Points)
    Mesh.SetPolys(Cells)

    # Map the whole thing
    MeshMapper = vtk.vtkPolyDataMapper()
    if vtk.VTK_MAJOR_VERSION <= 5:
        MeshMapper.SetInput(Mesh)
    else:
        MeshMapper.SetInputData(Mesh)

    # Create an actor
    MeshActor = vtk.vtkActor()
    MeshActor.SetMapper(MeshMapper)

    # Rendering stuff
    camera = vtk.vtkCamera()
    camera.SetPosition(1, 1, 1)
    camera.SetFocalPoint(0, 0, 0)

    renderer = vtk.vtkRenderer()
    renWin = vtk.vtkRenderWindow()
    renWin.AddRenderer(renderer)
    iren = vtk.vtkRenderWindowInteractor()
    iren.SetRenderWindow(renWin)

    renderer.AddActor(MeshActor)
    renderer.SetActiveCamera(camera)
    renderer.ResetCamera()
    renderer.SetBackground(1, 1, 1)
    renWin.SetSize(300, 300)

    # Interact with the data
    renWin.Render()
    iren.Start()

main2()
I would also like to know if it's possible to have a gridline as the
background of the render window, instead of a black color, just like this:
Thanks in advance!
You can use MeshActor.GetProperty().SetRepresentationToWireframe() (https://www.vtk.org/doc/nightly/html/classvtkProperty.html#a2a4bdf2f46dc499ead4011024eddde5c) to render the actor as wireframe, or MeshActor.GetProperty().SetEdgeVisibility(True) to render it as solid with edges rendered as lines.
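For instance, applied to the MeshActor from the question (a minimal sketch of both options):
# Option 1: render only the cell edges.
MeshActor.GetProperty().SetRepresentationToWireframe()

# Option 2: keep the solid faces and draw the edges as lines on top.
MeshActor.GetProperty().SetEdgeVisibility(True)
MeshActor.GetProperty().SetEdgeColor(0, 0, 0)  # black edges show up on the white background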
Regarding the render window background, I don't know.
Thanks to @MafiaSkafia I created what I was looking for: a 2D grid for 3D purposes. Maybe someone else will be looking for something like this too.
# plane
planeSource = vtk.vtkPlaneSource()
planeSource.SetOrigin(-100.0, -100.0, 0.0)
# planeSource.SetNormal(0.0, 0.0, 1.0)
planeSource.SetResolution(100, 100)
planeSource.SetPoint1(100.0, -100.0, 0.0)
planeSource.SetPoint2(-100.0, 100.0, 0.0)
planeSource.Update()
plane = planeSource.GetOutput()

# Create a mapper and actor
mapperP = vtk.vtkPolyDataMapper()
mapperP.SetInputData(plane)
actorP = vtk.vtkActor()
actorP.SetMapper(mapperP)
actorP.GetProperty().SetColor(0, 0, 0)
actorP.GetProperty().EdgeVisibilityOn()  # showing the mesh
actorP.GetProperty().SetEdgeColor(1, 1, 1)
actorP.GetProperty().SetOpacity(0.2)  # transparency
...
renderer.AddActor(actorP)
I have a set of images over which polygons are drawn. I have the points of those polygons and I draw these using Shapely and check whether certain points from an eye tracker fall into the polygons.
Now, some of those images are mirrored but I do not have the coordinates of the polygons drawn in them. How can I flip the polygons horizontally? Is there a way to do this with Shapely?
If you want to reflect a polygon with respect to a vertical axis, i.e. to flip it horizontally, one option would be to use the scale transformation (with a negative unit scaling factor) provided by shapely.affinity, or to use a custom transformation:
from shapely.affinity import scale
from shapely.ops import transform
from shapely.geometry import Polygon

def reflection(x0):
    # reflect across the vertical line x = x0
    return lambda x, y: (2*x0 - x, y)

P = Polygon([[0, 0], [1, 1], [1, 2], [0, 1]])
print(P)
# POLYGON ((0 0, 1 1, 1 2, 0 1, 0 0))

Q1 = scale(P, xfact=-1, origin=(1, 0))
Q2 = transform(reflection(1), P)
print(Q1)
# POLYGON ((2 0, 1 1, 1 2, 2 1, 2 0))
print(Q2)
# POLYGON ((2 0, 1 1, 1 2, 2 1, 2 0))
By multiplying by [[1, 0], [0, -1]] you can get the vertically flipped shape. (I tested this in a Jupyter notebook.)
import numpy as np
from shapely.geometry import Polygon  # display() is provided by Jupyter/IPython

pts = np.array([[153, 347],
                [161, 323],
                [179, 305],
                [195, 315],
                [184, 331],
                [177, 357]])
display(Polygon(pts))
display(Polygon(pts.dot([[1, 0], [0, -1]])))
And if you multiply by [[-1, 0], [0, 1]], you will get the horizontally flipped shape.
Refer to linear transformations to understand why this works.
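One practical note for the mirrored-image use case in the question: reflecting about x = 0 moves the polygon out of the image frame. For an image of width w (a hypothetical value below), reflecting about the vertical centerline x = w/2 keeps the coordinates in-frame; a sketch using the scale transform from the first answer:
from shapely.affinity import scale
from shapely.geometry import Polygon

w = 640  # hypothetical image width in pixels
P = Polygon([(10, 20), (50, 20), (30, 60)])
P_mirrored = scale(P, xfact=-1, origin=(w / 2, 0))  # maps x to w - x
print(P_mirrored)
# POLYGON ((630 20, 590 20, 610 60, 630 20))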
I have a large list of x and y coordinates, stored in a numpy array.
Coordinates = [[ 60037633 289492298]
               [ 60782468 289401668]
               [ 60057234 289419794]
               ...]
What I want is to find all nearest neighbors within a specific distance (let's say 3 meters) and store the result so that I can later do some further analysis on it.
For most packages I found, it is necessary to decide how many NNs should be found, but I just want all of them within the set distance.
How can I achieve something like that, and what is the fastest and best way to do it for a large dataset (some millions of points)?
You could use a scipy.spatial.cKDTree:
import numpy as np
import scipy.spatial as spatial
points = np.array([(1, 2), (3, 4), (4, 5)])
point_tree = spatial.cKDTree(points)
# This finds the index of all points within distance 1 of [1.5,2.5].
print(point_tree.query_ball_point([1.5, 2.5], 1))
# [0]
# This gives the point in the KDTree which is within 1 unit of [1.5, 2.5]
print(point_tree.data[point_tree.query_ball_point([1.5, 2.5], 1)])
# [[1 2]]
# More than one point is within 3 units of [1.5, 1.6].
print(point_tree.data[point_tree.query_ball_point([1.5, 1.6], 3)])
# [[1 2]
# [3 4]]
Here is an example showing how you can
find all the nearest neighbors to an array of points, with one call
to point_tree.query_ball_point:
import numpy as np
import scipy.spatial as spatial
import matplotlib.pyplot as plt

np.random.seed(2015)

centers = [(1, 2), (3, 4), (4, 5)]
points = np.concatenate([pt + np.random.random((10, 2))*0.5
                         for pt in centers])
point_tree = spatial.cKDTree(points)

cmap = plt.get_cmap('copper')
colors = cmap(np.linspace(0, 1, len(centers)))
for center, group, color in zip(centers, point_tree.query_ball_point(centers, 0.5), colors):
    cluster = point_tree.data[group]
    x, y = cluster[:, 0], cluster[:, 1]
    plt.scatter(x, y, c=color, s=200)
plt.show()
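If what you need is every within-radius pair among the points themselves, as in the question (some millions of points, radius 3 meters), the tree can also report all pairs in a single call. A sketch on random stand-in data, using query_pairs:
import numpy as np
import scipy.spatial as spatial

np.random.seed(2015)
coords = np.random.uniform(0, 1000, size=(100000, 2))  # stand-in for the real coordinates

tree = spatial.cKDTree(coords)
pairs = tree.query_pairs(r=3.0, output_type='ndarray')  # array of (i, j) index pairs with i < j
print(pairs.shape)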
I want to create a list containing the 3-D coords of a grid of regularly spaced points, each as a 3-element tuple. I'm looking for advice on the most efficient way to do this.
In C++ for instance, I simply loop over three nested loops, one for each coordinate. In Matlab, I would probably use the meshgrid function (which would do it in one command). I've read about meshgrid and mgrid in Python, and I've also read that using numpy's broadcasting rules is more efficient. It seems to me that using the zip function in combination with the numpy broadcast rules might be the most efficient way, but zip doesn't seem to be overloaded in numpy.
Use ndindex:
import numpy as np

ind = np.ndindex(3, 3, 2)
for i in ind:
    print(i)
# (0, 0, 0)
# (0, 0, 1)
# (0, 1, 0)
# (0, 1, 1)
# (0, 2, 0)
# (0, 2, 1)
# (1, 0, 0)
# (1, 0, 1)
# (1, 1, 0)
# (1, 1, 1)
# (1, 2, 0)
# (1, 2, 1)
# (2, 0, 0)
# (2, 0, 1)
# (2, 1, 0)
# (2, 1, 1)
# (2, 2, 0)
# (2, 2, 1)
Instead of meshgrid and mgrid, you can use ogrid, which is a "sparse" version of mgrid. That is, only the dimension along which the values change is filled in; the others are simply broadcast. This uses much less memory for large grids than the non-sparse alternatives.
For example:
>>> import numpy as np
>>> x, y = np.ogrid[-1:2, -2:3]
>>> x
array([[-1],
[ 0],
[ 1]])
>>> y
array([[-2, -1, 0, 1, 2]])
>>> x**2 + y**2
array([[5, 2, 1, 2, 5],
[4, 1, 0, 1, 4],
[5, 2, 1, 2, 5]])
I would say go with meshgrid or mgrid, in particular if you need non-integer coordinates. I'm surprised that Numpy's broadcasting rules would be more efficient, as meshgrid was designed especially for the problem that you want to solve.
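For example, here is a minimal sketch (with hypothetical grid bounds and spacing) that builds the flat array of 3-D coordinates the question asks for:
import numpy as np

# 'ij' indexing keeps the axes in the same order as the nested C++ loops would.
x, y, z = np.meshgrid(np.linspace(0, 1, 3), np.linspace(0, 1, 3), np.linspace(0, 2, 2),
                      indexing='ij')
coords = np.stack([x, y, z], axis=-1).reshape(-1, 3)
print(len(coords), coords[0], coords[-1])
# 18 [0. 0. 0.] [1. 1. 2.]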
For multi-d (greater than 2) meshgrids, use numpy.lib.index_tricks.nd_grid, like so:
import numpy
grid = numpy.lib.index_tricks.nd_grid()
g1 = grid[:3,:3,:3]
g2 = grid[0:1:0.5, 0:1, 0:2]
g3 = grid[0:1:3j, 0:1:2j, 0:2:2j]
where g1 has x values of [0, 1, 2],
g2 has x values of [0, 0.5],
and g3 has x values of [0.0, 0.5, 1.0] (the 3j defines the step count instead of the step increment; see the documentation for more details).
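As a side note, numpy.mgrid and numpy.ogrid are prebuilt nd_grid instances (dense and sparse, respectively), so the same calls can be written more directly:
import numpy as np

g3 = np.mgrid[0:1:3j, 0:1:2j, 0:2:2j]  # equivalent to grid[0:1:3j, 0:1:2j, 0:2:2j] above
print(g3.shape)
# (3, 3, 2, 2): the first axis stacks the x, y and z coordinate arrays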
Here's an efficient option similar to your C++ solution, which I've used for exactly the same purpose:
import numpy, itertools, collections

def grid(xmin, xmax, xstep, ymin, ymax, ystep, zmin, zmax, zstep):
    "return nested tuples of grid-sampled coordinates that include maxima"
    return collections.deque(itertools.product(
        numpy.arange(xmin, xmax + xstep, xstep).tolist(),
        numpy.arange(ymin, ymax + ystep, ystep).tolist(),
        numpy.arange(zmin, zmax + zstep, zstep).tolist()))
Performance is best (in my tests) when using a.tolist(), as shown above, but you can use a.flat instead and drop the deque() to get an iterator that will sip memory. Of course, you can also use a plain old tuple() or list() instead of deque() for a slight performance penalty (again, in my tests).
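A minimal sketch of that iterator variant, with the same assumed signature as grid() above:
import numpy, itertools

def grid_iter(xmin, xmax, xstep, ymin, ymax, ystep, zmin, zmax, zstep):
    "like grid() above, but yields coordinate tuples lazily instead of storing them"
    return itertools.product(
        numpy.arange(xmin, xmax + xstep, xstep).flat,
        numpy.arange(ymin, ymax + ystep, ystep).flat,
        numpy.arange(zmin, zmax + zstep, zstep).flat)

for point in grid_iter(0, 1, 0.5, 0, 1, 0.5, 0, 1, 1):
    pass  # consume one (x, y, z) tuple at a time without holding the whole grid in memory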