I have a list of unordered points (2D) and I want to calculate the sum of the distances between them.
As my background is in C++ development, I would do it like this:
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def distance(P1, P2):
    return math.sqrt((P2.x+P1.x)**2 + (P2.y+P1.y)**2)

points = [Point(rand(1), rand(1)) for i in range(10)]

#this part should be in a nicer way
pathLen = 0
for i in range(1, 10):
    pathLen += distance(points[i-1], points[i])
Is there a more pythonic way to replace the for loop, e.g. with reduce or something like that?
Best regards!
You can use a generator expression with sum, zip and itertools.islice to avoid duplicating data:
from itertools import islice
pathLen = sum(distance(x, y) for x, y in zip(points, islice(points, 1, None)))
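On Python 3.10 and later, itertools.pairwise expresses the same consecutive pairing even more directly; a minimal sketch, reusing the distance function from the question:

from itertools import pairwise  # Python 3.10+

pathLen = sum(distance(p, q) for p, q in pairwise(points))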
A few fixes, as a C++ approach is probably not the best here:
import math
# you need this import here: Python has no rand in the main namespace
from random import random

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    # there's usually no need to encapsulate variables in Python

def distance(P1, P2):
    # your distance formula was wrong:
    # you were adding positions on each axis instead of subtracting them
    return math.sqrt((P1.x-P2.x)**2 + (P1.y-P2.y)**2)

points = [Point(random(), random()) for i in range(10)]

# use a sum over a list comprehension:
pathLen = sum([distance(points[i-1], points[i]) for i in range(1, 10)])
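A small refinement: a generator expression does the same job without materializing an intermediate list:

pathLen = sum(distance(points[i-1], points[i]) for i in range(1, 10))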
Robin Zigmond's zip approach is also a neat way to achieve it, though it wasn't immediately obvious to me that it could be used here.
I ran into a similar problem and pieced together a numpy solution which I think works nicely.
Namely, if you cast your list of points to a numpy array of (x, y) coordinates, you can then do the following:
import numpy as np

pts = np.asarray(points)  # assumes points is a sequence of (x, y) pairs
dist = np.sqrt(np.sum((pts[np.newaxis, :, :] - pts[:, np.newaxis, :])**2, axis=2))
dist is an n×n symmetric numpy array in which the distance from each point to every other point appears above and below the diagonal. The diagonal holds each point's distance to itself, so it is all 0s.
You can then use:
path_len = np.sum(dist[np.triu_indices(pts.shape[0], 1)])
to collect the upper half of the array and sum it to get the path length.
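As an aside, scipy.spatial.distance.pdist computes the same condensed set of pairwise distances directly, without building the full n×n matrix; a sketch under the same assumption that pts is an (n, 2) array:

from scipy.spatial.distance import pdist

path_len = pdist(pts).sum()  # sum of all unique pairwise distances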
I'm trying to plot a simple moving averages function but the resulting array is a few numbers short of the full sample size. How do I plot such a line alongside a more standard line that extends for the full sample size? The code below results in this error message:
ValueError: x and y must have same first dimension, but have shapes (96,) and (100,)
This is using standard matplotlib.pyplot. I've tried just deleting X values using remove and del, as well as switching all arrays to numpy arrays (since that's the output format of my moving averages function), and then tried adding an if condition to the append in the while loop, but none of these worked.
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    return smas

sampleSize = 100
min = -10
max = 10
window = 5
vX = np.array([])
vY = np.array([])
x = 0
val = 0

while x < sampleSize:
    val += (random.randint(min, max))
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
Expected results would be two lines on the same graph - one a simple moving average of the other.
Just change this line to the following:
smas = np.convolve(values, weights, 'same')
The 'valid' option only convolves where the window completely overlaps the values array. What you want is 'same', which does what you are looking for.
Edit: This, however, comes with its own issues, as it acts as if there were extra data with value 0 wherever your window does not fully sit on top of the data. This can be ignored if you so choose, as is done in this solution, but another approach is to pad the array with specific values of your choosing instead (see Mike Sperry's answer).
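A quick way to see the difference between the two modes (a minimal sketch):

import numpy as np

values = np.arange(10.0)
weights = np.repeat(1.0, 5) / 5
print(np.convolve(values, weights, 'valid').shape)  # (6,):  len(values) - window + 1
print(np.convolve(values, weights, 'same').shape)   # (10,): len(values), edges zero-padded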
Here is how you would pad a numpy array out to the desired length with nans (replace np.nan with other values, or replace 'constant' with another mode, depending on the desired result):
https://docs.scipy.org/doc/numpy/reference/generated/numpy.pad.html
import numpy as np

bob = np.asarray([1, 2, 3], dtype=float)  # must be a float array to hold nan
alice = np.pad(bob, (0, 100 - len(bob)), 'constant', constant_values=(np.nan, np.nan))
So in your code it would look something like this:
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    shorted = int((100 - len(smas)) / 2)
    print(shorted)
    smas = np.pad(smas, (shorted, shorted), 'constant', constant_values=(np.nan, np.nan))
    return smas

sampleSize = 100
min = -10
max = 10
window = 5
vX = np.array([])
vY = np.array([])
x = 0
val = 0

while x < sampleSize:
    val += (random.randint(min, max))
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
To answer your basic question, the key is to take a slice of the x-axis appropriate to the data of the moving average. Since you have a convolution of 100 data elements with a window of size 5, the result is valid for the last 96 elements. You would plot it like this:
plt.plot(vX[window - 1:], movingaverage(vY, window))
That being said, your code could stand to have some optimization done on it. For example, numpy arrays are stored in fixed size static buffers. Any time you do append or delete on them, the entire thing gets reallocated, unlike Python lists, which have amortization built in. It is always better to preallocate if you know the array size ahead of time (which you do).
Secondly, running an explicit loop is rarely necessary. You are generally better off using the under-the-hood loops implemented at the lowest level in the numpy functions instead. This is called vectorization. Random number generation, cumulative sums and incremental arrays are all fully vectorized in numpy. In a more general sense, it's usually not very effective to mix Python and numpy computational functions, including random.
Finally, you may want to consider a different convolution method. I would suggest something based on numpy.lib.stride_tricks.as_strided. This is a somewhat arcane, but very effective way to implement a sliding window with numpy arrays. I will show it here as an alternative to the convolution method you used, but feel free to ignore this part.
All in all:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    # this step creates a view into the same buffer
    values = np.lib.stride_tricks.as_strided(values, shape=(window, values.size - window + 1), strides=values.strides * 2)
    smas = values.sum(axis=0, dtype=float)  # float result so the in-place division below works
    smas /= window  # in-place to avoid a temporary array
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

v_x = np.arange(sampleSize)
v_y = np.cumsum(np.random.random_integers(min, max, sampleSize))

plt.plot(v_x, v_y)
plt.plot(v_x[window - 1:], movingaverage(v_y, window))
plt.show()
A note on names: in Python, variable and function names are conventionally written name_with_underscore; CamelCase is reserved for class names. np.random.random_integers uses inclusive bounds just like random.randint, but also allows you to specify the number of samples to generate. Confusingly, np.random.randint has an exclusive upper bound, more like random.randrange. (random_integers has since been deprecated; in newer NumPy you would use np.random.randint, or Generator.integers with endpoint=True, instead.)
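As a further aside, NumPy 1.20+ ships numpy.lib.stride_tricks.sliding_window_view, a safer wrapper around the same striding idea; a minimal sketch of the moving average with it:

import numpy as np

def movingaverage(values, window):
    # returns a read-only (n - window + 1, window) view; no copy is made
    windows = np.lib.stride_tricks.sliding_window_view(values, window)
    return windows.mean(axis=1)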
I have a list of Shapely polygons and a point like so:
from shapely.geometry import Point, Polygon
polygons = [Polygon(...), Polygon(...), ...]
point = Point(2.5, 5.7)
and I want to find the polygon in the list that is closest to that point. I'm already aware of the object.distance(other) method, which returns the minimum distance between two geometric shapes, and I thought about computing all the distances in a loop to find the closest polygon:
polygons = [Polygon(...), Polygon(...), ...]
point = Point(2.5, 5.7)
min_dist = 10000
closest_polygon = None

for polygon in polygons:
    dist = polygon.distance(point)
    if dist < min_dist:
        min_dist = dist
        closest_polygon = polygon
My question is: Is there a more efficient way to do it?
There is a shorter way, e.g.
from shapely.geometry import Point, Polygon
import random
from operator import itemgetter

def random_coords(n):
    return [(random.randint(0, 100), random.randint(0, 100)) for _ in range(n)]

polys = [Polygon(random_coords(3)) for _ in range(4)]
point = Point(random_coords(1))

min_distance, min_poly = min(((poly.distance(point), poly) for poly in polys), key=itemgetter(0))
As Georgy mentioned (++awesome!), there is an even more concise version:
min_poly = min(polys, key=point.distance)
but keep in mind that distance computation is in general computationally intensive.
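If you have many polygons or many query points, a spatial index can prune candidates before exact distances are computed. A sketch using shapely's STRtree, assuming Shapely 2.x, where nearest returns the index of the nearest geometry:

from shapely.strtree import STRtree

tree = STRtree(polys)  # build once, query many times
nearest_poly = polys[tree.nearest(point)]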
I have a solution that works if you have at least 2 polygons whose mutual distance is different from 0. Let's call these two polygons "basePolygon0" and "basePolygon1". The idea is to build a KD-tree with the distance of each polygon to each of the two "basis" polygons.
Once the KD-tree has been built, we can query it with the distances of a point to each of the basis polygons.
Here's a working example:
from shapely.geometry import Point, Polygon
import numpy as np
from scipy.spatial import KDTree

# prepare a test with triangles
poly0 = Polygon([(3,-1),(5,-1),(4,2)])
poly1 = Polygon([(-2,1),(-4,2),(-3,4)])
poly2 = Polygon([(-3,-3),(-4,-6),(-2,-6)])
poly3 = Polygon([(-1,-4),(1,-4),(0,-1)])
polys = [poly0, poly1, poly2, poly3]

p0 = Point(4,-3)
p1 = Point(-4,1)
p2 = Point(-4,-2)
p3 = Point(0,-2.5)
testPoints = [p0, p1, p2, p3]

# select basis polygons
# it works with any pair of polygons that have non-zero distance
basePolygon0 = polys[0]
basePolygon1 = polys[1]

# compute the tree query vector for a geometry
def buildQuery(point):
    distToBasePolygon0 = basePolygon0.distance(point)
    distToBasePolygon1 = basePolygon1.distance(point)
    return np.array([distToBasePolygon0, distToBasePolygon1])

distances = np.array([buildQuery(poly) for poly in polys])

# build the KD tree
tree = KDTree(distances)

# test it
for p in testPoints:
    q = buildQuery(p)
    output = tree.query(q)
    print(output)
This yields as expected:
# (distance, polygon_index_in_KD_tree)
(2.0248456731316584, 0)
(1.904237866994273, 1)
(1.5991500555008626, 2)
(1.5109986459170694, 3)
There is one way that might be faster, but without doing any actual tests, it's hard for me to say for sure.
This might not work for your situation, but the basic idea is that each time a Shapely object is added to the collection, you adjust the positions of the array elements so that it always stays "sorted" in this manner. In Python, this can be done with the heapq module. The only issue with that module is that it's hard to supply a comparison function for arbitrary objects, so you have to do something like this answer, where you make a custom class that stores the objects in the heap as tuples.
import heapq
import itertools

class MyHeap(object):
    def __init__(self, initial=None, key=lambda x: x):
        self.key = key
        # a monotonically increasing counter breaks ties between equal keys,
        # so the stored objects themselves never need to be comparable
        self._counter = itertools.count()
        if initial:
            self._data = [(key(item), next(self._counter), item) for item in initial]
            heapq.heapify(self._data)
        else:
            self._data = []

    def push(self, item):
        heapq.heappush(self._data, (self.key(item), next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._data)[-1]
The first element in each tuple is the "key", which in this case would be the distance to the point, and the last element is the actual Shapely object. You could use it like so:
point = Point(2.5, 5.7)
heap = MyHeap(initial=None, key=lambda x:x.distance(point))
heap.push(Polygon(...))
heap.push(Polygon(...))
# etc...
And at the end, the object you're looking for will be at heap.pop().
Ultimately, though, both algorithms seem to be (roughly) O(n), so any speed up would not be a significant one.
I am trying to convert a set of 3D points into a heightmap (a 2D image that shows the largest displacement of the points from the floor).
The only way I can come up with is writing a for loop that iterates through all points and updates the heightmap; this method is quite slow.
import numpy as np

heightmap_resolution = 0.02

# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])

heightmap = np.zeros((int(np.max(points[:,1])/heightmap_resolution) + 1,
                      int(np.max(points[:,0])/heightmap_resolution) + 1))

for point in points:
    y = int(point[1]/heightmap_resolution)
    x = int(point[0]/heightmap_resolution)
    if point[2] > heightmap[y][x]:
        heightmap[y][x] = point[2]
I wonder if there is a better way of doing this. Any improvement is greatly appreciated!
The intuition:
If you find yourself using a for loop with numpy, you probably need to check again whether numpy has an operation for it. I saw you wanted to compare items to get a max, and I wasn't sure if the structure was important, so I changed it.
The second point is that heightmap pre-allocates a lot of memory you aren't going to use. Try using a dictionary with a tuple (x, y) as the key, or a pandas DataFrame as below:
import numpy as np
import pandas as pd
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
points_df = pd.DataFrame(points, columns = ['x','y','z'])
#didn't know if you wanted to keep the x and y columns so I made new ones.
points_df['x_normalized'] = (points_df['x']/heightmap_resolution).astype(int)
points_df['y_normalized'] = (points_df['y']/heightmap_resolution).astype(int)
points_df.groupby(['x_normalized','y_normalized'])['z'].max()
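A numpy-only alternative, if you want to keep the pre-allocated 2D array, is np.maximum.at, which applies an unbuffered element-wise maximum at the given indices; a minimal sketch, assuming points is an (n, 3) array as above:

import numpy as np

heightmap_resolution = 0.02
points = np.random.uniform(0, 2, (1000, 3))  # hypothetical sample points

xi = (points[:, 0] / heightmap_resolution).astype(int)
yi = (points[:, 1] / heightmap_resolution).astype(int)
heightmap = np.zeros((yi.max() + 1, xi.max() + 1))
np.maximum.at(heightmap, (yi, xi), points[:, 2])  # max z per cell, no Python loop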
I have 2 arrays, sample and data. data will hold the coordinates of an ellipse: all points inside the ellipse will be white and all points outside will be black.
Now I want to calculate the spatial distance between my sample array and the data array (the ellipse array), given a certain centre of the ellipse (x, y). All possible centre points of the ellipse are stored in another array called center_points.
However, when I run the code I receive an empty list, although I expect a list of spatial distances.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import distance

center_points = []

def combinations(x, y):
    dx = 2
    dy = 2
    return x-dx, y-dy

for x in range(10):
    for y in range(10):
        center_points.append(combinations(x, y))

sample = np.random.rand(100, 100)

#spatial distance
spatial_distance = []
data = np.empty((100, 100))

def ellipse(x, y):
    if (x**2 + y**2/3) > 300:
        return 0
    else:
        return 1

def translate(x, y, DX, DY):
    return (x - DX, y - DY)

def rotate(m, n):
    theta = np.radians(45)
    matrix = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    return np.dot(matrix, (m, n))

for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        data[i][j] = ellipse(i, j)
        data[i][j] = rotate(i, j)
        for a, b in center_points:
            data.append((translate(i, j, a, b)))
            spatial_distance.append(distance.hamming(data, sample))
Indentation error, by the looks of things. The for loops after your rotate function will never run, as they were indented such that the interpreter thinks they are part of the rotate function, which returns before the loops are entered.
Although I think the line data[i][j] = rotate(i, j) in your loops will throw an error anyway.
EDIT: the original error has been edited out of the OP
I'm working on an image processing program with OpenCV and numpy. For most pixel operations, I'm able to avoid nested for loops by using np.vectorize(), but one of the functions I need to implement requires as a parameter the 'distance from center', or basically the coordinates of the point being processed.
Pseudo-example:
myArr = [[0,1,2],
         [3,4,5]]

def myFunc(val, row, col):
    return [row, col]

f = np.vectorize(myFunc)
myResult = f(myArr, row, col)
I obviously can't get elemX and elemY from the vectorized array, but is there another numpy function I could use in this situation, or do I have to use for loops? Is there a way to do it using OpenCV?
The function I need to put each pixel through is:
f(i, j) = 1/(1 + d(i, j)/L), where d(i, j) is the Euclidean distance of the point from the center of the image.
You can get an array of distances from the center using the following lines (this is one example; there are a lot of ways to do it):
import numpy as np
myArr = np.array([[0,1,2], [3,4,5]])
nx, ny = myArr.shape
x = np.arange(nx) - (nx-1)/2.  # shift x and y so they measure distance from the center,
y = np.arange(ny) - (ny-1)/2.  # in pixel units (as opposed to normalizing the axis length to 1., the other common choice)
X, Y = np.meshgrid(x, y)
d = np.sqrt(X**2 + Y**2)
# d =
# [[ 1.11803399 1.11803399]
# [ 0.5 0.5 ]
# [ 1.11803399 1.11803399]]
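(Note that with meshgrid's default 'xy' indexing, d has shape (ny, nx), the transpose of myArr's shape; pass indexing='ij' to np.meshgrid if you want the shapes to match.)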
Then you can calculate f(i, j) by:
f = 1/(1 + d/L)
As an aside, your heavy use of np.vectorize() is a bit dubious. Are you sure it's doing what you want, and did you note this statement from the documentation:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
It's generally better to just write your code in vectorized form (like my line for f above, which will work whether L is an array or a scalar) rather than rely on numpy.vectorize(); these are different things.
np.vectorize doesn't accelerate the code; you can vectorize it this way:
# this computes the distance between every point of MyArray and the center
dist_vector = np.sqrt(np.sum(np.power(center - MyArray, 2), axis=1))
# F will contain the target value for each point
F = 1. / (1 + dist_vector / L)
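For completeness, a self-contained sketch of the same idea; center, MyArray and L above are stand-ins, so the image size and scaling constant here are hypothetical values:

import numpy as np

h, w = 100, 200                                  # hypothetical image size
L = 50.0                                         # hypothetical scaling constant
center = np.array([(h - 1) / 2., (w - 1) / 2.])

coords = np.indices((h, w)).reshape(2, -1).T     # (h*w, 2) array of pixel coordinates
dist_vector = np.sqrt(np.sum((coords - center)**2, axis=1))
F = (1. / (1 + dist_vector / L)).reshape(h, w)   # f(i, j) for the whole image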