python loops very slow


I have a question concerning the speed of loops in Python.
I created the following loops to fill values in my array, but it is very slow.
Is there a way to make it run faster?
winW = 1
winH = 200
runlength = np.zeros(shape=(img.shape[0], img.shape[1]))
for y in range(0, img.shape[0] - winH, 1):
    for x in range(0, img.shape[1] - winW, 1):
        runlength[y, x] += np.sum(img[y:y + winH, x:x + winW]) / (winH * winW)
        runlength[y + winH, x] += np.sum(img[y:y + winH, x:x + winW]) / (winH * winW)
Thanks for your help
Edit: I should specify that I can only use NumPy, not SciPy.

Let me describe how to speed up the first operation in the for loop, given by
runlength[y, x] += np.sum(img[y:y + winH, x:x + winW]) / (winH * winW)
Basically, you are moving a rectangle of width winW and height winH over the image. You start with the upper-left corner of the rectangle at point (0,0) of the image, then you sum all values in the image that lie below this rectangle and divide the sum by the number of points in the rectangle. The output at position (0,0) is that number. Then you shift the rectangle one pixel to the right and repeat the procedure until you reach the right end of the image. Then you move one row down and repeat.
In image processing terms: you apply a spatial filter mask to the image. The filter is an average filter of width winW and height winH.
To implement this efficiently, you can use the scipy.ndimage.correlate function. The input is your image; the weights array contains the weight by which each element below the rectangle is multiplied. In this case that is an array with dimensions (winH, winW) where every element contains the number 1 / (winH * winW). Thus every point of the image that lies below the rectangle is multiplied by 1 / (winH * winW), and then everything is summed.
To match your algorithm exactly, we need to set the origin to (-(winH // 2), -(winW // 2)) so that the mean of the rectangle is placed at the location of the upper-left corner of the rectangle in the output. (Floor division is used so that the origin stays an integer, which is what correlate expects.)
Finally, to match your algorithm exactly, we have to set to zero all rows from index img.shape[0] - winH downwards and all columns from index img.shape[1] - winW rightwards, since the loop never writes to them. The for-loop can thus be replaced with
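import numpy as np
from scipy.ndimage import correlate

# Cast to float: by default the output has the same dtype as the input, so an integer image would be truncated
runlength_corr = correlate(input=img.astype(float),
                           weights=np.ones((winH, winW)) / (winW * winH),
                           origin=(-(winH // 2), -(winW // 2)))
runlength_corr[(img.shape[0] - winH):, :] = 0
runlength_corr[:, (img.shape[1] - winW):] = 0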
I compared the run time of the nested for-loops and the correlate method on a test image of size 512-by-512:
For-loops: Elapsed time: 0.665 sec
Correlate: Elapsed time: 0.085 sec
So this gives a speed-up of roughly a factor of 8. The sum of absolute differences over the entire output is as low as 7.04e-09, so the outputs of both methods are essentially the same.
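Since the asker can only use NumPy, the same average filter can also be computed without SciPy using a summed-area table (integral image). This is a different technique from correlate, swapped in here: after two cumulative sums, every window sum is four array lookups. A minimal sketch, assuming img is a 2D numeric array; the function name box_mean_upper_left is hypothetical. Like the correlate version above, it reproduces only the first line of the loop.

```python
import numpy as np


def box_mean_upper_left(img, winH, winW):
    """Mean of every winH x winW window, stored at the window's upper-left corner.

    Mirrors runlength[y, x] += np.sum(img[y:y + winH, x:x + winW]) / (winH * winW):
    only y < img.shape[0] - winH and x < img.shape[1] - winW are filled, the rest stay 0.
    """
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    # Summed-area table with a leading row and column of zeros:
    # S[i, j] == img[:i, :j].sum()
    S = np.zeros((H + 1, W + 1))
    S[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    # Number of window positions the original loop visits in each direction
    ny, nx = max(H - winH, 0), max(W - winW, 0)
    # Sum over img[y:y + winH, x:x + winW] via inclusion-exclusion on S
    window_sums = (S[winH:winH + ny, winW:winW + nx]
                   - S[:ny, winW:winW + nx]
                   - S[winH:winH + ny, :nx]
                   + S[:ny, :nx])
    out = np.zeros((H, W))
    out[:ny, :nx] = window_sums / (winH * winW)
    return out
```

The cost no longer depends on the window size. For very large images with large values, the cumulative sums can lose some floating-point precision, so comparing against the loop on a small image is a sensible check.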

For starters, you seem to be calculating the same quantity twice inside your loop. Computing it only once could halve your running time.
Second, if winW is always 1, then np.sum(img[y:y + winH, x:x + winW]) is just
np.sum(img[y:y + winH, x]). That should speed it up a bit.
What remains is how you can speed up np.sum(img[y:y + winH, x]). You can start with calculating
sum0 = np.sum(img[0: 0 + winH, x])
Now, note that the quantity
sum1 = np.sum(img[1: 1 + winH, x])
differs from the previous one by only two pixels, so it is equal to sum0 - img[0, x] + img[winH, x]. For the next y,
sum2 = sum1 - img[1, x] + img[1 + winH, x]
and so on
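The running-sum idea can be vectorized with np.cumsum, which stays within the NumPy-only constraint: a cumulative sum down each column turns every window sum into the difference of two entries, which is exactly "add one pixel, drop one pixel" applied everywhere at once. A sketch assuming winW == 1 as in the question; the function name runlength_winw1 is hypothetical. It also computes each window mean only once and reuses it for both updates.

```python
import numpy as np


def runlength_winw1(img, winH):
    """NumPy-only replacement for the question's double loop, assuming winW == 1.

    Adds each vertical window mean to runlength[y, x] and runlength[y + winH, x]
    for y < img.shape[0] - winH and x < img.shape[1] - 1, like the original loop.
    """
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    # c[k, x] == img[:k, x].sum(); the leading zero row makes window sums simple differences
    c = np.zeros((H + 1, W))
    c[1:] = np.cumsum(img, axis=0)
    ny, nx = max(H - winH, 0), max(W - 1, 0)
    # sum(img[y:y + winH, x]) == c[y + winH, x] - c[y, x]
    means = (c[winH:winH + ny, :nx] - c[:ny, :nx]) / winH
    runlength = np.zeros((H, W))
    runlength[:ny, :nx] += means             # first line of the loop
    runlength[winH:winH + ny, :nx] += means  # second line of the loop
    return runlength
```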

Related

How to vectorize tasks in python?

I (will) have a list of coordinates; using Python's Pillow module, I want to save a series of (cropped) smaller images to disk. Currently, I am using a for loop to determine one coordinate at a time, then crop and save the image before proceeding to the next coordinate.
Is there a way to divide this job up so that multiple images can be cropped and saved simultaneously? I understand that this would take up more RAM but would decrease the running time.
I'm sure this is possible, but I'm not sure whether it is simple. I've heard terms like 'vectorization' and 'multi-threading' that sound vaguely appropriate to this situation, but these topics extend beyond my experience.
I've attached the code for reference. However, I'm simply trying to solicit recommended strategies. (i.e. what techniques should I learn about to better tailor my approach, take multiple crops at once, etc?)
def parse_image(source, square_size, count, captures, offset=0, offset_type=0, print_coords=False):
    """
    Starts at top left corner of image. Iterates through image by square_size (width = height)
    across x values and after exhausting the row, begins next row lower by function of
    square_size. Offset parameter is available such that, with multiple function calls,
    overlapping images could be generated.
    """
    src = Image.open(source)
    dimensions = src.size
    max_down = int(src.height/square_size) * square_size + square_size
    max_right = int(src.width/square_size) * square_size + square_size
    if offset_type == 1:
        tl_x = 0 + offset
        tl_y = 0
        br_x = square_size + offset
        br_y = square_size
        for y in range(square_size,max_down,square_size):
            for x in range(square_size + offset,max_right - offset,square_size):
                if (tl_x,tl_y) not in captures:
                    sample = src.crop((tl_x,tl_y,br_x,br_y))
                    sample.save(f"{source[:-4]}_sample_{count}_x{tl_x}_y{tl_y}.jpg")
                    captures.append((tl_x,tl_y))
                    if print_coords == True:
                        print(f"image {count}: top-left (x,y): {(tl_x,tl_y)}, bottom-right (x,y): {(br_x,br_y)}")
                    tl_x = x
                    br_x = x + square_size
                    count +=1
                else:
                    continue
            tl_x = 0 + offset
            br_x = square_size + offset
            tl_y = y
            br_y = y + square_size
    else:
        tl_x = 0
        tl_y = 0 + offset
        br_x = square_size
        br_y = square_size + offset
        for y in range(square_size + offset,max_down - offset,square_size):
            for x in range(square_size,max_right,square_size):
                if (tl_x,tl_y) not in captures:
                    sample = src.crop((tl_x,tl_y,br_x,br_y))
                    sample.save(f"{source[:-4]}_sample_{count}_x{tl_x}_y{tl_y}.jpg")
                    captures.append((tl_x,tl_y))
                    if print_coords == True:
                        print(f"image {count}: top-left (x,y): {(tl_x,tl_y)}, bottom-right (x,y): {(br_x,br_y)}")
                    tl_x = x
                    br_x = x + square_size
                    count +=1
                else:
                    continue
            tl_x = 0
            br_x = square_size
            tl_y = y + offset
            br_y = y + square_size + offset
    return count

What you want to achieve here is a higher degree of parallelism. The first thing to do is to understand the smallest task you need to perform, and from there, think about how to distribute it better.
The first thing to notice is that there are two behaviours: one for offset_type 0 and another for offset_type 1. Split them into two different functions.
The second thing is: given an image, you're taking crops of a given size at a given offset (x, y) over the whole image. You could, for instance, simplify this function to take one crop of the image, given the offset (x, y). Then you could call this function for all the x and y of the image in parallel. That's pretty much what most image processing frameworks try to achieve, especially the ones that run code on the GPU: small blocks of code that operate locally on the image.
So let's say your image has width=100 and height=100, and you're trying to make crops of w=10, h=10. I will call the simplified function I described crop(img, x, y, crop_size_x, crop_size_y). All you have to do is open the image and call it for every crop position:
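from PIL import Image
def crop(img, x, y, crop_size_x, crop_size_y):
    return img.crop((x, y, x + crop_size_x, y + crop_size_y))  # box is (left, upper, right, lower)
img = Image.open(source)
crop_size_x = 10
crop_size_y = 10
crops = [crop(img, x, y, crop_size_x, crop_size_y)
         for y in range(0, img.height, crop_size_y) for x in range(0, img.width, crop_size_x)]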
Later on, you can replace the list comprehension with the multiprocessing module, which can spawn many processes and achieve real parallelism, or even write such code as a GPU kernel/shader and use the GPU's parallelism to achieve high performance.
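As one way to follow this advice, here is a sketch using the standard library's concurrent.futures.ProcessPoolExecutor instead of a GPU. It assumes square, non-overlapping crops and has each worker process reopen the image from disk, which avoids sending image data between processes. The helper names crop_boxes, crop_and_save and parallel_crop, and the output file naming, are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat


def crop_boxes(width, height, square_size, offset_x=0, offset_y=0):
    """(left, upper, right, lower) boxes for every full square that fits in the image."""
    return [
        (x, y, x + square_size, y + square_size)
        for y in range(offset_y, height - square_size + 1, square_size)
        for x in range(offset_x, width - square_size + 1, square_size)
    ]


def crop_and_save(source, box, out_prefix):
    """Crop one box out of the image at `source` and save it (runs in a worker process)."""
    from PIL import Image  # imported here so this module can be loaded without Pillow
    left, upper, _, _ = box
    with Image.open(source) as img:
        img.crop(box).save(f"{out_prefix}_x{left}_y{upper}.jpg")
    return box


def parallel_crop(source, square_size, out_prefix, max_workers=None):
    """Crop and save all squares of `source`, one task per box, across worker processes."""
    from PIL import Image
    with Image.open(source) as img:
        width, height = img.size
    boxes = crop_boxes(width, height, square_size)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Calls crop_and_save(source, box, out_prefix) for every box in parallel
        return list(pool.map(crop_and_save, repeat(source), boxes, repeat(out_prefix)))
```

Call parallel_crop from under an `if __name__ == "__main__":` guard, which is required on platforms that start worker processes with spawn (such as Windows and macOS). Reopening the image for every crop repeats the decoding work; if the crops are small and numerous, handing each task a batch of boxes (or raising the chunksize argument of pool.map) reduces that overhead.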

Efficient way to calculate all IoUs of two lists

I have a function to calculate the IoU of two rectangles/bounding boxes.
def intersection_over_union(boxA, boxB):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # compute the area of intersection rectangle
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    # compute the area of both the prediction and ground-truth
    # rectangles
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = interArea / float(boxAArea + boxBArea - interArea)
    # return the intersection over union value
    return iou
Now I want to calculate all IoUs of the bboxes in one list with the bboxes in another list, i.e. if list A contains 4 bboxes and list B contains 3 bboxes, then I want a 4x3 matrix with all possible IoUs as a result.
Of course I can do this with two loops like this
import numpy as np
n_i = len(bboxes_a)
n_j = len(bboxes_b)
iou_mat = np.empty((n_i, n_j))
for i in range(n_i):
    for j in range(n_j):
        iou_mat[i, j] = intersection_over_union(bboxes_a[i], bboxes_b[j])
but this approach is very slow, especially when the lists become very big.
I'm struggling to find a more efficient way. There has to be a way to utilize numpy to get rid of the loops, but I don't get it. Also the complexity right now is O(m*n). Is there a possibility to reduce the complexity?

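vectorize:
import numpy as np

low = np.s_[..., :2]
high = np.s_[..., 2:]

def iou(A, B):
    # work on copies so the caller's arrays are not modified
    A, B = A.copy(), B.copy()
    # shift the upper corners by 1 to match the inclusive "+ 1" convention of the question
    A[high] += 1; B[high] += 1
    intrs = (np.maximum(0, np.minimum(A[high], B[high])
                        - np.maximum(A[low], B[low]))).prod(-1)
    return intrs / ((A[high] - A[low]).prod(-1) + (B[high] - B[low]).prod(-1) - intrs)

# A has shape (M, 4) and B has shape (N, 4); broadcasting gives an (M, N) matrix
AB = iou(A[:, None], B[None])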
complexity:
Since you are calculating M x N values, reducing the complexity below M x N is impossible unless most of the values are zero and a sparse representation of the matrix is acceptable.
This can be done by argsorting (separately for x and y) all the ends of A and B. That's O((M+N) log(M+N)). (Edit: as the coordinates are integers, linear complexity may be possible here.) This can then be used to prefilter A x B. The complexity of filtering and computing the nonzeros would be O(M + N + number of nonzeros).
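A simplified sketch of the prefiltering idea, assuming boxes are rows [x1, y1, x2, y2] with the question's inclusive (+1) convention. It sorts B only by its left edge and uses np.searchsorted to discard, for each box in A, every box in B that starts to the right of it; the remaining candidates are then checked exactly. This skips many zero entries but does not achieve the full O(M + N + number of nonzeros) bound described above. The function name sparse_iou is hypothetical.

```python
import numpy as np


def sparse_iou(A, B):
    """Return (rows, cols, vals) for all pairs of boxes with nonzero IoU."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    order = np.argsort(B[:, 0], kind="stable")  # sort B by left edge x1
    B_sorted = B[order]
    rows, cols, vals = [], [], []
    for i, a in enumerate(A):
        # Only B boxes whose left edge is <= a's right edge can overlap a in x
        k = np.searchsorted(B_sorted[:, 0], a[2], side="right")
        cand = B_sorted[:k]
        iw = np.minimum(a[2], cand[:, 2]) - np.maximum(a[0], cand[:, 0]) + 1
        ih = np.minimum(a[3], cand[:, 3]) - np.maximum(a[1], cand[:, 1]) + 1
        hit = (iw > 0) & (ih > 0)  # exact overlap test on the candidates
        inter = iw[hit] * ih[hit]
        area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
        area_b = (cand[hit, 2] - cand[hit, 0] + 1) * (cand[hit, 3] - cand[hit, 1] + 1)
        rows.extend([i] * int(hit.sum()))
        cols.extend(order[:k][hit].tolist())  # map back to original indices in B
        vals.extend((inter / (area_a + area_b - inter)).tolist())
    return np.array(rows, dtype=int), np.array(cols, dtype=int), np.array(vals)
```

If a sparse matrix is wanted, the triples can be passed to scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(len(A), len(B))).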

You can use product() from itertools to replace a nested for-loop. Using built-in functions is usually better, I think. For example:
import numpy as np
l1 = np.random.randint(0, 10, (4, 4))
l2 = np.random.randint(0, 10, (3, 4))
print(f'l1:\n{l1}')
print(f'l2:\n{l2}')
from itertools import product
# product(l1, l2) varies box2 fastest, so each row of the result corresponds to one box of l1
ious = np.array([intersection_over_union(box1, box2) for box1, box2 in product(l1, l2)]).reshape(len(l1), len(l2))
print(f'ious:\n{ious}')
What's more, you should change iou = interArea / float(boxAArea + boxBArea - interArea) to iou = interArea / float(boxAArea + boxBArea - interArea + 1e-16) to avoid a division-by-zero error.

How to find the maximum value of a numpy array, with location restrictions?

I have a NumPy array in Python 2.7, which I am visualising with the imshow() function. The code generating the array looks like:
from pylab import *
r0 = 3.0
S0 = 10.0
x = zeros((101,101))
noiseimg = zeros((101,101))
for i in range(101):
    for j in range(101):
        noiseimg[i,j] = noiseimg[i,j] + normal(3,1)
mean_i = randint(0,101)
mean_j = randint(0,101)
for i in range(101):
    for j in range(101):
        r = ((i-mean_i)**2 + (j-mean_j)**2)**0.5
        x[i,j] = S0*(1+(r/r0)**2)**-1.5
        x[i,j] = x[i,j] + noiseimg[i,j]
        if (((i-50)**2 + (j-50)**2)**0.5 >= 40) and (((i-50)**2 + (j-50)**2)**0.5 <= 41):
            x[i,j]=0
imshow(x)
show()
What this does is produce an image with a level of background noise, and one circularly symmetric source. There is a circle centred on the image, with a radius of 40 pixels.
What I need to know is how to find the location of the highest value pixel within that circle. I know how to find the maximum value in the circle, but not the [i,j] location of it.
Thank you!
My question has been flagged by Stack Overflow as a potential duplicate, but that question doesn't include the location restrictions that I need.

One solution is to "zero" out all the elements surrounding the circle and then simply take the max of the entire array. It appears your radius is 41, centered at (50,50).
Then you could do
import numpy as np
xc, yc = 50, 50
length = 101
radius = 41
y_grid, x_grid = np.ogrid[-xc:length-xc, -yc:length-yc]
mask = x_grid ** 2 + y_grid ** 2 > radius ** 2
And now create your image. Then find the minimum value and assign it to every value outside your boundary. If there is a pixel outside the circle that is bigger than the max inside the circle, it is now set to a much smaller value.
x_min = np.min(x)
x[mask] = x_min
So your image will look like
And now just take the max
print np.max(x)
6.4648628255130571
This solution is nice because it avoids loops; using loops would pretty much defeat the purpose of using NumPy in the first place.
EDIT:
Sorry, you said you wanted the indices of the max. The solution above is the same; you just unravel the index:
>>> i, j = np.unravel_index(x.argmax(), x.shape)
>>> print "{} {}".format(i, j)
23 32
>>> np.max(x) == x[i,j]
True
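An alternative sketch that leaves the image untouched: it builds the same kind of circular mask with np.ogrid, but instead of overwriting the outside pixels with the minimum, it replaces them with -inf in a copy (via np.where), so they can never be selected, even if the image minimum ties with a value inside the circle. The function name argmax_in_circle and its arguments are hypothetical; center is given as (row, column).

```python
import numpy as np


def argmax_in_circle(x, center, radius):
    """Return the (i, j) index of the largest value of 2D array x within `radius` of `center`."""
    ci, cj = center
    i_grid, j_grid = np.ogrid[:x.shape[0], :x.shape[1]]
    inside = (i_grid - ci) ** 2 + (j_grid - cj) ** 2 <= radius ** 2
    # Pixels outside the circle become -inf in a new array, so argmax can never pick them
    masked = np.where(inside, x, -np.inf)
    return np.unravel_index(np.argmax(masked), x.shape)
```

For the question's image, argmax_in_circle(x, (50, 50), 40) would give the [i, j] location of the brightest pixel within the 40-pixel circle.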

import numpy as np
circleList = []
indices = []
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        if ((i - 50)**2 + (j - 50)**2)**0.5 <= 40:  # pixel lies inside the circle of radius 40 centred at (50, 50)
            circleList.append(x[i, j])
            indices.append((i, j))
print(np.max(circleList))  # here is your max
print(indices[np.argmax(circleList)])  # here are the indices of the max
should do it.

How to find way to stop while loop when 1 first point meet last in polygon in python?

I need to calculate the perimeter of a polygon by using only coordinates.
My function:
import math

def definePerimeter(xCoords, yCoords):
    i = 0
    sum = 0
    while xCoords[i] != xCoords[i+1] and yCoords[i] != yCoords[i+1]:
        dx = xCoords[i] - xCoords[i+1]
        dy = yCoords[i] - yCoords[i+1]
        dsquared = dx**2 + dy**2
        result = math.sqrt(dsquared)
        print "The list of segments:"
        print "The segment: ", result
        i += 1
        sum = sum + result
    print "Total 2D Perimeter is " , sum ,"m"
This gives the wrong perimeter (compared to ArcGIS).
How can I stop the while loop when the first point meets the last point of the polygon?

You don't really need a while loop here. You can do it with a for loop since you are going through all of the polygon's vertices:
import numpy as np

sum = 0
for i in range(len(xCoords) - 1):
    sum += np.sqrt((xCoords[i] - xCoords[i + 1]) ** 2 + (yCoords[i] - yCoords[i + 1]) ** 2)
# closing side, from the last vertex back to the first
sum += np.sqrt((xCoords[0] - xCoords[-1]) ** 2 + (yCoords[0] - yCoords[-1]) ** 2)
If you insist on doing so with a while loop you can do so in this way:
sum = 0
i = 0
while i < len(xCoords) - 1:
    sum += np.sqrt((xCoords[i] - xCoords[i + 1]) ** 2 + (yCoords[i] - yCoords[i + 1]) ** 2)
    i += 1
sum += np.sqrt((xCoords[0] - xCoords[-1]) ** 2 + (yCoords[0] - yCoords[-1]) ** 2)
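The same loop can also be vectorized. A sketch, assuming the vertices are already in order around the polygon: np.roll pairs each vertex with the next one and wraps the last vertex back to the first, so the closing side is included automatically (and contributes zero length if the first point is already repeated at the end). The function name polygon_perimeter is hypothetical.

```python
import numpy as np


def polygon_perimeter(x_coords, y_coords):
    """Perimeter of a polygon whose vertices are given in order (open or closed ring)."""
    x = np.asarray(x_coords, dtype=float)
    y = np.asarray(y_coords, dtype=float)
    # np.roll(..., -1) shifts by one, so element k is paired with element k + 1 (the last with the first)
    dx = np.roll(x, -1) - x
    dy = np.roll(y, -1) - y
    return float(np.hypot(dx, dy).sum())
```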

Your algorithm is not correct. You first need to sort your coordinates by the arctangent of the angle they make with the centroid of your polygon, so that the coordinates are in the correct order around the shape.
import math

def sort_coordinates(centroid, shuffled_coordinates):
    Cx, Cy = centroid
    return sorted(shuffled_coordinates, key=lambda p: math.atan2(p[1]-Cy, p[0]-Cx))
Then you can calculate the length of each side from consecutive pairs of coordinates (including the side that closes the shape) and sum them all to get the perimeter:
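def perimeter(coordinates):
    # pair each vertex with the next one, wrapping the last vertex back to the first
    closed = coordinates[1:] + coordinates[:1]
    return sum(math.sqrt(pow(y2-y1,2)+pow(x2-x1,2)) for (x1,y1),(x2,y2) in zip(coordinates, closed))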

Seeing your while loop condition, I am guessing your last and first coordinates are the same.
x = [0,1,2,3,0]
y = [0,2,4,5,0]
I don't see any other way in which that while loop condition makes sense. If so, you could advance the index with wrap-around (integer modulo, so the result can still be used as an index):
i = (i + 1) % len(xCoords)

Many particles in box - physics simulation

I'm currently trying to simulate many particles in a box bouncing around.
I've taken into account @kalhartt's suggestions, and this is the improved code to initialize the particles inside the box:
import numpy as np
import scipy.spatial.distance as d
import matplotlib.pyplot as plt
# 2D container parameters
# Actual container is 50x50 but chose 49x49 to account for particle radius.
limit_x = 20
limit_y = 20
#Number and radius of particles
number_of_particles = 350
radius = 1
def force_init(n):
    # equivalent to np.array(list(range(number_of_particles)))
    count = np.linspace(0, number_of_particles-1, number_of_particles)
    x = (count + 2) % (limit_x-1) + radius
    y = (count + 2) / (limit_x-1) + radius
    return np.column_stack((x, y))
position = force_init(number_of_particles)
velocity = np.random.randn(number_of_particles, 2)
The initialized positions look like this:
Once I have the particles initialized I'd like to update them at each time-step. The code for updating follows the previous code immediately and is as follows:
# Updating
while np.amax(abs(velocity)) > 0.01:
    # Assume that velocity slowly dying out
    position += velocity
    velocity *= 0.995
    #Get pair-wise distance matrix
    pair_dist = d.cdist(position, position)
    pair_d = pair_dist<=4
    #If pdist [i,j] is <=4 then the particles are too close and so treat as collision
    for i in range(len(pair_d)):
        for j in range(i):
            # Only looking at upper triangular matrix (not inc. diagonal)
            if pair_d[i,j] ==True:
                # If two particles are too close then swap velocities
                # It's a bad hack but it'll work for now.
                vel_1 = velocity[j][:]
                velocity[j] = velocity[i][:]*0.9
                velocity[i] = vel_1*0.9
    # Masks for particles beyond the boundary
    xmax = position[:, 0] > limit_x
    xmin = position[:, 0] < 0
    ymax = position[:, 1] > limit_y
    ymin = position[:, 1] < 0
    # flip velocity and assume that it loses 10% of energy
    velocity[xmax | xmin, 0] *= -0.9
    velocity[ymax | ymin, 1] *= -0.9
    # Force maximum positions of being +/- 2*radius from edge
    position[xmax, 0] = limit_x-2*radius
    position[xmin, 0] = 2*radius
    position[ymax, 0] = limit_y-2*radius
    position[ymin, 0] = 2*radius
After updating it and letting it run to completion I get this result:
This is infinitely better than before, but there are still patches where particles are too close together.
I think the updating works... and thanks to @kalhartt my code is way better and faster (and I learnt some things about numpy... props @kalhartt), but I still don't know where it's going wrong. I've tried changing the order of the updates, with the pair-wise distance going last or the position += velocity going last, but to no avail. I added the *0.9 to make the entire thing die down faster, and I tried it with 4 to make sure that 2*radius (=2) wasn't too tight a criterion... but nothing seems to work.
Any and all help would be appreciated.

There are just two typos standing in your way. First, for i in range(len(positions)/2): only iterates over half of your particles. This is why half the particles stay in the x bounds (it's clearer if you watch for many iterations). Second, the second y condition should be a minimum (I assume): position[i][1] < 0. The following block works to bound the particles for me (I didn't test it with the collision code, so there could be problems there).
for i in range(len(position)):
    if position[i][0] > limit_x or position[i][0] < 0:
        velocity[i][0] = -velocity[i][0]
    if position[i][1] > limit_y or position[i][1] < 0:
        velocity[i][1] = -velocity[i][1]
As an aside, try to leverage numpy to eliminate loops when possible. It is faster, more efficient, and in my opinion more readable. For example force_init would look like this:
def force_init(n):
    # equivalent to np.array(list(range(number_of_particles)))
    count = np.linspace(0, number_of_particles-1, number_of_particles)
    x = (count * 2) % limit_x + radius
    y = (count * 2) / limit_x + radius
    return np.column_stack((x, y))
And your boundary conditions would look like this:
while np.amax(abs(velocity)) > 0.01:
    position += velocity
    velocity *= 0.995
    # Masks for particles beyond the boundary
    xmax = position[:, 0] > limit_x
    xmin = position[:, 0] < 0
    ymax = position[:, 1] > limit_y
    ymin = position[:, 1] < 0
    # flip velocity
    velocity[xmax | xmin, 0] *= -1
    velocity[ymax | ymin, 1] *= -1
As a final note, it is probably a good idea to hard clip position to the bounding box with something like position[xmax, 0] = limit_x; position[xmin, 0] = 0. There may be cases where the velocity is small and a particle outside the box is reflected but doesn't make it back inside in the next iteration, so it just sits outside the box being reflected forever.
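A minimal sketch of this flip-and-clip boundary handling, assuming position and velocity are float arrays of shape (n, 2) and the box spans [0, limit_x] by [0, limit_y]; the function name reflect_and_clip is hypothetical. A single (n, 2) mask handles both axes at once, replacing the four separate masks, and clipping afterwards guarantees that a reflected particle starts the next step inside the box.

```python
import numpy as np


def reflect_and_clip(position, velocity, limit_x, limit_y):
    """Flip the velocity component of particles outside the box, then clip them back inside.

    Modifies position and velocity (both float arrays of shape (n, 2)) in place.
    """
    limits = np.array([limit_x, limit_y], dtype=float)
    # outside[k, a] is True when particle k violates the boundary along axis a (0 = x, 1 = y)
    outside = (position < 0) | (position > limits)
    velocity[outside] *= -1
    # Hard clip to the box so the particle cannot stay outside and keep bouncing
    np.clip(position, 0, limits, out=position)
```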
EDIT: Collision
Collision detection is a much harder problem, but let's see what we can do. Let's take a look at your current implementation.
pair_dist = d.cdist(position, position)
pair_d = pair_dist<=4
for i in range(len(pair_d)):
    for j in range(i):
        # Only looking at upper triangular matrix (not inc. diagonal)
        if pair_d[i,j] ==True:
            # If two particles are too close then swap velocities
            # It's a bad hack but it'll work for now.
            vel_1 = velocity[j][:]
            velocity[j] = velocity[i][:]*0.9
            velocity[i] = vel_1*0.9
Overall this is a very good approach: cdist efficiently calculates the distances between sets of points, and you find which points collide with pair_d = pair_dist<=4.
The nested for loops are the first problem. We need to iterate over the True values of pair_d where j > i. First, your code actually iterates over the lower triangular region by using for j in range(i), so that j < i; this is not particularly important here, as long as i,j pairs are not repeated. However, NumPy has two built-ins we can use instead: np.triu lets us set all values below a diagonal to 0, and np.nonzero gives us the indices of the non-zero elements in a matrix. So this:
pair_dist = d.cdist(position, position)
pair_d = pair_dist<=4
for i in range(len(pair_d)):
    for j in range(i+1, len(pair_d)):
        if pair_d[i, j]:
            ...
is equivalent to
pair_dist = d.cdist(position, position)
pair_d = np.triu(pair_dist<=4, k=1) # k=1 to exclude the diagonal
for i, j in zip(*np.nonzero(pair_d)):
    ...
The second problem (as you noted) is that the velocities are just swapped and scaled instead of reflected. What we really want to do is negate and scale the component of each particle's velocity along the axis that connects them. To do this we need the vector connecting them, position[j] - position[i], and the length of that vector (which we already calculated). So, unfortunately, part of the cdist calculation gets repeated. Let's stop using cdist and do it ourselves instead. The goal here is to make two arrays, diff and norm, where diff[i][j] is a vector pointing from particle i to j (so diff is a 3D array) and norm[i][j] is the distance between particles i and j. We can do this with NumPy like so:
nop = number_of_particles
# Give pos a 3rd index so we can use np.repeat below
# equivalent to pos3d = np.array([position])
pos3d = position.reshape(1, nop, 2)
# 3D arrays with a repeated index so we can form combinations
# diff_i[i][j] = position[i] (for all j)
# diff_j[i][j] = position[j] (for all i)
diff_i = np.repeat(pos3d, nop, axis=1).reshape(nop, nop, 2)
diff_j = np.repeat(pos3d, nop, axis=0)
# diff[i][j] = vector pointing from position[i] to position[j]
diff = diff_j - diff_i
# norm[i][j] = sqrt(sum(diff[i][j]**2))
norm = np.linalg.norm(diff, axis=2)
# check for collisions and take the region above the diagonal
collided = np.triu(norm < radius, k=1)
for i, j in zip(*np.nonzero(collided)):
    # unit vector from i to j
    unit = diff[i][j] / norm[i][j]
    # flip velocity
    velocity[i] -= 1.9 * np.dot(unit, velocity[i]) * unit
    velocity[j] -= 1.9 * np.dot(unit, velocity[j]) * unit
    # push particle j to be radius units from i
    # This isn't particularly effective when 3+ points are close together
    position[j] += (radius - norm[i][j]) * unit
...
Since this post is long enough already, here is a gist of the code with my modifications.
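The np.repeat construction of diff can also be written with NumPy broadcasting, which gives the same (n, n, 2) array without building the two intermediate arrays diff_i and diff_j. A sketch, assuming position is a float array of shape (n, 2); the function name pairwise_diff_and_norm is hypothetical.

```python
import numpy as np


def pairwise_diff_and_norm(position):
    """diff[i, j] = position[j] - position[i] and norm[i, j] = |diff[i, j]|."""
    # position[None, :, :] has shape (1, n, 2) and position[:, None, :] has shape (n, 1, 2);
    # subtracting them broadcasts to (n, n, 2)
    diff = position[None, :, :] - position[:, None, :]
    norm = np.linalg.norm(diff, axis=2)
    return diff, norm
```

The rest of the collision loop (np.triu, np.nonzero and the velocity reflection) can use these diff and norm arrays unchanged.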
