How to vectorize this matrix building operation?

How to vectorize this matrix building operation? - python

Problem:
Vectorize the building of a matrix where certain elements are a function determined by whether or not a particular triple index exists in a dictionary. Other elements are 0.
Code:
This is simple to implement using for loops:
# specify matrix size. In this example it is 4*4 but they get much, much bigger
mat_size = 4
# create empty numpy array
matt = np.zeros((mat_size, mat_size))
# index_list is a list of tuples which are subtracted element-wise in the for loops to form a new triple index used to access the coming dictionary
# the list of tuples gets larger as the matrix size increases
index_list = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 2), (0, 1, 1), (0, 2, 0), (1, 1, 0)]
# the dictionary of functions:
# keys are a tuple of integers
# values are the functions to be evaluated to form the matrix
# there are more keys in the actual problem but this is for the minimal working example
func_dict = {(0, 0, 0): lambda a : a + 10,
(0, 0, -2): lambda a : a - 3,
(0, 0, -1): lambda a : a**5,
(0, 0, 1): lambda a : a*10 + 5,
(0, 1, 0): lambda a : a/3 + 2,
(0, 1, -2): lambda a : a*2 + a**4,
(0, 1, -1): lambda a : a/4 * 76,
(0, 1, 1): lambda a : a*19 / 3,
}
# only loop over half the matrix as they are always symmetric
for H in range(mat_size):
for J in range(H, mat_size):
x = index_list[J][0] - index_list[H][0] # subtract first tuple elements
y = index_list[J][1] - index_list[H][1] # subtract second tuple elements
z = index_list[J][2] - index_list[H][2] # subtract third tuple elements
# check if x,y,z is in the dictionary
if (x, y, z) in func_dict:
matt[H,J] = func_dict[x, y, z](3.43)
else:
matt[H,J] = 0
Output:
An upper triangular (with diagonal) matrix:
[[13.43 39.3 3.14333333 0. ]
[ 0. 13.43 65.17 39.3 ]
[ 0. 0. 13.43 0. ]
[ 0. 0. 0. 13.43 ]]
I parallelized the outer loop which is the current implementation I have been using and has been suitably fast enough. The matrix sizes are now becoming larger and after profiling this is the limiting step which I need to speed up as it gets called repeatedly.
I can feel this problem will allow for vectorization but after several attempts always results in a loop which then ends up being comparable with the parallelized version.
Is there an effective way to vectorize this problem, removing any loop(s)?

Related

Wrong difference returned between two pixelaccess objects

So I have a function which takes as parameters two PixelAccess objects which essentially are two images which are converted to a multi-dimensional array of pixels of the type (image1pixels[x,y]) , then it subtracts each tuple of pixels on the width and height of both the images and then appends the subtractions of pixels inside an array named c ; The function then returns the sum of all the tuples in the array.
Here is the function:
def difference(pix1, pix2):
size = width, height = img.size;
result = 0;
array = [];
for x in range(width):
for y in range(height):
c = tuple(map(sub, pix2[x, y], pix1[x, y]));
array.append(c);
result = abs(add(map(sum, array)));
return result;
Here to have an idea, when I print c, this is what is printed:
(0, 0, 0)
(0, 0, 0)
(0, 0, 0)
(-253, -253, -253)
(-210, -210, -210)
(-168, -168, -168)
(-147, -147, -147)
(-48, -48, -48)
(-13, -13, -13)
(-29, -29, -29)
(-48, -48, -48)
(-48, -48, -48)
(0, 0, 0)
(0, 0, 0)
(0, 0, 0)
I have to compare two images using this function, the expected difference should be 17988 but my function returns 9174.
I just want to know if my logic is wrong or if I'm coding the wrong way here knowing python is not my primary everyday language.
Thanks in advance.

Coding a circular filter in Python

I found a code snippet for making a circular filter using scipy and I'd like to understand how it works. I know there's a better one in skimage, but I'm interested in what's going on in this one.
from scipy.ndimage.filters import generic_filter as gf
# Define physical shape of filter mask
def circular_filter(image_data, radius):
kernel = np.zeros((2*radius+1, 2*radius+1))
y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
mask = x**2 + y**2 <= radius**2
kernel[mask] = 1
filtered_image = gf(image_data, np.median, footprint = kernel)
return filtered_image
But I'm not sure I understand perfectly what's going on. In particular, what exactly do the lines
y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
mask = x**2 + y**2 <= radius**2
kernel[mask] = 1
do?
I posted this as an answer to one of my previous questions, but it wasn't replied to, so I'm posting it as a new question.

Looking at your code in detail:
kernel = np.zeros((2*radius+1, 2*radius+1))
y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
mask = x**2 + y**2 <= radius**2
kernel[mask] = 1
The first line:
kernel = np.zeros((2*radius+1, 2*radius+1))
creates a 2-d array of zeros, with a center point and "radius" points on either side. For radius = 2, you would get:
# __r__ +1 __r__
[ 0, 0, 0, 0, 0, ] #\
[ 0, 0, 0, 0, 0, ] #_} r
[ 0, 0, 0, 0, 0, ] # +1
[ 0, 0, 0, 0, 0, ] #\
[ 0, 0, 0, 0, 0, ] #_} r
Next, you get two arrays from the open mesh grid created by numpy.ogrid. Mesh grids are a "trick" in numpy that involves storing a "parallel" array or matrix that holds the x or y coordinate of a particular cell at the location of that cell.
For example, a y-mesh grid might look like this:
[ 0, 0, 0, 0, 0, ]
[ 1, 1, 1, 1, 1, ]
[ 2, 2, 2, 2, 2, ]
[ 3, 3, 3, 3, 3, ]
[ 4, 4, 4, 4, 4, ]
And an x-mesh grid might look like this:
[ 0, 1, 2, 3, 4, ]
[ 0, 1, 2, 3, 4, ]
[ 0, 1, 2, 3, 4, ]
[ 0, 1, 2, 3, 4, ]
[ 0, 1, 2, 3, 4, ]
If you look at them, you'll realize that Y_grid[x][y] == y and X_grid[x][y] == x which is so often useful that it has more than one numpy function to support it. ;-)
An open mesh grid is similar to a closed one, except that it only has "one dimension." That is, instead of a pair of (for example) 5x5 arrays, you get a 1x5 array and a 5x1 array. That's what ogrid does - it returns two open grids. The values are from -radius to radius+1, according to python rules (meaning the radius+1 is left out):
y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
So y is a numpy array storing from e.g., -2..2 (inclusive), and x is an array from -2..2 inclusive. The next step is to build a boolean mask - that is, an array full of boolean values. As you know, when you operate on a numpy array, you get another numpy array. So involving two arrays in an expression with a constant produces another array:
mask = x**2 + y**2 <= radius**2
The value of mask is going to be a 2-color bitmap, where one color is "True" and the other color is "False." The bitmap will describe a solid circle or disk. (Because of the <= relation. Remember that x and y contain -2..2, not 0..4.)
Finally, you convert from type Boolean to int by using the masking array as an overlay on the kernel array (of zeroes), setting the zeroes to ones whenever the mask is "True":
kernel[mask] = 1
At this point, kernel looks like:
# __r__ +1 __r__
[ 0, 0, 1, 0, 0, ] #\
[ 0, 1, 1, 1, 0, ] #_} r
[ 1, 1, 1, 1, 1, ] # +1
[ 0, 1, 1, 1, 0, ] #\
[ 0, 0, 1, 0, 0, ] #_} r

I'm not familiar with SciPy but I'll give it a shot trying to explain the basic concepts.
This entire function's purpose is to alter the original image by applying a filter. This filter could do a lot of things, from changing the contrast of the image, or adding special effects, etc.
Let's go through the different lines:
kernel = np.zeros((2*radius+1, 2*radius+1))
In this line, a copy of the image data is being created, but with all the data being zeros (hence the zeros function is being used). This is so the mask can be applied later onto it.
y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
This is creating what is known as a "meshgrid" or a multi-dimensional grid. This is to create the circular "mask". Just like how on a graph, x and y axes have evenly spaced scaling, the same is necessary here in the meshgrid.
The x and y variables in this case store evenly spaced values that serve as the axes' scaling.
mask = x**2 + y**2 <= radius**2
Here, a "mask" is being created. A mask will serve as the region in the image to be protected from the filter, so as to not alter any original data. Notice how x and y variables are used here in a Pythagorean inequality (important to see that it's not just a circle but a disk), just like how they would be in a mathematical sense. This will create a disk with the given radius that is now considered the mask. The mask variable now contains all coordinates (x,y) where the original data values should not be altered.
kernel[mask] = 1
This is where the mask is now applied to the copy of the image that was created earlier. Now, there is a perfect copy of the image (i.e. same dimensions) but with a disk-like "mask" that "protects" the original data from being altered. This is why all the points covered by the disk is set to 1. Also, notice how the dimensions of kernel and mask match. Both are multi-dimensional. The rest of values in the image copy are still set to zero, as was done in the first line.
filtered_image = gf(image_data, np.median, footprint = kernel)
This is final part where everything is pieced together. There is the original data stored in image_data and there is the kernel, which is the image copy with the mask applied on it indicating where the data should not be altered. Both of them are passed as parameters into the actual filter function gf (stands for generic filter) and the output is a new filtered image.
This is a core concept in image filtering and if you want to learn more about it, I suggest starting out by learning basic signal processing concepts. Signal processing courses cover the mathematics of how these concepts work, but are usually explained in really abstract mathematics because this concept can be applied to numerous different examples.

Python how to loop through an array in numpy at C speed and store some positions

I am new to python, numpy and opencv. I am playing with the first example of harris corner detector from here. My objective is to get an ordered list of all the corners. With this simple code I am able to get the X and Y coordinates of the corners and their value:
height, width, depth = img.shape
print height, width
for i in range(0, height): #looping at python speed
for j in range(0, (width)):
if dst[i,j] > 0.9*dst.max():
print i, j, dst[i,j]
However, it is dead slow. I don't know how this is called but apparently with numpy one can loop through arrays at C speed and even assign values, example:
img[0:height, 0:width, 0:depth] = 0
Can I loop through an array and assign the position of interesting values in another variable? I.e. can I use this on my code to make it faster?

You can get a mask of elements that would pass the IF conditional statement. Next up, if you need the indices that would pass the condition, use np.where or np.argwhere on the mask. For the valid dst elements, index dst with the same mask, thus using boolean indexing. The implementation would look something like this -
mask = dst > 0.9*dst.max()
out = np.column_stack((np.argwhere(mask),dst[mask]))
If you would like to get those three printed outputs separately, you could do -
I,J = np.where(mask)
valid_dst = dst[mask]
Finally, if you would like to edit the 3D array img based on the 2D mask, you could do -
img[mask] = 0
This way, you would change the corresponding elements in img across all channels in one go.

First of all if you are using Python 2.X you should use xrange instead of range, this speeds things up. range in Python 3.X has the same implementation as xrange in Python 2.X.
If you want to iterate over numpy arrays why not use the numpy enumerator for iteration?
# creating a sample img array with size 2x2 and RGB values
# so the shape is (2, 2, 3)
> img = np.random.randint(0, 255, (2,2,3))
> img
> array([[[228, 61, 154],
[108, 25, 52]],
[[237, 207, 127],
[246, 223, 101]]])
# iterate over the values, getting key and value at the same time
# additionally, no nasty nested for loops
> for key, val in np.ndenumerate(img):
> print key, val
# this prints the following
> (0, 0, 0) 228
> (0, 0, 1) 61
> (0, 0, 2) 154
> (0, 1, 0) 108
> (0, 1, 1) 25
> (0, 1, 2) 52
> (1, 0, 0) 237
> (1, 0, 1) 207
> (1, 0, 2) 127
> (1, 1, 0) 246
> (1, 1, 1) 223
> (1, 1, 2) 101

Create new numpy array-scalar of flexible dtype

I have a working solution to my problem, but when trying different things I was astounded there wasn't a better solution that I could find. It all boils down to creating a single flexible dtype value for comparing and inserting into an array.
I have an RGB 24-bit image (so 8-bits for each R, G, and B) image array. It turns out for some actions it is best to use it as a 3D array with HxWx3 other times it is best to use it as a structured array with the dtype([('R',uint8),('G',uint8),('B',uint8)]). One example is when trying to relabel the image colors so that every unique color is given a different value. I do this with the following code:
# Given im as an array of HxWx3, dtype=uint8
from numpy import dtype, uint8, unique, insert, searchsorted
rgb_dtype = dtype([('R',uint8),('G',uint8),('B',uint8)]))
im = im.view(dtype=rgb_dtype).squeeze() # need squeeze to remove the third dim
values = unique(im)
if tuple(values[0]) != (0, 0, 0):
values = insert(values, 0, 0) # value 0 needs to always be (0, 0, 0)
labels = searchsorted(values, im)
This works beautifully, however I am tried to make the if statement look nicer and just couldn't find a way. So lets look at the comparison first:
>>> values[0]
(0, 0, 0)
>>> values[0] == 0
False
>>> values[0] == (0, 0, 0)
False
>>> values[0] == array([0, 0, 0])
False
>>> values[0] == array([uint8(0), uint8(0), uint8(0)]).view(dtype=rgb_dtype)[0]
True
>>> values[0] == zeros((), dtype=rgb_dtype)
True
But what if you wanted something besides (0, 0, 0) or (1, 1, 1) and something that was not ridiculous looking? It seems like there should be an easier way to construct this, like rgb_dtype.create((0,0,0)).
Next with the insert statement, you need to insert 0 for (0, 0, 0). For other values this really does not work, for example inserting (1, 2, 3) actually inserts (1, 1, 1), (2, 2, 2), (3, 3, 3).
So in the end, is there a nicer way? Thanks!

I could make insert() work for your case doing (note that instead of 0 it is used [0]):
values = insert(values, [0], (1,2,3))
giving (for example):
array([(0, 1, 3), (0, 0, 0), (0, 0, 4), ..., (255, 255, 251), (255, 255, 253), (255, 255, 255)],
dtype=[('R', 'u1'), ('G', 'u1'), ('B', 'u1')])
Regarding another way to do your if, you can do this:
str(values[0]) == str((0,0,0))
or, perhaps more robust:
eval(str(values[0])) == eval(str(0,0,0))

A 3-D grid of regularly spaced points

I want to create a list containing the 3-D coords of a grid of regularly spaced points, each as a 3-element tuple. I'm looking for advice on the most efficient way to do this.
In C++ for instance, I simply loop over three nested loops, one for each coordinate. In Matlab, I would probably use the meshgrid function (which would do it in one command). I've read about meshgrid and mgrid in Python, and I've also read that using numpy's broadcasting rules is more efficient. It seems to me that using the zip function in combination with the numpy broadcast rules might be the most efficient way, but zip doesn't seem to be overloaded in numpy.

Use ndindex:
import numpy as np
ind=np.ndindex(3,3,2)
for i in ind:
print(i)
# (0, 0, 0)
# (0, 0, 1)
# (0, 1, 0)
# (0, 1, 1)
# (0, 2, 0)
# (0, 2, 1)
# (1, 0, 0)
# (1, 0, 1)
# (1, 1, 0)
# (1, 1, 1)
# (1, 2, 0)
# (1, 2, 1)
# (2, 0, 0)
# (2, 0, 1)
# (2, 1, 0)
# (2, 1, 1)
# (2, 2, 0)
# (2, 2, 1)

Instead of meshgrid and mgrid, you can use ogrid, which is a "sparse" version of mgrid. That is, only the dimension along which the values change are filled in. The others are simply broadcast. This uses much less memory for large grids than the non-sparse alternatives.
For example:
>>> import numpy as np
>>> x, y = np.ogrid[-1:2, -2:3]
>>> x
array([[-1],
[ 0],
[ 1]])
>>> y
array([[-2, -1, 0, 1, 2]])
>>> x**2 + y**2
array([[5, 2, 1, 2, 5],
[4, 1, 0, 1, 4],
[5, 2, 1, 2, 5]])

I would say go with meshgrid or mgrid, in particular if you need non-integer coordinates. I'm surprised that Numpy's broadcasting rules would be more efficient, as meshgrid was designed especially for the problem that you want to solve.

for multi-d (greater than 2) meshgrids, use numpy.lib.index_tricks.nd_grid like so:
import numpy
grid = numpy.lib.index_tricks.nd_grid()
g1 = grid[:3,:3,:3]
g2 = grid[0:1:0.5, 0:1, 0:2]
g3 = grid[0:1:3j, 0:1:2j, 0:2:2j]
where g1 has x values of [0,1,2]
and g2 has x values of [0,.5],
and g3 has x values of [0.0,0.5,1.0] (the 3j defining the step count instead of the step increment. see the documentation for more details.

Here's an efficient option similar to your C++ solution, which I've used for exactly the same purpose:
import numpy, itertools, collections
def grid(xmin, xmax, xstep, ymin, ymax, ystep, zmin, zmax, zstep):
"return nested tuples of grid-sampled coordinates that include maxima"
return collections.deque( itertools.product(
numpy.arange(xmin, xmax+xstep, xstep).tolist(),
numpy.arange(ymin, ymax+ystep, ystep).tolist(),
numpy.arange(zmin, zmax+zstep, zstep).tolist() ) )
Performance is best (in my tests) when using a.tolist(), as shown above, but you can use a.flat instead and drop the deque() to get an iterator that will sip memory. Of course, you can also use a plain old tuple() or list() instead of deque() for a slight performance penalty (again, in my tests).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.