Parallel array manipulations in numpy

Parallel array manipulations in numpy - python

I have a code in which I need to handle some big numpy arrays. For example I have a 3D array A and I need to construct another 3d array B using the elements of A. However all the elements of B are independent of each other. Example:
for i in np.arange(Nx):
for j in np.arange(Ny):
for k in np.arange(Nz):
B[i][j][k] = A[i+1][j][k]*np.sqrt(A[i][j-1][k-1])
So it will speed up immensely if I can construct the B array parallely. What is the simplest way to do this in python?
I also have similar matrix operations like normalizing each row of a 2D array. Example
for i in np.arange(Nx):
f[i,:] = f[i,:]/np.linalg.norm(f[i,:])
This will also speed up if it runs parallely for each row. How can it be done?

You should look into Numpy's roll function. I think this is equivalent to your first block of code (though you need to decide what happens at the edges - roll "wraps around"):
B = np.roll(A,1,axis=0) * np.sqrt(np.roll(np.roll(A,-1,axis=1),-1,axis=2))
Another fairly horrible one-liner for your second case is:
f /= np.sqrt(np.sum(f**2, axis=1))[...,np.newaxis]
Explanation of this line:
We are first going to calculate the norm of each row. Let's
f = np.random.rand(5,6)
Square each element of f
f**2
Sum the squares along axis 1, which "flattens" out that axis.
np.sum(f**2, axis=1)
Take the square root of the sum of the squares.
np.sqrt(np.sum(f**2, axis=1))
We now have the norm of each row.
To divide each original row of f by this correctly we need to make use of the Numpy broadcasting rules to effectively add a dimension:
np.sqrt(np.sum(f**2, axis=1))[...,np.newaxis]
And finally we calculate our result
f /= np.sqrt(np.sum(f**2, axis=1))[...,np.newaxis]

If you are taking good care of the edges, the standard way of going about your first vectorization would be something like this:
B = np.zeros(A.shape)
B[:-1, 1:, 1:] = A[1:, 1:, 1:] * np.sqrt(A[:-1, :-1, :-1])
You would then need to fill B[-1, :, :], B[:, 0, :] and B[:, :, 0] with appropriate values.
Extending this to other indices should be pretty straightforward.

To perform parallel processing in numpy, you should look at mpi4py. It's an MPI binding for Python. It allows distributed processing.

Related

How can you do an outer summation over only one dimension of a numpy 2D array?

I have a (square) 2 dimensional numpy array where I would like to compare (subtract) all of the values within each row to each other but not to other rows so the output should be a 3D array.
matrix = np.array([[10,1,32],[32,4,15],[6,3,1]])
Output should be a 3x3x3 array which looks like:
output = [[[0,-9,22],[0,-28,-17],[0,-3,-5]], [[9,0,31],[28,0,11],[3,0,-2]], [[-22,-31,0],[17,-11,0],[5,2,0]]]
I.e. for output[0], for each of the 3 rows of matrix, subtract that row's zeroth element from every other, for output[1] subtract each row's first element etc.
This seems to me like a reduced version of numpy's ufunc.outer functionality which should be possible with
tryouter = np.subtract(matrix, matrix)
and then taking some clever slice and/or transposition.
Indeed, if you do this, one finds that: output[i,j] = tryouter[i,j,i]
This looks like it should be solvable by using np.transpose to switch the 1 and 2 axes and then taking the arrays on the new 0,1 diagonal but I can't work out how to do this with numpy diagonal or any slicing method.
Is there a way to do this or is there a simpler approach to this whole problem built into numpy?
Thanks :)

You're close, you can do it with broadcasting:
out = matrix[None, :, :] - matrix.T[:, :, None]
Here .T is the same as np.transpose, and using None as an index introduces a new dummy dimension of size 1.

Finding nearest pixel in defined color space - quick implementation using numpy

I have been working on a task, where I implemented median cut for image quantization – representing the whole image by only limited set of pixels. I implemented the algorithm and now I am trying to implement the part, where I assign each pixel to a representant from the set found by median cut. So, I have variable 'color_space', which is 2d ndarray of shape (n,3), where n is the number of representatives. Then I have variable 'img', which is the original image of shape (rows, columns, 3).
Now I want to find the nearest pixel (bin) for each pixel from the image based on euclidean distance. I was able to come with this solution:
for row in range(img.shape[0]):
for column in range(img.shape[1]):
img[row][column] = color_space[np.linalg.norm(color_space - img[row][column], axis=1).argmin()]
What it does is, that for each pixel from the image, it computes the vector if distances from each of the bins and then it takes the closest one.
Problem is, that this solution is quite slow and I would like to vectorize it - instead of getting vector for each pixel, I would like to get a matrix, where for example first row would be the first vector of distances computed in my code etc...
This problem could be converted into a problem, where I want to do a matrix multiplication, but instead of getting dot product of two vectors, I would get their euclidean distance. Is there some good approach to such problems? Some general solution in numpy, if we want to do 'matrix multiplication' in numpy, but the function Rn x Rn -> R does not need to be dot product, but for example euclidean distance. Of course, for the multiplication, the original image should be resized to (row*columns, 3), but that is a detail.
I have been studying the documentation and searching internet, but didn't find any good approach.
Please note that I don't want others to solve my assignment, the solution I came up with is totally ok, I am just curious whether I could speed it up as I try to learn numpy properly.
Thanks for any advices!

Below is MWE for vectorizing your problem. See comments for explanation.
import numpy
# these are just random array declaration to work with.
image = numpy.random.rand(32, 32, 3)
color_space = numpy.random.rand(10,3)
# your code. I modified it to pick indexes
result = numpy.zeros((32,32))
for row in range(image.shape[0]):
for column in range(image.shape[1]):
result[row][column] = numpy.linalg.norm(color_space - image[row][column], axis=1).argmin()
result = result.astype(numpy.int)
# here we reshape for broadcasting correctly.
image = image.reshape(1,32,32,3)
color_space = color_space.reshape(10, 1,1,3)
# compute the norm on last axis, which is RGB values
result_norm = numpy.linalg.norm(image-color_space, axis=3)
# now compute the vectorized argmin
result_vectorized = result_norm.argmin(axis=0)
print(numpy.allclose(result, result_vectorized))
Eventually, you can get the correct solution by doing color_space[result]. You may have to remove the extra dimensions that you add in color space to get correct shapes in this final operation.

I think this approach might be a bit more numpy-ish/pythonic:
import numpy as np
from typing import *
from numpy import linalg as LA
# assume color_space is defined as a constant somewhere above and is of shape (n,3)
nearest_pixel_idxs: Callable[[np.ndarray], int] = lambda rgb: return LA.norm(color_space - rgb, axis=1).argmin()
img: np.ndarray = color_space[np.apply_along_axis(nearest_pixel_idxs, 1, img.reshape((-1, 3)))]
Why this solution might be more efficient:
It relies on the parallelizable apply_along_axis function nearest_pixel_idxs() rather than the nested for-loops. This is made possible by reshaping img and thereby removing the need for double indexing.
It avoids repeated writes into color_space by only indexing into it once at the very end.
Let me know if you would like me to go into greater depth on any of this - happy to help.

You could first broadcast to get all the combinations and then calculate each norm. You could then pick the smallest from there.
a = np.array([[1,2,3],
[2,3,4],
[3,4,5]])
b = np.array([[1,2,3],
[3,4,5]])
a = np.repeat(a.reshape(a.shape[0],1,3), b.shape[0], axis = 1)
b = np.repeat(b.reshape(1,b.shape[0],3), a.shape[0], axis = 0)
np.linalg.norm(a - b, axis = 2)
Each row of the result represents the distance of the row in a to each of the representatives in b
array([[0. , 3.46410162],
[1.73205081, 1.73205081],
[3.46410162, 0. ]])
You can then use argmin to get the final results.
IMO it is better to use (what #Umang Gupta proposed) numpy's automatic broadcasting than using repeat.

Numpy: Perform Multiplication-like Addition

I wanted to define my own addition operator that takes an Nx1 vector (call it A) and a 1xN vector (B) such that the element in the i^th row and j^th column is the sum of the i^th element in A and the j^th element in B. An example is illustrated here.
I was able to write the following code for the function (and it is correct as far as I know).
def test_fn(a, b):
a_len = a.shape[0]
b_len = b.shape[1]
prod = np.array([[0]*a_len]*b_len)
for i in range(a_len):
for j in range(b_len):
prod[i, j] = a[i, 0] + b[0, j]
return prod
However, the vectors I am working with contain thousands of elements, and the function above is quite slow. I was wondering if there was a better way to approach this problem, or if there was a numpy function that could be of use. Any help would be appreciated.

According to numpy's broadcasting rules, you can use a+b to implement your own defined operator.
The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array.

Vectorized Evaluation of a Function, Broadcasting and Element Wise Operations

Given this...
I have to explain what this code does, knowing that it performs the vectorized evaluation of F, using broadcasting and element wise operations concepts...
def F(x_pos, alpha):
D = x_pos.reshape(1,-1) - x_pos.reshape(-1,1)
return (1./alpha) * (alpha.reshape(1,-1) * R(D)).sum(axis=1)
My explanation is:
In the first line of the function F receives x_pos and alpha as parameters (both numpy arrays), in the second line the matrix D is calculated by means of broadcasting (basic operations such as addition in arrays numpy are performed elementwise, ie, element by element, but it is also possible with arranys of different size if numpy can transform them into others of the same size, this conversion is called broadcasting), subtracting an array of order 1xN with another of order Nx1, resulting in the matrix D of order NxN containing x_j - x_1, x_j - x_2, etc. as elements, finally, in the last line the reciprocal of alpha is calculated (which clearly is an arrangement), where each element is multiplied by the sum of the R evaluation of each cell of the matrix D multiplied by alpha_j horizontally (due to axis = 1 in the argument)
Questions:
Considering I'm new to Python, is my explanation OK?
The code has an error or not? Because I don't see that the "j must be different from 1, 2, ..., n" in each sum is taken into consideration in the code... and If it's in fact wrong... How can I fix the code so it do exactly the same thing as stated as in the image?

Few comments/improvements/fixes could be suggested here.
1] The first step could be alternatively done with just introducing a new axis and subtracting with itself, like so -
D = x_pos[:,None] - x_pos
In my opinion, this is a cleaner option. The performance benefit might be just marginal.
2] In the second line, I think it needs a fix as we need to avoid computations for the diagonal elements of R(D). So, If I got that correctly, the corrected code would be -
vals = R(D)
np.fill_diagonal(vals,0)
out = (1./alpha) * (alpha.reshape(1,-1) * vals).sum(axis=1)
Now, let's make the code a bit more idiomatic/cleaner.
At that line, we could write : (alpha * vals) instead of alpha.reshape(1,-1) * vals. This is because the shapes are already aligned for broadcasting as shown in a schematic diagram below -
alpha : n
vals : n x n
Thus, alpha would be automatically extended to 2D with its elements broadcasted along the first axis for the length of vals and then elementwise multiplications being generated with it. Again, this is meant as a cleaner code.
There's a further performance improvement possible here with (alpha.reshape(1,-1) * vals).sum(axis=1) being replaceable with a matrix-multiplicatiion using np.dot as alpha.dot(vals). The benefit on performance should be noticeable with this step.
So, the second step reduces to -
out = (1./alpha) * alpha.dot(vals)

Matrix vector multiplication along array axes

in a current project I have a large multidimensional array of shape (I,J,K,N) and a square matrix of dim N.
I need to perform a matrix vector multiplication of the last axis of the array with the square matrix.
So the obvious solution would be:
for i in range(I):
for j in range(J):
for k in range(K):
arr[i,j,k] = mat.dot(arr[i,j,k])
but of course this is rather slow. So I also tried numpy's tensordot but had little success.
I would expect that something like:
arr = tensordot(mat,arr,axes=((0,1),(3)))
should do the trick but I get a shape mismatch error.
Has someone a better solution or knows how to correctly use tensordot?
Thank you!

This should do what your loops, but with vectorized looping:
from numpy.core.umath_tests import matrix_multiply
arr[..., np.newaxis] = matrix_multiply(mat, arr[..., np.newaxis])
matrix_multiply and its sister inner1d are hidden, undocumented, gems of numpy, although a full set of linear algebra gufuncs should see the light with numpy 1.8. matrix_multiply does matrix multiplication on the last two dimensions of its inputs, and broadcasting on the rest. The only tricky part is setting an additional dimension, so that it sees column vectors when multiplying, and adding it also on assignment back into array, so that there is no shape mismatch.

I think your for loop is wrong, and for this case dot seems to be enough:
# a is your IJKN
# b is your NN
c = dot(a, b)
Here c will be a IJKN array. If you want to sum over the last dimension to get the IJK array:
arr = dot(a,b).sum(axis=3)
BUT I'm NOT SURE IF THIS IS WHAT YOU WANT...

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parallel array manipulations in numpy - python

To perform parallel processing in numpy, you should look at mpi4py. It's an MPI binding for Python. It allows distributed processing.

Related

How can you do an outer summation over only one dimension of a numpy 2D array?

Finding nearest pixel in defined color space - quick implementation using numpy

Numpy: Perform Multiplication-like Addition

Vectorized Evaluation of a Function, Broadcasting and Element Wise Operations

Matrix vector multiplication along array axes

Categories

Resources