How to np.convolve over two 2d arrays - python

I want to carry out np.convolve for two 2d arrays in a vectorized manner. Here is the thing:
The function np.convolve takes two 1d arrays, a and v, and computes the convolution. The result reads:
output[n] = \sum_m a[m] v[n - m] .
What I want to do is, for 2d arrays a and v, to repeat "convolution along axis=0" over axis=1. That is,
output[n][k] = \sum_m a[m][k] v[n - m][k] .
Though this can be done with a for loop over axis=1 (as in the sketch below), I'm keen on finding a way to "vectorize" it to improve performance.
I searched for a while but found no clue. I'd be glad if someone could point the way. Thank you.
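For reference, a minimal version of the loop being described (the shapes here are made up; the only requirement is that a and v share the same number of columns):

import numpy as np

a = np.random.rand(8, 5)   # arbitrary example shapes
v = np.random.rand(6, 5)   # same number of columns as a

out = np.empty((a.shape[0] + v.shape[0] - 1, a.shape[1]))
for k in range(a.shape[1]):                      # loop over axis=1
    out[:, k] = np.convolve(a[:, k], v[:, k])    # 1-D convolution per column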

Depending on the size of your arrays, it may be quicker to multiply their FFTs instead, which is mathematically equivalent to the convolution. Based on the answer here, it is quicker, although it appears to run into limits for exceptionally long sequences, as shown here. An example of what this could look like:
from scipy.signal import fftconvolve
output = fftconvolve(a, v, mode='same')  # mode='full' works too
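One caveat worth adding (this note is not part of the original answer): with two 2-D inputs, fftconvolve convolves over both axes by default. For the column-by-column convolution asked about above, recent SciPy versions accept an axes argument that restricts the convolution to axis 0, leaving the columns independent. A quick sketch, assuming that argument is available in your SciPy version:

import numpy as np
from scipy.signal import fftconvolve

a = np.random.rand(8, 5)
v = np.random.rand(6, 5)

# convolve along axis 0 only; columns stay independent
out = fftconvolve(a, v, mode='full', axes=0)

# spot-check one column against the plain 1-D convolution
print(np.allclose(out[:, 2], np.convolve(a[:, 2], v[:, 2])))   # True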
If those answers help, don't forget to give them an upvote too :).

Related

Python correspondent for MATLAB matrix operation

I have a vector of indices (let's call it peo), a sparse matrix P and a matrix W. In MATLAB I can do an operation of this kind:
P(peo, peo) = W(peo, peo)
Is there a way to do the same in Python maintaining the same computational and time complexity?
Choosing library
There is a very similar way of expressing that in Python if you use dense matrices. Using a sparse matrix is a little more complex. In general, if dense matrices don't slow your code down too much and memory is not a problem, I would stick to dense matrices with numpy, since it is very convenient. (As they say, premature optimization is the root of all evil... or something like that.) However, if you really need sparse matrices, scipy offers an option for that.
Dense matrices
If you want to use dense matrices, you can use numpy to define the matrices, and peo should be defined as a list. Then note that indexing with a list (or vector) doesn't work the same way in MATLAB as it does in Python; check this for details (thank you Cris Lunego for pointing that out). To circumvent this and obtain the same behaviour as MATLAB, we will use numpy.ix_, which lets us reproduce MATLAB's indexing behavior with minimal changes to the code.
Here is an example:
import numpy as np
# Dummy matrices definition
peo = [1, 3, 4]
P = np.zeros((5, 5))
W = np.ones((5, 5))
# Assignment (using np.ix_ to reproduce matlab behavior)
P[np.ix_(peo, peo)] = W[np.ix_(peo, peo)]
print(P)
Sparse matrices
For sparse matrices, scipy has a package called sparse that lets you use sparse matrices much the way MATLAB does. It gives you an actual choice of how the matrix should be represented, where MATLAB doesn't. With great power comes great responsibility: taking the time to read the pros and cons of each representation will help you choose the right one for your application.
In general it's hard to guarantee exactly the same complexity because the languages are different and I don't know the intricate details of each. But the concept of sparse matrices is the same in scipy and MATLAB, so you can expect the complexity to be comparable. (You might even be faster in Python, since you can choose a representation tailored to your needs.)
Note that in this case, if you want to keep working the way you describe in MATLAB, you should choose a dok or lil representation. Those are the only two formats that allow efficient index access and sparsity changes.
Here is an example of what you want to achieve using the dok representation:
from scipy.sparse import dok_matrix
import numpy as np
# Dummy matrices definition
peo = [1, 2, 4]
P = dok_matrix((5, 5))
W = np.ones((5, 5))
# Assignment (using np.ix_ to reproduce matlab behavior)
P[np.ix_(peo, peo)] = W[np.ix_(peo, peo)]
print(P.toarray())
If you are interested in the pros and cons of sparse matrix representation and algebra in Python, here is a post that explores this a bit, as well as performance. Take it with a grain of salt since it is a little old, but the ideas behind it are still mostly correct.

Fast way to apply a mapping array to all elements in a numpy array?

Right now, I have code that basically looks like:
for x in range(img.shape[0]):
    for y in range(img.shape[1]):
        output[x, y] = map[input[x, y]]
where output, input and map are all numpy arrays (map is size 256, all are type uint8).
This works, but it's slow. Loops like this should be in C. That's what numpy is for.
Is there a numpy function (or a cv2 function, I'm already importing that anyway) that will do this?
How about?
output = map[input]
You're looking for np.take, which is as simple as map.take(input). Both this and Eelco's solution are much faster than yours; this one is about 70 percent faster than Eelco's, though your mileage may vary, and for input.shape >> (1e4, 1e4) you'll need a better solution anyway.
One place to start is Performance of various numpy fancy indexing methods, also with numba, which details various performance-related facts for this generic problem (i.e. how do we use one k-dimensional array to index some other n-dimensional array in more than just trivial ways).
If you have something like Anaconda installed, you could try to use Numba to jit np.ndarray.take(...) and see how much performance you can buy. The link above explains this as well.
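For concreteness, a small sketch of both one-liners on dummy data; the names lut and img are made up here and stand in for the map and input arrays above:

import numpy as np

# a 256-entry lookup table and a dummy uint8 image
lut = np.arange(256, dtype=np.uint8)[::-1]               # e.g. invert intensities
img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

out_fancy = lut[img]        # fancy indexing (Eelco's answer)
out_take = lut.take(img)    # np.take, as described above

print(np.array_equal(out_fancy, out_take))               # True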

Multi-Dimensional Batch-Image Convolution using Numpy

In image processing and classification networks, a common task is the convolution or cross-correlation of input images with a set of fixed filters. For example, in convolutional neural nets (CNNs), this is an extremely common operation. I have reduced the general version of the task to this:
Given: a batch of N images [N,H,W,D,...] and a set of K filters [K,H,W,D,...]
Return: an ndarray that represents the m-dimensional cross-correlation (xcorr) of image N_i with filter K_j, for every N_i in N and K_j in K
Currently, I am using scipy.spatial.cdist with a custom function that returns the max of the xcorr of two m-dim images, computed with scipy.signal.correlate. The code looks something like this:
import numpy as np
from scipy.spatial.distance import cdist
from scipy.signal import correlate

def xcorr(u, v):
    '''unfortunately, cdist only takes 2D arrays, so need to do this'''
    u = np.reshape(u, [96, 96, 3])
    v = np.reshape(v, [96, 96, 3])
    return np.max(correlate(u, v, mode='same', method='fft'))
batch_images = np.random.random([500,96,96,3])
my_filters = np.random.random([1000,96,96,3])
# unfortunately, cdist only takes 2D arrays, so need to do this
batch_vec = np.reshape(batch_images, [-1,np.prod(batch_images.shape[1:])])
filt_vec = np.reshape(my_filters, [-1,np.prod(my_filters.shape[1:])])
answer = cdist(batch_vec, filt_vec, xcorr)
The method works, and it's nice that cdist automatically parallelizes across threads, but it is actually quite slow. I am guessing this is due to a number of reasons, including non-optimal use of the cache between threads (e.g. keeping one filter fixed in cache while you filter all the images, or vice versa), the reshape operation inside xcorr, etc.
Does the community have any ideas on how to speed this up? I realize that in my example xcorr takes the maximum over the cross-correlation of the two images, but this was just an example made to fit cdist. Ideally, you could perform this batch operation and apply some other function (or none) to get the output you want. Ideal solutions could handle (R,G,B,D,...) data.
Any and all help is appreciated, including but not limited to wrapping C, although Python/numpy solutions are preferred. I saw some posts related to einsum notation, but I am not very familiar with it. I welcome tensorflow solutions IF they are able to get the same answer (within reasonable precision) as the corresponding slow numpy version.
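One direction worth sketching (this is an added illustration, not from the original post, and all names and sizes in it are made up): since the expensive part is the FFT, you can compute the FFT of every image and every filter once and let NumPy broadcasting form the N-by-K pairing in the frequency domain. Note that this sketch takes the maximum of the full cross-correlation (the zero-padded circular correlation contains exactly the values of mode='full'), not mode='same' as in the snippet above, and it materializes an (N, K, ...) array, so memory becomes the main constraint; chunking over the filter axis would keep that bounded.

import numpy as np

N, K, H, W, D = 4, 6, 32, 32, 3          # small, made-up sizes for the sketch
rng = np.random.default_rng(0)
images = rng.random((N, H, W, D))
filters = rng.random((K, H, W, D))

# pad to the full linear-correlation size along the correlated axes
full = (2 * H - 1, 2 * W - 1, 2 * D - 1)
F_img = np.fft.rfftn(images, s=full, axes=(1, 2, 3))    # one FFT per image
F_fil = np.fft.rfftn(filters, s=full, axes=(1, 2, 3))   # one FFT per filter

# (N, 1, ...) * conj((1, K, ...)) -> (N, K, ...) cross-spectra via broadcasting
spec = F_img[:, None] * np.conj(F_fil[None, :])
xcorr_full = np.fft.irfftn(spec, s=full, axes=(2, 3, 4))

# maximum of the full cross-correlation for every image/filter pair
answer = xcorr_full.max(axis=(2, 3, 4))   # shape (N, K)
print(answer.shape)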

Broadcasting (N,) and (N,1) arrays in numpy

I recently ran into the following issue with broadcasting with numpy.
import numpy as np
y = np.random.randn(100)
x = np.random.randn(100, 1)
(y + x).shape
# (100, 100)
While I realize that this follows the rules at https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html, it seems counterintuitive to what one would expect, namely that the result be a (100,1) vector.
I was just wondering: is there a good reason for this behavior (i.e. is this behavior desirable), or is it just a by-product of the way the broadcasting rules are defined?
The basic idea is that when one array or the other requires iteration for the result shapes to make sense, you iteratively perform the operation for each entry of the major axis. (Separately, NumPy offers ways to make the iteration happen over different axes if desired, such as with einsum.)
In this case, x has 100 different things along its major axis, each of which is individually added to y. Let's take just the first value x[0] and add it to y. Now we're talking about y having 100 things that are iteratively added to x[0], so the result is a shape-of-y thing. Repeat this for x[1] and so forth.
If you do x.T, then along x's major axis there is just one thing, namely a length-100 "row". That can be added elementwise to y without modification, so no further broadcasting is needed and you get the "naive" vector math operation you might have had in mind.
NumPy's broadcasting rules are trying to be effective for programming and iteration across a wide swath of possible calculations and operations, many having absolutely nothing to do with linear algebra or common vector/matrix operations. So broadcasting doesn't always (and shouldn't always) assume things in order to privilege the linear algebra sort of expectation.
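A small illustration of the shapes discussed above, using the same y and x as in the question:

import numpy as np

y = np.random.randn(100)        # shape (100,)
x = np.random.randn(100, 1)     # shape (100, 1)

print((y + x).shape)            # (100, 100): each of x's 100 rows is added to y
print((y + x.T).shape)          # (1, 100):   x.T has one length-100 row, added elementwise
print((y + x.ravel()).shape)    # (100,):     flatten x first for the plain 1-D sum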

Calculate 3D variant for summed area table using numpy cumsum

In the case of a 2D array, array.cumsum(0).cumsum(1) gives the integral image of the array.
What happens if I compute array.cumsum(0).cumsum(1).cumsum(2) over a 3D array?
Do I get a 3D extension of the integral image, i.e. an integral volume over the array?
It's hard to visualize what happens in the 3D case.
I have gone through this discussion:
3D variant for summed area table (SAT)
This gives a recursive way to compute the integral volume. What if I use cumsum along the 3 axes instead? Will it give me the same thing?
Will it be more efficient than the recursive method?
Yes, the formula you give, array.cumsum(0).cumsum(1).cumsum(2), will work.
What the formula does is compute a few partial sums so that the sum of these sums is the volume sum. That is, every element needs to be summed exactly once, or, in other words, no element can be skipped and no element counted twice. I think going through each of these questions (is any element skipped or counted twice) is a good way to verify to yourself that this will work. And also run a small test:
x = np.ones((20, 20, 20)).cumsum(0).cumsum(1).cumsum(2)
print(x[2, 6, 10])  # 231.0
print(3 * 7 * 11)   # 231
Of course, with all ones there could be two errors that cancel each other out, but this wouldn't happen everywhere, so it's a reasonable test.
As for efficiency, I would guess that the single-pass approach is probably faster, but not by a lot. Also, the above could be sped up by reusing an output array, e.g. cumsum(n, out=temp), as otherwise three intermediate arrays are created for this calculation (see the sketch below). The best way to know is to test (but only if you need to).
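Putting both remarks together, a small sketch (the array sizes are arbitrary): a random-data check of the three-pass cumsum, plus the out= reuse mentioned above.

import numpy as np

rng = np.random.default_rng(0)
a = rng.random((6, 7, 8))

# summed-volume table via three cumsum passes
sat = a.cumsum(0).cumsum(1).cumsum(2)

# brute-force check on random data (avoids the all-ones cancellation worry)
i, j, k = 2, 4, 5
print(np.isclose(sat[i, j, k], a[:i + 1, :j + 1, :k + 1].sum()))   # True

# reuse one output buffer instead of allocating three intermediates
temp = np.empty_like(a)
sat2 = a.cumsum(0)              # first pass still allocates once
sat2.cumsum(1, out=temp)        # second pass writes into temp
temp.cumsum(2, out=sat2)        # third pass writes back into sat2
print(np.allclose(sat, sat2))   # True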
