Difference between cv2.findNonZero and numpy.nonzero - Python

Silly question here.
I want to find the locations of the non-zero pixels in some black and white images, and found these two functions, one in the NumPy library and one in OpenCV.
The example I found on the internet (http://docs.opencv.org/trunk/d1/d32/tutorial_py_contour_properties.html):
mask = np.zeros(imgray.shape,np.uint8)
cv2.drawContours(mask,[cnt],0,255,-1)
pixelpoints = np.transpose(np.nonzero(mask))
pixelpointsCV2 = cv2.findNonZero(mask)
Which states
Numpy gives coordinates in (row, column) format, while OpenCV gives coordinates in (x,y) format. So basically the answers will be interchanged. Note that, row = x and column = y.
Based on my understanding of English, isn't their explanation wrong? Shouldn't it be:
Numpy gives coordinates in (row, column) format, while OpenCV gives coordinates in (y,x) or (column, row) format.
My questions are:
Does NumPy return (row, col), i.e. (x, y), and OpenCV (y, x), where row = x and col = y? IMHO it should be row = y, col = x.
Which one is more computationally efficient, in terms of time and resources?
Maybe I am not getting this simple thing right because I am not a native English speaker.

There is an error in the documentation:
Numpy gives coordinates in (row, column) format, while OpenCV gives coordinates in (x,y) format. So basically the answers will be interchanged. Note that, row = x and column = y.
That last sentence is the error; it should read: Note that, row = y and column = x.
So, regarding your questions:
numpy returns (row,col) = (y,x), and OpenCV returns (x,y) = (col,row)
You need to scan the whole matrix and retrieve some points. I don't think there will be any significant difference in performance (should be tested!).
Since you're using Python, it's probably better to stick with Python facilities, e.g. NumPy.
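As a quick sanity check on the coordinate order (a minimal sketch, not from the original answers):
import cv2
import numpy as np
mask = np.zeros((4, 6), np.uint8)
mask[1, 3] = 255                            # one non-zero pixel at row=1, col=3
print(np.transpose(np.nonzero(mask)))       # [[1 3]]   -> (row, col), i.e. (y, x)
print(cv2.findNonZero(mask))                # [[[3 1]]] -> (x, y), i.e. (col, row)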
Runtime test comparing these two versions -
In [86]: mask = (np.random.rand(128,128)>0.5).astype(np.uint8)
In [87]: %timeit cv2.findNonZero(mask)
10000 loops, best of 3: 97.4 µs per loop
In [88]: %timeit np.nonzero(mask)
1000 loops, best of 3: 297 µs per loop
In [89]: mask = (np.random.rand(512,512)>0.5).astype(np.uint8)
In [90]: %timeit cv2.findNonZero(mask)
1000 loops, best of 3: 1.65 ms per loop
In [91]: %timeit np.nonzero(mask)
100 loops, best of 3: 4.8 ms per loop
In [92]: mask = (np.random.rand(1024,1024)>0.5).astype(np.uint8)
In [93]: %timeit cv2.findNonZero(mask)
100 loops, best of 3: 6.75 ms per loop
In [94]: %timeit np.nonzero(mask)
100 loops, best of 3: 19.4 ms per loop
Thus, it seems OpenCV gives around a 3x speedup over the NumPy counterpart across varying data sizes.

Related

Efficient way to sample a large array many times with NumPy?

If you don't care about the details of what I'm trying to implement, just skip ahead to the actual question further down.
I am trying to do a bootstrap error estimation on some statistic with NumPy. I have an array x, and wish to compute the error on the statistic f(x), for which the usual Gaussian assumptions of error analysis do not hold. x is very large.
To do this, I resample x using numpy.random.choice(), where the size of my resample is the size of the original array, with replacement:
resample = np.random.choice(x, size=len(x), replace=True)
This gives me a new realization of x. This operation must now be repeated ~1,000 times to give an accurate error estimate. If I generate 1,000 resamples of this nature:
resamples = [np.random.choice(x, size=len(x), replace=True) for i in range(1000)]
and then compute the statistic f(x) on each realization:
results = [f(arr) for arr in resamples]
then I have inferred the error of f(x) to be something like
np.std(results)
the idea being that even though f(x) itself cannot be described using Gaussian error analysis, a distribution of f(x) measurements subject to random error can be.
Okay, so that's a bootstrap. Now, my problem is that the line
resamples = [np.random.choice(x, size=len(x), replace=True) for i in range(1000)]
is very slow for large arrays. Is there a smarter way to do this without a list comprehension? The second list comprehension
results = [f(arr) for arr in resamples]
can be pretty slow too, depending on the details of the function f(x).
Since we are allowing repetitions, we could generate all the indices in one go with np.random.randint and then simply index into x to get the equivalent of resamples, like so -
num_samples = 1000
idx = np.random.randint(0,len(x),size=(num_samples,len(x)))
resamples_arr = x[idx]
One more approach would be to generate random numbers from a uniform distribution with numpy.random.rand and scale them to the length of the array, like so -
resamples_arr = x[(np.random.rand(num_samples,len(x))*len(x)).astype(int)]
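Plugging either of these into the bootstrap is then just a matter of applying the statistic along axis=1. A minimal sketch, assuming the statistic f is something vectorizable like np.mean (swap in your own f otherwise):
resamples_arr = x[np.random.randint(0, len(x), size=(1000, len(x)))]   # one resample per row
stats = resamples_arr.mean(axis=1)          # f applied to each resample (here f = mean)
bootstrap_error = np.std(stats)             # spread of the statistic across resamples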
Runtime test with x of 5000 elems -
In [221]: x = np.random.randint(0,10000,(5000))
# Original soln
In [222]: %timeit [np.random.choice(x, size=len(x), replace=True) for i in range(1000)]
10 loops, best of 3: 84 ms per loop
# Proposed soln-1
In [223]: %timeit x[np.random.randint(0,len(x),size=(1000,len(x)))]
10 loops, best of 3: 76.2 ms per loop
# Proposed soln-2
In [224]: %timeit x[(np.random.rand(1000,len(x))*len(x)).astype(int)]
10 loops, best of 3: 59.7 ms per loop
For very large x
With a very large array x of 600,000 elements, you might not want to create all those indices for 1000 samples at once. In that case, the per-sample solutions would have timings something like this -
In [234]: x = np.random.randint(0,10000,(600000))
# Original soln
In [235]: %timeit np.random.choice(x, size=len(x), replace=True)
100 loops, best of 3: 13 ms per loop
# Proposed soln-1
In [238]: %timeit x[np.random.randint(0,len(x),len(x))]
100 loops, best of 3: 12.5 ms per loop
# Proposed soln-2
In [239]: %timeit x[(np.random.rand(len(x))*len(x)).astype(int)]
100 loops, best of 3: 9.81 ms per loop
As alluded to by @Divakar, you can pass a tuple to size to get a 2D array of resamples rather than using a list comprehension.
Here, assume for a second that f is just sum rather than some other function. Then:
x = np.random.randn(100000)
resamples = np.random.choice(x, size=(1000, x.shape[0]), replace=True)
# resamples.shape = (1000, 100000)
results = np.apply_along_axis(f, axis=1, arr=resamples)
print(results.shape)
# (1000,)
Here np.apply_along_axis is admittedly just a glorified for-loop equivalent to [f(arr) for arr in resamples]. But I am not exactly sure if you need to index x here based on your question.
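A side note not from the original answer: if f is a reduction that already accepts an axis argument (sum, mean, np.median, and so on), you can skip np.apply_along_axis entirely and let NumPy do the loop internally:
results = resamples.sum(axis=1)             # same values as [f(arr) for arr in resamples] when f is sum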

Python3 - Computationally efficient correlation between matrix and array

I'd like to correlate the columns of an mxn matrix with a 1xm array. This should give me a 1xn array back. At the moment I am doing this a bit clumsily with:
c = np.corrcoef(X, y)[:-1,-1]
The correlations I want end up in the last column, with the final entry being the correlation of the array with itself (r = 1.0), which the slicing drops.
This is fine, but I need to do this on quite big matrices, and that is when it becomes too computationally heavy and my computer gives up.
For example the largest matrix I am doing this for has the size:
48x290400 (= X) and 48x1 (=y), where I want to end up with 290400 r-values
This works fine in MATLAB, but not in Python using np.corrcoef. Has anyone got a good solution for this?
Cheers
Daniel
We could use corr2_coeff from this post after transposing the input arrays -
corr2_coeff(a.T,b.T).ravel()
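The linked post is not reproduced here, but corr2_coeff is typically a vectorized row-wise correlation along these lines (a sketch of one common implementation, consistent with how it is called above):
import numpy as np
def corr2_coeff(A, B):
    # subtract row-wise means
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)
    # row-wise sums of squares
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)
    # correlation of every row of A with every row of B
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.outer(ssA, ssB))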
Sample run -
In [160]: a = np.random.rand(3, 5)
In [161]: b = np.random.rand(3, 1)
# Proposed in the question
In [162]: np.corrcoef(a.T, b.T)[:-1,-1]
Out[162]: array([-0.0716, 0.1905, 0.9699, 0.7482, -0.1511])
# Proposed in this post
In [163]: corr2_coeff(a.T,b.T).ravel()
Out[163]: array([-0.0716, 0.1905, 0.9699, 0.7482, -0.1511])
Runtime test -
In [171]: a = np.random.rand(48, 10000)
In [172]: b = np.random.rand(48, 1)
In [173]: %timeit np.corrcoef(a.T, b.T)[:-1,-1]
1 loops, best of 3: 619 ms per loop
In [174]: %timeit corr2_coeff(a.T,b.T).ravel()
1000 loops, best of 3: 1.72 ms per loop
In [176]: 619.0/1.72
Out[176]: 359.8837209302326
Massive 360x speedup there!
Scaling it further -
In [239]: a = np.random.rand(48, 29040)
In [240]: b = np.random.rand(48, 1)
In [241]: %timeit np.corrcoef(a.T, b.T)[:-1,-1]
1 loops, best of 3: 5.19 s per loop
In [242]: %timeit corr2_coeff(a.T,b.T).ravel()
100 loops, best of 3: 8.09 ms per loop
In [244]: 5190.0/8.09
Out[244]: 641.5327564894932
640x+ speedup on this bigger dataset and should scale better as we go towards actual dataset sizes!

Fast distance calculation in scipy and numpy

Let A, B be (day, observation, dim) arrays. Each array contains, for a given day, the same number of observations, an observation being a point with dim dimensions (that is, dim floats). For every day, I want to compute the spatial distances between all observations in A and B on that day.
For example:
import numpy as np
from scipy.spatial.distance import cdist
A, B = np.random.rand(50,1000,10), np.random.rand(50,1000,10)
output = []
for day in range(50):
    output.append(cdist(A[day], B[day]))
where I use scipy.spatial.distance.cdist.
Is there a faster way to do this? Ideally, I would like output to be a (day, observation, observation) array that contains, for every day, the pairwise distances between observations in A and B on that day, while somehow avoiding the loop over days.
One way to do it (though it will require a massive amount of memory) is to make clever use of array broadcasting:
output = np.sqrt( np.sum( (A[:,:,np.newaxis,:] - B[:,np.newaxis,:,:])**2, axis=-1) )
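If that broadcasted intermediate of shape (day, obs, obs, dim) is too big to hold in memory, a common alternative (not part of this answer, just a sketch) is the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a.b done with a batched matrix product, so only (day, obs, obs) arrays are ever built:
sq_A = np.einsum('ijk,ijk->ij', A, A)                  # squared norms, shape (day, obs)
sq_B = np.einsum('ijk,ijk->ij', B, B)                  # squared norms, shape (day, obs)
cross = np.matmul(A, B.transpose(0, 2, 1))             # batched dot products, shape (day, obs, obs)
d2 = sq_A[:, :, None] + sq_B[:, None, :] - 2.0 * cross
output = np.sqrt(np.maximum(d2, 0.0))                  # clamp tiny negative round-off before sqrt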
Edit
But after some testing, it seems that probably scikit-learn's euclidean_distances is the best option for large arrays. (Note that I've rewritten your loop into a list comprehension.)
This is for 100 data points per day:
# your own code using cdist
from scipy.spatial.distance import cdist
%timeit dists1 = np.asarray([cdist(x,y) for x, y in zip(A, B)])
100 loops, best of 3: 8.81 ms per loop
# pure numpy with broadcasting
%timeit dists2 = np.sqrt( np.sum( (A[:,:,np.newaxis,:] - B[:,np.newaxis,:,:])**2, axis=-1) )
10 loops, best of 3: 46.9 ms per loop
# scikit-learn's algorithm
from sklearn.metrics.pairwise import euclidean_distances
%timeit dists3 = np.asarray([euclidean_distances(x,y) for x, y in zip(A, B)])
100 loops, best of 3: 12.6 ms per loop
and this is for 2000 data points per day:
In [5]: %timeit dists1 = np.asarray([cdist(x,y) for x, y in zip(A, B)])
1 loops, best of 3: 3.07 s per loop
In [7]: %timeit dists3 = np.asarray([euclidean_distances(x,y) for x, y in zip(A, B)])
1 loops, best of 3: 2.94 s per loop
Edit: I'm an idiot and forgot that Python's map is evaluated lazily. My "faster" code wasn't actually doing any of the work! Forcing evaluation removed the performance boost.
I think your time is going to be dominated by the time spent inside the scipy function. I'd use map instead of the loop anyway as I think it's a bit neater, but I don't think there's any magic way to get a huge performance boost here. Maybe compiling the code with Cython or using Numba would help a little.

Why does padding an FFT in NumPy make it run much slower?

I had written a script using NumPy's fft function, where I was padding my input array to the nearest power of 2 to get a faster FFT.
After profiling the code, I found that the FFT call was taking the longest time, so I fiddled around with the parameters and found that if I didn't pad the input array, the FFT ran several times faster.
Here's a minimal example to illustrate what I'm talking about (I ran this in IPython and used the %timeit magic to time the execution).
x = np.arange(-4.*np.pi, 4.*np.pi, 1000)
dat1 = np.sin(x)
The timing results:
%timeit np.fft.fft(dat1)
100000 loops, best of 3: 12.3 µs per loop
%timeit np.fft.fft(dat1, n=1024)
10000 loops, best of 3: 61.5 µs per loop
Padding the array to a power of 2 leads to a very drastic slowdown.
Even if I create an array with a prime number of elements (hence the theoretically slowest FFT)
x2 = np.arange(-4.*np.pi, 4.*np.pi, 1009)
dat2 = np.sin(x2)
The time it takes to run still doesn't change so drastically!
%timeit np.fft.fft(dat2)
100000 loops, best of 3: 12.2 µs per loop
I would have thought that padding the array will be a one time operation, and then calculating the FFT should be quicker.
Am I missing anything?
EDIT: I was supposed to use np.linspace rather than np.arange. Below are the timing results using linspace
In [2]: import numpy as np
In [3]: x = np.linspace(-4*np.pi, 4*np.pi, 1000)
In [4]: x2 = np.linspace(-4*np.pi, 4*np.pi, 1024)
In [5]: dat1 = np.sin(x)
In [6]: dat2 = np.sin(x2)
In [7]: %timeit np.fft.fft(dat1)
10000 loops, best of 3: 55.1 µs per loop
In [8]: %timeit np.fft.fft(dat2)
10000 loops, best of 3: 49.4 µs per loop
In [9]: %timeit np.fft.fft(dat1, n=1024)
10000 loops, best of 3: 64.9 µs per loop
Padding still causes a slowdown. Could this be a local issue, i.e., is it behaving this way because of some quirk in my NumPy setup?
FFT algorithms like NumPy's are fast for array sizes that factorize into a product of small primes, not just powers of two. If you increase the array size by padding, the computational work increases. The speed of FFT algorithms is also critically dependent on cache use; if you pad to an array size that uses the cache less efficiently, performance suffers. The really fast FFT implementations, like FFTW and Intel MKL, will actually generate plans for the array size factorization to get the most efficient computation, using both heuristics and actual measurements. So no, padding to the nearest power of two is only beneficial in introductory textbooks, not necessarily in practice. As a rule of thumb, you usually benefit from padding only if the array size has one or more very large prime factors.
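To see the rule of thumb in action, here is a hedged sketch with made-up sizes (not from the answer): padding helps when the original length has a large prime factor, and is pointless when the length already factorizes into small primes:
import numpy as np
x_prime = np.random.rand(10007)     # 10007 is prime: worst case for radix-based FFTs
x_nice = np.random.rand(10000)      # 10000 = 2^4 * 5^4: already factorizes into small primes
# Time these with %timeit; padding the prime-length signal should help,
# while padding the already-composite one typically just adds work.
np.fft.fft(x_prime)                 # slow: length is prime
np.fft.fft(x_prime, n=10240)        # padded to 10240 = 2^11 * 5
np.fft.fft(x_nice)                  # already fast
np.fft.fft(x_nice, n=16384)         # padded to the next power of two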
You're using np.arange when you want to be using np.linspace
In [2]: x = np.arange(-4.*np.pi, 4.*np.pi, 1000)
In [3]: x
Out[3]: array([-12.56637061])
np.arange takes arguments (start, stop, step), whereas np.linspace is (start, stop, number_of_pts). When you calculate with the data I suspect you think you're using, you get the expected behavior:
In [4]: x = np.linspace(-4.*np.pi, 4.*np.pi, 1000)
In [5]: dat1 = np.sin(x)
In [6]: %timeit np.fft.fft(dat1)
1 loops, best of 3: 28.1 µs per loop
In [7]: %timeit np.fft.fft(dat1, n=1024)
10000 loops, best of 3: 26.7 µs per loop
In [8]: x = np.linspace(-4.*np.pi, 4.*np.pi, 1009)
In [9]: dat2 = np.sin(x)
In [10]: %timeit np.fft.fft(dat2)
10000 loops, best of 3: 53 µs per loop
In [11]: %timeit np.fft.fft(dat2, n=1024)
10000 loops, best of 3: 26.8 µs per loop

Improve performance of array handling

I have a large piece of code which takes a while to run. I've tracked down the two lines that take up most of the time and I'd like to know if there's a way to speed them up. Here's an MWE:
import numpy as np
def setup(k=2, m=100, n=300):
    return np.random.randn(k, m), np.random.randn(k, n), np.random.randn(k, m)
# make some random points and weights
a, b, w = setup()
# Weighted euclidean distance between arrays a and b.
wdiff = (a[np.newaxis,...] - b[np.newaxis,...].T) / w[np.newaxis,...]
# This is the set of operations that need a performance boost:
dist_1 = np.exp(-0.5*(wdiff*wdiff)) / w
dist_2 = np.array([i[0]*i[1] for i in dist_1])
BTW, I'm coming from this question: Fast weighted euclidean distance between points in arrays, where ali_m suggested an amazing answer that saved me a lot of time by applying broadcasting (of which I know absolutely nothing, yet at least). Could something like that be applied to these lines?
Your dist_2 calculation can be sped up by a factor of 10 or so:
>>> dist_1.shape
(300, 2, 100)
>>> %timeit dist_2 = np.array([i[0]*i[1] for i in dist_1])
1000 loops, best of 3: 1.35 ms per loop
>>> %timeit dist_2 = dist_1.prod(axis=1)
10000 loops, best of 3: 116 µs per loop
>>> np.allclose(np.array([i[0]*i[1] for i in dist_1]), dist_1.prod(axis=1))
True
I couldn't manage to do much with your dist_1 as the majority of time is spent in the exponentiation:
>>> %timeit (-0.5*(wdiff*wdiff)) / w
1000 loops, best of 3: 467 µs per loop
>>> %timeit np.exp((-0.5*(wdiff*wdiff)))/w
100 loops, best of 3: 3.3 ms per loop
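One further tweak you could try (not part of the answer above, just a sketch): since exp(p)*exp(q) = exp(p + q), the product over axis 1 can be folded into the exponent before calling np.exp, which roughly halves the number of exponentials and skips dist_1 entirely:
# equivalent to dist_2 = (np.exp(-0.5*wdiff*wdiff)/w).prod(axis=1)
dist_2_alt = np.exp(-0.5 * (wdiff * wdiff).sum(axis=1)) / w.prod(axis=0)
# sanity check against the original two-step version
assert np.allclose(dist_2_alt, (np.exp(-0.5 * wdiff * wdiff) / w).prod(axis=1))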
