I am trying to write a pixel interpolation (binning?) algorithm: I want to, for example, take four pixels, average them, and output that average as a new pixel. I've had success with stride tricks to speed up the "partitioning" process, but the actual calculation is really slow. For a 256x512 16-bit grayscale image the averaging code takes about 7 s on my machine, and I have to process from 2k to 20k images depending on the data set. The purpose is to make the image less noisy (I am aware my proposed method decreases resolution, but that might not be a bad thing for my purposes).
import numpy as np
from numpy.lib.stride_tricks import as_strided
from scipy.misc import imread
import matplotlib.pyplot as pl
import time
def sliding_window(arr, footprint):
""" Construct a sliding window view of the array"""
t0 = time.time()
arr = np.asarray(arr)
footprint = int(footprint)
if arr.ndim != 2:
raise ValueError("need 2-D input")
if not (footprint > 0):
raise ValueError("need a positive window size")
shape = (arr.shape[0] - footprint + 1,
arr.shape[1] - footprint + 1, footprint, footprint)
if shape[0] <= 0:
shape = (1, shape[1], arr.shape[0], shape[3])
if shape[1] <= 0:
shape = (shape[0], 1, shape[2], arr.shape[1])
strides = (arr.shape[1]*arr.itemsize, arr.itemsize,
arr.shape[1]*arr.itemsize, arr.itemsize)
t1 = time.time()
total = t1-t0
print "strides"
print total
return as_strided(arr, shape=shape, strides=strides)
def binning(w, footprint):
    # the averaging block
    # preallocate memory
    binned = np.zeros(w.shape[0]*w.shape[1]).reshape(w.shape[0], w.shape[1])
    #print w
    t2 = time.time()
    for i in xrange(w.shape[0]):
        for j in xrange(w.shape[1]):
            binned[i, j] = w[i, j].sum()/(footprint*footprint + 0.0)
    t3 = time.time()
    tot = t3 - t2
    print tot
    return binned
Output:
5.60283660889e-05
7.00565886497
Is there some built-in/optimized function that would do the same thing I want, or should I just try to make a C extension (or even something else)?
Below is the additional part of the code, just for completeness, since I think the functions above are the most important part here. The image plotting is also slow, but I think there is a way to improve it (one option is sketched after the loop below).
for i in range(2000):
    arr = imread("./png/frame_" + str("%05d" % (i + 1)) + ".png").astype(np.float64)
    w = sliding_window(arr, footprint)
    binned = binning(w, footprint)
    pl.imshow(binned, interpolation="nearest")
    pl.gray()
    pl.savefig("binned_" + str(i) + ".png")
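As an aside on the plotting bottleneck: pl.imsave writes the array straight to an image file without building a figure, which is usually the slow part of imshow + savefig. A hedged variant of the loop above (it reuses the imports and functions already defined in this post):
# Same loop as above, but saving the array directly instead of
# going through imshow / gray / savefig.
for i in range(2000):
    arr = imread("./png/frame_" + str("%05d" % (i + 1)) + ".png").astype(np.float64)
    w = sliding_window(arr, footprint)
    binned = binning(w, footprint)
    pl.imsave("binned_" + str(i) + ".png", binned, cmap="gray")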
What I am looking for could be called interpolation; I just used the term that the person who advised me to do this used. That is probably why I kept finding histogram-related material!
Apart from median_filter I tried generic_filter from scipy.ndimage, but those did not give me the results I wanted (they have no "valid" mode as in convolution, i.e. they rely on going out of the bounds of the array when moving the kernel around). I asked on Code Review and it seems Stack Overflow would be a more suitable place for this question.
Without diving into your code, I think what you want is just to resize the image with interpolation. You should use an image library for this operation, as it will have heavily optimized code.
Since you are using SciPy, you might want to start with PIL, the Python Imaging Library. Use the resize method, where you can pass the desired interpolation parameter, probably Image.BILINEAR in your case.
It should look something like this:
import Image   # with Pillow: from PIL import Image
im = Image.fromarray(your_numpy)
im = im.resize((w/2, h/2), Image.BILINEAR)   # resize returns a new image
Edit: I just noticed, you can do it even with scipy itself, look at the documentation for
scipy.misc.imresize
a = np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11]], dtype=np.uint8)
res = scipy.misc.imresize(a, (3,2), interp="bilinear")
For the average, look at scipy.ndimage.filters.uniform_filter; in scipy.ndimage.filters you have lots of kernels for convolution that are much faster than a direct convolution with scipy.convolve.
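As a rough sketch of the uniform_filter route (the f // 2 offset assumes SciPy's convention for centring even-sized windows, so it is worth verifying on a toy array first):
import numpy as np
from scipy.ndimage import uniform_filter

def bin_by_uniform_filter(arr, f):
    # f x f binning: compute local means everywhere, then keep one
    # value per non-overlapping block
    means = uniform_filter(arr.astype(np.float64), size=f)
    return means[f // 2::f, f // 2::f]

# sanity check on a 4x4 ramp, binned 2x2
print(bin_by_uniform_filter(np.arange(16.).reshape(4, 4), 2))
# expected (under the assumed centring): [[ 2.5  4.5] [10.5 12.5]]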
For completeness, there is an interesting solution on the SciPython blog. The idea is to reshape the array to higher dimensions, apply mean over one dimension, and then reshape into a smaller array again. Supposedly it is fast as well.
https://scipython.com/blog/binning-a-2d-array-in-numpy/
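A minimal sketch of that reshape-and-mean trick (the function name is mine; it assumes the frame dimensions are exact multiples of the block counts in new_shape):
import numpy as np

def rebin(arr, new_shape):
    # average non-overlapping blocks so that arr ends up with new_shape;
    # arr.shape must be an integer multiple of new_shape in both axes
    nrows, ncols = new_shape
    return arr.reshape(nrows, arr.shape[0] // nrows,
                       ncols, arr.shape[1] // ncols).mean(axis=(1, 3))

# e.g. a 256x512 frame binned 2x2 -> 128x256:
# small = rebin(frame, (128, 256))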
For those looking for true binning rather than interpolation or decimation: this is also provided by the Pillow module with the function Image.reduce. The output of Image.reduce is equal to the rebin function from scipython.com linked by @Tilen K.
import numpy as np
from PIL import Image

image = np.arange(16).astype(float).reshape(4, 4)
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.]])
np.asarray(Image.fromarray(image).reduce(2))
array([[ 2.5, 4.5],
[10.5, 12.5]], dtype=float32)
I'm trying to find a way to perform operations on each element across multiple 2-D arrays without having to loop over them, or at least without needing two for loops. My code calculates the standard deviation of each pixel over a series of images (arrays). The number of images is not the problem; it is the size of the arrays that makes the code extremely slow. The following is a working example of what I have.
import numpy as np

# reshape(# of image (arrays), # of rows, # of cols)
a = np.arange(32).reshape(2, 4, 4)
stddev_arr = np.array([])
for i in range(4):
    for j in range(4):
        pixel = a[0:, i, j]
        stddev = np.std(pixel)
        stddev_arr = np.append(stddev_arr, stddev)
My actual data is 2000x2000, making this code loop 4000000 times. Is there a better way to do this?
Any advice is extremely appreciated.
You're already using numpy. numpy's std() function takes an axis argument that tells it which axis to operate on (in this case the zeroth axis). Because this offloads the calculation to numpy's C backend (possibly using SIMD instructions that vectorize many operations), it is much faster than iterating in Python. The other time-consuming operation in your code is appending to stddev_arr: appending to numpy arrays is slow because the entire array is copied into new memory before the new element is added. Since you already know how big that array needs to be, you might as well preallocate it.
a = np.arange(32).reshape(2, 4, 4)
stdev = np.std(a, axis=0)
This gives a 4x4 array
array([[8., 8., 8., 8.],
[8., 8., 8., 8.],
[8., 8., 8., 8.],
[8., 8., 8., 8.]])
To flatten this into a 1D array, do flat_stdev = stdev.flatten().
Comparing the execution times:
# Using only numpy
def fun1(arr):
    return np.std(arr, axis=0).flatten()

# Your function
def fun2(arr):
    stddev_arr = np.array([])
    for i in range(arr.shape[1]):
        for j in range(arr.shape[2]):
            pixel = arr[0:, i, j]
            stddev = np.std(pixel)
            stddev_arr = np.append(stddev_arr, stddev)
    return stddev_arr

# Your function, but pre-allocating stddev_arr
def fun3(arr):
    stddev_arr = np.zeros((arr.shape[1] * arr.shape[2],))
    x = 0
    for i in range(arr.shape[1]):
        for j in range(arr.shape[2]):
            pixel = arr[0:, i, j]
            stddev = np.std(pixel)
            stddev_arr[x] = stddev
            x += 1
    return stddev_arr
First, let's make sure all these functions are equivalent:
a = np.random.random((3, 10, 10))
assert np.all(fun1(a) == fun2(a))
assert np.all(fun1(a) == fun3(a))
Yup, all give the same result. Now, let's try with a bigger array.
a = np.random.random((3, 100, 100))
x = timeit.timeit('fun1(a)', setup='from __main__ import fun1, a', number=10)
# x: 0.003302899989648722
y = timeit.timeit('fun2(a)', setup='from __main__ import fun2, a', number=10)
# y: 5.495519500007504
z = timeit.timeit('fun3(a)', setup='from __main__ import fun3, a', number=10)
# z: 3.6250679999939166
Wow! We get a ~1.5x speedup just by preallocating.
Even more wow: using numpy's std() with the axis argument gives a > 1000x speedup, and this is just for the 100x100 array! With bigger arrays, you can expect to see even bigger speedup.
So, based on what you have provided, you can also reshape your array to vectorize away the two loops; then you only have to call np.std once, on the axis that you want.
a = np.arange(32).reshape(2, 4, 4)
a = a.reshape(2, -1).transpose()
stddev_arr = np.std(a, axis=1)
I have the following code snippet
def norm(x1, x2):
    return np.sqrt(((x1 - x2)**2).sum(axis=0))

def call_norm(x1, x2):
    x1 = x1[..., :, np.newaxis]
    x2 = x2[..., np.newaxis, :]
    return norm(x1, x2)
As I understand it, each x represents an array of points in N dimensional space, where N is the size of the final dimension of the array (so for points in 3-space the final dimension is size 3). It inserts extra dimensions and uses broadcasting to generate the cartesian product of these sets of points, and so calculates the distance between all pairs of points.
x = np.array([[1, 2, 3],[1, 2, 3]])
call_norm(x, x)
array([[ 0. , 1.41421356, 2.82842712],
[ 1.41421356, 0. , 1.41421356],
[ 2.82842712, 1.41421356, 0. ]])
(so the distance between [1,1] and [2,2] is 1.41421356, as expected)
I find that for moderate-size problems this approach can use huge amounts of memory. I can easily "de-vectorise" the problem and replace it with iteration, but I'd expect that to be slow. Is there a (reasonably) easy compromise where I could keep most of the speed advantages of vectorisation without the memory penalty? Some fancy generator trick?
There is no way to do this kind of computation without the memory penalty with numpy vectorization. For the specific case of efficiently computing pairwise distance matrices, packages tend to get around this by implementing things in C (e.g. scipy.spatial.distance) or in Cython (e.g. sklearn.metrics.pairwise).
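For instance, a minimal sketch with scipy.spatial.distance.cdist on random stand-in data (note that call_norm in the question keeps the coordinates along the first axis, so its inputs may need a transpose before being passed to cdist, which expects one point per row):
import numpy as np
from scipy.spatial.distance import cdist

# stand-in data: 1000 and 2000 points in 3-D, one point per row
p1 = np.random.random((1000, 3))
p2 = np.random.random((2000, 3))

d = cdist(p1, p2)       # Euclidean by default
print(d.shape)          # (1000, 2000)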
If you want to do this "by-hand", so to speak, using numpy-style syntax but without incurring the memory penalty, the current best option is probably dask.array, which automates the construction and execution of flexible task graphs for batch execution using a numpy-style syntax.
Here's an example of using dask for this computation:
import dask.array as da
# Create the chunked data. This can be created from numpy
# arrays as well, e.g. x_dask = da.from_array(x_numpy, chunks=5)
x = da.random.random((100, 3), chunks=5)
y = da.random.random((200, 3), chunks=5)
# Compute the task graph (syntax just like numpy!)
diffs = x[:, None, :] - y[None, :, :]
dist = da.sqrt((diffs ** 2).sum(-1))
# Execute the task graph
result = dist.compute()
print(result.shape)
# (100, 200)
You'll find that dask is much more memory efficient than NumPy, is often more computationally efficient than NumPy, and can also be computed in parallel/out-of core relatively straightforwardly.
I am working on a real-time application. For this I need to store around 20 arrays per second. Each array consists of n points with their respective x and y coordinates (z may follow as well in the future).
What I came up with is some kind of ring buffer, which takes the total number of arrays (they are frames of a video, by the way) and the number of points with their coordinates (this doesn't change within one execution, but is variable across executions).
My buffer is initialized with a numpy array filled with zeros: np.zeros((lengthOfSlices, numberOfTrackedPoints))
However, this seems to be problematic, because I write all the points for a slice into the array at once, not one after another. That means I can't broadcast into the array, as the shape is not correct.
Is there a numPythonic way to initialize the array with zeros and store vectorwise afterwards?
Below you can find what I have now:
class Buffer():
    def __init__(self, lengthOfSlices, numberOfTrackedPoints):
        self.data = np.zeros((lengthOfSlices, numberOfTrackedPoints))
        self.index = 0

    def extend(self, x):
        'adds array x to ring buffer'
        x_index = (self.index + np.arange(x.size)) % self.data.size
        self.data[x_index] = x
        self.index = x_index[-1] + 1

    def get(self):
        'returns the first-in-first-out data in the ring buffer'
        idx = (self.index + np.arange(self.data.size)) % self.data.size
        return self.data[idx]
You need to reshape the array based on the length of the frame.
Simple example:
>>> import numpy as np
>>> A = np.zeros(100)
>>> B = np.reshape(A, (10,10))
>>> B[0]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
So that's probably something like self.data = np.reshape(self.data, (lengthOfAFrame, 20))
EDIT:
Apparently reshaping is not your (only?) problem, you might check collections.deque for a python implementation of a circular buffer (source and example)
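For the NumPy route, a minimal sketch of a ring buffer that writes one whole frame (row) per call instead of element by element (the class and variable names are mine, not from the question):
import numpy as np

class FrameRingBuffer(object):
    """Keep the most recent `length` frames; each frame is a flat
    vector of point coordinates and overwrites the oldest slot."""
    def __init__(self, length, points_per_frame):
        self.data = np.zeros((length, points_per_frame))
        self.index = 0

    def extend(self, frame):
        # write the whole frame into one row at once
        self.data[self.index % self.data.shape[0]] = frame
        self.index += 1

    def get(self):
        # rows in first-in-first-out (oldest-first) order
        rows = (self.index + np.arange(self.data.shape[0])) % self.data.shape[0]
        return self.data[rows]

# usage: buf = FrameRingBuffer(lengthOfSlices, numberOfTrackedPoints)
#        buf.extend(frame.ravel())   # one flattened frame per call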
I have a large matrix A of shape (n, n, 3, 3), where n is about 5000. Now I want to find the inverse and the transpose of matrix A:
import numpy as np

A = np.random.rand(1000, 1000, 3, 3)
identity = np.identity(3, dtype=A.dtype)
Ainv = np.zeros_like(A)
Atrans = np.zeros_like(A)
for i in range(1000):
    for j in range(1000):
        Ainv[i, j] = np.linalg.solve(A[i, j], identity)
        Atrans[i, j] = np.transpose(A[i, j])
Is there a faster, more efficient way to do this?
This is taken from a project of mine, where I also do vectorized linear algebra on many 3x3 matrices.
Note that there is only a loop over 3, not a loop over n, so the code is vectorized in the important dimensions. I don't want to vouch for how this compares performance-wise to a C/numba extension doing the same thing; that is likely to be substantially faster still, but at least this blows the loops over n out of the water.
def adjoint(A):
    """compute inverse without division by det; ...x3x3 input, or array of matrices assumed"""
    AI = np.empty_like(A)
    for i in xrange(3):
        AI[..., i, :] = np.cross(A[..., i-2, :], A[..., i-1, :])
    return AI

def inverse_transpose(A):
    """
    efficiently compute the inverse-transpose for stack of 3x3 matrices
    """
    I = adjoint(A)
    det = dot(I, A).mean(axis=-1)
    return I / det[..., None, None]

def inverse(A):
    """inverse of a stack of 3x3 matrices"""
    return np.swapaxes(inverse_transpose(A), -1, -2)

def dot(A, B):
    """dot arrays of vecs; contract over last indices"""
    return np.einsum('...i,...i->...', A, B)

A = np.random.rand(2, 2, 3, 3)
I = inverse(A)
print np.einsum('...ij,...jk', A, I)
for the transpose:
testing a bit in ipython showed:
In [1]: import numpy
In [2]: x = numpy.ones((5,6,3,4))
In [3]: numpy.transpose(x,(0,1,3,2)).shape
Out[3]: (5, 6, 4, 3)
so you can just do
Atrans = numpy.transpose(A,(0,1,3,2))
to transpose the second and third dimensions (while leaving dimension 0 and 1 the same)
for the inversion:
the last example of http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.inv.html#numpy.linalg.inv
Inverses of several matrices can be computed at once:
from numpy.linalg import inv
a = np.array([[[1., 2.], [3., 4.]], [[1, 3], [3, 5]]])
>>> inv(a)
array([[[-2. , 1. ],
[ 1.5, -0.5]],
[[-5. , 2. ],
[ 3. , -1. ]]])
So i guess in your case, the inversion can be done with just
Ainv = inv(A)
and it will know that the last two dimensions are the ones it is supposed to invert over, and that the first dimensions are just how you stacked your data. This should be much faster
speed difference
for the transpose: your method needs 3.77557015419 sec, and mine needs 2.86102294922e-06 sec (which is a speedup of over 1 million times)
for the inversion: I guess my numpy version is not high enough to try that numpy.linalg.inv trick with the (n, n, 3, 3) shape and see the speedup there (my version is 1.6.2, and the docs I based my solution on are for 1.8, but it should work on 1.8; can someone else test that?)
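On a NumPy recent enough to support stacked inputs (1.8 or later) this is easy to sanity-check; a quick sketch using the array sizes from the question's example code:
import numpy as np

A = np.random.rand(1000, 1000, 3, 3)
Ainv = np.linalg.inv(A)              # inverts each trailing 3x3 block
Atrans = A.transpose(0, 1, 3, 2)     # swap only the last two axes

# every A[i, j] times Ainv[i, j] should be numerically the identity
check = np.einsum('...ij,...jk->...ik', A, Ainv)
print(np.allclose(check, np.identity(3)))   # expected: True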
NumPy has the array.T property, which is a shortcut for transpose.
For inversions, you use np.linalg.inv(A).
As posted by wim, A.I also works on numpy matrices, e.g.
print (A.I)
For a numpy matrix object, you can also use matrix.getI, e.g.
A=numpy.matrix('1 3;5 6')
print (A.getI())
I'm writing a moving average function that uses the convolve function in numpy, which should be equivalent to a (weighted) moving average. When my weights are all equal (as in a simple arithmetic average), it works fine:
data = numpy.arange(1,11)
numdays = 5
w = [1.0/numdays]*numdays
numpy.convolve(data,w,'valid')
gives
array([ 3., 4., 5., 6., 7., 8.])
However, when I try to use a weighted average
w = numpy.cumsum(numpy.ones(numdays,dtype=float),axis=0); w = w/numpy.sum(w)
instead of the 3.667, 4.667, 5.667, 6.667, ... I expect (for the same data), I get
array([ 2.33333333, 3.33333333, 4.33333333, 5.33333333, 6.33333333,
7.33333333])
If I remove the 'valid' flag, I don't even see the correct values. I would really like to use convolve for the WMA as well as the MA, since it makes the code cleaner (same code, different weights); otherwise I think I'll have to loop through all the data and take slices.
Any ideas about this behavior?
What you want is np.correlate; in a convolution the second argument is basically inverted (flipped), so your expected result would be obtained with np.convolve(data, w[::-1], 'valid').
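To make the two equivalent forms concrete, a small sketch with the data from the question:
import numpy as np

data = np.arange(1, 11)
numdays = 5
w = np.cumsum(np.ones(numdays, dtype=float))
w = w / w.sum()                          # weights 1..5, normalised to sum to 1

# correlate does not flip the kernel, so the weights line up as intended
wma_corr = np.correlate(data, w, 'valid')
# convolve flips the kernel, so flip the weights first to compensate
wma_conv = np.convolve(data, w[::-1], 'valid')

print(wma_corr)                          # ~[3.667 4.667 5.667 6.667 7.667 8.667]
print(np.allclose(wma_corr, wma_conv))   # True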