I have a sparse matrix and I need to build, for each index, a matrix of its neighbors.
Below is a representation of the data in the NxM matrix. For each element of the matrix I need to obtain the neighbors in a KxK window. With this information I would build an (N*M)x(K*K) matrix that contains, in each row, the indices of the K*K neighbors of the corresponding element.
I asked a similar question a while ago, but the difference is that now the data is structured, so I can do without a KDTree.
This new matrix is then used to compute the distances to the non-zero neighbors, associate a weight with each neighbor from those distances, and finally estimate the desired value as a weighted average of the neighbors.
Thanks in advance!
UPDATE
I have data like the one in the image (generated with the function generate_data below) and I need to perform the following operation.
Given a filter / kernel / NxN matrix, with N being the kernel size defined by me, calculate for the nonzero values the distances with respect to the central pixel. Take as an example the value 20 at position (1, 8) of the image. Using a 5x5 window, the nonzero values of interest are 40 (at (0, 6)), 37 (at (1, 6)) and 25 (at (3, 10)), with distances 2.23606798, 2 and 2.82842712 respectively (the Euclidean norms of the index differences).
What I need to get in this step is the matrix res:
[[0. 2.23606798 2. 0. 0. ]
[0. 0. 0. 0. 0. ]
[0. 0. 1. 0. 0. ]
[0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 2.82842712]]
I also need the 1. in the center of the matrix, so that the value where I am standing is taken into account as well (its distance to itself is 0.).
With these values, I get the mask with non-zero values and calculate the weights based on a Gaussian distribution:
import scipy.stats as st
mask = 0 < res
# multiply by the mask so positions outside it get weight 0
gauss = mask * st.norm.pdf(mask * kernel(5))
[[0. , 0.03274718, 0.05399097, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.39894228, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.00730688]]
total = gauss.sum() # 0.4929873057962355
Finally, with these weights, I compute the final value of the pixel by interpolating the neighbor values.
val[1, 8] = 0.03274718 * 40 / total + 0.05399097 * 37 / total + 0.39894228 * 20 / total + 0.00730688 * 25 / total
The same thing must be done for each pixel (I guess I have to add a padding of kernel_size // 2 to be able to use the whole array).
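As a sketch of that padding idea (assuming m holds the data matrix; zero-padding with np.pad is my assumption), every pixel then has a full KxK neighborhood:
import numpy as np
k = 5                                     # kernel size
pad = k // 2
padded = np.pad(m, pad, mode='constant')  # zero-pad all four borders
# the KxK neighborhood of pixel (i, j) of the original array:
# window = padded[i:i + k, j:j + k]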
Here is my script
import matplotlib.pylab as plt
import numpy as np
import scipy.stats as st
from scipy import sparse
def generate_data(m, n, density):
s = 64 * sparse.random(m, n, density=density).A
return s.astype(np.int8)
def plot_matrix(matrix):
    # annotate every cell with its value; iterate over matrix (not the global s)
    for (j, i), label in np.ndenumerate(matrix):
        plt.text(i, j, label, ha='center', va='center')
    plt.imshow(matrix)
    plt.show()
def kernel(n):
    # force an odd size so the kernel has a central pixel
    n = n if n % 2 != 0 else n + 1
    mid = n // 2
    m = np.ndarray((n, n, 2))
    for i in range(n):
        for j in range(n):
            m[i, j] = np.array([i, j])
    # Euclidean distance of every cell from the center
    return np.linalg.norm(m - [mid, mid], axis=2)
s = generate_data(10, 14, 0.25)
plot_matrix(s)
This turned out to be really simple, although maybe not very efficient. What I had to do was two convolutions:
In the first, convolve the Gaussian kernel with the masked data
conv_1 = convolve2d(m * mask_clean, k_gauss, mode='same')
In the second, convolve the Gaussian kernel with the mask
conv_2 = convolve2d(mask_clean, k_gauss, mode='same')
In each position, conv_1 holds the sum of the neighbor values weighted by the corresponding factor of the Gaussian kernel, and conv_2 holds the sum of the kernel weights over the nonzero neighbors. The only thing left to do is divide them to get the final result
from scipy.signal import convolve2d

# m holds the data; m_mean and m_std are its statistics
# (an assumption on my part: computed over the nonzero entries)
m_mean, m_std = m[m > 0].mean(), m[m > 0].std()
# keep nonzero values that lie within 3 standard deviations of the mean
mask_clean = (0 < m) & (m_mean - 3*m_std < m) & (m < m_mean + 3*m_std)
# Custom function to create a gaussian kernel (see sketch below)
k = gkern(kernlen=5, std=5//2)
k_gauss = st.norm.pdf(k)
conv_1 = convolve2d(m * mask_clean, k_gauss, mode='same')  # mode='same' keeps the output aligned with m
conv_2 = convolve2d(mask_clean, k_gauss, mode='same')
final = conv_1 / conv_2  # positions where conv_2 == 0 yield nan; mask them if needed
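gkern itself is not shown here. Since its output is passed through st.norm.pdf, one plausible sketch (an assumption on my part) is that it returns the per-offset distances scaled by std, reusing the kernel() helper from the question:
def gkern(kernlen=5, std=1.0):
    # hypothetical helper: distance of each cell from the kernel center,
    # scaled by std; st.norm.pdf(gkern(...)) then yields Gaussian weights
    return kernel(kernlen) / std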
I'm trying to create a random test for a "harris_corner_detector" function implementation (VERY GENERALLY AND SLIGHTLY INCORRECTLY: a function that finds corners in an image).
In the test, I want to create random simple shapes in a binary numpy matrix, whose corner coordinates are easy to know (e.g. rectangles, triangles, rhombuses (diamonds), etc.), and check whether the Harris implementation finds the correct corners.
I already implemented a function that randomly 'draws' an axis-parallel rectangle, but I can't find an efficient way to do so when it comes to shapes that are not parallel to the axes.
To create a random rectangle, I randomly choose a starting point and an ending point on both axes and I change the value of all of the cells within those bounds like so:
getting the random coords:
def _get_random_coords(self, start, end):
x_start, y_start = np.random.randint(start, end, 2)
x_end = np.random.randint(x_start + 7, end + 20)
y_end = np.random.randint(y_start + 7, end + 20)
return (x_start, x_end, y_start, y_end)
drawing the random rectangle (values are 255 for the background and 0 for the shape):
mat = np.ones((1024, 1024)) * 255
mat[x_start: x_end, y_start: y_end] = np.zeros((x_end - x_start, y_end - y_start))
but when it comes to drawing a diamond shape efficiently I'm at a loss. All I can think of is a loop that builds the diamond like so:
def _get_rhombus(self, size):
rhombus = []
for i in range(size):
rhombus.append(np.zeros(i+1))
for i in range(size - 1, 0, -1):
rhombus.append(np.zeros(i))
    return np.array(rhombus, dtype=object)  # rows have different lengths, so this is an object array
and then another loop to add it to the larger matrix.
But this method is highly inefficient when it comes to testing (as I'll draw hundreds of them, some of them might be huge).
Any better ideas out there? Alternatively - is there a better way to test this?
Thanks in advance.
There are a number of questions here, but the main one is how to create a numpy array of a filled rhombus given the corners. I'll answer that, and leave the other questions, like creating random rhombuses, aside.
To fill a convex polygon, one can find the line through each pair of subsequent corners, fill above or below that line, and then AND all the filled regions together.
import numpy as np
import matplotlib.pyplot as plt
# given two (non-vertical) points, A and B,
# fill above or below the line connecting them
def fill(A, B, fill_below=True, xs=10, ys=12):
# the equation for a line is y = m*x + b, so calculate
# m and b from the two points on the line
m = (B[1]-A[1])/(B[0]-A[0]) # m = (y2 - y1)/(x2 - x1) = slope
b = A[1] - m*A[0] # b = y1 - m*x1 = y intercept
# for each points of the grid, calculate whether it's above, below, or on
# the line. Since y = m*x + b, calculating m*x + b - y will give
# 0 when on the line, <0 when above, and >0 when below
Y, X = np.mgrid[0:ys, 0:xs]
L = m*X + b - Y
# select whether, >=0 is True, or, <=0 is True, to determine whether to
# fill above or below the line
op = np.greater_equal if fill_below else np.less_equal
return op(L, 0.0)
Here's a simple low-res rhombus
r = fill((0, 3), (3, 8), True) & \
fill((3, 8), (7, 4), True) & \
fill((7,4), (5,0), False) & \
fill((5,0), (0,3), False)
plt.imshow(r, cmap='Greys', interpolation='nearest', origin='lower')
That is, the above figure is the result of and-ing together the following fills:
fig, axs = plt.subplots(1, 4, figsize=(10, 3))
fill_params = [((0, 3), (3, 8), True), ((3, 8), (7, 4), True), ((7, 4), (5, 0), False), ((5, 0), (0, 3), False)]
for p, ax in zip(fill_params, axs):
    ax.imshow(fill(*p), cmap="Greys", interpolation='nearest', origin='lower')
Or, one could do high-res, and it can have multiple sides (although I think it must be convex).
r = fill((0, 300), (300, 800), True, 1000, 1200) & \
fill((300, 800), (600,700), True, 1000, 1200) & \
fill((600, 700), (700, 400), True, 1000, 1200) & \
fill((700,400), (500,0), False, 1000, 1200) & \
fill((500,0), (100,100), False, 1000, 1200) & \
fill((100, 100), (0,300), False, 1000, 1200)
plt.imshow(r, cmap='Greys', interpolation='nearest', origin='lower')
Obviously, there are a few things to improve, like not repeating the second point of one line as the first point of the next, but I wanted to keep this all clean and simple (and also, for fill to work the points just need to define a line and don't need to be corners, so in some cases this more general approach might be preferable). Also, one currently needs to specify whether to fill above or below each line; that can be calculated in various ways, but is probably easiest when generating the rhombus.
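For instance, here is a minimal sketch of deriving fill_below automatically, assuming the corners are listed in clockwise order (with origin='lower'): the interior then lies below edges that run left-to-right and above edges that run right-to-left.
def fill_polygon(corners, xs=10, ys=12):
    # corners: (x, y) vertices of a convex polygon, in clockwise order
    region = np.ones((ys, xs), dtype=bool)
    for A, B in zip(corners, corners[1:] + corners[:1]):
        region &= fill(A, B, fill_below=(B[0] > A[0]), xs=xs, ys=ys)
    return region

r = fill_polygon([(0, 3), (3, 8), (7, 4), (5, 0)])  # same rhombus as above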
Although a little less robust than the answer already posted, here's a neat trick to form a rhombus using the concept of upper and lower triangular matrices
import numpy as np
import matplotlib.pyplot as plt
blank = np.zeros((10, 12))
anchorx, anchory = 2, 3
# better result for odd dimensions, because mid index exists
# can handle h != w but the rhombus would still fit to a square of dimension min(h, w) x min(h, w)
h, w = 7, 7
assert anchorx+h <= blank.shape[0], "Boundaries exceed, maintain 'anchorx+h <= blank.shape[0]' "
assert anchory+w <= blank.shape[1], "Boundaries exceed, maintain 'anchory+w <= blank.shape[1]' "
tri_rtc = np.fromfunction(lambda i, j: i >= j, (h // 2 + 1, w // 2 + 1), dtype=int)
tri_ltc = np.flip(tri_rtc, axis=1)
top = np.hstack((tri_ltc, tri_rtc[:, 1:]))
rhombus = np.vstack((top, np.flip(top, axis=0)[1:, :]))
blank[anchorx:anchorx+h, anchory:anchory+w] = rhombus
print(blank)
plt.imshow(blank)
plt.show()
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0.]
[0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.]
[0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
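Alternatively, since a diamond is exactly the set of points within a given L1 (Manhattan) distance of a center, the whole shape can be generated in a single vectorized expression; a minimal sketch (the numbers here are arbitrary):
import numpy as np
yy, xx = np.ogrid[:10, :12]
cy, cx, r = 5, 6, 3
rhombus = (np.abs(yy - cy) + np.abs(xx - cx)) <= r  # boolean diamond mask
print(rhombus.astype(int))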
I have an ndarray of eigenvalues and their multiplicities (for instance, np.array([(2.2, 2), (3, 3), (5, 1)])). I need to compute the Jordan matrix for these eigenvalues without using Python loops and iterables (list comprehensions, for loops etc.), only NumPy functions.
I decided to build the matrix in these steps:
Create the blocks using np.vectorize and np.eye with np.fill_diagonal.
Combine the blocks into one matrix using hstack and vstack.
But I've got two problems:
Here's a snippet of my block-creating code:
def eye(t):
eye = np.eye(t[1].astype(int),k=1)
return eye
def jordan_matrix(X: np.ndarray) -> np.ndarray:
dim = np.sum(X[:,1].astype(int))
eyes = np.vectorize(eye, signature='(x)->(n,m)')(X)
return eyes
And I'm getting error ValueError: could not broadcast input array from shape (3,3) into shape (2,2)
I need to create extra zero matrices to fill the space not used by the created blocks, but their sizes vary and I can't figure out how to create them without Python's for and its equivalents.
Am I on the right track? How can I get around these problems?
np.vectorize basically loops under the hood. We could use NumPy funcs for actual vectorization at the Python level. Here's one such way -
def blockwise_jordan(a):
r = a[:,1].astype(int)
v = np.repeat(a[:,0],r)
out = np.diag(v)
n = out.shape[1]
    fillvals = np.ones(n - 1, dtype=out.dtype)  # the superdiagonal has n - 1 entries
fillvals[r[:-1].cumsum()-1] = 0
out.flat[1::out.shape[1]+1] = fillvals
return out
Sample run -
In [52]: X = np.array([(2.2, 2), (3, 3), (5, 1)])
In [53]: blockwise_jordan(X)
Out[53]:
array([[2.2, 1. , 0. , 0. , 0. , 0. ],
[0. , 2.2, 0. , 0. , 0. , 0. ],
[0. , 0. , 3. , 1. , 0. , 0. ],
[0. , 0. , 0. , 3. , 1. , 0. ],
[0. , 0. , 0. , 0. , 3. , 0. ],
[0. , 0. , 0. , 0. , 0. , 5. ]])
Optimization #1
We can replace the final three steps to perform the conditional assignment of 1s and 0s, like so -
out.flat[1::n+1] = 1
c = r[:-1].cumsum()-1
out[c,c+1] = 0
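Putting Optimization #1 back into the function (just splicing in the three lines above):
def blockwise_jordan_v2(a):
    r = a[:, 1].astype(int)
    v = np.repeat(a[:, 0], r)
    out = np.diag(v)
    n = out.shape[1]
    out.flat[1::n + 1] = 1   # fill the whole superdiagonal with ones
    c = r[:-1].cumsum() - 1
    out[c, c + 1] = 0        # zero the entries between Jordan blocks
    return out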
Here's my solution:
def jordan(a):
e = a[:,0] # eigenvalues
m = a[:,1].astype('int') # multiplicities
d = np.repeat(e, m) # main diagonal
ones = np.ones(d.size - 1)
ones[np.cumsum(m)[:-1] -1] = 0
j = np.diag(d) + np.diag(ones, k=1)
return j
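Applied to the sample input from above, it reproduces the same matrix:
import numpy as np

X = np.array([(2.2, 2), (3, 3), (5, 1)])
print(jordan(X))  # same 6x6 Jordan matrix as in the sample run above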
Edit: just realized that my solution is almost the same as Divakar's.
Given a 3D numpy array of shape (256, 256, 256), how would I make a solid sphere shape inside? The code below generates a series of increasing and decreasing circles but is diamond shaped when viewed in the two other dimensions.
def make_sphere(arr, x_pos, y_pos, z_pos, radius=10, size=256, plot=False):
val = 255
for r in range(radius):
        y, x = np.ogrid[-x_pos:size-x_pos, -y_pos:size-y_pos]  # note: size, not the undefined n
mask = x*x + y*y <= r*r
top_half = arr[z_pos+r]
top_half[mask] = val #+ np.random.randint(val)
arr[z_pos+r] = top_half
for r in range(radius, 0, -1):
y, x = np.ogrid[-x_pos:size-x_pos, -y_pos:size-y_pos]
mask = x*x + y*y <= r*r
bottom_half = arr[z_pos+r]
bottom_half[mask] = val#+ np.random.randint(val)
arr[z_pos+2*radius-r] = bottom_half
if plot:
for i in range(2*radius):
if arr[z_pos+i].max() != 0:
print(z_pos+i)
plt.imshow(arr[z_pos+i])
plt.show()
return arr
EDIT: pymrt.geometry has been removed in favor of raster_geometry.
DISCLAIMER: I am the author of both pymrt and raster_geometry.
If you just need to have the sphere, you can use the pip-installable module raster_geometry, and particularly raster_geometry.sphere(), e.g.:
import numpy as np
import raster_geometry as rg

arr = rg.sphere(3, 1)
print(arr.astype(np.int_))
# [[[0 0 0]
# [0 1 0]
# [0 0 0]]
# [[0 1 0]
# [1 1 1]
# [0 1 0]]
# [[0 0 0]
# [0 1 0]
# [0 0 0]]]
Internally, this is implemented as an n-dimensional superellipsoid generator; you can check its source code for details.
Briefly, the (simplified) code reads like this:
import numpy as np
def sphere(shape, radius, position):
"""Generate an n-dimensional spherical mask."""
# assume shape and position have the same length and contain ints
# the units are pixels / voxels (px for short)
    # radius is an int or float in px
assert len(position) == len(shape)
n = len(shape)
semisizes = (radius,) * len(shape)
    # generate the grid for the support points
# centered at the position indicated by position
grid = [slice(-x0, dim - x0) for x0, dim in zip(position, shape)]
position = np.ogrid[grid]
# calculate the distance of all points from `position` center
# scaled by the radius
arr = np.zeros(shape, dtype=float)
for x_i, semisize in zip(position, semisizes):
# this can be generalized for exponent != 2
# in which case `(x_i / semisize)`
# would become `np.abs(x_i / semisize)`
arr += (x_i / semisize) ** 2
# the inner part of the sphere will have distance below or equal to 1
return arr <= 1.0
and testing it:
# this will save a sphere in a boolean array
# the shape of the containing array is: (256, 256, 256)
# the position of the center is: (127, 127, 127)
# if you want 0s and 1s just use .astype(int)
# for plotting, that is likely what you want
arr = sphere((256, 256, 256), 10, (127, 127, 127))
# just for fun you can check that the volume is matching what expected
# (the two numbers do not match exactly because of the discretization error)
print(np.sum(arr))
# 4169
print(4 / 3 * np.pi * 10 ** 3)
# 4188.790204786391
I can't quite follow how your code works exactly, but to check that this is actually producing spheres (using your numbers) you could try:
arr = sphere((256, 256, 256), 10, (127, 127, 127))
# plot in 3D
import matplotlib.pyplot as plt
from skimage import measure
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
verts, faces, normals, values = measure.marching_cubes(arr, 0.5)
ax.plot_trisurf(
verts[:, 0], verts[:, 1], faces, verts[:, 2], cmap='Spectral',
antialiased=False, linewidth=0.0)
plt.show()
Other approaches
One could implement essentially the same with a combination of np.linalg.norm() and np.indices():
import numpy as np
def sphere_idx(shape, radius, position):
"""Generate an n-dimensional spherical mask."""
assert len(position) == len(shape)
n = len(shape)
position = np.array(position).reshape((-1,) + (1,) * n)
arr = np.linalg.norm(np.indices(shape) - position, axis=0)
return arr <= radius
producing the same results (sphere_ogrid is sphere from above):
import matplotlib.pyplot as plt
funcs = sphere_ogrid, sphere_idx
fig, axs = plt.subplots(1, len(funcs), squeeze=False, figsize=(4 * len(funcs), 4))
d = 500
n = 2
shape = (d,) * n
position = (d // 2,) * n
size = (d // 8)
base = sphere_ogrid(shape, size, position)
for i, func in enumerate(funcs):
arr = func(shape, size, position)
axs[0, i].imshow(arr)
However, this is going to be substantially slower and requires much more temporary memory (n_dim times the shape of the output).
The benchmarks below seem to support the speed assessment:
base = sphere_ogrid(shape, size, position)
for func in funcs:
    arr = func(shape, size, position)  # recompute so allclose checks this func's output
    print(f"{func.__name__:20s}", np.allclose(base, arr), end=" ")
%timeit -o func(shape, size, position)
# sphere_ogrid True 1000 loops, best of 5: 866 µs per loop
# sphere_idx True 100 loops, best of 5: 4.15 ms per loop
import numpy as np

size = 100
radius = 10
x0, y0, z0 = (50, 50, 50)
x, y, z = np.mgrid[0:size:1, 0:size:1, 0:size:1]
# Euclidean distance of every voxel from the center
r = np.sqrt((x - x0)**2 + (y - y0)**2 + (z - z0)**2)
r[r > radius] = 0  # zero outside the ball; inside, r keeps the distance
# (for a plain boolean mask, use mask = r <= radius before this zeroing step)
Nice question. My answer to a similar question would be applicable here as well.
You can try the following code. In the code below, AA is the matrix that you want.
import numpy as np
from copy import deepcopy
''' size : size of original 3D numpy matrix A.
    radius : radius of the sphere inside A which will be filled with ones.
'''
size, radius = 5, 2
''' A : numpy.ndarray of shape size*size*size. '''
A = np.zeros((size,size, size))
''' AA : copy of A (you don't want the original copy of A to be overwritten.) '''
AA = deepcopy(A)
''' (x0, y0, z0) : coordinates of the center of the sphere inside A. '''
x0, y0, z0 = int(np.floor(A.shape[0]/2)), \
int(np.floor(A.shape[1]/2)), int(np.floor(A.shape[2]/2))
for x in range(x0-radius, x0+radius+1):
for y in range(y0-radius, y0+radius+1):
for z in range(z0-radius, z0+radius+1):
            ''' deb: measures how far a coordinate in A is from the center,
                in the Manhattan / L1 metric.
                deb >= 0: inside the shape.
                deb < 0: outside the shape.
                Note: the L1 metric actually yields an octahedron (diamond);
                for a round Euclidean ball, test
                (x0-x)**2 + (y0-y)**2 + (z0-z)**2 <= radius**2 instead. '''
            deb = radius - abs(x0-x) - abs(y0-y) - abs(z0-z)
            if deb >= 0: AA[x,y,z] = 1
Following is an example of the output for size=5 and radius=2 (a ball of radius 2 pixels inside a numpy array of shape 5*5*5; note the diamond cross-sections that come from the L1 metric):
[[[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[1. 1. 1. 1. 1.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]]]
I haven't printed the output for the size and radius that you had asked for (size=32 and radius=4), as the output will be very long.
Here is how to create a voxel space without numpy. The main idea: calculate the distance between the center and each voxel, and if the voxel lies within the radius, create it.
from math import sqrt
def distance_dimension(xyz0 = [], xyz1 = []):
delta_OX = pow(xyz0[0] - xyz1[0], 2)
delta_OY = pow(xyz0[1] - xyz1[1], 2)
delta_OZ = pow(xyz0[2] - xyz1[2], 2)
return sqrt(delta_OX+delta_OY+delta_OZ)
def voxels_figure(figure = 'sphere', position = [0,0,0], size = 1):
xmin, xmax = position[0]-size, position[0]+size
ymin, ymax = position[1]-size, position[1]+size
zmin, zmax = position[2]-size, position[2]+size
voxels = []
if figure == 'cube':
for local_z, world_z in zip(range(zmax-zmin), range(zmin, zmax)):
for local_y, world_y in zip(range(ymax-ymin), range(ymin, ymax)):
for local_x, world_x in zip(range(xmax-xmin), range(xmin, xmax)):
voxels.append([world_x,world_y,world_z])
elif figure == 'sphere':
for local_z, world_z in zip(range(zmax-zmin), range(zmin, zmax)):
for local_y, world_y in zip(range(ymax-ymin), range(ymin, ymax)):
for local_x, world_x in zip(range(xmax-xmin), range(xmin, xmax)):
radius = distance_dimension(xyz0 = [world_x, world_y,world_z], xyz1 = position)
if radius < size:
voxels.append([world_x,world_y,world_z])
return voxels
voxels = voxels_figure(figure = 'sphere', position = [0,0,0], size = 3)
After you get the voxel indexes, you can set ones at those positions in a cube matrix.
Instead of using loops, I propose to use a meshgrid + sphere equation + np.where
import numpy as np
def generate_sphere(volumeSize):
x_ = np.linspace(0,volumeSize, volumeSize)
y_ = np.linspace(0,volumeSize, volumeSize)
z_ = np.linspace(0,volumeSize, volumeSize)
r = int(volumeSize/2) # radius can be changed by changing r value
center = int(volumeSize/2) # center can be changed here
u,v,w = np.meshgrid(x_, y_, z_, indexing='ij')
a = np.power(u-center, 2)+np.power(v-center, 2)+np.power(w-center, 2)
b = np.where(a<=r*r,1,0)
return b
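A quick usage sketch (the volume size here is arbitrary):
vol = generate_sphere(64)    # 64x64x64 volume, ball of radius 32 centered at 32
print(vol.shape, vol.sum())  # (64, 64, 64) and the number of voxels inside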
I want to calculate matrix determinants of minors in Python, maybe using scipy or some other package.
Any suggestions?
Numpy/SciPy will do all this.
Form sub-matrices by removing rows and columns.
Calculate determinants with linalg.det().
To create the minor matrix you could use the function
def minor(M, i, j):
    # remove row i and column j
    M = np.delete(M, i, 0)
    M = np.delete(M, j, 1)
    return M
and then take the determinant of the result
np.linalg.det(minor(M, i, j))
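A quick usage sketch with an arbitrary sample matrix:
import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])
print(np.linalg.det(minor(M, 0, 1)))  # det([[4., 6.], [7., 10.]]) = -2.0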
To create the principal minor determinants of a matrix and do a calculation with each determinant, you would want to do this:
import numpy as np
# procedure for creating principal minor determinants
def minor(M, size):
    # extract the top-left block; size can be 2x2, 3x3, 4x4 etc.
    theMinor = []
for i in range(size):
clearList = []
for j in range(size):
clearList.append(M[i][j])
theMinor.append(clearList)
return theMinor
# procedure to handle the principal minor
def handleMinorPrincipals(A, n):
# A is a square Matrix
# n is number or rows and cols for A
if n == 0:
return None
if n == 1:
return A[0][0]
# size 1x1 is calculated
# we now look for other minors
    for i in range(1, n):
        # get the minor of size (i+1) x (i+1)
        minDet = minor(A, i + 1)
        # check if its determinant is greater than 0
        if np.linalg.det(minDet) > 0:
            pass  # do smth
        else:
            pass  # smth else
    return
Example:
[[8. 8. 0. 0. 0.]
[6. 6. 3. 0. 0.]
[0. 4. 4. 4. 0.]
[0. 0. 2. 2. 2.]
[0. 0. 0. 2. 2.]]
size = 1 -> Minor is
[8]
size = 2 -> Minor is
[[8. 8.]
[6. 6.]]
size = 3 -> Minor is
[[8. 8. 0.]
[6. 6. 3.]
[0. 4. 4.]]
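Since the k-th leading principal minor is just the top-left k x k block, the same determinants can also be obtained with plain NumPy slicing; a minimal sketch using the example matrix above:
import numpy as np

A = np.array([[8., 8., 0., 0., 0.],
              [6., 6., 3., 0., 0.],
              [0., 4., 4., 4., 0.],
              [0., 0., 2., 2., 2.],
              [0., 0., 0., 2., 2.]])
for k in range(1, A.shape[0] + 1):
    # determinant of the k-th leading principal minor
    print(k, np.linalg.det(A[:k, :k]))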
I have a two dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5,4) array, I would get 5 results, or an array of dimension (5,7).
I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?
Thanks!
EDIT:
Based on the chosen answer plus the comment from mtrw, I have the following function:
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    """FFT based autocorrelation function, which is faster than numpy.correlate"""
    # x is supposed to be an array of sequences, of shape (totalelements, length)
    fftx = fft(x, n=(length*2-1), axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret

Note that length is a global variable in my code (the sequence length, i.e. x.shape[1]), so be sure to declare it. I also didn't restrict the result to real numbers, since I need to take complex numbers into account as well.
Using FFT-based autocorrelation:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
print(data)
##[[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(dataAC)
##[[ 14. 8. 6. 8.]
## [ 126. 120. 118. 120.]
## [ 366. 360. 358. 360.]
## [ 734. 728. 726. 728.]
## [ 1230. 1224. 1222. 1224.]]
I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.
EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print(dataPadded)
##[[ 0.  1.  2.  3.  0.  0.  0.]
## [ 4.  5.  6.  7.  0.  0.  0.]
## [ 8.  9. 10. 11.  0.  0.  0.]
## [12. 13. 14. 15.  0.  0.  0.]
## [16. 17. 18. 19.  0.  0.  0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(numpy.round(dataAC, 10))
##[[ 14. 8. 3. 0. 0. 3. 8.]
## [ 126. 92. 59. 28. 28. 59. 92.]
## [ 366. 272. 179. 88. 88. 179. 272.]
## [ 734. 548. 363. 180. 180. 363. 548.]
## [ 1230. 920. 611. 304. 304. 611. 920.]]
There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.
For really large arrays it becomes important to have n = 2 ** p, where p is an integer. This will save you huge amounts of time. For example:
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    # FFT length: the nearest power of two below 2*length - 1; rounding up
    # instead, with int(np.ceil(np.log2(x.shape[1] * 2 - 1))), would avoid
    # the truncation at the cost of extra zero-padding
    l = 2 ** int(np.log2(x.shape[1] * 2 - 1))
    fftx = fft(x, n=l, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret

This might give you wrap-around errors. For large arrays the autocorrelation should be insignificant near the edges, though.
Maybe it's just a preference, but I wanted to follow the definition. I personally find it a bit easier to follow that way. This is my implementation for an arbitrary nd array.
from itertools import product
from numpy import asarray, empty, roll
def autocorrelate(x):
"""
Compute the multidimensional autocorrelation of an nd array.
input: an nd array of floats
output: an nd array of autocorrelations
"""
# used for transposes
t = roll(range(x.ndim), 1)
# pairs of indexes
# the first is for the autocorrelation array
# the second is the shift
ii = [list(enumerate(range(1, s - 1))) for s in x.shape]
# initialize the resulting autocorrelation array
acor = empty(shape=[len(s0) for s0 in ii])
# iterate over all combinations of directional shifts
for i in product(*ii):
# extract the indexes for
# the autocorrelation array
# and original array respectively
i1, i2 = asarray(i).T
x1 = x.copy()
x2 = x.copy()
for i0 in i2:
# clip the unshifted array at the end
x1 = x1[:-i0]
# and the shifted array at the beginning
x2 = x2[i0:]
# prepare to do the same for
# the next axis
x1 = x1.transpose(t)
x2 = x2.transpose(t)
# normalize shifted and unshifted arrays
x1 -= x1.mean()
x1 /= x1.std()
x2 -= x2.mean()
x2 /= x2.std()
# compute the autocorrelation directly
# from the definition
acor[tuple(i1)] = (x1 * x2).mean()
return acor
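A quick usage sketch (random input, so the values are arbitrary); the result has one entry per combination of directional shifts, i.e. length s - 2 along each axis of the input:
import numpy as np

a = np.random.rand(8, 8)  # arbitrary 2D input
acor = autocorrelate(a)
print(acor.shape)         # (6, 6)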