I have a two dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5,4) array, I would get 5 results, or an array of dimension (5,7).
I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?
Thanks!
EDIT:
Based on the chosen answer plus the comment from mtrw, I have the following function:
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    """FFT-based autocorrelation function, which is faster than numpy.correlate"""
    # x is supposed to be an array of sequences, of shape (totalelements, length)
    length = x.shape[1]
    fftx = fft(x, n=(length * 2 - 1), axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret
Note that the sequence length is taken from the array's shape (in my original code it was a global variable). I also didn't restrict the result to real numbers, since I need to take complex numbers into account as well.
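For example, this reproduces the (5, 7) shape asked about in the question:
data = np.arange(5 * 4).reshape(5, 4).astype(float)
print(xcorr(data).shape)  # (5, 7): 2*4 - 1 lags per sequence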
Using FFT-based autocorrelation:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
print(data)
##[[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(dataAC)
##[[ 14. 8. 6. 8.]
## [ 126. 120. 118. 120.]
## [ 366. 360. 358. 360.]
## [ 734. 728. 726. 728.]
## [ 1230. 1224. 1222. 1224.]]
I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.
EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print(dataPadded)
##[[  0.   1.   2.   3.   0.   0.   0.]
## [  4.   5.   6.   7.   0.   0.   0.]
## [  8.   9.  10.  11.   0.   0.   0.]
## [ 12.  13.  14.  15.   0.   0.   0.]
## [ 16.  17.  18.  19.   0.   0.   0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(numpy.round(dataAC, 10))
##[[ 14. 8. 3. 0. 0. 3. 8.]
## [ 126. 92. 59. 28. 28. 59. 92.]
## [ 366. 272. 179. 88. 88. 179. 272.]
## [ 734. 548. 363. 180. 180. 363. 548.]
## [ 1230. 920. 611. 304. 304. 611. 920.]]
There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.
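For real-valued input you can indeed exploit that symmetry by computing only half of the spectrum with rfft/irfft; a minimal sketch, assuming the rows are real:
import numpy as np

def xcorr_real(x):
    # zero-pad to 2N-1 so the circular correlation equals the linear one,
    # and use the real FFT, which stores only half of the symmetric spectrum
    n = x.shape[1]
    fx = np.fft.rfft(x, n=2 * n - 1, axis=1)
    return np.fft.irfft(fx * np.conj(fx), n=2 * n - 1, axis=1)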
For really large arrays it becomes important to have n = 2 ** p, where p is an integer. This will save you huge amounts of time. For example:
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    l = 2 ** int(np.log2(x.shape[1] * 2 - 1))
    fftx = fft(x, n=l, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret
This might give you wrap-around errors. For large arrays the auto correlation should be insignificant near the edges, though.
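If you want the power-of-two speed without the wrap-around, one option (a sketch along the same lines) is to round up instead of down:
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr_pow2(x):
    # next power of two at or above 2*N - 1, so the padded length
    # is large enough that nothing wraps around
    l = 2 ** int(np.ceil(np.log2(x.shape[1] * 2 - 1)))
    fftx = fft(x, n=l, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    return fftshift(ret, axes=1)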
Maybe it's just a preference, but I wanted to follow from the definition; I personally find it a bit easier to follow that way. This is my implementation for an arbitrary nd array.
from itertools import product
from numpy import asarray, empty, roll

def autocorrelate(x):
    """
    Compute the multidimensional autocorrelation of an nd array.
    input: an nd array of floats
    output: an nd array of autocorrelations
    """
    # used for transposes
    t = roll(range(x.ndim), 1)
    # pairs of indexes
    # the first is for the autocorrelation array
    # the second is the shift
    ii = [list(enumerate(range(1, s - 1))) for s in x.shape]
    # initialize the resulting autocorrelation array
    acor = empty(shape=[len(s0) for s0 in ii])
    # iterate over all combinations of directional shifts
    for i in product(*ii):
        # extract the indexes for
        # the autocorrelation array
        # and original array respectively
        i1, i2 = asarray(i).T
        x1 = x.copy()
        x2 = x.copy()
        for i0 in i2:
            # clip the unshifted array at the end
            x1 = x1[:-i0]
            # and the shifted array at the beginning
            x2 = x2[i0:]
            # prepare to do the same for
            # the next axis
            x1 = x1.transpose(t)
            x2 = x2.transpose(t)
        # normalize shifted and unshifted arrays
        x1 -= x1.mean()
        x1 /= x1.std()
        x2 -= x2.mean()
        x2 /= x2.std()
        # compute the autocorrelation directly
        # from the definition
        acor[tuple(i1)] = (x1 * x2).mean()
    return acor
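For illustration, a quick shape check (array sizes are arbitrary):
import numpy as np

x = np.random.rand(10, 12)  # any nd float array works
acor = autocorrelate(x)
print(acor.shape)  # (8, 10): one entry per shift in 1..s-2 along each axis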
I have a sparse matrix and I need to create a new matrix containing the neighbors of each index.
Below is a representation of the data in the NxM matrix. For each element of the matrix I need to obtain the neighbors in a KxK section. With this information, I would generate an NMxKK matrix that contains, in each row, the indices of the KK neighbors of the corresponding element.
I asked a similar question a while ago but the difference is that now the data is structured, so I can do without KdTree.
This new matrix is used to calculate the distances to the non-zero neighbors, associate a weight with each neighbor based on these distances, and finally estimate the desired value as a weighted average of the neighbors.
Thanks in advance!
UPDATE
I have data like that shown in the image (generated with the function generate_data below) and I need to perform the following operation.
Given a filter/kernel (an NxN matrix, with N being the kernel size defined by me), calculate for the nonzero values the distances with respect to the central pixel. Take as an example the value 20 at position (1, 8) of the image. Taking a 5x5 window, the nonzero values of interest are 40 (at (0, 6)), 37 (at (1, 6)) and 25 (at (3, 10)), with distances 2.23606798, 2 and 2.82842712 respectively (obtained by taking the Euclidean norm of the index differences).
What I need to get in this step is the matrix res:
[[0. 2.23606798 2. 0. 0. ]
[0. 0. 0. 0. 0. ]
[0. 0. 1. 0. 0. ]
[0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 2.82842712]]
I also need the 1. in the center of the matrix, to take into account the value of the pixel I am standing on (whose distance to itself is 0.).
With these values, I get the mask with non-zero values and calculate the weights based on a Gaussian distribution:
import scipy.stats as st
mask = 0 < res
gauss = st.norm.pdf(res) # or st.norm.pdf(mask * kernel(5))
[[0. , 0.03274718, 0.05399097, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.39894228, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.00730688]])
total = gauss.sum() # 0.4929873057962355
Finally, with these weights, I calculate the final value of the pixel by interpolating the values.
val[1, 8] = 0.03274718 * 40 / total + 0.05399097 * 37 / total + 0.39894228 * 20 / total + 0.00730688 * 25 / total
I must do the same for each pixel (I guess I have to add a padding of kernel_size // 2 to be able to use the whole array).
Here is my script:
import matplotlib.pylab as plt
import numpy as np
import scipy.stats as st
from scipy import sparse

def generate_data(m, n, density):
    s = 64 * sparse.random(m, n, density=density).A
    return s.astype(np.int8)

def plot_matrix(matrix):
    for (j, i), label in np.ndenumerate(matrix):
        plt.text(i, j, label, ha='center', va='center')
    plt.imshow(matrix)
    plt.show()

def kernel(n):
    n = n if n % 2 != 0 else n + 1
    mid = n // 2
    m = np.ndarray((n, n, 2))
    for i in range(n):
        for j in range(n):
            m[i, j] = np.array([i, j])
    return np.linalg.norm(m - [mid, mid], axis=2)

s = generate_data(10, 14, 0.25)
plot_matrix(s)
This was really simple, although maybe not very efficient. What I had to do was two convolutions:
In the first, I convolve the Gaussian kernel with the matrix:
conv_1 = convolve2d(m * mask_clean, k_gauss)
In the second, the Gaussian kernel with the mask:
conv_2 = convolve2d(mask_clean, k_gauss)
In each position, conv_1 holds the sum of the values weighted by the corresponding factor of the Gaussian kernel, while conv_2 holds the sum of the Gaussian weights over the nonzero positions. The only thing left to do is divide them to get the final result:
import scipy.stats as st
from scipy.signal import convolve2d

# m holds the data
m_mean, m_std = m.mean(), m.std()  # used to discard outliers
mask_clean = (0 < m) & (m_mean - 3 * m_std < m) & (m < m_mean + 3 * m_std)
# gkern is a custom helper (not shown) used to build the kernel
k = gkern(kernlen=5, std=5 // 2)
k_gauss = st.norm.pdf(k)
conv_1 = convolve2d(m * mask_clean, k_gauss)
conv_2 = convolve2d(mask_clean, k_gauss)
final = conv_1 / conv_2
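gkern isn't defined in the answer; since st.norm.pdf() is applied to its output, it presumably returns distances to the kernel center, like the kernel() helper in the question. A minimal stand-in under that assumption (std is accepted only to match the call signature):
import numpy as np

def gkern(kernlen=5, std=None):
    # hypothetical stand-in: Euclidean distance of each cell to the kernel
    # center; st.norm.pdf() then turns these distances into weights
    ax = np.arange(kernlen) - kernlen // 2
    xx, yy = np.meshgrid(ax, ax, indexing='ij')
    return np.hypot(xx, yy)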
Given a 3D numpy array of shape (256, 256, 256), how would I make a solid sphere shape inside? The code below generates a series of increasing and decreasing circles but is diamond shaped when viewed in the two other dimensions.
import numpy as np
import matplotlib.pyplot as plt

def make_sphere(arr, x_pos, y_pos, z_pos, radius=10, size=256, plot=False):
    val = 255
    for r in range(radius):
        y, x = np.ogrid[-x_pos:size - x_pos, -y_pos:size - y_pos]
        mask = x * x + y * y <= r * r
        top_half = arr[z_pos + r]
        top_half[mask] = val  # + np.random.randint(val)
        arr[z_pos + r] = top_half
    for r in range(radius, 0, -1):
        y, x = np.ogrid[-x_pos:size - x_pos, -y_pos:size - y_pos]
        mask = x * x + y * y <= r * r
        bottom_half = arr[z_pos + r]
        bottom_half[mask] = val  # + np.random.randint(val)
        arr[z_pos + 2 * radius - r] = bottom_half
    if plot:
        for i in range(2 * radius):
            if arr[z_pos + i].max() != 0:
                print(z_pos + i)
                plt.imshow(arr[z_pos + i])
                plt.show()
    return arr
EDIT: pymrt.geometry has been removed in favor of raster_geometry.
DISCLAIMER: I am the author of both pymrt and raster_geometry.
If you just need the sphere, you can use the pip-installable module raster_geometry, and particularly raster_geometry.sphere(), e.g.:
import numpy as np
import raster_geometry as rg

arr = rg.sphere(3, 1)
print(arr.astype(np.int_))
# [[[0 0 0]
# [0 1 0]
# [0 0 0]]
# [[0 1 0]
# [1 1 1]
# [0 1 0]]
# [[0 0 0]
# [0 1 0]
# [0 0 0]]]
Internally, this is implemented as an n-dimensional superellipsoid generator; you can check its source code for details.
Briefly, the (simplified) code would read like this:
import numpy as np

def sphere(shape, radius, position):
    """Generate an n-dimensional spherical mask."""
    # assume shape and position have the same length and contain ints
    # the units are pixels / voxels (px for short)
    # radius is an int or float in px
    assert len(position) == len(shape)
    semisizes = (radius,) * len(shape)
    # generate the grid for the support points
    # centered at the position indicated by position
    grid = [slice(-x0, dim - x0) for x0, dim in zip(position, shape)]
    position = np.ogrid[grid]
    # calculate the distance of all points from `position` center
    # scaled by the radius
    arr = np.zeros(shape, dtype=float)
    for x_i, semisize in zip(position, semisizes):
        # this can be generalized for exponent != 2
        # in which case `(x_i / semisize)`
        # would become `np.abs(x_i / semisize)`
        arr += (x_i / semisize) ** 2
    # the inner part of the sphere will have distance below or equal to 1
    return arr <= 1.0
and testing it:
# this will save a sphere in a boolean array
# the shape of the containing array is: (256, 256, 256)
# the position of the center is: (127, 127, 127)
# if you want 0s and 1s, just use .astype(int)
# for plotting it is likely that you want that
arr = sphere((256, 256, 256), 10, (127, 127, 127))
# just for fun you can check that the volume matches what is expected
# (the two numbers do not match exactly because of the discretization error)
print(np.sum(arr))
# 4169
print(4 / 3 * np.pi * 10 ** 3)
# 4188.790204786391
I can't quite follow how your code works exactly, but to check that this is actually producing spheres (using your numbers) you could try:
arr = sphere((256, 256, 256), 10, (127, 127, 127))

# plot in 3D
import matplotlib.pyplot as plt
from skimage import measure

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
verts, faces, normals, values = measure.marching_cubes(arr, 0.5)
ax.plot_trisurf(
    verts[:, 0], verts[:, 1], faces, verts[:, 2], cmap='Spectral',
    antialiased=False, linewidth=0.0)
plt.show()
Other approaches
One could implement essentially the same with a combination of np.linalg.norm() and np.indices():
import numpy as np

def sphere_idx(shape, radius, position):
    """Generate an n-dimensional spherical mask."""
    assert len(position) == len(shape)
    n = len(shape)
    position = np.array(position).reshape((-1,) + (1,) * n)
    arr = np.linalg.norm(np.indices(shape) - position, axis=0)
    return arr <= radius
producing the same results (sphere_ogrid is sphere from above):
import matplotlib.pyplot as plt

funcs = sphere_ogrid, sphere_idx
fig, axs = plt.subplots(1, len(funcs), squeeze=False, figsize=(4 * len(funcs), 4))
d = 500
n = 2
shape = (d,) * n
position = (d // 2,) * n
size = d // 8
base = sphere_ogrid(shape, size, position)
for i, func in enumerate(funcs):
    arr = func(shape, size, position)
    axs[0, i].imshow(arr)
However, this is going to be substantially slower and requires much more temporary memory (n_dim times the size of the output).
The benchmarks below seem to support this assessment:
base = sphere_ogrid(shape, size, position)
for func in funcs:
    print(f"{func.__name__:20s}", np.allclose(base, func(shape, size, position)), end=" ")
    %timeit -o func(shape, size, position)
# sphere_ogrid         True 1000 loops, best of 5: 866 µs per loop
# sphere_idx           True 100 loops, best of 5: 4.15 ms per loop
import numpy as np

size = 100
radius = 10
x0, y0, z0 = (50, 50, 50)
x, y, z = np.mgrid[0:size:1, 0:size:1, 0:size:1]
r = np.sqrt((x - x0)**2 + (y - y0)**2 + (z - z0)**2)
r[r > radius] = 0
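Note that the center voxel also has r == 0 here, so after the clipping it is indistinguishable from the outside voxels; if a boolean mask is what you're after, a small variant (my addition):
# boolean occupancy mask; multiply by 255 (or use np.where) to fill values
mask = (x - x0)**2 + (y - y0)**2 + (z - z0)**2 <= radius**2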
Nice question. My answer to a similar question would be applicable here also.
You can try the following code. In the code below, AA is the matrix that you want.
import numpy as np
from copy import deepcopy

# size : size of the original 3D numpy matrix A.
# radius : radius of the sphere inside A which will be filled with ones.
size, radius = 5, 2

# A : numpy.ndarray of shape size*size*size.
A = np.zeros((size, size, size))

# AA : copy of A (you don't want the original copy of A to be overwritten.)
AA = deepcopy(A)

# (x0, y0, z0) : coordinates of the center of the sphere inside A.
x0, y0, z0 = int(np.floor(A.shape[0] / 2)), \
    int(np.floor(A.shape[1] / 2)), int(np.floor(A.shape[2] / 2))

for x in range(x0 - radius, x0 + radius + 1):
    for y in range(y0 - radius, y0 + radius + 1):
        for z in range(z0 - radius, z0 + radius + 1):
            # deb: measures how far a coordinate in A is from the center.
            # deb >= 0: inside the sphere.
            # deb < 0: outside the sphere.
            deb = radius - abs(x0 - x) - abs(y0 - y) - abs(z0 - z)
            if deb >= 0:
                AA[x, y, z] = 1
Following is an example of the output for size=5 and radius=2 (a sphere of radius 2 pixels inside a numpy array of shape 5*5*5):
[[[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[1. 1. 1. 1. 1.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]]]
I haven't printed the output for the size and radius that you had asked for (size=32 and radius=4), as the output will be very long.
Here is how to create a voxel space without numpy. The main idea is that you calculate the distance between the center and each voxel, and if the voxel is within the radius, you create it.
from math import sqrt

def distance_dimension(xyz0=[], xyz1=[]):
    delta_OX = pow(xyz0[0] - xyz1[0], 2)
    delta_OY = pow(xyz0[1] - xyz1[1], 2)
    delta_OZ = pow(xyz0[2] - xyz1[2], 2)
    return sqrt(delta_OX + delta_OY + delta_OZ)

def voxels_figure(figure='sphere', position=[0, 0, 0], size=1):
    xmin, xmax = position[0] - size, position[0] + size
    ymin, ymax = position[1] - size, position[1] + size
    zmin, zmax = position[2] - size, position[2] + size
    voxels = []
    if figure == 'cube':
        for world_z in range(zmin, zmax):
            for world_y in range(ymin, ymax):
                for world_x in range(xmin, xmax):
                    voxels.append([world_x, world_y, world_z])
    elif figure == 'sphere':
        for world_z in range(zmin, zmax):
            for world_y in range(ymin, ymax):
                for world_x in range(xmin, xmax):
                    radius = distance_dimension(xyz0=[world_x, world_y, world_z], xyz1=position)
                    if radius < size:
                        voxels.append([world_x, world_y, world_z])
    return voxels

voxels = voxels_figure(figure='sphere', position=[0, 0, 0], size=3)
Once you get the voxel indices, you can set ones at those positions in a cube matrix, as sketched below.
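For instance (this part is my sketch; the sizes are arbitrary):
import numpy as np

size = 3
cube = np.zeros((2 * size + 1,) * 3)
offset = size  # shift world coordinates (centered at [0, 0, 0]) into array indices
for wx, wy, wz in voxels:
    cube[wx + offset, wy + offset, wz + offset] = 1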
Instead of using loops, I propose to use a meshgrid + the sphere equation + np.where:
import numpy as np

def generate_sphere(volumeSize):
    x_ = np.linspace(0, volumeSize, volumeSize)
    y_ = np.linspace(0, volumeSize, volumeSize)
    z_ = np.linspace(0, volumeSize, volumeSize)
    r = int(volumeSize / 2)       # radius can be changed by changing r value
    center = int(volumeSize / 2)  # center can be changed here
    u, v, w = np.meshgrid(x_, y_, z_, indexing='ij')
    a = np.power(u - center, 2) + np.power(v - center, 2) + np.power(w - center, 2)
    b = np.where(a <= r * r, 1, 0)
    return b
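For example (the size is arbitrary):
vol = generate_sphere(32)
print(vol.shape)  # (32, 32, 32)
print(vol.sum())  # number of voxels inside the ball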
What is the best (elegant and efficient) way in Theano to convert a vector of indices to a matrix of zeros and ones, in which every row is the one-of-N representation of an index?
v = t.ivector() # the vector of indices
n = t.scalar() # the width of the matrix
convert = <your code here>
f = theano.function(inputs=[v, n], outputs=convert)
Example:
n_val = 4
v_val = [1,0,3]
f(v_val, n_val) = [[0,1,0,0],[1,0,0,0],[0,0,0,1]]
I didn't compare the different options, but you can also do it like this. It doesn't require extra memory.
import numpy as np
import theano

n_val = 4
v_val = np.asarray([1, 0, 3])
idx = theano.tensor.lvector()
z = theano.tensor.zeros((idx.shape[0], n_val))
one_hot = theano.tensor.set_subtensor(z[theano.tensor.arange(idx.shape[0]), idx], 1)
f = theano.function([idx], one_hot)
print(f(v_val))
# [[ 0.  1.  0.  0.]
#  [ 1.  0.  0.  0.]
#  [ 0.  0.  0.  1.]]
It's as simple as:
convert = t.eye(n,n)[v]
There still might be a more efficient solution that doesn't require building the whole identity matrix. This might be problematic for large n and short v's.
There's now a built-in function for this: theano.tensor.extra_ops.to_one_hot.
y = tensor.as_tensor([3, 2, 1])
fn = theano.function([], tensor.extra_ops.to_one_hot(y, 4))
print(fn())
# [[ 0.  0.  0.  1.]
#  [ 0.  0.  1.  0.]
#  [ 0.  1.  0.  0.]]
In general we could have matrices of arbitrary sizes, but for my application it is necessary to have a square matrix. Also, the dummy entries should have a specified value. I am wondering if there is anything built into numpy for this, or what the easiest way of doing it would be.
EDIT:
The matrix X is already there and it is not square. We want to pad it with the given dummy value to make it square. All the original values will stay the same.
Thanks a lot
Building upon the answer by LucasB, here is a function which will pad an arbitrary matrix M with a given value val so that it becomes square:
import numpy as np

def squarify(M, val):
    a, b = M.shape
    if a > b:
        padding = ((0, 0), (0, a - b))
    else:
        padding = ((0, b - a), (0, 0))
    return np.pad(M, padding, mode='constant', constant_values=val)
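For example:
M = np.arange(6).reshape(2, 3)
print(squarify(M, -1))
# [[ 0  1  2]
#  [ 3  4  5]
#  [-1 -1 -1]]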
Since Numpy 1.7, there's the numpy.pad function. Here's an example:
>>> x = np.random.rand(2,3)
>>> np.pad(x, ((0,1), (0,0)), mode='constant', constant_values=42)
array([[ 0.20687158, 0.21241617, 0.91913572],
[ 0.35815412, 0.08503839, 0.51852029],
[ 42. , 42. , 42. ]])
For a 2D numpy array m it’s straightforward to do this by creating a max(m.shape) x max(m.shape) array of ones p and multiplying this by the desired padding value, before setting the slice of p corresponding to m (i.e. p[0:m.shape[0], 0:m.shape[1]]) to be equal to m.
This leads to the following function, where the first line deals with the possibility that the input has only one dimension (i.e. is an array rather than a matrix):
import numpy as np

def pad_to_square(a, pad_value=0):
    m = a.reshape((a.shape[0], -1))
    padded = pad_value * np.ones(2 * [max(m.shape)], dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded
So, for example:
>>> r1 = np.random.rand(3, 5)
>>> r1
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468]])
>>> pad_to_square(r1, 3)
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468],
[ 3. , 3. , 3. , 3. , 3. ],
[ 3. , 3. , 3. , 3. , 3. ]])
or
>>> r2=np.random.rand(4)
>>> r2
array([ 0.10307689, 0.83912888, 0.13105124, 0.09897586])
>>> pad_to_square(r2, 0)
array([[ 0.10307689, 0. , 0. , 0. ],
[ 0.83912888, 0. , 0. , 0. ],
[ 0.13105124, 0. , 0. , 0. ],
[ 0.09897586, 0. , 0. , 0. ]])
etc.
I want to calculate matrix determinants of minors in Python, maybe using scipy or some other package.
Any suggestions?
Numpy/SciPy will do all this.
Form sub-matrices by removing rows and columns.
Calculate determinants with linalg.det().
To create the minor matrix you could use the function
import numpy as np

def minor(M, i, j):
    M = np.delete(M, i, 0)
    M = np.delete(M, j, 1)
    return M
and then take its determinant with
np.linalg.det(minor(M, i, j))
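For instance (values picked arbitrarily):
M = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])
print(np.linalg.det(minor(M, 0, 0)))  # det([[5, 6], [8, 10]]) = 2.0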
To create the principal minors of a matrix and do the calculation for each of their determinants, you could do this:
import numpy as np

# procedure for creating principal minor matrices
def minor(M, size):
    # size can be 2x2, 3x3, 4x4 etc.
    theMinor = []
    for i in range(size):
        clearList = []
        for j in range(size):
            clearList.append(M[i][j])
        theMinor.append(clearList)
    return theMinor

# procedure to handle the principal minors
def handleMinorPrincipals(A, n):
    # A is a square matrix
    # n is the number of rows and cols of A
    if n == 0:
        return None
    if n == 1:
        return A[0][0]
    # size 1x1 is calculated
    # we now look at the other minors
    for i in range(1, n):
        # get the minor
        minDet = minor(A, i + 1)
        # check if its determinant is greater than 0
        if np.linalg.det(minDet) > 0:
            pass  # do smth
        else:
            pass  # smth else
    return
Example:
[[8. 8. 0. 0. 0.]
[6. 6. 3. 0. 0.]
[0. 4. 4. 4. 0.]
[0. 0. 2. 2. 2.]
[0. 0. 0. 2. 2.]]
size = 1 -> Minor is
[8]
size = 2 -> Minor is
[[8. 8.]
[6. 6.]]
size = 3 -> Minor is
[[8. 8. 0.]
[6. 6. 3.]
[0. 4. 4.]]
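As a cross-check (my addition, not part of the answer above), the leading principal minors can also be taken directly with numpy slicing:
import numpy as np

A = np.array([[8., 8., 0., 0., 0.],
              [6., 6., 3., 0., 0.],
              [0., 4., 4., 4., 0.],
              [0., 0., 2., 2., 2.],
              [0., 0., 0., 2., 2.]])
for k in range(1, A.shape[0] + 1):
    # A[:k, :k] is the k x k leading principal minor
    print(k, np.linalg.det(A[:k, :k]))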