Find location where smaller array matches larger array the most - python

I need to find where a smaller 2D array, array1, most closely matches a region inside a larger 2D array, array2.
array1 will have a size (grid_size) between 46x46 and 96x96.
array2 will be larger (184x184).
I only have access to numpy.
I am currently trying to use the Tversky formula but am not tied to it.
Efficiency is the most important part as this will run many times. My current solution shown below is very slow.
r = np.zeros((array2.shape[0] - grid_size, array2.shape[1] - grid_size))
for i in range(array2.shape[0] - grid_size):
    for j in range(array2.shape[1] - grid_size):
        window = array2[i:i+grid_size, j:j+grid_size]
        # ratio of matching cells to non-matching plus matching cells
        # (Si in the original appears to be a typo for array2)
        r[i, j] = np.sum(window == array1) / (np.sum(window != array1) + np.sum(window == array1))
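For reference, the same match counts as in the loop above can be computed without the Python loops using NumPy's sliding-window view (a sketch assuming NumPy >= 1.20; match_ratio is an illustrative name, and the boolean comparison materializes an array of shape (H-gs+1, W-gs+1, gs, gs), so it trades memory for speed):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def match_ratio(array1, array2):
    gs = array1.shape[0]
    windows = sliding_window_view(array2, (gs, gs))   # shape (H-gs+1, W-gs+1, gs, gs)
    matches = (windows == array1).sum(axis=(2, 3))    # matching cells per offset
    return matches / (gs * gs)                        # matches + mismatches == gs*gs

r = match_ratio(array1, array2)
best_offset = np.unravel_index(r.argmax(), r.shape)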
Edit:
The goal is to find the location where a smaller image matches another image.

Here is an FFT/convolution based approach that minimizes Euclidean distance:
import numpy as np
from numpy import fft

N = 184    # size of the large array
n = 46     # size of the small array
pad = 192  # zero-padded FFT size (>= N, with small prime factors)

def best_offs(A, a):
    A, a = A.astype(float), a.astype(float)
    Ap, ap = (np.zeros((pad, pad)) for _ in "Aa")
    Ap[:N, :N] = A
    ap[:n, :n] = a
    # cross-correlation <A_window, a> for every offset, computed via FFT
    sim = fft.irfft2(fft.rfft2(ap).conj()*fft.rfft2(Ap))[:N-n+1, :N-n+1]
    # sliding-window sum of A*A, i.e. ||A_window||^2 for every offset
    Ap[:N, :N] = A*A
    ap[:n, :n] = 1
    ref = fft.irfft2(fft.rfft2(ap).conj()*fft.rfft2(Ap))[:N-n+1, :N-n+1]
    # ||A_window - a||^2 = ||A_window||^2 - 2<A_window, a> + ||a||^2, and the last
    # term is the same for every offset, so it is enough to minimize ref - 2*sim
    return np.unravel_index((ref-2*sim).argmin(), sim.shape)
# example
# random picture
A = np.random.randint(0,256,(N,N),dtype=np.uint8)
# random offset
offy,offx = np.random.randint(0,N-n+1,2)
# sub pic at random offset
# randomly flip half of the least significant 75% of all bits
a = A[offy:offy+n,offx:offx+n] ^ np.random.randint(0,64,(n,n))
# reconstruct offset
oyrec,oxrec = best_offs(A,a)
assert offy==oyrec and offx==oxrec
# speed?
from timeit import timeit
print(timeit(lambda:best_offs(A,a),number=100)*10,"ms")
# example with zero a
a[...] = 0
# make A smaller in a matching subsquare
A[offy:offy+n,offx:offx+n]>>=1
# reconstruct offset
oyrec,oxrec = best_offs(A,a)
assert offy==oyrec and offx==oxrec
Sample run:
3.458537160186097 ms
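To convince yourself that minimizing ref - 2*sim really is the same as minimizing the Euclidean distance, here is a slow brute-force cross-check (a sketch reusing best_offs, A and a from the listing above):
def best_offs_bruteforce(A, a):
    # exhaustively scan every offset, minimizing ||A_window - a||^2 directly
    h, w = a.shape
    d = np.empty((A.shape[0] - h + 1, A.shape[1] - w + 1))
    for i in range(d.shape[0]):
        for j in range(d.shape[1]):
            d[i, j] = np.sum((A[i:i+h, j:j+w].astype(float) - a)**2)
    return np.unravel_index(d.argmin(), d.shape)

assert best_offs_bruteforce(A, a) == best_offs(A, a)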

Related

How can I calculate the column ranks of a large numpy matrix?

Suppose I have the following representation of my data:
import numpy as np
import itertools
N = 100 # number of simulations
vars = 5 # number of variables to simulate
indices = list(range(vars))
sims = np.random.rand(vars,N) # simulation array of size (5,100)
combinations = list(itertools.combinations(indices,3)) # list of combinations of variables of length 3
# create a matrix which sums the variable simulations for each combination
combo_sims = np.empty((len(combinations),sims.shape[1]))
for i in range(len(combinations)):
    combo_sims[i] = np.sum(sims[list(combinations[i])],axis=0)
1. How can I get the ordered ranks for each row by column? The expected result would be a matrix of the same shape as combo_sims, but with ranks instead of the sum of sims. For example,
[[1,0,2],[0,2,1],[2,1,0]]
should return:
[[2,1,3],[1,3,2],[3,2,1]]
2. My actual data includes ~1,500,000 combinations and 10,000 simulations. What is the fastest way to get the results from question 1, keeping in mind memory utilization?
3. Is there a more efficient way to build the combo_sims matrix?
You can use scipy.stats.rankdata or np.argsort.
If you do not need special handling of ties, use np.argsort, since it is faster than scipy.stats.rankdata.
1.5M x 10K float64 numbers is about 120 GB of memory; if you do not have enough RAM, you can process the data in chunks and save the results to disk.
Since your combinations have a fixed length of 3, you can a) precompute the pairwise partial sums, and b) if possible, use float32 instead of float64: it is faster and takes half the memory.
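For example, a double argsort along each row gives the (1-based) ranks without any special tie handling; a minimal sketch reproducing the example from the question:
import numpy as np

x = np.array([[1, 0, 2],
              [0, 2, 1],
              [2, 1, 0]])
ranks = x.argsort(axis=1).argsort(axis=1) + 1
print(ranks)
# [[2 1 3]
#  [1 3 2]
#  [3 2 1]]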
I've modified your code and it became about 5x faster:
Number of combinations: 447580
Number of simulations: 10000
original: 10.793719053268433 seconds
with optimizations: 2.0908007621765137 seconds
Example:
import itertools
import time
import numpy as np
N = 10_000 # number of simulations
n_vars = 140 # number of variables to simulate
indices = list(range(n_vars))
sims = np.random.rand(n_vars, N)  # simulation array of size (140, 10000)
combinations = list(
    itertools.combinations(indices, 3)
)  # list of combinations of variables of length 3
print(f"Number of combinations: {len(combinations)}")
print(f"Number of simulations: {N}")
################################# Original version
start_time = time.time()
# create a matrix which sums the variable simulations for each combination
combo_sims = np.empty((len(combinations), sims.shape[1]))
for i in range(len(combinations)):
    combo_sims[i] = np.sum(sims[list(combinations[i])], axis=0)
print(f"original: {time.time() - start_time} seconds")
################################# Modified version
start_time = time.time()
sims = sims.astype(np.float32)
combinations = np.array(combinations, int)
combo_sims2 = np.empty((len(combinations), sims.shape[1]), np.float32)
pairs = np.array(list(itertools.combinations(range(n_vars), 2)), int)
n = 0
buf = np.zeros(sims.shape[1], np.float32)
for i, j in pairs:
    np.add(sims[i], sims[j], buf)
    for k in range(j + 1, n_vars):
        np.add(buf, sims[k], combo_sims2[n])
        n += 1
print(f"with optimizations: {time.time() - start_time} seconds")
assert np.allclose(combo_sims, combo_sims2)
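To rank the full 1.5M x 10K result without running out of RAM (the 120 GB case above), the ranking itself can also be done chunk by chunk into a memory-mapped file; a rough sketch reusing combo_sims2 from above (the file name ranks.npy and the chunk size are illustrative, tune them to your system):
rows, cols = combo_sims2.shape
ranks = np.lib.format.open_memmap("ranks.npy", mode="w+", dtype=np.int32, shape=(rows, cols))
chunk = 10_000
for start in range(0, rows, chunk):
    block = combo_sims2[start:start + chunk]
    # double argsort -> 1-based ranks per row (no special tie handling)
    ranks[start:start + chunk] = block.argsort(axis=1).argsort(axis=1) + 1
ranks.flush()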

Efficient sum of Gaussians in 3D with NumPy using large arrays

I have an M x 3 array of 3D coordinates, coords (M ~1000-10000), and I would like to compute the sum of Gaussians centered at these coordinates over a mesh grid 3D array. The mesh grid 3D array is typically something like 64 x 64 x 64, but sometimes upwards of 256 x 256 x 256, and can go even larger. I’ve followed this question to get started, by converting my meshgrid array into an array of N x 3 coordinates, xyz, where N is 64^3 or 256^3, etc. However, for large array sizes it takes too much memory to vectorize the entire calculation (understandable since it could approach 1e11 elements and consume a terabyte of RAM) so I’ve broken it up into a loop over M coordinates. However, this is too slow.
I’m wondering if there is any way to speed this up at all without overloading memory. By converting the meshgrid to xyz, I feel like I’ve lost any advantage of the grid being equally spaced, and that somehow, maybe with scipy.ndimage, I should be able to take advantage of the even spacing to speed things up.
Here’s my initial start:
import numpy as np
from scipy import spatial
#create meshgrid
side = 100.
n = 64 #could be 256 or larger
x_ = np.linspace(-side/2,side/2,n)
x,y,z = np.meshgrid(x_,x_,x_,indexing='ij')
#convert meshgrid to list of coordinates
xyz = np.column_stack((x.ravel(),y.ravel(),z.ravel()))
#create some coordinates
coords = np.random.random(size=(1000,3))*side - side/2
def sumofgauss(coords, xyz, sigma):
    """Simple isotropic gaussian sum at coordinate locations."""
    n = int(round(xyz.shape[0]**(1/3.)))  # get n samples for reshaping to 3D later
    #this version overloads memory
    #dist = spatial.distance.cdist(coords, xyz)
    #dist *= dist
    #values = 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dist/(2*sigma**2))
    #values = np.sum(values,axis=0)
    #run cdist in a loop over coords to avoid overloading memory
    values = np.zeros((xyz.shape[0]))
    for i in range(coords.shape[0]):
        dist = spatial.distance.cdist(coords[None,i], xyz)
        dist *= dist
        values += 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dist[0]/(2*sigma**2))
    return values.reshape(n,n,n)

image = sumofgauss(coords, xyz, 1.0)

import matplotlib.pyplot as plt
plt.imshow(image[n//2])  # show a slice (integer index needed in Python 3)
plt.show()
Timings: M = 1000, N = 64 takes ~5 seconds; M = 1000, N = 256 takes ~10 minutes.
Considering that many of your distance calculations will give zero weight after the exponential, you can probably drop a lot of them. Doing big chunks of distance calculations while discarding distances greater than a threshold is usually faster with a KDTree:
import numpy as np
from scipy.spatial import cKDTree  # so we can get a `coo_matrix` output

def gaussgrid(coords, sigma = 1, n = 64, side = 100, eps = None):
    x_ = np.linspace(-side/2, side/2, n)
    x, y, z = np.meshgrid(x_, x_, x_, indexing='ij')
    xyz = np.column_stack((x.ravel(), y.ravel(), z.ravel()))
    if eps is None:
        eps = np.finfo('float64').eps
    # largest distance whose Gaussian weight is still above eps
    thr = np.sqrt(-np.log(eps) * 2 * sigma**2)
    data_tree = cKDTree(coords)
    discr = 1000  # you can tweak this to get best results on your system
    values = np.empty(n**3)
    for i in range(n**3//discr + 1):
        slc = slice(i * discr, i * discr + discr)
        grid_tree = cKDTree(xyz[slc])
        dists = grid_tree.sparse_distance_matrix(data_tree, thr, output_type = 'coo_matrix')
        # sparse_distance_matrix returns plain Euclidean distances, so square them in the exponent
        dists.data = 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dists.data**2/(2*sigma**2))
        values[slc] = dists.sum(1).squeeze()
    return values.reshape(n,n,n)
Now, even if you keep eps = None it'll be a bit faster, as you're still only keeping about 10% of your distances, but with eps = 1e-6 or so you should get a big speedup. On my system:
%timeit out = sumofgauss(coords, xyz, 1.0)
1 loop, best of 3: 23.7 s per loop
%timeit out = gaussgrid(coords)
1 loop, best of 3: 2.12 s per loop
%timeit out = gaussgrid(coords, eps = 1e-6)
1 loop, best of 3: 382 ms per loop
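Since the question also wonders about exploiting the even grid spacing with scipy.ndimage: another option is to bin the coordinates onto the grid with np.histogramdd and then smooth with a Gaussian filter. This is only an approximation (each point is snapped to its nearest voxel, and the filter kernel sums to 1 rather than carrying the 1/sqrt(2*pi*sigma**2) factor from the question), but its cost is essentially independent of M; a sketch with an illustrative function name:
import numpy as np
from scipy import ndimage

def gausshist(coords, sigma=1.0, n=64, side=100.0):
    # approximate sum of Gaussians: deposit points on the grid, then smooth
    edges = np.linspace(-side/2, side/2, n + 1)
    hist, _ = np.histogramdd(coords, bins=(edges, edges, edges))
    spacing = side / (n - 1)  # grid step of the linspace in the question
    # sigma is given in physical units; gaussian_filter expects it in voxels
    return ndimage.gaussian_filter(hist, sigma=sigma / spacing, mode='constant')

image_approx = gausshist(coords, sigma=1.0)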

Converting `for` loop that can't be vectorized to sparse matrix

There are 2 boxes and a small gap that allows 1 particle per second to pass from one box to the other. Whether a particle goes from A to B or from B to A depends on the ratio Pa/Ptot (Pa: number of particles in box A, Ptot: total particles in both boxes).
To make it faster, I need to get rid of the for loops; however, I can't find a way to either vectorize them or turn them into a sparse matrix that represents my for loop:
What about for loops you can't vectorize? The ones where the result at iteration n depends on what you calculated in iteration n-1, n-2, etc. You can define a sparse matrix that represents your for loop and then do a sparse matrix solve.
But I can't figure out how to define a sparse matrix out of this. The simulation boils down to calculating
Pa(t) = Pa(t-1) + dPa(t),
where
dPa(t) = 2 * (rand(t) > Pa(t-1)/Ptot) - 1
is the piece that gives me trouble when trying to express my problem as described here. (Note: the contents of the parentheses are a bool operation.)
Questions:
1. Can I vectorize the for loop?
2. If not, how can I define a sparse matrix?
3. (bonus question) Why is the execution 27x faster in Python (0.027 s) than in Octave (0.75 s)?
Note: I implemented the simulation in both Python and Octave and will soon do it in Matlab, therefore the tags are correct.
Octave code
1; % starting with `function` causes errors
function arr = Px_simulation (Pa_init, Ptot, t_arr)
  t_size = size(t_arr);
  arr = zeros(t_size); % fixed size array is better than arr = []
  rand_arr = rand(t_size); % create all rand values at once
  _Pa = Pa_init;
  for _j=t_arr()
    if (rand_arr(_j) * Ptot > _Pa)
      _Pa += 1;
    else
      _Pa -= 1;
    endif
    arr(_j) = _Pa;
  endfor
endfunction
t = 1:10^5;
for _i=1:3
  Ptot = 100*10^_i;
  tic()
  Pa_simulation = Px_simulation(Ptot, Ptot, t);
  toc()
  subplot(2,2,_i);
  plot(t, Pa_simulation, "-2;simulation;")
  title(strcat("{P}_{a0}=", num2str(Ptot), ',P=', num2str(Ptot)))
endfor
Python
import numpy
import matplotlib.pyplot as plt
import timeit
import cpuinfo
from random import random
print('\nCPU: {}'.format(cpuinfo.get_cpu_info()['brand']))
PARTICLES_COUNT_LST = [1000, 10000, 100000]
DURATION = 10**5
t_vals = numpy.linspace(0, DURATION, DURATION)
def simulation(na_initial, ntotal, tvals):
    shape = numpy.shape(tvals)
    arr = numpy.zeros(shape)
    na_current = na_initial
    for i in range(len(tvals)):
        if random() > (na_current/ntotal):
            na_current += 1
        else:
            na_current -= 1
        arr[i] = na_current
    return arr
plot_lst = []
for i in PARTICLES_COUNT_LST:
    start_t = timeit.default_timer()
    n_a_simulation = simulation(na_initial=i, ntotal=i, tvals=t_vals)
    execution_time = (timeit.default_timer() - start_t)
    print('Execution time: {:.6}'.format(execution_time))
    plot_lst.append(n_a_simulation)
for i in range(len(PARTICLES_COUNT_LST)):
    plt.subplot(2, 2, i + 1)  # 1-based subplot index (the original '22{}'.format(i) starts at the invalid '220')
    plt.plot(t_vals, plot_lst[i], 'r')
    plt.grid(linestyle='dotted')
    plt.xlabel("time [s]")
    plt.ylabel("Particles in box A")
plt.show()
IIUC you can use cumsum() in both Octave and Numpy:
Octave:
>> p = rand(1, 5);
>> r = rand(1, 5);
>> p
p =
0.43804 0.37906 0.18445 0.88555 0.58913
>> r
r =
0.70735 0.41619 0.37457 0.72841 0.27605
>> cumsum (2*(p<(r+0.03)) - 1)
ans =
1 2 3 2 1
>> (2*(p<(r+0.03)) - 1)
ans =
1 1 1 -1 -1
Also note that the expression 2*(p<(r+0.03)) - 1 only ever takes the values -1 and 1, which is exactly what cumsum() then accumulates into the running particle count.
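A NumPy rendering of the same cumsum() idea could look like the sketch below. Note it keeps the fixed random comparison from the Octave snippet above; it does not reproduce the state-dependent threshold Pa/Ptot from the question, which is exactly the part that resists this kind of vectorization:
import numpy as np

DURATION = 10**5
na_initial = 1000
# decide all +1/-1 steps up front from precomputed random draws, then accumulate
steps = 2 * (np.random.rand(DURATION) < np.random.rand(DURATION) + 0.03) - 1
arr = na_initial + np.cumsum(steps)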

Python: Find mean of points within radius of an element in 2D array

I am looking for an efficient way to find the mean of values within a certain radius of an element in a 2D NumPy array, excluding the center point and values < 0.
My current method is to create a disc-shaped mask (using the method here) and find the mean of points within this mask. This is taking too long, however: over 10 minutes to calculate ~18000 points within my 300x300 array.
The array I want to find means within is here, titled "arr".
import numpy as np

def radMask(index, radius, array, insert):
    a, b = index
    nx, ny = array.shape
    y, x = np.ogrid[-a:nx-a, -b:ny-b]
    mask = x*x + y*y <= radius*radius
    array[mask] = insert
    return array

arr_mask = np.zeros_like(arr).astype(int)
arr_mask = radMask(center, radius, arr_mask, 1)
arr_mask[arr < 0] = 0  # exclude points with no echo
arr_mask[ind] = 0      # exclude center point
arr_mean = 0
if np.any(arr_mask):   # dbz_bg in the original appears to refer to this mask
    arr_mean = np.mean(arr[arr_mask.astype(bool)])  # boolean indexing, not integer fancy indexing
Is there any more efficient way to do this? I've looked into some of the image processing filters/tools but can't quite wrap my head around it.
Is this helpful? This takes only a couple of seconds on my laptop for ~18000 points:
import numpy as np

#generate a random 300x300 matrix for testing
inputMat = np.random.random((300,300))
radius = 50

def radMask(index, radius, array):
    a, b = index
    nx, ny = array.shape
    y, x = np.ogrid[-a:nx-a, -b:ny-b]
    mask = x*x + y*y <= radius*radius
    return mask

#meanAll is going to store ~18000 points
meanAll = np.zeros((130,130))
for x in range(130):
    for y in range(130):
        centerMask = (x, y)
        mask = radMask(centerMask, radius, inputMat)
        #un-mask center and values below 0
        mask[centerMask] = False
        mask[inputMat < 0] = False
        #get the mean
        meanAll[x,y] = np.mean(inputMat[mask])
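Since the question mentions image-processing filters: the per-pixel disc mean can also be written as two convolutions (sum of valid values divided by count of valid values), which yields the mean for every pixel at once. A sketch assuming scipy.ndimage is available (disc_means is an illustrative name):
import numpy as np
from scipy import ndimage

def disc_means(arr, radius):
    # mean of non-negative values within `radius` of each pixel, excluding the pixel itself
    y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
    kernel = (x*x + y*y <= radius*radius).astype(float)
    kernel[radius, radius] = 0.0                    # exclude the center point
    valid = (arr >= 0).astype(float)                # exclude values < 0
    vals = np.where(arr >= 0, arr, 0.0)
    num = ndimage.convolve(vals, kernel, mode='constant', cval=0.0)
    den = ndimage.convolve(valid, kernel, mode='constant', cval=0.0)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)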

Vectorization of python numpy code is making it slower instead of faster

I am trying to perform image correlation to find which frame out of a set of 20 frames (the set is stored in a 3D array, x) matches best with a given frame (stored as a 2D array, y). This step has to be performed 1000 times.
I tried to vectorize the code to make it run faster. But somehow the vectorization is making the code take twice as long. I am probably doing something wrong in the vectorization process which is making it slower.
Here is the code
import numpy as np
import time
def corr2(a, b):
    #Getting shapes and preallocating the auxiliary variables
    k = np.shape(a)
    #Calculating mean values
    AM = np.mean(a)
    BM = np.mean(b)
    #calculate vectors
    c_vect = (a-AM)*(b-BM)
    d_vect = (a-AM)**2
    e_vect = (b-BM)**2
    #Formula itself
    r_out = np.sum(c_vect)/float(np.sqrt(np.sum(d_vect)*np.sum(e_vect)))
    return r_out
def ZZ_1X_v1_MCC(R, RefImage):
    # corr2 is defined above (the original post imported it from an img_proccessing module)
    Cor = np.zeros(R.shape[2])
    for t in range(R.shape[2]):
        Cor[t] = corr2(RefImage, R[:,:,t])  #Correlation
    #report
    max_correlationvalue_intermediate = np.amax(Cor)
    max_correlatedframe_intermediate = np.argmax(Cor)
    max_correlatedframeandvalue = [max_correlatedframe_intermediate, max_correlationvalue_intermediate]
    return max_correlatedframeandvalue
def ZZ_1X_v1_MCC_vectorized(R, RefImage):
    R_shape = np.asarray(np.shape(R))
    R_flattened = R.swapaxes(0,2).reshape(R_shape[2], R_shape[0]*R_shape[1])
    AA = np.transpose(R_flattened)
    RefImageflattened = RefImage.transpose().ravel()
    #Calculating mean subtracted values
    AAM = AA - np.mean(AA, axis=0)
    BM = RefImageflattened - np.mean(RefImageflattened)
    #calculate vectors
    DD_vect = AAM**2
    E_vect = BM**2
    EE_vect = np.transpose(np.tile(np.transpose(E_vect), (R_shape[2],1)))
    CC_vect = AAM*np.transpose(np.tile(BM, (R_shape[2],1)))
    #Formula itself
    Cor = np.sum(CC_vect, axis=0)/np.sqrt((np.sum(DD_vect, axis=0)*np.sum(EE_vect, axis=0)).astype(float))
    #report
    max_correlationvalue_intermediate = np.amax(Cor)
    max_correlatedframe_intermediate = np.argmax(Cor)
    max_correlatedframeandvalue = [max_correlatedframe_intermediate, max_correlationvalue_intermediate]
    return max_correlatedframeandvalue
x = np.arange(400000).reshape((20,200,100)).swapaxes(0,2) #3D array with 20 frames
y = np.transpose(np.arange(20000).reshape((200,100))) #2D array with 1 frame
# using for loop
tic = time.time()
for i in range(500):
    [a,b] = ZZ_1X_v1_MCC(x,y)
print(time.time() - tic)
# using vectorization
tic = time.time()
for i in range(500):
    [a,b] = ZZ_1X_v1_MCC_vectorized(x,y)
print(time.time() - tic)
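For reference, corr2 above computes the Pearson correlation coefficient of the two (flattened) frames, so a quick sanity check against NumPy's built-in np.corrcoef (an illustrative snippet, not part of the original post) is:
a = np.random.rand(200, 100)
b = np.random.rand(200, 100)
# corr2 should agree with the off-diagonal entry of the 2x2 correlation matrix
assert np.isclose(corr2(a, b), np.corrcoef(a.ravel(), b.ravel())[0, 1])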
