Let's suppose we have these two matrices
epsilon = np.asmatrix([
    [1, 2],
    [-1, 2],
    [0, 2]
])
and this one:
step_weights = np.asmatrix(np.random.normal(0, 0.5, np.shape(epsilon)))
I want to populate/update the step_weights matrix based on the epsilon values, that is:
if epsilon[i,j] > 0:
    step_weights[i,j] = np.minimum(1.2 * step_weights[i,j], 50)
elif epsilon[i,j] < 0:
    step_weights[i,j] = np.maximum(0.5 * step_weights[i,j], 10**-6)
This is what I have done:
import numpy as np

def update_steps(self, epsilon):
    for (i, j), epsilon_ij in np.ndenumerate(epsilon):
        if epsilon_ij > 0:
            step_weights[i, j] = np.minimum(1.2 * step_weights[i, j], 50)
        elif epsilon_ij < 0:
            step_weights[i, j] = np.maximum(0.5 * step_weights[i, j], 10**-6)
and that's working fine.
My question is: is there a more efficient/cleaner way to do it, avoiding the for loop? For example exploiting matrix calculus or linear algebra?
Use a boolean index array:
>>> np.random.seed(0)
>>> step_weights = np.asmatrix(np.random.normal(0, 0.5, np.shape(epsilon)))
>>> step_weights
matrix([[ 0.88202617, 0.2000786 ],
[ 0.48936899, 1.1204466 ],
[ 0.933779 , -0.48863894]])
>>> mask = epsilon > 0
>>> step_weights[mask] = np.minimum(step_weights.A[mask] * 1.2, 50)
>>> mask = epsilon < 0
>>> step_weights[mask] = np.maximum(step_weights.A[mask] * 0.5, 10 ** -6)
>>> step_weights
matrix([[ 1.05843141, 0.24009433],
[ 0.2446845 , 1.34453592],
[ 0.933779 , -0.58636673]])
Note: the matrix class is no longer recommended and will be removed in the future; you should use a regular multidimensional array (ndarray) instead. Plain ndarrays already support the usual matrix operations (e.g. the @ operator for matrix multiplication).
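For reference, here is a minimal sketch of the same update on plain ndarrays using np.where; the variable names mirror the question, and the nested np.where formulation is mine rather than something from the original post:
import numpy as np

epsilon = np.array([[1, 2], [-1, 2], [0, 2]])
rng = np.random.default_rng(0)
step_weights = rng.normal(0, 0.5, epsilon.shape)

# Grow where epsilon > 0 (capped at 50), shrink where epsilon < 0 (floored at 1e-6),
# leave entries with epsilon == 0 unchanged.
step_weights = np.where(
    epsilon > 0,
    np.minimum(1.2 * step_weights, 50),
    np.where(epsilon < 0, np.maximum(0.5 * step_weights, 1e-6), step_weights),
)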
I asked a question here with the details: https://math.stackexchange.com/questions/4381785/possibly-speed-up-matrix-multiplications
In short, I am trying to create a P x N matrix, X, with typical element X_{ip} = \sum_{j,k \neq i} w_{jp} A_{jk} Y_{kp}, where w is P x N, A is N x N and Y is P x N. See the link above for a marked-up version of that formula.
I'm providing an MWE (minimal working example) here to see how I can correct the code (the calculations seem correct, just incomplete; see below) and, more importantly, how to speed this up however possible:
w = np.array([[2,1],[3,7]])
A = np.array([[2,1],[9,-1]])
Y = np.array([[6,2],[11,8]])

N = w.shape[1]
P = w.shape[0]

X = np.zeros((P, N))
for p in range(P):
    for i in range(N-1):
        for j in range(N-1):
            X[p,i] = np.delete(w,i,1)[i,p]*np.delete(np.delete(A,i,0),i,1)[i,j]*np.delete(Y.T,i,0)[j,p]
The output looks like:
array([[ -2.,   0.],
       [-56.,   0.]])
If we take the (1,1) element of X_{ip}, its value can be understood from the formula above:
sum_{j,k \neq i} w_{j1} A_{jk} Y_{k1} = w_{12} A_{22} Y_{12} = 1 * -1 * 2 = -2, as in the output.
The (1,2) element of X_{ip} should be:
sum_{j,k \neq i} w_{j2} A_{jk} Y_{k2} = w_{22} A_{22} Y_{22} = 7 * -1 * 8 = -56, as in the output.
But I am not getting the correct answer for the final column of X, because my ranges run to N-1 rather than N (using N raised an IndexError: index out of bounds). More importantly, here N = P = 2, but in my real problem N and P are large and the code, as written, takes a very long time to run. Any suggestions would be greatly appreciated.
Since the delete functions depend only on i, I factored them out, and reordered the loops. Also corrected the w1 index order.
In [274]: w = np.array([[2,1],[3,7]])
     ...: A = np.array([[2,1],[9,-1]])
     ...: Y = np.array([[6,2],[11,8]])
     ...: N = w.shape[1]
     ...: P = w.shape[0]
     ...: X = np.zeros((P, N))
     ...: for i in range(N-1):
     ...:     print('i', i)
     ...:     w1 = np.delete(w, i, 1)
     ...:     a1 = np.delete(np.delete(A, i, 0), i, 1)
     ...:     y1 = np.delete(Y.T, i, 0)
     ...:     print(w1.shape, a1.shape, y1.shape)
     ...:     print(w1 @ a1 @ y1)
     ...:     print(np.einsum('ij,jk,li->i', w1, a1, y1))
     ...:     for p in range(P):
     ...:         for j in range(N-1):
     ...:             X[p,i] = w1[p,i]*a1*y1[j,p]
     ...:
i 0
(2, 1) (1, 1) (1, 2)
[[ -2 -8]
[-14 -56]]
[ -2 -56]
In [275]: X
Out[275]:
array([[ -2., 0.],
[-56., 0.]])
Your [-2, -56] values are the diagonal of w1 @ a1 @ y1, or equivalently the einsum result. The 0's come from the original np.zeros, because i only runs over range(1).
This should be faster because the delete is not repeated unnecessarily. np.delete is still relatively expensive, but I haven't tried to figure out exactly what you are doing.
Didn't your question initially have (2,3) and (3,3) arrays? That, or something a bit larger, may be more general and informative.
edit
I think this is closer to the math expressions:
def foo(w, A, Y):
    P, N = w.shape
    X = np.zeros((P, N))
    for i in range(N):
        # w1 = np.delete(w, i, 1)   # no longer needed
        a1 = np.delete(A, i, 1)
        y1 = np.delete(Y.T, i, 0)
        print(a1.shape, y1.shape)
        for p in range(P):
            for j in range(N-1):
                X[p,i] += w[p,i]*a1[i,j]*y1[j,p]
    return X
we can get rid of the j loop with:
def foo1(w, A, Y):
    P, N = w.shape
    X = np.zeros((P, N))
    for i in range(N):
        a1 = np.delete(A, i, 1)
        y1 = np.delete(Y.T, i, 0)
        print(a1.shape, y1.shape)
        for p in range(P):
            X[p,i] = w[p,i]*np.dot(a1[i,:], y1[:,p])
    return X
with
w3 = np.array([[2,1,0],[3,7,0.5]])
A3 = np.array([[2,1,0],[9,0,8],[1,2,5]])
Y3 = np.array([[6,2,-1],[11,8,-7]])
w2 = np.array([[2,1],[3,7]])
A2 = np.array([[2,1],[9,-1]])
Y2 = np.array([[6,2],[11,8]])
both produce
In [372]: foo(w2,A2,Y2)
(2, 1) (1, 2)
...
Out[372]:
array([[ 4., 54.],
[ 24., 693.]])
In [373]: foo(w3,A3,Y3)
(3, 2) (2, 2)
...
Out[373]:
array([[ 4. , 46. , 0. ],
[ 24. , 301. , 13.5]])
and after more fiddling:
def foo4(w, A, Y):
    P, N = w.shape
    X = np.zeros((P, N))
    for i in range(N):
        a1 = np.delete(A, i, 1)
        y1 = np.delete(Y.T, i, 0)
        X[:,i] = np.einsum('j,jp->p', a1[i,:], y1)
        # X[:,i] = a1[i,:] @ y1
    return X*w
I suspect it is possible to do w*(A@Y.T) and then subtract an array that involves A[:,i] and Y[:,i], but I haven't figured out that array.
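Following up on that hunch, here is a minimal sketch of a fully vectorized form; the subtracted term turns out to involve only the diagonal of A. This derivation is mine, so treat it as an assumption and check it against foo4 on your own data:
import numpy as np

def foo_vec(w, A, Y):
    # X[p, i] = w[p, i] * sum_{j != i} A[i, j] * Y[p, j]
    #         = w[p, i] * ((Y @ A.T)[p, i] - A[i, i] * Y[p, i])
    return w * (Y @ A.T - Y * np.diag(A))

w3 = np.array([[2, 1, 0], [3, 7, 0.5]])
A3 = np.array([[2, 1, 0], [9, 0, 8], [1, 2, 5]])
Y3 = np.array([[6, 2, -1], [11, 8, -7]])
print(foo_vec(w3, A3, Y3))   # matches foo(w3, A3, Y3) above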
I'm interested in a variant of Increment Numpy multi-d array with repeated indices where the array is indexed with a cross product.
In particular, I want to perform the operation done by the following code using matrix operations to accelerate it:
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    for w in range(W):
        for h in range(H):
            i, j = int(w / grid_size), int(h / grid_size)
            s[i, j] += image[w, h]
    return s
My idea was to compute all the (i, j) indices at once and use NumPy's ix_ method to index the matrix s:
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    w_idx, h_idx = np.arange(W), np.arange(H)
    x_idx, y_idx = np.trunc(w_idx / grid_size).astype(int), np.trunc(h_idx / grid_size).astype(int)
    s[np.ix_(x_idx, y_idx)] += image
    return s
It is easier to understand the code above with NumPy's example:
Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
In my case, it's likely that some indices will be repeated (as for example with grid_size=2, int(0 / grid_size) = int(1 / grid_size)). And that's where the Increment Numpy multi-d array with repeated indices question comes.
In case the indices are repeated, I would like the matrix to be updated with the image value once for each repetition. I cannot find a solution to this without additional loops (e.g. zipping the indices, but then you essentially have to build the full cross product of the indices for s and the image yourself).
I don't think this is the best way to do it but here's one way.
import numpy as np
image = np.arange(9).reshape(3, 3)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2])
# find unique destinations
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
# create a mask for the source and sum the source pixels headed to the same destination
idxs_repeated = idxs[None, :, :].repeat(len(idxs_unique), axis = 0)
image_mask = (idxs_repeated == idxs_unique[:, None, :]).all(-1)
pixel_sum = (image.flatten()[None, :]*image_mask).sum(-1)
# assign summed sources to destination
s[tuple(idxs_unique.T)] += pixel_sum
EDIT 1:
If you run into problems caused by memory constraints you can do the image masking and summation in batches as done in the following implementation. I set the batch size to 10 but that parameter can be set to whatever works on your machine.
import numpy as np

image = np.arange(12).reshape(3, 4)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2, 1])

idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)

batch_size = 10
pixel_sum = []
for i in range(len(idxs_unique)//batch_size + ((len(idxs_unique)%batch_size)!=0)):
    batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
    idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
    image_mask = (idxs_repeated == idxs_unique[i*batch_size:(i+1)*batch_size, None, :]).all(-1)
    pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
pixel_sum = np.concatenate(pixel_sum)
s[tuple(idxs_unique.T)] += pixel_sum
EDIT 2:
OP's method seems to be faster by far if you use numba.
import numpy as np
from numba import jit

@jit(nopython=True)
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    for w in range(W):
        for h in range(H):
            i, j = int(w / grid_size), int(h / grid_size)
            s[i, j] += image[w, h]
    return s
def get_s_vec(image, grid_size, batch_size = 10):
    W, H = image.shape
    s = np.zeros((W, H))
    w_idx, h_idx = np.arange(W), np.arange(H)
    x_idx, y_idx = np.trunc(w_idx / grid_size).astype(int), np.trunc(h_idx / grid_size).astype(int)
    y_idx, x_idx = np.meshgrid(y_idx, x_idx)
    idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
    idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
    pixel_sum = []
    for i in range(len(idxs_unique)//batch_size + ((len(idxs_unique)%batch_size)!=0)):
        batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
        idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
        image_mask = (idxs_repeated == batch).all(-1)
        pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
    pixel_sum = np.concatenate(pixel_sum)
    s[tuple(idxs_unique.T)] += pixel_sum
    return s
print(f'loop result = {get_s(image, 2)}')
print(f'vector result = {get_s_vec(image, 2)}')
%timeit get_s(image, 2)
%timeit get_s_vec(image, 2)
output:
loop result = [[10. 18. 0. 0.]
[17. 21. 0. 0.]
[ 0. 0. 0. 0.]]
vector result = [[10. 18. 0. 0.]
[17. 21. 0. 0.]
[ 0. 0. 0. 0.]]
The slowest run took 15.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 751 ns per loop
1000 loops, best of 5: 195 µs per loop
Does skimage.measure.block_reduce do what you want?
from skimage.measure import block_reduce
s = block_reduce(image, block_size=(grid_size, grid_size), func=np.sum)
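If block_reduce doesn't fit (for example because s has to keep the full W x H shape, as in the original loop), another loop-free option is np.add.at, which accumulates correctly even when indices repeat. This is a sketch under that assumption, not a benchmarked solution:
import numpy as np

def get_s_add_at(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    i_idx = np.trunc(np.arange(W) / grid_size).astype(int)
    j_idx = np.trunc(np.arange(H) / grid_size).astype(int)
    # Unbuffered in-place add: repeated (i, j) pairs accumulate instead of overwriting.
    np.add.at(s, (i_idx[:, None], j_idx[None, :]), image)
    return s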
I have a 2D data array and I'm trying to get a profile of values about its center in an efficient manner. So the output should be two one-dimensional arrays: one with the values of distances from the center, the other with the mean of all the values in the original 2D that are at that distance from the center.
Each index has a non-integer distance from the center, which prevents me from using some already known solutions for the problem. Allow me to explain.
Consider these matrices
data = np.random.randn(5,5)
L = 2
x = np.arange(-L,L+1,1)*2.5
y = np.arange(-L,L+1,1)*2.5
xx, yy = np.meshgrid(x, y)
r = np.sqrt(xx**2. + yy**2.)
So the matrices are
In [30]: r
Out[30]:
array([[ 7.07106781, 5.59016994, 5. , 5.59016994, 7.07106781],
[ 5.59016994, 3.53553391, 2.5 , 3.53553391, 5.59016994],
[ 5. , 2.5 , 0. , 2.5 , 5. ],
[ 5.59016994, 3.53553391, 2.5 , 3.53553391, 5.59016994],
[ 7.07106781, 5.59016994, 5. , 5.59016994, 7.07106781]])
In [31]: data
Out[31]:
array([[ 1.27603322, 1.33635284, 1.93093228, 0.76229675, -0.00956535],
[ 0.69556071, -1.70829753, 1.19615919, -1.32868665, 0.29679494],
[ 0.13097791, -1.33302719, 1.48226442, -0.76672223, -1.01836614],
[ 0.51334771, -0.83863115, -0.41541794, 0.34743342, 0.1199237 ],
[-1.02042539, 0.90739383, -2.4858624 , -0.07417987, 0.90748933]])
For this case the expected output should be array([ 0. , 2.5 , 3.53553391, 5. , 5.59016994, 7.07106781]) for the index of distances, and a second array of same length with the mean of all the values that are at those corresponding distances: array([ 0.98791323, -0.32496927, 0.37221219, -0.6209728 , 0.27986926, 0.04060628]).
From this answer there is a very nice function to compute the profile about any arbitrary point. However, the problem with his approach is that it approximates the distance r by the index distance. So his r for my case would be this:
array([[2, 2, 2, 2, 2],
[2, 1, 1, 1, 2],
[2, 1, 0, 1, 2],
[2, 1, 1, 1, 2],
[2, 2, 2, 2, 2]])
which is a pretty big difference for me, since I'm working with small matrices. This approximation, however, allows him to use np.bincount, which is pretty handy (but won't work for me).
I've been trying to expand this for float distance, like my version r, but so far no luck. bincount doesn't work with floats and histogram needs equally-spaced bins, which is not the case. Any suggestion?
Approach #1
def radial_profile_app1(data, r):
    mid = data.shape[0]//2
    ids = np.rint((r**2)/r[mid-1,mid]**2).astype(int).ravel()
    count = np.bincount(ids)

    R = data.shape[0]//2 # Radial profile radius
    R0 = R+1
    dists = np.unique(r[:R0,:R0][np.tril(np.ones((R0,R0),dtype=bool))])

    mean_data = (np.bincount(ids, data.ravel())/count)[count!=0]
    return dists, mean_data
For the given sample data -
In [475]: radial_profile_app1(data, r)
Out[475]:
(array([ 0. , 2.5 , 3.53553391, 5. , 5.59016994,
7.07106781]),
array([ 1.48226442 , -0.3297520425, -0.8820454775, -0.3605795875,
0.5696863263, 0.2883829525]))
Approach #2
def radial_profile_app2(data, r):
    R = data.shape[0]//2 # Radial profile radius
    range_arr = np.arange(-R,R+1)
    ids = (range_arr[:,None]**2 + range_arr**2).ravel()
    count = np.bincount(ids)

    R0 = R+1
    dists = np.unique(r[:R0,:R0][np.tril(np.ones((R0,R0),dtype=bool))])

    mean_data = (np.bincount(ids, data.ravel())/count)[count!=0]
    return dists, mean_data
Runtime test -
In [562]: # Setup inputs
...: N = 2001
...: data = np.random.randn(N,N)
...: L = (N-1)//2
...: x = np.arange(-L,L+1,1)*2.5
...: y = np.arange(-L,L+1,1)*2.5
...: xx, yy = np.meshgrid(x, y)
...: r = np.sqrt(xx**2. + yy**2.)
...:
In [563]: out01, out02 = radial_profile_app1(data, r)
...: out11, out12 = radial_profile_app2(data, r)
...:
...: print(np.allclose(out01, out11))
...: print(np.allclose(out02, out12))
...:
True
True
In [566]: %timeit radial_profile_app1(data, r)
...: %timeit radial_profile_app2(data, r)
...:
10 loops, best of 3: 114 ms per loop
10 loops, best of 3: 91.2 ms per loop
Got what I was expecting with this function:
def radial_prof(data, r):
    uniq = np.unique(r)
    prof = np.array([np.mean(data[r == un]) for un in uniq])
    return uniq, prof
But I'm still not happy with the fact that I had to use list comprehension (or a python loop), since it might be slow for very large matrices.
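A loop-free variant, sketched here under the same assumption as radial_prof above (exact floating-point equality of the r values), is to bin on the inverse indices returned by np.unique; that makes np.bincount usable even though the distances are floats:
import numpy as np

def radial_prof_binned(data, r):
    uniq, inv = np.unique(r, return_inverse=True)
    # inv maps every pixel to the index of its distance in uniq,
    # so weighted bincounts give per-distance sums and counts.
    sums = np.bincount(inv.ravel(), weights=data.ravel())
    counts = np.bincount(inv.ravel())
    return uniq, sums / counts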
Here is an indirect-sorting approach that should scale well if the batch size and/or the number of bins are large. The sorting is O(n log n); all the histogramming is O(n). I've also added a little unscientific speed test. For the speed test I use flat indexing, but I left the 2D index code in because it's more flexible when dealing with images of different sizes, etc.
import numpy as np

# this need only be run once per batch
def r_to_ind(r, dist_bins="auto"):
    f = np.argsort(r.ravel())
    if dist_bins == "auto":
        rs = r.ravel()[f]
        bins = np.where(np.r_[True, rs[1:] != rs[:-1]])[0]
        dist_bins = rs[bins]
    else:
        bins = np.searchsorted(r.ravel()[f], dist_bins)
    denom = np.diff(np.r_[bins, r.size])
    return f, np.unravel_index(f, r.shape), bins, denom, dist_bins

# this is with adjustable offset
def profile_xy(image, yx, ij, bins, nynx, denom):
    (y, x), (i, j), (ny, nx) = yx, ij, nynx
    return np.add.reduceat(image[i + y - ny//2, j + x - nx//2], bins) / denom

# this is fixed
def profile_xy_no_offset(image, ij, bins, denom):
    return np.add.reduceat(image[ij], bins) / denom

# this is fixed and flat
def profile_xy_no_offset_flat(image, k, bins, denom):
    return np.add.reduceat(image.ravel()[k], bins) / denom
data = np.array([[ 1.27603322, 1.33635284, 1.93093228, 0.76229675, -0.00956535],
[ 0.69556071, -1.70829753, 1.19615919, -1.32868665, 0.29679494],
[ 0.13097791, -1.33302719, 1.48226442, -0.76672223, -1.01836614],
[ 0.51334771, -0.83863115, -0.41541794, 0.34743342, 0.1199237 ],
[-1.02042539, 0.90739383, -2.4858624 , -0.07417987, 0.90748933]])
r = np.array([[ 7.07106781, 5.59016994, 5. , 5.59016994, 7.07106781],
[ 5.59016994, 3.53553391, 2.5 , 3.53553391, 5.59016994],
[ 5. , 2.5 , 0. , 2.5 , 5. ],
[ 5.59016994, 3.53553391, 2.5 , 3.53553391, 5.59016994],
[ 7.07106781, 5.59016994, 5. , 5.59016994, 7.07106781]])
f, (i, j), bins, denom, dist_bins = r_to_ind(r)
result = profile_xy(data, (2, 2), (i, j), bins, (5, 5), denom)
print(dist_bins)
# [ 0. 2.5 3.53553391 5. 5.59016994 7.07106781]
print(result)
# [ 1.48226442 -0.32975204 -0.88204548 -0.36057959 0.56968633 0.28838295]
#########################
from timeit import timeit

n = 2001
batch = 100
fake = 10
a = np.random.random((fake, n, n))
l = np.linspace(-1, 1, n)**2
r = sum(np.ix_(l, l))

def run_all():
    f, ij, bins, denom, dist_bins = r_to_ind(r)
    for b in range(batch):
        profile_xy_no_offset_flat(a[b%fake], f, bins, denom)

print(timeit(run_all, number=10))
# 47.4157 (for 10 batches of 100 images of size 2001x2001)
# and my computer is slower than Divakar's ;-)
I've made some more benchmarks comparing mine to @Divakar's approach 3, stripping out everything precomputable into a run-once-per-batch function. The general finding: they are similar; mine has a higher upfront cost but is then faster. However, they only cross over at around 100 pictures per batch.
I've been trying to write some code which adds up the numbers that fall into a certain range and appends a corresponding number to a list. I also need to pull the range from a cumsum range.
numbers = []
i = 0
z = np.random.rand(1000)
arraypmf = np.array(pmf)
summation = np.cumsum(z)
while i < 6:
    index = i - 1
    a = np.extract(condition, z)  # I can't figure out how to write the condition.
    length = len(a)
    length * numbers.append(i)
I'm not entirely sure what you're trying to do, but the easiest way to do conditions in numpy is to just apply them to the whole array to get a mask:
mask = (z >= 0.3) & (z < 0.6)
Then you can use, e.g., extract or ma if necessary—but in this case, I think you can just rely on the fact that True==1 and False==0 and do this:
zm = z * mask
After all, if all you're doing is summing things up, 0 is the same as not there, and you can just replace len with count_nonzero.
For example:
In [588]: z=np.random.rand(10)
In [589]: z
Out[589]:
array([ 0.33335522, 0.66155206, 0.60602815, 0.05755882, 0.03596728,
0.85610536, 0.06657973, 0.43287193, 0.22596789, 0.62220608])
In [590]: mask = (z >= 0.3) & (z < 0.6)
In [591]: mask
Out[591]: array([ True, False, False, False, False, False, False, True, False, False], dtype=bool)
In [592]: z * mask
Out[592]:
array([ 0.33335522, 0. , 0. , 0. , 0. ,
0. , 0. , 0.43287193, 0. , 0. ])
In [593]: np.count_nonzero(z * mask)
Out[593]: 2
In [594]: np.extract(mask, z)
Out[594]: array([ 0.33335522, 0.43287193])
In [595]: len(np.extract(mask, z))
Out[595]: 2
Here is another approach to doing what (I think) you're trying to do:
import numpy as np
z = np.random.rand(1000)
bins = np.asarray([0, .1, .15, 1.])
# This will give the number of values in each range
counts, _ = np.histogram(z, bins)
# This will give the sum of all values in each range
sums, _ = np.histogram(z, bins, weights=z)
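If you also need the list the original loop was building (one label appended per value that lands in a range), one way to get it from the same histogram machinery is np.digitize; the labeling scheme below (the bin index itself) is my assumption about the intent:
import numpy as np

z = np.random.rand(1000)
bins = np.asarray([0, .1, .15, 1.])
counts, _ = np.histogram(z, bins)

# Bin index (0, 1 or 2) for every value in z.
bin_idx = np.digitize(z, bins) - 1

# One label per value, grouped by bin: equivalent to appending i, len(a) times.
numbers = np.repeat(np.arange(len(bins) - 1), counts)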