Slices along arbitrary axis - python

I have a numpy array A such that
A.shape[axis] = n+1.
Now I want to construct two slices B and C of A by selecting the indices 0, .., n-1 and 1, ..., n respectively along the axis axis. Thus
B.shape[axis] = C.shape[axis] = n
and B and C have the same size as A along the other axes. There must be no copying of data.

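import numpy as np
# example data
A = np.random.rand(2, 3, 4, 5)
axis = 2
# A.shape[axis] == n + 1, as in the question
n = A.shape[axis] - 1
# build an index with one slice per dimension
s = [slice(None)] * A.ndim
s[axis] = slice(0, n)
# index with a tuple; basic slicing returns a view, so no data is copied
B = A[tuple(s)]
s[axis] = slice(1, n + 1)
C = A[tuple(s)]
One-liners:
B = A[tuple(slice(0, n) if i == axis else slice(None) for i in range(A.ndim))]
C = A[tuple(slice(1, n + 1) if i == axis else slice(None) for i in range(A.ndim))]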
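The same idea can be wrapped in a small helper. This is only a sketch: the name shifted_views is made up for illustration, and it relies on the fact that basic slicing with a tuple of slice objects returns views rather than copies.

```python
import numpy as np

def shifted_views(a, axis):
    """Return two views of `a`: all but the last, and all but the first,
    index along `axis` (hypothetical helper name)."""
    index = [slice(None)] * a.ndim
    index[axis] = slice(None, -1)   # indices 0 .. n-1
    lower = a[tuple(index)]
    index[axis] = slice(1, None)    # indices 1 .. n
    upper = a[tuple(index)]
    return lower, upper

A = np.random.rand(2, 3, 4, 5)
B, C = shifted_views(A, axis=2)
print(B.shape, C.shape)
# Both results share memory with A, i.e. nothing was copied
print(np.shares_memory(A, B), np.shares_memory(A, C))
```

Using slice(None, -1) and slice(1, None) avoids having to compute n explicitly.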

Related

Add arrays of different dimensions

I am trying to add two arrays of different dimensions. Array A has shape (20,2,2,2,2,3) and array B has shape (20). Currently, I am using np.newaxis 5 times, so B gets the same shape as A and then I add them. In my actual code A is much bigger and this forces me to write np.newaxis many times. Is there a way to avoid repeating the np.newaxis and just tell python to give B the same shape as A?
A = np.zeros([20,2,2,2,2,3])
B = np.arange(1,21)
B = B[:,np.newaxis,np.newaxis,np.newaxis,np.newaxis,np.newaxis]
C = A + B
If you are broadcasting, this will work:
A = np.zeros([20,2,2,2,2,3])
B = np.arange(1,21)
B = B.reshape([20,1,1,1,1,1])
C = A + B
In a more dynamic way:
shape_a = [20,2,2,2,2,3]
A = np.zeros(shape_a)
B = np.arange(1,21)
shape_b = [shape_a[0]] +(len(shape_a)-1)*[1]
B = B.reshape(shape_b)
C = A + B
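With no broadcasting (note that this only adds B to the elements C[:, 0, 0, 0, 0, 0], whereas the broadcast sum adds B[i] to every element of A[i]):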
A = np.zeros([20,2,2,2,2,3])
B = np.arange(1,21)
C = A.copy()
C[:,0,0,0,0,0] += B
And if you don't care about A, just the result:
C = np.zeros([20,2,2,2,2,3])
B = np.arange(1,21)
C[:,0,0,0,0,0] += B
What you are doing is expanding B to the same number of dimensions as A, but if you look carefully, this operation does not give B the same shape as A. The shapes are still different:
>>> B[:, None, None, None, None, None].shape
(20, 1, 1, 1, 1, 1)
So this is basically applying np.expand_dims five times in a row. An alternative is to reshape the array with extra singleton dimensions:
>>> B.reshape((-1, *(1,)*(A.ndim-1))).shape
(20, 1, 1, 1, 1, 1)
Which will reshape (*,) to (*, 1, 1, 1, 1, 1).
This has the same effect as placing the np.newaxis manually.
For your case, you can also do:
C = (A.reshape(A.shape[0], -1) + B[:,None]).reshape(A.shape)
Or swap the common axis to the end:
C = (A.swapaxes(0,-1) + B).swapaxes(0,-1)
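To avoid hard-coding the number of trailing axes, the reshape can be derived from A.ndim. A minimal sketch, assuming B should line up with the leading axes of A; the helper name align_leading is made up for illustration.

```python
import numpy as np

def align_leading(b, target_ndim):
    """Append singleton axes to `b` until it has `target_ndim` dimensions
    (hypothetical helper name)."""
    b = np.asarray(b)
    return b.reshape(b.shape + (1,) * (target_ndim - b.ndim))

A = np.zeros((20, 2, 2, 2, 2, 3))
B = np.arange(1, 21)
B_aligned = align_leading(B, A.ndim)
print(B_aligned.shape)   # (20, 1, 1, 1, 1, 1)
C = A + B_aligned        # broadcasting stretches the singleton axes
print(C.shape)           # (20, 2, 2, 2, 2, 3)
```

The reshape does not copy B's data here; only the shape metadata changes.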

Error: index 9 is out of bounds for axis 0 with size 9

I am attempting to write a Lagrange interpolation function, but when I run it I get the error "index 9 is out of bounds for axis 0 with size 9". Why am I receiving this error, and how can I fix it to perform my interpolation?
import numpy as np
b = np.arange(3,12)
y = np.arange(9)
from sympy import Symbol
t = Symbol('t')
d = len(b)
def interpolation(x, z):
    if len(x) != len(z):
        print("Error: the length of x and z is different")
    else:
        L = 0
        for i in range (d+1):
            p = 1
            for j in range (d+1):
                if j != i:
                    p *= (t-x[j]/(x[i] - x[j]))
            L += z[i]*p
print(interpolation(b, y))
Because indexing starts at zero, the last valid index of an array of length 9 is 8, so index 9 is out of bounds. Your 9 indices are 0, 1, 2, 3, 4, 5, 6, 7, 8.
So you should not loop over range(d + 1). Use range(d) instead.
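For reference, here is a sketch of a corrected version that evaluates the interpolating polynomial numerically at a point t instead of building a sympy expression. Besides the loop bounds, it makes two changes that go beyond the reported error and are assumptions about the intent: the basis factor is written as (t - x[j]) / (x[i] - x[j]), which is the standard Lagrange form, and the function returns its result. The name lagrange_eval is made up for illustration.

```python
def lagrange_eval(x, z, t):
    """Evaluate the Lagrange interpolating polynomial through the points
    (x[i], z[i]) at position t (hypothetical helper name)."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    n = len(x)
    total = 0.0
    for i in range(n):            # valid indices are 0 .. n-1
        p = 1.0
        for j in range(n):
            if j != i:
                p *= (t - x[j]) / (x[i] - x[j])
        total += z[i] * p
    return total

b = list(range(3, 12))   # same values as np.arange(3, 12)
y = list(range(9))       # same values as np.arange(9)
# The data lie on the line y = x - 3, so the result should be close to 2.5
print(lagrange_eval(b, y, 5.5))
```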

Remove elements from Numpy array until y has equivalent elements in each value

I have an array y composed of 0 and 1, but at a different frequency.
For example:
y = np.array([0, 0, 1, 1, 1, 1, 0])
And I have an array x of the same length.
x = np.array([0, 1, 2, 3, 4, 5, 6])
The idea is to filter out elements until there are the same number of 0 and 1.
A valid solution would be to remove index 5:
x = np.array([0, 1, 2, 3, 4, 6])
y = np.array([0, 0, 1, 1, 1, 0])
A naive method I can think of is to take the difference between the value counts of y (in this case 4-3=1), create a mask for y == 1, and switch random elements from True to False until the difference is 0. Then create a mask for y == 0, OR the two masks together, and apply the result to both x and y.
This doesn't really seem the best "python/numpy way" of doing it though.
Any suggestions? Something like randomly select n elements from the highest count, where n is the count of the lowest value.
If this is easier with pandas then that would work for me too.
Naive algorithm, assuming there are more 1s than 0s:
mask_pos = y == 1
mask_neg = y == 0
pos = len(y[mask_pos])
neg = len(y[mask_neg])
diff = pos-neg
while diff > 0:
    rand = np.random.randint(0, len(y))
    if mask_pos[rand] == True:
        mask_pos[rand] = False
        diff -= 1
mask_final = mask_pos | mask_neg
y_new = y[mask_final]
x_new = x[mask_final]
This naive algorithm is really slow.
One way to do that with NumPy is this:
import numpy as np
# Makes a mask to balance ones and zeros
def balance_binary_mask(binary_array):
    # Convert to bool so that ~ below is a logical NOT (on integers it is bitwise)
    binary_array = np.asarray(binary_array, dtype=bool).ravel()
    # Count number of ones
    z = np.count_nonzero(binary_array)
    # If there are no more ones than zeros
    if z <= len(binary_array) // 2:
        # Invert the array so that "ones" marks the majority value
        binary_array = ~binary_array
    # Find ones
    idx = np.nonzero(binary_array)[0]
    # Number of elements to remove
    rem = 2 * len(idx) - len(binary_array)
    # Pick random indices to remove
    rem_idx = np.random.choice(idx, size=rem, replace=False)
    # Make mask
    mask = np.ones_like(binary_array, dtype=bool)
    # Mask elements to remove
    mask[rem_idx] = False
    return mask
# Test
np.random.seed(0)
y = np.array([0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 2, 3, 4, 5, 6])
m = balance_binary_mask(y)
print(m)
# [ True True True True False True True]
y = y[m]
x = x[m]
print(y)
# [0 0 1 1 1 0]
print(x)
# [0 1 2 3 5 6]
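The "randomly select n elements from the highest count" idea from the question can also be written by sampling indices from each class directly. This is a sketch under assumptions: the function name balanced_indices and the seed parameter are made up for illustration, and it uses NumPy's Generator API (np.random.default_rng) rather than the legacy np.random.seed used above, so it will not reproduce the same random picks.

```python
import numpy as np

def balanced_indices(y, seed=None):
    """Return sorted indices that keep equally many zeros and ones
    (hypothetical helper name)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).astype(bool).ravel()
    ones = np.flatnonzero(y)
    zeros = np.flatnonzero(~y)
    k = min(len(ones), len(zeros))   # size of the smaller class
    keep = np.concatenate([
        rng.choice(ones, size=k, replace=False),
        rng.choice(zeros, size=k, replace=False),
    ])
    return np.sort(keep)             # preserve the original order

y = np.array([0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 2, 3, 4, 5, 6])
keep = balanced_indices(y, seed=0)
y_new, x_new = y[keep], x[keep]
print(np.bincount(y_new))   # equal counts of 0 and 1
```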

Numpy: Get rectangle area just the size of mask

I have an image and a mask. Both are NumPy arrays. I get the mask through graph segmentation (cv2.ximgproc.segmentation), so the masked area is not rectangular, although it is a single connected region. I'd like to get the rectangle that just encloses the masked area, but I don't know an efficient way to do it.
In other words, unmasked pixels are value of 0 and masked pixels are value over 0, so I want to get a rectangle where...
top = the smallest index of axis 0 whose value > 0
bottom = the largest index of axis 0 whose value > 0
left = the smallest index axis 1 whose value > 0
right = the largest index axis 1 whose value > 0
image = src[top : bottom + 1, left : right + 1]
My code is below
segmentation = cv2.ximgproc.segmentation.createGraphSegmentation()
src = cv2.imread('image_file')
segment = segmentation.processImage(src)
for i in range(np.max(segment)):
    dst = np.array(src)
    dst[segment != i] = 0
    cv2.imwrite('output_file', dst)
If you prefer pure NumPy, you can achieve this using np.where and np.meshgrid:
i, j = np.where(mask)
indices = np.meshgrid(np.arange(min(i), max(i) + 1),
                      np.arange(min(j), max(j) + 1),
                      indexing='ij')
# tuple() makes this a multidimensional index whether meshgrid returns a list or a tuple
sub_image = image[tuple(indices)]
np.where returns a tuple of arrays specifying, pairwise, the indices in each axis for each non-zero element of mask. We then create arrays of all the row and column indices we will want using np.arange, and use np.meshgrid to generate two grid-shaped arrays that index the part of the image we're interested in. Note that we specify matrix-style indexing using indexing='ij' to avoid having to transpose the result (the default is Cartesian-style indexing).
Essentially, meshgrid constructs indices so that:
image[tuple(indices)][a, b] == image[indices[0][a, b], indices[1][a, b]]
Example
Start with the following:
>>> image = np.arange(12).reshape((4, 3))
>>> image
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Let's say we want to extract the [[3,4],[6,7]] sub-matrix, which is the bounding rectangle for the following mask:
>>> mask = np.array([[0,0,0],[0,1,0],[1,0,0],[0,0,0]])
>>> mask
array([[0, 0, 0],
[0, 1, 0],
[1, 0, 0],
[0, 0, 0]])
Then, applying the above method:
>>> i, j = np.where(mask)
>>> indices = np.meshgrid(np.arange(min(i), max(i) + 1), np.arange(min(j), max(j) + 1), indexing='ij')
>>> image[tuple(indices)]
array([[3, 4],
[6, 7]])
Here, indices[0] is a matrix of row indices, while indices[1] is the corresponding matrix of column indices:
>>> indices[0]
array([[1, 1],
[2, 2]])
>>> indices[1]
array([[0, 1],
[0, 1]])
I think using np.amax and np.amin and cropping the image is much faster.
i, j = np.where(mask)
indices = np.meshgrid(np.arange(min(i), max(i) + 1),
                      np.arange(min(j), max(j) + 1),
                      indexing='ij')
sub_image = image[tuple(indices)]
Time taken: 50 msec
where = np.array(np.where(mask))
x1, y1 = np.amin(where, axis=1)
x2, y2 = np.amax(where, axis=1)
sub_image = image[x1:(x2+1), y1:(y2+1)]
Time taken: 5.6 msec
I don't get Hans's results when running the two methods (using NumPy 1.18.5). In any case, there is a much more efficient method, where you take the arg-max along each dimension:
i, j = np.where(mask)
y, x = np.meshgrid(
    np.arange(min(i), max(i) + 1),
    np.arange(min(j), max(j) + 1),
    indexing="ij",
)
Took 38 ms
where = np.array(np.where(mask))
y1, x1 = np.amin(where, axis=1)
y2, x2 = np.amax(where, axis=1) + 1
sub_image = image[y1:y2, x1:x2]
Took 35 ms
maskx = np.any(mask, axis=0)
masky = np.any(mask, axis=1)
x1 = np.argmax(maskx)
y1 = np.argmax(masky)
x2 = len(maskx) - np.argmax(maskx[::-1])
y2 = len(masky) - np.argmax(masky[::-1])
sub_image = image[y1:y2, x1:x2]
Took 2 ms
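The np.any / np.argmax approach can be packaged as a function. This is a sketch: the name crop_to_mask is made up for illustration, the threshold mask > 0 follows the question's description of masked pixels, and the check for an empty mask is an added assumption (without it, np.argmax on an all-False array returns 0 and the whole image would be returned).

```python
import numpy as np

def crop_to_mask(image, mask):
    """Crop `image` to the bounding rectangle of the non-zero pixels of
    `mask` (hypothetical helper name)."""
    mask = np.asarray(mask) > 0
    rows = np.any(mask, axis=1)   # True for rows containing masked pixels
    cols = np.any(mask, axis=0)   # True for columns containing masked pixels
    if not rows.any():
        raise ValueError("mask has no non-zero pixels")
    top = np.argmax(rows)                        # first True
    bottom = len(rows) - np.argmax(rows[::-1])   # one past the last True
    left = np.argmax(cols)
    right = len(cols) - np.argmax(cols[::-1])
    return image[top:bottom, left:right]

image = np.arange(12).reshape((4, 3))
mask = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
print(crop_to_mask(image, mask))
# [[3 4]
#  [6 7]]
```

Because the result is produced by basic slicing, it is a view into image rather than a copy.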

What is a pythonic way of finding maximum values and their indices for moving subarrays for numpy ndarray?

I have numpy ndarrays which could be 3 or 4 dimensional. I'd like to find maximum values and their indices in a moving subarray window with specified strides.
For example, suppose I have a 4x4 2d array and my moving subarray window is 2x2 with stride 2 for simplicity:
[[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9,10,11,12],
[13,14,15,16]].
I'd like to find
[[ 6 8],
[14 16]]
for max values and
[(1,1), (1,3),
(3,1), (3,3)]
for indices as output.
Is there a concise, efficient implementation for this for ndarray without using loops?
Here's a solution using stride_tricks:
import numpy as np
def make_panes(arr, window):
    arr = np.asarray(arr)
    r, c = arr.shape
    s_r, s_c = arr.strides
    w_r, w_c = window
    if c % w_c != 0 or r % w_r != 0:
        raise ValueError("Window doesn't fit array.")
    # Integer division: shapes must be ints in Python 3
    shape = (r // w_r, c // w_c, w_r, w_c)
    strides = (w_r*s_r, w_c*s_c, s_r, s_c)
    return np.lib.stride_tricks.as_strided(arr, shape, strides)
def max_in_panes(arr, window):
    arr = np.asarray(arr)
    w_r, w_c = window
    r, c = arr.shape
    n_r, n_c = r // w_r, c // w_c
    panes = make_panes(arr, window)
    # One row per pane, holding that pane's elements
    v = panes.reshape((-1, w_r * w_c))
    ix = np.argmax(v, axis=1)
    # Reshape to the pane grid so the result is 2d, as in the demo
    max_vals = v[np.arange(n_r * n_c), ix].reshape(n_r, n_c)
    # Top-left corner of each pane, in row-major pane order
    i = np.repeat(np.arange(0, r, w_r), n_c)
    j = np.tile(np.arange(0, c, w_c), n_r)
    # Position of the maximum inside each pane
    rel_i, rel_j = np.unravel_index(ix, window)
    max_ix = i + rel_i, j + rel_j
    return max_vals, max_ix
A demo:
>>> x = np.arange(1, 17).reshape(4, 4)
>>> vals, ix = max_in_panes(x, (2,2))
>>> print(vals)
[[ 6 8]
[14 16]]
>>> print(ix)
(array([1, 1, 3, 3]), array([1, 3, 1, 3]))
Note that this is pretty untested, and is designed to work with 2d arrays. I'll leave the generalization to n-d arrays to the reader...
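For non-overlapping windows (stride equal to the window size), the same result can be sketched with reshape and swapaxes instead of stride tricks. This is an alternative under that assumption, not a generalization to arbitrary strides; the function name block_max is made up for illustration, and it is still limited to 2d input.

```python
import numpy as np

def block_max(arr, window):
    """Maximum value and its (row, col) index for each non-overlapping
    window of a 2d array (hypothetical helper name)."""
    arr = np.asarray(arr)
    r, c = arr.shape
    w_r, w_c = window
    if r % w_r or c % w_c:
        raise ValueError("Window doesn't fit array.")
    n_r, n_c = r // w_r, c // w_c
    # (n_r, w_r, n_c, w_c) -> (n_r, n_c, w_r, w_c): one block per window
    panes = arr.reshape(n_r, w_r, n_c, w_c).swapaxes(1, 2)
    flat = panes.reshape(n_r, n_c, w_r * w_c)
    ix = flat.argmax(axis=-1)
    max_vals = flat.max(axis=-1)
    # Offset of the maximum inside each window
    rel_i, rel_j = np.unravel_index(ix, window)
    # Top-left corner of each window
    base_i, base_j = np.meshgrid(np.arange(0, r, w_r),
                                 np.arange(0, c, w_c), indexing="ij")
    return max_vals, (base_i + rel_i, base_j + rel_j)

x = np.arange(1, 17).reshape(4, 4)
vals, (rows, cols) = block_max(x, (2, 2))
print(vals)
# [[ 6  8]
#  [14 16]]
print(list(zip(rows.ravel().tolist(), cols.ravel().tolist())))
# [(1, 1), (1, 3), (3, 1), (3, 3)]
```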
