numpy resize n-dimensional array with padding - python

I have two arrays, a and b.
a has shape (1, 2, 3, 4)
b has shape (4, 3, 2, 1)
I would like to make them both (4, 3, 3, 4) with the new positions filled with 0's.
I can do:
new_shape = (4, 3, 3, 4)
a = np.resize(a, new_shape)
b = np.resize(b, new_shape)
..but this repeats the elements of each to form the new elements, which does not work for me.
Instead I thought I could do:
a.resize(new_shape)
b.resize(new_shape)
..which according to the documentation pads with 0's.
But it doesn't work for multi-dimensional arrays, raising error:
ValueError: resize only works on single-segment arrays
So is there a different way to achieve this? ie. same as np.resize but with 0-padding?
NB: I am only looking for pure-numpy solutions.
EDIT: I'm using numpy version 1.20.2
EDIT: I just found out that it works for numbers but not for objects. I forgot to mention that it is an array of objects, not numbers.

The resize method pads with 0s, but in a flattened sense; the np.resize function pads with repeats.
To illustrate how the resize method "flattens" before padding:
In [108]: a = np.arange(12).reshape(1,4,3)
In [109]: a
Out[109]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]]])
In [110]: a1 = a.copy()
In [111]: a1.resize((2,4,4))
In [112]: a1
Out[112]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[ 0, 0, 0, 0]],
[[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0]]])
If instead I make a target array of the right shape, and copy, I can maintain the original multidimensional block:
In [114]: res = np.zeros((2,4,4),a.dtype)
In [115]: res[:a.shape[0],:a.shape[1],:a.shape[2]]=a
In [116]: res
Out[116]:
array([[[ 0, 1, 2, 0],
[ 3, 4, 5, 0],
[ 6, 7, 8, 0],
[ 9, 10, 11, 0]],
[[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0]]])
I wrote out the slices explicitly (for clarity). Such a tuple could be created programmatically if needed.
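As a rough sketch of building that slice tuple programmatically: the helper name pad_to_shape and its fill_value parameter are my own, not part of NumPy, and it assumes every entry of new_shape is at least as large as the matching entry of arr.shape. Because it only allocates and assigns, it should also work for object arrays.

```python
import numpy as np

def pad_to_shape(arr, new_shape, fill_value=0):
    """Hypothetical helper: copy arr into the leading corner of a filled array."""
    # Allocate the target with the same dtype (works for dtype=object too).
    out = np.full(new_shape, fill_value, dtype=arr.dtype)
    # One slice per axis, covering exactly the extent of the original array.
    region = tuple(slice(0, n) for n in arr.shape)
    out[region] = arr
    return out

a = np.arange(24).reshape(1, 2, 3, 4)
b = np.empty((4, 3, 2, 1), dtype=object)
b.fill("x")

new_shape = (4, 3, 3, 4)
a_padded = pad_to_shape(a, new_shape)
b_padded = pad_to_shape(b, new_shape)
print(a_padded.shape, b_padded.shape)
```

The original block stays where it was along every axis, and everything outside it holds fill_value.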

Related

setting the values of sliding windows of an array in numpy

Suppose I have a 2D array with shape (3, 3), call it a, and an array of zeros with shape (7, 7, 5, 5), call it b. I want to modify b in the following way:
for p in range(5):
    for q in range(5):
        b[p:p + 3, q:q + 3, p, q] = a
Given:
a = np.array([[4, 2, 2],
[9, 0, 5],
[9, 9, 4]])
b = np.zeros((7, 7, 5, 5), dtype=int)
b would end up something like:
>>> b[:, :, 0, 0]
array([[4, 2, 2, 0, 0, 0, 0],
[9, 0, 5, 0, 0, 0, 0],
[9, 9, 4, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
>>> b[:, :, 0, 1]
array([[0, 4, 2, 2, 0, 0, 0],
[0, 9, 0, 5, 0, 0, 0],
[0, 9, 9, 4, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
One way to think about this is to make a sliding window view of b (6D), slice out the parts you want (3D or 4D), and assign a to them.
However, there is a simpler way to do this altogether. The way a sliding window view works is by creating a dimension that steps along less than the full size of the dimension you are viewing. For example:
>>> x = np.array([1, 2, 3, 4])
array([1, 2, 3, 4])
>>> window = np.lib.stride_tricks.as_strided(
x, shape=(x.shape[0] - 2, 3),
strides=x.strides * 2)
[[1 2 3]
[2 3 4]]
I'm deliberately using np.lib.stride_tricks.as_strided rather than np.lib.stride_tricks.sliding_window_view here because it has a certain flexibility that you need.
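For the simple window above, the two functions should give the same result. A small check, assuming NumPy 1.20 or later (where sliding_window_view was added):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

x = np.array([1, 2, 3, 4])

# Hand-built view: 2 windows of length 3, each starting one element later.
manual = as_strided(x, shape=(x.shape[0] - 2, 3), strides=x.strides * 2)

# Built-in equivalent; it only supports this "ordinary" kind of window.
builtin = sliding_window_view(x, 3)

print(np.array_equal(manual, builtin))
```

What sliding_window_view does not let you do is pick arbitrary strides, which is what the rest of this answer relies on.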
You can have a stride that is larger than the axis you are viewing, as long as you are careful. Contiguous arrays are more forgiving in this case, but by no means a requirement. An example of this is np.diag. You can implement it something like this:
>>> x = np.arange(12).reshape(3, 4)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> diag = np.lib.stride_tricks.as_strided(
x, shape=(min(x.shape),),
strides=(sum(x.strides),))
array([ 0, 5, 10])
The trick is to make a view of only the parts of b you care about in a way that makes the assignment easy. Because of broadcasting rules, you will want the last two dimensions of the view to be a.shape, and the strides to be b.strides[:2], since that's where you want to place a.
The first two dimensions of the view will be responsible for making the copies of a. You want 25 copies, so the shape will be (5, 5). The strides are the trickier part. Let's take a look at a 2D case, just because that's easier to visualize, and then attempt to generalize:
>>> a0 = np.array([1, 2])
>>> b0 = np.zeros((4, 3), dtype=int)
>>> b0[0:2, 0] = b0[1:3, 1] = b0[2:4, 2] = a0
The goal is to make a view that strides along the diagonal of b0 in the first axis. So:
>>> np.lib.stride_tricks.as_strided(
b0, shape=(b0.shape[0] - a0.shape[0] + 1, a0.shape[0]),
strides=(sum(b0.strides), b0.strides[0]))[:] = a0
>>> b0
array([[1, 0, 0],
[2, 1, 0],
[0, 2, 1],
[0, 0, 2]])
So that's what you do for b, but adding up every second dimension:
a = np.array([[4, 2, 2],
[9, 0, 5],
[9, 9, 4]])
b = np.zeros((7, 7, 5, 5), dtype=int)
vshape = (*np.subtract(b.shape[:a.ndim], a.shape) + 1,
*a.shape)
vstrides = (*np.add(b.strides[:a.ndim], b.strides[a.ndim:]),
*b.strides[:a.ndim])
np.lib.stride_tricks.as_strided(b, shape=vshape, strides=vstrides)[:] = a
TL;DR
def emplace_window(a, b):
    vshape = (*np.subtract(b.shape[:a.ndim], a.shape) + 1, *a.shape)
    vstrides = (*np.add(b.strides[:a.ndim], b.strides[a.ndim:]), *b.strides[:a.ndim])
    np.lib.stride_tricks.as_strided(b, shape=vshape, strides=vstrides)[:] = a
I've phrased it this way because it now applies to any number of dimensions. The only expectations are that 2 * a.ndim == b.ndim and that b.shape[a.ndim:] equals b.shape[:a.ndim] - a.shape + 1, element by element.
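A quick sanity check of the strided version against the double loop from the question. This is a sketch: the only change from the function above is that I convert the shape and stride entries to plain Python ints, which is a precaution on my part rather than something the approach requires.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def emplace_window(a, b):
    # Leading axes of the view: one entry per window position.
    positions = np.subtract(b.shape[:a.ndim], a.shape) + 1
    vshape = (*(int(n) for n in positions), *a.shape)
    # Moving one window position steps diagonally: one step in a leading
    # axis of b plus one step in the matching trailing axis of b.
    diag = np.add(b.strides[:a.ndim], b.strides[a.ndim:])
    vstrides = (*(int(s) for s in diag), *b.strides[:a.ndim])
    as_strided(b, shape=vshape, strides=vstrides)[:] = a

a = np.array([[4, 2, 2],
              [9, 0, 5],
              [9, 9, 4]])

# Reference result from the explicit loops.
b_loop = np.zeros((7, 7, 5, 5), dtype=int)
for p in range(5):
    for q in range(5):
        b_loop[p:p + 3, q:q + 3, p, q] = a

b_view = np.zeros((7, 7, 5, 5), dtype=int)
emplace_window(a, b_view)

print(np.array_equal(b_loop, b_view))
```

Keep in mind that as_strided does no bounds checking, so a wrong shape or stride can write outside the array. Comparing against the loop on a small case like this is a cheap safeguard.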

How to do indexing of a NumPy 3D-array based on 2D-array in Python?

Let's say I have a NumPy array A of shape (66, 5) and an array B of shape (100, 66, 5).
The elements of A will index the first dimension (axis=0) of B, where the values are from 0 to 99 (i.e. the first dimension of B is 100).
A =
array([[ 1, 0, 0, 1, 0],
[ 0, 2, 0, 2, 4],
[ 1, 7, 0, 5, 5],
[ 2, 1, 0, 1, 7],
[ 0, 7, 0, 1, 4],
[ 0, 0, 3, 6, 0]
.... ]])
For example, A[4,1] will take index 7 of the first dimension of B, index 4 of the second dimension of B and index 1 of the third dimension of B.
What I want is to produce an array C of shape (66, 5) that contains the elements of B selected by the elements of A.
You can use np.take_along_axis to do that:
import numpy as np
np.random.seed(0)
a = np.random.randint(100, size=(66, 5))
b = np.random.random(size=(100, 66, 5))
c = np.take_along_axis(b, a[np.newaxis], axis=0)[0]
# Test some element
print(c[25, 3] == b[a[25, 3], 25, 3])
# True
If I understand correctly, you are looking for advanced indexing of the first dimension of B. You can use np.indices to create the indices required for the other two dimensions of B and use advanced indexing:
idx = np.indices(A.shape)
C = B[A,idx[0],idx[1]]
Example:
B = np.random.rand(10,20,30)
A = np.array([[ 1, 0, 0, 1, 0],
              [ 0, 2, 0, 2, 4],
              [ 1, 7, 0, 5, 5],
              [ 2, 1, 0, 1, 7],
              [ 0, 7, 0, 1, 4],
              [ 0, 0, 3, 6, 0]])
idx = np.indices(A.shape)
C = B[A,idx[0],idx[1]]
print(C[4,1]==B[7,4,1])
#True
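To convince yourself that np.take_along_axis and the np.indices approach select the same elements, here is a small sketch comparing both against an explicit double loop. The random data and the names C_take, C_adv and C_loop are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 100, size=(66, 5))   # indices into axis 0 of B
B = rng.random((100, 66, 5))

# Approach 1: take_along_axis needs A to have the same ndim as B.
C_take = np.take_along_axis(B, A[np.newaxis], axis=0)[0]

# Approach 2: advanced indexing with explicit row/column index grids.
rows, cols = np.indices(A.shape)
C_adv = B[A, rows, cols]

# Reference: C[i, j] = B[A[i, j], i, j]
C_loop = np.empty(A.shape)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        C_loop[i, j] = B[A[i, j], i, j]

print(np.array_equal(C_take, C_loop), np.array_equal(C_adv, C_loop))
```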
Use the following (using functions of NumPy library):
print(A)
# array([[2, 0],
# [1, 1],
# [2, 0]])
print(B)
# array([[[ 5, 7],
# [ 0, 0],
# [ 0, 0]],
# [[ 1, 8],
# [ 1, 9],
# [10, 1]],
# [[12, 22],
# [ 2, 2],
# [ 2, 2]]])
temp = A.reshape(-1) + np.cumsum(np.ones([A.reshape(-1).shape[0]])*B.shape[0], dtype = 'int') - B.shape[0]
C = B.swapaxes(0, 1).swapaxes(2, 1).reshape(-1)[temp].reshape(A.shape)
print(C)
# array([[12, 7],
# [ 1, 9],
# [ 2, 0]])

Sparse matrix hstack getting error regarding subscriptability

Would someone please explain why this does not work?
from scipy.sparse import coo_matrix, hstack
row = np.array([0,3,1,0])
col = np.array([0,3,1,2])
data = np.array([4,5,7,9])
temp = coo_matrix((data, (row, col)))
temp_stack = coo_matrix([0, 11,22,33], ([0, 1,2,3], [0, 0,0,0]))
temp_res = hstack(temp, temp_stack)
I get an error that coo_matrix is not subscriptable, but I don't understand why: it appears that I am concatenating matrices of compatible dimensions.
First note that the first argument of hstack is expected to be a tuple containing the arrays to be stacked, so you should call it with hstack((temp, temp_stack)).
Next, temp has shape (4, 4) and temp_stack has shape (1, 4). These shapes cannot be hstacked. What shape do you expect the result to be? If you are trying to create a result that has shape (5, 4), use vstack:
In [28]: result = vstack((temp, temp_stack))
In [29]: result.A
Out[29]:
array([[ 4, 0, 9, 0],
[ 0, 7, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 5],
[ 0, 11, 22, 33]], dtype=int64)
If you meant for temp_stack to have shape (4, 1), then fix how it is created by adding an extra level of parentheses in the call of coo_matrix:
In [38]: temp_stack = coo_matrix(([0, 11, 22, 33], ([0, 1, 2, 3], [0, 0, 0, 0])))
In [39]: temp_stack.shape
Out[39]: (4, 1)
In [40]: result = hstack((temp, temp_stack))
In [41]: result.A
Out[41]:
array([[ 4, 0, 9, 0, 0],
[ 0, 7, 0, 0, 11],
[ 0, 0, 0, 0, 22],
[ 0, 0, 0, 5, 33]], dtype=int64)
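Putting both fixes together as one self-contained script. This is only a sketch of the corrected question code; I use .toarray() instead of the .A shorthand, since .toarray() is the spelled-out method for getting a dense array.

```python
import numpy as np
from scipy.sparse import coo_matrix, hstack

row = np.array([0, 3, 1, 0])
col = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
temp = coo_matrix((data, (row, col)))            # shape (4, 4)

# Note the extra parentheses: (data, (row, col)) is ONE argument.
temp_stack = coo_matrix(([0, 11, 22, 33],
                         ([0, 1, 2, 3], [0, 0, 0, 0])))   # shape (4, 1)

# hstack takes a single sequence of matrices, not separate arguments.
result = hstack((temp, temp_stack))
dense = result.toarray()
print(result.shape)
print(dense)
```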
By the way, I think it is a SciPy bug that this call
temp_stack = coo_matrix([0, 11,22,33], ([0, 1,2,3], [0, 0,0,0]))
does not raise an error. That call is equivalent to
temp_stack = coo_matrix(arg1=[0, 11,22,33], shape=([0, 1,2,3], [0, 0,0,0]))
and that shape value is clearly not valid. That call to coo_matrix should raise a ValueError. I created an issue for this on the SciPy GitHub site: https://github.com/scipy/scipy/issues/9919

Construct (N+1)-dimensional diagonal matrix from values in N-dimensional array

I have an N-dimensional array. I want to expand it to an (N+1)-dimensional array by putting the values of the final dimension in the diagonal.
For example, using explicit looping:
In [197]: M = numpy.arange(5*3).reshape(5, 3)
In [198]: numpy.dstack([numpy.diag(M[i, :]) for i in range(M.shape[0])]).T
Out[198]:
array([[[ 0, 0, 0],
[ 0, 1, 0],
[ 0, 0, 2]],
[[ 3, 0, 0],
[ 0, 4, 0],
[ 0, 0, 5]],
[[ 6, 0, 0],
[ 0, 7, 0],
[ 0, 0, 8]],
[[ 9, 0, 0],
[ 0, 10, 0],
[ 0, 0, 11]],
[[12, 0, 0],
[ 0, 13, 0],
[ 0, 0, 14]]])
which is a 5×3×3 array.
My actual arrays are large and I would like to avoid explicit looping (hiding the looping in map instead of a list comprehension has no performance gain; it's still a loop). Although numpy.diag works for constructing a regular, 2-D diagonal matrix, it does not extend to higher dimensions (when given a 2-D array, it will extract its diagonal instead). The array returned by numpy.diagflat makes everything into one big diagonal, resulting in a 15×15 array which has far more zeroes and cannot be reshaped into 5×3×3.
Is there a way to efficiently construct an (N+1)-dimensional diagonal matrix from the values in an N-dimensional array, without calling diag many times?
Use numpy.diagonal to take a view of the relevant diagonals of a properly-shaped N+1-dimensional array, force the view to be writeable with setflags, and write to the view:
expanded = numpy.zeros(M.shape + M.shape[-1:], dtype=M.dtype)
diagonals = numpy.diagonal(expanded, axis1=-2, axis2=-1)
diagonals.setflags(write=True)
diagonals[:] = M
This produces your desired array as expanded.
You can use an almost-impossible-to-guess-if-you-don't-know feature of the ubiquitous np.einsum. When used as follows einsum will return a writable view of the generalized diagonal:
>>> import numpy as np
>>> M = np.arange(5*3).reshape(5, 3)
>>>
>>> out = np.zeros((*M.shape, M.shape[-1]), M.dtype)
>>> np.einsum('...jj->...j', out)[...] = M
>>> out
array([[[ 0, 0, 0],
[ 0, 1, 0],
[ 0, 0, 2]],
[[ 3, 0, 0],
[ 0, 4, 0],
[ 0, 0, 5]],
[[ 6, 0, 0],
[ 0, 7, 0],
[ 0, 0, 8]],
[[ 9, 0, 0],
[ 0, 10, 0],
[ 0, 0, 11]],
[[12, 0, 0],
[ 0, 13, 0],
[ 0, 0, 14]]])
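A small cross-check of the two view-based answers against the per-row np.diag loop from the question. This is a sketch and the variable names are mine; it assumes a NumPy version in which np.diagonal returns a read-only view that can be made writable, as the first answer describes.

```python
import numpy as np

M = np.arange(5 * 3).reshape(5, 3)

# Reference result: one np.diag call per row, stacked along a new first axis.
expected = np.stack([np.diag(row) for row in M])

# Answer 1: writable view obtained through np.diagonal + setflags.
out_diag = np.zeros(M.shape + M.shape[-1:], dtype=M.dtype)
view = np.diagonal(out_diag, axis1=-2, axis2=-1)
view.setflags(write=True)
view[:] = M

# Answer 2: view of the generalized diagonal obtained through einsum.
out_einsum = np.zeros(M.shape + M.shape[-1:], dtype=M.dtype)
np.einsum('...jj->...j', out_einsum)[...] = M

print(np.array_equal(out_diag, expected), np.array_equal(out_einsum, expected))
```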
A general way to turn the last dimension of an N-D array into a diagonal matrix:
We need to reduce the dimensionality of the array, apply the numpy.diag() function to each vector, and then rebuild the result to the original dimensionality + 1.
First, reshape the array to 2 dimensions:
M.reshape(-1, M.shape[-1])
Then use map to apply np.diag to each row, and rebuild the array with an additional dimension using the following:
result.reshape([*M.shape, M.shape[-1]])
All of this combined gives the following:
result = np.array(list(map(
    np.diag,
    M.reshape(-1, M.shape[-1])
))).reshape([*M.shape, M.shape[-1]])
An example:
shape = np.arange(2,8)
M = np.arange(shape.prod()).reshape(shape)
print(M.shape) # (2, 3, 4, 5, 6, 7)
result = np.array(list(map(np.diag, M.reshape(-1, M.shape[-1])))).reshape([*M.shape, M.shape[-1]])
print(result.shape) # (2, 3, 4, 5, 6, 7, 7)
and result[0,0,0,0,2,:] contains the following:
array([[14, 0, 0, 0, 0, 0, 0],
[ 0, 15, 0, 0, 0, 0, 0],
[ 0, 0, 16, 0, 0, 0, 0],
[ 0, 0, 0, 17, 0, 0, 0],
[ 0, 0, 0, 0, 18, 0, 0],
[ 0, 0, 0, 0, 0, 19, 0],
[ 0, 0, 0, 0, 0, 0, 20]])

Numpy resize and fill with specific value

How can I resize a numpy array and fill it with a specific value (if some dimension is extended)?
I found a way to extend my array with np.pad, but I can't shorten it:
>>> import numpy as np
>>> a = np.ndarray((5, 5), dtype=np.uint16)
>>> a
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=uint16)
>>> np.pad(a, ((0, 1), (0,3)), mode='constant', constant_values=9)
array([[0, 0, 0, 0, 0, 9, 9, 9],
[0, 0, 0, 0, 0, 9, 9, 9],
[0, 0, 0, 0, 0, 9, 9, 9],
[0, 0, 0, 0, 0, 9, 9, 9],
[0, 0, 0, 0, 0, 9, 9, 9],
[9, 9, 9, 9, 9, 9, 9, 9]], dtype=uint16)
And if I use resize, I can't specify the value that I want to use.
>>> a.fill(5)
>>> a.resize((2, 7))
>>> a
array([[5, 5, 5, 5, 5, 5, 5],
[5, 5, 5, 5, 5, 5, 5]], dtype=uint16)
But I would like
>>> a
array([[5, 5, 5, 5, 5, 9, 9],
[5, 5, 5, 5, 5, 9, 9]], dtype=uint16)
After some tests I wrote this function, but it only works when you change x_value or use a lower y_value. If you need to increase the y dimension it doesn't work. Why?
VALUE_TO_FILL = 9
def resize(self, x_value, y_value):
    x_diff = self.np_array.shape[0] - x_value
    y_diff = self.np_array.shape[1] - y_value
    self.np_array.resize((x_value, y_value), refcheck=False)
    if x_diff < 0:
        self.np_array[x_diff:, :] = VALUE_TO_FILL
    if y_diff < 0:
        self.np_array[:, y_diff:] = VALUE_TO_FILL
Your array has a fixed size data buffer. You can reshape the array without changing that buffer. You can take a slice (view) without changing the buffer. But you can't add values to the array without changing the buffer.
In general, resize returns a new array with a new data buffer.
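This is also why the function in the question breaks when the second dimension grows: the resize method works on the flattened data, so widening the rows reflows the old values instead of keeping them in their columns. A tiny illustration (the array here is made up for the example):

```python
import numpy as np

# Build the array directly so that it owns its data (resize requires that).
a = np.array([[0, 1, 2],
              [3, 4, 5]])

# Grow the second dimension from 3 to 4 columns, in place.
a.resize((2, 4), refcheck=False)

# The flat data 0..5 is kept in order and zeros are appended at the END,
# so the value 3 moves from row 1 into row 0.
print(a)
```

After the call the rows are [0, 1, 2, 3] and [4, 5, 0, 0], so filling the last column afterwards overwrites real data and leaves padding zeros elsewhere. When only the first dimension grows, the rows keep their length and the appended zeros really are the new rows, which is why that case appeared to work.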
pad is a complex function to handle general cases. But the simplest approach is to create the empty target array, fill it, and then copy the input into the right place.
Alternatively, pad could create the fill arrays and concatenate them with the original. But concatenate also allocates an empty result array and copies the inputs into it.
A do it yourself pad with clipping could be structured as:
n,m = X.shape
R = np.empty((k,l))
R.fill(value)
<calc slices from n,m,k,l>
R[slice1] = X[slice2]
Calculating the slices may require if-else tests or equivalent min/max. You can probably work out those details.
This may be all that is needed
R[:X.shape[0],:X.shape[1]]=X[:R.shape[0],:R.shape[1]]
That's because there's no problem if a slice is larger than the dimension.
In [37]: np.arange(5)[:10]
Out[37]: array([0, 1, 2, 3, 4])
Thus, for example:
In [38]: X=np.ones((3,4),int)
In [39]: R=np.empty((2,5),int)
In [40]: R.fill(9)
In [41]: R[:X.shape[0],:X.shape[1]]=X[:R.shape[0],:R.shape[1]]
In [42]: R
Out[42]:
array([[1, 1, 1, 1, 9],
[1, 1, 1, 1, 9]])
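The same idea can be wrapped up for any number of dimensions. A minimal sketch: resize_with_fill is a hypothetical name of mine, not a NumPy function, and it assumes new_shape has the same number of dimensions as the input.

```python
import numpy as np

def resize_with_fill(x, new_shape, fill_value):
    """Hypothetical helper: crop or extend x to new_shape, padding with fill_value."""
    # Start from a target that is entirely fill_value.
    out = np.full(new_shape, fill_value, dtype=x.dtype)
    # Along each axis, only the overlapping part of old and new is copied.
    overlap = tuple(slice(0, min(old, new))
                    for old, new in zip(x.shape, new_shape))
    out[overlap] = x[overlap]
    return out

a = np.full((5, 5), 5, dtype=np.uint16)
resized = resize_with_fill(a, (2, 7), 9)
print(resized)
```

For the (5, 5) array of fives from the question this gives two rows of five 5s followed by two 9s, which is the requested result. Unlike the in-place resize method it always returns a new array.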
To shorten it, you can use negative values in a slice:
>>> import numpy as np
>>> a = np.ndarray((5, 5), dtype=np.uint16)
>>> a
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=uint16)
>>> b = a[0:-1,0:-3]
>>> b
array([[0, 0],
[0, 0],
[0, 0],
[0, 0]], dtype=uint16)
