Sparse matrix hstack getting error regarding subscriptability

Would someone please explain why this does not work?
import numpy as np
from scipy.sparse import coo_matrix, hstack
row = np.array([0,3,1,0])
col = np.array([0,3,1,2])
data = np.array([4,5,7,9])
temp = coo_matrix((data, (row, col)))
temp_stack = coo_matrix([0, 11,22,33], ([0, 1,2,3], [0, 0,0,0]))
temp_res = hstack(temp, temp_stack)
I get an error that coo_matrix is not subscriptable, but I don't understand why; it appears that I am concatenating matrices of compatible dimensions.

First note that the first argument of hstack is expected to be a tuple containing the arrays to be stacked, so you should call it with hstack((temp, temp_stack)).
Next, temp has shape (4, 4) and temp_stack has shape (1, 4). These shapes cannot be hstacked, because hstack needs every block to have the same number of rows. What shape do you expect the result to be? If you are trying to create a result that has shape (5, 4), use vstack (also from scipy.sparse):
In [28]: result = vstack((temp, temp_stack))
In [29]: result.A
Out[29]:
array([[ 4, 0, 9, 0],
[ 0, 7, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 5],
[ 0, 11, 22, 33]], dtype=int64)
If you meant for temp_stack to have shape (4, 1), then fix how it is created by adding an extra level of parentheses in the call of coo_matrix:
In [38]: temp_stack = coo_matrix(([0, 11, 22, 33], ([0, 1, 2, 3], [0, 0, 0, 0])))
In [39]: temp_stack.shape
Out[39]: (4, 1)
In [40]: result = hstack((temp, temp_stack))
In [41]: result.A
Out[41]:
array([[ 4, 0, 9, 0, 0],
[ 0, 7, 0, 0, 11],
[ 0, 0, 0, 0, 22],
[ 0, 0, 0, 5, 33]], dtype=int64)
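For reference, here is a self-contained version of both fixes. It is my own consolidation of the snippets above, not code from the question. It passes shape= as a keyword, which is optional but makes the intended orientation explicit and sidesteps the positional-argument mix-up discussed below. It also calls .toarray(), which works on both sparse matrices and sparse arrays.

```python
import numpy as np
from scipy.sparse import coo_matrix, hstack, vstack

# The 4x4 matrix from the question
row = np.array([0, 3, 1, 0])
col = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
temp = coo_matrix((data, (row, col)), shape=(4, 4))

# (data, (row, col)) must be ONE tuple argument; shape goes in the keyword
col_vec = coo_matrix(([0, 11, 22, 33], ([0, 1, 2, 3], [0, 0, 0, 0])), shape=(4, 1))
row_vec = coo_matrix(([0, 11, 22, 33], ([0, 0, 0, 0], [0, 1, 2, 3])), shape=(1, 4))

# Both functions take a sequence of blocks as their first argument
side_by_side = hstack((temp, col_vec))  # rows must match: (4, 4) + (4, 1) -> (4, 5)
stacked = vstack((temp, row_vec))       # columns must match: (4, 4) + (1, 4) -> (5, 4)

print(side_by_side.toarray())
print(stacked.toarray())
```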
By the way, I think it is a SciPy bug that this call
temp_stack = coo_matrix([0, 11,22,33], ([0, 1,2,3], [0, 0,0,0]))
does not raise an error. That call is equivalent to
temp_stack = coo_matrix(arg1=[0, 11,22,33], shape=([0, 1,2,3], [0, 0,0,0]))
and that shape value is clearly not valid. That call to coo_matrix should raise a ValueError. I created an issue for this on the SciPy github site: https://github.com/scipy/scipy/issues/9919

Related

setting the values of sliding windows of an array in numpy

Suppose I have a 2D array with shape (3, 3), call it a, and an array of zeros with shape (7, 7, 5, 5), call it b. I want to modify b in the following way:
for p in range(5):
    for q in range(5):
        b[p:p + 3, q:q + 3, p, q] = a
Given:
a = np.array([[4, 2, 2],
[9, 0, 5],
[9, 9, 4]])
b = np.zeros((7, 7, 5, 5), dtype=int)
b would end up something like:
>>> b[:, :, 0, 0]
array([[4, 2, 2, 0, 0, 0, 0],
[9, 0, 5, 0, 0, 0, 0],
[9, 9, 4, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
>>> b[:, :, 0, 1]
array([[0, 4, 2, 2, 0, 0, 0],
[0, 9, 0, 5, 0, 0, 0],
[0, 9, 9, 4, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])

One way to think about this is to make a sliding-window view of b (6-D), slice out the parts you want (3-D or 4-D), and assign a to them.
However, there is a simpler way to do this altogether. The way a sliding window view works is by creating a dimension that steps along less than the full size of the dimension you are viewing. For example:
>>> x = np.array([1, 2, 3, 4])
>>> x
array([1, 2, 3, 4])
>>> window = np.lib.stride_tricks.as_strided(
...     x, shape=(x.shape[0] - 2, 3),
...     strides=x.strides * 2)  # same one-element stride on both axes
>>> window
array([[1, 2, 3],
       [2, 3, 4]])
I'm deliberately using np.lib.stride_tricks.as_strided rather than np.lib.stride_tricks.sliding_window_view here, because it lets you choose arbitrary strides (and returns a writeable view by default). You need that flexibility for this problem.
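As a quick check of the difference, the sketch below builds the same 1-D windows both ways. This comparison is my own and assumes NumPy 1.20 or later for sliding_window_view. For this simple case the values agree. However, sliding_window_view only reuses the input's own strides, and it returns a read-only view unless you pass writeable=True.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

x = np.array([1, 2, 3, 4])

# Hand-built windows: both axes advance by one element
manual = as_strided(x, shape=(x.shape[0] - 2, 3), strides=x.strides * 2)

# Library helper (NumPy >= 1.20): windows of length 3 along the only axis
helper = sliding_window_view(x, 3)

print(manual)
print(helper.flags.writeable)  # False: read-only unless writeable=True is passed
```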
You can even use a stride that is larger than the axis you are viewing, as long as you are careful to stay inside the array's memory. Contiguous arrays are more forgiving in this case, but they are by no means a requirement. np.diag is an example of this; you could implement it something like this:
>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> diag = np.lib.stride_tricks.as_strided(
...     x, shape=(min(x.shape),),
...     strides=(sum(x.strides),))  # one row down plus one column right per step
>>> diag
array([ 0,  5, 10])
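Here is a small sketch of that diagonal trick wrapped in a function (strided_diag is a hypothetical name). It is checked against np.diagonal on square, wide, tall and transposed (non-contiguous) inputs. Because it reads the array's actual strides, the transposed case works without a copy.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_diag(x):
    """Main diagonal of a 2-D array as a view (hypothetical helper)."""
    # One step moves one row down AND one column right,
    # so the step size is the row stride plus the column stride.
    return as_strided(x, shape=(min(x.shape),), strides=(sum(x.strides),))

for shape in [(3, 4), (4, 3), (3, 3)]:
    x = np.arange(np.prod(shape)).reshape(shape)
    # x.T is non-contiguous, but its strides still describe it correctly
    for arr in (x, x.T):
        print(shape, strided_diag(arr), np.diagonal(arr))
```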
The trick is to make a view of only the parts of b you care about, laid out so that the assignment is easy. Because of broadcasting rules, you want the last two dimensions of the view to have shape a.shape, with strides b.strides[:2], since those are the axes of b along which you want to place a.
The first two dimensions of the view will be responsible for making the copies of a. You want 25 copies, so the shape will be (5, 5). The strides are the trickier part. Let's take a look at a 2D case, just because that's easier to visualize, and then attempt to generalize:
>>> a0 = np.array([1, 2])
>>> b0 = np.zeros((4, 3), dtype=int)
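>>> b0[0:2, 0] = b0[1:3, 1] = b0[2:4, 2] = a0  # target layout, written by hand
The goal is to get that same layout from a single assignment, through a view that strides along the diagonal of b0 in its first axis. Starting again from zeros:
>>> b0 = np.zeros((4, 3), dtype=int)
>>> np.lib.stride_tricks.as_strided(
...     b0, shape=(b0.shape[0] - a0.shape[0] + 1, a0.shape[0]),
...     strides=(sum(b0.strides), b0.strides[0]))[:] = a0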
>>> b0
array([[1, 0, 0],
[2, 1, 0],
[0, 2, 1],
[0, 0, 2]])
So that's what you do for b, except that each diagonal stride is the sum of one of the first two strides of b and the matching one of the last two:
import numpy as np
a = np.array([[4, 2, 2],
[9, 0, 5],
[9, 9, 4]])
b = np.zeros((7, 7, 5, 5), dtype=int)
vshape = (*np.subtract(b.shape[:a.ndim], a.shape) + 1,
*a.shape)
vstrides = (*np.add(b.strides[:a.ndim], b.strides[a.ndim:]),
*b.strides[:a.ndim])
np.lib.stride_tricks.as_strided(b, shape=vshape, strides=vstrides)[:] = a
TL;DR
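import numpy as np
def emplace_window(a, b):
    # Position axes first (one per axis of a), then the window axes (a.shape)
    vshape = (*np.subtract(b.shape[:a.ndim], a.shape) + 1, *a.shape)
    # Position axes step diagonally; window axes step along b's first axes
    vstrides = (*np.add(b.strides[:a.ndim], b.strides[a.ndim:]), *b.strides[:a.ndim])
    np.lib.stride_tricks.as_strided(b, shape=vshape, strides=vstrides)[:] = a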
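I've phrased it this way because it now applies to any number of dimensions. The only expectations are that 2 * a.ndim == b.ndim and that b.shape[a.ndim:] equals b.shape[:a.ndim] - a.shape + 1 elementwise. as_strided does no bounds checking, so if these conditions don't hold, the assignment can write outside b's memory.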
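To sanity-check emplace_window, the sketch below compares it with the question's double loop and also reproduces the 2-D b0 example. The shape check at the top of the function and the helper name emplace_window_loop are my additions, not part of the answer. The check refuses inputs that violate the expectations above instead of letting as_strided write out of bounds.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def emplace_window(a, b):
    # Guard (my addition): enforce the two shape expectations before striding
    if 2 * a.ndim != b.ndim:
        raise ValueError("b must have twice as many dimensions as a")
    expected = tuple(np.subtract(b.shape[:a.ndim], a.shape) + 1)
    if expected != b.shape[a.ndim:]:
        raise ValueError("b.shape[a.ndim:] must equal b.shape[:a.ndim] - a.shape + 1")
    vshape = (*expected, *a.shape)
    vstrides = (*np.add(b.strides[:a.ndim], b.strides[a.ndim:]), *b.strides[:a.ndim])
    as_strided(b, shape=vshape, strides=vstrides)[:] = a

def emplace_window_loop(a, b):
    # The question's double loop for the 2-D case (hypothetical name)
    for p in range(b.shape[2]):
        for q in range(b.shape[3]):
            b[p:p + a.shape[0], q:q + a.shape[1], p, q] = a

a = np.array([[4, 2, 2], [9, 0, 5], [9, 9, 4]])
b_fast = np.zeros((7, 7, 5, 5), dtype=int)
b_loop = np.zeros_like(b_fast)
emplace_window(a, b_fast)
emplace_window_loop(a, b_loop)
print(np.array_equal(b_fast, b_loop))

# The 2-D b0 example: a 1-D a0 placed at successive diagonal positions
a0 = np.array([1, 2])
b0 = np.zeros((4, 3), dtype=int)
emplace_window(a0, b0)
print(b0)
```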

numpy resize n-dimensional array with padding

I have two arrays, a and b.
a has shape (1, 2, 3, 4)
b has shape (4, 3, 2, 1)
I would like to make them both (4, 3, 3, 4) with the new positions filled with 0's.
I can do:
new_shape = (4, 3, 3, 4)
a = np.resize(a, new_shape)
b = np.resize(b, new_shape)
..but this repeats the elements of each to form the new elements, which does not work for me.
Instead I thought I could do:
a = a.resize(new_shape)
b = b.resize(new_shape)
..which according to the documentation pads with 0's.
But it doesn't work for multi-dimensional arrays, raising error:
ValueError: resize only works on single-segment arrays
So is there a different way to achieve this, i.e. the same as np.resize but with 0-padding?
NB: I am only looking for pure-numpy solutions.
EDIT: I'm using numpy version 1.20.2
EDIT: I just found out that it works for numbers, but not for objects. I forgot to mention that it is an array of objects, not numbers.

The resize method pads with 0s in a flattened sense; the np.resize function pads with repeats of the flattened input.
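Here is a minimal illustration of the two behaviours on a 1-D input (my own example). I pass refcheck=False only because the method's reference-count check can refuse to resize in interactive sessions. That is safe here, since nothing else views the copy.

```python
import numpy as np

a = np.array([1, 2, 3])

# np.resize (function): new array, refilled by repeating the flattened input
repeated = np.resize(a, (2, 4))

# ndarray.resize (method): in place, flattened order, new tail zero-filled
padded = a.copy()
padded.resize((2, 4), refcheck=False)

print(repeated)
print(padded)
```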
To illustrate how resize "flattens" before padding:
In [108]: a = np.arange(12).reshape(1,4,3)
In [109]: a
Out[109]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]]])
In [110]: a1 = a.copy()
In [111]: a1.resize((2,4,4))
In [112]: a1
Out[112]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[ 0, 0, 0, 0]],
[[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0]]])
If instead I make a target array of the right shape, and copy, I can maintain the original multidimensional block:
In [114]: res = np.zeros((2,4,4),a.dtype)
In [115]: res[:a.shape[0],:a.shape[1],:a.shape[2]]=a
In [116]: res
Out[116]:
array([[[ 0, 1, 2, 0],
[ 3, 4, 5, 0],
[ 6, 7, 8, 0],
[ 9, 10, 11, 0]],
[[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0]]])
I wrote out the slices explicitly (for clarity). Such a tuple could be created programmatically if needed.
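Here is one way to build that tuple of slices programmatically, sketched as a hypothetical helper named pad_to_shape. It assumes the input and target have the same number of dimensions and truncates any axis that shrinks. It keeps the input's dtype, so it should also cover the object arrays mentioned in the question's edit; new cells hold the fill value, which defaults to 0.

```python
import numpy as np

def pad_to_shape(arr, shape, fill=0):
    """Copy arr into the leading corner of a new array of the given shape.

    Hypothetical helper: growing axes are padded with fill, shrinking
    axes are truncated, and arr.dtype (including object) is preserved.
    """
    if arr.ndim != len(shape):
        raise ValueError("arr.ndim must equal len(shape)")
    res = np.full(shape, fill, dtype=arr.dtype)
    # One slice per axis, covering the overlap of the old and new sizes
    idx = tuple(slice(0, min(old, new)) for old, new in zip(arr.shape, shape))
    res[idx] = arr[idx]
    return res

a = np.arange(24).reshape(1, 2, 3, 4)
b = np.arange(24).reshape(4, 3, 2, 1)
new_shape = (4, 3, 3, 4)
print(pad_to_shape(a, new_shape).shape, pad_to_shape(b, new_shape).shape)
```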

How to product element-wise a 2-d numpy array into 3-d over the second dimension of the latter?

I have a numpy matrix b = np.array([[1,0,1,0],[0,0,0,1]]) and I want to multiply it element-wise with a 3-dim array a = np.array([[[1,2,3,4], [5,6,7,8], [9,10,11,12]], [[13,14,15,16], [17,18,19,20], [21,22,23,24]]]) for each index on the second dimension of a. So, the result I expect should be as follows:
[[[1,0,3,0], [5,0,7,0], [9,0,11,0]], [[0,0,0,16], [0,0,0,20], [0,0,0,24]]]
NumPy does not broadcast when I do a * b (the shapes (2, 3, 4) and (2, 4) are incompatible). I was thinking of broadcasting b along its second dimension. I tried np.broadcast_to(b, (2,3,4)), but I got an error. I also tried np.broadcast_to(b, (3,2,4)).reshape(2,3,4), but the output is not as expected.

Use None (np.newaxis) to add a new middle dimension (reshape also does this):
In [36]: b.shape
Out[36]: (2, 4)
In [37]: a.shape
Out[37]: (2, 3, 4)
In [38]: b[:,None,:]*a
Out[38]:
array([[[ 1, 0, 3, 0],
[ 5, 0, 7, 0],
[ 9, 0, 11, 0]],
[[ 0, 0, 0, 16],
[ 0, 0, 0, 20],
[ 0, 0, 0, 24]]])
In [39]: b[:,None,:].shape
Out[39]: (2, 1, 4)
broadcast_to can't add that extra dimension automatically. It follows the same rules as b*a operations: it can add leading dimensions if needed, and it can expand size-1 dimensions. For anything else, you have to be explicit.
In [41]: np.broadcast_to(b, (2,3,4))
Traceback (most recent call last):
File "<ipython-input-41-3c3268de7ce1>", line 1, in <module>
np.broadcast_to(b, (2,3,4))
File "<__array_function__ internals>", line 5, in broadcast_to
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/stride_tricks.py", line 411, in broadcast_to
return _broadcast_to(array, shape, subok=subok, readonly=True)
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/stride_tricks.py", line 348, in _broadcast_to
it = np.nditer(
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,4) and requested shape (2,3,4)
In [42]: np.broadcast_to(b[:,None,:], (2,3,4))
Out[42]:
array([[[1, 0, 1, 0],
[1, 0, 1, 0],
[1, 0, 1, 0]],
[[0, 0, 0, 1],
[0, 0, 0, 1],
[0, 0, 0, 1]]])

You need to reshape b to (2, 1, 4); the -1 resolves to 1 here:
c = b.reshape(2,-1,4)*a
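Here are both answers together in one runnable check (my own consolidation). It uses None/np.newaxis, a reshape that takes the sizes from b instead of hard-coding 2 and 4, and np.expand_dims, which inserts the same length-1 axis.

```python
import numpy as np

b = np.array([[1, 0, 1, 0], [0, 0, 0, 1]])
a = np.arange(1, 25).reshape(2, 3, 4)  # same values as the question's a

# (2, 4) -> (2, 1, 4): the length-1 axis broadcasts against a's middle axis of 3
via_newaxis = b[:, np.newaxis, :] * a
via_reshape = b.reshape(b.shape[0], 1, b.shape[1]) * a
via_expand = np.expand_dims(b, axis=1) * a

print(via_newaxis)
```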

Construct (N+1)-dimensional diagonal matrix from values in N-dimensional array

I have an N-dimensional array. I want to expand it to an (N+1)-dimensional array by putting the values of the final dimension in the diagonal.
For example, using explicit looping:
In [197]: M = numpy.arange(5*3).reshape(5, 3)
In [198]: numpy.dstack([numpy.diag(M[i, :]) for i in range(M.shape[0])]).T
Out[198]:
array([[[ 0, 0, 0],
[ 0, 1, 0],
[ 0, 0, 2]],
[[ 3, 0, 0],
[ 0, 4, 0],
[ 0, 0, 5]],
[[ 6, 0, 0],
[ 0, 7, 0],
[ 0, 0, 8]],
[[ 9, 0, 0],
[ 0, 10, 0],
[ 0, 0, 11]],
[[12, 0, 0],
[ 0, 13, 0],
[ 0, 0, 14]]])
which is a 5×3×3 array.
My actual arrays are large and I would like to avoid explicit looping (hiding the looping in map instead of a list comprehension has no performance gain; it's still a loop). Although numpy.diag works for constructing a regular, 2-D diagonal matrix, it does not extend to higher dimensions (when given a 2-D array, it will extract its diagonal instead). The array returned by numpy.diagflat makes everything into one big diagonal, resulting in a 15×15 array which has far more zeroes and cannot be reshaped into 5×3×3.
Is there a way to efficiently construct an (N+1)-dimensional diagonal array from the values in an N-dimensional array, without calling diag many times?

Use numpy.diagonal to take a view of the relevant diagonals of a properly-shaped N+1-dimensional array, force the view to be writeable with setflags, and write to the view:
import numpy
expanded = numpy.zeros(M.shape + M.shape[-1:], dtype=M.dtype)
diagonals = numpy.diagonal(expanded, axis1=-2, axis2=-1)
diagonals.setflags(write=True)
diagonals[:] = M
This produces your desired array as expanded.

You can use an almost-impossible-to-guess-if-you-don't-know feature of the ubiquitous np.einsum. When used as follows, einsum returns a writable view of the generalized diagonal:
>>> import numpy as np
>>> M = np.arange(5*3).reshape(5, 3)
>>>
>>> out = np.zeros((*M.shape, M.shape[-1]), M.dtype)
>>> np.einsum('...jj->...j', out)[...] = M
>>> out
array([[[ 0, 0, 0],
[ 0, 1, 0],
[ 0, 0, 2]],
[[ 3, 0, 0],
[ 0, 4, 0],
[ 0, 0, 5]],
[[ 6, 0, 0],
[ 0, 7, 0],
[ 0, 0, 8]],
[[ 9, 0, 0],
[ 0, 10, 0],
[ 0, 0, 11]],
[[12, 0, 0],
[ 0, 13, 0],
[ 0, 0, 14]]])
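The sketch below checks the einsum trick against the explicit loop from the question, for both a 2-D and a 3-D M. The second function swaps in plain advanced indexing (out[..., idx, idx] = M), which is a different technique and not taken from the answers here. All three function names are hypothetical.

```python
import numpy as np

def diag_embed_einsum(M):
    # einsum returns a writable view of out[..., j, j]; fill it in one assignment
    out = np.zeros((*M.shape, M.shape[-1]), M.dtype)
    np.einsum('...jj->...j', out)[...] = M
    return out

def diag_embed_index(M):
    # Alternative (not from the answers): advanced indexing on the last two axes
    n = M.shape[-1]
    out = np.zeros((*M.shape, n), M.dtype)
    idx = np.arange(n)
    out[..., idx, idx] = M
    return out

def diag_embed_loop(M):
    # Reference: one np.diag call per length-n vector
    n = M.shape[-1]
    return np.stack([np.diag(v) for v in M.reshape(-1, n)]).reshape(*M.shape, n)

for M in (np.arange(15).reshape(5, 3), np.arange(24).reshape(2, 3, 4)):
    print(M.shape, np.array_equal(diag_embed_einsum(M), diag_embed_loop(M)),
          np.array_equal(diag_embed_index(M), diag_embed_loop(M)))
```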

A general way to turn the last dimension of an N-D array into a diagonal matrix:
We need to reduce the dimensionality of the array, apply numpy.diag() to each vector, and then rebuild the result with the original dimensionality + 1.
First, reshape the array to 2 dimensions:
M.reshape(-1, M.shape[-1])
Then use map to apply np.diag to each row, and rebuild the array with the extra dimension:
result.reshape([*M.shape, M.shape[-1]])
All of this combined gives the following:
result = np.array(list(map(
np.diag,
M.reshape(-1, M.shape[-1])
))).reshape([*M.shape, M.shape[-1]])
An example:
shape = np.arange(2,8)
M = np.arange(shape.prod()).reshape(shape)
print(M.shape) # (2, 3, 4, 5, 6, 7)
result = np.array(list(map(np.diag, M.reshape(-1, M.shape[-1])))).reshape([*M.shape, M.shape[-1]])
print(result.shape) # (2, 3, 4, 5, 6, 7, 7)
and result[0,0,0,0,2,:] (a 7×7 block) contains the following:
array([[14, 0, 0, 0, 0, 0, 0],
[ 0, 15, 0, 0, 0, 0, 0],
[ 0, 0, 16, 0, 0, 0, 0],
[ 0, 0, 0, 17, 0, 0, 0],
[ 0, 0, 0, 0, 18, 0, 0],
[ 0, 0, 0, 0, 0, 19, 0],
[ 0, 0, 0, 0, 0, 0, 20]])

Numpy resize and fill with specific value

How can I resize a numpy array and fill it with a specific value (if some dimension is extended)?
I found a way to extend my array with np.pad, but I can't shorten it:
>>> import numpy as np
>>> a = np.zeros((5, 5), dtype=np.uint16)
>>> a
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=uint16)
>>> np.pad(a, ((0, 1), (0,3)), mode='constant', constant_values=9)
array([[0, 0, 0, 0, 0, 9, 9, 9],
[0, 0, 0, 0, 0, 9, 9, 9],
[0, 0, 0, 0, 0, 9, 9, 9],
[0, 0, 0, 0, 0, 9, 9, 9],
[0, 0, 0, 0, 0, 9, 9, 9],
[9, 9, 9, 9, 9, 9, 9, 9]], dtype=uint16)
And if I use resize, I can't specify the value that I want to use.
>>> a.fill(5)
>>> a.resize((2, 7))
>>> a
array([[5, 5, 5, 5, 5, 5, 5],
       [5, 5, 5, 5, 5, 5, 5]], dtype=uint16)
But I would like:
>>> a
array([[5, 5, 5, 5, 5, 9, 9],
       [5, 5, 5, 5, 5, 9, 9]], dtype=uint16)
After some testing I created this function, but it only works when x_value changes or when y_value decreases. If I need to increase the y dimension, it doesn't work. Why?
VALUE_TO_FILL = 9
def resize(self, x_value, y_value):
    x_diff = self.np_array.shape[0] - x_value
    y_diff = self.np_array.shape[1] - y_value
    self.np_array.resize((x_value, y_value), refcheck=False)
    if x_diff < 0:
        self.np_array[x_diff:, :] = VALUE_TO_FILL
    if y_diff < 0:
        self.np_array[:, y_diff:] = VALUE_TO_FILL

Your array has a fixed size data buffer. You can reshape the array without changing that buffer. You can take a slice (view) without changing the buffer. But you can't add values to the array without changing the buffer.
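In general, resizing needs a new data buffer: np.resize returns a new array with its own buffer, and the resize method reallocates the array's data area in place. The method also keeps the data in flattened (row-major) order, so widening the second axis shifts existing values into the wrong rows. That is why your function fails when y_value increases.
np.pad is a complex function that handles general cases. But the simplest approach is to create the empty target array, fill it, and then copy the input into the right place.
Alternatively, pad could create the fill arrays and concatenate them with the original. But concatenate also creates an empty result array and copies everything into it.
A do-it-yourself pad with clipping could be structured as:
n, m = X.shape
R = np.empty((k, l), dtype=X.dtype)
R.fill(value)
# <calc slices from n, m, k, l>
R[slice1] = X[slice2]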
Calculating the slices may require if-else tests or equivalent min/max. You can probably work out those details.
This may be all that is needed:
R[:X.shape[0],:X.shape[1]]=X[:R.shape[0],:R.shape[1]]
That works because a slice that runs past the end of a dimension is not an error; it is simply clipped:
In [37]: np.arange(5)[:10]
Out[37]: array([0, 1, 2, 3, 4])
Thus, for example:
In [38]: X=np.ones((3,4),int)
In [39]: R=np.empty((2,5),int)
In [40]: R.fill(9)
In [41]: R[:X.shape[0],:X.shape[1]]=X[:R.shape[0],:R.shape[1]]
In [42]: R
Out[42]:
array([[1, 1, 1, 1, 9],
[1, 1, 1, 1, 9]])
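The one-liner generalizes to any number of dimensions. Below it is sketched as a hypothetical helper named resize_fill, which uses np.full instead of empty plus fill. It is checked against the (2, 7) result the question wants and against np.pad for the growing case.

```python
import numpy as np

def resize_fill(X, shape, value):
    """Hypothetical helper: keep X's leading block, fill new cells with value."""
    if X.ndim != len(shape):
        raise ValueError("X.ndim must equal len(shape)")
    R = np.full(shape, value, dtype=X.dtype)
    # Slices past the end of an axis are clipped, so the same pair of
    # slices handles both growing and shrinking along every axis.
    dst = tuple(slice(0, n) for n in X.shape)
    src = tuple(slice(0, n) for n in R.shape)
    R[dst] = X[src]
    return R

a = np.full((5, 5), 5, dtype=np.uint16)
print(resize_fill(a, (2, 7), 9))
```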

To shorten it, you can use negative stop values in a slice:
>>> import numpy as np
>>> a = np.zeros((5, 5), dtype=np.uint16)
>>> a
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=uint16)
>>> b = a[0:-1,0:-3]
>>> b
array([[0, 0],
[0, 0],
[0, 0],
[0, 0]], dtype=uint16)
