Indexing numpy array with index array of lower dim yields array of higher dim than both - python

a = np.zeros((5,4,3))
v = np.ones((5, 4), dtype=int)
data = a[v]
shp = data.shape
This code gives shp==(5,4,4,3)
I don't understand why. How can a larger array be output? It makes no sense to me, and I would love an explanation.

This is known as advanced indexing. Advanced indexing allows you to select arbitrary elements in the input array based on an N-dimensional index.
Let's use another example to make it clearer:
a = np.random.randint(1, 5, (5,4,3))
v = np.ones((5, 4), dtype=int)
Say in this case a is:
array([[[2, 1, 1],
[3, 4, 4],
[4, 3, 2],
[2, 2, 2]],
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
[[3, 1, 3],
[4, 3, 1],
[2, 1, 4],
[1, 2, 2]],
...
By indexing with an array of np.ones:
print(v)
array([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])
You will simply be indexing a with 1 along the first axis, once for every element of v. Put another way, when you do:
a[1]
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]]
You're indexing along the first axis only, since no index is given for the remaining axes. It is the same as doing a[1, ...], i.e. taking a full slice along the remaining axes. Hence, by indexing with a (5, 4) array of ones, you get the above (4, 3) array once for every element of v, resulting in an ndarray of shape (5, 4, 4, 3). In other words, a[1], of shape (4, 3), is stacked 5*4 = 20 times.
Hence, in this case you'd be getting:
array([[[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
...
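A minimal sketch to check the shape rule described above. It reuses the random a and the all-ones v from this answer; the comparison with np.take is an extra check added here, not part of the original answer.

```python
import numpy as np

a = np.random.randint(1, 5, (5, 4, 3))
v = np.ones((5, 4), dtype=int)
data = a[v]

# An integer index array on the first axis only: the result shape is
# v.shape followed by the remaining axes of a, i.e. (5, 4) + (4, 3).
print(data.shape)  # (5, 4, 4, 3)

# Every one of the 5*4 positions holds a copy of a[1].
print(all(np.array_equal(data[i, j], a[1]) for i in range(5) for j in range(4)))  # True

# np.take along axis 0 performs the same selection.
print(np.array_equal(data, np.take(a, v, axis=0)))  # True
```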

The value of v is:
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
Every single 1 selects a[1], which is a complete (4, 3) "matrix" taken along the first axis of a, so every "row" of v (four 1s) selects a "row" of four such "matrices".
(Does this make any sense to you?)
So you get 5 * 4 1s, each of which picks out a 4 * 3 "matrix".
If instead of zeros you define a as a = np.arange(5*4*3).reshape((5, 4, 3)),
it might be easier to understand, because you get to see which parts of a are being chosen:
import numpy as np
a = np.arange(5*4*3).reshape((5, 4, 3))
v = np.ones((5,4), dtype=int)
data = a[v]
print(data)
(output is pretty long, I don't want to paste it here)
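To avoid printing the whole result, a minimal sketch along the same lines uses a smaller, non-constant index array. The v below is made up for illustration; it shows that each position (i, j) of the result is the full block a[v[i, j]].

```python
import numpy as np

a = np.arange(5 * 4 * 3).reshape((5, 4, 3))
# Illustrative index array, not from the question: mixed row numbers.
v = np.array([[0, 4],
              [1, 1]])
data = a[v]

print(data.shape)  # (2, 2, 4, 3): v.shape + a.shape[1:]

# Each position (i, j) of the result is the full block a[v[i, j]].
for i in range(v.shape[0]):
    for j in range(v.shape[1]):
        assert np.array_equal(data[i, j], a[v[i, j]])

print(data[0, 1, 0])  # first row of a[4]: [48 49 50]
```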

Related

How to find and replace specific elements in numpy array?

I have a numpy array: a = [[1, 999, 3], [-1, 1, 3], [2, 999, 6]]
I want to find every instance of number 999 and replace it with the average of the two neighbouring numbers (999 is always in the middle).
I used the following code to try and make this work: np.where(a == 999, .5 * (a[0] + a[2]), a)
But the output I get uses the value calculated for the first row in every row: [[1, 2, 3], [-1, 1, 3], [2, 2, 6]] instead of: [[1, 2, 3], [-1, 1, 3], [2, 4, 6]]
How can I fix that?
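You can build a mask of the rows where the second column equals 999, and replace those entries with the mean of the respective first and third columns. I'm using np.ix_ here to avoid plain advanced indexing, which would pair the row and column indices element-wise; np.ix_ instead creates an open mesh from the input sequences, selecting columns 0 and 2 of every matching row: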
a = np.array([[1, 999, 3], [-1, 1, 3], [2, 999, 6]])
ix = a[:,1] == 999
a[ix, 1] = a[np.ix_(ix, [0,2])].mean(1)
print(a)
array([[ 1, 2, 3],
[-1, 1, 3],
[ 2, 4, 6]])
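The np.where idea from the question also works once the neighbours are taken per row. Assuming a is a NumPy array, a[0] and a[2] are the first and third rows, whereas the neighbours of the middle entry are the first and third columns. A minimal sketch that keeps those columns 2D so they broadcast row by row:

```python
import numpy as np

a = np.array([[1, 999, 3], [-1, 1, 3], [2, 999, 6]])

# a[:, [0]] and a[:, [2]] are the first and third columns with shape (3, 1),
# so their mean broadcasts across each row of a.
neighbour_mean = (a[:, [0]] + a[:, [2]]) / 2

# Where an entry is 999, take that row's neighbour mean; otherwise keep a.
result = np.where(a == 999, neighbour_mean, a)
print(result)
```

The division makes the result a float array; use result.astype(a.dtype) if you need integers back. The in-place assignment in the answer above, by contrast, silently truncates non-integer means because a has an integer dtype.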

How can I add a column to a numpy array

How can I add a column containing only 1s to the beginning of a 2D numpy array?
X = np.array([[1, 2], [3, 4], [5, 6]])
I want to have X become
[[1,1,2], [1,3,4],[1,5,6]]
You can use np.insert:
new_x = np.insert(x, 0, 1, axis=1)
Or you can use np.append to attach your array to the right of a column of 1s:
x = np.array([[1, 2], [3, 4], [5, 6]])
ones = np.array([[1]] * len(x))
new_x = np.append(ones, x, axis=1)
Both will give you the expected result:
[[1 1 2]
[1 3 4]
[1 5 6]]
Try this:
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> X
array([[1, 2],
[3, 4],
[5, 6]])
>>> np.insert(X, 0, 1, axis=1)
array([[1, 1, 2],
[1, 3, 4],
[1, 5, 6]])
Since a new array is going to be created in any event, it is sometimes easier to do so from the beginning. Since you want a column of 1s at the beginning, you can use built-in functions along with the input array's existing shape and dtype.
a = np.arange(6).reshape(3,2) # input array
z = np.ones((a.shape[0], 3), dtype=a.dtype) # use the row shape and your desired columns
z[:, 1:] = a # place the old array into the new array
z
array([[1, 0, 1],
[1, 2, 3],
[1, 4, 5]])
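The preallocation approach above hard-codes 3 columns. A minimal sketch that generalises it to any number of columns; prepend_column is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def prepend_column(a, value=1):
    """Hypothetical helper: return a copy of 2D array a with a constant first column."""
    # One extra column, filled with `value`, keeping the input dtype.
    out = np.full((a.shape[0], a.shape[1] + 1), value, dtype=a.dtype)
    out[:, 1:] = a  # copy the original columns after the new first column
    return out

a = np.arange(6).reshape(3, 2)
print(prepend_column(a))
# [[1 0 1]
#  [1 2 3]
#  [1 4 5]]
```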
numpy.insert() will do the trick.
X = np.array([[1, 2], [3, 4], [5, 6]])
np.insert(X,0,[1,2,3],axis=1)
The Output will be:
array([[1, 1, 2],
[2, 3, 4],
[3, 5, 6]])
Note that the second argument is the index before which you want to insert, and axis=1 indicates that you want to insert a column; without an axis, the array would be flattened first.
For reference:
numpy.insert()
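A minimal sketch contrasting the three np.insert calls discussed in these answers: no axis (the array is flattened), a scalar value (broadcast down the new column), and one value per row:

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])

flat = np.insert(X, 0, 1)                      # no axis: X is flattened first
col = np.insert(X, 0, 1, axis=1)               # scalar 1 fills the whole new column
per_row = np.insert(X, 0, [1, 2, 3], axis=1)   # one value per row

print(flat)     # [1 1 2 3 4 5 6]
print(col)      # [[1 1 2] [1 3 4] [1 5 6]] (printed over three lines)
print(per_row)  # [[1 1 2] [2 3 4] [3 5 6]] (printed over three lines)
```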

element-wise count along axis of values in numpy array

How can I get an element-wise count of each element's number of occurrences in a numpy array, along a given axis? By "element-wise," I mean each value of the array should be converted to the number of times it appears.
Simple 2D input:
[[1, 1, 1],
[2, 2, 2],
[3, 4, 5]]
Should output:
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]
The solution also needs to work relative to a given axis. For example, if my input array a has shape (4, 2, 3, 3), which I think of as "a 4x2 matrix of 3x3 matrices," running solution(a) should spit out a (4, 2, 3, 3) solution of the form above, where each 3x3 "submatrix" contains counts of the corresponding elements relative to that submatrix alone, rather than the entire numpy array at large.
More complex example: suppose I take the example input above a and call skimage.util.shape.view_as_windows(a, (2, 2)). This gives me array b of shape (2, 2, 2, 2):
[[[[1 1]
[2 2]]
[[1 1]
[2 2]]]
[[[2 2]
[3 4]]
[[2 2]
[4 5]]]]
Then solution(b) should output:
[[[[2 2]
[2 2]]
[[2 2]
[2 2]]]
[[[2 2]
[1 1]]
[[2 2]
[1 1]]]]
So even though the value 1 occurs 3 times in a and 4 times in b, it only occurs twice in each 2x2 window.
Starting approach
We can use np.unique to get the counts of occurrences and also tag each element from 0 onwards, letting us index into those counts with the tags for the desired output, like so -
In [43]: a
Out[43]:
array([[1, 1, 1],
[2, 2, 2],
[3, 4, 5]])
In [44]: _,ids,c = np.unique(a, return_counts=True, return_inverse=True)
In [45]: c[ids].reshape(a.shape)
Out[45]:
array([[3, 3, 3],
[3, 3, 3],
[1, 1, 1]])
For non-negative integers in the input array, we can also use np.bincount -
In [73]: c = np.bincount(a.ravel())
In [74]: c[a]
Out[74]:
array([[3, 3, 3],
[3, 3, 3],
[1, 1, 1]])
For negative integers, simply offset the array by its minimum first.
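A minimal sketch of that offset, using a small made-up array that contains negative values:

```python
import numpy as np

a = np.array([[-1, -1, 0],
              [ 2,  2, 2]])

shifted = a - a.min()              # smallest value becomes 0, so bincount accepts it
c = np.bincount(shifted.ravel())   # counts per shifted value
print(c[shifted])
# [[2 2 1]
#  [3 3 3]]
```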
Extending to generic n-dims
Let's use bincount for this -
In [107]: ar
Out[107]:
array([[[1, 1, 1],
[2, 2, 2],
[3, 4, 5]],
[[2, 3, 5],
[4, 3, 4],
[3, 1, 2]]])
In [104]: ar2D = ar.reshape(-1,ar.shape[-2]*ar.shape[-1])
# bincount2D_vectorized from https://stackoverflow.com/a/46256361/ #Divakar
In [105]: c = bincount2D_vectorized(ar2D)
In [106]: c[np.arange(ar2D.shape[0])[:,None], ar2D].reshape(ar.shape)
Out[106]:
array([[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]],
[[2, 3, 1],
[2, 3, 2],
[3, 1, 2]]])
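The bincount2D_vectorized helper lives in the linked answer and is not reproduced here. The sketch below swaps in a hypothetical stand-in, bincount2d, which does a row-wise bincount by shifting each row into its own block of bins, and wraps the whole recipe in a function that counts within the last k axes (k=2 for the 3x3 and 2x2 submatrices in the question). Both names are illustrative, and negative values are handled by the offset described above.

```python
import numpy as np

def bincount2d(rows):
    """Hypothetical helper: row-wise bincount of a 2D array of non-negative ints."""
    n_rows = rows.shape[0]
    n_bins = rows.max() + 1
    # Give row r the bin range [r*n_bins, (r+1)*n_bins) so one bincount call suffices.
    offsets = np.arange(n_rows)[:, None] * n_bins
    flat = np.bincount((rows + offsets).ravel(), minlength=n_rows * n_bins)
    return flat.reshape(n_rows, n_bins)

def elementwise_counts(arr, k=2):
    """Replace each value by its count within the last k axes (one 'submatrix')."""
    arr = np.asarray(arr)
    shifted = arr - arr.min()                  # make all values non-negative
    size = int(np.prod(arr.shape[-k:]))        # elements per submatrix
    rows = shifted.reshape(-1, size)           # one submatrix per row
    counts = bincount2d(rows)
    # Look up each element's count in its own row of the count table.
    return counts[np.arange(rows.shape[0])[:, None], rows].reshape(arr.shape)

ar = np.array([[[1, 1, 1], [2, 2, 2], [3, 4, 5]],
               [[2, 3, 5], [4, 3, 4], [3, 1, 2]]])
print(elementwise_counts(ar))
```

The count table has one column per possible value between the minimum and maximum, so this suits small integer ranges; for widely spread values the np.unique approach above avoids building a large table.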

Repeat a NumPy array in multiple dimensions at once?

np.repeat(np.repeat([[1, 2, 3]], 3, axis=0), 3, axis=1)
works as expected and produces
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3]])
However,
np.repeat([[1, 2, 3]], [3, 3])
and
np.repeat([[1, 2, 3]], [3, 3], axis=0)
produce errors.
Is it possible to repeat an array in multiple dimensions at once?
First off, I think the original method you propose is totally fine. It's readable, it makes sense, and it's not very slow.
You could use the ndarray.repeat method instead of the np.repeat function, which reads a bit more nicely:
>>> x.repeat(3, 1).repeat(3, 0)
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3]])
With numpy's broadcasting rules, there are likely dozens of ways to create the repeated data and rearrange it into the shape you want, too. One approach is to use np.broadcast_to() to repeat the data in D+1 dimensions, where D is the number of dimensions you need, and then collapse it down to D.
For example:
>>> x = np.array([[1, 2, 3]])
>>> np.broadcast_to(x.T, (3, 3, 3)).reshape((3, 9))
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3]])
And without reshaping (so that you don't need to know the final length):
>>> np.hstack(np.broadcast_to(x, (3, 3, 3)).T)
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3]])
And there's likely a dozen other ways to do this. But I still think your original version is more idiomatic, as throwing it into extra dimensions to collapse it down is weird.
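If you mainly want a single call, a small wrapper that applies ndarray.repeat once per axis is one option. repeat_along_axes is a hypothetical name, and it is just the chained-repeat approach in a loop, not a genuinely simultaneous repeat:

```python
import numpy as np

def repeat_along_axes(a, repeats):
    """Hypothetical helper: repeat each element repeats[i] times along axis i."""
    out = np.asarray(a)
    for axis, n in enumerate(repeats):
        out = out.repeat(n, axis=axis)
    return out

x = np.array([[1, 2, 3]])
print(repeat_along_axes(x, (3, 3)))
# [[1 1 1 2 2 2 3 3 3]
#  [1 1 1 2 2 2 3 3 3]
#  [1 1 1 2 2 2 3 3 3]]
```

For 2D input, np.kron(x, np.ones((3, 3), dtype=x.dtype)) produces the same block pattern in one call, at the cost of a multiplication per element.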
It isn't possible; see the documentation of np.repeat. But since you are using an array with the shape (1, 3), you have to use:
np.repeat(X, [2], axis=0)
because np.repeat(X, [2, 2], axis=0) needs one count per row, i.e. an array of shape (2, 3), e.g.
X = np.array([[1, 2, 3], [5, 6, 7]])
np.repeat(X, [2, 5], axis=0)
the output looks like:
[[1 2 3]
[1 2 3]
[5 6 7]
[5 6 7]
[5 6 7]
[5 6 7]
[5 6 7]]
Here [2, 5] means: repeat the first row 2 times and the second row 5 times. The repeats list needs one entry per row (X has shape (2, *), and the second dimension doesn't matter), because axis=0 means you want to repeat the rows.
Therefore you first have to generate an array with the shape (3, *) by repeating along one axis, and then repeat that array along the other axis.
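The one-count-per-row rule can be checked directly. A minimal sketch, where X2 stands for a one-row array like the one in the question (an assumption for illustration):

```python
import numpy as np

X = np.array([[1, 2, 3], [5, 6, 7]])
out = np.repeat(X, [2, 5], axis=0)   # one count per row: 2 + 5 = 7 rows
print(out.shape)  # (7, 3)

X2 = np.array([[1, 2, 3]])           # only one row
try:
    np.repeat(X2, [3, 3], axis=0)    # two counts for a single row: length mismatch
except ValueError as err:
    print("ValueError:", err)
```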
If you want to repeat your one-row array X2 = np.array([[1, 2, 3]]) along the rows:
np.repeat(X2, [5], axis=0)
produces:
[[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]]
because X2 has only one row.
In your original code, the first call of np.repeat produces the repeated rows and the second call duplicates the columns. Likewise, if you start with np.repeat(X2, [5], axis=0), you only get the kind of result shown in your post by calling np.repeat a second time on its output.
In my opinion your use of np.repeat is the easiest and best way to achieve your output.
Edit: Hopefully the answer is clearer now.

How do I use numpy to form a 2D array from another array's columns(dimension is 2*4) given an array of indices of column number efficiently

I'm trying to make a 2 by n array using numpy, whose elements come from specific columns selected by an array of column numbers.
For example, if I have something like this
[[1, 2, 3],
[2, 3, 4]]
as my input array, and I want to have columns
[2,3,1,2,3],
I should get
[[2, 3, 1, 2, 3],
[3, 4, 2, 3, 4]]
as my output array
You want to index along the second dimension. However, keep in mind that numpy uses zero-based indexing, so you'll need [1, 2, 0, 1, 2] instead of [2, 3, 1, 2, 3]:
a = np.array([
[1, 2, 3],
[2, 3, 4]])
a[:, [1, 2, 0, 1, 2]]
array([[2, 3, 1, 2, 3],
[3, 4, 2, 3, 4]])
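If your column numbers arrive 1-based, as in the question, a minimal sketch is to convert them once with a subtraction; np.take with axis=1 is an equivalent function form of the same column selection:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])
cols_1based = [2, 3, 1, 2, 3]        # column numbers as written in the question
idx = np.asarray(cols_1based) - 1    # convert to NumPy's zero-based indices

out = a[:, idx]                      # advanced indexing on the column axis
same = np.take(a, idx, axis=1)       # equivalent function form
print(out)
# [[2 3 1 2 3]
#  [3 4 2 3 4]]
```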
