Python multi-dimensional notation transpose automatically - python

I have the following minimal example:
import numpy as np
a = np.zeros((5,5,5))
a[1,1,:] = [1,1,1,1,1]
print(a[1,:,range(4)])
I would expect as output an array with 5 rows and 4 columns, where we have ones on the second row. Instead it is an array with 4 rows and 5 columns with ones on the second column. What is happening here, and what can I do to get the output I expected?

This is an example of mixed basic and advanced indexing, as discussed in https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The slice dimension has been appended to the end.
With a single scalar index this is a marginal case of the ambiguity described there. It has been discussed in previous SO questions and in one or more bug reports.
Numpy sub-array assignment with advanced, mixed indexing
In this case you can replace the range with a slice, and get the expected order:
In [215]: a[1,:,range(4)].shape
Out[215]: (4, 5) # slice dimension last
In [216]: a[1,:,:4].shape
Out[216]: (5, 4)
In [219]: a[1][:,[0,1,3]].shape
Out[219]: (5, 3)
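To see the cases side by side, here is a minimal sketch built on the array from the question. The names mixed, sliced and two_step are only illustrative, and the list cols stands in for range(4).

```python
import numpy as np

a = np.zeros((5, 5, 5))
a[1, 1, :] = 1

cols = [0, 1, 2, 3]        # same indices as range(4)

mixed = a[1, :, cols]      # scalar and list are advanced indices separated by a slice
sliced = a[1, :, :4]       # scalar plus slices only: basic indexing
two_step = a[1][:, cols]   # index the first axis, then index the 2-D result

print(mixed.shape)                       # (4, 5)
print(sliced.shape)                      # (5, 4)
print(two_step.shape)                    # (5, 4)
print(np.array_equal(mixed.T, sliced))   # True
```

So if a list of column indices is really needed (a slice cannot express [0,1,3], for example), indexing in two steps or transposing the mixed result should give the expected orientation.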

Related

why does numpy array return wrong shape of sub arrays when indexing

An example is shown as follows:
>>> import numpy as np
>>> a=np.zeros((288,512))
>>> x1,x2,y1,y2=0,16,0,16
>>> p=a[x1:x2][y1:y2]
>>> p.shape
(16, 512)
>>> p=a[x1:x2,y1:y2]
>>> p.shape
(16, 16)
I am trying to extract a patch from an array, covering rows 0 to 16 and columns 0 to 16. I indexed the array in two ways and got very different results. a[x1:x2][y1:y2] gives me the wrong result.
Why?
Thx for helping me!!!
When you do a[x1:x2][y1:y2], you are slicing by rows twice. That is, a[x1:x2] will give you a shape (16,512). The second slice operation in a[x1:x2][y1:y2] slices the rows of that result again, and since it has exactly 16 rows, [0:16] returns the same thing.
In the second case, when you do a[x1:x2,y1:y2], you are slicing by the two dimensions of your 2-dimensional array.
Important note: If you have a 2-dimensional array and you slice like this:
a = np.zeros((10,15))
a[1:3].shape
Output:
(2, 15)
you will slice only by rows. Your resulting array will have 2 rows and the total number of columns (15 columns). If you want to slice by rows and columns, you will have to use a[1:3, 1:3].
The two methods of indexing you tried are not equivalent. In the first one (a[x1:x2][y1:y2]), you are essentially indexing the first axis twice. In the second, you are indexing the first and second axes.
a[x1:x2][y1:y2] can be rewritten as
p = a[x1:x2] # result still has two dimensions
p = p[y1:y2]
You are first indexing 0:16 in the first dimension. Then you index 0:16 in the first dimension of the result of the previous operation (which will simply return the same as a[x1:x2] because x1==y1 and x2==y2).
In the second method, you index the first and second dimensions directly. I would not write it this way, but one could write it like this to contrast it with the first method:
a[x1:x2][:, y1:y2]
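A small sketch of the difference, assuming a column range (4 to 20) that differs from the row range so the effect is visible. The names chained, patch and two_step are made up for this illustration.

```python
import numpy as np

a = np.arange(288 * 512).reshape(288, 512)
x1, x2 = 0, 16
y1, y2 = 4, 20   # deliberately different from x1, x2

chained = a[x1:x2][y1:y2]       # both slices act on axis 0
patch = a[x1:x2, y1:y2]         # one slice per axis
two_step = a[x1:x2][:, y1:y2]   # the explicit ':' moves the second slice to axis 1

print(chained.shape)                     # (12, 512)
print(patch.shape)                       # (16, 16)
print(np.array_equal(patch, two_step))   # True
```

The chained version keeps only rows 4 to 15 of the first 16 rows, which is why it has 12 rows and all 512 columns.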

I don't understand the syntax for retrieving values

Here is the shape of my data set:
In [8]: train_x.shape
Out[8]: (4500, 3, 2)
Then I don't understand the following syntax:
In [9]: train_x_retrive = train_x[:, -1, :]
Thank you for your help
See,
(4500, 3, 2) means 3-dimensional data,
with the 1st dimension having length 4500, the 2nd dimension having length 3 and the 3rd dimension having length 2.
train_x[:, -1, :] means: take everything along the first dimension, only the last entry along the second dimension, and everything along the third dimension.
The resulting shape will be (4500, 2).
--EDIT--
It turns out that indexing an axis with a single integer (here -1) removes that axis from the result, so instead of an array of shape (4500,1,2) NumPy returns one of shape (4500,2).
While @thisisjaymehta's answer is correct and well explained, I find it is much easier to understand what's happening in a 2D array.
Consider a 3 row, 2 col random array:
import numpy as np
X = np.random.random((3,2))
print(X)
Yields:
array([[0.05809464, 0.49751321],
[0.25815324, 0.23862334],
[0.56815427, 0.91610693]])
We can access individual elements with subscripting. For instance to reach row (horizontal) 0, col (vertical) 1 we can use X[0,1]:
print(X[0,1])
Which yields:
0.49751320772009267
Similarly we can reach the last row, and the first (0th) column by using X[-1,0]:
0.568154265734957
The notation : is used to address the whole of that axis, so to get the last row, and all of the columns in that last row, we can use X[-1,:] to yield:
array([0.56815427, 0.91610693])
This principle extends in 3 or more dimensions as well. So train_x[:, -1, :] means "All rows (first dimension), the last column (second dimension), and all of the third dimension". This results in an array of shape (4500,2) in your example, where you started with (4500, 3, 2).
The way I like to think of this is that you have 4500 images of size 3x2, and you are requesting the last row of each image. The resulting array contains 4500 image strips of shape (1,2), squeezed into a (4500,2) array.
You could also do -2 in place of -1 to reach the penultimate index.
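As a quick check of the shapes involved, here is a sketch that uses a stand-in array with the same shape as train_x (the real data is not shown in the question, so the values here are arbitrary).

```python
import numpy as np

# Stand-in data with the same shape as in the question
train_x = np.arange(4500 * 3 * 2).reshape(4500, 3, 2)

last = train_x[:, -1, :]         # integer index: axis 1 is dropped
last_kept = train_x[:, -1:, :]   # length-1 slice: axis 1 is kept

print(last.shape)        # (4500, 2)
print(last_kept.shape)   # (4500, 1, 2)
print(np.array_equal(last, train_x[:, 2, :]))      # True, -1 is the same as index 2 here
print(np.array_equal(last, last_kept[:, 0, :]))    # True
```

Using the slice -1: instead of the integer -1 is one way to keep the middle axis if the (4500, 1, 2) shape is what you need.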

Numpy - 'nested' array operations and invalid slice error

I am trying to use indices stored in one set of arrays (indexPositions) to perform a simple array operation using a matrix. It is easier to explain with an example
u[(indexPositions[:,1]):(indexPositions[:,2]),(indexPositions[:,0])]=0
The object u is a big matrix whose values I want to set to zero for a given region of space. indexPositions[:,1] contains the 'lower bound' indices and indexPositions[:,2] contains the 'upper bound' indices. This reflects the fact that I want to set to zero anything in between them and therefore want to iterate between these indices.
indexPositions[:,0] contains the column index for which the aforementioned range of rows must be set to zero.
I do not understand why it is not possible to do this (I hope it's clear what I'm trying to achieve). I'm sure it has something to do with Python not understanding what order it's supposed to do these operations in. Is there a way of specifying this? The matrix is quite huge and these operations are happening many, many times, so I really don't want to use a slow Python loop.
Just to make sure we are talking about the same thing, I'll create a simple example:
In [77]: u=np.arange(16).reshape(4,4)
In [78]: I=np.array([[0,2,3],[1,4,2]])
In [79]: i=0
In [80]: u[I[i,0]:I[i,1],I[i,2]]
Out[80]: array([3, 7])
In [85]: i=1
In [86]: u[I[i,0]:I[i,1],I[i,2]]
Out[86]: array([ 6, 10, 14])
I'm using different column order for I, but that doesn't matter.
I'm selecting 2 elements from the 4th column, and 3 from the 3rd. The different lengths of the results suggest that I'll have problems operating on both rows of I at once. I might have to operate on a flattened view of u.
In [93]: [u[slice(x,y),z] for x,y,z in I]
Out[93]: [array([3, 7]), array([ 6, 10, 14])]
If the lengths of the slices are all the same, it's more likely that I'd be able to do it all without a loop over the rows of I.
I'll think about this some more, but I just want to make sure I understood the problem, and why it might be difficult.
u[I[:,0]:I[:,1],I[:,2]] with : in the slice is definitely going to be a problem.
In [90]: slice(I[:,0],I[:,1])
Out[90]: slice(array([0, 1]), array([2, 4]), None)
Abstractly a slice object accepts arrays or lists, but the numpy indexing does not. So instead of one complex slice, you have to create 2 or more simple ones.
In [91]: [slice(x,y) for x,y in I[:,:2]]
Out[91]: [slice(0, 2, None), slice(1, 4, None)]
I've answered a similar question, one where the slice starts came from a list, but all slices had the same length. i.e. 0:3 from the 1st row, 2:5 from the 2nd, 4:7 from the 3rd etc.
Access multiple elements of an array
How can I select values along an axis of an nD array with an (n-1)D array of indices of that axis?
If the slices are all the same length, then it is possible to use broadcasting to construct the indexing arrays. But in the end the indexing will still be with arrays, not slices.
Fast slicing of numpy array multiple times
Numpy Array Slicing
Both of these deal with taking multiple slices from a 1d array, slices with differing offsets and lengths. Your problem could, I think, be cast that way. The alternatives considered all require a list comprehension to construct the slice indexes. The indexes can then be concatenated, followed by one indexing operation, or alternatively, index multiple times and concatenate the results. Timings vary with the number and length of the slices.
An example, adapted from those earlier questions, of constructing a flat index list is:
In [130]: il=[np.arange(v[0],v[1])+v[2]*u.T.shape[1] for v in I]
# [array([12, 13]), array([ 9, 10, 11])]
In [132]: u.T.flat[np.concatenate(il)]
# array([ 3, 7, 6, 10, 14])
Same values as my earlier examples, but in 1 list, not 2.
If the slice arrays have same length, then we can get back an array
In [145]: I2
Out[145]:
array([[0, 2, 3],
[1, 3, 2]])
In [146]: il=np.array([np.arange(v[0],v[1]) for v in I2])
In [147]: u[il,I2[:,2]]
Out[147]:
array([[ 3, 6],
[ 7, 10]])
In this case, il = I2[:,[0]]+np.arange(2) could be used to construct the 1st indexing array instead of the list comprehension (this is the broadcasting I mentioned earlier).
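For the original goal of setting those ranges to zero, one possible sketch is to build paired row and column index arrays and do a single assignment. It assumes the same u and I as above, with each row of I holding (start, stop, column), and it still needs a list comprehension to build the row indices. The names rows, cols and selected are made up for this example.

```python
import numpy as np

u = np.arange(16).reshape(4, 4)
I = np.array([[0, 2, 3],
              [1, 4, 2]])   # each row: start, stop, column

# One row index per element to touch; slices may have different lengths
rows = np.concatenate([np.arange(start, stop) for start, stop, _ in I])
# Repeat each column index once per element of its slice
cols = np.repeat(I[:, 2], I[:, 1] - I[:, 0])

selected = u[rows, cols].copy()
print(selected)      # [ 3  7  6 10 14]

u[rows, cols] = 0    # single advanced-indexing assignment
print(u[:, 2])       # [2 0 0 0]
print(u[:, 3])       # [ 0  0 11 15]
```

Whether this beats a plain loop over the rows of I will depend on the number and length of the slices, so it is worth timing on the real data.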

Difference in shapes of numpy array

For the array:
import numpy as np
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> arr2d
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> arr2d[2].shape
(3,)
>>> arr2d[2:,:].shape
(1, 3)
Why do I get different shapes when both statements return the 3rd row? and shouldn't the result be (1,3) in both cases since we are returning a single row with 3 columns?
Why do I get different shapes when both statements return the 3rd row?
Because with the first operation you are indexing the rows, and selecting just ONE element, which -as mentioned in the single-element indexing paragraph of a multidimensional array- returns an array with a lower dimension (a 1D array).
In the 2nd example, you are using a slice, as is evident from the colon. Slicing operations do not reduce the dimensions of an array. This is also logical: imagine the array had 4 rows instead of 3. Then arr2d[2:,:].shape would be (2,3). The developers of numpy made slicing operations consistent, and therefore slices never reduce the number of dimensions of the array.
and shouldn't the result be (1,3) in both cases since we are returning a single row with 3 columns?
No, just because of the previous reasons.
When doing arr2d[2], you are taking a single row out of the array.
When doing arr2d[2:, :], you are taking a subset of rows out of the array ('slicing'), in this case the rows from the 3rd to the end. That happens to be only the 3rd row, but it doesn't change the fact that you are taking a subset, not an element.
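A short sketch of the different ways to pick the 3rd row and the shapes they give. The variable names are illustrative only.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = arr2d[2]                      # integer index: axis 0 is dropped
row_slice = arr2d[2:3, :]           # slice: axis 0 is kept with length 1
row_list = arr2d[[2]]               # list index: axis 0 is kept with length 1
row_new = arr2d[2][np.newaxis, :]   # drop the axis, then add it back explicitly

print(row.shape)         # (3,)
print(row_slice.shape)   # (1, 3)
print(row_list.shape)    # (1, 3)
print(row_new.shape)     # (1, 3)
```

If a (1, 3) result is wanted, any of the last three forms should do.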

Slicing arrays in Numpy / Scipy

I have an array like:
a = array([[1,2,3],[3,4,5],[4,5,6]])
What's the most efficient way to slice out a 3x2 array out of this that has only the last two columns of "a"?
i.e.
array([[2,3],[4,5],[5,6]]) in this case.
Two dimensional numpy arrays are indexed using a[i,j] (not a[i][j]), but you can use the same slicing notation with numpy arrays and matrices as you can with ordinary matrices in python (just put them in a single []):
>>> from numpy import array
>>> a = array([[1,2,3],[3,4,5],[4,5,6]])
>>> a[:,1:]
array([[2, 3],
[4, 5],
[5, 6]])
Is this what you're looking for?
a[:,1:]
To quote the documentation, the basic slice syntax is i:j:k where i is the starting index, j is the stopping index, and k is the step (k ≠ 0).
Now if i is not given, it defaults to 0 for k > 0 and to n - 1 for k < 0 (where n is the length of the array).
If j is not given, it defaults to n for k > 0 and to -n - 1 for k < 0.
That's for a one dimensional array.
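A few one-dimensional examples of these defaults, as a sketch on a made-up array x:

```python
import numpy as np

x = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

print(x[2:7:2])    # [2 4 6]   start 2, stop 7 (excluded), step 2
print(x[:3])       # [0 1 2]   start defaults to 0
print(x[7:])       # [7 8 9]   stop defaults to n
print(x[::-1])     # [9 8 7 6 5 4 3 2 1 0]   negative step starts from n - 1
print(x[7:2:-2])   # [7 5 3]   stop is still excluded when stepping backwards
```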
Now a two dimensional array is a different beast. The slicing syntax for that is a[rowrange, columnrange].
So if you want all the rows, but just the last two columns, like in your case, you do:
a[0:3, 1:3]
Here, "0:3" selects rows 0, 1 and 2 (the stop index 3 is excluded), and "1:3" selects columns 1 and 2.
You may be wondering: if there are only 3 columns, shouldn't "1:3" return all three of them, i.e. column 1, column 2 and column 3?
That is the tricky part of this syntax. The first column is actually column 0, and the stop index is not included. So when you say "1:3", you are actually saying give me column 1 and column 2, which are the last two columns you want. (There actually is no column 3.)
Now if you don't know how long your matrix is or if you want all the rows, you can just leave that part empty.
i.e.
a[:, 1:3]
Same goes for columns also. i.e if you wanted say, all the columns but just the first row, you would write
a[0:1,:]
The above answer a[:,1:] works because "1:" for columns means: give me everything from column 1 on, i.e. everything except column 0, up to the end of the columns. An empty stop means 'till the end'.
By now you must realize that anything on either side of the comma is all a subset of the one dimensional case I first mentioned above. i.e if you want to specify your rows using step sizes you can write
a[::2,1]
Which in your case would return
array([2, 5])
i.e. a[::2,1] reads as: give me every other row, starting with the topmost, and give me only the 2nd column (index 1).
This took me some time to figure out. So pasting it here, just in case it helps someone.
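For reference, here is a sketch that runs the expressions discussed above on the array from the question. The variable names are only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

last_two = a[:, 1:]      # all rows, columns 1 and 2
explicit = a[0:3, 1:3]   # the same selection with every bound spelled out
first_row = a[0:1, :]    # a slice keeps the row axis
column = a[::2, 1]       # an integer column index drops the column axis
block = a[::2, 1:]       # a column slice keeps the result 2-D

print(np.array_equal(last_two, explicit))   # True
print(first_row.shape)                      # (1, 3)
print(column)                               # [2 5]
print(block)
# [[2 3]
#  [5 6]]
```

Note the difference between the last two: an integer after the comma gives a 1-D result, while a slice such as 1: keeps the result 2-D.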
