Extracting specific columns from multi-dimensional array

Extracting specific columns from multi-dimensional array - python

Suppose we have a 3d numpy array in Python of shape (1, 22, 22) -random dimensions for illustration. If i want to extract the first 2 dimensions from Y, Z, then I can do:
new_array = array[:, 0:2, 0:2]
new_array.shape
(1, 2, 2)
But when I try to do the same by explicitly specifying the first two dimensions, as:
new_array = array[:, [0,1], [0,1]]
new_array.shape
(1, 2)
I'm getting a different result. Why's that? How can I select specific dimensions and and not a range of dimensions?

Passing a list to a numpy array's __getite__ uses advanced indexing instead of slicing. See the documentation here.
Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.
In your case, you are using the integer array indexing. The chain of integer indices are broadcast and iterated as a single unit. So using
array[:, [0,1], [0,1]]
selects elements (0,0) and (1,1), not the zeroth and first subarray from dimension 1 and the zeroth and first subarray form dimension 2.

I read the documentation and played around with my code. The only thing that seemed to work -but doesn't- with respect to my question is:
columns = np.array(([0, 1]), ([0,1]), dtype=np.intp)
new_array = my_array[:, columns, 0]
I'm still not quite sure why it works though.
EDIT: doesn't work as expected

Related

Numpy slicing of a 3D matrix using a sequence `:n` is different than specifying columns `[0,1]` [duplicate]

I have a 4-D NumPy array, with axis say x,y,z,t. I want to take slice corresponding to t=0 and to permute the order in the y axis.
I have the following
import numpy as np
a = np.arange(120).reshape(4,5,3,2)
b = a[:,[1,2,3,4,0],:,0]
b.shape
I get (5, 4, 3) instead of (4,5,3).
When, instead, I enter
aa = a[:,:,:,0]
bb = aa[:,[1,2,3,4,0],:]
bb.shape
I get the expected (4,5,3). Can someone explain why does the first version swap the first two dimensions?

As #hpaulj mentioned in the comments, this behaviour is because of mixing basic slicing and advanced indexing:
a = np.arange(120).reshape(4,5,3,2)
b = a[:,[1,2,3,4,0],:,0]
In the above code snippet, what happens is the following:
when we do basic slicing along last dimension, it triggers a __getitem__ call. So, that dimension is gone. (i.e. no singleton dimension)
[1,2,3,4,0] returns 5 slices from second dimension. There are two possibilities to put this shape in the returned array: either at the first or at the last position. NumPy decided to put it at the first dimension. This is why you get 5 (5, ...) in the first position in the returned shape tuple. Jaime explained this in one of the PyCon talks, if I recall correctly.
Along first and third dimension, since you slice everything using :, the original length along those dimensions is retained.
Putting all these together, NumPy returns the shape tuple as: (5, 4, 3)
You can read more about it at numpy-indexing-ambiguity-in-3d-arrays and arrays.indexing#combining-advanced-and-basic-indexing

Iterating through a subset of dimensions

I would like to iterate through a subset of dimensions of a numpy array and compare the resulting array elements (which are arrays or the remaining dimension(s)).
The code below does this:
import numpy
def min(h,m):
return h*60+m
exclude_times_default=[min(3,00),min(6,55)]
d=exclude_times_default
exclude_times_wkend=[min(3,00),min(9,00)]
w=exclude_times_wkend;
exclude_times=numpy.array([[[min(3,00),min(6,20)],d,d,d,d,d,[min(3,00),min(6,20)],d,d,[min(3,00),min(6,20)]],
[d,d,d,d,[min(3,00),min(9,30)],[min(3,00),min(9,30)],d,d,d,d],
[[min(20,00),min(7,15)],[min(3,00),min(23,15)],[min(3,00),min(7,15)],[min(3,00),min(7,15)],[min(3,00),min(23,15)],[min(3,00),min(23,15)],d,d,d,d]])
num_level=exclude_times.shape[0]
num_wind=exclude_times.shape[1]
for level in range(num_level):
for window in range(num_wind):
if (exclude_times[level,window,:]==d).all():
print("Default")
exclude_times[level][window]=w
print(level,window,exclude_times[level][window])
The solution does not look very elegant to me, just wondering if there are more elegant solutions.

You can get a 2D mask pinpointing all the window/level combinations set to default like this:
mask = (exclude_times == d[None, None, :]).all(axis=-1)
The expression d[None, None, :] introduces two new axes into a view of d to make it broadcast to the shape of exclude_times properly. Another way to do that would be with an explicit reshape: np.reshape(d, (1, 1, -1)) or d.reshape(1, 1, -1). There are many other ways as well.
The .all(axis=-1) operation reduces the 3D boolean mask along the last axis, giving you a 2D mask indexed be level and window.
To count the number of default entries, use np.countnonzero:
nnz = np.countnonzero(mask)
To count the defaults for each window:
np.countnonzero(mask, axis=0)
To count the defaults for each level:
np.countnonzero(mask, axis=1)
Remember, the axis parameter is the one you reduce, not the one(s) you keep.
Assigning w to the default elements is a bit more complex. The problem is that exclude_times[mask[:, :, None]] is a copy of the original data, and doesn't preserve the shape of the original at all.
You have to do a couple of extra steps to reshape correctly:
exclude_times[mask[:, :, None]] = np.broadcast_to(w[None, :], (nnz, 2)).ravel()

Non-consecutive slicing of a multidimensional array in Python

I am trying to perform non-consectuitive slicing of a multidimensional array like this (Matlab peudo code)
A = B(:,:,[1,3],[2,4,6]) %A and B are two 4D matrices
But when I try to write this code in Python:
A = B[:,:,np.array([0,2]),np.array([1,3,5])] #A and B are two 4D arrays
it gives an error: IndexError: shape mismatch: indexing arrays could not be broadcast...
It should be noted that slicing for one dimension each time works fine!

In numpy, if you use more than one fancy index (i.e. array) to index different dimension of the same array at the same time, they must broadcast. This is designed such that indexing can be more powerful. For your situation, the simplest way to solve the problem is indexing twice:
B[:, :, [0,2]] [..., [1,3,5]]
where ... stands for as many : as possible.
Indexing twice this way would generate some extra data moving time. If you want to index only once, make sure they broadcast (i.e. put fancy indices on different dimension):
B[:, :, np.array([0,2])[:,None], [1,3,5]]
which will result in a X by Y by 2 by 3 array. On the other hand, you can also do
B[:, :, [0,2], np.array([1,3,5])[:,None]]
which will result in a X by Y by 3 by 2 array. The [1,3,5] axis is transposed before the [0,2] axis.
Yon don't have to use np.array([0,2]) if you don't need to do fancy operation with it. Simply [0,2] is fine.
np.array([0,2])[:,None] is equivalent to [[0],[2]], where the point of [:,None] is to create an extra dimension such that the shape becomes (2,1). Shape (2,) and (3,) cannot broadcast, while shape (2,1) and (3,) can, which becomes (2,3).

Numpy [...,None]

I have found myself needing to add features to existing numpy arrays which has led to a question around what the last portion of the following code is actually doing:
np.ones(shape=feature_set.shape)[...,None]
Set-up
As an example, let's say I wish to solve for linear regression parameter estimates by using numpy and solving:
Assume I have a feature set shape (50,1), a target variable of shape (50,), and I wish to use the shape of my target variable to add a column for intercept values.
It would look something like this:
# Create random target & feature set
y_train = np.random.randint(0,100, size = (50,))
feature_set = np.random.randint(0,100,size=(50,1))
# Build a set of 1s after shape of target variable
int_train = np.ones(shape=y_train.shape)[...,None]
# Able to then add int_train to feature set
X = np.concatenate((int_train, feature_set),1)
What I Think I Know
I see the difference in output when I include [...,None] vs when I leave it off. Here it is:
The second version returns an error around input arrays needing the same number of dimensions, and eventually I stumbled on the solution to use [...,None].
Main Question
While I see the output of [...,None] gives me what I want, I am struggling to find any information on what it is actually supposed to do. Can anybody walk me through what this code actually means, what the None argument is doing, etc?
Thank you!

The slice of [..., None] consists of two "shortcuts":
The ellipsis literal component:
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is a rank 5 array (i.e., it has 5 axes), then
x[1,2,...] is equivalent to x[1,2,:,:,:],
x[...,3] to x[:,:,:,:,3] and
x[4,...,5,:] to x[4,:,:,5,:].
(Source)
The None component:
numpy.newaxis
The newaxis object can be used in all slicing operations to create an axis of length one. newaxis is an alias for ‘None’, and ‘None’ can be used in place of this with the same result.
(Source)
So, arr[..., None] takes an array of dimension N and "adds" a dimension "at the end" for a resulting array of dimension N+1.
Example:
import numpy as np
x = np.array([[1,2,3],[4,5,6]])
print(x.shape) # (2, 3)
y = x[...,None]
print(y.shape) # (2, 3, 1)
z = x[:,:,np.newaxis]
print(z.shape) # (2, 3, 1)
a = np.expand_dims(x, axis=-1)
print(a.shape) # (2, 3, 1)
print((y == z).all()) # True
print((y == a).all()) # True

Consider this code:
np.ones(shape=(2,3))[...,None].shape
As you see the 'None' phrase change the (2,3) matrix to a (2,3,1) tensor. As a matter of fact it put the matrix in the LAST index of the tensor.
If you use
np.ones(shape=(2,3))[None, ...].shape
it put the matrix in the FIRST‌ index of the tensor

Preserving the dimensions of a slice from a Numpy 3d array

I have a 3d array, a, of shape say a.shape = (10, 10, 10)
When slicing, the dimensions are squeezed automatically i.e.
a[:,:,5].shape = (10, 10)
I'd like to preserve the number of dimensions but also ensure that the dimension that was squeezed is the one that shows 1 i.e.
a[:,:,5].shape = (10, 10, 1)
I have thought of re-casting the array and passing ndmin but that just adds the extra dimensions to the start of the shape tuple regardless of where the slice came from in the array a.

a[:,:,[5]].shape
# (10,10,1)
a[:,:,5] is an example of basic slicing.
a[:,:,[5]] is an example of integer array indexing -- combined with basic slicing. When using integer array indexing the resultant shape is always "identical to the (broadcast) indexing array shapes". Since [5] (as an array) has shape (1,),
a[:,:,[5]] ends up having shape (10,10,1).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Extracting specific columns from multi-dimensional array - python

Related

Numpy slicing of a 3D matrix using a sequence `:n` is different than specifying columns `[0,1]` [duplicate]

Iterating through a subset of dimensions

Non-consecutive slicing of a multidimensional array in Python

Numpy [...,None]

Preserving the dimensions of a slice from a Numpy 3d array

Categories

Resources