Numpy array slice using tuple - python

I've read the NumPy docs on slicing (especially the bottom, where they discuss variable array indexing):
https://docs.scipy.org/doc/numpy/user/basics.indexing.html
But I'm still not sure how I could do the following: Write a method that either returns a 3D set of indices, or a 4D set of indices that are then used to access an array. I want to write a method for a base class, but the classes that derive from it access either 3D or 4D depending on which derived class is instantiated.
Example Code to illustrate idea:
import numpy as np
a = np.ones([2,2,2,2])
size = np.shape(a)
print(size)
for i in range(size[0]):
    for j in range(size[1]):
        for k in range(size[2]):
            for p in range(size[3]):
                a[i,j,k,p] = i*size[1]*size[2]*size[3] + j*size[2]*size[3] + k*size[3] + p
print(a)
print('compare')
indices = (0,:,0,0)  # SyntaxError: ':' is only allowed inside square brackets
print(a[0,:,0,0])
print(a[indices])
In short, I'm trying to build a tuple (or something similar) that can be used to make either of the following accesses, depending on how I fill the tuple:
a[i, 0, :, 1]
a[i, :, 1]
The slice method looked promising, but it seems to require a range, and I just want a ":" i.e. the whole dimension. What options are out there for variable numpy array dimension access?

In [324]: a = np.arange(8).reshape(2,2,2)
In [325]: a
Out[325]:
array([[[0, 1],
        [2, 3]],
       [[4, 5],
        [6, 7]]])
slicing:
In [326]: a[0,:,0]
Out[326]: array([0, 2])
In [327]: idx = (0,slice(None),0) # interpreter converts : into slice object
In [328]: a[idx]
Out[328]: array([0, 2])
In [331]: idx
Out[331]: (0, slice(None, None, None), 0)
In [332]: np.s_[0,:,0] # indexing trick to generate same
Out[332]: (0, slice(None, None, None), 0)
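Applied to the base-class scenario in the question, one possible sketch follows. The class and method names (Base, Volume3D, Volume4D, index_for, get) are hypothetical and chosen only for illustration. The idea is that each subclass returns a tuple mixing integers and slice(None), and the shared method indexes with that tuple.

```python
import numpy as np

class Base:
    """Hypothetical base class: subclasses decide the index layout."""

    def __init__(self, data):
        self.data = data

    def index_for(self, i):
        raise NotImplementedError

    def get(self, i):
        # Indexing with a tuple is the same as writing the indices inline.
        return self.data[self.index_for(i)]

class Volume3D(Base):
    def index_for(self, i):
        return (i, slice(None), 1)   # same as data[i, :, 1]

class Volume4D(Base):
    def index_for(self, i):
        return np.s_[i, 0, :, 1]     # same as data[i, 0, :, 1]

a3 = np.arange(8).reshape(2, 2, 2)
a4 = np.arange(16).reshape(2, 2, 2, 2)
r3 = Volume3D(a3).get(0)
r4 = Volume4D(a4).get(1)
```

Either spelling (an explicit slice(None) or np.s_) produces the same kind of tuple, so the choice is mostly a matter of readability.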

Your code appears to work the way you want using :. The two examples
(a[i, 0, :, 7], a[i, :, 7])
don't work because the index 7 is out of range for the array. If you change the 7 to something in range, like 1, each expression returns a value, which I believe is what you are looking for.

Related

How does slicing numpy arrays with other arrays work?

I have a numpy array of shape [batch_size, timesteps_per_samples, width, height], where width and height refer to a 2D grid. The values in this array can be interpreted as an elevation at a certain location that changes over time.
I want to know the elevation over time for various paths within this array. Therefore I have a second array of shape [batch_size, paths_per_batch_sample, timesteps_per_path, coordinates] (coordinates = 2, for x and y in the 2D plane).
The resulting array should be of shape [batch_size, paths_per_batch_sample, timesteps_per_path] containing the elevation over time for each sample within the batch.
The following two examples work. The first one is very slow and just serves for understanding what I am trying to do. I think the second one does what I want but I have no idea why this works nor if it may crash under certain circumstances.
Code for the problem setup:
import numpy as np
batch_size=32
paths_per_batch_sample=10
timesteps_per_path=4
width=64
height=64
elevation = np.arange(0, batch_size*timesteps_per_path*width*height, 1)
elevation = elevation.reshape(batch_size, timesteps_per_path, width, height)
paths = np.random.randint(0, high=width-1, size=(batch_size, paths_per_batch_sample, timesteps_per_path, 2))
range_batch = range(batch_size)
range_paths = range(paths_per_batch_sample)
range_timesteps = range(timesteps_per_path)
The following code works but is very slow:
elevation_per_time = np.zeros((batch_size, paths_per_batch_sample, timesteps_per_path))
for s in range_batch:
    for k in range_paths:
        for t in range_timesteps:
            x_co, y_co = paths[s,k,t,:].astype(int)
            elevation_per_time[s,k,t] = elevation[s,t,x_co,y_co]
The following code works (even fast) but I can't understand why and how o.0
elevation_per_time_fast = elevation[
    :,
    range_timesteps,
    paths[:, :, range_timesteps, 0].astype(int),
    paths[:, :, range_timesteps, 1].astype(int),
][range_batch, range_batch, :, :]
Prove that the results are equal
check = (elevation_per_time == elevation_per_time_fast)
print(np.all(check))
Can somebody explain how I can slice an nd-array by multiple other arrays?
In particular, I don't understand how NumPy knows that 'range_timesteps' has to run in step across the indices for axes 1, 2 and 3.
Thanks in advance!
Let's take a quick look at slicing a NumPy array first:
a = np.arange(0,9,1).reshape([3,3])
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
NumPy has two ways of selecting along an axis: a contiguous section start:stop, or a list of indices [index1, index2, ...]. The output is still an array, with the shape of your selection:
a[0:2,:]
array([[0, 1, 2],
       [3, 4, 5]])
a[:,[0,2]]
array([[0, 2],
       [3, 5],
       [6, 8]])
Second, because each of these operations returns an array with the same number of dimensions, you can chain any number of them, as long as you don't try to access an index outside the array:
a[:][:][:][:][:][:][:][[0,2]][:,[0,2]]
array([[0, 2],
       [6, 8]])
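For the path lookup in the question, the relevant mechanism is that integer index arrays are broadcast against each other, and the result takes the broadcast shape. The sketch below is one way to write the lookup without the intermediate [range_batch, range_batch, :, :] step. It uses smaller sizes than the question so the loop check runs quickly, and the names b_idx, t_idx, fast and slow are mine, not the asker's.

```python
import numpy as np

batch_size, n_paths, n_steps, width, height = 4, 3, 5, 8, 8
rng = np.random.default_rng(0)
elevation = np.arange(batch_size * n_steps * width * height).reshape(
    batch_size, n_steps, width, height)
paths = rng.integers(0, width, size=(batch_size, n_paths, n_steps, 2))

# Four index arrays, one per axis of `elevation`; they broadcast to (B, P, T).
b_idx = np.arange(batch_size)[:, None, None]   # shape (B, 1, 1)
t_idx = np.arange(n_steps)                     # shape (T,), treated as (1, 1, T)
fast = elevation[b_idx, t_idx, paths[..., 0], paths[..., 1]]

# Reference result using the explicit loops from the question.
slow = np.zeros((batch_size, n_paths, n_steps), dtype=elevation.dtype)
for s in range(batch_size):
    for k in range(n_paths):
        for t in range(n_steps):
            x_co, y_co = paths[s, k, t]
            slow[s, k, t] = elevation[s, t, x_co, y_co]
```

In the question's version, the : on the first axis yields an extra batch axis (every sample paired with every sample's paths), and the trailing [range_batch, range_batch, :, :] keeps only the diagonal of that pairing. The timestep index "runs in step" because range_timesteps is broadcast against the last axis of the two path index arrays.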

Numpy: slicing a volume using a matrix

I have a 3D numpy volume and a 2D numpy matrix:
foo = np.random.rand(20,20,10)
amin = np.argmin(foo, axis=2)
I would like to use the amin variable to slice the volume in the same way np.min would:
grid = np.indices(amin.shape)
idcs = np.stack([grid[0], grid[1], amin])
fmin = foo[idcs[0], idcs[1], idcs[2]]
The problem is that I can't use np.min, because I also need the neighbors of amin for interpolation, which I would obtain with:
pre = foo[idcs[0], idcs[1], np.clip(idcs[2]-1, 0, 9)]
post = foo[idcs[0], idcs[1], np.clip(idcs[2]+1, 0, 9)]
Is there a more Pythonic (NumPy-ish) way to do this without creating the np.indices grid? Something like:
foo[:,:,amin-1:amin+1]
that actually works? (I would take care of margin handling with padding beforehand.)
You could use np.ogrid instead of np.indices to save memory.
np.ogrid returns an "open" meshgrid:
In [24]: np.ogrid[:5,:5]
Out[24]:
[array([[0],
        [1],
        [2],
        [3],
        [4]]), array([[0, 1, 2, 3, 4]])]
ogrid returns component arrays which can be used as indices
in the same way as one would use np.indices.
NumPy will automatically broadcast the values in the open mesh when they are used as indices:
In [49]: (np.indices((5,5)) == np.broadcast_arrays(*np.ogrid[:5, :5])).all()
Out[49]: True
import numpy as np
h, w, d = 20, 20, 10
foo = np.random.rand(h, w, d)
amin = np.argmin(foo, axis=2)
X, Y = np.ogrid[:h, :w]
amins = np.stack([np.clip(amin+i, 0, d-1) for i in [-1, 0, 1]])
fmins = foo[X, Y, amins]
It's better to store fmin, pre and post in one array, fmins,
since some NumPy/Scipy operations (like argmin or griddata) may need the values in one array. If, later, you need to operate on the 3 components individually, you can always access them using fmins[i] or define
pre, fmin, post = fmins
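As an alternative that needs no explicit grid at all, np.take_along_axis (available in NumPy 1.15 and later) can gather along the last axis directly. This is a sketch under the same clipping assumption as above; the names offsets and idx are mine.

```python
import numpy as np

h, w, d = 20, 20, 10
foo = np.random.rand(h, w, d)
amin = np.argmin(foo, axis=2)

# For every (row, col), the indices amin-1, amin, amin+1, clipped to the volume.
offsets = np.array([-1, 0, 1])
idx = np.clip(amin[..., None] + offsets, 0, d - 1)    # shape (h, w, 3)

# Gather along axis 2, then move the neighbor axis to the front for unpacking.
gathered = np.take_along_axis(foo, idx, axis=2)       # shape (h, w, 3)
pre, fmin, post = np.moveaxis(gathered, -1, 0)
```

Here the neighbor axis ends up last in gathered, whereas the ogrid version puts it first in fmins; which layout is more convenient depends on the later interpolation code.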

Python indexing numpy array using a smaller boolean array

https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
If obj.ndim == x.ndim, x[obj] returns a 1-dimensional array filled
with the elements of x corresponding to the True values of obj. The
search order will be row-major, C-style. If obj has True values at
entries that are outside of the bounds of x, then an index error will
be raised. If obj is smaller than x it is identical to filling it with
False.
I read in the NumPy reference that I can index a larger array using a smaller boolean array, and the remaining entries would automatically be filled with False.
Example :
From an array, select all rows which sum up to less or equal two:
>>> x = np.array([[0, 1], [1, 1], [2, 2]])
>>> rowsum = x.sum(-1)
>>> x[rowsum <= 2, :]
array([[0, 1],[1, 1]])
But if rowsum would have two dimensions as well:
>>> rowsum = x.sum(-1, keepdims=True)
>>> rowsum.shape
(3, 1)
>>> x[rowsum <= 2, :] # fails
IndexError: too many indices
>>> x[rowsum <= 2]
array([0, 1])
The last one giving only the first elements because of the extra
dimension.
But the example simply doesn't work. It says "IndexError: boolean index did not match indexed array along dimension 1; dimension is 2 but corresponding boolean dimension is 1"
How can I make it work? I'm using Python 3.6.3 and NumPy 1.13.3.
Since NumPy 1.11, that documentation example is not compatible with the new default behaviour (see boolean-indexing-changes):
Boolean indexing changes.
...
...
Boolean indexes must match the dimension of the axis that
they index.
...
The internals have been optimized, but the docs have not been updated yet.
I think what you are looking for is NumPy broadcasting.
import numpy as np
x = np.array([[0, 1], [1, 1], [2, 2]])
rowsum = x.sum(axis=1)
x[rowsum <= 2]
Gives:
array([[0, 1],
[1, 1]])
The problem is that you used keepdims=True, which makes the sum a column vector of shape (3, 1) rather than a 1-D array of shape (3,). A boolean index must match the axes it indexes, and here only the first axis of x should be indexed.
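If the keepdims=True result is needed elsewhere, a minimal sketch of one workaround is to drop the extra axis from the mask before indexing (the names mask and selected are mine):

```python
import numpy as np

x = np.array([[0, 1], [1, 1], [2, 2]])
rowsum = x.sum(-1, keepdims=True)   # shape (3, 1)

# Take column 0 of the boolean mask so it has shape (3,), matching axis 0 of x.
mask = (rowsum <= 2)[:, 0]
selected = x[mask]
```

Calling .ravel() or np.squeeze on the mask would serve the same purpose here.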

Python: Cell arrays comparison using minus function

I have 3 cell arrays, each containing arrays of different sizes. How can I subtract them for each possible combination of cell arrays? For example:
import numpy as np
a=np.array([[np.array([[2,2,1,2]]),np.array([[1,3]])]])
b=np.array([[np.array([[4,2,1]])]])
c=np.array([[np.array([[1,2]]),np.array([[4,3]])]])
The possible combinations here are a-b, a-c and b-c. Let's take a - b:
a=2,2,1,2 and 1,3
b=4,2,1
The desired result comes from shifting windows, because the arrays differ in size:
(2,2,1)-(4,2,1) ----> -2,0,0
(2,1,2)-(4,2,1) ----> -2,-1,1
(1,3) -(4,2) ----> -3,1,1
(1,3) -(2,1) ----> 4,-1,2
I would like to know how to create a shifting window in Python that lets me subtract my cell arrays.
You can use the function sliding_window() from the toolz library to do the shifting window:
>>> import numpy as np
>>> import toolz
>>> a = np.array([2,2,1,2])
>>> b = np.array([4, 2, 1])
>>> for chunk in toolz.sliding_window(b.size, a):
...     print(chunk - b)
...
[-2 0 0]
[-2 -1 1]
I think this pair of functions does what you want. The first may need some tweaking to get the pairing of the differences right.
import numpy as np
def diffs(a,b):
    # collect sliding window differences
    # length of window determined by the shorter array
    # if a,b are not arrays, need to replace b[...]-a with
    # a list comprehension
    n,m=len(a),len(b)
    if n>m:
        # ensure a is the shorter
        b,a=a,b # switch
        n,m=len(a),len(b)
        # may need to correct for sign switch
    result=[]
    for i in range(0,1+m-n):
        result.append(b[i:i+n]-a)
    return result
def alldiffs(a,b):
    # collect all the differences for elements of a and b
    # a,b could be lists or arrays of arrays, or 2d arrays
    result=[]
    for aa in a:
        for bb in b:
            result.append(diffs(aa,bb))
    return result
# define the 3 arrays
# each is a list of 1d arrays
a=[np.array([2,2,1,2]),np.array([1,3])]
b=[np.array([4,2,1])]
c=[np.array([1,2]),np.array([4,3])]
# display the differences
print(alldiffs(a,b))
print(alldiffs(a,c))
print(alldiffs(b,c))
producing (with some pretty printing):
1626:~/mypy$ python stack30678737.py
[[array([-2, 0, 0]), array([-2, -1, 1])],
[array([ 3, -1]), array([ 1, -2])]]
[[array([1, 0]), array([ 1, -1]), array([0, 0])],
[array([-2, -1]), array([-2, -2]), array([-3, -1])],
[array([ 0, -1])], [array([3, 0])]]
[[array([3, 0]), array([ 1, -1])],
[array([ 0, -1]), array([-2, -2])]]
Comparing my answer to yours, I wonder, are you padding your shorter arrays with 0 so the result is always 3 elements long?
Changing a to a=[np.array([2,2,1,2]),np.array([0,1,3]),np.array([1,3,0])]
produces:
[[array([-2, 0, 0]), array([-2, -1, 1])],
[array([ 4, 1, -2])], [array([ 3, -1, 1])]]
I suppose you could do something fancier with this inner loop:
for i in range(0,1+m-n):
    result.append(b[i:i+n]-a)
But why? The first order of business is to get the problem specifications clear. Speed can wait. Besides sliding window code in image packages, there is a neat striding trick in np.lib.stride_tricks.as_strided. But I doubt if that will save time, especially not in small examples like this.
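For reference, newer NumPy releases (1.20 and later) wrap the striding trick in np.lib.stride_tricks.sliding_window_view, which is safer than calling as_strided by hand. A sketch of the inner loop rewritten that way follows. The function name diffs_vectorized is hypothetical, and it keeps the same "longer minus shorter" convention as diffs above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def diffs_vectorized(a, b):
    """Sliding-window differences; the shorter array sets the window length."""
    a, b = np.asarray(a), np.asarray(b)
    if len(a) > len(b):
        a, b = b, a                      # make a the shorter one
    # Rows of the view are b[i:i+len(a)]; subtracting a broadcasts over rows.
    return sliding_window_view(b, len(a)) - a

result_ab = diffs_vectorized(np.array([2, 2, 1, 2]), np.array([4, 2, 1]))
result_short = diffs_vectorized(np.array([1, 3]), np.array([4, 2, 1]))
```

This returns one 2-D array per pair instead of a list of 1-D arrays. Whether it is actually faster for inputs this small is not something I have measured.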

How to generate 2d numpy array?

I'm trying to generate a 2d numpy array with the help of generators:
x = [[f(a) for a in g(b)] for b in c]
And if I try to do something like this:
x = np.array([np.array([f(a) for a in g(b)]) for b in c])
As expected, I get an np.array of np.arrays. That is not what I want: I want a 2-D ndarray, so that I can get, for example, a column like this:
y = x[:, 1]
So, I'm curious whether there is a way to generate it like that.
Of course, it is possible to create an ndarray of the required size and fill it with the required values, but I want a way to do it in one line of code.
This works:
a = [[1, 2, 3], [4, 5, 6]]
nd_a = np.array(a)
So this should work too:
nd_a = np.array([[x for x in y] for y in a])
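Applied to the shape of the expression in the question, a small sketch follows. The definitions of f, g and c here are hypothetical stand-ins. The important assumption is that g(b) yields the same number of items for every b, since otherwise the rows are ragged and no regular 2-D array can be built.

```python
import numpy as np

def f(a):
    # Hypothetical element function.
    return a * a

def g(b):
    # Hypothetical generator; yields three items for every b.
    return (b + k for k in range(3))

c = [0, 10, 20]

# The inner comprehensions produce plain lists, so np.array sees a
# rectangular list of lists and builds a single 2-D array.
x = np.array([[f(a) for a in g(b)] for b in c])
y = x[:, 1]   # second column
```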
To create a new array, it seems numpy.zeros is the way to go
import numpy as np
a = np.zeros(shape=(x, y))
You can also set a datatype to allocate it sensibly
>>> np.zeros(shape=(5,2), dtype=np.uint8)
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0]], dtype=uint8)
>>> np.zeros(shape=(5,2), dtype="datetime64[ns]")
array([['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
       ['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
       ['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
       ['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
       ['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000']],
      dtype='datetime64[ns]')
See also
How do I create an empty array/matrix in NumPy?
np.full(size, 0) vs. np.zeros(size) vs. np.empty()
It's very simple; do it like this:
import numpy as np
arr=np.arange(50)
arr_2d=arr.reshape(10,5) # Reshape the 1-D array into a 2-D array with 10 rows and 5 columns.
print(arr_2d)
