I am looking for an efficient way of indexing the columns of a numpy array with several ranges, when only the indexes of the desired ranges are given.
For example, given the following array, and a range size r_size=3:
import numpy as np
arr = np.arange(18).reshape((2,9))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17]])
This would mean that there are a total of 3 sets of ranges [r0, r1, r2] whose elements in the array are distributed as:
[[r0_00, r0_01, r0_02, r1_00, r1_01, r1_02, r2_00, r2_01, r2_02]
[r0_10, r0_11, r0_12, r1_10, r1_11, r1_12, r2_10, r2_11, r2_12]]
So if I want to access the ranges r0 and r2, then I would obtain:
arr = np.arange(18).reshape((2,9))
r_size = 3
ranges = [0, 2]
# --------------------------------------------------------
# Line that indexes arr using the variable ranges... Output:
# --------------------------------------------------------
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
The fastest way that I've found is the following:
import numpy as np
from itertools import chain
arr = np.arange(18).reshape((2,9))
r_size = 3
ranges = [0,2]
arr[:, list(chain(*[range(r_size*x,r_size*x+r_size) for x in ranges]))]
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
But I am not sure if it can be improved in terms of speed.
Thanks in advance!
You could start by splitting the array into chunks of r_size columns each (np.split takes the number of sections, not the chunk size):
>>> splits = np.split(arr, arr.shape[1] // r_size, axis=1)
[array([[ 0, 1, 2],
[ 9, 10, 11]]),
array([[ 3, 4, 5],
[12, 13, 14]]),
array([[ 6, 7, 8],
[15, 16, 17]])]
Stack with np.stack and select the correct ranges:
>>> stack = np.stack(splits)[ranges]
array([[[ 0, 1, 2],
[ 9, 10, 11]],
[[ 6, 7, 8],
[15, 16, 17]]])
And concatenate horizontally with np.hstack or np.concatenate on axis=1:
>>> np.hstack(stack)
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
Overall this looks like:
>>> np.hstack(np.stack(np.split(arr, arr.shape[1] // r_size, axis=1))[ranges])
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
Alternatively, you can work with reshapes exclusively, which should be faster:
Initial reshape:
>>> arr.reshape(len(arr), -1, r_size)
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
Indexing with ranges:
>>> arr.reshape(len(arr), -1, r_size)[:, ranges]
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 9, 10, 11],
[15, 16, 17]]])
Then, reshaping back to the final form:
>>> arr.reshape(len(arr), -1, r_size)[:, ranges].reshape(len(arr), -1)
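The reshape steps above can be wrapped into a small function. This is only a sketch: the name select_ranges is hypothetical (it is not part of NumPy), and it assumes a 2-D array whose number of columns is a multiple of r_size.

```python
import numpy as np

def select_ranges(arr, r_size, ranges):
    """Return the columns of `arr` that belong to the blocks listed in `ranges`.

    Hypothetical helper: assumes `arr` is 2-D and that arr.shape[1]
    is a multiple of `r_size`.
    """
    n_rows, n_cols = arr.shape
    if n_cols % r_size:
        raise ValueError("number of columns must be a multiple of r_size")
    # Group the columns into blocks of r_size (a view for a contiguous array).
    blocks = arr.reshape(n_rows, n_cols // r_size, r_size)
    # Indexing with a list copies the selected blocks; then flatten them back.
    return blocks[:, ranges].reshape(n_rows, -1)

arr = np.arange(18).reshape((2, 9))
result = select_ranges(arr, r_size=3, ranges=[0, 2])
print(result)
```

The list passed as ranges does not have to be sorted or evenly spaced, since the selection is ordinary integer-array indexing along the block axis.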
You will inevitably need to copy the data to get the desired result in a contiguous array. To make this efficient, I would suggest minimizing the number of times the data is copied. Any kind of reshaping operation can be expressed with np.lib.stride_tricks.as_strided.
Assume the original array contains 64-bit integers, then each element is 8 bytes arranged in some shape:
import numpy as np
arr = np.arange(18).reshape((2,9))
arr.shape, arr.strides
output:
((2, 9), (72, 8))
So moving to the next column skips 8 bytes and moving to the next row skips 72 bytes. arr.reshape(len(arr), -1, r_size) can be expressed as:
np.lib.stride_tricks.as_strided(arr, (2,3,3), (72,24,8))
output:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
And arr.reshape(len(arr), -1, r_size)[:, ranges] can be expressed as:
np.lib.stride_tricks.as_strided(arr, (2,2,3), (72,24*2,8))
Output:
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 9, 10, 11],
[15, 16, 17]]])
So far, we have only changed the metadata of the array, which means that no data has been copied. This operation has a near-zero performance cost. But to get the final array you will need to copy the data somehow:
np.lib.stride_tricks.as_strided(arr, (2,2,3), (72,24*2,8)).reshape(len(arr), -1)
Output:
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
This is not a generalized solution, but it might give you some ideas nonetheless on how to optimize.
Unfortunately, my timings do not back these claims, but the idea is still intuitive and worth testing on larger arrays.
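As a rough sketch of how the hard-coded strides could be derived instead, the snippet below reads them from arr.strides so it does not depend on the integer size. It only covers evenly spaced blocks; the names first, step and count are assumptions introduced here to describe "start at block first, take every step-th block, count blocks in total".

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(18).reshape((2, 9))
r_size = 3
# Assumed description of the wanted blocks: blocks 0 and 2.
first, step, count = 0, 2, 2

row_stride, col_stride = arr.strides
view = as_strided(
    arr[:, first * r_size:],                 # start at the first wanted block
    shape=(arr.shape[0], count, r_size),
    strides=(row_stride, step * r_size * col_stride, col_stride),
    writeable=False,                         # guard against accidental writes
)

# The strided view still points at arr's memory; nothing has been copied yet.
shares = np.shares_memory(view, arr)

# Flattening the blocks is the step that copies the data.
result = view.reshape(arr.shape[0], -1)
print(shares, result)
```

as_strided performs no bounds checking, so the shape and strides have to stay inside the original buffer; for arbitrary, unevenly spaced ranges the reshape-and-index approach is the safer choice.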
I have a multidimensional gridded array with dimensions (29, 320, 180), where 29 is the number of arrays, 320 is the number of latitude values and 180 is the number of longitude values. I want to find the minimum value at every grid point across all 29 arrays, so that I end up with a 320x180 array holding the minimum value at each grid point. I should underline that every array has a large number of NaN values. How can I achieve that?
For example two arrays with same dimensions:
a=[[1,2,3],[3,5,8],[4,8,12]]
b=[[3,5,6],[9,12,5],[5,6,14]]
and the wanted output will be an array with the min value at each index, meaning:
c=[[1,2,3],[3,5,5],[4,6,12]]
I wasn't sure whether you needed the minimum of each array along columns or rows; you can choose which one you want in the example below.
Let's create an example of several small 2D arrays:
import numpy as np
ex_dict = {}
lat_min = []
lon_min = []
# Create fake data: instead of the 29 arrays of dimensions 320x180, use 5 arrays of dimensions 2x5 (so we can see the output), all stored in a dictionary (because it's easier for me to create them randomly that way :)
for i in range(0, 5):
    ex_dict[i] = np.stack([np.random.choice(range(i, 20), 5, replace=False) for _ in range(2)])
Let's look at our arrays:
ex_dict
{0: array([[19, 18, 5, 13, 6],
[ 5, 12, 3, 8, 0]]),
1: array([[10, 13, 2, 19, 15],
[ 5, 19, 6, 8, 14]]),
2: array([[ 5, 17, 10, 11, 7],
[19, 2, 11, 5, 6]]),
3: array([[14, 3, 17, 4, 11],
[18, 10, 8, 3, 7]]),
4: array([[15, 8, 18, 14, 10],
[ 5, 19, 12, 16, 13]])}
Then let's fill the lists with the minimum values for each array (lat_min contains the minimum of each row and lon_min the minimum of each column, for every array):
# for each of the 5 arrays (in this example, stored in the ex_dict dictionary), find the minimum in each row (axis=1) and each column (axis=0)
for i in ex_dict:
    lat_min.append(np.nanmin(ex_dict[i], axis=1))
    lon_min.append(np.nanmin(ex_dict[i], axis=0))
Our lists with minimum values:
lat_min
[array([5, 0]), array([2, 5]), array([5, 2]), array([3, 3]), array([8, 5])]
lon_min
[array([ 5, 12, 3, 8, 0]),
array([ 5, 13, 2, 8, 14]),
array([ 5, 2, 10, 5, 6]),
array([14, 3, 8, 3, 7]),
array([ 5, 8, 12, 14, 10])]
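If the goal is the element-wise minimum across all arrays, as in the question's a, b and c example, a possible sketch is to stack the arrays along a new first axis and reduce over that axis with np.nanmin. The NaN placed in b below is an assumption added only to show that missing values are skipped.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 5, 8], [4, 8, 12]], dtype=float)
b = np.array([[3, 5, 6], [9, 12, 5], [5, 6, 14]], dtype=float)
b[0, 0] = np.nan  # assumed missing value, to show that NaNs are ignored

stacked = np.stack([a, b])       # shape (2, 3, 3): one layer per array
c = np.nanmin(stacked, axis=0)   # minimum over the layers at each grid point
print(c)
```

For data that already has shape (29, 320, 180), np.nanmin(data, axis=0) gives the (320, 180) result directly; for arrays held in a dictionary such as ex_dict, np.stack(list(ex_dict.values())) builds the stacked array first. Grid points that are NaN in every layer come out as NaN, with a RuntimeWarning.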
Let's create a large np array 'a' with 10,000 entries
import numpy as np
a = np.arange(0, 10000)
Let's slice the array with 'n' indices 0->9, 1->10, 2->11, etc.
n = 32
b = list(map(lambda x:np.arange(x, x+10), np.arange(0, n)))
c = a[b]
The weird thing is that if n is smaller than 32, I get the error "IndexError: too many indices for array". If n is greater than or equal to 32, the code works perfectly. The error occurs regardless of the size of the initial array or of the individual slices, but always at the number 32. Note that if n == 1, the code also works.
Any idea on what is causing this? Thank you.
Your b is a list of arrays:
In [84]: b = list(map(lambda x:np.arange(x, x+10), np.arange(0, 5)))
In [85]: b
Out[85]:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]),
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]),
array([ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])]
When used as an index:
In [86]: np.arange(1000)[b]
/usr/local/bin/ipython3:1: FutureWarning: Using a non-tuple sequence for multidimensional
indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`.
In the future this will be interpreted as an array index, `arr[np.array(seq)]`,
which will result either in an error or a different result.
#!/usr/bin/python3
---------------------------------------------------------------
IndexError: too many indices for array
A[1,2,3] is the same as A[(1,2,3)] - that is, the comma separated indices are a tuple, which is then passed on to the indexing function. Or to put it another way, a multidimensional index should be a tuple (that includes ones with slices).
Up to now numpy has been a bit sloppy, and allowed us to use a list of indices in the same way. The warning tells us that the developers are in the process of tightening up those restrictions.
The error means it is trying to interpret each array in your list as the index for a separate dimension. An array can have at most 32 dimensions. Evidently for the longer list it doesn't try to treat it as a tuple, and instead creates a 2d array for indexing.
There are various ways we can use your b to index a 1d array:
In [87]: np.arange(1000)[np.hstack(b)]
Out[87]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
In [89]: np.arange(1000)[np.array(b)] # or np.vstack(b)
Out[89]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
In [90]: np.arange(1000)[b,] # 1d tuple containing b
Out[90]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
Note that if b is a ragged list - one or more of the arrays is shorter, only the hstack version works.
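A short sketch of the distinction, written so that it does not rely on how a particular NumPy version interprets a bare list: an explicit array index selects along the single axis, while an explicit tuple supplies one index per dimension and therefore fails on a 1d array.

```python
import numpy as np

a = np.arange(1000)
b = [np.arange(x, x + 10) for x in range(5)]

flat = a[np.hstack(b)]   # one long 1d index array, result shape (50,)
rows = a[np.array(b)]    # one 2d index array, result shape (5, 10)

# A tuple means "one index per dimension": five indices for a 1d array.
try:
    a[tuple(b)]
    tuple_ok = True
except IndexError:
    tuple_ok = False

print(flat.shape, rows.shape, tuple_ok)
```

Converting b explicitly with np.array or np.hstack states the intent and avoids the ambiguity that the warning describes.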
First of all, you're not slicing 0->9, 10->19, 20->29; your slices advance by 1 only: 0->9, 1->10, 2->11. Instead, try this:
n = 32
size = 10
b = list(map(lambda x:np.arange(x, x+size), np.arange(0, n*size, size)))
Next, you've misused the indexing notation. b is a list of arrays, and you've used this entire list to index a. When the list has 32 or more elements, numpy treats it as a single 2-D index array, so each row of b selects one row of the result.
However, once the list is shorter than 32 elements (the maximum number of dimensions numpy supports), numpy assumes that you're trying to give a multi-dimensional index into a: each element of b is taken as the index for the corresponding dimension of a. Since a is only 1-dimensional, you get the error message. Your code will run in this mode with n=1, but fails with n=2 and above.
Although your question isn't a duplicate, also please see this one.
This is a question about the use of reshape and how this function uses each axis of a multidimensional array.
Suppose I have the following array that contains matrices indexed by the first index. What I want to achieve is to instead index the columns of each matrix with the first index. In order to illustrate this problem, consider the following example where the given numpy array that indexes matrices with its first index is z.
x = np.arange(9).reshape((3, 3))
y = np.arange(9, 18).reshape((3, 3))
z = np.dstack((x, y)).T
Where z looks like:
array([[[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8]],
[[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]]])
And its shape is (2, 3, 3). Here, the first index selects one of the two images, and the remaining 3x3 is a matrix.
The question more specifically phrased then, is how to use reshape to obtain the following desired output:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
Whose shape is (6, 3). This achieves that the dimension of the array indexes the columns of the matrix x and y as presented above. My natural inclination was to use reshape directly on z in the following way:
out = z.reshape(2 * 3, 3)
But its output is the following which indexes the rows of the matrices and not the columns:
array([[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]])
Could reshape be used to obtain the desired output above? Or, more generally, can you control how each axis is used when you use the reshape function?
Two things:
I know how to solve the problem. I can transpose each element of the big matrix (z) and then apply reshape in the way above. This increases computation time a little and is not really problematic, but it does not generalize and it does not feel Pythonic. So I was wondering if there is a standard enlightened way of doing this.
I was not clear on how to phrase this question. If anyone has suggestions on how to better phrase this problem, I am all ears.
Every array has a natural (1D flattened) order to its elements. When you reshape an array, it is as though it were flattened first (thus obtaining the natural order), and then reshaped:
In [54]: z.ravel()
Out[54]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [55]: z.ravel().reshape(2*3, 3)
Out[55]:
array([[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]])
Notice that in the "natural order", 0 and 1 are far apart. However you reshape it, 0 and 1 will not be next to each other along the last axis, which is what you want in the desired array:
desired = np.array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
This requires some reordering, which in this case can be done by swapaxes:
In [53]: z.swapaxes(1,2).reshape(2*3, 3)
Out[53]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
because swapaxes(1,2) places the values in the desired order
In [56]: z.swapaxes(1,2).ravel()
Out[56]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
In [57]: desired.ravel()
Out[57]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
Note that the reshape method also has an order parameter which can be used to control the (C- or F-) order with which the elements are read from the array and placed in the reshaped array. However, I don't think this helps in your case.
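As a small illustration of what the order parameter does, here is a sketch on a throwaway six-element array (v is an assumed example, not part of the question):

```python
import numpy as np

v = np.arange(6)

# C order: the last axis varies fastest, so rows are filled first.
c_order = v.reshape(2, 3)
# F order: the first axis varies fastest, so columns are filled first.
f_order = v.reshape(2, 3, order='F')

print(c_order)  # [[0 1 2]
                #  [3 4 5]]
print(f_order)  # [[0 2 4]
                #  [1 3 5]]
```

In both cases the elements are only read and written in a different order; reshape never reorders them arbitrarily, which is why an axis swap is needed for z.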
Another way to think about the limits of reshape is to say that all reshapes followed by ravel are the same:
In [71]: z.reshape(3,3,2).ravel()
Out[71]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [72]: z.reshape(3,2,3).ravel()
Out[72]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [73]: z.reshape(3*2,3).ravel()
Out[73]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [74]: z.reshape(3*3,2).ravel()
Out[74]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
So if the ravel of the desired array is different, there is no way to obtain it only by reshaping.
The same goes for reshaping with order='F', provided you also ravel with order='F':
In [109]: z.reshape(2,3,3, order='F').ravel(order='F')
Out[109]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
In [110]: z.reshape(2*3*3, order='F').ravel(order='F')
Out[110]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
In [111]: z.reshape(2*3,3, order='F').ravel(order='F')
Out[111]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
It is possible to obtain the desired array using two reshapes:
In [83]: z.reshape(2, 3*3, order='F').reshape(2*3, 3)
Out[83]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
but I stumbled upon this serendipitously.
If I've totally misunderstood your question and x and y are the givens (not z), then you could obtain the desired array using row_stack (an alias of np.vstack, deprecated since NumPy 2.0) instead of dstack:
In [88]: z = np.row_stack([x, y])
In [89]: z
Out[89]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
If you look at the dstack code you'll discover that
np.dstack((x, y)).T
is effectively:
np.concatenate([i[:,:,None] for i in (x,y)],axis=2).transpose([2,1,0])
It reshapes each component array and then joins them along this new axis. Finally it transposes axes.
Your target is the same as (row stack)
np.concatenate((x,y),axis=0)
So with a bit of reverse engineering we can create it from z with
np.concatenate([i[...,0] for i in np.split(z.T,2,axis=2)],axis=0)
np.concatenate([i.T[:,:,0] for i in np.split(z,2,axis=0)],axis=0)
or
np.concatenate(np.split(z.T,2,axis=2),axis=0)[...,0]
or with a partial transpose we can keep the split-and-rejoin axis first, and just use concatenate:
np.concatenate(z.transpose(0,2,1),axis=0)
or its reshape equivalent
(z.transpose(0,2,1).reshape(-1,3))
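The following sketch checks that several of the approaches discussed in this thread produce the same (6, 3) array. The dictionary name candidates and its labels are assumptions used only to organise the comparison.

```python
import numpy as np

x = np.arange(9).reshape((3, 3))
y = np.arange(9, 18).reshape((3, 3))
z = np.dstack((x, y)).T                   # shape (2, 3, 3), as in the question
desired = np.concatenate((x, y), axis=0)  # the target array, shape (6, 3)

candidates = {
    "swapaxes + reshape": z.swapaxes(1, 2).reshape(-1, 3),
    "transpose + reshape": z.transpose(0, 2, 1).reshape(-1, 3),
    "transpose + concatenate": np.concatenate(z.transpose(0, 2, 1), axis=0),
    "two reshapes": z.reshape(2, 3 * 3, order='F').reshape(2 * 3, 3),
}

for name, out in candidates.items():
    print(name, np.array_equal(out, desired))
```

swapaxes(1, 2) and transpose(0, 2, 1) are the same operation on a 3d array, so the first two entries differ only in spelling.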
I'm trying to flatten a 3d array in numpy over an axis (that is, reducing over an axis and flattening over another)
for instance, if I have
X = array(
[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9,10,11]],
[[12,13,14,15],
[16,17,18,19],
[20,21,22,23]]])
I want to find the operation that turns X into this:
array([
[ 0, 1, 2, 3,12,13,14,15],
[ 4, 5, 6, 7,16,17,18,19],
[ 8, 9,10,11,20,21,22,23]])
I found that in this case np.concatenate((X[0],X[1]), axis=1) gives the solution; however, I want a more generic and efficient way to perform this operation on an N-dimensional numpy array.
Use numpy.transpose:
>>> X.transpose(1, 0, 2).ravel()
array([ 0, 1, 2, 3, 12, 13, 14, 15, 4, 5, 6, 7, 16, 17, 18, 19, 8,
9, 10, 11, 20, 21, 22, 23])
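The raveled output above is flat, whereas the question asks for a 2d result. A minimal sketch, assuming X is the (2, 3, 4) array from the question, replaces ravel with a reshape that keeps the middle axis as rows:

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)   # same values as X in the question

# Move the axis to keep (length 3) to the front, then merge the other two.
out = X.transpose(1, 0, 2).reshape(X.shape[1], -1)
print(out)

# Cross-check against the concatenate approach from the question.
same = np.array_equal(out, np.concatenate((X[0], X[1]), axis=1))
print(same)
```

The same pattern extends to more dimensions: move the axis to keep to the front with transpose (or np.moveaxis) and reshape the rest to -1.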