I am facing a situation where I have a VERY large numpy.ndarray (really, an HDF5 dataset) from which I need to pull subsets quickly, because the entire array cannot be held in memory. I also do not want to iterate through the whole array element by element (even creating the built-in numpy iterator raises a MemoryError), because my script would take literally days to run.
As such, I'm faced with iterating through some dimensions of the array so that I can perform array operations on pared-down subsets of the full array. To do that, I need to be able to slice out a subset of the array dynamically. Dynamic slicing means constructing an index tuple and passing it to the array.
For example, instead of
my_array[0,0,0]
I might use
my_array[(0,0,0,)]
Here's the problem: if I want to slice out all values along a particular dimension/axis of the array manually, I could do something like
my_array[0,:,0]
> array([1, 4, 7])
However, this does not work if I use a tuple:
my_array[(0,:,0,)]
where I'll get a SyntaxError.
How can I do this when I have to construct the slice dynamically to put something in the brackets of the array?
You can slice dynamically using Python's built-in slice object:
>>> a = np.random.rand(3, 4, 5)
>>> a[0, :, 0]
array([ 0.48054702, 0.88728858, 0.83225113, 0.12491976])
>>> a[(0, slice(None), 0)]
array([ 0.48054702, 0.88728858, 0.83225113, 0.12491976])
The slice constructor reads as slice(start, stop[, step]). If only one argument is passed, it is interpreted as slice(stop), with start and step defaulting to None.
In the example above, : is translated to slice(None, None, None), which is equivalent to slice(None).
Other slice examples:
:5 -> slice(5)
1:5 -> slice(1, 5)
1: -> slice(1, None)
1::2 -> slice(1, None, 2)
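Putting it together for the original problem, here is a minimal sketch of building the index tuple programmatically (the array a and the axis choices are only illustrative):

import numpy as np

a = np.random.rand(3, 4, 5)

# Fix axes 0 and 2, take everything along axis 1.
idx = (0, slice(None), 0)
assert np.array_equal(a[idx], a[0, :, 0])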
Okay, I finally found an answer just as someone else did.
Suppose I have array:
my_array[...]
> array([[[ 1,  2,  3],
          [ 4,  5,  6],
          [ 7,  8,  9]],
         [[10, 11, 12],
          [13, 14, 15],
          [16, 17, 18]]])
I can use the slice object, which apparently is a thing:
sl1 = slice(None)
sl2 = slice(1, 2)
sl3 = slice(None)
my_array[(sl1, sl2, sl3)]
> array([[[ 4,  5,  6]],
         [[13, 14, 15]]])
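For the original use case of walking a dataset that is too big for memory, the same trick lets you build the index tuple on the fly. A rough sketch, assuming an h5py dataset; h5py, the file name, and the process() call are placeholders, not part of the original post:

import h5py  # assumption: the HDF5 dataset is accessed through h5py

def iter_axis(dataset, axis):
    # Yield one slab per index along `axis`, loading only that slab into memory.
    for i in range(dataset.shape[axis]):
        idx = tuple(i if ax == axis else slice(None) for ax in range(len(dataset.shape)))
        yield dataset[idx]

# Hypothetical usage:
# with h5py.File("data.h5", "r") as f:
#     for slab in iter_axis(f["my_dataset"], axis=0):
#         process(slab)  # placeholder for the real array operations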
If you do e.g. the following:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
print(a[2:10])
Python won't complain and prints the array just as a[2:] would, which would be great for my use case. I want to loop through a large array and slice it into equally sized chunks until the array is "used up". The last chunk can thus be smaller than the rest, which doesn't matter to me.
However, I'm concerned about security issues, performance penalties, the possibility of this behaviour being deprecated in the near future, etc. Is it safe and intended to use slicing like this, or should I avoid it and go the extra mile to make sure the last chunk is sliced as a[2:] or a[2:len(a)]?
There are related answers like this one, but I haven't found anything addressing my concerns.
Slice resolution is not done in numpy. slice objects have a convenience method, indices, whose C-level counterpart is documented under PySlice_GetIndices. In fact, the Python documentation states that slice objects have no other explicit functionality besides storing indices.
When you run a[2:10], the slice object is slice(2, 10), and the length of the axis is a.shape[0] == 5:
>>> slice(2, 10).indices(5)
(2, 5, 1)
This is built-in Python behavior, handled at a lower level than numpy. The linked question has an example of getting an error for the corresponding index:
>>> a[np.arange(2, 10)]
In this case, the passed object is not a slice, so it does get handled by numpy, and raises an error:
IndexError: index 5 is out of bounds for axis 0 with size 5
This is the same error that you would get if you tried accessing the invalid index individually:
>>> a[5]
...
IndexError: index 5 is out of bounds for axis 0 with size 5
Incidentally, python lists and tuples will check the bounds on a scalar index as well:
>>> a.tolist()[5]
...
IndexError: list index out of range
You can implement your own bounds checking, for example to create a fancy index using slice.indices:
>>> a[np.arange(*slice(2, 10).indices(a.shape[0]))]
array([[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
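For the chunking use case from the question, the clamping behaviour means no special handling of the last chunk is needed; a minimal sketch (the array and chunk size are made up):
>>> a = np.arange(23)
>>> chunk_size = 5
>>> chunks = [a[i:i + chunk_size] for i in range(0, len(a), chunk_size)]
>>> [len(c) for c in chunks]  # the last chunk is simply shorter
[5, 5, 5, 5, 3]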
I have a numpy array a in which I would like to replace some elements. I have the values of the new elements in a tuple/numpy array, and the indexes of the elements of a that need to be replaced in another tuple/numpy array. Below is an example of using plain Python to do what I want. How do I do this efficiently in NumPy?
Example script:
a = np.arange(10)
print( f'a = {a}' )
newvalues = (10, 20, 35)
indexes = (2, 4, 6)
for n, i in enumerate(indexes):
    a[i] = newvalues[n]
print( f'a = {a}' )
Output:
a = [0 1 2 3 4 5 6 7 8 9]
a = [ 0  1 10  3 20  5 35  7  8  9]
I tried a[indexes]=newvalues but got IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
The list of indices indicating which elements you want to replace should be a Python list (or similar type), not a tuple. Different items in the selection tuple indicate that they should be selected from different axis dimensions.
Therefore, a[(2, 4, 6)] is the same as a[2, 4, 6], which is interpreted as the value at index 2 in the first dimension, index 4 in the second dimension, and index 6 in the third dimension.
The following code works correctly:
indexes = [2, 4, 6]
a[indexes] = newvalues
See also the page on Indexing from the numpy documentation, specifically the second 'Note' block in the introduction as well as the first 'Warning' under Advanced Indexing:
In Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]; the latter is just syntactic sugar for the former.
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
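A short demonstration of that distinction, reusing the 1-D array from the question (the trailing comma makes the whole tuple a single index):
>>> a = np.arange(10)
>>> a[(2, 4, 6), ]   # tuple containing one sequence -> advanced indexing
array([2, 4, 6])
>>> a[[2, 4, 6]]     # a list does the same thing
array([2, 4, 6])
>>> a[(2, 4, 6)]     # same as a[2, 4, 6]: three indices for a 1-D array
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed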
The use case is that rules can be turned on or off.
I was wondering if I could use the numpy array indexing expression for this.
For example, the user can specify:
10:40:3
which would mean the rules are active from day 10 to day 40, every third day.
How do I index into an array using such an expression?
A 10:40:3 expression only works inside [], e.g.
x[10:40:3]
where x is a list or array.
The Python interpreter translates that into:
x.__getitem__(slice(10,40,3))
It would be possible to convert a '10:40:3' string into a slice(10,40,3) object, but you could also accept three integers and build the slice from that.
A slice can be used as:
idx = slice(10,40,3)
x[idx]
In [683]: idx = slice(1,10,2)
In [684]: np.arange(20)[idx]
Out[684]: array([1, 3, 5, 7, 9])
In [685]: np.arange(1,10,2)
Out[685]: array([1, 3, 5, 7, 9])
A simple way of making a slice from the string:
In [687]: slice(*[int(i) for i in astr.split(':')])
Out[687]: slice(1, 10, 2)
numpy also defines some special objects that can help with slices, but none of them work with strings:
In [690]: np.r_[idx]
Out[690]: array([1, 3, 5, 7, 9])
In [691]: np.s_[1:10:2]
Out[691]: slice(1, 10, 2)
In [693]: np.s_[1::2]
Out[693]: slice(1, None, 2)
In [694]: np.s_[:]
Out[694]: slice(None, None, None)
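If the rule really arrives as a '10:40:3'-style string, a small parser can turn it into a slice. This helper (parse_slice) is only a sketch; empty fields are mapped to None, mirroring what the : syntax does:

def parse_slice(spec):
    # '10:40:3' -> slice(10, 40, 3), '10:' -> slice(10, None), ':40' -> slice(None, 40)
    return slice(*[int(p) if p else None for p in spec.split(':')])

days = np.arange(365)
days[parse_slice('10:40:3')]   # array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37])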
I'm new to Python and I am stuck. I've been playing with this a lot. I am trying to join my 3 lists, and when I do, Python says that the new list only contains 1 item. How do I get them to merge completely?
Here is the code I have now (where avg is some array containing lots of data):
q=avg[0:40]
p=avg[53:70]
u=avg[95:145]
pu=p+u
NF=[numpy.append(q,pu)]
>>> len(NF)
1
but the actual length of all the items is 107.
Please help
If avg is a list, then q, p and u are slices of a list and will therefore be lists too. In that case, you can concatenate the lists using addition:
q+p+u
If you want a NumPy array, you could use np.concatenate:
In [48]: avg = np.arange(20)
In [49]: q = avg[0:4]
In [50]: p = avg[5:7]
In [51]: u = avg[9:14]
In [52]: np.concatenate([q,p,u])
Out[52]: array([ 0, 1, 2, 3, 5, 6, 9, 10, 11, 12, 13])
I made the arrays smaller so the result is easier to check.
Other alternatives include np.hstack and np.r_:
In [53]: np.hstack([q,p,u])
Out[53]: array([ 0, 1, 2, 3, 5, 6, 9, 10, 11, 12, 13])
In [54]: np.r_[q,p,u]
Out[54]: array([ 0, 1, 2, 3, 5, 6, 9, 10, 11, 12, 13])
In the above examples, q, p, and u may be NumPy arrays or Python lists. In each case a NumPy array is returned.
You put your array inside another list. Try this:
NF=numpy.append(q,pu)
q=avg[0:40]
p=avg[53:70]
u=avg[95:145]
pu=p+u
NF = [numpy.append(q, pu)]   # problem right here: just do NF = numpy.append(q, pu)
>>> len(NF)
1
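With the brackets removed, the length comes out as expected. A quick check, assuming avg is a plain Python list (as the p + u concatenation implies) with at least 145 elements:

avg = list(range(150))    # stand-in for the real data
q = avg[0:40]             # 40 elements
p = avg[53:70]            # 17 elements
u = avg[95:145]           # 50 elements
pu = p + u

NF = numpy.append(q, pu)  # no surrounding [...], so NF is the concatenated array
len(NF)                   # 107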
Okay, this is going to be horrifically slow; maybe numpy has its own way of doing it, but using the plain Python way, and provided that you have one-dimensional arrays, try this:
from itertools import chain
items = chain.from_iterable([avg[0:40], avg[53:70], avg[95:145]])
The statement above returns a generator, which can be converted into a list so you can check its length.
item_list = [x for x in items]
len(item_list)
When using slicing in NumPy, you get all pair-wise elements, e.g.:
>> im = np.arange(1,37).reshape((6, 6))
>> im[1:6:2,1:6:2]
array([[ 8, 10, 12],
       [20, 22, 24],
       [32, 34, 36]])
However, when using lists/tuples of indices, this behavior does not seem to be followed:
>> im[(1,3,5),(1,3,5)]
array([ 8, 22, 36])
>> im[[1,3,5],[1,3,5]]
array([ 8, 22, 36])
It instead gets just the diagonal (in this case). This is problematic if you cannot specify the indices as slices, for example (1,3,4) and (1,3,6). For those two tuples I would expect to get all elements at (1,1), (1,3), (1,6), (3,1), ...
All the workarounds I can think of involve fleshing out every pair of indices, which is incredibly expensive when trying to extract large numbers of elements from massive images. In MATLAB, im([1,3,5],[1,3,5]) does what I would want. I know there are many tricks in NumPy's indexing and I am probably just missing some subtlety.
As a conclusion, example workarounds:
im[np.meshgrid([1,3,5], [1,3,5], indexing='ij')]
im[zip(*itertools.product([1,3,5], [1,3,5]))].reshape((3,3))
Try numpy.ix_:
>>> im[np.ix_((1,3,5),(1,3,5))]
array([[ 8, 10, 12],
[20, 22, 24],
[32, 34, 36]])
Or you can directly do this:
>>> ix = np.array([1, 3, 5])
>>> iy = np.array([1, 3, 5])
>>> im[ix[:, np.newaxis], iy[np.newaxis, :]]
array([[ 8, 10, 12],
[20, 22, 24],
[32, 34, 36]])
Is this what you need?
i1 = [1, 3, 5]
i2 = [1, 3, 5]
print(im[i1][:, i2].ravel())
Note that a temporary array is created by the first indexing step. If your array is very big, that might be undesirable.
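To make that temporary explicit, the chained indexing works in two steps; the shapes below assume the 6x6 im from the question:

rows = im[i1]         # temporary copy of rows 1, 3, 5 -> shape (3, 6)
block = rows[:, i2]   # then columns 1, 3, 5 of that copy -> shape (3, 3)
block
# array([[ 8, 10, 12],
#        [20, 22, 24],
#        [32, 34, 36]])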
The answers by other people are correct. This is just to explain why it is happening.
From the numpy documentation on indexing:
When indexing like x[obj], advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool).
Your case falls into the last category (a tuple with at least one sequence object), and hence im[(1,3,5),(1,3,5)] triggers advanced indexing. Later on, in the documentation of advanced indexing, it is explained:
Advanced indexes always are broadcast and iterated as one:
result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
..., ind_N[i_1, ..., i_M]]
Note that the result shape is identical to the (broadcast) indexing array shapes ind_1, ..., ind_N.
That is, result[i_1] would be x[ind_1[i_1], ind_2[i_1], ..., ind_N[i_1]].
The documentation suggests using np.ix_ to achieve behavior similar to basic slicing:
To achieve a behaviour similar to the basic slicing above, broadcasting can be used. The function ix_ can help with this broadcasting. This is best understood with an example.
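Concretely, np.ix_ just builds a column-shaped and a row-shaped index array that broadcast against each other; a short check with the 6x6 im from the question:

rows, cols = np.ix_([1, 3, 5], [1, 3, 5])
rows.shape, cols.shape                            # (3, 1), (1, 3)
np.array_equal(im[rows, cols], im[1:6:2, 1:6:2])  # True: same block as basic slicing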