Indexing and slicing structured ndarrays - python

I'm trying to understand the possible ways to index numpy structured arrays, and I've gotten kind of stuck. Just a couple of simple examples:
import numpy as np
arr = np.array(list(zip(range(5), range(5, 10))), dtype=[('a', int), ('b', int)])
arr[0] # first row (record)
arr[(0,)] # the same, as expected
arr['a'] # field 'a' of each record
arr[('a',)] # "IndexError: unsupported iterator index" ?!
arr[1:3] # second and third rows (records)
arr[1:3, 'a'] # "ValueError: invalid literal for long() with base 10: 'a'" ?!
arr['a', 1:3] # same error
arr[..., 'a'] # here too...
arr['a', ...] # and here
So, two subquestions arise:
Why is the result for a plain value ('a' in this case) different from the corresponding singleton tuple (('a',))?
Why do the last four lines raise an error? And, probably more important, how do I get the slice arr['a'][1:3] with a single indexing expression? As you can see, the obvious arr['a', 1:3] doesn't work.
I also observed the indexing behavior for built-in list and non-structured ndarray, but couldn't find such issues there: putting a single value in a tuple doesn't change anything, and of course indexing like arr[1, 1:3] for plain ndarray works as expected. Given that, should the errors in my example be considered as bugs in numpy?

First, fields are not the same thing as dimensions - although your array arr has two fields and five rows, numpy actually treats it as one-dimensional (it has shape (5,)). Second, tuples have a special status when used as indices into numpy arrays. When you put a tuple inside the square indexing brackets, numpy interprets it as a sequence of indices into the corresponding dimensions of the array. In the special case where you have nested tuples, each inner tuple is treated as a sequence of indices into that dimension (as if it were a list).
Since fields don't count as dimensions, when you index it with arr[('a',)], numpy interprets 'a' as an index into the rows of arr. The IndexError is therefore raised because strings aren't a valid type for indexing into a dimension of an array (what is the 'a'th row?).
The same thing happens when you try arr['a', 1:3], because this is equivalent to indexing with the tuple ('a', slice(1, 3, None)). The comma between 'a' and 1:3 is what makes it a tuple, regardless of the lack of brackets. Again, numpy tries to index into the rows of arr with 'a', which is invalid. However, even if both elements were valid index types, you would still get an IndexError, since the length of your tuple (2) is greater than the number of dimensions in arr (1).
arr['a'][1:3] and arr[1:3]['a'] are both perfectly valid ways to index a slice of a field.
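As a quick check of the points above, the sketch below rebuilds the question's array (wrapping zip in list is an assumption needed for Python 3) and shows that the array is one-dimensional, that both chained forms give the same values, and that a tuple mixing a field name with a slice is rejected. The exact exception type depends on the NumPy version, so both candidates are caught.

```python
import numpy as np

# Rebuild the array from the question; list() is needed on Python 3.
arr = np.array(list(zip(range(5), range(5, 10))),
               dtype=[('a', int), ('b', int)])

print(arr.shape)   # fields are not dimensions, so the array is 1-D

# Field first, then rows; or rows first, then field.
by_field = arr['a'][1:3]
by_rows = arr[1:3]['a']
print(by_field, by_rows)

# A tuple index is applied per dimension, so a field name inside it fails.
try:
    arr['a', 1:3]
    failed = False
except (IndexError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```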

Related

How to iterate over single element of numpy array

I have a numpy array of shape (100, 1), with every element similar to the one shown below
arr[0] = array(['37107287533902102798797998220837590246510135740250'], dtype=object)
I need to iterate over this single element of the array and get its last 10 elements. I have not been able to find out how to iterate over a single element.
I tried arr[0][-10:], but it returned the entire element and not the last 10 elements.
You can get what you want by list comprehension.
np.array([item[0][-10:] for item in arr])
If arr.shape is (100,1), then arr[0].shape is (1,), which is displayed as array(['astring']) with a single pair of brackets.
arr[0,0] should be a string, e.g. '37107287533902102798797998220837590246510135740250'
Strings take slice indexing, e.g. arr[0,0][-10:]
arr[0][0] also works to get one string, but the [0,0] syntax is better.
It isn't clear at what level you want to iterate, since just getting the last 10 characters of one of the string elements doesn't need iteration.
Anyways, pay attention to what each level of indexing is producing, whether it be another array, a list, or a string. Indexing rules for these different classes are similar, but different in important ways.
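To make the levels of indexing concrete, here is a minimal sketch. It uses a two-row stand-in for the (100, 1) array from the question; the second row is just the reversed string and is an assumption added for illustration.

```python
import numpy as np

s = '37107287533902102798797998220837590246510135740250'
# Small stand-in for the (100, 1) object array in the question.
arr = np.array([[s], [s[::-1]]], dtype=object)

row = arr[0]            # still an array, with shape (1,)
item = arr[0, 0]        # the Python string itself
print(row.shape, type(item).__name__)

print(row[-10:].shape)  # slicing the 1-element array keeps its single element
tail = item[-10:]       # slicing the string gives its last 10 characters
print(tail)

# Apply the same string slice to every row.
tails = [x[0][-10:] for x in arr]
print(tails)
```

This is why arr[0][-10:] appeared to return "the entire element": the slice was applied to a one-element array, not to the string inside it.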
import numpy as np
arr = np.array(['37107287533902102798797998220837590246510135740250'], dtype=object)
# print the last 10 characters of the string stored in the array
print(arr[0][-10:])
# iterate over the string and print its characters in reverse order
for i in arr[0][::-1]:
    print(i)
# iterate over the last 10 characters and print them in reverse order
for i in arr[0][-10:][::-1]:
    print(i)
# iterate over the last 10 characters and print them in forward order
for i in arr[0][-10:]:
    print(i)
@hpaulj makes a good point. My original answer works with numpy as requested, but I didn't really leave the OP an explanation. Following his advice about strings, this is how I would do it if it were a plain string and I wanted to iterate for some reason:
s1 = '37107287533902102798797998220837590246510135740250'
result = 0
for x in s1[-10:]:
    print(x)
    result += int(x)
print(result)

How to create a new numpy array filled with empty lists?

I want to generate a numpy array filled with empty lists. I tried this:
import numpy as np
arr=np.full(6, fill_value=[], dtype=object)
And I got an error:
ValueError: could not broadcast input array from shape (0) into shape (6)
But if I use:
arr = np.empty(6, dtype=object)
arr.fill([])
It is ok. Why does numpy.full not work here? What is the right way to initialize an array filled with empty lists?
The reason you can't use fill_value=[] is hidden in the docs. They say that np.full's fill_value argument is either a scalar or array-like, and the docs for np.asarray give this definition of array-like:
Input data, in any form that can be converted to an array. This includes lists, lists of tuples, tuples, tuples of tuples, tuples of lists and ndarrays.
So, lists are treated specially as "array" fill types and not as scalars, which is not what you want. Additionally, arr.fill([]) is actually not what you want either, since it sets every element to the same list, which means appending to one appends to all of them. To circumvent this, you can do what this answer states, and just initialize the array with a list:
arr = np.empty(6, dtype=object)
arr[...] = [[] for _ in range(arr.shape[0])]
which will create an array with 6 unique lists, such that appending to one does not append to all of them.
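The aliasing problem with fill([]) is easy to see in a small experiment. The sketch below is illustrative only: the names shared and unique are made up here, and the per-slot loop is a plain alternative to the assignment shown above, not a claim about the preferred approach.

```python
import numpy as np

shared = np.empty(3, dtype=object)
shared.fill([])              # every slot references the same list object
shared[0].append(1)
print(shared)                # the append is visible through every slot

unique = np.empty(3, dtype=object)
for i in range(unique.size):
    unique[i] = []           # a fresh list for each slot
unique[0].append(1)
print(unique)                # only the first slot changed
```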
You can try to use numpy.empty(shape, dtype=float, order='C')

Slicing vs indexing

I thought Python (numpy) is zero-indexed, but when I slice [:0] it returns an empty array. I thought I was saying "slice from zero to zero", but clearly I am not.
If instead I use A[1], it returns the element at position 1, counting from zero.
A slice excludes its endpoint, just like range(a, b) covers a..(b-1), so [:0] stops before index 0 and selects nothing.
By contrast, list[:1] should return [list[0]], not an empty array. If that comes back empty as well, I suspect that your array is empty from the beginning.
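A short sketch with a plain list (the values are arbitrary) shows the difference between indexing and slicing with an excluded stop; one-dimensional numpy arrays follow the same start/stop rule.

```python
A = [10, 20, 30]

print(A[:0])   # stop is excluded, so nothing is selected
print(A[:1])   # only index 0 is selected
print(A[1])    # plain indexing, counted from zero

# range follows the same half-open convention
print(list(range(0, 0)), list(range(0, 1)))
```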

dtype of ndarray containing string in python

I know that for an ndarray containing strings, the dtype returned will be of the form dtype(S#), where # denotes the length of the string.
As shown in the figure, the array 'a' is generated from the list [1,'2','3']. Once the array is created, all the elements become strings. Array 'b' is created from the list ['1',2,'3'].
a.dtype gives S21 while b.dtype gives S1. The length of every element in both a and b is 1. Why is the length of the elements in the first array taken as 21 even though all the elements have length 1?
I found that the dtype continues to be 'S21' even if 1 is replaced with 9223372036854775807. Once we use 9223372036854775808, the dtype becomes 'S20'. How does this happen?
Could somebody please explain?
np.array is compiled code, so we'd have to dig into that to see exactly what is going on. I don't recall seeing any documentation. So the easiest thing is to just try some values and look for a pattern.
If the 1st element is a string it appears to use the longest string (or str(i) for numbers).
If the 1st is a number it appears to start with some default size.
Unless the dtype is truncating some of the strings, I wouldn't worry too much about this behavior. If it matters, I'd suggest defining your own length.
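Defining your own length could look like the sketch below. It assumes Python 3, where the string dtypes are reported as U# (unicode) rather than the S# (bytes) shown in the question, and the inferred width may differ between NumPy versions, so no particular value is claimed for it.

```python
import numpy as np

mixed = [1, '2', '3']

inferred = np.array(mixed)            # width chosen by NumPy
fixed = np.array(mixed, dtype='U1')   # width chosen explicitly: 1 character
print(inferred.dtype, fixed.dtype)
print(fixed)
```

Keep in mind that an explicit width truncates anything longer, so it should be at least as wide as the longest value you expect.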

How to represent ":" in numpy [duplicate]

I want to slice a multidimensional ndarray but don't know in advance which dimension I will slice on. Let's say we have an ndarray A with shape (6,7,8). Sometimes I need to slice on the first dimension, A[:,3,4], and sometimes on the third, A[1,2,:].
Is there any symbol that represents the ":"? I want to use it to generate an index array.
index=np.zeros(3)
index[0]=np.:
index[1]=3
index[2]=4
A[index]
The : slice can be explicitly created by calling slice(None). Here's a short example:
import numpy as np
A = np.arange(9).reshape(3, -1)
# extract the 2nd column
A[:, 1]
# equivalently we can do
cslice = slice(None) # represents the colon
A[cslice, 1]
You want index to be a tuple, with a mix of numbers, lists and slice objects. A number of the numpy functions that take an axis parameter construct such a tuple.
A[(slice(None, None, None), 3, 4)] # == A[:, 3, 4]
There are various ways of constructing that tuple:
index = (slice(None),)+(3,4)
index = [slice(None)]*3; index[1] = 3; index[2] = 4
index = np.array([slice(None)]*3); index[1:]=[3,4]; index=tuple(index)
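In older NumPy versions index can be a list or a tuple in this case; it just can't be an array. Recent versions accept only the tuple form for this kind of multidimensional index.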
Starting with a list (or array) is handy in that you can modify values, but it is best to convert it to a tuple before use. I'd have to check the docs for the details, but there are circumstances where a list means something different from a tuple.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
Remember that a slicing tuple can always be constructed as obj and used in the x[obj] notation. Slice objects can be used in the construction in place of the [start:stop:step] notation. For example, x[1:10:5,::-1] can also be implemented as obj = (slice(1,10,5), slice(None,None,-1)); x[obj] . This can be useful for constructing generic code that works on arrays of arbitrary dimension.
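For the original question, where the sliced dimension is only known at run time, the index tuple can be built programmatically. The helper below is a hypothetical sketch (line_along and its parameters are names invented here, not a NumPy function): it starts from a list so one entry can be replaced, then converts to a tuple before indexing.

```python
import numpy as np

def line_along(A, axis, position):
    """Hypothetical helper: keep `axis` whole, fix the other axes at `position`."""
    index = list(position)        # a list is mutable while we build the index
    index[axis] = slice(None)     # the programmatic form of ":"
    return A[tuple(index)]        # convert to a tuple before indexing

A = np.arange(6 * 7 * 8).reshape(6, 7, 8)
first = line_along(A, 0, (0, 3, 4))   # same as A[:, 3, 4]
third = line_along(A, 2, (1, 2, 0))   # same as A[1, 2, :]
print(first.shape, third.shape)
```

The entry of position at the sliced axis is ignored, so any placeholder value can be passed there.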
