Avoid copying when indexing numpy arrays using lists


Is there a simple way to index arrays using lists (or any other collection) so that no copy is made, i.e., just a view of the array is taken? Please do not try to answer the question in terms of the snippet of code below: the list I use to index the elements is not always short (it can have thousands of elements, not just a few), and it is the output of an algorithm, so the numbers are not necessarily ordered, etc.
For example, in the code below columns 1, 2 and 3 are selected in both cases, but only the first case returns a view of the data:
>>> import numpy as np
>>> a = np.zeros((4, 5))
>>> b = a[:,1:4]
>>> b.base is a
True
>>> c = a[:,[1,3,2]]
>>> c.base is a
False

Fancy indexing (using a list of indices to access elements of an array) always produces a copy, because in general there is no way for numpy to express it as a new view of the same data, that is, the same memory described with a different fixed stride and shape, starting from a particular element.
Under the hood, a numpy array consists of a pointer to its first element in memory, a dtype, a shape, strides (how far to move in memory to get to the next element along each dimension: next row, next column, etc.) and some flags. A view of pre-existing memory just points to some element in that memory and adjusts the strides and shape. Fancy indexing generally specifies arbitrary access into that pre-existing memory, which can't be forced into that fixed-stride form, so a copy has to be made.
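The fixed-stride argument can be made concrete. Below is a minimal sketch of a hypothetical helper, columns_as_view_if_possible (not a NumPy function), assuming a 2-D array and in-range, non-negative column indices. It returns a view only when the requested columns are evenly spaced, because then a single start, stride and length describe the selection; any other order falls back to fancy indexing, which copies. It does not solve the arbitrary-order case, which is exactly the case where a copy is unavoidable.

```python
import numpy as np

def columns_as_view_if_possible(arr, cols):
    """Hypothetical helper (not part of NumPy): select columns of a 2-D array.

    Returns a view when `cols` is evenly spaced, since one start, stride and
    length can then describe the selection; otherwise falls back to fancy
    indexing, which always copies. Assumes in-range, non-negative indices.
    """
    cols = [int(c) for c in cols]
    if not cols:
        return arr[:, cols]                      # empty selection: copy
    step = cols[1] - cols[0] if len(cols) > 1 else 1
    evenly_spaced = step != 0 and all(
        later - earlier == step for earlier, later in zip(cols, cols[1:])
    )
    if not evenly_spaced:
        return arr[:, cols]                      # arbitrary order: copy
    stop = cols[-1] + step
    # For a negative step ending at column 0, stop would be -1, which a slice
    # reads as "last column"; None means "run to the beginning" instead.
    return arr[:, cols[0]:(stop if stop >= 0 else None):step]
```

For example, [1, 2, 3] and [4, 2, 0] come back as views (np.shares_memory(result, a) is True), while [1, 3, 2] comes back as a copy.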

Related

List to Numpy Array: How to convert a list of different size elements to a numpy array without adding extra values (e.g., Null or 0)?

I have a list whose elements have different sizes. Here is the attachment for reference:
As can be seen in the image, the first list element has size 1095, the second 16, the third 66, and so on. This list should now be converted to a NumPy array. As far as I know, all of the list's elements must be the same size for the conversion. However, I've seen some Stack Overflow posts where people converted a list of elements of different sizes to a NumPy array by adding Null or 0 values to the elements to make the sizes uniform.
However, I do not want to change my data by adding any 0 or Null values, because I need to feed it into the deep learning model as is. Is there a method to convert a list of elements of different sizes to a NumPy array without changing the data?

If you want to use the list as the input to your deep learning model, you can use masking and padding. See the guide at https://www.tensorflow.org/guide/keras/masking_and_padding
Make sure the padding does not convert your data to integers: pad_sequences uses dtype='int32' by default, so you need to call pad_sequences(x, dtype='float32') (or 'float64') or something similar.
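To illustrate what padding plus a mask amounts to without depending on TensorFlow, here is a NumPy-only sketch. The function name pad_with_mask, the post-padding layout and the float32 default are assumptions for illustration, not the Keras API. The padded array has a regular shape, and the boolean mask records which entries are real data, so the original values are never altered and can be recovered exactly with out[mask].

```python
import numpy as np

def pad_with_mask(seqs, dtype=np.float32, pad_value=0.0):
    """Hypothetical helper: post-pad ragged 1-D sequences to a common length.

    Returns (out, mask) where out has shape (len(seqs), longest) and mask is
    True exactly at the positions holding original (non-padding) values.
    Assumes `seqs` is non-empty.
    """
    lengths = np.array([len(s) for s in seqs])
    longest = lengths.max()
    out = np.full((len(seqs), longest), pad_value, dtype=dtype)
    # Row i is real data in its first lengths[i] positions.
    mask = np.arange(longest) < lengths[:, None]
    # Boolean assignment fills True positions in row-major order, which
    # matches the order of the concatenated sequences.
    out[mask] = np.concatenate([np.asarray(s, dtype=dtype) for s in seqs])
    return out, mask
```

A model that honours the mask (as Keras masking does) can then ignore the padded positions.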

Numpy slice of first arbitrary dimensions

There is this great Question/Answer about slicing the last dimension:
Numpy slice of arbitrary dimensions: to slice a numpy array and obtain the i-th index in the last dimension, one can use ... (Ellipsis):
sub = myarray[...,i]
What if the first N dimensions are needed instead, that is, keeping the first N dimensions whole and taking index 0 in all the others?
For a 3D myarray, N=2:
sub = myarray[:,:,0]
For a 4D myarray, N=2:
sub = myarray[:,:,0,0]
Can this be generalized to an arbitrary number of dimensions?

I don't think there's any built-in syntactic sugar for that, but slices are just objects like anything else. The slice(None) object is what is created from :, and otherwise just picking the index 0 works fine.
myarray[(slice(None),)*N+(0,)*(myarray.ndim-N)]
Note the comma in (slice(None),). Parentheses alone don't create a tuple in Python (except for the empty tuple ()); the trailing comma is what makes a one-element tuple instead of just the value inside.
Slices are nice because they give you a view into the object instead of a copy of it. You can use the same idea to, e.g., iterate along the N-th dimension while keeping all the other dimensions whole. There have been some Stack Overflow questions about that, and they've almost unanimously resorted to rolling axes and other tricks that I think are hard to reason about in high-dimensional spaces. Slice tuples are your friend.
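The slice-tuple expression can be wrapped in a small function. This is a sketch; the name keep_first_n_axes and the optional index parameter (defaulting to 0, as in the question) are assumptions for illustration. Because the key contains only slices and integers, the result is a view rather than a copy.

```python
import numpy as np

def keep_first_n_axes(arr, n, index=0):
    """Hypothetical helper: keep the first n axes whole, pick `index` elsewhere.

    Builds the same key as the answer: (slice(None),) * n for the kept axes,
    followed by (index,) once for each remaining axis.
    """
    key = (slice(None),) * n + (index,) * (arr.ndim - n)
    return arr[key]  # basic indexing only, so this is a view
```

For a 4D array, keep_first_n_axes(myarray, 2) is equivalent to myarray[:,:,0,0], and for a 3D array to myarray[:,:,0].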
From the comments, @PaulPanzer points out another technique that I rather like:
myarray.T[(myarray.ndim-N)*(0,)].T
First, transposes in numpy are views rather than copies, so this isn't inefficient in the slightest. Here's how it works:
Start with myarray with dimensions (0,...,k), where k = myarray.ndim-1.
The transpose myarray.T reverses those to (k,...,0).
The goal is to fix the last myarray.ndim-N dimensions of the original array. In the transposed array those come first, so [(myarray.ndim-N)*(0,)] takes index 0 along the first myarray.ndim-N dimensions of myarray.T.
What remains is in the wrong order: dimensions (N-1,...,0). Another transpose with .T restores the ordering (0,...,N-1).
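A quick check that the double-transpose trick selects the same elements as the slice-tuple version and still shares memory with the original. The example array shape (2, 3, 4, 5) and N = 2 are arbitrary choices for illustration.

```python
import numpy as np

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
N = 2

# Slice-tuple version: whole slices for the first N axes, 0 for the rest.
via_slices = a[(slice(None),) * N + (0,) * (a.ndim - N)]

# Double transpose: a.T has axes (3, 2, 1, 0); indexing the first
# a.ndim - N of them with 0 leaves axes (1, 0); .T restores (0, 1).
via_transpose = a.T[(a.ndim - N) * (0,)].T
```

Both results have shape (2, 3) and equal a[:,:,0,0], and neither is a copy.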

how to create ndarray with ndim==0 and size==0?

I am testing some edge cases of my program and observed a strange fact. When I create a scalar numpy array, it has size==1 and ndim==0.
>>> A=np.array(1.0)
>>> A.ndim # returns 0
>>> A.size # returns 1
But when I create an empty array with no elements, it has size==0 but ndim==1.
>>> A=np.array([])
>>> A.ndim # returns 1
>>> A.size # returns 0
Why is that? I would expect the ndim to be also 0. Or is there another way of creation of 'really' empty array with size and ndim equal to 0?
UPDATE: even A=np.empty(shape=None) does not create a dimensionless array of size 0...

I believe the answer is that "No, you can't create an ndarray with both ndim and size of zero". As you've already found out yourself, the (ndim,size) pairs of (1,0) and (0,1) are as low as you can go.
This very nice answer explains a lot about numpy scalar types, and why they're a bit odd to have around. This explanation makes it clear that scalar numpy arrays like array(1) are a very special kind of beast. They only have a single value (causing size==1), but by definition they don't have a sense of dimensionality, hence ndim==0. Non-scalar numpy arrays, on the other hand, can be empty, but they contain at least a pair of square brackets, leading to a minimal ndim of 1, even if their size can be 0 if they are made up of empty lists. (This is how I think about the situation: ndarrays are in a way lists of lists of lists of ..., on as many levels as there are dimensions. 1d arrays are compatible with lists, so an empty list, being still a list, also has a defining dimension.)
The only way to come up with an empty scalar would be to call np.array() like this, but arrays can only be initialized by some actual object. So I believe your program is safe from this edge case.
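The intuition above can be backed by a simple rule: an array's size is the product of the entries of its shape. A 0-d array has the empty shape (), and the empty product is 1, so ndim == 0 forces size == 1; conversely, size == 0 requires at least one axis of length 0, hence ndim >= 1. The sketch below checks this on a few arrays; the helper name shape_summary and the variable names are just for illustration.

```python
import math
import numpy as np

def shape_summary(arr):
    """Return (ndim, shape, size), checking that size is the product of shape."""
    assert arr.size == math.prod(arr.shape)  # math.prod(()) is 1
    return arr.ndim, arr.shape, arr.size

scalar = np.array(1.0)          # shape (): no axes, so size is the empty product 1
empty_1d = np.array([])         # shape (0,): one axis of length 0
empty_3d = np.zeros((3, 0, 2))  # any zero-length axis makes size 0

for arr in (scalar, empty_1d, empty_3d):
    print(shape_summary(arr))
```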

Python list of numpy matrices behaving strangely

I am trying to work with lists of numpy matrices and am encountering an annoying problem.
Let's say I start with a list of ten 2x2 zero matrices
para=[numpy.matrix(numpy.zeros((2,2)))]*(10)
I access individual matrices like this
para[0]
para[1]
and so on. So far so good.
Now, I want to modify the first row of the second matrix only, leaving all the others unchanged. So I do this
para[1][0]=numpy.matrix([[1,1]])
The first index points to the second matrix in the list and the second index points to the first row in that matrix, replacing it with [1,1].
But strangely enough, this command changes the first row of ALL ten matrices in the list to [1,1] instead of just the second one like I wanted. What gives?

When you multiply the initial list by 10, you end up with a list of 10 references to the same underlying matrix. Modifying one will modify all of them, because there is in fact only one matrix, not 10.
If you need proof, check out this example in the REPL:
>>> import numpy
>>> a = [numpy.zeros(10)]*10
>>> a[0] is a[1]
True
The is operator checks whether both objects are in fact the same object (not whether they are equal in value).
What you should do is use a list comprehension to generate your initial arrays instead of a multiplication, like so:
para=[numpy.matrix(numpy.zeros((2,2))) for i in range(10)]
That will call numpy.matrix() ten times instead of just once and generate 10 distinct matrices.
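A minimal sketch of the aliasing and the fix, side by side. It uses plain 2-D arrays instead of numpy.matrix (an assumption; NumPy's documentation no longer recommends the matrix class), but the behaviour is identical because it comes from list multiplication in Python, not from NumPy. The list length 3 is arbitrary.

```python
import numpy as np

# List multiplication repeats the same reference: one array, three entries.
shared = [np.zeros((2, 2))] * 3
# A comprehension calls np.zeros once per entry: three separate arrays.
distinct = [np.zeros((2, 2)) for _ in range(3)]

# Same assignment as para[1][0] = ... in the question: overwrite row 0.
shared[1][0] = [1, 1]
distinct[1][0] = [1, 1]
```

After the assignments, every entry of shared shows the new first row, while only distinct[1] does.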

Operator + to add a tuple to another tuple stored inside a multidimensional array of tuples

I recently found out how to use tuples thanks to great contributions from SO users (see here). However, I've run into the problem that I can't add a tuple to another tuple stored inside an array of tuples. For instance, if I define:
import numpy as np
arrtup = np.empty((2,2), dtype='int,int')
arrtup[0,1] = (3,4)
Then if I try to add another tuple to the existing tuple to come up with a multidimensional index:
arrtup[0,1]+(4,4)
I obtain this error:
TypeError: unsupported operand type(s) for +: 'numpy.void' and 'tuple'
Instead of the expected (3,4,4,4) tuple, which I can obtain by:
(3,4)+(4,4)
Any ideas? Thanks!

You are mixing different concepts, I'm afraid.
Your arrtup array is not an array of tuples; it's a structured ndarray, that is, an array of elements that look like tuples but are in fact records (numpy.void objects, to be exact). In your case, you defined these records to consist of 2 integers. Internally, NumPy creates your array as a 2x2 array of blocks, each block taking the space defined by your dtype: here, a block consists of 2 consecutive sub-blocks of size int (that is, each sub-block takes the space an int takes on your machine).
When you retrieve an element with arrtup[0,1], you get the corresponding block. Because this block is structured as two sub-blocks, NumPy returns a numpy.void (the generic object representing structured blocks), which has the same dtype as your array.
Because you set the size of those blocks at the creation of the array, you're no longer able to modify it. That means that you cannot transform your 2-int records into 4-int ones as you want.
However, you can transform your structured array into an array of objects:
new_arr = arrtup.astype(object)
Lo and behold, your elements are no longer np.void but tuples, that you can modify as you want:
new_arr[0,1] = (3,4) # That's a tuple
new_arr[0,1] += (4,4) # Adding another tuple to the element
Your new_arr is a different beast from your arrtup: it has the same size, true, but it's no longer a structured array, it's an array of objects, as illustrated by
>>> new_arr.dtype
dtype('O')
In practice, the memory layout is quite different between arrtup and new_arr. new_arr doesn't have the same constraints as arrtup, as the individual elements can have different sizes, but object arrays are not as efficient as structured arrays.

The traceback is pretty clear here: arrtup[0,1] is not a tuple. It's of type numpy.void.
You can convert it to a tuple quite easily however:
tuple(arrtup[0,1])
which can be concatenated with other tuples:
tuple(arrtup[0,1]) + (4,4)
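A self-contained sketch of the conversion described above. It uses np.zeros instead of np.empty so the unset elements are well defined (an assumption for the example); the field names f0 and f1 are the defaults NumPy assigns to a 'int,int' dtype.

```python
import numpy as np

# Structured array: each element is a record with two int fields, f0 and f1.
arrtup = np.zeros((2, 2), dtype='int,int')
arrtup[0, 1] = (3, 4)

record = arrtup[0, 1]             # a numpy.void record, not a tuple
print(type(record))               # <class 'numpy.void'>
print(record['f0'], record['f1'])  # fields are accessible by name

# Convert the record to a plain tuple of Python ints, then concatenate.
combined = tuple(int(v) for v in record) + (4, 4)
print(combined)                   # (3, 4, 4, 4)
```

Converting each field with int() is optional; it just keeps NumPy integer scalars out of the resulting tuple.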
