Weird "too many indices for array" error in python

Let's create a large np array 'a' with 10,000 entries
import numpy as np
a = np.arange(0, 10000)
Let's slice the array with 'n' indices 0->9, 1->10, 2->11, etc.
n = 32
b = list(map(lambda x:np.arange(x, x+10), np.arange(0, n)))
c = a[b]
The weird thing is that if n is smaller than 32, I get the error "IndexError: too many indices for array". If n is greater than or equal to 32, the code works perfectly. The error occurs regardless of the size of the initial array or the size of the individual slices, but always with the number 32 as the threshold. Note that if n == 1, the code works.
Any idea on what is causing this? Thank you.

Your b is a list of arrays:
In [84]: b = list(map(lambda x:np.arange(x, x+10), np.arange(0, 5)))
In [85]: b
Out[85]:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]),
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]),
array([ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])]
When used as an index:
In [86]: np.arange(1000)[b]
/usr/local/bin/ipython3:1: FutureWarning: Using a non-tuple sequence for multidimensional
indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`.
In the future this will be interpreted as an array index, `arr[np.array(seq)]`,
which will result either in an error or a different result.
#!/usr/bin/python3
---------------------------------------------------------------
IndexError: too many indices for array
A[1,2,3] is the same as A[(1,2,3)] - that is, the comma separated indices are a tuple, which is then passed on to the indexing function. Or to put it another way, a multidimensional index should be a tuple (that includes ones with slices).
Up to now numpy has been a bit sloppy, and allowed us to use a list of indices in the same way. The warning tells us that the developers are in the process of tightening up those restrictions.
The error means it is trying to interpret each array in your list as the index for a separate dimension. An array can have at most 32 dimensions. Evidently for the longer list it doesn't try to treat it as a tuple, and instead creates a 2d array for indexing.
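A minimal sketch of the tuple rule described above, assuming a small 3-d array A and a two-element index tuple made up for illustration. It shows that comma-separated indices and an explicit tuple are the same thing, and that a tuple with more entries than the array has dimensions raises the error from the question:

```python
import numpy as np

# Illustrative 3-d array (not from the question).
A = np.arange(24).reshape(2, 3, 4)

# Comma-separated indices are packed into a tuple before indexing.
same = A[1, 2, 3] == A[(1, 2, 3)]

# A 1-d array indexed with a 2-element tuple: one entry per dimension,
# but there is only one dimension, so NumPy raises IndexError.
one_d = np.arange(1000)
idx = (np.array([0, 1]), np.array([1, 2]))
try:
    one_d[idx]
    raised = False
except IndexError:
    raised = True
```

Passing an explicit tuple, as here, sidesteps the list-versus-tuple ambiguity, so the behaviour should not depend on the length of the sequence.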
There are various ways we can use your b to index a 1d array:
In [87]: np.arange(1000)[np.hstack(b)]
Out[87]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
In [89]: np.arange(1000)[np.array(b)] # or np.vstack(b)
Out[89]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
In [90]: np.arange(1000)[b,] # 1d tuple containing b
Out[90]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
Note that if b is a ragged list - one or more of the arrays is shorter, only the hstack version works.
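As a sketch of one way to avoid the list of arrays entirely, the 2d index can be built directly with broadcasting, which behaves the same whether n is below or above 32. The helper name windows and its width parameter are hypothetical, chosen for illustration:

```python
import numpy as np

def windows(a, n, width=10):
    """Return an (n, width) array whose row i is a[i:i+width]."""
    # (n, 1) column of start offsets + (width,) row of steps -> (n, width) index array
    idx = np.arange(n)[:, None] + np.arange(width)
    return a[idx]

a = np.arange(0, 10000)
small = windows(a, 5)    # fewer than 32 rows
large = windows(a, 40)   # 32 rows or more
```

Because the index is a single integer array rather than a sequence of arrays, numpy has no reason to interpret it as one index per dimension.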

First of all, you're not slicing 0->9, 10->19, 20->29; your slices advance by 1 only: 0->9, 1->10, 2->11. If you wanted non-overlapping slices instead, try this:
n = 32
size = 10
b = list(map(lambda x:np.arange(x, x+size), np.arange(0, n*size, size)))
Next, you've misused the indexing notation. b is a list of arrays, and you've used this entire list to index a. When the list has 32 or more elements, numpy cannot treat it as one index per dimension (an array could have at most 32 dimensions), so it converts the whole list into a single 2d index array and returns one element of a per leaf element in b.
However, once the list has fewer than 32 elements, numpy assumes that you're trying to give a multi-dimensional index into a: each element of b is taken as the index for the corresponding dimension of a. Since a is only 1-dimensional, you get the error message. Your code will run in this mode with n=1, but fails with n=2 through 31.
Although your question isn't a duplicate, also please see this one.

Related

Python vectorize 2d and 3d array concatenation with different dimensions

I am trying to implement message passing in graph neural nets. In each graph, there are edges and nodes and a node-to-edge update is implemented as follows:
Where the square brackets denote the concatenation operation, subscripts are indexes and the superscripts are time indexes.
So I am trying to concatenate 3 matrices of dimensions AxN, AxBxM, and BxN. The resulting concatenation has dimension AxBx(2N+M). Every (i,j) entry of the resulting matrix is a concatenation of the ith row of the first matrix, the (i,j)th element of the second matrix, and the jth row of the third matrix. I managed to implement this in a double for loop as follows:
edge_in = torch.zeros(a, b, m + 2 * n)
edge_in = edge_in.cuda()
for i in range(a):
    for j in range(b):
        edge_in[i,j] = torch.cat((nodes_a_embeds[i], edge_embeds[i,j], nodes_b_embeds[j]))
However, this is excruciatingly slow. Is this in any way vectorizable? I tried to come up with a solution and then I looked for a solution online but couldn't manage to vectorize it. Thanks.
edit: numerical example as requested:
First matrix: 5x3
Second matrix: 5x4x2
Third matrix: 4x3
Output should be 5x4x8 then. Let's call our output matrix R.
Then R(1,2) = concatenate(First(1),Second(1,2),Third(2)).
Would this be the correct implementation of your code?
import numpy as np
A = 2
B = 3
M = 4
N = 5
first = np.arange(A*N).reshape((A, N))
first = np.tile(first[:, np.newaxis, :], (1, B, 1))
second = np.arange(A*B*M).reshape((A, B, M))
third = np.arange(B*N).reshape((B, N))
third = np.tile(third[np.newaxis, :, :], (A, 1, 1))
result = np.concatenate((first, second, third), axis=2)
Output:
array([[[ 0, 1, 2, 3, 4, 0, 1, 2, 3, 0, 1, 2, 3, 4],
[ 0, 1, 2, 3, 4, 4, 5, 6, 7, 5, 6, 7, 8, 9],
[ 0, 1, 2, 3, 4, 8, 9, 10, 11, 10, 11, 12, 13, 14]],
[[ 5, 6, 7, 8, 9, 12, 13, 14, 15, 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9, 16, 17, 18, 19, 5, 6, 7, 8, 9],
[ 5, 6, 7, 8, 9, 20, 21, 22, 23, 10, 11, 12, 13, 14]]])
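A hedged sketch that checks the vectorized approach against the double loop, using the 5x3, 5x4x2 and 4x3 sizes from the numerical example. The random inputs are made up for illustration, and np.broadcast_to is swapped in for np.tile so that no tiled copies are created before the concatenation. PyTorch offers analogous expand and cat operations, but this sketch stays in NumPy:

```python
import numpy as np

A, B, N, M = 5, 4, 3, 2  # sizes from the numerical example in the question
rng = np.random.default_rng(0)
first = rng.random((A, N))
second = rng.random((A, B, M))
third = rng.random((B, N))

# Reference: the double loop from the question, written with NumPy.
expected = np.zeros((A, B, 2 * N + M))
for i in range(A):
    for j in range(B):
        expected[i, j] = np.concatenate((first[i], second[i, j], third[j]))

# Vectorized: broadcast views instead of tiled copies, then one concatenate.
first_b = np.broadcast_to(first[:, None, :], (A, B, N))
third_b = np.broadcast_to(third[None, :, :], (A, B, N))
result = np.concatenate((first_b, second, third_b), axis=2)
```

The result has shape (5, 4, 8), and each result[i, j] matches what the loop produces for the same i and j.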

numpy array prod vs multiply list items

I want to multiply the items of a list.
I did it with numpy and with a plain python loop, and the results are different.
Could you please tell me what the problem is?
Numpy code
import numpy as np
a= [5, 5, 7, 6, 6, 8, 9, 6, 6, 4, 8, 9, 5]
print (np.prod(a))
>> 2039787520
python code without numpy
a= [5, 5, 7, 6, 6, 8, 9, 6, 6, 4, 8, 9, 5]
k=1
for i in a:
    k*=i
print (k)
>> 23514624000
Another case:
Numpy code
a= [4, 7, 6, 5, 4, 5, 6, 8, 2, 8, 4, 8, 9]
import numpy as np
print (np.prod(a))
>> -579076096
without numpy
a= [4, 7, 6, 5, 4, 5, 6, 8, 2, 8, 4, 8, 9]
k=1
for i in a:
    k*=i
print (k)
>> 3715891200
Question: why is the result in the second case negative and different?
Python ints are arbitrary-precision. NumPy dtypes aren't; the default integer dtype in NumPy corresponds to C long, which is 32-bit on your platform. A computation that requires numbers too large for a C long will overflow.
You can specify a larger dtype to store larger numbers, but you can't store arbitrarily large numbers.
No overflow:
In [2]: numpy.prod([5, 5, 7, 6, 6, 8, 9, 6, 6, 4, 8, 9, 5], dtype='int64')
Out[2]: 23514624000
Still overflows:
In [3]: numpy.prod([10000, 10000, 10000, 10000, 10000, 10000], dtype='int64')
Out[3]: 2003764205206896640
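To make the wraparound concrete, here is a small sketch using the second list from the question. The dtype is forced to int32 explicitly so the example does not depend on the platform default, and math.prod (Python 3.8+) is used for the exact value. The by_hand line is an illustrative reconstruction of what a signed 32-bit result looks like:

```python
import math
import numpy as np

a = [4, 7, 6, 5, 4, 5, 6, 8, 2, 8, 4, 8, 9]

exact = math.prod(a)                     # Python int, arbitrary precision
wrapped = np.prod(a, dtype=np.int32)     # forced 32-bit, wraps silently
wide = np.prod(a, dtype=np.int64)        # this product fits in 64 bits

# Reinterpret the exact product as a signed 32-bit value by hand.
by_hand = (exact + 2**31) % 2**32 - 2**31
```

The exact product 3715891200 is larger than the biggest signed 32-bit value, 2147483647, which is why the 32-bit result comes out negative.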

how can I reshape a numpy array of (100,) to (250,100)

Suppose you have created an array with 100 elements, computed something, and filled the array with the results. For whatever reason you did not create a 2d array at the start, and now you want to add another dimension to this data, for example because 250 samples should each have this calculated data.
I have searched for this but could not find a solution. Maybe I am not searching with the correct keywords.
Actually I want to reshape a numpy array of (100,) to (250,100).
I have read this link and a couple of other links, but they did not help me.
I have also tried this way:
numpyarray = np.zeros(100)  # stands in for my array of shape (100,)
transformed_numpyarray = np.reshape(numpyarray,(100,-1)).T
which gives me this shape:
(1, 100)
but I really do not want 1 as the first item of the 2d shape.
What I'm trying to do is either convert to (,100) or at least something like (250,100). "250" is a constant I already know, so I want to say, for example, 250 samples with 100 dimensions each.
Thanks.
I'm still confused about what you are trying to do. So far I can picture two alternatives - reshape and repeat. To illustrate:
In [148]: x = np.arange(16)
In [149]: x
Out[149]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
In [150]: x.reshape(4,4)
Out[150]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [151]: np.repeat(x[None,:], 4, axis=0)
Out[151]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]])
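Applied to the sizes in the question, a sketch of both alternatives might look like this. The contents of row are made up, and np.broadcast_to is added as a third option that gives a read-only view instead of copies:

```python
import numpy as np

row = np.arange(100)          # shape (100,)

as_row = row.reshape(1, -1)   # shape (1, 100): same data, one extra axis
copies = np.repeat(row[None, :], 250, axis=0)   # (250, 100), real copies
view = np.broadcast_to(row, (250, 100))         # (250, 100), read-only view
```

A plain reshape to (250, 100) cannot work here, because reshape only rearranges the existing 100 values and a (250, 100) array needs 25000.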
numpy arrays have a fixed size; you can't have an array with a variable shape. If you don't know beforehand how many samples you will have, you can add them gradually with vstack:
In [4]: numpyarray.shape
Out[4]: (3, 4)
In [5]: new_sample.shape
Out[5]: (4,)
In [6]: numpyarray = np.vstack([numpyarray, new_sample])
In [7]: numpyarray.shape
Out[7]: (4, 4)
You can also first define the size by creating an array full of zeros and then progressively fill it with samples.
numpyarray = np.zeros((250,100))
...
numpyarray[i] = new_sample
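A runnable sketch of the preallocation idea, next to a collect-then-stack variant. The function compute_sample is a hypothetical stand-in for whatever produces one (100,) result, since the question does not show that step:

```python
import numpy as np

def compute_sample(i, size=100):
    """Hypothetical stand-in for whatever produces one (100,) result."""
    return np.full(size, i, dtype=float)

n_samples = 250
data = np.zeros((n_samples, 100))
for i in range(n_samples):
    data[i] = compute_sample(i)   # row i receives one (100,) sample

# Alternative when the count is not known in advance: collect, then stack once.
rows = [compute_sample(i) for i in range(n_samples)]
stacked = np.vstack(rows)
```

Stacking once at the end avoids building a new array on every iteration, which is what calling vstack inside the loop would do.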

Stack slices of numpy array from given indices

I'm struggling to perform the below operation on a numpy vector.
I want to take previous_n samples from vector finishing at indices.
It's like I want to perform a np.take with slicing of the previous_n samples.
Example:
import numpy as np
vector = np.array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
# number of previous samples
previous_n = 3
indices = np.array([ 5, 7, 12])
result
array([[ 3, 4, 5],
[ 5, 6, 7],
[10, 11, 12]])
Ok, this seems to do what I want. Found here
def stack_slices(arr, previous_n, indices):
    all_idx = indices[:, None] + np.arange(previous_n) - (previous_n - 1)
    return arr[all_idx]
>>> stack_slices(vector, 3, indices)
array([[ 3, 4, 5],
[ 5, 6, 7],
[10, 11, 12]])
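To show why this works, here is a step-by-step sketch of the same computation with the intermediate arrays named. The names offsets and expected are introduced only for illustration:

```python
import numpy as np

vector = np.arange(15)
indices = np.array([5, 7, 12])
previous_n = 3

# Offsets relative to each end index: [-2, -1, 0] for previous_n = 3.
offsets = np.arange(previous_n) - (previous_n - 1)

# (3, 1) column + (3,) row broadcasts to a (3, 3) matrix of positions.
all_idx = indices[:, None] + offsets
result = vector[all_idx]

# Cross-check against plain slices, one per end index.
expected = np.array([vector[i - previous_n + 1 : i + 1] for i in indices])
```

One caveat: an end index smaller than previous_n - 1 produces negative positions, which integer-array indexing interprets as counting from the end of the array rather than raising an error.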

sorting list of list w.r.t another list of lists in python

I have two lists of lists, and I want to sort one with respect to the other. For example, here are the two:
old = [[1, 7, 3, 2, 5, 4, 6, 0, 8, 9],
[7, 3, 2, 5, 4, 6, 1, 8, 0, 9],
[9, 2, 8, 7, 1, 5, 0, 4, 6, 3]]
new = [[4, 1, 5, 6, 7, 9, 10, 11, 8, 2, 3, 0],
[10, 6, 4, 3, 0, 11, 2, 5, 8, 1, 9, 7],
[0, 1, 7, 10, 9, 6, 4, 5, 8, 2, 3, 11]]
I want to sort the new list of lists w.r.t. the old list of lists, so the entries of the new one should become
sorted_new = [[1, 7, 3, 2, 5, 4, 6, 0, 8, 9, 10, 11],
[7, 3, 2, 5, 4, 6, 1, 8, 0, 9, 10, 11],
[9, 2, 8, 7, 1, 5, 0, 4, 6, 3, 10, 11]]
It is important to note that the lists to be matched are not of equal size. How can I achieve this?
You can use the following approach:
sorted_new = []
for sub_new,sub_old in zip(new,old):
    old_idx = {k:v for v,k in enumerate(sub_old)}
    sorted_new.append(sorted(sub_new,key=lambda x:old_idx.get(x,len(sub_old))))
This then generates:
>>> sorted_new
[[1, 7, 3, 2, 5, 4, 6, 0, 8, 9, 10, 11], [7, 3, 2, 5, 4, 6, 1, 8, 0, 9, 10, 11], [9, 2, 8, 7, 1, 5, 0, 4, 6, 3, 10, 11]]
The code works as follows: we first iterate over the two lists new and old concurrently. For every such pair of sublists, we first generate a dictionary, with a dictionary comprehension, that maps the elements of sub_old to their corresponding indices in the list.
Next we construct a sorted list for sub_new. If an element of sub_new is in old_idx, we return its index (and this is thus the sorting key). If not, we return the default len(sub_old), which is greater than all the indices in the dictionary. As a result that element will be placed at the right end of the list.
Since Python's sort function is guaranteed to be stable, the elements that were not in old will maintain their original order.
We could have used some magic around the list.index(..) method, instead of constructing such index dictionary. But the problem with .index(..) is that it runs in O(n). So this would make the algorithm O(m×n log n) for every sublist, with m the number of elements in old, and n the number of elements in new.
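The same approach can be sketched as a self-contained function and checked against the data from the question. The function name sort_like and the variable names position and fallback are hypothetical, chosen for illustration:

```python
def sort_like(new, old):
    """Order each sub-list of new by the position of its items in the matching sub-list of old."""
    result = []
    for sub_new, sub_old in zip(new, old):
        # Map each value in sub_old to its index: an O(1) lookup per element.
        position = {value: i for i, value in enumerate(sub_old)}
        # Values missing from sub_old sort after everything else.
        fallback = len(sub_old)
        result.append(sorted(sub_new, key=lambda x: position.get(x, fallback)))
    return result

old = [[1, 7, 3, 2, 5, 4, 6, 0, 8, 9],
       [7, 3, 2, 5, 4, 6, 1, 8, 0, 9],
       [9, 2, 8, 7, 1, 5, 0, 4, 6, 3]]
new = [[4, 1, 5, 6, 7, 9, 10, 11, 8, 2, 3, 0],
       [10, 6, 4, 3, 0, 11, 2, 5, 8, 1, 9, 7],
       [0, 1, 7, 10, 9, 6, 4, 5, 8, 2, 3, 11]]
sorted_new = sort_like(new, old)
```

Building the dictionary costs one pass over sub_old, after which every key lookup during the sort is a constant-time operation on average.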
