accessing portions of np.array - python

I want quick access to np.array elements, for example indexes 0 to 6 plus 10 to the end. So far I have tried:
a[0:6,10:]
or
np.concatenate(a[0:6],a[10:])
Both give me an error, with the second one giving me: "TypeError: only integer scalar arrays can be converted to a scalar index"
Edit: concatenate is still giving me problems, so I am going to post my full code here:
Fold_5 = len(predictorX)/5
trainX = np.concatenate(predictorX[:3*int(Fold_5)],predictorX[4*int(Fold_5)])
predictorX is an array with values like
[[0.1,0.4,0.6,0.2],[..]....]

In:
a[0:6,10:]
0:6 selects rows and 10: selects columns. If a isn't 2-d, or isn't large enough in both dimensions, that will result in an error.
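A minimal check of this, with a hypothetical 2-d array that is large enough in both dimensions:
import numpy as np
a = np.arange(72).reshape(6, 12)  # 6 rows, 12 columns
print(a[0:6, 10:].shape)          # (6, 2): rows 0-5, columns 10 and 11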
In
np.concatenate(a[0:6],a[10:])
the problem is the number of arguments: it takes a list (or tuple) of arrays. A second positional argument, if given, is understood to be the axis, which should be an integer (hence your error).
np.concatenate([a[0:6],a[10:]])
should work.
Another option is to index with a list
a[[0, 1, 2, 3, 4, 5, 10, 11, ...]]
np.r_ is a handy little tool for constructing such a list:
In [73]: np.r_[0:6, 10:15]
Out[73]: array([ 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14])
It in effect does np.concatenate([np.arange(0,6),np.arange(10,15)]).
It doesn't matter whether you index first and then concatenate, or concatenate the indexes first and then index; efficiency is about the same. np.delete chooses among several methods, including these, depending on the size and type of the 'delete' region.
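For example, a quick sketch (with a hypothetical 1-d array a) showing both orders give the same result:
import numpy as np
a = np.arange(15)
r1 = np.concatenate([a[0:6], a[10:]])  # index first, then concatenate
r2 = a[np.r_[0:6, 10:15]]              # build the index array, then index once
print(np.array_equal(r1, r2))          # True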
In the trainX expression, adding [] to the concatenate call should work. However, predictorX[4*Fold_5] could be a problem. Are you missing a : (as in the 10: example)? If you want just one value, then you need to make it 1-d, e.g. predictorX[[4*Fold_5]]:
Fold_5 = len(predictorX)//5 # integer division in py3
trainX = np.concatenate([predictorX[:3*Fold_5], predictorX[4*Fold_5:]])

Here are two more short ways of getting the desired subarray:
np.delete(a, np.s_[6:10])
and
np.r_[a[:6], a[10:]]

np.concatenate takes a sequence of arrays. Try
np.concatenate([a[0:6],a[10:]])
or
np.concatenate((a[0:6],a[10:]))

randomly choose different sets in numpy?

I am trying to randomly select a set of integers in numpy and am encountering a strange error. If I define a numpy array with two sets of different sizes, np.random.choice chooses between them without issue:
Set1 = np.array([[1, 3, 5], [2, 4]])
In: np.random.choice(Set1)
Out: [2, 4]
However, once the sets in the numpy array are the same size, I get a value error:
Set2 = np.array([[1, 3, 5], [2, 4, 6]])
In: np.random.choice(Set2)
ValueError: a must be 1-dimensional
Could be user error, but I've checked several times and the only difference is the size of the sets. I realize I can do something like:
Chosen = np.random.choice(N, k)
Selection = Set[Chosen]
where N is the number of sets and k is the number of samples, but I'm just wondering if there is a better way, and specifically what I am doing wrong to raise a value error when the sets are the same size.
Printout of Set1 and Set2 for reference:
In: Set1
Out: array([list([1, 3, 5]), list([2, 4])], dtype=object)
In: type(Set1)
Out: numpy.ndarray
In: Set2
Out:
array([[1, 3, 5],
       [2, 4, 6]])
In: type(Set2)
Out: numpy.ndarray
Your issue is caused by a misunderstanding of how numpy arrays work. The first example cannot "really" be turned into an array because numpy does not support ragged arrays. You end up with a 1-d array of object references that point to two Python lists. The second example is a proper 2xN numerical array. I can think of two types of solutions here.
The obvious approach (which would work in both cases, by the way) would be to choose the index instead of the sublist. Since you are sampling with replacement, you can just generate the index and use it directly:
Set[np.random.randint(N, size=k)]
This is the same as
Set[np.random.choice(N, k)]
If you want to choose without replacement, your best bet is to use np.random.choice with replace=False. This is similar to, but less efficient than, shuffling. In either case, you can write a one-liner for the index:
Set[np.random.choice(N, k, replace=False)]
Or:
index = np.arange(Set.shape[0])
np.random.shuffle(index)
Set[index[:k]]
The nice thing about np.random.shuffle, though, is that you can apply it to Set directly, whether it is a one- or many-dimensional array. Shuffling will always happen along the first axis, so you can just take the top k elements afterwards:
np.random.shuffle(Set)
Set[:k]
The shuffling operation works only in-place, so you have to write it out the long way. It's also less efficient for large arrays, since you end up shuffling the entire array (or index range) up front, no matter how small k is.
The other solution is to turn the second example into an array of list objects like the first one. I do not recommend this solution unless the only reason you are using numpy is for the choice function. In fact, I wouldn't recommend it at all, since you can, and probably should, use Python's standard random module at this point. Disclaimers aside, you can coerce the datatype of the second array to be object. It will remove any benefits of using numpy, and it can't be done directly: simply setting dtype=object will still create a 2D array, just storing references to Python int objects instead of primitives. You have to do something like this:
Set = np.zeros(N, dtype=object)
Set[:] = [[1, 3, 5], [2, 4]]
You will now get an object essentially equivalent to the one in the first example, and can therefore apply np.random.choice directly.
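A minimal sketch tying this together, reusing the values from the first example:
import numpy as np
data = [[1, 3, 5], [2, 4]]
Set = np.zeros(len(data), dtype=object)  # 1-d array of object slots
Set[:] = data                            # each slot now references a Python list
print(np.random.choice(Set))             # picks one whole sublist, e.g. [2, 4]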
Note
I show the legacy np.random methods here because of personal inertia if nothing else. The correct way, as suggested in the documentation I link to, is to use the new Generator API. This is especially true for the choice method, which is much more efficient in the new implementation. The usage is not any more difficult:
Set[np.random.default_rng().choice(N, k, replace=False)]
There are additional advantages, like the fact that you can now choose directly, even from a multidimensional array:
np.random.default_rng().choice(Set2, k, replace=False)
The same goes for shuffle, which, like choice, now allows you to select the axis you want to rearrange:
np.random.default_rng().shuffle(Set)
Set[:k]

How to reverse a numpy array of unknown dimension?

I'm just learning Python, and have decided to do so by recoding and improving an old Java-based school AI project.
My project involves a mathematical operation that is basically a discrete convolution, but without one of the functions being time-reversed.
So, while in my original Java project I wrote all the code to do the operation myself, since I'm working in Python, with its great math libraries like numpy and scipy, I figured I could just use an existing convolution function like scipy.convolve. However, this would require me to pre-reverse one of the two arrays, so that when scipy.convolve reverses one of them to perform the convolution, it's really un-reversing it. (I also still don't know how to be sure to pre-reverse the right one of the two arrays so that they are slid past each other forwards rather than backwards, but I assume I should ask that as a separate question.)
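For reference, in one dimension convolving with a pre-reversed array is exactly cross-correlation; a small sketch on made-up values:
import numpy as np
a = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.0, 1.0, 0.5])
r1 = np.convolve(a, v[::-1])          # convolution with the kernel pre-reversed
r2 = np.correlate(a, v, mode='full')  # cross-correlation
print(np.allclose(r1, r2))            # True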
Unlike my Java code, which only handled one-dimensional data, I want to extend this project to multidimensional data. I have learned that for a numpy array of known dimension, such as a three-dimensional array a, I can fully reverse the array (or rather get back a view that is reversed, which is much faster) with
a = a[::-1, ::-1, ::-1]
However, this requires a ::-1 for every dimension. How can I perform this same reversal within a method, for an array of arbitrary dimension, with the same result as the above code?
You can use np.flip. From the documentation:
numpy.flip(m, axis=None)
Reverse the order of elements in an array along the given axis.
The shape of the array is preserved, but the elements are reordered.
Note: flip(m) corresponds to m[::-1,::-1,...,::-1] with ::-1 at all positions.
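A quick usage sketch (assuming NumPy >= 1.15, where axis=None is supported):
import numpy as np
a = np.arange(8).reshape(2, 2, 2)
b = np.flip(a)               # reverses along every axis
c = a[::-1, ::-1, ::-1]      # equivalent explicit slicing
print(np.array_equal(b, c))  # True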
This is a possible solution:
slices = tuple([slice(-1, -n-1, -1) for n in a.shape])
result = a[slices]
It extends to an arbitrary number of axes. Verification:
a = np.arange(8).reshape(2, 4)
slices = tuple([slice(-1, -n-1, -1) for n in a.shape])
result = a[slices]
yields:
>>> a
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
>>> result
array([[7, 6, 5, 4],
       [3, 2, 1, 0]])

numpy: indexes too big sometimes giving exceptions, sometimes not

It seems really stupid, but I'm wondering why the following code (numpy 1.11.2) raises an exception:
import numpy as npy
a = npy.arange(0,10)
a[10]
And not this one:
import numpy as npy
a = npy.arange(0,10)
a[1:100]
I can understand that when we want to take part of an array, it's possible we don't really care if the index is too big (we just take what is in the array), but it seems a bit tricky to me: it's quite easy not to notice that you have a bug in the way you're counting indexes, since no exception is raised.
This is consistent with how Python lists (or sequences in general) behave:
>>> L = list(range(10))
>>> L[10]
Traceback (most recent call last):
  ...
IndexError: list index out of range
>>> L[1:100]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> L[100:100]
[]
You cannot access an index that does not exist.
But you can have an empty range, i.e. an empty list or an empty NumPy array.
So if one of the slice indices is outside the size of the sequence, you just take what is there.
The Python tutorial uses a more positive wording:
However, out of range slice indexes are handled gracefully when used for slicing:
When you give the index 1:100, you use slicing. Python, in general, accepts slices extending past the end of the list and simply ignores the missing items, so there is no problem. However, with a[10] you specifically refer to the 11th element (remember that indexing starts at 0), which does not exist, so you get an exception.
In Python, counting begins at 0.
In your first example your array has 10 elements, but is indexed from 0 to 9. Therefore, with a[10] you attempt to access the 11th element, which gives you an error, as it is outside the valid index range for your array.
As follows:
A = np.arange(0, 10)
# A      -> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# len(A) -> 10
# A[9]   -> 9 (the last valid index)
You can read about Python 0 indexing here:
https://docs.scipy.org/doc/numpy-1.10.0/user/basics.indexing.html

Numpy nonzero/flatnonzero index order; order of returned elements in boolean indexing

I'm wondering about the order of indices returned by numpy.nonzero / numpy.flatnonzero.
I couldn't find anything in the docs about it. It just says:
A[nonzero(flag)] == A[flag]
While in most cases this is enough, there are some cases when you need a sorted list of indices. Is it guaranteed that the returned indices are sorted in the 1-D case, or do I need to sort them explicitly? (A similar question is the order of elements returned simply by selecting with a boolean array (A[flag]), which must be the same according to the docs.)
Example: finding the "gaps" between True elements in flag:
flag = np.array([True, False, False, True], dtype=bool)
iflag = np.flatnonzero(flag)
gaps = iflag[1:] - iflag[:-1]
Thanks.
Given the specification for advanced (or "fancy") indexing with integers, the guarantee that A[nonzero(flag)] == A[flag] is also a guarantee that the values are sorted low-to-high in the 1-d case. However, in higher dimensions, the result (while "sorted") has a different structure than you might expect.
In short, given a 1-dimensional array of integers ind and a 1-dimensional array x to be indexed, we have the following for all valid i defined for ind:
result[i] = x[ind[i]]
result takes the shape of ind, and contains the values of x at the indices indicated by ind. This means that we can deduce that if x[flag] maintains the original order of x, and if x[nonzero(flag)] is the same as x[flag], then nonzero(flag) must always produce indices in sorted order.
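A small demonstration with made-up values, which also computes the "gaps" from the question via np.diff:
import numpy as np
flag = np.array([True, False, False, True, True])
iflag = np.flatnonzero(flag)
print(iflag)           # [0 3 4] -- ascending order
print(np.diff(iflag))  # [3 1] -- gaps between consecutive True positions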
The only catch is that for multidimensional arrays, the indices are stored as distinct arrays for each dimension being indexed. So in other words,
x[array([0, 1, 2]), array([0, 0, 0])]
is equal to
array([x[0, 0], x[1, 0], x[2, 0]])
The values are still sorted, but each dimension is broken out into its own array. (You can do interesting things with broadcasting as a result; but that's beyond the scope of this answer.)
The only problem with this line of reasoning is that -- to my great surprise -- I can't find an explicit statement guaranteeing that boolean indexing preserves the original order of the array. Nonetheless, I'm quite certain from experience that it does. More generally, it would be unbelievably perverse to have x[[True, True, True]] return a reversed version of x.

Assume zero for subsequent dimensions when slicing an array

I need to slice an array in a way that assumes zero for every dimension except the first.
Given an array:
x = numpy.zeros((3,3,3))
I would like the following behavior, but without needing to know the number of dimensions beforehand:
y = x[:, 0, 0]
Essentially I am looking for something that would take the place of Ellipsis, but instead of expanding to the needed number of : objects, it would expand into the needed number of zeros.
Is there anything built in for this? If not, what is the best way to get the functionality that I need?
Edit:
One way to do this is to use:
y = x.ravel(order='F')[0:x.shape[0]]
This works fine; however, in some cases (such as mine) ravel will need to create a copy of the array instead of a view. Since I am working with large arrays, I want a more memory-efficient way of doing this.
You could create an indexing tuple, like this:
x = np.arange(3*3*3).reshape(3, 3, 3)
s = (slice(None),) + (0,) * (x.ndim - 1)
print(x[s])        # array([ 0,  9, 18])
print(x[:, 0, 0])  # array([ 0,  9, 18])
I guess you could also do:
x.transpose().flat[:3]
but I prefer the first approach, since it works for any dimension (rather than only the first), and it's just as efficient as writing x[:, 0, 0] directly, since it's just different syntax for the same thing.
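The same idea generalizes to keeping any single axis, not just the first; a small variation (the axis choice here is hypothetical):
import numpy as np
x = np.arange(27).reshape(3, 3, 3)
axis = 1  # keep this axis, assume zero everywhere else
s = (0,) * axis + (slice(None),) + (0,) * (x.ndim - axis - 1)
print(np.array_equal(x[s], x[0, :, 0]))  # True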
I usually use tom10's method, but here's another:
for i in range(x.ndim - 1):
    x = x[..., 0]
