numpy apply_along_axis on a 1d array - python

What happens when numpy.apply_along_axis takes a 1-D array as input? When I use it on a 1-D array, I see something strange:
y = array([1, 2, 3, 4])
First try:
apply_along_axis(lambda x: x > 2, 0, y)
apply_along_axis(lambda x: x - 2, 0, y)
returns:
array([False, False, True, True], dtype=bool)
array([-1, 0, 1, 2])
However when I try:
apply_along_axis(lambda x: x - 2 if x > 2 else x, 0, y)
I get an error:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I could of course use a list comprehension and then convert back to an array, but that seems convoluted, and I feel like I'm missing something about apply_along_axis when it is applied to a 1-D array.
UPDATE: as per Jeff G's answer, my confusion stemmed from the fact that for a 1-D array with only one axis, what is passed to the function is the entire 1-D array itself rather than its individual elements.
"numpy.where" is clearly better for my chosen example (and no need for apply_along_axis), but my question is really about the proper idiom for applying a general function (that takes one scalar and returns one scalar) to each element of an array (other than list comprehension), something akin to pandas.Series.apply (or map). I know of 'vectorize' but it seems no less unwieldy than list comprehension.

I'm unclear whether you're asking if y must be 1-D (the answer is no, it can be multidimensional) or if you're asking about the function passed into apply_along_axis. For the latter, the answer is yes: the function you pass must take a 1-D array. (This is stated clearly in the function's documentation.)
In all three of your examples, x is a 1-D array. The first two examples work because NumPy implicitly broadcasts the > and - operators along that array.
Your third example fails because there is no such broadcasting for if / else. For this to work with apply_along_axis, you need to pass a function that operates on a whole 1-D array. numpy.where does exactly that:
>>> apply_along_axis(lambda x: numpy.where(x > 2, x - 2, x), 0, y)
array([1, 2, 1, 2])
P.S. In all these examples, apply_along_axis is unnecessary, thanks to broadcasting. You could achieve the same results with these:
>>> y > 2
>>> y - 2
>>> numpy.where(y > 2, y - 2, y)
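To make the broadcasting point concrete, here is a minimal sketch of the three idioms above run directly on y, with no apply_along_axis involved:

```python
import numpy as np

y = np.array([1, 2, 3, 4])

# Comparison and arithmetic operators broadcast elementwise over the array:
mask = y > 2       # array([False, False,  True,  True])
shifted = y - 2    # array([-1,  0,  1,  2])

# Elementwise conditional logic needs np.where, not Python's if/else,
# because `if y > 2` would ask for a single truth value of a whole array:
result = np.where(y > 2, y - 2, y)
print(mask, shifted, result)
```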

This answer addresses the updated addendum to your original question:
numpy.vectorize takes an elementwise function and returns a new function that can be applied to an entire array. It's like map, but it uses NumPy's broadcasting rules.
f = lambda x: x - 2 if x > 2 else x # your elementwise fn
fv = np.vectorize(f)
fv(np.array([1,2,3,4]))
# Out[5]: array([1, 2, 1, 2])

Related

Using `numpy.vectorize` to create multidimensional array results in ValueError: setting an array element with a sequence

This problem only seems to arise when my dummy function returns an array, so that a multidimensional array would be created.
I reduced the issue to the following example:
def dummy(x):
    y = np.array([np.sin(x), np.cos(x)])
    return y

x = np.array([0, np.pi / 2, np.pi])
The code I want to optimize looks like this:
y = []
for x_i in x:
    y_i = dummy(x_i)
    y.append(y_i)
y = np.array(y)
So I thought I could use vectorize to get rid of the slow loop:
y = np.vectorize(dummy)(x)
But this results in
ValueError: setting an array element with a sequence.
Where even is the sequence the error is talking about?!
Your function returns an array when given a scalar:
In [233]: def dummy(x):
     ...:     y = np.array([np.sin(x), np.cos(x)])
     ...:     return y
     ...:
In [234]: dummy(1)
Out[234]: array([0.84147098, 0.54030231])
In [235]: f = np.vectorize(dummy)
In [236]: f([0,1,2])
...
ValueError: setting an array element with a sequence.
vectorize constructs an empty result array and tries to put the result of each calculation into it. But a cell of the target array cannot accept an array.
If we specify an otypes parameter, it does work:
In [237]: f = np.vectorize(dummy, otypes=[object])
In [238]: f([0,1,2])
Out[238]:
array([array([0., 1.]), array([0.84147098, 0.54030231]),
array([ 0.90929743, -0.41614684])], dtype=object)
That is, each dummy array is put into an element of a shape-(3,) result array.
Since the component arrays all have the same shape, we can stack them:
In [239]: np.stack(_)
Out[239]:
array([[ 0. , 1. ],
[ 0.84147098, 0.54030231],
[ 0.90929743, -0.41614684]])
But as noted, vectorize does not promise a speedup. I suspect we could also use the newer signature parameter, but that's even slower.
vectorize makes some sense if your function takes several scalar arguments and you'd like to take advantage of NumPy broadcasting when feeding in sets of values. But as a replacement for a simple iteration over a 1-D array, it isn't an improvement.
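A sketch of that multi-argument use case (clip_diff is a made-up function name for illustration): the scalar-only if/else would fail on arrays, but the vectorized wrapper broadcasts the two arguments against each other:

```python
import numpy as np

def clip_diff(a, b):
    # Pure scalar logic: the if/else would be ambiguous on whole arrays.
    return a - b if a > b else 0

vclip = np.vectorize(clip_diff)

# Broadcasting: a (3, 1) column against a (3,) row yields a (3, 3) result.
col = np.arange(3).reshape(3, 1)
row = np.arange(3)
out = vclip(col, row)
print(out.shape)  # (3, 3)
```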
I don't really understand the error either, but with Python 3.6.3 you can just write:
y = dummy(x)
because np.sin and np.cos broadcast over arrays, so the function is automatically vectorized.
Also in the official documentation there is written the following:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
I hope this was at least a little help.
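One caveat worth checking (an observation on the answer above, not from the original post): the direct call stacks sin and cos as rows, so its shape is the transpose of the loop's result:

```python
import numpy as np

def dummy(x):
    return np.array([np.sin(x), np.cos(x)])

x = np.array([0, np.pi / 2, np.pi])

direct = dummy(x)                             # sin/cos broadcast: shape (2, 3)
looped = np.array([dummy(x_i) for x_i in x])  # original loop:     shape (3, 2)

print(direct.shape, looped.shape)
```

direct.T matches the looped result exactly, so a transpose recovers the original layout.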

Mapping an integer to array (Python): ValueError: setting an array element with a sequence

I have a defaultdict which maps certain integers to a numpy array of size 20.
In addition, I have an existing array of indices. I want to turn that array of indices into a 2D array, where each original index is converted into an array via my defaultdict.
Finally, in the case that an index isn't found in the defaultdict, I want to create an array of zeros for that index.
Here's what I have so far
converter = lambda x: np.zeros((d), dtype='float32') if x == -1 else cVf[x]
vfunc = np.vectorize(converter)
cvf = vfunc(indices)
np.zeros((d), dtype='float32') and cVf[x] have identical dtypes and shapes:
(Pdb) np.shape(cVf[0])
(20,)
Yet I get the error in the title (*** ValueError: setting an array element with a sequence.) when I try to run this code.
Any ideas?
You should give us some sample arrays or dictionaries (in the case of cVf) so we can do a test run.
Read what vectorize has to say about the return value. Since you don't define otypes, it makes a trial calculation to determine the dtype of the returned array. My first thought was that this trial calculation and the subsequent ones might be returning different things. But you claim converter will always return an array of the same dtype and shape.
But let's try something simpler:
In [609]: fv = np.vectorize(lambda x: np.array([x,x]))
In [610]: fv([1,2,3])
...
ValueError: setting an array element with a sequence.
It's having trouble returning any array.
But if I give it otypes, it works:
In [611]: fv = np.vectorize(lambda x: np.array([x,x]), otypes=[object])
In [612]: fv([1,2,3])
Out[612]: array([array([1, 1]), array([2, 2]), array([3, 3])], dtype=object)
In fact, in this case I could use frompyfunc, which returns object dtype, is the underlying function for vectorize, and is a bit faster:
In [613]: fv = np.frompyfunc(lambda x: np.array([x,x]), 1,1)
In [614]: fv([1,2,3])
Out[614]: array([array([1, 1]), array([2, 2]), array([3, 3])], dtype=object)
vectorize and frompyfunc are designed for functions that are scalar in, scalar out. That scalar may be an object, even an array, but it is still treated as a scalar.
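As a follow-up sketch: when every returned array has the same shape, the object-dtype result from frompyfunc (or vectorize with otypes=[object]) can be stacked back into an ordinary 2-D array, as in the earlier answer:

```python
import numpy as np

fv = np.frompyfunc(lambda x: np.array([x, x]), 1, 1)
obj = fv([1, 2, 3])      # object array holding three shape-(2,) arrays
stacked = np.stack(obj)  # ordinary (3, 2) integer array
print(stacked.shape)
```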

Explain numpy.where [duplicate]

This question already has answers here:
Is there a NumPy function to return the first index of something in an array?
(20 answers)
Closed 5 years ago.
I'm looking for something similar to list.index(value) that works for numpy arrays. I think numpy.where might do the trick, but I don't understand exactly how it works. Could someone please explain
a) what the documentation entry below means,
and b) whether or not it works like list.index(value) but with numpy arrays.
This is the relevant entry from the documentation:
numpy.where(condition[, x, y])

Return elements, either from x or y, depending on condition.
If only condition is given, return condition.nonzero().

Parameters:
condition : array_like, bool
    When True, yield x, otherwise yield y.
x, y : array_like, optional
    Values from which to choose. x and y need to have the same shape as condition.

Returns:
out : ndarray or tuple of ndarrays
    If both x and y are specified, the output array contains elements of x where condition is True, and elements from y elsewhere. If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.

See also: nonzero, choose

Notes: If x and y are given and the input arrays are 1-D, where is equivalent to:
    [xv if c else yv for (c, xv, yv) in zip(condition, x, y)]
What does it mean?
When called with only a condition, numpy.where returns the indices where that condition is true.
Is it like list.index?
It is close, in that it returns the indices of the array where the condition is met. list.index takes a value as its argument, but you can achieve the same thing with numpy.where by passing array == value as the condition.
Example:
Using the array
a = numpy.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
and calling numpy.where(a == 4) returns (array([1]), array([0]))
calling numpy.where(a >= 4) returns (array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2])): two arrays of row and column indices (respectively) where the condition is true.
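To make the list.index analogy concrete, here is a sketch of a helper built on np.where (first_index is a made-up name, not a NumPy function):

```python
import numpy as np

def first_index(arr, value):
    # np.where with only a condition returns a tuple of index arrays,
    # one per dimension; for a 1-D array we take the first (and only) one.
    idx = np.where(arr == value)[0]
    if idx.size == 0:
        raise ValueError(f"{value!r} is not in array")
    return int(idx[0])

a = np.array([3, 1, 4, 1, 5])
print(first_index(a, 1))  # 1, just like [3, 1, 4, 1, 5].index(1)
```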

how can I flatten a 2d numpy array whose rows have different lengths?

I have a numpy array which looks like:
myArray = np.array([[1,2],[3]])
But I can not flatten it,
In: myArray.flatten()
Out: array([[1, 2], [3]], dtype=object)
If I change the array to the same length in the second axis, then I can flatten it.
In: myArray2 = np.array([[1,2],[3,4]])
In: myArray2.flatten()
Out: array([1, 2, 3, 4])
My Question is:
Can I use some thing like myArray.flatten() regardless the dimension of the array and the length of its elements, and get the output: array([1,2,3])?
myArray is a 1-dimensional array of objects. Your list objects simply remain in the same order under flatten() or ravel(). You can instead use hstack to stack the arrays in sequence horizontally:
>>> np.hstack(myArray)
array([1, 2, 3])
Note that this is basically equivalent to using concatenate along the arrays' only axis (a 1-D array has just axis 0):
>>> np.concatenate(myArray, axis=0)
array([1, 2, 3])
If you don't have this issue, however, and the sublists have equal lengths, it is always preferable to use flatten() or ravel() for performance:
In [1]: u = timeit.Timer('np.hstack(np.array([[1,2],[3,4]]))',
   ...:                  setup='import numpy as np')
In [2]: print u.timeit()
11.0124390125
In [3]: u = timeit.Timer('np.array([[1,2],[3,4]]).flatten()',
   ...:                  setup='import numpy as np')
In [4]: print u.timeit()
3.05757689476
Iluengo's answer covers the further question of why you cannot use flatten() or ravel() given your array's type.
Well, I agree with the other answers that hstack or concatenate do the job in this case. However, I would like to point out that even though this 'fixes' the problem, the problem is not being addressed properly.
The problem is that even though the second axis appears to have different lengths, that is not what is actually happening. If you try:
>>> myArray.shape
(2,)
>>> myArray.dtype
dtype('O') # stands for Object
>>> myArray[0]
[1, 2]
This shows that your array is not a 2-D array with variable row lengths (as you might think); it is just a 1-D array of objects. In your case the elements are lists: the first element of your array is a 2-element list and the second is a 1-element list.
So flatten and ravel won't work, because transforming a 1-D array into a 1-D array yields exactly the same array. An object array doesn't care what you put inside it; it treats the individual items as unknown objects and can't decide how to merge them.
What you should consider is whether this is the behaviour you want for your application. NumPy arrays are especially efficient with fixed-size numeric matrices. If you are working with arrays of objects, I don't see why you would want to use NumPy instead of regular Python lists.
np.hstack works in this case
In [69]: np.hstack(myArray)
Out[69]: array([1, 2, 3])
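If the nesting can go deeper than one level, np.hstack alone is not enough; a recursive sketch (flatten_ragged is a made-up helper name, not a NumPy function) handles arbitrary nesting:

```python
import numpy as np

def flatten_ragged(obj):
    # Recursively walk lists/object arrays and collect the scalar leaves.
    out = []
    for item in np.ravel(np.asarray(obj, dtype=object)):
        if isinstance(item, (list, np.ndarray)):
            out.extend(flatten_ragged(item))
        else:
            out.append(item)
    return np.array(out)

myArray = np.array([[1, 2], [3]], dtype=object)
print(flatten_ragged(myArray))  # -> [1 2 3]
```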

Numpy multidimensional array slicing

Suppose I have defined a 3x3x3 numpy array with
x = numpy.arange(27).reshape((3, 3, 3))
Now, I can get an array containing the (0, 1) element of each 3x3 subarray with x[:, 0, 1], which returns array([ 1, 10, 19]). What if the (m, n) index is stored in a tuple? How do I retrieve the (m, n) element of each subarray then?
For example, suppose that I have t = (0, 1). I tried x[:, t], but it doesn't have the right behaviour - it returns rows 0 and 1 of each subarray. The simplest solution I have found is
x.transpose()[tuple(reversed(t))].transpose()
but I am sure there must be a better way. Of course, in this case, I could do x[:, t[0], t[1]], but that can't be generalised to the case where I don't know how many dimensions x and t have.
You can create the index tuple first:
index = (numpy.s_[:],)+t
x[index]
HYRY's solution is correct, but I have always found numpy's r_, c_ and s_ index tricks a bit strange looking. So here is the equivalent using a slice object:
x[(slice(None),) + t]
The single argument to slice is the stop position; None means "everything", in the same way that x[:] is equivalent to x[None:None].
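This generalizes to unknown dimensionality: a sketch of a helper (take_subelement is a made-up name) that pads the index tuple with slice(None) for every leading axis:

```python
import numpy as np

def take_subelement(x, t):
    # Index the trailing len(t) axes with t; keep all leading axes whole.
    n_lead = x.ndim - len(t)
    return x[(slice(None),) * n_lead + tuple(t)]

x = np.arange(27).reshape(3, 3, 3)
print(take_subelement(x, (0, 1)))  # same as x[:, 0, 1] -> [ 1 10 19]
```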
