I tried to use numpy.apply_along_axis, but this seems to work only when the applied function collapses the dimension and not when it expands it.
Example:
def dup(x):
    return np.array([x, x])
a = np.array([1,2,3])
np.apply_along_axis(dup, axis=0, arr=a) # This doesn't work
I was expecting the matrix below (notice how its dimension has expanded from the input matrix a):
np.array([[1, 1], [2, 2], [3, 3]])
In R, this would be accomplished by the **ply set of functions from the plyr package. How to do it with numpy?
If you just want to repeat the elements you can use np.repeat :
>>> np.repeat(a,2).reshape(3,2)
array([[1, 1],
[2, 2],
[3, 3]])
To apply a function, use np.frompyfunc, and to convert the resulting object array into a regular 2D array, use np.vstack:
>>> def dup(x):
...     return np.array([x, x])
>>> oct_array = np.frompyfunc(dup, 1, 1)
>>> oct_array(a)
array([array([1, 1]), array([2, 2]), array([3, 3])], dtype=object)
>>> np.vstack(oct_array(a))
array([[1, 1],
[2, 2],
[3, 3]])
For someone used to general Python code, a list comprehension may be the simplest approach:
In [20]: np.array([dup(x) for x in a])
Out[20]:
array([[1, 1],
[2, 2],
[3, 3]])
The comprehension (a loop or mapping that applies dup to each element of a) returns [array([1, 1]), array([2, 2]), array([3, 3])], which is easily turned into a 2d array with np.array().
At least for this small a, it is also faster than the np.frompyfunc approach. The np.frompyfunc function will give full access to broadcasting, but evidently it doesn't apply any fast iteration tricks.
apply_along_axis can help keep indices straight when dealing with many dimensions, but it is still just an iteration method. It's written in Python, so you can study its code yourself. It is much more complicated than needed for this simple case.
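For this particular dup, no per-element Python call is needed at all. A minimal sketch, assuming the goal is only to duplicate each element into a row (the names stacked and tiled are illustrative and not from the thread):

```python
import numpy as np

a = np.array([1, 2, 3])

# Stack two copies of `a` side by side as columns -> shape (3, 2).
stacked = np.column_stack((a, a))

# Equivalent: view `a` as a (3, 1) column and tile it twice along axis 1.
tiled = np.tile(a[:, None], (1, 2))

print(stacked)
```

Both forms stay inside NumPy's compiled code, so they should scale better than calling dup once per element.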
In order for your example to work as expected, a should be 2-dimensional:
def dup(x):
    # x is now an array of size 1
    return np.array([x[0], x[0]])
a = np.array([[1,2,3]]) # 2dim
np.apply_along_axis(dup, axis=0, arr=a)
=>
array([[1, 2, 3],
[1, 2, 3]])
Of course, you probably want to transpose the result.
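One way to avoid the transpose, sketched here under the assumption that a starts out 1-dimensional, is to view it as a column and iterate along axis 1 instead (result is an illustrative name):

```python
import numpy as np

def dup(x):
    # x is a 1-D slice of length 1
    return np.array([x[0], x[0]])

a = np.array([1, 2, 3])

# a[:, None] has shape (3, 1); iterating along axis=1 hands dup one
# element at a time, and each 2-element result fills one output row.
result = np.apply_along_axis(dup, axis=1, arr=a[:, None])
print(result.shape)  # (3, 2)
```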
I am trying to do something like the following in NumPy:
import numpy as np
def f(x):
    return x[0] + x[1]
X1 = np.array([0, 1, 2])
X2 = np.array([0, 1, 2])
X = np.meshgrid(X1, X2)
result = np.vectorize(f)(X)
with the expected result being array([[0, 1, 2], [1, 2, 3], [2, 3, 4]]), but it returns the following error:
2
3 def f(x):
----> 4 return x[0] + x[1]
5
6 X1 = np.array([0, 1, 2])
IndexError: invalid index to scalar variable
This is because it tries to apply f to all 18 scalar elements of the mesh grid, whereas I want it applied to 9 pairs of 2 scalars. What is the correct way to do this?
Note: I am aware this code will work if I do not vectorize f, but this is important because f can be any function, e.g. it could contain an if statement which throws value error without vectorizing.
If you insist on using numpy.vectorize, you need to pass a signature when wrapping the function:
import numpy as np
def f(x):
    return x[0] + x[1]
    # Or
    # return np.add.reduce(x, axis=0)
X1 = np.array([0, 1, 2])
X2 = np.array([0, 1, 2])
X = np.meshgrid(X1, X2)
# np.asarray(X).shape -> (2, 3, 3)
# shape of the desired result is (3, 3)
f_vec = np.vectorize(f, signature='(n,m,m)->(m,m)')
result = f_vec(X)
print(result)
Output:
[[0 1 2]
[1 2 3]
[2 3 4]]
For the function you mentioned in the comments:
f = lambda x: x[0] + x[1] if x[0] > 0 else 0
You can use np.where:
def f(x):
    return np.where(x[0] > 0, x[0] + x[1], 0)
# np.where(some_condition, value_if_true, value_if_false)
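A minimal sketch of the np.where approach applied to the mesh grid from the question. The condition is written on x[0] so that it mirrors the lambda; the sample data is the question's own:

```python
import numpy as np

def f(x):
    # Element-wise version of: x[0] + x[1] if x[0] > 0 else 0
    return np.where(x[0] > 0, x[0] + x[1], 0)

X1 = np.array([0, 1, 2])
X2 = np.array([0, 1, 2])
X = np.meshgrid(X1, X2)  # two (3, 3) arrays

result = f(X)
print(result)
```

Note that np.where evaluates both branches for every element and only then selects, so it does not protect against errors or warnings raised by the branch that is not chosen.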
Numpy was designed with vectorization in mind -- unless you have some crazy edge-case there's almost always a way to take advantage of Numpy's broadcasting and vectorization. I strongly recommend seeking out vectorized solutions before giving up so easily and resorting to using for loops.
If you are too lazy, or ignorant, to do a "proper" vectorization, you can use np.vectorize. But you need to take the time to really read its docs. It isn't magic. It can be useful, especially if you need to take advantage of broadcasting and the function, for some reason or other, only accepts scalars.
Rewriting your function to work with scalar inputs (though it also works fine with arrays, in this case):
In [91]: def foo(x,y): return x+y
...: f = np.vectorize(foo)
With scalar inputs:
In [92]: f(1,2)
Out[92]: array(3)
With 2 arrays (a (2,1) and (3,)), returning a (2,3):
In [93]: f(np.array([1,2])[:,None], np.arange(1,4))
Out[93]:
array([[2, 3, 4],
[3, 4, 5]])
Same thing with meshgrid:
In [94]: I,J = np.meshgrid(np.array([1,2]), np.arange(1,4),indexing='ij')
In [95]: I
Out[95]:
array([[1, 1, 1],
[2, 2, 2]])
In [96]: J
Out[96]:
array([[1, 2, 3],
[1, 2, 3]])
In [97]: f(I,J)
Out[97]:
array([[2, 3, 4],
[3, 4, 5]])
Or meshgrid arrays as defined in [93]:
In [98]: I,J = np.meshgrid(np.array([1,2]), np.arange(1,4),indexing='ij', sparse=True)
In [99]: I,J
Out[99]:
(array([[1],
[2]]),
array([[1, 2, 3]]))
But in a true vectorized sense, you can just add the 2 arrays:
In [100]: I+J
Out[100]:
array([[2, 3, 4],
[3, 4, 5]])
The first paragraph of np.vectorize docs (my emphasis):
Define a vectorized function which takes a nested sequence of objects or
numpy arrays as inputs and returns a single numpy array or a tuple of numpy
arrays. The vectorized function evaluates pyfunc over successive tuples
of the input arrays like the python map function, except it uses the
broadcasting rules of numpy.
edit
Starting with a function that expects a 2 element tuple, we could add a cover that splits it into two, and apply vectorize to that:
In [103]: def foo1(x): return x[0]+x[1]
...: def foo2(x,y): return foo1((x,y))
...: f = np.vectorize(foo2)
In [104]: f(1,2)
Out[104]: array(3)
X is a 2-element sequence of arrays:
In [105]: X = np.meshgrid(np.array([1,2]), np.arange(1,4),indexing='ij')
In [106]: X
Out[106]:
[array([[1, 1, 1],
[2, 2, 2]]),
array([[1, 2, 3],
[1, 2, 3]])]
which can be passed to f as:
In [107]: f(X[0],X[1])
Out[107]:
array([[2, 3, 4],
[3, 4, 5]])
But there's no need to slow things down with that iteration. Just pass the tuple to foo1:
In [108]: foo1(X)
Out[108]:
array([[2, 3, 4],
[3, 4, 5]])
In f = lambda x: x[0] + x[1] if x[0] > 0 else 0 you get the 'ambiguity' ValueError because if only works with scalars. But there are plenty of faster numpy ways of replacing such an if step.
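One such replacement is boolean-mask assignment. A minimal sketch, assuming the function takes its two inputs as separate arguments (f_masked is a hypothetical name, not from the thread):

```python
import numpy as np

def f_masked(x0, x1):
    # Bring both inputs to a common shape so one mask indexes both.
    x0, x1 = np.broadcast_arrays(x0, x1)
    out = np.zeros(x0.shape, dtype=np.result_type(x0, x1))
    mask = x0 > 0
    # Only the elements where the condition holds are computed.
    out[mask] = x0[mask] + x1[mask]
    return out

I, J = np.meshgrid(np.array([0, 1, 2]), np.arange(1, 4), indexing='ij')
result = f_masked(I, J)
print(result)
```

Unlike np.where, this computes the sum only for the selected elements, which can matter when the "true" branch is expensive or invalid elsewhere.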
I know it is possible to use meshgrid to get all combinations between two arrays using numpy.
But in my case I have an array of two columns and n rows and another array that I would like to get the unique combinations.
For example:
a = [[1,1],
[2,2],
[3,3]]
b = [5,6]
# The expected result would be:
final_array = [[1,1,5],
[1,1,6],
[2,2,5],
[2,2,6],
[3,3,5],
[3,3,6]]
Which method is the fastest way to get this result using only numpy?
Proposed solution
Ok got the result, but I would like to know if this is a reliable and fast solution for this task, if someone could give me any advice I will appreciate.
a_t = np.tile(a, len(b)).reshape(-1,2)
b_t = np.tile(b, len(a)).reshape(1,-1)
final_array = np.hstack((a_t,b_t.T))
array([[1, 1, 5],
[1, 1, 6],
[2, 2, 5],
[2, 2, 6],
[3, 3, 5],
[3, 3, 6]])
Kind of ugly, but here's one way:
xx = np.repeat(a, len(b), axis=0)  # repeat whole rows; a must be an ndarray
yy = np.tile(b, a.shape[0])[:, None]
np.concatenate((xx, yy), axis=1)
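A minimal sketch of the repeat-and-tile idea as a reusable helper, assuming a is 2D and b is 1D (row_product is a hypothetical name). Passing axis=0 to np.repeat keeps each row intact, which matters as soon as the columns within a row differ:

```python
import numpy as np

def row_product(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    # Repeat each row of `a` once per element of `b`.
    rows = np.repeat(a, len(b), axis=0)
    # Cycle through `b` once per row of `a`, as a column.
    col = np.tile(b, len(a))[:, None]
    return np.hstack((rows, col))

a = [[1, 2], [3, 4], [5, 6]]
b = [7, 8]
final_array = row_product(a, b)
print(final_array)
```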
I have a 2D array:
>>> in_arr = np.array([[1,2],[4,3]])
array([[1, 2],
[4, 3]])
and I find the sorted indices by columns to yield another 2D array:
>>> col_sort = np.argsort(in_arr, axis=1)
array([[0, 1],
[1, 0]])
I would like to know the efficient numpy slice to index the first by the second:
>>> reordered_in_arr = np.*SOME_SLICE_METHOD*(in_arr, col_sort, axis=1)
array([[1, 2],
[3, 4]])
The intention is to then perform a (more complicated) function on the array by column, e.g.:
>>> arr_with_function = reordered_in_arr ** np.array([1,2])
array([[1, 4],
[3, 16]])
and return the elements to their original position in the array
>>> return_order = np.argsort(col_sort, axis=1)
>>> reordered_in_arr = np.*SOME_SLICE_METHOD*(arr_with_function, return_order, axis=1)
array([[1, 4],
[16, 3]])
OK, thinking about it as I type, I might just use apply_over_axis, but I would still like to know how to do the above efficiently in case it is of value later.
If you want to do all those operations in-place then you don't need argsort(). Numpy supports in-place operations in such situations:
In [12]: in_arr = np.array([[1,2],[4,3]])
In [13]: in_arr.sort(axis=1)
In [14]: in_arr **= [1, 2]
In [15]: in_arr
Out[15]:
array([[ 1, 4],
[ 3, 16]])
But if you need the indices of the sorted items you can get the expected result with a simple indexing.
In [18]: in_arr[np.arange(2)[:,None], col_sort]
Out[18]:
array([[1, 2],
[3, 4]])
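The full round trip from the question (reorder, apply, restore) can also be sketched with np.take_along_axis, assuming NumPy 1.15 or later where that function is available. The variable names follow the question; restored is illustrative:

```python
import numpy as np

in_arr = np.array([[1, 2], [4, 3]])
col_sort = np.argsort(in_arr, axis=1)

# Reorder each row by its own sort indices.
reordered = np.take_along_axis(in_arr, col_sort, axis=1)

# Apply the per-column operation on the sorted layout.
arr_with_function = reordered ** np.array([1, 2])

# argsort of the sort indices gives the inverse permutation.
return_order = np.argsort(col_sort, axis=1)
restored = np.take_along_axis(arr_with_function, return_order, axis=1)
print(restored)
```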
I would like to create a function that has input: x.shape==(2,2), and outputs y.shape==(2,2,3).
For example:
@np.vectorize
def foo(x):
    # This function doesn't work like I want
    return x,x,x
a = np.array([[1,2],[3,4]])
print(foo(a))
#desired output
[[[1 1 1]
[2 2 2]]
[[3 3 3]
[4 4 4]]]
#actual output
(array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]))
Or maybe:
@np.vectorize
def bar(x):
    # This function doesn't work like I want
    return np.array([x,2*x,5])
a = np.array([[1,2],[3,4]])
print(bar(a))
#desired output
[[[1 2 5]
[2 4 5]]
[[3 6 5]
[4 8 5]]]
Note that foo is just an example. I want a way to map over a numpy array (which is what vectorize is supposed to do), but have that map take a 0d object and shove a 1d object in its place. It also seems to me that the dimensions here are arbitrary, as one might wish to take a function that takes a 1d object and returns a 3d object, vectorize it, call it on a 5d object, and get back a 7d object.... However, my specific use case only requires vectorizing a 0d to 1d function, and mapping it appropriately over a 2d array.
It would help, in your question, to show both the actual result and your desired result. As written that isn't very clear.
In [79]: foo(np.array([[1,2],[3,4]]))
Out[79]:
(array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]))
As indicated in the vectorize docs, this has returned a tuple of arrays, corresponding to the tuple of values that your function returned.
Your bar returns an array, whereas vectorize expected it to return a scalar (or single value):
In [82]: bar(np.array([[1,2],[3,4]]))
ValueError: setting an array element with a sequence.
vectorize takes an otypes parameter that sometimes helps. For example if I say that bar (without the wrapper) returns an object, I get:
In [84]: f=np.vectorize(bar, otypes=[object])
In [85]: f(np.array([[1,2],[3,4]]))
Out[85]:
array([[array([1, 2, 5]), array([2, 4, 5])],
[array([3, 6, 5]), array([4, 8, 5])]], dtype=object)
A (2,2) array of (3,) arrays. The (2,2) shape matches the shape of the input.
vectorize has a relatively new parameter, signature
In [90]: f=np.vectorize(bar, signature='()->(n)')
In [91]: f(np.array([[1,2],[3,4]]))
Out[91]:
array([[[1, 2, 5],
[2, 4, 5]],
[[3, 6, 5],
[4, 8, 5]]])
In [92]: _.shape
Out[92]: (2, 2, 3)
I haven't used this much, so I am still getting a feel for how it works. When I've tested it, it is slower than the original scalar version of vectorize. Neither offers any speed advantage over explicit loops. However, vectorize does help when 'broadcasting', allowing you to use a variety of input shapes. That's even more useful when your function takes several inputs, not just one as in this case.
In [94]: f(np.array([1,2]))
Out[94]:
array([[1, 2, 5],
[2, 4, 5]])
In [95]: f(np.array(3))
Out[95]: array([3, 6, 5])
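To illustrate the several-inputs case, here is a minimal sketch with a hypothetical two-argument function (pair_stats and f2 are not from the thread). The signature '(),()->(n)' declares two scalar inputs and one 1D output per call:

```python
import numpy as np

def pair_stats(x, y):
    # Scalar inputs in, one length-3 array out.
    return np.array([x, y, x + y])

f2 = np.vectorize(pair_stats, signature='(),()->(n)')

# A (2, 1) array broadcast against a (3,) array gives a (2, 3) loop
# shape, and the length-3 result is appended as the last axis.
out = f2(np.array([1, 2])[:, None], np.arange(1, 4))
print(out.shape)  # (2, 3, 3)
```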
For best speed, you want to use existing numpy whole-array functions where possible. For example your foo case can be done with:
In [97]: np.repeat(a[:,:,None],3, axis=2)
Out[97]:
array([[[1, 1, 1],
[2, 2, 2]],
[[3, 3, 3],
[4, 4, 4]]])
np.stack([a]*3, axis=2) also works.
And your bar desired result:
In [100]: np.stack([a, 2*a, np.full(a.shape, 5)], axis=2)
Out[100]:
array([[[1, 2, 5],
[2, 4, 5]],
[[3, 6, 5],
[4, 8, 5]]])
2*a takes advantage of the whole-array multiplication. That's true 'numpy-onic' thinking.
Just repeating the value into another dimension is quite simple:
import numpy as np
x = a = np.array([[1,2],[3,4]])
y = np.repeat(x[:,:,np.newaxis], 3, axis=2)
print(y.shape)
print(y)
(2, 2, 3)
[[[1 1 1]
[2 2 2]]
[[3 3 3]
[4 4 4]]]
This seems to work for the "f R0 -> R1 mapped over a nd array giving a (n+1)d one"
def foo(x):
    return np.concatenate((x,x))
np.apply_along_axis(foo,2,x.reshape(list(x.shape)+[1]))
doesn't generalize all that well, though
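A somewhat more general sketch, assuming a non-empty input and a function that always returns the same output shape: apply it to every element with a plain loop, stack the results, and restore the input's leading shape (map_elements is a hypothetical helper, not a NumPy function):

```python
import numpy as np

def map_elements(func, x):
    x = np.asarray(x)
    # One call per element; each result may be an array of any fixed shape.
    results = [np.asarray(func(v)) for v in x.ravel()]
    stacked = np.stack(results)
    # Leading axes come from the input, trailing axes from func's output.
    return stacked.reshape(x.shape + stacked.shape[1:])

def bar(v):
    return np.array([v, 2 * v, 5])

a = np.array([[1, 2], [3, 4]])
y = map_elements(bar, a)
print(y.shape)  # (2, 2, 3)
```

This is still a Python-level loop, so it offers convenience rather than speed.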
Suppose I have an array and want to build a matrix from it using a matrix of indices.
import numpy as np
arr = np.array([1,5])
mtxidx = np.array([[0,1,0],[0,1,1],[0,0,0]])
How can I get a matrix [[1,5,1],[1,5,5],[1,1,1]] ?
An initial thought is simply say
arr(mtxidx)
however it doesn't work
Is there any function/method that does this elegantly?
"Fancy" indexing works for me (NB in your question you are trying to call the array object (round brackets) but NumPy "ndarray" objects are not callable):
In [61]: arr[mtxidx]
Out[61]:
array([[1, 5, 1],
[1, 5, 5],
[1, 1, 1]])
Your initial thought was pretty close; simply replacing the parentheses with [] makes it work.
arr[mtxidx]
A list comprehension would work as well.
>>> np.array([arr[row] for row in mtxidx])
array([[1, 5, 1],
[1, 5, 5],
[1, 1, 1]])
I upvoted the fancy indexing proposed by @xnx, but if you need something along the same lines that also involves an operation (or anything else), you can also try this:
arr = np.array([1,5])
mtxidx = np.array([[0,1,0],[0,1,1],[0,0,0]])
def func(v):
    return arr[v]
vfunc = np.vectorize(func)
vfunc(mtxidx)
# array([[1, 5, 1],
# [1, 5, 5],
# [1, 1, 1]])
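When the extra operation is itself expressible on whole arrays, the lookup and the operation can be chained without np.vectorize. A minimal sketch, where squaring and adding one is an arbitrary stand-in for the operation:

```python
import numpy as np

arr = np.array([1, 5])
mtxidx = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0]])

# Fancy indexing performs the lookup for the whole index matrix at once.
looked_up = arr[mtxidx]

# Any element-wise operation then applies to the whole result.
squared_plus_one = looked_up ** 2 + 1
print(squared_plus_one)
```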