I am trying to do something like the following in NumPy:
import numpy as np
def f(x):
return x[0] + x[1]
X1 = np.array([0, 1, 2])
X2 = np.array([0, 1, 2])
X = np.meshgrid(X1, X2)
result = np.vectorize(f)(X)
with the expected result being array([[0, 1, 2], [1, 2, 3], [2, 3, 4]]), but it returns the following error:
2
3 def f(x):
----> 4 return x[0] + x[1]
5
6 X1 = np.array([0, 1, 2])
IndexError: invalid index to scalar variable
This is because it tries to apply f to all 18 scalar elements of the mesh grid, whereas I want it applied to 9 pairs of 2 scalars. What is the correct way to do this?
Note: I am aware this code will work if I do not vectorize f, but this is important because f can be any function, e.g. it could contain an if statement which throws value error without vectorizing.
If you persist to use numpy.vectorize you need to define signature when defining vectorize on function.
import numpy as np
def f(x):
return x[0] + x[1]
# Or
# return np.add.reduce(x, axis=0)
X1 = np.array([0, 1, 2])
X2 = np.array([0, 1, 2])
X = np.meshgrid(X1, X2)
# np.asarray(X).shape -> (2, 3, 3)
# shape of the desired result is (3, 3)
f_vec = np.vectorize(f, signature='(n,m,m)->(m,m)')
result = f_vec(X)
print(result)
Output:
[[0 1 2]
[1 2 3]
[2 3 4]]
For the function you mentioned in the comments:
f = lambda x: x[0] + x[1] if x[0] > 0 else 0
You can use np.where:
def f(x):
return np.where(x > 0, x[0] + x[1], 0)
# np.where(some_condition, value_if_true, value_if_false)
Numpy was designed with vectorization in mind -- unless you have some crazy edge-case there's almost always a way to take advantage of Numpy's broadcasting and vectorization. I strongly recommend seeking out vectorized solutions before giving up so easily and resorting to using for loops.
If you are too lazy, or ignorant, to do are "proper" 'vectorization', you can use np.vectorize. But you need to take time to really read its docs. It isn't magic. It can be useful, especially if you need to take advantage of broadcasting, and the function, some reason or other, only accepts scalars.
Rewriting your function to work with scalar inputs (though it also works fine with arrays, in this case):
In [91]: def foo(x,y): return x+y
...: f = np.vectorize(foo)
With scalar inputs:
In [92]: f(1,2)
Out[92]: array(3)
With 2 arrays (a (2,1) and (3,)), returning a (2,3):
In [93]: f(np.array([1,2])[:,None], np.arange(1,4))
Out[93]:
array([[2, 3, 4],
[3, 4, 5]])
Samething with meshgrid:
In [94]: I,J = np.meshgrid(np.array([1,2]), np.arange(1,4),indexing='ij')
In [95]: I
Out[95]:
array([[1, 1, 1],
[2, 2, 2]])
In [96]: J
Out[96]:
array([[1, 2, 3],
[1, 2, 3]])
In [97]: f(I,J)
Out[97]:
array([[2, 3, 4],
[3, 4, 5]])
Or meshgrid arrays as defined in [93]:
In [98]: I,J = np.meshgrid(np.array([1,2]), np.arange(1,4),indexing='ij', sparse=True)
In [99]: I,J
Out[99]:
(array([[1],
[2]]),
array([[1, 2, 3]]))
But in a true vectorized sense, you can just add the 2 arrays:
In [100]: I+J
Out[100]:
array([[2, 3, 4],
[3, 4, 5]])
The first paragraph of np.vectorize docs (my emphasis):
Define a vectorized function which takes a nested sequence of objects or
numpy arrays as inputs and returns a single numpy array or a tuple of numpy
arrays. The vectorized function evaluates pyfunc over successive tuples
of the input arrays like the python map function, except it uses the
broadcasting rules of numpy.
edit
Starting with a function that expects a 2 element tuple, we could add a cover that splits it into two, and apply vectorize to that:
In [103]: def foo1(x): return x[0]+x[1]
...: def foo2(x,y): return foo1((x,y))
...: f = np.vectorize(foo2)
In [104]: f(1,2)
Out[104]: array(3)
X is a 2d element tuple:
In [105]: X = np.meshgrid(np.array([1,2]), np.arange(1,4),indexing='ij')
In [106]: X
Out[106]:
[array([[1, 1, 1],
[2, 2, 2]]),
array([[1, 2, 3],
[1, 2, 3]])]
which can be passed to f as:
In [107]: f(X[0],X[1])
Out[107]:
array([[2, 3, 4],
[3, 4, 5]])
But there's no need to slow things down with that iteration. Just pass the tuple to foo1:
In [108]: foo1(X)
Out[108]:
array([[2, 3, 4],
[3, 4, 5]])
In f = lambda x: x[0] + x[1] if x[0] > 0 else 0 you get the 'ambiguity' valueerror because if only works with scalars. But there are plenty of faster numpy ways of replacing such an if step.
Related
>>> ndarr = np.array([0, 1, 2])
>>> (lambda x: x + 1) (ndarr)
array([1, 2, 3])
I see that it replaces every element with the function applied to it.
but when I do this to a two dimensional array:
>>> ndarr = np.array([[0, 1, 2], [3, 4, 5]])
>>> (lambda x: x[0]) (ndarr)
array([0, 1, 2])
I thought this would take the two elements of the array which are [0, 1, 2] and [3, 4, 5], apply the lambda to them resulting in 0 and 3, and the result would be [0, 3]. but this applies the function to the whole array instead. why? and wat do I do to get [0, 3]?
Applying + 1 on a numpy array will cause all elements to be incremented by one, as you have noticed.
However, this does not imply that every operation you perform using a numpy array will be a "mapping". In the second example, there is simply a nested list, and you select the first element of the outer list. The exact same would happen if you just used regular Python:
>>> nlist = [[0, 1, 2], [3, 4, 5]]
>>> (lambda x: x[0]) (nlist)
[0, 1, 2]
You are looking for a mapping if you want to apply an operation on each of the nested arrays. However, this problem can be solved with a slice most easily:
>>> ndarr = np.array([[0, 1, 2], [3, 4, 5]])
>>> ndarr[:,0]
array([0, 3])
I'd like to copy a numpy 2D array into a third dimension. For example, given the 2D numpy array:
import numpy as np
arr = np.array([[1, 2], [1, 2]])
# arr.shape = (2, 2)
convert it into a 3D matrix with N such copies in a new dimension. Acting on arr with N=3, the output should be:
new_arr = np.array([[[1, 2], [1,2]],
[[1, 2], [1, 2]],
[[1, 2], [1, 2]]])
# new_arr.shape = (3, 2, 2)
Probably the cleanest way is to use np.repeat:
a = np.array([[1, 2], [1, 2]])
print(a.shape)
# (2, 2)
# indexing with np.newaxis inserts a new 3rd dimension, which we then repeat the
# array along, (you can achieve the same effect by indexing with None, see below)
b = np.repeat(a[:, :, np.newaxis], 3, axis=2)
print(b.shape)
# (2, 2, 3)
print(b[:, :, 0])
# [[1 2]
# [1 2]]
print(b[:, :, 1])
# [[1 2]
# [1 2]]
print(b[:, :, 2])
# [[1 2]
# [1 2]]
Having said that, you can often avoid repeating your arrays altogether by using broadcasting. For example, let's say I wanted to add a (3,) vector:
c = np.array([1, 2, 3])
to a. I could copy the contents of a 3 times in the third dimension, then copy the contents of c twice in both the first and second dimensions, so that both of my arrays were (2, 2, 3), then compute their sum. However, it's much simpler and quicker to do this:
d = a[..., None] + c[None, None, :]
Here, a[..., None] has shape (2, 2, 1) and c[None, None, :] has shape (1, 1, 3)*. When I compute the sum, the result gets 'broadcast' out along the dimensions of size 1, giving me a result of shape (2, 2, 3):
print(d.shape)
# (2, 2, 3)
print(d[..., 0]) # a + c[0]
# [[2 3]
# [2 3]]
print(d[..., 1]) # a + c[1]
# [[3 4]
# [3 4]]
print(d[..., 2]) # a + c[2]
# [[4 5]
# [4 5]]
Broadcasting is a very powerful technique because it avoids the additional overhead involved in creating repeated copies of your input arrays in memory.
* Although I included them for clarity, the None indices into c aren't actually necessary - you could also do a[..., None] + c, i.e. broadcast a (2, 2, 1) array against a (3,) array. This is because if one of the arrays has fewer dimensions than the other then only the trailing dimensions of the two arrays need to be compatible. To give a more complicated example:
a = np.ones((6, 1, 4, 3, 1)) # 6 x 1 x 4 x 3 x 1
b = np.ones((5, 1, 3, 2)) # 5 x 1 x 3 x 2
result = a + b # 6 x 5 x 4 x 3 x 2
Another way is to use numpy.dstack. Supposing that you want to repeat the matrix a num_repeats times:
import numpy as np
b = np.dstack([a]*num_repeats)
The trick is to wrap the matrix a into a list of a single element, then using the * operator to duplicate the elements in this list num_repeats times.
For example, if:
a = np.array([[1, 2], [1, 2]])
num_repeats = 5
This repeats the array of [1 2; 1 2] 5 times in the third dimension. To verify (in IPython):
In [110]: import numpy as np
In [111]: num_repeats = 5
In [112]: a = np.array([[1, 2], [1, 2]])
In [113]: b = np.dstack([a]*num_repeats)
In [114]: b[:,:,0]
Out[114]:
array([[1, 2],
[1, 2]])
In [115]: b[:,:,1]
Out[115]:
array([[1, 2],
[1, 2]])
In [116]: b[:,:,2]
Out[116]:
array([[1, 2],
[1, 2]])
In [117]: b[:,:,3]
Out[117]:
array([[1, 2],
[1, 2]])
In [118]: b[:,:,4]
Out[118]:
array([[1, 2],
[1, 2]])
In [119]: b.shape
Out[119]: (2, 2, 5)
At the end we can see that the shape of the matrix is 2 x 2, with 5 slices in the third dimension.
Use a view and get free runtime! Extend generic n-dim arrays to n+1-dim
Introduced in NumPy 1.10.0, we can leverage numpy.broadcast_to to simply generate a 3D view into the 2D input array. The benefit would be no extra memory overhead and virtually free runtime. This would be essential in cases where the arrays are big and we are okay to work with views. Also, this would work with generic n-dim cases.
I would use the word stack in place of copy, as readers might confuse it with the copying of arrays that creates memory copies.
Stack along first axis
If we want to stack input arr along the first axis, the solution with np.broadcast_to to create 3D view would be -
np.broadcast_to(arr,(3,)+arr.shape) # N = 3 here
Stack along third/last axis
To stack input arr along the third axis, the solution to create 3D view would be -
np.broadcast_to(arr[...,None],arr.shape+(3,))
If we actually need a memory copy, we can always append .copy() there. Hence, the solutions would be -
np.broadcast_to(arr,(3,)+arr.shape).copy()
np.broadcast_to(arr[...,None],arr.shape+(3,)).copy()
Here's how the stacking works for the two cases, shown with their shape information for a sample case -
# Create a sample input array of shape (4,5)
In [55]: arr = np.random.rand(4,5)
# Stack along first axis
In [56]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[56]: (3, 4, 5)
# Stack along third axis
In [57]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[57]: (4, 5, 3)
Same solution(s) would work to extend a n-dim input to n+1-dim view output along the first and last axes. Let's explore some higher dim cases -
3D input case :
In [58]: arr = np.random.rand(4,5,6)
# Stack along first axis
In [59]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[59]: (3, 4, 5, 6)
# Stack along last axis
In [60]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[60]: (4, 5, 6, 3)
4D input case :
In [61]: arr = np.random.rand(4,5,6,7)
# Stack along first axis
In [62]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[62]: (3, 4, 5, 6, 7)
# Stack along last axis
In [63]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[63]: (4, 5, 6, 7, 3)
and so on.
Timings
Let's use a large sample 2D case and get the timings and verify output being a view.
# Sample input array
In [19]: arr = np.random.rand(1000,1000)
Let's prove that the proposed solution is a view indeed. We will use stacking along first axis (results would be very similar for stacking along the third axis) -
In [22]: np.shares_memory(arr, np.broadcast_to(arr,(3,)+arr.shape))
Out[22]: True
Let's get the timings to show that it's virtually free -
In [20]: %timeit np.broadcast_to(arr,(3,)+arr.shape)
100000 loops, best of 3: 3.56 µs per loop
In [21]: %timeit np.broadcast_to(arr,(3000,)+arr.shape)
100000 loops, best of 3: 3.51 µs per loop
Being a view, increasing N from 3 to 3000 changed nothing on timings and both are negligible on timing units. Hence, efficient both on memory and performance!
This can now also be achived using np.tile as follows:
import numpy as np
a = np.array([[1,2],[1,2]])
b = np.tile(a,(3, 1,1))
b.shape
(3,2,2)
b
array([[[1, 2],
[1, 2]],
[[1, 2],
[1, 2]],
[[1, 2],
[1, 2]]])
A=np.array([[1,2],[3,4]])
B=np.asarray([A]*N)
Edit #Mr.F, to preserve dimension order:
B=B.T
Here's a broadcasting example that does exactly what was requested.
a = np.array([[1, 2], [1, 2]])
a=a[:,:,None]
b=np.array([1]*5)[None,None,:]
Then b*a is the desired result and (b*a)[:,:,0] produces array([[1, 2],[1, 2]]), which is the original a, as does (b*a)[:,:,1], etc.
Summarizing the solutions above:
a = np.arange(9).reshape(3,-1)
b = np.repeat(a[:, :, np.newaxis], 5, axis=2)
c = np.dstack([a]*5)
d = np.tile(a, [5,1,1])
e = np.array([a]*5)
f = np.repeat(a[np.newaxis, :, :], 5, axis=0) # np.repeat again
print('b='+ str(b.shape), b[:,:,-1].tolist())
print('c='+ str(c.shape),c[:,:,-1].tolist())
print('d='+ str(d.shape),d[-1,:,:].tolist())
print('e='+ str(e.shape),e[-1,:,:].tolist())
print('f='+ str(f.shape),f[-1,:,:].tolist())
b=(3, 3, 5) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
c=(3, 3, 5) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
d=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
e=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
f=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
Good luck
I have an array like this and need to replace every 1 with 2, every 3 with 4, every 4 with 1. Is there a way to do this just with np and not loops?
import numpy as np
np.random.seed(2)
arr=np.random.randint(1,5,(3,3),int)
arr
array([[1, 4, 2],
[1, 3, 4],
[3, 4, 1]])
If I use array mask sequentially, it doesn't give the expected outcome:
array([[2, 1, 2],
[2, 4, 1],
[4, 1, 2]])
It is based on a conditional logic and not maths formula
If the array values don't necessarely range between 1 and 4 you can use np.select:
import numpy as np
a = np.random.randint(1,5, (3,3))
condlist = [np.logical_or(a==1, a==2), a==3, a==4]
choicelist= [2, 4, 1]
b = np.select(condlist, choicelist)
which does not care about the order of the conditions
Here's one with np.searchsorted for performance efficiency -
def map_values(arr, old_val, new_val):
sidx = old_val.argsort()
idx = np.searchsorted(old_val,arr,sorter=sidx)
return np.where(old_val[idx]==arr, new_val[sidx[idx]], arr)
Sample run -
In [40]: arr
Out[40]:
array([[1, 4, 2],
[1, 3, 4],
[3, 4, 1]])
In [41]: old_val = np.array([1,3,4])
...: new_val = np.array([2,4,1])
In [42]: map_values(arr, old_val, new_val)
Out[42]:
array([[2, 1, 2],
[2, 4, 1],
[4, 1, 2]])
Could do this with a lambda function and np.vectorize():
import numpy as np
np.random.seed(2)
arr=np.random.randint(1,5,(3,3),int)
f = lambda x: x%4 + 1 if x in [1,3,4] else x
vfunc = np.vectorize(f)
Usage:
>>> vfunc(arr)
array([[2, 1, 2],
[2, 4, 1],
[4, 1, 2]])
You have to be careful about the order of assignments. For example, if you do
arr[arr == 4] = 1
arr[arr == 1] = 2
Now all of the elements that were originally 4 will be 2, not 1 as you intend.
One solution is to carefully craft the order of assignments:
arr[arr == 1] = 2
arr[arr == 4] = 1
However, this is very brittle and will fall apart as you introduce more of them. It would be better to create the masks up front from the original array:
ones = arr == 1
fours = arr == 4
arr[ones] = 2
arr[fours] = 1
Now the order of the assignments won't matter because the masks are determined before modifying the array.
You want arr % 4 + 1, except in the case of 2, which stays the same. So use np.where to find all the 2s. Then do arr % 4 + 1, then reset all the 2s.
import numpy as np
np.random.seed(2)
arr=np.random.randint(1,5,(3,3),int)
twos = np.where(arr == 2)
arr = arr % 4 + 1
arr[twos] = 2
print(arr)
I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[9,3,1]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
def foo(a):
i = np.argsort(a, axis=1)
return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1 2 3]
[2 8 9]])
I found the answer here, with someone having the same problem. They key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Rumtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop
I tried to use numpy.apply_along_axis, but this seems to work only when the applied function collapses the dimension and not when it expands it.
Example:
def dup(x):
return np.array([x, x])
a = np.array([1,2,3])
np.apply_along_axis(dup, axis=0, arr=a) # This doesn't work
I was expecting the matrix below (notice how its dimension has expanded from the input matrix a):
np.array([[1, 1], [2, 2], [3, 3]])
In R, this would be accomplished by the **ply set of functions from the plyr package. How to do it with numpy?
If you just want to repeat the elements you can use np.repeat :
>>> np.repeat(a,2).reshape(3,2)
array([[1, 1],
[2, 2],
[3, 3]])
And for apply a function use np.frompyfunc and for convert to an integrate array use np.vstack:
>>> def dup(x):
... return np.array([x, x])
>>> oct_array = np.frompyfunc(dup, 1, 1)
>>> oct_array(a)
array([array([1, 1]), array([2, 2]), array([3, 3])], dtype=object)
>>> np.vstack(oct_array(a))
array([[1, 1],
[2, 2],
[3, 3]])
For someone used to general Python code, a list comprehension may be the simplest approach:
In [20]: np.array([dup(x) for x in a])
Out[20]:
array([[1, 1],
[2, 2],
[3, 3]])
The comprehension (a loop or mapping that applies dup to each element of a) returns [array([1, 1]), array([2, 2]), array([3, 3])], which is easily turned into a 2d array with np.array().
At least for this small a, it is also faster than the np.frompyfunc approach. The np.frompyfunc function will give full access to broadcasting, but evidently it doesn't apply any fast iteration tricks.
apply_along_axis can help keep indices straight when dealing with many dimensions, but it still is just an iteration method. It's written Python so you can study its code yourself. It is much more complicated than needed for this simple case.
In order for your example to work as expected, a should be 2-dimensional:
def dup(x):
# x is now an array of size 1
return np.array([ x[0], x[0] ])
a = np.array([[1,2,3]]) # 2dim
np.apply_along_axis(dup, axis=0, arr=a)
=>
array([[1, 2, 3],
[1, 2, 3]])
Of course, you probably want to transpose the result.