I have a NumPy array of shape (5, x, y).
I want to modify every element in the first three channels only, applying the equation
element = (element - a) / b.
I want the other two channels to remain unchanged. How would you index the array to achieve this?
Since the shape is (channels, x, y), you can use:
x = np.random.rand(5,300,400)
a,b = 10,15
x[0:3] = (x[0:3] - a)/b
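A quick sanity check, not part of the original answer, that this touches only the first three channels:
import numpy as np

x = np.random.rand(5, 300, 400)
a, b = 10, 15

original = x.copy()
x[0:3] = (x[0:3] - a) / b

assert np.allclose(x[0:3], (original[0:3] - a) / b)  # first three channels transformed
assert np.array_equal(x[3:], original[3:])           # last two channels untouched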
Generally, one would use indexing to get a slice of the array containing only the relevant values.
>>> desired_shape = (5, 2, 2) # xy = (2,2) in this example
>>> ary = np.array(range(5 * 2 * 2))
>>> ary.shape = desired_shape
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> channels_view = ary[:3, ...] # up to 3 in 1st axis, preserve the others
>>> channels_view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
>>> a, b = 10, 15
>>> ary[:3, ...] = (ary[:3, ...] - a) / b
It is also possible to use ary.view(), so we can chain more operations without having to slice the array every time.
>>> view = ary.view()
>>> view = view[:3, ...]
>>> view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
For this example, let's suppose we want to halve all the values in the first three channels for now:
>>> view //= 2 # use //= rather than /=: we didn't specify a dtype, so NumPy inferred an integer array, and in-place true division can't store the float results back into it
>>> view
array([[[0, 0],
[1, 1]],
[[2, 2],
[3, 3]],
[[4, 4],
[5, 5]]])
>>> ary
array([[[ 0, 0],
[ 1, 1]],
[[ 2, 2],
[ 3, 3]],
[[ 4, 4],
[ 5, 5]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
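As a quick aside on the //= comment above (a standalone check; the exact exception text varies between NumPy versions):
>>> ints = np.array([5])
>>> # ints /= 2 would raise a TypeError here: the float result of true division
>>> # cannot be written back into the integer array in place
>>> ints //= 2  # floor division keeps the integer dtype, so the in-place write is fine
>>> ints
array([2])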
But uh oh! Turns out we actually had to multiply this by several factors of a number!
>>> factors_of_420
[2, 2, 3, 5, 7]
Kind of a dumb example, I know, but just assume we can't know what the number is ahead of time. Like, just pretend we're getting the factors from a TCP server or something.
We could write it like this (starting over with a fresh ary):
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> for fac in factors_of_420:
...     ary[:3, ...] = ary[:3, ...] * fac
...
>>> ary
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]],
[[ 12, 13],
[ 14, 15]],
[[ 16, 17],
[ 18, 19]]])
But that's a bit ugly, isn't it? Also, I bet running the slicing operation twice (once for setting, and once for getting) for every factor in the list can be a bit of a performance hit.
This is where the view shines: we can just make one view, operate on that, and NumPy applies the operations to the underlying array for us.
We don't need to sacrifice anything; we get nicer and faster code at the same time!
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> view = ary.view()[:3, ...] # make our pre-sliced view, yum!
>>> view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
>>> for fac in factors_of_420:
...     view *= fac # use the *= (in-place) operator, because 'view = view * fac' would rebind the name to a new array and no longer touch ary
...
>>> view
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]]])
>>> ary
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]],
[[ 12, 13],
[ 14, 15]],
[[ 16, 17],
[ 18, 19]]])
Let's see what timing tells us.
>>> class WeirdSliceMultiplier:
...     def __init__(self):
...         self.factors = [2, 2, 3, 5, 7]
...     def setup(self):
...         self.ary = np.reshape(range(5 * 2 * 2), (5, 2, 2))
...     def setup_with_view(self):
...         self.setup()
...         self.view = self.ary.view()[:3, ...]
...     def multiply_every_slice(self):
...         for fac in self.factors:
...             self.ary[:3, ...] = self.ary[:3, ...] * fac
...     def multiply_view(self):
...         for fac in self.factors:
...             self.view *= fac
...
>>> import timeit
>>> multiplier = WeirdSliceMultiplier()
>>> timeit.timeit(multiplier.multiply_every_slice, multiplier.setup, number=50000) # 'slice for every factor' version
0.9404756519943476
>>> timeit.timeit(multiplier.multiply_view, multiplier.setup_with_view, number=50000) # 'slice view ahead of time' version
0.8748960520024411
Note that in the second timeit call, view is set in the setup (precisely, in setup_with_view) rather than in the function being timed. Creating the view therefore doesn't count towards the final time: it is supposed to happen ahead of time, and we only want to measure the multiplication itself, not any other operations on view that may be chained before or after.
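If you want to convince yourself that timeit really excludes the setup callable, here is a tiny standalone check (not part of the original benchmark):
>>> import time, timeit
>>> t = timeit.timeit(lambda: None, setup=lambda: time.sleep(0.5), number=1000)
>>> t < 0.5  # the half-second sleep in setup is not counted, only the 1000 no-op calls
True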
Edit: Also, as @MadPhysicist pointed out in @mujiga's answer, we may actually prefer the in-place operators. In fact, we already use them in the multiply_view function, so using in-place operators for both is a fairer comparison:
>>> class WeirdSliceMultiplier:
...     def __init__(self):
...         self.factors = [2, 2, 3, 5, 7]
...     def setup(self):
...         self.ary = np.reshape(range(5 * 2 * 2), (5, 2, 2))
...     def setup_with_view(self):
...         self.setup()
...         self.view = self.ary.view()[:3, ...]
...     def multiply_every_slice_inplace(self):
...         for fac in self.factors:
...             self.ary[:3, ...] *= fac
...     def multiply_view(self):
...         for fac in self.factors:
...             self.view *= fac
...
>>> multiplier = WeirdSliceMultiplier()
>>> timeit.timeit(multiplier.multiply_every_slice_inplace, multiplier.setup, number=50000) # 'slice for every factor' version, but with inplace operators
1.0672136489883997
>>> timeit.timeit(multiplier.multiply_view, multiplier.setup_with_view, number=50000) # 'slice view ahead of time' version again for comparison
0.9300520950055216
The odd shift in the multiply_view timing between the two runs (possibly down to CPU load average or similar) can be compensated for with a normalizing factor:
>>> old_viewslice_time = 0.8748960520024411
>>> new_viewslice_time = 0.9300520950055216
>>> norm_fac = old_viewslice_time / new_viewslice_time
>>> norm_fac
0.9406957488733435
>>> new_viewslice_time * norm_fac # recovers old_viewslice_time exactly, by construction
0.8748960520024411
>>> new_everyslice_inplace_time = 1.0672136489883997
>>> new_everyslice_inplace_time * norm_fac
1.003923342742996
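Bringing this back to the original question: basic slicing already returns a view, so the same in-place pattern applies directly (a short sketch using the question's shapes):
>>> x = np.random.rand(5, 300, 400)
>>> a, b = 10, 15
>>> first_three = x[:3]  # basic slicing returns a view into x
>>> first_three -= a     # in-place, so the first three channels of x are modified
>>> first_three /= b     # x holds floats, so in-place true division is fine here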
How can we get a new matrix C where each row i contains, column by column, the average over the rows j of A for which B[i, j] == 1?
Suppose we have a matrix A of shape (3, 4) and a matrix B of shape (3, 3):
A = [1 2 3 4
15 20 7 10
0 5 18 12]
And an adjacency matrix
B = [1 0 1
0 0 1
1 1 1 ]
The expected output matrix C takes the average of the rows of A that are connected according to B. For example, the first row is [(1+0)/2 (2+5)/2 (3+18)/2 (4+12)/2], i.e. [0.5 3.5 10.5 8]:
C =[0.5 3.5 10.5 8
0 5 18 12
5.33 9 9.33 8.66]
To find the neighborhood of each i, I implemented the following code:
for i in range(A.shape[0]):
    for j in range(A.shape[0]):
        if B[i, j] == 1:
            print(j)
You can form the sums you need by matrix multiplying:
>>> A = np.array([[1, 2, 3, 4], [15, 20, 7, 10], [0, 5, 18, 12]])
>>> B = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
>>> summed_groups = B @ A
>>> summed_groups
array([[ 1, 7, 21, 16],
[ 0, 5, 18, 12],
[16, 27, 28, 26]])
To get the means, normalize by the number of terms per group:
>>> group_sizes = B.sum(axis=1,keepdims=True)
>>> group_sizes
array([[2],
[1],
[3]])
>>> summed_groups / group_sizes
array([[ 0.5 , 3.5 , 10.5 , 8. ],
[ 0. , 5. , 18. , 12. ],
[ 5.33333333, 9. , 9.33333333, 8.66666667]])
Side note: you could also get the group sizes by matrix multiplication:
>>> group_sizes_alt = B @ np.ones((len(A), 1))
>>> group_sizes_alt
array([[2.],
[1.],
[3.]])
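One caveat not covered in the original answer: if some row of B contained no 1s at all, its group size would be 0 and the division would produce NaNs (with a runtime warning). A simple guard, assuming empty groups should just yield zeros:
>>> safe_sizes = np.maximum(group_sizes, 1)  # treat empty groups as size 1, so 0/0 never occurs
>>> C = (B @ A) / safe_sizes                 # identical to the result above, since no group is empty here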
It is convenient to use boolean indexing. For example,
>>> A[[True, False, True], :]
array([[ 1, 2, 3, 4],
[ 0, 5, 18, 12]])
This selects rows 0 and 2 of the matrix A. You can loop over the columns of B and construct the C matrix:
A = np.array([[1, 2, 3, 4], [15, 20, 7, 10], [0, 5, 18, 12]])
B = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]]).astype(bool)
C = np.array([A[B[:, i], :].mean(axis=0) for i in range(A.shape[0])])
print(np.around(C, 2))
Result:
[[ 0.5 3.5 10.5 8. ]
[ 0. 5. 18. 12. ]
[ 5.33 9. 9.33 8.67]]
a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
...
c = np.dot(a,b)
I want to transpose b so I can calculate the dot product of a and b.
You can use numpy's broadcasting for this:
import numpy as np
a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
In [3]: a[:,None]*b
Out[3]:
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
This has nothing to do with a dot product, though. But in the comments you said that this is the result you want.
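For reference, a true dot product of these two 1-D arrays is not even defined, since their lengths differ; np.dot just raises an error (the exact message may vary between NumPy versions):
>>> np.dot(a, b)
Traceback (most recent call last):
  ...
ValueError: shapes (3,) and (5,) not aligned: 3 (dim 0) != 5 (dim 0)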
You could also use the numpy function outer:
In [4]: np.outer(a, b)
Out[4]:
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
Well, for this what you want is the outer product of the two arrays, and the function to use for it is np.outer:
a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
np.outer(a,b)
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
With NumPy you could also reshape by swapping axes:
a = np.swapaxes([a], 1, 0)
# [[0]
# [1]
# [2]]
Then
print(a * b)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
Swapping the axes of b instead requires transposing the product, see below.
Or use the usual NumPy reshape:
a = np.array([0,1,2])
b = np.array([3,4,5,6,7]).reshape(5,1)
print((a * b).T)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
The reshape is equivalent to b = np.array([[bb] for bb in [3, 4, 5, 6, 7]]), so b becomes:
# [[3]
# [4]
# [5]
# [6]
# [7]]
When reshaping a instead, there is no need to transpose:
a = np.array([0,1,2]).reshape(3,1)
b = np.array([3,4,5,6,7])
print(a * b)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
Just out of curiosity, good old list comprehension:
a = [0,1,2]
b = [3,4,5,6,7]
print( [ [aa * bb for bb in b] for aa in a ] )
#=> [[0, 0, 0, 0, 0], [3, 4, 5, 6, 7], [6, 8, 10, 12, 14]]
Others have provided the outer and broadcasted solutions. Here's the dot one(s):
np.dot(a.reshape(3,1), b.reshape(1,5))
a[:,None].dot(b[None,:])
a[None].T.dot( b[None])
Conceptually I think it's a bit of overkill, but due to implementation details it actually is the fastest.
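If you want to check that claim on your own machine, here is a rough timing sketch (not from the original answer; the numbers depend heavily on array size, NumPy build and hardware):
import numpy as np
import timeit

a = np.array([0, 1, 2])
b = np.array([3, 4, 5, 6, 7])

# compare the outer, broadcasting and dot-based approaches on these tiny arrays
print(timeit.timeit(lambda: np.outer(a, b), number=100000))
print(timeit.timeit(lambda: a[:, None] * b, number=100000))
print(timeit.timeit(lambda: np.dot(a.reshape(3, 1), b.reshape(1, 5)), number=100000))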
I have an array of values that I want to replace with values from an array of choices, based on which choice is closest.
The catch is the size of the choices is defined at runtime.
import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
If choices were static in size, I would simply use np.where:
d = np.where(np.abs(a - choices[0]) > np.abs(a - choices[1]),
np.where(np.abs(a - choices[0]) > np.abs(a - choices[2]), choices[0], choices[2]),
np.where(np.abs(a - choices[1]) > np.abs(a - choices[2]), choices[1], choices[2]))
To get the output:
>>> d
[[1, 1, 1], [5, 5, 5], [10, 10, 10]]
Is there a way to do this more dynamically while still preserving the vectorization?
Subtract choices from a, find the index of the minimum of the result, substitute.
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:, :, None] - choices   # add a trailing axis so each choice is subtracted from every element
np.absolute(b, b)             # in-place absolute value (the second argument is the output array)
i = np.argmin(b, axis=-1)     # index of the closest choice for every element
a = choices[i]
print(a)
>>>
[[ 1 1 1]
[ 5 5 5]
[10 10 10]]
a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print(a)
>>>
[[ 1 1 1]
[ 5 10 5]
[10 1 10]]
>>>
The extra dimension was added to a so that each element of choices would be subtracted from each element of a: choices was broadcast against a in the third dimension, giving b.shape == (3, 3, 3). EricsBroadcastingDoc (the classic SciPy broadcasting write-up) is a pretty good explanation and has a graphic 3-D example at the end.
For the second example:
>>> print b
[[[ 1 5 10]
[ 2 2 7]
[ 1 5 10]]
[[ 3 1 6]
[ 7 3 2]
[ 3 1 6]]
[[ 8 4 1]
[ 0 4 9]
[ 8 4 1]]]
>>> print i
[[0 0 0]
[1 2 1]
[2 0 2]]
>>>
The final assignment uses an index array, i.e. integer array indexing.
In the second example, notice that there was a tie for element a[0, 1]: either 1 or 5 could have been substituted.
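For reference, the integer array indexing used in that final assignment works like this on its own (a tiny standalone example built from the second case above):
>>> choices = np.array([1, 5, 10])
>>> i = np.array([[0, 0, 0],
...               [1, 2, 1],
...               [2, 0, 2]])
>>> choices[i]  # each entry of i picks out an element of choices
array([[ 1,  1,  1],
       [ 5, 10,  5],
       [10,  1, 10]])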
To explain wwii's excellent answer in a little more detail:
The idea is to create a new dimension which does the job of comparing each element of a to each element in choices using numpy broadcasting. This is easily done for an arbitrary number of dimensions in a using the ellipsis syntax:
>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1, 5, 10],
[ 1, 5, 10],
[ 1, 5, 10]],
[[ 3, 1, 6],
[ 3, 1, 6],
[ 3, 1, 6]],
[[ 8, 4, 1],
[ 8, 4, 1],
[ 8, 4, 1]]])
Taking argmin along the axis you just created (the last axis, with label -1) gives you the desired index in choices that you want to substitute:
>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
Which finally allows you to choose those elements from choices:
>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1, 1, 1],
[ 5, 5, 5],
[10, 10, 10]])
For a non-square shape, let's say a has shape (2, 5):
>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
Then you'd get:
>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1, 5, 10],
[ 0, 4, 9],
[ 1, 3, 8],
[ 2, 2, 7],
[ 3, 1, 6]],
[[ 4, 0, 5],
[ 5, 1, 4],
[ 6, 2, 3],
[ 7, 3, 2],
[ 8, 4, 1]]])
This is hard to read, but what it's saying is that b has shape:
>>> b.shape
(2, 5, 3)
The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea:
>>> b[:, :, 0] # = abs(a - 1)
array([[1, 0, 1, 2, 3],
[4, 5, 6, 7, 8]])
>>> b[:, :, 1] # = abs(a - 5)
array([[5, 4, 3, 2, 1],
[0, 1, 2, 3, 4]])
>>> b[:, :, 2] # = abs(a - 10)
array([[10, 9, 8, 7, 6],
[ 5, 4, 3, 2, 1]])
Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 0, 1, 2.
Hope that helps explain this a little more clearly.
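If you want to verify that correspondence programmatically, here is a one-line check using the a, b and choices from just above:
>>> all(np.array_equal(b[:, :, i], np.abs(a - choices[i])) for i in range(len(choices)))
True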
I love broadcasting and would have gone that way myself too. But with large arrays, I would like to suggest another approach with np.searchsorted that keeps things memory-efficient and thus achieves performance benefits, like so -
def searchsorted_app(a, choices):
lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
cl = np.take(choices,lidx) # Or choices[lidx]
cr = np.take(choices,ridx) # Or choices[ridx]
mask = np.abs(a - cl) > np.abs(a - cr)
cl[mask] = cr[mask]
return cl
Please note that if the elements in choices are not sorted, we need to pass the additional sorter argument to np.searchsorted.
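Alternatively, a small sketch (not from the original answer): since searchsorted_app only ever returns values taken from choices, and the nearest value does not depend on the original ordering, you can simply sort a copy up front instead of threading sorter through both searchsorted calls:
def searchsorted_app_unsorted(a, choices):
    # sorting once up front is equivalent to passing sorter=np.argsort(choices) to both calls
    return searchsorted_app(a, np.sort(choices))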
Runtime test -
In [160]: # Setup inputs
...: a = np.random.rand(100,100)
...: choices = np.sort(np.random.rand(100))
...:
In [161]: def broadcasting_app(a, choices): # @wwii's solution
...: return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
...:
In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True
In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop
In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop
Related post: Find elements of array one nearest to elements of array two
for i in range(limit_1):
    for j in range(limit_2):
        a[i][j] = np.sqrt(np.absolute(b[i])**2 + np.absolute(c[j])**2)
Is there an alternative way to perform this task, using perhaps a numpy function?
Your original code:
limit_1 = 4
limit_2 = 3
import numpy as np
a = np.zeros([limit_1, limit_2])
b = np.array([1, -6, 7, 3])
c = np.array([3, 2, -1])
print("Original:")
for i in range(limit_1):
    for j in range(limit_2):
        a[i][j] = np.sqrt(np.absolute(b[i])**2 + np.absolute(c[j])**2)
print(a)
Outputs:
Original:
[[ 3.16227766 2.23606798 1.41421356]
[ 6.70820393 6.32455532 6.08276253]
[ 7.61577311 7.28010989 7.07106781]
[ 4.24264069 3.60555128 3.16227766]]
And the shortened version:
print("Improved:")
a = np.sqrt(
    np.tile(np.array([b]).transpose(), (1, limit_2)) ** 2 +
    np.tile(np.array(c).transpose(), (limit_1, 1)) ** 2)
print(a)
Outputs:
Improved:
[[ 3.16227766 2.23606798 1.41421356]
[ 6.70820393 6.32455532 6.08276253]
[ 7.61577311 7.28010989 7.07106781]
[ 4.24264069 3.60555128 3.16227766]]
Explanation
First we stretch the column vector b into a matrix (and then take its 2nd power):
>>> np.tile(np.array([b]).transpose(), (1, limit_2))
array([[ 1, 1, 1],
[-6, -6, -6],
[ 7, 7, 7],
[ 3, 3, 3]])
>>> np.tile(np.array([b]).transpose(), (1, limit_2)) ** 2
array([[ 1, 1, 1],
[36, 36, 36],
[49, 49, 49],
[ 9, 9, 9]])
Then we do the same for the row vector c:
>>> np.tile(np.array(c).transpose(), (limit_1, 1))
array([[ 3, 2, -1],
[ 3, 2, -1],
[ 3, 2, -1],
[ 3, 2, -1]])
>>> np.tile(np.array(c).transpose(), (limit_1, 1)) ** 2
array([[9, 4, 1],
[9, 4, 1],
[9, 4, 1],
[9, 4, 1]])
We then sum them together and calculate the root.
P.S. 1 - I squared the values directly instead of taking the absolute value first, but if you still need the absolute value you can apply it the same way.
P.S. 2 - Notice that the calculation could be done more efficiently, i.e. by squaring before tiling the arrays, but this way is clearer for this post.
Note that there is no point in squaring the absolute value, as n**2 and abs(n)**2 are exactly the same.
Either way, using list comprehension:
import math
import numpy

temp = [math.sqrt(numpy.absolute(x)**2 + numpy.absolute(y)**2) for x in b for y in c]
a = [temp[x:x+limit_2] for x in range(0, len(temp), limit_2)]
You could use broadcasting by extending b from 1D to 2D, introducing a new singleton axis as the second axis with np.newaxis/None, and then performing operations against c. This simplifies things and also gives a vectorized method, like so -
np.sqrt(np.abs(b[:,None])**2 + np.abs(c)**2)
As also discussed in the other answers, since squaring inherently produces non-negative numbers, we can skip the absolute operation, giving us -
np.sqrt(b[:,None]**2 + c**2)
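As a quick self-contained check that this broadcast one-liner matches the original double loop (reusing the example values from the earlier answer):
import numpy as np

b = np.array([1, -6, 7, 3])
c = np.array([3, 2, -1])

a_loop = np.zeros((len(b), len(c)))
for i in range(len(b)):
    for j in range(len(c)):
        a_loop[i][j] = np.sqrt(np.abs(b[i])**2 + np.abs(c[j])**2)

a_vec = np.sqrt(b[:, None]**2 + c**2)
print(np.allclose(a_loop, a_vec))  # True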