I have a NumPy array of shape (5, x, y).
I want to modify every element in the first three channels only, applying the equation
element = (element - a) / b.
I want the other two channels to remain the same. How would you index the array to achieve this?
Since the shape is (channels, x, y), you can use:
import numpy as np

x = np.random.rand(5, 300, 400)
a, b = 10, 15
x[0:3] = (x[0:3] - a) / b
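A quick sanity check (just a sketch) that the last two channels really are untouched:

last_two = x[3:].copy()
x[0:3] = (x[0:3] - a) / b   # apply the transform again; channels 3 and 4 are unaffected
assert np.array_equal(x[3:], last_two)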
Generally, one would use indices to get a slice of the array containing only the relevant values.
>>> import numpy as np
>>> desired_shape = (5, 2, 2) # xy = (2,2) in this example
>>> ary = np.array(range(5 * 2 * 2))
>>> ary.shape = desired_shape
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> channels_view = ary[:3, ...] # up to 3 in 1st axis, preserve the others
>>> channels_view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
>>> a, b = 10, 15
>>> ary[:3, ...] = (ary[:3, ...] - a) / b
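One caveat worth flagging (an aside, not part of the original recipe): ary holds integers here, so the float result of (ary[:3, ...] - a) / b is silently truncated when assigned back into the integer array. If you need float results, convert first:

>>> fary = ary.astype(float)   # float copy; leaves ary intact
>>> fary[:3, ...] = (fary[:3, ...] - a) / b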
It is also possible to use ndarray.view(), so we can chain more operations without having to slice the array every time.
>>> view = ary.view()
>>> view = view[:3, ...]
>>> view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
For this example, let's suppose we want to halve all the values in the first three channels for now:
>>> view //= 2 # use //= rather than /=: this is an integer array (no dtype was specified, so NumPy inferred a fixed-width integer type such as int64, not float), and true division can't be done in place on it
>>> view
array([[[0, 0],
[1, 1]],
[[2, 2],
[3, 3]],
[[4, 4],
[5, 5]]])
>>> ary
array([[[ 0, 0],
[ 1, 1]],
[[ 2, 2],
[ 3, 3]],
[[ 4, 4],
[ 5, 5]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
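As an aside, if you do want true division here, create the array with a float dtype up front (a small sketch, separate from the running example):

>>> fary = np.arange(5 * 2 * 2, dtype=float).reshape(5, 2, 2)
>>> fview = fary.view()[:3, ...]
>>> fview /= 2   # fine now: the array is float64, so /= keeps the dtype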
But uh oh! Turns out we actually had to multiply this by several factors of a number!
>>> factors_of_420
[2, 2, 3, 5, 7]
Kind of a dumb example, I know, but just assume we can't know what the number is ahead of time. Like, just pretend we're getting the factors from a TCP server or something.
We could write it like this (starting again from a fresh ary):
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> for fac in factors_of_420:
...     ary[:3, ...] = ary[:3, ...] * fac
...
>>> ary
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]],
[[ 12, 13],
[ 14, 15]],
[[ 16, 17],
[ 18, 19]]])
But that's a bit ugly, isn't it? Also, running the slicing operation twice for every factor in the list (once for getting, once for setting) is redundant work that can cost performance.
This is where the view shines. We can make one view, operate on that, and NumPy applies the operations to the underlying array for us. We don't need to sacrifice anything: we get nicer and faster code at the same time! (Again starting from a fresh ary.)
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> view = ary.view()[:3, ...] # make our pre-sliced view, yum!
>>> view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
>>> for fac in factors_of_420:
...     view *= fac # use the in-place *= operator: plain 'view = view * fac' would rebind the name view to a new array and leave ary untouched
...
>>> view
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]]])
>>> ary
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]],
[[ 12, 13],
[ 14, 15]],
[[ 16, 17],
[ 18, 19]]])
Let's see what timing tells us.
>>> class WeirdSliceMultiplier:
...     def __init__(self):
...         self.factors = [2, 2, 3, 5, 7]
...     def setup(self):
...         self.ary = np.reshape(range(5 * 2 * 2), (5, 2, 2))
...     def setup_with_view(self):
...         self.setup()
...         self.view = self.ary.view()[:3, ...]
...     def multiply_every_slice(self):
...         for fac in self.factors:
...             self.ary[:3, ...] = self.ary[:3, ...] * fac
...     def multiply_view(self):
...         for fac in self.factors:
...             self.view *= fac
...
>>> import timeit
>>> multiplier = WeirdSliceMultiplier()
>>> timeit.timeit(multiplier.multiply_every_slice, multiplier.setup, number=50000) # 'slice for every factor' version
0.9404756519943476
>>> timeit.timeit(multiplier.multiply_view, multiplier.setup_with_view, number=50000) # 'slice view ahead of time' version
0.8748960520024411
Note that in the second timeit call, view is set in the setup function (precisely, in setup_with_view) rather than in the function being timed. Since setup runs outside the timed region, the slicing happens ahead of time and doesn't count toward the final figure: we measure only the multiplication itself, not any other operations that might be chained before or after on the view.
Edit: Also, as @MadPhysicist pointed out in @mujiga's answer, we may actually prefer the in-place operators. In fact, we already use them in the multiply_view function, so using in-place operators in both versions makes for a fairer comparison:
>>> class WeirdSliceMultiplier:
...     def __init__(self):
...         self.factors = [2, 2, 3, 5, 7]
...     def setup(self):
...         self.ary = np.reshape(range(5 * 2 * 2), (5, 2, 2))
...     def setup_with_view(self):
...         self.setup()
...         self.view = self.ary.view()[:3, ...]
...     def multiply_every_slice_inplace(self):
...         for fac in self.factors:
...             self.ary[:3, ...] *= fac
...     def multiply_view(self):
...         for fac in self.factors:
...             self.view *= fac
...
>>> multiplier = WeirdSliceMultiplier()
>>> timeit.timeit(multiplier.multiply_every_slice_inplace, multiplier.setup, number=50000) # 'slice for every factor' version, but with inplace operators
1.0672136489883997
>>> timeit.timeit(multiplier.multiply_view, multiplier.setup_with_view, number=50000) # 'slice view ahead of time' version again for comparison
0.9300520950055216
The time for the view-slice version drifted between the two runs (0.8749 s vs. 0.9301 s), possibly due to CPU load; we can compensate with a normalizing factor:
>>> old_viewslice_time = 0.8748960520024411
>>> new_viewslice_time = 0.9300520950055216
>>> norm_fac = old_viewslice_time / new_viewslice_time
>>> norm_fac
0.9406957488733435
>>> new_viewslice_time * norm_fac # equals old_viewslice_time exactly, by construction of norm_fac
0.8748960520024411
>>> new_everyslice_inplace_time = 1.0672136489883997
>>> new_everyslice_inplace_time * norm_fac
1.003923342742996
I am new to numpy and python, and I am trying to understand the usage of numpy's transpose function. The code below works fine, but I am still not able to understand the effect of the transpose function, or the use of the arguments inside it. It would be a great help if someone could explain the usage and effect of the transpose function in the code below.
import numpy as np
my_list = [[[[[[1,2],[3,4]],[[1,2],[3,4]]], [[[1,2],[3,4]],[[1,2],[3,4]]]],[[[[1,2],[3,4]],[[1,2],[3,4]]], [[[1,2],[3,4]],[[1,2],[3,4]]]]], [[[[[1,2],[3,4]],[[1,2],[3,4]]], [[[1,0],[1,1]],[[1,0],[1,1]]]],[[[[1,0],[1,1]],[[1,0],[1,1]]], [[[1,0],[1,1]],[[1,0],[1,1]]]]]]
arr = np.array(my_list)
perm_testing = [0,1,2,3,4,5]
testing = arr.transpose(perm_testing)
print(testing)
Edit
import numpy as np
my_list = [[1,2],[3,4]]
arr = np.array(my_list)
perm_testing = [1,0]
testing = arr.transpose(perm_testing)
print(testing)
[[1 3]
[2 4]]
Here's an attempt to explain visually for a 3d array. I hope it'll help you better understand what's happening:
a=np.arange(24).reshape(2,4,3)
# array([[[ 0, 1, 2],
# [ 3, 4, 5],
# [ 6, 7, 8],
# [ 9, 10, 11]],
#
# [[12, 13, 14],
# [15, 16, 17],
# [18, 19, 20],
# [21, 22, 23]]])
Axis 0 corresponds to the first (outermost) bracket level and to the first entry of the shape, and likewise for axes 1 and 2. Now let's swap axes 0 and 1:
a.transpose(1,0,2) # swapping axis 0 and 1
# array([[[ 0, 1, 2],
# [12, 13, 14]],
#
# [[ 3, 4, 5],
# [15, 16, 17]],
#
# [[ 6, 7, 8],
# [18, 19, 20]],
#
# [[ 9, 10, 11],
# [21, 22, 23]]])
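To make the axis permutation concrete, here is a small check (a sketch using the same a as above) that element (i, j, k) of the transposed array is element (j, i, k) of the original:

t = a.transpose(1,0,2)
print(t.shape)   # (4, 2, 3): the sizes follow the permuted axes
assert all(t[i,j,k] == a[j,i,k]
           for i in range(4) for j in range(2) for k in range(3))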
I have a list like the following:
[[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]]
And I'm trying to build a new list with the same number of sub-lists inside, where each element is replaced by the integer average of that element, the element before it, and the element after it.
What do I mean by that?
Let's say I have the sub-list sub = [7, 7] at index 1. I want this list to become [9, 9], because sub[0] plus the corresponding values of its neighbours gives 7 + 1 + 20 = 28, and 28 // 3 = 9 (I want integer division).
The ideal output would be:
[[4, 4], [9, 9], [12, 12], [5, 5], [-1, -1]]
I have currently this code:
import copy

copy_l = copy.deepcopy(audio_data)
sub_list = []
for i in range(0, len(audio_data)-1):
    sub_data = []
    for j in range(2):
        if i == 0:
            audio_data[i][j] += int(audio_data[i+1][j] / 2)
            sub_data.append(audio_data[i][j])
        elif audio_data[i+1] == audio_data[-1]:
            audio_data[i+1][j] = int((audio_data[i+1][j] + audio_data[i][j]) / 2)
            sub_data.append(audio_data[i+1][j])
        else:
            audio_data = copy_l
            audio_data[i][j] = int((audio_data[i-1][j] + audio_data[i][j] + audio_data[i+1][j]) / 3)
            sub_data.append(audio_data[i][j])
    sub_list.append(sub_data)
print(sub_list)
where audio_data is the list [[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]] that I passed in.
(I have separated the average calculation into three cases:
- First element of the list: [1, 1], so the average is just int((1 + 7) / 2) (no element before [1, 1])
- Last element of the list: [-12, -12], so the average is just int((-12 + 9) / 2) (no element after [-12, -12])
- All the elements in between
)
Problem is, my output (sub_list) is:
[[4, 4], [9, 9], [12, 12], [-1, -1]]
And it seems that [9,9] never turns into [5,5]
Does someone have any idea how to achieve what I want, or even an idea to make it simpler? I hope I was clear enough; if not, feel free to ask me for more details. Thank you!
EDIT:
I'm seeking a solution without numpy, list comprehension or zip.
Here is a way to do it:
data = [[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]]

out = []
for i in range(len(data)):
    first = max(i - 1, 0)         # don't let the start of the slice go below 0
    last = min(i + 2, len(data))  # nor beyond the end of the list
    mean = [sum(col) // (last - first) for col in zip(*data[first:last])]
    out.append(mean)

print(out)
# [[4, 4], [9, 9], [12, 12], [5, 5], [-2, -2]]
We take slices of data around the current item, then zip the sub-lists in the slice and compute the result over the first (resp. second) values.
Also, note that with Python's floor division, -3 // 2 gives -2, not the -1 you got by rounding toward zero. If you really want truncation toward zero, you'll have to use a custom function for the division.
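Such a helper could look like this (a minimal sketch; the name div_toward_zero is ours):

def div_toward_zero(a, b):
    # int() truncates toward zero, whereas // floors
    return int(a / b)

print(-3 // 2)                  # -2 (floor division)
print(div_toward_zero(-3, 2))   # -1 (matches the expected output above)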
Here's a NumPy solution:
import numpy as np
def mean3(data):
    # pad each end with the mean of the two edge values, then take a
    # 3-wide moving sum and floor-divide by 3
    padded = np.r_[data[:2].mean(), data, data[-2:].mean()]
    return np.convolve(padded, np.ones(3), 'valid') // 3
>>> np.apply_along_axis(mean3, 0, audio_data)
array([[ 4., 4.],
[ 9., 9.],
[12., 12.],
[ 5., 5.],
[-2., -2.]])
Or, if you prefer the int(x/y) definition of integer division:
import numpy as np
def mean3(data):
    # same moving average, but truncating toward zero instead of flooring
    padded = np.r_[data[:2].mean(), data, data[-2:].mean()]
    return (np.convolve(padded, np.ones(3), 'valid') / 3).astype(int)
>>> np.apply_along_axis(mean3, 0, audio_data)
array([[ 4, 4],
[ 9, 9],
[12, 12],
[ 5, 5],
[-1, -1]])
I am using python 3
I would like to start from a list of nodes in 3 dimensions and build a grid.
I would like to avoid a construct like the following:
import numpy as np
l = np.zeros((len(xv), len(yv), len(zv)))  # note: needs a 3-d shape for l[i,j,k] below
for (i, x) in zip(range(len(xv)), xv):
    for (j, y) in zip(range(len(yv)), yv):
        for (k, z) in zip(range(len(zv)), zv):
            l[i, j, k] = func(x, y, z)
I am looking for a more compact version of the above lines: an iterator like zip, but one that iterates over all possible tuples in the grid.
You can use something like np.meshgrid to construct your grid. Assuming that func is properly vectorized, that should be good enough to construct l
X, Y, Z = np.meshgrid(xv, yv, zv, indexing='ij')  # indexing='ij' keeps the loop version's axis order
l = func(X, Y, Z)
If func isn't vectorized, you can construct a vectorized version using np.vectorize.
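For instance (a minimal sketch, assuming func only accepts scalars):

vfunc = np.vectorize(func)  # wraps the scalar-only function so it accepts arrays
l = vfunc(X, Y, Z)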
Also note that you might even be able to get away without using np.meshgrid through judicious use of np.newaxis:
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> y
array([0, 1, 2])
>>> z
array([0, 1])
>>> def func(x, y, z):
...     return x + y + z
...
>>> vfunc = np.vectorize(func)
>>> vfunc(x[:, np.newaxis, np.newaxis], y[np.newaxis, :, np.newaxis], z[np.newaxis, np.newaxis, :])
array([[[ 0, 1],
[ 1, 2],
[ 2, 3]],
[[ 1, 2],
[ 2, 3],
[ 3, 4]],
[[ 2, 3],
[ 3, 4],
[ 4, 5]],
[[ 3, 4],
[ 4, 5],
[ 5, 6]],
[[ 4, 5],
[ 5, 6],
[ 6, 7]],
[[ 5, 6],
[ 6, 7],
[ 7, 8]],
[[ 6, 7],
[ 7, 8],
[ 8, 9]],
[[ 7, 8],
[ 8, 9],
[ 9, 10]],
[[ 8, 9],
[ 9, 10],
[10, 11]],
[[ 9, 10],
[10, 11],
[11, 12]]])
As pointed out in the comments, np.ix_ can be used as a shortcut instead of np.newaxis:
vfunc(*np.ix_(xv, yv, zv))
Also note that with this stupid simple function, np.vectorize isn't necessary and will actually hurt our performance a lot...
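For instance (a quick sketch), this simple func broadcasts directly, with no wrapper needed:

>>> np.array_equal(func(*np.ix_(x, y, z)), vfunc(*np.ix_(x, y, z)))
True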
Say your func is something like
def func(x, y, z, indices):
    xv, yv, zv = [i[j] for i, j in zip((x, y, z), indices)]
    # do a calc with the values for the specific x, y, z points
Hook the lists you want onto it using functools.partial:
from functools import partial
f = partial(func, x=xv, y=yv, z=zv)
Now just map over all the index triples and you're set! (Note that itertools.product must run over the index ranges here, since func indexes into the lists.)

import itertools
l = list(map(lambda idx: f(indices=idx),
             itertools.product(range(len(xv)), range(len(yv)), range(len(zv)))))
With a simple function:
def foo(x, y, z):
    return x**2 + y*2 + z
and space defined by:
In [328]: xv, yv, zv = [np.arange(i) for i in [2,3,4]]
This iteration is as fast as any, even if it is a bit wordy:
In [329]: res = np.zeros((xv.shape[0], yv.shape[0], zv.shape[0]), dtype=int)
In [330]: for i,x in enumerate(xv):
...: for j,y in enumerate(yv):
...: for k,z in enumerate(zv):
...: res[i,j,k] = foo(x,y,z)
In [331]: res
Out[331]:
array([[[0, 1, 2, 3],
[2, 3, 4, 5],
[4, 5, 6, 7]],
[[1, 2, 3, 4],
[3, 4, 5, 6],
[5, 6, 7, 8]]])
As @mgilson explains, you can generate 3 arrays that define the 3d space with:
In [332]: I,J,K = np.meshgrid(xv,yv,zv,indexing='ij',sparse=True)
In [333]: I.shape
Out[333]: (2, 1, 1)
In [334]: J.shape
Out[334]: (1, 3, 1)
In [335]: I,J,K = np.ix_(xv,yv,zv) # equivalently
In [336]: I.shape
Out[336]: (2, 1, 1)
foo was written so it works with arrays just as well as with scalars, so:
In [337]: res1 = foo(I,J,K)
In [338]: res1
Out[338]:
array([[[0, 1, 2, 3],
...
[5, 6, 7, 8]]])
So if your function fits this pattern, use it. Look at those I,J,K arrays, with and without sparse.
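For instance (a quick sketch), compare the dense and sparse shapes:

In [339]: Id, Jd, Kd = np.meshgrid(xv, yv, zv, indexing='ij')               # dense
In [340]: Id.shape, Jd.shape, Kd.shape
Out[340]: ((2, 3, 4), (2, 3, 4), (2, 3, 4))
In [341]: Is, Js, Ks = np.meshgrid(xv, yv, zv, indexing='ij', sparse=True)  # sparse
In [342]: Is.shape, Js.shape, Ks.shape
Out[342]: ((2, 1, 1), (1, 3, 1), (1, 1, 4))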
There are other tools for generating the i,j,k sets. For example:
for i, j, k in np.ndindex(res.shape):
    res[i, j, k] = foo(xv[i], yv[j], zv[k])

for i, j, k in itertools.product(range(2), range(3), range(4)):
    res[i, j, k] = foo(xv[i], yv[j], zv[k])
itertools.product is fast, especially when used as list(product(...)). But the iteration mechanism isn't that important; it's the repeated call to foo that takes up most of the time.
ndindex actually uses nditer, which can also be used directly:

it = np.nditer([I, J, K, None], flags=['external_loop', 'buffered'])
for x, y, z, r in it:
    r[...] = foo(x, y, z)
it.operands[-1]
nditer is best described in https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html. It is most useful as a stepping stone toward a cython version; otherwise it doesn't have any speed advantage (though with this foo and 'external_loop' it is as fast as foo(I,J,K)). Note that it doesn't need the indices (but see the 'multi_index' flag).
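If you do want the indices, the 'multi_index' flag supplies them (a small sketch):

it = np.nditer(res, flags=['multi_index'], op_flags=['writeonly'])
for r in it:
    i, j, k = it.multi_index
    r[...] = foo(xv[i], yv[j], zv[k])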
And yes, there's vectorize. Convenient, but not a speedy solution.
vfoo = np.vectorize(foo, otypes=['int'])
vfoo(I, J, K)
Here is my code. What I want it to return is an array of matrices
[[1,1],[1,1]], [[2,4],[8,16]], [[3,9],[27,81]]
I know I can probably do it using a for loop over my vector k, but I was wondering if there is a simpler way that I am missing. Thanks!
import numpy as np

k = np.arange(1, 4, 1)
print(k)

def exam(p):
    return np.array([[p, p**2], [p**3, p**4]])

print(exam(k))
The output:
[1 2 3]
[[[ 1 2 3]
[ 1 4 9]]
[[ 1 8 27]
[ 1 16 81]]]
The key is to play with the shapes and broadcasting.
b = np.arange(1,4) # the base
e = np.arange(1,5) # the exponent
b[:,np.newaxis] ** e
=>
array([[ 1, 1, 1, 1],
[ 2, 4, 8, 16],
[ 3, 9, 27, 81]])
(b[:,None] ** e).reshape(-1,2,2)
=>
array([[[ 1, 1],
[ 1, 1]],
[[ 2, 4],
[ 8, 16]],
[[ 3, 9],
[27, 81]]])
If you must have the output as a list of matrices, do:
m = (b[:,None] ** e).reshape(-1,2,2)
[ np.mat(a) for a in m ]
=>
[matrix([[1, 1],
[1, 1]]),
matrix([[ 2, 4],
[ 8, 16]]),
matrix([[ 3, 9],
[27, 81]])]
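As a side note (not in the original answer): np.matrix is discouraged in modern NumPy in favour of plain 2-D arrays, so unless you specifically need matrix semantics you can stop at the reshape and, if needed, split it up with:

list(m)   # three plain 2x2 arrays, views into m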
I am trying to figure out a better way to check whether two 2D arrays contain the same rows. Take the following short example:
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> b
array([[6, 7, 8],
[3, 4, 5],
[0, 1, 2]])
In this case b = a[::-1]. To check whether the two arrays contain the same rows, sort both and compare:
>>> a = a[np.lexsort((a[:,0], a[:,1], a[:,2]))]
>>> b = b[np.lexsort((b[:,0], b[:,1], b[:,2]))]
>>> np.all(a-b==0)
True
This is great and fairly fast. However the issue comes about when two rows are "close":
array([[-1.57839867 2.355354 -1.4225235 ],
[-0.94728367 0. -1.4225235 ],
[-1.57839867 -2.355354 -1.4225215 ]]) <---note ends in 215 not 235
array([[-1.57839867 -2.355354 -1.4225225 ],
[-1.57839867 2.355354 -1.4225225 ],
[-0.94728367 0. -1.4225225 ]])
Within a tolerance of 1E-5 these two arrays are equal row-for-row, but lexsort will tell you otherwise. This particular case could be fixed with a different sort order, but I would like something more general.
I was toying with the idea of:
>>> a = a.reshape(-1,1,3)
>>> a-b
array([[[-6, -6, -6],
[-3, -3, -3],
[ 0, 0, 0]],
[[-3, -3, -3],
[ 0, 0, 0],
[ 3, 3, 3]],
[[ 0, 0, 0],
[ 3, 3, 3],
[ 6, 6, 6]]])
>>> np.all(np.around(a-b,5)==0,axis=2)
array([[False, False, True],
[False, True, False],
[ True, False, False]], dtype=bool)
>>>np.all(np.any(np.all(np.around(a-b,5)==0,axis=2),axis=1))
True
This doesn't tell you whether the arrays are equal row-for-row, just whether every point in b is close to some value in a. The arrays can have several hundred rows, and I need to do this quite often. Any ideas?
Your last code doesn't do what you think it is doing. What it tells you is whether every row in b is close to some row in a. If you change the axes used for the outer calls to np.any and np.all, you can instead check whether every row in a is close to some row in b. If both conditions hold (every row in b is close to a row in a, and every row in a is close to a row in b), then the two sets of rows are equal. Probably not very computationally efficient, but probably very fast in numpy for moderately sized arrays:
def same_rows(a, b, tol=5):
    # rows_close[i, j] is True when row i of b is close to row j of a
    rows_close = np.all(np.round(a - b[:, None], tol) == 0, axis=-1)
    return (np.all(np.any(rows_close, axis=-1), axis=-1) and
            np.all(np.any(rows_close, axis=0), axis=0))
>>> rows, cols = 5, 3
>>> a = np.arange(rows * cols).reshape(rows, cols)
>>> b = np.arange(rows)
>>> np.random.shuffle(b)
>>> b = a[b]
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
>>> b
array([[ 9, 10, 11],
[ 3, 4, 5],
[ 0, 1, 2],
[ 6, 7, 8],
[12, 13, 14]])
>>> same_rows(a, b)
True
>>> b[0] = b[1]
>>> b
array([[ 3, 4, 5],
[ 3, 4, 5],
[ 0, 1, 2],
[ 6, 7, 8],
[12, 13, 14]])
>>> same_rows(a, b) # not all rows in a are close to a row in b
False
And for arrays that are not too big, performance is reasonable, even though the function has to build an intermediate array of shape (rows, rows, cols):
In [2]: rows, cols = 1000, 10
In [3]: a = np.arange(rows * cols).reshape(rows, cols)
In [4]: b = np.arange(rows)
In [5]: np.random.shuffle(b)
In [6]: b = a[b]
In [7]: %timeit same_rows(a, b)
10 loops, best of 3: 103 ms per loop