I am using Python 3.
I would like to start from a list of nodes in 3 dimensions and build a grid.
I would like to avoid this construct:
import numpy as np
l = np.zeros((len(xv), len(yv), len(zv)))
for (i, x) in zip(range(len(xv)), xv):
    for (j, y) in zip(range(len(yv)), yv):
        for (k, z) in zip(range(len(zv)), zv):
            l[i, j, k] = func(x, y, z)
I am looking for a more compact version of the above lines: an iterator like zip, but one that iterates over all possible tuples in the grid.
You can use np.meshgrid to construct your grid. Assuming that func is properly vectorized, that should be good enough to construct l:
X, Y, Z = np.meshgrid(xv, yv, zv, indexing='ij')
l = func(X, Y, Z)
(Note that meshgrid defaults to indexing='xy', which swaps the first two axes; indexing='ij' makes l[i, j, k] correspond to xv[i], yv[j], zv[k], matching the loop above.)
If func isn't vectorized, you can construct a vectorized version using np.vectorize.
Also note that you might even be able to get away without using np.meshgrid through judicious use of np.newaxis:
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> y
array([0, 1, 2])
>>> z
array([0, 1])
>>> def func(x, y, z):
...     return x + y + z
...
>>> vfunc = np.vectorize(func)
>>> vfunc(x[:, np.newaxis, np.newaxis], y[np.newaxis, :, np.newaxis], z[np.newaxis, np.newaxis, :])
array([[[ 0,  1],
        [ 1,  2],
        [ 2,  3]],

       [[ 1,  2],
        [ 2,  3],
        [ 3,  4]],

       [[ 2,  3],
        [ 3,  4],
        [ 4,  5]],

       [[ 3,  4],
        [ 4,  5],
        [ 5,  6]],

       [[ 4,  5],
        [ 5,  6],
        [ 6,  7]],

       [[ 5,  6],
        [ 6,  7],
        [ 7,  8]],

       [[ 6,  7],
        [ 7,  8],
        [ 8,  9]],

       [[ 7,  8],
        [ 8,  9],
        [ 9, 10]],

       [[ 8,  9],
        [ 9, 10],
        [10, 11]],

       [[ 9, 10],
        [10, 11],
        [11, 12]]])
As pointed out in the comments, np.ix_ can be used as a shortcut instead of np.newaxis:
vfunc(*np.ix_(xv, yv, zv))
Also note that for a function this simple, np.vectorize isn't necessary and will actually hurt performance quite a bit, since it calls the function once per element in Python.
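For the record, since this particular func is built entirely from ufunc operations, plain broadcasting gives the same result with no wrapper at all:
>>> x[:, np.newaxis, np.newaxis] + y[np.newaxis, :, np.newaxis] + z[np.newaxis, np.newaxis, :]  # same array as vfunc(...) above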
Say your func is something like:
def func(x, y, z, indices):
    # pick out the values for this specific grid point
    xi, yi, zi = [arr[j] for arr, j in zip((x, y, z), indices)]
    # ...do a calc with the values for the specific x, y, z point...
Hook the lists you want to it using partial:
from functools import partial
f = partial(func, x=xv, y=yv, z=zv)
Now just do a map supplying the index tuples and you're set!
import itertools
l = list(map(lambda idx: f(indices=idx),
             itertools.product(range(len(xv)), range(len(yv)), range(len(zv)))))
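A minimal runnable sketch of the whole approach, with a stand-in func that just sums the three values (the final reshape recovers the 3-D grid layout, since product iterates in C order):
import itertools
from functools import partial
import numpy as np

def func(x, y, z, indices):
    # look up the scalar values for this grid point
    xi, yi, zi = (arr[j] for arr, j in zip((x, y, z), indices))
    return xi + yi + zi                     # stand-in for the real calculation

xv, yv, zv = np.arange(2), np.arange(3), np.arange(4)
f = partial(func, x=xv, y=yv, z=zv)
idx = itertools.product(range(len(xv)), range(len(yv)), range(len(zv)))
l = np.array([f(indices=i) for i in idx]).reshape(len(xv), len(yv), len(zv))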
With a simple function:
def foo(x,y,z):
    return x**2 + y*2 + z
and space defined by:
In [328]: xv, yv, zv = [np.arange(i) for i in [2,3,4]]
This iteration is as fast as any, even if it is a bit wordy:
In [329]: res = np.zeros((xv.shape[0], yv.shape[0], zv.shape[0]), dtype=int)
In [330]: for i,x in enumerate(xv):
     ...:     for j,y in enumerate(yv):
     ...:         for k,z in enumerate(zv):
     ...:             res[i,j,k] = foo(x,y,z)
In [331]: res
Out[331]:
array([[[0, 1, 2, 3],
        [2, 3, 4, 5],
        [4, 5, 6, 7]],

       [[1, 2, 3, 4],
        [3, 4, 5, 6],
        [5, 6, 7, 8]]])
As @mgilson explains, you can generate 3 arrays that define the 3d space with:
In [332]: I,J,K = np.meshgrid(xv,yv,zv,indexing='ij',sparse=True)
In [333]: I.shape
Out[333]: (2, 1, 1)
In [334]: J.shape
Out[334]: (1, 3, 1)
In [335]: I,J,K = np.ix_(xv,yv,zv) # equivalently
In [336]: I.shape
Out[336]: (2, 1, 1)
foo was written so it works with arrays just as well as with scalars, so:
In [337]: res1 = foo(I,J,K)
In [338]: res1
Out[338]:
array([[[0, 1, 2, 3],
...
[5, 6, 7, 8]]])
So if your function fits this pattern, use it. Look at those I,J,K arrays, with and without sparse.
There are other tools for generating the i,j,k sets. For example:
for i,j,k in np.ndindex(res.shape):
    res[i,j,k] = foo(xv[i], yv[j], zv[k])
import itertools
for i,j,k in itertools.product(range(2),range(3),range(4)):
    res[i,j,k] = foo(xv[i], yv[j], zv[k])
itertools.product is fast, especially when used as list(product(...)). But the iteration mechanism isn't that important; it's the repeated call to foo that takes up most of the time.
ndindex actually uses nditer, which can be used directly in:
it = np.nditer([I, J, K, None], flags=['external_loop','buffered'])
for x, y, z, r in it:
    r[...] = foo(x, y, z)
it.operands[-1]   # the automatically allocated output array
nditer is best described in https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html. It is best used as a stepping stone toward a cython version; otherwise it doesn't have any speed advantages (though with this foo and 'external_loop' it is as fast as foo(I,J,K)). Note that this doesn't need the indices (but see 'multi_index').
And yes, there's vectorize. Convenient, but not a speedy solution.
vfoo=np.vectorize(foo, otypes=['int'])
vfoo(I,J,K)
Related
I have a 2D array that I would like to repeat 4 times.
Achieved via a mixture of NumPy and Python methods:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> z2 = []
>>> for i in range(4):
...     z2.append(z)
...
>>> z2
[array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]), array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]), array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]), array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])]
>>> z2 = np.array(z2)
>>> z2
array([[[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])
Achieved via Pure NumPy:
>>> z2 = np.repeat(z[np.newaxis,...], 4, axis=0)
>>> z2
array([[[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])
Are the elements created by numpy.repeat() views of the original numpy.array() or unique elements?
If the latter, is there an equivalent NumPy function that can create views of the original array the same way as numpy.repeat()?
I think such an ability could help reduce the buffer space of z2 in the event the size of z is large and there are many repeats of z involved.
A follow-up on @FrankYellin's answer:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> z2 = np.repeat(z[np.newaxis,...], 1_000_000_000, axis=0)
>>> z2.nbytes
72000000000
>>> y2 = np.broadcast_to(z, (1_000_000_000, 3, 3))
>>> y2.nbytes
72000000000
The nbytes from using np.broadcast_to() is the same as for np.repeat(). This is surprising at first, but nbytes reports the logical size implied by the shape, not the memory actually allocated; np.broadcast_to() returns a readonly view on the original z array with the given shape. Consistent with that, I did notice that np.broadcast_to() created the y2 array instantaneously, while the creation of z2 via np.repeat() took about 40 seconds to complete. Hence, np.broadcast_to() yielded significantly faster performance.
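A quick way to confirm what is going on is np.shares_memory, which checks whether two arrays actually overlap in memory:
>>> np.shares_memory(z, z2)   # np.repeat copied the data
False
>>> np.shares_memory(z, y2)   # np.broadcast_to is a view onto z
True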
If you want a writable version, it is doable, but it's really ugly.
If you want a read-only version, np.broadcast_to(z, (4, 3, 3)) should be all you need.
Now the ugly writable version. Be careful. You can corrupt memory if you mess the arguments up.
>>> z.shape
(3, 3)
>>> z.strides
(24, 8)
from numpy.lib.stride_tricks import as_strided
z2 = as_strided(z, shape=(4, 3, 3), strides=(0, 24, 8))  # stride 0: all four "copies" share z's memory
and you end up with:
>>> z2[1, 1, 1]
4
>>> z2[1, 1, 1] = 100
>>> z2[2, 1, 1]
100
>>>
You are using strides to say that you want to create a second array overlaid on top of the first array's memory. You set its new shape, and you prefix a 0 to the previous strides, indicating that stepping along the first dimension should not move through memory at all.
Make sure you understand strides.
numpy.repeat creates a new array, not a view (you can check by looking at the __array_interface__ field). In fact, it is not possible to create such a view on the original array in the general case, since NumPy views do not support this pattern. A view is basically just an object containing a pointer to a raw memory buffer, some strides, a shape and a dtype. While it is possible to repeat one item N times with a 0 stride, it is not possible to repeat 2 items N times without adding a new dimension to the output array. Thus, there is no way to build a function like numpy.repeat that repeats items of an axis while keeping the same output shape as a view.
If adding a new dimension is OK, then you can build an array with a new dimension and a stride set to 0; repeating along a new dimension is possible, and the answer of @FrankYellin gives a good example. Note that reshaping or ravelling the resulting array causes a mandatory copy. Supporting such advanced views would make the NumPy code more complex and/or less efficient for a feature that is only rarely used.
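A minimal sketch of the __array_interface__ check mentioned above (comparing the base data pointers is the relevant part):
>>> z = np.arange(9).reshape(3, 3)
>>> r = np.repeat(z[np.newaxis, ...], 4, axis=0)
>>> v = np.broadcast_to(z, (4, 3, 3))
>>> r.__array_interface__['data'][0] == z.__array_interface__['data'][0]   # repeat allocated a new buffer
False
>>> v.__array_interface__['data'][0] == z.__array_interface__['data'][0]   # broadcast_to reuses z's buffer
True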
I have a very large numpy array like this:
np.array([1, 2, 3, 4, 5, 6, 7, ..., 12345])
I need to create subgroups of n elements (in the example n = 3) in another array like this:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [...], [12340, 12341, 12342], [12343, 12344, 12345]])
I did accomplish that using normal Python lists, just appending the subgroups to another list. But I'm having a hard time trying to do that in numpy.
Any ideas how can I do that?
Thanks!
You can use reshape(-1, 3), where the -1 means "whatever's left".
>>> array = np.arange(1, 12346)
>>> array
array([ 1, 2, 3, ..., 12343, 12344, 12345])
>>> array.reshape(-1, 3)
array([[    1,     2,     3],
       [    4,     5,     6],
       [    7,     8,     9],
       ...,
       [12337, 12338, 12339],
       [12340, 12341, 12342],
       [12343, 12344, 12345]])
You can use np.reshape():
From the documentation:
numpy.reshape(a, newshape, order='C')
Gives a new shape to an array without changing its data.
Here is an example of how you can apply it to your situation:
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 12345])
>>> a.reshape((len(a) // 3, 3))
array([[    1,     2,     3],
       [    4,     5,     6],
       [    7,     8, 12345]])
Note that, obviously, the length of the array (len(a)) has to be a multiple of 3 to be able to reshape it into a 2-dimensional numpy array, because the result must be rectangular.
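If the length is not an exact multiple of n, one common workaround (a sketch, not part of the original answer) is to truncate the leftover tail before reshaping:
n = 3
trimmed = a[:len(a) // n * n]   # drop the last len(a) % n elements
groups = trimmed.reshape(-1, n)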
I have an array of values that I want to replace with values from an array of choices, based on which choice is linearly closest.
The catch is the size of the choices is defined at runtime.
import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
If choices were static in size, I would simply use np.where:
d = np.where(np.abs(a - choices[0]) < np.abs(a - choices[1]),
             np.where(np.abs(a - choices[0]) < np.abs(a - choices[2]), choices[0], choices[2]),
             np.where(np.abs(a - choices[1]) < np.abs(a - choices[2]), choices[1], choices[2]))
To get the output:
>>> d
[[1, 1, 1], [5, 5, 5], [10, 10, 10]]
Is there a way to do this more dynamically while still preserving the vectorization?
Subtract choices from a, find the index of the minimum of the result, substitute.
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b, b)        # absolute value, computed in place
i = np.argmin(b, axis=-1)
a = choices[i]
print(a)
>>>
[[ 1  1  1]
 [ 5  5  5]
 [10 10 10]]
a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b, b)
i = np.argmin(b, axis=-1)
a = choices[i]
print(a)
>>>
[[ 1  1  1]
 [ 5 10  5]
 [10  1 10]]
The extra dimension was added to a so that each element of choices would be subtracted from each element of a; choices was broadcast against a in the third dimension, so b.shape is (3,3,3). The EricsBroadcastingDoc page is a pretty good explanation of broadcasting and has a graphic 3-d example at the end.
For the second example:
>>> print(b)
[[[ 1  5 10]
  [ 2  2  7]
  [ 1  5 10]]

 [[ 3  1  6]
  [ 7  3  2]
  [ 3  1  6]]

 [[ 8  4  1]
  [ 0  4  9]
  [ 8  4  1]]]
>>> print(i)
[[0 0 0]
 [1 2 1]
 [2 0 2]]
The final assignment uses an Index Array or Integer Array Indexing.
In the second example, notice that there was a tie for element a[0,1]; either 1 or 5 could have been substituted.
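As a tiny illustration of that integer-array indexing (the index values here are made up), each entry of the index array is replaced by the corresponding element of choices:
>>> choices = np.array([1, 5, 10])
>>> i = np.array([[0, 2], [1, 1]])
>>> choices[i]
array([[ 1, 10],
       [ 5,  5]])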
To explain wwii's excellent answer in a little more detail:
The idea is to create a new dimension which does the job of comparing each element of a to each element in choices using numpy broadcasting. This is easily done for an arbitrary number of dimensions in a using the ellipsis syntax:
>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1,  5, 10],
        [ 1,  5, 10],
        [ 1,  5, 10]],

       [[ 3,  1,  6],
        [ 3,  1,  6],
        [ 3,  1,  6]],

       [[ 8,  4,  1],
        [ 8,  4,  1],
        [ 8,  4,  1]]])
Taking argmin along the axis you just created (the last axis, with label -1) gives you the desired index in choices that you want to substitute:
>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])
Which finally allows you to choose those elements from choices:
>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]])
For a non-symmetric shape:
Let's say a had shape (2, 5):
>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
Then you'd get:
>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1,  5, 10],
        [ 0,  4,  9],
        [ 1,  3,  8],
        [ 2,  2,  7],
        [ 3,  1,  6]],

       [[ 4,  0,  5],
        [ 5,  1,  4],
        [ 6,  2,  3],
        [ 7,  3,  2],
        [ 8,  4,  1]]])
This is hard to read, but what it's saying is that b has shape:
>>> b.shape
(2, 5, 3)
The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea:
>>> b[:, :, 0]   # = abs(a - 1)
array([[1, 0, 1, 2, 3],
       [4, 5, 6, 7, 8]])
>>> b[:, :, 1]   # = abs(a - 5)
array([[5, 4, 3, 2, 1],
       [0, 1, 2, 3, 4]])
>>> b[:, :, 2]   # = abs(a - 10)
array([[10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])
Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 0, 1, 2.
Hope that helps explain this a little more clearly.
I love broadcasting and would have gone that way myself too. But with large arrays, I would like to suggest another approach using np.searchsorted that keeps it memory efficient and thus achieves performance benefits, like so -
def searchsorted_app(a, choices):
    lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
    ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
    cl = np.take(choices, lidx)   # or choices[lidx]
    cr = np.take(choices, ridx)   # or choices[ridx]
    mask = np.abs(a - cl) > np.abs(a - cr)
    cl[mask] = cr[mask]
    return cl
Please note that if the elements in choices are not sorted, we need to add the additional argument sorter to np.searchsorted.
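A sketch of that unsorted case (either sort choices up front or pass sorter=; the names here are illustrative):
order = np.argsort(choices)                  # permutation that sorts choices
idx = np.searchsorted(choices, a, side='left', sorter=order)
# idx indexes into the sorted order, so the left candidate is choices[order][idx]
# (after the same clipping as above); sorting choices once up front is usually simpler:
# searchsorted_app(a, np.sort(choices))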
Runtime test -
In [160]: # Setup inputs
...: a = np.random.rand(100,100)
...: choices = np.sort(np.random.rand(100))
...:
In [161]: def broadcasting_app(a, choices):   # @wwii's solution
     ...:     return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
     ...:
In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True
In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop
In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop
Related post: Find elements of array one nearest to elements of array two
Suppose I have a numpy array as below
a = np.asarray([[1,2,3],[1,4,3],[2,5,4],[2,7,5]])
array([[1, 2, 3],
       [1, 4, 3],
       [2, 5, 4],
       [2, 7, 5]])
How can I flatten column 2 and 3 for each unique element in column 1 like below:
array([[1, 2, 3, 4, 3],
       [2, 5, 4, 7, 5]])
Thank you for your help.
Another option using list comprehension:
np.array([np.insert(a[a[:,0] == k, 1:].flatten(), 0, k) for k in np.unique(a[:,0])])
# array([[1, 2, 3, 4, 3],
# [2, 5, 4, 7, 5]])
import numpy as np
a = np.asarray([[1,2,3],[1,4,3],[2,5,4],[2,7,5]])
d = {}
for row in a:
    d[row[0]] = np.concatenate((d.get(row[0], []), row[1:]))
r = np.array([np.concatenate(([key], d[key])) for key in d])
print(r)
This prints:
[[ 1.  2.  3.  4.  3.]
 [ 2.  5.  4.  7.  5.]]
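The float output comes from seeding the concatenation with the empty Python list: np.concatenate treats [] as float64 and promotes the integers. If the integer dtype matters, a small tweak (not in the original answer) is to cast back at the end:
r = r.astype(a.dtype)   # restore the input's integer dtype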
As posted in the comments, we know that each unique element in column-0 has a fixed number of rows (by which I assume the same number of rows is meant), so we can use a vectorized approach to solve the case. We sort the rows by column-0 and look for shifts along it, which signify a group change and thus give us the exact number of rows per unique element in column-0. Let's call it L. Finally, we slice the sorted array to select columns 1 and 2 and group each run of L rows together by reshaping. Thus, the implementation would be -
sa = a[a[:,0].argsort()]
L = np.unique(sa[:,0],return_index=True)[1][1]
out = np.column_stack((sa[::L,0],sa[:,1:].reshape(-1,2*L)))
For a further performance boost, we can use np.diff to calculate L, like so -
L = np.where(np.diff(sa[:,0])>0)[0][0]+1
Sample run -
In [103]: a
Out[103]:
array([[1, 2, 3],
       [3, 7, 8],
       [1, 4, 3],
       [2, 5, 4],
       [3, 8, 2],
       [2, 7, 5]])
In [104]: sa = a[a[:,0].argsort()]
...: L = np.unique(sa[:,0],return_index=True)[1][1]
...: out = np.column_stack((sa[::L,0],sa[:,1:].reshape(-1,2*L)))
...:
In [105]: out
Out[105]:
array([[1, 2, 3, 4, 3],
       [2, 5, 4, 7, 5],
       [3, 7, 8, 8, 2]])
I need to accomplish the following task:
from:
a = array([[1,3,4],[1,2,3]...[1,2,1]])
(add one element to each row) to:
a = array([[1,3,4,x],[1,2,3,x]...[1,2,1,x]])
I have tried doing stuff like a[n] = array([1,3,4,x])
but numpy complained of shape mismatch. I tried iterating through a and appending element x to each item, but the changes are not reflected.
Any ideas on how I can accomplish this?
Appending data to an existing array is a natural thing to want to do for anyone with python experience. However, if you find yourself regularly appending to large arrays, you'll quickly discover that NumPy doesn't easily or efficiently do this the way a python list will. You'll find that every "append" action requires re-allocation of the array memory and short-term doubling of memory requirements. So, the more general solution to the problem is to try to allocate arrays to be as large as the final output of your algorithm. Then perform all your operations on sub-sets (slices) of that array. Array creation and destruction should ideally be minimized.
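For example, a minimal sketch of the preallocate-then-fill pattern described above (the sizes here are illustrative):
import numpy as np

n_rows, n_cols = 3, 4                    # assume the final size is known up front
out = np.empty((n_rows, n_cols), dtype=int)
for i in range(n_rows):
    out[i, :3] = [1, 2, 3]               # write into slices of the preallocated array
    out[i, 3] = 10                       # no reallocation ever happens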
That said, it's often unavoidable, and the functions that do this are:
for 2-D arrays:
np.hstack
np.vstack
np.column_stack
np.row_stack
for 3-D arrays (the above plus):
np.dstack
for N-D arrays:
np.concatenate
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
b = np.array([10,20,30])
c = np.hstack((a, np.atleast_2d(b).T))
returns c:
array([[ 1,  3,  4, 10],
       [ 1,  2,  3, 20],
       [ 1,  2,  1, 30]])
One way to do it (may not be the best) is to create another array with the new elements and do column_stack, i.e.:
>>> a = np.array([[1,3,4],[1,2,3],[1,2,1]])
>>> a
array([[1, 3, 4],
       [1, 2, 3],
       [1, 2, 1]])
>>> b = np.array([1,2,3])
>>> np.column_stack((a,b))
array([[1, 3, 4, 1],
       [1, 2, 3, 2],
       [1, 2, 1, 3]])
Appending a single scalar can be done a bit more easily, as already shown (and also without converting to float), by expanding the scalar to a python-list-type:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack((a, [[x]] * len(a)))
returns b as:
array([[ 1,  3,  4, 10],
       [ 1,  2,  3, 10],
       [ 1,  2,  1, 10]])
Appending a row could be done by:
c = np.vstack((a, [x] * len(a[0])))
returns c as:
array([[ 1,  3,  4],
       [ 1,  2,  3],
       [ 1,  2,  1],
       [10, 10, 10]])
np.insert can also be used for this purpose:
import numpy as np
a = np.array([[1, 3, 4],
              [1, 2, 3],
              [1, 2, 1]])
x = 5
index = 3 # the position for x to be inserted before
np.insert(a, index, x, axis=1)
array([[1, 3, 4, 5],
       [1, 2, 3, 5],
       [1, 2, 1, 5]])
index can also be a list/tuple
>>> index = [1, 1, 3] # equivalently (1, 1, 3)
>>> np.insert(a, index, x, axis=1)
array([[1, 5, 5, 3, 4, 5],
       [1, 5, 5, 2, 3, 5],
       [1, 5, 5, 2, 1, 5]])
or a slice
>>> index = slice(0, 3)
>>> np.insert(a, index, x, axis=1)
array([[5, 1, 5, 3, 5, 4],
       [5, 1, 5, 2, 5, 3],
       [5, 1, 5, 2, 5, 1]])
If x is just a single scalar value, you could try something like this to ensure the correct shape of the array that is being appended/concatenated to the rightmost column of a:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack((a,x*np.ones((a.shape[0],1))))
returns b as:
array([[  1.,   3.,   4.,  10.],
       [  1.,   2.,   3.,  10.],
       [  1.,   2.,   1.,  10.]])
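If the float upcast from x*np.ones(...) is unwanted, a variant (assuming x is an integer scalar) is np.full, which infers the dtype from the fill value:
b = np.hstack((a, np.full((a.shape[0], 1), x)))   # stays integer for integer x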
target = []
for line in a.tolist():
    line.append(x)        # list.append mutates in place and returns None,
    target.append(line)   # so don't assign its result to a new name
target = np.array(target)