What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?
Simply using == gives me a boolean array:
>>> numpy.array([1,1,1]) == numpy.array([1,1,1])
array([ True, True, True], dtype=bool)
Do I have to AND the elements of this array together to determine if the arrays are equal, or is there a simpler way to compare them?
(A==B).all()
tests whether all values of the array (A==B) are True.
Note: maybe you also want to test A and B shape, such as A.shape == B.shape
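If you want to fold the shape check in, here is a minimal sketch of a guarded comparison (the helper name is my own, not a NumPy function):
import numpy as np

def arrays_equal(A, B):
    # Hypothetical helper: check shapes first so mismatched shapes
    # can never slip through the element-wise comparison.
    return A.shape == B.shape and bool((A == B).all())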
Special cases and alternatives (from dbaupp's answer and yoavram's comment)
It should be noted that:
this solution can have strange behavior in a particular case: if either A or B is empty and the other one contains a single element, then it returns True. The reason is that the comparison A==B returns an empty array, for which the all operator returns True.
Another risk is if A and B don't have the same shape and aren't broadcastable, then this approach will raise an error.
In conclusion, if you have a doubt about A and B shape or simply want to be safe: use one of the specialized functions:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
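To make the empty-array pitfall concrete, a small sketch of the behaviors described above:
import numpy as np

A = np.array([])   # empty
B = np.array([1])  # single element

print((A == B).all())        # True  -- the surprising case: A==B is an empty array
print(np.array_equal(A, B))  # False -- shapes differ, so the arrays are not equal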
The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.
(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)
If you want to check if two arrays have the same shape AND elements you should use np.array_equal as it is the method recommended in the documentation.
Performance-wise, don't expect any equality check to beat another, as there is not much room to optimize comparing two elements. Just for the sake of it, I still ran some tests.
import numpy as np
import timeit
A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))
timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761
So pretty much equal, no need to talk about the speed.
(A==B).all() behaves pretty much like the following code snippet:
x = [1, 2, 3]
y = [1, 2, 3]
print(all(x[i] == y[i] for i in range(len(x))))
> True
Let's measure the performance by using the following piece of code.
import numpy as np
import time

exec_time0 = []
exec_time1 = []
exec_time2 = []

sizeOfArray = 5000
numOfIterations = 200

for i in range(numOfIterations):
    A = np.random.randint(0, 255, (sizeOfArray, sizeOfArray))
    B = np.random.randint(0, 255, (sizeOfArray, sizeOfArray))

    a = time.perf_counter()
    res = (A == B).all()
    b = time.perf_counter()
    exec_time0.append(b - a)

    a = time.perf_counter()
    res = np.array_equal(A, B)
    b = time.perf_counter()
    exec_time1.append(b - a)

    a = time.perf_counter()
    res = np.array_equiv(A, B)
    b = time.perf_counter()
    exec_time2.append(b - a)

print('Method: (A==B).all(),       ', np.mean(exec_time0))
print('Method: np.array_equal(A,B),', np.mean(exec_time1))
print('Method: np.array_equiv(A,B),', np.mean(exec_time2))
Output
Method: (A==B).all(), 0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515
According to the results above, the three approaches are effectively tied; np.array_equal comes out marginally fastest in this run, but the differences are within measurement noise.
Usually two float arrays will differ by small numeric errors. In that case you can use numpy.allclose(A, B) instead of (A==B).all(); it returns a single bool (True/False).
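A small illustration of why this matters for floats (rtol and atol are NumPy's documented default tolerances):
import numpy as np

A = np.array([0.1 + 0.2, 1.0])
B = np.array([0.3, 1.0])

print((A == B).all())     # False -- 0.1 + 0.2 != 0.3 exactly in floating point
print(np.allclose(A, B))  # True  -- equal within defaults (rtol=1e-05, atol=1e-08)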
Now use np.array_equal. From documentation:
np.array_equal([1, 2], [1, 2])
True
np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
np.array_equal([1, 2], [1, 2, 3])
False
np.array_equal([1, 2], [1, 4])
False
On top of the other answers, you can now use an assertion:
numpy.testing.assert_array_equal(x, y)
You also have similar functions such as numpy.testing.assert_almost_equal().
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
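A quick sketch of how the assertion behaves: it returns None on success and raises AssertionError with a mismatch report on failure.
import numpy as np

np.testing.assert_array_equal(np.array([1, 2, 3]), np.array([1, 2, 3]))  # passes silently
np.testing.assert_array_equal(np.array([1, 2, 3]), np.array([1, 2, 4]))  # raises AssertionError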
Just for the sake of completeness, I will add the pandas approach for comparing two arrays:
import numpy as np
import pandas as pd
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)
ap.equals(bp)
True
FYI: In case you are looking for how to compare vectors, arrays, or data frames in R, you can use:
identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE
Related
I've found myself using NumPy arrays more and more lately for memory management and speed of computation on large volumes of structured data (such as points and polygons). In doing so, there is always a situation where I need to perform some function f(x) on the entire array. From experience, and from Googling, iterating over the array is not the way to do this; instead the function should be vectorized and broadcast over the entire array.
Looking at the documentation for numpy.vectorize we get this example:
def myfunc(a, b):
    "Return a-b if a>b, otherwise return a+b"
    if a > b:
        return a - b
    else:
        return a + b
>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])
And per the docs it really just creates a for loop, so it doesn't use the lower-level C loops for truly vectorized operations (either in BLAS or SIMD). So that got me wondering: if the above is "vectorized", what is this?:
def myfunc_2(a, b):
    cond = a > b
    a[cond] -= b
    a[~cond] += b
    return a

>>> myfunc_2(np.array([1, 2, 3, 4]), 2)
array([3, 4, 1, 2])
Or even this:
>>> a = np.array([1, 2, 3, 4])
>>> b = 2
>>> np.where(a > b, a - b, a + b)
array([3, 4, 1, 2])
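(A note on the timings below: they also reference a myfunc_3 that is never shown; presumably it just wraps the np.where expression above, e.g.:)

def myfunc_3(a, b):
    # Presumed definition (not shown in the original post):
    # the np.where one-liner from above, wrapped as a function.
    return np.where(a > b, a - b, a + b)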
So I ran some tests on these, what I believe to be comparable examples:
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import vfunc, arr'
>>> timeit('vfunc(arr, 50)', setup=setup, number=1)
0.60175449999997
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import myfunc_2, arr'
>>> timeit('myfunc_2(arr, 50)', setup=setup, number=1)
0.07464979999997468
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import myfunc_3, arr'
>>> timeit('myfunc_3(arr, 50)', setup=setup, number=1)
0.0222587000000658
And with larger run windows:
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import vfunc, arr'
>>> timeit('vfunc(arr, 50)', setup=setup, number=1000)
621.5853878000003
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import myfunc_2, arr'
>>> timeit('myfunc_2(arr, 50)', setup=setup, number=1000)
98.19819199999984
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import myfunc_3, arr'
>>> timeit('myfunc_3(arr, 50)', setup=setup, number=1000)
26.128515100000186
Clearly the other options are major improvements over using numpy.vectorize. This leads me to wonder why anybody would use numpy.vectorize at all, when you can write what appear to be "purely vectorized" functions or use battery-provided functions like numpy.where.
Now for the questions:
What are the requirements to say a function is "vectorized" if not converted via numpy.vectorize? Just broadcastable in its entirety?
How does NumPy determine if a function is "vectorized"/broadcastable?
Why isn't this form of vectorization documented anywhere? (I.e., why doesn't NumPy have a "How to write a vectorized function" page?)
"vectorization" can mean be different things depending on context. Use of low level C code with BLAS or SIMD is just one.
In physics 101, a vector represents a point or velocity whose numeric representation can vary with coordinate system. Thus I think of "vectorization", broadly speaking, as performing math operations on the "whole" array, without explicit control over numerical elements.
numpy basically adds a ndarray class to python. It has a large number of methods (and operators and ufunc) that do indexing and math in compiled code (not necessarily using processor specific SIMD). The big gain in speed, relative to Python level iteration, is the use of compiled code optimized for the ndarray data structure. Python level iteration (interpreted code) on arrays is actually slower than on equivalent lists.
I don't think numpy formally defines "vectorization". There isn't a "vector" class. I haven't searched the documentation for those terms. Here, and possibly on other forums, it just means, writing code that makes optimal use of ndarray methods. Generally that means avoiding python level iteration.
np.vectorize is a tool for applying arrays to functions that only accept scalar inputs. It does not compile or otherwise "look inside" that function. But it does accept and apply arguments in a fully broadcasted sense, as in:
In [162]: vfunc(np.arange(3)[:,None],np.arange(4))
Out[162]:
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 1, 4, 5]])
Speedwise, np.vectorize is slower than the equivalent list comprehension, at least for smaller sample cases. Recent testing shows that it scales better, so for large inputs it may be faster. But the performance is still nothing like your myfunc_2.
myfunc is not "vectorized" simply because expressions like if a > b do not work with arrays.
np.where(a > b, a - b, a + b) is "vectorized" because all arguments to where work with arrays, and where itself uses them with full broadcasting powers.
In [163]: a,b = np.arange(3)[:,None], np.arange(4)
In [164]: np.where(a>b, a-b, a+b)
Out[164]:
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 1, 4, 5]])
myfunc_2 is "vectorized", at least in a:
In [168]: myfunc_2(a,2)
Out[168]:
array([[4],
       [1],
       [2]])
It does not work when b is an array; it's trickier to match the a[cond] shape with anything but a scalar:
In [169]: myfunc_2(a,b)
Traceback (most recent call last):
Input In [169] in <cell line: 1>
myfunc_2(a,b)
Input In [159] in myfunc_2
a[cond] -= b
IndexError: boolean index did not match indexed array along dimension 1; dimension is 1 but corresponding boolean dimension is 4
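For what it's worth, one way to make myfunc_2 fully broadcastable is to broadcast the inputs up front; a minimal sketch (my own, not from the original answer):

import numpy as np

def myfunc_2b(a, b):
    # Broadcast both inputs to a common shape, then index with matching masks.
    a, b = np.broadcast_arrays(a, b)
    out = a.copy()
    cond = a > b
    out[cond] -= b[cond]
    out[~cond] += b[~cond]
    return out

a, b = np.arange(3)[:, None], np.arange(4)
print(myfunc_2b(a, b))  # same result as np.where(a > b, a - b, a + b)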
===
What are the requirements to say a function is "vectorized" if not converted via numpy.vectorize? Just broadcastable in its entirety?
In your examples, my_func is not "vectorized" because it only works with scalars. vfunc is fully "vectorized", but not faster. where is also "vectorized" and (probably) faster, though this may be scale dependent. my_func2 is only "vectorized" in a.
How does NumPy determine if a function is "vectorized"/broadcastable?
numpy doesn't determine anything like this. numpy is a ndarray class with many methods. It's just the use of those methods that makes a block of code "vectorized".
Why isn't this form of vectorization documented anywhere? (I.e., why doesn't NumPy have a "How to write a vectorized function" page?)
Keep in mind the distinction between "vectorization" as a performance strategy, and the basic idea of operating on whole arrays.
Vectorize Documentation
The documentation provides a great example with mypolyval: there's no good way to write that as a where condition or using simple logic.
def mypolyval(p, x):
    _p = list(p)
    res = _p.pop(0)
    while _p:
        res = res*x + _p.pop(0)
    return res

vpolyval = np.vectorize(mypolyval, excluded=['p'])
vpolyval(p=[1, 2, 3], x=[0, 1])
array([3, 6])
That is, np.vectorize is exactly what the reference documentation says it is: a convenience that lets you write code in the same fashion, without the performance benefits.
And as for the documentation telling you how to write vectorized code: it does, in the relevant reference pages. It says what you quoted above:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
Remember: the documentation is an API reference guide with some additional caveats: it's not a NumPy tutorial.
UFunc Documentation
The appropriate reference documentation and glossary document this clearly:
A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a “vectorized” wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs. For detailed information on universal functions, see Universal functions (ufunc) basics.
NumPy hands off array processing to C, where looping and computation are much faster than in Python. To exploit this, programmers using NumPy eliminate Python loops in favor of array-to-array operations. vectorization can refer both to the C offloading and to structuring NumPy code to leverage it.
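A tiny illustration of the distinction (np.add is a true C-level ufunc; the list comprehension stands in for the Python-level loop that np.vectorize effectively runs):

import numpy as np

a = np.arange(5)
b = np.arange(5, 10)

print(np.add(a, b))  # element-wise loop runs in compiled C, with broadcasting
print(np.array([x + y for x, y in zip(a, b)]))  # same result, interpreted loop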
Summary
Simply put, np.vectorize is for code legibility so you can write similar code to actually vectorized ufuncs. It is not for performance, but there are times when you have no good alternative.
My problem
Suppose I have
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
They are two arrays of different sizes, containing other arrays (the inner arrays all have the same size!)
I want to count how many items of b (i.e. inner arrays) are also in a. Notice that I am not considering their position!
How can I do that?
My Try
count = 0
for bitem in b:
    for aitem in a:
        if (aitem == bitem).all():
            count += 1
Is there a better way? Especially in one line, maybe with some comprehension..
The numpy_indexed package contains efficient (nlogn, generally) and vectorized solutions to these types of problems:
import numpy_indexed as npi
count = len(npi.intersection(a, b))
Note that this is subtly different than your double loop, discarding duplicate entries in a and b for instance. If you want to retain duplicates in b, this would work:
count = npi.in_(b, a).sum()
Duplicate entries in a could also be handled by doing npi.count(a) and factoring in the result; but anyway, I'm just mentioning this for illustration purposes, since I imagine the distinction probably does not matter to you.
Here is a simple way to do it:
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
# Broadcasting (5,1,2) against (1,3,2) yields a (5,3,2) boolean array;
# np.all(..., axis=-1) marks (i,j) pairs where row a[i] equals row b[j] entirely,
# and np.any(..., axis=0) asks, for each row of b, whether it matches any row of a.
count = np.count_nonzero(
    np.any(np.all(a[:, np.newaxis, :] == b[np.newaxis, :, :], axis=-1), axis=0))
print(count)
> 2
You can do what you want in a one-liner as follows:
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
Explanation
Here's an explanation of what's happening:
Iterate through the two arrays using itertools.product, which creates an iterator over the cartesian product of the two arrays.
Compare each pair of arrays (x, y) coming from step 1 using np.array_equal.
Rely on the fact that True counts as 1 when summing a list.
Full example:
The final code looks like this:
import numpy as np
from itertools import product
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
# output: 2
You can convert the rows to dtype np.void and then use np.in1d on the resulting 1d arrays:
def void_arr(a):
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
b[np.in1d(void_arr(b), void_arr(a))]
array([[5, 6],
       [1, 2]])
If you just want the number of intersections, it's
np.in1d(void_arr(b), void_arr(a)).sum()
2
Note: if there are repeat items in b or a, then np.in1d(void_arr(b), void_arr(a)).sum() likely won't be equal to np.in1d(void_arr(a), void_arr(b)).sum(). I've reversed the order from my original answer to match your question (i.e. how many elements of b are in a?)
For more information, see the third answer here
I would like to test the equality of multiple args (i.e. it should return True if all args are equal and False if at least one argument differs).
As numpy.equal can only handle two arguments, I tried reduce, but it, obviously, fails:
reduce(np.equal, (4, 4, 4)) # returns False, because...
reduce(np.equal, (True, 4)) # ... np.equal(True, 4) is False
You can use np.unique to check whether the number of unique items in your array is 1:
np.unique(array).size == 1
Or use np.all() to check whether all of the items are equal to one of your items (for example the first one):
np.all(array == array[0])
Demo:
>>> a = np.array([1, 1, 1, 1])
>>> b = np.array([1, 1, 1, 2])
>>> np.unique(a).size == 1
True
>>> np.unique(b).size == 1
False
>>> np.all(a==a[0])
True
>>> np.all(b==b[0])
False
The numpy_indexed package has a built-in function for this. Note that it also works on multidimensional arrays, i.e. you can use it to check whether a stack of images are all identical, for instance.
import numpy_indexed as npi
npi.all_equal(array)
If your args are floating point values, the equality test can produce weird results due to round-off errors. In this case you should use a more robust approach, for example numpy.allclose:
In [636]: x = [2./3., .2/.3]
In [637]: x
Out[637]: [0.6666666666666666, 0.6666666666666667]
In [638]: xarr = np.array(x)
In [639]: np.unique(xarr).size == 1
Out[639]: False
In [640]: np.all(xarr == xarr[0])
Out[640]: False
In [641]: reduce(np.allclose, x)
Out[641]: True
Note: Python 3 users will need to include the line from functools import reduce, since reduce is no longer a built-in function in Python 3.
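One caution worth adding: reduce(np.allclose, ...) only behaves here because x has exactly two elements; with three or more values it falls into the same boolean-chaining trap as reduce(np.equal, ...). A safer pattern for many floats is to compare everything against one reference element:

import numpy as np

values = np.array([2./3., .2/.3, 0.6666666666666666])
print(np.allclose(values, values[0]))  # True -- one call, no pairwise chaining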
I'm fairly new to numpy arrays and have encountered a problem when comparing one array with another.
I have two arrays, such that:
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])
I want to do something like the following:
if b > a:
    c = b
else:
    c = a
so that I end up with an array c = np.array([2,4,3,5,5]).
This can be otherwise thought of as taking the max value for each element of the two arrays.
However, I am running into the error
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all().
I have tried using these, but I'm not sure that they are right for what I want.
Is someone able to offer some advice in solving this?
You are looking for the function np.fmax. It takes the element-wise maximum of the two arrays, ignoring NaNs.
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])
c = np.fmax(a, b)
The output is
array([2, 4, 3, 5, 5])
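A brief sketch of the NaN behavior that distinguishes np.fmax from np.maximum (they are identical on the integer arrays above):

import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 2.0, np.nan])

print(np.fmax(x, y))     # [2. 2. 3.]   -- NaNs are ignored where possible
print(np.maximum(x, y))  # [2. nan nan] -- NaNs propagate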
As with almost everything else in numpy, comparisons are done element-wise, returning a whole array:
>>> b > a
array([ True, True, False, True, False], dtype=bool)
So, is that true or false? What should an if statement do with it?
Numpy's answer is that it shouldn't try to guess, it should just raise an exception.
If you want to consider it true because at least one value is true, use any:
>>> if np.any(b > a): print('Yes!')
Yes!
If you want to consider it false because not all values are true, use all:
>>> if np.all(b > a): print('Yes!')
But I'm pretty sure you don't want either of these. You want to broadcast the whole if/else over the array.
You could of course wrap the if/else logic for a single value in a function, then explicitly vectorize it and call it:
>>> def mymax(a, b):
...     if b > a:
...         return b
...     else:
...         return a
>>> vmymax = np.vectorize(mymax)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
This is worth knowing how to do… but very rarely worth doing. There's usually a more indirect way to do it using natively-vectorized functions—and often a more direct way, too.
One way to do it indirectly is by using the fact that True and False are numerical 1 and 0:
>>> (b>a)*b + (b<=a)*a
array([2, 4, 3, 5, 5])
This adds 1*b[i] + 0*a[i] when b>a, and 0*b[i] + 1*a[i] when b<=a. A bit ugly, but not too hard to understand. There are clearer, but more verbose, ways to write this.
But let's look for an even better, direct solution.
First, notice that your mymax function will do exactly the same as Python's built-in max, for 2 values:
>>> vmymax = np.vectorize(max)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
Then consider that for something so useful, numpy probably already has it. And a quick search will turn up maximum:
>>> np.maximum(a, b)
array([2, 4, 3, 5, 5])
Here's another way of achieving this:
c = np.array([y if y>z else z for y,z in zip(a,b)])
The following methods also work:
Use numpy.maximum
>>> np.maximum(a, b)
Use numpy.max and numpy.vstack
>>> np.max(np.vstack((a, b)), axis=0)
May not be the most efficient one but this is a more suitable answer to the original question:
import numpy as np

c = np.zeros(shape=(5,1))
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])

for i in range(5):
    if b.item(i) > a.item(i):
        c[i] = b.item(i)
    else:
        c[i] = a.item(i)
I need to generate a lot of random numbers. I've tried using random.random, but this function is quite slow, so I switched to numpy.random.random, which is way faster! So far so good. The generated random numbers are actually used to calculate something (based on the number). I therefore enumerate over each number and replace the value. This seems to kill all my previously gained speedup. Here are the stats, generated with timeit():
test_random - no enumerate
0.133111953735
test_np_random - no enumerate
0.0177130699158
test_random - enumerate
0.269361019135
test_np_random - enumerate
1.22525310516
As you can see, generating the numbers is almost 10 times faster using numpy, but enumerating over those numbers erases that gain; the numpy version even ends up slower overall.
Below is the code that I'm using:
import numpy as np
import timeit
import random

NBR_TIMES = 10
NBR_ELEMENTS = 100000

def test_random(do_enumerate=False):
    y = [random.random() for i in range(NBR_ELEMENTS)]
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value, in reality this will be some function of 'item'
            y[index] = 1 + item

def test_np_random(do_enumerate=False):
    y = np.random.random(NBR_ELEMENTS)
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value, in reality this will be some function of 'item'
            y[index] = 1 + item

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test_random()", "from __main__ import test_random")
    print("test_random - no enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_np_random()", "from __main__ import test_np_random")
    print("test_np_random - no enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_random(True)", "from __main__ import test_random")
    print("test_random - enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_np_random(True)", "from __main__ import test_np_random")
    print("test_np_random - enumerate")
    print(t.timeit(NBR_TIMES))
What's the best way to speed this up and why does enumerate slow things down so dramatically?
EDIT: the reason I use enumerate is because I need both the index and the value of the current element.
To take full advantage of numpy's speed, you want to create ufuncs whenever possible. Applying vectorize to a function as mgibsonbr suggests is one way to do that, but a better way, if possible, is simply to construct a function that takes advantage of numpy's built-in ufuncs. So something like this:
>>> import numpy
>>> a = numpy.random.random(10)
>>> a + 1
array([ 1.29738145,  1.33004628,  1.45825441,  1.46171177,  1.56863326,
        1.58502855,  1.06693054,  1.93304272,  1.66056379,  1.91418473])
>>> (a + 1) * 0.25 / 4
array([ 0.08108634,  0.08312789,  0.0911409 ,  0.09135699,  0.09803958,
        0.09906428,  0.06668316,  0.12081517,  0.10378524,  0.11963655])
What is the nature of the function you want to apply across the numpy array? If you tell us, perhaps we can help you come up with a version that uses only numpy ufuncs.
It's also possible to generate an array of indices without using enumerate. Numpy provides ndenumerate, which is an iterator, and probably slower, but it also provides indices, which is a very quick way to generate the indices corresponding to the values in an array. So...
>>> numpy.indices(a.shape)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
So to be more explicit, you can use the above and combine them using numpy.rec.fromarrays:
>>> a = numpy.random.random(10)
>>> ind = numpy.indices(a.shape)
>>> numpy.rec.fromarrays([ind[0], a])
rec.array([(0, 0.092473494150913438), (1, 0.20853257641948986),
           (2, 0.35141455604686067), (3, 0.12212258656960817),
           (4, 0.50986868372639049), (5, 0.0011439325711705139),
           (6, 0.50412473457942508), (7, 0.28973489788728601),
           (8, 0.20078799423168536), (9, 0.34527678271856999)],
          dtype=[('f0', '<i8'), ('f1', '<f8')])
It's starting to sound like your main concern is performing the operation in-place. That's harder to do using vectorize but it's easy with the ufunc approach:
>>> def somefunc(a):
...     a += 1
...     a /= 15
...
>>> a = numpy.random.random(10)
>>> b = a
>>> somefunc(a)
>>> a
array([ 0.07158446,  0.07052393,  0.07276768,  0.09813235,  0.09429439,
        0.08561703,  0.11204622,  0.10773558,  0.11878885,  0.10969279])
>>> b
array([ 0.07158446,  0.07052393,  0.07276768,  0.09813235,  0.09429439,
        0.08561703,  0.11204622,  0.10773558,  0.11878885,  0.10969279])
As you can see, numpy performs these operations in-place.
Check numpy.vectorize; it should let you apply arbitrary functions to numpy arrays. For your simple example, you'd do something like this:
vecFunc = np.vectorize(lambda x: x + 1)
vecFunc(y)
However, that will create a new numpy array instead of modifying it in-place (which may or may not be a problem in your particular case).
In general, you'll always be better off manipulating numpy structures with numpy functions than iterating with python functions, since the former are not only optimized but implemented in C, while the latter are always interpreted.
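To tie this back to the question's workload, here is a minimal sketch contrasting the two approaches (timings are machine-dependent):

import numpy as np
import timeit

y = np.random.random(100000)

def loop_add(arr):
    # Interpreted Python loop touching each element individually.
    out = np.empty_like(arr)
    for index, item in enumerate(arr):
        out[index] = 1 + item
    return out

t_vec = timeit.timeit(lambda: y + 1, number=100)         # single ufunc call in C
t_loop = timeit.timeit(lambda: loop_add(y), number=100)  # Python-level iteration
print(t_vec, t_loop)  # the ufunc version is typically orders of magnitude faster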