Better alternative to nested for loops through arrays in numpy? - python

Often I need to traverse an array and perform some operation on each entry, where the operation may depend on the indices and the value of the entry. Here is a simple example.
import numpy as np
N=10
M = np.zeros((N,N))
for i in range(N):
    for j in range(N):
        M[i,j] = 1/((i+2*j+1)**2)
Is there a shorter, cleaner, or more pythonic way to perform such tasks?

What you show is 'pythonic' in the sense that it uses a Python list and iteration approach. The only use of numpy is in assigning the values with M[i,j] = ...; Python lists don't accept that kind of index.
To make most use of numpy, make index grids or arrays, and calculate all values at once, without explicit loop. For example, in your case:
In [333]: N=10
In [334]: I,J = np.ogrid[0:10,0:10]
In [335]: I
Out[335]:
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])
In [336]: J
Out[336]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [337]: M = 1/((I + 2*J + 1)**2)
In [338]: M
Out[338]:
array([[ 1. , 0.11111111, 0.04 , 0.02040816, 0.01234568,
         0.00826446, 0.00591716, 0.00444444, 0.00346021, 0.00277008],
       ...
       [ 0.01 , 0.00694444, 0.00510204, 0.00390625, 0.00308642,
         0.0025 , 0.00206612, 0.00173611, 0.00147929, 0.00127551]])
ogrid is one of several ways of constructing sets of arrays that can be 'broadcast' together; meshgrid is another common function.
In your case, the equation is one that works well with 2 arrays like this. It depends very much on broadcasting rules, which you should study.
If the function only takes scalar inputs, we will have to use some form of iteration. That has been a frequent SO question; search for [numpy] vectorize.
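As a minimal sketch of that case, assume a hypothetical function `scalar_only` whose `if` branch only works on single values. `np.vectorize` will broadcast the index grids and call it element by element; it is a convenience wrapper around a Python-level loop, not a speed-up. For this particular function an `np.where` form happens to exist, which is shown for comparison.

```python
import numpy as np

def scalar_only(i, j):
    # The `if` needs a single True/False, so this only works on scalars;
    # called with arrays it would raise "truth value ... is ambiguous".
    if i > j:
        return 0.0
    return 1 / ((i + 2 * j + 1) ** 2)

N = 10
I, J = np.ogrid[0:N, 0:N]

# np.vectorize broadcasts I (N, 1) against J (1, N) and calls scalar_only
# once per element, collecting the results into a float array.
f = np.vectorize(scalar_only, otypes=[float])
M = f(I, J)

# Truly vectorized equivalent for this specific function.
M_where = np.where(I > J, 0.0, 1 / ((I + 2 * J + 1) ** 2))
```

If a branch can be rewritten with `np.where` or boolean masks, that is usually preferable to `np.vectorize`.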

np.fromfunction is intended for that:
def f(i,j):
    return 1/((i+2*j+1)**2)
M = np.fromfunction(f,(N,N))
It's slightly slower than the 'hand made' vectorised way, but easy to understand.
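One detail worth knowing: `np.fromfunction` does not call `f` once per element. It builds full index arrays (float by default) and calls `f` a single time with them, so `f` must already work on arrays. The sketch below records what `f` receives; the `calls` list is only there for illustration.

```python
import numpy as np

calls = []

def f(i, j):
    # Record the type, shape and dtype of what np.fromfunction passes in.
    calls.append((type(i), i.shape, i.dtype))
    return 1 / ((i + 2 * j + 1) ** 2)

N = 10
M = np.fromfunction(f, (N, N))

# f ran once, on whole (N, N) float index arrays.
print(len(calls), calls[0])
```

This is why a function containing a plain `if` on `i` or `j` will fail inside `np.fromfunction`.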

I would say that's the most straightforward and universally understood way of performing that iteration.
An alternative would be to iterate over the (i, j) index pairs and call a function for each pair:
import itertools
import numpy as np
N = 10
M = np.zeros((N,N))
def do_work(i, j):
    M[i,j] = 1/((i+2*j+1)**2)
[do_work(i, j) for (i, j) in itertools.product(range(N), range(N))]
Here I just used itertools.product to create an iterator over all possible (i, j) values; you can just as well use a for loop:
for (i, j) in itertools.product(range(N), range(N)):
    M[i,j] = 1/((i+2*j+1)**2)
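NumPy ships its own index iterator, `np.ndindex`, which yields the same (i, j) tuples as `itertools.product(range(N), range(N))` in row-major order. A minimal sketch of the same loop with it (still a Python-level loop, so no faster):

```python
import numpy as np

N = 10
M = np.zeros((N, N))

# np.ndindex(N, N) yields (0, 0), (0, 1), ..., (N-1, N-1) in C order.
for i, j in np.ndindex(N, N):
    M[i, j] = 1 / ((i + 2 * j + 1) ** 2)
```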

Yes, you can do this in pure NumPy without using any loops:
import numpy as np
N = 10
i = np.arange(N)[:, np.newaxis]
j = np.arange(N)
M = 1/((i+2*j+1)**2)
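The reason this works is broadcasting: i has shape (N, 1) and j has shape (N,), so NumPy stretches both to shape (N, N) and evaluates the expression for every (i, j) pair. Mixing a column vector with a row vector in an elementwise expression therefore behaves like an outer operation.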
Moreover, since this is pure NumPy, the code will also run a lot faster.
For example, for N=10**4, the double for loop version takes 48.3 seconds on my computer, whereas this code is already finished after only 1.2 seconds.
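To check the broadcasting claim directly, the sketch below prints the broadcast shape and compares the vectorized result against the double loop. It assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available; timings are left out because they depend on the machine.

```python
import numpy as np

N = 10
i = np.arange(N)[:, np.newaxis]   # shape (N, 1): a column
j = np.arange(N)                  # shape (N,): acts as a row (1, N)

# Broadcasting stretches both operands to the common shape (N, N).
print(np.broadcast_shapes(i.shape, j.shape))

M_vec = 1 / ((i + 2 * j + 1) ** 2)

# Reference result from the explicit double loop.
M_loop = np.zeros((N, N))
for a in range(N):
    for b in range(N):
        M_loop[a, b] = 1 / ((a + 2 * b + 1) ** 2)

print(np.allclose(M_vec, M_loop))
```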

Related

What is difference b/w Python Range() vs Numpy.arange() function?

I learned from a web search that numpy.arange takes less space than Python's range function, but when I tried it with the code below I got a different result.
import sys
import numpy as np
x = range(1,10000)
print(sys.getsizeof(x)) # --> Output is 48
a = np.arange(1,10000,1,dtype=np.int8)
print(sys.getsizeof(a)) # --> Output is 10095
Could anyone please explain?
In PY3, range is an object that can generate a sequence of numbers; it is not the actual sequence. You may need to brush up on some basic Python reading, paying attention to things like lists and generators, and their differences.
In [359]: x = range(3)
In [360]: x
Out[360]: range(0, 3)
We have to use something like list or a list comprehension to actually create those numbers:
In [361]: list(x)
Out[361]: [0, 1, 2]
In [362]: [i for i in x]
Out[362]: [0, 1, 2]
A range is often used in a for i in range(3): print(i) kind of loop.
arange is a numpy function that produces a numpy array:
In [363]: arr = np.arange(3)
In [364]: arr
Out[364]: array([0, 1, 2])
We can iterate on such an array, but it is slower than [362]:
In [365]: [i for i in arr]
Out[365]: [0, 1, 2]
But for doing math, the array is much better:
In [366]: arr * 10
Out[366]: array([ 0, 10, 20])
The array can also be created from the list [361] (and for compatibility with earlier Py2 usage from the range itself):
In [376]: np.array(list(x)) # np.array(x)
Out[376]: array([0, 1, 2])
But this is slower than using arange directly (that's an implementation detail).
Despite the similarity in names, these shouldn't be seen as simple alternatives. Use range in basic Python constructs such as for loop and comprehension. Use arange when you need an array.
An important innovation in Python (compared to earlier languages) is that we could iterate directly on a list. We didn't have to step through indices. And if we needed indices along with values, we could use enumerate:
In [378]: alist = ['a','b','c']
In [379]: for i in range(3): print(alist[i]) # index iteration
a
b
c
In [380]: for v in alist: print(v) # iterate on list directly
a
b
c
In [381]: for i,v in enumerate(alist): print(i,v) # index and values
0 a
1 b
2 c
Thus you might not see range used that much in basic Python code.
The range type constructor creates range objects, which represent sequences of integers with a start, stop, and step in a space-efficient manner, calculating the values on the fly.
The np.arange function returns a numpy.ndarray object, which is essentially a wrapper around a primitive array. This is a fast and relatively compact representation compared to a Python list such as list(range(N)), but range objects are even more space-efficient: they take constant space, so for all practical purposes range(a) is the same size as range(b) for any integers a, b.
As an aside, take care when interpreting the results of sys.getsizeof; you must understand what it is measuring. For a list it counts only the internal array of pointers, not the int objects they refer to, so do not naively compare the sizes of Python lists and numpy.ndarray objects.
Perhaps whatever you read was referring to Python 2, where range returned a list. List objects generally require more space than numpy.ndarray objects.
arange stores each individual value of the array, while range stores only three values (start, stop and step). That's why arange takes more space than range.
As the question is about size, this is the answer.
But numpy arrays (including those made with arange) have many advantages over Python lists from a speed, space and efficiency perspective.
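A small sketch of the scaling difference: a range object's size does not grow with its length, while an array's data buffer (`nbytes`) grows linearly. It uses `int64` because `int8` can only represent -128 to 127 and therefore cannot hold the values 1 to 9999 from the question.

```python
import sys
import numpy as np

small, big = range(10), range(10**6)
# A range stores only start/stop/step, so its size is independent of length.
print(sys.getsizeof(small), sys.getsizeof(big))

for n in (10, 10**6):
    arr = np.arange(n, dtype=np.int64)
    # nbytes counts only the data buffer: n elements * itemsize bytes.
    print(n, arr.nbytes, arr.nbytes == n * arr.itemsize)
```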

Increment all entries in an array by 'n' without a for loop

I have an array:
arr = [5,5,5,5,5,5]
I want to increment a particular range in the arr by 'n'. So if n=2 and the range is [2,5].
The array should look like this:
arr = [5,5,7,7,7,5]
I need to do this without a for loop, for a problem I'm trying to solve.
Tried:
arr[2:5] = [n]*3
but that obviously replaces the entries, so (with n=3) it becomes:
arr = [5,5,3,3,3,5]
Any suggestions would be highly appreciated.
n = 2
arr_range = slice(2, 5)
arr = [5,5,5,5,5,5]
arr[arr_range] = map(lambda x: x+n, arr[arr_range])
# arr
# [5, 5, 7, 7, 7, 5]
But I would recommend using numpy...
import numpy as np
n = 2
arr_range = slice(2, 5)
arr = np.array([5,5,5,5,5,5])
arr[arr_range] += n
You actually have a list, not an array. If you convert it to a Numpy array it is simple.
>>> n=3
>>> arr = np.array([5,5,5,5,5,5])
>>> arr[2:5] += n
>>> arr
array([5, 5, 8, 8, 8, 5])
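If the rest of the program needs a plain list, one option is to round-trip through NumPy. A minimal sketch; `increment_range` is a hypothetical helper name, and the half-open `start:stop` slice matches the `arr[2:5]` used above.

```python
import numpy as np

def increment_range(values, start, stop, n):
    """Return a new list with values[start:stop] increased by n."""
    arr = np.array(values)
    arr[start:stop] += n      # in-place add on the slice (a view of arr)
    return arr.tolist()       # back to a list of Python ints

print(increment_range([5, 5, 5, 5, 5, 5], 2, 5, 2))
```

The conversion costs O(N) each way, so this only pays off if you do more array work than a single increment.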
You have basically two options (for code see below):
Use slice assignment via a list comprehension (a[:] = [x+1 for x in a]),
Use a for-loop (even though you exclude this in your question, I don't see a legitimate reason for doing so).
They come with pros and cons. Let's assume you are going to replace some fraction of the list items (as opposed to a fixed number of items). The for-loop indexes and updates each item in Python bytecode and hence might be slower, but it has O(1) memory usage. The list comprehension has less per-item interpreter overhead and the slice assignment itself runs in C (assuming you are using CPython), but together they have O(N) memory usage due to the temporary list.
Using a generator doesn't buy anything since it is converted to a list anyway before the assignment happens (this is necessary because if the generator had fewer or more items than the slice, the list would need to be resized accordingly; see the source code).
Using a map adds even more overhead since it needs to call the mapped function on every item.
The following is a performance comparison of the different methods. The for-loop is fastest for very small lists since it has minimal overhead (just the range object). For more than about a dozen items, the list comprehension clearly outperforms the other methods and especially for larger lists (len(a) > 3e5) the difference to the generator becomes noticeable (the generator cannot provide information about its size, so the generated list needs to be resized as more items are fetched). For very large lists the difference between for-loop and list comprehension seems to shrink again since the memory overhead tends to outweigh the loop cost, but reaching that point would require unusually large lists (where you'd be better off using something like Numpy anyway).
This is the code using the perfplot package:
import numpy
import perfplot
def use_generator(a):
    i = slice(0, len(a)//2)
    a[i] = (x+1 for x in a[i])
def use_map(a):
    i = slice(0, len(a)//2)
    a[i] = map(lambda x: x+1, a[i])
def use_list(a):
    i = slice(0, len(a)//2)
    a[i] = [x+1 for x in a[i]]
def use_loop(a):
    for i in range(len(a)//2):
        a[i] += 1
perfplot.show(
    setup=lambda n: [0]*n,
    kernels=[use_generator, use_map, use_list, use_loop],
    n_range=[2**k for k in range(1, 26)],
    xlabel="len(a)",
    equality_check=None,
)
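perfplot is a third-party package. If it is not installed, the standard library's `timeit` gives rough numbers for two of the kernels. A minimal sketch, assuming CPython; absolute timings vary between machines, so it prints them instead of asserting any ordering.

```python
import timeit

def use_list(a):
    i = slice(0, len(a) // 2)
    a[i] = [x + 1 for x in a[i]]

def use_loop(a):
    for i in range(len(a) // 2):
        a[i] += 1

# Sanity check: both kernels increment exactly the first half of the list.
x, y = [0] * 10, [0] * 10
use_list(x)
use_loop(y)
assert x == y == [1] * 5 + [0] * 5

results = {}
for n in (10, 10_000, 100_000):
    data = [0] * n
    for kernel in (use_list, use_loop):
        # Each call mutates `data` in place; the work per call stays the
        # same, so reusing one list is fine for timing purposes.
        best = min(timeit.repeat(lambda: kernel(data), number=10, repeat=3))
        results[(kernel.__name__, n)] = best / 10
        print(f"{kernel.__name__:>8}  n={n:>7}  {best / 10 * 1e6:10.1f} us per call")
```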

Numpy: find row-wise common element efficiently

Suppose we are given two 2D numpy arrays a and b with the same number of rows. Assume furthermore that we know that each row i of a and b has at most one element in common, though this element may occur multiple times. How can we find this element as efficiently as possible?
An example:
import numpy as np
a = np.array([[1, 2, 3],
[2, 5, 2],
[5, 4, 4],
[2, 1, 3]])
b = np.array([[4, 5],
[3, 2],
[1, 5],
[0, 5]])
desiredResult = np.array([[np.nan],
[2],
[5],
[np.nan]])
It is easy to come up with a straightforward implementation by applying intersect1d along the first axis:
from itertools import starmap
common = list(starmap(np.intersect1d, zip(a, b)))  # one (possibly empty) array per row
Apparently, using Python's built-in set operations is even quicker. Converting the result to the desired form is easy.
However, I need an implementation that is as efficient as possible. Hence, I do not like the starmap, as I suppose that it requires a Python call for every row. I would like a purely vectorized option, and would be happy if it even exploited our additional knowledge that there is at most one common value per row.
Does anyone have ideas how I could speed up the task and implement the solution more elegantly? I would be okay with using C code or Cython, but the coding effort should not be too much.
Approach #1
Here's a vectorized one based on searchsorted2d -
# Sort each row of a and b in-place
a.sort(1)
b.sort(1)
# Use 2D searchsorted row-wise between a and b
idx = searchsorted2d(a,b)
# "Clip-out" out of bounds indices
idx[idx==a.shape[1]] = 0
# Get mask of valid ones i.e. matches
mask = np.take_along_axis(a,idx,axis=1)==b
# Use argmax to get first match as we know there's at most one match
match_val = np.take_along_axis(b,mask.argmax(1)[:,None],axis=1)
# Finally use np.where to choose between valid match
# (decided by any one True in each row of mask)
out = np.where(mask.any(1)[:,None],match_val,np.nan)
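Approach #1 calls a helper `searchsorted2d` that is not defined here. The sketch below supplies one possible implementation; the helper body and the wrapper name `common_elem_sorted` are assumptions, not the answerer's code. It shifts row i by i times a span larger than the whole value range so that a single 1-D `np.searchsorted` handles all rows, and it sorts copies rather than sorting the inputs in place.

```python
import numpy as np

def searchsorted2d(a, b):
    """Row-wise np.searchsorted(a[i], b[i]) for every row i.

    Hypothetical helper. Rows of `a` must be sorted. Row i of both arrays
    is shifted by i * span, with span larger than the value range of a and
    b, so shifted rows occupy disjoint intervals.
    """
    m, n = a.shape
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    span = hi - lo + 1
    offset = span * np.arange(m)[:, None]
    pos = np.searchsorted((a + offset).ravel(), (b + offset).ravel())
    # Convert flat positions back to per-row positions in [0, n].
    return pos.reshape(m, -1) - n * np.arange(m)[:, None]

def common_elem_sorted(a, b):
    """Approach #1 from the answer, working on sorted copies."""
    a = np.sort(a, axis=1)
    b = np.sort(b, axis=1)
    idx = searchsorted2d(a, b)
    idx[idx == a.shape[1]] = 0                      # clip out-of-bounds
    mask = np.take_along_axis(a, idx, axis=1) == b  # True where b[i,k] is in a[i]
    match_val = np.take_along_axis(b, mask.argmax(1)[:, None], axis=1)
    return np.where(mask.any(1)[:, None], match_val, np.nan)

a = np.array([[1, 2, 3], [2, 5, 2], [5, 4, 4], [2, 1, 3]])
b = np.array([[4, 5], [3, 2], [1, 5], [0, 5]])
out = common_elem_sorted(a, b)
print(out.ravel())
```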
Approach #2
Numba-based one for memory efficiency -
from numba import njit
@njit(parallel=True)
def numba_f1(a,b,out):
    n,a_ncols = a.shape
    b_ncols = b.shape[1]
    for i in range(n):
        for j in range(a_ncols):
            for k in range(b_ncols):
                m = a[i,j]==b[i,k]
                if m:
                    break
            if m:
                out[i] = a[i,j]
                break
    return out
def find_first_common_elem_per_row(a,b):
    out = np.full(len(a),np.nan)
    numba_f1(a,b,out)
    return out
Approach #3
Here's another vectorized one based on stacking and sorting -
r = np.arange(len(a))
ab = np.hstack((a,b))
idx = ab.argsort(1)
ab_s = ab[r[:,None],idx]
m = ab_s[:,:-1] == ab_s[:,1:]
m2 = (idx[:,1:]*m)>=a.shape[1]
m3 = m & m2
out = np.where(m3.any(1),b[r,idx[r,m3.argmax(1)+1]-a.shape[1]],np.nan)
Approach #4
For an elegant one, we can make use of broadcasting for a resource-hungry method -
m = (a[:,None]==b[:,:,None]).any(2)
out = np.where(m.any(1),b[np.arange(len(a)),m.argmax(1)],np.nan)
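Approach #4 can be checked on the question's example as-is. The sketch wraps it in a function (the name `common_elem_broadcast` is made up) and renames the mask for readability. It returns a 1-D array; append `[:, None]` if you need the (rows, 1) shape of `desiredResult`.

```python
import numpy as np

def common_elem_broadcast(a, b):
    # hits[i, k] is True when b[i, k] occurs anywhere in row a[i].
    # The intermediate comparison has shape (rows, b_cols, a_cols),
    # which is where the memory cost of this approach comes from.
    hits = (a[:, None] == b[:, :, None]).any(2)
    rows = np.arange(len(a))
    # argmax picks the first True per row; rows without any hit get NaN.
    return np.where(hits.any(1), b[rows, hits.argmax(1)], np.nan)

a = np.array([[1, 2, 3], [2, 5, 2], [5, 4, 4], [2, 1, 3]])
b = np.array([[4, 5], [3, 2], [1, 5], [0, 5]])
result = common_elem_broadcast(a, b)
print(result)
```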
Doing some research, I found that checking whether two lists are disjoint runs in O(n+m) expected time, where n and m are the lengths of the lists (see here). The idea is that insertion and lookup of elements run in constant time on average for hash maps. Therefore, inserting all elements from the first list into a hash map takes O(n) operations, and checking for each element in the second list whether it is already in the hash map takes O(m) operations. Therefore, solutions based on sorting, which run in O(n log(n) + m log(m)), are not asymptotically optimal.
Though the solutions by @Divakar are highly efficient in many use cases, they are less efficient if the second dimension is large. In that case, a solution based on hash maps is better suited. I have implemented it in Cython as follows:
# distutils: language = c++
# (the libcpp containers below require compiling the module as C++)
import numpy as np
cimport numpy as np
import cython
from libc.math cimport NAN
from libcpp.unordered_map cimport unordered_map
np.import_array()
@cython.boundscheck(False)
@cython.wraparound(False)
def get_common_element2d(np.ndarray[double, ndim=2] arr1,
                         np.ndarray[double, ndim=2] arr2):
    cdef np.ndarray[double, ndim=1] result = np.empty(arr1.shape[0])
    cdef int dim1 = arr1.shape[1]
    cdef int dim2 = arr2.shape[1]
    cdef int i, j
    cdef unordered_map[double, int] tmpset = unordered_map[double, int]()
    for i in range(arr1.shape[0]):
        for j in range(dim1):
            # insert arr1[i, j] as key without assigned value
            tmpset[arr1[i, j]]
        for j in range(dim2):
            # check whether arr2[i, j] is in tmpset
            if tmpset.count(arr2[i,j]):
                result[i] = arr2[i,j]
                break
        else:
            # loop finished without break: no common element in this row
            result[i] = NAN
        tmpset.clear()
    return result
I have created test cases as follows:
import numpy as np
import timeit
from itertools import starmap
from mycythonmodule import get_common_element2d
m, n = 3000, 3000
a = np.random.rand(m, n)
b = np.random.rand(m, n)
for i, row in enumerate(a):
    if np.random.randint(2):
        common = np.random.choice(row, 1)
        b[i][np.random.choice(np.arange(n), np.random.randint(min(n,20)), False)] = common
# we need to copy the arrays on each test run, otherwise they
# will remain sorted, which would bias the results
%timeit [set(aa).intersection(bb) for aa, bb in zip(a.copy(), b.copy())]
# returns 3.11 s ± 56.8 ms
%timeit list(starmap(np.intersect1d, zip(a.copy(), b.copy())))
# returns 1.83 s ± 55.4
# test sorting method
# divakarsMethod1 is approach #1 in @Divakar's answer
%timeit divakarsMethod1(a.copy(), b.copy())
# returns 1.88 s ± 18 ms
# test hash map method
%timeit get_common_element2d(a.copy(), b.copy())
# returns 1.46 s ± 22.6 ms
These results seem to indicate that the naive approach is actually better than some vectorized versions. However, the vectorized algorithms show their strengths when many rows with fewer columns are considered (a different use case). In these cases, the vectorized approaches are more than 5 times faster than the naive approach, and the sorting method turns out to be the best.
Conclusion: I will go with the hash-map-based Cython version, because it is among the most efficient variants in both use cases. If Cython were not already set up, I would use the sorting-based method instead.
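The same hash-set idea can be sketched in plain Python with the built-in `set`, without Cython. It is expected to be slower than the compiled version because every element passes through the interpreter, but it needs no build step. The function name `common_elem_sets` is hypothetical.

```python
import numpy as np

def common_elem_sets(a, b):
    """Per-row hash-set lookup; NaN where a row pair shares no element."""
    out = np.full(len(a), np.nan)
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        seen = set(row_a.tolist())      # O(n) inserts on average
        for x in row_b.tolist():        # O(m) membership tests on average
            if x in seen:
                out[i] = x
                break                   # at most one common value per row
    return out

a = np.array([[1, 2, 3], [2, 5, 2], [5, 4, 4], [2, 1, 3]])
b = np.array([[4, 5], [3, 2], [1, 5], [0, 5]])
print(common_elem_sets(a, b))
```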
Not sure if this is faster, but we can try a couple things here:
Method 1 np.intersect1d with list comprehension
[np.intersect1d(arr[0], arr[1]) for arr in list(zip(a,b))]
# Out
[array([], dtype=int32), array([2]), array([5]), array([], dtype=int32)]
Or to list:
[np.intersect1d(arr[0], arr[1]).tolist() for arr in list(zip(a,b))]
# Out
[[], [2], [5], []]
Method 2 set with list comprehension:
[list(set(arr[0]) & set(arr[1])) for arr in list(zip(a,b))]
# Out
[[], [2], [5], []]

Iterate over numpy with index (numpy equivalent of python enumerate)

I'm trying to create a function that will calculate the lattice distance (number of horizontal and vertical steps) between elements in a multi-dimensional numpy array. For this I need to retrieve the actual numbers from the indexes of each element as I iterate through the array. I want to store those values as numbers that I can run through a distance formula.
For the example array A
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
I'd like to create a loop that iterates through each element and for the first element 1 it would retrieve a=0, b=0 since 1 is at A[0,0], then a=0, b=1 for element 2 as it is located at A[0,1], and so on...
My envisioned output is two numbers (corresponding to the two index values for that element) for each element in the array. So in the example above, it would be the two values that I am assigning to be a and b. I only will need to retrieve these two numbers within the loop (rather than save separately as another data object).
Any thoughts on how to do this would be greatly appreciated!
As I've become more familiar with the numpy and pandas ecosystem, it's become clearer to me that iteration is usually outright wrong because of how slow it is in comparison, and that writing code to use vectorized operations is best whenever possible. Though the style is not as obvious or Pythonic at first, I've (anecdotally) gained ridiculous speedups with vectorized operations: more than 1000x in one case where I replaced a row-wise .apply(lambda ...) call.
@MSeifert's answer covers this much better and will be significantly more performant on a dataset of any real size.
A more general answer by @cs95 covers and compares alternatives to iteration in pandas.
Original Answer
You can iterate through the values in your array with numpy.ndenumerate to get the indices of the values in your array.
Using the documentation above:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
for index, values in np.ndenumerate(A):
print(index, values) # operate here
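Applied to the question, the index tuple from `np.ndenumerate` can feed the lattice-distance formula directly. A minimal sketch; the reference point `target = (2, 3)` is an arbitrary choice for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
target = (2, 3)   # hypothetical reference point

D = np.empty(A.shape, dtype=int)
for (a, b), value in np.ndenumerate(A):
    # a, b are the row and column index of `value`
    D[a, b] = abs(a - target[0]) + abs(b - target[1])

print(D)
```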
You can do it using np.ndenumerate but generally you don't need to iterate over an array.
You can simply create a meshgrid (or open grid) to get all indices at once and you can then process them (vectorized) much faster.
For example
>>> x, y = np.mgrid[slice(A.shape[0]), slice(A.shape[1])]
>>> x
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])
>>> y
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])
and these can be processed like any other array. So if your function that needs the indices can be vectorized you shouldn't do the manual loop!
For example to calculate the lattice distance for each point to a point say (2, 3):
>>> abs(x - 2) + abs(y - 3)
array([[5, 4, 3],
       [4, 3, 2],
       [3, 2, 1]])
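`np.indices` is another way to get the same dense index arrays as the `np.mgrid` call, directly from a shape tuple. A short sketch confirming that both give identical grids and the same lattice distances:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# np.indices returns one integer index array per dimension.
x, y = np.indices(A.shape)
xm, ym = np.mgrid[slice(A.shape[0]), slice(A.shape[1])]
print(np.array_equal(x, xm), np.array_equal(y, ym))

dist = abs(x - 2) + abs(y - 3)
print(dist)
```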
For distances an ogrid would be faster. Just replace np.mgrid with np.ogrid:
>>> x, y = np.ogrid[slice(A.shape[0]), slice(A.shape[1])]
>>> np.hypot(x - 2, y - 3) # cartesian distance this time! :-)
array([[ 3.60555128,  2.82842712,  2.23606798],
       [ 3.16227766,  2.23606798,  1.41421356],
       [ 3.        ,  2.        ,  1.        ]])
Another possible solution:
import numpy as np
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
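for _, val in np.ndenumerate(A):
    ind = np.argwhere(A==val)
    print(val, ind)
In this case, if a value appears more than once in the array, you will obtain the indices of all of its occurrences. Note that each argwhere call scans the whole array, so this is much slower than using the index that ndenumerate already provides.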

Populate numpy matrix from the difference of two vectors

Is it possible to construct a numpy matrix from a function? In this case specifically the function is the absolute difference of two vectors: S[i,j] = abs(A[i] - B[j]). A minimal working example that uses regular python:
import numpy as np
A = np.array([1,3,6])
B = np.array([2,4,6])
S = np.zeros((3,3))
for i,x in enumerate(A):
    for j,y in enumerate(B):
        S[i,j] = abs(x-y)
Giving:
[[ 1.  3.  5.]
 [ 1.  1.  3.]
 [ 4.  2.  0.]]
It would be nice to have a construction that looks something like:
def build_matrix(shape, input_function, *args)
where I can pass an input function with its arguments and retain the speed advantage of numpy.
In addition to what @JoshAdel has suggested, you can also use the outer method of any numpy ufunc to do the broadcasting in the case of two arrays.
In this case, you just want np.subtract.outer(A, B) (Or, rather, the absolute value of it).
While either one is fairly readable for this example, in some cases broadcasting is more useful, while in others using ufunc methods is cleaner.
Either way, it's useful to know both tricks.
E.g.
import numpy as np
A = np.array([1,3,6])
B = np.array([2,4,6])
diff = np.subtract.outer(A, B)
result = np.abs(diff)
Basically, you can use outer, accumulate, reduce, and reduceat with any binary numpy ufunc such as subtract, multiply, divide, or even things like logical_and, etc. (Unary ufuncs such as abs don't support these methods.)
For example, for a 1-D array np.cumsum is equivalent to np.add.accumulate. This means you could implement something like a cumdiv with np.divide.accumulate if you ever needed to.
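A short sketch of those ufunc methods on the question's data: `outer` builds the pairwise differences, and `accumulate` gives running results of a binary operation (running sums for `add`, repeated division for `divide`).

```python
import numpy as np

A = np.array([1, 3, 6])
B = np.array([2, 4, 6])

# ufunc.outer: S[i, j] = A[i] - B[j], then take the absolute value.
S = np.abs(np.subtract.outer(A, B))

# ufunc.accumulate on a 1-D array.
x = np.array([1.0, 2.0, 4.0, 8.0])
running_sum = np.add.accumulate(x)   # same as np.cumsum(x)
cumdiv = np.divide.accumulate(x)     # 1, 1/2, 1/8, 1/64
print(S, running_sum, cumdiv, sep="\n")
```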
I recommend taking a look into numpy's broadcasting capabilities:
In [6]: np.abs(A[:,np.newaxis] - B)
Out[6]:
array([[1, 3, 5],
[1, 1, 3],
[4, 2, 0]])
http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
Then you could simply write your function as:
In [7]: def build_matrix(func,args):
...: return func(*args)
...:
In [8]: def f1(A,B):
...: return np.abs(A[:,np.newaxis] - B)
...:
In [9]: build_matrix(f1,(A,B))
Out[9]:
array([[1, 3, 5],
[1, 1, 3],
[4, 2, 0]])
This should also be considerably faster than your solution for larger arrays.
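For an input function that only accepts scalars, one option close to the requested `build_matrix` is `np.frompyfunc`, which wraps a Python function as a ufunc so that its `outer` method becomes available. A minimal sketch; `build_matrix` and `absdiff` are hypothetical names, and the signature differs from the one proposed in the question. The Python function still runs once per (i, j) pair, so this buys convenience rather than the speed of broadcasting.

```python
import numpy as np

def build_matrix(input_function, A, B):
    """S[i, j] = input_function(A[i], B[j]) for a scalar-only function."""
    ufunc = np.frompyfunc(input_function, 2, 1)   # 2 inputs, 1 output
    # frompyfunc ufuncs return object arrays; convert to float.
    return ufunc.outer(A, B).astype(float)

def absdiff(x, y):
    return abs(x - y)

A = np.array([1, 3, 6])
B = np.array([2, 4, 6])
S = build_matrix(absdiff, A, B)
print(S)
```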
