How can I optimize loops through a multidimensional array? - python

After gathering some 3D netCDF data on Python 3, I am in the process of looping through each x,y data point to calculate another variable. The calculation of this variable is dependent upon the z for a given x,y point. The code seems to be running correctly but is awfully slow; I am wondering if anyone has suggestions on how to optimize the code to have it run more quickly.
I've gone from a lengthier code that defined many intermediate variables to something rather bare bones, which is shown here. Even after trimming the code, it runs slowly (i.e., a few minutes for each i in the outer for loop).
for i in range(0,217):
    print(i)
    for j in range(0,301):
        for k in range(10,30):
            if (data.variables[longvars[v][2]][0][k][i][j]-data.variables[longvars[v][3]][0][i][j]) <= 3000.0:
                break
        if (abs(data.variables[longvars[v][2]][0][k][i][j]-data.variables[longvars[v][3]][0][i][j])-3000.) \
                < (abs(data.variables[longvars[v][2]][0][k-1][i][j]-data.variables[longvars[v][3]][0][i][j])-3000.):
            lev = k
        else:
            lev = k-1
        newd[i][j] = np.sqrt(((data.variables[longvars[v][0]][0][lev][i][j]-data.variables[longvars[v][4]][0][0][i][j])**2)+((data.variables[longvars[v][1]][0][lev][i][j]-data.variables[longvars[v][5]][0][0][i][j])**2))
I imagine there may be a way to do this with another array that stores the correct z (k) level for each x,y (i,j) point, then runs the calculation over the entire array of data. However, I don't know that it would be any faster. I appreciate any help that folks can provide!
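For what it's worth, that vectorized idea can be sketched roughly as follows, assuming the relevant fields are first read out of the netCDF file into plain NumPy arrays (all of the array names below are placeholders, and this follows the "closest to terrain + 3000 m" reading of the level search rather than the exact break-based loop):
import numpy as np

# Hypothetical arrays pulled out of the file once, up front; reading single
# elements inside nested loops is usually the real bottleneck:
#   height       : shape (nz, ny, nx)  -- data.variables[longvars[v][2]][0]
#   terrain      : shape (ny, nx)      -- data.variables[longvars[v][3]][0]
#   uwind, vwind : shape (nz, ny, nx)  -- fields used in the final calculation
#   uref, vref   : shape (ny, nx)      -- the reference fields at level 0

# Level (within 10..29) whose height is closest to terrain + 3000 m:
diff = np.abs(height[10:30] - terrain[np.newaxis, :, :] - 3000.0)
lev = diff.argmin(axis=0) + 10      # shape (ny, nx), one level per grid point

# Pick the chosen level at every (i, j) and finish the calculation in bulk:
ii, jj = np.indices(terrain.shape)
newd = np.sqrt((uwind[lev, ii, jj] - uref)**2 + (vwind[lev, ii, jj] - vref)**2)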

The logic looks sound, but we can optimize it a bit further using generators and comprehensions.
Let's isolate the inner logic into a function called findZValue.
def findZValue(v, i, j, variables, longvars, np):
Forgive me if I am reading this wrong, but it looks like you are trying to find the index of the value closest to 3000? If so, first we will make a generator that returns a tuple containing the index and the absolute value of "variable - variable - 3000":
def findZValue(v, i, j, variables, longvars, np):
    lev = ((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30))
In order to get the value we want, we wrap the whole thing in a min function (with the key saying we want it sorted by the second value) and specify we want to get the index (i.e. the first value in the tuple returned by min):
def findZValue(v, i, j, variables, longvars, np):
    lev = min(((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30)), key=lambda t: t[1])[0]
For the value put into "newd" it looks like you are taking the square root of the sum of the squares (i.e. the vector's magnitude, or norm). Luckily numpy (which is what I assume "np" is) has a built-in method for finding the norm of an array: np.linalg.norm. All we have to do is put the two differences into an np.array and then call it on them:
def findZValue(v, i, j, variables, longvars, np):
    lev = min(((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30)), key=lambda t: t[1])[0]
    return np.linalg.norm(np.array([variables[longvars[v][0]][0][lev][i][j] - variables[longvars[v][4]][0][0][i][j], variables[longvars[v][1]][0][lev][i][j] - variables[longvars[v][5]][0][0][i][j]]))
Now we can put the entire loop into a nested comprehension:
newd = [[findZValue(v, i, j, data.variables, longvars, np) for j in range(301)] for i in range(217)]

def findZValue(v, i, j, variables, longvars, np):
    lev = min(((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30)), key=lambda t: t[1])[0]
    return np.linalg.norm(np.array([variables[longvars[v][0]][0][lev][i][j] - variables[longvars[v][4]][0][0][i][j], variables[longvars[v][1]][0][lev][i][j] - variables[longvars[v][5]][0][0][i][j]]))
Using generators and comprehensions should speed things up over using for loops. But if you really want to crank things up we can use "multiprocessing". Specifically, a multiprocessing pool. In order to do so we will need to create a second function to handle each vector (this is due to restrictions on how multiprocessing pools work):
from multiprocessing import Pool

def findZValue(v, i, j, variables, longvars, np):
    lev = min(((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30)), key=lambda t: t[1])[0]
    return np.linalg.norm(np.array([variables[longvars[v][0]][0][lev][i][j] - variables[longvars[v][4]][0][0][i][j], variables[longvars[v][1]][0][lev][i][j] - variables[longvars[v][5]][0][0][i][j]]))

def findZValuesForVector(vector):
    return [findZValue(*values) for values in vector]

with Pool(processes=4) as pool:
    newd = pool.map(findZValuesForVector, [[[v, i, j, data.variables, longvars, np] for j in range(301)] for i in range(217)])
You can alter the number of "processes" created for the pool to see what gives you the best results.
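If it helps, a minimal, self-contained timing harness for trying out different pool sizes could look like the sketch below; work here is just a hypothetical stand-in for findZValuesForVector, and the chunk sizes are made up:
import time
from multiprocessing import Pool

def work(chunk):
    # stand-in for findZValuesForVector; replace with the real function
    return sum(chunk)

if __name__ == "__main__":
    chunks = [list(range(1000)) for _ in range(217)]
    for n in (1, 2, 4, 8):
        start = time.perf_counter()
        with Pool(processes=n) as pool:
            pool.map(work, chunks)
        print(n, "processes:", round(time.perf_counter() - start, 3), "s")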

Related

Fastest way to count duplicate integers in two distinct sections of an array

In this snippet of Python code, fun iterates through the array arr and counts the number of identical integers in two array sections for every section pair (it simulates a matrix). This makes n*(n-1)/2*m comparisons in total, giving a time complexity of O(n^2).
Are there programming solutions or ways of reframing this problem that would yield equivalent results but with reduced time complexity?
# n > 500000, 0 < i < n, m = 100
# dim(arr) = n*m, 0 < arr[x] < 4294967311
arr = mp.RawArray(ctypes.c_uint, n*m)

def fun(i):
    for j in range(i-1, 0, -1):
        count = 0
        for k in range(0, m):
            count += (arr[i*m+k] == arr[j*m+k])
        if count/m > 0.7:
            return (i, j)
    return ()
arr is a shared memory array, therefore it's best kept read-only for simplicity and performance reasons.
arr is implemented as a 1D RawArray from multiprocessing. The reason for this is that it has by far the fastest performance according to my tests. Using a numpy 2D array, for example, like this:
arr = np.ctypeslib.as_array(mp.RawArray(ctypes.c_uint, n*m)).reshape(n,m)
would provide vectorization capabilities, but increases the total runtime by an order of magnitude - 250s vs. 30s for n = 1500, which amounts to 733%.
Since you can't change the array characteristics at all, I think you're stuck with O(n^2). numpy would gain some vectorization, but would change the access for others sharing the array. Start with the innermost operation:
for k in range(0, m):
    count += (arr[i][k] == arr[j][k])
Change this to a one-line assignment:
count = sum(arr[i][k] == arr[j][k] for k in range(m))
Now, if this is truly an array, rather than a list of lists, use the array package's vectorization to simplify the loops, one at a time:
count = sum(arr[i] == arr[j]) # results in a vector of counts
You can now return the j indices where count[j] / m > 0.7. Note that there's no real need to return i for each one: it's constant within the function, and the calling program already has the value. Your array package likely has a pair of vectorized indexing operations that can return those indices. If you're using numpy, those are easy enough to look up on this site.
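With numpy, for example, that last step could look roughly like the sketch below (an assumption on my part: arr behaves like a 2D integer array of shape (n, m), and the function name is hypothetical):
import numpy as np

def matching_rows(arr, i, m):
    # counts[j] = number of positions where row j agrees with row i, for all j < i at once
    counts = (arr[:i] == arr[i]).sum(axis=1)
    # indices j where more than 70% of the entries match
    return np.flatnonzero(counts / m > 0.7)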
So after fiddling around some more, I was able to cut down the running time greatly with help from NumPy's vectorization and Numba's JIT compiler. Going back to the original code:
arr = mp.RawArray(ctypes.c_uint, n*m)

def fun(i):
    for j in range(i-1, 0, -1):
        count = 0
        for k in range(0, m):
            count += (arr[i*m+k] == arr[j*m+k])
        if count/m > 0.7:
            return (i, j)
    return ()
We can leave out the bottom return statement as well as dismiss the idea of using count entirely, leaving us with:
def fun(i):
    for j in range(i-1, 0, -1):
        if sum(arr[i*m+k] == arr[j*m+k] for k in range(m)) > 0.7*m:
            return (i, j)
Then, we change the array arr to a NumPy format:
np_arr = np.frombuffer(arr, dtype='int32').reshape(n, m)
The important thing to note here is that we do not use a NumPy array as a shared memory array to be written from multiple processes, avoiding the overhead pitfall.
Finally, we apply Numba's decorator and rewrite the sum function in vector form so that it works with the new array:
import numba as nb

@nb.njit(fastmath=True, parallel=True)
def fun(i):
    for j in range(i-1, 0, -1):
        if np.sum(np_arr[i] == np_arr[j]) > 0.7*m:
            return (i, j)
This reduced the running time to 7.9s, which is definitely a victory for me.

Combination of matrix elements giving non-zero value (PYTHON)

I have to evaluate the following expression, given two quite large matrices A,B and a very complicated function F:
[The mathematical expression to evaluate was given as an image in the original post.]
I was wondering whether there is an efficient way to first find the indices i,j that give a non-zero element after the multiplication of the matrices, so that I can avoid the quite slow for loops.
Current working code
# Starting with 4 random matrices
A = np.random.randint(0,2,size=(50,50))
B = np.random.randint(0,2,size=(50,50))
C = np.random.randint(0,2,size=(50,50))
D = np.random.randint(0,2,size=(50,50))

indices = []
for i in range(A.shape[0]):
    for j in range(A.shape[0]):
        if A[i,j] != 0:
            for k in range(B.shape[1]):
                if B[j,k] != 0:
                    for l in range(C.shape[1]):
                        if A[i,j]*B[j,k]*C[k,l]*D[l,i] != 0:
                            indices.append((i,j,k,l))
print indices
As you can see, in order to get the indices I need I have to use nested loops (= huge computational time).
My guess would be NO: you cannot avoid the for-loops. In order to find all the indices i,j you need to loop through all the elements, which defeats the purpose of this check. Therefore, you should go ahead and use simple elementwise multiplication and the dot product in numpy - it should be quite fast, with the for loops taken care of by numpy.
However, if you plan on using a Python loop then the answer is YES, you can avoid them by using numpy, using the following pseudo-code (=hand-waving):
i, j = np.indices((N, M)) # CAREFUL: you may need to swap i<->j or N<->M
fs = F(i, j, z) # array of values of function F
# for a given z over the index grid
R = np.dot(A*fs, B) # summation over j
# return R # if necessary do a summation over i: np.sum(R, axis=...)
If the issue is that computing fs = F(i, j, z) is a very slow operation, then you will have to identify the non-zero elements of A first, using loops built into numpy (so they are quite fast):
good = np.nonzero(A) # hidden double loop (for 2D data)
fs = np.zeros_like(A)
fs[good] = F(i[good], j[good], z) # compute F only where A != 0
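Putting the two fragments together, a rough end-to-end sketch might look like this; the shapes, the throwaway stand-in for F, and the final contraction are all assumptions to be adapted to the real expression:
import numpy as np

def F(i, j, z):
    # toy stand-in for the (expensive) function F, purely for illustration
    return np.sin(i * z) + np.cos(j * z)

N = M = 50
z = 0.3
A = np.random.randint(0, 2, size=(N, M))
B = np.random.randint(0, 2, size=(M, N))

i, j = np.indices((N, M))
fs = np.zeros(A.shape, dtype=float)   # float container, since F is real-valued here
good = np.nonzero(A)                  # only evaluate F where A != 0
fs[good] = F(i[good], j[good], z)

R = np.dot(A * fs, B)                 # summation over j, as in the pseudo-code above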

Surprising behaviour of enumerate function

I wrote some Python code using the enumerate function.
A = [2,3,5,7]
for i, x in enumerate(A):
    # calculate product with each element to the right
    for j, y in enumerate(A, start=i+1):
        print(x*y)
I expected it to calculate 6 products: 2*3, 2*5, 2*7, 3*5, 3*7, 5*7
Instead, it calculated all possible 16 products. What's going on?
The start parameter of enumerate solely influences the first value of the yielded tuple (i.e. i and j); it does not influence the index at which the enumeration starts. As the manual puts it, enumerate is equivalent to this:
def enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        yield n, elem
        n += 1
What you want is this:
for i, x in enumerate(A):
    for y in A[i + 1:]:
        print(x * y)
The question here is firstly what enumerate did, and secondly why you're using it. The base function of enumerate is to convert an iterable of the form (a,b,c) to an iterable of the form ((start,a), (start+1,b), (start+2,c)). It adds a new column which is typically used as an index; in your code, this is i and j. It doesn't change the entries contained in the sequence.
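A quick demonstration: the start argument shifts only the counter, never the elements that are yielded.
A = [2, 3, 5, 7]
print(list(enumerate(A)))           # [(0, 2), (1, 3), (2, 5), (3, 7)]
print(list(enumerate(A, start=2)))  # [(2, 2), (3, 3), (4, 5), (5, 7)] -- same elements, shifted counter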
I believe the operation you were intending is a slice, extracting only part of the list:
for i, x in enumerate(A):
    for y in A[i+1:]:
        print(x*y)
If it is important not to copy the list (it rarely is), you can replace A[i+1:] with itertools.islice(A, i+1, len(A)).
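For example, the copy-free variant would look like this (islice takes the same start and stop positions as the slice but yields items on demand instead of building a new list):
from itertools import islice

A = [2, 3, 5, 7]
for i, x in enumerate(A):
    for y in islice(A, i + 1, len(A)):  # like A[i+1:], but without copying the tail
        print(x * y)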
A side note is that the start argument may be useful in the outer loop in this code. We're only using i+1, not i, so we may as well use that value as our index:
for nextindex, x in enumerate(A, 1):
    for y in A[nextindex:]:
        print(x*y)

alternative (faster) way to 3 nested for loop python

How can I make this function faster? (I call it a lot of times, so any speed-up here could give a noticeable improvement.)
def vectorr(I, J, K):
    vect = []
    for k in range(0, K):
        for j in range(0, J):
            for i in range(0, I):
                vect.append([i, j, k])
    return vect
You can try to take a look at itertools.product
Equivalent to nested for-loops in a generator expression. For example,
product(A, B) returns the same as ((x,y) for x in A for y in B).
The nested loops cycle like an odometer with the rightmost element
advancing on every iteration. This pattern creates a lexicographic
ordering so that if the input’s iterables are sorted, the product
tuples are emitted in sorted order.
Also, there is no need to pass 0 when calling range(0, I) and so on - just use range(I).
So in your case it can be:
import itertools

def vectorr(I, J, K):
    # note: this yields tuples in (k, j, i) order, matching the loop nesting above;
    # reorder the ranges or the tuples if you need [i, j, k]
    return itertools.product(range(K), range(J), range(I))
You said you want it to be faster. Let's use NumPy!
import numpy as np

def vectorr(I, J, K):
    arr = np.empty((I*J*K, 3), int)
    arr[:,0] = np.tile(np.arange(I), J*K)
    arr[:,1] = np.tile(np.repeat(np.arange(J), I), K)
    arr[:,2] = np.repeat(np.arange(K), I*J)
    return arr
There may be even more elegant tweaks possible here, but that's a basic tiling that gives the same result (but as a 2D array rather than a list of lists). The code for this is all implemented in C, so it's very, very fast--this may be important if the input values may get somewhat large.
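As a quick sanity check, one might verify for small inputs that this NumPy version produces the same rows, in the same order, as the original triple loop (vectorr_loops below is just the questioner's function, renamed):
import numpy as np

def vectorr_loops(I, J, K):
    # the original triple-loop version, kept only for comparison
    vect = []
    for k in range(K):
        for j in range(J):
            for i in range(I):
                vect.append([i, j, k])
    return vect

assert np.array_equal(vectorr(3, 4, 5), np.array(vectorr_loops(3, 4, 5)))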
The other answers are more thorough and, in this specific case at least, better, but in general, if you're using Python 2, and for large values of I, J, or K, use xrange() instead of range(). xrange gives a generator-like object, instead of constructing a list, so you don't have to allocate memory for the entire list.
In Python 3, range works like Python 2's xrange.
import numpy

def vectorr(I, J, K):
    val = numpy.indices((I, J, K))
    val.shape = (3, -1)
    return val.transpose()  # or val.transpose().tolist()

Is there a way to avoid this memory error?

I'm currently working through the problems on Project Euler, and so far I've come up with this code for a problem.
from itertools import combinations
import time
def findanums(n):
    l = []
    for i in range(1, n + 1):
        s = []
        for j in range(1, i):
            if i % j == 0:
                s.append(j)
        if sum(s) > i:
            l.append(i)
    return l
start = time.time() #start time
limit = 28123
anums = findanums(limit + 1) #abundant numbers (1..limit)
print "done finding abundants", time.time() - start
pairs = combinations(anums, 2)
print "done finding combinations", time.time() - start
sums = map(lambda x: x[0]+x[1], pairs)
print "done finding all possible sums", time.time() - start
print "start main loop"
answer = 0
for i in range(1,limit+1):
    if i not in sums:
        answer += i
print "ANSWER:",answer
When I run this I run into a MemoryError.
The traceback:
File "test.py", line 20, in <module>
sums = map(lambda x: x[0]+x[1], pairs)
I've tried to prevent it by disabling garbage collection, based on what I was able to find on Google, but to no avail. Am I approaching this the wrong way? In my head this feels like the most natural way to do it and I'm at a loss at this point.
SIDE NOTE: I'm using PyPy 2.0 Beta2(Python 2.7.4) because it is so much faster than any other python implementation I've used, and Scipy/Numpy are over my head as I'm still just beginning to program and I don't have an engineering or strong math background.
As Kevin mentioned in the comments, your algorithm might be wrong, but in any case your code is not optimized.
When using very big lists, it is common to use generators, there is a famous, great Stackoverflow answer explaining the concepts of yield and generator - What does the "yield" keyword do in Python?
The problem is that your pairs = combinations(anums, 2) doesn't generate a generator, but generates a large object which is much more memory consuming.
I changed your code to use this function; since you iterate over the collection only once, you can lazily evaluate the values:
import itertools

def generator_sol(anums1, s):
    for comb in itertools.combinations(anums1, s):
        yield comb
Now, instead of pairs = combinations(anums, 2), which generates a large object, use:
pairs = generator_sol(anums, 2)
Then, instead of using the lambda I would use another generator:
sums_sol = (x[0]+x[1] for x in pairs)
Another tip: you had better look at xrange, which is more suitable here; it doesn't generate a list but an xrange object, which is more efficient in many cases (such as this one).
Let me suggest that you use generators. Try changing this:
sums = map(lambda x: x[0]+x[1], pairs)
to
sums = (a+b for (a,b) in pairs)
Ofiris's solution is also OK, but it implies that itertools.combinations returns a list, which is wrong. If you are going to keep solving Project Euler problems, have a look at the itertools documentation.
The issue is that anums is big - about 28000 elements long, so pairs must take 28000*28000*8 bytes = 6GB. If you used numpy you could cast anums as a numpy.int16 array, in which case the result pairs would be 1.5GB - more manageable:
import numpy as np
#cast
anums = np.array([anums],dtype=np.int16)
#compute the sum of all the pairs via outer product
pairs = (anums + anums.T).ravel()
