Speed up nested for-loops in python / going through numpy array - python

Say I have 4 numpy arrays A,B,C,D , each the size of (256,256,1792).
I want to go through each element of those arrays and do something to it, but I need to do it in chunks of 256x256x256-cubes.
My code looks like this:
for l in range(7):
    x, y, z, t = 0,0,0,0
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(256*l,256*(l+1)):
                t += D[m,n,o] * constant
                x += A[m,n,o] * D[m,n,o] * constant
                y += B[m,n,o] * D[m,n,o] * constant
                z += C[m,n,o] * D[m,n,o] * constant
    final = (x+y+z)/t
    doOutput(final)
The code works and outputs exactly what I want, but it's awfully slow. I've read online that these kinds of nested for loops should be avoided in python. What is the cleanest solution? (Right now I'm trying to do this part of my code in C and somehow import it via Cython or other tools, but I'd love a pure python solution.)
Thanks
Add on
Willem Van Onsem's Solution to the first part seems to work just fine and I think I comprehend it. But now I want to modify my values before summing them. It looks like
(within the outer l loop)
for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(256*l,256*(l+1)):
            R += (D[m,n,o] * constant * (A[m,n,o]**2
                  + B[m,n,o]**2 + C[m,n,o]**2)/t - final**2)
doOutput(R)
I obviously can't just square the sum x = (A[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()**2*constant since (A²+B²) != (A+B)²
How can I rewrite this last set of for loops?

Since you update t with every element of m in range(a.shape[0]), n in range(a.shape[1]) and o in range(256*l,256*(l+1)), you can substitute:
for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(256*l,256*(l+1)):
            t += D[m,n,o]
With:
t += D[:a.shape[0],:a.shape[1],256*l:256*(l+1)].sum()
The same for the other assignments. So you can rewrite your code to:
for l in range(7):
    Dsub = D[:a.shape[0],:a.shape[1],256*l:256*(l+1)]
    x = (A[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    y = (B[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    z = (C[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    t = Dsub.sum()*constant
    final = (x+y+z)/t
    doOutput(final)
Note that * in numpy is element-wise multiplication, not the matrix product. You could multiply by the constant before summing, but since the sum of products with a constant equals the constant times the sum, it is more efficient to do that multiplication once, outside the sum.
If a.shape[0] is equal to D.shape[0], and so on, you can use : instead of :a.shape[0]. Based on your question, that seems to be the case, so:
# only when `a.shape[0] == D.shape[0], a.shape[1] == D.shape[1] (and so for A, B and C)`
for l in range(7):
    Dsub = D[:,:,256*l:256*(l+1)]
    x = (A[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    y = (B[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    z = (C[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    t = Dsub.sum()*constant
    final = (x+y+z)/t
    doOutput(final)
Doing the .sum() at the numpy level boosts performance, since values are not converted back and forth between numpy and Python objects, and .sum() runs as a tight compiled loop.
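Going one step further than the answer above, the remaining loop over l can also be removed by splitting the last axis into chunks with reshape. This is only a sketch under assumptions: the arrays are small random stand-ins (side = 4 instead of 256), `constant` is a made-up value, `chunk_sums` is a hypothetical helper, and the arrays are assumed to be C-contiguous with a last axis that divides evenly into chunks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chunks, side = 7, 4          # small stand-ins for 7 chunks of 256
shape = (side, side, n_chunks * side)
A, B, C, D = (rng.random(shape) for _ in range(4))
constant = 2.0                 # hypothetical value, not from the question


def chunk_sums(arr):
    """Sum each (side, side, side) cube along the last axis separately."""
    # Split the last axis into (chunk index, position inside chunk),
    # then sum over every axis except the chunk index.
    return arr.reshape(side, side, n_chunks, side).sum(axis=(0, 1, 3))


t = chunk_sums(D) * constant
final = (chunk_sums(A * D) + chunk_sums(B * D) + chunk_sums(C * D)) * constant / t
# final has one entry per chunk, i.e. shape (n_chunks,)
```

Each entry of `final` corresponds to one iteration of the l loop, so the results could be passed to doOutput one by one afterwards.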
EDIT:
Your updated question does not change much. You can simply use:
m,n,*_ = a.shape
lo,hi = 256*l,256*(l+1)
R = (D[:m,:n,lo:hi]*constant*(A[:m,:n,lo:hi]**2+B[:m,:n,lo:hi]**2+C[:m,:n,lo:hi]**2)/t-final**2).sum()
doOutput(R)
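A quick way to gain confidence in the vectorized expression is to compare it against the nested loop on a tiny array. The sketch below assumes a single chunk, small random inputs, and a made-up `constant`; none of these values come from the question.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C, D = (rng.random((3, 3, 4)) for _ in range(4))
constant = 0.5                 # hypothetical value

# Quantities from the first part of the answer, for one chunk
t = D.sum() * constant
final = ((A * D).sum() + (B * D).sum() + (C * D).sum()) * constant / t

# Vectorized version: square element-wise first, then sum
R_vec = (D * constant * (A**2 + B**2 + C**2) / t - final**2).sum()

# Reference: the explicit triple loop from the question
R_loop = 0.0
for m in range(A.shape[0]):
    for n in range(A.shape[1]):
        for o in range(A.shape[2]):
            R_loop += (D[m, n, o] * constant
                       * (A[m, n, o]**2 + B[m, n, o]**2 + C[m, n, o]**2) / t
                       - final**2)
```

Both values should agree up to floating-point rounding, which can be checked with np.isclose(R_vec, R_loop).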

Related

IndexError trying to solve difference equations numerically in python

I'm practicing how to numerically solve difference equations, but I often run into problems like the one below.
Can anyone help me sort this out?
import numpy as np
N = 10
#alternative 1
#x = np.zeros(N+1, int) # Produces error IndexError: index 11 is out of bounds for axis 0 with size 11
#alternative 2
x = (N+1)*[0] # Produces error: IndexError: list assignment index out of range
x[0] = 1000
r = 1.02
for n in range(1, N+1):
    x[n+1] = r**(n+1)*x[0]
    print(f"x[{n}] = {x[n+1]}")
Fixing the indices
The range of your indices is inconsistent with the way you use them in the loop. You can use either of the following two possible loops, but don't mix them:
for n in range(1, N+1):
    x[n] = r**n * x[0]

for n in range(0, N):
    x[n+1] = r**(n+1) * x[0]
Optimization: multiplications instead of exponentiations
Note that computing an exponent ** is generally more costly than computing a multiplication *; you can slightly optimize your code by using a recurrence formula:
for n in range(1, N+1):
    x[n] = r * x[n-1]

for n in range(0, N):
    x[n+1] = r * x[n]
Using library functions: itertools, numpy or pandas
What you are asking for is called a geometric progression. Python provides several ways of computing geometric progressions without writing the loop yourself.
Documentation: numpy.geomspace
Documentation: itertools.accumulate
Question: Geometric progression using Python / Pandas / Numpy
Question: python geometric sequence
Question: Generate a geometric progression using list comprehension
Question: Making a list of a geometric progression when the ratio and range are given
Question: Writing python code to calculate a Geometric progression
For instance:
import itertools # accumulate, repeat
import operator # mul
def geometric_progression(x0, r, N):
    return list(itertools.accumulate(itertools.repeat(r,N), operator.mul, initial=x0))
print(geometric_progression(1000, 1.2, 10))
# [1000, 1200.0, 1440.0, 1728.0, 2073.6, 2488.3199999999997, 2985.9839999999995, 3583.180799999999, 4299.816959999999, 5159.780351999999, 6191.736422399998]
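The numpy route mentioned in the list above can be sketched in a similar way. This assumes the values from the question (x0 = 1000, r = 1.02, N = 10); note that numpy.geomspace takes the end point of the sequence rather than the ratio, so the end point is computed from r here.

```python
import numpy as np

x0, r, N = 1000, 1.02, 10

# Closed form: x[n] = x0 * r**n for n = 0..N
x_power = x0 * r ** np.arange(N + 1)

# Same sequence via geomspace, given the first and last values
x_geom = np.geomspace(x0, x0 * r**N, num=N + 1)
```

Both arrays hold N + 1 values and should agree up to floating-point rounding.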
I think the problem is that list indices start at zero, so the index of the last element is N - 1, where N is the number of elements in the list.
So you should make this change in your for loop:
for n in range(0, N):
Also, your print call should reflect the data in your list, so change its argument to the following:
print(f"x[{n+1}] = {x[n+1]}")
After making these changes, you will get this result:
x[1] = 1020.0
x[2] = 1040.4
x[3] = 1061.208
x[4] = 1082.43216
x[5] = 1104.0808032
x[6] = 1126.1624192640002
x[7] = 1148.68566764928
x[8] = 1171.6593810022657
x[9] = 1195.092568622311
x[10] = 1218.9944199947574
Please note that your list has N + 1 elements, not N, because of this line of your code:
x = (N+1)*[0]
Hope this helps.
The length of your array is 11, which means the last element is x[10]. But in the loop, when n is 10 the element being accessed is x[11], which is out of range.
I'm not sure about the constraints of your problem, but if you want to access x[11], change the total size of the array to x = (N+2)*[0].
Output
x[1] = 1040.4
x[2] = 1061.208
x[3] = 1082.43216
x[4] = 1104.0808032
x[5] = 1126.1624192640002
x[6] = 1148.68566764928
x[7] = 1171.6593810022657
x[8] = 1195.092568622311
x[9] = 1218.9944199947574
x[10] = 1243.3743083946524

Nested summation in python

I'm trying to perform the following nested summation in python:
so I tried the following code:
import numpy as np
gamma = 17
R = 0.5
H = np.array([0.1,0.2])
alpha = np.array([0.1,0.2])
n = 2
F = 0
for i in range(n):
    for j in range(i+1):
        F = F + 3*gamma*H[i]*(R+H[j]*np.tan(alpha[j]))**2
But of course, this isn't giving me the right answer since it is summing all the terms again in the j loop. My question is how I can solve it? Bear in mind that this is just a small piece of a big expression with several summations for j like the one above inside a summation for i, so it must be something a little optimized. Thank you in advance!
I see at least the following options (ordered by increasing efficiency):
Just the readable Pythonic Way
One nice thing about python is that you can write these expressions in very close analogy to the way it is written in math. In your case, you want to sum over an iterable of numbers:
f = sum(
    3 * gamma * H[i] * (
        R + (
            sum(
                H[j] * np.tan(alpha[j])
                for j in range(i+1)
            )
        )
    )**2
    for i in range(n)
)
Caching the Inner Sum
In your case, the inner sum
sum(
    H[j] * np.tan(alpha[j])
    for j in range(i+1)
)
is calculated multiple times, while it just increments in every iteration. Let's call this term inner_sum(index). Then inner_sum(index-1) has already been calculated in the previous iteration, so we lose time when we recalculate it in every iteration. One approach could be to make inner_sum a function and cache its previous results. We could use functools.cache for that purpose:
from functools import cache
@cache
def inner_sum(index: int) -> float:
    if not index:
        return H[0] * np.tan(alpha[0])
    return inner_sum(index - 1) + H[index] * np.tan(alpha[index])
Now, we can just write:
f = sum(
    3 * gamma * H[i] * (
        R + inner_sum(i)
    )**2
    for i in range(n)
)
Using a Generator for the Partial Sum
This is still not memory-efficient, because the cache keeps every partial sum up to index in memory, while we actually only need the last one. There are different ways to implement an object which only stores the last value. You could just store it in a variable inner_sum_previous, for example. Or you could make inner_sum a proper generator yielding the partial sums one after another:
from typing import Generator
def partial_sum() -> Generator[float, None, None]:
    partsum = 0
    index = 0
    while True:
        try:
            partsum += H[index] * np.tan(alpha[index])
            yield partsum
            index += 1
        except IndexError:
            # End the generator; raising StopIteration here would turn into a RuntimeError (PEP 479)
            return
With this, we would write:
partial_sum_generator = partial_sum()
f = sum(
    3 * gamma * H[i] * (
        R + next(partial_sum_generator)
    )**2
    for i in range(n)
)
A for loop, in this case, is very similar to ∑, i.e. everything that is outside the ∑ in your formula should be outside the for loop:
...
F = 0
for i in range(n):
    inner_sum = 0
    for j in range(i+1):
        inner_sum += H[j] * np.tan(alpha[j])
    F += 3 * gamma * H[i] * (R + inner_sum) ** 2
Calculate the part inside the parenthesis first in the j loop, store it in a variable, then multiply it by the rest of the expression afterwards.
Just for compactness and readability, I would go for something like this:
F = 0
for i in range(n):
    F += 3*gamma*H[i]*(R + np.sum([H[j]*np.tan(alpha[j]) for j in range(i+1)]))**2
Obviously, you can also convert the first for loop into a sum over a list as I did with the second summation to make the expression more compact, but I think it is more readable this way.
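Since the inner sum is a running total, it can also be computed for all i at once with np.cumsum, which none of the answers above use. This is a sketch that assumes the formula the answers settle on, i.e. the inner sum over j runs from 0 to i inclusive and sits inside the squared parenthesis; the input values are the ones from the question.

```python
import numpy as np

gamma = 17
R = 0.5
H = np.array([0.1, 0.2])
alpha = np.array([0.1, 0.2])

# inner[i] = sum of H[j] * tan(alpha[j]) for j = 0..i
inner = np.cumsum(H * np.tan(alpha))

# Outer sum over i, evaluated element-wise and then reduced
F = np.sum(3 * gamma * H * (R + inner) ** 2)
```

With several such inner sums in a larger expression, each one would become its own cumsum array, and the outer sum stays a single np.sum.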

element-wise matrix multiplication (Hadamard product) using numpy

So suppose i have two numpy ndarrays whose elements are matrices. I need element-wise multiplication for these two arrays, however, there should be matrix multiplication between the two matrix elements. Of course i would be able to implement this with for loops but i was looking to solve this problem without using an explicit for loop. How do i implement this?
EDIT: This for-loop does what I want to do. I'm on python 2.7
n = np.arange(8).reshape(2,2,1,2)
l = np.arange(1,9).reshape(2,2,2,1)
k = np.zeros((2,2))
for i in range(len(n)):
    for j in range(len(n[i])):
        k[i][j] = np.asscalar(n[i][j].dot(l[i][j]))
print k
Assuming your arrays of matrices are given as n+2 dimensional arrays A and B, what you want to achieve is as simple as C = A@B
Example
import numpy as np
outer_dims = 2,3,4
inner_dims = 4,5,6
A = np.random.randint(0,10,(*outer_dims, *inner_dims[:2]))
B = np.random.randint(0,10,(*outer_dims, *inner_dims[1:]))
C = A@B
# check
for I in np.ndindex(outer_dims):
    assert (C[I] == A[I]@B[I]).all()
UPDATE: Py2 version; thanks @hpaulj, Divakar
A = np.random.randint(0,10, outer_dims + inner_dims[:2])
B = np.random.randint(0,10, outer_dims + inner_dims[1:])
C = np.matmul(A,B)
# check
for I in np.ndindex(outer_dims):
    assert (C[I] == np.matmul(A[I],B[I])).all()
If I understand correctly, this might work:
import numpy as np
a = np.array([[1,1],[1,0]])
b = np.array([[3,4],[5,4]])
x = np.array([[a,b],[b,a]])
y = np.array([[a,a],[b,b]])
result = np.array([_x @ _y for _x, _y in zip(x,y)])
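Applied to the arrays from the question, the broadcasting matrix product could look like the sketch below. It assumes Python 3 (for the @ operator; np.matmul(n, l) would be the equivalent call) and uses indexing to drop the two trailing length-1 axes that the 1x2 by 2x1 products leave behind.

```python
import numpy as np

n = np.arange(8).reshape(2, 2, 1, 2)      # 2x2 grid of 1x2 matrices
l = np.arange(1, 9).reshape(2, 2, 2, 1)   # 2x2 grid of 2x1 matrices

# Matrix product over the last two axes, broadcast over the first two;
# the result has shape (2, 2, 1, 1), so take the single entry of each 1x1 block
k = (n @ l)[..., 0, 0]
```

Here k has shape (2, 2), matching the array filled by the explicit double loop in the question.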

Efficient Numpy search in a non-monotonic array

I am trying to conduct something similar to searchsorted, but in the case where the array is not completely monotonic. Say I have a scalar, c and a 1D array x, I want to find the indices i of all elements such that x[i] < c <= x[i + 1]. Importantly, x is not completely monotonic.
The following code works, but I would like to know if this is the most efficient way to do this, or if there is a simpler way:
x = np.array([1,2,3,1,2,3,1,2,3])
c = 2.5
t = c > x[:-1]
u = c <= x[1:]
v = t*u
i = v.nonzero()[0]
Or in one line of code:
i = ((c > x[:-1]) * (c <= x[1:])).nonzero()[0]
Is this the most efficient way to recover these indices?
Two additional questions.
1. Is there an easy way to extend this to the case where c is a 1D array and x is a 2D array, where c has as many elements as "rows" in x, and I perform this search for each element of c in the corresponding "row" of x?
2. My ultimate goal is to do this with a three dimensional case. That is, suppose c is still a 1D vector with n elements. Now, let x be a 3D array, with dimensions j by n by k. Is there a way to do #1 above for each "submatrix" in x? Basically, performing #1 above j times.
For example:
x1 = np.array([[1,2,3,1,2,3],[1,2,3,1,2,3],[1,2,3,1,2,3]])
x2 = x1 + 1
x = np.array([x1,x2])
c = np.array([1.5,2.5,3.5])
Under #1 above, when we compare c and x1, we would get: [[0,3],[1,4],[]]
When we compare c and x2, we would get: [[],[0,3],[1,4]]
Finally, under #2, I would like to get:
[[[0,3],[1,4],[]],
 [[],[0,3],[1,4]]]
We could compare once to give us the boolean mask and re-use it with negation to get the other comparison array and also use slicing -
m = c > x
i = np.flatnonzero( m[:-1] & ~m[1:] )
We can extend it to x as 2D and c as 1D case with a loop, but do minimal computations with it by pre-computing on the masks generation in a vectorized manner, like so -
m = c[:,None] > x
m2 = m[:,:-1] & ~m[:,1:]
i = [np.flatnonzero( mi ) for mi in m2]
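The same masking idea can be pushed to the 3D case from the question by broadcasting c along the middle axis. This is a sketch using the example arrays from the question (with x1 built as a proper 2D array); the nested list comprehension at the end is still a Python loop, because the number of matches differs from row to row.

```python
import numpy as np

x1 = np.array([[1, 2, 3, 1, 2, 3]] * 3)
x = np.array([x1, x1 + 1])          # shape (j, n, k) = (2, 3, 6)
c = np.array([1.5, 2.5, 3.5])       # one value per "row", length n

# Compare each c[r] with row r of every submatrix
m = c[None, :, None] > x
# True where x[..., i] < c and c <= x[..., i + 1]
m2 = m[..., :-1] & ~m[..., 1:]
# Ragged result: one index array per row, grouped per submatrix
i = [[np.flatnonzero(row) for row in block] for block in m2]
```

The result is a list with one entry per submatrix, each holding one index array per element of c.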
For such a task, numpy makes too many comparisons. You can gain a factor of 5 with Numba, and it is not difficult to adapt to 3 dimensions.
import numba
import numpy as np

@numba.njit
def ind(x,c):
    res = np.empty_like(x)
    i=j=0
    while i < x.size-1:
        if x[i]<c and c<=x[i+1]:
            res[j]=i
            j+=1
        i+=1
    return res[:j]

NumPy slicing: All except one array entry

What is the best way to exclude exactly one NumPy array entry from an operation?
I have an array x containing n values and want to exclude the i-th entry when I call numpy.prod(x). I know about MaskedArray, but is there another/better way?
I think the simplest would be
np.prod(x[:i]) * np.prod(x[i+1:])
This should be fast and also works when you don't want to or can't modify x.
And in case x is multidimensional and i is a tuple:
x_f = x.ravel()
i_f = np.ravel_multi_index(i, x.shape)
np.prod(x_f[:i_f]) * np.prod(x_f[i_f+1:])
You could use np.delete, which returns a copy of a one-dimensional array with the given element removed:
import numpy as np
x = np.arange(1, 5)
i = 2
y = np.prod(np.delete(x, i)) # gives 8
I don't think there is any better way, honestly. Even without knowing the NumPy functions, I would do it like:
#I assume x is array of len n
temp = x[i] #where i is the index of the value you don't want to change
x = x * 5
#...do whatever with the array...
x[i] = temp
If I understand correctly, your problem is one dimensional? Even if not, you can do this the same way.
EDIT:
I checked the prod function, and in this case I think you can just replace the value you don't want to use with 1 (using the temp approach I've given above) and later put the original value back. It is just an in-place change, so it's fairly efficient. The second way you can do this is to divide the result by the x[i] value (assuming it's not 0, as commenters said).
As np.prod is taking the product of all the elements in an array, if we want to exclude one element from the solution, we can set that element to 1 first in order to ignore it (as p * 1 = p).
So:
>>> n = 10
>>> x = np.arange(10)
>>> i = 0
>>> x[i] = 1
>>> np.prod(x)
362880
which, we can see, works:
>>> 1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9
362880
You could use a list comprehension to index all the points but 1:
i = 2
np.prod(x[[val for val in range(len(x)) if val != i]])
or use a set difference:
np.prod(x[list(set(range(len(x))) - {i})])
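One more variant in the same spirit, not taken from the answers above, is a boolean mask: it leaves x untouched and avoids building a Python list of indices. The sketch below compares it with the slicing and np.delete approaches on the small example array used earlier.

```python
import numpy as np

x = np.arange(1, 5)     # [1, 2, 3, 4]
i = 2                   # index of the entry to leave out

# Boolean mask that is True everywhere except at position i
mask = np.ones(x.shape, dtype=bool)
mask[i] = False
p_mask = np.prod(x[mask])

# The two approaches from the answers above, for comparison
p_slice = np.prod(x[:i]) * np.prod(x[i + 1:])
p_delete = np.prod(np.delete(x, i))
```

All three compute the product of the remaining entries, here 1 * 2 * 4.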
