Parallelize these nested for loops in python

Parallelize these nested for loops in python - python

I have a multidimensional array (result) that should be filled by some nested loops. Function fun() is a complex and time-consuming function. I want to fill my array elements in a parallel manner, so I can use all my system's processing power.
Here's the code:
import numpy as np
def fun(x, y, z):
# time-consuming computation...
# ...
return output
dim1 = 10
dim2 = 20
dim3 = 30
result = np.zeros([dim1, dim2, dim3])
for i in xrange(dim1):
for j in xrange(dim2):
for k in xrange(dim3):
result[i, j, k] = fun(i, j, k)
My question is that "Can I parallelize this code or not? if yes, How?"
I'm using Windows 10 64-bit and python 2.7.
Please provide your solution by changing my code if you can.
Thanks!

If you want a more general solution, taking advantage of fully parallel execution, then why not use something like this:
>>> import multiprocess as mp
>>> p = mp.Pool()
>>>
>>> # a time consuming function taking x,y,z,...
>>> def fun(*args):
... import time
... time.sleep(.1)
... return sum(*args)
...
>>> dim1, dim2, dim3 = 10, 20, 30
>>> import itertools
>>> input = ((i,j,k) for i,j,k in itertools.combinations_with_replacement(xrange(dim3), 3) if i < dim1 and j < dim2)
>>> results = p.map(fun, input)
>>> p.close()
>>> p.join()
>>>
>>> results[:2]
[0, 1]
>>> results[-2:]
[56, 57]
Note I'm using multiprocess instead of multiprocessing, but that's only to get the ability to work in the interpreter.
I didn't use a numpy.array, but if you had to... you could just dump the output from p.map directly into a numpy.array and then modify the shape attribute to be shape = (dim1, dim2, dim3), or you could do something like this:
>>> input = ((i,j,k) for i,j,k in itertools.combinations_with_replacement(xrange(dim3), 3) if i < dim1 and j < dim2)
>>> import numpy as np
>>> results = np.empty(dim1*dim2*dim3)
>>> res = p.imap(fun, input)
>>> for i,r in enumerate(res):
... results[i] = r
...
>>> results.shape = (dim1,dim2,dim3)

Here is a version of code that runs fun(i, j, k) in parallel for differend k indices. This is done by running fun in different processes by using https://docs.python.org/2/library/multiprocessing.html
import numpy as np
from multiprocessing import Pool
def fun(x, y, z):
# time-consuming computation...
# ...
return output
def fun_wrapper(indices):
fun(*indices)
if __name__ == '__main__':
dim1 = 10
dim2 = 20
dim3 = 30
result = np.zeros([dim1, dim2, dim3])
pool = Pool(processes=8)
for i in xrange(dim1):
for j in xrange(dim2):
result[i, j] = pool.map(fun_wrapper, [(i, j, k) for k in xrange(dim3)])
This is not the most elegant solution but you may start with it. And you will get a speed up only if fun contains time-consuming computation

A simple approach could be to divide the array in sections and create some threads to operate throught these sections. For example one section from (0,0,0) to (5,10,15) and other one from (5,10,16) to (10,20,30).
You can use threading module and do something like this
import numpy as np
import threading as t
def fun(x, y, z):
# time-consuming computation...
# ...
return output
dim1 = 10
dim2 = 20
dim3 = 30
result = np.zeros([dim1, dim2, dim3])
#b - beginning index, e - end index
def work(ib,jb,kb,ie,je,ke):
for i in xrange(ib,ie):
for j in xrange(jb,je):
for k in xrange(kb,ke):
result[i, j, k] = fun(i, j, k)
threads = list()
threads.append(t.Thread(target=work, args(0,0,0,dim1/2,dim2/2,dim3/2))
threads.append(t.Thread(target=work, args(dim1/2,dim2/2,dim3/2 +1,dim1, dim2, dim3))
for thread in threads:
thread.start()
You can define these sections through some algorithm and determine the number of threads dynamically. Hope it helps you or at least give you some ideas.

Related

Find the number in a given range so that the gcd of the number with any element of a given list will always be 1

Given a number M and a list A which contains N elements (A1, A2,...)
Find the all the numbers k so that:
1=<k=<M which satisfied gcd(Ai, k) is always equal to 1
Here's my code, the only problem for it is that it uses loops in each other, which slow the process if my inputs are big, how can I fix it so that it requires less time?
N, M = [int(v) for v in input().split()]
A = [int(v) for v in input().split()]
from math import gcd
cnt = 0
print(N)
for k in range(1, M+1):
for i in range(N):
if gcd(k, A[i]) == 1:
cnt += 1
if cnt == N:
print(k)
cnt = 0
inputs example: (first line contains N and M, second contains the list A1, A2,...)
3 12
6 1 5

Here's a fast version that eliminates the nested loops:
N, M = [int(v) for v in input().split()]
A = [int(v) for v in input().split()]
from math import gcd
print(N)
l = 1
for v in A:
l = l*v//gcd(l, v)
for k in range(1, M+1):
if gcd(l, k) == 1:
print(k)
It works by first taking the LCM, l, of the values in A. It then suffices to check if the GCD of k and l is 1, which means there are no common factors with any of the values in A.
Note: If you're using a newer version of Python than I am (3.9 or later), you can import lcm from math and replace l = l*v//gcd(l, v) with l = lcm(l, v).
Or, as Kelly Bundy pointed out, lcm accepts an arbitrary number of arguments, so the first loop can be replaced with l = lcm(*A) if you're using 3.9 or later.

Just another approach using sympy.theory, factorint and Python sets which from the point of view of speed has on my machine no advantage compared to the math.lcm() or the math.gcd() based solutions if applied to small sizes of lists and numbers, but excels at very large size of randomized list:
M = 12
lstA = (6, 1, 5)
from sympy.ntheory import factorint
lstAfactors = []
for a in lstA:
lstAfactors += factorint(a)
setA = set(lstAfactors)
for k in range(1, M+1):
if not (set(factorint(k)) & setA):
print(k)
The code above implements the idea described in the answer of Yatisi coded by Tom Karzes using math.gcd(), but is using sympy.ntheory factorint() and set() instead of math gcd().
In terms of speed the factorint() solution seems to be fastest on the below tested data:
# ======================================================================
from time import perf_counter as T
from math import gcd, lcm
from sympy import factorint
from random import choice
#M = 3000
#lstA = 100 * [6, 12, 18, 121, 256, 1024, 361, 2123, 39]
M = 8000
lstA = [ choice(range(1,8000)) for _ in range(8000) ]
# ----------------------------------------------------------------------
from sympy.ntheory import factorint
lstResults = []
lstAfactors = []
sT=T()
for a in lstA:
lstAfactors += factorint(a)
setA = set(lstAfactors)
for k in range(1, M+1):
if not (set(factorint(k)) & setA):
lstResults += [k]
print("factorint:", T()-sT)
#print(lstResults)
print("---")
# ----------------------------------------------------------------------
lstResults = []
sT=T()
#l = 1
#for a in lstA:
# l = (l*a)//gcd(l, a) # can be replaced by:
l = lcm(*lstA) # least common multiple divisible by all lstA items
# ^-- which runs MAYBE a bit faster than the loop with gcd()
for k in range(1, M+1):
if gcd(l, k) == 1:
lstResults += [k]
print("lcm() :", T()-sT)
#print(lstResults)
print("---")
# ----------------------------------------------------------------------
lstResults = []
sT=T()
l = 1
for a in lstA:
l = (l*a)//gcd(l, a) # can be replaced by:
#l = lcm(*lstA) # least common multiple divisible by all lstA items
# ^-- which runs MAYBE a bit faster than the loop with gcd()
for k in range(1, M+1):
if gcd(l, k) == 1:
lstResults += [k]
print("gcd() :", T()-sT)
#print(lstResults)
print("---")
# ----------------------------------------------------------------------
import numpy as np
A = np.array(lstA)
def find_gcd_np(M, A, to_gcd=1):
vals = np.arange(1, M + 1)
return vals[np.all(np.gcd(vals, np.array(A)[:, None]) == to_gcd, axis=0)]
sT=T()
lstResults = find_gcd_np(M, A, 1).tolist()
print("numpy :", T()-sT)
#print(lstResults)
print("---")
printing
factorint: 0.09754624799825251
---
lcm() : 0.10102138598449528
---
gcd() : 0.10236155497841537
---
numpy : 6.923375226906501
---
The timing results change extremely for the second data variant in the code provided above printing:
factorint: 0.021642255946062505
---
lcm() : 0.0010238440008834004
---
gcd() : 0.0013772319070994854
---
numpy : 0.19953695288859308
---
where the factorint based approach is 20x and the numpy based approach 200x times slower than the gcd/lcm based one.
Run the timing test yourself online. It won't run the case of large data, but it can at least demonstrate that the numpy approach is 100x times slower than the gcd one:
factorint: 0.03271647123619914
---
lcm() : 0.003286922350525856
---
gcd() : 0.0029655308462679386
---
numpy : 0.41759901121258736
1 https://ato.pxeger.com/run?1=3VXBitswED0W9BXDGoq16-zaSbtsAzmk0GN6KLmUkAbFkdcismQkZbc-9Et62Uv7Uf2ajiwnNt1Du7ClUIORpXmaefNmLH39Xjeu1Orh4dvBFaObHy--RDB7locURlfgRMUBQFS1Ng5qbopNrg_KcQPMwjKAKubKHnSb7xKQeRVstqnq5mQrWO60EcoFo2Fqh0NnzEstck6iBfqCGUzSNCWRtG6OkyxN4RxW1wlkY3xv_JglMH7tV9LxqwQm136ejSf4-WZNhk46H6suQoxhb3mcJTdpSikU2sAGhIKw7BdhTUgEo2d5BjJcKldybZrHaiDDD9wepLOe9GrdUg5mGxbscraMKfFkmSfrAVOCOQ6RF7PeZ8wosbxNHId4AAte9n3KKNziIqOtOxAFKO0g9pt6Z3sU6qV3NO9gEEIfWWPk1X5N6hZ8dto3PUsAaY_skpIoGPtN9AhHlc7oMyo-4DXULpK-kUj0i4ZRmwqaYnnO6NUV9m8sE2AUIsiZgi0Hw2vJcr6DbTMF4rHY3_G53-9RkjOL7aurSiuoMK6oJYeduBNWbPFr2wCTsg0HwvHKYsxPoxHclyIvwRyUhcX849t3yGorfFtY_3-5EoNjw4DUuoZ7gf-Yp_bb6nX89xQPAsj-oFo-F-oRT6nW3y5WqNXjdn9SpaL_rVSt139Wqu7YUgd_pOPxr2rijxdVXzJjWNOeMZTseAGFULsNkt2oOl4kME_AaT-fHZO_Y9Iet56kgQvIaGs23B2MalErj5EyxsFn75eSPuScrqYJvNeKr1sRQxjsic_CzlLat9Owyx6zy-il01JYF5-0C1k-UepwC3eX8fFS_gk)

This is probably more a math question than a programming question, however, here comes my take: Depending on M and A, it might be better to
Find the prime divisors of the Ai (have a look at this) and put them in a set.
Either remove (sieve) all multiples of these primes from list(range(1,M+1)), which you can do (more) efficiently by smart ordering, or find all primes smaller or equal to M (which could even be pre-computed) that are not divisors of any Ai and compute all multiples up to M.
Explanation: Since gcd(Ai,k)=1 if and only if Ai and k have no common divisors, they also have no prime divisors. Thus, we can first find all prime divisors of the Ai and then make sure our k don't have any of them as divisors, too.

Using numpy with vectorised operations will be a good alternative when your input range M goes up to hundreds and higher and A is stably small (is about as your current A):
import numpy as np
def find_gcd_np(M, A, to_gcd=1):
vals = np.arange(1, M + 1)
return vals[np.all(np.gcd(vals, np.array(A)[:, None]) == to_gcd, axis=0)]
Usage:
print(find_gcd_np(100, [6, 1, 5], 1))

How to append multiprocessed result that is in a loop?

I want to utilize multiple processors to calculate a function for two lists of values. In the test case below, my idea is that I have two lists: c = [1,2,3], and c_shift = [2,3,4]. I want to evaluate a function for a single value in each list, and append two separate solution arrays.
import numpy as np
import multiprocessing as mp
def function(x,a,b,c):
return a*x**2+b*x+c
def calculate(x,a,b,c):
c_shift = c+1
result = []
result_shift = []
for i in range(len(c)):
process0 = mp.Process(target = function, args = (x,a,b,c[i]))
process1 = mp.Process(target = function, args = (x,a,b,c_shift[i]))
process0.start()
process1.start()
process0.join()
process1.join()
# After it finishes, how do I append each list?
return np.array(result), np.array(result_shift)
if __name__ == '__main__':
x = np.linspace(-1,1,50)
a = 1
b = 1
c = np.array([1,2,3])
calculate(x,a,b,c)
When each process finishes, and goes through join(), how do I append process0 to result = [] and process1 to result_shift = []?
The structure of the returned results should have the form:
result = [ [1 x 50], [1 x 50], [1 x 50] ]
result_shifted = [ [1 x 50], [1 x 50], [1 x 50] ]

Slightly different approach but I think this is what you were looking to do?
import multiprocessing
import numpy as np
from functools import partial
def your_func(c, your_x, a, b):
results = []
for c_value in c:
results.append(a * your_x ** 2 + b * your_x + c_value)
return results
def get_results(c_values):
your_x = np.linspace(-1, 1, 50)
a = 1
b = 1
with multiprocessing.Pool() as pool:
single_arg_function = partial(your_func, your_x=your_x, a=a, b=b)
out = pool.map(single_arg_function, c_values)
return out
if __name__ == "__main__":
c_values = [np.array([1, 2, 3]), np.array([1, 2, 3]) + 1]
out = get_results(c_values)
result_one = out[0]
result_two = out[1]

I'm not sure exactly what you're trying to accomplish with the shifted results, but in terms of concurrency, you should check out concurrent.futures to execute parallel tasks. Also, take a look at functools.partial to create a partial object - essentially a function with pre-filled args / kwargs. Here's an example:
import concurrent.futures
from functools import partial
import numpy as np
def map_processes(func, _iterable):
with concurrent.futures.ProcessPoolExecutor() as executor:
result = executor.map(func, _iterable)
return result
def function(x, a, b, c):
return a * x**2 + b * (x + c)
if __name__ == "__main__":
base_func = partial(function, np.linspace(-1, 1, 50), 1, 1)
print(list(map_processes(base_func, np.array([1, 2, 3]))))

Indexing multidimensional numpy array inside numba's jitclass

I'm trying to insert a small multidimensional array into a larger one inside a numba jitclass. The small array is set specific positions of the larger array defined by an index list.
The following MWE shows the problem without numba - everything works as expected
import numpy as np
class NumbaClass(object):
def __init__(self, n, m):
self.A = np.zeros((n, m))
# solution 1 using pure python
def nonNumbaFunction1(self, idx, values):
self.A[idx[:, None], idx] = values
# solution 2 using pure python
def nonNumbaFunction2(self, idx, values):
self.A[np.ix_(idx, idx)] = values
if __name__ == "__main__":
n = 6
m = 8
obj = NumbaClass(n, m)
print(f'A =\n{obj.A}')
idx = np.array([0, 2, 5])
values = np.arange(len(idx)**2).reshape(len(idx), len(idx))
print(f'values =\n{values}')
obj.nonNumbaFunction1(idx, values)
print(f'A =\n{obj.A}')
obj.nonNumbaFunction2(idx, values)
print(f'A =\n{obj.A}')
Both functions nonNumbaFunction1 and nonNumbaFunction2 do not work inside a numba class. So my current solution looks like this which is not really nice in my opinion
import numpy as np
from numba import jitclass
from numba import int64, float64
from collections import OrderedDict
specs = OrderedDict()
specs['A'] = float64[:, :]
#jitclass(specs)
class NumbaClass(object):
def __init__(self, n, m):
self.A = np.zeros((n, m))
# solution for numba jitclass
def numbaFunction(self, idx, values):
for i in range(len(values)):
idxi = idx[i]
for j in range(len(values)):
idxj = idx[j]
self.A[idxi, idxj] = values[i, j]
if __name__ == "__main__":
n = 6
m = 8
obj = NumbaClass(n, m)
print(f'A =\n{obj.A}')
idx = np.array([0, 2, 5])
values = np.arange(len(idx)**2).reshape(len(idx), len(idx))
print(f'values =\n{values}')
obj.numbaFunction(idx, values)
print(f'A =\n{obj.A}')
So my questions are:
Does anyone know a solution to this indexing in numba or is there another vectorized solution?
Is there a faster solution for nonNumbaFunction1?
It might be useful to know that inserted array is small (4x4 to 10x10), but this indexing appears in nested loops so it has to be quiet fast as well! Later I need a similar indexing for three dimensional objects too.

Because of limitations on numba's indexing support, I don't think you can do any better than writing out the for loops yourself. To make it generic across dimensions, you could use the generated_jit decorator to specialize. Something like this:
def set_2d(target, values, idx):
for i in range(values.shape[0]):
for j in range(values.shape[1]):
target[idx[i], idx[j]] = values[i, j]
def set_3d(target, values, idx):
for i in range(values.shape[0]):
for j in range(values.shape[1]):
for k in range(values.shape[2]):
target[idx[i], idx[j], idx[k]] = values[i, j, l]
#numba.generated_jit
def set_nd(target, values, idx):
if target.ndim == 2:
return set_2d
elif target.ndim == 3:
return set_3d
Then, this could be used in your jitclass
specs = OrderedDict()
specs['A'] = float64[:, :]
#jitclass(specs)
class NumbaClass(object):
def __init__(self, n, m):
self.A = np.zeros((n, m))
def numbaFunction(self, idx, values):
set_nd(self.A, values, idx)

How to vectorize a class instantiation to allow NumPy arrays as input?

I programmed class which looks something like this:
import numpy as np
class blank():
def __init__(self,a,b,c):
self.a=a
self.b=b
self.c=c
n=5
c=a/b*8
if (a>b):
y=c+a*b
else:
y=c-a*b
p = np.empty([1,1])
k = np.empty([1,1])
l = np.empty([1,1])
p[0]=b
k[0]=b*(c-1)
l[0]=p+k
for i in range(1, n, 1):
p=np.append(p,l[i-1])
k=np.append(k,(p[i]*(c+1)))
l=np.append(l,p[i]+k[i])
komp = np.zeros(shape=(n, 1))
for i in range(0, n):
pl_avg = (p[i] + l[i]) / 2
h=pl_avg*3
komp[i]=pl_avg*h/4
self.tot=komp+l
And when I call it like this:
from ex1 import blank
import numpy as np
res=blank(1,2,3)
print(res.tot)
everything works well.
BUT I want to call it like this:
res = blank(np.array([1,2,3]), np.array([3,4,5]), 3)
Is there an easy way to call it for each i element of this two arrays without editing class code?

You won't be able to instantiate a class with NumPy arrays as inputs without changing the class code. #PabloAlvarez and #NagaKiran already provided alternative: iterate with zip over arrays and instantiate class for each pair of elements. While this is pretty simple solution, it defeats the purpose of using NumPy with its efficient vectorized operations.
Here is how I suggest you to rewrite the code:
from typing import Union
import numpy as np
def total(a: Union[float, np.ndarray],
b: Union[float, np.ndarray],
n: int = 5) -> np.array:
"""Calculates what your self.tot was"""
bc = 8 * a
c = bc / b
vectorized_geometric_progression = np.vectorize(geometric_progression,
otypes=[np.ndarray])
l = np.stack(vectorized_geometric_progression(bc, c, n))
l = np.atleast_2d(l)
p = np.insert(l[:, :-1], 0, b, axis=1)
l = np.squeeze(l)
p = np.squeeze(p)
pl_avg = (p + l) / 2
komp = np.array([0.75 * pl_avg ** 2]).T
return komp + l
def geometric_progression(bc, c, n):
"""Calculates array l"""
return bc * np.logspace(start=0,
stop=n - 1,
num=n,
base=c + 2)
And you can call it both for sole numbers and NumPy arrays like that:
>>> print(total(1, 2))
[[2.6750000e+01 6.6750000e+01 3.0675000e+02 1.7467500e+03 1.0386750e+04]
[5.9600000e+02 6.3600000e+02 8.7600000e+02 2.3160000e+03 1.0956000e+04]
[2.1176000e+04 2.1216000e+04 2.1456000e+04 2.2896000e+04 3.1536000e+04]
[7.6205600e+05 7.6209600e+05 7.6233600e+05 7.6377600e+05 7.7241600e+05]
[2.7433736e+07 2.7433776e+07 2.7434016e+07 2.7435456e+07 2.7444096e+07]]
>>> print(total(3, 4))
[[1.71000000e+02 3.39000000e+02 1.68300000e+03 1.24350000e+04 9.84510000e+04]
[8.77200000e+03 8.94000000e+03 1.02840000e+04 2.10360000e+04 1.07052000e+05]
[5.59896000e+05 5.60064000e+05 5.61408000e+05 5.72160000e+05 6.58176000e+05]
[3.58318320e+07 3.58320000e+07 3.58333440e+07 3.58440960e+07 3.59301120e+07]
[2.29323574e+09 2.29323590e+09 2.29323725e+09 2.29324800e+09 2.29333402e+09]]
>>> print(total(np.array([1, 3]), np.array([2, 4])))
[[[2.67500000e+01 6.67500000e+01 3.06750000e+02 1.74675000e+03 1.03867500e+04]
[1.71000000e+02 3.39000000e+02 1.68300000e+03 1.24350000e+04 9.84510000e+04]]
[[5.96000000e+02 6.36000000e+02 8.76000000e+02 2.31600000e+03 1.09560000e+04]
[8.77200000e+03 8.94000000e+03 1.02840000e+04 2.10360000e+04 1.07052000e+05]]
[[2.11760000e+04 2.12160000e+04 2.14560000e+04 2.28960000e+04 3.15360000e+04]
[5.59896000e+05 5.60064000e+05 5.61408000e+05 5.72160000e+05 6.58176000e+05]]
[[7.62056000e+05 7.62096000e+05 7.62336000e+05 7.63776000e+05 7.72416000e+05]
[3.58318320e+07 3.58320000e+07 3.58333440e+07 3.58440960e+07 3.59301120e+07]]
[[2.74337360e+07 2.74337760e+07 2.74340160e+07 2.74354560e+07 2.74440960e+07]
[2.29323574e+09 2.29323590e+09 2.29323725e+09 2.29324800e+09 2.29333402e+09]]]
You can see that results are in compliance.
Explanation:
First of all I'd like to note that your calculation of p, k, and l doesn't have to be in the loop. Moreover, calculating k is unnecessary. If you see carefully, how elements of p and l are calculated, they are just geometric progressions (except the 1st element of p):
p = [b, b*c, b*c*(c+2), b*c*(c+2)**2, b*c*(c+2)**3, b*c*(c+2)**4, ...]
l = [b*c, b*c*(c+2), b*c*(c+2)**2, b*c*(c+2)**3, b*c*(c+2)**4, b*c*(c+2)**5, ...]
So, instead of that loop, you can use np.logspace. Unfortunately, np.logspace doesn't support base parameter as an array, so we have no other choice but to use np.vectorize which is just a loop under the hood...
Calculating of komp though is easily vectorized. You can see it in my example. No need for loops there.
Also, as I already noted in a comment, your class doesn't have to be a class, so I took a liberty of changing it to a function.
Next, note that input parameter c is overwritten, so I got rid of it. Variable y is never used. (Also, you could calculate it just as y = c + a * b * np.sign(a - b))
And finally, I'd like to remark that creating NumPy arrays with np.append is very inefficient (as it was pointed out by #kabanus), so you should always try to create them at once - no loops, no appending.
P.S.: I used np.atleast_2d and np.squeeze in my code and it could be unclear why I did it. They are necessary to avoid if-else clauses where we would check dimensions of array l. You can print intermediate results to see what is really going on there. Nothing difficult.

if it is just calling class with two different list elements, loop can satisfies well
res = [blank(i,j,3) for i,j in zip(np.array([1,2,3]),np.array([3,4,5]))]
You can see list of values for res variable

The only way I can think of iterating lists of arrays is by using a function on the main program for iteration and then do the operations you need to do inside the loop.
This solution works for each element of both arrays (note to use zip function for making the iteration in both lists if they have a small size as listed in this answer here):
for n,x in zip(np.array([1,2,3]),np.array([3,4,5])):
res=blank(n,x,3)
print(res.tot)
Hope it is what you need!

Multiprocessing for calculation on one array

This question following this one [1]. I have a big 3D array and i have to do some heavy calculations on it.
I would like to split a slice of my array in 4 parts and do calculations for each part with each 4 cores of my computer...
And do that for each slices of my 3D array...what is the best way to do that?
import numpy
size = 8.
Y=(arange(2000))
X=(arange(2000))
(xx,yy)=meshgrid(X,Y)
array=zeros((Y.shape[0],X.shape[0],size))
array[:,:,0] = 0
array[:,:,1] = X+Y
array[:,:,2] = X*cos(X)+Y*sin(Y)
array[:,:,3] = X**3+sin(X)+X**2+Y**2+sin(Y)

You can use Pool from the multiprocessing module:
from multiprocessing import Pool
def f(num):
return num * 2 # replace with heavy computation
lst = [1,2,3,4,5,6,7,8,9,10,11]
p = Pool(4)
print p.map(f, lst)
It will work equally well with a 3-dimensional numpy array:
from multiprocessing import Pool
import numpy
def f(num):
return num * 2 # replace with heavy computation
arr = numpy.array(
[numpy.array([
numpy.array([1,2,3]),
numpy.array([4,5,6]),
numpy.array([7,8,9])]),
numpy.array([
numpy.array([1,2,3]),
numpy.array([4,5,6]),
numpy.array([7,8,9])])])
p = Pool(4)
print p.map(f, arr)

As an alternative to multiprocessing, you can use the concurrent.futures module:
import concurrent.futures
def f(num):
return num * 2
arr = […]
with concurrent.futures.ProcessPoolExecutor() as exc:
print(list(exc.map(f, arr)))

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parallelize these nested for loops in python - python

Related

Find the number in a given range so that the gcd of the number with any element of a given list will always be 1

How to append multiprocessed result that is in a loop?

Indexing multidimensional numpy array inside numba's jitclass

How to vectorize a class instantiation to allow NumPy arrays as input?

Multiprocessing for calculation on one array

Categories

Resources