I am trying to combine the solutions provided in both of these SO answers - Using threading to slice an array into chunks and perform calculation on each chunk and reassemble the returned arrays into one array, and Pass multiple parameters to concurrent.futures.Executor.map?. I have a numpy array that I split into chunks, and I want each chunk to be sent to a separate thread along with an additional argument. This additional argument is a constant and will not change. performCalc is a function that takes two arguments: a chunk of the original numpy array and the constant.
First solution I tried
import psutil
import numpy as np
import sys
from concurrent.futures import ThreadPoolExecutor
from functools import partial
def main():
    testThread()

def testThread():
    minLat = -65.76892
    maxLat = 66.23587
    minLon = -178.81404
    maxLon = 176.2949

    latGrid = np.arange(minLat,maxLat,0.05)
    lonGrid = np.arange(minLon,maxLon,0.05)

    gridLon,gridLat = np.meshgrid(latGrid,lonGrid)
    grid_points = np.c_[gridLon.ravel(),gridLat.ravel()]

    n_jobs = psutil.cpu_count(logical=False)
    chunk = np.array_split(grid_points,n_jobs,axis=0)
    x = ThreadPoolExecutor(max_workers=n_jobs)
    maxDistance = 4.3

    func = partial(performCalc,chunk)
    args = [chunk,maxDistance]

    # This prints 4.3 twice although there are four cores in the system
    results = x.map(func,args)

    # This prints 4.3 four times correctly
    results1 = x.map(performTest,chunk)

def performCalc(chunk,maxDistance):
    print(maxDistance)
    return chunk

def performTest(chunk):
    print("test")

main()
So performCalc() prints 4.3 only twice even though the system has four cores, while performTest() prints "test" four times, as expected. I am not able to figure out the reason for this error.
Also, I am sure the way I set up the functools.partial call is incorrect.
1) There are four chunks of the original numpy array.
2) Each chunk is to be paired with maxDistance and sent to performCalc()
3) There will be four threads, each of which will print maxDistance and return its part of the total result, to be reassembled into one array.
Where am I going wrong?
UPDATE
I tried using the lambda approach as well
results = x.map(lambda p:performCalc(*p),args)
but this prints nothing.
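For reference, a minimal sketch of how the lambda approach could be made to work, assuming each work item should be a (chunk, maxDistance) pair (the names mirror the code above; this is not from the original post):
# Hypothetical rework of the lambda attempt: pair every chunk with the
# constant so that each mapped item unpacks into (chunk, maxDistance).
args = [(c, maxDistance) for c in chunk]
results = x.map(lambda p: performCalc(*p), args)
# Executor.map returns a lazy iterator; consuming it surfaces any
# exceptions raised inside the worker threads.
results = list(results)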
Using the solution provided by user mkorvas, as shown here - How to pass a function with more than one argument to python concurrent.futures.ProcessPoolExecutor.map()? - I was able to solve my problem, as shown in the solution here -
import psutil
import numpy as np
import sys
from concurrent.futures import ThreadPoolExecutor
from functools import partial
def main():
    testThread()

def testThread():
    minLat = -65.76892
    maxLat = 66.23587
    minLon = -178.81404
    maxLon = 176.2949

    latGrid = np.arange(minLat,maxLat,0.05)
    lonGrid = np.arange(minLon,maxLon,0.05)
    print(latGrid.shape,lonGrid.shape)

    gridLon,gridLat = np.meshgrid(latGrid,lonGrid)
    grid_points = np.c_[gridLon.ravel(),gridLat.ravel()]
    print(grid_points.shape)

    n_jobs = psutil.cpu_count(logical=False)
    chunk = np.array_split(grid_points,n_jobs,axis=0)
    x = ThreadPoolExecutor(max_workers=n_jobs)
    maxDistance = 4.3

    func = partial(performCalc,maxDistance)
    results = x.map(func,chunk)

def performCalc(maxDistance,chunk):
    print(maxDistance)
    return chunk

main()
What one apparently needs to do (and I do not know why - maybe somebody can clarify in another answer) is switch the order of the arguments to the function performCalc(), as shown here -
def performCalc(maxDistance,chunk):
    print(maxDistance)
    return chunk
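A small sketch of why the argument order matters (my reading of the functools documentation, not part of the original post): functools.partial freezes positional arguments from the left, so the value bound up front must correspond to the first parameter, and each item supplied by map() fills the next one. Binding the constant as a keyword argument instead lets performCalc() keep its original parameter order:
from functools import partial

def performCalc(chunk, maxDistance):
    print(maxDistance)
    return chunk

# partial() freezes positional arguments from the left; binding the
# constant by keyword leaves the first slot free for each chunk.
func = partial(performCalc, maxDistance=4.3)
chunks = [[1, 2], [3, 4]]          # stand-ins for the numpy chunks
results = list(map(func, chunks))  # each call: performCalc(chunk, maxDistance=4.3)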
I'm trying to repeatedly run a function that requires a few positional arguments and involves random number generation (to generate many samples of a distribution). As an MWE, I think this captures everything:
import numpy as np
import multiprocessing as mup
from functools import partial
def rarr(xsize, ysize, k):
    return np.random.rand(xsize, ysize)

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    np.random.seed()
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    out = p.map_async(partial(rarr, xsize, ysize), range(nsamp))
    p.close()
    return np.array(out.get())
Note that the final positional argument for rarr() is just a dummy variable, since I am using map_async(), which requires an iterable. Now if I run %timeit clever_array(500, ncores = 1) I get 208 ms, whereas %timeit clever_array(500, ncores = 5) takes 149 ms. So there is definitely some kind of parallelism happening (the speedup isn't terribly impressive for this MWE but is decent in my real code).
However, I'm wondering a few things -- is there a more natural implementation other than the dummy variable for rarr() passed as an iterable to map_async to run this many times? Is there any obvious way to pass the xsize and ysize args to rarr() other than partial()? And is there any way to ensure different results from the different cores other than initializing a different random.seed() every time?
Thanks for any help!
Typically when we use multiprocessing we expect different results from each invocation of a function, so it doesn't quite make sense to call the exact same function many times. In order to ensure the randomness of the sampling output, it is best to separate the random state (seed) from the function itself. The approach recommended by the official NumPy documentation is to use a np.random.Generator object, created via np.random.default_rng([seed]). With that we can modify your code to:
import numpy as np
import multiprocessing as mup
from functools import partial
def rarr(xsize, ysize, rng):
    return rng.random((xsize, ysize))

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    out = p.map_async(partial(rarr, xsize, ysize), map(np.random.default_rng, range(nsamp)))
    p.close()
    return np.array(out.get())
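If reproducibility across runs matters, a variant sketch (assuming NumPy >= 1.17) uses SeedSequence.spawn() to derive statistically independent child generators from a single root seed, instead of seeding the generators with 0..nsamp-1 directly:
import numpy as np

def make_rngs(nsamp, seed=None):
    # spawn() gives independent, non-overlapping streams for the workers
    child_seeds = np.random.SeedSequence(seed).spawn(nsamp)
    return [np.random.default_rng(s) for s in child_seeds]

# e.g. pass make_rngs(nsamp, seed=42) as the iterable to map_async above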
I have a for loop that computes the sum of the output of a custom function called calculate_some, which takes tuples as input and outputs a single value. I want to speed up this code, as it goes through 1000+ values.
Can vectorization speed this up? What are my options?
sum_calculate = 0
for i in range(0, len(GT_ndarray)):
    sum_calculate = sum_calculate + calculate_some(Candidates[i][0],Candidates[i][1])
print(sum_calculate)
The code for calculate_some is this
def calculate_some(arr1,arr2):
    some = arr1[0]*arr2[0]+arr1[1]+arr2[1]+arr1[2]*arr2[2]
    return some
You can use multiprocessing.Pool, for example
import multiprocessing as mp
def worker(i):
    return calculate_some(Candidates[i][0],Candidates[i][1])
pool = mp.Pool(mp.cpu_count() - 1)
sum_calculate = sum(list(pool.map(worker, range(len(GT_ndarray)))))
pool.close()
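Alternatively, since the question asks specifically about vectorization: if Candidates can be viewed as a NumPy array of shape (N, 2, 3), the loop and calculate_some collapse into plain array operations. A sketch under that shape assumption (the names come from the question):
import numpy as np

cand = np.asarray(Candidates, dtype=float)   # assumed shape (N, 2, 3)
# restrict to cand[:len(GT_ndarray)] if Candidates is longer than GT_ndarray
arr1, arr2 = cand[:, 0, :], cand[:, 1, :]
# elementwise version of calculate_some applied to every pair at once
per_pair = arr1[:, 0]*arr2[:, 0] + arr1[:, 1] + arr2[:, 1] + arr1[:, 2]*arr2[:, 2]
sum_calculate = per_pair.sum()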
Similar to this question, How to share a variable in 'joblib' Python library, I want to share a variable in joblib. However, my problem is completely different: I have a huge variable (2-3 GB of RAM) and I want all my threads to read from it. They will never write to it; something like:
def func(varThatChange, varToRead):
    # Do something over varToRead depending on varThatChange
    return results

def main():
    results = Parallel(n_jobs=100)(delayed(func)(varThatChange, varToRead) for varThatChange in listVars)
I cannot share it in the normal way because copying the variable takes a lot of time and, moreover, I run out of memory.
How can I share it?
If your data/variable can be indexed, you can use an approach like this:
from joblib import Parallel, delayed
import numpy as np
# dummy data
big_data = np.arange(1000)
# size of the data
data_size = len(big_data)
# number of chunks the data should be divided in for multiprocessing
num_chunks = 12
# size of one chunk
chunk_size = int(data_size / num_chunks)
# get the indices of the chunks
chunk_ind = [[i, i + chunk_size] for i in range(0, data_size, chunk_size)]
# function that does the data processing
def processing_func(segment):
    # do the data processing
    x = big_data[segment[0] : segment[-1]] * 1
    return x
# results of the parallel processing - one list per chunk
parallel_results = Parallel(n_jobs=10)(delayed(processing_func)(i) for i in chunk_ind)
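A further option, as a sketch only: joblib can memory-map large NumPy array arguments to a temporary file (the max_nbytes / mmap_mode options of Parallel), so passing the big array explicitly lets workers read it without each keeping a private in-memory copy. The array here is a dummy-sized stand-in for the 2-3 GB variable:
from joblib import Parallel, delayed
import numpy as np

big_data = np.arange(1000)  # stand-in for the huge read-only array

def processing_func(data, segment):
    # read-only access to the (possibly memmapped) shared array
    return data[segment[0]:segment[-1]] * 1

chunk_ind = [[i, i + 100] for i in range(0, len(big_data), 100)]
# arrays larger than max_nbytes are dumped to disk and memmapped by joblib
parallel_results = Parallel(n_jobs=10, max_nbytes='1M', mmap_mode='r')(
    delayed(processing_func)(big_data, seg) for seg in chunk_ind)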
I am using JModelica to optimize a model using IPOPT in the background.
I would like to run many optimizations in parallel. At the moment I am doing
this using the multiprocessing module.
Right now, the code is as follows. It performs a parameter sweep over the
variables T and So and writes the results to output files named for these
parameters. The output files also contain a list of the parameters used in the
model along with the run results.
#!/usr/local/jmodelica/bin/jm_python.sh
import itertools
import multiprocessing
import numpy as np
import time
import sys
import signal
import traceback
import StringIO
import random
import cPickle as pickle
def PrintResToFile(filename,result):
    def StripMX(x):
        return str(x).replace('MX(','').replace(')','')

    varstr = '#Variable Name={name: <10}, Unit={unit: <7}, Val={val: <10}, Col={col:< 5}, Comment="{comment}"\n'

    with open(filename,'w') as fout:
        #Print all variables at the top of the file, along with relevant information
        #about them.
        for var in result.model.getAllVariables():
            if not result.is_variable(var.getName()):
                val = result.initial(var.getName())
                col = -1
            else:
                val = "Varies"
                col = result.get_column(var.getName())
            unit = StripMX(var.getUnit())
            if not unit:
                unit = "X"
            fout.write(varstr.format(
                name    = var.getName(),
                unit    = unit,
                val     = val,
                col     = col,
                comment = StripMX(var.getAttribute('comment'))
            ))

        #Ensure that time variable is printed
        fout.write(varstr.format(
            name    = 'time',
            unit    = 's',
            val     = 'Varies',
            col     = 0,
            comment = 'None'
        ))

        #The data matrix contains only time-varying variables. So fetch all of
        #these, couple them in tuples with their column number, sort by column
        #number, and then extract the name of the variable again. This results in a
        #list of variable names which are guaranteed to be in the same order as the
        #data matrix.
        vkeys_in_order = [(result.get_column(x),x) for x in result.keys() if result.is_variable(x)]
        vkeys_in_order = map(lambda x: x[1], sorted(vkeys_in_order))

        for vk in vkeys_in_order:
            fout.write("{0:>13},".format(vk))
        fout.write("\n")

        sio = StringIO.StringIO()
        np.savetxt(sio, result.data_matrix, delimiter=',', fmt='%13.5f')
        fout.write(sio.getvalue())
def RunModel(params):
    T  = params[0]
    So = params[1]

    try:
        import pyjmi
        signal.signal(signal.SIGINT, signal.SIG_IGN)

        #For testing what happens if an error occurs
        # import random
        # if random.randint(0,100)<50:
        #     raise "Test Exception"

        op = pyjmi.transfer_optimization_problem("ModelClass", "model.mop")
        op.set('a',  0.20)
        op.set('b',  1.00)
        op.set('f',  0.05)
        op.set('h',  0.05)
        op.set('S0', So)
        op.set('finalTime', T)

        # Set options, see: http://www.jmodelica.org/api-docs/usersguide/1.13.0/ch07s06.html
        opt_opts = op.optimize_options()
        opt_opts['n_e'] = 40
        opt_opts['IPOPT_options']['tol'] = 1e-10
        opt_opts['IPOPT_options']['output_file'] = '/z/err_'+str(T)+'_'+str(So)+'_info.dat'
        opt_opts['IPOPT_options']['linear_solver'] = 'ma27' #See: http://www.coin-or.org/Ipopt/documentation/node50.html

        res = op.optimize(options=opt_opts)

        result_file_name = 'out_'+str(T)+'_'+str(So)+'.dat'
        PrintResToFile(result_file_name, res)

        return (True,(T,So))
    except:
        ex_type, ex, tb = sys.exc_info()
        return (False,(T,So),traceback.extract_tb(tb))
try:
    fstatus = open('status','w')
except:
    print("Could not open status file!")
    sys.exit(-1)

T  = map(float,[10,20,30,40,50,60,70,80,90,100,110,120,130,140])
So = np.arange(0.1,30.1,0.1)
tspairs = list(itertools.product(T,So))
random.shuffle(tspairs)

pool  = multiprocessing.Pool()
mapit = pool.imap_unordered(RunModel,tspairs)
pool.close()

completed = 0
while True:
    try:
        res = mapit.next(timeout=2)
        pickle.dump(res,fstatus)
        fstatus.flush()
        completed += 1
        print(res)
        print "{0: >4} of {1: >4} ({2: >4} left)".format(completed,len(tspairs),len(tspairs)-completed)
    except KeyboardInterrupt:
        pool.terminate()
        pool.join()
        sys.exit(0)
    except multiprocessing.TimeoutError:
        print "{0: >4} of {1: >4} ({2: >4} left)".format(completed,len(tspairs),len(tspairs)-completed)
    except StopIteration:
        break
Using the model:
optimization ModelClass(objective=-S(finalTime), startTime=0, finalTime=100)
    parameter Real S0 = 2;
    parameter Real F0 = 0;
    parameter Real a  = 0.2;
    parameter Real b  = 1;
    parameter Real f  = 0.05;
    parameter Real h  = 0.05;
    output Real F(start=F0, fixed=true, min=0, max=100, unit="kg");
    output Real S(start=S0, fixed=true, min=0, max=100, unit="kg");
    input Real u(min=0, max=1);
equation
    der(F) = u*(a*F+b);
    der(S) = f*F/(1+h*F)-u*(a*F+b);
end ModelClass;
Is this safe?
No, it is not safe. op.optimize() will store the optimization results with a file name derived from the model name, and then load those results to return the data, so when you try to run several optimizations at once you get a race condition. To circumvent this, you can provide a distinct result file name for each run via opt_opts['result_file_name'].
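For example, inside RunModel() the result file name could be derived from the sweep parameters, in the same way the IPOPT output_file already is. A sketch only; the option name is the one mentioned above, and T and So are the parameters already in scope there:
opt_opts = op.optimize_options()
opt_opts['result_file_name'] = 'res_'+str(T)+'_'+str(So)+'.txt'  # unique per (T, So) pair
res = op.optimize(options=opt_opts)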
No. It does not seem to be safe as of 02015-11-09.
The code above names output files according to the input parameters. The output files also contain the input parameters used to run the model.
With 4 cores, two situations arise:
Occasionally the error "Inconsistent number of lines in the result data." is raised in the file /usr/local/jmodelica/Python/pyjmi/common/io.py.
Output files show one set of parameters internally but are named for a different set of parameters, which indicates disagreement between the parameters the script thinks it is processing and the parameters that are actually being processed.
With 24 cores:
The error "The result does not seem to be of a supported format." is repeatedly raised by /usr/local/jmodelica/Python/pyjmi/common/io.py.
Together, this information suggests that intermediate files are being used by JModelica, but that there is overlap in the names of the intermediate files resulting in errors in the best case and incorrect results in the worst case.
One might hypothesize that this is the result of bad random number generation in a tempfile function somewhere, but a bug relating to that was resolved on 02011-11-25. Perhaps the PRNGs are being seeded based on a system clock or a constant and therefore progress in sync?
However, this does not seem to be the case since the following does not produce collisions:
#!/usr/bin/env python
import time
import tempfile
import os
import collections
from multiprocessing import Pool
def f(x):
    tf = tempfile.NamedTemporaryFile(delete=False)
    print(tf.name)
    return tf.name
p = Pool(24)
ret = p.map(f, range(2000))
counts = collections.Counter(ret)
print(counts)
I am working with some code in Python and MPI4PY that is throwing a strange error. When I try to run the code below, it throws the following:
ERROR; return code from pthread_create() is 11
Error detail: Resource temporarily unavailable
sh: fork: retry: Resource temporarily unavailable
/home/sfortney/anaconda/lib/python2.7/site-packages/numexpr/cpuinfo.py:40: UserWarning: [Errno 11] Resource temporarily unavailable
warnings.warn(str(e), UserWarning, stacklevel=stacklevel)
I scaled this code up from a simpler, working MPI4PY script, which I have also posted below. From my research on this error, it seems that I am creating too many threads. This seems odd to me, as I am not calling any threading, just multiple processors (my basic understanding is that threads are an intra-core phenomenon that wouldn't come into play if I were just using multiple cores and doing one thing on each - sorry if this is not true).
I can't make sense of why the code at the bottom works perfectly but the code immediately below, which uses the same structure, does not. Why would the code below be running into thread constraints? And where in the code is it even calling multiple threads?
I have posted the whole code below for reproducibility of the error. If it is relevant, I am running this on a 32-core Linux box.
#to run this call "mpiexec -n 10 python par_implement_wavefront.py" in terminal
from __future__ import division
import pandas as pd
import numpy as np
import itertools
import os
from itertools import chain, combinations
from operator import add
from collections import Counter
home="/home/sfortney"
np.set_printoptions(precision=2, suppress=True)
#choose dimensionality and granularity
dim=2
gran=5
from mpi4py import MPI
from mpi4py.MPI import ANY_SOURCE
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
command_buffer = np.zeros(3) # first entry is boolean, second is tuple with objective function inputs, third is array index
result_buffer=np.zeros(3) # first position is node,
if rank==0:
    #defining all of our functions we will need on the root node first

    #makes ax1 into the axes of an n dim array
    def axis_fitter(arr, dim, gran, start=1, stop=101):
        ax1=np.linspace(start,stop, num=gran)
        for i in range(dim):
            indexlist=[0]*dim
            indexlist[i]= slice(None)
            arr[indexlist]=ax1
        return arr

    #this is used to make the initial queues
    #fix me to work with nan's!
    def queue_init(arr):
        queue=[]
        queueposs=[]
        queuedone=np.argwhere(arr >0).tolist()
        return queue,queueposs,queuedone

    #this is used in the queue updating function
    def queue_sorter(queue):
        queue.sort(key=lambda x: np.linalg.norm(np.array(x)))
        # using the L1 norm
        # queue.sort(key=lambda x: sum(x))
        return queue

    #this finds all the indices to the "back" of our box
    def back_index(dim):
        standardbasis=[]
        for i in range(dim):
            vec=[0]*dim
            vec[i]=vec[i]+1
            standardbasis.append(vec)

        powerset=[]
        for z in chain.from_iterable(combinations(standardbasis,r) for r in range(len(standardbasis)+1)):
            powerset.append(z)

        powersetnew=[]
        for i in range(len(powerset)):
            powersetnew.append([sum(x) for x in zip(*list(powerset[i]))])
        powersetnew.remove([])
        powersetnew=[[i*(-1) for i in x] for x in powersetnew]
        return powersetnew
    #this takes a completed index and updates our queue of possible values
    #as well as our done queue
    def queue_update(queue,queueposs,queuedone, arr,dim,comp_idx=[0,0]):
        queuedone.append(comp_idx)
        if comp_idx==[0,0]:
            init_index=[1]*dim
            queue.append(init_index)
            for i in range(dim):
                poss_index=[1]*dim
                poss_index[i]=2
                queueposs.append(poss_index)
            return queue,queueposs,queuedone
        else:
            queuedone.append(comp_idx)
            try:
                queueposs.remove(comp_idx)
            except:
                pass
            for i in range(dim):
                new_idx=comp_idx[:]
                new_idx[i]=new_idx[i]+1
                back_list=back_index(dim)
                back_list2=[]
                for x in back_list:
                    back_list2.append(list(np.add(np.asarray(new_idx),np.asarray(x))))
                if set(tuple(x) for x in back_list2).issubset(set(tuple(x) for x in queuedone)):
                    queueposs.append(new_idx)
            queueposs=list(set(tuple(x) for x in queueposs)-set(tuple(x) for x in queuedone))
            queueposs=[list(x) for x in queueposs]
            queueposs=queue_sorter(queueposs)
            try:
                for x in range(len(queueposs)):
                    queueappender=(queueposs).pop(x)
                    queue.append(queueappender)
            except:
                print "queueposs empty"
            queue=queue_sorter(queue)
            return queue,queueposs,queuedone
    #this function makes it so we dont have to pass the whole array through MPI but only the pertinent information
    def objectivefuncprimer(arr, queue_elem, dim):
        inputs=back_index(dim)
        inputs2=[]
        for x in inputs:
            inputs2.append(list(np.add(np.asarray(queue_elem),np.asarray(x))))
        inputs3=[]
        for x in range(len(inputs2)):
            inputs3.append(arr[tuple(inputs2[x])])
        return inputs3

    #this function takes a value and an index and assigns the array that value at the index
    def arrupdater(val,idx):
        arr[tuple(idx)]=val
        return arr, idx
    #########Initializing
    all_finished=False

    #make our empty array
    sizer=tuple([gran]*dim)
    arr=np.zeros(shape=sizer)
    nodes_avail=range(1, size) # 0 is not a worker

    #assumes axes all start at same place
    ax1=np.linspace(20,30, num=gran)
    arr=axis_fitter(arr, dim, gran)

    #fitting axes and initializing queues
    arr=axis_fitter(arr, dim, gran, start=20, stop=30)
    queue,queueposs,queuedone =queue_init(arr)

    #running first updater
    queue,queueposs,queuedone=queue_update(queue,queueposs,queuedone,arr,dim)

    def sender(queue):
        send_num=min(len(queue),len(nodes_avail))
        for k in range(send_num):
            node=nodes_avail.pop()
            queue_elem=queue.pop(k)
            command_buffer[0]=int(all_finished)
            command_buffer[1]=queue_elem
            command_buffer[2]=objectivefuncprimer(arr,queue_elem,dim)
            comm.Send(command_buffer, dest=node)
    while all_finished==False:
        sender(queue)
        comm.Recv(result_buffer,source=MPI.ANY_SOURCE)
        arr,comp_idx=arrupdater(result_buffer[1],result_buffer[2])
        queue,queueposs,queuedone=queue_update(queue,queueposs,queuedone,arr,dim,comp_idx)
        nodes_avail.append(result_buffer[0])
        if len(queuedone)==gran**2:
            for n in range(1, size):
                comm.Send(np.array([True,0,0]), dest=n)
            all_finished=True

    print arr
if rank>0:
    all_finished_worker=False

    #this test function will only work in 2d
    def objectivefunc2d_2(inputs):
        #this will be important for more complicated functions later
        #backnum=(2**dim)-1
        val=sum(inputs)
        return val

    while all_finished_worker==False:
        comm.Recv(command_buffer, source=0)
        all_finished_worker=bool(command_buffer[0])
        if all_finished_worker==False:
            result=objectivefunc2d_2(command_buffer[2])
            # print str(result) +" from "+str(rank)
            result_buffer=np.array([rank,result,command_buffer[1]])
            comm.Send(result_buffer, dest=0)
This code works and has the same basic structure as the code above, but on a much, much simpler example.
from __future__ import division
import numpy as np
import os
from itertools import chain, combinations
from mpi4py import MPI
from mpi4py.MPI import ANY_SOURCE
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
command_buffer = np.zeros(2) # first entry is boolean, rest is data
result_buffer=np.zeros(2) # first position is node, rest is data
if rank==0:
    all_finished=False
    nodes_avail=range(1, size) # 0 is not a worker
    arr=[]
    q=range(20)

    def primer(q):
        return int(all_finished),q

    def sender(q):
        send_num=min(len(q),len(nodes_avail))
        for k in range(send_num):
            node=nodes_avail.pop()
            queue_init=q.pop()
            command_buffer[0]=primer(queue_init)[0]
            command_buffer[1]=primer(queue_init)[1]
            comm.Send(command_buffer, dest=node)

    while all_finished==False:
        sender(q)
        # update q
        comm.Recv(result_buffer,source=MPI.ANY_SOURCE)
        arr.append(result_buffer[1])
        nodes_avail.append(result_buffer[0])
        if len(arr)==20:
            for n in range(1, size):
                comm.Send(np.array([True,0]), dest=n)
            all_finished=True

    print arr
if rank>0:
    all_finished_worker=False
    while all_finished_worker==False:
        comm.Recv(command_buffer, source=0)
        all_finished_worker=bool(command_buffer[0])
        if all_finished_worker==False:
            result=command_buffer[1]*2
            # print str(result) +" from "+str(rank)
            result_buffer=np.array([rank,result])
            comm.Send(result_buffer, dest=0)