PicklingError on pp module of python - python

I'm using the pp module of python.
What I need to do is run in parallel the "fmin" function of "scipy.optimize".
I'm importing fmin like this:
from scipy.optimize import fmin
Next, I'm defining a function which executes the fmin function like this:
def fitting():
v = fmin(e, v0, args=(x,y),maxiter=10000, maxfun=10000)
return v
And for this to run in parallel I'm using:
job5 = job_server.submit(fitting, (e, v0, x, y,), (fitting,), ("scipy.optimize",))
v = job5()
Then I get a PicklingError in module of job5. That is "scipy.optimize" I guess.
I also tried import scipy.optimize as sth but job_server.submit does not accept "sth" as a module.
Any solutions?
Thank you.

Put the from scipy.optimize import fmin import line into the fitting function directly, and stop passing it into submit.

You can't do this with pp very easily. However, if you use dill and the fork of pp that is in pathos (i.e. pathos.pp) then it works in most cases.
See several examples in the mystic optimization package, which provides parallel and distributed optimization using extensions of the scipy optimizers.
For example, this works with both pathos.multiprocessing and pathos.pp:
https://github.com/uqfoundation/mystic/blob/master/examples/buckshot_example06.py
The above code launches several fmin_powell instances in parallel, which can give you pseudo-global optimization with steepest-descent speeds.
Get the code here: https://github.com/uqfoundation

Related

Pass arguments to objective function in scipy differential evolution with multiple workers

For some optimization problem I am using differential evolution from scipys optimization toolbox.
I'd like to use several CPUs to speed up the process, but I would like to pass several additional arguments to the objective function. However, these are not just some scalars but some datasets which are required for the optimization to evaluate the models.
When I try to pass the arguments to the objective function directly in the usual way, python complains that the objective function is not pickable. When I put my data into a dictionary and pass that to the objective function, python complains about
" File "/usr/lib64/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
"
How can I pass nontrivial data to the objective function of differential evolution when using multiple workers? I have yet to find a way.
Something like
par02 = {'a':2,'b':3, "data":train_data}
# Define optimization bounds.
bounds = [(0, 10), (0, 10)]
# Attempt to optimize in series.
# series_result = differential_evolution(rosenbrock, bounds, args=(par02,))
# print(series_result.x)
# Attempt to optimize in parallel.
parallel_result = differential_evolution(rosenbrock, bounds, args=(par02,),
updating='deferred', workers=-1)
does not work for example.
Anyone got an idea? Or do I really have to load the data from disk everytime the objective function is called? That would slow down the optimization considerably I believe.
It always helps if a MWE is provided. For parallel processing to be used the objective function and args need to be picklable.
The following illustrates the problem:
-------stuff.py--------
from scipy.optimize import rosen
def obj(x, *args):
return rosen(x)
-------in the CLI-------
import numpy as np
import pickle
from scipy.optimize import differential_evolution, rosen
from stuff import obj
train_data = "123"
par02 = {'a':2,'b':3, "data":train_data}
bounds = [(0, 10), (0, 10)]
# This will work because `obj` is importable in
# the __main__ context.
differential_evolution(obj, bounds, args=(par02,),
updating='deferred', workers=-1)
def some_other_func(x, *args):
pass
wont_work = {'a':2,'b':3, "data":some_other_func}
# look at the output here. ` some_other_func` is referenced with
# respect to __main__
print(pickle.dumps(wont_work))
# the following line will hang because `some_other_func`
# is not importable by the main process; to get it
# to work the function has to reside in an importable file.
parallel_result = differential_evolution(obj, bounds, args=(wont_work,),
updating='deferred', workers=-1)
Basically you can't use classes/functions that are defined in the main context (i.e. the CLI), they have to be importable by main.
https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming

Using R packages in Python using rpy2

There is a package in R that I need to use on my data. All my data preprocessing has already been done in python and all the modelling as well. The package in R is 'PMA'. I have used r2py before using Rs PLS package as follows
import numpy as np
from rpy2.robjects.numpy2ri import numpy2ri
import rpy2.robjects as ro
def Rpcr(X_train,Y_train,X_test):
ro.r('''source('R_pls.R')''')
r_pls=ro.globalenv['R_pls']
r_x_train=numpy2ri(X_train)
r_y_train=numpy2ri(Y_train)
r_x_test=numpy2ri(X_test)
p_res=r_pls(r_x_train,r_y_train,r_x_test)
yp_test=np.array(p_res[0])
yp_test=yp_test.reshape((yp_test.size,))
yp_train=np.array(p_res[1])
yp_train=yp_train.reshape((yp_train.size,))
ncomps=np.array(p_res[2])
ncomps=ncomps.reshape((ncomps.size,))
return yp_test,yp_train,ncomps
when I followed this format is gave an error that function numpy2ri does not exist.
So I have been working off of rpy2 manual and have tried a number of things with no success. The package I am working with in R is implemented like so:
library('PMA')
cspa=CCA(X,Z,typex="standard", typez="standard", K=1, penaltyx=0.25, penaltyz=0.25)
# X and Z are dataframes with dimension ppm and pXq
# cspa returns an R object which I need two attributes u and v
U<-cspa$u
V<-cspa$v
So trying to implement something like I was seeing on the rpy2 tried to load the module in python and use it in python like so
import rpy2.robjects as ro
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage as STAP
from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import importr
base=importr('base'
scca=importr('PMA')
numpy2ri.activate() # To turn NumPy arrays X1 and X2 to r objects
out=scca.CCA(X1,X2,typex="standard",typez="standard", K=1, penaltyz=0.25,penaltyz=0.25)
and got the following error
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
Abort trap: 6
I also tried using R code directly using an example they had
string<-'''SCCA<-function(X,Z,K,alpha){
library("PMA")
scca<-CCA(X,Z,typex="standard",typez="standard",K=K penaltyx=alpha,penaltyz=alpha)
u<-scca$u
v<-scca$v
out<-list(U=u,V=v)
return(out)}'''
scca=STAP(string,"scca")
which as I understand can be used like an r function directly
numpy2ri.activate()
scca(X,Z,1,0.25)
this results in the same error as above.
So I do not know exactly how to fix it and have been unable to find anything similar.
The error for some reason is a mac-os issue. https://stackoverflow.com/a/53014308/1628393
Thus all you have to do
is modify it with this command and it works well
os.environ['KMP_DUPLICATE_LIB_OK']='True'
string<-'''SCCA<-function(X,Z,K,alpha){
library("PMA")
scca<-CCA(X,Z,typex="standard",typez="standard",K=Kpenaltyx=alpha,penaltyz=alpha)
u<-scca$u
v<-scca$v
out<-list(U=u,V=v)
return(out)}'''
scca=STAP(string,"scca")
then the function is called by
scca.SCCA(X,Z,1,0.25)

Python: ValueError: not enough values to unpack (expected 3, got 1)

I am kinda new to multiprocessing library in python. I have watched couple of lectures on youtube and have implemented basic instances of process from multiprocessing. My problem is, I want to speed up my code by utilizing all the 8 available cores. I asked a similar question in this regard a couple of days back but couldnt elicit a reasonable answer. I went back and did thorough homework and now after having understood the basics of pool(among other things in multiprocessing), I am kinda stuck in a peculier situation. I went through similar problems that were posed here at SO for instance this one-(Python multiprocessing pool map with multiple arguments) and (Python multiprocessing pool.map for multiple arguments). I even implemented the solution mentioned in former. However my code is still throwing this error-
File "minimal_example_so.py", line 84, in <module>
X,u,t=pool.starmap(solver,[args])
ValueError: not enough values to unpack (expected 3, got 1)
My code looks something like this-
import time
ti=time.time()
import matplotlib.pyplot as plt
from scipy.integrate import ode
import numpy as np
from numpy import sin,cos,tan,zeros,exp,tanh,dot,array
from matplotlib import rc
import itertools
from const_for_drdo_mod4 import *
plt.style.use('bmh')
import mpltex
from functools import reduce
from multiprocessing import Pool
import multiprocessing
linestyles=mpltex.linestyle_generator()
def zetta(x,spr,c):
num=len(x)*len(c)
Mu=[[] for i in range(len(x))]
for i in range(len(x)):
Mu[i]=np.zeros(len(c))
m=[]
for i in range(len(x)):
for j in range(len(c)):
Mu[i][j]=exp(-.5*((x[i]-c[j])/spr)**2)
b=list(itertools.product(*Mu))
for i in range(len(b)):
m.append(reduce(lambda x,y:x*y,b[i]))
m=np.array(m)
S=np.sum(m)
return m/S
def f(t,Y,a,b,spr,tim,so,k,K,C):
x1,x2=Y[0],Y[1]
e=x1-2
de=-2*x1+a*x2+b*sin(x1)
s=de+2*e
xx=[e,de]
sold,ti=so,tim
#import pdb;pdb.set_trace()
theta=Y[2:2+len(C)**len(xx)]
Z=zetta(xx,spr,C)
u=dot(Z,theta)
Z1=list(Z)
dt=time.time()-ti
ti=time.time()
sodt=(s-sold)/dt
x1dot=de
x2dot=-x2*cos(x1)+cos(2*x1)*u
xdot=[x1dot,x2dot]
thetadot=[-20*number*(sodt+k*s+K*tanh(s))-20*.1*number2 for number,number2 in zip(Z1,theta)]
sold=s
ydot=xdot+thetadot
return [ydot,u]
def solver(t0,y0,t1,dt,a,b,spr,tim,so,k,K,C):
num=2
x,t=[[] for i in range(2+len(C)**num)],[]
u=[]
r=ode(lambda t,y,a,b,spr,tim,so,k,K,C: f(t,y,a,b,spr,tim,so,k,K,C)[0]).set_integrator('dopri5',method='bdf')
r.set_initial_value(y0,t0).set_f_params(a,b,spr,tim,so,k,K,C)
while r.successful() and r.t<t1:
r.integrate(r.t+dt)
for i in range(2+len(C)**num):
x[i].append(r.y[i])
u.append(f(r.t,r.y,a,b,spr,tim,so,k,K,C)[1])
t.append(r.t)
return x,u,t
if __name__=='__main__':
spr,C=1.5,[-3,-1.5,0,1.5,3]
num=2
k,K=2,5
tim,so=0,0
a,b=1,2
y0,T=[0.1,0],100
x1=[0 for i in range(len(C)**num)]
x0=y0+x1
args=(0,x0,T,1e-2,a,b,spr,tim,so,k,K,C)
pool=multiprocessing.Pool(3)
X,u,t=pool.starmap(solver,[args])
#X,u,t=solver(0,x0,T,1e-2,a,b,spr,tim,so,k,K,C)
nam=["x1","x2"]
pool.close()
pool.join()
plt.figure(1)
for i in range(len(X[0:2])):
plt.plot(t,X[i],label=nam[i])
plt.legend(loc='upper right')
plt.figure(2)
for i in range(len(X[2:])):
plt.plot(t,X[i])
plt.figure(3)
plt.plot(t,u)
plt.show()
Here I wish to tell that all my arguments are either plain numbers or lists/array of numbers which needs to be passed to the solver method. I tried couple of ways to pass my arguments to the pool using map and even starmap, but all of them were in vain. Kindly help, thanks in advance.
PS- my code is working absolutely fine without pool.
You're calling starmap with a list of just one tuple of arguments. The return value will therefore also be a list containing one element - the tuple returned by one call to solver. So you're effectively saying
X, u, t = [(x1, u1, t1)]
which is why you get the exception you're getting: you can't unpack one value (the returned tuple) into three variables. If you want to use starmap here you'll need to do something like:
[(X,u,t)] = pool.starmap(solver,[args])
instead. But for only one set of arguments it makes more sense to use apply, since it's designed for a single invocation:
X,u,t = pool.apply(solver, args)

Is there a way to serialize a sympy ufuncify-ied function?

Is there a way to serialize or save a function that has been binarized using the ufuncify tool from SymPy's autowrap module?
There is a solution here using dill: How to serialize sympy lambdified function?, but this only works for lambdified functions.
Here's a minimal working example to illustrate:
>>> import pickle
>>> import dill
>>> import numpy
>>> from sympy import *
>>> from sympy.utilities.autowrap import ufuncify
>>> x, y, z = symbols('x y z')
>>> fx = ufuncify([x], sin(x))
>>> dill.settings['recurse'] = True
>>> dill.detect.errors(fx)
pickle.PicklingError("Can't pickle <ufunc 'wrapper_module_0'>: it's not found as __main__.wrapper_module_0")
Motivation: I have a few 50-million character-long SymPy expressions that I'd like to speed up; ufuncify works wonderfully (three-orders-of-magnitude improvement over lambidfy alone), but it takes 3 hr to ufuncify each expression. I would like to be able to take advantage of the ufuncify-ied expressions from time to time (without waiting a day to re-process them). Update: to illustrate, computer going to sleep killed my python kernel, and now I will need to wait ~10 hr to recover the binarized functions
Got it. This is actually simpler than using dill/pickle, as the autowrap module produces and compiles C code (by default) that can be imported as a C-extension module [1] [2].
One way to save the generated code is to simply specify the temp directory for autowrap[3], e.g.:
Python environment 1:
import numpy
from sympy import *
from sympy.utilities.autowrap import ufuncify
x, y, z = symbols('x y z')
fx = ufuncify([x], sin(x), tempdir='tmp')
Python environment 2 (same root directory):
import sys
sys.path.append('tmp')
import wrapper_module_0
wrapper_module_0.wrapped_8859643136(1)
There's probably a better approach (e.g., is there an easy way to name the module and its embedded function?), but this will work for now.
[1] http://www.scipy-lectures.org/advanced/interfacing_with_c/interfacing_with_c.html
[2] https://docs.python.org/2/extending/extending.html
[3] http://docs.sympy.org/latest/modules/utilities/autowrap.html

How to use IPython.parallel for functions with multiple inputs?

This is my first attempt at using IPython.parallel so please bear with me.
I read this question
Parfor for Python
and am having trouble implementing a simple example as follows:
import gmpy2 as gm
import numpy as np
from IPython.parallel import Client
rc = Client()
lview = rc.load_balanced_view()
lview.block = True
a = 1
def L2(ii,jj):
out = []
out.append(gm.fac(ii+jj+a))
return out
Nloop = 100
ii = range(Nloop)
jj = range(Nloop)
R2 = lview.map(L2, zip(ii, jj))
The problems I have are:
a is defined outside the loop and I think I need to do something like "push" but am a bit confused by that. Do I need to "pull" after?
there are two arguments that are required for the function and I don't know how to pass them correctly. I tried things like zip(ii,jj) but got some errors.
Also,, I assume the fact that I'm using a random library gmpy2 shouldn't affect things. Is this correct? Do I need to do anything special for this?
Ideally I would like your help so on this simple example the code runs error free.
If you think it would be beneficial to post my failed attempts at #2 let me know. I'm in the dark with #1.
I found two ways that make this work:
One is pushing the variable to the cores. There is no need to pull it. The variable will simply be defined in the namespace of each process-engine.
rc.client[:].push({'a':a})
R2 = lview.map(L2, ii, jj)
The other way is as to redefine L2 to take a as an input and pass an array of a's to the map function:
def L2(ii,jj,a):
out = []
out.append(gm.fac(ii+jj+a))
return out
R2 = lview.map(L2, ii, jj, [a]*Nloop)
With regards to the import as per this website:
http://ipython.org/ipython-doc/dev/parallel/parallel_multiengine.html#non-blocking-execution
You simply import the required libraries in the function:
Note the import inside the function. This is a common model, to ensure
that the appropriate modules are imported where the task is run. You
can also manually import modules into the engine(s) namespace(s) via
view.execute('import numpy')().
Or you can do as per this link
http://ipython.org/ipython-doc/dev/parallel/parallel_multiengine.html#remote-imports

Categories

Resources