Python3 not handling matplotlib plot when using a multiprocess pool

I have a small script creating different plots. Since no data are shared, I can do some multiprocessing. Using Python 2.7, no problem. With Python 3.6, I can't seem to make it work.
I am using a pool (https://docs.python.org/3/library/multiprocessing.html and https://docs.python.org/2/library/multiprocessing.html) since I do not share objects or anything.
For Python3, I get a crash without traceback at line (fig = plt.figure(number)).
I am running on macOS Sierra. I believe the problem is the same as in this topic (Saving multiple matplotlib figures with multiprocessing). Unfortunately, the problem wasn't really addressed there, as it was not the main issue.
One quick answer would be to use Python 2.7, but other pieces of my work rely on Python 3+ features.
Any idea how to at least get a traceback (verbose mode didn't show anything related to the crash), and then how to solve this issue?
Many thanks
Here is the smallest code producing the error, coming from the thread mentioned above. (This code will create 4 files in the folder of the script.)
import matplotlib.pyplot as plt
import numpy.random as random
from multiprocessing import Pool

def do_plot(number):
    fig = plt.figure(number)
    a = random.sample(100)
    b = random.sample(100)
    plt.scatter(a, b)
    plt.savefig("%03d.jpg" % (number,))
    plt.close()
    print("Done ", number)

if __name__ == '__main__':
    pool = Pool()
    pool.map(do_plot, range(4))
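A likely workaround, judging by the answers to the related questions further down: on macOS the pool workers are forked after the GUI backend has already been initialised, so selecting a non-GUI backend before pyplot is imported, or switching the start method to 'forkserver' (or 'spawn'), typically avoids the silent crash. Below is a minimal sketch combining both ideas, plus a try/except so each worker at least prints its own traceback; it is a sketch under those assumptions, not a confirmed fix for this exact setup.

import traceback
import matplotlib
matplotlib.use('Agg')          # select a non-GUI backend before pyplot is imported
import matplotlib.pyplot as plt
import numpy.random as random
import multiprocessing

def do_plot(number):
    try:
        fig = plt.figure(number)
        a = random.sample(100)
        b = random.sample(100)
        plt.scatter(a, b)
        plt.savefig("%03d.jpg" % (number,))
        plt.close(fig)
        print("Done ", number)
    except Exception:
        traceback.print_exc()  # make the worker report its own traceback
        raise

if __name__ == '__main__':
    # 'forkserver' (or 'spawn') avoids forking already-initialised GUI state on macOS
    multiprocessing.set_start_method('forkserver')
    with multiprocessing.Pool() as pool:
        pool.map(do_plot, range(4))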

Related

Python: ValueError: not enough values to unpack (expected 3, got 1)

I am kinda new to the multiprocessing library in Python. I have watched a couple of lectures on YouTube and have implemented basic instances of process from multiprocessing. My problem is that I want to speed up my code by utilizing all 8 available cores. I asked a similar question in this regard a couple of days back but couldn't elicit a reasonable answer. I went back and did thorough homework, and now, after having understood the basics of pool (among other things in multiprocessing), I am stuck in a peculiar situation. I went through similar problems that were posed here at SO, for instance this one (Python multiprocessing pool map with multiple arguments) and (Python multiprocessing pool.map for multiple arguments). I even implemented the solution mentioned in the former. However my code is still throwing this error:
File "minimal_example_so.py", line 84, in <module>
X,u,t=pool.starmap(solver,[args])
ValueError: not enough values to unpack (expected 3, got 1)
My code looks something like this:
import time
ti=time.time()
import matplotlib.pyplot as plt
from scipy.integrate import ode
import numpy as np
from numpy import sin,cos,tan,zeros,exp,tanh,dot,array
from matplotlib import rc
import itertools
from const_for_drdo_mod4 import *
plt.style.use('bmh')
import mpltex
from functools import reduce
from multiprocessing import Pool
import multiprocessing

linestyles=mpltex.linestyle_generator()

def zetta(x,spr,c):
    num=len(x)*len(c)
    Mu=[[] for i in range(len(x))]
    for i in range(len(x)):
        Mu[i]=np.zeros(len(c))
    m=[]
    for i in range(len(x)):
        for j in range(len(c)):
            Mu[i][j]=exp(-.5*((x[i]-c[j])/spr)**2)
    b=list(itertools.product(*Mu))
    for i in range(len(b)):
        m.append(reduce(lambda x,y:x*y,b[i]))
    m=np.array(m)
    S=np.sum(m)
    return m/S

def f(t,Y,a,b,spr,tim,so,k,K,C):
    x1,x2=Y[0],Y[1]
    e=x1-2
    de=-2*x1+a*x2+b*sin(x1)
    s=de+2*e
    xx=[e,de]
    sold,ti=so,tim
    #import pdb;pdb.set_trace()
    theta=Y[2:2+len(C)**len(xx)]
    Z=zetta(xx,spr,C)
    u=dot(Z,theta)
    Z1=list(Z)
    dt=time.time()-ti
    ti=time.time()
    sodt=(s-sold)/dt
    x1dot=de
    x2dot=-x2*cos(x1)+cos(2*x1)*u
    xdot=[x1dot,x2dot]
    thetadot=[-20*number*(sodt+k*s+K*tanh(s))-20*.1*number2 for number,number2 in zip(Z1,theta)]
    sold=s
    ydot=xdot+thetadot
    return [ydot,u]

def solver(t0,y0,t1,dt,a,b,spr,tim,so,k,K,C):
    num=2
    x,t=[[] for i in range(2+len(C)**num)],[]
    u=[]
    r=ode(lambda t,y,a,b,spr,tim,so,k,K,C: f(t,y,a,b,spr,tim,so,k,K,C)[0]).set_integrator('dopri5',method='bdf')
    r.set_initial_value(y0,t0).set_f_params(a,b,spr,tim,so,k,K,C)
    while r.successful() and r.t<t1:
        r.integrate(r.t+dt)
        for i in range(2+len(C)**num):
            x[i].append(r.y[i])
        u.append(f(r.t,r.y,a,b,spr,tim,so,k,K,C)[1])
        t.append(r.t)
    return x,u,t

if __name__=='__main__':
    spr,C=1.5,[-3,-1.5,0,1.5,3]
    num=2
    k,K=2,5
    tim,so=0,0
    a,b=1,2
    y0,T=[0.1,0],100
    x1=[0 for i in range(len(C)**num)]
    x0=y0+x1
    args=(0,x0,T,1e-2,a,b,spr,tim,so,k,K,C)
    pool=multiprocessing.Pool(3)
    X,u,t=pool.starmap(solver,[args])
    #X,u,t=solver(0,x0,T,1e-2,a,b,spr,tim,so,k,K,C)
    nam=["x1","x2"]
    pool.close()
    pool.join()
    plt.figure(1)
    for i in range(len(X[0:2])):
        plt.plot(t,X[i],label=nam[i])
    plt.legend(loc='upper right')
    plt.figure(2)
    for i in range(len(X[2:])):
        plt.plot(t,X[i])
    plt.figure(3)
    plt.plot(t,u)
    plt.show()
Here I wish to note that all my arguments are either plain numbers or lists/arrays of numbers which need to be passed to the solver method. I tried a couple of ways to pass my arguments to the pool using map and even starmap, but all of them were in vain. Kindly help, thanks in advance.
PS: my code works absolutely fine without the pool.
You're calling starmap with a list of just one tuple of arguments. The return value will therefore also be a list containing one element - the tuple returned by one call to solver. So you're effectively saying
X, u, t = [(x1, u1, t1)]
which is why you get the exception you're getting: you can't unpack one value (the returned tuple) into three variables. If you want to use starmap here you'll need to do something like:
[(X,u,t)] = pool.starmap(solver,[args])
instead. But for only one set of arguments it makes more sense to use apply, since it's designed for a single invocation:
X,u,t = pool.apply(solver, args)
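For illustration, a small self-contained sketch of the difference between the two calls (the toy solver here is a hypothetical stand-in that just returns three values, like the real one):

import multiprocessing

def solver(t0, t1, dt):
    # toy stand-in for the real solver: returns three values
    x = [t0, t1]
    u = [dt]
    t = [t0 + dt]
    return x, u, t

if __name__ == '__main__':
    args = (0, 100, 1e-2)
    with multiprocessing.Pool(3) as pool:
        # starmap returns a list with one element per argument tuple:
        [(X, u, t)] = pool.starmap(solver, [args])
        # apply runs a single invocation and returns its result directly:
        X, u, t = pool.apply(solver, args)
    print(X, u, t)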

Python multiprocessing pool function not defined

I need to implement a multiprocessing pool that utilizes arbitrary packages for calculations. For this, I'm using Python and joblib 0.9.0. This code is basically the structure I want.
import numpy as np
from joblib import pool

def someComputation(x):
    return np.interp(x, [-1, 1], [-1, 1])

if __name__ == '__main__':
    some_set_of_numbers = [-1,-0.5,0,0.5,1]
    the_pool = pool.Pool(processes=2)
    solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
    print(solutions[0].get())
On both Windows 10 and Red Hat Enterprise Linux running Anaconda 4.3.1 with Python 3.6.0 (as well as 3.5 and 3.4 in virtual envs), I find that 'np' is never made available inside the someComputation() function, which raises the error
File "C:\Anaconda3\lib\site-packages\multiprocessing_on_dill\pool.py", line 608, in get
raise self._value
NameError: name 'np' is not defined
however, on my Mac OS X 10.11.6 machine running Python 3.5 and the same joblib, I get the expected output of '-1' with the exact same code. This question is essentially the same, but it dealt with pathos rather than joblib. The general answer was to include the numpy import statement inside the function:
from joblib import pool

def someComputation(x):
    import numpy as np
    return np.interp(x, [-1, 1], [-1, 1])

if __name__ == '__main__':
    some_set_of_numbers = [-1,-0.5,0,0.5,1]
    the_pool = pool.Pool(processes=2)
    solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
    print(solutions[0].get())
This solves the issue on the Windows and Linux machines, which now output '-1' as expected, but the solution seems clunky. Is there any reason why the first bit of code works on a Mac, but not on Windows or Linux? I ultimately need to run this code on the Linux machine, so is there any fix that doesn't involve putting the import statement inside the function?
Edit:
After investigating a bit further, I found an old workaround I put in years ago that appears to be causing the issue. In joblib/pool.py, I changed line 44 from
from multiprocessing.pool import Pool
to
from multiprocessing_on_dill.pool import Pool
to support pickling of arbitrary functions. For some reason, this change is what really causes the issue on Windows and Linux, while the Mac machine runs just fine. Using multiprocessing instead of multiprocessing_on_dill fixes the above issue, but then the code doesn't work for the majority of my cases, since they can't be pickled.
I am not sure what the exact issue is, but it appears that there is some problem with transferring the global scope over to the subprocesses that run the task. You can potentially avoid name errors by binding the name np as a function parameter:
def someComputation(x, np=np):
    return np.interp(x, [-1, 1], [-1, 1])
This has the advantage of not requiring a call to the import machinery every time the function is run. The name np will be bound to the function when it is first evaluated during module loading.
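Another option, if rebinding the module as a default argument feels awkward, is to have each worker import numpy once through a pool initializer. This is only a sketch using the standard multiprocessing.Pool, not the dill-patched pool from the question, so it may or may not carry over directly:

from multiprocessing import Pool

np = None

def init_worker():
    # runs once in each worker process
    global np
    import numpy
    np = numpy

def someComputation(x):
    return np.interp(x, [-1, 1], [-1, 1])

if __name__ == '__main__':
    some_set_of_numbers = [-1, -0.5, 0, 0.5, 1]
    with Pool(processes=2, initializer=init_worker) as the_pool:
        solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
        print(solutions[0].get())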

Running rpy2 in parallel using multiprocessing raises weird exception that cannot be caught

So this is a problem that I have not been able to solve, and I do not know of a good way to make an MCVE out of it. Essentially, it has been briefly discussed here, but as the comments show, there was some disagreement, and the final verdict is still out. Hence I am posting a similar question again, hoping to get a better answer.
Background
I have sensor data from a couple of thousand sensors, that I get every minute. My interest lies in forecasting the data. For this I am using the ARIMA family of forecasting models. Long story short, after discussion with the rest of my research group, we decided to use the Arima function available in the R package forecast, instead of the statsmodels implementation of the same.
Problem Definition
Since I have data from a few thousand sensors, for which I would like to analyse at least a whole week's worth of data (to begin with), and since a week has 7 days, I have 7 times as many sensor-day combinations as sensors, essentially some 14k of them. Finding the best ARIMA order (the one which minimizes BIC) and forecasting the next day's data takes about 1 minute per sensor-day combination, which means upwards of 11 days just to process one week of data on a single core!
This is obviously a waste, when I have 15 more cores just idling away the whole time. So, obviously, this is a problem for parallel processing. Note that each sensor-day combination does not influence any other sensor-day combination. Also, the rest of my code is fairly well profiled, and optimized.
Issue
The issue is that I get this weird error that I cannot catch anywhere. Here is the error reproduced:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/kartik/miniconda3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/kartik/miniconda3/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/kartik/miniconda3/lib/python3.5/multiprocessing/pool.py", line 429, in _handle_results
    task = get()
  File "/home/kartik/miniconda3/lib/python3.5/multiprocessing/connection.py", line 251, in recv
    return ForkingPickler.loads(buf.getbuffer())
  File "/home/kartik/miniconda3/lib/python3.5/site-packages/rpy2/robjects/robject.py", line 55, in _reduce_robjectmixin
    rinterface_level=rinterface_factory(rdumps, rtypeof)
ValueError: Mismatch between the serialized object and the expected R type (expected 6 but got 24)
Here are a few characteristics of this error that I have discovered:
It is raised in the rpy2 package
It has something to do with Thread 3. Since Python is zero indexed, I am guessing this is the fourth thread. Therefore, 4x6 = 24, which adds up to the numbers shown in the final error statement
rpy2 is being used in only one place in my code where it might have to recode returned values into Python types. Protecting that line in try: ... except: ... does not catch that exception
The exception is not raised when I ditch the multiprocessing and call the function within a loop
The exception does not crash the program, just suspends it forever (till I Ctrl+C it into terminating)
All that I tried till now, have had no effect in resolving the error
Things Tried
I have tried everything from extreme procedural coding, with functions to deal with the least cases (that is only one function to be called in parallel), to extreme encapsulation, where the executable block in the if __name__ == '__main__': calls a function which reads in the data, does the necessary grouping, then passes the groups to another function, which imports multiprocessing and calls another function in parallel, which imports the processing module that imports rpy2, and passes the data to the Arima function in R.
Basically, it doesn't matter if rpy2 is called and initialized deep inside function nests, such that it has no idea another instance might be initialized, or if it is called and initialized once, globally, the error is raised if multiprocessing is involved.
Pseudo Code
Here is an attempt to present at least some basic pseudo code such that the error might be reproduced.
import numpy as np
import pandas as pd

def arima_select(y, order):
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    forecast = importr('forecast')
    res = forecast.Arima(y, order=ro.FloatVector(order))
    return res

def arima_wrapper(data):
    data = data[['tstamp', 'val']]
    data.set_index('tstamp', inplace=True)
    return arima_select(data, (1,1,1))

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    with Pool(cpu_count()) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])

def wrapper():
    df = pd.read_csv('file.csv', parse_dates=[1], infer_datetime_format=True)
    df['day'] = df['tstamp'].dt.day
    res = applyParallel(df.groupby(['sensor', 'day']), arima_wrapper)
    print(res)
Obviously, the above code can be encapsulated further, but I think it should reproduce the error quite accurately.
Data Sample
The output of print(data.head(6)), when placed immediately below data.set_index('tstamp', inplace=True) in arima_wrapper from the pseudo code above, shows the first few tstamp/val rows (sample output not reproduced here). Alternatively, data for a sensor for a whole week can be generated simply with:
def data_gen(start_day):
    r = pd.Series(pd.date_range('2016-09-{}'.format(str(start_day)),
                                periods=24*60, freq='T'),
                  name='tstamp')
    d = pd.Series(np.random.randint(10, 80, 1440), name='val')
    s = pd.Series(['sensor1']*1440, name='sensor')
    return pd.concat([s, r, d], axis=1)

df = pd.concat([data_gen(day) for day in range(1,8)], ignore_index=True)
Observations and Questions
The first observation is that this error is only raised when multiprocessing is involved, not when the function (arima_wrapper) is called in a loop. Therefore, it must be associated somehow with multiprocessing issues. R is not very multiprocess friendly, but when written in the way shown in the pseudo code, each instance of R should not know about the existence of the other instances.
The way the pseudo code is structured, there must be an initialization of rpy2 for each call inside the multiple subprocesses spawned by multiprocessing. If that were true, each instance of rpy2 should have spawned its own instance of R, which should just execute one function, and terminate. That would not raise any errors, because it would be similar to the single threaded operation. Is my understanding here accurate, or am I completely or partially missing the point?
Were all instances of rpy2 to somehow share an instance of R, then I might reasonably expect the error. What is true: is R shared among all instances of rpy2, or is there an instance of R for each instance of rpy2?
How might this issue be overcome?
Since SO hates question threads with multiple questions in them, I will prioritize my questions such that partial answers will be accepted. Here is my priority list:
How might this issue be overcome? A working code example that does not raise the issue will be accepted as answer even if it does not answer any other question, provided no other answer does better, or was posted earlier.
Is my understanding of Python imports accurate, or am I missing the point about multiple instances of R? If I am wrong, how should I edit the import statements such that a new instance is created within each subprocess? Answers to this question are likely to point me towards a probable solution, and will be accepted, provided no answer does better, or was posted earlier
Is R shared among all instances of rpy2 or is there an instance of R for each instance of rpy2? Answers to this question will be accepted only if they lead to a resolution of the problem.
(...) Long story short (...)
Really?
How might this issue be overcome? A working code example that does not raise the issue will be accepted as answer even if it does not answer any other question, provided no other answer does better, or was posted earlier.
Answers may leave a quite bit of work on your end...
Is my understanding of Python imports accurate, or am I missing the point about multiple instances of R? If I am wrong, how should I edit the import statements such that a new instance is created within each subprocess? Answers to this question are likely to point me towards a probable solution, and will be accepted, provided no answer does better, or was posted earlier
Python packages/modules are "uniquely" imported across your process which means that all code using the package/module within the process is using the same single import (you don't have a copy per import in a given block).
Because of this, I'd recommend using an initialization function when creating your Pool, rather than repeatedly importing rpy2 and setting up the conversion each time a task is sent to a worker. You may also gain in performance if each task is short.
def arima_select(y, order):
    # FIXME: check whether the rpy2.robjects package
    # should be (re) imported as ro to be visible
    res = forecast.Arima(y, order=ro.FloatVector(order))
    return res

forecast = None

def worker_init():
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    global forecast
    forecast = importr('forecast')

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    with Pool(cpu_count(), worker_init) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])
Is R shared among all instances of rpy2 or is there an instance of R for each instance of rpy2? Answers to this question will be accepted only if they lead to a resolution of the problem.
rpy2 makes R available by linking to its C shared library. There is one such library per Python process, and it is stateful (R is not able to handle concurrency). I think that your issue has more to do with object serialization (see http://rpy2.readthedocs.io/en/version_2.8.x/robjects_serialization.html#object-serialization) than with concurrency.
What is happening is some apparent confusion when reconstructing the R objects after Python has pickled the rpy2 object. More specifically, look at the R object types mentioned in the error message:
>>> from rpy2.rinterface import str_typeint
>>> str_typeint(6)
'LANGSXP'
>>> str_typeint(24)
'RAWSXP'
I am guessing that the R object returned by forecast.Arima contains an unevaluated R expression (for example the call that led to that result object), and when serializing and unserializing it comes back as something different (a raw vector of bytes). This is possibly a bug in R's own serialization mechanism (since rpy2 is using it under the hood). For now, to solve your issue, you may want to extract from the forecast.Arima result only what you care most about, and return just that from the function call run by the worker.
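For instance, a rough sketch of that idea, reusing arima_select and the numpy import from the pseudo code above: convert the pieces you need into plain NumPy/Python values inside the worker, so that only ordinary objects cross the process boundary ('coef' and 'aic' are standard components of an Arima result, but check them against the actual object you get back):

def arima_wrapper(data):
    data = data[['tstamp', 'val']]
    data.set_index('tstamp', inplace=True)
    res = arima_select(data, (1, 1, 1))
    # Extract plain values before returning, so multiprocessing pickles
    # ordinary Python/NumPy objects instead of rpy2 proxies.
    coef = np.asarray(res.rx2('coef'))
    aic = float(res.rx2('aic')[0])
    return coef, aic

(The pd.concat in applyParallel would then need a matching adjustment, since the return value is no longer a single DataFrame-like object.)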
The following changes to the arima_select function in the pseudo code presented in the question work:
import numpy as np
import pandas as pd
from rpy2 import rinterface as ri
ri.initr()

def arima_select(y, order):
    def rimport(packname):
        as_environment = ri.baseenv['as.environment']
        require = ri.baseenv['require']
        require(ri.StrSexpVector([packname]),
                quiet = ri.BoolSexpVector((True, )))
        packname = ri.StrSexpVector(['package:' + str(packname)])
        pack_env = as_environment(packname)
        return pack_env
    frcst = rimport("forecast")
    args = (('y', ri.FloatSexpVector(y)),
            ('order', ri.FloatSexpVector(order)),
            ('include.constant', ri.StrSexpVector(const)))
    return frcst['Arima'].rcall(args, ri.globalenv)
Keeping the rest of the pseudo code the same. Note that I have since optimized the code further, and it does not require all the functions presented in the question. Basically, the following is necessary and sufficient:
import numpy as np
import pandas as pd
from rpy2 import rinterface as ri
ri.initr()

def arima(y, order=(1,1,1)):
    # This is the same as arima_select above, just renamed to arima
    ...

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    with Pool(cpu_count(), worker_init) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])

def main():
    # Create your df in your favorite way:
    def data_gen(start_day):
        r = pd.Series(pd.date_range('2016-09-{}'.format(str(start_day)),
                                    periods=24*60, freq='T'),
                      name='tstamp')
        d = pd.Series(np.random.randint(10, 80, 1440), name='val')
        s = pd.Series(['sensor1']*1440, name='sensor')
        return pd.concat([s, r, d], axis=1)
    df = pd.concat([data_gen(day) for day in range(1,8)], ignore_index=True)
    applyParallel(df.groupby(['sensor', pd.Grouper(key='tstamp', freq='D')]),
                  arima)  # Note one may use partial from functools to pass order to arima
Note that I also do not call arima directly from applyParallel since my goal is to find the best model for the given series (for a sensor and day). I use a function arima_wrapper to iterate through the order combinations, and call arima at each iteration.
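As the comment at the end of the snippet above suggests, a non-default order can be passed through functools.partial; for example, continuing the code above with an arbitrary example order of (2, 1, 0):

from functools import partial

applyParallel(df.groupby(['sensor', pd.Grouper(key='tstamp', freq='D')]),
              partial(arima, order=(2, 1, 0)))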

Pyplot "cannot connect to X server localhost:10.0" despite ioff() and matplotlib.use('Agg')

I have a piece of code which gets called by a different function, carries out some calculations for me, and then plots the output to a file. Seeing as the whole script can take a while to run for larger datasets, and since I may want to analyse multiple datasets at a given time, I start it in screen, then disconnect, close my PuTTY session, and check back on it the next day. I am using Ubuntu 14.04. My code looks as follows (I have skipped the calculations):
import shelve
import os, sys, time
import numpy
import timeit
import logging
import csv
import itertools
import graph_tool.all as gt
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.ioff()
#Do some calculations
print 'plotting indeg'
# Let's plot its in-degree distribution
in_hist = gt.vertex_hist(g, "in")
y = in_hist[0]
err = numpy.sqrt(in_hist[0])
err[err >= y] = y[err >= y] - 1e-2
plt.figure(figsize=(6,4))
plt.errorbar(in_hist[1][:-1], in_hist[0], fmt="o",
             label="in")
plt.gca().set_yscale("log")
plt.gca().set_xscale("log")
plt.gca().set_ylim(0.8, 1e5)
plt.gca().set_xlim(0.8, 1e3)
plt.subplots_adjust(left=0.2, bottom=0.2)
plt.xlabel("$k_{in}$")
plt.ylabel("$NP(k_{in})$")
plt.tight_layout()
plt.savefig("in-deg-dist.png")
plt.close()
print 'plotting outdeg'
#Do some more stuff
The script runs perfectly happily until I get to the plotting commands. To try to get to the root of the problem, I am currently running it in PuTTY without screen and with no X11 applications. The output I get is the following:
plotting indeg
PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
: cannot connect to X server localhost:10.0
I presume this is caused by the code trying to open a window, but I thought that explicitly setting plt.ioff() would disable that. Since it didn't, I followed this thread (Generating matplotlib graphs without a running X server) and specified the backend, but that didn't solve the problem either. Where might I be going wrong?
The calling function calls other functions too which also use matplotlib. These get called only after this one, but their dependencies get loaded during the import statements. Seeing as they were loaded first, they overrode the subsequent matplotlib.use('Agg') declaration. Moving that declaration to the main script solved the problem.
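A minimal sketch of what that looks like (the module names here are hypothetical; the point is simply that the backend is selected in the entry-point script before anything else pulls in pyplot):

# main_script.py (hypothetical entry point)
import matplotlib
matplotlib.use('Agg')      # must run before matplotlib.pyplot is imported anywhere

import analysis_module     # hypothetical module whose imports load pyplot
analysis_module.run_and_plot()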

Python rpy2 and matplotlib conflict when using multiprocessing

I am trying to calculate and generate plots using multiprocessing. On Linux the code below runs correctly; however, on the Mac (Mountain Lion) it doesn't, giving the error below:
import multiprocessing
import matplotlib.pyplot as plt
import numpy as np
import rpy2.robjects as robjects

def main():
    pool = multiprocessing.Pool()
    num_figs = 2
    # generate some random numbers
    input = zip(np.random.randint(10,1000,num_figs),
                range(num_figs))
    pool.map(plot, input)

def plot(args):
    num, i = args
    fig = plt.figure()
    data = np.random.randn(num).cumsum()
    plt.plot(data)

main()
Rpy2 is rpy2==2.3.1 and R is 2.13.2 (I could not install R 3.0 and the latest rpy2 version on any Mac without getting a segmentation fault).
The error is:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
I have tried everything to understand what the problem is with no luck. My configuration is:
Danials-MacBook-Pro:~ danialt$ brew --config
HOMEBREW_VERSION: 0.9.4
ORIGIN: https://github.com/mxcl/homebrew
HEAD: 705b5e133d8334cae66710fac1c14ed8f8713d6b
HOMEBREW_PREFIX: /usr/local
HOMEBREW_CELLAR: /usr/local/Cellar
CPU: dual-core 64-bit penryn
OS X: 10.8.3-x86_64
Xcode: 4.6.2
CLT: 4.6.0.0.1.1365549073
GCC-4.2: build 5666
LLVM-GCC: build 2336
Clang: 4.2 build 425
X11: 2.7.4 => /opt/X11
System Ruby: 1.8.7-358
Perl: /usr/bin/perl
Python: /usr/local/bin/python => /usr/local/Cellar/python/2.7.4/Frameworks/Python.framework/Versions/2.7/bin/python2.7
Ruby: /usr/bin/ruby => /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
Any ideas?
This error occurs on Mac OS X when you perform a GUI operation outside the main thread, which is exactly what you are doing by shifting your plot function to the multiprocessing.Pool (I imagine that it will not work on Windows either, for the same reason, since Windows has the same requirement). The only way that I can imagine it working is to use the pool to generate the data, then have your main thread wait in a loop for the data that's returned (a queue is the way I usually handle it...).
Here is an example (recognizing that this may not do what you want: plot all the figures "simultaneously"? plt.show() blocks, so only one is drawn at a time. I note that you do not have it in your sample code, but without it I don't see anything on my screen; however, if I take it out, there is no blocking and no error, because all GUI functions happen in the main thread):
import multiprocessing
import matplotlib.pyplot as plt
import numpy as np
import rpy2.robjects as robjects

data_queue = multiprocessing.Queue()

def main():
    pool = multiprocessing.Pool()
    num_figs = 10
    # generate some random numbers
    input = zip(np.random.randint(10,10000,num_figs), range(num_figs))
    pool.map(worker, input)
    figs_complete = 0
    while figs_complete < num_figs:
        data = data_queue.get()
        plt.figure()
        plt.plot(data)
        plt.show()
        figs_complete += 1

def worker(args):
    num, i = args
    data = np.random.randn(num).cumsum()
    data_queue.put(data)
    print('done ',i)

main()
Hope this helps.
I had a similar issue with my worker, which was loading some data, generating a plot, and saving it to a file. Note that this is slightly different from the OP's case, which seems to be oriented around interactive plotting. Still, I think it's relevant.
A simplified version of my code:
def worker(id):
    data = load_data(id)
    plot_data_to_file(data)  # Generates a plot and saves it to a file.

def plot_something_parallel(ids):
    pool = multiprocessing.Pool()
    pool.map(worker, ids)

plot_something_parallel(ids=[1,2,3])
This caused the same error others mention:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
Following #bbbruce's train of thought, I solved my problem by switching the matplotlib backend from TkAgg to the default. Specifically, I commented out the following line in my matplotlibrc file:
#backend : TkAgg
This might be rpy2-specific.
There are reports of a similar problem with OS X and multiprocessing here and there.
I think that using an initializer that imports the packages needed to run the code in plot could solve the problem (multiprocessing-doc).
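A sketch of that suggestion, adapted to save the figures to files (the initializer runs once in each worker, so matplotlib and its Agg backend are set up after the fork rather than inherited from the parent; this is an illustrative rework of the original code, not a tested fix for this exact environment):

import multiprocessing
import numpy as np

def worker_init():
    # imports done here run once per worker process, after the fork/spawn
    global plt
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt

def plot(args):
    num, i = args
    data = np.random.randn(num).cumsum()
    plt.figure()
    plt.plot(data)
    plt.savefig("fig_%d.png" % i)
    plt.close()

def main():
    with multiprocessing.Pool(initializer=worker_init) as pool:
        pool.map(plot, zip(np.random.randint(10, 1000, 2), range(2)))

if __name__ == '__main__':
    main()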
I had a similar issue and found that setting the multiprocessing start method to 'forkserver' works, as long as it comes after your if __name__ == '__main__': statement.
if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    first_process = multiprocessing.Process(target=targetOne)
    second_process = multiprocessing.Process(target=targetTwo)
    first_process.start()
    second_process.start()
Try to upgrade matplotlib to 3.0.3:
pip3 install matplotlib --upgrade
Then everything works fine.
=======================================================================
No need to read below anymore.
Yesterday, my multiprocessing plot worked on my MacBook Air. But the next morning it was not working on my MacBook Pro with the same code, displaying many instances of:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
They both use 4th-gen Intel Core CPUs (i5-4xxx in the Air and i7-4xxx in the Pro). So if there is no difference in hardware, it must be the software.
So I just updated matplotlib to 3.0.3 on the MacBook Pro (it was 3.0.1), and everything works fine.
Also, there is no need to use pool.apply_async anymore.
