Running Python in Python block by block, capturing the output - python

As a pet project, I want to build something similar to the Jupyter notebook. Given an array of strings, each of which is a piece of Python code, I would like to run each piece one by one in a single Python process and then associate blocks of output with each piece of code. I would also like to manage it all in another (parent) Python process.
To make the problem tangible, let's say I have a list of strings, each a piece of Python code. One string uses variables from the preceding piece of code, i.e. they should all be run in a single process. Now I want to run one piece of code, wait until it finishes, capture the output, then run the next piece, and so on.
Unfortunately, googling around only gave me an example where I can run a piece of code using subprocess.Popen('python', stdout=PIPE, ...), but with this approach it only starts executing my commands after I close stdin, which effectively shuts down the whole Python process.

You can use contextlib.redirect_stdout from the standard library to capture the output of exec() calls. With that, your idea of code blocks (as I understand them) is straightforward to implement:
import io
from contextlib import redirect_stdout

class Block:
    def __init__(self, code=''):
        self.code = code
        self.stdout = io.StringIO()

    def run(self):
        with redirect_stdout(self.stdout):
            exec(self.code, globals())  # pass the global dict so later blocks see earlier assignments

    @property
    def output(self):
        return self.stdout.getvalue()
>>> b1 = Block('a = 42; print(a)')
>>> b2 = Block('print(1/a)')
>>> b1.run()
>>> b2.run()
>>> b1.output
'42\n'
>>> b2.output
'0.023809523809523808\n'
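If you also want to capture anything a block writes to stderr (warnings, tracebacks you print yourself, etc.), contextlib.redirect_stderr stacks the same way; a minimal sketch extending the class above:
import io
from contextlib import redirect_stdout, redirect_stderr

class Block:
    def __init__(self, code=''):
        self.code = code
        self.stdout = io.StringIO()
        self.stderr = io.StringIO()

    def run(self):
        # capture both streams while the block executes
        with redirect_stdout(self.stdout), redirect_stderr(self.stderr):
            exec(self.code, globals())

    @property
    def output(self):
        return self.stdout.getvalue()

    @property
    def errors(self):
        return self.stderr.getvalue()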

Related

How to share variables in multiprocessing [duplicate]

The following does not work
one.py
import shared
shared.value = 'Hello'
raw_input('A cheap way to keep process alive..')
two.py
import shared
print shared.value
run on two command lines as:
>>python one.py
>>python two.py
(the second one gets an attribute error, rightly so).
Is there a way to accomplish this, that is, share a variable between two scripts?
Hope it's OK to jot down my notes about this issue here.
First of all, I appreciate the example in the OP a lot, because that is where I started as well - although it made me think shared is some built-in Python module, until I found a complete example at [Tutor] Global Variables between Modules ??.
However, when I looked for "sharing variables between scripts" (or processes) - besides the case when a Python script needs to use variables defined in other Python source files (but not necessarily running processes) - I mostly stumbled upon two other use cases:
A script forks itself into multiple child processes, which then run in parallel (possibly on multiple processors) on the same PC
A script spawns multiple other child processes, which then run in parallel (possibly on multiple processors) on the same PC
As such, most hits regarding "shared variables" and "interprocess communication" (IPC) discuss cases like these two; however, in both of these cases one can observe a "parent", to which the "children" usually have a reference.
What I am interested in, however, is running multiple invocations of the same script, run independently, and sharing data between those (as in Python: how to share an object instance across multiple invocations of a script), in a singleton/single-instance mode. That kind of problem is not really addressed by the above two cases - instead, it essentially reduces to the example in the OP (sharing variables across two scripts).
Now, when dealing with this problem in Perl, there is IPC::Shareable; which "allows you to tie a variable to shared memory", using "an integer number or 4 character string[1] that serves as a common identifier for data across process space". Thus, there are no temporary files, nor networking setups - which I find great for my use case; so I was looking for the same in Python.
However, as the accepted answer by @Drewfer notes: "You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter"; or in other words: either you have to use a networking/socket setup, or you have to use temporary files (ergo, no shared RAM for "totally separate python sessions").
Now, even with these considerations, it is kinda difficult to find working examples (except for pickle) - also in the docs for mmap and multiprocessing. I have managed to find some other examples - which also describe some pitfalls that the docs do not mention:
Usage of mmap: working code in two different scripts at Sharing Python data between processes using mmap | schmichael's blog
Demonstrates how both scripts change the shared value
Note that here a temporary file is created as storage for saved data - mmap is just a special interface for accessing this temporary file
Usage of multiprocessing: working code at:
Python multiprocessing RemoteManager under a multiprocessing.Process - working example of SyncManager (via manager.start()) with shared Queue; server(s) writes, clients read (shared data)
Comparison of the multiprocessing module and pyro? - working example of BaseManager (via server.serve_forever()) with shared custom class; server writes, client reads and writes
How to synchronize a python dict with multiprocessing - this answer has a great explanation of multiprocessing pitfalls, and is a working example of SyncManager (via manager.start()) with shared dict; server does nothing, client reads and writes
Thanks to these examples, I came up with an example, which essentially does the same as the mmap example, with approaches from the "synchronize a python dict" example - using BaseManager (via manager.start() through file path address) with shared list; both server and client read and write (pasted below). Note that:
multiprocessing managers can be started either via manager.start() or server.serve_forever()
serve_forever() locks - start() doesn't
There is an auto-logging facility in multiprocessing: it seems to work fine with start()ed processes - but seems to ignore the ones using serve_forever()
The address specification in multiprocessing can be an IP (socket) or a temporary file (possibly a pipe?) path; in the multiprocessing docs:
Most examples use multiprocessing.Manager() - this is just a function (not class instantiation) which returns a SyncManager, which is a special subclass of BaseManager; it uses start() - but not for IPC between independently run scripts; here a file path is used
A few other examples use the serve_forever() approach for IPC between independently run scripts; here an IP/socket address is used
If an address is not specified, then a temp file path is used automatically (see 16.6.2.12. Logging for an example of how to see this)
In addition to all the pitfalls in the "synchronize a python dict" post, there are additional ones in case of a list. That post notes:
All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)
The workaround for dict['key'] getting and setting is to use the dict public methods get and update. The problem is that there are no such public method alternatives for list[index]; thus, for a shared list, we additionally have to register the __getitem__ and __setitem__ methods (which are special methods of list) as exposed, which means we also have to re-register all the public methods for list as well :/
Well, I think those were the most critical things; these are the two scripts - they can just be run in separate terminals (server first); note that this was developed on Linux with Python 2.7:
a.py (server):
import multiprocessing
import multiprocessing.managers
import logging

logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)

class MyListManager(multiprocessing.managers.BaseManager):
    pass

syncarr = []

def get_arr():
    return syncarr

def main():
    # print dir([]) # cannot do `exposed = dir([])`!! manually:
    MyListManager.register("syncarr", get_arr,
                           exposed=['__getitem__', '__setitem__', '__str__', 'append', 'count',
                                    'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'])

    manager = MyListManager(address=('/tmp/mypipe'), authkey='')
    manager.start()

    # we don't use the same name as `syncarr` here (although we could);
    # just to see that `syncarr_tmp` is actually an <AutoProxy[syncarr] object>,
    # so we also have to expose the `__str__` method in order to print its list values!
    syncarr_tmp = manager.syncarr()
    print("syncarr (master):", syncarr, "syncarr_tmp:", syncarr_tmp)
    print("syncarr initial:", syncarr_tmp.__str__())

    syncarr_tmp.append(140)
    syncarr_tmp.append("hello")
    print("syncarr set:", str(syncarr_tmp))

    raw_input('Now run b.py and press ENTER')

    print
    print 'Changing [0]'
    syncarr_tmp.__setitem__(0, 250)
    print 'Changing [1]'
    syncarr_tmp.__setitem__(1, "foo")

    new_i = raw_input('Enter a new int value for [0]: ')
    syncarr_tmp.__setitem__(0, int(new_i))

    raw_input("Press any key (NOT Ctrl-C!) to kill server (but kill client first)".center(50, "-"))
    manager.shutdown()

if __name__ == '__main__':
    main()
b.py (client):
import time
import multiprocessing
import multiprocessing.managers
import logging

logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)

class MyListManager(multiprocessing.managers.BaseManager):
    pass

MyListManager.register("syncarr")

def main():
    manager = MyListManager(address=('/tmp/mypipe'), authkey='')
    manager.connect()
    syncarr = manager.syncarr()

    print "arr = %s" % (dir(syncarr))

    # note here we need not bother with __str__
    # syncarr can be printed as a list without a problem:
    print "List at start:", syncarr
    print "Changing from client"
    syncarr.append(30)
    print "List now:", syncarr

    o0 = None
    o1 = None

    while 1:
        new_0 = syncarr.__getitem__(0)  # syncarr[0]
        new_1 = syncarr.__getitem__(1)  # syncarr[1]

        if o0 != new_0 or o1 != new_1:
            print 'o0: %s => %s' % (str(o0), str(new_0))
            print 'o1: %s => %s' % (str(o1), str(new_1))
            print "List is:", syncarr
            print 'Press Ctrl-C to exit'

        o0 = new_0
        o1 = new_1

        time.sleep(1)

if __name__ == '__main__':
    main()
As a final remark, on Linux /tmp/mypipe is created - but is 0 bytes, and has attributes srwxr-xr-x (for a socket); I guess this makes me happy, as I neither have to worry about network ports, nor about temporary files as such :)
Other related questions:
Python: Possible to share in-memory data between 2 separate processes (very good explanation)
Efficient Python to Python IPC
Python: Sending a variable to another script
You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter.
If it's just simple variables you want, you can easily dump a python dict to a file with the pickle module in script one and then re-load it in script two.
Example:
one.py
import pickle
shared = {"Foo":"Bar", "Parrot":"Dead"}
fp = open("shared.pkl","w")
pickle.dump(shared, fp)
two.py
import pickle
fp = open("shared.pkl")
shared = pickle.load(fp)
print shared["Foo"]
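Note that on Python 3 (and for any binary pickle protocol) the file must be opened in binary mode; a minimal variant of the example above using binary mode and context managers:
one.py
import pickle
shared = {"Foo": "Bar", "Parrot": "Dead"}
with open("shared.pkl", "wb") as fp:
    pickle.dump(shared, fp)
two.py
import pickle
with open("shared.pkl", "rb") as fp:
    shared = pickle.load(fp)
print(shared["Foo"])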
If you have slightly more complex needs, another common option is memcached as a small shared store that both scripts talk to:
sudo apt-get install memcached python-memcache
one.py
import memcache
shared = memcache.Client(['127.0.0.1:11211'], debug=0)
shared.set('Value', 'Hello')
two.py
import memcache
shared = memcache.Client(['127.0.0.1:11211'], debug=0)
print shared.get('Value')
What you're trying to do here (store a shared state in a Python module over separate python interpreters) won't work.
A value in a module can be updated by one module and then read by another module, but this must be within the same Python interpreter. What you seem to be doing here is actually a sort of interprocess communication; this could be accomplished via socket communication between the two processes, but it is significantly less trivial than what you are expecting to have work here.
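As a rough illustration of that socket approach (a minimal sketch; the port number and file names here are arbitrary, and there is no error handling or framing), one script holds the value and the other connects to read it:
value_server.py (run first):
import socket

value = 'Hello'
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('127.0.0.1', 50007))   # arbitrary local port
srv.listen(1)
conn, _ = srv.accept()           # blocks until the other script connects
conn.sendall(value.encode())
conn.close()
srv.close()
value_client.py (run second):
import socket

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(('127.0.0.1', 50007))
print(cli.recv(1024).decode())   # prints "Hello"
cli.close()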
You can use the relatively simple mmap module.
You can use shared.py to store the common constants. The following code will work across different Python interpreters / scripts / processes.
shared.py:
MMAP_SIZE = 16*1024
MMAP_NAME = 'Global\\SHARED_MMAP_NAME'
* The "Global" is windows syntax for global names
one.py:
import sys
import mmap
from shared import MMAP_SIZE, MMAP_NAME

def write_to_mmap():
    map_file = mmap.mmap(-1, MMAP_SIZE, tagname=MMAP_NAME, access=mmap.ACCESS_WRITE)
    map_file.seek(0)
    map_file.write('hello\n')
    ret = map_file.flush() != 0
    if sys.platform.startswith('win'):
        assert(ret != 0)
    else:
        assert(ret == 0)
two.py:
import mmap
from shared import MMAP_SIZE, MMAP_NAME

def read_from_mmap():
    map_file = mmap.mmap(-1, MMAP_SIZE, tagname=MMAP_NAME, access=mmap.ACCESS_READ)
    map_file.seek(0)
    data = map_file.readline().rstrip('\n')
    map_file.close()
    print data
* This code was written for Windows; Linux might need small adjustments
more info at - https://docs.python.org/2/library/mmap.html
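Note that the tagname argument is Windows-only; on Linux the usual approach is to mmap a small backing file that both scripts agree on. A minimal sketch of that variant (the /tmp path and size are arbitrary assumptions):
one_linux.py (writer):
import mmap

MMAP_SIZE = 16 * 1024
PATH = '/tmp/shared_mmap'                 # both scripts must agree on this path

with open(PATH, 'wb') as f:               # create and pre-size the backing file
    f.write(b'\0' * MMAP_SIZE)
with open(PATH, 'r+b') as f:
    map_file = mmap.mmap(f.fileno(), MMAP_SIZE, access=mmap.ACCESS_WRITE)
    map_file.write(b'hello\n')
    map_file.flush()
    map_file.close()
two_linux.py (reader):
import mmap

with open('/tmp/shared_mmap', 'r+b') as f:
    map_file = mmap.mmap(f.fileno(), 16 * 1024, access=mmap.ACCESS_READ)
    print(map_file.readline().rstrip(b'\n'))
    map_file.close()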
Share a dynamic variable by Redis:
script_one.py
from redis import Redis
from time import sleep

cli = Redis('localhost')
shared_var = 1

while True:
    cli.set('share_place', shared_var)
    shared_var += 1
    sleep(1)
Run script_one in a terminal (a process):
$ python script_one.py
script_two.py
from redis import Redis
from time import sleep

cli = Redis('localhost')

while True:
    print(int(cli.get('share_place')))
    sleep(1)
Run script_two in another terminal (another process):
$ python script_two.py
Out:
1
2
3
4
5
...
Dependencies:
$ pip install redis
$ apt-get install redis-server
I'd advise that you use the multiprocessing module. You can't run two scripts from the command line with it, but you can have two separate processes easily speak to each other.
From the doc's examples:
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()
You need to store the variable in some sort of persistent file. There are several modules to do this, depending on your exact need.
The pickle and cPickle module can save and load most python objects to file.
The shelve module can store python objects in a dictionary-like structure (using pickle behind the scenes).
The dbm/bsddb/dbhash/gdbm modules can store string variables in a dictionary-like structure.
The sqlite3 module can store data in a lightweight SQL database.
The biggest problem with most of these is that they are not synchronised across different processes - if one process reads a value while another is writing to the datastore then you may get incorrect data or data corruption. To get around this you will need to write your own file locking mechanism or use a full-blown database.
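As a small illustration of the shelve option from the list above, a minimal sketch in the spirit of the pickle example (the file name is arbitrary):
write_side.py:
import shelve

db = shelve.open('shared_shelf')   # creates the backing file(s) on first use
db['Foo'] = 'Bar'
db['Parrot'] = 'Dead'
db.close()
read_side.py:
import shelve

db = shelve.open('shared_shelf')
print(db['Foo'])                   # prints "Bar"
db.close()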
If you want to read and modify shared data between 2 scripts which run separately, a good solution is to take advantage of the Python multiprocessing module and use a Pipe() or a Queue() (see the differences here). This way you get to sync the scripts and avoid problems regarding concurrency and global variables (like what happens if both scripts want to modify a variable at the same time).
The best part about using pipes/queues is that you can pass python objects through them.
Also there are methods to avoid waiting for data if it hasn't been passed yet (queue.empty() and pipeConn.poll()).
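For the Pipe() side of this, a minimal sketch (one spawned process; the parent polls so it never blocks while waiting):
from multiprocessing import Process, Pipe
import time

def worker(conn):
    time.sleep(1)                    # simulate some work
    conn.send(['some', 'result'])    # any picklable object can go through the pipe
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    while not parent_conn.poll():    # returns immediately instead of blocking
        time.sleep(0.2)              # do other things in the meantime
    print(parent_conn.recv())        # ['some', 'result']
    p.join()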
See an example using Queue() below:
# main.py
from multiprocessing import Process, Queue
from stage1 import Stage1
from stage2 import Stage2
s1= Stage1()
s2= Stage2()
# S1 to S2 communication
queueS1 = Queue() # s1.stage1() writes to queueS1
# S2 to S1 communication
queueS2 = Queue() # s2.stage2() writes to queueS2
# start s2 as another process
s2 = Process(target=s2.stage2, args=(queueS1, queueS2))
s2.daemon = True
s2.start() # Launch the stage2 process
s1.stage1(queueS1, queueS2) # start sending stuff from s1 to s2
s2.join() # wait till s2 daemon finishes
# stage1.py
import time
import random

class Stage1:

    def stage1(self, queueS1, queueS2):
        print("stage1")
        lala = []
        lis = [1, 2, 3, 4, 5]
        for i in range(len(lis)):
            # to avoid unnecessary waiting
            if not queueS2.empty():
                msg = queueS2.get()    # get msg from s2
                print("! ! ! stage1 RECEIVED from s2:", msg)
                lala = [6, 7, 8]       # now that a msg was received, further msgs will be different
            time.sleep(1)              # work
            random.shuffle(lis)
            queueS1.put(lis + lala)
        queueS1.put('s1 is DONE')
# stage2.py
import time

class Stage2:

    def stage2(self, queueS1, queueS2):
        print("stage2")
        while True:
            msg = queueS1.get()    # wait till there is a msg from s1
            print("- - - stage2 RECEIVED from s1:", msg)
            if msg == 's1 is DONE':
                break              # ends loop
            time.sleep(1)          # work
            queueS2.put("update lists")
EDIT: I just found that you can use queue.get(False) to avoid blocking when receiving data. This way there's no need to check first whether the queue is empty. This is not possible if you use pipes.
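Concretely, get(False) raises queue.Empty when nothing is there, so it goes in a try/except instead of an empty() check; a minimal sketch:
from multiprocessing import Queue
try:
    from queue import Empty          # Python 3
except ImportError:
    from Queue import Empty          # Python 2

q = Queue()
try:
    msg = q.get(False)               # non-blocking; same idea as q.get_nowait()
except Empty:
    msg = None                       # nothing there yet, carry on with other work
print(msg)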
Use text files or environment variables. Since the two scripts run separately, you can't really do what you are trying to do.
In your example, the first script runs to completion, and then the second script runs. That means you need some sort of persistent state. Other answers have suggested using text files or Python's pickle module. Personally I am lazy, and I wouldn't use a text file when I could use pickle; why should I write a parser to parse my own text file format?
Instead of pickle you could also use the json module to store it as JSON. This might be preferable if you want to share the data to non-Python programs, as JSON is a simple and common standard. If your Python doesn't have json, get simplejson.
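A minimal JSON variant of the pickle example above (same file-based handoff, just a different serializer):
one.py
import json
shared = {"Foo": "Bar", "Parrot": "Dead"}
with open("shared.json", "w") as fp:
    json.dump(shared, fp)
two.py
import json
with open("shared.json") as fp:
    shared = json.load(fp)
print(shared["Foo"])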
If your needs go beyond pickle or json -- say you actually want to have two Python programs executing at the same time and updating the persistent state variables in real time -- I suggest you use the SQLite database. Use an ORM to abstract the database away, and it's super easy. For SQLite and Python, I recommend Autumn ORM.
This method seems straightforward to me:
class SharedClass:

    def __init__(self):
        self.data = {}

    def set_data(self, name, value):
        self.data[name] = value

    def get_data(self, name):
        try:
            return self.data[name]
        except KeyError:
            return "none"

    def reset_data(self):
        self.data = {}

sharedClass = SharedClass()
PS: you can set the data with a parameter name and a value for it, and to access the value you can use the get_data method; below is an example:
to set the data
example 1:
sharedClass.set_data("name","Jon Snow")
example 2:
sharedClass.set_data("email","jon#got.com")\
to get the data
sharedClass.get_data("email")\
to reset the entire state simply use
sharedClass.reset_data()
It's kind of like accessing data from a JSON object (a dict in this case).
Hope this helps....
You could use the basic from ... import syntax in Python to import the variable into two.py. For example:
from filename import variable
That should import the variable from the file.
(Of course you should replace filename with one.py, and replace variable with the variable you want to share to two.py.)
You can also solve this problem by making the variable global.
first.py:
class Temp:
    def __init__(self):
        self.first = None

global var1
var1 = Temp()
var1.first = 1
print(var1.first)
second.py:
import first as One
print(One.var1.first)

Using os.popen() or subprocess to execute functions

I'm currently studying the threading, multiprocessing, and os documentation to improve the structure of my program. However, to be honest, some of it is too sophisticated for me; I can't get it to work in my program - either it crashes due to a stack overflow, gets the wrong output, or produces no output at all. So here's my problem.
Let's say I have a list of names that gets passed into a function, and that function is what I want to run in another console - with, of course, a Python interpreter - and have it run there in a full cycle.
Let's say I have this:
def execute_function(name, arg1, arg2):
    while True:
        pass  # do something

for name in names:
    execute_function(name, arg1, arg2)
What should I use in order to open another console programmatically from Python and run this function there in its while True loop - subprocess, multiprocessing, threading, or perhaps os.popen()?
And how should I execute it in this example? The multiprocessing Pool and Process always crash for me, so I think they're not the right solution. From what I've searched so far, I haven't seen examples of threading and subprocess being used with functions. Is there a workaround for this, or perhaps a simple solution I might have missed? Thanks.
Edit:
A similar code:
if symbols is not None and symbols1 is not None:
    symbols = [x for x in symbols if x is not None]
    symbols1 = [x for x in symbols1 if x is not None]
    if symbol != None and symbol in symbols and symbol in symbols1:
        with Pool(len(exchanges)) as p:
            p.map(bot_algorithm, (a, b, symbol, expent, amount))
http://prntscr.com/j4viat - what the error looks like
subprocess is usually preferred over os.system().
The docs contain a number of examples - in your case, your execute_function() function might want to use subprocess.check_output() if you want to see the results of the command.
eg.:
import subprocess

def execute_function(name, arg1, arg2):
    output = subprocess.check_output(["echo", name])
    print(output)
All this does though is launch a new process, and waits for it to return. While that's technically two processes, it's not exactly what you'd call multi-threading.
To run multiple subprocesses simultaneously, you might do something like this with the multiprocessing library:
import subprocess
from multiprocessing.dummy import Pool

def execute_function(name, arg1=None, arg2=None):  # default args so pool.map can call it with just a name
    return subprocess.check_output(["echo", name])

names = ["alex", "bob", "chrissy"]

pool = Pool()
map_results = pool.map(execute_function, names)
this maps an iterator (the list of names) to a function (execute_function) and runs them all at once. Well, as many cores as your machine has at once. map_results is a list of return values from the execute_function func.

Pathos multiprocessing in class produces garbled standard output

I'm trying to use multiprocessing in a class I have written to speed up calculations. I'm using pathos.multiprocessing and dill, and using map on a ProcessingPool. I've tested the functionality of multiprocessing in a console and it performed as expected.
The issue I'm having is that when I try to implement it in my code, as soon as it calls pool.map, the terminal I'm using starts spitting out ridiculous nonsense. The output is recognizable as being from the code, but I have no idea how it's making it print. Some of it comes from a method like the one I defined below, which includes the current datetime. In the nonsense I see that it's printing the current time after pool.map was called, so this isn't just something being repeatedly printed out; it's new output.
Here is a little code illustrating how I'm using multiprocessing.
My_func is a little more complicated than I have below, but as a first step I changed it to literally what is written below, and the problem still persists.
Additionally, Ctrl-C does trigger a KeyboardInterrupt, but does not completely stop the program. I'm using Visual Studio and Python 2.7.13 on Windows 10.
from pathos.multiprocessing import ProcessingPool
import dill
import datetime

class my_class(object):

    def __init__(self):
        pool = ProcessingPool(nodes=4)
        p1 = [1, 2, 3]
        p2 = [4, 5, 6]
        p3 = [7, 8, 9]
        results = pool.map(self.my_func, p1, p2, p3)

    def my_func(self, x, y, z):
        print(x, y, z)

    def status_printout(self, message):
        header = datetime.datetime.now().strftime('%Y/%m/%d %H:%M:%S')
        print(header + ' -- ' + message)
Try using a Lock to ensure only one of the subprocesses writes to stdout at a time.
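A rough sketch of that pattern using the standard multiprocessing Pool (pathos works the same way conceptually); the initializer hands every worker the shared lock, and each print happens while holding it:
from multiprocessing import Pool, Lock

lock = None

def init_worker(l):
    # runs once per worker process; stash the shared lock in a module global
    global lock
    lock = l

def work(args):
    x, y, z = args
    with lock:                       # only one worker writes to stdout at a time
        print(x, y, z)

if __name__ == '__main__':
    shared_lock = Lock()
    pool = Pool(4, initializer=init_worker, initargs=(shared_lock,))
    pool.map(work, [(1, 4, 7), (2, 5, 8), (3, 6, 9)])
    pool.close()
    pool.join()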
I was not using the suggested
if __name__ == '__main__':
    freeze_support()
for Windows. Things are behaving normally now.
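For reference, freeze_support() usually goes first thing under the __main__ guard; a minimal sketch:
from multiprocessing import Pool, freeze_support

def square(x):
    return x * x

if __name__ == '__main__':
    freeze_support()                 # only has an effect on Windows / frozen executables
    pool = Pool(4)
    print(pool.map(square, range(10)))
    pool.close()
    pool.join()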

How to use multiprocess in python on a class object

I am fairly new to Python, and my experience is specific to its use in Powerflow modelling through the API provided in Siemens PSS/e. I have a script that I have been using for several years that runs some simulation on a large data set.
In order to get it to finish quickly, I usually split the inputs up into multiple parts, then run multiple instances of the script in IDLE. I've recently added a GUI for the inputs, and have refined the code to be more object oriented, creating a class that the GUI passes the inputs to but that then works as the original script did.
My question is how do I go about splitting the process from within the program itself rather than making multiple copies? I have read a bit about the multiprocessing module, but I am not sure how to apply it to my situation. Essentially I want to be able to instantiate N instances of the same object, each running in parallel.
The class I have now (called Bot) is passed a set of arguments and creates a csv output while it runs until it finishes. I have a separate block of code that puts the pieces together at the end, but for now I just need to understand the best approach to kicking multiple Bot objects off once I hit run in my GUI. There are inputs in the GUI to specify the number of N segments to be used.
I apologize ahead of time if my question is rather vague. Thanks for any information at all, as I'm sort of stuck and don't know where to look for better answers.
Create a list of configurations:
configurations = [...]
Create a function which takes the relevant configuration, and makes use of your Bot:
def function(configuration):
    bot = Bot(configuration)
    bot.create_csv()
Create a Pool of workers with however many CPUs you want to use:
from multiprocessing import Pool
pool = Pool(3)
Call the function multiple times, with each configuration in your list of configurations.
pool.map(function, configurations)
For example:
from multiprocessing import Pool
import os

class Bot:

    def __init__(self, inputs):
        self.inputs = inputs

    def create_csv(self):
        pid = os.getpid()
        print('TODO: create csv in process {} using {}'
              .format(pid, self.inputs))

def use_bot(inputs):
    bot = Bot(inputs)
    bot.create_csv()

def main():
    configurations = [
        ['input1_1.txt', 'input1_2.txt'],
        ['input2_1.txt', 'input2_2.txt'],
        ['input3_1.txt', 'input3_2.txt']]
    pool = Pool(2)
    pool.map(use_bot, configurations)

if __name__ == '__main__':
    main()
Output:
TODO: create csv in process 10964 using ['input2_1.txt', 'input2_2.txt']
TODO: create csv in process 8616 using ['input1_1.txt', 'input1_2.txt']
TODO: create csv in process 8616 using ['input3_1.txt', 'input3_2.txt']
If you'd like to make life a little less complicated, you can use multiprocess instead of multiprocessing, as there is better support for classes and also for working in the interpreter. You can see below, we can now work directly with a method on a class instance, which is not possible with multiprocessing.
>>> from multiprocess import Pool
>>> import os
>>>
>>> class Bot(object):
... def __init__(self, x):
... self.x = x
... def doit(self, y):
... pid = os.getpid()
... return (pid, self.x + y)
...
>>> p = Pool()
>>> b = Bot(5)
>>> results = p.imap(b.doit, range(4))
>>> print dict(results)
{46552: 7, 46553: 8, 46550: 5, 46551: 6}
>>> p.close()
>>> p.join()
Above, I'm using imap, to get an iterator on the results, which I'll just dump into a dict. Note that you should close your pools after you are done, to clean up. On Windows, you may also want to look at freeze_support, for cases where the code otherwise fails to run.
>>> import multiprocess as mp
>>> mp.freeze_support

Python: Non-Responsive multiprocessing.pool.map_async() function

I have a strange problem here.
I have a Python program that executes code held in separate .py files, designed to be executed in sequence, one after another. The scripts work fine, however they take too long to run. My plan was to split up the processing of these .py files amongst 4 processors using multiprocessing.pool.map_async(function, arguments), with execfile() as the function and the filename as the argument.
So anyways, when I run the code, absolutely nothing happens at all, not even an error.
Take a look and see if you can help me out, I run the file in SeqFile.runner(SeqFile.file).
class FileRunner:
    def __init__(self, file):
        self.file = file
    def runner(self, file):
        self.run = pool.map_async(execfile, file)
SeqFile = FileRunner("/Users/haysb/Dropbox/Stuart/Sample_proteins/Code/SVS_CodeParts/SequencePickler.py")
VolFile = FileRunner("/Users/haysb/Dropbox/Stuart/Sample_proteins/Code/SVS_CodeParts/VolumePickler.py")
CWFile = FileRunner("/Users/haysb/Dropbox/Stuart/Sample_proteins/Code/SVS_CodeParts/Combine_and_Write.py")
(SeqFile.runner(SeqFile.file))
You have several problems here - I'm guessing you never used multiprocessing before.
One of your problems is that you fire off an async operation but never wait for it to end. If you did wait for it to end, you'd get more info. For example, add:
result = SeqFile.run.get()
Do that, and you'll see the exception raised in the child process: you're mapping execfile over the string bound to file, so execfile sees one character at a time. execfile barfs when the first thing it tries to do is (in effect):
execfile("/")
apply_async() would make a lot more sense, or map_async() passed a list of all the files you want to run.
Etc - this gets tedious ;-)
Specifics
Let's get rid of the irrelevant cruft here, and show a complete executable program. I have three files a.py, b.py and c.py. Here's a.py:
print "I'm A!"
The other two are the obvious variations.
Here's my entire driver:
if __name__ == "__main__":
    import multiprocessing as mp
    files = ["a.py", "b.py", "c.py"]
    pool = mp.Pool(2)
    pool.imap_unordered(execfile, files)
    pool.close()
    pool.join()
That's all it takes, and prints (some permutation of):
I'm A!
I'm B!
I'm C!
imap_unordered() splits the list of files up among the worker processes, and doesn't care ("unordered") which order they run in. That's maximally efficient. Note that I restricted the number of workers to 2, just to show that it works fine even though there are more files (3) than worker processes (2).
You can get any of the Pool functions to work similarly. If you have to ;-) use map_async(), for example, replace the imap_unordered() call with:
async = pool.map_async(execfile, files)
async.get()
Or:
asyncs = [pool.apply_async(execfile, (fn,)) for fn in files]
for a in asyncs:
    a.get()
Clearer? Keep it as simple as possible at first.
