Python, Using Remote Managers and Multiprocessing

I want to use the remote manager functionality in the multiprocessing module to distribute work among many machines. I know there are third-party modules, but I want to stick with the standard library as much as possible. I know that on a single machine you can use the multiprocessing.Pool class to limit the number of CPUs used, but I have a couple of questions about remote managers.
I have the following code for the remote manager:
from multiprocessing.managers import BaseManager
import Queue
queue = Queue.Queue()
class QueueManager(BaseManager): pass
QueueManager.register('get_queue', callable=lambda:queue)
m = QueueManager(address=('', 50000), authkey='abracadabra')
s = m.get_server()
s.serve_forever()
This works great, and I can even submit a job into the Queue using the following code:
QueueManager.register('get_queue')
m = QueueManager(address=('machinename', 50000), authkey='abracadabra')
m.connect()
queue = m.get_queue()
queue.put('hello')
You can also use queue.get() to fetch a single entry from the queue.
How do you get the items out of the queue? When I tried to iterate through the queue, I ended up in an infinite loop.
On the workers, can you limit each machine to one job at a time?
Since this method seems to be a pull method, where the workers need to check whether a job exists, can there be a push method where the multiprocessing server can be triggered?

Iterating over a queue is the same as doing:
while True:
    elem = queue.get()  # queue empty -> it blocks!!!
An elegant way to "iterate" over a queue, and block your worker process when there are no more jobs to execute, is to use None (or something else) as a sentinel and use iter(callable, sentinel):
for job in iter(queue.get, None):
    # execute the calculation
    output_queue.put(result)
# shutdown the worker process
Which is equivalent to:
while True:
    job = queue.get()
    if job is None:
        break
    # execute the calculation
    output_queue.put(result)
# shutdown the worker process
Note that you have to put one sentinel in the queue for each worker subprocess; otherwise some subprocesses will wait for one forever.
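For example, a worker that drains your remote queue could look like this (a sketch in Python 3 syntax, where authkey must be bytes; 'machinename' is the server from your own snippet):

from multiprocessing.managers import BaseManager

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')

if __name__ == '__main__':
    m = QueueManager(address=('machinename', 50000), authkey=b'abracadabra')
    m.connect()
    queue = m.get_queue()
    # Blocks in get(); exits cleanly when the sentinel None arrives.
    for job in iter(queue.get, None):
        print('processing', job)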
Regarding your second question, I don't understand what you are asking. The BaseManager provides one server that executes the calls from the clients, so, obviously, all requests are satisfied by the same machine.
Or do you mean allowing each client to make only one request? I don't see any option for this, though it could be implemented "by hand".
As for your third question: what do you mean by a pull method? Can you rephrase it, with a bit more detail on what "a push method where the multiprocessing server can be triggered" would look like?

Related

Completing threads and interacting with results at a different rate

I'd like to get some feedback on an approach for receiving data from multiple threads in a concurrent.futures.ThreadPoolExecutor and iterating over the results. In this scenario, the ThreadPoolExecutor appends future thread results to a buffer container, and a secondary, decoupled operation reads and withdraws from the same buffer container.
Thread Manager Workflow
/|-> Thread 1 > results \
ThreadPoolExecutor --|-> Thread 2 > results --> Queue [1,2,3] (end)
\|-> Thread 3 > results /
Now we have the threads' results in a first-in-first-out queue container, which needs to be thread-safe. At this point the above process is done, and the results (str|int|bool|list|dict|any) are in the container awaiting processing by the next step: communicating the gathered results.
Communication Workflow
/|-> Terminal Print
Queue [1,2,3] < Listener > Communicate --|-> Speech Engine Say
\|-> Write to Log / File
The Communicate class needs to be "listening" on the queue for new entries and processing each as it comes in, at its own speed (the rate of speech of a text-to-speech module: the classic Producer-Consumer Problem), with potentially any number of other outputs, so this really can't be invoked from the top down. If the Thread Manager calls the Communicate class directly, or lets each thread do so, to invoke the speech engine, we will hear stuttered speech, because the speech engine will override itself with each invocation. Thus we need to decouple the Thread Manager workflow from the Communicate workflow, but have them write and read through a shared in/out buffer or queue, hence the need for a "listener" concept.
I've found references to a structure like the following running as a daemon thread, but the while loop makes me cringe and consumes too much CPU, so I still need a non-blocking approach, where self.pipeline is a queue.Queue object:
while True:
    try:
        if not self.pipeline.empty():
            task = self.pipeline.get(timeout=1)
            if task:
                self.serve(task)
    except queue.Empty:
        continue
Again, in need of something other than a while loop for this...
As you write in the comments, this is a standard producer-consumer problem. One solution in Python is multithreading plus the Queue class.
The queue is thread-safe: it uses a mutex internally, which takes care of the busy waiting.
Queue.get will eventually call wait on an internal lock. This blocks the calling thread, but instead of busy waiting, which burns CPU, the thread is put into a sleep state. The OS thread scheduler takes over from there and wakes the thread up when items are available (simplified).
So you can still have while True loops in multiple consumer threads that call queue.get on a shared queue. If items are available, the threads process them directly; if not, they go to sleep and free the CPU. The same goes for producer threads: they simply call Queue.put.
However, there is one caveat in Python: the global interpreter lock, or GIL. It exists because CPython relies heavily on C code and allows modules that bring in C extensions, which are not always thread-safe. The GIL means that only one thread runs Python code on one CPU at a time.
So, once an item is in the queue, only one consumer at a time will wake up and process it. Likewise, normally only one producer runs at a time.
The exception is when threads wait on I/O, like reading from a socket. Because I/O is handled by other parts of the hardware, there is always some waiting time, and during that time a thread releases the GIL so other threads can do work.
Summed up, it only makes sense to have multiple consumer and producer threads if they also do some I/O work (reading and writing on a network socket or disk). This is called concurrency. If you want to use multiple CPU cores at the same time, you need to use multiprocessing in Python instead of threads.
And it only makes sense to have more processes than cores if there is also some I/O work.
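A minimal sketch of this blocking consumer pattern with queue.Queue (names are illustrative; the print is a stand-in for real work):

import queue
import threading

def consumer(pipeline):
    while True:
        task = pipeline.get()  # blocks; the thread sleeps until an item arrives
        if task is None:       # sentinel to shut the consumer down
            break
        print("serving", task)

if __name__ == '__main__':
    pipeline = queue.Queue()
    t = threading.Thread(target=consumer, args=(pipeline,))
    t.start()
    for i in range(3):         # producer side: simply put items
        pipeline.put(i)
    pipeline.put(None)
    t.join()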
Example
I would suggest that you use multiprocessing rather than threading to ensure maximum parallelism. I am not sure whether you really need a process pool for what you are trying to do, rather than 4 dedicated processes; it's a question of how "threads" 1 through 3 are getting the data they feed to the queue for the 4th process to consume. Are they implemented by a single, identical worker function to which "jobs" are submitted? If so, a process pool of 3 identical workers is what you want. But if these are 3 separate functions with their own processing logic, then you just want to create 3 Process instances. I am working on the second assumption.
Since we are now in the realm of multiprocessing, I would suggest using a "managed" Queue instance created with the following code:
with multiprocessing.Manager() as manager:
    q = manager.Queue()
Access to such a queue is synchronized across processes. The following code gives a rough idea of how to create the processes and access the queue:
import multiprocessing
import time

class Communicate:
    def listen(self, q):
        while True:
            obj = q.get()
            if obj is None:  # our signal to terminate
                return
            # do something with the object
            print(obj)

def process1(q):
    while True:
        time.sleep(1)
        q.put(1)

def process2(q):
    while True:
        time.sleep(.5)
        q.put(2)

def process3(q):
    while True:
        time.sleep(1.5)
        q.put(3)

if __name__ == '__main__':
    communicator = Communicate()
    with multiprocessing.Manager() as manager:
        q = manager.Queue()
        # start the communicator process:
        p = multiprocessing.Process(target=communicator.listen, args=(q,))
        p.start()
        # start the three producer processes:
        p1 = multiprocessing.Process(target=process1, args=(q,))
        p1.daemon = True
        p1.start()
        p2 = multiprocessing.Process(target=process2, args=(q,))
        p2.daemon = True
        p2.start()
        p3 = multiprocessing.Process(target=process3, args=(q,))
        p3.daemon = True
        p3.start()
        input('Hit enter to terminate\n')
        q.put(None)  # signal for termination
        p.join()     # wait for the listener process to complete

AttributeError 'DupFd' in 'multiprocessing.resource_sharer' | Python multiprocessing + threading

I'm trying to communicate between multiple threading.Thread(s) doing I/O-bound tasks and multiple multiprocessing.Process(es) doing CPU-bound tasks. Whenever a thread finds work for a process, the work is put on a multiprocessing.Queue, together with the sending end of a multiprocessing.Pipe(duplex=False). The processes then do their part and send results back to the threads via the Pipe. This procedure seems to work in roughly 70% of cases; the other 30% of the time I receive an AttributeError: Can't get attribute 'DupFd' on <module 'multiprocessing.resource_sharer' from '/usr/lib/python3.5/multiprocessing/resource_sharer.py'>
To reproduce:
import multiprocessing
import threading
import time

def thread_work(work_queue, pipe):
    while True:
        work_queue.put((threading.current_thread().name, pipe[1]))
        received = pipe[0].recv()
        print("{}: {}".format(threading.current_thread().name,
                              threading.current_thread().name == received))
        time.sleep(0.3)

def process_work(work_queue):
    while True:
        thread, pipe = work_queue.get()
        pipe.send(thread)

work_queue = multiprocessing.Queue()
for i in range(0, 3):
    receive, send = multiprocessing.Pipe(duplex=False)
    t = threading.Thread(target=thread_work, args=[work_queue, (receive, send)])
    t.daemon = True
    t.start()
for i in range(0, 2):
    p = multiprocessing.Process(target=process_work, args=[work_queue])
    p.daemon = True
    p.start()
time.sleep(5)
I had a look in the multiprocessing source code, but couldn't understand why this error occurs.
I tried using queue.Queue, or a Pipe with duplex=True (the default), but couldn't find a pattern in the error. Does anyone have a clue how to debug this?
You are forking an already multi-threaded main process here. That is known to be problematic in general.
It is in fact problem-prone (and not just in Python). The rule is "thread after you fork, not before". Otherwise, the locks used by the thread executor will get duplicated across processes. If one of those processes dies while it has the lock, all of the other processes using that lock will deadlock. (Raymond Hettinger)
The trigger for the error you get is apparently that duplicating the file descriptor for the pipe fails in the child process.
To resolve this issue, either create your child processes while your main process is still single-threaded, or use another start_method for creating new processes, such as 'spawn' (the default on Windows) or 'forkserver', if available.
forkserver
When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
Available on Unix platforms which support passing file descriptors over Unix pipes. docs
You can specify another start_method with:
multiprocessing.set_start_method(method)
Set the method which should be used to start child processes. method can be 'fork', 'spawn' or 'forkserver'.
Note that this should be called at most once, and it should be protected inside the if __name__ == '__main__' clause of the main module. docs
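A minimal sketch of what that looks like in practice (assuming a Unix host where 'forkserver' is available; the worker function is illustrative):

import multiprocessing

def worker():
    print("hello from", multiprocessing.current_process().name)

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')  # or 'spawn' on Windows/macOS
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()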
For a benchmark of the specific start_methods (on Ubuntu 18.04) look here.

asyncio with multiple processors [duplicate]

As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently a 2 MB cap set up on incoming messages. Theoretically we could get thousands per second (though I don't know if we've seen anything like that in practice). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
Is there anything that could bite me by combining asyncio and multiprocessing?
You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.
The simplest way to avoid this is to use BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop. Here's a simple example:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor

def blocking_func(x):
    time.sleep(x)  # Pretend this is expensive calculations
    return x * 5

@asyncio.coroutine
def main():
    #pool = multiprocessing.Pool()
    #out = pool.apply(blocking_func, args=(10,))  # This blocks the event loop.
    executor = ProcessPoolExecutor()
    out = yield from loop.run_in_executor(executor, blocking_func, 10)  # This does not
    print(out)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
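For reference, on Python 3.7+ the same pattern reads more naturally with async/await; a sketch:

import time
import asyncio
from concurrent.futures import ProcessPoolExecutor

def blocking_func(x):
    time.sleep(x)  # pretend this is an expensive calculation
    return x * 5

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        # Runs in a worker process without blocking the event loop.
        out = await loop.run_in_executor(executor, blocking_func, 10)
    print(out)

if __name__ == "__main__":
    asyncio.run(main())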
For the majority of cases, this function alone is good enough. If you find yourself needing other constructs from multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it) that provides asyncio-compatible versions of all the multiprocessing data structures. Here's an example demonstrating that:
import time
import asyncio
import aioprocessing
import multiprocessing

def func(queue, event, lock, items):
    with lock:
        event.set()
        for item in items:
            time.sleep(3)
            queue.put(item + 5)
    queue.close()

@asyncio.coroutine
def example(queue, event, lock):
    l = [1, 2, 3, 4, 5]
    p = aioprocessing.AioProcess(target=func, args=(queue, event, lock, l))
    p.start()
    while True:
        result = yield from queue.coro_get()
        if result is None:
            break
        print("Got result {}".format(result))
    yield from p.coro_join()

@asyncio.coroutine
def example2(queue, event, lock):
    yield from event.coro_wait()
    with (yield from lock):
        yield from queue.coro_put(78)
        yield from queue.coro_put(None)  # Shut down the worker

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    queue = aioprocessing.AioQueue()
    lock = aioprocessing.AioLock()
    event = aioprocessing.AioEvent()
    tasks = [
        asyncio.async(example(queue, event, lock)),
        asyncio.async(example2(queue, event, lock)),
    ]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
Yes, there are quite a few bits that may (or may not) bite you.
When you run something like asyncio, it expects to run on one thread or process. This does not (by itself) work with parallel processing; you somehow have to distribute the work while leaving the I/O operations (specifically those on sockets) in a single thread/process.
While your idea to hand off individual connections to a different handler process is nice, it is hard to implement. The first obstacle is that you need a way to pull the connection out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C extension.
Note that the multiprocessing module is known to create a number of threads for communication. Most of the time, when you use communication structures such as Queues, a thread is spawned. Unfortunately those threads are not completely invisible: for instance, they can fail to tear down cleanly (when you intend to terminate your program), and depending on their number, the resource usage may be noticeable on its own.
If you really intend to handle individual connections in individual processes, I suggest examining different approaches. For instance, you can put a socket into listen mode and then have multiple worker processes accept connections on it in parallel. Once a worker has finished processing a request, it can go accept the next connection, so you still use fewer resources than forking a process per connection. SpamAssassin and Apache (mpm prefork) use this worker model, for instance. It might end up easier and more robust, depending on your use case. Specifically, you can make your workers die after serving a configured number of requests and be respawned by a master process, thereby eliminating much of the negative effect of memory leaks.
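A rough sketch of that accept-from-multiple-workers model (assuming a Unix 'fork' start method so the listening socket is inherited by the children; the port and worker count are arbitrary):

import multiprocessing
import socket

def worker(listener):
    # Each worker blocks in accept() on the shared listening socket;
    # the kernel hands every incoming connection to exactly one worker.
    while True:
        conn, addr = listener.accept()
        name = multiprocessing.current_process().name
        conn.sendall(("handled by %s\n" % name).encode())
        conn.close()

if __name__ == '__main__':
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(('127.0.0.1', 8642))
    listener.listen(16)
    workers = [multiprocessing.Process(target=worker, args=(listener,))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()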
Based on @dano's answer above, I wrote this function to replace the places where I used to use a multiprocessing pool + map.
import asyncio
from concurrent.futures import ProcessPoolExecutor
from typing import Callable

def asyncio_friendly_multiproc_map(fn: Callable, l: list):
    """
    This is designed to replace the use of this pattern:
        with multiprocessing.Pool(5) as p:
            results = p.map(analyze_day, list_of_days)
    by letting the caller drop in a replacement:
        asyncio_friendly_multiproc_map(analyze_day, list_of_days)
    """
    tasks = []
    with ProcessPoolExecutor(5) as executor:
        for e in l:
            tasks.append(asyncio.get_event_loop().run_in_executor(executor, fn, e))
    res = asyncio.get_event_loop().run_until_complete(asyncio.gather(*tasks))
    return res
See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This clearly documents the new asyncio methods you might use, including run_in_executor(). Note that Executor is defined in concurrent.futures; I suggest you also have a look there.

threading and multithreading in python with an example

I am a beginner in Python and unable to get an idea about threading. Could someone please explain threading and multithreading in Python using a simple example?
-Thanks
Here is Alex Martelli's answer about multithreading, as linked above.
He uses a simple program that tries some URLs and then returns the contents of the first one to respond.
import Queue
import threading
import urllib2

# called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]
q = Queue.Queue()
for u in theurls:
    t = threading.Thread(target=get_url, args=(q, u))
    t.daemon = True
    t.start()

s = q.get()
print s
This is a case where threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, in order to put its contents on the queue; each thread is a daemon (won't keep the process up if main thread ends -- that's more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the results and terminates (which takes down any subthreads that might still be running, since they're daemon threads).
Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn't use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there's a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work's results, by the way, and they're intrinsically threadsafe so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
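That answer predates Python 3; a direct translation to Python 3's module names would be the following sketch (network access assumed):

import queue
import threading
import urllib.request

# called by each thread
def get_url(q, url):
    q.put(urllib.request.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]
q = queue.Queue()
for u in theurls:
    t = threading.Thread(target=get_url, args=(q, u))
    t.daemon = True
    t.start()

s = q.get()
print(s)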

Python multiprocessing load balancer

Short question: Is it possible to have N worker processes and a balancer process that will find a worker that is doing nothing at the moment and pass a UnitOfWork to it?
Long question:
Imagine a class like this, which will be subclassed for certain tasks:
class UnitOfWork:
    def __init__(self, **some_starting_parameters):
        pass

    def init(self):
        # open connections, etc.
        pass

    def run(self):
        # do the job
        pass
Start the balancer and worker process:
balancer = LoadBalancer()
workers = balancer.spawn_workers(10)
Deploy work (the balancer should find a lazy worker and pass a task to it, or, if every worker is busy, add the UOW to a queue and wait until a worker is free):
balancer.work(UnitOfWork(some=parameters))
# internally: find a free worker, pass the UOW to it, call uow.init() + uow.run()
Is this possible (or is it crazy)?
PS I'm familiar with the multiprocessing Process class and with process pools, but:
Every Process instance starts a process (yep :) ), and I want a fixed number of workers
I want a Process instance that can do generic work
I suggest you take a look at multiprocessing.Pool(), because I believe it solves your problem exactly. It runs N "worker processes", and as each worker finishes a task, another task is provided. There is no need for "poison pills"; it is very simple.
I have always used the .map() method on the pool.
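A minimal sketch of that usage (the square function is just a stand-in for your UnitOfWork logic):

import multiprocessing

def square(n):
    return n * n

if __name__ == '__main__':
    # A fixed number of workers; Pool balances the tasks across them.
    with multiprocessing.Pool(4) as pool:
        print(pool.map(square, range(10)))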
Python multiprocessing.Pool: when to use apply, apply_async or map?
EDIT: Here is an answer I wrote to another question, where I used multiprocessing.Pool():
Parallel file matching, Python
You don't need any smarts in the balancer; the Queue alone will do what you want. Throw each unit of work into the queue, and have the workers loop, taking a single work unit from the queue and processing it on each iteration. I don't think there's any problem passing an instance of UnitOfWork through the queue.
If you have a fixed amount of work to be done, you can create a "no more work to be done" work unit (a "poison pill") that tells a worker to shut down, and after all the regular work is put into the queue, put as many poison pills into the queue as you have workers.
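To illustrate, a sketch of that queue-plus-poison-pills arrangement using the question's UnitOfWork idea (the squaring payload is made up):

import multiprocessing

class UnitOfWork:
    def __init__(self, n):
        self.n = n

    def init(self):
        pass  # open connections, etc.

    def run(self):
        return self.n * self.n

def worker(tasks, results):
    while True:
        uow = tasks.get()
        if uow is None:  # poison pill: shut this worker down
            break
        uow.init()
        results.put(uow.run())

if __name__ == '__main__':
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    n_workers = 4
    workers = [multiprocessing.Process(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for i in range(10):
        tasks.put(UnitOfWork(i))
    for _ in range(n_workers):  # one poison pill per worker
        tasks.put(None)
    for w in workers:
        w.join()
    while not results.empty():
        print(results.get())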
