For a couple of weeks I have been trying to solve a problem with the multiprocessing module in Python (2.7.x).
Idea:
Let's have a message queue (RabbitMQ in our case). Create a listener on that queue, and on each message spawn a task which will process that message.
Problem:
Everything works fine, but after a couple hundred tasks some sub-processes become zombies, which is the main problem.
We also have some limitations (such as a max number of tasks per machine), which in the end means the machine stops processing any tasks.
Current implementation:
I created minimal code which should explain our approach:
# -*- coding: utf-8 -*-
from multiprocessing import Process
import signal
from threading import Lock


class Task(Process):
    def __init__(self, data):
        super(Task, self).__init__()
        self.data = data

    def run(self):
        # restore default SIGCHLD handling in the subprocess
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        self.do_job()  # long job there

    def do_job(self):
        # very long job
        pass


class MQListener(object):
    def __init__(self):
        self.tasks = []
        self.tasks_lock = Lock()
        self.register_signal_handler()
        mq = RabbitMQ()
        mq.listen("task_queue", self.on_message)

    def register_signal_handler(self):
        signal.signal(signal.SIGCHLD, self.on_signal_received)

    def on_signal_received(self, *_):
        self._check_existing_processes()

    def on_message(self, message):
        # ack message and create task
        task = Task(message)
        with self.tasks_lock:
            self.tasks.append(task)
            task.start()

    def _check_existing_processes(self):
        """
        go over all created tasks; if some are not alive, remove them from the tasks collection
        """
        try:
            with self.tasks_lock:
                running_tasks = []
                for w in self.tasks:
                    if not w.is_alive():
                        w.join()
                    else:
                        running_tasks.append(w)
                self.tasks = running_tasks
        except Exception:
            # log
            pass


if __name__ == '__main__':
    m = MQListener()
I'm quite open to using a library for this - if you can recommend one, that would be great as well.
Using SIGCHLD to catch child process termination has quite a few gotchas. The signal handler runs asynchronously, and multiple SIGCHLD deliveries might get aggregated into a single one.
In short, it's better not to use it unless you are really aware of how it works.
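As an aside (my own sketch, not part of the original answer): if you only need to reap finished children without SIGCHLD, you can do it synchronously, e.g. from on_message whenever a new task arrives. multiprocessing.active_children() has the documented side effect of joining any children that have already finished. This assumes the MQListener.tasks list and tasks_lock from the question above:

import multiprocessing


def reap_finished_tasks(self):
    # joins already-terminated children, so they never linger as zombies
    multiprocessing.active_children()
    with self.tasks_lock:
        self.tasks = [t for t in self.tasks if t.is_alive()]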
Your program has another issue as well: what happens if you get 10000 messages at once? You'll spawn 10000 processes all at once and kill your machine.
You could use a process Pool and let it handle all of these issues for you:
from multiprocessing import Pool


class MQListener(object):
    def __init__(self):
        self.pool = Pool()
        self.rabbitclient = RabbitMQ()

    def new_message(self, message):
        self.pool.apply_async(do_job, args=(message, ))

    def run(self):
        self.rabbitclient.listen("task_queue", self.new_message)


app = MQListener()
app.run()
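A self-contained sketch of the same idea (my own addition, with a stand-in do_job): an explicitly sized pool puts a hard cap on concurrency, so a burst of messages queues up instead of spawning a process per message, and close()/join() give you a clean shutdown:

from multiprocessing import Pool
import time


def do_job(message):
    time.sleep(0.1)  # stand-in for the long-running job
    return message


if __name__ == '__main__':
    pool = Pool(processes=4)  # at most 4 messages are processed concurrently
    results = [pool.apply_async(do_job, args=(i,)) for i in range(20)]
    pool.close()  # no more work will be submitted
    pool.join()   # wait for all queued jobs to finish
    print([r.get() for r in results])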
Related
I have a main job with heavy calculations and also logging with many IO operations.
I don't care much about either the speed or the order of the logging.
What I want is a log collector that can take the context I want to log on a new thread, so that my main script can keep running without being blocked.
The code I tried is as below:
import threading
from loguru import logger
from collections import deque
import time
class ThreadLogger:
    def __init__(self):
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.log_queue = deque()
        self.thread.start()
        self.run()

    def run(self):
        # I also have tried while True:
        while self.log_queue:
            log_func, context = self.log_queue.popleft()
            log_func(*context)

    def addLog(self, log_func, context):
        self.log_queue.append([log_func, context])


thlogger = ThreadLogger()

for i in range(20):
    # add log here with new thread so that won't affect main jobs
    thlogger.addLog(logger.debug, (f'hi {i}',))
    # main jobs here (I want to do some real shit here with heavy calculation)
The code above doesn't really work as I expect.
It cannot detect by itself when to drain the queue.
Also, if I use while True: it just blocks, so the queue never gets any longer.
All the other techniques I can come up with don't really run the logging on a single separate thread.
Any suggestions would be very appreciated!
Remove the call to self.run(), as you have already started a thread to run that method. It is that call that is blocking your program: it causes the main thread to sit blocked on the empty queue.
def __init__(self):
    self.thread = threading.Thread(target=self.run, daemon=True)
    self.log_queue = deque()
    self.thread.start()
    # self.run()  # remove
Once you do that, you can change while self.log_queue: to while True:
As in Dan D.'s answer:
import threading
from loguru import logger
from collections import deque
import time
class ThreadLogger:
    def __init__(self):
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.log_queue = deque()
        self.thread.start()

    def run(self):
        while True:
            if self.log_queue:
                log_func, context = self.log_queue.popleft()
                log_func(*context)

    def addLog(self, log_func, context):
        self.log_queue.append([log_func, context])


thlogger = ThreadLogger()

for i in range(20):
    thlogger.addLog(logger.debug, (f'hi {i}',))
    time.sleep(1)  # wait for log to happen
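A variation worth considering (my own sketch, not part of the answers above): queue.Queue.get() blocks until an item arrives, so the logging thread sleeps instead of spinning on an empty deque:

import threading
import queue
import time
from loguru import logger


class ThreadLogger:
    def __init__(self):
        self.log_queue = queue.Queue()
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.thread.start()

    def run(self):
        while True:
            log_func, context = self.log_queue.get()  # blocks until something is logged
            log_func(*context)

    def addLog(self, log_func, context):
        self.log_queue.put((log_func, context))


thlogger = ThreadLogger()
for i in range(20):
    thlogger.addLog(logger.debug, (f'hi {i}',))

time.sleep(1)  # give the daemon thread a moment to flush before the script exits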
I have 4 different Python custom objects and an events queue. Each object has a method that allows it to retrieve an event from the shared events queue, process it if the type is the desired one, and then put a new event on the same events queue, allowing other processes to process it.
Here's an example.
import multiprocessing as mp


class CustomObject:
    def __init__(self, events_queue: mp.Queue) -> None:
        self.events_queue = events_queue

    def process_events_queue(self) -> None:
        event = self.events_queue.get()
        if type(event) == SpecificEventDataTypeForThisClass:
            # do something and create a new_event
            self.events_queue.put(new_event)
        else:
            self.events_queue.put(event)

    # there are other methods specific to each object
These 4 objects have specific tasks to do, but they all share this same structure. Since I need to "simulate" the production conditions, I want them all to run at the same time, independently from each other.
Here's just an example of what I want to do, if possible.
import multiprocessing as mp
import CustomObject


if __name__ == '__main__':
    events_queue = mp.Queue()

    data_provider = mp.Process(target=CustomObject, args=(events_queue,))
    portfolio = mp.Process(target=CustomObject, args=(events_queue,))
    engine = mp.Process(target=CustomObject, args=(events_queue,))
    broker = mp.Process(target=CustomObject, args=(events_queue,))

    while True:
        data_provider.process_events_queue()
        portfolio.process_events_queue()
        engine.process_events_queue()
        broker.process_events_queue()
My idea is to run each object in a separate process, allowing them to communicate with events shared through the events_queue. So my question is, how can I do that?
The problem is that obj = mp.Process(target=CustomObject, args=(events_queue,)) returns a Process instance and I can't access the CustomObject methods from it. Also, is there a smarter way to achieve what I want?
Processes require a function to run, which defines what the process is actually doing. Once this function exits (and there are no non-daemon threads) the process is done. This is similar to how Python itself always executes a __main__ script.
If you do mp.Process(target=CustomObject, args=(events_queue,)) that just tells the process to call CustomObject - which instantiates it once and then is done. This is not what you want, unless the class actually performs work when instantiated - which is a bad idea for other reasons.
Instead, you must define a main function or method that handles what you need: "communicate with events shared through the events_queue". This function should listen to the queue and take action depending on the events received.
A simple implementation looks like this:
import os, time
from multiprocessing import Queue, Process


class Worker:
    # separate input and output for simplicity
    def __init__(self, commands: Queue, results: Queue):
        self.commands = commands
        self.results = results

    # our main function to be run by a process
    def main(self):
        # each process should handle more than one command
        while True:
            value = self.commands.get()
            # pick a well-defined signal to detect "no more work"
            if value is None:
                self.results.put(None)
                break
            # do whatever needs doing
            result = self.do_stuff(value)
            print(os.getpid(), ':', self, 'got', value, 'put', result)
            time.sleep(0.2)  # pretend we do something
            # pass on more work if required
            self.results.put(result)

    # placeholder for what needs doing
    def do_stuff(self, value):
        raise NotImplementedError
This is a template for a class that just keeps on processing events. The do_stuff method must be overridden to define what actually happens.
class AddTwo(Worker):
    def do_stuff(self, value):
        return value + 2


class TimesThree(Worker):
    def do_stuff(self, value):
        return value * 3


class Printer(Worker):
    def do_stuff(self, value):
        print(value)
This already defines fully working process payloads: Process(target=TimesThree(in_queue, out_queue).main) schedules the main method in a process, listening for and responding to commands.
Running this mainly requires connecting the individual components:
if __name__ == '__main__':
    # bookkeeping of resources we create
    processes = []
    start_queue = Queue()
    # connect our workers via queues
    queue = start_queue
    for element in (AddTwo, TimesThree, Printer):
        instance = element(queue, Queue())
        # we run the main method in processes
        processes.append(Process(target=instance.main))
        queue = instance.results
    # start all processes
    for process in processes:
        process.start()
    # send input, but do not wait for output
    start_queue.put(1)
    start_queue.put(248124)
    start_queue.put(-256)
    # send shutdown signal
    start_queue.put(None)
    # wait for processes to shutdown
    for process in processes:
        process.join()
Note that you do not need classes for this. You can also compose functions for a similar effect, as long as everything is pickle-able:
import os, time
from multiprocessing import Queue, Process


def main(commands, results, do_stuff):
    while True:
        value = commands.get()
        if value is None:
            results.put(None)
            break
        result = do_stuff(value)
        print(os.getpid(), ':', do_stuff, 'got', value, 'put', result)
        time.sleep(0.2)
        results.put(result)


def times_two(value):
    return value * 2


if __name__ == '__main__':
    in_queue, out_queue = Queue(), Queue()
    worker = Process(target=main, args=(in_queue, out_queue, times_two))
    worker.start()
    for message in (1, 3, 5, None):
        in_queue.put(message)
    while True:
        reply = out_queue.get()
        if reply is None:
            break
        print('result:', reply)
I just started getting familiar with multiprocessing in Python and got stuck on a problem which I'm not able to solve the way I want, and I can't find any clear information on whether what I'm trying is even properly solvable.
What I'm trying to do is something similar to the following:
import time
from multiprocessing import Process, Event, Queue
from threading import Thread


class Main:
    def __init__(self):
        self.task_queue = Queue()
        self.process = MyProcess(self.task_queue)
        self.process.start()

    def execute_script(self, code):
        ProcessCommunication(code, self.task_queue).start()


class ProcessCommunication(Thread):
    def __init__(self, script, task_queue):
        super().__init__()
        self.script = script
        self.script_queue = task_queue
        self.script_end_event = Event()

    def run(self):
        self.script_queue.put((self.script, self.script_end_event))
        while not self.script_end_event.is_set():
            time.sleep(0.1)


class MyProcess(Process):
    class ExecutionThread(Thread):
        def __init__(self, code, end_event):
            super().__init__()
            self.code = code
            self.event = end_event

        def run(self):
            exec(compile(self.code, '<string>', 'exec'))
            self.event.set()

    def __init__(self, task_queue):
        super().__init__(name="TEST_PROCESS")
        self.task_queue = task_queue
        self.status = None

    def run(self):
        while True:
            if not self.task_queue.empty():
                script, end_event = self.task_queue.get()
                if script is None:
                    break
                self.ExecutionThread(script, end_event).start()
So I would like to have one separate process, which runs during the whole runtime of my main process, to execute user-written scripts in an environment with restricted user privileges and a restricted namespace, and also to protect the main process from potential endless loops without wait times, which would load the CPU core too much.
Example code using the structure could look something like this:
if __name__ == '__main__':
    main_class = Main()
    main_class.execute_script("print(1)")
The main process can start several scripts simultaneously, and I would like to pass an event, together with the execution request, to the process so that the main process gets notified whenever one of the scripts finishes.
However, the Python process queues somehow do not like passing events through the queue and throw the following error:
'RuntimeError: Semaphore objects should only be shared between processes through inheritance'
Since I create another event with every execution request, I can't pass them in at instantiation of the process.
I came up with one way to solve this: passing an identifier together with the code, and setting up another queue which is fed that identifier whenever the end event would have been set. However, using events seems much more elegant to me, and I wonder if there is a solution I have not thought of yet.
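For what it's worth, a minimal sketch of that identifier-based workaround (my own addition; ScriptRunner and the other helper names are hypothetical): the request carries an integer id, the worker process reports that id on a second queue when the script finishes, and a watcher thread in the main process sets a local threading.Event for it:

import itertools
import threading
from multiprocessing import Process, Queue


class ScriptRunner(Process):  # hypothetical, simplified stand-in for MyProcess
    def __init__(self, task_queue, done_queue):
        super().__init__()
        self.task_queue = task_queue
        self.done_queue = done_queue

    def run(self):
        while True:
            item = self.task_queue.get()
            if item is None:
                break
            script_id, code = item
            # run the user script in its own thread, then report its id back
            def work(script_id=script_id, code=code):
                exec(compile(code, '<string>', 'exec'))
                self.done_queue.put(script_id)
            threading.Thread(target=work).start()


class Main:
    def __init__(self):
        self.task_queue = Queue()
        self.done_queue = Queue()
        self.pending = {}  # script_id -> threading.Event, local to the main process
        self._ids = itertools.count()
        ScriptRunner(self.task_queue, self.done_queue).start()
        threading.Thread(target=self._watch_done, daemon=True).start()

    def _watch_done(self):
        while True:
            script_id = self.done_queue.get()
            self.pending.pop(script_id).set()

    def execute_script(self, code):
        script_id = next(self._ids)
        end_event = threading.Event()
        self.pending[script_id] = end_event
        self.task_queue.put((script_id, code))  # only plain picklable data crosses the process boundary
        return end_event  # callers can wait() on this


if __name__ == '__main__':
    main = Main()
    finished = main.execute_script("print(1)")
    finished.wait()
    main.task_queue.put(None)  # shut the worker process down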
I'm looking into concurrency options for Python. Since I'm an iOS/macOS developer, I'd find it very useful if there was something like NSOperationQueue in Python.
Basically, it's a queue to which you can add operations (every operation is an Operation-derived class with a run method to implement) which are executed either serially or in parallel; ideally, various dependencies can be set on operations (i.e. that some operation depends on others being executed before it can start).
Have you looked at Celery as an option? This is what the Celery website says:
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
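For context, a minimal sketch of what a Celery task looks like (my own addition; the broker URL is an assumption and should point at your own RabbitMQ or Redis instance):

# tasks.py
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')  # assumed broker URL


@app.task
def add(x, y):
    return x + y

Calling add.delay(2, 3) queues the call; a worker started with celery -A tasks worker picks it up and executes it.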
I'm looking for it, too. But since it doesn't seem to exist yet, I have written my own implementation:
import time
import threading
import queue
import weakref
class OperationQueue:
    def __init__(self):
        self.thread = None
        self.queue = queue.Queue()

    def run(self):
        while self.queue.qsize() > 0:
            msg = self.queue.get()
            print(msg)
            # emulate work that costs time
            time.sleep(2)

    def addOperation(self, string):
        # put it on the queue first, for thread safety
        self.queue.put(string)
        if not (self.thread and self.thread.is_alive()):
            print('renew a thread')
            self.thread = threading.Thread(target=self.run)
            self.thread.start()


myQueue = OperationQueue()
myQueue.addOperation("test1")
# test whether it is freed automatically
item = weakref.ref(myQueue)
time.sleep(1)
myQueue.addOperation("test2")
myQueue = None
time.sleep(3)
print(f'item = {item}')
print("Done.")
I'm trying to write a program in which one function adds information to a queue while another function reads from the queue in the meantime and does some miscellaneous task with it. The program has to put info into the queue and read from it at the same time.
Example:
from multiprocessing import Process
from time import clock, sleep
import Queue


class HotMess(object):
    def __init__(self):
        self.market_id = ['1', '2', '3']
        self.q2 = Queue.Queue()
        self.session = True

    def run(self):
        if __name__ == '__main__':
            t = Process(target=self.get_data)
            k = Process(target=self.main_code)
            t.run()
            k.run()

    def main_code(self):
        while self.session:
            result = self.q2.get()
            print(result)

    def get_data(self):
        # while self.session:
        for market in self.market_id:
            self.q2.put(market)


mess = HotMess()
mess.run()
Now this produces the output 1 2 3. So far so good. But I actually want get_data to be a while loop and basically run indefinitely. If you uncomment while self.session: in the get_data function and indent accordingly, it no longer produces any output, and I think this is because the get_data process doesn't finish as long as self.session is True.
My question is: how can I make main_code() not wait for get_data() and just start working on the queue, so that they both interact with the queue (q2)? I tried looking at Process/threading/Popen, but I'm quite far out of my comfort zone and a bit at a loss.
You should use multiprocessing.Queue - it's for communication between different processes.
I also changed run to start and join:
from multiprocessing import Process, Queue
from time import clock, sleep
class HotMess(object):
    def __init__(self):
        self.market_id = ['1', '2', '3']
        self.q2 = Queue()
        self.session = True

    def run(self):
        if __name__ == '__main__':
            t = Process(target=self.get_data)
            k = Process(target=self.main_code)
            t.start()
            k.start()
            t.join()
            k.join()

    def main_code(self):
        while self.session:
            result = self.q2.get()
            print(result)

    def get_data(self):
        while self.session:
            for market in self.market_id:
                self.q2.put(market)
mess=HotMess()
mess.run()
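One caveat (my own note, not part of the answer): self.session is copied into each child process, so setting it to False in the parent won't stop them, and main_code blocks on q2.get() forever, which means join() never returns. A common alternative, sketched below with plain functions and a None sentinel, lets both processes finish cleanly:

from multiprocessing import Process, Queue


def get_data(q2, market_id):
    for market in market_id:
        q2.put(market)
    q2.put(None)  # sentinel: tells the consumer there is no more data


def main_code(q2):
    while True:
        result = q2.get()  # blocks until the producer puts something
        if result is None:
            break
        print(result)


if __name__ == '__main__':
    q2 = Queue()
    t = Process(target=get_data, args=(q2, ['1', '2', '3']))
    k = Process(target=main_code, args=(q2,))
    t.start()
    k.start()
    t.join()
    k.join()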