I'm looking into concurrency options for Python. Since I'm an iOS/macOS developer, I'd find it very useful if there were something like NSOperationQueue in Python.
Basically, it's a queue to which you can add operations (each operation is a subclass of Operation with a run method to implement), which are executed either serially or in parallel; ideally, dependencies can be set between operations (i.e. an operation can depend on others completing before it can start).
Have you looked at Celery as an option? This is how the Celery website describes it:
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
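Celery can also express the kind of dependencies mentioned above by composing tasks into workflows. A minimal sketch, assuming a locally running Redis broker (the broker URL and the task bodies are made up for illustration):

from celery import Celery, chain

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def fetch(url):
    # placeholder: pretend to download something
    return 'data from %s' % url

@app.task
def process(data):
    # placeholder: pretend to transform the downloaded data
    return data.upper()

# process() depends on fetch(): it only runs after fetch() has finished,
# receiving fetch()'s return value as its first argument
workflow = chain(fetch.s('http://example.com'), process.s())
workflow.delay()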
I'm looking for it, too. But since it doesn't seem to exist yet, I have written my own implementation:
import time
import threading
import queue
import weakref


class OperationQueue:
    def __init__(self):
        self.thread = None
        self.queue = queue.Queue()

    def run(self):
        while self.queue.qsize() > 0:
            msg = self.queue.get()
            print(msg)
            # emulate an operation that takes time
            time.sleep(2)

    def addOperation(self, string):
        # put to the queue first, for thread safety
        self.queue.put(string)
        if not (self.thread and self.thread.is_alive()):
            print('renew a thread')
            self.thread = threading.Thread(target=self.run)
            self.thread.start()


myQueue = OperationQueue()
myQueue.addOperation("test1")
# test whether it is freed automatically
item = weakref.ref(myQueue)
time.sleep(1)
myQueue.addOperation("test2")
myQueue = None
time.sleep(3)
print(f'item = {item}')
print("Done.")
My main job involves heavy calculations, and I also do logging with many IO operations.
I don't care much about either the speed or the order of the logging.
What I want is a log collector that takes the context I want to log and writes it on a new thread, so that my main script can keep running without being blocked.
The code I tried is below:
import threading
from loguru import logger
from collections import deque
import time


class ThreadLogger:
    def __init__(self):
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.log_queue = deque()
        self.thread.start()
        self.run()

    def run(self):
        # I also have tried while True:
        while self.log_queue:
            log_func, context = self.log_queue.popleft()
            log_func(*context)

    def addLog(self, log_func, context):
        self.log_queue.append([log_func, context])


thlogger = ThreadLogger()

for i in range(20):
    # add log here on the new thread so that it won't affect the main jobs
    thlogger.addLog(logger.debug, (f'hi {i}',))

    # main jobs here (I want to do real work with heavy calculation here)
The code above doesn't work as I expected.
It cannot detect by itself when to drain the queue.
Also, if I use while True: it just blocks, so the queue never grows.
All the other techniques I can come up with don't really run the logging on a single separate thread.
Any suggestions would be very appreciated!
Remove the call to self.run(), as you have already started a thread to run that method. It is that call that is blocking your program: it causes the main thread to sit blocked on the empty queue.
def __init__(self):
    self.thread = threading.Thread(target=self.run, daemon=True)
    self.log_queue = deque()
    self.thread.start()
    # self.run()  # remove
Once you do that, you can change while self.log_queue: to while True:
Building on Dan D.'s answer:
import threading
from loguru import logger
from collections import deque
import time


class ThreadLogger:
    def __init__(self):
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.log_queue = deque()
        self.thread.start()

    def run(self):
        while True:
            if self.log_queue:
                log_func, context = self.log_queue.popleft()
                log_func(*context)

    def addLog(self, log_func, context):
        self.log_queue.append([log_func, context])


thlogger = ThreadLogger()

for i in range(20):
    thlogger.addLog(logger.debug, (f'hi {i}',))

time.sleep(1)  # wait for log to happen
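Note that the while True: loop above spins on an empty deque and burns CPU. If that matters, the standard library's queue.Queue provides a blocking get(); a minimal sketch of that variant (same loguru logger as in the question):

import queue
import threading

from loguru import logger


class ThreadLogger:
    def __init__(self):
        self.log_queue = queue.Queue()
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.thread.start()

    def run(self):
        while True:
            # blocks until an item is available instead of spinning
            log_func, context = self.log_queue.get()
            log_func(*context)
            self.log_queue.task_done()

    def addLog(self, log_func, context):
        self.log_queue.put((log_func, context))


thlogger = ThreadLogger()
for i in range(20):
    thlogger.addLog(logger.debug, (f'hi {i}',))
thlogger.log_queue.join()  # wait for all queued records before exiting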
I have 4 different Python custom objects and an events queue. Each object has a method that allows it to retrieve an event from the shared events queue, process it if its type is the desired one, and then put a new event on the same events queue, allowing other processes to process it.
Here's an example.
import multiprocessing as mp


class CustomObject:
    def __init__(self, events_queue: mp.Queue) -> None:
        self.events_queue = events_queue

    def process_events_queue(self) -> None:
        event = self.events_queue.get()
        if type(event) == SpecificEventDataTypeForThisClass:
            # do something and create a new_event
            self.events_queue.put(new_event)
        else:
            self.events_queue.put(event)

    # there are other methods specific to each object
These 4 objects have specific tasks to do, but they all share this same structure. Since I need to "simulate" the production conditions, I want them all to run at the same time, independently from each other.
Here's just an example of what I want to do, if possible.
import multiprocessing as mp
import CustomObject

if __name__ == '__main__':
    events_queue = mp.Queue()

    data_provider = mp.Process(target=CustomObject, args=(events_queue,))
    portfolio = mp.Process(target=CustomObject, args=(events_queue,))
    engine = mp.Process(target=CustomObject, args=(events_queue,))
    broker = mp.Process(target=CustomObject, args=(events_queue,))

    while True:
        data_provider.process_events_queue()
        portfolio.process_events_queue()
        engine.process_events_queue()
        broker.process_events_queue()
My idea is to run each object in a separate process, allowing them to communicate with events shared through the events_queue. So my question is, how can I do that?
The problem is that obj = mp.Process(target=CustomObject, args=(events_queue,)) returns a Process instance and I can't access the CustomObject methods from it. Also, is there a smarter way to achieve what I want?
Processes require a function to run, which defines what the process is actually doing. Once this function exits (and there are no non-daemon threads) the process is done. This is similar to how Python itself always executes a __main__ script.
If you do mp.Process(target=CustomObject, args=(events_queue,)) that just tells the process to call CustomObject - which instantiates it once and then is done. This is not what you want, unless the class actually performs work when instantiated - which is a bad idea for other reasons.
Instead, you must define a main function or method that handles what you need: "communicate with events shared through the events_queue". This function should listen to the queue and take action depending on the events received.
A simple implementation looks like this:
import os, time
from multiprocessing import Queue, Process


class Worker:
    # separate input and output for simplicity
    def __init__(self, commands: Queue, results: Queue):
        self.commands = commands
        self.results = results

    # our main function to be run by a process
    def main(self):
        # each process should handle more than one command
        while True:
            value = self.commands.get()
            # pick a well-defined signal to detect "no more work"
            if value is None:
                self.results.put(None)
                break
            # do whatever needs doing
            result = self.do_stuff(value)
            print(os.getpid(), ':', self, 'got', value, 'put', result)
            time.sleep(0.2)  # pretend we do something
            # pass on more work if required
            self.results.put(result)

    # placeholder for what needs doing
    def do_stuff(self, value):
        raise NotImplementedError
This is a template for a class that just keeps on processing events. The do_stuff method must be overridden to define what actually happens.
class AddTwo(Worker):
    def do_stuff(self, value):
        return value + 2


class TimesThree(Worker):
    def do_stuff(self, value):
        return value * 3


class Printer(Worker):
    def do_stuff(self, value):
        print(value)
This already defines fully working process payloads: Process(target=TimesThree(in_queue, out_queue).main) schedules the main method in a process, listening for and responding to commands.
Running this mainly requires connecting the individual components:
if __name__ == '__main__':
    # bookkeeping of resources we create
    processes = []
    start_queue = Queue()

    # connect our workers via queues
    queue = start_queue
    for element in (AddTwo, TimesThree, Printer):
        instance = element(queue, Queue())
        # we run the main method in processes
        processes.append(Process(target=instance.main))
        queue = instance.results

    # start all processes
    for process in processes:
        process.start()

    # send input, but do not wait for output
    start_queue.put(1)
    start_queue.put(248124)
    start_queue.put(-256)

    # send shutdown signal
    start_queue.put(None)

    # wait for processes to shutdown
    for process in processes:
        process.join()
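For reference, the first input travels through the pipeline above like this (comments only, using the three workers defined earlier):

# start_queue -> AddTwo:     1 + 2 = 3
#             -> TimesThree: 3 * 3 = 9
#             -> Printer:    prints 9 (its do_stuff returns None, which is
#                            simply forwarded to its results queue)
# The final None sentinel travels the same route and shuts each worker down.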
Note that you do not need classes for this. You can also compose functions for a similar effect, as long as everything is pickle-able:
import os, time
from multiprocessing import Queue, Process


def main(commands, results, do_stuff):
    while True:
        value = commands.get()
        if value is None:
            results.put(None)
            break
        result = do_stuff(value)
        print(os.getpid(), ':', do_stuff, 'got', value, 'put', result)
        time.sleep(0.2)
        results.put(result)


def times_two(value):
    return value * 2


if __name__ == '__main__':
    in_queue, out_queue = Queue(), Queue()
    worker = Process(target=main, args=(in_queue, out_queue, times_two))
    worker.start()
    for message in (1, 3, 5, None):
        in_queue.put(message)
    while True:
        reply = out_queue.get()
        if reply is None:
            break
        print('result:', reply)
I just started getting familiar with multiprocessing in Python and got stuck on a problem that I'm not able to solve the way I want, and I can't find any clear information on whether what I'm trying is even properly solvable.
What I'm trying to do is something similar to the following:
import time
from multiprocessing import Process, Event, Queue
from threading import Thread


class Main:
    def __init__(self):
        self.task_queue = Queue()
        self.process = MyProcess(self.task_queue)
        self.process.start()

    def execute_script(self, code):
        ProcessCommunication(code, self.task_queue).start()


class ProcessCommunication(Thread):
    def __init__(self, script, task_queue):
        super().__init__()
        self.script = script
        self.script_queue = task_queue
        self.script_end_event = Event()

    def run(self):
        self.script_queue.put((self.script, self.script_end_event))
        while not self.script_end_event.is_set():
            time.sleep(0.1)


class MyProcess(Process):
    class ExecutionThread(Thread):
        def __init__(self, code, end_event):
            super().__init__()
            self.code = code
            self.event = end_event

        def run(self):
            exec(compile(self.code, '<string>', 'exec'))
            self.event.set()

    def __init__(self, task_queue):
        super().__init__(name="TEST_PROCESS")
        self.task_queue = task_queue
        self.status = None

    def run(self):
        while True:
            if not self.task_queue.empty():
                script, end_event = self.task_queue.get()
                if script is None:
                    break
                self.ExecutionThread(script, end_event).start()
So I would like to have one separate process, which runs for the whole runtime of my main process, to execute user-written scripts in an environment with restricted user privileges and a restricted namespace. It should also protect the main process from potential endless loops (without any sleeps) that would load a CPU core too much.
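As an aside, for the restricted-namespace part one common (and imperfect) approach is to hand exec an explicit globals dict so the script only sees whitelisted names; a minimal sketch, and explicitly not a real security sandbox:

# hypothetical whitelist of names the user script may use
SAFE_GLOBALS = {
    '__builtins__': {'print': print, 'range': range, 'len': len},
}

def run_restricted(code):
    # the script sees only the whitelisted names; determined code can still
    # escape this, so it is a namespace restriction, not a security boundary
    exec(compile(code, '<string>', 'exec'), dict(SAFE_GLOBALS))

run_restricted("print(len(range(3)))")   # works: prints 3
try:
    run_restricted("import os")          # fails: __import__ is not whitelisted
except ImportError as exc:
    print('blocked:', exc)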
Example Code to use the structure could look something like this:
if __name__ == '__main__':
    main_class = Main()
    main_class.execute_script("print(1)")
The main process can start several scripts simultaneously, and I would like to pass an event together with the execution request to the process, so that the main process gets notified whenever one of the scripts finishes.
However, multiprocessing queues do not like Events being passed through them and throw the following error:
'RuntimeError: Semaphore objects should only be shared between processes through inheritance'
As I create another event with every execution request, I can't pass them at instantiation of the Process.
I came up with one way to solve this, which is passing an identifier together with the code and setting up another queue, which is fed with the identifier whenever the corresponding end_event would be set. However, using events seems much more elegant to me, and I wonder if there is a solution that I have not thought of yet.
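For what it's worth, the identifier-based workaround described above can stay fairly small. A rough sketch (the names are made up, and the nested execution thread is left out for brevity): the worker reports the identifier of each finished script on a second queue, and the waiting side blocks on that queue instead of on an Event.

import uuid
from multiprocessing import Process, Queue

def worker_loop(task_queue, done_queue):
    # runs inside the separate process, executing one script at a time
    while True:
        script_id, code = task_queue.get()
        if code is None:
            break
        exec(compile(code, '<string>', 'exec'))
        done_queue.put(script_id)  # completion notification instead of an Event

def execute_and_wait(task_queue, done_queue, code):
    # main-process side: enqueue the script and block until its id comes back
    script_id = uuid.uuid4().hex
    task_queue.put((script_id, code))
    while done_queue.get() != script_id:
        pass  # simplification: assumes a single waiter consumes completions

if __name__ == '__main__':
    tasks, done = Queue(), Queue()
    proc = Process(target=worker_loop, args=(tasks, done))
    proc.start()
    execute_and_wait(tasks, done, "print(1)")
    tasks.put((None, None))
    proc.join()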
For a couple of weeks I have been trying to solve a problem with the multiprocessing module in Python (2.7.x).
Idea:
Let's have a message queue (RabbitMQ in our case). Create a listener on that queue, and on each message spawn a task which will process that message.
Problem:
Everything works fine, but after a couple of hundred tasks, some sub-processes become zombies, which is the main problem.
We also have some limitations (such as a maximum number of tasks per machine), which in the end means that the machine stops processing any tasks.
Current implementation:
I created minimal code which should explain our approach:
# -*- coding: utf-8 -*-
from multiprocessing import Process
import signal
from threading import Lock


class Task(Process):
    def __init__(self, data):
        super(Task, self).__init__()
        self.data = data

    def run(self):
        # restore default SIGCHLD handling in the subprocess
        # (do not inherit the parent's handler)
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        self.do_job()  # long job there

    def do_job(self):
        # very long job
        pass


class MQListener(object):
    def __init__(self):
        self.tasks = []
        self.tasks_lock = Lock()
        self.register_signal_handler()

        mq = RabbitMQ()
        mq.listen("task_queue", self.on_message)

    def register_signal_handler(self):
        signal.signal(signal.SIGCHLD, self.on_signal_received)

    def on_signal_received(self, *_):
        self._check_existing_processes()

    def on_message(self, message):
        # ack message and create task
        task = Task(message)
        with self.tasks_lock:
            self.tasks.append(task)
        task.start()

    def _check_existing_processes(self):
        """
        Go over all created tasks; if some are not alive, remove them from the tasks collection.
        """
        try:
            with self.tasks_lock:
                running_tasks = []
                for w in self.tasks:
                    if not w.is_alive():
                        w.join()
                    else:
                        running_tasks.append(w)
                self.tasks = running_tasks
        except Exception:
            # log
            pass


if __name__ == '__main__':
    m = MQListener()
I'm quite open to using a library for this; if you can recommend one, that would be great as well.
Using SIGCHLD to catch child process termination has quite a few gotchas. The signal handler is run asynchronously, and multiple SIGCHLD deliveries might get aggregated into one.
In short, it is better not to use it unless you really understand how it works.
Your program also has another issue: what happens if you get 10000 messages at once? You'll spawn 10000 processes at the same time and kill your machine.
You could use a process Pool and let it handle all of these issues for you.
from multiprocessing import Pool


class MQListener(object):
    def __init__(self):
        self.pool = Pool()
        self.rabbitclient = RabbitMQ()  # same client abstraction as in the question

    def new_message(self, message):
        # do_job is the long-running work, now executed by a pool worker
        self.pool.apply_async(do_job, args=(message,))

    def run(self):
        self.rabbitclient.listen("task_queue", self.new_message)


app = MQListener()
app.run()
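If you also need to cap concurrency or recycle workers (for example, when the job leaks memory), Pool takes a process count and, since Python 2.7, a maxtasksperchild argument. A small sketch with a placeholder do_job:

from multiprocessing import Pool

def do_job(message):
    # placeholder for the real long-running work
    return len(message)

if __name__ == '__main__':
    # at most 4 workers, each replaced after handling 100 tasks
    pool = Pool(processes=4, maxtasksperchild=100)
    result = pool.apply_async(do_job, args=("some message",))
    print(result.get())
    pool.close()
    pool.join()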
I have a threaded python application with a long-running mainloop in the background thread. This background mainloop is actually a call to pyglet.app.run(), which drives a GUI window and also can be configured to call other code periodically. I need a do_stuff(duration) function to be called at will from the main thread to trigger an animation in the GUI, wait for the animation to stop, and then return. The actual animation must be done in the background thread because the GUI library can't handle being driven by separate threads.
I believe I need to do something like this:
import threading


class StuffDoer(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.max_n_times = 0
        self.total_n_times = 0
        self.paused_ev = threading.Event()

    def run(self):
        # this part is outside of my control
        while True:
            self._do_stuff()
            # do other stuff

    def _do_stuff(self):
        # this part is under my control
        if self.paused_ev.is_set():
            if self.max_n_times > self.total_n_times:
                self.paused_ev.clear()
        else:
            if self.total_n_times >= self.max_n_times:
                self.paused_ev.set()
        if not self.paused_ev.is_set():
            # do stuff that must execute in the background thread
            self.total_n_times += 1


sd = StuffDoer()
sd.start()


def do_stuff(n_times):
    sd.max_n_times += n_times
    sd.paused_ev.wait_for_clear()  # wait_for_clear() does not exist
    sd.paused_ev.wait()
    assert (sd.total_n_times == sd.max_n_times)
EDIT: use max_n_times instead of stop_time to clarify why Thread.join(duration) won't do the trick.
From the documentation for threading.Event:
wait([timeout])
Block until the internal flag is true. If the internal flag is true on entry, return immediately. Otherwise, block until another thread calls set() to set the flag to true, or until the optional timeout occurs.
I've found I can get the behavior I'm looking for if I have a pair of events, paused_ev and not_paused_ev, and use not_paused_ev.wait(). I could almost just use Thread.join(duration), except it needs to only return precisely when the background thread actually registers that the time is up. Is there some other synchronization object or other strategy I should be using instead?
I'd also be open to arguments that I'm approaching this whole thing the wrong way, provided they're good arguments.
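For reference, the pair-of-events idea mentioned above might look roughly like this (a sketch only; set_paused is a made-up helper that the background thread would call):

import threading

paused_ev = threading.Event()
not_paused_ev = threading.Event()
not_paused_ev.set()  # start in the "running" state

def set_paused(paused):
    # called from the background thread to keep the two events mirrored
    if paused:
        not_paused_ev.clear()
        paused_ev.set()
    else:
        paused_ev.clear()
        not_paused_ev.set()

def wait_until_running():
    # what the main thread calls in place of the missing wait_for_clear()
    not_paused_ev.wait()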
Hoping I get some revision or additional info from my comment, but I'm kind of wondering if you're not overworking things by subclassing Thread. You can do things like this:
from threading import Thread


class MyWorker(object):
    def __init__(self):
        t = Thread(target=self._do_work, name="Worker Owned Thread")
        t.daemon = True
        t.start()

    def _do_work(self):
        while True:
            # Something going on here, forever if necessary. This thread
            # will go away if the other non-daemon threads terminate, possibly
            # raising an exception depending on this function's body.
            pass
I find this makes more sense when the method you want to run is more naturally a member function of some other class than it would be as the run method on a Thread subclass. Additionally, this saves you from having to encapsulate a bunch of business logic inside a Thread. All IMO, of course.
It appears that your GUI animation thread is using a spin-lock in its while True loop. This can be prevented using thread-safe queues. Based on my reading of your question, this approach would be functionally equivalent and efficient.
I'm omitting some details of your code above which would not change. I'm also assuming here that the run() method which you do not control uses the self.stop_time value to do its work; otherwise there is no need for a threadsafe queue.
import time

from Queue import Queue
from threading import Event


class StuffDoer:
    def __init__(self, inq, ready):
        self.inq = inq
        self.ready = ready

    def _do_stuff(self):
        self.ready.set()
        self.stop_time = self.inq.get()


GUIqueue = Queue()
control = Event()
sd = StuffDoer(GUIqueue, control)


def do_stuff(duration):
    control.clear()
    GUIqueue.put(time.time() + duration)
    control.wait()
I ended up using a Queue similar to what @wberry suggested, making use of Queue.task_done and Queue.join:
import Queue
import threading


class StuffDoer(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.setDaemon(True)
        self.max_n_times = 0
        self.total_n_times = 0
        self.do_queue = Queue.Queue()

    def run(self):
        # this part is outside of my control
        while True:
            self._do_stuff()
            # do other stuff

    def _do_stuff(self):
        # this part is under my control
        if self.total_n_times >= self.max_n_times:
            try:
                self.max_n_times += self.do_queue.get(block=False)
            except Queue.Empty, e:
                pass
        if self.max_n_times > self.total_n_times:
            # do stuff that must execute in the background thread
            self.total_n_times += 1
            if self.total_n_times >= self.max_n_times:
                self.do_queue.task_done()


sd = StuffDoer()
sd.start()


def do_stuff(n_times):
    sd.do_queue.put(n_times)
    sd.do_queue.join()
    assert (sd.total_n_times == sd.max_n_times)
I made a solution based on @g.d.d.c's advice for this question. Here is my code:
threads = []

# initializing aux thread(s) in the main thread ...
t = threading.Thread(target=ThreadF, args=(...))
# t.setDaemon(True)  # I'm not sure whether it's really needed
t.start()
threads.append(t.ident)

# Block the main thread
while filter(lambda thread: thread.ident in threads, threading.enumerate()):
    time.sleep(10)
Also, you can use Thread.join to block the main thread; it is a better way.
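A small sketch of the join-based variant (ThreadF stands in for the worker from the snippet above): keep the Thread objects rather than their idents, and join them.

import threading
import time

def ThreadF():
    # stand-in for the real worker function
    time.sleep(1)

threads = []
for _ in range(3):
    t = threading.Thread(target=ThreadF)
    t.start()
    threads.append(t)  # keep the Thread objects, not just their idents

# Thread.join blocks the main thread until each worker has finished
for t in threads:
    t.join()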