Python - Using parallel processes communicating with API to improve speed

I'm trying to write a program in which one function adds information to a queue while another function reads from the queue at the same time and does some miscellaneous work with it. The program has to put items into the queue and read from it simultaneously.
Example:
from multiprocessing import Process
from time import clock, sleep
import Queue

class HotMess(object):
    def __init__(self):
        self.market_id = ['1', '2', '3']
        self.q2 = Queue.Queue()
        self.session = True

    def run(self):
        if __name__ == '__main__':
            t = Process(target=self.get_data)
            k = Process(target=self.main_code)
            t.run()
            k.run()

    def main_code(self):
        while self.session:
            result = self.q2.get()
            print(result)

    def get_data(self):
        #while self.session:
        for market in self.market_id:
            self.q2.put(market)

mess = HotMess()
mess.run()
Now this produces the output 1 2 3. So far so good. But I actually want get_data to be a while loop and basically run indefinitely. If you uncomment while self.session: in the get_data function and fix the indentation, it no longer produces any output, and I think that is because the get_data process never finishes as long as self.session is True.
My question is: how can I make main_code() not wait for get_data() and just start working the queue, so that both interact with the queue (q2) at the same time? I tried looking at process/threading/Popen but I'm quite far out of my comfort zone and at a bit of a loss.

You should use multiprocessing.Queue; it is meant for communication between different processes (Queue.Queue only works between threads within one process).
I also changed run to start and added join: calling run() executes the target in the calling process, whereas start() actually launches a new process.
from multiprocessing import Process, Queue
from time import clock, sleep

class HotMess(object):
    def __init__(self):
        self.market_id = ['1', '2', '3']
        self.q2 = Queue()
        self.session = True

    def run(self):
        if __name__ == '__main__':
            t = Process(target=self.get_data)
            k = Process(target=self.main_code)
            t.start()
            k.start()
            t.join()
            k.join()

    def main_code(self):
        while self.session:
            result = self.q2.get()
            print(result)

    def get_data(self):
        while self.session:
            for market in self.market_id:
                self.q2.put(market)

mess = HotMess()
mess.run()
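One thing to keep in mind: self.session is a plain attribute, so setting it in one process is not visible in the other. If the consumer ever needs to stop cleanly when the producer is done, a common pattern (not part of the answer above) is to push a sentinel value such as None through the queue. A minimal sketch, with illustrative names:
from multiprocessing import Process, Queue

SENTINEL = None  # marker that tells the consumer to stop

def get_data(q):
    # produce a finite amount of data, then signal completion
    for market in ['1', '2', '3']:
        q.put(market)
    q.put(SENTINEL)

def main_code(q):
    # consume until the sentinel arrives
    while True:
        result = q.get()
        if result is SENTINEL:
            break
        print(result)

if __name__ == '__main__':
    q = Queue()
    producer = Process(target=get_data, args=(q,))
    consumer = Process(target=main_code, args=(q,))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()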

Related

How to manage the exit of a process without blocking its thread in Python?

I'm trying to code a kind of task manager in Python. It's based on a job queue; the main thread is in charge of adding jobs to this queue. I have written the class below to handle the queued jobs, limit the number of concurrent processes, and handle the output of the finished processes.
Here is the problem: in the _check_jobs function the returncode value of each process never gets updated, regardless of its status (running, finished, ...). job.returncode is always None, so the if statement never triggers and jobs are never removed from the processing job list.
I know it can be done with process.communicate() or process.wait(), but I don't want to block the thread that launches the processes. Is there any other way to do it, maybe using a ProcessPoolExecutor? The queue can receive jobs at any time and I need to be able to handle them.
Thank you all for your time and support :)
from queue import Queue
import subprocess
from threading import Thread
from time import sleep

class JobQueueManager(Queue):
    def __init__(self, maxsize: int):
        super().__init__(maxsize)
        self.processing_jobs = []
        self.process = None
        self.jobs_launcher = Thread(target=self._worker_job)
        self.processing_jobs_checker = Thread(target=self._check_jobs_status)
        self.jobs_launcher.start()
        self.processing_jobs_checker.start()

    def _worker_job(self):
        while True:
            # Run at most 3 jobs concurrently
            if self.not_empty and len(self.processing_jobs) < 3:
                # Get job from queue
                job = self.get()
                # Execute a task without blocking the thread
                self.process = subprocess.Popen(job)
                self.processing_jobs.append(self.process)
                # useful if queue.join() is used to block the queue
                self.task_done()
            else:
                print("Waiting 4s for jobs")
                sleep(4)

    def _check_jobs_status(self):
        while True:
            # Check if jobs are finished
            for job in self.processing_jobs:
                # Successfully completed
                if job.returncode == 0:
                    self.processing_jobs.remove(job)
            # Wait 4 seconds and repeat
            sleep(4)

def main():
    q = JobQueueManager(100)
    task = ["stress", "--cpu", "1", "--timeout", "20"]
    for i in range(10):  # put 10 tasks in the queue
        q.put(task)
    q.join()  # block until all tasks are done

if __name__ == "__main__":
    main()
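As an aside: Popen.returncode is only populated after a call such as wait(), communicate(), or the non-blocking poll(). A minimal, self-contained sketch of a non-blocking check (assuming a Unix-like system with a sleep command available):
import subprocess
from time import sleep

# Purely illustrative: launch a few short-lived children and reap them without blocking.
jobs = [subprocess.Popen(["sleep", str(n)]) for n in (1, 2, 3)]
while jobs:
    for job in list(jobs):          # iterate over a copy so removal is safe
        if job.poll() is not None:  # non-blocking; returns None while still running
            print(f"PID {job.pid} finished with returncode {job.returncode}")
            jobs.remove(job)
    sleep(1)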
To answer myself: I have come up with a working solution. The JobExecutor class manages the pool of processes in a custom way. The watch_completed_tasks function watches for and handles the output of the tasks as they are done. This way everything is done with only two threads, and the main thread is not blocked when submitting processes.
import subprocess
from threading import Timer
from concurrent.futures import ProcessPoolExecutor, as_completed
import logging

def launch_job(job):
    process = subprocess.Popen(job, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(f"launching {process.pid}")
    return [process.pid, process.stdout.read(), process.stderr.read()]

class JobExecutor(ProcessPoolExecutor):
    def __init__(self, max_workers: int):
        super().__init__(max_workers)
        self.futures = []
        self.watch_completed_tasks()

    def submit(self, command):
        future = super().submit(launch_job, command)
        self.futures.append(future)
        return future

    def watch_completed_tasks(self):
        # Manage task completion
        for completed_task in as_completed(self.futures):
            print(f"FINISHED task with PID {completed_task.result()[0]}")
            self.futures.remove(completed_task)
        # call this function every 5 seconds
        timer_thread = Timer(5.0, self.watch_completed_tasks)
        timer_thread.setName("TasksWatcher")
        timer_thread.start()

def main():
    executor = JobExecutor(max_workers=5)
    for i in range(10):
        task = ["stress",
                "--cpu", "1",
                "--timeout", str(i + 5)]
        executor.submit(task)

if __name__ == "__main__":
    main()
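An alternative to the Timer-based watcher (not what the answer above uses) is Future.add_done_callback, which has the pool notify you as each job finishes, so no polling loop is needed. A minimal sketch, with a placeholder sleep command standing in for the real job:
import subprocess
from concurrent.futures import ProcessPoolExecutor

def launch_job(job):
    # run the command and capture its output
    process = subprocess.Popen(job, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = process.communicate()
    return process.pid, process.returncode, out, err

def on_done(future):
    # called automatically when the job finishes
    pid, returncode, out, err = future.result()
    print(f"FINISHED task with PID {pid}, returncode {returncode}")

if __name__ == "__main__":
    task = ["sleep", "2"]  # placeholder command
    with ProcessPoolExecutor(max_workers=5) as executor:
        for _ in range(3):
            executor.submit(launch_job, task).add_done_callback(on_done)
    # leaving the with-block waits for all submitted jobs to finish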

How to create one thread for slowly logging so that the main jobs can continue running (in python)?

My main job involves heavy calculations, and also logging with many IO operations.
I don't care much about either the speed or the order of the logging.
What I want is a log collector that can take the context I want to log and handle it on a new thread, so that my main script can keep running without being blocked.
The code I tried is below:
import threading
from loguru import logger
from collections import deque
import time

class ThreadLogger:
    def __init__(self):
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.log_queue = deque()
        self.thread.start()
        self.run()

    def run(self):
        # I have also tried while True:
        while self.log_queue:
            log_func, context = self.log_queue.popleft()
            log_func(*context)

    def addLog(self, log_func, context):
        self.log_queue.append([log_func, context])

thlogger = ThreadLogger()

for i in range(20):
    # add log here on a new thread so that it won't affect the main jobs
    thlogger.addLog(logger.debug, (f'hi {i}',))
    # main jobs here (I want to do real work here with heavy calculation)
The code above doesn't really work as I expect:
It cannot detect by itself when to drain the queue.
Also, if I use while True:, it just blocks, and the queue never gets any longer.
All the other techniques I can come up with don't really run on a single new thread.
Any suggestions would be very much appreciated!
Remove the call to self.run(), as you have already started a thread to run that method. It is that call that is blocking your program: it causes the main thread to sit spinning on the (empty) queue.
def __init__(self):
    self.thread = threading.Thread(target=self.run, daemon=True)
    self.log_queue = deque()
    self.thread.start()
    # self.run()  # remove
Once you do that, you can change while self.log_queue: to while True:
As per Dan D.'s answer:
import threading
from loguru import logger
from collections import deque
import time

class ThreadLogger:
    def __init__(self):
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.log_queue = deque()
        self.thread.start()

    def run(self):
        while True:
            if self.log_queue:
                log_func, context = self.log_queue.popleft()
                log_func(*context)

    def addLog(self, log_func, context):
        self.log_queue.append([log_func, context])

thlogger = ThreadLogger()

for i in range(20):
    thlogger.addLog(logger.debug, (f'hi {i}',))

time.sleep(1)  # wait for the logging to happen
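One more variant, not shown in the answers above: the while True / if self.log_queue loop busy-spins when the deque is empty. queue.Queue.get() blocks until an item arrives, so using it avoids burning a CPU core while still keeping all logging on a single background thread. A minimal sketch (still using loguru, as in the question):
import threading
import queue
from loguru import logger

class ThreadLogger:
    def __init__(self):
        self.log_queue = queue.Queue()
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.thread.start()

    def run(self):
        while True:
            log_func, context = self.log_queue.get()  # blocks until an item is available
            log_func(*context)
            self.log_queue.task_done()

    def addLog(self, log_func, context):
        self.log_queue.put((log_func, context))

thlogger = ThreadLogger()
for i in range(20):
    thlogger.addLog(logger.debug, (f'hi {i}',))
thlogger.log_queue.join()  # wait for the queued log calls to be processed before exiting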

Pass dynamically created multiprocessing.Event() through a process Queue

I just started getting familiar with multiprocessing in Python and got stuck on a problem which I'm not able to solve the way I want, and I can't find any clear information on whether what I'm trying is even properly solvable.
What I'm trying to do is something similar to the following:
import time
from multiprocessing import Process, Event, Queue
from threading import Thread

class Main:
    def __init__(self):
        self.task_queue = Queue()
        self.process = MyProcess(self.task_queue)
        self.process.start()

    def execute_script(self, code):
        ProcessCommunication(code, self.task_queue).start()

class ProcessCommunication(Thread):
    def __init__(self, script, task_queue):
        super().__init__()
        self.script = script
        self.script_queue = task_queue
        self.script_end_event = Event()

    def run(self):
        self.script_queue.put((self.script, self.script_end_event))
        while not self.script_end_event.is_set():
            time.sleep(0.1)

class MyProcess(Process):
    class ExecutionThread(Thread):
        def __init__(self, code, end_event):
            super().__init__()
            self.code = code
            self.event = end_event

        def run(self):
            exec(compile(self.code, '<string>', 'exec'))
            self.event.set()

    def __init__(self, task_queue):
        super().__init__(name="TEST_PROCESS")
        self.task_queue = task_queue
        self.status = None

    def run(self):
        while True:
            if not self.task_queue.empty():
                script, end_event = self.task_queue.get()
                if script is None:
                    break
                self.ExecutionThread(script, end_event).start()
So I would like to have one separate process, running during the whole runtime of my main process, to execute user-written scripts in an environment with restricted user privileges and a restricted namespace, and also to protect the main process from potential endless loops (written without any waiting time) that load a CPU core too heavily.
Example code using this structure could look something like this:
if __name__ == '__main__':
    main_class = Main()
    main_class.execute_script("print(1)")
The main process can start several scripts simultaneously, and I would like to pass an event together with the execution request to the process, so that the main process gets notified whenever one of the scripts finishes.
However, multiprocessing queues do not accept events being passed through them and throw the following error:
'RuntimeError: Semaphore objects should only be shared between processes through inheritance'
As I create another event with every execution request, I can't pass them in at instantiation of the process.
I came up with one way to solve this: pass an identifier together with the code and set up another queue, which is fed the identifier whenever the end_event would have been set. However, using events seems much more elegant to me, and I wonder if there is a solution I have not thought of yet.
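For what it's worth, one commonly suggested alternative (not mentioned in the post above) is an event created through a multiprocessing.Manager(): the manager returns a picklable proxy, so it can be passed through a queue, unlike a plain multiprocessing.Event(). A minimal, self-contained sketch:
import time
from multiprocessing import Process, Queue, Manager

def worker(task_queue):
    # executes scripts taken from the queue and signals completion via the event proxy
    while True:
        script, end_event = task_queue.get()
        if script is None:
            break
        exec(compile(script, '<string>', 'exec'))
        end_event.set()

if __name__ == '__main__':
    manager = Manager()
    task_queue = Queue()
    process = Process(target=worker, args=(task_queue,))
    process.start()

    end_event = manager.Event()            # proxy: safe to send through the queue
    task_queue.put(("print(1)", end_event))

    end_event.wait()                       # main process is notified when the script finishes
    print("script finished")

    task_queue.put((None, None))           # tell the worker to exit
    process.join()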

Is there something like NSOperationQueue from ObjectiveC in Python?

I'm looking into concurrency options for Python. Since I'm an iOS/macOS developer, I'd find it very useful if there were something like NSOperationQueue in Python.
Basically, it's a queue to which you can add operations (every operation is an Operation-derived class with a run method to implement), which are executed either serially or in parallel; ideally, various dependencies can be set on operations (i.e. that some operation depends on others being executed before it can start).
Have you looked at Celery as an option? This is what the Celery website says:
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
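A minimal sketch of what a Celery task could look like (the module name, broker URL, and task body are placeholders, and a broker such as RabbitMQ or Redis has to be running):
# tasks.py -- illustrative only; requires a running broker at the given URL
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def heavy_operation(x, y):
    # body of an "operation"
    return x + y

# elsewhere, enqueue the operation asynchronously:
# heavy_operation.delay(2, 3)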
I'm looking for it, too. But since it doesn't seem to exist yet, I have written my own implementation:
import time
import threading
import queue
import weakref

class OperationQueue:
    def __init__(self):
        self.thread = None
        self.queue = queue.Queue()

    def run(self):
        while self.queue.qsize() > 0:
            msg = self.queue.get()
            print(msg)
            # emulate the operation taking time
            time.sleep(2)

    def addOperation(self, string):
        # put it in the queue first, for thread safety.
        self.queue.put(string)
        if not (self.thread and self.thread.is_alive()):
            print('renew a thread')
            self.thread = threading.Thread(target=self.run)
            self.thread.start()

myQueue = OperationQueue()
myQueue.addOperation("test1")
# test whether it is freed automatically
item = weakref.ref(myQueue)
time.sleep(1)
myQueue.addOperation("test2")
myQueue = None
time.sleep(3)
print(f'item = {item}')
print("Done.")
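For comparison, and not part of the answer above, the standard library's concurrent.futures covers the serial and parallel cases out of the box: a ThreadPoolExecutor with max_workers=1 behaves like a serial operation queue, while a larger pool runs operations in parallel; dependencies still have to be expressed by hand. A rough sketch:
from concurrent.futures import ThreadPoolExecutor
import time

def operation(name):
    # stand-in for an Operation's run() method
    print(f"running {name}")
    time.sleep(1)
    return name

# max_workers=1 -> operations execute one after another, in submission order
with ThreadPoolExecutor(max_workers=1) as serial_queue:
    first = serial_queue.submit(operation, "op1")
    second = serial_queue.submit(operation, "op2")
    # a crude "dependency": wait for op2 before enqueueing op3
    second.result()
    serial_queue.submit(operation, "op3")
# leaving the with-block waits for all queued operations to finish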

Python - multiprocessing - processes became zombies

For a couple of weeks I have been trying to solve a problem with the multiprocessing module in Python (2.7.x).
Idea:
Let's have a message queue (RabbitMQ in our case). Create a listener on that queue, and for each message spawn a task which will process that message.
Problem:
Everything works fine, but after a couple hundred tasks some sub-processes become zombies, which is the main problem.
We also have some limitations (such as a maximum number of tasks per machine), which in the end means the machine stops processing any tasks.
Current implementation:
I created a minimal example that should explain our approach:
# -*- coding: utf-8 -*-
from multiprocessing import Process
import signal
from threading import Lock

class Task(Process):
    def __init__(self, data):
        super(Task, self).__init__()
        self.data = data

    def run(self):
        # restore default SIGCHLD handling in the subprocess
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        self.do_job()  # long job there

    def do_job(self):
        # very long job
        pass

class MQListener(object):
    def __init__(self):
        self.tasks = []
        self.tasks_lock = Lock()
        self.register_signal_handler()
        mq = RabbitMQ()
        mq.listen("task_queue", self.on_message)

    def register_signal_handler(self):
        signal.signal(signal.SIGCHLD, self.on_signal_received)

    def on_signal_received(self, *_):
        self._check_existing_processes()

    def on_message(self, message):
        # ack message and create task
        task = Task(message)
        with self.tasks_lock:
            self.tasks.append(task)
            task.start()

    def _check_existing_processes(self):
        """
        Go over all created tasks; if one is no longer alive, remove it from the tasks collection.
        """
        try:
            with self.tasks_lock:
                running_tasks = []
                for w in self.tasks:
                    if not w.is_alive():
                        w.join()
                    else:
                        running_tasks.append(w)
                self.tasks = running_tasks
        except Exception:
            # log
            pass

if __name__ == '__main__':
    m = MQListener()
I'm quite open to using a library for this; if you can recommend one, that would be great as well.
Using SIGCHLD to catch child process termination has quite a few gotchas: the signal handler runs asynchronously, and multiple SIGCHLD deliveries may get aggregated into one.
In short, it's better not to use it unless you are really aware of how it works.
Your program has another issue as well: what happens if you get 10000 messages at once? You'll spawn 10000 processes at the same time and kill your machine.
You could use a process Pool and let it handle all of these issues for you.
from multiprocessing import Pool

class MQListener(object):
    def __init__(self):
        self.pool = Pool()
        self.rabbitclient = RabbitMQ()

    def new_message(self, message):
        self.pool.apply_async(do_job, args=(message, ))

    def run(self):
        self.rabbitclient.listen("task_queue", self.new_message)

app = MQListener()
app.run()
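A self-contained variant of the same idea, with RabbitMQ replaced by a plain list of messages purely for illustration (the worker count and job body are made up); the Pool reaps its own children, so no SIGCHLD handling is needed:
from multiprocessing import Pool

def do_job(message):
    # stand-in for the real, long-running work
    return "processed %s" % message

def on_result(result):
    # called in the main process when a task completes
    print(result)

if __name__ == '__main__':
    pool = Pool(processes=4)                        # at most 4 concurrent workers
    messages = ["msg-%d" % i for i in range(10)]    # stand-in for messages from RabbitMQ
    for message in messages:
        pool.apply_async(do_job, args=(message,), callback=on_result)
    pool.close()   # no more tasks will be submitted
    pool.join()    # wait for all tasks to finish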
