Reducing cpu usage in python multiprocessing without sacrificing responsiveness

Reducing cpu usage in python multiprocessing without sacrificing responsiveness - python

I have a multiprocessing programs in python, which spawns several sub-processes and manages them (restarting them if the children identify problems, etc). Each subprocess is unique and their setup depends on a configuration file. The general structure of the master program is:
def main():
messageQueue = multiprocessing.Queue()
errorQueue = multiprocessing.Queue()
childProcesses = {}
for required_children in configuration:
childProcesses[required_children] = MultiprocessChild(errorQueue, messageQueue, *args, **kwargs)
for child_process in ChildProcesses:
ChildProcesses[child_process].start()
while True:
if local_uptime > configuration_check_timer: # This is to check if configuration file for processes has changed. E.g. check every 5 minutes
reload_configuration()
killChildProcessIfConfigurationChanged()
relaunchChildProcessIfConfigurationChanged()
# We want to relaunch error processes immediately (so while statement)
# Errors are not always crashes. Sometimes other system parameters change that require relaunch with different, ChildProcess specific configurations.
while not errorQueue.empty():
_error_, _childprocess_ = errorQueue.get()
killChildProcess(_childprocess_)
relaunchChildProcess(_childprocess)
print(_error_)
# Messages are allowed to lag if a configuration_timer is going to trigger or errorQueue gets something (so if statement)
if not messageQueue.empty():
print(messageQueue.get())
Is there a way to prevent the contents of the infinite while True loop take up 100pct CPU. If I add a sleep event at the end of the loop (e.g. sleep for 10s), then errors will take 10s to correct, ans messages will take 10s to flush.
If on the other hand, there was a way to have a time.sleep() for the duration of the configuration_check_timer, while still running code if messageQueue or errorQueue get stuff inside them, that would be nice.

Related

Python multiprocessing gradually increases memory until it runs our

I have a python program with multiple modules. They go like this:
Job class that is the entry point and manages the overall flow of the program
Task class that is the base class for the tasks to be run on given data. Many SubTask classes created specifically for different types of calculations on different columns of data are derived from the Task class. think of 10 columns in the data and each one having its own Task to do some processing. eg. 'price' column can used by a CurrencyConverterTask to return local currency values and so on.
Many other modules like a connector for getting data, utils module etc, which I don't think are relevant for this question.
The general flow of program: get data from the db continuously -> process the data -> write back the updated data to the db.
I decided to do it in multiprocessing because the tasks are relatively simple. Most of them do some basic arithmetic or logic operations and running it in one process takes a long time, especially getting data from a large db and processing in sequence is very slow.
So the multiprocessing (mp) code looks something like this (I cannot expose the entire file so i'm writing a simplified version, the parts not included are not relevant here. I've tested by commenting them out so this is an accurate representation of the actual code):
class Job():
def __init__():
block_size = 100 # process 100 rows at a time
some_query = "SELECT * IF A > B" # some query to filter data from db
def data_getter():
# continusouly get data from the db and put it into a queue in blocks
cursor = Connector.get_data(some_query)
block = []
for item in cursor:
block.append(item)
if len(block) ==block_size:
data_queue.put(data)
block = []
data_queue.put(None) # this will indicate the worker processors when to stop
def monitor():
# continuously monitor the system stats
timer = Timer()
while (True):
if timer.time_taken >= 60: # log some stats every 60 seconds
print(utils.system_stats())
timer.reset()
def task_runner():
while True:
# get data from the queue
# if there's no data, break out of loop
data = data_queue.get()
if data is None:
break
# run task one by one
for task in tasks:
task.do_something(data)
def run():
# queue to put data for processing
data_queue = mp.Queue()
# start a process for reading data from db
dg = mp.Process(target=self.data_getter).start()
# start a process for monitoring system stats
mon = mp.Process(target=self.monitor).start()
# get a list of tasks to run
tasks = [t for t in taskmodule.get_subtasks()]
workers = []
# start 4 processes to do the actual processing
for _ in range(4):
worker = mp.Process(target=task_runner)
worker.start()
workers.append(worker)
for w in workers:
w.join()
mon.terminate() # terminate the monitor process
dg.terminate() # end the data getting process
if __name__ == "__main__":
job = Job()
job.run()
The whole program is run like: python3 runjob.py
Expected behaviour: continuous stream of data goes in the data_queue and the each worker process gets the data and processes until there's no more data from the cursor at which point the workers finish and the entire program finishes.
This is working as expected but what is not expected is that the system memory usage keeps creeping up continuously until the system crashes. The data i'm getting here is not copied anywhere (at least intentionally). I expect the memory usage to be steady throughout the program. The length of the data_queue rarely exceeds 1 or 2 since the processes are fast enough to get the data when available so It's not the queue holding too much data.
My guess is that all the processes initiated here are long running ones and that has something to do with this. Although I can print the pid and if I follow the PID on top command the data_getter and monitor processes don't exceed more than 2% of memory usage. the 4 worker processes also don't use a lot of memory. And neither does the main process the whole thing runs in. there is an unaccounted for process that takes up 20%+ of the ram. And it bugs me so much I can't figure out what it is.

Avoid increased runtime when opening threads in consecutive runs

I'm doing my final thesis and my topic is the creation of a software that will run and control an on-satellite experiment.
For that reason, I had to implement the reading of multiple sensors while the experiment is running. To do that, I wrote the code so that it will create a new thread for each sensor (multiprocessing might not work because I don't yet know which system the software will run on and therefore I can't say if there will be multiple processors available) and these threads run as daemons all the while the software does its thing. It works well, but now I need to test the whole thing and this is where it gets problematic:
To properly test each and every route the software could take, I have multiple variables that need to be set and so there will be a lot of test runs (I calculated around 17.000 but could be wrong). While the first few test runs go over quickly, each run takes longer and longer. I have fiddled around with my code a little bit and it turns out that without threading, each test takes about the same time. Unfortunately, I do not know why and my knowledge of the matter is very limited. The code concerning the threading is as follows:
This sets up the creation of each thread (sensor_list will be populated with multiple sensors in non-test conditions)
sensor_list = [<a single sensor>]
for sensor in sensor_list:
thread = threading.Thread(
target=self.store_sensor_data,
args=[sensor, query_frequency],
daemon=True,
name=f"Thread_{sensor}",
)
self.threads.append(thread)
thread.start()
The function which actually deals with getting and writing the sensor data, self.store_sensor_data, looks like this:
def store_sensor_data(self, sensor, frequency):
"""Get the current reading and result from 'sensor' and store them.
sensor (Sensor) - the sensor whose data shall be stored
frequency (int) - the frequency (in 1/s) at which data shall be stored
"""
value_id = 0
while not self.HALT:
value_id += 1
sensor_reading = sensor.get_reading()
sensor_result = sensor.get_result()
try:
# if there already is a list for that sensor, append the data to it
self.experiment_report.sensor_data_raw[str(sensor)].append(
(value_id, sensor_reading)
)
except KeyError:
# if there is no list, create one containing the current sensor value
self.experiment_report.sensor_data_raw[str(sensor)] = [
(value_id, sensor_reading)
]
# repeat the same for the 'result'
try:
self.experiment_report.sensor_data[str(sensor)].append(
(value_id, sensor_result)
)
except KeyError:
self.experiment_report.sensor_data[str(sensor)] = [
(value_id, sensor_result)
]
time.sleep(1 / frequency)
after the experiment is done, I stop the threads by calling
def interrupt_sensor_data_recording(self):
"""Interrupt the storing of sensor data by ending all daemon threads.
threads (list) - a list of currently running threads
"""
if len(self.threads) > 0:
self.HALT = True
for thread in self.threads:
if thread.is_alive():
logger.debug(f"Stopping thread '{thread.getName()}'")
thread.join()
else:
thread.join()
logger.debug(f"Thread '{thread.getName()}' was already stopped")
Now I am unsure if how I stop the daemon threads is appropriate and this might be the source of my problems. But there also might be some implication that I don't know about yet and in both cases, it would be nice if someone with more knowledge than me could help me out here.
Thanks in advance!

Streamz/Dask: gather does not wait for all results of buffer

Imports:
from dask.distributed import Client
import streamz
import time
Simulated workload:
def increment(x):
time.sleep(0.5)
return x + 1
Let's suppose I'd like to process some workload on a local Dask client:
if __name__ == "__main__":
with Client() as dask_client:
ps = streamz.Stream()
ps.scatter().map(increment).gather().sink(print)
for i in range(10):
ps.emit(i)
This works as expected, but sink(print) will, of course, enforce waiting for each result, thus the stream will not execute in parallel.
However, if I use buffer() to allow results to be cached, then gather() does not seem to correctly collect all results anymore and the interpreter exits before getting results. This approach:
if __name__ == "__main__":
with Client() as dask_client:
ps = streamz.Stream()
ps.scatter().map(increment).buffer(10).gather().sink(print)
# ^
for i in range(10): # - allow parallel execution
ps.emit(i) # - before gather()
...does not print any results for me. The Python interpreter just exits shortly after starting the script and before buffer() emits it's results, thus nothing gets printed.
However, if the main process is forced to wait for some time, the results are printed in parallel fashion (so they do not wait for each other, but are printed nearly simultaneously):
if __name__ == "__main__":
with Client() as dask_client:
ps = streamz.Stream()
ps.scatter().map(increment).buffer(10).gather().sink(print)
for i in range(10):
ps.emit(i)
time.sleep(10) # <- force main process to wait while ps is working
Why is that? I thought gather() should wait for a batch of 10 results since buffer() should cache exactly 10 results in parallel before flushing them to gather(). Why does gather() not block in this case?
Is there a nice way to otherwise check if a Stream still contains elements being processed in order to prevent the main process from exiting prematurely?

"Why is that?": because the Dask distributed scheduler (which executes the stream mapper and sink functions) and your python script run in different processes. When the "with" block context ends, your Dask Client is closed and execution shuts down before the items emitted to the stream are able reach the sink function.
"Is there a nice way to otherwise check if a Stream still contains elements being processed": not that I am aware of. However: if the behaviour you want is (I'm just guessing here) the parallel processing of a bunch of items, then Streamz is not what you should be using, vanilla Dask should suffice.

how to start multiple jobs in python and communicate with the main job

I am a novice user of python multithreading/multiprocessing, so please bear with me.
I would like to solve the following problem and I need some help/suggestions in this regard.
Let me describe in brief:
I would like to start a python script which does something in the
beginning sequentially.
After the sequential part is over, I would like to start some jobs
in parallel.
Assume that there are four parallel jobs I want to start.
I would like to also start these jobs on some other machines using "lsf" on the computing cluster.My initial script is also running on a ” lsf”
machine.
The four jobs which I started on four machines will perform two logical steps A and B---one after the other.
When a job started initially, they start with logical step A and finish it.
After every job (4jobs) has finished the Step A; they should notify the first job which started these. In other words, the main job which started is waiting for the confirmation from these four jobs.
Once the main job receives confirmation from these four jobs; it should notify all the four jobs to do the logical step B.
Logical step B will automatically terminate the jobs after finishing the task.
Main job is waiting for the all jobs to finish and later on it should continue with the sequential part.
An example scenario would be:
Python script running on an “lsf” machine in the cluster starts four "tcl shells" on four “lsf” machines.
In each tcl shell, a script is sourced to do the logical step A.
Once the step A is done, somehow they should inform the python script which is waiting for the acknowledgement.
Once the acknowledgement is received from all the four, python script inform them to do the logical step B.
Logical step B is also a script which is sourced in their tcl shell; this script will also close the tcl shell at the end.
Meanwhile, python script is waiting for all the four jobs to finish.
After all four jobs are finished; it should continue with the sequential part again and finish later on.
Here are my questions:
I am confused about---should I use multithreading/multiprocessing. Which one suits better?
In fact what is the difference between these two? I read about these but I wasn't able to conclude.
What is python GIL? I also read somewhere at any one point in time only one thread will execute.
I need some explanation here. It gives me an impression that I can't use threads.
Any suggestions on how could I solve my problem systematically and in a more pythonic way.
I am looking for some verbal step by step explanation and some pointers to read on each step.
Once the concepts are clear, I would like to code it myself.
Thanks in advance.

In addition to roganjosh's answer, I would include some signaling to start the step B after A has finished:
import multiprocessing as mp
import time
import random
import sys
def func_A(process_number, queue, proceed):
print "Process {} has started been created".format(process_number)
print "Process {} has ended step A".format(process_number)
sys.stdout.flush()
queue.put((process_number, "done"))
proceed.wait() #wait for the signal to do the second part
print "Process {} has ended step B".format(process_number)
sys.stdout.flush()
def multiproc_master():
queue = mp.Queue()
proceed = mp.Event()
processes = [mp.Process(target=func_A, args=(x, queue)) for x in range(4)]
for p in processes:
p.start()
#block = True waits until there is something available
results = [queue.get(block=True) for p in processes]
proceed.set() #set continue-flag
for p in processes: #wait for all to finish (also in windows)
p.join()
return results
if __name__ == '__main__':
split_jobs = multiproc_master()
print split_jobs

1) From the options you listed in your question, you should probably use multiprocessing in this case to leverage multiple CPU cores and compute things in parallel.
2) Going further from point 1: the Global Interpreter Lock (GIL) means that only one thread can actually execute code at any one time.
A simple example for multithreading that pops up often here is having a prompt for user input for, say, an answer to a maths problem. In the background, they want a timer to keep incrementing at one second intervals to register how long the person took to respond. Without multithreading, the program would block whilst waiting for user input and the counter would not increment. In this case, you could have the counter and the input prompt run on different threads so that they appear to be running at the same time. In reality, both threads are sharing the same CPU resource and are constantly passing an object backwards and forwards (the GIL) to grant them individual access to the CPU. This is hopeless if you want to properly process things in parallel. (Note: In reality, you'd just record the time before and after the prompt and calculate the difference rather than bothering with threads.)
3) I have made a really simple example using multiprocessing. In this case, I spawn 4 processes that compute the sum of squares for a randomly chosen range. These processes do not have a shared GIL and therefore execute independently unlike multithreading. In this example, you can see that all processes start and end at slightly different times, but we can aggregate the results of the processes into a single queue object. The parent process will wait for all 4 child processes to return their computations before moving on. You could then repeat the code for func_B (not included in the code).
import multiprocessing as mp
import time
import random
import sys
def func_A(process_number, queue):
start = time.time()
print "Process {} has started at {}".format(process_number, start)
sys.stdout.flush()
my_calc = sum([x**2 for x in xrange(random.randint(1000000, 3000000))])
end = time.time()
print "Process {} has ended at {}".format(process_number, end)
sys.stdout.flush()
queue.put((process_number, my_calc))
def multiproc_master():
queue = mp.Queue()
processes = [mp.Process(target=func_A, args=(x, queue)) for x in xrange(4)]
for p in processes:
p.start()
# Unhash the below if you run on Linux (Windows and Linux treat multiprocessing
# differently as Windows lacks os.fork())
#for p in processes:
# p.join()
results = [queue.get() for p in processes]
return results
if __name__ == '__main__':
split_jobs = multiproc_master()
print split_jobs

python running coverage on never ending process

I have a multi processed web server with processes that never end, I would like to check my code coverage on the whole project in a live environment (not only from tests).
The problem is, that since the processes never end, I don't have a good place to set the cov.start() cov.stop() cov.save() hooks.
Therefore, I thought about spawning a thread that in an infinite loop will save and combine the coverage data and then sleep some time, however this approach doesn't work, the coverage report seems to be empty, except from the sleep line.
I would be happy to receive any ideas about how to get the coverage of my code,
or any advice about why my idea doesn't work. Here is a snippet of my code:
import coverage
cov = coverage.Coverage()
import time
import threading
import os
class CoverageThread(threading.Thread):
_kill_now = False
_sleep_time = 2
#classmethod
def exit_gracefully(cls):
cls._kill_now = True
def sleep_some_time(self):
time.sleep(CoverageThread._sleep_time)
def run(self):
while True:
cov.start()
self.sleep_some_time()
cov.stop()
if os.path.exists('.coverage'):
cov.combine()
cov.save()
if self._kill_now:
break
cov.stop()
if os.path.exists('.coverage'):
cov.combine()
cov.save()
cov.html_report(directory="coverage_report_data.html")
print "End of the program. I was killed gracefully :)"

Apparently, it is not possible to control coverage very well with multiple Threads.
Once different thread are started, stopping the Coverage object will stop all coverage and start will only restart it in the "starting" Thread.
So your code basically stops the coverage after 2 seconds for all Thread other than the CoverageThread.
I played a bit with the API and it is possible to access the measurments without stopping the Coverage object.
So you could launch a thread that save the coverage data periodically, using the API.
A first implementation would be something like in this
import threading
from time import sleep
from coverage import Coverage
from coverage.data import CoverageData, CoverageDataFiles
from coverage.files import abs_file
cov = Coverage(config_file=True)
cov.start()
def get_data_dict(d):
"""Return a dict like d, but with keys modified by `abs_file` and
remove the copied elements from d.
"""
res = {}
keys = list(d.keys())
for k in keys:
a = {}
lines = list(d[k].keys())
for l in lines:
v = d[k].pop(l)
a[l] = v
res[abs_file(k)] = a
return res
class CoverageLoggerThread(threading.Thread):
_kill_now = False
_delay = 2
def __init__(self, main=True):
self.main = main
self._data = CoverageData()
self._fname = cov.config.data_file
self._suffix = None
self._data_files = CoverageDataFiles(basename=self._fname,
warn=cov._warn)
self._pid = os.getpid()
super(CoverageLoggerThread, self).__init__()
def shutdown(self):
self._kill_now = True
def combine(self):
aliases = None
if cov.config.paths:
from coverage.aliases import PathAliases
aliases = PathAliases()
for paths in self.config.paths.values():
result = paths[0]
for pattern in paths[1:]:
aliases.add(pattern, result)
self._data_files.combine_parallel_data(self._data, aliases=aliases)
def export(self, new=True):
cov_report = cov
if new:
cov_report = Coverage(config_file=True)
cov_report.load()
self.combine()
self._data_files.write(self._data)
cov_report.data.update(self._data)
cov_report.html_report(directory="coverage_report_data.html")
cov_report.report(show_missing=True)
def _collect_and_export(self):
new_data = get_data_dict(cov.collector.data)
if cov.collector.branch:
self._data.add_arcs(new_data)
else:
self._data.add_lines(new_data)
self._data.add_file_tracers(get_data_dict(cov.collector.file_tracers))
self._data_files.write(self._data, self._suffix)
if self.main:
self.export()
def run(self):
while True:
sleep(CoverageLoggerThread._delay)
if self._kill_now:
break
self._collect_and_export()
cov.stop()
if not self.main:
self._collect_and_export()
return
self.export(new=False)
print("End of the program. I was killed gracefully :)")
A more stable version can be found in this GIST.
This code basically grab the info collected by the collector without stopping it.
The get_data_dict function take the dictionary in the Coverage.collector and pop the available data. This should be safe enough so you don't lose any measurement.
The report files get updated every _delay seconds.
But if you have multiple process running, you need to add extra efforts to make sure all the process run the CoverageLoggerThread. This is the patch_multiprocessing function, monkey patched from the coverage monkey patch...
The code is in the GIST. It basically replaces the original Process with a custom process, which start the CoverageLoggerThread just before running the run method and join the thread at the end of the process.
The script main.py permits to launch different tests with threads and processes.
There is 2/3 drawbacks to this code that you need to be carefull of:
It is a bad idea to use the combine function concurrently as it performs comcurrent read/write/delete access to the .coverage.* files. This means that the function export is not super safe. It should be alright as the data is replicated multiple time but I would do some testing before using it in production.
Once the data have been exported, it stays in memory. So if the code base is huge, it could eat some ressources. It is possible to dump all the data and reload it but I assumed that if you want to log every 2 seconds, you do not want to reload all the data every time. If you go with a delay in minutes, I would create a new _data every time, using CoverageData.read_file to reload previous state of the coverage for this process.
The custom process will wait for _delay before finishing as we join the CoverageThreadLogger at the end of the process so if you have a lot of quick processes, you want to increase the granularity of the sleep to be able to detect the end of the Process more quickly. It just need a custom sleep loop that break on _kill_now.
Let me know if this help you in some way or if it is possible to improve this gist.
EDIT:
It seems you do not need to monkey patch the multiprocessing module to start automatically a logger. Using the .pth in your python install you can use a environment variable to start automatically your logger on new processes:
# Content of coverage.pth in your site-package folder
import os
if "COVERAGE_LOGGER_START" in os.environ:
import atexit
from coverage_logger import CoverageLoggerThread
thread_cov = CoverageLoggerThread(main=False)
thread_cov.start()
def close_cov()
thread_cov.shutdown()
thread_cov.join()
atexit.register(close_cov)
You can then start your coverage logger with COVERAGE_LOGGER_START=1 python main.y

Since you are willing to run your code differently for the test, why not add a way to end the process for the test? That seems like it will be simpler than trying to hack coverage.

You can use pyrasite directly, with the following two programs.
# start.py
import sys
import coverage
sys.cov = cov = coverage.coverage()
cov.start()
And this one
# stop.py
import sys
sys.cov.stop()
sys.cov.save()
sys.cov.html_report()
Another way to go would be to trace the program using lptrace even if it only prints calls it can be useful.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.