background function in Python - python

I've got a Python script that sometimes displays images to the user. The images can, at times, be quite large, and they are reused often. Displaying them is not critical, but displaying the message associated with them is. I've got a function that downloads the image needed and saves it locally. Right now it's run inline with the code that displays a message to the user, but that can sometimes take over 10 seconds for non-local images. Is there a way I could call this function when it's needed, but run it in the background while the code continues to execute? I would just use a default image until the correct one becomes available.

Do something like this:
def function_that_downloads(my_args):
# do some long download here
then inline, do something like this:
import threading
def my_inline_function(some_args):
# do some stuff
download_thread = threading.Thread(target=function_that_downloads, name="Downloader", args=some_args)
download_thread.start()
# continue doing stuff
You may want to check if the thread has finished before going on to other things by calling download_thread.isAlive()

Typically the way to do this would be to use a thread pool and queue downloads which would issue a signal, a.k.a an event, when that task has finished processing. You can do this within the scope of the threading module Python provides.
To perform said actions, I would use event objects and the Queue module.
However, a quick and dirty demonstration of what you can do using a simple threading.Thread implementation can be seen below:
import os
import threading
import time
import urllib2
class ImageDownloader(threading.Thread):
def __init__(self, function_that_downloads):
threading.Thread.__init__(self)
self.runnable = function_that_downloads
self.daemon = True
def run(self):
self.runnable()
def downloads():
with open('somefile.html', 'w+') as f:
try:
f.write(urllib2.urlopen('http://google.com').read())
except urllib2.HTTPError:
f.write('sorry no dice')
print 'hi there user'
print 'how are you today?'
thread = ImageDownloader(downloads)
thread.start()
while not os.path.exists('somefile.html'):
print 'i am executing but the thread has started to download'
time.sleep(1)
print 'look ma, thread is not alive: ', thread.is_alive()
It would probably make sense to not poll like I'm doing above. In which case, I would change the code to this:
import os
import threading
import time
import urllib2
class ImageDownloader(threading.Thread):
def __init__(self, function_that_downloads):
threading.Thread.__init__(self)
self.runnable = function_that_downloads
def run(self):
self.runnable()
def downloads():
with open('somefile.html', 'w+') as f:
try:
f.write(urllib2.urlopen('http://google.com').read())
except urllib2.HTTPError:
f.write('sorry no dice')
print 'hi there user'
print 'how are you today?'
thread = ImageDownloader(downloads)
thread.start()
# show message
thread.join()
# display image
Notice that there's no daemon flag set here.

I prefer to use gevent for this sort of thing:
import gevent
from gevent import monkey; monkey.patch_all()
greenlet = gevent.spawn( function_to_download_image )
display_message()
# ... perhaps interaction with the user here
# this will wait for the operation to complete (optional)
greenlet.join()
# alternatively if the image display is no longer important, this will abort it:
#greenlet.kill()
Everything runs in one thread, but whenever a kernel operation blocks, gevent switches contexts when there are other "greenlets" running. Worries about locking, etc are much reduced, as there is only one thing running at a time, yet the image will continue to download whenever a blocking operation executes in the "main" context.
Depending on how much, and what kind of thing you want to do in the background, this can be either better or worse than threading-based solutions; certainly, it is much more scaleable (ie you can do many more things in the background), but that might not be of concern in the current situation.

import threading
import os
def killme():
if keyboard.read_key() == "q":
print("Bye ..........")
os._exit(0)
threading.Thread(target=killme, name="killer").start()
If you want to add more keys, add defs and threading.Thread(target=killme, name="killer").start() lines multiple times. It looks bad but works much better than complex codes.

Related

can I run a piece of code continuously and the main code at the same time in python? [duplicate]

I've got a Python script that sometimes displays images to the user. The images can, at times, be quite large, and they are reused often. Displaying them is not critical, but displaying the message associated with them is. I've got a function that downloads the image needed and saves it locally. Right now it's run inline with the code that displays a message to the user, but that can sometimes take over 10 seconds for non-local images. Is there a way I could call this function when it's needed, but run it in the background while the code continues to execute? I would just use a default image until the correct one becomes available.
Do something like this:
def function_that_downloads(my_args):
# do some long download here
then inline, do something like this:
import threading
def my_inline_function(some_args):
# do some stuff
download_thread = threading.Thread(target=function_that_downloads, name="Downloader", args=some_args)
download_thread.start()
# continue doing stuff
You may want to check if the thread has finished before going on to other things by calling download_thread.isAlive()
Typically the way to do this would be to use a thread pool and queue downloads which would issue a signal, a.k.a an event, when that task has finished processing. You can do this within the scope of the threading module Python provides.
To perform said actions, I would use event objects and the Queue module.
However, a quick and dirty demonstration of what you can do using a simple threading.Thread implementation can be seen below:
import os
import threading
import time
import urllib2
class ImageDownloader(threading.Thread):
def __init__(self, function_that_downloads):
threading.Thread.__init__(self)
self.runnable = function_that_downloads
self.daemon = True
def run(self):
self.runnable()
def downloads():
with open('somefile.html', 'w+') as f:
try:
f.write(urllib2.urlopen('http://google.com').read())
except urllib2.HTTPError:
f.write('sorry no dice')
print 'hi there user'
print 'how are you today?'
thread = ImageDownloader(downloads)
thread.start()
while not os.path.exists('somefile.html'):
print 'i am executing but the thread has started to download'
time.sleep(1)
print 'look ma, thread is not alive: ', thread.is_alive()
It would probably make sense to not poll like I'm doing above. In which case, I would change the code to this:
import os
import threading
import time
import urllib2
class ImageDownloader(threading.Thread):
def __init__(self, function_that_downloads):
threading.Thread.__init__(self)
self.runnable = function_that_downloads
def run(self):
self.runnable()
def downloads():
with open('somefile.html', 'w+') as f:
try:
f.write(urllib2.urlopen('http://google.com').read())
except urllib2.HTTPError:
f.write('sorry no dice')
print 'hi there user'
print 'how are you today?'
thread = ImageDownloader(downloads)
thread.start()
# show message
thread.join()
# display image
Notice that there's no daemon flag set here.
I prefer to use gevent for this sort of thing:
import gevent
from gevent import monkey; monkey.patch_all()
greenlet = gevent.spawn( function_to_download_image )
display_message()
# ... perhaps interaction with the user here
# this will wait for the operation to complete (optional)
greenlet.join()
# alternatively if the image display is no longer important, this will abort it:
#greenlet.kill()
Everything runs in one thread, but whenever a kernel operation blocks, gevent switches contexts when there are other "greenlets" running. Worries about locking, etc are much reduced, as there is only one thing running at a time, yet the image will continue to download whenever a blocking operation executes in the "main" context.
Depending on how much, and what kind of thing you want to do in the background, this can be either better or worse than threading-based solutions; certainly, it is much more scaleable (ie you can do many more things in the background), but that might not be of concern in the current situation.
import threading
import os
def killme():
if keyboard.read_key() == "q":
print("Bye ..........")
os._exit(0)
threading.Thread(target=killme, name="killer").start()
If you want to add more keys, add defs and threading.Thread(target=killme, name="killer").start() lines multiple times. It looks bad but works much better than complex codes.

python running coverage on never ending process

I have a multi processed web server with processes that never end, I would like to check my code coverage on the whole project in a live environment (not only from tests).
The problem is, that since the processes never end, I don't have a good place to set the cov.start() cov.stop() cov.save() hooks.
Therefore, I thought about spawning a thread that in an infinite loop will save and combine the coverage data and then sleep some time, however this approach doesn't work, the coverage report seems to be empty, except from the sleep line.
I would be happy to receive any ideas about how to get the coverage of my code,
or any advice about why my idea doesn't work. Here is a snippet of my code:
import coverage
cov = coverage.Coverage()
import time
import threading
import os
class CoverageThread(threading.Thread):
_kill_now = False
_sleep_time = 2
#classmethod
def exit_gracefully(cls):
cls._kill_now = True
def sleep_some_time(self):
time.sleep(CoverageThread._sleep_time)
def run(self):
while True:
cov.start()
self.sleep_some_time()
cov.stop()
if os.path.exists('.coverage'):
cov.combine()
cov.save()
if self._kill_now:
break
cov.stop()
if os.path.exists('.coverage'):
cov.combine()
cov.save()
cov.html_report(directory="coverage_report_data.html")
print "End of the program. I was killed gracefully :)"
Apparently, it is not possible to control coverage very well with multiple Threads.
Once different thread are started, stopping the Coverage object will stop all coverage and start will only restart it in the "starting" Thread.
So your code basically stops the coverage after 2 seconds for all Thread other than the CoverageThread.
I played a bit with the API and it is possible to access the measurments without stopping the Coverage object.
So you could launch a thread that save the coverage data periodically, using the API.
A first implementation would be something like in this
import threading
from time import sleep
from coverage import Coverage
from coverage.data import CoverageData, CoverageDataFiles
from coverage.files import abs_file
cov = Coverage(config_file=True)
cov.start()
def get_data_dict(d):
"""Return a dict like d, but with keys modified by `abs_file` and
remove the copied elements from d.
"""
res = {}
keys = list(d.keys())
for k in keys:
a = {}
lines = list(d[k].keys())
for l in lines:
v = d[k].pop(l)
a[l] = v
res[abs_file(k)] = a
return res
class CoverageLoggerThread(threading.Thread):
_kill_now = False
_delay = 2
def __init__(self, main=True):
self.main = main
self._data = CoverageData()
self._fname = cov.config.data_file
self._suffix = None
self._data_files = CoverageDataFiles(basename=self._fname,
warn=cov._warn)
self._pid = os.getpid()
super(CoverageLoggerThread, self).__init__()
def shutdown(self):
self._kill_now = True
def combine(self):
aliases = None
if cov.config.paths:
from coverage.aliases import PathAliases
aliases = PathAliases()
for paths in self.config.paths.values():
result = paths[0]
for pattern in paths[1:]:
aliases.add(pattern, result)
self._data_files.combine_parallel_data(self._data, aliases=aliases)
def export(self, new=True):
cov_report = cov
if new:
cov_report = Coverage(config_file=True)
cov_report.load()
self.combine()
self._data_files.write(self._data)
cov_report.data.update(self._data)
cov_report.html_report(directory="coverage_report_data.html")
cov_report.report(show_missing=True)
def _collect_and_export(self):
new_data = get_data_dict(cov.collector.data)
if cov.collector.branch:
self._data.add_arcs(new_data)
else:
self._data.add_lines(new_data)
self._data.add_file_tracers(get_data_dict(cov.collector.file_tracers))
self._data_files.write(self._data, self._suffix)
if self.main:
self.export()
def run(self):
while True:
sleep(CoverageLoggerThread._delay)
if self._kill_now:
break
self._collect_and_export()
cov.stop()
if not self.main:
self._collect_and_export()
return
self.export(new=False)
print("End of the program. I was killed gracefully :)")
A more stable version can be found in this GIST.
This code basically grab the info collected by the collector without stopping it.
The get_data_dict function take the dictionary in the Coverage.collector and pop the available data. This should be safe enough so you don't lose any measurement.
The report files get updated every _delay seconds.
But if you have multiple process running, you need to add extra efforts to make sure all the process run the CoverageLoggerThread. This is the patch_multiprocessing function, monkey patched from the coverage monkey patch...
The code is in the GIST. It basically replaces the original Process with a custom process, which start the CoverageLoggerThread just before running the run method and join the thread at the end of the process.
The script main.py permits to launch different tests with threads and processes.
There is 2/3 drawbacks to this code that you need to be carefull of:
It is a bad idea to use the combine function concurrently as it performs comcurrent read/write/delete access to the .coverage.* files. This means that the function export is not super safe. It should be alright as the data is replicated multiple time but I would do some testing before using it in production.
Once the data have been exported, it stays in memory. So if the code base is huge, it could eat some ressources. It is possible to dump all the data and reload it but I assumed that if you want to log every 2 seconds, you do not want to reload all the data every time. If you go with a delay in minutes, I would create a new _data every time, using CoverageData.read_file to reload previous state of the coverage for this process.
The custom process will wait for _delay before finishing as we join the CoverageThreadLogger at the end of the process so if you have a lot of quick processes, you want to increase the granularity of the sleep to be able to detect the end of the Process more quickly. It just need a custom sleep loop that break on _kill_now.
Let me know if this help you in some way or if it is possible to improve this gist.
EDIT:
It seems you do not need to monkey patch the multiprocessing module to start automatically a logger. Using the .pth in your python install you can use a environment variable to start automatically your logger on new processes:
# Content of coverage.pth in your site-package folder
import os
if "COVERAGE_LOGGER_START" in os.environ:
import atexit
from coverage_logger import CoverageLoggerThread
thread_cov = CoverageLoggerThread(main=False)
thread_cov.start()
def close_cov()
thread_cov.shutdown()
thread_cov.join()
atexit.register(close_cov)
You can then start your coverage logger with COVERAGE_LOGGER_START=1 python main.y
Since you are willing to run your code differently for the test, why not add a way to end the process for the test? That seems like it will be simpler than trying to hack coverage.
You can use pyrasite directly, with the following two programs.
# start.py
import sys
import coverage
sys.cov = cov = coverage.coverage()
cov.start()
And this one
# stop.py
import sys
sys.cov.stop()
sys.cov.save()
sys.cov.html_report()
Another way to go would be to trace the program using lptrace even if it only prints calls it can be useful.

continuosly running function in background in python

I want to run a function continuoulsy in parallel to my main process.How do i do it in python?multiprocessing?threading or thread module?
I am new to python.Any help much appreciated.
If the aim is to capture stderr and do some action you can simply replace sys.stderr by a custom object:
>>> import sys
>>> class MyLogger(object):
... def __init__(self, callback):
... self._callback = callback
... def write(self, text):
... if 'log' in text:
... self._callback(text)
... sys.__stderr__.write(text) # continue writing to normal stderr
...
>>> def the_callback(s):
... print('Stderr: %r' % s)
...
>>> sys.stderr = MyLogger(the_callback)
>>> sys.stderr.write('Some log message\n')
Stderr: 'Some log message'
Some log message
>>>
>>> sys.stderr.write('Another message\n')
Another message
If you want to handle tracebacks and exceptions you can use sys.excepthook.
If you want to capture logs created by the logging module you can implement your own Handler class similar to the above Logger but reimplementing the emit method.
A more interesting, but less practical solution would be to use some kind of scheduler and generators to simulate parallel execution without actually creating threads(searching on the internet will yield some nice results about this)
It definitely depends on your aim, but I'd suggest looking at the threading module. There are many good StackOverflow questions on the use of threading and multithreading (e.g., Multiprocessing vs Threading Python).
Here's a brief skeleton from one of my projects:
import threading # Threading module itself
import Queue # A handy way to pass tasks to your thread
job_queue = Queue.Queue()
job_queue.append('one job to do')
# This is the function that we want to keep running while our program does its thing
def function_to_run_in_background():
# Do something...here is one form of flow control
while True:
job_to_do = job_queue.get() # Get the task from the Queue
print job_to_do # Print what it was we fetched
job_queue.task_done() # Signal that we've finished with that queue item
# Launch the thread...
t = threadingThread(target=function_to_run_in_background, args=(args_to_pass,))
t.daemon = True # YOU MAY NOT WANT THIS: Only use this line if you want the program to exit without waiting for the thread to finish
t.start() # Starts the thread
t.setName('threadName') # Makes it easier to interact with the thread later
# Do other stuff
sleep(5)
print "I am still here..."
job_queue.append('Here is another job for the thread...')
# Wait for everything in job_queue to finish. Since the thread is a daemon, the program will now exit, killing the thread.
job_queue.join()
if you just want to run a function in background in the same process, do:
import thread
def function(a):
pass
thread.start_new(function, (1,)) # a is 1 then
I found that client-server architecture was solution for me. Running server, and spawning many clients talking to server and between clients directly, something like messenger.
Talking/comunication can be achieved through network or text file located in memory, (to speed things up and save hard drive).
Bakuriu: give u a good tip about logging module.

right way to run some code with timeout in Python

I looked online and found some SO discussing and ActiveState recipes for running some code with a timeout. It looks there are some common approaches:
Use thread that run the code, and join it with timeout. If timeout elapsed - kill the thread. This is not directly supported in Python (used private _Thread__stop function) so it is bad practice
Use signal.SIGALRM - but this approach not working on Windows!
Use subprocess with timeout - but this is too heavy - what if I want to start interruptible task often, I don't want fire process for each!
So, what is the right way? I'm not asking about workarounds (eg use Twisted and async IO), but actual way to solve actual problem - I have some function and I want to run it only with some timeout. If timeout elapsed, I want control back. And I want it to work on Linux and Windows.
A completely general solution to this really, honestly does not exist. You have to use the right solution for a given domain.
If you want timeouts for code you fully control, you have to write it to cooperate. Such code has to be able to break up into little chunks in some way, as in an event-driven system. You can also do this by threading if you can ensure nothing will hold a lock too long, but handling locks right is actually pretty hard.
If you want timeouts because you're afraid code is out of control (for example, if you're afraid the user will ask your calculator to compute 9**(9**9)), you need to run it in another process. This is the only easy way to sufficiently isolate it. Running it in your event system or even a different thread will not be enough. It is also possible to break things up into little chunks similar to the other solution, but requires very careful handling and usually isn't worth it; in any event, that doesn't allow you to do the same exact thing as just running the Python code.
What you might be looking for is the multiprocessing module. If subprocess is too heavy, then this may not suit your needs either.
import time
import multiprocessing
def do_this_other_thing_that_may_take_too_long(duration):
time.sleep(duration)
return 'done after sleeping {0} seconds.'.format(duration)
pool = multiprocessing.Pool(1)
print 'starting....'
res = pool.apply_async(do_this_other_thing_that_may_take_too_long, [8])
for timeout in range(1, 10):
try:
print '{0}: {1}'.format(duration, res.get(timeout))
except multiprocessing.TimeoutError:
print '{0}: timed out'.format(duration)
print 'end'
If it's network related you could try:
import socket
socket.setdefaulttimeout(number)
I found this with eventlet library:
http://eventlet.net/doc/modules/timeout.html
from eventlet.timeout import Timeout
timeout = Timeout(seconds, exception)
try:
... # execution here is limited by timeout
finally:
timeout.cancel()
For "normal" Python code, that doesn't linger prolongued times in C extensions or I/O waits, you can achieve your goal by setting a trace function with sys.settrace() that aborts the running code when the timeout is reached.
Whether that is sufficient or not depends on how co-operating or malicious the code you run is. If it's well-behaved, a tracing function is sufficient.
An other way is to use faulthandler:
import time
import faulthandler
faulthandler.enable()
try:
faulthandler.dump_tracebacks_later(3)
time.sleep(10)
finally:
faulthandler.cancel_dump_tracebacks_later()
N.B: The faulthandler module is part of stdlib in python3.3.
If you're running code that you expect to die after a set time, then you should write it properly so that there aren't any negative effects on shutdown, no matter if its a thread or a subprocess. A command pattern with undo would be useful here.
So, it really depends on what the thread is doing when you kill it. If its just crunching numbers who cares if you kill it. If its interacting with the filesystem and you kill it , then maybe you should really rethink your strategy.
What is supported in Python when it comes to threads? Daemon threads and joins. Why does python let the main thread exit if you've joined a daemon while its still active? Because its understood that someone using daemon threads will (hopefully) write the code in a way that it wont matter when that thread dies. Giving a timeout to a join and then letting main die, and thus taking any daemon threads with it, is perfectly acceptable in this context.
I've solved that in that way:
For me is worked great (in windows and not heavy at all) I'am hope it was useful for someone)
import threading
import time
class LongFunctionInside(object):
lock_state = threading.Lock()
working = False
def long_function(self, timeout):
self.working = True
timeout_work = threading.Thread(name="thread_name", target=self.work_time, args=(timeout,))
timeout_work.setDaemon(True)
timeout_work.start()
while True: # endless/long work
time.sleep(0.1) # in this rate the CPU is almost not used
if not self.working: # if state is working == true still working
break
self.set_state(True)
def work_time(self, sleep_time): # thread function that just sleeping specified time,
# in wake up it asking if function still working if it does set the secured variable work to false
time.sleep(sleep_time)
if self.working:
self.set_state(False)
def set_state(self, state): # secured state change
while True:
self.lock_state.acquire()
try:
self.working = state
break
finally:
self.lock_state.release()
lw = LongFunctionInside()
lw.long_function(10)
The main idea is to create a thread that will just sleep in parallel to "long work" and in wake up (after timeout) change the secured variable state, the long function checking the secured variable during its work.
I'm pretty new in Python programming, so if that solution has a fundamental errors, like resources, timing, deadlocks problems , please response)).
solving with the 'with' construct and merging solution from -
Timeout function if it takes too long to finish
this thread which work better.
import threading, time
class Exception_TIMEOUT(Exception):
pass
class linwintimeout:
def __init__(self, f, seconds=1.0, error_message='Timeout'):
self.seconds = seconds
self.thread = threading.Thread(target=f)
self.thread.daemon = True
self.error_message = error_message
def handle_timeout(self):
raise Exception_TIMEOUT(self.error_message)
def __enter__(self):
try:
self.thread.start()
self.thread.join(self.seconds)
except Exception, te:
raise te
def __exit__(self, type, value, traceback):
if self.thread.is_alive():
return self.handle_timeout()
def function():
while True:
print "keep printing ...", time.sleep(1)
try:
with linwintimeout(function, seconds=5.0, error_message='exceeded timeout of %s seconds' % 5.0):
pass
except Exception_TIMEOUT, e:
print " attention !! execeeded timeout, giving up ... %s " % e

Python: Pass or Sleep for long running processes?

I am writing an queue processing application which uses threads for waiting on and responding to queue messages to be delivered to the app. For the main part of the application, it just needs to stay active. For a code example like:
while True:
pass
or
while True:
time.sleep(1)
Which one will have the least impact on a system? What is the preferred way to do nothing, but keep a python app running?
I would imagine time.sleep() will have less overhead on the system. Using pass will cause the loop to immediately re-evaluate and peg the CPU, whereas using time.sleep will allow the execution to be temporarily suspended.
EDIT: just to prove the point, if you launch the python interpreter and run this:
>>> while True:
... pass
...
You can watch Python start eating up 90-100% CPU instantly, versus:
>>> import time
>>> while True:
... time.sleep(1)
...
Which barely even registers on the Activity Monitor (using OS X here but it should be the same for every platform).
Why sleep? You don't want to sleep, you want to wait for the threads to finish.
So
# store the threads you start in a your_threads list, then
for a_thread in your_threads:
a_thread.join()
See: thread.join
If you are looking for a short, zero-cpu way to loop forever until a KeyboardInterrupt, you can use:
from threading import Event
Event().wait()
Note: Due to a bug, this only works on Python 3.2+. In addition, it appears to not work on Windows. For this reason, while True: sleep(1) might be the better option.
For some background, Event objects are normally used for waiting for long running background tasks to complete:
def do_task():
sleep(10)
print('Task complete.')
event.set()
event = Event()
Thread(do_task).start()
event.wait()
print('Continuing...')
Which prints:
Task complete.
Continuing...
signal.pause() is another solution, see https://docs.python.org/3/library/signal.html#signal.pause
Cause the process to sleep until a signal is received; the appropriate handler will then be called. Returns nothing. Not on Windows. (See the Unix man page signal(2).)
I've always seen/heard that using sleep is the better way to do it. Using sleep will keep your Python interpreter's CPU usage from going wild.
You don't give much context to what you are really doing, but maybe Queue could be used instead of an explicit busy-wait loop? If not, I would assume sleep would be preferable, as I believe it will consume less CPU (as others have already noted).
[Edited according to additional information in comment below.]
Maybe this is obvious, but anyway, what you could do in a case where you are reading information from blocking sockets is to have one thread read from the socket and post suitably formatted messages into a Queue, and then have the rest of your "worker" threads reading from that queue; the workers will then block on reading from the queue without the need for neither pass, nor sleep.
Running a method as a background thread with sleep in Python:
import threading
import time
class ThreadingExample(object):
""" Threading example class
The run() method will be started and it will run in the background
until the application exits.
"""
def __init__(self, interval=1):
""" Constructor
:type interval: int
:param interval: Check interval, in seconds
"""
self.interval = interval
thread = threading.Thread(target=self.run, args=())
thread.daemon = True # Daemonize thread
thread.start() # Start the execution
def run(self):
""" Method that runs forever """
while True:
# Do something
print('Doing something imporant in the background')
time.sleep(self.interval)
example = ThreadingExample()
time.sleep(3)
print('Checkpoint')
time.sleep(2)
print('Bye')

Categories

Resources