My scripts have multiple components, and only some pieces need to be nice-d, i.e. run at low priority.
Is there a way to nice only one function or method in Python, or do I need to break the script into several processes?
I am using Linux, if that matters.
You could write a decorator that renices the running process on entry and exit:
import os
import functools

def low_priority(f):
    @functools.wraps(f)
    def reniced(*args, **kwargs):
        os.nice(5)              # lower priority on entry
        try:
            return f(*args, **kwargs)
        finally:
            os.nice(-5)         # restore priority on exit (may require privileges)
    return reniced
Then you can use it this way:
@low_priority
def test():
    pass  # Or whatever you want to do.
Disclaimers:
Works on my machine, not sure how universal os.nice is.
As noted below, whether it works may depend on your OS/distribution, or on being root.
Nice is on a per-process basis. Behaviour with multiple threads per process will likely not be sane, and may crash.
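If the per-process behaviour is a problem, here is a minimal sketch (assuming the low-priority work can be moved into a plain function) of running just that piece in a reniced child process, so the parent keeps its normal priority:

import multiprocessing
import os

def _reniced_call(target, *args, **kwargs):
    os.nice(5)  # affects only this child process
    target(*args, **kwargs)

def run_low_priority(target, *args, **kwargs):
    # Run target in its own process at lower priority; the parent is untouched.
    p = multiprocessing.Process(target=_reniced_call, args=(target,) + args, kwargs=kwargs)
    p.start()
    p.join()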
I was searching for quite some time but I was unable to find a simple solution.
I have a Python script that runs indefinitely and saves some files when a condition is met (enough data gathered). Is there a way to terminate the execution of the script and trigger a function that would save the gathered (but not yet saved) data?
Every time I have to do something (say, shut down the computer), I must manually stop (terminate) the script (in PyCharm) and I lose the part of the data that has not yet been saved.
Edit: Thanks to @Alexander, I was able to solve this. A simple example that might better outline the solution:
import atexit
import time

@atexit.register
def on_close():
    print('success')  # save my data

while True:
    print('a')
    time.sleep(2)
Now when clicking the Stop button, the 'on_close' function is executed and I am able to save my data...
Use the atexit module. It is part of the Python standard library.
import atexit

@atexit.register
def on_close():
    ...  # do something
atexit.register(func, *args, **kwargs)
Register func as a function to be executed at termination. Any optional arguments that are to be passed to func must be passed as arguments to register(). It is possible to register the same function and arguments more than once.
This function returns func, which makes it possible to use it as a decorator.
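For example, per the signature above, arguments can be bound at registration time instead of using the decorator form (the save_data name and path below are just placeholders):

import atexit

def save_data(path):
    # Placeholder for the real save logic.
    print('saving to', path)

# Arguments for save_data are passed to register() itself:
atexit.register(save_data, '/tmp/gathered_data.json')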
I wanted to use concurrency in Python for the first time. So I started reading a lot about Python concurrency (GIL, threads vs processes, multiprocessing vs concurrent.futures vs ...) and saw a lot of convoluted examples, even ones using the high-level concurrent.futures library.
So I decided to just start trying stuff and was surprised with the very, very simple code I ended up with:
from concurrent.futures import ThreadPoolExecutor

class WebHostChecker(object):
    def __init__(self, websites):
        self.webhosts = []
        for website in websites:
            self.webhosts.append(WebHost(website))

    def __iter__(self):
        return iter(self.webhosts)

    def check_all(self):
        # sequential:
        #for webhost in self:
        #    webhost.check()
        # threaded:
        with ThreadPoolExecutor(max_workers=10) as executor:
            executor.map(lambda webhost: webhost.check(), self.webhosts)

class WebHost(object):
    def __init__(self, hostname):
        self.hostname = hostname

    def check(self):
        print("Checking {}".format(self.hostname))
        self.check_dns()   # only modifies internal state, i.e.: sets self.dns
        self.check_http()  # only modifies internal state, i.e.: sets self.http
Using the classes looks like this:
webhostchecker = WebHostChecker(["urla.com", "urlb.com"])
webhostchecker.check_all() # -> this calls .check() on all WebHost instances in parallel
The relevant threading code is only 3 lines. I barely had to modify my existing code (which is what I hoped for when I first wrote the sequential version, but started to doubt after reading the many examples online).
And... it works! :)
It perfectly distributes the IO-waiting among multiple threads and runs in less than 1/3 of the time of the original program.
So, now, my question(s):
What am I missing here?
Could I implement this differently? (Should I?)
Why are other examples so convoluted? (Although I must say I couldn't find an exact example doing a method call on multiple objects)
Will this code get me in trouble when I expand my program with features/code I cannot predict right now?
I think I already know of one potential problem, and it would be nice if someone could confirm my reasoning: if WebHost.check() also becomes CPU-bound, I won't be able to simply swap ThreadPoolExecutor for ProcessPoolExecutor, because every worker process will get cloned copies of the WebHost instances, and I would have to write something to sync those clones back to the originals?
Any insights/comments/remarks/improvements/... that can bring me to greater understanding will be much appreciated! :)
Ok, so I'll add my own first gotcha:
If webhost.check() raises an Exception, the thread just ends and self.dns and/or self.http might NOT have been set. However, with the current code you won't see the Exception UNLESS you also access the executor.map() results! This left me wondering why some objects raised AttributeErrors after running check_all() :)
This can easily be fixed by evaluating every result (which is always None, because I'm not having .check() return anything). You can do it after all threads have run or while they are running. I chose to let exceptions be raised during the run (i.e. within the with statement), so the program stops at the first unexpected error:
def check_all(self):
    with ThreadPoolExecutor(max_workers=10) as executor:
        # this alone works, but does not raise any exceptions from the threads:
        #executor.map(lambda webhost: webhost.check(), self.webhosts)
        for i in executor.map(lambda webhost: webhost.check(), self.webhosts):
            pass
I guess I could also use list(executor.map(lambda webhost: webhost.check(), self.webhosts)) but that would unnecessarily use up memory.
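Regarding the ProcessPoolExecutor concern raised above: yes, each worker process operates on a pickled copy of the WebHost objects, so state set in the workers does not propagate back automatically. A minimal sketch of one way around that, assuming check() sets self.dns and self.http as in the comments above (the _check helper is not part of the original code and must live at module level because ProcessPoolExecutor cannot pickle a lambda):

from concurrent.futures import ProcessPoolExecutor

def _check(webhost):
    # Runs in a worker process on a copy of the WebHost.
    webhost.check()
    return webhost.dns, webhost.http

def check_all_processes(webhosts):
    # Copy the results from the worker copies back onto the original objects.
    with ProcessPoolExecutor(max_workers=10) as executor:
        for webhost, (dns, http) in zip(webhosts, executor.map(_check, webhosts)):
            webhost.dns, webhost.http = dns, http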
Let us say we have a Python function magical_attack(energy) which may or may not last more than a second. It could even be an infinite loop. How would I run it, but if it goes over a second, terminate it and tell the rest of the program? I am looking for a sleek module to do this. Example:
import timeout

try:
    timeout.run(magical_attack(5), 1)
except timeout.timeouterror:
    blow_up_in_face(wizard)
Note: It is impossible to modify the function. It comes from the outside during runtime.
The simplest way to do this is to run the background code in a thread:
import threading

t = threading.Thread(target=magical_attack, args=(5,))
t.start()
t.join(1)
if t.is_alive():
    # Still running after one second: the attack timed out.
    blow_up_in_face(wizard)
However, note that this will not cancel the magical_attack function; it could still keep spinning along in the background for as long as it wants even though you no longer care about the results.
Canceling threads safely is inherently hard to do, and different on each platform, so Python doesn't attempt to provide a way to do it. If you need that, there are three alternatives:
If you can edit the code of magical_attack to check a flag every so often, you can cancel it cooperatively by just setting that flag.
You can use a child process instead of a thread, which you can then kill safely.
You can use ctypes, pywin32, PyObjC, etc. to access platform-specific routines to kill the thread. But you have to really know what you're doing to make sure you do it safely, and don't confuse Python in doing it.
As Chris Pak pointed out, the futures module in Python 3.2+ makes this even easier. For example, you can throw off thousands of jobs without having thousands of threads; you can apply timeouts to a whole group of jobs as if they were a single job; etc. Plus, you can switch from threads to processes with a trivial one-liner change. Unfortunately, Python 2.7 does not have this module—but there is a quasi-official backport that you can install and use just as easily.
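A minimal sketch of the futures approach, using the names from the question (magical_attack here is just a stand-in so the snippet runs on its own):

import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def magical_attack(energy):
    # Stand-in for the real, possibly never-returning function.
    time.sleep(energy)

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(magical_attack, 5)
try:
    future.result(timeout=1)   # wait at most one second for a result
except TimeoutError:
    print("attack timed out")  # e.g. blow_up_in_face(wizard)
# As with the plain thread, the worker is not killed; shutdown(wait=False)
# just stops accepting new jobs while the old one finishes in the background.
executor.shutdown(wait=False)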
Abamert beat me to it with the answer I was preparing, except for this detail:
If, and only if, the outside function is executed through the Python interpreter, even though you can't change it (for example, from a compiled module), you might be able to use the technique described in this other question to kill the thread that calls that function using an exception.
Is there any way to kill a Thread in Python?
Of course, if you did have control over the function you were calling, the StoppableThread class from that answer works well for this:
import threading

class StoppableThread(threading.Thread):
    """Thread class with a stop() method. The thread itself has to check
    regularly for the stopped() condition."""

    def __init__(self):
        super(StoppableThread, self).__init__()
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()

class Magical_Attack(StoppableThread):
    def __init__(self, enval):
        self._energy = enval
        super(Magical_Attack, self).__init__()

    def run(self):
        while not self.stopped():
            print self._energy

if __name__ == "__main__":
    a = Magical_Attack(5)
    a.start()
    a.join(5.0)
    a.stop()
Similar to this question, I'd like to have Nose run a test (or all tests) n times -- but not in parallel.
I have a few hundred tests in a project; some are simple unit tests, others are integration tests with some degree of concurrency. Frequently when debugging tests I want to "hit" a test harder; a bash loop works, but makes for a lot of cluttered output -- no more nice single "." for each passing test. Having the ability to beat on the selected tests for some number of trials seems like a natural thing to ask Nose to do, but I haven't found it anywhere in the docs.
What's the simplest way to get Nose to do this (other than a bash loop)?
You can write a nose test as a generator, and nose will then run each function yielded:

def check_something(arg):
    pass  # some test ...

def test_something():
    for arg in some_sequence:
        yield (check_something, arg)
Using nose-testconfig, you could make the number of test runs a command line argument:
from testconfig import config

# ...

def test_something():
    for n in range(int(config.get("runs", 1))):
        yield (check_something, arg)
Which you'd call from the command line with e.g.
$ nosetests --tc=runs:5
... for more than one run.
Alternatively (but also using nose-testconfig), you could write a decorator:
from functools import wraps
from testconfig import config

def multi(fn):
    @wraps(fn)
    def wrapper():
        for n in range(int(config.get("runs", 1))):
            fn()
    return wrapper

@multi
def test_something():
    pass  # some test ...
And then, if you want to divide your tests into different groups, each with its own command-line argument for the number of runs:
from functools import wraps
from testconfig import config

def multi(cmd_line_arg):
    def wrap(fn):
        @wraps(fn)
        def wrapper():
            for n in range(int(config.get(cmd_line_arg, 1))):
                fn()
        return wrapper
    return wrap

@multi("foo")
def test_something():
    pass  # some test ...

@multi("bar")
def test_something_else():
    pass  # some test ...
Which you can call like this:
$ nosetests --tc=foo:3 --tc=bar:7
You'll have to write a script to do this, but you can repeat the test names on the commandline X times.
nosetests testname testname testname testname testname testname testname
etc.
The solution I ended up using is to create a small shell script, run_test.sh:

#!/bin/bash
var=0
while $1; do
    ((var++))
    echo "*** RETRY $var"
done
Usage:
./run_test.sh "nosetests TestName"
It runs the test indefinitely but stops on the first error.
One way is in the test itself:
Change this:
class MyTest(unittest.TestCase):
    def test_once(self):
        ...
To this:
class MyTest(unittest.TestCase):
    def assert_once(self):
        ...

    def test_many(self):
        for _ in range(5):
            self.assert_once()
There should never be a reason to run a test more than once. It's important that your tests are deterministic (i.e. given the same state of the codebase, they always produce the same result.) If this isn't the case, then instead of running tests more than once, you should redesign the tests and/or code so that they are.
For example, one reason why tests fail intermittently is a race condition between the test and the code-under-test (CUT). In this circumstance, a naive response is to add a big 'voodoo sleep' to the test, to 'make sure' that the CUT is finished before the test starts asserting.
This is error-prone though, because if your CUT is slow for any reason (underpowered hardware, loaded box, busy database, etc) then it will fail sporadically. A better solution in this instance is to have your test wait for an event, rather than sleeping.
The event could be anything of your choosing. Sometimes, events you can use are already being generated (e.g. Javascript DOM events, the 'pageRendered' kind of events that Selenium tests can make use of.) Other times, it might be appropriate for you to add code to your CUT which raises an event when it's done (perhaps your architecture involves other components that are interested in events like this.)
Often though, you'll need to re-write the test so that it tries to detect whether your CUT has finished executing (e.g. does the output file exist yet?), and if not, sleeps for 50ms and then tries again. It should eventually time out and fail, but only after a very long time (e.g. 100 times the expected execution time of your CUT).
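A minimal sketch of such a "poll instead of sleep" helper (the names and timings below are just placeholders):

import time

def wait_until(condition, timeout=10.0, interval=0.05):
    # Poll `condition` until it returns True or `timeout` seconds have passed.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# In a test, instead of a fixed 'voodoo sleep':
# assert wait_until(lambda: os.path.exists(output_path), timeout=5.0)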
Another approach is to design your CUT using 'onion/hexagonal/ports'n'adaptors' principles, which insist that your business logic be free of all external dependencies. This means your business logic can be tested with plain old sub-millisecond unit tests that never touch the network or filesystem. Once this is done, you need far fewer end-to-end system tests, because they now serve only as integration tests and don't need to exercise every detail and edge case of your business logic through the UI. This approach also yields big benefits in other areas, such as improved CUT design (fewer dependencies between components), tests that are much easier to write, and a much shorter run time for the whole test suite.
Using approaches like the above can entirely eliminate the problem of unreliable tests, and I'd recommend doing so, to improve not just your tests, but also your codebase, and your design abilities.
I need to detect when a program crashes or is not running using python and restart it. I need a method that doesn't necessarily rely on the python module being the parent process.
I'm considering implementing a while loop that essentially does
ps -ef | grep process name
and when the process isn't found it starts another. Perhaps this isn't the most efficient method. I'm new to python so possibly there is a python module that does this already.
Why implement it yourself? An existing utility like daemon or Debian's start-stop-daemon is more likely to get the other difficult stuff right about running long-living server processes.
Anyway, when you start the service, put its pid in /var/run/<name>.pid and then make your ps command just look for that process ID, and check that it is the right process. On Linux you can simply look at /proc/<pid>/exe to check that it points to the right executable.
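A hedged sketch of that check on Linux (the pid-file and executable paths are placeholders; for an interpreted script you would compare against the interpreter or inspect /proc/<pid>/cmdline instead):

import os

def is_running(pidfile='/var/run/myservice.pid', exe='/usr/bin/myservice'):
    # Read the pid written when the service was started ...
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (IOError, ValueError):
        return False
    # ... and confirm /proc/<pid>/exe still points at the expected binary.
    try:
        return os.readlink('/proc/%d/exe' % pid) == exe
    except OSError:
        return False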
Please don't reinvent init. Your OS has capabilities to do this that require nearly no system resources and will definitely do it better and more reliably than anything you can reproduce.
Classic Linux has /etc/inittab
Ubuntu has /etc/event.d (upstart)
OS X has launchd
Solaris has smf
The following code checks a given process at a given interval and restarts it if it has finished.
# Restarts a given process if it is finished.
# Compatible with Python 2.5, tested on Windows XP.
import threading
import time
import subprocess

class ProcessChecker(threading.Thread):
    def __init__(self, process_path, check_interval):
        threading.Thread.__init__(self)
        self.process_path = process_path
        self.check_interval = check_interval

    def run(self):
        while True:
            time.sleep(self.check_interval)
            if self.is_ok():
                self.make_sure_process_is_running()

    def is_ok(self):
        ok = True
        # do the database locks, client data corruption check here,
        # and return true/false
        return ok

    def make_sure_process_is_running(self):
        # This call is blocking, it will wait for the
        # other sub process to be finished.
        retval = subprocess.call(self.process_path)

def main():
    process_path = "notepad.exe"
    check_interval = 1  # In seconds
    pm = ProcessChecker(process_path, check_interval)
    pm.start()
    print "Checker started..."

if __name__ == "__main__":
    main()
Maybe you need http://supervisord.org
I haven't tried it myself, but there is a Python System Information module that can be used to find processes and get information about them. AFAIR there is a ProcessTable class that can be used to inspect the running processes, but it doesn't seem to be very well documented...
I'd go the command-line route (it's just easier, IMHO); as long as you only check every second or two, the resource usage should be infinitesimal compared to the available processing power on any system less than 10 years old.
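If you do go that route, here is a minimal sketch using pgrep instead of parsing ps output by hand (the process name and restart command are placeholders):

import subprocess
import time

PROCESS_NAME = 'myserver'            # placeholder
RESTART_CMD = ['/usr/bin/myserver']  # placeholder

while True:
    # pgrep exits non-zero when no process with that exact name is found.
    if subprocess.call(['pgrep', '-x', PROCESS_NAME], stdout=subprocess.DEVNULL) != 0:
        subprocess.Popen(RESTART_CMD)
    time.sleep(2)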