Let's say I have this blob of code that's made to be one long-running thread of execution, to poll for events and fire off other events (in my case, using XMLRPC calls). It needs to be refactored into clean objects so it can be unit tested, but in the meantime I want to capture some of its current behavior in some integration tests, treating it like a black box. For example:
# long-lived code
import xmlrpclib

s = xmlrpclib.ServerProxy('http://XXX:yyyy')

def do_stuff():
    while True:
        ...
        if s.xyz():
            s.do_thing(...)
# test code
import threading, time

# stub out xmlrpclib

t = None

def run_do_stuff():
    other_code.do_stuff()

def setUp():
    global t  # share the thread with tearDown and the tests
    t = threading.Thread(target=run_do_stuff)
    t.setDaemon(True)

def tearDown():
    # somehow kill t
    t.join()

def test1():
    t.start()
    time.sleep(5)
    assert some_XMLRPC_side_effects
The last big issue is that the code under test is designed to run forever, until a Ctrl-C, and I don't see any way to force it to raise an exception or otherwise kill the thread so I can start it up from scratch without changing the code I'm testing. I lose the ability to poll any flags from my thread as soon as I call the function under test.
I know this is really not how tests are designed to work, integration tests are of limited value, etc, etc, but I was hoping to show off the value of testing and good design to a friend by gently working up to it rather than totally redesigning his software in one go.
The last big issue is that the code under test is designed to run forever, until a Ctrl-C, and I don't see any way to force it to raise an exception or otherwise kill the thread
The point of Test-Driven Development is to rethink your design so that it is testable.
A loop that runs forever, while seemingly fine for production use, is untestable.
So make the loop terminate. It won't hurt production, and it will improve testability.
"Designed to run forever" is not designed for testability, so fix the design to be testable.
I think I found a solution that does what I was looking for: Instead of using a thread, use a separate process.
I can write a small python stub to do mocking and run the code in a controlled way. Then I can write the actual tests to run my stub in a subprocess for each test and kill it when each test is finished. The test process could interact with the stub over stdio or a socket.
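A rough sketch of that idea (stub_runner.py is a hypothetical name for the mocking stub):

import subprocess, time

proc = None

def setUp():
    global proc
    proc = subprocess.Popen(["python", "stub_runner.py"])
    time.sleep(1)  # give the stub time to start up

def tearDown():
    proc.terminate()  # killing a whole process is easy; killing a thread is not
    proc.wait()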
So I have this library that I use, and within one of my functions I call a function from that library, which happens to take a really long time. At the same time, I have another thread running that checks for various conditions; if a condition is met, I want to cancel the execution of the library function.
Right now I'm checking the conditions at the start of the function, but if the conditions happen to change while the library function is running, I no longer need its results and want to return from it.
Basically this is what I have now.
def my_function():
    if condition_checker.condition_met():
        return
    library.long_running_function()
Is there a way to run the condition check every second or so and return from my_function when the condition is met?
I've thought about decorators and coroutines. I'm using 2.7, but if this can only be done in 3.x I'd consider switching; it's just that I can't figure out how.
You cannot terminate a thread. Either the library supports cancellation by design, where it internally would have to check for a condition every once in a while to abort if requested, or you have to wait for it to finish.
What you can do is call the library in a subprocess rather than a thread, since processes can be terminated through signals. Python's multiprocessing module provides a threading-like API for spawning forks and handling IPC, including synchronization.
Or spawn a separate subprocess via subprocess.Popen if forking is too heavy on your resources (e.g. memory footprint through copying of the parent process).
I can't think of any other way, unfortunately.
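A minimal sketch of the multiprocessing approach, reusing the names from the question (condition_checker and library are assumed to exist):

import multiprocessing

def my_function():
    p = multiprocessing.Process(target=library.long_running_function)
    p.start()
    while p.is_alive():
        if condition_checker.condition_met():
            p.terminate()  # processes, unlike threads, can be killed
            p.join()
            return
        p.join(timeout=1)  # re-check the condition every second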
Generally, I think you want to run your long_running_function in a separate thread, and have it occasionally report its information to the main thread.
This post gives a similar example within a wxpython program.
Presuming you are doing this outside of wxpython, you should be able to replace the wx.CallAfter and wx.Publisher with threading.Thread and PubSub.
It would look something like this:
import threading
import time

def myfunction():
    # subscribe to the long_running_function
    while True:
        # get the data published by the long_running_function
        if condition_met:
            # publish a stop command
            break
        time.sleep(1)

def long_running_function():
    for loop in loops:
        # subscribe to the main thread and check for a stop command; if so, break
        # do an iteration
        # publish some data
        pass

# launch long_running_function without blocking flow
threading.Thread(group=None, target=long_running_function, args=()).start()
myfunction()
I haven't used pubsub a ton so I can't quickly whip up the code but it should get you there.
As an alternative, do you know the stop criteria before you launch the long_running_function? If so, you can just pass it as an argument and check whether it is met internally.
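For instance (a hedged sketch; stop_condition and the loop contents are illustrative, not from the original library):

def long_running_function(stop_condition, loops):
    for loop in loops:
        if stop_condition():  # bail out as soon as the criterion is met
            return
        # do an iteration of real work here

long_running_function(condition_checker.condition_met, range(100))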
This may not specifically be an IronPython question, so a Python dev out there might be able to assist.
I want to run python scripts in my .Net desktop app using IronPython, and would like to give users the ability to forcibly terminate a script. Here's my test script (I'm new to Python so it might not be totally correct):-
import atexit
import time
import sys

@atexit.register
def cleanup():
    print 'doing cleanup/termination code'
    sys.exit()

for i in range(100):
    print 'doing something'
    time.sleep(1)
(Note that I might want to specify an "atexit" function in some scripts, allowing them to perform any cleanup during normal or forced termination).
In my .Net code I'm using the following code to terminate the script:
_engine.Runtime.Shutdown();
This results in the script's atexit function being called, but the script doesn't actually terminate - the for loop keeps going. A couple of other SO articles (here and here) say that sys.exit() should do the trick, so what am I missing?
It seems that it's not possible to terminate a running script - at least not in a "friendly" way. One approach I've seen is to run the IronPython engine in another thread, and abort the thread if you need to stop the script.
I wasn't keen on this brute-force approach, which would risk leaving any resources used by the script (e.g. files) open.
In the end, I created a C# helper class like this:-
public class HostFunctions
{
    public bool AbortScript { get; set; }

    // Other properties and functions that I want to expose to the script...
}
When the hosting application wants to terminate the script it sets AbortScript to true. This object is passed to the running script via the scope:-
_hostFunctions = new HostFunctions();
_scriptScope = _engine.CreateScope();
_scriptScope.SetVariable("HostFunctions", _hostFunctions);
In my scripts I just need to strategically place checks to see if an abort has been requested, and deal with it appropriately, e.g.:-
for i in range(100):
    print 'doing something'
    time.sleep(1)

    if HostFunctions.AbortScript:
        cleanup()
It seems that if you are using .NET 5 or higher, aborting the thread might not work as expected.
Thread.Abort() is not supported on .NET 5 or higher and throws PlatformNotSupportedException.
You will probably find a solution using Thread.Interrupt(), but it has slightly different behavior:
If your Python script does not have any sleep calls (Thread.Sleep()), Thread.Interrupt() won't stop it;
It seems you can't Abort a thread twice, but you can Interrupt it twice. So, if your Python script uses finally blocks or context managers, you will be able to interrupt it by calling Thread.Interrupt() twice (with some delay between the calls).
I have a Python program which operates an external program and starts a timeout thread. The timeout thread counts down 10 minutes, and if the script operating the external program isn't finished in that time, it should kill the external program.
My thread seems to work fine at first glance; my main script and the thread run simultaneously with no issues. But if a pop-up window appears in the external program, it stops my scripts, so that even the countdown thread stops counting, totally failing at its job.
I assume the issue is that the script calls a blocking function in API for the external program, which is blocked by the pop up window. I understand why it blocks my main program, but don't understand why it blocks my countdown thread. So, one possible solution might be to run a separate script for the countdown, but I would like to keep it as clean as possible and it seems really messy to start a script for this.
I have searched everywhere for a clue, but I didn't find much. There was a reference to the gevent library here: background function in Python. But this seems like such a basic task that I don't want to include an external library for it.
I also found a solution here which uses a Windows multimedia timer, but I've never worked with that before and am afraid the code won't be flexible with it. The script is Windows-only, but it should work on every Windows version from XP on.
For Unix I found signal.alarm which seems to do exactly what I want, but it's not available for Windows. Any alternatives for this?
Any ideas on how to work with this in the most simplified manner?
This is the simplified thread I'm creating (run in IDLE to reproduce the issue):
import threading
import time

class timeToKill():
    def __init__(self, minutesBeforeTimeout):
        self.stop = threading.Event()
        self.countdownFrom = minutesBeforeTimeout * 60

    def startCountdown(self):
        self.countdownThread = threading.Thread(target=self.countdown, args=(self.countdownFrom,))
        self.countdownThread.start()

    def stopCountdown(self):
        self.stop.set()
        self.countdownThread.join()

    def countdown(self, seconds):
        for second in range(seconds):
            if self.stop.is_set():
                break
            else:
                print(second)
                time.sleep(1)

timeout = timeToKill(1)
timeout.startCountdown()
raw_input("Blocking call, waiting for input:\n")
One possible explanation for a function call blocking another Python thread is that CPython uses a global interpreter lock (GIL) and the blocking API call doesn't release it (note: CPython releases the GIL on blocking I/O calls, so your raw_input() example should work as is).
If you can't make the buggy API call to release GIL then you could use a process instead of a thread e.g., multiprocessing.Process instead of threading.Thread (the API is the same). Different processes are not limited by GIL.
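A hedged sketch of that swap, using a countdown like the one in the question (the blocking API call is only indicated by a comment):

import multiprocessing
import time

def countdown(seconds):
    # runs in a child process with its own interpreter and GIL, so it
    # keeps counting even while the parent is stuck in a blocking call
    for second in range(seconds):
        print(second)
        time.sleep(1)

if __name__ == '__main__':
    p = multiprocessing.Process(target=countdown, args=(60,))
    p.start()
    # ... the blocking API call happens here, in the main process ...
    p.terminate()  # stop the countdown once the main work is done
    p.join()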
For quick and dirty threading, I usually resort to subprocess commands. It is quite robust and OS-independent. It does not give as fine-grained control as the thread and queue modules, but for external calls to programs it generally does nicely. Note that shell=True must be used with caution.
import subprocess
import time

# this can be any command
p1 = subprocess.Popen(["python", "SUBSCRIPTS/TEST.py", "0"], shell=True)
# p1 now runs in the background, asynchronously; if you want to kill it
# after some time, you need to poll it and kill it, as below

# here do some other tasks/computations
time.sleep(10)

currentStatus = p1.poll()
if currentStatus is None:  # then it is still running
    try:
        p1.kill()  # maybe try os.kill(p1.pid, 2) if p1.kill does not work
    except:
        # do something else if process is done running - maybe do nothing?
        pass
I'm writing a web UI for data analysis tasks.
Here's the way it's supposed to work:
After a user specifies parameters like dataset and learning rate, I create a new task record, then an executor for this task is started asynchronously (the executor may take a long time to run), and the user is redirected to some other page.
After searching for an async library for python, I started with eventlet, here's what I wrote in a flask view function:
db.save(task)
eventlet.spawn(executor, task)
return redirect("/show_tasks")
With the code above, the executor didn't execute at all.
What may be the problem of my code? Or maybe I should try something else?
While you have been given direct solutions, I will try to answer your first question and explain why your code does not work as expected.
Disclosure: I currently maintain Eventlet. This answer contains a number of simplifications to fit into a reasonable size.
Brief introduction to cooperative multithreading
There are two ways to do multithreading, and Eventlet takes the cooperative approach. At the core is the Greenlet library, which basically allows you to create independent "execution contexts". One can think of such a context as the frozen state of all local variables plus a pointer to the next instruction. Basically, multithreading = contexts + scheduler. Greenlet provides the contexts, so we need a scheduler: something that decides which context should occupy the CPU right now. It turns out that to make those decisions we also have to run some code, which means a separate context (green thread). This special green thread is called a Hub in the Eventlet code base. The scheduler maintains an ordered set of contexts that need to be run ASAP (the run queue) and a set of contexts that are waiting for something (e.g. network IO or a time-limited sleep) to finish.
But since we are doing cooperative multitasking, one context will execute indefinitely unless it explicitly yields to another. That would be a very sad style of programming, and also by definition incompatible with existing libraries (pointing at they-know-who); so what Eventlet does is provide green versions of common modules, changed in such a way that they switch to the Hub instead of blocking everything. Then some time may be spent in other green threads or in the Hub's wait-for-external-events implementation, in which case the Hub switches back to the green thread that originated the event, and that thread continues execution.
End. Now back to your problem.
What eventlet.spawn actually does: it creates a new execution context. Basically, it allocates an object in memory. It also tells the scheduler to put this context into the run queue, so at the first possible moment the Hub will switch to the newly spawned function. Your code does not provide such a moment. There is no place where you explicitly give up execution to other green threads; in Eventlet this is usually done via eventlet.sleep(). And since you don't use green versions of common modules, there is no chance to yield implicitly while other code waits. The most appropriate (if not the only) place would be your WSGI server's accept loop: it should give other green threads a chance to run while waiting for the next request. The eventlet.monkey_patch() mentioned in the first answer is just a convenient way to replace all (or a subset of) common modules with their corresponding green versions.
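To illustrate, one minimal (if crude) way to provide such a moment in the original view function is an explicit yield right after spawning:

db.save(task)
gt = eventlet.spawn(executor, task)
eventlet.sleep(0)  # explicitly yield to the Hub so the spawned green thread can start
return redirect("/show_tasks")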
Unwanted opinion on overall design
In a separate section, to skip easily. If you are building error-resistant software, you usually want to limit the execution time of spawned threads (including but not limited to "green" ones) and processes, and at least report (log) or react to their unhandled errors. In the provided code, your spawned green thread may technically run in the next moment or five minutes later (again, because nobody yields the CPU), or fail with an unhandled exception. Luckily, Eventlet provides solutions to both problems: Timeout / with_timeout() allows you to limit waiting time (remember, if it does not yield, you can't possibly limit it), and GreenThread.link() lets you catch all exceptions. It may be tempting (it was for me) to reraise exceptions in the "main" code, and link() allows that easily, but consider that the exceptions would be raised from sleep and IO calls, the places where you yield to the Hub. This may produce some really counterintuitive tracebacks.
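A hedged sketch combining both mechanisms (the 10-second limit is arbitrary, and executor/task are the names from the question):

import eventlet

def on_done(gt, *args, **kwargs):
    try:
        gt.wait()  # re-raises any unhandled exception from the green thread
    except Exception as e:
        print("executor failed: %s" % e)  # report/log instead of losing it

gt = eventlet.spawn(executor, task)
gt.link(on_done)  # called when the green thread finishes, one way or another
try:
    with eventlet.Timeout(10):  # limit how long we are willing to wait
        gt.wait()
except eventlet.Timeout:
    gt.kill()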
You'll need to patch some system libraries in order to make eventlet work. Here is a minimal working example (also as gist):
#!/usr/bin/env python
from flask import Flask
import time

import eventlet
eventlet.monkey_patch()

app = Flask(__name__)
app.debug = True

def background():
    """ do something in the background """
    print('[background] working in the background...')
    time.sleep(2)
    print('[background] done.')
    return 42

def callback(gt, *args, **kwargs):
    """ this function is called when results are available """
    result = gt.wait()
    print("[cb] %s" % result)

@app.route('/')
def index():
    greenth = eventlet.spawn(background)
    greenth.link(callback)
    return "Hello World"

if __name__ == '__main__':
    app.run()
More on that:
http://eventlet.net/doc/patching.html#monkey-patch
One of the challenges of writing a library like Eventlet is that the built-in networking libraries don’t natively support the sort of cooperative yielding that we need.
Eventlet may indeed be suitable for your purposes, but it doesn't just fit in with any old application; Eventlet requires that it be in control of all your application's I/O.
You may be able to get away with either
Starting Eventlet's main loop in another thread, or even
Not using Eventlet and just spawning your task in another thread.
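A hedged sketch of that second option, reusing the names from your view function (db and executor are assumed from the question):

import threading
from flask import redirect

def create_task_view(task):
    db.save(task)
    t = threading.Thread(target=executor, args=(task,))
    t.daemon = True  # don't let this worker block interpreter shutdown
    t.start()
    return redirect("/show_tasks")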
Celery may be another option.
I create a thread to run a script, and it may take a long time. I want to pause and resume it from another thread. If I use a flag and check it, the thread cannot pause immediately. I have searched a lot, but it seems that self.__flag and self.pause cannot achieve the goal.
class MT(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.__running = threading.Event()
        self.__running.set()
        self.__flag = threading.Event()
        self.__flag.set()

    def run(self):
        '''
        run the script
        '''
        while self.__running.isSet():
            self.__flag.wait()
            moudleTest()

    def pause(self):
        '''
        pause the thread
        '''
        self.__flag.clear()

    def resume(self):
        '''
        resume the thread
        '''
        self.__flag.set()
What you want is not possible without diving below the Python layer, using C extensions with OS-specific techniques (e.g. SuspendThread on Windows). You cannot immediately and completely suspend another thread via Python-level APIs, because doing so is considered absurdly dangerous.
Even when such a thing is possible, it's a terrible idea, prone to deadlocks and other terrible things. Just for example, pre-CPython 3.3, there was a single global import lock for the whole interpreter. If the other thread was in the middle of importing a module when it was suspended, no other thread could import at all until it was resumed and finished the import (causing a deadlock if that thread was the one responsible for resuming the suspended thread); in CPython 3.3+, it's better, but if another thread tried to import that specific module, it would deadlock just as badly.
In summary: use Locks, Events and/or Conditions appropriately, and if you need faster pauses, make the wait checks more frequent (interspersed with smaller chunks of thread "work"). If your code can't tolerate even a tiny delay before the pause, you have a design problem that you need to fix (e.g. you're using an Event to simulate locking or the like, possibly for performance, which is hilariously misguided, since Events are built on Conditions, which are in turn built on Locks, and all but Lock are implemented at the Python layer, not the C layer, and are therefore quite slow).
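As an illustration, a minimal sketch of cooperative pausing with frequent checks (the small work units are hypothetical):

import threading
import time

can_run = threading.Event()
can_run.set()  # set means "allowed to run"

def worker():
    for i in range(100):
        can_run.wait()    # blocks here whenever the controller pauses us
        time.sleep(0.01)  # one small unit of work; keeping units small
                          # means a pause takes effect almost immediately

t = threading.Thread(target=worker)
t.start()
can_run.clear()  # pause: the worker stops at its next check
time.sleep(0.5)
can_run.set()    # resume
t.join()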