Twisted: Making code non-blocking

I'm a bit puzzled about how to write asynchronous code in Python/Twisted. Suppose (for argument's sake) I am exposing a function to the world that will take a number and return True/False if it is prime/non-prime, so it looks vaguely like this:
def IsPrime(numberin):
    for n in range(2, numberin):
        if numberin % n == 0:
            return False
    return True
(just to illustrate).
Now let's say there is a webserver which needs to call IsPrime based on a submitted value. This will take a long time for large numberin.
If in the meantime another user asks for the primality of a small number, is there a way to run the two function calls asynchronously using the reactor/deferreds architecture so that the result of the short calc gets returned before the result of the long calc?
I understand how to do this if the IsPrime functionality came from some other webserver to which my webserver would do a deferred getPage, but what if it's just a local function?
i.e., can Twisted somehow time-share between the two calls to IsPrime, or would that require an explicit invocation of a new thread?
Or, would the IsPrime loop need to be chunked into a series of smaller loops so that control can be passed back to the reactor rapidly?
Or something else?

I think your current understanding is basically correct. Twisted is just a Python library and the Python code you write to use it executes normally as you would expect Python code to: if you have only a single thread (and a single process), then only one thing happens at a time. Almost no APIs provided by Twisted create new threads or processes, so in the normal course of things your code runs sequentially; IsPrime cannot execute a second time until after it has finished executing the first time.
Still considering just a single thread (and a single process), all of the "concurrency" or "parallelism" of Twisted comes from the fact that instead of doing blocking network I/O (and certain other blocking operations), Twisted provides tools for performing the operation in a non-blocking way. This lets your program continue on to perform other work when it might otherwise have been stuck doing nothing waiting for a blocking I/O operation (such as reading from or writing to a socket) to complete.
It is possible to make things "asynchronous" by splitting them into small chunks and letting event handlers run in between these chunks. This is sometimes a useful approach, if the transformation doesn't make the code too much more difficult to understand and maintain. Twisted provides a helper for scheduling these chunks of work, cooperate. It is beneficial to use this helper since it can make scheduling decisions based on all of the different sources of work and ensure that there is time left over to service event sources without significant additional latency (in other words, the more jobs you add to it, the less time each job will get, so that the reactor can keep doing its job).
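The chunking idea can be sketched without Twisted at all. In the minimal sketch below, `is_prime_chunked` and `run_round_robin` are hypothetical names, not Twisted APIs. The primality test is written as a generator that yields after each slice of trial divisions, and a toy round-robin loop stands in for the scheduler. In a Twisted program the generator would instead be handed to `twisted.internet.task.cooperate`, which interleaves it with the reactor's other work.

```python
from collections import deque


def is_prime_chunked(n, chunk=1000):
    """Trial division as a generator: yields None after each chunk of
    divisors, then yields the final True/False answer exactly once."""
    if n < 2:
        yield False
        return
    d = 2
    while d * d <= n:
        # Test at most `chunk` divisors before giving up control.
        stop = d + chunk
        while d < stop and d * d <= n:
            if n % d == 0:
                yield False
                return
            d += 1
        yield None  # not finished yet; let other work run
    yield True


def run_round_robin(jobs):
    """Advance each generator one step at a time. Returns the job names
    in the order their answers arrived, plus the answers themselves."""
    pending = deque(jobs.items())
    finished, answers = [], {}
    while pending:
        name, gen = pending.popleft()
        result = next(gen)
        if result is None:
            pending.append((name, gen))  # still working, requeue
        else:
            finished.append(name)
            answers[name] = result
    return finished, answers


# The long job is started first, yet the short one reports back first.
order, answers = run_round_robin({
    "long": is_prime_chunked(1_000_000_007),
    "short": is_prime_chunked(97),
})
print(order, answers)
```

The chunk size is the knob here: smaller chunks return control more often at the cost of more scheduling overhead.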
Twisted does also provide several APIs for dealing with threads and processes. These can be useful if it is not obvious how to break a job into chunks. You can use deferToThread to run a (thread-safe!) function in a thread pool. Conveniently, this API returns a Deferred which will eventually fire with the return value of the function (or with a Failure if the function raises an exception). These Deferreds look like any other, and as far as the code using them is concerned, it could just as well come back from a call like getPage - a function that uses no extra threads, just non-blocking I/O and event handlers.
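The shape of that API can be illustrated with the standard library alone. This sketch swaps in `concurrent.futures.ThreadPoolExecutor` for Twisted's thread pool and a `Future` for the Deferred, so it is an analogy rather than Twisted code. `is_prime` is the question's function under a different name, and the two inputs are arbitrary choices.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def is_prime(n):
    """Deliberately slow trial division, as in the question."""
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a Future, much as deferToThread
    # returns immediately with a Deferred.
    futures = {
        pool.submit(is_prime, 15_485_863): "long",
        pool.submit(is_prime, 97): "short",
    }
    # as_completed() yields the futures in the order they finish.
    completion_order = [futures[f] for f in as_completed(futures)]
    results = {name: f.result() for f, name in futures.items()}

print(completion_order, results)
```

The short calculation completes first even though it was submitted second. With pure Python code like this, the threads still take turns on one core, so the total CPU time is not reduced.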
Since Python isn't ideally suited for running multiple CPU-bound threads in a single process, Twisted also provides a non-blocking API for launching and communicating with child processes. You can offload calculations to such processes to take advantage of additional CPUs or cores without worrying about the GIL slowing you down, something that neither the chunking strategy nor the threading approach offers. The lowest level API for dealing with such processes is reactor.spawnProcess. There is also Ampoule, a package which will manage a process pool for you and provides an analog to deferToThread for processes, deferToAMPProcess.
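As a rough illustration of offloading to a child process without blocking the event loop, here is a sketch that uses the standard library's `asyncio` in place of the Twisted reactor. It is a swapped-in technique: a Twisted version would go through `reactor.spawnProcess` or Ampoule instead. `CHILD_CODE` and `is_prime_in_child` are hypothetical names made up for this example.

```python
import asyncio
import sys

# Code run by the child interpreter: slow trial division on argv[1].
CHILD_CODE = (
    "import sys; n = int(sys.argv[1]); "
    "print(n > 1 and all(n % d for d in range(2, n)))"
)


async def is_prime_in_child(n):
    """Run the check in a separate Python process; the event loop stays
    free to do other work while the child computes."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE, str(n),
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode().strip() == "True"


async def main():
    finished = []

    async def job(name, n):
        result = await is_prime_in_child(n)
        finished.append(name)  # record completion order
        return result

    results = await asyncio.gather(job("long", 15_485_863), job("short", 97))
    return finished, results


finished, results = asyncio.run(main())
print(finished, results)
```

Because each calculation runs in its own interpreter, the two jobs can occupy separate cores and neither holds the parent's GIL.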

Related

Threads are not happening at the same time?

I have a program that fetches data via an API. I created a function that only takes the target data as an argument and with a for-loop I run this method 10 times.
The program takes quite some time to display the data because the next function call only happens once the previous call has finished its work.
I want to use threads to make it all happen quicker. However, I'm confused. On realpython.com I read this:
A thread is a separate flow of execution. This means that your program will have two things happening at once. But for most Python 3 implementations the different threads do not actually execute at the same time: they merely appear to. It’s tempting to think of threading as having two (or more) different processors running on your program, each one doing an independent task at the same time. That’s almost right. The threads may be running on different processors, but they will only be running one at a time.
First they say: "This means that your program will have two things happening at once" and then they say "but they will only be running one at a time". So my threads are not done simultaneously?
I want to make a decision on whether to use Threads or Multiprocessing but I can't figure it out.
Can somebody help?

With either threads or multiprocessing you must assume that execution of your program could jump from one thread/process to another at any moment. The difference is that with threads, Python code is never really executed at the same time. That means there is always only one CPU core doing your work. With multiprocessing, your code runs on multiple cores at the same time. So only multiprocessing can solve your computation up to N times faster with N processes. (There will be some overhead of course.) If you are not doing any heavy computation, but need to create the illusion of things running in parallel, use threads. This is especially useful for GUIs.
The confusing part is that IO (copying files or loading something from the web, for example) is not CPU bound, as it does not require a lot of CPU instructions to happen. So always use threads for this. To understand it a bit more, you should realise that when a thread is waiting for an IO operation to finish, it is actually in a blocked state. This allows other threads to run. So if you use threads to fetch data, the first thread will start loading it and then block. This makes room for the second thread to do the same, and so on. When one of the threads has the data ready, it will unblock, run the rest of its code and finish.
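A minimal sketch of that pattern for the ten API calls in the question. `fetch` is a hypothetical stand-in for the real API call, with `time.sleep` playing the part of waiting on the network:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(target, delay=0.2):
    """Hypothetical stand-in for an API call: sleeping blocks the thread
    the way waiting on a socket would, without using the CPU."""
    time.sleep(delay)
    return f"data for {target}"


targets = [f"item-{i}" for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() returns results in input order even though the calls overlap.
    results = list(pool.map(fetch, targets))
elapsed = time.perf_counter() - start

print(f"{len(results)} results in {elapsed:.2f} s")
```

Run one after another, the ten calls would need about two seconds in total. With one thread per call, the waits overlap and the whole batch takes roughly as long as a single call.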
(Note that when multiple threads are running they can pause randomly and give room for other threads to run for a while and then carry on. (See first sentence of this answer.))
Generally always use threads unless you need to do something CPU heavy in parallel. Multiprocessing has a lot of limitations when it comes to how it works internally and using it is more complicated and heavy.
This only applies to some implementations of Python though, for example the most commonly used "official" implementation, CPython. In other languages or less common Python implementations threads are often able to execute instructions on multiple cores at the same time.

What's the point of multithreading in Python if the GIL exists?

From what I understand, the GIL makes it impossible to have threads that harness a core each individually.
This is a basic question, but, what is then the point of the threading library? It seems useless if the threaded code has equivalent speed to a normal program.

In some cases an application may not utilize even one core fully and using threads (or processes) may help to do that.
Think of a typical web application. It receives requests from clients, does some queries to the database and returns data back to the client. Given that an IO operation is orders of magnitude slower than a CPU operation, most of the time such an application is waiting for IO to complete. First, it waits to read the request from the socket. Then it waits until the request to the database is written into the socket opened to the DB. Then it waits for the response from the database, and then for the response to be written to the client socket.
Waiting for IO to complete may take 90% (or more) of the time the request is processed. When a single-threaded application is waiting on IO, it is simply not using the core, and the core is available for execution. So such an application leaves room for other threads to execute even on a single core.
In this case, when one thread waits for IO to complete, it releases the GIL and another thread can continue execution.

Strictly speaking, CPython supports multiple I/O-bound threads plus a single CPU-bound thread.
I/O-bound methods: file.open, file.write, file.read, socket.send, socket.recv, etc. When Python calls these I/O functions, it implicitly releases the GIL and re-acquires it after the I/O function returns.
CPU-bound methods: arithmetic calculation, etc.
C extension methods: the method must call PyEval_SaveThread and PyEval_RestoreThread explicitly to tell the Python interpreter what you are doing.

The threading library works very well despite the presence of the GIL.
Before I explain, you should know that Python's threads are real threads - they are normal operating system threads running the Python interpreter. The GIL (or Global Interpreter Lock) is only taken when running pure Python code, and in many cases is completely released and not even checked.
The GIL does not prevent these operations from running concurrently:
IO operations, such as sending & receiving network data or reading/writing to a file.
Heavy builtin CPU bound operations, such as hashing or compressing.
Some C extension operations, such as numpy calculations.
Any of these (and plenty more) would run perfectly fine in a concurrent fashion, and in the majority of the programs these are the heftier parts taking the longest time.
Building an example API in Python that takes astronomical data and calculates trajectories would mean that:
Processing the input and assembling the network packets would be done in parallel.
The trajectory calculations should they be in numpy would all be parallel.
Adding the data to a database would be parallel.
Returning the data over the network would be parallel.
Basically the GIL won't affect the vast majority of the program runtime.
Moreover, at least for networking, other methodologies are more prevalent these days such as asyncio which offers cooperative multi-tasking on the same thread, effectively eliminating the downside of thread overload and allowing for considerably more connections to run at the same time. By utilizing that, the GIL is not even relevant.
The GIL can be a problem and make threading useless in programs that are CPU intensive while running pure Python code, such as a simple program calculating Fibonacci numbers, but in the majority of real-world cases, unless you're running an enormously scaled website such as YouTube (which admittedly has encountered problems), the GIL is not a significant concern.

Please read this: https://opensource.com/article/17/4/grok-gil
There're two concepts here:
1. Cooperative multi-tasking: when a thread performs I/O-bound tasks, it surrenders its lock on the GIL so other threads may proceed.
2. Preemptive multi-tasking: essentially, after a thread has run for a certain duration (in terms of number of bytecodes executed or time), it surrenders the lock so other threads can proceed.
So while only one thread runs at a time, (1) means we're still utilizing the core efficiently, though note this does not help with CPU-bound workloads, and (2) means each thread gets a fair amount of CPU time.
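In CPython 3.2 and later the preemptive part is time-based, and the duration can be inspected and tuned through the `sys` module. A small sketch (the 0.01 value is an arbitrary choice for illustration):

```python
import sys

# CPython asks the running thread to give up the GIL after this many
# seconds so that other threads get a turn.
interval = sys.getswitchinterval()
print(f"switch interval: {interval} s")

# The value is tunable; a larger interval means fewer forced switches.
sys.setswitchinterval(0.01)
changed = sys.getswitchinterval()

sys.setswitchinterval(interval)  # restore the original setting
```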

twisted: processing incoming events in synchronous code

Suppose there's a synchronous function in a twisted-powered Python program that takes a long time to execute, doing that in a lot of reasonable-sized pieces of work. If the function could return deferreds, this would be a no-brainer, however the function happens to be deep inside some synchronous code, so that yielding deferreds to continue is impossible.
Is there a way to let twisted handle outstanding events without leaving that function? I.e. what I want to do is something along the lines of
def my_func():
    results = []
    for item in a_lot_of_items():
        results.append(do_computation(item))
        reactor.process_outstanding_events()
    return results
Of course, this imposes reentrancy requirements on the code, but still, there's QCoreApplication.processEvents for that in Qt, is there anything in twisted?

The solution taken by some event-loop-based systems (essentially the solution you're referencing via Qt's QCoreApplication.processEvents API) is to make the main loop re-entrant. In Twisted terms, this would mean something like (not working code):
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks

def my_expensive_task_that_cannot_be_asynchronous():
    @inlineCallbacks
    def do_work(units):
        for unit in units:
            yield do_one_work_asynchronously(unit)

    work = do_work(some_work_units())
    work.addBoth(lambda ignored: reactor.stop())
    # Nested run(): this only works with a re-entrant reactor.
    reactor.run()

def main():
    # Whatever your setup is...
    # Then, hypothetical event source triggering your
    # expensive function:
    reactor.callLater(
        30,
        my_expensive_task_that_cannot_be_asynchronous,
    )
    reactor.run()
Notice how there are two reactor.run calls in this program. If Twisted had a re-entrant event loop, this second call would start spinning the reactor again and not return until a matching reactor.stop call is encountered. The reactor would process all events it knows about, not just the ones generated by do_work, and so you would have the behavior you desire.
This requires a re-entrant event loop because my_expensive_task_... is already being called by the reactor loop. The reactor loop is on the call stack. Then, reactor.run is called and the reactor loop is now on the call stack again. So the usual issues apply: the event loop cannot have left over state in its frame (otherwise it may be invalid by the time the nested call is complete), it cannot leave its instance state inconsistent during any calls out to other code, etc.
Twisted does not have a re-entrant event loop. This is a feature that has been considered and, at least in the past, explicitly rejected. Supporting this feature brings a huge amount of additional complexity (described above) to the implementation and the application. If the event loop is re-entrant then it becomes very difficult to avoid requiring all application code to be re-entrant safe as well. This negates one of the major benefits of the cooperative multitasking approach Twisted takes to concurrency (that you are guaranteed your functions will not be re-entered).
So, when using Twisted, this solution is out.
I'm not aware of another solution which would allow you to continue to run this code in the reactor thread. You mentioned that the code in question is nested deeply within some other synchronous code. The other options that come to mind are:
make the synchronous code capable of dealing with asynchronous things
factor the expensive parts out and compute them first, then pass the result in to the rest of the code
run all of that code, not just the computationally expensive part, in another thread

You could use deferToThread.
http://twistedmatrix.com/documents/13.2.0/core/howto/threading.html
That method runs your calculation in a separate thread and returns a deferred that is called back when the calculation is actually finished.

The issue is that if do_heavy_computation() blocks, execution won't move on to the next function. In this case use deferToThread for the heavy calculation. (blockingCallFromThread goes the other way: it lets code already running in a worker thread call into the reactor thread and wait for the result.) If you don't care about the result of the calculation, you can use callInThread. Take a look at the documentation on threads.

This should do:
for item in items:
    reactor.callLater(0, heavy_func, item)
reactor.callLater should bring you back into the event loop.

Python: Continuously and cancelably repeat execution with fixed interval

What is the best way to continuously repeat the execution of a given function at a fixed interval while being able to terminate the executor (thread or process) immediately?
Basically I know two approaches:
use multiprocessing and function with infinite cycle and time.sleep at the end. Processing is terminated with process.terminate() in any state.
use threading and constantly recreate timers at the end of the thread function. Processing is terminated by timer.cancel() while sleeping.
(both “in any state” and “while sleeping” are fine, even though the latter may be not immediate). The problem is that I have to use both multiprocessing and threading as the latter appears not to work on ARM (some fuzzy interaction of python interpreter and vim, outside of vim everything is fine) (I was using the second approach there, have not tried threading+cycle; no code is currently left) and the former spawns way too many processes which I would like not to see unless really required. This leads to a problem of having to code two different approaches while threading with cycle is just a few more imports for drop-in replacements of all multiprocessing stuff wrapped in if/else (except that there is no thread.terminate()). Is there some better way to do the job?
Currently used code is here (currently with cycle for both jobs), but I do not think it will be much useful to answer the question.
Update: The reason I am using this solution is a set of functions that display file status (and some other things, like the branch) from version control systems in the vim statusline. These statuses must be kept up to date, but updating them immediately cannot be done without using hooks, and I have no idea how to set hooks temporarily and remove them on vim quit without possibly spoiling user configuration. Thus the standard solution is a cache that expires after N seconds. But when the cache expires I need to do an expensive shell call, and the delay is noticeable, the more so the heavier the IO load is. What I am implementing now is updating values for viewed buffers every N seconds in a separate process, so the delays bother that process and not me. Threads are likely to work as well, because the GIL does not affect calls to external programs.

I'm not clear on why a single long-lived thread that loops infinitely over the tasks wouldn't work for you? Or why you end up with many processes in the multiprocess option?
My immediate reaction would have been a single thread with a queue to feed it things to do. But I may be misunderstanding the problem.
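A minimal sketch of the single long-lived thread idea, using only `threading`. `RepeatingWorker` is a hypothetical name, and the work queue is left out for brevity. The trick is to sleep with `Event.wait(timeout)` rather than `time.sleep`, so that `stop()` can interrupt the pause between runs immediately.

```python
import threading
import time


class RepeatingWorker:
    """Call `func` every `interval` seconds on one long-lived thread."""

    def __init__(self, interval, func):
        self.interval = interval
        self.func = func
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() doubles as a sleep that stop() can cut short:
        # it returns True as soon as the event is set.
        while not self._stop_event.wait(self.interval):
            self.func()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()


ticks = []
worker = RepeatingWorker(0.05, lambda: ticks.append(time.monotonic()))
worker.start()
time.sleep(0.3)
worker.stop()
count_at_stop = len(ticks)
time.sleep(0.15)  # no further ticks should arrive after stop()
print(count_at_stop, len(ticks))
```

Cancellation is immediate only while the thread is waiting. A call to `func` that is already running finishes first, which matches the "while sleeping" behaviour the question accepts. The interval is measured from the end of one call to the start of the next, so the period drifts by however long `func` takes.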

I do not know how to do it simply and/or cleanly in Python, but I was wondering whether you could take advantage of an existing system scheduler, e.g. crontab on a *nix system.
There is an API for it in Python, and it might satisfy your needs.

How can multiple calculations be launched in parallel, while stopping them all when the first one returns? [Python]

How can multiple calculations be launched in parallel, while stopping them all when the first one returns?
The application I have in mind is the following: there are multiple ways of calculating a certain value; each method takes a different amount of time depending on the function parameters; by launching calculations in parallel, the fastest calculation would automatically be "selected" each time, and the other calculations would be stopped.
Now, there are some "details" that make this question more difficult:
The parameters of the function to be calculated include functions (that are calculated from data points; they are not top-level module functions). In fact, the calculation is the convolution of two functions. I'm not sure how such function parameters could be passed to a subprocess (they are not picklable).
I do not have access to all calculation codes: some calculations are done internally by Scipy (probably via Fortran or C code). I'm not sure whether threads offer something similar to the termination signals that can be sent to processes.
Is this something that Python can do relatively easily?

I would look at the multiprocessing module if you haven't already. It offers a way of offloading tasks to separate processes whilst providing you with a simple, threading like interface.
It provides the same kinds of primitives as you get in the threading module, for example, worker pools and queues for passing messages between your tasks, but it allows you to sidestep the issue of the GIL since your tasks actually run in separate processes.
The actual semantics of what you want are quite specific so I don't think there is a routine that fits the bill out-of-the-box, but you can surely knock one up.
Note: if you want to pass functions around, they cannot be bound functions since these are not pickleable, which is a requirement for sharing data between your tasks.

Because of the global interpreter lock you would be hard pressed to get any speedup this way with threads. In reality even multithreaded programs in Python only run pure Python code on one core at a time. Thus, you would just be running N calculations, each at 1/N of the speed. Even if one finished in half the time of the others you would still lose time in the big picture.

Processes can be started and killed trivially.
You can do this.
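import subprocess
import sys

watch = []
for s in ("process1.py", "process2.py", "process3.py"):
    # Run each script with the current interpreter so it need not be executable.
    sp = subprocess.Popen([sys.executable, s])
    watch.append(sp)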
Now you're simply waiting for one of those to finish. When one finishes, kill the others.
import time

winner = None
while winner is None:
    time.sleep(10)  # polling interval
    for w in watch:
        if w.poll() is not None:
            winner = w
            break

# Kill every process that is still running.
for w in watch:
    if w.poll() is None:
        w.kill()
These are processes -- not threads. No GIL considerations. Make the operating system schedule them; that's what it does best.
Further, each process is simply a script that simply solves the problem using one of your alternative algorithms. They're completely independent and stand-alone. Simple to design, build and test.
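For experimenting with this race pattern without writing separate script files, here is a self-contained variant. It rests on stated assumptions: each "algorithm" is simulated by a child interpreter that merely sleeps, and the names `CHILD` and `contenders` and the delays are made up for illustration.

```python
import subprocess
import sys
import time

# Each "algorithm" is simulated by a child interpreter that sleeps for a
# different time and then prints its own name.
CHILD = "import sys, time; time.sleep(float(sys.argv[2])); print(sys.argv[1])"
contenders = {"slow": 5.0, "fast": 0.2, "medium": 2.0}

procs = {
    name: subprocess.Popen(
        [sys.executable, "-c", CHILD, name, str(delay)],
        stdout=subprocess.PIPE,
        text=True,
    )
    for name, delay in contenders.items()
}

winner = None
while winner is None:
    time.sleep(0.05)  # short polling interval
    for name, proc in procs.items():
        if proc.poll() is not None:
            winner = name
            break

# Stop the losers and reap every child.
for proc in procs.values():
    if proc.poll() is None:
        proc.kill()
    proc.wait()

# The winner exited normally, so its output is waiting in the pipe.
answer = procs[winner].stdout.read().strip()
for proc in procs.values():
    proc.stdout.close()

print(winner, answer)
```

Passing results back through stdout also sidesteps the pickling problem for the results, though the inputs still have to reach each child in some serialisable form.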
