Make two competing functions and kill the slow one - Python

In Python, I have to fetch crypto data from Binance every minute and do some calculations. For fetching the data I have two functions, func_a() and func_b(). They both do the same thing, but in wildly different ways. Sometimes func_a is faster and sometimes func_b is faster. I want to run both functions in parallel, and whichever returns a result first, I want to kill the other one and move on (because they both are going to bring back the same result).
How can I achieve this in python? Please mind that I do not want to replace these functions or their mechanics.

Python threads aren't very suitable for this purpose for two reasons:
The Python GIL means that if you spawn two CPU-bound threads, each of the two threads will run at half its normal speed (because only one thread is actually running at any given instant; the other is waiting to acquire the interpreter lock)
There is no reliable way to unilaterally kill a thread, because if you do that, any resources it had allocated will be leaked, causing major problems.
If you really want to be able to cancel a function in progress, then you have two options (both sketched after this list):
Modify the function to periodically check a "please_quit" boolean variable (or similar) and return immediately if it has been set to True. Your main thread can then set the please_quit variable, call join() on the thread, and rest assured that the thread will quit ASAP. (This does require that you have the ability to modify the function's implementation.)
Spawn child processes instead of child threads. A child process takes more resources to launch, but it can run truly in parallel (since it has its own separate Python interpreter) and it is safe (usually) to unilaterally kill it, because the OS will automatically clean up all of the process's held resources when the process is killed.
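Here is a minimal sketch of each option. All names are hypothetical; func_a and func_b stand in for your real fetch functions, with time.sleep() simulating the variable fetch time.

Option 1, cooperative cancellation with a flag the function checks periodically:

import threading

please_quit = threading.Event()

def cooperative_fetch():
    # The function must re-check the flag between units of work and
    # bail out promptly once it is set.
    while not please_quit.is_set():
        ...  # do one small unit of work here

t = threading.Thread(target=cooperative_fetch)
t.start()
please_quit.set()  # ask the thread to stop
t.join()           # returns quickly because the loop re-checks the flag

Option 2, child processes racing via a shared queue; whichever result arrives first wins, and the other process is terminated:

import multiprocessing
import random
import time

def func_a():
    time.sleep(random.uniform(0.5, 2.0))  # stand-in for the real fetch
    return "result from func_a"

def func_b():
    time.sleep(random.uniform(0.5, 2.0))  # stand-in for the real fetch
    return "result from func_b"

def worker(fn, queue):
    queue.put(fn())  # report the result on the shared queue

if __name__ == "__main__":
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(f, q))
             for f in (func_a, func_b)]
    for p in procs:
        p.start()
    result = q.get()   # blocks until the first result arrives
    for p in procs:
        p.terminate()  # kill the slower one; the OS reclaims its resources
        p.join()
    print(result)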

Related

Creating a thread inside a child process

To my understanding, a thread is a unit under a process. So if I use the multithreading library in Python, it would create the threads under the main process (correct me if I'm wrong, since I'm still learning). But is there a way to create threads under a different process or child process? Is it possible to multithread within a process, since a process has its own shared memory? Let's say, for example, I have an application which needs to run in parallel with 3 processes. In each process, I want it to run concurrently and share the same memory space. If this is possible, does this mean I need to have threading code inside my function, so that when I run the function with a different process, it will create its own threads?
P.S.: I know the GIL locks a thread in a process, but what I'm curious about is whether it is even possible for a process to create its own threads.
Also, this isn't specific to Python; I just want to know how this works in general.
Try not to confuse threads and processes. In Python, a process is effectively a separate program with its own copy of the Python interpreter (at least on platforms that use the spawn method to create new processes, such as Windows). These are created with the multiprocessing library.
A process can have one or more threads. These share the same memory and can share global variables. These are created with the threading library.
It's perfectly acceptable to create a separate process and have that process create several threads (although it may be harder to manage as the program grows in size).
As you mentioned the GIL: it does not affect processes, as each process has its own GIL. Threads within a process are affected by the GIL, but they drop the lock at various points, which allows your threading.Thread code to effectively run "concurrently".
But is there a way to create threads under a different process or child process?
Yes
In each process, I want it to run concurrently and share the same memory space.
If you are using separate processes, they do not share the same memory. You need to use an object like a multiprocessing.Queue to transfer data between the processes or shared memory structures such as multiprocessing.Array.
does this mean I need to have a threading code inside my function so that when I run the function with a different process, it will create its own thread?
Yes
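For illustration, a minimal sketch of this pattern (all names hypothetical): three processes, each of which spawns three threads that share that process's memory.

import multiprocessing
import threading

def thread_work(i):
    # Runs inside a child process and shares memory with its sibling threads.
    print(f"thread {i} in process {multiprocessing.current_process().name}")

def process_main():
    threads = [threading.Thread(target=thread_work, args=(i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=process_main) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()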

Threads are not happening at the same time?

I have a program that fetches data via an API. I created a function that takes only the target data as an argument, and with a for loop I run this function 10 times.
The program takes quite some time to display the data because each function call only starts after the previous one has finished.
I want to use Threads to make it all happen quicker. However, I'm confused. On realpython.org I read this:
A thread is a separate flow of execution. This means that your program will have two things happening at once. But for most Python 3 implementations the different threads do not actually execute at the same time: they merely appear to. It’s tempting to think of threading as having two (or more) different processors running on your program, each one doing an independent task at the same time. That’s almost right. The threads may be running on different processors, but they will only be running one at a time.
First they say: "This means that your program will have two things happening at once", and then they say "but they will only be running one at a time". So my threads do not run simultaneously?
I want to make a decision on whether to use Threads or Multiprocessing but I can't figure it out.
Can somebody help?
With both threads and multiprocessing you must assume that execution of your program could jump from one thread/process to another randomly. The difference is that with threads, code is never really executed at the same time, which means there is always only one CPU core doing your work. With multiprocessing, your code runs on multiple cores at the same time. So only multiprocessing will make your computation roughly N times faster with N processes (there will be some overhead, of course). If you are not doing any heavy computation, but need to create the illusion of things running in parallel, use threads. This is especially useful for GUIs.
The confusing part is that IO (copying files or loading something from the web, for example) is not CPU bound, as it does not require a lot of CPU instructions to happen. So always use threads for this. To understand it a bit more, you should realise that when a thread is waiting for an IO operation to finish, it is actually in a blocked state. This allows other threads to run. So if you use threads to fetch data, the first thread will start loading it and then block. This makes room for the second thread to do the same, and so on. When one of the threads has the data ready, it will unblock, run the rest of its code and finish.
(Note that when multiple threads are running they can pause randomly and give room for other threads to run for a while and then carry on. (See first sentence of this answer.))
Generally always use threads unless you need to do something CPU heavy in parallel. Multiprocessing has a lot of limitations when it comes to how it works internally and using it is more complicated and heavy.
This only applies to some implementations of Python, though; most notably the commonly used "official" implementation, CPython. In other languages or less common Python implementations, threads are often able to execute instructions on multiple cores at the same time.
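As a small illustration of the I/O case (the URL list is a placeholder): ten threads each block on a network read, so the requests overlap even under the GIL.

from concurrent.futures import ThreadPoolExecutor
import urllib.request

URLS = ["https://example.com"] * 10  # placeholder targets

def fetch(url):
    # The thread blocks here on network I/O, releasing the GIL so the
    # other threads can issue their requests in the meantime.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

with ThreadPoolExecutor(max_workers=10) as pool:
    pages = list(pool.map(fetch, URLS))
print(f"fetched {len(pages)} pages")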

Would setting a mutex manually improve performance?

My Python program is definitely CPU bound, but 40% to 55% of the time is spent in C code in the z3 solver (which doesn't know anything about the GIL), where each single call to the C function (z3_optimize_check) takes almost a minute to complete (so far the parallel_enable parameter still results in this function working in single-threaded mode and blocking the main thread).
I can't use multiprocessing, as z3 objects aren't serialization friendly (unless someone here can prove otherwise). As there are several tasks (where each task adds more z3 work to a dict for the other tasks), I initially set up multithreading directly. But the GIL definitely hurts performance more than it helps (especially with hyperthreading), despite the huge amount of time spent in the solver.
But if I set up a blocking mutex manually (through threading.Lock.acquire()) in the z3py module just after the switch from C code, which would allow another thread to run only while all the other threads are performing solver work, would this remove the GIL performance penalty (since there would be only one thread at a time executing Python code, and it would always be the same one until the lock is released before z3_optimize_check)?
I mean, would using threading.Lock.acquire() trigger calls to PyEval_SaveThread() as if z3 were doing it directly?
so far the parallel_enable parameter still results in this function working in single-threaded mode and blocking the main thread
I think you are misunderstanding that. z3 running in parallel mode means that you call it from a single Python thread, and then it spawns multiple OS-level threads for itself, does the job, cleans up the threads and returns the result to you. It does not miraculously enable Python to run without the GIL.
From the viewpoint of Python, it still does one thing at a time, and that one thing is making the call to z3. And it is holding the GIL the entire time. So if you see more than one CPU core/thread utilized while the calculation is running, that is the effect of z3's parallel mode, internally branching to multiple threads.
There is another thing: releasing the GIL, like blocking I/O operations do. It does not happen by magic; there is a call pair for that:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created) and reset the thread state to NULL, returning the previous thread state (which is not NULL). If the lock has been created, the current thread must have acquired it.
void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the global interpreter lock (if it has been created) and set the thread state to tstate, which must not be NULL. If the lock has been created, the current thread must not have acquired it, otherwise deadlock ensues.
These are C calls, so they are accessible for extension developers. When developers know that the code will run for a long time, without the need for accessing Python internals, PyEval_SaveThread() can be used, and then Python can proceed with other Python threads. And when the long whatever is done, the thread can re-introduce itself and apply for GIL using PyEval_RestoreThread().
But, these things happen only if developers make them happen. And with z3 it might not be the case.
To provide a direct answer to your question: no, Python code cannot release the GIL and keep it released, as the GIL is the lock that a Python thread has to hold while it proceeds. So whenever a Python "instruction" returns, the GIL is held again.
Apparently I managed not to include the link I wanted: these calls are documented at https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock (and the linked section discusses what I briefly summarized).
Z3 is open source (https://github.com/Z3Prover/z3), and the source code contains neither the PyEval_SaveThread nor the wrapper-shortcut Py_BEGIN_ALLOW_THREADS character sequences.
But it does have a parallel Python example, by the way, https://github.com/Z3Prover/z3/blob/master/examples/python/parallel.py, with
from multiprocessing.pool import ThreadPool
So I would assume that it might be tested and working with multiprocessing.
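For reference, the pattern in that example looks roughly like this; solve_one is a hypothetical stand-in, and whether the threads actually overlap depends on z3 releasing the GIL internally.

from multiprocessing.pool import ThreadPool

def solve_one(task):
    # Hypothetical: build a solver for this task and run the long
    # z3_optimize_check-style call here.
    return task

pool = ThreadPool(4)
results = pool.map(solve_one, range(8))
pool.close()
pool.join()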

Why Do I have to worry about Thread Safety in CPython?

From what I understand, the Global Interpreter Lock allows only a single thread to access the interpreter and execute bytecode. If that's the case, then at any given time, only a single thread will be using the interpreter and its memory.
With that, I believe it is fair to exclude the possibility of race conditions, since no two threads can access the interpreter's memory at the same time; yet I still see warnings about making sure data structures are "thread safe". It is possible the warnings cover all implementations of the Python interpreter (like Cython), which can switch off the GIL and allow true multithreading.
I understand the importance of thread safety in interpreter environments that do not have the GIL enabled. However, for CPython, why is thread safety encouraged when writing multithreaded Python code? What is the worst that can happen in the CPython environment?
Of course race conditions can still take place, because access to data structures is not atomic.
Say you test for a key being present in a dictionary, then do something to add the key:
if key not in dictionary:
    # calculate new value
    value = elaborate_calculation()
    dictionary[key] = value
The thread can be switched at any point after the not in test has returned true, and another thread will also come to the conclusion that the key isn't there. Now two threads are doing the calculation, and you don't know which one will win.
All that the GIL does is protect Python's internal interpreter state. This doesn't mean that data structures used by Python code itself are now locked and protected.
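The usual fix is to hold a lock across the whole test-and-set, so that only one thread performs the calculation for a given key. A minimal sketch, where elaborate_calculation stands in for the computation above:

import threading

lock = threading.Lock()
dictionary = {}

def elaborate_calculation():
    return 42  # stand-in for the expensive computation

def set_once(key):
    # The lock covers both the membership test and the assignment,
    # so no second thread can slip in between them.
    with lock:
        if key not in dictionary:
            dictionary[key] = elaborate_calculation()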
An important note: the multiprocessing module in Python is synchronous to some degree despite the GIL, in that access to the same shared variable can occur across different processes simultaneously.
This has a likelihood of corrupting your data, or at least disrupting your control flow, which is why thread safety is recommended.
As to why this happens, despite there being only one interpreter, there isn't anything stopping (at least as far as I can tell) two separately interpreted pieces of code from accessing the same parts of shared memory simultaneously. When doing, say:
import multiprocessing

def my_func():
    print("hello world")

my_process = multiprocessing.Process(target=my_func, args=())
my_process.start()
my_process.join()
My understanding is that the time it takes to interpret (in this case) my_func is buried in the overhead of spawning the new process.
In this case the term "process" is more suitable, because worker threads are temporarily spawned just to copy data, so there is some data handshaking going on; it's actually quite a different process (pun intended) from spawning a traditional thread.
I hope this helps.

How can multiple calculations be launched in parallel, while stopping them all when the first one returns? [Python]

How can multiple calculations be launched in parallel, while stopping them all when the first one returns?
The application I have in mind is the following: there are multiple ways of calculating a certain value; each method takes a different amount of time depending on the function parameters; by launching calculations in parallel, the fastest calculation would automatically be "selected" each time, and the other calculations would be stopped.
Now, there are some "details" that make this question more difficult:
The parameters of the function to be calculated include functions (which are calculated from data points; they are not top-level module functions). In fact, the calculation is the convolution of two functions. I'm not sure how such function parameters could be passed to a subprocess (they are not pickleable).
I do not have access to all calculation codes: some calculations are done internally by Scipy (probably via Fortran or C code). I'm not sure whether threads offer something similar to the termination signals that can be sent to processes.
Is this something that Python can do relatively easily?
I would look at the multiprocessing module if you haven't already. It offers a way of offloading tasks to separate processes whilst providing you with a simple, threading like interface.
It provides the same kinds of primitives as you get in the threading module, for example worker pools and queues for passing messages between your tasks, but it allows you to sidestep the issue of the GIL, since your tasks actually run in separate processes.
The actual semantics of what you want are quite specific so I don't think there is a routine that fits the bill out-of-the-box, but you can surely knock one up.
Note: if you want to pass functions around, they cannot be bound functions since these are not pickleable, which is a requirement for sharing data between your tasks.
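As a rough sketch of knocking one up with a worker pool (method is a hypothetical stand-in for your alternative calculations): imap_unordered yields results in completion order, so next() returns whichever calculation finishes first, and terminating the pool stops the rest.

from multiprocessing import Pool
import time

def method(seed):
    # Stand-in for one of the alternative calculations; each variant
    # takes a different amount of time.
    time.sleep(seed)
    return seed

if __name__ == "__main__":
    with Pool(3) as pool:
        fastest = next(pool.imap_unordered(method, [3, 1, 2]))
        pool.terminate()  # stop the remaining workers
    print(fastest)  # prints 1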
Because of the global interpreter lock you would be hard pressed to get any speedup this way. In reality, even multithreaded programs in Python run on only one core. Thus, you would just be doing N calculations at 1/N times the speed. Even if one finished in half the time of the others, you would still lose time in the big picture.
Processes can be started and killed trivially.
You can do this.
import subprocess
import sys

watch = []
for s in ("process1.py", "process2.py", "process3.py"):
    # launch each alternative script with the same interpreter
    sp = subprocess.Popen([sys.executable, s])
    watch.append(sp)
Now you're simply waiting for one of those to finish. When one finishes, kill the others.
import time

winner = None
while winner is None:
    time.sleep(10)
    for w in watch:
        if w.poll() is not None:  # this process has finished
            winner = w
            break
for w in watch:
    if w.poll() is None:
        w.kill()  # terminate the losers
These are processes -- not threads. No GIL considerations. Make the operating system schedule them; that's what it does best.
Further, each process is simply a script that solves the problem using one of your alternative algorithms. They're completely independent and stand-alone. Simple to design, build and test.
