I have an asyncio-based program which has very inconsistent CPU load. I need to do some relatively computation-intensive work to fill up a buffer which the program reads from. However, if I do this while there's high load, I may end up causing the latency-sensitive parts to be slower than I'd like, as the "precompute the stuff" coroutine will be hogging a lot of CPU time. There are also coroutines that must run frequently (handling heartbeats for a websocket connection), so if this preprocessing takes too long those will die.
One solution I've come up with is to simply do this in another process which has lower priority, but if I could keep this all in a single program I'd be much happier. What is a good design for handling this sort of situation?
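One design that keeps everything in a single program (a sketch under assumptions; precompute, fill_buffer and the chunk size are hypothetical names) is to hand the heavy work to a ProcessPoolExecutor via loop.run_in_executor, so the event loop and the heartbeat coroutines never block. The worker can even renice itself, echoing the lower-priority idea:

import asyncio
import os
from concurrent.futures import ProcessPoolExecutor

def lower_priority():
    os.nice(10)  # Unix only: run the worker process at lower priority

def precompute(chunk_size):
    # Hypothetical CPU-heavy work; runs entirely in the worker process.
    return [x * x for x in range(chunk_size)]

async def fill_buffer(queue, executor):
    loop = asyncio.get_running_loop()
    while True:
        # The event loop (and the websocket heartbeats) stays responsive
        # while the worker process grinds through the computation.
        chunk = await loop.run_in_executor(executor, precompute, 10_000)
        await queue.put(chunk)  # blocks when full: natural backpressure

async def main():
    queue = asyncio.Queue(maxsize=4)
    with ProcessPoolExecutor(max_workers=1, initializer=lower_priority) as pool:
        await fill_buffer(queue, pool)

if __name__ == '__main__':
    asyncio.run(main())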
Related
I have a program that fetches data via an API. I created a function that only takes the target data as an argument, and with a for-loop I run this function 10 times.
The program takes quite some time to display the data because each function call only happens after the previous one has finished its work.
I want to use Threads to make it all happen quicker. However, I'm confused. On realpython.org I read this:
A thread is a separate flow of execution. This means that your program will have two things happening at once. But for most Python 3 implementations the different threads do not actually execute at the same time: they merely appear to. It’s tempting to think of threading as having two (or more) different processors running on your program, each one doing an independent task at the same time. That’s almost right. The threads may be running on different processors, but they will only be running one at a time.
First they say: "This means that your program will have two things happening at once" and then they say "but they will only be running one at a time". So my threads are not done simultaneously?
I want to make a decision on whether to use Threads or Multiprocessing but I can't figure it out.
Can somebody help?
With both Threads and Multiprocessing you must assume that execution of your program could jump from one thread/process to another at any time. The difference is that with Threads, code is never really executed at the same time: there is always only one CPU core doing your work. With Multiprocessing, your code runs on multiple cores at the same time. So only Multiprocessing will complete your computation N times faster with N processes. (There will be some overhead, of course.) If you are not doing any heavy computation, but need to create the illusion of things running in parallel, use threads. This is especially useful for GUIs.
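To make the contrast concrete, here is a minimal multiprocessing sketch for the CPU-bound case (crunch is a placeholder for real work):

import multiprocessing

def crunch(n):
    # Pure-Python CPU work: benefits from multiple processes, not threads.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        # The four inputs are processed on up to four cores at once.
        results = pool.map(crunch, [10_000_000] * 4)
    print(results)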
The confusing part is that IO (copying files or loading something from the web, for example) is not CPU bound, as it does not require many CPU instructions. So always use threads for that. To understand it a bit more, you should realise that when a thread is waiting for an IO operation to finish, it is actually in a blocked state. This allows other threads to run. So if you use threads to fetch data, the first thread will start loading it and then block. This makes room for the second thread to do the same, and so on. When one of the threads has the data ready, it will unblock, run the rest of its code and finish.
(Note that when multiple threads are running they can pause randomly and give room for other threads to run for a while and then carry on. (See first sentence of this answer.))
Generally always use threads unless you need to do something CPU heavy in parallel. Multiprocessing has a lot of limitations when it comes to how it works internally and using it is more complicated and heavy.
This only applies to some implementations of Python though, most notably the most commonly used "official" implementation, CPython. In other languages, or in less common Python implementations, threads are often able to execute instructions on multiple cores at the same time.
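Applied to the original API-fetch question, a thread pool is usually the simplest route. A minimal sketch, where fetch_data and the URL list are placeholders for your own function and targets:

from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch_data(url):
    # The thread blocks (and releases the GIL) while waiting on the network.
    with urllib.request.urlopen(url) as response:
        return response.read()

urls = ['https://example.com/api'] * 10  # placeholder targets
with ThreadPoolExecutor(max_workers=10) as executor:
    # All ten requests are in flight at once instead of one after another.
    results = list(executor.map(fetch_data, urls))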
From what I understand, the GIL makes it impossible to have threads that each harness their own core.
This is a basic question, but what, then, is the point of the threading library? It seems useless if threaded code runs at the same speed as a normal single-threaded program.
In some cases an application may not utilize even one core fully and using threads (or processes) may help to do that.
Think of a typical web application. It receives requests from clients, does some queries to the database and returns data back to the client. Given that an IO operation is orders of magnitude slower than a CPU operation, most of the time such an application is waiting for IO to complete. First, it waits to read the request from the socket. Then it waits until the request to the database is written into the socket opened to the DB. Then it waits for the response from the database, and then for the response to be written to the client socket.
Waiting for IO to complete may take 90% (or more) of the time a request is processed. When a single-threaded application is waiting on IO, it is simply not using the core, so the core is available for execution. Such an application therefore has room for other threads to execute, even on a single core.
In this case when one thread waits for IO to complete it releases GIL and another thread can continue execution.
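You can observe this directly. In the sketch below, time.sleep stands in for a blocking socket read; because the GIL is released during the wait, ten threads wait in parallel:

import threading
import time

def wait_on_io():
    time.sleep(1)  # blocking call; the GIL is released while waiting

start = time.time()
threads = [threading.Thread(target=wait_on_io) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Roughly 1 second, not 10: the waits overlapped.
print(f"elapsed: {time.time() - start:.2f}s")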
Strictly speaking, CPython supports multiple concurrently-waiting IO-bound threads plus a single running CPU-bound thread.
I/O-bound methods: file.open, file.write, file.read, socket.send, socket.recv, etc. When Python calls these I/O functions, it releases the GIL and implicitly re-acquires it after the I/O function returns.
CPU-bound methods: arithmetic calculation, etc.
C extension methods: the extension must call PyEval_SaveThread and PyEval_RestoreThread explicitly to tell the Python interpreter what it is doing.
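The single-CPU-bound-thread behaviour is easy to demonstrate: in this rough timing sketch, two pure-Python countdown threads take roughly as long as running the work twice sequentially (exact numbers vary by machine):

import time
from threading import Thread

def countdown(n):
    while n > 0:   # pure-Python arithmetic; the GIL is held throughout
        n -= 1

start = time.time()
threads = [Thread(target=countdown, args=(20_000_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# On CPython this is no faster than running both countdowns back to back.
print(f"two threads: {time.time() - start:.2f}s")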
The threading library works very well despite the presence of the GIL.
Before I explain, you should know that Python's threads are real threads - they are normal operating system threads running the Python interpreter. The GIL (or Global Interpreter Lock) is only taken when running pure Python code, and in many cases is completely released and not even checked.
The GIL does not prevent these operations from running concurrently:
IO operations, such as sending & receiving network data or reading/writing to a file.
Heavy builtin CPU bound operations, such as hashing or compressing.
Some C extension operations, such as numpy calculations.
Any of these (and plenty more) would run perfectly fine in a concurrent fashion, and in the majority of programs these are the heftier parts, taking the longest time.
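For instance, CPython's hashlib releases the GIL while hashing large buffers, so in this sketch the four digests genuinely overlap across threads (the buffer sizes are arbitrary, and the speedup depends on your core count):

import hashlib
from concurrent.futures import ThreadPoolExecutor

data_blocks = [bytes(20_000_000) for _ in range(4)]  # four 20 MB buffers

def digest(block):
    # hashlib drops the GIL for large inputs, so these run in parallel.
    return hashlib.sha256(block).hexdigest()

with ThreadPoolExecutor(max_workers=4) as executor:
    digests = list(executor.map(digest, data_blocks))
print(digests[0])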
Building an example API in Python that takes astronomical data and calculates trajectories would mean that:
Processing the input and assembling the network packets would be done in parallel.
The trajectory calculations, if done in numpy, would all run in parallel.
Adding the data to a database would be parallel.
Returning the data over the network would be parallel.
Basically the GIL won't affect the vast majority of the program runtime.
Moreover, at least for networking, other methodologies are more prevalent these days, such as asyncio, which offers cooperative multi-tasking on a single thread. This effectively eliminates the overhead of piling up threads and allows considerably more connections to run at the same time. When you use asyncio, the GIL is not even relevant.
The GIL can be a problem and can make threading useless in programs that are CPU intensive while running pure Python code, such as a simple program calculating Fibonacci numbers. But in the majority of real-world cases, unless you're running an enormously scaled website such as YouTube (which admittedly has encountered problems), the GIL is not a significant concern.
Please read this: https://opensource.com/article/17/4/grok-gil
There are two concepts here:
1. Cooperative multi-tasking: when one thread performs IO-bound work, it surrenders the GIL so other threads may proceed.
2. Preemptive multi-tasking: every thread runs for a certain duration (measured in bytecodes executed in older CPython, or in time in current CPython), then surrenders the lock so other threads can proceed.
So while only one thread runs at a time, (1) means we are still utilizing the core efficiently (note this does not help with CPU-bound workloads), and (2) means each thread gets a fair share of CPU time.
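The preemption knob is exposed in the sys module. In CPython 3 the switch is time-based; CPython 2 counted bytecodes via sys.setcheckinterval:

import sys

# Default: the interpreter considers switching threads every 5 ms.
print(sys.getswitchinterval())   # 0.005

# Let each thread run longer between forced switches (fewer context
# switches, at the cost of coarser scheduling granularity).
sys.setswitchinterval(0.05)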
I heard that we should avoid blocking IO operations in the Twisted framework,
but what if I have to export received data to an external txt file?
Currently I write my code this way:
from twisted.internet.protocol import Protocol

class BeginningPrinter(Protocol):
    def __init__(self, finished):
        self.finished = finished
        self.counter = 0

    def dataReceived(self, data):
        self.counter += 1
        # Blocking file write, performed on the reactor thread.
        with open('export.txt', 'ab') as f:
            f.write(data)

Is there any better solution? Thanks
Blocking file I/O should be avoided in Twisted for much the same reason any blocking operations should be avoided. Any single thread can only do one thing at a time. If that thread is the reactor thread and the thing you have it doing is blocking on an operation to complete then no other work you've assigned to the reactor is going to make progress until that operation finishes. This leads to poor use of resources and unresponsive applications.
This is particularly problematic when your program blocks on network I/O because networks are slow. Even worse than being slow, often the program on the other end of the network can't be relied on to be particularly cooperative. It may intentionally go slowly, particularly if its operator learns this will have a negative impact on your software.
Disk I/O is a slightly different case from this. Compared to networks, disks are often fast (your local network might be faster than your disk but your disk is probably faster than random connections across the public internet). Disks are usually also not malicious (they don't try to service your requests as slowly as possible). Because of this, many programs written using Twisted consider filesystem operations to be "fast enough" and disregard the fact that technically they're done using blocking I/O.
There are exceptional cases where you might want to go another route. For one application I have worked on, the expected case was for the disk bandwidth to be almost completely used almost all of the time by other software running on the same machine. This often resulted in simple filesystem operations in the Twisted-using process taking hundreds or thousands of milliseconds which resulted in an unacceptable performance degradation. In this case we opted to move filesystem operations to a second process and drive them with a simple protocol running over a UNIX socket.
Since the tools for asynchronous filesystem operations are quite primitive, going this route incurs a non-trivial additional development cost. You should consider whether your application is actually going to suffer from the 1ms or 2ms wait times (or lower, given the rise of SSDs) it will incur for doing blocking disk I/O under most normal circumstances or whether your software might need to function well under circumstances of extraordinary disk load before deciding which route to take.
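If you do conclude that blocking on the disk is unacceptable, a full second process isn't the only option. One middle ground (a sketch, not necessarily the best design for your case) is twisted.internet.threads.deferToThread, which runs the blocking write on the reactor's thread pool:

from twisted.internet.threads import deferToThread
from twisted.internet.protocol import Protocol

def append_chunk(data):
    # Runs in a worker thread, so the reactor is never blocked on the disk.
    with open('export.txt', 'ab') as f:
        f.write(data)

class BeginningPrinter(Protocol):
    def dataReceived(self, data):
        # Fire-and-forget here; real code would attach callbacks/errbacks.
        deferToThread(append_chunk, data)

Note that writes dispatched to a thread pool are not guaranteed to complete in order, so a real implementation would serialize them, for example through a single dedicated writer.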
Background
I'm a bit new to developing and have a general Python/programming question. If you have a method that is recursive, what is involved in enabling multiple threads or multiprocessing? I've done some light reading and seen a few examples, but they seem to apply the syntax to new code (and not very CPU-intensive tasks). I'm more wondering how I re-design existing code to do this.
Say I have something that's CPU intensive (it basically keeps adding to itself until a limit is hit):
def adderExample(sum, number):
    if sum > 1000:
        # Base case: stop recursing once the running total exceeds the limit.
        print('sum is larger than 1000. Stopping')
    else:
        sum = sum + number
        print(sum)
        number = number + 1
        adderExample(sum, number)

adderExample(0, 0)
Question(s)/Thought process
How would I approach this to make it run faster, assuming I have multiple cores available? (I eventually want it to span machines, but I think that's a separate issue involving Hadoop, so I'll keep this example to one system with multiple CPUs.) It seems threading isn't the best choice (because of the time it takes to spawn new threads); if that's true, should I focus only on multiprocessing? If so, can recursion be split across different CPUs (via queues, I assume, and then rejoined after it's done)? Can I create multiple threads for each process and then split those processes over multiple CPUs? Lastly, is the recursion depth limit an overall limit, or is it per thread/process, and does multiprocessing/threading get around it?
A related question: how do the people trying to break codes (RSA, wireless keys, etc.) via brute force overcome this problem? I assume they are somehow scaling their mathematical processes over multiple CPUs. This, or any example to build my understanding, would be great.
Any tips/suggestions would be great
Thanks!
Such a loop wouldn't benefit much at all from threading. Consider that you're doing a series of additions, whose intermediate values depend on the previous iterations. This can't be parallelized, because the threads would be stomping on each other's values and overwriting things. You can lock the data so only one thread works on it at a time, but then you lose any benefit of having multiple threads working on that data.
Threads work best when they have independent data sets. E.g. a graphics renderer is a perfect example. Each thread renders a subset of the larger image - they may share common data sources for texture/vertex/color/etc... data, but each thread has its own little section of the total image to work on, and doesn't touch other areas of the image. Whatever thread #1 does on its little section of pixels won't affect what thread #2 is doing elsewhere in the image.
For your related question, password cracking is another example where threading/multiprocessing makes sense. Each thread goes off on its own testing multiple possible passwords against one common "to be cracked" list. What one thread is doing doesn't affect any of the other cracker threads, unless you get a match, which may mean all threads abort since the job is "done".
Once threads become interdependent on each other, you lose a lot of the benefits of having multiple threads. They'll spend more time waiting for each other to finish than they'll spend doing actual work. Of course, this doesn't mean you should never use threads. Sometimes it does make sense to have multiple threads, even if they are interdependent. E.g. a graphics thread + sound effects thread + action processor thread + A.I. calculations thread, etc... in a game. Each one is nominally dependent on the others, but while the sound thread is busy generating the bang+ricochet audio for the gun the player just shot, the A.I. thread is off calculating what the game's mobs are doing, the graphics thread is drawing some clouds in the background, etc...
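To make the cracking example concrete, here is a sketch where each worker process tests an independent slice of candidates (the wordlist and target hash are placeholders):

import hashlib
import multiprocessing

TARGET = hashlib.sha256(b'hunter2').hexdigest()  # placeholder "to be cracked" hash

def check(candidate):
    # Each candidate is tested independently; no shared mutable state.
    if hashlib.sha256(candidate.encode()).hexdigest() == TARGET:
        return candidate
    return None

if __name__ == '__main__':
    wordlist = ['password', '123456', 'hunter2', 'letmein']  # placeholder
    with multiprocessing.Pool() as pool:
        for result in pool.imap_unordered(check, wordlist):
            if result is not None:
                print('match:', result)
                pool.terminate()  # job is "done"; stop the other workers
                break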
Threading kinda sorta implies multiple stacks, recursion a single stack. That said, if you get to the recurse-left, recurse-right part and decide to spawn threads for the sub-problems when the current thread count is "low", and do straight recursion otherwise, you can combine the concepts.
But regular Python is not a good language for this pattern. CPython's threads are serialized by the Global Interpreter Lock when running Python code, so you won't actually pick up any multi-core goodness.
Phunctor is correct that the threading library is a poor choice for parallelizing this type of problem, due to the "Global Interpreter Lock" that prevents multiple threads from executing Python code in parallel.
Where the threading library can be highly useful, though, is when each thread's code spends a lot of time waiting for I/O to happen. So, for example, if you're implementing a server that has to hit the disk or wait on a network response, servicing a request in each thread can be very efficient, since the threading library can favor the ones that are not waiting on I/O and thus maximize use of the Python interpreter. (In a single thread, you'd have to use a tight loop checking the statuses of your I/O requests, which would tend to be wasteful as load got high.)
I'm just starting to work on a tornado application that is having some CPU issues. The CPU time will monotonically grow as time goes by, maxing out the CPU at 100%. The system is currently designed to not block the main thread. If it needs to do something that blocks and asynchronous drivers aren't available, it will spawn another thread to do the blocking operation.
Thus we have the main thread being almost totally CPU-bound and a bunch of other threads that are almost totally IO-bound. From what I've read, this seems to be the perfect way to run into problems with the GIL. Plus, my profiling shows that we're spending a lot of time waiting on signals (which I'm assuming is what __semwait_signal is doing), which is consistent with the effects the GIL would have in my limited understanding.
If I use sys.setcheckinterval to set the check interval to 300, the CPU growth slows down significantly. What I'm trying to determine is whether I should increase the check interval further, leave it at 300, or be wary of raising it at all. After all, I notice that CPU performance gets better, but I'm a bit concerned that this will negatively impact the system's responsiveness.
Of course, the correct answer is probably that we need to rethink our architecture to take the GIL into account. But that isn't something that can be done immediately. So how do I determine the appropriate course of action to take in the short-term?
The first thing I would check is that you're properly exiting threads. It's very hard to figure out what's going on from just your description, but you use the word "monotonically," which implies that CPU use is tied to time rather than to load.
You may very well be running into Python's threading limits, but that should vary up and down with load (number of active threads), and CPU usage (context-switching costs) should fall as those threads exit. Is there some reason for a thread, once created, to live forever? If so, prioritize that rearchitecture. Otherwise, the short term is to figure out why CPU usage is tied to time and not load. It implies that each new thread has a permanent, irreversible cost in your system - meaning it never exits.
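A quick way to confirm that suspicion (a diagnostic sketch; the 60-second interval is arbitrary) is to log the live-thread count periodically and see whether it climbs with uptime rather than with load:

import threading
import time

def report_threads():
    while True:
        # If this number grows monotonically, threads are never exiting.
        print(f"live threads: {threading.active_count()}",
              [t.name for t in threading.enumerate()])
        time.sleep(60)

threading.Thread(target=report_threads, daemon=True).start()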