What is the point of Pool.apply() in Python if it blocks?

I am trying to understand what the point of Pool.apply() is in a parallelisation module.
My understanding is that it is synchronous, so no parallel processing occurs; you just get the overhead of running code in a separate process.

I agree that it's not useful for parallelization, but it might have other uses. For example, if, through poor life choices, your function has an unholy mess of side effects like changing global variables (probably because you only intended to execute it in a Pool when writing it), running it in a separate process keeps those side effects away from the parent process, as sketched below.
Of course, using it like that is most likely an anti-pattern, so I guess this function only exists for compatibility reasons...
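Here is a minimal sketch of that isolation point (the messy() function and its global counter are hypothetical): the worker mutates a global, but only in its own process, so the parent's copy is untouched.

    from multiprocessing import Pool

    counter = 0

    def messy(x):
        global counter
        counter += 1               # side effect happens in the worker process
        return x * 2, counter

    if __name__ == "__main__":
        with Pool(processes=1) as pool:
            print(pool.apply(messy, (21,)))  # blocks until done: prints (42, 1)
        print(counter)                       # the parent's global is still 0

apply() blocks exactly as the question says, but the global only changes in the child process.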

Related

Python multiprocessing.Pool(): am I limited in what I can return?

I am using Python's multiprocessing pool. I have been told, although I have not experienced this myself so I cannot post code, that one cannot just "return" anything from a multiprocessing.Pool() worker back to the multiprocessing.Pool()'s main process. Words like "pickling" and "lock" were being thrown around, but I am not sure.
Is this correct, and if so, what are these limitations?
In my case, I have a function which generates a mutable class instance and then returns it after doing some work with it. I'd like to have 8 processes run this function, generate their own instances, and return each of them when they're done. Full code is NOT written yet, so I cannot post it.
Any issues I may run into?
My code is: res = pool.map(foo, list_of_parameters)
Q : "Is this correct, and if so, what are these limitations?"
It depends. It is correct, but the SER/DES (serialisation/deserialisation) processing is the problem here, as a pair of disjoint processes has to "send" something between them (there: a task specification with parameters; back: the long-awaited result).
Early versions of the standard-library module responsible for this, the pickle module, could not serialise some more complex types of objects, class instances being one such example.
Newer and newer versions keep evolving, sure, yet this SER/DES step remains one of the single points of failure (SPoFs) that can prevent smooth code execution in such cases.
Next are the cases that finish by throwing a MemoryError: they request so much memory that the O/S simply rejects any new allocation request, and the whole attempt to produce and send pickle.dumps( ... ) crashes unresolvably.
Do we have any remedies available?
Well, maybe yes, maybe no. Mike McKerns's dill may help in some cases to better handle complex objects in SER/DES processing.
You may try import dill as pickle; pickle.dumps(...) and test your hot candidates for Class() instances, to see whether they get a chance to pass through. If not, there is no way to use this low-hanging-fruit-first trick.
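As a quick sanity check, a sketch like the following shows plain pickle failing where dill passes (the Result class with its lambda attribute is a made-up hot candidate; dill is a third-party package, pip install dill):

    import pickle

    import dill  # third-party drop-in replacement for pickle

    class Result:
        def __init__(self):
            self.payload = [1, 2, 3]
            self.finish = lambda: sum(self.payload)  # lambdas defeat plain pickle

    obj = Result()

    for name, ser in (("pickle", pickle), ("dill", dill)):
        try:
            ser.loads(ser.dumps(obj))
            print(name, "round-trip OK")
        except Exception as exc:
            print(name, "round-trip failed:", exc)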
Next, a less easy way would be to drop the dependence on hardwired multiprocessing.Pool() instantiations and their (above-)limited SER/comms/DES methods, and to design your processing strategy as a distributed-computing system, based on a communicating-agents paradigm.
That way you benefit from a right-sized, just-enough communication interchange between agents intelligent enough (as you've designed them) to know what to tell one another, without sending mastodon-sized BLOBs that accidentally crash the processing at any of the SPoFs you can neither prevent nor salvage ex-post.
I see no better ways forward, nor can I foresee any in 2020-Q4, for doing this safely and smartly.

Does Python multiprocessing module need multi-core CPU?

Do I need a multi-core CPU to take advantage of the Python multiprocessing module?
Also, can someone tell me how it works under the hood?
multiprocessing asks the OS to launch one or more new processes, running the same version of Python and the same version of your script. It can also set up pipes or other ways of sharing data directly between them.
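A minimal sketch of that mechanism, assuming nothing beyond the standard library: one child process, plus a Pipe for sending data directly between the two.

    from multiprocessing import Pipe, Process

    def child(conn):
        conn.send("hello from the child")  # runs in a separate OS process
        conn.close()

    if __name__ == "__main__":
        parent_end, child_end = Pipe()
        p = Process(target=child, args=(child_end,))
        p.start()
        print(parent_end.recv())  # receives the message through the pipe
        p.join()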
It usually works like magic; when you peek under the hood, sometimes it looks like sausage being made, but you can usually understand the sausage grinders. The multiprocessing docs do a great job explaining things further (they're long, but then there's a lot to explain). And if you need even more under-the-hood knowledge, the docs link to the source, which is pretty readable Python code. If you have a specific question after reading, come back to SO and ask a specific question.
Meanwhile, you can get some of the benefits of multiprocessing without multiple cores.
The main benefit—the reason the module was designed—is parallelism for speed. And obviously, without 4 cores, you aren't going to cut your time down to 25%. But sometimes, you actually can get a bit of speedup even with a single core, especially if that core has "hyperthreading" or similar technologies. I've seen times come down to 80%, or even 60%. More commonly, they'll go up to 108% instead (because you did get a small benefit from hyperthreading, but the overhead cost was higher than the gain). But try it with your code and see.
Meanwhile, you get all of the side benefits:
Concurrency: You can run multiple tasks at once without them blocking each other. Of course threads, asyncio, and other techniques can do this too.
Isolation: You can run multiple tasks at once without the risk of one of them changing data that another one wasn't expecting to change.
Crash protection: If a child task segfaults, only that task is affected. (Well, you still have to be careful of any side-effects—if it crashed in the middle of writing a file that another task expects to be in a consistent shape, you're still in trouble.)
You can also use the multiprocessing module without multiple processes. Sometimes you just want the higher-level API of the module, but you want to use it with threads; multiprocessing.dummy does that. And you can switch back and forth in a couple lines of code to test it both ways. Or you can use the higher-level concurrent.futures.ProcessPoolExecutor wrapper, if its model fits what you want to do. Besides often being simpler, it lets you switch between threads and processes by just changing one word in one line.
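For example, a sketch of that switch (work() is a placeholder function): change one import line and the same Pool code runs on threads or on processes.

    from multiprocessing.dummy import Pool  # thread-backed Pool
    # from multiprocessing import Pool      # swap in this line for real processes

    def work(n):
        return n * n

    if __name__ == "__main__":
        with Pool(4) as pool:
            print(pool.map(work, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]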
Also, redesigning your program around multiprocessing takes you a step closer to further redesigning it as a distributed system that runs on multiple separate machines. It forces you to deal with questions like how your tasks communicate without being able to share everything, without forcing you to deal with further questions like how they communicate without reliable connections.

Why do we need gevent.queue?

My understanding of Gevent is that it's merely concurrency and not parallelism. My understanding of concurrency mechanisms like Gevent and AsyncIO is that, nothing in the Python application is ever executing at the same time.
The closest you get is calling a non-blocking IO method and, while waiting for that call to return, letting other methods within the Python application execute. Again, none of the methods within the Python application ever actually execute Python code at the same time.
With that said, why is there a need for gevent.queue? It sounds to me like the Python application doesn't really need to worry about more than one Python method accessing a queue instance at a time.
I'm sure there's a scenario that I'm not seeing that gevent.queue fixes, I'm just curious what that is.
Although you are right that no two statements execute at the same time within a single Python process, you might want to ensure that a series of statements executes atomically, or you might want to impose an order on the execution of certain statements, and in those cases things like gevent.queue become useful.
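For example, a minimal sketch (gevent is third-party; the producer/consumer names are made up) where the queue imposes FIFO order and hands work between greenlets:

    import gevent
    from gevent.queue import Queue

    tasks = Queue()

    def producer():
        for i in range(5):
            tasks.put(i)          # hand work to the consumer in FIFO order
            gevent.sleep(0)       # yield so the consumer can run interleaved
        tasks.put(StopIteration)  # sentinel: ends iteration over the queue

    def consumer():
        for item in tasks:        # blocks (cooperatively) until an item arrives
            print("processing", item)

    gevent.joinall([gevent.spawn(producer), gevent.spawn(consumer)])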

How to choose between the different concurrency methods available in Python?

There are different ways of doing concurrency in Python; below is a simple list:
process-based: subprocess.Popen, multiprocessing.Process, old-fashioned os.system, os.popen, os.exec*
thread-based: threading.Thread
microthread-based: greenlet
I know the difference between thread-based concurrency and process-based concurrency, and I know some (but not too much) about the GIL's impact on CPython's thread support.
For a beginner who wants to implement some level of concurrency, how should they choose between these? And what are the general differences between them? Are there any more ways to do concurrency in Python?
I'm not sure if I'm asking the right question, please feel free to improve this question.
The reason all three of these mechanisms exist is that they have different strengths and weaknesses.
First, if you have huge numbers of small, independent tasks, and there's no sensible way to batch them up (typically, this means you're writing a C10k server, but that's not the only possible case), microthreads win hands down. You can only run a few hundred OS threads or processes before everything either bogs down or just fails. So, either you use microthreads, or you give up on automatic concurrency and start writing explicit callbacks or coroutines. This is really the only time microthreads win; otherwise, they're just like OS threads except a few things don't work right.
Next, if your code is CPU-bound, you need processes. Microthreads are an inherently single-core solution; threads in Python generally can't parallelize well because of the GIL; processes get as much parallelism as the OS can handle. So, processes will let your 4-core system run your code 4x as fast; nothing else will. (In fact, you might want to go further and distribute across separate computers, but you didn't ask about that.) But if your code is I/O-bound, core-parallelism doesn't help, so threads are just as good as processes.
If you have lots of shared, mutable data, things are going to be tough. Processes require explicitly putting everything into sharable structures, like using multiprocessing.Array in place of list, which gets nightmarishly complicated. Threads share everything automatically—which means there are race conditions everywhere, which means you need to think through your flow very carefully and use locks effectively. With processes, an experienced developer can build a system that works on all of the test data but has to be reorganized every time you give it a new set of inputs. With threads, an experienced developer can write code that runs for weeks before accidentally and silently scrambling everyone's credit card numbers.
Whichever of those two scares you more—do that one, because you understand the problem better. Or, if it's at all possible, step back and try to redesign your code to make most of the shared data independent or immutable. This may not be possible (without making things either too slow or too hard to understand), but think about it hard before deciding that.
If you have lots of independent data or shared immutable data, threads clearly win. Processes need either explicit sharing (like multiprocessing.Array again) or marshaling. multiprocessing and its third-party alternatives make marshaling pretty easy for the simple cases where everything is picklable, but it's still not as simple as just passing values around directly, and it's also a lot slower.
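To make the explicit-sharing point concrete, a minimal sketch (the fill() worker is hypothetical): a plain list assigned in a child would be invisible to the parent, but a multiprocessing.Array lives in shared memory.

    from multiprocessing import Array, Process

    def fill(shared, start):
        for i in range(start, start + 4):
            shared[i] = i * 10    # writes land in shared memory

    if __name__ == "__main__":
        shared = Array("i", 8)    # eight C ints, zero-initialized, shareable
        workers = [Process(target=fill, args=(shared, s)) for s in (0, 4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(list(shared))       # [0, 10, 20, 30, 40, 50, 60, 70]

Each worker writes a disjoint slice here, so no locking is needed in this particular sketch; overlapping writes would need the Array's built-in lock (shared.get_lock()).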
Unfortunately, most cases where you have lots of immutable data to pass around are the exact same cases where you need CPU parallelism, which means you have a tradeoff. And the best answer to this tradeoff may be OS threads on your current 4-core system, but processes on the 16-core system you have in 2 years. (If you organize things around, e.g., multiprocessing.pool.ThreadPool or concurrent.futures.ThreadPoolExecutor, switching to Pool or ProcessPoolExecutor later is trivial—it can even be a runtime configuration switch—which pretty much solves the problem. But this isn't always possible.)
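A minimal sketch of that one-word switch, with a made-up cpu_bound() function standing in for real work:

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def cpu_bound(n):
        return sum(i * i for i in range(n))

    def run(executor_cls):
        # The only thing that changes between threads and processes is which
        # executor class gets passed in.
        with executor_cls(max_workers=4) as pool:
            return list(pool.map(cpu_bound, [10**5] * 8))

    if __name__ == "__main__":
        print(run(ThreadPoolExecutor)[0])   # threads: fine for I/O-bound work
        print(run(ProcessPoolExecutor)[0])  # processes: real core-parallelism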
Finally, if your application inherently requires an event loop (e.g., a GUI app or a network server), pick the framework you like first. Coding with, say, PySide vs. wx, or twisted vs. gevent, is a bigger difference than coding with microthreads vs. OS threads. And, once you've picked the framework, see how much you can take advantage of its event loop where you thought you needed real concurrency. For example, if you need some code to run every 30 seconds, don't start a thread (micro- or OS) for that, ask the framework to schedule it however it wants.

Program structure in long running data processing python script

For my current job I am writing some long-running (think hours to days) scripts that do CPU-intensive data processing. The program flow is very simple: it proceeds into the main loop, completes the main loop, saves output, and terminates. The basic structure of my programs tends to be like so:
<import statements>
<constant declarations>
<misc function declarations>

def main():
    for blah in blahs():
        <lots of local variables>
        <lots of tightly coupled computation>
        for something in somethings():
            <lots more local variables>
            <lots more computation>
    <etc., etc.>
    <save results>

if __name__ == "__main__":
    main()
This gets unmanageable quickly, so I want to refactor it into something more maintainable, without sacrificing execution speed.
Each chunk of code relies on a large number of variables, however, so refactoring parts of the computation out into functions would make the parameter lists grow out of hand very quickly. Should I put this sort of code into a Python class and change the local variables into class variables? It doesn't make a great deal of sense to me conceptually to turn the program into a class, as the class would never be reused, and only one instance would ever be created per run.
What is the best-practice structure for this kind of program? I am using Python, but the question is relatively language-agnostic, assuming modern object-oriented language features.
First off, if your program is going to be running for hours/days then the overhead of switching to using classes/methods instead of putting everything in a giant main is pretty much non-existent.
Additionally, refactoring (even if it does involve passing a lot of variables) should help you improve speed in the long run. Profiling a well-designed application is much easier because you can pinpoint the slow parts and optimize there. Maybe a new library comes along that's highly optimized for your calculations: a well-designed program will let you plug it in and test it right away. Or perhaps you decide to write a C extension module to improve the speed of a subset of your calculations; a well-designed application will make that easy too.
It's hard to give concrete advice without seeing <lots of tightly coupled computation> and <lots more computation>. But I would start by making every for block its own method, and go from there.
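A minimal sketch of that refactoring (all names here are hypothetical stand-ins for the question's placeholders): each for block becomes a method, and the formerly-local variables become attributes, so no parameter list explodes.

    class Processor:
        def __init__(self, blahs):
            self.blahs = blahs    # input data
            self.results = []     # shared state instead of deep local variables

        def run(self):
            for blah in self.blahs:
                self.process_blah(blah)
            self.save_results()

        def process_blah(self, blah):
            # stands in for <lots of tightly coupled computation>
            total = 0
            for something in range(blah):
                total += self.process_something(something)
            self.results.append(total)

        def process_something(self, something):
            # stands in for <lots more computation>; easy to profile and test
            return something * something

        def save_results(self):
            print(self.results)

    if __name__ == "__main__":
        Processor([3, 5]).run()

Each method can now be profiled, unit-tested, or swapped out independently.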
Not too clean, but works well in little projects...
You can start using modules as if they were singleton instances, and create real classes only when you feel the complexity of the module or the computation justifies them.
If you do that, you will want to use "import module" and not "from module import stuff" -- it's cleaner and will keep working if "stuff" is ever reassigned. Besides, it's recommended in the Google Python style guidelines.
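A sketch of why "import module" works better (this spans two hypothetical files, state.py and main.py, so it is not a single runnable script): a from-import binds a copy of the name at import time and never sees later reassignments.

    # --- state.py (hypothetical module used as a singleton) ---
    counter = 0

    def bump():
        global counter
        counter += 1  # reassigns the module-level name

    # --- main.py ---
    import state  # not: from state import counter

    state.bump()
    print(state.counter)  # 1 -- attribute lookup sees the reassigned value;
                          # a from-imported counter would still be 0 here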
Using a class (or classes) can help you organize your code.
Simplicity of form (such as through use of class attributes and methods) is important because it helps you see your algorithm, and can help you more easily unit test the parts.
IMO, these benefits far outweigh the slight loss of speed that may come with using OOP.
