I've researched this topic quite a bit and can't seem to come to a conclusion.
So I know OpenCL can be used for parallel processing on both the GPU and CPU (in contrast to CUDA). Since I want to do parallel processing with both the GPU and CPU, would it be better to use Python's multiprocessing module plus PyOpenCL/PyCUDA, or just use PyOpenCL for both GPU and CPU parallel programming?
I'm pretty new to this, but intuitively I would imagine the multiprocessing module to be the best way to do CPU parallel processing in Python.
Any help or direction would be much appreciated.
I don't know if you already got your answer, but keep in mind that GPUs are designed for floating-point operations, and running a complete Python process on one could be slower than you might expect from a GPU.
In any case, since you are new to parallel processing, you should start with the multiprocessing module: GPU programming, and the OpenCL library itself, are difficult to learn when you have no foundation.
You may take a look here https://philipwfowler.github.io/2015-01-13-oxford/intermediate/python/04-multiprocessing.html
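As a first step, here is a minimal sketch of CPU-parallel work with multiprocessing.Pool (not from the linked lesson; work() is a hypothetical stand-in for your actual computation):

```python
# A minimal multiprocessing sketch; work() is a hypothetical placeholder
# for whatever per-item computation you actually need.
from multiprocessing import Pool

def work(n):
    return n * n                    # replace with your real computation

if __name__ == "__main__":          # guard is required on Windows/macOS spawn
    with Pool() as pool:            # defaults to one process per CPU core
        print(pool.map(work, range(10)))
```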
I have a Python script that is currently using multiprocessing to perform tasks in parallel. My advisor recommended that I use GNU Parallel to speed it up, since Python programs always execute on a single core. Should I keep the multiprocessing script as it is and use GNU Parallel on top of it? Or should I remove the multiprocessing part and then use GNU Parallel? Does it make a difference?
Does it make a difference?
There is a really simple answer: Try it and measure.
The performance of parallelization these days depends on many factors, and the right choice really comes down to your application (and sometimes your hardware).
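As a sketch of what "measure" can look like (the workload and task() are hypothetical stand-ins for your real script), you could time the serial and multiprocessing versions before layering GNU Parallel on top:

```python
# A "try it and measure" sketch; task() is a hypothetical CPU-bound stand-in.
import time
from multiprocessing import Pool

def task(n):
    return sum(i * i for i in range(n))    # stand-in CPU-bound work

if __name__ == "__main__":
    work = [200000] * 32

    t0 = time.perf_counter()
    serial = [task(n) for n in work]
    t1 = time.perf_counter()

    with Pool() as pool:                   # one process per core by default
        parallel = pool.map(task, work)
    t2 = time.perf_counter()

    assert serial == parallel
    print("serial: %.2fs  multiprocessing: %.2fs" % (t1 - t0, t2 - t1))
```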
I've been using the SBM algorithm, implemented in the graph-tool package. I need to process a huge amount of data and need to run it in parallel.
I know that OpenMP is activated by default in this package and used in the compatible algorithms, but the documentation doesn't specify which algorithms those are.
I've tried openmp_enabled() and openmp_set_num_threads(), and also export OMP_NUM_THREADS=16. Everything seems fine, but when I check the running processes, nothing is running in parallel.
Do you have any experience running SBM inference in parallel?
Although graph-tool uses OpenMP, not every algorithm is implemented in parallel, simply because this cannot be done in some cases. The SBM inference algorithm implemented in graph-tool is based on MCMC, which cannot be parallelized in general. Because of this, enabling OpenMP will have no effect.
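To make that concrete, here is a minimal sketch using graph-tool's OpenMP helpers and a small bundled example graph (your data and parameters will differ): the thread settings apply globally, but the MCMC-based SBM inference still runs on a single core.

```python
# A minimal sketch (assuming graph-tool is installed; the "football" dataset
# is just an example): OpenMP settings take effect globally, but the
# MCMC-based SBM inference below remains serial regardless.
import graph_tool.all as gt

print(gt.openmp_enabled())            # True if built with OpenMP support
gt.openmp_set_num_threads(16)         # only affects parallelized algorithms

g = gt.collection.data["football"]    # small bundled example graph
state = gt.minimize_blockmodel_dl(g)  # SBM inference: serial MCMC
print(state.get_B())                  # number of inferred blocks
```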
Is there any benefit in using multiprocessing versus running several Python interpreters in parallel for long-running, embarrassingly parallel tasks?
At the moment, I'm just firing up several python interpreters that run the analysis over slices of input data, each of them dumping the results into a separate pickle file. It is trivial to slice the input data as well as to combine the results. I'm using python 3.4 on OS X and linux for that.
Is rewriting the code with the multiprocessing module worth the effort? It seems to me that it is not, but then I'm far from being an expert...
Well, one advantage of multiprocessing is that there are tools for interprocess communication, and you can have shared variables with different sorts of restrictions. But if you don't need that, your approach is perfectly viable. What you are doing is basically what most map-reduce systems automate. I am sure there is some slight performance penalty in starting up an entire extra interpreter, but it's probably insignificant.
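For comparison, here is a minimal single-script sketch of the same slice-and-combine workflow (analyse() and the slicing are hypothetical placeholders for the real analysis):

```python
# A single-script version of the slice/analyse/combine workflow; analyse()
# and the data are hypothetical placeholders for the real analysis.
import pickle
from multiprocessing import Pool

def analyse(chunk):
    return [x * x for x in chunk]              # stand-in per-slice analysis

if __name__ == "__main__":
    data = list(range(1000))
    chunks = [data[i::4] for i in range(4)]    # trivial slicing, as described
    with Pool(processes=4) as pool:
        results = pool.map(analyse, chunks)    # one worker per slice
    with open("results.pickle", "wb") as f:
        pickle.dump(results, f)                # combine once, in the parent
```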
Google is sponsoring an Open Source project to increase the speed of Python by 5x.
Unladen Swallow seems to have a good project plan.
Why is concurrency such a hard problem?
Is LLVM going to solve the concurrency problem?
Are there solutions other than Multi-core for Hardware advancement?
LLVM is several things at once: a kind of virtual machine/optimizing compiler, combined with different frontends that take input in a particular language and output the result in an intermediate language. This intermediate output can be run with the virtual machine, or can be used to generate a standalone executable.
The problem with concurrency is that, although it has been used for a long time in scientific computing, it has only recently become common in consumer apps. So while it is widely known how to write a scientific calculation program that achieves great performance, writing a mail user agent or word processor that is good at concurrency is a completely different thing. Also, most current OSes were designed with a single processor in mind, and they may not be fully prepared for multicore processors.
The benefit of LLVM with respect to concurrency is that you have an intermediate output, and if there are advances in concurrency in the future, then by updating your interpreter you instantly gain those benefits in all LLVM-compiled programs. This is not so easy if you have compiled to a standalone executable. So LLVM doesn't solve the concurrency problem per se, but it leaves an open door for future enhancements.
Sure, there are other possible hardware advances, like quantum computers, DNA computers, etc. But we have to wait for them to become a reality.
The switch to LLVM itself isn't solving the concurrency problem. That's being solved separately, by getting rid of the Global Interpreter Lock.
I'm not sure how I feel about that; I use threads mainly to deal with blocking I/O, not to take advantage of multicore processors (for that, I would use the multiprocessing module to spawn separate processes).
So I kind of like the GIL; it makes my life a lot easier not having to think about tricky synchronization issues.
LLVM takes care of the nitty-gritty of code generation, so it lets them rewrite Psyco in a way that's more general, portable, and maintainable. That in turn allows them to rewrite the CPython core, which lets them experiment with alternate GCs and other things needed to improve Python's support for concurrency.
In other words, LLVM doesn't solve the concurrency problem, it just frees up your hands so YOU can solve it.
A reliable coder friend told me that Python's current multi-threading implementation is seriously buggy - enough to avoid using it altogether. What can be said about this rumor?
Python threads are good for concurrent I/O programming. Threads are swapped off the CPU as soon as they block waiting for input from a file, the network, etc., which allows other Python threads to use the CPU in the meantime. This would allow you to write a multi-threaded web server or web crawler, for example.
However, Python threads are serialized by the GIL when they enter the interpreter core. This means that if two threads are crunching numbers, only one can run at any given moment. It also means that you can't take advantage of multi-core or multi-processor architectures.
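As an illustration of the I/O-bound case (the example.com URLs are placeholders), blocking network reads release the GIL, so plain threads can overlap the waits:

```python
# I/O-bound threading sketch; the example.com URLs are placeholders. The GIL
# is released while each thread blocks on the network, so the fetches overlap.
import threading
import urllib.request

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        print(url, len(resp.read()))

urls = ["http://example.com/a", "http://example.com/b"]
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```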
There are solutions like running multiple Python interpreters concurrently, using a C-based threading library. This is not for the faint of heart, and the benefits might not be worth the trouble. Let's hope for an all-Python solution in a future release.
The standard implementation of Python (generally known as CPython, as it is written in C) uses OS threads, but because of the Global Interpreter Lock, only one thread at a time is allowed to run Python code. Within those limitations, the threading libraries are robust and widely used.
If you want to be able to use multiple CPU cores, there are a few options. One is to use multiple python interpreters concurrently, as mentioned by others. Another option is to use a different implementation of Python that does not use a GIL. The two main options are Jython and IronPython.
Jython is written in Java, and is now fairly mature, though some incompatibilities remain. For example, the web framework Django does not run perfectly yet, but is getting closer all the time. Jython is great for thread safety, comes out better in benchmarks and has a cheeky message for those wanting the GIL.
IronPython uses the .NET framework and is written in C#. Compatibility is reaching the stage where Django can run on IronPython (at least as a demo) and there are guides to using threads in IronPython.
The GIL (Global Interpreter Lock) might be a problem, but the API is quite OK. Try out the excellent processing module, which implements the Threading API for separate processes. I am using that right now (albeit on OS X, have yet to do some testing on Windows) and am really impressed. The Queue class is really saving my bacon in terms of managing complexity!
EDIT: it seems the processing module is being included in the standard library as of version 2.6 (import multiprocessing). Joy!
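Here is a minimal sketch of the Queue-based pattern praised above, written against the stdlib multiprocessing module (worker() and the squaring task are hypothetical):

```python
# A minimal sketch of the Queue pattern described above, using the stdlib
# multiprocessing module; worker() and the squaring task are hypothetical.
from multiprocessing import Process, Queue

def worker(tasks, results):
    for item in iter(tasks.get, None):    # None acts as a shutdown sentinel
        results.put(item * item)

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results)) for _ in range(2)]
    for p in procs:
        p.start()
    for i in range(10):
        tasks.put(i)
    for _ in procs:
        tasks.put(None)                   # one sentinel per worker
    print(sorted(results.get() for _ in range(10)))
    for p in procs:
        p.join()
```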
As far as I know there are no real bugs, but threading performance in CPython is really bad (compared to most other threading implementations, though usually good enough if most of what the threads do is block) due to the GIL (Global Interpreter Lock), so it is really implementation specific rather than language specific. Jython, for example, does not suffer from this because it uses the Java thread model.
See this post on why it is not really feasible to remove the GIL from the CPython implementation, and this one for some practical elaboration and workarounds.
Do a quick google for "Python GIL" for more information.
If you want to code in Python and get great threading support, you might want to check out IronPython or Jython. Since Python code in IronPython and Jython runs on the .NET CLR and the Java VM respectively, they enjoy the great threading support built into those runtimes. In addition, IronPython doesn't have the GIL, an issue that prevents CPython threads from taking full advantage of multi-core architectures.
I've used it in several applications and have never seen nor heard of threading being anything other than 100% reliable, as long as you know its limits. You can't spawn 1000 threads at the same time and expect your program to run properly on Windows; however, you can easily write a worker pool and just feed it 1000 operations, keeping everything nice and under control.
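A minimal sketch of that worker-pool pattern (op() is a hypothetical stand-in for the real operation): a fixed number of threads drains a queue of 1000 operations instead of spawning 1000 threads.

```python
# Worker-pool sketch; op() is a hypothetical stand-in for the real operation.
# Eight threads drain a queue of 1000 operations instead of 1000 threads.
import queue
import threading

results = []
q = queue.Queue()

def op(n):
    return n * n                      # placeholder operation

def worker():
    while True:
        n = q.get()
        if n is None:                 # shutdown sentinel
            break
        results.append(op(n))         # list.append is thread-safe in CPython

workers = [threading.Thread(target=worker) for _ in range(8)]
for t in workers:
    t.start()
for n in range(1000):                 # feed it 1000 operations
    q.put(n)
for _ in workers:
    q.put(None)
for t in workers:
    t.join()
print(len(results))                   # 1000 results from only 8 threads
```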