How's Python Multiprocessing Implemented on Windows?

Given the absence of a fork() call on Windows, how is the multiprocessing package in Python 2.6 implemented there? Is it built on top of Win32 threads, some sort of fake fork, or just a compatibility layer over the existing multithreading support?

It's done by launching a subprocess via sys.executable (i.e. starting a new Python process), then serializing the necessary state and sending it over a pipe - a poor man's clone of the current process. This is the cause of the extra restrictions you encounter when using multiprocessing on the Windows platform.
You may also be interested in viewing Jesse Noller's talk from PyCon about multiprocessing where he discusses its use.
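The practical upshot of that spawn-based approach can be sketched as follows (names here are illustrative): because the child is a fresh interpreter that re-imports the main module, the entry point needs an if __name__ == '__main__' guard, and the target function must live at module level so it can be pickled and found by the child.

```python
# Sketch of the restrictions implied by Windows' spawn-style multiprocessing:
# the child starts as a fresh interpreter, re-imports this module, and
# receives its work over a pipe, so the target must be importable/picklable.
from multiprocessing import Process, Queue

def worker(q):
    # Defined at module level so the child process can import it.
    q.put("hello from the child")

if __name__ == "__main__":  # without this guard, Windows would spawn children recursively
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```

On Unix the same code works too, but fork() makes the guard less critical there; on Windows, omitting it raises an error (or, historically, forked bombs of interpreters).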

Related

It is said that python doesn't support multithreading, then why does it have a threading module?

I have been working with the Python programming language. Python is arguably a slow language due to many factors, one of which is said to be its lack of multithreading. If it doesn't support multithreading, why does it have a threading module?
Python's apparent single-threaded behaviour is due to the GIL (Global Interpreter Lock). When people describe Python as single-threaded, they mean that within one interpreter process, only one thread executes Python bytecode at any given moment. You can still create multiple threads, or spin up multiple processes, but the GIL serializes bytecode execution among the threads of a single process.
JavaScript runtimes, by contrast, can push work onto other threads (e.g. via workers) without an equivalent of the GIL constraining them.
Check out this video for some more info: https://www.youtube.com/watch?v=m2yeB94CxVQ

Python and Threads with PyPy?

I have a kivy application in python which uses some threads.
As Python cannot run these threads on different cores due to the Global Interpreter Lock, I would like to try PyPy and see if I can make the threads run faster on different cores, since PyPy is different and offers Stackless support (whatever that is? :).
Does somebody have information on how to take a simple Python program that launches some threads via the threading module, and run it with the PyPy interpreter so that it uses this Stackless feature?
PyPy won't solve Python's limitation of running only one thread at a time, since it also makes use of a GIL - http://doc.pypy.org/en/latest/faq.html#does-pypy-have-a-gil-why
Besides that, Kivy is a complex project embedding Python itself - although I don't know it very well, I doubt it is possible to switch the Python it uses to PyPy.
Depending on what you are doing, you may want to use the multiprocessing module instead of threading - it offers a largely drop-in API that dispatches calls to Python functions in separate processes, and can therefore take advantage of multiple cores.
https://docs.python.org/3/library/multiprocessing.html
Multiprocessing is standard in CPython and can likely be used from within Kivy, if (and only if) the code in the subprocesses only does number-crunching and the like, while all user interaction and display updates happen in the main process.
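The division of labour suggested above can be sketched like this (a hedged example, not Kivy-specific: the function name and workloads are made up): CPU-bound work goes to a process pool, which sidesteps the GIL, while the main process stays free for the UI.

```python
# Offload number-crunching to worker processes; only results cross back
# to the main process, which would remain responsible for the UI.
from multiprocessing import Pool

def crunch(n):
    # CPU-bound work that threads could not parallelise under the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = pool.map(crunch, [10_000, 20_000, 30_000])
    print(results)
```

pool.map blocks until all results arrive; in a real GUI you would more likely use apply_async with a callback so the main loop keeps running.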

How to detect a process is running using python

I have to perform a task whenever certain applications are launched. Is there any way to list all the processes running in the operating system, or to detect whenever a new process is created?
Yeah, there is a fair amount of the standard Python library that Jython doesn't implement, probably because it's platform-dependent and too much work for the small Jython community. If you're on Jython, try looking for a Java solution; it's trivial to invoke Java code from Jython.
You could use psutil. Psutil provides information on running processes and system utilization.
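A minimal sketch with psutil (a third-party package, installed via pip install psutil) might look like this: list process names, and detect new processes by comparing PID snapshots between polls. The function names are my own; polling interval and filtering are up to you.

```python
# Enumerate running processes and detect newly created ones by polling.
import psutil

def running_process_names():
    names = set()
    for proc in psutil.process_iter(attrs=["name"]):
        try:
            names.add(proc.info["name"])
        except psutil.NoSuchProcess:
            pass  # the process exited between listing and inspection
    return names

def new_pids(previous_pids):
    # One polling step: PIDs present now that were absent last snapshot.
    return set(psutil.pids()) - previous_pids

if __name__ == "__main__":
    print(sorted(running_process_names())[:5])
```

Polling can miss very short-lived processes; for event-driven detection you would need OS-specific facilities (e.g. WMI on Windows), which psutil does not wrap.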

How do twisted and multiprocessing.Process create zombies?

In Python, using Twisted's LoopingCall, multiprocessing.Process, and multiprocessing.Queue: is it possible to create a zombie process? And if so, how?
A zombie is a process which has completed but whose completion has not yet been noticed (reaped) by the process which started it. Here, it's the Twisted process's responsibility to reap its children.
If you start the process with spawnProcess, everything should always work as expected. However, as described in bug #733 in Twisted (which has long been fixed), there are a plethora of nasty edge-cases when you want to use Twisted with other functions that spawn processes, as Python's API historically made it difficult to cooperate between signal handlers.
This is all fixed in recent versions of the code, but I believe you may still encounter this bug in the following conditions:
You are using a version of Twisted earlier than 10.1.
You are using a version of Python earlier than 2.6.
You are not building Twisted's native extension modules (if you're working from a development checkout or unpacked tarball rather than an installed version, you can fix this with python setup.py build_ext -i).
You are using a module like os.popen or subprocess alongside Twisted.
Hopefully upgrading Twisted or running the appropriate command will fix your immediate issue, but you should still consider using spawnProcess, since that lets you treat process output as a normal event in the reactor event loop.
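Outside Twisted, the reaping concept itself can be sketched with the standard library alone (a minimal illustration, not the questioner's setup): a finished child lingers as a zombie until the parent collects its exit status, and multiprocessing's join() performs exactly that collection.

```python
# join() waits for the child and reaps it, so no zombie remains;
# forgetting to join/poll children is how zombies accumulate.
from multiprocessing import Process

def child():
    pass  # exits immediately

if __name__ == "__main__":
    p = Process(target=child)
    p.start()
    p.join()           # collects the child's exit status (reaps it)
    print(p.exitcode)  # 0 once the child has been reaped
```

Twisted's spawnProcess does the equivalent for you inside the reactor, which is why the answer above recommends it.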

Are Python threads buggy?

A reliable coder friend told me that Python's current multi-threading implementation is seriously buggy - enough to avoid using it altogether. What can be said about this rumor?
Python threads are good for concurrent I/O programming. Threads are swapped off the CPU as soon as they block waiting on a file, the network, etc., allowing other Python threads to use the CPU in the meantime. This makes it possible to write a multi-threaded web server or web crawler, for example.
However, Python threads are serialized by the GIL whenever they enter the interpreter core. This means that if two threads are crunching numbers, only one can run at any given moment, and you can't take advantage of multi-core or multi-processor architectures.
There are workarounds, such as running multiple Python interpreters concurrently using a C-based threading library, but this is not for the faint of heart and the benefits might not be worth the trouble. Let's hope for an all-Python solution in a future release.
The standard implementation of Python (generally known as CPython as it is written in C) uses OS threads, but since there is the Global Interpreter Lock, only one thread at a time is allowed to run Python code. But within those limitations, the threading libraries are robust and widely used.
If you want to be able to use multiple CPU cores, there are a few options. One is to use multiple python interpreters concurrently, as mentioned by others. Another option is to use a different implementation of Python that does not use a GIL. The two main options are Jython and IronPython.
Jython is written in Java, and is now fairly mature, though some incompatibilities remain. For example, the web framework Django does not run perfectly yet, but is getting closer all the time. Jython is great for thread safety, comes out better in benchmarks and has a cheeky message for those wanting the GIL.
IronPython uses the .NET framework and is written in C#. Compatibility is reaching the stage where Django can run on IronPython (at least as a demo) and there are guides to using threads in IronPython.
The GIL (Global Interpreter Lock) might be a problem, but the API is quite OK. Try out the excellent processing module, which implements the Threading API for separate processes. I am using that right now (albeit on OS X, have yet to do some testing on Windows) and am really impressed. The Queue class is really saving my bacon in terms of managing complexity!
EDIT: it seems the processing module has been included in the standard library as of version 2.6 (import multiprocessing). Joy!
As far as I know there are no real bugs, but threading performance in CPython is really bad compared to most other threading implementations (though usually good enough if the threads mostly block), due to the GIL (Global Interpreter Lock). So it is really implementation-specific rather than language-specific; Jython, for example, does not suffer from this because it uses the Java thread model.
See this post on why it is not really feasible to remove the GIL from the cPython implementation, and this for some practical elaboration and workarounds.
Do a quick google for "Python GIL" for more information.
If you want to code in Python and get great threading support, you might want to check out IronPython or Jython. Since Python code in IronPython and Jython runs on the .NET CLR and the Java VM respectively, they enjoy the threading support built into those platforms. In addition, neither has a GIL, the issue that prevents CPython threads from taking full advantage of multi-core architectures.
I've used Python's threading in several applications and have never had, nor heard of, it being anything other than 100% reliable, as long as you know its limits. You can't spawn 1000 threads at the same time and expect your program to run properly on Windows, but you can easily write a worker pool, feed it 1000 operations, and keep everything nice and under control.
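The worker-pool pattern described above can be sketched like this (a minimal illustration; the squaring is a stand-in for real work): a fixed number of threads drain a queue of 1000 operations instead of 1000 threads being spawned at once.

```python
# A fixed pool of threads consumes tasks from a queue until it is empty.
import queue
import threading

def run_pool(tasks, num_workers=8):
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker is done
            with lock:
                results.append(item * item)  # stand-in for real work
            q.task_done()

    for t in tasks:
        q.put(t)
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    out = run_pool(range(1000))
    print(len(out))
```

On modern Python you would reach for concurrent.futures.ThreadPoolExecutor instead, but the hand-rolled version shows what is actually going on.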
