As title say, does Python cStringIO protect their internal structures for multithreading use?
Thank you.
Take a look at an excellent work on explaining GIL, then note that cStringIO is written purely in C, and its calls don't release GIL.
It means that the running thread won't voluntarily switch during read()/write() (with current virtual machine implementation). (The OS will preempt the thread, however other Python threads won't be able to acquire GIL.)
Taking a look at the source: Python-2.7.1/Modules/cStringIO.c there is no mention about internals protection. When in doubt, look at source :)
I assume you are talking about the CPython implementation of Python.
In CPython there is a global interpreter lock which means that only a single thread of Python code can execute at a time. Code written in C will therefore also be effectively single threaded unless it explicitly releases the global lock.
What that means is that if you have multiple Python threads all using cStringIO simultaneously there won't be any problem as only a single call to a cStringIO method can be active at a time and cStringIO never releases the lock. However if you call it directly from C code which is running outside the locked environment you will have problems. Also if you do anything more complex than just reading or writing you will have issues, e.g. if you start using seek as your calls may overlap in unexpected ways.
Also note that some methods such as writelines can invoke Python code from inside the method so in that case you might get other output interleaved inside a single call to writelines.
That is true for most of the standard Python objects: you can safely use objects from multiple threads as the individual operations won't break, but the order in which things happen won't be defined.
It is as "thread-safe", as file operations can be (which means — not much). The Python implementation you're using has Global Interpreter Lock (GIL), which will guarantee that each individual file operation on cStringIO will not be interrupted by another thread. That does not however guarantee, that concurrent file operations from multiple threads won't be interleaved.
No it is not currently thread safe.
Related
I am working on some changes to a library which embeds Python which require me to utilize sub-interpreters in order to support resetting the python state, while avoiding calling Py_Finalize (since calling Py_Initialize afterwards is a no-no).
I am only somewhat familiar with the library, but I am increasingly discovering places where PyGILState_Ensure and other PyGILState_* functions are being used to acquire the GIL in response to some external callback. Some of these callbacks originate from outside Python, so our thread certainly doesn't hold the GIL, but sometimes the callback originates from within Python, so we definitely hold the GIL.
After switching to sub-interpreters, I almost immediately saw a deadlock on a line calling PyGILState_Ensure, since it called PyEval_RestoreThread even though it was clearly already being executed from within Python (and so the GIL was held):
For what it's worth, I have verified that a line that calls PyEval_RestoreThread does get executed before this call to PyGILState_Ensure (it's well before the first call into Python in the above picture).
I am using Python 3.8.2. Clearly, the documentation wasn't lying when it says:
Note that the PyGILState_* functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_* API is unsupported.
It is quite a lot of work to refactor the library so that it tracks internally if the GIL is held or not, and seems rather silly. There should be a way to determine if the GIL is held! However, the only function I can find is PyGILState_Check, but that's a member of the forbidden PyGILState API. I'm not sure it'll work. Is there a canonical way to do this with sub-interpreters?
I've been pondering this line in the documentation:
Also note that combining this functionality with PyGILState_* APIs is delicate, because these APIs assume a bijection between Python thread states and OS-level threads, an assumption broken by the presence of sub-interpreters.
I suspect that the issue was that there's something involving thread local storage on the PyGILState_* API.
I've come to think that it's actually not really possible to tell if the GIL is held by the application. There's no central static place where Python stores that the GIL is held, because it's either held by "you" (in your external code) or by the Python code. It's always held by someone. So the question of "is the GIL held" isn't the question the PyGILState API is asking. It's asking "does this thread hold the GIL", which makes it easier to have multiple non-Python threads interacting with the interpreter.
I overcame this issue by restoring the bijection as best I could by creating a separate thread per sub-interpreter, with the order of operations being very strictly as follows:
Grab the GIL in the main thread, either explicitly or with Py_Initialize (if this is the first time). Be very careful, the thread state from Py_Initialize must only ever be used in the main thread. Don't restore it to another thread: Some module might use the PyGILState_* API and the deadlock will happen again.
Create the thread. I just used std::thread.
Spawn the subinterpreter with Py_NewInterpreter. Be very careful, this will give you a new thread state. As with the main thread state, this thread state must only be used from this thread.
Release the GIL in the new thread when you're ready for Python to do its thing.
Now, there's some gotchas I discovered:
asyncio in Python 3.8-3.9 has a use-after-free bug where the first interpreter loading it manages some resources. So if that interpreter is ended (releasing those resources) and a new interpreter grabs asyncio, there will be a segfault. I overcame this by manually loading asyncio through the C API in the main interpreter, since that one lives forever.
Many libraries, including numpy, lxml, and several networking libraries will have trouble with multiple subinterpreters. I believe that Python itself is enforcing this: An ImportError results when importing any of these libraries with: Interpreter change detected - This module can only be loaded into one interpreter per process. This so far seems to be an insurmountable issue for me since I do require numpy in my application.
I would like to test how my application responds to functions that hold the GIL. Is there a convenient function that holds the GIL for a predictable (or even a significant) amount of time?
My ideal function would be something that operated like time.sleep except that, unlike sleep, it would hold the GIL
A simple, but hacky, way to hold the GIL is to use the re module with a known-to-be-slow match:
import re
re.match(r'(a?){30}a{30}', 'a'*30)
On my machine, this holds the GIL for 48 seconds with Python 2.7.14 (and takes almost as long on 3.6.3). However, this relies on implementation details, and may stop working if the re module gets improvements.
A more direct approach would be to write a c module that just sleeps. Python C extensions don't automatically release the GIL (unlike, say, ctypes). Follow the hellomodule example here, and replace the printf() with a call to sleep() (or the Windows equivalent). Once you build the module, you'll have a GIL holding function you can use anywhere.
You can use a C library's sleep function in "PyDLL" mode.
# Use libc in ctypes "PyDLL" mode, which prevents CPython from
# releasing the GIL during procedure calls.
_libc_name = ctypes.util.find_library("c")
if _libc_name is None:
raise RuntimeError("Cannot find libc")
libc_py = ctypes.PyDLL(_libc_name)
...
libc_py.usleep(...)
(See https://gist.github.com/jonashaag/d455671003205120a864d3aa69536661 for details on how to pickle the reference, for example if using in a distributed computing environment.)
I have a really specific need :
I want to create a python console with a Qt Widget, and to be able to have several independent interpreters.
Now let me try to explain where are my problems and all tries I did, in order of the ones I'd most like to make working to those I can use by default
The first point is that all functions in the Python C API (PyRun[...], PyEval[...] ...) need the GIL locked, that forbid any concurrent code interpretations from C ( or I'd be really glad to be wrong !!! :D )
Therefore, I tried another approach than the "usual way" : I made a loop in python that call read() on my special file and eval the result. This function (implemented as a built extension) blocks until there is data to read. (Actually, it's currently a while in C code rather than a pthread based condition)
Then, with PyRun_simpleString(), I launch my loop in another thread. This is where the problem is : my read function, in addition to block the current thread (that is totally normal), it blocks the whole interpreter, and PyRun_simpleString() doesn't return...
Finally I've this last idea which risks to be relatively slow : To have a dedicated thread in C++ which run the interpreter, and do every thing in python to manage input/output. This could be a loop which creates the jobs when there is a console needing to execute a command. Seems not to be really hard to do, but I prefer ask you : is there a way to make the above possibilities to work, or is there another way I didn't think about or is my last idea the best ?
One alternative is to just re-use code from IPython and its Qt Console. This assumes by independent interpreters you imply they won't share memory. IPythons run the Python interpreter in multiple processes and communicates with them over TCP or Unix domain sockets with the help of ZeroMQ.
Also, from your question I'm not sure if you're aware of the common blocking I/O idiom in Python C extensions:
Py_BEGIN_ALLOW_THREADS
... Do some blocking I/O operation ...
Py_END_ALLOW_THREADS
This releases the GIL so that other threads can execute Python code while your function is blocking. See Python/C API Reference Manual: Thread State and the Global Interpreter Lock.
If your main requirement is to have several interpreters independent from each other, you'd probably better suited doing fork() and exec() than doing multithreading.
This way each of the interpreters would live in it's own address space not disturbing one of the others.
I'd like people's opinion on which direction to choose between different solutions to implement inter-thread named-pipe communication.
I'm working on a solution for the following:
A 3rd party binary on AIX calls a shared object.
I build this shared object using the python 2.7.5 api, so I have a python thread (64 bit).
So the stack is:
3rd p binary -> my shared object / dll 'python-bridge' -> python 2.7.5 interpreter (persistent)
From custom code inside the 3rd party binary (in a propriatary language), I initialize the python interpreter through the python-bridge, precompile python code blocks through the python-bridge, and execute these bits of code using PyEval_EvalCode in the bridge.
The python interpreter stays alive during the session, and is closed just before the session ends.
Simple sequential python code works fine, and fast. After the call to the shared object method, python references are all decreased (inside the method) and no garbage remains. The precompiled python module stays in memory, works fine. However, I also need to interact with streaming data of the main executable. That executable (of which I don't have the source code) supports fifo through a named pipe, which I want to use for inter-thread communication.
Since the named pipe is blocking, I need a separate thread.
I came up with 3 or 4 alternatives (feel free to give more suggestions)
Use the multiprocess module within python
Make my own C thread, using pthread_create, and use python in there (carefully, I know about the non-threadsafe issues)
Make my own C thread, using pthread_create, parse the named pipe from C, and calling the python interpreter main thread from there
(maybe possible?) use the simpler Threading module of python (which isn't 'pure' threading), and release the GIL at the end of the API call to the bridge. (haven't dared to do this, need someone with insight here. Simple test with Threading and sleep shows it's working within the python call, but the named pipe Thread does nothing after returning to the main non-python process)
What do you suggest?
I'm trying option 1 at the moment, with some success, but it 'feels' a bit bloated to spawn a new process just for parsing a named pipe.
Thanks for your help, Tijs
Answering my own question:
I've implemented this (a while back) using option 4. Works good, very stable.
Releasing the GIL wasn't happening in my first attempt, because I didn't initialize threading.
After that, smooth sailing.
I need to convert a threading application to a multiprocessing application for multiple reasons (GIL, memory leaks). Fortunately the threads are quite isolated and only communicate via Queue.Queues. This primitive is also available in multiprocessing so everything looks fine. Now before I enter this minefield I'd like to get some advice on the upcoming problems:
How to ensure that my objects can be transfered via the Queue? Do I need to provide some __setstate__?
Can I rely on put returning instantly (like with threading Queues)?
General hints/tips?
Anything worthwhile to read apart from the Python documentation?
Answer to part 1:
Everything that has to pass through a multiprocessing.Queue (or Pipe or whatever) has to be picklable. This includes basic types such as tuples, lists and dicts. Also classes are supported if they are top-level and not too complicated (check the details). Trying to pass lambdas around will fail however.
Answer to part 2:
A put consists of two parts: It takes a semaphore to modify the queue and it optionally starts a feeder thread. So if no other Process tries to put to the same Queue at the same time (for instance because there is only one Process writing to it), it should be fast. For me it turned out to be fast enough for all practical purposes.
Partial answer to part 3:
The plain multiprocessing.queue.Queue lacks a task_done method, so it cannot be used as a drop-in replacement directly. (A subclass provides the method.)
The old processing.queue.Queue lacks a qsize method and the newer multiprocessing version is inaccurate (just keep this in mind).
Since filedescriptors normally inherited on fork, care needs to be taken about closing them in the right processes.