In Python, with the whole idea of "everything is an object", where does thread safety come in?
I am developing a Django website running under WSGI. It will also run on Linux, and as far as I know Linux has efficient process management, so perhaps we don't need to think about thread safety much. I am also unsure how modules load: are their functions effectively static or not? Any information would be helpful.
Functions in a module are equivalent to static methods in a class. The issue of thread safety arises when multiple threads may be modifying shared data, or even when one thread may be modifying such data while others are reading it; it's best avoided by making the data owned by ONE module (accessed from other threads via Queue.Queue), but when that's not feasible you have to resort to locking and other, more complex, synchronization primitives.
This applies whether the access to shared data happens in module functions, static methods, or instance methods -- and the shared data is such whether it's instance variables, class ones, or global ones (scoping and thread safety are essentially disjoint, except that function-local data is, to a point, intrinsically thread-safe -- no other thread will ever see the data inside a function instance, until and unless the function deliberately "shares" it through shared containers).
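Here is a minimal sketch of that "one owner, everyone else uses a queue" pattern, using the standard queue module (spelled Queue in Python 2); all the names are illustrative:

import queue
import threading

# The owner thread is the only one that ever touches the counter,
# so no locking is needed around it.
requests = queue.Queue()

def owner():
    counter = 0  # private to this thread
    while True:
        cmd, reply = requests.get()
        if cmd == "increment":
            counter += 1
            reply.put(counter)
        elif cmd == "stop":
            break

threading.Thread(target=owner, daemon=True).start()

# Any other thread asks the owner to mutate the data instead of doing it itself:
reply = queue.Queue()
requests.put(("increment", reply))
print(reply.get())  # -> 1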
If you use the multiprocessing module in Python's standard library, instead of the threading module, you may in fact not have to care about "thread safety" -- essentially because NO data is shared among processes... well, unless you go out of your way to change that, e.g. via mmapped files;-).
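A quick demonstration of that isolation (a minimal sketch; the result is the same with either the fork or spawn start method):

import multiprocessing

data = []

def worker():
    data.append(1)               # mutates the child's copy only
    print("child sees:", data)   # -> [1]

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    print("parent sees:", data)  # -> []: nothing was shared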
See the python documentation to better understand the general thread safety implications of Python.
Django itself seems to be thread safe as of 1.0.3, but your code may not be, and you will have to verify that...
My advice would be simply not to worry about it, and to serve your application with multiple processes instead of multiple threads (for example by using Apache's 'prefork' MPM instead of 'worker').
I am working on some changes to a library which embeds Python which require me to utilize sub-interpreters in order to support resetting the python state, while avoiding calling Py_Finalize (since calling Py_Initialize afterwards is a no-no).
I am only somewhat familiar with the library, but I am increasingly discovering places where PyGILState_Ensure and other PyGILState_* functions are being used to acquire the GIL in response to some external callback. Some of these callbacks originate from outside Python, so our thread certainly doesn't hold the GIL, but sometimes the callback originates from within Python, so we definitely hold the GIL.
After switching to sub-interpreters, I almost immediately saw a deadlock on a line calling PyGILState_Ensure, since it called PyEval_RestoreThread even though it was clearly already being executed from within Python (and so the GIL was held).
For what it's worth, I have verified that a line calling PyEval_RestoreThread does get executed before this call to PyGILState_Ensure (well before the first call into Python).
I am using Python 3.8.2. Clearly, the documentation wasn't lying when it says:
Note that the PyGILState_* functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_* API is unsupported.
It is quite a lot of work to refactor the library so that it tracks internally whether the GIL is held, and it seems rather silly: there should be a way to determine if the GIL is held! The only function I can find is PyGILState_Check, but that is a member of the forbidden PyGILState API, so I'm not sure it will work. Is there a canonical way to do this with sub-interpreters?
I've been pondering this line in the documentation:
Also note that combining this functionality with PyGILState_* APIs is delicate, because these APIs assume a bijection between Python thread states and OS-level threads, an assumption broken by the presence of sub-interpreters.
I suspect the issue is that the PyGILState_* API relies on thread-local storage somewhere.
I've come to think that it's actually not really possible to tell if the GIL is held by the application. There's no central static place where Python stores that the GIL is held, because it's either held by "you" (in your external code) or by the Python code. It's always held by someone. So the question of "is the GIL held" isn't the question the PyGILState API is asking. It's asking "does this thread hold the GIL", which makes it easier to have multiple non-Python threads interacting with the interpreter.
I overcame this issue by restoring the bijection as best I could by creating a separate thread per sub-interpreter, with the order of operations being very strictly as follows:
Grab the GIL in the main thread, either explicitly or with Py_Initialize (if this is the first time). Be very careful: the thread state from Py_Initialize must only ever be used in the main thread. Don't restore it in another thread; some module might use the PyGILState_* API and the deadlock will happen again.
Create the thread. I just used std::thread.
Spawn the sub-interpreter with Py_NewInterpreter. Be very careful: this will give you a new thread state. As with the main thread state, this thread state must only be used from this thread.
Release the GIL in the new thread when you're ready for Python to do its thing.
Now, there's some gotchas I discovered:
asyncio in Python 3.8-3.9 has a use-after-free bug where the first interpreter loading it manages some resources. If that interpreter is ended (releasing those resources) and a new interpreter imports asyncio, there will be a segfault. I overcame this by manually loading asyncio through the C API in the main interpreter, since that one lives forever.
Many libraries, including numpy, lxml, and several networking libraries, have trouble with multiple sub-interpreters. I believe Python itself enforces this: importing any of these libraries raises an ImportError: "Interpreter change detected - This module can only be loaded into one interpreter per process." So far this seems to be an insurmountable issue for me, since I do require numpy in my application.
I'm learning about shared memory in Python, especially the Python 3.8 module multiprocessing.shared_memory. I see no mention of locking in the documentation (although the parent module, multiprocessing, has a Lock object). Are locks somehow handled in the underlying code of multiprocessing.shared_memory or in /dev/shm? That is, is it safe to write to a SharedMemory object from multiple processes at the same time with no explicit locking? Thank you in advance for any clarification.
https://docs.python.org/3/library/multiprocessing.shared_memory.html
The Array() class is meant to be the synchronized version of shared memory, so no, I expect not. You get what it says: a block of shared memory with no synchronization overhead, for the times when you don't need it or want to implement your own.
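To illustrate, here is a minimal sketch (names are illustrative) of pairing a SharedMemory block with an explicit multiprocessing.Lock, which is what you would do when concurrent writers are possible:

from multiprocessing import Lock, Process, shared_memory

def add_one(name, lock):
    shm = shared_memory.SharedMemory(name=name)  # attach by name
    with lock:  # explicit locking: SharedMemory itself provides none
        shm.buf[0] += 1
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=1)
    lock = Lock()
    workers = [Process(target=add_one, args=(shm.name, lock)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(shm.buf[0])  # -> 4, thanks to the lock
    shm.close()
    shm.unlink()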
I'm trying to understand some of shared_memory's operation.
Looking at the source, it looks like the module uses shm_open() in UNIX environments, and CreateFileMapping / OpenFileMapping on Windows, combined with mmap.
I understand from here that, in order to avoid a thorough serialization/deserialization by pickle, one needs to implement __setstate__() and __getstate__() explicitly for one's shared data type.
I do not see any such implementation in shared_memory.py.
How does shared_memory circumvent the pickle treatment?
Also, on a Windows machine, this alone seems to survive across interpreters:
from mmap import mmap
shared_size = 12
shared_label = "my_mem"
mmap(-1, shared_size, shared_label)
Why then is CreateFileMapping / OpenFileMapping needed here?
How does shared_memory circumvent the pickle treatment?
I think you are confusing shared ctypes and shared objects between processes.
First, you don't have to use the sharing mechanisms provided by multiprocessing to get shared objects; you can just wrap basic primitives such as mmap (or its Windows equivalent), or get fancier using any API your OS/kernel provides.
Next, the second link you mention, regarding how copying is done and how __getstate__ defines the pickling behavior, only applies when you use the sharedctypes module API. You are not forced to perform pickling to share memory between two processes.
In fact, sharedctypes is backed by anonymous shared memory which uses: https://github.com/python/cpython/blob/master/Lib/multiprocessing/heap.py#L31
Both implementations rely on an mmap-like primitive.
Anyway, if you try to copy something using sharedctypes, you will hit:
https://github.com/python/cpython/blob/master/Lib/multiprocessing/sharedctypes.py#L98
https://github.com/python/cpython/blob/master/Lib/multiprocessing/sharedctypes.py#L39
https://github.com/python/cpython/blob/master/Lib/multiprocessing/sharedctypes.py#L135
And this function uses ForkingPickler, which makes use of pickle, and then, ultimately, __getstate__ gets called somewhere.
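For reference, this is the standard pickle hook mechanism those links bottom out in. A minimal illustrative class (not taken from the multiprocessing source) that drops an unpicklable attribute in __getstate__ and rebuilds it in __setstate__:

import pickle
import threading

class Handle:
    def __init__(self):
        self.value = 42
        self.lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["lock"]             # drop the unpicklable part
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()  # recreate it on the receiving side

restored = pickle.loads(pickle.dumps(Handle()))
print(restored.value)  # -> 42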
But this is not relevant to shared_memory, because shared_memory is not really a ctypes-like object.
You have other ways to share objects between processes, using the Resource Sharer / Tracker API: https://github.com/python/cpython/blob/master/Lib/multiprocessing/resource_sharer.py which will rely on pickle serialization/deserialization.
But you don't share shared memory through shared memory, right?
When you use: https://github.com/python/cpython/blob/master/Lib/multiprocessing/shared_memory.py
You create a block of memory with a unique name, and all processes must know that unique name before sharing the memory; otherwise you will not be able to attach to it.
Basically, the analogy is:
You and a group of friends have a unique secret base whose location only you know; you will go on errands, be away from each other, but you can all meet at this unique location.
For this to work, you must all know the location before going away from each other. If you do not have it beforehand, you are not certain that you will be able to figure out the place to meet.
It is the same with shared_memory: you only need its name to open it. You don't share or transfer shared_memory between processes; you read into shared_memory using its unique name from multiple processes.
As a result, why would you pickle it? You can; you can absolutely pickle it. But that might not be built in, because it's straightforward to just send the unique name to all your processes through some other channel, or anything like that.
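A minimal sketch of that "just send the name" idea: only the name string is handed to the child process, which then attaches to the same block:

from multiprocessing import Process, shared_memory

def reader(name):
    shm = shared_memory.SharedMemory(name=name)  # attach by name only
    print(bytes(shm.buf[:5]))                    # -> b'hello'
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=5)
    shm.buf[:5] = b"hello"
    # Only the (picklable) name string crosses the process boundary:
    p = Process(target=reader, args=(shm.name,))
    p.start()
    p.join()
    shm.close()
    shm.unlink()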
There is no circumvention required here. ShareableList is just an example application of the SharedMemory class. As you can see here: https://github.com/python/cpython/blob/master/Lib/multiprocessing/shared_memory.py#L314
It requires something akin to a unique name; you can also use anonymous shared memory and transmit its name later through another channel (write a temporary file, send it back to some API, whatever).
Why then is CreateFileMapping / OpenFileMapping needed here?
Because it depends on your Python interpreter. Here you are probably using CPython, which does the following:
https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1440
CPython is already using CreateFileMapping indirectly, so calling CreateFileMapping yourself and then attaching to it would just duplicate the work CPython has already done.
But what about other interpreters? Do all interpreters do what is necessary to make mmap work on non-POSIX platforms? That may have been the developer's rationale.
Anyway, it is not surprising that mmap would work out of the box.
I need to convert a threading application to a multiprocessing application for multiple reasons (GIL, memory leaks). Fortunately the threads are quite isolated and only communicate via Queue.Queues. This primitive is also available in multiprocessing so everything looks fine. Now before I enter this minefield I'd like to get some advice on the upcoming problems:
How do I ensure that my objects can be transferred via the Queue? Do I need to provide some __setstate__?
Can I rely on put returning instantly (like with threading Queues)?
General hints/tips?
Anything worthwhile to read apart from the Python documentation?
Answer to part 1:
Everything that has to pass through a multiprocessing.Queue (or Pipe or whatever) has to be picklable. This includes basic types such as tuples, lists and dicts. Classes are also supported if they are top-level and not too complicated (check the details). Trying to pass lambdas around will fail, however.
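A quick way to check picklability up front. Note that a multiprocessing.Queue pickles objects in a background feeder thread, so a failure would otherwise surface asynchronously after put() returns; testing with pickle.dumps directly makes it visible:

import pickle

def top_level(x):  # module-level function: picklable by reference
    return x * 2

pickle.dumps((1, "two", [3.0]))  # basic types: fine
pickle.dumps(top_level)          # fine: pickled as a reference to its module
try:
    pickle.dumps(lambda x: x)    # fails: lambdas cannot be pickled
except Exception as e:
    print(e)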
Answer to part 2:
A put consists of two parts: It takes a semaphore to modify the queue and it optionally starts a feeder thread. So if no other Process tries to put to the same Queue at the same time (for instance because there is only one Process writing to it), it should be fast. For me it turned out to be fast enough for all practical purposes.
Partial answer to part 3:
The plain multiprocessing.Queue lacks a task_done method, so it cannot be used as a drop-in replacement directly. (The JoinableQueue subclass provides it; see the sketch below.)
The old processing.queue.Queue lacks a qsize method and the newer multiprocessing version is inaccurate (just keep this in mind).
Since file descriptors are normally inherited on fork, care needs to be taken about closing them in the right processes.
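A minimal sketch of the task_done point from the first gotcha above: multiprocessing.JoinableQueue supplies task_done and join, so it can stand in where the threading-style protocol is needed:

import multiprocessing

def worker(q):
    while True:
        item = q.get()
        print("processed", item)
        q.task_done()  # available on JoinableQueue, not on the plain Queue

if __name__ == "__main__":
    q = multiprocessing.JoinableQueue()
    multiprocessing.Process(target=worker, args=(q,), daemon=True).start()
    for i in range(3):
        q.put(i)
    q.join()  # blocks until every item has been marked task_done()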
Everybody in the Django world seems to hate threadlocals (http://code.djangoproject.com/ticket/4280, http://code.djangoproject.com/wiki/CookBookThreadlocalsAndUser). I read Armin's essay on this (http://lucumr.pocoo.org/2006/7/10/why-i-cant-stand-threadlocal-and-others), but most of it hinges on the claim that threadlocals are bad because they are inelegant.
I have a scenario where threadlocals will make things significantly easier. (I have an app where people will have subdomains, so all the models need access to the current subdomain, and passing it down from the request is not worth it if the only problem with threadlocals is that they are inelegant or make for brittle code.)
Also, a lot of Java frameworks seem to use threadlocals heavily, so how is their case different from Python/Django's?
I avoid this sort of usage of threadlocals, because it introduces an implicit non-local coupling. I frequently use models in all kinds of non-HTTP-oriented ways (local management commands, data import/export, etc). If I access some threadlocals data in models.py, now I have to find some way to ensure that it is always populated whenever I use my models, and this could get quite ugly.
In my opinion, more explicit code is cleaner and more maintainable. If a model method requires a subdomain in order to operate, that fact should be made obvious by having the method accept that subdomain as a parameter.
If I absolutely could find no way around storing request data in threadlocals, I would at least implement wrapper methods in a separate module that access the threadlocals and call the model methods with the needed data. This way models.py remains self-contained and the models can be used without the threadlocals coupling.
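A rough sketch of that wrapper-module idea; every name here (request_context, set_current_subdomain, Article, myapp) is hypothetical:

# request_context.py -- the threadlocals coupling lives here, not in models.py
import threading

_state = threading.local()

def set_current_subdomain(subdomain):
    # Called once per request, e.g. from middleware.
    _state.subdomain = subdomain

def create_article(title):
    # The model method stays explicit about its inputs; only this
    # wrapper reaches into the threadlocal.
    from myapp.models import Article  # hypothetical model
    return Article.objects.create(title=title, subdomain=_state.subdomain)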
I don't think there is anything wrong with threadlocals: yes, it is a global variable, but besides that it's a normal tool. We use it for exactly this purpose (storing the subdomain model in a context global to the current request, set from middleware) and it works perfectly.
So I say use the right tool for the job. In this case threadlocals make your app much more elegant than passing the subdomain model around in all the model methods (not to mention that it is not even always possible: when you are overriding Django manager methods to limit queries by subdomain, you have no way to pass anything extra to get_query_set, for example, so threadlocals are the natural and only answer).
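For concreteness, here is a rough sketch of that middleware-plus-manager pattern (written in the modern middleware style; older Django spelled the manager hook get_query_set; all names are illustrative):

import threading
from django.db import models

# In a real project the middleware and manager would live in separate
# modules that both import this threading.local instance.
_local = threading.local()

class SubdomainMiddleware:
    """Stores the request's subdomain before the view (and models) run."""
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _local.subdomain = request.get_host().split(".")[0]
        return self.get_response(request)

class SubdomainManager(models.Manager):
    # Overridden manager methods take no extra arguments, so the
    # subdomain has to come from the threadlocal:
    def get_queryset(self):
        return super().get_queryset().filter(
            subdomain=getattr(_local, "subdomain", None))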
Also, a lot of Java frameworks seem to use threadlocals heavily, so how is their case different from Python/Django's?
CPython's interpreter has a Global Interpreter Lock (GIL) which means that only one Python thread can be executed by the interpreter at any given time. It isn't clear to me that a Python interpreter implementation would necessarily need to use more than one operating system thread to achieve this, although in practice CPython does.
Java's main locking mechanism is via objects' monitor locks. This is a decentralized approach that allows the use of multiple concurrent threads on multi-core and/or multi-processor CPUs, but it also produces much more complicated synchronization issues for the programmer to deal with.
These synchronization issues only arise with "shared-mutable state". If the state isn't mutable, or as in the case of a ThreadLocal it isn't shared, then that is one less complicated problem for the Java programmer to solve.
A CPython programmer still has to deal with the possibility of race conditions, but some of the more esoteric Java problems (such as publication) are presumably solved by the interpreter.
A CPython programmer also has the option to code performance critical code in Python-callable C or C++ code where the GIL restriction does not apply. Technically a Java programmer has a similar option via JNI, but this is rightly or wrongly considered less acceptable in Java than in Python.
You want to use threadlocals when you're working with multiple threads and want to localize some objects to a specific thread, e.g. having one database connection for each thread.
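That per-thread-connection use case as a minimal sketch, using threading.local so each thread lazily gets, and then reuses, its own connection (sqlite3 is just an example backend):

import sqlite3
import threading

_local = threading.local()

def get_connection():
    # Each thread creates its own connection on first use; sqlite3
    # connections must not be shared across threads anyway.
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect("app.db")
    return _local.conn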
In your case, you want to use it more as a global context (if I understand you correctly), which is probably a bad idea. It will make your app a bit slower, more coupled and harder to test.
Why is passing it from the request not worth it? Why don't you store it in the session or the user profile?
The difference with Java is that web development there is much more stateful than in the Python/Perl/PHP/Ruby world, so people are used to all kinds of contexts and the like. I don't think that is an advantage, but it does seem like one at the beginning.
I have found that using ThreadLocal is an excellent way to implement dependency injection in an HTTP request/response environment (i.e. any webapp). You just set up a servlet filter to 'inject' the object you need into the thread on receiving the request and 'uninject' it on returning the response.
It's a smart man's DI without all the XML ugliness, without the megabytes of Spring jars (not to mention the learning curve), and without all the cryptic, repetitive @annotation nonsense. And because it doesn't individually inject many object instances with the dependencies, it's probably a lot faster and uses less memory.
It worked so well that we open-sourced our exPOJO filter, which can inject a Hibernate session or a JDO PersistenceManager using ThreadLocal:
http://www.expojo.com