I am facing a very weird problem (at least I think it is weird).
Basically I have a very simple function which is called from multiple threads concurrently. Calling the function from a single sequential thread works correctly, but when I call it concurrently it seems to mix up data from different callers.
The actual code is hard to replicate standalone; I just wondered what the lifetime/scope of Python data structures is. If I call the same function multiple times concurrently, are its internal variables/arguments independent from one another?
I seem to remember running into a similar problem with recursive functions, where a function's data would persist from call to call. Could this be the case here?
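For example, the one case I can actually reproduce where a function's data does persist from call to call is a mutable default argument (names here are just for illustration):

    def collect(item, bucket=[]):   # the default list is built once, at def time
        bucket.append(item)
        return bucket

    print(collect(1))   # [1]
    print(collect(2))   # [1, 2]  <- the same list survives across calls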
Thanks
I am using the multiprocessing module since I have two programs, one of which requires information from the other. With this module I can run them both simultaneously and keep feeding information between them.
With some basic code the multiprocessing worked just fine. However, when I copy my own programs into the process, which basically keep creating information in matrix form, there seems to be a problem. The code stops at the line with the append method. Is it possible that you cannot use this method in a multiprocessing.Process?
If so, what can I use instead? Is there a list of functions and methods usable in these processes?
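For context, a stripped-down sketch of the kind of thing I'm attempting, here using a manager.list() so the appends are visible across processes (the matrix contents are made up):

    from multiprocessing import Process, Manager

    def build_matrix(shared_rows):
        for i in range(3):
            shared_rows.append([i, i + 1, i + 2])   # proxied back to the parent

    if __name__ == "__main__":
        with Manager() as manager:
            rows = manager.list()                   # a list both processes can see
            p = Process(target=build_matrix, args=(rows,))
            p.start()
            p.join()
            print(list(rows))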
I am using Python's multiprocessing pool. I have been told, although I have not experienced it myself (so I cannot post the code), that one cannot just "return" anything from a multiprocessing.Pool() worker back to the multiprocessing.Pool()'s main process. Words like "pickling" and "lock" were thrown around, but I am not sure what they mean here.
Is this correct, and if so, what are these limitations?
In my case, I have a function which generates a mutable class instance and then returns it after it has done some work with it. I'd like to have 8 processes run this function, each generating its own instance and returning it when done. Full code is NOT written yet, so I cannot post it.
Any issues I may run into?
My code is: res = pool.map(foo, list_of_parameters)
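The shape of what I'm planning, with foo and Result as placeholder names (the real class isn't written yet):

    from multiprocessing import Pool

    class Result:                      # top-level class, so instances pickle fine
        def __init__(self, value):
            self.value = value

    def foo(param):                    # stands in for the real worker function
        r = Result(param)
        r.value *= 2                   # "does some work" with the instance
        return r                       # pickled in the worker, unpickled here

    if __name__ == "__main__":
        list_of_parameters = range(8)
        with Pool(processes=8) as pool:
            res = pool.map(foo, list_of_parameters)
        print([r.value for r in res])  # [0, 2, 4, 6, 8, 10, 12, 14]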
Q : "Is this correct, and if so, what are these limitations?"
It depends. It is correct, but the SER/DES (serialise/deserialise) processing is the problem here, as a pair of disjoint processes tries to "send" something (there: a task specification with its parameters, and back: yes, the long-awaited result).
Initial versions of the piece of the Python standard library responsible for doing this, the pickle module, were not able to SER-ialise some of the more complex types of objects, class instances being one such example.
Newer versions keep evolving, sure, yet this SER/DES step remains one of the SPoFs (single points of failure) that can prevent smooth code execution in some such cases.
Next are the cases that finish by throwing a MemoryError: they request so many memory allocations that the O/S simply rejects any new allocation request, and the whole attempt to produce and send pickle.dumps( ... ) crashes unresolvably.
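A minimal sketch of the classic SER-failure, where an instance of a class defined inside a function cannot get pickled by the standard pickle module (names are illustrative):

    import pickle

    def make_worker_state():
        class State:                   # defined locally: not importable by name
            pass
        return State()

    try:
        pickle.dumps(make_worker_state())
    except Exception as exc:           # PicklingError / AttributeError, by version
        print(type(exc).__name__, exc)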
Do we have any remedies available?
Well, maybe yes, maybe no - Mike McKerns' dill may help in some cases to better handle complex objects in SER/DES processing.
You may try import dill as pickle; pickle.dumps(...) and test your hot candidates for Class() instances to get SER/DES-ed, to see whether they get a chance to pass through. If not, there is no way to use this low-hanging-fruit-first trick.
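A minimal sketch of that low-hanging-fruit test (assuming dill has been pip-installed; the locally defined class is the same kind the stdlib pickle chokes on):

    import dill as pickle      # drop-in replacement for the stdlib pickle

    def make_worker_state():
        class State:           # locally defined: stdlib pickle fails on this
            value = 42
        return State()

    payload = pickle.dumps(make_worker_state())   # dill serialises it by value
    print(pickle.loads(payload).value)            # 42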
Next, a less easy way would be to avoid your dependence on hardwired multiprocessing.Pool() instantiations and their (above-)limited SER/comms/DES methods, and to design your processing strategy as a distributed-computing system, based on a communicating-agents paradigm.
That way you benefit from a right-sized, just-enough-designed communication interchange between intelligent-enough agents that know (as you've designed them to know) what to tell one another, without sending mastodon-sized BLOB(s) that accidentally crash the processing in any of the SPoF(s) you can neither prevent nor salvage ex-post.
I know of no better ways forward, as of 2020-Q4, for doing this safely and smartly.
I am trying to understand what the point of Pool.apply() is, within a parallelisation module.
My understanding is that it is synchronous, so no parallel processing occurs; you just get the overhead of running code in a separate process.
I agree that it's not useful for parallelisation, but it might have other uses. For example, if, by poor life choices, your function has an unholy mess of side effects like changing global variables (probably because you only intended to execute it in a Pool when writing it), running it in a separate process can help with that.
Of course, using it like that is most likely an anti-pattern, so I guess this function only exists for compatibility reasons...
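A minimal sketch of that isolation effect (bump and counter are made-up names):

    from multiprocessing import Pool

    counter = 0

    def bump():
        global counter
        counter += 1               # the side effect happens in the worker process
        return counter

    if __name__ == "__main__":
        with Pool(processes=1) as pool:
            print(pool.apply(bump))   # 1 -- blocks until the worker is done
        print(counter)                # still 0: the parent's global is untouched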
My understanding of Gevent is that it's merely concurrency and not parallelism. My understanding of concurrency mechanisms like Gevent and AsyncIO is that nothing in the Python application ever executes at the same time.
The closest you get is calling a non-blocking IO method, and while waiting for that call to return, other methods within the Python application are able to execute. Again, no two methods within the Python application ever actually execute Python code at the same time.
With that said, why is there a need for gevent.queue? It sounds to me like the Python application doesn't really need to worry about more than one Python method accessing a queue instance at a time.
I'm sure there's a scenario that I'm not seeing that gevent.queue fixes, I'm just curious what that is.
Although you are right that no two statements execute at the same time within a single Python process, you might want to ensure that a series of statements executes atomically, or you might want to impose an order on the execution of certain statements, and in those cases things like gevent.queue become useful.
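A minimal sketch of imposing order with gevent.queue (the producer/consumer names are illustrative):

    import gevent
    from gevent.queue import Queue

    q = Queue()

    def producer():
        for i in range(3):
            q.put(i)              # items come out in exactly this order
            gevent.sleep(0)       # yield to the hub so the consumer can run

    def consumer():
        for _ in range(3):
            print("got", q.get()) # blocks cooperatively until an item arrives

    gevent.joinall([gevent.spawn(producer), gevent.spawn(consumer)])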
Basic using threads question here.
I'm modifying a program with 2 thread classes and I'd like to use a function defined in one class in both classes now.
As a thread newbie (I've only been playing with them for a few months): is it OK to move the function out of the thread class into the main program and just call it from both classes, or do I need to duplicate the function in the class that doesn't have it?
regards
Simon
You can call the same function from both threads. The issue to be aware of is modifying shared data from two threads at once. If the function attempts to modify the same data from both threads, you will end up with an unpredictable program.
So the answer to your question is, "it depends what the function does."
It certainly won't help to copy the function into both thread classes. What matters is what the function does, not how many copies of the code there are.
You might want to check out thread locking. In many languages, threads operating on one function/method can 'lock' it so other threads can't access it at the same time. http://en.wikipedia.org/wiki/Lock_(computer_science)
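A minimal sketch of that in Python with threading.Lock (names are illustrative):

    import threading

    lock = threading.Lock()
    totals = []                    # data shared by every thread

    def record(value):
        with lock:                 # only one thread at a time past this point
            totals.append(value)

    threads = [threading.Thread(target=record, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(totals))          # [0, 1, 2, 3]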