Python check-if-exists-and-update atomic operation

I need a thread-safe (atomic?) data structure in Python that can ensure the following:
# visited : defaultdict()
if node not in visited:
    assert node not in visited
    visited[node] = True

As a high-level language, Python is not particularly close to the processor and does not expose atomic primitives like CAS. In fact, the Python global interpreter lock prevents your threads from running at the same time. This doesn't obviate the need for atomic operations, of course (another thread could still be scheduled between the check and the set), but it does make Python look pretty unattractive for the CPU-intensive applications that make atomic operations valuable.
There's perhaps one way to do it: Python can integrate with C libraries, so you could write C code that performs CAS operations. I think it would still be subject to the GIL, though.
I usually use Python threads to handle concurrent blocking operations, like parallelizing API calls. In these cases other inter-thread communication mechanisms make more sense than atomic operations on shared variables: they're simpler to implement, easier to reason about, and, given the performance characteristics of Python, fast enough.
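For the pattern in the question, a minimal sketch (using only the standard threading module; mark_visited is an illustrative name, not from the question) is to serialize the check-and-set with a lock:

import threading
from collections import defaultdict

visited = defaultdict(bool)
visited_lock = threading.Lock()

def mark_visited(node):
    # Atomic test-and-set: returns True only for the first caller
    # to see this node.
    with visited_lock:
        if node not in visited:
            visited[node] = True
            return True
        return False

The lock guarantees that no other thread can interleave between the membership test and the write; the GIL alone does not, because CPython can switch threads between bytecode instructions. (dict.setdefault is often cited as atomic in CPython, but that is an implementation detail, not a documented guarantee.)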

Related

Multi-Thread Binary Tree Search Algorithm

Implementing a multi-threaded binary tree search algorithm in Python can be challenging because it requires proper synchronization and management of multiple threads accessing shared data structures.
One approach, I think, would be to use a thread-safe queue data structure to distribute search tasks to worker threads, and locks or semaphores to ensure that each node in the tree is accessed by only one thread at a time.
How can you implement a multi-threaded binary tree search algorithm in Python that takes advantage of multiple cores, while maintaining thread safety and avoiding race conditions?
To implement a multi-threaded binary tree search algorithm in Python:
1. Define a task queue data structure, such as a Queue from the queue module, to distribute search tasks to worker threads.
2. Create worker threads that pull tasks from the task queue, search the binary tree for the target value, and return the result.
3. Use locks or semaphores (such as a Lock or Semaphore from the threading module) to ensure that each node in the tree is accessed by only one thread at a time.
4. In the main thread, insert tasks into the task queue to search different parts of the tree.
5. Wait for the worker threads to complete and retrieve the results of their searches.
By using a thread-safe task queue, locks or semaphores to protect shared data structures, and proper coordination and synchronization of the threads, you can ensure that your multi-threaded binary tree search algorithm is correct and efficient; a sketch of this design follows.
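A minimal sketch of that design (assuming nodes with value, left, and right attributes; the target value and worker count are illustrative). Since a search only reads the tree, the lock below protects the shared results list rather than individual nodes:

import queue
import threading

task_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def worker(target):
    while True:
        node = task_queue.get()
        if node is None:               # sentinel: shut this worker down
            break
        if node.value == target:
            with results_lock:         # guard the shared results list
                results.append(node)
        for child in (node.left, node.right):
            if child is not None:
                task_queue.put(child)  # one new task per subtree
        task_queue.task_done()

threads = [threading.Thread(target=worker, args=(42,)) for _ in range(4)]
for t in threads:
    t.start()
task_queue.put(root)     # root of the tree, assumed to already exist
task_queue.join()        # block until every enqueued node is processed
for _ in threads:
    task_queue.put(None)  # one sentinel per worker
for t in threads:
    t.join()

Note that in CPython this buys concurrency, not multi-core parallelism; see the last answer in this section.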
An alternative solution using concurrent.futures:

import concurrent.futures

def search(node, value):
    # plain sequential binary search tree lookup
    if node is None:
        return None
    if node.value == value:
        return node
    if value < node.value:
        return search(node.left, value)
    else:
        return search(node.right, value)

def parallel_search(node, value):
    if node is None:
        return None
    if node.value == value:  # check the root before splitting the work
        return node
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_left = executor.submit(search, node.left, value)
        future_right = executor.submit(search, node.right, value)
        result_left = future_left.result()
        if result_left:
            return result_left
        return future_right.result()
The parallel_search function splits the search task into two separate tasks, one for the left subtree and one for the right subtree, and submits each task to the executor. The executor runs each task in a separate thread, allowing the search to take advantage of multiple cores.
By using the concurrent.futures module, the implementation is thread-safe and avoids race conditions, as the module takes care of managing the threads and the shared data structures.
A potential disadvantage is that it can lead to increased complexity in the code, as the abstractions provided by the module may make it more difficult to understand the underlying coordination and synchronization mechanisms.
It is also possible that the module may not support certain advanced features that may be needed for a specific use case.
The module is relatively new and may have bugs or compatibility issues with certain versions of Python. It's important to weigh the benefits of using the concurrent.futures module against the potential disadvantages, and choose the most appropriate solution for the specific use case.
In general, it's important to carefully consider the trade-offs between using the concurrent.futures module and a manually managed solution when implementing a multi-threaded binary tree search algorithm in Python. A combination of both approaches may also be possible, where the concurrent.futures module is used for simple tasks and manual management is used for more complex tasks that require finer control over the number of worker threads and coordination mechanisms.
You can write a multi-threaded binary tree search in Python that is thread-safe and has no race conditions. Another answer makes some good suggestions about that.
But if you're writing it in pure Python then you cannot make effective use of multiple cores to improve the performance of your search, at least not with CPython, because the Global Interpreter Lock prevents any concurrent execution within the Python interpreter. Multithreading can give you a performance improvement if your threads spend a significant fraction of their time in native code or blocked, but tree searching does not have any characteristics that would make room for an improvement from multithreading in a CPython environment.

Parallel threading python GIL vs Java

I know that Python has a GIL that prevents threads from running at the same time, so threading is just context switching.
Why is Java different?
Threads on the same CPU cannot run in parallel in any language, so:
1. Does creating a new thread in Java utilize cores on a multi-core machine?
2. Can Python only spawn threads on the same CPU, in contrast to Java?
3. If 1. is the case, when using more threads than CPUs, does even Java come back to context switching for several of them?
4. If 1. is the case, then how does it differ from multiprocessing? Because utilizing multiple cores isn't guaranteed?
5. Isn't the whole point of threading being able to use the same memory space? If Java does run some of them in multiple threads for parallelism, how do they really share memory?
Thank you
Why is java different?
Because it is able to effectively use multiple cores at the same time.
Does creating a new thread in java utilizes cores in multi core machine?
Yes.
Python can only spawn threads on the same CPU, in contrast to Java?
Java can spawn multiple threads which will run on different CPUs. Java is not responsible for the actual thread scheduling; that is handled by the OS, and the OS may reschedule a thread onto a different CPU from the one it started on.
I am not sure about the precise details for Python, but I think the GIL is an implementation detail rather than something that is intrinsic to the language itself [1]. Either way, in a Python implementation the GIL means that you would get little performance benefit from spawning threads on multiple cores. As this page says:
"The Python Global Interpreter Lock or GIL, in simple words, is a mutex (or a lock) that allows only one thread to hold the control of the Python interpreter."
If 1. is the case, when using more threads than CPUs, does it come back to context switching in Java?
It depends. When switching a CPU between threads belonging to different processes, a full context switch is involved. But when switching between threads in the same process, only the (user) registers need to be switched. (The virtual memory registers and caches don't need to be switched / flushed because the threads share the same virtual address space.)
If 1. is the case, then how does it differ from multiprocessing? Because utilizing multiple cores isn't guaranteed?
The key difference between multi-threading and multi-processing is that processes do not share any memory. By contrast, one thread in a process can see the memory of all of the others ... modulo issues of when changes are visible.
This difference has a variety of consequences.
Isn't the whole point of threading being able to use the same memory space?
Yes, that is the main point ... when you compare multi-threading with multi-processing.
If Java does run some of them in multiple threads for parallelism ...
Java supports threads for many reasons. Parallelism is only one of those reasons. Others include multiplexing I/O and simplifying certain kinds of programming problem. These other reasons are also relevant to Python.
... how do [Java threads] really share memory?
The hardware deals with the issues of making the physical memory visible to all of the threads, and propagation of changes via the memory caches. It is complicated.
In Java the onus is on the programmer to "do the right thing" when threads make use of shared variables / objects. You need to use volatile variables, or synchronized blocks / methods, or something else that ensures that there is a happens before chain between a write and subsequent read. (Otherwise you can get issues with changes not being visible.)
This transfer of responsibility to the programmer allows the compiler to generate code with fewer main memory operations ... and hence that is faster. The downside is that if an application doesn't obey the rules, it is liable to behave in unexpected ways.
By contrast, in Python the memory model is unspecified, but there is an expectation (by millions of Python programmers) that it will behave in an intuitive fashion; i.e. a shared variable write performed by one thread will immediately be visible to other threads. This is hard to achieve efficiently while also allowing Python threads to run in parallel.
[1] While the GIL is not formally part of the Python spec, the influence of the GIL on the (unspecified!) Python memory model, and on Python programmers' assumptions, makes it more than merely an implementation detail. It remains to be seen whether Python can successfully evolve into a language where multi-threading can use multiple cores effectively.
Not a complete answer here, but just adding a couple of things that Stephen C didn't already say:
Python can only spawn threads on the same CPU, in contrast to Java?
That would be an optimization, not an essential fact. There's no reason in principle why Python could not simply allow the OS to schedule its threads on whatever CPU happened to be available at any given time.
OTOH, given that no two Python threads can do significant work at the same time, it potentially could improve performance if the threads all had affinity for the same CPU. (See what Stephen C said about a "full context switch" vs. "only the (user) registers" being switched.)
Giving user-mode processes control over processor affinity is a relatively new feature in some operating systems. I have no idea whether any Python version actually uses that feature.
If Java does run...multiple threads for parallelism...?
Java doesn't "run multiple threads for parallelism." Your Java program creates multiple threads for whatever reason you happen to want them. Most modern OSs provide threads. Java simply makes that ability available to application programmers in a way that is tightly integrated with the language itself. You are free to use them (or not) however you see fit.
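As a concrete footnote to both answers, here is a small illustrative benchmark: a pure-Python CPU-bound function run twice sequentially, then in two CPython threads.

import threading
import time

def count(n):
    # pure-Python busy loop; holds the GIL the whole time it runs
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count(N)
count(N)
print("sequential:", time.perf_counter() - start)

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("two threads:", time.perf_counter() - start)

On a stock CPython build the threaded version is typically no faster than the sequential one (sometimes slower, due to lock contention); the equivalent Java program would show close to a 2x speedup on a multi-core machine.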

How to speed up nested loops in python with concurrency?

I have the following code:
def multiple_invoice_matches(payment_regex, invoice_regex):
    multiple_invoice_payment_matches = []
    for p in payment_regex:
        if p["match_count"] > 1:
            for k in p["matches"]:
                for i in invoice_regex:
                    if i["rechnung_nr"] == k:
                        multiple_invoice_payment_matches.append({"fuzzy_ratio": 100, "type": 2, "m_match": 0, "invoice": i, "payment": p})
    return multiple_invoice_payment_matches
The sizes of payment_regex and invoice_regex are really huge, so the code snippet given above takes too much time to return a result. How can I speed up the running time of this code?
You could take a look at the numba library; if your data lends itself to parallelization, rewriting your function with numba could speed up your code considerably.
Without knowing the data sizes and how your data is structured, it's hard to give a general approach for optimizing your function.
What I can say is: partition your data into multiple ranges (either by payment_regex, or by invoice_regex, or both), add those partitions to a work queue processed by multiple threads, wait for those threads to finish (i.e., join them), and then construct your final list from the partial results of each partition.
This would work well in other programming languages, but unfortunately not in Python, because of the GIL, Python's Global Interpreter Lock.
If you don't know much about the GIL, here's a decent article, saying:
The Python Global Interpreter Lock or GIL, in simple words, is a mutex (or a lock) that allows only one thread to hold the control of the Python interpreter.
[...]
The impact of the GIL isn't visible to developers who execute single-threaded programs, but it can be a performance bottleneck in CPU-bound and multi-threaded code.
To evade the GIL you basically have two options (a sketch of the first follows below):
(1) spawn multiple Python processes and use shared memory for backing your data => concurrency will now rely on the OS for switching between processes (e.g.: use numpy and shared memory, see here)
(2) use a Python package that can manipulate your data and implements the multi-threading model in C, where the GIL is not effective (e.g.: use numba)
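For instance, a minimal sketch of option (1) applied to the question's function, partitioning payment_regex across worker processes (this assumes both inputs are plain lists of dicts; note that each worker gets its own copy of invoice_regex, which is where shared memory could help):

from functools import partial
from multiprocessing import Pool

def match_partition(payments, invoice_regex):
    # same matching logic as the question, applied to one partition
    out = []
    for p in payments:
        if p["match_count"] > 1:
            for k in p["matches"]:
                for i in invoice_regex:
                    if i["rechnung_nr"] == k:
                        out.append({"fuzzy_ratio": 100, "type": 2, "m_match": 0, "invoice": i, "payment": p})
    return out

def multiple_invoice_matches_parallel(payment_regex, invoice_regex, workers=4):
    chunks = [payment_regex[i::workers] for i in range(workers)]  # round-robin split
    with Pool(workers) as pool:
        parts = pool.map(partial(match_partition, invoice_regex=invoice_regex), chunks)
    return [m for part in parts for m in part]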
You may ask yourself, then, why Python supports multi-threading in the first place?
Multi-threading in Python is mostly useful when the threads are blocked by I/O operations (reads/writes of files, sockets, etc.) or by other system calls that put the thread to sleep. That's when Python releases the GIL, so other threads can operate concurrently while some are asleep.

How does dask achieve parallelism?

I don't quite understand dask's parallelism model (https://docs.dask.org/en/latest/delayed-best-practices.html)
Given that python is single-threaded, what performance benefit can delayed actually offer? My understanding is it infers independent processes/functions as parts of a graph and then executes them in "parallel", but how is that possible?
I see how they might be "concurrent" processes, but even so - given that the function is sync, how can it perform any concurrent processes?
Simple: Python is not "single-threaded"; it can run many threads simultaneously. You are maybe thinking of the global interpreter lock (GIL), which makes the interpreter run exactly one operation at a time from one of the threads. Many libraries do not need to hold the GIL, however, so thread-based parallelism is real and useful in many cases. This will generally be true for numerical libraries (pandas...) and other things that do most of their work in compiled C/C++ code.
In addition, Dask supports process-based parallelism, that bypasses the GIL issue, but at the cost of communication and memory overhead. Whether this is better or worse for you will depend on your workload.
Finally, the distributed scheduler is ideal even on a single machine, because it enables you to choose the threads/processes mix that is right for whatever you are doing.
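A minimal illustration of the delayed API (the functions here are placeholders):

from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def total(xs):
    return sum(xs)

# Build the task graph lazily; nothing has executed yet.
parts = [inc(i) for i in range(100)]
result = total(parts)

# Run with threads (good when the work releases the GIL, e.g. numpy)...
print(result.compute(scheduler="threads"))
# ...or with processes, which sidesteps the GIL for pure-Python work.
print(result.compute(scheduler="processes"))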

multiprocess or threading in python?

I have a Python application that grabs a collection of data, and for each piece of data in that collection it performs a task. The task takes some time to complete as there is a delay involved. Because of this delay, I don't want the pieces of data to perform the task sequentially; I want them all to happen in parallel. Should I be using multiprocessing or threading for this operation?
I attempted to use threading but had some trouble; often some of the tasks would never actually fire.
If you are truly compute bound, using the multiprocessing module is probably the lightest-weight solution (in terms of both memory consumption and implementation difficulty).
If you are I/O bound, using the threading module will usually give you good results. Make sure that you use thread-safe storage (like queue.Queue) to hand data to your threads, or else hand each one a single piece of data that is unique to it when it is spawned.
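As an alternative to a hand-rolled Queue, concurrent.futures.ThreadPoolExecutor (also in the standard library) handles the handoff between threads for you. A minimal sketch, where do_task stands in for the slow, delay-bound task from the question:

from concurrent.futures import ThreadPoolExecutor

def do_task(item):
    # placeholder for the real task, which spends most of its time
    # waiting (network call, disk read, sleep, ...)
    return item * 2

data = range(20)  # placeholder for the collection of data

with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(do_task, data))
print(results)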
PyPy is focused on performance. It has a number of features that can help with compute-bound processing. They also have support for Software Transactional Memory, although that is not yet production quality. The promise is that you can use simpler parallel or concurrent mechanisms than multiprocessing (which has some awkward requirements.)
Stackless Python is also a nice idea. Stackless has portability issues, as indicated in another answer here. Unladen Swallow was promising, but is now defunct. Pyston is another (unfinished) Python implementation focusing on speed. It is taking an approach different from PyPy's, which may yield better (or just different) speedups.
Tasks run sequentially, but they give you the illusion of running in parallel. Tasks are good for file or connection I/O, because they are lightweight.
Multiprocessing with a Pool may be the right solution for you, because processes run truly in parallel, which makes them very good for intensive computing: each process runs on its own CPU (or core).
Setting up multiprocessing can be very easy:
from multiprocessing import Pool

def worker(input_item):
    output = do_some_work(input_item)  # replace with your real task
    return output

if __name__ == "__main__":
    pool = Pool()  # one process per CPU core by default; use Pool(4) to force 4 processes
    list_of_results = pool.map(worker, input_list)  # input_list: your collection; dispatched automatically
For small collections of data, simply create subprocesses with subprocess.Popen.
Each subprocess can simply get its piece of data from stdin or from command-line arguments, do its processing, and simply write the result to an output file.
When the subprocesses have all finished (or timed out), you simply merge the output files.
Very simple.
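A minimal sketch of that approach (worker.py is a hypothetical script that reads its piece of data from sys.argv[1] and prints its result):

import subprocess
import sys

data = ["a", "b", "c"]  # placeholder pieces of data

procs = []
for i, item in enumerate(data):
    out = open(f"result_{i}.txt", "w")  # each subprocess writes its own file
    procs.append((subprocess.Popen([sys.executable, "worker.py", item], stdout=out), out))

for proc, out in procs:
    proc.wait()  # or pass a timeout, as described above
    out.close()

# the result_*.txt files can now be merged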
You might consider looking into Stackless Python. If you have control over the function that takes a long time, you can just throw some stackless.schedule() calls in there (yielding to the next coroutine), or else you can set Stackless to preemptive multitasking.
In Stackless, you don't have threads, but tasklets or greenlets which are essentially very lightweight threads. It works great in the sense that there's a pretty good framework with very little setup to get multitasking going.
However, Stackless hinders portability because you have to replace a few of the standard Python libraries (Stackless removes reliance on the C stack). It's very portable if the next user also has Stackless installed, but that will rarely be the case.
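For flavor, a minimal tasklet example (this only runs on a Stackless Python build, not stock CPython):

import stackless

def task(name, n):
    for i in range(n):
        print(name, i)
        stackless.schedule()  # cooperatively yield to the next tasklet

stackless.tasklet(task)("a", 3)
stackless.tasklet(task)("b", 3)
stackless.run()  # run scheduled tasklets until they all finish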
Using CPython's threading model will not give you any performance improvement, because the threads are not actually executed in parallel: the global interpreter lock, which protects reference-counting garbage collection, allows only one thread to run at a time. Multiprocessing would allow parallel execution. Obviously, in this case you have to have multiple cores available to farm out your parallel jobs to.
There is much more information available in this related question.
If you can easily partition and separate the data you have, it sounds like you should just do that partitioning externally, and feed them to several processes of your program. (i.e. several processes instead of threads)
IronPython has real multithreading, unlike CPython and its GIL. So depending on what you're doing, it may be worth looking at. But it sounds like your use case is better suited to the multiprocessing module.
To the guy who recommends Stackless Python: I'm not an expert on it, but it seems to me that he's talking about software "multithreading", which is actually not parallel at all (it still runs in one physical thread, so it cannot scale to multiple cores). It's merely an alternative way to structure an asynchronous (but still single-threaded, non-parallel) application.
You may want to look at Twisted. It is designed for asynchronous network tasks.
