Atomic operations between main thread and subthread in Python

I have a list in my python program that gets new items on certain occasions (It's a message-queue consumer). Then I have a thread that every few minutes checks to see if there's anything in the list, and if there is then I want to do an action on each item and then empty the list.
Now my problem: should I use locks to ensure that the action in the subthread is atomic, and does this ensure that the main thread can't alter the list while I'm going through the list?
Or should I instead use some kind of flag?
Pseudocode to make my problem clearer.
Subthread:
def run(self):
    while 1:
        if get_main_thread_list() is not empty:
            do_operations()
            empty_the_list()
        sleep(30)
Main thread:
list = []

def on_event(item):
    list.append(item)

def main():
    start_thread()
    start_listening_to_events()
I hope this makes my problem clearer, and any links to resources or comments are obviously welcome!
PS: I'm well aware that I might just not grasp threaded programming well enough for this question; if you believe so, could you please take the time to explain what's wrong with my reasoning.

should I use locks to ensure that the action in the subthread is atomic, and does this ensure that the main thread can't alter the list while I'm going through the list?
Yes, provided you implement it correctly.
Or should I instead use some kind of flag?
"some kind of flag" == lock, so you'd better use threading locks.
Important: it looks to me like you're trying to reimplement the queue module from the stdlib; you might want to take a look at it.
Besides having a bunch of interesting features, it is also thread safe:
The queue module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The Queue class in this module implements all the required locking semantics.
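For example, a rough equivalent of the pseudocode in the question built on queue.Queue (the names are illustrative, and the print is a stand-in for the real action):

import queue
import threading

q = queue.Queue()                  # thread safe; no explicit locking needed

def on_event(item):                # main thread: producer
    q.put(item)

def worker():                      # subthread: consumer
    while True:
        item = q.get()             # blocks until an item arrives
        print('processing', item)  # stand-in for the real action
        q.task_done()

threading.Thread(target=worker, daemon=True).start()

Because q.get() blocks until something is available, the sleep(30) polling loop is no longer needed.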

Related

How can a process get the current value of a global variable in python?

My code is like this:
import pygame
from multiprocessing import Process, Queue

#dosomething
#dosomething

def keyboard():
    keys_pressed = q.get()
    if keys_pressed[pygame.K_a]:
        #dosomething

q = Queue()
keyboard_process = Process(target=keyboard)
keyboard_process.start()

while True:
    q.put(pygame.key.get_pressed())
    #dosomething

keyboard_process.join()
#dosomething
But the value of "q" is always [0, 0, …, 0] even if I press "A". "keyboard_process.join()" never seems to do anything, so the game doesn't work.
How can a process get the current value of a global variable in Python? Please help me.
The short answer is: "No".
When you start a separate child process, it gets a complete copy of the memory space of the parent. Once started, it is completely independent of the parent. The only way of communicating between parent and child is to use some kind of inter-process communication mechanism: pipes, shared memory, network sockets, files, etc.
However, if you start a separate thread instead, all threads within the process share the same memory.
Before humble_D's answer was deleted, it had a great link to a relevant answer detailing some methods of inter-process communication. Ref: multiprocessing: sharing a large read-only object between processes?
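For instance, a hedged sketch of that inter-process route: the queue is passed to the child explicitly as an argument instead of being relied on as a shared global (the names are illustrative):

from multiprocessing import Process, Queue

def child(q):                        # the queue arrives as an argument
    while True:
        msg = q.get()                # blocks until the parent sends something
        if msg is None:              # sentinel: time to stop
            break
        print('child received:', msg)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=child, args=(q,))
    p.start()
    for i in range(3):
        q.put(i)
    q.put(None)                      # tell the child to exit
    p.join()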
To be honest, if you're seriously considering PyGame for a new game, it probably doesn't need multiprocessing. If there's some super CPU-bound piece of work that needs to happen, you would probably be better off writing that part in a natively compiled language (e.g. C/C++/Rust) and launching it as the sub-process.
But first, just write your game, then optimise the parts that are slow.

When to use event/condition/lock/semaphore in python's threading module?

Python provides four different synchronization mechanisms in the threading module: Event / Condition / Lock (RLock) / Semaphore.
I understand they can be used to synchronize access of shared resources/critical sections between threads. But I am not quite sure when to use which.
Can they be used interchangeably? Or are some of them 'higher level', using others as building blocks? If so, which ones are built on which?
It would be great if someone can illustrate with some examples.
This article probably contains all the information you need. The question is indeed very broad, but let me try to explain how I use each as an example:
Event - Use it when you need threads to communicate that a certain state was reached so they can work together in sync. I use it mostly for the initialisation of two threads where one depends on the other.
Example: A client has a threaded manager, and its __init__() needs to know the manager is done instantiating some attributes before it can move on.
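A minimal sketch of that initialisation scenario (the class and attribute names are made up):

import threading

class Manager(threading.Thread):
    def __init__(self):
        super().__init__()
        self.ready = threading.Event()

    def run(self):
        self.config = {'initialised': True}   # slow setup would happen here
        self.ready.set()                      # signal: the attributes now exist

class Client:
    def __init__(self, manager):
        manager.ready.wait()                  # block until the manager is ready
        print(manager.config)

m = Manager()
m.start()
Client(m)
m.join()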
Lock/RLock - Use it when you are working with a shared resource and you want to make sure no other thread is reading or writing to it. I'd argue that locking before writing is mandatory while locking before reading could be optional, but it is good to make sure that while you are reading or writing, no other thread is modifying the resource at the same time. An RLock can be acquired multiple times by its owner, and release() must be called the same number of times acquire() was before another thread can acquire it.
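And a small sketch of Lock/RLock protecting a shared counter (again, illustrative names):

import threading

counter = 0
lock = threading.RLock()

def increment():
    global counter
    with lock:              # no other thread may touch `counter` in here
        with lock:          # an RLock may be re-acquired by the thread that owns it
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)              # always 10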
I haven't used Condition that much, and frankly never had to use Semaphore, so this answer has room for editing and improvement.

Python Multiprocessing - how to make processes wait while active?

Well, I'm quite new to Python and multiprocessing, and what I need to know is whether there is any way to make active processes wait for something like "all processes have finished using a given resource", then continue their work. And yes, I really need them to wait; the main purpose is related to synchronization. It's not about finishing the processes and joining them, it's about waiting while they're running. Should I use something like a Condition or an Event? I couldn't find anything really helpful anywhere.
It would be something like this:
import multiprocessing

def worker(args):
    # 1. working
    # 2. takes the resource from the manager
    # 3. waits for all other processes to finish the same step above
    # 4. returns to 1.

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    resource = manager.something()
    pool = multiprocessing.Pool(n)
    result = pool.map(worker, args)
    pool.close()
    pool.join()
Edit: The "working" part takes a lot more time than the other parts, so I still take advantage of multiprocessing, even if the access to that single resource is serial. Let's say the problem works this way: I have multiple processes running a solution finder (an evolutionary algorithm), and every "n" solutions made, I use that resource to exchange some data between those processes and improve solutions using the information. So, I need all of them to wait before exchanging that info. It's a little hard to explain, and I'm not really here to discuss the theory, I just want to know if there is any way I could do what I tried to describe in the main question.
I'm not sure that I understood your question, but I think you can use a Queue. It's a good way to transmit data from one process to another. You can implement something like the following (sketched below):
1. Process the first chunk
2. Write the results to the queue
3. Wait until the queue is no longer full
4. Return to 1
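A rough sketch of that pattern using a small, bounded multiprocessing.Queue, where put() blocks whenever the buffer is full (the chunk processing is only a placeholder):

from multiprocessing import Process, Queue

def consumer(q):
    while True:
        result = q.get()
        if result is None:             # sentinel: the producer is done
            break
        print('got', result)

def producer(q):
    for chunk in range(10):
        result = chunk * chunk         # 1. process the chunk (placeholder work)
        q.put(result)                  # 2./3. blocks while the queue is full
    q.put(None)

if __name__ == '__main__':
    q = Queue(maxsize=2)               # a small buffer forces the producer to wait
    c = Process(target=consumer, args=(q,))
    c.start()
    producer(q)
    c.join()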
I actually found out a way to do what I wanted.
As you can see in the question, the code was using a manager along with the processes. So, in simple words, I made a shared resource which works basically like a "log". Every time a process finishes its work, it writes a permission in the log. Once all the desired permissions are there, the processes continue their work (using this, I could also set a specific order of access to a resource, for example).
Please note that this is not a Lock or a Semaphore.
I suppose this isn't a good method at all, but it suits the problem's needs and doesn't delay the execution.
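A hedged sketch of that shared-log idea using a Manager list; the round counting, the sleep intervals and all the names are assumptions, not part of the original answer:

import multiprocessing
import time

N_WORKERS = 4
N_ROUNDS = 3

def worker(args):
    idx, log = args
    for round_no in range(N_ROUNDS):
        time.sleep(0.1 * (idx + 1))                    # 1. "working" (placeholder)
        log.append(idx)                                # 2. write our "permission" to the log
        while len(log) < N_WORKERS * (round_no + 1):   # 3. wait for everyone else's permission
            time.sleep(0.05)
        # 4. all workers are here; exchange data, then start the next round

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    log = manager.list()
    with multiprocessing.Pool(N_WORKERS) as pool:
        pool.map(worker, [(i, log) for i in range(N_WORKERS)])

Note that the pool size matches the number of tasks; if it were smaller, the workers polling in step 3 would starve the queued tasks and everything would deadlock, which is one reason this busy-waiting approach is fragile.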

Heavy I/O and python multiprocessing/multithreading

I am designing a little piece of software which involves:
Fetching a resource on the internet,
Some user interaction (quick editing of the resource),
Some processing.
I would like to do so with many resources (they are all listed in a list). Each is independent from the others. Since the editing part is quite tedious, I would like to make life easier for the user (probably me) so that he does not have to wait for the download of each resource. For simplicity, we forget the third task here.
My idea was to use the threading or multiprocessing module. Some thread (say thread 1) would do the "download" in advance while another (say thread 2) one would interact with the user on an already downloaded resource.
Question: how can I make sure that thread 1 is always ahead of thread 2 by at least ahead_min resources and at most ahead_max (ahead_max > ahead_min)?
I typically would need something similar to Queue.Queue(ahead_max) (or multiprocessing.Queue(ahead_max)) except that when ahead_max is attained then insertion is blocked until there are at most ahead_min elements left in the queue (in fact it blocks until the queue is empty, see http://docs.python.org/library/queue.html#module-Queue). Popping should also be blocked until at least ahead_min+1 elements are in the queue (at the end of the sequence of resources I can then insert some dummy objects to ensure even the last resource is processed).
Any idea? If you think of any simpler alternative, please share!
In this case I would suggest subclassing Queue and implementing your own logic. This should be an easy task, as the implementation of the Queue class is already written in Python.
You can use this as a template:
from queue import Queue

class MyQueue(Queue):
    def put(self, item, block=True, timeout=None):
        ...

    def get(self, block=True, timeout=None):
        ...
First of all, it seems that threading is preferable over multiprocessing in this case, because your task seems to be more IO bound than CPU bound. Then, yes, make use of queues in order to set up the communication between the different "modules". If the default pop behavior is not enough for you, you can play with Queue.qsize() and implement your own logic.

Python - How can I make this code asynchronous?

Here's some code that illustrates my problem:
def blocking1():
    while True:
        yield 'first blocking function example'

def blocking2():
    while True:
        yield 'second blocking function example'

for i in blocking1():
    print 'this will be shown'

for i in blocking2():
    print 'this will not be shown'
I have two functions which contain while True loops. These will yield data which I will then log somewhere (most likely, to an sqlite database).
I've been playing around with threading and have gotten it working. However, I don't really like it... What I would like to do is make my blocking functions asynchronous. Something like:
def blocking1(callback):
    while True:
        callback('first blocking function example')

def blocking2(callback):
    while True:
        callback('second blocking function example')

def log(data):
    print data

blocking1(log)
blocking2(log)
How can I achieve this in Python? I've seen that the standard library comes with asyncore, and the big name in this game is Twisted, but both of these seem to be aimed at socket IO.
How can I async my non-socket related, blocking functions?
A blocking function is a function which doesn't return, but still leaves your process idle - unable to complete more work.
You're asking us to make your blocking functions non-blocking. However – unless you're writing an operating system – you don't have any blocking functions. You might have functions which block because they make calls to blocking system calls, or you might have functions which "block" because they do a lot of computation.
Making the former type of function non-blocking is impossible without making the underlying system call non-blocking. Depending on what that system call is, it may be difficult to make it non-blocking without also adding an event loop to your program; you don't just need to make the call and have it not block, you also have to make another call to determine that the result of that call will be delivered somewhere you could associate it.
The answer to this question is a very long python program and a lot of explanations of different OS interfaces and how they work, but luckily I already wrote that answer on a different site; I called it Twisted. If your particular task is already supported by a Twisted reactor, then you're in luck. Otherwise, as long as your task maps to some existing operating system concept, you can extend a reactor to support it. Practically speaking there are only 2 of these mechanisms: file descriptors on every sensible operating system ever, and I/O Completion Ports on Windows.
In the other case, if your functions are consuming a lot of CPU, and therefore not returning, they're not really blocking; your process is still chugging along and getting work done. There are three ways to deal with that:
separate threads
separate processes
if you have an event loop, write a task that periodically yields: it does some work, then asks the event loop to resume it in the near future so that other tasks get a chance to run.
In Twisted this last technique can be accomplished in various ways, but here's a syntactically convenient trick that makes it easy:
from twisted.internet import reactor
from twisted.internet.task import deferLater
from twisted.internet.defer import inlineCallbacks, returnValue

@inlineCallbacks
def slowButSteady():
    result = SomeResult()
    for something in somethingElse:
        result.workHardForAMoment(something)
        yield deferLater(reactor, 0, lambda: None)
    returnValue(result)
You can use generators for cooperative multitasking, but you have to write your own main loop that passes control between them.
Here's a (very simple) example using your example above:
def blocking1():
    while True:
        yield 'first blocking function example'

def blocking2():
    while True:
        yield 'second blocking function example'

tasks = [blocking1(), blocking2()]

# Repeat until all tasks have stopped
while tasks:
    # Iterate through all current tasks. Use tasks[:] to copy the
    # list because we might mutate it.
    for t in tasks[:]:
        try:
            print t.next()
        except StopIteration:
            # If the generator stops, remove it from the task list
            tasks.remove(t)
You could further improve it by allowing the generators to yield new generators, which then could be added to tasks, but hopefully this simplified example will give the general idea.
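For instance, one way the scheduler could be extended so that a yielded generator is scheduled as a new task (sketched in Python 3 syntax, with made-up task names):

import types

def child():
    yield 'child working'

def parent():
    yield 'parent working'
    yield child()                          # yield a generator to spawn it as a task
    yield 'parent done'

tasks = [parent()]
while tasks:
    for t in tasks[:]:
        try:
            result = next(t)
        except StopIteration:
            tasks.remove(t)
        else:
            if isinstance(result, types.GeneratorType):
                tasks.append(result)       # a new task was spawned
            else:
                print(result)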
The twisted framework is not just sockets. It has asynchronous adapters for many scenarios, including interacting with subprocesses. I recommend taking a closer look at that. It does what you are trying to do.
If you don't want to use full OS threading, you might try Stackless, which is a variant of Python that adds many interesting features, including "microthreads". There are a number of good examples that you will find helpful.
Your code isn't blocking. blocking1() and its brother return iterators immediately (without blocking), and a single iteration doesn't block either (in your case).
If you want to "eat" from both iterators one by one, don't make your program try to eat up blocking1() entirely before continuing:
from itertools import izip

for b1, b2 in izip(blocking1(), blocking2()):
    print 'this will be shown', b1, 'and this, too', b2
