Python multithreading: releasing resources

I am writing super awesome software where I create a new thread every minute. Each thread stores some data on a remote database server and then ends. When a new thread is created, resources (memory, ...) are assigned to it. If I don't free those resources correctly at some point, I will have a problem.
The thread that stores the data can sometimes end unexpectedly, for example with an error because the remote server is unreachable. This is not a problem: the thread ends, and its data is stored the next minute together with the data of that minute.
So my question is: do Python threads free all the resources they use when they end as expected? Do they free all resources when they end because of an error?

Python threads (as opposed to multiprocessing processes) share the same block of memory. If a thread adds something to a data structure that is directly or indirectly referenced from the main thread or other workers (for instance, a shared dictionary or list), that data won't be deleted when the thread dies. So, as long as the only data your threads write is referenced by variables local to the thread's target function scope or below, the resources should be cleaned up the next time the garbage collector runs after the thread exits.
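
A minimal sketch of that distinction; upload() here is a hypothetical stand-in for the remote database call:

import threading

shared_results = []  # referenced by the main thread, so it outlives any worker

def upload(rows):
    # hypothetical stand-in for the remote database call
    raise ConnectionError("server unreachable")

def store_data():
    local_buffer = ["row"] * 1_000_000  # referenced only by this frame
    try:
        upload(local_buffer)
    except ConnectionError:
        return                          # local_buffer still becomes unreachable
    shared_results.append("ok")         # this reference survives the thread

t = threading.Thread(target=store_data)
t.start()
t.join()
# After the thread exits, local_buffer is unreachable and will be reclaimed
# whether store_data() returned normally or bailed out on the error; only
# whatever was appended to shared_results remains alive.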

Related

Termination of hanging threads in Python

I am quite new to Python.
My problem:
I am starting a bunch of threads in parallel, each one trying to create a session with one of a number of foreign hosts (to perform a few activities in those sessions later). Some of these hosts may be in an awkward state, in which case the session creation eventually fails; however, that takes about 60 seconds. If successful, the session is created immediately. Hence I want to terminate the respective session-creation threads after a reasonable time (a few seconds). However, I learned that the only way to stop a thread is to communicate an event status for the thread to observe, which is no use if it is stuck in an action (here: establishing the session). I use join() with a timeout; that speeds up execution for all sessions created successfully, but of course main() won't exit until all the temporarily stuck threads have returned. Is there really no way to cut this short?
Thanks for any useful advice.
Uwe
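
One common workaround, assuming the blocking call itself cannot be given a timeout, is to run the workers as daemon threads so the interpreter can exit without waiting for them. A hedged sketch, where create_session() and the host names are placeholders:

import threading, time

def create_session(host):
    # stand-in for the blocking session setup; a bad host blocks ~60 s
    time.sleep(60 if host == "bad-host" else 0)
    return "session to " + host

def connect(host, sessions):
    sessions[host] = create_session(host)

hosts = ["good-host", "bad-host"]   # placeholder host names
sessions = {}
threads = []
for host in hosts:
    # daemon=True lets the process exit even if this thread is still stuck
    t = threading.Thread(target=connect, args=(host, sessions), daemon=True)
    t.start()
    threads.append(t)

deadline = time.monotonic() + 5     # overall budget of a few seconds
for t in threads:
    t.join(timeout=max(0.0, deadline - time.monotonic()))

print(sessions)   # only the sessions that came up within the budget
# main() can exit now; the stuck daemon threads die with the process

The stuck threads are not truly cancelled (Python offers no safe way to kill a thread), but marking them as daemons means they no longer keep the process alive.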

Multithreaded Master/Slave in Python - Add or Remove threads

I want to implement a master/slave multithreaded architecture where a master thread keeps refreshing a list of tasks to be done by the slave threads. Sometimes the refreshed list contains new tasks, and sometimes tasks have been removed, which means the master has to update the pool of slave threads to match.
Is there a way to control the thread pool from the master thread? For example, if the refreshed task list no longer contains task "5", the thread running it must be stopped and removed so the task isn't redone, while the other threads already in the pool are left untouched; conversely, the master should create, start and add to the pool a new thread for any task not yet covered by the pool.
The slave tasks are while-loop listeners that eventually write to a DB.
Which is the safest way to stop a thread without crashing the whole pool?
Thank you!
PS: I'm sorry about the slave/master terminology, but I found children/parent just as bad!
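
A minimal sketch of that pattern, keeping one stop event per task so a single worker can be stopped without touching the rest of the pool (the listener body is a placeholder):

import threading, time

workers = {}   # task id -> (thread, stop event)

def listener(task_id, stop_event):
    # placeholder for the while-loop listener that eventually writes to a DB
    while not stop_event.is_set():
        time.sleep(0.1)   # ... listen for work, write results ...

def refresh(task_ids):
    # stop and drop workers whose task vanished from the refreshed list
    for task_id in list(workers):
        if task_id not in task_ids:
            thread, stop_event = workers.pop(task_id)
            stop_event.set()   # only this worker is asked to stop
            thread.join()      # the rest of the pool keeps running
    # create, start and add a worker for every task not yet in the pool
    for task_id in task_ids:
        if task_id not in workers:
            stop_event = threading.Event()
            thread = threading.Thread(target=listener,
                                      args=(task_id, stop_event), daemon=True)
            thread.start()
            workers[task_id] = (thread, stop_event)

refresh(["1", "2", "5"])
refresh(["1", "2"])   # task "5" is stopped cleanly; "1" and "2" are untouched

Because each listener checks its own event between loop iterations, stopping one thread can never crash the others.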

Python threading leaving a thread unattended for seconds

I am trying to develop a stable structure in which three threads run in parallel:
One thread reads a serial port for incoming data.
Another thread continuously checks a file for new lines (basically the same as the previous thread).
The last one is dedicated to other periodic function calls, like sending a keep-alive command through the serial port and deleting old files.
The first two threads sit in an infinite while loop that always checks for new incoming data; the third is a scheduled function that calls the other functions and sleeps until the next call.
When my third thread is doing its work, the other two threads are delayed in handling new data. I have read a bit about the GIL, and maybe this is the reason for these delays.
Should I use some other kind of structure that prioritizes handling all the incoming data as soon as possible over the third thread's work?
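
If the periodic work is CPU-bound, one option is to move it out of the GIL's reach by running it in a separate process. A rough sketch, assuming the maintenance work (do_maintenance() is a placeholder) does not need to share the serial-port handle with the readers:

import multiprocessing, time

def do_maintenance():
    pass   # e.g. delete old files; placeholder for the periodic calls

def periodic_work():
    # runs in its own interpreter with its own GIL, so it can no longer
    # stall the two reader threads in the main process
    while True:
        do_maintenance()
        time.sleep(60)

if __name__ == "__main__":
    multiprocessing.Process(target=periodic_work, daemon=True).start()
    # the serial-port reader and the file watcher stay as threads here

If the periodic work is mostly I/O (like the serial keep-alive), the GIL is released during the I/O anyway, and the delays are more likely caused by long uninterrupted CPU sections in that thread.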

Python multiprocessing queue empty in one process and full in another

I use a list of processes, with a queue for each one. Another thread fills these queues one after the other, and the processes fetch the data from them. The problem is that after a while the queues raise an Empty exception from within the processes, while the feeding thread gets a Full exception. When I check the queue size, it is consistent with the exceptions.
To make it worse, this behavior can only be reproduced as part of a large code base; I can't write a small program that reproduces it.
Anyone had similar issues with multiprocessing queues not being consistent in different processes?
Edit
To add more to the description of the pipeline. I have multiple worker objects, each worker has an input queue (multiprocessing.Queue), a worker queue (multiprocessing.Queue), an output queue (threading.Queue), a worker process (multiprocessing.Process) and a manager thread (threading.Thread).
Feeding all these workers, I have a single feeder thread (threading.Thread) that adds sample identifiers to the input queues of all workers, one by one. The sample identifiers are very small (paths of files), so the feeder thread can keep up with the processes.
The worker gets the sample identifiers from the input queue, reads those samples, processes them and puts them into the worker queue one by one. The manager thread reads the data from the worker queue and puts it into the output queue, because reading from a multiprocessing.Queue is slower.
All .get() and .put() calls have timeouts, and I keep track of how long it takes to get new data from this pipeline. I also have mechanisms for closing and reopening the pipeline, by joining all processes and threads (even for the queues) and then recreating all of them from scratch. When everything is working, the main process goes over the workers and reads the data off their output queues one by one; most of the time it takes only a few ms to read new data.
This whole pipeline exists twice in my code (used for machine learning with TensorFlow). One instance is used for training and is created near the beginning of the program; the other is used for testing. The second instance is created after a while of training; it goes over all of my dataset and then resets. When the second instance is run for the second time, it gets stuck after 1000 samples or so. When it is stuck and I break into debug mode in the main process, I see that the input queue is full and the worker and output queues are empty. When I then break inside one of the worker processes, I see that its input queue is empty. It seems like the worker process somehow sees a different input queue than it should. Note that this is not some race issue, because the result is stable.
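
For reference, a stripped-down sketch of one such worker as described, with illustrative names and the error handling around the queue timeouts elided:

import multiprocessing, queue, threading

def load_sample(sample_id):
    return sample_id   # placeholder for reading and processing one sample

def work(input_queue, worker_queue):
    # worker process: read sample ids, process them, pass results on
    while True:
        sample_id = input_queue.get(timeout=10)
        worker_queue.put(load_sample(sample_id), timeout=10)

def drain(worker_queue, output_queue):
    # manager thread: copy results into a threading queue, which the
    # main process can read much faster than a multiprocessing.Queue
    while True:
        output_queue.put(worker_queue.get(timeout=10))

if __name__ == "__main__":
    input_queue = multiprocessing.Queue()    # fed by the feeder thread
    worker_queue = multiprocessing.Queue()   # filled by the worker process
    output_queue = queue.Queue()             # read off by the main process

    multiprocessing.Process(target=work, args=(input_queue, worker_queue),
                            daemon=True).start()
    threading.Thread(target=drain, args=(worker_queue, output_queue),
                     daemon=True).start()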
Edit 2
I zeroed in on the point where the program hangs: it seems to be the json.loads() call on the data read from the file. This means the problem is different from what I originally described; the processes hang, they don't see an empty queue.
The code that opens the file:
with open(json_path, 'r') as f:
    data = f.read()
json_data = json.loads(data)  # <== program hangs at this line
I tried using signal.alarm to pinpoint where inside json.loads() the program hangs, but it doesn't raise the exception. The problem reproduces with a single multiprocessing.Process as well, but not when all the processing is done in the main process.
Does this ring a bell for anyone?
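
One avenue worth checking, since the hang only appears in child processes: with the default fork start method on Linux, a child inherits the parent's memory at the moment of the fork, and if another thread held a lock at that moment (for example inside logging or a C extension), the child can later hang on a seemingly unrelated call. Switching to the spawn start method, which starts each worker from a fresh interpreter, is a cheap way to rule that class of problem in or out:

import multiprocessing

if __name__ == "__main__":
    # 'spawn' starts workers from a fresh interpreter instead of fork()ing
    # the multi-threaded parent, so no lock can be inherited in a held state
    multiprocessing.set_start_method("spawn")
    # ... build the workers / pipeline exactly as before ...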

Separate process sharing queue with main process (producer/consumer)

I'm pretty new to multiprocessing in Python and I've done a lot of digging around, but I can't seem to find exactly what I'm looking for. I have a bit of a consumer/producer problem: a simple server with an endpoint that consumes from a queue, and a function that produces onto the queue. The queue can be full, so the producer doesn't always need to be running.
While the queue isn't full, I want the producer task to run, but I don't want it to block the server from receiving or servicing requests. I tried multithreading, but the producing work is very slow and the GIL slows it down too much. I want the server to be running all the time, and whenever the queue is no longer full (something has been consumed), I want to kick off the producer task as a separate process and have it run until the queue is full again. What is the best way to share the queue so that the producer process can access the queue used by the main process?
What is the best way to share the queue so that the producer process can access the queue used by the main process?
If this is the important part of your question (which actually seems to be several questions), then multiprocessing.Queue is exactly what you need. I've used it in several projects to have multiple processes feed a queue for consumption by a separate process, so if that's what you're looking for, this should work.
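
A minimal sketch of that pattern, with placeholder names and a bounded queue so the producer pauses by itself whenever the queue is full:

import multiprocessing, time

def make_item():
    time.sleep(1)            # stand-in for the slow producing work
    return "item"

def producer(q):
    while True:
        q.put(make_item())   # blocks whenever the queue is full

if __name__ == "__main__":
    q = multiprocessing.Queue(maxsize=10)   # one queue shared by both processes
    multiprocessing.Process(target=producer, args=(q,), daemon=True).start()

    # the main process (the server) consumes as requests come in
    while True:
        item = q.get()       # e.g. inside the endpoint handler
        print("served", item)

Rather than starting and stopping the producer process each time the queue drains, letting put() block on a bounded queue idles the producer until the consumer frees a slot, which achieves the same effect with much less bookkeeping.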
