I am currently working on code that will continuously plot data retrieved via serial communication while also allowing for user input, in the form of raw_input, to control the job: starting/stopping/clearing the plot and setting the save file names for the data. Currently, I'm trying to do this with an extra thread that just reads user input and relays it to the program while it continuously plots and saves the data.
Unfortunately, I have run into errors where commands entered during the plotting loop freeze the program for 2 minutes or so, which I believe has to do with matplotlib not being thread safe; a command entered while the loop is not working with the plotting libraries gets a response in 1-2 seconds.
I have attempted switching from threading to the multiprocessing library to try to alleviate the problem, to no avail: the program will not show a plot, leading me to believe the plotting process never starts (the plotting command is the first command in it). I can post either program, or the relevant parts, if necessary.
I wanted to know if there was any way around these issues, or if I should start rethinking how I want to program this. Any suggestions on different ways of incorporating user input are welcome too.
Thanks
If matplotlib is not thread-safe, the right thing to do is to serialize all the inputs into matplotlib through a single event queue. Matplotlib can retrieve items off the queue until the queue is empty, and then process all the new parameters.
Your serial communication code and your raw_input should simply put data on this queue, and not attempt direct communication with matplotlib.
Your matplotlib thread will be doing one of three things:
(1) waiting for new data;
(2) retrieving new data and processing it (e.g. appending it to the arrays to be plotted, or changing output file names), staying in this state as long as the queue is not empty, and moving on to state (3) when it is; or
(3) invoking matplotlib to do the plotting, then looping back to state (1).
If you are implementing multiple action commands from your raw_input, you can add some auxiliary state variables. For example, if 'stop' is read from the queue, then you would set a variable that would cause state (3) to skip the plotting and go straight to state (1), and if 'start' is read from the queue, you would reset this variable, and would resume plotting when data is received.
You might think you want to do something fancy like: "if I see data, wait to make sure more is not coming before I start to plot." This is usually a mistake. You would have to tune your wait time very carefully, and then would still find times when your plotting never happened because of the timing of the input data. If you have received data (you are in state 2), and the queue is empty, just plot! In the time taken to do that, if 4 more data points come in, then you'll plot 4 more next time...
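A minimal sketch of that loop, with the actual matplotlib call stubbed out as a plot callback (the command names 'start'/'stop'/'clear' and the function run_plot_loop are illustrative, not from the original program):

```python
import queue

def run_plot_loop(q, iterations, plot):
    """One pass per batch: block for one item (state 1), drain the rest
    without blocking (state 2), then call plot once if enabled (state 3)."""
    data = []
    plotting = True
    for _ in range(iterations):
        items = [q.get()]          # state 1: wait for new input
        while True:                # state 2: drain everything queued
            try:
                items.append(q.get_nowait())
            except queue.Empty:
                break
        for item in items:
            if item == 'stop':
                plotting = False
            elif item == 'start':
                plotting = True
            elif item == 'clear':
                data = []
            else:
                data.append(item)  # a sample from the serial thread
        if plotting:
            plot(list(data))       # state 3: single matplotlib call per batch

# demo with a stub plot function: everything queued is handled in one batch
q = queue.Queue()
frames = []
for msg in [1.0, 2.0, 'stop', 3.0, 'start', 4.0]:
    q.put(msg)
run_plot_loop(q, 1, frames.append)
```

The serial thread and the raw_input thread only ever call `q.put(...)`, so all matplotlib work stays on one thread.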
Related
I have a python script that connects to a Power Supply via a Telnet session. The flow of the script is as follows:
# Connect to Device
tn = telnetlib.Telnet(HOST,PORT)
# Turn On
tn.write("OUT 1\r")
# Get Current Voltage
tn.write("MV?\r")
current_voltage = tn.read_until("\r")
# Turn Off
tn.write("OUT 0\r")
What I'd like to do is get the current voltage every t milliseconds and display it on my Tkinter GUI until the device is commanded to turn off. Ideally I'd like to display it on a chart of voltage vs. time, but I can live with just a dynamic text display for now. The current_voltage variable will store a string representing the current voltage value. What is the best way I can accomplish this? Thanks.
Every millisecond is probably more than tkinter can handle. It depends a bit on how expensive it is to fetch the voltage. If it takes longer than a millisecond, you're going to need threads or multiprocessing.
The simplest solution is to use after to schedule the retrieval of the data every millisecond, though again, I'm not sure it can keep up. The problem is that the event loop needs time to process events, and giving it such a tiny window of time when it's not fetching voltages may result in a laggy GUI.
The general technique is to write a function that does some work, and then calls after to have itself called again in the future.
For example:
import Tkinter as tk

root = tk.Tk()
...
def get_voltage():
    <your code to get the voltage goes here>
    # get the voltage again in one millisecond
    root.after(1, get_voltage)
...
get_voltage()
root.mainloop()
The other choice is to use threads: have a thread that does nothing but get the voltage information and put it on a queue. Then, using the same technique as above, you can pull the latest voltage(s) off the queue for display.
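A rough sketch of that thread-plus-queue pattern (read_voltage here is a stand-in for the real telnet query, and the Tk side is reduced to a plain drain function that you would call from an after callback):

```python
import queue
import threading
import time

readings = queue.Queue()

def read_voltage():
    # stand-in for the real query, e.g.:
    #   tn.write("MV?\r"); return tn.read_until("\r")
    return "12.0"

def poll_device(stop, interval=0.001):
    """Background thread: fetch the voltage and queue it for the GUI."""
    while not stop.is_set():
        readings.put(read_voltage())
        time.sleep(interval)

def drain_latest():
    """Called from the GUI side (e.g. inside a root.after callback):
    pull everything queued and return the newest value, or None."""
    latest = None
    while True:
        try:
            latest = readings.get_nowait()
        except queue.Empty:
            return latest

stop = threading.Event()
worker = threading.Thread(target=poll_device, args=(stop,))
worker.start()
time.sleep(0.05)          # let a few readings accumulate
stop.set()
worker.join()
value = drain_latest()
```

The GUI thread never touches the telnet session, so a slow device can't freeze the event loop.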
I'm crunching a tremendous amount of data and since I have a 12 core server at my disposal, I've decided to split the work by using the multiprocessing library. The way I'm trying to do this is by having a single parent process that dishes out work evenly to multiple worker processes, then another that acts as a collector/funnel of all the completed work to be moderately processed for final output. Having done something similar to this before, I'm using Pipes because they are crazy fast in contrast to managed queues.
Sending data out to the workers using the pipes is working fine. However, I'm stuck on efficiently collecting the data from the workers. In theory, the work being handed out will be processed at the same pace and they will all get done at the same time. In practice, this never happens. So, I need to be able to iterate over each pipe to do something, but if there's nothing there, I need it to move on to the next pipe and check if anything is available for processing. As mentioned, it's on a 12 core machine, so I'll have 10 workers funneling down to one collection process.
The workers use the following to read from their pipe (called WorkerRadio)
for Message in iter(WorkerRadio.recv, 'QUIT'):
    # crunch numbers & perform tasks here...
    CollectorRadio.send(WorkData)
WorkerRadio.send('Quitting')
So, they sit there looking at the pipe until something comes in. As soon as they get something they start doing their thing. Then fire it off to the data collection process. If they get a quit command, they acknowledge and shut down peacefully.
As for the collector, I was hoping to do something similar but instead of just 1 pipe (radio) there would be 10 of them. The collector needs to check all 10, and do something with the data that comes in. My first try was doing something like the workers...
i = 0
for Message in iter(CollectorRadio[i].recv, 'QUIT'):
    # crunch numbers & perform tasks here...
    if i < NumOfRadios:
        i += 1
    else:
        i = 0
CollectorRadio.send('Quitting')
That didn't cut it, and I tried a couple of other ways of manipulating it without success too. I either end up with syntax errors, or, as above, I get stuck on the first radio because the index never changes for some reason. I looked into having all the workers talk over a single pipe, but the Python documentation explicitly states that "data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time."
As I mentioned, I'm also worried about some processes going slower than the others and holding up progress. If at all possible, I would like something that doesn't wait around for data to show up (ie. check and move on if nothing's there).
Any help on this would be greatly appreciated. I've seen some use of managed queues that might allow this to work; but, from my testing, managed queues are significantly slower than pipes, and I can use as much performance on this as I can muster.
SOLUTION:
Based on pajton's post here's what I did to make it work...
# create list of pipes (labeled as radios)
TheRadioList = [CollectorRadio[i] for i in range(NumberOfRadios)]

while True:
    # check for data on the pipes/radios
    TheTransmission, Junk1, Junk2 = select.select(TheRadioList, [], [])
    # find out who sent the data (which pipe/radio)
    for TheSender in TheTransmission:
        # read the data from the pipe
        TheMessage = TheSender.recv()
        # crunch numbers & perform tasks here...
If you are using standard system pipes, then you can use the select system call to query which descriptors have data available. By default, select will block until at least one of the passed descriptors is ready:
read_pipes = [pipe_fd0, pipe_fd1, ...]
while True:
    read_fds, write_fds, exc_fds = select.select(read_pipes, [], [])
    for read_fd in read_fds:
        # read from read_fd pipe descriptor
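A self-contained sketch of that collector loop using multiprocessing.Pipe: since Python 3.3, multiprocessing.connection.wait is the portable way to select over Connection objects (plain select.select also works on Unix, because connections expose fileno()). The workers here are threads for brevity; the same collector loop works unchanged with multiprocessing.Process workers:

```python
import threading
from multiprocessing import Pipe
from multiprocessing.connection import wait

def worker(radio, wid):
    # each worker sends a few results, then the 'QUIT' sentinel
    for n in range(3):
        radio.send((wid, n))
    radio.send('QUIT')

# one pipe (radio) per worker, as in the question
pipes = [Pipe(duplex=False) for _ in range(4)]
receivers = [r for r, s in pipes]
for wid, (r, s) in enumerate(pipes):
    threading.Thread(target=worker, args=(s, wid)).start()

results = []
open_radios = list(receivers)
while open_radios:
    # wait() blocks until at least one connection has data,
    # then returns only the ready ones -- no busy polling
    for radio in wait(open_radios):
        msg = radio.recv()
        if msg == 'QUIT':
            open_radios.remove(radio)
        else:
            results.append(msg)
```

Because only ready radios are ever recv()'d, a slow worker can never stall the collector.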
I have an inefficient simulation running (it has been running for ~24 hours).
It can be split into 3 independent parts, so I would like to cancel the simulation, and start a more efficient one, but still recover the data that has already been calculated for the first part.
When an error happens in a program, for example, you can still access the data that the script was working with, and examine it to see where things went wrong.
Is there a way to kill the process manually without losing the data?
You could start a debugger such as winpdb (or any of several IDE debuggers) in a separate session and attach to the running process (this halts it). Set a breakpoint in a section of the code that has access to your data, resume until you reach the breakpoint, and then save your data to a file. Your new process could then load that data as a starting point.
I'm using gstreamer to stream audio over the network. My goal is seemingly simple: Prebuffer the incoming stream up to a certain time/byte threshold and then start playing it.
I might be overlooking a really simple feature of gstreamer, but so far, I haven't been able to find a way to do that.
My (simplified) pipeline looks like this: udpsrc -> alsasink. So far, all my attempts at achieving my goal have involved adding a queue element in between:
Use the min-threshold-time property. This actually works, but the problem is that it makes all the incoming data spend the specified minimum amount of time in the queue, rather than just the beginning, which is not what I want.
To work around the previous problem, I tried to have the queue notify my code when data enters the audio sink for the first time, thinking that this is the time to unset the min-threshold-time property I set earlier, thus achieving the "prebuffering" behavior.
Here's is a rough equivalent of the code I tried:
def remove_thresh(pad, info, queue):
    pad.remove_data_probe(probe_id)
    queue.set_property("min-threshold-time", 0)

queue.set_property("min-threshold-time", delay)
queue.set_property("max-size-time", delay * 2)
probe_id = audiosink.get_pad("sink").add_data_probe(remove_thresh, queue)
This doesn't work for two reasons:
My callback gets called way earlier than the delay variable I provided.
After it gets called, all of the data that was stored in the queue is lost. The playback starts as if the queue weren't there at all.
I think I have a fundamental misunderstanding of how this thing works. Does anyone know what I'm doing wrong, or alternatively, can provide a (possibly) better way to do this?
I'm using python here, but any solution in any language is welcome.
Thanks.
Buffering has already been implemented in GStreamer. Some elements, like the queue, are capable of building this buffer and post bus messages regarding the buffer level (the state of the queue).
An application wanting to have more network resilience, then, should listen to these messages and pause playback if the buffer level is not high enough (usually, whenever it is below 100%).
So, all you have to do is set the pipeline to the PAUSED state while the queue is buffering. In your case, you only want to buffer once, so use any logic for this (maybe set flag variables to pause the pipeline only the first time).
Set the "max-size-bytes" property of the queue to the value you want.
Either listen to the "overrun" signal to notify you when the buffer becomes full, or use gst_message_parse_buffering() to find the buffering level.
Once your buffer is full, set the pipeline to PLAYING state and then ignore all further buffering messages.
Finally, for a complete streaming example, you can refer to this tutorial: https://gstreamer.freedesktop.org/documentation/tutorials/basic/streaming.html
The code is in C, but the walkthrough should help you with what you want.
I was having the exact same problems as you with a different pipeline (appsrc), and after spending days trying to find an elegant solution (and ending up with code remarkably similar to what you posted)... all I did was switch the flag is-live to False and the buffering worked automagically. (no need for min-threshold-time or anything else)
Hope this helps.
Is it possible to have a server side program that queues and manages processes that are executed at the command line?
The project I am working on takes an image from the user, modifies the image then applies it as a texture to a 3D shape. This 3D scene is generated by blender/Cinema 4d at the command line which outputs it as an image. It is this process that needs to be queued or somehow managed from the server side program. The end result sent back to the user is a video containing an animated 3D shape with their image as a texture applied to it.
These renders may take a while (or they may not), but how can I ensure that they are executed at the right times, and in a queued manner?
This will preferably be done in python.
Lacking more details about how/why you're doing queuing (can only run so many at a time, things need to be done in the right order, etc?), it's hard to suggest a specific solution. However, the basic answer for any situation is that you want to use the subprocess module to fire off the processes, and then you can watch them (using the tools afforded to you by that module) to wait until they're complete and then execute the next one in the queue.
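A minimal, stdlib-only sketch of such a queue: one worker thread pulls command lines off a queue and runs them sequentially with subprocess. The placeholder jobs below stand in for the real blender/Cinema 4d command line, which I don't know:

```python
import queue
import subprocess
import sys
import threading

jobs = queue.Queue()
results = []

def render_worker():
    """Run queued command lines one at a time, in FIFO order."""
    while True:
        cmd = jobs.get()
        if cmd is None:          # sentinel: shut the worker down
            break
        # in the real system this would be the blender/Cinema 4d
        # command line that renders the user's scene to an image
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))

worker = threading.Thread(target=render_worker)
worker.start()

# enqueue a few placeholder "render" jobs
for i in range(3):
    jobs.put([sys.executable, '-c', 'print(%d)' % i])
jobs.put(None)                   # no more work
worker.join()
```

A web server would put a job on `jobs` per upload; more worker threads give you a bounded number of concurrent renders.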