I'm using gstreamer to stream audio over the network. My goal is seemingly simple: Prebuffer the incoming stream up to a certain time/byte threshold and then start playing it.
I might be overlooking a really simple feature of gstreamer, but so far, I haven't been able to find a way to do that.
My (simplified) pipeline looks like this: udpsrc -> alsasink. So far, all of my attempts have revolved around adding a queue element in between:
First, I used the queue's min-threshold-time property. This actually works, but the problem is that it makes all of the incoming data spend the specified minimum amount of time in the queue, rather than just the beginning of the stream, which is not what I want.
To work around the previous problem, I tried to have the queue notify my code when data enters the audio sink for the first time, thinking that this would be the moment to unset the min-threshold-time property that I set earlier, thus achieving the "prebuffering" behavior.
Here is a rough equivalent of the code I tried:
# Probe callback: once the first data reaches the sink, drop the threshold so
# the queue stops delaying the rest of the stream.
def remove_thresh(pad, info, queue):
    pad.remove_data_probe(probe_id)
    queue.set_property("min-threshold-time", 0)

# Prebuffer: hold incoming data in the queue for `delay` nanoseconds.
queue.set_property("min-threshold-time", delay)
queue.set_property("max-size-time", delay * 2)
probe_id = audiosink.get_pad("sink").add_data_probe(remove_thresh, queue)
This doesn't work for two reasons:
My callback gets called much earlier than the delay I provided.
After it gets called, all of the data that was stored in the queue is lost; playback starts as if the queue weren't there at all.
I think I have a fundamental misunderstanding of how this thing works. Does anyone know what I'm doing wrong, or alternatively, can provide a (possibly) better way to do this?
I'm using python here, but any solution in any language is welcome.
Thanks.
Buffering has already been implemented in GStreamer. Some elements, like the queue, are capable of building this buffer and posting bus messages regarding the buffer level (the state of the queue).
An application wanting to have more network resilience, then, should listen to these messages and pause playback if the buffer level is not high enough (usually, whenever it is below 100%).
So, all you have to do is set the pipeline to the PAUSED state while the queue is buffering. In your case, you only want to buffer once, so use any logic for this (maybe set flag variables to pause the pipeline only the first time).
Set the "max-size-bytes" property of the queue to the value you want.
Either listen to the "overrun" signal to be notified when the buffer becomes full, or use gst_message_parse_buffering() to find the buffering level.
Once your buffer is full, set the pipeline to PLAYING state and then ignore all further buffering messages.
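For illustration, here is a rough Python sketch of that logic. It assumes the modern gi/GStreamer 1.0 bindings and a queue2 element with use-buffering=true (i.e. an element that posts buffering messages); the pipeline string, port and sizes are purely made up:
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Illustrative pipeline: queue2 with use-buffering=true posts BUFFERING messages.
pipeline = Gst.parse_launch(
    "udpsrc port=5000 ! queue2 use-buffering=true max-size-bytes=65536 ! "
    "decodebin ! audioconvert ! autoaudiosink")

prebuffered = [False]  # only pause/resume for the initial fill

def on_message(bus, message):
    if message.type == Gst.MessageType.BUFFERING and not prebuffered[0]:
        percent = message.parse_buffering()
        if percent < 100:
            pipeline.set_state(Gst.State.PAUSED)    # keep filling the buffer
        else:
            prebuffered[0] = True                   # buffer filled once: start playing
            pipeline.set_state(Gst.State.PLAYING)

bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message", on_message)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()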
Finally, for a complete streaming example, you can refer to this tutorial: https://gstreamer.freedesktop.org/documentation/tutorials/basic/streaming.html
The code is in C, but the walkthroughs should help you with what you want.
I was having the exact same problems as you with a different pipeline (appsrc), and after spending days trying to find an elegant solution (and ending up with code remarkably similar to what you posted)... all I did was switch the is-live flag to False and the buffering worked automagically (no need for min-threshold-time or anything else).
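For reference, with the Python bindings that is just a property on the appsrc element; a tiny illustrative snippet (the element name and the rest of the pipeline are made up):
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
# Hypothetical appsrc-based pipeline; name the appsrc so we can fetch it back.
pipeline = Gst.parse_launch("appsrc name=src ! audioconvert ! autoaudiosink")
appsrc = pipeline.get_by_name("src")
appsrc.set_property("is-live", False)  # not a live source, so downstream buffering applies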
Hope this helps.
Related
I am working with a group of people on a program that is communication intensive. I'm not particularly good at debugging distributed programs, but I have a strong suspicion that I am sending too many messages at once to a process. I have reimplemented the actor model in mpi4py. Each process has a "mailbox" of jobs, and when it finishes with its mailbox it goes into CHECK_FOR_UPDATES mode, where it sees whether there are any new messages it can receive.
I had issues with the program that a group of students and I have been working on. When the load became too big it would start to crash, but we couldn't figure out where the issue was because we're all pretty bad at debugging stuff.
I asked someone at my school if he had any ideas, and he suggested that, as we are reimplementing the actor system, we should consider using Akka. A student this year said that there may still be a problem: one actor may get inundated with messages and crash. I asked about it here. The stream model seems not to be what we want (see my comment for more details), and I have since looked back at the mpi4py program, as I had not accounted for this problem before.
In the plain C or Fortran implementations, it appears that there is a count parameter for MPI_Recv. I noticed that comm.recv has no count parameter, and I suspect that when a process goes into CHECK_FOR_UPDATES mode it just consumes a ton of messages from a variety of sources and dies. (Technically, I don't know this for sure, but we suspect it might be the case.) Is there a way to cap the amount of data comm.recv accepts?
(Note: I want to avoid using comm.Recv variant as it restricts the user to using numpy arrays.)
Found the answer:
The recv() and irecv() methods may be passed a buffer object that can be repeatedly used to receive messages avoiding internal memory allocation. The buffer must be sufficiently large to accommodate the transmitted messages.
Emphasis mine. Therefore, I have to use Send and Recv.
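For anyone else landing here, this is roughly what the buffered variant looks like (the dtype, cap and tag below are made up; run with two ranks):
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

MAX_ITEMS = 1024                       # hypothetical cap on message size
buf = np.empty(MAX_ITEMS, dtype='i')   # preallocated, reusable receive buffer

if comm.Get_rank() == 0:
    data = np.arange(10, dtype='i')
    comm.Send([data, MPI.INT], dest=1, tag=7)
else:
    status = MPI.Status()
    # Recv fills at most MAX_ITEMS ints; an oversized incoming message is an
    # MPI error instead of an unbounded allocation inside recv().
    comm.Recv([buf, MPI.INT], source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
    received = buf[:status.Get_count(MPI.INT)]
    print(received)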
I am trying to figure out how to get a process to ignore SIGKILL. The way I understand it, this isn't normally possible. My idea is to get a process into the 'D' state permanently. I want to do this for testing purposes (the corner case isn't really reproducible). I'm not sure this is possible programmatically (I don't want to go and damage hardware). I'm working in C++ and Python, but any language should be fine. I have root access.
I don't have any code to show because I don't know how to get started with this, or if it's even possible. Could I possibly set up a bad NFS and try reading from it?
Apologies in advance if this is a duplicate question; I didn't find anyone else trying to induce the D state.
Many thanks.
To get a process into the "D" state (uninterruptible sleep), you have to write kernel code which does that, and then call that code from user space via a system call.
In the Linux kernel, this is done by setting the current task state to uninterruptible, and invoking the scheduler:
set_current_state(TASK_UNINTERRUPTIBLE);
schedule();
Of course, these actions are normally wrapped with additional preparations so that the task has a way to wake up, such as registering on some wait queue or whatever.
Device drivers for low-latency devices such as mass storage use uninterruptible sleeps to simplify their logic. An uninterruptible sleep should only be used when there is a sure-fire way that the process will wake up, almost no matter what happens.
Kernel code to do a little thing like performing an uninterruptible sleep can be put into a tiny module (start with a minimal driver skeleton) whose initialization function performs the sleep and then returns nonzero. You can then run the code using insmod, e.g.
insmod my_uninterruptible_sleep_mod.ko
There is no need to rmmod, because the initialization function fails and so the module is unloaded immediately.
It is not possible to ignore SIGKILL or handle it in any way.
From man sigaction:
The sa_mask field specified in act is not allowed to block SIGKILL or SIGSTOP. Any attempt to do so will be silently ignored.
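The same refusal is visible from user space; for example, in Python 3 the following attempt is rejected by the kernel outright:
import signal

try:
    signal.signal(signal.SIGKILL, lambda signum, frame: None)
except OSError as err:
    # The kernel refuses to install any handler (or SIG_IGN) for SIGKILL.
    print("cannot handle SIGKILL:", err)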
I'm crunching a tremendous amount of data and, since I have a 12-core server at my disposal, I've decided to split the work by using the multiprocessing library. The way I'm trying to do this is by having a single parent process that dishes out work evenly to multiple worker processes, and another process that acts as a collector/funnel of all the completed work, to be moderately processed for final output. Having done something similar to this before, I'm using Pipes because they are crazy fast in contrast to managed queues.
Sending data out to the workers using the pipes is working fine. However, I'm stuck on efficiently collecting the data from the workers. In theory, the work being handed out will be processed at the same pace and they will all get done at the same time. In practice, this never happens. So, I need to be able to iterate over each pipe to do something, but if there's nothing there, I need it to move on to the next pipe and check if anything is available for processing. As mentioned, it's on a 12 core machine, so I'll have 10 workers funneling down to one collection process.
The workers use the following to read from their pipe (called WorkerRadio):
for Message in iter(WorkerRadio.recv, 'QUIT'):
    # ...crunch numbers & perform tasks here...
    CollectorRadio.send(WorkData)
WorkerRadio.send('Quitting')
So, they sit there looking at the pipe until something comes in. As soon as they get something they start doing their thing. Then fire it off to the data collection process. If they get a quit command, they acknowledge and shut down peacefully.
As for the collector, I was hoping to do something similar but instead of just 1 pipe (radio) there would be 10 of them. The collector needs to check all 10, and do something with the data that comes in. My first try was doing something like the workers...
i = 0
for Message in iter(CollectorRadio[i].recv, 'QUIT'):
    # ...crunch numbers & perform tasks here...
    if i < NumOfRadios:
        i += 1
    else:
        i = 0
CollectorRadio.send('Quitting')
That didn't cut it, and I tried a couple of other ways of manipulating it without success too. I either end up with syntax errors, or, like the above, I get stuck on the first radio because it never changes for some reason. I looked into having all the workers talk into a single pipe, but the Python documentation explicitly states that "data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time."
As I mentioned, I'm also worried about some processes going slower than the others and holding up progress. If at all possible, I would like something that doesn't wait around for data to show up (i.e. check and move on if nothing's there).
Any help on this would be greatly appreciated. I've seen some use of managed queues that might allow this to work; but, from my testing, managed queues are significantly slower than pipes, and I can use as much performance on this as I can muster.
SOLUTION:
Based on pajton's post, here's what I did to make it work:
import select

# Create a list of pipes (labeled as radios).
TheRadioList = [CollectorRadio[i] for i in range(NumberOfRadios)]

while True:
    # Check for data on the pipes/radios.
    TheTransmission, Junk1, Junk2 = select.select(TheRadioList, [], [])
    # Find out who sent the data (which pipe/radio).
    for TheSender in TheTransmission:
        # Read the data from the pipe.
        TheMessage = TheSender.recv()
        # ...crunch numbers & perform tasks here...
If you are using standard system pipes, then you can use the select system call to query which descriptors have data available. By default, select will block until at least one of the passed descriptors is ready:
read_pipes = [pipe_fd0, pipe_fd1, ...]

while True:
    read_fds, write_fds, exc_fds = select.select(read_pipes, [], [])
    for read_fd in read_fds:
        # read from the read_fd pipe descriptor
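As an aside, this also works with the Connection objects returned by multiprocessing.Pipe(): on Unix they expose fileno(), so they can be passed to select directly, and they also offer poll(timeout) for a non-blocking "check and move on" loop, e.g.:
from multiprocessing import Pipe

parent_end, worker_end = Pipe()

# poll(0) returns immediately: True if a message is waiting, False otherwise.
if parent_end.poll(0):
    message = parent_end.recv()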
I'm working with a device that is essentially a black box, and the only known communication method for it is XML-RPC. It works for most needs, except for when I need to execute two commands very quickly after each other. Due to the overhead and waiting for the RPC response, this is not as quick as desired.
My main question is, how does one reduce this overhead to make this functionality possible? I know the obvious solution is to ditch XML-RPC, but I don't think that's possible for this device, as I have no control over implementing any other protocols on the "server". This also makes it impossible to do a MultiCall, as I cannot add valid instructions for MultiCall. Does MultiCall have to be implemented server-side? For example, if I have method1(), method2(), and method3() all implemented by the server already, should this block of code work to execute them all in a single round trip? I'd assume not, from my testing so far, as the documentation shows examples where I need to initialize commands on the server side.
server = xmlrpclib.ServerProxy(serverURL)
multicall = xmlrpclib.MultiCall(server)
multicall.method1()
multicall.method2()
multicall.method3()
multicall()
Also, looking through the source of xmlrpclib, I see references to a "FastParser" as opposed to the default one that is used. However, I cannot determine how to enable this parser over the default. Additionally, the comment on this answer mentions that it parses one character at a time. I believe this is related, but again, I have no idea how to change this setting.
Unless the bulk size of your requests or responses is very large, it's unlikely that changing the parser will affect the turnaround time (since the CPU is much faster than the network).
You might want to consider, if possible, sending more than one command to the device without waiting for the response from the first one. If the device can handle multiple requests at once, then this may be of benefit. Even if the device only handles requests in sequence, you can still have the next request waiting at the device so that there is no delay after processing the previous one. If the device serialises requests in this way, then that's going to be about the best you can do.
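One way to sketch that kind of overlap from the client side is to fire the calls from separate threads, each with its own proxy (the URL and method names below are hypothetical, and whether the device tolerates concurrent requests is exactly the open question):
import xmlrpclib
from threading import Thread

serverURL = "http://device.example:8080/RPC2"   # hypothetical

def call(method_name, results):
    # One ServerProxy per thread, so the requests do not share a connection.
    proxy = xmlrpclib.ServerProxy(serverURL)
    results[method_name] = getattr(proxy, method_name)()

results = {}
threads = [Thread(target=call, args=(name, results))
           for name in ("method1", "method2")]
for t in threads:
    t.start()
for t in threads:
    t.join()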
With regard to the Python Twisted framework, can someone explain to me how to asynchronously write a very large string of data to a consumer, say the protocol.transport object?
I think what I am missing is a write(data_chunk) function that returns a Deferred. This is what I would like to do:
data_block = get_lots_and_lots_data()
CHUNK_SIZE = 1024  # write 1 KB at a time

def write_chunk(data, i):
    d = transport.deferredWrite(data[i:i + CHUNK_SIZE])
    d.addCallback(lambda _: write_chunk(data, i + CHUNK_SIZE))

write_chunk(data_block, 0)
But, after a day of wandering around in the Twisted API and documentation, I can't seem to locate anything like a deferredWrite equivalent. What am I missing?
As Jean-Paul says, you should use IProducer and IConsumer, but you should also note that the lack of deferredWrite is a somewhat intentional omission.
For one thing, creating a Deferred for potentially every byte of data that gets written is a performance problem: we tried it in the web2 project and found that it was the most significant performance issue with the whole system, and we are trying to avoid that mistake as we backport web2 code to twisted.web.
More importantly, however, having a Deferred which gets returned when the write "completes" would provide a misleading impression: that the other end of the wire has received the data that you've sent. There's no reasonable way to discern this. Proxies, smart routers, application bugs and all manner of network contrivances can conspire to fool you into thinking that your data has actually arrived on the other end of the connection, even if it never gets processed. If you need to know that the other end has processed your data, make sure that your application protocol has an acknowledgement message that is only transmitted after the data has been received and processed.
The main reason to use producers and consumers in this kind of code is to avoid allocating memory in the first place. If your code really does read all of the data that it's going to write to its peer into a giant string in memory first (data_block = get_lots_and_lots_data() pretty directly implies that) then you won't lose much by doing transport.write(data_block). The transport will wake up and send a chunk of data as often as it can. Plus, you can simply do transport.write(hugeString) and then transport.loseConnection(), and the transport won't actually disconnect until either all of the data has been sent or the connection is otherwise interrupted. (Again: if you don't wait for an acknowledgement, you won't know if the data got there. But if you just want to dump some bytes into the socket and forget about it, this works okay.)
If get_lots_and_lots_data() is actually reading a file, you can use the included FileSender class. If it's something which is sort of like a file but not exactly, the implementation of FileSender might be a useful example.
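For reference, a minimal sketch of FileSender in use inside a protocol (the file name is made up):
from twisted.internet import protocol
from twisted.protocols.basic import FileSender

class SendBigFile(protocol.Protocol):
    def connectionMade(self):
        sender = FileSender()
        # beginFileTransfer registers the FileSender as a producer on the
        # transport and returns a Deferred that fires when the file has been
        # fully written out to the transport.
        d = sender.beginFileTransfer(open("huge.bin", "rb"), self.transport)
        d.addCallback(lambda _: self.transport.loseConnection())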
The way large amounts of data are generally handled in Twisted is with the Producer/Consumer APIs. These don't give you a write method that returns a Deferred, but they do give you notification about when it's time to write more data.
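As a small sketch of that pattern, here is a pull producer that writes one chunk each time the transport asks for more data (the chunk size is arbitrary):
from zope.interface import implementer
from twisted.internet import interfaces

@implementer(interfaces.IPullProducer)
class ChunkedWriter(object):
    """Write `data` to `consumer` one chunk at a time, on demand."""

    def __init__(self, consumer, data, chunk_size=16384):
        self.consumer = consumer
        self.data = data
        self.chunk_size = chunk_size
        self.offset = 0

    def resumeProducing(self):
        # Called by the transport whenever it is ready to accept more data.
        chunk = self.data[self.offset:self.offset + self.chunk_size]
        self.offset += self.chunk_size
        if chunk:
            self.consumer.write(chunk)
        else:
            self.consumer.unregisterProducer()

    def stopProducing(self):
        pass

# Typical use inside a Protocol:
#   producer = ChunkedWriter(self.transport, big_string)
#   self.transport.registerProducer(producer, streaming=False)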