Python does not stop reading from the stdin buffer

In Node.js, I spawn a child Python process and pipe data to it. I want to send a Uint8Array through stdin. To tell the Python side how many bytes of buffer data to read, I send the buffer's size first. But Python does not stop reading after the specified number of bytes, so the process never terminates. I've checked that it receives bufferSize properly and converts it into an integer. When I remove size = int(input()) and python.stdin.write(bufferSize.toString() + "\n") and hardcode the size of the buffer instead, it works correctly. I can't figure out why it doesn't stop waiting after reading the specified amount of bytes.
// Node.js
const { spawn } = require('child_process')
const python_command = command.serializeBinary()
const python = spawn('test/production_tests/py_test_scripts/protocolbuffer/venv/bin/python', ['test/production_tests/py_test_scripts/protocolbuffer/command_handler.py']);
const bufferSize = python_command.byteLength
python.stdin.write(bufferSize.toString() + "\n")
python.stdin.write(python_command)
# Python
import sys

size = int(input())
data = sys.stdin.buffer.read(size)
In a nutshell, the problem arises from calling the text-mode input() first and then sys.stdin.buffer.read. I guess the first call conflicts with the second and prevents it from working normally.

There are two potential problems here. The first is that the pipe between Node.js and the Python script is block buffered: you won't see any data on the Python side until a block's worth of data has been filled (the block size is system dependent) or the pipe is closed. The second is that there is a decoder between input() and the byte stream coming in on stdin. This decoder is free to read ahead in the stream as it wishes, so a later read from sys.stdin.buffer may miss whatever happens to be buffered in the decoder.
You can solve the second problem by doing all of your reads from the buffer, as shown below. The first problem needs to be solved on the Node.js side, most likely by closing the subprocess's stdin. You may be better off writing the size as a binary number, say a uint64.
import struct
import sys

# read the size - assuming it's coming in as an ASCII stream
size_buf = []
while True:
    c = sys.stdin.buffer.read(1)
    if c == b"\n":
        size = int(b"".join(size_buf))
        break
    size_buf.append(c)

fmt = "B"  # read unsigned char
fmtsize = struct.calcsize(fmt)
buf = [struct.unpack(fmt, sys.stdin.buffer.read(fmtsize))[0] for _ in range(size)]
print(buf)
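As a minimal sketch of the binary-size variant on the Python side, assuming the sender writes an 8-byte big-endian unsigned length (uint64) followed by exactly that many payload bytes; read_message is a hypothetical helper name, not part of the original code:

import struct
import sys

def read_message(stream=sys.stdin.buffer):
    # assumes an 8-byte big-endian unsigned length prefix (uint64)
    header = stream.read(8)
    if len(header) < 8:
        raise EOFError("stream closed before a full length header arrived")
    (size,) = struct.unpack(">Q", header)
    payload = b""
    while len(payload) < size:
        chunk = stream.read(size - len(payload))
        if not chunk:
            raise EOFError("stream closed mid-message")
        payload += chunk
    return payload

On the Node.js side, the matching length prefix could be written with Buffer's writeBigUInt64BE before the payload.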

Related

How to read stdin buffer in advance before an EOF in python3?

In my Python code I wrote the following function to receive a self-defined binary package from stdin.
import json
import sys

def recvPkg():
    ## The first 4 bytes stand for the remaining package length
    Len = int.from_bytes(sys.stdin.buffer.read(4), byteorder='big', signed=True)
    ## Then read the remaining package
    data = json.loads(str(sys.stdin.buffer.read(Len), 'utf-8'))
    ## do something...

while True:
    recvPkg()
Then, in another Node.js program, I spawn this Python program as a child process and send bytes to it.
childProcess = require('child_process').spawn('./python_code.py');
childProcess.stdin.write(someBinaryPackage)
I expect the child process to read from its stdin buffer once a package is received and give the output. But it doesn't work, and I think the reason is that the child process won't begin to read unless its stdin buffer receives a signal, like an EOF. As proof, if I close childProcess's stdin after stdin.write, the Python code works and receives all the buffered packages at once. This is not what I want, because I need childProcess's stdin to stay open. So is there any other way for Node.js to signal childProcess to read from its stdin buffer?
From Wikipedia (emphasis mine):
Input from a terminal never really "ends" (unless the device is disconnected), but it is useful to enter more than one "file" into a terminal, so a key sequence is reserved to indicate end of input. In UNIX the translation of the keystroke to EOF is performed by the terminal driver, so a program does not need to distinguish terminals from other input files.
There is no way to send an EOF character the way you are expecting. EOF isn't really a character that exists. When you're in a terminal, you can press the key sequence Ctrl+Z on Windows or Ctrl+D in UNIX-like environments. These produce control characters for the terminal (code 26 on Windows, code 04 on UNIX). The terminal, upon reading such a code, essentially stops writing to the program's stdin and closes it.
In Python, a file object will happily let you .read() forever. The EOF condition is that .read() returns ''. In some other languages, this might be -1, or some other condition.
Consider:
>>> my_file = open("file.txt", "r")
>>> my_file.read()
'This is a test file'
>>> my_file.read()
''
The last character here isn't an EOF marker; there's just nothing left. Python has .read() to the end of the file and can't .read() any more.
Because stdin is a special type of 'file', it doesn't have an end. You have to define that end. The terminal defines it via the control characters above, but here you are not passing data to stdin via a terminal, so you'll have to manage it yourself.
Just closing the file
Input [...] never really "ends" (unless the device is disconnected)
Closing stdin is probably the simplest solution here. stdin is an infinite file, so once you're done writing to it, just close it.
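Here is a minimal sketch of this approach, driven from Python's subprocess module for illustration (in Node.js the equivalent is calling childProcess.stdin.end() when you are done writing); some_binary_package is a placeholder for whatever bytes you want to send:

import subprocess

# spawn the child, write everything, then close its stdin so the
# child's reads see EOF; communicate() does the write-then-close
child = subprocess.Popen(['./python_code.py'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE)
out, _ = child.communicate(some_binary_package)  # some_binary_package: placeholder bytes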
Expect your own control character
Another option is to define your own control character. You can use whatever you want here. The example below uses a NULL byte.
Python
import sys

class FileWithEOF:
    def __init__(self, file_obj):
        self.file = file_obj
        self.value = bytes()

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        pass

    def read(self):
        while True:
            val = self.file.buffer.read(1)
            if val == b"\x00":
                break
            self.value += val
        return self.value

data = FileWithEOF(sys.stdin).read()
Node
childProcess = require('child_process').spawn('./python_code.py');
childProcess.stdin.write("Some text I want to send.");
childProcess.stdin.write(Buffer.from([0x00]));
You might be reading the wrong length
I think the value you're capturing in Len is less than the length of your file.
Python
import sys

while True:
    length = int(sys.stdin.read(2))
    with open("test.txt", "a") as f:
        f.write(sys.stdin.read(length))
Node
childProcess = require('child_process').spawn('./test.py');
// Python reads the first 2 characters (`.read(2)`)
childProcess.stdin.write("10");
// Python reads 9 characters, but does nothing because it's
// expecting 10. `stdin` is still capable of producing bytes from
// Python's point of view.
childProcess.stdin.write("123456789");
// Writing the final byte hits 10 characters, and the contents
// are written to `test.txt`.
childProcess.stdin.write("A");

MPD, FIFO, Python, Audioop, Arduino, and Voltmeter: "Faking" a VU Meter

I'm trying to use a computer connected to an Arduino (which is itself connected to some 5V voltmeters) to "fake" an old school stereo VU meter. My goal is to have the computer that is playing the audio file analyze the signal and send the amplitude information to the Arduino via a serial connection, to be displayed on the voltmeters.
I'm using MPD to render and send the audio to a USB DAC (ODAC). MPD is also outputting to a FIFO, which I read from using a Python script. I read from the FIFO in 4096 byte chunks, then use the audioop library to split that chunk/sample into a left and right channel and compute the maximum amplitude of each channel.
Here's the problem - I'm getting swamped with data. I'm guessing my math is wrong or that I don't understand how a FIFO works (or maybe both). MPD is outputting everything in 44100:16:2 format - I thought that meant that it would be writing out 44,100 4-byte samples per second. So if I'm grabbing 4096 byte chunks, I should expect about 43 chunks per second. But I'm getting far more than that (over 100) and the number of chunks I get per second doesn't change if I up my chunk size. For example, if I double my chunk size to 8192, I still get roughly the same number of chunks per second. So clearly I'm doing something wrong, but I don't know what it is. Anyone have any thoughts?
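For reference, the questioner's arithmetic checks out:

# expected data rate for 44100:16:2 (rate:bits:channels) audio
bytes_per_second = 44100 * 2 * 2        # frames/s * channels * bytes/sample = 176400
chunks_per_second = bytes_per_second / 4096
print(chunks_per_second)                 # ~43.07 chunks of 4096 bytes per second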
Here is the relevant portion of my mpd.conf file:
audio_output {
    type    "fifo"
    name    "my_fifo"
    path    "/tmp/mpd.fifo"
    format  "44100:16:2"
}
And here is the Python script:
import os
import audioop
import time
import errno
import math

# Open the FIFO that MPD has created for us.
# This carries the samples (44100:16:2) that MPD is currently "playing".
fifo = os.open('/tmp/mpd.fifo', os.O_RDONLY)

while 1:
    try:
        rawStream = os.read(fifo, 4096)
    except OSError as err:
        if err.errno == errno.EAGAIN or err.errno == errno.EWOULDBLOCK:
            rawStream = None
        else:
            raise
    if rawStream:
        leftChannel = audioop.tomono(rawStream, 2, 1, 0)
        rightChannel = audioop.tomono(rawStream, 2, 0, 1)
        stereoPeak = audioop.max(rawStream, 2)
        leftPeak = audioop.max(leftChannel, 2)
        rightPeak = audioop.max(rightChannel, 2)
        leftDB = 20 * math.log10(leftPeak) - 74
        rightDB = 20 * math.log10(rightPeak) - 74
        print(rightPeak, leftPeak, rightDB, leftDB)
Answering my own question. It turns out that, regardless of how many bytes I asked for, os.read() was returning 2048 bytes. The second parameter to os.read() is the maximum number of bytes it will read; there is no guarantee that many bytes will actually be returned. I had thought that by leaving out the O_NONBLOCK flag when opening the FIFO, the os.read() call would wait until it saw end-of-file or had the specified number of bytes. That's not the case. To get around this, my code now checks the length of the byte string returned by os.read() and, if that length is less than my specified chunk size, waits for the next chunk(s) and concatenates them until the total matches my target before moving on to processing the data.
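A minimal sketch of that fix, with read_chunk as an illustrative helper name:

import os

def read_chunk(fifo_fd, chunk_size=4096):
    # keep calling os.read() until a full chunk has accumulated;
    # a short read just means the rest hasn't arrived yet
    chunks = []
    remaining = chunk_size
    while remaining > 0:
        data = os.read(fifo_fd, remaining)
        if not data:  # the writer closed the FIFO
            break
        chunks.append(data)
        remaining -= len(data)
    return b''.join(chunks)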

Sending data chunks over named pipe in linux

I want to send data blocks over a named pipe and want the receiver to know where each data block ends. How should I do this with named pipes? Should I use some format for joining and splitting blocks (treating the pipe as a plain stream of bytes), or is there some other method?
I've tried opening and closing the pipe at the sender for every data block, but the data arrives concatenated at the receiver side (no EOF is sent):
for _ in range(2):
    with open('myfifo', 'bw') as f:
        f.write(b'+')
Result:
rsk#fe temp $ cat myfifo
++rsk#fe temp $
You can either use some sort of delimiter or a frame structure over your pipes, or (preferably) use multiprocessing.Pipe-like objects and run pickled Python objects through them.
The first option is simply defining a simple protocol you will be running through your pipe. Add a header to each chunk of data you send so that you know what to do with it. For instance, use a length-value system:
import struct

def send_data(file_descriptor, data):
    # prefix each chunk with a 4-byte big-endian length
    length = struct.pack('>L', len(data))
    file_descriptor.write(length + data)

def read_data(file_descriptor):
    binary_length = file_descriptor.read(4)
    length = struct.unpack('>L', binary_length)[0]
    data = b''
    while len(data) < length:
        data += file_descriptor.read(length - len(data))
    return data
As for the other option: you can read the code of the multiprocessing module, but essentially you just run the result of pickle.dumps (cPickle on Python 2) through the pipe and read it back into pickle.loads to get Python objects.
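A minimal sketch of the multiprocessing.Pipe approach, which handles both the pickling and the message boundaries for you:

from multiprocessing import Pipe, Process

def worker(conn):
    obj = conn.recv()          # receives one whole pickled object
    conn.send({'echo': obj})   # message boundaries are preserved
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3])
    print(parent_conn.recv())  # {'echo': [1, 2, 3]}
    p.join()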
I would just use lines of JSON-encoded data. These are easy to debug and the performance is reasonable.
For an example on reading and writing lines:
http://www.tutorialspoint.com/python/file_writelines.htm
For an example of using ujson (UltraJSON):
https://pypi.python.org/pypi/ujson
In addition to the other solutions, note that you don't have to stick to named pipes. Named (Unix domain) sockets are no worse and provide more handy features. With AF_LOCAL and SOCK_SEQPACKET, message boundaries are maintained by the kernel, so whatever is written by a single send() will be received on the opposite side by a single recv().
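A minimal sketch of the SOCK_SEQPACKET behavior using a socket pair (Linux supports SOCK_SEQPACKET for Unix domain sockets; AF_UNIX is the socket module's spelling of AF_LOCAL):

import socket

left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
left.send(b'first message')
left.send(b'second message')
print(right.recv(4096))  # b'first message'  - one send, one recv
print(right.recv(4096))  # b'second message' - boundaries preserved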

python socket: sending and receiving 16 bytes

See edits below.
I have two programs that communicate through sockets. I'm trying to send a block of data from one to the other. This has been working with some test data, but is failing with others.
s.sendall('%16d' % len(data))
s.sendall(data)
print(len(data))
sends to
size = int(s.recv(16))
recvd = ''
while size > len(recvd):
    data = s.recv(1024)
    if not data:
        break
    recvd += data
print(size, len(recvd))
At one end:
s = socket.socket()
s.connect((server_ip, port))
and the other:
c = socket.socket()
c.bind(('', port))
c.listen(1)
s,a = c.accept()
In my latest test, I sent a 7973903-byte block and the receiver reports receiving 7973930 bytes.
Why is the data received off by 27 bytes?
Any other issues?
Python 2.7 or 2.5.4 if that matters.
EDIT: Aha - I'm probably reading past the end of the send buffer. If remaining bytes is less than 1024, I should only read the number of remaining bytes. Is there a standard technique for this sort of data transfer? I have the feeling I'm reinventing the wheel.
EDIT2: I'm screwing up by reading the next file in the series. I'm sending file1 and the last block is 997 bytes. Then I send file2, so the recv(1024) at the end of file1 reads the first 27 bytes of file2.
I'll start another question on how to do this better.
Thanks everyone. Asking and reading comments helped me focus.
First, the line
size = int(s.recv(16))
might read less than 16 bytes — it is unlikely, I will grant, but possible depending on how the network buffers align. The recv() call argument is a maximum value, a limit on how much data you are willing to receive. But you might only receive one byte. The operating system will generally give you control back once at least one byte has arrived, maybe (depending on the OS and on how busy the CPU is) after waiting another few milliseconds in case a second packet arrives with some further data, so that it only has to wake you up once instead of twice.
So you would want to say instead (to do the simplest possible loop; other variants are possible):
data = ''
while len(data) < 16:
    more = s.recv(16 - len(data))
    if not more:
        raise EOFError()
    data += more
This is indeed a wheel nearly everyone re-invents because it is so often needed. And your own code needs it a second time: your while loop needs its recv() to count down, asking for smaller and smaller limits until finally it has received exactly the number of bytes that were promised, and no more.
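Putting both loops behind one helper, here is a minimal sketch (recv_exactly is an illustrative name, written for the Python 2 byte strings used in the question):

def recv_exactly(sock, count):
    # ask only for the bytes still missing, so we never read past
    # the end of this message into the next one
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(min(remaining, 1024))
        if not chunk:
            raise EOFError('socket closed with %d bytes missing' % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return ''.join(chunks)

# usage: read the fixed-width header, then exactly the promised payload
# size = int(recv_exactly(s, 16))
# data = recv_exactly(s, size)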

Python EOF for multi byte requests of file.read()

The Python docs on file.read() state that "An empty string is returned when EOF is encountered immediately." The documentation further states:
Note that this method may call the underlying C function fread() more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.
I believe Guido has made his view on not adding f.eof() PERFECTLY CLEAR, so we need to use the Python way!
What is not clear to ME, however, is whether it is a definitive test of EOF if you receive fewer bytes than you requested from a read, but did receive some.
ie:
with open(filename, 'rb') as f:
    while True:
        s = f.read(size)
        l = len(s)
        if l == 0:
            break  # it is clear that this is EOF...
        if l < size:
            break  # ? Is receiving less than the request EOF???
Is it a potential error to break if you have received less than the bytes requested in a call to file.read(size)?
You are not thinking with your snake skin on... Python is not C.
First, a review:
st=f.read() reads to EOF, or if opened as binary, to the last byte;
st=f.read(n) attempts to read n bytes and in no case reads more than n bytes;
st=f.readline() reads a line at a time; the line ends with '\n' or EOF;
st=f.readlines() uses readline() to read all the lines in a file and returns a list of the lines.
If a file read method is at EOF, it returns ''. The same type of EOF test is used in the other 'file-like' methods, such as StringIO, socket.makefile, etc. A return of less than n bytes from f.read(n) is most assuredly NOT a dispositive test for EOF! While that code may work 99.99% of the time, it is the times it does not work that would be very frustrating to track down. Plus, it is bad Python form. The only use for n in this case is to put an upper limit on the size of the return.
What are some of the reasons that Python's file-like methods might return less than n bytes?
EOF is certainly a common reason;
A network socket may timeout on read yet remain open;
Exactly n bytes may cause a break between logical multi-byte characters (such as \r\n in text mode and, I think, a multi-byte character in Unicode) or some underlying data structure not known to you;
The file is in non-blocking mode and another process begins to access the file;
Temporary non-access to the file;
An underlying error condition, potentially temporary, on the file, disc, network, etc.
The program received a signal, but the signal handler ignored it.
I would rewrite your code in this manner:
with open(filename, 'rb') as f:
    while True:
        s = f.read(max_size)
        if not s:
            break
        # process the data in s...
Or, write a generator:
def blocks(infile, bufsize=1024):
    while True:
        try:
            data = infile.read(bufsize)
            if data:
                yield data
            else:
                break
        except IOError as err:
            print("I/O error({0}): {1}".format(err.errno, err.strerror))
            break

f = open('somefile', 'rb')
for block in blocks(f, 2**16):
    # process a block that COULD be up to 65,536 bytes long
    ...
Here's what my C compiler's documentation says for the fread() function:
size_t fread(void *buffer, size_t size, size_t count, FILE *stream);
fread returns the number of full items actually read, which may be less than count if an error occurs or if the end of the file is encountered before reaching count.
So it looks like getting less than size means either an error has occurred or EOF has been reached -- so breaking out of the loop would be the correct thing to do.
