Consider this small python script odd-read-blocking.py:
#!/usr/bin/python
import signal
import sys
sig = None
def handler(signum, frame):
    global sig
    sig = signum
signal.signal(signal.SIGINT, handler)
signal.signal(signal.SIGTERM, handler)
x = sys.stdin.read(3)
print 'signal', sig
print 'read bytes', len(x)
exit(0)
I run this and feed it with two bytes of standard input data ('a' + '\n'):
> echo a | ./odd-read-blocking.py
signal None
read bytes 2
>
Fine.
Now I feed it with the same two bytes (by typing 'a' + '\n' into its standard input). Please note that standard input is then not at EOF yet and potentially has more data to come. So the read blocks, as it expects one more byte. I use Ctrl+C on the script.
> ./odd-read-blocking.py
a
^Csignal 2
read bytes 2
>
Fine. We see that two bytes have been read and signal 2 was received.
Now I open a standard input stream, but do not send any byte on it. The read blocks as expected. If I now use Ctrl+C on the script, it will keep sitting there and wait. The read will not be interrupted. The SIGINT will not be processed.
> ./odd-read-blocking.py
^C
Nothing here. Script still running (seemingly blocked at the read).
Now hitting return once, then Ctrl+C again:
^Csignal 2
read bytes 1
>
So, only after receiving at least some data (a single '\n' in this case) on its standard input will the script behave as I expect it and correctly interrupt the blocked read and tell me it has received signal 2 and read 1 byte.
Alternative 1: instead of using Ctrl+C as shown above, I have tried this same thing using kill pid from a separate terminal. The behaviour is the same.
Alternative 2: instead of using the shell standard input as described above, I have done this:
> sleep 2000 | ./odd-read-blocking.py
When using kill pid to send SIGTERM to the odd-read-blocking.py process I get the same behaviour. Here, the script process can only be killed using SIGKILL (9).
Why isn't the read interrupted, when it is blocking on an as yet empty but still active standard input stream?
I find this odd. Who doesn't? Who can explain?
The short version
If a Python signal handler throws an exception to abandon an ongoing file.read, any data already read is lost. (Any asynchronous exception, like the default KeyboardInterrupt, makes it basically impossible to prevent this sort of failure unless you have a way to mask it.)
To minimize the need for this, file.read returns early (i.e., with a shorter string than requested) when it is interrupted by a signal—note that this is in addition to the EOF and non-blocking I/O cases that are documented! However, it can't do this when it has no data yet, since it returns the empty string to indicate EOF.
Details
As always, the way to understand behavior like this is with strace.
read(2)
The actual read system call has a dilemma when a signal arrives while the process is blocked. First, the (C) signal handler gets invoked—but because that could happen between any two instructions, there's very little it can do beyond setting a flag (or writing to a self-pipe). Then what? If SA_RESTART is set, the call is resumed; otherwise…
If no data has been transferred yet, read can fail and the client can check its signal flag. It fails with the special EINTR to clarify that nothing actually went wrong with the I/O.
If some data has already been written into the (userspace) buffer, it can't just return "failure", because data would be lost—the client can't know how much (if any) data is in the buffer. So it just returns success (the number of bytes read so far)! Short reads like this are always a possibility: the client has to call read again to check that it has reached end of file. (Just like file.read, a short read of 0 bytes would be EOF.) The client therefore has to check their signal flag after every read, whether it succeeds or not. (Note that this is still not perfectly reliable, but it's good enough for many interactive use cases.)
file.read()
The system call isn't the whole story: after all, the normal configuration for a terminal has it return immediately after seeing a newline. Python 2's low-level file.read is a wrapper for fread, which will issue another read if one is short. But when a read fails with EINTR, fread returns early and file.read calls your (Python) signal handler. (If you add output to it, you'll see that it's called immediately for each signal you send, even if file.read doesn't return.)
Then it's faced with a dilemma similar to that for the system call: as discussed, a short read can't be empty because it means EOF. Unlike a C signal handler, however, a Python one can do arbitrary work (including raising an exception to abort the I/O immediately, at the cost of risking data loss as mentioned at the beginning), and it's considered a convenient simplification to the interface to hide the possibility of EINTR. So the fread call is just silently repeated.
Python 3.5
The rules for retrying changed in 3.5. Now the io.IOBase.read resumes even if it has data in hand; this is more consistent, but it forces the use of exceptions to stop reading, which means that you can't opt to wait on some data in order not to risk losing any you already have. The very heavyweight solution is to switch to multiplexed I/O and use signal.set_wakeup_fd(); this has the added advantage of allowing SIGINT to affect the main thread without having to bother with masking it in all the others.
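The multiplexed approach mentioned above might look roughly like the following in Python 3. This is only a sketch under stated assumptions: `read_or_signal` is a made-up helper name, it reads with `os.read` on a raw file descriptor instead of a buffered file object, it must run in the main thread, and it relies on Python 3 writing the signal number as a single byte to the wakeup descriptor.

```python
import os
import select
import signal
import socket


def read_or_signal(data_fd, nbytes, signums=(signal.SIGINT, signal.SIGTERM)):
    """Read up to nbytes from data_fd, stopping early if a signal arrives.

    Returns (data, signum); signum is None when no signal interrupted the read.
    Hypothetical helper, must be called from the main thread.
    """
    wake_r, wake_w = socket.socketpair()
    wake_r.setblocking(False)
    wake_w.setblocking(False)  # set_wakeup_fd() requires a non-blocking fd
    old_fd = signal.set_wakeup_fd(wake_w.fileno())
    # A Python-level handler must be installed for the wakeup byte to be written.
    old_handlers = {s: signal.signal(s, lambda signum, frame: None) for s in signums}
    chunks, got = [], 0
    try:
        while got < nbytes:
            ready, _, _ = select.select([data_fd, wake_r], [], [])
            if wake_r in ready:
                # The byte written to the wakeup fd is the signal number.
                return b"".join(chunks), wake_r.recv(1)[0]
            chunk = os.read(data_fd, nbytes - got)
            if not chunk:  # EOF
                break
            chunks.append(chunk)
            got += len(chunk)
        return b"".join(chunks), None
    finally:
        signal.set_wakeup_fd(old_fd)
        for s, handler in old_handlers.items():
            if handler is not None:
                signal.signal(s, handler)
        wake_r.close()
        wake_w.close()
```

Because the signal is noticed by `select` and not by the read itself, data that has already been read is returned together with the signal number, and nothing is lost.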
I'm confused about socket.send() and socket.sendall() functions in Python. As I understand from the documentation send() function uses TCP protocol and sendall() function uses UDP protocol for sending data. I know that TCP is more reliable for most of the Web Applications because we can check which packets are sent and which packets are not. That's why, I think use of send() function can be more reliable rather than sendall() function.
At this point, I want to ask what is the exact difference between these two functions and which one is more reliable for web applications?
Thank you.
socket.send is a low-level method and basically just the C/syscall method send(3) / send(2). It can send fewer bytes than you requested, but returns the number of bytes sent.
socket.sendall is a high-level Python-only method that sends the entire buffer you pass or throws an exception. It does that by calling socket.send until everything has been sent or an error occurs.
If you're using TCP with blocking sockets and don't want to be bothered
by internals (this is the case for most simple network applications),
use sendall.
And python docs:
Unlike send(), this method continues to send data from string until
either all data has been sent or an error occurs. None is returned on
success. On error, an exception is raised, and there is no way to
determine how much data, if any, was successfully sent
Credits to Philipp Hagemeister for brief description I got in the past.
edit
sendall uses send under the hood; take a look at the CPython implementation. Here is a sample function that acts (more or less) like sendall:
def sendall(sock, data, flags=0):
    ret = sock.send(data, flags)
    if ret > 0:
        return sendall(sock, data[ret:], flags)
    else:
        return None
or from rpython (pypy source):
def sendall(self, data, flags=0, signal_checker=None):
    """Send a data string to the socket. For the optional flags
    argument, see the Unix manual. This calls send() repeatedly
    until all data is sent. If an error occurs, it's impossible
    to tell how much data has been sent."""
    with rffi.scoped_nonmovingbuffer(data) as dataptr:
        remaining = len(data)
        p = dataptr
        while remaining > 0:
            try:
                res = self.send_raw(p, remaining, flags)
                p = rffi.ptradd(p, res)
                remaining -= res
            except CSocketError, e:
                if e.errno != _c.EINTR:
                    raise
            if signal_checker is not None:
                signal_checker()
Is it me, or can I not find a good tutorial on non-blocking sockets in python?
I'm not sure how to exactly work the .recv and the .send in it. According to the python docs, (my understanding of it, at least) the recv'ed or send'ed data might be only partial data. So does that mean I have to somehow concatenate the data while recv and make sure all data sends through in send. If so, how? An example would be much appreciated.
It doesn't really matter if your socket is in non-blocking mode or not, recv/send work pretty much the same; the only difference is that non-blocking socket throws 'Resource temporarily unavailable' error instead of waiting for data/socket.
The recv method returns the data received (a string), whose length is at most the bufsize you passed. If you want to receive exactly size bytes, you should do something similar to the following code:
def recvall(sock, size):
    data = ''
    while len(data) < size:
        d = sock.recv(size - len(data))
        if not d:
            # Connection closed by remote host, do what best for you
            return None
        data += d
    return data
It is important to remember that you have to do exactly the same in blocking mode. (The number of bytes passed to the application layer is limited, for example, by the size of the receive buffer in the OS.)
The send method returns the number of bytes sent, which may be less than the length of the string you passed. If you want to ensure the whole message was sent, you should do something similar to the following code:
def sendall(sock, data):
    while data:
        sent = sock.send(data)
        data = data[sent:]
You can use sock.sendall directly, but (according to the documentation) on error, an exception is raised, and there is no way to determine how much data, if any, was successfully sent.
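If knowing how far a failed send got matters, one option is to keep the loop in your own code and count as you go. The following is a minimal Python 3 sketch, not how `sendall` itself works; `send_all_counted` and `PartialSendError` are hypothetical names introduced for illustration.

```python
import socket


class PartialSendError(Exception):
    """Hypothetical exception that records how many bytes were sent before failing."""

    def __init__(self, sent, cause):
        super().__init__("sent %d bytes before error: %r" % (sent, cause))
        self.sent = sent
        self.cause = cause


def send_all_counted(sock, data):
    """Send all of data on a blocking socket and return the byte count.

    On a socket error, raise PartialSendError carrying the count so far.
    """
    view = memoryview(data)  # slicing a memoryview avoids copying the remainder
    total = 0
    while total < len(view):
        try:
            total += sock.send(view[total:])
        except OSError as exc:
            raise PartialSendError(total, exc)
    return total
```

The caller can then inspect the `sent` attribute of the exception to decide whether to resume, retry, or give up.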
Sockets in Python follow the BSD socket API and behave much like C sockets (one difference is that they raise exceptions instead of returning error codes). Any socket tutorial on the web, together with the man pages, should serve you well.
Keep bytes you want to send in a buffer. (A list of byte-strings would be best, since you don't have to concatenate them.) Use the fcntl.fcntl function to set the socket in non-blocking mode:
import fcntl, os
fcntl.fcntl(mysocket, fcntl.F_SETFL, os.O_NONBLOCK)
Then select.select will tell you when it is OK to read and write to the socket. (Writing when it is not OK will give you the EAGAIN error in non-blocking mode.) When you write, check the return value to see how many bytes were actually written. Eliminate that many bytes from your buffer. If you use the list-of-strings approach, you only need to try writing the first string each time.
If you read the empty string, your socket has closed.
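The buffering scheme described above could be sketched like this in Python 3. It assumes the socket has already been put into non-blocking mode; `drain_queue` and its `timeout` parameter are illustrative names, not part of any library.

```python
import collections
import select
import socket


def drain_queue(sock, chunks, timeout=5.0):
    """Write every byte string in chunks to a non-blocking socket.

    Only the first pending string is ever passed to send(); whatever was
    written is trimmed from it. Returns the total number of bytes written.
    """
    pending = collections.deque(chunks)
    written = 0
    while pending:
        # Wait until the socket reports that it is OK to write.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise socket.timeout("socket not writable within %s s" % timeout)
        try:
            sent = sock.send(pending[0])
        except BlockingIOError:
            continue  # EAGAIN: not writable after all, wait again
        written += sent
        if sent == len(pending[0]):
            pending.popleft()
        else:
            pending[0] = pending[0][sent:]
    return written
```

A real event loop would normally interleave this with reads and other sockets instead of looping until the queue is empty.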
I have a rare bug that seems to occur reading a socket.
It seems, that during reading of data sometimes I get only 1-3 bytes of a data package that is bigger than this.
As I learned from pipe-programming, there I always get at least 512 bytes as long as the sender provides enough data.
Also my sender does at least transmit >= 4 Bytes anytime it does transmit anything -- so I was thinking that at least 4 bytes will be received at once in the beginning (!!) of the transmission.
In 99.9% of all cases, my assumption seems to hold ... but there are really rare cases, when less than 4 bytes are received. It seems to me ridiculous, why the networking system should do this?
Does anybody know more?
Here is the reading-code I use:
mySock, addr = masterSock.accept()
mySock.settimeout(10.0)
result = mySock.recv(BUFSIZE)
# 4 bytes are needed here ...
...
# read remainder of datagram
...
The sender sends the complete datagram with one call of send.
Edit: the whole thing is working on localhost -- so no complicated network applications (routers etc.) are involved. BUFSIZE is at least 512 and the sender sends at least 4 bytes.
I assume you're using TCP. TCP is a stream based protocol with no idea of packets or message boundaries.
This means that when you do a read you may get fewer bytes than you requested. If your data is 128k, for example, you may only get 24k on your first read, requiring you to read again to get the rest of the data.
For an example in C:
int read_data(int sock, int size, unsigned char *buf) {
    int bytes_read = 0, len = 0;
    while (bytes_read < size &&
           (len = recv(sock, buf + bytes_read, size - bytes_read, 0)) > 0) {
        bytes_read += len;
    }
    if (len <= 0) doerror();
    return bytes_read;
}
As far as I know, this behaviour is perfectly reasonable. Sockets may, and probably will fragment your data as they transmit it. You should be prepared to handle such cases by applying appropriate buffering techniques.
On the other hand, if you are transmitting the data on localhost and you are indeed getting fewer than 4 bytes, it probably means you have a bug somewhere else in your code.
EDIT: An idea: try firing up a packet sniffer to see whether the transmitted packet is complete; this might give you some insight into whether the bug is in your client or in your server.
The simple answer to your question, "Read from socket: Is it guaranteed to at least get x bytes?", is no. Look at the doc strings for these socket methods:
>>> import socket
>>> s = socket.socket()
>>> print s.recv.__doc__
recv(buffersize[, flags]) -> data
Receive up to buffersize bytes from the socket. For the optional flags
argument, see the Unix manual. When no data is available, block until
at least one byte is available or until the remote end is closed. When
the remote end is closed and all data is read, return the empty string.
>>>
>>> print s.settimeout.__doc__
settimeout(timeout)
Set a timeout on socket operations. 'timeout' can be a float,
giving in seconds, or None. Setting a timeout of None disables
the timeout feature and is equivalent to setblocking(1).
Setting a timeout of zero is the same as setblocking(0).
>>>
>>> print s.setblocking.__doc__
setblocking(flag)
Set the socket to blocking (flag is true) or non-blocking (false).
setblocking(True) is equivalent to settimeout(None);
setblocking(False) is equivalent to settimeout(0.0).
From this it is clear that recv() is not required to return as many bytes as you asked for. Also, because you are calling settimeout(10.0), it is possible that some, but not all, data is received near the expiration time for the recv(). In that case recv() will return what it has read, which will be less than you asked for (but consistently < 4 bytes does seem unlikely).
You mention datagram in your question which implies that you are using (connectionless) UDP sockets (not TCP). The distinction is described here. The posted code does not show socket creation so we can only guess here, however, this detail can be important. It may help if you could post a more complete sample of your code.
If the problem is reproducible you could disable the timeout (which incidentally you do not seem to be handling) and see if that fixes the problem.
This is just the way TCP works. You aren't going to get all of your data at once. There are just too many timing issues between sender and receiver, including the sender's operating system, NIC, routers, switches, the wires themselves, the receiver's NIC, OS, etc. There are buffers in the hardware, and in the OS.
You can't assume that a TCP network behaves the same as an OS pipe. With the pipe, it's all software, so there's no cost in delivering the whole message at once for most messages. With the network, you have to assume there will be timing issues, even in a simple network.
That's why recv() can't give you all the data at once: it may just not be available, even if everything is working right. Normally, you call recv() and check how many bytes it returned. If it's fewer than you expect, you need to keep calling recv() (as has been suggested) until you get the correct number of bytes. Be aware that in C, recv() returns -1 on error (Python raises an exception instead), so check for that and check your documentation for errno values. EAGAIN in particular seems to cause people problems. You can read about it on the internet for details, but if I recall, it means that no data is available at the moment and you should try again.
Also, it sounds like from your post that you're sure the sender is sending the data you need sent, but just to be complete, check this:
http://beej.us/guide/bgnet/output/html/multipage/advanced.html#sendall
You should be doing something similar on the recv() end to handle partial receives. If you have a fixed packet size, you should read until you get the amount of data you expect. If you have a variable packet size, you should read until you have the header that tells you how much data was sent, then read that much more data.
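For the variable-size case, a minimal Python 3 sketch could look like the following. The 4-byte big-endian length header is an assumption made for illustration (the question does not say what its 4 bytes contain), and `recv_exact`, `send_message`, and `recv_message` are hypothetical helper names.

```python
import socket
import struct

# Assumed wire format: 4-byte unsigned big-endian length, then the payload.
HEADER = struct.Struct("!I")


def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed after %d of %d bytes" % (len(buf), n))
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    """Read the header first, then exactly as many payload bytes as it announces."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

With this structure it no longer matters whether the first recv() delivers 1 byte or 512; the reader simply loops until the header and then the payload are complete.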
From the Linux man page of recv http://linux.about.com/library/cmd/blcmdl2_recv.htm:
The receive calls normally return any
data available, up to the requested
amount, rather than waiting for
receipt of the full amount requested.
So, if your sender is still transmitting bytes, the call will only give what has been transmitted so far.
If the sender sends 515 bytes, and your BUFSIZE is 512, then the first recv will return 512 bytes, and the next will return 3 bytes... Could this be what's happening?
(This is just one case amongst many which will result in a 3-byte recv from a larger send...)
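The 515-byte case can be tried out locally with a small Python 3 experiment along these lines. `recv_sizes` is a hypothetical helper, and a local socket pair stands in for the real connection, so the chunk sizes you see on an actual network may differ.

```python
import socket


def recv_sizes(total, bufsize):
    """Send total bytes through a local socket pair and list the size of each recv()."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"x" * total)
        sender.close()  # lets the receiver see EOF once everything is read
        sizes = []
        while True:
            chunk = receiver.recv(bufsize)
            if not chunk:
                break
            sizes.append(len(chunk))
        return sizes
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(recv_sizes(515, 512))
```

Since no single recv() can return more than bufsize bytes, a 515-byte send always needs at least two calls with a 512-byte buffer, and the last one is necessarily small.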
If you are still interested, patterns like this :
# 4 bytes are needed here ......
# read remainder of datagram...
may lead to silly window syndrome.
Check this out
Use recv_into(...) method from the socket module.
Robert S. Barnes wrote the example in C.
But you can use Python 2.x with standard python-libraries:
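import struct

def readReliably(s, n):
    # Keep calling recv_into() until n bytes have arrived or the peer closes.
    buf = bytearray(n)
    view = memoryview(buf)
    sz = 0
    while sz < n:
        k = s.recv_into(view[sz:], n - sz)
        if k == 0:
            break  # connection closed before n bytes arrived
        sz += k
    return sz, buf

while True:
    sk, skfrom = s.accept()
    sz, buf = readReliably(sk, 4)
    a = struct.unpack("4B", buf)
    print repr(a)
    ...
Note that the sz returned by readReliably() is never greater than n, and is less than n only if the connection was closed early.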
I want to get out of loop when there is no data but loop seems to be stopping at recvfrom
image=''
while 1:
    data,address=self.socket.recvfrom(512)
    if data is None:break
    image=image+data
    count=count+1
    print str(count)+' packets received...'
Try setting to a non-blocking socket. You would do this before the loop starts. You can also try a socket with a timeout.
recvfrom may indeed stop (waiting for data) unless you've set your socket to non-blocking or timeout mode. Moreover, if the socket gets closed by your counterpart, the indication of "socket was closed, nothing more to receive" is not a value of None for data -- it's an empty string, ''. So you could change your test to if not data: break for more generality.
What is the blocking mode of your socket?
If you are in blocking mode (which I think is the default), your program would stop until data is available... You would then not get to the next line after the recv() until data is coming.
If you switch to non-blocking mode, however (see socket.setblocking(flag)), I think that it will raise an exception you would have to catch rather than null-check.
You might want to set socket.setdefaulttimeout(n) to get out of the loop if no data is returned after specified time period.
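The timeout suggestions can be combined into a loop like the one below, written for Python 3. It is a sketch only: `receive_until_quiet` and `idle_timeout` are made-up names, and treating "no packet for a while" as the end of the transfer is an assumption about the protocol, not something a datagram socket tells you by itself.

```python
import socket


def receive_until_quiet(sock, idle_timeout=0.5, bufsize=512):
    """Collect datagrams until none arrives for idle_timeout seconds.

    Returns (joined_payload, packet_count).
    """
    sock.settimeout(idle_timeout)
    packets = []
    while True:
        try:
            data, address = sock.recvfrom(bufsize)
        except socket.timeout:
            break  # nothing arrived in time: assume the sender is done
        packets.append(data)
    return b"".join(packets), len(packets)
```

Collecting the packets in a list and joining once at the end also avoids repeatedly concatenating a growing string inside the loop.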