Should sockets be set to non-blocking when used with select.select in Python?
What difference does it make if they are or aren't?
Occasionally I find that calling send on a socket that select reports as writable will block. Furthermore, I find that blocking sockets will generally send the whole buffer given (128 KiB). In non-blocking mode, send accepts far fewer bytes (20-40 KiB, compared with the 128 KiB above) and returns sooner. I'm using Python 3.1 on Ubuntu Lucid.
The answer might be OS dependent unfortunately. I'm replying only regarding Linux.
I'm not aware of any difference between blocking and non-blocking sockets as far as select itself is concerned, but on Linux the select(2) man page has this in its BUGS section:
Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.
I doubt a Python abstraction above that could "hide" this issue without side effects.
As for the blocking write sending more data, that's expected. If the socket is blocking, send blocks until there is enough buffer space to accept your whole request. If it is non-blocking, send accepts only as much as currently fits in the socket's send buffer, which is why it returns sooner with fewer bytes.
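Putting both points together, a minimal sketch (my illustration, not the poster's code; it assumes sock is a connected TCP socket, payload is the bytes to send, and Python 3.3+ for BlockingIOError):

    import select

    sock.setblocking(False)          # a spurious "ready" report can no longer block us

    data = memoryview(payload)
    while data:
        select.select([], [sock], [])   # block until the socket looks writable
        try:
            sent = sock.send(data)   # may accept fewer bytes than offered
        except BlockingIOError:
            continue                 # the readiness report was spurious; retry
        data = data[sent:]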
I'm experimenting with the Python socket library (3.5, on Linux Mint 18), trying to understand UDP. I'm a hardware person dabbling in software, and UDP seems simpler to get my head around than TCP. I am well aware that UDP does not guarantee to deliver packets one for one.
So far, I can follow the tutorials to echo data back from a server to a client.
However, I like to push things to see what happens when applications don't follow the expected path; I detest writing things that 'hang' when unexpected things happen.
If a server binds a socket to a port number, and the client then sends several messages to that port before the server calls recvfrom() several times, I find that each call returns one message, with the messages in order. In other words, the messages have been buffered; later messages have not overwritten earlier ones in the queue. I was not surprised to see this happen, but would also not have been surprised to find only the last received message available, i.e. a buffer depth of one.
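A minimal sketch of the experiment (my paraphrase, not the original code; Python 3.5+ for bytes %-formatting):

    import socket

    # Receiver: bind to an ephemeral local port.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    addr = rx.getsockname()

    # Sender: fire off several datagrams before the receiver reads any.
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(5):
        tx.sendto(b"message %d" % i, addr)

    # Each recvfrom() returns exactly one queued datagram, in order.
    for _ in range(5):
        data, _ = rx.recvfrom(4096)
        print(data)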
Is this buffer, and its depth, a python implementation detail, a linux mint/ubuntu detail, or defined by the UDP protocol?
Is this buffer, and its depth, a python implementation detail, a linux mint/ubuntu detail, or defined by the UDP protocol?
The UDP socket's buffer sizes are an implementation detail of your OS's networking stack. Each OS tries to set a reasonable default size based on its expected use-cases, but you can override the OS's default (up to some maximum value, anyway) on a per-socket basis by calling socket.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, newSizeInBytes) and/or socket.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, newSizeInBytes).
The buffers will queue up as many packets as they have space to hold, then drop any incoming packets that they can't fully fit into the remaining space.
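For illustration, a short sketch of querying and enlarging the receive buffer (on Linux the kernel doubles the value passed to setsockopt to leave room for bookkeeping overhead, so reading the option back shows the effective size):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))   # OS default

    # Request a 1 MiB receive buffer; the OS may clamp it to its maximum.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))   # effective size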
UDP buffers are in the operating system's network stack. The size of the buffers will depend on how much memory your computer has and kernel configuration settings. On modern computers with gigabytes of memory, it's likely that the OS will have plenty of space for UDP buffers, and it will be difficult to overflow them unless the computer is extremely overloaded.
There might be some way to configure the OS to limit the amount of memory used for UDP buffers, so that you can cause overflows and observe the symptoms in your test application. I don't know the relevant configuration settings offhand; you could try asking on Unix & Linux or AskUbuntu.com.
I am currently working on a server + client combo in Python, using TCP sockets. From networking classes I know that a TCP connection should be closed step by step: first one side signals that it wants to close the connection and waits for confirmation, then the other side does the same. After that, the socket can be safely closed.
I've seen the function socket.shutdown(how) in the Python documentation, but I don't see how it fits into this standard, textbook way of closing a TCP socket. As far as I can tell, it just blocks reading, writing, or both.
What is the best, most correct way to close a TCP socket in Python? Are there standard functions for the closing signals, or do I need to implement them myself?
shutdown is useful when you have to signal the remote end that no more data is being sent. You can specify with the shutdown() parameter which half-channel you want to close.
Most commonly, you want to close the TX half-channel by calling shutdown(1). At the TCP level this sends a FIN packet: the remote end will then receive 0 bytes if blocking on read(), but it can still send data back, because the RX half-channel is still open.
Some application protocols use this to signal the end of the message. Other protocols detect the end of the message from the data itself. For example, in an interactive protocol (where messages are exchanged many times) there may be no opportunity, or need, to close a half-channel.
In HTTP, shutdown(1) is one method a client can use to signal that an HTTP request is complete. But the HTTP protocol itself embeds data that allows detecting where a request ends, so multi-request HTTP connections are still possible.
I don't think calling shutdown() before close() is always necessary, unless you need to explicitly close a half-channel. If you want to cease all communication, close() does that too. Calling shutdown() and then forgetting close() is worse, because the file descriptor's resources are never freed.
From Wikipedia: "On SVR4 systems use of close() may discard data. The use of shutdown() or SO_LINGER may be required on these systems to guarantee delivery of all data." This means that, if you have outstanding data in the output buffer, close() could discard that data immediately on an SVR4 system. Linux, BSD, and BSD-based systems like Apple's are not SVR4 and will try to send the output buffer in full after close(). I am not sure whether any major commercial UNIX is still SVR4 these days.
Again using HTTP as an example, an HTTP client running on SVR4 would not lose data by calling close(), because it keeps the connection open after the request in order to read the response. An HTTP server on SVR4 would have to be more careful, calling shutdown(2) before close() after sending the whole response, because the response might still be partly in the output buffer.
The Python documentation says:
Strictly speaking, you’re supposed to use shutdown on a socket before you close it. The shutdown is an advisory to the socket at the other end. Depending on the argument you pass it, it can mean “I’m not going to send anymore, but I’ll still listen”, or “I’m not listening, good riddance!”. Most socket libraries, however, are so used to programmers neglecting to use this piece of etiquette that normally a close is the same as shutdown(); close(). So in most situations, an explicit shutdown is not needed.
I think the most correct way to close a TCP connection is to call shutdown before closing it, because close() is not atomic, and this can cause subtle bugs. Suppose you call close() without shutdown() and the data hasn't reached the server correctly; Python tears the connection down at the same moment, the server can't reply to the client, and the socket at the other end may hang indefinitely.
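Putting this together, a sketch of a graceful client-side close might look like this (my illustration, not a standard library recipe: a half-close with SHUT_WR, then draining the socket until the peer closes its side; process() is a hypothetical application callback):

    import socket

    # Assumes `sock` is a connected TCP socket and the request has been sent.
    sock.shutdown(socket.SHUT_WR)    # send FIN: "no more data from me"

    # Keep reading until the peer has sent everything and closed its side;
    # recv() returning b"" signals the peer's FIN.
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        process(chunk)               # hypothetical application callback

    sock.close()                     # now release the file descriptor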
Should I call the send function from the WinAPI (https://msdn.microsoft.com/ru-ru/library/windows/desktop/ms740149(v=vs.85).aspx) in a loop to guarantee that all data gets sent (as with the send function from the Python standard library), or does it work like Python's sendall function?
Python and the Winsock API both implement the BSD sockets API, so the send function works pretty much identically in both. So no, it is not a convenient sendall, and far, far too much code assumes it is.
You could write a simple sendall function on Windows by looping on send until all the bytes have been successfully sent, but this is a bad idea in GUI applications, as it will prevent the application from responding to the user, and in more complex console applications it might stall communication on other sockets.
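For reference, the loop is essentially what Python's sendall does internally; sketched here in Python, though the same structure applies to a C loop over Winsock's send:

    def sendall(sock, data):
        """Loop on send() until every byte of data has been accepted."""
        view = memoryview(data)
        while view:
            sent = sock.send(view)   # may take only part of the buffer
            view = view[sent:]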
If either of those blocking concerns applies, you should investigate WSAAsyncSelect for GUI applications, where your window proc receives a window message each time the socket becomes readable or writable, or select, which lets a console program or non-GUI worker thread poll a collection of sockets (limited to FD_SETSIZE entries, 64 by default on Windows) to test which ones are readable or writable.
Twisted includes a reactor implemented on top of MsgWaitForMultipleObjects. Apparently the reactor has problems reliably noticing when a TCP connection ends, at least in the case where a peer sends some bytes and then quickly closes the connection. What seems to happen is:
1. The reactor calls MsgWaitForMultipleObjects with some socket handles and QS_ALLINPUT.
2. The call completes and indicates that the handle for a socket in this state (that is, with bytes waiting to be read, and closed by the peer) is active.
3. The reactor dispatches this notification to the common TCP implementation.
4. The TCP implementation reads the available bytes from the socket. There are some, and they get delivered to application code.
5. Control returns to the reactor, which eventually calls MsgWaitForMultipleObjects again.
6. MsgWaitForMultipleObjects never again indicates that the handle is active. The TCP implementation never gets to look at the socket again, so it can never detect that the connection is closed.
This makes it appear as though MsgWaitForMultipleObjects is an edge-triggered notification mechanism. The MSDN documentation says:
Waits until one or all of the specified objects are in the signaled state or the time-out interval elapses.
This doesn't sound like edge-triggering. It sounds like level-triggering.
Is MsgWaitForMultipleObjects actually edge-triggered? Or is it level-triggered and this misbehavior is caused by some other aspect of its behavior?
Addendum: The MSDN docs for WSAEventSelect explain what's going on here a bit more, including pointing out that FD_CLOSE is basically a one-off event: after it's signaled once, you'll never get it again. This goes some way towards explaining why Twisted has this problem. I'm still interested to hear how to use MsgWaitForMultipleObjects effectively given this limitation, though.
In order to use WSAEventSelect and differentiate activities, you need to call WSAEnumNetworkEvents. Make sure you're processing each event that was reported, not just the first.
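To make that concrete, here is a rough Windows-only ctypes sketch (my illustration, not Twisted's code; the constants are from winsock2.h, and event_handle is assumed to come from a prior WSACreateEvent/WSAEventSelect registration for FD_READ | FD_CLOSE):

    import ctypes

    FD_MAX_EVENTS = 10
    FD_READ = 0x01     # data is available to read
    FD_CLOSE = 0x20    # reported once when the peer closes, then never again

    class WSANETWORKEVENTS(ctypes.Structure):
        _fields_ = [("lNetworkEvents", ctypes.c_long),
                    ("iErrorCode", ctypes.c_int * FD_MAX_EVENTS)]

    ws2_32 = ctypes.windll.ws2_32
    ws2_32.WSAEnumNetworkEvents.argtypes = (
        ctypes.c_void_p, ctypes.c_void_p, ctypes.POINTER(WSANETWORKEVENTS))

    def drain_events(sock, event_handle):
        ev = WSANETWORKEVENTS()
        # Enumerates and resets the socket's pending network events.
        ws2_32.WSAEnumNetworkEvents(sock.fileno(), event_handle,
                                    ctypes.byref(ev))
        if ev.lNetworkEvents & FD_READ:
            pass   # read everything currently buffered
        if ev.lNetworkEvents & FD_CLOSE:
            pass   # the only close notification you will ever get; act on it now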
WSAAsyncSelect makes it easy to determine the cause, and is often used together with MsgWaitForMultipleObjects.
So you might use WSAAsyncSelect instead of WSAEventSelect.
Also, I think you have a fundamental misunderstanding of the difference between edge-triggered and level-triggered. Your reasoning seems to be more related to auto-reset vs manual-reset events.
I've been scouring the Internet looking for a solution to my problem with Python. I'm trying to use a urllib2 connection to read a potentially endless stream of data from an HTTP server. It's part of some interactive communication, so it's important that I can get the data that's available, even if it's not a whole buffer full. There seems to be no way to have read/readline return the available data: it will block forever, waiting for the entire (endless) stream before it returns.
Even if I set the underlying file descriptor to non-blocking using fcntl, the urllib2 file object still blocks! In general, there seems to be no way to make Python file objects return all available data on read if there is some, and block otherwise.
I've seen a few posts from people seeking help with this, but no solutions. What gives? Am I missing something? This seems like such a normal use-case to break! I was hoping to use urllib2's ability to detect configured proxies and handle chunked encoding, but I can't if it won't cooperate.
Edit: Upon request, here is some example code
Client:
    connection = urllib2.urlopen(commandpath)
    id = connection.readline()
Now suppose the server is using chunked transfer encoding, writes one chunk down the stream (a chunk containing the line), and then waits. The connection is still open, but the client has data waiting in a buffer.
I cannot get read or readline to return the data I know is waiting for it, because each tries to read until the end of the connection. In this case the connection may never close, so the call will wait either forever or until an inactivity timeout severs the connection. Once the connection is severed it will return, but that's obviously not the behavior I want.
urllib2 operates at the HTTP level, which works with complete documents. I don't think there's a way around that without hacking into the urllib2 source code.
What you can do is use plain sockets (you'll have to speak HTTP yourself in this case) and call sock.recv(maxbytes), which reads only the data currently available.
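A rough sketch of that approach (the host, port, and path are placeholders, handle() is a hypothetical callback, and real code would also need to parse the response headers and de-chunk the body):

    import socket

    sock = socket.create_connection(("example.com", 80))
    sock.sendall(b"GET /stream HTTP/1.1\r\n"
                 b"Host: example.com\r\n\r\n")

    while True:
        chunk = sock.recv(4096)   # returns as soon as any data is available
        if not chunk:             # b"" means the server closed the connection
            break
        handle(chunk)             # hypothetical callback for arriving data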
Update: you may want to try calling conn.fp._sock.recv(maxbytes) instead of conn.read(bytes) on a urllib2 connection.