How to use dataReceived in Twisted? - python

I have implemented a server program using Twisted. I am using basic.LineReceiver with the method dataReceived to receive data from multiple clients. I am also using protocol.ServerFactory to keep track of connected clients. The server sends commands to each connected client and, based on the response it gets from each client, should perform some tasks. The best solution that came to my mind was to keep a buffer of received messages for each client as a Python list; whenever a function on the server side wants to know a client's response, it reads the last element of that client's buffer list.
This approach has turned out to be unreliable. The first issue is that since TCP streaming is used, sometimes messages merge (I can use a delimiter for this). Second, the received messages are sometimes not in their appropriate sequence. Third, the networking communication seems to be too slow, as when the server initially tries to access the last element of the buffered list, the list is empty (this shows that the last messages on the buffer might not be the response to the last sent commands).
Could you tell me what the best practice is for using dataReceived or its equivalents in the above problem? Thank you in advance.
EDIT 1: While I accept @Jean-Paul Calderone's answer, since I certainly learned from it, I would like to add that in my own reading of Twisted's documentation I learned that, in order to avoid delays in the server's communications, one should use return at the end of the dataReceived() or lineReceived() functions, and this solved part of my problem. The rest was explained in the answer.

I have implemented a server program using Twisted. I am using basic.LineReceiver with the method dataReceived to receive data from multiple clients.
This is a mistake - an unfortunately common one brought on by the mistaken use of inheritance in many of Twisted's protocol implementations as the mechanism for building up more and more sophisticated behaviors. When you use twisted.protocols.basic.LineReceiver, the dataReceived callback is not for you. LineReceiver.dataReceived is an implementation detail of LineReceiver. The callback for you is LineReceiver.lineReceived. LineReceiver.dataReceived looks like it might be for you - it doesn't start with an underscore or anything - but it's not. dataReceived is how LineReceiver receives information from its transport. It is one of the public methods of IProtocol - the interface between a transport and the protocol interpreting the data received over that transport. Yes, I just said "public method" there. The trouble is it's public for the benefit of someone else. This is confusing and perhaps not communicated as well as it could be. No doubt this is why it is a Frequently Asked Question.
This approach has turned out to be unreliable. The first issue is that since TCP streaming is used, sometimes messages merge (I can use a delimiter for this).
Use of dataReceived is why this happens. LineReceiver already implements delimiter-based parsing for you. That's why it's called "line" receiver - it receives lines separated by a delimiter. If you override lineReceived instead of dataReceived then you'll be called with each line that is received, regardless of how TCP splits things up or smashes them together.
Second, the received messages are sometimes not in their appropriate sequence.
TCP is a reliable, ordered, stream-oriented transport. "Ordered" means that bytes arrive in the same order they are sent. Put another way, when you write("x"); write("y") it is guaranteed that the receiver will receive "x" before it receives "y". The receiver may get "x" and "y" in the same call to recv(), in which case the data will definitely be "xy" and not "yx"; or it may get the two bytes in two calls to recv(), in which case the first recv() will definitely return "x" and the second will definitely return "y", not the other way around.
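The ordering guarantee can be observed with plain sockets. This is only a sketch using the standard library over a loopback TCP connection (it assumes loopback networking is available); how many recv() calls it takes is not guaranteed, but the concatenated result is.

```python
import socket

# Loopback TCP connection; port 0 asks the OS for any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

sender = socket.create_connection(listener.getsockname())
receiver, _addr = listener.accept()

sender.sendall(b"x")
sender.sendall(b"y")

received = b""
chunks = 0
while len(received) < 2:
    data = receiver.recv(4096)
    if not data:  # peer closed before both bytes arrived
        break
    received += data
    chunks += 1

for s in (sender, receiver, listener):
    s.close()

print(received, "in", chunks, "recv() call(s)")
```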
If bytes appear to be arriving in a different order than you sent them, there's probably another bug somewhere that makes it look like this is happening - but it actually isn't. Your platform's TCP stack is very likely very close to bug free and in particular it probably doesn't have TCP data re-ordering bugs. Likewise, this area of Twisted is extremely well tested and probably works correctly. This leaves a bug in your application code or a misinterpretation of your observations. Perhaps your code doesn't always append data to a list or perhaps the data isn't being sent in the order you expected.
Another possibility is that you are talking about the order in which data arrives across multiple separate TCP connections. TCP is only ordered over a single connection. If you have two connections, there are very few (if any) guarantees about the order in which data will arrive over them.
Third, the networking communication seems to be too slow, as when the server initially tries to access the last element of the buffered list, the list is empty (this shows that the last messages on the buffer might not be the response to the last sent commands).
What defines "too slow"? The network is as fast as the network is. If that's not fast enough for you, find a fatter piece of copper. It sounds like what you really mean here is that your server sometimes expects data to have arrived before that data actually arrives. This doesn't mean the network is too slow, though, it means your server isn't properly event driven. If you're inspecting a buffer and not finding the information you expected, it's because you inspected it before the occurrence of the event which informs you of the arrival of that information. This is why Twisted has all these callback methods - dataReceived, lineReceived, connectionLost, etc. When lineReceived is called, this is an event notification telling you that right now something happened which resulted in a line being available (and, for convenience, lineReceived takes one argument - an object representing the line which is now available).
If you have some code that is meant to run when a line has arrived, consider putting that code inside an implementation of the lineReceived method. That way, when it runs (in response to a line being received), you can be 100% sure that you have a line to operate on. You can also be sure that it will run as soon as possible (as soon as the line arrives) but no sooner.

Related

How to receive for sendall in python socket module

I am fairly new to using sockets, and this will probably have a simple answer that I am overlooking, but since an hour of agonizing has not yielded results so... what the heck.
How do I receive for .sendall() in the python socket module? By this I mean how do I receive data from a socket with out a buffer? is there a simple solution for this like some sort of conn.recvall() function or do I have it write out logic to do this? If I do have to write logic for it, then how should I do it? Should I just keep using .recv() with some arbitrary buffint or do I have to split the inputs into segments before sending? Which is more efficient, or better? Is there a smarter way to go about it?
Thanks
send and sendall will chop your buffer into pieces for sending over the network. It's important to remember that TCP is a streaming protocol, not a packet protocol. If you send 1,024 bytes, it might be received by the other end as 1,024 bytes, or as one chunk of 256 and one of 768, or one of 1,000 and one of 24. The receiver needs to know when the transmission is complete. Sometimes the message is a fixed size, sometimes you send a byte count first, sometimes you use a special termination character, sometimes you wait for a timeout. The receiver just needs to keep calling .recv until it knows the transmission is done.
Some of the higher level Python packages (like twisted (which I recommend)) can handle that for you.
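If you stay with the plain socket module, the "keep calling .recv" loop for the case where the receiver knows the byte count might look like the sketch below. recv_exactly is a hypothetical helper name, not part of the socket module, and socket.socketpair() is used only to get two connected stream sockets for a local demo.

```python
import socket


def recv_exactly(sock, n):
    """Keep calling recv until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # empty bytes means the peer closed the connection
            raise ConnectionError(f"peer closed with {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# socketpair() gives two connected stream sockets, handy for a local demo.
a, b = socket.socketpair()
a.sendall(b"hello ")
a.sendall(b"world")
message = recv_exactly(b, 11)
a.close()
b.close()
print(message)
```

Raising on an early close matters: without it, a loop that simply waits for more data would spin forever on the empty result recv returns after the peer disconnects.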

How to make an endmark in python TCP socket calls send() and recv()

I am relatively new to sockets and very new to python. How would you go about making an endmark for python send() and recv().
I have searched all over and there is no easy tutorial. I have read the man page for recv(2) a thousand times and it ironically makes less sense to me each time I read it.
I would like to use the send() function in a server to let the client calling recv() know when the end of the send() is.
Do you use the flag argument of send?
Or do you use something like "|".join([str1, str2]) and use an if statement in the client to recognize the | and parse the statement?
TCP is not a message-oriented protocol. It does not maintain message boundaries, nor does it help in other ways to achieve them. It is up to the application to mark boundaries. Client and server can agree upon a method for exchanging data. A common method is to put the message length in the data you send.
[2 byte message length][Actual Data of Interest]
The receiving end always looks for the two-byte length indicator first. It then calls recv until it has as much data as the length bytes indicate, processes that data, and goes back to reading the next length bytes, and so on.
Another method is that the application can mark the start and end of the message with markers. It also needs to handle cases where the markers can also be part of actual data.
[Start Indicator][Actual Data of Interest][End Indicator]
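The length-prefix layout can be sketched in a few lines. This is an illustration under assumptions of my own: the length is an unsigned 2-byte integer in network byte order, and encode_frame and FrameDecoder are hypothetical names. The decoder works on whatever bytes it is fed, so it does not care how recv happened to split the stream.

```python
import struct

HEADER = struct.Struct("!H")  # 2-byte unsigned length, network byte order


def encode_frame(payload):
    """Prefix payload with its length; 2 bytes limit it to 65535 bytes."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Accumulates received bytes and returns the complete payloads."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        self._buffer += data
        frames = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # body not fully here yet; wait for more data
            frames.append(self._buffer[HEADER.size:end])
            self._buffer = self._buffer[end:]
        return frames


decoder = FrameDecoder()
stream = encode_frame(b"hello") + encode_frame(b"") + encode_frame(b"world!")
# Deliver the stream in arbitrary 4-byte pieces, as TCP might.
frames = []
for i in range(0, len(stream), 4):
    frames.extend(decoder.feed(stream[i:i + 4]))
print(frames)
```

Compared with start and end markers, a length prefix needs no escaping of marker bytes inside the payload, at the cost of a fixed maximum message size.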

How to get a list of connected clients in a server using Python epoll

I would like to send a message to a subset of clients connected to a server which uses Python epoll. Is there a practical way of getting a list of the filenos of connected clients?
Normally, the way you handle this is to keep track of the clients yourself. Every time you accept a new client, before adding it to the epoll, also add it to your collection. Every time you drop a client, remove it from your collection.
The typical minimal thing to store is a dict mapping sockets to whatever else you need to keep track of. You can use the socket itself, a weakref to it, or its fileno as the key. The data you need often includes things like a read buffer and a write buffer, the peer address (as received on accept), a name or user ID or auth cookie, etc.
And I'm not sure how you're doing anything useful with a polling server that doesn't have at least a read buffer attached to each client socket. What do you do when you get a partial message? (For that matter, if you're planning to send a message to these clients, and you don't have a write buffer, how are you going to do that?)
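A rough sketch of that bookkeeping follows. It is Linux-only, since select.epoll exists only there, and the names Client, ClientRegistry and queue_message are hypothetical. The dict is keyed by fileno and each entry carries the per-client buffers mentioned above.

```python
import select
import socket
from dataclasses import dataclass, field


@dataclass
class Client:
    sock: socket.socket
    addr: object
    read_buf: bytearray = field(default_factory=bytearray)
    write_buf: bytearray = field(default_factory=bytearray)


class ClientRegistry:
    """Hypothetical helper keeping a fileno -> Client dict in step with epoll."""

    def __init__(self, epoll):
        self.epoll = epoll
        self.clients = {}

    def add(self, sock, addr):
        # Call this right after accept().
        sock.setblocking(False)
        self.clients[sock.fileno()] = Client(sock, addr)
        self.epoll.register(sock.fileno(), select.EPOLLIN)

    def remove(self, fileno):
        client = self.clients.pop(fileno)
        self.epoll.unregister(fileno)
        client.sock.close()

    def queue_message(self, filenos, data):
        # Buffer the data and ask epoll to report when each socket is writable.
        for fileno in filenos:
            self.clients[fileno].write_buf += data
            self.epoll.modify(fileno, select.EPOLLIN | select.EPOLLOUT)


# Demo with a socketpair standing in for an accepted client connection.
registry = ClientRegistry(select.epoll())
ours, theirs = socket.socketpair()
registry.add(ours, "demo-peer")
print(list(registry.clients))
```

The event loop itself is left out: on EPOLLOUT it would send from write_buf, trim what was sent, and switch the registration back to EPOLLIN once the buffer is empty. Sending to a subset of clients is then just a matter of choosing which keys of registry.clients to pass to queue_message.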
But to answer your specific question:
There is no way to get a list of fd's from a Python epoll object, because there is no way to do this with the underlying API.
Conceivably, you could write something that steps through everything from 0 to your max open fd and tries to do an epoll_ctl with it and distinguish based on the error. This is probably a very bad idea, but it may not be impossible. You'd probably have to ctypes down to the native function, and you may need to play around with different possibilities to find out what has the right effect. For example, maybe if you do epoll_ctl(my_epoll.fileno(), EPOLL_CTL_MOD, fd, NULL), you'll get ENOENT for an unregistered fd, EBADF for a nonexistent fd, but EINVAL (because of the NULL event) for a valid fd. Or maybe it'll segfault. I can't guarantee there's any combination of parameters that will distinguish, but with some trial and error, you might find one.
(By the way, there's nothing that says that an epoll has to be a list of connected clients; e.g., it may be the connected clients plus the listener socket plus a "quit" pipe.)

How can a disconnected TCP socket be reliably detected using MsgWaitForMultipleObjects?

Twisted includes a reactor implemented on top of MsgWaitForMultipleObjects. Apparently the reactor has problems reliably noticing when a TCP connection ends, at least in the case where a peer sends some bytes and then quickly closes the connection. What seems to happen is:
The reactor calls MsgWaitForMultipleObjects with some socket handles and QS_ALLINPUT.
The call completes and indicates the handle for a socket in this state (that is, has bytes waiting to be read and has been closed by the peer) is active.
The reactor dispatches this notification to the common TCP implementation.
The TCP implementation reads the available bytes from the socket. There are some, they get delivered to application code.
Control is returned to the reactor, which eventually calls MsgWaitForMultipleObjects again.
MsgWaitForMultipleObjects never again indicates that the handle is active. The TCP implementation never gets to look at the socket again, so it can never detect that the connection is closed.
This makes it appear as though MsgWaitForMultipleObjects is an edge-triggered notification mechanism. The MSDN documentation says:
Waits until one or all of the specified objects are in the signaled state
or the time-out interval elapses.
This doesn't sound like edge-triggering. It sounds like level-triggering.
Is MsgWaitForMultipleObjects actually edge-triggered? Or is it level-triggered and this misbehavior is caused by some other aspect of its behavior?
Addendum: The MSDN docs for WSAEventSelect explain what's going on here a bit more, including pointing out that FD_CLOSE is basically a one-off event. After it's signaled once, you'll never get it again. This goes some way towards explaining why Twisted has this problem. I'm still interested to hear how to effectively use MsgWaitForMultipleObjects given this limitation, though.
In order to use WSAEventSelect and differentiate activities, you need to call WSAEnumNetworkEvents. Make sure you're processing each event that was reported, not just the first.
WSAAsyncSelect makes it easy to determine the cause, and is often used together with MsgWaitForMultipleObjects.
So you might use WSAAsyncSelect instead of WSAEventSelect.
Also, I think you have a fundamental misunderstanding of the difference between edge-triggered and level-triggered. Your reasoning seems to be more related to auto-reset vs manual-reset events.

Checking files retrieved by Twisted's FTPClient.retrieveFile method for completeness

I'm writing a custom FTP client to act as a gatekeeper for incoming multimedia content from subcontractors hired by one of our partners. I chose Twisted because it allows me to parse the file contents before writing the files to disk locally, and I've been looking for an occasion to explore Twisted anyway. I'm using 'twisted.protocols.ftp.FTPClient.retrieveFile' to get the file, passing the escaped path to the file and a protocol to the 'retrieveFile' method. I want to be absolutely sure that the entire file has been retrieved, because the event handler in the callback is going to write the file to disk locally and then delete the remote file from the FTP server, à la the '-E' switch behavior of the lftp client. My question is, do I really need to worry about this, or can I assume that an errback will happen if the file is not fully retrieved?
There are a couple unit tests for behavior in this area.
twisted.test.test_ftp.FTPClientTestCase.test_failedRETR is the most directly relevant one. It covers the case where the control and data connections are lost while a file transfer is in progress.
It seems to me that test coverage in this area could be significantly improved. There are no tests covering the case where just the data connection is lost while a transfer is in progress, for example. One thing that makes this tricky, though, is that FTP is not a very robust protocol. The end of a file transfer is signaled by the data connection closing. To be safe, you have to check to see if you received as many bytes as you expected to receive. The only way to perform this check is to know the file size in advance or ask the server for it using LIST (FTPClient.list).
Given all this, I'd suggest that when a file transfer completes, you always ask the server how many bytes you should have gotten and make sure it agrees with the number of bytes delivered to your protocol. You may sometimes get an errback on the Deferred returned from retrieveFile, but this will keep you safe even in the cases where you don't.
