Reading socket buffer using asyncore - python

I'm new to Python (I have been programming in Java for multiple years now though), and I am working on a simple socket-based networking application (just for fun). The idea is that my code connects to a remote TCP end-point, then listens for any data pushed from the server to the client and performs some parsing on it.
The data being pushed from server -> client is UTF-8 encoded text, and each line is delimited by CRLF (\x0D\x0A). You probably guessed: the idea is that the client connects to the server (until cancelled by the user), and then reads and parses the lines as they come in.
I've managed to get this to work; however, I'm not sure I'm doing it quite the right way. Hence my actual questions (code to follow):
Is this the right way to do it in Python (i.e. is it really this simple)?
Any tips/tricks/useful resources (apart from the reference documentation) regarding buffers/asyncore?
Currently, the data is being read and buffered as follows:
def handle_read(self):
    self.ibuffer = b""
    while True:
        self.ibuffer += self.recv(self.buffer_size)
        if ByteUtils.ends_with_crlf(self.ibuffer):
            self.logger.debug("Got full line including CRLF")
            break
        else:
            self.logger.debug("Buffer not full yet (%s)", self.ibuffer)
    self.logger.debug("Filled up the buffer with line")
    print(str(self.ibuffer, encoding="UTF-8"))
The ByteUtils.ends_with_crlf function simply checks the last two bytes of the buffer for \x0D\x0A. The first question is the main one (answer is based on this), but any other ideas/tips are appreciated. Thanks.

TCP is a stream, and you are not guaranteed that your buffer will not contain the end of one message and the beginning of the next.
So, checking for \r\n only at the end of the buffer will not work as expected in all situations: a single recv() may hand you several lines at once, or a fragment that stops mid-line. You have to scan the received bytes for the delimiter wherever it occurs.
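For instance, a minimal sketch of a handle_read that accumulates bytes and peels off every complete line, keeping any trailing fragment buffered (self.handle_line is a hypothetical callback, and self.ibuffer is assumed to be initialised to b"" in the constructor):

import asyncore  # the handler lives on an asyncore.dispatcher subclass

def handle_read(self):
    # recv() may return a partial line, several lines, or the end of
    # one line plus the start of the next; just append whatever came.
    self.ibuffer += self.recv(self.buffer_size)
    # Peel off every complete CRLF-terminated line; anything after the
    # last delimiter stays buffered until more data arrives.
    while b"\r\n" in self.ibuffer:
        line, self.ibuffer = self.ibuffer.split(b"\r\n", 1)
        self.handle_line(str(line, encoding="UTF-8"))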
And, I would strongly recommend that you use Twisted instead of asyncore.
Something like this (from memory, might not work out of the box):
from twisted.internet import reactor, protocol
from twisted.protocols.basic import LineReceiver

class MyHandler(LineReceiver):
    def lineReceived(self, line):
        print("Got line:", line)

f = protocol.ClientFactory()
f.protocol = MyHandler
reactor.connectTCP("127.0.0.1", 4711, f)
reactor.run()

It's even simpler -- look at asynchat and its set_terminator method (and other helpful tidbits in that module). Twisted is orders of magnitude richer and more powerful, but, for sufficiently simple tasks, asyncore and asynchat (which are designed to interoperate smoothly) are indeed very simple to use, as you've started observing.
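For illustration, a minimal sketch of the same CRLF-delimited reader built on asynchat (the class name is invented, and the host/port are the same placeholders as above):

import asynchat
import asyncore
import socket

class LineClient(asynchat.async_chat):
    def __init__(self, host, port):
        asynchat.async_chat.__init__(self)
        self.ibuffer = []
        # asynchat does the buffering for you: it calls
        # found_terminator() every time this byte sequence appears.
        self.set_terminator(b"\r\n")
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))

    def collect_incoming_data(self, data):
        self.ibuffer.append(data)

    def found_terminator(self):
        line = b"".join(self.ibuffer)
        self.ibuffer = []
        print(str(line, encoding="UTF-8"))

client = LineClient("127.0.0.1", 4711)
asyncore.loop()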

Related

How to solve this simple one-way local machine messaging problem

I have a first sender script in Python 3.10 which needs to send some data
def post_updates(*args):
    sender.send_message("optional_key", args)
Then a second receiver script in Python 3.7 which needs to receive this data
while True:
    args = receiver.get_message("optional_key", blocking=True)
    print("args received:", args)
Constraints:
Each script should not depend on the presence of the other to run.
The sender should try to send regardless of whether the receiver is running.
The receiver should try to receive regardless of whether the sender is running.
The message can consist of basic python objects (dict, list) and should be serialized automatically.
I need to send over 100 messages per second (minimizing latency if possible).
Local PC only (Windows) and no need for security.
Are there 1-liner solutions to this simple problem? Everything I look up seems overly complicated or requires a TCP server to be started beforehand. I don't mind installing popular modules.
UDP and JSON look perfect for what you're asking for, as long as
you don't need there to be more than one receiver
you don't need very large messages
you just need to send combinations of dicts, lists, strings, and numbers, not Python objects of arbitrary classes
you're not being overly literal about finding a "1-liner": it's a very small amount of code to write, and you're free to define your own helper functions.
Python's standard library has all you need for this. Encoding and decoding from JSON is as simple as json.dumps() and json.loads(). For sending and receiving, I suggest following the example on the Python wiki. You need to create the socket first with
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
regardless of whether you're writing the sender or the receiver. The receiver will then need to bind to the local port to listen on it:
sock.bind(('127.0.0.1', PORT))
And then the sender sends with sock.sendto() and the receiver receives with sock.recvfrom().
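Putting those pieces together, a minimal sketch of both scripts might look like this (the port number is arbitrary, and each half lives in its own file):

import json
import socket

PORT = 50007  # any free local port; both scripts must agree on it

# sender.py -- works regardless of whether the receiver is running
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def post_updates(*args):
    # JSON covers dicts, lists, strings and numbers automatically.
    send_sock.sendto(json.dumps(args).encode("utf-8"), ("127.0.0.1", PORT))

# receiver.py -- works regardless of whether the sender is running
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", PORT))
while True:
    data, _ = recv_sock.recvfrom(65507)  # largest safe UDP payload
    print("args received:", json.loads(data.decode("utf-8")))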
The good old pipe might do the job, but you need to assess how big the buffer size needs to be (given the async nature of your sender/receiver), and change the default pipe buffer size.

How does the select() function in the select module of Python exactly work?

I am working on writing a network-oriented application in Python. I had earlier worked with blocking sockets, but after gaining a better understanding of the requirements and concepts, I want to write the application using non-blocking sockets and thus an event-driven server.
I understand that the functions in the select module in Python are to be used to conveniently see which socket interests us and so forth. Towards that I was basically trying to flip through a couple of examples of an event-driven server and I had come across this one:
"""
An echo server that uses select to handle multiple clients at a time.
Entering any line of input at the terminal will exit the server.
"""
import select
import socket
import sys
host = ''
port = 50000
backlog = 5
size = 1024
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((host,port))
server.listen(backlog)
input = [server,sys.stdin]
running = 1
while running:
inputready,outputready,exceptready = select.select(input,[],[])
for s in inputready:
if s == server:
# handle the server socket
client, address = server.accept()
input.append(client)
elif s == sys.stdin:
# handle standard input
junk = sys.stdin.readline()
running = 0
else:
# handle all other sockets
data = s.recv(size)
if data:
s.send(data)
else:
s.close()
input.remove(s)
server.close()
The parts that I didn't seem to understand are the following:
In the code snippet inputready,outputready,exceptready = select.select(input,[],[]), I believe the select() function returns three possibly empty lists of waitable objects for input, output and exceptional conditions. So it makes sense that the first argument to the select() function is the list containing the server socket and the stdin. However, where I face confusion is in the else block of the code.
Since we are for-looping over the list of inputready sockets, it is clear that the select() function will choose a client socket that is ready to be read. However, after we read data using recv() and find that the socket has actually sent data, we would want to echo it back to the client. My question is how can we write to this socket without adding it to the list passed as second argument to the select() function call? Meaning, how can we call send() on the new socket directly without 'registering' it with select() as a writable socket?
Also, why do we loop only over the sockets ready to be read (inputready in this case)? Isn't it necessary to loop over even the outputready list to see which sockets are ready to be written?
Obviously, I am missing something here.
It would also be really helpful if somebody could explain in a little more detailed fashion the working of select() function or point to good documentation.
Thank you.
That snippet is probably just a simple, non-exhaustive example. You are free to read from and write to any socket even when select does not report it as ready, but if you do, you cannot be sure that your send() won't block.
So, yes, best practice is to rely on select for write operations as well, as sketched below.
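A minimal sketch of that pattern, extending the echo-server example above (outbuf is an illustrative name): queue outgoing bytes per socket, and pass only sockets with pending data as select()'s second argument:

outbuf = {}  # socket -> bytes still queued for sending

# Instead of calling s.send(data) directly after recv():
outbuf[s] = outbuf.get(s, b"") + data

# In the main loop, watch for writability only where data is queued:
writable = [s for s, buf in outbuf.items() if buf]
inputready, outputready, exceptready = select.select(input, writable, [])

for s in outputready:
    sent = s.send(outbuf[s])      # may accept fewer bytes than offered
    outbuf[s] = outbuf[s][sent:]  # keep the unsent remainder queued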
There are also many other functions with a similar purpose that are in many cases better than select (e.g. epoll), but they are not available on all platforms.
Information about select, epoll and the other functions can be found in the Linux man pages.
In Python there are also several good libraries for handling many connections, such as Twisted and gevent.
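As an aside, newer Python versions (3.4+) ship the selectors module, which wraps the best mechanism the platform offers (epoll on Linux, kqueue on BSD, plain select elsewhere) behind one interface; a minimal echo-server sketch:

import selectors
import socket

sel = selectors.DefaultSelector()  # picks epoll/kqueue/select for you

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('', 50000))
server.listen(5)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

while True:
    for key, events in sel.select():
        if key.fileobj is server:
            client, address = server.accept()
            client.setblocking(False)
            sel.register(client, selectors.EVENT_READ)
        else:
            data = key.fileobj.recv(1024)
            if data:
                key.fileobj.send(data)     # echo it back
            else:
                sel.unregister(key.fileobj)
                key.fileobj.close()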

Interactive Python Client/Server with Twisted

I've been trying to wrap my mind around how to get Twisted to perform, for lack of a better word, "interactive" client/server behavior.
I managed to put together a pair of Protocol and ClientFactory classes that do connect to a service, and perform an immediate query/response (see: connectionMade -> self.queryStatus). This succeeds as expected and prints the server's response from the Factory class.
My problem now is that I'll have outside events that must cause data to be sent, while always listening for potential incoming data. But once the reactor.run() loop is going, I'm not sure how the rest of my application is meant to trigger a data send.
I've tried a few different approaches since, but this is the simplest approach that did handle the recv part as described:
from twisted.internet import reactor
from twisted.internet.protocol import ClientFactory
from twisted.protocols.basic import LineReceiver

class myListenerProtocol(LineReceiver):
    delimiter = '\n'

    def connectionMade(self):
        print("Connected to: %s" % self.transport.getPeer())
        self.queryStatus(1)

    def dataReceived(self, data):
        print("Receiving Data from %s" % self.transport.getPeer())
        ...
        self.commandReceived(self.myData)

    def commandReceived(self, myData):
        self.factory.commandReceived(myData)

    def connectionLost(self, reason):
        print("Disconnected.")

    def queryStatus(self, CommandValue):
        ...
        strSend = CommandValue  # or some such
        self.transport.write(strSend)

class mySocketFactory(ClientFactory):
    protocol = myListenerProtocol

    def __init__(self):
        pass

    def buildProtocol(self, address):
        proto = ClientFactory.buildProtocol(self, address)
        return proto

    def commandReceived(self, myData):
        print(myData)
        reactor.stop()  # It won't normally stop after recv

    def clientConnectionFailed(self, connector, reason):
        print("Connection failed.")
        reactor.stop()

def main():
    f = mySocketFactory()
    reactor.connectTCP("10.10.10.1", 1234, f)
    reactor.run()

if __name__ == "__main__":
    main()
I imagine this is pretty straight-forward, but countless hours into numerous examples and documentation have left me without a good understanding of how I'm meant to deal with this scenario.
My problem now is that I'll have outside events that must cause data to be sent, while always listening for potential incoming data. But once the reactor.run() loop is going, I'm not sure how the rest of my application is meant to trigger a data send.
"Outside events"? Like what? Data arriving on a connection? Great, having the reactor running means you'll actually be able to handle that data.
Or maybe someone is clicking a button in a GUI? Try one of the GUI integration reactors - again, you can't handle those events until you have a reactor running.
You're probably getting stuck because you think your main function should do reactor.run() and then go on to do other things. This isn't how it works. When you write an event-driven program, you define all of your event sources and then let the event loop call your handlers when events arrive on those sources.
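For example, one common pattern is to let a repeating reactor task drive the sends; this sketch assumes the factory keeps a reference to the connected protocol instance (e.g. by setting self.proto = proto in buildProtocol, a detail the code above does not yet have):

from twisted.internet import reactor, task

def push_status(factory):
    # Called by the reactor every 0.5 s; only sends once connected.
    if getattr(factory, "proto", None) is not None:
        factory.proto.queryStatus(1)

f = mySocketFactory()
task.LoopingCall(push_status, f).start(0.5)
reactor.connectTCP("10.10.10.1", 1234, f)
reactor.run()

Events that originate in another thread can be handed to the loop safely with reactor.callFromThread().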
Well, there are many approaches to that, and the best one really depends on the context of your application, so I won't detail one particular way of doing this here, but rather link you to a piece I read recently via Hacker News:
http://www.devmusings.com/blog/2013/05/23/python-concurrency/
and good use-case example, though it may not apply to what you're working on (or you may have read it):
http://eflorenzano.com/blog/2008/11/17/writing-markov-chain-irc-bot-twisted-and-python/
BTW, you may also have a look at gevent or tornado, which are good at handling this kind of thing.
If your other "events" are from a GUI toolkit (like GTK or QT) be really careful of the GIL, and even if you just want command line events you'll need threads and still be careful of that.
Finally, if you want more interaction, you may as well write different kinds of "peers" for your server that interact with the different use cases you're working on (one client that connects to a GUI, another with a CLI, another with a database, another with a SaaS API, etc.).
In other words, if your design is not working, try changing your perspective!

Should sockets be non-blocking to work with select in Python?

Should sockets be set to non-blocking when used with select.select in Python?
What difference does it make if they are or aren't?
Occasionally I find that calling send on a socket that select reports as writable will block. Furthermore, I find that blocking sockets will generally send the whole buffer given (128 KiB). In non-blocking mode, sending accepts far fewer bytes (20-40 KiB, compared with the example given earlier) and returns more quickly. I'm using Python 3.1 on Lucid.
The answer might be OS dependent unfortunately. I'm replying only regarding Linux.
I'm not aware of differences regarding blocking/non-blocking sockets in select, but on Linux, the select system call man page has this in its 'BUGS' section:
Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.
I doubt a python abstraction above that could "hide" this issue without side-effects.
As for the blocking write sending more data, that's expected. send will block until there is enough buffer space to pass your whole request down if the socket is blocking. If the socket is non-blocking, it only sends as much as can currently fit in the socket's send buffer.
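To make the partial-send behaviour concrete, here is a minimal sketch of a send loop for a non-blocking socket (the function name is illustrative, and sock is assumed to be an already-connected TCP socket):

import errno
import select
import socket

def send_all(sock, data):
    # A non-blocking send() may accept only part of the buffer; keep
    # the unsent remainder and wait until the socket is writable.
    sock.setblocking(False)
    while data:
        select.select([], [sock], [])   # block until reported writable
        try:
            sent = sock.send(data)      # may send less than len(data)
        except socket.error as e:
            if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                continue                # spurious readiness; retry
            raise
        data = data[sent:]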

Constipated Python urllib2 sockets

I've been scouring the Internet looking for a solution to my problem with Python. I'm trying to use a urllib2 connection to read a potentially endless stream of data from an HTTP server. It's part of some interactive communication, so it's important that I can get the data that's available, even if it's not a whole buffer full. There seems to be no way to have read/readline return the available data. It will block forever waiting for the entire (endless) stream before it returns.
Even if I set the underlying file descriptor to non-blocking using fcntl, the urllib2 file-object still blocks! In general, there seems to be no way to make Python file-objects, upon read, return all available data if there is some, and block otherwise.
I've seen a few posts about people seeking help with this, but I have seen no solutions. What gives? Am I missing something? This seems like such a normal use-case to completely ruin! I'm hoping to utilize urllib2's ability to detect configured proxies and use chunked encoding, but I can't if it won't cooperate.
Edit: Upon request, here is some example code
Client:
connection = urllib2.urlopen(commandpath)
id = connection.readline()
Now suppose that the server is using chunked transfer encoding, and writes one chunk down the stream and the chunk contains the line, and then waits. The connection is still open, but the client has data waiting in a buffer.
I cannot get read or readline to return the data I know it has waiting for it, because it tries to read until the end of the connection. In this case the connection may never close so it will wait either forever or until an inactivity timeout occurs, severing the connection. Once the connection is severed it will return, but that's obviously not the behavior I want.
urllib2 operates at the HTTP level, which works with complete documents. I don't think there's a way around that without hacking into the urllib2 source code.
What you can do is use plain sockets (you'll have to talk HTTP yourself in this case) and call sock.recv(maxbytes), which reads only the data currently available.
Update: you may want to try to call conn.fp._sock.recv(maxbytes), instead of conn.read(bytes) on an urllib2 connection.
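For the plain-socket route, a minimal sketch (the host and path are placeholders; HTTP/1.0 is requested here so the response is not chunk-encoded, which you would otherwise have to decode yourself):

import socket

sock = socket.create_connection(("example.com", 80))
sock.sendall(b"GET /stream HTTP/1.0\r\n"
             b"Host: example.com\r\n\r\n")

while True:
    chunk = sock.recv(4096)   # returns as soon as any data arrives
    if not chunk:
        break                 # server closed the connection
    process(chunk)            # hypothetical handler for partial data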
