Why do I see random read errors with Python BaseHTTPServer?

I have Python code that calls external HTTP services. I want to test this code by setting up mock HTTP servers that imitate those external services. I do this by starting a BaseHTTPServer in a separate thread, and then calling that server from the main thread. It looks like this:
import BaseHTTPServer, httplib, threading, time

class MockHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write('{"result": "success"}')

class ServerThread(threading.Thread):
    def run(self):
        svr = BaseHTTPServer.HTTPServer(('127.0.0.1', 8540), MockHandler)
        svr.handle_request()

ServerThread().start()
time.sleep(0.1)  # Give the thread some time to get up

conn = httplib.HTTPConnection('127.0.0.1', 8540)
conn.request('POST', '/', 'foo=bar&baz=qux')
resp_body = conn.getresponse().read()
However, some of the requests fail in the read() call, with socket.error: [Errno 104] Connection reset by peer. I can reproduce it, with varying frequency, on several machines with Python 2.6, though not with 2.7.
But the most interesting thing is, if I don’t send the POST data (i.e. if I omit the third argument to conn.request()), the error does not occur.
What could this be?
Alternatively, is there another quick and easy way to set up mock HTTP servers in Python?

"...in a separate thread, and then calling that server from the main thread."
Don't use threads for this kind of thing.
Use processes. subprocess.Popen (and your operating system's normal features) will do a much, much better job of ensuring that this works properly.
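As an illustration, here is a minimal sketch of that approach in Python 3. The helper file name mock_server.py is made up for this example; note the handler also drains the POST body before replying, since closing a socket while unread data is still buffered can provoke exactly this kind of connection reset.

# --- mock_server.py (hypothetical helper file) ---
import http.server

class MockHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body before replying.
        self.rfile.read(int(self.headers.get('Content-Length', 0)))
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(b'{"result": "success"}')

if __name__ == '__main__':
    # Serve exactly one request, then exit.
    http.server.HTTPServer(('127.0.0.1', 8540), MockHandler).handle_request()

# --- test side, in a separate file ---
import http.client, subprocess, sys, time

proc = subprocess.Popen([sys.executable, 'mock_server.py'])
time.sleep(0.5)  # crude: give the child time to bind the port
try:
    conn = http.client.HTTPConnection('127.0.0.1', 8540)
    conn.request('POST', '/', 'foo=bar&baz=qux')
    print(conn.getresponse().read())
finally:
    proc.wait()  # the child exits after handling one request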

Related

Starting and stopping flask on demand

I am writing an application which can expose a simple RPC interface implemented with Flask. However, I want it to be possible to activate and deactivate that interface. It should also be possible to have multiple instances of the application running in the same Python interpreter, each with its own RPC interface.
The service is only exposed to localhost and this is a prototype, so I am not worried about security. I am looking for a small and easy solution.
The obvious way here seems to be to use the Flask development server; however, I can't find a way to shut it down.
I have created a flask blueprint for the functionality I want to expose and now I am trying to write a class to wrap the RPC interface similar to this:
class RPCInterface:
    def __init__(self, creating_app, config):
        self.flask_app = Flask(__name__)
        self.flask_app.config.update(config)
        self.flask_app.my_app = creating_app
        self.flask_app.register_blueprint(my_blueprint)
        self.flask_thread = Thread(target=Flask.run, args=(self.flask_app,),
                                   name='flask_thread', daemon=True)

    def shutdown(self):
        # Seems impossible with the flask server
        raise NotImplementedError()
I am using the my_app attribute of the current app to pass the instance of my application that this RPC interface works with into the request context.
The server can be shut down from inside a request (as described at http://flask.pocoo.org/snippets/67/), so one solution would be to create a shutdown endpoint and send a request with the test client to initiate a shutdown. However, that requires a Flask endpoint just for this purpose, which is far from clean.
I looked into the source code of Flask and werkzeug and found that the important part (context at https://github.com/pallets/werkzeug/blob/master/werkzeug/serving.py#L688) looks like this:
def inner():
    try:
        fd = int(os.environ['WERKZEUG_SERVER_FD'])
    except (LookupError, ValueError):
        fd = None
    srv = make_server(hostname, port, application, threaded,
                      processes, request_handler,
                      passthrough_errors, ssl_context,
                      fd=fd)
    if fd is None:
        log_startup(srv.socket)
    srv.serve_forever()
make_server returns an instance of werkzeug's server class, which inherits from Python's http.server.HTTPServer. That in turn derives from socketserver.BaseServer, which exposes a shutdown() method. The problem is that the server created here is just a local variable and thus not accessible from anywhere.
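(Aside: that same make_server is importable from werkzeug.serving, so constructing the server yourself keeps a reference you can later shut down. A minimal sketch, assuming a Flask app named app:

from threading import Thread
from flask import Flask
from werkzeug.serving import make_server

app = Flask(__name__)

# Build the server ourselves instead of letting app.run() hide it.
srv = make_server('127.0.0.1', 5000, app, threaded=True)
t = Thread(target=srv.serve_forever, name='flask_thread', daemon=True)
t.start()
# ... later, from any thread ...
srv.shutdown()  # stops serve_forever(); inherited from socketserver.BaseServer
t.join()
)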
This is where I ran into a dead end. So my question is:
Does anybody have another idea how to shut down this server easily?
Is there any other simple server to run Flask on? Something that does not require an external process and can be started and stopped in a few lines of code? Everything listed in the Flask docs seems to have a complex setup.
Answering my own question in case this ever happens again to anyone.
The first solution involved switching from Flask to Klein. Klein is basically Flask with fewer features, running on top of the Twisted reactor. This way the integration is very simple. Basically it works like this:
from klein import Klein
from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString
from twisted.web.server import Site

app = Klein()

@app.route('/')
def home(request):
    return 'Some website'

endpoint_string = 'tcp:8080'  # for example, listen on TCP port 8080
endpoint = serverFromString(reactor, endpoint_string)
endpoint.listen(Site(app.resource()))
reactor.run()
Now all the twisted tools can be used to start and stop the server as needed.
The second solution I switched to further down the road was to get rid of HTTP as a transport protocol altogether. I switched to JSON-RPC on top of Twisted's LineReceiver protocol. That made everything even simpler, since I wasn't using any of the HTTP features anyway.
This is a terrible, horrendous hack that nobody should ever use for any purpose whatsoever... except maybe if you're trying to write an integration test suite. There are probably better approaches - but if you're trying to do exactly what the question is asking, here goes...
import sys
from socketserver import BaseServer

# implementing the shutdown() method above
def shutdown(self):
    # Walk every live thread's stack, looking for the frame of werkzeug's
    # inner() function, which holds the server in a local variable 'srv'.
    for frame in sys._current_frames().values():
        while frame is not None:
            if 'srv' in frame.f_locals and isinstance(frame.f_locals['srv'], BaseServer):
                frame.f_locals['srv'].shutdown()
                break
            frame = frame.f_back  # not found here; walk up the stack
        else:
            continue  # exhausted this thread's stack; try the next one
        break  # the inner break means we found and stopped the server
    self.flask_thread.join()

Python HTTP client with request pipelining

The problem: I need to send many HTTP requests to a server. I can only use one connection (non-negotiable server limit). The server's response time plus the network latency is too high – I'm falling behind.
The requests typically don't change server state and don't depend on the previous request's response. So my idea is to simply send them on top of each other, enqueue the response objects, and depend on the Content-Length: of the incoming responses to feed incoming replies to the next-waiting response object. In other words: Pipeline the requests to the server.
This is of course not entirely safe (any reply without Content-Length: means trouble), but I don't care -- in that case I can always retry any queued requests. (The safe way would be to wait for the header before sending the next bit. That might help me enough. No way to test beforehand.)
So, ideally I want the following client code (which uses client delays to mimic network latency) to run in three seconds.
Now for the $64000 question: Is there a Python library which already does this, or do I need to roll my own? My code uses gevent; I could use Twisted if necessary, but Twisted's standard connection pool does not support pipelined requests. I also could write a wrapper for some C library if necessary, but I'd prefer native code.
#!/usr/bin/python
import gevent.pool
from gevent import sleep
from time import time
from geventhttpclient import HTTPClient

url = 'http://local_server/100k_of_lorem_ipsum.txt'
http = HTTPClient.from_url(url, concurrency=1)

def get_it(http):
    print time(), "Queueing request"
    response = http.get(url)
    print time(), "Expect header data"
    # Do something with the header, just to make sure that it has arrived
    # (the greenlet should block until then)
    assert response.status_code == 200
    assert response["content-length"] > 0
    for h in response.items():
        pass
    print time(), "Wait before reading body data"
    # Now I can read the body. The library should send at
    # least one new HTTP request during this time.
    sleep(2)
    print time(), "Reading body data"
    while response.read(10000):
        pass
    print time(), "Processing my response"
    # The next request should definitely be transmitted NOW.
    sleep(1)
    print time(), "Done"

# Run parallel requests
pool = gevent.pool.Pool(3)
for i in range(3):
    pool.spawn(get_it, http)
pool.join()
http.close()
Dugong is an HTTP/1.1-only client which claims to support real HTTP/1.1 pipelining. The tutorial includes several examples on how to use it, including one using threads and another using asyncio.
Be sure to verify that the server you're communicating with actually supports HTTP/1.1 pipelining—some servers claim to support HTTP/1.1 but don't implement pipelining.
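A rough sketch of what pipelining with Dugong looks like, based on its documented send_request/read_response API (the host and paths here are placeholders):

from dugong import HTTPConnection

paths = ['/a.txt', '/b.txt', '/c.txt']  # hypothetical paths
conn = HTTPConnection('local_server')
for p in paths:
    conn.send_request('GET', p)    # queue the requests back to back
for p in paths:
    resp = conn.read_response()    # responses arrive in request order
    body = conn.readall()
    print(p, resp.status, len(body))
conn.disconnect()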
I think txrequests could get you most of what you are looking for, using the background_callback to enqueue processing of responses on a separate thread. Each request would still be its own thread, but using a session means that by default it would reuse the same connection.
https://github.com/tardyp/txrequests#working-in-the-background
It seems you are running Python 2.
For Python 3.5 and later you could use async/await with an event loop; see asyncio.
There is also Trio, an async library with similar goals that aims to be easier to use, available on pip.
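To make the asyncio suggestion concrete, here is a minimal sketch of raw HTTP/1.1 pipelining on a single connection, relying on Content-Length just as the question proposes (the host and paths are placeholders, and chunked responses would need extra handling):

import asyncio

async def pipeline(host, paths, port=80):
    reader, writer = await asyncio.open_connection(host, port)
    # Send every request back to back on the single connection.
    for p in paths:
        writer.write(f'GET {p} HTTP/1.1\r\nHost: {host}\r\n\r\n'.encode())
    await writer.drain()
    bodies = []
    for _ in paths:
        # Read the status line and headers; remember Content-Length.
        length = 0
        while True:
            line = await reader.readline()
            if line in (b'\r\n', b''):
                break
            if line.lower().startswith(b'content-length:'):
                length = int(line.split(b':', 1)[1])
        bodies.append(await reader.readexactly(length))
    writer.close()
    await writer.wait_closed()
    return bodies

# bodies = asyncio.run(pipeline('local_server', ['/a.txt', '/b.txt', '/c.txt']))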
Another thing I can think of is multiple threads with locks. I will have to think about how to explain that better, or whether it would even work here.

SocketServer ThreadingTCPServer & Asyncore Dispatcher

I want to add a timeout to individual connections within my request handler for a server using the SocketServer module.
Let me start by saying this is the first time I'm attempting to do network programming using Python. I've sub-classed SocketServer.BaseRequestHandler and SocketServer.ThreadingTCPServer & SocketServer.TCPServer and managed to create two classes with some basic threaded TCP functionality.
However, I would like my incoming connections to time out. Overriding any of the built-in SocketServer timeout values and methods does not work, as the documentation says those work only with forking servers. I have managed to create a timer thread that fires after X seconds, but due to the blocking recv call within the handler thread this is of no use, as I would be forced to kill it, and that is something I really want to avoid.
So it is my understanding that I need an asyncore implementation, where I get notified and read a certain amount of data. In the event that no data is sent over a period of, say, 5 seconds, I want to close that connection (I know how to do that cleanly).
I have found a few examples of using asyncore with sockets, but none using SocketServer. So, how can I combine asyncore and ThreadingTCPServer?
Is it possible?
Has anyone done it?
You can also set a timeout on the recv call, like this:
sock.settimeout(1.0)
Since you use SocketServer, you will have to find the underlying socket somewhere in the SocketServer. Please note that SocketServer will create the socket for you, so there is no need to do that yourself.
You will probably have defined a RequestHandler to go with your SocketServer. It should look something like this:
import socket
import SocketServer

class RequestHandler(SocketServer.BaseRequestHandler):
    def setup(self):
        # the socket is called request in the request handler
        self.request.settimeout(1.0)

    def handle(self):
        while True:
            try:
                data = self.request.recv(1024)
                if not data:
                    break  # connection is closed
                else:
                    pass  # do your thing
            except socket.timeout:
                pass  # handle timeout
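From there the handler plugs into the threaded server in the usual way (the address and port here are placeholders):

server = SocketServer.ThreadingTCPServer(('127.0.0.1', 9000), RequestHandler)
server.serve_forever()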

pyftpdlib slow .read on file blocks entire mainloop

Hello,
I am using a custom AbstractFS on pyftpdlib that maps files on an HTTP server to FTP.
These files are returned by my implementation of open() (from AbstractFS), which returns an httplib.HTTPResponse wrapped in the following class:
class HTTPConnWrapper:
    def __init__(self, obj, filename):
        # make it more file-object-like
        self.obj = obj
        self.closed = True
        self.name = filename.split(os.sep)[-1]

    def seek(self, arg):
        pass

    def read(self, bytes):
        #print 'read', bytes
        read = self.obj.read(100)  # we DON'T read `bytes` bytes, but 100 bytes
        #print 'ok'
        return read
The problem is that if a client is downloading a file, the entire server becomes sluggish.
What can I do? Any ideas?
PS:
And why doesn't just monkey-patching everything with eventlet magically make everything work?
pyftpdlib uses Python's asyncore module, which polls sockets and dispatches events to handlers. Each time you map an FTP request to a request to the HTTP server, you block the asyncore loop that pyftpdlib is using. You should implement your HTTP requests as dispatchers that fit the asyncore model, or spawn threads to handle the request asynchronously and post the result back to the FTP request handler when the data has arrived. This is somewhat difficult, as there's no provided mechanism to interrupt asyncore's polling loop from external threads.
As for eventlet, I don't know that it would play nicely with asyncore, which is already utilizing a nonblocking IO mechanism.
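That said, a common do-it-yourself workaround for waking an asyncore loop from another thread is the "self-pipe" trick: a dispatcher watches one end of a socket pair, and worker threads write a byte to the other end after queueing a callback. A sketch of my own, not specific to pyftpdlib (the LoopWaker name is invented here):

import asyncore
import queue
import socket

class LoopWaker(asyncore.dispatcher):
    """Wakes the asyncore loop from other threads and runs queued callbacks."""

    def __init__(self):
        a, b = socket.socketpair()
        asyncore.dispatcher.__init__(self, a)
        self._writer = b
        self._callbacks = queue.Queue()

    def writable(self):
        return False  # never wait for write readiness

    def handle_read(self):
        self.recv(4096)  # drain the wake-up bytes
        while True:
            try:
                cb = self._callbacks.get_nowait()
            except queue.Empty:
                break
            cb()  # runs inside the asyncore loop

    def call_in_loop(self, cb):
        # Safe to call from any thread: queue the callback, then wake the loop.
        self._callbacks.put(cb)
        self._writer.send(b'x')

# Create one LoopWaker before the loop starts (pyftpdlib runs asyncore.loop()
# internally); worker threads then hand results back via waker.call_in_loop(...).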
Ok, I posted a bug report on pyftpdlib:
I wouldn't even know what to recommend exactly as it's a problem which is hard to resolve and there's no easy or standard way to deal with it.
But I got a crazy solution to solve this problem without using pyftpdlib:

1. Rewrite everything using wsgidav (which uses the cherrypy wsgiserver, so it's threaded).
2. Mount that WebDAV filesystem as a native filesystem (net use on Windows, mount.davfs on Linux).
3. Serve the mounted filesystem with any FTP server that can handle blocking filesystems.

Tornado WebSocket Question

Finally decided to go with Tornado as a WebSocket server, but I have a question about how it's implemented.
After following a basic tutorial on creating a working server, I ended up with this:
#!/usr/bin/env python
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.web import Application
from tornado.websocket import WebSocketHandler

class Handler(WebSocketHandler):
    def open(self):
        print "New connection opened."

    def on_message(self, message):
        print message

    def on_close(self):
        print "Connection closed."

print "Server started."
HTTPServer(Application([("/", Handler)])).listen(1024)
IOLoop.instance().start()
It works great and all, but I was wondering if the other modules (tornado.httpserver, tornado.ioloop, and tornado.web) are actually needed to run the server.
It's not a huge issue having them, but I just wanted to make sure there wasn't a better way to do whatever they do (I haven't covered those modules at all yet).
tornado.httpserver:
A non-blocking, single-threaded HTTP server.
Typical applications have little direct interaction with the HTTPServer class.
HTTPServer is a very basic connection handler. Beyond parsing the HTTP request body and headers, the only HTTP semantics implemented in HTTPServer is HTTP/1.1 keep-alive connections.
tornado.ioloop:
An I/O event loop for non-blocking sockets.
So the IOLoop can be used, among other things, to set time-outs on responses.
In general, methods on RequestHandler and elsewhere in tornado are not thread-safe. In particular, methods such as write(), finish(), and flush() must only be called from the main thread. If you use multiple threads it is important to use IOLoop.add_callback to transfer control back to the main thread before finishing the request.
tornado.web:
Provides RequestHandler and Application classes
It also provides additional tools and optimizations for taking advantage of the Tornado non-blocking web server.
So, these are the provisions of this module:
- Entry points: hooks for subclass initialization
- Input
- Output
- Cookies
I hope this covers the modules you asked about.
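For what it's worth, Application also has a listen() convenience method that constructs the HTTPServer internally, so the explicit tornado.httpserver import can be dropped. A minimal sketch:

from tornado.ioloop import IOLoop
from tornado.web import Application
from tornado.websocket import WebSocketHandler

class Handler(WebSocketHandler):
    def on_message(self, message):
        self.write_message(message)  # echo, just to have a body

Application([("/", Handler)]).listen(1024)
IOLoop.instance().start()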
Yes, they're needed: you're using each import from each module/package you reference. If you imported something at the top of your source and never used it again in the code that follows, then of course you wouldn't need it, but in this case you use all of your imports.
