pyftpdlib slow .read on file blocks entire mainloop

Hello,
I am using a custom AbstractFS with pyftpdlib that maps files on an HTTP server to FTP.
These files are returned by my implementation of open() (from AbstractFS), which returns an httplib.HTTPResponse wrapped in the following class:
import os

class HTTPConnWrapper:
    def __init__(self, obj, filename):
        # wrap the HTTPResponse to make it more file-like
        self.obj = obj
        self.closed = False  # an open file-like object reports closed == False
        self.name = filename.split(os.sep)[-1]

    def seek(self, arg):
        pass

    def read(self, bytes):
        #print 'read', bytes
        read = self.obj.read(100)  # we do NOT read `bytes` bytes, only 100
        #print 'ok'
        return read
The problem is that if a client is downloading files, the entire server becomes sluggish.
What can I do?
Any ideas?
PS:
And why doesn't just monkey-patching everything with eventlet magically make everything work?

pyftpdlib uses Python's asyncore module, which polls and interacts with dispatchers. Each time you map an FTP request to a request to the HTTP server, you block the asyncore loop that pyftpdlib is using. You should implement your HTTP requests as dispatchers that fit the asyncore model, or spawn threads to handle the request asynchronously and post the result back to the FTP request handler when the data has arrived. The latter is somewhat difficult, as there's no provided mechanism to interrupt asyncore's polling loop from external threads.
As for eventlet, I don't know that it would play nicely with asyncore, which is already utilizing a nonblocking IO mechanism.
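To illustrate the thread approach, here is a minimal sketch; the wrapper class and its buffering policy are my own assumptions, not pyftpdlib API. A worker thread drains the blocking HTTP response into a queue, so the read() called from the asyncore loop mostly touches data that has already arrived:

import Queue
import threading

class PrefetchingReader:
    def __init__(self, http_response, chunk_size=8192):
        self._chunks = Queue.Queue(maxsize=32)
        worker = threading.Thread(target=self._pump,
                                  args=(http_response, chunk_size))
        worker.daemon = True
        worker.start()

    def _pump(self, resp, chunk_size):
        # Runs in the worker thread: all blocking reads happen here.
        while True:
            data = resp.read(chunk_size)
            self._chunks.put(data)
            if not data:  # empty string signals EOF
                break

    def read(self, size=-1):
        # Called from the asyncore loop. Still blocks if the worker has
        # nothing buffered yet, which is exactly the hard part noted above.
        return self._chunks.get()

Note this only narrows the blocking window rather than eliminating it; a complete fix needs proper asyncore dispatchers, or a way to wake the polling loop from the worker thread.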

Ok, I posted a bug report on pyftpdlib:
I wouldn't even know what to recommend exactly as it's a problem which is hard to resolve and there's no easy or standard way to deal with it.
But I came up with a crazy solution that avoids pyftpdlib entirely:

1. Rewrite everything using wsgidav (which uses the CherryPy WSGI server, so it's threaded).
2. Mount that WebDAV filesystem as a native filesystem (net use on Windows, mount.davfs on Linux).
3. Serve the mounted filesystem with any FTP server that can handle blocking filesystems.

Related

Temporary ftp server for testing

I want to write a test for my code, which uses an FTP library and uploads data via FTP.
I would like to avoid the need for a real FTP server in my test.
What is the simplest way to test my code?
There are several edge-cases which I would like to test.
For example, my code tries to create a directory which already exists.
I want to catch the exception and do appropriate error handling.
I know that I could use the mock library; I have used it before. But maybe there is a better solution for this use case?
Update: Why I don't want to use mocking: I know that I could use mocking to solve this. I could mock the library I use (ftputil by Stefan Schwarzer) and test my code this way. But what happens if I change my code to use a different FTP library in the future? Then I would need to rewrite my testing code, too. I am lazy: I want to be able to rewrite the real code I am testing without touching the test code. But maybe I am still missing a cool way to use mocking.
Solved with https://github.com/tbz-pariv/ftpservercontext
First, to get this out of the way: you aren't asking about Mocking; your question is about Faking.
Fake: an implementation of an interface which expresses correct behaviour, but cannot be used in production.
Mock: an implementation of an interface that responds to interactions based on a scripted (script as in movie script, not uncompiled code) response.
Stub: an implementation of an interface lacking any real implementation, usually used in McGuffin-style tests.
Notice that in every case the word "interface" is used.
Your question asks how to Fake a TCP port such that the behaviour is an FTP server, with the state of a read/write filesystem underneath.
This is hard.
It is much easier to Mock an internal interface that throws when you call the mkdir function.
If you must Fake an FTP server, I suggest creating a docker container with the server in the state you want, and using docker to handle the repeatability and lifecycle of the FTP server.
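For example, a throwaway container could be managed from the test itself. A sketch; 'some/ftp-image' is a placeholder, not a real image name:

import subprocess

class DockerFTPServer(object):
    # Hypothetical context manager around the docker CLI.
    def __enter__(self):
        cid = subprocess.check_output(
            ['docker', 'run', '-d', '-p', '2121:21', 'some/ftp-image'])
        self.cid = cid.decode('ascii').strip()
        return self

    def __exit__(self, *args):
        # -f stops and removes the container in one step
        subprocess.check_call(['docker', 'rm', '-f', self.cid])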
ContextManager:
import subprocess
import time

class FTPServerContext(object):
    banner = 'FTPServerContext ready'

    def __init__(self, directory_to_serve):
        self.directory_to_serve = directory_to_serve

    def __enter__(self):
        cmd = ['serve_directory_via_ftp']
        self.pipe = subprocess.Popen(cmd, cwd=self.directory_to_serve)
        time.sleep(2)  # TODO check banner via https://stackoverflow.com/a/4896288/633961
        return self

    def __exit__(self, *args):
        self.pipe.kill()
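Regarding the TODO above: instead of a fixed sleep(2), the test could poll until the port accepts connections. A sketch; the helper name is mine:

import socket
import time

def wait_for_port(host, port, timeout=10.0):
    # Poll until the FTP server accepts TCP connections, or give up.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            socket.create_connection((host, port), timeout=0.5).close()
            return
        except socket.error:
            time.sleep(0.1)
    raise RuntimeError('server did not start within %s seconds' % timeout)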
console_script:
from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

import testutils

def serve_directory_via_ftp():
    # https://pyftpdlib.readthedocs.io/en/latest/tutorial.html
    authorizer = DummyAuthorizer()
    authorizer.add_user('testuser-ftp', 'testuser-ftp-pwd', '.', perm='elradfmwMT')
    handler = FTPHandler
    handler.authorizer = authorizer
    handler.banner = testutils.FTPServerContext.banner
    address = ('localhost', 2121)
    server = FTPServer(address, handler)
    server.serve_forever()
Usage in test:
def test_execute_job_and_create_log(self):
    temp_dir = tempfile.mkdtemp()
    with testutils.FTPServerContext(temp_dir) as ftp_context:
        execute_job_and_create_log(...)
The code is in the public domain, under any license you want. It would be great if you made this a pip-installable package on pypi.org.

Starting and stopping flask on demand

I am writing an application that can expose a simple RPC interface implemented with Flask. However, I want it to be possible to activate and deactivate that interface. It should also be possible to have multiple instances of the application running in the same Python interpreter, each with their own RPC interface.
The service is only exposed to localhost and this is a prototype, so I am not worried about security. I am looking for a small and easy solution.
The obvious way here seems to be to use the Flask development server; however, I can't find a way to shut it down.
I have created a flask blueprint for the functionality I want to expose and now I am trying to write a class to wrap the RPC interface similar to this:
from threading import Thread

from flask import Flask

class RPCInterface:
    def __init__(self, creating_app, config):
        self.flask_app = Flask(__name__)
        self.flask_app.config.update(config)
        self.flask_app.my_app = creating_app
        self.flask_app.register_blueprint(my_blueprint)
        self.flask_thread = Thread(target=Flask.run, args=(self.flask_app,),
                                   name='flask_thread', daemon=True)

    def shutdown(self):
        # Seems impossible with the flask server
        raise NotImplementedError()
I am using the my_app attribute of the current app to pass the instance of my application that this RPC interface works with into the context of the requests.
The server can be shut down from inside a request (as described here: http://flask.pocoo.org/snippets/67/), so one solution would be to create a shutdown endpoint and send a request with the test client to initiate a shutdown. However, that requires a Flask endpoint existing just for this purpose, which is far from clean.
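For reference, here is roughly what that endpoint would look like. A sketch based on the linked snippet, assuming app is the Flask instance from __init__; note that newer Werkzeug releases have removed the werkzeug.server.shutdown hook:

from flask import request

@app.route('/shutdown', methods=['POST'])
def shutdown_endpoint():
    stop = request.environ.get('werkzeug.server.shutdown')
    if stop is None:
        raise RuntimeError('not running the Werkzeug development server')
    stop()  # asks the dev server to exit its serve_forever() loop
    return 'shutting down'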
I looked into the source code of flask and werkzeug and figured out the important part (Context at https://github.com/pallets/werkzeug/blob/master/werkzeug/serving.py#L688) looks like this:
def inner():
    try:
        fd = int(os.environ['WERKZEUG_SERVER_FD'])
    except (LookupError, ValueError):
        fd = None
    srv = make_server(hostname, port, application, threaded,
                      processes, request_handler,
                      passthrough_errors, ssl_context,
                      fd=fd)
    if fd is None:
        log_startup(srv.socket)
    srv.serve_forever()
make_server returns an instance of Werkzeug's server class, which inherits from Python's http.server classes; those in turn build on socketserver.BaseServer, which exposes a shutdown() method. The problem is that the server created here is just a local variable and thus not accessible from anywhere.
This is where I ran into a dead end. So my question is:
Does anybody have another idea how to shut down this server easily?
Is there any other simple server to run flask on? Something which does not require an external process and can just be started and stopped in a few lines of code? Everything listed in the flask doc seems to have a complex setup.
Answering my own question in case this ever happens again to anyone.
The first solution involved switching from Flask to Klein. Klein is basically Flask with fewer features, but running on top of the Twisted reactor, so the integration is very simple. Basically it works like this:
from klein import Klein
from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString
from twisted.web.server import Site

app = Klein()

@app.route('/')
def home(request):
    return 'Some website'

endpoint_string = 'tcp:8080'  # for example
endpoint = serverFromString(reactor, endpoint_string)
endpoint.listen(Site(app.resource()))
reactor.run()
Now all the twisted tools can be used to start and stop the server as needed.
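For example, keeping the result of listen() around gives you a shutdown handle. A sketch: listen() fires its Deferred with an IListeningPort, whose stopListening() closes the socket:

ports = []
endpoint.listen(Site(app.resource())).addCallback(ports.append)

# ... later, to stop accepting RPC connections:
def stop_rpc():
    return ports[0].stopListening()  # returns a Deferred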
The second solution I switched to further down the road was to get rid of HTTP as a transport protocol. I switched to JSONRPC on top of twisted's LineReceiver protocol. This way everything got even simpler and I didn't use any of the HTTP stuff anyway.
This is a terrible, horrendous hack that nobody should ever use for any purpose whatsoever... except maybe if you're trying to write an integration test suite. There are probably better approaches - but if you're trying to do exactly what the question is asking, here goes...
import sys
from socketserver import BaseServer  # note: the module has no "BaseSocketServer"

# implementing the shutdown() method above
def shutdown(self):
    # Walk every thread's stack looking for the serve_forever() frame,
    # which holds the server in a local variable named 'srv'.
    for frame in sys._current_frames().values():
        while frame is not None:
            srv = frame.f_locals.get('srv')
            if isinstance(srv, BaseServer):
                srv.shutdown()
                break
            frame = frame.f_back
        else:
            continue
        break
    self.flask_thread.join()

Why do I see random read errors with Python BaseHTTPServer?

I have Python code that calls external HTTP services. I want to test this code by setting up mock HTTP servers that imitate those external services. I do this by starting a BaseHTTPServer in a separate thread, and then calling that server from the main thread. It looks like this:
import BaseHTTPServer, httplib, threading, time

class MockHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        # NB: the POST body is never read from self.rfile here
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write('{"result": "success"}')

class ServerThread(threading.Thread):
    def run(self):
        svr = BaseHTTPServer.HTTPServer(('127.0.0.1', 8540), MockHandler)
        svr.handle_request()  # serve exactly one request, then return

ServerThread().start()
time.sleep(0.1)  # Give the thread some time to get up
conn = httplib.HTTPConnection('127.0.0.1', 8540)
conn.request('POST', '/', 'foo=bar&baz=qux')
resp_body = conn.getresponse().read()
However, some of the requests fail in the read() call, with socket.error: [Errno 104] Connection reset by peer. I can reproduce it, with varying frequency, on several machines with Python 2.6, though not with 2.7.
But the most interesting thing is, if I don’t send the POST data (i.e. if I omit the third argument to conn.request()), the error does not occur.
What could this be?
Alternatively, is there another quick and easy way to set up mock HTTP servers in Python?
"...in a separate thread, and then calling that server from the main thread."
Don't use threads for this kind of thing.
Use processes. subprocess.Popen (and your operating system's normal features) will do a much, much better job of assuring that this works properly.
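A minimal sketch of the process-based variant, assuming the mock server code above is saved as mock_server.py (a hypothetical filename):

import subprocess, sys, time
import httplib

# Run the mock server in its own process instead of a thread.
proc = subprocess.Popen([sys.executable, 'mock_server.py'])
try:
    time.sleep(0.5)  # or poll the port until it accepts connections
    conn = httplib.HTTPConnection('127.0.0.1', 8540)
    conn.request('POST', '/', 'foo=bar&baz=qux')
    print conn.getresponse().read()
finally:
    proc.terminate()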

Twisted, FTP, and "streaming" large files

I'm attempting to implement what can best be described as "an FTP interface to an HTTP API". Essentially, there is an existing REST API that can be used to manage a user's files for a site, and I'm building a mediator server that re-exposes this API as an FTP server. So you can login with, say, Filezilla and list your files, upload new ones, delete old ones, etc.
I'm attempting this with twisted.protocols.ftp for the (FTP) server, and twisted.web.client for the (HTTP) client.
The thing I'm running up against is "streaming": when a user tries to download a file, I need to stream that file from an HTTP response into my FTP response, and similarly for uploading.
The most straightforward approach would be to download the entire file from the HTTP server, then turn around and send the contents to the user. The problem with this is that any given file could be many gigabytes large (think drive images, ISO files, etc). With this approach, though, the contents of the file would be held in memory between the time I download it from the API and the time I send it to the user - not good.
So my solution is to try to "stream" it - as I get chunks of data from the API's HTTP response, I just want to turn around and send those chunks along to the FTP user. Seems straightforward.
For my "custom FTP functionality", I'm using a subclass of ftp.FTPShell. The reading method of this, openForReading, returns a Deferred that fires with an implementation of IReadFile.
Below is my (initial, simple) implementation for "streaming HTTP". I use the fetch function to setup an HTTP request, and the callback I pass in gets called with each chunk I get from the response.
I thought I could use some sort of two-ended buffer object to transport the chunks between the HTTP and FTP, by using the buffer object as the file-like object required by ftp._FileReader, but that's quickly proving not to work, as the consumer from the send call almost immediately closes the buffer (because it's returning an empty string, because there's no data to read yet, etc). Thus, I'm "sending" empty files before I even start receiving the HTTP response chunks.
Am I close, but missing something? Am I on the wrong path altogether? Is what I want to do really impossible (I highly doubt that)?
from twisted.web import client
import urlparse

class HTTPStreamer(client.HTTPPageGetter):
    def __init__(self):
        self.callbacks = []

    def addHandleResponsePartCallback(self, callback):
        self.callbacks.append(callback)

    def handleResponsePart(self, data):
        for cb in self.callbacks:
            cb(data)
        client.HTTPPageGetter.handleResponsePart(self, data)

class HTTPStreamerFactory(client.HTTPClientFactory):
    protocol = HTTPStreamer

    def __init__(self, *args, **kwargs):
        client.HTTPClientFactory.__init__(self, *args, **kwargs)
        self.callbacks = []

    def addChunkCallback(self, callback):
        self.callbacks.append(callback)

    def buildProtocol(self, addr):
        p = client.HTTPClientFactory.buildProtocol(self, addr)
        for cb in self.callbacks:
            p.addHandleResponsePartCallback(cb)
        return p

def fetch(url, callback):
    parsed = urlparse.urlsplit(url)
    f = HTTPStreamerFactory(parsed.path)
    f.addChunkCallback(callback)
    from twisted.internet import reactor
    reactor.connectTCP(parsed.hostname, parsed.port or 80, f)
As a side note, this is only my second day with Twisted; I spent most of yesterday reading through Dave Peticolas' Twisted Introduction, which has been a great starting point, even if it's based on an older version of Twisted.
That said, I may be doing things wrong.
I thought I could use some sort of two-ended buffer object to transport the chunks between the HTTP and FTP, by using the buffer object as the file-like object required by ftp._FileReader, but that's quickly proving not to work, as the consumer from the send call almost immediately closes the buffer (because it's returning an empty string, because there's no data to read yet, etc). Thus, I'm "sending" empty files before I even start receiving the HTTP response chunks.
Instead of using ftp._FileReader, you want something that writes to the consumer whenever a chunk arrives from your HTTPStreamer. You never need or want to read from a buffer on the HTTP side, because there's no reason to even have such a buffer. As soon as HTTP bytes arrive, write them to the consumer. Something like...
from zope.interface import implements
from twisted.protocols.ftp import IReadFile

class FTPStreamer(object):
    implements(IReadFile)

    def __init__(self, url):
        self.url = url

    def send(self, consumer):
        # write each HTTP chunk straight to the FTP consumer
        fetch(self.url, consumer.write)
        # You also need a Deferred to return here, so the
        # FTP implementation knows when you're done.
        return someDeferred
You may also want to use Twisted's producer/consumer interface to allow the transfer to be throttled, as may be necessary if your connection to the HTTP server is faster than your user's FTP connection to you.
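A sketch of that idea; the class and its name are mine. Twisted's TCP transports implement IPushProducer themselves, so the FTP consumer's backpressure can be forwarded to the HTTP transport:

from zope.interface import implements
from twisted.internet.interfaces import IPushProducer

class HTTPBackpressure(object):
    implements(IPushProducer)

    def __init__(self, http_transport):
        self.http_transport = http_transport

    def pauseProducing(self):
        # The FTP consumer is falling behind: stop reading HTTP bytes.
        self.http_transport.pauseProducing()

    def resumeProducing(self):
        self.http_transport.resumeProducing()

    def stopProducing(self):
        self.http_transport.loseConnection()

# in send(): consumer.registerProducer(HTTPBackpressure(transport), True)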

How to serve data from UDP stream over HTTP in Python?

I am currently working on exposing data from a legacy system over the web. I have a (legacy) server application that sends and receives data over UDP. The software uses UDP to send sequential updates to a given set of variables in (near) real time (updates every 5-10 ms). Thus, I do not need to capture all UDP data; it is sufficient that the latest update is retrieved.
In order to expose this data over the web, I am considering building a lightweight web server that reads/writes UDP data and exposes this data over HTTP.
As I am experienced with Python, I am considering to use it.
The question is the following: how can I (continuously) read data from UDP and send snapshots of it over TCP/HTTP on-demand with Python? So basically, I am trying to build a kind of "UDP2HTTP" adapter to interface with the legacy app so that I wouldn't need to touch the legacy code.
A solution that is WSGI compliant would be much preferred. Of course any tips are very welcome and MUCH appreciated!
Twisted would be very suitable here. It supports many protocols (UDP, HTTP), and its asynchronous nature makes it possible to stream UDP data directly to HTTP without shooting yourself in the foot with (blocking) threading code. It also supports WSGI.
Here's a quick "proof of concept" app using the Twisted framework. This assumes that the legacy UDP service is listening on localhost:8000 and will start sending UDP data in response to a datagram containing "Send me data", and that the data is three 32-bit integers. Additionally, it will respond to an HTTP GET / on port 2080.
You could start this with twistd -noy example.py:
example.py
from twisted.internet import protocol, defer
from twisted.application import service
from twisted.python import log
from twisted.web import resource, server as webserver
import struct

class legacyProtocol(protocol.DatagramProtocol):
    def startProtocol(self):
        self.transport.connect(self.service.legacyHost, self.service.legacyPort)
        self.sendMessage("Send me data")

    def stopProtocol(self):
        # Assume the transport is closed, do any tidying that you need to.
        return

    def datagramReceived(self, datagram, addr):
        # Inspect the datagram payload, do sanity checking.
        try:
            val1, val2, val3 = struct.unpack("!iii", datagram)
        except struct.error, err:
            # Problem unpacking data: log and ignore
            log.err()
            return
        self.service.update_data(val1, val2, val3)

    def sendMessage(self, message):
        self.transport.write(message)

class legacyValues(resource.Resource):
    def __init__(self, service):
        resource.Resource.__init__(self)
        self.service = service
        self.putChild("", self)

    def render_GET(self, request):
        data = "\n".join(["<li>%s</li>" % x for x in self.service.get_data()])
        return """<html><head><title>Legacy Data</title>
<body><h1>Data</h1><ul>
%s
</ul></body></html>""" % (data,)

class protocolGatewayService(service.Service):
    def __init__(self, legacyHost, legacyPort):
        self.legacyHost = legacyHost
        self.legacyPort = legacyPort
        self.udpListeningPort = None
        self.httpListeningPort = None
        self.lproto = None
        self.reactor = None
        self.data = [1, 2, 3]

    def startService(self):
        # called by application handling
        if not self.reactor:
            from twisted.internet import reactor
            self.reactor = reactor
        self.reactor.callWhenRunning(self.startStuff)

    def stopService(self):
        # called by application handling
        defers = []
        if self.udpListeningPort:
            defers.append(defer.maybeDeferred(self.udpListeningPort.loseConnection))
        if self.httpListeningPort:
            defers.append(defer.maybeDeferred(self.httpListeningPort.stopListening))
        return defer.DeferredList(defers)

    def startStuff(self):
        # UDP legacy stuff
        proto = legacyProtocol()
        proto.service = self
        self.udpListeningPort = self.reactor.listenUDP(0, proto)
        # Website
        factory = webserver.Site(legacyValues(self))
        self.httpListeningPort = self.reactor.listenTCP(2080, factory)

    def update_data(self, *args):
        self.data[:] = args

    def get_data(self):
        return self.data

application = service.Application('LegacyGateway')
services = service.IServiceCollection(application)
s = protocolGatewayService('127.0.0.1', 8000)
s.setServiceParent(services)
Afterthought
This isn't a WSGI design. The idea would be to run this program daemonized, with its HTTP port on a local IP, and have Apache or similar proxy requests to it. It could be refactored for WSGI, but it was quicker to knock up this way and easier to debug.
The software uses UDP to send sequential updates to a given set of variables in (near) real-time (updates every 5-10 ms). thus, I do not need to capture all UDP data -- it is sufficient that the latest update is retrieved
What you must do is this.
Step 1.
Build a Python app that collects the UDP data and caches it into a file. Create the file using XML, CSV or JSON notation.
This runs independently as some kind of daemon. This is your listener or collector.
Write the file to a directory from which it can be trivially downloaded by Apache or some other web server. Choose names and directory paths wisely and you're done.
Done.
If you want fancier results, you can do more. You don't need to, since you're already done.
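A minimal sketch of such a collector, reusing the three-integer datagram format from the Twisted answer above (the port and output path are placeholders):

import json
import os
import socket
import struct
import tempfile

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto('Send me data', ('127.0.0.1', 8000))  # as in the Twisted example
while True:
    datagram, addr = sock.recvfrom(1024)
    try:
        val1, val2, val3 = struct.unpack('!iii', datagram)
    except struct.error:
        continue  # ignore malformed packets
    # Write atomically so the web server never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir='/var/www/data')
    with os.fdopen(fd, 'w') as f:
        json.dump({'val1': val1, 'val2': val2, 'val3': val3}, f)
    os.rename(tmp, '/var/www/data/snapshot.json')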
Step 2.
Build a web application that allows someone to request this data being accumulated by the UDP listener or collector.
Use a web framework like Django for this. Write as little as possible. Django can serve flat files created by your listener.
You're done. Again.
Some folks think relational databases are important. If so, you can do this. Even though you're already done.
Step 3.
Modify your data collection to create a database that the Django ORM can query. This requires some learning and some adjusting to get a tidy, simple ORM model.
Then write your final Django application to serve the UDP data being collected by your listener and loaded into your Django database.
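The ORM model for this can stay tiny. A sketch, with field names that are my own assumptions:

from django.db import models

class LegacySample(models.Model):
    received_at = models.DateTimeField(auto_now_add=True)
    val1 = models.IntegerField()
    val2 = models.IntegerField()
    val3 = models.IntegerField()

    class Meta:
        get_latest_by = 'received_at'

# a view serving the freshest data then only needs:
# LegacySample.objects.latest()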
