Mock S3 server stalls if using HTTP/1.1 - python

I'm writing a test where a simple mock S3 is loaded up in the test environment using http.server.HTTPServer/http.server.BaseHTTPRequestHandler, to test multipart download behaviour involving Boto's S3Transfer.
It works fine unless I specify that the server uses HTTP/1.1. In that case, it downloads two 8MB parts of a 100MB file and then just hangs. I would like the mock server to use HTTP/1.1, since that's what the real S3 uses (I believe).
A simplified version of the test is below; it can be run with:
pip3 install boto3
python3 test.py
# test.py
import http.server
import re
import threading

import boto3
from botocore import (
    UNSIGNED,
)
from botocore.client import (
    Config,
)

length = 100 * 2**20

class MockS3(http.server.BaseHTTPRequestHandler):
    # If the below line is commented, the download completes
    protocol_version = 'HTTP/1.1'

    def do_GET(self):
        range_header = self.headers['Range']
        match = re.search(r'^bytes=(\d+)-(\d*)', range_header)
        start_inclusive_str, end_inclusive_str = match.group(1), match.group(2)
        start = int(start_inclusive_str)
        end = int(end_inclusive_str) + 1 if end_inclusive_str else length
        bytes_to_send = end - start
        self.send_response(206)
        self.send_header('Content-Length', str(bytes_to_send))
        self.end_headers()
        self.wfile.write(bytearray(bytes_to_send))

    def do_HEAD(self):
        self.send_response(200)
        self.send_header('Content-Length', str(length))
        self.end_headers()

server_address = ('localhost', 5678)
server = http.server.HTTPServer(server_address, MockS3)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

class Writable():
    def write(self, data):
        pass

s3_client = boto3.client('s3',
    endpoint_url='http://localhost:5678',
    config=Config(signature_version=UNSIGNED),
)
s3_client.download_fileobj(
    Bucket='some',
    Key='key',
    Fileobj=Writable(),
)
Note that Writable is deliberately not seekable: in my real code, I'm using a non-seekable file-like object.
Yes, moto can be used to make a mock S3, and I do so for other tests, but for this particular test I would like a "real" server. There are custom file objects involved, and I want to ensure that S3Transfer and other code that isn't relevant to this question behave together as I expect.
How can I set up a mock S3 server that uses HTTP/1.1 and that S3Transfer can download from?

There is a bug in your threading logic. What you're currently doing is serving on a separate thread, but what you really want to do is concurrently handle requests on multiple threads.
This can be achieved by creating a very dumb HTTP server which just mixes in threading capability:
class ThreadingServer(ThreadingMixIn, HTTPServer):
    pass
and serving from this server instead of the base HTTPServer.
As for why this works with HTTP/1.0: under that protocol the connection is closed after a single request is serviced, so the lone serving thread frees up to accept the next connection. With HTTP/1.1 keep-alive, S3Transfer's concurrent connections all stay open, and the single-threaded server blocks on one of them while the rest hang.
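Applied to the question's test, a minimal sketch, assuming the same MockS3 handler and address (on Python 3.7+ the standard library ships exactly this combination as http.server.ThreadingHTTPServer):
from http.server import HTTPServer
from socketserver import ThreadingMixIn

class ThreadingServer(ThreadingMixIn, HTTPServer):
    # Daemon threads, so lingering keep-alive connections don't block exit
    daemon_threads = True

# Drop-in replacement for http.server.HTTPServer in the question's code
server = ThreadingServer(('localhost', 5678), MockS3)
With each connection handled on its own thread, S3Transfer's concurrent ranged GETs are all serviced at once, and idle keep-alive connections no longer starve the rest.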

Related

python-socketio not always emitting when tracking downloading of a file on a flask server

I'm using a flask server for RESTful web services and python-socketio to achieve bi-directional communication between the server and the client to keep track of download progress on the backend.
I take the variable sio declared in the server.py file and pass it in as a parameter to a new object, which will use it to emit messages to the client about its progress downloading a file on the server.
sio = socketio.Server(async_mode='threading')
omics_env = None

@sio.on('init', namespace='/guardiome')
def init(sid, data):
    global omics_env
    if omics_env == None:
        omics_env = Environment(socket=sio)
        omics_env.conda.download_conda()
        omics_env.data_management.download_omics_data()
The issue is that while the file is downloading, the Python server is supposed to emit a message to the client every time it has written 1 percent of the data to file, but it doesn't always emit for each percent downloaded/written.
It will usually report progress up to 18%, hold off for a while, then report 40%, skipping the emits between 18% and 40%.
Some might say the internet is probably lagging, but I put print statements right next to the emit calls in the download function, and they show that it is writing/downloading every 1 percent of the data.
I also checked online for other resources. Some mentioned using eventlet and doing something like this at the highest level of the server's code:
import eventlet
eventlet.monkey_patch()
But that doesn't lead to the code emitting at all.
Others have mentioned using a message queue like Redis, but I can't use Redis: I plan on turning the whole Python code into a binary executable so it is completely portable on the Linux platform and can communicate with a local client.
Here is my server.py
import socketio
import eventlet.wsgi
from environment import Environment
from flask import Flask, jsonify, request, send_file
from flask_cors import CORS

omics_env = None
sio = socketio.Server(async_mode='threading')
app = Flask(__name__)
CORS(app)

@sio.on('init', namespace='/guardiome')
def init(sid, data):
    global omics_env
    if omics_env == None:
        omics_env = Environment(socket=sio)
        omics_env.conda.download_conda()
        omics_env.data_management.download_omics_data()
    omics_env.logger.info('_is_ready()')
    sio.emit(
        event='init',
        data={'status': True, 'information': None},
        namespace='/guardiome')

try:
    # wrap Flask application with engineio's middleware
    app.wsgi_app = socketio.Middleware(sio, app.wsgi_app)
    # Launch the server with socket integration
    app.run(port=8008, debug=False, threaded=True)
finally:
    pass
    # LOGGER.info('Exiting ...')
Here is the download_w_progress function that I pass sio into as the reporter parameter:
import ssl
import requests

def download_w_progress(url, path, reporter=None):
    ssl._create_default_https_context = ssl._create_unverified_context
    r = requests.get(url, stream=True)
    # Helper lambda functions
    progress_report = lambda current, total: int((current/total)*100)
    raw_percent = lambda current, total: (current/total)*100
    # TODO(mak3): Write lambda function for reporting amount of file downloaded
    # in MB, KB, GB, or whatever
    with open(path, 'wb') as f:
        total_length = int(r.headers.get('content-length'))
        progress_count = 0
        chunk_size = 1024
        # Used to cut down on emitting the same rounded percentage number
        previous_percent = -1
        # Read and write the file in chunks to its destination
        for chunk in r.iter_content(chunk_size=1024):
            progress_dict = {
                "percent": progress_report(progress_count, total_length)
            }
            if reporter != None:
                # Limit the number of emits sent to prevent
                # the socket from overworking
                if progress_dict["percent"] != previous_percent:
                    reporter.emit(event="environment", namespace="/guardiome", data=progress_dict)
            # TODO(mak3): Remove or uncomment in production
            if progress_dict["percent"] != previous_percent:
                print(progress_dict["percent"], end='\r')
            progress_count += chunk_size
            previous_percent = progress_dict["percent"]
            if chunk:
                f.write(chunk)
                f.flush()
Sorry I missed this question when you posted it.
There are a couple of problems in your code. You are choosing async_mode='threading'. In general it is best to omit this argument and let the server choose the best async mode depending on the server that you are using. When you add eventlet, for example, the threading mode is not going to work; there is actually a specific async mode for eventlet.
So my recommendation would be to:
remove the async_mode argument in the socketio.Server() constructor
install eventlet in your virtual environment
replace the app.run() section in your script with code that starts the eventlet server, or, given that you are using Flask, use the Flask-SocketIO extension, which already has this code built in
add a sio.sleep(0) call inside the loop where you read your file; this will give eventlet a chance to keep all tasks running smoothly (a sketch of this setup follows the list)
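Putting those together, a minimal sketch, assuming the Flask-SocketIO extension; the namespace and event names mirror the question, and the loop is a stand-in for download_w_progress:
import eventlet
eventlet.monkey_patch()  # patch the stdlib before anything else is imported

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
# No async_mode argument: with eventlet installed it is selected automatically
sio = SocketIO(app)

@sio.on('init', namespace='/guardiome')
def init(data):
    # Stand-in for the chunked download loop in download_w_progress
    for percent in range(101):
        sio.emit('environment', {'percent': percent}, namespace='/guardiome')
        sio.sleep(0)  # yield to eventlet so each emit is flushed promptly

if __name__ == '__main__':
    sio.run(app, port=8008)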

Use TLS and Python for authentication

I want to make a little update script for a piece of software that runs on a Raspberry Pi and works like a local server. It should connect to a master server on the web to get software updates and also to verify the license of the software.
For that I set up two Python scripts. I want these to connect via a TLS socket. The client then checks the server certificate, and the server checks that the client is one of the authorized clients. I found a solution for this using twisted on this page.
Now there is a problem left. I want to know which client (depending on the certificate) is establishing the connection. Is there a way to do this in Python 3 with twisted?
I'd be happy about any answer.
In a word: yes, this is quite possible, and all the necessary stuff is ported to Python 3 - I tested all the following under Python 3.4 on my Mac and it seems to work fine.
The short answer is "use twisted.internet.ssl.Certificate.peerFromTransport", but given that a lot of set-up is required to get to the point where that is possible, I've constructed a fully working example that you should be able to try out and build upon.
For posterity, you'll first need to generate a few client certificates all signed by the same CA. You've probably already done this, but so others can understand the answer and try it out on their own (and so I could test my answer myself ;-)), they'll need some code like this:
# newcert.py
from twisted.python.filepath import FilePath
from twisted.internet.ssl import PrivateCertificate, KeyPair, DN

def getCAPrivateCert():
    privatePath = FilePath(b"ca-private-cert.pem")
    if privatePath.exists():
        return PrivateCertificate.loadPEM(privatePath.getContent())
    else:
        caKey = KeyPair.generate(size=4096)
        caCert = caKey.selfSignedCert(1, CN="the-authority")
        privatePath.setContent(caCert.dumpPEM())
        return caCert

def clientCertFor(name):
    signingCert = getCAPrivateCert()
    clientKey = KeyPair.generate(size=4096)
    csr = clientKey.requestObject(DN(CN=name), "sha1")
    clientCert = signingCert.signRequestObject(
        csr, serialNumber=1, digestAlgorithm="sha1")
    return PrivateCertificate.fromCertificateAndKeyPair(clientCert, clientKey)

if __name__ == '__main__':
    import sys
    name = sys.argv[1]
    pem = clientCertFor(name.encode("utf-8")).dumpPEM()
    FilePath(name.encode("utf-8") + b".client.private.pem").setContent(pem)
With this program, you can create a few certificates like so:
$ python newcert.py a
$ python newcert.py b
Now you should have a few files you can use:
$ ls -1 *.pem
a.client.private.pem
b.client.private.pem
ca-private-cert.pem
Then you'll want a client which uses one of these certificates, and sends some data:
# tlsclient.py
from twisted.python.filepath import FilePath
from twisted.internet.endpoints import SSL4ClientEndpoint
from twisted.internet.ssl import (
    PrivateCertificate, Certificate, optionsForClientTLS)
from twisted.internet.defer import Deferred, inlineCallbacks
from twisted.internet.task import react
from twisted.internet.protocol import Protocol, Factory

class SendAnyData(Protocol):
    def connectionMade(self):
        self.deferred = Deferred()
        self.transport.write(b"HELLO\r\n")

    def connectionLost(self, reason):
        self.deferred.callback(None)

@inlineCallbacks
def main(reactor, name):
    pem = FilePath(name.encode("utf-8") + b".client.private.pem").getContent()
    caPem = FilePath(b"ca-private-cert.pem").getContent()
    clientEndpoint = SSL4ClientEndpoint(
        reactor, u"localhost", 4321,
        optionsForClientTLS(u"the-authority", Certificate.loadPEM(caPem),
                            PrivateCertificate.loadPEM(pem)),
    )
    proto = yield clientEndpoint.connect(Factory.forProtocol(SendAnyData))
    yield proto.deferred

import sys
react(main, sys.argv[1:])
And finally, a server which can distinguish between them:
# whichclient.py
from twisted.python.filepath import FilePath
from twisted.internet.endpoints import SSL4ServerEndpoint
from twisted.internet.ssl import PrivateCertificate, Certificate
from twisted.internet.defer import Deferred
from twisted.internet.task import react
from twisted.internet.protocol import Protocol, Factory

class ReportWhichClient(Protocol):
    def dataReceived(self, data):
        peerCertificate = Certificate.peerFromTransport(self.transport)
        print(peerCertificate.getSubject().commonName.decode('utf-8'))
        self.transport.loseConnection()

def main(reactor):
    pemBytes = FilePath(b"ca-private-cert.pem").getContent()
    certificateAuthority = Certificate.loadPEM(pemBytes)
    myCertificate = PrivateCertificate.loadPEM(pemBytes)
    serverEndpoint = SSL4ServerEndpoint(
        reactor, 4321, myCertificate.options(certificateAuthority)
    )
    serverEndpoint.listen(Factory.forProtocol(ReportWhichClient))
    return Deferred()

react(main, [])
For simplicity's sake we'll just re-use the CA's own certificate for the server, but in a more realistic scenario you'd obviously want a more appropriate certificate.
You can now run whichclient.py in one window, then python tlsclient.py a; python tlsclient.py b in another window, and see whichclient.py print out a and then b respectively, identifying the clients by the commonName field in their certificate's subject.
The one caveat here is that you might initially want to put that call to Certificate.peerFromTransport into a connectionMade method; that won't work. Twisted does not presently have a callback for "TLS handshake complete"; hopefully it will eventually, but until it does, you have to wait until you've received some authenticated data from the peer to be sure the handshake has completed. For almost all applications this is fine, since by the time you have received instructions to do anything (download updates, in your case) the peer must already have sent the certificate.

How could I make asynchronous mysql operations in tornado using Python3.4?

I'm using Python 3.4 and I want to use an asynchronous MySQL client with Tornado. I found torndb, but after reading its source code I don't think it can perform asynchronous MySQL operations, because it just wraps the blocking MySQLdb package.
So is there a way to make asynchronous mysql operations in Tornado?
The canonical way to use MySQL with tornado is to use a separate set of processes to talk to MySQL and use asynchronous http requests to talk to those servers (see also answer #2 in Is Tornado really non-blocking?). These processes can be on the same machine and using tornado, or application servers somewhere else. A minimal example:
import json, sys, time

from MySQLdb import connect, cursors
from tornado import gen, httpclient, web, netutil, process, httpserver, ioloop

class BackendHandler(web.RequestHandler):
    def get(self):
        time.sleep(1)  # simulate longer query
        cur = connect(db='tornado', user='root').cursor(cursors.DictCursor)
        cur.execute("SELECT * FROM foo")
        self.write(json.dumps(list(cur.fetchall())))

class FrontendHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        http_client = httpclient.AsyncHTTPClient(max_clients=500)
        response = yield http_client.fetch("http://localhost:8001/foo")
        self.set_header("Content-Type", 'application/json')
        self.write(response.body)

if __name__ == "__main__":
    number_of_be_tasks = int(sys.argv[1]) if len(sys.argv) > 1 else 20
    number_of_fe_tasks = int(sys.argv[2]) if len(sys.argv) > 2 else 1
    fe_sockets = netutil.bind_sockets(8000)  # need to bind sockets
    be_sockets = netutil.bind_sockets(8001)  # before forking
    task_id = process.fork_processes(number_of_be_tasks + number_of_fe_tasks)
    if task_id < number_of_fe_tasks:
        handler_class = FrontendHandler
        sockets = fe_sockets
    else:
        handler_class = BackendHandler
        sockets = be_sockets
    httpserver.HTTPServer(web.Application([(r"/foo", handler_class)])
                          ).add_sockets(sockets)
    ioloop.IOLoop.instance().start()
That said, if the main thing your web server is doing is talking to MySQL directly, tornado doesn't win you much (as you'll need as many processes as you want concurrent MySQL connections). In that case a better stack might well be nginx+uwsgi+python. What tornado's really good at is talking to multiple backend servers, using HTTP, potentially in parallel.
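To illustrate that last point, here is a minimal sketch of one coroutine fanning out to two backend services in parallel; the handler name and URLs are made up:
import json
from tornado import gen, httpclient, web

class AggregateHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        client = httpclient.AsyncHTTPClient()
        # Yielding a list of futures runs both fetches concurrently
        users, orders = yield [
            client.fetch("http://localhost:8001/users"),
            client.fetch("http://localhost:8002/orders"),
        ]
        self.write({"users": json.loads(users.body.decode()),
                    "orders": json.loads(orders.body.decode())})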

Enable access control on simple HTTP server

I have the following shell script for a very simple HTTP server:
#!/bin/sh
echo "Serving at http://localhost:3000"
python -m SimpleHTTPServer 3000
I was wondering how I can enable or add a CORS header like Access-Control-Allow-Origin: * to this server?
Unfortunately, the simple HTTP server really is so simple that it does not allow any customization, especially not of the headers it sends. You can however create a simple HTTP server yourself, reusing most of SimpleHTTPRequestHandler, and just add the desired header.
For that, simply create a file simple-cors-http-server.py (or whatever) and, depending on the Python version you are using, put one of the following snippets inside.
Then you can do python simple-cors-http-server.py and it will launch your modified server which will set the CORS header for every response.
With the shebang at the top, make the file executable and put it into your PATH, and you can just run it using simple-cors-http-server.py too.
Python 3 solution
Python 3 uses SimpleHTTPRequestHandler and HTTPServer from the http.server module to run the server:
#!/usr/bin/env python3
from http.server import HTTPServer, SimpleHTTPRequestHandler, test
import sys

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header('Access-Control-Allow-Origin', '*')
        SimpleHTTPRequestHandler.end_headers(self)

if __name__ == '__main__':
    test(CORSRequestHandler, HTTPServer, port=int(sys.argv[1]) if len(sys.argv) > 1 else 8000)
Python 2 solution
Python 2 uses SimpleHTTPServer.SimpleHTTPRequestHandler and the BaseHTTPServer module to run the server.
#!/usr/bin/env python2
from SimpleHTTPServer import SimpleHTTPRequestHandler
import BaseHTTPServer

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header('Access-Control-Allow-Origin', '*')
        SimpleHTTPRequestHandler.end_headers(self)

if __name__ == '__main__':
    BaseHTTPServer.test(CORSRequestHandler, BaseHTTPServer.HTTPServer)
Python 2 & 3 solution
If you need compatibility for both Python 3 and Python 2, you could use this polyglot script that works in both versions. It first tries to import from the Python 3 locations, and otherwise falls back to Python 2:
#!/usr/bin/env python
try:
    # Python 3
    from http.server import HTTPServer, SimpleHTTPRequestHandler, test as test_orig
    import sys

    def test(*args):
        test_orig(*args, port=int(sys.argv[1]) if len(sys.argv) > 1 else 8000)
except ImportError:  # Python 2
    from BaseHTTPServer import HTTPServer, test
    from SimpleHTTPServer import SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header('Access-Control-Allow-Origin', '*')
        SimpleHTTPRequestHandler.end_headers(self)

if __name__ == '__main__':
    test(CORSRequestHandler, HTTPServer)
Try an alternative like http-server
As SimpleHTTPServer is not really the kind of server you deploy to production, I'm assuming here that you don't care that much about which tool you use, as long as it does the job of exposing your files at http://localhost:3000 with CORS headers from a simple command line:
# install (it requires nodejs/npm)
npm install http-server -g
#run
http-server -p 3000 --cors
Need HTTPS?
If you need HTTPS locally, you can also try caddy or certbot.
Edit 2022: my favorite solution is now serve, used internally by Next.js.
Just run npx serve --cors
Some related tools you might find useful
ngrok: when running ngrok http 3000, it creates a URL https://$random.ngrok.com that permits anyone to access your http://localhost:3000 server. It can expose to the world what runs locally on your computer (including local backends/APIs).
localtunnel: almost the same as ngrok.
now: when running now, it uploads your static assets online and deploys them to https://$random.now.sh. They remain online forever unless you decide otherwise. Deployment is fast (except the first one) thanks to diffing. Now is suitable for production frontend/SPA code deployment. It can also deploy Docker and NodeJS apps. It is not really free, but they have a free plan.
I had the same problem and came to this solution:
from http.server import SimpleHTTPRequestHandler  # Python 3; on Python 2, import from the SimpleHTTPServer module

class Handler(SimpleHTTPRequestHandler):
    def send_response(self, *args, **kwargs):
        SimpleHTTPRequestHandler.send_response(self, *args, **kwargs)
        self.send_header('Access-Control-Allow-Origin', '*')
I simply created a new class inheriting from SimpleHTTPRequestHandler that only changes the send_response method.
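To actually serve that handler, a minimal sketch (the port is arbitrary):
from http.server import HTTPServer

if __name__ == '__main__':
    HTTPServer(('localhost', 3000), Handler).serve_forever()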
Try this: https://github.com/zk4/livehttp. It supports CORS.
python3 -m pip install livehttp
Go to your folder and run livehttp. That's all.
http://localhost:5000
You'll need to provide your own implementation of do_GET() (and do_HEAD() if you choose to support HEAD operations), something like this:
class MyHTTPRequestHandler(SimpleHTTPRequestHandler):
    allowed_hosts = (('127.0.0.1', 80),)

    def do_GET(self):
        if self.client_address not in self.allowed_hosts:
            self.send_response(401, 'request not allowed')
        else:
            SimpleHTTPRequestHandler.do_GET(self)
My working code:
self.send_response(200)
self.send_header("Access-Control-Allow-Origin", "*")
self.end_headers()
self.wfile.write(bytes(json.dumps(answ), 'utf-8'))
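For context, lines like those belong inside a do_GET method; a self-contained sketch, with a hypothetical handler name and answ payload:
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class JSONCORSHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        answ = {'status': 'ok'}  # illustrative payload
        self.send_response(200)
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(bytes(json.dumps(answ), 'utf-8'))

if __name__ == '__main__':
    HTTPServer(('localhost', 3000), JSONCORSHandler).serve_forever()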

using twisted adbapi in ZSI soap

I'm new to Python and currently researching its viability as a SOAP server. I currently have a very rough application that uses the blocking MySQL API, but I would like to try twisted's adbapi. I've successfully used twisted adbapi in regular twisted code using reactors, but I can't seem to make it work with the code below, which uses the ZSI framework; it's not returning anything from MySQL. Has anyone ever used twisted adbapi with ZSI?
import os
import sys

from dpac_server import *
from ZSI.twisted.wsgi import (SOAPApplication,
                              soapmethod,
                              SOAPHandlerChainFactory)
from twisted.enterprise import adbapi
import MySQLdb

def _soapmethod(op):
    op_request = GED("http://www.example.org/dpac/", op).pyclass
    op_response = GED("http://www.example.org/dpac/", op + "Response").pyclass
    return soapmethod(op_request.typecode, op_response.typecode,
                      operation=op, soapaction=op)

class DPACServer(SOAPApplication):
    factory = SOAPHandlerChainFactory

    @_soapmethod('GetIPOperation')
    def soap_GetIPOperation(self, request, response, **kw):
        dbpool = adbapi.ConnectionPool("MySQLdb", '127.0.0.1', 'def_user', 'def_pwd', 'def_db', cp_reconnect=True)

        def _dbSPGeneric(txn, cmts):
            txn.execute("call def_db.getip(%s)", (cmts, ))
            return txn.fetchall()

        def dbSPGeneric(cmts):
            return dbpool.runInteraction(_dbSPGeneric, cmts)

        def returnResults(results):
            response.Result = results

        def showError(msg):
            response.Error = msg

        response.Result = ""
        response.Error = ""
        d = dbSPGeneric(request.Cmts)
        d.addCallbacks(returnResults, showError)
        return request, response

def main():
    from wsgiref.simple_server import make_server
    from ZSI.twisted.wsgi import WSGIApplication
    application = WSGIApplication()
    httpd = make_server('127.0.0.1', 8080, application)
    application['dpac'] = DPACServer()
    print "listening..."
    httpd.serve_forever()

if __name__ == '__main__':
    main()
The code you posted creates a new ConnectionPool per (some kind of) request and it never stops the pool. This means you'll eventually run out of resources and you won't be able to service any more requests. "Eventually" is probably after one or two or three requests.
If you never get any responses perhaps this isn't the problem you've encountered. It will be a problem at some point though.
On closer inspection, I wonder if this code even runs the Twisted reactor at all. On first read, I thought you were using some ZSI Twisted integration to run your server. Now I see that you're using wsgiref.simple_server. I am moderately confident that this won't work.
You're already using Twisted, use Twisted's WSGI server instead.
Beyond that, verify that ZSI executes your callbacks in the correct thread. The default for WSGI applications is to run in a non-reactor thread. Twisted APIs are not thread-safe, so if ZSI doesn't do something to correct for this, you'll have bugs introduced by using un-thread-safe APIs in threads.
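A minimal sketch of that swap, with a trivial WSGI callable standing in for the question's ZSI WSGIApplication:
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource

def application(environ, start_response):
    # Stand-in for the ZSI WSGIApplication from the question
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'listening...']

# Twisted's WSGI container runs the app in its thread pool while the
# reactor (and thus adbapi) runs in the main thread of the same process.
resource = WSGIResource(reactor, reactor.getThreadPool(), application)
reactor.listenTCP(8080, Site(resource))
reactor.run()
Note the WSGI code still runs on a thread pool, so per the caveat above, any Twisted API calls made from inside it must go through reactor.callFromThread.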
