I'm trying to "proxy" my Flask server (i will call it Server#01) with another server(Server#02). It's working well except for one thing : when the Server#01 use send_from_directory(), i don't know how to re-send this file.
My classic "proxy"
result = requests.get(my_path_to_server01)
return Response(stream_with_context(result.iter_content()),
content_type = result.headers['Content-Type'])
With a file a response, it's taking hours... So i tried many things. The one who work is :
result = requests.get(my_path_to_server01, stream=True)
with open('img.png', 'wb') as out_file:
shutil.copyfileobj(result.raw, out_file)
return send_from_directory('./', 'img.png')
I would like to "redirect" my response ("result" variable), or send/copy a stream of my file. Anyways I don't want to use a physical file because it don't seems the proper way in my mind and i can imagine all problems who can happens because of that.
There should not be any problem with your "classic" proxy other than that it should use stream=True, and specify a chunk_size for response.iter_content().
By default chunk_size is 1 byte, so the streaming will be very inefficient and consequently very slow. Trying a larger chunk size, e.g. 10K should yield faster transfers. Here's some code for the proxy.
import requests
from flask import Flask, Response, stream_with_context
app = Flask(__name__)
my_path_to_server01 = 'http://localhost:5000/'
#app.route("/")
def streamed_proxy():
r = requests.get(my_path_to_server01, stream=True)
return Response(r.iter_content(chunk_size=10*1024),
content_type=r.headers['Content-Type'])
if __name__ == "__main__":
app.run(port=1234)
You don't even need to use stream_with_context() here because you don't need access to the request context within the generator returned by iter_content().
Related
I have the following code that I am using to try to chunk response from a http.client.HTTPSConnection get request to an API (please note that the response is gzip encoded:
connection = http.client.HTTPSConnection(api, context = ssl._create_unverified_context())
connection.request('GET', api_url, headers = auth)
response = connection.getresponse()
while chunk := response.read(20):
data = gzip.decompress(chunk)
data = json.loads(chunk)
print(data)
This always gives out an error that it is not a gzipped file (b'\xe5\x9d'). Not sure how I can chunk data and still achieve what I am trying to do here. Basically, I am chunking so that I don't have to load the entire response in memory.
Please note I can't use any other libraries like requests, urllib etc.
The most probable reason for that is, the response your received is indeed not a gzipped file.
I notice that in your code, you pass a variable called auth. Typically, a server won't send you a compressed response if you don't specify in the request headers that you can accept it. If there is only auth-related keys in your headers like your variable name suggests, you won't receive a gzipped response. First, make sure you have 'Accept-Encoding': 'gzip' in your headers.
Going forward, you will face another problem:
Basically, I am chunking so that I don't have to load the entire response in memory.
gzip.decompress will expect a complete file, so you would need to reconstruct it and load it entirely in memory before doing that, which would undermine the whole point of chunking the response. Trying to decompress a part of a gzip with gzip.decompress will most likely give you an EOFError saying something like Compressed file ended before the end-of-stream marker was reached.
I don't know if you can manage that directly with the gzip library, but I know how to do it with zlib. You will also need to convert your chunk to a file-like object, you can do that with io.BytesIO. I see you have very strong constraints on libraries, but zlib and io are part of the python default, so hopefully you have them available. Here is a rework of your code that should help you going on:
import http
import ssl
import gzip
import zlib
from io import BytesIO
# your variables here
api = 'your_api_host'
api_url = 'your_api_endpoint'
auth = {'AuhtKeys': 'auth_values'}
# add the gzip header
auth['Accept-Encoding'] = 'gzip'
# prepare decompressing object
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
connection = http.client.HTTPSConnection(api, context = ssl._create_unverified_context())
connection.request('GET', api_url, headers = auth)
response = connection.getresponse()
while chunk := response.read(20):
data = decompressor.decompress(BytesIO(chunk).read())
print(data)
The problem is that gzip.decompress expects a complete file, you can't just provide a chunk to it, because the deflate algorithm relies on previous data during decompression. The whole point of the algorithm is that it's able to repeat something that it has already seen before, therefore, all data is required.
However, deflate only cares about the last 32 KiB or so. Therefore, it is possible to stream decompress such a file without needing much memory. This is not something you need to implement yourself though, Python provides the gzip.GzipFile class which can be used to wrap the file handle and behaves like a normal file:
import io
import gzip
# Create a file for testing.
# In your case you can just use the response object you get.
file_uncompressed = ""
for line_index in range(10000):
file_uncompressed += f"This is line {line_index}.\n"
file_compressed = gzip.compress(file_uncompressed.encode())
file_handle = io.BytesIO(file_compressed)
# This library does all the heavy lifting
gzip_file = gzip.GzipFile(fileobj=file_handle)
while chunk := gzip_file.read(1024):
print(chunk)
Here is a body of code that works, taken from: https://stackoverflow.com/a/18043472
It uses the requests module in python to download an image.
import requests, shutil
url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
shutil.copyfileobj(response.raw, out_file)
del response
Two questions I've been thinking about:
1) Why is it necessary to set stream=True? (I've tested it without that parameter and the image is blank) Conceptually, I don't understand what a streaming GET request is.
2) What's the difference between a raw response and a response? (Why is shutil.copyfileobj necessary, why can't I just directly write to file?)
Thanks!
Quote from documentation:
If you set stream to True when making a request, Requests cannot
release the connection back to the pool unless you consume all the
data or call Response.close.
More info here.
I am making a drop box like service using openstack. I am making a web interface using flask. User gets the object data in content of get request. I am sending data to the user iteratively. But my Flask app stops until the whole object is dowloaded. How could I make it non blocking?
#Returns the json content
r = swift_account.getObject(container_name, object_name)
filename = r.headers['X-Object-Meta-Orig-Filename']
#Make a generator so that all the content are not stored at once in memory
def generate():
for chunk in r.iter_content():
yield chunk
response = make_response(Response(stream_with_context(generate())))
response.headers['Content-Disposition'] = 'attachment; filename=' + filename
return response
Whether flask runs as blocking or non-blocking depends how you run it. When you say it blocks, how are you running it?
Unless it's being run by something with async capabilities like gunicorn with async or at least a threading model for multiple requests like apache with mod_wsgi then it won't be able to respond to more than one request at a time.
I would like to know if it is possible to enable gzip compression
for Server-Sent Events (SSE ; Content-Type: text/event-stream).
It seems it is possible, according to this book:
http://chimera.labs.oreilly.com/books/1230000000545/ch16.html
But I can't find any example of SSE with gzip compression. I tried to
send gzipped messages with the response header field
Content-Encoding set to "gzip" without success.
For experimenting around SSE, I am testing a small web application
made in Python with the bottle framework + gevent ; I am just running
the bottle WSGI server:
#bottle.get('/data_stream')
def stream_data():
bottle.response.content_type = "text/event-stream"
bottle.response.add_header("Connection", "keep-alive")
bottle.response.add_header("Cache-Control", "no-cache")
bottle.response.add_header("Content-Encoding", "gzip")
while True:
# new_data is a gevent AsyncResult object,
# .get() just returns a data string when new
# data is available
data = new_data.get()
yield zlib.compress("data: %s\n\n" % data)
#yield "data: %s\n\n" % data
The code without compression (last line, commented) and without gzip
content-encoding header field works like a charm.
EDIT: thanks to the reply and to this other question: Python: Creating a streaming gzip'd file-like?, I managed to solve the problem:
#bottle.route("/stream")
def stream_data():
compressed_stream = zlib.compressobj()
bottle.response.content_type = "text/event-stream"
bottle.response.add_header("Connection", "keep-alive")
bottle.response.add_header("Cache-Control", "no-cache, must-revalidate")
bottle.response.add_header("Content-Encoding", "deflate")
bottle.response.add_header("Transfer-Encoding", "chunked")
while True:
data = new_data.get()
yield compressed_stream.compress("data: %s\n\n" % data)
yield compressed_stream.flush(zlib.Z_SYNC_FLUSH)
TL;DR: If the requests are not cached, you likely want to use zlib and declare Content-Encoding to be 'deflate'. That change alone should make your code work.
If you declare Content-Encoding to be gzip, you need to actually use gzip. They are based on the the same compression algorithm, but gzip has some extra framing. This works, for example:
import gzip
import StringIO
from bottle import response, route
#route('/')
def get_data():
response.add_header("Content-Encoding", "gzip")
s = StringIO.StringIO()
with gzip.GzipFile(fileobj=s, mode='w') as f:
f.write('Hello World')
return s.getvalue()
That only really makes sense if you use an actual file as a cache, though.
There's also middleware you can use so you don't need to worry about gzipping responses for each of your methods. Here's one I used recently.
https://code.google.com/p/ibkon-wsgi-gzip-middleware/
This is how I used it (I'm using bottle.py with the gevent server)
from gzip_middleware import Gzipper
import bottle
app = Gzipper(bottle.app())
run(app = app, host='0.0.0.0', port=8080, server='gevent')
For this particular library, you can set w/c types of responses you want to compress by modifying the DEFAULT_COMPRESSABLES variable for example
DEFAULT_COMPRESSABLES = set(['text/plain', 'text/html', 'text/css',
'application/json', 'application/x-javascript', 'text/xml',
'application/xml', 'application/xml+rss', 'text/javascript',
'image/gif'])
All responses go through the middleware and get gzipped without modifying your existing code. By default, it compresses responses whose content-type belongs to DEFAULT_COMPRESSABLES and whose content-length is greater than 200 characters.
I'm using Python Flask + nginx with FCGI.
On some requests, I have to output large responses. Usually those responses are fetched from a socket. Currently I'm doing the response like this:
response = []
while True:
recv = s.recv(1024)
if not recv: break
response.append(recv)
s.close()
response = ''.join(response)
return flask.make_response(response, 200, {
'Content-type': 'binary/octet-stream',
'Content-length': len(response),
'Content-transfer-encoding': 'binary',
})
The problem is I actually do not need the data. I also have a way to determine the exact response length to be fetched from the socket. So I need a good way to send the HTTP headers, then start outputing directly from the socket, instead of collecting it in memory and then supplying to nginx (probably by some sort of a stream).
I was unable to find the solution to this seemingly common issue. How would that be achieved?
Thank you!
if response in flask.make_response is an iterable, it will be iterated over to produce the response, and each string is written to the output stream on it's own.
what this means is that you can also return a generator which will yield the output when iterated over. if you know the content length, then you can (and should) pass it as header.
a simple example:
from flask import Flask
app = Flask(__name__)
import sys
import time
import flask
#app.route('/')
def generated_response_example():
n = 20
def response_generator():
for i in range(n):
print >>sys.stderr, i
yield "%03d\n" % i
time.sleep(.2)
print >>sys.stderr, "returning generator..."
gen = response_generator()
# the call to flask.make_response is not really needed as it happens imlicitly
# if you return a tuple.
return flask.make_response(gen ,"200 OK", {'Content-length': 4*n})
if __name__ == '__main__':
app.run()
if you run this and try it in a browser, you should see a nice incemental count...
(the content type is not set because it seems if i do that my browser waits until the whole content has been streamed before rendering the page. wget -qO - localhost:5000 doesn't have this problems.