Python Flask + nginx fcgi - output large response? - python

I'm using Python Flask + nginx with FCGI.
On some requests, I have to output large responses. Usually those responses are fetched from a socket. Currently I'm doing the response like this:
response = []
while True:
recv = s.recv(1024)
if not recv: break
response.append(recv)
s.close()
response = ''.join(response)
return flask.make_response(response, 200, {
'Content-type': 'binary/octet-stream',
'Content-length': len(response),
'Content-transfer-encoding': 'binary',
})
The problem is I actually do not need the data. I also have a way to determine the exact response length to be fetched from the socket. So I need a good way to send the HTTP headers, then start outputing directly from the socket, instead of collecting it in memory and then supplying to nginx (probably by some sort of a stream).
I was unable to find the solution to this seemingly common issue. How would that be achieved?
Thank you!

if response in flask.make_response is an iterable, it will be iterated over to produce the response, and each string is written to the output stream on it's own.
what this means is that you can also return a generator which will yield the output when iterated over. if you know the content length, then you can (and should) pass it as header.
a simple example:
from flask import Flask
app = Flask(__name__)
import sys
import time
import flask
#app.route('/')
def generated_response_example():
n = 20
def response_generator():
for i in range(n):
print >>sys.stderr, i
yield "%03d\n" % i
time.sleep(.2)
print >>sys.stderr, "returning generator..."
gen = response_generator()
# the call to flask.make_response is not really needed as it happens imlicitly
# if you return a tuple.
return flask.make_response(gen ,"200 OK", {'Content-length': 4*n})
if __name__ == '__main__':
app.run()
if you run this and try it in a browser, you should see a nice incemental count...
(the content type is not set because it seems if i do that my browser waits until the whole content has been streamed before rendering the page. wget -qO - localhost:5000 doesn't have this problems.

Related

How to make a request inside a simple mitmproxy script?

Good day,
I am currently trying to figure out a way to make non blocking requests inside a simple script of mitmproxy, but the documentation doesn't seem to be clear for me for the first look.
I think it's probably the easiest if I show my current code and describe my issue below:
from copy import copy
from mitmproxy import http
def request(flow: http.HTTPFlow):
headers = copy(flow.request.headers)
headers.update({"Authorization": "<removed>", "Requested-URI": flow.request.pretty_url})
req = http.HTTPRequest(
first_line_format="origin_form",
scheme=flow.request.scheme,
port=443,
path="/",
http_version=flow.request.http_version,
content=flow.request.content,
host="my.api.xyz",
headers=headers,
method=flow.request.method
)
print(req.get_text())
flow.response = http.HTTPResponse.make(
200, req.content,
)
Basically I would like to intercept any HTTP(S) request done and make a non blocking request to an API endpoint at https://my.api.xyz/ which should take all original headers and return a png screenshot of the originally requested URL.
However the code above produces an empty content and the print returns nothing either.
My issue seems to be related to: mtmproxy http get request in script and Resubmitting a request from a response in mitmproxy but I still couldn't figure out a proper way of sending requests inside mitmproxy.
The following piece of code probably does what you are looking for:
from copy import copy
from mitmproxy import http
from mitmproxy import ctx
from mitmproxy.addons import clientplayback
def request(flow: http.HTTPFlow):
ctx.log.info("Inside request")
if hasattr(flow.request, 'is_custom'):
return
headers = copy(flow.request.headers)
headers.update({"Authorization": "<removed>", "Requested-URI": flow.request.pretty_url})
req = http.HTTPRequest(
first_line_format="origin_form",
scheme='http',
port=8000,
path="/",
http_version=flow.request.http_version,
content=flow.request.content,
host="localhost",
headers=headers,
method=flow.request.method
)
req.is_custom = True
playback = ctx.master.addons.get('clientplayback')
f = flow.copy()
f.request = req
playback.start_replay([f])
It uses the clientplayback addon in order to send out the request. When this new request is sent, that will generate another request event which will then be an infinite loop. That is the reason for the is_custom attribute I added to the request there. If the request that generated this event is the one that we have created, then we don't want to create a new request from it.

Flask send stream as response

I'm trying to "proxy" my Flask server (i will call it Server#01) with another server(Server#02). It's working well except for one thing : when the Server#01 use send_from_directory(), i don't know how to re-send this file.
My classic "proxy"
result = requests.get(my_path_to_server01)
return Response(stream_with_context(result.iter_content()),
content_type = result.headers['Content-Type'])
With a file a response, it's taking hours... So i tried many things. The one who work is :
result = requests.get(my_path_to_server01, stream=True)
with open('img.png', 'wb') as out_file:
shutil.copyfileobj(result.raw, out_file)
return send_from_directory('./', 'img.png')
I would like to "redirect" my response ("result" variable), or send/copy a stream of my file. Anyways I don't want to use a physical file because it don't seems the proper way in my mind and i can imagine all problems who can happens because of that.
There should not be any problem with your "classic" proxy other than that it should use stream=True, and specify a chunk_size for response.iter_content().
By default chunk_size is 1 byte, so the streaming will be very inefficient and consequently very slow. Trying a larger chunk size, e.g. 10K should yield faster transfers. Here's some code for the proxy.
import requests
from flask import Flask, Response, stream_with_context
app = Flask(__name__)
my_path_to_server01 = 'http://localhost:5000/'
#app.route("/")
def streamed_proxy():
r = requests.get(my_path_to_server01, stream=True)
return Response(r.iter_content(chunk_size=10*1024),
content_type=r.headers['Content-Type'])
if __name__ == "__main__":
app.run(port=1234)
You don't even need to use stream_with_context() here because you don't need access to the request context within the generator returned by iter_content().

Non blocking download of file in flask?

I am making a drop box like service using openstack. I am making a web interface using flask. User gets the object data in content of get request. I am sending data to the user iteratively. But my Flask app stops until the whole object is dowloaded. How could I make it non blocking?
#Returns the json content
r = swift_account.getObject(container_name, object_name)
filename = r.headers['X-Object-Meta-Orig-Filename']
#Make a generator so that all the content are not stored at once in memory
def generate():
for chunk in r.iter_content():
yield chunk
response = make_response(Response(stream_with_context(generate())))
response.headers['Content-Disposition'] = 'attachment; filename=' + filename
return response
Whether flask runs as blocking or non-blocking depends how you run it. When you say it blocks, how are you running it?
Unless it's being run by something with async capabilities like gunicorn with async or at least a threading model for multiple requests like apache with mod_wsgi then it won't be able to respond to more than one request at a time.

Is it possible to use gzip compression with Server-Sent Events (SSE)?

I would like to know if it is possible to enable gzip compression
for Server-Sent Events (SSE ; Content-Type: text/event-stream).
It seems it is possible, according to this book:
http://chimera.labs.oreilly.com/books/1230000000545/ch16.html
But I can't find any example of SSE with gzip compression. I tried to
send gzipped messages with the response header field
Content-Encoding set to "gzip" without success.
For experimenting around SSE, I am testing a small web application
made in Python with the bottle framework + gevent ; I am just running
the bottle WSGI server:
#bottle.get('/data_stream')
def stream_data():
bottle.response.content_type = "text/event-stream"
bottle.response.add_header("Connection", "keep-alive")
bottle.response.add_header("Cache-Control", "no-cache")
bottle.response.add_header("Content-Encoding", "gzip")
while True:
# new_data is a gevent AsyncResult object,
# .get() just returns a data string when new
# data is available
data = new_data.get()
yield zlib.compress("data: %s\n\n" % data)
#yield "data: %s\n\n" % data
The code without compression (last line, commented) and without gzip
content-encoding header field works like a charm.
EDIT: thanks to the reply and to this other question: Python: Creating a streaming gzip'd file-like?, I managed to solve the problem:
#bottle.route("/stream")
def stream_data():
compressed_stream = zlib.compressobj()
bottle.response.content_type = "text/event-stream"
bottle.response.add_header("Connection", "keep-alive")
bottle.response.add_header("Cache-Control", "no-cache, must-revalidate")
bottle.response.add_header("Content-Encoding", "deflate")
bottle.response.add_header("Transfer-Encoding", "chunked")
while True:
data = new_data.get()
yield compressed_stream.compress("data: %s\n\n" % data)
yield compressed_stream.flush(zlib.Z_SYNC_FLUSH)
TL;DR: If the requests are not cached, you likely want to use zlib and declare Content-Encoding to be 'deflate'. That change alone should make your code work.
If you declare Content-Encoding to be gzip, you need to actually use gzip. They are based on the the same compression algorithm, but gzip has some extra framing. This works, for example:
import gzip
import StringIO
from bottle import response, route
#route('/')
def get_data():
response.add_header("Content-Encoding", "gzip")
s = StringIO.StringIO()
with gzip.GzipFile(fileobj=s, mode='w') as f:
f.write('Hello World')
return s.getvalue()
That only really makes sense if you use an actual file as a cache, though.
There's also middleware you can use so you don't need to worry about gzipping responses for each of your methods. Here's one I used recently.
https://code.google.com/p/ibkon-wsgi-gzip-middleware/
This is how I used it (I'm using bottle.py with the gevent server)
from gzip_middleware import Gzipper
import bottle
app = Gzipper(bottle.app())
run(app = app, host='0.0.0.0', port=8080, server='gevent')
For this particular library, you can set w/c types of responses you want to compress by modifying the DEFAULT_COMPRESSABLES variable for example
DEFAULT_COMPRESSABLES = set(['text/plain', 'text/html', 'text/css',
'application/json', 'application/x-javascript', 'text/xml',
'application/xml', 'application/xml+rss', 'text/javascript',
'image/gif'])
All responses go through the middleware and get gzipped without modifying your existing code. By default, it compresses responses whose content-type belongs to DEFAULT_COMPRESSABLES and whose content-length is greater than 200 characters.

detect if a web page is changed

In my python application I have to read many web pages to collect data. To decrease the http calls I would like to fetch only changed pages. My problem is that my code always tells me that the pages have been changed (code 200) but in reality it is not.
This is my code:
from models import mytab
import re
import urllib2
from wsgiref.handlers import format_date_time
from datetime import datetime
from time import mktime
def url_change():
urls = mytab.objects.all()
# this is some urls:
# http://www.venere.com/it/pensioni/venezia/pensione-palazzo-guardi/#reviews
# http://www.zoover.it/italia/sardegna/cala-gonone/san-francisco/hotel
# http://www.orbitz.com/hotel/Italy/Venice/Palazzo_Guardi.h161844/#reviews
# http://it.hotels.com/ho292636/casa-del-miele-susegana-italia/
# http://www.expedia.it/Venezia-Hotel-Palazzo-Guardi.h1040663.Hotel-Information#reviews
# ...
for url in urls:
request = urllib2.Request(url.url)
if url.last_date == None:
now = datetime.now()
stamp = mktime(now.timetuple())
url.last_date = format_date_time(stamp)
url.save()
request.add_header("If-Modified-Since", url.last_date)
try:
response = urllib2.urlopen(request) # Make the request
# some actions
now = datetime.now()
stamp = mktime(now.timetuple())
url.last_date = format_date_time(stamp)
url.save()
except urllib2.HTTPError, err:
if err.code == 304:
print "nothing...."
else:
print "Error code:", err.code
pass
I do not understand what has gone wrong. Can anyone help me?
Web servers aren't required to send a 304 header as the response when you send an 'If-Modified-Since' header. They're free to send a HTTP 200 and send the entire page again.
Sending a 'If-Modified-Since' or 'If-None-Since' alerts the server that you'd like a cached response if available. It's like sending an 'Accept-Encoding: gzip, deflate' header -- you're just telling the server you'll accept something, not requiring it.
A good way to check if a site returns 304 is to use google chromes dev tools. E.g. below is an annotated example of using chrome on the bls website. Keep refreshing and you will see that the server keeps returning 304. If you force refresh with Ctrl+F5 (windows), you will see that instead it returns status code 200.
You can use this technique on your example to find out if the server does not return 304, or if you have incorrectly formatted your request headers somehow. Sometimes a webpage has a resource imported on to it which does not respect the If- headers and so it returns 200 whatever you do (If any resource on the page does not return 304, the whole page will return 200), but sometimes you are only looking at a specific part of a website and you can cheat by loading the resource directly and bypassing the whole document.

Categories

Resources