How to compress a JSON payload from Django Rest API [duplicate]


I was wondering: would it be possible to compress the response payload in Django REST?
At the moment, the response payloads are plain JSON data. However, there's quite a lot of data to bounce back and forth so I was wondering if compressing the data would help with the bandwidth issues.

HTTP response compression will most likely not be handled by Django but by your HTTP server using the gzip or deflate algorithms.
You just need to make sure your HTTP server is configured to compress HTTP responses whose Content-Type header is set to application/json.
How to enable gzip compression for nginx: https://rtcamp.com/tutorials/nginx/enable-gzip/

The following worked for me.
I actually turned gzip on at the nginx level, not within Django or Django Rest Framework.
/etc/nginx/nginx.conf file:
http {
    #... other settings ...#
    ##
    # Gzip Settings
    ##
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
}
This leaves the compression to the nginx server, and since modern browsers automatically decompress gzip-encoded responses, I didn't need to change anything on the client side, even when receiving JSON data inside an Angular SPA.
My 1.3 MB JSON payload turned into a payload of about 180 KB.
It's a quick way to save megabytes of data.
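To get a feel for why a JSON payload shrinks this much, here is a small sketch using Python's standard gzip module. The payload is hypothetical (many similar records, as in a typical list endpoint); real ratios depend on how repetitive your data is. nginx performs the equivalent compression on the server, and the browser performs the decompression step transparently.

```python
import gzip
import json

def gzip_json(obj):
    """Serialize obj to JSON and gzip it, returning (raw_bytes, gzipped_bytes)."""
    raw = json.dumps(obj).encode("utf-8")
    # compresslevel=6 corresponds to the gzip_comp_level 6 used in the nginx config
    compressed = gzip.compress(raw, compresslevel=6)
    return raw, compressed

# Hypothetical payload: 10,000 similar records, like a typical list response
records = [
    {"id": i, "name": f"item-{i}", "active": True, "tags": ["a", "b"]}
    for i in range(10_000)
]
raw, compressed = gzip_json(records)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({len(compressed) / len(raw):.1%})")

# This is what the browser does automatically when it sees Content-Encoding: gzip
assert gzip.decompress(compressed) == raw
```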

If you are using the Django/DRF built-in development server (runserver) rather than Apache or nginx, note that it runs its own WSGI server, so those methods won't work for you.
However, Django does have a built-in gzip middleware which you should be able to use, as described in these answers:
https://stackoverflow.com/a/1864377/2540707
https://stackoverflow.com/a/14821684/2540707
That being said, for production use you should be using a real web server rather than Django's built-in one.
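For reference, enabling Django's built-in gzip middleware is a settings change. This is a sketch assuming Django 1.10 or later, which uses the MIDDLEWARE setting (older versions use MIDDLEWARE_CLASSES). The entries other than GZipMiddleware are just the usual defaults of a newly generated project and may differ in yours.

```python
# settings.py (fragment)
MIDDLEWARE = [
    # Listed first so that it wraps the others and compresses the final response body
    "django.middleware.gzip.GZipMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]
```

Django's documentation recommends placing GZipMiddleware before any middleware that reads or writes the response body, so that compression happens last. It leaves a response uncompressed if the body is shorter than 200 bytes, if the response already has a Content-Encoding header, or if the request's Accept-Encoding header does not include gzip.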

Related

uwsgi/nginx configuration for chunked response

I have two endpoints like below:
GET on /api/v1/foo
POST on /api/v1/foo
I need the POST endpoint to send back chunked responses using HTTP/1.1 chunked transfer encoding; however, the GET endpoint should send plain JSON.
My setup is nginx -> uwsgi -> flask.
I see some of my chunks currently getting truncated at a hex size of 1000 (4096 bytes), which is not the size my Flask layer sent them with. Probably I'm missing some nginx or uWSGI configuration.
uwsgi configuration (uwsgi.ini):
[uwsgi]
route = ^/api/v1/foo$ goto:dochunked
route-run = last:
route-label = dochunked
route-if = equal:${REQUEST_METHOD};POST goto:dopostchunked
route-run = last:
route-label = dopostchunked
route-run = chunked:
nginx configuration:
location / {
    uwsgi_pass unix:var/uwsgi.sock;
    uwsgi_read_timeout 600;
    include uwsgi_params;
}
location /api/v1/foo {
    uwsgi_pass unix:var/uwsgi.sock;
    uwsgi_read_timeout 600;
    include uwsgi_params;
    if ($request_method = "POST") {
        set $chunked_transfer_encoding on;
        add_header X-Accel-Buffering no;
    }
}
curl response headers
HTTP/1.1 200 OK
Server: nginx/1.10.1
Date: Wed, 03 Jan 2018 00:06:50 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-Accel-Buffering: no

Chunking is about Transfer-Encoding, while plain JSON is about Content-Type; the two things are not related.
Transfer encoding only concerns how the two HTTP/1.1 endpoints (the server and the client) transmit the message, and gzip compression can also be applied at that level. Using chunked transmission avoids the need for a Content-Length header and allows the response to be sent in multiple chunks. But on the receiving side, once the response is received the chunks are reassembled, and you should not see any difference between a response sent with Content-Length and the whole body in one piece, and a body sent in multiple chunks.
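The framing described above can be sketched in a few lines of Python. This is only an illustration of the wire format (each chunk is a hexadecimal size line, the data, and a CRLF, ending with a zero-size chunk), not how nginx or uWSGI implement it; encode_chunked and decode_chunked are hypothetical helper names. The hexadecimal size line is also why the truncated chunks in the question show a size of 1000: 0x1000 is 4096 bytes.

```python
def encode_chunked(pieces):
    """Frame each piece as an HTTP/1.1 chunk and append the final zero-size chunk."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty chunk would be read as the end of the body, so skip it
            out += b"%X\r\n" % len(piece)  # chunk size in hexadecimal
            out += piece + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk marker, no trailers
    return bytes(out)

def decode_chunked(data):
    """Reassemble a chunked body (chunk extensions are ignored, trailers unsupported)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF

payload = b'{"key": "value"}'
one_chunk = encode_chunked([payload])
three_chunks = encode_chunked([payload[:5], payload[5:9], payload[9:]])
assert one_chunk != three_chunks  # different framing on the wire...
assert decode_chunked(one_chunk) == decode_chunked(three_chunks) == payload  # ...same body
assert int("1000", 16) == 4096  # a chunk-size line of "1000" means 4096 bytes
```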
I say "should" because you may run into problems with buggy HTTP/1.1 libraries that do not wait for the end of the message (the last-chunk marker) before firing something like a response-received event in the application language.
Whether chunks are used is usually the responsibility of the HTTP server, and you have little control over it, because chunked support is a required feature of HTTP/1.1. Depending on the size of the response body and the sizes of the buffers used by the HTTP server, you may see differences in the way chunks are made. If there are multiple actors in the chain (like Flask, uWSGI and nginx here), each actor can decide to reorganize the chunks, merge some of them (buffering), or not.
But as I said, you should not need to care about it, unless the client side of your application has bugs with chunked encoding, which would mean that it doesn't properly implement HTTP/1.1.
Finally, if you really need to avoid chunks (but you shouldn't), I see three options:
You could force an HTTP/1.0 response, since there are no chunks in HTTP/1.0. But that's a very old version of the protocol. To do that you'll have to ask for HTTP/1.0 on the request side; you'll get an HTTP/1.1 response from nginx, but without the advanced features of HTTP/1.1 (like chunks).
You could use the nginx chunked_transfer_encoding directive. It is on by default, so it is normally used to turn chunking off for a specific location (chunked_transfer_encoding off;). Your current configuration only sets a variable named $chunked_transfer_encoding, which does nothing. This option was made specifically for bad HTTP clients, as the documentation states:
It may come in handy when using a software failing to support chunked encoding despite the standard’s requirement.
You could also try proxy_buffering off (for a location that uses uwsgi_pass, the corresponding directive is uwsgi_buffering off); that may work, but I'm unsure.

Python Flask CORS - API always allows any origin

I've looked through many SO answers, and can't seem to find this issue. I have a feeling that I'm just missing something obvious.
I have a basic Flask API, and I've implemented both the flask_cors extension and the custom Flask crossdomain decorator from Armin Ronacher (http://flask.pocoo.org/snippets/56/). Both show the same issue.
This is my example app:
from flask import Flask, jsonify, request
from flask_cors import CORS, cross_origin
application = Flask(__name__,
                    static_url_path='',
                    static_folder='static')
CORS(application)
application.config['CORS_HEADERS'] = 'Content-Type'
@application.route('/api/v1.0/example')
@cross_origin(origins=['http://example.com'])
# @crossdomain(origin='http://example.com')
def api_example():
    print(request.headers)
    response = jsonify({'key': 'value'})
    print(response.headers)
    return response
(EDIT 3 inserted):
When I make a GET request to that endpoint from JS in a browser (from 127.0.0.1), it always returns 200, when I would expect to see:
Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://127.0.0.1:5000' is therefore not allowed access. The response had HTTP status code 403.
CURL:
ACCT:ENVIRON user$ curl -i http://127.0.0.1:5000/api/v1.0/example
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 20
Access-Control-Allow-Origin: http://example.com
Server: Werkzeug/0.11.4 Python/2.7.11
Date: [datetime]
{
"key": "value"
}
LOG:
Content-Length:
User-Agent: curl/7.54.0
Host: 127.0.0.1:5000
Accept: */*
Content-Type:
Content-Type: application/json
Content-Length: 20
127.0.0.1 - - [datetime] "GET /api/v1.0/example HTTP/1.1" 200 -
I'm not even seeing all of the proper headers in the response, and it doesn't seem to care what the origin is in the request.
Any ideas what I'm missing? Thanks!
EDIT:
As a side note, looking at the documentation example here (https://flask-cors.readthedocs.io/en/v1.7.4/#a-more-complicated-example), it shows:
@app.route("/")
def helloWorld():
    '''
    Since the path '/' does not match the regular expression r'/api/*',
    this route does not have CORS headers set.
    '''
    return '''This view is not exposed over CORS.'''
...which is rather interesting since I already have the root path (and others) exposed without any CORS decoration, and they are working fine from any origin. So it seems that there is something fundamentally wrong with this setup.
Along those lines, the tutorial here (https://blog.miguelgrinberg.com/post/designing-a-restful-api-with-python-and-flask) seems to indicate that Flask APIs are naturally exposed without protection (I assume that's just because the CORS extension hasn't been applied), but my application behaves as if the CORS extension doesn't exist at all (other than a few notes in the log that you can see).
EDIT 2:
My comments were unclear, so I created three example endpoints on AWS API Gateway with different CORS settings. They are GET method endpoints that simply return "success":
1) CORS not enabled (default):
Endpoint: https://t9is0yupn4.execute-api.us-east-1.amazonaws.com/prod/cors-default
Response:
XMLHttpRequest cannot load https://t9is0yupn4.execute-api.us-east-1.amazonaws.com/prod/cors-default. Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://127.0.0.1:5000' is therefore not allowed access. The response had HTTP status code 403.
2) CORS enabled - Origin Restricted:
Access-Control-Allow-Headers: 'Content-Type'
Access-Control-Allow-Origin: 'http://example.com'
Endpoint: https://t9is0yupn4.execute-api.us-east-1.amazonaws.com/prod/cors-enabled-example
Response:
XMLHttpRequest cannot load https://t9is0yupn4.execute-api.us-east-1.amazonaws.com/prod/cors-enabled-example. Response to preflight request doesn't pass access control check: The 'Access-Control-Allow-Origin' header has a value 'http://example.com' that is not equal to the supplied origin. Origin 'http://127.0.0.1:5000' is therefore not allowed access.
3) CORS enabled - Origin Wildcard:
Access-Control-Allow-Headers: 'Content-Type'
Access-Control-Allow-Origin: '*'
Endpoint: https://t9is0yupn4.execute-api.us-east-1.amazonaws.com/prod/cors-enabled-wildcard
Response:
"success"
I'm not that experienced with infrastructure, but my expectation was that enabling the Flask CORS extension would make my API endpoints mimic this behavior, depending on the value I set for origins=. What am I missing in this Flask setup?
SOLUTION EDIT:
Alright, so given that something on my end was obviously not normal, I stripped down my app and re-implemented some very basic APIs for each variation of CORS origin restriction. I've been using AWS Elastic Beanstalk to host the test environment, so I re-uploaded those examples and ran a JS ajax request against each. It's now working.
I'm getting the Access-Control-Allow-Origin error on naked endpoints. It appears that when I configured the app for deployment I was uncommenting CORS(application, resources=r'/api/*'), which was obviously allowing all origins for the naked endpoints!
I'm not sure why my route with a specific restriction (origins=[]) was also allowing everything, but that must have been some type of typo or something small, because it's working now.
A special thanks to sideshowbarker for all the help!

From your question as-is, it’s not completely clear what behavior you’re expecting. But as far as how the CORS protocol works, it seems like your server is already behaving as expected.
Specifically, the curl response cited in the question shows this response header:
Access-Control-Allow-Origin: http://example.com
That indicates a server already configured to tell browsers: "Only allow cross-origin requests from frontend JavaScript code running in browsers if the code is running at the origin http://example.com."
If the behavior you’re expecting is that the server will now refuse requests from non-browser clients such as curl, then CORS configuration on its own isn’t going to cause a server to do that.
The only thing a server does differently when you configure it with CORS support is just to send the Access-Control-Allow-Origin response header and other CORS response headers. That’s it.
Actual enforcement of CORS restrictions is done only by browsers, not by servers.
So no matter what server-side CORS configuration you make, the server still goes on accepting requests from all clients and origins it would otherwise; in other words, all clients from all origins still keep on getting responses from the server just as they would otherwise.
But browsers will only expose responses from cross-origin requests to frontend JavaScript code running at a particular origin if the server the request was sent to opts in to permitting the request by responding with an Access-Control-Allow-Origin header that allows that origin.
That’s the only thing you can do using CORS configuration. You can’t make a server only accept and respond to requests from particular origins just by doing any server-side CORS configuration. To do that, you need to use something other than just CORS configuration.
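A toy model can make this concrete: the decision is made in the browser, after the server has already sent its response. The function below is a simplified, hypothetical model (it ignores preflight requests, credentials and the other CORS headers), not a browser's actual algorithm; the three calls use the Access-Control-Allow-Origin values from the question's API Gateway examples. curl never runs any such check, which is why it received the full 200 response.

```python
from typing import Optional

def browser_exposes_response(page_origin: str, allow_origin: Optional[str]) -> bool:
    """Simplified model of the browser-side CORS check on a cross-origin response.

    Only compares Access-Control-Allow-Origin with the page's origin; credentials,
    preflight requests and the other CORS headers are ignored. The server has
    already sent the response by the time this check runs.
    """
    if allow_origin is None:
        return False  # no CORS header: the response is hidden from the page's JavaScript
    allow_origin = allow_origin.strip()
    return allow_origin == "*" or allow_origin == page_origin

page = "http://127.0.0.1:5000"
print(browser_exposes_response(page, None))                  # False, like cors-default
print(browser_exposes_response(page, "http://example.com"))  # False, like cors-enabled-example
print(browser_exposes_response(page, "*"))                   # True, like cors-enabled-wildcard
```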

How to tell the HTTP server to not send chunked encoding

I am currently writing an HTTP client that does an HTTP POST to a URL and reads the HTTP response.
However, for error responses (status codes 400 and 500) the server sends back a non-chunked response, while for success responses (201) it sends a chunked response.
In the request I am setting the Content-Length, so I am not sure why the server still sends us chunked transfer encoding. Is there any other header I can set in the request that will tell the HTTP server not to use chunked encoding?
headerList = []
headerList.append("POST /v2/charges HTTP/1.1")
headerList.append("Content-Type: application/json")
headerList.append("host: xxxxxxxxx")
headerList.append("request-id: ABCD001123")
# Content-Length must describe the body actually sent (qbPosMsg);
# for non-ASCII text, use the length of the encoded bytes
headerList.append("Content-length: %d" % len(qbPosMsg))
hostReqHeader = "\r\n".join(headerList)
reqData = hostReqHeader + '\r\n\r\n' + qbPosMsg
I am using sockets to send these HTTP messages, and not using httplib or requests library.

Chunked transfer encoding is a required feature of HTTP/1.1. If you do not need any other HTTP/1.1-specific features, specify HTTP/1.0 in your request:
headerList.append("POST /v2/charges HTTP/1.0")
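Building on that, here is a sketch of a complete raw-socket HTTP/1.0 POST in the spirit of the question's code. The helper names (build_http10_post, split_response, post_http10) are hypothetical. It handles plain HTTP only; HTTPS would additionally need the socket wrapped with the ssl module. Because the request says HTTP/1.0, the server must not answer with Transfer-Encoding: chunked, and by default it closes the connection after the response, so the client can simply read until EOF.

```python
import socket

def build_http10_post(path, host_header, body, content_type="application/json"):
    """Build a raw HTTP/1.0 POST request as bytes."""
    head = "\r\n".join([
        f"POST {path} HTTP/1.0",  # HTTP/1.0: the server must not reply with chunked encoding
        f"Host: {host_header}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",  # length in bytes of the body actually sent
    ])
    return head.encode("ascii") + b"\r\n\r\n" + body

def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

def post_http10(host, port, path, body, timeout=10.0):
    """Send the request over a plain TCP socket and read until the server closes it."""
    host_header = host if port == 80 else f"{host}:{port}"
    request = build_http10_post(path, host_header, body)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        received = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0 without keep-alive: the server closes after the response
                break
            received.append(data)
    return split_response(b"".join(received))
```

Usage would look like status, headers, body = post_http10("your-host", 80, "/v2/charges", json_bytes), where the host and body are placeholders.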

The Content-Length header you are specifying in your request applies to the request, not the server's response.
Chunked transfer is only used by the HTTP/1.1 server in a response when the client specifies HTTP/1.1 as the protocol. If you want to disable chunked transfer completely, you specify HTTP/1.0 as the protocol in your request.
Alternatively, use an HTTP client library that supports chunked transfer; any library that supports HTTP/1.1 will, because in any HTTP/1.1 conversation the server is free to choose whether to use chunked transfer for any response.
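As an illustration of the library route, Python's standard http.client speaks HTTP/1.1 and decodes chunked bodies transparently, so read() returns the whole body either way. The sketch is self-contained: ChunkedHandler is a hypothetical local stand-in for the real server, and the /v2/charges path is borrowed from the question.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class ChunkedHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that always answers POST with a chunked 201 response."""
    protocol_version = "HTTP/1.1"  # chunked transfer encoding only exists in HTTP/1.1

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # consume the request body
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for piece in (b'{"status": ', b'"created"}'):
            self.wfile.write(b"%X\r\n%s\r\n" % (len(piece), piece))
        self.wfile.write(b"0\r\n\r\n")  # last-chunk marker

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def post_json(host, port, path, body):
    """POST with http.client, which reassembles chunked bodies in read()."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", path, body=body, headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Transfer-Encoding"), resp.read()
    finally:
        conn.close()

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), ChunkedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(post_json("127.0.0.1", server.server_address[1], "/v2/charges", b"{}"))
    server.shutdown()
    server.server_close()
```

The client code never parses chunk sizes itself; the same post_json call works unchanged whether the server answers with Content-Length or with chunks.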
HTTP/1.1 (and indeed HTTP/2) servers still support HTTP/1.0, and HTTP 1.0 remains quite useful for the ability to write simple clients with a few lines of code, that can still query modern web servers (albeit, with the need to TLS-wrap for HTTPS support). So it's quite appropriate in this situation. I think that is the beauty of HTTP, that the basic protocol is so simple. HTTP 1.1 and HTTP 2.0 add progressively more complexity in terms of being able to write a client that supports them, but all that complexity is optional - HTTP 1.0 can still be used.

Django Response always Chunked with text/html cannot set Content-Length

In my Django application's views.py, I return an HttpResponse object after attempting to set the following HTTP header fields:
from django.http import HttpResponse
# Create a Response Object with the content to return
# (content_type= replaces the old mimetype= argument, removed in Django 1.7)
response = HttpResponse("%s" % (output_display), content_type='text/html')
response['Cache-Control'] = 'must-revalidate, max-age=20'
response['Vary'] = 'Accept-Encoding'
response['Transfer-Encoding'] = 'gzip'
#response['Content-Encoding'] = 'gzip'
response['Connection'] = 'close'
#response['Content-Type'] = 'text/html'
# note: len() of a str counts characters, not encoded bytes
response['Content-Length'] = '%s' % (len(output_display))
return response
I then capture the output using the Live HTTP Headers plugin with FireFox, and it looks like:
HTTP/1.1 200 OK
Date: Sun, 10 Mar 2013 14:55:09 GMT
Server: Apache/2.2.22 (Ubuntu)
Transfer-Encoding: gzip, chunked <---------- Why 'chunked'?
Vary: Accept-Encoding
Connection: close
Cache-Control: must-revalidate, max-age=20
Content-Encoding: gzip
Content-Type: text/html <---------------------- No Content-Length even though I set it?
X-Pad: avoid browser bug
I am trying to cache using Apache2's mem_cache, so I need the Content-Length to be set and cannot have 'chunked' for Transfer-Encoding.
My Apache2 mem_cache.conf looks like ( large numbers just for testing ):
<IfModule mod_mem_cache.c>
CacheEnable mem /
MCacheSize 10000
MCacheMaxObjectCount 10000000
MCacheMinObjectSize 1
MCacheMaxObjectSize 10000000
MCacheMaxStreamingBuffer 10000000
</IfModule>
But even though I explicitly set the Content-Length and Transfer-Encoding in my response code, 'chunked' is inserted automatically and therefore my Content-Length is not honored. Why is this? How can I fix this to get the desired response? Thanks -

I came across a similar issue recently with a mod_wsgi application; I was trying to switch an Apache configuration from its built-in disk cache to socache/memcache.
The disk cache was working, but switching to memcache or shmcb didn't work. If I issued a request for a resource I wanted cached, it wouldn't store it in the cache (CacheDetailHeader is helpful for this). Checking the logs at debug, I found the message:
[Wed Dec 05 18:52:16.571002 2018] [cache_socache:debug] \
[pid 884:tid 140422596777728] mod_cache_socache.c(389): \
[client 127.0.0.1:56576] AH02346: URL 'http://127.0.1.1:80/cacheme/c?' \
had no explicit size, ignoring, referer: http://127.0.0.1/
It seems that socache doesn't like objects that don't have explicit sizes. I tried setting the newer socache equivalents of those mod_mem_cache settings to sufficiently large values: CacheSocacheMaxSize and CacheSocacheReadSize.
I know that the Content-Length header was being set and made it through to somewhere; it showed up in the mod_wsgi logs when I deliberately miscalculated it.
A few things I found:
Don't set the Transfer-Encoding header yourself; it is a hop-by-hop header, and setting it is forbidden by the WSGI specification:
Who set the Transfer-Encoding: chunked header?
Even though you're setting the Content-Length header yourself, the response is also being gzipped by Apache. This changes the length; when Apache doesn't know what the length will be, it switches to chunked and removes the Content-Length header.
I found that with:
Content-Type: text/html
Content-Length set to my utf-8 encoding size
set in the python/mod_wsgi application, and:
SetEnv no-gzip 1
set in the apache configuration, that the object made it into a shmcb cache.
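On the application side, that working combination looks roughly like the following minimal WSGI sketch (the page content is hypothetical, and Apache's SetEnv no-gzip 1 is still needed on the server side). It computes Content-Length from the UTF-8 bytes rather than from the string, and it leaves hop-by-hop headers such as Transfer-Encoding and Connection to the server; wsgiref.util.is_hop_by_hop is the standard-library helper that recognizes them.

```python
from wsgiref.util import is_hop_by_hop

def application(environ, start_response):
    """Minimal WSGI app that sends an explicit, byte-accurate Content-Length."""
    # Hypothetical page content; the non-ASCII "é" makes characters != bytes
    output_display = "<p>Caf\u00e9</p>"
    body = output_display.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),  # bytes, not len(output_display)
        ("Cache-Control", "must-revalidate, max-age=20"),
        ("Vary", "Accept-Encoding"),
    ]
    # Transfer-Encoding and Connection are hop-by-hop headers; WSGI apps must not set them
    assert not any(is_hop_by_hop(name) for name, _ in headers)
    start_response("200 OK", headers)
    return [body]
```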
It looks like when Apache gzips an object, it changes the headers so that the object isn't accepted by socache.
I looked around for ways to make them compatible, but couldn't find too much on this issue. There is some mention of reordering the cache/deflate filters in the mod_cache documentation:
https://httpd.apache.org/docs/2.4/mod/mod_cache.html#finecontrol
This worked if I put in a directive to reorder the cache/deflate filters:
# within a directory
SetOutputFilter CACHE;DEFLATE
Curiously, on a cache miss, the server returned gzipped content, but on a cache hit, the server returned unencoded text/html. This looks odd, but I haven't understood the FilterChain directives well enough to try those out.
I also found some mention of this in a related issue with php/content-length:
https://serverfault.com/questions/183843/content-length-not-sent-when-gzip-compression-enabled-in-apache
The answer there found that if they set the DeflateBufferSize to a large-enough value, then content-length would be set.
I couldn't get this to work.
So it looks like one is stuck between choosing cached or gzipped.

Apache does not Compress/Deflate/GZIP responses

I have set up an Apache .htaccess file so that it should compress the JSON that my application returns. The application is written in Python and is connected to Apache through CGI scripts.
<ifmodule mod_deflate.c>
AddOutputFilterByType DEFLATE text/text text/html text/plain text/xml text/css application/x-javascript application/javascript text/javascript text/json application/json
</ifmodule>
My JSON responses are still not gzipped, although static files are. Any ideas or thoughts on how I can fix this?
I am using Apache2

.htaccess doesn't affect script output, so you will have to handle gzip compression in the Python app. For example, in a Django app you can use GZipMiddleware.
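For a plain CGI script, doing the compression in the script itself could look like the sketch below. It assumes the script writes its own headers to stdout; build_cgi_response and the payload are hypothetical. Apache passes the client's Accept-Encoding header to CGI scripts as the HTTP_ACCEPT_ENCODING environment variable, and the negotiation here is deliberately simplified (a substring check that ignores q-values).

```python
import gzip
import json
import os
import sys

def build_cgi_response(payload, accept_encoding):
    """Return CGI output (headers + body), gzipping the JSON body if the client accepts gzip."""
    body = json.dumps(payload).encode("utf-8")
    headers = ["Content-Type: application/json", "Vary: Accept-Encoding"]
    # Simplified negotiation: a plain substring check, ignoring q-values such as "gzip;q=0"
    if "gzip" in accept_encoding.lower():
        body = gzip.compress(body)
        headers.append("Content-Encoding: gzip")
    headers.append(f"Content-Length: {len(body)}")
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body

if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    # Only emit output when actually running under a CGI server
    out = build_cgi_response({"key": "value"}, os.environ.get("HTTP_ACCEPT_ENCODING", ""))
    sys.stdout.buffer.write(out)
    sys.stdout.buffer.flush()
```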
