I've split my previous question into two.
A consumer of my REST API says that on occasion I am returning a 400 Bad Request - The request sent by the client was syntactically incorrect. error.
My application (Python/Flask) logs don't seem to be capturing this, and neither do my webserver/Nginx logs.
I've changed the default 400 and 404 error message HTML's by adding the following line to the config (and adding the corresponding HTML pages in the proper directory):
error_page 404 /404.html;
location = /404.html {
root html;
}
error_page 400 /400.html;
location = /400.html {
root html;
}
This error message plainly states "Nginx" so that I know Nginx is giving the 400 and not my python/flask application (whose 400 has been changed to plainly state "Flask").
Is there a way that I can intentionally cause Nginx to return a 400 error, perhaps through python, CURL, or Postman? For example, I've also changed the 404 error page, and can intentionally get Nginx to return the corresponding error HTML by calling an invalid URL.
Create a location in NGINX's config to return 400 errors:
location /error {
return 400;
}
Presto.
Nginx will return status code 400 if it receives a header field larger than the configured large_client_header_buffers
A request header field cannot exceed the size of one buffer as well,
or the 400 (Bad Request) error is returned to the client. Buffers are
allocated only on demand. By default, the buffer size is equal to 8K
bytes. If after the end of request processing a connection is
transitioned into the keep-alive state, these buffers are released.
Source: http://nginx.org/en/docs/http/ngx_http_core_module.html#large_client_header_buffers
So you just need to create a curl request with a header larger than 8k. Here's an example using a bit of python to generate the header variable to pass into curl:
(nginx)macbook:nginx joeyoung$ myheader=$(python -c "print 'A'*9000")
(nginx)macbook:nginx joeyoung$ curl -vvv --header "X-MyHeader: $myheader" http://my.example.website.com
Results:
...
>
< HTTP/1.1 400 Bad Request
< Server: nginx/1.4.7
< Date: Wed, 02 Sep 2015 22:37:29 GMT
< Content-Type: text/html
< Content-Length: 248
< Connection: close
Related
I have two endpoints like below:
GET on /api/v1/foo
POST on /api/v1/foo
I need the POST implementation to send back chunked responses using HTTP/1.1 chuked-tranfer encoding however the GET endpoint should send plain JSON
My setup is nginx -> uwsgi -> flask.
I see some of my chunks currently getting truncated at a hex size of 1000 which is 4K in bytes and not the same as my flask layer sent it. Probably because I'm missing some nginx or uwsgi configuration.
uwsgi configuration(uwsgi.ini):
[uwsgi]
route = ^/api/v1/foo$ goto:dochunked
route-run = last:
route-label = dochunked
route-if = equal:$\{REQUEST_METHOD\};POST goto:dopostchunked
route-run = last:
route-label = dopostchunked
route-run = chunked:
nginx configuration:
location / {
uwsgi_pass unix:var/uwsgi.sock;
uwsgi_read_timeout 600;
include uwsgi_params;
}
location /api/v1/foo {
uwsgi_pass unix:var/uwsgi.sock;
uwsgi_read_timeout 600;
include uwsgi_params;
if ($request_method = "POST" ) {
set $chunked_transfer_encoding on;
add_header X-Accel-Buffering no;
}
}
curl response headers
HTTP/1.1 200 OK
Server: nginx/1.10.1
Date: Wed, 03 Jan 2018 00:06:50 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-Accel-Buffering: no
Chunking is about Transfer-Encoding, plain JSON is Content-Type, the two things are not related.
The transfer encoding stuff is just about the communication methods used by the two HTTP 1.1 endpoints (the server and the client). Like would be a gzip compression, also. Using chunked transmission avoids using the Content-Length headers and allows the response to be sent in multiple chunks, of course. But on the other side, once the response is received chunks are added, and you should not see any difference between a response sent via Content-Length+big-body-in-one-chunk or a body-sent-in-multiple-chunks.
I say should because you may experience problems with bad HTTP/1.1. libraries which do not wait until the end of the message (last chunk marker) before launching something like an response-receveid event for application languages.
Usually using chunks or not is the responsability of the HTTP server, and you have few contgrol other that because chunks support is a requested feature of HTTP/1.1. Playing with the size of the response body and the size of buffers used by the http server you may see differences on the way chunks are made. If you have multiple actors in the chain (like here flask and Nginx), each actor can decide to reorganize the chunks, merge some of them (buffering), or not.
But as I said, you should not care about it. Unless your client side of the application as bugs with chunked encoding, that would mean your side of the HTTP communication doesn't understand HTTP/1.1.
Finally, if you really need to avoid chunks, but you shouldn't, I see 3 options:
You could enforce an HTTP/1.0 response. No chunks with HTTP/1.0. But that's a very very old version of the protocol. To do that you'll have to ask for HTTP/1.0 in the request side, you'll get an HTTP/1.1 response from Nginx but without the advanced features of HTTP/1.1 (like chunks).
You could use the nginx chunked_transfer_encoding setting. we can see it's on by default, so usually you use that to set it to off on a specific location. Your current way of using it does nothing. This option was made specifically for bad HTTp clients, as stated:
It may come in handy when using a software failing to support chunked
encoding despite the standard’s requirement.
You could maybe also try playing with proxy_buffering off, that may work, I'm unsure.
I am trying to hit the Atlassian Confluence REST API using python requests.
I've successfully called a GET api, but when I call the PUT to update a confluence page, it returns 200, but didn't update the page.
I used chrome::YARC to verify that the API was working properly (which it was). After a while trying to debug it, I reverted to try using urllib3, which worked just fine.
I'd really like to use requests, but I can't for the life of me figure this one out after hours and hours of trying to debug, Google, etc.
I'm running Mac/Python3:
$ uname -a
Darwin mylaptop.local 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64
$ python3 --version
Python 3.6.1
Here's my code that shows all three ways I'm trying this (two requests and one urllib3):
def update(self, spaceKey, pageTitle, newContent, contentType='storage'):
if contentType not in ('storage', 'wiki', 'plain'):
raise ValueError("Invalid contentType={}".format(contentType))
# Get current page info
self._refreshPage(spaceKey, pageTitle) # I retrieve it before I update it.
orig_version = self.version
# Content already same as requested content. Do nothing
if self.wiki == newContent:
return
data_dict = {
'type' : 'page',
'version' : {'number' : self.version + 1},
'body' : {
contentType : {
'representation' : contentType,
'value' : str(newContent)
}
}
}
data_json = json.dumps(data_dict).encode('utf-8')
put = 'urllib3' #for now until I figure out why requests.put() doesn't work
enable_http_logging()
if put == 'requests':
r = self._cs.api.content(self.id).PUT(json=data_dict)
r.raise_for_status()
elif put == 'urllib3':
urllib3.disable_warnings() # I know, you can quit your whining now!!!
headers = { 'Content-Type' : 'application/json;charset=utf-8' }
auth_header = urllib3.util.make_headers(basic_auth=":".join(self._cs.session.auth))
headers = {**headers, **auth_header}
http = urllib3.PoolManager()
r = http.request('PUT', str(self._cs.api.content(self.id)), body=data_json, headers=headers)
else:
raise ValueError("Huh? Unknown put type: {}".format(put))
enable_http_logging(False)
# Verify page was updated
self._refreshPage(spaceKey, pageTitle) # Check for changes
if self.version != orig_version + 1:
raise RuntimeError("Page not updated. Still at version {}".format(self.version))
if self.wiki != newContent:
raise RuntimeError("Page version updated, but not content.")
Any help would be great.
Update 1: Adding request dump
-----------START-----------
PUT http://confluence.myco.com/rest/api/content/101904815
User-Agent: python-requests/2.18.4
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 141
Content-Type: application/json
Authorization: Basic <auth-token-here>==
b'{"type": "page", "version": {"number": 17}, "body": {"storage": {"representation": "storage", "value": "new body here version version 17"}}}'
requests never went back to PUT (Bug???)
What you're observing is requests behaving consistently with web browsers: reacting to HTTP 302 redirect with a GET request.
From Wikipedia:
The user agent (e.g. a web browser) is invited by a response with this code to make a second, otherwise identical, request to the new URL specified in the location field.
(...)
Many web browsers implemented this code in a manner that violated this standard, changing the request type of the new request to GET, regardless of the type employed in the original request (e.g. POST)
(...)
As a consequence, the update of RFC 2616 changes the definition to allow user agents to rewrite POST to GET.
So this behaviour is consistent with RFC 2616. I don't think we can say which of the two libraries behaves "more correctly".
Looks like a difference in how the requests and urllib3 modules deal with switching from http to https. (See #Kos answer above). Here's what I found when I checked the debug logs.
So I got to thinking after #JonClements suggested I send him the Response dump. After doing some research I found the magic runs to enable debugging for requests and urllib3 (See here).
In looking at the diffs from both, I noticed that they were being redirected from http to https for my companies confluence site:
urllib3:
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): confluence.myco.com
DEBUG:urllib3.connectionpool:http://confluence.myco.com:80 "PUT /rest/api/content/101906196 HTTP/1.1" 302 237
DEBUG:urllib3.util.retry:Incremented Retry for (url='http://confluence.myco.com/rest/api/content/101906196'): Retry(total=2, connect=None, read=None, redirect=None, status=None)
INFO:urllib3.poolmanager:Redirecting
http://confluence.myco.com/rest/api/content/101906196 ->
https://confluence.myco.com/rest/api/content/101906196
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): confluence.myco.com
DEBUG:urllib3.connectionpool:https://confluence.myco.com:443 "PUT /rest/api/content/101906196 HTTP/1.1" 200 None
while requests tried with my PUT and then after redirecting went to GET:
DEBUG:urllib3.connectionpool:http://confluence.myco.com:80 "PUT /rest/api/content/101906196 HTTP/1.1" 302 237
DEBUG:urllib3.connectionpool:https://confluence.myco.com:443 "GET /rest/api/content/101906196 HTTP/1.1" 200 None
requests never went back to PUT
I changed my initial url from http: to https: and everything worked fine.
Why does django ignore the HTTP_X_FORWARDED_PROTO if it comes through the wire?
I added to the settings.xml the following config:
# make sure we know when we are secure when we are behind a proxy
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
I made a test to test that if
def testHttpSupport(self):
url = reverse('configuration-list')
response = self.client.get(url, HTTP_X_FORWARDED_PROTO='https')
cfg = response.data[0]
cfg_url = cfg['url']
self.assertTrue(cfg_url.startswith('https'))
this works fine. The url of the return object starts with https.
however if I try :
curl -v -H 'HTTP_X_FORWARDED_PROTO: https' http://localhost:8000/api/users/
...
> GET /api/users/ HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.51.0
> Accept: */*
> HTTP_X_FORWARDED_PROTO: https
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Mon, 03 Jul 2017 16:22:04 GMT
< Server: WSGIServer/0.2 CPython/3.6.1
< Content-Type: application/json
< Allow: GET, POST, OPTIONS
< Vary: Accept, Cookie
< X-Frame-Options: SAMEORIGIN
< Content-Length: 197
<
* Curl_http_done: called premature == 0
* Closing connection 0
[{"url":"http://localhost:8000/api/users/1/",...
How come it does not return 'https://' based urls like in my unit-test?
The issue is the header name. When accessing Django through a WSGI server, you should use the X-Forwarded-Proto header instead of the HTTP_X_FORWARDED_PROTO:
curl -v -H 'X-Forwarded-Proto: https' http://localhost:8000/api/users/
The WSGI protocol states that the relevant CGI specifications must be followed, which say:
Meta-variables with names beginning with 'HTTP_' contain values read
from the client request header fields, if the protocol used is HTTP.
The HTTP header field name is converted to upper case, has all
occurrences of "-" replaced with "_" and has 'HTTP_' prepended to
give the meta-variable name.
(source)
So whenever you are using a WSGI server, the X-Forwarded-Proto header is automatically converted to HTTP_X_FORWARDED_PROTO before it is passed in to Django. When you pass in the HTTP_X_FORWARDED_PROTO header instead, HTTP_ must still be prepended according to the specification. Thus, you end up with a header named HTTP_HTTP_X_FORWARDED_PROTO in Django.
self.client is not a WSGI server, and values passed in through the kwargs are inserted directly into the WSGI environment, without any processing. So in that case you have to do the conversion yourself and actually use the HTTP_X_FORWARDED_PROTO key:
CGI specification
The headers sent via **extra should follow CGI specification. For example, emulating a different “Host” header as sent in the HTTP request from the browser to the server should be passed as HTTP_HOST.
(source)
I wrote my own custom client which sends raw http requests via my wifi card to my flask webserver.
This is what a typical requests looks like:
Content-Length: 214
User-Agent: blah
Connection: close
Host: 1.2.3.4:5000
Content-Type: application/json
{"data":[{"scoutId":2,"message":"ph=5.65"},{"scoutId":4,"message":"ph=4.28"},{"scoutId":3,"message":"ph=4.28"},{"scoutId":2,"message":"ph=5.65"},{"scoutId":4,"message":"ph=4.28"},{"scoutId":3,"message":"ph=4.30"}]}
Sometimes, my clients screw up and send malformed JSON requests to my flask server. Typically, flask will just display:
1.2.3.5 - - [01/Sep/2014 22:13:03] "POST / HTTP/1.1" 400 -
and nothing informative about the request.
I would like to track every single request that resulted in 400 in my environment and analyze what is causing these errors.
Where can I place my custom error function in my flask server?
Try turning this on:
app.config['TRAP_BAD_REQUEST_ERRORS'] = True
This should make flask raise an exception instead of just logging the 400 (see documentation here).
If you need to do something more than that, make an event handler:
http://flask.pocoo.org/docs/0.10/patterns/errorpages/
#app.errorhandler(400)
def page_not_found(exc):
#do something with the exception object `exc` here
....
Or try wrapping the body of your view function in try/except.
In my Django Application's views.py , I return an HttpResponse object after attempting to set the following HTTP Header fields:
# Create a Response Object with the content to return
response = HttpResponse("%s"%(output_display),mimetype='text/html')
response['Cache-Control'] = 'must-revalidate, max-age=20'
response['Vary'] = 'Accept-Encoding'
response['Transfer-Encoding'] = 'gzip'
#response['Content-Encoding'] = 'gzip'
response['Connection'] = 'close'
#response['Content-Type'] = 'text/html'
response['Content-Length'] = '%s'%(len(output_display))
return response
I then capture the output using the Live HTTP Headers plugin with FireFox, and it looks like:
HTTP/1.1 200 OK
Date: Sun, 10 Mar 2013 14:55:09 GMT
Server: Apache/2.2.22 (Ubuntu)
Transfer-Encoding: gzip, chunked <---------- Why 'chunked'?
Vary: Accept-Encoding
Connection: close
Cache-Control: must-revalidate, max-age=20
Content-Encoding: gzip
Content-Type: text/html <---------------------- No Content-Length even though I set it?
X-Pad: avoid browser bug
I am trying to cache using Apache2's mem_cache, so I need the Content-Length to be set and cannot have 'chunked' for Transfer-Encoding.
My Apache2 mem_cache.conf looks like ( large numbers just for testing ):
<IfModule mod_mem_cache.c>
CacheEnable mem /
MCacheSize 10000
MCacheMaxObjectCount 10000000
MCacheMinObjectSize 1
MCacheMaxObjectSize 10000000
MCacheMaxStreamingBuffer 10000000
</IfModule>
But even though I explicitly set the Content-Length and Transfer-Encoding in my response code, 'chunked' is inserted automatically and therefore my Content-Length is not honored. Why is this? How can I fix this to get the desired response? Thanks -
I came across a similar issue recently with a mod_wsgi application; I was trying to update an apache configuration that was using its built-in disk cache, to use socache/memcache instead.
The disk cache was working, but switching to memcache or shmcb didn't work. If I issued a request for a resource I wanted cached, it wouldn't store it in the cache (CacheDetailHeader is helpful for this). Checking the logs at debug, I found the message:
[Wed Dec 05 18:52:16.571002 2018] [cache_socache:debug] \
[pid 884:tid 140422596777728] mod_cache_socache.c(389): \
[client 127.0.0.1:56576] AH02346: URL 'http://127.0.1.1:80/cacheme/c?' \
had no explicit size, ignoring, referer: http://127.0.0.1/
It seems that socache doesn't like objects that don't have explicit sizes. I tried setting the newer, socache equivalents of those mod_memcache settings to sufficiently large values: CacheSocacheMaxSize and CacheSocacheReadSize.
I know that the Content-Length header was being set and made it through to somewhere; it showed up in the mod_wsgi logs when I deliberately miscalculated it.
A few things I found:
Don't set Transfer-Encoding header yourself, as this is forbidden by the WSGI specification:
Who set the Transfer-Encoding: chunked header?
Even though you're setting the Content-Length header yourself, it's also being gzipped by apache. This changes the length; when Apache doesn't know what the length will be, it switches to chunked and removes the Content-Length header.
I found that with:
Content-Type: text/html
Content-Length set to my utf-8 encoding size
set in the python/mod_wsgi application, and:
SetEnv no-gzip 1
set in the apache configuration, that the object made it into a shmcb cache.
It looks like when apache gzips an object, it changes the headers to that it isn't accepted by socache.
I looked around for ways to make them compatible, but couldn't find too much on this issue. There is some mention of reordering the cache/deflate filters in the mod_cache documentation:
https://httpd.apache.org/docs/2.4/mod/mod_cache.html#finecontrol
This worked if I put in a directive to reorder the cache/deflate filters:
# within a directory
SetOutputFilter CACHE;DEFLATE
Curiously, on a cache miss, the server returned gzipped content, but on a cache hit, the server returned unencoded text/html. This looks odd, but I haven't understood the FilterChain directives well enough to try those out.
I also found some mention of this in a related issue with php/content-length:
https://serverfault.com/questions/183843/content-length-not-sent-when-gzip-compression-enabled-in-apache
The answer there found that if they set the DeflateBufferSize to a large-enough value, then content-length would be set.
I couldn't get this to work.
So it looks like one is stuck between choosing cached or gzipped.