If-None-Match not accepted as a request header - python

I'm trying to understand how Etag works in Django. I added middleware in settings ('django.middleware.http.ConditionalGetMiddleware') and this seems to work as it generates the Etag:
HTTP/1.0 200 OK
Date: Mon, 15 Jan 2018 16:58:30 GMT
Server: WSGIServer/0.2 CPython/3.6.0
Content-Type: application/json
Vary: Accept
Allow: GET, HEAD, OPTIONS
X-Frame-Options: SAMEORIGIN
Content-Length: 1210
ETag: "060e28ac5f08d82ba0cd876a8af64e6d"
Access-Control-Allow-Origin: *
However, when I put If-None-Match: '*' in the request header, I get the following error:
Request header field If-None-Match is not allowed by Access-Control-Allow-Headers in preflight response.
And I notice the request method sent back in the response is OPTIONS and the rest of the headers look like this:
HTTP/1.0 200 OK
Date: Mon, 15 Jan 2018 17:00:26 GMT
Server: WSGIServer/0.2 CPython/3.6.0
Content-Type: text/html; charset=utf-8
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: accept, accept-encoding, authorization, content-type, dnt, origin, user-agent, x-csrftoken, x-requested-with
Access-Control-Allow-Methods: DELETE, GET, OPTIONS, PATCH, POST, PUT
Access-Control-Max-Age: 86400
So the question is how do I get If-None-Match to be an allowed header? I'm not sure if this is a server or client issue. I'm using Django/DRF/Vue for my stack and Axios for making http requests.

Since the response already carries CORS headers, you are presumably using django-cors-headers. You can extend Access-Control-Allow-Headers through its CORS_ALLOW_HEADERS setting; see the django-cors-headers documentation for details.
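A sketch of the relevant settings.py change: the list below mirrors the allow-list from the preflight response above (django-cors-headers also exposes these defaults as corsheaders.defaults.default_headers), with if-none-match appended. Header names here are case-insensitive.

```python
# settings.py -- sketch; the first nine entries reproduce the
# Access-Control-Allow-Headers values seen in the preflight response.
CORS_ALLOW_HEADERS = [
    "accept",
    "accept-encoding",
    "authorization",
    "content-type",
    "dnt",
    "origin",
    "user-agent",
    "x-csrftoken",
    "x-requested-with",
    "if-none-match",  # lets the browser send conditional (ETag) requests
]
```

After this change the preflight response should list if-none-match, and Axios can send the conditional header without the browser blocking it.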

Related

HTTP header cut in half with `urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data`

I spotted a weird warning in logs:
[WARNING] urllib3.connectionpool:467: Failed to parse headers (url=https://REDACTED): [MissingHeaderBodySeparatorDefect()], unparsed data: 'trol,Content-Type\r\n\r\n'
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 465, in _make_request
assert_header_parsing(httplib_response.msg)
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/response.py", line 91, in assert_header_parsing
raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data: 'trol,Content-Type\r\n\r\n'
This is from calling a standard requests.post() on a web service I fully control (a Python app behind nginx).
When I turn on debuglevel=1 in http.client.HTTPResponse I see this:
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx/1.18.0 (Ubuntu)
header: Date: Tue, 30 Nov 2021 22:14:04 GMT
header: Content-Type: application/json
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Vary: Accept-Encoding
header: Access-Control-Allow-Origin: *
header: Access-Control-Allow-Credentials: true
header: Access-Control-Allow-Methods: GET, POST, OPTIONS
header: Access-Control-Allow-Headers: DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Con
Note the last header ending abruptly in ,If-Modified-Since,Cache-Con.
Clearly, requests==2.26.0 (via urllib3==1.26.7 via http.client) cuts the last header in half for some reason during parsing, and then later complains it has "left over" data with the remaining trol,Content-Type\r\n\r\n.
In this case the warning is not critical, because the header is not really needed. But it's scary this is happening, because… what else is being cut / misparsed?
The same endpoint works fine from e.g. curl:
$ curl -i -XPOST https://REDACTED
HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Sat, 04 Dec 2021 20:08:59 GMT
Content-Type: application/json
Content-Length: 53
Connection: keep-alive
Vary: Accept-Encoding
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Con
trol,Content-Type
…JSON response…
Any idea what could be wrong? Many thanks.
Your web server, or its configuration, looks broken. Look at whatever is generating that CORS Access-Control-Allow-Headers header, because a header value is not permitted to contain a bare line break.
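For reference, a header value wrapped onto a new line without leading whitespace is not a legal continuation line, and Python's header parser (http.client delegates to the email package) flags exactly this shape. A sketch reproducing the defect with a made-up header block:

```python
import email
import email.errors

# The fold after "Cache-Con" mimics what nginx emitted in the question:
# the second line has no colon and no leading whitespace, so it is
# neither a header nor a continuation.
raw = (
    "Access-Control-Allow-Headers: If-Modified-Since,Cache-Con\n"
    "trol,Content-Type\n"
    "\n"
)
msg = email.message_from_string(raw)
# Parsing stops at the bad line and records a defect; the unread text
# becomes the payload -- the same "unparsed data" urllib3 warns about.
print(msg.defects)  # a list containing MissingHeaderBodySeparatorDefect
```

curl is simply more forgiving and prints the folded value across two lines, which is why the same endpoint "works" there.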

Python requests and urllib2 get different headers when connecting to the same host

We've got a server providing .txt files, basically log files growing over time. When I use urllib2 to send a GET to the server with r = urllib2.urlopen('http://example.com'), the response headers are:
Date: XXX
Server: Apache
Last-Modified: XXX
Accept-Ranges: bytes
Content-Length: 12345678
Vary: Accept-Encoding
Connection: close
Content-Type: text/plain
While if r = requests.get('http://example.com'):
Content-Encoding: gzip
Accept-Ranges: bytes
Vary: Accept-Encoding
Keep-alive: timeout=5, max=128
Last-Modified: XXX
Connection: Keep-Alive
ETag: xxxxxxxxx
Content-Type: text/plain
The second response matches what I see in Chrome's developer tools. So why are the two different? I need the Content-Length header to determine how many bytes to download each time, because the file can grow really big.
EDIT:
Using httpbin.org/get to test:
urllib2 response:
{u'args': {},
u'headers': {u'Accept-Encoding': u'identity',
u'Host': u'httpbin.org',
u'User-Agent': u'Python-urllib/2.7'},
u'origin': u'ip',
u'url': u'http://httpbin.org/get'}
response headers:
Server: nginx
Date: Sat, 14 Jan 2017 07:41:16 GMT
Content-Type: application/json
Content-Length: 207
Connection: close
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
requests response:
{u'args': {},
u'headers': {u'Accept': u'*/*',
u'Accept-Encoding': u'gzip, deflate',
u'Host': u'httpbin.org',
u'User-Agent': u'python-requests/2.11.1'},
u'origin': u'ip',
u'url': u'http://httpbin.org/get'}
response headers:
Server: nginx
Date: Sat, 14 Jan 2017 07:42:39 GMT
Content-Type: application/json
Content-Length: 239
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Quote from Lukasa on GitHub:
The response is different because requests indicates that it supports
gzip-encoded bodies, by sending an Accept-Encoding: gzip, deflate
header field. urllib2 does not. You'll find if you added that header
to your urllib2 request that you get the new behaviour.
Clearly, in this case, the server is dynamically gzipping the
responses. This means it doesn't know how long the response will be,
so it is sending using chunked transfer encoding.
If you really must get the Content-Length header, then you should add
the following headers to your Requests request: {'Accept-Encoding':
'identity'}.
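In other words, sending Accept-Encoding: identity lets the server skip dynamic gzipping and report a real Content-Length again. Since the goal is to fetch only the newly appended bytes of a growing log, a Range header pairs naturally with that (the Accept-Ranges: bytes header above suggests the server supports it). A sketch using Python 3's urllib.request; the URL, offset, and helper name are illustrative:

```python
from urllib.request import Request

def tail_request(url, start):
    """Build a request that disables on-the-fly compression (so
    Content-Length is meaningful) and resumes from byte offset `start`."""
    return Request(url, headers={
        "Accept-Encoding": "identity",  # no dynamic gzip
        "Range": "bytes=%d-" % start,   # only bytes we have not seen yet
    })

req = tail_request("http://example.com/growing.txt", 1024)
```

urlopen(req) would then return a 206 Partial Content response whose body is just the new data, assuming the server honors Range.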

urllib2 sometimes returns an old page - returns strange headers

I'm working on a Python script that works with JSON returned by a URL.
For a couple of days now, urllib2 has (just sometimes) been returning an old state of the JSON.
I added headers such as "Cache-Control": "max-age=0", but it still happens sometimes.
If I print out the request info I get:
Server: nginx/1.8.0
Date: Thu, 03 Sep 2015 17:02:47 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 3539
Status: 200 OK
X-XHR-Current-Location: /shop/169464.json
X-UA-Compatible: IE=Edge,chrome=1
ETag: "b1fbe7a01e0832025a3afce23fc2ab56"
X-Request-Id: 4cc0d399f943ad09a903f18a6ce1c488
X-Runtime: 0.123033
X-Rack-Cache: miss
Accept-Ranges: bytes
X-Varnish: 1707606900 1707225496
Age: 2860
Via: 1.1 varnish
Cache-Control: private, max-age=0, must-revalidate
Pragma: no-cache
X-Cache: HIT
X-Cache: MISS from adsl
X-Cache-Lookup: MISS from adsl:21261
Connection: close
Does it have something to do with the "Age" or "X-Rack-Cache" header? Any ideas how I can fix it?
Thanks in advance!
Try faking the user-agent, removing cookies, and dropping sessions:
import random
import urllib2

fake_user_agents = ['chrome', 'firefox', 'safari']
request = urllib2.Request(url)
request.add_header('User-Agent', random.choice(fake_user_agents))
content = urllib2.build_opener().open(request)
If none of that works, try using Tor to change your IP per request.
If nothing works, you can't bypass it, because you are almost certainly connecting through a transparent proxy.
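Given the Age: 2860, X-Varnish (with two IDs), and X-Cache: HIT headers in the dump above, the stale JSON is almost certainly served by the Varnish cache in front of the app, not by urllib2 itself. Assuming that cache keys on the full URL (a common default), a throwaway query parameter forces a miss on every request; a sketch (the parameter name _ is arbitrary):

```python
import time

def cache_busted(url):
    """Append a unique throwaway query parameter so a URL-keyed cache
    (e.g. the Varnish instance in front of this server) never hits."""
    sep = '&' if '?' in url else '?'
    return '%s%s_=%d' % (url, sep, int(time.time() * 1000))

fresh_url = cache_busted('http://example.com/shop/169464.json')
# pass fresh_url to urllib2.urlopen() as usual
```

This trades cache efficiency for freshness, so use it only where a stale read is actually harmful.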

Is there a way to tell if a page opened with Mechanize isn't returning "search results"?

I am using Mechanize to log in to a web site and run a search. After extracting the links/info I want, I recursively move from the current results page to the next. What I'm wondering is whether there's an easy way to tell -- based on header information, for instance -- that a page is a "No results found" (or similar) page. If so, I could quickly check the header for a "404" or no-results marker and return early.
I couldn't find anything in the documentation, and from what I can tell the answer is no. Can anyone here say more definitively whether the answer is in fact no? Thanks in advance.
(Presently I just do a .find() for 'no results' after I .read() the link.)
NOTES:
1) Header Info for a "good" page (with results):
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: Date: Thu, 12 Sep 2013 18:33:10 GMT
header: Content-Type: text/html; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: close
header: Vary: Accept-Encoding
header: Status: 200 OK
header: X-UA-Compatible: IE=Edge,chrome=1
header: Cache-Control: must-revalidate, private, max-age=0
header: X-Request-Id: b501064808b265fc6e478fa88e622710
header: X-Runtime: 0.478829
header: X-Rack-Cache: miss
header: Content-Encoding: gzip
2) Header Info from a "bad" page (no results):
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: Date: Thu, 12 Sep 2013 18:33:11 GMT
header: Content-Type: text/html; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: close
header: Vary: Accept-Encoding
header: Status: 200 OK
header: X-UA-Compatible: IE=Edge,chrome=1
header: Cache-Control: must-revalidate, private, max-age=0
header: X-Request-Id: 1ae89b2b25ba7983f8a48fa17f7a1798
header: X-Runtime: 0.127865
header: X-Rack-Cache: miss
header: Content-Encoding: gzip
The response headers are generated by the server; you could add your own "no results" header server-side and parse that... otherwise you have to analyze the content.
If you're set on using the headers, the only difference I can see between the two is that the bad search returned about 4x faster (X-Runtime) -- maybe you could keep a moving average of elapsed response times.
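Since the two header dumps above differ only in timing and request id, the body check the asker already does is the practical route; packaging it as a helper at least keeps the markers in one place (the phrases below are hypothetical and should be adapted to the actual site):

```python
def looks_empty(html, markers=("no results", "0 results found")):
    """Cheap body heuristic: the headers carry no usable signal, so scan
    the already-decoded page text for a no-results phrase."""
    text = html.lower()
    return any(marker in text for marker in markers)

looks_empty("<p>No Results Found for your query.</p>")  # True
looks_empty("<li>First match</li>")                     # False
```

Note the pages are gzip-encoded (Content-Encoding: gzip above), so make sure the HTML is decompressed before the check -- mechanize's .read() normally handles that for you.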

how can I get complete header info from a urllib2 request?

I am using Python's urllib2 library to open URLs, and what I want is the complete header info of the request. When I use response.info() I only get this:
Date: Mon, 15 Aug 2011 12:00:42 GMT
Server: Apache/2.2.0 (Unix)
Last-Modified: Tue, 01 May 2001 18:40:33 GMT
ETag: "13ef600-141-897e4a40"
Accept-Ranges: bytes
Content-Length: 321
Connection: close
Content-Type: text/html
I am expecting the complete info as given by Live HTTP Headers (a Firefox add-on), e.g.:
http://www.yellowpages.com.mt/Malta-Web/127151.aspx
GET /Malta-Web/127151.aspx HTTP/1.1
Host: www.yellowpages.com.mt
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=156587571.1883941323.1313405289.1313405289.1313405289.1; __utmz=156587571.1313405289.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
HTTP/1.1 302 Found
Connection: Keep-Alive
Content-Length: 141
Date: Mon, 15 Aug 2011 12:17:25 GMT
Location: http://www.trucks.com.mt
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET, UrlRewriter.NET 2.0.0
X-AspNet-Version: 2.0.50727
Set-Cookie: ASP.NET_SessionId=zhnqh5554omyti55dxbvmf55; path=/; HttpOnly
Cache-Control: private
My request function is:
import urllib
import urllib2
import cookielib

def dorequest(url, post=None, headers={}):
    cOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    urllib2.install_opener(cOpener)
    if post:
        post = urllib.urlencode(post)
    req = urllib2.Request(url, post, headers)
    response = cOpener.open(req)
    print response.info()  # this does not give complete header info -- how can I get it?
    return response.read()

url = 'http://www.yellowpages.com.mt/Malta-Web/127151.aspx'
html = dorequest(url)
Is it possible to achieve the desired header info details by using urllib2? I don't want to switch to httplib.
Those are all of the headers the server is sending when you do the request with urllib2.
Firefox is showing you the headers it's sending to the server as well.
When the server gets those headers from Firefox, some of them may trigger it to send back additional headers, so you end up with more response headers as well.
Duplicate the exact headers Firefox sends, and you'll get back an identical response.
Edit: That location header is sent by the page that does the redirect, not the page you're redirected to. Just use response.url to get the location of the page you've been sent to.
That first URL uses a 302 redirect. If you want to see the headers from the first page instead of following the redirect, use urllib's URLopener rather than FancyURLopener; FancyURLopener is the one that automatically follows redirects.
I see that the server returns HTTP/1.1 302 Found -- an HTTP redirect.
urllib2 follows redirects automatically, so the headers you get back are the headers from http://www.trucks.com.mt, not from http://www.yellowpages.com.mt/Malta-Web/127151.aspx
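To see the 302's own headers without switching to httplib, you can also stop the opener from following redirects with a custom HTTPRedirectHandler. A sketch, shown with urllib.request (urllib2's Python 3 successor; the mechanism in urllib2 is the same):

```python
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects: returning None from redirect_request
    means no follow-up request is built, so the opener surfaces the 3xx
    as an HTTPError instead of silently chasing Location."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect())
# try:
#     opener.open(url)
# except urllib.error.HTTPError as err:
#     # err *is* the 302 response: err.code, err.headers["Location"], ...
#     print(err.code, err.headers.get("Location"))
```

The HTTPError raised here is itself a file-like response object, so you can read the redirect's status, headers, and body from it directly.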
