Python requests library doesn't have all headers - python

I am using python requests library for a POST request and I expect a return message with an empty payload. I am interested in the headers of the returned message, specifically the 'Location' attribute. I tried the following code:
response=requests.request(method='POST', url=url, headers={'Content-Type':'application/json'}, data=data)
print response.headers ##Displays a case-insensitve map
print response.headers['Location'] ##blows up
Strangely the 'Location' attribute is missing in the headers map. If I try the same POST request on postman, I do get a valid Location attribute. Has anyone else seen this? Is this a bug in the requests library?

Sounds like everything's working as expected? Check your response.history
From the Requests documentation:
Requests will automatically perform location redirection for all verbs except HEAD.
>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]
From the HTTP Location page on wikipedia:
The HTTP Location header field is returned in responses from an HTTP server under two circumstances:
To ask a web browser to load a different web page. In this circumstance, the Location header should be sent with an HTTP status code of 3xx. It is passed as part of the response by a web server when the requested URI has:
Moved temporarily, or
Moved permanently
To provide information about the location of a newly-created resource. In this circumstance, the Location header should be sent with an HTTP status code of 201 or 202.1

The requests library follows redirections automatically.
To take a look at the redirections, look at the history of the requests. More details in the docs.
Or you pass the extra allow_redirects=False parameter when making the request.

Related

How to send entire http request from file?

I am writing a small application that interprets the http response of a request. I am writing the application in python. I have not found anything that allows me to send the body + headers stored in one file. I can send certain parts like the headers but not the entire request.
For example, if the request is:
GET /index.html HTTP/1.1
Host: localhost
Cookie: bob=lemon
I want to send this entire request in one go. How would I do this in python?
Check out the python requests library. https://requests.readthedocs.io/en/master/user/quickstart/#make-a-request
For the request above it would look something like
import requests
url = 'http://localhost:[YOUR PORT HERE]/'
cookies = {bob : lemon}
r = requests.get(url, cookies=cookies)
To check if you had a successful request you should get a 200 code from.
r.status_code
Check out the library for more, it is very extensive.

403 (Forbidden) error when using Fortnite Tracker api with urllib.request

I am attempting to get user statistics from the Fortnite tracker api.
I have an api key and am using the correct url as indicated in the documentation
Template url:
https://api.fortnitetracker.com/v1/profile/{platform}/{epic-nickname}
Desired url:
https://api.fortnitetracker.com/v1/profile/pc/xantium0
If I use this link in browser I get {"message":"No API key found in request"} (as I have not passed the API key) so the link should be correct. Also if I do not pass the api key with urllib then I still get a 403 error.
I have checked out how to pass a header in a request: How do I set headers using python's urllib?
and so far have this code:
import urllib.request as ur
request = ur.Request('https://api.fortnitetracker.com/v1/profile/pc/xantium0', headers={'TRN-Api-Key' : 'xxx'})
response = ur.urlopen(request)
print(response.read())
When run I get this error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
403 checks out as:
HTTP 403 is a standard HTTP status code communicated to clients by an HTTP server to indicate that the server understood the request, but will not fulfill it. There are a number of sub-status error codes that provide a more specific reason for responding with the 403 status code.
https://en.wikipedia.org/wiki/HTTP_403
The response is the same if I don't pass the api key in the header.
I can only think of three reasons this code is not working:
I have passed the wrong header name (i.e. it's not TRN-Api-Key)
My code is incorrect and I am not actually passing a header to the server
I have been banned
My problem is that I think my code is correct:
From the documentation:
urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
I have passed the url and I have passed the headers (wihout confusing with the data arguement). The api documentation also mentions it should be passed in the headers.
I am also quite sure I need to use the TRN-Api-Key as it is shown in the api documentation:
TRN-Api-Key: xxx
Also in this question (using Ruby):
header = {
key: "TRN-Api-Key: Somelong-api-key-here"
}
Or I have been banned (this is possible although I got the key 15 minutes ago) is there a way to check? Would this error be returned?
What is preventing me from getting the user statistics?
Try using requests, a pythonic, fast and widely used module.
import requests
url = 'https://api.fortnitetracker.com/v1/profile/pc/xantium0'
headers = {
'TRN-Api-Key' : 'xxx'
}
response = requests(url, headers=headers)
print('Requests was successful:', response.ok)
print(response.text)
If it doesn't work you can visit the url with your browser, then check the requests:
in Firefox press Cntrl+Shift+E, in Chrome Cntrl+E (or Inspect with Cntrl+Shift+I and then go to Network). Press on "https://api.fortnitetracker.com/v1/profile/pc/xantium0" and change the headers. On Firefox there's the button Modify and resend. Check the response and eventually, try to change the header api key name.
Hope this helps, let me know.

HTTP Error 307 - Temporary redirect in python script [duplicate]

I'm using Python 3.7 with urllib.
All work fine but it seems not to athomatically redirect when it gets an http redirect request (307).
This is the error i get:
ERROR 2020-06-15 10:25:06,968 HTTP Error 307: Temporary Redirect
I've to handle it with a try-except and manually send another request to the new Location: it works fine but i don't like it.
These is the piece of code i use to perform the request:
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type','application/json; charset=utf-8')
req.data=jdati
self.logger.debug(req.headers)
self.logger.info(req.data)
resp = urllib.request.urlopen(req)
url is an https resource and i set an header with some Authhorization info and content-type.
req.data is a JSON
From urllib documentation i've understood that the redirects are authomatically performed by the the library itself, but it doesn't work for me. It always raises an http 307 error and doesn't follow the redirect URL.
I've also tried to use an opener specifiyng the default redirect handler, but with the same result
opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler)
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type','application/json; charset=utf-8')
req.data=jdati
resp = opener.open(req)
What could be the problem?
The reason why the redirect isn't done automatically has been correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
Back to the question - given that data has been assigned, this automatically results in get_method returning POST (as per how this method was implemented), and since that the request method is POST, and the response code is 307, an HTTPError is raised instead as per the above specification. In the context of Python's urllib, this specific section of the urllib.request module raises the exception.
For an experiment, try the following code:
import urllib.request
import urllib.parse
url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello' # comment out to not trigger manual redirect handling
try:
resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
if e.status != 307:
raise # not a status code that can be handled here
redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
resp = urllib.request.urlopen(redirected_url)
print('Redirected -> %s' % redirected_url) # the original redirected url
print('Response URL -> %s ' % resp.url) # the final url
Running the code as is may produce the following
Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get
Note the subsequent redirect to get was done automatically, as the subsequent request was a GET request. Commenting out req.data assignment line will result in the lack of the "Redirected" output line.
Other notable things to note in the exception handling block, e.read() may be done to retrieve the response body produced by the server as part of the HTTP 307 response (since data was posted, there might be a short entity in the response that may be processed?), and that urljoin is needed as the Location header may be a relative URL (or simply has the host missing) to the subsequent resource.
Also, as a matter of interest (and for linkage purposes), this specific question has been asked multiple times before and I am rather surprised that they never got any answers, which follows:
How to handle 307 redirection using urllib2 from http to https
HTTP Error 307: Temporary Redirect in Python3 - INTRANET
HTTP Error 307 - Temporary redirect in python script

invalid response from proxy with python requests

I am using Requests API with Python2.7.
I am trying to download certain webpages through proxy servers. I have a list of available proxy servers. But not all proxy servers work as desired. Some proxies require authentication, others redirect to advertisement pages etc. In order to detect/verify incorrect responses, I have included two checks in my url requests code. It looks similar to this
import requests
proxy = '37.228.111.137:80'
url = 'http://www.google.ca/'
response = requests.get(url, proxies = {'http' : 'http://%s' % proxy})
if response.url != url or response.status_code != 200:
print 'incorrect response'
else:
print 'response correct'
print response.text
There are some proxy servers with which the requests.get call is successful and they pass these two conditions and still contain invalid html source in response.text attribute. However, if I use the same proxy in my FireFox browser and try to open the same webpage, I am displayed an invalid webpage, but my python script says that the response should be valid.
Can someone point to me that what other necessary checks I am missing to weed out incorrect html results?
or
How can I successfully verify if the webpage I intended to receive is correct?
Regards.
What is an "invalid webpage" when displayed by your browser? The server can return a HTTP status code of 200, but the content is an error message. You understand it to be an error message because you can comprehend it, a browser or code can not.
If you have any knowledge about the content of the target page, you could check whether the returned HTML contains that content and accept it on that basis.

In Python why does urllib.urlopen make Google give an http status "302 Moved"?

Using Python 2.6.6 on CentOS 6.4
import urllib
#url = 'http://www.google.com.hk' #ok
#url = 'http://clients1.google.com.hk' #ok
#url = 'http://clients1.google.com.hk/complete/search' #ok (blank)
url = 'http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc' #fails
print url
page = urllib.urlopen(url).read()
print page
Using the first 3 URLs, the code works. But with the 4th URL, Python gives the following 302:
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
The URL in my code is the same as the URL it tells me to use:
My URL: http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc
Its URL: http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc
Google says URL moved, but the URLs are the same. Any ideas why?
Update: The URLs all work fine in a browser. But in Python command line the 4th URL is giving a 302.
urllib is ignoring the cookies and sending the new request without cookies, so it causes a redirect loop at that URL. To handle this you can use urllib2 (which is more up-to-date) and add a cookie handler:
import urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
response = opener.open('http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc')
print response.read()
It most likely has to do with the headers and perhaps cookies. I did a quick test on the command-line using curl. It also gives me the 302 moved. The Location header it provides is different, as is the one in the document. If I follow the body URL I get a 204 response (weird). If I follow the Location header I end up getting a circular response like you indicate.
Perhaps important is the Set-Cookie header. It may be redirecting until it gets an appropriate cookie set. It may also be scanning the User-Agent and doing something based on that. Those are the big aspects that differentiate a browser from a tool like requests, or urlib. The browser creates sessions, stores cookies, and sends different headers.
I don't know why urllib fails (I get the same response), however requests lib works perfectly:
import requests
url = 'http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc' # fails
print (requests.get(url).text)
If you use your favorite web debugger (Fiddler for me) and open up that URL in your browser, you'll see that you also get that initial 302 response. Your browser is just smart enough to redirect you automatically. So your code is returning the correct response. If you want your code to redirect to the new URL automatically, then you have to make your code smart enough to do so.

Categories

Resources