I am trying to get an http response from a website using the requests module. I get status code 410 in my response:
<Response [410]>
From the documentation, a 410 (Gone) suggests that the resource has been intentionally made unavailable to clients. Is this indeed the case, or am I missing something? I am trying to confirm whether the page can be scraped at all:
import requests

url = 'http://www.b2i.us/profiles/investor/ResLibraryView.asp?ResLibraryID=81517&GoTopage=3&Category=1836&BzID=1690&G=666'
try:
    response = requests.get(url)
except requests.exceptions.RequestException as e:
    print(e)
Some websites don't respond well to HTTP requests that use 'python-requests' as the User-Agent string.
You can get a 200 OK response if you set the User-Agent header to 'Mozilla'.
import requests

url = 'http://www.b2i.us/profiles/investor/ResLibraryView.asp?ResLibraryID=81517&GoTopage=3&Category=1836&BzID=1690&G=666'
headers = {'User-Agent': 'Mozilla/5'}
response = requests.get(url, headers=headers)
print(response)
<Response [200]>
This works on Mac OS X, but I am having issues with the same approach on Windows, in a VMware virtual machine I run automated tasks from. Why would the behavior be different? Is there a separate workaround for Windows machines?
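One difference worth ruling out on the Windows VM is a system-wide proxy picked up from environment variables; by default requests honors HTTP_PROXY/HTTPS_PROXY, which a corporate VM image often sets. A minimal diagnostic sketch, reusing the URL above (the fuller User-Agent string is just an example value, not a requirement):

import requests

url = 'http://www.b2i.us/profiles/investor/ResLibraryView.asp?ResLibraryID=81517&GoTopage=3&Category=1836&BzID=1690&G=666'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}  # example browser-like UA

session = requests.Session()
session.trust_env = False  # ignore HTTP(S)_PROXY environment variables set on the VM

response = session.get(url, headers=headers, timeout=10)
print(response.status_code)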
Related
I am trying to make a REST request on Windows 7, but it is not executed by the Python code below.
The code works on Ubuntu, but not on Windows 7:
import requests
import json

def get_load_names(url='http://<ip>:5000/loads_list'):
    response = requests.get(url)
    if response.status_code == 200:
        jData = json.loads(response.content)
        print(jData)
    else:
        print('error', response)
If I paste the URL into a browser, I can see the expected output, so I assume the problem has something to do with the firewall.
I created rules to open port 5000 for inbound and outbound traffic, but no luck so far.
Unless you have a very specific reason to write your own error handling, you should use the built-in raise_for_status():
import requests
import json
response = requests.get('http://<ip>:5000/loads_list')
response.raise_for_status()
jData = json.loads(response.text)
print(jData)
This will hopefully raise an informative error message that you can deal with.
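If you still want to handle the failure yourself, you can catch the exception that raise_for_status() raises. A minimal sketch, keeping the <ip> placeholder from the question:

import requests

try:
    response = requests.get('http://<ip>:5000/loads_list', timeout=5)
    response.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx responses
    print(response.json())       # .json() parses the JSON body for you
except requests.exceptions.HTTPError as err:
    print('HTTP error:', err)
except requests.exceptions.ConnectionError as err:
    print('could not reach the server:', err)  # e.g. blocked by a firewall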
I have a web bot that is trying to get a cookie.
The flow goes:
I fetch the captcha and a csrftoken (cookie).
I solve the captcha and send the solution to the server.
They send back the session id.
The session id comes back as a response cookie, but I don't seem to receive it in Python.
The POST request to the server looks like this:
import requests

cookies = {'csrftoken': 'h1239phtluwrane'}
headers = {'foo': 'bar'}

session = requests.Session()
r = session.post(URL, headers=headers, data=data, cookies=cookies)
try:
    cookies['sessionid'] = session.cookies['sessionid']
except KeyError:
    print("Error getting correct cookie. %s" % session.cookies)
Afterwards, session.cookies contains only the csrftoken I sent as a request cookie, but no response cookie is to be found.
On another note: this exact same code used to work, but it suddenly stopped working even though I did not edit it. I verified that the server methods did not change.
To get your response cookies do:
print(r.json()['cookies'])
#{'tasty_cookie': 'yum'}
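Note that r.json()['cookies'] only works if the server echoes cookies back in its JSON body (as httpbin.org does). For cookies the server sets via Set-Cookie headers, look at the response object and the session jar directly. A sketch, reusing the URL, headers and data names from the question's snippet:

import requests

session = requests.Session()
r = session.post(URL, headers=headers, data=data, cookies={'csrftoken': 'h1239phtluwrane'})

print(r.cookies.get_dict())        # cookies set by this particular response (Set-Cookie)
print(session.cookies.get_dict())  # cookies accumulated across the whole session, incl. redirects

# If 'sessionid' is missing from both, the server never sent it; check whether the
# POST was rejected or redirected before the cookie was issued.
print(r.status_code, r.history)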
I am trying to get a response from an internal URL that I can access from my laptop using a web browser.
import requests

s = requests.Session()
r = s.get(url_1, auth=auth, verify=False)
print(r.text)
The reply I get is: 401 Unauthorized.
It's obviously going to be difficult to debug an HTTP 401 Unauthorized, since we don't have access to the internal URL. Your code looks correct to me, so I'm assuming this is a genuine 401, which means the request has incorrect authentication credentials. My advice is to review the Python Requests docs on authentication, and to consider that your request is probably going through a proxy, in which case the Requests docs on proxy configuration may also help.
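For reference, the two most common fixes look roughly like this; the credentials, URL and proxy address below are placeholders, and whether your server expects Basic or Digest auth is an assumption you will need to confirm:

import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

url = 'http://internal.example/api'  # placeholder for the internal URL
proxies = {
    'http': 'http://proxy.example:8080',   # placeholder proxy, if one sits in between
    'https': 'http://proxy.example:8080',
}

s = requests.Session()

# Try Basic auth first; many internal services expect Digest instead.
r = s.get(url, auth=HTTPBasicAuth('user', 'pass'), proxies=proxies, verify=False)
if r.status_code == 401:
    r = s.get(url, auth=HTTPDigestAuth('user', 'pass'), proxies=proxies, verify=False)

print(r.status_code)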
I am using the Requests API with Python 2.7.
I am trying to download certain webpages through proxy servers. I have a list of available proxy servers, but not all of them work as desired: some require authentication, others redirect to advertisement pages, and so on. To detect incorrect responses, I have included two checks in my request code. It looks similar to this:
import requests

proxy = '37.228.111.137:80'
url = 'http://www.google.ca/'
response = requests.get(url, proxies={'http': 'http://%s' % proxy})
if response.url != url or response.status_code != 200:
    print('incorrect response')
else:
    print('response correct')
    print(response.text)
For some proxy servers the requests.get call succeeds and both conditions pass, yet response.text still contains invalid HTML. If I use the same proxy in Firefox and open the same webpage, I am shown an invalid page, but my Python script says the response is valid.
Can someone point out what other checks I am missing to weed out incorrect HTML results?
Or: how can I verify that the webpage I received is actually the one I intended to get?
Regards.
What is an "invalid webpage" when displayed by your browser? The server can return an HTTP status code of 200 while the content is actually an error message. You recognize it as an error message because you can comprehend it; a browser or your code cannot.
If you have any knowledge about the content of the target page, you could check whether the returned HTML contains that content and accept it on that basis.
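As a sketch of that idea: pick a string you know appears on the genuine page and reject responses that don't contain it. The marker below ('Google') is just an example and should be replaced with text specific to your target page:

import requests

proxy = '37.228.111.137:80'
url = 'http://www.google.ca/'
expected_marker = 'Google'  # example marker; use content you know is on the real page

response = requests.get(url, proxies={'http': 'http://%s' % proxy}, timeout=10)

if response.status_code != 200 or response.url.rstrip('/') != url.rstrip('/'):
    print('incorrect response')
elif expected_marker not in response.text:
    print('unexpected content')  # 200 OK, but the body is an ad or proxy error page
else:
    print('response correct')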
I am using the Python requests library for a POST request, and I expect a response with an empty payload. I am interested in the headers of the response, specifically the Location header. I tried the following code:
import requests

response = requests.request(method='POST', url=url, headers={'Content-Type': 'application/json'}, data=data)
print(response.headers)              # displays a case-insensitive dict of headers
print(response.headers['Location'])  # raises a KeyError
Strangely, the Location header is missing from the headers map. If I try the same POST request in Postman, I do get a valid Location header. Has anyone else seen this? Is this a bug in the requests library?
It sounds like everything is working as expected. Check your response.history.
From the Requests documentation:
Requests will automatically perform location redirection for all verbs except HEAD.
>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]
From the HTTP Location page on wikipedia:
The HTTP Location header field is returned in responses from an HTTP server under two circumstances:
To ask a web browser to load a different web page. In this circumstance, the Location header should be sent with an HTTP status code of 3xx. It is passed as part of the response by a web server when the requested URI has:
Moved temporarily, or
Moved permanently
To provide information about the location of a newly-created resource. In this circumstance, the Location header should be sent with an HTTP status code of 201 or 202.
The requests library follows redirections automatically.
To take a look at the redirections, look at the history of the requests. More details in the docs.
Or pass allow_redirects=False when making the request.
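A short sketch of both options, reusing the url and data placeholders from the question:

import requests

# Option 1: don't follow the redirect, so the Location header stays on this response.
response = requests.request(method='POST', url=url,
                            headers={'Content-Type': 'application/json'},
                            data=data, allow_redirects=False)
print(response.status_code)              # e.g. 201 or a 3xx
print(response.headers.get('Location'))

# Option 2: let requests follow the redirect and read the header from the
# intermediate responses kept in .history.
response = requests.request(method='POST', url=url,
                            headers={'Content-Type': 'application/json'},
                            data=data)
for hop in response.history:
    print(hop.status_code, hop.headers.get('Location'))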