I need to print the output I get when accessing a URL that requires a username and password. When I open the URL in my browser, I get a popup where I enter the credentials, and the output appears in the browser. How do I do this from a Python script? I tried the following, but it only prints <Response [200]>, which means the request is successful. The output I want is a simple text message.
import requests
response = requests.get(url, auth=(username, password))
print response
I have also tried requests.post, with the same results.
print response prints the Response object itself, not its body. If you want the text of the response, use print response.text.
You may want to read the Quickstart documentation for the python-requests library here: http://docs.python-requests.org/en/latest/user/quickstart/.
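A minimal sketch, assuming url, username, and password are defined as in your question:
import requests

response = requests.get(url, auth=(username, password))
print response.text         # the body of the response as a string
print response.status_code  # the integer status code, 200 on success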
I am attempting to use requests to obtain information from a website. The problem is that this website requires a homepage to be open in one tab while the information I need is open in another. If I close that homepage, the page I need no longer keeps my login session. How can I imitate having two tabs open to prevent this issue? Example:
session = requests.Session()
payload = {'username':'username_here','password':'password_here'}
webSession = session.post("http://website.com/login", data=payload)
webSession2 = session.get("https://website.com/home/home-page")
webSession3 = session.get("https://website.com/Reports/1234")
webSession returns <Response [200]>
webSession2 also returns <Response [200]>, implying my login was successful.
webSession3 returns <Response [401]>, implying I'm no longer logged in.
How can I get webSession3 to return the information I want?
You're already creating a Session object. Use it to get cookies after your first request (probably homepage in your case).
cookies = session.cookies.get_dict()
Now pass cookies in your subsequent request(s):
response = session.get("https://website.com/Reports/1234", cookies=cookies)
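Putting it together, a minimal sketch of the whole flow, using the same placeholder URLs as your question:
import requests

session = requests.Session()
payload = {'username': 'username_here', 'password': 'password_here'}
session.post("http://website.com/login", data=payload)
session.get("https://website.com/home/home-page")  # opens the "homepage tab" in the same session

cookies = session.cookies.get_dict()  # cookies gathered by the login and homepage requests
report = session.get("https://website.com/Reports/1234", cookies=cookies)
print report.status_code  # should now be 200 instead of 401
Note that a Session already re-sends its stored cookies on every request; passing cookies explicitly, as above, just makes that dependency visible.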
I am testing the Python library requests to see if it is suitable for my work. Here is my sample code for reference:
import requests
url = "http://www.genenetwork.org/webqtl/main.py?cmd=sch&gene=Grin2b&tissue=hip&format=text"
print url
print requests.get(url)
My Output:
http://www.genenetwork.org/webqtl/main.py?cmd=sch&gene=Grin2b&tissue=hip&format=text
<Response [200]>
Output that I get from my browser & my expected result:
What makes the difference? How can I get my expected result? I want to process the data inside the webpage.
Your code is currently printing the Response object of your GET request, whose repr only shows the status code. You can access the requested content via the text attribute of the Response object returned by the get method.
import requests
r = requests.get("http://www.genenetwork.org/webqtl/main.py?cmd=sch&gene=Grin2b&tissue=hip&format=text")
print r.text
I am using the Requests library with Python 2.7.
I am trying to download certain webpages through proxy servers. I have a list of available proxy servers, but not all of them work as desired: some proxies require authentication, others redirect to advertisement pages, etc. In order to detect and reject incorrect responses, I have included two checks in my URL request code. It looks similar to this:
import requests
proxy = '37.228.111.137:80'
url = 'http://www.google.ca/'
response = requests.get(url, proxies = {'http' : 'http://%s' % proxy})
if response.url != url or response.status_code != 200:
    print 'incorrect response'
else:
    print 'response correct'
    print response.text
There are some proxy servers with which the requests.get call succeeds and passes both conditions, yet response.text still contains invalid HTML. Moreover, if I use the same proxy in my Firefox browser and open the same webpage, I am shown an invalid page, even though my Python script says the response is valid.
Can someone point out what other checks I am missing to weed out incorrect HTML results?
or
How can I successfully verify that the webpage I received is the one I intended to get?
Regards.
What is an "invalid webpage" when displayed by your browser? The server can return an HTTP status code of 200 while the content is an error message. You understand it to be an error message because you can comprehend it; a browser or your code cannot.
If you have any knowledge about the content of the target page, you could check whether the returned HTML contains that content and accept it on that basis.
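For example, a minimal sketch of such a content check. The marker string 'Google' here is an assumption; replace it with whatever you know must appear on your target page:
import requests

proxy = '37.228.111.137:80'
url = 'http://www.google.ca/'
response = requests.get(url, proxies={'http': 'http://%s' % proxy})

# Accept the page only if the status, the final URL, and a known content marker all check out.
if response.status_code == 200 and response.url == url and 'Google' in response.text:
    print 'response correct'
else:
    print 'incorrect response'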
I am working on a tool that queries a number of APIs, one of which is a RESTful API. All of the other functions (API calls) of my program work fine with requests.get(), but with the REST API I do not seem to be able to access the actual content of the response, only the status code; i.e. when I simply print the response (not response.status_code), I get <Response [200]> output to the screen. Any ideas?
Snippet of code:
# The URL is correct in my program, for sure.
url = 'http://APIurl/%s' % entry
try:
    response = requests.get(url)
    # prints <Response [200]>
    print response
    # Fails, expecting JSON that isn't there
    results.append(response.json())
Print the response object's attributes to see what is available:
print response.__dict__
response.text is your friend if the content is not valid JSON.
You need to print response.content or response.text. Your data is probably there.
Sometimes when your request is wrong, the API returns a whole error page (in HTML). So if you're getting a bunch of HTML, make sure your request parameters are OK.
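A minimal sketch combining these suggestions, with url and results as in your snippet:
import requests

response = requests.get(url)
try:
    results.append(response.json())  # works when the body really is JSON
except ValueError:
    # Not JSON -- print the raw body to see what the API actually returned.
    print response.text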
I want to fetch the page http://192.168.1.1/basic/home_dhcplist.htm from the router, but it asks for a username and password at the start. I am fetching the page in Python through urllib2:
import urllib2

response = urllib2.urlopen('http://192.168.1.1/basic/home_dhcplist.htm')
html = response.read()
str = "Prasads"
value = html.find(str)
print value
if value != -1:
    print "found"
else:
    print "not found"
response.close()
Every home router I have seen uses basic auth for authentication. This is simply another header that you send along with the request: each time you request a page, the username and password are sent as headers to the server, where they are verified on every request.
I would suggest the requests library over urllib2.
import requests

r = requests.get('http://192.168.1.1/basic/home_dhcplist.htm', auth=('username', 'password'))
if 'Prasads' in r.text:
    print "found"
else:
    print "not found"
Basically, you most probably need to set the cookie which maintains the session. Access the page via a browser (Firefox) and enter the login and password when prompted. Then press Ctrl-Shift-K, reload the page, and click on any of the most recent GET requests; you'll get a window showing the GET request details. Note the Request Headers and set the cookie accordingly.
The key-value pair that will be the most useful is Authorization.
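For basic auth, that Authorization header can also be built by hand; a minimal sketch (the credentials are placeholders):
import base64
import requests

# Basic auth is just 'Basic ' + base64('username:password') in the Authorization header.
token = base64.b64encode('username:password')
headers = {'Authorization': 'Basic %s' % token}
r = requests.get('http://192.168.1.1/basic/home_dhcplist.htm', headers=headers)
print r.status_code
This is what requests does under the hood when you pass auth=('username', 'password').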