After much reading here on Stack Overflow as well as around the web, I'm still struggling to get things working.
My challenge: to get access to a restricted part of a website of which I'm a member, using Python and urllib2.
From what I've read the code should be like this:
mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://www.domain.com'
mgr.add_password(None, url, 'username', 'password')
handler = urllib2.HTTPBasicAuthHandler(mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
try:
    response = urllib2.urlopen('http://www.domain.com/restrictedpage')
    page = response.read()
    print response.geturl()
except IOError, e:
    print e
The print doesn't show "http://www.domain.com/restrictedpage" but "http://www.domain.com/login", so my credentials aren't being stored/processed and I'm being redirected.
How can I get this to work? I've been trying for days and keep hitting the same dead ends. I've tried all the examples I could find to no avail.
My main question is: what's needed to authenticate to a website using Python and urllib2?
Quick question: what am I doing wrong?
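Worth knowing before debugging further: HTTPBasicAuthHandler only answers HTTP 401 Basic-auth challenges. A redirect to /login means the site uses a form-based login instead, so the credentials have to be POSTed to the login form and the session cookie kept. A minimal urllib2 sketch of that pattern, with hypothetical field names you'd read off the actual form:
import urllib
import urllib2
import cookielib

# A cookie jar keeps the session cookie that the login response sets.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Field names are hypothetical; inspect the real login form for them.
form_data = urllib.urlencode({'username': 'username', 'password': 'password'})
opener.open('http://www.domain.com/login', form_data)

# The jar now carries the session, so the restricted page should load.
response = opener.open('http://www.domain.com/restrictedpage')
print response.geturl()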
First check manually what really happens when you authenticate successfully (instructions for Chrome):
Open developer tools in Chrome (Ctrl + Shift + I)
Click the Network tab
Do the authentication manually (go to the page, type user + passwd + submit)
Check the POST request in the Network tab of the developer tools
Check the Request Headers, Query String Parameters and Form Data. There you'll find all the information you need for your own POST.
Then install the "Advanced Rest Client (ARC)" Chrome extension
Use ARC to construct a valid POST for authentication.
Now you know what to put in your headers and form data. Here's sample code, using Requests, that worked for me for one particular site:
import requests
USERNAME = 'user' # put correct username here
PASSWORD = 'password' # put correct password here
LOGINURL = 'https://login.example.com/'
DATAURL = 'https://data.example.com/secure_data.html'
session = requests.Session()
req_headers = {
    'Content-Type': 'application/x-www-form-urlencoded'
}
formdata = {
    'UserName': USERNAME,
    'Password': PASSWORD,
    'LoginButton': 'Login'
}
# Authenticate
r = session.post(LOGINURL, data=formdata, headers=req_headers, allow_redirects=False)
print r.headers
print r.status_code
print r.text
# Read data
r2 = session.get(DATAURL)
print "___________DATA____________"
print r2.headers
print r2.status_code
print r2.text
For HTTP Basic Auth you can refer to this article: http://www.voidspace.org.uk/python/articles/authentication.shtml
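For comparison: if a site uses plain HTTP Basic Auth rather than a login form, Requests makes it a one-liner. A quick sketch, reusing the constants from the snippet above:
import requests

# requests builds the Authorization header from the (user, password) tuple.
r = requests.get(DATAURL, auth=(USERNAME, PASSWORD))
print r.status_code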
I am writing a Python3 script to trigger an Axis camera via their HTTP API. The string they need in the url to trigger the Digital Input (with authorization) is:
"http://{ip}/axis-cgi/param.cgi?action=update&IOPort.I0.Input.Trig={open/closed}"
If I put this in a browser it works (response 200 OK). But when I run it on my Linux machine through urllib2, it doesn't:
#encode user,pass
values = { 'username': username,'password': password }
data = urllib.urlencode(values)
#Axis param.cgi action
action = "update"
trig = "closed"
cgi_args = {"action": action, "IOPort.I0.Input.Trig": trig}
cgi_data = urllib.urlencode(cgi_args)
url = "http://{ip}/axis-cgi/param.cgi?{data}".format(ip=ip, data=cgi_data)
req = urllib2.Request(url, data, headers={'Content-type': 'text/html'})
print(req.get_full_url())
response = urllib2.urlopen(req)
result = response.read()
print (result)
The output is:
http://192.168.50.191/axis-cgi/param.cgi?action=update&IOPort.I0.Input.Trig=closed
action must be specified
I know that I am authenticated, because otherwise I get an Unauthorized response from the server.
As you can see, for debugging I print req.get_full_url() to make sure I've built the url string correctly. However, the server responds with "action must be specified", which is what I get in the browser when I have just http://192.168.50.191/axis-cgi/param.cgi? in the address bar. So the action portion of the URL appears not to be making it to the server.
I've tried:
Using %3F in place of the ? character, but then I get a 404 error
Embedding the data in the data parameter does not work either
Anything I am missing here?
I used pycurl with Digest authorization for the GET request:
import pycurl

def configCurl():
    username = "user"
    password = "pass"
    auth_mode = pycurl.HTTPAUTH_DIGEST
    curl = pycurl.Curl()
    curl.setopt(pycurl.HTTPAUTH, auth_mode)
    curl.setopt(pycurl.USERPWD, "{}:{}".format(username, password))
    curl.setopt(pycurl.WRITEFUNCTION, lambda x: None)  # discard the response body
    return curl

curl = configCurl()
curl.setopt(curl.URL, url)
curl.perform()
curl.close()
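The WRITEFUNCTION above throws the response body away; if you need to read it, collect it in a buffer instead. A small variation of the same snippet:
from StringIO import StringIO
import pycurl

buf = StringIO()
curl = configCurl()
curl.setopt(curl.URL, url)
curl.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect instead of discard
curl.perform()
curl.close()
print buf.getvalue()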
I am trying to log in to a website using Python.
The login URL is :
https://login.flash.co.za/apex/f?p=pwfone:login
and the 'form action' url is shown as :
https://login.flash.co.za/apex/wwv_flow.accept
When I use 'Inspect Element' in Chrome while logging in manually, these are the form posts that show up (p_t02 = password). There are a few hidden items that I'm not sure how to add into the Python code below.
When I use this code, the login page is returned:
import requests
url = 'https://login.flash.co.za/apex/wwv_flow.accept'
values = {'p_flow_id': '1500',
          'p_flow_step_id': '101',
          'p_page_submission_id': '3169092211412',
          'p_request': 'LOGIN',
          'p_t01': 'solar',
          'p_t02': 'password',
          'p_checksum': ''}
r = requests.post(url, data=values)
print r.content
How can I adjust this code to perform a login?
This is more or less how your script should look. Use a session to handle the cookies automatically. Fill in the username and password fields manually.
import requests
from bs4 import BeautifulSoup

logurl = "https://login.flash.co.za/apex/f?p=pwfone:login"
posturl = 'https://login.flash.co.za/apex/wwv_flow.accept'

with requests.Session() as s:
    s.headers = {"User-Agent": "Mozilla/5.0"}
    res = s.get(logurl)
    soup = BeautifulSoup(res.text, "lxml")
    values = {
        'p_flow_id': soup.select_one("[name='p_flow_id']")['value'],
        'p_flow_step_id': soup.select_one("[name='p_flow_step_id']")['value'],
        'p_instance': soup.select_one("[name='p_instance']")['value'],
        'p_page_submission_id': soup.select_one("[name='p_page_submission_id']")['value'],
        'p_request': 'LOGIN',
        # the form has several p_arg_names fields; send them all
        'p_arg_names': [el['value'] for el in soup.select("[name='p_arg_names']")],
        'p_t01': 'username',
        'p_t02': 'password',
        'p_md5_checksum': soup.select_one("[name='p_md5_checksum']")['value'],
        'p_page_checksum': soup.select_one("[name='p_page_checksum']")['value']
    }
    r = s.post(posturl, data=values)
    print r.content
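To verify the login actually worked, note that requests follows redirects by default, so if the site bounces you back, the final URL gives it away:
    print r.url  # still inside the with block; should not be the login URL again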
Since I cannot recreate your case, I can't tell you exactly what to change, but when I was doing such things I used Postman to intercept all the requests my browser sends. I'd install that, along with its browser extension, and then perform the login. You can then view the request in Postman, along with the response it received; what's more, it provides you with Python code for the request, so you can simply copy and use it.
In short: use Postman, perform the login, clone the request.
I am having some trouble logging into this website: https://illinoisjoblink.illinois.gov/ada/r/home
I am able to submit the payload, but I get redirected to a page claiming a bookmark error. Here is the code and relevant error messages. I'm not sure how to proceed. I appreciate any and all help. Thanks!
import requests
from bs4 import BeautifulSoup

session = requests.Session()
soup = BeautifulSoup(session.get(SEARCH_URL).content, "html.parser")
inputs = soup.find_all('input')
token = ''
for t in inputs:
    try:
        if t['name'] == 'authenticity_token':
            token = t['value']
            break
    except KeyError:
        pass
login_data = dict(v_username=USER_NAME,
                  v_password=PASSWORD,
                  authenticity_token=token,
                  commit='Log In')
login_data['utf-8'] = '✓'
r = session.post(LOGIN_URL, data=login_data)
print(r.content)
Bookmark Error: You may be seeing this error as a result of bookmarking this page. Unfortunately, our site design will not allow the bookmarking of most internal pages. If you wish to contact the system administrator concerning this error, you may send an email to DES.IJLSysAdmSpprt@illinois.gov. Please reference error number 646389. Thank you for your patience. Hit your browser back button to return to the previous page.
Perhaps your login_data parameters are wrong? When I inspect the POST for logging in, the necessary parameters appear to be v_username, v_password, FormName, fromlogin and, maybe most importantly, button. I would suggest adding all of these parameters to your login_data dictionary, so that it looks something like:
login_data = {'v_username': USER_NAME,
              'v_password': PASSWORD,
              'authenticity_token': token,
              'FormName': 0,
              'fromlogin': 1,
              'button': 'Log+in'}
I'm trying to access data saved by the user, and it keeps returning a 403 error. This is the API endpoint:
http://www.reddit.com/dev/api#GET_user_{username}_saved
I'm thoroughly confused about what to send in my headers to make this request work, and the reddit documentation makes no mention of it at all. Help?
I'm using Python-requests library to do this.
Referring to line 686 in reddit's code in listingcontroller.py:
if (where in ('saved', 'hidden') and not
    ((c.user_is_loggedin and c.user._id == vuser._id) or
     c.user_is_admin)):
    return self.abort403()
you can clearly see that you must be logged in as username or be an admin in order to get the saved or hidden data - otherwise you get a 403 error.
As @zenpoy already mentioned (and as you already know), you have to be logged in. Therefore, you should save the cookie which you get as a response to a valid call to api/login. I've written some code which logs a user in and retrieves all saved things:
import urllib
import urllib2
import cookielib
import json
login_url = 'https://ssl.reddit.com/api/login/'
saved_url = 'https://ssl.reddit.com/user/<username>/saved.json'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
def login(username, passwd):
    values = {'user': username,
              'api_type': 'json',
              'passwd': passwd}
    data = urllib.urlencode(values)
    response = opener.open(login_url, data).read()
    print json.loads(response)

def retrieve_saved(username):
    url = saved_url.replace('<username>', username)
    response = opener.open(url).read()
    print json.loads(response)

login(<username>, <passwd>)
retrieve_saved(<username>)
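This works because the HTTPCookieProcessor stores the session cookie that reddit sets in the response to api/login and replays it on the saved.json request, so the second call runs as the logged-in user.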
I have to download a csv file from the AdWords Billing page during a Celery task, and I have no idea what's wrong with my implementation, so I need your help.
Log in:
browser = mechanize.Browser()
browser.open('https://accounts.google.com/ServiceLogin')
browser.select_form(nr=0)
browser['Email'] = g_email
browser['Passwd'] = g_password
browser.submit()
browser.set_handle_robots(False)
billing_resp = browser.open('https://adwords.google.com/')
So far so good: I'm on the billing page now. Next, I parsed the result page for the token and ids, analyzed the request headers and action url in the Chrome debugger, and now I want to make a POST request and receive my csv file. The response headers (in Chrome) are:
content-disposition:attachment; filename="myclientcenter.csv.gz"
content-length:307479
content-type:application/x-gzip; charset=UTF-8
With mechanize:
data = {
    '__u': effectiveUserId,
    '__c': customerId,
    'token': token,
}
browser.addheaders = [
    ('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('content-type', 'application/x-www-form-urlencoded'),
    ('accept-encoding', 'gzip,deflate,sdch'),
    ('user-agent', 'Mozilla/5.0'),
    ('referer', 'https://adwords.google.com/mcm/Mcm?__u=8183865359&__c=3069937889'),
    ('origin', 'https://adwords.google.com'),
]
browser.set_handle_refresh(True)
browser.set_debug_responses(True)
browser.set_debug_redirects(True)
browser.set_handle_referer(True)
browser.set_debug_http(True)
browser.set_handle_equiv(True)
browser.set_handle_gzip(True)
response = browser.open(
    'https://adwords.google.com/mcm/file/ClientSummary/',
    data='&'.join(['='.join(pair) for pair in data.items()]),
)
BUT! The Content-Length header of this response is 0, and there is no Content-Disposition. Why? And what can I do to make it work?
I also tried Requests, but could not even get past the login stage...
I now have the answer to my own question (thanks to my team lead).
The main mistake is in this incorrect request data:
data = {
    '__u': effectiveUserId,
    '__c': customerId,
    'token': token,
}
Let's try again with the proper solution.
# Open Google login page and log in.
browser = mechanize.Browser()
try:
    browser.open('https://accounts.google.com/ServiceLogin')
    browser.select_form(nr=0)
    browser['Email'] = 'email@adwords.login'
    browser['Passwd'] = 'password'
    browser.submit()
except HTTPError:
    raise AdWordsException("Can't find the Google login form")
We are logged in now and can go deeper.
try:
    browser.set_handle_robots(False)
    billing_resp = browser.open('https://adwords.google.com/')
except HTTPError:
    raise AdWordsException("Can't open AdWords dashboard page")

# Welcome to the AdWords billing dashboard. We can get a
# session-unique token from this page for the further POST request.
token_re = re.search(r"token:\'(.{41})\'", billing_resp.read())
if token_re is None:
    raise AdWordsException("Can't parse the token")
# It's time for some magic now. We have to construct proper mcsSelector
# serialized data structure. This is GWT-RPC wire protocol hell.
# Paste your specific version from web debugger.
MCS_TEMPLATE = (
"7|0|49|https://adwords.google.com/mcm/gwt/|18FBB090A5C26E56AC16C9DF0689E720|"
"com.google.ads.api.services.common.selector.Selector/1054041135|"
"com.google.ads.api.services.common.date.DateRange/1118087507|"
"com.google.ads.api.services.common.date.Date/373224763|"
"java.util.ArrayList/4159755760|java.lang.String/2004016611|ClientName|"
"ExternalCustomerId|PrimaryUserLogin|PrimaryCompanyName|IsManager|"
"SalesChannel|Tier|AccountSettingTypes|Labels|Alerts|CostWithCurrency|"
"CostUsd|Clicks|Impressions|Ctr|Conversions|ConversionRate|SearchCtr|"
"ContentCtr|BudgetAmount|BudgetStartDate|BudgetEndDate|BudgetPercentSpent|"
"BudgetType|RemainingBudget|ClientDateTimeZoneId|"
"com.google.ads.api.services.common.selector.OrderBy/524388450|"
"SearchableData|"
"com.google.ads.api.services.common.sorting.SortOrder/2037387810|"
"com.google.ads.api.services.common.pagination.Paging/363399854|"
"com.google.ads.api.services.common.selector.Predicate/451365360|"
"SeedObfuscatedCustomerId|"
"com.google.ads.api.services.common.selector.Predicate$Operator/2293561107|"
"java.util.Arrays$ArrayList/2507071751|[Ljava.lang.String;/2600011424|"
"3069937889|ExcludeSeeds|true|ClientTraversal|DIRECT|"
"com.google.ads.api.services.common.selector.Summary/3224078220|included|1|"
"2|3|4|5|"
"{report_date}|5|{report_date}" # take a note of this
"|6|26|7|8|7|9|7|10|7|11|7|12|7|13|7|14|7|15|7|16|7|17|7|18|7|19|7|20|7|21|"
"7|22|7|23|7|24|7|25|7|26|7|27|7|28|7|29|7|30|7|31|7|32|7|33|6|0|0|0|6|2|34|"
"35|36|0|34|9|-35|37|100|0|6|0|6|3|38|39|40|2|41|42|1|43|38|44|40|0|41|42|1|"
"45|38|46|-45|41|42|1|47|0|0|6|0|6|1|48|6|0|49|6|0|0|"
)
# To take stats for today
report_date = datetime.date.today()
mcs_selector = MCS_TEMPLATE.format(
    report_date='%s|%s|%s' % (
        report_date.day,
        report_date.month,
        report_date.year
    ),
)
data = urllib.urlencode({
    'token': token_re.group(1),
    'mcsSelector': mcs_selector,
})
# And... it finally works! The token and a proper mcsSelector are all we need.
# A POST request with this data returns a gzipped csv file with the current
# balance state and other info that's not available via the AdWords API.
zipped_csv = browser.open(
    'https://adwords.google.com/mcm/file/ClientSummary',
    data=data
)
# Unpack it and use as you wish.
with gzip.GzipFile(mode='r', fileobj=zipped_csv) as csv_io:
    try:
        csv_data = StringIO.StringIO(csv_io.read())
    except IOError:
        raise AdWordsException("Can't get CSV file from response")
    finally:
        browser.close()
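From there the unpacked file parses like any other CSV; a minimal sketch, assuming comma separators:
import csv

# csv_data is the StringIO object produced above.
for row in csv.reader(csv_data):
    print row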