Running GET with SSL and authentication in Python - python

I can download things from my controlled server in one way - by passing the document ID into a link like so :
https://website/deployLink/442/document/download/$NUMBER
If I navigate to this in my browser, it downloads the file with ID $NUMBER.
The problem is, I have 9,000 files on my server, which is SSL encrypted and usually requires signing in with a username/password on a dialog box popup which appears on the web-page.
I posted a similar thread to this already, where I downloaded the files via WGET. Now I would like to try and use Python, and I'd like to provide the username/password and get through the SSL encryption.
Here is my attempt to grab one file, which results in a 401 error. Full stacktrace below.
import urllib2
import ctypes
from HTMLParser import HTMLParser
# create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
opener = urllib2.build_opener(handler)
# Install the opener.
# Now all calls to urllib2.urlopen use our opener.
urllib2.install_opener(opener)
# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
html = response.read()
class MyHTMLParser(HTMLParser):
url=''https://website/deployLink/442/document/download/1')'
# Save the file
webpage = urllib2.urlopen(url)
with open('Test.doc','wb') as localFile:
localFile.write(webpage.read())
What have I done incorrectly here? Is what I am attempting possible?
C:\Python27\python.exe C:/Users/ADMIN/PycharmProjects/GetFile.py
Traceback (most recent call last):
File "C:/Users/ADMIN/PycharmProjects/GetFile.py", line 22, in <module>
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
File "C:\Python27\lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 437, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 475, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Processed
Process finished with exit code 1
Here's my authent page with some info removed for privacy :
Authent url ends in :443.

Assuming your code above is accurate, then I think your problem is related to the URIs in your add_password method. You have this when setting up the username/password:
# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
And then your subsequent request goes to this URI:
# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
(I'm assuming they've been "scrubbed" incorrectly, and they should be the same, and just move on. See: "website" vs. "website.com")
The second URI is not a child of the first URI based on their respective path portions. The URI path /deployLink/442/document/download/1 is not a child of /home.html. From the perspective of the library, you'd have no auth data for the second URI.

Related

Error while accessing workdy RAAS report through python

I am trying to access the workday report through python. i am able to access this through browser with userid and passwd. But when i run through python i am getting the below error.
import os
import platform
import ssl
import urllib.request
import urllib.parse
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS)
ssl_context.verify_mode = ssl.CERT_REQUIRED
ssl_context.check_hostname = True
ssl_context.load_default_certs()
if platform.system().lower() == 'windows':
import certifi
print(os.path.relpath(certifi.where()))
ssl_context.load_verify_locations(
#cafile=os.path.relpath(certifi.where()),
cafile="C:\\abc_Tools\\TDX_INT166\\Lib\\site-packages\\certifi\\workday.pem",
capath=None,
cadata=None)
print(platform.system().lower())
url = 'https://wd5-impl-services1.workday.com/ccx/service/customreport2/xxx/ISU_INT167/CR_-
_FIN_Report'
username = 'XXXXXXXXX' # 10 digit ID
password = 'XXXXXXXXX'
values ={'username' : username, 'password':password}
data = urllib.parse.urlencode(values).encode("utf-8")
#cookies = cookielib.CookieJar()
https_handler = urllib.request.HTTPSHandler(context=ssl_context)
opener = urllib.request.build_opener(https_handler)
ret = opener.open(url, timeout=2)
print(ret)
I am getting the below error.
site-packages\certifi\cacert.pem
windows
Traceback (most recent call last):
File "C:\SLU_Tools\TDX_INT166\Lib\Certificate_testing.py", line 38, in <module>
ret = opener.open(url, timeout=2)
File
"C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py",
line 523, in open
response = meth(req, response)
File
"C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py",
line 632, in http_response
response = self.parent.error(
File
"C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py",
line 561, in error
return self._call_chain(*args)
File
"C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py",
line 494, in _call_chain
result = func(*args)
File
"C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py",
line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
I tried other ways as well. But nothing worked. so far. Is this something doable?
When calling a RAAS this way, you need to authenticate using Basic Auth. The username/password needs to be Base64 encoded and the result is added as an HTTP header. See this answer for an example.
You will need to set up the authorizations using the Basic Auth type, and you will need to encode your username/password following the Basic Auth schemes -- converting the credentials to base64 first, then back to a string format and using it in the authorization string.
I used code similar to the snippet below in our web applications:
import requests, base64
url = '<Workday_RaaS_API_Endpoint>'
username = '<your_RaaS_user_username>'
password = '<your_RaaS_user_password>' # replace with the password here
auth = 'Basic %s' % base64.b64encode(bytes('%s:%s' % (username, password), 'utf-8')).decode('utf-8')
headers = { 'Authorization': auth }
res = requests.get(url=url, headers=headers)
print(res.json())

"HTTP Error 401: Unauthorized" when querying youtube api for playlist with python

I try to write a simple python3 script that gets some playlist informations via the youtube API. However I always get a 401 Error whereas it works perfectly when I enter the request string in a browser or making a request with w-get. I'm relatively new to python and I guess I'm missing some important point here.
This is my script. Of course I actually use a real API-Key.
from urllib.request import Request, urlopen
from urllib.parse import urlencode
api_key = "myApiKey"
playlist_id = input('Enter playlist id: ')
output_file = input('Enter name of output file (default is playlist id')
if output_file == '':
output_file = playlist_id
url = 'https://www.googleapis.com/youtube/v3/playlistItems'
params = {'part': 'snippet',
'playlistId': playlist_id,
'key': api_key,
'fields': 'items/snippet(title,description,position,resourceId/videoId),nextPageToken,pageInfo/totalResults',
'maxResults': 50,
'pageToken': '', }
data = urlencode(params)
request = Request(url, data.encode('utf-8'))
response = urlopen(request)
content = response.read()
print(content)
Unfortunately it rises a error at response = urlopen(request)
Traceback (most recent call last):
File "gpd-helper.py", line 35, in <module>
response = urlopen(request)
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.4/urllib/request.py", line 499, in error
return self._call_chain(*args)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
I looked up the documentation but couldn't find any hint. According to the docs other authentication than the api key is not required for listing a public playlist.
After diving deeper into the docs of python and google I found the solution to my problem.
Pythons Request object automatically creates a POST request when the data parameter is given but the youtube api expects GET (with post params)
The Solution is to ether supply the GET argument for the method parameter in python 3.4
request = Request(url, data.encode('utf-8'), method='GET')
or concatenate the url with the urlencoded post data
request = Request(url + '?' + data)

Logging into a Coursera account Using Python

I had learned a lot of things from MOOCs so I wanted to return something back to them for this purpose I was thinking of designing a small app in kivy which thus requires python implementation, Actually the thing I wanted to achieve was to log in to my Coursera account via program and collect the information about the courses I am currently pursuing, for this first I have to log in to the coursera( https://accounts.coursera.org/signin?post_redirect=https%3A%2F%2Fwww.coursera.org%2F ), Upon searching the Web I came across this piece of code :
import urllib2, cookielib, urllib
username = "abcdef#abcdef.com"
password = "uvwxyz"
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
info = opener.open("https://accounts.coursera.org/signin",login_data)
for line in info:
print line
and some similar codes as well, but none worked for me, every approach lead to me this type of error:
Traceback (most recent call last):
File "C:\Python27\Practice\web programming\coursera login.py", line 9, in <module>
info = opener.open("https://accounts.coursera.org/signin",login_data)
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found
Is the error due to https protocol or there is something that I am missing?
I don't want to use any 3rd party libraries.
I'm using requests for this purpose and I think it is a great python library. Here is some example code how it could work:
import requests
from requests.auth import HTTPBasicAuth
credentials = HTTPBasicAuth('username', 'password')
response = requests.get("https://accounts.coursera.org/signin", auth=credentials)
print response.status_code
# if everything was fine then it prints
>>> 200
Here is the link to requests:
http://docs.python-requests.org/en/latest/
I think you need to use HTTPBasicAuthHandler module of urllib2. Check section 'Basic Authentication'. https://docs.python.org/2/howto/urllib2.html
And I strongly recommend you requests module. It will make your code better. http://docs.python-requests.org/en/latest/

python automatic re-accessing without changing cookie

I have a problem with accessing specific web site.
The Web site automatically redirect to Check Page which is displaying "check your Browser"
The Check page returns HTTP 503 errors in first time.
Then web browser(chrome, IE etc) automatically re-access again.
Finally I can get into web site.
The problem is I want to access to site in Python.
So I use urllib and urllib2 both.
u = urllib.open(url)
print u.read()
Same with urllib2, but it doesn't work raising 503 error.
urllib also get HTTP 503 code but it doesn't raise error.
So I need to re-access without changing cookie
u = urllib.open(url)
u = urllib.open(url) ## cookie is changed
print u.read()
Simply I tried to call open function twice. But cookie is changed and it doesn't work
(Check Page Again)
So I use urllib2 with cooklib
import os.path
cj = None
ClientCookie = None
cookielib = None
import cookielib
import urllib2
cj = cookielib.LWPCookieJar()
if os.path.isfile('cookie.lpw'):
cj.load('cookie.lpw')
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
theurl = url
txdata = None
txheaders = {'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
req = urllib2.Request(theurl, txdata, txheaders)
handle = urllib2.urlopen(req) ## error raised
Error Code Here
Traceback (most recent call last):
File "<pyshell#20>", line 1, in <module>
handle = urlopen(req)
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 503: Service Temporarily Unavailable
Simply I want to re-access the site when got HTTP 503 error without change cookies.
But I don't know how to do it.
Somebody help me please.

Get request Activities for strava v3 api python

edit - as I couldn't tag this with Strava here are the docs if you're interested - http://strava.github.io/api/
I get through the authentication fine and gain the access_token (and my athlete info) in a response.read.
I'm having problems with the next step:
I want to return information about a specific activity.
import urllib2
import urllib
access_token = str(tp[3]) #this comes from the response not shown
print access_token
ath_url = 'https://www.strava.com/api/v3/activities/108838256'
ath_val = values={'access_token':access_token}
ath_data = urllib.urlencode (ath_val)
ath_req = urllib2.Request(ath_url, ath_data)
ath_response = urllib2.urlopen(ath_req)
the_page = ath_response.read()
print the_page
error is
Traceback (most recent call last):
File "C:\Users\JordanR\Python2.6\documents\strava\auth.py", line 30, in <module>
ath_response = urllib2.urlopen(ath_req)
File "C:\Users\JordanR\Python2.6\lib\urllib2.py", line 124, in urlopen
return _opener.open(url, data, timeout)
File "C:\Users\JordanR\Python2.6\lib\urllib2.py", line 389, in open
response = meth(req, response)
File "C:\Users\JordanR\Python2.6\lib\urllib2.py", line 502, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\JordanR\Python2.6\lib\urllib2.py", line 427, in error
return self._call_chain(*args)
File "C:\Users\JordanR\Python2.6\lib\urllib2.py", line 361, in _call_chain
result = func(*args)
File "C:\Users\JordanR\Python2.6\lib\urllib2.py", line 510, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found
the 404 is a puzzle as I know this activity exists?
Is 'access_token' the correct header information?
the documentation (http://strava.github.io/api/v3/activities/#get-details) uses Authorization: Bearer ? I'm not sure how liburl would encode the Bearer part of the info?
sorry if some of my terminology is a bit off, I'm new to this.
answered this bad boy.
import requests as r
access_token = tp[3]
ath_url = 'https://www.strava.com/api/v3/activities/108838256'
header = {'Authorization': 'Bearer 4b1d12006c51b685fd1a260490_example_jklfds'}
data = r.get(ath_url, headers=header).json()
it needed the "Bearer" part adding in the Dict.
thanks for the help idClark
I prefer using the third-party Requests module. You do indeed need to follow the documentation and use the Authorization: Header documented in the API
Requests has an arg for header data. We can then create a dict where the key is Authorization and its value is a single string Bearer access_token
#install requests from pip if you want
import requests as r
url = 'https://www.strava.com/api/v3/activities/108838256'
header = {'Authorization': 'Bearer access_token'}
r.get(url, headers=header).json()
If you really want to use urllib2
#using urllib2
import urllib2
req = urllib.Request(url)
req.add_header('Authorization', 'Bearer access_token')
resp = urllib2.urlopen(req)
content = resp.read()
Just remember access_token needs to be the literal string value, eg acc09cds09c097d9c097v9

Categories

Resources