Usually I've been able to get around 403 errors by adding a known User-Agent, but I'm now trying to log in and then eventually scrape, and I cannot figure out how to get past this error.
Code:
import urllib
import http.cookiejar
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
authentication_url = 'https://www.linkedin.com/'
payload = {
    'session_key': 'email',
    'session_password': 'password'
}
data = urllib.parse.urlencode(payload)
binary_data = data.encode('UTF-8')
req = urllib.request.Request(authentication_url, binary_data)
resp = urllib.request.urlopen(req)
contents = resp.read()
Traceback:
Traceback (most recent call last):
File "C:/Python34/loginLinked.py", line 16, in <module>
resp = urllib.request.urlopen(req)
File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 469, in open
response = meth(req, response)
File "C:\Python34\lib\urllib\request.py", line 579, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python34\lib\urllib\request.py", line 507, in error
return self._call_chain(*args)
File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
See my answer to this question:
why isn't Requests not signing into a website correctly?
I should start by saying that you really should use their API:
http://developer.linkedin.com/apis
There does not seem to be any POST login on the LinkedIn front page that uses those parameters.
This is the login URL you must POST to:
https://www.linkedin.com/uas/login-submit
Be aware that this probably won't work either, as you need at least the csrfToken parameter from the login form.
You probably need the loginCsrfParam too, also from the login form on the front page.
Something like this might work. It is not tested, and you might need to add the other POST parameters.
import requests

s = requests.session()

def get_csrf_tokens():
    # Fetch the front page and pull both CSRF values out of the login form.
    url = "https://www.linkedin.com/"
    html = s.get(url).text
    csrf_token = html.split('name="csrfToken" value="')[1].split('" id="')[0]
    login_csrf_token = html.split('name="loginCsrfParam" value="')[1].split('" id="')[0]
    return csrf_token, login_csrf_token

def login(username, password):
    url = "https://www.linkedin.com/uas/login-submit"
    csrfToken, loginCsrfParam = get_csrf_tokens()
    data = {
        'session_key': username,
        'session_password': password,
        'csrfToken': csrfToken,
        'loginCsrfParam': loginCsrfParam,
    }
    # The session object keeps any cookies set by the login response.
    return s.post(url, data=data)

login('username', 'password')
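The string splitting above depends on the exact attribute order in the page. A slightly more tolerant way to pull hidden fields out of a login form is the standard library's html.parser; the sketch below shows the idea. The sample markup, the class name HiddenInputParser, and the helper extract_hidden_fields are invented for illustration. Only the field names csrfToken and loginCsrfParam come from the answer, and the real page may well differ.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and "name" in attributes:
            # A hidden input without a value attribute is stored as "".
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the given HTML."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Made-up markup that only mimics the shape of a login form.
SAMPLE_FORM = """
<form action="/uas/login-submit" method="POST">
  <input type="text" name="session_key">
  <input type="hidden" name="csrfToken" value="ajax:123" id="csrfToken-login">
  <input type="hidden" name="loginCsrfParam" value="abc-def" id="loginCsrfParam-login">
</form>
"""

hidden_fields = extract_hidden_fields(SAMPLE_FORM)
```

The resulting dict could be merged into the POST data together with session_key and session_password, so that any other hidden parameters the form carries are sent along as well.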
I am trying to access a Workday report through Python. I can access it in a browser with my user ID and password, but when I run it through Python I get the error below.
import os
import platform
import ssl
import urllib.request
import urllib.parse
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS)
ssl_context.verify_mode = ssl.CERT_REQUIRED
ssl_context.check_hostname = True
ssl_context.load_default_certs()
if platform.system().lower() == 'windows':
    import certifi
    print(os.path.relpath(certifi.where()))
    ssl_context.load_verify_locations(
        #cafile=os.path.relpath(certifi.where()),
        cafile="C:\\abc_Tools\\TDX_INT166\\Lib\\site-packages\\certifi\\workday.pem",
        capath=None,
        cadata=None)
print(platform.system().lower())
url = 'https://wd5-impl-services1.workday.com/ccx/service/customreport2/xxx/ISU_INT167/CR_-_FIN_Report'
username = 'XXXXXXXXX' # 10 digit ID
password = 'XXXXXXXXX'
values ={'username' : username, 'password':password}
data = urllib.parse.urlencode(values).encode("utf-8")
#cookies = cookielib.CookieJar()
https_handler = urllib.request.HTTPSHandler(context=ssl_context)
opener = urllib.request.build_opener(https_handler)
ret = opener.open(url, timeout=2)
print(ret)
I am getting the below error.
site-packages\certifi\cacert.pem
windows
Traceback (most recent call last):
File "C:\SLU_Tools\TDX_INT166\Lib\Certificate_testing.py", line 38, in <module>
ret = opener.open(url, timeout=2)
File "C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 523, in open
response = meth(req, response)
File "C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 632, in http_response
response = self.parent.error(
File "C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 561, in error
return self._call_chain(*args)
File "C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Users\prasannakumaravel\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
I tried other approaches as well, but nothing has worked so far. Is this doable?
When calling a RaaS report this way, you need to authenticate using Basic Auth. The username and password are joined as username:password, Base64 encoded, and the result is added as an HTTP header. See this answer for an example.
You will need to set up the authorization using the Basic Auth type and encode your username and password following the Basic Auth scheme: convert the credentials to Base64 first, then decode the result back to a string and use it in the Authorization header.
I used code similar to the snippet below in our web applications:
import requests, base64
url = '<Workday_RaaS_API_Endpoint>'
username = '<your_RaaS_user_username>'
password = '<your_RaaS_user_password>' # replace with the password here
auth = 'Basic %s' % base64.b64encode(bytes('%s:%s' % (username, password), 'utf-8')).decode('utf-8')
headers = { 'Authorization': auth }
res = requests.get(url=url, headers=headers)
print(res.json())
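Since the question uses urllib.request rather than requests, the same header can be attached to a plain Request object. This is only a sketch: the helper name build_basic_auth_request is made up, and the URL and credentials in the example call are placeholders, not real Workday values.

```python
import base64
import urllib.request


def build_basic_auth_request(url, username, password):
    """Return a GET request that already carries a Basic Authorization header."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


# Placeholder values for illustration only.
example_request = build_basic_auth_request(
    "https://example.invalid/report", "user", "secret")
```

The request can then be passed to the opener from the question, for example opener.open(example_request, timeout=30), so the custom SSL context is still used.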
I need to use urllib for this, not urllib2, urllib3, or Requests.
The curl equivalent...
curl -H "Content-Type: application/json" -X POST -d '{"title":"My New Title","content":"blahblah","excerpt":"blah"}' http://localhost/wp-json/wp/v2/posts --user 'user:xxx'
... and the following code work fine:
r = requests.post(url + '/wp-json/wp/v2/posts',
                  auth=HTTPBasicAuth(username, password),
                  data=json.dumps(params),
                  headers={'content-type': 'application/json'})
if not r.ok:
    # status_code is an int, so convert it before concatenating
    raise Exception('failed with rc=' + str(r.status_code))
return json.loads(r.text)
But this fails:
params = {
    'title': 'The Title',
    'content': content,
    'excerpt': 'blah'
}
postparam = json.dumps(params).encode('utf-8')
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, username, password)
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)
req = urllib.request.Request(url + '/wp-json/wp/v2/posts', method='POST', data=postparam)
req.add_header('Content-Type', 'application/json')
r = opener.open(req)
if r.getcode() != 200:
    # getcode() returns an int, so convert it before concatenating
    raise Exception('failed with rc=' + str(r.getcode()))
Traceback:
Traceback (most recent call last):
File "test_wp_api.py", line 10, in <module>
rs = create(endpoint, "test")
File "C:\...\wordpress.py", line 28, in create
r = opener.open(req)
File "c:\usr\Python37-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "c:\usr\Python37-32\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "c:\usr\Python37-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "c:\usr\Python37-32\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "c:\usr\Python37-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
What's the cause, and how to fix it?
I will venture a guess. Please try this out, and if it does not address your issue I can delete the answer (it would not easily fit into a comment as is).
I am not sure about requests, but curl, when given credentials, does not wait for a challenge (an HTTP 401 response listing the supported authentication methods) before sending them; it authenticates on the first request. urllib's HTTPBasicAuthHandler, as configured in the question, sends the credentials only after the server has issued such a challenge. Some servers never send a challenge and simply expect a pre-authenticated session or credentials supplied up front. I have to talk to some hosts like that too. If that is the case, you cannot change the server setup, and you are OK with assuming that HTTP Basic is supported, you can force your credentials into each request. The snippet below does just that; it reuses your opener, username, and password, and you also need to import base64:
credentials = '{}:{}'.format(username, password).encode('ascii')
credentials = base64.b64encode(credentials)
credentials = b"Basic " + credentials
opener.addheaders.append(("Authorization", credentials))
I am trying to write a simple Python 3 script that gets some playlist information via the YouTube API. However, I always get a 401 error, whereas it works perfectly when I enter the request string in a browser or make the request with wget. I'm relatively new to Python and I guess I'm missing some important point here.
This is my script. Of course I actually use a real API key.
from urllib.request import Request, urlopen
from urllib.parse import urlencode
api_key = "myApiKey"
playlist_id = input('Enter playlist id: ')
output_file = input('Enter name of output file (default is playlist id): ')
if output_file == '':
    output_file = playlist_id
url = 'https://www.googleapis.com/youtube/v3/playlistItems'
params = {'part': 'snippet',
          'playlistId': playlist_id,
          'key': api_key,
          'fields': 'items/snippet(title,description,position,resourceId/videoId),nextPageToken,pageInfo/totalResults',
          'maxResults': 50,
          'pageToken': '', }
data = urlencode(params)
request = Request(url, data.encode('utf-8'))
response = urlopen(request)
content = response.read()
print(content)
Unfortunately it raises an error at response = urlopen(request):
Traceback (most recent call last):
File "gpd-helper.py", line 35, in <module>
response = urlopen(request)
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.4/urllib/request.py", line 499, in error
return self._call_chain(*args)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
I looked through the documentation but couldn't find any hint. According to the docs, no authentication other than the API key is required for listing a public playlist.
After diving deeper into the Python and Google docs I found the solution to my problem.
Python's Request object automatically creates a POST request when the data parameter is given, but the YouTube API expects a GET (with the parameters that were being posted).
The solution is to either supply GET for the method parameter (available in Python 3.4):
request = Request(url, data.encode('utf-8'), method='GET')
or append the urlencoded parameters to the URL as a query string:
request = Request(url + '?' + data)
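The difference is easy to check without sending anything, because Request.get_method() reports which verb urllib would use. The sketch below builds both variants; the helper name build_get_request and the parameter values are placeholders for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_request(base_url, params):
    """Encode params into the query string so the request stays a GET."""
    return Request(base_url + "?" + urlencode(params))


base_url = "https://www.googleapis.com/youtube/v3/playlistItems"
params = {"part": "snippet", "playlistId": "PL123", "key": "myApiKey"}

# Passing a body without an explicit method makes urllib choose POST.
post_request = Request(base_url, urlencode(params).encode("utf-8"))
get_request = build_get_request(base_url, params)

print(post_request.get_method())
print(get_request.get_method())
```

Putting the parameters into the query string also matches what a browser or wget sends when given the full request URL.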
I have a problem accessing a specific web site.
The site automatically redirects to a check page that displays "check your Browser".
The check page returns an HTTP 503 error the first time.
Then the web browser (Chrome, IE, etc.) automatically requests the page again.
Finally I can get into the web site.
The problem is that I want to access the site from Python.
So I tried both urllib and urllib2.
u = urllib.urlopen(url)
print u.read()
The same with urllib2 doesn't work; it raises a 503 error.
urllib also gets the HTTP 503 code, but it doesn't raise an error.
So I need to request the page again without changing the cookies.
u = urllib.urlopen(url)
u = urllib.urlopen(url) ## cookie is changed
print u.read()
I simply tried calling urlopen twice, but the cookie changes and it doesn't work
(the check page is shown again).
So I used urllib2 with cookielib:
import os.path
cj = None
ClientCookie = None
cookielib = None
import cookielib
import urllib2
cj = cookielib.LWPCookieJar()
if os.path.isfile('cookie.lpw'):
    cj.load('cookie.lpw')
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
theurl = url
txdata = None
txheaders = {'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
req = urllib2.Request(theurl, txdata, txheaders)
handle = urllib2.urlopen(req) ## error raised
Here is the error:
Traceback (most recent call last):
File "<pyshell#20>", line 1, in <module>
handle = urlopen(req)
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 503: Service Temporarily Unavailable
I simply want to request the site again after an HTTP 503 error, without changing the cookies.
But I don't know how to do it.
Can somebody help me, please?
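One possible shape for this, sketched in Python 3 (where urllib2 and cookielib became urllib.request and http.cookiejar): build a single opener around one cookie jar and reuse it for every attempt, so that cookies set by the 503 response are sent back on the retry. The helper names build_cookie_opener and open_with_retry and the retries and delay parameters are made up for illustration. The sketch assumes the check page only needs the cookie plus a short wait; if it depends on JavaScript running in the browser, a plain retry will likely still fail.

```python
import http.cookiejar
import time
import urllib.error
import urllib.request


def build_cookie_opener():
    """Return an opener whose cookie jar is shared by every request it makes."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener


def open_with_retry(opener, url, retries=3, delay=5.0):
    """Open url, retrying on HTTP 503 with the same opener (and its cookies)."""
    for attempt in range(retries):
        try:
            return opener.open(url, timeout=30)
        except urllib.error.HTTPError as err:
            # Re-raise anything other than 503, and the final 503 as well.
            if err.code != 503 or attempt == retries - 1:
                raise
            time.sleep(delay)
```

A call would look like open_with_retry(build_cookie_opener(), url). Because the same opener handles every attempt, the cookie jar is not reset between requests.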
I would like to be able to update an issue in Jira v5.1 from a Python script using the REST API. I have the following piece of code to extract the information of an existing issue, which works perfectly:
import urllib2
import urllib
import cookielib
import json
serverURL = 'http://jiraserver.com'
# Get the authentication cookie using the REST API
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
authURL = serverURL + '/rest/auth/latest/session'
creds = {'username' : jirauser, 'password' : passwd}
req = urllib2.Request(authURL)
req.add_data(json.dumps(creds))
req.add_header("Content-type", "application/json")
req.add_header("Accept", "application/json")
fp = opener.open(req)
fp.close()
queryURL = serverURL + '/rest/api/latest/issue/SANDBOX-150'
req = urllib2.Request(queryURL)
req.add_header("Content-type", "application/json")
req.add_header("Accept", "application/json")
fp = opener.open(req)
data = json.load(fp)
fp.close()
I would like to extend this to be able to update the same issue, and I have the following piece of code:
queryURL = serverURL + '/rest/api/latest/issue/SANDBOX-150'
issueUpdate = {
    'update': {
        'comment': [
            {
                'add': {
                    'body': 'this is a comment'
                }
            }
        ]
    }
}
req = urllib2.Request(queryURL)
req.add_data(json.dumps(issueUpdate))
req.add_header("Content-type", "application/json")
req.add_header("Accept", "application/json")
fp = opener.open(req)
fp.close()
When I try to execute the code, I get the following error message:
File "/usr/lib64/python2.6/urllib2.py", line 397, in open
response = meth(req, response)
File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib64/python2.6/urllib2.py", line 435, in error
return self._call_chain(*args)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 518, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Method Not Allowed
and the error points back to "fp = opener.open(req)" in my code.
I have tried searching the web for a solution, but without luck. Does anyone know what I'm doing wrong?
Thanks and regards
If you're using Python 2.7.x, I recommend jira-python. It's a Python package that handles the entire REST communication with Jira:
http://jira-python.readthedocs.org/en/latest/
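If a library is not an option, the 405 itself is consistent with the HTTP verb: urllib2 switches to POST as soon as data is attached, while, as far as I know, Jira's REST API edits an existing issue with PUT on the issue URL. The sketch below uses Python 3's urllib.request, which accepts a method argument. The helper name build_issue_update_request is made up, and the server URL and issue key are simply the ones from the question.

```python
import json
import urllib.request


def build_issue_update_request(server_url, issue_key, update_payload):
    """Return a PUT request carrying the JSON update for one issue."""
    url = "{}/rest/api/latest/issue/{}".format(server_url, issue_key)
    body = json.dumps(update_payload).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="PUT")
    request.add_header("Content-Type", "application/json")
    request.add_header("Accept", "application/json")
    return request


issue_update = {
    "update": {"comment": [{"add": {"body": "this is a comment"}}]}
}
update_request = build_issue_update_request(
    "http://jiraserver.com", "SANDBOX-150", issue_update)
```

The request would then be sent through the cookie-aware opener from the question with opener.open(update_request). In Python 2, where urllib2.Request has no method argument, a common workaround is to override the verb on the request object with req.get_method = lambda: 'PUT'.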