So I have been looking around and have managed to cobble together some code that lets me log in to the website http://forums.somethingawful.com.
It works; I can see from the response that it works.
When I use the same urllib2 opener that I created for that login to open http://forums.somethingawful.com/attachment.php?attachmentid=300 (a page I need to be logged in to view), I get a response of "ÿØÿà".
EDIT: http://i.imgur.com/PmWl1s4.png
I have included a screenshot of what the target page looks like when logged in, in case it is any more help.
Any ideas why?
"""
# Script to log in to website and store cookies.
# run as: python web_login.py USERNAME PASSWORD
#
# sources of code include:
#
# http://stackoverflow.com/questions/2954381/python-form-post-using-urllib2-also-question-on-saving-using-cookies
# http://stackoverflow.com/questions/301924/python-urllib-urllib2-httplib-confusion
# http://www.voidspace.org.uk/python/articles/cookielib.shtml
#
# mashed together by Martin Chorley
#
# Licensed under a Creative Commons Attribution ShareAlike 3.0 Unported License.
# http://creativecommons.org/licenses/by-sa/3.0/
"""
import urllib, urllib2
import cookielib
import sys
import urlparse
from BeautifulSoup import BeautifulSoup as bs
class WebLogin(object):

    def __init__(self, username, password):
        # url for website we want to log in to
        self.base_url = 'http://forums.somethingawful.com/'
        # login action we want to post data to
        # could be /login or /account/login or something similar
        self.login_action = '/account.php?'
        # file for storing cookies
        self.cookie_file = 'login.cookies'
        # user provided username and password
        self.username = username
        self.password = password
        # set up a cookie jar to store cookies
        self.cj = cookielib.MozillaCookieJar(self.cookie_file)
        # set up opener to handle cookies, redirects etc
        self.opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel=0),
            urllib2.HTTPSHandler(debuglevel=0),
            urllib2.HTTPCookieProcessor(self.cj)
        )
        # pretend we're a web browser and not a python script
        self.opener.addheaders = [('User-agent', 'Chrome/16.0.912.77')]
        # open the front page of the website to set and save initial cookies
        response = self.opener.open(self.base_url)
        self.cj.save()
        # try and log in to the site
        response = self.login()
        response2 = self.opener.open("http://forums.somethingawful.com/attachment.php?attachmentid=300")
        print response2.read() + "LLLLLL"

    # method to do login
    def login(self):
        # parameters for login action
        # may be different for different websites
        # check html source of website for specifics
        login_data = urllib.urlencode({
            'action': 'login',
            'username': 'username',
            'password': 'password'
        })
        # construct the url
        login_url = self.base_url + self.login_action
        # then open it
        response = self.opener.open(login_url, login_data)
        # save the cookies and return the response
        self.cj.save()
        return response


if __name__ == "__main__":
    username = "username"
    password = "password"
    # initialise and login to the website
    test = WebLogin(username, password)
Try this instead:
import urllib2, cookielib

def login(username, password):
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    url1 = "http://forums.somethingawful.com/attachment.php?attachmentid=300"
    url2 = "http://forums.somethingawful.com/account.php?action=loginform"
    data = "&username=" + username + "&password=" + password
    socket = opener.open(url1)
    socket = opener.open(url2, data)
    return socket.read()
P.S.: I wrote it as a standalone function; you can integrate it into your class if it works for you. Also, the call to opener.open(url1) might be redundant; I would need a valid username/password pair to verify that.
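If it helps, a minimal way to call that standalone function (placeholder credentials, Python 2 like the rest of the snippet) would be:

html = login("my_username", "my_password")
print html[:200]  # peek at the start of the returned page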
Related
I am trying to use Python requests to log in to my university's CAS. I've seen a few tutorials and posts online, but they all work for a CAS that has its username and password fields on the same page. Here is the website I'm trying to log in to: https://cas.tamu.edu/cas/login?service=https://howdy.tamu.edu/uPortal/Login&renew=true
As you can see, there is only a field for a username; after clicking submit, you are taken to the same page, but the field is now for a password. Here is the current code I have, which doesn't work:
import requests
import lxml.html
from bs4 import BeautifulSoup
# URL of webpage
login_url = "https://cas.tamu.edu/cas/login?service=https://howdy.tamu.edu/uPortal/Login&renew=true"
howdy = "https://howdy.tamu.edu/uPortal/normal/render.uP"
username = "..."  # my username
password = "..."  # my password
# create a session to store cookies
sesh = requests.session()
params = {'service': howdy}
# gets the URL and converts the text of the HTML code
req = sesh.get(login_url, params=params)
html_content = req.text
print html_content
# parsing the page for hidden inputs
login_html = lxml.html.fromstring(html_content)
hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
user_form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
print(user_form)
user_form["username"] = username
user_response = sesh.post(login_url, data=user_form)
print user_response.url
# same thing for the password page
pass_form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
print(pass_form)
pass_form["password"] = password
pass_response = sesh.post(user_response.url, data=pass_form)
print pass_response.url
I based this on this tutorial: https://brennan.io/2016/03/02/logging-in-with-requests/, specifically the section about CAS.
I found a workaround that is not exactly what I wanted, but it works for my purposes. I'm posting it in case anyone with a similar problem finds this. I used browsercookie to reuse the current cookies from Chrome, so if I'm already logged in there, I can access info behind the CAS. More info in this post: https://stackoverflow.com/a/29628188/9613156.
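For reference, a minimal sketch of that browsercookie approach (it assumes you are already logged in to the site in Chrome; the target URL is the one from the question):

import browsercookie
import requests

# Reuse the cookies Chrome already holds for the CAS-protected site
cookies = browsercookie.chrome()
resp = requests.get("https://howdy.tamu.edu/uPortal/normal/render.uP", cookies=cookies)
print(resp.status_code)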
I am using the requests module.
I have figured out how to submit data to a login form on a website and retrieve the session key, but I can't see an obvious way to use this session key in subsequent requests.
Can someone fill in the ellipsis in the code below or suggest another approach?
>>> import requests
>>> login_data = {'formPosted': '1', 'login_email': 'me@example.com', 'password': 'pw'}
>>> r = requests.post('https://localhost/login.py', login_data)
>>>
>>> r.text
'You are being redirected here'
>>> r.cookies
{'session_id_myapp': '127-0-0-1-825ff22a-6ed1-453b-aebc-5d3cf2987065'}
>>>
>>> r2 = requests.get('https://localhost/profile_data.json', ...)
You can easily create a persistent session using:
s = requests.Session()
After that, continue with your requests as you would:
s.post('https://localhost/login.py', login_data)
# logged in! cookies saved for future requests.
r2 = s.get('https://localhost/profile_data.json', ...)
# cookies sent automatically!
# do whatever, s will keep your cookies intact :)
For more about Sessions: https://requests.readthedocs.io/en/latest/user/advanced/#session-objects
The other answers help to understand how to maintain such a session. Additionally, I want to provide a class which keeps the session maintained over different runs of a script (with a cache file). This means a proper "login" is only performed when required (timeout, or no session in the cache). It also supports proxy settings over subsequent calls to 'get' or 'post'.
It is tested with Python 3.
Use it as a basis for your own code. The following snippets are released under GPL v3.
import pickle
import datetime
import os
from urllib.parse import urlparse
import requests


class MyLoginSession:
    """
    A class which handles and saves login sessions. It also keeps track of proxy settings.
    It maintains a cache file for restoring session data from earlier
    script executions.
    """
    def __init__(self,
                 loginUrl,
                 loginData,
                 loginTestUrl,
                 loginTestString,
                 sessionFileAppendix='_session.dat',
                 maxSessionTimeSeconds=30 * 60,
                 proxies=None,
                 userAgent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
                 debug=True,
                 forceLogin=False,
                 **kwargs):
        """
        Save some information needed to log in the session.

        You'll have to provide 'loginTestString', which will be looked for in the
        response HTML to make sure you've properly been logged in.
        'proxies' is of format { 'https' : 'https://user:pass@server:port', 'http' : ...
        'loginData' will be sent as POST data (dictionary of id : value).
        'maxSessionTimeSeconds' will be used to determine when to re-login.
        """
        urlData = urlparse(loginUrl)

        self.proxies = proxies
        self.loginData = loginData
        self.loginUrl = loginUrl
        self.loginTestUrl = loginTestUrl
        self.maxSessionTime = maxSessionTimeSeconds
        self.sessionFile = urlData.netloc + sessionFileAppendix
        self.userAgent = userAgent
        self.loginTestString = loginTestString
        self.debug = debug

        self.login(forceLogin, **kwargs)

    def modification_date(self, filename):
        """
        Return the last file modification date as a datetime object.
        """
        t = os.path.getmtime(filename)
        return datetime.datetime.fromtimestamp(t)

    def login(self, forceLogin=False, **kwargs):
        """
        Log in to a session. Try to read the last saved session from the cache file;
        if this fails, do a proper login. If the last cache access was too old,
        also perform a proper login. Always updates the session cache file.
        """
        wasReadFromCache = False
        if self.debug:
            print('loading or generating session...')
        if os.path.exists(self.sessionFile) and not forceLogin:
            time = self.modification_date(self.sessionFile)

            # only load if the file is younger than maxSessionTime
            lastModification = (datetime.datetime.now() - time).seconds
            if lastModification < self.maxSessionTime:
                with open(self.sessionFile, "rb") as f:
                    self.session = pickle.load(f)
                    wasReadFromCache = True
                    if self.debug:
                        print("loaded session from cache (last access %ds ago) "
                              % lastModification)
        if not wasReadFromCache:
            self.session = requests.Session()
            self.session.headers.update({'user-agent': self.userAgent})
            res = self.session.post(self.loginUrl, data=self.loginData,
                                    proxies=self.proxies, **kwargs)

            if self.debug:
                print('created new session with login')
            self.saveSessionToCache()

        # test login
        res = self.session.get(self.loginTestUrl)
        if res.text.lower().find(self.loginTestString.lower()) < 0:
            raise Exception("could not log into provided site '%s'"
                            " (did not find successful login string)"
                            % self.loginUrl)

    def saveSessionToCache(self):
        """
        Save the session to a cache file.
        """
        # always save (to update the timeout)
        with open(self.sessionFile, "wb") as f:
            pickle.dump(self.session, f)
            if self.debug:
                print('updated session cache-file %s' % self.sessionFile)

    def retrieveContent(self, url, method="get", postData=None, **kwargs):
        """
        Return the content of the url with respect to the session.
        If 'method' is not 'get', the url will be called with 'postData'
        as a POST request.
        """
        if method == 'get':
            res = self.session.get(url, proxies=self.proxies, **kwargs)
        else:
            res = self.session.post(url, data=postData, proxies=self.proxies, **kwargs)

        # the session has been updated on the server, so also update it in the cache
        self.saveSessionToCache()

        return res
A code snippet for using the above class may look like this:
if __name__ == "__main__":
    # proxies = {'https' : 'https://user:pass@server:port',
    #            'http'  : 'http://user:pass@server:port'}

    loginData = {'user' : 'usr',
                 'password' : 'pwd'}

    loginUrl = 'https://...'
    loginTestUrl = 'https://...'
    successStr = 'Hello Tom'

    s = MyLoginSession(loginUrl, loginData, loginTestUrl, successStr,
                       # proxies = proxies
                       )

    res = s.retrieveContent('https://....')
    print(res.text)

    # if, for instance, login via JSON values required try this:
    s = MyLoginSession(loginUrl, None, loginTestUrl, successStr,
                       # proxies = proxies,
                       json = loginData)
Check out my answer in this similar question:
python: urllib2 how to send cookie with urlopen request
import urllib2
import urllib
from cookielib import CookieJar
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# input-type values from the html form
formdata = { "username" : username, "password": password, "form-id" : "1234" }
data_encoded = urllib.urlencode(formdata)
response = opener.open("https://page.com/login.php", data_encoded)
content = response.read()
EDIT:
I see I've gotten a few downvotes for my answer, but no explanatory comments. I'm guessing it's because I'm referring to the urllib libraries instead of requests. I do that because the OP asks for help with requests, or for someone to suggest another approach.
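For anyone who does want the requests version, a rough equivalent of the same form post (using the same placeholder URL and field names as above) looks like:

import requests

formdata = {"username": username, "password": password, "form-id": "1234"}
session = requests.Session()
response = session.post("https://page.com/login.php", data=formdata)
content = response.text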
The documentation says that get takes in an optional cookies argument allowing you to specify cookies to use:
from the docs:
>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
http://docs.python-requests.org/en/latest/user/quickstart/#cookies
Upon trying all the answers above, I found that using "RequestsCookieJar" instead of the regular CookieJar for subsequent requests fixed my problem.
import requests
import json
# The Login URL
authUrl = 'https://whatever.com/login'
# The subsequent URL
testUrl = 'https://whatever.com/someEndpoint'
# Logout URL
testlogoutUrl = 'https://whatever.com/logout'
# Whatever you are posting
login_data = {'formPosted': '1',
              'login_email': 'me@example.com',
              'password': 'pw'
              }
# The Authentication token or any other data that we will receive from the Authentication Request.
token = ''
# Post the login Request
loginRequest = requests.post(authUrl, login_data)
print("{}".format(loginRequest.text))
# Save the request content to your variable. In this case I needed a field called token.
token = str(json.loads(loginRequest.content)['token']) # or ['access_token']
print("{}".format(token))
# Verify Successful login
print("{}".format(loginRequest.status_code))
# Create your Requests Cookie Jar for your subsequent requests and add the cookie
jar = requests.cookies.RequestsCookieJar()
jar.set('LWSSO_COOKIE_KEY', token)
# Execute your next request(s) with the Request Cookie Jar set
r = requests.get(testUrl, cookies=jar)
print("R.TEXT: {}".format(r.text))
print("R.STCD: {}".format(r.status_code))
# Execute your logout request(s) with the Request Cookie Jar set
r = requests.delete(testlogoutUrl, cookies=jar)
print("R.TEXT: {}".format(r.text)) # should show "Request Not Authorized"
print("R.STCD: {}".format(r.status_code)) # should show 401
Save only required cookies and reuse them.
import os
import pickle
import requests
from urllib.parse import urljoin, urlparse

login = 'my@email.com'
password = 'secret'

# Assuming two cookies are used for persistent login.
# (Find them by tracing the login process.)
persistentCookieNames = ['sessionId', 'profileId']

URL = 'http://example.com'
urlData = urlparse(URL)
cookieFile = urlData.netloc + '.cookie'
signinUrl = urljoin(URL, "/signin")

with requests.Session() as session:
    try:
        with open(cookieFile, 'rb') as f:
            print("Loading cookies...")
            session.cookies.update(pickle.load(f))
    except Exception:
        # If we could not load cookies from the file, get new ones by logging in
        print("Logging in...")
        post = session.post(
            signinUrl,
            data={
                'email': login,
                'password': password,
            }
        )
        try:
            with open(cookieFile, 'wb') as f:
                jar = requests.cookies.RequestsCookieJar()
                for cookie in session.cookies:
                    if cookie.name in persistentCookieNames:
                        jar.set_cookie(cookie)
                pickle.dump(jar, f)
        except Exception as e:
            os.remove(cookieFile)
            raise(e)

    MyPage = urljoin(URL, "/mypage")
    page = session.get(MyPage)
A snippet to retrieve JSON data that is password-protected:
import requests
username = "my_user_name"
password = "my_super_secret"
url = "https://www.my_base_url.com"
the_page_i_want = "/my_json_data_page"
session = requests.Session()
# retrieve cookie value
resp = session.get(url+'/login')
csrf_token = resp.cookies['csrftoken']
# login, add referer
resp = session.post(url + "/login",
                    data={
                        'username': username,
                        'password': password,
                        'csrfmiddlewaretoken': csrf_token,
                        'next': the_page_i_want,
                    },
                    headers=dict(Referer=url + "/login"))
print(resp.json())
This will work for you in Python:
# Call JIRA API with HTTPBasicAuth
import json
import requests
from requests.auth import HTTPBasicAuth
JIRA_EMAIL = "****"
JIRA_TOKEN = "****"
BASE_URL = "https://****.atlassian.net"
API_URL = "/rest/api/3/serverInfo"
API_URL = BASE_URL+API_URL
BASIC_AUTH = HTTPBasicAuth(JIRA_EMAIL, JIRA_TOKEN)
HEADERS = {'Content-Type' : 'application/json;charset=iso-8859-1'}
response = requests.get(
    API_URL,
    headers=HEADERS,
    auth=BASIC_AUTH
)
print(json.dumps(json.loads(response.text), sort_keys=True, indent=4, separators=(",", ": ")))
I am new to Python and web scraping, and I am trying to write a very basic script that will get data from a webpage that can only be accessed after logging in. I have looked at a bunch of different examples, but none of them fix the issue. This is what I have so far:
from bs4 import BeautifulSoup
import urllib, urllib2, cookielib
username = 'name'
password = 'pass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('WebpageWithLoginForm')
resp = opener.open('WebpageIWantToAccess')
soup = BeautifulSoup(resp, 'html.parser')
print soup.prettify()
Right now, when I print the page, it just prints the contents of the page as if I were not logged in. I think the issue has something to do with the way I am setting the cookies, but I am really not sure, because I do not fully understand what is happening with the cookie processor and its libraries.
Thank you!
Current Code:
import requests
import sys

EMAIL = 'usr'
PASSWORD = 'pass'
URL = 'https://connect.lehigh.edu/app/login'

def main():
    # Start a session so we can have persistent cookies
    session = requests.session(config={'verbose': sys.stderr})

    # This is the form data that the page sends when logging in
    login_data = {
        'username': EMAIL,
        'password': PASSWORD,
        'LOGIN': 'login',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    r = session.get('https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl')

if __name__ == '__main__':
    main()
You can use the requests module.
Take a look at the answer I've linked below.
https://stackoverflow.com/a/8316989/6464893
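In short, the pattern in that linked answer is a requests.Session that posts the login form once and is then reused for later requests; a minimal sketch (URL and field names are placeholders, check the form's HTML for the real ones):

import requests

session = requests.Session()
session.post('https://example.com/login', data={'username': 'name', 'password': 'pass'})
resp = session.get('https://example.com/WebpageIWantToAccess')
print(resp.text)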
I am using the requests and cfscrape libraries to log in to https://kissanime.to/Login
'''Login to website'''
def login(self, usr, pw):
    login_url = 'https://kissanime.to/Login'
    sess = requests.Session()

    # login credentials
    payload = {
        'username': usr,
        'password': pw,
        'redirect': ''
    }

    # Creating cfscrape instance of the session
    scraper_sess = cfscrape.create_scraper(sess)

    a = scraper_sess.post(login_url, data=payload)
    print(a.text)
    print(a.status_code)
a.text gives me the same login page
a.status_code gives me 200
That means my login is not working at all. Am I missing something? According to Chrome's network monitor, I should also get status code 302.
POST DATA image: (screenshot not included here)
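Note that requests (and therefore cfscrape, which wraps it) follows redirects automatically, so a 302 from the server still ends up as a final status code of 200; if you want to see whether a redirect happened at all, the intermediate responses are kept in the response history:

# after: a = scraper_sess.post(login_url, data=payload)
print(a.history)      # e.g. [<Response [302]>] if the login redirected
print(a.status_code)  # status of the final page after redirects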
I solved it using mechanicalsoup
Code:
import mechanicalsoup
import cfscrape
from bs4 import BeautifulSoup

'''Login to website'''
def login(self, usr, pw):
    login_url = 'https://kissanime.to/Login'

    # Creating cfscrape instance
    self.r = cfscrape.create_scraper()
    login_page = self.r.get(login_url)

    # Creating a mechanicalsoup browser instance with
    # the response object of cfscrape
    browser = mechanicalsoup.Browser(self.r)
    soup = BeautifulSoup(login_page.text, 'html.parser')

    # grab the login form
    login_form = soup.find('form', {'id': 'formLogin'})
    # find login and password inputs
    login_form.find('input', {'name': 'username'})['value'] = usr
    login_form.find('input', {'name': 'password'})['value'] = pw
    browser.submit(login_form, login_page.url)
This content is from Requests Documentation:
Many web services that require authentication accept HTTP Basic Auth. This is the simplest kind, and Requests supports it straight out of the box.
requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
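As a self-contained version with the import included (passing a (user, pass) tuple to auth= is shorthand for HTTPBasicAuth):

import requests
from requests.auth import HTTPBasicAuth

# the two calls below are equivalent; credentials are placeholders
r1 = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
r2 = requests.get('https://api.github.com/user', auth=('user', 'pass'))
print(r1.status_code, r2.status_code)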
You have to send the payload as JSON:
import requests, json
import cfscrape

'''Login to website'''
def login(self, usr, pw):
    login_url = 'https://kissanime.to/Login'
    sess = requests.Session()

    # login credentials
    payload = {
        'username': usr,
        'password': pw,
        'redirect': ''
    }

    # Creating cfscrape instance of the session
    scraper_sess = cfscrape.create_scraper(sess)

    a = scraper_sess.post(login_url, data=json.dumps(payload))
    print(a.text)
    print(a.status_code)
Reference: http://docs.python-requests.org/en/master/user/authentication/