I am trying to use requests to retrieve information from a website. The problem is that the site requires a homepage to stay open in one tab while the information I need is open in another. If I close the homepage, the page I need loses my login session. How can I imitate having two tabs open to prevent this? Example:
import requests
session = requests.Session()
payload = {'username':'username_here','password':'password_here'}
webSession = session.post("http://website.com/login", data=payload)
webSession2 = session.get("https://website.com/home/home-page")
webSession3 = session.get("https://website.com/Reports/1234")
webSession returns <Response [200]>
webSession2 also returns <Response [200]>, implying my login was successful.
webSession3 returns <Response [401]>, implying I'm no longer logged in.
How can I get webSession3 to return the information I want?
You're already creating a Session object. Use it to get cookies after your first request (probably homepage in your case).
cookies = session.cookies.get_dict()
Now pass cookies in your subsequent request(s):
response = session.post("https://website.com/Reports/1234", cookies=cookies)
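Note that a Session also attaches its stored cookies to later requests on its own, so passing them again is normally redundant. A minimal sketch that shows this without touching the network, assuming a hypothetical cookie named sessionid (the real site will use its own name and value):

```python
import requests

session = requests.Session()

# Pretend the login response set this cookie (hypothetical name and value).
session.cookies.set("sessionid", "abc123", domain="website.com", path="/")

# Build the request the session would send, without actually sending it.
request = requests.Request("GET", "https://website.com/Reports/1234")
prepared = session.prepare_request(request)

# The Cookie header was filled in from the session's cookie jar.
cookie_header = prepared.headers.get("Cookie")
print(cookie_header)
```

If the header is present and the server still answers 401, the missing piece is probably something other than the cookies, which has to be checked against what the browser actually sends.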
Related
I am trying to get some data from an XML resource on a website that requires logging in first. I create a Python requests Session and can log in using the get method (it shows 200). After that I try to access the XML, but I get 401.
As far as I know, the server checks on each call whether the client sends a JSESSIONID cookie and whether its value matches the current session.
I managed to get the corresponding cookies and tried sending them via the post method, but I still get 401.
Maybe I'm overcomplicating this. I also don't need to achieve it with requests; I would be glad just to get the information from the XML.
import requests
login_url = 'https://www.example.com/login'
USER_NAME = 'user'
PASSWORD = 'pass'
xml = 'https://www.example.com/channel_index?123&p=web'
with requests.Session() as s:
    response = s.get(login_url, auth=(USER_NAME, PASSWORD))
    print(response)
    r = s.get(xml)
    cookies = s.cookies.get_dict()
    r = s.post(xml, cookies=cookies)
    print(r)
Hello, I'm trying to get information from a website that requires a login.
I already get a 200 response from the request URL where I POST the ID, password, and other form fields.
The headers dict holds the request headers shown in the Network tab of Chrome's developer tools. The form_data dict holds the ID and password.
login_site = requests.post(requestUrl, headers=headers, data=form_data)
status_code = login_site.status_code
print(status_code)
I got 200
The code below is the way I've tried.
1. Session.
When I tried to set cookies with a session, it failed. I've heard that a session can carry the cookies along when I scrape other pages that need a login.
session = requests.Session()
session.post(requestUrl, headers=headers, data=form_data)
test = session.get('~~') #the website that I want to scrape
print(test.status_code)
I got 403
2. Manually set cookie
I manually made the cookie dict that I can get
cookies = {'wcs_bt':'...','_production_session_id':'...'}
r = requests.post('http://engoo.co.kr/dashboard', cookies = cookies)
print(r.status_code)
I also got 403
Actually, I don't know what I should put in the cookies dict. When I get 'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC;', should I change it to a dict like {'wcs_bt':'AAA'.. }?
When I get cookies
login_site = requests.post(requestUrl, headers=headers, data=form_data)
print(login_site.cookies)
With this code, I only get
RequestsCookieJar[Cookie _production_session_id=BBB]
Somehow, that failed as well.
How can I scrape it with the cookie?
Scraping a modern (circa 2017 or later) Web site that requires a login can be very tricky, because it's likely that some important portion of the login process is implemented in Javascript.
Unless you execute that Javascript exactly as a browser would, you won't be able to complete the login. Unfortunately, the basic Python libraries won't help.
Consider Selenium with Python, which is used for testing Web sites but can be used to automate any interaction with a Web site.
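On the narrower question of turning a Cookie header string into a dict: yes, requests accepts a plain name-to-value dict. A minimal sketch using the standard library's SimpleCookie parser; the helper name cookie_header_to_dict is made up for illustration, and the values are the placeholders from the question:

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header: str) -> dict:
    """Parse a 'name=value; name2=value2' string into a plain dict."""
    parsed = SimpleCookie()
    parsed.load(header)
    return {name: morsel.value for name, morsel in parsed.items()}


raw = "wcs_bt=AAA; _production_session_id=BBB; _ga=CCC;"
cookies = cookie_header_to_dict(raw)
print(cookies)
```

The resulting dict can be passed as cookies=cookies to requests. Whether the server then accepts the request still depends on the rest of its login checks.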
I'm trying to send a POST request to a website to get a JSON response. I can see the JSON response in Chrome Inspector when I click on a link, but I can't get it using requests.
First I tried using a requests Session to get the cookies and then use them in the POST request, to no avail.
session = requests.Session()
session.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
response = session.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997')
print(response.text)
Secondly I used Selenium+PhantomJS to get the cookies and used them in requests, no results!
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
all_cookie = {}
for cookie in browser.get_cookies():
    all_cookie[cookie['name']] = cookie['value']
rep = requests.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997', cookies=all_cookie)
It only works when I manually take the cookies from Chrome.
I can't see what's the problem!
session = requests.Session()
session.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
response = session.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997')
print(response.json())
Calling the json() method parses the JSON response body. You can also use a requests Session to keep a persistent session, so the cookies are sent automatically.
response.cookies #The cookies attribute
I've tried solving the problem using Mechanize, but I couldn't get it to work.
A website only allows access to the data if cookies are sent after login. I need to do the following:
Log in using POST to a page
Store cookies
Access protected page
You can use a Session in Requests. From the documentation:
The Session object allows you to persist certain parameters across
requests. It also persists cookies across all requests made from the
Session instance.
Here's how a log in and subsequent request might look:
import requests
s = requests.Session()
# Session() takes no arguments; set the certificate on the instance
s.verify = 'my_cert_file.crt'
r = s.post('https://secure-site.com/login', data={
    'username': my_username,
    'password': my_password,
})
# logging in sets a cookie which the session remembers
print(s.cookies)
r = s.get('https://secure-site.com/secure-data')
print(r.json())
I came up with the following solution using Mechanize. Cookies are managed by mechanize.Browser.
import time
import mechanize
br = mechanize.Browser()
resp = br.open('https://xxxxxxxxxxxxxxxxxxx')
br.select_form(nr=0)
br['username'] = username
br['password'] = password
response = br.submit()
time.sleep(1)
resp_second = br.open('https://secretwebpage')
print(resp_second.read())
I have been googling for this problem for a week now.
What I want to achieve is the following:
Send a POST request to the URL including the correct credentials.
Save the session (not a cookie, since my website is not using cookies at the moment)
With the saved session, open a session-protected URL and grab the contents.
I have seen a lot of topics on this with cookies but not with sessions. I tried sessions with requests, but it seems to fail every time.
You want to use a URL opener. Here's a sample of how I've managed to do it. If you just want a default opener, use opener=urllib.request.build_opener(), otherwise use the custom opener. This worked when I had to log into a website and keep a session, using URL as your URL, user as user, password as password, all changed as appropriate.
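import http.cookiejar
import urllib.parse
import urllib.request
# Opener that stores cookies from each response and sends them back later
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
# Log in by POSTing the form fields (field names depend on the site)
pData = urllib.parse.urlencode({"identity": user, "password": password})
req = urllib.request.Request(URL, pData.encode('utf-8'))
opener.open(req)
# Fetch the protected page with the same opener; url is that page's address
req = urllib.request.Request(url)
response = opener.open(req)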
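The opener approach can be tried end to end against a throwaway local server. The sketch below is only an illustration under stated assumptions: the DemoHandler class, the /login and /secret paths, the sid cookie, and the credentials are all invented for the demo and stand in for whatever the real site uses.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Toy site: POST /login sets a cookie, GET /secret requires it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("utf-8"))
        if form.get("identity") == ["user"] and form.get("password") == ["pass"]:
            self.send_response(200)
            self.send_header("Set-Cookie", "sid=demo-token; Path=/")
            body = b"logged in"
        else:
            self.send_response(401)
            body = b"bad credentials"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        logged_in = "sid=demo-token" in self.headers.get("Cookie", "")
        body = b"secret data" if logged_in else b"login required"
        self.send_response(200 if logged_in else 401)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Start the toy server on a free local port in a background thread.
server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# One opener, one cookie jar; the empty ProxyHandler bypasses any system proxy.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),
    urllib.request.HTTPCookieProcessor(jar),
)

# Step 1: log in. The Set-Cookie header from the response lands in the jar.
data = urllib.parse.urlencode({"identity": "user", "password": "pass"}).encode("utf-8")
opener.open(urllib.request.Request(base + "/login", data)).read()

# Step 2: reuse the same opener, which sends the stored cookie back.
page = opener.open(base + "/secret").read().decode("utf-8")
print(page)

server.shutdown()
server.server_close()
```

The point of the design is that the login request and the protected request go through the same opener, so the cookie jar carries the session between them. A fresh opener, or a plain urllib.request.urlopen call, would start without the cookie.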