I came across this question: How to use Python to login to a webpage and retrieve cookies for later usage?
So I'm trying to log into a page using the requests approach from the second answer.
When I print the HTML using
print(request.text)
it prints the HTML of the login page, not the subpage I requested.
Is there a problem with the code from that answer (which I doubt), or with my version of it?
The code is similar to the one on that question, with different pages and usernames.
Thanks!
from requests import session

USERNAME = 'myuser'
PASSWORD = 'mypwd'

payload = {
    'action': 'login',
    'username': USERNAME,
    'password': PASSWORD
}

with session() as c:
    c.post('https://www.bricklink.com/login.asp', data=payload)  # Login page
    request = c.get('http://www.bricklink.com/orderExcelFinal.asp?')  # Page I want to access
    print(request.headers)
    print(request.text)
Output
HTML code for the Login page, but not the page I want to access
Your code isn't sending the correct data on the login request.
Each web page is different, and sends different data in order to log in. Yours should be structured like this:
from requests import session

USERNAME = 'myuser'
PASSWORD = 'mypwd'

# Sent in the URL query string of the login request.
query = {
    'logInTo': '',
    'logFolder': 'p',
    'logSub': 'w',
}

# Sent as form data in the body of the login request.
payload = {
    'a': 'a',
    'logFrmFlag': 'Y',
    'frmUsername': USERNAME,
    'frmPassword': PASSWORD,
}

with session() as c:
    c.post('https://www.bricklink.com/login.asp', params=query, data=payload)  # Login page
    request = c.get('http://www.bricklink.com/orderExcelFinal.asp')  # Page I want to access
    print(request.headers)
    print(request.text)
In the future, when you need to work out what data needs to be sent when submitting a form, use the Developer Tools in Chrome or Firefox. Record your login attempt in the Network tab, then structure your request data to match what the browser sent. Getting started with Chrome's Developer Tools is a bit beyond the scope of this answer, but there are lots of good resources on the web that explain how to get this information.
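As a complement to the Developer Tools, the login page's own HTML often lists the fields the form submits, including hidden ones. The following is a minimal sketch using only the standard library; the sample form is hypothetical (loosely modelled on the field names in the answer above) and the real page may differ. Fields that the page adds with JavaScript will not show up this way.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the name/value pairs of every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are recorded as empty strings.
            self.fields[name] = attributes.get("value") or ""


def extract_form_fields(html):
    """Return a dict of input names to their default values."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical login form, for illustration only.
SAMPLE_HTML = """
<form action="login.asp?logFolder=p&logSub=w" method="post">
  <input type="hidden" name="a" value="a">
  <input type="hidden" name="logFrmFlag" value="Y">
  <input type="text" name="frmUsername">
  <input type="password" name="frmPassword">
</form>
"""

fields = extract_form_fields(SAMPLE_HTML)
print(fields)
```

In practice the HTML would come from a GET of the login page made with the same session, and the empty username and password entries would be filled in before posting.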
Related
I want to scrape this website, and I need to log in to get the account details (https://online.cspension.com.hk/hkpensionweb/#/login?source=corporateHK&vgnLocale=en_CA). I don't want to use Selenium.
How can I log in and get the account details?
You can use a requests.Session() instance to make a POST request to the login URL with your login details as the payload. Making requests from a session instance is essentially the same as using requests normally; it simply adds persistence, allowing you to store and reuse cookies and similar state.
import requests

# Fill in your details here to be posted to the login form.
payload = {
    'inUserName': 'username',
    'inUserPass': 'password'
}

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post('LOGIN_URL', data=payload)
    # Print the HTML returned, or do something more intelligent, to see whether the login succeeded.
    print(p.text)

    # An authorised request.
    r = s.get('A protected web page url')
    print(r.text)
    # etc...
I am trying to log in to a website, www.seek.com.au, to test whether remote login is possible using the Python requests module. The site's front end is built with React, so I don't see any form action attribute on www.seek.com.au/sign-in.
When I run the code below, the response code is 200, indicating success, but I doubt the login actually succeeded. My main question is which URL to use when the login form has no action attribute.
import requests

# Replace the placeholder strings with real credentials.
payload = {'email': '<username>', 'password': '<password>'}
url = 'https://www.seek.com.au'

with requests.Session() as s:
    response_op = s.post(url, data=payload)
    # print the response status code
    print(response_op.status_code)
    print(response_op.text)
When I examine the output (response_op.text), I see the words 'Sign in' and 'Register', which indicates the login failed. If it were successful, the user's first name would be shown in their place. What am I doing wrong here?
P.S.: I am not trying to scrape data from this website; I am trying to log in to a similar website.
Try this code:
import requests

payload = {"email": "test#test.com", "password": "passwordtest", "rememberMe": True}
url = "https://www.seek.com.au:443/userapi/login"

with requests.Session() as s:
    response_op = s.post(url, json=payload)
    # print the response status code
    print(response_op.status_code)
    print(response_op.text)
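You are sending the request to the wrong URL. Note that this version also sends the payload as JSON (json=payload) rather than as form data (data=payload).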
Hope this helps
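To make the data= versus json= difference concrete, here is a small standard-library sketch of roughly what each option puts in the request body. The email address and password are made-up placeholders, and requests may differ in small details such as whitespace; the point is only the shape of the two bodies.

```python
import json
from urllib.parse import urlencode

# Placeholder credentials, for illustration only.
payload = {"email": "user@example.com", "password": "secret", "rememberMe": True}

# data=payload: form-encoded body (Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(payload)

# json=payload: JSON body (Content-Type: application/json).
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

An endpoint that expects JSON will typically not parse a form-encoded body, which is one reason a login can fail even though the field names are right.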
I'm trying to login to Campaign Monitor to scrape some data from pages related to email campaign performance.
The "login-protected" URL of the page I'm trying to access looks like this:
https://mycompany.createsend.com/campaigns/reports/lists/DFGDF987GD98F7GD?s=BCV98B5XF54BVC54BC
Going to that page in a web browser (try it here) will redirect to the login page, itself with a URL like this:
https://login.createsend.com/l/98SDF76DS87F68S/DFGDF987GD98F7GD?ReturnUrl=%2Fcampaigns%2Freports%2Flists%2FBCV98B5XF54BVC54BC%3Fs%3BCV98B5XF54BVC54BC&s=7DS6F87S6DF876SDF76
What I've gathered from trying to solve this is that I need to open a session, authenticate on the redirect URL, then request the URL that I actually want (using the authenticated session).
Here is the code I'm using to try to accomplish that:
import requests

payload = {
    'username': 'myUsername',
    'password': 'myPassword',
}
redURL = 'https://login.createsend.com/l/98SDF76DS87F68S/DFGDF987GD98F7GD?ReturnUrl=%2Fcampaigns%2Freports%2Flists%2FBCV98B5XF54BVC54BC%3Fs%3BCV98B5XF54BVC54BC&s=7DS6F87S6DF876SDF76'

with requests.Session() as s:
    p = s.post(redURL, data=payload)
    # This prints the "success" message I've pasted below
    print(p.content)
    r = s.get('https://mycompany.createsend.com/campaigns/reports/lists/DFGDF987GD98F7GD?s=BCV98B5XF54BVC54BC')
    # This prints the HTML of the login page again, as if I'm not authenticated
    print(r.content)
Here is the "successful" response after the first POST for the session:
{"MultipleAccounts":false,"LoginStatus":"Success","SiteAddress":"https://mycompany.createsend.com","ErrorMessage":"","SessionExpired":false,"Url":"https://mycompany.createsend.com/login?Origin=Marketing\u0026ReturnUrl=%2fcampaigns%2freports%2flists%2f92D2FBCV98B5XF54BVC%3fs%7DS6F87S6DF876SDF76\u0026s=2FBCV98B5XF54BVC","DomainSwitchAddress":"https://mycompany.createsend.com","DomainSwitchAddressQueryString":null,"NeedsDomainSwitch":false}
Can someone please help me out with why the second request in the session prints the HTML of the login page instead of the HTML of the authenticated version of the page (ie. the page with the data I'm looking for)?
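Since the response to the POST is JSON, its fields can be inspected instead of printed raw. Below is a minimal sketch using a shortened copy of the response above (the query string of "Url" is trimmed). Whether requesting the returned "Url" with the same session is what completes the login is an untested assumption, not something the response itself confirms.

```python
import json

# Shortened copy of the response shown above, for illustration only.
response_text = (
    '{"MultipleAccounts":false,"LoginStatus":"Success",'
    '"SiteAddress":"https://mycompany.createsend.com","ErrorMessage":"",'
    '"Url":"https://mycompany.createsend.com/login?Origin=Marketing",'
    '"NeedsDomainSwitch":false}'
)

result = json.loads(response_text)

# Treat the login as accepted only if the server says so explicitly.
login_ok = result["LoginStatus"] == "Success"

# Assumption: this is the next URL to request with the same session.
next_url = result["Url"] if login_ok else None

print(login_ok, next_url)
```

With requests, p.json() would give the same dictionary directly from the response object.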
I'm trying to use Python to scrape a website, but I have to login first before I can get to the page with the data on it.
The URL for the login page is:
https://tunein.com/account/login/?returnTo=https://amplifier.tunein.com/sessions/new&source=amplifier
I have read numerous threads which seem to answer the question, but I'm struggling to relate it to my own situation.
The code I have (from a response in this thread) is:
import requests

# Fill in your details here to be posted to the login form.
payload = {
    'Username': 'user',
    'Password': 'password'
}

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post('https://tunein.com/account/login/?returnTo=https://amplifier.tunein.com/sessions/new&source=amplifier', data=payload)
    # Print the HTML returned, or do something more intelligent, to see whether the login succeeded.
    print(p.text)
I have looked at the source code to see what the names of the form fields are, hence the 'Username' and 'Password' keys in the payload variable.
When I run the script, p.text just returns the HTML of the same page so it obviously isn't logging in correctly. Any suggestions? Is there a better way to do it?
Edit:
The "Form Data" headers once I log in are:
Username:user
Password:pass
Remember:true
Remember:false
btnLogin:Sign In
ReturnTo:https://amplifier.tunein.com/sessions/new
Source:amplifier
Does this mean I have to add all of these to my payload variable?
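The listing cannot tell you which fields the server actually requires, but sending the same set the browser sends is a reasonable first thing to try. A dict cannot hold the Remember key twice, so one option is a list of tuples, which requests also accepts for data=. The sketch below uses the standard library only to show how such a list is form-encoded; 'user' and 'pass' are the placeholders from the listing.

```python
from urllib.parse import urlencode

# Field names and values copied from the "Form Data" listing above.
payload = [
    ("Username", "user"),
    ("Password", "pass"),
    ("Remember", "true"),
    ("Remember", "false"),
    ("btnLogin", "Sign In"),
    ("ReturnTo", "https://amplifier.tunein.com/sessions/new"),
    ("Source", "amplifier"),
]

# A list of tuples keeps both the order and the repeated "Remember" key.
encoded = urlencode(payload)
print(encoded)
```

The same list could then be passed as data=payload in the s.post(...) call from the question.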
I checked this question but it only has one answer and it's a little over my head (just started with Python). I'm using Python 3.
I'm trying to scrape data from this page, but if you have a BP account, the page is a lot different/more useful. I need my program to log me in before I have BeautifulSoup get the data for me.
So far I have
from bs4 import BeautifulSoup
import urllib.request
import requests

username = 'myUsername'
password = 'myPassword'

from requests import session

payload = {'action': 'Log in',
           'Username: ': username,
           'Password: ': password}

# the next 3 lines are pretty much copied from a different StackOverflow
# question. I don't really understand what they're doing, and obviously these
# are where the problem is.
with session() as c:
    c.post('https://www.baseballprospectus.com/manageprofile.php', data=payload)
    response = c.get('http://www.baseballprospectus.com/sortable/index.php?cid=1820315')

soup = BeautifulSoup(response.content, "lxml")

for row in soup.find_all('tr')[7:]:
    cells = row.find_all('td')
    name = cells[1].text
    print(name)
The script does work; it just pulls the data from the site before it's logged in, so it's not the data I want.
Conceptually, there is no problem with your code. You're using a session object to send a login request, then with the same session you're sending a request for the desired page. This means that the cookies set by the login request should be kept for the second request. If you want to read more about the workings of the Session object, here's the relevant Requests documentation.
Since I don't have a valid login for Baseball Prospectus, I'll have to guess that something is wrong with the data you're sending to the login page. A quick inspection using the 'Network' tab in Chrome's Developer Tools shows that the login page, manageprofile.php, accepts four POST parameters:
username: myUsername
password: myPassword
action: muffinklezmer
nocache: some long number, e.g. 2417395155
However, you're sending a different set of parameters and specifying a different value for the 'action' parameter. Note that the parameter names have to match the original request exactly, including case and whitespace; otherwise manageprofile.php will not accept the login.
Try replacing the payload dictionary with this version:
payload = {
    'action': 'muffinklezmer',
    'username': username,
    'password': password}
If this doesn't work, try adding the 'nocache' parameter too, e.g.:
'nocache': '1437955145'
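Because the names must match exactly, a quick way to spot mismatches is to compare the keys you are sending with the names seen in the browser's request. This is a small illustrative helper (compare_payload_keys is a made-up name, not part of requests), applied to the payload from the question:

```python
def compare_payload_keys(sent, expected):
    """Return (missing, unexpected) parameter names, compared exactly."""
    sent_keys = set(sent)
    expected_keys = set(expected)
    missing = sorted(expected_keys - sent_keys)
    unexpected = sorted(sent_keys - expected_keys)
    return missing, unexpected


# Parameter names observed in the browser's login request.
expected_names = ["username", "password", "action", "nocache"]

# Payload from the question; note the capital letters, colons and trailing spaces.
original_payload = {
    "action": "Log in",
    "Username: ": "myUsername",
    "Password: ": "myPassword",
}

missing, unexpected = compare_payload_keys(original_payload, expected_names)
print("missing:", missing)
print("unexpected:", unexpected)
```

This only checks names, not values, so the 'action' value would still need to be compared by eye.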