I have logged into a website using Python requests, but when I try to scrape data from other pages on the site, it seems like I'm no longer authenticated.
I'm receiving a 401 error when trying to access parts of the site whose URLs start with "https://api".
I've tried using auth and proxies, but nothing is working, even though everything works perfectly fine in Chrome. Also, after I log in I can see my information appearing in some of the API content, but when I do a GET on the homepage of the website I am no longer logged in.
import requests
from requests_ntlm import HttpNtlmAuth  # one of the auth schemes I tried

user, pwd = 'me@email.com', 'password'
payload_login = {'email': user, 'password': pwd}

with requests.Session() as s:
    url = 'https://api.website.com/login'
    r = s.post(url, data=payload_login)
    print(r.content)  # this actually returns 200, meaning I've successfully logged in
    print(s.get('https://api.website.com/userProjects', auth=HttpNtlmAuth(user, pwd)))
Output is <Response [401]>
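A common cause of this pattern (200 on login, 401 on the API) is that the login response delivers the credential as a token in the body rather than, or in addition to, a cookie, and that token has to be sent back on every later request. A minimal diagnostic sketch; the token field and header name are assumptions to be checked against the browser's network tab:

import requests

payload_login = {'email': 'me@email.com', 'password': 'password'}

with requests.Session() as s:
    r = s.post('https://api.website.com/login', data=payload_login)
    print(s.cookies.get_dict())  # did the login actually set any cookies?
    # if the API returns a token in the body instead, attach it as a header
    token = r.json().get('token')  # field name is a guess; inspect r.content
    if token:
        s.headers.update({'Authorization': 'Bearer {}'.format(token)})
    print(s.get('https://api.website.com/userProjects').status_code)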
I'm trying to get a bunch of PDFs off a site that sits behind a login, so I don't have to manually download each and every one. I figured this would be easy, but I'm getting a "Missing Key-Pair-Id query parameter" error back. Here's what I have:
import requests

payload = {'username': 'user', 'password': 'pass'}

with requests.Session() as session:
    post = session.post('https://website.com/login.do', data=payload)
    r = session.get('https://files.website.com/1.pdf')
    print(r.text)
I'm printing r.text just because that's where I'm getting the above message. My post variable gives a response of 200, and its contents (post.text) include a redirect link with "code: success", too. If I click that link (or copy-paste it into a private browser window), I'm logged in just fine, and browsing to the PDF link works fine as well. What am I missing here? Thanks.
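"Missing Key-Pair-Id query parameter" is the message Amazon CloudFront returns when a signed URL or signed cookie is absent, so the PDFs are most likely served through CloudFront, and the redirect link returned by the login is probably what sets those cookies. A sketch of following that link inside the same session; how the link is extracted is an assumption, since the exact response format isn't shown:

import requests

payload = {'username': 'user', 'password': 'pass'}

with requests.Session() as session:
    post = session.post('https://website.com/login.do', data=payload)
    # hypothetical extraction; the real link might live in a JSON field
    # or in the HTML of post.text
    redirect_url = post.json().get('redirectUrl')
    if redirect_url:
        session.get(redirect_url)  # visiting it should set the CloudFront cookies
    r = session.get('https://files.website.com/1.pdf')
    print(r.status_code)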
I'm trying to download some .csv files from a private website using the Python requests module.
I can access the website in a browser: after I type in the URL, a window pops up asking for a username and password.
After that, it starts downloading a .csv file.
However, it fails when I use requests.
Here is my code.
import requests

# username and pwd in base64
b64_IDpass = '******'

tics_headers = {
    "Host": 'http://tics-sign.com',
    "Authorization": 'Basic {}'.format(b64_IDpass)
}

# company internet proxy
proxy = {'http': '*****'}

# url
url_get = 'http://tics-sign.com/getlist'
r = requests.get(url_get,
                 headers=tics_headers,
                 proxies=proxy)
print(r)
# <Response [404]>
I've checked the headers in a browser and there is no problem there.
So why does it return <Response [404]> in Python?
You need to post your username and password before you get the list.
So you could try this:

requests.post("http://tics-sign.com", headers=tics_headers)

And then get the info:

requests.get(url_get, proxies=proxy)

This has worked for me on all the previous sites I have scraped that needed authentication. The problem is that each site accepts authentication in a different way, so it may not even work.
It may also be that Python is not being redirected to http://tics-sign.com/displaypanel/login.aspx; curl wasn't for me.
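You can check whether requests followed a redirect at all by inspecting the response's history (reusing the names from the question):

r = requests.get(url_get, headers=tics_headers, proxies=proxy)
print(r.history)  # intermediate redirect responses; an empty list means no redirect happened
print(r.url)      # the final URL after any redirects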
Edit:
I looked at the HTML source of your website and came up with this:

login_data = {"logName": your_id, "pwd": your_password}
requests.post("http://tics-sign.com/displaypanel/login.aspx", data=login_data)
r = requests.get(url_get, proxies=proxy)
You can look at my blog for more info.
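Two things worth adding. First, the Host header in the question includes the scheme (http://), which is not a valid Host value and could itself contribute to the 404; requests sets Host automatically, so it can simply be dropped. Second, since the browser shows a username/password popup, the site probably uses HTTP Basic authentication, and requests can build the Authorization header for you from the plain credentials instead of a hand-encoded base64 string. A sketch, assuming Basic auth:

import requests
from requests.auth import HTTPBasicAuth

proxy = {'http': '*****'}  # same company proxy as above
r = requests.get('http://tics-sign.com/getlist',
                 auth=HTTPBasicAuth('username', 'password'),  # plain text, not base64
                 proxies=proxy)
print(r.status_code)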
I'm trying to log in to a website's login.php using the Python requests module.
If the attempt is successful, the page redirects to index.php.
If not, it stays on login.php.
I was able to do the same with the mechanize module.
import mechanize

b = mechanize.Browser()
url = 'http://localhost/test/login.php'
response = b.open(url)
b.select_form(nr=0)
b.form['username'] = 'admin'
b.form['password'] = 'wrongpwd'
b.method = 'post'
response = b.submit()
print(response.geturl())

if response.geturl() == url:
    print('Failed')
else:
    print('OK')
If the login/password is correct:

user@linux:~$ python script.py
http://localhost/test/index.php
OK
user@linux:~$

If the login/password is wrong:

user@linux:~$ python script.py
http://localhost/test/login.php
Failed
user@linux:~$
My question is: how do I do the same with the requests module?
I've tried different approaches here, but none of them worked.
I've taken the code from your question and modified it:
import requests
url = 'http://localhost/test/login.php'
values = {'username': 'admin', 'password': 'wrongpwd'}
r = requests.post(url, data=values)
print(r.url) # prints the final url of the response
You can be sure of this because it's documented in the source code; all I've done is open the definition of the Response class.
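To mirror the mechanize version's Failed/OK check, compare the final URL of the response with the login URL (reusing url and values from above):

r = requests.post(url, data=values)
if r.url == url:  # still on login.php, so the redirect to index.php never happened
    print('Failed')
else:
    print('OK')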
Now, back to your original question.
Python requests module to verify if HTTP login is successful or not
It depends on whether the website is properly implemented.
When you send a form, the website replies with an HTTP response, which contains a status code. A properly implemented website returns different status codes depending on what you've sent. Here's a list of them. If everything is hunky-dory, the status code of the response will be 200:
import requests
url = 'http://localhost/test/login.php'
values = {'username': 'admin', 'password': 'wrongpwd'}
r = requests.post(url, data=values)
print(r.status_code == 200) # prints True
If the user entered the wrong credentials, the status code of the response will be 401 (see the list above). Now, if a website is not implemented properly, it will respond with 200 anyway, and you'll have to guess whether the login succeeded based on other things, such as response.content and response.url.
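A sketch of that fallback; the marker string is an assumption about what only appears once you're logged in:

r = requests.post(url, data=values)
# 'Logout' is a hypothetical marker; substitute something that only
# appears on your site after a successful login
logged_in = (r.status_code == 200) and ('Logout' in r.text)
print('OK' if logged_in else 'Failed')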
I am trying to log in to www.seek.com.au to test the possibility of remote login using the Python requests module. The site's front end is built with React, so I don't see any form action attribute at www.seek.com.au/sign-in.
When I run the code below, I see a response code of 200, indicating success, but I doubt it's actually successful. My main concern is which URL to use when there is no action attribute in the login form.
import requests

payload = {'email': <username>, 'password': <password>}
url = 'https://www.seek.com.au'

with requests.Session() as s:
    response_op = s.post(url, data=payload)
    # print the response status code
    print(response_op.status_code)
    print(response_op.text)
When I examine the output (response_op.text), I see the words 'Sign in' and 'Register', which indicates the login failed; if it had succeeded, the user's first name would be shown in their place. What am I doing wrong here?
P.S.: I am not trying to scrape data from this website; I am trying to log in to a similar website.
Try this code:
import requests

payload = {"email": "test@test.com", "password": "passwordtest", "rememberMe": True}
url = "https://www.seek.com.au:443/userapi/login"

with requests.Session() as s:
    response_op = s.post(url, json=payload)
    # print the response status code
    print(response_op.status_code)
    print(response_op.text)
You are sending the request to the wrong URL.
Hope this helps!
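Endpoints like /userapi/login are typically found by watching the XHR requests in the browser's DevTools Network tab while logging in. To verify the login beyond the status code, you can re-request the homepage with the same session and apply the asker's own test (the user's first name replaces 'Sign in' on success); the exact marker check is an assumption about the page markup:

import requests

payload = {"email": "test@test.com", "password": "passwordtest", "rememberMe": True}

with requests.Session() as s:
    s.post("https://www.seek.com.au:443/userapi/login", json=payload)
    home = s.get("https://www.seek.com.au")
    # the question says the user's first name replaces 'Sign in' on success
    print('logged in' if 'Sign in' not in home.text else 'not logged in')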
I'm trying to log in to Campaign Monitor to scrape some data from pages related to email campaign performance.
The "login-protected" URL of the page I'm trying to access looks like this:
https://mycompany.createsend.com/campaigns/reports/lists/DFGDF987GD98F7GD?s=BCV98B5XF54BVC54BC
Going to that page in a web browser will redirect to the login page, which itself has a URL like this:
https://login.createsend.com/l/98SDF76DS87F68S/DFGDF987GD98F7GD?ReturnUrl=%2Fcampaigns%2Freports%2Flists%2FBCV98B5XF54BVC54BC%3Fs%3BCV98B5XF54BVC54BC&s=7DS6F87S6DF876SDF76
What I've gathered from trying to solve this is that I need to open a session, authenticate on the redirect URL, then request the URL that I actually want (using the authenticated session).
Here is the code I'm using to try to accomplish that:
import requests

payload = {
    'username': 'myUsername',
    'password': 'myPassword',
}

redURL = 'https://login.createsend.com/l/98SDF76DS87F68S/DFGDF987GD98F7GD?ReturnUrl=%2Fcampaigns%2Freports%2Flists%2FBCV98B5XF54BVC54BC%3Fs%3BCV98B5XF54BVC54BC&s=7DS6F87S6DF876SDF76'

with requests.Session() as s:
    p = s.post(redURL, data=payload)
    # This prints the "success" message I've pasted below
    print(p.content)
    r = s.get('https://mycompany.createsend.com/campaigns/reports/lists/DFGDF987GD98F7GD?s=BCV98B5XF54BVC54BC')
    # This prints the HTML of the login page again, as if I'm not authenticated
    print(r.content)
Here is the "successful" response after the first POST for the session:
{"MultipleAccounts":false,"LoginStatus":"Success","SiteAddress":"https://mycompany.createsend.com","ErrorMessage":"","SessionExpired":false,"Url":"https://mycompany.createsend.com/login?Origin=Marketing\u0026ReturnUrl=%2fcampaigns%2freports%2flists%2f92D2FBCV98B5XF54BVC%3fs%7DS6F87S6DF876SDF76\u0026s=2FBCV98B5XF54BVC","DomainSwitchAddress":"https://mycompany.createsend.com","DomainSwitchAddressQueryString":null,"NeedsDomainSwitch":false}
Can someone please help me understand why the second request in the session returns the HTML of the login page instead of the authenticated version of the page (i.e. the page with the data I'm looking for)?
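One detail in that "success" JSON stands out: it contains a "Url" field pointing back at https://mycompany.createsend.com/login?Origin=.... Since login.createsend.com and mycompany.createsend.com are different hosts, the cookies set by the first POST may not apply to the second request, and that Url may need to be visited within the same session to complete the cross-domain handoff. A hedged sketch of that idea, assuming the login response parses as JSON and reusing payload and redURL from the question:

import requests

with requests.Session() as s:
    p = s.post(redURL, data=payload)
    login_info = p.json()
    # assumption: visiting the returned "Url" completes the handoff and
    # sets cookies on the mycompany.createsend.com domain
    s.get(login_info['Url'])
    r = s.get('https://mycompany.createsend.com/campaigns/reports/lists/DFGDF987GD98F7GD?s=BCV98B5XF54BVC54BC')
    print(r.content)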