I'm trying to write a simple scraper to get usage details for my internet account. I've successfully written it using PowerShell, but I'd like to move it to Python for ease of use/deployment. If I print r.text (the result of the POST to the login page), I just get the login page form details back again.
I think the solution might be something along the lines of using prepare_request? Apologies if I'm missing something super obvious; it's been about 5 years since I touched Python ^^
import requests
USERNAME = 'usernamehere'
PASSWORD = 'passwordhere'
loginURL = 'https://myaccount.amcom.com.au/ClientLogin.aspx'
secureURL = 'https://myaccount.amcom.com.au/FibreUsageDetails.aspx'
session = requests.session()
req_headers = {'Content-Type': 'application/x-www-form-urlencoded'}
formdata = {
    'ctl00$MemberToolsContent$txtUsername': USERNAME,
    'ctl00$MemberToolsContent$txtPassword': PASSWORD,
    'ctl00$MemberToolsContent$btnLogin': 'Login'
}
session.get(loginURL)
r = session.post(loginURL, data=formdata, headers=req_headers, allow_redirects=False)
r2 = session.get(secureURL)
I've referenced these threads in my attempts:
HTTP POST and GET with cookies for authentication in python
Authentication and python Requests
PowerShell script for reference:
$r=Invoke-WebRequest -Uri 'https://myaccount.amcom.com.au/ClientLogin.aspx' -UseDefaultCredentials -SessionVariable RequestForm
$r.Forms[0].Fields['ctl00$MemberToolsContent$txtUsername'] = "usernamehere"
$r.Forms[0].Fields['ctl00$MemberToolsContent$txtPassword'] = "passwordhere"
$r.Forms[0].Fields['ctl00$MemberToolsContent$btnLogin'] = "Login"
$response = Invoke-WebRequest -Uri 'https://myaccount.amcom.com.au/ClientLogin.aspx' -WebSession $RequestForm -Method POST -Body $r.Forms[0].Fields -ContentType 'application/x-www-form-urlencoded'
$response2 = Invoke-WebRequest -Uri 'https://myaccount.amcom.com.au/FibreUsageDetails.aspx' -WebSession $RequestForm
import requests
import re
from bs4 import BeautifulSoup

user = "xyzmohsin"
passwd = "abcpassword"

s = requests.Session()
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"}
s.headers.update(headers)

login_url = "https://myaccount.amcom.com.au/ClientLogin.aspx"

# GET the login page first to pick up the ASP.NET hidden fields
# (__VIEWSTATE and friends) that the server expects back on the POST.
r = s.get(login_url)
soup = BeautifulSoup(r.content, "html.parser")

RadMasterScriptManager_TSM = soup.find(src=re.compile("RadMasterScriptManager_TSM"))['src'].split("=")[-1]
EVENTTARGET = soup.find(id="__EVENTTARGET")['value']
EVENTARGUMENT = soup.find(id="__EVENTARGUMENT")['value']
VIEWSTATE = soup.find(id="__VIEWSTATE")['value']
VIEWSTATEGENERATOR = soup.find(id="__VIEWSTATEGENERATOR")['value']

# Use the literal "$" in the field names -- requests URL-encodes the form
# data itself, so pre-encoded "%24" would end up double-encoded.
data = {"RadMasterScriptManager_TSM": RadMasterScriptManager_TSM,
        "__EVENTTARGET": EVENTTARGET,
        "__EVENTARGUMENT": EVENTARGUMENT,
        "__VIEWSTATE": VIEWSTATE,
        "__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR,
        "ctl00_TopMenu_RadMenu_TopNav_ClientState": "",
        "ctl00$MemberToolsContent$HiddenField_Redirect": "",
        "ctl00$MemberToolsContent$txtUsername": user,
        "ctl00$MemberToolsContent$txtPassword": passwd,
        "ctl00$MemberToolsContent$btnLogin": "Login"}

headers = {"Content-Type": "application/x-www-form-urlencoded",
           "Host": "myaccount.amcom.com.au",
           "Origin": "https://myaccount.amcom.com.au",
           "Referer": "https://myaccount.amcom.com.au/ClientLogin.aspx"}

r = s.post(login_url, data=data, headers=headers)
I don't have a username and password, so I couldn't test the headers on the final POST request. If it doesn't work, try removing Host, Origin and Referer from the final POST request's headers.
Hope that helps :-)
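As a follow-up, once the login POST succeeds, the same session object should be able to reach the protected usage page from the question; a minimal, equally untested sketch:

# Reuse the logged-in session for the protected page from the question
r2 = s.get("https://myaccount.amcom.com.au/FibreUsageDetails.aspx")
print(r2.status_code)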
I want to land on the main (learning) page of my Duolingo profile, but I'm having trouble finding the correct way to sign in to the website with my credentials using Python Requests.
I have tried making requests as best I understood them, but I'm pretty much a noob at this, so it has all been in vain so far.
Help would be really appreciated!
This is what I was trying on my own, by the way:
# The dictionary keys/values and the POST request URL were taken from the
# Network tab in Chrome's Inspect (DevTools).
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
login_data = {
    'identifier': 'something#email.com',
    'password': 'myPassword'
}
with requests.Session() as s:
    url = "https://www.duolingo.com/2017-06-30/login?fields="
    s.post(url, headers=headers, params=login_data)
    r = s.get("https://www.duolingo.com/learn")
    print(r.content)
The post request receives the following content:
b'{"details": "Malformed JSON: No JSON object could be decoded", "error": "BAD_REQUEST_SCHEMA"}'
And since the login fails, the get request for the learn page receives this:
b'<html>\n <head>\n <title>401 Unauthorized</title>\n </head>\n <body>\n <h1>401 Unauthorized</h1>\n This server could not verify that you are authorized to access the document you requested. Either you supplied the wrong credentials (e.g., bad password), or your browser does not understand how to supply the credentials required.<br/><br/>\n\n\n\n </body>\n</html>'
Sorry if I am making any stupid mistakes. I do not know a lot about all this. Thanks!
If you inspect the POST request carefully, you can see that:
- the accepted content type is application/json
- there are more fields than you have supplied (distinctId, landingUrl)
- the data is sent as a JSON request body, not as URL params
The only thing you would need to figure out is how to get distinctId; then you could do the following:
EDIT:
Sending email/password as a JSON body appears to be enough, and there is no need to get distinctId. Example:
import requests
import json

headers = {'content-type': 'application/json'}
data = {
    'identifier': 'something#email.com',
    'password': 'myPassword',
}
with requests.Session() as s:
    url = "https://www.duolingo.com/2017-06-30/login?fields="
    # use json.dumps to convert the dict to a serialized JSON string
    s.post(url, headers=headers, data=json.dumps(data))
    r = s.get("https://www.duolingo.com/learn")
    print(r.content)
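As a side note, requests can do the JSON serialization for you via the json= keyword argument, which also sets the Content-Type header automatically; a minimal equivalent sketch:

import requests

with requests.Session() as s:
    url = "https://www.duolingo.com/2017-06-30/login?fields="
    # json= serializes the dict and sets Content-Type: application/json for us
    s.post(url, json={'identifier': 'something#email.com', 'password': 'myPassword'})
    r = s.get("https://www.duolingo.com/learn")
    print(r.content)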
I am trying to get data from a page. I've read the posts of other people who had the same problem: making a GET request first to get cookies, setting headers, none of it works. When I examine the output of print(soup.title.get_text()), I still get "Log In" as the returned title. The login_data has the same key names as the HTML <input> elements, e.g. <input name=ctl00$cphMain$logIn$UserName ...> for the username and <input name=ctl00$cphMain$logIn$Password ...> for the password. Not sure what to do next. I can't use Selenium, as I have to execute this script on an EC2 instance that's running a Splunk server.
import requests
from bs4 import BeautifulSoup

link = "****"
login_URL = "https://erecruit.elwoodstaffing.com/Login.aspx"

login_data = {
    "ctl00$cphMain$logIn$UserName": "****",
    "ctl00$cphMain$logIn$Password": "****"
}

with requests.Session() as session:
    z = session.get(login_URL)
    session.headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36',
        'Content-Type': 'application/json;charset=UTF-8',
    }
    post = session.post(login_URL, data=login_data)
    response = session.get(link)
    html = response.text
    soup = BeautifulSoup(html, "html.parser")
    print(soup.title.get_text())
I actually found the answer.
You can basically just go to the Network tab in Chrome's DevTools and copy a request as a cURL statement. Then use a website or tool to convert the cURL statement to its programming-language equivalent (Python, Node, Java, and so forth).
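For example (hypothetical endpoint and field names, just to show the shape of the conversion), a copied command like curl 'https://example.com/login' --data 'user=me&pass=secret' converts to roughly:

import requests

# Hypothetical requests equivalent of a copied cURL command;
# the URL and form fields here are illustrative only.
response = requests.post(
    'https://example.com/login',
    data={'user': 'me', 'pass': 'secret'},
)
print(response.status_code)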
I'm trying to log in to the website https://app.factomos.com/connexion, but it doesn't work: I keep getting a 403 error. I've tried different headers and different data, but I really don't know where the problem is...
I tried another way with MechanicalSoup, but that still returns the connection page.
If someone can help me... Thank you for your time :/
import requests
from bs4 import BeautifulSoup as bs

url = 'https://factomos.com'
email = 'myemail'
password = 'mypassword'
url_login = 'https://factomos.com/connexion'

headers = requests.utils.default_headers()
headers.update({
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
})

data_login = {
    'appAction': 'login',
    'email': email,
    'password': password
}

with requests.Session() as s:
    dash = s.post(url_login, headers=headers, data=data_login)
    print(dash.status_code)
# MechanicalSoup
import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
resp = browser.open("https://app.factomos.com/connexion")
browser.select_form('form[id="login-form"]')
browser["email"] = 'myemail'
browser["password"] = 'mypassword'
response = browser.submit_selected()
print("submit: ", response.status_code)
print(browser.get_current_page())
I expect a response 200 with the dashboard page but the actual response is 403 or the connection page.
The URL you are using to log in (https://factomos.com/connexion) is not the correct endpoint. You can find this out using a browser's devtools/inspect-element panel, specifically the "Network" tab.
Accessing this panel varies by browser. Here's how you do it in Chrome, but in general you can open it by right-clicking the page and clicking Inspect element.
From there, I sent a fake login attempt and I found the actual login endpoint is:
https://app.factomos.com/controllers/app-pro/login-ajax.php
As soon as you send the request, you can view details about it, including the form data that was sent (the original answer showed these in screenshots). The response to the fake login was:
{"error":{"code":-1,"message":"Identifiant ou mot de passe incorrect"}}
I have tried logging into GitHub using the following code:
url = 'https://github.com/login'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
           'login': 'username',
           'password': 'password',
           'authenticity_token': 'Token that keeps changing',
           'commit': 'Sign in',
           'utf8': '%E2%9C%93'
           }
res = requests.post(url)
print(res.text)
Now, res.text prints the code of the login page. I understand that may be because the token keeps changing continuously. I have also tried setting the URL to https://github.com/session, but that does not work either.
Can anyone tell me a way to generate the token? I am looking for a way to log in without using the API. I had asked another question where I mentioned that I was unable to log in. One comment said that I wasn't doing it right and that it is possible to log in just by using the requests module, without the help of the GitHub API.
ME:
So, can I log in to Facebook or Github using the POST method? I have tried that and it did not work.
THE USER:
Well, presumably you did something wrong
Can anyone please tell me what I did wrong?
After the suggestion about using sessions, I have updated my code:
s = requests.Session()
headers = {...}  # same as above
s.put('https://github.com/session', headers=headers)
r = s.get('https://github.com/')
print(r.text)
I still can't get past the login page.
I think you get back to the login page because you are redirected, and since your code doesn't send back your cookies, you can't keep a session.
You are looking for session persistence; requests provides it:
Session Objects
The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
http://docs.python-requests.org/en/master/user/advanced/
Actually, in a POST request the parameters belong in the request body, not in the header, so the login data should go in the data parameter.
For GitHub, the authenticity token is present in the value attribute of an <input> tag, which can be extracted using the BeautifulSoup library.
This code works fine:
import requests
from getpass import getpass
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
login_data = {
    'commit': 'Sign in',
    'utf8': '%E2%9C%93',
    'login': input('Username: '),
    'password': getpass()
}
url = 'https://github.com/session'

session = requests.Session()
# GET the login page first to scrape the per-session authenticity token
response = session.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html5lib')
login_data['authenticity_token'] = soup.find(
    'input', attrs={'name': 'authenticity_token'})['value']

response = session.post(url, data=login_data, headers=headers)
print(response.status_code)

response = session.get('https://github.com', headers=headers)
print(response.text)
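Before scraping anything, it's worth confirming the session is actually authenticated; a quick, hedged check (the "Sign out" marker is an assumption, not part of the original answer):

# Hypothetical sanity check: a logged-in GitHub page exposes a "Sign out"
# control, while the anonymous page shows "Sign in" instead.
if 'Sign out' in response.text:
    print('Logged in')
else:
    print('Still anonymous - check the credentials and token')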
This code works perfectly:

import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
login_data = {
    'commit': 'Sign in',
    'utf8': '%E2%9C%93',
    'login': 'your-username',
    'password': 'your-password'
}
with requests.Session() as s:
    url = "https://github.com/session"
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html5lib')
    # pull the CSRF token out of the login form before posting
    login_data['authenticity_token'] = soup.find('input', attrs={'name': 'authenticity_token'})['value']
    r = s.post(url, data=login_data, headers=headers)
You can also try using the PyGitHub API to perform common GitHub tasks.
Check the link below:
https://github.com/PyGithub/PyGithub
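A minimal sketch of that route, assuming a personal access token (GitHub no longer accepts account passwords for API calls):

# pip install PyGithub
from github import Github

g = Github("your-personal-access-token")  # the token here is a placeholder
user = g.get_user()
print(user.login)
for repo in user.get_repos():
    print(repo.full_name)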
I am trying to learn Python, but I have no knowledge of HTTP. I read some posts here about how to use requests to log in to a website, but it doesn't work. My simple code is here (not the real number and password):
#!/usr/bin/env python3
import requests

login_data = {'txtDID': '111111111',
              'txtPswd': 'mypassword'}

with requests.Session() as c:
    c.post('http://phone.ipkall.com/login.asp', data=login_data)
    r = c.get('http://phone.ipkall.com/update.asp')
    print(r.text)
    print("Done")
But I can't get my personal information, which should be shown after login. Can anyone give me a hint or point me in a direction? I have no idea what's going wrong.
Servers don't like bots (scripts), for security reasons, so your script has to behave like a human using a real browser. First use get() to fetch the login page and pick up the session cookies, and set the User-Agent header to a real one. Use http://httpbin.org/headers to see what User-Agent your browser sends.
Always check the results r.status_code and r.url.
So you can start with this (I don't have an account on this server, so I can't test it):
#!/usr/bin/env python3
import requests
s = requests.Session()
s.headers.update({
'User-agent': "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0",
})
# --------
# to get cookies, session ID, etc.
r = s.get('http://phone.ipkall.com/login.asp')
print( r.status_code, r.url )
# --------
login_data = {
'txtDID': '111111111',
'txtPswd': 'mypassword',
'submit1': 'Submit'
}
r = s.post('http://phone.ipkall.com/process.asp?action=verify', data=login_data)
print( r.status_code, r.url )
# --------
BTW: If the page uses JavaScript, you have a problem, because requests can't run the JavaScript on the page.