Logging in using Requests in Python

Hello, I am trying to log in to https://www.neighborwho.com using Requests for Python, but the website's response keeps telling me that it cannot find any user with my username, even though I can log in manually through a normal browser. I know I could use a headless browser, or maybe lxml or MechanicalSoup, etc., but I am learning Python and Requests right now, so I want to see if it can be done with Requests alone.
Here is my code:
import requests

url = 'https://www.neighborwho.com/api/v5/session'

payload = {'user[email]': 'my_username',
           'user[password]': 'my_password'}

headers = {'referer': 'https://www.neighborwho.com/app/login',
           'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
           'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
           'x-requested-with': 'XMLHttpRequest',
           'origin': 'https://www.neighborwho.com',
           'accept': 'application/json, text/javascript, */*; q=0.01',
           'accept-encoding': 'gzip, deflate, br',
           'accept-language': 'en-US,en;q=0.9',
           # note: requests computes the content-length itself; hardcoding the
           # value copied from the browser can corrupt the request if the body differs
           # 'content-length': '451',
           'sec-fetch-mode': 'cors',
           'sec-fetch-site': 'same-origin'}

s = requests.Session()
resp = s.post(url, data=payload, headers=headers)

print(resp.status_code)
print(resp.text)
Here is the output I am getting:
401
{"session":{"errors":"We do not see an account that matches that
email/password combination. For security reasons we may occasionally reset
passwords. If you have an account that matches the email address
\"my_username\" and need to reset your password, please use the link
below."},"meta":{"status":401}}

Related

I'm receiving a 419 Page Expired status code when trying to use requests. How do I successfully log in?

I'm getting a 419 Page Expired status code when using requests on this site. I gathered the information for the headers and data by monitoring the Network tab of the developer console. How can I use the Python requests module to successfully log in?
import requests

url = 'https://rates.itgtrans.com/login'

headers = {
    'authority': 'rates.itgtrans.com',
    'cache-control': 'max-age=0',
    'sec-ch-ua': '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'origin': 'https://rates.itgtrans.com',
    'content-type': 'application/x-www-form-urlencoded',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': 'https://rates.itgtrans.com/login',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': 'XSRF-TOKEN=eyJpdiI6IkEzbi9JQkVwbWloZTM1UVdSdVJtK0E9PSIsInZhbHVlIjoiM1pxQVYxajhPcWdlZ1NlYlVMSUlyQzFISVpPNjNrMVB0UmNYMXZGa0crSmYycURoem1vR0FzRUMrNjB2bXFPbCs4U3ZyeGM4ZVNLZ1NjRGVmditUMldNUUNmYmVzeTY2WS85VC93a1c0M0JUMk1Jek00TTNLVnlPb2VVRXpiN0ciLCJtYWMiOiJkNjQyMTMwMGRmZmQ4YTg0ZTNhZDgzODQ5M2NiMmE2ODdlYjRlOTIyMWE5Yjg4YzEyMTBjNTI2ODQxY2YxMzNkIiwidGFnIjoiIn0%3D; draymaster_session=eyJpdiI6Im9vUDZabmlYSTY0a1lSNGdYZzZHT0E9PSIsInZhbHVlIjoiMGVVcSs2T3RheGhMeDNVVFJUQjRmb212TkoySVY5eWFjeVNHT1lGWE9sRHdtR3JTa0REZFhMTzNJeisyTjNOZ1hrQnNscWY0dXBheFFaRFhIdDAvUlFMOFdvTFdaOXBoejcwb2ZDNFNMdDZ6MUFxT2dHU3hlNVkxZmpiTnd2Z0QiLCJtYWMiOiIwN2RmZTc1ZDUzYzViYTgzYWU1MjFjNjIxZjYzMzY3MDE0YjI4MDhkMWMwMTVkYmYxYWM2MzQ0ODM1YzRkNDY1IiwidGFnIjoiIn0%3D'
}

data = {
    '_token': 'o8jJ4tR3PHkuz5TR2kuoHwBAdHd5RczFx2rlul1C',
    'email': '****',
    'password': '****',
    'button': ''
}

with requests.Session() as s:
    cookies = s.cookies
    p = s.post(url='https://rates.itgtrans.com/login', data=data, headers=headers, cookies=cookies)
    print(p)
As I see it, the whole problem is that you always use the same _token.
For security reasons, the server generates a new unique token for every user, valid for only a few minutes, so an attacker can't capture it and reuse it later.
BTW: when I run your code, get the page with status 419, and display p.text, I see HTML with the text Page Expired, which confirms that you are using an expired token.
You should always GET this page first and search for the fresh token in the HTML
<input name="_token" type="hidden" value="Xz0pJ0djGVnfaRMuXNDGMdBmZRbc55Ql2Q2CTPit"/>
and use this value in the POST.
I don't have an account on this page, but using a fresh token from <input name="_token"> I get status 200 instead of 419.
import requests
from bs4 import BeautifulSoup

url = 'https://rates.itgtrans.com/login'

headers = {
    'authority': 'rates.itgtrans.com',
    'cache-control': 'max-age=0',
    'origin': 'https://rates.itgtrans.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'referer': 'https://rates.itgtrans.com/login',
    'accept-language': 'en-US,en;q=0.9',
}

data = {
    '_token': '-empty-',
    'email': '****',
    'password': '****',
    'button': ''
}

with requests.Session() as s:
    # --- first GET the page ---
    response = s.get(url='https://rates.itgtrans.com/login', headers=headers)
    #print(response.text)

    # --- search for the fresh token in the HTML ---
    soup = BeautifulSoup(response.text, 'html.parser')  # parser given explicitly to avoid a warning
    token = soup.find('input', {'name': "_token"})['value']
    print('token:', token)

    # --- run the POST with the new token ---
    data['_token'] = token
    response = s.post(url='https://rates.itgtrans.com/login', data=data, headers=headers)
    #print(response.text)
    print('status_code:', response.status_code)
BTW:
I get 200 even if I don't use the headers.
Because the code uses a Session, I don't have to copy cookies from the GET to the POST; the Session carries them over automatically.
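A quick way to see this cookie persistence in action (a standalone demo using httpbin.org as a stand-in server, not part of the original site):

import requests

with requests.Session() as s:
    # the first request receives a Set-Cookie header from the server
    s.get('https://httpbin.org/cookies/set/sessionid/abc123')
    # the second request sends that cookie back automatically
    r = s.get('https://httpbin.org/cookies')
    print(r.json())  # {'cookies': {'sessionid': 'abc123'}}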

python requests gets a 403 for JSON, however in a browser it works fine

I'm trying to get data from eToro. This link works in my browser, https://www.etoro.com/sapi/userstats/CopySim/Username/viveredidividend/OneYearAgo, but it's forbidden via requests.get(), even if I add a user agent, headers, and even cookies.
import requests

url = "https://www.etoro.com/sapi/userstats/CopySim/Username/viveredidividend/OneYearAgo"

headers = {
    'Host': 'www.etoro.com',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Referer': 'https://www.etoro.com/people/viveredidividend/chart',
    'Cookie': 'XXX',
    'TE': 'Trailers'
}

requests.get(url, headers=headers)
>>> <Response [403]>
How can I solve this without Selenium?
This error occurs when your Python code has not authenticated with the site. When you log in via the website, the browser is authenticated and remembered, which is why it works fine there.
In order to solve this problem, you first need to authenticate from within your Python code.
To authenticate (note that auth= sends HTTP Basic credentials, so this only helps if the site actually supports Basic authentication):
import requests

response = requests.get(url, auth=(username, password))
The 403 error tells you that the request you are making is being blocked. Actually, the website is protected by Cloudflare, which prevents it from being scraped. You can check this by executing print(response.text) in your code: you'll see Access denied | www.etoro.com used Cloudflare to restrict access inside the title tag of the returned Cloudflare HTML.
Under the hood, when you send the request it goes through the Cloudflare server, which verifies whether it comes from a real browser or not. Only if the request passes this verification is it forwarded to the website's server, which returns the valid response. Otherwise, Cloudflare blocks the request.
It's difficult to bypass Cloudflare. Nevertheless, you can try your luck with the code given below.
Code
import urllib.request

url = 'https://www.etoro.com/sapi/userstats/CopySim/Username/viveredidividend/OneYearAgo'

headers = {
    'authority': 'www.etoro.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'accept': 'application/json, text/plain, */*',
    'accounttype': 'Real',
    'applicationidentifier': 'ReToro',
    'sec-ch-ua-mobile': '?0',
    'applicationversion': '331.0.2',
    'user-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.etoro.com/discover/markets/cryptocurrencies',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': '__cfruid=e7f40231e2946a1a645f6fa0eb19af969527087e-1624781498; _gcl_au=1.1.279416294.1624782732; _gid=GA1.2.518227313.1624782732; _scid=64860a19-28e4-4e83-9f65-252b26c70796; _fbp=fb.1.1624782732733.795190273; __adal_ca=so%3Ddirect%26me%3Dnone%26ca%3Ddirect%26co%3D%28not%2520set%29%26ke%3D%28not%2520set%29; __adal_cw=1624782733150; _sctr=1|1624732200000; _gaexp=GAX1.2.eSuc0QBTRhKbpaD4vT_-oA.18880.x331; _hjTLDTest=1; _hjid=bb69919f-e61b-4a94-a03b-db7b1f4ec4e4; hp_preferences=%7B%22locale%22%3A%22en-gb%22%7D; funnelFromId=38; eToroLocale=en-gb; G_ENABLED_IDPS=google; marketing_visitor_regulation_id=10; marketing_visitor_country=96; __cflb=0KaS4BfEHptJdJv5nwPFxhdSsqV6GxaSK8BuVNBmVkuj6hYxsLDisSwNTSmCwpbFxkL3LDuPyToV1fUsaeNLoSNtWLVGmBErMgEeYAyzW4uVUEoJHMzTirQMGVAqNKRnL; __cf_bm=6ef9d6f250ee71d99f439672839b52ac168f7c89-1624785170-1800-ASu4E7yXfb+ci0NsW8VuCgeJiCE72Jm9uD7KkGJdy1XyNwmPvvg388mcSP+hTCYUJvtdLyY2Vl/ekoQMAkXDATn0gyFR0LbMLl0b7sCd1Fz/Uwb3TlvfpswY1pv2NvCdqJBy5sYzSznxEsZkLznM+IGjMbvSzQffBIg6k3LDbNGPjWwv7jWq/EbDd++xriLziA==; _uetsid=2ba841e0d72211eb9b5cc3bdcf56041f; _uetvid=2babee20d72211eb97efddb582c3c625; _ga=GA1.2.1277719802.1624782732; _gat_UA-2056847-65=1; __adal_ses=*; __adal_id=47f4f887-c22b-4ce0-8298-37d6a0630bdd.1624782733.2.1624785174.1624782818.770dd6b7-1517-45c9-9554-fc8d210f1d7a; _gat=1; TS01047baf=01d53e5818a8d6dc983e2c3d0e6ada224b4742910600ba921ea33920c60ab80b88c8c57ec50101b4aeeb020479ccfac6c3c567431f; outbrain_cid_fetch=true; _ga_B0NS054E7V=GS1.1.1624785164.2.1.1624785189.35; TMIS2=9a74f8b353780f2fbe59d8dc1d9cd901437be0b823f8ee60d0ab36264e2503993c5e999eaf455068baf761d067e3a4cf92d9327aaa1db627113c6c3ae3b39cd5e8ea5ce755fb8858d673749c5c919fe250d6297ac50c5b7f738927b62732627c5171a8d3a86cdc883c43ce0e24df35f8fe9b6f60a5c9148f0a762e765c11d99d; mp_dbbd7bd9566da85f012f7ca5d8c6c944_mixpanel=%7B%22distinct_id%22%3A%20%2217a4c99388faa1-0317c936b045a4-34647600-13c680-17a4c993890d70%22%2C%22%24device_id%22%3A%20%2217a4c99388faa1-0317c936b045a4-34647600-13c680-17a4c993890d70%22%2C%22%24initial_referrer%22%3A%20%22%24direct%22%2C%22%24initial_referring_domain%22%3A%20%22%24direct%22%7D',
}

request = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(request).read()
print(response.decode('utf-8'))

Account authentication with a POST request in Python

In the following code, I am trying to POST to a Microsoft online account, starting with a page that requires posting an email address. This is my attempt so far:
import requests
from bs4 import BeautifulSoup

url = 'https://moe-register.emis.gov.eg/account/login?ReturnUrl=%2Fhome%2FRegistrationForm'

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,ar;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': '__RequestVerificationToken=vdS3aPPg5qQ2bH9ADTppeKIVJfclPsMI6dqB6_Ru11-2XJPpLfs7jBlejK3n0PZuYl-CwuM2hmeCsXzjZ4bVfj2HGLs2KOfBUphZHwO9cOQ1; .AspNet.MOEEXAMREGFORM=ekeG7UWLA6OSbT8ZoOBYpC_qYMrBQMi3YOwrPGsZZ_3XCuCsU1BP4uc5QGGE2gMnFgmiDIbkIk_8h9WtTi-P89V7ME6t_mBls6T3uR2jlllCh0Ob-a-a56NaVNIArqBLovUnLGMWioPYazJ9DVHKZY7nR_SvKVKg2kPkn6KffkpzzHOUQAatzQ2FcStZBYNEGcfHF6F9ZkP3VdKKJJM-3hWC8y62kJ-YWD0sKAgAulbKlqcgL1ml6kFoctt2u66eIWNm3ENnMbryh8565aIk3N3UrSd5lBoO-3Qh8jdqPCCq38w3cURRzCd1Z1rhqYb3V2qYs1ULRT1_SyRXFQLrJs5Y9fsMNkuZVeDp_CKfyzM',
    'Host': 'moe-register.emis.gov.eg',
    'Origin': 'https://moe-register.emis.gov.eg',
    'Referer': 'https://moe-register.emis.gov.eg/account/login?ReturnUrl=%2Fhome%2FRegistrationForm',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}

with requests.session() as s:
    # r = s.post(url)
    # soup = BeautifulSoup(r.content, 'lxml')
    data = {'EmailAddress': '476731809@matrouh1.moe.edu.eg'}
    r_post = s.post(url, data=data, headers=headers, verify=False)
    soup = BeautifulSoup(r_post.content, 'lxml')
    print(soup)
What I get is the same page that asks to post the email again. I expected to get the page that asks for the sign-in password.
This is the starting page,
and this is an example of the email that needs to be posted: 476731809@matrouh1.moe.edu.eg
I have tried such code as well, but I got the sign-in page again (although the credentials are correct).
Can you please try this code
import requests
from bs4 import BeautifulSoup

url = 'https://login.microsoftonline.com/common/login'

s = requests.Session()
res = s.get('https://login.microsoftonline.com')
cookies = dict(res.cookies)

res = s.post(url,
             auth=('476731809@matrouh1.moe.edu.eg', 'Std#050202'),
             verify=False,
             cookies=cookies)

soup = BeautifulSoup(res.text, 'html.parser')
print(soup)
I checked out the page and the following seems to be working:
import requests

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'Origin': 'https://moe-register.emis.gov.eg',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Referer': 'https://moe-register.emis.gov.eg/account/login',
    'Accept-Language': 'en-US,en;q=0.9,gl;q=0.8,fil;q=0.7,hi;q=0.6',
}

data = {
    'EmailAddress': '476731809@matrouh1.moe.edu.eg'
}

response = requests.post('https://moe-register.emis.gov.eg/account/authenticate', headers=headers, data=data, verify=False)
Your POST endpoint seems to be wrong: you need to follow the redirect from /login to /authenticate to proceed with the request. (I am on a Mac, so my user-agent may differ from yours or from what is required; you can change that in the headers variable.)

How to fix a 403 on a login POST using requests after 3-4 requests?

I'm trying to log in to this site using a Python requests POST. The first 3-4 requests succeed, but around the 5th attempt I get a 403 error from the server.
I have already tried setting headers, including referer, origin, and user-agent, and using a proxy, but that did not help much.
import json
import requests

response = requests.Session()
url = 'https://www.saksfifthavenue.com/account/login?_k=%2Faccount%2Fsummary'

while True:
    try:
        headers = {
            'sec-fetch-mode': 'cors',
            'origin': 'https://www.saksfifthavenue.com',
            'accept-encoding': 'gzip, deflate',
            'accept-language': 'en-US',
            'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
            'content-type': 'application/json;charset=UTF-8',
            'accept': 'application/json, text/plain, */*',
            'referer': url,
            'authority': 'www.saksfifthavenue.com',
            'sec-fetch-site': 'same-origin',
            'dnt': '1',
        }
        data = {"username": "demo@gmail.com", "password": "Thisisatest"}
        login = response.post(
            'https://www.saksfifthavenue.com/v1/account-service/accounts/sign-in',
            headers=headers, data=json.dumps(data)).content
        loginCheck = login.decode()
        print(loginCheck)
        if "Sorry, this does not match our records. Please try again." in loginCheck:
            print('Login failed!!!')
            break
        elif """Your Account""" in loginCheck:
            print('Login success!!!')
        else:
            print('403 Error. Login Failed')
            break
    except:
        pass
It looks like the server detected your spider requests: you are requesting too fast. Maybe you can try setting an interval between these requests.
But why do you need to POST the login in a while loop (and without logging out)?
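For example, a minimal sketch of spacing out the requests (the delay values are illustrative guesses, not known limits of the site):

import time
import random
import requests

session = requests.Session()
for attempt in range(5):
    resp = session.get('https://www.saksfifthavenue.com/')  # placeholder request
    print(attempt, resp.status_code)
    # pause a few seconds, with jitter, so the traffic looks less bot-like
    time.sleep(5 + random.uniform(0, 3))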

POST requests using cookie from session

I am trying to scrape a website by using a POST request to fill in the search form:
http://www.planning2.cityoflondon.gov.uk/online-applications/search.do?action=advanced
In Python, this goes as follows:
import requests
import webbrowser

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': 'JSESSIONID=OwXG0Hkxj+X9ELygHZa-aLQ5.undefined; _ga=GA1.3.1911942552.',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'www.planning2.cityoflondon.gov.uk',
    'Origin': 'http://www.planning2.cityoflondon.gov.uk',
    'Referer': 'http://www.planning2.cityoflondon.gov.uk/online-applications/search.do?action=advanced',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

data = {
    'searchCriteria.developmentType': '002',
    'date(applicationReceivedStart)': '01/08/2000',
    'date(applicationReceivedEnd)': '01/08/2018'
}

url = 'http://www.planning2.cityoflondon.gov.uk/online-applications/advancedSearchResults.do?action=firstPage'
test_file = 'planning_app.html'

with requests.Session() as session:
    r = session.post(url, headers=headers, data=data)
    with open(test_file, 'w') as file:
        file.write(r.text)

webbrowser.open(test_file)
As you can see from the page reopened with webbrowser, this gives an outdated-cookie error.
For this to work I would need to manually go to the webpage, perform a query with the Chrome inspect panel open on the Network tab, look at the cookie in the request headers, and copy-paste that cookie into my code. This would work until, of course, the cookie expires again.
I tried to automate the retrieval of the cookie by doing the following:
headers_get = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'www.planning2.cityoflondon.gov.uk',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

with requests.Session() as session:
    c = session.get('http://www.planning2.cityoflondon.gov.uk/online-applications/', headers=headers_get)
    headers['Cookie'] = 'JSESSIONID=' + list(c.cookies.get_dict().values())[0]
    r = session.post(url, headers=headers, data=data)
    with open(test_file, 'w') as file:
        file.write(r.text)

webbrowser.open(test_file)
I would expect this to work, as it simply automates what I do manually:
go to the page of the GET request, get the cookie from it, and add said cookie to the headers dict of the POST request.
However, I still receive the 'server error' page from the POST request.
Would anyone be able to explain why this happens?
requests.post accepts a cookies named parameter. Using it, instead of sending the cookies directly in the header, may fix the problem:
with requests.Session() as session:
    c = session.get('http://www.planning2.cityoflondon.gov.uk/online-applications/', headers=headers_get)
    # Alternatively, you can pass cookies=session.cookies
    r = session.post(url, headers=headers, data=data, cookies=c.cookies)
Basically, I suppose there may be some JavaScript logic on the site which isn't executed when you use requests.post. If that's the case, you will have to use Selenium to fill in and submit the form.
Please see Dynamic Data Web Scraping with Python, BeautifulSoup, which covers a similar problem: JavaScript not being executed.
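If it does come to Selenium, a minimal sketch for the same search form might look like this (the field names are taken from the data dict above; the driver setup and the submit selector are illustrative assumptions):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # requires chromedriver on PATH
driver.get('http://www.planning2.cityoflondon.gov.uk/online-applications/search.do?action=advanced')

# fill in the same date range as the data dict above
driver.find_element(By.NAME, 'date(applicationReceivedStart)').send_keys('01/08/2000')
driver.find_element(By.NAME, 'date(applicationReceivedEnd)').send_keys('01/08/2018')
driver.find_element(By.CSS_SELECTOR, 'input[type="submit"]').click()  # assumed submit control

print(driver.page_source[:500])  # first part of the results page
driver.quit()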
