How can I log in to this specific website using Python Requests? - python

I am trying to log in to this website using the following request, but it doesn't work: the cookie never contains 'userid'.
What should I change? Do I need to add headers to my POST request?
import requests

payload = {
    'ctl00$MasterMainContent$LoginCtrl$Username': 'myemail#email.com',
    'ctl00$MasterMainContent$LoginCtrl$Password': 'mypassword',
    'ctl00$MasterMainContent$LoginCtrl$cbxRememberMe': 'on',
}

with requests.Session() as s:
    login_page = s.get('http://www.bentekenergy.com/')
    response = s.post('http://benport.bentekenergy.com/Login.aspx', data=payload)
    if 'userid' in response.cookies:
        print("connected")
    else:
        print("not connected")
Edit 1 (following comments):
I am not sure what to put in the request headers; below is what I tried, unsuccessfully.
request_headers = {
    'Accept': 'image/webp,image/*,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
    'Cookie': 'ACOOKIE=C8ctADJmMTc1YTRhLTBiMTEtNGViOC1iZjE0LTM5NTNkZDVmMDc1YwAAAAABAAAASGYBALlflFnvWZRZAQAAAABLAAC5X5RZ71mUWQAAAAA-',
    'Host': 'statse.webtrendslive.com',
    'Referer': 'https://benport.bentekenergy.com/Login.aspx',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
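(Editorial note: the Host and Cookie values above appear to have been copied from a WebTrends tracking request rather than from the login POST itself. As suggested later in the comments, requests computes Host, Content-Length, and cookies itself when a Session is used, so a trimmed header set is usually safer. A minimal sketch, reusing the User-Agent above:)

# A trimmed header set: let the Session handle Host, Content-Length,
# and cookies; a browser-like User-Agent is often all that is needed.
request_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Referer': 'https://benport.bentekenergy.com/Login.aspx',
}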
Edit 2 (following stovfl's answer):
I now use the following payload, filling in each attribute with the value from the form and completing it with the username, password, and rememberMe.
I also tried the following headers in the request.
Still not connected.
payload = {
    '__VIEWSTATE': '',
    '__VIEWSTATEGENERATOR': '',
    '__PREVIOUSPAGE': '',
    '__EVENTVALIDATION': '',
    'isAuthenticated': 'False',
    'ctl00$hfAccessKey': '',
    'ctl00$hfVisibility': '',
    'ctl00$hfDateTime': '',
    'ctl00$hfHash': '',
    'ctl00$hfAnnouncementsUrl': '',
    'ctl00$MasterMainContent$LoginCtrl$Username': '',
    'ctl00$MasterMainContent$LoginCtrl$Password': '',
    'ctl00$MasterMainContent$LoginCtrl$cbxRememberMe': '',
}
request_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Length': '7522',  # NB: hard-coding this can conflict with the length requests would compute
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': '',
    'Host': 'benport.bentekenergy.com',
    'Origin': 'https://benport.bentekenergy.com',
    'Referer': 'https://benport.bentekenergy.com/Login.aspx',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
from bs4 import BeautifulSoup  # missing import in the original snippet

with requests.Session() as s:
    response = s.get('http://benport.bentekenergy.com/Login.aspx')
    soup = BeautifulSoup(response.text, "html.parser")
    if soup.find("input", {"name": "ctl00$MasterMainContent$LoginCtrl$Username"}):
        print("not connected")
    soup = BeautifulSoup(response.text, "lxml")
    for element in soup.select("input"):
        if element.get("name") in payload:
            payload[element.get("name")] = element.get("value")
    payload['ctl00$MasterMainContent$LoginCtrl$Username'] = 'myemail#email.com'
    payload['ctl00$MasterMainContent$LoginCtrl$Password'] = 'mypassword'
    payload['ctl00$MasterMainContent$LoginCtrl$cbxRememberMe'] = 'on'
    response = s.post('http://benport.bentekenergy.com/Login.aspx', data=payload, headers=request_headers)
    print(s.cookies)
    soup = BeautifulSoup(response.text, "html.parser")
    if soup.find("input", {"name": "ctl00$MasterMainContent$LoginCtrl$Username"}):
        print("not connected")
    else:
        print("connected")
s.cookies contains:
<RequestsCookieJar[<Cookie BenportState=q1k2r2eqftltjm55igy5mg55 for .bentekenergy.com/>, <Cookie RememberMe=True for .bentekenergy.com/>]>
Edit 3 (answer!):
I added '__EVENTTARGET' : '' to the payload and filled it with the value 'ctl00$MasterMainContent$LoginCtrl$btnSignIn'.
Now I am connected!
NB: the headers were not necessary, just the payload.
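For reference, a minimal consolidated sketch of the flow that ended up working, assuming the form field names from the analysis below (credentials are placeholders):

import requests
from bs4 import BeautifulSoup

LOGIN_URL = 'https://benport.bentekenergy.com/Login.aspx'  # https: http is auto-redirected

payload = {
    '__EVENTTARGET': 'ctl00$MasterMainContent$LoginCtrl$btnSignIn',  # the missing piece
    'ctl00$MasterMainContent$LoginCtrl$Username': 'myemail#email.com',
    'ctl00$MasterMainContent$LoginCtrl$Password': 'mypassword',
    'ctl00$MasterMainContent$LoginCtrl$cbxRememberMe': 'on',
}

with requests.Session() as s:
    # GET the login page first: it sets the session cookie and carries the
    # ASP.NET hidden fields (__VIEWSTATE, __EVENTVALIDATION, ...).
    page = s.get(LOGIN_URL)
    soup = BeautifulSoup(page.text, 'html.parser')
    for element in soup.select('input[type=hidden]'):
        name = element.get('name')
        if name and name not in payload:
            payload[name] = element.get('value', '')
    response = s.post(LOGIN_URL, data=payload)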

Comment: ... found that there is a parameter '__EVENTTARGET' that was not in the payload. It needed to contain 'ctl00$MasterMainContent$LoginCtrl$btnSignIn'. Now I am connected!
Yes, I overlooked the Submit button; there is JavaScript attached to it:
href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$headerLoginCtrl$btnSignIn",
Relevant: SO answer How To see POST Data
Comment: ... based on your answer (Edit 2). Still not connected.
You are using http instead of https; it will be auto-redirected to https.
The <RequestsCookieJar> has changed, so some progress.
I'm still unsure about your authenticated check: if soup.find("input", {"name"....
Have you checked the page content? Any error message?
Don't use BeautifulSoup(...); your following requests should use Session s to reuse the assigned cookie, e.g. response = s.get('<url to some restricted page>').
Try request_headers with only 'User-Agent'.
Analysis of <form>:
Login URL: https://benport.bentekenergy.com/Login.aspx
Form: action: /Login.aspx, method: post
A non-empty value means a pre-set value from the login page.
 1: input type:hidden    value:/wEPDwUKLT...  id:__VIEWSTATE
 2: input type:hidden    value:0BA31D5D       id:__VIEWSTATEGENERATOR
 3: input type:hidden    value:2gILTn0H1S...  id:__PREVIOUSPAGE
 4: input type:hidden    value:/wEWDAKIr6...  id:__EVENTVALIDATION
 5: input type:hidden    value:False          id:isAuthenticated
 6: input type:hidden    value:nu66O9eqvE     id:ctl00_hfAccessKey
 7: input type:hidden    value:public         id:ctl00_hfVisibility
 8: input type:hidden    value:08%2F16%2F...  id:ctl00_hfDateTime
 9: input type:hidden    value:3AB353573D...  id:ctl00_hfHash
10: input type:hidden    value://announce...  id:ctl00_hfAnnouncementsUrl
11: input type:text      value:empty          id:ctl00_MasterMainContent_LoginCtrl_Username
12: input type:password  value:empty          id:ctl00_MasterMainContent_LoginCtrl_Password
13: input type:checkbox  value:empty          id:ctl00_MasterMainContent_LoginCtrl_cbxRememberMe

Related

Python request cookies setting does not work

I want to crawl the data from this website: 'http://www.stcn.com/article/search.html?search_type=all&page_time=1', but the website needs cookies from the homepage first, so I first get the cookies it needs from 'http://www.stcn.com/article/search.html' and set them in the request, but after many attempts it still doesn't work.
My code looks like this:
import requests

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
    'Host': 'www.stcn.com'
}

def _getStcnCookie(keyWords='all'):
    url = "http://www.stcn.com/article/search.html"
    data = {'keyword': keyWords}
    r = requests.get(url, data, headers=headers, timeout=10)
    if r.status_code != 200:
        return None
    return requests.utils.dict_from_cookiejar(r.cookies)

def searchStcnData(url, keyWords):
    myHeader = dict.copy(headers)
    myHeader['X-Requested-With'] = 'XMLHttpRequest'
    cookies = _getStcnCookie(keyWords=keyWords)
    print(cookies)
    jar = requests.cookies.cookiejar_from_dict(cookies)
    data = {'keyword': 'Paxlovid', 'page_time': 1, 'search_type': 'all'}
    # Option one
    s = requests.Session()
    response = s.post(url, data, headers=myHeader, timeout=5, cookies=cookies)
    print(response.text)
    # Option two
    # myHeader['Cookie'] = 'advanced-stcn_web=potef1789mm5nqgmd6jc1rcih3; path=/; HttpOnly;' + cookiesStr
    # Option three
    r = requests.post(url, data, headers=myHeader, timeout=5, cookies=cookies)
    print(r.json())
    return r.json()

searchStcnData('http://www.stcn.com/article/search.html?search_type=all&page_time=1', 'Paxlovid')
I've tried options 1, 2, and 3, to no avail.
I set cookies in Postman, and only setting 'advanced-stcn_web=5sdfitvu42qggmnjvop4dearj4' gets the data, like this:
{
    "state": 1,
    "msg": "操作成功",
    "data": "<li class=\"\">\n <div class=\"content\">\n <div class=\"tt\">\n <a href=\"/article/detail/769123.html\" target=\"_blank\">\n ......
    "page_time": 2
}
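(Editorial note: a variant that may be worth trying is to let a single requests.Session carry the cookies from the first GET into the POST automatically, instead of converting the cookie jar to a dict and back. A sketch, reusing the URLs and keyword from the question; whether the site issues the needed advanced-stcn_web cookie this way is untested:)

import requests

with requests.Session() as s:
    s.headers.update({
        'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
    })
    # First request: the server sets its cookies on the session.
    s.get('http://www.stcn.com/article/search.html', timeout=10)
    # Second request: the session sends those cookies back automatically.
    data = {'keyword': 'Paxlovid', 'page_time': 1, 'search_type': 'all'}
    r = s.post('http://www.stcn.com/article/search.html?search_type=all&page_time=1',
               data=data, timeout=5)
    print(r.json())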

Supreme adding to cart with Python Requests

So I'm currently trying to cart an item on supreme through requests. After posting the carting request I don't get any errors but just that it didn't work as a response.
# imports
import requests
from bs4 import BeautifulSoup

# constants
baseurl = "https://www.supremenewyork.com/"
product_category = ""
size = ["Medium"]
product_keywords = ["Supreme®/The North Face® Steep Tech Fleece Pant"]
product_style = ["Brown"]

# functions
def carting(url):
    session = requests.Session()
    r = session.get(baseurl + url)
    soup = BeautifulSoup(r.text, "html.parser")
    name = soup.find("h1", {"itemprop": "name"}).text
    style = soup.find("p", {"itemprop": "model"}).text
    for keyword in product_keywords:
        if keyword in name:
            for keyword in product_style:
                if keyword in style:
                    print("Product Found! Adding to cart...")
                    form = soup.find("form", {"id": "cart-add"})
                    payload = {
                        "utf8": "✓",
                        "authenticity_token": form.find("input", {"name": "authenticity_token"})["value"],
                        "style": form.find("input", {"name": "style"})["value"],
                        "size": "92001",  # form.find("select", {"id" : "size"})["value"] -- need to rework getting size through keyword
                        "qty": "1"
                    }
                    headers = {
                        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
                        "origin": "https://www.supremenewyork.com",
                        "referer": baseurl + url,
                        "path": form["action"],
                        'Host': 'www.supremenewyork.com',
                        'Accept': 'application/json',
                        'Proxy-Connection': 'keep-alive',
                        'X-Requested-With': 'XMLHttpRequest',
                        'Accept-Encoding': 'gzip, deflate',
                        'Accept-Language': 'en-us',
                        'Content-Type': 'application/x-www-form-urlencoded',
                        'Origin': 'http://www.supremenewyork.com',
                        'Connection': 'keep-alive',
                    }
                    response = session.post(baseurl + form["action"], data=payload)
                    print(response.text)
    return session

carting("/shop/pants/mj1czv0pa/jcyp91a8w")
The answer I get printed is:
Product Found! Adding to cart...
{"cart":[],"success":false}
I wonder whether I just have to gamble on which headers to include for it to work, since maybe the site expects certain headers (or even all the headers a browser would send) to be present.
Help appreciated!
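(Editorial note: one detail worth flagging in the snippet above is that the headers dict is built but never passed to the POST, so none of those headers ever reach the server. A minimal correction is sketched below; whether this alone satisfies the site's checks is untested.)

# Actually send the prepared headers with the add-to-cart POST;
# without headers=, the dict above is silently unused.
response = session.post(baseurl + form["action"], data=payload, headers=headers)
print(response.text)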

Scraping with AJAX - how to obtain the data?

I am trying to scrape the data from https://www.anre.ro/ro/info-consumatori/comparator-oferte-tip-de-furnizare-a-gn, which gets its input via AJAX (the request URL is https://www.anre.ro/ro/ajax/comparator/get_results_gaz).
However, I can see that the form data takes the form tip_client=casnic&modalitate_racordare=sistem_de_distributie&transee_de_consum=b1&tip_pret_unitar=cu_reglementate&id_judet=ALBA&id_siruta=1222&consum_mwh=&pret_furnizare_mwh=&componenta_fixa=&suplimentar_componenta_fixa=&termen_plata=&durata_contractului=&garantii=&frecventa_emitere_factura=&tip_pret= (if I view the source in Chrome). How do I pass this to Scrapy or any other module to retrieve the desired webpage?
So far, I have this (is the dict format correct considering the form data?):
import scrapy

class ExSpider(scrapy.Spider):
    name = 'ExSpider'
    allowed_domains = ['anre.ro']

    def start_requests(self):
        params = {
            "tip_client": "casnic",
            "modalitate_racordare": "sistem_de_distributie",
            "transee_de_consum": "b1",
            "tip_pret_unitar": "cu_reglementate",
            "id_judet": "ALBA",
            "id_siruta": "1222",
            "consum_mwh": "",
            "pret_furnizare_mwh": "",
            "componenta_fixa": "",
            "suplimentar_componenta_fixa": "",
            "termen_plata": "",
            "durata_contractului": "",
            "garantii": "",
            "frecventa_emitere_factura": "",
            "tip_pret": ""
        }
        r = scrapy.FormRequest('https://www.anre.ro/ro/ajax/comparator/get_results_gaz',
                               method="POST", formdata=params)
        print(r)
The following should produce the required response from that page you wish to grab data from.
import scrapy

class ExSpider(scrapy.Spider):
    name = "exspider"
    url = 'https://www.anre.ro/ro/ajax/comparator/get_results_gaz'

    payload = {
        'tip_client': 'casnic',
        'modalitate_racordare': 'sistem_de_distributie',
        'transee_de_consum': 'b2',
        'tip_pret_unitar': 'cu_reglementate',
        'id_judet': 'ALBA',
        'id_siruta': '1222',
        'consum_mwh': '',
        'pret_furnizare_mwh': '',
        'componenta_fixa': '',
        'suplimentar_componenta_fixa': '',
        'termen_plata': '',
        'durata_contractului': '',
        'garantii': '',
        'frecventa_emitere_factura': '',
        'tip_pret': ''
    }

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'https://www.anre.ro/ro/info-consumatori/comparator-oferte-tip-de-furnizare-a-gn'
    }

    def start_requests(self):
        yield scrapy.FormRequest(
            self.url,
            formdata=self.payload,
            headers=self.headers,
            callback=self.parse
        )

    def parse(self, response):
        print(response.text)
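If you'd rather run the spider from a plain Python script than via the scrapy CLI, a minimal driver might look like this (a sketch, assuming the ExSpider class above is defined in the same file):

from scrapy.crawler import CrawlerProcess

# Run the spider in-process; parse() prints the AJAX response to stdout.
process = CrawlerProcess()
process.crawl(ExSpider)
process.start()  # blocks until the crawl finishes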

Unable to grab expected result from a site issuing post requests

I'm trying to fetch some JSON response from a webpage using the script below. Here are the steps to populate the result on that site: click the AGREE button located at the bottom of this webpage, then the EDIT SEARCH button, and finally the SHOW RESULTS button, without changing anything.
I've tried like this:
import requests
from bs4 import BeautifulSoup

url = 'http://finra-markets.morningstar.com/BondCenter/Results.jsp'
post_url = 'http://finra-markets.morningstar.com/bondSearch.jsp'

payload = {
    'postData': {'Keywords': []},
    'ticker': '',
    'startDate': '',
    'endDate': '',
    'showResultsAs': 'B',
    'debtOrAssetClass': '1,2',
    'spdsType': ''
}

payload_second = {
    'count': '20',
    'searchtype': 'B',
    'query': {"Keywords": [{"Name": "debtOrAssetClass", "Value": "3,6"}, {"Name": "showResultsAs", "Value": "B"}]},
    'sortfield': 'issuerName',
    'sorttype': '1',
    'start': '0',
    'curPage': '1'
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'
    s.headers['Referer'] = 'http://finra-markets.morningstar.com/BondCenter/UserAgreement.jsp'
    r = s.post(url, json=payload)
    s.headers['Access-Control-Allow-Headers'] = r.headers['Access-Control-Allow-Headers']
    s.headers['cf-request-id'] = r.headers['cf-request-id']
    s.headers['CF-RAY'] = r.headers['CF-RAY']
    s.headers['X-Requested-With'] = 'XMLHttpRequest'
    s.headers['Origin'] = 'http://finra-markets.morningstar.com'
    s.headers['Referer'] = 'http://finra-markets.morningstar.com/BondCenter/Results.jsp'
    r = s.post(post_url, json=payload_second)
    print(r.content)
This is the result I get when I run the script above:
b'\n\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\n\n\n{}'
How can I make the script produce the expected result from that site?
P.S. I do not wish to use Selenium for this.
The response for http://finra-markets.morningstar.com/BondCenter/Results.jsp doesn't contain the search results; it must be fetching the data asynchronously.
An easy way to find out which network request returns the search results is to search the requests for one of the results using Firefox's dev tools.
To convert the HTTP request to a Python request, I copied the request as cURL from Firefox, imported it into Postman, and then exported it as Python code (a little long-winded (and lazy), I know!).
All this leads to the following code:
import requests

url = "http://finra-markets.morningstar.com/bondSearch.jsp"

payload = "count=20&searchtype=B&query=%7B%22Keywords%22%3A%5B%7B%22Name%22%3A%22debtOrAssetClass%22%2C%22Value%22%3A%223%2C6%22%7D%2C%7B%22Name%22%3A%22showResultsAs%22%2C%22Value%22%3A%22B%22%7D%5D%7D&sortfield=issuerName&sorttype=1&start=0&curPage=1"

headers = {
    'User-Agent': "...",
    'Accept': "text/plain, */*; q=0.01",
    'Accept-Language': "en-US,en;q=0.5",
    'Content-Type': "application/x-www-form-urlencoded; charset=UTF-8",
    'X-Requested-With': "XMLHttpRequest",
    'Origin': "http://finra-markets.morningstar.com",
    'DNT': "1",
    'Connection': "keep-alive",
    'Referer': "http://finra-markets.morningstar.com/BondCenter/Results.jsp",
    'Cookie': "...",
    'cache-control': "no-cache"
}

response = requests.request("POST", url, data=payload, headers=headers)
print(response.text)
The response wasn't 100% JSON. So I just stripped away the outer whitespace and {B:..} part:
>>> text = response.text.strip()[3:-1]
>>> import json
>>> data = json.loads(text)
>>> data['Columns'][0]
{'moodyRating': {'ratingText': '', 'ratingNumber': 0},
'fitchRating': {'ratingText': None, 'ratingNumber': None},
'standardAndPoorRating': {'ratingText': '', 'ratingNumber': 0},
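If the fixed [3:-1] slice feels brittle, a slightly more defensive variant (assuming the wrapper always has the {B:{...}} shape shown above) extracts the inner object with a regex:

import json
import re

# The endpoint wraps its JSON as {B:{...}}; pull out the inner {...}
# instead of relying on fixed character offsets.
match = re.search(r'\{B:(\{.*\})\}', response.text, re.DOTALL)
data = json.loads(match.group(1)) if match else None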

Unable to login to a site with requests

For fun, I'm trying to use Python Requests to log in to my school's student portal. This is what I've come up with so far. I'm trying to be very explicit with the headers, because I'm getting a 200 status code (the code you also get when login fails) instead of a 302 (successful login).
import sys
import os
import requests

def login(username, password):
    url = '(link)/home.html#sign-in-content'
    values = {
        'translator_username': '',
        'translator_password': '',
        'translator_ldappassword': '',
        'returnUrl': '',
        'serviceName': 'PS Parent Portal',
        'serviceTicket': '',
        'pcasServerUrl': '\/',
        'credentialType': 'User Id and Password Credential',
        'account': username,
        'pw': password,
        'translatorpw': password
    }
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9',
        'cache-control': 'max-age=0',
        'connection': 'keep-alive',
        'content-type': 'application/x-www-form-urlencoded',
        'host': '(link)',
        'origin': '(link)',
        'referer': '(link)guardian/home.html',
        'upgrade-insecure-requests': '1'
    }
    with requests.Session() as s:
        p = s.post(url, data=values)
        if p.status_code == 302:
            print(p.text)
        print('Authentication error', p.status_code)
        r = s.get('(link)guardian/home.html')
        print(r.text)

def main():
    login('myname', 'mypass')

if __name__ == '__main__':
    main()
Using Chrome to examine the network requests, all of these headers appear under 'Request Headers', along with a long cookie string, content-length, and user-agent.
The form fields are as follows:
pstoken: (token)
contextData: (text)
translator_username:
translator_password:
translator_ldappassword:
returnUrl: (url)guardian/home.html
serviceName: PS Parent Portal
serviceTicket:
pcasServerUrl: \/
credentialType: User Id and Password Credential
account: f
pw: (id)
translatorpw:
Am I missing something with the headers/form names? Is it a problem with cookies?
If I look at p.request.headers, this is what is sent:
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36', 'accept-encoding': 'gzip, deflate, br', 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8', 'connection': 'keep-alive', 'accept-language': 'en-US,en;q=0.9', 'cache-control': 'max-age=0', 'content-type': 'application/x-www-form-urlencoded', 'host': '(url)', 'origin': '(url)', 'referer': '(url)guardian/home.html', 'upgrade-insecure-requests': '1', 'Content-Length': '263'}
p.text gives me the HTML of the login page.
Tested with PowerAPI, requests, Mechanize, and RoboBrowser. All fail.
What response do you expect? You are analyzing your response the wrong way.
with requests.Session() as s:
    p = s.post(url, data=values)
    if p.status_code == 302:
        print(p.text)
    print('Authentication error', p.status_code)
    r = s.get('(link)guardian/home.html')
    print(r.text)
In your code, you print 'Authentication error' regardless of the status_code. I think it should at least look like this:
with requests.Session() as s:
    p = s.post(url, data=values)
    if p.status_code == 302:
        print(p.text)
        r = s.get('(link)guardian/home.html')
        print(r.text)
    else:
        print('Authentication error', p.status_code)
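One more caveat: requests follows redirects by default, so even a successful login will come back as the post-redirect 200, never as a 302. To actually observe the 302, disable redirects on the POST. A sketch, using the same url and values as above:

with requests.Session() as s:
    # allow_redirects=False keeps the raw 302 visible instead of the
    # final page served after the redirect.
    p = s.post(url, data=values, allow_redirects=False)
    if p.status_code == 302:
        r = s.get('(link)guardian/home.html')
        print(r.text)
    else:
        print('Authentication error', p.status_code)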
