I am trying to scrape the data from https://www.anre.ro/ro/info-consumatori/comparator-oferte-tip-de-furnizare-a-gn, which loads its results via Ajax (the request URL is https://www.anre.ro/ro/ajax/comparator/get_results_gaz).
However, viewing the request in Chrome, I can see that the Form Data is sent in the form tip_client=casnic&modalitate_racordare=sistem_de_distributie&transee_de_consum=b1&tip_pret_unitar=cu_reglementate&id_judet=ALBA&id_siruta=1222&consum_mwh=&pret_furnizare_mwh=&componenta_fixa=&suplimentar_componenta_fixa=&termen_plata=&durata_contractului=&garantii=&frecventa_emitere_factura=&tip_pret=. How do I pass this to Scrapy or any other module to retrieve the desired page?
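As a side note, that raw Form Data string can be turned into a Python dict mechanically instead of by hand; a minimal sketch using only the standard library (keep_blank_values=True preserves the empty fields):

from urllib.parse import parse_qsl

raw = ("tip_client=casnic&modalitate_racordare=sistem_de_distributie&transee_de_consum=b1"
       "&tip_pret_unitar=cu_reglementate&id_judet=ALBA&id_siruta=1222&consum_mwh="
       "&pret_furnizare_mwh=&componenta_fixa=&suplimentar_componenta_fixa=&termen_plata="
       "&durata_contractului=&garantii=&frecventa_emitere_factura=&tip_pret=")

# parse_qsl returns (key, value) pairs; dict() turns them into the
# formdata mapping that scrapy.FormRequest expects.
params = dict(parse_qsl(raw, keep_blank_values=True))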
So far, I have this (is the dict below correct, given the Form Data?):
import scrapy

class ExSpider(scrapy.Spider):
    name = 'ExSpider'
    allowed_domains = ['anre.ro']

    def start_requests(self):
        params = {
            "tip_client": "casnic",
            "modalitate_racordare": "sistem_de_distributie",
            "transee_de_consum": "b1",
            "tip_pret_unitar": "cu_reglementate",
            "id_judet": "ALBA",
            "id_siruta": "1222",
            "consum_mwh": "",
            "pret_furnizare_mwh": "",
            "componenta_fixa": "",
            "suplimentar_componenta_fixa": "",
            "termen_plata": "",
            "durata_contractului": "",
            "garantii": "",
            "frecventa_emitere_factura": "",
            "tip_pret": ""
        }
        r = scrapy.FormRequest('https://www.anre.ro/ro/ajax/comparator/get_results_gaz', method="POST", formdata=params)
        print(r)
The following should produce the required response from the page you want to grab data from. Note that your snippet only builds the request and prints it; start_requests has to yield the request so that Scrapy actually schedules it:
import scrapy

class ExSpider(scrapy.Spider):
    name = "exspider"

    url = 'https://www.anre.ro/ro/ajax/comparator/get_results_gaz'
    payload = {
        'tip_client': 'casnic',
        'modalitate_racordare': 'sistem_de_distributie',
        'transee_de_consum': 'b2',
        'tip_pret_unitar': 'cu_reglementate',
        'id_judet': 'ALBA',
        'id_siruta': '1222',
        'consum_mwh': '',
        'pret_furnizare_mwh': '',
        'componenta_fixa': '',
        'suplimentar_componenta_fixa': '',
        'termen_plata': '',
        'durata_contractului': '',
        'garantii': '',
        'frecventa_emitere_factura': '',
        'tip_pret': ''
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'https://www.anre.ro/ro/info-consumatori/comparator-oferte-tip-de-furnizare-a-gn'
    }

    def start_requests(self):
        yield scrapy.FormRequest(
            self.url,
            formdata=self.payload,
            headers=self.headers,
            callback=self.parse
        )

    def parse(self, response):
        print(response.text)
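Two small notes: when formdata is given, FormRequest defaults to POST, so method="POST" would have been optional in your version. And if you save the spider as a standalone file (assuming the name exspider.py), you can try it without a project scaffold:

scrapy runspider exspider.py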
There is a URL:
https://maps.leicester.gov.uk/map/Aurora.svc/run?inspect_query=QPPRN&inspect_value=ROH9385&script=%5CAurora%5Cw3%5CPLANNING%5Cw3PlanApp_MG.AuroraScript%24&nocache=f73eee56-45da-f708-87e7-42e82982370f&resize=always
It returns coordinates. To get the coordinates, it makes three requests (I suppose):
the URL mentioned above
requesting a session_id
getting the coordinates using the previously obtained session_id
I get a session_id in the 2nd step, but it is wrong: I can't get the coordinates in step 3 with it. How do I know the problem is the session_id? When I insert a session_id taken from the browser, my code works fine and the coordinates are received.
Here is my code (it is for the Scrapy framework):
import re
import time
import scrapy
import inline_requests

    @inline_requests.inline_requests
    def get_map_data(self, response):
        """ Getting map data. """
        map_referer = ("https://maps.leicester.gov.uk/map/Aurora.svc/run?inspect_query=QPPRN&"
                       "inspect_value=ROH9385&script=%5CAurora%5Cw3%5CPLANNING%5Cw3PlanApp_MG.AuroraScript"
                       "%24&nocache=f73eee56-45da-f708-87e7-42e82982370f&resize=always")
        response = yield scrapy.Request(
            url=map_referer,
            meta=response.meta,
            method='GET',
            dont_filter=True,
        )
        time_str = str(int(time.time() * 1000))
        headers = {
            'Referer': response.url,
            'Accept': 'application/javascript, */*; q=0.8',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
            'Host': 'maps.leicester.gov.uk',
            'Sec-Fetch-Dest': 'script',
            'Sec-Fetch-Mode': 'no-cors',
            'Sec-Fetch-Site': 'same-origin',
            'Connection': 'keep-alive',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'
        }
        response.meta['handle_httpstatus_all'] = True
        url = ('https://maps.leicester.gov.uk/map/Aurora.svc/RequestSession?userName=inguest'
               '&password=&script=%5CAurora%5Cw3%5CPLANNING%5Cw3PlanApp_MG.AuroraScript%24&'
               f'callback=_jqjsp&_{time_str}=')
        request_session_response = yield scrapy.Request(
            url=url,
            meta=response.meta,
            method='GET',
            headers=headers,
            dont_filter=True,
        )
        session_id = re.search(r'"SessionId":"([^"]+)', request_session_response.text)
        session_id = session_id.group(1) if session_id else None
        print(session_id)
        # session_id = '954f04e2-e52c-4dd9-9046-f3f013d3f633'
        # pprn = item.get('other', {}).get('PPRN')
        pprn = 'ROH9385'  # hard-coded for the current page
        if session_id and pprn:
            time_str = str(int(time.time() * 1000))
            url = ('https://maps.leicester.gov.uk/map/Aurora.svc/FindValue'
                   f'Location?sessionId={session_id}&value={pprn}&query=QPPRN&callback=_jqjsp'
                   f'&_{time_str}=')
            coords_response = yield scrapy.Request(
                url=url,
                method='GET',
                meta=request_session_response.meta,
                dont_filter=True,
            )
            print(coords_response.text)
            breakpoint()
Could you please correct my code so that it could get coordinates?
The website creates a sessionId first, then uses that sessionId to create a map layer on the server (I guess). Only then can you start requesting; otherwise it can't find the map layer under that sessionId.
import requests

# Step 1: request a session id.
url = "https://maps.leicester.gov.uk/map/Aurora.svc/RequestSession?userName=inguest&password=&script=%5CAurora%5Cw3%5CPLANNING%5Cw3PlanApp_MG.AuroraScript%24"
res = requests.get(url, verify=False).json()
sid = res["Session"]["SessionId"]

# Step 2: open the script map so the layer exists under this session.
url = f"https://maps.leicester.gov.uk/map/Aurora.svc/OpenScriptMap?sessionId={sid}"
res = requests.get(url, verify=False)

# Step 3: now the location lookup succeeds.
url = f"https://maps.leicester.gov.uk/map/Aurora.svc/FindValueLocation?sessionId={sid}&value=ROH9385&query=QPPRN"
res = requests.get(url, verify=False).json()
print(res)
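Since the question used Scrapy, the same three-step flow can also be written as chained callbacks; this is only a sketch, assuming the endpoints behave exactly as in the requests version above:

import json
import scrapy

class LeicesterMapSpider(scrapy.Spider):
    name = 'leicester_map'

    def start_requests(self):
        # Step 1: request a session id.
        yield scrapy.Request(
            'https://maps.leicester.gov.uk/map/Aurora.svc/RequestSession'
            '?userName=inguest&password=&script=%5CAurora%5Cw3%5CPLANNING'
            '%5Cw3PlanApp_MG.AuroraScript%24',
            callback=self.open_map,
        )

    def open_map(self, response):
        sid = json.loads(response.text)['Session']['SessionId']
        # Step 2: open the script map so the layer exists for this session.
        yield scrapy.Request(
            f'https://maps.leicester.gov.uk/map/Aurora.svc/OpenScriptMap?sessionId={sid}',
            callback=self.find_location,
            cb_kwargs={'sid': sid},
        )

    def find_location(self, response, sid):
        # Step 3: only now does the location lookup succeed.
        yield scrapy.Request(
            'https://maps.leicester.gov.uk/map/Aurora.svc/FindValueLocation'
            f'?sessionId={sid}&value=ROH9385&query=QPPRN',
            callback=self.parse_coords,
        )

    def parse_coords(self, response):
        self.logger.info(response.text)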
I'm trying to fetch a JSON response from a webpage using the script below. Here are the steps to populate the result on that site: click the AGREE button at the bottom of the page, then the EDIT SEARCH button, and finally the SHOW RESULTS button, without changing anything.
I've tried like this:
import requests
from bs4 import BeautifulSoup

url = 'http://finra-markets.morningstar.com/BondCenter/Results.jsp'
post_url = 'http://finra-markets.morningstar.com/bondSearch.jsp'

payload = {
    'postData': {'Keywords': []},
    'ticker': '',
    'startDate': '',
    'endDate': '',
    'showResultsAs': 'B',
    'debtOrAssetClass': '1,2',
    'spdsType': ''
}

payload_second = {
    'count': '20',
    'searchtype': 'B',
    'query': {"Keywords": [{"Name": "debtOrAssetClass", "Value": "3,6"}, {"Name": "showResultsAs", "Value": "B"}]},
    'sortfield': 'issuerName',
    'sorttype': '1',
    'start': '0',
    'curPage': '1'
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'
    s.headers['Referer'] = 'http://finra-markets.morningstar.com/BondCenter/UserAgreement.jsp'
    r = s.post(url, json=payload)
    s.headers['Access-Control-Allow-Headers'] = r.headers['Access-Control-Allow-Headers']
    s.headers['cf-request-id'] = r.headers['cf-request-id']
    s.headers['CF-RAY'] = r.headers['CF-RAY']
    s.headers['X-Requested-With'] = 'XMLHttpRequest'
    s.headers['Origin'] = 'http://finra-markets.morningstar.com'
    s.headers['Referer'] = 'http://finra-markets.morningstar.com/BondCenter/Results.jsp'
    r = s.post(post_url, json=payload_second)
    print(r.content)
This is the result I get when I run the script above:
b'\n\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\n\n\n{}'
How can I make the script produce the expected results from that site?
P.S. I do not wish to use Selenium to get this done.
The response for http://finra-markets.morningstar.com/BondCenter/Results.jsp doesn't contain the search results. It must be fetching the data asynchronously.
An easy way to find out which network request returned the search results is to search the requests for one of the results using Firefox's dev tools:
To convert the HTTP request to a Python request, I copy the request as cURL from Firefox, import it into Postman and then export it as Python code (a little long-winded (and lazy), I know!):
All this leads to the following code:
import requests

url = "http://finra-markets.morningstar.com/bondSearch.jsp"

payload = "count=20&searchtype=B&query=%7B%22Keywords%22%3A%5B%7B%22Name%22%3A%22debtOrAssetClass%22%2C%22Value%22%3A%223%2C6%22%7D%2C%7B%22Name%22%3A%22showResultsAs%22%2C%22Value%22%3A%22B%22%7D%5D%7D&sortfield=issuerName&sorttype=1&start=0&curPage=1"

headers = {
    'User-Agent': "...",
    'Accept': "text/plain, */*; q=0.01",
    'Accept-Language': "en-US,en;q=0.5",
    'Content-Type': "application/x-www-form-urlencoded; charset=UTF-8",
    'X-Requested-With': "XMLHttpRequest",
    'Origin': "http://finra-markets.morningstar.com",
    'DNT': "1",
    'Connection': "keep-alive",
    'Referer': "http://finra-markets.morningstar.com/BondCenter/Results.jsp",
    'Cookie': "...",
    'cache-control': "no-cache"
}

response = requests.request("POST", url, data=payload, headers=headers)
print(response.text)
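As an aside, the url-encoded payload string is just the form fields from the browser; requests will do the encoding itself if you hand it a dict, which is easier to read and edit. Decoded, the same request looks like this:

payload = {
    'count': '20',
    'searchtype': 'B',
    'query': '{"Keywords":[{"Name":"debtOrAssetClass","Value":"3,6"},'
             '{"Name":"showResultsAs","Value":"B"}]}',
    'sortfield': 'issuerName',
    'sorttype': '1',
    'start': '0',
    'curPage': '1',
}
response = requests.post(url, data=payload, headers=headers)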
The response wasn't 100% JSON, so I just stripped away the outer whitespace and the {B:..} wrapper:
>>> text = response.text.strip()[3:-1]
>>> import json
>>> data = json.loads(text)
>>> data['Columns'][0]
{'moodyRating': {'ratingText': '', 'ratingNumber': 0},
'fitchRating': {'ratingText': None, 'ratingNumber': None},
'standardAndPoorRating': {'ratingText': '', 'ratingNumber': 0},
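The [3:-1] slice works, but it silently breaks if the wrapper ever changes length; a slightly more defensive sketch (still assuming the {B:{...}} wrapper) locates the braces instead of counting characters:

import json

text = response.text.strip()
# The body looks like {B:{...}}: take everything from the first inner
# '{' up to (but not including) the wrapper's final '}'.
data = json.loads(text[text.index('{', 1):text.rindex('}')])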
I am using Windows 10 with Python 3. I never get the second page's data. Please check.
Thanks in advance!
scrapy shell "https://www.industrystock.com/html/hydraulic-cylinder/product-result-uk-19931-0.html"
Then, in my terminal:
url = 'https://www.industrystock.com/html/hydraulic-cylinder/product-result-uk-19931-0.html'

form = {
    'lang': 'en',
    'beta': 'false',
    'action': 'RESULTPAGE_AJAX#getOverview',
    'content': 'resultpage',
    'subContent': 'result',
    'company_id': '0',
    'override_id': '0',
    'domain_id': '0',
    'user_id': '0',
    'keyword_id': '19931',
    'JSONStr': '{"key":"company","length":9,"keyword_id":null,"index":6,"filter":{},"override":{"key":"company"},"query":"Hydraulic Cylinder"}'
}

headers = {
    'Content-Type': 'json/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

req = scrapy.FormRequest(url, method='POST', formdata=form, headers=headers)
fetch(req)
view(response)
I expect this to crawl the "load more" pages and their data!
I tried to find a way to do it without rendering the page:
from scrapy import Spider
import scrapy
import json
import logging

class IndustrystockSpider(Spider):
    name = "industry_stock"
    allowed_domains = ['industrystock.com']
    start_urls = ["https://www.industrystock.com/html/hydraulic-cylinder/product-result-uk-19931-0.html"]
    custom_settings = {'ROBOTSTXT_OBEY': False}

    ajax_url = 'https://www.industrystock.com/ajax/ajax_live.php'

    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Referer': 'https://www.industrystock.com/html/hydraulic-cylinder/product-result-uk-19931-0.html',
        'Origin': 'https://www.industrystock.com',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    }

    data = {
        'lang': 'en',
        'beta': 'false',
        'action': 'RESULTPAGE_AJAX#getOverview',
        'content': 'resultpage',
        'subContent': 'result',
        'company_id': '0',
        'override_id': '0',
        'domain_id': '0',
        'user_id': '0',
        'keyword_id': '19931',
    }

    @staticmethod
    def construct_json_str(index):
        return '{"key":"company","length":9,"keyword_id":null,"index":' + \
            str(index) + \
            ',"filter":{},"override":{"key":"company"},"query":"Hydraulic Cylinder"}'

    def parse(self, response):
        index = 0
        data = self.data
        data['JSONStr'] = self.construct_json_str(index)
        logging.info(f"data is {data}")
        yield scrapy.FormRequest(self.ajax_url,
                                 callback=self.parse_detail,
                                 method='POST',
                                 formdata=data,
                                 headers=self.headers,
                                 meta={'index': index})

    def parse_detail(self, response):
        company_data = json.loads(response.body)
        overview = company_data['result']['overview']
        if overview:
            for company in overview:
                company_id = company['company_id']
                logging.info(f"company_id {company_id}")
            previous_index = response.meta['index']
            index = previous_index + 1
            data = self.data
            data['JSONStr'] = self.construct_json_str(index)
            yield scrapy.FormRequest(self.ajax_url,
                                     callback=self.parse_detail,
                                     method='POST',
                                     formdata=data,
                                     headers=self.headers,
                                     dont_filter=True,
                                     meta={'index': index})
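One small robustness note on the spider above: JSONStr is assembled by string concatenation, which is easy to get wrong around quoting. A drop-in alternative is to build the same string from a plain dict with json.dumps (json.dumps turns None into the null the endpoint expects):

import json

@staticmethod
def construct_json_str(index):
    # Builds the same JSON payload as the concatenated version.
    return json.dumps({
        'key': 'company',
        'length': 9,
        'keyword_id': None,
        'index': index,
        'filter': {},
        'override': {'key': 'company'},
        'query': 'Hydraulic Cylinder',
    })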
I need to send a POST request to this URL:
http://lastsecond.ir/hotels/ajax
You can see the other parameters sent by this request here:
formdata:
filter_score:
sort: reviewed_at
duration: 0
page: 1
base_location_id: 1
request header:
:authority:lastsecond.ir
:method:POST
:path:/hotels/ajax
:scheme:https
accept:*/*
accept-encoding:gzip, deflate, br
accept-language:en-US,en;q=0.9,fa;q=0.8,ja;q=0.7
content-length:67
content-type:application/x-www-form-urlencoded; charset=UTF-8
cookie:_jsuid=2453861291; read_announcements=,11,11; _ga=GA1.2.2083988810.1511607903; _gid=GA1.2.1166842676.1513922852; XSRF-TOKEN=eyJpdiI6IlZ2TklPcnFWU3AzMlVVa0k3a2xcL2dnPT0iLCJ2YWx1ZSI6ImVjVmt2c05STWRTUnJod1IwKzRPNk4wS2lST0k1UTk2czZwZXJxT2FQNmppNkdUSFdPK29kU29RVHlXbm1McTlFSlM5VlIwbGNhVUozbXFBbld5c2tRPT0iLCJtYWMiOiI4YmNiMGQwMzdlZDgyZTE2YWNlMWY1YjdmMzViNDQwMmRjZGE4YjFmMmM1ZmUyNTQ0NmE1MGRjODFiNjMwMzMwIn0%3D; lastsecond-session=eyJpdiI6ImNZQjdSaHhQM1lZaFJIZzhJMWJXN0E9PSIsInZhbHVlIjoiK1NWdHJiUTdZQzBYeEsyUjE3QXFhUGJrQXBGcExDMVBXTjhpSVJLRlFnUjVqXC9USHBxNGVEZ3dwKzVGcG5yeU93VTZncG9wRGpvK0VpVnQ2b1ByVnh3PT0iLCJtYWMiOiI4NTFkYmQxZTFlMTMxOWFmZmU1ZjA1ZGZhNTMwNDFmZmU0N2FjMGVjZTg1OGU2NGE0YTNmMTc2MDA5NWM1Njg3In0%3D
origin:https://lastsecond.ir
referer:https://lastsecond.ir/hotels?score=&page=1&sort=reviewed_at&duration=0
user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36
x-csrf-token:oMpQTG0wN0YveJIk2WhkesvzjZE2FqHkDqPiW8Dy
x-requested-with:XMLHttpRequest
The result of this request is supposed to be JSON, but instead it redirects the request to its parent URL. I'm using Scrapy with Python to send this request; here is the Scrapy code:
import scrapy
from scrapy import FormRequest

class HotelsSpider(scrapy.Spider):
    name = 'hotels'
    allowed_domains = ['lastsecond.ir']
    start_urls = ['http://lastsecond.ir/hotels']

    def parse(self, response):
        data = {
            'filter_score': '',
            'sort': 'reviewed_at',
            'duration': '0',
            'page': '1',
            'base_location_id': '1'
        }
        headers = {
            'user-agent': 'Mozilla/5.0',
            'x-csrf-token': 'oMpQTG0wN0YveJIk2WhkesvzjZE2FqHkDqPiW8Dy',
            'x-requested-with': 'XMLHttpRequest'
        }
        url = 'https://lastsecond.ir/hotels/ajax'
        return FormRequest(
            url=url,
            callback=self.parse_details,
            formdata=data,
            method="POST",
            headers=headers,
            dont_filter=True
        )

    def parse_details(self, response):
        data = response.body_as_unicode()
        print(data)
        #f = open('output.json', 'w')
        #f.write(data)
        #f.close()
I've changed my code so it gets a fresh csrf-token every time it sends a request:
import scrapy
from scrapy import FormRequest

class HotelsSpider(scrapy.Spider):
    name = 'hotels'
    allowed_domains = ['lastsecond.ir']
    start_urls = ['http://lastsecond.ir/hotels']

    def parse(self, response):
        html = response.body_as_unicode()
        start = html.find("var csrftoken = '")
        start = start + len("var csrftoken = '")
        end = html.find("';", start)
        self.csrftoken = html[start:end]
        print('csrftoken:', self.csrftoken)
        yield self.ajax_request('1')

    def ajax_request(self, page):
        data = {
            'filter_score': '',
            'sort': 'reviewed_at',
            'duration': '0',
            'page': page,
            'base_location_id': '1'
        }
        headers = {
            'user-agent': 'Mozilla/5.0',
            'x-csrf-token': self.csrftoken,
            'x-requested-with': 'XMLHttpRequest'
        }
        url = 'https://lastsecond.ir/hotels/ajax'
        return FormRequest(
            url=url,
            callback=self.parse_details,
            formdata=data,
            method="POST",
            headers=headers,
            dont_filter=True
        )

    def parse_details(self, response):
        print(response.body_as_unicode())
Any help would be appreciated.
Your mistake is using the same 'x-csrf-token' in every request.
The 'x-csrf-token' is a mechanism for blocking bots/scripts.
Wikipedia: Cross-Site Request Forgery
Every time you open the page in a browser, the portal generates a new, unique 'x-csrf-token', which is only valid for a short time. You can't keep using the same 'x-csrf-token'.
In the answer to the previous question, I make a GET request to fetch the page and find a fresh X-CSRF-TOKEN.
See self.csrftoken in the code:
def parse(self, response):
    print('url:', response.url)
    html = response.body_as_unicode()
    start = html.find("var csrftoken = '")
    start = start + len("var csrftoken = '")
    end = html.find("';", start)
    self.csrftoken = html[start:end]
    print('csrftoken:', self.csrftoken)
    yield self.create_ajax_request('1')
And later I use this token to make the AJAX requests.
def create_ajax_request(self, page):
    '''
    A subfunction can't use `yield`; it has to `return` the Request
    to the parser, and the parser can `yield` it.
    '''
    print('yield page:', page)
    url = 'https://lastsecond.ir/hotels/ajax'
    headers = {
        'X-CSRF-TOKEN': self.csrftoken,
        'X-Requested-With': 'XMLHttpRequest',
    }
    params = {
        'filter_score': '',
        'sort': 'reviewed_at',
        'duration': '0',
        'page': page,
        'base_location_id': '1',
    }
    return scrapy.FormRequest(url,
                              callback=self.parse_details,
                              formdata=params,
                              headers=headers,
                              dont_filter=True,
                              )
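To walk through all pages, parse_details can decode the JSON and keep requesting the next page until the portal returns nothing. A sketch, assuming create_ajax_request() is extended to also pass meta={'page': page}, and noting that the 'hotels' key below is a guess, so inspect the real payload for the actual name:

import json

def parse_details(self, response):
    data = json.loads(response.text)
    hotels = data.get('hotels')  # guessed key, check the real JSON
    if hotels:
        yield {'page': response.meta['page'], 'hotels': hotels}
        yield self.create_ajax_request(str(int(response.meta['page']) + 1))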
Are you perhaps sending a malformed request? The easiest way to find out is to copy the request from the browser as cURL (F12 -> Network -> right-click the request -> Copy -> Copy as cURL) and convert it to Python with a curl-to-requests conversion tool (without Scrapy).
I am trying to log in to this website using the following request, but it doesn't work.
The cookie never contains 'userid'.
What should I change? Do I need to add headers to my POST request?
import requests

payload = {
    'ctl00$MasterMainContent$LoginCtrl$Username': 'myemail#email.com',
    'ctl00$MasterMainContent$LoginCtrl$Password': 'mypassword',
    'ctl00$MasterMainContent$LoginCtrl$cbxRememberMe': 'on',
}

with requests.Session() as s:
    login_page = s.get('http://www.bentekenergy.com/')
    response = s.post('http://benport.bentekenergy.com/Login.aspx', data=payload)
    if 'userid' in response.cookies:
        print("connected")
    else:
        print("not connected")
Edit 1 (following comments):
I am not sure what to put in the request headers; below is what I tried, unsuccessfully.
request_headers = {
    'Accept': 'image/webp,image/*,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
    'Cookie': 'ACOOKIE=C8ctADJmMTc1YTRhLTBiMTEtNGViOC1iZjE0LTM5NTNkZDVmMDc1YwAAAAABAAAASGYBALlflFnvWZRZAQAAAABLAAC5X5RZ71mUWQAAAAA-',
    'Host': 'statse.webtrendslive.com',
    'Referer': 'https://benport.bentekenergy.com/Login.aspx',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
Edit 2 (following stovfl's answer):
I now use the following payload, filling each attribute with the value from the form and completing it with the username, password, and rememberMe.
I also tried the following headers in the request.
Still not connected:
payload = {
    '__VIEWSTATE': '',
    '__VIEWSTATEGENERATOR': '',
    '__PREVIOUSPAGE': '',
    '__EVENTVALIDATION': '',
    'isAuthenticated': 'False',
    'ctl00$hfAccessKey': '',
    'ctl00$hfVisibility': '',
    'ctl00$hfDateTime': '',
    'ctl00$hfHash': '',
    'ctl00$hfAnnouncementsUrl': '',
    'ctl00$MasterMainContent$LoginCtrl$Username': '',
    'ctl00$MasterMainContent$LoginCtrl$Password': '',
    'ctl00$MasterMainContent$LoginCtrl$cbxRememberMe': '',
}

request_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Length': '7522',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': '',
    'Host': 'benport.bentekenergy.com',
    'Origin': 'https://benport.bentekenergy.com',
    'Referer': 'https://benport.bentekenergy.com/Login.aspx',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}

with requests.Session() as s:
    response = s.get('http://benport.bentekenergy.com/Login.aspx')
    soup = BeautifulSoup(response.text, "html.parser")
    if soup.find("input", {"name": "ctl00$MasterMainContent$LoginCtrl$Username"}):
        print("not connected")
    soup = BeautifulSoup(response.text, "lxml")
    for element in soup.select("input"):
        if element.get("name") in payload:
            payload[element.get("name")] = element.get("value")
    payload['ctl00$MasterMainContent$LoginCtrl$Username'] = 'myemail#email.com'
    payload['ctl00$MasterMainContent$LoginCtrl$Password'] = 'mypassword'
    payload['ctl00$MasterMainContent$LoginCtrl$cbxRememberMe'] = 'on'
    response = s.post('http://benport.bentekenergy.com/Login.aspx', data=payload, headers=request_headers)
    print(s.cookies)
    soup = BeautifulSoup(response.text, "html.parser")
    if soup.find("input", {"name": "ctl00$MasterMainContent$LoginCtrl$Username"}):
        print("not connected")
    else:
        print("connected")
s.cookies contains:
<RequestsCookieJar[<Cookie BenportState=q1k2r2eqftltjm55igy5mg55 for .bentekenergy.com/>, <Cookie RememberMe=True for .bentekenergy.com/>]>
Edit 3 (answer!):
I added '__EVENTTARGET': '' to the payload and filled it with the value 'ctl00$MasterMainContent$LoginCtrl$btnSignIn'.
Now I am connected!
NB: the headers were not necessary, just the payload.
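In code, the whole fix amounts to one extra payload entry before the POST (using the https URL, per the comments below):

payload['__EVENTTARGET'] = 'ctl00$MasterMainContent$LoginCtrl$btnSignIn'
response = s.post('https://benport.bentekenergy.com/Login.aspx', data=payload)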
Comment: ... found that there is a parameter '__EVENTTARGET' that was not in the payload. It needed to contain 'ctl00$MasterMainContent$LoginCtrl$btnSignIn'. Now I am connected!
Yes, I overlooked the Submit button; there is JavaScript behind it:
href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$headerLoginCtrl$btnSignIn",
Relevant: SO Answer How To see POST Data
Comment: ... based on your answer (Edit 2). Still not connected.
You are using http instead of https; it will be auto-redirected to https.
The <RequestsCookieJar> has changed, so some progress.
I'm still unsure about your authenticated check: if soup.find("input", {"name"....
Have you checked the page content? Any error message?
Don't use BeautifulSoup for the check; your following requests should use Session s to reuse the assigned cookie,
e.g. response = s.get('<url to some restricted page>').
Try request_headers with only 'User-Agent'.
Analysis of the <form>:
Login URL: https://benport.bentekenergy.com/Login.aspx
Form: action: /Login.aspx, method: post
A non-empty value below means it is pre-set by the login page:
1:input type:hidden value:/wEPDwUKLT... id:__VIEWSTATE
2:input type:hidden value:0BA31D5D id:__VIEWSTATEGENERATOR
3:input type:hidden value:2gILTn0H1S... id:__PREVIOUSPAGE
4:input type:hidden value:/wEWDAKIr6... id:__EVENTVALIDATION
5:input type:hidden value:False id:isAuthenticated
6:input type:hidden value:nu66O9eqvE id:ctl00_hfAccessKey
7:input type:hidden value:public id:ctl00_hfVisibility
8:input type:hidden value:08%2F16%2F... id:ctl00_hfDateTime
9:input type:hidden value:3AB353573D... id:ctl00_hfHash
10:input type:hidden value://announce... id:ctl00_hfAnnouncementsUrl
11:input type:text value:empty id:ctl00_MasterMainContent_LoginCtrl_Username
12:input type:password value:empty id:ctl00_MasterMainContent_LoginCtrl_Password
13:input type:checkbox value:empty id:ctl00_MasterMainContent_LoginCtrl_cbxRememberMe
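A takeaway from this form analysis: rather than hard-coding the field names, you can seed the payload with every input the login page actually serves, then overwrite only the credentials and the __EVENTTARGET that the submit script would set. A minimal sketch with the same libraries used above:

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    page = s.get('https://benport.bentekenergy.com/Login.aspx')
    soup = BeautifulSoup(page.text, 'html.parser')
    # Seed the payload with whatever the server rendered, hidden fields included.
    payload = {inp['name']: inp.get('value', '')
               for inp in soup.select('input[name]')}
    payload['ctl00$MasterMainContent$LoginCtrl$Username'] = 'myemail#email.com'
    payload['ctl00$MasterMainContent$LoginCtrl$Password'] = 'mypassword'
    payload['ctl00$MasterMainContent$LoginCtrl$cbxRememberMe'] = 'on'
    # The sign-in button posts back via JavaScript, so set its target by hand.
    payload['__EVENTTARGET'] = 'ctl00$MasterMainContent$LoginCtrl$btnSignIn'
    response = s.post('https://benport.bentekenergy.com/Login.aspx', data=payload)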