Extract JSON response data only visible in Network > XHR (Angular response) - python

Thanks in advance for looking into this query.
I am trying to extract data from an Angular response that is not visible in the HTML when using Chrome's Inspect function.
After some research I was able to find the data under the Network (tab) > Fetch/XHR > Response (screenshots), and I wrote the code below based on what I learned while researching this topic.
Response
To read the response, I pass the parameters and the cookies grabbed from the main URL into the request via the code segment below (taken from the full code shared further down). The parameters were built from the information under Network (tab) > Fetch/XHR > Headers.
http = urllib3.PoolManager()
r = http.request('GET',
                 'https://www.barchart.com/proxies/core-api/v1/quotes/get?' + urlencode(params),
                 headers=headers
                 )
QUESTIONS
Please help me confirm what I am missing or doing wrong. I want to read and store the JSON response; what should I be doing?
JSON to be extracted
Also, is there a way to read the params using a function, instead of assigning them by hand as I have done below? I mean something similar to what I have done for the cookies (headers = x.cookies.get_dict()): is there a way to read and assign the parameters?
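To illustrate the second question: the standard library can parse a query string copied from the Headers tab into a dict, so the params never have to be typed out by hand. A sketch (the URL below is shortened for the example):

```python
from urllib.parse import urlsplit, parse_qsl

# full request URL as copied from Network (tab) > Fetch/XHR > Headers
# (shortened here; any query string parses the same way)
request_url = ('https://www.barchart.com/proxies/core-api/v1/quotes/get'
               '?lists=etfs.us.percent.advances.unleveraged.5d'
               '&orderDir=desc&page=1&limit=100')

# keep_blank_values=True preserves params with empty values,
# such as the in(leverage,(1x)) filter in the question
params = dict(parse_qsl(urlsplit(request_url).query, keep_blank_values=True))
print(params['lists'])  # etfs.us.percent.advances.unleveraged.5d
```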
Below is the full code I am using.
import requests
import urllib3
from urllib.parse import urlencode
url = 'https://www.barchart.com/etfs-funds/performance/percent-change/advances?viewName=main&timeFrame=5d&orderBy=percentChange5d&orderDir=desc'
header = {'accept': 'application/json', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
s = requests.Session()
x = s.get(url, headers=header)
headers = x.cookies.get_dict()
params = {'lists': 'etfs.us.percent.advances.unleveraged.5d',
          'orderDir': 'desc',
          'fields': 'symbol,symbolName,lastPrice,weightedAlpha,percentChangeYtd,percentChange1m,percentChange3m,percentChange1y,symbolCode,symbolType,hasOptions',
          'orderBy': 'percentChange',
          'meta': 'field.shortName,field.type,field.description,lists.lastUpdate',
          'hasOptions': 'true',
          'page': '1',
          'limit': '100',
          'in(leverage%2C(1x))': '',
          'raw': '1'}
http = urllib3.PoolManager()
r = http.request('GET',
                 'https://www.barchart.com/proxies/core-api/v1/quotes/get?' + urlencode(params),
                 headers=headers
                 )
r.data
The r.data response is below, returning an error:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META
HTTP-EQUIV="Content-Type" CONTENT="text/html;
charset=iso-8859-1">\n<TITLE>ERROR: The request could not be
satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request
could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest
blocked.\nWe can\'t connect to the server for this app or website at
this time. There might be too much traffic or a configuration error.
Try again later, or contact the app or website owner.\n<BR
clear="all">\nIf you provide content to customers through CloudFront,
you can find steps to troubleshoot and help prevent this error by
reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade
size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID:
vcjzkFEpvdtf6ihDpy4dVkYx1_lI8SUu3go8mLqJ8MQXR-KRpCvkng==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>
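For what it's worth, one likely cause of the 403 is that headers = x.cookies.get_dict() turns each cookie into a bogus header name, so the API request carries neither the browser-like headers nor a real Cookie header. A hedged sketch of building a proper Cookie header value instead (untested against the live site, which may have further anti-bot checks):

```python
def cookie_header(cookies):
    """Join a {name: value} cookie dict into one Cookie header value."""
    return '; '.join(f'{k}={v}' for k, v in cookies.items())

# usage with the code above (hypothetical):
#   headers = dict(header)            # keep accept / user-agent
#   headers['cookie'] = cookie_header(x.cookies.get_dict())
#   r = http.request('GET', api_url, headers=headers)

print(cookie_header({'market': 'abc123', 'laravel_session': 'xyz'}))
# market=abc123; laravel_session=xyz
```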

You can get the response by name; on your screenshot, the name get?lists=etfs.us is what you need. You also need to install Playwright.
There is a guide here: https://www.zenrows.com/blog/web-scraping-intercepting-xhr-requests#use-case-nseindiacom
from playwright.sync_api import sync_playwright
url = "https://www.barchart.com/etfs-funds/performance/percent-change/advances?viewName=main&timeFrame=5d&orderBy=percentChange5d&orderDir=desc"
with sync_playwright() as p:
    def handle_response(response):
        # the endpoint we are interested in
        if "get?lists=etfs.us" in response.url:
            print(response.json())
    browser = p.chromium.launch()
    page = browser.new_page()
    page.on("response", handle_response)
    page.goto(url, wait_until="networkidle")
    page.context.close()
    browser.close()

Related

How do I web scrape this link and iterate through the page numbers?

My goal is to web scrape this url link and iterate through the pages. I keep getting a strange error. My code and error follows:
import requests
import json
import pandas as pd
url = 'https://www.acehardware.com/api/commerce/storefront/locationUsageTypes/SP/locations?page='
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0',
}
#create a url list to scrape data from all pages
url_list = []
for i in range(0, 4375):
    url_list.append(url + str(i))
response = requests.get(url, headers=headers)
data = response.json()
d = json.dumps(data)
df = pd.json_normalize(d)
Error:
{'items': [{'applicationName': 'ReverseProxy', 'errorCode': 'UNAUTHORIZED', 'message': 'You are Unauthorized to perform the attempted operation. Application access token required', 'additionalErrorData': [{'name': 'OperationName', 'value': 'http://www.acehardware.com/api/commerce/storefront/locationUsageTypes/SP/locations?page=0&page=1'}]}], 'exceptionDetail': {'type': 'Mozu.Core.Exceptions.VaeUnAuthorizedException'}
This is strange to me because I should be able to access each page on this URL; specifically, I can follow the link in a browser and copy and paste the JSON data. Is there a way to scrape this site without an API key?
It works in your browser because you have the token cookie saved in your local storage.
Once you delete all cookies, navigating to the API link directly no longer works.
The token cookie is sb-sf-at-prod-s. Add this cookie to your headers and it will work.
I do not know if the value of this cookie is tied to my IP address. If it is and it does not work for you, just change the value of this cookie to one from your own browser.
This cookie may be valid only for some requests or for some time.
I recommend you put some sleep between each request.
This website is protected by the Akamai anti-bot service.
import requests
import json
url = 'https://www.acehardware.com/api/commerce/storefront/locationUsageTypes/SP/locations?page='
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0',
'cookie': 'sb-sf-at-prod=at=%2FVzynTSsuVJGJMAd8%2BjAO67EUtyn1fIEaqKmCi923rynHnztv6rQZH%2F5LMa7pmMBRiW00x2L%2B%2FLfmJhJKLpNMoK9OFJi069WHbzphl%2BZFM%2FpBV%2BdqmhCL%2FtylU11GQYQ8y7qavW4MWS4xJzWdmKV%2F01iJ0RkwynJLgcXmCzcde2oqgxa%2FAYWa0hN0xuYBMFlCoHJab1z3CU%2F01FJlsBDzXmJwb63zAJGVj4PIH5LvlcbnbOhbouQBKxCrMyrmpvxDf70U3nTl9qxF9qgOyTBZnvMBk1juoK8wL1K3rYp51nBC0O%2Bthd94wzQ9Vkolk%2B4y8qapFaaxRtfZiBqhAAtMg%3D%3D'
}
#create a url list to scrape data from all pages
url_list = []
for i in range(0, 4375):
    url_list.append(url + str(i))
response = requests.get(url, headers=headers)
data = response.json()
d = json.dumps(data)
print(d)
I hope I have been able to help you.
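One gap in the snippet above: url_list is built but only the base url is ever fetched. A sketch of iterating the pages with the suggested sleep between requests (the fetch function is injected so the loop itself can be shown without hitting the site):

```python
import time

def fetch_all(base_url, pages, fetch, delay=1.0):
    """Fetch base_url + page for each page number, pausing between requests."""
    results = []
    for page in range(pages):
        results.append(fetch(base_url + str(page)))
        time.sleep(delay)
    return results

# usage (hypothetical): pass a real fetcher built on requests, e.g.
#   fetch_all(url, 4375, lambda u: requests.get(u, headers=headers).json(), delay=2.0)
print(fetch_all('page=', 3, lambda u: u, delay=0))  # ['page=0', 'page=1', 'page=2']
```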

Web Scraping Python - Immoscout24 - Robot Rejection

So I'm trying to make a data science project using information from this site. But sadly, when I try to scrape it, it blocks me because it thinks I am a bot. I saw a couple of posts here: Python webscraping blocked
but it seems that Immoscout has already found a solution against this workaround. Does somebody know how I can get around this? Thanks!
My Code:
import requests
from bs4 import BeautifulSoup
import random
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; U; Android 4.2.2; he-il; NEO-X5-116A Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30",
    "Accept-Language": "en-US,en;q=0.5",
}
url = "https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-kaufen?enteredFrom=one_step_search"
response = requests.get(url, cookies={'required_cookie': 'reese84=xxx'} ,headers=headers)
webpage = response.content
print(response.status_code)
soup = BeautifulSoup(webpage, "html.parser")
print(soup.prettify)
thanks :)
The data is generated dynamically from an API call whose JSON response comes back from a POST request, so you can extract it using only the requests module. You can follow the next example.
import requests
headers = {
    'content-type': 'application/json',
    'x-requested-with': 'XMLHttpRequest'
}
api_url = "https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-kaufen?pagenumber=1"
jsonData = requests.post(api_url).json()
for item in jsonData['searchResponseModel']['resultlist.resultlist']['resultlistEntries'][0]['resultlistEntry']:
    value = item['attributes'][0]['attribute'][0]['value'].replace('€', '').replace('.', ',')
    print(value)
Output:
4,350,000
285,000
620,000
590,000
535,000
972,500
579,000
1,399,900
325,000
749,000
290,000
189,900
361,825
199,900
299,000
195,000
1,225,000
199,000
825,000
315,000
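A side note on the long lookup chain in the loop above: any missing key raises KeyError. A small helper (an assumption, not part of the original answer) that walks such nested paths defensively:

```python
def deep_get(obj, *path, default=None):
    """Follow a sequence of dict keys / list indexes, returning default on any miss."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj

# mimicking the shape of the Immoscout response
sample = {'searchResponseModel': {'resultlist.resultlist': {'resultlistEntries': [{'resultlistEntry': []}]}}}
print(deep_get(sample, 'searchResponseModel', 'resultlist.resultlist',
               'resultlistEntries', 0, 'resultlistEntry'))          # []
print(deep_get(sample, 'searchResponseModel', 'missing', default='n/a'))  # n/a
```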

Is there a way I can get data that is being loaded with ajax request on a website using web scraping in python?

I am trying to get the listing data on this page https://stashh.io/collection/secret1f7mahjdux4hldnn6nqc8vnu0h5466ks9u8fwg3?sort=sold_date+desc using web scraping.
Because the data is loaded with JavaScript, I can't use something like requests and BeautifulSoup directly. I checked the Network tab to see how the requests are being sent and found that, to get the data, I first need a sid to make further requests. I can get the sid with the code below:
import ast
import requests

def get_sid():
    url = "https://stashh.io/socket.io/?EIO=4&transport=polling&t=NyPfiJ-"
    response = requests.get(url)
    response.raise_for_status()
    text = response.text[1:]
    data = {"data": ast.literal_eval(text)}
    return data["data"]["sid"]
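The text[1:] slice is stripping the Engine.IO packet-type prefix: in the polling transport, each packet is framed as a type digit followed by a payload, and type 0 is the "open" packet that carries the sid. Since that payload is plain JSON, json.loads is a sketch alternative to ast.literal_eval (assuming the standard 0-prefixed framing):

```python
import json

def parse_open_packet(body):
    """Parse an Engine.IO polling 'open' packet: a '0' type digit + JSON payload."""
    if not body.startswith('0'):
        raise ValueError('expected an open packet (type 0)')
    return json.loads(body[1:])

sample = '0{"sid":"AbC123","upgrades":["websocket"],"pingInterval":25000}'
print(parse_open_packet(sample)['sid'])  # AbC123
```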
Then I use the sid to send a request to this endpoint, which should return the data, using the code below:
def get_listings():
    sid = get_sid()
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
    }
    url = f"https://stashh.io/socket.io/?EIO=4&transport=polling&sid={sid}"
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    print(response.content)
    return response.json()
I am getting b'2' as the response instead of this:
434[{"nfts":[{"_id":"61ffffd9aa7f94f21e7262c0","collection":"secret1f7mahjdux4hldnn6nqc8vnu0h5466ks9u8fwg3","id":"354","fullid":"secret1f7mahjdux4hldnn6nqc8vnu0h5466ks9u8fwg3_354","name":"Amalia","thumbnail":[{"authentication":{"key":"","user":""},"file_type":"image","extension":"png","url":"https://arweave.net/7pVsbsC2M6uVDMHaVxds-oZkDNajhsrIkKEDT-vfkM8/public_image.png"}],"created_at":1644080437,"royalties_decimal_rate":3,"royalties":[{"recipient":null,"rate":20},{"recipient":null,"rate":15},{"recipient":null,"rate":15}],"isTemplate":false,"mint_on_demand":{"serial":null,"quantity":null,"version":null,"from_template":""},"template":{},"likes":[{"from":"secret19k85udnt8mzxlt3tx0gk29thgnszyjcxe8vrkt","timestamp":1644543830855}],"listing"...
I resorted to using Selenium to get the data; it works, but it's quite slow.
Is there a way I can get this data without using Selenium?

Login into Duolingo using Python Requests

I want to land on the main (learning) page of my Duolingo profile, but I am having a little trouble finding the correct way to sign in to the website with my credentials using Python Requests.
I have tried making requests as well as I understood them, but I am pretty much a noob at this, so it has all been in vain thus far.
Help would be really appreciated!
This is what I was trying on my own, by the way:
#The Dictionary Keys/Values and the Post Request URL were taken from the Network Source code in Inspect on Google Chrome
import requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
login_data = {
    'identifier': 'something#email.com',
    'password': 'myPassword'
}
with requests.Session() as s:
    url = "https://www.duolingo.com/2017-06-30/login?fields="
    s.post(url, headers=headers, params=login_data)
    r = s.get("https://www.duolingo.com/learn")
    print(r.content)
The post request receives the following content:
b'{"details": "Malformed JSON: No JSON object could be decoded", "error": "BAD_REQUEST_SCHEMA"}'
And since the login fails, the get request for the learn page receives this:
b'<html>\n <head>\n <title>401 Unauthorized</title>\n </head>\n <body>\n <h1>401
Unauthorized</h1>\n This server could not verify that you are authorized to access the document you
requested. Either you supplied the wrong credentials (e.g., bad password), or your browser does not
understand how to supply the credentials required.<br/><br/>\n\n\n\n </body>\n</html>'
Sorry if I am making any stupid mistakes. I do not know a lot about all this. Thanks!
If you inspect the POST request carefully you can see that:
- the accepted content type is application/json
- there are more fields than you have supplied (distinctId, landingUrl)
- the data is sent as a JSON request body, not as URL params
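The difference between sending the credentials as URL params versus a JSON body can be seen offline with requests' prepared requests (example.com as a stand-in URL):

```python
import json
import requests

payload = {'identifier': 'user@example.com', 'password': 'myPassword'}

# params=... appends the data to the query string; json=... serializes it
# into the request body and sets the Content-Type header automatically
as_params = requests.Request('POST', 'https://example.com/login', params=payload).prepare()
as_json = requests.Request('POST', 'https://example.com/login', json=payload).prepare()

print(as_params.url)                        # credentials end up in the URL
print(as_json.headers['Content-Type'])      # application/json
print(json.loads(as_json.body) == payload)  # True
```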
The only thing that you would need to figure out is how to get distinctId, and then you could do the following.
EDIT:
Sending the email/password as a JSON body appears to be enough, and there is no need to get distinctId. Example:
import requests
import json
headers = {'content-type': 'application/json'}
data = {
    'identifier': 'something#email.com',
    'password': 'myPassword',
}
with requests.Session() as s:
    url = "https://www.duolingo.com/2017-06-30/login?fields="
    # use json.dumps to convert dict to serialized json string
    s.post(url, headers=headers, data=json.dumps(data))
    r = s.get("https://www.duolingo.com/learn")
    print(r.content)

UrlOpen Redirected to default page

The default data link is http://tools.morningstar.co.uk/uk/fundscreener/results.aspx?LanguageId=en-GB&Universe=FOGBR%24%24ALL&CurrencyId=GBP&URLKey=t92wz0sj7c&Site=uk
But I do not want the data on this default page; I want the data under the Portfolio tab. So, I used Firefox to determine the URL of the portfolio and attempted the following Python code:
testpage = urlopen('http://tools.morningstar.co.uk/uk/fundscreener/results.aspx?LanguageId=en-GB&Universe=FOGBR%24%24ALL&CurrencyId=GBP&URLKey=t92wz0sj7c&Site=uk&tabAction=Portfolio')
However, the page is always redirected to the default link. How do I get to the portfolio page?
You need to pay attention to the request that is being made, along with all the headers and the data.
If you inspect the request for the "portfolio" data, you will see that a POST request is made with a lot of data, and it is the payload (form data) that makes the server send the portfolio data back in the response.
What you need to do is mimic that request to fetch the response data and then handle it according to your need. You can do something like this:
import requests
from lxml import html
headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'accept-language': 'en-US,en;q=0.8,ms;q=0.6'
}
url = "http://tools.morningstar.co.uk/uk/fundscreener/results.aspx?LanguageId=en-GB&Universe=FOGBR%24%24ALL&CurrencyId=GBP&URLKey=t92wz0sj7c&Site=uk"
payload = {
'ctl00_ContentPlaceHolder1_aFundScreenerResultControl_ScriptManager1_HiddenField':';;AjaxControlToolkit, Version=3.5.7.123, Culture=neutral, PublicKeyToken=28f01b0e84b6d53e:en-GB:5a4df314-b4a2-4da2-a207-9284f1b1e96c:de1feab2:f2c8e708:720a52bf:f9cec9bc:589eaa30:a67c2700:ab09e3fe:87104b7c:8613aea7:3202a5a2:be6fb298',
'__EVENTTARGET':'TabAction',
'__EVENTARGUMENT':'Portfolio',
'__LASTFOCUS':'',
'__VIEWSTATE':'Or/Z5BkJx2WVGMIPWgbVTVzk9hu+/eDKDHsbG74cJRlPSPW9dXuSQt31f2njq7X4NCZF/VW7u63TU5lF3lWGIAFNRoIIWwlRVMeMWeHygunbmBVxWWO08k90rAhbOCiyeOgKoaL1lVKO0R0DGS9rjl1Gah7C2NiIyLeD8boWobKLRV47aRiqaWI9ZYprxoky4zmuNp4NP51z0QLfb/4TvQKfcXJcUHHAAknVurwXfye3cHiUGf7pOyI84E9KJscHsbowC6mejPX4XmlXLVrVXk/lupYU8yTXSp03D2vfyPcQcrxt3y/uF0kXNG+4A/hFWOQFazVk1SRMYnQlrWtQ9Ulh58Q71zEZvX3yZhnp2EA5ZnYuOfeFWCnwwUBa6s9o8uLocDK1Q4chtjXDqK7q2W89kPZoyYjmgB5xunFDt8A7Sz3IFaDkJEyPYdBPOKx1Y1zv0g3/gwBnd64UXkTlBySHZao2CB/OBNQoqI6RqI6L44nrbESabh+DHBCdcCKeL8Pj+lsM5o7P0ShXpXHbCRTPk4PiWVeP4hk1vyOFA7tiReoWEPwQvDe3sqWh+K7EHLHefW5ke6W9zy5seHuC1vfcVTwT5FUIcTaAnhoDSphsMHWPoVc/vtcfExPWUx/aC2KIf1m+DKtN/no8Frt4SYqxMGtDMSUZjMR5xhFHSaqfjv/0Gs+RVod4N+A4rYeUO07A9VTLTE8SuZ4ovxjhrEtAQ3bYqzt29leHpmFT7Pfl7OZw3t3wt6SjQX+Q3M5ozThannhRKaDJCnBZdFh7ZnY4fgCLpDNyMDq3FccJC0V6PDSuu6enpPWOcy4NJj5H+/rEqo61/e2wgmefzt7Zaygu4v66MmOKLqbWymNa6C1Xuc0u0FhERUrSWrL/rS9kwC+LA7aWFPhdnEnwPewV6yj7kWzb4IZZ6ivGs3CXYH7G2HTnsP/P2bHXNV+YaaXTkdKJXkiPF7/qQ3JKzZYhDJjj0PObqtI2RlhmeecF8Lq8SfRRrBTXWjvg48q7nXurZ7ztX28QRDHC5aP/13+X8RyvLRmiM4V3vRdMjxpt8ySZtKM5wpEA4XjTUtCWtrNKO18yc0pbMaGRm5xEoXLY/i1cHC62OvKRsEYX82q+KuGyqEKwPoElc2SbMfyEit8M6tkA12wBce4cqlUMX4D85OOKCMzhY+h0loQFSsgVFfqKpEHpH9yg5lKtRg0dZ8P301xoGCeXhBhyZIp234EAdOOQySV4iNcBykLFGOuB0w63KqVbQRejqlnj2Qd+OkoXQ4hAh9tgCXdxhOHZ1hLB3nHHNMT3TDBO+j8eXgxAE8PN9zt6Xj7qGqDmkHAlwMP3Q8er0Ms2i80x7pUzvy5ixozAbUgfuEeKtjkK7fSD0UkKMa/YELEjTkgVJm6goPPIR3D2lwNAQLyHM8xLFSy3evkpJojw+QEFw4U9n31CoO6OB15Isqy/E1MPgwq9Wz3mUn2iYH1JruwsgQqQXraUKAiyMlpfbtj2YQL39Zp+AwzPeDDbwRaCCNBFvmpapcJyMpmzlzd0tr9gdV1GoVTtWBg+UcVGSsQi4XkD+32CfUuQ+ZpFlmUoYLuYSAEFV7Y97MlqLMqW89r/BZXRXNacpizFFrnQlCnsM4Bj4DUp+K7pcxAaYRKWcH3tiQO6zhCa2b8YoawWzQ5Ij8b19Z7PLN3Yug9ldeJ2CcYOzUQebT23ofSNtCU+uTbYzzh6RE8Bg/rut8R6A1uwYBWvjfL7N7M2fUSd01pwYgJ0BfsViV1pipzpCvTL5hGf1aK25gR+T7GtIxNbrdlo6Z1LbV/xYQYIDTod5dq6wUttZJVLeLVZRkCAv+M+o7Bvd86pi82TIdC8foOPgo7OR6ykPk+aMt1pr/hBV3tmBOUvMyYADmmOZQR+L/AQ57tRukeRyACeTJq1b5icpxawI+qn71we6eAKmg5POvkbq+pI+YnoSs1Mhk9OWeJ1CPRg3P5TDMIhq
XsG4mKY6awMwZXF12/r4qb7bRnfZGFukHBAYJTRsmZsLgiicM2uJ7kchxs2U/jwVcItGHgnIYkg1r7TTJ3oFo1rHEVhFHm8dIem3iI/VUpbe/XZyEKseDxoALSbASjYxM5n2eGfBLFnHMHv8RPfrX5EBfD0ZzMAVc8MoSycTsJuJI8L912Eewk9Cz3mb7o2zF9L+8syg8NpEDy78kIa0lE+QNqvdtk3P7uCxUckKWdmKLUfU2zaTBBGkIcDo8xXktZFgC+yQbUtxD2yFC21tvSA3xJaPVWqycMiVRp3fwIabWylnRnnwLqvAjIPTKiZI5w/szdciCwzx0GhSY14xpVV+jlLlfH8KCqVBVL5NIzxRTw+ELVPHOS3orE1dKtCcOqM22GE5PsU69E7ViA+fC2Gn/HzkUfUHPBKjKixX9hTmZzOnXToBU5sdEMZ1i3Jte+xfk3YVzYv9TO9f5EiibdNgw8MdCrXwxlgYNZUob0PixOajsPed+qv2PTl+kvLOSTkw6Z6K892TJkBvpAGQvP/zSgUorcNhuAJwQVG32TnX0HypMPpVwX0SqOhZLGM9essa7guKOrA3GdIDsoA2/f4JkFlJMtVgXKGPNXr7mTCeq2H8vFfQbH/59wPfMgrxxo6s9C+Tyt5zG3lyRoTEGUr4QwBkSeHq4J6Vya3sFDH911QHrfFuaKF2auqHHGuKyCCViqpb3A1Z4/GbllXBC4cmjyKc8FfI5i2eSSEMOd95N198ZCOD7x1zXPACX9QjaMdzZbadJ9UHXYsb/7l87ujNY4x5S9oQXgfW8fva9i4oqTqMV3VXTQK8lVcFovH0OxXXpNZ+rPm8Tj5kbRGrMgp6CdyxWSLKvqYv8f57ICr6ozaxyAd8XiTM+AhkfnXsN8BcH0u1yP6WUBDkUjhBi+4lfO6Dj5r6pFIN65GqPaz0mRFDpZU3nVQ1CmmeXneh0ZT/u7tG7Ray5Md5jr9onVsWfWnbc0hbUP0ghMANhtZtcrLpFikwxxQybdsS/xWdB4dLenTMAi2hn0KQ196thhQvvhEvEWaSxuEjX+iaQB14kXwOHAsBj8Ikp4lIdBsVctVQFVNzM3+F+UfDIbpTFh4IaAvOWNZzFGZYjdKDKKIuIgSAhdkHZbjQGpvXWdx12WR1/I/aqk5dx8OFpU3Lq/thZxQ+0oODetvex87L6lKWMgUcvQQAzAXbwzFp4wcTHnQuKJ21hqotOfn8F0GmWv59/hqfH1oFpt6/ENAs162hXOdGt5kTYl7u6X+ciQiIioRLiJ/NRIOoa1T++6v2FMk9acnOfNYMxEGeBdtqmLIN70aL8wvoFLliCkUhfe4yPaFQzFo26JsnnAXUpuiDKfs5fjDS+Rk/1BfVScqDIMv8IL8RDIoWxg8NX5DOOJPwAc3uC+s/kCCpoG2L0m9FLgSBv6Nr9wuv1rt59C/K/5RETD/VP415ArnuUBrdGpuYza1FvYyCo85HREzIL2lN6yZUBUXbBBrWxa3LiGaojhfhCyflhhHs+GoM8zfY5IW7Wpvp/YMPAgxXNRtegGL80+HU/dmlkRO8nRx3eyzpcpWZ302rK9m+OYqtfUXwvFKR7ULWnk/2aHsTQe6lwifxK70QG+jhZlrJqbPGi8vSpajsGMw5iU+VJM4CEDcGhvgpzODw3LkXPvsFrdLq8eUzHXo1Ox+yiZ2zSN3vGDcGeEZiQAbG2dcNt7niW+reozfdxVQAi4uLpPGWYu8jvVnRxoMuQEKEGzIiwNNsvpgCMGdUfk0izvvkTplz8lvk6ROlhy417VtiiVXVIMTAovFNO+W3O3/17LJ9Ed7QPYdoUO4n5fidYX6r4QUAoRGowMAPHQRIdg/AqN7N/EfmDRD72t7BOqvzVetXuTVId75vKB2P0CwoQPDIy0ynLZcTRykRs38LIHwYI5irp1NUjCee7mvo1RE0asD670LM03ZFMCOu/hmgln2dk5oFeyysISdxVUQKRmI6VytwEsSv
iOZeP1cZgB5DakdSgCaloRI2JGVbZ22B9UgO6hFSvfHhox5y1p/CzIrJPd+GUB80wmFX8Kgl0DSjsf9PJNQlAKu/jb85+wvF7exNrPyrShkWE9lYjbcmBHPYc+8J3ia1N3LWtVbR1x554dYoWHVGw3VbU3bWfqLjn5Eon4x3h5R/bCwVBorVCsQ99SzWCv5J9dMRF38r8y1yA4iEUPcX89n4nl91t6cnia4THuk2hhbaBPeu2PFwnTuwiJxAknVEGFUslAXu621wvmyssftVnQ+jzirCQNJAXyE75t+pNrWmQJXrpHDxnR3V9/LFrNy3tZn61H+UkEY1QK29bUJHE+DOfSnS4QkNY3VpLpaBdBeBorOOZ6dEc+lzVDcPrgjL+1fqu/yHwFmxCN8MfreDuX5E8M86YAR/xnyYAJRMxafR1p9eIG+cgHwIBeOhCw1J0p+ydN/bNK7KousyY4OcEr4zTF6crn6LmN6C7zDqabx8cjMRmUeNl24x27LxJhakNbmQjMVPSfo6Ro/edo0L7pG+pbj9SwiaJxGkr65b8pvrTwDFKh7tLUQtZ9j4s4y6MiQ565q2OJp93rm2deHHXUsM/ziI1t9OV00dbjuhTaLTmF5u3rIpKgryYVvmIa6G081WcKCeWz46amFLg7v39SB6XNuL0AIxIow5Hu+S0oIv51+ycUZoLUnypTFe/SnlobjYxAxsJt/cnxZ4wh556EZ9rN0HaJrbbb7pA4uaDBvz4EL8ndM+zEmlacyfQlKSr+jdB0XX+An6zhNQv3D5dkz84QdmPuAavhemrwr2m30Q/tNZ8DZdsBpOyR/U86nplu9Sx5LcFGnWULX+teY24nBUUghfuhGRPEr0dHPUUgMwpQq1fpcz6YQft09B0uthQiYWhNXvnsrlvnLzZTWTZLjFfwDlNn5RZqn0fAxudbM+eOzL9xvx4TBEEpcyf5nLTuNAKvfeZm4KWcRmV+WPnDJxmf7OlTVNKsXiY7Y+bJMjgNfKMh3oQws/1+gtATMlYSdjNIzuYSglhMyXS+BPRI8dpDq2VeF/cb6AII0Pvyq0H7nRadP1xccD6hTKdb4rP7kEAAZClm7P4M4Mog+CAXePDMw3kSkRGzsbT/6rKffKp9crRcOnKwSHU2yuf+NBTES6xeaPD0R7YwjbrRHPDsOoOdQXEcn/bl0oNnnLheSZhDKdFERtlvrpVB8qZ469A2Jqw/X5QMcIrEb0gisLWRSuiCpg/zmFDqaDsj1M8evc2MPGtkKxw9IsuupthsWKxYkbwB2inJdnLwgCDx+2B5oIT7pYbLricSseF1ukjL3uEyHicA3WztLzKoLjumpzevRWBs1VnYCL0Ow0U0yABR/dz3nh0mcE6X0iBb8ulgp+zn/8CNTNEE7lVSPPn3FFr6+mNuYu5O9fn9G6lji/8muhJWTW/9bbrA/2ZVPK4pto7mmfh/OWmkHnw3Te4ZysDIOXcXD7BCixoSB/3l88JQrGB/EAqrNz6oEhXeQ9hof2EhwKI3ZoxvKh5jfDii3PWI+NJPdFFtP+zRS+P1p4aMpQC703rHkmiSFRJIIaPnnbnXNN1NhBefjkjFA6nTvUcYsbBtKQzFJbAEiBnhOo/+jgUdd31gZbZbRi2Iw+Pv07qjDgwVznE5HLwEu8y2k+mdW7f1RKIgjiOhPA4CzBcWumeo7USUDpHaLNWEP0lLiwuxxB8CigRUln763e4xFAvd+vPlBoJJsJBUezJ5OdV5AC6Fe9/UuFT8Ov+Bsknk072xPIHxLks5J6XxDNrm7mnDKTirLE3y2OLpy5gAUPc1n7UpdH08k3C4y+8iqILZXfN6WzR459QcmY7Uu2YFSbxVM6dVYsE7arsp3zyDgjCgnctbrlO2A2iJ2P7f8eGdYEnMjm8Hv4lwFfSHDKVuVoD3+2Amw5CtE9Smtdidm4OTC/C1yePG+IvtXlx+21lgPpRdWOCFmz8/bQusVvRlQCz8any5fVnXJERaeQ
xC8UMbWgmRPQs1Q0rrpe7V8LKq4W5rwmEClsmUoqWiaDXN/nuzuPY2Bm7l4Qo7B0JQd/AEA3Kw4/4L8XLbQ8JHtnamJExXbDFZp2jPjz/9igiPq5j1+/ZqJtnwHPa54R2gLGbV9plpEMOu94Og993CM4QxKN4LSD/TKUV/ik46I5H1texmN7RWMcL1gAPnO0AVbiI8sP1xNxclROHanYTudyVVKld0qrq6ht4eYOgPL2RWB8ma1i8DiqfEy2Z+iDHIDv4nB92ktT7BKNA9MEC19+Nmbv1nNgiULtt+jOZZ92XlPnVU5fYIQZEsn/VPzxFx+6yoDmdN9+aeC6k/SfWFPJhdQ6e0Kk9sOpZQowS+GpaTGw86m2K/RyyzQyzJetV58Wlon3bYwzST6N/CHxiyWJ6KiC+JOJHjKuZjoX33FKKp/LqkG7PlC599l6afuCmU/e9r+MczU+BqdMZTAHshA5mpgPqT2gjHvD2Er+J9Be/P7YHCUvOUFpnfcDWVDz0Hx3kqibf8iFKBFDzK+YHk7U0I0O7yRgLTpxf3CC3ZpEF0unuuM/29BgViIucoRdPIHceE2aTUcH27myPTT5SJ8SGpJrfr5bS264VZ7la/ewkdSVAvfKZAD95KofW0up8Bt4z14+IQCE1Xe1c2fA6Vr20junvkX4ZgQVrlWGztCX26LigP19olHT8mDFPGzCEuXXqFDSGUaJ9bpRglkd5Ps9JkKbCfnzF+PLFMByr5xDywnCaMDyzLttVzfdu32qVI3UP51UZ1lnnylXuKjuctCJiC3it20cmMf4VK41gZpGbUlWoasmnx5OGapdvwGTtrT0cVNRkg6UXjj2BuZncnTBPzNhwarsZUsBBtOAbDue7xfEOXn1lLUDQ1u8xKqwzN4tqYTbp+VQAuI3LG9FaF6cD0wQX9qyFji7oIcmsHMe9KROCKi83IdG/Td0ML2/h3yUG/VkKVTdKbX2QGNShAKCT3JPPfR49q0WIeuJ4SCskuiZfzzo2m3rdodBvqEPiGSOmlmp6/RLGF0iWtZ177aqYzeEEMwO2BpxR0ZB/+0JTD9u+DE50h83q3eRsikNc4VlZDB8JWJW3WRLU/AxRefA0rP3nes0B7Thx0MTXpRsOGjprtSKYR9h73QxDugTSNA77sCpjSutaswovVdn9Lj6k4IL6av1gRK2Wso/e8YNEglHp+VBEegXw35ZOByleDUwqdOS07xCA9hBS0Oec0v7YvFEAIJnW4FunEV2fscciH2gVpDR2s+FbKjVdT7t6qNnWT4HX2PuL2mrHDKgE/l8tDvJ2z2zZW1fTiExfTngLbTOyhlptrc8RFDAJBi3jdYw4HU9LufawJjUIukBgiX9cM5y2IykqNzM10tMsMVfxRH10lHqieGf9e0u2ht+gRmYwooRzoyenkKlWVHh8E37+yXa0SHuV5qXb/8sk1IGqE5p0wL7qWUfOTRAdWtPll30n16f7Epfl+dYI8m93uTk2FrL0Dsosdkp5BmIilduNXje1bMonEtliHrJ012Q0FIxVjEOZDUTUXYwRw0mF0eaxvKu27cJ1OqYUGfJk9zqAiAc9QnTBDL/f+zljgBzs6FWC2PWASaoMrS5Q0aNlN/y5wmFma9swHoEwrBUXr4Vi2Nyf5jj/FijJz77DaNs1J4G5uUF8Abe8HRvYc0XCtEMWqCcv3W78Px4/v/ThOMvamMcJvBh91/6Ep95/SzHvuEDpb8WsUKjwpXDdmp1k7QgMwb0ymrheZhxj/mYklx4EtnMWYwIPt02RJbEFoEcgB09chAg5x8rTh6FmJzGHmZOv7A8oEg0CvrO2pT+aiKqCTRcJsOKKvZnLXlQg1TwgJh3jCgvVVSPGIEO4RpIWMNT1/Opno6ytmiubgX5NoythDBrG5WtoAltsfomRTkb1NWOhcam9Q=',
'__VIEWSTATEGENERATOR':'7400285C',
'__VIEWSTATEENCRYPTED':'',
'__EVENTVALIDATION':'1p0545yV8Pljo87c7Dlb6kiemgGIXd5S3wYUGUoMyg6IPO12GWyBgNM7bu67YNl9f9Sx9ad9lHLIfwYtw0nDGqWYtWBnM8PHrdmxYdOb5+qUooGamIPBCCel/8Ri+FGpvNGPTZkKeuYzfomnlqr/mYoMcjdnsiQWCf7Dvou8X5p8A/pkHRReFtE8H4xIhr8X3MU6lpxBhHZKj3UK+hBHCWxEnQkGb0Nz2Pi8hyWNt5AUu830RSQnl793RwuxwQ1HCmJYFEx00c11gXmSn36PPP42OCMstDR/GpK2LUPsNQbdJ7TUq25rzG/5SIjYxWA0nQbGY/mWaY3Q9iCo7k5o9QnZEf4yLaOF5g7nEva4lTZNwx27ynyDAWrRBVE0KsGTsbIQMgMPqCV8gzc6irsluosW+EI6zW0mdaeoiBaGYHFQnJ77a5rnbpL0j6fMiDfL+5VW7jAaRnyz3Y10Cn1TlXEY4rvQjHZIxvK/rBe2WSVkyXIhrUgAl7a7EvDGnBniVdsizrBSgASrjcT9svJ0aHPEpfxJmy8nuzV0pZbXGzG1q06Xyij4SoxHfi7tf1dPjOn6+zdR2SY0/sQvXmQ35bAlbnFMKWdzyJHB0uEm6GYQtV4Dcj6fjGijxjIiQW0SgjuRd7/8k1i7MvEnD+I6MRIBhx/KNOdP3os9oP8pyMicIz8V1o7KENKwX4fyUmbIx34adCXt/DXT1sjFtNu4S0vzvOU/AtxBLOEcw+clV25xSp/94dEq/dge+K2ySRuxKt6DcNhjDMYvc1ACXbfVjANG8ar3x7n9kX14EMtnpip0RI9ypma3tOmjqip+Qc+lyc12A0jV714BfTzw6nSjYya75Idztq1gZifNt+pQn7GO7Qw2kIqNnXvpA+UkWbsTlnTKyY9gTqPHbF5XcSvtvDfNYM6mxVMJZ1MyAt4pxrCgyGAC0IswPZ8wMAablR9fNStFs67D4kyeUCU/2IVTD1/pfmMC8meLuaXpgHkl2er3Wr84H2lVL+xUd7/wCUSkLFke68SeRfqPl7dIR9hstVJC0cCTbmco0KxTzcwln3QdoxveE/N8v31Z9teZoJxdeRRFyJQFGHw81JVor2kACBsIkioLUvz4IpUxE8XEwUrjHCBJyZG3QzcQAxSBXprztdoknBgrd38wCssuCa3gQvIoMtbCaedWhmY0pA/AI0aHTHR/j5nTg9jaqeEViZF0hLVEhVz2MojXswtp0aA70YwuMPBmMCdgy5w+wSeThtsyt6j+b2NHHRkE0uqEc8D5XDUB5M2UolpsZCcuOKSwK+jwGdeb3gPWssUQShMgRTEoFLapZJKX7c8/yeZ7Lf4KrFE4pBz8+JeFJOVFl3y1ewckAFZVHdYvu1901Aq6PKTMu6kGz/LElno8fCJbyKacsZ3LtpssGhvFBN0vv0WIn4elkiLCL3u9v6oWOxaK4OIDTaVwDLjb5BvBpd2Szj5diHAG1IXoVQYvJ2VEDVwbiTUChXRZcDY6bAm7dvkYWLOxsa0whGz2xeeApbEceeQrHREelH89ucBenmuENPiF98Kf4mZQ/ThiVhFxiWAux1b0Dn0z7M/mXfposuy8ytqtRry3SJoC5V7I+7E4N5x0JyVwN/vtxBpd5h443R35RvDZ1tnscirsGzNoulevkeUqM+I6TgjrvBF00fv3isZvzUjIXK6E9cAg4G7aPyuirI+KIICi9VNxF5fDRxi2UPTHiB3NT01Vez5GVt0Tu8lpn2iakJSBjihOYORrSI+xJzbQdnCzJa1+h8UiAFXgpqWviJUVXG22wFQ1HQckAbFxU/Pcyx+QrsnDrhwihqmnwFd1fuwOy74SAvPMojpxujxWDe+37nhroEyhrk5yOB65RDUcQFS77a+3RwuNyXAodTC3QMp5lMZD1Ae8zGEBesg4zbkP7aMS+ljYBShRN6n9KYhHZ7s2Iq5V4K6GrUcOFdXP157jN4vBuj8l+Uo
BIPjpMm9KKpLnCuSjGNPIyoxfPg==',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$ddlPageSize':'20',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$hdnFilterBySelection':'',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$hdnSelectedIndex':'',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$hdnSelectedFunds':'',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$hdnCustomImageFileIds':'',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$hdnSelectedSecurityCount':'0',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$hdnTabs':'Snapshot,ShortTerm,Performance,Portfolio,FeesAndDetails',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$msExportToExcel$hdnSelectedRow':'',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$msExportToExcel$hdnCheckedColumns':'',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$msExportToExcel$hdnExportLimit':'50',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$msExportToExcel$hdnExportCount':'0',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$msExportToExcel$toExport':'',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$msExportToExcel$hdnAllowAllFundsExport':'true',
'ctl00$ContentPlaceHolder1$aFundScreenerResultControl$txtSaveSearch':'',
'ctl00$__RequestVerificationToken':'e_BrPK0DxBjkgMrhfkdyFJjp1nPzltSn0h20aUjHJPSe3W4w3FRsFQNo_YY3Ml0D1CkNGqC5PEJBigtZuvdbiYrldSMrUoOQFUjaPifPbM41'
}
r = requests.post(url, headers = headers, data = payload)
print(r.content)
root = html.fromstring(r.content)
You can now fetch the elements you need from root using XPath, such as:
root.xpath('//input[@class="some_class"]')
Refer to the scraping and lxml documentation for more understanding.
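As a self-contained illustration of that last step, with hypothetical markup standing in for the Morningstar page:

```python
from lxml import html

# hypothetical table, standing in for the portfolio response markup
snippet = """
<table>
  <tr><td class="fund-name">Fund A</td><td>1.2%</td></tr>
  <tr><td class="fund-name">Fund B</td><td>3.4%</td></tr>
</table>
"""
root = html.fromstring(snippet)
names = [td.text for td in root.xpath('//td[@class="fund-name"]')]
print(names)  # ['Fund A', 'Fund B']
```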
I have used all the payload data from the request; you can remove some of it and check what is absolutely necessary for the request.
Also, follow the website's rules about scraping, and scrape gracefully without putting too much pressure on the website.
