I've been trying to get past the form page on http://dq.ndc.bsnl.co.in/bsnl-web/residentialSearch.seam using the Python Requests module.
My guess is that the problem is the AJAX in the form field, and I really have no clue how to send a request for that with Requests.
I know that this can be done through Selenium, but I need it done through requests.
Here's my current code:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0'
}

payload = {
    "residential": "residential",
    "residential:j_id12": "",
    "residential:firstField": "a",
    "residential:criteria1": "3",
    "residential:city": "ASIND",
    "residential:button1": "residential:button1",
    "residential:suggestionBoxId_selection": "",
    "javax.faces.ViewState": "j_id1"
}

with requests.Session() as s:
    # print(s.headers)
    print(s.get('http://dq.ndc.bsnl.co.in/bsnl-web/residentialSearch.seam'))
    print(s.headers)
    print(s.cookies)
    resp = s.post(
        'http://dq.ndc.bsnl.co.in/bsnl-web/residentialSearch.seam',
        data=payload, headers=headers)
    print(resp.text)
You are pretty near to the full solution. First you need the AJAXREQUEST field in the payload to start the search, then you follow the redirect to the first results page. The following pages are fetched with further POST requests. The only problem: there is no real end-of-pages marker; after the last page it simply starts over with the first page again, so you have to look in the content for "Page x of y".
import re
import requests
import requests.models

# Non-standard-conforming redirect: the Seam/RichFaces AJAX response signals a
# redirect via an 'Ajax-Response: redirect' header instead of a normal 3xx
# status, so teach requests to treat that as a redirect as well:
requests.Response.is_redirect = property(lambda self: (
    'location' in self.headers and (
        self.status_code in requests.models.REDIRECT_STATI or
        self.headers.get('Ajax-Response', '') == 'redirect'
    )))

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0'
}

payload = {
    "AJAXREQUEST": "loader2",
    "residential": "residential",
    "residential:j_id12": "",
    "residential:firstField": "a",
    "residential:criteria1": "3",
    "residential:city": "ASIND",
    "residential:button1": "residential:button1",
    "residential:suggestionBoxId_selection": "",
    "javax.faces.ViewState": "j_id1"
}

with requests.Session() as s:
    print(s.get('http://dq.ndc.bsnl.co.in/bsnl-web/residentialSearch.seam'))
    print(s.headers)
    print(s.cookies)
    resp = s.post(
        'http://dq.ndc.bsnl.co.in/bsnl-web/residentialSearch.seam',
        data=payload, headers=headers)
    while True:
        # do data processing
        for l in resp.text.split("subscriber');")[1:]:
            print(l[2:].split('<')[0])
        # look for the next page
        current, last = re.search(r'Page (\d+) of (\d+)', resp.text).groups()
        if int(current) == int(last):
            break
        resp = s.post('http://dq.ndc.bsnl.co.in/bsnl-web/resSrchDtls.seam',
                      data={'AJAXREQUEST': '_viewRoot',
                            'j_id10': 'j_id10',
                            'javax.faces.ViewState': 'j_id2',
                            'j_id10:PGDOWNLink': 'j_id10:PGDOWNLink',
                            }, headers=headers)
After visiting this website, when I fill out the input box with Sydney CBD, NSW and hit the search button, I can see the required results displayed on that site.
I wish to scrape the property links using the requests module. With the following attempt, I can get the property links from the first page.
The problem here is that I hardcoded the value of sha256Hash within params, which is not what I want to do. I don't know whether the ID retrieved by issuing a GET request to the suggestion URL needs to be converted to a sha256Hash.
However, when I do that using the function get_hashed_string(), the value it produces is different from the hardcoded one available within params. As a result, the script throws a KeyError on this line: container = res.json()['data']['buySearch']['results']['exact']['items'].
import requests
import hashlib
from pprint import pprint
from bs4 import BeautifulSoup

url = 'https://suggest.realestate.com.au/consumer-suggest/suggestions'
link = 'https://lexa.realestate.com.au/graphql'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

payload = {
    'max': '7',
    'type': 'suburb,region,precinct,state,postcode',
    'src': 'homepage-web',
    'query': 'Sydney CBD, NSW'
}
params = {"operationName":"searchByQuery","variables":{"query":"{\"channel\":\"buy\",\"page\":1,\"pageSize\":25,\"filters\":{\"surroundingSuburbs\":true,\"excludeNoSalePrice\":false,\"ex-under-contract\":false,\"ex-deposit-taken\":false,\"excludeAuctions\":false,\"excludePrivateSales\":false,\"furnished\":false,\"petsAllowed\":false,\"hasScheduledAuction\":false},\"localities\":[{\"searchLocation\":\"sydney cbd, nsw\"}]}","testListings":False,"nullifyOptionals":False},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"}}}
def get_hashed_string(keyword):
    hashed_str = hashlib.sha256(keyword.encode('utf-8')).hexdigest()
    return hashed_str

with requests.Session() as s:
    s.headers.update(headers)
    r = s.get(url, params=payload)
    hashed_id = r.json()['_embedded']['suggestions'][0]['id']
    # params['extensions']['persistedQuery']['sha256Hash'] = get_hashed_string(hashed_id)
    res = s.post(link, json=params)
    container = res.json()['data']['buySearch']['results']['exact']['items']
    for item in container:
        print(item['listing']['_links']['canonical']['href'])
If I run the script as is, it works beautifully. When I uncomment the params['extensions']['persistedQuery'] line and run the script again, it breaks.
How can I generate the value of sha256Hash and use it within the script above?
This is not how GraphQL persisted queries work. The sha256Hash value stays the same across all requests; what you're missing is a valid GraphQL query string.
You have to reconstruct that first and then just use the API's pagination - that's the key.
Here's how:
import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept": "application/graphql+json, application/json",
    "Content-Type": "application/json",
    "Host": "lexa.realestate.com.au",
    "Referer": "https://www.realestate.com.au/",
}

endpoint = "https://lexa.realestate.com.au/graphql"

# raw query string; page_number is a placeholder replaced per page below
graph_query = "{\"channel\":\"buy\",\"page\":page_number,\"pageSize\":25,\"filters\":{\"surroundingSuburbs\":true," \
"\"excludeNoSalePrice\":false,\"ex-under-contract\":false,\"ex-deposit-taken\":false," \
"\"excludeAuctions\":false,\"excludePrivateSales\":false,\"furnished\":false,\"petsAllowed\":false," \
"\"hasScheduledAuction\":false},\"localities\":[{\"searchLocation\":\"sydney cbd, nsw\"}]}"
graph_json = {
    "operationName": "searchByQuery",
    "variables": {
        "query": "",
        "testListings": False,
        "nullifyOptionals": False
    },
    "extensions": {
        "persistedQuery": {
            "version": 1,
            "sha256Hash": "ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"
        }
    }
}

if __name__ == '__main__':
    with requests.Session() as s:
        for page in range(1, 3):
            graph_json['variables']['query'] = graph_query.replace('page_number', str(page))
            r = s.post(endpoint, headers=headers, data=json.dumps(graph_json))
            listing = r.json()['data']['buySearch']['results']['exact']['items']
            for item in listing:
                print(item['listing']['_links']['canonical']['href'])
This should give you:
https://www.realestate.com.au/property-apartment-nsw-sydney-140558991
https://www.realestate.com.au/property-apartment-nsw-sydney-141380404
https://www.realestate.com.au/property-apartment-nsw-sydney-140310979
https://www.realestate.com.au/property-apartment-nsw-sydney-141259592
https://www.realestate.com.au/property-apartment-nsw-barangaroo-140555291
https://www.realestate.com.au/property-apartment-nsw-sydney-140554403
https://www.realestate.com.au/property-apartment-nsw-millers+point-141245584
https://www.realestate.com.au/property-apartment-nsw-haymarket-139205259
https://www.realestate.com.au/project/hyde-metropolitan-by-deicorp-sydney-600036803
https://www.realestate.com.au/property-apartment-nsw-haymarket-140807411
https://www.realestate.com.au/property-apartment-nsw-sydney-141370756
https://www.realestate.com.au/property-apartment-nsw-sydney-141370364
https://www.realestate.com.au/property-apartment-nsw-haymarket-140425111
https://www.realestate.com.au/project/greenland-centre-sydney-600028910
https://www.realestate.com.au/property-apartment-nsw-sydney-141364136
https://www.realestate.com.au/property-apartment-nsw-sydney-139367203
https://www.realestate.com.au/property-apartment-nsw-sydney-141156696
https://www.realestate.com.au/property-apartment-nsw-sydney-141362880
https://www.realestate.com.au/property-studio-nsw-sydney-141311384
https://www.realestate.com.au/property-apartment-nsw-haymarket-141354876
https://www.realestate.com.au/property-apartment-nsw-the+rocks-140413283
https://www.realestate.com.au/property-apartment-nsw-sydney-141350552
https://www.realestate.com.au/property-apartment-nsw-sydney-140657935
https://www.realestate.com.au/property-apartment-nsw-barangaroo-139149039
https://www.realestate.com.au/property-apartment-nsw-haymarket-141034784
https://www.realestate.com.au/property-apartment-nsw-sydney-141230640
https://www.realestate.com.au/property-apartment-nsw-barangaroo-141340768
https://www.realestate.com.au/property-apartment-nsw-haymarket-141337684
https://www.realestate.com.au/property-unitblock-nsw-millers+point-141337528
https://www.realestate.com.au/property-apartment-nsw-sydney-141028828
https://www.realestate.com.au/property-apartment-nsw-sydney-141223160
https://www.realestate.com.au/property-apartment-nsw-sydney-140643067
https://www.realestate.com.au/property-apartment-nsw-sydney-140768179
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406051
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406047
https://www.realestate.com.au/property-apartment-nsw-sydney-139652067
https://www.realestate.com.au/property-apartment-nsw-sydney-140032667
https://www.realestate.com.au/property-apartment-nsw-sydney-127711002
https://www.realestate.com.au/property-apartment-nsw-sydney-140903924
https://www.realestate.com.au/property-apartment-nsw-walsh+bay-139130519
https://www.realestate.com.au/property-apartment-nsw-sydney-140285823
https://www.realestate.com.au/property-apartment-nsw-sydney-140761223
https://www.realestate.com.au/project/111-castlereagh-sydney-600031082
https://www.realestate.com.au/property-apartment-nsw-sydney-140633099
https://www.realestate.com.au/property-apartment-nsw-haymarket-141102892
https://www.realestate.com.au/property-apartment-nsw-sydney-139522379
https://www.realestate.com.au/property-apartment-nsw-sydney-139521259
https://www.realestate.com.au/property-apartment-nsw-sydney-139521219
https://www.realestate.com.au/property-apartment-nsw-haymarket-140007279
https://www.realestate.com.au/property-apartment-nsw-haymarket-139156515
Here's my code:
import requests
from bs4 import BeautifulSoup

session = requests.Session()

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
           'Accept-Language': 'en-US,en;q=0.5',
           'Connection': 'keep-alive',
           'Upgrade-Insecure-Requests': '1',
           'Cache-Control': 'max-age=0'}

def generating_data():
    main_url = 'https://opencorporates.com/users/sign_in'
    r1 = session.get(main_url, headers=headers)
    soup = BeautifulSoup(r1.text, 'html.parser')
    tokens = soup.find('meta', attrs={'name': 'csrf-token'})
    token = tokens.get('content')
    print(f'token is : {token}')
    print('Login!')
    datas = {
        'utf8': '✓',
        'authenticity_token': token,
        'user[email]': 'user',
        'user[password]': 'pass',
        'submit': ''
    }
    r2 = session.post('https://opencorporates.com/users/sign_in', headers=headers, data=datas, cookies=r1.cookies)
    r3 = session.get('https://opencorporates.com/companies?utf8=%E2%9C%93&q=above+and+beyond&commit=Go&jurisdiction_code=&utf8=%E2%9C%93&commit=Go =&controller=searches&action=search_companies&inactive=false&mode=best_fields&search_fields[]=name&branch=false&nonprofit=&order=score', headers=headers, cookies=r1.cookies)
    f = open('./res.html', 'w+')
    f.write(r3.text)
    f.close()

generating_data()
I already get the logged-in result if I print r2, but when I move on to the next request, r3, it shows the page as if we are not logged in yet. Can anyone help? Thanks.
You need to remove the cookies=r1.cookies part, since you are already using a session. What it does is overwrite the cookies collected from the response to r2 that would otherwise have been sent along with the request, and which might be important for staying logged in. The same goes for r2. In general, you do not need to deal with cookies yourself when you are using a session with requests. Your code for generating_data() then becomes:
def generating_data():
    main_url = 'https://opencorporates.com/users/sign_in'
    r1 = session.get(main_url, headers=headers)
    soup = BeautifulSoup(r1.text, 'html.parser')
    tokens = soup.find('meta', attrs={'name': 'csrf-token'})
    token = tokens.get('content')
    print(f'token is : {token}')
    print('Login!')
    datas = {
        'utf8': '✓',
        'authenticity_token': token,
        'user[email]': 'user',
        'user[password]': 'pass',
        'submit': ''
    }
    # no explicit cookies= argument: the Session keeps and sends them automatically
    r2 = session.post('https://opencorporates.com/users/sign_in', headers=headers, data=datas)
    r3 = session.get('https://opencorporates.com/companies?utf8=%E2%9C%93&q=above+and+beyond&commit=Go&jurisdiction_code=&utf8=%E2%9C%93&commit=Go =&controller=searches&action=search_companies&inactive=false&mode=best_fields&search_fields[]=name&branch=false&nonprofit=&order=score', headers=headers)
    f = open('./res.html', 'w+')
    f.write(r3.text)
    f.close()
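As a side note, here is a minimal sketch of why the session alone is enough (httpbin.org is used purely as a demo endpoint and has nothing to do with opencorporates): a requests.Session stores cookies from every response and sends them back automatically on later requests.

import requests

with requests.Session() as s:
    # the server sets a cookie; the Session stores it in s.cookies
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456')
    print(s.cookies.get('sessioncookie'))  # -> '123456'

    # the stored cookie is sent automatically on the next request,
    # no cookies=... argument needed
    r = s.get('https://httpbin.org/cookies')
    print(r.json())  # -> {'cookies': {'sessioncookie': '123456'}}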
I was trying to scrape this website but wasn't getting the table data. I even took the request data from the Chrome dev tools, but I cannot find out what I'm doing wrong.
Here is my script:
import requests,json
url='https://www.assetmanagement.hsbc.de/api/v1/nav/funds'
payload={"appliedFilters":[[{"active":True,"id":"Yes"}]],"paging":{"fundsPerPage":-1,"currentPage":1},"view":"Documents","searchTerm":[],"selectedValues":[],"pageInformation":{"country":"DE","language":"DE","investorType":"INST","tokenIssue":{"url":"/api/v1/token/issue"},"dataUrl":{"url":"/api/v1/nav/funds","id":"e0FFNDg5MTJELUFEMzEtNEQ5RC04MzA4LTdBQzZERTgyQTc4Rn0="},"shareClassUrl":{"url":"/api/v1/nav/shareclass","id":"ezUxODdjODJiLWY1YmItNDIzOC1hM2Y0LWY5NzZlY2JmMTU3OX0="},"filterUrl":{"url":"/api/v1/nav/filters","id":"ezRFREYxQTU3LTVENkYtNDBDRC1CMjJDLTQ0NDc4Nzc1NTlFQn0="},"presentationUrl":{"url":"/api/v1/nav/presentation","id":"e0E1NEZDODZGLUE5MDctNDUzQi04RTYyLTIxNDNBMEM1MEVGQ30="},"liveDataUrl":{"id":"ezlEMjA2MDk5LUNCRTItNENGMy1BRThBLUM0RTMwMEIzMjlDQ30="},"fundDetailPageUrl":"/de/institutional-investors/fund-centre","forceHttps":True}}
headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"}
r = requests.post(url,headers=headers,data=payload)
print(r.content)
Besides the IFC-Cache-Header HTTP header that was missing in the first place, there is also a JWT token that is passed via the Authorization header.
To retrieve this token, you first need to extract values from the root page :
GET https://www.assetmanagement.hsbc.de/de/institutional-investors/fund-centre
which features the following JavaScript object:
window.HSBC.dpas = {
    "pageInformation": {
        "country": "X",                              <========= HERE
        "language": "X",                             <========= HERE
        "tokenIssue": {
            "url": "/api/v1/token/issue",
        },
        "dataUrl": {
            "url": "/api/v1/nav/funds",
            "id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"     <========= HERE
        },
        ....
    }
}
You can extract the window.HSBC.dpas JavaScript object value using a regex and then reformat the string so that it becomes valid JSON.
These values are then passed in HTTP headers such as X-COUNTRY, X-COMPONENT and X-LANGUAGE to the following call:
GET https://www.assetmanagement.hsbc.de/api/v1/token/issue
It returns the JWT token directly. Then add the Authorization header as Authorization: Bearer {token} to the request:
GET https://www.assetmanagement.hsbc.de/api/v1/nav/funds
Example:
import requests
import re
import json

api_url = "https://www.assetmanagement.hsbc.de/api/v1"
funds_url = f"{api_url}/nav/funds"
token_url = f"{api_url}/token/issue"

# call the /fund-centre url to get the documentID value in the javascript
url = "https://www.assetmanagement.hsbc.de/de/institutional-investors/fund-centre?f=Yes&n=-1&v=Documents"
r = requests.get(url,
                 params={
                     "f": "Yes",
                     "n": "-1",
                     "v": "Documents"
                 })

# this gets the javascript object
res = re.search(r"^.*window\.HSBC\.dpas\s*=\s*([^;]*);", r.text, re.DOTALL)
group = res.group(1)

# convert to valid JSON: remove trailing commas: https://stackoverflow.com/a/56595068 (added "e")
regex = r'''(?<=[}\]"'e]),(?!\s*[{["'])'''
result_json = re.sub(regex, "", group, 0)
result = json.loads(result_json)
print(result["pageInformation"]["dataUrl"])

# call /token/issue API to get a token
r = requests.post(token_url,
                  headers={
                      "X-Country": result["pageInformation"]["country"],
                      "X-Component": result["pageInformation"]["dataUrl"]["id"],
                      "X-Language": result["pageInformation"]["language"]
                  }, data={})
token = r.text
print(token)

# call /nav/funds API
payload = {
    "appliedFilters": [[{"active": True, "id": "Yes"}]],
    "paging": {"fundsPerPage": -1, "currentPage": 1},
    "view": "Documents",
    "searchTerm": [],
    "selectedValues": [],
    "pageInformation": result["pageInformation"]
}
headers = {
    "IFC-Cache-Header": "de,de,inst,documents,yes,1,n-1",
    "Authorization": f"Bearer {token}"
}
r = requests.post(funds_url, headers=headers, json=payload)
print(r.content)
Try this on repl.it
There is a site called dnsdumpster that provides all the sub-domains for a domain. I am trying to automate this process and print out a list of the subdomains. Each individual sub-domain is within a "td" HTML tag. I am trying to iterate through all these tags and print out the sub-domains, but I get an error.
import requests
import re
from bs4 import BeautifulSoup

headers = {
    'Host': 'dnsdumpster.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Upgrade-Insecure-Requests': '1',
    'Referer': 'https://dnsdumpster.com/',
    'Connection': 'close'
}

proxies = {
    'http': 'http://127.0.0.1:8080'
}

domain = 'google.com'

with requests.Session() as s:
    url = 'https://dnsdumpster.com'
    response = s.get(url, headers=headers, proxies=proxies)
    response.encoding = 'utf-8'  # Optional: requests infers this internally
    soup1 = BeautifulSoup(response.text, 'html.parser')
    input = soup1.find_all('input')
    csrfmiddlewaretoken_raw = str(input[0])
    csrfmiddlewaretoken = csrfmiddlewaretoken_raw[55:119]
    data = {
        'csrfmiddlewaretoken': csrfmiddlewaretoken,
        'targetip': domain
    }
    send_data = s.post(url, data=data, proxies=proxies, headers=headers)
    print(send_data.status_code)
    soup2 = BeautifulSoup(send_data.text, 'html.parser')
    td = soup2.find_all('td')
    for i in len(td):
        item = str(td[i])
        subdomain = item[21:37]
        print(subdomain)
The error looks like this:
Traceback (most recent call last):
  File "dns_dumpster_4.py", line 39, in <module>
    for i in len(td):
TypeError: 'int' object is not iterable
And once the above error is solved, I would also need help with another question:
How can I use a regular expression to get the individual sub-domain from within this "td" tag? The contents of the tag are very long and messy, and I only need the subdomain. I would really appreciate it if someone could help me get just the sub-domain name.
I tried to catch the subdomains without using a regex.
import requests
from bs4 import BeautifulSoup
from functools import reduce

headers = {
    'Host': 'dnsdumpster.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Upgrade-Insecure-Requests': '1',
    'Referer': 'https://dnsdumpster.com/',
    'Connection': 'close'
}

proxies = {
    'http': 'http://127.0.0.1:8080'
}

domain = 'google.com'

with requests.Session() as s:
    url = 'https://dnsdumpster.com'
    response = s.get(url, headers=headers, proxies=proxies)
    response.encoding = 'utf-8'  # Optional: requests infers this internally
    soup1 = BeautifulSoup(response.text, 'html.parser')
    input = soup1.find_all('input')
    csrfmiddlewaretoken_raw = str(input[0])
    csrfmiddlewaretoken = csrfmiddlewaretoken_raw[55:119]
    data = {
        'csrfmiddlewaretoken': csrfmiddlewaretoken,
        'targetip': domain
    }
    send_data = s.post(url, data=data, proxies=proxies, headers=headers)
    print(send_data.status_code)
    soup2 = BeautifulSoup(send_data.text, 'html.parser')

    # only the cells that actually hold host names
    td = soup2.find_all('td', {'class': 'col-md-3'})

    mysubdomain = []
    for dom in range(len(td)):
        # keep only cells whose text looks like a host name
        if '.' in td[dom].get_text(strip=True):
            x = td[dom].get_text(strip=True, separator=',').split(',')
            mysubdomain.append(x)
    print(mysubdomain)

# flatten the list of lists into a single list of names
flat_list_of_mysubdomain = reduce(lambda x, y: x + y, mysubdomain)
print(flat_list_of_mysubdomain)
I hope this helps you.
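Regarding the two parts of the question: the TypeError comes from len(td) being an int, so iterate over the tags themselves (for cell in td:) or over range(len(td)) as in the code above. And if you still want the regex route, here is a minimal sketch (the extract_subdomains helper and its pattern are my own illustration, not something the page guarantees) that pulls host names under the target domain out of the raw response text instead of slicing fixed character offsets:

import re

# hypothetical helper: pull host names ending in the target domain out of raw HTML
def extract_subdomains(html, domain='google.com'):
    pattern = r'[\w.-]+\.' + re.escape(domain)
    return sorted(set(re.findall(pattern, html)))

# usage with the response from the code above:
# print(extract_subdomains(send_data.text, domain))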
Search url - http://aptaapps.apta.org/findapt/Default.aspx?UniqueKey=.
I need to get data for the zip code 10017.
I am sending a POST request, but I receive the search page (the same response as for the search URL), not the page with results.
My code:
# -*- coding: UTF-8 -*-
import requests
from bs4 import BeautifulSoup, element

search_url = "http://aptaapps.apta.org/findapt/Default.aspx?UniqueKey="

session = requests.Session()
r = session.get(search_url)
post_page = BeautifulSoup(r.text, "lxml")

try:
    target_value = post_page.find("input", id="__EVENTTARGET")["value"]
except TypeError:
    target_value = ""
try:
    arg_value = post_page.find("input", id="__EVENTARGUMENT")["value"]
except TypeError:
    arg_value = ""
try:
    state_value = post_page.find("input", id="__VIEWSTATE")["value"]
except TypeError:
    state_value = ""
try:
    generator_value = post_page.find("input", id="__VIEWSTATEGENERATOR")["value"]
except TypeError:
    generator_value = ""
try:
    validation_value = post_page.find("input", id="__EVENTVALIDATION")["value"]
except TypeError:
    validation_value = ""

post_data = {
    "__EVENTTARGET": target_value,
    "__EVENTARGUMENT": arg_value,
    "__VIEWSTATE": state_value,
    "__VIEWSTATEGENERATOR": generator_value,
    "__EVENTVALIDATION": validation_value,
    "ctl00$SearchTerms2": "",
    "ctl00$maincontent$txtZIP": "10017",
    "ctl00$maincontent$txtCity": "",
    "ctl00$maincontent$lstStateProvince": "",
    "ctl00$maincontent$radDist": "1",
    "ctl00$maincontent$btnSearch": "Find a Physical Therapist"
}

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4",
    "Cache-Control": "max-age=0",
    "Content-Length": "3025",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "aptaapps.apta.org",
    "Origin": "http://aptaapps.apta.org",
    "Proxy-Connection": "keep-alive",
    "Referer": "http://aptaapps.apta.org/findapt/default.aspx?UniqueKey=",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
}

post_r = session.post(search_url, data=post_data, headers=headers)
print(post_r.text)
Short Answer:
try to replace:
post_r = session.post(search_url, data=post_data, headers=headers)
to:
post_r = session.post(search_url, json=post_data, headers=headers)
Long Answer:
For the POST method there are many kinds of body types to post, such as form-data, x-www-form-urlencoded, application/json, file uploads and so on.
You should find out what the type of the posted data is. There is a useful tool called Postman; you can use it to try different body types and find the correct one.
Once you have found it, use the corresponding parameter of requests.post: the data parameter is for form-data and x-www-form-urlencoded, and the json parameter is for JSON payloads. You can check the Requests documentation to learn more about these parameters.
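As a rough illustration of the difference between the two parameters (httpbin.org is used here only as an echo service for the demo; it is not related to the APTA site):

import requests

payload = {"ctl00$maincontent$txtZIP": "10017"}

# data=... sends application/x-www-form-urlencoded, i.e. what a normal HTML form posts
r1 = requests.post("https://httpbin.org/post", data=payload)
print(r1.json()["headers"]["Content-Type"])  # application/x-www-form-urlencoded
print(r1.json()["form"])                     # {'ctl00$maincontent$txtZIP': '10017'}

# json=... serializes the dict to a JSON body and sets Content-Type: application/json
r2 = requests.post("https://httpbin.org/post", json=payload)
print(r2.json()["headers"]["Content-Type"])  # application/json
print(r2.json()["json"])                     # {'ctl00$maincontent$txtZIP': '10017'}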