Incorrect response from POST request with requests - Python

Search URL: http://aptaapps.apta.org/findapt/Default.aspx?UniqueKey=
I need to get the results for ZIP code 10017.
I send a POST request, but I receive the search page (the same response as a GET to the search URL) instead of the page with results.
My code:
# -*- coding: UTF-8 -*-
import requests
from bs4 import BeautifulSoup

search_url = "http://aptaapps.apta.org/findapt/Default.aspx?UniqueKey="

session = requests.Session()
r = session.get(search_url)
post_page = BeautifulSoup(r.text, "lxml")


def hidden_value(field_id):
    """Read an ASP.NET hidden form field; return "" when it is absent."""
    try:
        return post_page.find("input", id=field_id)["value"]
    except TypeError:
        return ""


post_data = {
    "__EVENTTARGET": hidden_value("__EVENTTARGET"),
    "__EVENTARGUMENT": hidden_value("__EVENTARGUMENT"),
    "__VIEWSTATE": hidden_value("__VIEWSTATE"),
    "__VIEWSTATEGENERATOR": hidden_value("__VIEWSTATEGENERATOR"),
    "__EVENTVALIDATION": hidden_value("__EVENTVALIDATION"),
    "ctl00$SearchTerms2": "",
    "ctl00$maincontent$txtZIP": "10017",
    "ctl00$maincontent$txtCity": "",
    "ctl00$maincontent$lstStateProvince": "",
    "ctl00$maincontent$radDist": "1",
    "ctl00$maincontent$btnSearch": "Find a Physical Therapist",
}
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4",
    "Cache-Control": "max-age=0",
    # "Content-Length" is left out on purpose: requests computes it, and a
    # stale hardcoded value can invalidate the request.
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "aptaapps.apta.org",
    "Origin": "http://aptaapps.apta.org",
    "Proxy-Connection": "keep-alive",
    "Referer": "http://aptaapps.apta.org/findapt/default.aspx?UniqueKey=",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
}
post_r = session.post(search_url, data=post_data, headers=headers)
print(post_r.text)

Short Answer:
Try replacing:
post_r = session.post(search_url, data=post_data, headers=headers)
with:
post_r = session.post(search_url, json=post_data, headers=headers)
Long Answer:
A POST body can be encoded in several ways: form-data, x-www-form-urlencoded, application/json, file uploads, and so on.
You need to know which type the server expects. A tool such as Postman (a popular API client) lets you try the different encodings and find the correct one.
Once you know the encoding, use the matching keyword argument in requests.post: the data parameter is for form-data and x-www-form-urlencoded, while the json parameter is for JSON. See the Requests documentation for more about these parameters.
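To make the difference concrete, here is a minimal sketch against httpbin.org (a public echo service, used purely for illustration) showing how the two parameters encode the same dict differently:

import requests

payload = {"zip": "10017", "radius": "1"}

# data= sends an application/x-www-form-urlencoded body (zip=10017&radius=1)
form_resp = requests.post("https://httpbin.org/post", data=payload)
print(form_resp.json()["headers"]["Content-Type"])  # application/x-www-form-urlencoded

# json= serializes the dict to a JSON body and sets the Content-Type header for you
json_resp = requests.post("https://httpbin.org/post", json=payload)
print(json_resp.json()["headers"]["Content-Type"])  # application/json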

Related

Unable to query graphql with a sha256 hash to scrape property links from a webpage

After visiting this website, when I fill in the input box with Sydney CBD, NSW and hit the search button, I can see the required results displayed on that site.
I wish to scrape the property links using the requests module. With the following attempt I can get the property links from the first page.
The problem is that I hardcoded the value of sha256Hash within params, which is not what I want to do. I don't know whether the ID retrieved by issuing a GET request to the suggestion URL needs to be converted to a sha256 hash.
However, when I do that using the function get_hashed_string(), the value it produces is different from the hardcoded one within params. As a result, the script raises a KeyError on the line: container = res.json()...
import requests
import hashlib
from pprint import pprint
from bs4 import BeautifulSoup

url = 'https://suggest.realestate.com.au/consumer-suggest/suggestions'
link = 'https://lexa.realestate.com.au/graphql'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
payload = {
    'max': '7',
    'type': 'suburb,region,precinct,state,postcode',
    'src': 'homepage-web',
    'query': 'Sydney CBD, NSW'
}
params = {"operationName":"searchByQuery","variables":{"query":"{\"channel\":\"buy\",\"page\":1,\"pageSize\":25,\"filters\":{\"surroundingSuburbs\":true,\"excludeNoSalePrice\":false,\"ex-under-contract\":false,\"ex-deposit-taken\":false,\"excludeAuctions\":false,\"excludePrivateSales\":false,\"furnished\":false,\"petsAllowed\":false,\"hasScheduledAuction\":false},\"localities\":[{\"searchLocation\":\"sydney cbd, nsw\"}]}","testListings":False,"nullifyOptionals":False},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"}}}


def get_hashed_string(keyword):
    hashed_str = hashlib.sha256(keyword.encode('utf-8')).hexdigest()
    return hashed_str


with requests.Session() as s:
    s.headers.update(headers)
    r = s.get(url, params=payload)
    hashed_id = r.json()['_embedded']['suggestions'][0]['id']
    # params['extensions']['persistedQuery']['sha256Hash'] = get_hashed_string(hashed_id)
    res = s.post(link, json=params)
    container = res.json()['data']['buySearch']['results']['exact']['items']
    for item in container:
        print(item['listing']['_links']['canonical']['href'])
If I run the script as is, it works beautifully. When I uncomment the line that overwrites params['extensions']['persistedQuery']['sha256Hash'] and run the script again, it breaks.
How can I generate the value of sha256Hash and use it within the script above?
This is not how GraphQL persisted queries work. The sha256Hash value stays the same across all requests; what you're missing is a valid GraphQL query string.
You have to reconstruct that first and then just use the API's pagination - that's the key.
Here's how:
import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept": "application/graphql+json, application/json",
    "Content-Type": "application/json",
    "Host": "lexa.realestate.com.au",
    "Referer": "https://www.realestate.com.au/",
}

endpoint = "https://lexa.realestate.com.au/graphql"

# The query string the site sends; page_number is a placeholder that gets
# substituted for each page below.
graph_query = "{\"channel\":\"buy\",\"page\":page_number,\"pageSize\":25,\"filters\":{\"surroundingSuburbs\":true," \
              "\"excludeNoSalePrice\":false,\"ex-under-contract\":false,\"ex-deposit-taken\":false," \
              "\"excludeAuctions\":false,\"excludePrivateSales\":false,\"furnished\":false,\"petsAllowed\":false," \
              "\"hasScheduledAuction\":false},\"localities\":[{\"searchLocation\":\"sydney cbd, nsw\"}]}"

graph_json = {
    "operationName": "searchByQuery",
    "variables": {
        "query": "",
        "testListings": False,
        "nullifyOptionals": False
    },
    "extensions": {
        "persistedQuery": {
            "version": 1,
            "sha256Hash": "ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"
        }
    }
}

if __name__ == '__main__':
    with requests.Session() as s:
        for page in range(1, 3):
            graph_json['variables']['query'] = graph_query.replace('page_number', str(page))
            r = s.post(endpoint, headers=headers, data=json.dumps(graph_json))
            listing = r.json()['data']['buySearch']['results']['exact']['items']
            for item in listing:
                print(item['listing']['_links']['canonical']['href'])
This should give you:
https://www.realestate.com.au/property-apartment-nsw-sydney-140558991
https://www.realestate.com.au/property-apartment-nsw-sydney-141380404
https://www.realestate.com.au/property-apartment-nsw-sydney-140310979
https://www.realestate.com.au/property-apartment-nsw-sydney-141259592
https://www.realestate.com.au/property-apartment-nsw-barangaroo-140555291
https://www.realestate.com.au/property-apartment-nsw-sydney-140554403
https://www.realestate.com.au/property-apartment-nsw-millers+point-141245584
https://www.realestate.com.au/property-apartment-nsw-haymarket-139205259
https://www.realestate.com.au/project/hyde-metropolitan-by-deicorp-sydney-600036803
https://www.realestate.com.au/property-apartment-nsw-haymarket-140807411
https://www.realestate.com.au/property-apartment-nsw-sydney-141370756
https://www.realestate.com.au/property-apartment-nsw-sydney-141370364
https://www.realestate.com.au/property-apartment-nsw-haymarket-140425111
https://www.realestate.com.au/project/greenland-centre-sydney-600028910
https://www.realestate.com.au/property-apartment-nsw-sydney-141364136
https://www.realestate.com.au/property-apartment-nsw-sydney-139367203
https://www.realestate.com.au/property-apartment-nsw-sydney-141156696
https://www.realestate.com.au/property-apartment-nsw-sydney-141362880
https://www.realestate.com.au/property-studio-nsw-sydney-141311384
https://www.realestate.com.au/property-apartment-nsw-haymarket-141354876
https://www.realestate.com.au/property-apartment-nsw-the+rocks-140413283
https://www.realestate.com.au/property-apartment-nsw-sydney-141350552
https://www.realestate.com.au/property-apartment-nsw-sydney-140657935
https://www.realestate.com.au/property-apartment-nsw-barangaroo-139149039
https://www.realestate.com.au/property-apartment-nsw-haymarket-141034784
https://www.realestate.com.au/property-apartment-nsw-sydney-141230640
https://www.realestate.com.au/property-apartment-nsw-barangaroo-141340768
https://www.realestate.com.au/property-apartment-nsw-haymarket-141337684
https://www.realestate.com.au/property-unitblock-nsw-millers+point-141337528
https://www.realestate.com.au/property-apartment-nsw-sydney-141028828
https://www.realestate.com.au/property-apartment-nsw-sydney-141223160
https://www.realestate.com.au/property-apartment-nsw-sydney-140643067
https://www.realestate.com.au/property-apartment-nsw-sydney-140768179
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406051
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406047
https://www.realestate.com.au/property-apartment-nsw-sydney-139652067
https://www.realestate.com.au/property-apartment-nsw-sydney-140032667
https://www.realestate.com.au/property-apartment-nsw-sydney-127711002
https://www.realestate.com.au/property-apartment-nsw-sydney-140903924
https://www.realestate.com.au/property-apartment-nsw-walsh+bay-139130519
https://www.realestate.com.au/property-apartment-nsw-sydney-140285823
https://www.realestate.com.au/property-apartment-nsw-sydney-140761223
https://www.realestate.com.au/project/111-castlereagh-sydney-600031082
https://www.realestate.com.au/property-apartment-nsw-sydney-140633099
https://www.realestate.com.au/property-apartment-nsw-haymarket-141102892
https://www.realestate.com.au/property-apartment-nsw-sydney-139522379
https://www.realestate.com.au/property-apartment-nsw-sydney-139521259
https://www.realestate.com.au/property-apartment-nsw-sydney-139521219
https://www.realestate.com.au/property-apartment-nsw-haymarket-140007279
https://www.realestate.com.au/property-apartment-nsw-haymarket-139156515

Python: replace text in a request body (cURL converted to Python)

I need the variable defined as NUEMRODEDNI to be substituted into the request body. How is that done?
Excuse me, I'm a newbie.
I don't know how to use a variable defined in Python; I need it to replace the EDIT HERE placeholder, as shown in the image.
import string
import requests
from requests.structures import CaseInsensitiveDict
url = "www.url.com:8022/SistemaIdentificacion/servlet/com.personas.consultapersona?030bf8cfcd4bfccbd543df61b1b43f67,gx-no-cache=1648440596691"
NUEMRODEDNI = "41087712"
headers = CaseInsensitiveDict()
headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0"
headers["Accept"] = "*/*"
headers["Accept-Language"] = "es-AR,es;q=0.8,en-US;q=0.5,en;q=0.3"
headers["Accept-Encoding"] = "gzip, deflate"
headers["GxAjaxRequest"] = "1"
headers["Content-Type"] = "application/json"
headers["AJAX_SECURITY_TOKEN"] = "a6da9873adb..."
headers["X-GXAUTH-TOKEN"] = "eyJ0eXAiOiJ..."
headers["Origin"] = "http://www.url.com"
headers["Connection"] = "keep-alive"
headers["Referer"] = "www.url.com/SistemaIdentificacion/servlet/com.personas.consultapersona"
headers["Cookie"] = "GX_CLIENT_ID=0496f100-9e4e-4e36-a68d-ba3770ee2bff; GX_SESSION_ID=KUqyHU%2FZbpu96sYlj7Gry8bCYpV6CaSgVk0BLxVCpAU%3D; JSESSIONID=1812E6AC00940BDB325EF9592CB93FF8; GxTZOffset=America/Argentina/Buenos_Aires"
data = '{"MPage":false,"cmpCtx":"","parms":[EDIT HERE,{"s":"M","v":[["","(None)"],["F","Femenino"],["M","Masculino"]]},"M",{"User":"","CompanyCode":0,"Profile":"","UsrOneLoginID":"6647","Depid":1,"UsrLP":6488,"unidad":"","unidadid":"68","IMFParteCuerpo":"","denunciasid":0,"destino":"68","TipoPersona":"","NombreArchivo":"","denorigen":"","macdestinoscodorganigrama":""}],"hsh":[],"objClass":"consultapersona","pkgName":"com.personas","events":["ENTER"],"grids":{}}'
resp = requests.post(url, headers=headers, data=data)
print(resp.content)
Please provide your code typed out rather than a screenshot of it, so that we can simply copy and run it on our end.
Nonetheless, you should try:
NUMERODNI = "41087712"
data = '{"MPage":false,"cmpCtx":"","parms":[EDIT HERE,{"s":"M","v":[["","(None)"],["F","Femenino"],["M","Masculino"]]},"M",{"User":"","CompanyCode":0,"Profile":"","UsrOneLoginID":"6647","Depid":1,"UsrLP":6488,"unidad":"","unidadid":"68","IMFParteCuerpo":"","denunciasid":0,"destino":"68","TipoPersona":"","NombreArchivo":"","denorigen":"","macdestinoscodorganigrama":""}],"hsh":[],"objClass":"consultapersona","pkgName":"com.personas","events":["ENTER"],"grids":{}}'
data = data.replace("EDIT HERE", NUMERODNI)
print(data) # '{... "parms":[41087712, ...}'
This solution definitely delivers the desired string as a result.
If your code still does not do what you want it to, then the actual issue lies somewhere else.
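If you do stick with string substitution, a quick sanity check is to parse the result before sending it, since a bad substitution can silently produce invalid JSON. A small sketch using only the standard library (the body string here is a shortened stand-in for the real one):

import json

NUMERODNI = "41087712"
data = '{"parms":[EDIT HERE, "M"], "events":["ENTER"]}'  # shortened stand-in for the real body
data = data.replace("EDIT HERE", NUMERODNI)

# json.loads raises a ValueError if the substitution produced invalid JSON
parsed = json.loads(data)
print(parsed["parms"][0])  # 41087712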
Since you're POSTing JSON data, it's easier to just keep your data as a Python dict and tell Requests to JSON-ify it (requests.post(json=...)).
That way you don't need string substitution, just use the variable.
I also took the opportunity to make your headers construction shorter – it can just be a dict.
import requests

url = "www.url.com:8022/SistemaIdentificacion/servlet/com.personas.consultapersona?030bf8cfcd4bfccbd543df61b1b43f67,gx-no-cache=1648440596691"
NUEMRODEDNI = "41087712"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
    "Accept": "*/*",
    "Accept-Language": "es-AR,es;q=0.8,en-US;q=0.5,en;q=0.3",
    "Accept-Encoding": "gzip, deflate",
    "GxAjaxRequest": "1",
    "Content-Type": "application/json",
    "AJAX_SECURITY_TOKEN": "a6da9873adb...",
    "X-GXAUTH-TOKEN": "eyJ0eXAiOiJ...",
    "Origin": "http://www.url.com",
    "Connection": "keep-alive",
    "Referer": "www.url.com/SistemaIdentificacion/servlet/com.personas.consultapersona",
    "Cookie": "GX_CLIENT_ID=0496f100-9e4e-4e36-a68d-ba3770ee2bff; GX_SESSION_ID=KUqyHU%2FZbpu96sYlj7Gry8bCYpV6CaSgVk0BLxVCpAU%3D; JSESSIONID=1812E6AC00940BDB325EF9592CB93FF8; GxTZOffset=America/Argentina/Buenos_Aires",
}
data = {
    "MPage": False,
    "cmpCtx": "",
    "parms": [
        NUEMRODEDNI,
        {"s": "M", "v": [["", "(None)"], ["F", "Femenino"], ["M", "Masculino"]]},
        "M",
        {
            "User": "",
            "CompanyCode": 0,
            "Profile": "",
            "UsrOneLoginID": "6647",
            "Depid": 1,
            "UsrLP": 6488,
            "unidad": "",
            "unidadid": "68",
            "IMFParteCuerpo": "",
            "denunciasid": 0,
            "destino": "68",
            "TipoPersona": "",
            "NombreArchivo": "",
            "denorigen": "",
            "macdestinoscodorganigrama": "",
        },
    ],
    "hsh": [],
    "objClass": "consultapersona",
    "pkgName": "com.personas",
    "events": ["ENTER"],
    "grids": {},
}
resp = requests.post(url, headers=headers, json=data)
resp.raise_for_status()
print(resp.content)

Wizzair scraping

I'm trying to scrape Wizz Air for personal use, but I can't understand what's wrong with my code. Could it be an incorrect payload object or the cookies?
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36",
    "Accept": "application/json, text/plain, */*",
    "Accept-Encoding": "gzip, deflate, sdch, br",
    "Accept-Language": "en-US,en;q=0.8,lt;q=0.6,ru;q=0.4",
    "Origin": "https://wizzair.com",
    "Referer": "https://wizzair.com/"
}
search_url = "https://wizzair.com/lt-LT/FlightSearch"

session = requests.Session()
r = session.get("https://be.wizzair.com/3.8.2/Api/asset/yellowRibbon", headers=headers, allow_redirects=False)
session_id = r.cookies["ASP.NET_SessionId"]
cookies = {
    "ASP.NET_SessionId": session_id,
    "HomePageSelector": "FlightSearch",
}

# wizz_url = "https://be.wizzair.com/3.8.2/Api/search/search"
wizz_url = "https://be.wizzair.com/3.8.2/Api/asset/farechart"
payload = {"flightList": [{"departureStation": "VNO", "arrivalStation": "FCO", "departureDate": "2017-02-20"}], "adultCount": 1, "childCount": 0, "infantCount": 0, "wdc": True, "dayInterval": 3}
r = session.post(url=wizz_url, data=payload, headers=headers, cookies=cookies)
print(r.content)
>>> {"validationCodes":["FlightCount_MustBe_OneOrTwo"]}
I ran this - even without the session and cookies - and got some data.
You have to send the payload as JSON, using json=payload:
import requests

payload = {
    "flightList": [
        {
            "departureStation": "VNO",
            "arrivalStation": "FCO",
            "departureDate": "2017-02-20"
        }
    ],
    "adultCount": 1,
    "childCount": 0,
    "infantCount": 0,
    "wdc": True,
    "dayInterval": 3
}

url = 'https://be.wizzair.com/3.8.2/Api/search/search'
r = requests.post(url, json=payload)
print(r.text)

data = r.json()
print(data['outboundFlights'][0]['flightNumber'])
If you do have to use cookies and headers, use a Session; then you don't have to copy cookies and headers from one request to the next.
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36",
    #"Accept": "application/json, text/plain, */*",
    #"Accept-Encoding": "gzip, deflate, sdch, br",
    #"Accept-Language": "en-US,en;q=0.8,lt;q=0.6,ru;q=0.4",
}

s = requests.Session()
s.headers.update(headers)

# to get cookies
r = s.get("https://www.wizzair.com/")

payload = {
    "flightList": [
        {
            "departureStation": "VNO",
            "arrivalStation": "FCO",
            "departureDate": "2017-02-20"
        }
    ],
    "adultCount": 1,
    "childCount": 0,
    "infantCount": 0,
    "wdc": True,
    "dayInterval": 3
}

url = 'https://be.wizzair.com/3.8.2/Api/search/search'
r = s.post(url, json=payload)
print(r.text)

data = r.json()
print(data['outboundFlights'][0]['flightNumber'])

Getting a Bad Request when POSTing data with requests

I always get the error message "Bad Request" when I try to POST data to Steam. I have done a lot of research and I don't know how to fix this.
Post Values:
# Post Values
total = int(item['price'])
fee = int(item['fee'])
subtotal = total-fee
Cookies:
# Cookies
c = []
c.append('steamMachineAuthXXXXXXXXXXXXXXXXX='+steamMachineAuth)
c.append('steamLogin='+steamLogin)
c.append('steamLoginSecure='+steamLoginSecure)
c.append('sessionid='+sessionid)
cookie = ''
for value in c:
    cookie = cookie + value + '; '
Headers:
# Headers
headers = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
    "Connection": "keep-alive",
    "Host": "steamcommunity.com",
    "Referer": hosturl + "market/listings/" + appid + "/" + item['market_hash_name'],
    "Cookie": cookie,
    "Origin": hosturl,
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
Post data:
# Post Data
post = {
    'sessionid': sessionid,
    'currency': int(currency),
    'subtotal': subtotal,
    'fee': fee,
    'total': total,
    'quantity': 1
}
URL:
# URL
url = hosturl + 'market/buylisting/' + item['listingid']
Sending Request:
# Sending the request (note: the name "re" shadows the stdlib re module)
se = requests.Session()
re = se.post(url, data=post, headers=headers)
print(re.reason)
Output:
Bad Request
I can't speak specifically about the Steam service as I haven't used it yet, but my experience with typical Bad Request responses is that you're either trying an HTTP verb that isn't supported or your request is not formatted correctly.
In your case, I suspect it's the latter.
My first candidate to look at is your cookie formatting. Are you sure you don't have characters that need to be escaped?
I could suggest using something like this instead:
c = {
    'steamMachineAuthXXXXXXXXXXXXXXXXX': steamMachineAuth,
    'steamLogin': steamLogin,
    'steamLoginSecure': steamLoginSecure,
    'sessionid': sessionid
}
cookie = '; '.join('{}="{}"'.format(k, v) for k, v in c.items())
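That said, requests can format cookies for you: pass a plain dict per request via the cookies= parameter, or set it once on a Session, and skip building the Cookie header string entirely. A minimal sketch with placeholder token and URL values (substitute the real ones from your logged-in session):

import requests

# placeholder values; use the real tokens from your browser session
cookies = {
    'steamMachineAuthXXXXXXXXXXXXXXXXX': 'machine-auth-token',
    'steamLogin': 'steam-login-token',
    'steamLoginSecure': 'steam-login-secure-token',
    'sessionid': 'session-id',
}

se = requests.Session()
se.cookies.update(cookies)  # attached automatically to every request on this session

# illustrative listing URL; a one-off form is se.post(url, data=post, cookies=cookies)
resp = se.post('https://steamcommunity.com/market/buylisting/123456', data={'quantity': 1})
print(resp.status_code, resp.reason)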

Request decoding in Python

import urllib.parse
import urllib.request

url = ""
http_header = {
    "User-Agent": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0) like Gecko",
    "Accept": "text/html, application/xhtml+xml, */*",
    "Accept-Language": "ko-KR",
    "Content-type": "application/x-www-form-urlencoded",
    "Host": ""
}
params = {
    'id': 'asdgasd',
    'call_flag': 'idPoolChk',
    'reg_type': 'NY'
}

data = urllib.parse.urlencode(params).encode()
req = urllib.request.Request(url, data)  # note: http_header is defined but never attached to the request
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page)
When I run this program, I get:
\xec\x95\x84\xec\x9d\xb4\xeb\x94\x94\xea\xb0\x80 \xec\xa4\x91\xeb\xb3\xb5\xeb\x90\xa9\xeb\x8b\x88\xeb\x8b\xa4
How can I transform this to Korean?
It's UTF-8.
print(the_page.decode('utf-8'))
assuming your console will handle those characters.
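If you would rather not hardcode the encoding, you can usually read it from the response's Content-Type header. A small sketch, reusing req from the question, that falls back to UTF-8 when no charset is declared:

import urllib.request

# reusing `req` from the question above
response = urllib.request.urlopen(req)
the_page = response.read()

# the Content-Type header may declare a charset (e.g. "text/html; charset=euc-kr");
# fall back to UTF-8 when it doesn't
charset = response.headers.get_content_charset() or "utf-8"
print(the_page.decode(charset))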
