I'm using Selenium Wire to try to read the response text of some network traffic. The code I have isn't fully reproducible as the account is behind a paywall.
The bit of selenium-wire I'm currently using is:
for request in driver.requests:
    if request.method == 'POST' and request.headers['Content-Type'] == 'application/json':
        # The body is in bytes so convert to a string
        body = driver.last_request.body.decode('utf-8')
        # Load the JSON
        data = json.loads(body)
Unfortunately, though, that is reading the payload of the request, and I'm trying to parse the response:
You need to get last_request's response:
body = driver.last_request.response.body.decode('utf-8')
data = json.loads(body)
I usually use these 3 steps:
# I define scopes to skip unrelated POST requests;
# we can also use them to capture only the required endpoints
driver.scopes = [
    # .* is a regex that matches any character zero or more times
    '.*stackoverflow.*',
    '.*github.*'
]
# visit the page
driver.get('LINK')
# get the response
response = driver.last_request # or driver.requests[-1]
# get the JSON (decode handles gzip/deflate-compressed bodies)
from seleniumwire.utils import decode
js = json.loads(
    decode(
        response.response.body,
        # get the encoding from the response headers
        response.response.headers.get('Content-Encoding', 'identity'),
    )
)
# this clears all the stored requests; it's a good idea to do this after each page visit
del driver.requests
For more info, here is the doc.
I am trying to adapt my HTTP request from Python to R. This is the POST request in Python:
import requests
import json
r = requests.post("https://feed-dev.ihsmarkit.com/apikey",
                  data={'username': 'markit/resellers/API_OPS/accounts/demo.dv',
                        'password': 'Example#N6'})
print("POST /apikey", r.status_code, r.reason)
apikey = r.text
print(apikey)
I did some research and found that the httr package in R is best for dealing with API-related requests. However, I tried the POST() function in a few attempts and got the same 400 error ("MISSING_PARAMETER: Parameter username not provided.") each time. Here are a few attempts I used:
#attempt 1
response <- POST(url = "https://feed-dev.ihsmarkit.com/apikey",
                 add_headers(.headers = c("Content-Type" = "application/x-www-form-urlencoded")),
                 authenticate('markit/resellers/API_OPS/accounts/demo.dv', 'Example#N6')
)

#attempt 2
request_body <- data.frame(
  username = 'markit/resellers/API_OPS/accounts/demo.dv',
  password = 'Example#N6'
)
request_body_json <- toJSON(list(data = request_body), auto_unbox = TRUE)
POST(url = "https://feed-dev.ihsmarkit.com/apikey",
     add_headers(.headers = c("Content-Type" = "application/x-www-form-urlencoded", "Accept" = "application/json")),
     body = request_body_json)

#attempt 3
result <- POST(url = "https://feed-dev.ihsmarkit.com/apikey",
               add_headers(.headers = c("Content-Type" = "application/x-www-form-urlencoded", "Accept" = "application/json")),
               body = '{"data":{"username":"markit/resellers/API_OPS/accounts/demo.dv", "password":"Example#N6"}}',
               encode = 'raw')
Do you know how I should properly convert my request?
Use:
response <- POST(url = "https://feed-dev.ihsmarkit.com/apikey",
                 encode = "form",
                 body = list(
                   username = 'markit/resellers/API_OPS/accounts/demo.dv',
                   password = 'Example#N6')
)
Just pass your data as a list, and POST() will take care of formatting it as form data when you choose encode = "form". Your Python code doesn't use JSON at all; it passes a literal dictionary, which requests sends as form data. Only use authenticate() when the HTTP endpoint requires basic HTTP authentication. An endpoint that expects a username/password in the body of the message is not using basic HTTP authentication.
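To see what "form data" actually looks like on the wire, here is a minimal local sketch of the encoding that Python's requests (with data=...) and httr (with encode = "form") both produce; the credentials are just the placeholders from the question:

```python
from urllib.parse import urlencode

# Form-encode the same key/value pairs that requests (data=...) and
# httr (encode = "form") would put in the POST body.
payload = {
    "username": "markit/resellers/API_OPS/accounts/demo.dv",
    "password": "Example#N6",
}
body = urlencode(payload)
print(body)
# Reserved characters such as '/' and '#' are percent-encoded.
```

Note there is no JSON anywhere in the body, which is why wrapping the credentials in a JSON string kept producing the MISSING_PARAMETER error.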
Summary:
Currently I am doing a GET request on a {.log} URL that has around 7,000+ lines.
I need to GET the response, validate that a particular message is in the response, and if it is not present, do a GET request on the same URL again.
This iteration on the GET is very time-consuming and most of the time ends up stuck.
Expectation:
I need a way to do a GET request that fetches only the last 100 lines as a response, rather than fetching all 7,000+ lines every time.
URL = "http://sdd.log"
Code
def get_log(self):
    logging.info("Sending a get request to retrieve pronghorn log")
    resp = requests.request("GET", "http://ssdg.log")
    logging.info("Printing the callback url response")
    #logging.info(resp)
    #logging.info(resp.text)
    return resp.text
You cannot simply download only the last 100 lines of an HTTP response. You can, however, simply keep the last 100 lines of the resulting response by using:
data = resp.text.split('\n')
last_lines = '\n'.join(data[-100:])
return last_lines
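As a self-contained sketch of that trimming step (the sample text here stands in for resp.text):

```python
def last_lines(text, n=100):
    """Return only the final n lines of a response body."""
    lines = text.split("\n")
    return "\n".join(lines[-n:])

# Build a small stand-in for the 7000+ line log response.
sample = "\n".join(f"line {i}" for i in range(1, 8))
print(last_lines(sample, n=3))  # line 5\nline 6\nline 7
```

The full body is still downloaded; only the part you go on to validate is reduced.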
So, if your server accepts range requests, then you can use code like this to get only the last 4096 bytes:
import requests
from io import BytesIO
url = 'https://file-examples.com/wp-content/uploads/2017/10/file_example_JPG_100kB.jpg'
resp = requests.request("HEAD", url)
unit = resp.headers['Accept-Ranges']
print(resp.headers['Content-Length'])
print(unit)
headers = {'Range': f'{unit}=-4096'}
print(headers)
resp = requests.request("GET", url, headers=headers)
b = BytesIO()
for chunk in resp.iter_content(chunk_size=128):
    b.write(chunk)
print(b.tell())
b.seek(0)
data = b.read()
print(f"len(data): {len(data)}")
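A suffix range such as bytes=-4096 asks the server for only the final 4096 bytes of the resource, which is equivalent to slicing data[-4096:] locally. A small sketch of building that header and checking the equivalence (the byte string is a stand-in for the real resource):

```python
def suffix_range_header(n, unit="bytes"):
    """Build a Range header asking for the last n units of the resource."""
    return {"Range": f"{unit}=-{n}"}

# 10240-byte stand-in for the full resource on the server.
data = bytes(range(256)) * 40

headers = suffix_range_header(4096)
# A server honouring the range returns exactly this tail, with status 206.
tail = data[-4096:]
print(headers, len(tail))
```

If the server ignores the header it simply returns the whole body with status 200, so checking resp.status_code == 206 is a useful guard.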
I'm trying to decode a QR image on a website with Python: https://zxing.org/w/decode.jspx
And I don't know why my POST requests fail and I don't get any response.
import requests
url ="https://zxing.org/w/decode.jspx"
session = requests.Session()
f = {'f':open("new.png","rb")}
response = session.post(url,files = f)
f = open("page.html","w")
f.write(response.text)
f.close()
session.close()
Even when I do it with a GET request it still fails:
url ="https://zxing.org/w/decode.jspx"
session = requests.Session()
data = {'u':'https://www.qrstuff.com/images/default_qrcode.png'}
response = session.post(url,data = data)
f = open("page.html","w")
f.write(response.text)
f.close()
session.close()
Maybe because the website contains two forms?
Thanks for helping.
You can do this:
import urllib.request

url = "https://zxing.org/w/decode?u=https://www.qrstuff.com/images/default_qrcode.png"
response = urllib.request.urlopen(url)
f = open("page.html", "w")
f.write(response.read().decode("utf-8"))
f.close()
If you want to send a URL, the form action is a GET; if you want to upload the image as a file, the action is a POST.
You can check this with the HackBar add-on for Firefox.
Well, I just saw my mistake.
The website is https://zxing.org/w/decode.jspx, but once you POST or GET, the endpoint is https://zxing.org/w/decode, without ".jspx". So I just removed it and everything worked!
I used Postman to send a raw request to the Jetstar website to get the flight details. I wanted to use a Python script to do the same thing with the requests library, but I cannot get the correct response back.
Here is what I have done in Postman:
And a simple script I used to send post request:
import requests

files = {'file': open('PostContent.txt', 'rb')}

if __name__ == "__main__":
    url = "http://www.jetstar.com/"
    r = requests.post(url, files=files)
    print(r.text)
When I run the Python script, I always get the welcome page, not the flight details. I am not sure what the error is.
Note: The PostContent.txt contains the form-data in raw text when I search for flights.
I used Chrome Dev Tool to capture the POST request when I search for a specific flight date. And it is the Form Data in the Headers.
Try using a dictionary instead of a file. The files argument is meant for posting a file, not a form-encoded POST, which is probably what the site expects.
payload = {
    'DropDownListCurrency': 'SGD'
}
r = requests.post("http://httpbin.org/post", data=payload)
You used the key file, which is wrong for this type of request. Also, your sample code doesn't run as posted! Please paste working code here. Try this instead:
import requests
import logging
logging.basicConfig(level=logging.DEBUG)
payload = {"__EVENTTARGET":"",
"__EVENTARGUMENT":"",
"__VIEWSTATE":"/wEPDwUBMGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFJ01lbWJlckxvZ2luU2VhcmNoVmlldyRtZW1iZXJfUmVtZW1iZXJtZSDCMtVG/1lYc7dy4fVekQjBMvD5",
"pageToken":"",
"total_price":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$RadioButtonMarketStructure":"RoundTrip",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketOrigin1":"Nadi (NAN)",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketDestination1":"Melbourne (Tullamarine) (MEL)",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDepartureDate1":"14/01/2015",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDestinationDate1":"16/02/2015",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListCurrency":"AUD",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketOrigin2":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketDestination2":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDepartureDate2":"16/02/2015",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDestinationDate2":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketOrigin3":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketDestination3":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDepartureDate3":"27/12/2014",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDestinationDate3":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketOrigin4":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketDestination4":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDepartureDate4":"03/01/2015",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDestinationDate4":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketOrigin5":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketDestination5":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDepartureDate5":"10/01/2015",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDestinationDate5":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketOrigin6":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMarketDestination6":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDepartureDate6":"17/01/2015",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDestinationDate6":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListPassengerType_ADT":1,
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListPassengerType_CHD":0,
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListPassengerType_INFANT":0,
"ControlGroupSearchView$AvailabilitySearchInputSearchView$RadioButtonSearchBy":"SearchStandard",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMultiCityOrigin1":"Origin",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMultiCityDestination1":"Destination",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDepartureMultiDate1":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMultiCityOrigin2":"Origin",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextBoxMultiCityDestination2":"Destination",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$TextboxDepartureMultiDate2":"",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListMultiPassengerType_ADT":1,
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListMultiPassengerType_CHD":0,
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListMultiPassengerType_INFANT":0,
"ControlGroupSearchView$AvailabilitySearchInputSearchView$numberTrips":2,
"ControlGroupSearchView$AvailabilitySearchInputSearchView$ButtonSubmit":""}
if __name__ == "__main__":
    url = "http://booknow.jetstar.com/Search.aspx"
    r = requests.post(url, data=payload)
    print(r.text)
I would like to open a StackExchange API (search endpoint) URL and parse the result [0]. The documentation says that all results are in JSON format [1]. When I open this URL in my web browser, the results are absolutely fine [2]. However, when I try opening it with a Python program, it returns encoded text which I am unable to parse. Here's a snippet:
á¬ôŸ?ÍøäÅ€ˆËç?bçÞIË
¡ëf)j´ñ‚TF8¯KÚpr®´Ö©iUizEÚD +¦¯÷tgNÈÑ.G¾LPUç?Ñ‘Ù~]ŒäÖÂ9Ÿð1£µ$JNóa?Z&Ÿtž'³Ðà#Í°¬õÅj5ŸE÷*æJî”Ï>íÓé’çÔqQI’†ksS™¾þEíqÝýly
My program to open a URL is as follows. What am I doing particularly wrong?
''' Opens a URL and returns the result '''
def open_url(query):
    request = urllib2.Request(query)
    response = urllib2.urlopen(request)
    text = response.read()
    #results = json.loads(text)
    print text
title = openRawResource, AssetManager.AssetInputStream throws IOException on read of larger files
page1_query = stackoverflow_search_endpoint % (1,urllib.quote_plus(title),access_token,key)
[0] https://api.stackexchange.com/2.1/search/advanced?page=1&pagesize=100&order=desc&sort=relevance&q=openRawResource%2C+AssetManager.AssetInputStream+throws+IOException+on+read+of+larger+files&site=stackoverflow&access_token=******&key=******
[1] https://api.stackexchange.com/docs
[2] http://hastebin.com/qoxaxahaxa.sm
Solution
I found the solution. Here's how you would do it:
import gzip
import json
import urllib2
from StringIO import StringIO

request = urllib2.Request(query)
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
if response.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(response.read())
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
    result = json.loads(data)
I cannot post the complete output as it is too huge. Many thanks to Evert and Kristaps for pointing out the decompression and the need to set headers on the request. In addition, here is another similar question one would want to look into [3].
[3] Does python urllib2 automatically uncompress gzip data fetched from webpage?
The next paragraph of the documentation says:
Additionally, all API responses are compressed. The Content-Encoding header is always set, but some proxies will strip this out. The proper way to decode API responses can be found here.
Your output does look like it may be compressed. Browsers automatically decompress data (depending on the Content-Encoding), so you would need to look at the header and do the same: results = json.loads(zlib.decompress(text)) or something similar.
Do check the here link as well.
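One wrinkle with zlib.decompress is that gzip bodies carry a header plain zlib does not expect by default; passing wbits=16 + zlib.MAX_WBITS tells zlib to accept the gzip framing. A local round-trip sketch of the decode step, with a made-up JSON body standing in for the API response (no network involved):

```python
import gzip
import json
import zlib

# Simulate a gzip-compressed API response body.
original = json.dumps({"items": [], "has_more": False}).encode("utf-8")
compressed = gzip.compress(original)

# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer.
decompressed = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)
result = json.loads(decompressed)
print(result)  # {'items': [], 'has_more': False}
```

This is equivalent to the GzipFile/StringIO approach in the accepted solution, just in one call.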