Firstly, I should add that you can find this request by doing the following:
1- Go to [airline site][1]
2- Type in "From" = "syd"
3- Type in "To" = "sin"
4- Set the departure date to Sep. 3, select one-way, and search
5- On the search results page, watch the network GET request that is sent when you click an available seat option radio button
I'm trying to use the requests module to get the response from this site, for example.
This is what I'm trying:
url = "http://www.singaporeair.com/chooseFlightJson.form?"
payload = {'selectedFlightIdDetails[0]':amount_data,'hid_flightIDs':'','hid_recommIDs':'','tripType':"O",'userPreferedCurrency':""}
response = requests.get(url, params=payload)
print(response.json())
The response is supposed to be:
{"price":{"total":"595.34","currency":{"code":"AUD","label":""},"adult":{"count":1,"label":"Adult","cost":"328.00","total":"328.00"},"child":{"count":0,"label":"Child","cost":"0.00","total":"0.00"},"infant":{"count":0,"label":"Infant","cost":"0.00","total":"0.00"},"addOns":[{"label":"Airport / Government taxes ","cost":"83.24"},{"label":"Carrier Surcharges","cost":"184.10"}],"disclaimer":"Prices are shown in Canadian Dollar(CAD)","rate":"595.34 AUD \u003d 913.80 CAD","ratehint":"Estimated","unFormattedTotal":"595.34"},"messages":{"O3FF11SYD":"A few seats left","O1FF31SYD":" ","R0FF31SYD":"A few seats left","O2FF31SYD":"A few seats left","O0FF31SYD":" ","O1FF11SYD":"A few seats left","O0FF21SYD":" ","O2FF21SYD":" ","O3FF21SYD":" ","O1FF21SYD":" "},"cabinClass":{"onwardCabin":["Economy"]}}
The response is the value None, encoded in JSON; the server returns null\r\n, which means the same as None in Python.
The content type is wrong here; it is set to text/html, but the response.json() return value is entirely correct for what the server sent:
>>> import requests
>>> url = "http://www.singaporeair.com/chooseFlightJson.form?"
>>> amount_data = 0
>>> payload = {'selectedFlightIdDetails[0]':amount_data,'hid_flightIDs':'','hid_recommIDs':'','tripType':"O",'userPreferedCurrency':""}
>>> response = requests.get(url, params=payload)
>>> response
<Response [200]>
>>> response.headers['content-type']
'text/html; charset=ISO-8859-1'
>>> response.text
'null\r\n'
>>> response.json() is None
True
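The decoding step can be reproduced without touching the network. This is a minimal sketch using only the standard json module; parse_body is a hypothetical helper name, and the second body is a shortened version of the expected response from the question.

```python
import json

def parse_body(text):
    """Decode a JSON body; a bare JSON null decodes to Python's None."""
    return json.loads(text)

server_body = "null\r\n"  # the exact text the server returned in the session above
decoded = parse_body(server_body)
print(decoded is None)

# A body that actually carries data decodes to a dict instead.
expected_body = '{"price": {"total": "595.34"}}'
print(parse_body(expected_body)["price"]["total"])
```

So a None result means the server really sent null; it does not mean that requests failed to parse anything.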
Change the protocol from http to https:
url = "https://www.singaporeair.com/chooseFlightJson.form?"
The solution is to use requests session, like so:
session = requests.Session()
Then to call all of the urls you need to simply do:
response = session.get(url)
This sets the cookies and session variables which are necessary to retrieve the data.
I have seen jsessionid passed in different ways: in the URL or, as in this case, in a cookie. There is probably other session information that is required as well, and a requests session object takes care of it.
When making the HTTP request, make sure that response_type is set to match your exact use case.
In my case, response_type='object' eliminated the None type in the response.
I have a problem with the Etherscan API on the Ropsten test network. The output of the code is: expecting value line 1 column 1 (char 0)
The code:
import requests, json
ADD = "0xfbb61B8b98a59FbC4bD79C23212AddbEFaEB289f"
KEY = "HERE THE API KEY"
REQ = requests.get(f"https://api-ropsten.etherscan.io/api?module=account&action=balance&address={str(ADD)}&tag=latest&apikey={str(KEY)}")
CONTENT = json.loads(REQ.content)
BALANCE = int(CONTENT['result'])
print(BALANCE)
When I try to do a request it gives back <Response [403]>
Some websites don't allow Python scripts to access them. You can get around this by adding a user agent to your request.
The code would look something like this:
import requests, json
ADD = "0xfbb61B8b98a59FbC4bD79C23212AddbEFaEB289f"
KEY = "HERE THE API KEY"
LINK = f"https://api-ropsten.etherscan.io/api?module=account&action=balance&address={str(ADD)}&tag=latest&apikey={str(KEY)}"
headers = {"User-Agent": "HERE YOUR USER-AGENT"}  # headers must be a dict of header name to value
REQ = requests.get(LINK, headers = headers)
CONTENT = json.loads(REQ.content)
BALANCE = int(CONTENT['result'])
print(BALANCE)
To find your user agent, simply search Google for: my user agent
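The "expecting value line 1 column 1 (char 0)" message in the question is what json.loads raises when the body is not JSON at all, which is typically the case for an HTML 403 page. The sketch below checks the status code before decoding; decode_api_body is a hypothetical helper, and the sample body is made up for illustration rather than captured from the API.

```python
import json

def decode_api_body(status_code, body):
    """Return the parsed JSON, or raise a readable error for failed or non-JSON responses."""
    if status_code != 200:
        raise RuntimeError(f"request failed with HTTP {status_code}")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"body is not JSON: {exc}") from exc

# Made-up sample body, shaped like the 'result' field the question reads.
sample_ok = '{"status": "1", "message": "OK", "result": "1000000000000000000"}'
balance = int(decode_api_body(200, sample_ok)["result"])
print(balance)
```

With requests you would pass REQ.status_code and REQ.text to such a helper, so a 403 is reported as a 403 instead of as a JSON parsing error.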
I am using python to send a request to an API in order to pull sensor data.
The first request that I am sending is to gauge the current time of the sensor.
The second request is to gauge the amount of time, since the sensor identified motion.
I am looking to subtract the second value from the first.
Below is the method I am using to pull the data, which works fine. I just can't figure out why it doesn't work when I try to subtract them.
Any help would be much appreciated.
url = "IPADDRESS"
payload={}
headers = {
"Authorization":"Password"
}
response1 = requests.request("GET", url, headers=headers, data=payload, verify=False)
print('Current Time: ', response1.text)
url = "IPAddress"
payload={}
headers = {
"Authorization": "Password"
}
response = requests.request("GET", url, headers=headers, data=payload, verify=False) #HTTP request
print('Time since last motion: ', response.text)
updatedtime = response1.text
occupancelevel = response.text
result = updatedtime - occupancelevel
print(result)
Subtraction only works with some datatypes. (Specifically, only objects which have a .__sub__ method can be subtracted.)
I'm not sure what .data is, as per the docs it doesn't exist, but if it does it clearly isn't something subtractable.
If you are getting a simple number back from your endpoint, you can cast it first:
response = float(response1.data) - float(response2.data)
Otherwise you are going to have to dig into what the response type is, and work accordingly. (Hint: if you don't know, it's probably json.)
Incidentally, if, as I think, there is no .data attribute on the response, you want .text or .json() to get at the response body.
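As a minimal sketch of the cast, assuming both endpoints return a plain number as text: seconds_between is a hypothetical helper, and the sample strings are made up to stand in for response1.text and response.text.

```python
def seconds_between(current_text, since_motion_text):
    """Convert two numeric response bodies to floats and subtract them."""
    # strip() removes the trailing newline many servers append to the body
    return float(current_text.strip()) - float(since_motion_text.strip())

# Subtracting the raw strings fails, which is the error in the question.
try:
    "1650000000" - "42.5"
except TypeError as exc:
    print("TypeError:", exc)

# Made-up sample bodies; real values depend on the sensor API.
result = seconds_between("1650000000\n", "42.5")
print(result)
```

If the endpoints return JSON or formatted timestamps instead of bare numbers, float() will raise a ValueError, and the values need to be parsed accordingly first.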
References
https://docs.python-requests.org/en/latest/api/#requests.Response
I can't figure out how to call this api correctly using python urllib or requests.
Let me give you the code I have now:
import requests
url = "http://api.cortical.io:80/rest/expressions/similar_terms?retina_name=en_associative&start_index=0&max_results=1&sparsity=1.0&get_fingerprint=false"
params = {"positions":[0,6,7,29]}
headers = { "api-key" : key,
"Content-Type" : "application/json"}
# Make a get request with the parameters.
response = requests.get(url, params=params, headers=headers)
# Print the content of the response
print(response.content)
I've even added in the rest of the parameters to the params variable:
url = 'http://api.cortical.io:80/rest/expressions/similar_terms?'
params = {
"retina_name":"en_associative",
"start_index":0,
"max_results":1,
"sparsity":1.0,
"get_fingerprint":False,
"positions":[0,6,7,29]}
I get this message back:
An internal server error has been logged @ Sun Apr 01 00:03:02 UTC 2018
So I'm not sure what I'm doing wrong. You can test out their api here, but even with testing I can't figure it out. If I go out to http://api.cortical.io/, click on the Expression tab, click on the POST /expressions/similar_terms option then paste {"positions":[0,6,7,29]} in the body textbox and hit the button, it'll give you a valid response, so nothing is wrong with their API.
I don't know what I'm doing wrong. Can you help me?
The problem is that you're mixing query string parameters and post data in your params dictionary.
Instead, you should use the params parameter for your query string data, and the json parameter (since the content type is json) for your post body data.
When using the json parameter, the Content-Type header is set to 'application/json' by default. Also, when the response is json you can use the .json() method to get a dictionary.
An example:
import requests
url = 'http://api.cortical.io:80/rest/expressions/similar_terms?'
params = {
"retina_name":"en_associative",
"start_index":0,
"max_results":1,
"sparsity":1.0,
"get_fingerprint":False
}
data = {"positions":[0,6,7,29]}
r = requests.post(url, params=params, json=data)
print(r.status_code)
print(r.json())
200
[{'term': 'headphones', 'df': 8.991197733061748e-05, 'score': 4.0, 'pos_types': ['NOUN'], 'fingerprint': {'positions': []}}]
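Since the question also mentions urllib, here is a rough standard-library sketch of the same split between query string and JSON body. It only builds the request object and sends nothing; the variable names are illustrative, and get_fingerprint is written as the string "false" to match the URL in the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

base_url = "http://api.cortical.io/rest/expressions/similar_terms"
# Query string data goes into the URL.
query = {
    "retina_name": "en_associative",
    "start_index": 0,
    "max_results": 1,
    "sparsity": 1.0,
    "get_fingerprint": "false",
}
# POST data goes into the body, encoded as JSON.
body = {"positions": [0, 6, 7, 29]}

req = Request(
    base_url + "?" + urlencode(query),
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method())
print(req.full_url)
print(req.data)
```

Passing req to urllib.request.urlopen would send it; with requests, the params and json arguments do the equivalent encoding for you.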
So, I can't speak to why there's a server error in a third-party API, but I followed your suggestion to try using the API UI directly, and noticed you're using a totally different endpoint than the one you're trying to call in your code. In your code you GET from http://api.cortical.io:80/rest/expressions/similar_terms but in the UI you POST to http://api.cortical.io/rest/expressions/similar_terms/bulk. It's apples and oranges.
Calling the endpoint you mention in the UI call works for me, using the following variation on your code, which requires using requests.post, and as was also pointed out by t.m. adam, the json parameter for the payload, which also needs to be wrapped in a list:
import requests
url = "http://api.cortical.io/rest/expressions/similar_terms/bulk?retina_name=en_associative&start_index=0&max_results=1&sparsity=1.0&get_fingerprint=false"
params = [{"positions":[0,6,7,29]}]
headers = { "api-key" : key,
"Content-Type" : "application/json"}
# Make a POST request with the JSON payload.
response = requests.post(url, json=params, headers=headers)
# Print the content of the response
print(response.content)
Gives:
b'[[{"term":"headphones","df":8.991197733061748E-5,"score":4.0,"pos_types":["NOUN"],"fingerprint":{"positions":[]}}]]'
I'm using the Python requests package to send http requests. I want to add a single proxy to the requests session object. eg.
session = requests.Session()
session.proxies = {...} # Here I want to add a single proxy
Currently I am looping through a bunch of proxies, and at each iteration a new session is made. I only want to set a single proxy for each iteration.
The only example I see in the documentation is:
proxies = {
"http": "http://10.10.1.10:3128",
"https": "http://10.10.1.10:1080",
}
requests.get("http://example.org", proxies=proxies)
I've tried to follow this, but to no avail. Here is my code from the script:
# eg. line = 59.43.102.33:80
r = s.get('http://icanhazip.com', proxies={'http': 'http://' + line})
But I get an error:
requests.packages.urllib3.exceptions.LocationParseError: Failed to parse 59.43.102.33:80
How is it possible to set a single proxy on a session object?
In addition to @neowu's answer, if you would like to set a proxy for the lifetime of a session object, you can also do the following:
import requests
proxies = {'http': 'http://10.11.4.254:3128'}
s = requests.session()
s.proxies.update(proxies)
s.get("http://www.example.com") # Here the proxies will also be automatically used because we have attached those to the session object, so no need to pass separately in each call
In fact, your approach is right, but you must check your definition of 'line'. I have tried this and it works:
>>> import requests
>>> s = requests.Session()
>>> s.get("http://www.baidu.com", proxies={'http': 'http://10.11.4.254:3128'})
<Response [200]>
Did you define the line like line = ' 59.43.102.33:80'? There is a space at the front of the address.
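If the addresses come from a file, each line usually carries a trailing newline or stray spaces. A minimal sketch of cleaning a line before building the proxies dict follows; build_proxies is a hypothetical helper, and routing both schemes through the same proxy is an assumption for illustration.

```python
def build_proxies(line):
    """Turn a raw 'host:port' line into a proxies dict, ignoring stray whitespace."""
    address = line.strip()
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {line!r}")
    proxy_url = "http://" + address
    # Assumption: the same proxy handles both http and https traffic.
    return {"http": proxy_url, "https": proxy_url}

print(build_proxies(" 59.43.102.33:80\n"))
```

The result can then be passed as proxies=... to a single call, or attached with s.proxies.update(...) on a session.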
There are other ways you can set proxies, apart from the solutions you have got so far:
import requests
with requests.Session() as s:
    # either like this
    s.proxies = {'https': 'http://105.234.154.195:8888', 'http': 'http://199.188.92.69:8000'}
    # or like this
    s.proxies['https'] = 'http://105.234.154.195:8888'
    r = s.get(link)  # link is the URL you want to fetch
Hopefully this may lead to an answer:
urllib3.util.url.parse_url(url)
Given a url, return a parsed Url namedtuple. Best-effort is performed to parse incomplete urls. Fields not provided will be None.
retrieved from https://urllib3.readthedocs.org/en/latest/helpers.html
I am using the Python Requests Module to datamine a website. As part of the datamining, I have to HTTP POST a form and check if it succeeded by checking the resulting URL. My question is, after the POST, is it possible to request the server to not send the entire page? I only need to check the URL, yet my program downloads the entire page and consumes unnecessary bandwidth. The code is very simple
import requests
r = requests.post(URL, payload)
if 'keyword' in r.url:
    print('success')
else:
    print('fail')
An easy solution, if it's implementable for you, is to go low-level and use the socket library.
For example, say you need to send a POST with some data in its body. I used this in my crawler for one site.
import socket
from urllib.parse import urlencode  # form-encodes and escapes the POST body

req_body = urlencode({"data1": "yourtestdata", "data2": "foo", "data3": "bar="})
req_url = "test.php"
req_header = (
    "POST /{0} HTTP/1.1\r\n"
    "Host: www.yourtarget.com\r\n"
    "User-Agent: For the lulz..\r\n"
    "Content-Type: application/x-www-form-urlencoded; charset=UTF-8\r\n"
    "Content-Length: {1}"
)
header = req_header.format(req_url, len(req_body))  # plug in req_url as {0} and the length of req_body as {1}
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create a socket
s.connect(("www.yourtarget.com", 80))  # connect it
s.sendall((header + "\r\n\r\n" + req_body).encode("ascii"))  # send header, a blank line, then the body
page = b""
while b"\r\n\r\n" not in page:  # stop once the whole header (ending with 2x CRLF) has arrived
    buf = s.recv(1024)  # receive up to 1024 bytes; this is usually enough for the header in one try
    if not buf:
        break
    page += buf
s.close()  # close the socket here, which closes the TCP connection even if data is still flowing in
# this should leave you with a header in which you should find a 302 redirect and then your target URL in the "Location:" header.
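To get the redirect target out of the raw bytes collected in page, something like the following sketch could work. parse_redirect is a hypothetical helper and the sample response is made up; it assumes a well-formed status line and simple single-line headers.

```python
def parse_redirect(raw_response):
    """Return (status_code, location) from the header part of a raw HTTP response."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    status_code = int(lines[0].split(" ", 2)[1])  # e.g. "HTTP/1.1 302 Found"
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":  # header names are case-insensitive
            location = value.strip()
    return status_code, location

# Made-up sample of what the socket might have received.
sample = (
    b"HTTP/1.1 302 Found\r\n"
    b"Location: http://www.yourtarget.com/done.php\r\n"
    b"Content-Length: 0\r\n\r\n"
)
print(parse_redirect(sample))
```

The check from the question then becomes a test on the returned location, for example 'keyword' in location.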
There's a chance the site uses the Post/Redirect/Get (PRG) pattern. If so, it's enough to not follow the redirect and to read the Location header from the response.
Example
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/1', allow_redirects=False)
>>> response.status_code
302
>>> response.headers['location']
'http://httpbin.org/get'
If you need more information about what you would get if you had followed the redirect, you can use HEAD on the URL given in the Location header.
Example
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/1', allow_redirects=False)
>>> response.status_code
302
>>> response.headers['location']
'http://httpbin.org/get'
>>> response2 = requests.head(response.headers['location'])
>>> response2.status_code
200
>>> response2.headers
{'date': 'Wed, 07 Nov 2012 20:04:16 GMT', 'content-length': '352', 'content-type':
'application/json', 'connection': 'keep-alive', 'server': 'gunicorn/0.13.4'}
It would help if you gave some more data, for example, a sample URL that you're trying to request. That being said, it seems to me that generally you're checking if you had the correct URL after your POST request using the following algorithm relying on redirection or HTTP 404 errors:
if original_url == returned request url:
    correct url to a correctly made request
else:
    wrong url and a wrongly made request
If this is the case, what you can do here is use the HTTP HEAD request (another type of HTTP request like GET, POST, etc.) in Python's requests library to get only the header and not also the page body. Then, you'd check the response code and redirection url (if present) to see if you made a request to a valid URL.
For example:
import requests

def attempt_url(url):
    '''Checks the url to see if it is valid, or returns a redirect or error.
    Returns True if valid, False otherwise.'''
    r = requests.head(url)  # HEAD fetches the status line and headers only
    if r.status_code == 200:
        return True
    elif r.status_code in (301, 302):
        # a redirect that points back to the same URL still counts as valid
        return r.headers['location'] == url
    elif r.status_code == 404:
        return False
    else:
        raise Exception("A status code we haven't prepared for has arisen!")
If this isn't quite what you're looking for, additional detail on your requirements would help. At the very least, this gets you the status code and headers without pulling all of the page data.
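One caveat when comparing the Location header against a URL: servers may send a relative value such as /get, so it can help to resolve it against the request URL first. This is a small sketch with the standard library; resolve_location is a hypothetical helper name, and the URLs are the httpbin ones used earlier in this thread.

```python
from urllib.parse import urljoin

def resolve_location(request_url, location):
    """Resolve a possibly relative Location header against the URL that was requested."""
    return urljoin(request_url, location)

# A relative and an absolute Location value resolve to the same target.
print(resolve_location("http://httpbin.org/redirect/1", "/get"))
print(resolve_location("http://httpbin.org/redirect/1", "http://httpbin.org/get"))
```

Comparing the resolved value avoids treating a relative redirect to the same page as a mismatch.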