I'm playing around a little with the Google Places API and requests.
I have:
r = requests.get(self.url, params={'key': KEY, 'location': self.location, 'radius': self.radius, 'types': "airport"}, proxies=proxies)
r returns a 200 code, fine, but I'm confused by what r.json() returns compared to r.content
extract of r.json() :
{u'html_attributions': [],
u'next_page_token': u'CoQC-QAAABT4REkkX9NCxPWp0JcGK70kT4C-zM70b11btItnXiKLJKpr7l2GeiZeyL5y6NTDQA6ASDonIe5OcCrCsUXbK6W0Y09FqhP57ihFdQ7Bw1pGocLs_nAJodaS4U7goekbnKDlV3TaL8JMr4XpQBvlMN2dPvhFayU6RcF5kwvIm1YtucNOAUk-o4kOOziaJfeLqr3bk_Bq6DoCBwRmSEdZj34RmStdrX5RAirQiB2q_fHd6HPuHQzZ8EfdggqRLxpkFM1iRSnfls9WlgEJDxGB91ILpBsQE3oRFUoGoCfpYA-iW7E3uUD_ufby-JRqxgjD2isEIn8tntmFDjzQmjOraFQSEC6RFpAztLuk7l2ayfXsvw4aFO9gIhcXtG0LPucJkEa2nj3PxUDl',
u'results': [{u'geometry': {u'location': {u'lat': -33.939923,
u'lng': 151.175276}},
while extract of r.content :
'{\n "html_attributions" : [],\n "next_page_token" : "CoQC-QAAABT4REkkX9NCxPWp0JcGK70kT4C-zM70b11btItnXiKLJKpr7l2GeiZeyL5y6NTDQA6ASDonIe5OcCrCsUXbK6W0Y09FqhP57ihFdQ7Bw1pGocLs_nAJodaS4U7goekbnKDlV3TaL8JMr4Xp
So r.content has double quotes, like a "correct" JSON object, while r.json() seems to have changed all the double quotes to single quotes.
Should I care about it or not? I can still access the contents of r.json() fine, I just wondered whether it is normal for requests to return an object with single quotes.
The json() method doesn't actually return JSON. It returns a python object (read: dictionary) that contains the same information as the json data. When you print it out, the quotes are added for the sake of readability, they are not actually in your data.
Should I care about it or not?
Not.
What you can do, however, is add
jsonresponse=json.dumps(requests.get(xxx).json())
in order to get valid JSON (as a string) in jsonresponse.
Python accepts either single or double quotes for strings. By default, it displays strings with single quotes.
However, the JSON specification only allows double quotes to delimit strings.
Note that requests' response.json() will return native Python types which are slightly different from their JSON representation you can see with response.content.
You are seeing the single quotes because you are looking at Python, not JSON.
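To make the difference concrete, here is a minimal sketch using only the standard json module. The payload is a made-up stand-in for the response body (the field names are assumptions, not the real Places output); json.loads plays the role of r.json().

```python
import json

# Hypothetical JSON text, standing in for r.content / r.text
raw = '{"html_attributions": [], "status": "OK", "open_now": true, "rating": null}'

data = json.loads(raw)         # roughly what r.json() hands back: a Python dict
print(type(data))              # a dict, not a string
print(data)                    # repr of the dict: single quotes, True, None

round_trip = json.dumps(data)  # serialize the dict back to JSON text
print(round_trip)              # double quotes, true, null again
```

The single quotes only exist in the printed representation of the dict; the string values themselves contain no quotes at all.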
Calling Response.json attempts to parse the content of the Response as JSON. If it is successful, it will return a combination of dicts, lists and native Python types, as @Two-Bit Alchemist alluded to in his comment.
Behind the scenes, the json method just calls complexjson.loads on the response text (see here). If you dig further into the requests.compat module to figure out what complexjson is, it is the simplejson package if it is importable on the system (i.e. installed) and the standard library json package otherwise (see here). So, modulo considerations about the encoding, you can read a call to Response.json as equivalent to:
import requests
import json
response = requests.get(...)
json.loads(response.text)
TL;DR: nothing exciting is happening and no, what is returned from Response.json is not intended to be valid JSON but rather valid JSON transformed into Python data structures and types.
I am working on a program that reads the content of a Restful API from ImportIO. The connection works, and data is returned, but it's a jumbled mess. I'm trying to clean it to only return Asins.
I have tried using the split keyword and delimiter to no success.
stuff = requests.get('https://data.import.io/extractor***')
stuff.content
I get the content, but I want to extract only Asins.
results
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. The response will do that for you when you access .text.
response.text
Because the decoding of bytes to str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:
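As a rough illustration of why the encoding matters, the sketch below decodes the same bytes two ways using plain Python. The byte string is a made-up stand-in for response.content; with a real response you would set response.encoding = 'utf-8' and then read response.text.

```python
# Hypothetical payload bytes, standing in for response.content
raw_bytes = "Zürich Flughafen".encode("utf-8")

as_utf8 = raw_bytes.decode("utf-8")      # correct encoding: text comes back intact
as_latin1 = raw_bytes.decode("latin-1")  # wrong guess: the "ü" turns into two characters

print(as_utf8)
print(as_latin1)

# With requests, the equivalent choice is made through the response object:
#   response.encoding = "utf-8"
#   response.text
```

If the server declares the charset correctly in its headers, .text usually needs no help; setting .encoding is for the cases where the guess is wrong.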
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
response.json()
The type of the return value of .json() is a dictionary, so you can access values in the object by key.
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
For More Info: https://realpython.com/python-requests/
What format is the returned information in? Typically RESTful APIs return data as JSON, so you will likely have luck parsing it as a JSON object.
https://realpython.com/python-requests/#content
stuff_dictionary = stuff.json()
With that, the content is loaded and returned as a dictionary, and you will have a much easier time.
EDIT:
Since I don't have the full URL to test, I can't give an exact answer. Given the content type is CSV, using a pandas DataFrame is pretty easy. With a quick StackOverflow search, I found the following answer: https://stackoverflow.com/a/43312861/11530367
So I tried the following in the terminal and got a dataframe from it
from io import StringIO
import pandas as pd
pd.read_csv(StringIO("HI\r\ntest\r\n"))
So you should be able to perform the following
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(stuff.text))  # StringIO needs str, not the bytes in stuff.content
If that doesn't work, consider dropping the first three bytes you have in your response: b'\xef\xbb\xbf'. Check the answer from Mark Tolonen to parse this.
After that, selecting the ASIN (your second column) from your dataframe should be easy.
asins = df.loc[:, 'ASIN']
asins_arr = asins.array
The response is the byte string of CSV content encoded in UTF-8. The first three escaped byte codes are a UTF-8-encoded BOM signature. So stuff.content.decode('utf-8-sig') should decode it. stuff.text may also work if the encoding was returned correctly in the response headers.
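A minimal sketch of that decoding step, using only the standard library. The bytes below are invented sample data standing in for stuff.content; the column names and ASIN values are assumptions, since the real response is not shown.

```python
import csv
import io

# Hypothetical stand-in for stuff.content: UTF-8 CSV bytes with a leading BOM
content = b"\xef\xbb\xbfTitle,ASIN\r\nWidget,B000000001\r\nGadget,B000000002\r\n"

# 'utf-8-sig' strips the BOM while decoding, so the first header is clean
text = content.decode("utf-8-sig")

# newline="" leaves line endings alone so the csv module can handle \r\n itself
reader = csv.DictReader(io.StringIO(text, newline=""))
asins = [row["ASIN"] for row in reader]
print(asins)
```

Decoding with plain 'utf-8' instead would leave '\ufeff' glued to the first column name, which is a common reason a lookup on that column fails.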
I am putting a JSON response into a variable via requests.json() like this:
response = requests.get(some_url, params=some_params).json()
This however converts JSON's original " to Python's ', true to True, null to None.
This poses a problem when trying to save the response as text and then convert it back to JSON. Sure, I can use .replace() for all the conversions mentioned above, but even once I do that, I get other funny JSON decoder errors.
Is there any way in Python to get JSON response and keep original JavaScript format?
json() is the JSON decoder method. You are looking at a Python object, that is why it looks like Python.
Other formats are listed on the same page, starting from Response Content
.text: text - it has no separate link/paragraph, it is right under "Response Content"
.content: binary, as bytes
.json(): decoded JSON, as Python object
.raw: streamed bytes (so you can get parts of content as it comes)
You need .text for getting text, including JSON data.
You can get the raw text of your response with requests.get(some_url, params=some_params).text
It is the json method which converts to a Python friendly format.
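If the goal is to save the data and load it again later, a sketch like the following avoids .replace() entirely. The body string is a made-up stand-in for requests.get(some_url, params=some_params).text.

```python
import json

# Hypothetical response body, standing in for the .text of the response
body = '{"name": "example", "active": true, "parent": null}'

data = json.loads(body)       # Python object: True / None, printed with '
saved = json.dumps(data)      # back to JSON text: true / null, "
restored = json.loads(saved)  # loads cleanly again

# str() gives the Python repr, which is not JSON and cannot be parsed as such
try:
    json.loads(str(data))
    str_is_json = True
except json.JSONDecodeError as err:
    str_is_json = False
    print("str() output is not JSON:", err)
```

In other words, either keep the .text and store it unchanged, or store json.dumps(...) of the parsed object; saving str(...) of the dict is what leads to the decoder errors.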
I'm having a hard time understanding what is going on with this walmart API and I can't seem to iterate through key, values like I wish. I get different errors depending on the way I attack the problem.
import requests
import json
import urllib
response=requests.get("https://grocery.walmart.com/v0.1/api/stores/4104/departments/1256653758154/aisles/1256653758260/products?count=60&start=0")
info = json.loads(response.text)
print(info)
I'm not sure if I'm playing with a dictionary or a JSON object.
I'm thrown off because the API itself has no quotes over key/val.
When I do a json.loads it comes in but only comes in with single quotes.
I've tried going at it with for-loops but can only traverse the top layer and nothing else. My overall goal is to retrieve the info from the API link, turn it into JSON and be able to grab which ever key/val I need from it.
I'm not sure if I'm playing with a dictionary or a JSON object.
Python has no concept of a "JSON Object". It's a dictionary.
I'm thrown off because the API itself has no quotes over key/val.
Yes it does
{"aisleName":"Organic Dairy, Eggs & Meat","productCount":17,"products":[{"data":
When I do a json.loads it comes in but only comes in with single quotes
Because it's a Python dictionary, and the repr() of dict uses single quotes.
Try print(info['aisleName']) for example
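For getting past the top layer, one possible approach is a small recursive walk over the nested dicts and lists. This is only a sketch: walk is a hypothetical helper, and everything inside "data" in the sample text is invented, since the question only shows the start of the response.

```python
import json

# Hypothetical response text; only aisleName, productCount and the
# "products" / "data" nesting come from the question, the rest is invented
text = ('{"aisleName": "Organic Dairy, Eggs & Meat", "productCount": 17, '
        '"products": [{"data": {"name": "Milk"}}]}')

info = json.loads(text)  # a plain Python dict


def walk(node, path=""):
    """Yield (path, value) for every leaf inside nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}[{index}]")
    else:
        yield path, node


leaves = dict(walk(info))
for path, value in leaves.items():
    print(path, "=", value)
```

Once you know the path you need, ordinary indexing does the same job, e.g. info['products'][0]['data'] for the first product.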
header = {'Content-type': 'application/json','Authorization': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' }
url = 'https://sandbox-authservice.priaid.ch/login'
response = requests.post(url, headers = header, verify=False).json()
token = json.dumps(response)
print token['ValidThrough']
I want to print the ValidThrough attribute in my webhook, which is received as JSON data via a POST call. I know this has been asked a number of times here, but print token['ValidThrough'] isn't working for me. I receive the error "TypeError: string indices must be integers, not str".
Since .json() has already decoded the response into a dictionary, there is no need to use json.dumps.
json.dumps on a dictionary returns a string, which cannot be indexed by key, hence that error.
The .json() method of a requests response already parses the JSON text into a Python object (here, a dict).
You should use that, but your code then serializes it back to a string, hence the error: token is a string representation of the dict you are expecting, not the dict itself. Just omit the json.dumps(response) line and use response['ValidThrough'].
There's another error here: even if .json() did return a string that needed to be deserialized again, you should have used json.loads(response) to load it into a dict, not json.dumps, which serializes it again.
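The difference can be reproduced without the API at all. In this sketch the dict is a made-up stand-in for what requests.post(...).json() returns; the key names other than ValidThrough and all values are assumptions.

```python
import json

# Hypothetical stand-in for requests.post(url, headers=header).json()
response = {"Token": "abc123", "ValidThrough": 7200}

token = json.dumps(response)  # dumps serializes: token is now a str
try:
    token["ValidThrough"]     # indexing a str with a str key
except TypeError as err:
    print("TypeError:", err)

print(response["ValidThrough"])           # index the dict directly
print(json.loads(token)["ValidThrough"])  # or parse the string back first
```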
I'm starting to learn Python and I've written the following Python code (some of it omitted) and it works fine, but I'd like to understand it better. So I do the following:
html_doc = requests.get('[url here]')
Followed by:
if html_doc.status_code == 200:
soup = BeautifulSoup(html_doc.text, 'html.parser')
line = soup.find('a', class_="some_class")
value = re.search('[regex]', str(line))
print (value.group(0))
My questions are:
What does html_doc.text really do? I understand that it makes "text" (a string?) out of html_doc, but why isn't it text already? What is it? Bytes? Maybe a stupid question but why doesn't requests.get create a really long string containing the HTML code?
The only way that I could get the result of re.search was by value.group(0) but I have literally no idea what this does. Why can't I just look at value directly? I'm passing it a string, there's only one match, why is the resulting value not a string?
The return value of requests.get(), as stated in the docs, is a Response object.
The return value of re.search(), as stated in the docs, is a MatchObject (or None if nothing matched).
Both objects are introduced, because they contain much more information than simply response bytes (e.g. HTTP status code, response headers etc.) or simple found string value (e.g. it includes positions of first and last matched characters).
For more information you'll have to study docs.
FYI, to check type of returned value you may use built-in type function:
response = requests.get('[url here]')
print(type(response))  # <class 'requests.models.Response'>
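The same idea applies to re.search. A small sketch, with an invented line of HTML and an invented pattern in place of the elided ones from the question:

```python
import re

# Hypothetical input, standing in for str(line) from the question
line = '<a class="some_class" href="/item/42">Item 42</a>'

value = re.search(r"\d+", line)  # a match object, not a string

matched_text = value.group(0)    # group(0) is the whole matched text
start, end = value.span()        # where in the string the match was found
print(matched_text, start, end)

missing = re.search(r"xyz", line)  # no match: re.search returns None
print(missing)
```

Because a failed search returns None, it is worth checking the result before calling .group(0) on it.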
It seems to me you are missing some basic knowledge about classes, objects, methods and so on; you can read more about them here (for Python 2.7) and about the requests module here.
Concerning what you asked, when you type html_doc = requests.get('url'), you are creating an instance of class requests.models.Response, you can check it by:
>>> type(html_doc)
<class 'requests.models.Response'>
Now, html_doc has attributes and methods; html_doc.text returns the server's response body decoded to a string.
The same goes for the re module: its functions return objects that are not simply an int or a string.