Dealing with a preceding comma when parsing JSON in Python - python

I'm querying an API for some JSON-formatted data, but it comes back with slightly invalid formatting. There is a preceding comma that causes a problem, and I was wondering if there is any way around this.
I'm using the Requests library to issue the API queries and read the JSON like so:
resp = requests.get(citedByURL % (eid, apiKey, citedByPerPage, startPoint))
data = resp.json()
The JSON has an error which you can see here:
"entry": [{, "link": [{"#ref": "self", "#href": "http://api.elsevier.com/content/abstract/scopus_id/77957867010"}
And hence Python throws the following error:
ValueError: Expecting property name enclosed in double quotes: line 1 column 1164 (char 1163)
Is there anything I can do to maybe preprocess the data before attempting to load it as JSON?

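import json
import requests

resp = requests.get(citedByURL % (eid, apiKey, citedByPerPage, startPoint))
data = resp.text  # text is a property of the response, not a method
data = data.replace("[{,", "[{")  # drop the stray comma after the opening brace
data = json.loads(data)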
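The literal replace above only catches the exact text "[{,". A slightly more general sketch, assuming the only defect is a stray comma directly after an opening brace, could use a regular expression. The helper name load_json_with_stray_commas is made up for illustration. Because the pattern runs over the raw text, it would also alter a string value that happens to contain "{,", so treat it as a stopgap until the API is fixed.

```python
import json
import re

# Matches an opening brace followed by optional whitespace and a comma.
_STRAY_COMMA = re.compile(r"\{\s*,")


def load_json_with_stray_commas(text):
    """Parse JSON text, dropping a comma that directly follows '{'.

    Hypothetical helper: valid input is parsed untouched, and the cleanup
    only runs if the first attempt fails.
    """
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return json.loads(_STRAY_COMMA.sub("{", text))


# Sample modelled on the malformed fragment from the question.
broken = '{"entry": [{, "link": [{"#ref": "self"}]}]}'
data = load_json_with_stray_commas(broken)
print(data["entry"][0]["link"][0]["#ref"])
```

With requests, the call would be load_json_with_stray_commas(resp.text) in place of resp.json().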

Related

JSON wrapped in NULL?

I'm using the API of an affiliate network (Sovrn), expecting to retrieve a product's specification using the URL.
As per their documentation, I use:
url = 'URL-goes-here'
headers = {
    "accept": "application/json",
    "authorization": "VERY-HARD-TO-GUESS"
}
response = requests.get(url, headers=headers)
The code works: the response status is 200, and the headers contain the magical content-type application/json line.
When I do
print(response.text)
I get
NULL({"merchantName":"Overstock","canonicalUrl":"URL goes here","title":"product name",...});
I checked the type of response.text; it's <class 'str'>, as expected. But when I try to process the response as JSON:
product_details = json.load(response.text)
I get an error message:
requests.exceptions.JSONDecodeError: [Errno Expecting value]
I'm new to JSON, but I assume the error is due to the outer NULL that the (seemingly valid) data is wrapped in.
After spending a few hours searching for a solution, it seems that I must be missing something obvious, but not sure what.
Any pointers would be extremely helpful.
That's clearly a bug in the API. Assuming it will be fixed after you complain, you could add a hack to your code:
def sovrn_json_load_hack(json_text):
    """sovrn is returning invalid json as of (revision here)."""
    if not json_text.startswith("NULL("):
        return json.loads(json_text)
    else:
        return json.loads(json_text[5:-2])
You can ignore NULL( at the beginning and ); at the end by using string slicing:
product_details = json.loads(response.text[5:-2])
Additionally, you should be using json.loads() as the content is a string.
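The hard-coded [5:-2] slice assumes the wrapper is exactly NULL( at the start and ); at the end. As a sketch that tolerates a different callback name, a missing semicolon or trailing whitespace, a regular expression can pull out whatever sits between the outer parentheses. The function name unwrap_jsonp is an assumption for illustration; it is not part of requests or of the Sovrn API.

```python
import json
import re

# Optional callback name, "(", the payload, ")", optional ";" and whitespace.
_JSONP = re.compile(r"^\s*[\w.$]*\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(text):
    """Return the parsed payload of a JSONP-style response.

    Falls back to plain json.loads when no wrapper is present.
    """
    match = _JSONP.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


# Shortened sample shaped like the response in the question.
sample = 'NULL({"merchantName": "Overstock", "title": "product name"});'
product_details = unwrap_jsonp(sample)
print(product_details["merchantName"])
```

In the question's code this would be called as unwrap_jsonp(response.text).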

How to handle API responses (JSON) containing \x00 (or \u0000) in the data, and store the data in Postgres using Django models?

I'm making an api call, and trying to store its response in Postgres using Django models.
Here's what I have been doing:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
# handler is an instance of a model
handler.api_response = response_data
handler.save()
But this failed when my JSON had fields like 'field_name': '\x00\x00\x00\x00\x00'. It gave the following error:
DataError at /api/booking/reprice/
unsupported Unicode escape sequence
LINE 1: ... NULL, "api_status" = 0, "api_response" = '{"errorRe...
^
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: ..."hoursConfirmed": 0, "field_name":...
I tried to resolve this with the following:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
# handler is an instance of a model
handler.api_response = json.loads(json.dumps(response_data).encode("unicode-escape").decode())
handler.save()
That solved the initial issue. But recently I got a field with the value 'field2_name': 'Hey "Whats up"', and this failed with the error:
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 143 (char 142)
Probably because json.loads() treated the " inside the value as a closing " rather than an escaped one.
Now, if I print the initial response right after the json.loads(response.content.decode('utf-8')) statement, it shows the field as \x00\x00\x00\x00\x00.
But the output of the following code:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
print(json.dumps(response_data))
shows the field as \\u0000\\u0000\\u0000\\u0000\\u0000.
How does \x00 change to \\u0000?
And how do I save this field into Postgres tables?
This is what I could think of:
json.loads(json.dumps(response_data).replace('\\u0000',''))
adding this statement before saving to Postgres.
Is there a better way?
Is the code response_data = json.loads(response.content.decode('utf-8')) wrong, or is it failing to escape that particular character?
FWIW, I recently ran into this and your proposed solution also worked for me. I think it's probably the simplest way to deal with it. Apparently this is the only character that can't go into a Postgres JSON column.
json.loads(json.dumps(response_data).replace('\\u0000',''))
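An alternative sketch that avoids the dumps/replace/loads round trip is to walk the parsed data and drop the NUL character from every string. The helper name strip_nul is hypothetical. One reason to consider it: the text-level replace also hits a value that contains a literal backslash followed by u0000, because json.dumps writes that value as \\u0000, and deleting the \u0000 part leaves a dangling backslash that json.loads rejects.

```python
def strip_nul(value):
    """Recursively remove NUL characters from strings in parsed JSON data."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Sample data using the field names from the question; "nested" is made up.
response_data = {
    "field_name": "\x00\x00\x00\x00\x00",
    "field2_name": 'Hey "Whats up"',
    "nested": [{"note": "a\x00b"}, 3, None],
}
cleaned = strip_nul(response_data)
print(cleaned)
```

In the question's code, handler.api_response = strip_nul(response_data) would take the place of the json.loads(json.dumps(...)) line.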

How to parse jsonp returned from api using python

I'm very new to coding, and I'm building my first web application using open REST api with python flask.
I think the API is returning JSONP, which looks like this: callbackfunction{ json };. I gather from other posts that all I need to do is get rid of this padding. However, I can't figure out at which point I should do the stripping.
This is my code. The 5th line throws the error "the JSON object must be str, bytes or bytearray, not HTTPResponse".
def lookup(title):
    try:
        url = "http://www.aladin.co.kr/ttb/api/ItemSearch.aspx?ttbkey=foo&Query=bar"
        result = urllib.request.urlopen(url)
        data = json.loads(result)
        data_json = data.split("{", 1)[1].strip("}")
        return data_json
    except requests.RequestException:
        return None
I'm sure it works up to the 4th line. When I tried the second version of the function (shown at the end), it at least returned a result, though a cryptic one, like this:
b'{ "version" : "20070901", "title" :
"\xec\x95\x8c\xeb\x9d\xbc\xeb\x94\x98 \xea\xb2\x80 ...
"customerReviewRank":9 } ] };'
Judging by the keys, I'm pretty sure this is the information I requested. So what can I do to fix this? Thanks in advance!
def lookup(title):
    try:
        url = "http://www.aladin.co.kr/ttb/api/ItemSearch.aspx?ttbkey=foo&Query=bar"
        result = urllib.request.urlopen(url)
        res = result.readline()
        return res
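One possible place to do the stripping, sketched under a few assumptions: urlopen() returns a response object, so its body has to be read and decoded before any JSON handling, and the padding is removed from that text before json.loads is called. The helper names parse_padded_json and lookup_sketch are made up for this example, the body is assumed to be UTF-8, and the payload is assumed to be a single JSON object.

```python
import json
import urllib.error
import urllib.request


def parse_padded_json(raw):
    """Decode response bytes and parse the JSON object inside any padding."""
    text = raw.decode("utf-8")  # assumption: the API responds in UTF-8
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(text[start:end + 1])


def lookup_sketch(url):
    """Fetch url and return the parsed payload, or None on a network error."""
    try:
        with urllib.request.urlopen(url) as result:
            raw = result.read()  # bytes, not the HTTPResponse object itself
    except urllib.error.URLError:  # urllib's own error type, not requests'
        return None
    return parse_padded_json(raw)


# Offline check with a shortened sample shaped like the output in the question.
sample = b'{ "version" : "20070901", "title" : "\xec\x95\x8c\xeb\x9d\xbc\xeb\x94\x98" };'
print(parse_padded_json(sample)["version"])
```

The result is a regular dict, so individual fields can be read with data["title"] and so on. The bytes that looked cryptic in the raw output are UTF-8 encoded text and become readable once decoded.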

How to return this valid json data in Python?

I'm using Python to reproduce a curl request that fetches some data.
import requests
import json
username = "abc"
password = "123"
headers = {
    'Content-Type': 'application/json',
}
params = (
    ('version', '2017-05-01'),
)
data = '{"text":["This is message one."], "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
The above works fine. It returns json data.
It seems ["This is message one."] is a list. I want to use a variable that loads a file to replace this list.
I tried:
with open(f, "r", encoding='utf-8') as fp:
    file_in_list = fp.read().splitlines()
toStr=str(file_in_list)
data = '{"text":'+toStr+', "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
But it returned the error below:
{
    "code" : 400,
    "error" : "Mapping error, invalid JSON"
}
Can you help? How can I get a valid response.text?
Thanks.
Update:
The file f contains only the five lines below:
This is message one.
this is 2.
this is three.
this is four.
this is five.
The reason your existing code fails is that str applied to a list of strings will only rarely give you valid JSON. They're not intended to do the same thing. JSON only allows double-quoted strings; Python allows both single- and double-quoted strings. And, unless your strings all happen to include ' characters, Python will render them with single quotes:
>>> print(["abc'def"]) # gives you valid JSON, but only by accident
["abc'def"]
>>> print(["abc"]) # does not give you valid JSON
['abc']
If you want to get the valid JSON encoding of a list of strings, don't try to trick str into giving you valid JSON by accident, just use the json module:
toStr = json.dumps(file_in_list)
But, even more simply, you shouldn't be trying to figure out how to construct JSON strings in the first place. Just create a dict and json.dumps the whole thing:
data = {"text": file_in_list, "id": "en-es"}
data_str = json.dumps(data)
Being able to do this is pretty much the whole point of JSON: it's a simple way to automatically serialize all of the types that are common to all the major scripting languages.
Or, even better, let requests do it for you by passing a json argument instead of a data argument:
data = {"text": file_in_list, "id": "en-es"}
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, json=data, auth=(username, password))
This also takes care of setting the Content-Type header to application/json for you, so you no longer have to set it by hand in headers. Many servers will accept a JSON body without that header, but sending one without it is incorrect, and some servers will reject it.
For more details, see the section More complicated POST requests in the requests docs. But there really aren't many more details.
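To see the difference concretely, here is a small sketch that builds the request body both ways from the five lines in the question and checks which one parses as JSON. No request is sent; the variable names hand_built and serialised are only for illustration.

```python
import json

# The five lines from the question, as fp.read().splitlines() would return them.
file_in_list = [
    "This is message one.",
    "this is 2.",
    "this is three.",
    "this is four.",
    "this is five.",
]

# Hand-built body: str() uses Python's repr, which picks single quotes here.
hand_built = '{"text":' + str(file_in_list) + ', "id":"en-es"}'
try:
    json.loads(hand_built)
    hand_built_is_valid = True
except json.JSONDecodeError:
    hand_built_is_valid = False

# Serialised body: json.dumps always emits double-quoted JSON strings.
serialised = json.dumps({"text": file_in_list, "id": "en-es"})

print(hand_built_is_valid)
print(json.loads(serialised)["text"] == file_in_list)
```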
TL;DR:
toStr = json.dumps(file_in_list)
Explanation
Assuming your file contains something like
String_A
String_B
you need to ensure that toStr is:
enclosed by [ and ], and
a list in which every string is enclosed by double quotation marks.
So your raw JSON (as a string) is equal to '{"text":["String_A", "String_B"], "id":"en-es"}'

Python - string indices must be integers, not str - need help understanding data structure

My code looks like this:
payload = base64.b64decode(record['kinesis']['data'])
print("Decoded payload: " + payload)
In the log the result of the print line looks like this:
Decoded payload:
{
    "timeStamp": 1509835693.7319956,
    "thing": "testing/23"
}
Wouldn't I reference the timeStamp like this:
payload['timeStamp']
I am confused by what I have in this data structure. Can someone please explain to me what I have here and how I access the data inside the variable payload?
The decoded data is a string (as the error says), not a dictionary. You need to parse it before accessing its elements.
Since your data is in JSON format, like the sample you presented above:
import base64
import json
payload_str = base64.b64decode(record['kinesis']['data'])
payload = json.loads(payload_str) # parsing
print("Decoded payload: ", payload)
And now you have no problem accessing payload['timeStamp'], as long as the JSON contains this field.
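A self-contained sketch of the same steps, using a hand-built stand-in for the Kinesis record so it can be run without AWS. The record contents reuse the values from the question; the way the record is constructed here is only an assumption for the demo.

```python
import base64
import json

# Build a stand-in for the event record; real records come from Kinesis.
original = {"timeStamp": 1509835693.7319956, "thing": "testing/23"}
record = {
    "kinesis": {
        "data": base64.b64encode(json.dumps(original).encode("utf-8")).decode("ascii")
    }
}

raw = base64.b64decode(record["kinesis"]["data"])  # bytes on Python 3
payload = json.loads(raw.decode("utf-8"))          # now a dict

print(type(raw).__name__, type(payload).__name__)
print(payload["timeStamp"])
```

Indexing raw with a string key is what produces the "string indices must be integers" error on Python 2; after json.loads the same lookup works because payload is a dict.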
