Reading a JSON file using Python

I have a JSON file called 'elements.json':
[
{ldraw="003238a",lgeo="003238a",slope=0,anton=0,lutz=0,owen=0,damien=0},
{ldraw="003238b",lgeo="003238b",slope=0,anton=0,lutz=0,owen=0,damien=0},
{ldraw="003238c",lgeo="003238c",slope=0,anton=0,lutz=0,owen=0,damien=0},
{ldraw="003238d",lgeo="003238d",slope=0,anton=0,lutz=0,owen=0,damien=0}
]
I have a Python file called 'test.py':
import json
with open('elements.json') as json_file:
    data = json.load(json_file)
    for p in data:
        print('ldraw: ' + p['ldraw'])
        print('lgeo: ' + p['lgeo'])
Running from the Windows command line I get this error:
Traceback (most recent call last):
File "test.py", line 4, in <module>
data = json.load(json_file)
File "C:\Python27\lib\json\__init__.py", line 278, in load
**kw)
File "C:\Python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python27\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 2 column 2 (char 3)
What property name is expected? Why am I getting the error?

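Your file doesn't follow the JSON specification: keys must be double-quoted strings, and a key is separated from its value by a colon, not "=". See json.org for details. The file should look like this: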
[
{"ldraw":"003238a","lgeo":"003238a","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0},
{"ldraw":"003238b","lgeo":"003238b","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0},
{"ldraw":"003238c","lgeo":"003238c","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0},
{"ldraw":"003238d","lgeo":"003238d","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0}
]
Your Python code is correct.
Your ldraw and lgeo values looked like hexadecimal to me at first, but they are not. (JSON has no hexadecimal literals in any case, so values like these have to stay quoted strings.)
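As a quick check, the corrected text can be parsed from a string before it is saved to disk. This is only a minimal sketch: the variable names are made up for illustration, and just the first two records are included.

```python
import json

# First two records of the corrected file: quoted keys and ":" separators.
corrected = '''[
{"ldraw":"003238a","lgeo":"003238a","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0},
{"ldraw":"003238b","lgeo":"003238b","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0}
]'''

# One shortened record in the original key=value style, which is not JSON.
original = '[{ldraw="003238a",lgeo="003238a"}]'

data = json.loads(corrected)
for p in data:
    print('ldraw: ' + p['ldraw'])
    print('lgeo: ' + p['lgeo'])

try:
    json.loads(original)
    original_is_valid = True
except ValueError as err:  # json.JSONDecodeError subclasses ValueError in Python 3
    original_is_valid = False
    print('rejected: ' + str(err))
```

The parser stops at the first unquoted key, which is why the error message asks for a property name.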

Your file elements.json is not valid JSON.
Each record should look like this:
[{"ldraw":"003238a","lgeo":"003238a"}]

Your JSON is invalid. JSON stands for JavaScript Object Notation and follows the syntax of JavaScript object literals, so each key-value pair is written with ":" rather than "=", and every key must be a double-quoted string.
Wrong:
ldraw="003238a"
"ldraw": 003238a // an unquoted value must be a valid number; leading zeros and letters are not allowed
Right:
"ldraw": "003238a"
"ldraw": { "example-key": "value" }
"ldraw": "True"
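If the file cannot be regenerated at the source, one possible repair is to rewrite each bare name= into "name": before parsing. This is a rough sketch rather than a general converter: it assumes that no string value contains text of the form name= itself, and the records below are shortened versions of the ones in the question.

```python
import json
import re

# Shortened sample in the original key=value style.
raw = ('[{ldraw="003238a",lgeo="003238a",slope=0,anton=0},'
       '{ldraw="003238b",lgeo="003238b",slope=0,anton=0}]')

# Turn every bare  name=  into  "name":  so the text becomes JSON.
as_json = re.sub(r'(\w+)=', r'"\1":', raw)

records = json.loads(as_json)
for record in records:
    print(record['ldraw'], record['slope'])
```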

Related

Load JSON object including escaped json string

I'm trying to load a JSON object from a string (via Python). This object has a single key mapped to an array. The array includes a single value which is another serialized JSON object. I have tried a few online JSON parsers / validators, but can't seem to identify what the issue with loading this object is.
JSON Data:
{
"parent": [
"{\"key\":\"value\"}"
]
}
Trying to load from Python:
>>> import json
>>> test_string = '{"parent":["{\"key\":\"value\"}"]}'
>>> json.loads(test_string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 15 (char 14)
If you try out your string in the REPL, you'll see pretty quickly why it doesn't work:
>>> '{"parent":["{\"key\":\"value\"}"]}'
'{"parent":["{"key":"value"}"]}'
Notice that the backslashes have gone away: Python treats \" inside a regular string literal as an escape sequence for a plain double quote, so the JSON parser never sees them.
One easy fix is to use a raw string:
>>> r'{"parent":["{\"key\":\"value\"}"]}'
'{"parent":["{\\"key\\":\\"value\\"}"]}'
e.g.
>>> import json
>>> test_string = r'{"parent":["{\"key\":\"value\"}"]}'
>>> json.loads(test_string)
{u'parent': [u'{"key":"value"}']}
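Once the outer object loads, the array element is still just a string, so it needs a second json.loads call to become a dict. The sketch below shows that step, and also builds the same kind of document with json.dumps so that no backslashes have to be typed by hand. The names outer, inner and rebuilt are only illustrative.

```python
import json

test_string = r'{"parent":["{\"key\":\"value\"}"]}'

# First pass: the outer object. The array element is still a JSON string.
outer = json.loads(test_string)
serialized_child = outer["parent"][0]

# Second pass: decode the embedded JSON document.
inner = json.loads(serialized_child)
print(inner["key"])

# Going the other way, json.dumps adds the escaping for the nested string.
rebuilt = json.dumps({"parent": [json.dumps(inner)]})
print(rebuilt)
```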

json.loads() giving exception that it expects a value, looks like value is there

Code:
loaded_json = json.loads(json_set)
json_set is a String gleaned from a webpage which is JSON formatted data. The full String (warning: LONG) is here: http://pastebin.com/wykwNEeg
The error it gives me (if I save the string to its own file, then readlines + json.loads that line in IDLE) is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 62233 (char 62232)
","distance":\u002d1,"lo
^ (gedit tells me column 62233 lies between the colon and the \)
I'm guessing it has something to do with the Unicode escape: \u002d is the escape for -, so that value should be "distance":-1
What's strange is that if I print the line out when I hit the exception (or wherever, I guess) it comes out as above. However if I open up a python3 IDLE session and do this, I get different results:
>>> mystr = '"distance":\u002d1'
>>> mystr
'"distance":-1'
>>> print(mystr)
"distance":-1
>>>
How do I get this JSON to load properly?
===============
This data comes earlier on from code that looks like so (basically showing that the string is a result of response.decode('utf8')):
'''This bit gets the page from the website, it's called from the below code block'''
def load_arbitrary_page(self, url):
    response = self.opener.open(url)
    response_list = response.readlines()
    decode_list = []
    for line in response_list:
        decode = line.decode('utf8')
        decode_list.append(decode)
    print(BeautifulSoup(''.join(decode_list)).find("title"))
    return decode_list
html = grabber.load_arbitrary_page(url)
count += 1
for line in html:
    # Appears to show up 3 times, all in the same line
    if "<my search parameter>" in line:
        content_list.append(line)
        break
Finally, the content_list is split on comments (re.split("<!-- ...) and the final portion of that becomes the variable json_set.
If you look at the ECMA-404 standard for JSON, you'll see that numbers may have an optional leading minus sign which they designate as U+002D which is the ASCII minus sign. However, \u002D is not a minus sign. It is a character escape for a minus sign, but character escapes are only valid in the context of a string value. But string values must start and end with double quotes, so this is not a string value. Thus the data you have does not parse as a valid JSON value, and the Python JSON parser is correct in rejecting it.
If you try validating that data blob using the http://jsonlint.com/ website, it will also report that the data is not valid JSON.
Parse error on line 2172:
... "distance": \u002d1,
-----------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['
The example you give with IDLE working is not an equal comparison because the string you gave is different:
'"distance":\u002d1' != '"distance":\\u002d1'
The string on the left is the string you gave IDLE and if you enclosed it in curly braces, it would be valid JSON:
>>> json.loads('{"distance":\u002d1}')
{'distance': -1}
But if you give it the string on the right, you'll see that it will not work as you expect:
>>> json.loads('{"distance":\\u002d1}')
Traceback (most recent call last):
File "/usr/lib/python3.2/json/decoder.py", line 367, in raw_decode
obj, end = self.scan_once(s, idx)
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.2/json/__init__.py", line 309, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.2/json/decoder.py", line 351, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.2/json/decoder.py", line 369, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
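If the site cannot be made to emit valid JSON, one workaround is to patch the text before parsing. The sketch below is an assumption about what such a repair might look like, not something the data format guarantees: it rewrites the escape only where it sits directly after a colon and before a digit, which is where a number's minus sign would go. A string value that happens to contain a colon followed by \u002d and a digit would also be rewritten, although it would still decode to the same text.

```python
import json
import re

# Backslashes are doubled here so the text really contains the characters \u002d.
broken = '{"distance":\\u002d1,"name":"a\\u002db"}'

# Replace the escape with "-" only between a colon and a digit.
repaired = re.sub(r'(:\s*)\\u002[dD](?=\d)', r'\1-', broken)

result = json.loads(repaired)
print(result)
```

The escape inside the "name" string is left alone, because there it is valid JSON and the parser decodes it on its own.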

How to convert this json string to dict?

After executing the following code:
import json
a = '{"excludeTypes":"*.exe;~\\$*.*"}'
json.loads(a)
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
So how can I convert 'a' to a dict?
Please note that the string is already in 'a' and I cannot add 'r' in front of it. Ideally, the string should have been {"excludeTypes":"*.exe;~\\\\$*.*"}
Also, the following code doesn't work:
import json
a = '{"excludeTypes":"*.exe;~\\$*.*"}'
b = repr(a)
json.loads(b)
import ast
d = ast.literal_eval(a)
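You can do it by escaping the escape character: double every backslash ("\") in the string, so that the JSON parser reads each one as a literal backslash: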
import json
a = '{"excludeTypes":"*.exe;~\\$*.*"}'
a = a.replace("\\","\\\\")
json.loads(a)
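Doubling every backslash also changes escapes that were already valid, such as \n or \". A slightly more careful variant doubles only the backslashes that do not start a valid JSON escape. This is a sketch under that assumption, and escape_stray_backslashes is a made-up helper name. It cannot tell a Windows path like C:\temp from a real \t escape, so it is still a guess about what the producer meant.

```python
import json
import re

# Characters JSON allows after a backslash (\u is not checked for its 4 hex digits).
VALID_ESCAPES = '"\\/bfnrtu'

def escape_stray_backslashes(text):
    """Double only the backslashes that do not start a valid JSON escape."""
    def fix(match):
        pair = match.group(0)  # the backslash plus the character after it
        return pair if pair[1] in VALID_ESCAPES else '\\' + pair
    return re.sub(r'\\.', fix, text, flags=re.DOTALL)

a = '{"excludeTypes":"*.exe;~\\$*.*"}'
d = json.loads(escape_stray_backslashes(a))
print(d['excludeTypes'])
```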

json module bug in Python 3.4.1?

I'm trying to use SharePoint 2010's REST API, which was going swimmingly until I ran into this:
Traceback (most recent call last):
File "TestJSON.py", line 21, in <module>
json.loads(s)
File "c:\Python33\lib\json\__init__.py", line 316, in loads
return _default_decoder.decode(s)
File "c:\Python33\lib\json\decoder.py", line 351, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\Python33\lib\json\decoder.py", line 367, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting ',' delimiter: line 1 column 14 (char 13)
Test case:
import json
s='''{"etag": "W/\"1\""}'''
json.loads(s)
Python 3.3.5 gives the same error. Did I find a bug in the JSON library?
Update:
The actual error I'm getting (preceded by the affected part) is this:
>>>>>>Err:tration?$filter=Modified%20gt%20datetime\'2014-04-30T00:00:00.000Z\'&$orderby=Mo<<<<<<<
Traceback (most recent call last):
File "TestURL.py", line 41, in <module>
j = json.loads(body)
File "c:\Python33\lib\json\__init__.py", line 316, in loads
return _default_decoder.decode(s)
File "c:\Python33\lib\json\decoder.py", line 351, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\Python33\lib\json\decoder.py", line 367, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 4005 column 114 (char 314020)
from
body = response.read().decode("utf-8")
print(">>>>>>Err:{}<<<<<<<".format(body[314020-40:314020+40]))
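The string literal in your test case is not escaped correctly: in a regular (non-raw) Python string, \" produces a plain ", so the backslashes never reach the JSON parser. Make sure the string really represents JSON.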
>>> s = r'''{"etag": "W/\"1\""}''' # NOTE: raw string literal
>>> json.loads(s)
{'etag': 'W/"1"'}
The \' sequence is invalid JSON. Single quotes do not need to be escaped, making this an invalid string escape.
You could try to repair it after the fact:
import re
data = re.sub(r"(?<!\\)\\'", "'", data)
before loading it with JSON. This replaces \' with plain ', provided the backslash wasn't already escaped by a preceding \.
Since single quotes can only appear in string values, this should be safe.
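Here is that repair applied to a small stand-in for the response body. The key name "uri" and the shortened value are made up for illustration; only the \' sequences mirror the real data.

```python
import json
import re

# Raw string, so the text really contains backslash + single quote.
body = r'''{"uri": "Items?$filter=Modified gt datetime\'2014-04-30T00:00:00.000Z\'"}'''

try:
    json.loads(body)
    loaded_unmodified = True
except ValueError as err:  # "Invalid \escape" in the traceback above
    loaded_unmodified = False
    print(err)

# Drop the backslash from \' unless that backslash is itself escaped.
cleaned = re.sub(r"(?<!\\)\\'", "'", body)
parsed = json.loads(cleaned)
print(parsed["uri"])
```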

Invalid Control Character in JSON decode (using Python)

def list(type, extra=""):
    if extra != "":
        entity = "http://api.crunchbase.com/v/1/" + type + "/" + extra + ".js?api_key=" + key
        data = json.load(urllib2.urlopen(entity))
    else:
        entity = "http://api.crunchbase.com/v/1/" + type + ".js?api_key=" + key
        data = json.load(urllib2.urlopen(entity))
    return data
The function list is called specifically here:
x = colink
details = list(co, x)
specifically on the instance where x is "if_this_then_that" and co is "company"
The code breaks down on this line when I query on the second line (the entity link is properly formatted). The Error Message is below and the line in the JSON file where the error occurs follows. I am not sure how to handle the unicode error when getting data through a JSON API. Any suggestions on how to remedy this would be appreciated.
Traceback (most recent call last):
File "crunch_API.py", line 95, in <module>
details = list(co, x)
File "crunch_API.py", line 34, in list
data = json.load(urllib2.urlopen(entity))
File "C:\Python27\lib\json\__init__.py", line 278, in load
**kw)
File "C:\Python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python27\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 24 column 89 (char 881)
"overview": "\u003Cp\u003EIFTTT is a service that lets you create powerful connections with one simple statement: if this then that.\u003C/p\u003E", #### Where the error occurs

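"Invalid control character" usually means the response contains a raw control character, such as a literal newline or tab, inside a string value. The line quoted above does not show one, so this is only a guess at the cause. If it is the cause, the standard json module accepts strict=False, which lets such characters through. A minimal sketch with a made-up payload:

```python
import json

# The \n below becomes a real newline inside the JSON string, which strict JSON forbids.
text = '{"overview": "first line\nsecond line"}'

try:
    json.loads(text)
    strict_ok = True
except ValueError as err:
    strict_ok = False
    print(err)

# strict=False tells the decoder to accept control characters inside strings.
data = json.loads(text, strict=False)
print(repr(data["overview"]))
```

json.load forwards extra keyword arguments to the decoder as well, so the same flag can be passed when reading from a file-like object.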