My Python program receives JSON data, and I need to get bits of information out of it. How can I parse the data and use the result? I think I need to use json.loads for this task, but I can't understand how to do it.
For example, suppose that I have jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'. Given this JSON, and an input of "two", how can I get the corresponding data, "2"?
Beware that .load is for files; .loads is for strings. See also: Reading JSON from a file.
Occasionally, a JSON document is intended to represent tabular data. If you have something like this and are trying to use it with Pandas, see Python - How to convert JSON File to Dataframe.
Some data superficially looks like JSON, but is not JSON.
For example, sometimes the data comes from applying repr to native Python data structures. The result may use quotes differently, use title-cased True and False rather than JSON-mandated true and false, etc. For such data, see Convert a String representation of a Dictionary to a dictionary or How to convert string representation of list to a list.
Another common variant format puts separate valid JSON-formatted data on each line of the input. (Proper JSON cannot be parsed line by line, because it uses balanced brackets that can be many lines apart.) This format is called JSONL. See Loading JSONL file as JSON objects.
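As a rough sketch of the line-by-line idea (the helper name parse_jsonl and the sample text are made up for illustration, not taken from the linked question), each non-blank line can be handed to json.loads separately:

```python
import json

def parse_jsonl(text):
    """Parse JSONL text: one independent JSON document per non-blank line."""
    # Split on "\n" only, so unusual separators inside strings are left alone.
    return [json.loads(line) for line in text.split("\n") if line.strip()]

# Hypothetical sample input, not taken from any real file.
sample = '{"id": 1}\n{"id": 2}\n\n[3, 4]\n'
records = parse_jsonl(sample)
print(records)
```

Each element of records is then an ordinary Python value, exactly as if it had come from a single json.loads call.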
Sometimes JSON data from a web source is padded with some extra text. In some contexts, this works around security restrictions in browsers. This is called JSONP and is described at What is JSONP, and why was it created?. In other contexts, the extra text implements a security measure, as described at Why does Google prepend while(1); to their JSON responses?. Either way, handling this in Python is straightforward: simply identify and remove the extra text, and proceed as before.
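A minimal sketch of the "identify and remove" step, assuming the padding has the shape callback(...) with optional trailing text; the helper name strip_jsonp and the sample response are hypothetical, and real padding may need a different rule:

```python
import json

def strip_jsonp(text):
    """Return the JSON payload from text shaped like callback(...);"""
    start = text.index("(") + 1   # just past the first opening parenthesis
    end = text.rindex(")")        # the last closing parenthesis
    return text[start:end]

padded = 'handleData({"one": "1", "two": "2"});'  # hypothetical response body
data = json.loads(strip_jsonp(padded))
print(data["two"])
```

For a fixed prefix such as while(1); the same idea applies: slice off the known prefix, then call json.loads on the rest.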
Very simple:
import json
data = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print(data['two']) # or `print data['two']` in Python 2
Sometimes your JSON is not a string. For example, if you are getting JSON from a URL like this:
j = urllib2.urlopen('http://site.com/data.json')  # Python 2; in Python 3, use urllib.request.urlopen
you will need to use json.load, not json.loads:
j_obj = json.load(j)
(It is easy to mix the two up: the 's' stands for 'string'.)
For a file or file-like object (such as an opened URL), use json.load(). For a string containing JSON, use json.loads().
#!/usr/bin/python
import json
# from pprint import pprint
json_file = 'my_cube.json'
cube = '1'
# json.load needs an open file object, not a path.
with open(json_file) as json_data:
    data = json.load(json_data)
# pprint(data)
print("Dimension: ", data['cubes'][cube]['dim'])
print("Measures: ", data['cubes'][cube]['meas'])
The following is a simple example that may help you:
json_string = """
{
"pk": 1,
"fa": "cc.ee",
"fb": {
"fc": "",
"fd_id": "12345"
}
}"""
import json
data = json.loads(json_string)
if data["fa"] == "cc.ee":
    data["fb"]["new_key"] = "cc.ee was present!"
print(json.dumps(data))
The output for the above code will be (on Python 3.7+, where dicts preserve insertion order):
{"pk": 1, "fa": "cc.ee", "fb": {"fc": "", "fd_id": "12345", "new_key": "cc.ee was present!"}}
Note that you can set the indent argument of json.dumps to pretty-print the result (for example, with print(json.dumps(data, indent=4))):
{
    "pk": 1,
    "fa": "cc.ee",
    "fb": {
        "fc": "",
        "fd_id": "12345",
        "new_key": "cc.ee was present!"
    }
}
Parsing the data
Using the standard library json module
For string data, use json.loads:
import json
text = '{"one" : "1", "two" : "2", "three" : "3"}'
parsed = json.loads(text)
For data that comes from a file, or other file-like object, use json.load:
import io, json
# create an in-memory file-like object for demonstration purposes.
text = '{"one" : "1", "two" : "2", "three" : "3"}'
stream = io.StringIO(text)
parsed = json.load(stream) # load, not loads
It's easy to remember the distinction: the trailing s of loads stands for "string". (This is, admittedly, probably not in keeping with standard modern naming practice.)
Note that json.load does not accept a file path:
>>> json.load('example.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
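To read from a path, open the file first and pass the resulting file object to json.load. A small self-contained sketch, using a temporary directory and a made-up file name so that it can run anywhere:

```python
import json
import os
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"one" : "1", "two" : "2", "three" : "3"}')

    # Open the path first, then hand the file object to json.load.
    with open(path, encoding="utf-8") as f:
        parsed = json.load(f)

print(parsed["two"])
```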
Both of these functions provide the same set of additional options for customizing the parsing process. Since 3.6, the options are keyword-only.
For string data, it is also possible to use the JSONDecoder class provided by the library, like so:
import json
text = '{"one" : "1", "two" : "2", "three" : "3"}'
decoder = json.JSONDecoder()
parsed = decoder.decode(text)
The same keyword parameters are available, but now they are passed to the constructor of the JSONDecoder, not the .decode method. The main advantage of the class is that it also provides a .raw_decode method, which will ignore extra data after the end of the JSON:
import json
text_with_junk = '{"one" : "1", "two" : "2", "three" : "3"} ignore this'
decoder = json.JSONDecoder()
# `amount` will count how many characters were parsed.
parsed, amount = decoder.raw_decode(text_with_junk)
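To illustrate what the second return value is good for, here is a small sketch (the name leftover is just for illustration) that slices off whatever follows the JSON document:

```python
import json

text_with_junk = '{"one" : "1", "two" : "2", "three" : "3"} ignore this'
decoder = json.JSONDecoder()
parsed, amount = decoder.raw_decode(text_with_junk)

# `amount` is the index just past the JSON document, so slicing from it
# yields whatever text follows.
leftover = text_with_junk[amount:]
print(parsed["two"])
print(repr(leftover))
```

One thing to watch for: unlike .decode, .raw_decode expects the JSON to start at the given index and does not skip leading whitespace, so it may be necessary to strip the input first.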
Using requests or other implicit support
When data is retrieved from the Internet using the popular third-party requests library, it is not necessary to extract .text (or create any kind of file-like object) from the Response object and parse it separately. Instead, the Response object directly provides a .json method which will do this parsing:
import requests
response = requests.get('https://www.example.com')
parsed = response.json()
This method accepts the same keyword parameters as the standard library json functionality.
Using the results
Parsing by any of the above methods will result, by default, in a perfectly ordinary Python data structure, composed of the perfectly ordinary built-in types dict, list, str, int, float, bool (JSON true and false become Python constants True and False) and NoneType (JSON null becomes the Python constant None).
Working with this result, therefore, works the same way as if the same data had been obtained using any other technique.
Thus, to continue the example from the question:
>>> parsed
{'one': '1', 'two': '2', 'three': '3'}
>>> parsed['two']
'2'
I emphasize this because many people seem to expect that there is something special about the result; there is not. It's just a nested data structure, though dealing with nesting is sometimes difficult to understand.
Consider, for example, a parsed result like result = {'a': [{'b': 'c'}, {'d': 'e'}]}. To get 'e' requires following the appropriate steps one at a time: looking up the 'a' key in the dict gives a list [{'b': 'c'}, {'d': 'e'}]; the second element of that list (index 1) is {'d': 'e'}; and looking up the 'd' key in there gives the 'e' value. Thus, the corresponding code is result['a'][1]['d']: each indexing step is applied in order.
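The same walk-through, spelled out one step at a time (the names step1 to step3 are only for illustration):

```python
result = {'a': [{'b': 'c'}, {'d': 'e'}]}

step1 = result['a']   # the list [{'b': 'c'}, {'d': 'e'}]
step2 = step1[1]      # the second element, {'d': 'e'}
step3 = step2['d']    # the value 'e'

# Chaining the same steps in one expression gives the same value.
assert step3 == result['a'][1]['d']
print(step3)
```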
See also How can I extract a single value from a nested data structure (such as from parsing JSON)?.
Sometimes people want to apply more complex selection criteria, iterate over nested lists, filter or transform the data, etc. These are more complex topics that will be dealt with elsewhere.
Common sources of confusion
JSON lookalikes
Before attempting to parse JSON data, it is important to ensure that the data actually is JSON. Check the JSON format specification to verify what is expected. Key points:
The document represents one value (normally a JSON "object", which corresponds to a Python dict, but every other type represented by JSON is permissible). In particular, it does not have a separate entry on each line - that's JSONL.
The data is human-readable after using a standard text encoding (normally UTF-8). Almost all of the text is contained within double quotes, and uses escape sequences where appropriate.
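A quick way to check these points in practice is simply to attempt the parse and look at the error. The following is only a sketch; the helper name try_parse_json is made up for illustration:

```python
import json

def try_parse_json(text):
    """Return (True, value) if text is valid JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as error:
        return False, str(error)

print(try_parse_json('{"ok": true}'))
print(try_parse_json("{'ok': True}"))        # repr of a Python dict, not JSON
print(try_parse_json('{"a": 1}\n{"b": 2}'))  # JSONL: two documents, not one
```

The error message usually points at the first character that breaks the rules, which helps to tell the lookalike formats apart.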
Dealing with embedded data
Consider an example file that contains:
{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}
The backslashes here are for JSON's escape mechanism.
When parsed with one of the above approaches, we get a result like:
>>> example = input()
{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}
>>> parsed = json.loads(example)
>>> parsed
{'one': '{"two": "three", "backslash": "\\\\"}'}
Notice that parsed['one'] is a str, not a dict. As it happens, though, that string itself represents "embedded" JSON data.
To replace the embedded data with its parsed result, simply access the data, use the same parsing technique, and proceed from there (e.g. by updating the original result in place):
>>> parsed['one'] = json.loads(parsed['one'])
>>> parsed
{'one': {'two': 'three', 'backslash': '\\'}}
Note that the '\\' part here is the representation of a string containing one actual backslash, not two. This is following the usual Python rules for string escapes, which brings us to...
JSON escaping vs. Python string literal escaping
Sometimes people get confused when trying to test code that involves parsing JSON, and supply input as an incorrect string literal in the Python source code. This especially happens when trying to test code that needs to work with embedded JSON.
The issue is that the JSON format and the string literal format each have separate policies for escaping data. Python will process escapes in the string literal in order to create the string, which then still needs to contain escape sequences used by the JSON format.
In the above example, I used input at the interpreter prompt to show the example data, in order to avoid confusion with escaping. Here is one analogous example using a string literal in the source:
>>> json.loads('{"one": "{\\"two\\": \\"three\\", \\"backslash\\": \\"\\\\\\\\\\"}"}')
{'one': '{"two": "three", "backslash": "\\\\"}'}
To use a double-quoted string literal instead, double-quotes in the string literal also need to be escaped. Thus:
>>> json.loads("{\"one\": \"{\\\"two\\\": \\\"three\\\", \\\"backslash\\\": \\\"\\\\\\\\\\\"}\"}")
{'one': '{"two": "three", "backslash": "\\\\"}'}
Each sequence of \\\" in the input becomes \" in the actual JSON data, which becomes " (embedded within a string) when parsed by the JSON parser. Similarly, \\\\\\\\\\\" (five pairs of backslashes, then an escaped quote) becomes \\\\\" (five backslashes and a quote; equivalently, two pairs of backslashes, then an escaped quote) in the actual JSON data, which becomes \\" (two backslashes and a quote) when parsed by the JSON parser, which becomes \\\\" (two escaped backslashes and a quote) in the string representation of the parsed result (since now, the quote does not need escaping, as Python can use single quotes for the string; but the backslashes still do).
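One way to sidestep a layer of escaping when testing, offered here as a sketch rather than a recommendation from the original answer, is a raw string literal: with the r prefix, Python leaves the backslashes as written, so only JSON's own escaping remains.

```python
import json

# With the r prefix, Python leaves backslashes alone, so this literal holds
# exactly the JSON text from the example file above.
raw_text = r'{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}'

outer = json.loads(raw_text)
inner = json.loads(outer['one'])   # parse the embedded JSON as a second step
print(len(inner['backslash']))     # length of the final string
```

This only works when the JSON text contains no single quote (or whichever quote delimits the literal), and a raw literal cannot end in an odd number of backslashes.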
Simple customization
Aside from the strict option, the keyword options available for json.load and json.loads should be callbacks. The parser will call them, passing in portions of the data, and use whatever is returned to create the overall result.
The "parse" hooks are fairly self-explanatory. For example, we can specify to convert floating-point values to decimal.Decimal instances instead of using the native Python float:
>>> import decimal
>>> json.loads('123.4', parse_float=decimal.Decimal)
Decimal('123.4')
or use floats for every value, even if they could be converted to integer instead:
>>> json.loads('123', parse_int=float)
123.0
or refuse to convert JSON's representations of special floating-point values:
>>> def reject_special_floats(value):
...     raise ValueError
...
>>> json.loads('Infinity')
inf
>>> json.loads('Infinity', parse_constant=reject_special_floats)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "<stdin>", line 2, in reject_special_floats
ValueError
Customization example using object_hook and object_pairs_hook
object_hook and object_pairs_hook can be used to control what the parser does when given a JSON object, rather than creating a Python dict.
A supplied object_pairs_hook will be called with one argument, which is a list of the key-value pairs that would otherwise be used for the dict. It should return the desired dict or other result:
>>> def process_object_pairs(items):
...     return {k: f'processed {v}' for k, v in items}
...
>>> json.loads('{"one": 1, "two": 2}', object_pairs_hook=process_object_pairs)
{'one': 'processed 1', 'two': 'processed 2'}
A supplied object_hook will instead be called with the dict that would otherwise be created, and the result will substitute:
>>> def make_items_list(obj):
...     return list(obj.items())
...
>>> json.loads('{"one": 1, "two": 2}', object_hook=make_items_list)
[('one', 1), ('two', 2)]
If both are supplied, the object_hook will be ignored and only the object_pairs_hook will be used.
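A small sketch of that priority rule, using two made-up hooks (pairs_hook and dict_hook) that tag their result so it is visible which one ran:

```python
import json

def pairs_hook(pairs):
    # Receives a list of (key, value) tuples for each JSON object.
    return ('pairs', pairs)

def dict_hook(obj):
    # Would receive a dict, but is not expected to be called here.
    return ('dict', obj)

result = json.loads('{"one": 1, "two": 2}',
                    object_hook=dict_hook,
                    object_pairs_hook=pairs_hook)
print(result)
```

The tag in the result shows that only the pairs hook was consulted.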
Text encoding issues and bytes/unicode confusion
JSON is fundamentally a text format. Input data should be converted from raw bytes to text first, using an appropriate encoding, before the file is parsed.
In 3.x (since 3.6), loading from a bytes object is supported. The encoding is detected automatically and should be UTF-8, UTF-16 or UTF-32, with UTF-8 being the usual case:
>>> json.loads('"text"')
'text'
>>> json.loads(b'"text"')
'text'
>>> json.loads('"\xff"') # Unicode code point 255
'ÿ'
>>> json.loads(b'"\xff"') # Not valid UTF-8 encoded data!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 343, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
UTF-8 is generally considered the default for JSON. While the original specification, ECMA-404, does not mandate an encoding (it only describes "JSON text", rather than JSON files or documents), RFC 8259 demands:
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].
In such a "closed ecosystem" (i.e. for local documents that are encoded differently and will not be shared publicly), explicitly apply the appropriate encoding first:
>>> json.loads(b'"\xff"'.decode('iso-8859-1'))
'ÿ'
Similarly, JSON files should be opened in text mode, not binary mode. If the file uses a different encoding, simply specify that when opening it:
with open('example.json', encoding='iso-8859-1') as f:
    print(json.load(f))
In 2.x, strings and byte-sequences were not properly distinguished, which resulted in a lot of problems and confusion particularly when working with JSON.
Actively maintained 2.x codebases (please note that 2.x itself has not been maintained since Jan 1, 2020) should consistently use unicode values to represent text and str values to represent raw data (bytes is an alias for str in 2.6 and later), and accept that the repr of unicode values will have a u prefix (after all, the code should be concerned with what the value actually is, not what it looks like at the REPL).
Historical note: simplejson
simplejson is simply the standard library json module, but maintained and developed externally. It was originally created before JSON support was added to the Python standard library. In 2.6, the simplejson project was incorporated into the standard library as json. Current development maintains compatibility back to 2.5, although there is also an unmaintained, legacy branch that should support as far back as 2.2.
The standard library generally uses quite old versions of the package; for example, my 3.8.10 installation reports
>>> json.__version__
'2.0.9'
whereas the most recent release (as of this writing) is 3.18.1. (The tagged releases in the GitHub repository only go as far back as 3.8.2; the 2.0.9 release dates to 2009.)
I have as yet been unable to find comprehensive documentation of which simplejson versions correspond to which Python releases.
Related
My Python program receives JSON data, and I need to get bits of information out of it. How can I parse the data and use the result? I think I need to use json.loads for this task, but I can't understand how to do it.
For example, suppose that I have jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'. Given this JSON, and an input of "two", how can I get the corresponding data, "2"?
Beware that .load is for files; .loads is for strings. See also: Reading JSON from a file.
Occasionally, a JSON document is intended to represent tabular data. If you have something like this and are trying to use it with Pandas, see Python - How to convert JSON File to Dataframe.
Some data superficially looks like JSON, but is not JSON.
For example, sometimes the data comes from applying repr to native Python data structures. The result may use quotes differently, use title-cased True and False rather than JSON-mandated true and false, etc. For such data, see Convert a String representation of a Dictionary to a dictionary or How to convert string representation of list to a list.
Another common variant format puts separate valid JSON-formatted data on each line of the input. (Proper JSON cannot be parsed line by line, because it uses balanced brackets that can be many lines apart.) This format is called JSONL. See Loading JSONL file as JSON objects.
Sometimes JSON data from a web source is padded with some extra text. In some contexts, this works around security restrictions in browsers. This is called JSONP and is described at What is JSONP, and why was it created?. In other contexts, the extra text implements a security measure, as described at Why does Google prepend while(1); to their JSON responses?. Either way, handling this in Python is straightforward: simply identify and remove the extra text, and proceed as before.
Very simple:
import json
data = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print(data['two']) # or `print data['two']` in Python 2
Sometimes your json is not a string. For example if you are getting a json from a url like this:
j = urllib2.urlopen('http://site.com/data.json')
you will need to use json.load, not json.loads:
j_obj = json.load(j)
(it is easy to forget: the 's' is for 'string')
For URL or file, use json.load(). For string with .json content, use json.loads().
#! /usr/bin/python
import json
# from pprint import pprint
json_file = 'my_cube.json'
cube = '1'
with open(json_file) as json_data:
data = json.load(json_data)
# pprint(data)
print "Dimension: ", data['cubes'][cube]['dim']
print "Measures: ", data['cubes'][cube]['meas']
Following is simple example that may help you:
json_string = """
{
"pk": 1,
"fa": "cc.ee",
"fb": {
"fc": "",
"fd_id": "12345"
}
}"""
import json
data = json.loads(json_string)
if data["fa"] == "cc.ee":
data["fb"]["new_key"] = "cc.ee was present!"
print json.dumps(data)
The output for the above code will be:
{"pk": 1, "fb": {"new_key": "cc.ee was present!", "fd_id": "12345",
"fc": ""}, "fa": "cc.ee"}
Note that you can set the ident argument of dump to print it like so (for example,when using print json.dumps(data , indent=4)):
{
"pk": 1,
"fb": {
"new_key": "cc.ee was present!",
"fd_id": "12345",
"fc": ""
},
"fa": "cc.ee"
}
Parsing the data
Using the standard library json module
For string data, use json.loads:
import json
text = '{"one" : "1", "two" : "2", "three" : "3"}'
parsed = json.loads(example)
For data that comes from a file, or other file-like object, use json.load:
import io, json
# create an in-memory file-like object for demonstration purposes.
text = '{"one" : "1", "two" : "2", "three" : "3"}'
stream = io.StringIO(text)
parsed = json.load(stream) # load, not loads
It's easy to remember the distinction: the trailing s of loads stands for "string". (This is, admittedly, probably not in keeping with standard modern naming practice.)
Note that json.load does not accept a file path:
>>> json.load('example.txt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
Both of these functions provide the same set of additional options for customizing the parsing process. Since 3.6, the options are keyword-only.
For string data, it is also possible to use the JSONDecoder class provided by the library, like so:
import json
text = '{"one" : "1", "two" : "2", "three" : "3"}'
decoder = json.JSONDecoder()
parsed = decoder.decode(text)
The same keyword parameters are available, but now they are passed to the constructor of the JSONDecoder, not the .decode method. The main advantage of the class is that it also provides a .raw_decode method, which will ignore extra data after the end of the JSON:
import json
text_with_junk = '{"one" : "1", "two" : "2", "three" : "3"} ignore this'
decoder = json.JSONDecoder()
# `amount` will count how many characters were parsed.
parsed, amount = decoder.raw_decode(text_with_junk)
Using requests or other implicit support
When data is retrieved from the Internet using the popular third-party requests library, it is not necessary to extract .text (or create any kind of file-like object) from the Response object and parse it separately. Instead, the Response object directly provides a .json method which will do this parsing:
import requests
response = requests.get('https://www.example.com')
parsed = response.json()
This method accepts the same keyword parameters as the standard library json functionality.
Using the results
Parsing by any of the above methods will result, by default, in a perfectly ordinary Python data structure, composed of the perfectly ordinary built-in types dict, list, str, int, float, bool (JSON true and false become Python constants True and False) and NoneType (JSON null becomes the Python constant None).
Working with this result, therefore, works the same way as if the same data had been obtained using any other technique.
Thus, to continue the example from the question:
>>> parsed
{'one': '1', 'two': '2', 'three': '3'}
>>> parsed['two']
'2'
I emphasize this because many people seem to expect that there is something special about the result; there is not. It's just a nested data structure, though dealing with nesting is sometimes difficult to understand.
Consider, for example, a parsed result like result = {'a': [{'b': 'c'}, {'d': 'e'}]}. To get 'e' requires following the appropriate steps one at a time: looking up the a key in the dict gives a list [{'b': 'c'}, {'d': 'e'}]; the second element of that list (index 1) is {'d': 'e'}; and looking up the 'd' key in there gives the 'e' value. Thus, the corresponding code is result['a'][1]['d']: each indexing step is applied in order.
See also How can I extract a single value from a nested data structure (such as from parsing JSON)?.
Sometimes people want to apply more complex selection criteria, iterate over nested lists, filter or transform the data, etc. These are more complex topics that will be dealt with elsewhere.
Common sources of confusion
JSON lookalikes
Before attempting to parse JSON data, it is important to ensure that the data actually is JSON. Check the JSON format specification to verify what is expected. Key points:
The document represents one value (normally a JSON "object", which corresponds to a Python dict, but every other type represented by JSON is permissible). In particular, it does not have a separate entry on each line - that's JSONL.
The data is human-readable after using a standard text encoding (normally UTF-8). Almost all of the text is contained within double quotes, and uses escape sequences where appropriate.
Dealing with embedded data
Consider an example file that contains:
{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}
The backslashes here are for JSON's escape mechanism.
When parsed with one of the above approaches, we get a result like:
>>> example = input()
{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}
>>> parsed = json.loads(example)
>>> parsed
{'one': '{"two": "three", "backslash": "\\\\"}'}
Notice that parsed['one'] is a str, not a dict. As it happens, though, that string itself represents "embedded" JSON data.
To replace the embedded data with its parsed result, simply access the data, use the same parsing technique, and proceed from there (e.g. by updating the original result in place):
>>> parsed['one'] = json.loads(parsed['one'])
>>> parsed
{'one': {'two': 'three', 'backslash': '\\'}}
Note that the '\\' part here is the representation of a string containing one actual backslash, not two. This is following the usual Python rules for string escapes, which brings us to...
JSON escaping vs. Python string literal escaping
Sometimes people get confused when trying to test code that involves parsing JSON, and supply input as an incorrect string literal in the Python source code. This especially happens when trying to test code that needs to work with embedded JSON.
The issue is that the JSON format and the string literal format each have separate policies for escaping data. Python will process escapes in the string literal in order to create the string, which then still needs to contain escape sequences used by the JSON format.
In the above example, I used input at the interpreter prompt to show the example data, in order to avoid confusion with escaping. Here is one analogous example using a string literal in the source:
>>> json.loads('{"one": "{\\"two\\": \\"three\\", \\"backslash\\": \\"\\\\\\\\\\"}"}')
{'one': '{"two": "three", "backslash": "\\\\"}'}
To use a double-quoted string literal instead, double-quotes in the string literal also need to be escaped. Thus:
>>> json.loads('{\"one\": \"{\\\"two\\\": \\\"three\\\", \\\"backslash\\\": \\\"\\\\\\\\\\\"}\"}')
{'one': '{"two": "three", "backslash": "\\\\"}'}
Each sequence of \\\" in the input becomes \" in the actual JSON data, which becomes " (embedded within a string) when parsed by the JSON parser. Similarly, \\\\\\\\\\\" (five pairs of backslashes, then an escaped quote) becomes \\\\\" (five backslashes and a quote; equivalently, two pairs of backslashes, then an escaped quote) in the actual JSON data, which becomes \\" (two backslashes and a quote) when parsed by the JSON parser, which becomes \\\\" (two escaped backslashes and a quote) in the string representation of the parsed result (since now, the quote does not need escaping, as Python can use single quotes for the string; but the backslashes still do).
Simple customization
Aside from the strict option, the keyword options available for json.load and json.loads should be callbacks. The parser will call them, passing in portions of the data, and use whatever is returned to create the overall result.
The "parse" hooks are fairly self-explanatory. For example, we can specify to convert floating-point values to decimal.Decimal instances instead of using the native Python float:
>>> import decimal
>>> json.loads('123.4', parse_float=decimal.Decimal)
Decimal('123.4')
or use floats for every value, even if they could be converted to integer instead:
>>> json.loads('123', parse_int=float)
123.0
or refuse to convert JSON's representations of special floating-point values:
>>> def reject_special_floats(value):
... raise ValueError
...
>>> json.loads('Infinity')
inf
>>> json.loads('Infinity', parse_constant=reject_special_floats)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
return cls(**kw).decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
File "<stdin>", line 2, in reject_special_floats
ValueError
Customization example using object_hook and object_pairs_hook
object_hook and object_pairs_hook can be used to control what the parser does when given a JSON object, rather than creating a Python dict.
A supplied object_pairs_hook will be called with one argument, which is a list of the key-value pairs that would otherwise be used for the dict. It should return the desired dict or other result:
>>> def process_object_pairs(items):
... return {k: f'processed {v}' for k, v in items}
...
>>> json.loads('{"one": 1, "two": 2}', object_pairs_hook=process_object_pairs)
{'one': 'processed 1', 'two': 'processed 2'}
A supplied object_hook will instead be called with the dict that would otherwise be created, and the result will substitute:
>>> def make_items_list(obj):
... return list(obj.items())
...
>>> json.loads('{"one": 1, "two": 2}', object_hook=make_items_list)
[('one', 1), ('two', 2)]
If both are supplied, the object_hook will be ignored and only the object_items_hook will be used.
Text encoding issues and bytes/unicode confusion
JSON is fundamentally a text format. Input data should be converted from raw bytes to text first, using an appropriate encoding, before the file is parsed.
In 3.x, loading from a bytes object is supported, and will implicitly use UTF-8 encoding:
>>> json.loads('"text"')
'text'
>>> json.loads(b'"text"')
'text'
>>> json.loads('"\xff"') # Unicode code point 255
'ÿ'
>>> json.loads(b'"\xff"') # Not valid UTF-8 encoded data!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/json/__init__.py", line 343, in loads
s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
UTF-8 is generally considered the default for JSON. While the original specification, ECMA-404 does not mandate an encoding (it only describes "JSON text", rather than JSON files or documents), RFC 8259 demands:
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].
In such a "closed ecosystem" (i.e. for local documents that are encoded differently and will not be shared publicly), explicitly apply the appropriate encoding first:
>>> json.loads(b'"\xff"'.decode('iso-8859-1'))
'ÿ'
Similarly, JSON files should be opened in text mode, not binary mode. If the file uses a different encoding, simply specify that when opening it:
with open('example.json', encoding='iso-8859-1') as f:
print(json.load(f))
In 2.x, strings and byte-sequences were not properly distinguished, which resulted in a lot of problems and confusion particularly when working with JSON.
Actively maintained 2.x codebases (please note that 2.x itself has not been maintained since Jan 1, 2020) should consistently use unicode values to represent text and str values to represent raw data (str is an alias for bytes in 2.x), and accept that the repr of unicode values will have a u prefix (after all, the code should be concerned with what the value actually is, not what it looks like at the REPL).
Historical note: simplejson
simplejson is simply the standard library json module, but maintained and developed externally. It was originally created before JSON support was added to the Python standard library. In 2.6, the simplejson project was incorporated into the standard library as json. Current development maintains compatibility back to 2.5, although there is also an unmaintained, legacy branch that should support as far back as 2.2.
The standard library generally uses quite old versions of the package; for example, my 3.8.10 installation reports
>>> json.__version__
'2.0.9'
whereas the most recent release (as of this writing) is 3.18.1. (The tagged releases in the Github repository only go as far back as 3.8.2; the 2.0.9 release dates to 2009.
I have as yet been unable to find comprehensive documentation of which simplejson versions correspond to which Python releases.
Very simple:
import json
data = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print(data['two']) # or `print data['two']` in Python 2
Sometimes your JSON is not a string. For example, if you are getting JSON from a URL like this:
j = urllib2.urlopen('http://site.com/data.json')  # Python 2; in Python 3, use urllib.request.urlopen
you will need to use json.load, not json.loads:
j_obj = json.load(j)
(it is easy to forget: the 's' is for 'string')
For a URL response or a file, use json.load(). For a string containing JSON, use json.loads().
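To make the load/loads split concrete without touching the network, here is a minimal sketch. The io.BytesIO object is only a stand-in I made up for an HTTP response or a file opened in binary mode, and it assumes Python 3.6 or later (where the json module accepts bytes input):

```python
import io
import json

# Stand-in for an HTTP response or a binary file: json.load only needs
# an object with a .read() method.
fake_response = io.BytesIO(b'{"one": "1", "two": "2", "three": "3"}')

from_stream = json.load(fake_response)      # file-like object -> load
from_string = json.loads('{"two": "2"}')    # str -> loads

print(from_stream['two'])
print(from_string['two'])
```

Either way, the result is an ordinary dict that you index as usual.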
#!/usr/bin/env python3
import json
# from pprint import pprint
json_file = 'my_cube.json'
cube = '1'
with open(json_file) as json_data:
    data = json.load(json_data)
# pprint(data)
print("Dimension:", data['cubes'][cube]['dim'])
print("Measures:", data['cubes'][cube]['meas'])
The following simple example may help:
json_string = """
{
    "pk": 1,
    "fa": "cc.ee",
    "fb": {
        "fc": "",
        "fd_id": "12345"
    }
}"""
import json
data = json.loads(json_string)
if data["fa"] == "cc.ee":
    data["fb"]["new_key"] = "cc.ee was present!"
print(json.dumps(data))
The output for the above code will be (key order as produced by Python 3.7+, where dicts preserve insertion order):
{"pk": 1, "fa": "cc.ee", "fb": {"fc": "", "fd_id": "12345", "new_key": "cc.ee was present!"}}
Note that you can set the indent argument of dumps to pretty-print the result (for example, when using print(json.dumps(data, indent=4))):
{
    "pk": 1,
    "fa": "cc.ee",
    "fb": {
        "fc": "",
        "fd_id": "12345",
        "new_key": "cc.ee was present!"
    }
}
Parsing the data
Using the standard library json module
For string data, use json.loads:
import json
text = '{"one" : "1", "two" : "2", "three" : "3"}'
parsed = json.loads(text)
For data that comes from a file, or other file-like object, use json.load:
import io, json
# create an in-memory file-like object for demonstration purposes.
text = '{"one" : "1", "two" : "2", "three" : "3"}'
stream = io.StringIO(text)
parsed = json.load(stream) # load, not loads
It's easy to remember the distinction: the trailing s of loads stands for "string". (This is, admittedly, probably not in keeping with standard modern naming practice.)
Note that json.load does not accept a file path:
>>> json.load('example.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
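The fix for that error is to open the file yourself and hand the open file object to json.load. A minimal sketch follows; the temporary directory and the file name example.json are invented purely so that the snippet is self-contained:

```python
import json
import os
import tempfile

# Hypothetical file, created only so the example can run anywhere.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.json')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('{"one": "1", "two": "2", "three": "3"}')

    # Pass the open file object, not the path string, to json.load.
    with open(path, encoding='utf-8') as f:
        parsed_from_file = json.load(f)

print(parsed_from_file['two'])
```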
Both of these functions provide the same set of additional options for customizing the parsing process. Since 3.6, the options are keyword-only.
For string data, it is also possible to use the JSONDecoder class provided by the library, like so:
import json
text = '{"one" : "1", "two" : "2", "three" : "3"}'
decoder = json.JSONDecoder()
parsed = decoder.decode(text)
The same keyword parameters are available, but now they are passed to the constructor of the JSONDecoder, not the .decode method. The main advantage of the class is that it also provides a .raw_decode method, which will ignore extra data after the end of the JSON:
import json
text_with_junk = '{"one" : "1", "two" : "2", "three" : "3"} ignore this'
decoder = json.JSONDecoder()
# `amount` will count how many characters were parsed.
parsed, amount = decoder.raw_decode(text_with_junk)
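As a rough sketch of what raw_decode makes possible, the returned end index can be fed back in as the starting position to pull several concatenated JSON values out of one string. The helper name iter_json_values is made up for this example and is not part of the json module. Note that raw_decode does not skip leading whitespace itself, so the sketch does that by hand:

```python
import json


def iter_json_values(text):
    """Yield each JSON value found back-to-back in `text` (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        # The second return value is the index where this value ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


values = list(iter_json_values('{"a": 1} [2, 3]\n"four"'))
print(values)
```

Anything that is not valid JSON at the current position still raises json.JSONDecodeError, so trailing junk is only tolerated if you stop after the first value.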
Using requests or other implicit support
When data is retrieved from the Internet using the popular third-party requests library, it is not necessary to extract .text (or create any kind of file-like object) from the Response object and parse it separately. Instead, the Response object directly provides a .json method which will do this parsing:
import requests
response = requests.get('https://www.example.com')
parsed = response.json()
This method accepts the same keyword parameters as the standard library json functionality.
Using the results
Parsing by any of the above methods will result, by default, in a perfectly ordinary Python data structure, composed of the perfectly ordinary built-in types dict, list, str, int, float, bool (JSON true and false become Python constants True and False) and NoneType (JSON null becomes the Python constant None).
Working with this result, therefore, works the same way as if the same data had been obtained using any other technique.
Thus, to continue the example from the question:
>>> parsed
{'one': '1', 'two': '2', 'three': '3'}
>>> parsed['two']
'2'
I emphasize this because many people seem to expect that there is something special about the result; there is not. It's just a nested data structure, though dealing with nesting is sometimes difficult to understand.
Consider, for example, a parsed result like result = {'a': [{'b': 'c'}, {'d': 'e'}]}. To get 'e' requires following the appropriate steps one at a time: looking up the 'a' key in the dict gives the list [{'b': 'c'}, {'d': 'e'}]; the second element of that list (index 1) is {'d': 'e'}; and looking up the 'd' key in there gives the 'e' value. Thus, the corresponding code is result['a'][1]['d']: each indexing step is applied in order.
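Spelled out as code, the same walk looks like this. The intermediate variable names are only there for illustration:

```python
result = {'a': [{'b': 'c'}, {'d': 'e'}]}

inner_list = result['a']      # the list stored under 'a'
second_item = inner_list[1]   # the second dict in that list
value = second_item['d']      # the value stored under 'd'

# The chained form performs exactly the same three lookups in order.
assert value == result['a'][1]['d']

# Ordinary dict and list tools work too, e.g. collecting every key
# that appears in the inner dicts.
all_keys = [key for item in result['a'] for key in item]
print(value, all_keys)
```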
See also How can I extract a single value from a nested data structure (such as from parsing JSON)?.
Sometimes people want to apply more complex selection criteria, iterate over nested lists, filter or transform the data, etc. These are more complex topics that will be dealt with elsewhere.
Common sources of confusion
JSON lookalikes
Before attempting to parse JSON data, it is important to ensure that the data actually is JSON. Check the JSON format specification to verify what is expected. Key points:
The document represents one value (normally a JSON "object", which corresponds to a Python dict, but every other type represented by JSON is permissible). In particular, it does not have a separate entry on each line - that's JSONL.
The data is human-readable after using a standard text encoding (normally UTF-8). Almost all of the text is contained within double quotes, and uses escape sequences where appropriate.
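The "one value per document" point is easy to check in practice. In this small sketch (the sample text is made up), parsing JSONL as a single document fails, while parsing it line by line works:

```python
import json

jsonl_text = '{"id": 1}\n{"id": 2}\n'

# Parsing the whole text as one JSON document fails...
try:
    json.loads(jsonl_text)
    whole_document_ok = True
except json.JSONDecodeError:
    whole_document_ok = False

# ...but each non-empty line is a valid JSON document on its own.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

print(whole_document_ok, records)
```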
Dealing with embedded data
Consider an example file that contains:
{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}
The backslashes here are for JSON's escape mechanism.
When parsed with one of the above approaches, we get a result like:
>>> example = input()
{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}
>>> parsed = json.loads(example)
>>> parsed
{'one': '{"two": "three", "backslash": "\\\\"}'}
Notice that parsed['one'] is a str, not a dict. As it happens, though, that string itself represents "embedded" JSON data.
To replace the embedded data with its parsed result, simply access the data, use the same parsing technique, and proceed from there (e.g. by updating the original result in place):
>>> parsed['one'] = json.loads(parsed['one'])
>>> parsed
{'one': {'two': 'three', 'backslash': '\\'}}
Note that the '\\' part here is the representation of a string containing one actual backslash, not two. This is following the usual Python rules for string escapes, which brings us to...
JSON escaping vs. Python string literal escaping
Sometimes people get confused when trying to test code that involves parsing JSON, and supply input as an incorrect string literal in the Python source code. This especially happens when trying to test code that needs to work with embedded JSON.
The issue is that the JSON format and the string literal format each have separate policies for escaping data. Python will process escapes in the string literal in order to create the string, which then still needs to contain escape sequences used by the JSON format.
In the above example, I used input at the interpreter prompt to show the example data, in order to avoid confusion with escaping. Here is one analogous example using a string literal in the source:
>>> json.loads('{"one": "{\\"two\\": \\"three\\", \\"backslash\\": \\"\\\\\\\\\\"}"}')
{'one': '{"two": "three", "backslash": "\\\\"}'}
To use a double-quoted string literal instead, double-quotes in the string literal also need to be escaped. Thus:
>>> json.loads("{\"one\": \"{\\\"two\\\": \\\"three\\\", \\\"backslash\\\": \\\"\\\\\\\\\\\"}\"}")
{'one': '{"two": "three", "backslash": "\\\\"}'}
Each sequence of \\\" in the input becomes \" in the actual JSON data, which becomes " (embedded within a string) when parsed by the JSON parser. Similarly, \\\\\\\\\\\" (five pairs of backslashes, then an escaped quote) becomes \\\\\" (five backslashes and a quote; equivalently, two pairs of backslashes, then an escaped quote) in the actual JSON data, which becomes \\" (two backslashes and a quote) when parsed by the JSON parser, which becomes \\\\" (two escaped backslashes and a quote) in the string representation of the parsed result (since now, the quote does not need escaping, as Python can use single quotes for the string; but the backslashes still do).
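One way to sidestep the double escaping when writing test data is a raw string literal, where Python leaves the backslashes alone. This is only a sketch, and it works here because the text contains no single quotes and does not end in a backslash:

```python
import json

# Raw string literal: the text is exactly the JSON document from the
# example file above, with JSON's own escapes left untouched.
example = r'{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}'

outer = json.loads(example)
inner = json.loads(outer['one'])

print(type(outer['one']).__name__)   # the embedded JSON is still a str here
print(len(inner['backslash']))       # length of the decoded backslash value
```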
Simple customization
Aside from the strict option, the keyword options available for json.load and json.loads should be callbacks. The parser will call them, passing in portions of the data, and use whatever is returned to create the overall result.
The "parse" hooks are fairly self-explanatory. For example, we can specify to convert floating-point values to decimal.Decimal instances instead of using the native Python float:
>>> import decimal
>>> json.loads('123.4', parse_float=decimal.Decimal)
Decimal('123.4')
or use floats for every value, even if they could be converted to integer instead:
>>> json.loads('123', parse_int=float)
123.0
or refuse to convert JSON's representations of special floating-point values:
>>> def reject_special_floats(value):
...     raise ValueError
...
>>> json.loads('Infinity')
inf
>>> json.loads('Infinity', parse_constant=reject_special_floats)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "<stdin>", line 2, in reject_special_floats
ValueError
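The strict option mentioned at the start of this section is the one non-callback setting. As a small sketch of its effect (the sample text is invented), it controls whether unescaped control characters, such as a literal tab, are accepted inside JSON strings:

```python
import json

# The Python escape \t puts a real tab character inside the JSON string,
# which the JSON format does not allow unescaped.
text_with_control_char = '"a\tb"'

try:
    json.loads(text_with_control_char)   # strict=True is the default
    strict_accepted = True
except json.JSONDecodeError:
    strict_accepted = False

lenient = json.loads(text_with_control_char, strict=False)

print(strict_accepted, repr(lenient))
```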
Customization example using object_hook and object_pairs_hook
object_hook and object_pairs_hook can be used to control what the parser does when given a JSON object, rather than creating a Python dict.
A supplied object_pairs_hook will be called with one argument, which is a list of the key-value pairs that would otherwise be used for the dict. It should return the desired dict or other result:
>>> def process_object_pairs(items):
...     return {k: f'processed {v}' for k, v in items}
...
>>> json.loads('{"one": 1, "two": 2}', object_pairs_hook=process_object_pairs)
{'one': 'processed 1', 'two': 'processed 2'}
A supplied object_hook will instead be called with the dict that would otherwise be created, and the result will substitute:
>>> def make_items_list(obj):
...     return list(obj.items())
...
>>> json.loads('{"one": 1, "two": 2}', object_hook=make_items_list)
[('one', 1), ('two', 2)]
If both are supplied, the object_hook will be ignored and only the object_pairs_hook will be used.
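One practical use of object_pairs_hook, sketched here with a made-up helper name, is rejecting duplicate keys. By default the parser silently keeps the last value for a repeated key, and only the list of pairs still contains the information needed to notice the repeat:

```python
import json


def reject_duplicate_keys(pairs):
    """Build a dict from the pairs, refusing repeated keys (hypothetical helper)."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f'duplicate key: {key!r}')
        result[key] = value
    return result


default_behaviour = json.loads('{"a": 1, "a": 2}')   # last value wins

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicate_keys)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(default_behaviour, duplicate_rejected)
```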
Text encoding issues and bytes/unicode confusion
JSON is fundamentally a text format. Input data should be converted from raw bytes to text first, using an appropriate encoding, before the file is parsed.
In 3.x, loading from a bytes object is supported, and will implicitly use UTF-8 encoding:
>>> json.loads('"text"')
'text'
>>> json.loads(b'"text"')
'text'
>>> json.loads('"\xff"') # Unicode code point 255
'ÿ'
>>> json.loads(b'"\xff"') # Not valid UTF-8 encoded data!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 343, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
UTF-8 is generally considered the default for JSON. While the original specification, ECMA-404, does not mandate an encoding (it only describes "JSON text", rather than JSON files or documents), RFC 8259 demands:
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].
In such a "closed ecosystem" (i.e. for local documents that are encoded differently and will not be shared publicly), explicitly apply the appropriate encoding first:
>>> json.loads(b'"\xff"'.decode('iso-8859-1'))
'ÿ'
Similarly, JSON files should be opened in text mode, not binary mode. If the file uses a different encoding, simply specify that when opening it:
with open('example.json', encoding='iso-8859-1') as f:
    print(json.load(f))
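To see why the encoding argument matters, here is a self-contained sketch. The temporary file and its name latin1.json are invented for the demonstration: the same file loads fine with the right encoding and fails with the wrong one.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'latin1.json')
    # Hypothetical legacy file: '\xff' is code point 255, written as the
    # single byte 0xff in ISO-8859-1.
    with open(path, 'w', encoding='iso-8859-1') as f:
        f.write('{"name": "\xff"}')

    with open(path, encoding='iso-8859-1') as f:
        decoded_correctly = json.load(f)

    # The lone byte 0xff is not valid UTF-8, so reading the file fails.
    try:
        with open(path, encoding='utf-8') as f:
            json.load(f)
        utf8_worked = True
    except UnicodeDecodeError:
        utf8_worked = False
```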
I have a string, which I evaluate as:
import ast
def parse(s):
    return ast.literal_eval(s)
print parse(string)
{'_meta': {'name': 'foo', 'version': 0.2},
'clientId': 'google.com',
'clip': False,
'cts': 1444088114,
'dev': 0,
'uuid': '4375d784-809f-4243-886b-5dd2e6d2c3b7'}
But when I use jsonlint.com to validate the above JSON, it reports a schema error.
If I try json.loads(str(parse(string))), I see the following error:
ValueError: Expecting property name: line 1 column 1 (char 1)
I am basically trying to convert this JSON to Avro (see How to covert json string to avro in python?).
ast.literal_eval() loads Python syntax. It won't parse JSON, that's what the json.loads() function is for.
Converting a Python object to a string with str() is still Python syntax, not JSON syntax, that is what json.dumps() is for.
JSON is not Python syntax. Python uses None where JSON uses null; Python uses True and False for booleans, JSON uses true and false. JSON strings always use " double quotes, Python uses either single or double, depending on the contents. When using Python 2, strings contain bytes unless you use unicode objects (recognisable by the u prefix on their literal notation), but JSON strings are fully Unicode aware. Python will use \xhh for Unicode characters in the Latin-1 range outside ASCII and \Uhhhhhhhh for non-BMP unicode points, but JSON only ever uses \uhhhh codes. JSON integers should generally be viewed as limited to the range representable by the C double type (since JavaScript numbers are always floating point numbers), Python integers have no limits other than what fits in your memory.
As such, JSON and Python syntax are not interchangeable. You cannot use str() on a Python object and expect to parse it as JSON. You cannot use json.dumps() and parse it with ast.literal_eval(). Don't confuse the two.
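A short sketch makes the mismatch visible. The sample dict is made up; the point is that each parser round-trips its own format and rejects the other one:

```python
import ast
import json

data = {'name': 'foo', 'clip': False, 'dev': None}

python_text = str(data)        # Python syntax: single quotes, False, None
json_text = json.dumps(data)   # JSON syntax: double quotes, false, null

# Each parser round-trips its own format...
assert ast.literal_eval(python_text) == data
assert json.loads(json_text) == data

# ...and rejects the other one.
try:
    json.loads(python_text)
    json_accepts_python = True
except json.JSONDecodeError:
    json_accepts_python = False

try:
    ast.literal_eval(json_text)
    python_accepts_json = True
except ValueError:
    python_accepts_json = False

print(python_text)
print(json_text)
print(json_accepts_python, python_accepts_json)
```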
I have a problem. I made an API that accepts a POST request and then responds with JSON.
The POST data was encoded; I accepted it and read the data correctly, then responded with the new data in JSON format.
But when I returned the JSON, I found that the strings were in escaped Unicode form, e.g.
{
    'a':'\u00e3\u0080'
}
but I want to get output like this:
{
    'a':"ã"
}
I want this format because I found that the escaped Unicode form didn't work well in IE8.
Yes, IE8.
What can I do about this issue?
Thanks!
If you're using the standard library json module, specifying ensure_ascii=False gives you what you want.
For example:
>>> print json.dumps({'a': u'ã'})
{"a": "\u00e3"}
>>> print json.dumps({'a': u'ã'}, ensure_ascii=False)
{"a": "ã"}
According to json.dump documentation:
If ensure_ascii is True (the default), all non-ASCII characters in the
output are escaped with \uXXXX sequences, and the result is a str
instance consisting of ASCII characters only. If ensure_ascii is
False, some chunks written to fp may be unicode instances. This
usually happens because the input contains unicode strings or the
encoding parameter is used. ...
BTW, what do you mean by "unicode format didn't work well in IE8"?
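For reference, here is roughly the same comparison in Python 3, where every str is already Unicode and no u prefix is needed. It is a sketch of the two output forms, not a statement about what IE8 requires:

```python
import json

data = {'a': 'ã'}

escaped = json.dumps(data)                       # ASCII-only output
readable = json.dumps(data, ensure_ascii=False)  # keeps the character as-is

# Both forms are valid JSON and decode to the same data.
assert json.loads(escaped) == json.loads(readable) == data
```

If the readable form goes over the wire, it still has to be encoded to bytes (normally UTF-8) and the response should declare that encoding.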
I am confused about why I am not able to parse this JSON string. Similar code works fine on other JSON strings but not on this one. I am trying to parse a JSON string and extract the script from it.
Below is my code.
for step in steps:
    step_path = '/example/v1' +'/'+step
    data, stat = zk.get(step_path)
    jsonStr = data.decode("utf-8")
    print(jsonStr)
    j = json.loads(json.dumps(jsonStr))
    print(j)
    shell_script = j['script']
    print(shell_script)
So the first print(jsonStr) will print out something like this -
{"script":"#!/bin/bash\necho Hello world1\n"}
And the second print(j) will print out something like this -
{"script":"#!/bin/bash\necho Hello world1\n"}
And then the third print never happens; instead I get this error -
Traceback (most recent call last):
  File "test5.py", line 33, in <module>
    shell_script = j['script']
TypeError: string indices must be integers
So I am wondering what I am doing wrong here?
I have used the same code to parse other JSON and it works fine.
The problem is that jsonStr is a string that encodes some object in JSON, not the actual object.
You obviously knew it was a string, because you called it jsonStr. And it's proven by the fact that this line works:
jsonStr = data.decode("utf-8")
So, jsonStr is a string. Calling json.dumps on a string is perfectly legal. It doesn't matter whether that string was the JSON encoding of some object, or your last name; you can encode that string in JSON. And then you can decode that string, getting back the original string.
So, this:
j = json.loads(json.dumps(jsonStr))
… is going to give you back the exact same string as jsonStr in j. Which you still haven't decoded to the original object.
To do that, just don't do the extra encode:
j = json.loads(jsonStr)
If that isn't clear, try playing with it in an interactive terminal:
>>> obj = ['abc', {'a': 1, 'b': 2}]
>>> type(obj)
list
>>> obj[1]['b']
2
>>> j = json.dumps(obj)
>>> type(j)
str
>>> j[1]['b']
TypeError: string indices must be integers
>>> jj = json.dumps(j)
>>> type(jj)
str
>>> j
'["abc", {"a": 1, "b": 2}]'
>>> jj
'"[\\"abc\\", {\\"a\\": 1, \\"b\\": 2}]"'
>>> json.loads(j)
['abc', {'a': 1, 'b': 2}]
>>> json.loads(j) == obj
True
>>> json.loads(jj)
'["abc", {"a": 1, "b": 2}]'
>>> json.loads(jj) == j
True
>>> json.loads(jj) == obj
False
Try replacing j = json.loads(json.dumps(jsonStr)) with j = json.loads(jsonStr).
OK... So for people who are still lost because they are used to JS, this is what I understood after testing multiple use cases:
json.dumps does not make your string ready to be loaded with json.loads. It only encodes it according to the JSON spec (by adding escapes pretty much everywhere)!
json.loads will transform a correctly formatted JSON string into a Python object (typically a dictionary). It only works if the text follows the JSON spec (double quotes only, lowercase true and false, etc.).
Dumping JSON - An encoding story
Let's take an example!
>>> obj = {"foobar": True}
This is NOT JSON! This is a Python dictionary that uses Python types (like booleans).
True is not valid in the JSON spec, so in order to send this to an API you have to serialize it to REAL JSON. That's where json.dumps comes in!
>>> json.dumps({"foobar": True})
'{"foobar": true}'
See? True became true, which is real JSON. You now have a string that you can send to the real world. Good job.
Loading JSON - A decoding story
So now let's talk about json.loads.
You have a string that looks like JSON, but it's only a string, and what you want is a Python dictionary. Let's walk through the following examples:
>>> string = '{"foobar": true}'
>>> d = json.loads(string)
>>> d
{'foobar': True}
Here we have a string that contains JSON. You can use json.loads to transform this string into a dictionary and then do d["foobar"], which will return True.
So, why so many errors?
Well, if your text looks like JSON but does not really follow the JSON spec, for instance:
>>> string = "{'foobar': true}"
>>> json.loads(string)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes
BAM! This does not work because the JSON spec doesn't allow single quotes, only double ones...
If you swap the quotes to get '{"foobar": true}', then it will work.
What you have probably tried is:
string = json.loads(json.dumps("{'foobar': true}"))
This JSON is invalid (check the quotes), and moreover you'll get a string as a result. Disappointed? I know...
json.dumps does NOT repair your JSON string; it only encodes the whole string as a JSON string value (wrapping it in double quotes and escaping its contents). json.loads then simply undoes that encoding, so you are back where you started.
You have to understand that json.dumps encodes and json.loads decodes!
So what you did here is encode a string and then decode it again. But it's still a string! You haven't done anything to change that fact! If you want to get from the string to a dictionary, then you need an extra step... => a second json.loads!
Let's try that with valid JSON (no mean single quotes):
>>> obj = json.loads(json.loads(json.dumps('{"foobar": true}')))
>>> obj["foobar"]
True
The JSON string went through json.dumps and got encoded. Then it went through json.loads, where it got decoded (useless... YAY). Finally, it went through json.loads AGAIN and got transformed from a string into a dictionary. As you can see, using json.dumps only adds a useless step in that situation.
One last thing. If we do the same thing again, but with bad JSON:
>>> string = json.loads(json.loads(json.dumps("{'foobar': true}")))
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes
The quotes are wrong here (aren't you getting used to this by now?).
What happened here is that json.dumps encoded your string, the first json.loads decoded it again (lol), and finally the second json.loads received the bad JSON, unchanged, because the first two steps canceled each other out.
TL;DR
In conclusion :
Fix your JSON yourself! Don't give wrongly formatted JSON to json.loads, and don't try to combine json.loads with json.dumps to fix what only you can fix.
Hope this helped someone ;-)
Disclaimer: I'm no Python expert.
Feel free to challenge this answer in the comment section.