Problem decoding json strings using json module - python

After contacting a server I get the following string as a response:
{"kind": "t2", "data": {"has_mail": null, "name": "shadyabhi", "created": 1273919273.0, "created_utc": 1273919273.0, "link_karma": 1343, "comment_karma": 301, "is_gold": false, "is_mod": false, "id": "425zf", "has_mod_mail": null}}
which is stored as type 'str' in my script.
Now, when I try to decode it using json.dumps(mystring, sort_keys=True, indent=4), I get this.
"{\"kind\": \"t2\", \"data\": {\"has_mail\": null, \"name\": \"shadyabhi\", \"created\": 1273919273.0, \"created_utc\": 1273919273.0, \"link_karma\": 1343, \"comment_karma\": 301, \"is_gold\": false, \"is_mod\": false, \"id\": \"425zf\", \"has_mod_mail\": null}}"
which should really be like this
shadyabhi@archlinux ~ $ echo '{"kind": "t2", "data": {"has_mail": "null", "name": "shadyabhi", "created": 1273919273.0, "created_utc": 1273919273.0, "link_karma": 1343, "comment_karma": 299, "is_gold": "false", "is_mod": "false", "id": "425zf", "has_mod_mail": "null"}}' | python2 -mjson.tool
{
    "data": {
        "comment_karma": 299,
        "created": 1273919273.0,
        "created_utc": 1273919273.0,
        "has_mail": "null",
        "has_mod_mail": "null",
        "id": "425zf",
        "is_gold": "false",
        "is_mod": "false",
        "link_karma": 1343,
        "name": "shadyabhi"
    },
    "kind": "t2"
}
shadyabhi@archlinux ~ $
So, what is it that's going wrong?

You need to load it before you can dump it. Try this:
data = json.loads(returnFromWebService)
json.dumps(data, sort_keys=True, indent=4)
To add a bit more detail - you're receiving a string, and then asking the json library to dump it to a string. That doesn't make a great deal of sense. What you need to do first is put the data into a more meaningful container. By calling loads you take the string value of the response and parse it into an actual Python dictionary. Then you can pass that data to dumps, which outputs a string using your requested formatting.
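Putting it together, a minimal sketch using a shortened version of the response string from the question:

import json

# The raw response text, as received from the server (a plain str)
raw = '{"kind": "t2", "data": {"name": "shadyabhi", "id": "425zf", "is_gold": false}}'

data = json.loads(raw)                             # str -> dict
print(json.dumps(data, sort_keys=True, indent=4))  # dict -> pretty-printed str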

You have things backwards. If you want to convert a string to a data structure, you need to use json.loads(thestring). json.dumps() is for converting a data structure to a JSON-encoded string.

You are supposed to dump an object (like a dictionary) which then becomes a string, not the other way round... see here.
Use json.loads() instead.

You want json.loads. The dumps method is for going the other way (dumping an object to a json string).

Related

Python TypeError: string indices must be integers (JSON)

Can someone tell me what I am doing wrong? I am getting this error.
I went through earlier posts about a similar error, but couldn't understand it.
import json
import re
import requests
import subprocess

res = requests.get('https://api.tempura1.com/api/1.0/recipes', auth=('12345','123'), headers={'App-Key': 'some key'})
data = res.text
extracted_recipes = []
for recipe in data['recipes']:
    extracted_recipes.append({
        'name': recipe['name'],
        'status': recipe['status']
    })
print extracted_recipes
TypeError: string indices must be integers
data contains the following:
{
    "recipes": {
        "47635": {
            "name": "Desitnation Search",
            "status": "SUCCESSFUL",
            "kitchen": "eu",
            "active": "YES",
            "created_at": 1501672231,
            "interval": 5,
            "use_legacy_notifications": false
        },
        "65568": {
            "name": "Validation",
            "status": "SUCCESSFUL",
            "kitchen": "us-west",
            "active": "YES",
            "created_at": 1522583593,
            "interval": 5,
            "use_legacy_notifications": false
        },
        "47437": {
            "name": "Gateday",
            "status": "SUCCESSFUL",
            "kitchen": "us-west",
            "active": "YES",
            "created_at": 1501411588,
            "interval": 10,
            "use_legacy_notifications": false
        }
    },
    "counts": {
        "total": 3,
        "limited": 3,
        "filtered": 3
    }
}
You are not parsing the text as JSON. Try
data = json.loads(res.text)
or
data = res.json()
Apart from that, you probably need to change the for loop to loop over the values instead of the keys. Change it to something like the following:
for recipe in data['recipes'].values():
There are two problems with your code, which you could have found out by yourself by doing a very minimal amount of debugging.
The first problem is that you don't parse the response contents from json to a native Python object. Here:
data = res.text
data is a string (JSON-formatted, but still a string). You need to parse it to turn it into its Python representation (in this case a dict). You can do it using the stdlib's json.loads() (general solution) or, since you're using python-requests, just by calling the Response.json() method:
data = res.json()
Then you have this:
for recipe in data['recipes']:
    # ...
Now that we have turned data into a proper dict, we can access the data['recipes'] subdict, but iterating directly over a dict actually iterates over the keys, not the values, so in your above for loop recipe will be a string ("47635", "65568", etc.). If you want to iterate over the values, you have to ask for it explicitly:
for recipe in data['recipes'].values():
    # now `recipe` is the dict you expected
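Putting both fixes together, a minimal sketch of the corrected script (the endpoint, credentials and App-Key value are the placeholders from the question):

import requests

res = requests.get('https://api.tempura1.com/api/1.0/recipes',
                   auth=('12345', '123'),
                   headers={'App-Key': 'some key'})
data = res.json()  # parse the response body into a dict

extracted_recipes = []
for recipe in data['recipes'].values():  # iterate over the inner dicts, not the keys
    extracted_recipes.append({
        'name': recipe['name'],
        'status': recipe['status']
    })

print(extracted_recipes)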

json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 190) [duplicate]

This question already has answers here:
Python json.loads shows ValueError: Extra data
(11 answers)
Loading JSONL file as JSON objects
(5 answers)
Closed 2 years ago.
I am running the following code-
import json

addrsfile = open("C:\\Users\file.json", "r")
addrJson = json.loads(addrsfile.read())
addrsfile.close()
if addrJson:
    print("yes")
But it gives me the following error:
Traceback (most recent call last):
  File "C:/Users/Mayur/Documents/WebPython/Python_WebServices/test.py", line 9, in <module>
    addrJson = json.loads(addrsfile.read())
  File "C:\Users\Mayur\Anaconda3\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Mayur\Anaconda3\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 190)
Can anyone help me, please?
The JSON file looks like this:
{"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null}
{"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
You have two records in your json file, and json.loads() is not able to decode more than one. You need to do it record by record.
See Python json.loads shows ValueError: Extra data
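A minimal sketch of the record-by-record approach, assuming one JSON object per line as in the file shown above (the filename addresses.json is a placeholder):

import json

records = []
with open("addresses.json") as addrsfile:  # placeholder path
    for line in addrsfile:
        line = line.strip()
        if line:                              # skip blank lines
            records.append(json.loads(line))  # parse each record separately

print(records[0]["name"])  # -> XYZ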
OR you need to reformat your json to contain an array:
{
    "foo": [
        {"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null},
        {"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
    ]
}
would be acceptable again. But there cannot be several top-level objects.
I was parsing JSON from a REST API call and got this error. It turns out the API had become "fussier" (eg about order of parameters etc) and so was returning malformed results. Check that you are getting what you expect :)
This error can also show up if there are parts of your string that json.loads() does not recognize. In the example string below, an error will be raised at character 27 (char 27).
string = """[{"Item1": "One", "Item2": False}, {"Item3": "Three"}]"""
My solution to this would be to use str.replace() to convert these items to strings:
import json
string = """[{"Item1": "One", "Item2": False}, {"Item3": "Three"}]"""
string = string.replace("False", '"False"')
dict_list = json.loads(string)
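As an aside, a string like this (with Python-style False rather than JSON's false) is already a valid Python literal, so ast.literal_eval can parse it directly; a hedged alternative sketch:

import ast

string = """[{"Item1": "One", "Item2": False}, {"Item3": "Three"}]"""
dict_list = ast.literal_eval(string)  # evaluates Python literals, including False
print(dict_list[0]["Item2"])          # -> False (a real bool, not the string "False")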

Writing multiple json objects to a json file

I have a list of json objects that I would like to write to a json file. Example of my data is as follows:
{
    "_id": "abc",
    "resolved": false,
    "timestamp": "2017-04-18T04:57:41 366000",
    "timestamp_utc": {
        "$date": 1492509461366
    },
    "sessionID": "abc",
    "resHeight": 768,
    "time_bucket": ["2017-year", "2017-04-month", "2017-16-week", "2017-04-18-day", "2017-04-18 16-hour"],
    "referrer": "Standalone",
    "g_event_id": "abc",
    "user_agent": "abc"
} {
    "_id": "abc",
    "resolved": false,
    "timestamp": "2017-04-18T04:57:41 366000",
    "timestamp_utc": {
        "$date": 1492509461366
    },
    "sessionID": "abc",
    "resHeight": 768,
    "time_bucket": ["2017-year", "2017-04-month", "2017-16-week", "2017-04-18-day", "2017-04-18 16-hour"],
    "referrer": "Standalone",
    "g_event_id": "abc",
    "user_agent": "abc"
}
I would like to write this to a json file. Here's the code that I am using for this purpose:
with open("filename", 'w') as outfile1:
for row in data:
outfile1.write(json.dumps(row))
But this gives me a file with only 1 long row of data. I would like to have a row for each json object in my original data. I know there are some other StackOverflow questions that are trying to address somewhat similar situation (by externally inserting '\n' etc.), but it hasn't worked in my case for some reason. I believe there has to be a pythonic way to do this.
How do I achieve this?
The format of the file you are trying to create is called JSON Lines.
It seems you are asking why the JSON objects are not separated by newlines: the write method does not append one.
If you want implicit newlines, you should use the print function instead:
with open("filename", 'w') as outfile1:
for row in data:
print(json.dumps(row), file=outfile1)
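Equivalently, you can keep write and append the newline yourself; a minimal sketch:

with open("filename", 'w') as outfile1:
    for row in data:
        outfile1.write(json.dumps(row) + "\n")  # one object per line (JSON Lines)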
Use the indent argument to output JSON with extra whitespace. The default is to not output line breaks or extra spaces.
with open('filename.json', 'w') as outfile1:
    json.dump(data, outfile1, indent=4)
https://docs.python.org/3/library/json.html#basic-usage
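Whichever approach you pick, if the file ends up with one object per line (JSON Lines), reading it back is symmetric; a minimal sketch:

import json

with open("filename") as infile:
    data = [json.loads(line) for line in infile if line.strip()]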

How to decode multiple JSON objects in one JSON file with Python [duplicate]

I have a multi-gigabyte JSON file. The file is made up of JSON objects that are no more than a few thousand characters each, but there are no line breaks between the records.
Using Python 3 and the json module, how can I read one JSON object at a time from the file into memory?
The data is in a plain text file. Here is an example of a similar record. The actual records contain many nested dictionaries and lists.
Record in readable format:
{
    "results": {
        "__metadata": {
            "type": "DataServiceProviderDemo.Address"
        },
        "Street": "NE 228th",
        "City": "Sammamish",
        "State": "WA",
        "ZipCode": "98074",
        "Country": "USA"
    }
}
Actual format. New records start one after the other without any breaks.
{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } } }{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } } }{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } } }
Generally speaking, putting more than one JSON object into a file makes that file invalid, broken JSON. That said, you can still parse data in chunks using the JSONDecoder.raw_decode() method.
The following will yield complete objects as the parser finds them:
from json import JSONDecoder
from functools import partial
def json_parse(fileobj, decoder=JSONDecoder(), buffersize=2048):
    buffer = ''
    for chunk in iter(partial(fileobj.read, buffersize), ''):
        buffer += chunk
        while buffer:
            try:
                result, index = decoder.raw_decode(buffer)
                yield result
                buffer = buffer[index:].lstrip()
            except ValueError:
                # Not enough data to decode, read more
                break
This function will read chunks from the given file object in buffersize chunks, and have the decoder object parse whole JSON objects from the buffer. Each parsed object is yielded to the caller.
Use it like this:
with open('yourfilename', 'r') as infh:
    for data in json_parse(infh):
        print(data)  # process each object here
Use this only if your JSON objects are written to a file back-to-back, with no newlines in between. If you do have newlines, and each JSON object is limited to a single line, you have a JSON Lines document, in which case you can use Loading and parsing a JSON file with multiple JSON objects in Python instead.
Here is a slight modification of Martijn Pieters' solution, which will handle JSON strings separated with whitespace.
import functools
import json

def json_parse(fileobj, decoder=json.JSONDecoder(), buffersize=2048,
               delimiters=None):
    remainder = ''
    for chunk in iter(functools.partial(fileobj.read, buffersize), ''):
        remainder += chunk
        while remainder:
            try:
                stripped = remainder.strip(delimiters)
                result, index = decoder.raw_decode(stripped)
                yield result
                remainder = stripped[index:]
            except ValueError:
                # Not enough data to decode, read more
                break
For example, if data.txt contains JSON strings separated by a space:
{"business_id": "1", "Accepts Credit Cards": true, "Price Range": 1, "type": "food"} {"business_id": "2", "Accepts Credit Cards": true, "Price Range": 2, "type": "cloth"} {"business_id": "3", "Accepts Credit Cards": false, "Price Range": 3, "type": "sports"}
then
In [47]: list(json_parse(open('data.txt')))
Out[47]:
[{u'Accepts Credit Cards': True,
  u'Price Range': 1,
  u'business_id': u'1',
  u'type': u'food'},
 {u'Accepts Credit Cards': True,
  u'Price Range': 2,
  u'business_id': u'2',
  u'type': u'cloth'},
 {u'Accepts Credit Cards': False,
  u'Price Range': 3,
  u'business_id': u'3',
  u'type': u'sports'}]
If your JSON document contains a list of objects and you want to read one object at a time, you can use the iterative JSON parser ijson for the job. It will only read more content from the file when it needs to decode the next object.
Note that you should use it with the YAJL library, otherwise you will likely not see any performance increase.
That being said, unless your file is really big, reading it completely into memory and then parsing it with the normal JSON module will probably still be the best option.
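For completeness, a minimal ijson sketch, assuming the file holds a single top-level JSON array of objects (the filename is hypothetical; the 'item' prefix selects each array element):

import ijson  # third-party: pip install ijson

with open('big_file.json', 'rb') as f:  # hypothetical filename
    for obj in ijson.items(f, 'item'):  # streams one array element at a time
        print(obj)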

