Python errors when trying to read and query a JSON file - python

I am trying to write a Python function as part of my job to check whether a piece of data exists in a JSON file, which I can only get by downloading it from a website. I am the only person here with any coding or scripting experience (HTML, CSS and SQL), so this has fallen to me to sort out. I have no experience with Python so far.
I am not allowed to change the structure or format of the JSON file. Its format is:
{
"naglowek": {
"dataGenerowaniaDanych": "20210514",
"liczbaTransformacji": "5000",
"schemat": "RRRRMMDDNNNNNNNNNNBBBBBBBBBBBBBBBBBBBBBBBBBB"
},
"skrotyPodatnikowCzynnych": [
"examplestring1",
"examplestring2",
"examplestring3",
"examplestring4",
],
"maski": [
"examplemask1",
"examplemask2",
"examplemask3",
"examplemask4"
]
}
I have tried numerous examples found online but none of them seem to work. From looking at various websites the Python code I have is:
import json
with open('20210514.json') as myfile:
    data = json.load(myfile)
print(data)
keyVal = 'examplestring2'
if keyVal in data:
    # Print the success message and the value of the key
    print("Data is found in JSON data")
else:
    # Print the message if the value does not exist
    print("Data is not found in JSON data")
But I am getting the errors below. I am a complete newbie to Python, so I am having trouble deciphering them:
D:\PycharmProjects\venv\Scripts\python.exe D:/PycharmProjects/json_test.py
Traceback (most recent call last):
File "D:\PycharmProjects\json_test.py", line 4, in <module>
data = json.load(myfile)
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 12 column 5 (char 921)
Process finished with exit code 1
Any help would be massively appreciated!

{
"naglowek": {
"dataGenerowaniaDanych": "20210514",
"liczbaTransformacji": "5000",
"schemat": "RRRRMMDDNNNNNNNNNNBBBBBBBBBBBBBBBBBBBBBBBBBB"
},
"skrotyPodatnikowCzynnych": [
"examplestring1",
"examplestring2",
"examplestring3",
"examplestring4"
],
"maski": [
"examplemask1",
"examplemask2",
"examplemask3",
"examplemask4"
]
}
This should work. The problem is the comma after the last element of the "skrotyPodatnikowCzynnych" list, which Python's json parser rejects. JavaScript accepts trailing commas in array and object literals, but the JSON specification does not, so make sure the last element of a list is not followed by a comma.
For your if-else statement to be correct, you'd have to change it to something like this, because "keyVal in data" only tests the top-level keys of the dictionary:
keyVal = 'examplestring2'
keyName = 'skrotyPodatnikowCzynnych'
# Check that the key exists, then look for the value in the list stored under it
if keyName in data and keyVal in data[keyName]:
    # Print the success message
    print("Data is found in JSON data")
else:
    # Print the message if the value does not exist
    print("Data is not found in JSON data")
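To make the two points above concrete, here is a minimal sketch that uses in-memory strings in place of the downloaded file. The helper name value_in_list and the shortened sample documents are made up for illustration; only json.loads and json.JSONDecodeError come from the standard library.

```python
import json

# Shortened stand-ins for the downloaded file (hypothetical sample data).
VALID = '{"skrotyPodatnikowCzynnych": ["examplestring1", "examplestring2"]}'
BROKEN = '{"skrotyPodatnikowCzynnych": ["examplestring1", "examplestring2",]}'

def value_in_list(json_text, key_name, value):
    """Return True if value occurs in the list stored under key_name."""
    data = json.loads(json_text)
    # .get() avoids a KeyError when the key is missing altogether.
    return value in data.get(key_name, [])

found = value_in_list(VALID, "skrotyPodatnikowCzynnych", "examplestring2")
print("found:", found)

try:
    value_in_list(BROKEN, "skrotyPodatnikowCzynnych", "examplestring2")
    trailing_comma_rejected = False
except json.JSONDecodeError as err:
    # err.lineno and err.colno point at the place where the parser gave up.
    trailing_comma_rejected = True
    print("invalid JSON at line", err.lineno, "column", err.colno)
```

Catching the exception this way lets the script report where the file is malformed instead of ending with a long traceback.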

Remove the trailing comma. The JSON specification does not allow a trailing comma.

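If you cannot change the file, one option is to parse it as YAML instead, since YAML's flow syntax tolerates the trailing comma:
import yaml
with open('20210514.json') as myfile:
    # safe_load only builds plain Python objects, which is all this file needs
    data = yaml.safe_load(myfile)
print(data)
You also need to install PyYAML first (pip install pyyaml).
https://pyyaml.org/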
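If installing an extra package is not an option, another possible workaround is to strip the trailing commas in memory before parsing, which leaves the file on disk untouched. The sketch below assumes that no string value in the file contains a comma followed by a closing bracket or brace; if one did, the regular expression would corrupt it. The name loads_lenient and the sample text are hypothetical.

```python
import json
import re

def loads_lenient(text):
    """Parse JSON text after removing commas that directly precede ] or }."""
    # Assumption: no string value contains ",]" or ",}" (with optional whitespace).
    cleaned = re.sub(r",\s*([\]}])", r"\1", text)
    return json.loads(cleaned)

sample = '{"skrotyPodatnikowCzynnych": ["examplestring1", "examplestring2",\n]}'
data = loads_lenient(sample)
print("examplestring2" in data["skrotyPodatnikowCzynnych"])
```

With a real file, the text would come from myfile.read() inside the with block instead of the sample string.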

Related

Number plate detection JSON dataset

I am new to JSON. I am doing a project for Vehicle Number Plate Detection.
I have a dataset of the form:
{"content": "http://com.dataturks.a96-i23.open.s3.amazonaws.com/2c9fafb0646e9cf9016473f1a561002a/77d1f81a-bee6-487c-aff2-0efa31a9925c____bd7f7862-d727-11e7-ad30-e18a56154311.jpg.jpeg","annotation":[{"label":["number_plate"],"notes":"","points":[{"x":0.7220843672456576,"y":0.5879828326180258},{"x":0.8684863523573201,"y":0.6888412017167382}],"imageWidth":806,"imageHeight":466}],"extras":null},
{"content": "http://com.dataturks.a96-i23.open.s3.amazonaws.com/2c9fafb0646e9cf9016473f1a561002a/4eb236a3-6547-4103-b46f-3756d21128a9___06-Sanjay-Dutt.jpg.jpeg","annotation":[{"label":["number_plate"],"notes":"","points":[{"x":0.16194331983805668,"y":0.8507795100222717},{"x":0.582995951417004,"y":1}],"imageWidth":494,"imageHeight":449}],"extras":null},
There are 240 blocks of data in total.
I want to do two things with this dataset.
First, I need to download the image referenced by each block; second, I need to write the values of the "points" field to a text file.
I am having trouble getting the values of the fields.
import json
jsonFile = open('Indian_Number_plates.json', 'r')
x = json.load(jsonFile)
for criteria in x['annotation']:
    # dict.items() replaces the Python 2-only iteritems()
    for key, value in criteria.items():
        print(key, 'is:', value)
    print('')
I wrote the above code to get all the values under "annotation",
but I am getting the following error:
Traceback (most recent call last):
File "prac.py", line 13, in <module>
x = json.load(jsonFile)
File "C:\python364\Lib\json\__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\python364\Lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\python364\Lib\json\decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 394 (char 393)
Please help me get the values of the "points" field and download the images from the links in the "content" field.
I found this answer while searching. Essentially, you can read one object, catch the exception raised when the parser hits the unexpected next object, then seek, reparse, and build up a list of objects.
In Java, I'd just tell you to use Jackson and its SAX-style streaming interface, as I've done that to read a list of objects formatted like this. If Python's JSON library has a streaming API, I'd use that instead of the exception-handler workaround.
The error occurs because your file contains two or more top-level records:
{"content": "http://com.dataturks.a96- } ..... {"content": .....
To solve this, you should reformat your JSON so that all the records are contained in an array:
{ "data" : [ {"content": "http://com.dataturks.a96- .... },{"content":... }]}
To download the images, extract the image names and URLs and use requests:
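import requests

# image_name and pic_url are placeholders for values taken from each record
with open(image_name, 'wb') as handle:
    response = requests.get(pic_url, stream=True)
    if not response.ok:
        print(response)
    # Write the body to disk in 1024-byte chunks
    for block in response.iter_content(1024):
        if not block:
            break
        handle.write(block)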
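If the file cannot be reformatted, the records can also be read one at a time with the standard library's json.JSONDecoder.raw_decode, which parses a single value and reports where it ended. This is only a sketch: the helper name iter_json_objects and the two shortened sample records (with example.com URLs) are invented for illustration, and it assumes the records are separated by nothing but whitespace and commas.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found in text, skipping separators."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace and stray commas between records.
        if text[pos] in " \t\r\n,":
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical, shortened records in the same shape as the dataset.
sample = (
    '{"content": "http://example.com/a.jpg", '
    '"annotation": [{"points": [{"x": 0.1, "y": 0.2}]}]},\n'
    '{"content": "http://example.com/b.jpg", '
    '"annotation": [{"points": [{"x": 0.3, "y": 0.4}]}]},\n'
)

records = list(iter_json_objects(sample))
urls = [record["content"] for record in records]
points = [a["points"] for record in records for a in record["annotation"]]
print(urls)
print(points)
```

Each URL in urls could then be passed to the requests snippet above as pic_url, and points could be written to a text file.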

Extract JSON Data in Python - Example Code Included

I am brand new to using JSON data and fairly new to Python. I am struggling to parse the following JSON data in Python in order to import it into a SQL Server database. I already have a program that will import the parsed data into SQL Server using pyodbc; however, I can't for the life of me figure out how to correctly parse the JSON data into a Python dictionary.
I know there are a number of threads that address this issue, however I was unable to find any examples of the same JSON data structure. Any help would be greatly appreciated as I am completely stuck on this issue. Thank you SO! Below is a cut of the JSON data I am working with:
{
"data":
[
{
"name": "Mobile Application",
"url": "https://www.example-url.com",
"metric": "users",
"package": "example_pkg",
"country": "USA",
"data": [
[ 1396137600000, 5.76 ],
[ 1396224000000, 5.79 ],
[ 1396310400000, 6.72 ],
....
[ 1487376000000, 7.15 ]
]
}
],"as_of":"2017-01-22"}
Again, I apologize if this thread is repetitive, however as I mentioned above, I was not able to work out the logic from other threads as I am brand new to using JSON.
Thank you again for any help or advice in regard to this.
import json
with open("C:\\Pathyway\\7Park.json") as json_file:
    data = json.load(json_file)
assert data["data"][0]["metric"] == "users"
The above code results in the following error:
Traceback (most recent call last):
File "JSONpy", line 10, in <module>
data = json.load(json_file)
File "C:\json\__init__.py", line 291, in load
**kw)
File "C:\json\__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "C:\json\decoder.py", line 367, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 7 column 1 (char 23549 - 146249)
Assuming the data you've described (less the ... ellipsis) is in a file called j.json, this code parses the JSON document into a Python object:
import json
with open("j.json") as json_file:
    data = json.load(json_file)
assert data["data"][0]["metric"] == "users"
From your error message it seems possible that your file is not a single JSON document, but a sequence of JSON documents separated by newlines. If that is the case, then this code might be more helpful:
import json
with open("j.json") as json_file:
    for line in json_file:
        data = json.loads(line)
        print(data["data"][0]["metric"])
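A slightly more defensive variant of the line-by-line approach is sketched below. It assumes, as above, that the file holds one JSON document per line, and additionally that blank lines may occur between documents. The function name load_json_lines and the two sample documents are hypothetical; io.StringIO stands in for the real file.

```python
import io
import json

def load_json_lines(lines):
    """Parse an iterable of text lines, one JSON document per non-blank line."""
    documents = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between documents
        try:
            documents.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report which line of the file is broken.
            raise ValueError("line %d is not valid JSON: %s" % (number, err))
    return documents

# Stand-in for open("j.json"): two documents separated by a blank line.
fake_file = io.StringIO(
    '{"data": [{"metric": "users"}], "as_of": "2017-01-22"}\n'
    '\n'
    '{"data": [{"metric": "sessions"}], "as_of": "2017-01-23"}\n'
)
metrics = [doc["data"][0]["metric"] for doc in load_json_lines(fake_file)]
print(metrics)
```

Including the line number in the error makes it easier to find the offending record in a large file.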

Python -- get at JSON info that's written like XML

In Python, I usually do simple JSON with this sort of template:
url = "url"
file = urllib2.urlopen(url)
raw = file.read()  # named raw so that it does not shadow the json module
parsed = json.loads(raw)
and then get at the variables with calls like:
parsed[obj name][value name]
But, this works with JSON that's formatted roughly like:
{'object':{'index':'value', 'index':'value'}}
The JSON I just encountered is formatted like:
{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}
so there are no names for me to reference the different blocks. Of course the blocks give different info, but have the same "keys" -- much like XML is usually formatted. Using my method above, how would I parse through this JSON?
The following is not valid JSON, because it contains two top-level values separated by a comma:
{"a":"value", "b":"value"},{"a":"value", "b":"value"}
whereas
[{"a":"value", "b":"value"},{"a":"value", "b":"value"}]
is valid JSON. (JSON also requires double quotes around strings, so the single quotes in your example would be rejected as well.)
The Python traceback confirms this:
import json
json_string = '{"a":"value", "b":"value"},{"a":"value", "b":"value"}'
parsed_json = json.loads(json_string)
print(parsed_json)
Traceback (most recent call last):
File "/Users/tron/Desktop/test3.py", line 3, in <module>
parsed_json = json.loads(json_string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 27 - line 1 column 54 (char 26 - 53)
[Finished in 0.0s with exit code 1]
whereas if you do
json_string = '[{"a":"value", "b":"value"},{"a":"value", "b":"value"}]'
everything works fine.
In that case the parsed result is a list of objects: parsed_json[0] is the first object, parsed_json[1] is the second, and so on.
Otherwise, if you think this is an issue that you "just have to deal with", here is one option:
Think of the ways the JSON can be malformed and write a simple class or function to account for them. For the case above, here is a hacky way to deal with it.
import json
json_string = '{"a":"value", "b":"value"},{"a":"value", "b":"value"}'

def parseJson(string):
    try:
        parsed_json = json.loads(string)
        print(parsed_json)
        return parsed_json
    except ValueError as e:
        print(string, "didn't parse")
        if "Extra data" in str(e):
            # Several top-level values: wrap them in a list and retry
            newString = "[" + string + "]"
            print(newString)
            return parseJson(newString)
        # Any other kind of error is not handled here
        return None
You could add more if/else branches to deal with the various things you run into. I have to admit, this is very hacky, and I don't think you can ever account for every possible mutation.
Good luck
The result must be a list of dicts:
[{'index1':'value1', 'index2':'value2'},{'index1':'value1', 'index2':'value2'}]
so you can reference items by position: item[1]['index1']

Handle JSON Decode Error when nothing returned

I am parsing JSON data. I don't have an issue with the parsing itself, and I am using the simplejson module. But some API requests return an empty value. Here is my example:
{
"all" : {
"count" : 0,
"questions" : [ ]
}
}
This is the segment of my code where I parse the json object:
qByUser = byUsrUrlObj.read()
qUserData = json.loads(qByUser).decode('utf-8')
questionSubjs = qUserData["all"]["questions"]
As I mentioned for some requests I get the following error:
Traceback (most recent call last):
File "YahooQueryData.py", line 164, in <module>
qUserData = json.loads(qByUser)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/simplejson/__init__.py", line 385, in loads
return _default_decoder.decode(s)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/simplejson/decoder.py", line 402, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/simplejson/decoder.py", line 420, in raw_decode
raise JSONDecodeError("No JSON object could be decoded", s, idx)
simplejson.decoder.JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)
What would be the best way to handle this error?
There is a rule in Python programming called "it is Easier to Ask for Forgiveness than for Permission" (in short: EAFP). It means that you should catch exceptions instead of checking values for validity.
Thus, try the following:
try:
    qByUser = byUsrUrlObj.read()
    # Decode the raw bytes first, then parse the resulting text
    qUserData = json.loads(qByUser.decode('utf-8'))
    questionSubjs = qUserData["all"]["questions"]
except ValueError:  # includes simplejson.decoder.JSONDecodeError
    print('Decoding JSON has failed')
EDIT: Since simplejson.decoder.JSONDecodeError actually inherits from ValueError (proof here), I simplified the catch statement by just using ValueError.
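If you don't mind using the standard library's json module (Python 3.5 or later), a more precise way to handle it is to catch json.JSONDecodeError (json.decoder.JSONDecodeError is the same class), because a broad ValueError handler can also catch other exceptions that have nothing to do with JSON decoding. Catch the exception class from the module you actually parse with: simplejson raises its own simplejson.JSONDecodeError, which is not the same class as json.JSONDecodeError.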
import json
from json.decoder import JSONDecodeError
try:
    qByUser = byUsrUrlObj.read()
    qUserData = json.loads(qByUser.decode('utf-8'))
    questionSubjs = qUserData["all"]["questions"]
except JSONDecodeError as e:
    # do whatever you want
    pass
EDIT (Oct 2020):
As @Jacob Lee noted in the comments, json.loads can also raise a plain TypeError when its argument is not a str, bytes, or bytearray. Your question is about JSONDecodeError, but it is still worth mentioning here; to handle this situation too while distinguishing between the two problems, the following could be used:
import json
from json.decoder import JSONDecodeError
try:
    qByUser = byUsrUrlObj.read()
    qUserData = json.loads(qByUser.decode('utf-8'))
    questionSubjs = qUserData["all"]["questions"]
except JSONDecodeError as e:
    # do whatever you want
    pass
except TypeError as e:
    # do whatever you want in this case
    pass
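Putting the pieces together, one way to treat an empty or malformed response as "no questions" is sketched below with the standard json module. The function name extract_questions is made up, and the choice to return an empty list on any failure is an assumption about what the caller wants; a real script might prefer to log the error or retry the request.

```python
import json

def extract_questions(raw_body):
    """Return the list under all.questions, or [] if the body is unusable."""
    try:
        if isinstance(raw_body, bytes):
            raw_body = raw_body.decode("utf-8")
        parsed = json.loads(raw_body)
    except (TypeError, ValueError):
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError;
        # TypeError covers a body that is not text at all, such as None.
        return []
    if not isinstance(parsed, dict):
        return []
    return parsed.get("all", {}).get("questions", [])

print(extract_questions(b""))
print(extract_questions(b'{"all": {"count": 0, "questions": []}}'))
print(extract_questions('{"all": {"count": 1, "questions": ["q1"]}}'))
```

Keeping the try block limited to the decoding and parsing steps means that a mistake elsewhere, such as a misspelled key, is not silently swallowed.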

How to use simplejson to decode following data?

I grabbed some data from a URL and, after searching online, found out that the data is in JSON format, but when I tried to use simplejson.loads(data), it raised an exception.
This is my first time dealing with JSON data. Any suggestions on how to decode it?
Thanks
=================
result = simplejson.loads(data, encoding="utf-8")
File "F:\My Documents\My Dropbox\StockDataDownloader\simplejson__init__.py", line 401, in loads
return cls(encoding=encoding, **kw).decode(s)
File "F:\My Documents\My Dropbox\StockDataDownloader\simplejson\decoder.py", line 402, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "F:\My Documents\My Dropbox\StockDataDownloader\simplejson\decoder.py", line 420, in raw_decode
raise JSONDecodeError("No JSON object could be decoded", s, idx)
simplejson.decoder.JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)
============================
data = "{identifier:'ID', label:'As at Wed 4 Aug 2010 05:05 PM',items:[{ID:0,N:'2ndChance',NC:'528',R:'NONE',I:'NONE',M:'-',LT:0.335,C:0.015,VL:51.000,BV:20.000,B:0.330,S:0.345,SV:20.000,O:0.335,H:0.335,L:0.335,V:17085.000,SC:'4',PV:0.320,P:4.6875,P_:'X',V_:''},{ID:1,N:'8Telecom',NC:'E25',R:'NONE',I:'NONE',M:'-',LT:0.190,C:0.000,VL:965.000,BV:1305.000,B:0.185,S:0.190,SV:641.000,O:0.185,H:0.190,L:0.185,V:179525.000,SC:'2',PV:0.190,P:0.0,P_:'X',V_:''},{ID:2,N:'A-Sonic',NC:'A53',R:'NONE',I:'NONE',M:'-',LT:0.090,C:0.005,VL:1278.000,BV:17.000,B:0.090,S:0.095,SV:346.000,O:0.090,H:0.090,L:0.090,V:115020.000,SC:'A',PV:0.085,P:5.882352734375,P_:'X',V_:''},{ID:3,N:'AA Grp',NC:'5GZ',R:'NONE',I:'NONE',M:'t',LT:0.000,C:0.000,VL:0.000,BV:100.000,B:0.050,S:0.060,SV:50.000,O:0.000,H:0.000,L:0.000,V:0.000,SC:'2',PV:0.050,P:0.0,P_:'X',V_:''}]}"
You're using simplejson correctly, but the site that gave you that data isn't using JSON format properly. Look at json.org, which uses simple syntax diagrams to show what is JSON: in the object diagram, after { (unless the object is empty, in which case a } immediately follows), JSON always has a string -- and as you see in that diagram, this means something that starts with a double quote. So, the very start of the string:
{identifier:
tells you that's incorrect JSON -- no double quotes around the word identifier.
Working around this problem is not as easy as recognizing that it's there, but I wanted to reassure you, at least, about your code. Sigh, it does seem that broken websites, such a great tradition in the old HTML days, are with us to stay no matter how modern the technology they break is... :-(
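For data shaped like the sample in the question, a best-effort workaround is to rewrite the text into real JSON before parsing it. The sketch below is fragile by design and rests on two assumptions: string values are single-quoted without embedded quotes or backslashes, and no string value contains a comma or brace followed by a name and a colon. The function name loads_js_object_literal and the shortened sample are hypothetical.

```python
import json
import re

def loads_js_object_literal(text):
    """Best-effort conversion of a JavaScript-style object literal to JSON."""
    # 1) Turn 'single-quoted' strings into properly escaped JSON strings.
    text = re.sub(r"'([^'\\]*)'", lambda m: json.dumps(m.group(1)), text)
    # 2) Quote bare names used as keys, i.e. names that follow { or a comma.
    text = re.sub(r'([{,])\s*([A-Za-z_]\w*)\s*:', r'\1"\2":', text)
    return json.loads(text)

# Shortened version of the data from the question.
sample = (
    "{identifier:'ID', label:'As at Wed 4 Aug 2010 05:05 PM',"
    "items:[{ID:0,N:'2ndChance',LT:0.335,P_:'X',V_:''}]}"
)
result = loads_js_object_literal(sample)
print(result["items"][0]["N"])
```

If the site's output ever violates those assumptions, the rewrite will produce wrong data or fail to parse, so it is worth checking the result against a few known values.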
