converting string to dictionary using json.loads

I'm trying to parse some JSON data extracted from a JavaScript file.
I have the following variable in my Python code. I get the string from file.read(). I know the text below becomes a dict if pasted into Python code as is.
resultStr = {"inst":{"summary":{"statistics":[],"wa_recursive":"100.000%","files":11,"dus":11}},"du":{"summary":{"statistics":[{"type":"stmt","data":"Statement Coverage","status":"covered","weight":1,"rhits":"100.000%","rtotal":"100.000%"},{"data":"Statements","rhits":86.000,"rtotal":86.000},{"data":"Subprograms","rhits":0.000,"rtotal":0.000},{"type":"branch","data":"Branch Coverage","status":"covered","weight":1,"rhits":"100.000%","rtotal":"100.000%"},{"data":"Branch paths","rhits":42.000,"rtotal":42.000},{"data":"Branches","rhits":21.000,"rtotal":21.000},{"type":"toggle","data":"Toggle Coverage","status":"uncovered","weight":1,"rhits":"94.410%","rtotal":"100.000%"},{"data":"Toggle bins","rhits":304.000,"rtotal":322.000},{"data":"Signal bits","rhits":150.000,"rtotal":161.000}],"wa_recursive":"98.137%","files":11,"dus":11}}};
When I pass this string into the JSON loader
json.loads(resultStr)
I get the following exception
File "C:\Python34\lib\json\__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "C:\Python34\lib\json\decoder.py", line 346, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 825 - line 1 column 826 (char 824 - 825)
To simplify, it's failing on the last part of the string:
"wa_recursive":"98.137%","files":11,"dus":11}}};
I've tried entering it manually, and it is recognized as a dictionary in the Python code.
I can't seem to find any fault with it, so some assistance would be appreciated :)
Thank you :)

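The following works fine for me. Did you keep the semicolon in the string? The "Extra data" error points at char 824, which is the last character of your 825-character string, i.e. the trailing ; after the closing braces.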
import json
resultStr = '{"inst":{"summary":{"statistics":[],"wa_recursive":"100.000%","files":11,"dus":11}},"du":{"summary":{"statistics":[{"type":"stmt","data":"Statement Coverage","status":"covered","weight":1,"rhits":"100.000%","rtotal":"100.000%"},{"data":"Statements","rhits":86.000,"rtotal":86.000},{"data":"Subprograms","rhits":0.000,"rtotal":0.000},{"type":"branch","data":"Branch Coverage","status":"covered","weight":1,"rhits":"100.000%","rtotal":"100.000%"},{"data":"Branch paths","rhits":42.000,"rtotal":42.000},{"data":"Branches","rhits":21.000,"rtotal":21.000},{"type":"toggle","data":"Toggle Coverage","status":"uncovered","weight":1,"rhits":"94.410%","rtotal":"100.000%"},{"data":"Toggle bins","rhits":304.000,"rtotal":322.000},{"data":"Signal bits","rhits":150.000,"rtotal":161.000}],"wa_recursive":"98.137%","files":11,"dus":11}}}'
decodedData = json.loads(resultStr)
print(decodedData)
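If the text comes straight from a JavaScript file, the trailing semicolon can be stripped before decoding. This is only a minimal sketch: it assumes the file holds a single object literal followed by an optional semicolon, and raw_text is a shortened, hypothetical stand-in for whatever file.read() returns.

```python
import json

# Hypothetical stand-in for the text returned by file.read(); note the trailing ';'
raw_text = '{"inst":{"summary":{"statistics":[],"wa_recursive":"100.000%","files":11,"dus":11}}};'

# Drop surrounding whitespace, then any trailing semicolons, before decoding
cleaned = raw_text.strip().rstrip(";")
decoded = json.loads(cleaned)

print(type(decoded).__name__)
print(decoded["inst"]["summary"]["files"])
```

Stripping only at the end of the string leaves any semicolons inside string values untouched.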

Related

Python errors when trying to read and query a JSON file

I am trying to write a Python function as part of my job to be able to check the existence of data in a JSON file which I can only get by downloading it from a website. I am the only resource here with any coding or scripting experience (HTML, CSS & SQL) so this has fallen to me to sort out. I have no experience thus far with Python.
I am not allowed to change the structure or format of the JSON file, the format of it is:
{
"naglowek": {
"dataGenerowaniaDanych": "20210514",
"liczbaTransformacji": "5000",
"schemat": "RRRRMMDDNNNNNNNNNNBBBBBBBBBBBBBBBBBBBBBBBBBB"
},
"skrotyPodatnikowCzynnych": [
"examplestring1",
"examplestring2",
"examplestring3",
"examplestring4",
],
"maski": [
"examplemask1",
"examplemask2",
"examplemask3",
"examplemask4"
]
}
I have tried numerous examples found online but none of them seem to work. From looking at various websites the Python code I have is:
import json
with open('20210514.json') as myfile:
    data = json.load(myfile)
print(data)
keyVal = 'examplestring2'
if keyVal in data:
    # Print the success message and the value of the key
    print("Data is found in JSON data")
else:
    # Print the message if the value does not exist
    print("Data is not found in JSON data")
But I am getting the errors below. I am a complete newbie to Python, so I am having trouble deciphering them:
D:\PycharmProjects\venv\Scripts\python.exe D:/PycharmProjects/json_test.py
Traceback (most recent call last):
File "D:\PycharmProjects\json_test.py", line 4, in <module>
data = json.load(myfile)
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 12 column 5 (char 921)
Process finished with exit code 1
Any help would be massively appreciated!

{
"naglowek": {
"dataGenerowaniaDanych": "20210514",
"liczbaTransformacji": "5000",
"schemat": "RRRRMMDDNNNNNNNNNNBBBBBBBBBBBBBBBBBBBBBBBBBB"
},
"skrotyPodatnikowCzynnych": [
"examplestring1",
"examplestring2",
"examplestring3",
"examplestring4"
],
"maski": [
"examplemask1",
"examplemask2",
"examplemask3",
"examplemask4"
]
}
This should work. The problem is the comma after the last element of the list, which Python's json parser rejects. JavaScript accepts a trailing comma in array literals (and, since ECMAScript 5, in object literals), but the JSON specification does not allow it. So make sure there is no comma at the end of a list.
For your if-else statement to be correct, you'd have to change it to something like this:
keyVal = 'examplestring2'
keyName = 'skrotyPodatnikowCzynnych'
# "in data" only tests the top-level keys, so look inside the list stored under keyName
if keyName in data and keyVal in data[keyName]:
    # Print the success message
    print("Data is found in JSON data")
else:
    # Print the message if the value does not exist
    print("Data is not found in JSON data")
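To try the check without the downloaded file, here is a small self-contained sketch. It assumes a shortened copy of the JSON from the question (with the trailing comma already removed), and the helper name contains is made up for illustration.

```python
import json

# Shortened copy of the file content from the question, without the trailing comma
text = '''
{
  "naglowek": {"dataGenerowaniaDanych": "20210514"},
  "skrotyPodatnikowCzynnych": ["examplestring1", "examplestring2"],
  "maski": ["examplemask1", "examplemask2"]
}
'''
data = json.loads(text)


def contains(data, key_name, value):
    """Return True if `value` is in the list stored under `key_name`."""
    # data.get returns an empty list when the key is missing, so no KeyError
    return value in data.get(key_name, [])


print(contains(data, "skrotyPodatnikowCzynnych", "examplestring2"))
print(contains(data, "skrotyPodatnikowCzynnych", "missing"))
```

Testing `'examplestring2' in data` directly would only compare against the three top-level keys, which is why the original if statement always took the else branch.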

Remove the trailing comma. The JSON specification does not allow a trailing comma.

If you don't want to change the file, one option is to parse it with a YAML loader, which accepts the trailing comma:
import yaml
with open('20210514.json') as myfile:
    data = yaml.load(myfile, Loader=yaml.FullLoader)
print(data)
You also need to install PyYAML first.
https://pyyaml.org/
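If installing a package is not an option, another possibility is to strip trailing commas from the text before handing it to the standard json module. This is a rough sketch, not a general JSON repair tool: the regular expression is my own and would also alter a string value that happens to contain ",]" or ",}". The sample text is a shortened version of the file from the question.

```python
import json
import re

# Shortened sample with the same trailing comma as the file in the question
text = '{"skrotyPodatnikowCzynnych": ["examplestring1", "examplestring2",\n], "maski": ["examplemask1"]}'

# Remove a comma that is followed only by whitespace and a closing ] or }
# Caveat: this does not know about string boundaries
without_trailing_commas = re.sub(r",\s*([\]}])", r"\1", text)

data = json.loads(without_trailing_commas)
print(data["skrotyPodatnikowCzynnych"])
```

The file on disk stays unchanged; only the in-memory copy is cleaned up.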

Facing problems with json decoding

I declared a variable which stores JSON output returned from a subprocess.
app_data = self.run_subprocess(create_app)
Printed, app_data looks like this:
(check comments for printed data)
I want to grab a particular value, "appId", from this str, so I try to load app_data with json.loads and grab that value:
json_str = json.loads(app_data)
print(json_str["appId"])
Error
json.decoder.JSONDecodeError: Extra data: line 190 column 1 (char 5767)

It works fine when I run it and returns the value 7f1f91c2-3b28-48ee-96ed-89080980. You can also confirm that it's a valid JSON string by checking it with a JSON validator.
The error
json.decoder.JSONDecodeError: Extra data: line 190 column 1 (char 5767)
means the decoder finished reading a complete JSON value and then found more text. I believe there is some stray character on line 190, right after the line with the closing }. Find that and delete it, and it should work fine.
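If the extra text after the JSON cannot be removed at the source, json.JSONDecoder.raw_decode can decode the first value and report where it stopped. A minimal sketch, assuming the subprocess prints one JSON object followed by other output; the sample string and its values are invented for illustration.

```python
import json

# Hypothetical subprocess output: one JSON object followed by extra text
app_data = '{"appId": "example-id", "name": "demo"}\nDone.\n'

text = app_data.lstrip()  # raw_decode expects the value to start at index 0
decoder = json.JSONDecoder()

# raw_decode returns the decoded value and the index just past its end
obj, end = decoder.raw_decode(text)
trailing = text[end:]

print(obj["appId"])
print(repr(trailing))
```

Printing the trailing part is a quick way to see exactly what json.loads was complaining about.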

Python -- get at JSON info that's written like XML

In Python, I usually do simple JSON with this sort of template:
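import json
import urllib2  # Python 2; on Python 3 use urllib.request instead
url = "url"
response = urllib2.urlopen(url)
raw = response.read()  # don't call this variable "json": it would shadow the module
parsed = json.loads(raw)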
and then get at the variables with calls like:
parsed[obj name][value name]
But, this works with JSON that's formatted roughly like:
{'object':{'index':'value', 'index':'value'}}
The JSON I just encountered is formatted like:
{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}
so there are no names for me to reference the different blocks. Of course the blocks give different info, but have the same "keys" -- much like XML is usually formatted. Using my method above, how would I parse through this JSON?

The following is not valid JSON: it contains two top-level objects separated by a comma, and JSON strings need double quotes rather than single quotes.
{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}
Whereas
[{"index":"value", "index":"value"},{"index":"value", "index":"value"}] is valid JSON.
The Python traceback shows this:
import json
json_string = '{"a":"value", "b":"value"},{"a":"value", "b":"value"}'
parsed_json = json.loads(json_string)
print(parsed_json)
Traceback (most recent call last):
File "/Users/tron/Desktop/test3.py", line 3, in <module>
parsed_json = json.loads(json_string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 27 - line 1 column 54 (char 26 - 53)
[Finished in 0.0s with exit code 1]
whereas if you do
json_string = '[{"a":"value", "b":"value"},{"a":"value", "b":"value"}]'
everything works fine.
If that is the case, the parsed result is a list of dicts: parsed_json[0] is the first object, parsed_json[1] is the second, and so on.
Otherwise, if you think this is an issue that you "just have to deal with", here is one option:
Think of the ways the JSON can be malformed and write a simple function to account for them. For the case above, here is a hacky way to deal with it.
import json
json_string = '{"a":"value", "b":"value"},{"a":"value", "b":"value"}'
def parseJson(string):
    try:
        return json.loads(string)
    except ValueError as e:
        print("%s didn't parse" % string)
        # "Extra data" means a complete value was followed by more text,
        # so wrap everything in [ ] and retry as a list
        if "Extra data" in str(e):
            newString = "[" + string + "]"
            print(newString)
            return parseJson(newString)
    # Any other kind of error is not handled here
    return None
You could add more if/else to deal with various things you run into. I have to admit, this is very hacky and I don't think you can ever account for every possible mutation.
Good luck

The result must be a list of dicts:
[{'index1':'value1', 'index2':'value2'},{'index1':'value1', 'index2':'value2'}]
so you can reference each element by number: item[1]['index1']
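As a small illustration of that indexing, here is a sketch that assumes the server sends comma-separated objects with double-quoted strings and nothing else; the sample values are made up so the two blocks can be told apart.

```python
import json

# Hypothetical response: two objects separated by a comma, no enclosing brackets
raw = '{"index1":"value1", "index2":"value2"},{"index1":"value3", "index2":"value4"}'

# Wrapping the text in [ ] turns it into one JSON array of objects
items = json.loads("[" + raw + "]")

print(items[1]["index1"])

# The blocks share the same keys, so they can be processed in a loop
for position, item in enumerate(items):
    print(position, item["index1"], item["index2"])
```

The position in the list plays the role that the object name played in the named format.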

Get valid python list from string (javascript array)

I'm trying to get a valid Python list from a server response like the one below:
window.__search.list=[{"order":"1","base":"LAW","n":"148904","access":{"css":"avail_yes","title":"\u0422\u0435\u043a\u0441\u0442\u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0430\u0434\u043e\u0441\u0442\u0443\u043f\u0435\u043d"},"title":"\"\u0410\u0440\u0431\u0438\u0442\u0440\u0430\u0436\u043d\u044b\u0439\u043f\u0440\u043e\u0446\u0435\u0441\u0441\u0443\u0430\u043b\u044c\u043d\u044b\u0439\u043a\u043e\u0434\u0435\u043a\u0441\u0420\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u043e\u0439\u0424\u0435\u0434\u0435\u0440\u0430\u0446\u0438\u0438\" \u043e\u0442 24.07.2002 N 95-\u0424\u0417 (\u0440\u0435\u0434. \u043e\u0442 02.07.2013) (\u0441 \u0438\u0437\u043c. \u0438 \u0434\u043e\u043f.,\u0432\u0441\u0442\u0443\u043f\u0430 \u044e\u0449\u0438\u043c\u0438\u0432 \u0441\u0438\u043b\u0443 \u0441 01.08.2013)"}, ... }];
I did it by cutting off "window.__search.list=" and ";" from the string using data = json.loads(re.search(r"(?=\[)(.*?)\s*(?=\;)", url).group(1)), and then it looked like standard JSON:
[{u'access': {u'css': u'avail_yes', u'title': u'\u0422\u0435\u043a\u0441\u0442\u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0430 \u0434\u043e\u0441\u0442\u0443\u043f\u0435\u043d'},u'title': u'"\u0410\u0440\u0431\u0438\u0442\u0440\u0430\u0436\u043d\u044b\u0439\u043f\u0440\u043e\u0446\u0435\u0441\u0441\u0443\u0430\u043b\u044c\u043d\u044b\u0439\u043a\u043e\u0434\u0435\u043a\u0441\u0420\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u043e\u0439\u0424\u0435\u0434\u0435\u0440\u0430\u0446\u0438\u0438" \u043e\u0442 24.07.2002 N 95-\u0424\u0417 (\u0440\u0435\u0434. \u043e\u0442 02.07.2013) (\u0441 \u0438\u0437\u043c. \u0438 \u0434\u043e\u043f.,\u0432\u0441\u0442\u0443\u043f\u0430\u044e\u0449\u0438\u043c\u0438 \u0432 \u0441\u0438\u043b\u0443 \u0441 01.08.2013)', u'base': u'LAW', u'order': u'1', u'n': u'148904'}, ... }]
But sometimes, while iterating over other URLs, I get an error like this:
File "/Developer/Python/test.py", line 123, in order_search
data = json.loads(re.search(r"(?=\[)(.*?)\s*(?=\;)", url).group(1))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid \uXXXX escape: line 1 column 20235 (char 20235)
How can I fix it, or is there another way to get valid JSON (preferably using native libraries)?

Your regular expression has probably matched a ';' somewhere in the middle of the response. The lazy (.*?) stops at the first ';' it can reach, so you receive an incomplete, cropped response, which cannot be decoded as JSON.
I agree with user RickyA that plain string methods are sometimes easier to read than a regular expression. But here I would still use a regular expression, something like this:
data = re.search(r'(?=\[)(.*?)[\;]*$', response).group(1)
/(?=\[)(.*?)[\;]*$/
(?=\[) Positive Lookahead
\[ Literal [
1st Capturing group (.*?)
. 0 to infinite times [lazy] Any character (except newline)
Char class [\;] 0 to infinite times [greedy] matches:
\; The character ;
$ End of string
I assume the variable 'url' holds the response from the server; if so, 'response' would be a better name than 'url'.
And if you have trouble with regular expressions, I advise you to use a regular expression editor such as RegEx 101. It is an online editor that explains each block of the expression you enter.

What about:
response = response.strip() #get rid of whitespaces
response = response[response.find("["):] #trim everything before the first '['
if response[-1:] == ";": #if last char == ";"
    response = response[:-1] #trim it
Seems like a big overkill to do this with regex.
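Put together as a function, the string-method approach could look like the sketch below. It assumes the response always has the shape name=[...]; and the function name extract_json and the sample response are invented for illustration. Because only the start and the very end of the text are trimmed, a ';' inside a string value does not cut the data short.

```python
import json


def extract_json(response):
    """Parse the array assigned in a response shaped like 'name=[...];'."""
    text = response.strip()
    start = text.find("[")
    if start == -1:
        raise ValueError("no '[' found in response")
    # Keep everything from the first '[' and drop a trailing ';' if present
    text = text[start:].rstrip().rstrip(";")
    return json.loads(text)


# Hypothetical response with a ';' inside a string value
sample = 'window.__search.list=[{"order":"1","base":"LAW","title":"a; b"}];'
data = extract_json(sample)

print(data[0]["base"])
print(data[0]["title"])
```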

How to use simplejson to decode following data?

I grabbed some data from a URL and, after searching online, found out that the data is in JSON format, but when I try simplejson.loads(data), it raises an exception.
This is my first time dealing with JSON data; any suggestions on how to decode it?
Thanks
=================
result = simplejson.loads(data, encoding="utf-8")
File "F:\My Documents\My Dropbox\StockDataDownloader\simplejson\__init__.py", line 401, in loads
return cls(encoding=encoding, **kw).decode(s)
File "F:\My Documents\My Dropbox\StockDataDownloader\simplejson\decoder.py", line 402, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "F:\My Documents\My Dropbox\StockDataDownloader\simplejson\decoder.py", line 420, in raw_decode
raise JSONDecodeError("No JSON object could be decoded", s, idx)
simplejson.decoder.JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)
============================
data = "{identifier:'ID', label:'As at Wed 4 Aug 2010 05:05 PM',items:[{ID:0,N:'2ndChance',NC:'528',R:'NONE',I:'NONE',M:'-',LT:0.335,C:0.015,VL:51.000,BV:20.000,B:0.330,S:0.345,SV:20.000,O:0.335,H:0.335,L:0.335,V:17085.000,SC:'4',PV:0.320,P:4.6875,P_:'X',V_:''},{ID:1,N:'8Telecom',NC:'E25',R:'NONE',I:'NONE',M:'-',LT:0.190,C:0.000,VL:965.000,BV:1305.000,B:0.185,S:0.190,SV:641.000,O:0.185,H:0.190,L:0.185,V:179525.000,SC:'2',PV:0.190,P:0.0,P_:'X',V_:''},{ID:2,N:'A-Sonic',NC:'A53',R:'NONE',I:'NONE',M:'-',LT:0.090,C:0.005,VL:1278.000,BV:17.000,B:0.090,S:0.095,SV:346.000,O:0.090,H:0.090,L:0.090,V:115020.000,SC:'A',PV:0.085,P:5.882352734375,P_:'X',V_:''},{ID:3,N:'AA Grp',NC:'5GZ',R:'NONE',I:'NONE',M:'t',LT:0.000,C:0.000,VL:0.000,BV:100.000,B:0.050,S:0.060,SV:50.000,O:0.000,H:0.000,L:0.000,V:0.000,SC:'2',PV:0.050,P:0.0,P_:'X',V_:''}]}"

You're using simplejson correctly, but the site that gave you that data isn't using JSON format properly. Look at json.org, which uses simple syntax diagrams to show what is JSON: in the object diagram, after { (unless the object is empty, in which case a } immediately follows), JSON always has a string -- and as you see in that diagram, this means something that starts with a double quote. So, the very start of the string:
{identifier:
tells you that's incorrect JSON -- no double quotes around the word identifier.
Working around this problem is not as easy as recognizing it's there, but I wanted to reassure you, at least, about your code. Sigh, it does seem that broken websites, such a great tradition in the old HTML days, are with us to stay, no matter how modern the technology they break is... :-(
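For data shaped like the sample in the question, one possible workaround is to rewrite the text into real JSON before decoding. This is a fragile sketch built on assumptions of mine: keys are bare identifiers that directly follow a { or a comma, and string values contain neither quote characters nor a comma followed by a word and a colon. The function name and the regular expression are not part of simplejson or the json module.

```python
import json
import re


def loads_js_object(text):
    """Decode a JavaScript-style object literal with bare keys and single quotes."""
    # Put double quotes around bare keys that follow '{' or ','
    quoted_keys = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)
    # Turn single-quoted strings into double-quoted ones
    # (assumes no quote characters inside the values)
    return json.loads(quoted_keys.replace("'", '"'))


# Shortened version of the data from the question
data = "{identifier:'ID', label:'As at Wed 4 Aug 2010 05:05 PM',items:[{ID:0,N:'2ndChance',LT:0.335,P_:'X',V_:''}]}"
result = loads_js_object(data)

print(result["label"])
print(result["items"][0]["N"])
```

The time 05:05 in the label survives because the colon there is not preceded by a { or a comma, but other feeds may well break these assumptions.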
