How to check if a string is a valid JSON in python

How to check if a string is a valid JSON in python - python

I have a python script providing command line / output in console on remote linux.
I have another script which is reading this output on local machine.
Output is in below format:
ABC: NEG
BCD: NEG
FGH: POS
{aa:bb:cc:dd:ee{"value":"30","type":"Tip 3","targetModule":"Target 3","configurationGroup":null,"name":"Configuration Deneme 3","description":null,"identity":"Configuration Deneme 3","version":0,"systemId":3,"active":true}}
notice last line is in json format, now I want to check which line is in json format of the output.
I tried
if "value" in line:
json.loads(line)
it is not reading and even
json.dumps(line)
not giving output ?

You can use try except clause to check if a string is actually json:
import json
line = '<what you think is json>'
try:
json_line = json.loads(line)
except ValueError:
print("not a json")
In your above code the last line is not a valid JSON. You can use this tool JSONLint to verify if your JSON is a valid JSON.

Related

Python writing to file and json returns None/null instead of value

I'm trying to write data to a file with the following code
#!/usr/bin/python37all
print('Content-type: text/html\n\n')
import cgi
from Alarm import *
import json
htmldata = cgi.FieldStorage()
alarm_time = htmldata.getvalue('alarm_time')
alarm_date = htmldata.getvalue('alarm_date')
print(alarm_time,alarm_date)
data = {'time':alarm_time,'date':alarm_date}
# print(data['time'],data['date'])
with open('alarm_data.txt','w') as f:
json.dump(data,f)
...
but when opening the the file, I get the following output:
{'time':null,'date':null}
The print statement returns what I except it to: 14:26 2020-12-12.
I've tried this same method with f.write() but it returns both values as None. This is being run on a raspberry pi. Why aren't the correct values being written?
--EDIT--
The json string I expect to see is the following:{'time':'14:26','date':'2020-12-12'}

Perhaps you meant:
data = {'time':str(alarm_time), 'date':str(alarm_date)}
I would expect to see your file contents like this:
{"time":"14:26","date":"2020-12-12"}
Note the double quotes: ". json is very strict about these things, so don't fool yourself into having single quotes ' in a file and expecting json to parse it.

Facing problems with json decoding

I declared a variable which stores JSON file (output returned from subprocess).
app_data = self.run_subprocess(create_app)
Printed app_data looks like that:
(check comments for printed data)
I want to grab particular value from this str "appId", so I try to load app_data to json string and grab that value..
json_str = json.loads(app_data)
print(json_str["appId"])
Error
json.decoder.JSONDecodeError: Extra data: line 190 column 1 (char 5767)

It works fine upon running it and return the value 7f1f91c2-3b28-48ee-96ed-89080980. You can also confirm that it's a valid Json String by checking with a validator here.
The error
json.decoder.JSONDecodeError: Extra data: line 190 column 1 (char 5767)
I believe the error is to do with some malformed character on line 190, right after the line of the closing }. Find that and delete it and it should work fine

Getting KeyError when parsing JSON containing three layers of keys, using Python

I'm building a Python program to parse some calls to a social media API into CSV and I'm running into an issue with a key that has two keys above it in the hierarchy. I get this error when I run the code with PyDev in Eclipse.
Traceback (most recent call last):
line 413, in <module>
main()
line 390, in main
postAgeDemos(monitorID)
line 171, in postAgeDemos
age0To17 = str(i["ageCount"]["sortedAgeCounts"]["ZERO_TO_SEVENTEEN"])
KeyError: 'ZERO_TO_SEVENTEEN'
Here's the section of the code I'm using for it. I have a few other functions built already that work with two layers of keys.
import urllib.request
import json
def postAgeDemos(monitorID):
print("Enter the date you'd like the data to start on")
startDate = input('The date must be in the format YYYY-MM-DD. ')
print("Enter the date you'd like the data to end on")
endDate = input('The date must be in the format YYYY-MM-DD. ')
dates = "&start="+startDate+"&end="+endDate
urlStart = getURL()
authToken = getAuthToken()
endpoint = "/monitor/demographics/age?id=";
urlData = urlStart+endpoint+monitorID+authToken+dates
webURL = urllib.request.urlopen(urlData)
fPath = getFilePath()+"AgeDemographics"+startDate+"&"+endDate+".csv"
print("Connecting...")
if (webURL.getcode() == 200):
print("Connected to "+urlData)
print("This query returns information in a CSV file.")
csvFile = open(fPath, "w+")
csvFile.write("postDate,totalPosts,totalPostsWithIdentifiableAge,0-17,18-24,25-34,35+\n")
data = webURL.read().decode('utf8')
theJSON = json.loads(data)
for i in theJSON["ageCounts"]:
postDate = i["startDate"]
totalDocs = str(i["numberOfDocuments"])
totalAged = str(i["ageCount"]["totalAgeCount"])
age0To17 = str(i["ageCount"]["sortedAgeCounts"]["ZERO_TO_SEVENTEEN"])
age18To24 = str(i["ageCount"]["sortedAgeCounts"]["EIGHTEEN_TO_TWENTYFOUR"])
age25To34 = str(i["ageCount"]["sortedAgeCounts"]["TWENTYFIVE_TO_THIRTYFOUR"])
age35Over = str(i["ageCount"]["sortedAgeCounts"]["THIRTYFIVE_AND_OVER"])
csvFile.write(postDate+","+totalDocs+","+totalAged+","+age0To17+","+age18To24+","+age25To34+","+age35Over+"\n")
print("File printed to "+fPath)
csvFile.close()
else:
print("Server Error, No Data" + str(webURL.getcode()))
Here's a sample of the JSON I'm trying to parse.
{"ageCounts":[{"startDate":"2016-01-01T00:00:00","endDate":"2016-01-02T00:00:00","numberOfDocuments":520813,"ageCount":{"sortedAgeCounts":{"ZERO_TO_SEVENTEEN":3245,"EIGHTEEN_TO_TWENTYFOUR":4289,"TWENTYFIVE_TO_THIRTYFOUR":2318,"THIRTYFIVE_AND_OVER":70249},"totalAgeCount":80101}},{"startDate":"2016-01-02T00:00:00","endDate":"2016-01-03T00:00:00","numberOfDocuments":633709,"ageCount":{"sortedAgeCounts":{"ZERO_TO_SEVENTEEN":3560,"EIGHTEEN_TO_TWENTYFOUR":1702,"TWENTYFIVE_TO_THIRTYFOUR":2786,"THIRTYFIVE_AND_OVER":119657},"totalAgeCount":127705}}],"status":"success"}
Here it is again with line breaks so it's a little easier to read.
{"ageCounts":[{"startDate":"2016-01-01T00:00:00","endDate":"2016-01-02T00:00:00","numberOfDocuments":520813,"ageCount":
{"sortedAgeCounts":{"ZERO_TO_SEVENTEEN":3245,"EIGHTEEN_TO_TWENTYFOUR":4289,"TWENTYFIVE_TO_THIRTYFOUR":2318,"THIRTYFIVE_AND_OVER":70249},"totalAgeCount":80101}},
{"startDate":"2016-01-02T00:00:00","endDate":"2016-01-03T00:00:00","numberOfDocuments":633709,"ageCount":
{"sortedAgeCounts":{"ZERO_TO_SEVENTEEN":3560,"EIGHTEEN_TO_TWENTYFOUR":1702,"TWENTYFIVE_TO_THIRTYFOUR":2786,"THIRTYFIVE_AND_OVER":119657},"totalAgeCount":127705}}],"status":"success"}
I've tried removing the ["sortedAgeCounts"] from in the middle of
age0To17 = str(i["ageCount"]["sortedAgeCounts"]["ZERO_TO_SEVENTEEN"])
but I still get the same error. I've remove the 0-17 section to test the other age ranges and I get the same error for them as well. I tried removing all the underscores from the JSON and then using keys without the underscores.
I've also tried moving the str() to convert to string from the call to where the output is printed but the error persists.
Any ideas? Is this section not actually a JSON key, maybe a problem with the all caps or am I just doing something dumb? Any other code improvements are welcome as well but I'm stuck on this one.
Let me know if you need to see anything else. Thanks in advance for your help.

Edited(This works):
JSON=json.loads(s)
for i in JSON:
print str(JSON[i][0]["ageCount"]["sortedAgeCounts"]["ZERO_TO_SEVENTEEN"])
s is a string which contains the your JSON.

Python -- get at JSON info that's written like XML

In Python, I usually do simple JSON with this sort of template:
url = "url"
file = urllib2.urlopen(url)
json = file.read()
parsed = json.loads(json)
and then get at the variables with calls like:
parsed[obj name][value name]
But, this works with JSON that's formatted roughly like:
{'object':{'index':'value', 'index':'value'}}
The JSON I just encountered is formatted like:
{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}
so there are no names for me to reference the different blocks. Of course the blocks give different info, but have the same "keys" -- much like XML is usually formatted. Using my method above, how would I parse through this JSON?

The following is not a valid JSON.
{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}
Where as
[{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}] is a valid JSON.
and python trackback shows that
import json
string = "{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}"
parsed = json.loads(string)
print parsed
Traceback (most recent call last):
File "/Users/tron/Desktop/test3.py", line 3, in <module>
parsed_json = json.loads(json_string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 27 - line 1 column 54 (char 26 - 53)
[Finished in 0.0s with exit code 1]
where is if you do
json_string = '[{"a":"value", "b":"value"},{"a":"value", "b":"value"}]'
everything works fine.
If that is the case, you can refer to it as an array of Jsons. where json_string[0] is the first JSON string. json_string[1] is the second and so on.
Otherwise if you think this is going to be an issue that you "just have to deal with". Here is one option:
Think of the ways JSON can be malformed and write a simple class to account for them. In the case above, here is a hacky way you can deal with it.
import json
json_string = '{"a":"value", "b":"value"},{"a":"value", "b":"value"}'
def parseJson(string):
parsed_json = None
try:
parsed_json = json.loads(string)
print parsed_json
except ValueError, e:
print string, "didnt parse"
if "Extra data" in str(e.args):
newString = "["+string+"]"
print newString
return parseJson(newString)
You could add more if/else to deal with various things you run into. I have to admit, this is very hacky and I don't think you can ever account for every possible mutation.
Good luck

The result must be list of dict:
[{'index1':'value1', 'index2':'value2'},{'index1':'value1', 'index2':'value2'}]
thus you can reference it using numbers: item[1]['index1']

Error tolerant RDF parsing using RDFlib in python

I am writing a code, that gathers some statistics about ontologies. as input I have a folder with files some are RDF/XML, some are turtle or nt.
My problem is, that when I try to parse a file using wrong format, next time even if I parse it with correct format it fails.
Here test file is turtle format. If first parse it with turtle format all is fine. but if I first parse it with the wrong format 1. error is understandable (file:///test:1:0: not well-formed (invalid token)), but error for second is (Unknown namespace prefix : owl). Like I said when I first parse with the correct one, I don't get namespace error.
Pleas help, after 2 days, I'm getting desperate.
query = 'SELECT DISTINCT ?s ?o WHERE { ?s ?p owl:Ontology . ?s rdfs:comment ?o}'
data = open("test", "r")
g = rdflib.Graph("IOMemory")
try:
result = g.parse(file=data,format="xml")
relations = g.query(query)
print(( " graph has %s statements." % len(g)))
except:
print "bad1"
e = sys.exc_info()[1]
print e
try:
result = g.parse(file=data,format="turtle")
relations = g.query(query)
print(( " graph has %s statements." % len(g)))
except :
print "bad2"
e = sys.exc_info()[1]
print e

The problem is that the g.parse reads some part from the file input stream of data first, only to figure out afterwards that it is not xml. The second call (with the turtle format) then continues to read from the input stream after the part where the previous attempt has stopped. The part read by the first parser is lost to the secnd one.
If your test file is small, the xml-parser might have read it all, leaving an "empty" rest. It seems the turtle parser did not complain - it just read in nothing. Only the query in the next statement failed to find anything owl-like in it, as the graph is empty. (I have to admit I cannot reproduce this part, the turtle parser does complain in my case, but maybe I have a different version of rdflib)
To fix it, try to reopen the file; either reorganize the code so you have an data = open("test", "r") every time you call result = g.parse(file=data, format="(some format)"), or call data.seek(0) in the except: clause, like:
for format in 'xml','turtle':
try:
print 'reading', format
result = g.parse(data, format=format)
print 'success'
break
except Exception:
print 'failed'
data.seek(0)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to check if a string is a valid JSON in python - python

Related

Python writing to file and json returns None/null instead of value

Facing problems with json decoding

Getting KeyError when parsing JSON containing three layers of keys, using Python

Python -- get at JSON info that's written like XML

Error tolerant RDF parsing using RDFlib in python

Categories

Resources