Python code to load data from some long complicated JSON file:
with open(filename, "r") as f:
    data = json.loads(f.read())
(Note: a better version of this code would be:
with open(filename, "r") as f:
    data = json.load(f)
but both exhibit similar behavior)
For many types of JSON error (missing delimiters, incorrect backslashes in strings, etc.), this prints a nice, helpful message containing the line and column number where the JSON error was found.
However, for other types of JSON error (including the classic "using a comma on the last item in a list", but also other things like capitalising true/false), Python's output is just:
Traceback (most recent call last):
File "myfile.py", line 8, in myfunction
config = json.loads(f.read())
File "c:\python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "c:\python27\lib\json\decoder.py", line 360, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
For that type of ValueError, how do you get Python to tell you where the error is in the JSON file?
I've found that the simplejson module gives more descriptive errors in many cases where the built-in json module is vague. For instance, for the case of having a comma after the last item in a list:
json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded
which is not very descriptive. The same operation with simplejson:
simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)
Much better! Likewise for other common errors like capitalizing True.
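If you want to handle the failure in code rather than just read the traceback, simplejson's exception also exposes the position. A minimal sketch (using the exception class shown in the traceback above; lineno, colno and msg are attributes of simplejson's JSONDecodeError):

import simplejson

try:
    simplejson.loads('[1,2,]')
except simplejson.decoder.JSONDecodeError as e:
    # e.lineno / e.colno give the position, e.msg the description
    print("Bad JSON at line %d, column %d: %s" % (e.lineno, e.colno, e.msg))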
You won't be able to get Python to tell you where the JSON is incorrect. You will need to use an online linter somewhere, like this.
This will show you the error in the JSON you are trying to decode.
You could try the rson library, found here: http://code.google.com/p/rson/. It is also up on PyPI (https://pypi.python.org/pypi/rson/0.9), so you can use easy_install or pip to get it.
For the example given by Tom:
>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'
RSON is designed to be a superset of JSON, so it can parse JSON files. It also has an alternative syntax which is much nicer for humans to look at and edit. I use it quite a bit for input files.
As for the capitalizing of boolean values: it appears that rson reads incorrectly capitalized booleans as strings.
>>> rson.loads('[true,False]')
[True, u'False']
I had a similar problem and it was due to single quotes. The JSON standard (http://json.org) only allows double quotes, so it must be that the Python json library supports only double quotes.
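To illustrate the difference (a minimal example):

import json

json.loads('{"key": "value"}')   # fine: JSON strings use double quotes
json.loads("{'key': 'value'}")   # ValueError: single-quoted strings are not valid JSON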
For my particular version of this problem, I went ahead and searched for the function declaration of load_json_file(path) within the packaging.py file, then smuggled a print line into it:
def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e),
                                                               path))
That way it prints the content of the JSON file before entering the try/except, and so – even with my barely existing Python knowledge – I was able to quickly figure out why my configuration couldn't read the JSON file.
(It was because I had set up my text editor to write a UTF-8 BOM … stupid)
Just mentioning this because, while maybe not a good answer to the OP's specific problem, this was a rather quick way of determining the source of a very oppressive bug. And I bet that many people searching for a more verbose solution to a MalformedJsonFileError: No JSON object could be decoded when reading … will stumble upon this answer, so it might help them.
In my case the JSON file was very large, and using the standard json module in Python gave the above error.
After installing simplejson with sudo pip install simplejson, the problem was solved.
import json
import simplejson


def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)  # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
        lst_img = j_data['images']['image']
        print lst_img[0]


if __name__ == '__main__':
    test_parse_json()
I had a similar problem; this was my code:
json_file=json.dumps(pyJson)
file = open("list.json",'w')
file.write(json_file)
json_file = open("list.json","r")
json_decoded = json.load(json_file)
print json_decoded
The problem was that I had forgotten to call file.close(). I did that and it fixed the problem.
The accepted answer is the easiest way to fix the problem. But in case you are not allowed to install simplejson due to company policy, I propose the solution below to fix the particular issue of "using a comma on the last item in a list":
Create a child class JSONLintCheck that inherits from JSONDecoder and overrides the __init__ method of JSONDecoder like below:
class JSONLintCheck(JSONDecoder):
    def __init__(self, encoding=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None):
        super(JSONLintCheck, self).__init__(encoding=encoding, object_hook=object_hook, parse_float=parse_float, parse_int=parse_int, parse_constant=parse_constant, strict=strict, object_pairs_hook=object_pairs_hook)
        self.scan_once = make_scanner(self)
make_scanner is a new function used to override the scan_once method of the above class. Here is the code for it:
#!/usr/bin/env python
from json import JSONDecoder
from json import decoder
import re

NUMBER_RE = re.compile(
    r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))


def py_make_scanner(context):
    parse_object = context.parse_object
    parse_array = context.parse_array
    parse_string = context.parse_string
    match_number = NUMBER_RE.match
    encoding = context.encoding
    strict = context.strict
    parse_float = context.parse_float
    parse_int = context.parse_int
    parse_constant = context.parse_constant
    object_hook = context.object_hook
    object_pairs_hook = context.object_pairs_hook

    def _scan_once(string, idx):
        try:
            nextchar = string[idx]
        except IndexError:
            # raise StopIteration
            raise ValueError(decoder.errmsg("Could not get the next character", string, idx))

        if nextchar == '"':
            return parse_string(string, idx + 1, encoding, strict)
        elif nextchar == '{':
            return parse_object((string, idx + 1), encoding, strict,
                                _scan_once, object_hook, object_pairs_hook)
        elif nextchar == '[':
            return parse_array((string, idx + 1), _scan_once)
        elif nextchar == 'n' and string[idx:idx + 4] == 'null':
            return None, idx + 4
        elif nextchar == 't' and string[idx:idx + 4] == 'true':
            return True, idx + 4
        elif nextchar == 'f' and string[idx:idx + 5] == 'false':
            return False, idx + 5

        m = match_number(string, idx)
        if m is not None:
            integer, frac, exp = m.groups()
            if frac or exp:
                res = parse_float(integer + (frac or '') + (exp or ''))
            else:
                res = parse_int(integer)
            return res, m.end()
        elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
            return parse_constant('NaN'), idx + 3
        elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
            return parse_constant('Infinity'), idx + 8
        elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
            return parse_constant('-Infinity'), idx + 9
        else:
            # raise StopIteration  # Here is where the modification is needed
            raise ValueError(decoder.errmsg("Expecting property name enclosed in double quotes", string, idx))

    return _scan_once


make_scanner = py_make_scanner
It is best to put the make_scanner function and the new child class in the same file.
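A hypothetical usage sketch (assuming JSONLintCheck and make_scanner are defined in the same module as above): decoding with the custom class should now raise a ValueError that carries a position, instead of the bare "No JSON object could be decoded".

try:
    JSONLintCheck().decode('[1,2,]')
except ValueError as e:
    print e  # the message now includes a line/column position from decoder.errmsg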
Just hit the same issue, and in my case the problem was related to a BOM (byte order mark) at the beginning of the file.
json.tool would refuse to process even an empty file (just curly braces) until I removed the UTF BOM mark.
What I did was:
opened my JSON file with vim,
removed the byte order mark (:set nobomb),
saved the file.
This resolved the problem with json.tool. Hope this helps!
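If you would rather strip the BOM from Python instead of the editor, a minimal sketch (assuming a UTF-8 BOM and the filename variable from the question):

import codecs
import json

data = open(filename, 'rb').read()
if data.startswith(codecs.BOM_UTF8):
    data = data[len(codecs.BOM_UTF8):]  # drop the UTF-8 byte order mark
obj = json.loads(data)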
When your file is created, instead of leaving its content empty, write an empty JSON object into it:
json.dump({}, file)
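For instance (a minimal sketch; the file name is just for illustration):

import json

with open('config.json', 'w') as f:
    json.dump({}, f)       # the file now contains "{}" instead of being empty

with open('config.json') as f:
    data = json.load(f)    # parses fine; a truly empty file would raise the error above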
You could use cjson, which claims to be up to 250 times faster than pure-Python implementations, given that you have "some long complicated JSON file" and you will probably need to run the parse several times (decoders fail and report only the first error they encounter).
I've been trying to do some API queries to get some missing data in my DataFrame. I'm using the grequests library to send multiple requests and build a list of response objects. Then I use a for loop to load each response as JSON and retrieve the missing data. What I noticed is that loading the data with .json() directly from the list, using the notation list[0].json(), works fine, but when I iterate over the list and then load each response as JSON, this error comes up: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Here's my code:
import requests
import json
import grequests

ls = []
for i in null_data['name']:
    url = 'https://pokeapi.co/api/v2/pokemon/' + i.lower()
    ls.append(url)

rs = (grequests.get(u) for u in ls)
s = grequests.map(rs)

# This line works
print(s[0].json()['weight']/10)

for x in s:
    # This one fails
    js = x.json()
    peso = js['weight']/10
    null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
<ipython-input-21-9f404bc56f66> in <module>
13
14 for x in s:
---> 15 js = x.json()
16 peso = js['weight']/10
17 null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
One (or more) of the elements is empty.
So:
...
for x in s:
    if x != "":
        js = x.json()
        peso = js['weight']/10
        null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
...
or
...
for x in s:
    try:
        js = x.json()
        peso = js['weight']/10
        null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
    except json.JSONDecodeError as ex:
        print("Failed to decode (%s)" % ex)
...
The first checks whether x is an empty string, while the second tries to decode every element but, upon an exception, just prints an error message instead of quitting.
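Note that grequests.map() puts None in the result list for requests that failed outright, so a more defensive loop (a sketch, not part of the answer above) might also guard against that:

for x in s:
    # grequests.map() returns None for requests that raised an exception,
    # and a non-JSON body will make .json() raise a JSONDecodeError (a ValueError).
    if x is None or not x.text.strip():
        continue
    try:
        js = x.json()
    except ValueError:
        continue
    peso = js['weight']/10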
Problem
For a Markdown document I want to filter out all sections whose header titles are not in the list to_keep. A section consists of a header and the body until the next section or the end of the document. For simplicity, let's assume that the document only has level 1 headers.
When I make a simple case distinction on whether the current element has been preceded by a header in to_keep, and do either return None or return [], I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (code, example file and complete error message at the end).
I use Python 3.7.4 and panflute 1.12.5 and pandoc 2.2.3.2.
Question
If I make a more fine-grained distinction on when to do return [], it works (function action_working). My question is: why is this more fine-grained distinction necessary? My solution seems to work, but it might well be accidental... How can I get this to work properly?
Files
error
Traceback (most recent call last):
File "filter.py", line 42, in <module>
main()
File "filter.py", line 39, in main
return run_filter(action_not_working, doc=doc)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
return run_filters([action], *args, **kwargs)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
dump(doc, output_stream=output_stream)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1
input.md
# English
Some cool english text this is!
# Deutsch
Dies ist die deutsche Übersetzung!
# Sources
Some source.
# Priority
**Medium** *[Low | Medium | High]*
# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*
# Interested Persons (mailing list)
- Franz, Heinz, Karl
filter.py
from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False


def action_not_working(elem, doc):
    '''For every element we check if it occurs in a section we wish to keep.
    If it does, we keep it and return None (indicating to keep the element unchanged).
    Otherwise we remove the element (return []).'''
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        return []


def action_working(elem, doc):
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        if isinstance(elem, Header):
            return []
        elif isinstance(elem, Para):
            return []
        elif isinstance(elem, BulletList):
            return []


def update_keep(elem):
    '''If the element is a header, we update keep_current.'''
    global to_keep, keep_current
    if isinstance(elem, Header):
        # Keep if the title of a section is in to_keep
        keep_current = stringify(elem) in to_keep


def main(doc=None):
    return run_filter(action_not_working, doc=doc)


if __name__ == '__main__':
    main()
I think what happens is that panflute calls the action on all elements, including the Doc root element. If keep_current is False when walking the Doc element, it will be replaced by a list. This leads to the error message you are seeing, as panflute expects the root node to always be there.
The updated filter only acts on Header, Para, and BulletList elements, so the Doc root node will be left untouched. You'll probably want to use something more generic like isinstance(elem, Block) instead.
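A sketch of what that might look like (not from the original answer; it assumes the same update_keep helper and globals as the filter above):

def action(elem, doc):
    global keep_current
    update_keep(elem)
    if keep_current:
        return None
    # Only drop block-level elements; the Doc root is not a Block,
    # so it never gets replaced by a list.
    if isinstance(elem, Block):
        return []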
An alternative approach could be to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over all blocks in args and remove all that are unwanted, then dump the resulting doc back into the output stream.
from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

doc = load()
for top_level_block in doc.args:
    pass  # do things, remove unwanted blocks
dump(doc)
I need to navigate to different URLs to download images from each of them.
The URLs are sequential, so I thought it best to create them manually rather than using the Next button on each page.
I'm trying to generate the different sections of the URL and then join them together with os.path.join().
This is my working code:
import os
import re

import bs4
import requests

starting_url = 'https://www.mangareader.net/one-piece'
storing_folder = '/Users/macbook/Documents/Media/Fumetti/One_Piece'
ch_numb_regex = re.compile(r'\d+')

for chapter in os.listdir(storing_folder):
    if not chapter.startswith('.'):
        if os.listdir(os.path.join(storing_folder, chapter)) == []:
            continue
        else:
            try:
                page = 1
                while True:
                    res = requests.get(os.path.join(starting_url, str(ch_numb_regex.search(chapter).group()), str(page)))
                    res.raise_for_status()
                    manga_soup = bs4.BeautifulSoup(res.text, 'lxml')
                    manga_image = manga_soup.select('#imgholder img')
                    manga_url = manga_image[0].get('src')
                    res = requests.get(manga_url)
                    res.raise_for_status()
                    imageFile = open(os.path.join(storing_folder, chapter, page), 'wb')
                    imageFile.write()
                    imageFile.close()
                    page += 1
            except requests.HTTPError:
                continue
However, I get the error:
TypeError Traceback (most recent call last)
<ipython-input-20-1ee22580435e> in <module>()
7 res = requests.get(manga_url)
8 res.raise_for_status()
----> 9 imageFile = open(os.path.join(storing_folder, chapter, page), 'wb')
10 imageFile.write()
11 imageFile.close()
/anaconda3/lib/python3.6/posixpath.py in join(a, *p)
90 path += sep + b
91 except (TypeError, AttributeError, BytesWarning):
---> 92 genericpath._check_arg_types('join', a, *p)
93 raise
94 return path
/anaconda3/lib/python3.6/genericpath.py in _check_arg_types(funcname, *args)
147 else:
148 raise TypeError('%s() argument must be str or bytes, not %r' %
--> 149 (funcname, s.__class__.__name__)) from None
150 if hasstr and hasbytes:
151 raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: join() argument must be str or bytes, not 'int'
But they should all be strings.
Can I join urls using os.path.join() in Python [...]?
Not portably, no. In the case of non-Unix operating systems, the path separator will not be '/', so you'll create malformed URIs.
[...] is there a better way?
Yes. You can use urllib.
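For example, a minimal sketch with urllib.parse (Python 3; the base URL is just the one from the question):

from urllib.parse import urljoin

chapter_number, page = 12, 3

# urljoin replaces the last path segment unless the base ends with '/',
# so add one before joining the chapter/page parts.
url = urljoin('https://www.mangareader.net/one-piece/', '%d/%d' % (chapter_number, page))
print(url)  # https://www.mangareader.net/one-piece/12/3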
I've got a problem loading a JSON file in Python. I'm working with Python 2.7 and I've got a JSON file that I would like to load. I did:
movies = json.load(open(FBO_REF_FILE, 'r'))
But when I display it I get a dict full of:
{u'id_yeyecine': 42753, u'budget_dollars': u'85', u'classification': u'Tous publics', u'pays': u'US', u'budget_euros': u'0', u'dpw_entrees_fr': 132326, u'realisateurs': u'Brad Peyton, Kevin Lima', u'is_art_et_essai': u'NON', u'distributeur_video': u'Warner hv', u'genre_gfk_1': u'ENFANT', u'genre_gfk_2': u'FILM FAMILLE', u'genre_gfk_3': u'FILM FAMILLE', u'is_3D': u'OUI', u'fid': 16429, u'cum_entrees_pp': 58076, u'titre': u'COMME CHIENS ET CHATS LA REVANCHE DE KITTY GALORE', u'psp_entrees': 963, u'cum_entrees_fr': 348225, u'dps_copies_fr': 453, u'dpj_entrees_pp': 7436, u'visa': 127021, u'dps_entrees_fr': 178908, u'genre': u'Com\xe9die', u'distributeur': u'WARNER BROS.', u'editeur_video': u'Warner bros', u'psp_copies': 15, u'dpw_entrees_pp': 26195, u'id_imdb': None, u'date_sortie_video': u'2010-12-06', u'dps_copies_pp': 39, u'date_sortie': u'2010-08-04', u'dps_entrees_pp': 32913, u'dpj_entrees_fr': 40369, u'ecrivains': u'', u'acteurs': u"Chris O'donnell, Jack McBrayer", u'is_premier_film': u'NON'}
I tried using ast but I got the following error: malformed string. The error I get when using ast is the following:
153 if cursor is None:
154 movies = json.load(open(FBO_REF_FILE, 'r'))
--> 155 movies = ast.literal_eval(movies)
156 for movie in movies:
157 if movies[movie]['id_allocine'] == allocine_id:
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.pyc in literal_eval(node_or_string)
78 return left - right
79 raise ValueError('malformed string')
---> 80 return _convert(node_or_string)
81
82
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.pyc in _convert(node)
77 else:
78 return left - right
---> 79 raise ValueError('malformed string')
80 return _convert(node_or_string)
81
ValueError: malformed string
With json.load you parse a JSON file into Python's datatypes. In your case this is a dict.
With open you open a file.
If you don't want to parse the JSON file, just do the following:
content = None
with open(FBO_REF_FILE, 'r') as f:
    content = f.read()

print content  # content is a string containing the content of the file
If you want to parse the JSON file into Python's datatypes, do the following:
content = None
with open(FBO_REF_FILE, 'r') as f:
    content = json.loads(f.read())

print content  # content is a dict containing the parsed json data
print content['id_yeyecine']
print content['budget_dollars']
If you want to pretty print your dictionary:
json.dumps(movies, sort_keys=True, indent=4)
Or use pprint: https://docs.python.org/2/library/pprint.html
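For instance (a small illustration):

from pprint import pprint
pprint(movies)  # prints the parsed dict with readable indentation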
To read from movies, use regular dict methods:
id_yeyecine = movies["id_yeyecine"]
Now id_yeyecine is 42753.
So I am learning Python and redoing some old projects. This project involves taking in a dictionary and a message to be translated from the command line, and translating the message. (For example: "btw, hello how r u" would be translated to "by the way, hello how are you".)
We are using a scanner supplied by the professor to read in tokens and strings. If necessary I can post it here too. Here's my error:
Nathans-Air-4:py1 Nathan$ python translate.py test.xlt test.msg
Traceback (most recent call last):
File "translate.py", line 26, in <module>
main()
File "translate.py", line 13, in main
dictionary,count = makeDictionary(commandDict)
File "/Users/Nathan/cs150/extra/py1/support.py", line 12, in makeDictionary
string = s.readstring()
File "/Users/Nathan/cs150/extra/py1/scanner.py", line 105, in readstring
return self._getString()
File "/Users/Nathan/cs150/extra/py1/scanner.py", line 251, in _getString
if (delimiter == chr(0x2018)):
ValueError: chr() arg not in range(256)
Here's my main translate.py file:
from support import *
from scanner import *
import sys


def main():
    arguments = len(sys.argv)
    if arguments != 3:
        print 'Need two arguments!\n'
        exit(1)
    commandDict = sys.argv[1]
    commandMessage = sys.argv[2]
    dictionary, count = makeDictionary(commandDict)
    message, messageCount = makeMessage(commandMessage)
    print(dictionary)
    print(message)
    i = 0
    while count < messageCount:
        translation = translate(message[i], dictionary, messageCount)
        print(translation)
        count = count + 1
        i = i + 1


main()
And here is my support.py file I am using...
from scanner import *


def makeDictionary(filename):
    fp = open(filename, "r")
    s = Scanner(filename)
    lyst = []
    token = s.readtoken()
    count = 0
    while (token != ""):
        lyst.append(token)
        string = s.readstring()
        count = count + 1
        lyst.append(string)
        token = s.readtoken()
    return lyst, count


def translate(word, dictionary, count):
    i = 0
    while i != count:
        if word == dictionary[i]:
            return dictionary[i+1]
            i = i + 1
        else:
            return word
            i = i + 1
    return 0


def makeMessage(filename):
    fp = open(filename, "r")
    s = Scanner(filename)
    lyst2 = []
    string = s.readtoken()
    count = 0
    while (string != ""):
        lyst2.append(string)
        string = s.readtoken()
        count = count + 1
    return lyst2, count
Does anyone know what's going on here? I've looked through it several times and I don't know why readstring is throwing this error... It's probably something stupid I missed.
chr(0x2018) will work if you use Python 3.
You have code that's written for Python 3 but you run it with Python 2. In Python 2, chr will give you a one-character string in the ASCII range. This is an 8-bit string, so the maximum parameter value for chr is 255. In Python 3 you'll get a Unicode character, and Unicode code points can go up to much higher values.
The issue is that the character you're converting using chr isn't within the accepted range (range(256)). The value 0x2018 in decimal is 8216.
Check out unichr, and also see chr.
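A quick illustration of the difference (nothing beyond the built-ins mentioned above):

# Python 2
unichr(0x2018)  # works: u'\u2018', a left single quotation mark
chr(0x2018)     # raises ValueError: chr() arg not in range(256)

# Python 3
chr(0x2018)     # works: chr covers the full Unicode range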