Why does simplejson think this string is invalid json? [duplicate]

I'm trying to convert a simple JSON string {\n 100: {"a": "b"}\n} to a Python object, but it's giving me this error: Expecting property name enclosed in double quotes
Why is it insisting that the name of the attribute be a string?
>>> import simplejson
>>> my_json = simplejson.loads('{\n 100: {"a": "b"}\n}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/myuser/myenv/lib/python2.7/site-packages/simplejson/__init__.py", line 525, in loads
return _default_decoder.decode(s)
File "/Users/myuser/myenv/lib/python2.7/site-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/Users/myuser/myenv/lib/python2.7/site-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 5 (char 6)

The value {100: {"a": "b"}} is invalid; you need {"100": {"a": "b"}}.
The property name, 100, needs to be enclosed in double quotes: "100".
Why is it insisting that the name of the attribute be a string?
That's how JSON is defined: object keys must be strings.
You may be used to writing {100: {"a": "b"}} in JavaScript or another language (without double-quoting the property name), but you'll still get a parsing error if you try to parse it as JSON in JavaScript, e.g.:
JSON.parse('{100: {"a": "b"}}')
SyntaxError: JSON.parse: expected property name or '}'
at line 1 column 2 of the JSON data
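The same rule can be checked from Python. The sketch below assumes the standard-library json module rather than simplejson (both enforce string keys); it shows the quoted form parsing, the unquoted form being rejected, and json.dumps writing an int key out as a string.

```python
import json

# Assumption: the standard-library json module stands in for simplejson here.
valid = json.loads('{"100": {"a": "b"}}')
print(valid["100"])  # the key comes back as the string "100", not the int 100

# A Python dict may use an int key; dumps converts it to a JSON string key.
encoded = json.dumps({100: {"a": "b"}})
print(encoded)

# The unquoted key is rejected by the parser.
try:
    json.loads('{100: {"a": "b"}}')
    rejected = False
except json.JSONDecodeError as exc:
    rejected = True
    print("rejected:", exc.msg)
```

Note that a round trip through dumps and loads therefore turns int keys into string keys.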

Related

Convert String Representation Of Array To Actual Array In Python

I am quite new to Python and am struggling to make this work.
I need to convert
["ladksa dlskdlsd slkd sldks lskds lskds","skkdsdl skdsl lskds lsdk sldk sdslkds"] - String Representation of Array To Actual Array Like this.
[0]-> "ladksa dlskdlsd slkd sldks lskds lskds"
[1]-> "skkdsdl skdsl lskds lsdk sldk sdslkds"
I have tried the following:
json.load(array) -> but it gave me a parse error
literal_eval(x) -> read about it somewhere (don't know why it doesn't work)
Error:
custom_classes = json.loads(element.custom_classes)
File "C:\Users\Shri\AppData\Local\Programs\Python\Python38-32\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "C:\Users\Shri\AppData\Local\Programs\Python\Python38-32\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Shri\AppData\Local\Programs\Python\Python38-32\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Also, I am using the JS code below to send the data:
'custom_classes':'\''+JSON.stringify(finalized_classes)+'\'',
The data is stored as shown in a screenshot (not reproduced here)
and
arr_element = array of Objects For Specific Class
which has a property
custom_classes -> which is stored as above image.
and I am doing:
for element in arr_element:
    string_obj = json.loads(str(element.custom_classes))
    # the error is raised here
Please help

import json
s = '["ladksa dlskdlsd slkd sldks lskds lskds","skkdsdl skdsl lskds lsdk sldk sdslkds"]'
json.loads(s)
# gives ['ladksa dlskdlsd slkd sldks lskds lskds', 'skkdsdl skdsl lskds lsdk sldk sdslkds']
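The error position in the question (char 1) is what json.loads reports when the text starts with [ and the next character is not valid JSON, for example a single-quoted string as in Python's repr of a list. A hedged sketch, assuming that is what the stored value looks like: try JSON first and fall back to ast.literal_eval. The helper name parse_list_string is hypothetical.

```python
import ast
import json

def parse_list_string(text):
    """Parse text as JSON; fall back to a Python literal (hypothetical helper)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # literal_eval only evaluates literals (lists, strings, numbers, ...),
        # so it never runs arbitrary code from the input.
        return ast.literal_eval(text)

from_json = parse_list_string('["ladksa dlskdlsd", "skkdsdl skdsl"]')
from_repr = parse_list_string("['ladksa dlskdlsd', 'skkdsdl skdsl']")
print(from_json == from_repr)
```

If the stored value turns out to have another shape (for instance extra quotes wrapped around the whole string), print repr() of it first and adjust accordingly.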

After you use json.loads(), you can use str.split() to turn the strings into lists of words, and then extend one list with the other if you need to.
foo = ['this is a sentence', 'this is a different sentence']
bar = foo[0].split(' ')
biz = foo[1].split(' ')
bar.extend(biz)  # extend adds the items one by one; append would nest the whole list
print(bar)
Result: ['this', 'is', 'a', 'sentence', 'this', 'is', 'a', 'different', 'sentence']

convert date from numpyarray into datetime type -> getting mystic error

I load a file f with the numpy.loadtxt function and wanted to extract some dates.
The date has a format like this: 15.08. - 21.08.2011
numpyarr = loadtxt(f, dtype=str, delimiter=";", skiprows = 1)
alldate = numpyarr[:,0][0]
dat = datetime.datetime.strptime(alldate,"%d.%m. - %d.%m.%Y")
And here is the whole error:
Traceback (most recent call last):
File "C:\PYTHON\Test DATETIME_2.py", line 52, in <module>
dat = datetime.datetime.strptime(alldate,"%d.%m. - %d.%m.%Y")
File "C:\Python27\lib\_strptime.py", line 308, in _strptime
format_regex = _TimeRE_cache.compile(format)
File "C:\Python27\lib\_strptime.py", line 265, in compile
return re_compile(self.pattern(format), IGNORECASE)
File "C:\Python27\lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\lib\re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: redefinition of group name 'd' as group 3; was group 1
Does somebody have an idea what's going on?

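A datetime holds a single date and time, while your field contains two dates and your format string tries to extract both into a single value. Specifically, the error you're getting is because you've used %d and %m twice; each directive becomes a named regex group, and a group name can only be defined once.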
You can try something along the lines of:
a = datetime.datetime.strptime(alldate.split('-')[0],"%d.%m. ")
b = datetime.datetime.strptime(alldate.split('-')[1]," %d.%m.%Y")
a = datetime.datetime(b.year, a.month, a.day)
(it's not the best code, but it demonstrates the fact that there are two dates in two different formats hiding in your string).
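The same idea can be packaged as a small function. This is a sketch only, assuming the field always looks like "DD.MM. - DD.MM.YYYY" and that the year is given only on the end date; the name parse_date_range is made up for illustration.

```python
from datetime import datetime

def parse_date_range(text):
    """Split 'DD.MM. - DD.MM.YYYY' into (start, end) datetimes."""
    start_part, end_part = (part.strip() for part in text.split("-"))
    end = datetime.strptime(end_part, "%d.%m.%Y")
    # The start date has no year of its own, so borrow the end date's year.
    start = datetime.strptime(start_part + str(end.year), "%d.%m.%Y")
    if start > end:
        # Assumption: a start later than the end means the range crosses
        # New Year, so the start belongs to the previous year.
        start = datetime.strptime(start_part + str(end.year - 1), "%d.%m.%Y")
    return start, end

start, end = parse_date_range("15.08. - 21.08.2011")
print(start.date(), end.date())
```

Parsing the start together with a year avoids the default year 1900 that strptime uses when the format has no year.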

Weird TypeError from json.dumps

In Python 3.4.0, json.dumps() throws a TypeError in one case but works like a charm in another (which I think is equivalent to the first one).
I have a dict where keys are strings and values are numbers and other dicts (i.e. something like {'x': 1.234, 'y': -5.678, 'z': {'a': 4, 'b': 0, 'c': -6}}).
This fails (the stack trace is not from this particular code snippet but from my larger script, which I won't paste here, but it is essentially the same):
>>> x = dict(foo()) # obtain the data and make a new dict of it to really be sure
>>> import json
>>> json.dumps(x)
Traceback (most recent call last):
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/pydevd.py", line 1733, in <module>
debugger.run(setup['file'], None, None)
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/pydevd.py", line 1226, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/_pydev_execfile.py", line 38, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
File "/mnt/data/gandalv/School/PhD/Other work/Krachy/code/recalculate.py", line 54, in <module>
ls[1] = json.dumps(f)
File "/usr/lib/python3.4/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.4/json/encoder.py", line 192, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.4/json/encoder.py", line 250, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.4/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 306 is not JSON serializable
The 306 is one of the values in one of the inner dicts in x. It is not always the same number; sometimes it is a different number contained in the dict, apparently because a dict is unordered.
However, this works like a charm:
>>> x = foo() # obtain the data and make a new dict of it to really be sure
>>> import ast
>>> import json
>>> x2 = ast.literal_eval(repr(x))
>>> x == x2
True
>>> json.dumps(x2)
"{...}" # the json representation of dict as it should be
Could anyone, please, tell me why does this happen or what could be the cause? The most confusing part is that those two dicts (the original one and the one obtained through evaluation of the representation of the original one) are equal but the dumps() function behaves differently for each of them.

The cause was that the numbers inside the dict were not ordinary Python ints but numpy.int64 values, which are apparently not supported by the json encoder.

As you have seen, numpy int64 data types are not serializable into json directly:
>>> import numpy as np
>>> import json
>>> a=np.zeros(3, dtype=np.int64)
>>> a[0]=-9223372036854775808
>>> a[2]=9223372036854775807
>>> jstr=json.dumps(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 192, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 250, in iterencode
return _iterencode(o, 0)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([-9223372036854775808, 0, 9223372036854775807]) is not JSON serializable
However, Python integers -- including longer integers -- can be serialized and deserialized:
>>> json.loads(json.dumps(2**123))==2**123
True
So with numpy, you can convert directly to Python data structures then serialize:
>>> jstr=json.dumps(a.tolist())
>>> b=np.array(json.loads(jstr))
>>> np.array_equal(a,b)
True
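When the NumPy values are buried inside nested dicts, as in the question, converting each one by hand is awkward. One option is the default hook of json.dumps, which is called for any object the encoder cannot handle. A minimal sketch, relying on the fact that NumPy scalars and arrays both provide tolist(); the function name numpy_default is hypothetical.

```python
import json

def numpy_default(obj):
    """Fallback for json.dumps: turn NumPy-like objects into builtins."""
    # NumPy scalars (e.g. numpy.int64) and arrays both expose tolist(),
    # which returns plain Python ints, floats, or (nested) lists.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

# Usage, assuming numpy is installed and imported as np:
#   json.dumps({"z": {"a": np.int64(306)}}, default=numpy_default)
```

The hook is only consulted for otherwise unsupported objects, so ordinary ints, floats, strings, lists and dicts are encoded as usual.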

How to parse JSON files with double-quotes inside strings in Python?

I'm trying to read a JSON file in Python. Some of the lines have strings with double quotes inside:
{"Height__c": "8' 0\"", "Width__c": "2' 8\""}
Using a raw string literal produces the right output:
json.loads(r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}""")
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
But my string comes from a file, ie:
s = f.readline()
Where:
>>> print repr(s)
'{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
And json throws the following exception:
json.loads(s) # s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
ValueError: Expecting ',' delimiter: line 1 column 21 (char 20)
Also,
>>> s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
Fails, but assigning the raw literal works:
>>> s = r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
Do I need to write a custom Decoder?

The data file you have does not escape the nested quotes correctly; this can be hard to repair.
If the nested quotes follow a pattern, e.g. they always follow a digit and are the last character in each string, you can use a regular expression to fix them up. Given your sample data, if all you have is measurements in feet and inches, that's certainly doable:
import re
from functools import partial
repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
json.loads(repair_nested(s))
Demo:
>>> import json
>>> import re
>>> from functools import partial
>>> s = '{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
>>> repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
>>> json.loads(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 21 (char 20)
>>> json.loads(repair_nested(s))
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
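The difference between the two literals in the question comes down to escape handling: in a normal string literal, Python itself consumes the backslash in \", so json.loads never sees it, while a raw literal keeps it. A Python 3 sketch of that, reusing the regular expression above (the helper name repair_nested_quotes is made up for illustration):

```python
import json
import re

def repair_nested_quotes(text):
    # Re-insert the backslash before a quote that directly follows a digit
    # and precedes the closing quote, e.g. 0"" becomes 0\""
    return re.sub(r'(\d)""', r'\1\\""', text)

raw_literal = r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
plain_literal = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""

# The raw literal still contains backslashes; the plain one has lost them.
print("\\" in raw_literal, "\\" in plain_literal)

parsed = json.loads(repair_nested_quotes(plain_literal))
print(parsed["Height__c"])
```

This only works because every broken quote in the sample follows a digit; data with other patterns would need a different rule.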

Retrieving JSON objects from a text file (using Python)

I have thousands of text files containing multiple JSON objects, but unfortunately there is no delimiter between the objects. Objects are stored as dictionaries and some of their fields are themselves objects. Each object might have a variable number of nested objects. Concretely, an object might look like this:
{field1: {}, field2: "some value", field3: {}, ...}
and hundreds of such objects are concatenated without a delimiter in a text file. This means that I can neither use json.load() nor json.loads().
Any suggestion on how I can solve this problem. Is there a known parser to do this?

This decodes your "list" of JSON Objects from a string:
from json import JSONDecoder
def loads_invalid_obj_list(s):
    decoder = JSONDecoder()
    s_len = len(s)
    objs = []
    end = 0
    while end != s_len:
        obj, end = decoder.raw_decode(s, idx=end)
        objs.append(obj)
    return objs
The bonus here is that you play nice with the parser. Hence it keeps telling you exactly where it found an error.
Examples
>>> loads_invalid_obj_list('{}{}')
[{}, {}]
>>> loads_invalid_obj_list('{}{\n}{')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "decode.py", line 9, in loads_invalid_obj_list
obj, end = decoder.raw_decode(s, idx=end)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 376, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 2 column 2 (char 5)
Clean Solution (added later)
import json
import re
#shameless copy paste from json/decoder.py
FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)
class ConcatJSONDecoder(json.JSONDecoder):
    def decode(self, s, _w=WHITESPACE.match):
        s_len = len(s)
        objs = []
        end = 0
        while end != s_len:
            obj, end = self.raw_decode(s, idx=_w(s, end).end())
            end = _w(s, end).end()
            objs.append(obj)
        return objs
Examples
>>> print json.loads('{}', cls=ConcatJSONDecoder)
[{}]
>>> print json.load(open('file'), cls=ConcatJSONDecoder)
[{}]
>>> print json.loads('{}{} {', cls=ConcatJSONDecoder)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return cls(encoding=encoding, **kw).decode(s)
File "decode.py", line 15, in decode
obj, end = self.raw_decode(s, idx=_w(s, end).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 376, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 1 column 5 (char 5)
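For Python 3, the same raw_decode loop can be written as a generator that yields one object at a time. This is a sketch under the assumption that the objects are separated by nothing or by whitespace only; the function name iter_concatenated_json is hypothetical.

```python
import json

def iter_concatenated_json(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace itself, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos == length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

sample = '{"a": 1}{"b": {"c": "}{"}} \n {"d": []}'
objects = list(iter_concatenated_json(sample))
print(len(objects))
```

Because the real parser decides where each object ends, a }{ inside a string value (as in the sample) does not confuse it, which is the main advantage over splitting on }{.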

Sebastian Blask has the right idea, but there's no reason to use regexes for such a simple change.
objs = json.loads("[%s]"%(open('your_file.name').read().replace('}{', '},{')))
Or, more legibly
raw_objs_string = open('your_file.name').read() #read in raw data
raw_objs_string = raw_objs_string.replace('}{', '},{') #insert a comma between each object
objs_string = '[%s]'%(raw_objs_string) #wrap in a list, to make valid json
objs = json.loads(objs_string) #parse json

How about something like this:
import re
import json
jsonstr = open('test.json').read()
p = re.compile(r'\}\s*\{')
jsonstr = p.sub('}\n{', jsonstr)
jsonarr = jsonstr.split('\n')
for jsonstr in jsonarr:
    jsonobj = json.loads(jsonstr)
    print(json.dumps(jsonobj))

Solution
As far as I know, }{ does not appear in valid JSON outside of string values, so the following should be safe for getting the strings of separate objects that were concatenated, provided no string value contains }{ and the last object does not end in a nested object (txt is the content of your file; note that strip('{}') removes every leading and trailing brace, not just one). It does not require any import (not even the re module):
retrieved_strings = map(lambda x: '{'+x+'}', txt.strip('{}').split('}{'))
Or, if you prefer list comprehensions (as David Zwicker mentioned in the comments):
retrieved_strings = ['{'+x+'}' for x in txt.strip('{}').split('}{')]
It will result in retrieved_strings being a list of strings, each containing separate JSON object. See proof here: http://ideone.com/Purpb
Example
The following string:
'{field1:"a",field2:"b"}{field1:"c",field2:"d"}{field1:"e",field2:"f"}'
will be turned into:
['{field1:"a",field2:"b"}', '{field1:"c",field2:"d"}', '{field1:"e",field2:"f"}']
as proven in the example I mentioned.

Why don't you load the file as a string, replace all }{ with },{ and surround the whole thing with []? Something like:
json.loads('[' + re.sub(r'\}\s*\{', '}, {', string_read_from_a_file) + ']')
Or a simple string replace if you are sure you always have }{ without whitespace in between.
In case you expect }{ to occur in strings as well, you could also split on }{ and evaluate each fragment with json.loads; if you get an error, the fragment wasn't complete and you have to append the next fragment to it, and so forth.

import json
# Assumes each line of the file holds exactly one complete JSON object.
with open('filepath', 'r') as file1:
    for line in file1:
        values = json.loads(line)
        # Now you can access the fields using values.get('key')

How about reading through the file incrementing a counter every time a { is found and decrementing it when you come across a }. When your counter reaches 0 you'll know that you've come to the end of the first object so send that through json.load and start counting again. Then just repeat to completion.
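The counting idea can be sketched as follows. One detail the description leaves out is that braces inside string values must not be counted, so the sketch also tracks whether it is inside a string. The function name split_json_objects is hypothetical, and the input is assumed to be a sequence of well-formed JSON objects.

```python
import json

def split_json_objects(text):
    """Return the substrings of text that form top-level {...} objects."""
    chunks = []
    depth = 0           # current brace nesting level
    start = None        # index where the current top-level object began
    in_string = False   # True while inside a "..." string value or key
    escaped = False     # True right after a backslash inside a string
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                chunks.append(text[start:i + 1])
    return chunks

pieces = split_json_objects('{"a": {"b": "}{"}}{"c": "\\""}')
parsed = [json.loads(piece) for piece in pieces]
print(len(parsed))
```

Each returned piece can then be passed to json.loads, as described above.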

Suppose you added a [ to the start of the text in a file, and used a version of json.load() which, when it detected the error of finding a { instead of an expected comma (or hits the end of the file), spit out the just-completed object?

Replace a file with that junk in it:
$ sed -i -e 's;}{;}, {;g' foo
Do it on the fly in Python:
junkJson.replace('}{', '}, {')
