I'm being passed some JSON and am having trouble parsing it.
The object is currently simple, with a single key/value pair. The key works fine, but the value \d causes issues.
This is coming from an HTML form, via JavaScript. All of the below are literals.
HTML: \d
JavaScript: {'Key': '\d'}
JSON: {"Key": "\\d"}
json.loads() doesn't seem to like JSON in this format. A quick sanity check that I'm not doing anything silly works fine:
>>> import json
>>> json.loads('{"key":"value"}')
{'key': 'value'}
Since I'm declaring this string in Python, Python's own escaping should reduce it to the literal va\\lue, which, when parsed as JSON, should become va\lue.
>>> json.loads('{"key":"va\\\\lue"}')
{'key': 'va\\lue'}
In case Python wasn't escaping the string on the way in, I thought I'd check without the doubling...
>>> json.loads('{"key":"va\\lue"}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python33\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Python33\lib\json\decoder.py", line 352, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python33\lib\json\decoder.py", line 368, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 1 column 11 (char 10)
but it fails, as expected.
I can't see any way to parse a JSON field that should contain a single backslash after all the unescaping has taken place.
How can I get Python to deserialize the string literal {"a":"val\\ue"} (which is valid JSON) into the appropriate Python representation: {'a': 'val\ue'}?
As an aside, it doesn't help that PyDev is inconsistent with what representation of a string it uses. The watch window shows double backslashes, the tooltip of the variable shows quadruple backslashes. I assume that's the "If you were to type the string, this is what you'd have to use for it to escape to the original" representation, but it's by no means clear.
Edit to follow on from #twalberg's answer...
>>> input={'a':'val\ue'}
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 3-5: truncated \uXXXX escape
>>> input={'a':'val\\ue'}
>>> input
{'a': 'val\\ue'}
>>> json.dumps(input)
'{"a": "val\\\\ue"}'
>>> json.loads(json.dumps(input))
{'a': 'val\\ue'}
>>> json.loads(json.dumps(input))['a']
'val\\ue'
Using json.dumps() to see how JSON would represent your target string:
>>> orig = { 'a' : 'val\ue' }
>>> jstring = json.dumps(orig)
>>> print jstring
{"a": "val\\ue"}
>>> extracted = json.loads(jstring)
>>> print extracted
{u'a': u'val\\ue'}
>>> print extracted['a']
val\ue
>>>
This was in Python 2.7.3, though, so it may be only partially relevant to your Python 3.x environment. Still, I don't think JSON has changed that much...
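For reference, here is a minimal sketch of the same round trip under Python 3 (which the C:\Python33 paths in your traceback suggest you're running). The only wrinkle is that 'val\ue' is a SyntaxError in Python 3, as your edit shows, because \u starts a \uXXXX escape, so the literal has to be written with a doubled backslash or as a raw string:
import json

orig = {'a': r'val\ue'}        # raw string: one literal backslash, no \uXXXX escape attempted
jstring = json.dumps(orig)
print(jstring)                 # {"a": "val\\ue"}
extracted = json.loads(jstring)
print(extracted['a'])          # val\ue  -- the single backslash survives the round trip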
Related
TL;DR: How can JSON containing a regex with escaped backslashes be loaded using Python's JSON decoder?
Detail: The regular expression \\[0-9]\\ will match (for example):
\2\
The same regular expression could be encoded as a JSON value:
{
    "pattern": "\\[0-9]\\"
}
And in turn, the JSON value could be encoded as a string in Python (note the single quotes):
'{"pattern": "\\[0-9]\\"}'
When loading the JSON in Python, a JSONDecodeError is raised:
import json
json.loads('{"pattern": "\\[0-9]\\"}')
The problem is caused by the regular expression escaping the backslashes:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 14 (char 13)
This surprised me since each step seems reasonable (i.e. valid regex, valid JSON, and valid Python).
How can JSON containing a regex with escaped backslashes be loaded using Python's JSON decoder?
What's happening is that Python first processes the escape sequences in the string literal before loads ever sees it, turning it into '{"pattern": "\[0-9]\"}' (each double backslash becomes a single backslash). loads then attempts to interpret \[ as a JSON escape, which is invalid. To fix this you could escape the backslashes again, but it's easier and more practical to write the literal as a raw string:
>>> import json
>>> json.loads('{"pattern": "\\[0-9]\\"}')
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 14 (char 13)
>>> json.loads(r'{"pattern": "\\[0-9]\\"}')
{'pattern': '\\[0-9]\\'} # No error
Note that this problem won't apply if loading from a file.
test.json:
{"pattern": "\\[0-9]\\"}
Python:
import json
with open('test.json', 'r') as infile:
    json.load(infile)  # no problem
Basically, the problem arises from the fact that you're passing in a string literal, but ironically, your string literal isn't being taken literally.
The r prefix means the string is treated as a raw string, so backslash escape sequences are not processed:
json.loads(r'{"pattern": "\\[0-9]\\"}')
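To make the difference concrete, here is a small sketch comparing what json.loads actually receives with and without the r prefix, using the same pattern from the question:
import json

plain = '{"pattern": "\\[0-9]\\"}'   # Python collapses each \\ into a single backslash
raw = r'{"pattern": "\\[0-9]\\"}'    # the raw string keeps both backslashes
print(plain)                         # {"pattern": "\[0-9]\"}   -- \[ is not a valid JSON escape
print(raw)                           # {"pattern": "\\[0-9]\\"} -- valid JSON
print(json.loads(raw))               # {'pattern': '\\[0-9]\\'}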
I am extracting a list from the HTML of a web page in the format
lst = '["a","b","c"]' # (type <str>)
The data type of the above is str and I want to convert it to a Python list type, something like this
lst = ["a","b","c"] #(type <list>)
I can obtain the above by
lst = lst[1:-1].replace('"','').split(',')
But as the actual values of a, b & c are quite long and complex (they contain long HTML text), I can't depend on the above method.
I also tried doing it with the json module, using json.loads(lst), but that gives the exception below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Is there any way of converting this to a list in Python?
Edit: The actual value of the list is:
['reqlistitem.no','reqlistitem.applyonlinejobdesc','reqlistitem.no','reqlistitem.referjobdesc','reqlistitem.applyemailsubjectapplication','reqlistitem.applyemailjobdesc','reqlistitem.no','reqlistitem.addedtojobcart','reqlistitem.displayjobcartactionjobdesc','reqlistitem.shareURL','reqlistitem.title','reqlistitem.shareable','reqlistitem.title','reqlistitem.contestnumber','reqlistitem.contestnumber','reqlistitem.description','reqlistitem.description','reqlistitem.primarylocation','reqlistitem.primarylocation','reqlistitem.otherlocations','reqlistitem.jobschedule','reqlistitem.jobschedule','reqlistitem.jobfield','reqlistitem.jobfield','reqlistitem.displayreferfriendaction','reqlistitem.no','reqlistitem.no','reqlistitem.applyonlinejobdesc','reqlistitem.no','reqlistitem.referjobdesc','reqlistitem.applyemailsubjectapplication','reqlistitem.applyemailjobdesc','reqlistitem.no','reqlistitem.addedtojobcart','reqlistitem.displayjobcartactionjobdesc','reqlistitem.shareURL','reqlistitem.title','reqlistitem.shareable']
I think you are looking for literal_eval:
import ast
string = '["a","b","c"]'
print ast.literal_eval(string) # ['a', 'b', 'c']
The problem in your sample string is the single quotes. The JSON standard requires double quotes.
If you change the single quotes to double quotes, it will work. An easy way is to use str.replace():
import json
s = "['reqlistitem.no','reqlistitem.applyonlinejobdesc','reqlistitem.no']"
json.loads(s.replace("'", '"'))
#[u'reqlistitem.no', u'reqlistitem.applyonlinejobdesc', u'reqlistitem.no']
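One caveat (a sketch, not taken from your data): the replace() trick assumes none of the values contain a single quote themselves. ast.literal_eval, shown in the other answer, parses the original single-quoted form directly and so avoids that assumption:
import ast
import json

s = "['reqlistitem.no','reqlistitem.title']"
print(json.loads(s.replace("'", '"')))   # ['reqlistitem.no', 'reqlistitem.title'] (u'' prefixes on Python 2)
print(ast.literal_eval(s))               # ['reqlistitem.no', 'reqlistitem.title']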
I have some JSON files created by PowerShell using the ConvertTo-Json command. The content of the JSON file looks like
{
    "Key1": "Value1",
    "Key2": "Value2"
}
I ran the Python interpreter to see if I could read the file, but I get this weird output:
>>> f=open('test.json', 'r')
>>> f.read()
'ÿ\xfe{\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x001\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x001\x00"\x00,\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x002\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x002\x00"\x00\n\x00\n\x00}\x00\n\x00\n\x00'
For some reason all the characters are escaped byte characters and there's a weird ÿ at the beginning (a PowerShell error?).
The weird thing is this:
>>> f=open('test.json', 'r')
>>> str=f.read()
>>> type(str)
<class 'str'>
>>> json.loads(str)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
So the input is a string, but the json module can't parse it (json.load(f) returns the same error). What is causing this error? Is it a Python thing, a PowerShell thing, a JSON thing?
As pointed out by jwodder, PowerShell has encoded your JSON using UTF-16LE. To load this data correctly, you need to open the file with the correct encoding, e.g.:
with open("test.json", "r", encoding="utf16") as f:
json_string = f.read()
my_dict = json.loads(json_string)
You don't need to tell Python which variant of UTF-16 is being used. This is the purpose of the first two bytes of the text file. It's called a Byte Order Mark (BOM). It lets a program know if UTF-16LE or UTF-16BE has been used to encode the text file.
It seems that you have a BOM at the start of your file. You can verify it in a hex editor or with a good text editor (Notepad++ shows if BOM is present).
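If you want to confirm that yourself without a hex editor, a quick sketch (assuming the test.json from the question) is to read the first two raw bytes; the ÿ\xfe at the start of your output is exactly that 0xFF 0xFE pair:
with open('test.json', 'rb') as f:   # binary mode, so nothing gets decoded
    bom = f.read(2)
print(bom)                           # b'\xff\xfe' means UTF-16LE; b'\xfe\xff' would mean UTF-16BE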
If you want to load text files with Unicode BOM headers like yours, you are better off using the codecs.open function instead of the built-in open, as the default open is not able to interpret the BOM.
Or you can have a look at tendo.unicode, a small library I wrote that can make life easier for people who are not used to Unicode text.
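For completeness, a minimal sketch of the codecs.open approach (again assuming the test.json from the question); the 'utf-16' codec reads the BOM and strips it from the decoded text:
import codecs
import json

with codecs.open('test.json', 'r', encoding='utf-16') as f:
    my_dict = json.load(f)
print(my_dict)                       # {'Key1': 'Value1', 'Key2': 'Value2'}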
The code is simple, but it does not work. I don't know what the problem is.
import json
json_data = '{text: \"tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=\", key: \"MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW\"}'
my_data = json.JSONDecoder().decode(json_data)
print my_data
It throws the exception below:
Traceback (most recent call last):
File "D:\Python27\project\demo\digSeo.py", line 4, in <module>
my_data = json.JSONDecoder().decode(json_data)
File "D:\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\Python27\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
Your json_data is not valid JSON.
In JSON, property names need to be in double quotes ("). Also, the double quotes around the string values don't need to be escaped, since you're already using single quotes (') for the Python string.
Example:
json_data = '{"text": "tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=", "key": "MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW"}'
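A quick check of the corrected literal (a sketch using the same values from your question):
import json

json_data = '{"text": "tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=", "key": "MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW"}'
my_data = json.JSONDecoder().decode(json_data)   # json.loads(json_data) does the same thing
print(my_data['text'])                           # tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=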
The json module in the Python standard library works well; it's what a lot of people use in their applications.
However, these few lines of code have a small issue: your sample data is not valid JSON. The keys (text and key) should be quoted like this:
json_data = '{"text": \"tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=\", "key": \"MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW\"}'
I am successfully using PyMongo to connect and query (or rather find()) documents in my MongoDB collection.
However, I'm running into an issue. Using the MongoDB interface, I can execute the command:
db.mycollection.find({ $and: [{filename: {$regex: '\\.part_of_my_name'} }, {bases: {$gt: 10000 } } ] } )
And it works great. However, when I try to execute this command in PyMongo, I see that the PyMongo declaration for find() requires a Python dict object only. It actually raises an exception if a string is provided.
So my question is: how can I convert the above JSON(?) string to a dictionary?
I tried building it manually, but it's overly complicated, and I'm wondering if there is a simple way to go from string to dictionary.
To go from a string to a dictionary, you can use json.loads:
>>> import json
>>> json.loads('{"key":"value"}')
{'key': 'value'}
However, you cannot always copy and paste a MongoDB command and expect it to be valid json. For example:
>>> json.loads('{$and: [{filename: {$regex: "\\.part_of_my_name"} }, {bases: {$gt: 10000 }}]}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
In order for it to work in Python, all of the keys need to be explicitly quoted. So $and needs to become "$and". Also, you'll have to use additional escapes for backslashes (kinda ugly, I know).
The query in your example should look like this:
jsonString = '{"$and":[{"filename":{"$regex":"\\\\.part_of_my_name"}},{"bases":{"$gt":10000}}]}'
You can then use json.loads on it. However, at this point it is also valid Python dictionary syntax, so you could just use this directly:
jsonDict = {"$and":[{"filename":{"$regex":"\\.part_of_my_name"}},{"bases":{"$gt":10000}}]}
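And a minimal sketch of handing that dictionary to PyMongo (the connection string and database/collection names below are placeholders, not taken from your setup):
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')   # placeholder connection string
collection = client['mydb']['mycollection']         # placeholder database/collection names

query = {"$and": [{"filename": {"$regex": "\\.part_of_my_name"}},
                  {"bases": {"$gt": 10000}}]}
for doc in collection.find(query):                  # find() takes the dict directly
    print(doc)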