JSON String with elements containing unescaped double quotes

While trying to parse JSON from an AJAX request, the string returned contains invalid JSON.
Although the best practice would be to change the server to reply with valid JSON, as suggested in multiple related answers, this is not an option.
Trying to solve this problem using python, I looked at regular expressions.
The main problem is elements like the following (which I currently use as a test string):
testStr = '{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}'
I currently use the following code:
jsonString = re.sub(r'(?<=\w)\"(?=[^\(\:\}\,])','\\"',testStr)
jsonString = re.sub(r'\"\"(?![,}:])','\"\\\"',jsonString)
with very limited success.
If I were using C, I would parse the string and simply escape all double quotes within the element (i.e. between all double quotes which are preceded by [:{},] ).
There must be a pythonic way to parse, without resorting to a for loop and looking ahead, and keeping history.
EDIT:
Assuming that strings do not contain: [ : { } ]
And also assuming that the unescaped double quotes are only within the value, and not in the key,
Then I assume that the following (or something similar) should solve the problem:
import re
re.sub(r'(?<![\[\:])\"(?![,\}])','\"',testString)
But it still does not work.

Seems I needed a break to solve this.
The following regular expression seems to replace only the double quotes that are contained within the element string (with the assumptions I stated in the question).
output = re.sub(r'(?<![\[\:\{\,])\"(?![\:\}\,])','\\\"', stringName)
I have created a sandbox here: https://repl.it/vNK
Example Output:
Original String:
{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}
Modified String:
{"KEY1":"THIS IS \"AN\" ELEMENT","KEY2":"\"\"THIS IS ANOTHER \"ELEMENT\""}
Parsed JSON:
{
"KEY1": "THIS IS \"AN\" ELEMENT",
"KEY2": "\"\"THIS IS ANOTHER \"ELEMENT\""
}
Any suggestions are welcome.
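
As a minimal sketch of the approach above, the regular expression can be wrapped in a small function and the result checked with json.loads. The function name escape_inner_quotes is hypothetical, and the sketch relies on the same assumptions stated in the question: values never contain [ : { } or commas next to a quote, and keys contain no stray quotes.

```python
import json
import re

def escape_inner_quotes(text):
    # Escape a double quote unless it directly follows [ : { ,
    # or directly precedes : } , (i.e. unless it looks structural).
    return re.sub(r'(?<![\[:{,])"(?![:},])', r'\\"', text)

test_str = '{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}'

fixed = escape_inner_quotes(test_str)
data = json.loads(fixed)  # parses only because the inner quotes are now escaped

print(fixed)
print(data["KEY1"])
print(data["KEY2"])
```

A value such as "a,"b" would still defeat this heuristic, because a quote next to a comma is treated as structural.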

Related

json.loads change " " to single quote

I am writing a program to call an API. I am trying to convert my data payload into json. Thus, I am using json.loads() to achieve this.
However, I have encountered the following problem.
I set my variable as following:
apiVar = [
"https://some.url.net/api/call", #url
'{"payload1":"email#user.net", "payload2":"stringPayload"}',#payload
{"Content-type": "application/json", "Accept": "text/plain"}#headers
]
Then I tried to convert the value of apiVar[1] into a JSON object.
jsonObj = json.loads(apiVar[1])
However, instead of giving me output like the following:
{"payload1":"email#user.net", "payload2":"stringPayload"}
It gives me this instead:
{'payload1':'email#user.net', 'payload2':'stringPayload'}
I know for sure that this is not valid JSON. What I would like to know is, why does this happen? I tried searching for a solution but was not able to find anything on it. All the code examples suggest it should have given me double quotes instead.
How should I fix it so that it will give the double quote output?

json.loads() takes a JSON string and converts it into the equivalent Python datastructure, which in this case is a dict containing strings. And Python strings display in single quotes by default.
If you want to convert a Python datastructure to JSON, use json.dumps(), which will return a string. Or if you're outputting straight to a file, use json.dump().
In any case, your payload is already valid JSON, so the only reason to load it is if you want to make changes to it before calling the API.
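
A short sketch of that round trip, using the payload from the question. The change made to payload2 is purely illustrative.

```python
import json

payload_text = '{"payload1":"email#user.net", "payload2":"stringPayload"}'

obj = json.loads(payload_text)   # JSON text -> Python dict
obj["payload2"] = "changed"      # illustrative edit before sending
out = json.dumps(obj)            # Python dict -> JSON text again

print(type(obj))   # a dict, which Python displays with single quotes
print(obj)
print(out)         # JSON text, always with double quotes
```

The single quotes appear only when Python displays the dict; the string returned by json.dumps is what should go on the wire.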

You need to use json.dumps to convert the object back into JSON format.
The string with single quotes that you are referencing is probably the output of str() or repr(), which simply visualizes the data as a Python object (a dictionary), not as a JSON object. Try taking a look at this:
print(type(jsonObj))
print(str(jsonObj))
print(json.dumps(jsonObj))

Insert XML String into Python dictionary value

I've had trouble searching this because there's a lot of "turn xml into a dictionary" posts but that's not what I'm looking for. I have no desire or need to parse the xml string.
I have an xml string that I want to insert into one dictionary element. My dictionary looks like this
{'JobName':'Test','JobProgram':'1234','JobParameters':'<xmlString><some have="double quotes" /><theresAlso aPath="\\path\with\(paraenthesis)\goes\here" /></xmlString>'}
But that doesn't seem to work as is. I'm guessing it has to do with the <> and the double quotes. So what do I need to do?
My end goal is to send all this as a POST command to URL.php using the requests library in python. URL.php then uses htmlspecialchars($JobParameters), so I'm not fully sure I know what it expects as input either, raw xml or stripslashes(xml) or something else. I can read but cannot edit the php file.

JSON does not accept single quotes surrounding key names and values.
Double quotes (\") and literal backslashes (\\) must be escaped inside a value.
Using double quotes and testing on the command line with jq:
echo '{"JobName":"Test","JobProgram":"1234","JobParameters":"<xmlString><some have=\"double quotes\" /><theresAlso aPath=\"\\path\\goes\\here\" /></xmlString>"}' | jq -r '.'
Result (meaning valid Json):
{
"JobName": "Test",
"JobProgram": "1234",
"JobParameters": "<xmlString><some have=\"double quotes\" /><theresAlso aPath=\"\\path\\goes\\here\" /></xmlString>"
}
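
From Python, the escaping does not need to be done by hand. The sketch below assumes the receiving side wants JSON (the question leaves open what URL.php actually expects) and uses a raw string so the backslashes in the XML stay literal; json.dumps then adds the required escapes.

```python
import json

# Raw string: backslashes and double quotes are kept exactly as typed.
xml = r'<xmlString><some have="double quotes" /><theresAlso aPath="\\path\with\(paraenthesis)\goes\here" /></xmlString>'

job = {"JobName": "Test", "JobProgram": "1234", "JobParameters": xml}

encoded = json.dumps(job)       # escapes " and \ inside the value
decoded = json.loads(encoded)   # round trip back to the same dict

print(encoded)
print(decoded["JobParameters"] == xml)
```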

JSONDecodeError: Invalid \escape when parsing from Python

After running my object detection model, it outputs a .json file with the results. In order to actually use the results of the model in my Python code I need to parse the .json file, but nothing I have tried works. I tried just opening the file and printing the results, but I got the error:
json.decoder.JSONDecodeError: Invalid \escape: line 4 column 41 (char 60)
If you have any idea what I did wrong, the help would be very much appreciated. My code:
with open(r'C:\Yolo_v4\darknet\build\darknet\x64\result.json') as result:
    data = json.load(result)
    result.close()
print(data)
My .json file
[
{
"frame_id":1,
"filename":"C:\Yolo_v4\darknet\build\darknet\x64\f047.png",
"objects": [
{"class_id":32, "name":"right", "relative_coordinates":{"center_x":0.831927, "center_y":0.202225, "width":0.418463, "height":0.034752}, "confidence":0.976091},
{"class_id":19, "name":"h", "relative_coordinates":{"center_x":0.014761, "center_y":0.873551, "width":0.041723, "height":0.070544}, "confidence":0.484339},
{"class_id":24, "name":"left", "relative_coordinates":{"center_x":0.285694, "center_y":0.200752, "width":0.619584, "height":0.032149}, "confidence":0.646595},
]
}
]
(There are several more detected objects, but I did not include them.)

The other responders are of course right. This is not valid JSON. But sometimes you don't have the option to change the format, e.g. because you are working with a broken data dump where the original source is no longer available.
The only way to deal with that is to sanitize it somehow. This is of course not ideal, because you have to put a lot of expectations into your sanitizer code, i.e. you need to know exactly what kind of errors the json file has.
However, a solution using regular expressions could look like this:
import json
import re

class LazyDecoder(json.JSONDecoder):
    def decode(self, s, **kwargs):
        regex_replacements = [
            (re.compile(r'([^\\])\\([^\\])'), r'\1\\\\\2'),
            (re.compile(r',(\s*])'), r'\1'),
        ]
        for regex, replacement in regex_replacements:
            s = regex.sub(replacement, s)
        return super().decode(s, **kwargs)

with open(r'C:\Yolo_v4\darknet\build\darknet\x64\result.json') as result:
    data = json.load(result, cls=LazyDecoder)
print(data)
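This works by subclassing the standard JSONDecoder and passing that subclass to json.load. The first pattern doubles single backslashes and the second removes a trailing comma before a closing bracket.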
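
The same two repairs can also be tried on an in-memory string, which makes them easier to check. This is a sketch only: the helper name sanitize_json_text is hypothetical, the sample is a shortened version of the file from the question, and the backslash pattern uses lookarounds instead of capture groups. It assumes the text contains no legitimate escapes such as \" and no ",]" sequence inside a string value, since both would be altered.

```python
import json
import re

def sanitize_json_text(text):
    # Double every backslash that is not already next to another backslash.
    text = re.sub(r'(?<!\\)\\(?!\\)', r'\\\\', text)
    # Drop a trailing comma that sits directly before a closing bracket.
    text = re.sub(r',(\s*\])', r'\1', text)
    return text

broken = r'''[
 {
  "frame_id":1,
  "filename":"C:\Yolo_v4\darknet\build\darknet\x64\f047.png",
  "objects": [
   {"class_id":32, "name":"right", "confidence":0.976091},
  ]
 }
]'''

data = json.loads(sanitize_json_text(broken))
print(data[0]["filename"])
print(len(data[0]["objects"]))
```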

Hi, you need to use double backslashes and remove the last comma in the objects property. Also, you don't need to close the file inside the with block:
[
{
"frame_id":1,
"filename":"C:\\Yolo_v4\\darknet\\build\\darknet\\x64\\f047.png",
"objects": [
{"class_id":32, "name":"right", "relative_coordinates":{"center_x":0.831927, "center_y":0.202225, "width":0.418463, "height":0.034752}, "confidence":0.976091},
{"class_id":19, "name":"h", "relative_coordinates":{"center_x":0.014761, "center_y":0.873551, "width":0.041723, "height":0.070544}, "confidence":0.484339},
{"class_id":24, "name":"left", "relative_coordinates":{"center_x":0.285694, "center_y":0.200752, "width":0.619584, "height":0.032149}, "confidence":0.646595}
]
}
]

The "\" character is used not only in Windows filepaths but also as an escape character for things like newlines (you can use "\n" instead of an actual newline, for example). To escape the escape, you simply have to put a second backslash before it, like this:
"C:\\Yolo_v4\\darknet\\build\\darknet\\x64\\f047.png"
As someone said in the comments, json.dump should do this automatically for you, so it sounds like something internal is messed up (unless this wasn't created using that).

The function below uses the encode method to encode the string as UTF-8, then the decode method with the unicode_escape codec, which replaces each escape sequence with the character it represents.
def remove_escape_sequences(string):
    return string.encode('utf-8').decode('unicode_escape')

Appending dictionary to list of dictionaries puts double quotes around keywords

I want to accomplish:
Appending a Dictionary to an existing list of dictionaries and updating a value in that new dictionary.
What my problem is:
When I read in my dictionary from the .yaml, Robot Framework puts double quotes around the keywords and values as below.
in the .yaml I have
Vlan2: { u'IP': u'1.1.1.1',
u'DNS': {u'SN': u's2', u'PN': u's1'},
u'SRoute': [{u'IF': u'eth0', u'Mask': u'0.0.0.0'}]
}
but when I do
Collections.Set To Dictionary    ${Vlan2}    IP=2.2.2.2
and I log to console
Log To Console    ${Vlan2}
I get
[{ "u'IP'": "u'1.1.1.1'",
u'IP': '2.2.2.2',
"u'DNS'": {"u'SN'": "u's2'", "u'PN'": "u's1'"},
"u'SRoute'": [{"u'IF'": "u'eth0'", "u'Mask'": "u'0.0.0.0'"}]
}]
I think this is happening because Robot Framework adds double quotes when it reads in the values from the .yaml, causing each one to appear as a different keyword, but I cannot find out how to fix this.
It would be ideal to avoid the double quotes altogether, since the JSON the info is going to is single quote based, as in the .yaml.
Any help is appreciated!

There is quite a lot of confusion going on here. This part of your YAML:
{ u'IP': u'1.1.1.1',
starts a mapping ({) and gives a key-value pair. Both key and value are scalars. The first scalar starts with a u and ends before the key indicator (:), so the content of the scalar is u'IP'. Note that this probably is not what you want, because you say:
since the JSON the info is going to is single quote based as in the .yaml.
You seem to think that you are using single-quoted scalars in your YAML when in fact, you are using unquoted scalars. In YAML, if a scalar does not start with a quotation mark (' or "), it is a plain scalar and further quotation marks inside it are parsed as content. Your scalars start with a u making them unquoted scalars. Your YAML should probably look like this:
Vlan2: { IP: 1.1.1.1,
DNS: {SN: s2, PN: s1},
SRoute: [{IF: eth0, Mask: 0.0.0.0}]
}
Another important thing to remember is that when loaded into Python, the representation style of a scalar is lost – it does not make a difference if it was single-quoted, double-quoted or unquoted in the YAML file.
Now let's look at the output: Here again, the strings are represented in textual form. This means that they are quoted by some means. The representation "u'IP'" matches exactly your input, the double quotes are not added to the string; they are just used as a means to tell you that the enclosed characters form a string.
Then there's this representation in the output: u'IP'. This is still a quoted string, just with the Python-specific representation of having a u in front indicating that this is a unicode string; its content is IP. The u-prefixed representation does not exist in YAML and this is why your input does not work correctly. In YAML, all input is unicode, usually encoded as UTF-8. The u'IP' in the output is the IP value you have set with your code. And because it did not match any existing dict key (as your original key, as explained, has the content u'IP', represented in the output as "u'IP'"), it has been added as an additional key to the dict.
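
The key mismatch can be reproduced in plain Python without YAML or Robot Framework. This sketch assumes the loader produced the strings exactly as described above; the variable names are illustrative.

```python
# What the loader effectively produced: the key is the 5-character string
# u'IP' (prefix and quotes included), not the 2-character string IP.
vlan2 = {"u'IP'": "u'1.1.1.1'"}

# Setting IP therefore adds a second key instead of updating the first.
vlan2["IP"] = "2.2.2.2"
print(sorted(vlan2))

# With plain scalars in the YAML (IP: 1.1.1.1) the key is IP,
# so the same assignment updates the existing entry.
fixed = {"IP": "1.1.1.1"}
fixed["IP"] = "2.2.2.2"
print(fixed)
```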
