Insert XML String into Python dictionary value - python

I've had trouble searching this because there's a lot of "turn xml into a dictionary" posts but that's not what I'm looking for. I have no desire or need to parse the xml string.
I have an xml string that I want to insert into one dictionary element. My dictionary looks like this
{'JobName':'Test','JobProgram':'1234','JobParameters':'<xmlString><some have="double quotes" /><theresAlso aPath="\\path\with\(paraenthesis)\goes\here" /></xmlString>'}
But that doesn't seem to work as is, I'm guessing it has to do with the <> and double quotes. So what do I need to do?
My end goal is to send all this as a POST command to URL.php using the requests library in python. URL.php then uses htmlspecialchars($JobParameters), so I'm not fully sure I know what it expects as input either, raw xml or stripslashes(xml) or something else. I can read but cannot edit the php file.

JSON does not accept single quotes surrounding key names and values.
Double quotes (\") and literal backslashes (\\) must be escaped inside a value.
Using double quotes and testing on command line with jq
echo '{"JobName":"Test","JobProgram":"1234","JobParameters":"<xmlString><some have=\"double quotes\" /><theresAlso aPath=\"\\path\\goes\\here\" /></xmlString>"}' | jq -r '.'
Result (meaning valid Json):
{
"JobName": "Test",
"JobProgram": "1234",
"JobParameters": "<xmlString><some have=\"double quotes\" /><theresAlso aPath=\"\\path\\goes\\here\" /></xmlString>"
}

Related

Eliminating " " from a JSON file so that they don't interrupt the string [duplicate]

While trying to parse JSON from an AJAX request, the string returned contains invalid JSON.
Although the best practice would be to change the server to reply with valid JSON, as suggested in multiple related answers, this is not an option.
Trying to solve this problem using python, I looked at regular expressions.
The main problem is elements as follows (which I currently use as a test string:
testStr = '{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}'
I currently use the following code:
jsonString = re.sub(r'(?<=\w)\"(?=[^\(\:\}\,])','\\"',testStr)
jsonString = re.sub(r'\"\"(?![,}:])','\"\\\"',jsonString)
with very limited success.
If I was using C, I would parse the string, and simply escape all double quotes within the element (i.e between all double quotes which are preceded by [:{},] )
There must be a pythonic way to parse, without resorting to a for loop and looking ahead, and keeping history.
EDIT:
Assuming that strings do not contain: [ : { } ]
And also assuming that the unescaped double quotes are only within the value, and not in the key,
Then I assume that the following (or something similar should solve the problem:
import re
re.sub(r'(?<![\[\:])\"(?![,\}),'\"',testString)
But it still does not work.
Seems I needed a break to solve this.
The following regular expression seems to replace only doublequotes that are contained within the element string. (With the assumptions I stated in the question)
output = re.sub(r'(?<![\[\:\{\,])\"(?![\:\}\,])','\\\"', stringName)
I have created a sandbox here: https://repl.it/vNK
Example Output:
Original String:
{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}
Modified String:
{"KEY1":"THIS IS \"AN\" ELEMENT","KEY2":"\"\"THIS IS ANOTHER \"ELEMENT\""}
Parsed JSON:
{
"KEY1": "THIS IS \"AN\" ELEMENT",
"KEY2": "\"\"THIS IS ANOTHER \"ELEMENT\""
}
Any suggestions are welcome.

How to write a string with escaped \n to a file

Say I have a json file like
{"text":"Here is a string\nhere is another string","timestamp":"2021-05-24"}
I am trying to load it to a file and write the text field down, so the output file will be exactly
Here is a string\nhere is another string
However, if I read it as a json and do something like out.write(j['text']), I will get
Here is a string
here is another string
in the file, which translates \n into a new line. Is there a way I could output the string in the desired way?
Yes, however you would need to escape the \ in your string. Try using str.replace('\n', '\\n'). However, this would not work in cases where you would want to have \n somewhere else in the string.

Converting backslash single quote \' to backslash double quote \" for JSON

I've got a JSON file that was converted to a string in Python. Somehow along the way the double quotes have gotten replaced with single quotes.
{\'MyJSON\': {\'Report\': \'1\' ....
I need to convert my string so that it is in this format instead:
{\"MyJSON\": {\"Report\": \"1\" ....
My problem is that using str.replace, I can't figure out how to convert a single quote into a double quote as both quotes are escaped.
My ultimate goal is to be able to put the string into json.loads so that I can pretty print it.
Attempts:
txt.replace(r"\'", r'\"')
> "{'MyJSON': {'Report': '1'"
txt.replace("\"", "\'")
> "{'MyJSON': {'Report': '1'"
If I save my string to a txt file it appears in the preview as:
{'MyJSON': {'Report': '1' ...
So I think what I actually need to do is replace ' with "
I have decided to use ast.literal_eval(txt) which can convert my string to a dictionary. From there, json.loads(json.dumps(dict)) gets me to JSON
i mean,
my_string = "\"\'"
print(my_string.replace("\'", "\""))
works perfectly fine
EDIT: i didn't mean use this directly, it was a proof of concept. In mine the replacement was reversed. I have updated this snippet such that it could directly be put into your code. Try it again
Instead of focusing on the backslashes to try to "hack" a json string / dict str into a JSON, a better solution is to take it one step at a time and start by converting my dict string into a dictionary.
import ast
txt = ast.literal_eval(txt) # convert my string to a dictionary
txt = json.loads(json.dumps(txt)) # convert my dict to JSON

Appending dictionary to list of dictionaries puts double quotes around keywords

I want to accomplish:
Appending a Dictionary to an existing list of dictionaries and updating a value in that new dictionary.
What my problem is:
When I read in my Dictionary from the .yaml RobotFramework puts double qoutes around the keywords and values as below.
in the .yaml I have
Vlan2: { u'IP': u'1.1.1.1',
u'DNS': {u'SN': u's2', u'PN': u's1'},
u'SRoute': [{u'IF': u'eth0', u'Mask': u'0.0.0.0'}]
}
but when I do
Collections.Set To Dictionary ${Vlan2} IP=2.2.2.2
and I log to console
Log To Console ${Vlan2}
I get
[{ "u'IP'": "u'1.1.1.1'",
u'IP': '2.2.2.2',
"u'DNS'": {"u'SN'": "u's2'", "u'PN'": "u's1'"},
"u'SRoute'": [{"u'IF'": "u'eth0'", "u'Mask'": "u'0.0.0.0'"}]
}]
I think this is happening because Robot Framework is adding double qoutes when it reads in the values from the .yaml cause it to appear as a different keyword, but I cannot find out how to fix this.
It would be ideal to avoid the double qoutes all together since the JSON the info is going to is single qoute based as in the .yaml.
Any help is appreciated!
There is quite a lot confusion going on here. This part of your YAML:
{ u'IP': u'1.1.1.1',
Starts a mapping ({) and gives a key-value pair. both key and value are scalars. The first scalar starts with a u and ends before the key indicator (:), so the content of the scalar is u'IP'. Note that this probably is not what you want, because you say:
since the JSON the info is going to is single qoute based as in the .yaml.
You seem to think that you are using single-quoted scalars in your YAML when in fact, you are using unquoted scalars. In YAML, if a scalar does not start with a quotation mark (' or "), it is a plain scalar and further quotation marks inside it are parsed as content. Your scalars start with a u making them unquoted scalars. Your YAML should probably look like this:
Vlan2: { IP: 1.1.1.1,
DNS: {SN: s2, PN: s1},
SRoute: [{IF: eth0, Mask: 0.0.0.0}]
}
Another important thing to remember is that when loaded into Python, the representation style of a scalar is lost – it does not make a difference if it was single-quoted, double-quoted or unquoted in the YAML file.
Now let's look at the output: Here again, the strings are represented in textual form. This means that they are quoted by some means. The representation "u'IP'" matches exactly your input, the double quotes are not added to the string; they are just used as a means to tell you that the enclosed characters form a string.
Then there's this representation in the output: u'IP'. This is still a quoted string, just with the Python-specific representation of having a u in front indicating that this is a unicode string – its content is IP. The u-prefixed representation does not exist in YAML and this is why your input does not work correctly. In YAML, all input is unicode, usually encoded as UTF-8. The u'IP' in the output is the IP value you have set with your code. And because it did not match any existing dict key (as your original key, as explained, has the content u'IP', represented in the output as "u'IP'"), it has been added as additional key to the dict.

JSON String with elements containing unescaped double quotes

While trying to parse JSON from an AJAX request, the string returned contains invalid JSON.
Although the best practice would be to change the server to reply with valid JSON, as suggested in multiple related answers, this is not an option.
Trying to solve this problem using python, I looked at regular expressions.
The main problem is elements as follows (which I currently use as a test string:
testStr = '{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}'
I currently use the following code:
jsonString = re.sub(r'(?<=\w)\"(?=[^\(\:\}\,])','\\"',testStr)
jsonString = re.sub(r'\"\"(?![,}:])','\"\\\"',jsonString)
with very limited success.
If I was using C, I would parse the string, and simply escape all double quotes within the element (i.e between all double quotes which are preceded by [:{},] )
There must be a pythonic way to parse, without resorting to a for loop and looking ahead, and keeping history.
EDIT:
Assuming that strings do not contain: [ : { } ]
And also assuming that the unescaped double quotes are only within the value, and not in the key,
Then I assume that the following (or something similar should solve the problem:
import re
re.sub(r'(?<![\[\:])\"(?![,\}),'\"',testString)
But it still does not work.
Seems I needed a break to solve this.
The following regular expression seems to replace only doublequotes that are contained within the element string. (With the assumptions I stated in the question)
output = re.sub(r'(?<![\[\:\{\,])\"(?![\:\}\,])','\\\"', stringName)
I have created a sandbox here: https://repl.it/vNK
Example Output:
Original String:
{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}
Modified String:
{"KEY1":"THIS IS \"AN\" ELEMENT","KEY2":"\"\"THIS IS ANOTHER \"ELEMENT\""}
Parsed JSON:
{
"KEY1": "THIS IS \"AN\" ELEMENT",
"KEY2": "\"\"THIS IS ANOTHER \"ELEMENT\""
}
Any suggestions are welcome.

Categories

Resources