Saving python variable with new lines in JSON with pretty print - python

I am reading this text from a CSV file in Python.
Hi there,
This is a test.
and storing it into a variable text.
I am trying to write this variable in a JSON file with json.dump(), but it is being transformed into:
' \ufeffHi there,\n\n\xa0\n\nThis is a test.
How can I make my JSON file look like the one below?
{
"text": "Hi there,
This is a test."
}

JSON does not allow literal line breaks inside strings. If you still want them, you will have to write your own "JSON" writer.
Edit: Here is a function that takes a Python dict (which you can get using json.loads()) and prints it the way you need:
def print_wrong_json(dict_object):
    # The output contains raw line breaks, so it is not valid JSON.
    print('{')
    print(',\n'.join('"{}": "{}"'.format(key, value) for key, value in dict_object.items()))
    print('}')
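For comparison, here is a minimal sketch of what the standard json module does with such text. It assumes that the stray \ufeff is a byte order mark picked up from the CSV file (opening the file with encoding='utf-8-sig' should drop it at read time) and that \xa0 is a non-breaking space. The helper name clean_text is made up for this illustration.

```python
import json

def clean_text(raw):
    """Remove byte order marks and turn non-breaking spaces into plain spaces."""
    return raw.replace("\ufeff", "").replace("\xa0", " ")

raw = "\ufeffHi there,\n\n\xa0\n\nThis is a test."
text = clean_text(raw)

# json.dumps writes each line break as the two-character escape \n,
# which is the only way a JSON string can carry a newline.
encoded = json.dumps({"text": text}, indent=4, ensure_ascii=False)
print(encoded)

# Loading the document again restores the real line breaks.
decoded = json.loads(encoded)
print(decoded["text"])
```

The file itself shows \n, but any JSON parser gives the line breaks back, so the pretty layout only has to exist when the value is displayed.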

Well, it can be done, as user1308345 shows in his answer, but the result would no longer be valid JSON and you would probably run into issues later when deserializing it.
But if you really want to do it and still have valid JSON, you could split the string at the line breaks and serialize the pieces as an array, as suggested in this answer: https://stackoverflow.com/a/7744658/1466757
Your JSON would then look similar to this:
{
"text": [
"Hi there,",
"",
"",
"",
"This is a test."
]
}
After deserializing it, you would have to put the line breaks back in.
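The split-and-join round trip described above could be sketched like this. The function names dump_multiline and load_multiline are hypothetical, and the sketch assumes the value under the given key is a plain string that uses \n as its line separator.

```python
import json

def dump_multiline(data, key):
    """Return JSON text in which data[key] is stored as a list of lines."""
    as_lines = dict(data)  # shallow copy so the caller's dict is untouched
    as_lines[key] = data[key].split("\n")
    return json.dumps(as_lines, indent=4)

def load_multiline(json_text, key):
    """Parse JSON text and join the list under key back into one string."""
    data = json.loads(json_text)
    data[key] = "\n".join(data[key])
    return data

original = {"text": "Hi there,\n\nThis is a test."}
encoded = dump_multiline(original, "text")
print(encoded)

restored = load_multiline(encoded, "text")
```

Splitting on "\n" rather than using splitlines() keeps the round trip exact, including a trailing line break if there is one.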

Related

Eliminating " " from a JSON file so that they don't interrupt the string [duplicate]

While trying to parse JSON from an AJAX request, the string returned contains invalid JSON.
Although the best practice would be to change the server to reply with valid JSON, as suggested in multiple related answers, this is not an option.
Trying to solve this problem using python, I looked at regular expressions.
The main problem is elements like the following (which I currently use as a test string):
testStr = '{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}'
I currently use the following code:
jsonString = re.sub(r'(?<=\w)\"(?=[^\(\:\}\,])','\\"',testStr)
jsonString = re.sub(r'\"\"(?![,}:])','\"\\\"',jsonString)
with very limited success.
If I was using C, I would parse the string, and simply escape all double quotes within the element (i.e between all double quotes which are preceded by [:{},] )
There must be a pythonic way to parse, without resorting to a for loop and looking ahead, and keeping history.
EDIT:
Assuming that strings do not contain: [ : { } ]
And also assuming that the unescaped double quotes are only within the value, and not in the key,
Then I assume that the following (or something similar) should solve the problem:
import re
re.sub(r'(?<![\[\:])\"(?![,\}])','\"',testString)
But it still does not work.
Seems I needed a break to solve this.
The following regular expression seems to replace only the double quotes that are contained within the element string (given the assumptions I stated in the question).
output = re.sub(r'(?<![\[\:\{\,])\"(?![\:\}\,])','\\\"', stringName)
I have created a sandbox here: https://repl.it/vNK
Example Output:
Original String:
{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}
Modified String:
{"KEY1":"THIS IS \"AN\" ELEMENT","KEY2":"\"\"THIS IS ANOTHER \"ELEMENT\""}
Parsed JSON:
{
"KEY1": "THIS IS \"AN\" ELEMENT",
"KEY2": "\"\"THIS IS ANOTHER \"ELEMENT\""
}
Any suggestions are welcome.
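For anyone who wants to run this without the sandbox, here is a self-contained sketch of the same substitution followed by json.loads. The names INNER_QUOTE and escape_inner_quotes are made up here, and the assumptions from the question still apply: values contain none of [ : { } , next to a quote, and only values (not keys) contain unescaped quotes.

```python
import json
import re

# A double quote that is neither preceded by one of [ : { ,
# nor followed by one of : } ,  is treated as part of a value.
INNER_QUOTE = re.compile(r'(?<![\[:{,])"(?![:},])')

def escape_inner_quotes(broken):
    """Backslash-escape double quotes that sit inside string values."""
    return INNER_QUOTE.sub(r'\\"', broken)

test_str = '{"KEY1":"THIS IS "AN" ELEMENT","KEY2":"""THIS IS ANOTHER "ELEMENT""}'
fixed = escape_inner_quotes(test_str)
print(fixed)

# The repaired text is now valid JSON.
parsed = json.loads(fixed)
print(json.dumps(parsed, indent=4))
```

A value such as "a","b" would still defeat the pattern, because the inner quotes are followed and preceded by a comma, so this only works for input that meets the stated assumptions.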

JSONDecodeError: Invalid \escape when parsing from Python

After running my object detection model, it outputs a .json file with the results. To use these results in my Python code I need to parse the .json file, but nothing I have tried works. I tried simply opening the file and printing the results, but I got this error:
json.decoder.JSONDecodeError: Invalid \escape: line 4 column 41 (char 60)
If you have any idea what I did wrong, the help would be very much appreciated. My code:
import json
with open(r'C:\Yolo_v4\darknet\build\darknet\x64\result.json') as result:
    data = json.load(result)
    result.close()
print(data)
My .json file
[
{
"frame_id":1,
"filename":"C:\Yolo_v4\darknet\build\darknet\x64\f047.png",
"objects": [
{"class_id":32, "name":"right", "relative_coordinates":{"center_x":0.831927, "center_y":0.202225, "width":0.418463, "height":0.034752}, "confidence":0.976091},
{"class_id":19, "name":"h", "relative_coordinates":{"center_x":0.014761, "center_y":0.873551, "width":0.041723, "height":0.070544}, "confidence":0.484339},
{"class_id":24, "name":"left", "relative_coordinates":{"center_x":0.285694, "center_y":0.200752, "width":0.619584, "height":0.032149}, "confidence":0.646595},
]
}
]
(There are several more detected objects, but I did not include them.)
The other responders are of course right. This is not valid JSON. But sometimes you don't have the option to change the format, e.g. because you are working with a broken data dump where the original source is no longer available.
The only way to deal with that is to sanitize it somehow. This is of course not ideal, because you have to put a lot of expectations into your sanitizer code, i.e. you need to know exactly what kind of errors the json file has.
However, a solution using regular expressions could look like this:
import json
import re
class LazyDecoder(json.JSONDecoder):
    def decode(self, s, **kwargs):
        regex_replacements = [
            # Double every lone backslash, e.g. in Windows paths.
            (re.compile(r'([^\\])\\([^\\])'), r'\1\\\\\2'),
            # Drop a trailing comma before a closing bracket.
            (re.compile(r',(\s*])'), r'\1'),
        ]
        for regex, replacement in regex_replacements:
            s = regex.sub(replacement, s)
        return super().decode(s, **kwargs)
with open(r'C:\Yolo_v4\darknet\build\darknet\x64\result.json') as result:
    data = json.load(result, cls=LazyDecoder)
print(data)
This works by subclassing the standard JSONDecoder and passing the subclass to json.load.
Hi, you need to use double backslashes and remove the last comma in the objects property. Also, you don't need to close the file inside the with block.
[
{
"frame_id":1,
"filename":"C:\\Yolo_v4\\darknet\\build\\darknet\\x64\\f047.png",
"objects": [
{"class_id":32, "name":"right", "relative_coordinates":{"center_x":0.831927, "center_y":0.202225, "width":0.418463, "height":0.034752}, "confidence":0.976091},
{"class_id":19, "name":"h", "relative_coordinates":{"center_x":0.014761, "center_y":0.873551, "width":0.041723, "height":0.070544}, "confidence":0.484339},
{"class_id":24, "name":"left", "relative_coordinates":{"center_x":0.285694, "center_y":0.200752, "width":0.619584, "height":0.032149}, "confidence":0.646595}
]
}
]
The "\" character is used not only in Windows filepaths but also as an escape character for things like newlines (you can use "\n" instead of an actual newline, for example). To escape the escape, you simply have to put a second backslash before it, like this:
"C:\\Yolo_v4\\darknet\\build\\darknet\\x64\\f047.png"
As someone said in the comments, json.dump should do this automatically for you, so it sounds like something internal is messed up (unless this wasn't created using that).
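To illustrate the point about json.dump, here is a small sketch using json.dumps on a dict that holds the path from the question. The dict itself is made up for this example; the real file came from the detection tool.

```python
import json

# A raw string, so Python keeps every backslash as a single character.
record = {"filename": r"C:\Yolo_v4\darknet\build\darknet\x64\f047.png"}

# Serializing doubles each backslash, as JSON requires.
encoded = json.dumps(record)
print(encoded)

# Parsing the text turns the doubled backslashes back into single ones.
decoded = json.loads(encoded)
print(decoded["filename"])
```

If a file contains single backslashes as in the question, it was most likely written by something other than a JSON serializer.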
The function below encodes the string as UTF-8 and then decodes it with the unicode_escape codec, which interprets any escape sequences it finds. Note that this codec reads the bytes as Latin-1, so it is only safe for ASCII input.
def remove_escape_sequences(string):
    return string.encode('utf-8').decode('unicode_escape')

Parsing string for further use

I have a string like this: "d/\"". I want to split the string with the split('/') function to get \". Then I want to format the result as a raw string and append it to a list.
When I print or write that list element, I want to get \".
I tried many approaches (replacing the \ character, using the repr() function, trying to format the string as a raw string with r'%s'%string).
None of them worked correctly. Maybe someone can help me find a solution.
Thank you in advance,
Greetings
EDIT: Minimal reproducible example:
JSON Object:
{
"name": "d/\"",
"name_encoding": "utf8",
"value": "/\/\",
}
I want a list with the raw strings, like ['\"', 'utf8', '/\/\'], and when I write the list to a file or print it, the output should look like this: \", utf8, /\/\.
Minimal code I use (for reproducing the problem):
with open(...) as f:
    for object in ijson.items(f, "item"):
        temp_list = []
        for attribute_key in object.keys():
            if attribute_key == 'name':
                name_parsed = object[attribute_key].split('/', maxsplit=1)[1]
                temp_list.append(repr(name_parsed)[1:-1])
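One possible reading of the problem, sketched under assumptions: once the JSON is parsed, "d/\"" is the three-character string d/" and the backslash is gone, because it only existed as JSON escaping. repr() does not bring it back, since Python chooses single quotes for a string that contains a double quote. Re-encoding the piece with json.dumps and dropping the outer quotes does. The sketch uses the standard json module instead of ijson to stay self-contained, leaves out the "value" entry (as written it is not valid JSON), and the helper name escaped_form is hypothetical.

```python
import json

# The backslash here is JSON escaping, kept literal by the raw string.
document = r'[{"name": "d/\"", "name_encoding": "utf8"}]'

def escaped_form(text):
    """Return text as it would appear between the quotes of a JSON string."""
    return json.dumps(text)[1:-1]

temp_list = []
for item in json.loads(document):
    # After parsing, item["name"] is d/" with no backslash in it.
    name_parsed = item["name"].split("/", maxsplit=1)[1]
    temp_list.append(escaped_form(name_parsed))
    temp_list.append(item["name_encoding"])

print(", ".join(temp_list))
```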

Encoding in messenger JSON [duplicate]

I'm learning to work with JSON by writing a simple Python program that analyzes Facebook messages I downloaded as JSON, but these messages contain plenty of Unicode characters that are written in the JSON file like this:
pom\u00c3\u00b4\u00c5\u00bee
The example above is supposed to be the word
pomôže
However, when I work with the string and print out the word, it comes out like this:
'pomôže'
Even most online converters printed it like this, except this one: https://github.com/mathiasbynens/utf8.js
Is there any way to fix this?
EDIT:
Alright, so I'm sorry for not being clear enough. Hopefully, this will make things clearer:
I have a JSON file that looks like this, when opened in Notepad++:
{
"participants": [
{
"name": "Person1"
},
{
"name": "Person2"
}
],
"messages": [
{
"sender_name": "Person1",
"timestamp_ms": 1521492166805,
"content": "D\u00c3\u00bafam, \u00c5\u00bee pom\u00c3\u00b4\u00c5\u00bee",
"type": "Generic"
}
]
}
When I try to print or work with the content of the message :
import json
with open("messages.json", "r") as f:
messages = json.load(f)
print(messages["messages"][0]["content"])
the string looks like this:
Dúfam, že pomôže
How do I get the text into readable form?
It took me a while to understand, but the reason is fairly simple. A \uXXXX escape in JSON stands for a UTF-16 code unit, but in your file each byte of the UTF-8 encoding was written as its own escape: \u00c3\u00b4 are the two UTF-8 bytes (C3 B4) of ô, whose actual escape is \u00f4. The escapes have to hold the code units of the characters themselves.
Here are some examples:
In JavaScript:
console.log("pom\u{00f4}\u{017E}e");
In Python 3:
print("pom"+u"\u00F4"+u"\u017E"+"e")
In Python 2:
print("pom"+u"\u00F4".encode('utf-8')+u"\u017E".encode('utf-8')+"e")
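When the file cannot be regenerated, the loaded strings can be repaired after parsing. The sketch below assumes every string in the file is damaged the same way (UTF-8 bytes stored as separate Latin-1 characters), so encoding to Latin-1 gives the original bytes back and decoding them as UTF-8 gives the intended text. The function name fix_mojibake is made up, and the sample is a shortened version of the file in the question.

```python
import json

def fix_mojibake(value):
    """Recursively re-decode strings whose UTF-8 bytes were stored
    as separate Latin-1 characters."""
    if isinstance(value, str):
        return value.encode("latin-1").decode("utf-8")
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged

# Raw string, so the \u escapes reach the JSON parser untouched.
raw = r'{"messages": [{"sender_name": "Person1", "timestamp_ms": 1521492166805, "content": "D\u00c3\u00bafam, \u00c5\u00bee pom\u00c3\u00b4\u00c5\u00bee"}]}'

messages = fix_mojibake(json.loads(raw))
print(messages["messages"][0]["content"])
```

A string that is already correct and contains characters above U+00FF would raise UnicodeEncodeError here, so this should only be applied to data known to be damaged in this way.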
