How to write a string with escaped \n to a file - python

Say I have a json file like
{"text":"Here is a string\nhere is another string","timestamp":"2021-05-24"}
I am trying to load it and write the text field to a file, so that the output file contains exactly
Here is a string\nhere is another string
However, if I read it as a json and do something like out.write(j['text']), I will get
Here is a string
here is another string
in the file, which translates \n into a new line. Is there a way I could output the string in the desired way?

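Yes, but you need to escape the backslash yourself. Once the JSON is parsed, the string contains a real newline character, so replace it with a literal backslash followed by n: j['text'].replace('\n', '\\n'). Note that this replaces every newline, so it will not work if some newlines in the string should remain actual line breaks.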
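As a rough illustration of the replace approach, here is a minimal sketch. The variable names (raw, escaped, reencoded) are made up for this example, and the JSON is held in a string rather than read from a file. It also shows one possible alternative that is not part of the answer above: re-encoding the value with json.dumps and stripping the surrounding quotes, which escapes other special characters as well.

```python
import json

# Hypothetical stand-in for the contents of the JSON file from the question.
raw = '{"text":"Here is a string\\nhere is another string","timestamp":"2021-05-24"}'
j = json.loads(raw)

# After parsing, j["text"] contains a real newline character.
# Replace it with the two characters backslash + n.
escaped = j["text"].replace("\n", "\\n")

# Alternative (an assumption, not from the answer): let json.dumps do the
# escaping, then drop the surrounding double quotes it adds.
reencoded = json.dumps(j["text"])[1:-1]

print(escaped)
print(reencoded)
```

Either string can then be passed to out.write(). The json.dumps variant also escapes double quotes and backslashes, which may or may not be what you want.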

Related

When writing in Python a dictionary to a YAML file, how to make sure the string in the YAML file is split based on '\n'?

I have a long string in a dictionary which I will dump to a YAML file.
As an example
import yaml

d = {'test': {'long_string': "this is a long string that does not successfully split when it sees the character '\n' which is an issue"}}
with open('./test.yaml', 'w+') as ff:
    yaml.safe_dump(d, ff)
Which produces the following output in the YAML file
test:
  long_string: "this is a long string that does not successfully split when it sees\
    \ the character '\n' which is an issue"
I want the string inside the YAML file to be split onto a new line only where it contains "\n". I also don't want any characters indicating that it's a newline. I want the output as follows:
test:
  long_string: "this is a long string that does not successfully split when it sees the character ''
  which is an issue"
What do I need to do to make the yaml.dump or yaml.safe_dump fulfill this?
There is no general solution. YAML is a format intentionally designed in a way that lets the implementation decide on the exact representation of values.
What you can do is to suggest a format. The dumper will honor this suggestion if possible. The one scalar format that breaks at literal newlines in the value and nowhere else is a literal block scalar. This code will dump your string as such if possible:
import yaml, sys

class as_block(str):
    # Marker type: strings wrapped in as_block are dumped as literal block scalars.
    @staticmethod
    def represent(dumper, data):
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

yaml.SafeDumper.add_representer(as_block, as_block.represent)
d = {'test': {'long_string': as_block('this is a long string that does not successfully split when it sees the character\n which is an issue')}}
yaml.safe_dump(d, sys.stdout)
Output:
test:
  long_string: |-
    this is a long string that does not successfully split when it sees the character
     which is an issue
I use as_block for the string that should be written as block scalar.
You can theoretically use this for all strings, but be aware that long_string and test would then also be written as block scalars, which is most probably not what you want.
This will not work when there is space before the line break, because YAML ignores space at the end of a line of a block scalar, so the serializer will choose another format to not lose the space character(s).
You can also take a step back and ask yourself why this is an issue in the first place. A YAML implementation is perfectly able to load the generated YAML and reconstruct your string.

Why does python add additional backslashes to the path?

I have a text file with a path that goes like this:
r"\\user\data\t83\rf\Desktop\QA"
When I read this file and print a line, it returns the following string, and I'm unable to open the file from this location:
'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
Seems you've got Python code in your text file, so either sanitize your file, so it only includes the actual path (not a Python string representation) or you can try to fiddle with string replace until you're satisfied, or just evaluate the Python string.
Note that using eval() opens Pandora's box (it is as unsafe as it gets); it's safer to use ast.literal_eval() instead.
import ast
file_content = 'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
print(eval(file_content)) # do not use this, it's only shown for the sake of completeness
print(ast.literal_eval(file_content))
Output:
\\user\data\t83\rf\Desktop\QA
\\user\data\t83\rf\Desktop\QA
Personally, I'd prefer to sanitize the file, so it only contains \\user\data\t83\rf\Desktop\QA
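In a Python string literal, a backslash starts an escape sequence such as \n (new line) or \t (tab), so a single backslash combines with the character that follows it. To represent a literal backslash, you write \\. That is why the representation of your string shows every backslash doubled: the string itself still contains single backslashes.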
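To make the difference between a string and its representation concrete, here is a small sketch. The variable names are illustrative, and the line is hard-coded instead of being read from the text file. It assumes the file really contains a Python raw-string literal, as in the question.

```python
import ast

# One line as it would come back from reading the text file in the question:
# the characters r"\\user\data\t83\rf\Desktop\QA" followed by a newline.
line = 'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'

# repr() shows the escaped form: every backslash appears doubled.
print(repr(line))

# print() shows the characters the string actually contains.
print(line)

# Strip the trailing newline and evaluate the literal to get the bare path.
path = ast.literal_eval(line.strip())
print(path)
```

The doubled backslashes exist only in the repr() output. The path variable holds the plain UNC path and can be handed to open() or os.listdir().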

Converting backslash single quote \' to backslash double quote \" for JSON

I've got a JSON file that was converted to a string in Python. Somehow along the way the double quotes have gotten replaced with single quotes.
{\'MyJSON\': {\'Report\': \'1\' ....
I need to convert my string so that it is in this format instead:
{\"MyJSON\": {\"Report\": \"1\" ....
My problem is that using str.replace, I can't figure out how to convert a single quote into a double quote as both quotes are escaped.
My ultimate goal is to be able to put the string into json.loads so that I can pretty print it.
Attempts:
txt.replace(r"\'", r'\"')
> "{'MyJSON': {'Report': '1'"
txt.replace("\"", "\'")
> "{'MyJSON': {'Report': '1'"
If I save my string to a txt file it appears in the preview as:
{'MyJSON': {'Report': '1' ...
So I think what I actually need to do is replace ' with "
I have decided to use ast.literal_eval(txt) which can convert my string to a dictionary. From there, json.loads(json.dumps(dict)) gets me to JSON
I mean,
my_string = "\"\'"
print(my_string.replace("\'", "\""))
works perfectly fine
EDIT: I didn't mean for this to be used directly; it was a proof of concept. In my version the replacement was reversed. I have updated the snippet so that it can be put directly into your code. Try it again.
Instead of focusing on the backslashes to try to "hack" a json string / dict str into a JSON, a better solution is to take it one step at a time and start by converting my dict string into a dictionary.
import ast
import json

txt = ast.literal_eval(txt) # convert my string to a dictionary
txt = json.loads(json.dumps(txt)) # round-trip the dict through JSON to confirm it serializes
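Since the stated goal is pretty printing, the following sketch shows the same two steps ending in an indented JSON string. The sample txt value is a hypothetical, completed version of the truncated string from the question, and the indent value is an arbitrary choice.

```python
import ast
import json

# Hypothetical, completed version of the single-quoted dict string.
txt = "{'MyJSON': {'Report': '1'}}"

# Step 1: parse the Python-literal string into a real dictionary.
data = ast.literal_eval(txt)

# Step 2: serialize the dictionary as JSON, which always uses double quotes.
pretty = json.dumps(data, indent=2)
print(pretty)
```

ast.literal_eval only accepts Python literals, so it will raise an error if the string contains JSON-only tokens such as true, false or null.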

removing weird double quotes (from excel file) in python string

I'm loading in an excel file to python3 using xlrd. They are basically lines of text in a spreadsheet. On some of these lines are quotation marks. For example, one line can be:
She said, "My name is Jennifer."
When I'm reading them into python and making them into strings, the double quotes are read in as a weird double quote character that looks like a double quote in italics. I'm assuming that somewhere along the way, python read in the character as some foreign character rather than actual double quotes due to some encoding issue or something. So in the above example, if I assign that line as "text", then we'll have something like the following (although not exactly since I don't actually type out the line, so imagine "text" was already assigned beforehand):
text = 'She said, “My name is Jennifer.”'
text[10] == '"'
The second line will spit out a False because it doesn't seem to recognize it as a normal double quote character. I'm working within the Mac terminal if that makes a difference.
My questions are:
1. Is there a way to easily strip these weird double quotes?
2. Is there a way when I read in the file to get python to recognize them as double quotes properly?
> I'm assuming that somewhere along the way, python read in the character as some foreign character
Yes; it read that in because that's what the file data actually represents.
> rather than actual double quotes due to some encoding issue or something.
There's no issue with the encoding. The actual character is not an "actual double quote".
> Is there a way to easily strip these weird double quotes?
You can use the .replace method of strings as you would normally, to either replace them with an "actual double quote" or with nothing.
> Is there a way when I read in the file to get python to recognize them as double quotes properly?
If you're looking for them, you can compare them to the character they actually are.
As noted in the comment, they are most likely U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE QUOTATION MARK. They're used so that opening and closing quotes can look different (by curving in different directions), which pretty typography normally does (as opposed to using " which is simply more convenient for programmers). You represent them in Python with a Unicode escape, thus:
text[10] == '\u201c'
You could also have directly asked Python for this info, by asking for text[10] at the Python command line (which would evaluate that and show you the representation), or explicitly in a script with e.g. print(repr(text[10])).
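A minimal sketch of both suggestions, identifying the character and replacing it, might look like the following. The variable names are illustrative, and the text is hard-coded rather than read with xlrd.

```python
import unicodedata

# The example line, written with Unicode escapes for the curly quotes.
text = 'She said, \u201cMy name is Jennifer.\u201d'

# Ask Python what the character at index 10 really is.
print(repr(text[10]))
print(unicodedata.name(text[10]))

# Replace both curly quotes with the plain ASCII double quote.
normalized = text.replace('\u201c', '"').replace('\u201d', '"')
print(normalized)

# Or strip them entirely by replacing with an empty string.
stripped = text.replace('\u201c', '').replace('\u201d', '')
print(stripped)
```

If more typographic characters turn up (for example curly single quotes), str.translate with a mapping built by str.maketrans handles them in one pass.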

stripping away code in python using "re.sub"

I read this:
Stripping everything but alphanumeric chars from a string in Python
and this:
Python: Strip everything but spaces and alphanumeric
Didn't quite understand but I tried a bit on my own code, which now looks like this:
import re
decrypt = str(open("crypt.txt"))
crypt = re.sub(r'([^\s\w]|_)+', '', decrypt)
print(crypt)
When I run the script it comes back with this answer:
C:\Users\Adrian\Desktop\python>python tick.py
ioTextIOWrapper namecrypttxt moder encodingcp1252
I am trying to strip all the extra characters from the document and keep just the numbers and letters. The document contains the following text: http://pastebin.com/Hj3SjhxC
I am trying to solve the assignment here:
http://www.pythonchallenge.com/pc/def/ocr.html
Does anyone know what "ioTextIOWrapper namecrypttxt moder encodingcp1252" means?
And how should I format the code to properly strip everything except letters and numbers?
Sincerely
str(open("file.txt")) doesn't do what you think it does. open() returns a file object. str gives you the string representation of that file object, not the contents of the file. If you want to read the contents of the file use open("file.txt").read().
Or, more safely, use a with statement:
with open("file.txt") as f:
    decrypt = f.read()
crypt = ...
# etc.
You could just search for the alphanumeric chars instead. Like this:
print(''.join(re.findall('[A-Za-z0-9]', decrypt)))
And you also want:
decrypt = open("crypt.txt").read()
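Putting the answers together, a self-contained sketch could look like this. The sample text and the temporary file are stand-ins for the real crypt.txt, which is not reproduced here.

```python
import os
import re
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a small stand-in for crypt.txt so the example runs anywhere.
    path = os.path.join(tmp, "crypt.txt")
    with open(path, "w") as f:
        f.write("a%$b_c 1!2\n")

    # Read the file contents, not the string representation of the file object.
    with open(path) as f:
        decrypt = f.read()

# The pattern from the question: drop everything except word characters
# and whitespace, and also drop underscores.
stripped = re.sub(r'([^\s\w]|_)+', '', decrypt)

# The findall variant: keep only ASCII letters and digits.
alnum = ''.join(re.findall('[A-Za-z0-9]', decrypt))

print(repr(stripped))
print(alnum)
```

The re.sub version keeps spaces and line breaks, while the findall version drops them. Which one is preferable depends on whether the layout of the remaining text matters.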
