Replacing strings in a text file using Python adds weird characters

I want to replace a text with a path in each line of a text file using Python, but I am getting weird characters (squares) in the path in output file.
Current code:
#!/usr/bin/env python
f1 = open('input.txt', 'r')
f2 = open('output.txt', 'w')
for line in f1:
    f2.write(line.replace('test/software', 'C:\Software\api\render\3bit\sim>'))
f1.close()
f2.close()
In the output text the following in the path is replaced with a square (weird character):
\a = changed to a square
\r = changed to a square
\3 = changed to a square
Is there something wrong with my code or are the above letters reserved for the system?

Python string literals support escape codes; a backslash followed by certain characters is replaced by the character that code represents. For example, \r is interpreted as the ASCII carriage-return character, \a is an ASCII BELL, and \3 is interpreted as the ASCII codepoint 3 (in octal numbering). See the Python string literal documentation.
To stop escape codes from being interpreted, use a raw Python string by prefixing the string literal with an r:
r'C:\Software\api\render\3bit\sim>'
so your line reads:
f2.write(line.replace('test/software', r'C:\Software\api\render\3bit\sim>'))
Alternatively, double the backslashes to have them interpreted as literal backslashes instead:
'C:\\Software\\api\\render\\3bit\\sim>'
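The difference between the three spellings can be checked directly. The following is a minimal sketch using only a fragment of the path from the question (the variable names are made up for illustration); it compares a normal literal, a raw literal, and a literal with doubled backslashes:

```python
# Fragment of the path from the question.
plain = '\api\render\3bit'        # \a, \r and \3 are processed as escape sequences
raw = r'\api\render\3bit'         # raw literal: backslashes are kept as typed
doubled = '\\api\\render\\3bit'   # each \\ produces one literal backslash

print(repr(plain))                # shows the control characters that replaced \a, \r and \3
print(len(plain), len(raw))       # 13 16
print(raw == doubled)             # True
```

The normal literal is three characters shorter because each of the three escape sequences collapses into a single control character, which is what shows up as a square in the output file.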

Your string has backslashes, which Python reads as the start of escape codes: a backslash followed by certain characters is replaced by a single special character. For example, \n is a newline. You will need to either escape the backslashes (with another backslash) or just use a raw string.
r'C:\Software\api\render\3bit\sim>'

Put an "r" in front of each path string to make it a raw string; this should fix the issue. Example:
f2.write(line.replace('test/software', r'C:\Software\api\render\3bit\sim>'))
Or alternatively, escape your backslashes:
f2.write(line.replace('test/software', 'C:\\Software\\api\\render\\3bit\\sim>'))

Related

Why aren't all the backslashes being replaced as per replace() statement?

I'm trying to replace backslashes (from this polyline) with double backslashes.
My Script:
txt = "qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\Zl#d#|Ad#`#jAzBf#nATn#\xAD~#Jt#TJTv#^b#f#^b#b#f#^r#nA`#^\f#`#VVp#Lx#Vb#d#n#\l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\`#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\X~#b#^^PfAr#\Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA"
x = txt.replace("\\", "\\\\")
print(x)
Output (top string has spaces added to highlight the differences from the original string below)
qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\\h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\\Zl#d#|Ad#`#jAzBf#nATn#~ #Jt#TJTv#^b#f#^b#b#f#^r#nA`#^#` #VVp#Lx#Vb#d#n#\\ l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\\`#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\\wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\\X~#b#^^PfAr#\\Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\\kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA
qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\ h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\ Zl#d#|Ad#`#jAzBf#nATn#\ xAD~#Jt#TJTv#^b#f#^b#b#f#^r#nA`#^\f#`#VVp#Lx#Vb#d#n#\ l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\ `#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\ wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\ X~#b#^^PfAr#\ Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\ kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA
So you can see
#\xAD~#
becomes
#~#
when I would expect it to be
#\\xAD~#.

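The basic problem is that the places you are looking at contain no backslashes in your string. Your source code has backslashes, but where they form a recognized escape sequence (such as \xAD or \f) the whole sequence is converted to a single character; only unrecognized combinations such as \h keep their backslash. If you want to retain the backslashes in "WYSIWYG" style, use raw string mode: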
txt = r"qnkyHbgFYhB..."
This will retain the characters as seen, without processing the usual escape sequences.
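As a rough illustration, here is a sketch using a short made-up fragment in the style of the polyline (not the full string from the question). It shows that the normal literal contains no backslashes to replace, while the raw literal does:

```python
# Short fragment containing \xAD (hex escape) and \f (form feed).
plain = "n#\xAD~#^\f#`"    # escapes are processed: no backslash remains
raw = r"n#\xAD~#^\f#`"     # raw literal: both backslashes are kept

print(plain.count("\\"))   # 0
print(raw.count("\\"))     # 2

fixed = raw.replace("\\", "\\\\")
print(fixed)               # n#\\xAD~#^\\f#`
```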

Trouble loading csv file into Jupyter Notebooks [duplicate]

I'm trying to read a CSV file into Python (Spyder), but I keep getting an error. My code:
import csv
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)
I get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 2-3: truncated \UXXXXXXXX escape
I have tried to replace the \ with \\ or with / and I've tried to put an r before "C.., but all these things didn't work.

This error occurs because you are using a normal string literal as a path, so the backslashes are treated as escape characters. You can use any one of the following three solutions to fix your problem:
1: Put r before the string. This makes it a raw string literal, in which backslashes are kept as typed:
pandas.read_csv(r"C:\Users\DeePak\Desktop\myac.csv")
2: Use forward slashes instead of backslashes:
pandas.read_csv("C:/Users/DeePak/Desktop/myac.csv")
3: Double every backslash:
pandas.read_csv("C:\\Users\\DeePak\\Desktop\\myac.csv")

The first backslash in your string is being interpreted as a special character. In fact, because it's followed by a "U", it's being interpreted as the start of a \UXXXXXXXX Unicode escape, which must be followed by eight hexadecimal digits.
To fix this, you need to escape the backslashes in the string. The direct way to do this is by doubling the backslashes:
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
If you don't want to escape backslashes in a string, and you don't have any need for escape codes or quotation marks in the string, you can instead use a "raw" string, using "r" just before it, like so:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

You can just put r in front of the string with your actual path, which denotes a raw string. For example:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

Consider it as a raw string. Just as a simple answer, add r before your Windows path.
import csv
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)

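Try writing the file path as "C:\\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener", i.e. with a double backslash after the drive, as opposed to "C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener". Note that this only removes the \U error: \2 and \v later in the path are still treated as escape sequences, so it is safer to double every backslash.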

Add r before your string. This makes it a raw string literal, in which backslashes are not treated as escape characters.

As per String literals:
String literals can be enclosed within single quotes (i.e. '...') or double quotes (i.e. "..."). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings).
The backslash character (i.e. \) is used to escape characters which otherwise will have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter r or R. Such strings are called raw strings and use different rules for backslash escape sequences.
In triple-quoted strings, unescaped newlines and quotes are allowed, except that the three unescaped quotes in a row terminate the string.
Unless an r or R prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
So ideally you need to replace the line:
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
with any one of the following:
Using raw prefix and single quotes (i.e. '...'):
data = open(r'C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener')
Using double quotes (i.e. "...") and escaping backslash character (i.e. \):
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
Using double quotes (i.e. "...") and forwardslash character (i.e. /):
data = open("C:/Users/miche/Documents/school/jaar2/MIK/2.6/vektis_agb_zorgverlener")
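To see that these spellings name the same file, here is a small sketch that compares them without touching the file system. It uses pathlib.PureWindowsPath from the standard library, which is not part of the original answer and is used here only as a way to compare Windows paths on any operating system:

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener"
doubled = "C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener"
forward = "C:/Users/miche/Documents/school/jaar2/MIK/2.6/vektis_agb_zorgverlener"

# The raw and doubled literals produce exactly the same string.
print(raw == doubled)                                      # True
# The forward-slash version is a different string, but the same Windows path.
print(raw == forward)                                      # False
print(PureWindowsPath(raw) == PureWindowsPath(forward))    # True
```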

Just putting an r in front works well.
eg:
white = pd.read_csv(r"C:\Users\hydro\a.csv")

It worked for me after neutralizing the '\' by doubling it: f = open('F:\\file.csv')

The double \ should work for Windows, but you still need to take care of the folders you mention in your path. All of them (except the filename) must exist. Otherwise you will get an error.

Understanding file locations in python - unexpected errors

I am learning Python 3.3 on Windows 7. I have two text files, lines.txt and raven.txt, in a folder. Both contain the same text for the first example.
When I try to access ravens, using the code below, I get the error -
OSError: [Errno 22] Invalid argument: 'C:\\Python\raven.txt'
I know that the above error can be fixed by using an escape character like this -
C:\\Python\\raven.txt
C:\Python\\raven.txt
Why do both methods work? Strangely, when I access lines.txt in the same folder, I get no error! Why?
import re

def main():
    print('')
    fh = open('C:\Python\lines.txt')
    for line in fh:
        if re.search('(Len|Neverm)ore', line):
            print(line, end='')

if __name__ == '__main__':
    main()
Also, when I use the line below, I get a completely different error: TypeError: embedded NUL character. Why?
fh = open('C:\Python\Exercise Files\09 Regexes\raven.txt')
I can rectify this by using \ before every \ in the file path.

\r is an escape sequence (carriage return), but \l is not. So \lines is left as a backslash followed by lines, while \raven becomes a carriage return followed by aven.
In [1]: len('\l')
Out[1]: 2
In [2]: len('\r')
Out[2]: 1
You should always escape backslashes with \\. If your string doesn't contain quotes, you can also use raw strings:
In [9]: len(r'\r')
Out[9]: 2
In [10]: r'\r'
Out[10]: '\\r'
See: https://docs.python.org/3/reference/lexical_analysis.html
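The "embedded NUL character" error from the question appears to follow from the same rule: \0 is an octal escape for the NUL character. The sketch below uses a fragment of the path from the question (variable names are made up for illustration) to show what the normal literal actually contains:

```python
# Fragment of the path from the question, written as a normal literal.
broken = 'Files\09 Regexes\raven.txt'
# The same fragment as a raw literal.
intact = r'Files\09 Regexes\raven.txt'

print('\x00' in broken)           # True: \0 became a NUL character, followed by '9'
print('\r' in broken)             # True: \r became a carriage return
print('\\' in broken)             # False: no backslash survived
print(len(broken), len(intact))   # 24 26
```

Since file names cannot contain a NUL character, open() rejects the path before it even reaches the operating system.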

Maybe you can use a raw string, like this: open(r'C:\Python\Exercise Files\09 Regexes\raven.txt').
From the documentation:
When an 'r' or 'R' prefix is present, backslashes are still used to
quote the following character, but all backslashes are left in the
string. For example, the string literal r"\n" consists of two
characters: a backslash and a lowercase 'n'. String quotes can be
escaped with a backslash, but the backslash remains in the string; for
example, r"\"" is a valid string literal consisting of two characters:
a backslash and a double quote; r"\" is not a valid string literal
(even a raw string cannot end in an odd number of backslashes).
Specifically, a raw string cannot end in a single backslash (since the
backslash would escape the following quote character). Note also that
a single backslash followed by a newline is interpreted as those two
characters as part of the string, not as a line continuation.

You can actually use forward slashes instead of backslashes; that way you don't have to escape them at all, which saves a lot of headaches. For example: 'C:/Python/raven.txt'. I can confirm that this works on Windows.

String literals for file names

I am new to Python - but not to programming, and on a bit of a steep learning curve.
I have a programme that reads several input files. The first input file contains (amongst other things) the path and name of the other files.
I can open the file and read the name OK. If I print the string it looks like this
Z:\ \python\ \rb_data.dat\n'
All my "\" become "\ \". I think I can fix this by using the "r" prefix to convert it to a literal.
My question is: how do I attach the prefix to a string variable?
This is what I want to do :
modat = open('z:\\python\mot1 input.txt') # first input file containing names of other file
rbfile = modat.readline() # get new file name
rbdat = open(rbfile) # open new file

The \\ is an escape sequence for the backslash character \. When you specify a string literal, it is enclosed in either ' or ". Because there are some characters you might need to specify as part of the string which you cannot enter like this (for example, the quotation marks themselves), escape sequences allow you to do it. They usually are \x where x is something you want to enter. Now because all escape sequences start with a backslash, the backslash itself also turns into a special character which you cannot specify directly within a string literal. So you need to escape it too.
That means that the string literal '\\' actually represents a string with a single character: the backslash. Raw strings, which are string literals with an r in front of the opening quotation character, ignore (most) escape sequences. So r'\\x' is actually the string where two backslashes are followed by an x. So it’s identical to the string described by the non-raw string literal '\\\\x'.
All this only applies to string literals though. The string itself holds no information about whether it was created with a raw string literal or not, or whether some escape sequence was needed or not. It just contains all the characters that make up the string.
That also means that as soon as you get a string from somewhere, for example by reading it from a file, then you don’t need to worry about escaping something in there to make sure that it’s a correct string. It just is.
So in your code, when you open the file at z:\python\mot1 input.txt, you need to specify that filename as a string first. So you have to use a string literal, either with escaping the backslashes, or by using a raw string.
Then, when you read the new filename from that file, you already have a real string, and don’t need to bother with anything more. Assuming that it was correctly written to the file, you can just use it like that.
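A minimal sketch of that workflow is shown below. It builds two throwaway files in a temporary directory (the directory and the "payload" content are made up for illustration; the file names mirror those in the question), then reads the second file's name from the first. One detail the question's output hints at: readline() keeps the trailing newline, so it is stripped here before the name is passed to open().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical data file that the index file points to.
    data_path = os.path.join(tmp, "rb_data.dat")
    with open(data_path, "w") as f:
        f.write("payload")

    # First input file: contains the name of the other file on its first line.
    index_path = os.path.join(tmp, "mot1 input.txt")
    with open(index_path, "w") as f:
        f.write(data_path + "\n")

    with open(index_path) as modat:
        # The string read from the file needs no escaping, only the newline removed.
        rbfile = modat.readline().rstrip("\n")

    with open(rbfile) as rbdat:
        content = rbdat.read()

print(content)   # payload
```

No r prefix is involved anywhere after the file is read; the prefix only affects how literals in source code are parsed.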

The backslash \ in Python strings (and in code blocks on StackOverflow!) means, effectively, "treat the next character differently". As it is reserved for this purpose, when you actually have a backslash in your strings, it must be "escaped" by a preceding backslash:
>>> myString = "\\" # the first one "escapes" the second
>>> myString = "\" # no escape, so...
SyntaxError: EOL while scanning string literal
>>> print("\\") # when we actually print out the string
\
The short story is, you can basically ignore this in your strings. If you pass rbfile to open, Python will interpret it correctly.

Why not use os.path.normcase, like this:
import os

with open(r'z:\python\mot1 input.txt') as f:
    for line in f:
        if line.strip():
            if os.path.isfile(os.path.normcase(line.strip())):
                with open(line.strip()) as f2:
                    pass  # do something with f2
From the documentation of os.path.normcase:
Normalize the case of a pathname. On Unix and Mac OS X, this returns
the path unchanged; on case-insensitive filesystems, it converts the
path to lowercase. On Windows, it also converts forward slashes to
backward slashes.

How do you write special characters ("\n","\b",...) to a file in Python?

I'm using Python to process some plain text into LaTeX, so I need to be able to write things like \begin{enumerate} or \newcommand to a file. When Python writes this to a file, though, it interprets \b and \n as special characters.
How do I get Python to write \newcommand to a file, instead of writing ewcommand on a new line?
The code is something like this ...
with open(fileout,'w',encoding='utf-8') as fout:
    fout.write("\begin{enumerate}[1.]\n")
Python 3, Mac OS 10.5 PPC

One solution is to escape the escape character (\). This will result in a literal backslash before the b character instead of escaping b:
with open(fileout,'w',encoding='utf-8') as fout:
    fout.write("\\begin{enumerate}[1.]\n")
This will be written to the file as
\begin{enumerate}[1.]<newline>
(I assume that the \n at the end is an intentional newline. If not, use double-escaping here as well: \\n.)

You just need to double the backslash: \\n, \\b. This will escape the backslash. You can also put the r prefix in front of your string: r'\begin'. As described in the Python documentation on string literals, this prevents escape sequences from being substituted.

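You can also use raw strings:
with open(fileout,'w',encoding='utf-8') as fout:
    fout.write(r"\begin{enumerate}[1.]" "\n")
Note the 'r' before \begin. The \n is kept in a separate, non-raw literal because inside a raw string it would be written as a literal backslash followed by n instead of a newline.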
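The following sketch shows one way to combine raw strings with real newlines when writing LaTeX. It writes to an io.StringIO object as a stand-in for the output file, and the \newcommand line is a made-up example rather than anything from the question:

```python
import io

fout = io.StringIO()   # stands in for the real output file

# Raw literals keep \b and \n in \begin and \newcommand as plain text;
# the line ending is added from a normal literal so it stays a real newline.
fout.write(r"\begin{enumerate}[1.]" + "\n")
fout.write(r"\newcommand{\foo}{bar}" + "\n")

written = fout.getvalue()
print(written, end="")
```

Keeping the LaTeX text in raw literals and the line endings in normal literals avoids having to remember which letters happen to form an escape sequence.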
