Why aren't all the backslashes being replaced as per replace() statement?

Why aren't all the backslashes being replaced as per replace() statement? - python

I'm trying to replace backslashes (from this polyline) with double backslashes.
My Script:
txt = "qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\Zl#d#|Ad#`#jAzBf#nATn#\xAD~#Jt#TJTv#^b#f#^b#b#f#^r#nA`#^\f#`#VVp#Lx#Vb#d#n#\l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\`#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\X~#b#^^PfAr#\Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA"
x = txt.replace("\\", "\\\\")
print(x)
Output (top string has spaces added to highlight the differences to the orginal string below)
qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\\h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\\Zl#d#|Ad#`#jAzBf#nATn#~ #Jt#TJTv#^b#f#^b#b#f#^r#nA`#^#` #VVp#Lx#Vb#d#n#\\ l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\\`#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\\wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\\X~#b#^^PfAr#\\Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\\kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA
qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\ h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\ Zl#d#|Ad#`#jAzBf#nATn#\ xAD~#Jt#TJTv#^b#f#^b#b#f#^r#nA`#^\f#`#VVp#Lx#Vb#d#n#\ l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\ `#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\ wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\ X~#b#^^PfAr#\ Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\ kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA
So you can see
#\xAD~#
becomes
#~#
when I would expect it to be
#\\xAD~#.

The basic problem is that there are no backslashes in your string. Your source code has backslashes, but they are all escape signals. If you want to retain the backslashes in "WYSIWYG" style, use raw string mode:
txt = r"qnkyHbgFYhB..."
This will retain the characters as seen, without processing the usual escape sequences.

Related

Python assign "\" to a variable [duplicate]

When I write print('\') or print("\") or print("'\'"), Python doesn't print the backslash \ symbol. Instead it errors for the first two and prints '' for the third. What should I do to print a backslash?
This question is about producing a string that has a single backslash in it. This is particularly tricky because it cannot be done with raw strings. For the related question about why such a string is represented with two backslashes, see Why do backslashes appear twice?. For including literal backslashes in other strings, see using backslash in python (not to escape).

You need to escape your backslash by preceding it with, yes, another backslash:
print("\\")
And for versions prior to Python 3:
print "\\"
The \ character is called an escape character, which interprets the character following it differently. For example, n by itself is simply a letter, but when you precede it with a backslash, it becomes \n, which is the newline character.
As you can probably guess, \ also needs to be escaped so it doesn't function like an escape character. You have to... escape the escape, essentially.
See the Python 3 documentation for string literals.

A hacky way of printing a backslash that doesn't involve escaping is to pass its character code to chr:
>>> print(chr(92))
\

print(fr"\{''}")
or how about this
print(r"\ "[0])

For completeness: A backslash can also be escaped as a hex sequence: "\x5c"; or a short Unicode sequence: "\u005c"; or a long Unicode sequence: "\U0000005c". All of these will produce a string with a single backslash, which Python will happily report back to you in its canonical representation - '\\'.

Trouble loading csv file into Jupyter Notebooks [duplicate]

This question already has answers here:
How should I write a Windows path in a Python string literal?
(5 answers)
Closed 3 years ago.
I'm trying to read a CSV file into Python (Spyder), but I keep getting an error. My code:
import csv
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)
I get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 2-3: truncated \UXXXXXXXX escape
I have tried to replace the \ with \\ or with / and I've tried to put an r before "C.., but all these things didn't work.

This error occurs, because you are using a normal string as a path. You can use one of the three following solutions to fix your problem:
1: Just put r before your normal string. It converts a normal string to a raw string:
pandas.read_csv(r"C:\Users\DeePak\Desktop\myac.csv")
2:
pandas.read_csv("C:/Users/DeePak/Desktop/myac.csv")
3:
pandas.read_csv("C:\\Users\\DeePak\\Desktop\\myac.csv")

The first backslash in your string is being interpreted as a special character. In fact, because it's followed by a "U", it's being interpreted as the start of a Unicode code point.
To fix this, you need to escape the backslashes in the string. The direct way to do this is by doubling the backslashes:
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
If you don't want to escape backslashes in a string, and you don't have any need for escape codes or quotation marks in the string, you can instead use a "raw" string, using "r" just before it, like so:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

You can just put r in front of the string with your actual path, which denotes a raw string. For example:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

Consider it as a raw string. Just as a simple answer, add r before your Windows path.
import csv
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)

Try writing the file path as "C:\\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener" i.e with double backslash after the drive as opposed to "C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener"

Add r before your string. It converts a normal string to a raw string.

As per String literals:
String literals can be enclosed within single quotes (i.e. '...') or double quotes (i.e. "..."). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings).
The backslash character (i.e. \) is used to escape characters which otherwise will have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter r or R. Such strings are called raw strings and use different rules for backslash escape sequences.
In triple-quoted strings, unescaped newlines and quotes are allowed, except that the three unescaped quotes in a row terminate the string.
Unless an r or R prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
So ideally you need to replace the line:
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
To any one of the following characters:
Using raw prefix and single quotes (i.e. '...'):
data = open(r'C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener')
Using double quotes (i.e. "...") and escaping backslash character (i.e. \):
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
Using double quotes (i.e. "...") and forwardslash character (i.e. /):
data = open("C:/Users/miche/Documents/school/jaar2/MIK/2.6/vektis_agb_zorgverlener")

Just putting an r in front works well.
eg:
white = pd.read_csv(r"C:\Users\hydro\a.csv")

It worked for me by neutralizing the '' by f = open('F:\\file.csv')

The double \ should work for Windows, but you still need to take care of the folders you mention in your path. All of them (except the filename) must exist. Otherwise you will get an error.

python 3: quoting result of random string generation

I'm new to python and things do not always work as I expect... but I am learning, slowly. Here is a case in point. If I randomly create a string via:
thing = ''.join([
random.SystemRandom().choice(
"{}{}{}".format(
string.ascii_letters, string.digits, string.punctuation
)
) for i in range(63)
])
then I could end up with a string with single quotes as well as backslashes. I assume that I should then go through the string and quote the possibly problematic characters. So, for example: if I generate the (short) string:
cs]b77e\IM>&4/,u.s_jr"xmMdHD7a'wrEw(
my instinct tells me that I should quote that into:
cs]b77e\\IM>&4/,u.s_jr"xmMdHD7a\'wrEw(
It looks like the string.replace() method is my friend...
thing = ''.join([
random.SystemRandom().choice(
"{}{}{}".format(
string.ascii_letters, string.digits, string.punctuation
)
) for i in range(63)
]).replace('\\', '\\').replace('\'', '\'')
but is there a better way?
Also, in the replace() methods the meaning of the single quoted strings seems to change depending on context. Coming from Perl this seems strange to me. My initial attempts had me doing things like replace('\\', '\\\\') thinking that I had to quote the characters going into the replacement string. Is this normal or am I missing something else?
Edit
My goal here is to end up with 63 characters in a string. I don't really think that I have to quote any generated single quotes but my thought is that if I later use the string and it has generated backslashes then the next character after the backslash would act like it was quoted, right? I mean:
len('1234')
yields 4 but
len('12\4')
yields 3 so I need to post-process the generated string to at least quote the backslashes, right? Is there a better way to quote problematic characters than a chain of replaces() methods?

A string can contain any valid characters; the quotes and backslashes are only useful or special when representing a string in Python code. So you don't normally need to do anything like this when you already have a string which contains the characters you want.
If you want a representation which can be parsed by Python (e.g. by writing it to a .py file), repr() does that.

You don't have to escape characters unless they are part of code you are writing or from an input from a user. If the backslash character or a quote character is generated by a Python program, then it is already stored as that character in memory. There is no need do any additional escaping.
Why? Because Python is not interpreting a string literal, it is simply generating characters, which are stored as numbers in memory. When you ask Python to display a string containing one of the characters such as a single quote or a backslash, it will automatically escape them.
Here is an example. A double quote is 34, single quote is character 39, and backslash is 92.
'a'+chr(34)+'b'+chr(39)+'c'+chr(92)+'d'
# returns:
'a"b\'c\\d'
Because I included a double quote and a single quote Python will use a single quote to surround the string, an unescaped double quote within the string, an escaped single quote, and and escaped backslash.
So there is no need to escape characters that are generated within a Python program, it does it for you.

String literals for file names

I am new to Python - but not to programming, and on a bit of a steep learning curve.
I have a programme that reads several input files - the first input file contains (amongst other things) the path and name the other files.
I can open the file and read the name OK. If I print the string it looks like this
Z:\ \python\ \rb_data.dat\n'
all my "\" become "\ \" I think I can fix this by using the "r" prefix to convert it to a literal.
My question is how do I attach the prefix to a string variable ??
This is what I want to do :
modat = open('z:\\python\mot1 input.txt') # first input file containing names of other file
rbfile = modat.readline() # get new file name
rbdat = open(rbfile) # open new file

The \\ is an escape sequence for the backslash character \. When you specify a string literal, they are enquoted by either ' or ". Because there are some characters you might need to specify to be part of the string which you cannot enter like this—for example the quotation marks themselves—escape sequences allow you to do it. They usually are \x where x is something you want to enter. Now because all escape sequences start with a backslash, the backslash itself also turns into a special character which you cannot specify directly within a string literal. So you need to escape it too.
That means that the string literal '\\' actually represents a string with a single character: The backslash. Raw strings, that are string literals with an r in front of the opening quotation character, ignore (most) escape sequences. So r'\\x' is actually the string where two backslashes are followed by an x. So it’s identical to the string described by the non-raw string literal '\\\\x'.
All this only applies to string literals though. The string itself holds no information about whether it was created with a raw string literal or not, or whether there was some escape sequence need or not. It just contains all the characters that make out the string.
That also means that as soon as you get a string from somewhere, for example by reading it from a file, then you don’t need to worry about escaping something in there to make sure that it’s a correct string. It just is.
So in your code, when you open the file at z:\python\mot1 input.txt, you need to specify that filename as a string first. So you have to use a string literal, either with escaping the backslashes, or by using a raw string.
Then, when you read the new filename from that file, you already have a real string, and don’t need to bother with anything more. Assuming that it was correctly written to the file, you can just use it like that.

The backslash \ in Python strings (and in code blocks on StackOverflow!) means, effectively, "treat the next character differently". As it is reserved for this purpose, when you actually have a backslash in your strings, it must be "escaped" by a preceding backslash:
>>> myString = "\\" # the first one "escapes" the second
>>> myString = "\" # no escape, so...
SyntaxError: EOL while scanning string literal
>>> print("\\") # when we actually print out the string
\
The short story is, you can basically ignore this in your strings. If you pass rbfile to open, Python will interpret it correctly.

Why not use os.path.normcase, like this:
with open(r'z:\python\mot1 input.txt') as f:
for line in f:
if line.strip():
if os.path.isfile(os.path.normcase(line.strip())):
with open(line.strip()) as f2:
# do something with
# f2
From the documentation of os.path.normcase:
Normalize the case of a pathname. On Unix and Mac OS X, this returns
the path unchanged; on case-insensitive filesystems, it converts the
path to lowercase. On Windows, it also converts forward slashes to
backward slashes.

python replace single backslash with double backslash [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 7 months ago.
In python, I am trying to replace a single backslash ("\") with a double backslash("\"). I have the following code:
directory = string.replace("C:\Users\Josh\Desktop\20130216", "\", "\\")
However, this gives an error message saying it doesn't like the double backslash. Can anyone help?

No need to use str.replace or string.replace here, just convert that string to a raw string:
>>> strs = r"C:\Users\Josh\Desktop\20130216"
^
|
notice the 'r'
Below is the repr version of the above string, that's why you're seeing \\ here.
But, in fact the actual string contains just '\' not \\.
>>> strs
'C:\\Users\\Josh\\Desktop\\20130216'
>>> s = r"f\o"
>>> s #repr representation
'f\\o'
>>> len(s) #length is 3, as there's only one `'\'`
3
But when you're going to print this string you'll not get '\\' in the output.
>>> print strs
C:\Users\Josh\Desktop\20130216
If you want the string to show '\\' during print then use str.replace:
>>> new_strs = strs.replace('\\','\\\\')
>>> print new_strs
C:\\Users\\Josh\\Desktop\\20130216
repr version will now show \\\\:
>>> new_strs
'C:\\\\Users\\\\Josh\\\\Desktop\\\\20130216'

Let me make it simple and clear. Lets use the re module in python to escape the special characters.
Python script :
import re
s = "C:\Users\Josh\Desktop"
print s
print re.escape(s)
Output :
C:\Users\Josh\Desktop
C:\\Users\\Josh\\Desktop
Explanation :
Now observe that re.escape function on escaping the special chars in the given string we able to add an other backslash before each backslash, and finally the output results in a double backslash, the desired output.
Hope this helps you.

Use escape characters: "full\\path\\here", "\\" and "\\\\"

In python \ (backslash) is used as an escape character. What this means that in places where you wish to insert a special character (such as newline), you would use the backslash and another character (\n for newline)
With your example string you would notice that when you put "C:\Users\Josh\Desktop\20130216" in the repl you will get "C:\\Users\\Josh\\Desktop\x8130216". This is because \2 has a special meaning in a python string. If you wish to specify \ then you need to put two \\ in your string.
"C:\\Users\\Josh\\Desktop\\28130216"
The other option is to notify python that your entire string must NOT use \ as an escape character by pre-pending the string with r
r"C:\Users\Josh\Desktop\20130216"
This is a "raw" string, and very useful in situations where you need to use lots of backslashes such as with regular expression strings.
In case you still wish to replace that single \ with \\ you would then use:
directory = string.replace(r"C:\Users\Josh\Desktop\20130216", "\\", "\\\\")
Notice that I am not using r' in the last two strings above. This is because, when you use the r' form of strings you cannot end that string with a single \
Why can't Python's raw string literals end with a single backslash?
https://pythonconquerstheuniverse.wordpress.com/2008/06/04/gotcha-%E2%80%94-backslashes-are-escape-characters/

Maybe a syntax error in your case,
you may change the line to:
directory = str(r"C:\Users\Josh\Desktop\20130216").replace('\\','\\\\')
which give you the right following output:
C:\\Users\\Josh\\Desktop\\20130216

The backslash indicates a special escape character. Therefore, directory = path_to_directory.replace("\", "\\") would cause Python to think that the first argument to replace didn't end until the starting quotation of the second argument since it understood the ending quotation as an escape character.
directory=path_to_directory.replace("\\","\\\\")

Given the source string, manipulation with os.path might make more sense, but here's a string solution;
>>> s=r"C:\Users\Josh\Desktop\\20130216"
>>> '\\\\'.join(filter(bool, s.split('\\')))
'C:\\\\Users\\\\Josh\\\\Desktop\\\\20130216'
Note that split treats the \\ in the source string as a delimited empty string. Using filter gets rid of those empty strings so join won't double the already doubled backslashes. Unfortunately, if you have 3 or more, they get reduced to doubled backslashes, but I don't think that hurts you in a windows path expression.

You could use
os.path.abspath(path_with_backlash)
it returns the path with \

Use:
string.replace(r"C:\Users\Josh\Desktop\20130216", "\\", "\\")
Escape the \ character.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.