Python regex to replace double backslash with single backslash - python

I'm trying to replace all double backslashes with just a single backslash. I want to replace 'class=\\"highlight' with 'class=\"highlight'. I thought that python treats '\\' as one backslash and r'\\+' as a string with two backslashes. But when I try
In [5]: re.sub(r'\\+', '\\', string)
sre_constants.error: bogus escape (end of line)
So I tried switching the replace string with a raw string:
In [6]: re.sub(r'\\+', r'\\', string)
Out [6]: 'class=\\"highlight'
Which isn't what I need. So I tried only one backslash in the raw string:
In [7]: re.sub(r'\\+', r'\', string)
SyntaxError: EOL while scanning string literal

Why not use string.replace()?
>>> s = 'some \\\\ doubles'
>>> print s
some \\ doubles
>>> print s.replace('\\\\', '\\')
some \ doubles
Or with "raw" strings:
>>> s = r'some \\ doubles'
>>> print s
some \\ doubles
>>> print s.replace('\\\\', '\\')
some \ doubles
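Even with a raw string for s, the arguments to replace() are still written with escaped backslashes: a string literal cannot end in a lone backslash, because that backslash would escape the closing '.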
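For reference, here is a minimal Python 3 sketch of the same idea. The variable names are illustrative only; checking the lengths confirms that exactly one character is removed.

```python
# Python 3 sketch: collapse each pair of backslashes into one with str.replace.
s = r'class=\\"highlight'          # raw literal: the string really holds two backslashes

# '\\\\' is a literal for two backslashes, '\\' is a literal for one.
fixed = s.replace('\\\\', '\\')

print(s)                    # class=\\"highlight
print(fixed)                # class=\"highlight
print(len(s) - len(fixed))  # 1
```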

Your string contains only one backslash:
>>> string = 'class=\\"highlight'
>>> print string
class=\"highlight
Now let's put another one in there:
>>> string = 'class=\\\\"highlight'
>>> print string
class=\\"highlight
and then remove it again:
>>> print re.sub('\\\\\\\\', r'\\', string)
class=\"highlight
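The same substitution can be sketched in Python 3 with raw strings, which keeps the backslash counts easier to read. In the replacement template, re.sub treats \\ as an escaped backslash, so r'\\' yields one backslash. Passing a function as the replacement sidesteps template processing altogether. The variable names are illustrative.

```python
import re

s = r'class=\\"highlight'   # two literal backslashes

# The pattern r'\\\\' matches exactly two backslashes.
# In the replacement template, r'\\' is one escaped backslash.
via_template = re.sub(r'\\\\', r'\\', s)

# A function's return value is inserted as-is, with no template processing.
via_function = re.sub(r'\\\\', lambda m: '\\', s)

print(via_template)                  # class=\"highlight
print(via_template == via_function)  # True
```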

Just use .replace() twice!
I had the following path: C:\\Users\\XXXXX\\Desktop\\PMI APP GIT\\pmi-app\\niton x5 test data
To convert the double backslashes to single backslashes, I just did the following:
path_to_file = path_to_file.replace('\\','*')
path_to_file = path_to_file.replace('**', '\\')
The first operation replaces every backslash with *, so each \\ becomes **; the second operation replaces each ** with a single \.
Result:
C:**Users**z0044wmy**Desktop**PMI APP GIT**pmi-app**GENERATED_REPORTS
C:\Users\z0044wmy\Desktop\PMI APP GIT\pmi-app\GENERATED_REPORTS
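As a rough check, the two-step approach gives the same result as a single replace of the two-backslash sequence whenever every backslash comes in a pair. The path below is hypothetical, and the last line shows where the placeholder approach differs: a lone backslash is left as *.

```python
# Hypothetical path whose separators are doubled backslashes.
path = r'C:\\Users\\demo\\Desktop'

two_step = path.replace('\\', '*').replace('**', '\\')
one_step = path.replace('\\\\', '\\')

print(two_step)              # C:\Users\demo\Desktop
print(two_step == one_step)  # True

# A lone backslash is turned into '*' and never turned back.
lone = r'a\b'.replace('\\', '*').replace('**', '\\')
print(lone)                  # a*b
```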

I just realized that this might be the simplest answer:
import os
os.getcwd()
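At the interactive prompt, the above displays the path with \\ (two backslashes), because the prompt shows the repr of the string.
But if you wrap it in print, i.e.,
print(os.getcwd())
it displays single backslashes, so you can copy and paste the path into an address bar. The string itself only ever contains single backslashes; nothing needs to be replaced.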
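The difference between the two displays can be seen without os.getcwd() at all. This sketch uses a hypothetical path; repr() is what the interactive prompt shows, while print() shows the characters themselves.

```python
# Hypothetical path; each '\\' in this literal is ONE backslash in the string.
path = 'C:\\Users\\demo'

print(repr(path))        # 'C:\\Users\\demo'
print(path)              # C:\Users\demo
print(path.count('\\'))  # 2
```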

Related

Python: re.sub return illegal characters when the source containing Chinese character [duplicate]

I want to take the string 0.71331, 52.25378 and return 0.71331,52.25378 - i.e. just look for a digit, a comma, a space and a digit, and strip out the space.
This is my current code:
coords = '0.71331, 52.25378'
coord_re = re.sub("(\d), (\d)", "\1,\2", coords)
print coord_re
But this gives me 0.7133,2.25378. What am I doing wrong?
You should be using raw strings for regex, try the following:
coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)
With your current code, the backslashes in your replacement string are escaping the digits, so you are replacing every match with the equivalent of chr(1) + "," + chr(2):
>>> '\1,\2'
'\x01,\x02'
>>> print '\1,\2'
,
>>> print r'\1,\2' # this is what you actually want
\1,\2
Any time you want to leave the backslash in the string, use the r prefix, or escape each backslash (\\1,\\2).
Python interprets the \1 as a character with ASCII value 1, and passes that to sub.
Use raw strings, in which Python doesn't interpret the \.
coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)
This is covered right in the beginning of the re documentation, should you need more info.
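A small Python 3 sketch of the difference, using the coordinates from the question. The names broken and fixed are illustrative.

```python
import re

coords = '0.71331, 52.25378'

# Non-raw replacement: "\1" and "\2" are the control characters chr(1) and chr(2).
broken = re.sub(r"(\d), (\d)", "\1,\2", coords)

# Raw replacement: \1 and \2 reach re.sub intact and act as group references.
fixed = re.sub(r"(\d), (\d)", r"\1,\2", coords)

print(repr(broken))  # '0.7133\x01,\x022.25378'
print(fixed)         # 0.71331,52.25378
```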

simplejson - encoding regexp \d+

I have a misunderstanding about encoding a regexp:
>>> simplejson.dumps({'title':r'\d+'})
'{"title": "\\\\d+"}'
>>> simplejson.loads('{"title": "\\\\d+"}')
{u'title': u'\\d+'}
>>> print simplejson.loads('{"title": "\\\\d+"}')['title']
\d+
So without print I see \\, and with print I see \. Which does the loaded dict actually contain: \\ or \?
Here is a trick: Use list to see what characters are really in the string:
In [3]: list(u'\\d+')
Out[3]: [u'\\', u'd', u'+']
list breaks up the string into individual characters. So u'\\' is one character. (The double backslash in u'\\' is an escape sequence.) It represents one backslash character. This is correct since r'\d+' also has only one backslash:
In [4]: list(r'\d+')
Out[4]: ['\\', 'd', '+']
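The same round trip can be sketched in Python 3. This uses the standard-library json module instead of simplejson (a substitution, not what the question used); len() gives a quick count of the characters actually stored.

```python
import json

# Standard-library json, swapped in for simplejson for this sketch.
encoded = json.dumps({'title': r'\d+'})
print(encoded)        # {"title": "\\d+"}   (JSON escapes the backslash)

decoded = json.loads(encoded)['title']
print(decoded)        # \d+
print(len(decoded))   # 3
print(list(decoded))  # ['\\', 'd', '+']
```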

How do I escape backslash and single quote or double quote in Python? [duplicate]

How do I escape a backslash and a single quote or double quote in Python?
For example:
Long_string = '''some 'long' string \' and \" some 'escaped' strings'''
value_to_change = re.compile(AN EXPRESSION TO REPRESENT \' and \")
modified = re.sub(value_to_change, 'thevalue', Long_string)
## Desired Output
modified = '''some 'long' string thevalue and thevalue some 'escaped' strings'''
How you did it
If your "long string" is read from a file (as you mention in a comment) then your question is misleading. Since you obviously don't fully understand how escaping works, the question as you wrote it down probably is different from the question you really have.
If these are the contents of your file (51 bytes as shown + maybe one or two end-of-line characters):
some 'long' string \' and \" some 'escaped' strings
then this is what it will look like in python:
>>> s1 = open('data.txt', 'r').read().strip()
>>> s1
'some \'long\' string \\\' and \\" some \'escaped\' strings'
>>> print s1
some 'long' string \' and \" some 'escaped' strings
What you wrote in the question will produce:
>>> s2 = '''some 'long' string \' and \" some 'escaped' strings'''
>>> s2
'some \'long\' string \' and " some \'escaped\' strings'
>>> print s2
some 'long' string ' and " some 'escaped' strings
>>> len(s2)
49
Do you see the difference?
There are no backslashes in s2 because they have special meaning when you use them to write down strings in Python. They have no special meaning when you read them from a file.
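To reproduce the difference without preparing a file by hand, the sketch below writes the text to a temporary file and reads it back (Python 3). The file name data.txt follows the example above; the temporary directory and variable names are assumptions made for illustration.

```python
import os
import tempfile

# Raw literal: keeps both backslashes, like the file contents shown above.
raw_text = r'''some 'long' string \' and \" some 'escaped' strings'''
# Ordinary literal: \' and \" are escape sequences, so the backslashes vanish.
literal = '''some 'long' string \' and \" some 'escaped' strings'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')
    with open(path, 'w') as f:
        f.write(raw_text + '\n')
    with open(path) as f:
        from_file = f.read().strip()

print(len(from_file))         # 51
print(len(literal))           # 49
print(from_file == raw_text)  # True
```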
If you want to write down a string that afterwards has a backslash in it, you have to protect the backslash you enter. You have to keep Python from thinking it has special meaning. You do that by escaping it - with a backslash.
One way to do this is to double each backslash, but often the easier and less confusing way is to use raw strings:
>>> s3 = r'''some 'long' string \' and \" some 'escaped' strings'''
>>> s3
'some \'long\' string \\\' and \\" some \'escaped\' strings'
>>> print s3
some 'long' string \' and \" some 'escaped' strings
>>> s1 == s3
True
How you meant it
The above was only to show you that your question was confusing.
The actual answer is a bit harder - when you are working with regular expressions, the backslash takes on yet another layer of special meaning. If you want to safely get a backslash through string escaping and through regex escaping to the actual regex, you have to write down multiple backslashes accordingly.
Furthermore, rules for putting single quotes (') in single-quoted raw strings (r'') are a bit tricky as well, so I will use a raw string with triple single-quotes (r'''''').
>>> print re.sub(r'''\\['"]''', 'thevalue', s1)
some 'long' string thevalue and thevalue some 'escaped' strings
The two backslashes stay two backslashes throughout string escaping and then become only one backslash without special meaning through regex escaping. In total, the regex says:
"match one backslash followed by either a single-quote or a double-quote."
How it should be done
Now for the pièce de résistance: The previous is really a good demonstration of what jwz meant [1]. If you forget about regex (and know about raw strings), the solution becomes much more obvious:
>>> print s1.replace(r'\"', 'thevalue').replace(r"\'", 'thevalue')
some 'long' string thevalue and thevalue some 'escaped' strings
[1] Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
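Both approaches from this answer can be compared side by side in Python 3. This sketch builds the string from a raw literal rather than reading a file, which is an assumption made to keep it self-contained.

```python
import re

# Raw triple-quoted literal, so the backslashes survive (same text as the file).
s = r'''some 'long' string \' and \" some 'escaped' strings'''

# Regex: one literal backslash followed by either kind of quote.
with_regex = re.sub(r'''\\['"]''', 'thevalue', s)

# Plain string replacement, one call per escaped quote.
with_replace = s.replace(r'\"', 'thevalue').replace(r"\'", 'thevalue')

print(with_regex)
print(with_regex == with_replace)  # True
```

The unescaped quotes around 'long' and 'escaped' are left alone in both cases, because neither is preceded by a backslash.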
The problem is that in your string \' and \" get converted to ' and ", so on your example as-is, you won't be able to match only \' without matching the single quotes around long.
But my understanding is that this data comes from a file so assuming you have your_file.txt containing
some 'long' string \' and \" some 'escaped' strings
you can replace \' and \" with following code:
import re
from_file = open("your_file.txt", "r").read()
print(re.sub("\\\\(\"|')", "thevalue", from_file))
Note the four backslashes. Because this is an ordinary string literal, each \\ becomes a single \, which leaves two backslashes in the pattern. The regular expression engine then reads that remaining \\ as an escaped backslash, so the pattern matches one literal backslash followed by either a " or a ' quote.
Is this what you want?
import re
Long_string = "some long string \' and \" some escaped strings"
value_to_change = re.compile( "'|\"" )
modified = re.sub(value_to_change , 'thevalue' , Long_string )
print modified
I tried this to print a single backslash (Python 3):
single_backslash_str = r'\ '[0]
print(single_backslash_str) #output: \
print(repr(single_backslash_str)) #output: '\\'
Hope this will help!
Keep in mind, all these strings are exactly the same:
Long_string = '''some long string \' and \" some escaped strings'''
Long_string = '''some long string ' and " some escaped strings'''
Long_string = """some long string ' and " some escaped strings"""
Long_string = 'some long string \' and \" some escaped strings'
Long_string = "some long string \' and \" some escaped strings"
Long_string = 'some long string \' and " some escaped strings'
Long_string = "some long string ' and \" some escaped strings"
There is no backslash character in any of them. So the regex you're looking for doesn't need to match a backslash and a quote, just a quote:
modified = re.sub("['\"]", 'thevalue', Long_string)
BTW: You also don't have to compile the regex before you use it, re.sub will accept a string regex as well as a compiled one.
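A quick way to confirm that these literals are equal and contain no backslash (Python 3 sketch; the list name variants is illustrative and only a few of the literals are repeated here):

```python
import re

variants = [
    '''some long string \' and \" some escaped strings''',
    '''some long string ' and " some escaped strings''',
    "some long string ' and \" some escaped strings",
    'some long string \' and " some escaped strings',
]

print(len(set(variants)))   # 1
print('\\' in variants[0])  # False

# So only the quotes themselves need to be matched.
print(re.sub("['\"]", 'thevalue', variants[0]))
```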

replace all "\" with "\\" python

Does anyone know how to replace all \ with \\ in Python?
I've tried:
re.sub('\','\\',string)
But it breaks because of the escape sequence.
Does anyone know the answer to my question?
You just need to escape the backslashes in your strings: (also there's no need for regex stuff)
>>> s = "cats \\ dogs"
>>> print s
cats \ dogs
>>> print s.replace("\\", "\\\\")
cats \\ dogs
You should do:
re.sub(r'\\', r'\\\\', string)
because r'\' is not a valid string literal.
BTW, you should always use raw (r'') strings with regex as many things are done with backslashes.
You should escape backslashes, and also you don't need regex for this simple operation:
>>> my_string = r"asd\asd\asd\\"
>>> print(my_string)
asd\asd\asd\\
>>> replaced = my_string.replace("\\", "\\\\")
>>> print(replaced)
asd\\asd\\asd\\\\
You either need re.sub("\\\\","\\\\\\\\",string) or re.sub(r'\\', r'\\\\', string) because you need to escape each backslash twice: once for the string literal and once for the regex.
>>> whatever = r'z\w\r'
>>> print whatever
z\w\r
>>> print re.sub(r"\\",r"\\\\", whatever)
z\\w\\r
>>> print re.sub("\\\\","\\\\\\\\",whatever)
z\\w\\r
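For Python 3, the answers above can be checked against each other with a short sketch. The variable names are illustrative; all three spellings should produce the same doubled result.

```python
import re

whatever = r'z\w\r'   # two single backslashes

# Plain string replacement: one backslash becomes two.
doubled_replace = whatever.replace('\\', '\\\\')

# Regex with raw strings: r'\\' matches one backslash, r'\\\\' emits two.
doubled_raw = re.sub(r'\\', r'\\\\', whatever)

# Regex without raw strings: every backslash is written twice again.
doubled_plain = re.sub('\\\\', '\\\\\\\\', whatever)

print(doubled_replace)                                  # z\\w\\r
print(doubled_replace == doubled_raw == doubled_plain)  # True
```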
