Replace all "\" with "\\" in Python

Does anyone know how to replace all \ with \\ in Python?
I've tried:
re.sub('\','\\',string)
But it breaks because of the escape sequence.
Does anyone know the answer to my question?

You just need to escape the backslashes in your strings (there's also no need for regex here):
>>> s = "cats \\ dogs"
>>> print s
cats \ dogs
>>> print s.replace("\\", "\\\\")
cats \\ dogs

You should do:
re.sub(r'\\', r'\\\\', string)
r'\' is not a valid string literal, because a raw string cannot end in a single backslash.
BTW, you should always use raw (r'') strings for regex patterns, since regex syntax relies heavily on backslashes.

You should escape backslashes, and also you don't need regex for this simple operation:
>>> my_string = r"asd\asd\asd\\"
>>> print(my_string)
asd\asd\asd\\
>>> replaced = my_string.replace("\\", "\\\\")
>>> print(replaced)
asd\\asd\\asd\\\\

You either need re.sub("\\\\","\\\\\\\\",string) or re.sub(r'\\', r'\\\\', string) because you need to escape each slash twice ... once for the string and once for the regex.
>>> whatever = r'z\w\r'
>>> print whatever
z\w\r
>>> print re.sub(r"\\",r"\\\\", whatever)
z\\w\\r
>>> print re.sub("\\\\","\\\\\\\\",whatever)
z\\w\\r
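
The two approaches above can be compared side by side. This is a minimal sketch in Python 3 syntax (the thread itself uses Python 2 print statements); the helper names double_backslashes_plain and double_backslashes_regex are made up for illustration.

```python
import re

def double_backslashes_plain(text):
    # "\\" is a one-character string (a single backslash);
    # "\\\\" is a two-character string (two backslashes).
    return text.replace("\\", "\\\\")

def double_backslashes_regex(text):
    # Pattern r"\\" is a regex that matches one literal backslash.
    # Replacement r"\\\\" is unescaped by re.sub into two backslashes.
    return re.sub(r"\\", r"\\\\", text)

sample = r"z\w\r"  # raw literal: z, \, w, \, r (5 characters)
print(double_backslashes_plain(sample))  # z\\w\\r
print(double_backslashes_regex(sample))  # z\\w\\r
```

Both functions return the same result; the plain str.replace version needs only one level of escaping, which is why several answers prefer it.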

Related

Python regex not working with special characters

SOLVED: it replaced the " symbols in the file with ' (in the data strings)
Do you know a way to only search for 1 or more words (not numbers) between [" and \n?
This works on regexr.com, but not in python
https://regexr.com/3tju7
(?<=\[\")(\D+)(?=\\n)
"S": ["Something\n13/8-2018 09:00 to 11:30
Python code:
re.search('(?<=[\")(\D+)(?=\n)', str(data))
I think \[, \" and \\n are the problem. I have tried to use raw strings in Python.
re.search('(?<=\[\")(\D+)(?=\\n)', '"S": ["Something\n13/8-201809:00 to 11:30').group()
This worked but I have to use "data" because I have multiple strings, and it won't let me use .group() on that.
Error: AttributeError: 'NoneType' object has no attribute 'group'
Your problem is that the \n is being interpreted as a newline, instead of the literal characters \ and n. You can use a simpler regex, \["([\w\s]+)$, along with the MULTILINE flag, without modifying the data.
>>> import re
>>> data = '"S": ["Something\n13/8-201809:00 to 11:30'
>>> pattern = r'\["([\w\s]+)$'
>>> m = re.search(pattern, data, re.MULTILINE)
>>> m.group(1)
'Something'
Try putting an r before the pattern string; that marks the string as "raw". This stops Python from interpreting escape sequences before the string is passed to the function:
re.search(r'\search', string)
Or:
rgx = re.compile(r'pattern')
rgx.search(string)
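
To see why the same pattern can work on one input and return None on another, here is a small sketch in Python 3. It assumes two variants of the sample data: one containing a real newline and one containing a literal backslash followed by n. The variable names and the first_match helper are hypothetical.

```python
import re

# Data with a real newline character (the Python escape "\n"):
data_newline = '"S": ["Something\n13/8-2018 09:00 to 11:30'
# Data with a literal backslash followed by the letter n:
data_literal = r'"S": ["Something\n13/8-2018 09:00 to 11:30'

# In a raw pattern, \\n means "a backslash, then the letter n".
literal_pattern = r'(?<=\[")(\D+)(?=\\n)'
# In a raw pattern, \n means "a newline character".
newline_pattern = r'(?<=\[")(\D+)(?=\n)'

def first_match(pattern, text):
    # re.search returns None when nothing matches, so check before .group().
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_match(literal_pattern, data_literal))  # Something
print(first_match(literal_pattern, data_newline))  # None
print(first_match(newline_pattern, data_newline))  # Something
```

Guarding the result before calling .group() avoids the AttributeError from the question when a string does not match.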

regex unicode characters

The following regex works online but finds no matches in Python code:
https://regex101.com/r/lY1kY8/2
s=re.sub(r'\x.+[0-9]',' ',s)
required:
re.sub(r'\x.+[0-9]* ',' ',r'cats\xe2\x80\x99 faces')
Out[23]: 'cats faces'
Basically, I want to remove the Unicode special characters "\xe2\x80\x99".
As another option that doesn't require regex, you could remove the Unicode characters by dropping anything not listed in string.printable:
>>> import string
>>> ''.join(i for i in 'cats\xe2\x80\x99 faces' if i in string.printable)
'cats faces'
print re.findall(r'\\x.*?[0-9]* ',r'cats\xe2\x80\x99 faces')
Escape the backslash (\\x) and use a raw string. Use findall, since match only matches at the beginning of the string.
With re.sub:
print re.sub(ur'\\x.*?[0-9]+','',r'cats\xe2\x80\x99 faces')
s=r'cats\xe2\x80\x99 faces'
print re.sub(r'\\x.+?[0-9]*','',s)
EDIT:
The correct way would be to decode from UTF-8 and then apply the regex:
s='cats\xe2\x80\x99 faces'
# \xe2\x80\x99 is the UTF-8 encoding of U+2019
print re.sub(u'\u2019','',s.decode('utf-8'))
Assuming you use Python 2.x:
>>> s = 'cats\xe2\x80\x99 f'
>>> len(s), s[4]
(9, 'â')
This means an escape like \xe2 is a single character in the string, not the four characters \, x, e and 2. A pattern such as r'\\x.+?[0-9]*', which looks for a literal backslash, therefore cannot match it.
>>> s = '\x63\x61\x74\x73\xe2\x80\x99 f'
>>> ''.join([c for c in s if c <= 'z'])
'cats f'
Hope this helps a bit.
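
The answers above assume Python 2. As a rough sketch of the same decode-first idea in Python 3, where bytes and text are separate types (the variable names are illustrative only):

```python
import re

raw_bytes = b'cats\xe2\x80\x99 faces'  # UTF-8 bytes for "cats’ faces"

# As bytes, the curly apostrophe occupies three bytes.
print(len(raw_bytes))  # 13

text = raw_bytes.decode('utf-8')
# After decoding, it is a single character, U+2019.
print(len(text))  # 11

# Remove that one character...
without_quote = re.sub('\u2019', '', text)
print(without_quote)  # cats faces

# ...or drop every non-ASCII character.
ascii_only = re.sub(r'[^\x00-\x7f]', '', text)
print(ascii_only)  # cats faces
```

Decoding first means the regex sees one character rather than three unrelated bytes, so no backslash matching is needed at all.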

Python converting string to latex using regular expression

Say I have a string
string = "{1/100}"
I want to use regular expressions in Python to convert it into
new_string = "\frac{1}{100}"
I think I would need to use something like this
new_string = re.sub(r'{.+/.+}', r'', string)
But I'm stuck on what I would put in order to preserve the characters in the fraction, in this example 1 and 100.
You can use () to capture the numbers. Then use \1 and \2 to refer to them:
new_string = re.sub(r'{(.+)/(.+)}', r'\\frac{\1}{\2}', string)
# \frac{1}{100}
Note: Don't forget to escape the backslash \\.
Capture the numbers using parens and then reference them in the replacement text using \1 and \2. For example:
>>> print re.sub(r'{(.+)/(.+)}', r'\\frac{\1}{\2}', "{1/100}")
\frac{1}{100}
The content of the braces is always number/number, so match digits ([0-9]) in the regex instead of . (dot):
>>> import re
>>> string = "{1/100}"
>>> new = re.sub(r'{([0-9]+)/([0-9]+)}', r'\\frac{\1}{\2}', string)
>>> print new
\frac{1}{100}
Use re.match. It's more flexible:
>>> m = re.match(r'{(.+)/(.+)}', string)
>>> m.groups()
('1', '100')
>>> new_string = "\\frac{%s}{%s}"%m.groups()
>>> print new_string
\frac{1}{100}
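
One reason to prefer digits over .+ shows up when a string contains more than one fraction. A minimal sketch in Python 3, assuming only integer numerators and denominators; FRACTION and to_latex_fractions are hypothetical names:

```python
import re

# Matches {a/b} where a and b are runs of digits.
FRACTION = re.compile(r'\{(\d+)/(\d+)\}')

def to_latex_fractions(text):
    # In the replacement, \\ yields one backslash; \1 and \2 are the groups.
    return FRACTION.sub(r'\\frac{\1}{\2}', text)

print(to_latex_fractions("{1/100}"))        # \frac{1}{100}
print(to_latex_fractions("{1/2} + {3/4}"))  # \frac{1}{2} + \frac{3}{4}

# With the greedy (.+) version, the first group swallows "1/2} + {3".
greedy = re.sub(r'{(.+)/(.+)}', r'\\frac{\1}{\2}', "{1/2} + {3/4}")
print(greedy)  # \frac{1/2} + {3}{4}
```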

python regex issue with underscore

I am trying to do some string searching with regular expressions, where I need to print the string only if the [a-z,A-Z,_] part ends with a " " space. I am having trouble: if I have an underscore at the end, it doesn't wait for the space and executes the command.
if re.search(r".*\s\D+\s", string):
    print string
If I use
string = "abc shot0000 "
it works fine. I need it to execute only when the string ends with a space (\s).
But if I use
string = "abc shot0000 _"
then it doesn't wait for the space \s and executes the command.
You're using search, and this function, as the name says, searches your string for any occurrence of the pattern, which appears in both of your strings.
You should add a $ to your regular expression to anchor it at the end of the string:
if re.search(r".*\s\D+\s$", string):
    print string
You need to anchor the RE at the end of the string with $:
if re.search(r".*\s\D+\s$", string):
    print string
Use a $:
>>> strs = "abc shot0000 "
>>> re.search(r"\s\w+\s$", strs) #use \w: it'll handle A-Za-z_
<_sre.SRE_Match object at 0xa530100>
>>> strs = "abc shot0000 _"
>>> re.search(r"\s\w+\s$", strs)
#None
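
The anchored pattern can be wrapped in a small check. This is a sketch in Python 3; the names ENDS_WITH_WORD_AND_SPACE and ends_with_space are invented for the example.

```python
import re

# \w covers letters, digits and underscore; \s$ requires the string
# to end with a whitespace character.
ENDS_WITH_WORD_AND_SPACE = re.compile(r'\s\w+\s$')

def ends_with_space(text):
    # search returns a match object or None; convert that to a bool.
    return ENDS_WITH_WORD_AND_SPACE.search(text) is not None

print(ends_with_space("abc shot0000 "))   # True
print(ends_with_space("abc shot0000 _"))  # False
```

Note that $ also matches just before a trailing newline; \Z is the stricter end-of-string anchor if that distinction matters.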

Python regex to replace double backslash with single backslash

I'm trying to replace all double backslashes with just a single backslash. I want to replace 'class=\\"highlight' with 'class=\"highlight'. I thought that python treats '\\' as one backslash and r'\\+' as a string with two backslashes. But when I try
In [5]: re.sub(r'\\+', '\\', string)
sre_constants.error: bogus escape (end of line)
So I tried switching the replace string with a raw string:
In [6]: re.sub(r'\\+', r'\\', string)
Out [6]: 'class=\\"highlight'
Which isn't what I need. So I tried only one backslash in the raw string:
In [7]: re.sub(r'\\+', r'\', string)
SyntaxError: EOL while scanning string literal
Why not use string.replace()?
>>> s = 'some \\\\ doubles'
>>> print s
some \\ doubles
>>> print s.replace('\\\\', '\\')
some \ doubles
Or with "raw" strings:
>>> s = r'some \\ doubles'
>>> print s
some \\ doubles
>>> print s.replace('\\\\', '\\')
some \ doubles
The backslash is the escape character, so you still need to escape it in the replacement string; otherwise it would escape the closing quote (').
You only have one backslash in the string:
>>> string = 'class=\\"highlight'
>>> print string
class=\"highlight
Now let's put another one in there:
>>> string = 'class=\\\\"highlight'
>>> print string
class=\\"highlight
and then remove it again
>>> print re.sub('\\\\\\\\', r'\\', string)
class=\"highlight
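
The attempts in the question fail because re.sub processes backslash escapes in a string replacement. Passing a function as the replacement is one way around that, since its return value is inserted as-is. A small sketch in Python 3 (the variable names are illustrative):

```python
import re

doubled = 'class=\\\\"highlight'  # contains two backslashes
print(doubled)                     # class=\\"highlight

# Plain replace: '\\\\' is two backslashes, '\\' is one.
via_replace = doubled.replace('\\\\', '\\')
print(via_replace)                 # class=\"highlight

# Regex: the lambda's return value is used literally, so no escape
# processing is applied to the single backslash it returns.
via_regex = re.sub(r'\\+', lambda m: '\\', doubled)
print(via_regex)                   # class=\"highlight
```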
Just use .replace() twice!
I had the following path: C:\\Users\\XXXXX\\Desktop\\PMI APP GIT\\pmi-app\\niton x5 test data
To convert each \\ to a single backslash, I just did the following:
path_to_file = path_to_file.replace('\\','*')
path_to_file = path_to_file.replace('**', '\\')
The first operation replaces every backslash with *, so each \\ becomes **. The second operation replaces each ** with a single \.
Result:
C:**Users**z0044wmy**Desktop**PMI APP GIT**pmi-app**GENERATED_REPORTS
C:\Users\z0044wmy\Desktop\PMI APP GIT\pmi-app\GENERATED_REPORTS
I just realized that this might be the simplest answer:
import os
os.getcwd()
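At the interactive prompt, the above displays the path with \\ (two backslashes), because the prompt shows the string's repr, in which every backslash is escaped.
BUT if you wrap it in a print call, i.e.,
print(os.getcwd())
it will print each backslash once, so you can copy and paste the path into an address bar!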
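
The difference between the displayed form and the actual contents can be checked directly. A minimal sketch in Python 3 using a made-up path rather than os.getcwd(), so the result does not depend on the machine:

```python
path = 'C:\\Users\\demo'  # hypothetical path; the literal escapes each backslash

print(len(path))          # 13
print(path.count('\\'))   # 2
print(repr(path))         # 'C:\\Users\\demo'
print(path)               # C:\Users\demo
```

The string only ever contains two backslashes; the doubled form is just how repr writes them.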
