Python regex and escape characters [duplicate]

This question already has answers here:
Regular expression with backslash in Python3
(2 answers)
Closed 3 years ago.
I have the following regex code:
tmp = 'c:\\\\temp'
m = re.search(tmp, tmp)
if(m==None):
    print('Unable to find a ticker in ' + filename)
else:
    print("REGEX RESULT - " + m.group(0))
which returns None. No matter how many or few backslashes I use for variable tmp, I still get None as result. How can I perform regex to search for a backslashed file path?

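You can use a raw string (the r prefix) so that Python itself does not process backslash escapes in the literal:
tmp = r'c:\\temp'
Here r'c:\\temp' is the pattern for the text c:\temp. The regex engine still treats a backslash as an escape, so each literal backslash in the text needs two in the pattern. That is also why searching for tmp inside tmp returns None; pass re.escape(tmp) as the pattern to match a string literally.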
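A minimal sketch of how the two layers of escaping interact, assuming the path being searched for is c:\temp with a single backslash. The names path, m_naive, m_manual and m_escaped are illustrative and not from the question. The raw string only controls how Python reads the literal; the regex engine applies its own escaping on top of that.

```python
import re

# The text to search: c:\temp with ONE literal backslash.
path = r'c:\temp'

# Using the text as its own pattern fails: the regex engine reads \t as a tab.
m_naive = re.search(path, path)
print(m_naive)  # None

# Doubling the backslash in a raw pattern matches one literal backslash.
m_manual = re.search(r'c:\\temp', path)
print(m_manual.group(0))

# re.escape builds the pattern from the literal text, whatever it contains.
m_escaped = re.search(re.escape(path), path)
print(m_escaped.group(0))

# The question's own string (two backslashes) behaves the same way.
tmp = 'c:\\\\temp'
print(re.search(tmp, tmp))                      # None
print(re.search(re.escape(tmp), tmp).group(0))  # the whole string
```

re.escape is usually the safer choice when the pattern comes from a file path or any other text that should be matched character for character.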

Related

python multiple characters replace in string not working with pipe [duplicate]

This question already has answers here:
How to input a regex in string.replace?
(7 answers)
Closed 1 year ago.
I am trying to match and replace multiple characters in a string.
str1 = 'US$0.18'
reg = 'AUS$|US$|HK$|MK$'
#reg = 'AUS\$|US\$|HK\$|MK\$' <-- doesn't work
#reg = 'US$' <-- this works
str1 = str1.replace(reg, '')
This doesn't replace US$ which I expected to.
What am I missing here?

str.replace() only matches literal text, so do this with re.sub() instead. Since $ has a special meaning in regular expressions, we need to escape it by putting a \ in front of it.
(AUS|US|HK|MK)\$ - Matches any of AUS, US, HK or MK followed by a literal $.
re.sub(r'(AUS|US|HK|MK)\$', r'', s) - Replaces every match in the string s with the empty string.
import re
s = "US$0.18 AUS$45 HK$96"
x = re.sub(r'(AUS|US|HK|MK)\$',r'', s)
print(x)
0.18 45 96
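As a variation on the answer above, here is a short sketch that builds the alternation from a list instead of escaping each $ by hand. The names prefixes, pattern, untouched and cleaned are assumptions for illustration. It also shows that str.replace leaves the text unchanged because it looks for the whole string 'AUS$|US$|HK$|MK$' literally.

```python
import re

# Hypothetical list of currency prefixes to strip.
prefixes = ['AUS$', 'US$', 'HK$', 'MK$']

# re.escape turns each prefix into a literal pattern; '|' joins them as alternatives.
pattern = re.compile('|'.join(re.escape(p) for p in prefixes))

# str.replace does no regex matching, so nothing is replaced here.
untouched = 'US$0.18'.replace('AUS$|US$|HK$|MK$', '')
print(untouched)  # US$0.18

cleaned = pattern.sub('', 'US$0.18 AUS$45 HK$96')
print(cleaned)  # 0.18 45 96
```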

Split string on "$" using regex [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 1 year ago.
I am trying to split a string using regex on $ symbol but the output is not what I want.
string = "43$hello"
list_of_splits = re.split("$",string)
Output:
['43$hello','']
Output I want:
['43','hello']
The output shows that "$" is a special character in regex, but how can I split on it?

Escape it with a backslash, written in a raw string: list_of_splits = re.split(r"\$", string)
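A small sketch of three equivalent ways to make $ literal in the split pattern, using the string from the question. The result names are illustrative. An unescaped $ matches only the empty position at the end of the string, which is why the original call does not split where expected.

```python
import re

string = "43$hello"

# Backslash escape inside a raw string.
by_backslash = re.split(r"\$", string)

# Inside a character class, $ has no special meaning.
by_char_class = re.split("[$]", string)

# re.escape produces the escaped form for you.
by_escape = re.split(re.escape("$"), string)

print(by_backslash)   # ['43', 'hello']
print(by_char_class)  # ['43', 'hello']
print(by_escape)      # ['43', 'hello']
```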

You can just use the string's own split method, which needs no regex:
string = "43$hello"
string.split("$")
Output
['43', 'hello']

Garbage characters added after re.sub()? [duplicate]

This question already has an answer here:
How to apply a function on a backreference? [duplicate]
(1 answer)
Closed 5 years ago.
Why does the following regular expression return garbage?
expr = 'a + b'
expr2 = re.sub(r'\w', 'probs["\1"]', expr)
probs[""] + probs[""]
or:
probs["\x01"] + probs["\x01"]
desired output :
probs["a"] + probs["b"]
Stupid me, I forgot the brackets:
expr2 = re.sub(r'(\w)', r'probs["\1"]', expr)

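In a normal string literal, \1 is interpreted as the character with ASCII value 1. Add an r prefix to make the replacement a raw string, or write \\1, so that re.sub receives a backreference. The pattern also needs a capturing group, (\w), for \1 to refer to.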
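The difference can be checked directly. This is a sketch using the expression from the question; the names broken, fixed and fixed_named are illustrative.

```python
import re

expr = 'a + b'

# Non-raw replacement: Python turns \1 into chr(1) before re.sub ever sees it.
broken = re.sub(r'(\w)', 'probs["\1"]', expr)
print(repr(broken))  # 'probs["\x01"] + probs["\x01"]'

# Raw replacement plus a capturing group: \1 reaches re.sub as a backreference.
fixed = re.sub(r'(\w)', r'probs["\1"]', expr)
print(fixed)  # probs["a"] + probs["b"]

# \g<1> is an equivalent, less ambiguous spelling of the same backreference.
fixed_named = re.sub(r'(\w)', r'probs["\g<1>"]', expr)
print(fixed_named)  # probs["a"] + probs["b"]
```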

How to get the correct result when I use re.compile(pattern), if the pattern contains some special characters, like, (),? [duplicate]

This question already has answers here:
Escaping regex string
(4 answers)
Closed 7 years ago.
I am trying to replace a string in a file using Python's re library, but the replacement fails for text that contains special characters such as (, ) and ?. Can anyone help me with this issue?
My code is below:
filterText = '\"' + sheet.row_values(row)[1] + '\"';
print "filterText = %s"%filterText;
pattern = re.compile(filterText, re.S);
replacedText = '\"' + sheet.row_values(row)[2] + '\"';
print "replacedText = %s"%replacedText;
if filterText == "English (UK)":
    print "replacedText = %s"%replacedText;
fileContent = re.sub(pattern, replacedText, fileContent);

re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it. (Since Python 3.7, only characters that can have special meaning in a regular expression are escaped.)
Use re.escape to turn any string into a literal pattern:
filterText = '\"' + re.escape(sheet.row_values(row)[1]) + '\"'
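A self-contained sketch of the effect, written for Python 3 and using made-up values in place of the spreadsheet cells. filter_text, replaced_text and file_content are stand-ins for the question's variables, not its real data. Without re.escape, the parentheses form a group, so the pattern looks for "English UK" and never finds the text.

```python
import re

# Hypothetical stand-ins for sheet.row_values(row)[1] and [2].
filter_text = 'English (UK)'
replaced_text = 'English (US)'
file_content = 'language = "English (UK)"'

# Unescaped: ( and ) are regex grouping syntax, so this matches "English UK" only.
unescaped = re.compile('"' + filter_text + '"', re.S)
print(unescaped.search(file_content))  # None

# Escaped: every metacharacter in the cell value is matched literally.
pattern = re.compile('"' + re.escape(filter_text) + '"', re.S)
result = pattern.sub('"' + replaced_text + '"', file_content)
print(result)  # language = "English (US)"
```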

Most efficient way to strip forbidden characters in file name from Unicode string [duplicate]

This question already has answers here:
How to remove bad path characters in Python?
(5 answers)
Closed 8 years ago.
I have a string which contain some data I parse from the web, and make a file named after this data.
string = urllib.urlopen("http://example.com").read()
f = open(path + "/" + string + ".txt", "w")
f.write("abcdefg")
f.close()
The problem is that it may include one of these characters: \ / * ? : " < > |.
I'm using Windows, where those characters are forbidden in a filename.
Also, the string is in Unicode format, which makes most of the solutions useless.
So, my question is: what is the most efficient / pythonic way to strip those characters?
Thanks in advance!
Edit: the filename is in Unicode format not str!

We don't know what your data looks like, but you can use re.sub:
import re
your_string = re.sub(r'[\\/*?:"<>|]', "", your_string)

The fastest way to do this is to use unicode.translate (str.translate in Python 3).
In [31]: _unistr = u'sdfjkh,/.,we/.,132?.?.23490/,/' # any random string.
In [48]: remove_punctuation_map = dict((ord(char), None) for char in '\\/*?:"<>|')
In [49]: _unistr.translate(remove_punctuation_map)
Out[49]: u'sdfjkh,.,we.,132..23490,'
To remove all punctuation (this needs import string first):
In [46]: remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)
In [47]: _unistr.translate(remove_punctuation_map)
Out[47]: u'sdfjkhwe13223490'
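For Python 3, where every str is already Unicode, both approaches from the answers can be sketched as small helpers. The function names strip_forbidden and strip_forbidden_re and the sample file name are assumptions for illustration. Which one is faster depends on the input, so measure before relying on either claim.

```python
import re

# Characters Windows forbids in file names, as listed in the question.
FORBIDDEN = '\\/*?:"<>|'

# Mapping a code point to None tells str.translate to delete that character.
_DELETE_TABLE = {ord(ch): None for ch in FORBIDDEN}

def strip_forbidden(name):
    """Remove the forbidden characters using str.translate."""
    return name.translate(_DELETE_TABLE)

def strip_forbidden_re(name):
    """Remove the same characters using a regex character class."""
    return re.sub(r'[\\/*?:"<>|]', '', name)

cleaned = strip_forbidden('re/port: "final"?.txt')
print(cleaned)  # report final.txt
```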
