Python multiple characters replace in string not working with pipe [duplicate]

Marked as a duplicate of: How to input a regex in string.replace?
I am trying to match and replace multiple characters in a string.
str1 = 'US$0.18'
reg = 'AUS$|US$|HK$|MK$'
#reg = 'AUS\$|US\$|HK\$|MK\$' <-- doesn't work
#reg = 'US$' <-- this works
str1 = str1.replace(reg, '')
This doesn't replace US$, which I expected it to.
What am I missing here?

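str.replace() only replaces literal substrings; it does not understand regular expressions, so the whole string 'AUS$|US$|HK$|MK$' is searched for verbatim and never found. You can do this with re.sub() instead. Since $ has a special meaning in a regular expression (end of string), we need to escape it by putting a \ in front of it.
(AUS|US|HK|MK)\$ - Matches any of AUS, US, HK or MK followed by a literal $.
re.sub(r'(AUS|US|HK|MK)\$', '', s) - Replaces every match in the string s with the empty string ''.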
import re
s = "US$0.18 AUS$45 HK$96"
x = re.sub(r'(AUS|US|HK|MK)\$',r'', s)
print(x)
0.18 45 96
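If the list of prefixes changes often, the pattern can also be built from a plain list. The following is a minimal sketch, not part of the original answer; the names PREFIXES and strip_currency are made up for illustration. re.escape() takes care of the $, and sorting longest-first is only a precaution for prefixes that start with another prefix.

```python
import re

# Hypothetical list of currency prefixes, taken from the question.
PREFIXES = ["AUS$", "US$", "HK$", "MK$"]

# Escape each prefix so "$" is matched literally, then join with "|".
# Longest first, so a longer prefix wins over a shorter one it starts with.
_PATTERN = re.compile(
    "|".join(re.escape(p) for p in sorted(PREFIXES, key=len, reverse=True))
)


def strip_currency(text):
    """Remove every known currency prefix from text."""
    return _PATTERN.sub("", text)


print(strip_currency("US$0.18 AUS$45 HK$96"))  # 0.18 45 96
```

By contrast, 'US$0.18'.replace('AUS$|US$|HK$|MK$', '') returns the string unchanged, because str.replace() looks for that exact text.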

Related

How to remove all characters within a string after a non-alphanumerical character [duplicate]

Marked as a duplicate of: How to remove all characters after a specific character in python?
I want to remove all the characters after a non-alphanumerical character ('_') within a string.
For example:
Petr_;Y -> Petr
ČEZ_^(České_energetické_závody) -> ČEZ
I tried:
''.join(c for c in mystring if c.isalnum())
But this way I'm only stripping out the non-alphanumerical characters themselves.
Help would be appreciated.
You may want to use the .split() method on strings.
new_string = your_string.split('_',1)[0]
This way you keep only what's before the first '_'.
Alternatively, slicing up to the index of the first "_" will do (note that str.index() raises ValueError if the string contains no "_"):
s1 = "Petr_;Y"
s2 = "ČEZ_^(České_energetické_závody)"
s11 = s1[:s1.index("_")]
s22 = s2[:s2.index("_")]
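Another option, not mentioned in the answers above, is str.partition(), which behaves like split('_', 1)[0] and also returns the whole string when there is no '_' at all. A small sketch (the function name before_underscore is made up for illustration):

```python
def before_underscore(text):
    """Return the part of text before the first '_' (all of it if there is none)."""
    head, _sep, _tail = text.partition("_")
    return head


print(before_underscore("Petr_;Y"))                          # Petr
print(before_underscore("ČEZ_^(České_energetické_závody)"))  # ČEZ
print(before_underscore("NoUnderscore"))                     # NoUnderscore
```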

RegEx returns empty list when searching for words which begin with a number [duplicate]

Marked as a duplicate of: What do ^ and $ mean in a regular expression?
I've got a problem with carets and dollar signs in Python.
I want to find every word which starts with a number and ends with a letter.
Here is what I've tried already:
import re
text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"
phoneNumRegex = re.compile(r'^\d+\w+$')
print(phoneNumRegex.findall(text))
Result is an empty list:
[]
The result I want:
415kkk, 9999ll, 555jjj
Where is the problem?
Problems with your regex:
^...$ anchors the pattern to the start and end of the whole string, so it can only match if the entire text is one such word - get rid of the anchors.
\w+ means "one or more word characters", i.e. letters (upper and lower case), digits and the underscore '_'. Without the anchors, r'\d+\w+' would therefore also match the digits-only '555': '55' via \d+ and the last '5' via \w+, so it would end up in the result.
You need this instead:
import re
text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"
phoneNumRegex = re.compile(r'\b\d+[a-zA-Z]+\b')
print(phoneNumRegex.findall(text))
Output:
['415kkk', '9999ll', '555jjj']
The '\b' are word boundaries, so you do not match '123def' inside 'abc123def'.
Further reading:
re-syntax
regex101.com - Regextester website that can handle python syntax
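To see the effect of each change side by side, here is a small comparison on the text from the question. It is only an illustrative sketch; the variable names anchored, loose and bounded are made up.

```python
import re

text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"

# Anchored to the whole string: the text as a whole is not one such word.
anchored = re.findall(r'^\d+\w+$', text)

# Anchors removed, but \w also matches digits, so pure numbers slip through.
loose = re.findall(r'\d+\w+', text)

# Digits followed by letters only, delimited by word boundaries.
bounded = re.findall(r'\b\d+[a-zA-Z]+\b', text)

print(anchored)  # []
print(loose)     # ['415kkk', '555', '9999ll', '212', '555jjj', '0000']
print(bounded)   # ['415kkk', '9999ll', '555jjj']
```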

Python - Replace only exact word in string [duplicate]

Marked as a duplicate of: How to match a whole word with a regular expression?
I want to replace only one specific word in a string. However, some other words contain that word, and I don't want them to be changed.
For example, in the string z below I only want to replace x with y. How can I do that?
x = "the"
y = "a"
z = "This is the thermometer"
import re
pattern=r'\bthe\b' # \b - start and end of the word
repl='a'
string = 'This is the thermometer'
string=re.sub(pattern, repl, string)
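In your case, with the variables from the question, build the pattern from x: re.sub(r'\b' + re.escape(x) + r'\b', y, z). A plain re.sub(x, y, z) would also replace the "the" inside "thermometer".
See the documentation of the re module for more information.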
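Wrapped into a helper, the same idea could look like the sketch below. The name replace_word is made up for illustration, and it assumes the word to replace starts and ends with a word character (otherwise \b does not behave as a "whole word" test). A function is passed as the replacement so that backslashes in the new word are taken literally.

```python
import re


def replace_word(text, old, new):
    """Replace old with new only where old stands as a whole word."""
    pattern = r'\b' + re.escape(old) + r'\b'
    # A callable replacement is inserted as-is, without escape processing.
    return re.sub(pattern, lambda match: new, text)


x = "the"
y = "a"
z = "This is the thermometer"

print(replace_word(z, x, y))  # This is a thermometer
print(re.sub(x, y, z))        # This is a armometer
```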

Python regex of multiple occurrences of a string of 1+ consecutive chars within a string [duplicate]

Marked as a duplicate of: Python re.sub back reference not back referencing [duplicate]
I need to find the starting and ending positions of variable-length sequences of characters inside a string, each consisting of one repeated letter.
I saw this topic Finding multiple occurrences of a string within a string in Python, but I assume it's a bit off.
The following gives me nothing, while I expect to have 5 elements found.
import re
s = 'aaaaabaaaabaaabaaba'
pattern = '(a)\1+'
for el in re.finditer(pattern, s):
    print('str found', el.start(), el.end())
Thanks in advance.
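In a normal string literal, Python itself interprets '\1' as an octal escape (the character '\x01'), so the regex engine never sees the backreference \1. The backslash has to reach the regex engine unchanged.
You can use a raw string for that: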
import re
s = 'aaaaabaaaabaaabaaba'
pattern = r'(a)\1+' # raw string
for el in re.finditer(pattern, s):
    print('str found', el.start(), el.end())
This generates:
str found 0 5
str found 6 10
str found 11 14
str found 15 17
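Note that (a)\1+ needs at least two characters, which is why the single trailing 'a' is not reported and only four matches appear. If runs of length 1 should count as well (the question expected 5 elements), a plain a+ is enough. A short sketch that also shows what the non-raw '\1' really is (the variable name runs is made up):

```python
import re

# Without the r prefix, Python turns '\1' into a single control character.
print('\1' == '\x01')  # True
print(len('(a)\1+'))   # 5, not 6: the backslash is gone before re sees it

s = 'aaaaabaaaabaaabaaba'

# a+ matches every run of one or more 'a', including the last single one.
runs = [(m.start(), m.end()) for m in re.finditer(r'a+', s)]
print(runs)  # [(0, 5), (6, 10), (11, 14), (15, 17), (18, 19)]
```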

Python: What is the Best way to split a string of 9 characters into 3 characters each and join them using delimiters? [duplicate]

Marked as a duplicate of: How to iterate over a list in chunks
I have a string "111222333" inside a CSV file. I would like to convert this into something like "\111\222\333"
Currently my python code is :
refcode = "111222333"
returnstring = "\\" + refcode[:3] + "\\" + refcode[3:6] + "\\" + refcode[-3:] + "\\"
I know there must be a better way to do this. What are the better ways to do the same thing?
You could use re for that:
import re
refcode = "111222333"
returnstring = '\\'.join(re.match(r'()(\S{3})(\S{3})(\S{3})()', refcode).groups())
Explanation:
You have a string of 9 characters (let's say none of them is a whitespace character, so each can be represented by \S).
We build a regexp from that: (\S{3}) is a group of three consecutive non-whitespace characters (letters, digits, exclamation marks etc.).
(\S{3})(\S{3})(\S{3}) are three groups with 3 characters in each one.
If we call .groups() on it, we'll have a tuple of the matched groups, just like that:
In [1]: re.match(r'(\S{3})(\S{3})(\S{3})', refcode).groups()
Out[1]: ('111', '222', '333')
If we join it using a backslash ('\\') as the separator, we'll get:
In [29]: print("\\".join(re.match(r'(\S{3})(\S{3})(\S{3})', refcode).groups()))
111\222\333
But you want to add the backslashes on both sides of the string as well!
So we can add an empty group - () - on each side of the regular expression. Each one contributes an empty string to the tuple, so join() also puts a backslash at the start and at the end.
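For comparison, the same result can be had without re by slicing the string in steps of three, which also works for strings that are not exactly 9 characters long. This is only a sketch of an alternative, not the approach of the answer above; the function name backslash_chunks and its size parameter are made up for illustration.

```python
def backslash_chunks(code, size=3):
    """Split code into pieces of `size` characters, wrapped in and joined by backslashes."""
    chunks = [code[i:i + size] for i in range(0, len(code), size)]
    return "\\" + "\\".join(chunks) + "\\"


refcode = "111222333"
print(backslash_chunks(refcode))  # \111\222\333\
```

This produces the same string as the original code in the question, including the trailing backslash.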
