I'm very new to regex, and I'm trying to find instances in a string where there is a word consisting of either the letter w or e followed by 2 digits, such as e77, w10, etc.
Here's the regex that I currently have, which I think finds that (correct me if I'm wrong):
([e|w])\d{0,2}(\.\d{1,2})?
How can I add a space right after the letter e or w? If there are no instances where the criteria are met, I would like to keep the string as is. Do I need to use re.sub? I've read a bit about that.
Input: hello e77 world
Desired output: hello e 77 world
Thank You.
Your regex only needs to look like this (inside a character class, | is a literal pipe, so [e|w] would also match the character |):
([ew])(\d{2})
if you want to match exactly 2 digits, or
([ew])(\d{1,2})
if you also want to match single digits like e4.
The parentheses are called capturing groups and can be back-referenced in a search and replace or, in Python, with re.sub.
Your replacement string should look like
\1 \2
So it should be as simple as a line like:
re.sub(r'([ew])(\d{1,2})', r'\1 \2', your_string)
EDIT: working code
>>> import re
>>> your_string = 'hello e77 world'
>>>
>>> re.sub(r'([ew])(\d{1,2})', r'\1 \2', your_string)
'hello e 77 world'
This is what you're after:
import re
print(re.sub(r'([ew])(\d{1,2})', r'\g<1> \g<2>', 'hello e77 world'))
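To cover the "keep the string as is" part of the question: re.sub returns the input unchanged when nothing matches, so no extra check is needed. A minimal sketch, assuming the match should be a standalone word (the \b anchors and the helper name add_space are my own additions, not part of the answers above):

```python
import re

# Assumption: "word" means the token stands alone, hence the \b anchors.
# add_space is a hypothetical helper name used only for illustration.
TOKEN = re.compile(r'\b([ew])(\d{2})\b')

def add_space(text):
    # \1 is the letter, \2 the two digits; a single space goes between them.
    return TOKEN.sub(r'\1 \2', text)

print(add_space('hello e77 world'))  # hello e 77 world
print(add_space('hello world'))      # hello world (no match, unchanged)
print(add_space('die77 w10'))        # die77 w 10 (e77 is not a whole word)
```

Drop the \b anchors if tokens embedded in longer words should be split as well.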
I am trying to extract the ticket number from an email reply subject message. The subject message typically looks like this:
s = 'Re: Test something before TICKET#ABC123 hello world something after'
I would like to extract the part TICKET#ABC123
How can I achieve this the best in Python? Is this the way to go for my purpose or do you have better suggestions to keep track of mail chains?
Without regex (using split() and startswith()):
s = 'Re: Test something before TICKET#ABC123 hello world something after'
splitted = s.split()
for x in splitted:
    if x.startswith('TICKET#'):
        print(x)
# TICKET#ABC123
You could use the following regex:
import re
s = 'Re: Test something before TICKET#ABC123 hello world something after'
re.findall(r'TICKET#[a-zA-Z0-9]+(?=\s)', s)
# ['TICKET#ABC123']
Explanation:
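TICKET# - matches the characters TICKET# literally (case sensitive); the leading r' only marks a Python raw string and is not part of the pattern
[a-zA-Z0-9] - matches a single character in the set a-z, A-Z, 0-9
+ - quantifier: matches the preceding token between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?=\s) - positive lookahead: asserts that the next character is whitespace, without consuming it
\s - matches any whitespace character (for ASCII input, equal to [\r\n\t\f\v ])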
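One consequence of the (?=\s) lookahead is that a ticket at the very end of the subject, or one followed by punctuation, is not found. A small sketch comparing it with a \b word boundary, which is my suggested alternative rather than part of the answer above (the sample subjects are made up):

```python
import re

lookahead = re.compile(r'TICKET#[a-zA-Z0-9]+(?=\s)')
boundary = re.compile(r'TICKET#[a-zA-Z0-9]+\b')  # assumed alternative

subjects = [
    'Re: before TICKET#ABC123 after',  # ticket followed by a space
    'Re: TICKET#ABC123',               # ticket at the end of the string
    'Re: TICKET#ABC123, thanks',       # ticket followed by a comma
]

for subject in subjects:
    print(lookahead.findall(subject), boundary.findall(subject))
# ['TICKET#ABC123'] ['TICKET#ABC123']
# [] ['TICKET#ABC123']
# [] ['TICKET#ABC123']
```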
Using Regex.
Ex:
import re
s = 'Re: Test something before TICKET#ABC123 hello world something after'
m = re.search(r"TICKET#(\w+)", s)
if m:
    print(m.group(1))
Output:
ABC123
Can't comment on @Rakesh's answer.
But we need to change the regex a little, since the expected result is TICKET#ABC123.
Ex:
import re
s = 'Re: Test something before TICKET#ABC123 hello world something after'
m = re.search(r"(TICKET#(\w+))", s)
if m:
    print(m.group(1))
Output:
TICKET#ABC123
If you want to get the ticket number, then you can use
m.group(2)
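The outer parentheses are optional, since group(0) always holds the whole match. A short sketch using the same subject line:

```python
import re

s = 'Re: Test something before TICKET#ABC123 hello world something after'
m = re.search(r'TICKET#(\w+)', s)
if m:
    print(m.group(0))  # TICKET#ABC123 (the entire match)
    print(m.group(1))  # ABC123 (the first capturing group)
```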
I currently have a string similar to the following:
str = 'abcHello Wor=A9ld'
What I want to do is find the 'abc' and '=A9' and replace these matched groups with an empty string, such that my final string is 'Hello World'.
I am currently using this regex, which is correctly finding the groups I want to replace:
r'^(abc).*?(=[A-Z0-9]+)'
I have tried to replace these groups using the following code:
clean_str = re.sub(r'^(abc).*?(=[A-Z0-9]+)', '', str)
Using the above code has resulted in:
print(clean_str)
>>> 'ld'
My question is, how can I use re.sub to replace these groups with an empty string and obtain my 'Hello World'?
Capture everything else and put those groups in the replacement, like so:
re.sub(r'^abc(.*?)=[A-Z0-9]+(.*)', r'\1\2', s)
This worked for me.
re.sub(r'^(abc)(.*?)(=[A-Z0-9]+)(.*?)$', r"\2\4", str)
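The reason the original call returned 'ld' is that re.sub replaces the entire match, not just the captured groups, and the match runs from abc through =A9. A quick way to see this, with the variable renamed to s so the built-in str is not shadowed:

```python
import re

s = 'abcHello Wor=A9ld'

m = re.search(r'^(abc).*?(=[A-Z0-9]+)', s)
whole_match = m.group(0)
print(whole_match)  # abcHello Wor=A9 (everything re.sub would replace)

# Capturing the parts to keep and writing them back preserves the text.
cleaned = re.sub(r'^abc(.*?)=[A-Z0-9]+(.*)', r'\1\2', s)
print(cleaned)  # Hello World
```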
Is there a way that I can .. ensure that abc is present, otherwise don't replace the second pattern?
I understand that you need to first check if the string starts with abc, and if yes, remove the abc and all instances of =[0-9A-Z]+ pattern in the string.
I recommend:
import re
s="abcHello wo=A9rld"
if s.startswith('abc'):
    print(re.sub(r'=[A-Z0-9]+', '', s[3:]))
Here, if s.startswith('abc'): checks if the string has abc in the beginning, then s[3:] truncates the string from the start removing the abc, and then re.sub removes all non-overlapping instances of the =[A-Z0-9]+ pattern.
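The same check can be wrapped in a function that returns the string unchanged when the prefix is missing. This is only a sketch, and strip_markers is a hypothetical name:

```python
import re

def strip_markers(s):
    # Hypothetical helper: only clean strings that start with 'abc'.
    if not s.startswith('abc'):
        return s
    # Drop the 3-character prefix, then every =XX style marker.
    return re.sub(r'=[A-Z0-9]+', '', s[3:])

print(strip_markers('abcHello Wor=A9ld=B56'))  # Hello World
print(strip_markers('Hello Wor=A9ld'))         # Hello Wor=A9ld
```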
Note that you can use the PyPI regex module to do the same with a single regex:
import regex
r = regex.compile(r'^abc|(?<=^abc.*?)=[A-Z0-9]+', regex.S)
print(r.sub('', 'abcHello Wor=A9ld=B56')) # => Hello World
print(r.sub('', 'Hello Wor=A9ld')) # => Hello Wor=A9ld
See an online Python demo
Here,
^abc - abc at the start of the string only
| - or
(?<=^abc.*?) - checks that there is abc at the start of the input, followed by any number of characters (including line breaks, because regex.S is set), immediately to the left of the current location
=[A-Z0-9]+ - a = followed with 1+ uppercase ASCII letters/digits.
This is a naïve approach but why can't you use replace twice instead of regex, like this:
str = str.replace('abc','')
str = str.replace('=A9','')
print(str) #'Hello World'
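One thing to keep in mind with this approach: str.replace removes every occurrence of the substring, not only one at the start. A small sketch of the difference (the sample string is made up, and the variable names are my own):

```python
s = 'abcHello abcWor=A9ld'

# replace() drops both 'abc' occurrences, including the one mid-string.
everywhere = s.replace('abc', '').replace('=A9', '')
print(everywhere)  # Hello World

# Slicing off the prefix only touches the start of the string.
prefix = 'abc'
prefix_only = s[len(prefix):] if s.startswith(prefix) else s
prefix_only = prefix_only.replace('=A9', '')
print(prefix_only)  # Hello abcWorld
```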
Hey guys, I'm trying to find all words with a specific character in the middle of the word. The word cannot begin or end with the specified character.
Let's use 'x' as an example. My current regex looks like this:
r'\b(?!x)\w+x(?<!x)\b'
The \w+x is not returning any results. Does anyone have an idea why?
Try this:
>>> z = 'hello welxtra xcra crax extra'
>>> re.findall(r'[^x ]\w*x\w*[^x ]', z)
['welxtra', 'extra']
You can use something like this:
import re
print(re.match(r'[^x]+x+[^x]', 'provxa'))
print(re.match(r'[^x]+x+[^x]', 'xprova'))
Output:
<re.Match object; span=(0, 6), match='provxa'>
None
where [^x] is any character other than 'x' (note that Python's re has no [^] "any character" class, unlike JavaScript). So it will only match an 'x' if it sits between other characters. You can replace [^x] with a narrower class such as [a-wyz] to allow only lowercase letters other than 'x'.
\b(?!x)\w+x\w+(?<!x)\b
           ^^^
You missed the \w+ after x. See demo.
https://regex101.com/r/nS2lT4/34
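As for why the original pattern returned nothing: the lookbehind (?<!x) sat directly after the literal x, so it asserted that the character just matched was not x and could never succeed. With \w+ added after the x, the lookbehind checks the last character of the word instead. A quick check against the sample string from the earlier answer:

```python
import re

z = 'hello welxtra xcra crax extra'

broken = r'\b(?!x)\w+x(?<!x)\b'
fixed = r'\b(?!x)\w+x\w+(?<!x)\b'

print(re.findall(broken, z))  # []
print(re.findall(fixed, z))   # ['welxtra', 'extra']
```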
I would like to get 2 captured groups for a pair of consecutive words. I use this regular expression:
r'\b(hello)\b(world)\b'
However, searching "hello world" with this regular expression yields no results:
regex = re.compile(r'\b(hello)\b(world)\b')
m = regex.match('hello world') # m evaluates to None.
You need to allow for space between the words:
>>> import re
>>> regex = re.compile(r'\b(hello)\s*\b(world)\b')
>>> regex.match('hello world')
<_sre.SRE_Match object at 0x7f6fcc249140>
>>>
Discussion
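The regex \b(hello)\b(world)\b requires the word hello to end exactly where the word world begins, with a word boundary \b between them. That cannot happen: \b only matches between a word character and a non-word character, and in helloworld both neighbors are letters. Allowing whitespace, \s*, between the words fixes this.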
If you meant to allow punctuation or other separators between hello and world, then that possibility should be added to the regex.
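If punctuation should also be allowed between the two words, one option is \W+ (one or more non-word characters) in place of \s*. This is a sketch of that idea, not the only possible separator class:

```python
import re

# \W+ accepts spaces, commas, dashes, and other non-word characters.
regex = re.compile(r'\b(hello)\W+(world)\b')

for text in ['hello world', 'hello, world', 'hello - world', 'helloworld']:
    m = regex.match(text)
    print(text, '->', m.groups() if m else None)
# hello world -> ('hello', 'world')
# hello, world -> ('hello', 'world')
# hello - world -> ('hello', 'world')
# helloworld -> None
```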
I would like to write a regex for searching for the existence of some words, but their order of appearance doesn't matter.
For example, search for "Tim" and "stupid". My regex is Tim.*stupid|stupid.*Tim. But is it possible to write a simpler regex (e.g. so that the two words appear just once in the regex itself)?
See this regex:
/^(?=.*Tim)(?=.*stupid).+/
Regex explanation:
^ Asserts position at start of string.
(?=.*Tim) Asserts that "Tim" is present in the string.
(?=.*stupid) Asserts that "stupid" is present in the string.
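.+ Now that both phrases are known to be present, the string is valid. Use .+ (or the possessive .++, where the engine supports it) to match the entire string.
To require additional words, add another (?=.*<to_assert>) group. If you only need to test for presence, the regex can be simplified to /^(?=.*Tim).*stupid/; note that it then matches only up to the last "stupid" rather than the whole line.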
See a regex demo!
>>> import re
>>> str ="""
... Tim is so stupid.
... stupid Tim!
... Tim foobar barfoo.
... Where is Tim?"""
>>> m = re.findall(r'^(?=.*Tim)(?=.*stupid).+$', str, re.MULTILINE)
>>> m
['Tim is so stupid.', 'stupid Tim!']
>>> m = re.findall(r'^(?=.*Tim).*stupid', str, re.MULTILINE)
>>> m
['Tim is so stupid', 'stupid']
Read more:
Regex with exclusion chars and another regex
You can use Positive Lookahead to achieve this. The lookahead approach is nice for matching strings that contain both substrings regardless of order.
pattern = re.compile(r'^(?=.*Tim)(?=.*stupid).*$')
Example:
>>> s = '''Hey there stupid, hey there Tim
... Hi Tim, this is stupid
... Hi Tim, this is great'''
>>> import re
>>> pattern = re.compile(r'^(?=.*Tim)(?=.*stupid).*$', re.M)
>>> pattern.findall(s)
['Hey there stupid, hey there Tim', 'Hi Tim, this is stupid']
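The lookahead pattern extends to any number of words. A minimal sketch that builds it from a list, where contains_all is a hypothetical helper name and re.escape is used on the assumption that the words should be matched literally:

```python
import re

def contains_all(text, words):
    # One (?=.*word) lookahead per word; order in the text does not matter.
    lookaheads = ''.join(f'(?=.*{re.escape(w)})' for w in words)
    # re.S lets .* cross line breaks so the whole text is searched.
    return re.search('^' + lookaheads, text, re.S) is not None

print(contains_all('Hi Tim, this is stupid', ['Tim', 'stupid']))  # True
print(contains_all('stupid Tim!', ['Tim', 'stupid']))             # True
print(contains_all('Hi Tim, this is great', ['Tim', 'stupid']))   # False
```

For plain substring checks like these, all(w in text for w in words) gives the same answer without a regex; the lookahead form is mainly useful when the check has to live inside a larger pattern.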