I am trying to extract the ticket number from an email reply subject message. The subject message typically looks like this:
s = 'Re: Test something before TICKET#ABC123 hello world something after'
I would like to extract the part TICKET#ABC123
How can I achieve this the best in Python? Is this the way to go for my purpose or do you have better suggestions to keep track of mail chains?
Without regex (using split() and startswith()):
s = 'Re: Test something before TICKET#ABC123 hello world something after'
splitted = s.split()
for x in splitted:
    if x.startswith('TICKET#'):
        print(x)
# TICKET#ABC123
You could use the following regex:
import re
s = 'Re: Test something before TICKET#ABC123 hello world something after'
re.findall(r'TICKET#[a-zA-Z0-9]+(?=\s)', s)
# ['TICKET#ABC123']
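Explanation:
TICKET# - matches the characters TICKET# literally (case sensitive); the leading r' only marks a Python raw string and is not part of the pattern
[a-zA-Z0-9] - matches a single character in the set a-z, A-Z, 0-9
+ - quantifier: matches one or more times, as many times as possible, giving back as needed (greedy)
(?=\s) - positive lookahead: asserts that a whitespace character follows, without consuming it (so a ticket at the very end of the subject is not matched)
\s - matches any whitespace character (equal to [\r\n\t\f\v ])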
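The lookahead in the pattern above requires whitespace after the ticket, so a subject that ends with the ticket yields no match. A minimal sketch of a helper that avoids this, assuming ticket IDs consist only of ASCII letters and digits; the names TICKET_RE and extract_ticket are hypothetical and not taken from the answers:

```python
import re

# Assumption: a ticket is the literal "TICKET#" followed by ASCII letters/digits.
# The greedy character class already stops at the first other character,
# so no lookahead is needed.
TICKET_RE = re.compile(r'TICKET#[A-Za-z0-9]+')

def extract_ticket(subject):
    """Return the first ticket tag found in subject, or None if there is none."""
    match = TICKET_RE.search(subject)
    return match.group(0) if match else None

print(extract_ticket('Re: Test something before TICKET#ABC123 hello world something after'))
print(extract_ticket('Re: TICKET#ABC123'))
print(extract_ticket('Re: no ticket here'))
```

Returning None for a missing ticket lets the caller decide whether to open a new ticket or ignore the mail.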
Using Regex.
Ex:
import re
s = 'Re: Test something before TICKET#ABC123 hello world something after'
m = re.search(r"TICKET#(\w+)", s)
if m:
    print(m.group(1))
Output:
ABC123
I can't comment on Rakesh's answer yet, but the regex needs a small change, since the expected result is TICKET#ABC123.
Ex:
import re
s = 'Re: Test something before TICKET#ABC123 hello world something after'
m = re.search(r"(TICKET#(\w+))", s)
if m:
    print(m.group(1))
Output:
TICKET#ABC123
If you want to get the ticket number, then you can use
m.group(2)
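On the second part of the question (keeping track of mail chains): replies usually carry In-Reply-To and References headers that point at earlier Message-IDs, and these can complement a ticket tag in the subject. A rough sketch using the standard library email package; the raw message and its addresses are made up for illustration, and it assumes the sending mail client actually sets these headers:

```python
from email import message_from_string

# Hypothetical raw reply; real messages would come from your mailbox or IMAP client.
raw = (
    "Message-ID: <reply-2@example.com>\n"
    "In-Reply-To: <original-1@example.com>\n"
    "References: <original-1@example.com>\n"
    "Subject: Re: Test something before TICKET#ABC123 hello world\n"
    "\n"
    "Body text\n"
)

msg = message_from_string(raw)

# Message-ID of the mail this one replies to (None if the header is missing)
parent_id = msg["In-Reply-To"]
# Message-IDs of the earlier mails in the chain, oldest first
thread_ids = (msg["References"] or "").split()

print(parent_id)
print(thread_ids)
```

Not every client preserves these headers, so the subject tag remains a useful fallback.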
If I have an input something like this
input = 'AB. Hello word.'
the output should be
output = 'Hello word.'
Another example is
input = 'AB′. Hello word'
output = Hello Word
I want to write code that generalizes to any group of letters in any language. This is my code:
text = 'A. Hello word.'
text = re.sub(r'A\. \w{1,2}\.*', '', text)
text
output = llo word.
I can replace 'A' with any other letter, but for some reason it isn't working.
I also tried this one:
text = 'Ab. Hello word.'
text = re.sub(r'A+\. \w{1,2}\.*', '', text)
text
output = Ab. Hello word.
but it isn't working either.
Don't use a regex for this; just use .split(). Split once on the first dot and take the last piece with [-1]:
>>> "Ab. Hello world.".split(".", 1)[-1].strip()
'Hello world.'
>>> "Hello world".split(".", 1)[-1]
'Hello world'
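One caveat with the split approach: a sentence that has no label but ends in a dot, such as "Hello world.", splits into 'Hello world' and an empty string, so [-1] returns ''. A minimal sketch that guards against this with str.partition; the function name strip_prefix and the max_len=3 limit on the label length are assumptions for illustration:

```python
def strip_prefix(text, max_len=3):
    """Drop a short leading label such as 'AB.' and return the rest."""
    head, dot, tail = text.partition('.')
    # Treat the part before the first dot as a label only if it is short
    # and some text actually follows it.
    if dot and len(head) <= max_len and tail.strip():
        return tail.strip()
    return text

print(strip_prefix('Ab. Hello world.'))
print(strip_prefix('Hello world.'))
print(strip_prefix('Hello world'))
```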
You may use this regex for matching:
\b[A-Za-z]{1,3}′?\.
Replace it with "".
RegEx Details:
\b: Word boundary
[A-Za-z]{1,3}: Match 1 to 3 letters
′?: Match an optional ′
\.: Match a dot
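Since the question asks for letters in any language, the class [A-Za-z] may be too narrow. A sketch of a variant, assuming the label always sits at the start of the string; [^\W\d_] is a common idiom for "word character that is neither a digit nor an underscore", which in Python 3 covers roughly all Unicode letters. The name strip_label is hypothetical:

```python
import re

# Anchored at the start: 1 to 3 letters, an optional prime sign, a dot,
# then any whitespace that follows the dot.
LABEL_RE = re.compile(r'^[^\W\d_]{1,3}′?\.\s*')

def strip_label(text):
    """Remove a leading label like 'AB.' or 'AB′.' if one is present."""
    return LABEL_RE.sub('', text)

for sample in ['AB. Hello word.', 'AB′. Hello word', 'АБ. Привет', 'Hello word.']:
    print(strip_label(sample))
```

Anchoring with ^ keeps the pattern from touching short words followed by a dot later in the sentence.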
Try this:
import re
regex = r"^[^.]{1,3}\.\s*"
test_str = ("AB. Hello word.\n"
"AB′. Hello word.\n"
"A. Hello word.\n"
"Ab. Hello word.\n")
subst = ""
# count=0 replaces every match; pass a positive count to limit the number of replacements
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
if result:
    print(result)
Output:
Hello word.
Hello word.
Hello word.
Hello word.
This solution may be useful:
a = "AB. Hello word."
print(a[a.find(".") + 1:].strip())  # strip() drops the space left after the dot
Use this generic regex pattern:
"^.*?\."
Explanation:
^ matches at the beginning of the string.
.*? matches zero or more characters, as few as possible (lazy), so the match stops at the first dot. A greedy .{0,} would run to the last dot and remove the sentence as well.
\. matches a literal . (dot).
I have a string looking like this: Hello #StackOverflow! How are you today? I'd like to !sh #StackExchange
I would like it to look like this: Hello ! How are you today? I'd like to !sh
I would like to remove # and anything after it, until the string is cleared of all matches.
The solution I came up with only removes the first occurrence.
re.sub('#\S+ ', '', myString)
You may use this re.sub:
#\w+\s*
Code:
>>> s = "Hello #StackOverflow! How are you today? I'd like to !sh #StackExchange"
>>> print ( re.sub(r'#\w+\s*', '', s) )
Hello ! How are you today? I'd like to !sh
RegEx Details:
#: Match a literal #
\w+\s*: Match 1+ word characters followed by 0 or more whitespace characters
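You just need to remove the trailing space from your pattern. Note that \S+ also consumes punctuation attached to the tag, such as the ! in #StackOverflow!; use \w+ if that should stay.
import re
myString = "Hello #StackOverflow! How are you today? I'd like to !sh #StackExchange"
re.sub(r'#\S+', '', myString)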
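To see why the original attempt seemed to remove only the first occurrence, it may help to compare the three patterns side by side. This is just an illustrative sketch; the variable names are made up:

```python
import re

s = "Hello #StackOverflow! How are you today? I'd like to !sh #StackExchange"

# Requires a space after the tag, so the tag at the end of the string survives.
with_space = re.sub(r'#\S+ ', '', s)
# Matches every tag, but \S+ also swallows the "!" glued to the first one.
no_space = re.sub(r'#\S+', '', s)
# \w+ stops at the "!", and \s* cleans up whitespace after the tag.
word_chars = re.sub(r'#\w+\s*', '', s)

print(repr(with_space))
print(repr(no_space))
print(repr(word_chars))
```

re.sub already replaces all matches by default; the last tag was skipped only because it is not followed by a space.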
I'm trying to extract out the number before the - and the rest of the string after it, but it's not able to extract out both. Here's the output from the interactive terminal:
>>> a = '#232 - Hello There'
>>> re.findall('#(.*?) - (.*?)', a)
[('232', '')]
Why is my regex not working properly?
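.*? is non-greedy, i.e. it matches the shortest possible substring. Since nothing follows the second group in your pattern, the shortest match is the empty string. You need the greedy version .* (which matches the longest substring) for the second group: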
In [1143]: a = '#232 - Hello There'
In [1144]: re.findall('#(.*?) - (.*?)', a)
Out[1144]: [('232', '')]
In [1145]: re.findall('#(.*?) - (.*)', a)
Out[1145]: [('232', 'Hello There')]
But for simple cases like this you should use str methods, e.g. str.split on ' - ':
In [1146]: a.split(' - ')
Out[1146]: ['#232', 'Hello There']
With str.partition on - and slicing:
In [1147]: a.partition(' - ')[::2]
Out[1147]: ('#232', 'Hello There')
This expression should extract the desired values:
([0-9]+)\s*-\s*(.*)
Test
import re
print(re.findall(r"([0-9]+)\s*-\s*(.*)", "#232 - Hello There"))
Output
[('232', 'Hello There')]
Your regex is fine, you're just using the wrong function from re. The following matches things correctly:
m = re.fullmatch('#(.*?) - (.*?)', a)
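A short sketch of why the choice of function matters here: re.fullmatch must consume the whole string, which forces the lazy second group to expand, while re.findall and re.search stop at the first (shortest) match. The variable names are only for illustration:

```python
import re

a = '#232 - Hello There'
pattern = r'#(.*?) - (.*?)'

# findall/search are satisfied as soon as the pattern matches anywhere,
# so the trailing lazy group stays empty.
found = re.findall(pattern, a)
searched = re.search(pattern, a).groups()

# fullmatch only succeeds if the entire string is matched,
# so the trailing lazy group has to grow to the end.
full = re.fullmatch(pattern, a).groups()

print(found)     # [('232', '')]
print(searched)  # ('232', '')
print(full)      # ('232', 'Hello There')
```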
I'm very new to regex, and I'm trying to find instances in a string where there is a word consisting of either the letter w or e followed by 2 digits, such as e77, w10, etc.
Here's the regex that I currently have, which I think finds that (correct me if I'm wrong):
([e|w])\d{0,2}(\.\d{1,2})?
How can I add a space right after the letter e or w? If there are no instances where the criteria is met, I would like to keep the string as is. Do I need to use re.sub? I've read a bit about that.
Input: hello e77 world
Desired output: hello e 77 world
Thank You.
Your regex needs to just look like this:
([ew])(\d{2})
if you want to only match specifically 2 digits, or
([ew])(\d{1,2})
if you also want to match single digits like e4
The parentheses create capturing groups, which can be back-referenced in a search and replace, or in Python, with re.sub.
Your replacement string should look like
\1 \2
So it should be as simple as a line like:
re.sub(r'([ew])(\d{1,2})', r'\1 \2', your_string)
EDIT: working code
>>> import re
>>> your_string = 'hello e77 world'
>>>
>>> re.sub(r'([ew])(\d{1,2})', r'\1 \2', your_string)
'hello e 77 world'
This is what you're after:
import re
print(re.sub(r'([ew])(\d{1,2})', r'\g<1> \g<2>', 'hello e77 world'))
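The patterns above also match inside longer words, e.g. the w10 in new10. If only standalone tokens such as e77 should be changed, word boundaries can be added. A minimal sketch, assuming exactly two digits are required as in the question; the function name space_after_prefix is hypothetical:

```python
import re

def space_after_prefix(text):
    """Insert a space between a standalone e/w and its two digits."""
    # \b on both sides keeps the match from starting or ending inside a longer token
    return re.sub(r'\b([ew])(\d{2})\b', r'\1 \2', text)

print(space_after_prefix('hello e77 world'))
print(space_after_prefix('new10 e777'))
```

When nothing matches, re.sub returns the input unchanged, which covers the "keep the string as is" requirement.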
I have a good regexp for replacing repeating characters in a string. But now I also need to replace repeating words: three or more repetitions of a word should be replaced by two.
Like
bye! bye! bye!
should become
bye! bye!
My code so far:
import re

def replaceThreeOrMoreCharachetrsWithTwoCharacters(string):
    # pattern to look for three or more repetitions of any character, including newlines.
    pattern = re.compile(r"(.)\1{2,}", re.DOTALL)
    return pattern.sub(r"\1\1", string)
Assuming that a "word" in your requirements is one or more non-whitespace characters surrounded by whitespace or the string boundaries, you can try this pattern:
re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\1', s)
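A small sketch that wraps the pattern above in a function and runs it on a few inputs. The names REPEATED_WORDS and collapse_repeated_words are made up for illustration:

```python
import re

# Group 2 is the word, group 1 is the word repeated twice; the trailing
# (?:\s+\2)+ consumes every further repetition. The lookarounds make sure
# the match starts and ends at whitespace or the string limits.
REPEATED_WORDS = re.compile(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)')

def collapse_repeated_words(text):
    """Reduce runs of three or more identical words to two."""
    return REPEATED_WORDS.sub(r'\1', text)

print(collapse_repeated_words('bye! bye! bye!'))
print(collapse_repeated_words('hi hi hi hi some words words words'))
print(collapse_repeated_words('a a ab'))
```

The last call shows the effect of the (?!\S) lookahead: 'ab' is not treated as a third repetition of 'a'.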
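You could also try the regex below. It needs the third-party regex module, because the built-in re module rejects the variable-width lookbehind (?<= |^):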
(?<= |^)(\S+)(?: \1){2,}(?= |$)
Sample code,
>>> import regex
>>> s = "hi hi hi hi some words words words which'll repeat repeat repeat repeat repeat"
>>> m = regex.sub(r'(?<= |^)(\S+)(?: \1){2,}(?= |$)', r'\1 \1', s)
>>> m
"hi hi some words words which'll repeat repeat"
I know you were after a regular expression but you could use a simple loop to achieve the same thing:
def max_repeats(s, max=2):
    last = ''
    out = []
    for word in s.split():
        # number of times the current word has already appeared in a row
        same = 0 if word != last else same + 1
        if same < max:
            out.append(word)
        last = word
    return ' '.join(out)
As a bonus, I have allowed a different maximum number of repeats to be specified (the default is 2). If there is more than one space between each word, it will be lost. It's up to you whether you consider that to be a bug or a feature :)
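The same loop idea can also be expressed with itertools.groupby, which groups consecutive equal words. This is an alternative sketch rather than the answer's own code; the name limit_repeats is hypothetical, and like the loop above it collapses runs of whitespace to single spaces:

```python
from itertools import groupby, islice

def limit_repeats(text, limit=2):
    """Keep at most `limit` consecutive copies of each word."""
    kept = []
    # groupby yields one group per run of identical consecutive words
    for _, run in groupby(text.split()):
        kept.extend(islice(run, limit))
    return ' '.join(kept)

print(limit_repeats('hi hi hi hi some words words words'))
print(limit_repeats('bye! bye! bye!', limit=1))
```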
Try the following:
import re
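s = 'bye! bye! bye!'  # your string
# note: the optional trailing space in the pattern is consumed too,
# so a word following the repeats loses its separating space
s = re.sub(r'(\S+) (?:\1 ?){2,}', r'\1 \1', s)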
You can see a sample code here: http://codepad.org/YyS9JCLO
import re

def replaceThreeOrMoreWordsWithTwoWords(string):
    # Pattern to look for three or more repetitions of any words.
    pattern = re.compile(r"(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)", re.DOTALL)
    return pattern.sub(r"\1", string)