I am trying to get my hands dirty with regex.
Sample String
Info:Somestring-103409115825.Call
Info: BIL*ONL*00003.Avbl
Currently this matches:
>>> print(re.search(r'Info:(.*?).Call', payload).group(1))
Somestring-103409115825
>>> print(re.search(r'Info:(.*?).Avbl', payload).group(1))
BIL*ONL*00003
How can I write a single regex that matches both conditions, i.e. Info: -- any string -- .Call or .Avbl?
You can escape the dot and follow it with an alternation that matches either Call or Avbl:
\bInfo:(.*?)\.(?:Call|Avbl)\b
import re
pattern = r"\bInfo:(.*?)\.(?:Call|Avbl)\b"
print(re.search(pattern, "Info:Somestring-103409115825.Call").group(1))
print(re.search(pattern, "Info: BIL*ONL*00003.Avbl").group(1))
Output
Somestring-103409115825
BIL*ONL*00003
If you don't want the leading space before the group value, you can use:
\bInfo:\s*(.*?)\.(?:Call|Avbl)\b
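The \s* variant can be checked against both sample strings. This is only a minimal sketch: the helper name extract_info and the third test string are illustrative and not part of the original answer. It also returns None instead of raising when nothing matches.

```python
import re

# \s* after "Info:" swallows optional whitespace so it stays out of group 1.
PATTERN = re.compile(r"\bInfo:\s*(.*?)\.(?:Call|Avbl)\b")


def extract_info(payload):
    """Return the text between 'Info:' and '.Call'/'.Avbl', or None if absent.

    Hypothetical helper, named here only for illustration.
    """
    match = PATTERN.search(payload)
    return match.group(1) if match else None


print(extract_info("Info:Somestring-103409115825.Call"))
print(extract_info("Info: BIL*ONL*00003.Avbl"))
print(extract_info("Info: nothing to see here"))  # no .Call/.Avbl suffix
```

Guarding against None avoids the AttributeError that re.search(...).group(1) raises when the payload has neither suffix.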
Please try this code.
import re
payload1 = 'Info:Somestring-103409115825.Call'
payload2 = 'Info: BIL*ONL*00003.Avbl'
# Escape the dot so it matches a literal '.' rather than any character.
print(re.search(r'Info:(.*?)((\.Call)|(\.Avbl))', payload1).group(1))
print(re.search(r'Info:(.*?)((\.Call)|(\.Avbl))', payload2).group(1))
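To see why the dot in front of Call or Avbl is worth escaping, here is a small sketch. The input string is made up for the demonstration and contains no dot at all, yet the unescaped pattern still matches it.

```python
import re

unescaped = r'Info:(.*?).Call'   # "." matches any single character
escaped = r'Info:(.*?)\.Call'    # "\." matches only a literal dot

sample = 'Info:abcXCall'         # hypothetical input with no dot before "Call"

loose = re.search(unescaped, sample)
strict = re.search(escaped, sample)

print(loose.group(1))  # abc  (the "X" was consumed by the unescaped dot)
print(strict)          # None (no literal dot, so no match)
```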
Related
I am new to regular expressions and would appreciate it if someone could guide me through this.
import re
some = "I cannot take this B01234-56-K-9870 to the house of cards"
I have the above string and am trying to extract the substring with dashes (B01234-56-K-9870) using a Python regular expression. I have the following code so far:
regex = r'\w+-\w+-\w+-\w+'
match = re.search(regex, some)
print(match.group()) #returns B01234-56-K-9870
Is there a simpler way to extract the dash pattern using a regular expression? For now, I do not care about the order or anything else. I just want to extract the string with dashes.
Try the following regex (as shortened by The fourth bird):
\w+-\S+
Original regex: (?=\w+-)\S+
Explanation:
\w+- matches one or more word characters followed by a -
\S+ matches one or more non-whitespace characters
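A quick check of \w+-\S+ against the sample sentence is sketched below. The second input and the stricter pattern \w+(?:-\w+)+ are my own additions, not from the answer above; they illustrate that \S+ also swallows trailing punctuation, which may or may not matter for your data.

```python
import re

some = "I cannot take this B01234-56-K-9870 to the house of cards"
print(re.search(r'\w+-\S+', some).group())  # B01234-56-K-9870

# Hypothetical input where the code is followed by a comma.
punctuated = "Please ship B01234-56-K-9870, thanks"

# \S+ keeps going until whitespace, so the comma is included.
greedy = re.search(r'\w+-\S+', punctuated).group()

# Alternative (an assumption, not the answer's regex): only word characters
# separated by dashes, so trailing punctuation is left out.
strict = re.search(r'\w+(?:-\w+)+', punctuated).group()

print(greedy)  # B01234-56-K-9870,
print(strict)  # B01234-56-K-9870
```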
I am given a string that represents a number, for example "44.87" or "44.8796". I want to extract everything after the decimal point (.). I tried to use a regex in Python but was not successful. I am new to Python 3.
import re
s = "44.123"
re.findall(".","44.86")
Something like s.split('.')[1] should work.
If you would like to use a regex, try:
import re
s = "44.123"
regex_pattern = r"(?<=\.).*"  # raw string so \. is passed to re unchanged
matched_string = re.findall(regex_pattern, s)
(?<=...) is a positive lookbehind: the match must be immediately preceded by the given character, which is not included in the result
\. is an escaped period
.* matches everything after the period
This online regex tool is a helpful way to test your regex as you build it. You can confirm this solution there! :)
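The approaches above can be compared side by side. This is a sketch under a few assumptions: the capturing-group pattern \.(\d+) and the str.partition fallback are additions of mine, shown because split('.')[1] raises IndexError when the string has no dot.

```python
import re

s = "44.123"

# The attempt from the question: an unescaped "." matches ANY character,
# so findall returns every character of the string one by one.
every_char = re.findall(".", "44.86")
print(every_char)  # ['4', '4', '.', '8', '6']

# Lookbehind version from the answer: whatever follows a literal dot.
after_dot = re.findall(r"(?<=\.).*", s)
print(after_dot)  # ['123']

# Capturing-group alternative (assumes the fractional part is digits only).
captured = re.search(r"\.(\d+)", s).group(1)
print(captured)  # 123

# No-regex alternative that tolerates a missing dot: partition never raises.
_, _, fraction = "44".partition(".")
print(repr(fraction))  # ''
```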
I have a string with some markup which I'm trying to parse, generally formatted like this.
'[*]\r\n[list][*][*][/list][*]text[list][*][/list]'
I want to match the asterisks within the [list] tags so I can re.sub them as [**] but I'm having trouble forming an expression to grab them. So far, I have:
match = re.compile('\[list\].+?\[/list\]', re.DOTALL)
This gets everything within the list, but I can't figure out a way to narrow it down to the asterisks alone. Any advice would be massively appreciated.
You can use re.sub with a lambda as the replacement. The lambda receives each match object, and a plain .replace('*', '**') on the matched text does the rest.
Here is the sample code:
import re
s = '[*]\r\n[list][*][*][/list][*]text[list][*][/list]'
match = re.compile(r'\[list].+?\[/list]', re.DOTALL)
print(match.sub(lambda m: m.group().replace('*', '**'), s))
# => [*]
# [list][**][**][/list][*]text[list][**][/list]
Note that a ] outside of a character class does not have to be escaped in Python's re module.
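The same idea can be written with a named replacement function, which some readers find easier to follow than a lambda. This is a sketch, not the answer's exact code: the names LIST_BLOCK and double_markers are hypothetical, and it replaces the whole '[*]' token rather than every bare asterisk, so other asterisks inside a list would be left alone.

```python
import re

s = '[*]\r\n[list][*][*][/list][*]text[list][*][/list]'

# Non-greedy match of each [list]...[/list] block; DOTALL lets "." span newlines.
LIST_BLOCK = re.compile(r'\[list\].+?\[/list\]', re.DOTALL)


def double_markers(match):
    """Rewrite [*] as [**] inside one matched [list] block (illustrative helper)."""
    return match.group().replace('[*]', '[**]')


result = LIST_BLOCK.sub(double_markers, s)
print(repr(result))
```

The [*] markers outside the [list] blocks are untouched because the replacement function only ever sees the text matched by LIST_BLOCK.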
Regex has never been my strong point. In Python I'm attempting to build an expression which matches substrings such as these:
%MATCH%
%MATCH_1%
$THIS_IS_A_MATCH%
It would be extracted by a %MATCH% like this or %LIKE_THIS%
I ended up with this (it seems logical, but it does not work): %[A-Z0-9_]*$%
So where am I going wrong on this?
You can use a simple regex like this:
[%$]\w+[%$] <-- Notice I put $ because of your sample
On the other hand, if you only want uppercase you can use:
[%$][A-Z_\d]+[%$]
If you only want to match content within %, you could also use:
%.+?%
Python code
import re
p = re.compile(r'[%$]\w+[%$]')  # the ur'' prefix is not valid in Python 3
test_str = "%MATCH%\n\n%MATCH_1%\n\n$THIS_IS_A_MATCH%"
print(re.findall(p, test_str))
Btw, the problem with your regex is below:
%[A-Z0-9_]*$%
           ^--- Remove this dollar sign
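The effect of that stray $ can be checked directly. In the sketch below, the fixed pattern also swaps * for +, which is my own tweak (an assumption, not part of the answer) so that an empty "%%" is not reported as a match.

```python
import re

sentence = "It would be extracted by a %MATCH% like this or %LIKE_THIS%"

# Original attempt: "$" anchors at the end of the string, so a "%" can never
# follow it and nothing matches.
broken = re.findall(r'%[A-Z0-9_]*$%', sentence)
print(broken)  # []

# With the "$" removed (and "+" instead of "*" to require at least one character).
fixed = re.findall(r'%[A-Z0-9_]+%', sentence)
print(fixed)  # ['%MATCH%', '%LIKE_THIS%']
```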
I have the following text:
text = 'itunes20140618.tbz'
How would I capture the date here, using a regular expression?
I am currently doing:
date = text.split('.tbz')[0].split('itunes')[-1]
I think using re.findall here would be cleaner for what I am trying to do. Please note that the capture group needs to come after the specific word "itunes" in the regular expression (not just any run of digits).
You can use re.search to find your desired match.
>>> import re
>>> re.search(r'\d+', 'itunes20140618.tbz').group()
'20140618'
Since you state it has to be after the word itunes, you can use a capturing group and refer to that group number to access your match.
>>> import re
>>> re.search(r'itunes(\d+)', 'itunes20140618.tbz').group(1)
'20140618'
You can also use a positive lookbehind to ensure it's after the word itunes.
>>> re.search(r'(?<=itunes)\d+', 'itunes20140618.tbz').group()
'20140618'
Regex:
[^\d]*(\d+).*
If you can guarantee that the expression is always of the form itunes followed by the date, then you can also use this:
itunes(\d+).*
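Both patterns from this answer can be tried with re.match, which anchors at the start of the string. The sketch below also turns the captured digits into a date object; that step assumes the digits are in YYYYMMDD order, which the question implies but does not state.

```python
import re
from datetime import datetime

text = 'itunes20140618.tbz'

# Generic form: skip leading non-digits, then capture the first run of digits.
generic = re.match(r'[^\d]*(\d+).*', text).group(1)
print(generic)  # 20140618

# Stricter form: the digits must directly follow the word "itunes".
match = re.match(r'itunes(\d+)', text)
date_string = match.group(1)

# Assumption: the digits are YYYYMMDD; strptime raises ValueError otherwise.
parsed = datetime.strptime(date_string, '%Y%m%d').date()
print(parsed)  # 2014-06-18
```

The trailing .* in the original patterns is optional with re.match or re.search, since neither requires the pattern to consume the whole string.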