How to match part of a string that contains a certain word - Python

I want to be able to match the following texts
top of radio 54 / bottom 27
radio top 54 / bottom 27
The word top can be before or after radio only for the first half of the text (before /). top that appears after / should not be matched.
I tried to use the following pattern that encloses the lookahead for the first half with parentheses.
((?=top).*radio\s\d{2}\s)\/\D*bottom\s\d{2}
But it doesn't work as I expected. It only matches the first string, not the second.

You may use this regex with a lookahead:
^(?=[^/]*\btop\s)[^/]*radio[^/]+\d{2}\s+/\D*bottom\s+\d{2}$
RegEx Demo
RegEx Breakup:
^: Start
(?=[^/]*\btop\s): Lookahead to assert that the word top appears before the first /. This ensures we only accept top in the part before /
[^/]*: Match 0 or more of any char that is not a /
radio: Match radio
[^/]+: Match 1+ of any char that is not a /
\d{2}: Match 2 digits
\s+: Match 1+ whitespaces
/: Match a /
\D*: Match 0 or more non-digits
bottom: Match bottom
\s+: Match 1+ whitespace
\d{2}: Match 2 digits
$: End
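To check the pattern against the sample strings, here is a minimal Python sketch. The first two inputs come from the question; the third is a made-up negative case (an assumption, not from the question) where top only appears after the slash.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^(?=[^/]*\btop\s)[^/]*radio[^/]+\d{2}\s+/\D*bottom\s+\d{2}$")

samples = [
    "top of radio 54 / bottom 27",  # from the question
    "radio top 54 / bottom 27",     # from the question
    "radio 54 / top bottom 27",     # hypothetical: "top" only after the slash
]

# True where the whole string matches, False otherwise.
results = [bool(PATTERN.search(s)) for s in samples]
for s, ok in zip(samples, results):
    print(f"{s!r}: {ok}")
```

The third string is rejected because the lookahead can only scan characters up to the first /, so a top on the right-hand side never satisfies it.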


Regex to Match Pattern 5ABXYXYXY

I am working on mobile number of 9 digits.
I want to use regex to match numbers with pattern 5ABXYXYXY.
A sample I have is 529434343
What I have tried
I have the below pattern to match it.
r"^\d*(\d)(\d)(?:\1\2){2}\d*$"
However, this pattern matches another pattern I have which is 5XXXXXXAB
a sample for that is 555555532.
What I want
I want to edit my regex so that it matches only the first pattern, 5ABXYXYXY, and ignores 5XXXXXXAB.
You can use
^\d*((\d)(?!\2)\d)\1{2}$
See the regex demo.
Details:
^ - start of string
\d* - zero or more digits
((\d)(?!\2)\d) - Group 1: a digit (captured into Group 2), then another digit (not the same as the preceding one)
\1{2} - two occurrences of Group 1 value
$ - end of string.
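A small Python sketch to try this pattern on the two sample numbers from the question (the variable names are only illustrative):

```python
import re

# Pattern from the answer above: a pair of two different digits, repeated three times at the end.
PAIR_REPEATED = re.compile(r"^\d*((\d)(?!\2)\d)\1{2}$")

numbers = ["529434343", "555555532"]  # both samples from the question
matched = [n for n in numbers if PAIR_REPEATED.search(n)]
print(matched)
```

Note that \d* accepts any number of leading digits, so the pattern on its own does not enforce the 9-digit length or the leading 5.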
To match 5ABXYXYXY where AB should not be same as XY matching 3 times, you may use this regex:
^\d*(\d{2})(?!\1)((\d)(?!\3)\d)\2{2}$
RegEx Demo
RegEx Breakup:
^: Start
\d*: Match 0 or more digits
(\d{2}): Match 2 digits and capture in group #1
(?!\1): Make sure we don't have same 2 digits at next position
(: Start capture group #2
(\d): Match and capture a digit in capture group #3
(?!\3): Make sure we don't have same digit at next position as in 3rd capture group
\d: Match a digit
): End capture group #2
\2{2}: Match 2 pairs of same value as in capture group #2
$: End
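To see where the two patterns differ, here is a rough Python comparison. The number 543434343 is a made-up input (not from the question) in which AB equals XY.

```python
import re

# Both patterns are copied from the answers above.
first = re.compile(r"^\d*((\d)(?!\2)\d)\1{2}$")
stricter = re.compile(r"^\d*(\d{2})(?!\1)((\d)(?!\3)\d)\2{2}$")

# 543434343 is hypothetical: AB == XY, so only the stricter pattern rejects it.
numbers = ["529434343", "543434343", "555555532"]
outcome = {n: (bool(first.search(n)), bool(stricter.search(n))) for n in numbers}
for n, (a, b) in outcome.items():
    print(n, a, b)
```

Which of the two you want depends on whether a number like 543434343 should count as 5ABXYXYXY.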

Optional group, except when it is preceded by a specific match

I want to match any string that starts with . followed by a word, and then optionally any characters after a space.
r"^\.(\w+)(?:\s+(.+)\b)?"
eg:
should match
.just one two
.just
.blah one#nine
.blah
.jargon blah
should not match
.jargon
I want the second group to be mandatory if the first group is jargon.
Using Python, you can use a negative lookahead to exclude the case where jargon is the only word, and then match 1 or more word characters.
Then optionally match 1 or more whitespace characters (excluding newlines) followed by 1 or more characters other than newlines.
^\.(?!jargon$)\w+(?:[^\S\n]+.+)?$
The pattern matches:
^ Start of string
\. Match a dot
(?!jargon$) Exclude matching jargon as the only word on the line
\w+ Match 1+ word characters
(?: Non capture group
[^\S\n]+.+ match 1+ whitespace chars excluding newline and then 1+ chars except newlines
)? Close non capture group and make it optional
$ End of string
See a regex demo and a Python demo.
Example
import re

strings = [
    ".just one two",
    ".just",
    ".blah one#nine",
    ".blah",
    ".jargon blah",
    ".jargon"
]

for s in strings:
    m = re.match(r"\.(?!jargon$)\w+(?:[^\S\n]+.+)?$", s)
    if m:
        print(m.group())
Output
.just one two
.just
.blah one#nine
.blah
.jargon blah
One approach would be to phrase your requirement using an alternation:
^\.(?:(?!jargon\b)\w+(?: \S+)*|jargon(?: \S+)+)$
This pattern says to match:
^ from the start of the input
\. match dot
(?:
(?!jargon\b)\w+ match a first term which is NOT "jargon"
(?: \S+)* then match optional following terms zero or more times
| OR
jargon match "jargon" as the first term
(?: \S+)+ then match mandatory one or more terms
)
$ end of the input
Here is a sample Python script:
import re

inp = [".just one two", ".just", ".blah one#nine", ".blah", ".jargon blah", ".jargon"]
matches = [x for x in inp if re.search(r'^\.(?:(?!jargon\b)\w+(?: \S+)*|jargon(?: \S+)+)$', x)]
print(matches) # ['.just one two', '.just', '.blah one#nine', '.blah', '.jargon blah']
You could attempt to match the following regular expression:
^\.(?!jargon$)\w+(?= .|$).*
Demo
If successful, this matches the entire string. If you only need to know whether the string conforms to the requirements, .* can be dropped.
(?!jargon$) is a negative lookahead that asserts that the period is not immediately followed by 'jargon' at the end of the string.
(?= .|$) is a positive lookahead that asserts that the run of word characters is either followed by a space and at least one more character, or ends the string.
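A short Python check of this pattern against the strings from the question could look like this (a sketch, with illustrative variable names):

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^\.(?!jargon$)\w+(?= .|$).*")

strings = [".just one two", ".just", ".blah one#nine", ".blah", ".jargon blah", ".jargon"]

# Keep only the strings the pattern accepts.
matched = [s for s in strings if PATTERN.match(s)]
print(matched)
```

Only .jargon on its own is filtered out; the other five strings are kept.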

How do I match exact words in a string, including those that end with '!,?.'"', but not those followed by any other punctuation, using regex?

For example, if the pattern I want to search for is apple
The string I want to search into is
apple apple#323 apple.. apple??...!! apple??%% Ilovesapple
Expected matches:
apple
apple..
apple??...!!
I want to match only exact occurrences of the word; the one exception is that the word may end with any of the punctuation characters in
!,?.'"
Is this what you want?
\bapple[!,?.'\"]*(?=\s+|$)
Regex Demo
Details:
\b word boundary for 'apple'
apple the pattern you want to search
[!,?.'\"]* zero or more occurrences of the special character(s) at the end that you want to match
(?=\s+|$) Positive Lookahead to ensure the matching word is followed by white space or end of line.
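If the search word is not fixed, the same pattern can be built at runtime. The helper name find_word below is hypothetical, and using re.escape is an assumption on my side to guard against regex metacharacters in the word; the rest follows the pattern above.

```python
import re

def find_word(word, text):
    """Return occurrences of `word` optionally followed by !,?.'" and then whitespace or end."""
    # re.escape protects against regex metacharacters in the search word.
    pattern = r"\b" + re.escape(word) + r"""[!,?.'"]*(?=\s+|$)"""
    return re.findall(pattern, text)

text = "apple apple#323 apple.. apple??...!! apple??%% Ilovesapple"  # string from the question
found = find_word("apple", text)
print(found)
```

For the string from the question this gives the three expected matches: apple, apple.. and apple??...!!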
Another version
Another possibility is this if you don't allow any non-space characters before 'apple':
(?:\s+|^)(apple[!,?.'\"]*)(?=\s+|$)
Regex Demo
Note that #apple? is not matched because it starts with #
Refer to Group 1 for the matches
Details:
(?:\s+|^) non-capturing group for white space(s) or beginning of line before 'apple'
( start of capturing group (captured into Group 1)
apple the pattern you want to search
[!,?.'\"]* zero or more occurrences of the special character(s) at the end that you want to match
) end of capturing group (captured into Group 1)
(?=\s+|$) Positive Lookahead to ensure the matching word is followed by white space or end of line.
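The difference between the two versions shows up on input such as #apple? (the test string below is made up for illustration). A rough Python comparison:

```python
import re

# Both patterns are copied from the answer above.
boundary_version = r"\bapple[!,?.'\"]*(?=\s+|$)"
whitespace_version = r"(?:\s+|^)(apple[!,?.'\"]*)(?=\s+|$)"

text = "#apple? apple!"  # hypothetical input

# Without a capture group, findall returns the whole match.
with_boundary = re.findall(boundary_version, text)
# With one capture group, findall returns Group 1 only.
with_whitespace = re.findall(whitespace_version, text)

print(with_boundary)
print(with_whitespace)
```

The \b version still finds apple? inside #apple?, because there is a word boundary between # and a; the second version requires whitespace or the start of the line before the word.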

Not able get desired output after string parsing through regex

input =
6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh
6:/BENM/Tabuler Trading/REM//IMP/2020-341
original_regex = 6:[A-Za-z0-9 \/\.\-:] - but this takes the full string 6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh
modified_regex_pattern = 6:[A-Za-z0-9 \/\.\-:]{1,}[\/-:]
In the first string I want the output up to
6:/BENM/Gravity Exports/REM//INV: 3267/FEB20
but it's matching up to :65:
Can anyone suggest a better way to write this?
Example as below
https://regex101.com/r/pAduvy/1
You could for example use a capturing group with an optional part at the end to match the :digits:a-z part.
(6:[A-Za-z0-9 \/.:-]+?)(?::\d+:[a-z]+)?$
( Capture group 1
6:[A-Za-z0-9 \/.:-]+? Match 6: and then 1+ of the characters listed in the character class, as few as possible
) Close group 1
(?::\d+:[a-z]+)? optionally match the part at the end that you don't want to include
$ End of string
Regex demo
Note: Not sure if intended, but the last part of your pattern, [\/-:], denotes a range from ASCII 47 (/) to 58 (:), so it also matches the digits 0-9.
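In Python, the capture-group approach could be applied line by line as in the sketch below (variable names are illustrative). The last two lines are an added check of the range remark, not part of the original answer.

```python
import re

# Pattern from the answer above; Group 1 holds the part to keep.
PATTERN = re.compile(r"(6:[A-Za-z0-9 \/.:-]+?)(?::\d+:[a-z]+)?$")

lines = [
    "6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh",
    "6:/BENM/Tabuler Trading/REM//IMP/2020-341",
]
extracted = [PATTERN.search(line).group(1) for line in lines]
print(extracted)

# [\/-:] spans "/" (47) through ":" (58), so digits are inside the range and "-" (45) is not.
range_hits = [ch for ch in "/5:-" if re.fullmatch(r"[\/-:]", ch)]
print(range_hits)
```

If the input is one multi-line string rather than separate lines, the re.MULTILINE flag would be needed so that $ can match at each line end.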
Or a more precise pattern to get the match only
6:/\w+/\w+ \w+/[A-Z]+//[A-Z]+(?:: \d+)?/[A-Z]*\d+(?:-\d+)?
6:/\w+/\w+ Match 6: and 2 times / followed by 1+ word chars, and a space
\w+/[A-Z]+//[A-Z]+ Match 1+ word chars, / and uppercase chars, // and again uppercase chars
(?:: \d+)? Optionally match :, a space and 1+ digits
/[A-Z]*\d+ Match /, optional uppercase chars and 1+ digits
(?:-\d+)? Optionally match - and 1+ digits
Regex demo
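A quick Python check of the more precise pattern on both input lines (a sketch; here the match itself is the wanted value, so no capture group is needed):

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"6:/\w+/\w+ \w+/[A-Z]+//[A-Z]+(?:: \d+)?/[A-Z]*\d+(?:-\d+)?")

lines = [
    "6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh",
    "6:/BENM/Tabuler Trading/REM//IMP/2020-341",
]
matches = [PATTERN.search(line).group() for line in lines]
print(matches)
```

This version is tied closely to the layout of the two samples, so other input shapes may need adjustments.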

Finding exact values associated with given word using regex in python

I am trying to find the values associated with a particular word using regex, but I am not getting the expected results.
I wrote a pattern that works fine for standard input only, and I want to do the same for all sorts of inputs.
What I have now:
string = r'''results on 12/28/2012: WBC=8.110*3, RBC=3.3010*6, Hgb=11.3gm/dL'''
Pattern which I wrote:
re.findall(r'{}=(.*)'.format(detected_word), search_query)[0].split(',')[0]
detected_word is a variable holding the left-hand side of the equals sign (WBC, RBC, ...), which I detect using another technique.
In the case above it works fine, but if I change the sentence pattern as shown below, I am unable to find a generic pattern.
string = r'''results on 12/28/2012: WBC=8.110*3, RBC=3.3010*6 and Hgb=11.3gm/dL'''
string = r'''results for WBC, RBC and Hgb are 8.110*3, 3.3010*6 and 11.3gm/dL'''
Regardless of the string format, I am able to detect the words WBC, RBC, and Hgb, but I am struggling to detect the value associated with each word.
Could anyone please help me with this?
Thanks in advance
Here is an idea: use two separate patterns for the strings you provided as sample input, the first one will extract values coming after expected word= and the other will extract them from clauses of expected word1 + optional expected word2 + optional expected word3 + "to be" verb + value1, optional value2 and optional value3.
Pattern 1:
\b(WBC|RBC|Hgb)=(\S*)\b
See the regex demo.
\b(WBC|RBC|Hgb) - a whole word WBC, RBC or Hgb
= - a = char
(\S*)\b - Group 2: 0 or more non-whitespaces, that stops at last word boundary position
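Pattern 1 can be tried in Python roughly like this. Turning the findall result into a dict is my own choice for illustration, not part of the answer.

```python
import re

# Pattern 1 from the answer above.
PAIR = re.compile(r"\b(WBC|RBC|Hgb)=(\S*)\b")

texts = [
    "results on 12/28/2012: WBC=8.110*3, RBC=3.3010*6, Hgb=11.3gm/dL",
    "results on 12/28/2012: WBC=8.110*3, RBC=3.3010*6 and Hgb=11.3gm/dL",
]

# findall returns (word, value) tuples; dict() turns them into a lookup table.
parsed = [dict(PAIR.findall(t)) for t in texts]
for p in parsed:
    print(p)
```

The trailing \b is what drops the comma after each value, since \S* on its own would include it.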
Pattern 2:
\b(WBC|RBC|Hgb)(?:(?:\s+and)?(?:\s*,)?\s+(WBC|RBC|Hgb))?(?:(?:\s+and)?(?:\s*,)?\s*(WBC|RBC|Hgb))?\s+(?:is|(?:a|we)re|was|will\s+be)(?:\s*,)?\s*(\d\S*)\b(?:(?:\s+and)?(?:\s*,)?\s*(\d\S*)\b)?(?:(?:\s+and)?(?:\s*,)?\s*(\d\S*)\b)?
See regex demo.
\b(WBC|RBC|Hgb) - Group 1 capturing the searched word
(?:(?:\s+and)?(?:\s*,)?\s+(WBC|RBC|Hgb))? - an optional pattern:
(?:\s+and)? - an optional sequence of 1+ whitespaces and then and
(?:\s*,)? - an optional sequence of 0+ whitespaces and then a comma
\s+(WBC|RBC|Hgb) - 1+ whitespaces and Group 2 capturing the searched word
(?:(?:\s+and)?(?:\s*,)?\s*(WBC|RBC|Hgb))? - same as above, except that the whitespace before the word is optional (\s*); captures the 3rd optional searched word into Group 3
\s+ - 1+ whitespaces
(?:is|(?:a|we)re|was|will\s+be) - a VERB, you may add more if you expect them to be at this position, or plainly try a \S+ or \w+ pattern instead
(?:\s*,)?\s* - an optional 0+ whitespaces and a comma sequence, then 0+ whitespaces
(\d\S*)\b - Group 4 (pair it with Group 1 value): a digit and then 0+ non-whitespace chars limited by a word boundary
(?:(?:\s+and)?(?:\s*,)?\s*(\d\S*)\b)? - an optional group matching
(?:\s+and)? - an optional sequence of 1+ whitespaces and and
(?:\s*,)?\s* - an optional 0+ whitespaces and a comma, then 0+ whitespaces
(\d\S*)\b - Group 5 (pair it with Group 2 value): a digit and then 0+ non-whitespace chars limited by a word boundary
(?:(?:\s+and)?(?:\s*,)?\s*(\d\S*)\b)? - same as above, with a capture group 6 that must be paired with Group 3.
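The group pairing described above (1 with 4, 2 with 5, 3 with 6) might be done in Python as follows. The pattern is split across several string literals only for readability, and the dict-building step is an assumption about how you want to consume the result.

```python
import re

# Pattern 2 from the answer above, split over adjacent raw strings.
CLAUSE = re.compile(
    r"\b(WBC|RBC|Hgb)(?:(?:\s+and)?(?:\s*,)?\s+(WBC|RBC|Hgb))?"
    r"(?:(?:\s+and)?(?:\s*,)?\s*(WBC|RBC|Hgb))?"
    r"\s+(?:is|(?:a|we)re|was|will\s+be)(?:\s*,)?\s*(\d\S*)\b"
    r"(?:(?:\s+and)?(?:\s*,)?\s*(\d\S*)\b)?"
    r"(?:(?:\s+and)?(?:\s*,)?\s*(\d\S*)\b)?"
)

text = "results for WBC, RBC and Hgb are 8.110*3, 3.3010*6 and 11.3gm/dL"

pairs = {}
m = CLAUSE.search(text)
if m:
    names = m.group(1, 2, 3)   # searched words
    values = m.group(4, 5, 6)  # values, in the same order
    # Optional groups that did not take part in the match are None; skip them.
    pairs = {n: v for n, v in zip(names, values) if n is not None and v is not None}
print(pairs)
```

Running Pattern 1 first and falling back to this pattern when it finds nothing is one way to combine the two.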
