Regex to Match Pattern 5ABXYXYXY

I am working with 9-digit mobile numbers.
I want to use a regex to match numbers with the pattern 5ABXYXYXY.
A sample is 529434343.
What I have tried
I have the pattern below to match it.
r"^\d*(\d)(\d)(?:\1\2){2}\d*$"
However, this pattern also matches another pattern I have, which is 5XXXXXXAB.
A sample for that is 555555532.
What I want
I want to edit my regex so that it matches only the first pattern, 5ABXYXYXY, and ignores 5XXXXXXAB.

You can use
^\d*((\d)(?!\2)\d)\1{2}$
Details:
^ - start of string
\d* - zero or more digits
((\d)(?!\2)\d) - Group 1: a digit (captured into Group 2), then another digit (not the same as the preceding one)
\1{2} - two occurrences of Group 1 value
$ - end of string.
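As a quick check, here is a minimal sketch that runs the pattern above against the two samples from the question. The name PATTERN is only illustrative.

```python
import re

# Pattern from the answer above: optional leading digits, then a pair of two
# different digits (Group 1) that is repeated two more times before the end.
PATTERN = re.compile(r"^\d*((\d)(?!\2)\d)\1{2}$")

for number in ("529434343", "555555532"):
    match = PATTERN.match(number)
    # Group 1 holds the repeated XY pair when the number matches.
    print(number, "->", match.group(1) if match else "no match")
# 529434343 -> 43
# 555555532 -> no match
```

Note that \d* does not enforce the total length of 9 digits; if that matters, the length would need to be checked separately.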

To match 5ABXYXYXY where AB should not be same as XY matching 3 times, you may use this regex:
^\d*(\d{2})(?!\1)((\d)(?!\3)\d)\2{2}$
RegEx Breakup:
^: Start
\d*: Match 0 or more digits
(\d{2}): Match 2 digits and capture in group #1
(?!\1): Make sure we don't have same 2 digits at next position
(: Start capture group #2
(\d): Match and capture a digit in capture group #3
(?!\3): Make sure we don't have same digit at next position as in 3rd capture group
\d: Match a digit
): End capture group #2
\2{2}: Match 2 pairs of same value as in capture group #2
$: End
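The breakup above can be tried out with a short sketch like the following. The helper name is_5abxyxyxy and the sample 543434343 (where AB equals XY) are assumptions added for illustration.

```python
import re

# Pattern from the answer above: Group 1 is AB, Group 2 is the XY pair.
PATTERN = re.compile(r"^\d*(\d{2})(?!\1)((\d)(?!\3)\d)\2{2}$")

def is_5abxyxyxy(number: str) -> bool:
    """Return True when the number ends in AB followed by XY three times, AB != XY."""
    return PATTERN.match(number) is not None

print(is_5abxyxyxy("529434343"))  # True: AB = 29, XY = 43
print(is_5abxyxyxy("543434343"))  # False: AB equals XY
print(is_5abxyxyxy("555555532"))  # False: the 5XXXXXXAB pattern
```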

Related

Match Two Sets of Different Consecutive Numbers Regex Python

I am classifying a list of vanity phone numbers based on their patterns using regex.
I would like to capture the pattern 5ABXXXYYY.
Sample: 534666999
I wrote the regex below, which captures XXXYYY.
(\d)\1{2}(\d)\2{2}
I want to add a condition asserting that B is not the same digit as X.
The desired output matches the given pattern exactly and replaces it with the word Silver.
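import re
S_2 = 534666999
# Group numbers follow the regex above: \1 refers to X and \2 refers to Y.
S_2_pattern = re.sub(r"(\d)\1{2}(\d)\2{2}", "Silver", str(S_2))
print(S_2_pattern)
Desired output:
Silver
Thanks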

If you want to match 9 digits, and the 3rd digit should not be the same as the 4th, you can add another capture group for the 3rd digit; all group numbers after it are then incremented by 1.
\b\d\d(\d)(?!\1)(\d)\2\2(\d)\3\3\b
\b A word boundary to prevent a partial word match
\d\d Match 2 digits
(\d)(?!\1) Capture a single digit in group 1, and assert that it is not followed by the same digit
(\d)\2\2 Capture a single digit in group 2 and match the same digit 2 more times
(\d)\3\3 Capture a single digit in group 3 and match the same digit 2 more times
\b A word boundary
If the digit repeated in group 2 (XXX) should also be different from the digit repeated in group 3 (YYY):
\b\d\d(\d)(?!\1)(\d)(?!\d\d\2)\2\2(\d)\3\3\b
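A minimal sketch of both patterns used with re.sub, as in the question. The names BASIC, STRICT and label, and the samples other than 534666999, are assumptions made for illustration.

```python
import re

# The two patterns from the answer above.
BASIC = r"\b\d\d(\d)(?!\1)(\d)\2\2(\d)\3\3\b"
STRICT = r"\b\d\d(\d)(?!\1)(\d)(?!\d\d\2)\2\2(\d)\3\3\b"

def label(number, pattern):
    """Replace a matching 9-digit number with the word Silver."""
    return re.sub(pattern, "Silver", str(number))

print(label(534666999, BASIC))   # Silver
print(label(536666999, BASIC))   # 536666999 (B equals X, so no match)
print(label(534666666, BASIC))   # Silver (X and Y may be equal here)
print(label(534666666, STRICT))  # 534666666 (X equals Y, so no match)
```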

How can I write a regex that finds everything but 4 digit numbers like 2000 or 1990 or 1234?

I have a text like this:
Film_relase_date:1970_films_by_20th_Century_Fox
I would like to create a regex that matches all text except 1970, resulting in:
Film_relase_date:_films_by_20th_Century_Fox
I tried with the regex:
[^\d{4}]
But this regex returns:
Film_relase_date:_films_by_th_Century_Fox
And therefore also excludes the 20 which instead I would like to be matched.
How can I improve the regex?
EDIT:
I want to use this regex to do something like:
x = 'Film_relase_date: 1970_films_by_20th_Century_Fox'
REPLACE (x, "Anything that is not a 4-digit number", "Non-Space") = 1970

Remember that {4} is supposed to be added after the character class, not inside.
Anyway, if you want to match "all text except 1970", you can use the following regex:
([^\d]|(?<!\d)\d(?!\d{3}(?!\d))\d*)?
This regex matches:
a non-digit character, or
a digit that is not preceded by another digit and is not followed by exactly 3 more digits, together with any digits that follow it
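In Python, one way to use this pattern is to join all matches back together. This is only a sketch; the variable names are illustrative. Because the whole group is optional, the pattern also produces empty matches, which disappear when joined.

```python
import re

# Pattern from the answer above.
PATTERN = re.compile(r"([^\d]|(?<!\d)\d(?!\d{3}(?!\d))\d*)?")

text = "Film_relase_date:1970_films_by_20th_Century_Fox"

# Concatenate every match; the standalone 4-digit number is never matched.
kept = "".join(m.group(0) for m in PATTERN.finditer(text))
print(kept)  # Film_relase_date:_films_by_20th_Century_Fox
```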

If you want to match everything except 4-digit numbers, I would suggest an unrolled version that matches either 1-3 or 5 or more digits, asserting that no digit follows so that part of a 4-digit run cannot match.
If you don't want to cross newlines, you could use [^\d\r\n] instead of \D.
\D+(?:(?:\d{1,3}|\d{5,})(?!\d)\D*)*
Explanation
\D+ Match 1+ non digits
(?: Non capture group
(?:\d{1,3}|\d{5,}) Match either 1-3 or 5 or more digits
(?!\d)\D* Negative lookahead, assert not a digit directly to the right followed by matching optional non digits
)* Close the non capture group and repeat 0+ times
Note that if you want to match 4 digits only, you could perhaps extract the 4 digits using (?<!\d)\d{4}(?!\d) instead of replacing with an empty string.
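Both ideas from this answer can be sketched in Python as follows. The names KEEP and FOUR_DIGITS are illustrative only.

```python
import re

text = "Film_relase_date:1970_films_by_20th_Century_Fox"

# Unrolled pattern from the answer: keeps everything except 4-digit numbers.
KEEP = re.compile(r"\D+(?:(?:\d{1,3}|\d{5,})(?!\d)\D*)*")
kept = "".join(KEEP.findall(text))
print(kept)  # Film_relase_date:_films_by_20th_Century_Fox

# Alternative from the note: extract the standalone 4-digit numbers directly.
FOUR_DIGITS = re.compile(r"(?<!\d)\d{4}(?!\d)")
years = FOUR_DIGITS.findall(text)
print(years)  # ['1970']
```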

Not able get desired output after string parsing through regex

input =
6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh
6:/BENM/Tabuler Trading/REM//IMP/2020-341
original_regex = 6:[A-Za-z0-9 \/\.\-:] - but this is taking the full string 6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh
modified_regex_pattern = 6:[A-Za-z0-9 \/\.\-:]{1,}[\/-:]
In the first string I want the output up to
6:/BENM/Gravity Exports/REM//INV: 3267/FEB20
but it is matching up to :65:
Can anyone suggest a better way to write this?
Example below:
https://regex101.com/r/pAduvy/1

You could for example use a capturing group with an optional part at the end to match the :digits:a-z part.
(6:[A-Za-z0-9 \/.:-]+?)(?::\d+:[a-z]+)?$
( Capture group 1
6:[A-Za-z0-9 \/.:-]+? Match 6: and then any of the chars listed in the character class, as few times as possible
) Close group 1
(?::\d+:[a-z]+)? optionally match the part at the end that you don't want to include
$ End of string
Note: Not sure if it is intended, but the last part of your pattern, [\/-:], denotes a range from / (ASCII 47) to : (ASCII 58), which includes the digits 0-9.
Or use a more precise pattern to get the match only:
6:/\w+/\w+ \w+/[A-Z]+//[A-Z]+(?:: \d+)?/[A-Z]*\d+(?:-\d+)?
6:/\w+/\w+ Match 6: and 2 times / followed by 1+ word chars, then a space
\w+/[A-Z]+//[A-Z]+ Match 1+ word chars, / and uppercase chars, // and again uppercase chars
(?:: \d+)? Optionally match :, a space and 1+ digits
/[A-Z]*\d+ Match /, optional uppercase chars and 1+ digits
(?:-\d+)? Optionally match - and 1+ digits
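A minimal sketch applying both patterns to the two input strings from the question. The names GROUPED and PRECISE are illustrative.

```python
import re

lines = [
    "6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh",
    "6:/BENM/Tabuler Trading/REM//IMP/2020-341",
]

# First pattern: the wanted part is in group 1, the optional tail is left out.
GROUPED = re.compile(r"(6:[A-Za-z0-9 \/.:-]+?)(?::\d+:[a-z]+)?$")
# Second pattern: the whole match is the wanted part.
PRECISE = re.compile(r"6:/\w+/\w+ \w+/[A-Z]+//[A-Z]+(?:: \d+)?/[A-Z]*\d+(?:-\d+)?")

for line in lines:
    print(GROUPED.search(line).group(1))
    print(PRECISE.search(line).group(0))
# Both patterns give:
# 6:/BENM/Gravity Exports/REM//INV: 3267/FEB20
# 6:/BENM/Tabuler Trading/REM//IMP/2020-341
```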

Fetching respective group values in a regex expression

I have an example string like below:
Handling - Uncrating of 3 crates - USD600 each 7%=126.00 1,800.00
I can have another example string that can be like:
Unpacking/Unremoval fee Zero Rated 100.00
I am trying to access the first set of words and the last number values.
So I want the dict to be
{'Handling - Uncrating of 3 crates - USD600 each':1800.00}
or
{'Unpacking/Unremoval fee':100.00}
There might be strings where none of the above patterns (Zero Rated or something with %) are present, and I would skip those strings.
To do that, I was using the following pattern
pattern = re.search(r'(.*)Zero.*Rated\s*(\S*)',line.strip())
and then
pattern.group(1)
gives the keys for dict and
pattern.group(2)
gives the value. This works for lines where Zero Rated is present.
However, I also want to handle lines where Zero Rated is not present but % is, as in the first example above. I tried to use | for that, but it didn't work.
pattern = re.search(r'(.*)Zero.*Rated|%\s*(\S*)',line.strip())
This time I am not getting the right groups.

Sites like regex101.com can help debug regexes.
In this case, the problem is with operator precedence; the | operates over the whole of the rest of the regex. You can group parts of the regex without creating additional groups with (?: )
Try: r'(.*)(?:Zero.*Rated|%)\s*(\S*)'
Definitely give regex101.com a go, though, it'll show you what's going on in the regex.
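To see the precedence issue in isolation, here is a small sketch comparing the ungrouped and the grouped alternation on the two sample lines from the question. The variable names are illustrative.

```python
import re

line_zero = "Unpacking/Unremoval fee Zero Rated 100.00"
line_pct = "Handling - Uncrating of 3 crates - USD600 each 7%=126.00 1,800.00"

# Ungrouped: the | splits the whole regex into two separate alternatives.
ungrouped = re.compile(r"(.*)Zero.*Rated|%\s*(\S*)")
# Grouped: (?: ) limits the | to "Zero.*Rated" versus "%".
grouped = re.compile(r"(.*)(?:Zero.*Rated|%)\s*(\S*)")

print(ungrouped.search(line_zero).groups())  # ('Unpacking/Unremoval fee ', None)
print(grouped.search(line_zero).groups())    # ('Unpacking/Unremoval fee ', '100.00')
print(grouped.search(line_pct).groups())
# ('Handling - Uncrating of 3 crates - USD600 each 7', '=126.00')
```

Grouping fixes the Zero Rated case. For the % line, group 2 picks up =126.00 rather than the final amount, so that branch would need a more specific pattern to reach 1,800.00.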

You might use
^(.+?)\s*(?:Zero Rated|\d+%=\d{1,3}(?:\,\d{3})*\.\d{2})\s*(\d{1,3}(?:,\d{3})*\.\d{2})
The pattern matches
^ Start of string
(.+?) Capture group 1, match any char except a newline, as few times as possible
\s* Match 0+ whitespace chars
(?: Non capture group
Zero Rated Match literally
| Or
\d+%= Match 1+ digits and %=
\d{1,3}(?:\,\d{3})*\.\d{2} Match a digit format of 1-3 digits, optionally repeated by a comma and 3 digits followed by a dot and 2 digits
) Close non capture group
\s* Match 0+ whitespace chars
(\d{1,3}(?:,\d{3})*\.\d{2}) Capture group 2, match the digit format
For example
import re
regex = r"^(.+?)\s*(?:Zero Rated|\d+%=\d{1,3}(?:\,\d{3})*\.\d{2})\s*(\d{1,3}(?:,\d{3})*\.\d{2})"
test_str = ("Handling - Uncrating of 3 crates - USD600 each 7%=126.00 1,800.00\n"
"Unpacking/Unremoval fee Zero Rated 100.00\n"
"Delivery Cartage - IT Equipment, up to 1000kgs - 7%=210.00 3,000.00")
print(dict(re.findall(regex, test_str, re.MULTILINE)))
Output
{'Handling - Uncrating of 3 crates - USD600 each': '1,800.00', 'Unpacking/Unremoval fee': '100.00', 'Delivery Cartage - IT Equipment, up to 1000kgs -': '3,000.00'}
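The question asks for numeric values such as 1800.00 rather than strings. One possible follow-up, sketched here under the assumption that commas are only thousands separators, is to strip them and convert with float. The "Subtotal" line is a made-up example of a line that should be skipped.

```python
import re

REGEX = re.compile(
    r"^(.+?)\s*(?:Zero Rated|\d+%=\d{1,3}(?:\,\d{3})*\.\d{2})\s*(\d{1,3}(?:,\d{3})*\.\d{2})",
    re.MULTILINE,
)

test_str = (
    "Handling - Uncrating of 3 crates - USD600 each 7%=126.00 1,800.00\n"
    "Subtotal 1,900.00\n"  # hypothetical line without Zero Rated or %
    "Unpacking/Unremoval fee Zero Rated 100.00"
)

# Drop the thousands separators before converting each amount to float.
amounts = {
    description: float(value.replace(",", ""))
    for description, value in REGEX.findall(test_str)
}
print(amounts)
# {'Handling - Uncrating of 3 crates - USD600 each': 1800.0, 'Unpacking/Unremoval fee': 100.0}
```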

Match text with 4 to 5 CAPITAL ALPHABETS along with minimum 1 or maximum 2 digit number

Requirement:
4 to 5 CAPITAL ALPHABETS along with minimum 1 or maximum 2 digit number
I have created a regex which matches strings of CAPITAL ALPHABETS that have more than 1 digit, but I want to match text which has only 1 or 2 digits.
\b(?=.*\d){1,2}(?=.*[A-Z])[A-Z\d]{4,5}\b
Match Cases:
Allow
8HB8
H8ER
D5KC2
Disallow
8HB88
HEER
D54C2
Edit 1:
I should be able to match words of that format within a sentence as well, not only as standalone words.
Allow:
This is a valid 9CB8 code
This is another valid H1CS code

One option is to assert 4-5 chars [A-Z0-9].
Then match at least 1 digit 0-9 between optional chars [A-Z] and optionally match a second digit.
^(?=[A-Z0-9]{4,5}$)[A-Z]*[0-9][A-Z]*(?:[0-9][A-Z]*)?$
In parts
^ Start of string
(?=[A-Z0-9]{4,5}$) Assert 4-5 chars A-Z0-9
[A-Z]*[0-9][A-Z]* Match a digit between optional chars A-Z
(?: Non capture group
[0-9][A-Z]* Match a digit 0-9 followed by optional chars A-Z
)? Close group and make it optional
$ End of string
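A short sketch checking the pattern above against the allow and disallow cases from the question. The second pattern, IN_TEXT, is not part of the answer: it is an assumed adaptation that swaps the ^ and $ anchors for \b so that codes can be found inside a sentence, as asked in Edit 1.

```python
import re

# Pattern from the answer: the whole string must be the code.
WHOLE = re.compile(r"^(?=[A-Z0-9]{4,5}$)[A-Z]*[0-9][A-Z]*(?:[0-9][A-Z]*)?$")
# Assumed variant (not from the answer): word boundaries instead of anchors.
IN_TEXT = re.compile(r"\b(?=[A-Z0-9]{4,5}\b)[A-Z]*[0-9][A-Z]*(?:[0-9][A-Z]*)?\b")

allowed = ["8HB8", "H8ER", "D5KC2"]
rejected = ["8HB88", "HEER", "D54C2"]

print([bool(WHOLE.match(code)) for code in allowed])   # [True, True, True]
print([bool(WHOLE.match(code)) for code in rejected])  # [False, False, False]

print(IN_TEXT.findall("This is a valid 9CB8 code"))        # ['9CB8']
print(IN_TEXT.findall("This is another valid H1CS code"))  # ['H1CS']
```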

So maybe you could use:
^(?=[A-Z0-9]{4,5}$)(?:\D*\d\D*){1,2}$
^ - Start of string anchor.
(?=[A-Z0-9]{4,5}$) - A positive lookahead for a minimum of 4 and a maximum of 5 characters in the range [A-Z0-9] before the end of string anchor, $.
(?:\D*\d\D*) - A non-capture group where we have a combination of: zero or more non-digits followed by a digit and again zero or more non-digits.
{1,2} - Allow the previous non-capture group to occur a minimum of 1 and a maximum of 2 times (to make sure there are only 1 or 2 digits).
$ - End of string anchor.
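As a sanity check, the pattern can be compared with a plain-Python restatement of the requirement. The function reference_check is hypothetical and only encodes my reading of the rules (4 to 5 characters from A-Z and 0-9, with 1 or 2 digits).

```python
import re

PATTERN = re.compile(r"^(?=[A-Z0-9]{4,5}$)(?:\D*\d\D*){1,2}$")

ALLOWED_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
DIGITS = set("0123456789")

def reference_check(code):
    """Hypothetical restatement of the requirement without regex."""
    digit_count = sum(ch in DIGITS for ch in code)
    return (
        4 <= len(code) <= 5
        and all(ch in ALLOWED_CHARS for ch in code)
        and 1 <= digit_count <= 2
    )

samples = ["8HB8", "H8ER", "D5KC2", "8HB88", "HEER", "D54C2"]
for code in samples:
    # The two columns should agree for every sample.
    print(code, bool(PATTERN.match(code)), reference_check(code))
```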
