Matching alternating alphanumeric characters with regex - python

I want to match the following alphanumeric combinations using regex; ao1 a12 01p p1p 1ap 1p1.
With the following regex I can match all but p1p and 1p1:
[a-z][0-9]{1,2}|[0-9]{1,2}[a-z]|[a-z][0-9][a-z]|[a-z]{1,2}[0-9]|[0-9][a-z][0-9]
How do I correctly match the alternating number/letter/number and letter/number/letter patterns using regular expressions? Each match must be exactly 3 characters long, and the tokens occur within sentences.

You may use
(?<!\S)(?=[a-z]{0,2}\d)(?=\d{0,2}[a-z])[a-z\d]{3}(?!\S)
See the regex demo
Details
(?<!\S) - a whitespace or start of string should be immediately to the left of the current location
(?=[a-z]{0,2}\d) - there must be a digit after 0 to 2 letters immediately to the right of the current location
(?=\d{0,2}[a-z]) - there must be a letter after 0 to 2 digits immediately to the right of the current location
[a-z\d]{3} - three letters or digits are matched
(?!\S) - a whitespace or end of string should be immediately to the right of the current location.
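As a rough sketch of how the pattern above could be used from Python (the sample sentence and the helper name find_tokens are made up for illustration, they are not from the question):

```python
import re

# Pattern from the answer above: a whitespace-delimited token of exactly
# three characters containing at least one digit and at least one letter.
TOKEN = re.compile(r'(?<!\S)(?=[a-z]{0,2}\d)(?=\d{0,2}[a-z])[a-z\d]{3}(?!\S)')

def find_tokens(text):
    """Return every 3-character mixed letter/digit token found in text."""
    return TOKEN.findall(text)

# Hypothetical sample sentence: the six tokens from the question plus some
# strings that should be rejected (letters only, digits only, wrong length).
sample = "codes ao1 a12 01p p1p 1ap 1p1 but not abc 123 ab12 a1"
print(find_tokens(sample))
# ['ao1', 'a12', '01p', 'p1p', '1ap', '1p1']
```

The two lookaheads only check that a digit and a letter each occur somewhere in the next three characters, which is why ao1 and a12 are accepted alongside the strictly alternating p1p and 1p1.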

Are you looking for something like below?
([\d][a-zA-Z][\d]|[a-zA-Z][\d][a-zA-Z]|[a-zA-Z]{2}[\d]|[a-zA-Z][\d]{2}|[\d]{2}[a-zA-Z]|[\d][a-zA-Z]{2})

So if you need only number/letter/number and letter/number/letter, the pattern below should work. Note that your input ao1 doesn't match these criteria.
\d[a-z]\d|[a-z]\d[a-z]
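A small sketch of the strict alternation in Python. Splitting on whitespace and using fullmatch is my own assumption about how the tokens are delimited; the helper name alternating_tokens is made up:

```python
import re

# Strict alternation only: digit/letter/digit or letter/digit/letter.
ALTERNATING = re.compile(r'\d[a-z]\d|[a-z]\d[a-z]')

def alternating_tokens(text):
    """Keep whitespace-separated tokens that match the pattern in full."""
    return [tok for tok in text.split() if ALTERNATING.fullmatch(tok)]

print(alternating_tokens("ao1 a12 01p p1p 1ap 1p1"))
# ['p1p', '1p1']
```

Only p1p and 1p1 survive, which shows why this pattern alone does not cover all six inputs from the question.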

Related

Selecting a string without Space and without Number in the beginning

Here is my string:
^((\S)([a-z]))[a-zA-Z0-9_+-.]+#[a-zA-Z.-]+\.(edu|com|edu7|org)$\b
I need to check for 2 conditions in the beginning of a string:
No space
No number
My string satisfies the first condition but fails the second condition. Thank you for any suggestions. I did try regex101 but could not solve it.
Here are two email addresses that are both invalid (the first one begins with a space, the second with a digit):
somebody#gmail.com
5somebody#gmail.com
I want neither of those returned by the program. My current code considers the second email as valid, which is incorrect.
Your expected matches imply that you want to only allow letters as the first char in the string, so you can use
^[a-zA-Z][a-zA-Z0-9_+.-]*#[a-zA-Z.-]+\.(?:edu7?|com|org)$
See the regex demo. Details:
^ - start of string
[a-zA-Z] - an ASCII letter
[a-zA-Z0-9_+.-]* - zero or more letters, digits, _, +, . and - (note the position of the hyphen: placed at the end of the character class it is a literal hyphen, whereas in your original +-. it formed a range from + to .)
# - a # char
[a-zA-Z.-]+ - one or more letters, dots or hyphens
\. - a dot
(?:edu7?|com|org) - edu, edu7, com, org
$ - end of string.
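For illustration, here is a minimal sketch of checking addresses against that pattern in Python. The helper name is_valid and the test strings are assumptions, not taken from the question's code:

```python
import re

# Pattern from the answer above; the first character must be an ASCII letter.
ADDRESS = re.compile(r'^[a-zA-Z][a-zA-Z0-9_+.-]*#[a-zA-Z.-]+\.(?:edu7?|com|org)$')

def is_valid(address):
    """True when the whole string matches the address pattern."""
    return ADDRESS.search(address) is not None

# Hypothetical inputs: leading space, leading digit, and a clean address.
for candidate in [" somebody#gmail.com", "5somebody#gmail.com", "somebody#gmail.com"]:
    print(repr(candidate), is_valid(candidate))
# ' somebody#gmail.com' False
# '5somebody#gmail.com' False
# 'somebody#gmail.com' True
```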

Matching consecutive digits in regex while ignoring dashes in python3 re

I'm working to advance my regex skills in Python, and I've come across an interesting problem. Let's say that I'm trying to match valid credit card numbers, and one of the requirements is that a number cannot have 4 or more consecutive repeated digits. 1234-5678-9101-1213 is fine, but 1233-3345-6789-1011 is not. I currently have a regex that works when there are no dashes, but I want it to work in both cases, or at least in a way that lets me use | to match either one. Here is what I have for consecutive digits so far:
validNoConsecutive = re.compile(r'(?!([0-9])\1{4,})')
I know I could do some sort of replace '-' with '', but in an effort to make my code more versatile, it would be easier as just a regex. Here is the function for more context:
import re

def isValid(number):
    validStart = re.compile(r'^[456]')  # starts with 4, 5, or 6
    validLength = re.compile(r'^[0-9]{16}$|^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$')  # is 16 digits long
    validOnlyDigits = re.compile(r'^[0-9-]*$')  # only digits or dashes
    validNoConsecutive = re.compile(r'(?!([0-9])\1{4,})')  # no consecutives over 3 (the check that does not work yet)
    validators = [validStart, validLength, validOnlyDigits, validNoConsecutive]
    return all(val.search(number) for val in validators)

# arr is the list of card numbers to check (defined elsewhere)
for num in arr:
    print('Valid' if isValid(num) else 'Invalid')
I looked into excluding chars and lookahead/lookbehind methods, but I can't seem to figure it out. Is there some way to perhaps ignore a character for a given regex? Thanks for the help!
You can add the (?!.*(\d)(?:-*\1){3}) negative lookahead after ^ (start of string) to add the restriction.
The ^(?!.*(\d)(?:-*\1){3}) pattern matches
^ - start of string
(?!.*(\d)(?:-*\1){3}) - a negative lookahead that fails the match if, immediately to the right of the current location, there is
.* - any zero or more chars other than line break chars as many as possible
(\d) - Group 1: one digit
(?:-*\1){3} - three occurrences of zero or more - chars followed by the same digit as captured in Group 1 (as \1 is an inline backreference to the Group 1 value).
See the regex demo.
If you want to combine this pattern with others, just put the lookahead right after ^ (and in case you have other patterns before with capturing groups, you will need to adjust the \1 backreference). E.g. combining it with your second regex, validLength = re.compile(r'^[0-9]{16}$|^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$'), it will look like
validLength = re.compile(r'^(?!.*(\d)(?:-*\1){3})(?:[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})$')
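A minimal sketch of the combined pattern in use. The name is_valid and the sample numbers are made up for illustration, and the leading-digit check is written as a plain string test rather than the question's validStart regex:

```python
import re

# Combined pattern from the answer: the lookahead rejects any digit repeated
# 4 or more times in a row (dashes ignored), then the body requires 16 digits,
# either plain or in dash-separated groups of four.
CARD = re.compile(
    r'^(?!.*(\d)(?:-*\1){3})'
    r'(?:[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})$'
)

def is_valid(number):
    """Apply the start-digit rule from the question, then the combined regex."""
    return number[:1] in ('4', '5', '6') and CARD.search(number) is not None

# Hypothetical sample numbers.
for num in ['4123456789123456', '5123-4567-8912-3456', '4123-3335-6789-1011']:
    print(num, 'Valid' if is_valid(num) else 'Invalid')
# 4123456789123456 Valid
# 5123-4567-8912-3456 Valid
# 4123-3335-6789-1011 Invalid
```

The last number fails because the run 3-333 counts as four repeated digits once the dash is skipped by -*.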

Match strings with alternating characters

I want to match strings in which every second character is the same.
For example: 'abababababab'
I have tried this : '''(([a-z])[^/2])*'''
The output should return the complete string as it is like 'abababababab'
Strictly speaking, a textbook regular expression (a Chomsky type-3 grammar) has no backreferences, so it could only express this by spelling out the alternatives for every possible pair of characters in the alphabet.
However, Python's regexes are not regular expressions in that strict sense, and can handle more complex patterns than that. In particular, you could express your pattern as follows.
(..)\1*
(..) is a sequence of 2 characters. \1* matches the exact pair of characters an arbitrary (possibly null) number of times.
I interpreted your question as wanting every other character to be equal (ababab works, but abcbdb fails). If you needed only the 2nd, 4th, ... characters to be equal you can use a similar one.
.(.)(.\1)*
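To see the difference between the two patterns, here is a quick sketch. Using fullmatch to require that the whole string matches is my assumption about the intent, and matches_whole is a made-up helper name:

```python
import re

PAIR_REPEATED = re.compile(r'(..)\1*')      # the same two characters repeated
EVEN_POSITIONS = re.compile(r'.(.)(.\1)*')  # only the 2nd, 4th, ... characters must agree

def matches_whole(pattern, s):
    """True when the pattern matches the entire string."""
    return pattern.fullmatch(s) is not None

print(matches_whole(PAIR_REPEATED, 'abababababab'))  # True
print(matches_whole(PAIR_REPEATED, 'abcbdb'))        # False
print(matches_whole(EVEN_POSITIONS, 'abcbdb'))       # True
print(matches_whole(EVEN_POSITIONS, 'abcbdc'))       # False
```

Both patterns consume characters two at a time, so neither one matches an odd-length string in full.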
You could match the first [a-z] followed by capturing ([a-z]) in a group. Then repeat 0+ times matching again a-z and a backreference to group 1 to keep every second character the same.
^[a-z]([a-z])(?:[a-z]\1)*$
Explanation
^ Start of the string
[a-z]([a-z]) Match a-z and capture in group 1 matching a-z
(?:[a-z]\1)* Repeat 0+ times matching a-z followed by a backreference to group 1
$ End of string
Regex demo
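A short sketch of that anchored pattern in Python, returning the complete string as the question asks (match_alternating is a hypothetical helper name):

```python
import re

# Anchored pattern from the answer above: lowercase letters only, and every
# second character must equal the one captured in group 1.
EVERY_SECOND_SAME = re.compile(r'^[a-z]([a-z])(?:[a-z]\1)*$')

def match_alternating(s):
    """Return the whole string when it matches, otherwise None."""
    m = EVERY_SECOND_SAME.match(s)
    return m.group(0) if m else None

print(match_alternating('abababababab'))  # abababababab
print(match_alternating('ababacababab'))  # None
```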
Though not a regex answer, you could do something like this:
def all_same(string):
    return all(c == string[1] for c in string[1::2])

string = 'abababababab'
print('All the same {}'.format(all_same(string)))
string = 'ababacababab'
print('All the same {}'.format(all_same(string)))
The slice string[1::2] starts at the 2nd character (index 1) and then pulls out every second character (the step of 2).
This returns:
All the same True
All the same False
If I understand the problem correctly, a slightly more complicated expression could also serve as a starting point:
^(?=^[a-z]([a-z]))([a-z]\1)+$
Demo

How to match words in which must be letter, number and slash using regex (Python)?

I have a list like this (it's only a part):
not match me
norme
16/02574/REMMAJ
20160721
17/00016/FULM
OUT/2017/1071
SMD/2017/0391
17/01090/FULM
2017/30597
17/03940/MAO
18/00076/FULM
CH/17/323
18/00840/OUTMEI
17/00902/EIAM
PL/2017/02671/MINFOT
I need to find a general rule that matches all of them except the first rows (plain words), and that does not match strings of only \d or only \w unless they are mixed with each other and a slash. Numbers like \d{8} are allowed.
I don't know how to express something like a MUST clause that applies to all 3 of these groups together, so that none of them can be missing.
The patterns below either don't match fully or also match plain words. I need a regex that is as simple as possible.
\d{8}|(\w+|/+|\d+)
\d{8}|[\w/\d]+
EDIT
It's funny, but some examples I didn't provide don't match the proposed expressions. For example:
7/2018/4127
NWB/18CM032
I know why, and this is outside the scope of the question. However, adding support for mixed numbers and letters in one group, like NWB/18CM032, would be great and I think it wouldn't break the previous idea.
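You could match one or more letters or digits, followed by one or more groups that each consist of a forward slash and one or more letters or digits, or alternatively match exactly 8 digits. The pattern uses a-z, so enable case-insensitive matching (re.IGNORECASE in Python) for the uppercase sample data: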
^(?:[a-z0-9]+(?:/[a-z0-9]+)+|\d{8})$
That will match
^ Start of string
(?: Non capturing group
[a-z0-9]+ Match a char a-z or a digit 1+ times
(?:/[a-z0-9]+)+ Match a / followed by a char or digit 1+ times and repeat 1+ times.
| Or
\d{8} Match 8 digits
) Close group
$ End of string
See it on regex101
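Here is a rough sketch of applying that pattern to a few rows of the list in Python. The re.IGNORECASE flag is an assumption on my part, needed because the pattern uses a-z while the data is uppercase:

```python
import re

# Pattern from the answer above, compiled case-insensitively.
REFERENCE = re.compile(r'^(?:[a-z0-9]+(?:/[a-z0-9]+)+|\d{8})$', re.IGNORECASE)

rows = [
    'not match me', 'norme', '16/02574/REMMAJ', '20160721',
    'OUT/2017/1071', '2017/30597', 'CH/17/323', 'PL/2017/02671/MINFOT',
    '7/2018/4127', 'NWB/18CM032',
]

matched = [row for row in rows if REFERENCE.search(row)]
rejected = [row for row in rows if not REFERENCE.search(row)]
print(rejected)
# ['not match me', 'norme']
```

Because each slash-separated group is simply [a-z0-9]+, the two extra examples from the EDIT (7/2018/4127 and NWB/18CM032) are accepted as well.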

Regex inside a word, containing given characters

If I have a text (e.g. This is g56875f562f624g64a4b54a4g51bb3) how can I match the substrings of it that are made up of [a,b,0-9], are of length 5, contain at least one letter (a or b) and don't start or end with a space (so 51bb3 shouldn't be matched since it's at the end of the string)?
The matches in the example would be 64a4b, 4a4b5, a4b54, 4b54a and b54a4.
I want to use Python.
Start by matching exactly 5 occurrences of [a,b,0-9]:
[ab0-9]{5}
Then wrap it in a lookahead so that it can produce overlapping matches:
(?=([ab0-9]{5}))
Then add another lookahead that asserts that there's an a or a b somewhere within the next 5 characters:
(?=.{,4}[ab])(?=([ab0-9]{5}))
And finally add lookarounds that assert the absence of whitespace:
(?<!\s)(?<!^)(?=.{,4}[ab])(?=([ab0-9]{5})(?!\s|$))
See also the online demo.
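Putting the final pattern to work in Python could look like the sketch below. Because the whole pattern is zero-width, re.findall returns the contents of the capturing group for each position, which is what makes the overlapping matches possible:

```python
import re

# Final pattern from the answer above; group 1 holds the 5-character window.
WINDOW = re.compile(r'(?<!\s)(?<!^)(?=.{,4}[ab])(?=([ab0-9]{5})(?!\s|$))')

text = 'This is g56875f562f624g64a4b54a4g51bb3'
print(WINDOW.findall(text))
# ['64a4b', '4a4b5', 'a4b54', '4b54a', 'b54a4']
```

The window 56875 is skipped because it contains no a or b, and 51bb3 is skipped because it sits at the end of the string.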
