Regex for input validation enforcement - python

I have been trying to create a regex that matches every token in a space-separated string with the following format: "static.string static.mod static.bin". I basically want to enforce the string.string format. My current regex, ^(\s*)([A-Za-z]+)(\.+)([A-Za-z]+), only matches the first token, static.string. How do I make it iterate and match every token that fits that format?

You may use
re.findall(r'(?<!\S)[A-Za-z]+\.[A-Za-z]+(?!\S)', text)
See the regex demo.
The regex matches:
(?<!\S) - a location immediately preceded by whitespace or the start of the string
[A-Za-z]+ - 1+ ASCII letters
\. - a dot
[A-Za-z]+ - 1+ ASCII letters
(?!\S) - a location immediately followed by whitespace or the end of the string.
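A quick way to try the pattern in Python is sketched below, using the sample input from the question. The helper `all_tokens_valid` is hypothetical and assumes that "enforcing" the format means every whitespace-separated token must be letters-dot-letters.

```python
import re

# Pattern from the answer: letters, a dot, letters, delimited by whitespace or the string edges
TOKEN_RE = re.compile(r'(?<!\S)[A-Za-z]+\.[A-Za-z]+(?!\S)')

text = "static.string static.mod static.bin"
tokens = TOKEN_RE.findall(text)
print(tokens)  # ['static.string', 'static.mod', 'static.bin']

# Malformed tokens are skipped rather than partially matched
mixed = TOKEN_RE.findall("static.string bad..token static.bin x.y2")
print(mixed)  # ['static.string', 'static.bin']


def all_tokens_valid(s: str) -> bool:
    """Hypothetical helper: True only if the input is non-empty and every token fits the format."""
    parts = s.split()
    return bool(parts) and all(re.fullmatch(r'[A-Za-z]+\.[A-Za-z]+', p) for p in parts)


print(all_tokens_valid(text))                 # True
print(all_tokens_valid("static.string bad"))  # False
```

findall is enough to extract the matching tokens; the split-and-fullmatch helper is one way to reject the whole input when any token is malformed.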

Related

Question about matching RE in a complicated form

How can I match a word using RE in the following format:
Letter number Alphanumeric dot(.) Alphanumeric{0-4}
Examples:
A24.L
A2F.L9
A2F.LG4
This is what I've come up with so far:
answer=re.findall(r'[A-Za-z]\d\w\.\w{0-4}', text)
As you are using re.findall, I assume you are looking for partial matches inside longer text. Bearing that in mind, you need to fix the following:
\w matches not only alphanumeric, but also a _ char
{0-4} is not a valid limiting ("range" or "interval") quantifier; the syntax is {min,max}. Do not omit the min value: some regex engines (Python's re among them) treat a missing min as 0, but others either do not support the omission or do not handle it correctly
In Python 3, \d matches any Unicode digit (like ٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789), so you probably want to use (?a) inline modifier (to only match ASCII digits) or an explicit [0-9].
So, you can use
answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b', text)
if at least one alphanumeric after the . is required, and the following if the match can end in a dot:
answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)', text)
Details:
\b - word boundary
[A-Za-z] - a letter
[0-9] - an ASCII digit
[A-Za-z0-9] - an ASCII alphanumeric
\. - a . char
[A-Za-z0-9]{1,4}\b - one to four ASCII alphanumeric chars followed by a word boundary.
The second regex does not contain a word boundary at the end since the match is supposed to be able to end in a . (that is not a word char). The (?<!\w\B) is a right-hand dynamic word boundary that only requires a non-word char or end position if the preceding char is a word char.
See the regex demo.
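Both patterns can be compared on one line of text in Python. The sample string is invented: it holds the three examples from the question plus B12., which ends in a dot. The last two calls illustrate the \d remark, using U+0663 (ARABIC-INDIC DIGIT THREE) as the non-ASCII digit.

```python
import re

text = "A24.L A2F.L9 A2F.LG4 B12."

# At least one alphanumeric is required after the dot
strict = re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b', text)
print(strict)  # ['A24.L', 'A2F.L9', 'A2F.LG4']

# The match may also end right after the dot
loose = re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)', text)
print(loose)  # ['A24.L', 'A2F.L9', 'A2F.LG4', 'B12.']

# Why [0-9] instead of \d: in str patterns \d is Unicode-aware unless (?a) is set
print(re.findall(r'\d', 'A\u0663F'))      # ['٣']
print(re.findall(r'(?a)\d', 'A\u0663F'))  # []
```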

Getting a correct regex for word starting and ending with different letters

I am quite new to regex and right now I have a problem formulating a regex to match a string where the first and last letters are different. I looked on the internet and found a regex that does just the opposite, i.e. it matches words that have the same starting and ending letter. Can anyone please help me understand whether I can negate this regex in some way, or create a new regex that matches my requirements? The regex that needs to be modified or changed is:
^\s|^[a-z]$|^([a-z]).*\1$
This matches these strings:
aba,
a,
b,
c,
d,
" ",
cccbbbbbbac,
aaaaba
But I want it to match strings like:
aaabbcz,
zba,
ccb,
cbbbba
Can anyone please help me in this regard? Thank you.
Note: I will be using this with Python's re module, so the regex should be compatible with Python.
You don't need a regex for this, just use
s[0] != s[-1]
where s is your string. If you must use a regex, you can use this:
^(.).*(?!\1).$
This looks for
^ : beginning of string
(.) : a character (captured in group 1)
.* : some number of characters
(?!\1). : a character which is not the character captured in group 1
$ : end of string
Regex demo on regex101
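The sketch below runs the regex and the plain-Python check side by side on the strings from the question. The len() guard is an addition: s[0] raises IndexError on an empty string.

```python
import re

# Pattern from the answer: capture the first char, require a different last char
DIFF_ENDS = re.compile(r'^(.).*(?!\1).$')

samples = ["aaabbcz", "zba", "ccb", "cbbbba",              # should match
           "aba", "a", "b", " ", "cccbbbbbbac", "aaaaba"]  # should not

by_regex = [s for s in samples if DIFF_ENDS.search(s)]
# Plain-Python check; the len() guard avoids IndexError on ""
by_index = [s for s in samples if len(s) > 0 and s[0] != s[-1]]

print(by_regex)              # ['aaabbcz', 'zba', 'ccb', 'cbbbba']
print(by_regex == by_index)  # True
```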
This part of your pattern, ^([a-z]).*\1$, only accounts for the chars a-z, but you also want to exclude " ".
You can rewrite that pattern by putting the part after the capture group inside a negative lookahead.
^(.)(?!.*\1$).+
^ Start of string
(.) Capture a single char (including spaces) in group 1
(?!.*\1$) Negative lookahead, assert that the string does not end with the same character
.+ Match 1+ characters so that the string has a minimum of 2 characters
See a regex demo.
If the string should start and end with a non-whitespace character, to prevent leading / trailing spaces, you can start the match with a non-whitespace character \S and also end the match with a non-whitespace character.
^(\S)(?!.*\1$).*\S$
See another regex demo.
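The difference between the two patterns shows up on strings with a leading or trailing space. The sample strings below are invented for illustration.

```python
import re

# Both patterns from the answer
ANY_ENDS = re.compile(r'^(.)(?!.*\1$).+')     # first and last char differ
TRIMMED = re.compile(r'^(\S)(?!.*\1$).*\S$')  # ...and neither end is whitespace

results = {s: (bool(ANY_ENDS.search(s)), bool(TRIMMED.search(s)))
           for s in ["zba", "aba", " ", "ab ", " ab"]}
for s, (any_ends, trimmed) in results.items():
    print(repr(s), any_ends, trimmed)
# 'zba' True True
# 'aba' False False
# ' ' False False
# 'ab ' True False
# ' ab' True False
```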

Detect strings containing only digits, letters and one or more question marks

I am writing a Python regex that matches only strings that consist of letters, digits and one or more question marks.
For example, regex1: ^[A-Za-z0-9?]+$ matches strings with or without ?
I want a regex2 that matches expressions such as ABC123?A, 1AB?CA?, ?2ABCD, ???, 123? but not ABC123, ABC.?1D1, ABC(a)?1d
On MySQL, I did this and it works:
select *
from (
select * from norm_prod.skill_patterns
where pattern REGEXP '^[A-Za-z0-9?]+$') AS XXX
where XXX.pattern not REGEXP '^[A-Za-z0-9]+$'
How about something like this:
^(?=.*\?)[a-zA-Z0-9\?]+$
As you can see here at regex101.com
Explanation
The (?=.*\?) is a positive lookahead that tells the regex that the start of the match should be followed by 0 or more characters and then a ? - i.e., there should be a ? somewhere in the match.
The [a-zA-Z0-9\?]+ matches one-or-more occurrences of the characters given in the character class i.e. a-z, A-Z and digits from 0-9, and the question mark ?.
Altogether, the regex first checks if there is a question mark somewhere in the string to be matched. If yes, then it matches the characters mentioned above. If either the ? is not present, or there is some foreign character, then the string is not matched.
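Here is a small Python check of the lookahead pattern against the positive and negative examples listed in the question.

```python
import re

# Pattern from the answer: the lookahead demands a "?", the class limits the allowed chars
HAS_QMARK = re.compile(r'^(?=.*\?)[a-zA-Z0-9\?]+$')

should_match = ["ABC123?A", "1AB?CA?", "?2ABCD", "???", "123?"]
should_not = ["ABC123", "ABC.?1D1", "ABC(a)?1d"]

matched = [s for s in should_match + should_not if HAS_QMARK.search(s)]
print(matched)  # ['ABC123?A', '1AB?CA?', '?2ABCD', '???', '123?']
```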
You can validate an alphanumeric string with one or more question marks using
where pattern REGEXP '^[A-Za-z0-9]*([?][A-Za-z0-9]*)+$'
In Python:
re.search(r'^[A-Za-z0-9]*(?:\?[A-Za-z0-9]*)+$', text)
See the regex demo.
Details:
^ - start of string
[A-Za-z0-9]* - zero or more letters or digits
([?][A-Za-z0-9]*)+ - one or more repetitions of a ? char and then zero or more letters or digits
$ - end of string.
If you plan to apply this to any Unicode string, consider using POSIX character classes:
where pattern REGEXP '^[[:alnum:]]*([?][[:alnum:]]*)+$'
where [[:alnum:]] matches any letters and digits. In Python:
re.search(r'^[^\W_]*(?:\?[^\W_]*)+$', text)
In Python, all shorthand character classes are Unicode aware by default, and the [^\W_] pattern is a \w (that matches letters, digits, connector punctuation) with _ subtracted from it.
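To see where the ASCII and Unicode-aware Python patterns differ, the sketch below runs both on a few invented strings, including the Cyrillic word "ДА" (written with \u escapes) and a string containing _.

```python
import re

ASCII_ONLY = re.compile(r'^[A-Za-z0-9]*(?:\?[A-Za-z0-9]*)+$')
UNICODE_AWARE = re.compile(r'^[^\W_]*(?:\?[^\W_]*)+$')  # [^\W_] = \w without "_"

# "\u0414\u0410" is Cyrillic "ДА"
samples = ["ABC123?A", "???", "ABC123", "\u0414\u0410?1", "A_?"]
results = {s: (bool(ASCII_ONLY.search(s)), bool(UNICODE_AWARE.search(s))) for s in samples}
for s, r in results.items():
    print(repr(s), r)
```

Only the Cyrillic sample separates the two: the ASCII pattern rejects it, the Unicode-aware one accepts it, and both reject the underscore.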
If there should be at least a single question mark present using MySQL or Python:
^[A-Za-z0-9]*\?[A-Za-z0-9?]*$
Explanation
^ Start of string
[A-Za-z0-9]* Match optional chars A-Z a-z 0-9
\? Match a question mark
[A-Za-z0-9?]* Match optional chars A-Z a-z 0-9 or ?
$ End of string
See a regex demo.
In MySQL double escape the backslash like:
REGEXP '^[A-Za-z0-9]*\\?[A-Za-z0-9?]*$'
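In Python the pattern needs only a single backslash inside a raw string. As a sanity check, the sketch below compares it with a Python port of the two-step MySQL filter from the question; `two_step` is a hypothetical helper written only for this comparison.

```python
import re

# Pattern from the answer: at least one "?" in an otherwise alphanumeric string
ONE_QMARK = re.compile(r'^[A-Za-z0-9]*\?[A-Za-z0-9?]*$')


def two_step(s: str) -> bool:
    """Hypothetical port of the question's MySQL query: allowed chars only, but not all alphanumeric."""
    return (re.search(r'^[A-Za-z0-9?]+$', s) is not None
            and re.search(r'^[A-Za-z0-9]+$', s) is None)


samples = ["ABC123?A", "1AB?CA?", "?2ABCD", "???", "123?",
           "ABC123", "ABC.?1D1", "ABC(a)?1d", ""]
agree = all(bool(ONE_QMARK.search(s)) == two_step(s) for s in samples)
print(agree)  # True
```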

Regex pattern for string having spaces only at end

I have a requirement where I need to match a string that satisfies all of the requirements below:
String must be of length 12
String can have only following characters - Alphabets, Digits and Spaces
Spaces if any must be at the end of the string. Spaces in between are not allowed.
I have tried with below regex -
"^[0-9a-zA-Z\s]{12}$"
Above regex is satisfying requirement #1 and #2 but not able to satisfy #3.
Please help me to achieve the requirements.
Thanks in advance !!
You can use
^(?=.{12}$)[0-9a-zA-Z]*\s*$
If at least one alphanumeric char must exist:
^(?=.{12}$)[0-9a-zA-Z]+\s*$
Details:
^ - start of string
(?=.{12}$) - the string must be exactly 12 chars long
[0-9a-zA-Z]* - zero or more alphanumerics
\s* - zero or more whitespaces
$ - end of string.
See the regex demo.
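Below is a small Python check of both lookahead patterns; the sample strings are made up for illustration.

```python
import re

TWELVE = re.compile(r'^(?=.{12}$)[0-9a-zA-Z]*\s*$')           # spaces only at the end
TWELVE_NONBLANK = re.compile(r'^(?=.{12}$)[0-9a-zA-Z]+\s*$')  # plus at least one alphanumeric

samples = [
    "ABCDEF123456",        # 12 alphanumerics
    "ABCDEF12" + " " * 4,  # 8 alphanumerics, 4 trailing spaces
    " " * 12,              # spaces only
    "ABC DEF12345",        # space in the middle
    "ABCDEF12345",         # only 11 chars
]
results = {s: (bool(TWELVE.search(s)), bool(TWELVE_NONBLANK.search(s))) for s in samples}
for s, r in results.items():
    print(repr(s), r)
```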
Use a non-word boundary \B:
^(?:[a-zA-Z0-9]|\s\B){12}$
demo
With it, a space can't be followed by a letter or a digit, but only by a non-word character (a space here) or the end of the string.
To ensure at least one character that isn't blank:
^[a-zA-Z0-9](?:[a-zA-Z0-9]|\s\B){11}$
Note that with PCRE you have to use the D (DOLLAR END ONLY) modifier to be sure that $ matches the end of the string and not before the last newline sequence. Or better, replace $ with \z. Python's re module behaves the same way: $ also matches just before a final newline, so in Python use \Z (the equivalent of PCRE's \z) or re.fullmatch.
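The \B patterns can be tried in Python as below; the sample strings are invented. The last two lines show a Python-specific detail: in Python's re, $ matches at the end of the string or just before a newline at the end, so a 13-char input ending in \n slips through, while \Z only matches at the very end.

```python
import re

NO_INNER_SPACE = re.compile(r'^(?:[a-zA-Z0-9]|\s\B){12}$')
NO_INNER_SPACE_NONBLANK = re.compile(r'^[a-zA-Z0-9](?:[a-zA-Z0-9]|\s\B){11}$')

results = {s: (bool(NO_INNER_SPACE.search(s)), bool(NO_INNER_SPACE_NONBLANK.search(s)))
           for s in ["ABCDEF12    ", "ABC DEF12345", " " * 12]}
for s, r in results.items():
    print(repr(s), r)

# $ also matches before a trailing "\n", so 12 alphanumerics plus "\n" pass ...
with_dollar = bool(NO_INNER_SPACE.search("ABCDEF123456\n"))
# ... while \Z (absolute end of string in Python) rejects them
with_end_z = bool(re.search(r'^(?:[a-zA-Z0-9]|\s\B){12}\Z', "ABCDEF123456\n"))
print(with_dollar, with_end_z)  # True False
```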
You may use this regex:
^(?!.*\h\S)[\da-zA-Z\h]{12}$
RegEx Demo
RegEx Details:
^: Start
(?!.*\h\S): Negative lookahead to fail the match if a horizontal whitespace char is followed by a non-whitespace character
[\da-zA-Z\h]{12}: Match 12 characters that are alphanumerics or horizontal whitespace
$: End
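\h (horizontal whitespace) is PCRE syntax, and Python's re module rejects it as a bad escape, which the first part of the sketch below checks. The second part is an assumed Python adaptation, not the answer's own code: it spells horizontal whitespace as [ \t] and uses 0-9 so that only ASCII digits count.

```python
import re

# The answer's pattern uses \h, which Python's re does not support
try:
    re.compile(r'^(?!.*\h\S)[\da-zA-Z\h]{12}$')
    h_supported = True
except re.error:
    h_supported = False
print(h_supported)  # False

# Assumed adaptation: [ \t] for horizontal whitespace, 0-9 for ASCII-only digits
TRAILING_ONLY = re.compile(r'^(?!.*[ \t]\S)[0-9a-zA-Z \t]{12}$')

checks = {s: bool(TRAILING_ONLY.search(s))
          for s in ["ABCDEF12    ", "ABC DEF12345", "ABCDEF123456", "ABCDEF12345"]}
print(checks)
```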

Replacing string if it contains specified pattern

I need to replace ravi.jhon#piramal.com| or sam.jennifer#piramal.com| with '' (an empty string). I have written the following regex, but it is unable to deal with ., - or spaces in the strings.
My regex is \w+#ongoose.com["|"]
Now the question is how to include ., space and - along with alphanumeric characters.
My final output should be: ravi.jhon#piramal.com| becomes '' (empty).
Add the characters you want to match to a character class: [\w.-].
In your example you want to match piramal, while your regex matches ongoose. To match both of them you might use an alternation (?:ongoose|piramal), or match any non-whitespace characters using \S+, and replace the match with an empty string.
To match a dot you have to escape it \.
[\w.-]+#\S+\.com\|
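A short re.sub sketch follows; the input strings are invented. It also shows one caveat: \S+ is greedy, so when addresses are glued together without spaces, a single match can run to the last .com| and swallow any text in between. A lazy \S+? is one way to avoid that.

```python
import re

PATTERN = r'[\w.-]+#\S+\.com\|'  # from the answer

text = "ravi.jhon#piramal.com| sam.jennifer#piramal.com|"
cleaned = re.sub(PATTERN, '', text)
print(repr(cleaned))  # ' '

# Greedy \S+ spans both addresses and removes KEEP| as well
glued = "ravi.jhon#piramal.com|KEEP|sam.jennifer#piramal.com|"
greedy = re.sub(PATTERN, '', glued)
# Lazy \S+? stops at the first .com|
lazy = re.sub(r'[\w.-]+#\S+?\.com\|', '', glued)
print(repr(greedy), repr(lazy))  # '' 'KEEP|'
```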
