Selecting a string without Space and without Number in the beginning - python

Here is my regex:
^((\S)([a-z]))[a-zA-Z0-9_+-.]+#[a-zA-Z.-]+\.(edu|com|edu7|org)$\b
I need to check for 2 conditions in the beginning of a string:
No space
No number
My regex satisfies the first condition but fails the second. Thank you for any suggestions. I did try regex101 but could not solve it.
Here are two email addresses that are both invalid (the first begins with a space, the second with a number):
somebody#gmail.com
5somebody#gmail.com
I want neither of those returned by the program. My current code considers the second email as valid, which is incorrect.

Your expected matches imply that you want to only allow letters as the first char in the string, so you can use
^[a-zA-Z][a-zA-Z0-9_+.-]*#[a-zA-Z.-]+\.(?:edu7?|com|org)$
Details:
^ - start of string
[a-zA-Z] - an ASCII letter
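[a-zA-Z0-9_+.-]* - zero or more letters, digits, _, +, . and - (note the position of the hyphen: at the end of the character class it is a literal hyphen, whereas +-. in the original pattern forms a range from + to . that also matches a comma)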
# - a # char
[a-zA-Z.-]+ - one or more letters, dots or hyphens
\. - a dot
(?:edu7?|com|org) - edu, edu7, com, org
$ - end of string.
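As a quick check, here is a minimal sketch that runs the pattern above against the addresses from the question. The helper name is_valid and the version of the first address with a leading space are my own additions for illustration, not part of the original post.

```python
import re

# Pattern from the answer above; '#' is the separator used in this thread's samples.
EMAIL_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9_+.-]*#[a-zA-Z.-]+\.(?:edu7?|com|org)$")

def is_valid(address):
    """Hypothetical helper: True if the whole address matches the pattern."""
    return EMAIL_RE.match(address) is not None

# The leading-space variant is an assumed sample for the "no space" condition.
samples = ["somebody#gmail.com", " somebody#gmail.com", "5somebody#gmail.com"]
results = {s: is_valid(s) for s in samples}
print(results)
```

Only the address that starts with a letter should come back as True; the leading space and the leading digit are both rejected by the first [a-zA-Z].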

Related

How to match a string with pythons regex with optional character, but only if that optional character is preceded by another character

I need to match a string that optionally ends with numbers, but only if the numbers aren't preceded by a 0.
so AAAA should match, AAA1 should, AA20 should, but AA02 should not.
I can figure out the optionality of it, but I'm not sure if python has a "preceded by" or "followed by" flag.
if s.isalnum() and re.match("^[A-Z]+[1-9][0-9]*$", s):
    return True
Try:
^[A-Z]+(?:[1-9][0-9]*)?$
^[A-Z]+ - match letters from the beginning of string
(?:[1-9][0-9]*)? - optionally match a number that doesn't start from 0
$ - end of string
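A small sketch to try the pattern on the four samples from the question (the variable names are just for illustration):

```python
import re

# Letters, then an optional number that does not start with 0.
NUM_SUFFIX_RE = re.compile(r"^[A-Z]+(?:[1-9][0-9]*)?$")

samples = ["AAAA", "AAA1", "AA20", "AA02"]
matches = [s for s in samples if NUM_SUFFIX_RE.match(s)]
print(matches)  # ['AAAA', 'AAA1', 'AA20']
```

AA02 is rejected because the optional group cannot start at 0, and $ then fails to match before the remaining digits.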

Regex to pull the first and last letter of a string

I am using \d{2}-\d{2}-\d{4} to validate my string. It pulls the sequence of numbers out of the string, resulting in 02-01-1716. However, I also need the letter the string begins with and the letter the file name ends with. For example, from Q:\Region01s\FY 02\02-01-1716A.pdf I need the Q as well as the A, so in the end I would have Q: 02-01-1716A
You can use
import re
regex = r"^([a-zA-Z]:)\\(?:.*\\)?(\d{2}-\d{2}-\d{4}[a-zA-Z]?)"
text = r"Q:\Region01s\FY 02\02-01-1716A.pdf"
match = re.search(regex, text)
if match:
    print(f"{match.group(1)} {match.group(2)}")
    # => Q: 02-01-1716A
Details:
^ - start of string
([a-zA-Z]:) - Group 1: a letter and :
\\ - a backslash
(?:.*\\)? - an optional sequence of any chars other than line break chars as many as possible, followed with a backslash
(\d{2}-\d{2}-\d{4}[a-zA-Z]?) - Group 2: two digits, -, two digits, -, four digits, an optional letter.
The output - if there is a match - is a concatenation of Group 1, space and Group 2 values.
You can try:
(.).*(.)\.[^\.]+$
Or with the validation:
(.).*\d{2}-\d{2}-\d{4}(.)\.[^\.]+$
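For reference, a minimal sketch of how the validating variant could be used in Python on the path from the question. Note that it captures only the two letters, not the number itself; the variable names are assumptions for illustration.

```python
import re

text = r"Q:\Region01s\FY 02\02-01-1716A.pdf"

# Group 1: first character of the string.
# Group 2: the character between the dd-dd-dddd number and the extension.
m = re.search(r"(.).*\d{2}-\d{2}-\d{4}(.)\.[^\.]+$", text)
first, last = m.groups() if m else (None, None)
print(first, last)  # Q A
```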

RegEx for matching two digits and everything except new lines and dot

Using Python 3, I'm trying to find a string only if it contains one to two digits (and no more than that in the same number) along with everything else following it. The match breaks on periods or new lines.
\d{1,2}[^.\n]+ is almost right except it returns numbers greater than two digits.
For example:
"5+years {} experience. stop.
10 asdasdas . 255
1abc1
5555afasfasf++++s()(jn."
Should return:
5+years {} experience
10 asdasdas
1abc1
Based on your description and your sample data, you can use the following regex to match the intended strings and discard the others:
^\d[^\d.]*\d?[^\d.\n]*(?=\.|$)
Regex Explanation:
^ - Start of line
\d - Matches a digit
[^\d.]* - This matches any character other than digit or dot zero or more times. This basically allows optionally matching of non-digit non-dot characters.
\d? - As you want to allow one or two digits, this is the second digit which is optional hence \d followed by ?
[^\d.\n]* - This matches any character other than digit or dot or newline
(?=\.|$) - This positive lookahead ensures that the match is followed by either a dot or the end of the line
Also, notice that multiline mode is enabled, as ^ and $ need to match the start and end of each line.
Code:
import re
s = '''5+years {} experience. stop.
10 asdasdas . 255
1abc1
5555afasfasf++++s()(2jn.'''
print(re.findall(r'(?m)^\d[^\d.]*\d?[^\d.\n]*(?=\.|$)', s))
Prints:
['5+years {} experience', '10 asdasdas ', '1abc1']
If the matching lines do not necessarily start with a digit, you can use the regex below instead. Take group 1 if the captured string must start with a digit; take the whole match if leading non-digit characters may be included.
^[^\d\n]*(\d[^\d.]*\d?[^\d.\n]*)(?=\.|$)
Regex Explanation:
^ - Start of line
[^\d\n]* - Allows zero or more non-digit characters before first digit
( - Starts first grouping pattern to capture the string starting with first digit
\d - Matches a digit
[^\d.]* - This matches any character other than digit or dot zero or more times. This basically allows optionally matching of non-digit non-dot characters.
\d? - As you want to allow one or two digits, this is the second digit which is optional hence \d followed by ?
[^\d.\n]* - This matches any character other than digit or dot or newline
) - End of first capturing pattern
(?=\.|$) - This positive lookahead ensures that the match is followed by either a dot or the end of the line
Multiline mode must be enabled, either by placing the inline modifier (?m) at the start of the regex or by passing re.MULTILINE as the flags argument (the third argument of re.search and re.findall).
Code:
import re
s = '''5+years {} experience. stop.
10 asdasdas . 255
1abc1
aaa1abc1
aa2aa1abc1
5555afasfasf++++s()(2jn.'''
print(re.findall(r'(?m)^[^\d\n]*(\d[^\d.]*\d?[^\d.\n]*)(?=\.|$)', s))
Prints:
['5+years {} experience', '10 asdasdas ', '1abc1', '1abc1']
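To illustrate the two ways of enabling multiline mode, here is a short sketch using the first pattern and its sample text. It also shows what happens when the flag is left out; the variable names are mine.

```python
import re

s = '''5+years {} experience. stop.
10 asdasdas . 255
1abc1
5555afasfasf++++s()(2jn.'''

pattern = r'^\d[^\d.]*\d?[^\d.\n]*(?=\.|$)'

inline = re.findall('(?m)' + pattern, s)         # inline modifier
flagged = re.findall(pattern, s, re.MULTILINE)   # flags argument
without = re.findall(pattern, s)                 # ^ only matches at the very start

print(inline == flagged)  # True
print(without)            # ['5+years {} experience']
```

Without multiline mode, ^ matches only at the start of the whole string, so only the first line can produce a match.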

strange output regular expression r'[-.\:alnum:](.*)'

I expect to fetch all alphanumeric characters after "-"
For an example:
>>> str1 = "12 - mystr"
>>> re.findall(r'[-.\:alnum:](.*)', str1)
[' mystr']
First, it's strange that white space is considered alphanumeric, while I expected to get ['mystr'].
Second, I cannot understand why this can be fetched, if there is no "-":
>>> str2 = "qwertyuio"
>>> re.findall(r'[-.\:alnum:](.*)', str2)
['io']
First of all, Python re does not support POSIX character classes.
The white space is not considered alphanumeric, your first pattern matches - with [-.\:alnum:] and then (.*) captures into Group 1 all 0 or more chars other than a newline. The [-.\:alnum:] pattern matches one char that is either -, ., :, a, l, n, u or m. Thus, when run against the qwertyuio, u is matched and io is captured into Group 1.
Alphanumeric chars can be matched with the [^\W_] pattern. So, to capture all alphanumeric chars after - that is followed with 0+ whitespaces you may use
re.findall(r'-\s*([^\W_]+)', s)
Details
- - a hyphen
\s* - 0+ whitespaces
([^\W_]+) - Capturing group 1: one or more (+) chars that are letters or digits.
Python demo:
print(re.findall(r'-\s*([^\W_]+)', '12 - mystr')) # => ['mystr']
print(re.findall(r'-\s*([^\W_]+)', 'qwertyuio')) # => []
Your regex says: "Find any one of the characters -.:alnum, then capture any amount of any characters into the first capture group".
In the first test, it found - for the first character, then captured " mystr" (including the leading space) in the first capture group. If any groups are in the regex, findall returns a list of the captured groups, not the whole matches, so the matched - is not included.
Your second test found u as one of the -.:alnum characters (as none of qwerty matched any), then captured and returned the rest after it, io.
As @revo notes in comments, [....] is a character class - matching any one character in it. In order to include a POSIX character class (like [:alnum:]) inside it, you need two sets of brackets. Also, there is no order in a character class; the fact that you included - inside it just means it would be one of the matched characters, not that alphanumeric characters would be matched after it. Finally, if you want to match any number of alphanumerics, you have your quantifier * on the wrong thing.
Thus, "match -, then any number of alphanumeric characters" would be -([[:alnum:]]*), except... Python does not support POSIX character classes. So you have to write your own: -([A-Za-z0-9]*).
However, that will not capture anything from your string, because the intervening space is, as you note, not an alphanumeric character. To account for that, use -\s*([A-Za-z0-9]*).
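A rough sketch of the points above, using the two strings from the question. It first shows which single characters the original bracket expression matches, then runs the hand-written replacement for the POSIX class.

```python
import re

# The bracket expression is just a set of single characters: - . : a l n u m
set_hits = re.findall(r'[-.\:alnum:]', 'qwertyuio')
print(set_hits)  # ['u']

# Hyphen, optional whitespace, then a captured run of ASCII alphanumerics.
fixed = r'-\s*([A-Za-z0-9]*)'
with_hyphen = re.findall(fixed, '12 - mystr')
no_hyphen = re.findall(fixed, 'qwertyuio')
print(with_hyphen)  # ['mystr']
print(no_hyphen)    # []
```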
Not quite sure what you want to match. I'll assume you don't want to include '-' in any matches.
If you want to get the alphanumeric chars that follow the first '-', skipping any whitespace in between, you can do something like this.
re.match(r'.*?-\s*([a-zA-Z\d]+)', inputString)
If you want to find each string of alphanumerics directly after each '-' then you can do this.
re.findall(r'(?<=-)[a-zA-Z\d]+', inputString)

Matching alternating alphanumeric characters with regex

I want to match the following alphanumeric combinations using regex; ao1 a12 01p p1p 1ap 1p1.
With the following regex I can match all but p1p and 1p1:
[a-z][0-9]{1,2}|[0-9]{1,2}[a-z]|[a-z][0-9][a-z]|[a-z]{1,2}[0-9]|[0-9][a-z][0-9]
How do I match the alternating number/letter/number and letter/number/letter correctly using regular expressions? It needs to match precisely 3 characters, they occur within sentences.
You may use
(?<!\S)(?=[a-z]{0,2}\d)(?=\d{0,2}[a-z])[a-z\d]{3}(?!\S)
Details
(?<!\S) - a whitespace or start of string should be immediately to the left of the current location
(?=[a-z]{0,2}\d) - there must be a digit after 0 to 2 letters immediately to the right of the current location
(?=\d{0,2}[a-z]) - there must be a letter after 0 to 2 digits immediately to the right of the current location
[a-z\d]{3} - three letters or digits are matched
(?!\S) - a whitespace or end of string should be immediately to the right of the current location.
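A minimal sketch to check the pattern against the six combinations from the question. The extra tokens abc, 123 and ab12 are my own additions, included to show what is rejected.

```python
import re

TOKEN_RE = re.compile(r"(?<!\S)(?=[a-z]{0,2}\d)(?=\d{0,2}[a-z])[a-z\d]{3}(?!\S)")

# First six tokens are from the question; the last three are assumed negatives.
sentence = "ao1 a12 01p p1p 1ap 1p1 abc 123 ab12"
found = TOKEN_RE.findall(sentence)
print(found)  # ['ao1', 'a12', '01p', 'p1p', '1ap', '1p1']
```

The two lookaheads require at least one digit and at least one letter among the three characters, so abc and 123 fail, while ab12 fails the whitespace boundary check on the right.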
Are you looking for something like below?
([\d][a-zA-Z][\d]|[a-zA-Z][\d][a-zA-Z]|[a-zA-Z]{2}[\d]|[a-zA-Z][\d]{2}|[\d]{2}[a-zA-Z]|[\d][a-zA-Z]{2})
So if you need only number/letter/number and letter/number/letter, the pattern below should work. But your input ao1 doesn't match these criteria.
\d[a-z]\d|[a-z]\d[a-z]
