python regex match a group or not match it - python

I want to match the string:
from string as string
It may or may not contain as.
The current code I have is
r'(?ix) from [a-z0-9_]+ [as ]* [a-z0-9_]+'
But this code also matches a single a or s, so something like from string a little ends up in the result as well.
What is the correct way of doing this?

You may use
(?i)from\s+[a-z0-9_]+\s+(?:as\s+)?[a-z0-9_]+
Note that you use the x ("verbose", free-spacing) modifier, so all the spaces in your pattern (outside character classes) become formatting whitespace that the re engine ignores when parsing the pattern. Thus, I suggest using \s+ to match 1 or more whitespace characters. If you really want to match single regular spaces, just omit the x modifier and use a regular space. If you need the x modifier to insert comments, escape the regular spaces:
r'(?ix) from\ [a-z0-9_]+\ (?:as\ )?[a-z0-9_]+'
Also, to match a sequence of chars, you need to use a grouping construct rather than a character class. Here, (?:as\s+)? defines an optional non-capturing group that matches 1 or 0 occurrences of as + space substring.
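To see the effect of the x modifier, here is a minimal sketch that compiles the plain pattern, a commented verbose variant with escaped spaces, and the original pattern from the question. The names PLAIN, VERBOSE and UNESCAPED are only illustrative.

```python
import re

# Non-verbose: \s+ matches the separators, (?:as\s+)? makes "as " optional as a unit.
PLAIN = re.compile(r'(?i)from\s+[a-z0-9_]+\s+(?:as\s+)?[a-z0-9_]+')

# Verbose: unescaped spaces are ignored, so the literal spaces are escaped.
VERBOSE = re.compile(r'''(?ix)
    from\ [a-z0-9_]+\     # keyword and first name
    (?:as\ )?             # optional "as "
    [a-z0-9_]+            # second name
''')

# The original verbose pattern: its spaces outside [...] are dropped.
UNESCAPED = re.compile(r'(?ix) from [a-z0-9_]+ [as ]* [a-z0-9_]+')

for text in ('from string as string', 'from string string'):
    print(text, bool(PLAIN.fullmatch(text)), bool(VERBOSE.fullmatch(text)),
          bool(UNESCAPED.search(text)))
```

Both inputs should be accepted by PLAIN and VERBOSE, while UNESCAPED finds nothing because it requires a name character immediately after from.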

Related

Need expression not to match after a colon appear

So I have a list of names and wanted to filter out the ones in proper format. For reference, the format I need is IP::hostname. This is the regex formula I currently have:
^\d+(\.|\:)\d+\.\d+\.\d+::.+\w$
However, I need to modify it so that it does not match if there are any colons (:) in or after the hostname.
This matches which is correct:
10.179.12.241::CALMGTVCSRM0210
This matches but should not:
10.179.12.241::CALMGTVCSRM0210:as
Any help on how to modify my expression to not match any colons after the host name would be appreciated
The .+ pattern matches 1 or more chars other than line break chars, as many as possible, and thus matches colons allowing them. You need a negated character class, [^:]*, that will match 0+ chars other than a colon.
You may fix your regex (and enhance it a bit) using
^\d+[.:]\d+\.\d+\.\d+::[^:]*\w$
                       ^^^^^
To make sure you only match a valid IP, you'd rather use
^(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}::[^:]*\w$
The (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) part matches a single octet from 0 to 255, and (?:\.<octet_pattern>){3} matches three repetitions of a dot followed by an octet pattern.
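In Python, the stricter pattern could be assembled from the octet sub-pattern so it is written only once. This is just a sketch; OCTET, ENTRY and is_valid_entry are names made up for the example.

```python
import re

# One octet, 0 to 255.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Four octets, "::", then a hostname with no colons that ends in a word character.
ENTRY = re.compile(rf'^{OCTET}(?:\.{OCTET}){{3}}::[^:]*\w$')

def is_valid_entry(entry):
    """Return True if entry looks like IP::hostname with no further colons."""
    return ENTRY.match(entry) is not None

for entry in ('10.179.12.241::CALMGTVCSRM0210',
              '10.179.12.241::CALMGTVCSRM0210:as'):
    print(entry, is_valid_entry(entry))
```

The negated class [^:]* is what rejects the second entry: it cannot consume the extra colon, so \w$ never gets to the end of the string.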

Regular Expression to match a mandatory symbol in an optional part of a string?

What is the regular expression that matches a mandatory symbol in an optional part of a string?
For example, abcd will be matched by the RE but, if I add :, the resulting string will not be matched unless I add letter(s) afterwards like this abcd:efg.
So, the optional part is the : onward, and the mandatory symbol in this optional part is the : itself.
abcd:efg:hijk also needs to be matched.
UPDATE:
I tried this ^([a-z]|_)*(:[a-z]|_)*$ but it did not work as expected.
You should include more examples and counter-examples, but this should be close enough to your goal:
^[a-z_]+(:[a-z_]+)*$
The problem with your ^([a-z]|_)*(:[a-z]|_)*$ regex is that it only matches one letter after each :. a:b:c:d matches but not a:b:c:de.
Finally, please note that (:[a-z]|_) is:
a colon followed by a letter
or an underscore.
It doesn't match a colon followed by an underscore!
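A quick way to check both points is to run the two patterns side by side. This is a small sketch; ORIGINAL and FIXED are just labels for the regex from the question and the one suggested above.

```python
import re

# Regex from the question: after the first part, each repetition is
# either ":" plus ONE letter, or a lone underscore.
ORIGINAL = re.compile(r'^([a-z]|_)*(:[a-z]|_)*$')

# Suggested regex: each optional part is ":" plus one or more letters/underscores.
FIXED = re.compile(r'^[a-z_]+(:[a-z_]+)*$')

for text in ('abcd', 'abcd:efg', 'abcd:efg:hijk', 'abcd:', 'a:b:c:de', 'a:_'):
    print(text, bool(ORIGINAL.match(text)), bool(FIXED.match(text)))
```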
I would prefer a regex with a positive lookbehind. This also makes it easier to group the matching parts. It first matches the first string, and then matches all the following strings when they are preceded by a ":".
([a-z_]*)((?<=:):[a-z_])?
https://regex101.com/r/NkiZ3g/1
Your problem is that you need to know how to express optionality for a stretch longer than a single character. Try this:
^abcd(:efg)?$
For abcd and efg substitute whatever you're really looking for.

search for string embedded in {} after keyword

How can I get the string embedded in {} after a keyword, where the number of characters between the keyword and the braces {} is unknown. e.g.:
includegraphics[x=2]{image.pdf}
the keyword would be includegraphics and the string to be found is image.pdf, but the text in between [x=2] could have anything between the two [].
So I want to ignore all characters between the keyword and { or I want to ignore everything between []
Use re.findall
>>> import re
>>> sample = 'includegraphics[x=2]{image.pdf}'
>>> re.findall('includegraphics.*?{(.*?)}', sample)
['image.pdf']
Explanation:
The re module deals with regular expressions in Python. Its findall method is useful to find all occurrences of a pattern in a string.
A regular expression for the pattern you are interested in is 'includegraphics.*?{(.*?)}'. Here . symbolizes "any character", while the * means 0 or more times. The question mark makes this a non-greedy operation. From the documentation:
The *, +, and ? qualifiers are all greedy; they match as much
text as possible. Sometimes this behaviour isn’t desired; if the RE
<.*> is matched against <H1>title</H1>, it will match the entire
string, and not just <H1>. Adding ? after the qualifier makes it
perform the match in non-greedy or minimal fashion; as few characters
as possible will be matched. Using .*? in the previous expression will
match only <H1>.
Please note that while in your case using .*? should be fine, in general it's better to use more specialized character classes, such as \w for alphanumerics and \d for digits, when you know in advance what the content is going to consist of.
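The difference between greedy and non-greedy matching shows up as soon as the string contains more than one pair of braces. A short sketch, using a made-up sample with two includegraphics occurrences:

```python
import re

# Hypothetical input with two graphics on one line.
sample = 'includegraphics[x=2]{a.pdf} text includegraphics{b.pdf}'

# Greedy: the first .* runs to the last "{", so only the last file is captured.
greedy = re.findall(r'includegraphics.*\{(.*)\}', sample)

# Non-greedy: each .*? stops at the nearest "{" or "}".
lazy = re.findall(r'includegraphics.*?\{(.*?)\}', sample)

print(greedy)
print(lazy)
```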
Use re.search
re.search(r'includegraphics\[[^\[\]]*\]\{([^}]*)\}', s).group(1)
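Keep in mind that re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on such input. A possible wrapper is sketched below; the helper name first_graphic is made up, and making the [..] part optional is my own assumption, not something the question asked for.

```python
import re

# "[...]" is optional here (an assumption); the braces content is captured.
BRACED = re.compile(r'includegraphics(?:\[[^\[\]]*\])?\{([^}]*)\}')

def first_graphic(text):
    """Return the first braces content after includegraphics, or None."""
    match = BRACED.search(text)
    return match.group(1) if match else None

print(first_graphic('includegraphics[x=2]{image.pdf}'))
print(first_graphic('no graphics here'))
```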

python regular expression of a string

I have a Python string
wrong_data_type is not one of the allowed values `([one_two, two_three, three_four])`
and I have a regexp:
\w+ is not one of the allowed values`\(\[\w,+\)\]`
However, it does not match. Any help?
The regexp should be
\w+ is not one of the allowed values `\(\[(?:\w+, )*\w+\]\)`
Fixes:
Added space after values.
\]\) at the end instead of \)\].
Inside the brackets, need to allow multiple \w, so it should be \w+.
Need to have a space after ,.
Need a group around \w+, to match multiple comma-separated words using the * quantifier.
Then have to match a single last word with no comma after it.
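Putting the fixes together, here is a sketch of how the corrected pattern could be used in Python. The two capturing groups and the names MESSAGE, field and allowed are additions for illustration, so the field name and the list of values can be pulled out.

```python
import re

# Corrected pattern, with capturing groups added (an assumption) around
# the field name and the comma-separated list.
MESSAGE = re.compile(
    r'(\w+) is not one of the allowed values `\(\[((?:\w+, )*\w+)\]\)`'
)

text = ('wrong_data_type is not one of the allowed values '
        '`([one_two, two_three, three_four])`')

match = MESSAGE.search(text)
field = match.group(1)                 # name before " is not one of ..."
allowed = match.group(2).split(', ')   # individual allowed values

print(field)
print(allowed)
```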
data = re.search(r'\(\[[\w,\s]+\]\)', string).group()
You can use the following:
\w+ is not one of the allowed values `\(\[[\w,\s]+\]\)`

Joining two regular expressions together

I have two regular expressions, one matching all characters [a-z] and the other excluding the following combination of characters [^spuz(ih)] (the characters s, p, u, z, ih). How would I combine these two so that I could allow all alphanumeric characters except those listed in the second RE?
re.match(r'^[a-z]*(?![spuz]|ih)[a-z]s$', insert_phrase)
You can't "combine" them as such, but you can write another regular expression which has the same effect. For this, you can use the (?!...) negative lookahead construct. It matches 0 characters, and it only succeeds if the regular expression inside it does not match the text that follows. So you can use:
'(?![spuz(ih)])[a-z]'
Or, since this wasn't what you wanted, change it to:
'(?![spuz]|ih)[a-z]'
In the changed question, you seem to want negative lookbehind instead. This turns the pattern into:
'^[a-z]*(?<![a-z][spuz]|ih)s$'
Note the extra [a-z] in the lookbehind part. It is required because lookbehind expressions must be fixed width, so both alternatives have to be two characters long. As a side effect, the lookbehind can never match when only one character precedes the final s, which means that a string like 'ps' will match the pattern, but you don't want that. So instead, it's better to use two separate lookbehinds (both of which have to be true for the string to match):
'^[a-z]*(?<![spuz])(?<!ih)s$'
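The difference between the two lookbehind variants is easy to verify in Python. A minimal sketch, with COMBINED and SEPARATE as made-up names for the two patterns above:

```python
import re

# Single lookbehind: both alternatives are two characters wide.
COMBINED = re.compile(r'^[a-z]*(?<![a-z][spuz]|ih)s$')

# Two lookbehinds: one character wide and two characters wide.
SEPARATE = re.compile(r'^[a-z]*(?<![spuz])(?<!ih)s$')

for word in ('cats', 'cups', 'fihs', 'ps', 's'):
    print(word, bool(COMBINED.match(word)), bool(SEPARATE.match(word)))
```

Only 'ps' should be treated differently: COMBINED lets it through because there are not two characters before the s, while SEPARATE rejects it.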
