Assume I have a word AB1234XZY or even 1AB1234XYZ.
I want to extract ONLY 'AB1234' or '1AB1234' (i.e. everything up to the letters at the end).
I have used the following code to extract that but it's not working:
base= re.match(r"^(\D+)(\d+)", word).group(0)
When I print base, it works for the first case but not for the second. Any ideas why?
Your regex doesn't work for the second case because it starts with a number; the \D at the beginning of your pattern matches anything that ISN'T a number.
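A quick check of the question's pattern on the two sample words (a small sketch; the loop and print format are illustrative only). On the second word re.match returns None, so calling .group(0) on it raises AttributeError instead of printing anything.

```python
import re

# The question's pattern: one or more non-digits, then one or more digits.
pattern = r"^(\D+)(\d+)"

for word in ["AB1234XZY", "1AB1234XYZ"]:
    m = re.match(pattern, word)
    # \D+ cannot match the leading "1", so the second word yields no match.
    print(word, "->", m.group(0) if m else None)
```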
You can use something quite simple for this, simpler in fact than anything else I see here.
'.*\d'
That's it! This should match everything up to and including the last number in your string, and ignore everything after that.
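A minimal sketch of the '.*\d' idea in Python, assuming re.match (which anchors at the start of the string). The helper name prefix_through_last_digit is hypothetical, introduced only for illustration.

```python
import re

def prefix_through_last_digit(word):
    # Greedy .* first runs to the end of the string, then backtracks one
    # character at a time until \d can match, so the match ends at the
    # last digit in the word.
    m = re.match(r".*\d", word)
    return m.group(0) if m else None  # None when the word has no digit

print(prefix_through_last_digit("AB1234XZY"))   # AB1234
print(prefix_through_last_digit("1AB1234XYZ"))  # 1AB1234
```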
(.+?\d+)\w+ would give you what you want.
Or even something like this
^(.+?)[a-zA-Z]+$
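Both patterns above can be checked side by side on the two sample words (a sketch; the constant names are illustrative). In each case the wanted prefix is in group 1, not group 0.

```python
import re

# Lazy prefix ending in a run of digits, followed by more word characters.
LAZY_DIGITS = re.compile(r"(.+?\d+)\w+")
# Lazy prefix followed by a run of letters that reaches the end.
TRAILING_LETTERS = re.compile(r"^(.+?)[a-zA-Z]+$")

for word in ["AB1234XZY", "1AB1234XYZ"]:
    print(word,
          LAZY_DIGITS.match(word).group(1),
          TRAILING_LETTERS.match(word).group(1))
```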
re.match only tries to match at the beginning of the string, while re.search looks for the pattern anywhere in the string; both return the first match. .group(0) is the entire match; if you have capturing groups, .group(1) is the first group, .group(2) the second, and so on. Unlike the usual convention where 0 is the first index, here 0 is a special case meaning the whole match.
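A short illustration of the match/search and group(0)/group(n) distinction, using a made-up input string that does not start with the interesting part.

```python
import re

text = "xx AB1234XYZ"  # hypothetical input

# re.match only tries position 0, where "x" is not a digit.
print(re.match(r"\d+", text))  # None

# re.search scans forward and returns the first match it finds.
m = re.search(r"([A-Z]+)(\d+)", text)
print(m.group(0))  # AB1234  (whole match)
print(m.group(1))  # AB      (first capturing group)
print(m.group(2))  # 1234    (second capturing group)
```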
In your case, depending on what you really need to capture, re.search may be the better choice, and instead of two groups you can use (\D+\d+). Keep in mind that it captures the first (non-digits, digits) run it finds. That might be sufficient for you, but you might need to be more specific.
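To see why the caveat matters, here is (\D+\d+) with re.search on both sample words. On the second word the search cannot start at the leading "1", so it returns "AB1234" rather than "1AB1234".

```python
import re

for word in ["AB1234XZY", "1AB1234XYZ"]:
    m = re.search(r"(\D+\d+)", word)
    print(word, "->", m.group(1))
# Second word prints "AB1234": \D+ cannot start at "1",
# so re.search moves on and starts matching at "A".
```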
After reading your comment ("everything before the letters at the end"), this regex is what you need:
regex = re.compile(r'(.+?)[A-Za-z]+$')
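A comparison of a greedy (.+)[A-Za-z] with a lazy, end-anchored (.+?)[A-Za-z]+$ on one sample word. The greedy group backtracks only far enough to leave a single letter, so it keeps all but the last letter; the lazy, anchored version leaves the whole trailing run of letters outside the group.

```python
import re

word = "1AB1234XYZ"

# Greedy: (.+) gives back just one character so [A-Za-z] can match "Z".
print(re.match(r"(.+)[A-Za-z]", word).group(1))     # 1AB1234XY

# Lazy + anchored: (.+?) grows until [A-Za-z]+ can reach the end.
print(re.match(r"(.+?)[A-Za-z]+$", word).group(1))  # 1AB1234
```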
Related
I've looked through many regex examples here, but still haven't found a solution.
I have to check a request string for a certain substring. The substring will always have something before it and might have something after it:
?something=xxx&to_dep=YYY&from_dep=zzz&...
OR
?something=xxx&to_dep=YYY
I need to extract YYY without the trailing & in the first case, and simply YYY in the second case.
For now I use this kind of regexp:
re.search('to_dep=(.+?)&', req.query_string)
but it works in only one of the two cases, and I can't use it with re.sub (to replace YYY with something else), because the & gets replaced too.
Any help?
Just try with:
[?&]to_dep=([^&]*)
[^&]* matches any run of characters other than &, so it stops at the next & (first case) or at the end of the string (second case).
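A sketch of this pattern on both query strings from the question (with the trailing "&..." dropped). Because the match itself includes the [?&] and "to_dep=", a replacement has to write that prefix back; capturing it and using \1 is one way to do that (the value "NEW" is a placeholder).

```python
import re

queries = [
    "?something=xxx&to_dep=YYY&from_dep=zzz",
    "?something=xxx&to_dep=YYY",
]

for q in queries:
    m = re.search(r"[?&]to_dep=([^&]*)", q)
    print(m.group(1))  # YYY in both cases
    # For re.sub, capture the prefix and put it back with \1.
    print(re.sub(r"([?&]to_dep=)[^&]*", r"\1NEW", q))
```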
For both, you might use a positive lookbehind and a negated class:
re.search(r'(?<=to_dep=)[^&]+', req.query_string)
And this will give you only YYY, which then means you can also use it in re.sub:
re.sub(r'(?<=to_dep=)[^&]+', 'new_value', req.query_string)
[^&] matches any character except &.
(?<=to_dep=) makes sure there's a to_dep= before the part to match.
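Both lookbehind calls from this answer, run on the two query strings from the question (trailing "&..." dropped). Since the lookbehind is zero-width, the match is only the value, so re.sub leaves "to_dep=" and the following "&" untouched.

```python
import re

queries = [
    "?something=xxx&to_dep=YYY&from_dep=zzz",
    "?something=xxx&to_dep=YYY",
]

for q in queries:
    # The match is just the value: "to_dep=" is checked, not consumed.
    print(re.search(r"(?<=to_dep=)[^&]+", q).group(0))  # YYY
    # So only the value is replaced.
    print(re.sub(r"(?<=to_dep=)[^&]+", "new_value", q))
```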
I have two regexes (simplified to be equal)
r'^(?P<slug>(^foo)[-\w]+)/$'
r'^(?P<slug>(^foo)[-\w]+)/$'
I would like to add an exclusion to the first regex that checks the end of the string, so that the second one wins.
For example:
foobar/ should pass the first and never the latter
I want foobar-my-string/ to fail the first but match the latter
I have tried @sdanzig's answer:
r'^(?P<slug>(^foo)[-\w]+(?!my-string$))/$'
r'^(?P<slug>(^foo)[-\w]+)/$'
But it doesn't work: I always end up in the latter, whether or not the string ends with "my-string".
I also tried it the other way around as my regexes are evaluated top to bottom, but it also doesn't work:
r'^(?P<slug>(^foo)[-\w]+(my-string$))/$'
r'^(?P<slug>(^foo)[-\w]+(?!my-string$))/$'
You should use this negative lookahead for the second regex. In your version, [-\w]+ is greedy, so it consumes the entire string before the negative lookahead is ever checked.
p = r'(?P<slug>(?!.*my-string/$)(^foo)[-\w]+)'
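A quick check of the lookahead pattern above with re.match on the two example paths from the question. Because the lookahead sits before [-\w]+, it inspects the whole remaining string, including the final "/".

```python
import re

p = r'(?P<slug>(?!.*my-string/$)(^foo)[-\w]+)'

for path in ["foobar/", "foobar-my-string/"]:
    m = re.match(p, path)
    # The second path fails: .*my-string/$ matches, so (?!...) rejects it.
    print(path, "->", m.group("slug") if m else None)
```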
Correction: for this particular requirement, you need a lookbehind assertion just before the $, to make sure the string doesn't end with my-string/:
(foo[-\w\/]+)(?<!my-string\/)$
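The lookbehind version can be checked the same way with re.search (Python's re accepts \/ as an escaped literal slash). The lookbehind is tested only once [-\w\/]+ has reached the end of the string.

```python
import re

pattern = r"(foo[-\w\/]+)(?<!my-string\/)$"

for path in ["foobar/", "foobar-my-string/"]:
    m = re.search(pattern, path)
    print(path, "->", m.group(1) if m else None)
```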
I'm not really sure what you're trying to do with the P... it looks like you want to capture it, optionally? You could put (?:P)? just before the foo:
((?:P<slug>)?foo[-\w\/]+)(?<!my-string\/)$
Is there any way to directly replace all groups using regex syntax?
The normal way:
re.match(r"(?:aaa)(_bbb)", string1).group(1)
But I want to achieve something like this:
re.match(r"(\d.*?)\s(\d.*?)", "(CALL_GROUP_1) (CALL_GROUP_2)")
I want to build the new string instantaneously from the groups the Regex just captured.
Have a look at re.sub:
result = re.sub(r"(\d.*?)\s(\d.*?)", r"\1 \2", string1)
This is Python's regex substitution (replace) function. The replacement string can be filled with so-called backreferences (backslash, group number) which are replaced with what was matched by the groups. Groups are counted the same as by the group(...) function, i.e. starting from 1, from left to right, by opening parentheses.
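A small example of backreferences in a replacement string, using a made-up pattern and input that swaps two captured words. Match.expand applies the same kind of template to a match object you already have.

```python
import re

name = "Ada Lovelace"  # hypothetical input

# \2 and \1 are filled in with what the groups matched.
print(re.sub(r"(\w+)\s(\w+)", r"\2, \1", name))  # Lovelace, Ada

# Same template applied to an existing match object.
m = re.match(r"(\w+)\s(\w+)", name)
print(m.expand(r"\2, \1"))  # Lovelace, Ada
```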
The accepted answer is perfect. I would add that group reference is probably better achieved by using this syntax:
r"\g<1> \g<2>"
for the replacement string. This way you avoid ambiguity when a group reference is followed by a digit: \g<1>0 means group 1 followed by a literal 0, whereas \10 would be read as a reference to group 10. Again, this is all in the docs; nothing new, just sometimes hard to spot at first sight.
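A minimal demonstration of the digit ambiguity, using a toy pattern with a single group. \g<1>0 works, while \10 is parsed as group 10, which does not exist here, so re.sub raises re.error.

```python
import re

# Group 1 followed by a literal "0".
print(re.sub(r"(a)", r"\g<1>0", "abc"))  # a0bc

# \10 refers to group 10, which this one-group pattern does not have.
try:
    re.sub(r"(a)", r"\10", "abc")
except re.error as exc:
    print("error:", exc)
```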
I am writing a regex that will be used for recognizing commands in a string. I have three possible words the commands could start with and they always end with a semi-colon.
I believe the regex pattern should look something like this:
(command1|command2|command3).+;
The problem, I have found, is that since . matches any character and + tells it to match one or more, it skips right over the first instance of a semi-colon and continues going.
Is there a way to get it to stop at the first instance of a semi-colon it comes across? Is there something other than . that I should be using instead?
The issue you are facing with (command1|command2|command3).+; is that + is greedy: it matches as much as possible, so it runs on to the last semicolon in the string.
To fix this, you will need to make it non-greedy, and to do that you need to add the ? operator, like so: (command1|command2|command3).+?;
Just as an FYI, the same applies to the * operator: adding a ? makes it non-greedy.
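A side-by-side of the greedy and non-greedy versions on a made-up string containing two commands.

```python
import re

s = "command1 foo; command2 bar;"  # hypothetical input

greedy = re.search(r"(command1|command2|command3).+;", s).group(0)
lazy = re.search(r"(command1|command2|command3).+?;", s).group(0)

print(greedy)  # command1 foo; command2 bar;
print(lazy)    # command1 foo;
```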
Tell it to find only non-semicolons.
[^;]+
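The negated-class approach on the same kind of made-up input. [^;]+ can never cross a semicolon, so each match stops at the first one. The command alternation is written as a non-capturing group here (an adjustment, not from the answer) so that re.findall returns whole matches instead of just the command name.

```python
import re

s = "command1 foo; command2 bar;"  # hypothetical input

# Each match ends at the first ";" after the command word.
print(re.findall(r"(?:command1|command2|command3)[^;]+;", s))
# ['command1 foo;', 'command2 bar;']
```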
What you are looking for is a non-greedy match.
.+?
The ? after the greedy + quantifier makes it match as little as possible instead of as much as possible, which is the default.
Your regex would be
'(command1|command2|command3).+?;'
See Python RE documentation
I'm using Python and I want to use regular expressions to check if something "is part of an include list" but "is not part of an exclude list".
My include list is represented by a regex, for example:
And.*
Everything which starts with And.
Also the exclude list is represented by a regex, for example:
(?!Andrea)
Everything, but not the string Andrea. The exclude list is obviously a negation.
Using the two examples above, for example, I want to match everything which starts with And except for Andrea.
In the general case I have an includeRegEx and an excludeRegEx, and I want to match everything that matches includeRegEx but not excludeRegEx. Note that excludeRegEx is written in negative form (as in the example above), so more precisely: if something matches includeRegEx, I check whether it also matches excludeRegEx, and if it does, the match succeeds. Is it possible to express this in a single regular expression?
I think Conditional Regular Expressions could be the solution but I'm not really sure of that.
I'd like to see a working example in Python.
Thank you very much.
Why not put both in one regex?
And(?!rea$).*
Since the lookahead only "looks ahead" without consuming any characters, this works just fine (well, this is the whole point of lookaround, actually).
So, in Python:
import re

subject = "Andy"  # example input
if re.match(r"And(?!rea$).*", subject):
    # Successful match.
    # Note that re.match always anchors the match
    # to the start of the string.
    print("match")
else:
    # Match attempt failed
    print("no match")
From the wording of your question, I'm not sure if you're starting with two already finished lists of "match/don't match" pairs. In that case, you could simply combine them automatically by concatenating the regexes. This works just as well but is uglier:
(?!Andrea$)And.*
In general, then:
(?!excludeRegex$)includeRegex
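A sketch of building the combined pattern automatically. The helper include_but_not is hypothetical; it takes the exclude pattern in positive form (e.g. Andrea, not (?!Andrea)) and, as an extra precaution not in the answer, wraps both parts in (?:...) so a top-level alternation like a|b stays grouped. It relies on re.match anchoring at the start of the string.

```python
import re

def include_but_not(include, exclude):
    # Hypothetical helper: reject strings that are exactly `exclude`,
    # then let `include` do the actual matching.
    return re.compile(rf"(?!(?:{exclude})$)(?:{include})")

rx = include_but_not(r"And.*", r"Andrea")
for name in ["Andy", "Andrea", "Andreas", "Bob"]:
    print(name, bool(rx.match(name)))
# Andy True, Andrea False, Andreas True, Bob False
```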