I want to know whether a string is a collection of, for example, numbers ([0-9]).
In this case, I'm using the regular expression [0-9](,[0-9])* to find one or more numbers separated by commas (a collection of numbers).
Is there a better way to do it? I mean a shorter expression, perhaps.
I would suggest the following pattern:
(?<=^|,|\s)(\d+)
(?<=...) is a lookbehind assertion: the text it matches is neither captured into a group nor included in the matched string. It is used to identify the starting position of the number to be matched.
You can try the above pattern interactively in the following website:
https://regex101.com/r/IKGWtA/1
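One caveat if this pattern is used from Python rather than on regex101: as far as I know, Python's built-in re module only accepts fixed-width lookbehind, and a zero-width ^ mixed with one-character alternatives does not qualify. A minimal sketch of an equivalent, assuming Python's re, flips the test into a negative lookbehind: the number must not be preceded by a character other than a comma or whitespace. The sample string is made up for illustration.

```python
import re

# Hypothetical sample input, not taken from the question.
text = "1,22 333,a4"

# The original lookbehind mixes a zero-width ^ with one-character
# alternatives; Python's re is expected to reject it at compile time.
try:
    re.compile(r"(?<=^|,|\s)(\d+)")
    original_compiles = True
except re.error:
    original_compiles = False

# Equivalent check: the digits must not follow a character that is
# neither a comma nor whitespace (the start of the string passes trivially).
numbers = re.findall(r"(?<![^,\s])\d+", text)
print(original_compiles, numbers)
```

The 4 in a4 is skipped because it follows a letter, which mirrors what the original lookbehind does.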
\d*(,\d*)* will catch the situation where you have multiple digits before and after a comma, e.g. 100,000. Your original regex allows only a single digit on each side of a comma, so it will only grab 0,0 from that same number.
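If the goal is to validate the whole string rather than to find numbers inside it, a sketch along these lines may help. It assumes Python's re module and swaps \d* for \d+, because \d*(,\d*)* also accepts the empty string and stray commas. The helper name is_number_list is made up for illustration.

```python
import re

# One or more digits, then any number of ",digits" groups.
NUMBER_LIST = re.compile(r"\d+(?:,\d+)*")

def is_number_list(s: str) -> bool:
    """Return True if the whole string is a comma-separated list of numbers."""
    return NUMBER_LIST.fullmatch(s) is not None

for candidate in ["100,000", "1,2,3", "", "1,,2", "1,a"]:
    print(repr(candidate), is_number_list(candidate))
```

Using fullmatch avoids having to add ^ and $ anchors to the pattern.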
I have an input that is valid if it has these parts:
starts with letters (upper and lower case), numbers, and some of the following characters (!,#,#,$,?)
begins with = and contains only numbers
begins with "<<" and may contain anything
example: !!Hel##lo!#=7<<vbnfhfg
What is the right regular expression in Python to check whether the input is valid?
I am trying with
pattern= r"([a-zA-Z0-9|!|#|#|$|?]{2,})([=]{1})([0-9]{1})([<]{2})([a-zA-Z0-9]{1,})/+"
but apparently it is wrong.
For testing regex I can really recommend regex101. It makes it much easier to understand what your regex is doing and what strings it matches.
Now, for your regex pattern and the example you provided, you need to remove the /+ at the end. Then it matches your example string. However, it splits it into five capture groups and not into three, as I understand you want from your list. To split it into three capture groups you could use this:
"([a-zA-Z0-9!##$?]{2,})([=]{1}[0-9]+)(<<.*)"
This returns the capture groups:
!!Hel##lo!#
=7
<<vbnfhfg
Notice I simplified your last group a little bit, using a dot instead of the list of characters. A dot matches any character except a newline, so change that back to your approach in case you don't want to match special characters.
Here is a link to your regex in regex101: link.
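As a rough sketch of how the three-group pattern could be used from Python, assuming the whole input must match (hence fullmatch): the helper name split_input is hypothetical, and the pattern is the one given above, unchanged.

```python
import re

# Three groups: leading text, "=" plus digits, "<<" plus anything.
PATTERN = re.compile(r"([a-zA-Z0-9!##$?]{2,})([=]{1}[0-9]+)(<<.*)")

def split_input(s):
    """Return the three parts as a tuple, or None if the input is invalid."""
    m = PATTERN.fullmatch(s)
    return m.groups() if m else None

print(split_input("!!Hel##lo!#=7<<vbnfhfg"))
print(split_input("=7<<vbnfhfg"))  # first part missing, so this is rejected
```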
I have some articles containing match scores like 13-9, 34-12, 22-10, which I want to extract in Python using a regular expression. re.compile(r'[0-9]+-[0-9]') works, but how can I modify it to exclude 1999-06 and 2020-01? I tried re.compile(r'[0-9]{1,2}-[0-9]'), but those year values are returned as 99-06, which is also invalid in my case.
You can match the exact number of digits required by using a lookbehind assertion, so that long numbers are not sliced, like below:
(?<!\d)\d{2}-\d{1,2}
You can avoid matching in the middle of a number with
r'(?<!\d)[0-9]{1,2}-[0-9]'
The negative lookbehind prohibits matching immediately after another digit.
Perhaps also add
(?!\d)
at the end to impose a similar restriction at the end of the match.
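Putting both assertions together might look like the sketch below. One assumption on my part: the part after the hyphen is widened to \d{1,2}, because with a single [0-9] followed by (?!\d) a score such as 34-12 would no longer match. The sample sentence is invented for illustration.

```python
import re

# No digit directly before the score and none directly after it.
SCORE = re.compile(r"(?<!\d)\d{1,2}-\d{1,2}(?!\d)")

# Hypothetical article text mixing scores with year-month values.
text = "Scores were 13-9, 34-12 and 22-10 in the 1999-06 and 2020-01 seasons."

scores = SCORE.findall(text)
print(scores)
```

The year values are skipped because every candidate start inside them either follows a digit or cannot reach the hyphen within two digits.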
I have expressions with this form...
##name<·parameters·>
...and I want a regular expression that matches the groups name and parameters. As I have a closed (and small) set of values for name, I prefer to use a for loop to try each of the few values, but parameters can be anything... anything except <· and ·>, which are the sequences for opening and closing sets of parameters.
I found this question and I tried this...
##(name)<·((?!(<·|·>).*))·>
...but I can't get it working. I think the reason is that there the excluded expression is known in position and in number of repetitions (1), but in my case I want to exclude every occurrence of either of these two sequences in a string of unknown length.
Do you know how to do it? Thank you.
Your regex should be:
##(name)<·((?:(?!<·|·>).)*)·>
This is the negative lookahead method. The key part is (?!<·|·>). which matches any character (the dot) only if it does not start <· or ·>; (?:(?!<·|·>).)* repeats that zero or more times (the star).
or
Non-greedy method.
##(name)<·(.*?)·>
You can also use the following regex:
##([^<]*)<\·([^\·]+)\·>
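To see how the negative lookahead and non-greedy patterns differ, here is a small sketch assuming Python's re module. The function names and sample strings are hypothetical; name is passed through re.escape so it can come from the for loop mentioned in the question.

```python
import re

def parse_call(text, name):
    """Negative lookahead version: parameters may not contain <· or ·>."""
    pattern = "##(" + re.escape(name) + r")<·((?:(?!<·|·>).)*)·>"
    m = re.search(pattern, text)
    return m.groups() if m else None

def parse_call_lazy(text, name):
    """Non-greedy version: stops at the first ·> but allows <· inside."""
    pattern = "##(" + re.escape(name) + r")<·(.*?)·>"
    m = re.search(pattern, text)
    return m.groups() if m else None

print(parse_call("##name<·x=1, y=2·>", "name"))
print(parse_call("##name<·a <· b·>", "name"))       # nested opener is rejected
print(parse_call_lazy("##name<·a <· b·>", "name"))  # nested opener is accepted
```

So the two only agree when the parameters contain neither delimiter; the lookahead version is the one that enforces the exclusion.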
I have a piece of code that records times in this format:
0.0-8.0
0.0-9.0
0.0-10.0
I want to use a regular expression that will find all of these strings and have checked here and here for help but am still confused. I understand how to do it if I only wanted to do single digit numbers, but I can't figure out how to handle double digit numbers like 10 or 20.
It is also important that the expression does not find the string
0.0-1.0
as it should be ignored.
So far my expression looks like this:
expression = re.compile(r',0\.0\-[0-2][0-9]')
If you want to match each line shown in your question, try an expression like this:
0\.0\-[0-2]?\d\.\d
\d is the same as [0-9]. The ? means 0 or 1 occurrences, so this will only match 1- or 2-digit numbers. If you need the comma at the start of the regex, add that in.
If you want to exclude 0.0-1.0, then you should do that in code, not in the regular expression, since that would make it less readable. But if you insist, I have included one that will exclude that string for you:
0\.0\-[0-2]?[0-9]\.(?<!0-1\.)\d
This uses a negative lookbehind to ensure the previous part is not 0-1., which would only occur in the match you didn't want.
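For comparison, excluding the unwanted string in code, as recommended above, could look like this sketch. The names TIME_RANGE, IGNORED and find_ranges, and the sample text, are made up for illustration.

```python
import re

# Simple pattern: 0.0- followed by a one- or two-digit number and one decimal.
TIME_RANGE = re.compile(r"0\.0-[0-2]?\d\.\d")

# Values that should be dropped after matching.
IGNORED = {"0.0-1.0"}

def find_ranges(text):
    """Return all time ranges in text except the ignored ones."""
    return [m for m in TIME_RANGE.findall(text) if m not in IGNORED]

sample = ",0.0-8.0 ,0.0-9.0 ,0.0-10.0 ,0.0-1.0"
print(find_ranges(sample))

# The lookbehind variant should give the same result in a single pattern.
print(re.findall(r"0\.0\-[0-2]?[0-9]\.(?<!0-1\.)\d", sample))
```

Keeping the exclusion in a set makes it easy to ignore further values later without touching the pattern.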
How do I match a sequence of numbers preceded by certain text but not return the text, just the sequence of numbers?
For example, let's assume I have the following string:
url = "sampleurl/485734/abcdefgh/83275/"
I want to match the number that comes right after the word sampleurl. So far, I've been using the following code
re.search("sampleurl/[0-9]+", url).group(0)[10:]
that works, but I'm assuming there is a fancier way of doing that instead of needing to use [10:] at the end.
For quick reference, I've been using regex101 to validate the regex.
You can place a capturing group around the part you want and refer to that group number for the match result.
re.search(r'sampleurl/(\d+)', url).group(1)
Another way would be to use a lookbehind assertion.
re.search(r'(?<=sampleurl/)\d+', url).group(0)
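Both variants raise an AttributeError if the prefix is missing, because re.search then returns None. A small sketch that guards against this, with a hypothetical helper name id_after_prefix:

```python
import re

def id_after_prefix(url):
    """Return the digits right after 'sampleurl/', or None if absent."""
    m = re.search(r"sampleurl/(\d+)", url)
    return m.group(1) if m else None

url = "sampleurl/485734/abcdefgh/83275/"
print(id_after_prefix(url))
print(id_after_prefix("otherurl/123/"))

# The lookbehind form returns the same digits as group(0).
lookbehind_result = re.search(r"(?<=sampleurl/)\d+", url).group(0)
print(lookbehind_result)
```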