I have a piece of code that records times in this format:
0.0-8.0
0.0-9.0
0.0-10.0
I want to use a regular expression that finds all of these strings. I have looked at a couple of references for help but am still confused. I understand how to do it for single-digit numbers, but I can't figure out how to handle two-digit numbers like 10 or 20.
It is also important that the expression does not find the string
0.0-1.0
as it should be ignored.
So far my expression looks like this:
expression = re.compile(r',0\.0\-[0-2][0-9]')
If you want to match each line shown in your question, try an expression like this:
0\.0\-[0-2]?\d\.\d
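\d matches a digit; for ASCII input it is the same as [0-9] (in Python 3 it also matches other Unicode digits unless you pass re.ASCII). The ? means 0 or 1 occurrences, so [0-2]?\d matches a 1- or 2-digit number (with a tens digit of 0 to 2). If you need the comma at the start of the regex, add that in.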
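A minimal Python sketch of the pattern above, run against the lines from the question plus the unwanted 0.0-1.0 (the sample text is made up for illustration). It also shows the answer's preferred route of dropping 0.0-1.0 in ordinary code rather than in the regex:

```python
import re

# "0.0-", then an optional tens digit (0-2), a digit, "." and one more digit
TIME_RE = re.compile(r"0\.0-[0-2]?\d\.\d")

# Hypothetical sample input built from the question's examples
log = """0.0-8.0
0.0-9.0
0.0-10.0
0.0-1.0"""

matches = TIME_RE.findall(log)
# Exclude the unwanted interval in code, keeping the regex simple
wanted = [m for m in matches if m != "0.0-1.0"]
print(wanted)  # ['0.0-8.0', '0.0-9.0', '0.0-10.0']
```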
If you want to exclude 0.0-1.0, it is better to do that in code rather than in the regular expression, since excluding it in the regex makes the pattern less readable. But if you insist, here is one that excludes that string for you:
0\.0\-[0-2]?[0-9]\.(?<!0-1\.)\d
This uses a negative lookbehind to ensure the previous part is not 0-1., which would only occur in the match you didn't want.
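To check the lookbehind version, here is a small Python sketch that tests it with fullmatch() against the question's strings and one extra value, 0.0-11.0 (chosen here only to show that "-11." does not trip the lookbehind):

```python
import re

# After the ".", the negative lookbehind (?<!0-1\.) fails if the four
# characters just consumed are "0-1.", which only happens for 0.0-1.0
TIME_RE = re.compile(r"0\.0-[0-2]?[0-9]\.(?<!0-1\.)\d")

samples = ["0.0-8.0", "0.0-9.0", "0.0-10.0", "0.0-1.0", "0.0-11.0"]
results = {s: bool(TIME_RE.fullmatch(s)) for s in samples}
print(results)
```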
I have an input that is valid if it has these parts:
it starts with letters (upper and lower case), digits and some of the following characters: !, #, $, ?
then a part that begins with = and contains only digits
then a part that begins with "<<" and may contain anything
Example: !!Hel##lo!#=7<<vbnfhfg
What is the right regular expression in Python to check whether the input is valid?
I am trying with
pattern= r"([a-zA-Z0-9|!|#|#|$|?]{2,})([=]{1})([0-9]{1})([<]{2})([a-zA-Z0-9]{1,})/+"
but apparently it is wrong.
For testing regexes I can really recommend regex101. It makes it much easier to understand what your regex is doing and which strings it matches.
Now, for your regex pattern and the example you provided, you need to remove the /+ at the end. Then it matches your example string. However, it splits the input into five capture groups, not the three you seem to want based on your list. To split it into three capture groups you could use this:
r"([a-zA-Z0-9!#$?]{2,})(=[0-9]+)(<<.*)"
This returns the capture groups:
!!Hel##lo!#
=7
<<vbnfhfg
Notice I simplified your last group a little, using a dot instead of the list of characters. A dot matches any character except a newline, so change that back to your approach if you don't want to match special characters.
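Since the question is about deciding whether the whole input is valid, a sketch using re.fullmatch() may help: it only succeeds if the entire string fits the pattern, so trailing junk is rejected. The helper name parse and the second (invalid) sample are made up for illustration:

```python
import re

# Three groups: leading run of allowed characters, "=" plus digits, "<<" plus anything
VALID_RE = re.compile(r"([a-zA-Z0-9!#$?]{2,})(=[0-9]+)(<<.*)")

def parse(text):
    # Hypothetical helper: return the three parts, or None if the input is invalid
    m = VALID_RE.fullmatch(text)
    return m.groups() if m else None

print(parse("!!Hel##lo!#=7<<vbnfhfg"))   # ('!!Hel##lo!#', '=7', '<<vbnfhfg')
print(parse("!!Hel##lo!#=7a<<vbnfhfg"))  # None: "a" is not a digit
```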
I have some articles containing match scores like 13-9, 34-12 and 22-10, which I want to extract in Python using a regular expression. re.compile(r'[0-9]+-[0-9]') works, but how can I modify it to exclude values like 1999-06 and 2020-01? I tried re.compile(r'[0-9]{1,2}-[0-9]'), but then those year values come back as 99-06, which is also invalid in my case.
You can match the exact number of digits you need and use a lookbehind assertion so that you don't match a slice of a longer number, like below:
(?<!\d)\d{2}-\d{1,2}
You can avoid matching in the middle of a number with
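r'(?<!\d)[0-9]{1,2}-[0-9]{1,2}'
The negative lookbehind prohibits matching immediately after another digit. Using [0-9]{1,2} after the dash as well keeps two-digit scores such as 34-12 whole instead of cutting them to 34-1.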
Perhaps also add
(?!\d)
at the end to impose a similar restriction at the end of the match.
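Putting the lookbehind and the trailing (?!\d) together, a short Python sketch on a made-up sentence containing both scores and year-month values:

```python
import re

# (?<!\d) stops a match from starting inside a longer number,
# (?!\d) stops it from ending inside one
SCORE_RE = re.compile(r"(?<!\d)[0-9]{1,2}-[0-9]{1,2}(?!\d)")

# Hypothetical sample text
text = "Final scores: 13-9, 34-12, 22-10. Published 1999-06, updated 2020-01."
print(SCORE_RE.findall(text))  # ['13-9', '34-12', '22-10']
```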
I want to know whether a string is a collection of, for example, numbers ([0-9]).
In this case, I'm using the regular expression [0-9](,[0-9])* to match one or more digits separated by commas (a collection of numbers).
Is there a better way to do it? I mean a shorter expression perhaps.
I would suggest the following pattern:
(?<=^|,|\s)(\d+)
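(?<=...) is a lookbehind assertion: it is not captured into a group and is not included in the matched string. Here it marks the starting position of the number (the start of the string, a comma or whitespace). Note that Python's built-in re module rejects this particular lookbehind, because its alternatives have different widths (^ is zero-width, while , and \s are one character each); engines such as PCRE accept it.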
You can try the above pattern interactively in the following website:
https://regex101.com/r/IKGWtA/1
\d*(,\d*)* will catch the situation where you have multiple digits before and after a comma, e.g. 100,000. The regex in the question would only grab 0,0 from that same number.
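For the original goal of checking that a whole string is a comma-separated list of numbers, a sketch with re.fullmatch() and \d+ (so that each item has at least one digit) may be closer to what was asked. For extracting numbers in Python, the sketch swaps in a fixed-width negative lookbehind, (?<![^,\s]), meaning "not preceded by anything other than a comma or whitespace"; this is an alternative technique, not the one from the answer above. The helper name is_number_list and the sample strings are made up:

```python
import re

# Validation: digits, optionally followed by any number of ",digits" groups
LIST_RE = re.compile(r"\d+(?:,\d+)*")

def is_number_list(s):
    # Hypothetical helper: True only if the entire string fits the pattern
    return LIST_RE.fullmatch(s) is not None

print(is_number_list("1,22,333"))  # True
print(is_number_list("1,,2"))      # False
print(is_number_list("100,000"))   # True

# Extraction: a one-character negative lookbehind is fixed-width, so Python's
# re accepts it, and it also succeeds at the very start of the string
NUM_RE = re.compile(r"(?<![^,\s])\d+")
print(NUM_RE.findall("12,345 6,78"))  # ['12', '345', '6', '78']
```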
This is similar to another question, but with a difference subtle enough that I still need some help.
Currently I have:
'(.*)\[(\d+\-\d+)\]'
as my regex, which matches any number of characters followed by square brackets [] that contain two numbers separated by a dash. My issue is that I'd like it to also match when there is just one number between the square brackets, and possibly even when there is nothing between them. So:
word[1-5] = match
word[5] = match
word[] = match (not essential)
and ensuring
word[-5] = no match
Could anyone point me in the direction of the next step? I currently find regex to be a bit of a guessing game, though I would like to get better at it.
Start from yours and make the last part optional using ?:
(.*)\[(\d+(-\d+)?)\]
For the other case (empty brackets), use ? again, this time on the whole group inside the brackets, just before \]:
(.*)\[(\d+(-\d+)?)?\]
A working example http://rubular.com/r/t0MaHyHfeS
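The same check in Python, as a small sketch that runs the final pattern against the four cases from the question with re.fullmatch():

```python
import re

# Digits, optionally "-digits"; the outer ? also allows empty brackets
BRACKET_RE = re.compile(r"(.*)\[(\d+(-\d+)?)?\]")

samples = ["word[1-5]", "word[5]", "word[]", "word[-5]"]
results = {s: bool(BRACKET_RE.fullmatch(s)) for s in samples}
print(results)
# {'word[1-5]': True, 'word[5]': True, 'word[]': True, 'word[-5]': False}
```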
Use ? to match 0 or 1 occurrence.
So apply ? to the -\d+ part and also to the whole dash-separated group:
(.*)\[(\d+(-\d+)?)?\]
There is no need to escape -. It has a special meaning only inside a character class, where it forms a range between two characters.
(.*)\[((\d+(?:-\d+)?)?)\]
This will match all of the cases, even with no digits inside the brackets, and gives you these groups (for match[1-5]):
group 1: match
group 2: 1-5
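One detail of the double-group version, shown in a short Python sketch: because group 2 wraps the optional inner group, it comes back as an empty string rather than None when the brackets are empty, while the inner group 3 is None:

```python
import re

# Group 2 always participates; inner group 3 only when digits are present
PATTERN = re.compile(r"(.*)\[((\d+(?:-\d+)?)?)\]")

print(PATTERN.match("match[1-5]").groups())  # ('match', '1-5', '1-5')
print(PATTERN.match("word[]").groups())      # ('word', '', None)
```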
Not every regex interpreter supports this, but you could try an "or" operator for the part inside the brackets:
'(.*)\[(\d+\-\d+|\d+)\]'
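Python's re does support alternation. A quick sketch with the same four cases (the redundant \- escape is dropped here) shows that this pattern covers the two essential cases but, unlike the ?-based answers, does not accept empty brackets:

```python
import re

# Either "digits-digits" or plain "digits" inside the brackets
ALT_RE = re.compile(r"(.*)\[(\d+-\d+|\d+)\]")

samples = ["word[1-5]", "word[5]", "word[]", "word[-5]"]
results = {s: bool(ALT_RE.fullmatch(s)) for s in samples}
print(results)
# {'word[1-5]': True, 'word[5]': True, 'word[]': False, 'word[-5]': False}
```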
I've looked through the forums but could not find exactly how to solve my problem.
Let's say I have a string like the following:
UDK .636.32/38.082.4454.2(575.3)
and I would like to match the expression with a regex, capturing the actual number (in this case the '.636.32/38.082.4454.2(575.3)').
There could be some garbage characters between the 'UDK' and the actual number, and characters like '.', '/' or '-' are valid parts of the number. Essentially the number is a sequence of digits separated by some allowed characters.
What I've come up with is the following regex:
'UDK.*(\d{1,3}[\.\,\(\)\[\]\=\'\:\"\+/\-]{0,3})+'
but it does not capture the whole '.636.32/38.082.4454.2(575.3)' in the group! It leaves me with nothing more than the last digit of the last group (3 in this case).
Any help would be greatly appreciated.
First, you need a non-greedy .*?.
Second, you don't need to escape most characters inside [ ].
Third, why not simply treat it as a sequence of digits and allowed characters? Why use \d{1,3} when the number contains 4454?
>>> re.match(r'UDK.*?([\d.,()\[\]=\':"+/-]+)', s).group(1)
'.636.32/38.082.4454.2(575.3)'
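Why the original pattern only kept the tail: a capturing group under a quantifier, like (...)+, keeps only its last repetition, and the greedy .* eats almost everything before it. A small Python sketch of that behaviour, plus the question's own idea with a lazy .*? and the repetition wrapped in one outer group (the character class is reduced here for readability, with the quote characters omitted):

```python
import re

# A quantified capturing group keeps only its LAST repetition
print(re.match(r"(\d)+", "2024").group(1))      # '4'
# Wrapping the repetition in one outer group captures the whole run
print(re.match(r"((?:\d)+)", "2024").group(1))  # '2024'

s = "UDK .636.32/38.082.4454.2(575.3)"
# Lazy .*? plus one capturing group around the repeated (digits + separators) unit
pattern = r"UDK.*?((?:\d{1,3}[.,()\[\]=:+/-]{0,3})+)"
print(re.match(pattern, s).group(1))  # '636.32/38.082.4454.2(575.3)'
```

The leading "." is still lost, because every repetition has to start with a digit; that is one reason the single character class in the answer above works better.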
Not so much a direct answer to your problem, but a general regexp tip: use Kodos (http://kodos.sourceforge.net/). It is simply awesome for composing/testing out regexps. You can enter some sample text, and "try out" regular expressions against it, seeing what matches, groups, etc. It even generates Python code when you're done. Good stuff.
Edit: using Kodos I came up with:
UDK.*?(?P<number>[\d/.)(]+)
as a regexp which matches the given example. Code that Kodos produces is:
import re
rawstr = r"""UDK.*?(?P<number>[\d/.)(]+)"""
matchstr = """UDK .636.32/38.082.4454.2(575.3)"""
# method 1: using a compile object
compile_obj = re.compile(rawstr)
match_obj = compile_obj.search(matchstr)
# Retrieve group(s) by name
number = match_obj.group('number')