Pad an integer using a regular expression - python

I'm using regular expressions with a Python framework to pad a specific number in a version number:
10.2.11
I want to transform the second element to be padded with a zero, so it looks like this:
10.02.11
My regular expression looks like this:
^(\d{2}\.)(\d{1})([\.].*)
If I just regurgitate back the matching groups, I use this string:
\1\2\3
When I use my favorite regular expression test harness (http://kodos.sourceforge.net/), I can't get it to pad the second group. I tried \1\20\3, but that interprets the second reference as group 20 rather than group 2.
Because of the library I'm using this with, I need it to be a one-liner. The library takes a regular expression string, and then a string for what should be used to replace it with.
I'm assuming I just need to escape the matching groups string, but I can't figure it out. Thanks in advance for any help.

How about a completely different approach?
version_string = "10.2.11"
nums = version_string.split('.')
# "%02d" pads every component to at least two digits
print(".".join("%02d" % int(n) for n in nums))  # 10.02.11

What about taking the dots out of the capture groups, so that a literal . separates the group references in the replacement?
^(\d{2})\.(\d{1})[\.](.*)
replace with:
\1.0\2.\3
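A quick check of that pattern and replacement with Python's re.sub (assuming the library hands both strings to something equivalent):

```python
import re

# The dots are literal text in the replacement, so the "0" sits between
# ".", and \2 and no group reference is immediately followed by a digit.
padded = re.sub(r"^(\d{2})\.(\d{1})[\.](.*)", r"\1.0\2.\3", "10.2.11")
print(padded)  # 10.02.11
```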

Try this:
(^\d(?=\.)|(?<=\.)\d(?=\.)|(?<=\.)\d$)
And replace each match with 0\1. This will make any number at least two digits long.
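A minimal sketch of the lookaround version with Python's re.sub; the helper name pad_components is only for illustration. Each alternative matches a lone digit that is bounded by the start, the end, or a dot, so multi-digit components are left alone.

```python
import re

# A single digit at the start, between two dots, or at the end.
SINGLE_DIGIT = re.compile(r"(^\d(?=\.)|(?<=\.)\d(?=\.)|(?<=\.)\d$)")

def pad_components(version):
    # Prefix each lone digit with "0".
    return SINGLE_DIGIT.sub(r"0\1", version)

print(pad_components("10.2.11"))  # 10.02.11
print(pad_components("1.2.3"))    # 01.02.03
```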

Does your library support named groups? That might solve your problem.
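If the library passes the replacement string through to Python's re.sub (an assumption), the \g<...> syntax solves the ambiguity directly: per the re docs, \20 means group 20, while \g<2>0 means group 2 followed by a literal 0. A sketch with both numbered and named groups:

```python
import re

version = "10.2.11"

# \g<1> ends the reference explicitly, so the following "0" is literal text
# instead of turning \1 into \10.
numbered = re.sub(r"^(\d{2}\.)(\d{1})([\.].*)", r"\g<1>0\2\3", version)

# The same idea with named groups.
named = re.sub(
    r"^(?P<major>\d{2}\.)(?P<minor>\d)(?P<rest>\..*)",
    r"\g<major>0\g<minor>\g<rest>",
    version,
)

print(numbered)  # 10.02.11
print(named)     # 10.02.11
```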

Related

Extract Number before a Character in a String Using Python

I'm trying to extract the number before character "M" in a series of strings. The strings may look like:
"107S33M15H"
"33M100S"
"12M100H33M"
So basically there are sets of numbers separated by different characters, and "M" may show up more than once. For the examples here, I would like my code to return:
33
33
12,33  # the delimiter doesn't matter here
One way I could think of is to split the string by "M", and find items that are pure numbers, but I suspect there are better ways to do it. Thanks a lot for the help.
You can use a simple (\d+)M regex (one or more digits followed by M, where the digits are captured into a capture group) with re.findall.
For example:
import re
s = "107S33M15H\n33M100S\n12M100H33M"
print(re.findall(r"(\d+)M", s))
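To get the per-string output asked for (33, 33, 12,33), the same regex can be applied to each string separately and the captures joined; the helper name numbers_before_m is hypothetical.

```python
import re

def numbers_before_m(s):
    # Capture every run of digits that is immediately followed by "M".
    return ",".join(re.findall(r"(\d+)M", s))

for s in ["107S33M15H", "33M100S", "12M100H33M"]:
    print(numbers_before_m(s))
# 33
# 33
# 12,33
```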
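You can use rpartition for the simpler case where you only need the number before the last M:
s = '107S33M15H'
prefix = s.rpartition('M')[0]  # '107S33'; the number is the run of digits at the end of prefix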
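rpartition returns everything before the last M, so the trailing digits still have to be taken off the end of prefix. A minimal sketch of that step without a regex; the helper name number_before_last_m is hypothetical, and it only finds the number before the last M (33, not 12,33, for the third sample).

```python
def number_before_last_m(s):
    prefix = s.rpartition('M')[0]  # everything before the last "M"
    digits = []
    # Walk backwards from the end of prefix while characters are digits.
    for ch in reversed(prefix):
        if not ch.isdigit():
            break
        digits.append(ch)
    return ''.join(reversed(digits))

print(number_before_last_m('107S33M15H'))  # 33
```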

Regex giving tuple and not full match

I'm trying to use a regex to find proxy addresses on a website. Currently I'm using this regex: (\d{1,3}\.){3}\d{1,3}:(\d+). It works on regexr.com and in Sublime Text, but when I try to use it in Python it doesn't work as expected.
This is the piece of code I'm using:
p = re.compile("(\d{1,3}\.){3}\d{1,3}:(\d+)")
ipCandidates = p.findall(soupString)
It should return proxies like 120.206.182.172:8123, but instead it returns tuples like ('44.', '3128'). What can I do to fix this?
Thank you.
re.findall() only returns the contents of capturing groups instead of the whole match (if you have such groups in your regex).
Then, you're repeating a capturing group three times, which means that only the third repetition is preserved (the other two are overwritten).
Change your regex to
p = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}:\d+")
and you'll get whole matches.
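A small comparison on a made-up sample string: findall with the original capturing pattern returns only the group contents (the last repetition of the octet group, plus the port), while the non-capturing pattern returns whole matches. If you want to keep the groups, finditer yields Match objects whose group(0) is always the whole match.

```python
import re

text = "proxies: 120.206.182.172:8123, 10.0.0.44:3128"

capturing = re.compile(r"(\d{1,3}\.){3}\d{1,3}:(\d+)")
non_capturing = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}:\d+")

groups = capturing.findall(text)      # group contents only
whole = non_capturing.findall(text)   # whole matches
via_finditer = [m.group(0) for m in capturing.finditer(text)]

print(groups)        # [('182.', '8123'), ('0.', '3128')]
print(whole)         # ['120.206.182.172:8123', '10.0.0.44:3128']
print(via_finditer)  # ['120.206.182.172:8123', '10.0.0.44:3128']
```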
If you do want tuples of the separate submatches (without the dots and colon), you can do that, too, but you can't use repetition then:
p = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d+)")
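With one group per octet and one for the port, findall returns a five-element tuple per address; the helper name proxy_parts is only illustrative.

```python
import re

# No repetition, so every group keeps its own submatch.
PROXY_PARTS = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d+)")

def proxy_parts(text):
    return PROXY_PARTS.findall(text)

print(proxy_parts("120.206.182.172:8123"))
# [('120', '206', '182', '172', '8123')]
```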
Also, always use raw strings for regexes, so regex escape sequences and string escape sequences can't be confused.

How can I extract two values from a string like this using a regular expression?

How can I get the value from the following strings using one regular expression?
/*##debug_string:value/##*/
or
/*##debug_string:1234/##*/
or
/*##debug_string:http://stackoverflow.com//##*/
The result should be
value
1234
http://stackoverflow.com/
Guessing at what your pattern is meant to match:
re.findall(r"/\*##debug_string:(.*?)/##\*/", your_string)
Note that your variations cannot work because you didn't escape the *. In regular expressions, * means a repetition of the previous character/group. If you really mean the * character, you must use \*.
import re
print(re.findall(r"/\*##debug_string:(.*?)/##\*/", "/*##debug_string:value/##*/"))
print(re.findall(r"/\*##debug_string:(.*?)/##\*/", "/*##debug_string:1234/##*/"))
print(re.findall(r"/\*##debug_string:(.*?)/##\*/", "/*##debug_string:http://stackoverflow.com//##*/"))
Executes as:
['value']
['1234']
['http://stackoverflow.com/']
EDIT: OK, I see that you can have a URL. I've amended the pattern to take it into account.
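Instead of escaping the * by hand, the fixed delimiters can be passed through re.escape, which backslash-escapes regex metacharacters such as *. A sketch using the question's three sample strings:

```python
import re

# re.escape makes the "*" (and other metacharacters) in the delimiters literal;
# the lazy (.*?) stops at the first "/##*/", so a URL value is captured whole.
DEBUG_RE = re.compile(
    re.escape("/*##debug_string:") + r"(.*?)" + re.escape("/##*/")
)

samples = [
    "/*##debug_string:value/##*/",
    "/*##debug_string:1234/##*/",
    "/*##debug_string:http://stackoverflow.com//##*/",
]
values = [DEBUG_RE.search(s).group(1) for s in samples]
print(values)  # ['value', '1234', 'http://stackoverflow.com/']
```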
Use this regex:
[^:]+:([^/]+)
And use capture group #1 for your value. (This stops at the first /, so it does not handle the URL case.)
Live Demo: http://www.rubular.com/r/FxFnpfPHFn
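A quick check of [^:]+:([^/]+) against the question's three sample strings, using re.search and group 1:

```python
import re

PATTERN = re.compile(r"[^:]+:([^/]+)")

samples = [
    "/*##debug_string:value/##*/",
    "/*##debug_string:1234/##*/",
    "/*##debug_string:http://stackoverflow.com//##*/",
]
# Group 1 runs from the first ":" up to (not including) the next "/".
results = [PATTERN.search(s).group(1) for s in samples]
print(results)  # ['value', '1234', 'http:']
```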
Your regex will be something like .*:(.*)/.+, where group 1 is what you are looking for. However, this is a REALLY inclusive regex; you might want to post more details so that you can add more restrictions.
Assuming that the format stays consistent:
re.findall(r'debug_string:([^/]+)/##', string)

Regular Expressions Dependent on Previous Matches

For example, how could we recognize a string of the following format with a single RE:
LenOfStr:Str
An example string in this format is:
5:5:str
The string we're looking for is "5:str".
In Python, maybe something like the following (this doesn't work):
r'(?P<len>\d+):(?P<str>.{int((?P=len))})'
In general, is there a way to transform previously matched groups before using them, or have I just asked yet another question that regular expressions aren't meant for?
Thanks.
Yep, what you're describing is outside the bounds of regular expressions. Regular expressions only deal with actual character data. This provides some limited ability to make matches dependent on context (e.g., (.)\1 to match the same character twice), but you can't apply arbitrary functions to pieces of an in-progress match and use the results later in the same match.
You could do something like search for text matching the regex (\d+):\w+, and then postprocess the results to check if the string length is equal to the int value of the first part of the match. But you can't do that as part of the matching process itself.
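A minimal sketch of that search-then-check idea: the regex (\d+):(\w+) only finds candidates, and the length comparison happens in Python afterwards. The helper name and the sample string are illustrative; note that \w excludes ':', so this exact pattern would not accept the question's 5:str value.

```python
import re

def length_prefixed_matches(text):
    results = []
    for m in re.finditer(r"(\d+):(\w+)", text):
        # Keep the candidate only if its length equals the declared length.
        if len(m.group(2)) == int(m.group(1)):
            results.append(m.group(2))
    return results

print(length_prefixed_matches("3:abc 2:xyz 4:abcd"))  # ['abc', 'abcd']
```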
Well this can be done with a regex (if I understand the question):
>>> import re
>>> s='5:5:str and some more characters...'
>>> m=re.search(r'^(\d+):(.*)$',s)
>>> m.group(2)[0:int(m.group(1))]
'5:str'
It just cannot be done by dynamically changing the previous match group.
You can make it look like a single regex like so:
>>> re.sub(r'^(\d+):(.*)$',lambda m: m.group(2)[0:int(m.group(1))],s)
'5:str'

Grouping in Python Regular Expressions

So I'm playing around with regular expressions in Python. Here's what I've gotten so far (debugged through RegExr):
##(VAR|MVAR):([a-zA-Z0-9]+)+(?::([a-zA-Z0-9]+))*##
So what I'm trying to match is stuff like this:
##VAR:param1##
##VAR:param2:param3##
##VAR:param4:param5:param6:0##
Essentially, you have either VAR or MVAR followed by a colon then some param name, then followed by the end chars (##) or another : and a param.
So, what I've gotten for the groups on the regex is the VAR, the first param, and then the last thing in the parameter list (for the last example, the 3rd group would be 0). I understand that groups are created by (...), but is there any way for the regex to match the multiple groups, so that param5, param6, and 0 are in their own group, rather than only having a maximum of three groups?
I'd like to avoid having to match this string then having to split on :, as I think this is capable of being done with regex. Perhaps I'm approaching this the wrong way.
Essentially, I'm attempting to see if I can find and split in the matching process rather than a postprocess.
If this format is fixed, you don't need a regex; it just makes things harder. Just use split:
text.strip('#').split(':')
should do it.
The number of groups in a regular expression is fixed. You will need to postprocess somehow.
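One way to combine both: validate the whole token with one regex, capture the repeated :param part as a single group, and split that group afterwards. A sketch; the names TOKEN_RE and parse_token are hypothetical, and the pattern is a simplified variant of the question's.

```python
import re

# The repeated ":param" part is captured as ONE group, because a regex
# cannot produce a variable number of groups.
TOKEN_RE = re.compile(r"##(VAR|MVAR)((?::[a-zA-Z0-9]+)+)##")

def parse_token(text):
    m = TOKEN_RE.fullmatch(text)
    if m is None:
        return None
    kind = m.group(1)
    params = m.group(2)[1:].split(":")  # drop the leading ":" then split
    return kind, params

print(parse_token("##VAR:param4:param5:param6:0##"))
# ('VAR', ['param4', 'param5', 'param6', '0'])
```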
