Representing version number as regular expression

I need to represent version numbers as regular expressions. The broad definition is:
Consist only of numbers
Allow any number of decimal points (but not consecutively)
No limit on the maximum number
So 2.3.4.1, 2.3, 2 and 9999.9999.9999 are all valid, whereas 2.. and 2.3. are not.
I wrote the following simple regex:
'(\d+\.{0,1})+'
Using it in Python with the re module to match '2.6.31' gives:
>>> y = re.match(r'(\d+\.{0,1})+$','2.6.31')
>>> y.group(0)
'2.6.31'
>>> y.group(1)
'31'
But if I name the group, then the named group only has 31.
Is my regex representation correct or can it be tuned/improved? It does not currently handle the 2.3. case.

The notation {0,1} can be shortened to just ?:
r'(\d+\.?)+$'
However, the above will allow a trailing dot (for example 2.3.). Perhaps try:
r'\d+(\.\d+)*$'
Once you have validated the format matches what you expect, the easiest way to get the numbers out is with re.findall():
>>> ver = "1.2.3.4"
>>> re.findall(r'\d+', ver)
['1', '2', '3', '4']
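The validate-then-extract idea above can be sketched as a small helper. This is only a minimal sketch: the names VERSION_RE and parse_version are made up for illustration, and re.fullmatch is used in place of the trailing $ on the assumption that a trailing newline should be rejected too. The last line also shows why group(1) held only the final component in the question: a repeated capturing group keeps just its last repetition.

```python
import re

# Hypothetical names for illustration; the pattern is the one suggested above.
VERSION_RE = re.compile(r'\d+(\.\d+)*')

def parse_version(text):
    """Return the numeric components of a dotted version string, or None if invalid."""
    if VERSION_RE.fullmatch(text) is None:
        return None
    return [int(part) for part in text.split('.')]

print(parse_version('2.6.31'))   # [2, 6, 31]
print(parse_version('2.3.'))     # None
# A repeated group only remembers its last repetition:
print(VERSION_RE.fullmatch('2.6.31').group(1))   # .31
```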

Alternatively, you might want to use pyparsing:
>>> from pyparsing import *
>>> integer = Word(nums)
>>> parser = delimitedList(integer, delim='.') + StringEnd()
>>> list(parser.parseString('1.2.3.4'))
['1', '2', '3', '4']
or lepl:
>>> from lepl import *
>>> with Separator(~Literal('.')):
...     parser = Integer()[1:]
>>> parser.parse('1.2.3.4')
['1', '2', '3', '4']

Related

how to find the matching pattern for an input list and then replace the found pattern with the proper pattern conversion using python

Note that the final two digits of each pattern, for example the 48 in FBXASC048, are meant to be the ASCII code of a digit (0-9).
Example input list: ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
Example result: ['0009Car', '5002Toy', '2004Human']
What is the proper way to search for any of these patterns in an input list:
num_ascii = ['FBXASC048', 'FBXASC049', 'FBXASC050', 'FBXASC051', 'FBXASC052', 'FBXASC053', 'FBXASC054', 'FBXASC055', 'FBXASC056', 'FBXASC057']
and then replace the pattern found with the corresponding item in conv_list (not randomly, because each element in the pattern list corresponds to exactly one element in conv_list)?
conv_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
This is the solution I have in mind. It has two parts:
1st part: find the ASCII codes [48, 49, 50, 51, 52, 53, 54, 55, 56, 57] and replace them with the matching decimal digits (0-9). This gives a new list, called input_modi_list, in which the ASCII codes have been replaced with digits.
2nd part: a second pass that uses the replace function to remove the fixed prefix 'FBXASC0':
new_list3 = []
for x in input_modi_list:
    y = x.replace('FBXASC0', '')
    new_list3.append(y)
So new_list3 will hold the combined result of the two parts described above.
I don't know whether there is a simpler or better solution, maybe using regex.
Also note that I have no idea how to replace the ASCII codes with digits for a list of items.

I think this should do the trick:
import re
input_list = ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
pattern = re.compile(r'FBXASC(\d{3,3})')
def decode(match):
    return chr(int(match.group(1)))
result = [re.sub(pattern, decode, item) for item in input_list]
print(result)
Some explanation is due:
1- The pattern object is a regular expression that matches any part of a string that starts with 'FBXASC' and ends with 3 digits (0-9). (\d means digit, and {3,3} means that it should occur at least 3 and at most 3 times, i.e. exactly 3 times.) The parentheses around \d{3,3} mean that the three digits matched are stored for later use (explained in the next part).
2- The decode function receives a match object, uses .group(1) to extract the first matched group (which in our case is the three digits matched by \d{3,3}), then uses the int function to parse the string into an integer (for example, converting '048' to 48), and finally uses the chr function to find which character has that ASCII code (for example, chr(48) returns '0' and chr(65) returns 'A').
3- The final part applies the re.sub function to every element of the list, replacing each occurrence of the pattern you described (FBXASC followed by 3 digits) with its corresponding ASCII character.
You can see that this solution is not limited to your specific examples. Any number can be used as long as it has a corresponding character recognized by the chr function.
But if you do want to limit it to the 48-57 range, you can simply modify the decode function:
def decode(match):
    ascii_code = int(match.group(1))
    if 48 <= ascii_code <= 57:
        return chr(ascii_code)
    else:
        return match.group(0)  # returns the entire match - no modification
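As an alternative to checking the range inside decode, the restriction to codes 048-057 can be pushed into the pattern itself. A minimal sketch, assuming the prefix is always FBXASC followed by exactly three digits; DIGIT_CODE and decode_digits are hypothetical names, not part of the answer above:

```python
import re

# 04[89] covers 048-049 and 05[0-7] covers 050-057: the ASCII codes of '0'-'9'.
DIGIT_CODE = re.compile(r'FBXASC(04[89]|05[0-7])')

def decode_digits(item):
    """Replace every FBXASC048..FBXASC057 marker with the digit it encodes."""
    return DIGIT_CODE.sub(lambda m: chr(int(m.group(1))), item)

input_list = ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
result = [decode_digits(item) for item in input_list]
print(result)  # ['0009Car', '5002Toy', '2004Human']
```

Markers outside that range, such as FBXASC065, simply do not match and are left in place.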

This is how I would do it.
Make the regex pattern by simply joining the strings with |:
>>> num_ascii = ['FBXASC048', 'FBXASC049', 'FBXASC050', 'FBXASC051', 'FBXASC052', 'FBXASC053', 'FBXASC054', 'FBXASC055', 'FBXASC056', 'FBXASC057']
>>> conv_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> regex_pattern = '|'.join(num_ascii)
>>> regex_pattern
'FBXASC048|FBXASC049|FBXASC050|FBXASC051|FBXASC052|FBXASC053|FBXASC054|FBXASC055|FBXASC056|FBXASC057'
Make a look-up dictionary by simply zipping the two lists:
>>> conv_table = dict(zip(num_ascii, conv_list))
>>> conv_table
{'FBXASC048': '0', 'FBXASC049': '1', 'FBXASC050': '2', 'FBXASC051': '3', 'FBXASC052': '4', 'FBXASC053': '5', 'FBXASC054': '6', 'FBXASC055': '7', 'FBXASC056': '8', 'FBXASC057': '9'}
Iterate over the data and replace the matched string with the corresponding digit:
>>> import re
>>> result = []
>>> for item in ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']:
...     m = re.match(regex_pattern, item)
...     matched_string = m[0]
...     digit = (conv_table[matched_string])
...     print(f'replacing {matched_string} with {digit}')
...     result.append(item.replace(matched_string, digit))
...
replacing FBXASC048 with 0
replacing FBXASC053 with 5
replacing FBXASC050 with 2
>>> result
['0009Car', '5002Toy', '2004Human']
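The loop above assumes every item starts with one of the patterns; otherwise re.match returns None and m[0] raises a TypeError. A hedged variant that uses re.sub with a replacement function, so items without a marker pass through unchanged. The helper name convert and the 'NoMarker' test item are made up for illustration:

```python
import re

num_ascii = ['FBXASC0%d' % code for code in range(48, 58)]  # 'FBXASC048' .. 'FBXASC057'
conv_list = [str(d) for d in range(10)]
conv_table = dict(zip(num_ascii, conv_list))

# re.escape is not strictly needed for these strings but keeps the join safe.
regex = re.compile('|'.join(re.escape(key) for key in num_ascii))

def convert(item):
    """Replace each known marker with its digit; leave other text untouched."""
    return regex.sub(lambda m: conv_table[m.group(0)], item)

converted = [convert(x) for x in ['FBXASC048009Car', 'FBXASC053002Toy', 'NoMarker']]
print(converted)  # ['0009Car', '5002Toy', 'NoMarker']
```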

How to split algebraic expressions in a string using python?

For example I get following input:
-9x+5x-2-4x+5
And I need to get following list:
['-9x', '5x', '-2', '-4x', '5']
Here is my code, but I don't know how to deal with minuses.
import re
text = '-3x-5x+2=9x-9'
text = re.split(r'\W', text)
print(text)
Note: I cannot use any libraries except re and math.

You could re.findall all groups of characters followed by + or - (or end-of-string $), then strip the + (which, like -, is still part of the following group) from the substrings.
>>> s = "-9x+5x-2-4x+5"
>>> [x.strip("+") for x in re.findall(r".+?(?=[+-]|$)", s)]
['-9x', '5x', '-2', '-4x', '5']
Similarly, for the second string with =, add that to the character group and also strip it off the substrings:
>>> s = '-3x-5x+2=9x-9'
>>> [x.strip("+=") for x in re.findall(r".+?(?=[+=-]|$)", s)]
['-3x', '-5x', '2', '9x', '-9']
Or apply the original comprehension to the substrings after splitting by =, depending on how the result should look:
>>> [[x.strip("+") for x in re.findall(r".+?(?=[+-]|$)", s2)] for s2 in s.split("=")]
[['-3x', '-5x', '2'], ['9x', '-9']]
In fact, now that I think of it, you can also just use findall with a pattern that matches an optional minus, followed by some digits and an optional x, with or without splitting by = first:
>>> [re.findall(r"-?\d+x?", s2) for s2 in s.split("=")]
[['-3x', '-5x', '2'], ['9x', '-9']]

One of many possible ways:
import re
term = "-9x+5x-2-4x+5"
rx = re.compile(r'-?\d+[a-z]?')
factors = rx.findall(term)
print(factors)
This yields
['-9x', '5x', '-2', '-4x', '5']
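Both findall patterns above require at least one digit per term, so a bare x or -x would be skipped. A possible extension, offered as a sketch under the assumption that variables are single lowercase letters; TERM and split_terms are hypothetical names:

```python
import re

# A term is an optional sign followed by digits with an optional variable,
# or by a variable on its own (implicit coefficient 1).
TERM = re.compile(r'[+-]?(?:\d+[a-z]?|[a-z])')

def split_terms(expr):
    """Return the signed terms of expr, dropping a leading '+' from each."""
    return [t.lstrip('+') for t in TERM.findall(expr)]

print(split_terms('-9x+5x-2-4x+5'))  # ['-9x', '5x', '-2', '-4x', '5']
print(split_terms('x-3+y'))          # ['x', '-3', 'y']
```

Only re is used, which stays within the restriction stated in the question.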

For your example data, you might split on either a plus or equals sign or split when asserting a minus sign on the right which is not at the start of the string.
[+=]|(?=(?<!^)-)
[+=] Match either + or =
| Or
(?=(?<!^)-) Positive lookahead, assert what is on the right is - but not at the start of the string
Output for both example strings
['-9x', '5x', '-2', '-4x', '5']
['-3x', '-5x', '2', '9x', '-9']
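The pattern above can be passed to re.split as in the following sketch. It assumes Python 3.7 or later, where re.split also splits on empty (zero-width) matches such as the lookahead branch; on older versions that branch would not split. SPLIT_RE is an illustrative name:

```python
import re

# Split on '+' or '=', or at the empty position just before a '-'
# that is not at the start of the string.
SPLIT_RE = re.compile(r'[+=]|(?=(?<!^)-)')

for expr in ('-9x+5x-2-4x+5', '-3x-5x+2=9x-9'):
    print(SPLIT_RE.split(expr))
```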

Python Regex capture multiple sections within string

I have strings that are always of the format track-a-b, where a and b are integers.
For example:
track-12-29
track-1-210
track-56-1
How do I extract a and b from such strings in Python?

If it's just a single string, I would approach this using split:
>>> s = 'track-12-29'
>>> s.split('-')[1:]
['12', '29']
If it is a multi-line string, I would use the same approach ...
>>> s = 'track-12-29\ntrack-1-210\ntrack-56-1'
>>> results = [x.split('-')[1:] for x in s.splitlines()]
>>> results
[['12', '29'], ['1', '210'], ['56', '1']]

You'll want to use re.findall() with capturing groups:
results = [re.findall(r'track-(\d+)-(\d+)', datum) for datum in data]
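With findall, each string yields a list of tuples of strings. A minimal sketch that instead uses fullmatch and converts the captures to integers, assuming each string holds exactly one track-a-b entry; TRACK_RE, parse_track and the data list are illustrative names:

```python
import re

TRACK_RE = re.compile(r'track-(\d+)-(\d+)')

def parse_track(text):
    """Return (a, b) as integers, or None if text is not of the form track-a-b."""
    m = TRACK_RE.fullmatch(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

data = ['track-12-29', 'track-1-210', 'track-56-1']
pairs = [parse_track(d) for d in data]
print(pairs)  # [(12, 29), (1, 210), (56, 1)]
```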

Splitting a string into a list (but not separating adjacent numbers) in Python

For example, I have:
string = "123ab4 5"
I want to be able to get the following list:
["123","ab","4","5"]
rather than list(string) giving me:
["1","2","3","a","b","4"," ","5"]

Find one or more adjacent digits (\d+), or if that fails find non-digit, non-space characters ([^\d\s]+).
>>> string = '123ab4 5'
>>> import re
>>> re.findall(r'\d+|[^\d\s]+', string)
['123', 'ab', '4', '5']
If you don't want the letters joined together, try this:
>>> re.findall(r'\d+|\S', string)
['123', 'a', 'b', '4', '5']

The other solutions are definitely easier. If you want something far less straightforward, you could try something like this:
>>> import string
>>> from itertools import groupby
>>> s = "123ab4 5"
>>> result = [''.join(list(v)) for _, v in groupby(s, key=lambda x: x.isdigit())]
>>> result = [x for x in result if x not in string.whitespace]
>>> result
['123', 'ab', '4', '5']

You could do:
>>> [el for el in re.split(r'(\d+)', string) if el.strip()]
['123', 'ab', '4', '5']

This will give the split you want:
re.findall(r'\d+|[a-zA-Z]+', "123ab4 5")
['123', 'ab', '4', '5']
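The findall patterns in these answers agree on the example string but differ on other input. A small comparison, using a made-up test string that contains punctuation (the sample value is an assumption, not taken from the question):

```python
import re

sample = "12a-b 3"  # hypothetical input, not from the question

# Digits, or runs of anything that is neither a digit nor whitespace.
joined = re.findall(r'\d+|[^\d\s]+', sample)
# Digits, or runs of ASCII letters only: punctuation is dropped.
letters_only = re.findall(r'\d+|[a-zA-Z]+', sample)
# Digits, or any single non-space character.
single_chars = re.findall(r'\d+|\S', sample)

print(joined)        # ['12', 'a-b', '3']
print(letters_only)  # ['12', 'a', 'b', '3']
print(single_chars)  # ['12', 'a', '-', 'b', '3']
```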

You can do a few things here:
1. Iterate over the string and build groups of digits as you go, appending them to your results list. Not a great solution.
2. Use regular expressions.
Implementation of 2:
>>> import re
>>> s = "123ab4 5"
>>> re.findall(r'\d+|[^\d]', s)
['123', 'a', 'b', '4', ' ', '5']
This grabs any group of one or more digits (\d+), or any other single character.
Edit:
John beat me to the correct solution, and it's a wonderful solution.
I will leave this here, though, because someone else might misunderstand the question the same way and look for an answer to what I thought was being asked. I was under the impression the OP wanted to capture only groups of numbers and leave everything else as individual characters.

python random.sample("abc",3) . how to use regular expressions in place of "abc"

I am trying to generate a random string in Python using random.sample. This is how I am writing the code:
import random
r = "".join(random.sample("abcdefghijklmnopqrstuvwxyz", 10))
The above code generates random strings of length 10 whose characters are drawn from the letters a-z. How can I use a regular expression such as [a-z] there instead?

Just use this instead
>>> from string import ascii_lowercase, ascii_uppercase
>>> ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
If you want to generate digits you can use range (in Python 3, map returns an iterator, so wrap it in list):
>>> list(map(str, range(10)))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

If you have the pyparsing module handy, you can use the srange method to expand a regex character set to a string of characters:
>>> from pyparsing import srange
>>> srange('[a-z]')
'abcdefghijklmnopqrstuvwxyz'
>>> srange('[a-z0-9]')
'abcdefghijklmnopqrstuvwxyz0123456789'
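If pyparsing is not available, a single range such as a-z can be expanded with ord and chr. This is a sketch under that assumption: expand_range is a hypothetical helper that handles only one first-last pair, not full character-class syntax. Note also that random.sample draws without repetition, whereas random.choices (Python 3.6+) allows repeated characters:

```python
import random

def expand_range(first, last):
    """Return all characters from first to last inclusive, e.g. 'a'..'z'."""
    return ''.join(chr(code) for code in range(ord(first), ord(last) + 1))

# Roughly what the character class [a-z0-9] describes.
alphabet = expand_range('a', 'z') + expand_range('0', '9')

no_repeats = ''.join(random.sample(alphabet, 10))       # each character at most once
with_repeats = ''.join(random.choices(alphabet, k=10))  # characters may repeat
print(no_repeats, with_repeats)
```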
