Python regular expression match number in string - python

I am using a regular expression in Python 2.7 to match the numbers in a string, but my expression fails to match a single-digit number. Here is my code:
import re
import cv2
s = '858 1790 -156.25 2'
re_matchData = re.compile(r'\-?\d{1,10}\.?\d{1,10}')
data = re.findall(re_matchData, s)
print data
This prints:
['858', '1790', '-156.25']
But when I change the expression from
re_matchData = re.compile(r'\-?\d{1,10}\.?\d{1,10}')
to
re_matchData = re.compile(r'\-?\d{0,10}\.?\d{1,10}')
it prints:
['858', '1790', '-156.25', '2']
Is there a difference between \d{1,10} and \d{0,10} that explains this?
If I did something wrong, how do I correct it?
Thanks for looking at my question!

Try this:
r'\-?\d{1,10}(?:\.\d{1,10})?'
The (?:...)? group makes the fractional part optional.
With r'\-?\d{0,10}\.?\d{1,10}', it is the \.?\d{1,10} part that matches the 2, because \d{0,10} is allowed to match zero digits.

The first \d{1,10} matches from 1 to 10 digits, and the second \d{1,10} also matches from 1 to 10 digits. In order for them both to match, you need at least 2 digits in your number, with an optional . between them.
You should make the entire fraction optional, not just the dot:
r'\-?\d{1,10}(?:\.\d{1,10})?'
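To see the difference side by side, here is a minimal sketch that runs both patterns on the string from the question. The variable names two_digit_minimum and optional_fraction are only illustrative and do not appear in the original post.

```python
import re

s = '858 1790 -156.25 2'

# Original pattern: both \d{1,10} parts need a digit, so a lone '2' cannot match.
two_digit_minimum = re.compile(r'-?\d{1,10}\.?\d{1,10}')

# Suggested pattern: the dot and the fractional digits form one optional group.
optional_fraction = re.compile(r'-?\d{1,10}(?:\.\d{1,10})?')

print(two_digit_minimum.findall(s))
print(optional_fraction.findall(s))
```

The first call leaves out '2', while the second returns all four numbers.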

I would rather do as follows:
import re
s = '858 1790 -156.25 2'
re_matchData = re.compile(r'\-?\d{1,10}\.?\d{0,10}')
data = re_matchData.findall(s)
print data
Output:
['858', '1790', '-156.25', '2']

Related

Regular expression to retrieve string parts within parentheses separated by commas

I have a String from which I want to take the values within the parenthesis. Then, get the values that are separated from a comma.
Example: x(142,1,23ERWA31)
I would like to get:
142
1
23ERWA31
Is it possible to get everything with one regex?
I have found a method to do so, but it is ugly.
This is how I did it in python:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search("\((.*?)\)", string)
secondResult = re.search("(?<=\()(.*?)(?=\))", firstResult.group(0))
finalResult = [x.strip() for x in secondResult.group(0).split(',')]
for i in finalResult:
    print(i)
142
1
23ERWA31
This works for your example string:
import re
string = "x(142,1,23ERWA31)"
l = re.findall (r'([^(,)]+)(?!.*\()', string)
print (l)
Result: a plain list
['142', '1', '23ERWA31']
The expression matches a sequence of characters that are none of (, , or ) and, to prevent the first x from being picked up, may not be followed by a ( anywhere further in the string. This also makes it work if your preamble x consists of more than a single character.
findall rather than search makes sure all items are found, and as a bonus it returns a plain list of the results.
You can make this a lot simpler. You are running your first Regex but then not taking the result. You want .group(1) (inside the brackets), not .group(0) (the whole match). Once you have that you can just split it on ,:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search("\((.*?)\)", string)
for e in firstResult.group(1).split(','):
    print(e)
A little wonky looking, and also assuming there's always going to be a grouping of 3 values in the parenthesis - but try this regex
\((.*?),(.*?),(.*?)\)
To extract all the group matches to a single object - your code would then look like
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search("\((.*?),(.*?),(.*?)\)", string).groups()
You can then index the firstResult tuple like a list
>>> print(firstResult[2])
23ERWA31
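If the number of values is not always three, one possible sketch is to capture everything between the parentheses once and split it afterwards. The helper name values_in_parens is hypothetical, and the sketch assumes there are no nested parentheses in the input.

```python
import re

def values_in_parens(text):
    """Return the comma-separated values inside the first (...) in text."""
    # [^)]* captures everything up to the first closing parenthesis.
    match = re.search(r'\(([^)]*)\)', text)
    if match is None:
        return []  # no parenthesised part found
    return [part.strip() for part in match.group(1).split(',')]

print(values_in_parens("x(142,1,23ERWA31)"))
```

For the example string this prints ['142', '1', '23ERWA31'], and it keeps working when the list holds two or five values.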

Regular expression for version number (vX.X.X) not working

I am trying to check that an input string contains a version number in the correct format:
vX.X.X
where X can be any number of numerical digits, e.g:
v1.32.12 or v0.2.2 or v1232.321.23
I have the following regular expression:
v([\d.][\d.])([\d])
This does not work.
Where is my error?
EDIT: I also require the string to have a max length of 20 characters, is there a way to do this through regex or is it best to just use regular Python len()
Note that [\d.] matches any single character that is either a digit or a dot.
v(\d+)\.(\d+)\.\d+
Use \d+ to match one or more digit characters.
Example:
>>> import re
>>> s = ['v1.32.12', 'v0.2.2' , 'v1232.321.23', 'v1.2.434312543898765']
>>> [i for i in s if re.match(r'^(?!.{20})v(\d+)\.(\d+)\.\d+$', i)]
['v1.32.12', 'v0.2.2', 'v1232.321.23']
>>>
The negative lookahead (?!.{20}) at the start checks the string length before matching. If the string is at least 20 characters long, the match fails immediately without attempting the rest of the pattern.
@Avinash Raj, your answer is perfect except for one correction: it allows only 19 characters. A slight correction:
>>> import re
>>> s = ['v1.32.12', 'v0.2.2' , 'v1232.321.23', 'v1.2.434312543898765']
>>> [i for i in s if re.match(r'^(?!.{21})v(\d+)\.(\d+)\.\d+$', i)]
['v1.32.12', 'v0.2.2', 'v1232.321.23', 'v1.2.434312543898765']
>>>
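On the question of regex versus len() for the length limit, a plain length check next to a simpler pattern is arguably easier to read. The following is only a sketch: the names VERSION_RE, MAX_LENGTH and is_valid_version are made up for illustration, and re.fullmatch assumes Python 3.4 or newer.

```python
import re

VERSION_RE = re.compile(r'v\d+\.\d+\.\d+')
MAX_LENGTH = 20  # maximum allowed length from the question

def is_valid_version(text):
    """Check the vX.X.X format and the length limit separately."""
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return len(text) <= MAX_LENGTH and VERSION_RE.fullmatch(text) is not None

for candidate in ['v1.32.12', 'v1.2.434312543898765', 'v1.2.4343125438987650', 'v1.2']:
    print(candidate, is_valid_version(candidate))
```

The second candidate is exactly 20 characters long and is accepted; the third has 21 characters and is rejected, as is the incomplete 'v1.2'.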

python regex with repeating subpattern

I am wondering if there is a 'smart' way (one regex expression) to extract IDs from the following paragraph:
...
imgList = '9/optimized/1260089_fpx.tif,0/optimized/1260090_fpx.tif';
...
The result should be a list containing 1260089 and 1260090. There might be up to 10 IDs.
I need something like:
re.findall('imgList = (some expression)', string)
Any ideas?
The best approach is a single regex that finds all the numbers, which calls for re.findall:
>>> imgList = '9/optimized/1260089_fpx.tif,0/optimized/1260090_fpx.tif'
>>> import re
>>> re.findall('optimized/([0-9]*)_fpx', imgList)
['1260089', '1260090']
You could of course make the regex stronger, but if the data is as you indicated, this should suffice.
import re
s = '9/optimized/1260089_fpx.tif,0/optimized/1260090_fpx.tif'
print(re.findall(r'(\d+)_fpx\.tif', s))
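If the optimized/ and _fpx parts are not guaranteed and the ID is between 7 and 10 digits,
you could do something like
import re
re.findall(r'\d{7,10}', imgList)
This finds runs of 7 to 10 digits in the string, so numbers with fewer than 7 digits are skipped. Note that a number with more than 10 digits is not excluded: its first 10 digits still match, unless you guard the pattern with lookarounds such as (?<!\d) and (?!\d).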
import re
imgList = '9/optimized/1260089_fpx.tif,0/optimized/1260090_fpx.tif'
re.findall(r'[0-9]{7}', imgList)
['1260089', '1260090']
This only covers your exact situation, where every ID has exactly 7 digits.
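If the length limit on the IDs matters, a sketch with lookarounds keeps a longer run of digits from being matched partially. The second sample path below is invented purely to show the rejection case, and id_pattern is an illustrative name.

```python
import re

img_list = '9/optimized/1260089_fpx.tif,0/optimized/1260090_fpx.tif'

# (?<!\d) and (?!\d) require that no digit sits directly before or after the run,
# so only complete runs of 7 to 10 digits are returned.
id_pattern = re.compile(r'(?<!\d)\d{7,10}(?!\d)')

print(id_pattern.findall(img_list))
print(id_pattern.findall('x/12345678901_fpx.tif'))  # made-up 11-digit number
```

The first call returns ['1260089', '1260090']; the second returns an empty list because the 11-digit run is too long. A word boundary \b would not help here, since the underscore after the ID counts as a word character.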

How to replace only particular dots in the string?

How can I replace only particular dots in a text string?
string_example = '123|4.3|123.54|sdflk|hfghjkkf.ffg..t.s..9.7..tg..3...654..2.fd'
I need only the single dots that sit between two digits (4.3 in |4.3|, 3.5 in 123.54, etc.)
to be replaced by commas in the original string. Is that possible?
If so, how?
So, the result string must be:
string_final = '123|4,3|123,54|sdflk|hfghjkkf.ffg..t.s..9,7..tg..3...654..2.fd'
Thanks in advance.
import re
string_example = '123|4.3|123.54|sdflk|hfghjkkf.ffg..t.s..4..tg..3...654..2.fd'
string_final = re.sub(r'(\d)\.(\d)', r'\1,\2', string_example)
print(string_final)
123|4,3|123,54|sdflk|hfghjkkf.ffg..t.s..4..tg..3...654..2.fd
We use a regular expression to find "digit . digit" (the digits are captured into groups with parentheses) and replace them with "group 1 , group 2" (the groups are the corresponding digits).
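One caveat with capturing the digits: each match consumes both digits, so in a chain such as 1.2.3 only the first dot is replaced. If every dot between two digits should become a comma (an assumption, since the question does not cover that case), a lookaround variant is one way to sketch it. The sample string here is made up for illustration.

```python
import re

text = '1.2.3|4.3|9..7|a.5'  # made-up sample with a chained 1.2.3

# Capturing version: the '2' is consumed by the first match, so '2.3' is missed.
consuming = re.sub(r'(\d)\.(\d)', r'\1,\2', text)

# Lookaround version: only the dot itself is matched; the digits are just checked.
lookaround = re.sub(r'(?<=\d)\.(?=\d)', ',', text)

print(consuming)
print(lookaround)
```

The first line printed is 1,2.3|4,3|9..7|a.5 and the second is 1,2,3|4,3|9..7|a.5. Both leave double dots and dots next to letters untouched.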

How to use regex to parse a number from HTML?

I want to write a simple regular expression in Python that extracts a number from HTML. The HTML sample is as follows:
Your number is <b>123</b>
Now, how can I extract "123", i.e. the contents of the first bold text after the string "Your number is"?
import re
m = re.search(r"Your number is <b>(\d+)</b>",
              "xxx Your number is <b>123</b> fdjsk")
if m:
    print(m.groups()[0])
Given s = "Your number is <b>123</b>" then:
import re
m = re.search(r"\d+", s)
will work and give you
m.group()
'123'
The regular expression looks for 1 or more consecutive digits in your string.
Note that in this specific case we knew there would be a numeric sequence. Otherwise you would have to test the return value of re.search() to make sure that m is not None, because calling m.group() on None raises an AttributeError.
Of course if you are going to process a lot of HTML you want to take a serious look at BeautifulSoup - it's meant for that and much more. The whole idea with BeautifulSoup is to avoid "manual" parsing using string ops or regular expressions.
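The None check described above could look like the following sketch. The function name extract_number is hypothetical, and the pattern assumes the exact "Your number is <b>...</b>" markup from the question.

```python
import re

def extract_number(html):
    """Return the number after 'Your number is' as an int, or None if absent."""
    match = re.search(r'Your number is <b>(\d+)</b>', html)
    if match is None:
        return None  # avoid calling .group() on None
    return int(match.group(1))

print(extract_number('xxx Your number is <b>123</b> fdjsk'))
print(extract_number('no number in this text'))
```

This prints 123 and then None, instead of raising an AttributeError on the second input.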
import re
x = 'Your number is <b>123</b>'
re.search(r'(?<=Your number is )<b>(\d+)</b>', x).group(1)
This searches for the number that follows the 'Your number is' string; group(1) returns just the digits, whereas group(0) would include the <b> tags.
import re
print(re.search(r'(\d+)', 'Your number is <b>123</b>').group(0))
The simplest way is to just extract the digits (the number):
re.search(r"\d+",text)
val="Your number is <b>123</b>"
Option : 1
m=re.search(r'(<.*?>)(\d+)(<.*?>)',val)
m.group(2)
Option : 2
re.sub(r'([\s\S]+)(<.*?>)(\d+)(<.*?>)',r'\3',val)
import re
found = re.search(r"Your number is <b>(\d+)</b>", "something.... Your number is <b>123</b> something...")
if found:
    print(found.group(1))
Here (\d+) is the capturing group; since there is only one group, group(1) returns it. When there are several groups, pass the index of the group you want.
To get the result as a Python list, you can use findall:
>>> import re
>>> string = 'Your number is <b>123</b>'
>>> pattern = '\d+'
>>> re.findall(pattern,string)
['123']
>>>
You can use the following example to solve your problem:
import re
text = 'Your number is <b>123</b>'
match = re.search(r"\d+", text)  # match object for the first run of digits
print("Matched number:", match.group(0))
print("Starting index of digit:", match.start())
print("Ending index of digit:", match.end())
import re
x = 'Your number is <b>123</b>'
output = re.search('(?<=Your number is )<b>(\d+)</b>',x).group(1)
print(output)
