Python regex to check if a string has particular set of numbers - python

I want to check if this set of number appears in a string in an exact pattern or not:
String I want to check: \4&2096297&0
My code
a = "SCSI\DISK&VEN_MICRON&PROD_1100\4&2096297&0&000200"
print(bool(re.match(r"\4&2096297&0+", a)))
It returns False instead of true. If I try same thing on print(bool(re.match(r"hello[0-9]+", 'hello1'))). I get true. Where am I going wrong?

import re
pattern = "\4&2096297&0"
print(bool(re.search(pattern,a))) # this would print "True"

\4 refers to a character whose ordinal number is octal 4 rather than a literal backslash followed by a 4. You should use a raw string literal for the variable a instead:
a = r"SCSI\DISK&VEN_MICRON&PROD_1100\4&2096297&0&000200"
Also, instead of using r"\4&2096297&0+" as your regex, you should use double backslashes to denote a literal backslash so that \4 would not be interpreted as a backreference:
r"\\4&2096297&0+"
And finally, instead of re.match, you should use re.search since re.match matches the regex from the beginning of the string, which is not what you want.
So:
import re
a = r"SCSI\DISK&VEN_MICRON&PROD_1100\4&2096297&0&000200"
print(bool(re.search(r"\\4&2096297&0+", a)))
would output: True

Related

how to make a list in python from a string and using regular expression [duplicate]

I have a sample string <alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] ...>, created=1324336085, description='Customer for My Test App', livemode=False>
I only want the value cus_Y4o9qMEZAugtnW and NOT card (which is inside another [])
How could I do it in easiest possible way in Python?
Maybe by using RegEx (which I am not good at)?
How about:
import re
s = "alpha.Customer[cus_Y4o9qMEZAugtnW] ..."
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print m.group(1)
For me this prints:
cus_Y4o9qMEZAugtnW
Note that the call to re.search(...) finds the first match to the regular expression, so it doesn't find the [card] unless you repeat the search a second time.
Edit: The regular expression here is a python raw string literal, which basically means the backslashes are not treated as special characters and are passed through to the re.search() method unchanged. The parts of the regular expression are:
\[ matches a literal [ character
( begins a new group
[A-Za-z0-9_] is a character set matching any letter (capital or lower case), digit or underscore
+ matches the preceding element (the character set) one or more times.
) ends the group
\] matches a literal ] character
Edit: As D K has pointed out, the regular expression could be simplified to:
m = re.search(r"\[(\w+)\]", s)
since the \w is a special sequence which means the same thing as [a-zA-Z0-9_] depending on the re.LOCALE and re.UNICODE settings.
You could use str.split to do this.
s = "<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card]\
...>, created=1324336085, description='Customer for My Test App',\
livemode=False>"
val = s.split('[', 1)[1].split(']')[0]
Then we have:
>>> val
'cus_Y4o9qMEZAugtnW'
This should do the job:
re.match(r"[^[]*\[([^]]*)\]", yourstring).groups()[0]
your_string = "lnfgbdgfi343456dsfidf[my data] ljfbgns47647jfbgfjbgskj"
your_string[your_string.find("[")+1 : your_string.find("]")]
courtesy: Regular expression to return text between parenthesis
You can also use
re.findall(r"\[([A-Za-z0-9_]+)\]", string)
if there are many occurrences that you would like to find.
See also for more info:
How can I find all matches to a regular expression in Python?
You can use
import re
s = re.search(r"\[.*?]", string)
if s:
print(s.group(0))
How about this ? Example illusrated using a file:
f = open('abc.log','r')
content = f.readlines()
for line in content:
m = re.search(r"\[(.*?)\]", line)
print m.group(1)
Hope this helps:
Magic regex : \[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.
This snippet should work too, but it will return any text enclosed within "[]"
re.findall(r"\[([a-zA-Z0-9 ._]*)\]", your_text)

How to find string between '\begin{minipage}' and '\end{minipage}' by python re?

I have tried the following code:
strReFindString = u"\\begin{minipage}"+"(.*?)"
strReFindString += u"\\end{minipage}"
lst = re.findall(strReFindString, strBuffer, re.DOTALL)
But it always returns empty list.
How can I do?
Thanks all.
As #BrenBarn said, u"\\b" parses as \b; and \b is not a valid regexp escape, so findall treats it as b (literal b). u"\\\\b" is \\b, which regexp understands as \b (literal backslash, literal b). You can prevent escape-parsing in the string using raw strings, ur"\\b" is equal to u"\\\\b":
ur"\\b" == u"\\\\b"
# => True

Why does my python regex not work?

I wanna replace all the chars which occur more than one time,I used Python's re.sub and my regex looks like this data=re.sub('(.)\1+','##',data), But nothing happened...
Here is my Text:
Text
※※※※※※※※※※※※※※※※※Chapter One※※※※※※※※※※※※※※※※※※
This is the begining...
You need to use raw string here, 1 is interpreted as octal and then its ASCII value present at its integer equivalent is used in the string.
>>> '\1'
'\x01'
>>> chr(01)
'\x01'
>>> '\101'
'A'
>>> chr(0101)
'A'
Use raw string to fix this:
>>> '(.)\1+'
'(.)\x01+'
>>> r'(.)\1+' #Note the `r`
'(.)\\1+'
Use a raw string, so the regex engine interprets backslashes instead of the Python parser. Just put an r in front of the string:
data=re.sub(r'(.)\1+', '##', data)
^ this r is the important bit
Otherwise, \1 is interpreted as character value 1 instead of a backreference.

using reg exp to check if test string is of a fixed format

I want to make sure using regex that a string is of the format- "999.999-A9-Won" and without any white spaces or tabs or newline characters.
There may be 2 or 3 numbers in the range 0 - 9.
Followed by a period '.'
Again followed by 2 or 3 numbers in the range 0 - 9
Followed by a hyphen, character 'A' and a number between 0 - 9 .
This can be followed by anything.
Example: 87.98-A8-abcdef
The code I have come up until now is:
testString = "87.98-A1-help"
regCompiled = re.compile('^[0-9][0-9][.][0-9][0-9][-A][0-9][-]*');
checkMatch = re.match(regCompiled, testString);
if checkMatch:
print ("FOUND")
else:
print("Not Found")
This doesn't seem to work. I'm not sure what I'm missing and also the problem here is I'm not checking for white spaces, tabs and new line characters and also hard-coded the number for integers before and after decimal.
With {m,n} you can specify the number of times a pattern can repeat, and the \d character class matches all digits. The \S character class matches anything that is not whitespace. Using these your regular expression can be simplified to:
re.compile(r'\d{2,3}\.\d{2,3}-A\d-\S*\Z')
Note also the \Z anchor, making the \S* expression match all the way to the end of the string. No whitespace (newlines, tabs, etc.) are allowed here. If you combine this with the .match() method you assure that all characters in your tested string conform to the pattern, nothing more, nothing less. See search() vs. match() for more information on .match().
A small demonstration:
>>> import re
>>> pattern = re.compile(r'\d{2,3}\.\d{2,3}-A\d-\S*\Z')
>>> pattern.match('87.98-A1-help')
<_sre.SRE_Match object at 0x1026905e0>
>>> pattern.match('123.45-A6-no whitespace allowed')
>>> pattern.match('123.45-A6-everything_else_is_allowed')
<_sre.SRE_Match object at 0x1026905e0>
Let's look at your regular expression. If you want:
"2 or 3 numbers in the range 0 - 9"
then you can't start your regular expression with '^[0-9][0-9][.] because that will only match strings with exactly two integers at the beginning. A second issue with your regex is at the end: [0-9][-]* - if you wish to match anything at the end of the string then you need to finish your regular expression with .* instead. Edit: see Martijn Pieters's answer regarding the whitespace in the regular expressions.
Here is an updated regular expression:
testString = "87.98-A1-help"
regCompiled = re.compile('^[0-9]{2,3}\.[0-9]{2,3}-A[0-9]-.*');
checkMatch = re.match(regCompiled, testString);
if checkMatch:
print ("FOUND")
else:
print("Not Found")
Not everything needs to be enclosed inside [ and ], in particular when you know the character(s) that you wish to match (such as the part -A). Furthermore:
the notation {m,n} means: match at least m times and at most n times, and
to explicitly match a dot, you need to escape it: that's why there is \. in the regular expression above.

Check what number a string ends with in Python

Such as "example123" would be 123, "ex123ample" would be None, and "123example" would be None.
You can use regular expressions from the re module:
import re
def get_trailing_number(s):
m = re.search(r'\d+$', s)
return int(m.group()) if m else None
The r'\d+$' string specifies the expression to be matched and consists of these special symbols:
\d: a digit (0-9)
+: one or more of the previous item (i.e. \d)
$: the end of the input string
In other words, it tries to find one or more digits at the end of a string. The search() function returns a Match object containing various information about the match or None if it couldn't match what was requested. The group() method, for example, returns the whole substring that matched the regular expression (in this case, some digits).
The ternary if at the last line returns either the matched digits converted to a number or None, depending on whether the Match object is None or not.
 
I'd use a regular expression, something like /(\d+)$/. This will match and capture one or more digits, anchored at the end of the string.
Read about regular expressions in Python.
Oops, correcting (sorry, I missed the point)
you should do something like this ;)
Import the RE module
import re
Then write a regular expression, "searching" for an expression.
s = re.search("[a-zA-Z+](\d{3})$", "string123")
This will return "123" if match or NoneType if not.
s.group(0)

Categories

Resources