Regex in Python to get string of numbers after string of letters - python

I have a string formatted as results_item12345. The numeric part is either four or five digits long. The letters will always be lowercase and there will always be an underscore somewhere in the non-numeric part.
I tried to extract it using the following:
import re
string = 'results_item12345'
re.search(r'[^a-z][\d]',string)
However, I only get the leftmost two digits. How can I get the entire number?

Assuming you only care about the numbers at the end of the string, the following expression matches 4 or 5 digits at the end of the string.
\d{4,5}$
Otherwise, the following would be the full regex matching the provided requirements.
^[a-z_]+\d{4,5}$
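As a rough sketch of how the full expression could be used from Python: the helper name extract_number is hypothetical, and a capture group has been added around \d{4,5} (it is not in the expression above) so the digits can be read back on their own.

```python
import re

# Hypothetical helper: returns the trailing 4- or 5-digit number, or None.
def extract_number(s):
    # Full-string pattern from the answer, with a capture group added
    # around the digits so they can be read back with group(1).
    m = re.search(r'^[a-z_]+(\d{4,5})$', s)
    return m.group(1) if m else None

print(extract_number('results_item12345'))  # 12345
print(extract_number('results_item123'))    # None (only three digits)
```

Returning None instead of raising keeps the "no match" case explicit for the caller.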

If you just want to match any 4- or 5-digit number in the string, you could search for:
r'\d{4,5}'
If you need to validate the whole string, use:
r'^results_item\d{4,5}$'

import re
a = "results_item12345"
# Group 1 takes the leading non-digits, group 2 the digits that follow.
pattern = re.compile(r"(\D+)(\d+)")
x = pattern.match(a).groups()
print(x[1])  # 12345

Related

Regex for exactly phone number with any end

I want to use re.sub to change the phone number format inside a string, but I'm stuck on detecting the number.
I want to detect this format: ###-###-#### and change it to this one: (###)-###-####
My regex: (\d{3}\-)(\d{3}\-)(\d{4})$
My sub: (\1)-\2-\3
My regex detects the number, but only when the number ends the string. It fails when any other character follows, as in: My number is 212-345-9999. When I drop the $ and change my regex to (\d{3}\-)(\d{3}\-)(\d{4}), it also reformats a string like 123-456-78901, which is not something I want to detect as a phone number.
Help me
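Just add the word boundary \b to the end of your regex pattern. It requires the last digit to be followed by a non-word character (a space, a period, etc.) or by the end of the string, which rules out any additional digits.
(\d{3}\-)(\d{3}\-)(\d{4})\b
But that will result in duplicate dashes, because the dashes inside the captured groups are copied into the output next to the dashes in the replacement. Instead, keep the dash - out of the captured groups so that it isn't duplicated in the resulting string. So use this: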
(\d{3})\-(\d{3})\-(\d{4})\b
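A short sketch of this pattern applied with re.sub to a number inside a longer string, which is the case the question asks about. The names phone_re and reformat_phones are made up for illustration.

```python
import re

# Pattern from the answer: dashes sit outside the groups, \b ends the number.
phone_re = re.compile(r"(\d{3})\-(\d{3})\-(\d{4})\b")

def reformat_phones(text):
    # Rewrite ###-###-#### as (###)-###-#### wherever it appears.
    return phone_re.sub(r"(\1)-\2-\3", text)

print(reformat_phones("My number is 212-345-9999."))  # My number is (212)-345-9999.
print(reformat_phones("Not a phone: 123-456-78901"))  # unchanged
```

The second string is left alone because there is no word boundary between the fourth and fifth trailing digits.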
If you want a stricter pattern that ensures the string contains the indicated pattern and nothing more, match the start and end of the string. Here, \W? optionally matches one trailing character that is not a letter, digit, or underscore.
^(\d{3})\-(\d{3})\-(\d{4})\W?$
Just change \W? to \W* if you want to allow an arbitrary number of trailing non-word characters, e.g. 123-456-7890.,
If you intend to process only the correctly formatted numbers, then don't call re.sub() right away. First, check if there is a match via re.match().
Sample run:
import re
number_re = re.compile(r"^(\d{3})\-(\d{3})\-(\d{4})\W?$")
for num in [
    "123-456-7890",
    "123-456-78901",
    "123-456-7890.",
    "123-456-7890.1",
]:
    print(num)
    if number_re.match(num):
        print("\t", number_re.sub(r"(\1)-\2-\3", num))
    else:
        print("\tIncorrect format")
Output:
123-456-7890
    (123)-456-7890
123-456-78901
    Incorrect format
123-456-7890.
    (123)-456-7890
123-456-7890.1
    Incorrect format

How can I check if a string has 9 or more digits?

I'm trying to detect if a string has 9 or more digits. What's the best way to approach this?
I want to be able to detect a phone number inside a string like this:
Call me # (123)123-1234
What's the best way to pull those numbers regardless of their positioning in the string?
Since it sounds like you just want to check whether there are 9 or more digits in the string, you can use the pattern
^(\D*\d){9}
It starts at the beginning of the string, and repeats a group composed of zero or more non-digit characters, followed by a digit character. Repeat that group 9 times, and you know that the string has at least 9 digits in it.
import re
pattern = re.compile(r'^(?:\D*\d){9}')
print(pattern.match('Call me # (123)123-1234'))  # a match object: 10 digits
print(pattern.match('Call me # (123)123-12'))    # None: only 8 digits
# Import the regular expressions library
import re
# Set our string variable equal to yours above
string = 'Call me # (123)123-1234'
# Build a list of every digit in the string
a = re.findall("[0-9]", string)
# Print the list if it holds 9 or more digits
if len(a) >= 9:
    print(a)
Or without regex (slower for big strings), where my_string holds the text to check:
print(sum(letter.isdigit() for letter in my_string) >= 9)
Or part regex:
print(len(re.findall("[0-9]", my_string)) >= 9)
Either way, plain Python does the check for nine (9) or more digits.
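The counting approach could be wrapped up as below. This is only a sketch: both function names are hypothetical. The re.sub call that strips non-digits is an extra not taken from the answers above, aimed at the "pull those numbers" part of the question.

```python
import re

def has_at_least_n_digits(text, n=9):
    # Hypothetical helper: True when text contains n or more digits anywhere.
    return len(re.findall(r"[0-9]", text)) >= n

def extract_digits(text):
    # Drop every non-digit character, keeping the digits in their original order.
    return re.sub(r"\D", "", text)

sample = 'Call me # (123)123-1234'
print(has_at_least_n_digits(sample))  # True
print(extract_digits(sample))         # 1231231234
```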

regex to strict check numbers in string

Example strings:
I am a numeric string 75698
I am a alphanumeric string A14-B32-C7D
So far my regex works: (\S+)$
I want to add a way (probably look ahead) to check if the result generated by above regex contains any digit (0-9) one or more times?
This is not working: (\S+(?=\S*\d\S*))$
How should I do it?
A lookahead is not necessary for this; it is simply:
(\S*\d+\S*)
Here is a test case:
http://regexr.com?34s7v
Move the lookahead to the front and use the \D class instead of \S:
((?=\D*\d)\S+)$
Explanation: \D = [^\d], in other words everything that is not a digit.
You can be more explicit (with better performance on your examples):
((?=[a-zA-Z-]*\d)[a-zA-Z\d-]+)$
And if you have only uppercase letters, you know what to do. (The smaller the character class, the better the regex.)
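A small sketch of the first lookahead pattern run against the example strings from the question. The names last_token_re and last_token_with_digit are invented for this illustration.

```python
import re

# Pattern from the answer. Because \S+ must run to the end of the string,
# the digit demanded by the lookahead has to fall inside the last token.
last_token_re = re.compile(r'((?=\D*\d)\S+)$')

def last_token_with_digit(s):
    # Return the final whitespace-free token if it contains a digit, else None.
    m = last_token_re.search(s)
    return m.group(1) if m else None

print(last_token_with_digit('I am a numeric string 75698'))             # 75698
print(last_token_with_digit('I am a alphanumeric string A14-B32-C7D'))  # A14-B32-C7D
print(last_token_with_digit('I am an alphabetic number: three'))        # None
```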
import re
text = '''
I am a numeric string 75698 \t
I am a alphanumeric string A14-B32-C7D
I am a alphanumeric string A14-B32-C74578
I am an alphabetic number: three
'''
regx = re.compile(r'\s(?=.*\d)([\da-zA-Z-]+)\s*$', re.MULTILINE)
print(regx.findall(text))
# result ['75698', 'A14-B32-C7D', 'A14-B32-C74578']
Note the \s* in front of $, which catches alphanumeric portions that are separated from the end of the line by whitespace.

python regular expression numbers in a row

I'm trying to check a string for a maximum of 3 digits in a row, for which I used:
regex = re.compile("\d{0,3}")
But this does not work. For instance, the string 1234 would be accepted by this regex even though the digit string is over length 3.
If you want to check that a string has at most 3 digits in a row, search for '\d{4,}' instead, since you are only interested in digit strings longer than 3.
import re
s = '123abc1234def12'
print(re.findall(r'\d{4,}', s))
>>> ['1234']
If you use {0,3}:
s = '123456'
print(re.findall(r'\d{0,3}', s))
>>> ['123', '456', '']
The regex matches digit strings of at most 3 characters, plus empty strings, so it cannot be used to test correctness. You can't check that every digit string is within the limit this way, but you can easily check for a digit string that exceeds it.
So to test, do something like this (re.search, so that a long run anywhere in the string is found):
s = '1234'
if re.search(r'\d{4,}', s):
    print('Max digit string too long!')
>>> Max digit string too long!
With a lower bound of 0, \d{0,3} matches every possible string, because it can match the empty string. It's not clear what you mean by "doesn't work", but if you expect to match a string with digits, change the repetition operator to {1,3}.
If you wish to exclude runs of 4 or more, try something like (?:^|\D)\d{1,3}(?:\D|$) and of course, if you want to capture the match, use capturing parentheses around \d{1,3}.
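One caveat with the pattern above: it consumes the neighbouring non-digit characters, so with re.findall two runs separated by a single character (as in 1a2) cannot both be reported. The sketch below swaps in lookarounds, (?<!\d) and (?!\d), which check the neighbours without consuming them. This is a variant, not the pattern from the answer, and the function names are hypothetical.

```python
import re

# Lookaround variant: a run of 1 to 3 digits with no digit on either side.
short_runs_re = re.compile(r'(?<!\d)\d{1,3}(?!\d)')

def short_digit_runs(s):
    # List every digit run of length 1 to 3; longer runs are skipped entirely.
    return short_runs_re.findall(s)

def has_long_run(s):
    # True when s contains 4 or more digits in a row anywhere.
    return re.search(r'\d{4,}', s) is not None

print(short_digit_runs('123abc1234def12'))  # ['123', '12']
print(short_digit_runs('1a2'))              # ['1', '2']
print(has_long_run('123abc1234def12'))      # True
```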
The method you have used finds substrings of 0 to 3 digits, so it cannot meet your expectation.
My solution:
>>> import re
>>> re.findall('\d','ds1hg2jh4jh5')
['1', '2', '4', '5']
>>> res = re.findall('\d','ds1hg2jh4jh5')
>>> len(res)
4
>>> res = re.findall('\d','23425')
>>> len(res)
5
So next you just need an if statement to check the number of digits.
There could be a couple of reasons:
Since you want \d to match digits, you should spell it as "\\d" or r"\d". "\d" might happen to work, but only because d isn't special (yet) in a string; "\n", "\f" or "\r" will do something totally different. Check out the re module documentation and search for "raw strings".
"\\d{0,3}" will match just about anything, because {0,3} means "zero to three occurrences". So it will match at the start of any string, since any string starts with the empty string.
Or perhaps you want to find strings that consist of zero to three digits and nothing else. In this case, use something like r"^\d{0,3}$". The reason is that regular expressions match anywhere in a string (or only at the beginning if you are using re.match and not re.search). ^ matches the start of the string and $ matches the end, so with one at each end nothing may come before or after \d{0,3}.
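To make the difference concrete, here is a brief sketch contrasting the unanchored and anchored patterns. The name is_short_number is hypothetical.

```python
import re

def is_short_number(s):
    # Anchored pattern from the answer: the whole string must be 0 to 3 digits.
    return re.search(r"^\d{0,3}$", s) is not None

# Unanchored, the pattern succeeds on any input via an empty match at position 0.
unanchored = re.search(r"\d{0,3}", "abc")
print(repr(unanchored.group()))  # ''

print(is_short_number("123"))   # True
print(is_short_number("1234"))  # False
print(is_short_number("12a"))   # False
```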

Python Regex to look for string

I have a text file with text that looks like below
Format={ Window_Type="Tabular", Tabular={ Num_row_labels=10
}
}
I need to look for Num_row_labels >=10 in my text file. How do I do that using Python 3.2 regex?
Thanks.
Assuming the data is formatted as above and the number has no leading 0's:
Num_row_labels=\d{2,}
A more liberal regex that allows arbitrary spaces, still assuming no leading 0's:
Num_row_labels\s*=\s*\d{2,}
An even more liberal regex that allows arbitrary spaces and leading 0's:
Num_row_labels\s*=\s*0*[1-9]\d+
If you need to capture the number, just surround \d{2,} (in the 1st and 2nd regex) or [1-9]\d+ (in the 3rd regex) with parentheses () and refer to it as the 1st capture group.
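For instance, the third regex with the capturing parentheses in place might be used like this. It is a sketch run on the sample text from the question; the variable names are arbitrary.

```python
import re

text = '''Format={ Window_Type="Tabular", Tabular={ Num_row_labels=10
}
}'''

# Third (most liberal) regex from the answer, with the digits captured.
# Leading zeros are matched by 0* but left out of the group.
pattern = re.compile(r'Num_row_labels\s*=\s*0*([1-9]\d+)')

m = pattern.search(text)
value = m.group(1) if m else None
print(value)  # 10
```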
Use:
match = re.search(r"Num_row_labels=(\d+)", line)
The (\d+) matches at least one decimal digit (0-9) and captures all digits matched as a group (groups are stored in the object returned by re.search and re.match, which I'm assigning to match here). To access the group and compare it against 10, use:
if match and int(match.group(1)) >= 10:
    print("Num_row_labels is at least 10")
This will allow you to easily change the value of your threshold, unlike the answers that do everything in the regex. Additionally, I believe this is more readable in that it is very obvious that you are comparing a value against 10, rather than matching a nonzero digit in the regex followed by at least one other digit. What the code above does is ask for the 1st group that was matched (match.group(1) returns the string that was matched by \d+), and then, with the call to int(), converts the string to an integer. The integer returned by int() is then compared against 10.
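Put together, the parse-then-compare approach could look something like the sketch below. The function name and the threshold parameter are illustrative assumptions. The check for a missing match is there because re.search returns None on lines without Num_row_labels.

```python
import re

def row_labels_at_least(line, threshold=10):
    # Hypothetical helper: False when the line has no Num_row_labels entry,
    # otherwise compares the captured number against the threshold.
    match = re.search(r"Num_row_labels=(\d+)", line)
    return match is not None and int(match.group(1)) >= threshold

print(row_labels_at_least('Tabular={ Num_row_labels=10'))           # True
print(row_labels_at_least('Tabular={ Num_row_labels=9'))            # False
print(row_labels_at_least('Tabular={ Num_row_labels=9', 5))         # True
```

Because the comparison happens on an int, leading zeros need no special handling in the regex.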
The regex is Num_row_labels=[1-9][0-9]{1}.*
Now you can use Python's re module to analyze your text and extract those.
The regex looks like:
Num_row_labels=[0-9]*[1-9][0-9]+
Example of usage:
if re.search(r'Num_row_labels=[0-9]*[1-9][0-9]+', line):
    print(line)
The regular expression [0-9]*[1-9][0-9]+ means that the string must contain at least:
one digit from 1 to 9 ([1-9]; a character class [] in a regular expression matches any single character from the range given in the brackets);
and after it at least one digit from 0 to 9, possibly more ([0-9]+; the + sign means that the symbol or expression before it is repeated 1 or more times).
Any other digits may come before these ([0-9]*, meaning any digit, 0 or more times). Once you have those two digits, any digits in front make no difference: the number is greater than or equal to 10 anyway.
