I am trying to find an expression, but I want the value as a whole, not as separate values:
import re
test = "code.coding = 'DS-2002433E172062D';"
find = re.compile(r'[A-Z,\d]')
hey = re.findall(find, str(test))
print(hey)
response = ['D', 'S', '2', '0', '0', '2', '4', '3', '3', 'E', '1', '7', '2', '0', '6', '2', 'D']
However I want it as DS-2002433E172062D
Your current regex, [A-Z,\d], matches a single character: a letter in the range A-Z, a comma, or a digit. You want to capture your entire desired string in one match. Try:
[a-zA-Z0-9]{2}-[a-zA-Z0-9]+
import re
regx = r'[a-zA-Z0-9]{2}-[a-zA-Z0-9]+'
print(re.findall(regx, "code.coding = 'DS-2002433E172062D';"))
Output:
['DS-2002433E172062D']
Explanation:
[ // start of single character match
a-zA-Z0-9 // matches any letter or digit
]{2} // matches 2 times
- // matches the - character
[ // start of single character match
a-zA-Z0-9 // matches any letter or digit
]+ // matches between 1 and unlimited times (greedy)
As an improvement on the previous answer I often use a utility function like this:
Python Function:
import re

def extractRegexPattern(theInputPattern, theInputString):
    """
    Extracts the given regex pattern.
    Param theInputPattern - the pattern to extract.
    Param theInputString - a String to extract from.
    """
    sourceStr = str(theInputString)
    prog = re.compile(theInputPattern)
    theList = prog.findall(sourceStr)
    return theList
Use like so:
test = "code.coding = 'DS-2002433E172062D';"
groups = extractRegexPattern(r"([A-Z,\d]+)+", test)
print(groups)
Outputs:
['DS', '2002433E172062D']
Regex needs () to make it a group:
You had:
'[A-Z,\d]'
Which means: any letter in the range A-Z or "," (comma) or the pattern \d (digits) as a single match ... and this yielded all of the matches separately (and subtly omits the dash which would have needed [-A-Z,\d] to be included).
For groups of the same type of matches try:
'([A-Z,\d]+)+'
Which means: any letter in the range A-Z or "," (comma) or the pattern \d (digits) one or more times per group match and one or more groups per input ... and this yields all of the matches as groups (and still omits the dash, which would have needed ([-A-Z,\d]+)+ to be included).
The point: use () (parentheses) to denote groups in regular expression.
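To see how quantifiers and groups each change what re.findall returns, here is a minimal sketch (the variable names are illustrative and not from the original post). It suggests that the + quantifier is what joins characters into one match, while parentheses decide which part of the match findall reports.

```python
import re

test = "code.coding = 'DS-2002433E172062D';"

# No quantifier: every matching character is its own match.
single_chars = re.findall(r"[A-Z\d]", test)

# With + (no parentheses needed): consecutive characters form one match.
runs = re.findall(r"[A-Z\d]+", test)

# With a capturing group, findall returns the group's text, not the whole match.
grouped = re.findall(r"([A-Z\d]+)-", test)

print(single_chars[:3])  # ['D', 'S', '2']
print(runs)              # ['DS', '2002433E172062D']
print(grouped)           # ['DS']
```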
Lastly, while testing your regex it may be helpful to use a debugger like http://pythex.org until you get the hang of the pattern at hand.
Edit:
Sorry, I missed the part about how you wanted it as DS-2002433E172062D
For that result using groups try:
'([-A-Z,\d]+)+'
Which means: a - (dash) or any letter in the range A-Z or , (comma) or the pattern \d (digits) one or more times per group match and one or more groups per input ... and this yields all of the matches as groups. Note that a literal - (dash) should sit at the very beginning or very end of the character class (or be escaped as \-); between two characters it is read as a range operator. In my experience it is clearest to put the dash at the very beginning of the class.
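As a quick check of the dash placement, the following sketch tries a literal dash at the start of the class, at the end, and escaped in the middle. The comma is left out of the class here on the assumption that it is not wanted in the result.

```python
import re

test = "code.coding = 'DS-2002433E172062D';"

# A literal dash can be first, last, or escaped inside a character class.
dash_first = re.findall(r"[-A-Z\d]+", test)
dash_last = re.findall(r"[A-Z\d-]+", test)
dash_escaped = re.findall(r"[A-Z\-\d]+", test)

print(dash_first)                               # ['DS-2002433E172062D']
print(dash_first == dash_last == dash_escaped)  # True
```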
Hope this helps
You are very close; just change your regex pattern:
pattern=r'[A-Z,\d]'
to
pattern=r'[A-Z,\d-]+'
full code:
import re
test = "code.coding = 'DS-2002433E172062D';"
pattern=r'[A-Z,\d-]+'
print(re.findall(pattern,test))
output:
['DS-2002433E172062D']
s = 'scores:start:4.2.1.3.4.3:final:55.34.13.63.44.34'
I'm trying to reset the start scores to 0.0.0.0.0.0 so that it reads
scores:start:0.0.0.0.0.0:final:55.34.13.63.44.34
This works:
re.sub('start\:(\d+\.\d+\.\d+\.\d+\.\d+\.\d+)','0.0.0.0.0.0',s)
But I was looking for something more flexible to use in case the amount of scores change.
EDIT:
Actually my example does not work, because start is removed from the string
scores:0.0.0.0.0.0:final:55.34.13.63.44.34
but I would like to keep it:
scores:start:0.0.0.0.0.0:final:55.34.13.63.44.34
You may use this re.sub with a lambda if you want to replace dot-separated numbers with the same number of zeroes:
>>> import re
>>> s = 'scores:start:4.2.1.3.4.3:final:55.34.13.63.44.34'
>>> re.sub(r'start:\d+(?:\.\d+)+', lambda m: re.sub(r'\d+', '0', m.group()), s)
'scores:start:0.0.0.0.0.0:final:55.34.13.63.44.34'
We are using the regex start:\d+(?:\.\d+)+ to match text that starts with start: and is followed by digits separated by dots.
In the lambda we replace each run of one or more digits with 0, so the output has as many zeroes as the input has numbers.
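The same idea can be wrapped in a small helper so the label is a parameter. This is only a sketch: reset_scores and its label argument are hypothetical names, and it assumes the label itself contains no digits.

```python
import re

def reset_scores(text, label="start"):
    """Replace every number in the dotted run after `label:` with 0 (hypothetical helper)."""
    # Assumes the label has no digits, since the inner sub rewrites all digits in the match.
    pattern = re.escape(label) + r":\d+(?:\.\d+)+"
    # The callback rewrites only the matched span, so the label itself is preserved.
    return re.sub(pattern, lambda m: re.sub(r"\d+", "0", m.group()), text)

s = "scores:start:4.2.1.3.4.3:final:55.34.13.63.44.34"
print(reset_scores(s))           # scores:start:0.0.0.0.0.0:final:55.34.13.63.44.34
print(reset_scores(s, "final"))  # scores:start:4.2.1.3.4.3:final:0.0.0.0.0.0
```

Because the count of zeroes comes from the matched text, the helper keeps working if the number of scores changes.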
You could replace all numbers before final with a zero:
re.sub(r'\d+(?=.*final)', '0', s)
Or perhaps more efficient if there were many more scores:
re.sub(r'\d+|(final.*)', lambda m: m[1] or '0', s)
Another lambda in re.sub:
import re
s = 'scores:start:4.2.1.3.4.3:final:55.34.13.63.44.34'
pat=r'(?<=:start:)([\d.]+)(?=:final:)'
print(re.sub(pat, lambda m: '.'.join(['0']*len(m.group(1).split('.'))), s))
# scores:start:0.0.0.0.0.0:final:55.34.13.63.44.34
A variant using the regex PyPI module:
(?:\bstart:|\G(?!^))\.?\K\d+
The pattern matches
(?: Non capture group
\bstart: Match start:
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start of the string
) Close the non capture group
\.?\K Match an optional dot, clear the current match buffer (forget what is matched so far)
\d+ Match 1 or more digits (to be replaced with a 0)
import regex
s = 'scores:start:4.2.1.3.4.3:final:55.34.13.63.44.34'
pattern = r"(?:\bstart:|\G(?!^))\.?\K\d+"
print(regex.sub(pattern, '0', s))
Output
scores:start:0.0.0.0.0.0:final:55.34.13.63.44.34
I have run into a small issue. I need a regular expression that splits a string into separate pieces: numbers, chunks of characters within square brackets, and any remaining plain runs of characters.
For example, if I have a string that resembles
s = "2[abc]3[cd]ef"
I need a list like lst = ['2','abc','3','cd','ef']
Here is the code I have so far:
import re
s = "2[abc]3[cd]ef"
s_final = ""
res = re.findall("(\d+)\[([^[\]]*)\]", s)
print(res)
This is outputting a list of tuples that looks like this.
[('2', 'abc'), ('3', 'cd')]
I am very new to regular expression and learning.. Sorry if this is an easy one.
Thanks!
The immediate fix is getting rid of the capturing groups and using alternation to match either digits or chars other than square bracket chars:
import re
s = "2[abc]3[cd]ef"
res = re.findall(r"\d+|[^][]+", s)
print(res)
# => ['2', 'abc', '3', 'cd', 'ef']
Details:
\d+ - one or more digits
| - or
[^][]+ - one or more chars other than [ and ]
Other solutions that might help are:
re.findall(r'\w+', s)
re.findall(r'\d+|[^\W\d_]+', s)
where \w+ matches one or more letters, digits, underscores and some more connector punctuation with diacritics and [^\W\d_]+ matches any one or more Unicode letters.
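The difference between the two alternatives only shows up on input that mixes letters with digits or underscores. The sample string below is a hypothetical variation of the question's input, chosen to make that visible.

```python
import re

sample = "2[abc]3[c_d]e9f"  # hypothetical input with an underscore and an embedded digit

# \w+ keeps letters, digits and underscores together in one token.
word_chars = re.findall(r"\w+", sample)

# \d+|[^\W\d_]+ separates digit runs from letter runs and drops underscores.
digits_or_letters = re.findall(r"\d+|[^\W\d_]+", sample)

print(word_chars)         # ['2', 'abc', '3', 'c_d', 'e9f']
print(digits_or_letters)  # ['2', 'abc', '3', 'c', 'd', 'e', '9', 'f']
```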
Don't look for one regex that matches the whole string; use a regex that matches each block instead. \w fits well here (it is equivalent to [a-zA-Z0-9_] for ASCII input):
s = "2[abc]3[cd]ef"
print(re.findall(r"\w+", s)) # ['2', 'abc', '3', 'cd', 'ef']
Or split on brackets
print(re.split(r"[\[\]]", s)) # ['2', 'abc', '3', 'cd', 'ef']
Regular expressions are designed for regular patterns, and your string is irregular.
Regex is mostly used to find a specific pattern in a long text, to validate text, or to extract things from text.
For example, to find a phone number in a string I would use a regex, but to build a calculator that needs to extract operators and digits I would not; I would rather write plain Python code for that.
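For comparison, a hand-written version of the bracket splitting could look like the sketch below. split_tokens is a hypothetical helper, and it assumes brackets only act as separators and are never nested in a meaningful way.

```python
def split_tokens(text):
    """Split text into runs of digits and runs of other characters, dropping brackets."""
    tokens = []
    current = ""
    for ch in text:
        if ch in "[]":
            # A bracket ends the current token and is itself discarded.
            if current:
                tokens.append(current)
            current = ""
        elif current and ch.isdigit() != current[-1].isdigit():
            # Switching between digits and non-digits starts a new token.
            tokens.append(current)
            current = ch
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

print(split_tokens("2[abc]3[cd]ef"))  # ['2', 'abc', '3', 'cd', 'ef']
```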
I tried to extract numbers in a format like "**/*,**/*".
For example, for string "272/3,65/5", I want to get a list = [272,3,65,5]
I tried to use \d*/ and /\d* to extract 272,65 and 3,5 separately, but I want to know if there's a way to get the list shown above directly. Thanks!
You can use [0-9] to represent a single decimal digit, so a better way of writing the regex would be something along the lines of [0-9]*/?[0-9]+,[0-9]+/?[0-9]*. I'm assuming you are trying to match fractional points such as 5,2/3, 23242/234,23, or 0,0.
Breakdown of the main components in [0-9]*/?[0-9]+,[0-9]+/?[0-9]*:
[0-9]* matches 0 or more digits
/? optionally matches a '/'
[0-9]+ matches at least one digit
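One way to combine that pattern with the list the question asks for is to validate the overall shape first and then pull out the digit runs. This is a sketch under that assumption; extract_numbers and shape are hypothetical names.

```python
import re

# Pattern quoted from the answer above; fullmatch requires the whole string to fit it.
shape = re.compile(r"[0-9]*/?[0-9]+,[0-9]+/?[0-9]*")

def extract_numbers(text):
    """Return the digit runs in text if it has the expected shape, else an empty list."""
    if shape.fullmatch(text) is None:
        return []
    return re.findall(r"[0-9]+", text)

print(extract_numbers("272/3,65/5"))  # ['272', '3', '65', '5']
print(extract_numbers("272-3;65"))    # []
```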
My favorite tool for debugging regexes is https://regex101.com/ since it explains what each of the operators means and shows you how the regex is performing on a sample of your choice as you write it.
# import the regular expression module
import re
# input string
s = "272/3,65/5"
# findall returns every non-overlapping match of the pattern
result = re.findall(r'\d+', s)
print(result) # ['272', '3', '65', '5']
Here \d matches a single digit in the input.
If we use \d on its own, each digit is returned separately:
result = re.findall(r'\d', s)
['2', '7', '2', '3', '6', '5', '5']
So we should use \d+, where + extends each match over as many consecutive digits as possible.
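Since the question shows a list of numbers rather than strings, the matches can be converted with int. The sketch below also shows an optional variant that keeps each numerator/denominator pair together; both variable names are illustrative.

```python
import re

s = "272/3,65/5"

# Convert each digit run to an integer.
numbers = [int(token) for token in re.findall(r"\d+", s)]

# Alternative: two capturing groups give one (numerator, denominator) tuple per fraction.
pairs = re.findall(r"(\d+)/(\d+)", s)

print(numbers)  # [272, 3, 65, 5]
print(pairs)    # [('272', '3'), ('65', '5')]
```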
I'm using python and regex (new to both) to find sequence of chars in a string as follows:
Grab the first instance of p followed by any number (It'll always be in the form of p_ _ where _ and _ will be integers). Then either find an 's' or a 'go' then all integers till the end of the string. For example:
ascjksdcvyp12nbvnzxcmgonbmbh12hjg23
should yield p12 go 12 23.
ascjksdcvyp12nbvnzxcmsnbmbh12hjg23
should yield p12 s 12 23.
I've only managed to get the p12 part of the string and this is what I've tried so far to extract the 'go' or 's':
decoded = (re.findall(r'([p][0-9]*)',myStr))
print(decoded)  # prints ['p12']
I know by doing something like
re.findall(r'[s]|[go]',myStr)
will give me all occurrences of s and g and o, but something like that is not what I'm looking for. And I'm not sure how I'd combine these regexes to get the desired output.
Use re.findall with pattern grouping:
>>> string = 'ascjksdcvyp12nbvnzxcmgonbmbh12hjg23'
>>> re.findall(r'(p\d{2}).*(s|go)\D*(\d+)(?:\D*(\d+))*', string)
[('p12', 'go', '12', '23')]
>>> string = 'ascjksdcvyp12nbvnzxcmsnbmbh12hjg23'
>>> re.findall(r'(p\d{2}).*(s|go)\D*(\d+)(?:\D*(\d+))*', string)
[('p12', 's', '12', '23')]
When the pattern contains groups, re.findall returns only the text matched by the capturing groups ()
p\d{2} matches p followed by exactly two digits
After that .* matches anything
Then, s|go matches either s or go
\D* matches any number of non-digits
\d+ indicates one or more digits
(?:) is a non-capturing group, i.e. the group itself adds nothing to the output; it only groups tokens so that they can be repeated together
Note:
>>> re.findall(r'(p\d{2}).*(s|go)(?:\D*(\d+))+?', string)
[('p12', 's', '12')]
>>> re.findall(r'(p\d{2}).*(s|go)(?:\D*(\d+))+', string)
[('p12', 's', '23')]
I would have liked to use one of the two patterns above, since matching the later digits is a repeated task, but both have a problem: a repeated capturing group keeps only its last iteration, which is the first number with the non-greedy +? and the final number with the greedy +. Hence we need to match the digits after s or go explicitly.
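If the number of trailing integers is not fixed, one option is to capture the remainder once and collect the digits in a second step. The sketch below assumes the first s or go after the p-code is the one that matters; decode is a hypothetical helper name.

```python
import re

def decode(text):
    """Return [p-code, 's' or 'go', *numbers] or None (hypothetical helper)."""
    # Step 1: first p followed by two digits, then the first s/go, then capture the rest.
    m = re.search(r"(p\d{2}).*?(s|go)(.*)", text)
    if m is None:
        return None
    # Step 2: collect every digit run from the remainder.
    return [m.group(1), m.group(2)] + re.findall(r"\d+", m.group(3))

print(decode("ascjksdcvyp12nbvnzxcmgonbmbh12hjg23"))  # ['p12', 'go', '12', '23']
print(decode("xp12zs1a2b3"))                          # ['p12', 's', '1', '2', '3']
```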
First, try to match your line with a minimal pattern, as a test. Use (capturing) and (?:non-capturing) parentheses to capture the interesting parts and skip the uninteresting ones. Store away what you care about, then chop off the remainder of the string and search for numbers as a second step.
import re
line = 'ascjksdcvyp12nbvnzxcmgonbmbh12hjg23'  # sample input from the question
simple_test = r'^.*?p(\d{2}).*?(?:s|go).*?(\d+)'  # lazy .*? finds the first p
m = re.match(simple_test, line)
if m is not None:
    p_num = m.group(1)
    trailing_numbers = [m.group(2)]
    remainder = line[m.end():]
    trailing_numbers.extend(                  # extend list by appending
        map(                                  # list from applying
            lambda m: m.group(1),             # get group(1) from match
            re.finditer(r"(\d+)", remainder)  # of each number in string
        )
    )
    print("P:", p_num, "Numbers:", trailing_numbers)
I need a regex to parse a string that contains fractions and an operation [+, -, *, or /] and return a 5-element tuple containing the numerators, denominators, and operation, using the findall function in the re module.
Example: str = "15/9 + -9/5"
The output should be of the form [("15","9","+","-9","5")]
I was able to come up with this:
pattern = r'-?\d+|\s+\W\s+'
print(re.findall(pattern,str))
Which produces an output of ["15","9"," + ","-9","5"]. But after fiddling with this for some time, I cannot get this into a 5-element tuple and I cannot match the operation without also matching the whitespace around it.
This pattern will work:
(-?\d+)\/(\d+)\s+([+\-*/])\s+(-?\d+)\/(\d+)
# let's walk through it
(-?\d+) #matches any group of digits that may or may not have a `-` sign to group 1
\/ #escape character to match `/`
(\d+) #matches any group of digits to group 2
\s+([+\-*/])\s+ #matches any '+,-,*,/' character and puts only that into group 3 (whitespace is not captured in group)
(-?\d+)\/(\d+) #exactly the same as group 1/2 for groups 4/5
Demo for this:
>>> s = "15/9 + -9/5 6/12 * 2/3"
>>> re.findall(r'(-?\d+)\/(\d+)\s+([+\-*/])\s+(-?\d+)\/(\d+)', s)
[('15', '9', '+', '-9', '5'), ('6', '12', '*', '2', '3')]
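Once the five groups are captured, they can be turned into numbers and evaluated. The following is a sketch that uses the standard fractions module; evaluate, PATTERN and OPS are hypothetical names, and the evaluation step goes beyond what the question asked for.

```python
import operator
import re
from fractions import Fraction

PATTERN = re.compile(r"(-?\d+)/(\d+)\s+([+\-*/])\s+(-?\d+)/(\d+)")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(expr):
    """Parse 'a/b op c/d' and return the result as a Fraction (hypothetical helper)."""
    m = PATTERN.fullmatch(expr.strip())
    if m is None:
        raise ValueError(f"not a fraction expression: {expr!r}")
    n1, d1, op, n2, d2 = m.groups()
    # Look up the operator function and apply it to the two parsed fractions.
    return OPS[op](Fraction(int(n1), int(d1)), Fraction(int(n2), int(d2)))

print(evaluate("15/9 + -9/5"))  # -2/15
print(evaluate("6/12 * 2/3"))   # 1/3
```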
A general way to tokenize a string based on a regexp is this:
import re
pattern = r"\s*(\d+|[/+*-])"
def tokens(x):
    return [m.group(1) for m in re.finditer(pattern, x)]
print(tokens("9 / 4 + 6 "))
Notes:
The regex begins with \s* to pass over any initial whitespace.
The part of the regex which matches the token is enclosed in parens to form a capture.
The different token patterns are in the capture separated by the alternation operation |.
Be careful about using \W since that will also match whitespace.
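re.finditer silently skips characters that no alternative matches. If that is not wanted, a stricter loop can match at an explicit position and fail on anything unexpected. This is a sketch of that variation; tokens_strict and TOKEN are hypothetical names.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[/+*-])")

def tokens_strict(text):
    """Tokenize text, raising ValueError on any input the pattern cannot consume."""
    result = []
    pos = 0
    text = text.rstrip()  # trailing whitespace is not part of any token
    while pos < len(text):
        # match() anchors at pos, so nothing can be skipped between tokens.
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at index {pos}: {text[pos:]!r}")
        result.append(m.group(1))
        pos = m.end()
    return result

print(tokens_strict("9 / 4 + 6 "))  # ['9', '/', '4', '+', '6']
```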