This question already has an answer here:
Regular expression works on regex101.com, but not on prod
(1 answer)
Closed 4 months ago.
import re
print(re.search('\(\d{1,}\)', "b' 1. Population, 2016 (1)'"))
I am trying to extract the digits (one or more) between parentheses in strings. The code above shows my attempted solution. I checked my regular expression on https://regex101.com/ and expected the code to return True. However, the returned value is None. Can someone let me know what happened?
Your regex pattern should be written as a raw string so that the backslashes reach the regex engine unchanged:
import re
inp = "b' 1. Population, 2016 (1)'"
nums = re.findall(r'\((\d{1,})\)', inp)
print(nums) # ['1']
Otherwise, you would have to double every backslash in the pattern (for example \\( and \\d).
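To see that the raw-string and doubled-backslash spellings describe the same pattern, here is a minimal sketch (the variable names are only illustrative and not part of the original answer):

```python
import re

text = "b' 1. Population, 2016 (1)'"

# Raw string: backslashes reach the regex engine untouched.
raw_pattern = r'\((\d{1,})\)'
# Regular string: every backslash has to be doubled by hand.
escaped_pattern = '\\((\\d{1,})\\)'

# Both literals produce exactly the same pattern text.
same_pattern = raw_pattern == escaped_pattern
found = re.findall(raw_pattern, text)

print(same_pattern)  # True
print(found)         # ['1']
```

The raw-string form is usually easier to read and to compare against what you typed into regex101.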
The regular expression below captures the digits inside the brackets, but only when digits are the only thing between them.
r"\((?P<digits_inside_brackets>\d+)\)"
For your scenario, the above RE will match 1 under the group "digits_inside_brackets".
It can be run with the snippet below:
import re
user_string = "b' 1. Population, 2016 (1)"
comp = re.compile(r"\((?P<digits_inside_brackets>\d+)\)")  # Captures only when digits are the sole content of the brackets
for i in re.finditer(comp, user_string):
    print(i.group("digits_inside_brackets"))
Output for the above snippet:
1
Grab digits even when whitespace surrounds them:
r"\(\s*(?P<digits_inside_brackets>\d+)\s*\)"
Grab digits inside brackets regardless of any surrounding non-digit characters:
r"\(\D*(?P<digits_inside_brackets>\d+)\D*\)"
Output when applied with above RE
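As a rough comparison of the three patterns, the sketch below runs each one against a few inputs. The sample strings and the `patterns` and `results` names are made up for illustration; they are not from the original answer.

```python
import re

# The three patterns from the answer, keyed by hypothetical labels.
patterns = {
    "strict": r"\((?P<digits_inside_brackets>\d+)\)",
    "whitespace": r"\(\s*(?P<digits_inside_brackets>\d+)\s*\)",
    "any_non_digit": r"\(\D*(?P<digits_inside_brackets>\d+)\D*\)",
}
samples = ["2016 (1)", "2016 ( 2 )", "2016 (no. 3 only)"]

results = {}
for name, pattern in patterns.items():
    results[name] = []
    for sample in samples:
        m = re.search(pattern, sample)
        # Record the captured digits, or None when the pattern does not match.
        results[name].append(m.group("digits_inside_brackets") if m else None)
    print(name, results[name])
# strict ['1', None, None]
# whitespace ['1', '2', None]
# any_non_digit ['1', '2', '3']
```

Each pattern is more permissive than the previous one, so pick the strictest one that still covers your data.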
This question already has answers here:
Regexp to remove specific number of occurrences of character only
(2 answers)
How to only match a single instance of a character?
(3 answers)
Closed 11 months ago.
I'm stuck with regular expressions in Python...
#!/usr/bin/python3
import re
combi="ABBAEAADCA"
one_a = len(re.findall('[^A](A)[^A]', combi))
print("A:"+str(one_a))
I try to make this variable (one_a) contain the number of A's that appear alone (3) but it does not count those at the beginning and end of lines so....
one_a = len(re.findall('\A(A)[^A]', combi))
print("A ini:"+str(one_a))
one_a += len(re.findall('[^A](A)[^A]', combi))
print("A_cen:"+str(one_a))
one_a += len(re.findall('[^A](A)\Z', combi))
print("A_end:"+str(one_a))
but it didn't work either when in this particular case the value that should stay in the variable should be 3.
I would appreciate knowing what I am missing or what mistake I am making.
Thank you very much
A negated character class [^A] matches (and consumes) a single character, and \A asserts the start of the string.
To get the A's that stand alone, you can use negative lookarounds asserting that there is no A directly to the left or right:
(?<!A)A(?!A)
See a regex demo and a Python demo.
import re
combi="ABBAEAADCA"
one_a = len(re.findall('(?<!A)A(?!A)', combi))
print("A:"+str(one_a))
Output
A:3
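If it helps to see which A's are counted, this small sketch lists the match positions with finditer (the `lone_a` and `positions` names are just for illustration):

```python
import re

combi = "ABBAEAADCA"

# Lookarounds check the neighbours without consuming them.
lone_a = re.compile(r'(?<!A)A(?!A)')
positions = [m.start() for m in lone_a.finditer(combi)]

print(positions)       # [0, 3, 9]
print(len(positions))  # 3
```

The A's at indexes 5 and 6 are skipped because each has another A next to it.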
You can combine start-of-string (^) and end of string ($) with regular character classes through the or (|) operator.
re.findall(r'(?:^|[^A])A(?:$|[^A])', combi)
This gives you all substrings where A is surrounded by start of string and end of string, start of string and not-A, not-A and end of string, or not-A and not-A.
>>> re.findall(r'(?:^|[^A])A(?:$|[^A])', combi)
['AB', 'BAE', 'CA']
Applying len to this list gives you the count of single A's.
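One caveat worth checking on your own data: this pattern consumes the neighbouring characters, so two single A's separated by just one character can share a neighbour and one of them gets missed. A small sketch comparing it with a lookaround version (the second test string is a made-up example):

```python
import re

consuming = r'(?:^|[^A])A(?:$|[^A])'   # neighbours are part of the match
lookaround = r'(?<!A)A(?!A)'           # neighbours are only inspected

counts = {}
for s in ("ABBAEAADCA", "ABABA"):
    counts[s] = (len(re.findall(consuming, s)), len(re.findall(lookaround, s)))
    print(s, counts[s])
# ABBAEAADCA (3, 3)
# ABABA (2, 3)
```

For the string in the question both approaches agree; they only differ when single A's sit one character apart.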
This question already has answers here:
How to grab number after word in python
(4 answers)
Closed 2 years ago.
I want to extract the numbers for each parameter below:
import re
parameters = '''
NO2: 42602
SO2: 42401
CO: 42101
'''
The desired output should be: ['42602','42401','42101']
I first tried re.findall(r'\d+',parameters), but it also returns the "2" from "NO2" and "SO2".
Then I tried re.findall(':.*',parameters), but it returns [': 42602', ': 42401', ': 42101']
If I can not rename the "NO2" to "Nitrogen dioxide", is there a way just to collect numbers on the right (after ":")?
Many thanks.
If you do not want to use capturing groups, you could use look behind.
(?<=:\s)\d+
Details:
(?<=:\s): asserts that the match is immediately preceded by a colon and one whitespace character
\d+: matches one or more digits
I also tested this in Python.
import re
parameters = '''
NO2: 42602
SO2: 42401
CO: 42101
'''
result = re.findall(r'(?<=:\s)\d+',parameters)
print (result)
Result
['42602', '42401', '42101']
You can use the following regex to capture the numbers
^\s*\w+:\s(\d+)$
Here, ^ asserts the position at the start of the line. \s* allows zero or more whitespace characters before the content. \w+:\s matches one or more word characters followed by ":" and a single whitespace character, for example "NO2: ".
Finally, (\d+) matches the following digits you want as a group. $ matches the end of the line.
To get all the matches as a list you can use
matches = re.findall(r'^\s*\w+:\s(\d+)$', parameters, re.MULTILINE)
As re.MULTILINE is specified,
the pattern character '^' matches at the beginning of the string and
at the beginning of each line.
as stated in the docs.
The result is as follows
>>> print(matches)
['42602', '42401', '42101']
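To illustrate why the flag matters, here is a short sketch that runs the same pattern with and without re.MULTILINE (the `without_flag` and `with_flag` names are only for this example):

```python
import re

parameters = '''
NO2: 42602
SO2: 42401
CO: 42101
'''
pattern = r'^\s*\w+:\s(\d+)$'

# Without the flag, ^ and $ only anchor to the whole string.
without_flag = re.findall(pattern, parameters)
# With the flag, ^ and $ also anchor to every line.
with_flag = re.findall(pattern, parameters, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['42602', '42401', '42101']
```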
To put my two cents in, you could simply use
re.findall(r'(\b\d+\b)', parameters)
See a demo on regex101.com.
If you happen to have other digits floating around somewhere in your string, be more precise with
\w+:\s*(\d+)
See another demo on regex101.com.
re.findall(r'(?<=:\s)\d+', parameters)
Should work. You can learn more about look-behind from here.
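One thing to keep in mind: Python's re module only accepts fixed-width lookbehinds, so (?<=:\s) handles exactly one whitespace character after the colon. The sketch below uses a made-up input with uneven spacing to show the difference from a capture group; the variable names are illustrative only.

```python
import re

# Hypothetical input: three, one, and zero spaces after the colon.
parameters = "NO2:   42602\nSO2: 42401\nCO:42101\n"

# Fixed-width lookbehind: only the line with exactly one space matches.
fixed = re.findall(r'(?<=:\s)\d+', parameters)

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<=:\s*)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# A capture group has no such restriction.
flexible = re.findall(r':\s*(\d+)', parameters)

print(fixed)              # ['42401']
print(variable_width_ok)  # False
print(flexible)           # ['42602', '42401', '42101']
```

If the spacing in your data is always a single space, the lookbehind is fine as written.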
You just need to specify where in the string you want to search for digits. You can use:
re.findall(r': (\d+)', parameters)
This tells Python to look for digits in the part of the string after ":" and the "space".
This question already has answers here:
Regular expression to return text between parenthesis
(11 answers)
Closed 2 years ago.
I have long string S, and I want to find value (numeric) in the following format "Value(**)", where ** is values I want to extract.
For example, S is "abcdef Value(34) Value(56) Value(13)", then I want to extract values 34, 56, 13 from S.
I tried to use regex as follows.
import re
regex = re.compile('\Value(.*'))
re.findall(regex, S)
But the code yields the result I did not expect.
Edit. I edited some mistakes.
You should escape the parentheses, correct the typo of Value (as opposed to Values), use a lazy repeater *? instead of *, add the missing right parenthesis, and capture what's enclosed in the escaped parentheses with a pair of parentheses:
regex = re.compile(r'Value\((.*?)\)')
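As a quick check, the sketch below applies that pattern to the string from the question and also shows what the greedy version would capture. The \d+ variant and the conversion to int are my own additions, assuming the values are always plain integers.

```python
import re

S = "abcdef Value(34) Value(56) Value(13)"

# Lazy repeater: stops at the first closing parenthesis.
lazy = re.findall(r'Value\((.*?)\)', S)
# Greedy repeater: runs on to the last closing parenthesis.
greedy = re.findall(r'Value\((.*)\)', S)
# Assumption: values are integers, so \d+ is a stricter alternative.
values = [int(v) for v in re.findall(r'Value\((\d+)\)', S)]

print(lazy)    # ['34', '56', '13']
print(greedy)  # ['34) Value(56) Value(13']
print(values)  # [34, 56, 13]
```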
Only one of your numbers follows the word 'Value', so you can extract anything inside parentheses. You also need to escape the parentheses which are special characters.
regex = re.compile(r'\(.*?\)')
re.findall(regex, S)
Output:
['(34)', '(56)', '(13)']
I think what you're looking for is a capturing group that can return multiple matches. The pattern is: (\(\d{2}\))?. \d matches a digit and {2} matches exactly 2 digits; {1,2} would match 1 or 2 digits, etc. The trailing ? makes the group optional (it matches zero or one time). Each group can be indexed and combined into a list. This is a simple implementation and will return the numbers within parentheses.
eg. 'asdasd Value(102222), fgdf(20), he(77)' will match 20 and 77 but not 102222.
This question already has answers here:
Remove text between square brackets at the end of string
(3 answers)
Closed 3 years ago.
I'm trying to extract the last statement in brackets. However my code is returning every statement in brackets plus everything in between.
Ex: 'What [are] you [doing]'
I want '[doing]', but I get back '[are] you [doing]' when I run re.search.
I ran re.search using a regex expression that SHOULD get the last statement in brackets (plus the brackets) and nothing else. I also tried adding \s+ at the beginning hoping that would fix it, but it didn't.
string = '[What] are you [doing]'
m = re.search(r'\[.*?\]$' , string)
print(m.group(0))
I should just get [doing] back, but instead I get the entire string.
re.findall(r'\[(.+?)\]', 'What [are] you [doing]')[-1]
'doing'
To satisfy the requirement of extracting the last statement in brackets:
import re
s = 'What [are] you [doing]'
m = re.search(r'.*(\[[^\[\]]+\])', s)
res = m.group(1) if m else m
print(res) # [doing]
You can use findall and take the last element
import re
string = 'What [are] you [doing]'
re.findall(r"\[\w{1,}]", string)[-1]
Output
'[doing]'
This will also work with the example posted by @MonkeyZeus in the comments: if the last pair of brackets is empty, it is skipped instead of being returned. For example
string = 'What [are] you []'
Output
'[are]'
You can use a negative lookahead pattern to ensure that there isn't another pair of brackets to follow the matching pair of brackets:
re.search(r'\[[^\]]*\](?!.*\[.*\])', string).group()
or you can use .* to consume all the leading characters until the last possible match:
re.search(r'.*(\[.*?\])', string).group(1)
Given string = 'abc [foo] xyz [bar] 123', both of the above would return: '[bar]'
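Here is a small runnable sketch of both approaches, together with the pattern from the question to show why it returned the whole string (the variable names are only illustrative):

```python
import re

string = 'abc [foo] xyz [bar] 123'

# Negative lookahead: reject a bracket pair if another pair follows it.
via_lookahead = re.search(r'\[[^\]]*\](?!.*\[.*\])', string).group()
# Greedy prefix: .* swallows everything up to the last possible bracket pair.
via_greedy_prefix = re.search(r'.*(\[.*?\])', string).group(1)

# The question's pattern: search starts at the leftmost '[' and the lazy
# .*? is forced to grow until ']' sits at the end of the string.
question = re.search(r'\[.*?\]$', '[What] are you [doing]').group(0)

print(via_lookahead)      # [bar]
print(via_greedy_prefix)  # [bar]
print(question)           # [What] are you [doing]
```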
This captures bracketed segments with anything in between the brackets (not necessarily letters or digits: any symbols/spaces/etc):
import re
string = '[US 1?] Evaluate any matters identified when testing segment information.[US 2!]'
print(re.findall(r'\[[^]]*\]', string)[-1])
gives
[US 2!]
A minor fix with your regex. You don't need the $ at the end. And also use re.findall rather than re.search
import re
string = 'What [are] you [doing]'
re.findall(r"\[.*?\]", string)[-1]
Output:
'[doing]'
If you have empty [] in your string, it will also be counted in the output by above method. To solve this, change the regex from \[.*?\] to \[..*?\]
import re
string = "What [are] you []"
re.findall(r"\[..*?\]", string)[-1]
Output:
'[are]'
If nothing matches, findall returns an empty list and indexing it with [-1] raises an IndexError (the other answers fail in a similar way), so you will have to use try and except.
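As an alternative to try/except, you could check the list before indexing. The helper below is a hypothetical sketch (the name `last_bracketed` and the `allow_empty` parameter are my own, not from any library) that returns None when nothing matches:

```python
import re

def last_bracketed(text, allow_empty=False):
    """Return the last [...] segment in text, or None if there is none."""
    # '+' skips empty brackets, '*' accepts them.
    pattern = r'\[[^\[\]]*\]' if allow_empty else r'\[[^\[\]]+\]'
    matches = re.findall(pattern, text)
    return matches[-1] if matches else None

print(last_bracketed('What [are] you [doing]'))               # [doing]
print(last_bracketed('What [are] you []'))                    # [are]
print(last_bracketed('What [are] you []', allow_empty=True))  # []
print(last_bracketed('no brackets here'))                     # None
```

Returning None keeps the caller's code free of exception handling, at the cost of having to test the result before using it.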
This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I am trying to compile a regex on python but am having limited success. I am doing the following
import re
pattern = re.compile("([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)
I am expecting the above to print 2, but instead it is printing 1. The regex should match strings as follows:
The first letter is alphanumeric or an underscore. All characters after that can be alphanumeric, an underscore, or a dash and there can be 0 or more characters after the first.
I was thinking that this thing should fail as soon as it sees the comma, but it is not.
What am I doing wrong here?
import re
# Anchor the pattern with $ so the whole string has to match; without it,
# match() succeeds as soon as a prefix of the string satisfies the regex.
pattern = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)$")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)
# Your original pattern (no $) matches any string that merely starts with
# an alphanumeric character or an underscore.
pat = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
if pat.match("_#"):
    print('match')
else:
    print('no match 1')
This example should also help you understand the explanation by @Wiktor.
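If you are on Python 3.4 or later, re.fullmatch is another way to require that the whole string matches, without adding anchors yourself. A minimal sketch, where the `is_valid` helper name and the extra test strings are my own:

```python
import re

# No ^ or $ needed: fullmatch() only succeeds if the entire string matches.
pattern = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_-]*")

def is_valid(name):
    """Hypothetical helper: True if name satisfies the rule in the question."""
    return pattern.fullmatch(name) is not None

print(is_valid("gb,&^(#)"))  # False
print(is_valid("_#"))        # False
print(is_valid("gb_1-x"))    # True
print(is_valid("-gb"))       # False
print(is_valid("gb\n"))      # False
```

The last case is a small difference from the `$` version: `$` also matches just before a trailing newline, so "gb\n" would pass there, while fullmatch rejects it.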