Split only first group of string with regex in python [duplicate] - python

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
I want to search for a specific string. Can anyone tell me why I am seeing the result below?
I checked the pattern on an online regex site. It seems I have separated the match into 3 groups, and the result now prints all 3 groups. How can I get only the first group?
Also, is it possible to change the code so that "String" in lower case is detected as well?
Relevant strings:
DD-JSH-String43423213-3774
DE-String43423214-SDC-3721
Output:
'String43423213', 'String', '43423213','String43423214', 'String', '43423214'
Code:
matches = re.findall(r'((String)(\d+))', inp)
matches = [j for sub in matches for j in sub if j != ""]
Expected Result:
'String43423213', 'String43423214'

This is because the outer parentheses add a third group around the two inner groups, so you have to remove the outer group. You can also add the flag re.I to ignore case:
matches = re.findall(r'(String)(\d+)', inp, flags=re.I)
print(*[''.join(x) for x in matches],sep="\n")

Try this regex:
Python source:
import re
inp = """DD-JSH-String43423213-3774
DE-String43423214-SDC-3721"""
matches = re.findall(r'String\d+', inp, flags=re.I)
print(matches)
or
matches = re.findall(r'(?i)String\d+', inp)
print(matches)
output:
['String43423213', 'String43423214']
explanation:
Because your regex has three groups (the outer group around the whole pattern plus the inner groups String and \d+), re.findall returns a list of tuples, one tuple per match, such as ('String43423214', 'String', '43423214'). You could use a single group, (String\d+), or no group at all, String\d+; both expressions return only the full match.
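To make the group behaviour concrete, here is a minimal sketch that runs the same input through patterns with no group, one group, and three groups. The variable names are only for illustration.

```python
import re

inp = """DD-JSH-String43423213-3774
DE-String43423214-SDC-3721"""

# No capture groups: findall returns each whole match as a string.
no_groups = re.findall(r'String\d+', inp)

# One capture group: findall returns the text of that group.
one_group = re.findall(r'(String\d+)', inp)

# Three capture groups: findall returns one tuple per match.
three_groups = re.findall(r'((String)(\d+))', inp)

# Non-capturing groups (?:...) group without affecting the result.
non_capturing = re.findall(r'(?:String)(?:\d+)', inp, flags=re.I)

print(no_groups)
print(three_groups)
```

The first, second, and fourth patterns all give the list from the expected result; only the three-group pattern produces tuples.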

You can do this:
import re
inp = """
DD-JSH-String43423213-3774
DE-String43423214-SDC-3721
"""
matches = re.findall(r'String\d+', inp)
for match in matches:
    print(match)

Related

Regex: How to find substring that does NOT contain a certain word [duplicate]

This question already has answers here:
Regular expressions: Ensuring b doesn't come between a and c
(4 answers)
Closed 3 years ago.
I have this string;
string = "STARTcandyFINISH STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"
I would like to match and capture all substrings that appear in between START and FINISH but only if the word "poison" does NOT appear in that substring. How do I exclude this word and capture only the desired substrings?
re.findall(r'START(.*?)FINISH', string)
Desired captured groups:
candy
sugar
Using a tempered dot, we can try:
string = "STARTcandyFINISH STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"
matches = re.findall(r'START((?:(?!poison).)*?)FINISH', string)
print(matches)
This prints:
['candy', 'sugar']
For an explanation of how the regex pattern works, we can have a closer look at:
(?:(?!poison).)*?
This uses the tempered dot trick. It matches one character at a time, but only while the text starting at the current position is not poison.
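As a quick check, the sketch below compares the plain lazy pattern from the question with the tempered version on the same string. The names naive and tempered are just labels for this comparison.

```python
import re

string = "STARTcandyFINISH STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"

# Plain lazy dot: captures everything between START and FINISH.
naive = re.findall(r'START(.*?)FINISH', string)

# Tempered dot: the negative lookahead (?!poison) is checked before each
# character is consumed, so a segment containing "poison" cannot match.
tempered = re.findall(r'START((?:(?!poison).)*?)FINISH', string)

print(naive)
print(tempered)
```

The lookahead runs at every position inside the segment, which is why Blobpoison and poisonBlob are rejected as well as poison itself.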

Find substrings matching a pattern allowing overlaps [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 3 years ago.
I have strings of concatenated 1s and 0s, each of length 12. Here are some examples:
100010011100
001111110000
001010100011
I want to isolate the sections of each string that start with 1, continue with any number of zeros, and end with 1.
For the first string, I would want ['10001','1001'].
For the second string, I would want nothing returned.
For the third string, I would want ['101','101','10001'].
I've tried using a combination of positive lookahead and positive lookbehind, but it isn't working. This is what I've come up with so far: [(?<=1)0][0(?=1)]
For a non-regex approach, you can split the string on 1. The matches you want are the elements of the resulting list that contain a 0, excluding the first and last elements of the list.
Code:
myStrings = [
"100010011100",
"001111110000",
"001010100011"
]
for s in myStrings:
    matches = ["1"+z+"1" for i, z in enumerate(s.split("1")[:-1]) if (i>0) and ("0" in z)]
    print(matches)
Output:
#['10001', '1001']
#[]
#['101', '101', '10001']
I suggest writing a simple regex: r'10+1'. Then use Python logic to find each match using re.search(). After each match, start the next search at the last character of that match, since the closing 1 of one match can be the opening 1 of the next.
A single plain search with this pattern can't do it, because each match consumes its closing 1.
import re

def parse(s):
    pattern = re.compile(r'(10+1)')
    match = pattern.search(s)
    while match:
        yield match[0]
        # Resume at the closing 1 so it can also open the next match.
        match = pattern.search(s, match.end()-1)
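As an alternative to the loop, overlapping matches can also be collected by putting the capturing group inside a lookahead, which is the approach usually suggested for overlapping matches. This is a sketch and not part of the answers above; the function name overlapping is made up for illustration.

```python
import re

def overlapping(s):
    # The lookahead itself matches an empty string, so the scanner moves
    # forward one character at a time; group 1 holds the text it saw.
    return re.findall(r'(?=(10+1))', s)

for s in ["100010011100", "001111110000", "001010100011"]:
    print(overlapping(s))
```

Because nothing is consumed, the closing 1 of one match stays available as the opening 1 of the next.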

Extracting the last statement in []'s (regex) [duplicate]

This question already has answers here:
Remove text between square brackets at the end of string
(3 answers)
Closed 3 years ago.
I'm trying to extract the last statement in brackets. However my code is returning every statement in brackets plus everything in between.
Ex: 'What [are] you [doing]'
I want '[doing]', but I get back '[are] you [doing]' when I run re.search.
I ran re.search using a regex expression that SHOULD get the last statement in brackets (plus the brackets) and nothing else. I also tried adding \s+ at the beginning hoping that would fix it, but it didn't.
string = '[What] are you [doing]'
m = re.search(r'\[.*?\]$' , string)
print(m.group(0))
I should just get [doing] back, but instead I get the entire string.
re.findall(r'\[(.+?)\]', 'What [are] you [doing]')[-1]
'doing'
To extract only the last statement in brackets:
import re
s = 'What [are] you [doing]'
m = re.search(r'.*(\[[^\[\]]+\])', s)
res = m.group(1) if m else m
print(res) # [doing]
You can use findall and take the last element:
import re
string = 'What [are] you [doing]'
re.findall(r"\[\w{1,}]", string)[-1]
Output
'[doing]'
This will also work with the example posted by @MonkeyZeus in the comments: if the last pair of brackets is empty, it should not be returned. For example:
string = 'What [are] you []'
Output
'[are]'
You can use a negative lookahead pattern to ensure that there isn't another pair of brackets to follow the matching pair of brackets:
re.search(r'\[[^\]]*\](?!.*\[.*\])', string).group()
or you can use .* to consume all the leading characters until the last possible match:
re.search(r'.*(\[.*?\])', string).group(1)
Given string = 'abc [foo] xyz [bar] 123', both of the snippets above return '[bar]'.
This captures bracketed segments with anything in between the brackets (not necessarily letters or digits: any symbols/spaces/etc):
import re
string = '[US 1?] Evaluate any matters identified when testing segment information.[US 2!]'
print(re.findall(r'\[[^]]*\]', string)[-1])
gives
[US 2!]
A minor fix to your regex: you don't need the $ at the end. Also use re.findall rather than re.search:
import re
string = 'What [are] you [doing]'
re.findall(r"\[.*?\]", string)[-1]
Output:
'[doing]'
If your string contains an empty [], the method above also counts it in the output. To avoid this, change the regex from \[.*?\] to \[..*?\]:
import re
string = "What [are] you []"
re.findall(r"\[..*?\]", string)[-1]
Output:
'[are]'
If nothing matches, findall returns an empty list and the [-1] index raises an IndexError, so you will have to check for that or use try and except.
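One way to avoid the exception is to wrap the lookup in a small helper that returns None when nothing is found. This is only a sketch: the name last_bracketed is hypothetical, and the pattern requires at least one character between the brackets so that an empty [] is skipped.

```python
import re

def last_bracketed(text):
    """Return the last non-empty [...] segment in text, or None."""
    # [^\[\]]+ means one or more characters that are not brackets.
    found = re.findall(r'\[[^\[\]]+\]', text)
    return found[-1] if found else None

print(last_bracketed('What [are] you [doing]'))
print(last_bracketed('What [are] you []'))
print(last_bracketed('no brackets here'))
```

Checking the list before indexing keeps the no-match case explicit and avoids a try and except around every call.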

python - regex why does `findall` find nothing, but `search` works? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 5 years ago.
>>> reg = re.compile(r'^\d{1,3}(,\d{3})*$')
>>> str = '42'
>>> reg.search(str).group()
'42'
>>> reg.findall(str)
['']
>>>
Why does reg.findall find nothing, but reg.search works in this piece of code above?
When the regex contains capture groups (wrapped in parentheses), findall returns the text of the captured group instead of the whole match. In your case the group did not take part in the match, so findall reports an empty string for it. You can make the group non-capturing with ?: if you want the whole match returned. re.search, on the other hand, returns a match object, and its group() gives the whole match regardless of capture groups. This is reflected in the documentation:
re.findall:
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group.
re.search:
Scan through string looking for the first location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.
import re
reg = re.compile(r'^\d{1,3}(?:,\d{3})*$')
s = '42'
reg.search(s).group()
# '42'
reg.findall(s)
# ['42']
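If the capture group has to stay in the pattern, another option is re.finditer, which yields match objects, so group(0) still gives the whole match. A small sketch using the pattern from the question; the variable names are only for illustration.

```python
import re

# Same pattern as in the question, with the capture group left in place.
reg = re.compile(r'^\d{1,3}(,\d{3})*$')

# finditer yields match objects; group(0) is always the whole match.
whole_small = [m.group(0) for m in reg.finditer('42')]
whole_large = [m.group(0) for m in reg.finditer('1,234,567')]

# findall only reports the group: empty when it did not participate,
# otherwise the last repetition that the group captured.
group_small = reg.findall('42')
group_large = reg.findall('1,234,567')

print(whole_small, whole_large)
print(group_small, group_large)
```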

Split string based on regexp without consuming characters [duplicate]

This question already has answers here:
Non-consuming regular expression split in Python
(2 answers)
Closed 8 years ago.
I would like to split a string like the following
text="one,two;three.four:"
into the list
textOut=["one", ",two", ";three", ".four", ":"]
I have tried with
import re
textOut = re.split(r'(?=[.:,;])', text)
But this does not split anything.
I would use re.findall here instead of re.split:
>>> from re import findall
>>> text = "one,two;three.four:"
>>> findall(r"(?:^|\W)\w*", text)
['one', ',two', ';three', '.four', ':']
>>>
Below is a breakdown of the Regex pattern used above:
(?: # The start of a non-capturing group
^|\W # The start of the string or a non-word character (symbol)
) # The end of the non-capturing group
\w* # Zero or more word characters (characters that are not symbols)
I don't know what else can occur in your string, but will this do the trick?
>>> s='one,two;three.four:'
>>> [x for x in re.findall(r'[.,;:]?\w*', s) if x]
['one', ',two', ';three', '.four', ':']
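Note that the behaviour described in the question depends on the Python version. From Python 3.7 on, re.split also splits on patterns that match an empty string, so the lookahead from the question works as intended. A minimal sketch, assuming Python 3.7 or newer:

```python
import re

text = "one,two;three.four:"

# The lookahead matches the empty string just before each delimiter,
# so the delimiter stays attached to the piece that follows it.
parts = re.split(r'(?=[.:,;])', text)
print(parts)
```

On older versions the findall approaches above remain the way to go.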
