In python, how to 'if finditer(...) has no matches'?

In python, how to 'if finditer(...) has no matches'? - python

I would like to do something when finditer() does not find anything.
import re
pattern = "1"
string = "abc"
matched_iter = re.finditer(pattern, string)
# <if matched_iter is empty (no matched found>.
# do something.
# else
for m in matched_iter:
print m.group()
The best thing I could come up with is to keep track of found manually:
mi_no_find = re.finditer(r'\w+',"$$%%%%") # not matching.
found = False
for m in mi_no_find:
print m.group()
found = True
if not found:
print "Nothing found"
Related posts that don't answer:
Counting finditer matches: Number of regex matches (I don't need to count, I just need to know if there are no matches).
finditer vs match: different behavior when using re.finditer and re.match (says always have to loop over an iterator returned by finditer)
[edit]
- I have no interest in enumerating or counting total output. Only if found else not found actions.
- I understand I can put finditer into a list, but this would be inefficient for large strings. One objective is to have low memory utilization.

Updated 04/10/2020
Use re.search(pattern, string) to check if a pattern exists.
pattern = "1"
string = "abc"
if re.search(pattern, string) is None:
print('do this because nothing was found')
Returns:
do this because nothing was found
If you want to iterate over the return, then place the re.finditer() within the re.search().
pattern = '[A-Za-z]'
string = "abc"
if re.search(pattern, string) is not None:
for thing in re.finditer(pattern, string):
print('Found this thing: ' + thing[0])
Returns:
Found this thing: a
Found this thing: b
Found this thing: c
Therefore, if you wanted both options, use the else: clause with the if re.search() conditional.
pattern = "1"
string = "abc"
if re.search(pattern, string) is not None:
for thing in re.finditer(pattern, string):
print('Found this thing: ' + thing[0])
else:
print('do this because nothing was found')
Returns:
do this because nothing was found
previous reply below (not sufficient, just read above)
If the .finditer() does not match a pattern, then it will not perform any commands within the related loop.
So:
Set the variable before the loop you are using to iterate over the regex returns
Call the variable after (And outside of) the loop you are using to iterate over the regex returns
This way, if nothing is returned from the regex call, the loop won't execute and your variable call after the loop will return the exact same variable it was set to.
Below, example 1 demonstrates the regex finding the pattern. Example 2 shows the regex not finding the pattern, so the variable within the loop is never set.
Example 3 shows my suggestion - where the variable is set before the regex loop, so if the regex does not find a match (and subsequently, does not trigger the loop), the variable call after the loop returns the initial variable set (Confirming the regex pattern was not found).
Remember to import the import re module.
EXAMPLE 1 (Searching for the characters 'he' in the string 'hello world' will return 'he')
my_string = 'hello world'
pat = '(he)'
regex = re.finditer(pat,my_string)
for a in regex:
b = str(a.groups()[0])
print(b)
# returns 'he'
EXAMPLE 2 (Searching for the characters 'ab' in the string 'hello world' do not match anything, so the 'for a in regex:' loop does not execute and does not assign the b variable any value.)
my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)
for a in regex:
b = str(a.groups()[0])
print(b)
# no return
EXAMPLE 3 (Searching for the characters 'ab' again, but this time setting the variable b to 'CAKE' before the loop, and calling the variable b after, outside of the loop returns the initial variable - i.e. 'CAKE' - since the loop did not execute).
my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)
b = 'CAKE' # sets the variable prior to the for loop
for a in regex:
b = str(a.groups()[0])
print(b) # calls the variable after (and outside) the loop
# returns 'CAKE'
It's also worth noting that when designing your pattern to feed into the regex, make sure to use the parenthesis to indicate the start and end of a group.
pattern = '(ab)' # use this
pattern = 'ab' # avoid using this
To tie back to the initial question:
Since nothing found won’t execute the for loop (for a in regex), the user can preload the variable, then check it after the for loop for the original loaded value. This will allow for the user to know if nothing was found.
my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)
b = 'CAKE' # sets the variable prior to the for loop
for a in regex:
b = str(a.groups()[0])
if b == ‘CAKE’:
# action taken if nothing is returned

If performance isn't an issue, simply use findall or list(finditer(...)), which returns a list.
Otherwise, you can "peek" into the generator with next, then loop as normal if it raises StopIteration. Though there are other ways to do it, this is the simplest to me:
import itertools
import re
pattern = "1"
string = "abc"
matched_iter = re.finditer(pattern, string)
try:
first_match = next(matched_iter)
except StopIteration:
print("No match!") # action for no match
else:
for m in itertools.chain([first_match], matched_iter):
print(m.group())

You can probe the iterator with next and then chain the results back together while excepting StopIteration which means the iterator was empty:
import itertools as it
matches = iter([])
try:
probe = next(matches)
except StopIteration:
print('empty')
else:
for m in it.chain([probe], matches):
print(m)
Regarding your solution you could check m directly, setting it to None beforehand:
matches = iter([])
m = None
for m in matches:
print(m)
if m is None:
print('empty')

It prints the original string if there are no matches in the string.
It will replace the position n of the string.
For more reference: https://docs.python.org/2/howto/regex.html
Input_Str = "FOOTBALL"
def replacing(Input_String, char_2_replace, replaced_char, n):
pattern = re.compile(char_2_replace)
if len(re.findall(pattern, Input_String)) >= n:
where = [m for m in pattern.finditer(Input_String)][n-1]
before = Input_String[:where.start()]
after = Input_String[where.end():]
newString = before + replaced_char + after
else:
newString = Input_String
return newString
print(replacing(Input_Str, 'L', 'X', 4))```

I know this answer is late, but very suitable for Python 3.8+
You can use the new warlus operator := operator along with next(iterator[, default]) to solve for 'no matches' in re.finditer(pattern, string, flags=0) somewhat like this:
import re
pattern_ = "1"
string_ = "abc"
def is_match():
was_found = False
while next((match := re.finditer(pattern_, string_)), None) is not None:
was_found = True
yield match.group() # or just print it
return was_found

Related

Print the first, second occurred character in a list

I working on a simple algorithm which prints the first character who occurred twice or more.
for eg:
string ='abcabc'
output = a
string = 'abccba'
output = c
string = 'abba'
output = b
what I have done is:
string = 'abcabc'
s = []
for x in string:
if x in s:
print(x)
break
else:
s.append(x)
output: a
But its time complexity is O(n^2), how can I do this in O(n)?

Change s = [] to s = set() (and obviously the corresponding append to add). in over set is O(1), unlike in over list which is sequential.
Alternately, with regular expressions (O(n^2), but rather fast and easy):
import re
match = re.search(r'(.).*\1', string)
if match:
print(match.group(1))
The regular expression (.).*\1 means "any character which we'll remember for later, any number of intervening characters, then the remembered character again". Since regexp is scanned left-to-right, it will find a in "abba" rather than b, as required.

Use dictionaries
string = 'abcabc'
s = {}
for x in string:
if x in s:
print(x)
break
else:
s[x] = 0
or use sets
string = 'abcabc'
s = set()
for x in string:
if x in s:
print(x)
break
else:
s.add(x)
both dictionaries and sets use indexing and search in O(1)

How to find what matched in any() with Python?

I'm working in Python, using any() like so to look for a match between a String[] array and a comment pulled from Reddit's API.
Currently, I'm doing it like this:
isMatch = any(string in comment.body for string in myStringArray)
But it would also be useful to not just know if isMatch is true, but which element of myStringArray it was that had a match. Is there a way to do this with my current approach, or do I have to find a different way to search for a match?

You could use next with default=False on a conditional generator expression:
next((string for string in myStringArray if string in comment.body), default=False)
The default is returned when there is no item that matched (so it's like any returning False), otherwise the first matching item is returned.
This is roughly equivalent to:
isMatch = False # variable to store the result
for string in myStringArray:
if string in comment.body:
isMatch = string
break # after the first occurrence stop the for-loop.
or if you want to have isMatch and whatMatched in different variables:
isMatch = False # variable to store the any result
whatMatched = '' # variable to store the first match
for string in myStringArray:
if string in comment.body:
isMatch = True
whatMatched = string
break # after the first occurrence stop the for-loop.

For python 3.8 or newer use Assignment Expressions
if any((match := string) in comment.body for string in myStringArray):
print(match)

I agree with the comment that an explicit loop would be clearest. You could fudge your original like so:
isMatch = any(string in comment.body and remember(string) for string in myStringArray)
^^^^^^^^^^^^^^^^^^^^^
where:
def remember(x):
global memory
memory = x
return True
Then the global memory will contain the matched string if isMatch is True, or retain whatever value (if any) it originally had if isMatch is False.

It's not a good idea to use one variable to store two different kinds of information: whether a string matches (a bool) and what that string is (a string).
You really only need the second piece of information: while there are creative ways to do this in one statement, as in the above answer, it really makes sense to use a for loop:
match = ''
for string in myStringArray:
if string in comment.body:
match = string
break
if match:
pass # do stuff

Say you have a = ['a','b','c','d'] and b = ['x','y','d','z'],
so that by doing any(i in b for i in a) you get True.
You can get:
The array of matches : matches = list( (i in b for i in a) )
Where in a it first matches : posInA = matches.index(True)
The value : value = a[posInA]
Where in b it first matches : posInB = b.index(value)
To get all the values and their indexes, the problem is that matches == [False, False, True, True] whether the multiple values are in a or b, so you need to use enumerate in loops (or in a list comprehension).
for m,i in enumerate(a):
print('considering '+i+' at pos '+str(m)+' in a')
for n,j in enumerate(b):
print('against '+j+' at pos '+str(n)+' in b')
if i == j:
print('in a: '+i+' at pos '+str(m)+', in b: '+j+' at pos '+str(n))

How can you group a very specfic pattern with regex?

Problem:
https://coderbyte.com/editor/Simple%20Symbols
The str parameter will be composed of + and = symbols with
several letters between them (ie. ++d+===+c++==a) and for the string
to be true each letter must be surrounded by a + symbol. So the string
to the left would be false. The string will not be empty and will have
at least one letter.
Input:"+d+=3=+s+"
Output:"true"
Input:"f++d+"
Output:"false"
I'm trying to create a regular expression for the following problem, but I keep running into various problems. How can I produce something that returns the specified rules('+\D+')?
import re
plusReg = re.compile(r'[(+A-Za-z+)]')
plusReg.findall()
>>> []
Here I thought I could create my own class that searches for the pattern.
import re
plusReg = re.compile(r'([\\+,\D,\\+])')
plusReg.findall('adf+a+=4=+S+')
>>> ['a', 'd', 'f', '+', 'a', '+', '=', '=', '+', 'S', '+']
Here I thought I the '\\+' would single out the plus symbol and read it as a char.
mo = plusReg.search('adf+a+=4=+S+')
mo.group()
>>>'a'
Here using the same shell, I tried using the search instead of findall, but I just ended up with the first letter which isn't even surrounded by a plus.
My end result is to group the string 'adf+a+=4=+S+' into ['+a+','+S+'] and so on.
edit:
Solution:
import re
def SimpleSymbols(str):
#added padding, because if str = 'y+4==+r+'
#then program would return true when it should return false.
string = '=' + str + '='
#regex that returns false if a letter *doesn't* have a + in front or back
plusReg = re.compile(r'[^\+][A-Za-z].|.[A-Za-z][^\+]')
#if statement that returns "true" if regex doesn't find any letters
#without a + behind or in front
if plusReg.search(string) is None:
return "true"
return "false"
print SimpleSymbols(raw_input())
I borrowed some code from ekhumoro and Sanjay. Thanks

One approach is to search the string for any letters that are either: (1) not preceeded by a +, or (2) not followed by a +. This can be done using look ahead and look behind assertions:
>>> rgx = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')
So if rgx.search(string) returns None, the string is valid:
>>> rgx.search('+a+') is None
True
>>> rgx.search('+a+b+') is None
True
but if it returns a match, the string is invalid:
>>> rgx.search('+ab+') is None
False
>>> rgx.search('+a=b+') is None
False
>>> rgx.search('a') is None
False
>>> rgx.search('+a') is None
False
>>> rgx.search('a+') is None
False
The important thing about look ahead/behind assertions is that they don't consume characters, so they can handle overlapping matches.

Something like this should do the trick:
import re
def is_valid_str(s):
return re.findall('[a-zA-Z]', s) == re.findall('\+([a-zA-Z])\+', s)
Usage:
In [10]: is_valid_str("f++d+")
Out[10]: False
In [11]: is_valid_str("+d+=3=+s+")
Out[11]: True

I think you are on the right track. The regular expression you have is correct, but it can simplify down to just letters:
search_pattern = re.compile(r'\+[a-zA-z]\+')
for upper and lower case strings. Now we can use this regex with the findall function:
results = re.findall(search_pattern, 'adf+a+=4=+S+') # returns ['+a+', '+S+']
Now the question needs you to return a boolean depending on if the string is valid to the specified pattern so we can wrap this all up into a function:
def is_valid_pattern(pattern_string):
search_pattern = re.compile(r'\+[a-zA-z]?\+')
letter_pattern = re.compile(r'[a-zA-z]') # to search for all letters
results = re.findall(search_pattern, pattern_string)
letters = re.findall(letter_pattern, pattern_string)
# if the lenght of the list of all the letters equals the length of all
# the values found with the pattern, we can say that it is a valid string
return len(results) == len(letter_pattern)

You should be looking for what isn't there, as opposed to what is. You should search for something like, ([^\+][A-Za-z]|[A-Za-z][^\+]). The | in the middle is a logical or operator. Then on either side, it checks if it can find any scenario where there is a letter without a "+" on the left/right respectively. If if finds something, that means the string fails. If it can't find anything, that means that there are no instances of a letter not being surrounded by "+"'s.

Regular expression to replace a character on odd repeated occurrences in Python

Can't get a regular expression to replace a character on odd repeated occurrences in Python.
Example:
char = ``...```.....``...`....`````...`
to
``...``````.....``...``....``````````...``
on even occurrences doesn't replace.

for example:
>>> import re
>>> s = "`...```.....``...`....`````...`"
>>> re.sub(r'((?<!`)(``)*`(?!`))', r'\1\1', s)
'``...``````.....``...``....``````````...``'

Maybe I'm old fashioned (or my regex skills aren't up to par), but this seems to be a lot easier to read:
import re
def double_odd(regex,string):
"""
Look for groups that match the regex. Double every second one.
"""
count = [0]
def _double(match):
count[0] += 1
return match.group(0) if count[0]%2 == 0 else match.group(0)*2
return re.sub(regex,_double,string)
s = "`...```.....``...`....`````...`"
print double_odd('`',s)
print double_odd('`+',s)
It seems that I might have been a little confused about what you were actually looking for. Based on the comments, this becomes even easier:
def odd_repl(match):
"""
double a match (all of the matched text) when the length of the
matched text is odd
"""
g = match.group(0)
return g*2 if len(g)%2 == 1 else g
re.sub(regex,odd_repl,your_string)

This may be not as good as the regex solution, but works:
In [101]: s1=re.findall(r'`{1,}',char)
In [102]: s2=re.findall(r'\.{1,}',char)
In [103]: fill=s1[-1] if len(s1[-1])%2==0 else s1[-1]*2
In [104]: "".join("".join((x if len(x)%2==0 else x*2,y)) for x,y in zip(s1,s2))+fill
Out[104]: '``...``````.....``...``....``````````...``'

python string manipulation [duplicate]

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
i tried doing this:
p = re.compile("\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives output: AX&EUr)
Is there any way to correct this, rather than iterating each element in the string?

Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile("\([^()]*\)")
count = 1
while count:
s, count = p.subn("", s)
Working example: http://ideone.com/WicDK

You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.

>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'

Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'

Nested brackets (or tags, ...) are something that are not possible to handle in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details why. You would need a real parser.
It's possible to construct a regex which can handle two levels of nesting, but they are already ugly, three levels will already be quite long. And you don't want to think about four levels. ;-)

You can use PyParsing to parse the string:
from pyparsing import nestedExpr
import sys
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?

You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
s, n = re.subn(r'\([^)(]*\)', '', s)
if n == 0:
break
print(s)
Output
AX&E

this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # max alphabet from user string
print(min(user)) # min alphabet from user string
print(user.join([1,2,3,4]))
input()

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

In python, how to 'if finditer(...) has no matches'? - python

Related

Print the first, second occurred character in a list

How to find what matched in any() with Python?

How can you group a very specfic pattern with regex?

Regular expression to replace a character on odd repeated occurrences in Python

python string manipulation [duplicate]

Categories

Resources