How to find what matched in any() with Python?

I'm working in Python, using any() like so to look for a match between a String[] array and a comment pulled from Reddit's API.
Currently, I'm doing it like this:
isMatch = any(string in comment.body for string in myStringArray)
But it would also be useful to not just know if isMatch is true, but which element of myStringArray it was that had a match. Is there a way to do this with my current approach, or do I have to find a different way to search for a match?

You could use next with a default of False on a conditional generator expression (the default has to be passed positionally, because next() does not accept keyword arguments):
next((string for string in myStringArray if string in comment.body), False)
The default is returned when there is no item that matched (so it's like any returning False), otherwise the first matching item is returned.
This is roughly equivalent to:
isMatch = False  # variable to store the result
for string in myStringArray:
    if string in comment.body:
        isMatch = string
        break  # after the first occurrence stop the for-loop.
or if you want to have isMatch and whatMatched in different variables:
isMatch = False  # variable to store the any result
whatMatched = ''  # variable to store the first match
for string in myStringArray:
    if string in comment.body:
        isMatch = True
        whatMatched = string
        break  # after the first occurrence stop the for-loop.
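A minimal, self-contained sketch of the next-with-default idea. The names first_match, needles and haystack are made up for illustration and stand in for myStringArray and comment.body; None is used as the default here so that "no match" cannot be confused with a falsy match such as an empty string.

```python
def first_match(needles, haystack):
    """Return the first needle that occurs in haystack, or None if none does."""
    return next((needle for needle in needles if needle in haystack), None)


body = "I really like Python and Rust"   # stands in for comment.body
keywords = ["Go", "Rust", "Python"]      # stands in for myStringArray

match = first_match(keywords, body)
is_match = match is not None
print(is_match, match)
```

The order of the list decides which match is reported: the generator stops at the first needle found, not at the earliest position in the text.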

For Python 3.8 or newer, use an assignment expression:
if any((match := string) in comment.body for string in myStringArray):
    print(match)
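One caveat worth sketching, assuming Python 3.8+: when any() returns False, the walrus target still holds the last string that was tested (or is unbound if the list is empty), so it should only be read on the True branch. The function name find_with_walrus is hypothetical.

```python
def find_with_walrus(needles, haystack):
    """Return the first matching needle, or None when nothing matches."""
    if any((match := needle) in haystack for needle in needles):
        return match
    # When any() is False, `match` holds the last needle tested (or is
    # unbound if needles is empty), so it must not be used on this path.
    return None


print(find_with_walrus(["x", "lo", "he"], "hello"))
print(find_with_walrus(["x"], "hello"))
```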

I agree with the comment that an explicit loop would be clearest. You could fudge your original like so:
isMatch = any(string in comment.body and remember(string) for string in myStringArray)
                                     ^^^^^^^^^^^^^^^^^^^^
where:
def remember(x):
    global memory
    memory = x
    return True
Then the global memory will contain the matched string if isMatch is True, or retain whatever value (if any) it originally had if isMatch is False.

It's not a good idea to use one variable to store two different kinds of information: whether a string matches (a bool) and what that string is (a string).
You really only need the second piece of information: while there are creative ways to do this in one statement, as in the above answer, it really makes sense to use a for loop:
match = ''
for string in myStringArray:
    if string in comment.body:
        match = string
        break
if match:
    pass  # do stuff

Say you have a = ['a','b','c','d'] and b = ['x','y','d','z'],
so that by doing any(i in b for i in a) you get True.
You can get:
The array of matches : matches = list( (i in b for i in a) )
Where in a it first matches : posInA = matches.index(True)
The value : value = a[posInA]
Where in b it first matches : posInB = b.index(value)
To get all the values and their indexes, matches alone is not enough: a result such as [False, False, True, True] looks the same whether the repeated values are in a or in b, so you need to use enumerate in loops (or in a list comprehension).
for m,i in enumerate(a):
    print('considering '+i+' at pos '+str(m)+' in a')
    for n,j in enumerate(b):
        print('against '+j+' at pos '+str(n)+' in b')
        if i == j:
            print('in a: '+i+' at pos '+str(m)+', in b: '+j+' at pos '+str(n))
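The list-comprehension variant mentioned above might look like the following sketch, which reuses the a and b lists from this answer and collects every equal pair with both positions instead of printing them.

```python
a = ['a', 'b', 'c', 'd']
b = ['x', 'y', 'd', 'z']

# (position in a, position in b, value) for every equal pair
pairs = [(m, n, i) for m, i in enumerate(a) for n, j in enumerate(b) if i == j]
print(pairs)  # [(3, 2, 'd')]
```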

Related

How to replace multiple matches in Regex

I'm trying to replace '=' with '==' in the following string:
log="[x] = '1' and [y] <> '7' or [z]='51'".
Unfortunately, only the second '=' is getting replaced. Why is the first one not being replaced and how do I replace the first one as well?
def subs_equal_sign(logic):
    y = re.compile(r'\]\s?\=\s?')
    iterator = y.finditer(logic)
    for match in iterator:
        j = str(match.group())
    return logic.replace(j, ']==')
The output should be:
log="[x] == '1' and [y] <> '7' or [z]=='51'".
This is what i get instead:
log="[x] = '1' and [y] <> '7' or [z]=='51'".
    for match in iterator:
        j = str(match.group())
    return logic.replace(j, ']==')
This part goes through the matches and doesn't do any replacing.
Only after you leave the loop do you do any replacing, and by then j holds only the last match. That's why only the last one changes. ;)
Also, the replacement itself doesn't use the regex: plain str.replace replaces every occurrence of the substring j. So if your first = didn't have a space before it, it would get changed anyway!
Looking at your regex, there is at most one space possible between ] and =, so why not do the replacing on those two cases, instead of using regexes? ;)
def subs_equal_sign(logic):
    return logic.replace(']=', ']==').replace('] =', ']==')
Maybe the str.replace() method is what you are looking for:
log="[x] = '1' and [y] <> '7' or [z]='51'"
log = log.replace("=", "==")
Change your function to
import re
def subs_equal_sign(logic):
    y = re.compile(r'\]\s?\=\s?')
    return y.sub("]==", logic)
and the output will now be
>>> subs_equal_sign('''log="[x] = '1' and [y] <> '7' or [z]='51'".''')
'log="[x]==\'1\' and [y] <> \'7\' or [z]==\'51\'".'
as expected.
@h4z3 correctly pointed out that your key problem is iterating through the matched groups without doing anything to them. You can make it work by simply using re.sub() to replace all occurrences at once.
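As a hedged sketch of the re.sub() route that keeps the original spacing: the pattern below is not from the answers above; it is an assumption that a "lone" = is one that is not already part of ==, <=, >= or !=. The names LONE_EQUALS and double_equals are made up for illustration.

```python
import re

# A lone "=" is one that is not part of "==", "<=", ">=" or "!=".
LONE_EQUALS = re.compile(r'(?<![<>=!])=(?!=)')


def double_equals(logic):
    """Replace every stand-alone '=' with '==' and leave spacing untouched."""
    return LONE_EQUALS.sub('==', logic)


log = "[x] = '1' and [y] <> '7' or [z]='51'"
print(double_equals(log))
# [x] == '1' and [y] <> '7' or [z]=='51'
```

Unlike the bracket-anchored pattern, this version does not depend on a ] appearing before the operator, which may or may not be what you want for your data.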
A quick way to deal with this is to remove the whitespace first, so that every match has the same form:
def subs_equal_sign(logic):
    # strings are immutable, so reassign the result; note that this strips every space
    logic = logic.replace(' ', '')
    y = re.compile(r'\]\s?\=\s?')
    iterator = y.finditer(logic)
    for match in iterator:
        j = str(match.group())
    return logic.replace(j, ']==')
Does the string represent the branching logic for a REDCap variable? If so, I wrote a function a while back that should convert REDCap's SQL-like syntax to a pythonic form. Here it is:
import re
def make_pythonic(str):
    """
    Takes the branching logic string of a field name
    and converts the syntax to that of Python.
    """
    # make list of all checkbox vars in branching_logic string
    # NOTE: items in list have the same serialization (ordering)
    # as in the string.
    checkbox_snoop = re.findall(r'[a-z0-9_]*\([0-9]*\)', str)
    # if there are entries in checkbox_snoop
    if len(checkbox_snoop) > 0:
        # serially replace "[mycheckboxvar(888)]" syntax of each
        # checkbox var in the logic string with the appropriate
        # "record['mycheckboxvar___888']" syntax
        for item in checkbox_snoop:
            item = re.sub(r'\)', '', item)
            item = re.sub(r'\(', '___', item)
            # count=1 so that each pass rewrites only the next checkbox var
            str = re.sub(r'[a-z0-9_]*\([0-9]*\)', item, str, count=1)
    # mask and substitute
    str = re.sub('<=', 'Z11Z', str)
    str = re.sub('>=', 'X11X', str)
    str = re.sub('=', '==', str)
    str = re.sub('Z11Z', '<=', str)
    str = re.sub('X11X', '>=', str)
    str = re.sub('<>', '!=', str)
    str = re.sub(r'\[', 'record[\'', str)
    str = re.sub(r'\]', '\']', str)
    # return the string
    return str
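The mask-and-substitute step can also be done in a single pass with a replacement function, which avoids the placeholder tokens. This is only a sketch of that one step: the names OPERATOR, TRANSLATION and translate_operators are hypothetical, it assumes these four operators are the only ones present, and it does not do the checkbox or bracket rewriting.

```python
import re

# Longer operators come first so "<=" is not split into "<" and "=".
OPERATOR = re.compile(r'<=|>=|<>|=')
TRANSLATION = {'<=': '<=', '>=': '>=', '<>': '!=', '=': '=='}


def translate_operators(logic):
    """Translate SQL-like comparison operators to Python ones in one pass."""
    return OPERATOR.sub(lambda m: TRANSLATION[m.group()], logic)


print(translate_operators("[x] = '1' and [y] <> '7' or [z] >= '5'"))
# [x] == '1' and [y] != '7' or [z] >= '5'
```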
str.replace replaces every occurrence of the given substring in the entire string:
log = log.replace("=", "==")  # replaces the given substring with the new string
print(log)  # display

How does comparing two chars (within a string) work in Python

I am starting to learn Python and looked at following website: https://www.w3resource.com/python-exercises/string/
I work on #4 which is "Write a Python program to get a string from a given string where all occurrences of its first char have been changed to '$', except the first char itself."
str="restart"
char=str[0]
print(char)
strcpy=str
i=1
for i in range(len(strcpy)):
    print(strcpy[i], "\n")
    if strcpy[i] is char:
        strcpy=strcpy.replace(strcpy[i], '$')
print(strcpy)
I would expect "resta$t" but the actual result is: $esta$t
Thank you for your help!
There are two issues, first, you are not starting iteration where you think you are:
i = 1  # great, i is 1
for i in range(5):
    print(i)
0
1
2
3
4
i has been overwritten by the value tracking the loop.
Second, is does not test value equality; it tests whether two names refer to the same object. Value equality is what the == operator is for. Small ints and short strings can make it seem as if is compares values, because CPython caches and reuses those objects, but other types do not behave this way:
>>> a, b = 5, 5
>>> a is b
True
>>> a, b = "5", "5"
>>> a is b
True
>>> a == b
True
>>> # is does not compare values here
>>> a, b = [], []
>>> a is b
False
>>> a == b
True
As @Kevin pointed out in the comments, 99% of the time, is is not the operator you want.
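A small sketch of the identity-versus-equality distinction using lists, where the result does not depend on any caching the interpreter may do. The variable names are arbitrary.

```python
x = [1, 2]
y = [1, 2]   # a second list with equal contents
z = x        # another name for the very same list

print(x == y)  # True: same value
print(x is y)  # False: two distinct objects
print(x is z)  # True: one object, two names
```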
As far as your code goes, str.replace will replace all instances of the argument supplied with the second arg, unless you give it an optional number of instances to replace. To avoid replacing the first character, grab the first char separately, like val = somestring[0], then replace the rest using a slice, no need for iteration:
somestr = 'restart' # don't use str as a variable name
val = somestr[0] # val is 'r'
# somestr[1:] gives 'estart'
x = somestr[1:].replace(val, '$')
print(val+x)
# resta$t
If you still want to iterate, you can do that over the slice as well:
# collect your letters into a list
letters = []
char = somestr[0]
for letter in somestr[1:]:  # No need to track an index here
    if letter == char:  # don't use is, use == for value comparison
        letter = '$'  # change letter to a different value if it is equal to char
    letters.append(letter)
# Then use join to concatenate back to a string
print(char + ''.join(letters))
# resta$t
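The slice-and-replace idea can be wrapped in a small function. This is a sketch, not part of the answer above: the name mask_repeats and the mask parameter are invented here, and the empty-string guard is an added assumption about how such input should be handled.

```python
def mask_repeats(text, mask='$'):
    """Replace every later occurrence of text's first character with mask."""
    if not text:          # nothing to do for an empty string
        return text
    first = text[0]
    return first + text[1:].replace(first, mask)


print(mask_repeats('restart'))  # resta$t
```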
Your code needs some modification. The loop can be reduced as shown below. Note that this version replaces every occurrence of the first character, including the first one itself:
strcpy="restart"
for i in range(len(strcpy)):
    strcpy=strcpy.replace(strcpy[0], '$')
print(strcpy)
# $esta$t
It is also good practice in Python to wrap such code in a function. You can modify your code as shown below, or use this function directly:
def charreplace(s):
    return s.replace(s[0],'$')
charreplace("restart")
#'$esta$t'
Hope this helps.

In python, how to 'if finditer(...) has no matches'?

I would like to do something when finditer() does not find anything.
import re
pattern = "1"
string = "abc"
matched_iter = re.finditer(pattern, string)
# if matched_iter is empty (no match found):
#     do something.
# else:
for m in matched_iter:
    print(m.group())
The best thing I could come up with is to keep track of found manually:
mi_no_find = re.finditer(r'\w+',"$$%%%%")  # not matching.
found = False
for m in mi_no_find:
    print(m.group())
    found = True
if not found:
    print("Nothing found")
Related posts that don't answer:
Counting finditer matches: Number of regex matches (I don't need to count, I just need to know if there are no matches).
finditer vs match: different behavior when using re.finditer and re.match (says always have to loop over an iterator returned by finditer)
[edit]
- I have no interest in enumerating or counting total output. Only if found else not found actions.
- I understand I can put finditer into a list, but this would be inefficient for large strings. One objective is to have low memory utilization.
Updated 04/10/2020
Use re.search(pattern, string) to check if a pattern exists.
pattern = "1"
string = "abc"
if re.search(pattern, string) is None:
    print('do this because nothing was found')
Returns:
do this because nothing was found
If you want to iterate over the matches, put the re.finditer() loop inside the if re.search() block.
pattern = '[A-Za-z]'
string = "abc"
if re.search(pattern, string) is not None:
    for thing in re.finditer(pattern, string):
        print('Found this thing: ' + thing[0])
Returns:
Found this thing: a
Found this thing: b
Found this thing: c
Therefore, if you wanted both options, use the else: clause with the if re.search() conditional.
pattern = "1"
string = "abc"
if re.search(pattern, string) is not None:
    for thing in re.finditer(pattern, string):
        print('Found this thing: ' + thing[0])
else:
    print('do this because nothing was found')
Returns:
do this because nothing was found
Previous reply below (not sufficient; just read above)
If .finditer() finds no match, the body of the loop over its result never runs.
So:
Set the variable before the loop you are using to iterate over the regex returns
Call the variable after (And outside of) the loop you are using to iterate over the regex returns
This way, if nothing is returned from the regex call, the loop won't execute and your variable call after the loop will return the exact same variable it was set to.
Below, example 1 demonstrates the regex finding the pattern. Example 2 shows the regex not finding the pattern, so the variable within the loop is never set.
Example 3 shows my suggestion - where the variable is set before the regex loop, so if the regex does not find a match (and subsequently, does not trigger the loop), the variable call after the loop returns the initial variable set (Confirming the regex pattern was not found).
Remember to import the re module.
EXAMPLE 1 (Searching for the characters 'he' in the string 'hello world' will return 'he')
my_string = 'hello world'
pat = '(he)'
regex = re.finditer(pat,my_string)
for a in regex:
    b = str(a.groups()[0])
    print(b)
# returns 'he'
EXAMPLE 2 (Searching for the characters 'ab' in the string 'hello world' does not match anything, so the 'for a in regex:' loop does not execute and never assigns the b variable a value.)
my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)
for a in regex:
    b = str(a.groups()[0])
    print(b)
# no return
EXAMPLE 3 (Searching for the characters 'ab' again, but this time setting the variable b to 'CAKE' before the loop, and calling the variable b after, outside of the loop returns the initial variable - i.e. 'CAKE' - since the loop did not execute).
my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)
b = 'CAKE' # sets the variable prior to the for loop
for a in regex:
    b = str(a.groups()[0])
print(b)  # calls the variable after (and outside) the loop
# returns 'CAKE'
It's also worth noting that, because these examples read a.groups()[0], the pattern you feed into the regex must use parentheses to mark the start and end of a group.
pattern = '(ab)'  # use this
pattern = 'ab'  # avoid using this; a.groups() would be empty
To tie back to the initial question:
Since the for loop (for a in regex) does not execute when nothing is found, you can preload the variable and then check after the loop whether it still holds the original value. This tells you that nothing was found.
my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)
b = 'CAKE'  # sets the variable prior to the for loop
for a in regex:
    b = str(a.groups()[0])
if b == 'CAKE':
    pass  # action taken if nothing is returned
If performance isn't an issue, simply use findall or list(finditer(...)), which returns a list.
Otherwise, you can "peek" into the iterator with next: if it raises StopIteration there are no matches, and if it doesn't you loop as normal. Though there are other ways to do it, this is the simplest to me:
import itertools
import re
pattern = "1"
string = "abc"
matched_iter = re.finditer(pattern, string)
try:
    first_match = next(matched_iter)
except StopIteration:
    print("No match!")  # action for no match
else:
    for m in itertools.chain([first_match], matched_iter):
        print(m.group())
You can probe the iterator with next and then chain the results back together while excepting StopIteration which means the iterator was empty:
import itertools as it
matches = iter([])
try:
    probe = next(matches)
except StopIteration:
    print('empty')
else:
    for m in it.chain([probe], matches):
        print(m)
Regarding your solution you could check m directly, setting it to None beforehand:
matches = iter([])
m = None
for m in matches:
    print(m)
if m is None:
    print('empty')
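The probe-and-chain technique can be packaged as a reusable helper. In this sketch the name peek is hypothetical; it returns None for an empty iterable and otherwise an iterator that yields exactly the same items, without building a list.

```python
import itertools
import re


def peek(iterable):
    """Return an equivalent iterator, or None if iterable yields nothing."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return None
    return itertools.chain([first], iterator)


matches = peek(re.finditer(r'\d', 'a1b2'))
if matches is None:
    print('Nothing found')
else:
    for m in matches:
        print(m.group())
```

Only the first match is held in memory before the loop starts, which keeps the low-memory property the question asks for.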
This function replaces the n-th occurrence of a pattern in the string. If the string contains fewer than n matches, it returns the original string unchanged.
For more reference: https://docs.python.org/2/howto/regex.html
import re
Input_Str = "FOOTBALL"
def replacing(Input_String, char_2_replace, replaced_char, n):
    pattern = re.compile(char_2_replace)
    if len(re.findall(pattern, Input_String)) >= n:
        where = [m for m in pattern.finditer(Input_String)][n-1]
        before = Input_String[:where.start()]
        after = Input_String[where.end():]
        newString = before + replaced_char + after
    else:
        newString = Input_String
    return newString
print(replacing(Input_Str, 'L', 'X', 4))
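The same n-th-occurrence replacement can be sketched without building any lists, by skipping ahead in the finditer iterator with itertools.islice. The name replace_nth is made up for this example, and it assumes n is 1 or greater.

```python
import re
from itertools import islice


def replace_nth(text, pattern, replacement, n):
    """Replace the n-th (1-based) match of pattern; return text unchanged if absent."""
    # islice skips the first n - 1 matches; next() returns None if none is left.
    match = next(islice(re.finditer(pattern, text), n - 1, None), None)
    if match is None:
        return text
    return text[:match.start()] + replacement + text[match.end():]


print(replace_nth("FOOTBALL", "L", "X", 2))  # FOOTBALX
print(replace_nth("FOOTBALL", "L", "X", 4))  # FOOTBALL
```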
I know this answer is late, but it is very suitable for Python 3.8+
You can use the new walrus operator := along with next(iterator[, default]) to solve for 'no matches' in re.finditer(pattern, string, flags=0) somewhat like this:
import re
pattern_ = "1"
string_ = "abc"
def is_match():
    was_found = False
    matches = re.finditer(pattern_, string_)  # create the iterator once
    while (match := next(matches, None)) is not None:
        was_found = True
        yield match.group()  # or just print it
    return was_found  # a generator's return value is carried by StopIteration

Simplifying many if-statements

Is there a way to simplify this pile of if-statements? This parsing function sure works (with the right dictionaries), but it has to test 6 if-statements for each word in the input. For a 5-word sentence that would be 30 if-statements. It is also kind of hard to read.
def parse(text):
    predicate=False
    directObjectAdjective=False
    directObject=False
    preposition=False
    indirectObjectAdjective=False
    indirectObject=False
    text=text.casefold()
    text=text.split()
    for word in text:
        if not predicate:
            if word in predicateDict:
                predicate=predicateDict[word]
                continue
        if not directObjectAdjective:
            if word in adjectiveDict:
                directObjectAdjective=adjectiveDict[word]
                continue
        if not directObject:
            if word in objectDict:
                directObject=objectDict[word]
                continue
        if not preposition:
            if word in prepositionDict:
                preposition=prepositionDict[word]
                continue
        if not indirectObjectAdjective:
            if word in adjectiveDict:
                indirectObjectAdjective=adjectiveDict[word]
                continue
        if not indirectObject:
            if word in objectDict:
                indirectObject=objectDict[word]
                continue
    if not directObject and directObjectAdjective:
        directObject=directObjectAdjective
        directObjectAdjective=False
    if not indirectObject and indirectObjectAdjective:
        indirectObject=indirectObjectAdjective
        indirectObjectAdjective=False
    return [predicate,directObjectAdjective,directObject,preposition,indirectObjectAdjective,indirectObject]
Here's also a sample of a dictionary, if that's needed.
predicateDict={
    "grab":"take",
    "pick":"take",
    "collect":"take",
    "acquire":"take",
    "snag":"take",
    "gather":"take",
    "attain":"take",
    "capture":"take",
    "take":"take"}
This is more of a Code Review question than a Stack Overflow one. A major issue is that you have similar data that you're keeping in separate variables. If you combine your variables, then you can iterate over them.
missing_parts_of_speech = ["predicate", [...]]
dict_look_up = {"predicate":predicateDict,
    [...]
    }
found_parts_of_speech = {}
for word in text:
    for part in missing_parts_of_speech:
        if word in dict_look_up[part]:
            found_parts_of_speech[part] = dict_look_up[part][word]
            missing_parts_of_speech.remove(part)
            break  # next word; also stops iterating the list that was just modified
I would suggest simply using the method dict.get. This method has the optional argument default. By passing this argument you can avoid a KeyError: if the key is not present in the dictionary, the default value is returned.
If you pass the variable's current value as the default, a word that is not in the dictionary leaves the variable unchanged instead of overwriting it with an arbitrary value. E.g., if the current word is a "predicate", the "direct object" is simply reassigned the value that was already stored in the variable.
CODE
def parse(text):
    predicate = False
    directObjectAdjective = False
    directObject = False
    preposition = False
    indirectObjectAdjective = False
    indirectObject = False
    text=text.casefold()
    text=text.split()
    for word in text:
        predicate = predicateDict.get(word, predicate)
        directObjectAdjective = adjectiveDict.get(word, directObjectAdjective)
        directObject = objectDict.get(word, directObject)
        preposition = prepositionDict.get(word, preposition)
        indirectObjectAdjective = adjectiveDict.get(word, indirectObjectAdjective)
        indirectObject = objectDict.get(word, indirectObject)
    if not directObject and directObjectAdjective:
        directObject = directObjectAdjective
        directObjectAdjective = False
    if not indirectObject and indirectObjectAdjective:
        indirectObject = indirectObjectAdjective
        indirectObjectAdjective = False
    return [predicate, directObjectAdjective, directObject, preposition, indirectObjectAdjective, indirectObject]
PS: Use a few more spaces. Readers will thank you...
PPS: I have not tested this, for I do not have such dictionaries at hand.
PPPS: This will always return the last occurrences of the types within the text, while your implementation will always return the first occurrences.
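The first-versus-last difference from the PPPS can be seen in a tiny sketch. The verbs table and the word list are hypothetical; the second loop shows one way to keep dict.get while still freezing the value after the first hit.

```python
verbs = {"grab": "take", "drop": "release"}   # hypothetical lookup table
words = ["grab", "it", "then", "drop", "it"]

last = False
for word in words:
    last = verbs.get(word, last)          # overwritten by every later hit

first = False
for word in words:
    if not first:
        first = verbs.get(word, first)    # frozen after the first hit

print(first, last)  # take release
```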
You could map the different kinds of words (as strings) to dictionaries where to find those words, and then just check which of those have not been found yet and look them up if they are in those dicts.
needed = {"predicate": predicateDict,
          "directObjectAdjective": adjectiveDict,
          "directObject": objectDict,
          "preposition": prepositionDict,
          "indirectObjectAdjective": adjectiveDict,
          "indirectObject": objectDict}
for word in text:
    for kind in needed:
        if isinstance(needed[kind], dict) and word in needed[kind]:
            needed[kind] = needed[kind][word]
            break  # move on to the next word once a slot is filled
In the end (and in each step on the way) all the items in needed that do not have a dict as a value have been found and replaced by the value from their respective dict.
(In retrospect, it might make more sense to use two dictionaries, or one dict and a set: one for the final value for that kind of word, and one for whether they have already been found. That would probably be a bit easier to grasp.)
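A sketch of the two-structure variant suggested in the parenthesis: one ordered table of slots and a separate dict of results. The miniature dictionaries are invented for illustration, and the final step of the original parse (promoting a lone adjective to an object) is deliberately left out.

```python
# Hypothetical miniature dictionaries, for illustration only.
predicateDict = {"grab": "take", "take": "take"}
adjectiveDict = {"red": "red", "big": "big"}
objectDict = {"ball": "ball", "box": "box"}
prepositionDict = {"into": "in", "in": "in"}

# Order matters: it fixes which slot a word is tried against first.
SLOTS = [
    ("predicate", predicateDict),
    ("directObjectAdjective", adjectiveDict),
    ("directObject", objectDict),
    ("preposition", prepositionDict),
    ("indirectObjectAdjective", adjectiveDict),
    ("indirectObject", objectDict),
]


def parse(text):
    found = {}                                  # slot name -> looked-up value
    for word in text.casefold().split():
        for name, lookup in SLOTS:
            if name not in found and word in lookup:
                found[name] = lookup[word]
                break                           # one word fills one slot
    return found


print(parse("Grab the red ball into the big box"))
```

Adding another part of speech then means adding one entry to SLOTS rather than another if-block.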
I suggest that you use a new pattern for this code instead of the old one. The new pattern has 9 lines and stays at 9 lines: to test more dictionaries, just add them to D. The old one already has 11 lines and grows by 4 lines with every additional dictionary.
aDict = { "a1" : "aa1", "a2" : "aa1" }
bDict = { "b1" : "bb1", "b2" : "bb2" }
text = ["a1", "b2", "a2", "b1"]
# old pattern
a = False
b = False
for word in text:
    if not a:
        if word in aDict:
            a = aDict[word]
            continue
    if not b:
        if word in bDict:
            b = bDict[word]
            continue
print(a, b)
# new pattern
D = [ aDict, bDict]
A = [ False for _ in D]
for word in text:
    for i, a in enumerate(A):
        if not a:
            if word in D[i]:
                A[i] = D[i][word]
                break  # go on to the next word, like continue in the old pattern
print(A)

How can you group a very specific pattern with regex?

Problem:
https://coderbyte.com/editor/Simple%20Symbols
The str parameter will be composed of + and = symbols with
several letters between them (ie. ++d+===+c++==a) and for the string
to be true each letter must be surrounded by a + symbol. So the string
to the left would be false. The string will not be empty and will have
at least one letter.
Input:"+d+=3=+s+"
Output:"true"
Input:"f++d+"
Output:"false"
I'm trying to create a regular expression for the following problem, but I keep running into various problems. How can I produce something that returns the specified rules('+\D+')?
import re
plusReg = re.compile(r'[(+A-Za-z+)]')
plusReg.findall()
>>> []
Here I thought I could create my own class that searches for the pattern.
import re
plusReg = re.compile(r'([\\+,\D,\\+])')
plusReg.findall('adf+a+=4=+S+')
>>> ['a', 'd', 'f', '+', 'a', '+', '=', '=', '+', 'S', '+']
Here I thought the '\\+' would single out the plus symbol and read it as a char.
mo = plusReg.search('adf+a+=4=+S+')
mo.group()
>>>'a'
Here using the same shell, I tried using the search instead of findall, but I just ended up with the first letter which isn't even surrounded by a plus.
My end result is to group the string 'adf+a+=4=+S+' into ['+a+','+S+'] and so on.
edit:
Solution:
import re
def SimpleSymbols(str):
    #added padding, because if str = 'y+4==+r+'
    #then program would return true when it should return false.
    string = '=' + str + '='
    #regex that matches a letter that *doesn't* have a + in front or back
    plusReg = re.compile(r'[^\+][A-Za-z].|.[A-Za-z][^\+]')
    #if statement that returns "true" if regex doesn't find any letters
    #without a + behind or in front
    if plusReg.search(string) is None:
        return "true"
    return "false"
print(SimpleSymbols(input()))
I borrowed some code from ekhumoro and Sanjay. Thanks
One approach is to search the string for any letters that are either: (1) not preceded by a +, or (2) not followed by a +. This can be done using look ahead and look behind assertions:
>>> rgx = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')
So if rgx.search(string) returns None, the string is valid:
>>> rgx.search('+a+') is None
True
>>> rgx.search('+a+b+') is None
True
but if it returns a match, the string is invalid:
>>> rgx.search('+ab+') is None
False
>>> rgx.search('+a=b+') is None
False
>>> rgx.search('a') is None
False
>>> rgx.search('+a') is None
False
>>> rgx.search('a+') is None
False
The important thing about look ahead/behind assertions is that they don't consume characters, so they can handle overlapping matches.
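Wrapped into a function, the lookaround approach might look like this sketch. The pattern is the one from this answer; the names BAD_LETTER and simple_symbols are made up, and the function returns a bool rather than the "true"/"false" strings the exercise asks for.

```python
import re

# A letter that lacks a "+" immediately before it or immediately after it.
BAD_LETTER = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')


def simple_symbols(s):
    """Return True when every letter in s has a '+' on both sides."""
    return BAD_LETTER.search(s) is None


print(simple_symbols("+d+=3=+s+"))  # True
print(simple_symbols("f++d+"))      # False
```

Because the assertions also fail at the string boundaries, no padding is needed for a letter in the first or last position.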
Something like this should do the trick:
import re
def is_valid_str(s):
    # lookarounds let neighbouring letters such as '+a+b+' share a '+'
    return re.findall(r'[a-zA-Z]', s) == re.findall(r'(?<=\+)[a-zA-Z](?=\+)', s)
Usage:
In [10]: is_valid_str("f++d+")
Out[10]: False
In [11]: is_valid_str("+d+=3=+s+")
Out[11]: True
I think you are on the right track. The regular expression you have is correct, but it can be simplified down to just letters:
search_pattern = re.compile(r'\+[a-zA-Z]\+')
for upper- and lower-case letters. Now we can use this regex with the findall function:
results = re.findall(search_pattern, 'adf+a+=4=+S+')  # returns ['+a+', '+S+']
Now the question needs you to return a boolean depending on whether the string is valid according to the specified pattern, so we can wrap this all up into a function:
def is_valid_pattern(pattern_string):
    # the trailing + is a lookahead so that adjacent letters such as '+a+b+' can share it
    search_pattern = re.compile(r'\+[a-zA-Z](?=\+)')
    letter_pattern = re.compile(r'[a-zA-Z]')  # to search for all letters
    results = re.findall(search_pattern, pattern_string)
    letters = re.findall(letter_pattern, pattern_string)
    # if the length of the list of all the letters equals the length of all
    # the values found with the pattern, we can say that it is a valid string
    return len(results) == len(letters)
You should be looking for what isn't there, as opposed to what is. You should search for something like ([^\+][A-Za-z]|[A-Za-z][^\+]). The | in the middle is a logical or operator. Then on either side, it checks if it can find any scenario where there is a letter without a "+" on the left/right respectively. If it finds something, that means the string fails. If it can't find anything, that means that there are no instances of a letter not being surrounded by "+"'s.
