string match algorithm code for advice

I am debugging the following problem; the problem statement and the reference code are posted below. My question: I think this if condition check is not necessary and could be removed safely. Is that right? If I am wrong, please feel free to correct me. Thanks.
if len(first) > 1 and first[0] == '*' and len(second) == 0:
    return False
Given two strings where first string may contain wild card characters and second string is a normal string. Write a function that returns true if the two strings match. The following are allowed wild card characters in first string.
* --> Matches with 0 or more instances of any character or set of characters.
? --> Matches with any one character.
For example, g*ks matches geeks, and ge?ks* matches geeksforgeeks (note the * at the end of the first string). But g*k doesn't match gee, as the character k is not present in the second string.
# Python program to match wild card characters
# The main function that checks if two given strings match.
# The first string may contain wildcard characters
def match(first, second):
    # If we reach at the end of both strings, we are done
    if len(first) == 0 and len(second) == 0:
        return True
    # Make sure that the characters after '*' are present
    # in second string. This function assumes that the first
    # string will not contain two consecutive '*'
    if len(first) > 1 and first[0] == '*' and len(second) == 0:
        return False
    # If the first string contains '?', or current characters
    # of both strings match
    if (len(first) > 1 and first[0] == '?') or (len(first) != 0
            and len(second) != 0 and first[0] == second[0]):
        return match(first[1:], second[1:])
    # If there is *, then there are two possibilities
    # a) We consider current character of second string
    # b) We ignore current character of second string.
    if len(first) != 0 and first[0] == '*':
        return match(first[1:], second) or match(first, second[1:])
    return False
thanks in advance,
Lin

That if statement is critical to the proper operation of the function. Removing it will have disastrous consequences.
For example, assume that first="*a" and second="". In other words, the function was called as match("*a",""). Then the if statement will cause the function to return False (which is correct since there is no a in second). Without the if statement, the code will proceed to the line
return match(first[1:],second) or match(first,second[1:])
The call match(first[1:],second) will evaluate to match("a","") which will return False. But when the code calls match(first,second[1:]), the call is equivalent to match("*a",""), and the result is infinite recursion.
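The behaviour described above can be reproduced with a small sketch. It restates the question's function with one hypothetical addition, a use_guard flag that is not in the original code and exists only to switch the disputed check on and off for the demonstration.

```python
def match(first, second, use_guard=True):
    """Wildcard match from the question; use_guard is a hypothetical demo flag."""
    if not first and not second:
        return True
    # The check under discussion: pattern continues after '*', but no text is left.
    if use_guard and len(first) > 1 and first[0] == '*' and not second:
        return False
    if (len(first) > 1 and first[0] == '?') or (
            first and second and first[0] == second[0]):
        return match(first[1:], second[1:], use_guard)
    if first and first[0] == '*':
        return (match(first[1:], second, use_guard)
                or match(first, second[1:], use_guard))
    return False


print(match("*a", ""))  # False: the guard stops the recursion

try:
    match("*a", "", use_guard=False)
except RecursionError:
    # Without the guard, match("*a", "") keeps calling itself with the same arguments.
    print("RecursionError without the guard")
```

With the guard in place the call returns immediately; with it disabled, Python's recursion limit is what eventually ends the loop.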

Related

Not Clear How to Account for a Specific Test Case in My Function that Compares Strings Looking for Differences

Write a function called singleline_diff_format that takes two single line strings and the index of the first difference and will generate a formatted string that will allow a user to clearly see where the first difference between two lines is located. A user is likely going to want to see where the difference is in the context of the lines, not just see a number. Your function will return a three-line string that looks as follows:
#abcd
#==^
#abef
The format of these three lines is:
1) The complete first line.
2) A separator line that consists of repeated equal signs ("=") up until the first difference. A "^" symbol indicates the position of the first difference.
3) The complete second line.
If either line contains a newline or carriage return character ("\n" or "\r") then the function returns an empty string (since the lines are not single lines and the output format will not make sense to a person reading it).
If the index is not a valid index that could indicate the position of the first difference of the two input lines, the function should also return an empty string (again because the output would not make sense otherwise). It must, therefore, be between 0 and the length of the shorter line. Note that you do not need to check whether the index actually identifies the correct location of the first difference, as that should have been computed correctly prior to calling this function.
I am able to write the function, and I used an if-else setup to evaluate if the index doesn't equal -1 and if the lines contain any '\r' or '\n' characters. I then have it printing out the result like in the instructions above. If it doesn't meet those cases, then the function returns an empty string.
def singleline_diff_format(line1, line2, idx):
    """
    Inputs:
      line1 - first single line string
      line2 - second single line string
      idx   - index at which to indicate difference
    Output:
      Returns a three line formatted string showing the location
      of the first difference between line1 and line2.
      If either input line contains a newline or carriage return,
      then returns an empty string.
      If idx is not a valid index, then returns an empty string.
    """
    if idx != -1 and "\n" not in line1 and "\n" not in line2 and "\r" not in line1 and "\r" not in line2:
        difference = line1 + "\n" + "=" * idx + "^" + "\n" + line2 + "\n"
        return difference
    else:
        return ""
The problem I am running into is how to address "If the index is not a valid index that could indicate the position of the first difference of the two input lines, the function should also return an empty string (again because the output would not make sense otherwise). It must therefore be between 0 and the length of the shorter line."
print(singleline_diff_format('abcdefg', 'abc', 5)) #should return empty string
Instead, I get this:
abcdefg
===^
abc
As it stands right now, my if conditional is pretty long. I am not sure of a good way to account for if the index is bigger than the length of the shorter line in my conditional. I have two questions.
1) Is there a way to condense down my current conditional into a more elegant statement?
2) How do I account for a scenario where the index can exceed the length of a shorter line? I have a function (see below) that might help with that. Should I invoke it, and if so, how do I invoke it for this case?
Potential useful function
IDENTICAL = -1
def singleline_diff(line1, line2):
    """
    Inputs:
      line1 - first single line string
      line2 - second single line string
    Output:
      Returns the index where the first difference between
      line1 and line2 occurs.
      Returns IDENTICAL if the two lines are the same.
    """
    if len(line1) > len(line2):
        i = 0
        for i in range(len(line2)):
            if line1[i] == line2[i]:
                i += 1
            elif line1[i] != line2[i]:
                return i
        return len(line2)
    elif len(line1) < len(line2):
        i = 0
        for i in range(len(line1)):
            if line1[i] == line2[i]:
                i += 1
            elif line1[i] != line2[i]:
                return i
        return len(line1)
    else: #Condition where the lengths of the strings are equal
        i = 0
        for i in range(len(line1)):
            if line1[i] == line2[i]:
                i += 1
            elif line1[i] != line2[i]:
                return i
        return IDENTICAL

First of all, there is no need to compound all of your special conditions into a single check. Your program will be much easier to read if you handle those separately. Also, use temporary variables to avoid calling functions repeatedly. For starters ...
len1 = len(line1)
len2 = len(line2)
empty = ""
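Now, your problem condition is simply
# Check for invalid index: negative, or past the end of the shorter line.
# idx may equal the shorter length, which is where the lines first differ
# when one line is a prefix of the other.
if idx < 0 or idx > min(len1, len2):
    return empty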
Continuing ...
# Check for return chars
both_line = line1 + line2
nl = '\n'
ret = '\r'
if nl in both_line or ret in both_line:
    return empty
You can also simplify your difference checking. zip will let you make nice character pairs from your two strings; enumerate will let you iterate through the pairs and a loop index. In the first two examples below, there is no difference within the range of the shorter string, so there's no output.
def diff(line1, line2):
    for idx, pair in enumerate(zip(line1, line2)):
        if pair[0] != pair[1]:
            print(idx, pair)
diff("abc", "abc")
diff("abc", "abcd")
diff("abz", "abc")
diff("abc", "qbc")
Output:
2 ('z', 'c')
0 ('a', 'q')
I'll leave the application as an exercise for the student. :-)
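For readers who want to see the pieces put together, here is one possible sketch of the formatting function with the checks handled separately. It assumes, based on the problem statement, that idx may be anywhere from 0 up to and including the length of the shorter line; treat it as an illustration rather than the assignment's reference solution.

```python
def singleline_diff_format(line1, line2, idx):
    """Return the three-line diff marker string, or "" for invalid input."""
    # Reject input that is not a single line.
    if any(ch in line for line in (line1, line2) for ch in "\r\n"):
        return ""
    # Assumption: idx == shorter length is valid (one line is a prefix of the other).
    if not 0 <= idx <= min(len(line1), len(line2)):
        return ""
    return f"{line1}\n{'=' * idx}^\n{line2}\n"


print(singleline_diff_format("abcd", "abef", 2))
print(repr(singleline_diff_format("abcdefg", "abc", 5)))  # ''
```

The chained comparison keeps the index check to one line, and separating the two checks makes each rejection reason easy to read.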

How do I detect any of 4 characters in a string and then return their index?

I would understand how to do this assuming that I was only looking for one specific character, but in this instance I am looking for any of the 4 operators, '+', '-', '*', '/'. The function returns -1 if there is no operator in the passed string, txt, otherwise it returns the position of the leftmost operator. So I'm thinking find() would be optimal here.
What I have so far:
def findNextOpr(txt):
    # txt must be a nonempty string.
    if len(txt) <= 0 or not isinstance(txt, str):
        print("type error: findNextOpr")
        return "type error: findNextOpr"
    if '+' in txt:
        return txt.find('+')
    elif '-' in txt:
        return txt.find('-')
    else:
        return -1
I think if I did what I did for the '+' and '-' operators for the other operators, it wouldn't work for multiple instances of that operator in one expression. Can a loop be incorporated here?

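Your current approach is not very efficient: it scans txt twice for each operator, once for in and once for find().
You could use index() instead of find() and just ignore the ValueError exception. Note that this still tries the operators in the fixed order '+-*/', so it returns the position of the first operator kind it finds, not necessarily the leftmost operator, e.g.:
def findNextOpr(txt):
    for o in '+-*/':
        try:
            return txt.index(o)
        except ValueError:
            pass
    return -1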
You can do this in a single (perhaps more readable) pass by enumerate()ing the txt and return if you find the character, e.g.:
def findNextOpr(txt):
    for i, c in enumerate(txt):
        if c in '+-*/':
            return i
    return -1
Note: if you wanted all of the operators you could change the return to yield, and then just iterate over the generator, e.g.:
def findNextOpr(txt):
    for i, c in enumerate(txt):
        if c in '+-*/':
            yield i
In []:
for op in findNextOpr('1+2-3+4'):
    print(op)
Out[]:
1
3
5

You can improve your code a bit, because you keep searching the string many times. '+' in txt actually searches through the string just like txt.find('+') does, so you can combine the two to avoid searching twice:
pos = txt.find('+')
if pos >= 0:
    return pos
But this still leaves you with the problem that it returns the position of the first operator you look for whenever that operator appears anywhere in the string. So you don't actually get the first position at which any of these operators occurs.
What you want to do is look for all operators separately, and then return the lowest non-negative result, since that's the first occurrence of any of the operators within the string. Passing default=-1 to min() covers the case where no operator is found:
plusPos = txt.find('+')
minusPos = txt.find('-')
multPos = txt.find('*')
divPos = txt.find('/')
return min((pos for pos in (plusPos, minusPos, multPos, divPos) if pos >= 0), default=-1)

First, you shouldn't be printing or returning error messages; you should be raising exceptions. TypeError and ValueError would be appropriate here. (A string that isn't long enough is the latter, not the former.)
Second, you can simply find the positions of all the operators in the string using a list comprehension, exclude results of -1, and return the lowest of the positions using min().
def findNextOpr(text, start=0):
    ops = "+-/*"
    assert isinstance(text, str), "text must be a string"
    # "text must not be empty" isn't strictly true:
    # you'll get a perfectly sensible result for an empty string
    assert text, "text must not be empty"
    op_idxs = [pos for pos in (text.find(op, start) for op in ops) if pos > -1]
    return min(op_idxs) if op_idxs else -1
I've added a start argument that can be used to find the next operator: simply pass in the index of the last-found operator, plus 1.
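As a rough illustration of how the start argument might be used, the sketch below walks a string and collects every operator position. The names find_next_opr and all_operator_positions are hypothetical and chosen for this example only; the search logic follows the answer above, with the assertions left out for brevity.

```python
def find_next_opr(text, start=0):
    """Index of the leftmost operator at or after start, or -1 if none."""
    ops = "+-/*"
    positions = [pos for pos in (text.find(op, start) for op in ops) if pos > -1]
    return min(positions) if positions else -1


def all_operator_positions(text):
    """Hypothetical helper: repeatedly resume the search after the last hit."""
    found = []
    pos = find_next_opr(text)
    while pos != -1:
        found.append(pos)
        pos = find_next_opr(text, pos + 1)
    return found


print(all_operator_positions("1+2-3*4"))  # [1, 3, 5]
```

Each call restarts the search one character past the previous operator, so every operator is reported exactly once.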

Why is my new split array element being taken as a non-int value?

My question is regarding the way Python3 handles certain array elements.
Here is my code:
def isIPv4(inputStr):
    inputStr.split('.') #splits input into array elements (no periods)
    val = []
    for i in inputStr:
        if not i.isdigit(): #if the element is not a digit (valid to convert to INT).
            return False
        if int(i) >=0 and int(i)<=255: #element value between 0 and 255.
            val.append(i)
        else:
            return False
    return len(val) == 4 #array has 4 elements ^^
The code should let me know if the input is an IPv4 address, meaning for numbers between 0 and 255, separated by periods. Returns True or False.
inputs that work:
inputStr: "1.1.1.1a"
inputStr: "0..1.0"
For both, my code correctly returns False.
inputs that do not work:
inputStr: "172.16.254.1"
inputStr: "0.254.255.0"
For these, my code also returns False, while it should return True instead.
As you can see, the program handles the splitting of the dot separated values properly, however, even though '1a' is being correctly thrown out as a non-int, '0' and also '172' are being thrown out.
I realize that '0' and '172' are both strings, so is there something I should know about how the Python3 module handles this data?

inputStr.split('.') returns a list, but you ignored that list altogether. inputStr itself is an immutable string and does not change.
So inputStr stays the original string, and your loop tests whether each individual character is a digit. That fails for the . characters.
You need to store the result of the str.split() call and test against that:
def isIPv4(inputStr):
    parts = inputStr.split('.')
    for part in parts:
        if not part.isdigit():
            return False
        if not (0 <= int(part) <= 255):
            return False
    return len(parts) == 4
Note that you don't have to build a new val list either; just test if the split result is the right length.
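A short check makes the cause visible. This is only an illustrative snippet, with input_str as a stand-in variable name: str.split() hands back a new list and leaves the original string untouched, so looping over the string itself visits single characters, including the dots.

```python
input_str = "172.16.254.1"

parts = input_str.split(".")   # the result must be stored to be useful
print(parts)                   # ['172', '16', '254', '1']
print(input_str)               # 172.16.254.1 (unchanged)

# Iterating over the string yields characters, and '.' is not a digit.
non_digits = [ch for ch in input_str if not ch.isdigit()]
print(non_digits)              # ['.', '.', '.']
```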

Create Your Own Find String Function

For a school project I have to create a function called find_str that essentially does the same thing as the .find string method, but we cannot use any string methods in our definition.
The project description reads: "Function find_str has two parameters (both strings). It returns the lowest index where the second parameter is found within the first parameter (it returns -1 if the second parameter is not found within the first parameter)."
I have spent a lot of time working on this project and have yet to come to a solution. This is the current definition that I have come up with:
def find_str (string, substring):
    index = 0
    length = len (substring)
    for ch in string:
        if ch == substring [0]:
            subindex1 = 0
            subindex2 = index
            for i in range (length):
                if ch == substring [i]:
                    subindex1 +=1
                    if subindex1 == length:
                        return index
                    ch = string [(subindex2)+1]
                    subindex2 +=1
        index += 1
    return "-1"
This sample of code only works in some instances, but not all.
For example:
print (find_str ("hello", "llo"))
returns:
2
as it should.
But
print (find_str ("hello", "el"))
returns:
ch = string [(subindex2)+1]
IndexError: string index out of range
I feel like I am overthinking this and there must be is an easier way to do it. Any input or help would be great! Thanks.

Using a sub-function to organize your thoughts often helps.
def find_str(string, substring):
    # Only try start positions where the whole substring still fits,
    # so is_next_sub never indexes past the end of string.
    for j in range(len(string) - len(substring) + 1):
        if is_next_sub(string, substring, j):
            return j
    return -1
def is_next_sub(string, substring, index):
    for i in range(len(substring)):
        if substring[i] != string[index + i]:
            return False
    return True

I'm not sure we should be helping you with 'homework'
How about this:
def find_str(string, substring):
    for off in range(len(string)):
        if string[off:].startswith(substring):
            return off
    return -1

I haven't checked through your code in detail, but it looks like you're trying to compare characters that don't exist.
Suppose you're searching "aaaaa" for the substring "aaa", and you need to find all matches...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Even though the characters always match, and there are five characters in the string, there are only three positions that you might need to consider.
So before you look at the actual characters at all, you can restrict the number of start positions you might need to consider based on the lengths of the string and substring. You only loop for those start positions. That means you're not looping for start positions that cannot match. Also, if you don't do this...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Match at 3 : ...aa!
Match at 4 : ....a!!
Those exclamation points are places where you try to match a character in the substring with a character that doesn't exist, after the end of the string. You can check for that within the loop to avoid the error each time it occurs, but why not eliminate all those cases at once by not looping for the match positions that cannot occur?
The number of start positions you may need to check is len(fullstring) + 1 - len(substring), so you can derive a range of possible start positions using range(0, len(fullstring) + 1 - len(substring)).
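The range described above might be applied as in the following sketch. It uses only indexing and len(), no string methods, and the parameter name fullstring is taken from the formula above; the rest is one possible way to write it, not the only one.

```python
def find_str(fullstring, substring):
    """Lowest index where substring occurs in fullstring, or -1."""
    n, m = len(fullstring), len(substring)
    # Only start positions where the whole substring still fits.
    for start in range(0, n + 1 - m):
        for i in range(m):
            if fullstring[start + i] != substring[i]:
                break  # mismatch, try the next start position
        else:
            # Inner loop finished without a mismatch: full match at start.
            return start
    return -1


print(find_str("hello", "llo"))  # 2
print(find_str("hello", "el"))   # 1
print(find_str("hello", "lox"))  # -1
```

Because the outer range stops early, fullstring[start + i] can never run past the end of the string, so no extra bounds check is needed inside the loop.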

a string that for every character in it, there exists all the characters which are alphabetically smaller than it before it

How can I check that, for every character in a string, all the characters alphabetically smaller than it appear somewhere before it? For example, aab is correct while aacb is not: in the second case we have 'c' but 'b' is not present before it.
Also aac is not correct, as it does not have 'b' before 'c'.

A pseudocode. Works for cases like abac too.
max = 'a' - 1 // character immediately before 'a'
for char in string
    if char > max + 1
        // bad string, stop algorithm
    end
    if char > max
        max = char
    end
end
The idea is that we need to check only that the character preceding the current one alphabetically has occurred before. If we have character e now and d has occurred before, then c, b and a did too.
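The pseudocode above translates fairly directly into Python. The sketch below assumes the input contains only lowercase ASCII letters, and the function name is_gapless is made up for this example.

```python
def is_gapless(s):
    """True if every character is preceded by all alphabetically smaller ones."""
    highest = ord('a') - 1  # code point just before 'a'
    for ch in s:
        code = ord(ch)
        if code > highest + 1:
            # ch skips at least one letter that has not appeared yet.
            return False
        if code > highest:
            highest = code
    return True


for word in ("aab", "abac", "aacb", "aac"):
    print(word, is_gapless(word))
```

Tracking only the highest letter seen so far is enough, for the reason given above: if the immediate predecessor has occurred, so have all letters before it.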

Consider this as a bad answer
import string
foo = string.printable[10:36]  # the lowercase letters a-z
a = 'aac'
for i in a:
    if i == 'a': continue
    if a.rfind(foo[foo.rfind(i)-1]) != -1: continue
    else: print('check_not cleared'); break

ALPHA = 'abcdefghijklmnopqrstuvwxyz'
tests = [
    'aab','abac','aabaacaabade', # First 3 tests should eval True
    'ba','aac','aabbccddf' # Last 3 test should eval False
]
def CheckString(test):
    alpha_counter = 0
    while test:
        if test[0] == ALPHA[alpha_counter]:
            test = test.replace(ALPHA[alpha_counter],'')
            alpha_counter+=1
        else:
            return False
    return True
for test in tests:
    print(CheckString(test))
True
True
True
False
False
False
Given your criteria...
All you need to do is check the first letter to see if it passes your criteria... if it does, remove all occurrences of that letter from the string. And move onto the next letter. Your given criteria makes it easy because you just need to check alphabetically.
aabaacaabade
take the string above for example.
first letter 'a' passes criteria [there are no letters before 'a']
remove all 'a's from string remaining string: bcbde
first letter 'b' passes criteria [there was an 'a' before the 'b']
remove all 'b's from string remaining string: cde
first letter 'c' passes criteria [there was an 'a' and a 'b' before the 'c']
remove all 'c's from string remaining string: de
...
That should work if I understood your criteria correctly.

I believe I understand your question correctly, and here is my attempt at answering it. If I have misunderstood, please correct me.
The standard comparisons (<, <=, >, >=, ==, !=) apply to strings. These comparisons use the standard character-by-character comparison rules for ASCII or Unicode. That being said, the greater and less than operators will compare strings using alphabetical order.

You might want to use the ascii encoding of the character.
mystr = "aab"
curr = ord(mystr[0])
for char in mystr[1:]:
    if ord(char) < curr:
        print("This character should not be here")
    if ord(char) > curr:
        curr = ord(char)
Changes made to reflect user470379's suggestion:
mystr = "aab"
curr = mystr[0]
for char in mystr[1:]:
    if char < curr:
        print("This character should not be here")
    if char > curr:
        curr = char

The idea is very simple: each character in the string should be no less than its predecessor and no larger than its predecessor + 1.

How about this? It simplifies the problem by first removing duplicate characters, then you only need to check the string is a prefix of the string containing all lowercase (ascii) letters.
import string
def uniq(s):
    last = None
    for c in s:
        if c != last: yield c
        last = c
def is_gapless_ascending(s):
    s = ''.join(uniq(s))
    return string.ascii_lowercase.startswith(s)
