Create Your Own Find String Function - python

For a school project I have to create a function called find_str that essentially does the same thing as the .find string method, but we cannot use any string methods in our definition.
The project description reads: "Function find_str has two parameters (both strings). It returns the lowest index where the second parameter is found within the first parameter (it returns -1 if the second parameter is not found within the first parameter)."
I have spent a lot of time working on this project and have yet to come to a solution. This is the current definition that I have come up with:
def find_str (string, substring):
    index = 0
    length = len (substring)
    for ch in string:
        if ch == substring [0]:
            subindex1 = 0
            subindex2 = index
            for i in range (length):
                if ch == substring [i]:
                    subindex1 +=1
                    if subindex1 == length:
                        return index
                    ch = string [(subindex2)+1]
                    subindex2 +=1
        index += 1
    return "-1"
This sample of code only works in some instances, but not all.
For example:
print (find_str ("hello", "llo"))
returns:
2
as it should.
But
print (find_str ("hello", "el"))
returns:
ch = string [(subindex2)+1]
IndexError: string index out of range
I feel like I am overthinking this and there must be an easier way to do it. Any input or help would be great! Thanks.

Using a helper function often helps you organize your thoughts.
def find_str(string, substring):
    # Only try start positions where the whole substring still fits.
    for j in range(len(string) - len(substring) + 1):
        if is_next_sub(string, substring, j):
            return j
    return -1
def is_next_sub(string, substring, index):
    # True if substring occurs in string starting at index.
    for i in range(len(substring)):
        if substring[i] != string[index + i]:
            return False
    return True

I'm not sure we should be helping you with 'homework'
How about this:
def find_str(string, substring):
    for off in range(len(string)):
        if string[off:].startswith(substring):
            return off
    return -1

I haven't checked through your code in detail, but it looks like you're trying to compare characters that don't exist.
Suppose you're searching "aaaaa" for the substring "aaa", and you need to find all matches...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Even though the characters always match, and there are five characters in the string, there are only three start positions that you might need to consider.
So before you look at the actual characters at all, you can restrict the number of start positions you might need to consider based on the lengths of the string and substring. You only loop for those start positions. That means you're not looping for start positions that cannot match. Also, if you don't do this...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Match at 3 : ...aa!
Match at 4 : ....a!!
Those exclamation points are places where you try to match a character in the substring with a character that doesn't exist, after the end of the string. You can check for that within the loop to avoid the error each time it occurs, but why not eliminate all those cases at once by not looping for the match positions that cannot occur?
The number of start positions you may need to check is len(fullstring) + 1 - len(substring), so you can derive a range of possible start positions using range(0, len(fullstring) + 1 - len(substring)).
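As a rough sketch of the idea above, assuming the same find_str name and (string, substring) parameters as the question: the start positions are limited to range(len(string) + 1 - len(substring)) and characters are compared by index only, so no string methods are needed. Returning 0 for an empty substring mirrors what str.find does; whether the assignment expects that is an assumption.

```python
def find_str(string, substring):
    """Return the lowest index of substring in string, or -1 if absent."""
    # Only these start positions leave enough room for the whole substring.
    # If the substring is longer than the string, the range is empty.
    for start in range(len(string) + 1 - len(substring)):
        for offset in range(len(substring)):
            if string[start + offset] != substring[offset]:
                break  # mismatch: try the next start position
        else:
            # Inner loop finished without a break: every character matched.
            return start
    return -1


print(find_str("hello", "llo"))  # 2
print(find_str("hello", "el"))   # 1
print(find_str("hello", "xyz"))  # -1
```

Because the outer range never produces a start position too close to the end, string[start + offset] cannot go out of range and no extra bounds check is needed inside the loop.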

Related

Python algorithm in list

Given a list of N strings, implement an algorithm that outputs the largest n such that the first n characters of every string are identical (i.e., print out how many leading characters all the given strings have in common).
My code:
def solution(a):
    import numpy as np
    for index in range(0,a):
        if np.equal(a[index], a[index-1]) == True:
            i += 1
            return solution
        else:
            break
    return 0
# Test code
print(solution(['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg'])) # 3
print(solution(['abcd', 'gbce', 'abchg', 'abcfwqw', 'abcdfg'])) # 0
Some comments on your code:
There is no need to use numpy if it is only used for string comparison.
i is undefined when i += 1 is about to be executed, so that line will fail. i is not otherwise used in your code.
In the first iteration of the loop index-1 is -1, which Python interprets as the last element of the list, so the first word would be compared with the last word rather than with a predecessor.
solution is your function, so return solution will return a function object. You need to return a number.
The if condition is only comparing complete words, so there is no attempt to only compare a prefix.
A possible way to do this, is to be optimistic and assume that the first word is a prefix of all other words. Then as you detect a word where this is not the case, reduce the size of the prefix until it is again a valid prefix of that word. Continue like that until all words have been processed. If at any moment you find the prefix is reduced to an empty string, you can actually exit and return 0, as it cannot get any less than that.
Here is how you could code it:
def solution(words):
    prefix = words[0]  # if there was only one word, this would be the prefix
    for word in words:
        while not word.startswith(prefix):
            prefix = prefix[:-1]  # reduce the size of the prefix
            if not prefix:  # is there any sense in continuing?
                return 0  # ...: no.
    return len(prefix)
The description is somewhat convoluted but it does seem that you're looking for the length of the longest common prefix.
You can get the length of the common prefix between two strings using the next() function. It can find the first index where characters differ which will correspond to the length of the common prefix:
def maxCommon(S):
    cp = S[0] if S else ""  # first string is common prefix (cp)
    for s in S[1:]:  # go through other strings (s)
        # index of the first mismatch; if there is none, the shorter length is common
        cs = next((i for i,(a,b) in enumerate(zip(s,cp)) if a!=b),min(len(s),len(cp)))
        cp = cp[:cs]  # truncate to new common size (cs)
    return len(cp)  # return length of common prefix
output:
print(maxCommon(['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg'])) # 3
print(maxCommon(['abcd', 'gbce', 'abchg', 'abcfwqw', 'abcdfg'])) # 0
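For comparison, here is a sketch of the same longest-common-prefix idea that walks the strings column by column with zip(*words). The name common_prefix_len is hypothetical; os.path.commonprefix from the standard library is used only as a cross-check, since it also compares character by character.

```python
import os.path


def common_prefix_len(words):
    """Length of the longest prefix shared by every string in words."""
    if not words:
        return 0
    count = 0
    # zip(*words) yields one tuple per character position and stops at
    # the shortest string, so no index can run past the end of a word.
    for column in zip(*words):
        if len(set(column)) != 1:
            break  # at least two words differ at this position
        count += 1
    return count


words = ['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg']
print(common_prefix_len(words))                      # 3
print(common_prefix_len(['abcd', 'gbce', 'abchg']))  # 0
# Cross-check against the standard library helper.
print(len(os.path.commonprefix(words)))              # 3
```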

Identifying a substring in Python (moving along indices within a string) - can only concatenate str (not "int") to str

I'm new to Python and coding and stuck at comparing a substring to another string.
I've got:
string sq and pattern STR.
Goal: I'm trying to count the max number of STR pattern appearing in that string in a row.
This is part of the code:
STR = key
counter = 0
maximum = 0
for i in sq:
    while sq[i:i+len(STR)] == STR:
        counter += 1
        i += len(STR)
The problem seems to appear in the "while part", saying TypeError: can only concatenate str (not "int") to str.
I see it treats i as a character and len(STR) as an int, but I don't get how to fix this.
The idea is to take the first substring equal to the length of STR and figure out whether this substring and STR pattern are identical.
Thank you!
By looping using:
for i in sq:
you are looping over the elements of sq.
If instead you want the variable i to loop over the possible indexes of sq, you would generally loop over range(len(sq)), so that you get values from 0 to len(sq) - 1.
for i in range(len(sq)):
However, in this case you are wanting to assign to i inside the loop:
i += len(STR)
This will not have the desired effect if you are looping over range(...) because on the next iteration it will be assigned to the next value from range, ignoring the increment that was added. In general one should not assign to a loop variable inside the loop.
So it would probably most easily be implemented with a while loop, and you set the desired value of i explicitly (i=0 to initialise, i+=1 before restarting the loop), and then you can have whatever other assignments you want inside the loop.
STR = "ell"
sq = "well, well, hello world"
counter = 0
i = 0
while i < len(sq):
    while sq[i:i+len(STR)] == STR:  # re use of while here, see comments
        counter += 1
        i += len(STR)
    i += 1
print(counter)  # prints 3
(You could perhaps save len(sq) and len(STR) in other variables to save evaluating them repeatedly.)
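A small illustration of the point about assigning to the loop variable. In the sketch below (the lists for_seen and while_seen exist only to record what happens), adding to i inside a for loop over range(...) is undone on the next iteration, whereas a while loop keeps the new value.

```python
# for loop: the assignment to i is overwritten by the next value from range
for_seen = []
for i in range(6):
    for_seen.append(i)
    i += 2  # has no effect on which value comes next

# while loop: i is under our control, so the jump is kept
while_seen = []
i = 0
while i < 6:
    while_seen.append(i)
    i += 3  # the "+= 2" jump above plus the usual "+= 1"

print(for_seen)    # [0, 1, 2, 3, 4, 5]
print(while_seen)  # [0, 3]
```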
This solution doesn't use a for so the increment can be by one on a non-match and by string length on a match. Any non-match records the maximum count seen so far and resets the count.
def count_max(string,key):
    if len(key) > len(string):
        return 0
    last = len(string) - len(key)
    i = 0
    count = 0
    maximum = 0
    while i <= last:
        if string[i:i+len(key)] == key:
            count += 1
            i += len(key)
        else:
            maximum = max(maximum,count)
            count = 0
            i += 1
    return max(maximum,count)
key = 'abc'
strings = 'ab','abc','ababcabc','abcdefabcabc','abcabcdefabc'
for string in strings:
    print(count_max(string,key))
Output:
0
1
2
2
2
Here also is a potentially faster version. For short strings it isn't faster, but will be much faster if the strings are very long since the regular expression will find matches much faster than Python loops.
import re
def count_max2(string,key):
    return max([len(match) // len(key)
                for match in re.findall(rf'(?:{re.escape(key)})+',string)]
               ,default=0)
How it works:
re.escape is a function to make sure characters in key are taken literally and are not regular expression syntax. Allows searching for + for example, instead of being treated as a "one or more" match.
rf'' is syntax for a raw f-string (format string). "raw" is recommended for regular expressions because some syntax for expressions is confused with other Python syntax. f-strings allow variables and functions to be inserted into strings with curly braces {}.
re.findall finds all consecutive matches in the string.
[f(x) for x in iterable] is a list comprehension: it computes a function on each item of the iterable and collects the results in a list. In this case, it takes the length of each match divided by the length of the key to get the number of occurrences of the key in that match.
max(iterable,default=0) returns the maximum value of iterable, or 0 if the iterable is empty (no matches).
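The steps above can be seen one at a time in the following sketch. The key "a+" and the sample text are made up for illustration; they are chosen so that the key contains a character that would otherwise be regular expression syntax.

```python
import re

key = "a+"        # "+" would mean "one or more" if left unescaped
text = "a+a+xa+"

pattern = rf'(?:{re.escape(key)})+'
runs = re.findall(pattern, text)                 # each maximal run of the key
counts = [len(run) // len(key) for run in runs]  # repetitions per run

print(runs)                    # ['a+a+', 'a+']
print(counts)                  # [2, 1]
print(max(counts, default=0))  # 2
print(max([], default=0))      # 0, the no-match case
```

Because the group is non-capturing ((?:...)), re.findall returns the whole matched run rather than just the last repetition of the group.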

counting the number of substrings in a string

I am working on a Python assignment and I am stuck here.
Apparently, I have to write code that counts the number of occurrences of a given substring within a string.
I thought I got it right, then I am stuck here.
def count(substr,theStr):
    # your code here
    num = 0
    i = 0
    while substr in theStr[i:]:
        i = i + theStr.find(substr)+1
        num = num + 1
    return num
substr = 'is'
theStr = 'mississipi'
print(count(substr,theStr))
if I run this, I expect to get 2 as the result, rather, I get 3...
See, other examples such as ana and banana works fine, but this specific example keeps making the error. I don't know what I did wrong here.
Would you PLEASE help me out.
In your code
while substr in theStr[i:]:
correctly advances over the target string theStr, however the
i = i + theStr.find(substr)+1
keeps looking from the start of theStr.
The str.find method accepts optional start and end arguments to limit the search:
str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found
within the slice s[start:end]. Optional arguments start and end
are interpreted as in slice notation. Return -1 if sub is not found.
We don't really need to use in here: we can just check that find doesn't return -1. It's a bit wasteful performing an in search when we then need to repeat the search using find to get the index of the substring.
I assume that you want to find overlapping matches, since the str.count method can find non-overlapping matches, and since it's implemented in C it's more efficient than implementing it yourself in Python.
def count(substr, theStr):
    num = i = 0
    while True:
        j = theStr.find(substr, i)
        if j == -1:
            break
        num += 1
        i = j + 1
    return num
print(count('is', 'mississipi'))
print(count('ana', 'bananana'))
output
2
3
The core of this code is
j = theStr.find(substr, i)
i is initialised to 0, so we start searching from the beginning of theStr, and because of i = j + 1 subsequent searches start looking from the index following the last found match.
The code change you need is -
i = i + theStr[i:].find(substr)+ 1
instead of
i = i + theStr.find(substr)+ 1
In your code the substring is always found until i reaches position 4 or more. But while finding the index of the substring, you were using the original(whole) string which in turn returns the position as 1.
In your example of banana, after first iteration i becomes 2. So, in next iteration str[i:] becomes nana. And the position of substring ana in this sliced string and the original string is 1. So, the bug in the code is just suppressed and the code seems to work fine.
If your code is purely for learning purposes, then you can do it this way. Otherwise you may want to make use of Python's built-in functions (like count()) to do the job.
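To make the masking effect concrete, here is a sketch that puts the question's loop and the one-line fix side by side. The names count_buggy and count_fixed are made up for this comparison.

```python
def count_buggy(substr, theStr):
    num = i = 0
    while substr in theStr[i:]:
        i = i + theStr.find(substr) + 1      # always searches from index 0
        num += 1
    return num


def count_fixed(substr, theStr):
    num = i = 0
    while substr in theStr[i:]:
        i = i + theStr[i:].find(substr) + 1  # searches the remaining slice
        num += 1
    return num


print(count_buggy('ana', 'banana'), count_fixed('ana', 'banana'))        # 2 2
print(count_buggy('is', 'mississipi'), count_fixed('is', 'mississipi'))  # 3 2
```

With 'banana' both versions agree, which is why the bug goes unnoticed there; with 'mississipi' the position found in the whole string no longer matches the position in the slice, and the counts diverge.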
Counting the number of substrings:
def count(substr,theStr):
    num = 0
    for i in range(len(theStr)):
        if theStr[i:i+len(substr)] == substr:
            num += 1
    return num
substr = 'is'
theStr = 'mississipi'
print(count(substr,theStr))
Output: 2
where theStr[i:i+len(substr)] is a slice of the string, i is the starting index and i+len(substr) is the ending index.
Eg.
i = 0
substr length = 2
first-time compare substring is => mi

string.find indexing in Python

I'm trying to understand why the following python code incorrectly returns the string "dining":
def remove(somestring, sub):
    """Return somestring with sub removed."""
    location = somestring.find(sub)
    length = len(sub)
    part_before = somestring[:location]
    part_after = somestring[location + length:]
    return part_before + part_after
print(remove('ding', 'do'))
I realize the way to make the code run correctly is to add an if statement so that if the location variable returns a -1 it will simply return the original string (in this case "ding"). The code, for example, should be:
def remove(somestring, sub):
    """Return somestring with sub removed."""
    location = somestring.find(sub)
    if location == -1:
        return somestring
    length = len(sub)
    part_before = somestring[:location]
    part_after = somestring[location + length:]
    return part_before + part_after
print(remove('ding', 'do'))
Without using the if statement to fix the function, the part_before variable will return the string "din". I would love to know why this happens. Reading the python documentation on string.find (which is ultimately how part_before is formulated) I see that the location variable would become a -1 because "do" is NOT found. But if the part_before variable holds all letters before the -1 index, shouldn't it be blank and not "din"? What am I missing here?
For reference, Python documentation for string.find states:
string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.
>>> string = 'ding'
>>> string[:-1]
'din'
Using a negative number as an index in Python counts from the right-hand side: index -n is the nth element from the end. Accordingly, a slice with :-1 returns all but the last element of the string.
If you have a string 'ding' and you are searching for 'do', str.find() will return -1. 'ding'[:-1] is equal to 'din' and 'ding'[-1 + len(sub):] equals 'ding'[1:] which is equal to 'ing'. Putting the two together results in 'dining'. To get the right answer, try something like this:
def remove(string, sub):
    index = string.find(sub)
    if index == -1:
        return string
    else:
        return string[:index] + string[index + len(sub):]
The reason that string[:-1] is not equal to the whole string is that in slicing, the first number (in this case blank so equal to None, or for our purposes equivalent to 0) is inclusive, but the second number (-1) is exclusive. string[-1] is the last character, so that character is not included.
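The slicing steps described above can be checked one by one. This is only a trace of the question's unguarded version with the same inputs ('ding' and 'do'), not a fix.

```python
somestring = 'ding'
sub = 'do'

location = somestring.find(sub)                # -1, because 'do' is absent
part_before = somestring[:location]            # 'ding'[:-1]
part_after = somestring[location + len(sub):]  # 'ding'[1:]

print(location)                  # -1
print(part_before)               # din
print(part_after)                # ing
print(part_before + part_after)  # dining
```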

trying to find if a character appears successively in a string

A simple script to find whether the second argument appears 3 times successively in the first argument. I am able to find whether the second argument is in the first and how many times it occurs, but how do I check whether it is present 3 times successively?
#!/usr/bin/python
import string
def three_consec(s1,s2) :
    for i in s1 :
        total = s1.count(s2)
        if total > 2:
            return "True"
print(three_consec("ABABA","A"))
total = s1.count(s2) will give you the number of s2 occurrences in s1 regardless of your position i.
Instead, just iterate through the string, and keep counting as you see characters s2:
def three_consec (string, character):
    found = 0
    for c in string:
        if c == character:
            found += 1
        else:
            found = 0
        if found > 2:
            return True
    return False
Alternatively, you could also do it the other way around, and just look if “three times the character” appears in the string:
def three_consec (string, character):
    return (character * 3) in string
This uses the feature that you can multiply a string by a number to repeat that string (e.g. 'A' * 3 will give you 'AAA') and that the in operator can be used to check whether a substring exists in a string.
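A quick sketch to check that the two approaches agree, assuming the second argument is a single character. The names three_consec_loop and three_consec_in are made up here so that both versions can live side by side.

```python
def three_consec_loop(string, character):
    found = 0
    for c in string:
        if c == character:
            found += 1
        else:
            found = 0  # the run was interrupted, start counting again
        if found > 2:
            return True
    return False


def three_consec_in(string, character):
    return (character * 3) in string


print('A' * 3)  # AAA
for text in ("ABABA", "BAAAB", "AA", ""):
    # Both versions should give the same answer for a single character.
    assert three_consec_loop(text, "A") == three_consec_in(text, "A")
print(three_consec_in("ABABA", "A"))  # False
print(three_consec_in("BAAAB", "A"))  # True
```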
