I was playing around with a function for an assignment just to better understand it. It was meant to find the last occurrence of a sub-string within a string. The function should return the position of the start of the last occurrence of the sub-string or it must return -1 if the sub-string is not found at all. The 'standard' way was as follows:
def find_last(full, sub):
    start = -1
    while True:
        new = full.find(sub, start + 1)
        if new == -1:
            break
        else:
            start = new
    return start
I wanted to try and have it search in reverse, as this seemed to be the more efficient way. So I tried this:
def find_last(full, sub):
    start = -1
    while True:
        new = full.find(sub, start)
        if new == -1 and abs(start) <= len(full): #evals to False when beginning of string is reached
            start -= 1
        else:
            break
    return new
We were given a handful of test cases which needed to be passed and my reversed function passed all but one:
print(find_last('aaaa', 'a'))          # 3
print(find_last('aaaaa', 'aa'))        # 3
print(find_last('aaaa', 'b'))          # -1
print(find_last("111111111", "1"))     # 8
print(find_last("222222222", ""))      # 8, should be 9
print(find_last("", "3"))              # -1
print(find_last("", ""))               # 0
Can someone kindly explain why find is behaving this way with negative indexing? Or is it just some glaring mistake in my code?
The empty string can be found at any position, including the position just after the last character. Initializing start with -1 makes your algorithm begin its search at the penultimate position, not the last one.
The last position is after the last character of the string (index len(full)), but with start = -1 you begin looking at the last character of the string (index len(full) - 1), so the empty string is reported there.
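A minimal sketch of the reverse search with that fix applied, keeping the question's function name and its use of find but starting at len(full) instead of -1. This is an illustration, not the answerer's code; for real use, the built-in str.rfind already performs this search.

```python
def find_last(full, sub):
    # Walk start positions from len(full) down to 0. Starting at len(full),
    # one past the last character, lets the empty string match at the very end;
    # with a start of -1 the search would begin at len(full) - 1 instead.
    for start in range(len(full), -1, -1):
        new = full.find(sub, start)
        if new != -1:
            # The first hit while walking backwards is the last occurrence.
            return new
    return -1

print(find_last("222222222", ""))   # 9
print("222222222".rfind(""))        # 9, the built-in reverse search agrees
```

Each step calls find, so this is not faster than the forward version in the worst case; its only point is to show where the backwards walk has to begin.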
I know there are other questions similar to this, but I can't figure out how to fix it. I've added some "tracking" print statements to understand where it goes wrong. The same problem shows up in other parts of the code too, depending on what is passed through the tokenize function. I know the problem comes from the end += 1 in the while loops, which don't stop or continue correctly: after the last letter, number, or symbol is read, it should be added to words, but instead the loop tries to go one step further and raises this error (an IndexError, because line[end] is read past the end of the string). I've tried numerous ifs and try blocks, but my coding is too weak to solve it properly. Any other comments on the code in general are much appreciated as well. I had a working draft earlier, but I accidentally deleted it when I was supposed to polish it and move it to another document.
def tokenize(lines):
    words = []
    for line in lines:
        print("new line")
        start = 0
        while start-1 < len(line):
            print(start)
            print("start")
            while line[start].isspace() == True:
                print("remove space")
                start += 1
            end = start
            while line[end].isspace() == True:
                print("remove space")
                end += 1
            if line[end].isalpha() == True:
                while line[end].isalpha() == True:
                    print("letter")
                    end += 1
            elif line[end].isdigit() == True:
                while line[end].isdigit() == True:
                    print("number")
                    end += 1
            else:
                print("symbol")
                end += 1
            words.append(line[start:end].lower())
            print(line[start:end] + " - adds to words")
            start = end
        print(len(line))
    print(words)
    return words

tokenize([" all .. 12 foas d 12 9"])
The main issue is that you have to check your indexing bounds in every part of the code where indexing variables might have changed. This includes both start and end variables as they are independently incremented within your code.
I also removed parts of your code that were unnecessary, mainly duplicated code and tangled logic, which you should avoid in every program you write. This also makes your program easier to debug, maintain, and understand. Always make your logic as straightforward as possible before you start writing any code.
def tokenize(lines):
    words = []
    for line in lines:
        print("new line")
        start = 0
        # start, as an index, is allowed in the range [0, len(line) - 1]
        # so use either *start < len(line)* or *start <= len(line) - 1* as they are equivalent
        while start < len(line):
            print(start)
            print("start")
            # going forward, watch not to overstep again
            while start < len(line) and line[start].isspace():
                print("remove space")
                start += 1
            end = start
            # whatever variable you use as an index, you have to make sure
            # it will be within bounds; as you go forward to capture
            # non-space symbols, you should also stop before the string finishes.
            while end < len(line) and not line[end].isspace():
                if line[end].isalpha():
                    print("letter")
                elif line[end].isdigit():
                    print("number")
                else:
                    print("symbol")
                end += 1
            # skip the empty slice left over when a line ends in whitespace
            if end > start:
                words.append(line[start:end].lower())
                print(line[start:end] + " - adds to words")
            start = end
        print(len(line))
    print(words)
    return words
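If the goal is the question's original behaviour (a run of letters, a run of digits, or a single symbol as separate tokens), the same bounds checks also work with its three branches. The following is a hypothetical sketch, not code from either post, with the debug prints left out:

```python
def tokenize(lines):
    """Split lines into lowercase runs of letters, runs of digits,
    and single symbol characters, skipping whitespace."""
    words = []
    for line in lines:
        n = len(line)
        start = 0
        while start < n:
            # skip whitespace; start < n is checked before every index
            if line[start].isspace():
                start += 1
                continue
            end = start + 1
            if line[start].isalpha():
                # extend the run of letters, never reading past the end
                while end < n and line[end].isalpha():
                    end += 1
            elif line[start].isdigit():
                # extend the run of digits, never reading past the end
                while end < n and line[end].isdigit():
                    end += 1
            # any other character stays a one-character symbol token
            words.append(line[start:end].lower())
            start = end
    return words

print(tokenize([" all .. 12 foas d 12 9"]))
```

Because end starts at start + 1, every pass through the outer loop consumes at least one character, so the loop always terminates.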
UPDATE:
It seems the OP is trying to keep the non-alphanumeric symbols as separate tokens. I suggest not doing this in a single pass. You can first split on whitespace the normal way and then go over each word again to split it by symbols (while retaining the symbols). This will keep your code simpler and easier to read. I'm going to use a regex split for the second step:
import re

greeting = "Hey, how are you doing?"
# get rid of spaces
tokens = greeting.split()
result = []
for w in tokens:
    # '[^\d\w]' matches symbol characters (neither digits nor word characters;
    # note that \w already includes digits and the underscore)
    # parentheses capture the delimiters (the symbols) as tokens in the final list
    for x in re.split(r"([^\d\w]+)", w):
        if x:
            result.append(x)
print(result)

##### or use a list comprehension to achieve this in a single line
result = [x for w in greeting.split() for x in re.split(r"([^\d\w]+)", w) if x != ""]
print(result)
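Another option, not from the answer above, is a single re.findall with alternation, listing the token kinds in order of preference. This is a sketch under the assumption that "letter" means a word character that is not a digit or underscore ([^\W\d_]), which is close to, but not exactly the same as, str.isalpha. Unlike the split above, it makes every symbol its own token, so ".." becomes two tokens; tokenize_regex is a hypothetical name.

```python
import re

def tokenize_regex(text):
    # letters: [^\W\d_]+  (word characters minus digits and underscore)
    # digits:  \d+
    # symbol:  \S         (any other single non-space character)
    return [t.lower() for t in re.findall(r"[^\W\d_]+|\d+|\S", text)]

print(tokenize_regex("Hey, how are you doing?"))
print(tokenize_regex(" all .. 12 foas d 12 9"))
```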
I need to count the number of times the substring 'bob' occurs in a string.
Example problem: Find the number of times 'bob' occurs in string s such that
s = "xyzbobxyzbobxyzbob"  # here there are three occurrences
Here is my code:
s = "xyzbobxyzbobxyzbob"
numBobs = 0
while s.find('bob') >= 0:
    numBobs = numBobs + 1
    print(numBobs)
Since the find function in Python is supposed to return -1 if the substring is not found, the while loop should end after printing out the incremented number of bobs each time it finds the substring.
However the program turns out to be an infinite loop when I run it.
For this job, str.find isn't very efficient. Instead, str.count should be what you use:
>>> s = 'xyzbobxyzbobxyzbob'
>>> s.count('bob')
3
>>> s.count('xy')
3
>>> s.count('bobxyz')
2
>>>
Or, if you want to get overlapping occurrences, you can use Regex:
>>> from re import findall
>>> s = 'bobobob'
>>> len(findall('(?=bob)', s))
3
>>> s = "bobob"
>>> len(findall('(?=bob)', s))
2
>>>
When you do s.find('bob'), you search from the beginning, so you end up finding the same bob again and again. You need to move your search position to the end of the bob you found.
string.find takes a start argument that tells it where to start searching, and it also returns the position at which it found bob. So you can use that position, add the length of bob to it, and pass the result to the next s.find.
So at the start of the loop set start=0, since you want to search from the beginning; inside the loop, if find returns a non-negative number, add the length of the search string to it to get the new start:
srch = 'bob'
start = numBobs = 0
while start >= 0:
    pos = s.find(srch, start)
    if pos < 0:
        break
    numBobs += 1
    start = pos + len(srch)
Here I am assuming that overlapping occurrences of the search string should not be counted.
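If overlapping matches should count (so 'bobobob' gives 3), a small change to the same loop is enough: advance start to one past the match position instead of past the whole match. A sketch, with count_overlapping as a hypothetical name:

```python
def count_overlapping(s, sub):
    count = 0
    start = 0
    while True:
        pos = s.find(sub, start)
        if pos < 0:
            return count
        count += 1
        # move one character past the start of this match, not past its end,
        # so matches that share characters are all counted
        start = pos + 1

print(count_overlapping('xyzbobxyzbobxyzbob', 'bob'))  # 3
print(count_overlapping('bobobob', 'bob'))             # 3
```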
find doesn't remember where the previous match was and won't start from there unless you tell it to. You need to keep track of the match location and pass in the optional start parameter. If you don't, find will just find the first bob over and over.
find(...)
S.find(sub [,start [,end]]) -> int
Return the lowest index in S where substring sub is found,
such that sub is contained within s[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
Here is a solution that returns the number of overlapping substrings without using regex:
(Note: the while loop here is written assuming you are looking for a 3-character substring, i.e. 'bob'.)
bobs = 0
start = 0
end = 3
while end <= len(s) + 1 and start < len(s) - 2:
    if s.count('bob', start, end) == 1:
        bobs += 1
    start += 1
    end += 1
print(bobs)
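Here is a simple function for the task:
def countBob(s):
    number = 0
    # >= 0 so that a 'Bob' at index 0 is counted too
    while s.find('Bob') >= 0:
        # remove only the first remaining 'Bob' and count it
        # (removing a match can occasionally join its neighbours into a new 'Bob')
        s = s.replace('Bob', '', 1)
        number = number + 1
    return number
Then call countBob whenever you need it:
countBob('This Bob runs faster than the other Bob dude!')  # returns 2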
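def count_substring(string, sub_string):
    count = 0
    while True:
        a = string.find(sub_string)
        if a < 0:
            break
        count += 1
        # drop everything up to and including the first character of the match,
        # so overlapping matches are still counted
        string = string[a + 1:]
    return count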
So here is my code
def count_occurrences(sub, s):
    if len(s) == 0:
        return 0
    else:
        if str(sub) in str(s) and str(sub) == str(s):
            return 1+count_occurrences(sub, s[1:])
        else:
            return count_occurrences(sub, s[1:])

print(count_occurrences('ill', 'Bill will still get ill'))
I believe the statement if str(sub) in str(s) and str(sub) == str(s): is what's throwing me off when I step through it in the debugger. If I use just if str(sub) in str(s), it gives me a number, but not the one I want, which is 4.
Your code doesn't quite work because each recursive call drops only the first character of s, no matter where the substring was found, so later calls keep finding the same occurrence again. Instead, you have to skip to the index just after the start of the occurrence you found. This code works:
def count_occurences(s, sub):
    if len(s) == 0:
        return 0
    else:
        ind = s.find(sub)
        if ind >= 0:
            return 1 + count_occurences(s[ind+1:], sub)
        else:
            return 0
I add 1 to the index because, for "ill", find() returns the index of the letter 'i'. Slicing with s[ind+1:] removes every character up to and including that 'i', so the next call can't find "ill" at the same place again, which would count the same occurrence twice.
Your condition str(sub) == str(s) will never be True, except when s has been sliced down to exactly the substring, i.e. a match at the very end. You have to compare the start of the string (the same length as the substring) instead of searching for it at any position; otherwise you'll count the same match multiple times. Also, you don't need str() if you are already processing strings.
def count_occurrences(sub, s):
    if len(sub) > len(s): return 0
    return s.startswith(sub) + count_occurrences(sub, s[1:]) # True is 1
Output:
print(count_occurrences('ill', 'Bill will still get ill'))
4
Note that I assumed that you want to count overlapping substrings. For example: 'ana' counts for 2 in 'banana'.
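Because the recursive version makes one call (and one slice copy) per character, a long enough string will exceed Python's default recursion limit (normally 1000) and raise a recursion error (RecursionError in Python 3). A hypothetical iterative sketch of the same startswith idea, using the optional start argument of str.startswith instead of slicing:

```python
def count_occurrences(sub, s):
    # Try every start position where sub could still fit and count the
    # positions where s starts with sub; overlapping matches are counted.
    return sum(1 for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i))

print(count_occurrences('ill', 'Bill will still get ill'))  # 4
print(count_occurrences('ana', 'banana'))                   # 2
```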
I have been learning Python (as my first language) from "How to Think Like a Computer Scientist: Learning with Python". This open book teaches mostly through examples and I prefer to read the goal and build the program on my own, rather than actually reading the program code provided in the book.
However, I am struggling with creating a function that searches for a specific character in a given string and returns how many times that character occurs.
The code I wrote is:
def find(s, x): #find s in x
    s = raw_input("Enter what you wish to find: ")
    x = raw_input("Where to search? ")
    count = 0
    for l in x: #loop through every letter in x
        if l == s:
            count += 1
        else:
            print count
However, when I run this code, I get the error "name 's' is not defined".
The code in the book has a slightly different goal: it searches for a specific character in a string, but instead of counting how many times the character was found, it returns the position of the character in the string.
def find(strng, ch, start=0, step=1):
    index = start
    while 0 <= index < len(strng):
        if strng[index] == ch:
            return index
        index += step
    return -1
I don't really understand this code, actually.
However, even when I run the code, for example, to search for 'a' in 'banana', I get the error name 'banana' is not defined.
What is wrong with my code? Could someone please explain to me how the code provided in the book works?
1: There are a couple of things wrong with this code. The function takes two parameters, s and x, and then immediately throws them away by overwriting those variables with user input. In your for loop, every time you encounter a character that isn't s, you print the count. You should try to separate different ideas in your code into different functions so that you can reuse code more easily.
Break down your code into small, simple ideas. If the purpose of find is to count the instances of a character in a string, it shouldn't also be handling user interaction. If you take out the raw_input and printing, you can simplify this function to:
def find(s, x): #find s in x
    count = 0
    for l in x: #loop through every letter in x
        if l == s:
            count += 1
    return count
Now all it does is take in a character and a string and return the number of times the character appears in the string.
Now you can do your user interaction outside of the function:
char = raw_input("Enter what you wish to find: ")
string = raw_input("Where to search? ")
print char + " appears " + str(find(char, string)) + " times in " + string
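For a single character, the counting loop computes the same thing as Python's built-in str.count. A quick check of the simplified find (written with single-argument print(...) calls, which behave the same in Python 2 and 3):

```python
def find(s, x):  # count how many times character s occurs in x
    count = 0
    for l in x:  # loop through every letter in x
        if l == s:
            count += 1
    return count

print(find("a", "banana"))   # 3
print("banana".count("a"))   # 3, the built-in gives the same answer
```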
2: The goal of this function is to find the first position where ch occurs when walking through the characters of strng from a starting position with a specified step. It takes strng, ch, a position to start searching from, and a step size. If start is 0 and step is 1, it checks every character. If start is 2, it checks all but the first 2 characters; if step is 2, it checks every other character, and so on. It begins at the start index (index = start) and loops while the index is at least 0 and less than the length of the string. Since Python is 0-indexed, the last character of the string has an index one less than the length of the string, so this condition just keeps you from checking invalid indices. On each iteration, the code checks whether the character at the current index is ch; if so, it returns the index (the first place it found the character). Otherwise it increments the index by step and tries again, until it goes past the last character. When that happens it exits the loop and returns -1, a sentinel value indicating that the character was not found in the string.
def find(strng, ch, start=0, step=1):
    index = start
    while 0 <= index < len(strng):
        if strng[index] == ch:
            return index
        index += step
    return -1
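3: I'm guessing you passed some invalid parameters. The error name 'banana' is not defined means Python read banana as a variable name, so you probably typed it without quotes. strng should be a string (e.g. "banana"), ch should be a single character, and start and step should be integers.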
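A few sample calls to the book's function, with the string passed in quotes, illustrate what start and step do. The calls themselves are illustrative examples, not from the book:

```python
def find(strng, ch, start=0, step=1):
    index = start
    while 0 <= index < len(strng):
        if strng[index] == ch:
            return index
        index += step
    return -1

print(find("banana", "a"))          # 1: first 'a' from the beginning
print(find("banana", "a", 2))       # 3: skips indices 0 and 1
print(find("banana", "a", 0, 2))    # -1: checks only indices 0, 2, 4 (b, n, n)
print(find("banana", "n", 5, -1))   # 4: a negative step walks backwards
print(find("banana", "z"))          # -1: not found
```

The 0 <= index part of the loop condition is what stops the backwards walk at the start of the string.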
Try this. I took the parameters out of your function, moved the print command out of the else block and out of the for loop, and then wrote the last line to call the function.
def find(): #find s in x
    s = raw_input("Enter what you wish to find: ")
    x = raw_input("Where to search? ")
    count = 0
    for l in x: #loop through every letter in x
        if l == s:
            count += 1
    print count

find()
It seems like you're taking in inputs s and x twice: once through the function arguments and once through raw_input. Modify the function to use only one of them (say, only raw_input; see below). Also, you only need to print out the count once, so you can place the print statement at the outermost indent level in the function.
def find(): #find s in x
    s = raw_input("Enter what you wish to find: ")
    x = raw_input("Where to search? ")
    count = 0
    for l in x: #loop through every letter in x
        if l == s:
            count += 1
    print count