Counting occurrences of a sub-string without using a built-in function - python

My teacher challenged me to find a way to count the occurrences of the word "bob" in any random string variable without str.count(). So I did:
a = "dfjgnsdfgnbobobeob bob"
compteurDeBob = 0
# Stop two characters before the end so that a[i+2] is always a valid index.
for i in range(len(a) - 2):
    if a[i] == "b":
        if a[i+1] == "o":
            if a[i+2] == "b":
                compteurDeBob += 1
print(compteurDeBob)
But I wanted to find a way to do this with a word of any length, as shown below, and I have no clue how to do that...
a = input("random string: ")
word = input("Wanted word: ")
compteurDeBob = 0
for i in range(len(a)-1):
    #... I don't know...
print(compteurDeBob)

a = input("random string: ")
word = input("Wanted word: ")
count = 0
# The + 1 makes sure the last possible starting position is checked as well.
for i in range(len(a) - len(word) + 1):
    if a[i:i+len(word)] == word:
        count += 1
print(count)
If you want your search to be case-insensitive, you can use the lower() method:
a = input("random string: ").lower()
word = input("Wanted word: ").lower()
count = 0
for i in range(len(a) - len(word) + 1):
    if a[i:i+len(word)] == word:
        count += 1
print(count)
For the input string
Hi Bob. This is bob
and the wanted word bob, the first approach will output 1 and the second approach will output 2.

To count all overlapping occurrences (like in your example) you could just slice the string in a loop:
a = input("random string: ")
word = input("Wanted word: ")
cnt = 0
for i in range(len(a)-len(word)+1):
    if a[i:i+len(word)] == word:
        cnt += 1
print(cnt)

You can use string slicing. One way to adapt your code:
a = 'dfjgnsdfgnbobobeob bob'
counter = 0
value = 'bob'
chars = len(value)
for i in range(len(a) - chars + 1):
    if a[i: i + chars] == value:
        counter += 1
A more succinct way of writing this is possible via sum and a generator expression:
counter = sum(a[i: i + chars] == value for i in range(len(a) - chars + 1))
This works because bool is a subclass of int in Python, i.e. True / False values are considered 1 and 0 respectively.
Note str.count won't work here, as it only counts non-overlapping matches. You could utilise str.find if built-ins are allowed.
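If built-ins other than str.count are allowed, the str.find idea mentioned above could look roughly like this. It is only a minimal sketch: the function name count_overlapping is made up for illustration, and treating an empty search word as "no match" is my own assumption. The search restarts one character after each hit, so overlapping matches are counted too.

```python
def count_overlapping(text, word):
    """Count occurrences of word in text, including overlapping ones."""
    if not word:
        return 0  # assumption: an empty search word counts as no match
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        # Resume one character after the hit so overlaps are found too.
        start = text.find(word, start + 1)
    return count


print(count_overlapping("dfjgnsdfgnbobobeob bob", "bob"))  # 3
```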

An asymptotically efficient way to count overlapping matches is the Knuth-Morris-Pratt algorithm, which runs in O(m+n), where m is the length of the word to match and n is the length of the string being searched.
The algorithm first builds a lookup table that acts more or less as the description of a finite state machine (FSM). First we construct such a table with:
def build_kmp_table(word):
    t = [-1] * (len(word)+1)
    cnd = 0
    for pos in range(1, len(word)):
        if word[pos] == word[cnd]:
            t[pos] = t[cnd]
        else:
            t[pos] = cnd
            cnd = t[cnd]
            while cnd >= 0 and word[pos] != word[cnd]:
                cnd = t[cnd]
        cnd += 1
    t[len(word)] = cnd
    return t
Then we can count with:
def count_kmp(string, word):
    n = 0
    wn = len(word)
    t = build_kmp_table(word)
    k = 0
    j = 0
    while j < len(string):
        if string[j] == word[k]:
            k += 1
            j += 1
            if k >= wn:
                n += 1
                k = t[k]
        else:
            k = t[k]
            if k < 0:
                k += 1
                j += 1
    return n
The above counts overlapping instances in time linear in the length of the string being searched, which is an improvement over the "slicing" approach used earlier, which takes O(m×n) time.

Related

How do I fix my infinite loop from an if statement?

Alice has a W-word essay due tomorrow (1 ≤ W ≤ 10,000), but she's too
busy programming to bother with that! However, Alice happens to know
that H.S. High School's English teacher is sick of reading and grading
long essays, so she figures that if she just submits a "reasonable"
essay which fulfills the requirements but is as short as possible, she
may get some pity marks!
As such, Alice wants to write a program to generate a sequence of W
words to pass off as her essay, where each word is any string
consisting of 1 or more lowercase letters ("a".."z") (not necessarily
a real English word). The essay will have no punctuation or
formatting, as those seem unnecessary to Alice. In an attempt to
disguise the essay's generated nature, Alice will insist that all W
words are distinct. Finally, for her plan to come together, she'll
make the sum of the W words' lengths as small as possible.
Help Alice generate any essay which meets the above requirements.
So far, I think I've identified the piece of code that is causing an infinite loop, but I cannot figure out how to fix it. My theory: the first if statement conflicts with the other if statements, causing an infinite loop. It starts looping infinitely when it reaches three-character words.
import string, math
w = int (raw_input(" "))
words = []
paragraph = ""
alphabet = string.ascii_lowercase
first_alpha = -1
second_alpha = 0
third_alpha = 1
switch_to_two_char = False
switch_to_three_char = False
def unique(s):
    return len(set(s)) == len(s)
x = 0
while (x != w):
    word = ""
    if (x != 0):
        word = " "
    if (first_alpha >= 25):
        first_alpha = 0
        switch_to_two_char = True
    elif (second_alpha >= 25):
        second_alpha = 0
        first_alpha += 1
    elif (second_alpha >= 25 & first_alpha >= 25):
        first_alpha = 0
        second_alpha = 0
        switch_to_three_char = True
    elif (third_alpha >= 25):
        second_alpha += 1
        third_alpha = 0
    else:
        if (switch_to_two_char and not switch_to_three_char):
            second_alpha += 1
        if (switch_to_three_char):
            third_alpha += 1
        else:
            first_alpha += 1
    if (switch_to_two_char):
        word += alphabet[second_alpha]
        word += alphabet[first_alpha]
    elif (switch_to_three_char):
        word += alphabet[third_alpha]
        word += alphabet[second_alpha]
        word += alphabet[first_alpha]
    else:
        word += alphabet[first_alpha]
    if (unique(word) == 0):
        continue
    if (word in words):
        continue
    else:
        paragraph += word
        words.append (word)
        x += 1
print paragraph
When second_alpha reaches 25, first_alpha is incremented and second_alpha is reset to 0. So when first_alpha finally reaches 25, second_alpha has been reset to 0 again. On the next iteration, your program enters this branch:
if (first_alpha >= 25):
    first_alpha = 0
    switch_to_two_char = True
After that, both first_alpha and second_alpha are back at 0.
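One way to sidestep the hand-managed counters altogether is to let itertools.product enumerate the letter combinations in order of increasing length. The following is only a sketch under a few assumptions: it targets Python 3, the names shortest_words and make_essay are made up for illustration, and it follows the problem statement, which only asks for distinct words (it does not reject words with repeated letters the way the unique() check in the question does).

```python
from itertools import count, islice, product
from string import ascii_lowercase


def shortest_words():
    """Yield 'a'..'z', then 'aa', 'ab', ... in order of increasing length."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def make_essay(w):
    """Join the first w generated words with single spaces."""
    return " ".join(islice(shortest_words(), w))


print(make_essay(28))  # a b c ... z aa ab
```

Because every word is produced exactly once, no `word in words` check is needed, and there is no counter state that can get stuck.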

count the same character which comes in sequence

def count_squences(string):
    i= 0
    total = 0
    total_char_list = []
    while i < len(string):
        print(string[i])
        if string[i] == "x":
            total += 1
        if string[i] == "y":
            total_char_list.append(total)
            total = 0
        i = i + 1
    return total_char_list
print(count_squences("xxxxyyxyxx"))
I am trying to return the lengths of the runs of consecutive x characters in a string. For example, this function should return [4, 1, 2].
For example, if the string is "xxxxxyxxyxxx" it should return [5, 2, 3].
My function does not return the correct list. Any help would be really appreciated. Thanks.
You are not resetting your counter when you encounter a y character, and you should only append to total_char_list if there was at least one x character counted by the time you find a y character (y characters could be duplicated too):
total = 0
while i < len(string):
    if string[i] == "x":
        total += 1
    if string[i] == "y":
        if total:
            total_char_list.append(total)
        total = 0
    i = i + 1
Next, when the loop ends and total is not zero, you need to append that value too, or you won't be counting 'x' characters at the end of the sequence:
while ...:
    # ...
if total:
    # x characters at the end
    total_char_list.append(total)
Next, you really want to use a for loop to loop over a sequence. You are given the individual characters that way:
total = 0
for char in string:
    if char == 'x':
        total += 1
    if char == 'y':
        if total:
            total_char_list.append(total)
        total = 0
if total:
    # x characters at the end
    total_char_list.append(total)
You can make this faster with itertools.groupby():
from itertools import groupby
def count_squences(string):
    return [sum(1 for _ in group) for char, group in groupby(string) if char == 'x']
groupby() divides up an iterable input (such as a string) into separate iterators per group, where a group is defined as any consecutive value with the same key(value) result. The default key() function just returns the value, so groupby(string) gives you groups of consecutive characters that are the same. char is the repeated character, and sum(1 for _ in group) takes the length of an iterator.
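To make the grouping concrete, here is a small sketch that lists every run groupby() produces for the example string. The helper name run_lengths is made up for illustration.

```python
from itertools import groupby


def run_lengths(text):
    """Return (character, run length) pairs for consecutive equal characters."""
    return [(char, sum(1 for _ in group)) for char, group in groupby(text)]


print(run_lengths("xxxxyyxyxx"))
# [('x', 4), ('y', 2), ('x', 1), ('y', 1), ('x', 2)]
```

Keeping only the pairs whose character is 'x' gives the [4, 1, 2] from the question.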
You can then make it more generic, and count all groups:
def count_all_sequences(string):
    counts = {}
    for char, group in groupby(string):
        counts.setdefault(char, []).append(sum(1 for _ in group))
    return counts
The same can be done with a regular expression:
import re
def count_all_sequences(string):
    counts = {}
    # (.)(\1*) finds repeated characters; (.) matching one, \1 matching the same
    # This gives us (first, rest) tuples, so len(rest) + 1 is the total length
    for char, group in re.findall(r'(.)(\1*)', string):
        counts.setdefault(char, []).append(len(group) + 1)
    return counts
You don't reset the value of total between the sequences, so it keeps on counting.
def count_squences(string):
    i= 0
    total = 0
    total_char_list = []
    while i < len(string):
        if string[i] == "x":
            total += 1
        if string[i] == "y":
            if total != 0:
                total_char_list.append(total)
            total = 0
        i = i + 1
    if total != 0:
        total_char_list.append(total)
    return total_char_list
Update (17:00): I fixed the original procedure and thought of a better solution:
import re
my_str = "xxxxyyxyxx"
[len(z) for z in re.split("y+", my_str)]
Edited for function format:
def count_sequences(string):
    return [len(x) for x in re.findall(r"x+", string)]
count_sequences("xxxxyyxyxx")
returns [4, 1, 2]

How to count common letters in order between two words in Python?

I have a string pizzas, and when comparing it to pizza it is not the same. How can you make a program that counts common letters (in order) between two words, and sets a variable match to True if it's at least a 60% match?
For example, pizz and pizzas have 4 out of 6 letters in common, which is a 66% match, which means match must be True, but zzip and pizzas do not have any letters in order in common, thus match is False.
You can write a function to implement this logic.
zip is used to loop through the two strings simultaneously.
def checker(x, y):
    c = 0
    for i, j in zip(x, y):
        if i==j:
            c += 1
        else:
            break
    return c/len(x)
res = checker('pizzas', 'pizz') # 0.6666666666666666
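To get the True/False match variable the question asks for, the ratio can be compared against the 60% threshold. The sketch below is one possible reading, not a definitive implementation: the name is_match and the threshold parameter are made up, and dividing by the length of the longer word (so that the argument order does not matter) is my own assumption.

```python
def is_match(a, b, threshold=0.6):
    """True if the shared leading letters cover at least threshold of the longer word."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Assumption: measure against the longer word, so is_match(a, b) == is_match(b, a).
    longest = max(len(a), len(b))
    return longest > 0 and common / longest >= threshold


print(is_match("pizz", "pizzas"))  # True
print(is_match("zzip", "pizzas"))  # False
```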
def longestSubstringFinder(string1, string2):
    answer = ""
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
        # Also record a match that runs all the way to the end of string2.
        if (len(match) > len(answer)): answer = match
    return answer
ss_len = len(longestSubstringFinder("pizz", "pizzas"))
max_len = max(len("pizz"),len("pizzas"))
percent = ss_len/max_len*100
print(percent)
if(percent>=60):
    print("True")
else:
    print("False")
Optimised algorithm using dynamic programming:
def LCSubStr(X, Y, m, n):
    # m and n are the lengths of X and Y.
    LCSuff = [[0 for k in range(n+1)] for l in range(m+1)]
    result = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if (i == 0 or j == 0):
                LCSuff[i][j] = 0
            elif (X[i-1] == Y[j-1]):
                LCSuff[i][j] = LCSuff[i-1][j-1] + 1
                result = max(result, LCSuff[i][j])
            else:
                LCSuff[i][j] = 0
    return result
This directly returns the length of the longest common substring.
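If the standard library is allowed, difflib can compute the same length without a hand-written table. This is just a sketch: the wrapper name longest_common_substring_len is made up, and dividing by the longer word mirrors the percentage calculation used earlier in this thread.

```python
from difflib import SequenceMatcher


def longest_common_substring_len(a, b):
    """Length of the longest block of characters shared by a and b."""
    # autojunk=False turns off difflib's heuristic for very long inputs.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size


size = longest_common_substring_len("pizz", "pizzas")
percent = size / max(len("pizz"), len("pizzas")) * 100
print(size, round(percent, 1))  # 4 66.7
```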

count an occurrence of a string in a bigger string

I am looking to understand what I can do to make my code to work. Learning this concept will probably unlock a lot in my programming understanding. I am trying to count the number of times the string 'bob' occurs in a larger string. Here is my method:
s='azcbobobegghakl'
for i in range(len(s)):
    if (gt[0]+gt[1]+gt[2]) == 'bob':
        count += 1
        gt.replace(gt[0],'')
    else:
        gt.replace(gt[0],'')
print count
How do I refer to my string instead of having to work with integers because of using for i in range(len(s))?
Try this:
def number_of_occurrences(needle, haystack, overlap=False):
    hlen, nlen = len(haystack), len(needle)
    if nlen > hlen:
        return 0 # definitely no occurrences
    N, i = 0, 0
    while i < hlen:
        consecutive_matching_chars = 0
        for j, ch in enumerate(needle):
            if (i + j < hlen) and (haystack[i + j] == ch):
                consecutive_matching_chars += 1
            else:
                break
        if consecutive_matching_chars == nlen:
            N += 1
            # if you don't need overlap, skip the matched characters of 'haystack';
            # the 'i += 1' below supplies the final step, hence 'nlen - 1' here
            i += (not overlap) * (nlen - 1) # booleans can be treated as numbers
        i += 1
    return N
Example usage:
haystack = 'bobobobobobobobobob'
needle = 'bob'
r = number_of_occurrences(needle, haystack)
R = haystack.count(needle)
print(r == R)
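For the overlapping case, a regular-expression lookahead is another option. This is a sketch rather than part of the answer above: count_with_lookahead is a made-up name, and returning 0 for an empty needle is my own assumption.

```python
import re


def count_with_lookahead(needle, haystack):
    """Count occurrences of needle in haystack, overlapping ones included."""
    if not needle:
        return 0  # assumption: treat an empty needle as no match
    # (?=...) matches at a position without consuming any characters,
    # so a later match may start inside an earlier one.
    pattern = "(?=" + re.escape(needle) + ")"
    return sum(1 for _ in re.finditer(pattern, haystack))


print(count_with_lookahead("bob", "azcbobobegghakl"))  # 2
```

re.escape keeps characters such as "." or "*" in the needle from being interpreted as regex syntax.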
Thanks. Your support helped me arrive at the answer. Here is what I have:
numBobs = 0
for i in range(1, len(s)-1):
    if s[i-1:i+2] == 'bob':
        numBobs += 1
print 'Number of times bob occurs is:', numBobs

Looping and Counting w/find

So I am working diligently on some examples for my homework and came across yet another error.
The original:
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print count
Ok. Looks simple.
I then put this code in a function named count and generalized it so that it accepts the string and the letter as arguments.
def count1(str, letter):
    count = 0
    word = str
    for specific_letter in word:
        if specific_letter == letter:
            count = count + 1
    print count
This is where I'm still not sure what I'm doing wrong.
I have to rewrite this function so that instead of traversing the string, it uses the three-parameter version of find from the previous section, which is:
def find(word, letter, startat):
    index = startat
    while index <= len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1
This is how far I got... but the program doesn't work the way I want it to.
def find(str, letter, startat):
    index = startat
    word = str
    count = 0
    while index <= len(word):
        if word[index] == letter:
            for specific_letter in word:
                if specific_letter == letter:
                    count = count + 1
            print count
        index = index + 1
Can someone point me in the right direction? I want to understand what I'm doing instead of just being given the answer. Thanks.
The point of the exercise is to use the previously defined function find as a building block to implement a new function count. So, where you're going wrong is by trying to redefine find, when you should be trying to change the implementation of count.
However, there is a wrinkle in that find as you have given has a slight error, you would need to change the <= to a < in order for it to work properly. With a <=, you could enter the body of the loop with index == len(word), which would cause IndexError: string index out of range.
So fix the find function first:
def find(word, letter, startat):
    index = startat
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1
And then re-implement count, this time using find in the body:
def count(word, letter):
    result = 0
    startat = 0
    while startat < len(word):
        next_letter_position = find(word, letter, startat)
        if next_letter_position != -1:
            result += 1
            startat = next_letter_position + 1
        else:
            break
    return result
if __name__ == '__main__':
    print count('banana', 'a')
The idea is to use find to find you the next index of the given letter.
In your code you don't use the find function.
If you want to try something interesting and pythonic: Change the original find to yield index and remove the final return -1. Oh, and fix the <= bug:
def find(word, letter, startat):
    index = startat
    while index < len(word):
        if word[index] == letter:
            yield index
        index = index + 1
print list(find('hello', 'l', 0))
Now find returns all of the results. You can use it like I did in the example or with a for position in find(...): You can also simply write count in terms of the length of the result.
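Writing count in terms of the generator could look like the sketch below. It is written for Python 3 (print as a function), and I renamed the generator to find_all and gave startat a default of 0 purely for illustration.

```python
def find_all(word, letter, startat=0):
    """Yield every index at or after startat where letter occurs in word."""
    index = startat
    while index < len(word):
        if word[index] == letter:
            yield index
        index += 1


def count(word, letter):
    """Count occurrences by consuming the generator."""
    return sum(1 for _ in find_all(word, letter))


print(list(find_all("hello", "l")))  # [2, 3]
print(count("banana", "a"))  # 3
```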
(Sorry, no hints on the final function in your question because I can't tell what you're trying to do. Looks like maybe you left in too much of the original function and jumbled their purposes together?)
Here's what I came up with. This should work:
def find(word, letter, startat):
    index = startat
    count = 0
    while index < len(word):
        if word[index] == letter:
            count = count + 1 ##This counts when letter matches the char in word
        index = index + 1
    print count
>>> find('banana', 'a', 0)
3
>>> find('banana', 'n', 0)
2
>>> find('mississippi', 's', 0)
4
Try using:
def find_count(srch_wrd, srch_char, startlookingat):
    counter = 0
    index = startlookingat
    while index < len(srch_wrd):
        if srch_wrd[index] == srch_char:
            counter += 1
        index += 1
    return counter
def count_letter2(f, l):
    count = 0
    t = 0
    while t < len(f):
        np = f.find(l, t)
        if np != -1:
            count += 1
            t = np + 1
            # I was wrong by doing t = t + 1
        else:
            break
    return count
print(count_letter2("banana", "a"))
print(count_letter2("abbbb", "a"))
