counting the number of substrings in a string

I am working on a Python assignment and I am stuck.
I have to write code that counts the number of occurrences of a given substring within a string.
I thought I had it right, but now I am stuck.
def count(substr,theStr):
    # your code here
    num = 0
    i = 0
    while substr in theStr[i:]:
        i = i + theStr.find(substr)+1
        num = num + 1
    return num

substr = 'is'
theStr = 'mississipi'
print(count(substr,theStr))
If I run this, I expect to get 2 as the result, but I get 3 instead.
Other examples, such as ana in banana, work fine, but this specific example keeps giving the wrong result. I don't know what I did wrong here.
Could you please help me out?

In your code
while substr in theStr[i:]:
correctly advances over the target string theStr, however the
i = i + theStr.find(substr)+1
keeps looking from the start of theStr.
The str.find method accepts optional start and end arguments to limit the search:
str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found
within the slice s[start:end]. Optional arguments start and end
are interpreted as in slice notation. Return -1 if sub is not found.
We don't really need to use in here: we can just check that find doesn't return -1. It's a bit wasteful performing an in search when we then need to repeat the search using find to get the index of the substring.
I assume that you want to find overlapping matches, since the str.count method can find non-overlapping matches, and since it's implemented in C it's more efficient than implementing it yourself in Python.
def count(substr, theStr):
    num = i = 0
    while True:
        j = theStr.find(substr, i)
        if j == -1:
            break
        num += 1
        i = j + 1
    return num

print(count('is', 'mississipi'))
print(count('ana', 'bananana'))
output
2
3
The core of this code is
j = theStr.find(substr, i)
i is initialised to 0, so we start searching from the beginning of theStr, and because of i = j + 1 subsequent searches start looking from the index following the last found match.
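To make the difference between the built-in str.count and an overlapping count concrete, here is a small sketch. The helper name count_overlapping is made up for this illustration; it uses the same find-based loop as the answer above.

```python
def count_overlapping(substr, the_str):
    """Count occurrences of substr in the_str, allowing overlaps."""
    num = i = 0
    while True:
        j = the_str.find(substr, i)  # search from index i onwards
        if j == -1:
            return num
        num += 1
        i = j + 1  # step one character past the start of the match

text = 'bananana'
print(text.count('ana'))               # 2 (non-overlapping matches)
print(count_overlapping('ana', text))  # 3 (overlapping matches)
```

For 'mississipi' and 'is' the two approaches agree, because the matches there do not overlap.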

The code change you need is -
i = i + theStr[i:].find(substr)+ 1
instead of
i = i + theStr.find(substr)+ 1
In your code the substring is still found in theStr[i:] for every i up to 4. But when looking up the index of the substring, you were searching the original (whole) string, which always returns position 1.
In your example of banana, after the first iteration i becomes 2. So in the next iteration theStr[i:] becomes nana, and the position of the substring ana is 1 in both this sliced string and the original string. The bug is therefore masked and the code appears to work.
If your code is purely for learning purposes, then you can do it this way. Otherwise you may want to use Python's built-in methods (like str.count()) to do the job.
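A quick way to see the difference is to record the values i takes in each version of the loop. This is only a tracing sketch; the function name trace and the use_slice flag are invented for the illustration.

```python
the_str = 'mississipi'
substr = 'is'

def trace(use_slice):
    """Return the successive values of i produced by the while loop."""
    i, visited = 0, []
    while substr in the_str[i:]:
        if use_slice:
            # search only the part of the string that is still unexplored
            i = i + the_str[i:].find(substr) + 1
        else:
            # original line: always searches from the start, so find() returns 1
            i = i + the_str.find(substr) + 1
        visited.append(i)
    return visited

print(trace(use_slice=False))  # [2, 4, 6] -> three iterations, count of 3
print(trace(use_slice=True))   # [2, 5]    -> two iterations, count of 2
```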

Counting the number of substrings:
def count(substr,theStr):
    num = 0
    for i in range(len(theStr)):
        if theStr[i:i+len(substr)] == substr:
            num += 1
    return num

substr = 'is'
theStr = 'mississipi'
print(count(substr,theStr))
Output: 2
where theStr[i:i+len(substr)] is a slice of the string, i is the starting index and i+len(substr) is the ending index (exclusive).
E.g. with i = 0 and a substring length of 2, the first slice compared against the substring is mi.

Related

Finding index of given character in a string in Python with some default Parameter

This might be a silly question and perhaps might be one of those easiest question on SO. Consider the following code which tries to find the index of given character in a string:
def find(s, ch, start=0):
    index = start
    while index < len(s):
        if s[index] == ch:
            return index
        index = index + 1
    return -1

print(find("apple","p"))
This works fine. Now, in this code I want to add a default Parameter end which will tell the function till what length of string, we have to search in the given string. Like this:
def find(s, ch, start=0,end=len(s)):
    index = start
    while index <= end:
        if s[index] == ch:
            return index
        index = index + 1
    return -1

print(find("apple","p"))
However, when I run this code, I get an error on the first line of the code above:
NameError: name 's' is not defined
I tried to read about this in a textbook. I found that when the function is defined, s is still undefined (though I have no idea why this is the case). Hence, len(s) cannot be evaluated.
I know that there is a built in function which implements this but I want to write my own algorithm to do that.
Can anyone help or give hint?

A more Pythonic choice for iterating over a string would be a for loop, since it is simpler and easier to read. Note that this version returns a 1-based position (that of the last match), or None if the character is not found:
def find(s, ch):
    index = None
    for i in range(0, len(s)):
        if s[i] == ch:
            index = i + 1
    return (index)

print(find("auhkle","a"))
print(find("auhkle","h"))
print(find("auhkle","e"))
Output:
1
3
6
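On the default-parameter part of the question: default values are evaluated once, when the def statement runs, so they cannot refer to another parameter such as s. A common workaround, sketched below, is to default end to None and fill in len(s) inside the function. Treating end as exclusive (as slices do) and clamping it to len(s) are assumptions of this sketch, not something the question specifies.

```python
def find(s, ch, start=0, end=None):
    """Return the index of the first ch in s[start:end], or -1 if absent."""
    if end is None:
        end = len(s)          # computed at call time, when s is known
    end = min(end, len(s))    # assumption: never read past the end of s
    index = start
    while index < end:
        if s[index] == ch:
            return index
        index += 1
    return -1

print(find("apple", "p"))        # 1
print(find("apple", "e", 0, 3))  # -1, because 'e' lies outside s[0:3]
```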

Identifying a substring in Python (moving along indices within a string) - can only concatenate str (not "int") to str

I'm new to Python and coding and stuck at comparing a substring to another string.
I've got:
string sq and pattern STR.
Goal: I'm trying to count the max number of STR pattern appearing in that string in a row.
This is part of the code:
STR = key
counter = 0
maximum = 0
for i in sq:
    while sq[i:i+len(STR)] == STR:
        counter += 1
        i += len(STR)
The problem seems to appear in the "while part", saying TypeError: can only concatenate str (not "int") to str.
I see it treats i as a character and len(STR) as an int, but I don't get how to fix this.
The idea is to take the first substring equal to the length of STR and figure out whether this substring and STR pattern are identical.
Thank you!

By looping using:
for i in sq:
you are looping over the elements of sq.
If instead you want the variable i to loop over the possible indexes of sq, you would generally loop over range(len(sq)), so that you get values from 0 to len(sq) - 1.
for i in range(len(sq)):
However, in this case you are wanting to assign to i inside the loop:
i += len(STR)
This will not have the desired effect if you are looping over range(...) because on the next iteration it will be assigned to the next value from range, ignoring the increment that was added. In general one should not assign to a loop variable inside the loop.
So it would probably most easily be implemented with a while loop, and you set the desired value of i explicitly (i=0 to initialise, i+=1 before restarting the loop), and then you can have whatever other assignments you want inside the loop.
STR = "ell"
sq = "well, well, hello world"
counter = 0
i = 0
while i < len(sq):
    while sq[i:i+len(STR)] == STR:  # while rather than if, so back-to-back matches are consumed
        counter += 1
        i += len(STR)
    i += 1
print(counter)  # prints 3
(You could perhaps save len(sq) and len(STR) in other variables to save evaluating them repeatedly.)
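The point about assigning to the loop variable can be checked with a small experiment. The variable names seen_for and seen_while are made up for this sketch.

```python
# for loop: the increment is discarded when range() supplies the next value
seen_for = []
for i in range(5):
    seen_for.append(i)
    i += 2  # has no effect on the next iteration
print(seen_for)    # [0, 1, 2, 3, 4]

# while loop: the increment is the only thing that moves i
seen_while = []
i = 0
while i < 5:
    seen_while.append(i)
    i += 2
print(seen_while)  # [0, 2, 4]
```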

This solution doesn't use a for so the increment can be by one on a non-match and by string length on a match. Any non-match records the maximum count seen so far and resets the count.
def count_max(string,key):
    if len(key) > len(string):
        return 0
    last = len(string) - len(key)
    i = 0
    count = 0
    maximum = 0
    while i <= last:
        if string[i:i+len(key)] == key:
            count += 1
            i += len(key)
        else:
            maximum = max(maximum,count)
            count = 0
            i += 1
    return max(maximum,count)

key = 'abc'
strings = 'ab','abc','ababcabc','abcdefabcabc','abcabcdefabc'
for string in strings:
    print(count_max(string,key))
Output:
0
1
2
2
2
Here also is a potentially faster version. For short strings it isn't faster, but will be much faster if the strings are very long since the regular expression will find matches much faster than Python loops.
import re

def count_max2(string,key):
    return max([len(match) // len(key)
                for match in re.findall(rf'(?:{re.escape(key)})+',string)]
               ,default=0)
How it works:
re.escape is a function to make sure characters in key are taken literally and are not regular expression syntax. Allows searching for + for example, instead of being treated as a "one or more" match.
rf'' is syntax for a raw f-string (format string). "raw" is recommended for regular expressions because some syntax for expressions is confused with other Python syntax. f-strings allow variables and functions to be inserted into strings with curly braces {}.
re.findall finds all non-overlapping matches of the pattern in the string; here each match is a run of one or more consecutive copies of key.
[f(x) for x in iterable] is a list comprehension and takes the list returned from iterable and computes a function on each item in the list. In this case, it takes the length of the match divided by the length of the key to get the number of occurrences of the key.
max(iterable,default=0) returns the maximum value of iterable, or 0 if the iterable is empty (no matches).
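To see the intermediate values, the pieces of the expression can be evaluated one at a time. This is just a worked example on one of the test strings above; the variable names pattern and runs are chosen for the illustration.

```python
import re

key = 'abc'
pattern = rf'(?:{re.escape(key)})+'  # one or more consecutive copies of key
runs = re.findall(pattern, 'abcdefabcabc')

print(runs)                                     # ['abc', 'abcabc']
print([len(run) // len(key) for run in runs])   # [1, 2]
print(max([], default=0))                       # 0, the no-match case
```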

string.find indexing in Python

I'm trying to understand why the following python code incorrectly returns the string "dining":
def remove(somestring, sub):
    """Return somestring with sub removed."""
    location = somestring.find(sub)
    length = len(sub)
    part_before = somestring[:location]
    part_after = somestring[location + length:]
    return part_before + part_after

print(remove('ding', 'do'))
I realize the way to make the code run correctly is to add an if statement so that if the location variable returns a -1 it will simply return the original string (in this case "ding"). The code, for example, should be:
def remove(somestring, sub):
    """Return somestring with sub removed."""
    location = somestring.find(sub)
    if location == -1:
        return somestring
    length = len(sub)
    part_before = somestring[:location]
    part_after = somestring[location + length:]
    return part_before + part_after

print(remove('ding', 'do'))
Without using the if statement to fix the function, the part_before variable will return the string "din". I would love to know why this happens. Reading the python documentation on string.find (which is ultimately how part_before is formulated) I see that the location variable would become a -1 because "do" is NOT found. But if the part_before variable holds all letters before the -1 index, shouldn't it be blank and not "din"? What am I missing here?
For reference, Python documentation for string.find states:
string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.

>>> string = 'ding'
>>> string[:-1]
'din'
Using a negative number as an index in Python counts from the right-hand side. Accordingly, the slice [:-1] returns all but the last element of the string.

If you have a string 'ding' and you are searching for 'do', str.find() will return -1. 'ding'[:-1] is equal to 'din' and 'ding'[-1 + len(sub):] equals 'ding'[1:] which is equal to 'ing'. Putting the two together results in 'dining'. To get the right answer, try something like this:
def remove(string, sub):
    index = string.find(sub)
    if index == -1:
        return string
    else:
        return string[:index] + string[index + len(sub):]
The reason that string[:-1] is not equal to the whole string is that in slicing, the first number (in this case blank so equal to None, or for our purposes equivalent to 0) is inclusive, but the second number (-1) is exclusive. string[-1] is the last character, so that character is not included.
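The individual steps can be reproduced directly. This short sketch only replays the values described above; the variable names are chosen for the illustration.

```python
s = 'ding'
sub = 'do'
location = s.find(sub)

print(location)                 # -1, because 'do' is not in 'ding'
print(s[:location])             # 'din'  (everything except the last character)
print(s[location + len(sub):])  # 'ing'  (same as s[1:])
print(s[:location] + s[location + len(sub):])  # 'dining'
```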

Create Your Own Find String Function

For a school project I have to create a function called find_str that essentially does the same thing as the .find string method, but we cannot use any string methods in our definition.
The project description reads: "Function find_str has two parameters (both strings). It returns the lowest index where the second parameter is found within the first parameter (it returns -1 if the second parameter is not found within the first parameter)."
I have spent a lot of time working on this project and have yet to come to a solution. This is the current definition that I have come up with:
def find_str (string, substring):
    index = 0
    length = len (substring)
    for ch in string:
        if ch == substring [0]:
            subindex1 = 0
            subindex2 = index
            for i in range (length):
                if ch == substring [i]:
                    subindex1 +=1
                    if subindex1 == length:
                        return index
                    ch = string [(subindex2)+1]
                    subindex2 +=1
        index += 1
    return "-1"
This sample of code only works in some instances, but not all.
For example:
print (find_str ("hello", "llo"))
returns:
2
as it should.
But
print (find_str ("hello", "el"))
returns:
ch = string [(subindex2)+1]
IndexError: string index out of range
I feel like I am overthinking this and there must be an easier way to do it. Any input or help would be great! Thanks.

Using a sub-function to clear your thoughts often helps.
def find_str(string, substring):
    # only try start positions where the whole substring still fits
    for j in range(len(string) - len(substring) + 1):
        if is_next_sub(string, substring, j):
            return j
    return -1

def is_next_sub(string, substring, index):
    # compare substring with string character by character, starting at index
    for i in range(len(substring)):
        if substring[i] != string[index + i]:
            return False
    return True

I'm not sure we should be helping you with 'homework'
How about this:
def find_str(string, substring):
    for off in range(len(string)):
        if string[off:].startswith(substring):
            return off
    return -1

I haven't checked through your code in detail, but it looks like you're trying to compare characters that don't exist.
Suppose you're searching "aaaaa" for the substring "aaa", and you need to find all matches...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Even though the characters always match, and there are five characters in the string, there are only three positions that you might need to consider.
So before you look at the actual characters at all, you can restrict the number of start positions you might need to consider based on the lengths of the string and substring. You only loop for those start positions. That means you're not looping for start positions that cannot match. Also, if you don't do this...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Match at 3 : ...aa!
Match at 4 : ....a!!
Those exclamation points are places where you try to match a character in the substring with a character that doesn't exist, after the end of the string. You can check for that within the loop to avoid the error each time it occurs, but why not eliminate all those cases at once by not looping for the match positions that cannot occur?
The number of start positions you may need to check is len(fullstring) + 1 - len(substring), so you can derive a range of possible start positions using range(0, len(fullstring) + 1 - len(substring)).
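A minimal sketch of that idea, using only len, range and indexing (no string methods). The parameter names follow the text above; the rest is an illustration, not the only way to write it.

```python
def find_str(fullstring, substring):
    """Return the lowest index of substring in fullstring, or -1."""
    n, m = len(fullstring), len(substring)
    # Only start positions where the whole substring still fits.
    for start in range(0, n + 1 - m):
        for i in range(m):
            if fullstring[start + i] != substring[i]:
                break          # mismatch: try the next start position
        else:
            return start       # inner loop finished without a mismatch
    return -1

print(find_str("hello", "llo"))  # 2
print(find_str("hello", "el"))   # 1
print(find_str("hello", "lox"))  # -1
```

Because the range stops at len(fullstring) - len(substring), the index fullstring[start + i] can never run past the end of the string.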

Swapping every second character in a string in Python

I have the following problem: I would like to write a function in Python which, given a string, returns a string where every group of two characters is swapped.
For example given "ABCDEF" it returns "BADCFE".
The length of the string would be guaranteed to be an even number.
Can you help me with how to do this in Python?

To add another option:
>>> s = 'abcdefghijkl'
>>> ''.join([c[1] + c[0] for c in zip(s[::2], s[1::2])])
'badcfehgjilk'

import re
print(re.sub(r'(.)(.)', r'\2\1', "ABCDEF"))

from itertools import chain, zip_longest  # izip_longest in Python 2
''.join(chain.from_iterable(zip_longest(s[1::2], s[::2], fillvalue = '')))
You can also use islices instead of regular slices if you have very large strings or just want to avoid the copying.
Works for odd length strings even though that's not a requirement of the question.
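A possible islice variant is sketched below. It assumes Python 3, where izip_longest is called zip_longest; the function name swap_pairs is made up for this example.

```python
from itertools import chain, islice, zip_longest

def swap_pairs(s):
    """Swap each pair of characters; a trailing unpaired character is kept."""
    odds = islice(s, 1, None, 2)   # characters at indices 1, 3, 5, ...
    evens = islice(s, 0, None, 2)  # characters at indices 0, 2, 4, ...
    # Pair them up as (odd, even); fillvalue='' covers odd-length input.
    pairs = zip_longest(odds, evens, fillvalue='')
    return ''.join(chain.from_iterable(pairs))

print(swap_pairs("ABCDEF"))  # BADCFE
print(swap_pairs("ABCDE"))   # BADCE
```

islice walks the string lazily instead of building two sliced copies, which is the copying the answer refers to.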

While the above solutions do work, there is a very simple solution in, shall we say, "layman's" terms. Someone still learning Python and strings can use the other answers, but may not understand how they work or what each part of the code is doing without a fuller explanation than "this works". The following swaps every second character in a string and is easy for beginners to follow.
It simply iterates through the string by twos (starting from 0 and visiting every second index) and builds a new string (swapped_pair) by adding the character at the current index + 1 (second character) and then the character at the current index (first character), e.g., index 1 is put at index 0 and then index 0 is put at index 1, and this repeats through the string.
The code also includes two optional steps for odd-length strings: truncating the string to an even length first, or appending the unpaired last character afterwards.
string = "abcdefghijklmnopqrstuvwxyz123"
# use this prior to the iteration below if the result must be of even length
# (drops the last character of an odd-length string)
if len(string) % 2 != 0:
    string = string[:-1]
# iteration to swap every second character in string
swapped_pair = ""
for i in range(0, len(string) - 1, 2):
    swapped_pair += (string[i + 1] + string[i])
# use this after the iteration above (instead of the truncation) to keep
# the unpaired last character of an odd-length string
if len(string) % 2 != 0:
    swapped_pair += string[-1]
print(swapped_pair)
# badcfehgjilknmporqtsvuxwzy21   output if the "needs to be even" code is used
# badcfehgjilknmporqtsvuxwzy213  output if only the "even or odd" code is used

Here's a nifty solution:
def swapem(s):
    if len(s) < 2:
        return s
    return "%s%s%s" % (s[1], s[0], swapem(s[2:]))

for text in ("", "a", "ab", "abcdefgh", "abcdefghi"):
    print("[%s] -> [%s]" % (text, swapem(text)))
though possibly not suitable for large strings :-)
Output is:
[] -> []
[a] -> [a]
[ab] -> [ba]
[abcdefgh] -> [badcfehg]
[abcdefghi] -> [badcfehgi]

If you prefer one-liners (in Python 3, reduce has to be imported from functools first):
from functools import reduce
''.join(reduce(lambda x,y: x+y,[[s[1+(x<<1)],s[x<<1]] for x in range(0,len(s)>>1)]))

Here's another simple solution:
"".join([(s[i:i+2])[::-1]for i in range(0,len(s),2)])
