Count overlapping substring in a string [duplicate] - python

Say I have string = 'hannahannahskdjhannahannah' and I want to count the number of times the string hannah occurs. I can't simply use count, because it only counts non-overlapping occurrences. That is, I expect 4, but string.count('hannah') returns only 2.

You could use a running index to fetch the next occurrence:
bla = 'hannahannahskdjhannahannah'
cnt = 0
idx = 0
while True:
    idx = bla.find('hannah', idx)
    if idx >= 0:
        cnt += 1
        idx += 1
    else:
        break
print(cnt)
Gives:
4

How about something like this?
>>> d = {}
>>> string = 'hannahannahskdjhannahannah'
>>> for i in range(0, len(string) - len('hannah') + 1):
...     if string[i:i+len('hannah')] == 'hannah':
...         d['hannah'] = d.get('hannah', 0) + 1
...
>>> d
{'hannah': 4}
This searches the string for hannah by slicing it at every index from 0 up to the length of the string minus the length of hannah.

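'''
s: main string
sub: sub-string
count: number of sub-strings found
p: index at which the next search starts (one past the start of the previous match)
'''
s = 'hannahannahskdjhannahannah'  # example values
sub = 'hannah'
count = 0
p = 0
for letter in s:
    p = s.find(sub, p)
    if p == -1:
        break  # no more matches; otherwise find(sub, -1) would search the last character again
    count += 1
    p += 1
print(count)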

If you also want to count non-contiguous occurrences (subsequences), this is one way to do it:
def subword(lookup, whole):
    if len(whole) < len(lookup):
        return 0
    if lookup == whole:
        return 1
    if lookup == '':
        return 1
    if lookup[0] == whole[0]:
        # Either use this character of whole for the match, or skip it.
        return subword(lookup[1:], whole[1:]) + subword(lookup, whole[1:])
    return subword(lookup, whole[1:])

def Count_overlap(string, substring):
    count = 0
    start = 0
    while start < len(string):
        pos = string.find(substring, start)
        if pos != -1:
            start = pos + 1
            count += 1
        else:
            break
    return count
string = "hannahannahskdjhannahannah"
print(Count_overlap(string, "hannah"))

I don't want to answer this for you, as it's simple enough to work out yourself.
But if I were you, I'd use the string.find() method, which takes the string you're looking for and the position to start looking from, combined with a while loop that uses the result of find as its condition in some way.
That should, in theory, give you the answer.
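The hint above can be sketched as follows. This is only one possible reading of it: the function name count_overlapping is made up for illustration, and treating an empty pattern as zero matches is an assumption, not something the answer specifies.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps."""
    if not sub:
        return 0  # assumption: an empty pattern counts as zero matches
    count = 0
    pos = text.find(sub)
    while pos != -1:  # find() returns -1 once nothing is left to match
        count += 1
        pos = text.find(sub, pos + 1)  # resume one character past the last match
    return count


print(count_overlapping('hannahannahskdjhannahannah', 'hannah'))  # 4
```

Advancing by one character rather than by len(sub) is what lets a match begin inside the previous one.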

Related

Python: how do I separate numbers with dots every three digits? [duplicate]

Example
test = "123456789123"
I tried
test = "1234567"
print(".".join(test))
result
1.2.3.4.5.6.7
but I would like this
result
123.456.789.123
Here is a simple regex solution:
import re
print(re.sub(r'(?<!^)(?=(\d{3})+$)', r'.', "12345673456456456"))
It produces the following output:
12.345.673.456.456.456
The regex uses a lookahead to match every position (except the start) where the number of digits remaining to the right is a non-zero multiple of 3, so the groups are counted from the right.
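To see which positions the lookahead selects, here is a small sketch that reuses the same pattern. The helper name dot_positions is hypothetical and exists only for this illustration.

```python
import re

# Same pattern as in the answer: not at the start, and followed by
# one or more complete groups of three digits up to the end.
PATTERN = re.compile(r'(?<!^)(?=(\d{3})+$)')


def dot_positions(digits):
    """Return the indexes at which a dot would be inserted."""
    return [m.start() for m in PATTERN.finditer(digits)]


print(dot_positions("1234567"))      # [1, 4]
print(PATTERN.sub('.', "1234567"))   # 1.234.567
```

Because the count runs from the right, a leftover short group ends up at the front of the string rather than at the end.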
If you don't really need a dot, and value is a number rather than a string, you can simply use:
formatted_number = "{:,}".format(value)
And if you really want those dots:
formatted_number = formatted_number.replace(',', '.')
Very naive solution without list comprehension:
test = '123456789123'
result = ''
while test:
    result += test[:3]
    if len(test) > 3:
        result += '.'
    test = test[3:]
print(result)
You can loop over the characters and track where you are with a counter:
s = "123456789123"
output = []
count = 0
for c in s:
    count += 1
    if count == 4:
        # Start a new group before the fourth character instead of dropping it.
        output.append(".")
        count = 1
    output.append(c)
result = ''.join(output)
print(result)
A straightforward implementation:
test = "1234567891234431222334442234ee3432"
testJoined = ""
for i in range(0, len(test), 3):
    testJoined += test[i:i+3] + "."
print(testJoined[0:-1])
test1:
"1234567891234431222334442234ee3432"
result1:
123.456.789.123.443.122.233.444.223.4ee.343.2
test2:
"1234567891234431222334442234ee34aa32"
result2:
123.456.789.123.443.122.233.444.223.4ee.34a.a32
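The same left-to-right grouping can also be written with str.join, which removes the need to trim a trailing dot. This is a sketch only; the function name group_from_left and its size and sep parameters are made up for illustration.

```python
def group_from_left(text, size=3, sep="."):
    """Split text into chunks of `size` characters, left to right, and join them."""
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    return sep.join(chunks)


print(group_from_left("123456789123"))  # 123.456.789.123
print(group_from_left("1234567"))       # 123.456.7
```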

Python | find method

def count_substring(string, sub_string):
    count = 0
    for i in range(0 , len(string)):
        if ( string[i: ].find(sub_string)) == True:
            count = count +1
    return count
STRING = 'ininini'
SUB_STRING = 'ini'
CORRECT OUTPUT : 3
MY OUTPUT : 2
It is not detecting the last substring.
The problem is that
string[i:].find(sub_string)
returns -1 if the substring is not found, or its position if it is found. You want to test for position 0, but you're testing for position 1, because 1 == True (https://docs.python.org/3/library/stdtypes.html#str.find).
It's not "not detecting the last substring"; it's detecting bogus matches.
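A quick way to see this is to list what find returns for each slice of the example string. The helper name find_results below is hypothetical and used only for this demonstration.

```python
def find_results(string, sub_string):
    """Return the result of find() on every suffix string[i:]."""
    return [string[i:].find(sub_string) for i in range(len(string))]


results = find_results('ininini', 'ini')
print(results)                           # [0, 1, 0, 1, 0, -1, -1]
print(sum(r == True for r in results))   # 2, the bogus count (matches one position later)
print(sum(r == 0 for r in results))      # 3, the real count (match starts at the slice)
```

The two hits counted by == True are slices where the match starts one character in, not real matches at that index.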
You could use startswith instead:
def count_substring(string, sub_string):
    count = 0
    for i in range(0, len(string)):
        if string[i:].startswith(sub_string):
            count += 1
    return count
Note that using find isn't a bad idea at all. It has a start position parameter, which is handy here, and it means you don't have to slice the string (which avoids copying it on every iteration):
def count_substring(string, sub_string):
    count = 0
    for i in range(0, len(string)):
        if string.find(sub_string, i) == i:
            count += 1
    return count
or in one line:
def count_substring(string, sub_string):
    return sum(1 for i in range(len(string)) if string.find(sub_string, i) == i)
Note that string.count(sub_string) doesn't yield the same result, because it doesn't count overlapping occurrences like your solution does.

Write a program that prints the number of times the string contains a substring

s = "bobobobobobsdfsdfbob"
count = 0
for x in s :
    if x == "bob" :
        count += 1
print(count)
I want to count how many times bob occurs in the string s, but the result this gives me is 17.
What's wrong with my code? I'm new to Python.
When you loop over a string, the loop variable holds one character at a time, so in your loop x is never equal to bob.
If you want to count the non-overlapping occurrences, you can simply use str.count:
In [52]: s.count('bob')
Out[52]: 4
For overlapping sub-strings you can use lookaround in regex:
In [57]: import re
In [59]: len(re.findall(r'(?=bob)', s))
Out[59]: 6
you can use string.count
for example:
s = "bobobobobobsdfsdfbob"
count = s.count("bob")
print(count)
I'm not giving the best solution, just trying to correct your code.
First, consider what a for loop over a string does in your case:
for c in "Hello":
    print(c)
Outputs:
H
e
l
l
o
In each iteration you are comparing a character to a string which results in a wrong answer.
Try something like this
(no overlapping, i.e. matches may not share characters):
s = "bobobobobobsdfsdfbob"
w = "bob"
count = 0
i = 0
while i <= len(s) - len(w):
    if s[i:i+len(w)] == w:
        count += 1
        i += len(w)
    else:
        i += 1
print(count)
Output:
4
Overlapping:
s = "bobobobobobsdfsdfbob"
w = "bob"
count = 0
for i in range(len(s) - len(w) + 1):
    if s[i:i+len(w)] == w:
        count += 1
print(count)
Output:
6

CodingBat Warm up 2 Python last2

I think I have the solution to this problem but when I run it on pythonfiddle.com or Canopy nothing comes up.
The problem is:
Given a string, return the count of the number of times that a
substring length 2 appears in the string and also as the last 2 chars
of the string, so "hixxxhi" yields 1 (we won't count the end
substring).
last2('hixxhi') → 1
last2('xaxxaxaxx') → 1
last2('axxxaaxx') → 2
My solution is :
def last2(str):
    test2 = str[-2:]
    start = 0
    count = 0
    while True:
        if str.find(test2, start, -2) > 0:
            count +=1
            start = str.find(test2, start, -2) + 1
        else:
            break
    return count
When I call the function last2, I get nothing. Is there something I'm missing?
str.find() returns -1 if a match is not found. If the match is found at the start of the string, 0 is returned, but your test condition excludes this case:
if str.find(test2, start, -2) > 0:
You want to match 0 too:
if str.find(test2, start, -2) >= 0:
You could avoid calling str.find() twice here, and you want to allow the second-to-last character to count too (xxxx has xx in there twice outside of matching the last two characters). Last but not least, if the string is shorter than three characters, there will never be any matches:
def last2(value):
    if len(value) < 3:
        return 0
    test2 = value[-2:]
    start = 0
    count = 0
    while True:
        index = value.find(test2, start, -1)
        if index == -1:
            break
        count +=1
        start = index + 1
    return count
I've avoided shadowing the built-in str() function here too.
Demo:
>>> last2('hixxhi')
1
>>> last2('xaxxaxaxx')
1
>>> last2('axxxaaxx')
2
You've got an off-by-one error. str.find returns -1 if the string does not contain the substring, but it will return 0 if the substring is at the beginning of the string. Notice that your method works fine on the third example.
It should be:
if str.find(test2, start, -2) >= 0:

String count with overlapping occurrences [closed]

What's the best way to count the number of occurrences of a given string, including overlap in Python? This is one way:
def function(string, str_to_search_for):
    count = 0
    for x in range(len(string) - len(str_to_search_for) + 1):
        if string[x:x+len(str_to_search_for)] == str_to_search_for:
            count += 1
    return count
function('1011101111','11')
This method returns 5.
Is there a better way in Python?
Well, this might be faster since it does the comparing in C:
def occurrences(string, sub):
    count = start = 0
    while True:
        # find() returns -1 when there is no match, so start becomes 0 and the loop ends.
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count
>>> import re
>>> text = '1011101111'
>>> len(re.findall('(?=11)', text))
5
If you don't want to load the whole list of matches into memory (which would almost never be a problem), you could do this instead:
>>> sum(1 for _ in re.finditer('(?=11)', text))
5
As a function (re.escape makes sure the substring doesn't interfere with the regex):
def occurrences(text, sub):
    return len(re.findall('(?={0})'.format(re.escape(sub)), text))
>>> occurrences(text, '11')
5
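To illustrate why the re.escape call matters, here is a small comparison. The two function names are made up for this sketch; they differ only in whether the substring is escaped before being placed inside the lookahead.

```python
import re


def count_unescaped(text, sub):
    # sub is interpreted as a regex, so '.' means "any character".
    return len(re.findall('(?=' + sub + ')', text))


def count_escaped(text, sub):
    # re.escape makes sub match literally.
    return len(re.findall('(?=' + re.escape(sub) + ')', text))


text = 'a.b.c'
print(count_unescaped(text, '.'))  # 5, every character matches
print(count_escaped(text, '.'))    # 2, only the literal dots
```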
You can also try the third-party regex module (installed separately from the standard library), which supports overlapping matches. Note that search_for is treated as a regular expression here.
import regex as re
def count_overlapping(text, search_for):
    return len(re.findall(search_for, text, overlapped=True))
count_overlapping('1011101111','11') # 5
Python's str.count counts non-overlapping substrings:
In [3]: "ababa".count("aba")
Out[3]: 1
Here are a few ways to count overlapping sequences, I'm sure there are many more :)
Look-ahead regular expressions
How to find overlapping matches with a regexp?
In [10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']
Generate all substrings
In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2
def count_substring(string, sub_string):
    count = 0
    for pos in range(len(string)):
        if string[pos:].startswith(sub_string):
            count += 1
    return count
This could be the easiest way.
A fairly pythonic way would be to use a list comprehension here, although it probably wouldn't be the most efficient.
sequence = 'abaaadcaaaa'
substr = 'aa'
counts = sum([
    sequence.startswith(substr, i) for i in range(len(sequence))
])
print(counts) # 5
The list would be [False, False, True, True, False, False, False, True, True, True, False], as it checks every index in the string, and because int(True) == 1, sum gives us the total number of matches.
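For anyone who wants to check the intermediate list, this sketch builds it explicitly before summing. The names flags and match_positions are introduced here only for illustration.

```python
sequence = 'abaaadcaaaa'
substr = 'aa'

# One boolean per index: does substr start at this position?
flags = [sequence.startswith(substr, i) for i in range(len(sequence))]
match_positions = [i for i, hit in enumerate(flags) if hit]

print(len(flags))        # 11, one entry per character
print(match_positions)   # [2, 3, 7, 8, 9]
print(sum(flags))        # 5, since True counts as 1
```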
s = "bobobob"
sub = "bob"
ln = len(sub)
print(sum(sub == s[i:i+ln] for i in range(len(s)-(ln-1))))
How to find a pattern in another string with overlapping
This function (another solution!) receives a pattern and a text. It returns a list of all the matching substrings found in the text, together with their positions.
import re
def occurrences(pattern, text):
    """
    input: search a pattern (regular expression) in a text
    returns: a list of substrings and their positions
    """
    p = re.compile('(?=({0}))'.format(pattern))
    matches = re.finditer(p, text)
    return [(match.group(1), match.start()) for match in matches]
print(occurrences('ana', 'banana'))
print(occurrences('.ana', 'Banana-fana fo-fana'))
[('ana', 1), ('ana', 3)]
[('Bana', 0), ('nana', 2), ('fana', 7), ('fana', 15)]
My answer to the bob question on the course:
s = 'azcbobobegghaklbob'
total = 0
for i in range(len(s)-2):
    if s[i:i+3] == 'bob':
        total += 1
print('number of times bob occurs is: ', total)
Here is my edX MIT "find bob"* solution (*find the number of "bob" occurrences in a string named s), which basically counts overlapping occurrences of a given substring:
s = 'azcbobobegghakl'
count = 0
while 'bob' in s:
    count += 1
    # Skip 2 characters, not 3, so the final 'b' can start the next match.
    s = s[(s.find('bob') + 2):]
print("Number of times bob occurs is: {}".format(count))
If the strings are large, you want to use Rabin-Karp. In summary:
a rolling window of substring size, moving over a string
a hash with O(1) overhead for adding and removing (i.e. move by 1 char)
implemented in C or relying on pypy
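The summary above can be sketched in pure Python as follows. This is an illustrative version only, not a tuned implementation: the function name rabin_karp_count and the base and mod values are assumptions, and pure Python will not deliver the speed the answer expects from C or PyPy.

```python
def rabin_karp_count(text, sub, base=256, mod=1_000_000_007):
    """Count overlapping occurrences of sub in text using a rolling hash."""
    n, m = len(text), len(sub)
    if m == 0 or m > n:
        return 0
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    sub_hash = window_hash = 0
    for i in range(m):
        sub_hash = (sub_hash * base + ord(sub[i])) % mod
        window_hash = (window_hash * base + ord(text[i])) % mod
    count = 0
    for start in range(n - m + 1):
        # Confirm with a real comparison so hash collisions cannot miscount.
        if window_hash == sub_hash and text[start:start + m] == sub:
            count += 1
        if start + m < n:
            # Slide the window: drop text[start], append text[start + m].
            window_hash = (window_hash - ord(text[start]) * high) % mod
            window_hash = (window_hash * base + ord(text[start + m])) % mod
    return count


print(rabin_karp_count('1011101111', '11'))  # 5
```

Each window update costs O(1), and the slice comparison only runs when the hashes agree.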
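That can be solved using regex.
import re
def function(string, sub_string):
    # re.escape keeps characters such as '.' or '*' in sub_string from acting as regex syntax.
    match = re.findall('(?=' + re.escape(sub_string) + ')', string)
    return len(match)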
def count_substring(string, sub_string):
    counter = 0
    for i in range(len(string)):
        if string[i:].startswith(sub_string):
            counter = counter + 1
    return counter
The code above loops through the string once and, at each index, checks whether the rest of the string starts with the substring being counted.
re.subn hasn't been mentioned yet:
>>> import re
>>> re.subn('(?=11)', '', '1011101111')[1]
5
def count_overlaps(string, look_for):
    start = 0
    matches = 0
    while True:
        start = string.find(look_for, start)
        if start < 0:
            break
        start += 1
        matches += 1
    return matches
print(count_overlaps('abrabra', 'abra'))
A function that takes two strings as input and counts how many times sub occurs in string, including overlaps. To check whether sub matches at each position, I used the in operator on a slice of the same length as sub.
def count_Occurrences(string, sub):
    count=0
    for i in range(0, len(string)-len(sub)+1):
        if sub in string[i:i+len(sub)]:
            count=count+1
    print('Number of times sub occurs in string (including overlaps): ', count)
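For a duplicate question, I decided to step through the string three characters at a time and compare each chunk, e.g.
counted = 0
for i in range(len(string)):
    if string[i*3:(i+1)*3] == 'xox':
        counted = counted +1
print(counted)
Note that this only checks chunks aligned to multiples of 3, so it misses matches that start at other offsets.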
An alternative very close to the accepted answer, but using the while condition as the test instead of an if inside the loop:
def countSubstr(string, sub):
    count = 0
    while sub in string:
        count += 1
        string = string[string.find(sub) + 1:]
    return count
This avoids while True: and is a little cleaner, in my opinion.
This is another example of using str.find() but a lot of the answers make it more complicated than necessary:
def occurrences(text, sub):
    c, n = 0, text.find(sub)
    while n != -1:
        c += 1
        n = text.find(sub, n+1)
    return c
In []:
occurrences('1011101111', '11')
Out[]:
5
Given
sequence = '1011101111'
sub = "11"
Code
In this particular case:
sum(x == tuple(sub) for x in zip(sequence, sequence[1:]))
# 5
More generally, this
windows = zip(*([sequence[i:] for i, _ in enumerate(sequence)][:len(sub)]))
sum(x == tuple(sub) for x in windows)
# 5
or extend to generators:
import itertools as it
iter_ = (sequence[i:] for i, _ in enumerate(sequence))
windows = zip(*(it.islice(iter_, None, len(sub))))
sum(x == tuple(sub) for x in windows)
Alternative
You can use more_itertools.locate:
import more_itertools as mit
len(list(mit.locate(sequence, pred=lambda *args: args == tuple(sub), window_size=len(sub))))
# 5
A simple way to count substring occurrence is to use count():
>>> s = 'bobob'
>>> s.count('bob')
1
You can use replace() to find overlapping strings if you know which part will overlap:
>>> s = 'bobob'
>>> s.replace('b', 'bb').count('bob')
2
Note that besides being static, there are other limitations:
>>> s = 'aaa'
>>> s.count('aa') # there should be two occurrences
1
>>> s.replace('a', 'aa').count('aa')
3
def occurance_of_pattern(text, pattern):
    text_len , pattern_len = len(text), len(pattern)
    return sum(1 for idx in range(text_len - pattern_len + 1) if text[idx: idx+pattern_len] == pattern)
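I wanted to check whether the number of repetitions of the first character at the start of a string equals the number of repetitions of that character at its end, e.g., this should pass for "foo" and ""foo"" but fail on ""bar":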
from collections import deque
from functools import partial
from itertools import count, takewhile
from operator import eq
# From https://stackoverflow.com/a/15112059
def count_iter_items(iterable):
    """
    Consume an iterable not reading it into memory; return the number of items.
    :param iterable: An iterable
    :type iterable: ```Iterable```
    :return: Number of items in iterable
    :rtype: ```int```
    """
    counter = count()
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)
def begin_matches_end(s):
    """
    Checks if the begin matches the end of the string
    :param s: Input string of length > 0
    :type s: ```str```
    :return: Whether the beginning matches the end (checks first match chars)
    :rtype: ```bool```
    """
    return (count_iter_items(takewhile(partial(eq, s[0]), s)) ==
            count_iter_items(takewhile(partial(eq, s[0]), s[::-1])))
Solution with replaced parts of the string
s = 'lolololol'
t = 0
t += s.count('lol')
s = s.replace('lol', 'lo1')
t += s.count('1ol')
print("Number of times lol occurs is:", t)
Answer is 4.
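If you want to count the occurrences of every substring of length 5 (adjust for different lengths):
from collections import defaultdict
def MerCount(s):
    d = defaultdict(int)  # maps each length-5 substring to its count
    for i in range(len(s)-4):
        d[s[i:i+5]] += 1
    return d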
