python pattern count Find a “hidden message” in the replication origin - python

The question ask to find a “hidden message” in the replication origin.
Input: A string Text (representing the replication origin of a genome).
Output: A hidden message in Text.
Translate to computational language,
Input: Strings Text and Pattern.
Output: Count(Text, Pattern).
For example,
Count(ACAACTATGCATACTATCGGGAACTATCCT, ACTAT) = 3.
In theory, we should account for overlapping occurrences of Pattern in Text right? So one way to do it is to screen down from first element to the length of text-length of the pattern we are looking for?
Here's the pseudo code i come up with,
def PatternCount(Text, Pattern):
count = 0
for i = 0 to len(Text)-len(Pattern):
if Text(i, len(Pattern)) = Pattern:
count = count + 1
return count
Any suggestion? I'm new to python. Thanks in advance.

This is what I came up with:
def pattern_count(text, pattern):
count = 0
for i in range(0, len(text) - len(pattern) + 1):
if text[i : len(pattern) + i] == pattern:
count += 1
return count
We're using string slicing (text[i : len(pattern) + i]) to check if the sub-string matches the pattern.
Input: text = "abc123!##654abcabc" and pattern = "abc"
Output: 3

import re
print len(re.findall("abc", "abc123!##654abcabc"))

I think a more "pythonic" solution would be to use list comprehensions.
def pattern_count(text, pattern):
return len([x for x in range(len(text) - len(pattern)+1) if pattern in text[x:len(pattern)+x]])

Related

How to find the position of every word in a sentence(string) where the same word is used many times in a sentence using functions in python [duplicate]

Python has string.find() and string.rfind() to get the index of a substring in a string.
I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).
For example:
string = "test test test test"
print string.find('test') # 0
print string.rfind('test') # 15
#this is the goal
print string.find_all('test') # [0,5,10,15]
For counting the occurrences, see Count number of occurrences of a substring in a string.
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
If you want to find overlapping matches, lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
re.finditer returns a generator, so you could change the [] in the above to () to get a generator instead of a list which will be more efficient if you're only iterating through the results once.
>>> help(str.find)
Help on method_descriptor:
find(...)
S.find(sub [,start [,end]]) -> int
Thus, we can build it ourselves:
def find_all(a_str, sub):
start = 0
while True:
start = a_str.find(sub, start)
if start == -1: return
yield start
start += len(sub) # use start += 1 to find overlapping matches
list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]
No temporary strings or regexes required.
Here's a (very inefficient) way to get all (i.e. even overlapping) matches:
>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]
Use re.finditer:
import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(word, sentence):
print (match.start(), match.end())
For word = "this" and sentence = "this is a sentence this this" this will yield the output:
(0, 4)
(19, 23)
(24, 28)
Again, old thread, but here's my solution using a generator and plain str.find.
def findall(p, s):
'''Yields all the positions of
the pattern p in the string s.'''
i = s.find(p)
while i != -1:
yield i
i = s.find(p, i+1)
Example
x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]
returns
[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]
You can use re.finditer() for non-overlapping matches.
>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print [(a.start(), a.end()) for a in list(re.finditer('is', aString))]
[(2, 4), (5, 7), (38, 40), (42, 44)]
but won't work for:
In [1]: aString="ababa"
In [2]: print [(a.start(), a.end()) for a in list(re.finditer('aba', aString))]
Output: [(0, 3)]
Come, let us recurse together.
def locations_of_substring(string, substring):
"""Return a list of locations of a substring."""
substring_length = len(substring)
def recurse(locations_found, start):
location = string.find(substring, start)
if location != -1:
return recurse(locations_found + [location], location+substring_length)
else:
return locations_found
return recurse([], 0)
print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]
No need for regular expressions this way.
If you're just looking for a single character, this would work:
string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7
Also,
string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4
My hunch is that neither of these (especially #2) is terribly performant.
this is an old thread but i got interested and wanted to share my solution.
def find_all(a_string, sub):
result = []
k = 0
while k < len(a_string):
k = a_string.find(sub, k)
if k == -1:
return result
else:
result.append(k)
k += 1 #change to k += len(sub) to not search overlapping results
return result
It should return a list of positions where the substring was found.
Please comment if you see an error or room for improvment.
This does the trick for me using re.finditer
import re
text = 'This is sample text to test if this pythonic '\
'program can serve as an indexing platform for '\
'finding words in a paragraph. It can give '\
'values as to where the word is located with the '\
'different examples as stated'
# find all occurances of the word 'as' in the above text
find_the_word = re.finditer('as', text)
for match in find_the_word:
print('start {}, end {}, search string \'{}\''.
format(match.start(), match.end(), match.group()))
This thread is a little old but this worked for me:
numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"
marker = 0
while marker < len(numberString):
try:
print(numberString.index("five",marker))
marker = numberString.index("five", marker) + 1
except ValueError:
print("String not found")
marker = len(numberString)
You can try :
>>> string = "test test test test"
>>> for index,value in enumerate(string):
if string[index:index+(len("test"))] == "test":
print index
0
5
10
15
You can try :
import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]
When looking for a large amount of key words in a document, use flashtext
from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)
Flashtext runs faster than regex on large list of search words.
This function does not look at all positions inside the string, it does not waste compute resources. My try:
def findAll(string,word):
all_positions=[]
next_pos=-1
while True:
next_pos=string.find(word,next_pos+1)
if(next_pos<0):
break
all_positions.append(next_pos)
return all_positions
to use it call it like this:
result=findAll('this word is a big word man how many words are there?','word')
src = input() # we will find substring in this string
sub = input() # substring
res = []
pos = src.find(sub)
while pos != -1:
res.append(pos)
pos = src.find(sub, pos + 1)
Whatever the solutions provided by others are completely based on the available method find() or any available methods.
What is the core basic algorithm to find all the occurrences of a
substring in a string?
def find_all(string,substring):
"""
Function: Returning all the index of substring in a string
Arguments: String and the search string
Return:Returning a list
"""
length = len(substring)
c=0
indexes = []
while c < len(string):
if string[c:c+length] == substring:
indexes.append(c)
c=c+1
return indexes
You can also inherit str class to new class and can use this function
below.
class newstr(str):
def find_all(string,substring):
"""
Function: Returning all the index of substring in a string
Arguments: String and the search string
Return:Returning a list
"""
length = len(substring)
c=0
indexes = []
while c < len(string):
if string[c:c+length] == substring:
indexes.append(c)
c=c+1
return indexes
Calling the method
newstr.find_all('Do you find this answer helpful? then upvote
this!','this')
This is solution of a similar question from hackerrank. I hope this could help you.
import re
a = input()
b = input()
if b not in a:
print((-1,-1))
else:
#create two list as
start_indc = [m.start() for m in re.finditer('(?=' + b + ')', a)]
for i in range(len(start_indc)):
print((start_indc[i], start_indc[i]+len(b)-1))
Output:
aaadaa
aa
(0, 1)
(1, 2)
(4, 5)
Here's a solution that I came up with, using assignment expression (new feature since Python 3.8):
string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]
Output:
[0, 5, 10, 15]
I think the most clean way of solution is without libraries and yields:
def find_all_occurrences(string, sub):
index_of_occurrences = []
current_index = 0
while True:
current_index = string.find(sub, current_index)
if current_index == -1:
return index_of_occurrences
else:
index_of_occurrences.append(current_index)
current_index += len(sub)
find_all_occurrences(string, substr)
Note: find() method returns -1 when it can't find anything
The pythonic way would be:
mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]
# s represents the search string
# c represents the character string
find_all(mystring,'o') # will return all positions of 'o'
[4, 7, 20, 26]
>>>
if you only want to use numpy here is a solution
import numpy as np
S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)
if you want to use without re(regex) then:
find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]
string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]
please look at below code
#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''
def get_substring_indices(text, s):
result = [i for i in range(len(text)) if text.startswith(s, i)]
return result
if __name__ == '__main__':
text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
s = 'wood'
print get_substring_indices(text, s)
def find_index(string, let):
enumerated = [place for place, letter in enumerate(string) if letter == let]
return enumerated
for example :
find_index("hey doode find d", "d")
returns:
[4, 7, 13, 15]
Not exactly what OP asked but you could also use the split function to get a list of where all the substrings don't occur. OP didn't specify the end goal of the code but if your goal is to remove the substrings anyways then this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case
# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']
# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'
Did a brief skim of other answers so apologies if this is already up there.
def count_substring(string, sub_string):
c=0
for i in range(0,len(string)-2):
if string[i:i+len(sub_string)] == sub_string:
c+=1
return c
if __name__ == '__main__':
string = input().strip()
sub_string = input().strip()
count = count_substring(string, sub_string)
print(count)
I runned in the same problem and did this:
hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []
while True:
o = hw.find('o')
if o != -1:
o_in_hw.append(o)
list_hw[o] = ' '
hw = ''.join(list_hw)
else:
print(o_in_hw)
break
Im pretty new at coding so you can probably simplify it (and if planned to used continuously of course make it a function).
All and all it works as intended for what i was doing.
Edit: Please consider this is for single characters only, and it will change your variable, so you have to create a copy of the string in a new variable to save it, i didnt put it in the code cause its easy and its only to show how i made it work.
By slicing we find all the combinations possible and append them in a list and find the number of times it occurs using count function
s=input()
n=len(s)
l=[]
f=input()
print(s[0])
for i in range(0,n):
for j in range(1,n+1):
l.append(s[i:j])
if f in l:
print(l.count(f))
To find all the occurence of a character in a give string and return as a dictionary
eg: hello
result :
{'h':1, 'e':1, 'l':2, 'o':1}
def count(string):
result = {}
if(string):
for i in string:
result[i] = string.count(i)
return result
return {}
or else you do like this
from collections import Counter
def count(string):
return Counter(string)

How to split word to ngrams in Python?

I've got this question. I should split word to ngrams (for example: word ADVENTURE has three 4grams - ADVE; ENTU; TURE). There is a book file document (that's the reason for counter and isalpha), which is I don't have here, so I'm using only a list of 2 words. This is my code in Python:
words = ['adven', 'adventure']
def ngrams(words, n):
counter = {}
for word in words:
if (len(word)-1) >= n:
for i in range(0, len(word)):
if word.isalpha() == True:
ngram = ""
for i in range(len(word)):
ngram += word[i:n:]
if len(ngram) == n:
ngram.join(counter)
counter[ngram] = counter.get(ngram, 0) + 1
return counter
print(trotl(words, 4))
This is what the code gives me:
{'adve': 14}
I don't care about the values in it but I'm not so good at strings and I don't know what I should do to gives me the three 4grams. I try to do "ngram += word[i::]" but that gives me None. Please help me, this is my school homework and I can't do more functions when this ngrams doesn't work.
use nltk.ngrams for this job:
from nltk import ngrams
I think the definition you have of n-grams is a little bit different from the conventional, as pointed out by #Stuart in his comment. However, with the definition from your comment, I think the following would solve your problem.
def n_grams(word, n):
# We can't find n-grams if the word has less than n letters.
if n > len(word):
return []
output = []
start_idx = 0
end_idx = start_idx + n
# Grab all n-grams except the last one
while end_idx < len(word):
n_gram = word[start_idx:end_idx]
output.append(n_gram)
start_idx = end_idx - 1
end_idx = start_idx + n
# Grab the last n-gram
last_n_gram_start = len(word) - n
last_n_gram_end = len(word)
output.append(word[last_n_gram_start:last_n_gram_end])
return output
If I've understood the rules correctly, you can do it like this
def special_ngrams(word, n):
""" Yield character ngrams of word that overlap by only one character,
except for the last two ngrams which may overlap by more than one
character. The first and last ngrams of the word are always included. """
for start in range(0, len(word) - n, n - 1):
yield word[start:start + n]
yield word[-n:]
for word in "hello there this is a test", "adventure", "tyrannosaurus", "advent":
print(list(special_ngrams(word, 4)))

How to count consecutive repetitions of a substring in a string?

I need to find consecutive (non-overlapping) repetitions of a substring in a string. I can count them but not consecutive. For instance:
string = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
substring = "AA"
here, "AA" is repeated one time at the beginning of the string, then 4 times, then 2 times, etc.
I should select the biggest one, in this example - 4 times.
How can I do that?
Regular expressions shine when searching through strings. Here you can find all groups of one or more AA with (?:AA)+ the (?: simply tells the engine to interpret the parentheses for grouping only.
Once you have the groups you can use max() to find the longest based on length (len()).
import re
s = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
groups = re.findall(r'(?:AA)+', s)
print(groups)
# ['AA', 'AAAAAAAA', 'AAAA', 'AA']
largest = max(groups, key=len)
print(len(largest) // 2)
# 4
one way to do it with basic operations is to search for the pattern "AA" in the string and add "AA" to the search until you don't find any more:
string = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
count = 0
pattern = "AA"
while pattern in string:
count += 1
pattern += "AA"
output:
print(count) # 4
It could also be written on a single line like this:
count = next(r-1 for r in range(1,len(string)+1) if "AA"*r not in string)
You could also use the find() method instead of the in operator which would allow the search to continue from the first match instead of starting over from the beginning of the string:
string = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
pattern = "AA"
repeated = ""
position = 0
while position >= 0:
repeated += pattern
position = string.find(repeated,position)
count = len(repeated)//len(pattern)-1
print(count) # 4
Try this. This simply iterates through the words and updates the max on each iteration.
string = "AASDASDDAAAAAAAAAERQREQREQRAAAAREWQRWERAAA" #normal case
#string_2 = "BAASDAAAAAAAAERQREQREQRAAAAREWQRWERAAA" #edge case
substring = "AA"
max_count=0
count = 0
for i in range(0,len(string),len(substring)):
print(string[i-len(substring):i])
if string[i-1:i] == substring[:1] and count == 0:
i = i+(len(substring)-1)
if string[i-len(substring):i] == substring:
count+= 1
else:
count = 0
max_count = max(max_count,count)
print(max_count)
outputs
4
Since you are new, try using a visualizer to get a better understanding of the code execution.
string = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
substring = "AA"
count = 0
string_length = len(string)
substring_length = len(substring)
for i in range( round( string_length / substring_length ) ):
if (i * substring) in string:
count = i
return count

How to remove multiple consecutive sequences of consecutive duplicate characters in python

I am trying to preprocess some tweets for an ML project where I am having troubles with two types of strings e.g.
str1 = "coooool" and str2 = "gooooaaaaaal".
After removing repeated characters, I would like to maintain the word in str1, i.e.
cleaned_str1 = "cool" while cleaned_str2 = "goal".
I tried a few approaches that I found but I couldn't get the right output. Could someone help me with this? Thank you in advance.
Use regular expressions:
re.sub(r"(\w)\1+(\w)\2+", r"\1\2", "goooaaaal") # -> goal
re.sub(r"(\w)\1+(\w)\2+", r"\1\2", "coooool") # -> cool
def removeDuplicates(S):
n = len(S)
j = 0
if (n < 2) :
return
for i in range(n):
if (S[j] != S[i]):
j += 1
S[j] = S[i]
j += 1
S = S[:j]
return S
This was taken directly from Geeks for Geeks.
There is no way for a program to intuitively know that "cool" needs two "o's" as in your example.

String count with overlapping occurrences [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 months ago.
The community reviewed whether to reopen this question 3 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
What's the best way to count the number of occurrences of a given string, including overlap in Python? This is one way:
def function(string, str_to_search_for):
count = 0
for x in xrange(len(string) - len(str_to_search_for) + 1):
if string[x:x+len(str_to_search_for)] == str_to_search_for:
count += 1
return count
function('1011101111','11')
This method returns 5.
Is there a better way in Python?
Well, this might be faster since it does the comparing in C:
def occurrences(string, sub):
count = start = 0
while True:
start = string.find(sub, start) + 1
if start > 0:
count+=1
else:
return count
>>> import re
>>> text = '1011101111'
>>> len(re.findall('(?=11)', text))
5
If you didn't want to load the whole list of matches into memory, which would never be a problem! you could do this if you really wanted:
>>> sum(1 for _ in re.finditer('(?=11)', text))
5
As a function (re.escape makes sure the substring doesn't interfere with the regex):
def occurrences(text, sub):
return len(re.findall('(?={0})'.format(re.escape(sub)), text))
>>> occurrences(text, '11')
5
You can also try using the new Python regex module, which supports overlapping matches.
import regex as re
def count_overlapping(text, search_for):
return len(re.findall(search_for, text, overlapped=True))
count_overlapping('1011101111','11') # 5
Python's str.count counts non-overlapping substrings:
In [3]: "ababa".count("aba")
Out[3]: 1
Here are a few ways to count overlapping sequences, I'm sure there are many more :)
Look-ahead regular expressions
How to find overlapping matches with a regexp?
In [10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']
Generate all substrings
In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2
def count_substring(string, sub_string):
count = 0
for pos in range(len(string)):
if string[pos:].startswith(sub_string):
count += 1
return count
This could be the easiest way.
A fairly pythonic way would be to use list comprehension here, although it probably wouldn't be the most efficient.
sequence = 'abaaadcaaaa'
substr = 'aa'
counts = sum([
sequence.startswith(substr, i) for i in range(len(sequence))
])
print(counts) # 5
The list would be [False, False, True, False, False, False, True, True, False, False] as it checks all indexes through the string, and because int(True) == 1, sum gives us the total number of matches.
s = "bobobob"
sub = "bob"
ln = len(sub)
print(sum(sub == s[i:i+ln] for i in xrange(len(s)-(ln-1))))
How to find a pattern in another string with overlapping
This function (another solution!) receive a pattern and a text. Returns a list with all the substring located in the and their positions.
def occurrences(pattern, text):
"""
input: search a pattern (regular expression) in a text
returns: a list of substrings and their positions
"""
p = re.compile('(?=({0}))'.format(pattern))
matches = re.finditer(p, text)
return [(match.group(1), match.start()) for match in matches]
print (occurrences('ana', 'banana'))
print (occurrences('.ana', 'Banana-fana fo-fana'))
[('ana', 1), ('ana', 3)]
[('Bana', 0), ('nana', 2), ('fana', 7), ('fana', 15)]
My answer, to the bob question on the course:
s = 'azcbobobegghaklbob'
total = 0
for i in range(len(s)-2):
if s[i:i+3] == 'bob':
total += 1
print 'number of times bob occurs is: ', total
Here is my edX MIT "find bob"* solution (*find number of "bob" occurences in a string named s), which basicaly counts overlapping occurrences of a given substing:
s = 'azcbobobegghakl'
count = 0
while 'bob' in s:
count += 1
s = s[(s.find('bob') + 2):]
print "Number of times bob occurs is: {}".format(count)
If strings are large, you want to use Rabin-Karp, in summary:
a rolling window of substring size, moving over a string
a hash with O(1) overhead for adding and removing (i.e. move by 1 char)
implemented in C or relying on pypy
That can be solved using regex.
import re
def function(string, sub_string):
match = re.findall('(?='+sub_string+')',string)
return len(match)
def count_substring(string, sub_string):
counter = 0
for i in range(len(string)):
if string[i:].startswith(sub_string):
counter = counter + 1
return counter
Above code simply loops throughout the string once and keeps checking if any string is starting with the particular substring that is being counted.
re.subn hasn't been mentioned yet:
>>> import re
>>> re.subn('(?=11)', '', '1011101111')[1]
5
def count_overlaps (string, look_for):
start = 0
matches = 0
while True:
start = string.find (look_for, start)
if start < 0:
break
start += 1
matches += 1
return matches
print count_overlaps ('abrabra', 'abra')
Function that takes as input two strings and counts how many times sub occurs in string, including overlaps. To check whether sub is a substring, I used the in operator.
def count_Occurrences(string, sub):
count=0
for i in range(0, len(string)-len(sub)+1):
if sub in string[i:i+len(sub)]:
count=count+1
print 'Number of times sub occurs in string (including overlaps): ', count
For a duplicated question i've decided to count it 3 by 3 and comparing the string e.g.
counted = 0
for i in range(len(string)):
if string[i*3:(i+1)*3] == 'xox':
counted = counted +1
print counted
An alternative very close to the accepted answer but using while as the if test instead of including if inside the loop:
def countSubstr(string, sub):
count = 0
while sub in string:
count += 1
string = string[string.find(sub) + 1:]
return count;
This avoids while True: and is a little cleaner in my opinion
This is another example of using str.find() but a lot of the answers make it more complicated than necessary:
def occurrences(text, sub):
c, n = 0, text.find(sub)
while n != -1:
c += 1
n = text.find(sub, n+1)
return c
In []:
occurrences('1011101111', '11')
Out[]:
5
Given
sequence = '1011101111'
sub = "11"
Code
In this particular case:
sum(x == tuple(sub) for x in zip(sequence, sequence[1:]))
# 5
More generally, this
windows = zip(*([sequence[i:] for i, _ in enumerate(sequence)][:len(sub)]))
sum(x == tuple(sub) for x in windows)
# 5
or extend to generators:
import itertools as it
iter_ = (sequence[i:] for i, _ in enumerate(sequence))
windows = zip(*(it.islice(iter_, None, len(sub))))
sum(x == tuple(sub) for x in windows)
Alternative
You can use more_itertools.locate:
import more_itertools as mit
len(list(mit.locate(sequence, pred=lambda *args: args == tuple(sub), window_size=len(sub))))
# 5
A simple way to count substring occurrence is to use count():
>>> s = 'bobob'
>>> s.count('bob')
1
You can use replace () to find overlapping strings if you know which part will be overlap:
>>> s = 'bobob'
>>> s.replace('b', 'bb').count('bob')
2
Note that besides being static, there are other limitations:
>>> s = 'aaa'
>>> count('aa') # there must be two occurrences
1
>>> s.replace('a', 'aa').count('aa')
3
def occurance_of_pattern(text, pattern):
text_len , pattern_len = len(text), len(pattern)
return sum(1 for idx in range(text_len - pattern_len + 1) if text[idx: idx+pattern_len] == pattern)
I wanted to see if the number of input of same prefix char is same postfix, e.g., "foo" and """foo"" but fail on """bar"":
from itertools import count, takewhile
from operator import eq
# From https://stackoverflow.com/a/15112059
def count_iter_items(iterable):
"""
Consume an iterable not reading it into memory; return the number of items.
:param iterable: An iterable
:type iterable: ```Iterable```
:return: Number of items in iterable
:rtype: ```int```
"""
counter = count()
deque(zip(iterable, counter), maxlen=0)
return next(counter)
def begin_matches_end(s):
"""
Checks if the begin matches the end of the string
:param s: Input string of length > 0
:type s: ```str```
:return: Whether the beginning matches the end (checks first match chars
:rtype: ```bool```
"""
return (count_iter_items(takewhile(partial(eq, s[0]), s)) ==
count_iter_items(takewhile(partial(eq, s[0]), s[::-1])))
Solution with replaced parts of the string
s = 'lolololol'
t = 0
t += s.count('lol')
s = s.replace('lol', 'lo1')
t += s.count('1ol')
print("Number of times lol occurs is:", t)
Answer is 4.
If you want to count permutation counts of length 5 (adjust if wanted for different lengths):
def MerCount(s):
for i in xrange(len(s)-4):
d[s[i:i+5]] += 1
return d

Categories

Resources