Count consecutive characters - python

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?
At first, I thought I could do something like:
word = '1000'
counter = 0
print range(len(word))
for i in range(len(word) - 1):
    while word[i] == word[i + 1]:
        counter += 1
        print counter * "0"
    else:
        counter = 1
        print counter * "1"
So that in this manner I could see the number of times each unique digit repeats. But this, of course, falls out of range when i reaches the last value.
In the example above, I would want Python to tell me that 1 repeats 1, and that 0 repeats 3 times. The code above fails, however, because of my while statement.
How could I do this with just built-in functions?

Consecutive counts:
You can use itertools.groupby:
s = "111000222334455555"
from itertools import groupby
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
After which, result looks like:
[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
And you could format with something like:
", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
Total counts:
Someone in the comments is concerned that you want a total count of numbers so "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:
from collections import Counter
s = "11100111"
result = Counter(s)
# {"1":6, "0":2}
Your method:
As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This leads to an off-by-one error when i is pointing at the last index of s, so i+1 raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more pythonic to generate something to iterate over.
For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:
counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a == b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
The only problem is that the final run is never appended, so you'll have to special-case the end of the string. That can be fixed with itertools.zip_longest:
import itertools
counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a == b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise.
def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

counts = []
count = 1
for a, b in pairwise(s):
    ...
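For reference, here is one way the elided loop body could be filled in, as a minimal sketch. The names pairwise_padded and run_lengths are my own, and it assumes the input never contains None itself (true for any string), since None is used as the padding value.

```python
import itertools

def pairwise_padded(iterable):
    """Yield (item, next_item) pairs; the last item is paired with None."""
    a, b = itertools.tee(iterable)
    next(b, None)  # advance the second iterator by one position
    return itertools.zip_longest(a, b, fillvalue=None)

def run_lengths(s):
    """Return a list of (character, run length) pairs for consecutive runs."""
    counts = []
    count = 1
    for a, b in pairwise_padded(s):
        if a == b:
            count += 1
        else:
            # a run ends here; the final pair (last, None) flushes the last run
            counts.append((a, count))
            count = 1
    return counts

print(run_lengths("1000"))
```

Because the padded final pair never compares equal, the last run is flushed without any special case after the loop.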

A solution "that way", with only basic statements:
word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
for i in range(1,len(word)):
if word[i-1]==word[i]:
count+=1
else :
length += word[i-1]+" repeats "+str(count)+", "
count=1
length += ("and "+word[i]+" repeats "+str(count))
else:
i=0
length += ("and "+word[i]+" repeats "+str(count))
print (length)
Output :
'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'1 repeats 1'

Totals (without sub-groupings)
#!/usr/bin/python3 -B
charseq = 'abbcccdddd'
distros = { c:1 for c in charseq }
for c in range(len(charseq)-1):
if charseq[c] == charseq[c+1]:
distros[charseq[c]] += 1
print(distros)
I'll provide a brief explanation for the interesting lines.
distros = { c:1 for c in charseq }
The line above is a dictionary comprehension: it iterates over the characters in charseq and creates one key/value pair per distinct character, where the key is the character and the value starts at 1 (every character that appears has been seen at least once).
Then comes the loop:
for c in range(len(charseq)-1):
We go from 0 to length - 1 to avoid going out of bounds with the c+1 indexing in the loop's body.
if charseq[c] == charseq[c+1]:
distros[charseq[c]] += 1
At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):
# replacing vars for their values
if charseq[1] == charseq[1+1]:
distros[charseq[1]] += 1
# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
distros['b'] += 1
You can see the program output below with the correct counts:
➜ /tmp ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}

You only need to change len(word) to len(word) - 1. That said, you could also use the fact that False's value is 0 and True's value is 1 with sum:
sum(word[i] == word[i+1] for i in range(len(word)-1))
This produces the sum of (False, True, True, False) where False is 0 and True is 1 - which is what you're after.
If you want this to be safe you need to guard empty words (index -1 access):
sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))
And this can be improved with zip:
sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
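To make concrete what this sum measures, here is a small sketch (the function name is my own). It counts the positions where a character equals its successor, so a run of length n contributes n - 1 to the total, and individual runs are not reported separately.

```python
def adjacent_equal_pairs(word):
    """Count positions i where word[i] == word[i + 1]."""
    # zip stops at the shorter argument, so no index can go out of range
    return sum(c1 == c2 for c1, c2 in zip(word, word[1:]))

print(adjacent_equal_pairs("1000"))    # the three 0s form two adjacent equal pairs
print(adjacent_equal_pairs("110011"))  # three runs of two, one pair each
```

If per-run counts are needed, a grouping approach is a better fit than a single sum.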

If we want to count consecutive characters without looping, we can make use of pandas:
In [1]: import pandas as pd
In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))
In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]
The key is to find the first elements that are different from their previous values and then make proper groupings in pandas:
In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))
In [7]: d.ne(d.shift())
Out[7]:
0 True
1 True
2 False
3 True
dtype: bool
In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0 1
1 2
2 2
3 3
dtype: int32
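The same labelling idea can be sketched without pandas, which may make the two steps easier to see. This is only an illustration using the standard library; run_ids is a name I made up, and it is not a replacement for the pandas groupby above.

```python
from itertools import accumulate

def run_ids(seq):
    """Label each element with the 1-based number of the run it belongs to."""
    items = list(seq)
    # True where an element starts a new run (mirrors d.ne(d.shift()))
    starts = [i == 0 or items[i] != items[i - 1] for i in range(len(items))]
    # running total of the flags (mirrors .cumsum())
    return list(accumulate(int(flag) for flag in starts))

print(run_ids("abba"))
```

Elements that share a label belong to the same consecutive run, which is what the pandas groupby key relies on.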

This is my simple code for finding the maximum number of consecutive 1's in a binary string in Python 3:
count = 0
maxcount = 0
for i in str(bin(13)):
    if i == '1':
        count += 1
    elif count > maxcount:
        maxcount = count
        count = 0
    else:
        count = 0
if count > maxcount: maxcount = count
maxcount

There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indices.
w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # digits
['1', '0', '2', '3', '4', '5']
print(cw) # counts
[3, 3, 3, 2, 2, 5]
w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # characters
print(cw) # counts
['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
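The same boundary-index idea can be wrapped in a function that pairs each character with its count. This is a sketch under my own naming (runs_by_boundaries), with an added guard for the empty string, which the snippets above would reject at w[-1].

```python
def runs_by_boundaries(w):
    """Return (character, run length) pairs computed from run start indices."""
    if not w:
        return []
    # index 0 plus every position where the character differs from its predecessor
    starts = [0] + [i + 1 for i in range(len(w) - 1) if w[i] != w[i + 1]]
    # each run ends where the next one starts; the last run ends at len(w)
    ends = starts[1:] + [len(w)]
    return [(w[s], e - s) for s, e in zip(starts, ends)]

print(runs_by_boundaries("111000222334455555"))
```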

A one liner that returns the amount of consecutive characters with no imports:
def f(x):s=x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}
That returns the amount of times any repeated character in a list is in a consecutive run of characters.
Alternatively, this accomplishes the same thing, albeit much more slowly:
def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x-1>0 else None)]];return {h: t.count(h) for h in set(t)}
In terms of performance, I ran them with
site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))
which resulted in:
12.528256356999918
5.351301653001428
This method can definitely be improved, but without using any external libraries, this was the best I could come up with.

In python
your_string = "wwwwweaaaawwbbbbn"
current = ''
count = 0
for index, loop in enumerate(your_string):
current = loop
count = count + 1
if index == len(your_string)-1:
print(f"{count}{current}", end ='')
break
if your_string[index+1] != current:
print(f"{count}{current}",end ='')
count = 0
continue
This will output
5w1e4a2w4b1n

# I wrote the code using simple loops and if statements
s='feeekksssh' #len(s) = 10
count=1 #f:1, e:3, k:2, s:3, h:1
l=[]
for i in range(1,len(s)): #range(1,10)
if s[i-1]==s[i]:
count = count+1
else:
l.append(count)
count=1
if i == len(s)-1: #To check the last character sequence we need loop reverse order
reverse_count=1
for i in range(-1,-(len(s)),-1): #Looping only for last character
if s[i] == s[i-1]:
reverse_count = reverse_count+1
else:
l.append(reverse_count)
break
print(l)

Today I had an interview and was asked the same question. I was struggling with the original solution in mind:
s = 'abbcccda'
old = ''
cnt = 0
res = ''
for c in s:
cnt += 1
if old != c:
res += f'{old}{cnt}'
old = c
cnt = 0 # default 0 or 1 neither work
print(res)
# 1a1b2c3d1
Sadly, this solution kept producing wrong results on edge cases (can anyone fix the code? Maybe I need to post another question), and in the end I ran out of time in the interview.
After the interview I calmed down and soon found a solution that I think is stable (though I like the groupby approach best).
s = 'abbcccda'
olds = []
for c in s:
if olds and c in olds[-1]:
olds[-1].append(c)
else:
olds.append([c])
print(olds)
res = ''.join([f'{lst[0]}{len(lst)}' for lst in olds])
print(res)
# [['a'], ['b', 'b'], ['c', 'c', 'c'], ['d'], ['a']]
# a1b2c3d1a1
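For what it's worth, the first attempt can probably be repaired with two small changes: emit the previous run before counting the new character, and flush the last run after the loop ends. The following is a sketch of that idea (encode_runs is my own name), not the only possible fix.

```python
def encode_runs(s):
    """Encode consecutive runs as character followed by count, e.g. 'abb' -> 'a1b2'."""
    res = ''
    old = ''
    cnt = 0
    for c in s:
        if c != old:
            if cnt:  # skip the empty "run" that exists before the first character
                res += f'{old}{cnt}'
            old = c
            cnt = 0
        cnt += 1  # count the current character after any reset
    if cnt:  # flush the final run, which the loop never emits
        res += f'{old}{cnt}'
    return res

print(encode_runs('abbcccda'))
```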

Here is my simple solution:
def count_chars(s):
size = len(s)
count = 1
op = ''
for i in range(1, size):
if s[i] == s[i-1]:
count += 1
else:
op += "{}{}".format(count, s[i-1])
count = 1
if size:
op += "{}{}".format(count, s[size-1])
return op

data_input = 'aabaaaabbaaaaax'
start = 0
end = 0
temp_dict = dict()
while start < len(data_input):
if data_input[start] == data_input[end]:
end = end + 1
if end == len(data_input):
value = data_input[start:end]
temp_dict[value] = len(value)
break
if data_input[start] != data_input[end]:
value = data_input[start:end]
temp_dict[value] = len(value)
start = end
print(temp_dict)

PROBLEM: we need to count consecutive characters and return characters with their count.
def countWithString(input_string:str)-> str:
count = 1
output = ''
for i in range(1,len(input_string)):
if input_string[i]==input_string[i-1]:
count +=1
else:
output += f"{count}{input_string[i-1]}"
count = 1
# Used to add last string count (at last else condition will not run and data will not be inserted to ouput string)
output += f"{count}{input_string[-1]}"
return output
countWithString(input)
input:'aaabbbaabbcc'
output:'3a3b2a2b2c'
Time Complexity: O(n)
Space Complexity: O(n) for the output string; O(1) auxiliary space beyond it

temp_str = "aaaajjbbbeeeeewwjjj"
def consecutive_charcounter(input_str):
counter = 0
temp_list = []
for i in range(len(input_str)):
if i==0:
counter+=1
elif input_str[i]== input_str[i-1]:
counter+=1
if i == len(input_str)-1:
temp_list.extend([input_str[i - 1], str(counter)])
else:
temp_list.extend([input_str[i-1],str(counter)])
counter = 1
print("".join(temp_list))
consecutive_charcounter(temp_str)

Related

How can I count the first and the last index of the max sequence?

monets = []
for i in range(20):
choices = ['Tails', 'Eagle']
monets.append(random.choice(choices))
cnt = 0
prev = 0
for i, e in enumerate(monets):
if e == 'Eagle':
cnt += 1
if e == 'Eagle' and i == len(monets) - 1 and cnt > prev:
prev = cnt
elif e != 'Eagle':
if prev < cnt:
prev = cnt
cnt = 0
print(monets)
print(prev)
My code calculates the longest sequence of 'Eagle' in a randomly generated list, but I'm stuck on how to calculate the first and last index of this sequence. I figured out that using enumerate may help me with it, but I got mixed up. Example: ['Tails', 'Eagle','Eagle','Tails','Eagle'] => output: 1,2
This should work. It is a simple algorithm, and you don't need any sophisticated libraries:
(revision 2)
m = 0
c = 0
p = -1
for [i,s] in enumerate(monets):
if s == 'Eagle':
c += 1
else:
c = 0
if c > m:
m = c
p = i
print('max Eagle:', m, 'from:', p + 1 - m, 'to:', p)
You could also use itertools.groupby to get groups of consecutive "Eagles". Combine that with enumerate, as in your approach, to pair them with the indices, and use max to find the longest sequence. Finally, get the indices from the first and last elements of that list.
>>> from itertools import groupby
>>> monets = ['Tails', 'Eagle','Eagle','Tails','Eagle']
>>> max((list(g) for k, g in groupby(enumerate(monets), key=lambda x: x[1]) if k == "Eagle"), key=len)
[(1, 'Eagle'), (2, 'Eagle')]
>>> _[0][0], _[-1][0]
(1, 2)
Just reading your code, it looks like you've got the following computation working (i.e. generally correct, but I didn't actually run and test for bugs):
['Tails', 'Eagle','Eagle','Tails','Eagle'] # monets list
[ 0, 1, 2, 0, 1] # 'Eagle' sequence lengths
There are a few different ways to do what you want, but continuing on your existing methodology, you can indeed use enumerate to generate the following:
[ (0, 0), (1, 1), (2, 2), (3, 0), (4, 1)] # seq lengths from before, enumerated
Where each pair represents: (index, length)
From that, find the pair with the largest length, and you'll have the end index of the sequence, in this case: (2, 2).
The first instance of length == 1, searching backwards from the end index, will give you the start index.
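The steps above can be sketched roughly as follows. The function name and the None return value for "no match" are my own choices, and the start index is computed as end - length + 1, which lands on the same position as searching backwards for the first length of 1.

```python
def longest_run_bounds(items, target):
    """Return (start, end) indices of the first longest run of target, or None."""
    lengths = []
    run = 0
    for item in items:
        # running length of the current target run, reset to 0 on anything else
        run = run + 1 if item == target else 0
        lengths.append(run)
    if not lengths or max(lengths) == 0:
        return None
    best = max(lengths)
    end = lengths.index(best)  # first position where the maximum is reached
    start = end - best + 1
    return start, end

monets = ['Tails', 'Eagle', 'Eagle', 'Tails', 'Eagle']
print(longest_run_bounds(monets, 'Eagle'))
```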
Sidenote: @tobias_k's answer is written in a more functional style (which I also personally prefer). It's a different methodology than you've started with, but I highly recommend learning it. Here is that method written more (IMO) readably:
import itertools as it
monets = ['Tails', 'Eagle','Eagle','Tails','Eagle']
grouped = it.groupby(enumerate(monets), key=lambda pair: pair[1])
eagle_seqs = [list(seq) for v, seq in grouped if v == 'Eagle']
longest_seq = max(eagle_seqs, key=len)
seq_idxs = [i for i, _ in longest_seq]
start_idx, end_idx = seq_idxs[0], seq_idxs[-1]
Here is a solution using NumPy and pandas:
import random
import numpy as np
import pandas as pd
monets = []
for i in range(20):
choices = ['Tails', 'Eagle']
monets.append(random.choice(choices))
Here the only additional thing to do is to encode the sequence as numeric values and find the longest contiguous run of indices:
encode_ = {'Tails': 0, 'Eagle': 1}
df = pd.DataFrame(monets).replace(encode_)
A = np.where(df == 1)[0]
result = max(np.split(A, np.where(np.diff(A) != 1)[0] + 1), key=len).tolist()
start_idx, end_idx = result[0],result[-1]
Using a down-to-earth approach (it returns the position of the first maximal sequence of consecutive terms):
lst = ['Tails', 'Eagle', 'Eagle','Tails', 'Eagle', 'Eagle','Eagle', 'Eagle', 'Tails', 'Eagle', 'Eagle','Tails']
index, counter = -1, 0
tmp_i, tmp_c = -1, 0
for i, v in enumerate(lst):
if v == 'Eagle':
# tmp-update
tmp_c += 1
if tmp_i == -1:
tmp_i = i
else:
if tmp_c > counter:
# global update
counter = tmp_c
index = tmp_i
# reset
tmp_i, tmp_c = -1, 0
# final check for occurrence of max sequence at the end of the list
if tmp_c > counter:
# global update
counter = tmp_c
index = tmp_i
boundaries_max_seq = (index, index + counter - 1)
print(boundaries_max_seq)
# (4, 7)

when a word in a sentence is input the program identifies all the positions where the word occurs [duplicate]

How do I find multiple occurrences of a string within a string in Python? Consider this:
>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>>
So the first occurrence of ll is at 1 as expected. How do I find the next occurrence of it?
Same question is valid for a list. Consider:
>>> x = ['ll', 'ok', 'll']
How do I find all the ll with their indexes?
Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:
>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
print('ll found', m.start(), m.end())
ll found 1 3
ll found 10 12
ll found 16 18
Alternatively, if you don't want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:
>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
index = text.find('ll', index)
if index == -1:
break
print('ll found at', index)
index += 2 # +2 because len('ll') == 2
ll found at 1
ll found at 10
ll found at 16
This also works for lists and other sequences.
I think what you are looking for is str.count:
>>> "Allowed Hello Hollow".count('ll')
3
Hope this helps
NOTE: this only counts non-overlapping occurrences
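To illustrate the difference, here is a small sketch of a counter that does include overlapping matches, by restarting the search one character after each hit. The function name count_overlapping is hypothetical; only str.find and str.count are standard.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # resume one character later so overlapping matches are still seen
        start = text.find(sub, start + 1)
    return count

print("aaaa".count("aa"))               # non-overlapping count
print(count_overlapping("aaaa", "aa"))  # overlapping count
```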
For the list example, use a comprehension:
>>> l = ['ll', 'xx', 'll']
>>> print [n for (n, e) in enumerate(l) if e == 'll']
[0, 2]
Similarly for strings:
>>> text = "Allowed Hello Hollow"
>>> print [n for n in xrange(len(text)) if text.find('ll', n) == n]
[1, 10, 16]
This will list adjacent runs of 'll', which may or may not be what you want:
>>> text = 'Alllowed Hello Holllow'
>>> print [n for n in xrange(len(text)) if text.find('ll', n) == n]
[1, 2, 11, 17, 18]
FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.
The first uses str.index and checks for ValueError:
def findall(sub, string):
"""
>>> text = "Allowed Hello Hollow"
>>> tuple(findall('ll', text))
(1, 10, 16)
"""
index = 0 - len(sub)
try:
while True:
index = string.index(sub, index + len(sub))
yield index
except ValueError:
pass
The second uses str.find and checks for the sentinel of -1 by using iter:
def findall_iter(sub, string):
"""
>>> text = "Allowed Hello Hollow"
>>> tuple(findall_iter('ll', text))
(1, 10, 16)
"""
def next_index(length):
index = 0 - length
while True:
index = string.find(sub, index + length)
yield index
return iter(next_index(len(sub)).next, -1)
To apply any of these functions to a list, tuple or other iterable of strings, you can use a higher-level function —one that takes a function as one of its arguments— like this one:
def findall_each(findall, sub, strings):
"""
>>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
>>> list(findall_each(findall, 'll', texts))
[(), (2, 10), (2,), (2,), ()]
>>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
>>> list(findall_each(findall_iter, 'll', texts))
[(4, 7), (1, 6), (2, 7), (2, 6)]
"""
return (tuple(findall(sub, string)) for string in strings)
For your list example:
In [1]: x = ['ll','ok','ll']
In [2]: for idx, value in enumerate(x):
...: if value == 'll':
...: print idx, value
0 ll
2 ll
If you wanted all the items in a list that contained 'll', you could also do that.
In [3]: x = ['Allowed','Hello','World','Hollow']
In [4]: for idx, value in enumerate(x):
...: if 'll' in value:
...: print idx, value
...:
...:
0 Allowed
1 Hello
3 Hollow
This code might not be the shortest/most efficient but it is simple and understandable
def findall(f, s):
l = []
i = -1
while True:
i = s.find(f, i+1)
if i == -1:
return l
l.append(s.find(f, i))
findall('test', 'test test test test')
# [0, 5, 10, 15]
For the first version, checking a string:
def findall(text, sub):
    """Return all indices at which substring occurs in text"""
    return [
        index
        for index in range(len(text) - len(sub) + 1)
        # compare in place; slicing text[index:] would copy the tail each time
        if text.startswith(sub, index)
    ]
print(findall('Allowed Hello Hollow', 'll'))
# [1, 10, 16]
No need to import re. This makes a single pass over the string (and stops before the end, once there aren't enough characters left to fit the substring), doing at most len(sub) character comparisons per position, so it runs in roughly linear time for short substrings. I also find it quite readable, personally.
Note that this will find overlapping occurrences:
print(findall('aaa', 'aa'))
# [0, 1]
>>> for n,c in enumerate(text):
... try:
... if c+text[n+1] == "ll": print n
... except: pass
...
1
10
16
This version should be linear in length of the string, and should be fine as long as the sequences aren't too repetitive (in which case you can replace the recursion with a while loop).
def find_all(st, substr, start_pos=0, accum=[]):
ix = st.find(substr, start_pos)
if ix == -1:
return accum
return find_all(st, substr, start_pos=ix + 1, accum=accum + [ix])
bstpierre's list comprehension is a good solution for short sequences, but looks to have quadratic complexity and never finished on a long text I was using.
findall_lc = lambda txt, substr: [n for n in xrange(len(txt))
if txt.find(substr, n) == n]
For a random string of non-trivial length, the two functions give the same result:
import random, string; random.seed(0)
s = ''.join([random.choice(string.ascii_lowercase) for _ in range(100000)])
>>> find_all(s, 'th') == findall_lc(s, 'th')
True
>>> findall_lc(s, 'th')[:4]
[564, 818, 1872, 2470]
But the quadratic version is about 300 times slower:
%timeit find_all(s, 'th')
1000 loops, best of 3: 282 µs per loop
%timeit findall_lc(s, 'th')
10 loops, best of 3: 92.3 ms per loop
Brand new to programming in general and working through an online tutorial. I was asked to do this as well, but only using the methods I had learned so far (basically strings and loops). Not sure if this adds any value here, and I know this isn't how you would do it, but I got it to work with this:
needle = input()
haystack = input()
counter = 0
n=-1
for i in range (n+1,len(haystack)+1):
for j in range(n+1,len(haystack)+1):
n=-1
if needle != haystack[i:j]:
n = n+1
continue
if needle == haystack[i:j]:
counter = counter + 1
print (counter)
The following function finds all the occurrences of a string inside another while informing the position where each occurrence is found.
You can call the function using the test cases in the table below. You can try with words, spaces and numbers all mixed up.
The function works well with overlapping characters.
theString                     aString
"661444444423666455678966"    "55"
"661444444423666455678966"    "44"
"6123666455678966"            "666"
"66123666455678966"           "66"
Calling examples:
1. print("Number of occurrences: ", find_all("123666455556785555966", "5555"))
output:
Found in position: 7
Found in position: 14
Number of occurrences: 2
2. print("Number of occurrences: ", find_all("Allowed Hello Hollow", "ll"))
output:
Found in position: 1
Found in position: 10
Found in position: 16
Number of occurrences: 3
3. print("Number of occurrences: ", find_all("Aaa bbbcd$###abWebbrbbbbrr 123", "bbb"))
output:
Found in position: 4
Found in position: 21
Number of occurrences: 2
def find_all(theString, aString):
count = 0
i = len(aString)
x = 0
while x < len(theString) - (i-1):
if theString[x:x+i] == aString:
print("Found in position: ", x)
x=x+i
count=count+1
else:
x=x+1
return count
#!/usr/local/bin python3
#-*- coding: utf-8 -*-
main_string = input()
sub_string = input()
count = counter = 0
for i in range(len(main_string)):
if main_string[i] == sub_string[0]:
k = i + 1
for j in range(1, len(sub_string)):
if k != len(main_string) and main_string[k] == sub_string[j]:
count += 1
k += 1
if count == (len(sub_string) - 1):
counter += 1
count = 0
print(counter)
This program counts the number of all substrings even if they are overlapped without the use of regex. But this is a naive implementation and for better results in worst case it is advised to go through either Suffix Tree, KMP and other string matching data structures and algorithms.
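Since KMP is mentioned, here is a compact sketch of a Knuth-Morris-Pratt search that returns the start index of every match, overlapping ones included. The function name and the choice to return an empty list for an empty pattern are my own assumptions, not part of the program above.

```python
def kmp_find_all(text, pattern):
    """Return start indices of all (possibly overlapping) matches of pattern in text."""
    if not pattern:
        return []
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going so overlapping matches are found
    return matches

print(kmp_find_all("Allowed Hello Hollow", "ll"))
```

The text is scanned once and never re-read, so the running time is proportional to len(text) + len(pattern), even on highly repetitive input.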
Here is my function for finding multiple occurrences. Unlike the other solutions here, it supports the optional start and end parameters for slicing, just like str.index:
def all_substring_indexes(string, substring, start=0, end=None):
result = []
new_start = start
while True:
try:
index = string.index(substring, new_start, end)
except ValueError:
return result
else:
result.append(index)
new_start = index + len(substring)
A simple iterative code which returns a list of indices where the substring occurs.
def allindices(string, sub):
l=[]
i = string.find(sub)
while i >= 0:
l.append(i)
i = string.find(sub, i + 1)
return l
You can split to get relative positions, then sum consecutive numbers in the list and add (len(key) * occurrence order) at the same time to get the wanted string indexes.
>>> key = 'll'
>>> text = "Allowed Hello Hollow"
>>> x = [len(i) for i in text.split(key)[:-1]]
>>> [sum(x[:i+1]) + i*len(key) for i in range(len(x))]
[1, 10, 16]
>>>
Maybe not so Pythonic, but somewhat more self-explanatory. It returns the position of the word looked in the original string.
def retrieve_occurences(sequence, word, result, base_counter):
indx = sequence.find(word)
if indx == -1:
return result
result.append(indx + base_counter)
base_counter += indx + len(word)
return retrieve_occurences(sequence[indx + len(word):], word, result, base_counter)
I think there's no need to test for length of text; just keep finding until there's nothing left to find. Like this:
>>> text = 'Allowed Hello Hollow'
>>> place = 0
>>> while text.find('ll', place) != -1:
print('ll found at', text.find('ll', place))
place = text.find('ll', place) + 2
ll found at 1
ll found at 10
ll found at 16
You can also do it with conditional list comprehension like this:
string1= "Allowed Hello Hollow"
string2= "ll"
print [num for num in xrange(len(string1)-len(string2)+1) if string1[num:num+len(string2)]==string2]
# [1, 10, 16]
I had randomly gotten this idea just a while ago. Using a While loop with string splicing and string search can work, even for overlapping strings.
findin = "algorithm alma mater alison alternation alpines"
search = "al"
inx = 0
num_str = 0
while True:
inx = findin.find(search)
if inx == -1: #breaks before adding 1 to number of string
break
inx = inx + 1
findin = findin[inx:] #to splice the 'unsearched' part of the string
num_str = num_str + 1 #counts no. of string
if num_str != 0:
print("There are ",num_str," ",search," in your string.")
else:
print("There are no ",search," in your string.")
I'm an amateur in Python Programming (Programming of any language, actually), and am not sure what other issues it could have, but I guess it's working fine?
I guess lower() could be used somewhere in it too if needed.

Finding the length of longest repeating?

I have tried plenty of different methods to achieve this, and I don't know what I'm doing wrong.
reps=[]
len_charac=0
def longest_charac(strng)
for i in range(len(strng)):
if strng[i] == strng[i+1]:
if strng[i] in reps:
reps.append(strng[i])
len_charac=len(reps)
return len_charac
Remember in Python counting loops and indexing strings aren't usually needed. There is also a builtin max function:
def longest(s):
    maximum = count = 0
    current = ''
    for c in s:
        if c == current:
            count += 1
        else:
            count = 1
            current = c
        maximum = max(count,maximum)
    return maximum
Output:
>>> longest('')
0
>>> longest('aab')
2
>>> longest('a')
1
>>> longest('abb')
2
>>> longest('aabccdddeffh')
3
>>> longest('aaabcaaddddefgh')
4
Simple solution:
def longest_substring(strng):
len_substring=0
longest=0
for i in range(len(strng)):
if i > 0:
if strng[i] != strng[i-1]:
len_substring = 0
len_substring += 1
if len_substring > longest:
longest = len_substring
return longest
Iterates through the characters in the string and checks against the previous one. If they are different then the count of repeating characters is reset to zero, then the count is incremented. If the current count beats the current record (stored in longest) then it becomes the new longest.
Compare two things and there is one relation between them:
'a' == 'a'
True
Compare three things, and there are two relations:
'a' == 'a' == 'b'
True False
Combine these ideas - repeatedly compare things with the things next to them, and the chain gets shorter each time:
'a' == 'a' == 'b'
True == False
False
It takes one reduction for the 'b' comparison to be False, because there was one 'b'; two reductions for the 'a' comparison to be False because there were two 'a'. Keep repeating until the relations are all False, and that is how many consecutive equal characters there were.
def f(s):
repetitions = 0
while any(s):
repetitions += 1
s = [ s[i] and s[i] == s[i+1] for i in range(len(s)-1) ]
return repetitions
>>> f('aaabcaaddddefgh')
4
NB. matching characters at the start become True, only care about comparing the Trues with anything, and stop when all the Trues are gone and the list is all Falses.
It can also be squished into a recursive version, passing the depth in as an optional parameter:
def f(s, depth=1):
s = [ s[i] and s[i]==s[i+1] for i in range(len(s)-1) ]
return f(s, depth+1) if any(s) else depth
>>> f('aaabcaaddddefgh')
4
I stumbled on this while trying for something else, but it's quite pleasing.
You can use itertools.groupby to solve this pretty quickly, it will group characters together, and then you can sort the resulting list by length and get the last entry in the list as follows:
from itertools import groupby
print(sorted([list(g) for k, g in groupby('aaabcaaddddefgh')],key=len)[-1])
This should give you:
['d', 'd', 'd', 'd']
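If only the length is needed, sorting can be avoided. Here is a sketch that takes the maximum over the group sizes directly (longest_run_length is a name I chose); the default=0 argument covers the empty string, where the sorted(...)[-1] version would raise an IndexError.

```python
from itertools import groupby

def longest_run_length(s):
    """Return the length of the longest run of equal consecutive items in s."""
    # sum(1 for _ in group) counts a group without building a list
    return max((sum(1 for _ in group) for _, group in groupby(s)), default=0)

print(longest_run_length('aaabcaaddddefgh'))
```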
This works:
def longestRun(s):
if len(s) == 0: return 0
runs = ''.join('*' if x == y else ' ' for x,y in zip(s,s[1:]))
starStrings = runs.split()
if len(starStrings) == 0: return 1
return 1 + max(len(stars) for stars in starStrings)
Output:
>>> longestRun("aaabcaaddddefgh")
4
First off, Python is not my primary language, but I can still try to help.
1) It looks like you are exceeding the bounds of the array. On the last iteration, you check the last character against the character beyond the last character. In Python this raises an IndexError.
2) you start off with an empty reps[] array and compare every character to see if it's in it. Clearly, that check will fail every time and your append is within that if statement.
def longest_charac(string):
longest = 0
if string:
flag = string[0]
tmp_len = 0
for item in string:
if item == flag:
tmp_len += 1
else:
flag = item
tmp_len = 1
if tmp_len > longest:
longest = tmp_len
return longest
This is my solution. Maybe it will help you.
Just for context, here is a recursive approach that avoids dealing with loops:
def max_rep(prev, text, reps, rep=1):
"""Recursively consume all characters in text and find longest repetition.
Args
prev: string of previous character
text: string of remaining text
reps: list of ints of all repetitions observed
rep: int of current repetition observed
"""
if text == '': return max(reps)
if prev == text[0]:
rep += 1
else:
rep = 1
return max_rep(text[0], text[1:], reps + [rep], rep)
Tests:
>>> max_rep('', 'aaabcaaddddefgh', [])
4
>>> max_rep('', 'aaaaaabcaadddddefggghhhhhhh', [])
7

Python: list's method similar to dict.get()

My problem is to find the consecutive '3's in a list, for example list('133233313333'). What makes it difficult is that only exactly two adjacent '3's are valid; three or more adjacent '3's are not. So '33' is valid, but '333' and '3333' are not. I tried the following at first:
try:
if l[i] == '3' and l[i+1] == '3' and l[i+2] != '3' and l[i-1] != '3':
record_current(i)
except IndexError:
pass
My intention is to ignore the comparison and let it be true if there is an IndexError, but it doesn't work.
If list had a method like dict.get(), which returns None if the key is missing instead of raising a KeyError, I could write it as (l[i+2] == None or l[i+2] != '3').
If I had to finish it now, I would treat the first item and the last two items separately from the other items. But is there some way to solve this problem elegantly?
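Lists have no built-in equivalent of dict.get(), but a small helper can play that role. The sketch below uses hypothetical names (get, isolated_pairs) and treats negative indices as out of range. That matters for the try/except attempt: l[i-1] with i == 0 silently reads the last element instead of raising an IndexError.

```python
def get(seq, index, default=None):
    """Return seq[index] if 0 <= index < len(seq), otherwise default."""
    if 0 <= index < len(seq):
        return seq[index]
    return default

def isolated_pairs(seq, target='3'):
    """Return start indices of runs of exactly two consecutive target items."""
    return [
        i for i in range(len(seq) - 1)
        if seq[i] == target and seq[i + 1] == target
        # neighbours outside the list come back as None, which never equals target
        and get(seq, i - 1) != target and get(seq, i + 2) != target
    ]

print(isolated_pairs(list('1332333133334433')))
```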
You can do this using itertools.groupby:
>>> from operator import itemgetter
>>> from itertools import groupby
>>> s = list('1332333133334433')
>>> for k, g in groupby(enumerate(s), itemgetter(1)):
if k == '3':
ind = next(g)[0]
if sum(1 for _ in g) == 1:
print ind
...
1
14
Count the consecutive 3s !
Keep a counter which is incremented every time you meet a '3' and reset on a non-'3'; compare to 2 before a reset:
j= 0
for i in range(len(L)):
if L[i] == '3':
j+= 1
else:
if j == 2:
print "Found at", i - j
j= 0
if j == 2:
print "Found at", i - j + 1 # Late fix (+ 1)
Alternatively, one may find successive runs of '3's and non-'3's. This way, one avoids testing j == 2 on every non-'3' element, at the expense of one extra loop test for every sequence of 3's:
i= 0
while i < len(L):
# Find the next '3'
while i < len(L) and L[i] != '3':
i+= 1
j= i
# Find the next non-'3'
while i < len(L) and L[i] == '3':
i+= 1
if i - j == 2:
print "Found at", j
You are trying to check for a certain grammar. For this, you can implement a Deterministic Finite Automaton (or DFA).
Here is a solution that uses regular expressions:
import re
m = re.finditer('(?<!3)3{2}(?!3)', '1332333133334433')
for x in m:
    print(x.span()[0])
The regular expression finds all matches for two successive threes, as long as they are not followed by or preceded by a 3. The output is:
1
14
You can substitute any character for the '3' in the regular expression, to search for that letter instead.
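For comparison, the DFA idea mentioned above could be written out by hand roughly as follows. This is a minimal sketch, not a full grammar checker: the function name find_isolated_pairs and the choice of states (the number of consecutive targets seen so far, capped at 3 for "too many") are assumptions made for illustration.

```python
def find_isolated_pairs(seq, target='3'):
    """Return start indices of runs of exactly two `target` items."""
    # States: 0 = no target pending, 1 = one seen, 2 = two seen,
    # 3 = three or more seen (the run is already invalid).
    state = 0
    found = []
    for i, item in enumerate(seq):
        if item == target:
            state = min(state + 1, 3)
        else:
            if state == 2:
                # The run that just ended occupied indices i-2 and i-1.
                found.append(i - 2)
            state = 0
    if state == 2:
        # The sequence ended while in the accepting state.
        found.append(len(seq) - 2)
    return found


print(find_isolated_pairs('1332333133334433'))  # [1, 14]
```

It makes a single pass, needs no lookahead, and works on lists as well as strings.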
data = "1332333133334433"
from itertools import groupby
from operator import itemgetter
result = []
for char, grp in groupby(enumerate(data), itemgetter(1)):
    groups = list(grp)
    if char == "3" and len(groups) == 2:
        result.append(groups[0][0])
print(result)
Output
[1, 14]
This returns True if the list contains '333' immediately followed by a non-'3' character:
>>> l = "1332333133334433"
>>> any([(i[:3]=='333' and i[3] != '3') for i in map("".join,zip(l[:],l[1:],l[2:],l[3:]))])
True
You can see the four-character windows that are checked:
>>> list(map("".join,zip(l[:],l[1:],l[2:],l[3:])))
['1332', '3323', '3233', '2333', '3331', '3313', '3133', '1333', '3333', '3334', '3344', '3443', '4433']
Here's a general solution for finding two consecutive letters that are the same:
def find_two_consecutive(my_str):
    prev_letter = None
    count = 1
    for index, current_letter in enumerate(my_str):
        if current_letter == prev_letter:
            count += 1
        else:
            if count == 2:
                print("Starting at index: %d" % (index - 2))
            count = 1
        prev_letter = current_letter
    # A pair that ends the string starts one before the last index
    if count == 2:
        print("Starting at index: %d" % (index - 1))
If your list really only contains one-letter elements you should use the re module:
import re
chars = list('133233313333433')
numberstr = ''.join(chars)
for match in re.finditer('(?<!3)33(?!3)', numberstr):
    print(match.start())
Result:
1
13
The pattern (?<!3)33(?!3) means: find two consecutive 3s that are neither preceded nor followed by a 3.
The documentation for the re module describes the lookbehind (?<!...) and lookahead (?!...) syntax.
Oh, and this:
chars = list('133233313333433')
numberstr = ''.join(chars)
should probably be just:
numberstr = '133233313333433'
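The same pattern can be built for any single character. The helper below is a hedged sketch (the name isolated_pair_starts is invented for this example); it uses re.escape so that characters such as '.' are treated literally, and it assumes char is exactly one character, since lookbehind needs a fixed-width pattern.

```python
import re


def isolated_pair_starts(text, char):
    """Return start indices where `char` occurs exactly twice in a row."""
    c = re.escape(char)  # treat regex metacharacters literally
    # Same shape as (?<!3)3{2}(?!3), with the character substituted in.
    pattern = "(?<!{0}){0}{{2}}(?!{0})".format(c)
    return [m.start() for m in re.finditer(pattern, text)]


print(isolated_pair_starts('133233313333433', '3'))  # [1, 13]
```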

Counting longest occurrence of repeated sequence in Python

What's the easiest way to count the longest consecutive repeat of a certain character in a string? For example, the longest consecutive repeat of "b" in the following string:
my_str = "abcdefgfaabbbffbbbbbbfgbb"
would be 6, since other consecutive repeats are shorter (3 and 2, respectively.) How can I do this in Python?
How about a regex example:
import re
my_str = "abcdefgfaabbbffbbbbbbfgbb"
len(max(re.compile("(b+b)*").findall(my_str))) #changed the regex from (b+b) to (b+b)*
# max([len(i) for i in re.compile("(b+b)").findall(my_str)]) also works
Edit: timing mine vs. interjay's:
import timeit
x=timeit.Timer(stmt='import itertools;my_str = "abcdefgfaabbbffbbbbbbfgbb";max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=="b")')
x.timeit()
22.759046077728271
x=timeit.Timer(stmt='import re;my_str = "abcdefgfaabbbffbbbbbbfgbb";len(max(re.compile("(b+b)").findall(my_str)))')
x.timeit()
8.4770550727844238
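Two caveats about the regex above: (b+b) only matches runs of two or more, and max() on the matches compares strings rather than lengths (the two agree here only because every match consists of the same letter). A variant that avoids both points might look like this; it is a sketch, the helper name longest_run is made up, and the default argument of max() needs Python 3.4 or later.

```python
import re


def longest_run(text, char):
    """Length of the longest consecutive run of `char` in `text` (0 if none)."""
    # One or more repetitions, so single occurrences count as runs of 1.
    runs = re.findall(re.escape(char) + "+", text)
    return max((len(run) for run in runs), default=0)


my_str = "abcdefgfaabbbffbbbbbbfgbb"
print(longest_run(my_str, "b"))  # 6
```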
Here is a one-liner:
import itertools
max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=='b')
Explanation:
itertools.groupby will return groups of consecutive identical characters, along with an iterator for all items in that group. For each such iterator, len(list(y)) will give the number of items in the group. Taking the maximum of that (for the given character) will give the required result.
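To make the explanation concrete, here is the same idea unrolled into steps so the intermediate values are visible. The variable names are just for illustration, and max(..., default=0) assumes Python 3.4 or later.

```python
from itertools import groupby

my_str = "abcdefgfaabbbffbbbbbbfgbb"

# One (character, run length) pair per group of consecutive identical characters.
runs = [(char, sum(1 for _ in group)) for char, group in groupby(my_str)]

# Keep only the runs of "b" and take the longest.
b_runs = [length for char, length in runs if char == "b"]
longest_b = max(b_runs, default=0)

print(b_runs)     # [1, 3, 6, 2]
print(longest_b)  # 6
```

Using sum(1 for _ in group) instead of len(list(group)) avoids building a temporary list for each run.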
Here's my really boring, inefficient, straightforward counting method (interjay's is much better). Note, I wrote this in this little text field, which doesn't have an interpreter, so I haven't tested it, and I may have made a really dumb mistake that a proof-read didn't catch.
my_str = "abcdefgfaabbbffbbbbbbfgbb"
last_char = ""
current_seq_len = 0
max_seq_len = 0
for c in my_str:
    if c == last_char:
        current_seq_len += 1
    else:
        current_seq_len = 1
        last_char = c
    # Track the longest run seen so far (of any character, runs of 1 included)
    if current_seq_len > max_seq_len:
        max_seq_len = current_seq_len
print(max_seq_len)
Using run-length encoding:
import numpy as NP
signal = NP.array([4,5,6,7,3,4,3,5,5,5,5,3,4,2,8,9,0,1,2,8,8,8,0,9,1,3])
px, = NP.where(NP.ediff1d(signal) != 0)
px = NP.r_[(0, px+1, [len(signal)])]
# collect the run-lengths for each unique item in the signal
rx = [ (m, n, signal[m]) for (m, n) in zip(px[:-1], px[1:]) if (n - m) > 1 ]
# get longest:
rx2 = [ (b-a, c) for (a, b, c) in rx ]
rx2.sort(reverse=True)
# returns: [(4, 5), (3, 8)], ie, '5' occurs 4 times consecutively, '8' occurs 3 times consecutively
Here is my code. It is not that efficient, but it seems to work:
def LongCons(mystring):
    dictionary = {}  # longest run seen so far for each character
    CurrentCount = 0
    latestchar = ''
    for i in mystring:
        if i == latestchar:
            CurrentCount += 1
        else:
            CurrentCount = 1
            latestchar = i
        # Keep the maximum, so a later, shorter run does not overwrite it
        if CurrentCount > dictionary.get(i, 0):
            dictionary[i] = CurrentCount
    k = max(dictionary, key=dictionary.get)
    print(k, dictionary[k])
    return
