Longest Common Prefix from list elements in Python - python

I have a list as below:
strs = ["flowers", "flow", "flight"]
Now I want to find the longest common prefix of the elements in the list. If there is no common prefix, it should return "". I am trying to use a divide-and-conquer approach to solve the problem. Below is my code:
strs = ["flowers", "flow", "flight"]
firstHalf = ""
secondHalf = ""
def longestCommonPrefix(strs) -> str:
    minValue = min(len(i) for i in strs)
    length = len(strs)
    middle_index = length // 2
    firstHalf = strs[:middle_index]
    secondHalf = strs[middle_index:]
    minSecondHalfValue = min(len(i) for i in secondHalf)
    matchingString=[] #Creating a stack to append the matching characters
    for i in range(minSecondHalfValue):
        secondHalf[0][i] == secondHalf[1][i]
    return secondHalf
print(longestCommonPrefix(strs))
I was able to find the middle and divide the list into two parts. Now I am trying to use the second half to get its longest common prefix, but I am unable to do so. I created a stack to which I would add the consecutive matching characters and then compare it with firstHalf, but how can I collect the consecutive matching characters from the start?
Expected output:
"fl"
Just a suggestion would also help. I can give it a try.

No matter what, you need to look at each character from each string in turn (until you find a set of corresponding characters that doesn't match), so there's no benefit to splitting the list up. Just iterate through and break when the common prefix stops being common:
def common_prefix(strs) -> str:
    prefix = ""
    for chars in zip(*strs):
        if len(set(chars)) > 1:
            break
        prefix += chars[0]
    return prefix
print(common_prefix(["flowers", "flow", "flight"])) # fl
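For comparison, the standard library already ships a character-wise common-prefix helper, os.path.commonprefix. Despite living in os.path it compares plain strings character by character, so it can serve as a quick cross-check for the loop above. A minimal sketch (the variable names are only for illustration):

```python
import os.path

words = ["flowers", "flow", "flight"]

# commonprefix works character by character on any strings, not only on paths.
prefix = os.path.commonprefix(words)
print(prefix)  # fl

# An empty list yields an empty string.
print(repr(os.path.commonprefix([])))  # ''
```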

Even if this problem has already found its solution, I would like to post my approach (I considered the problem interesting, so started playing around with it).
So, a divide-and-conquer solution splits a big task into many smaller subtasks, whose solutions are combined by other small tasks, and so on, until you reach the final solution. The typical example is a sum of numbers (let's take 1 to 8), which can be done sequentially (1 + 2 = 3, then 3 + 3 = 6, then 6 + 4 = 10... until the end) or by splitting the problem (1 + 2 = 3, 3 + 4 = 7, 5 + 6 = 11, 7 + 8 = 15, then 3 + 7 = 10 and 11 + 15 = 26...). The second approach has the clear advantage that it can be parallelized, which can improve run time considerably in the right setup; this is why it generally goes hand in hand with topics like multithreading.
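To make the sum example concrete, here is a small sketch of the pairwise reduction, run sequentially. The function name pairwise_sum is made up for this illustration; each level adds neighbouring pairs, and the pairs within one level do not depend on each other, which is the part that could be parallelized.

```python
def pairwise_sum(numbers):
    """Sum a list by repeatedly adding neighbouring pairs (a reduction tree)."""
    level = list(numbers)
    while len(level) > 1:
        # Every pair in this level is independent of the others.
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        print(level)
    return level[0] if level else 0


total = pairwise_sum(range(1, 9))
print(total)  # 36
```

For 1 to 8 the intermediate levels are [3, 7, 11, 15], then [10, 26], then [36].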
So my approach:
import math
def run(lst):
    if len(lst) > 1:
        lst_split = [lst[2 * (i-1) : min(len(lst) + 1, 2 * i)] for i in range(1, math.ceil(len(lst)/2.0) + 1)]
        lst = [Processor().process(*x) for x in lst_split]
        if any([len(x) == 0 for x in lst]):
            return ''
        return run(lst)
    else:
        return lst[0]
class Processor:
    def process(self, w1, w2 = None):
        if w2 != None:
            zipped = list(zip(w1, w2))
            for i, (x, y) in enumerate(zipped):
                if x != y:
                    return w1[:i]
                if i + 1 == len(zipped):
                    return w1[:i+1]
        else:
            return w1
        return ''
lst = ["flowers", "flow", "flight", "flask", "flock"]
print(run(lst))
OUTPUT
fl
If you look at the run method, the passed lst gets split in couples, which then get processed (this is where you could start multiple threads, but let's not focus on that). The resulting list gets reprocessed until the end.
An interesting aspect of this problem is: if, after a pass, you get one empty match (two words with no common start), you can stop the reduction, given that you know the solution already! Hence the introduction of
        if any([len(x) == 0 for x in lst]):
            return ''
I don't think the functools.reduce offers the possibility of stopping the iteration in case a specific condition is met.
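As far as I know, reduce indeed always consumes the whole iterable, so an early exit needs an explicit loop. A minimal sketch of the same pairwise fold with a break; the helper names common_prefix_pair and common_prefix_early_exit are hypothetical and not part of the code above:

```python
def common_prefix_pair(a, b):
    """Longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def common_prefix_early_exit(words):
    """Fold the words pairwise, stopping as soon as the prefix is empty."""
    if not words:
        return ""
    prefix = words[0]
    for word in words[1:]:
        prefix = common_prefix_pair(prefix, word)
        if not prefix:
            break  # no later word can make the prefix non-empty again
    return prefix


print(common_prefix_early_exit(["flowers", "flow", "flight", "flask", "flock"]))  # fl
```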
Out of curiosity: another solution could take advantage of regex:
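import re
from functools import reduce
# ^ anchors the match at the start of the first word, so only a shared prefix is captured.
# Assumes the words consist of \w characters only.
pattern = re.compile(r"^(\w+)\w* \1\w*")
def find(x, y):
    v = pattern.findall(f'{x} {y}')
    return v[0] if len(v) else ''
reduce(find, lst)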
OUTPUT
'fl'

Sort of "divide and conquer" :
solve for 2 strings
solve for the other strings
def common_prefix2_(s1: str, s2: str)-> str:
    if not s1 or not s2: return ""
    for i, z in enumerate(zip(s1,s2)):
        if z[0] != z[1]:
            break
    else:
        i += 1
    return s1[:i]
from functools import reduce
def common_prefix(l:list):
    return reduce(common_prefix2_, l[1:], l[0]) if len(l) else ''
Tests
for l in [["flowers", "flow", "flight"],
          ["flowers", "flow", ""],
          ["flowers", "flow"],
          ["flowers", "xxx"],
          ["flowers" ],
          []]:
    print(f"{l if l else '[]'}: '{common_prefix(l)}'")
# output
['flowers', 'flow', 'flight']: 'fl'
['flowers', 'flow', '']: ''
['flowers', 'flow']: 'flow'
['flowers', 'xxx']: ''
['flowers']: 'flowers'
[]: ''

Related

Find Longest Alphabetically Ordered Substring - Efficiently

The goal of a piece of code I wrote is to find the longest alphabetically ordered substring within a string.
"""
Find longest alphabetically ordered substring in string s.
"""
s = 'zabcabcd' # Test string.
alphabetical_str, temp_str = s[0], s[0]
for i in range(len(s) - 1): # Loop through string.
    if s[i] <= s[i + 1]: # Check if next character is alphabetically next.
        temp_str += s[i + 1] # Add character to temporary string.
        if len(temp_str) > len(alphabetical_str): # Check is temporary string is the longest string.
            alphabetical_str = temp_str # Assign longest string.
    else:
        temp_str = s[i + 1] # Assign last checked character to temporary string.
print(alphabetical_str)
I get an output of abcd.
But the instructor says there is a PEP 8 compliant way of writing this code that is 7-8 lines long, and a more computationally efficient way that is ~16 lines. Apparently there is also a way of writing this code in only 1 line of 75 characters!
Can anyone provide some insight on what the code would look like if it was 7-8 lines or what the most work appropriate way of writing this code would be? Also any PEP 8 compliance critique would be appreciated.
Linear time:
s = 'zabcabcd'
longest = current = []
for c in s:
    if [c] < current[-1:]:
        current = []
    current += c
    longest = max(longest, current, key=len)
print(''.join(longest))
Your PEP 8 issues I see:
"Limit all lines to a maximum of 79 characters." (link) - You have two lines longer than that.
"do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b" [...] the ''.join() form should be used instead" (link). You do that repeated string concatenation.
Also, yours crashes if the input string is empty.
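To illustrate the last two points together, here is one possible sketch that collects characters in lists, joins only once at the end, and returns an empty string for empty input. The function name longest_ordered_substring is just a placeholder for this example.

```python
def longest_ordered_substring(s):
    """Longest alphabetically ordered substring; '' for empty input."""
    runs = []  # each run is a list of characters in non-decreasing order
    for c in s:
        if runs and c >= runs[-1][-1]:
            runs[-1].append(c)   # extend the current run
        else:
            runs.append([c])     # start a new run
    # default=[] covers the empty string, where there are no runs at all
    return ''.join(max(runs, key=len, default=[]))


print(longest_ordered_substring('zabcabcd'))  # abcd
```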
1 line 72 characters:
s='zabcabcd';print(max([t:='']+[t:=t*(c>=t[-1:])+c for c in s],key=len))
Optimized linear time (I might add benchmarks tomorrow):
def Kelly_fast(s):
    maxstart = maxlength = start = length = 0
    prev = ''
    for c in s:
        if c >= prev:
            length += 1
        else:
            if length > maxlength:
                maxstart = start
                maxlength = length
            start += length
            length = 1
        prev = c
    if length > maxlength:
        maxstart = start
        maxlength = length
    return s[maxstart : maxstart+maxlength]
Depending on how you choose to count, this is only 6-7 lines and PEP 8 compliant:
def longest_alphabetical_substring(s):
    sub = ''
    for i in range(len(s)):
        j = i + len(sub) + 1
        while list(s[i:j]) == sorted(s[i:j]) and j <= len(s):
            sub, j = s[i:j], j+1
    return sub
print(longest_alphabetical_substring('zabcabcd'))
Your own code was PEP 8 compliant as far as I can tell, although it would make sense to capture code like this in a function, for easy reuse and logical grouping for improved readability.
The solution I provided here is not very efficient, as it keeps extracting copies of the best result so far. A slightly longer solution that avoids this:
def longest_alphabetical_substring(s):
    n = m = 0
    for i in range(len(s)):
        for j in range(i+1, len(s)+1):
            if j == len(s) or s[j] < s[j-1]:
                if j-i > m-n:
                    n, m = i, j
                break
    return s[n:m]
print(longest_alphabetical_substring('zabcabcd'))
There may be more efficient ways of doing this; for example you could detect that there's no need to keep looking because there is not enough room left in the string to find longer strings, and exit the outer loop sooner.
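A sketch of that early exit, assuming the same n/m bookkeeping as the function above (the name las_early_exit is only for this example): before starting a new scan at position i, check whether the rest of the string is long enough to beat the best slice found so far.

```python
def las_early_exit(s):
    n = m = 0  # best slice found so far is s[n:m]
    for i in range(len(s)):
        if len(s) - i <= m - n:
            break  # what is left of s cannot be longer than the best slice
        for j in range(i + 1, len(s) + 1):
            if j == len(s) or s[j] < s[j - 1]:
                if j - i > m - n:
                    n, m = i, j
                break
    return s[n:m]


print(las_early_exit('zabcabcd'))  # abcd
```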
User @kellybundy is correct, a truly efficient solution would be linear in time. Something like:
def las_efficient(s):
    t = s[0]
    return max([(t := c) if c < t[-1] else (t := t + c) for c in s[1:]], key=len)
print(las_efficient('zabcabcd'))
No points for readability here, but PEP 8 otherwise, and very brief.
And for an even more efficient solution:
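def las_very_efficient(s):
    if not s:
        return ''
    # m: best run so far, lm: its length, t: current run, ls: length of the input
    m, lm, t, ls = s[0], 1, s[0], len(s)
    for n, c in enumerate(s[1:], 1):
        if c < t[-1]:
            t = c
        else:
            t += c
            if len(t) > lm:
                m, lm = t, len(t)
        # stop once the current run cannot beat the best one,
        # even if every remaining character extended it
        if len(t) + (ls - n - 1) <= lm:
            break
    return m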
You can keep appending characters from the input string to a candidate list, but clear the list when the current character is lexicographically smaller than the last character in the list, and set the candidate list as the output list if it's longer than the current output list. Join the list into a string for the final output:
s = 'zabcabcdabc'
candidate = longest = []
for c in s:
    if candidate and c < candidate[-1]:
        candidate = []
    candidate.append(c)
    if len(candidate) > len(longest):
        longest = candidate
print(''.join(longest))
This outputs:
abcd

How to find the longest repeating sequence using python

In an interview, I was asked to print the longest repeated character sequence.
I got stuck. Is there any way to get it?
My code prints only the count of each character present in the string. Is there any approach to get the expected output?
import pandas as pd
import collections
a = 'abcxyzaaaabbbbbbb'
lst = collections.Counter(a)
df = pd.Series(lst)
df
Expected output :
bbbbbbb
How can I add that logic to the above code?
A regex solution:
import re
max(re.split(r'((.)\2*)', a), key=len)
Or without library help (but less efficient):
s = ''
max((s := s * (c in s) + c for c in a), key=len)
Both compute the string 'bbbbbbb'.
Without any modules, you could use a comprehension to go backward through possible sizes and get the first character multiplication that is present in the string:
next(c*s for s in range(len(a),0,-1) for c in a if c*s in a)
That's quite bad in terms of efficiency, though.
Another approach would be to detect the positions of letter changes and take the longest subrange between them:
chg = [i for i,(x,y) in enumerate(zip(a,a[1:]),1) if x!=y]
s,e = max(zip([0]+chg,chg+[len(a)]),key=lambda se:se[1]-se[0])
longest = a[s:e]
Of course a basic for-loop solution will also work:
si,sc = 0,"" # current streak (start, character)
ls,le = 0,0 # longest streak (start, end)
for i,c in enumerate(a+" "): # extra space to force out last char.
    if i-si > le-ls: ls,le = si,i # new longest
    if sc != c: si,sc = i,c # new streak
longest = a[ls:le]
print(longest) # bbbbbbb
A more long winded solution, picked wholesale from:
maximum-consecutive-repeating-character-string
def maxRepeating(str):
    len_s = len(str)
    count = 0
    # Find the maximum repeating
    # character starting from str[i]
    res = str[0]
    for i in range(len_s):
        cur_count = 1
        for j in range(i + 1, len_s):
            if (str[i] != str[j]):
                break
            cur_count += 1
        # Update result if required
        if cur_count > count :
            count = cur_count
            res = str[i]
    return res, count
# Driver code
if __name__ == "__main__":
    str = "abcxyzaaaabbbbbbb"
    print(maxRepeating(str))
Solution:
('b', 7)

Comparing strings and getting the most frequent series of letters [duplicate]

I'm looking for a Python library for finding the longest common sub-string from a set of strings. There are two ways to solve this problem:
using suffix trees
using dynamic programming.
Method implemented is not important. It is important it can be used for a set of strings (not only two strings).
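For reference, the dynamic-programming idea mentioned above looks roughly like this for the two-string case. This is only a sketch under that restriction: it does not handle a whole set of strings, and the function name longest_common_substring is chosen for illustration.

```python
def longest_common_substring(a, b):
    """O(len(a) * len(b)) dynamic programme for exactly two strings."""
    best_len = best_end = 0
    # prev[j] holds the length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring("abcdef", "zcdeq"))  # cde
```

Only two rows of the table are kept at a time, so memory stays proportional to the length of the second string.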
These paired functions will find the longest common string in any arbitrary array of strings:
def long_substr(data):
    substr = ''
    if len(data) > 1 and len(data[0]) > 0:
        for i in range(len(data[0])):
            for j in range(len(data[0])-i+1):
                if j > len(substr) and is_substr(data[0][i:i+j], data):
                    substr = data[0][i:i+j]
    return substr
def is_substr(find, data):
    if len(data) < 1 and len(find) < 1:
        return False
    for i in range(len(data)):
        if find not in data[i]:
            return False
    return True
print(long_substr(['Oh, hello, my friend.',
                   'I prefer Jelly Belly beans.',
                   'When hell freezes over!']))
No doubt the algorithm could be improved and I've not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.
EDIT: in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.
def long_substr(data):
    substr = ''
    if len(data) > 1 and len(data[0]) > 0:
        for i in range(len(data[0])):
            for j in range(len(data[0])-i+1):
                if j > len(substr) and all(data[0][i:i+j] in x for x in data):
                    substr = data[0][i:i+j]
    return substr
Hope this helps,
Jason.
This can be done shorter:
def long_substr(data):
    substrs = lambda x: {x[i:i+j] for i in range(len(x)) for j in range(len(x) - i + 1)}
    s = substrs(data[0])
    for val in data[1:]:
        s.intersection_update(substrs(val))
    return max(s, key=len)
Sets are (probably) implemented as hash maps, which makes this a bit inefficient. If you (1) implement a set datatype as a trie and (2) store only the suffixes in the trie and then force each node to be an endpoint (the equivalent of adding all substrings), then in theory I would guess this is pretty memory efficient, especially since intersections of tries are super easy.
Nevertheless, this is short, and premature optimization is the root of a significant amount of wasted time.
def common_prefix(strings):
    """ Find the longest string that is a prefix of all the strings.
    """
    if not strings:
        return ''
    prefix = strings[0]
    for s in strings:
        if len(s) < len(prefix):
            prefix = prefix[:len(s)]
        if not prefix:
            return ''
        for i in range(len(prefix)):
            if prefix[i] != s[i]:
                prefix = prefix[:i]
                break
    return prefix
From http://bitbucket.org/ned/cog/src/tip/cogapp/whiteutils.py
I prefer this for is_substr, as I find it a bit more readable and intuitive:
def is_substr(find, data):
    """
    inputs a substring to find, returns True only
    if found for each data in data list
    """
    if len(find) < 1 or len(data) < 1:
        return False # expected input DNE
    is_found = True # and-ing to False anywhere in data will return False
    for i in data:
        print("Looking for substring %s in %s..." % (find, i))
        is_found = is_found and find in i
    return is_found
# this does not increase asymptotical complexity
# but can still waste more time than it saves. TODO: profile
def shortest_of(strings):
    return min(strings, key=len)
def long_substr(strings):
    substr = ""
    if not strings:
        return substr
    reference = shortest_of(strings) #strings[0]
    length = len(reference)
    #find a suitable slice i:j
    for i in range(length):
        #only consider strings long at least len(substr) + 1
        for j in range(i + len(substr) + 1, length + 1):
            candidate = reference[i:j] # ↓ is the slice recalculated every time?
            if all(candidate in text for text in strings):
                substr = candidate
    return substr
Disclaimer This adds very little to jtjacques' answer. However, hopefully, this should be more readable and faster and it didn't fit in a comment, hence why I'm posting this in an answer. I'm not satisfied about shortest_of, to be honest.
If someone is looking for a generalized version that can also take a list of sequences of arbitrary objects:
def get_longest_common_subseq(data):
    substr = []
    if len(data) > 1 and len(data[0]) > 0:
        for i in range(len(data[0])):
            for j in range(len(data[0])-i+1):
                if j > len(substr) and is_subseq_of_any(data[0][i:i+j], data):
                    substr = data[0][i:i+j]
    return substr
def is_subseq_of_any(find, data):
    if len(data) < 1 and len(find) < 1:
        return False
    for i in range(len(data)):
        if not is_subseq(find, data[i]):
            return False
    return True
# Will also return True if possible_subseq == seq.
def is_subseq(possible_subseq, seq):
    if len(possible_subseq) > len(seq):
        return False
    def get_length_n_slices(n):
        for i in range(len(seq) + 1 - n):
            yield seq[i:i+n]
    for slyce in get_length_n_slices(len(possible_subseq)):
        if slyce == possible_subseq:
            return True
    return False
print(get_longest_common_subseq([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]))
print(get_longest_common_subseq(['Oh, hello, my friend.',
                                 'I prefer Jelly Belly beans.',
                                 'When hell freezes over!']))
My answer is pretty slow but very easy to understand. Working on a file with 100 strings of 1 kB each takes about two seconds. It returns any one of the longest substrings if there is more than one.
ls = list()  # fill this with the strings to compare
ls.sort(key=len)
s1 = ls.pop(0)
maxl = len(s1)
#1 create a list of all substrings backwards sorted by length. Thus we don't have to check the whole list.
subs = [s1[i:j] for i in range(maxl) for j in range(maxl,i,-1)]
subs.sort(key=len, reverse=True)
#2 Check a substring with the next shortest then the next etc. if is not in an any next shortest string then break the cycle, it's not common. If it passes all checks, it is the longest one by default, break the cycle.
def isasub(subs, ls):
    for sub in subs:
        for st in ls:
            if sub not in st:
                break
        else:
            return sub
            break
print('the longest common substring is: ',isasub(subs,ls))
Caveman solution that will give you a dataframe with the most frequent substrings in a string, based on the substring lengths you pass as a list:
import pandas as pd
lista = ['How much wood would a woodchuck',' chuck if a woodchuck could chuck wood?']
string = ''
for i in lista:
    string = string + ' ' + str(i)
string = string.lower()
characters_you_would_like_to_remove_from_string = [' ','-','_']
for i in characters_you_would_like_to_remove_from_string:
    string = string.replace(i,'')
substring_length_you_want_to_check = [3,4,5,6,7,8]
results_list = []
for string_length in substring_length_you_want_to_check:
    for i in range(len(string)):
        checking_str = string[i:i+string_length]
        if len(checking_str) == string_length:
            number_of_times_appears = (len(string) - len(string.replace(checking_str,'')))/string_length
            results_list = results_list+[[checking_str,number_of_times_appears]]
df = pd.DataFrame(data=results_list,columns=['string','freq'])
df['freq'] = df['freq'].astype('int64')
df = df.drop_duplicates()
df = df.sort_values(by='freq',ascending=False)
display(df[:10])  # display() is available in Jupyter/IPython; use print(df[:10]) elsewhere
result is:
    string  freq
78    huck     4
63    wood     4
77    chuc     4
132  chuck     4
8      ood     4
7      woo     4
21     chu     4
23     uck     4
22     huc     4
20     dch     3
The addition of a single 'break' speeds up jtjacques's answer significantly on my machine (1000X or so for 16K files):
def long_substr(data):
    substr = ''
    if len(data) > 1 and len(data[0]) > 0:
        for i in range(len(data[0])):
            for j in range(len(substr)+1, len(data[0])-i+1):
                if all(data[0][i:i+j] in x for x in data[1:]):
                    substr = data[0][i:i+j]
                else:
                    break
    return substr
You could use the SuffixTree module that is a wrapper based on an ANSI C implementation of generalised suffix trees. The module is easy to handle....
Take a look at: here

Finding the length of longest repeating?

I have tried plenty of different methods to achieve this, and I don't know what I'm doing wrong.
reps=[]
len_charac=0
def longest_charac(strng)
    for i in range(len(strng)):
        if strng[i] == strng[i+1]:
            if strng[i] in reps:
                reps.append(strng[i])
                len_charac=len(reps)
    return len_charac
Remember in Python counting loops and indexing strings aren't usually needed. There is also a builtin max function:
def longest(s):
    maximum = count = 0
    current = ''
    for c in s:
        if c == current:
            count += 1
        else:
            count = 1
            current = c
        maximum = max(count,maximum)
    return maximum
Output:
>>> longest('')
0
>>> longest('aab')
2
>>> longest('a')
1
>>> longest('abb')
2
>>> longest('aabccdddeffh')
3
>>> longest('aaabcaaddddefgh')
4
Simple solution:
def longest_substring(strng):
    len_substring=0
    longest=0
    for i in range(len(strng)):
        if i > 0:
            if strng[i] != strng[i-1]:
                len_substring = 0
        len_substring += 1
        if len_substring > longest:
            longest = len_substring
    return longest
Iterates through the characters in the string and checks against the previous one. If they are different then the count of repeating characters is reset to zero, then the count is incremented. If the current count beats the current record (stored in longest) then it becomes the new longest.
Compare two things and there is one relation between them:
'a' == 'a'
True
Compare three things, and there are two relations:
'a' == 'a' == 'b'
True False
Combine these ideas - repeatedly compare things with the things next to them, and the chain gets shorter each time:
'a' == 'a' == 'b'
True == False
False
It takes one reduction for the 'b' comparison to be False, because there was one 'b'; two reductions for the 'a' comparison to be False, because there were two 'a'. Keep repeating until the relations are all False, and the number of reductions is how many consecutive equal characters there were.
def f(s):
    repetitions = 0
    while any(s):
        repetitions += 1
        s = [ s[i] and s[i] == s[i+1] for i in range(len(s)-1) ]
    return repetitions
>>> f('aaabcaaddddefgh')
4
NB. matching characters at the start become True, only care about comparing the Trues with anything, and stop when all the Trues are gone and the list is all Falses.
It can also be squished into a recursive version, passing the depth in as an optional parameter:
def f(s, depth=1):
    s = [ s[i] and s[i]==s[i+1] for i in range(len(s)-1) ]
    return f(s, depth+1) if any(s) else depth
>>> f('aaabcaaddddefgh')
4
I stumbled on this while trying for something else, but it's quite pleasing.
You can use itertools.groupby to solve this pretty quickly, it will group characters together, and then you can sort the resulting list by length and get the last entry in the list as follows:
from itertools import groupby
print(sorted([list(g) for k, g in groupby('aaabcaaddddefgh')],key=len)[-1])
This should give you:
['d', 'd', 'd', 'd']
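Sorting every group is not strictly needed just to find the longest one; max with key=len does a single pass. A small variation on the snippet above, returning the run as a string (longest_run is a name made up for this sketch):

```python
from itertools import groupby


def longest_run(s):
    """Return the longest run of identical characters, or '' for empty input."""
    runs = (''.join(group) for _, group in groupby(s))
    return max(runs, key=len, default='')


print(longest_run('aaabcaaddddefgh'))       # dddd
print(len(longest_run('aaabcaaddddefgh')))  # 4
```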
This works:
def longestRun(s):
    if len(s) == 0: return 0
    runs = ''.join('*' if x == y else ' ' for x,y in zip(s,s[1:]))
    starStrings = runs.split()
    if len(starStrings) == 0: return 1
    return 1 + max(len(stars) for stars in starStrings)
Output:
>>> longestRun("aaabcaaddddefgh")
4
First off, Python is not my primary language, but I can still try to help.
1) It looks like you are exceeding the bounds of the string. On the last iteration, you compare the last character against the character beyond it. In Python this raises an IndexError.
2) you start off with an empty reps[] array and compare every character to see if it's in it. Clearly, that check will fail every time and your append is within that if statement.
def longest_charac(string):
    longest = 0
    if string:
        flag = string[0]
        tmp_len = 0
        for item in string:
            if item == flag:
                tmp_len += 1
            else:
                flag = item
                tmp_len = 1
            if tmp_len > longest:
                longest = tmp_len
    return longest
This is my solution. Maybe it will help you.
Just for context, here is a recursive approach that avoids dealing with loops:
def max_rep(prev, text, reps, rep=1):
    """Recursively consume all characters in text and find longest repetition.
    Args
        prev: string of previous character
        text: string of remaining text
        reps: list of ints of all repetitions observed
        rep: int of current repetition observed
    """
    if text == '': return max(reps)
    if prev == text[0]:
        rep += 1
    else:
        rep = 1
    return max_rep(text[0], text[1:], reps + [rep], rep)
Tests:
>>> max_rep('', 'aaabcaaddddefgh', [])
4
>>> max_rep('', 'aaaaaabcaadddddefggghhhhhhh', [])
7

Counting longest occurrence of repeated sequence in Python

What's the easiest way to count the longest consecutive repeat of a certain character in a string? For example, the longest consecutive repeat of "b" in the following string:
my_str = "abcdefgfaabbbffbbbbbbfgbb"
would be 6, since other consecutive repeats are shorter (3 and 2, respectively.) How can I do this in Python?
How about a regex example:
import re
my_str = "abcdefgfaabbbffbbbbbbfgbb"
len(max(re.compile("(b+b)*").findall(my_str))) #changed the regex from (b+b) to (b+b)*
# max([len(i) for i in re.compile("(b+b)").findall(my_str)]) also works
Edit: mine vs. interjay's
import timeit
x=timeit.Timer(stmt='import itertools;my_str = "abcdefgfaabbbffbbbbbbfgbb";max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=="b")')
x.timeit()
22.759046077728271
x=timeit.Timer(stmt='import re;my_str = "abcdefgfaabbbffbbbbbbfgbb";len(max(re.compile("(b+b)").findall(my_str)))')
x.timeit()
8.4770550727844238
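One caveat with the pattern (b+b): it needs at least two consecutive characters, so a string whose only runs are single b's would report 0. A hedged variant that matches runs of any length and also copes with strings that contain no b at all (the function name and the char parameter are assumptions made for this sketch):

```python
import re


def longest_char_run(text, char="b"):
    """Length of the longest consecutive run of char in text (0 if absent)."""
    runs = re.findall(re.escape(char) + "+", text)
    return max(map(len, runs), default=0)


my_str = "abcdefgfaabbbffbbbbbbfgbb"
print(longest_char_run(my_str))  # 6
```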
Here is a one-liner:
max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=='b')
Explanation:
itertools.groupby will return groups of consecutive identical characters, along with an iterator for all items in that group. For each such iterator, len(list(y)) will give the number of items in the group. Taking the maximum of that (for the given character) will give the required result.
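To see what groupby produces before the maximum is taken, the one-liner can be unrolled as below. This is just an illustrative expansion; the variable names run_lengths and b_runs are not from the answer.

```python
import itertools

my_str = "abcdefgfaabbbffbbbbbbfgbb"

# One (character, run length) pair per group of consecutive identical characters.
run_lengths = [(c, len(list(g))) for c, g in itertools.groupby(my_str)]

# Keep only the runs of "b"; default=0 covers strings without any "b".
b_runs = [n for c, n in run_lengths if c == "b"]
print(b_runs)                  # [1, 3, 6, 2]
print(max(b_runs, default=0))  # 6
```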
Here's my really boring, inefficient, straightforward counting method (interjay's is much better). Note, I wrote this in this little text field, which doesn't have an interpreter, so I haven't tested it, and I may have made a really dumb mistake that a proof-read didn't catch.
my_str = "abcdefgfaabbbffbbbbbbfgbb"
last_char = ""
current_seq_len = 0
max_seq_len = 0
for c in my_str:
    if c == last_char:
        current_seq_len += 1
    else:
        current_seq_len = 1
        last_char = c
    # check after every character so that runs of length 1 are counted too
    if current_seq_len > max_seq_len:
        max_seq_len = current_seq_len
print(max_seq_len)
Using run-length encoding:
import numpy as NP
signal = NP.array([4,5,6,7,3,4,3,5,5,5,5,3,4,2,8,9,0,1,2,8,8,8,0,9,1,3])
px, = NP.where(NP.ediff1d(signal) != 0)
px = NP.r_[(0, px+1, [len(signal)])]
# collect the run-lengths for each unique item in the signal
rx = [ (m, n, signal[m]) for (m, n) in zip(px[:-1], px[1:]) if (n - m) > 1 ]
# get longest:
rx2 = [ (b-a, c) for (a, b, c) in rx ]
rx2.sort(reverse=True)
# returns: [(4, 5), (3, 8)], ie, '5' occurs 4 times consecutively, '8' occurs 3 times consecutively
Here is my code. It is not that efficient, but it seems to work:
def LongCons(mystring):
    dictionary = {}  # longest run seen so far for each character
    CurrentCount = 0
    latestchar = ''
    for i in mystring:
        if i == latestchar:
            CurrentCount += 1
        else:
            CurrentCount = 1
            latestchar = i
        # dict.has_key() was removed in Python 3; dict.get() covers the missing-key case
        # and keeps an earlier, longer run from being overwritten
        if CurrentCount > dictionary.get(i, 0):
            dictionary[i] = CurrentCount
    k = max(dictionary, key=dictionary.get)
    print(k, dictionary[k])
    return
