Find Longest Alphabetically Ordered Substring - Efficiently

Find Longest Alphabetically Ordered Substring - Efficiently - python

The goal of some a piece of code I wrote is to find the longest alphabetically ordered substring within a string.
"""
Find longest alphabetically ordered substring in string s.
"""
s = 'zabcabcd' # Test string.
alphabetical_str, temp_str = s[0], s[0]
for i in range(len(s) - 1): # Loop through string.
if s[i] <= s[i + 1]: # Check if next character is alphabetically next.
temp_str += s[i + 1] # Add character to temporary string.
if len(temp_str) > len(alphabetical_str): # Check is temporary string is the longest string.
alphabetical_str = temp_str # Assign longest string.
else:
temp_str = s[i + 1] # Assign last checked character to temporary string.
print(alphabetical_str)
I get an output of abcd.
But the instructor says there is PEP 8 compliant way of writing this code that is 7-8 lines of code and there is a more computational efficient way of writing this code that is ~16 lines. Also that there is a way of writing this code in only 1 line 75 character!
Can anyone provide some insight on what the code would look like if it was 7-8 lines or what the most work appropriate way of writing this code would be? Also any PEP 8 compliance critique would be appreciated.

Linear time:
s = 'zabcabcd'
longest = current = []
for c in s:
if [c] < current[-1:]:
current = []
current += c
longest = max(longest, current, key=len)
print(''.join(longest))
Your PEP 8 issues I see:
"Limit all lines to a maximum of 79 characters." (link) - You have two lines longer than that.
"do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b" [...] the ''.join() form should be used instead" (link). You do that repeated string concatenation.
Also, yours crashes if the input string is empty.
1 line 72 characters:
s='zabcabcd';print(max([t:='']+[t:=t*(c>=t[-1:])+c for c in s],key=len))
Optimized linear time (I might add benchmarks tomorrow):
def Kelly_fast(s):
maxstart = maxlength = start = length = 0
prev = ''
for c in s:
if c >= prev:
length += 1
else:
if length > maxlength:
maxstart = start
maxlength = length
start += length
length = 1
prev = c
if length > maxlength:
maxstart = start
maxlength = length
return s[maxstart : maxstart+maxlength]

Depending on how you choose to count, this is only 6-7 lines and PEP 8 compliant:
def longest_alphabetical_substring(s):
sub = '', 0
for i in range(len(s)):
j = i + len(sub) + 1
while list(s[i:j]) == sorted(s[i:j]) and j <= len(s):
sub, j = s[i:j], j+1
return sub
print(longest_alphabetical_substring('zabcabcd'))
Your own code was PEP 8 compliant as far as I can tell, although it would make sense to capture code like this in a function, for easy reuse and logical grouping for improved readability.
The solution I provided here is not very efficient, as it keeps extracting copies of the best result so far. A slightly longer solution that avoids this:
def longest_alphabetical_substring(s):
n = m = 0
for i in range(len(s)):
for j in range(i+1, len(s)+1):
if j == len(s) or s[j] < s[j-1]:
if j-i > m-n:
n, m = i, j
break
return s[n:m]
print(longest_alphabetical_substring('zabcabcd'))
There may be more efficient ways of doing this; for example you could detect that there's no need to keep looking because there is not enough room left in the string to find longer strings, and exit the outer loop sooner.
User #kellybundy is correct, a truly efficient solution would be linear in time. Something like:
def las_efficient(s):
t = s[0]
return max([(t := c) if c < t[-1] else (t := t + c) for c in s[1:]], key=len)
print(las_efficient('zabcabcd'))
No points for readability here, but PEP 8 otherwise, and very brief.
And for an even more efficient solution:
def las_very_efficient(s):
m, lm, t, ls = '', 0, s[0], len(s)
for n, c in enumerate(s[1:]):
if c < t[-1]:
t = c
else:
t += c
if len(t) > lm:
m, lm = t, len(t)
if n + lm > ls:
break
return m

You can keep appending characters from the input string to a candidate list, but clear the list when the current character is lexicographically smaller than the last character in the list, and set the candidate list as the output list if it's longer than the current output list. Join the list into a string for the final output:
s = 'zabcabcdabc'
candidate = longest = []
for c in s:
if candidate and c < candidate[-1]:
candidate = []
candidate.append(c)
if len(candidate) > len(longest):
longest = candidate
print(''.join(longest))
This outputs:
abcd

Related

How to find the longest repeating sequence using python

I went through an interview, where they asked me to print the longest repeated character sequence.
I got stuck is there any way to get it?
But my code prints only the count of characters present in a string is there any approach to get the expected output
import pandas as pd
import collections
a = 'abcxyzaaaabbbbbbb'
lst = collections.Counter(a)
df = pd.Series(lst)
df
Expected output :
bbbbbbb
How to add logic to in above code?

A regex solution:
max(re.split(r'((.)\2*)', a), key=len)
Or without library help (but less efficient):
s = ''
max((s := s * (c in s) + c for c in a), key=len)
Both compute the string 'bbbbbbb'.

Without any modules, you could use a comprehension to go backward through possible sizes and get the first character multiplication that is present in the string:
next(c*s for s in range(len(a),0,-1) for c in a if c*s in a)
That's quite bad in terms of efficiency though
another approach would be to detect the positions of letter changes and take the longest subrange from those
chg = [i for i,(x,y) in enumerate(zip(a,a[1:]),1) if x!=y]
s,e = max(zip([0]+chg,chg+[len(a)]),key=lambda se:se[1]-se[0])
longest = a[s:e]
Of course a basic for-loop solution will also work:
si,sc = 0,"" # current streak (start, character)
ls,le = 0,0 # longest streak (start, end)
for i,c in enumerate(a+" "): # extra space to force out last char.
if i-si > le-ls: ls,le = si,i # new longest
if sc != c: si,sc = i,c # new streak
longest = a[ls:le]
print(longest) # bbbbbbb

A more long winded solution, picked wholesale from:
maximum-consecutive-repeating-character-string
def maxRepeating(str):
len_s = len(str)
count = 0
# Find the maximum repeating
# character starting from str[i]
res = str[0]
for i in range(len_s):
cur_count = 1
for j in range(i + 1, len_s):
if (str[i] != str[j]):
break
cur_count += 1
# Update result if required
if cur_count > count :
count = cur_count
res = str[i]
return res, count
# Driver code
if __name__ == "__main__":
str = "abcxyzaaaabbbbbbb"
print(maxRepeating(str))
Solution:
('b', 7)

What is a faster version of this code instead of the double for-loop (python)?

When I hand this code in on a site (from my university) that corrects it, it is too long for its standards.
Here is the code:
def pangram(String):
import string
alfabet = list(string.ascii_lowercase)
interpunctie = string.punctuation + "’" + "123456789"
String = String.lower()
string_1 = ""
for char in String:
if not char in interpunctie:
string_1 += char
string_1 = string_1.replace(" ", "")
List = list(string_1)
List.sort()
list_2 = []
for index, char in enumerate(List):
if not List[index] == 0:
if not (char == List[index - 1]):
list_2.append(char)
return list_2 == alfabet
def venster(tekst):
pangram_gevonden = False
if pangram(tekst) == False:
return None
for lengte in range(26, len(tekst)):
if pangram_gevonden == True:
break
for n in range(0, len(tekst) - lengte):
if pangram(tekst[n:lengte+n]):
kortste_pangram = tekst[n:lengte+n]
pangram_gevonden = True
break
return kortste_pangram
So the first function (pangram) is fine and it determines whether or not a given string is a pangram: it contains all the letters of the alphabet at least once.
The second function checks whether or not the string(usually a longer tekst) is a pangram or not and if it is, it returns the shortest possible pangram within that tekst (even if that's not correct English). If there are two pangrams with the same length: the most left one is returned.
For this second function I used a double for loop: The first one determines the length of the string that's being checked (26 - len(string)) and the second one uses this length to go through the string at each possible point to check if it is a pangram. Once the shortest (and most left) pangram is found, it breaks out of both of the for loops.
However this (apparantly) still takes too long. So i wonder if anyone knew a faster way of tackling this second function. It doesn't necessarily have to be with a for loop.
Thanks in advance
Lucas

Create a map {letter; int} and activecount counter.
Make two indexes left and right, set them in 0.
Move right index.
If l=s[right] is letter, increment value for map key l.
If value becomes non-zero - increment activecount.
Continue until activecount reaches 26
Now move left index.
If l=s[left] is letter, decrement value for map key l.
If value becomes zero - decrement activecount and stop.
Start moving right index again and so on.
Minimal difference between left and right while
activecount==26 corresponds to the shortest pangram.
Algorithm is linear.
Example code for string containing only lower letters from alphabet 'abcd'. Returns length of the shortest substring that contains all letters from abcd. Does not check for valid chars, is not thoroughly tested.
import string
def findpangram(s):
alfabet = list(string.ascii_lowercase)
map = dict(zip(alfabet, [0]*len(alfabet)))
left = 0
right = 0
ac = 0
minlen = 100000
while left < len(s):
while right < len(s):
l = s[right]
c = map[l]
map[l] = c + 1
right += 1
if c==0:
ac+=1
if ac == 4:
break
if ac < 4:
break
if right - left < minlen:
minlen = right - left
while left < right:
l = s[left]
c = map[l]
map[l] = c - 1
left += 1
if c==1:
ac-=1
break
if right - left + 2 < minlen:
minlen = right - left + 1
return minlen
print(findpangram("acacdbcca"))

Why can't I initialize a variable to the indices of two other variables working on a string?

The wording of my question was not polite to the search feature on the site, so I apologize should someone feel this is a duplicate question, but I must ask anyway.
Working in Python 3.6.1, my goal is to find a substring of letters in a string that are in alphabetical order and if that substring of letters is the longest substring of letters in alphabetical order (aa would be considered alphabetical order), then print out the string. I have not gotten entirely close to the solution but I'm making progress; however, this came up and I'm confounded by it being completely new to Python. My question is, why is this valid:
s = 'hijkkpdgijklmnopqqrs'
n = len(s)
i = 0
a = 0
for i in range(n-2):
if s[i] <= s[i+1]:
a = s[i+1]
i = s[i+2]
a = i + a
print(a)
And yet this is not:
s = 'hijkkpdgijklmnopqqrs'
n = len(s)
i = 0
a = 0
b = ''
for i in range(n-2):
if s[i] <= s[i+1]:
b = a + i
a = s[i+1]
i = s[i+2]
a = a + i
print(b)
When the latter code is run, I receive the error:
Traceback (most recent call last):
File "C:\Users\spect\Desktop\newjackcity.py", line 14, in <module>
b = a + i
TypeError: must be str, not int
What I am ultimately trying to do is to 'index in' to the string s, compare the zeroth element to the zeroth+1 element and if s[I] < s[I+1], I want to concatenate the two into my variable a for later printing. Because when I do this, a only prints out two letters in the string. I thought, well initialize the variable first so that a and i can be incremented, then added into a for comparison purposes, and b for printing.
I see now that I'm only going through n-2 iterations (in order to compare the second to last letter to n-1 so the logic is flawed, but I still don't understand the error of why all of a sudden binding a+i to a variable b will produce a str/int error? In my view saying s[i]; etc. is pulling out the elements as a string and this to me is proven in the fact if I run the first set of code, I get the output:
sr
>>>

In both for loops, you use i as the loop variable, so it starts as an int.
In the first version, you reassign i to a string, then add.
for i in range(n-2):
# here i is an int, something between 0 and n-2
if s[i] <= s[i+1]:
a = s[i+1] # a is a string...
i = s[i+2] # now you change i to a string
a = i + a # string + string: OK!
In the second version you try to add i first:
for i in range(n-2):
# here i is an int, something between 0 and n-2
if s[i] <= s[i+1]:
b = a + i # string + int, can't do it...
a = s[i+1]
i = s[i+2]
a = a + i
You will have an easier time debugging your code if you pick more meaningful names.
edit: here is my cleaned up version of your code:
s = 'hijkkpdgijklmnopqqrs'
# i = 0 isn't needed, range starts at 0
# the first character is always 'alphabetical'
alph_substr = s[0]
# range(1,n) is [1,2, ..., n-1]
for i in range(1, len(s)):
if s[i-1] <= s[i]:
alph_substr = alph_substr + s[i]
else:
# we have to start over, since we're not alphabetical anymore
print(alph_substr)
alph_substr = s[i]
print(alph_substr)

Comparing strings and getting the most frequent serie of letters [duplicate]

I'm looking for a Python library for finding the longest common sub-string from a set of strings. There are two ways to solve this problem:
using suffix trees
using dynamic programming.
Method implemented is not important. It is important it can be used for a set of strings (not only two strings).

These paired functions will find the longest common string in any arbitrary array of strings:
def long_substr(data):
substr = ''
if len(data) > 1 and len(data[0]) > 0:
for i in range(len(data[0])):
for j in range(len(data[0])-i+1):
if j > len(substr) and is_substr(data[0][i:i+j], data):
substr = data[0][i:i+j]
return substr
def is_substr(find, data):
if len(data) < 1 and len(find) < 1:
return False
for i in range(len(data)):
if find not in data[i]:
return False
return True
print long_substr(['Oh, hello, my friend.',
'I prefer Jelly Belly beans.',
'When hell freezes over!'])
No doubt the algorithm could be improved and I've not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.
EDIT: in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.
def long_substr(data):
substr = ''
if len(data) > 1 and len(data[0]) > 0:
for i in range(len(data[0])):
for j in range(len(data[0])-i+1):
if j > len(substr) and all(data[0][i:i+j] in x for x in data):
substr = data[0][i:i+j]
return substr
Hope this helps,
Jason.

This can be done shorter:
def long_substr(data):
substrs = lambda x: {x[i:i+j] for i in range(len(x)) for j in range(len(x) - i + 1)}
s = substrs(data[0])
for val in data[1:]:
s.intersection_update(substrs(val))
return max(s, key=len)
set's are (probably) implemented as hash-maps, which makes this a bit inefficient. If you (1) implement a set datatype as a trie and (2) just store the postfixes in the trie and then force each node to be an endpoint (this would be the equivalent of adding all substrings), THEN in theory I would guess this baby is pretty memory efficient, especially since intersections of tries are super-easy.
Nevertheless, this is short and premature optimization is the root of a significant amount of wasted time.

def common_prefix(strings):
""" Find the longest string that is a prefix of all the strings.
"""
if not strings:
return ''
prefix = strings[0]
for s in strings:
if len(s) < len(prefix):
prefix = prefix[:len(s)]
if not prefix:
return ''
for i in range(len(prefix)):
if prefix[i] != s[i]:
prefix = prefix[:i]
break
return prefix
From http://bitbucket.org/ned/cog/src/tip/cogapp/whiteutils.py

I prefer this for is_substr, as I find it a bit more readable and intuitive:
def is_substr(find, data):
"""
inputs a substring to find, returns True only
if found for each data in data list
"""
if len(find) < 1 or len(data) < 1:
return False # expected input DNE
is_found = True # and-ing to False anywhere in data will return False
for i in data:
print "Looking for substring %s in %s..." % (find, i)
is_found = is_found and find in i
return is_found

# this does not increase asymptotical complexity
# but can still waste more time than it saves. TODO: profile
def shortest_of(strings):
return min(strings, key=len)
def long_substr(strings):
substr = ""
if not strings:
return substr
reference = shortest_of(strings) #strings[0]
length = len(reference)
#find a suitable slice i:j
for i in xrange(length):
#only consider strings long at least len(substr) + 1
for j in xrange(i + len(substr) + 1, length + 1):
candidate = reference[i:j] # ↓ is the slice recalculated every time?
if all(candidate in text for text in strings):
substr = candidate
return substr
Disclaimer This adds very little to jtjacques' answer. However, hopefully, this should be more readable and faster and it didn't fit in a comment, hence why I'm posting this in an answer. I'm not satisfied about shortest_of, to be honest.

If someone is looking for a generalized version that can also take a list of sequences of arbitrary objects:
def get_longest_common_subseq(data):
substr = []
if len(data) > 1 and len(data[0]) > 0:
for i in range(len(data[0])):
for j in range(len(data[0])-i+1):
if j > len(substr) and is_subseq_of_any(data[0][i:i+j], data):
substr = data[0][i:i+j]
return substr
def is_subseq_of_any(find, data):
if len(data) < 1 and len(find) < 1:
return False
for i in range(len(data)):
if not is_subseq(find, data[i]):
return False
return True
# Will also return True if possible_subseq == seq.
def is_subseq(possible_subseq, seq):
if len(possible_subseq) > len(seq):
return False
def get_length_n_slices(n):
for i in xrange(len(seq) + 1 - n):
yield seq[i:i+n]
for slyce in get_length_n_slices(len(possible_subseq)):
if slyce == possible_subseq:
return True
return False
print get_longest_common_subseq([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]])
print get_longest_common_subseq(['Oh, hello, my friend.',
'I prefer Jelly Belly beans.',
'When hell freezes over!'])

My answer, pretty slow, but very easy to understand. Working on a file with 100 strings of 1 kb each takes about two seconds, returns any one longest substring if there are more than one
ls = list()
ls.sort(key=len)
s1 = ls.pop(0)
maxl = len(s1)
#1 create a list of all substrings backwards sorted by length. Thus we don't have to check the whole list.
subs = [s1[i:j] for i in range(maxl) for j in range(maxl,i,-1)]
subs.sort(key=len, reverse=True)
#2 Check a substring with the next shortest then the next etc. if is not in an any next shortest string then break the cycle, it's not common. If it passes all checks, it is the longest one by default, break the cycle.
def isasub(subs, ls):
for sub in subs:
for st in ls:
if sub not in st:
break
else:
return sub
break
print('the longest common substring is: ',isasub(subs,ls))

Caveman solution that will give you a dataframe with the top most frequent substring in a string base on the substring length you pass as a list:
import pandas as pd
lista = ['How much wood would a woodchuck',' chuck if a woodchuck could chuck wood?']
string = ''
for i in lista:
string = string + ' ' + str(i)
string = string.lower()
characters_you_would_like_to_remove_from_string = [' ','-','_']
for i in charecters_you_would_like_to_remove_from_string:
string = string.replace(i,'')
substring_length_you_want_to_check = [3,4,5,6,7,8]
results_list = []
for string_length in substring_length_you_want_to_check:
for i in range(len(string)):
checking_str = string[i:i+string_length]
if len(checking_str) == string_length:
number_of_times_appears = (len(string) - len(string.replace(checking_str,'')))/string_length
results_list = results_list+[[checking_str,number_of_times_appears]]
df = pd.DataFrame(data=results_list,columns=['string','freq'])
df['freq'] = df['freq'].astype('int64')
df = df.drop_duplicates()
df = df.sort_values(by='freq',ascending=False)
display(df[:10])
result is:
string freq
78 huck 4
63 wood 4
77 chuc 4
132 chuck 4
8 ood 4
7 woo 4
21 chu 4
23 uck 4
22 huc 4
20 dch 3

The addition of a single 'break' speeds up jtjacques's answer significantly on my machine (1000X or so for 16K files):
def long_substr(data):
substr = ''
if len(data) > 1 and len(data[0]) > 0:
for i in range(len(data[0])):
for j in range(len(substr)+1, len(data[0])-i+1):
if all(data[0][i:i+j] in x for x in data[1:]):
substr = data[0][i:i+j]
else:
break
return substr

You could use the SuffixTree module that is a wrapper based on an ANSI C implementation of generalised suffix trees. The module is easy to handle....
Take a look at: here

Finding longest substring in alphabetical order

EDIT: I am aware that a question with similar task was already asked in SO but I'm interested to find out the problem in this specific piece of code. I am also aware that this problem can be solved without using recursion.
The task is to write a program which will find (and print) the longest sub-string in which the letters occur in alphabetical order. If more than 1 equally long sequences were found, then the first one should be printed. For example, the output for a string abczabcd will be abcz.
I have solved this problem with recursion which seemed to pass my manual tests. However when I run an automated tests set which generate random strings, I have noticed that in some cases, the output is incorrect. For example:
if s = 'hixwluvyhzzzdgd', the output is hix instead of luvy
if s = 'eseoojlsuai', the output is eoo instead of jlsu
if s = 'drurotsxjehlwfwgygygxz', the output is dru instead of ehlw
After some time struggling, I couldn't figure out what is so special about these strings that causes the bug.
This is my code:
pos = 0
maxLen = 0
startPos = 0
endPos = 0
def last_pos(pos):
if pos < (len(s) - 1):
if s[pos + 1] >= s[pos]:
pos += 1
if pos == len(s)-1:
return len(s)
else:
return last_pos(pos)
return pos
for i in range(len(s)):
if last_pos(i+1) != None:
diff = last_pos(i) - i
if diff - 1 > maxLen:
maxLen = diff
startPos = i
endPos = startPos + diff
print s[startPos:endPos+1]

There are many things to improve in your code but making minimum changes so as to make it work. The problem is you should have if last_pos(i) != None: in your for loop (i instead of i+1) and you should compare diff (not diff - 1) against maxLen. Please read other answers to learn how to do it better.
for i in range(len(s)):
if last_pos(i) != None:
diff = last_pos(i) - i + 1
if diff > maxLen:
maxLen = diff
startPos = i
endPos = startPos + diff - 1

Here. This does what you want. One pass, no need for recursion.
def find_longest_substring_in_alphabetical_order(s):
groups = []
cur_longest = ''
prev_char = ''
for c in s.lower():
if prev_char and c < prev_char:
groups.append(cur_longest)
cur_longest = c
else:
cur_longest += c
prev_char = c
return max(groups, key=len) if groups else s
Using it:
>>> find_longest_substring_in_alphabetical_order('hixwluvyhzzzdgd')
'luvy'
>>> find_longest_substring_in_alphabetical_order('eseoojlsuai')
'jlsu'
>>> find_longest_substring_in_alphabetical_order('drurotsxjehlwfwgygygxz')
'ehlw'
Note: It will probably break on strange characters, has only been tested with the inputs you suggested. Since this is a "homework" question, I will leave you with the solution as is, though there is still some optimization to be done, I wanted to leave it a little bit understandable.

You can use nested for loops, slicing and sorted. If the string is not all lower-case then you can convert the sub-strings to lower-case before comparing using str.lower:
def solve(strs):
maxx = ''
for i in xrange(len(strs)):
for j in xrange(i+1, len(strs)):
s = strs[i:j+1]
if ''.join(sorted(s)) == s:
maxx = max(maxx, s, key=len)
else:
break
return maxx
Output:
>>> solve('hixwluvyhzzzdgd')
'luvy'
>>> solve('eseoojlsuai')
'jlsu'
>>> solve('drurotsxjehlwfwgygygxz')
'ehlw'

Python has a powerful builtin package itertools and a wonderful function within groupby
An intuitive use of the Key function can give immense mileage.
In this particular case, you just have to keep a track of order change and group the sequence accordingly. The only exception is the boundary case which you have to handle separately
Code
def find_long_cons_sub(s):
class Key(object):
'''
The Key function returns
1: For Increasing Sequence
0: For Decreasing Sequence
'''
def __init__(self):
self.last_char = None
def __call__(self, char):
resp = True
if self.last_char:
resp = self.last_char < char
self.last_char = char
return resp
def find_substring(groups):
'''
The Boundary Case is when an increasing sequence
starts just after the Decresing Sequence. This causes
the first character to be in the previous group.
If you do not want to handle the Boundary Case
seperately, you have to mak the Key function a bit
complicated to flag the start of increasing sequence'''
yield next(groups)
try:
while True:
yield next(groups)[-1:] + next(groups)
except StopIteration:
pass
groups = (list(g) for k, g in groupby(s, key = Key()) if k)
#Just determine the maximum sequence based on length
return ''.join(max(find_substring(groups), key = len))
Result
>>> find_long_cons_sub('drurotsxjehlwfwgygygxz')
'ehlw'
>>> find_long_cons_sub('eseoojlsuai')
'jlsu'
>>> find_long_cons_sub('hixwluvyhzzzdgd')
'luvy'

Simple and easy.
Code :
s = 'hixwluvyhzzzdgd'
r,p,t = '','',''
for c in s:
if p <= c:
t += c
p = c
else:
if len(t) > len(r):
r = t
t,p = c,c
if len(t) > len(r):
r = t
print 'Longest substring in alphabetical order is: ' + r
Output :
Longest substring in alphabetical order which appeared first: luvy

Here is a single pass solution with a fast loop. It reads each character only once. Inside the loop operations are limited to
1 string comparison (1 char x 1 char)
1 integer increment
2 integer subtractions
1 integer comparison
1 to 3 integer assignments
1 string assignment
No containers are used. No function calls are made. The empty string is handled without special-case code. All character codes, including chr(0), are properly handled. If there is a tie for the longest alphabetical substring, the function returns the first winning substring it encountered. Case is ignored for purposes of alphabetization, but case is preserved in the output substring.
def longest_alphabetical_substring(string):
start, end = 0, 0 # range of current alphabetical string
START, END = 0, 0 # range of longest alphabetical string yet found
prev = chr(0) # previous character
for char in string.lower(): # scan string ignoring case
if char < prev: # is character out of alphabetical order?
start = end # if so, start a new substring
end += 1 # either way, increment substring length
if end - start > END - START: # found new longest?
START, END = start, end # if so, update longest
prev = char # remember previous character
return string[START : END] # return longest alphabetical substring
Result
>>> longest_alphabetical_substring('drurotsxjehlwfwgygygxz')
'ehlw'
>>> longest_alphabetical_substring('eseoojlsuai')
'jlsu'
>>> longest_alphabetical_substring('hixwluvyhzzzdgd')
'luvy'
>>>

a lot more looping, but it gets the job done
s = raw_input("Enter string")
fin=""
s_pos =0
while s_pos < len(s):
n=1
lng=" "
for c in s[s_pos:]:
if c >= lng[n-1]:
lng+=c
n+=1
else :
break
if len(lng) > len(fin):
fin= lng`enter code here`
s_pos+=1
print "Longest string: " + fin

def find_longest_order():
`enter code here`arr = []
`enter code here`now_long = ''
prev_char = ''
for char in s.lower():
if prev_char and char < prev_char:
arr.append(now_long)
now_long = char
else:
now_long += char
prev_char = char
if len(now_long) == len(s):
return now_long
else:
return max(arr, key=len)
def main():
print 'Longest substring in alphabetical order is: ' + find_longest_order()
main()

Simple and easy to understand:
s = "abcbcd" #The original string
l = len(s) #The length of the original string
maxlenstr = s[0] #maximum length sub-string, taking the first letter of original string as value.
curlenstr = s[0] #current length sub-string, taking the first letter of original string as value.
for i in range(1,l): #in range, the l is not counted.
if s[i] >= s[i-1]: #If current letter is greater or equal to previous letter,
curlenstr += s[i] #add the current letter to current length sub-string
else:
curlenstr = s[i] #otherwise, take the current letter as current length sub-string
if len(curlenstr) > len(maxlenstr): #if current cub-string's length is greater than max one,
maxlenstr = curlenstr; #take current one as max one.
print("Longest substring in alphabetical order is:", maxlenstr)

s = input("insert some string: ")
start = 0
end = 0
temp = ""
while end+1 <len(s):
while end+1 <len(s) and s[end+1] >= s[end]:
end += 1
if len(s[start:end+1]) > len(temp):
temp = s[start:end+1]
end +=1
start = end
print("longest ordered part is: "+temp)

I suppose this is problem set question for CS6.00.1x on EDX. Here is what I came up with.
s = raw_input("Enter the string: ")
longest_sub = ""
last_longest = ""
for i in range(len(s)):
if len(last_longest) > 0:
if last_longest[-1] <= s[i]:
last_longest += s[i]
else:
last_longest = s[i]
else:
last_longest = s[i]
if len(last_longest) > len(longest_sub):
longest_sub = last_longest
print(longest_sub)

I came up with this solution
def longest_sorted_string(s):
max_string = ''
for i in range(len(s)):
for j in range(i+1, len(s)+1):
string = s[i:j]
arr = list(string)
if sorted(string) == arr and len(max_string) < len(string):
max_string = string
return max_string

Assuming this is from Edx course:
till this question, we haven't taught anything about strings and their advanced operations in python
So, I would simply go through the looping and conditional statements
string ="" #taking a plain string to represent the then generated string
present ="" #the present/current longest string
for i in range(len(s)): #not len(s)-1 because that totally skips last value
j = i+1
if j>= len(s):
j=i #using s[i+1] simply throws an error of not having index
if s[i] <= s[j]: #comparing the now and next value
string += s[i] #concatinating string if above condition is satisied
elif len(string) != 0 and s[i] > s[j]: #don't want to lose the last value
string += s[i] #now since s[i] > s[j] #last one will be printed
if len(string) > len(present): #1 > 0 so from there we get to store many values
present = string #swapping to largest string
string = ""
if len(string) > len(present): #to swap from if statement
present = string
if present == s[len(s)-1]: #if no alphabet is in order then first one is to be the output
present = s[0]
print('Longest substring in alphabetical order is:' + present)

I agree with #Abhijit about the power of itertools.groupby() but I took a simpler approach to (ab)using it and avoided the boundary case problems:
from itertools import groupby
LENGTH, LETTERS = 0, 1
def longest_sorted(string):
longest_length, longest_letters = 0, []
key, previous_letter = 0, chr(0)
def keyfunc(letter):
nonlocal key, previous_letter
if letter < previous_letter:
key += 1
previous_letter = letter
return key
for _, group in groupby(string, keyfunc):
letters = list(group)
length = len(letters)
if length > longest_length:
longest_length, longest_letters = length, letters
return ''.join(longest_letters)
print(longest_sorted('hixwluvyhzzzdgd'))
print(longest_sorted('eseoojlsuai'))
print(longest_sorted('drurotsxjehlwfwgygygxz'))
print(longest_sorted('abcdefghijklmnopqrstuvwxyz'))
OUTPUT
> python3 test.py
luvy
jlsu
ehlw
abcdefghijklmnopqrstuvwxyz
>

s = 'azcbobobegghakl'
i=1
subs=s[0]
subs2=s[0]
while(i<len(s)):
j=i
while(j<len(s)):
if(s[j]>=s[j-1]):
subs+=s[j]
j+=1
else:
subs=subs.replace(subs[:len(subs)],s[i])
break
if(len(subs)>len(subs2)):
subs2=subs2.replace(subs2[:len(subs2)], subs[:len(subs)])
subs=subs.replace(subs[:len(subs)],s[i])
i+=1
print("Longest substring in alphabetical order is:",subs2)

s = 'gkuencgybsbezzilbfg'
x = s.lower()
y = ''
z = [] #creating an empty listing which will get filled
for i in range(0,len(x)):
if i == len(x)-1:
y = y + str(x[i])
z.append(y)
break
a = x[i] <= x[i+1]
if a == True:
y = y + str(x[i])
else:
y = y + str(x[i])
z.append(y) # fill the list
y = ''
# search of 1st longest string
L = len(max(z,key=len)) # key=len takes length in consideration
for i in range(0,len(z)):
a = len(z[i])
if a == L:
print 'Longest substring in alphabetical order is:' + str(z[i])
break

first_seq=s[0]
break_seq=s[0]
current = s[0]
for i in range(0,len(s)-1):
if s[i]<=s[i+1]:
first_seq = first_seq + s[i+1]
if len(first_seq) > len(current):
current = first_seq
else:
first_seq = s[i+1]
break_seq = first_seq
print("Longest substring in alphabetical order is: ", current)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Find Longest Alphabetically Ordered Substring - Efficiently - python

Related

How to find the longest repeating sequence using python

What is a faster version of this code instead of the double for-loop (python)?

Why can't I initialize a variable to the indices of two other variables working on a string?

Comparing strings and getting the most frequent serie of letters [duplicate]

Finding longest substring in alphabetical order

Categories

Resources