Find longest substring in alphabetical order - python

I want to write a program that prints the longest substring in alphabetical order.
And in case of ties, it prints the first substring.
Here is what I wrote
import sys
s1 = str(sys.argv[1])
alpha = "abcdefghijklmnopqrstuvwxyz"
def longest_substring(s1):
for i in range(len(alpha)):
for k in range(len(alpha)):
if alpha[i:k] in s1:
return alpha[i:k]
print("Longest substring in alphabetical order:", longest_substring(s1))
However, it does not work and I do not know how to do the second part.
Can you help me, please?

Here is what your code should look like to achieve what you want:
#!/usr/bin/env python3.6
import sys
s1 = str(sys.argv[1])
alpha = "abcdefghijklmnopqrstuvwxyz"
subs = []
def longest_substring(s1):
for i in range(len(alpha)):
for k in range(len(alpha)):
if alpha[i:k] in s1:
subs.append(alpha[i:k])
return max(subs, key=len)
print("Longest substring in alphabetical order:", longest_substring(s1))
You were returning right out of the function on the first alphabetically ordered substring you found. In my code, we add them to a list then print out the longest one.

Assume that substring contains 2 or more characters in alphabetical order. So that you should not only return the first occurrence but collect all and find longest. I try to keep your idea the same, but this is not the most efficient way:
def longest_substring(s1):
res = []
for i in range(len(alpha) - 2):
for k in range(i + 2, len(alpha)):
if alpha[i:k] in s1:
res.append(alpha[i:k])
return max(res, key=len)

You re-write a version of itertools.takewhile to take a binary compare function instead of the unary one.
def my_takewhile(predicate, starting_value, iterable):
last = starting_value
for cur in iterable:
if predicate(last, cur):
yield cur
last = cur
else:
break
Then you can lowercase the word (since "Za" isn't in alphabetical order, but any [A-Z] compares lexicographically before any [a-z]) and get all the substrings.
i = 0
substrings = []
while i < len(alpha):
it = iter(alpha[i:])
substring = str(my_takewhile(lambda x,y: x<y, chr(0), it))
i += len(substring)
substrings.append(substring)
Then just find the longest substring in substrings.
result = max(substrings, key=len)

Instead of building a list of all possible substring slices and then checking which one exists in the string, you can build a list of all consecutive substrings, and then take the one with the maximum length.
This is easily done by grouping the characters using the difference between the ord of that character and an increasing counter; successive characters will have a constant difference. itertools.groupby is used to perform the grouping:
from itertools import groupby, count
alpha = "abcdefghijklmnopqrstuvwxyz"
c = count()
lst_substrs = [''.join(g) for _, g in groupby(alpha, lambda x: ord(x)-next(c))]
substr = max(lst_substrs, key=len)
print(substr)
# abcdefghijklmnopqrstuvwxyz
As #AdamSmith commented, the above assumes the characters are always in alphabetical order. In the case they may not be, one can enforce the order by checking that items in the group are alphabetical:
from itertools import groupby, count, tee
lst = []
c = count()
for _, g in groupby(alpha, lambda x: ord(x)-next(c)):
a, b = tee(g)
try:
if ord(next(a)) - ord(next(a)) == -1:
lst.append(''.join(b))
except StopIteration:
pass
lst.extend(b) # add each chr from non-alphabetic iterator (could be empty)
substr = max(lst, key=len)

back up and look at this problem again.
1. you are looking for a maximum and should basically (pseudo code):
set a max to ""
loop through sequences
if new sequence is bigger the max, then replace max
find the sequences you can be more efficient if you only step though the input characters once.
Here is a version of this:
def longest_substring(s1):
max_index, max_len = 0, 0 # keep track of the longest sequence here
last_c = s1[0] # previous char
start, seq_len = 0, 1 # tracking current seqence
for i, c in enumerate(s1[1:]):
if c >= last_c: # can we extend sequence in alpha order
seq_len += 1
if seq_len > max_len: # found longer
max_index, max_len = start, seq_len
else: # this char starts new sequence
seq_len = 0
start = i + 1
last_c = c
return s1[max_index:max_index+max_len]

s = 'azcbobobegghakl'
def max_alpha_subStr(s):
'''
INPUT: s, a string of lowercase letters
OUTPUT: longest substing of s in which the
letters occur in alphabetical order
'''
longest = s[0] # set variables 'longest' and 'current' as 1st letter in s
current = s[0]
for i in s[1:]: # begin iteration from 2nd letter to the end of s
if i >= current[-1]: # if the 'current' letter is bigger
# than the letter before it
current += i # add that letter to the 'current' letter(s) and
if len(current) > len(longest): # check if the 'current' length of
# letters are longer than the letters in'longest'
longest = current # if 'current' is the longest, make 'longest'
# now equal 'current'
else: # otherwise the current letter is lesser
# than the letter before it and
current = i # restart evaluating from the point of iteration
return print("Longest substring in alphabetical order is: ", longest)
max_alpha_subStr(s)

Related

What is wrong with my code trying to find longest substring of a string that is in alphabetical order?

I need help on a problem that I am doing in a course. The exact details are below:
Assume s is a string of lower case characters.
Write a program that prints the longest substring of s in which the letters occur in alphabetical order.
For example, if s = 'azcbobobegghakl', then your program should print
"Longest substring in alphabetical order is: beggh"
In the case of ties, print the first substring. For example, if s = 'abcbcd', then your program should print
"Longest substring in alphabetical order is: abc"
I have written some code that achieves some correct answers but not all and I am unsure why.
This is my code
s = 'vettmlxvn'
alphabet = "abcdefghijklmnopqrstuvwxyz"
substring = ""
highest_len = 0
highest_string = ""
counter = 0
for letter in s:
counter += 1
if s.index(letter) == 0:
substring = substring + letter
highest_len = len(substring)
highest_string = substring
else:
x = alphabet.index(substring[-1])
y = alphabet.index(letter)
if y >= x:
substring = substring + letter
if counter == len(s) and len(substring) > highest_len:
highest_len = len(substring)
highest_string = substring
else:
if len(substring) > highest_len:
highest_len = len(substring)
highest_string = substring
substring = "" + letter
else:
substring = "" + letter
print("Longest substring in alphabetical order is: " + highest_string)
When I test for this specific string it gives me "lxv" instead of the correct answer: "ett". I do not know why this is and have even tried drawing a trace table so I can trace variables and I should be getting "ett".
Maybe I have missed something simple but can someone explain why it is not working.
I know there are probably easier ways to do this problem but I am a beginner in python and have been working on this problem for a long time.
Just want to know what is wrong with my code.
Thanks.
I solved your Problem with an alternate simpler approach.
You can compare 2 characters like numbers. a comes before b in the alphabet, that means the expression 'a' < 'b' is True
def longest_substring(s):
# Initialize the longest substring
longest = ""
# Initialize the current substring
current = ""
# Loop through the string
for i in range(len(s)):
if i == 0:
# If it's the first letter, add it to the current substring
current += s[i]
else:
if s[i] >= s[i-1]:
# If the current letter is greater than or equal to the previous letter,
# add it to the current substring
current += s[i]
else:
# If the current letter is less than the previous letter,
# check if the current substring is longer than the longest substring
# and update the longest substring if it is
if len(current) > len(longest):
longest = current
current = s[i]
# Once the loop is done,
# check again if the current substring is longer than the longest substring
if len(current) > len(longest):
longest = current
return longest
print(longest_substring("azcbobobegghakl"))
print(longest_substring("abcbcd"))
print(longest_substring("abcdefghijklmnopqrstuvwxyz"))
print(longest_substring("vettmlxvn"))
Output:
beggh
abc
abcdefghijklmnopqrstuvwxyz
ett
I haven't figured out what's wrong with your code yet. I will update this if I figured it out.
Edit
So commented some stuff out and changed one thing.
Here's the working code:
s = 'vettmlxvn'
alphabet = "abcdefghijklmnopqrstuvwxyz"
substring = ""
highest_len = 0
highest_string = ""
counter = 0
for letter in s:
counter += 1
if s.index(letter) == 0:
substring = substring + letter
# highest_len = len(substring)
# highest_string = substring
else:
x = alphabet.index(substring[-1])
y = alphabet.index(letter)
if y >= x:
substring = substring + letter
if counter == len(s) and len(substring) > highest_len:
highest_len = len(substring)
highest_string = substring
else:
if len(substring) > len(highest_string): # changed highest_len to len(highest_string)
#highest_len = len(substring)
highest_string = substring
substring = "" + letter
else:
substring = "" + letter
print("Longest substring in alphabetical order is: " + highest_string)
You are overwriting highest_string at the wrong time. Only overwrite it if the substrings ends and after checking if it's greater than the longest found before.
Outputs:
Longest substring in alphabetical order is: ett
Alternate Approach
Break input string into substrings
Where each substring is increasing
Find the longest substring
Code
def longest_increasing_substring(s):
# Break s into substrings which are increasing
incr_subs = []
for c in s:
if not incr_subs or incr_subs[-1][-1] > c:# check alphabetical order of last letter of current
# string to current letter
incr_subs.append('') # Start new last substring since not in order
incr_subs[-1] += c # append to last substsring
return max(incr_subs, key = len) # substring with max length
Test
for s in ['vettmlxvn', 'azcbobobegghakl', 'abcbcd']:
print(f'{s} -> {longest_increasing_substring(s)}')
Output
vettmlxvn -> ett
azcbobobegghakl -> beggh
abcbcd -> abc

split a string to have chunks containing the maximum number of possible characters

e.g. string = 'bananaban'
=> ['ban', 'anab', 'an']
My attempt:
def apart(string):
letters = []
for i in string:
while i not in letters:
letters.append(i)
print("The letters are:" +str(letters))
x = []
result = []
return result
string = str(input("Enter string: "))
print(apart(string)
Basically, If I know all the letters that are in the word/string, I want to add them into x, until x contains all letters. Then I want to add x into result.
In my examaple "bananaban" it would mean [ban] is one x, because "ban" countains the letter "b","a" and "n". Same goes for [anab]. [an] only contains "a" and "n" because it is the end of the word.
Would be cool if somebody could help me ^^
IIUC, you want to split after all characters are in the current chunk.
You could use a set to keep track of the seen characters:
s = 'bananaban'
seen = set()
letters = set(s)
out = ['']
for c in s:
if seen != letters:
out[-1] += c
seen.add(c)
else:
seen = set(c)
out.append(c)
output: ['ban', 'anab', 'an']
The logical way seens to be first create a set with all letters in your string, then go over teh original one, collecting each character, and startign a new collection each time the set of letters in the collection match the original.
def apart(string):
target = set(string)
result = []
component = ""
for char in string:
component += char
if set(component) == target:
result.append(component)
component = ""
if component:
result.append(component)
return result
Using a set of the characters in the string, you can loop through the string and add or extend the last group in your resulting list:
S = "bananaban"
chars = set(S) # distinct characters of string
groups = [""] # start with an empty group
for c in S:
if chars.issubset(groups[-1]): # group contains all characters
groups.append(c) # start a new group
else:
groups[-1] += c # append character to last group
print(groups)
['ban', 'anab', 'an']

Python count max character in a string?

I am trying to find the max character count in a string using loop. So far, this is the code i have written :-
def max_char_count(string):
max_char = ''
max_count = 0
for char in string:
count = string.count(char)
if count > max_count:
max_count = count
max_char = char
return max_char
print(max_char_count('apple hellooo'))
But the issue i am running into is even though there are 3 l and 3 o. I am only getting the output as l. How can i adjust the code to show the right count for the characters? Thank you.
Your approach is inefficient because you need to iterate over the whole string to compute string.count(char) while you iterate over the entire string anyway, which gives a time complexity of O(n^2)
Instead, I suggest you calculate the counts of each character once looping through the string, and then select the one(s) with the maximum count.
def max_char_count(string):
ret = []
counter = dict() # create an empty dict to store the character counts
# Itrate over the string once to count the characters
# N iterations
for char in string:
# `.get()` returns the given default value (0) if the `char` key doesn't exist
# If it does, it returns the value for that key
# Then you increment it
counter[char] = counter.get(char, 0) + 1
# Find the max value
# (another loop over the dict, worst-case N iterations)
max_count = max(counter.values())
# Iterate over the dict one last time to get the keys that have a value == max_count
# Again, worst-case N iterations
for char, count in counter.items():
if count == max_count:
ret.append((char, count))
return ret
Now, print(max_char_count('apple hellooo')) returns [('o', 3), ('l', 3)]
If you don't want to reinvent the counting wheel, use collections.Counter instead. counter = collections.Counter(string)
Since we have three loops and none of them are nested loops, we get a time complexity of O(3*n) (or just O(n))
You can use a dictionary of characters to count the occurrences, then find the largest count and return all keys that have that count. The fromkeys() constructor will allow you to initialize the letter counts to zero thus making the counting loop much simpler. Selecting the letters with the maximum count can be done using a list comprehension.
This will compute the result in very few lines of code:
string = 'apple hellooo'
counts = dict.fromkeys(string,0) # initialize counts to zero
for c in string: counts[c] += 1 # compute characters counts
max_count = max(counts.values()) # find maximum count
result = [c for c,n in counts.items() if n==max_count] # matching characters
print(result)
['o', 'l']

I want to create a program that finds the longest substring in alphabetical order using python using basics

my code for finding longest substring in alphabetical order using python
what I mean by longest substring in alphabetical order?
if the input was"asdefvbrrfqrstuvwxffvd" the output wil be "qrstuvwx"
#we well use the strings as arrays so don't be confused
s='abcbcd'
#give spaces which will be our deadlines
h=s+' (many spaces) '
#creat outputs
g=''
g2=''
#list of alphapets
abc='abcdefghijklmnopqrstuvwxyz'
#create the location of x"the character the we examine" and its limit
limit=len(s)
#start from 1 becouse we substract one in the rest of the code
x=1
while (x<limit):
#y is the curser that we will move the abc array on it
y=0
#putting our break condition first
if ((h[x]==' ') or (h[x-1]==' ')):
break
for y in range(0,26):
#for the second character x=1
if ((h[x]==abc[y]) and (h[x-1]==abc[y-1]) and (x==1)):
g=g+abc[y-1]+abc[y]
x+=1
#for the third to the last character x>1
if ((h[x]==abc[y]) and (h[x-1]==abc[y-1]) and (x!=1)):
g=g+abc[y]
x+=1
if (h[x]==' '):
break
print ("Longest substring in alphabetical order is:" +g )
it doesn't end,as if it's in infinite loop
what should I do?
I am a beginner so I want some with for loops not functions from libraries
Thanks in advance
To avoid infinite loop add x += 1 in the very end of your while-loop. As a result your code works but works wrong in general case.
The reason why it works wrong is that you use only one variable g to store the result. Use at least two variables to compare previous found substring and new found substring or use list to remember all substrings and then choose the longest one.
s = 'abcbcdiawuhdawpdijsamksndaadhlmwmdnaowdihasoooandalw'
longest = ''
current = ''
for idx, item in enumerate(s):
if idx == 0 or item > s[idx-1]:
current = current + item
if idx > 0 and item <= s[idx-1]:
current = ''
if len(current)>len(longest):
longest = current
print(longest)
Output:
dhlmw
For your understanding 'b'>'a' is True, 'a'>'b' is False etc
Edit:
For longest consecutive substring:
s = 'asdefvbrrfqrstuvwxffvd'
abc = 'abcdefghijklmnopqrstuvwxyz'
longest = ''
current = ''
for idx, item in enumerate(s):
if idx == 0 or abc.index(item) - abc.index(s[idx-1]) == 1:
current = current + item
else:
current = item
if len(current)>len(longest):
longest = current
print(longest)
Output:
qrstuvwx
def sub_strings(string):
substring = ''
string +='\n'
i = 0
string_dict ={}
while i < len(string)-1:
substring += string[i]
if ord(substring[-1])+1 != ord(string[i+1]):
string_dict[substring] = len(substring)
substring = ''
i+=1
return string_dict
s='abcbcd'
sub_strings(s)
{'abc': 3, 'bcd': 3}
To find the longest you can domax(sub_strings(s))
So here which one do you want to be taken as the longest??. Now that is a problem you would need to solve
You can iterate through the string and keep comparing to the last character and append to the potentially longest string if the current character is greater than the last character by one ordinal number:
def longest_substring(s):
last = None
current = longest = ''
for c in s:
if not last or ord(c) - ord(last) == 1:
current += c
else:
if len(current) > len(longest):
longest = current
current = c
last = c
if len(current) > len(longest):
longest = current
return longest
so that:
print(longest_substring('asdefvbrrfqrstuvwxffvd'))
would output:
qrstuvwx

Is it possible to achieve this without defining a function?

I'm trying to write a Python program which would take a string and print the longest substring in it which is also in alphabetical order. For example:
the_string = "abcdefgghhisdghlqjwnmonty"
The longest substring in alphabetical order here would be "abcdefgghhis"
I'm not allowed to define my own functions and can't use lists. So here's what I came up with:
def in_alphabetical_order(string):
for letter in range(len(string) - 1):
if string[letter] > string[letter + 1]:
return False
return True
s = "somestring"
count = 0
for char in range(len(s)):
i = 0
while i <= len(s):
sub_string = s[char : i]
if (len(sub_string) > count) and (in_alphabetical_order(sub_string)):
count = len(sub_string)
longest_string = sub_string
i += 1
print("Longest substring in alphabetical order is: " + longest_string)
This obviously contains a function that is not built-in. How can I check whether the elements of the substring candidate is in alphabetical order without defining this function? In other words: How can I implement what this function does for me into the code (e.g. by using another for loop in the code somewhere or something)?
just going by your code you can move the operation of the function into the loop and use a variable to store what would have been the return value.
I would recommend listening to bill the lizard to help with the way you solve the problem
s = "somestring"
count = 0
longest_string = ''
for char in range(len(s)):
i = 0
while i <= len(s):
sub_string = s[char : i]
in_order = True
for letter in range(len(sub_string) - 1):
if sub_string[letter] > sub_string[letter + 1]:
in_order = False
break
if (len(sub_string) > count) and (in_order):
count = len(sub_string)
longest_string = sub_string
i += 1
print("Longest substring in alphabetical order is: " + longest_string)
You don't need to check the whole substring with a function call to see if it is alphabetical. You can just check one character at a time.
Start at the first character. If the next character is later in the alphabet, keep moving along in the string. When you reach a character that's earlier in the alphabet than the previous character, you've found the longest increasing substring starting at the first character. Save it and start over from the second character.
Each time you find the longest substring starting at character N, check to see if it is longer than the previous longest substring. If it is, replace the old one.
Here's a solution based off of what you had:
s = 'abcdefgghhisdghlqjwnmonty'
m, n = '', ''
for i in range(len(s) - 1):
if s[i + 1] < s[i]:
if len(n) > len(m):
m = n
n = s[i]
else:
n += s[i]
Output:
m
'abcdefgghhi'

Categories

Resources