alphabetical order in Python

Okay I have questions regarding the following code:
s = "wxyabcd"
myString = s[0]
longest = s[0]
for i in range(1, len(s)):
    if s[i] >= myString[-1]:
        myString += s[i]
        if len(myString) > len(longest):
            longest = myString
    else:
        myString = s[i]
print longest
Answer: "abcd"
w
wx
wxy
a
ab
abc
abcd
I am new to Python and I am trying to learn how some of these loops work, but I am very confused. This code finds the longest substring that is in alphabetical order. The actual answer was "abcd", but I know that the process went through the string one character at a time.
Question: Can someone please guide me through the code so I can understand it better? Since there are 7 characters, I assume it starts by saying: "For each item in range 1-7, if the item is 'more' than myString[-1], which is 'w', then I add the letter at index i, which in this case would be 'x'."
I get lost right after this. So from a to z, is a > z? Is that how it is? And how, when s[i] != myString[-1], did it skip to start again from 'a' in s[i]?
Sorry, I am all over the place. Anyway, I've tried to search online for help learning this, but some things are just hard. I know that in a few months I'll get the hang of it and hopefully be more fluent.
Thank you!

Here's a bit of an explanation of the control flow and what's going on with Python's indexing, hope it helps:
s = "wxyabcd"
myString = s[0]  # 'w', the run currently being built
longest = s[0]  # 'w' again, the longest run seen so far
for i in range(1, len(s)):  # i goes from 1 to 6 (7 is excluded), so 2nd character to last
    if s[i] >= myString[-1]:  # compare the chars, e.g. 'z' > 'a', so 'x' and 'y' => True
        myString += s[i]  # concatenate on to 'w'
        if len(myString) > len(longest):  # is the current run the longest so far?
            longest = myString  # reassign longest to myString
    else:
        myString = s[i]  # restart myString at the current character of s
print(longest)
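To make the comparison in the `if` line concrete, here is a small sketch in Python 3 syntax. Strings compare by character code, so for lowercase letters "later in the alphabet" means "greater". The helper name `trace_runs` is made up for illustration; it only records each value that myString takes during the loop.

```python
# Characters compare by their Unicode code points, so for lowercase
# ASCII letters "later in the alphabet" means "greater".
print(ord("a"), ord("z"))   # 97 122
print("a" < "z")            # True
print("x" >= "w")           # True: 'x' extends the run "w"
print("a" >= "y")           # False: the run "wxy" ends, restart at 'a'


def trace_runs(s):
    """Return every value myString takes while scanning s (hypothetical helper)."""
    my_string = s[0]
    seen = [my_string]
    for ch in s[1:]:
        if ch >= my_string[-1]:
            my_string += ch      # still in order: extend the run
        else:
            my_string = ch       # out of order: start a new run here
        seen.append(my_string)
    return seen


print(trace_runs("wxyabcd"))
# ['w', 'wx', 'wxy', 'a', 'ab', 'abc', 'abcd']
```

So 'a' is never greater than 'z'; the `else` branch is what throws away "wxy" and restarts at 'a'.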

# s is a 7 character string
s = "wxyabcd"
# set `myString` to be the first character of s, 'w'
myString = s[0]
# set `longest` to be the first character of s, 'w'
longest = s[0]
# loop from 1 up to and not including length of s (7)
# Which equals for i in (1,2,3,4,5,6):
for i in range(1, len(s)):
    # Compare the character at i with the last character of `myString`
    if s[i] >= myString[-1]:
        # If it is greater or equal (in alphabetical sense)
        # append the character at i to `myString`
        myString += s[i]
        # If this makes `myString` longer than the previous `longest`,
        # set `myString` to be the new `longest`
        if len(myString) > len(longest):
            longest = myString
    # Otherwise set `myString` to be a single character string
    # and start checking from index i
    else:
        myString = s[i]
# `longest` will be the longest `myString` that was formed,
# using only characters in ascending (non-decreasing) alphabetic order
print(longest)

Two approaches I can think of (quickly)
def approach_one(text): # I approve of this method!
    all_substrings = list()
    this_substring = ""
    for letter in text:
        # >= so repeated letters stay in the run, as in the question's code
        if len(this_substring) == 0 or letter >= this_substring[-1]:
            this_substring += letter
        else:
            all_substrings.append(this_substring)
            this_substring = letter
    all_substrings.append(this_substring)
    return max(all_substrings, key=len)
def approach_two(text):
    # forthcoming
    pass

Related

What is wrong with my code trying to find longest substring of a string that is in alphabetical order?

I need help on a problem that I am doing in a course. The exact details are below:
Assume s is a string of lower case characters.
Write a program that prints the longest substring of s in which the letters occur in alphabetical order.
For example, if s = 'azcbobobegghakl', then your program should print
"Longest substring in alphabetical order is: beggh"
In the case of ties, print the first substring. For example, if s = 'abcbcd', then your program should print
"Longest substring in alphabetical order is: abc"
I have written some code that achieves some correct answers but not all and I am unsure why.
This is my code
s = 'vettmlxvn'
alphabet = "abcdefghijklmnopqrstuvwxyz"
substring = ""
highest_len = 0
highest_string = ""
counter = 0
for letter in s:
    counter += 1
    if s.index(letter) == 0:
        substring = substring + letter
        highest_len = len(substring)
        highest_string = substring
    else:
        x = alphabet.index(substring[-1])
        y = alphabet.index(letter)
        if y >= x:
            substring = substring + letter
            if counter == len(s) and len(substring) > highest_len:
                highest_len = len(substring)
                highest_string = substring
        else:
            if len(substring) > highest_len:
                highest_len = len(substring)
                highest_string = substring
                substring = "" + letter
            else:
                substring = "" + letter
print("Longest substring in alphabetical order is: " + highest_string)
When I test for this specific string it gives me "lxv" instead of the correct answer: "ett". I do not know why this is and have even tried drawing a trace table so I can trace variables and I should be getting "ett".
Maybe I have missed something simple but can someone explain why it is not working.
I know there are probably easier ways to do this problem but I am a beginner in python and have been working on this problem for a long time.
Just want to know what is wrong with my code.
Thanks.

I solved your problem with an alternative, simpler approach.
You can compare two characters like numbers. 'a' comes before 'b' in the alphabet, which means the expression 'a' < 'b' is True.
def longest_substring(s):
    # Initialize the longest substring
    longest = ""
    # Initialize the current substring
    current = ""
    # Loop through the string
    for i in range(len(s)):
        if i == 0:
            # If it's the first letter, add it to the current substring
            current += s[i]
        else:
            if s[i] >= s[i-1]:
                # If the current letter is greater than or equal to the previous letter,
                # add it to the current substring
                current += s[i]
            else:
                # If the current letter is less than the previous letter,
                # check if the current substring is longer than the longest substring
                # and update the longest substring if it is
                if len(current) > len(longest):
                    longest = current
                current = s[i]
    # Once the loop is done,
    # check again if the current substring is longer than the longest substring
    if len(current) > len(longest):
        longest = current
    return longest
print(longest_substring("azcbobobegghakl"))
print(longest_substring("abcbcd"))
print(longest_substring("abcdefghijklmnopqrstuvwxyz"))
print(longest_substring("vettmlxvn"))
Output:
beggh
abc
abcdefghijklmnopqrstuvwxyz
ett
I haven't figured out what's wrong with your code yet. I will update this if I figure it out.
Edit
So I commented some stuff out and changed one thing.
Here's the working code:
s = 'vettmlxvn'
alphabet = "abcdefghijklmnopqrstuvwxyz"
substring = ""
highest_len = 0
highest_string = ""
counter = 0
for letter in s:
    counter += 1
    if s.index(letter) == 0:
        substring = substring + letter
        # highest_len = len(substring)
        # highest_string = substring
    else:
        x = alphabet.index(substring[-1])
        y = alphabet.index(letter)
        if y >= x:
            substring = substring + letter
            if counter == len(s) and len(substring) > highest_len:
                highest_len = len(substring)
                highest_string = substring
        else:
            if len(substring) > len(highest_string): # changed highest_len to len(highest_string)
                #highest_len = len(substring)
                highest_string = substring
                substring = "" + letter
            else:
                substring = "" + letter
print("Longest substring in alphabetical order is: " + highest_string)
You are overwriting highest_string at the wrong time. Only overwrite it when a substring ends, and only after checking that it is longer than the longest one found before. In your version, s.index(letter) == 0 is also true for the second 'v' (index returns the first occurrence), so the first branch ran again and overwrote highest_string with "lxv".
Outputs:
Longest substring in alphabetical order is: ett
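One more thing worth checking in the original code: `s.index(letter)` reports the position of the first occurrence of a letter, so it cannot tell the two 'v's in 'vettmlxvn' apart. The following is a minimal sketch, assuming the intent of that test was "is this the first character of s"; it uses `enumerate` to track the real position. The function name `longest_run` is made up for this example.

```python
s = "vettmlxvn"

# str.index returns the FIRST position of a character, so both 'v's report 0.
print([s.index(ch) for ch in s])   # [0, 1, 2, 2, 4, 5, 6, 0, 8]


def longest_run(s):
    """Longest non-decreasing run, tracking the real position with enumerate."""
    best = current = ""
    for position, letter in enumerate(s):
        if position == 0 or letter >= current[-1]:
            current += letter         # first character, or still in order
        else:
            current = letter          # order broken: start a new run
        if len(current) > len(best):  # strict '>' keeps the first run on ties
            best = current
    return best


print(longest_run(s))  # ett
```

Comparing the letters directly also makes the `alphabet` lookup string unnecessary.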

Alternate Approach
Break input string into substrings
Where each substring is increasing
Find the longest substring
Code
def longest_increasing_substring(s):
    # Break s into substrings which are increasing
    incr_subs = []
    for c in s:
        # Compare the last letter of the current substring with the current letter
        if not incr_subs or incr_subs[-1][-1] > c:
            incr_subs.append('')  # Start a new substring since c is not in order
        incr_subs[-1] += c  # Append to the last substring
    return max(incr_subs, key=len)  # Substring with max length (the first one on ties)
Test
for s in ['vettmlxvn', 'azcbobobegghakl', 'abcbcd']:
    print(f'{s} -> {longest_increasing_substring(s)}')
Output
vettmlxvn -> ett
azcbobobegghakl -> beggh
abcbcd -> abc

Remove string character after run of n characters in string

Suppose you have a given string and an integer, n. Every time a character appears in the string more than n times in a row, you want to remove some of the characters so that it only appears n times in a row. For example, for the case n = 2, we would want the string 'aaabccdddd' to become 'aabccdd'. I have written this crude function that compiles without errors but doesn't quite get me what I want:
def strcut(string, n):
    for i in range(len(string)):
        for j in range(n):
            if i + j < len(string)-(n-1):
                if string[i] == string[i+j]:
                    beg = string[:i]
                    ends = string[i+1:]
                    string = beg + ends
    print(string)
These are the outputs for strcut('aaabccdddd', n):
n    output     expected
1    'abcdd'    'abcd'
2    'acdd'     'aabccdd'
3    'acddd'    'aaabccddd'
I am new to python but I am pretty sure that my error is in line 3, 4 or 5 of my function. Does anyone have any suggestions or know of any methods that would make this easier?

This may not answer why your code does not work, but here's an alternate solution using regex:
import re
def strcut(string, n):
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1"*n, string)
How it works: First, the pattern formatted is "(.)\1{n-1,}". If n=3 then the pattern becomes "(.)\1{2,}"
(.) is a capture group that matches any single character
\1 matches the first capture group
{2,} matches the previous token 2 or more times
The replacement string is the first capture group repeated n times
For example: str = "aaaab" and n = 3. The first "a" is the capture group (.). The next 3 "aaa" matches \1{2,} - in this example a{2,}. So the whole thing matches "a" + "aaa" = "aaaa". That is replaced with "aaa".
regex101 can explain it better than me.
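To see the formatted pattern and the results for the question's input, here is a short check. It repeats the function from this answer with the pattern pulled into a variable, and it assumes n is at least 1. Note that `.` does not match a newline unless `re.DOTALL` is passed.

```python
import re


def strcut(string, n):
    # For n=3 the pattern is (.)\1{2,}: a character followed by
    # two or more copies of itself, i.e. a run of at least n.
    pattern = fr"(.)\1{{{n-1},}}"
    return re.sub(pattern, r"\1" * n, string)


print(fr"(.)\1{{{3-1},}}")   # (.)\1{2,}
for n in (1, 2, 3):
    print(n, strcut("aaabccdddd", n))
# 1 abcd
# 2 aabccdd
# 3 aaabccddd
```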

You can implement a stack data structure.
The idea: for each new character, check whether it is the same as the character on top of the stack. If it is, push it and increment the counter only while the counter is still within the limit n; otherwise skip it. If the new character differs from the top of the stack, push it and reset the counter to 1.
def func(string, n):
    stack = []
    counter = None
    for i in string:
        if not stack:
            counter = 1
            stack.append(i)
        elif stack[-1]==i:
            if counter+1<=n:
                stack.append(i)
                counter+=1
        elif stack[-1]!=i:
            stack.append(i)
            counter = 1
    return ''.join(stack)
print(func('aaabbcdaaacccdsdsccddssse', 2)=='aabbcdaaccdsdsccddsse')
print(func('aaabccdddd',1 )=='abcd')
print(func('aaabccdddd',2 )=='aabccdd')
print(func('aaabccdddd',3 )=='aaabccddd')
output
True
True
True
True

The method I would use is to create a new empty string at the start of the function and then, whenever a character has already appeared n times in a row, simply not append it to the output string. This is efficient because it makes a single pass over the input:
def strcut(string, n):
    new_string = ""
    first_c, s = string[:1], 0  # current run character and its length ([:1] is safe for "")
    for c in string:
        if c != first_c:
            first_c, s = c, 0  # a new run starts
        s += 1
        if s <= n:
            new_string += c  # keep only the first n characters of each run
    return new_string
print(strcut("aabcaaabbba", 2))  # output: aabcaabba

Simply, to answer the question
appears in the string more than n times in a row
the following code is small and simple, and will work fine :-)
def strcut(string: str, n: int) -> None:
    tmp = "*" * (n+1)
    for char in string:
        if tmp[len(tmp) - n:] != char * n:
            tmp += char
    print(tmp[n+1:])
strcut("aaabccdddd", 1)
strcut("aaabccdddd", 2)
strcut("aaabccdddd", 3)
Output:
abcd
aabccdd
aaabccddd
Notes:
The character "*" in the line tmp = "*" * (n+1) can be any character that is not in the string; it's just a placeholder to handle the start case when there are no characters yet.
The print(tmp[n+1:]) line simply strips the "*" characters added at the beginning.

You don't need nested loops. Keep track of the current character and its count. Include characters when the count is less than or equal to n, and reset the current character and count when it changes.
def strcut(s,n):
    result = ''                       # resulting string
    char,count = '',0                 # initial character and count
    for c in s:                       # only loop once on the characters
        if c == char: count += 1      # increase count
        else: char,count = c,1        # reset character/count
        if count<=n: result += c      # include character if count is ok
    return result
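The same "count within each run" idea can also be sketched with the standard library's `itertools.groupby`, which splits a string into blocks of equal adjacent characters. This is an alternative formulation, not the code above; the name `strcut_groupby` is made up here.

```python
from itertools import groupby


def strcut_groupby(s, n):
    # groupby yields one (character, run) pair per block of equal
    # adjacent characters; keep at most n characters from each run.
    return "".join(char * min(n, len(list(run))) for char, run in groupby(s))


for n in (1, 2, 3):
    print(n, strcut_groupby("aaabccdddd", n))
# 1 abcd
# 2 aabccdd
# 3 aaabccddd
```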

Just to give some ideas, here is a different approach. I didn't like how the inner loop over n restarts for every i: even if I am at i=3 with n=2, I still move on to i=4 although that character was already checked while going through n. And since you are checking the next n characters in the string, your method doesn't fit well with keeping the characters in order. Here is a rough method that I find easier to read.
def strcut(string, n):
    for i in range(len(string)-1,0,-1):  # I go backwards assuming you want to keep the front characters
        if string.count(string[i]) > n:  # note: count() looks at the whole string, not only the current run
            string = remove(string,i)
    print(string)
def remove(string, i):
    if i > len(string):
        return string[:i]
    return string[:i] + string[i+1:]
strcut('aaabccdddd',2)
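One caveat with this idea: `str.count` counts every occurrence in the whole string, so the result only matches the "n times in a row" requirement when a character does not reappear later. A small check, using a returning variant of the function above (the name `cut_by_total_count` is made up for illustration):

```python
def cut_by_total_count(string, n):
    # Same idea as above, but returning the result: walk backwards and drop
    # a character while the WHOLE string still holds more than n copies of it.
    for i in range(len(string) - 1, 0, -1):
        if string.count(string[i]) > n:
            string = string[:i] + string[i + 1:]
    return string


print(cut_by_total_count("aaabccdddd", 2))  # aabccdd (each letter forms one run)
print(cut_by_total_count("aabaa", 2))       # aab (the later 'a's are dropped too)
```

For 'aabaa' with n = 2 no run is longer than 2, so a run-based version would leave the string unchanged.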

finding the minimum window substring

The problem says to create a string, take 3 non-consecutive characters from it to form a substring, and print the positions of the substring's first and last characters in the string.
str="subliminal"
sub="bmn"
n = len(str)-3
for i in range(0, n):
    print(str1[i:i+4])
    if sub1 in str1:
        print(sub1[i])
This should print 3 to 8, because b is the third letter and n is the 8th letter.
I also don't know how to make the code work for substrings that aren't 3 characters long without rewriting it entirely.

Not sure if this is what you meant. I assume that the substring is already valid, which means that it contains non-consecutive letters. Then I get the first and last letter of the substring and create a list of all the letters in the string using a list comprehension. Then I just loop through the letters and save where the first and last letter occur. If anything is missing, let me know.
sub = "bmn"
str = "subliminal"
first_letter = sub[0]
last_letter = sub[-1]
start = None
end = None
letters = [let for let in str]
for i, letter in enumerate(letters):
    if letter == first_letter:
        start = i
    if letter == last_letter:
        end = i
if start is not None and end is not None:  # 'if start and end' would fail for index 0
    print(f"From {start + 1} to {end + 1}.")  # Output: From 3 to 8.

Some recursion for good health:
def minimum_window_substring(strn, sub, beg=0, fin=0, firstFound=False):
    if len(sub) == 0 or len(strn) == 0:
        return f'From {beg + 1} to {fin}'
    elif strn[0] == sub[0]:
        return minimum_window_substring(strn[1:], sub[1:], beg, fin + 1, True)
    if not firstFound:
        beg += 1
    return minimum_window_substring(strn[1:], sub, beg, fin + 1, firstFound)
Explanation:
The base case is if we get our original string or our sub-string to be length 0, we then stop and print the beginning and the end of the substring in the original string.
If the first letter of the current string is equal then we start the counter (we fix the beginning "beg" with the flag "firstFound") Then increment until we finish (sub is an empty string / original string is empty)
Something to think about / More explanation:
If for example, you ask for the first occurrence of the substring, for example if the original string would be "sububusubulum" and the sub would equal to "sbl" then when we hit our first "s" - it means it would 100% start from there, because if another "sbl" is inside the original string - then it must contain the remaining letters, and so we would say they belong to the first s. (A horrible explanation, I am sorry) what I am trying to say is that if we have 2 occurrences of the substring - then we would pick the first one, no matter what.
Note: This function does not really care if the sub-string contains consecutive letters, also, it does not check whether the characters are in the string itself, because you said that we must be given characters from the original string. The positive thing about it, is that the function can be given more than (or less than) 3 characters long substring
When I say "original string" I mean subliminal (or other inputs)
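For comparison, the same left-to-right scan can be sketched without recursion. This is only an illustration of the idea described above, not a replacement for it: the name `window_positions` is made up, positions are 1-based as in the question, and it returns None when the characters cannot all be matched in order.

```python
def window_positions(text, sub):
    """Return 1-based (first, last) positions of sub as a subsequence of text,
    scanning left to right; None if sub cannot be matched in order."""
    first = None
    matched = 0                      # how many characters of sub are matched
    for position, ch in enumerate(text, start=1):
        if matched < len(sub) and ch == sub[matched]:
            if matched == 0:
                first = position     # where the first character was found
            matched += 1
            if matched == len(sub):
                return first, position
    return None


print(window_positions("subliminal", "bmn"))  # (3, 8)
```

Like the recursive version, it always starts from the first occurrence of the first character, so it does not search for the shortest possible window.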

There are many different ways you could do it;
here is a solution:
import re
def Func(String, SubString):
    # use the SubString parameter (not the global `sub`) to build the pattern
    patt = "".join([char + "[A-Za-z]" + "+" for char in SubString[:-1]] + [SubString[-1]])
    MatchedString = re.findall(patt, String)[0]
    FirstIndex = String.find(MatchedString) + 1
    LastIndex = FirstIndex + len(MatchedString) -1
    return FirstIndex, LastIndex
string="subliminal"
sub="bmn"
FirstIndex, LastIndex = Func(string, sub)
This will return 3, 8. You can change the length of the substring; it assumes you want only the first match.

Is it possible to achieve this without defining a function?

I'm trying to write a Python program which would take a string and print the longest substring in it which is also in alphabetical order. For example:
the_string = "abcdefgghhisdghlqjwnmonty"
The longest substring in alphabetical order here would be "abcdefgghhis"
I'm not allowed to define my own functions and can't use lists. So here's what I came up with:
def in_alphabetical_order(string):
    for letter in range(len(string) - 1):
        if string[letter] > string[letter + 1]:
            return False
    return True
s = "somestring"
count = 0
for char in range(len(s)):
    i = 0
    while i <= len(s):
        sub_string = s[char : i]
        if (len(sub_string) > count) and (in_alphabetical_order(sub_string)):
            count = len(sub_string)
            longest_string = sub_string
        i += 1
print("Longest substring in alphabetical order is: " + longest_string)
This obviously contains a function that is not built-in. How can I check whether the elements of the substring candidate is in alphabetical order without defining this function? In other words: How can I implement what this function does for me into the code (e.g. by using another for loop in the code somewhere or something)?

Just going by your code, you can move the operation of the function into the loop and use a variable to store what would have been the return value.
I would recommend listening to Bill the Lizard to help with the way you solve the problem.
s = "somestring"
count = 0
longest_string = ''
for char in range(len(s)):
    i = 0
    while i <= len(s):
        sub_string = s[char : i]
        in_order = True
        for letter in range(len(sub_string) - 1):
            if sub_string[letter] > sub_string[letter + 1]:
                in_order = False
                break
        if (len(sub_string) > count) and (in_order):
            count = len(sub_string)
            longest_string = sub_string
        i += 1
print("Longest substring in alphabetical order is: " + longest_string)

You don't need to check the whole substring with a function call to see if it is alphabetical. You can just check one character at a time.
Start at the first character. If the next character is later in the alphabet, keep moving along in the string. When you reach a character that's earlier in the alphabet than the previous character, you've found the longest increasing substring starting at the first character. Save it and start over from the second character.
Each time you find the longest substring starting at character N, check to see if it is longer than the previous longest substring. If it is, replace the old one.
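The description above can be sketched as a single pass that uses no helper function and no lists. This is one possible reading of it, with one assumption spelled out: equal neighbours (such as "gg") count as being in order, since the question's expected answer contains them.

```python
s = "abcdefgghhisdghlqjwnmonty"

longest = current = s[:1]            # s[:1] is '' for an empty string
for i in range(1, len(s)):
    if s[i] >= s[i - 1]:
        current += s[i]              # still in order: extend the run
    else:
        current = s[i]               # order broken: restart at this character
    if len(current) > len(longest):  # strict '>' keeps the earliest run on ties
        longest = current

print("Longest substring in alphabetical order is: " + longest)
# Longest substring in alphabetical order is: abcdefgghhis
```

Each character is compared only with its predecessor, so the work grows linearly with the length of s instead of checking every substring.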

Here's a solution based off of what you had:
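s = 'abcdefgghhisdghlqjwnmonty'
m, n = '', s[0]  # m: longest run so far, n: current run (starts with the first character)
for i in range(1, len(s)):
    if s[i] < s[i - 1]:
        if len(n) > len(m):
            m = n
        n = s[i]  # order broken: start a new run at s[i]
    else:
        n += s[i]
if len(n) > len(m):  # the last run may be the longest
    m = n
Output:
m
'abcdefgghhis'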

return longest alphabetical substring

The aim of the program is to print the longest substring within variable s that is in alphabetical order.
s ='abchae'
currentlen = 0
longestlen = 0
current = ''
longest = ''
alphabet = 'abcdefghijklmnopqrstuvwxyz'
for char in s:
    for number in range(0,len(s)):
    if s[number] == char:
        n = number
    nxtchar = 1
    alphstring = s[n]
    while alphstring in alphabet == True and n+nxtchar <= 5:
        alphstring += s[n+nxtchar]
        nxtchar += 1
        currentlen = len(alphstring)
        current = alphstring
        if currentlen > longestlen:
            longest = current
print longest
When run, the program doesn't print anything. I don't seem to see what's wrong with the code. Any help would be appreciated.

I'd use regex for this
import re
string = 'abchae'
alphstring = re.compile(r'a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*', re.I)
longest = ''
for match in alphstring.finditer(string):
    if len(match.group()) > len(longest):
        longest = match.group()
print(longest)
Output:
abch
Note: The flag re.I in the regex expression causes the regex to ignore case. If this is not the desired behavior you can delete the flag and it will only match lowercase characters.
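If typing out all 26 letters feels error-prone, the same pattern can be built from `string.ascii_lowercase`. This is a small variation on the answer above rather than a different technique; the function name `longest_alphabetical` is made up, and the `re.I` flag is left out, so only lowercase input is handled.

```python
import re
import string

# Builds 'a*b*c*...z*': any run of letters that never steps backwards.
pattern = re.compile("".join(letter + "*" for letter in string.ascii_lowercase))


def longest_alphabetical(text):
    # max() returns the first of several equally long matches
    return max((m.group() for m in pattern.finditer(text)), key=len)


print(longest_alphabetical("abchae"))  # abch
```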

Like Kasramvd said, I do not understand the logic behind your code. Are you sure your code runs without raising an IndentationError? As far as I can see, the following part has wrong indentation (the second row):
    for number in range(0,len(s)):
    if s[number] == char:
        n = number
If you fix that indentation error, your code runs without error, and the last row (print longest) does work; it just does not do what you expect: it only prints a blank line.

I think I understood what you meant.
First you need to fix the indentation problem in your code; that would make it run:
for number in range(0,len(s)):
    if s[number] == char:
        n = number
Second, that condition will match two positions, 0 and 4, since a appears twice in s. I believe you only want the first, so you should probably add a break statement after you find a match.
for number in range(0,len(s)):
    if s[number] == char:
        n = number
        break
Finally, alphstring in alphabet == True will always be False. Python chains comparison operators, so the expression is evaluated as (alphstring in alphabet) and (alphabet == True), and alphabet is never equal to True. You need parentheses to make this work, or simply remove the == True.
ex: while (alphstring in alphabet) == True and n+nxtchar <= 5:
I believe that you were looking for the string abch, which is what I managed to obtain with these changes.
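The behaviour of `alphstring in alphabet == True` can be checked in isolation. A short sketch with throwaway variable names:

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"
alphstring = "abc"

# 'in' and '==' are both comparison operators, so they chain:
# a in b == True  means  (a in b) and (b == True)
chained = alphstring in alphabet == True
spelled_out = (alphstring in alphabet) and (alphabet == True)
parenthesised = (alphstring in alphabet) == True

print(chained, spelled_out, parenthesised)  # False False True
print(alphstring in alphabet)               # True, no '== True' needed
```

Since `in` already produces a boolean, dropping `== True` is the simplest fix.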

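This is my solution:
result = ""
s = 'abchae'
alphabet = 'abcdefghijklmnopqrstuvwxyz'
max_length = 0
for i in range(len(s)):
    for j in range(i + 1, len(s) + 1):  # j runs to len(s) so the last character can be included
        # s[i:j] in alphabet is True only for runs of consecutive letters such as 'abc'
        if s[i:j] in alphabet and len(s[i:j]) > max_length:
            max_length = len(s[i:j])
            result = s[i:j]
print(result)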
