Python Range Error in While Loop - python

I am trying to write a python program that will take any string of lowercase letters and return the longest alphabetical substring within it. Below is a segment of the code.
s="abc" #sample string
anslist=[] #stores answers
shift=0 #shifts substring
expan=0 #expands substring
while len(s) >= 1+shift+expan: #within bounds of s
if s[0+shift+expan] > s[1+shift+expan]: #if not alphabetical
shift += 1 #moves substring over
else: #if alphabetical
while s[0+shift+expan] <= s[1+shift+expan]: #while alphabetical
expan += 1 #checks next letter
anslist += s[0+shift:2+shift+expan] #adds substring to ans
expan = 0 #resets expansion
When I run the code, the line containing
while s[0+shift+expan] <= s[1+shift+expan]:
creates an error that the string index is outside of the range. I see that adding to expan will put the index out of range, but shouldn't the largest while loop solve this? I appreciate any help.

First, why your code doesn't work:
You aren't protecting your inner loop against running off the end of the string
your indexes are off when "saving" the substring
you += onto anslist, which is not how you add strings to a list
you don't increment the shift after processing a substring, so when it clears expan it starts over at the same index and loops forever
Fixed code (inline comments explain changes):
s="abckdefghacbde" #sample string
anslist=[] #stores answers
shift=0 #shifts substring
expan=0 #expands substring
while len(s) > 1+shift+expan: #within bounds of s
if s[0+shift+expan] > s[1+shift+expan]: #if not alphabetical
shift += 1 #moves substring over
else: #if alphabetical
# Added guard for end of string
while len(s) > 1 + shift + expan and # While still valid
s[0+shift+expan] <= s[1+shift+expan]:# While alphabetical
expan += 1 #checks next letter
# Fixed string sublength and append instead of +=
anslist.append(s[0+shift:1+shift+expan]) #adds substring to ans
# Continue to next possible substring
shift += expan # skip inner substrings
expan = 0
print anslist
Results in:
['abck', 'defgh', 'ac', 'bde']
So the last step would be to find the one with the longest length, which I'll leave up to you, since this looks like homework.
To answer the question:
I see that adding to expan will put the index out of range, but shouldn't the largest while loop solve this?
It protects against your starting substring index from going off, but not your expansion. You must protect against both possibilities.

Have a look at this.
>>> import re
>>> words = []
>>> word = r'[a]*[b]*[c]*[d]*[e]*[f]*[g]*[h]*[i]*[j]*[k]*[l]*[m]*[n]*[o]*[q]*[r]*[s]*[t]*[u]*[v]*[x]*[y]*[z]*' # regex that would match sub-strings. Just extend it up to z.
>>> string = "bacde"
>>> for m in re.finditer(word, string):
... words.append(m.group())
>>> print(words) # list of substrings
['b', 'acde']
Then you can extract the largest string from the list of strings
>>> print(max(words, key=len))
acde

Related

finding the minimum window substring

the problem says to create a string, take 3 non-consecutive characters from the string and put it into a sub-string and print the which character the first one is and which character the last one is.
str="subliminal"
sub="bmn"
n = len(str)-3
for i in range(0, n):
print(str1[i:i+4])
if sub1 in str1:
print(sub1[i])
this should print 3 to 8 because b is the third letter and n is the 8th letter.
i also don't know how to make the code work for substrings that aren't 3 characters long without changing the code in total.
Not sure if this is what you meant. I assume that the substring is already valid, which means that it contains non consecutive letters. Then I get the first and last letter of the substring and create a list of all the letters in the string using a list comprehension. Then i just loop through the letters and save where the first and last letter occur. If anything is missing, hmu.
sub = "bmn"
str = "subliminal"
first_letter = sub[0]
last_letter = sub[-1]
start = None
end = None
letters = [let for let in str]
for i, letter in enumerate(letters):
if letter == first_letter:
start = i
if letter == last_letter:
end = i
if start and end:
print(f"From %s to %s." % (start + 1, end + 1)) # Output: From 3 to 8.
Some recursion for good health:
def minimum_window_substring(strn, sub, beg=0, fin=0, firstFound=False):
if len(sub) == 0 or len(strn) == 0:
return f'From {beg + 1} to {fin}'
elif strn[0] == sub[0]:
return minimum_window_substring(strn[1:], sub[1:], beg, fin + 1, True)
if not firstFound:
beg += 1
return minimum_window_substring(strn[1:], sub, beg, fin + 1, firstFound)
Explanation:
The base case is if we get our original string or our sub-string to be length 0, we then stop and print the beginning and the end of the substring in the original string.
If the first letter of the current string is equal then we start the counter (we fix the beginning "beg" with the flag "firstFound") Then increment until we finish (sub is an empty string / original string is empty)
Something to think about / More explanation:
If for example, you ask for the first occurrence of the substring, for example if the original string would be "sububusubulum" and the sub would equal to "sbl" then when we hit our first "s" - it means it would 100% start from there, because if another "sbl" is inside the original string - then it must contain the remaining letters, and so we would say they belong to the first s. (A horrible explanation, I am sorry) what I am trying to say is that if we have 2 occurrences of the substring - then we would pick the first one, no matter what.
Note: This function does not really care if the sub-string contains consecutive letters, also, it does not check whether the characters are in the string itself, because you said that we must be given characters from the original string. The positive thing about it, is that the function can be given more than (or less than) 3 characters long substring
When I say "original string" I mean subliminal (or other inputs)
There are many different ways you could do it,
here is a soultion,
import re
def Func(String, SubString):
patt = "".join([char + "[A-Za-z]" + "+" for char in sub[:-1]] + [sub[-1]])
MatchedString = re.findall(patt, String)[0]
FirstIndex = String.find(MatchedString) + 1
LastIndex = FirstIndex + len(MatchedString) -1
return FirstIndex, LastIndex
string="subliminal"
sub="bmn"
FirstIndex, LastIndex = Func(string, sub)
This will return 3, 8 and you could change the length of the substring, and assuming you want just the first match only

how to reset counter when looking for a sequence of strings properly

so i am doing the cs50 dna problem and i am having a hard time with the counter as i don't know how to code to properly count the highest amount of times the sequence i am looking for has repeated without another sequence in between the two. for example i am looking for the sequence AAT and the text is AATDHDHDTKSDHAATAAT so the highest amount should be two as the last two sequence is AAT and there is no sequence between them.
here is my code:
text="TCTAGTCTAGTCTAGTCTAGTCTAGACTTGTCGCTGACTCCGAGAAGATCCTAACATTAACCAATTCCCCCTAGTCTGAGGCACGGTTACCGATCGGGTTAATGGATCTCTCACCGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAAACGTGTAACTGTAATAATCCGCCCGAAAAAACTGATCTTAGGGTTGCGGCATCTGCACGTGACAGTGTGCTACTGTTAGATAGAGGGATCAAACGAGGTTGCAAGGATTATATCTCTCCGTGCTCGATAAGACACAGCCGGTTGCGGGCTGCTTCCTCTGGATCCAATGCAGCCGTACGTACACCGTAGAGCAAATTTAGTGGTAAAGGAACTTGCTCAAACACTACGGCTTCGGGCTACTGTTGGCGCCGGTTGGGGATCCCATTCAACGCTGGCCCTTTCGCTATGGTTCGGTGATTTTACACCGAAGCGAACCTTGAACCGTGGATTTCGGGTGTCCTCCGTTTTTAGGTACTGCGTGCAGACATGGGCACCTGCCATAGTGCGATCAGCCAGAATCCATTGTATGGGAGTTGGACTCGTTTGAATTTACCGGAAACCTCATGCTTGGTCTGTAGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGAAACTGGGCGACTTGAAGTCGGCTTGCGTATTAATAGCTCTGCAATGTAACTCGGCCCTTGGCGGCGGGCAGCTTAGTATTGAACCGCGACACACCATAGGTGCGGCAAATATTAAAAGTACGCTCGAACCGGAACCTGTCTCCATGACTGGACGACCAGCCCGGCGTCTTCTACGTAACACAGGGGGCTGTCGAGGTAGGGCGTAGGAACTTCGGGGTCACTACGCCGTAACAGCACCGAATATCATATCATCCAACTTGCTTGGTACATGCCCCGTTCTGTATCAAAAGTTTACGGCCCCGGACATACCTGCTGTCAGTTGAATACCTATGCGAGTCTGAAACACGAATAGTTCAGGCGTGCAAAGACACGCTAAGCACACGCCGCAGGCAGGGGGGGTATTTTATAAGTCGTTTTTTGGAAGGGTAATGTAAACTTATCCCATAATACCCTTTGGCTTCCCCTCACTCGTGCACTTCTCATAATGATACGTCAGGGTGATTGTAGATTCACGCGTCATCAGATTGTCCCTTTCTCGAGTCTTAGTATCTTTCCTAATCCGCTCGACTCTGCGCCATGATCGAATTCCTGACAGGCTACAAGAATAAACTGCCAGCATACTCCTTACCGATTGGCGCCTACTAATTATACGCACATGGGCATCTTCGACGTCTAAACATAGGCTCTTAGTATTCCGTAGGATGTTGAGCCGACAGGAAAGTCAAACGTCGTGGGTGACCGTAGCCTGACTCGCCCGACGCAGGATTCGCTCATATGTGTGAACGGATGCTTATGTAACTTCCTAATTGCAGCGAATGGCAGTTCCGTAGTGAAGGTTCGAAACGTACGGGGTCCGGCCATGGATTAGATCTTTCAGTGCGCTAAACTCTTAACCGCAGATACTTGGCGGACCATCTTCGTGTTGCTACTATGGTATAGACCAGGCTGTCGAATCTACTTAACACAGGTGAACCCCCAGATCGGCTAGAGCCTTCGAGGCTAGACCTTTAACAATCTTTAGACACTTCCAAATCGCGGCCGGATATGTCTCGTTGGCAGCCGCAGACAAGAGAAGAGGGTCGGCAGTGTCTGCCACGCGTGACCTGTATGATCTTAGCCTTTAAGATCACACTACTGATCACAATCTATTATGATTGCCTTAGCTAACTGAGTGATGCACCCCCACAGGCTGAGAGAAATCTGTAGTTTGACGACACGCCGTCTGGCTAAAAATGTGAATCCGCCGATCCGAGACGGTGGAAGCTTGAGACCAAATGCGGGAAACCAATGACTTCATTACGGAACAAGACATAACGGCGTGAGTTGACGACTGGGATTAACCCTTTCCCGAGTCTGTACTTCTGCTACACAATGAGGATGCGAATTATCTAAGACCTTGTACTACCTAAACTAACCCTGAGGCGGGCATTGAATTCCGGCCATCTTCAGCCCAAAGAAAGACCAAATGTGAGGAAAATGAGGGATCGGTATAAGCTTTTCACGATCTCAAGGTTCACGGCCGCCAGGGCCGTAGTTGGGGCTTCATGCACATTGCCAACCCGGACATCGACAGTCGGTACCGCAGGGGTTCGAGGAATACTCCCAGCTGTGACACCTGGTCGTCGACTGGACCCAGCTGGTGGGCGGCATAGGTAGTTAATACTGAATTAAAGCCGGGAACGTCTCTCTAACTAGAAACCTTGTGATAGGATACACAGACCTAGTGCCCCGACGTTAGCATTTGAATTCATCTATCTTGGCGTCTTTTAGTAGGCCTGGGTCAACTCCGGCGTTGGCCAAAATAACCGATCTGCGTTATGTGGCCACGCATCGAGTGACAGGGTGCATACAAATTGATGGTCAAAGAGTTTAAACAAGACAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCCCCACGCTTCTACATAGCCACACTGGAGCTAGTCCTCGTGTTAAATTTTTCGCTTGTTGCACGGTTATCATCAGAAGTGCCACTGGTATTCCTCTGTAGCTCCCGTATGCCGAAGGTTGCGGCTTAGGTACTGCTTATACACGTCTCTCAAGTTTGTCAGCCGCGTGATCTTTCTGCGGGGATAGGTGATCGTCCCTCGCTCCGGACATTGCATTAAAATTACCTAGTTGATAGGGCGGCGGAGTTGCATACCGGCGTTCAATCGCGGCTCCAGACTGGTTTGAGCTACGCGTCTGCCAGCGTGAAAAAGCTGATTTGTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCAGGTATTATCATTTGAATCGTATGTTTTCTGCCGTACGTCAACTGCGTCGTCGGGGACTGAAATGGTCTGCCTCCAGACCCTTACCTCCCGATAAGCCATGACTAAGTATGTGAAGGATCACCTGAATTGCTGAAAGTTAACGGTAAGATATCTGAAAGAGCTCATTAGATCCAACACTTATCTACTCAAAAATTCGTCATATTTCGGTGACTTGCTAGAAAGGCTCTTGCACAGTAAGGTTATAGAGAATGCTACCGTTGAAGCACCAGCCGTTGAAGCCCGCCTTTAACCACGCGATATATCCAATTAACCAAGGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTCGCCTTGTAATAATTACTTTGGCCCGGATTATAACGAAGGAACTCGCCATGAACTCGCAGCACGTTGTACTGGAACAATCTACTTTTTATAATATAGCGATAACTCCCAGCTTTTATGTGGGTGATATTGTCCTAGCTTTTTAAAGATACCCTCTGGCCCGGTCCAAGTAAGGTCCACATTGCCTGACGTAAGCGTACGGTCAACGGGTGCACCGGTTCCCGCTAAAGCTCGATCCTATTCTTTCAGTCGGGGGGAAATAAACTCGTATACTCTCCACCCACCCGTACGTCCCGGACTAGAATAACTACCGGGTATTTCCGGTTCGTAACACCACGCCATGACGTGTCAACATAAACGCTTCTTTTGAAAGGTGCACATGCAGATTGCACAAGCAGCAGGCACCGCCCTTATCCATATCCTGTTGAGGCCCTCGATCCTAGTGTTCCTTGTTATCAGGATATTTTCTCGCTGTACGTTATTGTCCTTTTCAAATTACAACTGACCGCTTCCTCACCCGCTAAACCCTACCTTACGCACAACCAAGGCCTTGTCCCGGATGAACCCGGCTGCTCCTATGGATAAGCAACCCAGCCCGGCAGTTTACTTCAGGTGTTATCGTCGACTGACACCCTCAGCTTTCTCCCATTACACAGCGAGTATTTTCCGCGTAGCAATGGCAGTGACTTTGAGCGCACACTCAGAAGCCGTTGGAATGGCACCGGGGACGGCCCGATTTAGCCCCGCACACCTCCTGGAATCTTAGATCGCACGGCGATCTCGGTTCAGGCACCAACCCCAAAGAGTGTTTTGAGTTTTTGGTATGGCTCGCCTCAATTATCGGTTTTCGCTGCTCTGTGCCTGTCAACTCGGCTAGCTGTCGTGTTTTGTCGATCAGTGCGTGGACACTCTCGGTCGATGGTCGTGGATGGGACTGTAGTAAGTTTCACCGAAGCAGGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAACTTCGCTTCATATAACGTAGCCATAGTGCTGTCTGCCATCAATAAGTCTTGCTCAGTGGTGCATACGTCGGGGAGGTTTGTTCCGCCTGGTCAGAACGAGTCTAGGGCGAGCCTATAGGCCAGTCGAGAGCCAAGATTCTATGAAATTAATACGACTACTGGGTGAGAGGTCATACAATTCCCGTGGAATCTGTACCTAAGATATTTCCAGATAGGGATGGCTACTGGTTAAGTTGACAGTGTCTAGATACGTGAGAGCACCTGAGAGGACGCCACGAGTCGGAGCGTGGGCGATCACCCTTCTGAGTCATAAGTCATGTCTATATATCCCTCACTAAAAAGGGCACACGACTATACATGCTTGAGCTTTACGGTCTGGCATGTGGAATGCCCGGAGCAACCCAGTCTTACCATCCTTTACGTACATTTACCGACCCGGCAGTGGCCGGCGCGGAAACCCAGGAGAACGTCGGTCATGATACGCGCCCTCCGCCGAAAGCGTGCTCACACCTCAGGATATCAGCGCTATTACCGGACGTCCCGCGTCCACCATCTAATAATTCAGGTGCTCCTAATAAGTGGGCTGGAGAGCGAGGATTGATATACGTTGAGGAGCTCCGACGGCCCTCTCGTGCGTTTGATGTAGATTGCGTTACCGACGGAGCACGCGTTTGTCAATTTCTGTCTAGGGACGTTTATGTCCTCAATACGAATACCAGGCCTATTTTAGTGTACAAATCACTTAGCAGTCGGAATTGGAAACCTGATGGAAGCGT"
counter=0
length=len(text)
search="AGATC"
tmp=0
for i in range(length):
if text[i:i + len(search)] == search:
tmp += 1
if tmp > counter:
counter = tmp
if text[i:i + len(search)] != search:
tmp = 0
print("done")
print(counter)
try this
import re
sequence = "AATDHDHDTKSDHAATAAT"
matches = re.findall(r'(?:AAT)+', sequence)
largest = max(matches, key=len)
print(len(largest)//len('AAT'))
basically this way will find you the list of substrings in the string you are have then you choose the largest substring. The number of occurrences of the substring will be the length of the largest divide by the length of substring
First and foremost, the regex solution is the Python way to solve this. However, if you want your code repaired ...
The problem with your code is that your index fails to acknowledge that you've found a match. You have no way to recognize consecutive occurrences.
Consider the case where you've found the start of a triple-match, AATAATAAT. You get to the first A, recognize, the AAT and increment tmp. You go to the next loop iteration, and now i points at the second A. You see that it's not AAT here (it's ATA, spanning the first two occurrences), so you record one instance and reset all your status variables.
Instead, you have to jump to the end of the first match and look for a second. Since your index does not move smoothly by increments of 1, you'll want a while loop instead.
Please learn to use meaningful variable names where the variables have
any meaning. i is fine if all it does is to manage your loop. As
soon as you use it for anything else, give it a real name. Similarly,
tmp and count really need replacing.
snip_size = len(search)
pos = 0 # position in the genetic sequence
rep = 0 # number of consecutive repetitions
max_rep = 0 # longest repetition sequence found
while pos < length:
if text[pos:pos + snip_size] == search:
rep += 1
pos += snip_size
else:
max_rep = max(max_rep, rep)
rep = 0
pos += 1
print(max_rep, "repetitions found")
Output:
15 repetitions found

Largest substring of non-repeating letters of a string

From the beginning I want to point out, that I am using Python Language. In this question I initially have a string. For example 'abcagfhtgba'. I need to find the length of the largest substring of non-repeating letters. In the case provided above it is 'agfht' (5), because at position [4] the 'a' repeats, so we start the count from the begining.
My idea for this question is to create a dictionary, which stores letters as keys, and numbers of their appearances as values. Whenever any key has corresponding value 2, we append the length of the dictionary to the list named result and completely substitute it with an empty list. For some tests this approach holds, for some not. I will provide the code that I used with brief comments of explanation.
Here I store the input in form of a list
this = list(map(str, input()))
def function(list):
dict = {}
count = 0
result = [1]
Here I start the loop and for every element if it is not in the keys I create a key
with value 1. If the element is in the dictionary I substitute the dict with the empty one. I don't forget to store the first repeating element in a new dictionary and do this. Another important point is at the end to append the count after the loop. Because the tail of the string (if it has the largest non-repeating sequence of letters) should be considered.
for i in range(len(list)):
if list[i] not in dict:
dict[list[i]] = 1
count += 1
elif list[i] in dict:
dict = {}
dict[list[i]] = 1
result.append(count)
count = 1
result.append(count)
print(result)
return max(result)
Here i make my function to choose choose the largest between the string and the inverse of it, to deal with the cases 'adabc', where the largest substring is at the end.
if len(this) != 0:
print(max(function(this), function(this[::-1])))
else:
print('')
I need help of people to tell me where in the approach to the problem I am wrong and edit my code.
Hopefully you might find this a little easier. The idea is to keep track of the seen substrings up to a given point in a set for a faster lookup, and if the current value is contained, build the set anew and append the substring seen up to that point. As you mention you have to check whether the last values have been added or not, hence the final if:
s = 'abcagfhtgba'
seen = set()
out = []
current_out = []
for i in s:
if i not in seen:
current_out += i
seen.update(i)
else:
seen = set(i)
out.append(''.join(current_out))
current_out = [i]
if current_out:
out.append(''.join(current_out))
max(out, key=len)
# 'agfht'
So some key differences:
Iterate over the string itself, not a range
Use sets rather than counts and dictionaries
Remember the last duplicate you have seen, maintain a map of letter to index. If you have already seen then this is duplicate, so we need to reset the index. But index can be this new one or just after the last duplicate character is seen.
s = 'abcagfhtgba'
seen = dict()
longest = ""
start = 0
last_duplicate = 0
for i, c in enumerate(s):
if seen.has_key(c):
if len(longest) < (i - start + 1):
longest = s[start:i]
new_start = seen.get(c) + 1
if last_duplicate > new_start:
start = i
else:
start = new_start
last_duplicate = i
seen[c] = I
if len(longest) < (len(s) - start + 1):
longest = s[start:]
print longest

I want to create a program that finds the longest substring in alphabetical order using python using basics

my code for finding longest substring in alphabetical order using python
what I mean by longest substring in alphabetical order?
if the input was"asdefvbrrfqrstuvwxffvd" the output wil be "qrstuvwx"
#we well use the strings as arrays so don't be confused
s='abcbcd'
#give spaces which will be our deadlines
h=s+' (many spaces) '
#creat outputs
g=''
g2=''
#list of alphapets
abc='abcdefghijklmnopqrstuvwxyz'
#create the location of x"the character the we examine" and its limit
limit=len(s)
#start from 1 becouse we substract one in the rest of the code
x=1
while (x<limit):
#y is the curser that we will move the abc array on it
y=0
#putting our break condition first
if ((h[x]==' ') or (h[x-1]==' ')):
break
for y in range(0,26):
#for the second character x=1
if ((h[x]==abc[y]) and (h[x-1]==abc[y-1]) and (x==1)):
g=g+abc[y-1]+abc[y]
x+=1
#for the third to the last character x>1
if ((h[x]==abc[y]) and (h[x-1]==abc[y-1]) and (x!=1)):
g=g+abc[y]
x+=1
if (h[x]==' '):
break
print ("Longest substring in alphabetical order is:" +g )
it doesn't end,as if it's in infinite loop
what should I do?
I am a beginner so I want some with for loops not functions from libraries
Thanks in advance
To avoid infinite loop add x += 1 in the very end of your while-loop. As a result your code works but works wrong in general case.
The reason why it works wrong is that you use only one variable g to store the result. Use at least two variables to compare previous found substring and new found substring or use list to remember all substrings and then choose the longest one.
s = 'abcbcdiawuhdawpdijsamksndaadhlmwmdnaowdihasoooandalw'
longest = ''
current = ''
for idx, item in enumerate(s):
if idx == 0 or item > s[idx-1]:
current = current + item
if idx > 0 and item <= s[idx-1]:
current = ''
if len(current)>len(longest):
longest = current
print(longest)
Output:
dhlmw
For your understanding 'b'>'a' is True, 'a'>'b' is False etc
Edit:
For longest consecutive substring:
s = 'asdefvbrrfqrstuvwxffvd'
abc = 'abcdefghijklmnopqrstuvwxyz'
longest = ''
current = ''
for idx, item in enumerate(s):
if idx == 0 or abc.index(item) - abc.index(s[idx-1]) == 1:
current = current + item
else:
current = item
if len(current)>len(longest):
longest = current
print(longest)
Output:
qrstuvwx
def sub_strings(string):
substring = ''
string +='\n'
i = 0
string_dict ={}
while i < len(string)-1:
substring += string[i]
if ord(substring[-1])+1 != ord(string[i+1]):
string_dict[substring] = len(substring)
substring = ''
i+=1
return string_dict
s='abcbcd'
sub_strings(s)
{'abc': 3, 'bcd': 3}
To find the longest you can domax(sub_strings(s))
So here which one do you want to be taken as the longest??. Now that is a problem you would need to solve
You can iterate through the string and keep comparing to the last character and append to the potentially longest string if the current character is greater than the last character by one ordinal number:
def longest_substring(s):
last = None
current = longest = ''
for c in s:
if not last or ord(c) - ord(last) == 1:
current += c
else:
if len(current) > len(longest):
longest = current
current = c
last = c
if len(current) > len(longest):
longest = current
return longest
so that:
print(longest_substring('asdefvbrrfqrstuvwxffvd'))
would output:
qrstuvwx

Most frequent character in Python 3.3

This program lets the user enter a string and displays the character that appears most frequently in a string.
I need help explaining frequent = i.
# This program displays the character that appears most frequently in the string
def main():
# Local variables.
count = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
index = 0
frequent = 0
# Get input.
user_string = input('Enter a string: ')
for ch in user_string:
ch = ch.upper()
# Determine which letter this character is.
index = letters.find(ch)
if index >= 0:
# Increase counting array for this letter.
count[index] = count[index] + 1
# Please help me explain this entire part!
for i in range(len(count)):
if count[i] > count[frequent]:
frequent = i
print('The character that appears most frequently' \
' in the string is ', letters[frequent], '.', \
sep='')
# Call main
main()
The code snippet in question:
for i in range(len(count)):
if count[i] > count[frequent]:
frequent = i
First the for loop iterates over the length of count which is 26.
The if statement:
if count[i] > count[frequent]:
Checks to see if the current letter in the for loop is larger than the current most frequent character. If it is then it sets the new most frequent character as the index of the for loop.
For example,
If A is referenced 12 times and B is referenced 14 then on the second loop when i = 1 the if statement would look like this:
if 12 > 14:
frequent = 1
This sets frequent to 1 which can be used to find the frequency in count for ex.
count[1] == 14
There are 26 different items in the list count, and 26 letters in the charset. It iterates through the count list for each item (that's the for i in range (len(count)) part) and then sees if the value of that item is greater than the value of the current largest item it's found - simply speaking it finds the largest value in the array, but instead of getting the value it gets the index, frequent = i is setting the index of the largest value currently found as it iterates to the variable frequent. It's simpler and more pythonistic to simply do
frequent = index(max(count)
which has EXACTLY the same effect

Categories

Resources