From the beginning I want to point out, that I am using Python Language. In this question I initially have a string. For example 'abcagfhtgba'. I need to find the length of the largest substring of non-repeating letters. In the case provided above it is 'agfht' (5), because at position [4] the 'a' repeats, so we start the count from the begining.
My idea for this question is to create a dictionary, which stores letters as keys, and numbers of their appearances as values. Whenever any key has corresponding value 2, we append the length of the dictionary to the list named result and completely substitute it with an empty list. For some tests this approach holds, for some not. I will provide the code that I used with brief comments of explanation.
Here I store the input in form of a list
this = list(map(str, input()))
def function(list):
dict = {}
count = 0
result = [1]
Here I start the loop and for every element if it is not in the keys I create a key
with value 1. If the element is in the dictionary I substitute the dict with the empty one. I don't forget to store the first repeating element in a new dictionary and do this. Another important point is at the end to append the count after the loop. Because the tail of the string (if it has the largest non-repeating sequence of letters) should be considered.
for i in range(len(list)):
if list[i] not in dict:
dict[list[i]] = 1
count += 1
elif list[i] in dict:
dict = {}
dict[list[i]] = 1
result.append(count)
count = 1
result.append(count)
print(result)
return max(result)
Here i make my function to choose choose the largest between the string and the inverse of it, to deal with the cases 'adabc', where the largest substring is at the end.
if len(this) != 0:
print(max(function(this), function(this[::-1])))
else:
print('')
I need help of people to tell me where in the approach to the problem I am wrong and edit my code.
Hopefully you might find this a little easier. The idea is to keep track of the seen substrings up to a given point in a set for a faster lookup, and if the current value is contained, build the set anew and append the substring seen up to that point. As you mention you have to check whether the last values have been added or not, hence the final if:
s = 'abcagfhtgba'
seen = set()
out = []
current_out = []
for i in s:
if i not in seen:
current_out += i
seen.update(i)
else:
seen = set(i)
out.append(''.join(current_out))
current_out = [i]
if current_out:
out.append(''.join(current_out))
max(out, key=len)
# 'agfht'
So some key differences:
Iterate over the string itself, not a range
Use sets rather than counts and dictionaries
Remember the last duplicate you have seen, maintain a map of letter to index. If you have already seen then this is duplicate, so we need to reset the index. But index can be this new one or just after the last duplicate character is seen.
s = 'abcagfhtgba'
seen = dict()
longest = ""
start = 0
last_duplicate = 0
for i, c in enumerate(s):
if seen.has_key(c):
if len(longest) < (i - start + 1):
longest = s[start:i]
new_start = seen.get(c) + 1
if last_duplicate > new_start:
start = i
else:
start = new_start
last_duplicate = i
seen[c] = I
if len(longest) < (len(s) - start + 1):
longest = s[start:]
print longest
Related
I am trying to find the max character count in a string using loop. So far, this is the code i have written :-
def max_char_count(string):
max_char = ''
max_count = 0
for char in string:
count = string.count(char)
if count > max_count:
max_count = count
max_char = char
return max_char
print(max_char_count('apple hellooo'))
But the issue i am running into is even though there are 3 l and 3 o. I am only getting the output as l. How can i adjust the code to show the right count for the characters? Thank you.
Your approach is inefficient because you need to iterate over the whole string to compute string.count(char) while you iterate over the entire string anyway, which gives a time complexity of O(n^2)
Instead, I suggest you calculate the counts of each character once looping through the string, and then select the one(s) with the maximum count.
def max_char_count(string):
ret = []
counter = dict() # create an empty dict to store the character counts
# Itrate over the string once to count the characters
# N iterations
for char in string:
# `.get()` returns the given default value (0) if the `char` key doesn't exist
# If it does, it returns the value for that key
# Then you increment it
counter[char] = counter.get(char, 0) + 1
# Find the max value
# (another loop over the dict, worst-case N iterations)
max_count = max(counter.values())
# Iterate over the dict one last time to get the keys that have a value == max_count
# Again, worst-case N iterations
for char, count in counter.items():
if count == max_count:
ret.append((char, count))
return ret
Now, print(max_char_count('apple hellooo')) returns [('o', 3), ('l', 3)]
If you don't want to reinvent the counting wheel, use collections.Counter instead. counter = collections.Counter(string)
Since we have three loops and none of them are nested loops, we get a time complexity of O(3*n) (or just O(n))
You can use a dictionary of characters to count the occurrences, then find the largest count and return all keys that have that count. The fromkeys() constructor will allow you to initialize the letter counts to zero thus making the counting loop much simpler. Selecting the letters with the maximum count can be done using a list comprehension.
This will compute the result in very few lines of code:
string = 'apple hellooo'
counts = dict.fromkeys(string,0) # initialize counts to zero
for c in string: counts[c] += 1 # compute characters counts
max_count = max(counts.values()) # find maximum count
result = [c for c,n in counts.items() if n==max_count] # matching characters
print(result)
['o', 'l']
Here I have a string in a list:
['aaaaaaappppppprrrrrriiiiiilll']
I want to get the word 'april' in the list, but not just one of them, instead how many times the word 'april' actually occurs the string.
The output should be something like:
['aprilaprilapril']
Because the word 'april' occurred three times in that string.
Well the word actually didn't occurred three times, all the characters did. So I want to order these characters to 'april' for how many times did they appeared in the string.
My idea is basically to extract words from some random strings, but not just extracting the word, instead to extract all of the word that appears in the string. Each word should be extracted and the word (characters) should be ordered the way I wanted to.
But here I have some annoying conditions; you can't delete all the elements in the list and then just replace them with the word 'april'(you can't replace the whole string with the word 'april'); you can only extract 'april' from the string, not replacing them. You can't also delete the list with the string. Just think of all the string there being very important data, we just want some data, but these data must be ordered, and we need to delete all other data that doesn't match our "data chain" (the word 'april'). But once you delete the whole string you will lose all the important data. You don't know how to make another one of these "data chains", so we can't just put the word 'april' back in the list.
If anyone know how to solve my weird problem, please help me out, I am a beginner python programmer. Thank you!
One way is to use itertools.groupby which will group the characters individually and unpack and iterate them using zip which will iterate n times given n is the number of characters in the smallest group (i.e. the group having lowest number of characters)
from itertools import groupby
'aaaaaaappppppprrrrrriiiiiilll'
result = ''
for each in zip(*[list(g) for k, g in groupby('aaaaaaappppppprrrrrriiiiiilll')]):
result += ''.join(each)
# result = 'aprilaprilapril'
Another possible solution is to create a custom counter that will count each unique sequence of characters (Please be noted that this method will work only for Python 3.6+, for lower version of Python, order of dictionaries is not guaranteed):
def getCounts(strng):
if not strng:
return [], 0
counts = {}
current = strng[0]
for c in strng:
if c in counts.keys():
if current==c:
counts[c] += 1
else:
current = c
counts[c] = 1
return counts.keys(), min(counts.values())
result = ''
counts=getCounts('aaaaaaappppppprrrrrriiiiiilll')
for i in range(counts[1]):
result += ''.join(counts[0])
# result = 'aprilaprilapril'
How about using regex?
import re
word = 'april'
text = 'aaaaaaappppppprrrrrriiiiiilll'
regex = "".join(f"({c}+)" for c in word)
match = re.match(regex, text)
if match:
# Find the lowest amount of character repeats
lowest_amount = min(len(g) for g in match.groups())
print(word * lowest_amount)
else:
print("no match")
Outputs:
aprilaprilapril
Works like a charm
Here is a more native approach, with plain iteration.
It has a time complexity of O(n).
It uses an outer loop to iterate over the character in the search key, then an inner while loop that consumes all occurrences of that character in the search string while maintaining a counter. Once all consecutive occurrences of the current letter have been consumes, it updates a the minLetterCount to be the minimum of its previous value or this new count. Once we have iterated over all letters in the key, we return this accumulated minimum.
def countCompleteSequenceOccurences(searchString, key):
left = 0
minLetterCount = 0
letterCount = 0
for i, searchChar in enumerate(key):
while left < len(searchString) and searchString[left] == searchChar:
letterCount += 1
left += 1
minLetterCount = letterCount if i == 0 else min(minLetterCount, letterCount)
letterCount = 0
return minLetterCount
Testing:
testCasesToOracles = {
"aaaaaaappppppprrrrrriiiiiilll": 3,
"ppppppprrrrrriiiiiilll": 0,
"aaaaaaappppppprrrrrriiiiii": 0,
"aaaaaaapppppppzzzrrrrrriiiiiilll": 0,
"pppppppaaaaaaarrrrrriiiiiilll": 0,
"zaaaaaaappppppprrrrrriiiiiilll": 3,
"zzzaaaaaaappppppprrrrrriiiiiilll": 3,
"aaaaaaappppppprrrrrriiiiiilllzzz": 3,
"zzzaaaaaaappppppprrrrrriiiiiilllzzz": 3,
}
key = "april"
for case, oracle in testCasesToOracles.items():
result = countCompleteSequenceOccurences(case, key)
assert result == oracle
Usage:
key = "april"
result = countCompleteSequenceOccurences("aaaaaaappppppprrrrrriiiiiilll", key)
print(result * key)
Output:
aprilaprilapril
A word will only occur as many times as the minimum letter recurrence. To account for the possibility of having repeated letters in the word (for example, appril, you need to factor this count out. Here is one way of doing this using collections.Counter:
from collections import Counter
def count_recurrence(kernel, string):
# we need to count both strings
kernel_counter = Counter(kernel)
string_counter = Counter(string)
# now get effective count by dividing the occurence in string by occurrence
# in kernel
effective_counter = {
k: int(string_counter.get(k, 0)/v)
for k, v in kernel_counter.items()
}
# min occurence of kernel is min of effective counter
min_recurring_count = min(effective_counter.values())
return kernel * min_recurring_count
so i am doing the cs50 dna problem and i am having a hard time with the counter as i don't know how to code to properly count the highest amount of times the sequence i am looking for has repeated without another sequence in between the two. for example i am looking for the sequence AAT and the text is AATDHDHDTKSDHAATAAT so the highest amount should be two as the last two sequence is AAT and there is no sequence between them.
here is my code:
text="TCTAGTCTAGTCTAGTCTAGTCTAGACTTGTCGCTGACTCCGAGAAGATCCTAACATTAACCAATTCCCCCTAGTCTGAGGCACGGTTACCGATCGGGTTAATGGATCTCTCACCGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAAACGTGTAACTGTAATAATCCGCCCGAAAAAACTGATCTTAGGGTTGCGGCATCTGCACGTGACAGTGTGCTACTGTTAGATAGAGGGATCAAACGAGGTTGCAAGGATTATATCTCTCCGTGCTCGATAAGACACAGCCGGTTGCGGGCTGCTTCCTCTGGATCCAATGCAGCCGTACGTACACCGTAGAGCAAATTTAGTGGTAAAGGAACTTGCTCAAACACTACGGCTTCGGGCTACTGTTGGCGCCGGTTGGGGATCCCATTCAACGCTGGCCCTTTCGCTATGGTTCGGTGATTTTACACCGAAGCGAACCTTGAACCGTGGATTTCGGGTGTCCTCCGTTTTTAGGTACTGCGTGCAGACATGGGCACCTGCCATAGTGCGATCAGCCAGAATCCATTGTATGGGAGTTGGACTCGTTTGAATTTACCGGAAACCTCATGCTTGGTCTGTAGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGAAACTGGGCGACTTGAAGTCGGCTTGCGTATTAATAGCTCTGCAATGTAACTCGGCCCTTGGCGGCGGGCAGCTTAGTATTGAACCGCGACACACCATAGGTGCGGCAAATATTAAAAGTACGCTCGAACCGGAACCTGTCTCCATGACTGGACGACCAGCCCGGCGTCTTCTACGTAACACAGGGGGCTGTCGAGGTAGGGCGTAGGAACTTCGGGGTCACTACGCCGTAACAGCACCGAATATCATATCATCCAACTTGCTTGGTACATGCCCCGTTCTGTATCAAAAGTTTACGGCCCCGGACATACCTGCTGTCAGTTGAATACCTATGCGAGTCTGAAACACGAATAGTTCAGGCGTGCAAAGACACGCTAAGCACACGCCGCAGGCAGGGGGGGTATTTTATAAGTCGTTTTTTGGAAGGGTAATGTAAACTTATCCCATAATACCCTTTGGCTTCCCCTCACTCGTGCACTTCTCATAATGATACGTCAGGGTGATTGTAGATTCACGCGTCATCAGATTGTCCCTTTCTCGAGTCTTAGTATCTTTCCTAATCCGCTCGACTCTGCGCCATGATCGAATTCCTGACAGGCTACAAGAATAAACTGCCAGCATACTCCTTACCGATTGGCGCCTACTAATTATACGCACATGGGCATCTTCGACGTCTAAACATAGGCTCTTAGTATTCCGTAGGATGTTGAGCCGACAGGAAAGTCAAACGTCGTGGGTGACCGTAGCCTGACTCGCCCGACGCAGGATTCGCTCATATGTGTGAACGGATGCTTATGTAACTTCCTAATTGCAGCGAATGGCAGTTCCGTAGTGAAGGTTCGAAACGTACGGGGTCCGGCCATGGATTAGATCTTTCAGTGCGCTAAACTCTTAACCGCAGATACTTGGCGGACCATCTTCGTGTTGCTACTATGGTATAGACCAGGCTGTCGAATCTACTTAACACAGGTGAACCCCCAGATCGGCTAGAGCCTTCGAGGCTAGACCTTTAACAATCTTTAGACACTTCCAAATCGCGGCCGGATATGTCTCGTTGGCAGCCGCAGACAAGAGAAGAGGGTCGGCAGTGTCTGCCACGCGTGACCTGTATGATCTTAGCCTTTAAGATCACACTACTGATCACAATCTATTATGATTGCCTTAGCTAACTGAGTGATGCACCCCCACAGGCTGAGAGAAATCTGTAGTTTGACGACACGCCGTCTGGCTAAAAATGTGAATCCGCCGATCCGAGACGGTGGAAGCTTGAGACCAAATGCGGGAAACCAATGACTTCATTACGGAACAAGACATAACGGCGTGAGTTGACGACTGGGATTAACCCTTTCCCGAGTCTGTACTTCTGCTACACAATGAGGATGCGAATTATCTAAGACCTTGTACTACCTAAACTAACCCTGAGGCGGGCATTGAATTCCGGCCATCTTCAGCCCAAAGAAAGACCAAATGTGAGGAAAATGAGGGATCGGTATAAGCTTTTCACGATCTCAAGGTTCACGGCCGCCAGGGCCGTAGTTGGGGCTTCATGCACATTGCCAACCCGGACATCGACAGTCGGTACCGCAGGGGTTCGAGGAATACTCCCAGCTGTGACACCTGGTCGTCGACTGGACCCAGCTGGTGGGCGGCATAGGTAGTTAATACTGAATTAAAGCCGGGAACGTCTCTCTAACTAGAAACCTTGTGATAGGATACACAGACCTAGTGCCCCGACGTTAGCATTTGAATTCATCTATCTTGGCGTCTTTTAGTAGGCCTGGGTCAACTCCGGCGTTGGCCAAAATAACCGATCTGCGTTATGTGGCCACGCATCGAGTGACAGGGTGCATACAAATTGATGGTCAAAGAGTTTAAACAAGACAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCCCCACGCTTCTACATAGCCACACTGGAGCTAGTCCTCGTGTTAAATTTTTCGCTTGTTGCACGGTTATCATCAGAAGTGCCACTGGTATTCCTCTGTAGCTCCCGTATGCCGAAGGTTGCGGCTTAGGTACTGCTTATACACGTCTCTCAAGTTTGTCAGCCGCGTGATCTTTCTGCGGGGATAGGTGATCGTCCCTCGCTCCGGACATTGCATTAAAATTACCTAGTTGATAGGGCGGCGGAGTTGCATACCGGCGTTCAATCGCGGCTCCAGACTGGTTTGAGCTACGCGTCTGCCAGCGTGAAAAAGCTGATTTGTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCAGGTATTATCATTTGAATCGTATGTTTTCTGCCGTACGTCAACTGCGTCGTCGGGGACTGAAATGGTCTGCCTCCAGACCCTTACCTCCCGATAAGCCATGACTAAGTATGTGAAGGATCACCTGAATTGCTGAAAGTTAACGGTAAGATATCTGAAAGAGCTCATTAGATCCAACACTTATCTACTCAAAAATTCGTCATATTTCGGTGACTTGCTAGAAAGGCTCTTGCACAGTAAGGTTATAGAGAATGCTACCGTTGAAGCACCAGCCGTTGAAGCCCGCCTTTAACCACGCGATATATCCAATTAACCAAGGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTCGCCTTGTAATAATTACTTTGGCCCGGATTATAACGAAGGAACTCGCCATGAACTCGCAGCACGTTGTACTGGAACAATCTACTTTTTATAATATAGCGATAACTCCCAGCTTTTATGTGGGTGATATTGTCCTAGCTTTTTAAAGATACCCTCTGGCCCGGTCCAAGTAAGGTCCACATTGCCTGACGTAAGCGTACGGTCAACGGGTGCACCGGTTCCCGCTAAAGCTCGATCCTATTCTTTCAGTCGGGGGGAAATAAACTCGTATACTCTCCACCCACCCGTACGTCCCGGACTAGAATAACTACCGGGTATTTCCGGTTCGTAACACCACGCCATGACGTGTCAACATAAACGCTTCTTTTGAAAGGTGCACATGCAGATTGCACAAGCAGCAGGCACCGCCCTTATCCATATCCTGTTGAGGCCCTCGATCCTAGTGTTCCTTGTTATCAGGATATTTTCTCGCTGTACGTTATTGTCCTTTTCAAATTACAACTGACCGCTTCCTCACCCGCTAAACCCTACCTTACGCACAACCAAGGCCTTGTCCCGGATGAACCCGGCTGCTCCTATGGATAAGCAACCCAGCCCGGCAGTTTACTTCAGGTGTTATCGTCGACTGACACCCTCAGCTTTCTCCCATTACACAGCGAGTATTTTCCGCGTAGCAATGGCAGTGACTTTGAGCGCACACTCAGAAGCCGTTGGAATGGCACCGGGGACGGCCCGATTTAGCCCCGCACACCTCCTGGAATCTTAGATCGCACGGCGATCTCGGTTCAGGCACCAACCCCAAAGAGTGTTTTGAGTTTTTGGTATGGCTCGCCTCAATTATCGGTTTTCGCTGCTCTGTGCCTGTCAACTCGGCTAGCTGTCGTGTTTTGTCGATCAGTGCGTGGACACTCTCGGTCGATGGTCGTGGATGGGACTGTAGTAAGTTTCACCGAAGCAGGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAACTTCGCTTCATATAACGTAGCCATAGTGCTGTCTGCCATCAATAAGTCTTGCTCAGTGGTGCATACGTCGGGGAGGTTTGTTCCGCCTGGTCAGAACGAGTCTAGGGCGAGCCTATAGGCCAGTCGAGAGCCAAGATTCTATGAAATTAATACGACTACTGGGTGAGAGGTCATACAATTCCCGTGGAATCTGTACCTAAGATATTTCCAGATAGGGATGGCTACTGGTTAAGTTGACAGTGTCTAGATACGTGAGAGCACCTGAGAGGACGCCACGAGTCGGAGCGTGGGCGATCACCCTTCTGAGTCATAAGTCATGTCTATATATCCCTCACTAAAAAGGGCACACGACTATACATGCTTGAGCTTTACGGTCTGGCATGTGGAATGCCCGGAGCAACCCAGTCTTACCATCCTTTACGTACATTTACCGACCCGGCAGTGGCCGGCGCGGAAACCCAGGAGAACGTCGGTCATGATACGCGCCCTCCGCCGAAAGCGTGCTCACACCTCAGGATATCAGCGCTATTACCGGACGTCCCGCGTCCACCATCTAATAATTCAGGTGCTCCTAATAAGTGGGCTGGAGAGCGAGGATTGATATACGTTGAGGAGCTCCGACGGCCCTCTCGTGCGTTTGATGTAGATTGCGTTACCGACGGAGCACGCGTTTGTCAATTTCTGTCTAGGGACGTTTATGTCCTCAATACGAATACCAGGCCTATTTTAGTGTACAAATCACTTAGCAGTCGGAATTGGAAACCTGATGGAAGCGT"
counter=0
length=len(text)
search="AGATC"
tmp=0
for i in range(length):
if text[i:i + len(search)] == search:
tmp += 1
if tmp > counter:
counter = tmp
if text[i:i + len(search)] != search:
tmp = 0
print("done")
print(counter)
try this
import re
sequence = "AATDHDHDTKSDHAATAAT"
matches = re.findall(r'(?:AAT)+', sequence)
largest = max(matches, key=len)
print(len(largest)//len('AAT'))
basically this way will find you the list of substrings in the string you are have then you choose the largest substring. The number of occurrences of the substring will be the length of the largest divide by the length of substring
First and foremost, the regex solution is the Python way to solve this. However, if you want your code repaired ...
The problem with your code is that your index fails to acknowledge that you've found a match. You have no way to recognize consecutive occurrences.
Consider the case where you've found the start of a triple-match, AATAATAAT. You get to the first A, recognize, the AAT and increment tmp. You go to the next loop iteration, and now i points at the second A. You see that it's not AAT here (it's ATA, spanning the first two occurrences), so you record one instance and reset all your status variables.
Instead, you have to jump to the end of the first match and look for a second. Since your index does not move smoothly by increments of 1, you'll want a while loop instead.
Please learn to use meaningful variable names where the variables have
any meaning. i is fine if all it does is to manage your loop. As
soon as you use it for anything else, give it a real name. Similarly,
tmp and count really need replacing.
snip_size = len(search)
pos = 0 # position in the genetic sequence
rep = 0 # number of consecutive repetitions
max_rep = 0 # longest repetition sequence found
while pos < length:
if text[pos:pos + snip_size] == search:
rep += 1
pos += snip_size
else:
max_rep = max(max_rep, rep)
rep = 0
pos += 1
print(max_rep, "repetitions found")
Output:
15 repetitions found
my code for finding longest substring in alphabetical order using python
what I mean by longest substring in alphabetical order?
if the input was"asdefvbrrfqrstuvwxffvd" the output wil be "qrstuvwx"
#we well use the strings as arrays so don't be confused
s='abcbcd'
#give spaces which will be our deadlines
h=s+' (many spaces) '
#creat outputs
g=''
g2=''
#list of alphapets
abc='abcdefghijklmnopqrstuvwxyz'
#create the location of x"the character the we examine" and its limit
limit=len(s)
#start from 1 becouse we substract one in the rest of the code
x=1
while (x<limit):
#y is the curser that we will move the abc array on it
y=0
#putting our break condition first
if ((h[x]==' ') or (h[x-1]==' ')):
break
for y in range(0,26):
#for the second character x=1
if ((h[x]==abc[y]) and (h[x-1]==abc[y-1]) and (x==1)):
g=g+abc[y-1]+abc[y]
x+=1
#for the third to the last character x>1
if ((h[x]==abc[y]) and (h[x-1]==abc[y-1]) and (x!=1)):
g=g+abc[y]
x+=1
if (h[x]==' '):
break
print ("Longest substring in alphabetical order is:" +g )
it doesn't end,as if it's in infinite loop
what should I do?
I am a beginner so I want some with for loops not functions from libraries
Thanks in advance
To avoid infinite loop add x += 1 in the very end of your while-loop. As a result your code works but works wrong in general case.
The reason why it works wrong is that you use only one variable g to store the result. Use at least two variables to compare previous found substring and new found substring or use list to remember all substrings and then choose the longest one.
s = 'abcbcdiawuhdawpdijsamksndaadhlmwmdnaowdihasoooandalw'
longest = ''
current = ''
for idx, item in enumerate(s):
if idx == 0 or item > s[idx-1]:
current = current + item
if idx > 0 and item <= s[idx-1]:
current = ''
if len(current)>len(longest):
longest = current
print(longest)
Output:
dhlmw
For your understanding 'b'>'a' is True, 'a'>'b' is False etc
Edit:
For longest consecutive substring:
s = 'asdefvbrrfqrstuvwxffvd'
abc = 'abcdefghijklmnopqrstuvwxyz'
longest = ''
current = ''
for idx, item in enumerate(s):
if idx == 0 or abc.index(item) - abc.index(s[idx-1]) == 1:
current = current + item
else:
current = item
if len(current)>len(longest):
longest = current
print(longest)
Output:
qrstuvwx
def sub_strings(string):
substring = ''
string +='\n'
i = 0
string_dict ={}
while i < len(string)-1:
substring += string[i]
if ord(substring[-1])+1 != ord(string[i+1]):
string_dict[substring] = len(substring)
substring = ''
i+=1
return string_dict
s='abcbcd'
sub_strings(s)
{'abc': 3, 'bcd': 3}
To find the longest you can domax(sub_strings(s))
So here which one do you want to be taken as the longest??. Now that is a problem you would need to solve
You can iterate through the string and keep comparing to the last character and append to the potentially longest string if the current character is greater than the last character by one ordinal number:
def longest_substring(s):
last = None
current = longest = ''
for c in s:
if not last or ord(c) - ord(last) == 1:
current += c
else:
if len(current) > len(longest):
longest = current
current = c
last = c
if len(current) > len(longest):
longest = current
return longest
so that:
print(longest_substring('asdefvbrrfqrstuvwxffvd'))
would output:
qrstuvwx
the question was actually like
Write a program that says the number of terms to be replaced to get all the numbers same in the list
ex
a=[1,1,2,1,1,4,1,1]
then the expected output is 2
I guess you are looking for the minimum number of numbers to be replaced so that all numbers are the same.
# store occurrences of each number in a dictionary
d = {}
for i in a:
if i not in d:
d[i] = 1
else:
d[i] += 1
# get key with max value: i.e. most frequent number in initial list
max_k = max(d, key=d.get)
# delete element that appears most often in the initial list
del d[max_k]
# count rest of the numbers => numbers to be replaced
nums_to_replaces = sum([d[k] for k in d])
Your wording is a bit strange.
If you want the number of items that are not one:
len([i for i in a if i!=1])
or
len(a)-a.count(1)
if you want the unique set of numbers that are not repeated as the question says:
len([i for i in set(a) if a.count(i)==1])
You could iterate over the list and make a dictionary of the occurrence:
occurrences = {}
for x in range(len(a)):
if a[x] not in occurences:
occurrences[x] = 1
else:
occurrences[x] += 1
Now you could have a dict of the occurrences and you could do this to get the number of terms to be replaced:
nums = list(occurrences.values()).sort(reverse=True) # Sort I descending order with num of repetitions
print(sum(nums[1:])) # Add occurrences of all but the element with max occurrences
#Or just subtract the max occurrence from the length of list
print(len(a) - max(list(occurrences.values())))
This will give you the least number of changes need to be made. In this way you cannot specify which element you want the whole list to contain