Python count max character in a string? - python

I am trying to find the max character count in a string using a loop. So far, this is the code I have written:
def max_char_count(string):
    max_char = ''
    max_count = 0
    for char in string:
        count = string.count(char)
        if count > max_count:
            max_count = count
            max_char = char
    return max_char
print(max_char_count('apple hellooo'))
But the issue I am running into is that even though there are three 'l's and three 'o's, I only get 'l' as the output. How can I adjust the code to report every character that has the highest count? Thank you.

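Your approach is inefficient: string.count(char) scans the whole string, and you call it for every character while you are already iterating over the entire string, which gives a time complexity of O(n^2). It also keeps only the first character that reaches the maximum, because count > max_count is false for a later character with an equal count, so 'o' never replaces 'l'.
Instead, I suggest you calculate the count of each character once by looping through the string, and then select the one(s) with the maximum count.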
def max_char_count(string):
    ret = []
    counter = dict()  # create an empty dict to store the character counts
    # Iterate over the string once to count the characters
    # N iterations
    for char in string:
        # `.get()` returns the given default value (0) if the `char` key doesn't exist
        # If it does, it returns the value for that key
        # Then you increment it
        counter[char] = counter.get(char, 0) + 1
    # Find the max value
    # (another loop over the dict, worst-case N iterations)
    max_count = max(counter.values())
    # Iterate over the dict one last time to get the keys that have a value == max_count
    # Again, worst-case N iterations
    for char, count in counter.items():
        if count == max_count:
            ret.append((char, count))
    return ret
Now, print(max_char_count('apple hellooo')) prints [('l', 3), ('o', 3)] (on Python 3.7+, where dicts keep insertion order).
If you don't want to reinvent the counting wheel, use collections.Counter instead: counter = collections.Counter(string)
Since there are three loops and none of them are nested, the time complexity is O(3n), which simplifies to O(n).
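With collections.Counter, the counting step collapses into one call and the rest stays the same. This is a sketch assuming Python 3.7+, where Counter keeps first-seen order; the function name most_common_chars is hypothetical:

```python
from collections import Counter


def most_common_chars(text):
    """Return every (char, count) pair that ties for the highest count."""
    if not text:
        return []  # max() on an empty sequence would raise ValueError
    counter = Counter(text)  # one pass over the string
    max_count = max(counter.values())
    # Counter preserves first-seen order, so ties come out in that order
    return [(char, n) for char, n in counter.items() if n == max_count]


print(most_common_chars('apple hellooo'))  # [('l', 3), ('o', 3)]
```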

You can use a dictionary of characters to count the occurrences, then find the largest count and return all keys that have that count. The fromkeys() constructor will allow you to initialize the letter counts to zero thus making the counting loop much simpler. Selecting the letters with the maximum count can be done using a list comprehension.
This will compute the result in very few lines of code:
string = 'apple hellooo'
counts = dict.fromkeys(string,0) # initialize counts to zero
for c in string: counts[c] += 1 # compute character counts
max_count = max(counts.values()) # find maximum count
result = [c for c,n in counts.items() if n==max_count] # matching characters
print(result)
['l', 'o']
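If you would rather keep the structure of the question's loop, one minimal adjustment (a sketch, not taken from either answer) is to keep a list of winners: reset it when a strictly higher count appears, and append on a tie. It stays O(n^2) because string.count is still called for every character:

```python
def max_char_count(string):
    max_chars = []  # every character tied for the highest count so far
    max_count = 0
    for char in string:
        count = string.count(char)  # still scans the whole string each time
        if count > max_count:
            # strictly higher count: start a fresh list of winners
            max_count = count
            max_chars = [char]
        elif count == max_count and char not in max_chars:
            # a tie: keep this character too, skipping repeats of the same char
            max_chars.append(char)
    return max_chars


print(max_char_count('apple hellooo'))  # ['l', 'o']
```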

Related

How to extract words from repeating strings

Here I have a string in a list:
['aaaaaaappppppprrrrrriiiiiilll']
I want to get the word 'april' in the list, but not just once: I want it as many times as the word 'april' actually occurs in the string.
The output should be something like:
['aprilaprilapril']
Because the word 'april' occurred three times in that string.
Well, the word itself didn't actually occur three times; all of its characters did. So I want to arrange these characters into 'april' as many times as they appear in the string.
My idea is basically to extract words from random strings, and not just to extract the word once, but every copy of the word that the string contains. The characters of each extracted word should be ordered the way I want.
But I have some annoying conditions: you can't delete all the elements in the list and then just replace them with the word 'april' (you can't replace the whole string with the word 'april'); you can only extract 'april' from the string, not replace it. You also can't delete the list containing the string. Think of the whole string as very important data: we only want some of it, but that data must be ordered, and we need to delete all other data that doesn't match our "data chain" (the word 'april'). Once you delete the whole string, you lose all the important data, and you don't know how to make another one of these "data chains", so you can't just put the word 'april' back in the list.
If anyone knows how to solve my weird problem, please help me out; I am a beginner Python programmer. Thank you!
One way is to use itertools.groupby, which groups identical consecutive characters, and then unpack the groups into zip. zip stops after n steps, where n is the size of the smallest group (the letter with the fewest repeats), and each step yields one copy of the word's letters.
from itertools import groupby
text = 'aaaaaaappppppprrrrrriiiiiilll'
result = ''
for each in zip(*[list(g) for k, g in groupby(text)]):
    result += ''.join(each)
# result = 'aprilaprilapril'
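A slightly more defensive variant of the groupby idea, sketched under the assumption that the input holds only the word's letters in order, each in a single run. The helper name repeat_word_from_runs is hypothetical. It checks that the run letters spell the word before repeating it, and returns an empty string otherwise. Like the zip version, it cannot handle words with doubled letters such as 'appril', because groupby merges those into one run:

```python
from itertools import groupby


def repeat_word_from_runs(text, word):
    """Rebuild `word` as many times as the shortest letter run allows."""
    # Collapse text into (letter, run length) pairs, e.g. ('a', 7), ('p', 7), ...
    runs = [(letter, len(list(group))) for letter, group in groupby(text)]
    # Only proceed if the run letters spell the word, in order
    if not runs or ''.join(letter for letter, _ in runs) != word:
        return ''
    # The shortest run limits how many complete copies can be formed
    return word * min(length for _, length in runs)


print(repeat_word_from_runs('aaaaaaappppppprrrrrriiiiiilll', 'april'))
```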
Another possible solution is to create a custom counter that counts each unique run of characters. Note that this method relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 (CPython 3.6 already does it as an implementation detail); on older versions the order is not guaranteed:
def getCounts(strng):
    if not strng:
        return [], 0
    counts = {}
    current = strng[0]
    for c in strng:
        if c in counts.keys():
            if current == c:
                counts[c] += 1
        else:
            current = c
            counts[c] = 1
    return counts.keys(), min(counts.values())
result = ''
counts = getCounts('aaaaaaappppppprrrrrriiiiiilll')
for i in range(counts[1]):
    result += ''.join(counts[0])
# result = 'aprilaprilapril'
How about using regex?
import re
word = 'april'
text = 'aaaaaaappppppprrrrrriiiiiilll'
regex = "".join(f"({c}+)" for c in word)
match = re.match(regex, text)
if match:
    # Find the lowest amount of character repeats
    lowest_amount = min(len(g) for g in match.groups())
    print(word * lowest_amount)
else:
    print("no match")
Outputs:
aprilaprilapril
Works like a charm
Here is a more basic approach using plain iteration.
It has a time complexity of O(n).
It uses an outer loop to iterate over the characters of the search key, then an inner while loop that consumes all consecutive occurrences of that character in the search string while maintaining a counter. Once all consecutive occurrences of the current letter have been consumed, it updates minLetterCount to the minimum of its previous value and this new count. Once we have iterated over all letters in the key, we return this accumulated minimum.
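def countCompleteSequenceOccurences(searchString, key):
    if not key:
        return 0
    left = 0
    minLetterCount = 0
    letterCount = 0
    # Skip leading characters that come before the first letter of the key,
    # so inputs such as "zzzaaa..." still match (required by the tests below)
    while left < len(searchString) and searchString[left] != key[0]:
        left += 1
    for i, searchChar in enumerate(key):
        while left < len(searchString) and searchString[left] == searchChar:
            letterCount += 1
            left += 1
        minLetterCount = letterCount if i == 0 else min(minLetterCount, letterCount)
        letterCount = 0
    return minLetterCount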
Testing:
testCasesToOracles = {
    "aaaaaaappppppprrrrrriiiiiilll": 3,
    "ppppppprrrrrriiiiiilll": 0,
    "aaaaaaappppppprrrrrriiiiii": 0,
    "aaaaaaapppppppzzzrrrrrriiiiiilll": 0,
    "pppppppaaaaaaarrrrrriiiiiilll": 0,
    "zaaaaaaappppppprrrrrriiiiiilll": 3,
    "zzzaaaaaaappppppprrrrrriiiiiilll": 3,
    "aaaaaaappppppprrrrrriiiiiilllzzz": 3,
    "zzzaaaaaaappppppprrrrrriiiiiilllzzz": 3,
}
key = "april"
for case, oracle in testCasesToOracles.items():
    result = countCompleteSequenceOccurences(case, key)
    assert result == oracle
Usage:
key = "april"
result = countCompleteSequenceOccurences("aaaaaaappppppprrrrrriiiiiilll", key)
print(result * key)
Output:
aprilaprilapril
A word can only occur as many times as its least frequent letter allows. To account for words with repeated letters (for example, 'appril'), you need to divide each letter's count in the string by the number of times it appears in the word. Here is one way of doing this using collections.Counter:
from collections import Counter
def count_recurrence(kernel, string):
    # we need to count both strings
    kernel_counter = Counter(kernel)
    string_counter = Counter(string)
    # now get the effective count by dividing the occurrences in string by the
    # occurrences in kernel (integer division: only whole copies count)
    effective_counter = {
        k: string_counter.get(k, 0) // v
        for k, v in kernel_counter.items()
    }
    # min occurrence of kernel is the min of the effective counter
    min_recurring_count = min(effective_counter.values())
    return kernel * min_recurring_count
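To make the division step concrete, here is a small worked example (not part of the original answer) with the word 'appril', which needs two p's per copy. The variable names are illustrative only:

```python
from collections import Counter

kernel = 'appril'  # a word with a repeated letter
text = 'aaaaaaappppppprrrrrriiiiiilll'

need = Counter(kernel)  # a:1, p:2, r:1, i:1, l:1
have = Counter(text)    # a:7, p:7, r:6, i:6, l:3
# Integer-divide what the text has by what one copy of the word needs
per_letter = {ch: have[ch] // n for ch, n in need.items()}
copies = min(per_letter.values())
print(per_letter)       # {'a': 7, 'p': 3, 'r': 6, 'i': 6, 'l': 3}
print(kernel * copies)  # apprilapprilappril
```

Here both 'p' (7 // 2) and 'l' (3 // 1) limit the result to three copies.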

Largest substring of non-repeating letters of a string

I am using Python. I have a string, for example 'abcagfhtgba', and I need to find the length of the largest substring of non-repeating letters. In the case above I expect it to be 'agfht' (5), because the fourth character, 'a', repeats, so I start counting again from there.
My idea is to create a dictionary that stores letters as keys and the number of their appearances as values. Whenever a key would reach the value 2, I append the length of the dictionary to a list named result and replace the dictionary with an empty one. This approach passes some tests but not others. Here is the code I used, with brief comments explaining it.
Here I store the input in the form of a list:
this = list(map(str, input()))
def function(list):
    dict = {}
    count = 0
    result = [1]
Here I start the loop: for every element that is not yet a key, I create a key with value 1. If the element is already in the dictionary, I replace the dict with an empty one, but I store the first repeating element in the new dictionary. Another important point is to append the count after the loop ends, because the tail of the string may hold the largest non-repeating sequence of letters.
    for i in range(len(list)):
        if list[i] not in dict:
            dict[list[i]] = 1
            count += 1
        elif list[i] in dict:
            dict = {}
            dict[list[i]] = 1
            result.append(count)
            count = 1
    result.append(count)
    print(result)
    return max(result)
Here I make the program choose the larger result between the string and its reverse, to handle cases like 'adabc', where the largest substring is at the end.
if len(this) != 0:
    print(max(function(this), function(this[::-1])))
else:
    print('')
I need help figuring out where my approach to the problem is wrong, and how to fix my code.
Hopefully you will find this a little easier. The idea is to keep track of the characters seen so far in a set for fast lookup. If the current character is already in the set, start a new set containing just that character and store the substring built up to that point. As you mention, you have to check at the end whether the last substring has been added, hence the final if:
s = 'abcagfhtgba'
seen = set()
out = []
current_out = []
for i in s:
    if i not in seen:
        current_out.append(i)
        seen.add(i)
    else:
        seen = set(i)
        out.append(''.join(current_out))
        current_out = [i]
if current_out:
    out.append(''.join(current_out))
max(out, key=len)
# 'agfht'
So some key differences:
Iterate over the string itself, not a range
Use sets rather than counts and dictionaries
Remember where each letter was last seen by maintaining a map from letter to index. If a letter is already in the map and its previous index lies inside the current window, it is a duplicate, so the window has to be reset. The new start is not necessarily the current index: it is just after the previous occurrence of the duplicated letter.
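s = 'abcagfhtgba'
seen = {}      # maps each letter to the index where it was last seen
longest = ""
start = 0      # start index of the current window of non-repeating letters
for i, c in enumerate(s):
    if c in seen and seen[c] >= start:
        # c repeats inside the current window: record the window if it is
        # the longest so far, then move the start just past the earlier c
        if len(longest) < i - start:
            longest = s[start:i]
        start = seen[c] + 1
    seen[c] = i
# the tail of the string may hold the longest window
if len(longest) < len(s) - start:
    longest = s[start:]
print(longest)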
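The question expects 'agfht' (5) because it restarts the count at the repeated letter. Under the usual sliding-window rule, the window restarts just after the earlier occurrence of the repeated letter, which for 'abcagfhtgba' gives 'bcagfht' (7). A minimal sketch that returns the length directly, as the question asks, so no reversed-string pass is needed; the function name is hypothetical:

```python
def longest_unique_length(s):
    """Length of the longest substring with no repeated characters."""
    last_seen = {}  # character -> most recent index
    start = 0       # left edge of the current window
    best = 0
    for i, c in enumerate(s):
        # If c already occurs inside the window, slide the left edge past it
        if last_seen.get(c, -1) >= start:
            start = last_seen[c] + 1
        last_seen[c] = i
        best = max(best, i - start + 1)
    return best


print(longest_unique_length('abcagfhtgba'))  # 7, for 'bcagfht'
```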

count the total numbers of unique letters occurred once in a string in python?

a = 'abhishek'
count = 0
for x in a:
    if x in a:
        count += 1
print(count)
I have tried this, but it gives me the total number of letters. I want only the unique letters, the ones that occur only once.
len(set(a)) will give you the count of distinct letters.
Edit: add explanation
set(a) returns a container of all the distinct characters in the string a (Python calls this a set). Then len() gets the size of that set, which is the number of distinct characters in a. Note that this includes letters that repeat: for 'abhishek' it gives 7, because 'h' is counted once even though it appears twice.
You are iterating the string and checking the letter in the string itself, so your if condition is always True in this case.
What you need is to maintain a separate list of all the letters you have already seen while iterating over the string, like this:
uniq_list = []
a = 'abhishek'
count = 0
for x in a:
    if x not in uniq_list:  # check if the letter is already seen.
        count += 1  # increase the counter only when the letter is not seen.
        uniq_list.append(x)  # add the letter in the list to mark it as seen.
print(count)
a = 'abhishek'
count = 0
uls = set()
nls = set()
for x in a:
    if x not in uls:
        uls.add(x)
    else:
        nls.add(x)
print(len(uls - nls))
This prints the number of characters that occur only once.
Output: 6
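The two-set idea above can also be written with collections.Counter, which makes the "occurs exactly once" condition explicit. This is a sketch, not taken from any of the answers; it yields 6 for 'abhishek', while len(set(a)) yields 7 because it also counts the repeated 'h':

```python
from collections import Counter

a = 'abhishek'
counts = Counter(a)  # letter -> number of occurrences
once = sum(1 for n in counts.values() if n == 1)
print(once)  # 6 ('h' appears twice, so it is excluded)
```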
Why not just:
a = 'abhishek'
a.count('a') # or any other letter you want to count.
1
Is this what you want?

Most frequent character in Python 3.3

This program lets the user enter a string and displays the character that appears most frequently in a string.
I need help explaining frequent = i.
# This program displays the character that appears most frequently in the string
def main():
    # Local variables.
    count = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    index = 0
    frequent = 0
    # Get input.
    user_string = input('Enter a string: ')
    for ch in user_string:
        ch = ch.upper()
        # Determine which letter this character is.
        index = letters.find(ch)
        if index >= 0:
            # Increase counting array for this letter.
            count[index] = count[index] + 1
    # Please help me explain this entire part!
    for i in range(len(count)):
        if count[i] > count[frequent]:
            frequent = i
    print('The character that appears most frequently' \
          ' in the string is ', letters[frequent], '.', \
          sep='')
# Call main
main()
The code snippet in question:
for i in range(len(count)):
    if count[i] > count[frequent]:
        frequent = i
First, the for loop iterates over the indices of count, which has 26 entries (0 through 25).
The if statement:
if count[i] > count[frequent]:
Checks whether the count of the current letter in the for loop is larger than the count of the current most frequent letter. If it is, it records the loop index as the new most frequent letter.
For example,
If A appears 12 times and B appears 14 times, then on the second iteration, when i = 1 (and frequent is still 0), the if statement would look like this:
if 14 > 12:
    frequent = 1
This sets frequent to 1, which can then be used to look up the frequency in count, for example:
count[1] == 14
There are 26 items in the list count, one for each of the 26 letters in letters. The loop goes through every item of count (that's the for i in range(len(count)) part) and checks whether its value is greater than the largest value found so far. Simply put, it finds the largest value in the list, but instead of keeping the value it keeps the index: frequent = i stores the index of the largest value found so far in the variable frequent. It's simpler and more Pythonic to just write
frequent = count.index(max(count))
which has exactly the same effect, since both pick the first index that holds the maximum.
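To see the suggested one-liner in context, here is the original program's counting logic wrapped in a function (the name most_frequent_letter is hypothetical) so it can be tested without input(). Ties go to the alphabetically earliest letter, exactly as in the loop; a string with no letters returns 'A', which matches what the original program prints in that case:

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def most_frequent_letter(text):
    """The original program's logic, using count.index(max(count))."""
    count = [0] * len(LETTERS)  # one counter per letter, instead of 26 literal zeros
    for ch in text:
        index = LETTERS.find(ch.upper())
        if index >= 0:  # skip digits, spaces, punctuation
            count[index] += 1
    # index() returns the FIRST position holding the max, just like the
    # loop that only updates frequent on a strictly greater count
    frequent = count.index(max(count))
    return LETTERS[frequent]


print(most_frequent_letter('hello world'))  # L
```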

Find longest substring in alphabetical order

I want to write a program that prints the longest substring in alphabetical order.
And in case of ties, it prints the first substring.
Here is what I wrote
import sys
s1 = str(sys.argv[1])
alpha = "abcdefghijklmnopqrstuvwxyz"
def longest_substring(s1):
    for i in range(len(alpha)):
        for k in range(len(alpha)):
            if alpha[i:k] in s1:
                return alpha[i:k]
print("Longest substring in alphabetical order:", longest_substring(s1))
However, it does not work and I do not know how to do the second part.
Can you help me, please?
Here is what your code should look like to achieve what you want:
#!/usr/bin/env python3.6
import sys
s1 = str(sys.argv[1])
alpha = "abcdefghijklmnopqrstuvwxyz"
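def longest_substring(s1):
    subs = []  # collect every matching slice instead of returning the first one
    for i in range(len(alpha)):
        # k goes up to len(alpha) so that slices can include the final 'z'
        for k in range(i + 1, len(alpha) + 1):
            if alpha[i:k] in s1:
                subs.append(alpha[i:k])
    return max(subs, key=len, default='')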
print("Longest substring in alphabetical order:", longest_substring(s1))
You were returning from the function as soon as you found the first alphabetically ordered substring. In my code, we add all of them to a list and then return the longest one.
Assuming a substring must contain 2 or more characters in alphabetical order, you should not return the first occurrence but collect all of them and find the longest. I tried to keep your idea the same, but this is not the most efficient way:
def longest_substring(s1):
    res = []
    # slices of at least two letters; alpha is the global alphabet string
    for i in range(len(alpha) - 1):
        for k in range(i + 2, len(alpha) + 1):
            if alpha[i:k] in s1:
                res.append(alpha[i:k])
    return max(res, key=len, default='')
You can rewrite itertools.takewhile so that it takes a binary comparison function (comparing each item with the previous one) instead of a unary one.
def my_takewhile(predicate, starting_value, iterable):
    last = starting_value
    for cur in iterable:
        if predicate(last, cur):
            yield cur
            last = cur
        else:
            break
Then you can lowercase the word (since "Za" isn't in alphabetical order, but any of [A-Z] compares lexicographically before any of [a-z]) and collect all the ordered substrings.
word = s1.lower()
i = 0
substrings = []
while i < len(word):
    it = iter(word[i:])
    # join the yielded characters; str() on a generator would only give its repr
    substring = ''.join(my_takewhile(lambda x, y: x < y, chr(0), it))
    i += len(substring)
    substrings.append(substring)
Then just find the longest substring in substrings.
result = max(substrings, key=len)
Instead of building a list of all possible substring slices and then checking which one exists in the string, you can build a list of all consecutive substrings, and then take the one with the maximum length.
This is easily done by grouping the characters using the difference between the ord of that character and an increasing counter; successive characters will have a constant difference. itertools.groupby is used to perform the grouping:
from itertools import groupby, count
alpha = "abcdefghijklmnopqrstuvwxyz"
c = count()
lst_substrs = [''.join(g) for _, g in groupby(alpha, lambda x: ord(x)-next(c))]
substr = max(lst_substrs, key=len)
print(substr)
# abcdefghijklmnopqrstuvwxyz
As @AdamSmith commented, the above assumes the characters are always in alphabetical order. In case they may not be, you can enforce the order by checking that the items in each group are alphabetical:
from itertools import groupby, count, tee
lst = []
c = count()
for _, g in groupby(alpha, lambda x: ord(x)-next(c)):
    a, b = tee(g)
    try:
        if ord(next(a)) - ord(next(a)) == -1:
            lst.append(''.join(b))
    except StopIteration:
        pass
    lst.extend(b)  # add each chr from non-alphabetic iterator (could be empty)
substr = max(lst, key=len)
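The groupby key used in the answer above, ord(x) - next(c), can be run on an actual input string rather than on alpha. A sketch under that assumption, with a hypothetical helper name; like the answer, it finds runs of consecutive alphabet letters such as 'abcd', not merely non-decreasing ones such as 'aceg':

```python
from itertools import groupby, count


def longest_alphabet_run(s):
    """Longest run of consecutive alphabet letters in s (first one on ties)."""
    c = count()  # 0, 1, 2, ... advanced once per character
    # Letters that continue the alphabet keep ord(x) - position constant,
    # so groupby puts each such run into one group
    runs = [''.join(g) for _, g in groupby(s, lambda x: ord(x) - next(c))]
    return max(runs, key=len, default='')


print(longest_alphabet_run('xyzabcdpq'))  # abcd
```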
Back up and look at this problem again.
1. You are looking for a maximum, so basically (pseudocode):
set max to ""
loop through the sequences
if a new sequence is longer than max, replace max
2. To find the sequences, you can be more efficient if you step through the input characters only once.
Here is a version of this:
def longest_substring(s1):
    if not s1:
        return ''
    max_index, max_len = 0, 1  # keep track of the longest sequence here
    last_c = s1[0]  # previous char
    start, seq_len = 0, 1  # tracking current sequence
    for i, c in enumerate(s1[1:], start=1):  # i is c's index in s1
        if c >= last_c:  # can we extend sequence in alpha order
            seq_len += 1
            if seq_len > max_len:  # found longer
                max_index, max_len = start, seq_len
        else:  # this char starts a new sequence of length 1
            seq_len = 1
            start = i
        last_c = c
    return s1[max_index:max_index+max_len]
s = 'azcbobobegghakl'
def max_alpha_subStr(s):
    '''
    INPUT: s, a string of lowercase letters
    OUTPUT: longest substring of s in which the
    letters occur in alphabetical order
    '''
    longest = s[0]  # set 'longest' and 'current' to the 1st letter of s
    current = s[0]
    for i in s[1:]:  # iterate from the 2nd letter to the end of s
        if i >= current[-1]:  # this letter is not smaller than the one before it,
            current += i      # so add it to 'current'
            if len(current) > len(longest):  # 'current' is now the longest run,
                longest = current            # so remember it in 'longest'
        else:  # this letter is smaller than the one before it,
            current = i  # so restart evaluating from this letter
    print("Longest substring in alphabetical order is:", longest)
    return longest
max_alpha_subStr(s)
