Find the most occurring character in a string


This code finds the most frequently occurring character in a string using a dictionary, and it almost works. The problem is that when two keys have the same frequency I want it to return the last key, but it returns the first one.
This is what I have so far:
def most_frequent_letter(s):
    st = s.lower().replace(' ', '')
    frequencies = {}
    for items in st:
        if items in frequencies:
            frequencies[items] += 1
        else:
            frequencies[items] = 1
    return max(frequencies, key=frequencies.get)
most_frequent_letter('mmmaaa')
Out[48]: 'm'
However I don't know how to return 'a' instead of 'm'.

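Here's a way that creates a reverse frequency dictionary. The frequency dictionary is built with dict.get, and its reverse with a dictionary comprehension. When several keys share a count, the later key overwrites the earlier one in the reverse dictionary, so ties go to the last key:
def most_frequent_letter(s):
    st = s.lower().replace(' ', '')
    frequencies = {}
    for item in st:
        # Count each character; get() supplies 0 the first time it is seen
        frequencies[item] = frequencies.get(item, 0) + 1
    # Map count -> key; later keys replace earlier ones with the same count
    rev_freq = {count: key for key, count in frequencies.items()}
    return rev_freq[max(rev_freq)]
print(most_frequent_letter('nnmmmaaa')) # -> a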

Python's max function returns the first item with the maximum value when several items tie.
So if you always want the last key, you can simply reverse the original string in your code.
def most_frequent_letter(s):
    st = s.lower().replace(' ', '')
    st = st[::-1]
    frequencies = {}
    for items in st:
        if items in frequencies:
            frequencies[items] += 1
        else:
            frequencies[items] = 1
    return max(frequencies, key=frequencies.get)
Or sort the string first if you want the lowest valued key.
You can also just create your own max function instead to suit your needs.
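One way to read "create your own max function" is a small loop that keeps the latest key whenever its count is at least as high as the best seen so far. This is only a sketch, not the answerer's code: the name most_frequent_last is made up for illustration, and it assumes dictionaries preserve insertion order (Python 3.7+).

```python
def most_frequent_last(s):
    # Hypothetical helper: among tied characters, return the one whose
    # first appearance in the string comes last.
    st = s.lower().replace(' ', '')
    frequencies = {}
    for ch in st:
        frequencies[ch] = frequencies.get(ch, 0) + 1

    best, best_count = None, 0
    for ch, count in frequencies.items():
        # ">=" instead of ">" lets a later key replace an earlier tie
        if count >= best_count:
            best, best_count = ch, count
    return best


print(most_frequent_last('mmmaaa'))  # a
```

Using ">" in the comparison would reproduce the behaviour of the built-in max, which keeps the first of the tied keys.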

def most_frequent_letter(word):
    letters = list(word)
    return max(set(letters), key=letters.count)
print(most_frequent_letter('mmmaaa'))
# output: m (or a: the two tie, and set iteration order is arbitrary)
print(most_frequent_letter('some apples are green'))
# output: e
max() returns the largest item in an iterable. The key argument takes a one-argument function that is applied to each item to produce the value used for comparison; in this case it is letters.count.
letters.count is a built-in method of list. It takes one argument and counts how many times that value occurs in the list, so letters.count('m') returns 3 and letters.count('a') returns 3.
set(letters) returns the unique values in letters, so {'m', 'a'}.
So this single line of code takes the unique values of letters, {'m', 'a'}, then max applies letters.count to each of them and returns the value with the highest count.

The collections library has Counter, which does the job for you. We normalize the word by lowercasing it and removing spaces, then reverse the string so that the last character to appear is seen first.
from collections import Counter
word = 'mmmaaa'
characters = Counter(reversed(word.lower().replace(' ', '')))
# most common
print(characters.most_common(1))
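most_common(1) returns a list containing one (character, count) tuple rather than the character itself. A minimal sketch of unpacking it follows; the function name last_most_common is hypothetical, and the tie behaviour relies on Counter ordering equal counts by first appearance (Python 3.7+).

```python
from collections import Counter


def last_most_common(s):
    # Hypothetical wrapper around the Counter approach above.
    cleaned = s.lower().replace(' ', '')
    if not cleaned:
        return None
    # Reversing makes the character that appears last count as "first seen"
    counts = Counter(reversed(cleaned))
    char, _count = counts.most_common(1)[0]
    return char


print(last_most_common('mmmaaa'))  # a
```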

Yes, you can get both m and a. It depends on how you want the output; here I return them joined into a string, just as an example:
def most_frequent_letter(s):
    st = s.lower().replace(' ', '')
    frequencies = {}
    for items in st:
        if items in frequencies:
            frequencies[items] += 1
        else:
            frequencies[items] = 1
    max_val = max(frequencies.values())
    result = ""
    # Collect every key whose count equals the maximum
    for key, value in frequencies.items():
        if value == max_val:
            result += key
    return result
result = most_frequent_letter('mmmaaa')
print(result)
The output will be "ma".

In Python, the max function returns the first item that has the maximum frequency. If you want the next one, you could remove the 'm's first; after that, the first maximum will be 'a'.

Related

Populating a List - Python assigns index and gives error

I have a solution to count the number of occurrences of each letter in a string and return a dict.
def count_characters(in_str):
    all_freq = {}
    for i in in_str:
        if i in all_freq:
            all_freq[i] += 1
        else:
            all_freq[i] = 1
    return all_freq
count_characters("Hello")
It works.
I am trying to understand how Python automatically assigns each letter as a key of the dict; it's not explicitly assigned. I replaced the empty dictionary with an empty list, expecting to get just the frequencies, without the letters.
def count_characters(in_str):
    all_freq = []
    for i in in_str:
        if i in all_freq:
            all_freq[i] += 1
        else:
            all_freq[i] = 1
    return all_freq
count_characters("Hello")
I get an error.
TypeError: list indices must be integers or slices, not str
My question is:
How does each letter get automatically assigned as the key?
How can I return just the numbers as a list - without the letter?

How does each letter get automatically assigned as the key?
You are doing it explicitly in all_freq[i] = 1. Here, i contains a letter (though I think the variable could be named better: i typically stands for an index of some sort, which this isn't).
How can I return just the numbers as a list - without the letter?
You could still build a dictionary and then return list(all_freq.values()). Though, if you do that, how would you know which letter each count corresponds to?
This is probably not relevant for your exercise, but the standard library already has a class for doing this sort of counting: collections.Counter.
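To illustrate the last two points, here is a minimal sketch using collections.Counter. The function body is an assumption about how one might apply it to this exercise, not code from the question.

```python
from collections import Counter


def count_characters(in_str):
    # Counter is a dict subclass, so each letter becomes a key
    # and its number of occurrences becomes the value.
    return Counter(in_str)


counts = count_characters("Hello")
print(counts["l"])            # 2
print(list(counts.values()))  # [1, 1, 2, 1]
```

The list of values follows the order in which the letters were first seen (Python 3.7+), which is the only link left between each number and its letter.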

Lists are indexed with integers. In your second example you use dictionary syntax with a list, which is why Python complains.
You can use the count method:
def count_characters(in_str):
    all_freq = []
    for i in in_str:
        all_freq.append(in_str.count(i))
    return all_freq
count_characters("Hello")
Edit: I agree with @NPE's comment:
You could still build a dictionary and then return list(all_freq.values()). Though, if you do that, how would you know which letter each count corresponds to?
This is probably not relevant for your exercise, but the standard library already has a class for doing this sort of counting: collections.Counter.

It's simple when we use built-in dictionary methods like .keys() and .values().
Example:
def count_char(string):
    counts = {}  # avoid naming this "dict", which would shadow the built-in
    count = 0
    for i in string:
        if i in counts.keys():
            counts[i] += 1
        else:
            counts[i] = 1
    # Add up the per-character counts to get the total
    for j in counts.values():
        count += j
    print(counts)
    print(count)

Why is key used in the tuple, and what is the syntax here?

def LongestWord(sen):
    nw = ""
    for letter in sen:
        if letter.isalpha() or letter.isnumeric():
            nw += letter
        else:
            nw += " "
    return max(nw.split(), key=len)
print(LongestWord("Hello world"))
What does key=len mean? key is used in dicts, right? I can't understand the syntax max(nw.split(), key=len).

You're right that dictionaries contain mappings from keys to values. In this particular case though, key is just one of the parameters of the max function. It lets the caller supply a function that computes the value each item is compared by. For more information, see https://docs.python.org/3/library/functions.html#max.

max means maximum, but the maximum of what metric? That's where the key comes in. Here the key is len (length), meaning you are trying to find the element with the greatest length. Comparing words directly with greater than or less than would compare them alphabetically rather than by length, so you specify a key that determines how the items are ranked. For example:
>>> words = ['this','is','an','example']
>>> max(words, key=len)
'example'
You can think of the keys like the keys in a dictionary: since the key here is len, the dict would look like:
{4: 'this', 2: 'an', 7: 'example'}
So it will return the value of the highest key (7), that is example.
You can also define custom keys:
>>> def vowels(word):
...     '''this returns number of vowels
...     in a word'''
...     v = 'aeiou'
...     ctr = 0
...     for char in word:
...         if char in v:
...             ctr += 1
...     return ctr
>>> words = ['standing','in','a','queue']
>>> max(words, key = vowels)
'queue'
The dictionary analogy would be:
{2: 'standing', 1: 'a', 4: 'queue'}
So the answer will be queue, which has four vowels.

max(nw.split(), key=len)
Here, the max function can be called as max(iterable, default=value, key=function). The first argument is an iterable such as a list, tuple, or dictionary.
default is an optional keyword argument: the value to return if the iterable is empty.
key is a keyword argument, key=function. We pass a function that takes one parameter; each value in the iterable is passed to this function, and max() chooses its result based on the function's return values.
key is also optional.
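A short sketch of how key and default behave in practice. The word list is just sample data; the only assumption is Python 3.4 or later, where default is available.

```python
words = ['this', 'is', 'an', 'example']

# key=len: each word is ranked by its length
longest = max(words, key=len)

# default is returned when the iterable is empty
fallback = max([], key=len, default='')

# Without default, an empty iterable raises ValueError
try:
    max([], key=len)
except ValueError:
    raised = True
else:
    raised = False

print(longest)         # example
print(repr(fallback))  # ''
print(raised)          # True
```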

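Here key is a parameter of the max() function.
Since this function finds the longest word in the string, you are looking for the word with the maximum length, hence key=len.
Example:
max(111, 222, 333, 444, 555, 999) returns 999
max(111, 222, 333, 444, 555, 999, key=lambda x: x % 3) returns 111, because every value is divisible by 3, so all keys are 0 and max returns the first of the tied items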

Variable not reassigned when changed in for loop

The goal of this code is to count the word that appears the most within the given list. I planned to do this by looping through the dictionary. If a word appeared a greater number of times than the value stored in the variable rep_num, it was reassigned. Currently, the variable rep_num remains 0 and is not reassigned to the number of times a word appears in the list. I believe this has something to do with trying to reassign it within a for loop, but I am not sure how to fix the issue.
def rep_words(novel_list):
    rep_num=0
    for i in range(len(novel_list)):
        if novel_list.count(i)>rep_num:
            rep_num=novel_list.count(i)
    return rep_num
novel_list =['this','is','the','story','in','which','the','hero','was','guilty']
In the given code, 2 should be returned, but 0 is returned instead.

In your for loop you are iterating over the index numbers and not the list elements themselves:
def rep_words(novel_list):
    rep_num=0
    for i in novel_list:
        if novel_list.count(i)>rep_num:
            rep_num=novel_list.count(i)
    return rep_num

You're iterating over a numeric range, and counting the integer i, none of which values exist in the list at all. Try this instead, which returns the maximum frequency, and optionally a list of words which occur that many times.
novel_list =['this','is','the','story','in','which','the','hero','was','guilty']
def rep_words(novel_list, include_words=False):
    counts = {word:novel_list.count(word) for word in set(novel_list)}
    rep = max(counts.values())
    word = [k for k,v in counts.items() if v == rep]
    return (rep, word) if include_words else rep
>>> rep_words(novel_list)
2
>>> rep_words(novel_list, True)
(2, ['the'])
>>> rep_words('this list of words has many words in this list of words and in this list of words is this'.split(' '), True)
(4, ['words', 'this'])

You've an error in your function (you're counting the index, not the value), write like this:
def rep_words(novel_list):
    rep_num=0
    for i in novel_list:
        if novel_list.count(i)>rep_num: #you want to count the value, not the index
            rep_num=novel_list.count(i)
    return rep_num
Or you may try this too:
def rep_words(novel_list):
    rep_num=0
    for i in range(len(novel_list)):
        if novel_list.count(novel_list[i])>rep_num:
            rep_num=novel_list.count(novel_list[i])
    return rep_num
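The answers above call list.count inside a loop, which rescans the whole list for every word. As an alternative that none of the answers proposed, here is a sketch with collections.Counter, which counts everything in one pass. The name rep_words_counter and the choice to return 0 for an empty list are assumptions made for this example.

```python
from collections import Counter


def rep_words_counter(novel_list):
    # Hypothetical variant of rep_words: highest number of repetitions
    if not novel_list:
        return 0
    # most_common(1) gives [(word, count)] for the most frequent word
    _word, count = Counter(novel_list).most_common(1)[0]
    return count


novel_list = ['this', 'is', 'the', 'story', 'in', 'which', 'the', 'hero', 'was', 'guilty']
print(rep_words_counter(novel_list))  # 2
```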

How to remove duplicate characters in a string and print according to the longest occurrence

I've been trying to solve this problem, but I haven't been able to.
x="abcaa" # sample input
x="bca" # sample output
I have tried this:
from collections import OrderedDict
def f(x):
    print ''.join(OrderedDict.fromkeys(x))
t=input()
for i in range(t):
    x=raw_input()
    f(x)
The above code is giving:
x="abcaa" # Sample input
x="abc" # sample output
More Details:
Sample Input:
abc
aaadcea
abcdaaae
Sample Output:
abc
adce
bcdae
In the first case the string is "abcaa"; here the longest run of 'a' comes at the end, so 'a' is placed last, giving "bca". In the other case, "aaadcea", the longest run of 'a' comes at the start, so it is placed first, giving "adce".

The OrderedDict isn't helping you at all, because the order you're preserving isn't the one you want.
If I understand your question (and I'm not at all sure I do…) the order you want is a sorted order, using the number of times the character appears as the sorting key, so the most frequent characters appear last.
So, this means you need to associate each character with a count in some way. You could do that with an explicit loop and d.setdefault(char, 0) and so on, but if you look in the collections docs, you'll see something named Counter right next to OrderedDict, which is a:
dict subclass for counting hashable objects
That's exactly what you want:
>>> import collections
>>> x = 'abcaa'
>>> c = collections.Counter(x)
>>> c
Counter({'a': 3, 'b': 1, 'c': 1})
And now you just need to sort with a key function:
>>> ''.join(sorted(c, key=c.__getitem__))
'bca'
If you want this to be a stable sort, so that elements with the same counts are shown in the order they first appear, or the order they first reach that count, then you will need OrderedDict. How do you get both OrderedDict behavior and Counter behavior? There's a recipe in the docs that shows how to do it. (And you actually don't even need that much; the __repr__ and __reduce__ are irrelevant for your use, so you can just inherit from Counter and OrderedDict and pass for the body.)
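A sketch of the "inherit from both and pass" idea mentioned above, written for Python 3. The class name OrderedCounter follows the docs recipe. On Python 3.7+ a plain Counter already remembers insertion order, so the combined class mainly matters on older versions.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order in which keys were first seen."""
    pass


x = 'abcaa'
c = OrderedCounter(x)

# sorted() is stable, so characters with equal counts keep first-seen order
result = ''.join(sorted(c, key=c.__getitem__))
print(result)  # bca
```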

Taking a different guess at what you want:
For each character, you want to find the position at which it has the most repetitions.
That means that, as you go along, you need to keep track of two things for each character: the position at which it has the most repetitions so far, and how many. And you also need to keep track of the current run of characters.
In that case, the OrderedDict is necessary, it's just not sufficient. You need to add characters to the OrderedDict as you find them, and remove them and re-add them when you find a longer run, and you also need to store a count in the value for each key rather than just use the OrderedDict as an OrderedSet. Like this:
import collections
d = collections.OrderedDict()
lastch, runlength = None, 0
for ch in x:
    if ch == lastch:
        runlength += 1
    else:
        # The run just ended; record it only if it beats the stored run
        if runlength > d.get(lastch, 0):
            try:
                del d[lastch]
            except KeyError:
                pass
            d[lastch] = runlength
        lastch, runlength = ch, 1
# Handle the final run the same way
if runlength > d.get(lastch, 0):
    try:
        del d[lastch]
    except KeyError:
        pass
    d[lastch] = runlength
x = ''.join(d)
You may notice that there's a bit of repetition here, and a lot of verbosity. You can simplify the problem quite a bit by breaking it into two steps: first compress the string into runs, then just keep track of the largest run for each character. Thanks to the magic of iterators, this doesn't even have to be done in two passes, the first step can be done lazily.
Also, because you're still using Python 2.7 and therefore don't have OrderedDict.move_to_end, we have to do that silly delete-then-add shuffle, but we can use pop to make that more concise.
So:
import collections
import itertools
d = collections.OrderedDict()
for key, group in itertools.groupby(x):
    runlength = len(list(group))
    if runlength > d.get(key, 0):
        # pop-then-set moves the key to the end
        d.pop(key, None)
        d[key] = runlength
x = ''.join(d)
A different way to solve this would be to use a plain-old dict, and store the runlength and position for each character, then sort the results in position order. This means we no longer need to do the move-to-end shuffle, we're just updating the position as part of the value:
d = {}
for i, (key, group) in enumerate(itertools.groupby(x)):
    runlength = len(list(group))
    if runlength > d.get(key, (None, 0))[1]:
        d[key] = (i, runlength)
x = ''.join(sorted(d, key=d.__getitem__))
However, I'm not sure this improvement actually improves the readability, so I'd go with the second version above.
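For readers on Python 3.7 or later, the groupby approach can be sketched with a plain dict, since dicts now preserve insertion order. The function name longest_run_order is made up for this example; the checks use the sample inputs and outputs from the question.

```python
from itertools import groupby


def longest_run_order(x):
    # Hypothetical helper: keep each character once, positioned where
    # its longest consecutive run occurs.
    d = {}
    for key, group in groupby(x):
        runlength = sum(1 for _ in group)
        if runlength > d.get(key, 0):
            # Remove and re-insert so the key moves to the end
            d.pop(key, None)
            d[key] = runlength
    return ''.join(d)


print(longest_run_order('abcaa'))     # bca
print(longest_run_order('aaadcea'))   # adce
print(longest_run_order('abcdaaae'))  # bcdae
```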

This is an inelegant, ugly, inefficient, and almost certainly non-Pythonic solution but I think it does what you're looking for.
t = raw_input('Write your string here: ')
# Create a dict to store the longest run seen for each character
seen = dict()
# Make sure we actually have a string
if len(t) < 1:
    print ""
else:
    prevChar = t[0]
    count = 0
    for char in t:
        if char == prevChar:
            count = count + 1
        else:
            # Check if the substring we just finished is the longest
            if count > seen.get(prevChar, 0):
                seen[prevChar] = count
            # Characters differ, restart
            count = 1
            prevChar = char
    # Record the last run, but only if it is the longest for that character
    if count > seen.get(prevChar, 0):
        seen[prevChar] = count
    # Now let's build the string, appending the character when we find the longest version
    count = 0
    prevChar = t[0]
    finalString = ""
    for char in t:
        if char in finalString:
            # Make sure we don't append a char twice, append the first time we find the longest subsequence
            continue
        if char == prevChar:
            count = count + 1
        else:
            # Check if the substring we just finished is the longest
            if count == seen.get(prevChar, 0):
                finalString = finalString + prevChar
            # Characters differ, restart
            count = 1
            prevChar = char
    # Check the last character
    if count == seen[prevChar]:
        finalString = finalString + prevChar
    print finalString

How to remove duplicates only if consecutive in a string? [duplicate]

For a string such as '12233322155552', by removing the duplicates, I can get '1235'.
But what I want to keep is '1232152', only removing the consecutive duplicates.

import re
# Only repeated numbers
answer = re.sub(r'(\d)\1+', r'\1', '12233322155552')
# Any repeated character
answer = re.sub(r'(.)\1+', r'\1', '12233322155552')

You can use itertools; here is the one-liner:
>>> import itertools
>>> s = '12233322155552'
>>> ''.join(i for i, _ in itertools.groupby(s))
'1232152'

Microsoft / Amazon job interview type of question:
This is the pseudocode, the actual code is left as exercise.
for each char in the string do:
    if the current char is equal to the next char:
        delete next char
    else
        continue
return string
At a higher level, try (not the actual implementation):
for s in string:
    if s == s+1: ## check until the end of the string
        delete s+1

Hint: the itertools module is super-useful. One function in particular, itertools.groupby, might come in really handy here:
itertools.groupby(iterable[, key])
Make an iterator that returns consecutive keys and groups from
the iterable. The key is a function computing a key value for each
element. If not specified or is None, key defaults to an identity
function and returns the element unchanged. Generally, the iterable
needs to already be sorted on the same key function.
So since strings are iterable, what you could do is:
use groupby to collect neighbouring elements
extract the keys from the iterator returned by groupby
join the keys together
which can all be done in one clean line..
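To make the three steps concrete, here is a small sketch that first shows what groupby yields for the string from the question and then joins the keys. The variable names are illustrative only.

```python
from itertools import groupby

s = '12233322155552'

# Step 1: groupby collects neighbouring equal characters into runs
runs = [(key, len(list(group))) for key, group in groupby(s)]
print(runs)
# [('1', 1), ('2', 2), ('3', 3), ('2', 2), ('1', 1), ('5', 4), ('2', 1)]

# Steps 2 and 3: keep only the key of each run and join them
collapsed = ''.join(key for key, _ in runs)
print(collapsed)  # 1232152
```

Note that groupby only merges adjacent equal items, which is why the later '2' and '1' runs survive as separate entries.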

First of all, you can't remove anything from a string in Python (google "Python immutable string" if this is not clear).
My first approach would be:
foo = '12233322155552'
bar = ''
for ch in foo:
    # Append only when the character differs from the last one kept
    if bar == '' or ch != bar[-1]:
        bar += ch
or, using the itertools hint from above:
from itertools import groupby
''.join([ k[0] for k in groupby(foo) ])

+1 for groupby. Off the cuff, something like:
from itertools import groupby
def remove_dupes(arg):
    # create generator of distinct characters, ignore grouper objects
    unique = (i[0] for i in groupby(arg))
    return ''.join(unique)
Cooks for me in Python 2.7.2

number = '12233322155552'
temp_list = []
for item in number:
    if len(temp_list) == 0:
        temp_list.append(item)
    elif len(temp_list) > 0:
        if temp_list[-1] != item:
            temp_list.append(item)
print(''.join(temp_list))

This would be a way:
def fix(a):
    result = []  # avoid naming this "list", which would shadow the built-in
    for element in a:
        # fill the list if the list is empty
        if len(result) == 0:
            result.append(element)
        # check with the last element of the list
        if result[-1] != element:
            result.append(element)
    print(''.join(result))
a = 'GGGGiiiiniiiGinnaaaaaProtijayi'
fix(a)
# output => GiniGinaProtijayi

import re
t = '12233322155552'
for i in t:
    dup = i+i
    # Collapse each doubled character into one; repeated passes shrink longer runs
    t = re.sub(dup, i, t)
You get 1232152 as the final output.
