Most common character in a string - python

Write a function that takes a string consisting of alphabetic
characters as input argument and returns the most common character.
Ignore white spaces i.e. Do not count any white space as a character.
Note that capitalization does not matter here i.e. that a lower case
character is equal to an upper case character. In case of a tie between
certain characters return the last character that has the most count.
This is the updated code:
def most_common_character (input_str):
    input_str = input_str.lower()
    new_string = "".join(input_str.split())
    print(new_string)
    length = len(new_string)
    print(length)
    count = 1
    j = 0
    higher_count = 0
    return_character = ""
    for i in range(0, len(new_string)):
        character = new_string[i]
        while (length - 1):
            if (character == new_string[j + 1]):
                count += 1
            j += 1
            length -= 1
        if (higher_count < count):
            higher_count = count
    return (character)
#Main Program
input_str = input("Enter a string: ")
result = most_common_character(input_str)
print(result)
The above is my code. I am getting a "string index out of bound" error and I can't understand why. Also, the code only checks the occurrences of the first character. I am confused about how to proceed to the next character and keep the maximum count.
The error I get when I run my code:
> Your answer is NOT CORRECT Your code was tested with different inputs.
> For example when your function is called as shown below:
>
> most_common_character ('The cosmos is infinite')
>
> ############# Your function returns ############# e The returned variable type is: type 'str'
>
> ######### Correct return value should be ######## i The returned variable type is: type 'str'
>
> ####### Output of student print statements ###### thecosmosisinfinite 19

You can use a regex pattern to search for all characters. \w matches any alphanumeric character and the underscore; for ASCII text this is equivalent to the set [a-zA-Z0-9_]. A + after \w means to match one or more repetitions.
Finally, you use Counter to total them and most_common(1) to get the top value. See below for the case of a tie.
from collections import Counter
import re
s = "Write a function that takes a string consisting of alphabetic characters as input argument and returns the most common character. Ignore white spaces i.e. Do not count any white space as a character. Note that capitalization does not matter here i.e. that a lower case character is equal to a upper case character. In case of a tie between certain characters return the last character that has the most count"
>>> Counter(c.lower() for c in re.findall(r"\w", s)).most_common(1)
[('t', 46)]
In the case of a tie, it is a little more tricky.
def top_character(some_string):
    # r"\w" (no +) yields single characters rather than whole words
    joined_characters = re.findall(r"\w", some_string.lower())
    d = Counter(joined_characters)
    highest = max(d.values())
    top_characters = [c for c, n in d.items() if n == highest]
    if len(top_characters) == 1:
        return top_characters[0]
    # On a tie, return the tied character that occurs last in the string
    for c in reversed(joined_characters):
        if c in top_characters:
            return c
>>> top_character(s)
't'
>>> top_character('the the')
'e'
In the case of your code above and your sentence "The cosmos is infinite", you can see that 'i' occurs more frequently than 'e' (the output of your function):
>>> Counter(c.lower() for c in "".join(re.findall(r"[\w]+", 'The cosmos is infinite'))).most_common(3)
[('i', 4), ('s', 3), ('e', 2)]
You can see the issue in your code block:
for i in range(0, len(new_string)):
    character = new_string[i]
    ...
return (character)
Each iteration assigns the current letter to the variable character, which is never reassigned elsewhere. When the loop ends, character holds the last character of your string, so that is what the function always returns.

Actually your code is almost correct. You need to move count, j and length inside of your for i in range(0, len(new_string)) loop, because they have to start over on each iteration. Also, if count is greater than higher_count, you need to save that character as return_character and return it instead of character, which is always the last char of your string because of character = new_string[i].
I don't see why you used j+1 and while length-1. After correcting them, the code now covers tie situations as well.
def most_common_character (input_str):
    input_str = input_str.lower()
    new_string = "".join(input_str.split())
    higher_count = 0
    return_character = ""
    for i in range(0, len(new_string)):
        count = 0
        length = len(new_string)
        j = 0
        character = new_string[i]
        while length > 0:
            if (character == new_string[j]):
                count += 1
            j += 1
            length -= 1
        if (higher_count <= count):
            higher_count = count
            return_character = character
    return (return_character)

If we ignore the "tie" requirement, collections.Counter() works:
from collections import Counter
from itertools import chain
def most_common_character(input_str):
    return Counter(chain(*input_str.casefold().split())).most_common(1)[0][0]
Example:
>>> most_common_character('The cosmos is infinite')
'i'
>>> most_common_character('ab' * 3)
'a'
To return the last character that has the most count, we could use collections.OrderedDict:
from collections import Counter, OrderedDict
from itertools import chain
from operator import itemgetter
class OrderedCounter(Counter, OrderedDict):
    pass
def most_common_character(input_str):
    counter = OrderedCounter(chain(*input_str.casefold().split()))
    return max(reversed(counter.items()), key=itemgetter(1))[0]
Example:
>>> most_common_character('ab' * 3)
'b'
Note: this solution assumes that max() returns the first character that has the most count (hence the reversed() call, to get the last one) and that all characters are single Unicode codepoints. In general, you might want to use the \X regular expression (supported by the regex module) to extract "user-perceived characters" (eXtended grapheme clusters) from the Unicode string.
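As a further sketch of the tie rule (not taken from the answers above), one way to read "the last character that has the most count" is: among the characters tied for the highest count, pick the one whose final occurrence comes latest in the string. The helper name most_common_last and the use of str.rindex are illustrative choices, and single-codepoint characters are assumed.

```python
from collections import Counter

def most_common_last(input_str):
    # Lower-case and drop all whitespace, as the task requires.
    text = "".join(input_str.lower().split())
    if not text:
        return ""  # assumption: an empty input yields an empty string
    counts = Counter(text)
    # Rank by count first, then by the position of the last occurrence.
    return max(counts, key=lambda ch: (counts[ch], text.rindex(ch)))

print(most_common_last("The cosmos is infinite"))  # i
print(most_common_last("ab" * 3))                  # b
```

The tuple key keeps the tie-break in one expression, at the cost of one rindex scan per distinct character.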

Related

Remove string character after run of n characters in string

Suppose you have a given string and an integer, n. Every time a character appears in the string more than n times in a row, you want to remove some of the characters so that it only appears n times in a row. For example, for the case n = 2, we would want the string 'aaabccdddd' to become 'aabccdd'. I have written this crude function that compiles without errors but doesn't quite get me what I want:
def strcut(string, n):
    for i in range(len(string)):
        for j in range(n):
            if i + j < len(string)-(n-1):
                if string[i] == string[i+j]:
                    beg = string[:i]
                    ends = string[i+1:]
                    string = beg + ends
    print(string)
These are the outputs for strcut('aaabccdddd', n):
n    output     expected
1    'abcdd'    'abcd'
2    'acdd'     'aabccdd'
3    'acddd'    'aaabccddd'
I am new to Python but I am pretty sure that my error is in line 3, 4 or 5 of my function. Does anyone have any suggestions or know of any methods that would make this easier?
This may not answer why your code does not work, but here's an alternate solution using regex:
import re
def strcut(string, n):
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1"*n, string)
How it works: first, the formatted pattern is "(.)\1{n-1,}". If n=3, the pattern becomes "(.)\1{2,}".
(.) is a capture group that matches any single character (except a newline)
\1 matches the first capture group
{2,} matches the previous token 2 or more times
The replacement string is the first capture group repeated n times.
For example, take string = "aaaab" and n = 3. The first "a" is the capture group (.). The next 3 "aaa" match \1{2,} (in this example, a{2,}). So the whole thing matches "a" + "aaa" = "aaaa", which is replaced with "aaa".
regex101 can explain it better than me.
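To make the pattern construction concrete, here is a small sketch that prints the pattern for several values of n next to the result on the question's sample string. The helper run_pattern is a hypothetical name, and the callable replacement is a swapped-in alternative to the r"\1"*n replacement string used above.

```python
import re

def run_pattern(n):
    # Same pattern as above, built with %-formatting instead of an f-string.
    return r"(.)\1{%d,}" % (n - 1)

def strcut(string, n):
    # A callable replacement avoids building the "\1\1..." string by hand.
    return re.sub(run_pattern(n), lambda m: m.group(1) * n, string)

for n in (1, 2, 3):
    print(n, run_pattern(n), strcut("aaabccdddd", n))
# 1 (.)\1{0,} abcd
# 2 (.)\1{1,} aabccdd
# 3 (.)\1{2,} aaabccddd
```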
You can implement a stack data structure.
The idea is to push each new character onto the stack. If it is the same as the character on top of the stack, increase the counter and push the character only while the counter is within the limit n; otherwise skip it. If the new character differs from the previous one, push it and reset the counter to 1.
def func(string, n):
    stack = []
    counter = None
    for i in string:
        if not stack:
            counter = 1
            stack.append(i)
        elif stack[-1]==i:
            if counter+1<=n:
                stack.append(i)
                counter+=1
        elif stack[-1]!=i:
            stack.append(i)
            counter = 1
    return ''.join(stack)
print(func('aaabbcdaaacccdsdsccddssse', 2)=='aabbcdaaccdsdsccddsse')
print(func('aaabccdddd',1 )=='abcd')
print(func('aaabccdddd',2 )=='aabccdd')
print(func('aaabccdddd',3 )=='aaabccddd')
output
True
True
True
True
The method I would use is to create a new empty string at the start of the function and, whenever a character's run exceeds n, simply not add it to the output string. This is computationally efficient because it makes a single pass over the input:
def strcut(string,n) :
    new_string = ""
    first_c, s = string[0], 0
    for c in string :
        if c != first_c :
            first_c, s= c, 0
        s += 1
        if s > n : continue
        else : new_string += c
    return new_string
print(strcut("aabcaaabbba",2)) # output : #aabcaabba
Simply, to answer the question
appears in the string more than n times in a row
the following code is small and simple, and will work fine :-)
def strcut(string: str, n: int) -> str:
    tmp = "*" * (n+1)
    for char in string:
        if tmp[len(tmp) - n:] != char * n:
            tmp += char
    print(tmp[n+1:])
strcut("aaabccdddd", 1)
strcut("aaabccdddd", 2)
strcut("aaabccdddd", 3)
Output:
abcd
aabccdd
aaabccddd
Notes:
The character "*" in the line tmp = "*" * (n+1) can be any character that is not in the string; it's just a placeholder to handle the start case when there are no characters yet.
The print(tmp[n+1:]) line simply removes the "*" characters added at the beginning.
You don't need nested loops. Keep track of the current character and its count. Include characters while the count is less than or equal to n, and reset the current character and count when the character changes.
def strcut(s,n):
    result = ''               # resulting string
    char,count = '',0         # initial character and count
    for c in s:               # only loop once on the characters
        if c == char: count += 1    # increase count
        else: char,count = c,1      # reset character/count
        if count<=n: result += c    # include character if count is ok
    return result
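The same "track the run, keep at most n" idea can also be sketched with itertools.groupby, which groups consecutive equal characters. This is a swapped-in standard-library variant rather than the answer's own method, and strcut_groupby is a hypothetical name.

```python
from itertools import groupby, islice

def strcut_groupby(s, n):
    # groupby yields one group per run of equal consecutive characters;
    # islice keeps at most the first n characters of each run.
    return "".join("".join(islice(run, n)) for _, run in groupby(s))

print(strcut_groupby("aaabccdddd", 2))  # aabccdd
```

Collecting the pieces with str.join avoids repeated string concatenation inside the loop.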
Just to give some ideas, this is a different approach. I didn't like how the inner loop over n restarts on every step: even if I am at i=3 with n=2, I still move on to i=4 although that character was already checked while going through n. And since you are checking the next n characters in the string, your method doesn't fit with keeping the string in order. Here is a rough method that I find easier to read. Note that it relies on string.count, so it limits the total number of occurrences of a character rather than the length of each run.
def strcut(string, n):
    for i in range(len(string)-1,0,-1): # I go backwards assuming you want to keep the front characters
        if string.count(string[i]) > n:
            string = remove(string,i)
    print(string)
def remove(string, i):
    if i > len(string):
        return string[:i]
    return string[:i] + string[i+1:]
strcut('aaabccdddd',2)

find first not repeating character

Problem:
Given a string s consisting of small English letters, find and return the first instance of a non-repeating character in it. If there is no such character, return '_'.
Example
For s = "abacabad", the output should be
firstNotRepeatingCharacter(s) = 'c'.
This is what I have so far, but it's too slow. How can I make the run time faster? Thanks.
def firstNotRepeatingCharacter(s):
    char = set(s)
    for i in range(len(s)):
        if s.count(s[i]) == 1:
            return s[i]
    return "_"
You could use collections.Counter to count the characters in linear time, and then filter the result in conjunction with next, like this:
from collections import Counter
def firstNotRepeatingCharacter(s):
    counts = Counter(s)
    return next((ch for ch in s if counts[ch] < 2), "_")
print(firstNotRepeatingCharacter("abacabad"))
Output
c
Or simply use a dictionary (no imports needed):
def firstNotRepeatingCharacter(s):
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    return next((ch for ch in s if counts[ch] < 2), "_")
Both approaches are linear in the length of the input string, whereas your current approach calls s.count, which is itself linear, once per position and is therefore quadratic in the worst case.
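A slightly shorter variant is sketched below. It assumes Python 3.7 or later, where Counter (a dict subclass) keeps keys in first-seen order, so the first key with a count of 1 is also the first non-repeating character. The name first_not_repeating is a hypothetical one.

```python
from collections import Counter

def first_not_repeating(s):
    counts = Counter(s)  # keys keep first-seen order on Python 3.7+
    # Scan the distinct characters instead of the whole string again.
    return next((ch for ch, n in counts.items() if n == 1), "_")

print(first_not_repeating("abacabad"))  # c
```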

The function returns different answers for the same given

The task is to write a function that takes a string and returns the most repeated character in it, ignoring punctuation, white space and numbers. It also treats "A" == "a". If several characters are equally frequent, it returns the letter that comes first in the Latin alphabet.
Here is the function with the examples; I've commented it for clarification.
def checkio(text):
    # any char
    result = "a"
    # iterating through small chars
    for i in text.lower():
        # iterating through lowercase letters only and not considering punctuation, white spaces and letters
        if i in string.ascii_lowercase:
            # If i is repeated more than the result
            if text.count(i) > text.count(result):
                result = i
            # in case the letters are equal in repeated the same time
            elif text.count(i) == text.count(result):
                # returning according to the letter which comes first in the Latin alphabet
                if string.ascii_lowercase.find(i) < string.ascii_lowercase.find(result):
                    result = i
    return result
print(checkio("Hello World!"))
print(checkio("How do you do?"))
print(checkio("One"))
print(checkio("Oops!"))
print(checkio("abe"))
print(checkio("a" * 9000 + "b" * 1000))
# Here is the problem
print(checkio("AAaooo!!!!")) # returns o
print(checkio("aaaooo!!!!")) # returns a --> the right solution!
Your calls to text.count don't call lower() first. At the top of the function, you should call text = text.lower(). Then your text.count calls will work on the same normalized, lowercase characters that your iterator does.
For "AAaooo!!!!", the original text contains A twice, a once and o three times, so text.count("a") is 1 while text.count("o") is 3.
Hence, before counting the characters, you need to convert the entire string to lower case. That will solve your issue.
import string
def checkio(text):
    # any char
    result = "a"
    # iterating through small chars
    for i in text.lower():
        # iterating through lowercase letters only and not considering punctuation, white spaces and letters
        if i in string.ascii_lowercase:
            # If i is repeated more than the result
            if text.lower().count(i) > text.lower().count(result):
                result = i
            # in case the letters are equal in repeated the same time
            elif text.lower().count(i) == text.lower().count(result):
                # returning according to the letter which comes first in the Latin alphabet
                if string.ascii_lowercase.find(i) < string.ascii_lowercase.find(result):
                    result = i
    return result
print(checkio("AAaooo!!!!")) # returns a
print(checkio("aaaooo!!!!")) # returns a
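For comparison, here is a compact sketch of the same rule with collections.Counter: lowercase once, count only ASCII letters, then choose the highest count with alphabetical order as the tie-break. The name checkio_counter is hypothetical, and the sketch assumes the text contains at least one letter.

```python
import string
from collections import Counter

def checkio_counter(text):
    # Lowercase once, then count ASCII letters only.
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # min over (-count, letter): highest count first, then alphabetical order.
    return min(counts, key=lambda ch: (-counts[ch], ch))

print(checkio_counter("AAaooo!!!!"))  # a
print(checkio_counter("aaaooo!!!!"))  # a
```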

Counting words starting with a character

Write a function that accepts a string and a character as input and
returns the count of all the words in the string which start with the
given character. Assume that capitalization does not matter here. You
can assume that the input string is a sentence i.e. words are
separated by spaces and consists of alphabetic characters.
This is my code:
def count_input_character (input_str, character):
    input_str = input_str.lower()
    character = character.lower()
    count = 0
    for i in range (0, len(input_str)):
        if (input_str[i] == character and input_str[i - 1] == " "):
            count += 1
    return (count)
#Main Program
input_str = input("Enter a string: ")
character = input("Enter character whose occurances are to be found in the given input string: ")
result = count_input_character(input_str, character)
#print(result)
The only part missing here is how to check whether the first word of the sentence starts with the user-given character. Consider this output:
Your answer is NOT CORRECT Your code was tested with different inputs. > For example when your function is called as shown below:
count_input_character ('the brahman the master of the universe', 't')
####### Your function returns ############# 2 The returned variable type is: type 'int'
### Correct return value should be ######## 3 The returned variable type is: type 'int'
Your function misses the first t because in this line
if (input_str[i] == character and input_str[i - 1] == " "):
when i is 0, then input_str[i - 1] is input_str[-1] which Python will resolve as the last character of the string!
To fix this, you could change your condition to
if input_str[i] == character and (i == 0 or input_str[i - 1] == " "):
Or use str.split with a list comprehension. Or a regular expression like r'(?i)\b%s', where (?i) means "ignore case", \b is a word boundary and %s is a placeholder for the character.
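The regular-expression variant mentioned above could look roughly like the sketch below. Passing the character through re.escape is an added precaution that the answer does not mention, and the sketch assumes the character is a letter, as the task states.

```python
import re

def count_input_character(input_str, character):
    # (?i) ignores case; \b anchors the match at the start of a word.
    pattern = r"(?i)\b%s" % re.escape(character)
    return len(re.findall(pattern, input_str))

print(count_input_character("the brahman the master of the universe", "t"))  # 3
```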
Instead of looking for spaces, you could split input_str on whitespace, this would produce a list of words that you could then test against character. (Pseudocode below)
function F sentence, character {
    l = <sentence split by whitespace>
    count = 0
    for word in l {
        if firstchar(word) == character {
            count = count + 1
        }
    }
    return count
}
Although it doesn't fix your specific bug, for educational purposes, please note you could rewrite your function like this using a list comprehension:
def count_input_character (input_str, character):
    return len([x for x in input_str.lower().split() if x.startswith(character.lower())])
or even more efficiently (thanks to tobias_k):
def count_input_character (input_str, character):
    return sum(w.startswith(character.lower()) for w in input_str.lower().split())
def c_upper(text, char):
    text = text.title() #set leading char of words to uppercase
    char = char.upper() #set given char to uppercase
    k = 0 #counter
    for i in text:
        if i.istitle() and i == char: #checking conditions for problem, where i is a char in a given string
            k = k + 1
    return k

How to convert characters of a string from lowercase to upper case and vice versa without using string functions in python?

Write a function which accepts an input string and returns a string
where the case of the characters are changed, i.e. all the uppercase
characters are changed to lower case and all the lower case characters
are changed to upper case. The non-alphabetic characters should not be
changed. Do NOT use the string methods upper(), lower(), or swap().
This is my code:
def changing_cases (input_str):
    new_string = []
    for i in range(0, len(input_str)):
        convert = input_str[i]
        value = ord(convert)
        if (value >= 65 and value <= 90 ):
            value += 32
            new_string.append(chr(value))
        elif (value >= 97 and value <= 122):
            value -= 32
            new_string.append(chr(value))
    return (str(new_string))
#Main Program
input_str = "Hello"
result = changing_cases (input_str)
print (result)
This code works as expected but there are two major problems with it.
First, the output it returns to the main program is a list; I want it as a string.
Second, how do I check whether the string contains special characters and bypass them? Special characters are scattered all over the ASCII table.
Any help would be appreciated.
The string method .join() can help you join a list into a string. Even without it, you could have used plain string concatenation. (For that, you need to initialize new_string with "", not [].)
Join usage:
"".join(["h","e","l","l","o"])
# "hello"
To your second question: you can check whether a character is a letter with the .isalpha() method, which returns a boolean value.
"a".isalpha()
# True
"&".isalpha()
# False
And a suggestion about the solution: you could import the uppercase and lowercase alphabets from the string module. After that, iterating over the input and swapping letters using the alphabet strings is very easy. Your solution is fine for understanding how the ASCII table works, but with the approach I mentioned you avoid problems with special characters. It is a poor method for cryptology though.
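A minimal sketch of that suggestion, assuming ASCII letters only: look each character up in one alphabet string and take the letter at the same index from the other. The function name swap_case_by_lookup is hypothetical, and no upper(), lower() or swapcase() call is involved.

```python
import string

def swap_case_by_lookup(input_str):
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    swapped = []
    for ch in input_str:
        if ch in lower:
            swapped.append(upper[lower.index(ch)])  # a -> A, b -> B, ...
        elif ch in upper:
            swapped.append(lower[upper.index(ch)])  # A -> a, B -> b, ...
        else:
            swapped.append(ch)  # anything else is kept unchanged
    return "".join(swapped)

print(swap_case_by_lookup("Hello, World! 123"))  # hELLO, wORLD! 123
```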
Concerning the first problem, I've found it may be possible to use:
print(','.join(result))
or
print(str(result).strip('[]'))
Good luck!
def changing_cases (input_str):
    new_string = []
    for i in range(0, len(input_str)):
        convert = input_str[i]
        value = ord(convert)
        if 65 <= value <= 90:
            value += 32
            new_string.append(chr(value))
        elif 97 <= value <= 122:
            value -= 32
            new_string.append(chr(value))
        else:
            return
    return ''.join(new_string)
So this function will return None if there are any special characters in the string. You can simply add an if condition to check whether the result is None and, if so, skip that word.
You are close:
def changing_cases (input_str):
    new_string = []
    for i in range(0, len(input_str)):
        convert = input_str[i]
        value = ord(convert)
        if (value >= 65 and value <= 90 ):
            value += 32
            new_string.append(chr(value))
        elif (value >= 97 and value <= 122):
            value -= 32
            new_string.append(chr(value))
        else: #if special character
            new_string.append(chr(value))
    return (''.join(new_string))
#Main Program
input_str = "Hello"
result = changing_cases (input_str)
print (result)
In Python 3, one option is to use str.translate.
First, use the string module constants string.ascii_uppercase and string.ascii_lowercase to build strings with the entire character sets 'A..Z' and 'a..z'. Use str.maketrans to make two translation tables, one from upper case letters to lower case and another from lower case to upper case. Finally, loop through the required string and use str.translate to build the converted string.
import re
import string
test_str = r'The Tree outside is gREEN'
new_str = ''
str_lower = string.ascii_lowercase
#'abcdefghijklmnopqrstuvwxyz'
str_upper = string.ascii_uppercase
#'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
tr_to_upper = str.maketrans(str_lower, str_upper)
tr_to_lower = str.maketrans(str_upper, str_lower)
for char in test_str:
    if re.findall(r'[A-Z]', char):
        new_str = new_str + char.translate(tr_to_lower)
    elif re.findall(r'[a-z]', char):
        new_str = new_str + char.translate(tr_to_upper)
    else:
        new_str = new_str + char
print('{}'.format(new_str))
Output:
tHE tREE OUTSIDE IS Green
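As a possible simplification (a sketch, not the answer's own code), a single translation table can map both directions at once, so the whole string is converted in one str.translate call with no loop and no regex. SWAP_TABLE and swap_ascii_case are hypothetical names.

```python
import string

# One table covering both directions: a-z -> A-Z and A-Z -> a-z.
SWAP_TABLE = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_uppercase + string.ascii_lowercase,
)

def swap_ascii_case(text):
    # Characters that are not in the table are left untouched.
    return text.translate(SWAP_TABLE)

print(swap_ascii_case("The Tree outside is gREEN"))  # tHE tREE OUTSIDE IS Green
```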
How about a cipher? It's not quite as readable but it seriously reduces the line count.
def swap_cases(data):  # ONLY INTENDED FOR USE WITH ASCII! EBCDIC OR OTHER INTERCHANGE CODES MAY BE PROBLEMATIC!
    output = list(data)
    for i in range(0, len(data)):
        if 64 < ord(output[i]) < 91 or 96 < ord(output[i]) < 123: output[i] = chr(ord(data[i]) ^ ord(list(" "*70)[i % len(" "*70)]))
    return "".join(output)
print(swap_cases(input("ENTRY::")))
It does not affect special characters or anything outside the alphabet, takes 6 lines of code, is a relatively fast algorithm with no external modules, doesn't use swap() or other string functions, contains only one if block, and returns a string as requested, not a list.
EDIT:
Come to think of it, you can reduce a lot of the clutter by doing this:
if 64 < ord(output[i]) < 91 or 96 < ord(output[i]) < 123: output[i] = chr(ord(data[i]) ^ 32)
instead of:
if 64 < ord(output[i]) < 91 or 96 < ord(output[i]) < 123: output[i] = chr(ord(data[i]) ^ ord(list(" "*70)[i % len(" "*70)]))
