Character frequency in python 2.7


I'm so stuck on this task. I have to write a program in Python 2.7 that prompts the user to input a string and then returns the number of times each character in that string occurs. For example, the word "google.com" must return 'o': 3, 'g': 2, '.': 1, 'e': 1, 'l': 1, 'm': 1, 'c': 1
I know I need to use the list() function, but all I have so far is:
string = raw_input("Enter a string: ")
newString = list(string)
And then I get stuck, because I don't know how to make the program count the number of times each letter occurs. I know there must be a for loop, but I'm not sure how to use it in this case.
NB: We haven't been introduced to dictionaries or imports yet, so please keep it as simple as possible. Basically, the most roundabout method will work best.

You can handle this problem directly with the help of the str.count method.
You can start with an empty dictionary and add each character of the entered string, together with its count, to the dictionary.
This can be done like this:
string = raw_input("Enter a string: ")
count_dict = {}
for x in string:
    # string.count(x) rescans the whole string for each character; fine for short input
    count_dict[x] = string.count(x)
print count_dict
print count_dict
#input : google.com
# output : {'c': 1, 'e': 1, 'g': 2, 'm': 1, 'l': 1, 'o': 3, '.': 1}

Update:
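Since you haven't been introduced to dictionaries and imports, you can use the solution below. It loops over a set of the distinct characters, so the order of the printed lines is arbitrary.
for i in set(string):
    print "'{}': {}".format(i, string.count(i))  # Python 2.7 print statement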
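If even sets feel too advanced, the same tally can be kept with two parallel lists, using nothing beyond lists, a for loop and list.index. This is a minimal sketch: the function name char_frequency and the list names are illustrative, and print with a single argument runs under both Python 2.7 and Python 3.

```python
# Count characters using only two parallel lists (no dict, set or import).
def char_frequency(text):
    letters = []  # distinct characters, in the order they first appear
    counts = []   # counts[i] is how many times letters[i] occurs
    for ch in text:
        if ch in letters:
            counts[letters.index(ch)] += 1
        else:
            letters.append(ch)
            counts.append(1)
    return letters, counts

letters, counts = char_frequency("google.com")
for i in range(len(letters)):
    # %r adds the quotes, e.g. 'o': 3
    print("%r: %d" % (letters[i], counts[i]))
```

Each lookup scans the letters list, which is slower than a dictionary on long input but perfectly fine for a typed-in string.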
Use Counter:
from collections import Counter
string = "google.com"
print(Counter(string))
Another way is to create a dictionary and add characters to it while looping through your string.
dicta = {}
for i in string:
    if i not in dicta:
        dicta[i] = 1
    else:
        dicta[i] += 1
print(dicta)
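The example output in the question lists the most frequent characters first. A sketch of one way to get that ordering from a dictionary built like the one above: dict.get supplies a default of 0 for unseen characters, and sorted orders the (character, count) pairs by count. Characters with equal counts are not guaranteed any particular order here.

```python
text = "google.com"

counts = {}
for ch in text:
    # get() returns 0 the first time a character is seen
    counts[ch] = counts.get(ch, 0) + 1

# Sort the (character, count) pairs by count, highest first.
ordered = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
for ch, n in ordered:
    print("%r: %d" % (ch, n))
```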

Related

Why is this Python character sequence code giving an unexpected result?

I am writing a Python program to find the frequency of each character in a word, but the program gives an unexpected result.
I have found a similar program that works perfectly.
I think the two programs are quite similar, but I don't know why one of them does not work.
The program that is not working:
# Display the character sequence in a word
dict={}
string=input("Enter the string:").strip().lower()
for letter in string:
    if letter !=dict.keys():
        dict[letter]=1
    else:
        dict[letter]=dict[letter]+1
print(dict)
The program that is working:
def char_frequency(str1):
    dict = {}
    for n in str1:
        keys = dict.keys()
        if n in keys:
            dict[n] += 1
        else:
            dict[n] = 1
    return dict
print(char_frequency('google.com'))
The first program outputs:
Enter the string:google.com
{'g': 1, 'c': 1, 'm': 1, 'o': 1, 'l': 1, '.': 1, 'e': 1}
The output for the second program is:
{'c': 1, 'e': 1, 'o': 3, 'g': 2, '.': 1, 'm': 1, 'l': 1}
The above is the correct output.
Now the questions on my mind:
i. Why is the first program not working correctly?
ii. Is the idea behind these two programs different?

Actually, there's a small mistake in the if statement you have used. Have a look at the modified program below.
Note: Also make sure not to use built-in type names like dict as variable names. I have changed that to d here.
>>> d = {}
>>>
>>> string=input("Enter the string:").strip().lower()
Enter the string:google.com
>>>
>>> for letter in string:
...     if letter not in d.keys():
...         d[letter] = 1
...     else:
...         d[letter] = d[letter] + 1
...
>>> print(d)
{'g': 2, 'o': 3, 'l': 1, 'e': 1, '.': 1, 'c': 1, 'm': 1}
>>>
You can also have a look at the statements below, executed in the interpreter.
Comparing a key with d.keys() using == always returns False (and using != always returns True), because the key is a string here, while d.keys() is an object of type dict_keys in Python 3 and a list in Python 2. To test whether a key exists, use in.
>>> d = {"k1": "v1", "k3": "v2", "k4": "Rishi"}
>>>
>>> d.keys()
dict_keys(['k1', 'k3', 'k4'])
>>>
>>> "k1" in d
True
>>>
>>> not "k1" in d
False
>>>
>>> "k1" == d.keys()
False
>>>
>>> "k1" not in d
False
>>>
Answers to your two questions:
i. Because the expression letter != dict.keys() is always True, every letter takes the first branch and its count is reset to 1, so nothing is ever incremented. Just change it to letter not in dict.keys(). It is also better to use d in place of dict, so the statement becomes letter not in d.keys().
ii. The logic of both programs is the same: iterate over the string and check whether each character is already a key in the dictionary. If it is not, create the key with count 1; otherwise, increment its count by 1.

This line is nonsensical:
if letter !=dict.keys():
letter is a length-one str, while dict.keys() returns a keys view object (a list in Python 2), which is guaranteed never to equal a str of any kind. Your != check is therefore always true, so every letter takes the first branch and its count is reset to 1. The correct logic would be:
if letter not in dict:
(you could add .keys() if you really want to, but it's wasteful and pointless; membership testing on a dict is checking its keys implicitly).
Side-note: You're going to confuse the crap out of yourself by naming a variable dict, because you're name-shadowing the dict constructor; if you ever need to use it, it won't be available in that scope. Don't shadow built-in names if at all possible.
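Putting both fixes together, here is a sketch of the first program rewritten as a function (the names char_frequency and counts are illustrative): the membership test uses in directly on the dictionary, and the dictionary is no longer named dict.

```python
def char_frequency(text):
    counts = {}  # not named dict, so the built-in stays available
    for letter in text.strip().lower():
        # "in" on a dict tests its keys; no .keys() call is needed
        if letter not in counts:
            counts[letter] = 1
        else:
            counts[letter] += 1
    return counts

print(char_frequency("google.com"))
```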

Dictionary comprehension produces a different result from a loop

The following loop, which checks whether hand contains the letters in word, works as expected:
hand = {'h': 1, 'e': 1, 'l': 2, 'o': 1}
word = 'hello'
extra_hand = hand.copy()
for letter in word:
    extra_hand[letter] -= 1
>>> extra_hand
{'h': 0, 'e': 0, 'l': 0, 'o': 0}
Then I tried to convert it to a dictionary comprehension, which I expected to look like this:
hand = {'h': 1, 'e': 1, 'l': 2, 'o': 1}
word = 'hello'
extra_hand = {letter:hand[letter] - 1 for letter in word}
>>> extra_hand
{'h': 0, 'e': 0, 'l': 1, 'o': 0}
As you can see, the result is different: l is 1, which is incorrect. I suspect that the value for 'l' was taken from the hand dictionary, which is never mutated, so it just computed 2-1 twice and got 1, rather than 2-1 and then 1-1.
What should I do to fix the dictionary comprehension please?

A dictionary comprehension cannot be used in this cumulative way: it cannot keep updating an item as word is iterated.
Another way to think of this is that the keys and values of your dictionary are not available for manipulation until the entire comprehension is complete.
You can consider the dictionary comprehension to be replicating the for loop below. As with that for loop, you are setting values rather than adding to the value previously assigned to the key.
extra_hand = {}
for letter in word:
    extra_hand[letter] = hand[letter] - 1
Your loop is perfectly fine and there is no need to use a dictionary comprehension.
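If a single expression is still wanted, one option is to build the result from hand rather than from word, and subtract each letter's total number of occurrences in word at once with str.count. A sketch under that assumption; note that, unlike the loop, it silently ignores letters of word that are missing from hand instead of raising KeyError.

```python
hand = {'h': 1, 'e': 1, 'l': 2, 'o': 1}
word = 'hello'

# Iterate over hand's items and subtract how often each letter appears in word,
# so both 'l's are removed in one step.
extra_hand = {letter: count - word.count(letter) for letter, count in hand.items()}
print(extra_hand)
```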
As an alternative, if you only want to keep positive counts, you can use collections.Counter, whose subtraction drops any count that falls to zero or below:
from collections import Counter
hand = {'h': 1, 'e': 1, 'l': 2, 'o': 1}
word = 'hello'
res = Counter(hand) - Counter(word)
# Counter()
hand = {'h': 1, 'e': 2, 'l': 2, 'o': 1}
word = 'hello'
res = Counter(hand) - Counter(word)
# Counter({'e': 1})

Your two methods do not mean the same thing. If the dictionary comprehension were translated into a loop, you would get:
hand = {'h': 1, 'e': 1, 'l': 2, 'o': 1}
word = 'hello'
extra_hand = {}
for letter in word:
    extra_hand[letter] = hand[letter] - 1
So, hand['l'] is never changed and therefore, it's still 2 when the loop reaches the second l. That's why you get the value 1 both times.
In my opinion, the loop variant is perfectly fine.

extra_hand = {letter:hand[letter] - 1 for letter in word}
is equivalent to:
for letter in word:
    extra_hand[letter] = hand[letter] - 1
And not:
for letter in word:
    extra_hand[letter] -= 1
In the first case, extra_hand['l'] equals 1, while in the second case you subtract 1 twice (which gives 0).

python list.count always returns 0

I have a lengthy Python list and would like to count the number of occurrences of a single character. For example, how many total times does 'o' occur? I want N=4.
lexicon = ['yuo', 'want', 'to', 'sioo', 'D6', 'bUk', 'lUk'], etc.
list.count() is the obvious solution. However, it consistently returns 0. It doesn't matter which character I look for. I have double checked my file - the characters I am searching for are definitely there. I happen to be calculating count() in a for loop:
for i in range(100):
    # random sample 500 words
    sample = list(set(random.sample(lexicon, 500)))
    C1 = ['k']
    total = sum(len(i) for i in sample) # total words
    sample_count_C1 = sample.count(C1) / total
But it returns 0 outside of the for loop, over the list 'lexicon' as well. I don't want a list of overall counts so I don't think Counter will work.
Ideas?

If we take your list (the shortened version you supplied):
lexicon = ['yu', 'want', 'to', 'si', 'D6', 'bUk', 'lUk']
then we can get the count using sum() and a generator-expression:
count = sum(s.count(c) for s in lexicon)
so if c were, say, 'k', this would give 2, as there are two occurrences of k.
This will work in a for-loop or not, so you should be able to incorporate this into your wider code by yourself.
With your latest edit, I can confirm that this produces a count of 4 for 'o' in your modified list.
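To see why the original call returns 0: list.count compares whole list elements, so it only finds items equal to its argument, and C1 = ['k'] is a list that no word equals (even the string 'k' would only match a word that is exactly 'k'). A sketch using the shortened lexicon from the question as stand-in data; char_proportion is a hypothetical helper that divides by the total number of characters, using float() so Python 2 does not truncate the division.

```python
# Shortened stand-in for the real lexicon from the question.
lexicon = ['yuo', 'want', 'to', 'sioo', 'D6', 'bUk', 'lUk']

print(lexicon.count('o'))    # 0: no element is exactly 'o'
print(lexicon.count(['o']))  # 0: no element is the list ['o']
print(sum(word.count('o') for word in lexicon))  # 4: str.count per word, summed

def char_proportion(words, ch):
    """Occurrences of ch divided by the total number of characters in words."""
    total = sum(len(w) for w in words)  # total characters, not total words
    return sum(w.count(ch) for w in words) / float(total)

print(char_proportion(lexicon, 'o'))
```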

If I understand your question correctly, you would like to count the number of occurrences of each character across all the words in the list. This is known as a frequency distribution.
Here is a simple implementation using Counter:
from collections import Counter
lexicon = ['yu', 'want', 'to', 'si', 'D6', 'bUk', 'lUk']
chars = [char for word in lexicon for char in word]
freq_dist = Counter(chars)
Counter({'t': 2, 'U': 2, 'k': 2, 'a': 1, 'u': 1, 'l': 1, 'i': 1, 'y': 1, 'D': 1, '6': 1, 'b': 1, 's': 1, 'w': 1, 'n': 1, 'o': 1})
Using freq_dist, you can return the number of occurrences for a character.
freq_dist.get('a')
1
# get() returns None if the character is not in the Counter
freq_dist.get('4')
None
# indexing a Counter returns 0 for a missing character instead
freq_dist['4']
0

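It's giving zero because list.count compares whole elements: sample.count(C1) only counts list items equal to C1, and C1 is the list ['k'], so words such as 'bUk' or 'lUk' are never matched.
If you want to calculate the frequency of a character, count it inside each word instead:
import random
for i in range(100):
    # random sample 500 words
    sample = list(set(random.sample(lexicon, 500)))
    C1 = 'k'  # a string, because str.count needs a string argument
    total = sum(len(w) for w in sample)  # total characters
    sample_count = sum(x.count(C1) for x in sample)
    sample_count_C1 = sample_count / float(total)  # float avoids integer division in Python 2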

How to get a tally of each letter in a string?

I am tasked with coming up with a program that will decrypt a Caesar cipher. I have looked at other questions previously asked on this site and mostly understand them. However, I have a very basic question about how to get a tally of every letter within a string.
Here's what I have come up with so far:
Input=input("input the text you want to decipher:")
import string
print(string.ascii_uppercase)
def get_char(ch,shift):
    #get a tally of each letter
    common_letter=None  # placeholder: letter with most "tallies"
    return common_letter
    print(common_letter)
#finding the shift
def get_shift(s,ignore):
    for x in Input:
        shift=get_char-x
        if shift=='e':
            return x
        print(x)
def output_plaintext(s,shift):
    #convert back to English based off shift
    pass
def main():
    # main body where i call together my other functions
    pass
input("is this decrypted?")
#if no is inputted re run program with second most common letter
How do I get a count of each letter in a String?

This may help you:
from collections import Counter
text='Nathannn'
print(Counter(text))
Output:
Counter({'n': 3, 'a': 2, 'h': 1, 't': 1, 'N': 1})
If you want to ignore case, use Counter(text.lower()) instead.
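For the Caesar-cipher use case in the question, the tally can feed straight into a shift guess. A sketch, assuming the most common letter of the ciphertext stands for plaintext 'e' (the usual English letter-frequency heuristic, which can fail on short or unusual texts); guess_shift and decrypt are hypothetical names, and non-letters pass through unchanged.

```python
from collections import Counter
import string

def guess_shift(ciphertext, assumed_plain='e'):
    # Tally letters only, ignoring case, spaces and punctuation.
    letters = [ch for ch in ciphertext.lower() if ch in string.ascii_lowercase]
    if not letters:
        return 0  # nothing to analyse
    most_common_letter = Counter(letters).most_common(1)[0][0]
    # The shift that turns the assumed plaintext letter into the most common one.
    return (ord(most_common_letter) - ord(assumed_plain)) % 26

def decrypt(ciphertext, shift):
    result = []
    for ch in ciphertext:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord('a') - shift) % 26 + ord('a')))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord('A') - shift) % 26 + ord('A')))
        else:
            result.append(ch)  # keep spaces, digits and punctuation as-is
    return ''.join(result)

secret = "Phhw ph khuh"
print(decrypt(secret, guess_shift(secret)))
```

If the result is not readable, the question's plan of retrying with the second most common letter corresponds to taking Counter(letters).most_common(2)[1][0] instead.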

Here's another approach, if you can't use imported modules, e.g. collections:
>>> string = 'aardvark'
>>> {letter: string.count(letter) for letter in set(string)}
{'v': 1, 'r': 2, 'd': 1, 'a': 3, 'k': 1}
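Without imports, the same dictionary comprehension can also give the letter with the most tallies, which is what the question's get_char placeholder needs. A sketch; most_common_letter is a hypothetical name, the text must contain at least one letter (max raises ValueError otherwise), and when several letters tie, which one is returned is arbitrary.

```python
def most_common_letter(text):
    # Keep letters only and ignore case, so 'E' and 'e' are tallied together.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    tally = {letter: letters.count(letter) for letter in set(letters)}
    # max() with key=tally.get picks the key whose count is largest.
    return max(tally, key=tally.get)

print(most_common_letter("Phhw ph khuh"))
```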

You can get the last character in a string using negative indexing:
string[-1]
which is equivalent to string[len(string)-1].

Python KeyError while comparing chars in dict and list

I have a problem comparing a char key in a dict with a char in a list.
The task is to read a text and count the first letter of every word.
I have a list with chars:
bchars = ('i','g','h','n','h')
and a dict with the alphabet and each frequency defaulting to zero:
d = dict(dict())
for i in range(97,123):
    d[i-97]={chr(i):0}
Now I want to check as follows:
for i in range(len(bchars)):
    for j in range(len(d)):
        if(bchars[i] in d[j]):
            d[j][chr(i+97)] +=1
        else:
            d[j][chr(i+97)] +=0
So if the char in the list is a key at a given position, its count is incremented by 1; otherwise, 0 is added.
I thought that by using an if/else statement I could bypass the KeyError.
Is there any more elegant solution for that?

The specific problem is that you check whether bchars[i] is in d[j], but then the key you actually use is chr(i+97).
chr(i+97) is the index of the ith character in bchars, but mapped to ASCII characters starting from 'a'. Why would you want to use this as your key?
I think you really want to do:
for i in range(len(bchars)):
    for j in range(len(d)):
        if(bchars[i] in d[j]):
            d[j][bchars[i]] += 1
        else:
            d[j][bchars[i]] = 1
Note that you can't use += in the else; remember how you literally just checked whether the key was there and decided it wasn't?
More broadly, though, your code doesn't make sense - it is overcomplicated and does not use the real power of Python's dictionaries. d looks like:
{0: {'a': 0}, 1: {'b': 0}, 2: {'c': 0}, ...}
It would be much more sensible to build a dictionary mapping character directly to count:
{'a': 0, 'b': 0, 'c': 0, ...}
then you can simply do:
for char in bchars:
    if char in d:
        d[char] += 1
Python even comes with a class just for doing this sort of thing: collections.Counter.
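For the actual task (tallying the first letter of every word in a text), the flat mapping can be built with dict.fromkeys. A sketch, assuming words are separated by whitespace and only the ASCII letters a-z are counted; count_initial_letters is a hypothetical name.

```python
import string

def count_initial_letters(text):
    # One key per lowercase letter, every count starting at zero.
    counts = dict.fromkeys(string.ascii_lowercase, 0)
    for word in text.split():
        first = word[0].lower()
        if first in counts:  # skip words starting with a digit or punctuation
            counts[first] += 1
    return counts

counts = count_initial_letters("In good hands nobody hides")
print(counts['h'])
```

The sample sentence is chosen so its first letters match bchars = ('i','g','h','n','h') from the question.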

The nested dictionary doesn't seem necessary:
d = [0] * 26
for c in bchars:
    # assumes every c is a lowercase ASCII letter; ord('a') == 97
    d[ord(c)-97] += 1
You might also want to look at the Counter class in the collections module.

from collections import Counter
bchars = ('i','g','h','n','h')
counts = Counter(bchars)
print(counts)
print(counts['h'])
prints
Counter({'h': 2, 'i': 1, 'g': 1, 'n': 1})
2
