I was going through the code on this question for the second answer...
counting letter frequency with a dict
word_list=['abc','az','ed']
def count_letter_frequency(word_list):
    letter_frequency={}
    for word in word_list:
        for letter in word:
            keys=letter_frequency.keys()
            if letter in keys:
                letter_frequency[letter]+=1
            else:
                letter_frequency[letter]=1
    return letter_frequency
I don't understand how keys = letter_frequency.keys() connects to the rest of the problem.
I tried playing around with the code. I understand it as follows:
word_list=['abc','az','ed']
def count_letter_frequency(word_list):  # function name and input
    letter_frequency={}  # empty dictionary
    for word in word_list:  # go through word_list for each item, the first being 'abc'
        for letter in word:  # from 'abc', loop through each letter: 'a', 'b', 'c'
            keys=letter_frequency.keys()  # where I'm stuck. I read it as our empty dictionary,
                                          # letter_frequency, plus keys(), together defined by the
                                          # variable 'keys': make keys out of the letters in
                                          # letter_frequency (then count them using the next
                                          # if/else statement)
            if letter in keys:  # how can a letter be in keys? there isn't anything in
                                # letter_frequency either, is there?
                letter_frequency[letter]+=1
            else:
                letter_frequency[letter]=1
    return letter_frequency
The condition if letter in keys, apart from being overly complex, decides whether to add one to the letter's existing entry in the dictionary or to create a new entry for the letter's first occurrence. Checking if letter in letter_frequency would have done the same thing.
A simpler and more efficient approach would be to let the dictionary indexing work for us (as opposed to a search in keys, which is a sequential scan of a list in Python 2; in Python 3, keys() returns a view and the lookup is hashed):
word_list=['abc','az','ed']
letter_frequency = dict()
for word in word_list:
    for letter in word:
        letter_frequency[letter] = letter_frequency.get(letter,0) + 1
print(letter_frequency)
{'a': 2, 'b': 1, 'c': 1, 'z': 1, 'e': 1, 'd': 1}
letter_frequency={}
is an empty dictionary.
We then look at each word, one letter at a time.
for letter in word:
We find the keys, which is initially empty:
keys=letter_frequency.keys()
We then consider the current letter:
if letter in keys:
    letter_frequency[letter]+=1
else:
    letter_frequency[letter]=1
First time round, letter isn't there, since keys is empty, so we use the else to add the letter, with a count of one.
letter_frequency[letter]=1
When we look at the next letter, the previous letter is in the keys.
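The walkthrough above can be reproduced with a small trace. This is a minimal sketch, assuming Python 3, where keys() returns a view that reflects later changes to the dictionary. The helper name trace_counts is made up for illustration and is not part of the original code.

```python
def trace_counts(word_list):
    """Count letters and record which branch handled each letter."""
    letter_frequency = {}
    steps = []  # (letter, branch taken, keys present after the update)
    for word in word_list:
        for letter in word:
            keys = letter_frequency.keys()
            if letter in keys:
                letter_frequency[letter] += 1
                steps.append((letter, "if", sorted(keys)))
            else:
                letter_frequency[letter] = 1
                steps.append((letter, "else", sorted(keys)))
    return letter_frequency, steps


counts, steps = trace_counts(['abc', 'az', 'ed'])
for letter, branch, keys_now in steps:
    print(letter, branch, keys_now)
```

The first 'a' goes through the else branch because the dictionary is still empty; the 'a' of 'az' goes through the if branch because 'a' was added earlier.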
In your code above, keys = letter_frequency.keys() gets the keys currently present in the dictionary, such as ['a','b','c'], so that the code can check whether a key is already present in the dictionary.
In my opinion, one should use the dictionary's .get() method (here, letter_frequency.get()) to check whether a character is present in the dictionary.
Refer to the code below for clarification.
def countLetterFrequency(word_list):
    letter_frequency = {}
    for word in word_list:
        for ch in word:
            # use .get() function instead of .keys() to check the presence of character in dictionary.
            if not letter_frequency.get(ch, False):
                letter_frequency[ch] = 1
            else:
                letter_frequency[ch] += 1
    return letter_frequency
word_list = ['abc', 'az', 'ed']
print(countLetterFrequency(word_list))
Output:
{'a': 2, 'b': 1, 'c': 1, 'z': 1, 'e': 1, 'd': 1}
Note: .get() takes the key to look up (here, the character) as its first argument and, as its second argument, the default value to return when the key is not present.
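The behaviour of .get() described in the note can be checked directly. A minimal sketch; the sample dictionary contents are assumed for illustration.

```python
letter_frequency = {'a': 2, 'b': 1}

# Key present: the stored value is returned and the default is ignored.
present = letter_frequency.get('a', 0)

# Key absent: the default is returned, and the dictionary is not modified.
absent = letter_frequency.get('z', 0)

# With no second argument, the default is None.
no_default = letter_frequency.get('z')

print(present, absent, no_default)
print('z' in letter_frequency)
```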
In the first iteration there will be nothing in letter_frequency, therefore the else condition of if letter in keys triggers. In the line letter_frequency[letter]=1, just above the return, the current letter will be used as a new key for the empty dict and its value will be set to 1.
Then, in the next iteration of the inner for loop, keys=letter_frequency.keys() will not be empty anymore.
Write a function called word_freq that accepts a string as a formal parameter and uses an accumulation pattern to return a dictionary that contains the number of occurrences for each word in that string. Use the code below to test your function:
s = 'it is what it is and only what it is'
print(word_freq(s))
If your function is written correctly, the dictionary below will be displayed.
{'it': 3, 'is': 3, 'what': 2, 'and': 1, 'only': 1}
This should work:
def word_freq(s):
    freq = {}
    s = s.split(' ')
    for i in s:
        if i in freq.keys():
            freq[i] += 1
        else:
            freq[i] = 1
    return freq
Output for print(word_freq('it is what it is and only what it is')):
{'it': 3, 'is': 3, 'what': 2, 'and': 1, 'only': 1}
This takes your string s and defines a dictionary freq that will be used to count the number of times each word occurs in the string. It then splits the string at every space, which separates it into words and returns a list. After that it loops over that list and checks whether each word is already among the dictionary's keys (it, what and the other words will all be keys after the loop has finished). If so, it adds one to that word's count. If not, it creates a new key for the word and gives it a value of 1. Finally, the function returns the dictionary freq.
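A variant of the same accumulation pattern, as a sketch under one assumption: words might be separated by more than one space. In that case s.split(' ') produces empty strings, while s.split() with no argument splits on any run of whitespace. The name word_freq_ws is hypothetical.

```python
def word_freq_ws(s):
    """Count words, treating any run of whitespace as one separator."""
    freq = {}
    for word in s.split():
        freq[word] = freq.get(word, 0) + 1
    return freq


print('it  is'.split(' '))   # note the two spaces in the input
print(word_freq_ws('it  is what it is'))
```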
I have a list of characters. I would like to pull the punctuation out of the list and make each type of punctuation a key in a dictionary (so a key for ".", a key for "!", etc.) Then, I would like to count the number of times each punctuation character occurs in another list and make that count the value for the corresponding key. Problem is, every value in my dictionary changes instead of just the one key.
The output should look like this, because there are 2 "." and 4 "!" in charList:
{'.': [2], ',': [0], '!': [4], '?': [0]}
But instead it looks like this, because the "!" appears 4 times:
{'.': [4], ',': [4], '!': [4], '?': [4]}
# Create a list of characters that we will eventually count
charList = [".","!",".","!","!","!","p","p","p","p","p"]
# Create a list of the punctuation we want a count of
punctuationList = [".",",","!","?"]
# Group each type of punctuation and count the number of times it occurs
# Turn punctuationList into a dictionary with each punctuation type as a key,
# with a value that is the count of each time the key appears in charList
dic = dict.fromkeys(punctuationList,[0])
print (dic)
# Count each punctuation in the dictionary
for theKey in dic:  # iterate through each key (each punctuation type)
    counter = 0  # Set the counter at 0
    for theChar in charList:  # If the key matches the character in the list, then add 1 to the counter
        if theKey == theChar:
            counter = counter + 1
    dic[theKey][0] = counter  # Then change the value of that key to the number of times
                              # that character shows up in the list
print (dic)
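dict.fromkeys assigns the same value object to every key, so all four keys refer to one shared list, and changing it through one key changes it for all of them.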
You'll want
dic = {key: [0] for key in punctuationList}
instead to initialize a separate list for each key. (As an aside, there really is no need to wrap the number in a list.)
That said, your code could be implemented using the built-in collections.Counter in just
from collections import Counter
dic = dict(Counter(ch for ch in charList if ch in punctuationList))
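The sharing that dict.fromkeys introduces can be seen with an identity check. A minimal sketch; the variable names shared, separate and plain are chosen for illustration, and the character list is shortened from the question's charList.

```python
punctuation = [".", ",", "!", "?"]

shared = dict.fromkeys(punctuation, [0])
# Every key refers to the very same list object.
all_same = all(value is shared["."] for value in shared.values())
shared["!"][0] = 4          # mutate through one key; every key sees it

separate = {key: [0] for key in punctuation}
separate["!"][0] = 4        # only this key's list changes

# With immutable values such as int, fromkeys is safe, because
# += rebinds the key to a new int instead of mutating a shared object.
plain = dict.fromkeys(punctuation, 0)
for ch in [".", "!", ".", "!", "!", "!", "p"]:
    if ch in plain:
        plain[ch] += 1

print(all_same)
print(shared)
print(separate)
print(plain)
```

Unlike the Counter one-liner, the plain-int version keeps the punctuation marks whose count is zero.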
I do have a rough idea: use the ord and id functions to obtain values for the letters in a string. But I have no idea how to increment a count for every other occurrence of the same value. This is for my personal interview preparation. Please give me ideas and suggestions, not code.
Without using built in functions or counters, I would think outside the box. Just put a comment that instructs the user what to do physically at their desk. This uses just enough Python to get the job done without using any of the built in functions.
"""
Using a pencil, write down the word, then starting at the first letter:
Write down the letter and put a dash or tick or something next to it
Look for that letter in the word, and add additional marks next to the letter to keep track of the count.
When you've reached the end of the word, go to the next letter and repeat the process
Skip any letters that you have already counted(Otherwise instead of the count of each letter's occurrences, you'll get the factorial of the count!)
"""
Use a counter variable; something like this should get you started:
>>> count = 0
>>> meaty = 'asjhdkajhskjfhalksjhdflaksjdkhaskjd'
>>> for i in meaty:
...     if i == 'a':
...         count+=1
...
>>> print(count)
5
If you are tracking multiple occurrences of multiple letters, use the letter as a dictionary key that stores a counter integer:
>>> count = {}
>>> for i in meaty:
...     key = i
...     if key in count:
...         count[key]+=1
...     else:
...         count[key]=1
...
>>> count
{'a': 5, 'd': 4, 'f': 2, 'h': 5, 'k': 6, 'j': 6, 'l': 2, 's': 5}
Edit: Meh, no counter, no built in functions, no coding examples on Stack Overflow? Not sure how to count without counting and isn't python essentially a collection of built in functions? Isn't Stack Overflow for actual coding help?
I am facing a very unusual problem. Below is code inside a class, where pitnamebasename is a 2D list.
For example:
pitnamebasename= [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
The list is not necessarily in any order; for instance, ['d',''] could be at index 0.
Here is my function (inside a class):
def getRole(self,pitname):
    basename=pitname
    print '\n\n\nbasename=',basename
    ind=True
    while pitname and ind:
        ind=False
        basename=pitname
        print 'basename=',basename
        for i in self.pitnamebasename:
            print 'comparing-',i[0],'and',pitname
            if i[0] == pitname:
                pitname=i[1]
                print 'newpitname=',pitname
                ind=True
                break
    print 'returning-',basename
    return basename
pitname is a string; here, for example, it can be 'a'. I want the return value to be 'd', meaning the traversal must go a to b, b to c, c to d, and d to the empty string, hence the return value must be d.
Please don't suggest any other methods to solve this.
Now my problem is that the for loop does not run to the end of the list but exits in the middle, so the return value is b or c or even d, depending on what I am searching for. The actual list is very long. The strange thing I noticed is that the for loop only gets as far as the index it reached the first time. Here, for example, the for loop ends the first time when it finds 'a' and pitname becomes 'b', but when it then searches for 'b' it only loops as far as the entry for 'a'. Does anyone know why this is happening?
pitnamebasename= [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
First, change your '2d' array into a dict:
pitnamebasename = dict(pitnamebasename)
Now, it should be a simple matter of walking from element to element, using the value associated with the current key as the next key, until the value is the empty string; then return the current key. If pitname is ever missing from the dictionary, it's treated as if it were present and mapped to the empty string.
def getRole(self, pitname):
    while pitnamebasename.get(pitname, '') != '':
        pitname = pitnamebasename[pitname]
    return pitname
A defaultdict could also be used in this case:
from collections import defaultdict
pitnamebasename = defaultdict(str, pitnamebasename)
def getRole(self, pitname):
    while pitnamebasename[pitname] != '':
        pitname = pitnamebasename[pitname]
    return pitname
You asked for the solution to your problem but I am having trouble replicating the problem. This code does the same thing without requiring you to change your entire class storage system.
By converting your list of lists into a dictionary lookup (looks like the following)
as_list_of_lists = [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
as_dict = dict(as_list_of_lists)
# as_dict = {'a': 'b', 'c': 'd', 'b': 'c', 'd': '', 'h': 'f', 'm': 'f', 'n': 'm'}
we can do a dictionary lookup with a default, using the dictionary method .get. This will look for an item (say we pass it 'a') and return the associated value ('b'). If we look for something that isn't in the dictionary, .get will return its second argument (a default value), which we can supply as ''. Hence as_dict.get('z','') will return ''.
class this_class:
    def __init__(self):
        self.pitnamebasename= [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
    def getRole(self,pitname):
        lookup = dict(self.pitnamebasename)
        new_pitname = pitname
        while new_pitname != '':
            pitname = new_pitname
            new_pitname = lookup.get(pitname, '')
        return pitname
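The same walk can be written as a standalone function for testing. This is a sketch, not the asker's class: the name follow_chain is hypothetical, and the cycle check is an added assumption, since a chain such as x to y to x would otherwise loop forever.

```python
def follow_chain(pairs, start):
    """Follow start -> next -> ... and return the last non-empty name."""
    lookup = dict(pairs)
    seen = set()
    current = start
    while True:
        if current in seen:
            raise ValueError("cycle detected at %r" % current)
        seen.add(current)
        # A missing key is treated like a key mapped to ''.
        nxt = lookup.get(current, '')
        if nxt == '':
            return current
        current = nxt


pairs = [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
print(follow_chain(pairs, 'a'))
print(follow_chain(pairs, 'n'))
```

Starting from 'a' the walk ends at 'd'; starting from 'n' it ends at 'f', because 'f' never appears as a first element.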
I was trying to count duplicate words over a list of 230 thousand words. I used a Python dictionary to do so. The code is given below:
for words in word_list:
    if words in word_dict.keys():
        word_dict[words] += 1
    else:
        word_dict[words] = 1
The above code took 3 minutes! I ran the same code over 1.5 million words; it had been running for more than 25 minutes when I lost my patience and terminated it. Then I found that I can use the following code from here (also shown below). The result was so surprising: it completed within seconds! So my question is, what is the faster way to do this operation? I guess the dictionary creation process must be taking O(N) time. How was the Counter method able to complete this process in seconds and create an exact dictionary with each word as key and its frequency as value?
from collections import Counter
word_dict = Counter(word_list)
It's because of this:
if words in word_dict.keys():
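In Python 2, .keys() returns a list of all the keys. Lists take linear time to scan, so your program was running in quadratic time! (In Python 3, .keys() returns a view whose membership test is a hash lookup, so this slowdown does not occur there.)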
Try this instead:
if words in word_dict:
Also, if you're interested, you can see the Counter implementation for yourself. It's written in regular Python.
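The difference between scanning a list of keys and a hash lookup can be measured. This is a rough sketch, assuming Python 3: list(counts) is used to imitate the list that Python 2's .keys() returned, and the word data is synthetic. The function names are made up for illustration.

```python
import timeit


def count_with_list_scan(words):
    counts = {}
    for w in words:
        # list(counts) builds a list of the keys, as Python 2's .keys() did,
        # so the membership test scans it linearly.
        if w in list(counts):
            counts[w] += 1
        else:
            counts[w] = 1
    return counts


def count_with_hash_lookup(words):
    counts = {}
    for w in words:
        if w in counts:  # hash lookup, constant time on average
            counts[w] += 1
        else:
            counts[w] = 1
    return counts


# Hypothetical test data: 5,000 words drawn from 1,000 distinct values.
words = [str(i % 1000) for i in range(5000)]

slow = timeit.timeit(lambda: count_with_list_scan(words), number=1)
fast = timeit.timeit(lambda: count_with_hash_lookup(words), number=1)
print("list scan: %.4fs, hash lookup: %.4fs" % (slow, fast))
```

Both functions return the same dictionary; only the cost of the membership test differs, and the gap widens as the number of distinct words grows.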
Your dictionary counting method is not well constructed.
You could have used a defaultdict in the following way:
from collections import defaultdict
d = defaultdict(int)
for word in word_list:
    d[word] += 1
But Counter from collections is still faster even though it is doing almost the same thing, because it has a more efficient implementation. However, with Counter you pass it an iterable to count, whereas with a defaultdict you can draw from sources in different locations and have a more complicated loop.
Ultimately it is your preference. If you are counting a list, Counter is the way to go. If you are iterating over multiple sources, or you simply want a counter in your program and don't want the extra lookup to check whether an item is already being counted, then defaultdict is your choice.
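For comparison, here is a sketch of both approaches fed from two sources. The lists source_a and source_b are hypothetical stand-ins for words arriving from different places. Counter can also accumulate step by step through its update method, so the choice is less strict than it may appear.

```python
from collections import Counter, defaultdict

# Hypothetical inputs standing in for words from two different sources.
source_a = ['it', 'is', 'what', 'it', 'is']
source_b = ['and', 'only', 'what', 'it', 'is']

# defaultdict: missing keys start at int() == 0, so no membership check.
d = defaultdict(int)
for source in (source_a, source_b):
    for word in source:
        d[word] += 1

# Counter: update() adds to existing counts instead of replacing them.
c = Counter()
c.update(source_a)
c.update(source_b)

print(dict(d) == dict(c))
print(c['it'], c['what'], c['only'])
```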
You can actually look at the Counter code. Here is the update method that is called on init:
(Notice it uses the performance trick of binding self.get to a local name, self_get.)
def update(self, iterable=None, **kwds):
    '''Like dict.update() but add counts instead of replacing them.
    Source can be an iterable, a dictionary, or another Counter instance.
    >>> c = Counter('which')
    >>> c.update('witch')           # add elements from another iterable
    >>> d = Counter('watch')
    >>> c.update(d)                 # add elements from another counter
    >>> c['h']                      # four 'h' in which, witch, and watch
    4
    '''
    # The regular dict.update() operation makes no sense here because the
    # replace behavior results in the some of original untouched counts
    # being mixed-in with all of the other counts for a mismash that
    # doesn't have a straight-forward interpretation in most counting
    # contexts. Instead, we implement straight-addition. Both the inputs
    # and outputs are allowed to contain zero and negative counts.
    if iterable is not None:
        if isinstance(iterable, Mapping):
            if self:
                self_get = self.get
                for elem, count in iterable.iteritems():
                    self[elem] = self_get(elem, 0) + count
            else:
                super(Counter, self).update(iterable) # fast path when counter is empty
        else:
            self_get = self.get
            for elem in iterable:
                self[elem] = self_get(elem, 0) + 1
    if kwds:
        self.update(kwds)
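The core of that method, the branch for a plain iterable, can be paraphrased for a regular dict. This is an illustrative sketch in Python 3, not the library's code; the function name add_counts is hypothetical.

```python
from collections import Counter


def add_counts(counts, iterable):
    """Add one to counts[elem] for each element of iterable (plain dict)."""
    counts_get = counts.get          # bind the method once, outside the loop
    for elem in iterable:
        counts[elem] = counts_get(elem, 0) + 1
    return counts


# Same data as the docstring example: 'which', 'witch' and 'watch'.
tally = add_counts({}, 'which')
add_counts(tally, 'witch')
add_counts(tally, 'watch')
print(tally['h'])
print(tally == dict(Counter('which') + Counter('witch') + Counter('watch')))
```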
You could also try to use defaultdict as a more competitive choice. Try:
from collections import defaultdict
word_dict = defaultdict(lambda: 0)
for word in word_list:
    word_dict[word] +=1
print(word_dict)
Similar to what monkut mentioned, one of the best ways to do this is to utilize the .get() function. Credit for this goes to Charles Severance and the Python For Everybody Course
For testing:
# Pretend line is as follow.
# It can and does contain \n (newline) but not \t (tab).
line = """Your battle is my battle . We fight together . One team . One team .
Shining sun always comes with the rays of hope . The hope is there .
Our best days yet to come . Let the hope light the road .""".lower()
His code (with my notes):
# make an empty dictionary
# split `line` into a list. default is to split on a space character
# etc., etc.
# iterate over the LIST of words (made from splitting the string)
counts = dict()
words = line.split()
for word in words:
    counts[word] = counts.get(word,0) + 1
Your code:
for words in word_list:
    if words in word_dict.keys():
        word_dict[words] += 1
    else:
        word_dict[words] = 1
.get() does this:
Return the VALUE in the dictionary associated with word.
Otherwise (if the word is not a key in the dictionary), return 0.
No matter what is returned, we add 1 to it. Thus it handles the base case of seeing the word for the first time. We cannot use a dictionary comprehension here, since the variable the comprehension is assigned to doesn't exist yet while the comprehension is being evaluated. Meaning this: counts = { word:counts.get(word,0) + 1 for word in words} is not possible, since counts hasn't been defined at the point where the comprehension calls .get() on it.
Output
>>> counts
{'.': 8,
'always': 1,
'battle': 2,
'best': 1,
'come': 1,
'comes': 1,
'days': 1,
'fight': 1,
'hope': 3,
'is': 2,
'let': 1,
'light': 1,
'my': 1,
'of': 1,
'one': 2,
'our': 1,
'rays': 1,
'road': 1,
'shining': 1,
'sun': 1,
'team': 2,
'the': 4,
'there': 1,
'to': 1,
'together': 1,
'we': 1,
'with': 1,
'yet': 1,
'your': 1}
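The earlier point about dictionary comprehensions can be checked with a short experiment. This is a sketch with a made-up word list; the names tally and by_count are chosen for illustration. If the target name already exists as an empty dict, the comprehension does run, but it only ever reads the old, empty dict, so every count comes out as 1.

```python
words = "one team . one team .".split()

# The comprehension is fully evaluated before the name is rebound, so every
# .get() call sees the old, empty dict and falls back to 0.
tally = {}
tally = {word: tally.get(word, 0) + 1 for word in words}
print(tally)

# A comprehension that does work, at the cost of rescanning the list
# once per distinct word.
by_count = {word: words.count(word) for word in set(words)}
print(by_count == {'one': 2, 'team': 2, '.': 2})
```

The loop with .get() avoids both problems: it reads the dictionary as it grows and passes over the list only once.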
As an aside here is a "loaded" use of .get() that I wrote as a way to solve the classic FizzBuzz question. I'm currently writing code for a similar situation in which I will use modulus and a dictionary, but for a split string as input.