Can someone explain this word frequency count, please?

Can someone explain this word frequency count, please? - python

I'm taking the MIT DS&A algorithm course and on the document distance problem, we have to parse a file into a list of words, then count the frequency of each word in the file. I have a hard time comprehending the following function:
def count_frequency(word_list):
"""
Return a list giving pairs of form: (word,frequency)
"""
L = []
for new_word in word_list:
for entry in L:
if new_word == entry[0]:
entry[1] = entry[1] + 1
break
else:
L.append([new_word,1])
return L
Why do we compare new_word to entry[0]?
a. What if L is empty? What do we compare new_word to?
b. Why do we compare new_word to entry[0] specifically? Why don't we do something like
if new_word in L
c. Why do we need to use break?
Why is the else block 1 tab to the right of the earlier if block? When I tried to indent the else block, an indentation error would show up.
Thank you for your help!

The list L contains two-item entries due to L.append([new_word,1]). If L is empty the for would not be entered, so there is no problem with entry[0].
entry[0] is a word and entry[1] is a count. You can't say if new_word in L because it is not just a list of strings.
break stops the for once a word is found.
for/else is a thing in Python. The else runs if the for completes without interruption (a break in this case). If new_word isn't in L, the for won't break and the new word and a count of 1 is added to L.
FYI, the built-in collections.Counter() would return similar results.

Related

How to use insert() in a for loop without making it a endless loop?

I'm trying to make a project in python so that whenever the program encounters a capital letter it makes a white space. So for example "helloThisIsMyProject" > hello This is My Project.
def project(w):
lst = []
for i in w:
lst.append(i)
for letter in lst:
if letter.isupper():
index = lst.index(letter)
lst.insert(index, " ")
continue
return "".join(lst)
print(project("helloThisIsMyProject"))
I'm having a problem with insert() and the for loop because the for loop is endlessly looping over the "T" from "This". I tried using continue to skip the letter T but it didn't fix my problem.

Those 1st three lines are simply
lst = list(w)
But you would find this more convenient:
lst = []
for letter in w:
if letter.isupper():
lst.append(" ")
lst.append(letter)
That is, rather than producing a list and going back to fix it up,
you could have it correct from the outset.
You can conditionally output a SPACE before you output the letter.
If you prefer to do it with a list comprehension,
you might enlist the aid of a helper:
def adjust(s: str) -> str:
"""Returns the input string, possibly whitespace adjusted."""
return " " + s if s.isupper() else s
lst = [adjust(letter)
for letter in w]
or more compactly:
lst = list(map(adjust, w))

You could use a join with a list comprehension that adds the space to individual characters:
def spaceCaps(w):
return "".join(" "*c.isupper()+c for c in w)
spaceCaps("helloThisIsMyProject")
'hello This Is My Project'
You could also use the sub() function from the regular expression module:
import re
def spaceCaps(w):
return re.sub("([A-Z])",r" \1",w)

Printing Parallel Tuples in python

I'm trying to make a word guessing program and I'm having trouble printing parallel tuples. I need to print the "secret word" with the corresponding hint, but the code that I wrote doesn't work. I can't figure out where I'm going wrong.
Any help would be appreciated :)
This is my code so far:
import random
Words = ("wallet","canine")
Hints = ("Portable money holder","Man's best friend")
vowels = "aeiouy"
secret_word = random.choice(Words)
new_word = ""
for letter in secret_word:
if letter in vowels:
new_word += '_'
else:
new_word += letter
maxIndex = len(Words)
for i in range(1):
random_int = random.randrange(maxIndex)
print(new_word,"\t\t\t",Hints[random_int])

The issue here is that random_int is, as defined, random. As a result you'll randomly get the right result sometimes.
A quick fix is by using the tuple.index method, get the index of the element inside the tuple Words and then use that index on Hints to get the corresponding word, your print statement looking like:
print(new_word,"\t\t\t",Hints[Words.index(secret_word)])
This does the trick but is clunky. Python has a data structure called a dictionary with which you can map one value to another. This could make your life easier in the long run. To create a dictionary from the two tuples we can zip them together:
mapping = dict(zip(Words, Hints))
and create a structure that looks like:
{'canine': "Man's best friend", 'wallet': 'Portable money holder'}
This helps.
Another detail you could fix is in how you create the new_word; instead of looping you can use a comprehension to create the respective letters and then join these on the empty string "" to create the resulting string:
new_word = "".join("_" if letter in vowels else letter for letter in secret_word)
with exactly the same effect. Now, since you also have the dictionary mapping, getting the respective hint is easy, just supply the key new_word to mapping and it'll return the key.
A revised version of your code looks like this:
import random
Words = ("wallet", "canine")
Hints = ("Portable money holder", "Man's best friend")
mapping = dict(zip(Words, Hints))
vowels = "aeiouy"
secret_word = random.choice(Words)
new_word = "".join("_" if letter in vowels else letter for letter in secret_word)
print(new_word,"\t\t\t", d[secret_word])

How to avoid Runtime error in this coding challenge?

I am completing this HackerRank coding challenge. Essentially, the challenge asks us to find all the substrings of the input string without mixing up the letters. Then, we count the number of substrings that start with a vowel and count the number of substrings that start with a consonant.
The coding challenge is structured as a game where Stuart's score is the number of consonant starting substrings and Kevin's score is the number of vowel starting substrings. The program outputs the winner, i.e. the one with the most substrings.
For example, I created the following code:
def constwordfinder(word):
word = word.lower()
return_lst = []
for indx in range(1,len(word)+1):
if word[indx-1:indx] not in ['a','e','i','o','u']:
itr = indx
while itr < len(word)+1:
return_lst.append(word[indx-1:itr])
itr +=1
return return_lst
def vowelwordfinder(word):
word = word.lower()
return_lst = []
for indx in range(1,len(word)+1):
if word[indx-1:indx] in ['a','e','i','o','u']:
itr = indx
while itr < len(word)+1:
return_lst.append(word[indx-1:itr])
itr +=1
return return_lst
def game_scorer(const_list, vow_list):
if len(const_list) == len(vow_list):
return 'Draw'
else:
if len(const_list) > len(vow_list):
return 'Stuart ' + str(len(const_list))
else:
return 'Kevin ' + str(len(vow_list))
input_str = input()
print(game_scorer(constwordfinder(input_str), vowelwordfinder(input_str)))
This worked for smaller strings like BANANA, although when HackerRank started inputting strings like the following, I got multiple Runtime errors on the test cases:
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
I tried structuring the program to be a bit more concise, although I still got Runtime errors on the longer test cases:
def wordfinder(word):
word = word.lower()
return_lst = []
for indx in range(1,len(word)+1):
itr = indx
while itr < len(word)+1:
return_lst.append(word[indx-1:itr])
itr +=1
return return_lst
def game_scorer2(word_list):
kevin_score = 0
stuart_score = 0
for word in word_list:
if word[0:1] not in ['a','e','i','o','u']:
stuart_score += 1
else:
kevin_score +=1
if stuart_score == kevin_score:
return 'Draw'
else:
if stuart_score > kevin_score:
return 'Stuart ' + str(stuart_score)
else:
return 'Kevin ' + str(kevin_score)
print(game_scorer2(wordfinder(input())))
What else exactly should I be doing to structure my program to avoid Runtime errors like before?

Here's a quick and dirty partial solution based on my hints:
input_str = raw_input()
kevin = 0
for i, c in enumerate(input_str):
if c.lower() in "aeiou":
kevin += len(input_str) - i
print kevin
Basically, iterate over each character, and if it is in the set of vowels, Kevin's score increases by the number of remaining characters in the string.
The remaining work should be rather obvious, I hope!
[stolen from the spoilers section of the site in question]
Because say for each consonant, you can make n substrings beginning with that consanant. So for the BANANA example look at the first B. With that B, you can make: B, BA, BAN, BANA, BANAN, BANANA. That's six substrings starting with that B or length(string) - indexof(character), which means that B adds 6 to the score. So you go through the string, looking for each consonant, and add length(string) - index to the score.

The problem here is your algorithm.You are finding all the Sub strings of the text. It takes exponential time to solve the problem. That's why you got run time errors here. You have to use another good algorithm to solve this problem rather than using sub strings.

Counting the vowels from a string

This is what I'm trying to do. I have some text and I'm trying to filter through it and only print out the TOP TEN words with the most vowels. This is only a portion of my code, but this is the only part that I keep getting an error on. The error I get is: unorderable types NoneType() < int()....and I get that error when i try and sort my words. (let me know if I am unclear or confusing) How can I fix this error (in my topTen function)? This is the little piece of code I'm talking about:
def topTen(text):
words = textToWords(text)
words.sort(key=countVowels, reverse=True)
for ctr in range(10):
print(words[ctr])
def textToWords(T):
W = []
for line in T:
words = line.split()
for word in words:
W.append(word.lower())
return W
def countVowels(st):
ctr = 0
for ch in st:
if ch in "aeiou":
ctr =+ 1
return ctr

The last line of countVowels is not indented correctly. When the if statement is taken the function will return 1; when the if statement is never taken (the word contains no vowels) the function returns None. Re-indent the return ctr so it's outside the loop and I think the program will work.

Reference next item in list: python

I'm making a variation of Codecademy's pyglatin.py to make a translator that accepts and translates multiple words. However, I'm having trouble translating more than one word. I've been able to transfer the raw input into a list and translate the first, but I do not know how to reference the next item in the list. Any help would be greatly appreciated.
def piglatin1():
pig = 'ay'
original = raw_input('Enter a phrase:').split(' ')
L = list(original)
print L
i = iter(L)
item = i.next()
for item in L:
if len(item) > 0 and item.isalpha():
word = item.lower()
first = word
if first == "a" or first == "e" or first == "i" or first == "o" or first =="u":
new_word = word + pig
print new_word
else:
new_word = word[1:] + word[0:1] + pig
# first word translated
L = []
M = L[:]
L.append(new_word)
print L # secondary list created.
again = raw_input('Translate again? Y/N')
print again
if len(again) > 0 and again.isalpha():
second_word = again.lower()
if second_word == "y":
return piglatin()
else:
print "Okay Dokey!"
else:
print 'Letters only please!'
return piglatin1()

I was working on this problem recently as well and came up with the following solution (rather than use range, use enumerate to get the index).
for index, item in enumerate(L):
next = index + 1
if next < len(L):
print index, item, next
This example shows how to access the current index, the current item, and then the next item in the list (if it exists in the bounds of the list).

Here are a few things to note that might help.
The lines i = iter(L) and item = i.next() are unnecessary. They have no effect in this method because you are redefining item immediately afterwards in the line for item in L. Go ahead and comment out those two lines to see if it makes any changes in your output.
The looping construct for item in L will go once over every item in the list. Whatever code you write within this loop will be executed once for each item in the list. The variable item is your handle to the list element of an iteration.
If, during any iteration, you really do want to access the "next" element in the list as well, then consider using a looping construct such as for i in range(0,len(L)). Then L[i] will be the current item and L[i+1] will you give the subsequent item.

There are some slight issues with the code but I think there is one main reason why it will not repeat.
In order to process the entire string the
again = raw_input('Translate again? Y/N')
and it's succeeding lines should be brought outside the for statement.
Also you appear to be setting L to an empty string inside the loop:
L = []
The following is a modified version of your code which will loop through the entire sentence and then ask for another one.
def piglatin():
pig = 'ay'
while True:
L = raw_input('Enter a phrase:').split(' ')
M = []
for item in L:
if len(item) > 0 and item.isalpha():
word = item.lower()
first = word
if first == "a" or first == "e" or first == "i" or first == "o" or first =="u":
new_word = word + pig
print new_word
else:
new_word = word[1:] + word[0:1] + pig
M.append(new_word)
else:
print 'Letters only please!'
print M # secondary list created.
again = raw_input('Translate again? Y/N')
print again
if len(again) > 0 and again.isalpha():
second_word = again.lower()
if second_word == "n":
print "Okay Dokey!"
break
Changes made:
You don't need to cast the return of the split to a list. The split
return type is a list.
It isn't necessary to make an iterator, the for loop will do this for you.
I removed the function as the return type. I'm assuming you were attempting some form of recursion but it isn't strictly necessary.
Hope this helps.

Step by step:
If you set variable original in this way:
original = raw_input('Enter a phrase:').split()
it will be already a list, so need to additional assignment.
What is the purpose of these lines?
i = iter(L)
item = i.next()
In a loop, you assign variable to the word, when it is actually only the first letter of the word, so it’ll be better like this: first = word[0]
Then if you want to check if first is a vowel, you can just do:
if first in 'aeuoiy'
Answer to your actual question: do not assign L to an empty list!
If you want to repeat the action of a function, you can just call it again, no need to rewrite the code.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Can someone explain this word frequency count, please? - python

Related

How to use insert() in a for loop without making it a endless loop?

Printing Parallel Tuples in python

How to avoid Runtime error in this coding challenge?

Counting the vowels from a string

Reference next item in list: python

Categories

Resources