Get word count and average length from frequency dictionary

I have a frequency dictionary whose keys are the words from a text and whose values are the number of times each word appears. How can I get the word count and the average word length while taking into account words that appear more than once? Right now I just make a list of the keys (since they are the words) and use len() for the word count, which counts each distinct word only once.
wordcount=len(list(freq.keys()))
report["count:"]=wordcount
#for average length:
avg=list(freq.keys())
average=sum(map(len,avg))/len(avg)
report["avglen"]=average

number_of_words = int(input("Enter the number of words. "))
word_dict = {}
for i in range(number_of_words):
    word = input("Enter word. ")
    if word in word_dict:
        word_dict[word] += 1
    else:
        word_dict[word] = 1
print(word_dict)
# Weight each word's length by the number of times it occurs
print(sum(len(word) * word_dict[word] for word in word_dict) / number_of_words)
A very similar question: https://stackoverflow.com/questions/20143947/word-frequency-counter-python/20145320#20145320

Use the sum function and dict.values():
freq = { 'test' : 10, 'rep' : 100 }
wordcount = sum(freq.values())
average = sum(len(w) * c for w, c in freq.items()) / wordcount
print(wordcount, average)
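The same idea can be wrapped in a small function. This is only a sketch: the name word_stats and the choice to return (0, 0.0) for an empty dictionary are my assumptions, not part of the question.

```python
def word_stats(freq):
    """Return (total word count, average word length) for a frequency dict."""
    total = sum(freq.values())
    if total == 0:
        # Assumed behaviour: avoid dividing by zero for an empty dictionary
        return 0, 0.0
    # Each word's length counts once per occurrence
    weighted = sum(len(word) * count for word, count in freq.items())
    return total, weighted / total

freq = {'test': 10, 'rep': 100}
print(word_stats(freq))
```

Multiplying each length by its count is what makes repeated words weigh more than words that appear once.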

Related

Count vowels of each word in a string

Refer to this image1
Hey, this is a program I want to write in Python. I tried, and I successfully iterated over the words, but now how do I count the individual score?
a_string = input("Enter a sentence: ").lower()
vowel_counts = {}
splits = a_string.split()
for i in splits:
    words = []
    words.append(i)
    print(words)

You can flag the vowels by using translate to convert every vowel to 'a', then count the 'a's in each word with the count method:
sentence = "computer programmers rock"
vowels = str.maketrans("aeiouAEIOU","aaaaaaaaaa")
flagged = sentence.translate(vowels) # all vowels --> 'a'
counts = [word.count('a') for word in flagged.split()] # counts per word
score = sum(1 if c<=2 else 2 for c in counts) # sum of points
print(counts,score)
# [3, 3, 1] 5
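If translate feels indirect, the vowels can also be counted one character at a time. This is a sketch only: the helper names vowel_counts and score are made up for illustration, and the scoring rule (1 point for up to two vowels, 2 points otherwise) is simply copied from the code above, since the original image is not available.

```python
VOWELS = set("aeiouAEIOU")

def vowel_counts(sentence):
    """Count the vowels in each whitespace-separated word."""
    return [sum(ch in VOWELS for ch in word) for word in sentence.split()]

def score(counts):
    # Assumed rule, taken from the answer above:
    # 1 point for a word with up to two vowels, 2 points otherwise
    return sum(1 if c <= 2 else 2 for c in counts)

counts = vowel_counts("computer programmers rock")
print(counts, score(counts))
# [3, 3, 1] 5
```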

finding the word with most repeated letters from a string containing a sentence in python

I want to find the word with the most repeated letters in a given sentence.
I know how to find the most repeated letter in the sentence, but I can't work out how to print the word.
For example:
this is an elementary test example
should print
elementary
def most_repeating_word(strg):
    words =strg.split()
    for words1 in words:
        dict1 = {}
        max_repeat_count = 0
        for letter in words1:
            if letter not in dict1:
                dict1[letter] = 1
            else:
                dict1[letter] += 1
            if dict1[letter]> max_repeat_count:
                max_repeat_count = dict1[letter]
                most_repeated_char = letter
                result=words1
    return result

You are resetting the max_repeat_count variable to 0 for each word. You should move it up in your code, above the first for loop, like this:
def most_repeating_word(strg):
    words =strg.split()
    max_repeat_count = 0
    for words1 in words:
        dict1 = {}
        for letter in words1:
            if letter not in dict1:
                dict1[letter] = 1
            else:
                dict1[letter] += 1
            if dict1[letter]> max_repeat_count:
                max_repeat_count = dict1[letter]
                most_repeated_char = letter
                result=words1
    return result
Hope this helps

You could use a regex instead. It keeps the code short, and for long inputs it can be faster than an explicit Python loop.
Please refer to the solution for your problem in this post:
Count repeated letters in a string

Interesting exercise! +1 for using Counter(). Here's my suggestion, which also makes use of max() with its key argument and the * unpacking operator.
For a final solution, note that this (and the other proposed solutions to the question) doesn't currently consider case, other possible characters (digits, symbols, etc.), whether more than one word has the maximum letter count, or whether a word has more than one letter with the maximum count.
from collections import Counter
def most_repeating_word(strg):
    # Create list of word tuples: (word, max_letter, max_count)
    counters = [(word, *max(Counter(word).items(), key=lambda item: item[1]))
                for word in strg.split()]
    max_word, max_letter, max_count = max(counters, key=lambda item: item[2])
    return max_word
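Here is a rough sketch of how case and ties between words could be handled. The function name most_repeating_words and its return shape are my own choices for illustration; it lowercases each word, ignores non-letters, and returns every word tied for the highest count.

```python
from collections import Counter

def most_repeating_words(sentence):
    """Return (words, count): all words tied for the highest single-letter
    repeat count, compared case-insensitively and ignoring non-letters."""
    best_count = 0
    best_words = []
    for word in sentence.split():
        letters = Counter(ch for ch in word.lower() if ch.isalpha())
        if not letters:
            continue  # the word contains no letters at all
        top = max(letters.values())
        if top > best_count:
            best_count, best_words = top, [word]
        elif top == best_count:
            best_words.append(word)
    return best_words, best_count

print(most_repeating_words("this is an elementary test example"))
# (['elementary'], 3)
```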

word="SBDDUKRWZHUYLRVLIPVVFYFKMSVLVEQTHRUOFHPOALGXCNLXXGUQHQVXMRGVQTBEYVEGMFD"
def most_repeating_word(strg):
    # Count every character, tracking the highest count seen so far
    counts = {}
    max_repeat_count = 0
    for char in strg:
        if char not in counts:
            counts[char] = 1
        else:
            counts[char] += 1
        if counts[char] > max_repeat_count:
            max_repeat_count = counts[char]
    # Collect every character that reaches the maximum count
    result = {}
    for char, value in counts.items():
        if value == max_repeat_count:
            result[char] = value
    return result
print(most_repeating_word(word))

Count words (even multiples) in a text with Python

I have to write a function that counts how many times a word (or a series of words) appears in a given text.
This is my function so far. What I noticed is that with a series of 3 words the function works well, but not with 4 words and so on.
from nltk import ngrams
def function(text, word):
    for char in ".?!-":
        text = text.replace(char, ' ')
    n = len(word.split())
    countN = 0
    bigram_lower = text.lower()
    word_lower = word.lower()
    n_grams = ngrams(bigram_lower.split(), n)
    for gram in n_grams:
        for i in range (0, n):
            if gram[i] == word_lower.split()[i]:
                countN = countN + 1
    print (countN)

First, please fix your indentation, and don't use bigrams as a variable name for n-grams; it's confusing, since you are not storing only bigrams in it. Second, let's look at this part of your code:
for gram in bigrams:
    for i in range (0, n):
        if gram[i] == word_lower.split()[i]:
            countN = countN + 1
print (countN)
Here you are increasing countN by one every time a single word in your n-gram matches, instead of increasing it only when the whole n-gram matches. You should increase countN only if all the words match:
for gram in bigrams:
    if list(gram) == word_lower.split():
        countN = countN + 1
print (countN)

Maybe this has already been answered here.
Is nltk mandatory?
# Open the file in read mode
text = open("sample.txt", "r")
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

This should work for you:
import nltk
def function(text, word):
    for char in ".?!-,":
        text = text.replace(char, ' ')
    n = len(word.split())
    countN = 0
    bigram_lower = text.lower()
    word_lower = tuple(word.lower().split())
    bigrams = nltk.ngrams(bigram_lower.split(), n)
    # Count only the n-grams that match the whole phrase
    for gram in bigrams:
        if gram == word_lower:
            countN += 1
    print (countN)
>>> tekst="this is the text i want to search, i want to search it for the words i want to search for, and it should count the occurances of the words i want to search for"
>>> function(tekst, "i want to search")
4
>>> function(tekst, "i want to search for")
2
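If nltk is not a requirement, the same sliding-window comparison can be sketched in plain Python. The name count_phrase is hypothetical, and the punctuation set is simply the one used in the function above.

```python
def count_phrase(text, phrase):
    """Count occurrences of a phrase (one or more words) in text, ignoring case."""
    for char in ".?!-,":
        text = text.replace(char, ' ')
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return 0
    # Slide a window of n words over the text and compare it to the target
    return sum(words[i:i + n] == target for i in range(len(words) - n + 1))

tekst = ("this is the text i want to search, i want to search it for the words "
         "i want to search for, and it should count the occurances of the words "
         "i want to search for")
print(count_phrase(tekst, "i want to search"))      # 4
print(count_phrase(tekst, "i want to search for"))  # 2
```

Unlike the version above, this one returns the count instead of printing it, which makes it easier to reuse.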

Jumble anagram longest word python

This algorithm takes a number as input and should return how many anagrams of that length are in the dictionary read from a .txt file. I am getting an output of 6783 if I enter 5, when I should be getting 5046 based on my list. I do not know what else to change.
ex: An input of 5 should return 5046
I have also been trying to search through the list, given a positive integer word length, to collect the words with the maximum number of anagrams, and I have no idea where to start.
ex: An input of 4 for word length should return the maximum number of anagrams, which is 6, and output the list of anagrams, e.g.
['opts', 'post', 'pots', 'spot', 'stop', 'tops']
def maxword():
    input_word = int(input("Enter word length (hit enter key to quit):"))
    word_file = open("filename", "r")
    word_list = {}
    alist = []
    for text in word_file:
        simple_text = ''.join(sorted(text.strip()))
        word_list.update({text.strip(): simple_text})
    count = 0
    for num in word_list.values():
        if len(num) == input_word:
            count += 1
            alist.append(num)
    return str(input_word) + str(len(alist))

This can be achieved with a single pass over the input text file. Your idea of sorting each word and storing it in a map is the right approach.
Build a dictionary whose key is the sorted word, since that is the same for all anagrams, and whose value is the list of words that share that sorted form.
To avoid looping over the dictionary again, we'll keep track of the key whose value is the longest list.
If this word_list dictionary is built for one-time use with a specific length, you can add only the words of length input_word to the dictionary.
word_file = ["abcd", "cdab", "cdab", "cdab", "efgh", "ghfe", "fehg"]
word_list = {}
alist = []
input_word = 4
max_len = -1
max_word = ""
for text in word_file:
    word = text.strip()  # strip first so a trailing newline is not counted
    if len(word) == input_word:
        simple_text = ''.join(sorted(word))
        if simple_text not in word_list:
            word_list[simple_text] = [word]
        else:
            word_list[simple_text].append(word)
        if len(word_list[simple_text]) > max_len:
            max_len = len(word_list[simple_text])
            max_word = simple_text
print(max_word)
print(word_list[max_word])
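The grouping step can also be sketched with collections.defaultdict. This is an illustration under my own assumptions: the function name largest_anagram_group is made up, duplicate words are collapsed with a set, and words are lowercased before comparison.

```python
from collections import defaultdict

def largest_anagram_group(words, length):
    """Return the sorted list of distinct words of the given length that
    share the most anagrams; an empty list if no word has that length."""
    groups = defaultdict(set)
    for raw in words:
        word = raw.strip().lower()
        if len(word) == length:
            # All anagrams of a word share the same sorted letters
            groups[''.join(sorted(word))].add(word)
    if not groups:
        return []
    return sorted(max(groups.values(), key=len))

sample = ["opts", "post", "pots", "spot", "stop", "tops", "abcd", "dcba", "tree"]
print(largest_anagram_group(sample, 4))
# ['opts', 'post', 'pots', 'spot', 'stop', 'tops']
```

When two groups are equally large, max returns the one that was inserted first, so the result depends on the order of the input.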

Python 3.2 - Converting words to lower case & number of palindromes in the list of words that have at least 3 letters

I have a random file of words; some of them are palindromes and some are not. Some of those palindromes are 3 or more letters long. How do I count them? I'm wondering how to make the conditions better. I thought I could just use the length, but I keep getting 0 as my answer, which I know is not true because I have the .txt file.
Where am I messing up?
number_of_words = []
with open('words.txt') as wordFile:
    for word in wordFile:
        word = word.strip()
        for letter in word:
            letter_lower = letter.lower()
def count_pali(wordFile):
    count_pali = 0
    for word in wordFile:
        word = word.strip()
        if word == word[::-1]:
            count_pali += 1
    return count_pali
print(count_pali)
count = 0
for number in number_of_words:
    if number >= 3:
        count += 1
print("Number of palindromes in the list of words that have at least 3 letters: {}".format(count))

You are looping through number_of_words in order to calculate count, but number_of_words is initialized to an empty list and never changed after that, hence the loop
for number in number_of_words:
    if number >= 3:
        count += 1
will execute exactly 0 times.

Your code looks good right up until the loop:
for number in number_of_words:
    if number >= 3:
        count += 1
There is a problem in the logic here. If you think about the data structure of number_of_words, and what you are actually asking python to compare with the 'number >= 3' condition, then I think you will figure your way through it nicely.
--- revised look:
# Getting the words into a list
# word_file = [word1, word2, word3, ..., wordn]
word_file = open('words.txt').readlines()
# set up counters
count_pali, high_score = 0, 0
# iterate through each word and test it
for word in word_file:
    # strip newline character and convert to lower case
    word = word.strip().lower()
    # if word is palindrome
    if word == word[::-1]:
        count_pali += 1
        # if word is palindrome AND word has at least 3 letters
        if len(word) >= 3:
            high_score += 1
print('Number of palindromes in the list of words that have at least 3 letters: {}'.format(high_score))
NOTES:
count_pali: counts the total number of words that are palindromes
high_score: counts the total number of palindromes that have at least 3 letters
len(word): if word is palindrome, will test the length of the word
Good luck!
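The check can also be packaged as a small function that works on any iterable of lines. This is only a sketch: the name count_palindromes and the min_length parameter are assumptions for illustration, and the sample list stands in for the lines of words.txt.

```python
def count_palindromes(words, min_length=3):
    """Count words that read the same backwards, ignoring case,
    and that have at least min_length letters."""
    count = 0
    for raw in words:
        word = raw.strip().lower()
        if len(word) >= min_length and word == word[::-1]:
            count += 1
    return count

# Stand-in for the lines of a file, including trailing newlines
sample = ["Level\n", "noon\n", "aa\n", "python\n", "Bob\n", "\n"]
print(count_palindromes(sample))
# 3
```

With a real file, the same call would look like count_palindromes(open('words.txt')), since iterating over a file yields its lines.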

This doesn't directly answer your question, but it might help to understand some of the problems that we have run into here. Mostly, you can see how to add to a list, and hopefully see the difference between getting the length of a string, list and integer (which you actually can't do!).
Try running the code below, and examine what is happening:
def step_forward():
    input('(Press ENTER to continue...)')
    print('\n.\n.\n.')
def experiment():
    """ Run a whole lot of experiments to explore the idea of lists and
    variables"""
    # create an empty list, test length
    word_list = []
    print('the length of word_list is: {}'.format(len(word_list)))
    # expect output to be zero
    step_forward()
    # add some words to the list
    print('\nAdding some words...')
    word_list.append('Hello')
    word_list.append('Experimentation')
    word_list.append('Interesting')
    word_list.append('ending')
    # test length of word_list again
    print('\ttesting length again...')
    print('\tthe length of word_list is: {}'.format(len(word_list)))
    step_forward()
    # print the length of each word in the list
    print('\nget the length of each word...')
    for each_word in word_list:
        print('\t{word} has a length of: {length}'.format(word=each_word, length=len(each_word)))
    # output:
    # Hello has a length of: 5
    # Experimentation has a length of: 15
    # Interesting has a length of: 11
    # ending has a length of: 6
    step_forward()
    # set up a couple of counters
    short_word = 0
    long_word = 0
    # test the length of the counters:
    print('\nTrying to get the length of our counter variables...')
    try:
        len(short_word)
        len(long_word)
    except TypeError:
        print('\tERROR: You can not get the length of an int')
    # you see, you can't get the length of an int
    # but, you can get the length of a word, or string!
    step_forward()
    # we will make some new tests, and some assumptions:
    # short_word: a word is short, if it has less than 9 letters
    # long_word: a word is long, if it has 9 or more letters
    # this test will count how many short and long words there are
    print('\nHow many short and long words are there?...')
    for each_word in word_list:
        if len(each_word) < 9:
            short_word += 1
        else:
            long_word += 1
    print('\tThere are {short} short words and {long} long words.'.format(short=short_word, long=long_word))
    step_forward()
    # BUT... what if we want to know which are the SHORT words and which are the LONG words?
    short_word = 0
    long_word = 0
    for each_word in word_list:
        if len(each_word) < 9:
            short_word += 1
            print('\t{word} is a SHORT word'.format(word=each_word))
        else:
            long_word += 1
            print('\t{word} is a LONG word'.format(word=each_word))
    step_forward()
    # and lastly, if you need to use the short or long words again, you can
    # create new sublists
    print('\nMaking two new lists...')
    shorts = []
    longs = []
    short_word = 0
    long_word = 0
    for each_word in word_list:
        if len(each_word) < 9:
            short_word += 1
            shorts.append(each_word)
        else:
            long_word += 1
            longs.append(each_word)
    print('short list: {}'.format(shorts))
    print('long list: {}'.format(longs))
    # now, the counters short_word and long_word should equal the length of the new lists
    if short_word == len(shorts) and long_word == len(longs):
        print('Hurray, it works!')
    else:
        print('Oh no!')
experiment()
Hopefully, when you look through our answers here, and examine the mini-experiment above, you will be able to get your code to do what you need it to do :)
