This function asks for a number and should return how many anagrams of that length are in the dictionary, which is read from a .txt file. I get an output of 6783 when I enter 5, but I should be getting 5046 based on my list. I do not know what else to change.
ex: An input of 5 should return 5046
I have also been trying to search the list, given a positive integer for the word length, to collect the words with the maximum number of anagrams, and I have no idea where to start.
ex: An input of 4 for the word length should return the maximum number of anagrams, which is 6, and output the list of anagrams, e.g.
['opts', 'post', 'pots', 'spot', 'stop', 'tops']
def maxword():
    input_word = int(input("Enter word length (hit enter key to quit):"))
    word_file = open("filename", "r")
    word_list = {}
    alist = []
    for text in word_file:
        simple_text = ''.join(sorted(text.strip()))
        word_list.update({text.strip(): simple_text})
    count = 0
    for num in word_list.values():
        if len(num) == input_word:
            count += 1
            alist.append(num)
    return str(input_word) + str(len(alist))
This can be achieved with a single pass over the input text file. Your idea to sort the letters of each word and store the result in a map is the right approach.
Build a dictionary with the sorted word as the key, since it is the same for all anagrams, and a list of the words that share that sorted form as the value.
To avoid looping over the dictionary again, keep track of the key whose list is currently the longest.
If the word_list dictionary is built for one-time use with a specific length, you can add only the words whose length equals input_word.
word_file = ["abcd", "cdab", "cdab", "cdab", "efgh", "ghfe", "fehg"]
word_list = {}
input_word = 4
max_len = -1
max_word = ""
for text in word_file:
    word = text.strip()  # drop the trailing newline when reading from a real file
    if len(word) == input_word:
        simple_text = ''.join(sorted(word))
        if simple_text not in word_list:
            word_list[simple_text] = [word]
        else:
            word_list[simple_text].append(word)
        # remember the key with the longest list seen so far
        if len(word_list[simple_text]) > max_len:
            max_len = len(word_list[simple_text])
            max_word = simple_text
print(max_word)
print(word_list[max_word])
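The same grouping can also cover the counting part of the question. This is only a minimal sketch: it assumes that "number of anagrams" means the number of distinct words of the given length that share their sorted letters with at least one other word, which may not be the definition behind the expected 5046. The names group_anagrams, count_anagram_words and largest_group are made up for illustration, and the sample list stands in for the real file.

```python
from collections import defaultdict

def group_anagrams(words, length):
    """Map sorted letters -> sorted list of distinct words of the given length."""
    groups = defaultdict(set)
    for raw in words:
        word = raw.strip()
        if len(word) == length:
            groups[''.join(sorted(word))].add(word)
    return {key: sorted(members) for key, members in groups.items()}

def count_anagram_words(groups):
    """Count words that have at least one anagram partner (assumed definition)."""
    return sum(len(members) for members in groups.values() if len(members) > 1)

def largest_group(groups):
    """Return the biggest anagram group, or an empty list if there are none."""
    return max(groups.values(), key=len, default=[])

# hypothetical sample standing in for the lines of the .txt file
sample = ["opts\n", "post\n", "pots\n", "spot\n", "stop\n", "tops\n", "abcd\n", "tree\n"]
groups = group_anagrams(sample, 4)
print(count_anagram_words(groups))
print(largest_group(groups))
```

Using a set per group means a word that appears twice in the file is only counted once, which is one possible reason a naive count comes out too high.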
Related
I want to find the word with the most repeated letters in a given sentence.
I know how to find the most repeated letter in the sentence, but I'm not able to print the word.
For example:
this is an elementary test example
should print
elementary
def most_repeating_word(strg):
    words = strg.split()
    for words1 in words:
        dict1 = {}
        max_repeat_count = 0
        for letter in words1:
            if letter not in dict1:
                dict1[letter] = 1
            else:
                dict1[letter] += 1
            if dict1[letter] > max_repeat_count:
                max_repeat_count = dict1[letter]
                most_repeated_char = letter
                result = words1
    return result
You are resetting the max_repeat_count variable to 0 for each word. You should move that line up in your code, above the first for loop, like this:
def most_repeating_word(strg):
    words = strg.split()
    max_repeat_count = 0
    for words1 in words:
        dict1 = {}
        for letter in words1:
            if letter not in dict1:
                dict1[letter] = 1
            else:
                dict1[letter] += 1
            if dict1[letter] > max_repeat_count:
                max_repeat_count = dict1[letter]
                most_repeated_char = letter
                result = words1
    return result
Hope this helps
Use a regex instead. It is simple and easy. Iteration is an expensive operation compared to regular expressions.
Please refer to the solution for your problem in this post:
Count repeated letters in a string
Interesting exercise! +1 for using Counter(). Here's my suggestion also making use of max() and its key argument, and the * unpacking operator.
For a final solution, note that this (and the other proposed solutions to the question) doesn't currently consider case, other possible characters (digits, symbols, etc.), whether more than one word has the maximum letter count, or whether a word has more than one letter with the maximum letter count.
from collections import Counter
def most_repeating_word(strg):
    # Create list of word tuples: (word, max_letter, max_count)
    counters = [(word, *max(Counter(word).items(), key=lambda item: item[1]))
                for word in strg.split()]
    max_word, max_letter, max_count = max(counters, key=lambda item: item[2])
    return max_word
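The caveats about case and ties can be handled with a small variation. This is only a sketch under my own assumptions: the name most_repeating_words is hypothetical, the input is lowercased before counting, non-letters are ignored, and every word that reaches the highest repeat count is returned rather than just the first.

```python
from collections import Counter

def most_repeating_words(sentence):
    """Return (max_count, words) for the words whose most frequent letter repeats most."""
    best_count, best_words = 0, []
    for word in sentence.split():
        # compare case-insensitively and skip digits or punctuation
        letters = [ch for ch in word.lower() if ch.isalpha()]
        if not letters:
            continue
        count = Counter(letters).most_common(1)[0][1]
        if count > best_count:
            best_count, best_words = count, [word]
        elif count == best_count:
            best_words.append(word)  # keep ties instead of dropping them
    return best_count, best_words

print(most_repeating_words("this is an elementary test example"))
```

Returning the count alongside the words makes it easy to see whether a tie happened.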
word="SBDDUKRWZHUYLRVLIPVVFYFKMSVLVEQTHRUOFHPOALGXCNLXXGUQHQVXMRGVQTBEYVEGMFD"
def most_repeating_word(strg):
    counts = {}  # renamed from dict to avoid shadowing the built-in
    max_repeat_count = 0
    for letter in strg:
        if letter not in counts:
            counts[letter] = 1
        else:
            counts[letter] += 1
        if counts[letter] > max_repeat_count:
            max_repeat_count = counts[letter]
    # collect every letter that reaches the maximum count
    result = {}
    for letter, value in counts.items():
        if value == max_repeat_count:
            result[letter] = value
    return result
print(most_repeating_word(word))
When a person calls a function (e.g. find_from_dict(letters)), the function searches dictionary.txt for a word that can be made from the letters the user has entered (the word that uses the most of those letters).
For example, if letters is random typing such as "BAJPPNLE", the function should find "APPLE" in the dictionary, since "APPLE" uses the most letters from "BAJPPNLE".
def find_from_dict(letters):
    n = 0
    y = 0
    x = 0
    dictFile = [line.rstrip('\n') for line in open("dictionary.txt")]
    listLetters = list(letters)
    final = []
    while True:
        if n < len(dictFile) and len(list(dictFile[n])) <= len(listLetters) and x < len(list(dictFile[n])) and list(dictFile[n])[x] in listLetters:
            x = x + 1
        elif n < len(dictFile) and len(list(dictFile[n])) <= len(listLetters) and x < len(list(dictFile[n])) and list(dictFile[n])[x] not in listLetters:
            x = 0
            n = n + 1
        elif n < len(dictFile) and len(list(dictFile[n])) <= len(listLetters) and x == len(list(dictFile[n])):
            final.append(dictFile[n])
        elif n < len(dictFile) and len(list(dictFile[n])) > len(listLetters):
            n = n + 1
        else:
            print(final)
            break
I have this code at the moment, but since my dictionary.txt file is huge and the code is inefficient, it takes forever to run.
Does anyone have any idea how I could make this code efficient?
You can speed this up by preparing a word index keyed on the sorted letters of each word in your word list. Then look up sorted combinations of the input letters in that index.
For example:
from collections import defaultdict
from itertools import combinations
with open("/usr/share/dict/words", "r") as wordList:
    words = defaultdict(list)
    for word in wordList.read().upper().split("\n"):
        words[tuple(sorted(word))].append(word)  # index by sorted letters
def findWords(letters):
    for size in range(len(letters), 2, -1):  # from large to small (minimum 3 letters)
        for combo in combinations(sorted(letters), size):  # combinations of that size
            for word in words.get(combo, ()):  # matching words from index (get() avoids adding empty entries)
                yield word  # return as you go (iterator)
                # If you only want one, change this to: return word
Testing:
while True:
    letters = input("Enter letters:")
    if not letters: break
    for word in findWords(letters.upper()):
        stop = input(word)
        if stop: break
    print("")
Sample output:
Enter letters:BAJPPNLE
JELAB
BEJAN
LEBAN
NABLE
PEBAN
PEBAN
ALPEN
NEPAL
PANEL
PENAL
PLANE
ALPEN
NEPAL
PANEL
PENAL
PLANE
APPLE
NAPPE.
Enter letters:EPROING
PERIGON
PIGEON
IGNORE
REGION
PROGNE
OPINER.
Enter letters:
If you need a solution without using libraries, you will need a recursive approach that walks the combination tree, trying the largest sizes first:
with open("/usr/share/dict/words", "r") as wordList:
    words = dict()
    for word in wordList.read().upper().split("\n"):
        words.setdefault(tuple(sorted(word)), list()).append(word)  # index by sorted letters
def findWords(letters, size=None):
    if size is None:
        # top-level call: sort once, then try each size from large to small
        letters = sorted(letters)
        for size in range(len(letters), 2, -1):
            for word in findWords(letters, size): yield word
    elif len(letters) == size:
        for word in words.get(tuple(letters), []): yield word
    elif len(letters) > size:
        # drop one letter at a time and recurse
        for i in range(len(letters)):
            for word in findWords(letters[:i] + letters[i+1:], size):
                yield word
You can kind of "cheat" your way through it by pre-processing the dictionary file.
The idea is: instead of having a list of words, you have a list of groups which is determined by the sorted letters of the words.
For example, something like:
"aeegr": [
    "agree",
    "eager",
],
"alps": [
    "alps",
    "laps",
    "pals",
]
Then if you wanted to just find the exact match, you could sort the letters from the input and search in the processed file.
But you want the one that matches the most letters, so what you could do is number the letters with prime numbers (I'm only considering lowercase ascii characters), so that a is 2, b is 3, c is 5, d is 7 and so on.
Then, you can get a number by multiplying the primes of all the letters, so for example for alps you'd get 2*37*53*67.
In your dictionary file you then have the numbers obtained the same way for each word.
Like:
262774: [
    "alps",
    "laps",
    "pals",
]
You then go through your dictionary and if the initial number divided by the dictionary number has a remainder of 0, that's a possible match.
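The match with the largest number is a good candidate for the word with the most letters present, although a larger product does not strictly guarantee more letters (a single z is 101, while ab is only 6), so compare the lengths of the matching words as well.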
Keep in mind that the numbers might get very big very quickly, depending on how many letters you use.
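As a rough sketch of the prime idea, assuming lowercase a-z input only: the helpers first_primes, signature and best_matches are names I made up, and the short word list stands in for the pre-processed dictionary file. Ties are broken by word length rather than by the size of the product.

```python
from string import ascii_lowercase

def first_primes(n):
    """Return the first n primes by trial division (fine for n = 26)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# a -> 2, b -> 3, c -> 5, ... z -> 101
PRIME_OF = dict(zip(ascii_lowercase, first_primes(26)))

def signature(word):
    """Product of the primes assigned to each letter of the word."""
    product = 1
    for ch in word:
        product *= PRIME_OF[ch]
    return product

def best_matches(letters, dictionary_words):
    """Longest words whose signature divides the signature of the letters."""
    target = signature(letters)
    hits = [w for w in dictionary_words if target % signature(w) == 0]
    longest = max(map(len, hits), default=0)
    return [w for w in hits if len(w) == longest]

print(signature("alps"))
print(best_matches("bajppnle", ["apple", "plane", "zebra", "pal"]))
```

Python integers have arbitrary precision, so the products cannot overflow here, but they do get slower to divide as the words get longer.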
At the moment this code takes a string from the user and compares it to a text file in which many words are stored. It then outputs all the words that are an exact anagram of the string (e.g. "otp" gives opt, top, pot). Currently, when I input the string, it only matches words with exactly the same letters in a rearranged order.
My question is: how do I go about typing in excess letters while still outputting all the words that can be formed? For example, typing in "orkignwer" should output "working" even though there are extra letters.
words = []
def isAnAnagram(word, user):
    wordList = list(word)
    wordList.sort()
    inputList = list(user)
    inputList.sort()
    return (wordList == inputList)
def getAnagrams(user):
    lister = [word for word in words if len(word) == len(user)]
    for item in lister:
        if isAnAnagram(item, user):
            yield item
with open('Dictionary.txt', 'r') as f:
    allwords = f.readlines()
    f.close()
for x in allwords:
    x = x.rstrip()
    words.append(x)
inp = 1
while inp != "99":
    inp = input("enter word:")
    result = getAnagrams(inp)
    print(list(result))
You have to edit the isAnAnagram and the getAnagrams functions. First, the getAnagrams function should be edited so that the lister list also includes words shorter than the input:
def getAnagrams(user):
    lister = [word for word in words if len(word) <= len(user)]
    for item in lister:
        if isAnAnagram(item, user):
            yield item
Then you would need to edit the isAnAnagram function. As Alexander Huszagh pointed out, you can use the Counter from the collections package:
from collections import Counter
def isAnAnagram(word, user):
    word_counter = Counter(word)
    input_counter = Counter(user)
    return all(count <= input_counter[key] for key, count in word_counter.items())
The all(count <= input_counter[key] for key, count in word_counter.items()) checks to see if every letter of word appears in user at least as many times as they did in word.
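An equivalent way to write the same check, shown here only as a sketch, relies on Counter subtraction, which keeps positive counts only. The function name can_be_built_from is made up for this example.

```python
from collections import Counter

def can_be_built_from(word, letters):
    """True if every letter of word is available in letters often enough."""
    # Counter subtraction drops zero and negative counts, so an empty
    # result means nothing in word is missing from letters.
    return not (Counter(word) - Counter(letters))

print(can_be_built_from("working", "orkignwer"))
print(can_be_built_from("apple", "aple"))
```

The second call fails because "apple" needs two p's and only one was typed.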
P.S. If you want a more optimized solution, you might want to check out tries (e.g. MARISA-trie, python-trie or PyTrie).
I have a frequency dictionary whose keys are the words from a text and whose values are the number of times each word appears. How can I get the word count and the average word length while taking into account words that appear more than once? Right now I just make a list of the keys (since they are the words) and use len() for the word count.
wordcount=len(list(freq.keys()))
report["count:"]=wordcount
#for average length:
avg=list(freq.keys())
average=sum(map(len,avg))/len(avg)
report["avglen"]=average
number_of_words = int(input("Enter the number of words. "))
word_dict = {}
for i in range(number_of_words):
    word = input("Enter word. ")
    if word in word_dict:
        word_dict[word] += 1
    else:
        word_dict[word] = 1
print(word_dict)
# weighted average: each word's length counts once per occurrence
print(sum(len(word) * word_dict[word] for word in word_dict) / number_of_words)
A very similar question: https://stackoverflow.com/questions/20143947/word-frequency-counter-python/20145320#20145320
Use the sum function and dict.values():
freq = { 'test' : 10, 'rep' : 100 }
wordcount = sum(freq.values())
average = sum(len(w) * c for w, c in freq.items()) / wordcount
print(wordcount, average)
I have a file of random words; some of them are palindromes and some are not. Some of those palindromes are 3 or more letters long. How do I count them? I'm wondering how to make the conditions better. I thought I could just use the length, but I keep getting 0 as my answer, which I know is not true because I have the .txt file.
Where am I messing up?
number_of_words = []
with open('words.txt') as wordFile:
    for word in wordFile:
        word = word.strip()
        for letter in word:
            letter_lower = letter.lower()
def count_pali(wordFile):
    count_pali = 0
    for word in wordFile:
        word = word.strip()
        if word == word[::-1]:
            count_pali += 1
    return count_pali
print(count_pali)
count = 0
for number in number_of_words:
    if number >= 3:
        count += 1
print("Number of palindromes in the list of words that have at least 3 letters: {}".format(count))
You are looping through number_of_words in order to calculate count, but number_of_words is initialized to an empty list and never changed after that, hence the loop
for number in number_of_words:
    if number >= 3:
        count += 1
will execute exactly 0 times.
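One way to avoid the empty list altogether is to test the length and the palindrome condition in the same loop. A minimal sketch, assuming palindromes should be compared case-insensitively; count_palindromes and the sample list are made up here, and with the real file you would pass the open file object instead of the list.

```python
def count_palindromes(words, min_length=3):
    """Count the words that read the same backwards and have at least min_length letters."""
    count = 0
    for raw in words:
        word = raw.strip().lower()  # drop the newline, ignore case
        if len(word) >= min_length and word == word[::-1]:
            count += 1
    return count

# hypothetical sample standing in for the lines of words.txt
sample = ["Level\n", "noon\n", "aa\n", "python\n", "eye\n"]
print(count_palindromes(sample))
```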
Your code looks good right up until the loop:
for number in number_of_words:
    if number >= 3:
        count += 1
There is a problem in the logic here. If you think about the data structure of number_of_words, and what you are actually asking Python to compare with the 'number >= 3' condition, then I think you will figure your way through it nicely.
--- revised look:
# Getting the words into a list
# word_file = [word1, word2, word3, ..., wordn]
word_file = open('words.txt').readlines()
# set up counters
count_pali, high_score = 0, 0
# iterate through each word and test it
for word in word_file:
    # strip newline character
    word = word.strip()
    # if word is palindrome
    if word == word[::-1]:
        count_pali += 1
        # if word is palindrome AND word has at least 3 letters
        if len(word) >= 3:
            high_score += 1
print('Number of palindromes in the list of words that have at least 3 letters: {}'.format(high_score))
NOTES:
count_pali: counts the total number of words that are palindromes
high_score: counts the total number of palindromes that have at least 3 letters
len(word): if word is palindrome, will test the length of the word
Good luck!
This doesn't directly answer your question, but it might help to understand some of the problems that we have run into here. Mostly, you can see how to add to a list, and hopefully see the difference between getting the length of a string, list and integer (which you actually can't do!).
Try running the code below, and examine what is happening:
def step_forward():
    input('(Press ENTER to continue...)')
    print('\n.\n.\n.')
def experiment():
    """ Run a whole lot of experiments to explore the idea of lists and
    variables"""
    # create an empty list, test length
    word_list = []
    print('the length of word_list is: {}'.format(len(word_list)))
    # expect output to be zero
    step_forward()
    # add some words to the list
    print('\nAdding some words...')
    word_list.append('Hello')
    word_list.append('Experimentation')
    word_list.append('Interesting')
    word_list.append('ending')
    # test length of word_list again
    print('\ttesting length again...')
    print('\tthe length of word_list is: {}'.format(len(word_list)))
    step_forward()
    # print the length of each word in the list
    print('\nget the length of each word...')
    for each_word in word_list:
        print('\t{word} has a length of: {length}'.format(word=each_word, length=len(each_word)))
    # output:
    # Hello has a length of: 5
    # Experimentation has a length of: 15
    # Interesting has a length of: 11
    # ending has a length of: 6
    step_forward()
    # set up a couple of counters
    short_word = 0
    long_word = 0
    # test the length of the counters:
    print('\nTrying to get the length of our counter variables...')
    try:
        len(short_word)
        len(long_word)
    except TypeError:
        print('\tERROR: You can not get the length of an int')
    # you see, you can't get the length of an int
    # but, you can get the length of a word, or string!
    step_forward()
    # we will make some new tests, and some assumptions:
    # short_word: a word is short, if it has less than 9 letters
    # long_word: a word is long, if it has 9 or more letters
    # this test will count how many short and long words there are
    print('\nHow many short and long words are there?...')
    for each_word in word_list:
        if len(each_word) < 9:
            short_word += 1
        else:
            long_word += 1
    print('\tThere are {short} short words and {long} long words.'.format(short=short_word, long=long_word))
    step_forward()
    # BUT... what if we want to know which are the SHORT words and which are the LONG words?
    short_word = 0
    long_word = 0
    for each_word in word_list:
        if len(each_word) < 9:
            short_word += 1
            print('\t{word} is a SHORT word'.format(word=each_word))
        else:
            long_word += 1
            print('\t{word} is a LONG word'.format(word=each_word))
    step_forward()
    # and lastly, if you need to use the short or long words again, you can
    # create new sublists
    print('\nMaking two new lists...')
    shorts = []
    longs = []
    short_word = 0
    long_word = 0
    for each_word in word_list:
        if len(each_word) < 9:
            short_word += 1
            shorts.append(each_word)
        else:
            long_word += 1
            longs.append(each_word)
    print('short list: {}'.format(shorts))
    print('long list: {}'.format(longs))
    # now, the counters short_word and long_word should equal the length of the new lists
    if short_word == len(shorts) and long_word == len(longs):
        print('Hurray, it works!')
    else:
        print('Oh no!')
experiment()
Hopefully, when you look through our answers here, and examine the mini-experiment above, you will be able to get your code to do what you need it to do :)