At the moment this code takes a string from the user and compares it to a text file in which many words are stored. It then outputs all the words that are an exact anagram of the string (e.g. "otp" gives "opt", "top", "pot"). Currently, when I input the string, it only matches words that use EXACTLY the same letters in a rearranged order.
My question is: how do I go about typing in excess letters but still getting all the words that are contained in them? For example: type in "orkignwer" and the program will output "working" even though there are extra letters.
words = []
def isAnAnagram(word, user):
    wordList= list(word)
    wordList.sort()
    inputList= list(user)
    inputList.sort()
    return (wordList == inputList)
def getAnagrams(user):
    lister = [word for word in words if len(word) == len(user) ]
    for item in lister:
        if isAnAnagram(item, user):
            yield item
with open('Dictionary.txt', 'r') as f:
    allwords = f.readlines()
f.close()
for x in allwords:
    x = x.rstrip()
    words.append(x)
inp = 1
while inp != "99":
    inp = input("enter word:")
    result = getAnagrams(inp)
    print(list(result))
You have to edit both the isAnAnagram and the getAnagrams functions. First, getAnagrams should be edited so that the lister list also includes words that are shorter than the input:
def getAnagrams(user):
    lister = [word for word in words if len(word) <= len(user) ]
    for item in lister:
        if isAnAnagram(item, user):
            yield item
Then you would need to edit the isAnAnagram function. As Alexander Huszagh pointed out, you can use the Counter from the collections package:
from collections import Counter
def isAnAnagram(word, user):
    word_counter = Counter(word)
    input_counter = Counter(user)
    return all(count <= input_counter[key] for key, count in word_counter.items())
The expression all(count <= input_counter[key] for key, count in word_counter.items()) checks that every letter of word appears in user at least as many times as it does in word.
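Putting the two edits together, a self-contained sketch might look like the following. The names `is_sub_anagram`, `words_within` and the small in-memory `sample_words` list are assumptions made for illustration; the original code reads its words from Dictionary.txt instead.

```python
from collections import Counter

def is_sub_anagram(word, letters):
    """Return True if `word` can be spelled using only the letters in `letters`."""
    # Counter subtraction drops zero and negative counts, so an empty
    # result means every letter of `word` is covered by `letters`.
    return not (Counter(word) - Counter(letters))

def words_within(letters, words):
    """Yield every word that can be built from a subset of `letters`."""
    for word in words:
        if len(word) <= len(letters) and is_sub_anagram(word, letters):
            yield word

# Hypothetical word list standing in for the dictionary file.
sample_words = ["working", "work", "king", "zebra", "rework"]
print(list(words_within("orkignwer", sample_words)))
# ['working', 'work', 'king', 'rework']
```

The length check is only a cheap pre-filter; the Counter comparison alone already rejects words that are too long.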
P.S. If you want a more optimized solution, you might want to check out tries (e.g. MARISA-trie, python-trie or PyTrie).
I want to find the word with the most repeated letters, given a sentence as input.
I know how to find the most repeated letter in the sentence, but I'm not able to print the word.
For example:
this is an elementary test example
should print
elementary
def most_repeating_word(strg):
    words =strg.split()
    for words1 in words:
        dict1 = {}
        max_repeat_count = 0
        for letter in words1:
            if letter not in dict1:
                dict1[letter] = 1
            else:
                dict1[letter] += 1
            if dict1[letter]> max_repeat_count:
                max_repeat_count = dict1[letter]
                most_repeated_char = letter
                result=words1
    return result
You are resetting the max_repeat_count variable to 0 for each word. You should move it up in your code, above the first for loop, like this:
def most_repeating_word(strg):
    words =strg.split()
    max_repeat_count = 0
    for words1 in words:
        dict1 = {}
        for letter in words1:
            if letter not in dict1:
                dict1[letter] = 1
            else:
                dict1[letter] += 1
            if dict1[letter]> max_repeat_count:
                max_repeat_count = dict1[letter]
                most_repeated_char = letter
                result=words1
    return result
Hope this helps
Use a regex instead. It is simple and easy. Iteration is an expensive operation compared to regular expressions.
Please refer to the solution for your problem in this post:
Count repeated letters in a string
Interesting exercise! +1 for using Counter(). Here's my suggestion also making use of max() and its key argument, and the * unpacking operator.
For a final solution, note that this (and the other proposed solutions to the question) doesn't currently consider case, other possible characters (digits, symbols, etc.), whether more than one word has the maximum letter count, or whether a word has more than one letter with the maximum letter count.
from collections import Counter
def most_repeating_word(strg):
    # Create list of word tuples: (word, max_letter, max_count)
    counters = [ (word, *max(Counter(word).items(), key=lambda item: item[1]))
                 for word in strg.split() ]
    max_word, max_letter, max_count = max(counters, key=lambda item: item[2])
    return max_word
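To address the caveats noted above about case, non-letter characters and ties between words, one possible variant is sketched below. The function name `most_repeating_words` and its return shape (a count plus a list of words) are assumptions for illustration, not part of the original answers.

```python
from collections import Counter

def most_repeating_words(text):
    """Return (max_count, words): every word whose most frequent letter
    repeats `max_count` times, ignoring case and non-letter characters."""
    best_count = 0
    best_words = []
    for word in text.split():
        # Compare letters case-insensitively and skip digits/punctuation.
        letters = [ch for ch in word.lower() if ch.isalpha()]
        if not letters:
            continue
        count = max(Counter(letters).values())
        if count > best_count:
            best_count, best_words = count, [word]
        elif count == best_count:
            best_words.append(word)  # keep ties in sentence order
    return best_count, best_words

print(most_repeating_words("this is an elementary test example"))
# (3, ['elementary'])
```

Returning a list makes ties explicit instead of silently picking the first or last word.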
word="SBDDUKRWZHUYLRVLIPVVFYFKMSVLVEQTHRUOFHPOALGXCNLXXGUQHQVXMRGVQTBEYVEGMFD"
def most_repeating_word(strg):
    counts={}  # letter -> number of occurrences (avoids shadowing the built-in dict)
    max_repeat_count = 0
    for letter in strg:
        if letter not in counts:
            counts[letter] = 1
        else:
            counts[letter] += 1
        if counts[letter]> max_repeat_count:
            max_repeat_count = counts[letter]
    # Collect every letter that reaches the maximum count.
    result={}
    for letter, value in counts.items():
        if value==max_repeat_count:
            result[letter]=value
    return result
print(most_repeating_word(word))
This algorithm takes a number as input and returns how many anagrams of that length are in the dictionary read from a .txt file. I am getting an output of 6783 if I enter 5, when I should be getting 5046 based on my list. I do not know what else to change.
ex: An input of 5 should return 5046
I have also been trying to search through the list, given a positive integer for the word length, to collect the words with the maximum number of anagrams. I have no idea where to start.
ex: An input of 4 for word length should return the maximum number of anagrams, which is 6, and output the list of anagrams, e.g.
['opts', 'post', 'pots', 'spot', 'stop', 'tops']
def maxword():
    input_word = int(input("Enter word length (hit enter key to quit):"))
    word_file = open("filename", "r")
    word_list = {}
    alist = []
    for text in word_file:
        simple_text = ''.join(sorted(text.strip()))
        word_list.update({text.strip(): simple_text})
    count = 0
    for num in word_list.values():
        if len(num) == input_word:
            count += 1
            alist.append(num)
    return str(input_word) + str(len(alist))
This can be achieved with a single pass over the input text file. Your idea to sort the word and store it in a map is the right approach.
Build a dictionary with the sorted word as the key, since it will be the same for all anagrams, and with a list of the words sharing that sorted form as the value.
To avoid looping over the dictionary again, we'll keep track of the key whose value is the longest list.
If this word_list dictionary is built for one-time use for a specific length, then you can store only the words whose length equals input_word.
word_file = ["abcd", "cdab", "cdab", "cdab", "efgh", "ghfe", "fehg"]
word_list = {}
alist = []
input_word = 4
max_len = -1
max_word = ""
for text in word_file:
    if len(text) == input_word:
        simple_text = ''.join(sorted(text.strip()))
        if simple_text not in word_list:
            word_list.update({simple_text: [text.strip()]})
        else:
            word_list[simple_text].append(text.strip())
        if(len(word_list[simple_text]) > max_len):
            max_len = len(word_list[simple_text])
            max_word = simple_text
print(max_word)
print(word_list[max_word])
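The same grouping idea can be wrapped in a reusable function. The sketch below is one possible shape, assuming the words arrive as any iterable of strings (a list or an open file); the name `largest_anagram_group` is hypothetical. Unlike the snippet above, it stores each group as a set so that a word repeated in the input is counted only once.

```python
from collections import defaultdict

def largest_anagram_group(words, length):
    """Return the largest set of mutual anagrams of the given length, sorted."""
    groups = defaultdict(set)
    for raw in words:
        word = raw.strip()  # drop the trailing newline when reading from a file
        if len(word) == length:
            # All anagrams share the same sorted-letter signature.
            groups["".join(sorted(word))].add(word)
    if not groups:
        return []
    return sorted(max(groups.values(), key=len))

sample = ["opts", "post", "pots", "spot", "stop", "tops", "abcd", "dcba", "toolong"]
print(largest_anagram_group(sample, 4))
# ['opts', 'post', 'pots', 'spot', 'stop', 'tops']
```

If several groups tie for the largest size, `max` returns the first one encountered.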
I'm trying to make a script where I can input an anagram of any word and it will read from a dictionary to see if there's a match.
(ex. estt returns: = unjumble words: test)
If there are two matches it will write
(ex. estt returns: there are multiple matches: test, sett (assuming sett is a word, lol))
I couldn't even get one match going; it keeps returning "no match" even though I can see the words in the list I made from the dictionary.
Here's the code I wrote so far:
def anagrams(s):
    if s =="":
        return [s]
    else:
        ans = []
        for w in anagrams(s[1:]):
            for pos in range(len(w)+1):
                ans.append(w[:pos]+s[0]+w[pos:])
            return ans
dic_list = []
def dictionary(filename):
    openfile = open(filename,"r")
    read_file = openfile.read()
    lowercase = read_file.lower()
    split_words = lowercase.split()
    for words in split_words:
        dic_list.append(words)
def main():
    dictionary("words.txt")
    anagramsinput = anagrams(input("unjumble words here: "))
    for anagram in anagramsinput:
        if anagram in dic_list:
            print(anagram)
        else:
            print("no match")
            break
It's as if anagram isn't in dic_list. What's happening?
You are breaking after a single check in your loop. Remove the break to get all anagrams:
def main():
    dictionary("words.txt")
    anagramsinput = anagrams(input("unjumble words here: "))
    for anagram in anagramsinput:
        if anagram in dic_list: # don't break, loop over every possibility
            print(anagram)
If you don't want to print "no match", just remove it. Also, if you want all possible permutations of the letters, use itertools.permutations:
from itertools import permutations
def anagrams(s):
    return ("".join(p) for p in permutations(s))
Output:
unjumble words here: onaacir
aaronic
In your anagrams function you are returning before you finish the outer loop, and therefore missing many permutations:
def anagrams(s):
    if s =="":
        return [s]
    else:
        ans = []
        for w in anagrams(s[1:]):
            for pos in range(len(w)+1):
                ans.append(w[:pos]+s[0]+w[pos:])
        return ans # only return when both loops are done
Now, after both changes, your code will work.
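Generating every permutation grows factorially with the length of the input, so longer jumbles get slow quickly. A different technique, sketched here as an alternative rather than as what the answers above do, is to index the dictionary by each word's sorted letters and look the jumble up directly. The names `build_index` and `unjumble` are hypothetical, and the short word list stands in for words.txt.

```python
from collections import defaultdict

def build_index(words):
    """Map each sorted-letter signature to the list of words that share it."""
    index = defaultdict(list)
    for word in words:
        word = word.strip().lower()
        if word:
            index["".join(sorted(word))].append(word)
    return index

def unjumble(letters, index):
    """Return all dictionary words that are anagrams of `letters`."""
    # .get() avoids creating an empty entry for unknown signatures.
    return index.get("".join(sorted(letters.lower())), [])

# Hypothetical stand-in for the contents of words.txt.
index = build_index(["test", "sett", "stop", "pots"])
matches = unjumble("estt", index)
if not matches:
    print("no match")
elif len(matches) == 1:
    print("unjumbled word:", matches[0])
else:
    print("there are multiple matches:", ", ".join(matches))
```

The index is built once, after which each lookup costs one sort of the input plus a dictionary access.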
I'm working with some of the corpus materials from NLPP. I'm trying to improve my unscrambling score in the code... at the moment I'm hitting 91.250%.
The point of the exercise is to alter the represent_word function to improve the score.
The function consumes a word as a string, and this word is either scrambled or unscrambled. The function produces a "representation" of the word, which is a list containing the following information:
word length
number of vowels
number of consonants
first and last letter of the word (these are always unscrambled)
a tuple of the most commonly used words from the corpus whose characters are all members of the given input word.
I have also tried analysing anagrams of prefixes and suffixes, but they don't contribute anything to the score compared with the tuple of most common words with shared characters.
I'm not sure why I can't improve the score. I've even tried increasing the dictionary size by importing words from another corpus.
The only section that can be altered here is the represent_word function and the definitions just above it. However, I'm including the entire source in case it yields some useful insight for someone.
import nltk
import re
def word_counts(corpus, wordcounts = {}):
    """ Function that counts all the words in the corpus."""
    for word in corpus:
        wordcounts.setdefault(word.lower(), 0)
        wordcounts[word.lower()] += 1
    return wordcounts
JA_list = filter(lambda x: x.isalpha(), map(lambda x:x.lower(),
                 nltk.corpus.gutenberg.words('austen-persuasion.txt')))
JA_freqdist=nltk.FreqDist(JA_list)
JA_toplist=sorted(JA_freqdist.items(),key=lambda x: x[1], reverse=True)[:0]
JA_topwords=[]
for i in JA_toplist:
    JA_topwords.append(i[0])
PP_list = filter(lambda x: x.isalpha(),map(lambda x:x.lower(),
                 open("Pride and Prejudice.txt").read().split()))
PP_freqdist=nltk.FreqDist(PP_list)
PP_toplist=sorted(PP_freqdist.items(),key=lambda x: x[1], reverse=True)[:7]
PP_topwords=[]
for i in PP_toplist:
    PP_topwords.append(i[0])
uniquewords=[]
for i in JA_topwords:
    if i not in PP_topwords:
        uniquewords.append(i)
    else:
        continue
uniquewords.extend(PP_topwords)
def represent_word(word):
    def common_word(word):
        dictionary= uniquewords
        findings=[]
        for string in dictionary:
            if all((letter in word) for letter in string):
                findings.append(string)
            else:
                False
        if not findings:
            return None
        else:
            return tuple(findings)
    vowels = list("aeiouy")
    consonants = list("bcdfghjklmnpqrstvexz")
    number_of_consonants = sum(word.count(i) for i in consonants)
    number_of_vowels = sum(word.count(i) for i in vowels)
    split_word=list(word)
    common_words=common_word(word)
    return tuple([split_word[0],split_word[-1], len(split_word),number_of_consonants, number_of_vowels, common_words])
def create_mapping(words, mapping = {}):
    """ Returns a mapping of representations of words to the most common word for that representation. """
    for word in words:
        representation = represent_word(word)
        mapping.setdefault(representation, ("", 0))
        if mapping[representation][1] < words[word]:
            mapping[representation] = (word, words[word])
    return mapping
if __name__ == '__main__':
    # Create a mapping of representations of the words in Persuasian by Jane Austen to use as a corpus
    words = JA_freqdist
    mapping = create_mapping(words)
    # Load the words in the scrambled file
    with open("Pdrie and Puicejdre.txt") as scrambled_file:
        scrambled_lines = [line.split() for line in scrambled_file if len(line.strip()) > 0 ]
        scrambled_words = [word.lower() for line in scrambled_lines for word in line]
    # Descramble the words using the best mapping
    descrambled_words = []
    for scrambled_word in scrambled_words:
        representation = represent_word(scrambled_word)
        if representation in mapping:
            descrambled_word = mapping[representation][0]
        else:
            descrambled_word = scrambled_word
        descrambled_words.append(descrambled_word)
    # Load the original words
    with open("Pride and Prejudice.txt") as original_file:
        original_lines = [line.split() for line in original_file if len(line.strip()) > 0 ]
        original_words = [word.lower() for line in original_lines for word in line]
    # Make a list of word pairs from descrambled_words and original words
    word_pairs = zip(descrambled_words, original_words)
    # See if the words are the same
    judgements = [descrambled_word == original_word for (descrambled_word, original_word) in word_pairs]
    # Print the results
    print "Correct: {0:.3%}".format(float(judgements.count(True))/len(judgements))
I am trying to write a function that takes a string, and checks every letter in that string against every letter, in every line, in a list of words. The code I have written is:
def uses_all(required):
    fin = open('words.txt')
    for line in fin:
        for letter in line:
            if letter not in required:
                pass
    return line
When I try to have only words that contain vowels returned it is only returning the last line in the file.
>>> uses_all('aeiou')
'zymurgy\n'
Well, the function you've written iterates through the file without doing anything and then returns the last line, so the behavior you see is kind of expected.
Try this:
def uses_all(required):
    ret = []
    fin = open('words.txt')
    for line in fin:
        # Let's try and find all our required letters in that word.
        for letter in required:
            if letter not in line:
                break # We're missing one! Break!
        else: # else block executes if no break occurred
            ret.append(line)
    return ret
It's a lousy implementation, but it should work.
Lines yielded from iterating over a file have the EOL at the end. Strip that first.
Also, the question doesn't match the logic in the code.
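A minimal sketch of the stripping advice, assuming one word per line. To keep it runnable without a words.txt file, the function takes any iterable of lines (an assumption; the original opens the file itself), and an `io.StringIO` object stands in for the open file.

```python
import io

def uses_all(required, lines):
    """Return the words (without trailing newlines) containing every required letter."""
    required = set(required)
    matches = []
    for line in lines:
        word = line.strip()  # remove the end-of-line character first
        if required <= set(word):  # every required letter occurs in the word
            matches.append(word)
    return matches

# Hypothetical file contents for illustration.
fake_file = io.StringIO("education\nzymurgy\nsequoia\n")
print(uses_all("aeiou", fake_file))
# ['education', 'sequoia']
```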
You are only returning line, which is just the loop variable. You need to build a list of answers. I am not sure what you were trying to do with the pass, which is a no-op, but here is a version of your code which should work...
def uses_all(required):
    fin = open('words.txt')
    answer = []
    for line in fin:
        should_take = True
        for letter in required:
            if letter not in line:  # a required letter is missing from this word
                should_take = False
        if should_take ==True:
            answer.append(line)
    return answer
class WordMatcher(object):
    @classmethod
    def fromFile(cls, fname):
        with open(fname) as inf:
            return cls(inf)
    def __init__(self, words):
        super(WordMatcher,self).__init__()
        self.words = set(word.strip().lower() for word in words)
    def usesAllLetters(self, letters):
        letters = set(letters)
        for word in self.words:
            if all(ch in word for ch in letters):
                yield word
wordlist = WordMatcher.fromFile('words.txt')
vowelWords = list(wordlist.usesAllLetters('aeiou'))
uses_any()
The name uses_all() seems to contradict the intent "I try to have only words that contain vowels returned". Here's a possible correction:
def uses_any(letters):
    """Yield words that contain any of the `letters`."""
    with open('words.txt') as fin:
        for word in fin: # if there is a single word per line then just say so
            # equivalently: not set(letters).isdisjoint(word)
            if any(letter in letters for letter in word):
                yield word.strip() # remove leading/trailing whitespace
uses_only()
Another interpretation could be:
def uses_only(input_words, required_letters, only=True):
    """Yield words from `input_words` that contain `required_letters`.

    If `only` is true, a word must be constructed entirely from
    `required_letters` (though not necessarily from *all* of them).
    """
    present = all if only else any
    for word in input_words:
        if present(letter in required_letters for letter in word):
            yield word #note: no .strip(), .lower(), etc here
with open('/etc/dictionaries-common/words') as fin:
    words = fin.read().decode('utf-8').lower().splitlines()
print 'Words that constructed entirely from given vowels:'
print '\n'.join(uses_only(words, u'aeiou'))
print 'Words that contain any given vowels:'
print '\n'.join(uses_only(words, u'aeiou', only=False))
Output
Words that constructed entirely from given vowels:
a
au
e
eu
i
io
o
u
a
e
i
o
u
Words that contain any given vowels:
...
épées
étude
étude's
études
uses_all()
If the intent is: "I try to have only words that contain [all] vowels returned [but other letters are allowed too]" then:
def uses_all(input_words, required_letters):
    """Yield `input_words` that contain all `required_letters`."""
    required_letters = frozenset(required_letters)
    for word in input_words:
        if required_letters.issubset(word):
            yield word
print 'Words that contain all given vowels:'
print '\n'.join(uses_all(unique_justseen(words), u'aeiou'))
Where words is defined in the previous example and unique_justseen() is:
from itertools import imap, groupby
from operator import itemgetter
def unique_justseen(iterable, key=None):
    """List unique elements, preserving order.
    Remember only the element just seen.
    """
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return imap(next, imap(itemgetter(1), groupby(iterable, key)))
Output
Words that contain all given vowels:
...
vivaciousness's
vocabularies
voluntaries
voluptuaries
warehousing
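The examples in this answer are written for Python 2 (print statements, `str.decode`, `itertools.imap`). As a hedged sketch of how the two helpers might be ported to Python 3, where `imap` no longer exists and the built-in `map` is already lazy, with a short hypothetical word list standing in for the dictionary file:

```python
from itertools import groupby
from operator import itemgetter

def unique_justseen(iterable, key=None):
    """List unique elements, preserving order. Remember only the element just seen."""
    # groupby yields (key, group) pairs; take the first item of each group.
    return map(next, map(itemgetter(1), groupby(iterable, key)))

def uses_all(input_words, required_letters):
    """Yield `input_words` that contain all `required_letters`."""
    required_letters = frozenset(required_letters)
    for word in input_words:
        if required_letters.issubset(word):
            yield word

# Hypothetical lowercased word list with an adjacent duplicate.
words = ["a", "a", "education", "sequoia", "sequoia", "zymurgy"]
print("Words that contain all given vowels:")
print("\n".join(uses_all(unique_justseen(words), "aeiou")))
```

Because `unique_justseen` only remembers the previous element, it removes duplicates only when they are adjacent, which is the case for a sorted word list after lowercasing.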