Python - match letters of words in a list

I'm trying to create a simple program where a user enters a few letters:
Enter letters: abc
I then want to run through a list of words I have and match any words that contain 'a', 'b', and 'c'.
This is what I've tried so far, with no luck:
for word in good_words:  # for all words in the good words list
    for letter in letters:  # for each letter input by the user
        if not (letter in word):
            break
    matches.append(word)

If you want all the letters inside the word:
[word for word in good_words if all(letter in word for letter in letters)]
The problem with your code is the break inside the inner loop. Python has no construct for breaking out of more than one loop at once, which is what you wanted here.

You could probably improve the speed by using a set or frozenset.
The documentation mentions membership testing as a common use:
A set object is an unordered collection of distinct hashable objects.
Common uses include membership testing, removing duplicates from a
sequence, and computing mathematical operations such as intersection,
union, difference, and symmetric difference.
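A minimal sketch of the set-based idea, assuming good_words is a list of strings and letters is the string the user typed (the sample values below are made up for illustration):

```python
# Sample data; these values are assumptions for illustration only.
good_words = ["cab", "back", "bat", "abacus", "dog"]
letters = "abc"

# Build the set of required letters once.
required = set(letters)

# A word matches when every required letter appears in it,
# i.e. when `required` is a subset of the word's characters.
matches = [word for word in good_words if required.issubset(word)]

print(matches)  # ['cab', 'back', 'abacus']
```

Whether this is noticeably faster than the all(...) version depends on the size of the word list, so it is worth timing on real data.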

List comprehensions are definitely the way to go, but just to address the issue that OP was having with his code:
Your break statement only breaks out of the innermost loop, so the word is still appended to matches. A quick fix is to take advantage of Python's for ... else construct:
for word in good_words:
    for letter in letters:
        if letter not in word:
            break
    else:
        matches.append(word)
In the above code, the else block only executes if the inner loop runs all the way through without hitting break. When break fires, it exits the inner loop and skips the else block, so matches.append(word) is not executed.

I would first group the words in the list by the set of letters they contain.
import collections
words_by_letters = collections.defaultdict(list)
for word in good_words:
    key = frozenset(word)
    words_by_letters[key].append(word)
Then it's simply a matter of looking for words with particular letter occurrences. This is hopefully faster than checking each word individually.
subkey = set(letters)
for key, words in words_by_letters.items():  # .iteritems() in Python 2
    if key.issuperset(subkey):
        matches.extend(words)
If you want to keep track of letter repeats, you can do something similar by building a key from collections.Counter.
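As a rough sketch of the repeat-aware variant: the code below compares letter counts with collections.Counter for each word directly, rather than grouping words under a Counter-derived key. The helper name has_letters and the sample data are assumptions made for illustration.

```python
from collections import Counter

# Made-up sample data for illustration.
good_words = ["ball", "bal", "label", "all"]
letters = "all"  # the user asks for one 'a' and two 'l's

needed = Counter(letters)

def has_letters(word, needed):
    """Return True if word contains each letter at least as often as needed."""
    available = Counter(word)
    # A Counter returns 0 for letters that are missing from the word.
    return all(available[letter] >= count for letter, count in needed.items())

matches = [word for word in good_words if has_letters(word, needed)]
print(matches)  # ['ball', 'label', 'all']
```

Here 'bal' is rejected because it has only one 'l', which a plain set comparison would not detect.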

Related

Trouble getting list of words given a list of available letters for each character (Python)

I have a list of words that I would like to go through and remove any that don't fit my criteria.
The criteria is a list of lists of letters that are possible for each character.
letters = [['l','e'],['a','b','c'],['d','e','f']]
words = ['lab','lad','ebf','tem','abe','dan','lce']
The function I have written to try and solve this is:
def calc_words(letters,words):
    for w in words:
        for i in range(len(letters)):
            if w in words:
                for j in letters[i]:
                    if w in words:
                        if j != w[i]:
                            words.remove(w)
    return words
The output when I run the function calc_words(letters,words) should be ['lad', 'ebf', 'lce']. But, I get ['lad', 'tem', 'dan'] instead.
I can't figure out what is going on. I'm relatively new to Python, so if someone either knows what is going wrong with my function, or knows a different way to go about this, I would appreciate any input.
In general, it's good to avoid using one-letter variable names and reduce nesting as much as possible. Here's an implementation of calc_words() that should suit your needs:
letters = [['l','e'],['a','b','c'],['d','e','f']]
words = ['lab','lad','ebf','tem','abe','dan','lce']
def calc_words(letters, words):
    # Iterate over each word.
    # Assume that a word should be in the result
    # until we reach a letter that violates the
    # constraint set by the letters list.
    result = []
    for word in words:
        all_letters_match = True
        for index, letter in enumerate(word):
            if letter not in letters[index]:
                all_letters_match = False
                break
        if all_letters_match:
            result.append(word)
    return result
# Prints "['lad', 'ebf', 'lce']".
print(calc_words(letters, words))
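The comments at the top of the function describe how it works. Your version went wrong because it removes a word as soon as any one of the allowed letters differs from w[i], and because removing items from words while iterating over it makes the loop skip elements. This solution is similar to your implementation, with some nesting removed and improved naming. The if w in words checks aren't necessary, because in each iteration w takes on the value of an element from words. I've tried to keep this solution as close as possible to your original code.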
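For comparison, the same filter can be sketched more compactly with all() and zip(). The name calc_words_compact is made up here, and the length check is an added assumption (that every word must have exactly one character per position in letters), which also avoids an IndexError for words longer than letters.

```python
letters = [['l', 'e'], ['a', 'b', 'c'], ['d', 'e', 'f']]
words = ['lab', 'lad', 'ebf', 'tem', 'abe', 'dan', 'lce']

def calc_words_compact(letters, words):
    """Keep words whose i-th character is allowed at position i."""
    return [
        word for word in words
        # Assumption: a word must be exactly as long as the constraint list.
        if len(word) == len(letters)
        # zip pairs each character with the letters allowed at its position.
        and all(char in allowed for char, allowed in zip(word, letters))
    ]

print(calc_words_compact(letters, words))  # ['lad', 'ebf', 'lce']
```

Like the version above, this builds a new list instead of removing items from words while looping over it.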

How to change a single letter in input string

I'm a newbie in Python, so I have a question. I want to replace a letter in a word if the first letter appears more than once. I also want to use input to get the word from the user. I'll present the problem using an example:
word = 'restart'
After the changes the word should look like this:
word = 'resta$t'
I tried a couple of ideas but always got stuck. Is there a simple solution for this?
Thanks in advance.
EDIT: In response to Simas Joneliunas
It's not my homework. I've just finished reading some basic Python tutorials and found some questions that I couldn't solve on my own. My first thought was to separate the word into single letters and then find the position of the letter I want to replace with "$". I wrote the code below, but I couldn't come up with a way to get to a specific position and replace it.
word = 'restart'
how_many = {}
for x in word:
    if x in how_many:
        how_many[x] += 1
    else:
        how_many[x] = 1
for y in how_many:
    if how_many[y] > 0:
        print(y, how_many[y])
Using str.replace:
s = "restart"
new_s = s[0] + s[1:].replace(s[0], "$")
Output:
'resta$t'
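A small sketch that wraps the str.replace approach in a function, so it can be fed from input(). The function name and the mask parameter are assumptions for illustration; the guard for an empty string is added because s[0] would raise an IndexError on empty input.

```python
def mask_first_letter_repeats(word, mask="$"):
    """Replace later occurrences of the first letter with mask."""
    if not word:
        return word  # nothing to do for an empty string
    first = word[0]
    # Keep the first letter, then replace it everywhere in the rest.
    return first + word[1:].replace(first, mask)

# With user input this would be called as:
#     mask_first_letter_repeats(input("Enter a word: "))
print(mask_first_letter_repeats("restart"))  # resta$t
print(mask_first_letter_repeats("python"))   # python
```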
Try:
"".join(["$" if ch in word[:i] else ch for i, ch in enumerate(word)])
enumerate iterates through the string (i.e. a sequence of characters) and keeps a running index of the iteration.
word[:i] is the part of the string before the current index, i.e. the characters that have already appeared.
"$" if ch in word[:i] else ch means: replace the character with $ if it has already appeared earlier; otherwise keep the character.
"".join() joins the list of characters into a single string.
Note that this replaces every repeated character, not only repeats of the first letter: for 'restart' it produces 'resta$$'.
This is where the Python console is handy, since it lets you experiment. Since you have to keep track of the letters you have already seen, a good visual aid is to put the alphabet in a list. Then, in the loop, check first whether the current letter is still in the list.
If it is, remove it from the list; if it isn't, the letter has appeared before, so replace it with $ as in the example above.
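The alphabet-list idea could be sketched roughly as follows. It assumes the word contains only lowercase ASCII letters, and the function name is made up. Note that it masks every repeated letter, not just repeats of the first one.

```python
import string

def mask_repeated_letters(word, mask="$"):
    # Assumption: word contains only lowercase ASCII letters.
    remaining = list(string.ascii_lowercase)
    result = []
    for ch in word:
        if ch in remaining:
            remaining.remove(ch)  # first time this letter is seen
            result.append(ch)
        else:
            result.append(mask)   # the letter was already used up
    return "".join(result)

print(mask_repeated_letters("restart"))  # resta$$
```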

Trying to filter a dictionary for an AI

Hi so I'm currently taking a class and one of our assignments is to create a Hangman AI. There are two parts to this assignment and currently I am stuck on the first task, which is, given the state of the hangman puzzle, and a list of words as a dictionary, filter out non-possible answers to the puzzle from the dictionary. As an example, if the puzzle given is t--t, then we should filter out all words that are not 4 letters long. Following that, we should filter out words which do not have t as their first and last letter (ie. tent and test are acceptable, but type and help should be removed). Currently, for some reason, my code seems to remove all entries from the dictionary and I have no idea why. Help would be much appreciated. I have also attached my code below for reference.
def filter(self,puzzle):
    wordlist = dictionary.getWords()
    newword = {i : wordlist[i] for i in range(len(wordlist)) if len(wordlist[i])==len(str(puzzle))}
    string = puzzle.getState()
    array = []
    for i in range(len(newword)):
        n = newword[i]
        for j in range(len(string)):
            if string[j].isalpha() and n[j]==string[j]:
                continue
            elif not string[j].isalpha():
                continue
            else:
                array+=i
                print(i)
                break
    array = list(reversed(array))
    for i in array:
        del newword[i]
Some additional information:
puzzle.getState() is a function given to us that returns a string describing the state of the puzzle, eg. a-a--- or t---t- or elepha--
dictionary.getWords essentially creates a dictionary from the list of words
Thanks!
This may be a bit late, but using regular expressions:
import re
def remove(regularexpression, somewords):
    words_to_remove = []
    for word in somewords:
        # Equivalent: if not re.search(regularexpression, word):
        if re.search(regularexpression, word) is None:
            words_to_remove.append(word)
    for word in words_to_remove:
        somewords.remove(word)
For example, suppose you have the list words = ['hello', 'halls', 'harp', 'heroic', 'tests'].
You can now do:
remove('h[a-zA-Z]{4}$', words), after which words becomes ['hello', 'halls'], while remove('h[a-zA-Z]{3}s$', words) leaves words as just ['halls'].
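To connect this to the Hangman puzzle, one possible sketch builds the pattern from the puzzle state itself. It assumes blanks are written as "-" (as in the examples from the question); the function names are made up. It uses re.fullmatch, which anchors both ends of the word, so the length check comes for free.

```python
import re

def state_to_pattern(state, blank="-"):
    """Turn a puzzle state such as 't--t' into a compiled regex."""
    # Each blank matches any single letter; known letters match themselves.
    parts = ["[a-zA-Z]" if ch == blank else re.escape(ch) for ch in state]
    return re.compile("".join(parts))

def filter_words(state, candidates):
    """Return the candidates that are still possible for this state."""
    pattern = state_to_pattern(state)
    return [word for word in candidates if pattern.fullmatch(word)]

print(filter_words("t--t", ["tent", "test", "type", "help", "toast"]))
# ['tent', 'test']
```

This returns a new list rather than mutating the input, which sidesteps the problems that come with deleting entries while looping.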
This line
newword = {i : wordlist[i] for i in range(len(wordlist)) if len(wordlist[i])==len(str(puzzle))}
creates a dictionary with non-contiguous keys. You keep only the words that are the same length as puzzle, so if your wordlist is ['test', 'type', 'string', 'tent'] and puzzle is 4 letters long, newword will be {0: 'test', 1: 'type', 3: 'tent'}. You then use for i in range(len(newword)): to iterate based on the length of the dictionary, which visits the keys 0, 1, 2 rather than the keys that actually exist. I'm a bit surprised that you aren't getting a KeyError with your code as written.
I'm not able to test this without the rest of your code, but I think changing the loop to:
for i in newword.keys():
will help.
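The key mismatch can be reproduced in isolation. In this sketch puzzle_length is a stand-in for the length of the puzzle state, since the real puzzle object isn't available here:

```python
wordlist = ['test', 'type', 'string', 'tent']
puzzle_length = 4  # assumption: stands in for the length of the puzzle state

# Same idea as the comprehension in the question: keep the original indices.
newword = {i: w for i, w in enumerate(wordlist) if len(w) == puzzle_length}
print(newword)  # {0: 'test', 1: 'type', 3: 'tent'}

# range(len(newword)) yields 0, 1, 2, but key 2 does not exist.
missing = [i for i in range(len(newword)) if i not in newword]
print(missing)  # [2]

# Iterating over the dictionary visits only the keys that exist.
for i in newword:
    print(i, newword[i])
```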

Check if word is inside of list of tuples

I'm wondering how I can efficiently check whether a value is inside a given list of tuples. Say I have a list of:
("the", 1)
("check", 1)
("brown", 2)
("gary", 5)
How can I check whether a given word is inside the list, ignoring the second value of the tuples? If it were just a list of words I could use
if "the" in wordlist:
    #...
but this will not work. Is there something along these lines that I can do?
if ("the", _) in wordlist:
    #...
Maybe use a hash (convert the list to a dict and test the keys):
>>> word in dict(list_of_tuples)
Use any:
if any(word[0] == 'the' for word in wordlist):
    # do something
Looking up a word in the list is O(n), so the more words in the list, the slower the lookup. To speed it up, you can sort the list alphabetically by word and then use binary search, which makes each lookup O(log n). The most efficient way, though, is to use hashing with a set:
'the' in set(word for word, _ in wordlist)
A lookup in a set is O(1) on average, independent of how many words are in the set. It also guarantees that only one instance of each word is in the structure, whereas a list can hold as many copies of "the" as you append. The set should be constructed once and reused, since building it is itself O(n); add new words with the .add method (adding a word is O(1) on average too).
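A short sketch of both options, using the tuples from the question as sample data. The names known_words and contains_sorted are made up for illustration; bisect is the standard-library binary search module.

```python
import bisect

wordlist = [("the", 1), ("check", 1), ("brown", 2), ("gary", 5)]

# Hashing: build the set once, then each lookup is O(1) on average.
known_words = {word for word, _ in wordlist}
print("the" in known_words)  # True
print("fox" in known_words)  # False

# Binary search: sort the words once, then each lookup is O(log n).
sorted_words = sorted(word for word, _ in wordlist)

def contains_sorted(sorted_words, target):
    """Binary-search a sorted list of words for target."""
    index = bisect.bisect_left(sorted_words, target)
    return index < len(sorted_words) and sorted_words[index] == target

print(contains_sorted(sorted_words, "gary"))  # True
print(contains_sorted(sorted_words, "fox"))   # False
```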
for tupl in wordlist:
    if 'the' in tupl:
        # ...
Use
words, scores = zip(*wordlist)
to split the wordlist into a tuple of words and a tuple of scores, then just
print("the" in words)

A function to count words in a corpora using dictionary values using Python

I'm a Python Newbie trying to get a count of words that occur within a corpora (corpora) using a dictionary of specific words. The corpora is a string type that has been tokenized, normalized, lemmatized, and stemmed.
dict = {}
dict ['words'] = ('believe', 'tried', 'trust', 'experience')
counter=0
Result = []
for word in corpora:
    if word in dict.values():
        counter = i + 1
    else counter = 0
This code produces a syntax error on the dict.values() line. Any help is appreciated!
Don't do dict = {}. dict is a built-in type and you are shadowing it. That's not critical, but you won't be able to use the built-in if you need it later.
A dictionary is a key→value mapping, like a real dictionary (word → translation). What you did is say that the value ('believe', …), which is a tuple, corresponds to the key 'words' in your dictionary. Then you are using dict.values(), which gives you a sequence of all the values stored in the dictionary. In your case this sequence consists of exactly one item, and this item is a tuple. Your if condition will never be True: word is a string and dict.values() is a sequence consisting of a single tuple of strings.
I'm not really sure why you are using a dictionary. It seems that you've got a set of words that are important to you, and you are scanning your corpora and counting the number of occurrences of those words. The key word here is set. You don't need a dictionary, you need a set.
It is not clear what you are counting. What's that i you are adding to the counter? If you meant to increment counter by one, that should be counter = counter + 1 or simply counter += 1.
Why are you resetting counter?
counter = 0
I don't think you really want to reset the counter when you find an unknown word. It seems that unknown words shouldn't change your counter, so just don't alter it.
Notes. Try to avoid using upper case letters in variable names (Result = [] is bad). Also, as others mentioned, you are missing a colon after else.
So, now let's put it all together. The first thing to do is to make a set of words we are interested in:
words = {'believe', 'tried', 'trust', 'experience'}
Next you can iterate over the words in your corpora and see which of them are present in the set:
for word in corpora:
    if word in words:
        # do something
It is not clear what exactly the code should do, but if your goal is to know how many times all the words in the set are found in the corpora all together, then you'll just add one to counter inside that if.
If you want to know how many times each word of the set appears in the corpora, you'll have to maintain a separate counter for every word in the set (that's where a dictionary might be useful). This can be achieved easily with collections.Counter (which is a special dictionary). You'll have to filter your corpora to leave only the words you are interested in, and that's where filter will help you (itertools.ifilter in Python 2).
filtered_corpora = list(filter(lambda w: w in words, corpora))
This is your corpora with all the words not found in words removed. You can pass it to Counter right away.
This trick is also useful for the first case (i.e. when you need only the total count). Because the filtered corpora is materialized as a list here, you can just return its length (len(filtered_corpora)); a bare filter or ifilter iterator has no len().
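Putting the two cases together, a minimal sketch might look like this. It assumes corpora is a list of tokens; the sample tokens are made up for illustration.

```python
from collections import Counter

# Made-up sample data; `corpora` is assumed to be a list of tokens.
corpora = ['trust', 'me', 'i', 'tried', 'to', 'trust', 'experience']
words = {'believe', 'tried', 'trust', 'experience'}

# Keep only the tokens we are interested in.
filtered_corpora = [w for w in corpora if w in words]

total = len(filtered_corpora)         # how many hits overall
per_word = Counter(filtered_corpora)  # how many hits per word

print(total)                # 4
print(per_word['trust'])    # 2
print(per_word['believe'])  # 0 (a Counter returns 0 for missing keys)
```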
You have multiple issues. You did not define corpora in the example here. You are redefining dict, which is a built-in type. The else is not written correctly. dict.values() returns an iterable in which each item is a tuple; word will not be inside it if word is a string. It is also not clear what counter actually counts, and what is Result doing there?
Your code may be similar to this (pseudo)code:
d = {'words' : ('believe', 'tried', 'trust', 'experience')} #if that's really what you want
counter = {}
for word in corpora:
    for tup in d.values(): # each tup is a tuple
        if word in tup:
            x = counter[word] if word in counter else 0
            counter[word] = x+1
There is a shorter way to do it.
This task, of counting things, is so common that a specific class for doing so exists in the library: collections.Counter.
