A step word is formed by taking a given word, adding a letter, and anagramming the result. For example, starting with the word "APPLE", you can add an "A" and anagram to get "APPEAL".
Given a global dictionary of words, create a function step(word) that returns a list of all unique, valid step words appearing in the dictionary.
Dictionary: https://raw.githubusercontent.com/eneko/data-repository/master/data/words.txt
I built the word list from that link with:
>>> words = open('words.txt', encoding='ascii').read().upper().split()
This assignment should be completed without any other library function calls. There are several solutions, but some are better and faster than others. How can you speed up your solution?
The solution should look like this.
>>> step("APPLE")
['APPEAL', 'CAPPLE', 'PALPED', 'LAPPED', 'DAPPLE', 'ALEPPO', 'LAPPER', 'RAPPEL', 'LAPPET', 'PAPULE', 'UPLEAP']
Words that are anagrams of each other become identical once their letters are sorted.
Logic:
1) Create a dictionary for look-up, where each key is a sorted string and each value is the list of words that are anagrams of that key.
2) Add each letter of the alphabet (A..Z) to the input string, one at a time, so that the resulting string contains exactly one more letter than the input string and is in sorted form. Then look up that string in the dictionary created in the previous step.
3) The combination of all the values you get from step 2 is the expected output.
Run-time complexity (excluding the one-time cost of creating constants such as the look-up dictionary and the alphabet):
It will take O(N x log N) + O(26 x N) ~ O(N x log N)
26: number of letters in the alphabet
N: length of the input string
Sorting the input string once with the sorted function takes N log N time in the worst case.
Creating each new sorted string by inserting one letter into the sorted input string takes N time, so 26 x N time for all 26 letters.
Code:
# set of all valid and unique words from the dictionary
valid_words = set(open('words.txt', encoding='ascii').read().upper().split())
# look-up table: sorted letters of a word -> list of words with those letters
look_up = {}
for word in valid_words:
    try:
        look_up[''.join(sorted(word))].append(word)
    except KeyError:
        look_up[''.join(sorted(word))] = [word]
alphabet_array = []
alphabet_dict = {}
for i in range(65, 91):
    alphabet_dict[chr(i)] = i
    alphabet_array.append(chr(i))
def step(word):
    sorted_string = sorted(word)
    length_of_input_string = len(sorted_string)
    output_values = []
    for i in alphabet_array:
        # insert letter i into the already sorted input so the result stays sorted
        new_str = ''
        value_added = 0
        for j in range(0, length_of_input_string):
            if value_added == 0 and (alphabet_dict[sorted_string[j]] > alphabet_dict[i]):
                new_str += i
                value_added = 1
            new_str += sorted_string[j]
        if value_added == 0:
            new_str += i
        try:
            output_values += look_up[new_str]
        except KeyError:
            pass
    return output_values
if __name__ == '__main__':
    input_string = 'APPLE'
    print(step(input_string))
Since anagrams have the same letters, alphabetically sorting the letters of a word gives the same string for all words that are anagrams of each other.
For example:
LEAP -> alphabetically sorted -> AELP
PALE -> alphabetically sorted -> AELP
1) Iterate through all the words in your dictionary and build a lookup from the alphabetically sorted string (the key) to the list of words that share that key.
Given a list of words
["PALE","LEAP"]
you will get the anagram lookup as follows
{
"AELP"=>["PALE","LEAP"],
...
}
2) Next, take the input word and append each letter of the alphabet in turn to create a new candidate string. Sort each candidate and look it up in the anagram dictionary. Concatenate the lists returned into one list and return that list.
Let's say the input word is PEA. Generate all candidates
["PEAA","PEAB",...,"PEAL",...]
Alphabetically sort every candidate word
["AAEP","ABEP",...,"AELP",...]
Then look up each one and concatenate the lists returned
["LEAP","PALE"]
Let me know if you want the Python code here as well, but it should be easy to code this up. The speedup comes primarily from preprocessing the anagram lookup dictionary, so that each final lookup runs in near constant time, at the cost of additional space proportional to the number of words in the input list.
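For reference, the two steps described in this answer could be sketched in Python roughly as follows. This is only a minimal sketch: the names build_anagram_lookup and sample_words are made up for illustration, the lookup is passed in as a second argument rather than kept global, and a tiny in-memory word list stands in for words.txt.

```python
from collections import defaultdict
from string import ascii_uppercase


def build_anagram_lookup(words):
    """Map each sorted-letter key to the list of words sharing that key."""
    lookup = defaultdict(list)
    for word in set(words):  # set() drops duplicate entries in the word list
        lookup["".join(sorted(word))].append(word)
    return lookup


def step(word, lookup):
    """Return every word in lookup that is an anagram of word plus one letter."""
    results = []
    for letter in ascii_uppercase:
        key = "".join(sorted(word + letter))
        # .get() avoids creating empty entries in the defaultdict on a miss
        results.extend(lookup.get(key, []))
    return results


# Hypothetical sample data standing in for the real dictionary file.
sample_words = ["PALE", "LEAP", "PEA", "APE", "APPEAL", "DAPPLE"]
lookup = build_anagram_lookup(sample_words)
print(sorted(step("PEA", lookup)))    # ['LEAP', 'PALE']
print(sorted(step("APPLE", lookup)))  # ['APPEAL', 'DAPPLE']
```

Each of the 26 candidate keys is a different multiset of letters, so no word can be returned twice as long as the word list itself has no duplicates.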
Please see the implementation and explanation below.
The problem can be divided into two parts. First, build a word map using defaultdict that stores all words that are anagrams of each other in one list. For example, words that have the same letters, such as TEA and EAT, should have the same key.
Building the word map requires looping through all N words and sorting each of them.
The run time will be O(N * k log k), assuming the average word length is k.
Second, loop through each letter of the alphabet, add it to the given word, and check whether the sorted result is already a key in the map. If so, the words stored under that key are step words and are added to the results.
from collections import defaultdict
from string import ascii_uppercase as uppers
def make_wordmap(dictionary):  # first part - build up the lookup hash map
    maps = defaultdict(list)
    for word in dictionary:
        maps[tuple(sorted(word))].append(word)
    return maps
def step_words(word, dc):  # search the word from dict by using the maps
    word_map = make_wordmap(dc)
    results = []  # named differently from the function to avoid shadowing it
    for letter in uppers:
        key = tuple(sorted(word + letter))
        if key in word_map:  # membership test does not add empty entries to the defaultdict
            results.extend(word_map[key])
    return results
if __name__ == '__main__':
    dictionary = ['APPEAL', 'TEA', 'DAVY']
    word = 'APPLE'
    print(step_words(word, dictionary))
    print(step_words('DAY', dictionary))
Here is a simple piece of code that reads the file and builds a lookup table.
There is some code to handle special cases found in the file.
## letters which will be added to the word to look up anagrams
alphabet = 'abcdefghijklmnopqrstuvwxyz'
## read the file, lowercase the letters, and split it on whitespace
dictionary = open('words.txt', encoding='ascii').read().lower().split()
## build a lookup dict which holds all words for a given sorted set of letters
lookup = {}
for word in dictionary:
    key = ''.join(sorted(word))
    try:
        if word not in lookup[key]:  ## avoid duplicates from the file, such as upper/lowercase variants of the same word
            lookup[key].append(word)
    except KeyError:
        lookup[key] = [word]  ## create a new dictionary entry if the key does not exist
## step function to find anagrams with one more letter
def step(word):
    word = word.lower()  ## works with lowercase words only
    output = []
    for i in alphabet:  ## try to find anagrams with one extra letter added to the word
        try:
            output += lookup[''.join(sorted(word + i))]  ## add the words found under this key in the lookup dict
        except KeyError:
            pass
    return list(set(output))  ## be sure to return only unique answers in a list
## main
if __name__ == '__main__':
    print(step('aPPle'))
    print(step('OV'))  ## test for the dupes 'Ova' and 'ova' found in the file
    print(step('a'))  ## test for the one letter dupes 'a' and 'A' found in the file
    print(step('A'))  ## test for the one letter dupes 'a' and 'A' found in the file
The program below searches through the permutations of the input string and prints those that appear in the word list. This can be fine-tuned further, but it is a basic starting point for checking valid anagram values.
#Program: Anagram Finder
from itertools import permutations
#Store the words text file in an object for searching
with open('words.txt', 'r') as f:
    dictionary = f.read()
# a set makes each membership test fast
dictionary = {x.lower() for x in dictionary.split('\n')}
#Get permutations of input word
def get_perms(value, length):
    # lengths 1..length, so the full-length permutations (the true anagrams) are included
    for l in range(1, length + 1):
        for perm in permutations(value, l):
            yield ''.join(perm)
# Search the dictionary for possible Anagrams and list the valid ones
def fncSearchForAnagram():
    y = ["Apple"]  # Anagram check Sample input.
    for i in y:
        perms = get_perms(i, len(i))
        for item in perms:
            if item.lower() in dictionary:  # converting search string to lower since the word file has all lower case chars
                # This output will be your Anagram combination word listed in the Words.txt File
                print(item)
fncSearchForAnagram()
To check whether one word is an anagram of another, we sort both words first and check whether they are equal after sorting, since an anagram is just a rearrangement of the same characters.
To check whether we can add some characters to a word to create another word, we check whether all the characters of the first word are present in the other word.
list_of_words = ['handsome', 'handy', 'notright', 'and']
word_to_check = 'adn'
anagram_can_add_chars= []
anagram_cannot_add_chars= []
# loop through the list of words
for word in list_of_words:
    # we sort the string so that we know that we have all the chars
    if sorted(word_to_check) == sorted(word):
        anagram_cannot_add_chars.append(word)
    # we convert the strings to set
    # this will remove duplicate chars on each
    # but we can always add them
    if set(word_to_check).issubset(set(word)):
        anagram_can_add_chars.append(word)
print(anagram_can_add_chars)
print('---')
print(anagram_cannot_add_chars)
result :
['handsome', 'handy', 'and']
---
['and']
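Because sets drop repeated letters, the subset test above also accepts words that need several extra letters and cannot tell how many copies of a letter are required. For the step-word question specifically (exactly one added letter), a stricter check could be sketched with collections.Counter. The function name is_step_word and the three extra sample words are assumptions made for this illustration.

```python
from collections import Counter


def is_step_word(base, candidate):
    """True if candidate uses every letter of base plus exactly one extra letter."""
    if len(candidate) != len(base) + 1:
        return False
    # Counter subtraction keeps only positive counts, so an empty result
    # means candidate contains at least as many of each letter as base.
    return not (Counter(base) - Counter(candidate))


list_of_words = ['handsome', 'handy', 'notright', 'and', 'hand', 'dawn', 'nada']
print([w for w in list_of_words if is_step_word('adn', w)])  # ['hand', 'dawn', 'nada']
```

With this check 'and' is rejected (no letter added) and so is 'handy' (two letters added), while a pair such as 'aa' and 'abc' is correctly rejected even though the set of letters in 'aa' is a subset of those in 'abc'.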
Can anyone tell me what's wrong with my code? It only returns the second element of the list.
I'm working on LeetCode problem 720. Longest Word in Dictionary.
Given an array of strings words representing an English Dictionary, return the longest word in words that can be built one character at a time by other words in words.
If there is more than one possible answer, return the longest word with the smallest lexicographical order. If there is no answer, return the empty string.
Example 1:
Input: words = ["w","wo","wor","worl","world"]
Output: "world"
Explanation: The word "world" can be built one character at a time by "w", "wo", "wor", and "worl".
Example 2:
Input: words = ["a","banana","app","appl","ap","apply","apple"]
Output: "apple"
Explanation: Both "apply" and "apple" can be built from other words in the dictionary. However, "apple" is lexicographically smaller than "apply".
This is my code so far:
class Solution:
    def longestWord(self, words: List[str]) -> str:
        words.sort(key=len)
        def find(m,n):
            if n==0 and len(words[n])!=1:
                return False
            if len(words[n])==1:
                return 1
            if words[m]==words[n]+words[m][-1]:
                result=find(n,n-1)
            else:
                n=n-1
                result=find(m,n)
        for i in range(len(words)-1,0,-1):
            j=i-1
            res=find(i,j)
            if res==1:
                return words[i]
        return ''
The "core" of your find function looks like this:
    if words[m]==words[n]+words[m][-1]:
        result=find(n,n-1)
    else:
        n=n-1
        result=find(m,n)
Replace that with this:
    if words[m]==words[n]+words[m][-1]:
        return find(n,n-1)
    else:
        return False
That gets you closer, but there's still a problem: that returns the FIRST match, but not necessarily the smallest lexicographically. For that, you'll need to change your sort key.
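As one possible way to handle both the prefix chain and the tie-break, here is a sketch that departs from the recursive find approach in the question and uses a set of already buildable words instead. The function name longest_word and the sort key (length first, then alphabetical) are choices made for this illustration, not part of the original answer.

```python
from typing import List


def longest_word(words: List[str]) -> str:
    """Longest word that can be built one character at a time from other
    words in the list; ties go to the lexicographically smallest word."""
    # Sorting by (length, text) means every prefix is visited before the
    # word it extends, and equal-length words are visited alphabetically.
    buildable = {""}
    best = ""
    for word in sorted(words, key=lambda w: (len(w), w)):
        if word[:-1] in buildable:
            buildable.add(word)
            if len(word) > len(best):  # strict '>' keeps the earlier (smaller) tie
                best = word
    return best


print(longest_word(["w", "wo", "wor", "worl", "world"]))  # world
print(longest_word(["a", "banana", "app", "appl", "ap", "apply", "apple"]))  # apple
```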
This is my code, but it doesn't work. It should read text from the console, split it into words and distribute them into 3 lists and use separators between them.
words = list(map(str, input().split(" ")))
lowercase_words = []
uppercase_words = []
mixedcase_words = []
def split_symbols(list):
    from operator import methodcaller
    list = words
    map(methodcaller(str,"split"," ",",",":",";",".","!","( )","","'","\\","/","[ ]","space"))
    return list
for word in words:
    if words[word] == word.lower():
        words[word] = lowercase_words
    elif words[word] == word.upper():
        words[word] = uppercase_words
    else:
        words[word] = mixedcase_words
print(f"Lower case: {split_symbols(lowercase_words)}")
print(f"Upper case: {split_symbols(uppercase_words)}")
print(f"Mixed case: {split_symbols(mixedcase_words)}")
There are several issues in your code.
1) words is a list and word is string. And you are trying to access the list with the index as string which will throw an error. You must use integer for indexing a list. In this case, you don't even need indexes.
2) To check lower or upper case you can just do, word == word.lower() or word == word.upper(). Or another approach would be to use islower() or isupper() function which return a boolean.
3) You are trying to assign an empty list to that element of list. What you want is to append the word to that particular list. You want something like lowercase_words.append(word). Same for uppercase and mixedcase
So, to fix these issues you can write the code like this -
for word in words:
    if word == word.lower():  # similar to word.islower()
        lowercase_words.append(word)
    elif word == word.upper():  # similar to word.isupper()
        uppercase_words.append(word)
    else:
        mixedcase_words.append(word)
My advice would be to refrain from naming variables things like list, since that shadows the built-in. Also, in split_symbols() you are overwriting the parameter list with words. I think you meant it the other way around.
Now I am not sure about the "use separators between them" part of the question. But the line map(methodcaller(str,"split"," ",",",":",";",".","!","( )","","'","\\","/","[ ]","space")) is definitely wrong. map() takes a function and an iterable. In your code the iterable part is absent, and I think this is where the input param list fits in. So, it may be something like -
map(methodcaller("split"," "), list)
But then again, I am not sure what you are trying to achieve with that many separators.
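If the intent was to split the input on several separator characters at once, one option is re.split with a character class. The sketch below is a guess at that intent: the separator set, the function name classify_words, and the choice to join each group with ", " are all assumptions, and a fixed sample string stands in for input().

```python
import re

# Assumed separator set: whitespace plus some common punctuation.
SEPARATORS = r"[\s,:;.!()'\\/\[\]]+"


def classify_words(text):
    """Split text on the assumed separators and group the words by letter case."""
    lowercase_words, uppercase_words, mixedcase_words = [], [], []
    for word in re.split(SEPARATORS, text):
        if not word:  # re.split yields '' when text starts or ends with a separator
            continue
        if word == word.lower():
            lowercase_words.append(word)
        elif word == word.upper():
            uppercase_words.append(word)
        else:
            mixedcase_words.append(word)
    return lowercase_words, uppercase_words, mixedcase_words


lower, upper, mixed = classify_words("Hello, WORLD! this is (a) Test; OK.")
print(f"Lower case: {', '.join(lower)}")  # Lower case: this, is, a
print(f"Upper case: {', '.join(upper)}")  # Upper case: WORLD, OK
print(f"Mixed case: {', '.join(mixed)}")  # Mixed case: Hello, Test
```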
I am struggling with the following. Is my code correct, and how can I test whether it works?
Task: Take a string as a single input argument. You may assume the string consists of distinct lower case letters in alphabetical order.
Return a list of strings where each string represents a permutation of the input string. The list of permutations must be in lexicographic order. (This is basically the ordering that dictionaries use: order by the first letter alphabetically; if there is a tie, use the second letter, and so on.)
If the string contains a single character, return a list containing that string.
Loop through all character positions of the string containing the characters to be permuted. For each character:
Form a simpler string by removing the character
Generate all permutations of the simpler string recursively
Add the removed character to the front of each permutation of the simpler word, and add the resulting permutation to a list
Return all these newly constructed permutations
[My code]
def perm_gen_lex(in_string):
    if (len(in_string) <= 1):
        return(in_string)
    # List of all new combinations
    empty_list = []
    # All permutations
    final_perm = perm_gen_lex(in_string[1:])
    # Character to be removed
    remove_char = in_string(0)
    # Remaining part of string
    remaining_string = in_string[1:]
    for perm in final_perm[1:]:
        for i in range(len(in_string) + 1):
            return empty_list.append(perm[:i] + remove_char + perm[i:])
    return empty_list
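Some variation on this will get you moving. Note that product also generates strings that repeat a letter (such as "aaa"), so those would need to be filtered out to leave only the permutations:
from itertools import product
def combinations(string):
    return [''.join(i) for i in product(string, repeat = len(string))]
print(combinations("abc"))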
See https://docs.python.org/3/library/itertools.html#itertools.product
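For comparison, the recursive steps listed in the task can be followed quite literally. The sketch below keeps the function name perm_gen_lex from the question and relies on the stated assumption that the input consists of distinct letters in alphabetical order, which is what makes the output come out in lexicographic order without an extra sort.

```python
def perm_gen_lex(in_string):
    """Return all permutations of in_string in lexicographic order.

    Assumes in_string holds distinct letters in alphabetical order.
    """
    if len(in_string) <= 1:
        return [in_string]  # base case: a list containing the string itself
    permutations = []
    for i, char in enumerate(in_string):
        simpler = in_string[:i] + in_string[i + 1:]  # string with char removed
        for perm in perm_gen_lex(simpler):
            permutations.append(char + perm)  # removed char goes to the front
    return permutations


print(perm_gen_lex("abc"))  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```

One way to test it is to compare the result against sorted(''.join(p) for p in itertools.permutations(s)) for a few short inputs.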
I need to call various functions that I have already created in order to answer the question below. I am really unsure how to program this. The question is:
find_two_word_anagrams_in_word_list(str, str_list), where the input parameters should be a string and a list of strings. The output should be a list of all strings made up of two words separated by a space, such that both of these words are in str_list and the combination of the two words is an anagram of str.
The output I expect is:
wordlist = ('and','band','nor','born')
find_two_word_anagrams_in_wordlist( "brandon", wordlist )
['and born', 'band nor']
How to achieve this is:
Initialise a list two_word_anagrams to the empty list, [].
Call find_partial_anagrams_in_word_list to get a list of all the partial anagrams of str that can be found in the word list.
Then, do a loop that runs over all these partial anagrams. For each partial anagram part_anag do the following:
Remove the letters of part_anag from the input string, str, to get a string, rem, that is the remaining letters after taking away the partial anagram;
Call find_anagrams_in_word_list on rem (and the input word list) to get a list of anagrams of the remaining letters;
For each anagram rem_anag of the remaining letters, form the string part_anag + " " + rem_anag and add it to the list two_word_anagrams.
At this point the list two_word_anagrams should have all the two word anagrams in it, so return that list from your function.
Code I have already created is:
def find_partial_anagrams_in_word_list(str1,str_list):
    partial_anagrams = []
    for word in str_list:
        if (partial_anagram(word, str1)):
            partial_anagrams.append(word)
    print(partial_anagrams)
def remove_letters(str1,str2):
    str2_list = list(str2)
    for char in str2:
        if char in str1:
            str2_list.remove(char)
    return "".join(str2_list)
def find_anagrams_in_word_list(str1,str_list):
    anagrams = []
    for word in str_list:
        if (anagram(str1,word)):
            anagrams.append(word)
            print(word)
Any step by step help or input would be appreciated.
You have two choices: you can look for anagrams among combinations of two different words, or among all two-word combinations, including a word paired with itself.
The choice is yours; here I have implemented both.
from itertools import combinations, combinations_with_replacement
def ana2(s,wl):
    rl = []
    for w1, w2 in combinations(wl,2):
        w = w1+w2
        if len(w) != len(s): continue
        if sorted(w) == sorted(s): rl.append((w1, w2))
    return rl
def ana2wr(s,wl):
    rl = []
    for w1, w2 in combinations_with_replacement(wl,2):
        w = w1+w2
        if len(w) != len(s): continue
        if sorted(w) == sorted(s): rl.append((w1, w2))
    return rl
and here is some testing
wl = ('and','band','nor','born')
s1 = 'brandon'
s2 = 'naddan'
print(ana2(s1,wl))
print(ana2(s2,wl))
print(ana2wr(s1,wl))
print(ana2wr(s2,wl))
that produces the following output
[('and', 'born'), ('band', 'nor')]
[]
[('and', 'born'), ('band', 'nor')]
[('and', 'and')]
You wrote
I need to call various different functions that i have already created
in order to achieve the question below
def find_2_word_anagram(word, wordlist):
    # Initialise a list two word anagrams to the empty list, [].
    analist = []
    # Call find partial anagrams in word list to get a list of all the
    # partial anagrams of str that can be found in the word list.
    candidates = find_partial_anagrams_in_word_list(word,wordlist)
    # Then, do a loop that runs over all these partial anagrams
    for candidate in candidates:
        # Remove the letters of part anag from the input string, str, to
        # get a string, rem that is the remaining letters after taking
        # away the partial anagram
        remaining = remove_letters(candidate,word)
        # Call find anagrams in word list on rem (and the input word list)
        # to get a list of anagrams of the remaining letters;
        matches = find_anagrams_in_word_list(remaining, wordlist)
        for other in matches:
            # str.join takes a single iterable, hence the tuple
            analist.append(" ".join((candidate,other)))
    return analist
Note that
you still have to write the inner functions following your specifications
when you write a function that returns, e.g., a list of words, you MUST RETURN A LIST OF WORDS; in particular, it isn't enough to print the matches from your function
to find an anagram, the idiom sorted(w1)==sorted(w2) is all you need, but the story with finding a partial anagram is more complex...
I took the liberty of inlining one of your functions, the one that computes the remainder.
When you strip the comments, which are your specs verbatim, you have very few lines of code.
Post Scriptum
Have a thorough look at Daniel's comment to my previous answer, there's a lot in it...
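To make the outline concrete, here is one way the helper functions and the main function could fit together so that each helper returns its list instead of printing it. This is a sketch under assumptions: partial_anagram is not shown in the question, so the Counter-based version here is a guess at its meaning, and the final ordering filter (part_anag <= rem_anag) is added only so that each pair appears once, matching the expected output in the question.

```python
from collections import Counter


def partial_anagram(word, letters):
    """Assumed meaning: word can be spelled using only letters from `letters`."""
    return not (Counter(word) - Counter(letters))


def find_partial_anagrams_in_word_list(str1, str_list):
    return [word for word in str_list if partial_anagram(word, str1)]


def remove_letters(str1, str2):
    """Return str2 with one occurrence removed for each letter of str1."""
    remaining = Counter(str2) - Counter(str1)
    return "".join(sorted(remaining.elements()))


def find_anagrams_in_word_list(str1, str_list):
    return [word for word in str_list if sorted(word) == sorted(str1)]


def find_two_word_anagrams_in_word_list(str1, str_list):
    two_word_anagrams = []
    for part_anag in find_partial_anagrams_in_word_list(str1, str_list):
        rem = remove_letters(part_anag, str1)
        for rem_anag in find_anagrams_in_word_list(rem, str_list):
            # Assumption: keep each pair once, in alphabetical order, so that
            # 'and born' is reported but the mirrored 'born and' is not.
            if part_anag <= rem_anag:
                two_word_anagrams.append(part_anag + " " + rem_anag)
    return two_word_anagrams


wordlist = ('and', 'band', 'nor', 'born')
print(find_two_word_anagrams_in_word_list("brandon", wordlist))  # ['and born', 'band nor']
```

Without the ordering filter, following the steps literally would also report the mirrored pairs 'nor band' and 'born and'.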
I am writing code in which a word list is inputted. If any of the words in the list are of exactly 4 characters then those words will be returned in this format:
['word','four']
I am making a loop to check the whole list, but obviously return stops the function, so only the first 4-letter word is returned. As per the instructions, 'return' must be used and not print, and the output must be in the list format shown above. Any help will be appreciated. Thank you.
def letter(list):
    word = []
    for word in list:
        if len(word)==4:
            return word
Once return is executed, the function exits, so it's better to populate an entire list of four-letter words and then return it
def letter(word_list):
    words = []
    for word in word_list:
        if len(word)==4:
            words.append(word)
    return words
One way to do it is to use the built-in filter function, which takes a boolean function and an iterable as inputs: filter(f, lst) yields all items in lst for which f(item) returns true. Keep in mind that filter() returns a filter object, so you need to wrap it as list(filter(...)) to get a list. For this case, the code would be:
list(filter(lambda word: len(word) == 4, words))
Another way to do it would be to use a list comprehension:
[word for word in words if len(word) == 4]
Using a list comprehension
def letter(list):
    return [word for word in list if len(word)==4]