I'm having a funny issue with map_async that I can't figure out.
I'm using Python's multiprocessing library with process pools. I'm trying to pass a list of strings to compare against, and a list of strings to be compared, to a function using map_async().
Right now I have:
from multiprocessing import Pool, cpu_count
import functools
def main():
    fdict = '/a/file/on/my/disk'
    passin = '/another/file/on/my/disk'
    num_proc = cpu_count()
    pool = Pool(num_proc)
    dictionary = readFiletoList(fdict)
    dictionary = sortByLength(dictionary)
    words = readFiletoList(passin, 'WINDOWS-1252')
    words = sortByLength(words)
    result = pool.map_async(functools.partial(mpmine, dictionary=dictionary), [words], 1000)
def readFiletoList(fname, fencode='utf-8'):
    linelist = list()
    with open(fname, encoding=fencode) as f:
        for line in f:
            linelist.append(line.strip())
    return linelist
def sortByLength(words):
    '''Takes an ordered iterable and sorts it based on word length'''
    return sorted(words, key=len)
def mpmine(word, dictionary):
    '''Takes a tuple of length 2 with its arguments.
    At least dictionary needs to be sorted by word length. If not, whacky results ensue.
    '''
    results = dict()
    for pw in word:
        pwlen = len(pw)
        pwres = list()
        for word in dictionary:
            if len(word) > pwlen:
                break
            if word in pw:
                pwres.append(word)
        if len(pwres) > 0:
            results[pw] = pwres
    return results
if __name__ == '__main__':
    main()
Both dictionary and words are lists of strings. This results in only one process being used instead of the number I have set. If I take the square brackets off the variable 'words', it seems to iterate through each string's characters in turn and makes a mess.
What I would like to happen is for it to take about 1000 strings out of words, pass them to a worker process, and then collect the results, because this is a ridiculously parallelisable task.
EDIT: Added more code to make it clearer what's going on.
OK, I actually figured this one out myself. I'm only going to post the answer here for anyone else who might come along and have the same issue. The reason I was having problems was that map_async takes one item from the list (in this case a string) and feeds it into the function, which was expecting a list of strings. So it was then treating each string as a list of chars, basically. The corrected code for mpmine is:
def mpmine(word, dictionary):
    '''Takes a tuple of length 2 with its arguments.
    At least dictionary needs to be sorted by word length. If not, whacky results ensue.
    '''
    results = dict()
    pw = word
    pwlen = len(pw)
    pwres = list()
    for word in dictionary:
        if len(word) > pwlen:
            break
        if word in pw:
            pwres.append(word)
    if len(pwres) > 0:
        results[pw] = pwres
    return results
I hope this helps anyone else facing a similar issue.
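For reference, here is a minimal sketch of how the pieces might fit together once the worker takes a single string. The names `find_substrings` and `run_pool`, the tiny sample data and the pool size are illustrative assumptions, not part of the original code. The third argument to map_async is the chunksize: it only controls how many items are batched into one task, and the worker is still called once per item.

```python
import functools
from multiprocessing import Pool


def find_substrings(pw, dictionary):
    """Return (pw, matches), where matches are dictionary words contained in pw.

    Assumes dictionary is sorted by word length, shortest first.
    """
    matches = []
    for entry in dictionary:
        if len(entry) > len(pw):
            break  # every later entry is at least as long, so none can fit
        if entry in pw:
            matches.append(entry)
    return pw, matches


def run_pool(words, dictionary, processes=2, chunksize=1000):
    """Map the worker over individual strings; chunksize only batches them."""
    worker = functools.partial(find_substrings, dictionary=dictionary)
    with Pool(processes) as pool:
        async_result = pool.map_async(worker, words, chunksize)
        pairs = async_result.get()  # blocks until every chunk is done
    # Keep only the strings that matched something, as the original does.
    return {pw: found for pw, found in pairs if found}


if __name__ == '__main__':
    # Hypothetical sample data standing in for the two files.
    dictionary = sorted(['pass', 'word', 'cat'], key=len)
    words = ['password1', 'concat', 'xyz']
    print(run_pool(words, dictionary))
```

Passing `words` itself (not `[words]`) is what lets the pool spread the strings over several processes.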
Related
This is my code, but it doesn't work. It should read text from the console, split it into words and distribute them into 3 lists and use separators between them.
words = list(map(str, input().split(" ")))
lowercase_words = []
uppercase_words = []
mixedcase_words = []
def split_symbols(list):
    from operator import methodcaller
    list = words
    map(methodcaller(str,"split"," ",",",":",";",".","!","( )","","'","\\","/","[ ]","space"))
    return list
for word in words:
    if words[word] == word.lower():
        words[word] = lowercase_words
    elif words[word] == word.upper():
        words[word] = uppercase_words
    else:
        words[word] = mixedcase_words
print(f"Lower case: {split_symbols(lowercase_words)}")
print(f"Upper case: {split_symbols(uppercase_words)}")
print(f"Mixed case: {split_symbols(mixedcase_words)}")
There are several issues in your code.
1) words is a list and word is a string, and you are trying to index the list with a string, which raises a TypeError. List indices must be integers. In this case, you don't even need indexes.
2) To check for lower or upper case you can just compare word == word.lower() or word == word.upper(). Another approach is to use the islower() or isupper() methods, which return a boolean (note that they return False for a word with no cased letters, such as "123", whereas the comparison returns True).
3) You are assigning an empty list to that element of the list. What you want is to append the word to that particular list, e.g. lowercase_words.append(word). The same goes for uppercase and mixed case.
So, to fix these issues you can write the code like this -
for word in words:
    if word == word.lower(): # close to word.islower(), which also requires a cased letter
        lowercase_words.append(word)
    elif word == word.upper(): # close to word.isupper(), which also requires a cased letter
        uppercase_words.append(word)
    else:
        mixedcase_words.append(word)
My advice would be to refrain from naming a variable list, since that shadows the built-in. Also, in split_symbols() you rebind list to words (list = words). I think you meant it the other way around.
Now, I am not sure about the "use separators between them" part of the question. But the line map(methodcaller(str,"split"," ",",",":",";",".","!","( )","","'","\\","/","[ ]","space")) is definitely wrong. map() takes a function and an iterable. In your code the iterable is absent, and I think this is where the input parameter list fits in. So, it may be something like -
map(methodcaller("split"," "), list)
But then again, I am not sure what you are trying to achieve with that many separators.
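If the intent of the separator list is to split the input on any of several punctuation characters, one possible reading is sketched below with `re.split`. The separator set and the function name `classify_words` are assumptions, since the question does not pin down the exact rules.

```python
import re

# Assumed separator set: whitespace plus a handful of punctuation characters.
SEPARATORS = r"[\s,:;.!()\[\]'\\/]+"


def classify_words(text):
    """Split text on the separators and sort the words into three lists."""
    lowercase, uppercase, mixedcase = [], [], []
    for word in re.split(SEPARATORS, text):
        if not word:
            continue  # re.split yields '' at the edges of the string
        if word.islower():
            lowercase.append(word)
        elif word.isupper():
            uppercase.append(word)
        else:
            mixedcase.append(word)
    return lowercase, uppercase, mixedcase


lower, upper, mixed = classify_words("Hello, world! THIS is/a TEST")
print(f"Lower case: {', '.join(lower)}")
print(f"Upper case: {', '.join(upper)}")
print(f"Mixed case: {', '.join(mixed)}")
```

With this approach, words that contain no letters at all (for example "123") end up in the mixed-case list, which may or may not be what the exercise wants.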
A step word is formed by taking a given word, adding a letter, and anagramming the result. For example, starting with the word "APPLE", you can add an "A" and anagram to get "APPEAL".
Given a global dictionary of words, create a function step(word) that returns a list of all unique, valid step words appearing in the dictionary.
Dictionary: https://raw.githubusercontent.com/eneko/data-repository/master/data/words.txt
I built the word list from that link with:
>>> words = open('words.txt', encoding='ascii').read().upper().split()
This assignment should be completed without any other library function calls. There are several solutions, but some are better and faster than others. How can you speed up your solution?
The solution should look like this.
>>> step("APPLE")
['APPEAL', 'CAPPLE', 'PALPED', 'LAPPED', 'DAPPLE', 'ALEPPO', 'LAPPER', 'RAPPEL', 'LAPPET', 'PAPULE', 'UPLEAP']
As we know, anagrams are identical once their letters are sorted.
Logic:
1) Create a dictionary for look-up, where the key is a sorted string and the value is the list of anagrams of that key.
2) Add each letter of the alphabet (A..Z) to the input string, one at a time, so that the resulting string contains exactly one more letter than the input and is in sorted form. Then find its anagrams in the dictionary created in the previous step.
3) The combination of all the values you got from step 2 is the expected output.
On the run-time complexity of a single call (excluding the time to create constants such as the look-up dictionary and the alphabet):
It will take O(N log N) + O(26 x N) ~ O(N log N)
26: number of letters in the alphabet
N: length of the input string
Sorting the input string once with sorted() takes N log N time in the worst case.
Creating the new sorted strings by inserting one letter into the sorted input string takes 26 x N time.
Code:
# array of all valid and unique words from the dictionary
valid_words = set(open('words.txt', encoding='ascii').read().upper().split())
look_up = {}
for word in valid_words:
    try:
        look_up[''.join(sorted(word))].append(word)
    except KeyError:
        look_up[''.join(sorted(word))] = [word]
alphabet_array = []
alphabet_dict = {}
for i in range (65, 91):
    alphabet_dict[chr(i)]=i
    alphabet_array.append(chr(i))
def step(word):
    sorted_string = sorted(word)
    length_of_input_string = len(sorted_string)
    output_values = []
    for i in alphabet_array:
        new_str = ''
        value_added = 0
        for j in range (0, length_of_input_string):
            if value_added==0 and (alphabet_dict[sorted_string[j]] > alphabet_dict[i]):
                new_str += i
                value_added = 1
            new_str += sorted_string[j]
        if value_added==0:
            new_str += i
        try:
            output_values+=look_up[new_str]
        except KeyError:
            pass
    return output_values
if __name__ == '__main__':
    input_string = 'APPLE'
    print (step(input_string))
Since anagrams have the same letters, if you alphabetically sort the letters in a word, you get the same string for words that are anagrams of each other.
For example:
LEAP -> alphabetically sorted -> AELP
PALE -> alphabetically sorted -> AELP
1) Iterate through all the words in your dictionary and create a lookup from the alphabetically sorted string (the key) to the list of words that share that key.
Given a list of words
["PALE","LEAP"]
you will get the anagram lookup as follows
{
"AELP": ["PALE","LEAP"],
...
}
2) Next, take the input word and append each letter of the alphabet in turn to create a new string. Sort this string and look it up in the anagram dictionary. Concatenate the lists returned into one list and return that list.
Let's say the input word is PEA. Generate all candidates
["PEAA","PEAB",...,"PEAL",...]
Alphabetically sort every candidate word
["AAEP","ABEP",...,"AELP",...]
Then look them up and concatenate the lists returned
["PALE","LEAP"]
Let me know if you want the Python code here as well, but it should be easy to code this up. The speedup comes primarily from preprocessing the anagram lookup dictionary, after which each lookup runs in near-constant time, at the cost of additional space on the order of the number of words in the input list.
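A minimal sketch of the two steps described above, using the PEA example. The function names `build_lookup` and `step`, and the four-word sample list, are illustrative assumptions; a real run would build the lookup from the full words file.

```python
def build_lookup(words):
    """Map each sorted-letter key to the list of words sharing those letters."""
    lookup = {}
    for word in words:
        key = ''.join(sorted(word))
        lookup.setdefault(key, []).append(word)
    return lookup


def step(word, lookup, alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """Return every word in lookup made from word's letters plus one more."""
    found = []
    for letter in alphabet:
        key = ''.join(sorted(word + letter))
        found.extend(lookup.get(key, []))
    return found


# Hypothetical sample list; only PALE and LEAP are PEA plus one letter.
lookup = build_lookup(['PALE', 'LEAP', 'PEA', 'APPLE'])
print(step('PEA', lookup))  # ['PALE', 'LEAP']
```

Each added letter produces a different key, so no word can be reported twice as long as the word list itself has no duplicates.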
Please see the implementation and explanation below.
We can divide this problem into two parts. First, build a word map using defaultdict that stores all words that are anagrams of each other in one list. For example, words that have the same letters, such as TEA and EAT, should have the same key.
Building the word map requires looping through the N dictionary words and sorting the letters of each one.
The run time will be O(N * k log k), assuming the average word length is k.
Second, loop through each letter, add it to the given word, and check whether the sorted result is already a key in the map. If so, we have found step words and add them to the results.
from collections import defaultdict
from string import ascii_uppercase as uppers
def make_wordmap(dictionary): # first part - build up the lookup hash map
    maps = defaultdict(list)
    for word in dictionary:
        maps[tuple(sorted(word))].append(word)
    return maps
def step_words(word, dc): # search the word from dict by using the maps
    word_map = make_wordmap(dc)
    found = []
    for letter in uppers:
        key = tuple(sorted(word + letter))
        if key in word_map: # membership test avoids inserting empty entries
            found.extend(word_map[key])
    return found
if __name__ == '__main__':
    dictionary = ['APPEAL', 'TEA', 'DAVY']
    word = 'APPLE'
    print(step_words(word, dictionary))
    print(step_words('DAY', dictionary))
Here is a simple piece of code which reads the file and builds a lookup table.
There is some code to handle special cases found in the file.
## letters which will be added to the word to look up anagrams
alphabet = 'abcdefghijklmnopqrstuvwxyz'
## read the file, lowercase the letters, and split it in case you have some blanks
dictionary = open('words.txt', encoding='ascii').read().lower().split()
## build a lookup dict which holds all words for a given sorted set of letters
lookup = {}
for word in dictionary:
    try:
        if word not in lookup[''.join(sorted(word))]: ## avoid duplicates, e.g. the same word in upper and lower case in the file
            lookup[''.join(sorted(word))].append(word)
    except KeyError:
        lookup[''.join(sorted(word))] = [word] ## create a new dictionary entry if the key does not exist
## step function to find anagrams with one more letter
def step(word):
    word = word.lower() ## works with lowercase words only
    output = []
    for i in alphabet: ## try to find anagrams with one extra letter added to the word
        try:
            output += lookup[''.join(sorted(word+i))] ## add the words found if the key is in the lookup dict
        except KeyError:
            pass
    return list(set(output)) ## be sure to return only unique answers in a list
## main
if __name__ == '__main__':
    print (step('aPPle'))
    print (step('OV')) ## test for the dupes 'Ova' and 'ova' found in the file
    print (step('a')) ## test for the one letter dupes 'a' and 'A' found in the file
    print (step('A')) ## test for the one letter dupes 'a' and 'A' found in the file
The program below searches the permutations of the input string and prints those that are valid words. This can be further fine-tuned, but it is a basic starting point for checking valid anagram values.
#Program: Anagram Finder
from itertools import permutations
#Store the words text file in to an object for searching
with open('words.txt', 'r') as f:
    dictionary = f.read()
# a set makes each membership test fast
dictionary = set(x.lower() for x in dictionary.split('\n'))
#Get permutations of input word
def get_perms(value, length):
    # lengths 1..length, so that full-length anagrams are included
    for l in range(1, length + 1):
        for perm in permutations(value, l):
            yield ''.join(perm)
# Search the dictionary for possible Anagrams and list the valid ones
def fncSearchForAnagram():
    y = ["Apple"] # Anagram check Sample input.
    for i in y:
        perms = get_perms(i, len(i))
        for item in perms:
            if item.lower() in dictionary: # converting search string to lower since the word file has all lower case chars
                # This output will be your Anagram combination word listed in the Words.txt File
                print(item)
fncSearchForAnagram()
To easily check whether a word is an anagram of another, we sort both words first and check whether they are equal after sorting, since an anagram is just a rearrangement of characters.
To check whether we can add some characters to a word to create another word, we check whether the characters of the first word are all present in the other word.
list_of_words = ['handsome', 'handy', 'notright', 'and']
word_to_check = 'adn'
anagram_can_add_chars= []
anagram_cannot_add_chars= []
# loop through the list of words
for word in list_of_words:
    # we sort the string so that we know that we have all the chars
    if sorted(word_to_check) == sorted(word):
        anagram_cannot_add_chars.append(word)
    # we convert the strings to set
    # this will remove duplicate chars on each
    # but we can always add them
    if set(sorted(word_to_check)).issubset(set(sorted(word))):
        anagram_can_add_chars.append(word)
print(anagram_can_add_chars)
print('---')
print(anagram_cannot_add_chars)
result :
['handsome', 'handy', 'and']
---
['and']
My function first calculates all possible anagrams of the given word. Then, for each of these anagrams, it checks whether it is a valid word by checking whether it equals any of the words in the wordlist.txt file. The file is a giant file with one word per line, so I decided to just read each line and check if each anagram is there. However, it comes up blank. Here is my code:
def perm1(lst):
    if len(lst) == 0:
        return []
    elif len(lst) == 1:
        return [lst]
    else:
        l = []
        for i in range(len(lst)):
            x = lst[i]
            xs = lst[:i] + lst[i+1:]
            for p in perm1(xs):
                l.append([x] + p)
        return l
def jumbo_solve(string):
    '''jumbo_solve(string) -> list
    returns list of valid words that are anagrams of string'''
    passer = list(string)
    allAnagrams = []
    validWords = []
    for x in perm1(passer):
        allAnagrams.append((''.join(x)))
    for x in allAnagrams:
        if x in open("C:\\Users\\Chris\\Python\\wordlist.txt"):
            validWords.append(x)
    return(validWords)
print(jumbo_solve("rarom"))
I have put in many print statements to debug, and the list "allAnagrams" is built correctly. For example, with the input "rarom", one valid anagram is the word "armor", which is contained in the wordlist.txt file. However, when I run it, it does not detect it for some reason. Thanks again, I'm still a little new to Python so all the help is appreciated!
You missed a tiny but important aspect of:
word in open("C:\\Users\\Chris\\Python\\wordlist.txt")
This will search the file line by line, as if open(...).readlines() was used, and attempt to match the entire line, with '\n' in the end. Really, anything that demands iterating over open(...) works like readlines().
You would need
x+'\n' in open("C:\\Users\\Chris\\Python\\wordlist.txt")
to fix what you have, if the file is a list of words on separate lines, but it's inefficient to reopen and scan the file for every candidate. Better to read it once:
wordlist = open("C:\\Users\\Chris\\Python\\wordlist.txt").read().split('\n')
This will create a list of words if the file is a '\n' separated word list. Note you can use readlines() instead of read().split('\n'), but this will keep the \n on every word, like you have, and you would need to include that in your search as I show above. Now you can use the list as a global variable or as a function argument.
if x in wordlist: stuff
Note that Graphier raised an important suggestion in the comments. A set:
wordlist = set(open("C:\\Users\\Chris\\Python\\wordlist.txt").read().split('\n'))
is better suited for word lookup than a list, since a membership test costs O(word length) on average instead of a scan over the whole list.
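The newline issue is easy to reproduce without the real file. In this small sketch `io.StringIO` stands in for the word list, and the two sample words are made up for illustration.

```python
import io

# io.StringIO stands in for the real word list file here.
fake_file = io.StringIO("armor\nroam\n")

lines = list(fake_file)            # iterating a file yields lines with '\n'
print('armor' in lines)            # False: the stored line is 'armor\n'
print('armor\n' in lines)          # True

words = {line.strip() for line in lines}
print('armor' in words)            # True once the newline is stripped
```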
You have used the following code in the wrong way:
if x in open("C:\\Users\\Chris\\Python\\wordlist.txt"):
Instead, try the following code, it should solve your problem:
with open("words.txt", "r") as file:
    lines = file.read().splitlines()
for line in lines:
    pass # do something here
So, putting all the advice together, your code could be as simple as:
from itertools import permutations
def get_valid_words(file_name):
    with open(file_name) as f:
        return set(line.strip() for line in f)
def jumbo_solve(s, valid_words=None):
    """jumbo_solve(s: str) -> list
    returns list of valid words that are anagrams of `s`"""
    if valid_words is None:
        valid_words = get_valid_words("C:\\Users\\Chris\\Python\\wordlist.txt")
    # permutations() yields tuples of characters, so join them into strings;
    # the set removes duplicates caused by repeated letters
    candidates = {''.join(p) for p in permutations(s)}
    return [word for word in candidates if word in valid_words]
if __name__ == "__main__":
    print(jumbo_solve("rarom"))
I’m quite new to Python and am getting some strange results, surely due to a basic error on my part…
Basically, in Python 3.x, I must define a function (best_words(ltr_set,word_file)) that takes a set of letters (a list of characters) and searches a .txt file of words (1 word per line) for those that can be formed with those letters.
I first defined a function that checks if a given word can be made from a given set of letters. The word to be checked must be fed into this function as a list of characters (lsta), so it can be checked against the set of letters available (lstb):
def can_make_lsta_wrd_frm_lstb(lsta,lstb):
    result = True
    i = 0
    while i < len(lsta) and result == True:
        if lsta[i] in lstb:
            lstb.remove(lsta[i])
            i+=1
        else:
            result = False
    return result
I also defined a function that takes any given string and converts it into a list of its characters:
def lst(string):
    ls = []
    for c in string:
        ls.append(c)
    return ls
The idea behind the main best_words function is therefore to take a given set of letters and apply the above function to every line in a file of words, with the aim of filtering down to only those that can be made from the letters available...
def best_words(ltr_set, word_file):
    possible_words = []
    f = open(word_file)
    lines = f.readlines()
    i = 0
    while i < len(lines):
        lines[i] = lines[i].strip('\n')
        i+=1
    for item in lines:
        if can_make_lsta_wrd_frm_lstb(lst(item),ltr_set):
            possible_words.append(item)
    return possible_words
However, I keep getting an unexpected result, as if a loop is not continued as it should be…
For instance, if I take a file short_dictionnary.txt with the following words:
AA
AAS
ABACA
ABACAS
ABACOST
ABACOSTS
ABACULE
ABACULES
ABAISSA
ABAISSABLE
and call the function:
best_words(['A','C','B','A','S','A'], "short_dictionnary.txt")
The possible_words list consists solely of “AA”… whilst AAS, ABACA and ABACAS could also be formed…
If anyone can see what’s going on, their input would be greatly appreciated!
I would convert ltr_set to a Counter and then, for each letter in a candidate word, check that there are enough of that letter in ltr_set to make the word. You're also leaking file handles, since the file is never closed.
from collections import Counter
def can_make_word(c, word):
    return all(c[letter]>=count for letter, count in Counter(word).most_common())
def best_words(ltr_set, word_file):
    possible_words = []
    c = Counter(ltr_set)
    with open(word_file) as f:
        lines = f.readlines()
    lines = [line.strip() for line in lines]
    for item in lines:
        if can_make_word(c, item):
            possible_words.append(item)
    return possible_words
Thanks everyone, I now understand what I had to do!
I essentially needed to ensure the original ltr_set wasn't modified, which I achieved by making a simple copy. I don't know if answering my own question is of any use (I'm quite new to this forum), but here's the corrected can_make... function, should anyone find it useful for resolving a similar issue:
def can_make_lsta_wrd_frm_lstb(lsta,lstb):
    lstb_copy = lstb[:]
    result = True
    i = 0
    while i < len(lsta) and result == True:
        if lsta[i] in lstb_copy:
            lstb_copy.remove(lsta[i])
            i+=1
        else:
            result = False
    return result
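To see why the copy matters, here is a stripped-down sketch of the same effect. The helper `consume` is a hypothetical stand-in for the original function; it is buggy on purpose, removing letters from whatever list it is handed.

```python
def consume(letters, word):
    """Buggy on purpose: removes matched letters from the caller's list."""
    for ch in word:
        if ch not in letters:
            return False
        letters.remove(ch)
    return True


rack = ['A', 'C', 'B', 'A', 'S', 'A']
print(consume(rack, 'AA'), rack)    # True, but two A's are gone from rack
print(consume(rack, 'AAS'), rack)   # False: only one 'A' was left

rack = ['A', 'C', 'B', 'A', 'S', 'A']
print(consume(rack[:], 'AA'), rack)   # True, and rack is untouched
print(consume(rack[:], 'AAS'), rack)  # True: each call got a fresh copy
```

Because the list is passed by reference, every successful match in the original code shrank ltr_set for all later words, which is why only the first word was ever found.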
I need to call various functions that I have already created in order to solve the question below. I am really unsure how to program this. The question is:
find_two_word_anagrams_in_word_list(str, str_list), where the input parameters should be a string and a list of strings. The output should be a list of all strings made up of two words separated by a space, such that both of these words are in str_list and the combination of the two words is an anagram of str.
The output I expect is:
wordlist = ('and','band','nor','born')
find_two_word_anagrams_in_wordlist( "brandon", wordlist )
['and born', 'band nor']
How to achieve this is:
1) Initialise a list two_word_anagrams to the empty list, [].
2) Call find_partial_anagrams_in_word_list to get a list of all the partial anagrams of str that can be found in the word list.
3) Then, do a loop that runs over all these partial anagrams. For each partial anagram part_anag do the following:
Remove the letters of part_anag from the input string, str, to get a string, rem, that is the remaining letters after taking away the partial anagram;
Call find_anagrams_in_word_list on rem (and the input word list) to get a list of anagrams of the remaining letters;
For each anagram rem_anag of the remaining letters, form the string part_anag + " " + rem_anag and add it to the list two_word_anagrams.
4) At this point the list two_word_anagrams should have all the two word anagrams in it, so return that list from your function.
Code I have already created is:
def find_partial_anagrams_in_word_list(str1,str_list):
    partial_anagrams = []
    for word in str_list:
        if (partial_anagram(word, str1)):
            partial_anagrams.append(word)
    print(partial_anagrams)
def remove_letters(str1,str2):
    str2_list = list(str2)
    for char in str2:
        if char in str1:
            str2_list.remove(char)
    return "".join(str2_list)
def find_anagrams_in_word_list(str1,str_list):
    anagrams = []
    for word in str_list:
        if (anagram(str1,word)):
            anagrams.append(word)
            print(word)
Any step by step help or input would be appreciated.
You have two choices: you can look for anagrams among combinations of two different words, or among all two-word combinations, including a word paired with itself.
The choice is yours; here I have implemented both.
from itertools import combinations, combinations_with_replacement
def ana2(s,wl):
    rl = []
    for w1, w2 in combinations(wl,2):
        w = w1+w2
        if len(w) != len(s): continue
        if sorted(w) == sorted(s): rl.append((w1, w2))
    return rl
def ana2wr(s,wl):
    rl = []
    for w1, w2 in combinations_with_replacement(wl,2):
        w = w1+w2
        if len(w) != len(s): continue
        if sorted(w) == sorted(s): rl.append((w1, w2))
    return rl
and here is some testing
wl = ('and','band','nor','born')
s1 = 'brandon'
s2 = 'naddan'
print(ana2(s1,wl))
print(ana2(s2,wl))
print(ana2wr(s1,wl))
print(ana2wr(s2,wl))
that produces the following output
[('and', 'born'), ('band', 'nor')]
[]
[('and', 'born'), ('band', 'nor')]
[('and', 'and')]
You wrote
I need to call various different functions that i have already created
in order to achieve the question below
def find_2_word_anagram(word, wordlist):
    # Initialise a list two word anagrams to the empty list, [].
    analist = []
    # Call find partial anagrams in word list to get a list of all the
    # partial anagrams of str that can be found in the word list.
    candidates = find_partial_anagrams_in_word_list(word,wordlist)
    # Then, do a loop that runs over all these partial anagrams
    for candidate in candidates:
        # Remove the letters of part anag from the input string, str, to
        # get a string, rem that is the remaining letters after taking
        # away the partial anagram
        remaining = remove_letters(candidate,word)
        # Call find anagrams in word list on rem (and the input word list)
        # to get a list of anagrams of the remaining letters;
        matches = find_anagrams_in_word_list(remaining, wordlist)
        for other in matches:
            # str.join takes a single iterable, hence the tuple
            analist.append(" ".join((candidate,other)))
    return analist
Note that
you still have to write the inner functions following your specifications
when you write a function that returns, e.g., a list of words, you MUST RETURN A LIST OF WORDS; in particular, it isn't enough to print the matches from your function
to find an anagram, the idiom sorted(w1)==sorted(w2) is all you need, but the story with finding a partial anagram is more complex...
I took the liberty of inlining one of your functions, the one that computes the remainder.
When you strip the comments, which are your specs verbatim, you have very few lines of code.
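For the partial-anagram part, one possible approach (not the only one) is to compare letter counts with collections.Counter. The sketch below is self-contained and uses hypothetical helper names (`is_partial_anagram`, `leftover_letters`, `find_two_word_anagrams`) instead of the functions from the question. It reports each pair once, with the two words in alphabetical order, which is an assumption made to match the expected output shown in the question.

```python
from collections import Counter


def is_partial_anagram(part, whole):
    """True if part can be spelled from the letters of whole, respecting counts."""
    # Counter subtraction drops non-positive counts, so an empty result
    # means whole has at least as many of every letter as part needs.
    return not (Counter(part) - Counter(whole))


def leftover_letters(part, whole):
    """Return the letters of whole that remain after taking away those of part."""
    return ''.join((Counter(whole) - Counter(part)).elements())


def find_two_word_anagrams(s, wordlist):
    """Return 'w1 w2' strings where w1 + w2 is an anagram of s."""
    results = []
    for part in wordlist:
        if not is_partial_anagram(part, s):
            continue
        rem = sorted(leftover_letters(part, s))
        for other in wordlist:
            # part <= other keeps only one ordering of each pair (assumption).
            if part <= other and sorted(other) == rem:
                results.append(part + ' ' + other)
    return results


print(find_two_word_anagrams('brandon', ('and', 'band', 'nor', 'born')))
# ['and born', 'band nor']
```

Dropping the `part <= other` condition follows the written steps more literally and would also list the reversed pairs, such as 'born and'.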
Post Scriptum
Have a thorough look at Daniel's comment to my previous answer, there's a lot in it...