compare specific string to a word python - python

say I have a certain string and a list of strings.
I would like to append to a new list all the words from the list (of strings)
that are exactly like the pattern
for example:
list of strings = ['string1','string2'...]
pattern =__letter__letter_ ('_c__ye_' for instance)
I need to add all strings that are made up of the same letters in the same places as the pattern, and has the same length.
so for instance:
new_list = ['aczxyep','zcisyef'...]
I have tried this:
def pattern_word_equality(words,pattern):
list1 = []
for word in words:
for letter in word:
if letter in pattern:
list1.append(word)
return list1
help will be much appreciated :)

If your pattern is as simple as _c__ye_, then you can look for the characters in the specific positions:
words = ['aczxyep', 'cxxye', 'zcisyef', 'abcdefg']
result1 = list(filter(lambda w: w[1] == 'c' and w[4:6] == 'ye', words))
If your pattern is getting more complex, then you can start using regular expressions:
pat = re.compile("^.c..ye.$")
result2 = list(filter(lambda w: pat.match(w), words))
Output:
print(result1) # ['aczxyep', 'zcisyef']
print(result2) # ['aczxyep', 'zcisyef']

This works:
words = ['aczxyep', 'cxxye', 'zcisyef', 'abcdefg']
pattern = []
for i in range(len(words)):
if (words[i])[1].lower() == 'c' and (words[i])[4:6].lower() == 'ye':
pattern.append(words[i])
print(pattern)
You start by defining the words and pattern lists. Then you loop around for the amount of items in words by using len(words). You then find whether the i item number is follows the pattern by seeing if the second letter is c and the 5th and 6th letters are y and e. If this is true then it appends that word onto pattern and it prints them all out at the end.

Related

Filter a list of strings by a char in same position

I am trying to make a simple function that gets three inputs: a list of words, list of guessed letters and a pattern. The pattern is a word with some letters hidden with an underscore. (for example the word apple and the pattern '_pp_e')
For some context it's a part of the game hangman where you try to guess a word and this function gives a hint.
I want to make this function to return a filtered list of words from the input that does not contain any letters from the list of guessed letters and the filtered words contain the same letters and their position as with the given pattern.
I tried making this work with three loops.
First loop that filters all words by the same length as the pattern.
Second loop that checks for similarity between the pattern and the given word. If the not filtered word does contain the letter but not in the same position I filter it out.
Final loop checks the filtered word that it does not contain any letters from the given guessed list.
I tried making it work with not a lot of success, I would love for help. Also any tips for making the code shorter (without using third party libraries) will be a appreciated very much.
Thanks in advance!
Example: pattern: "d _ _ _ _ a _ _ _ _" guessed word list ['b','c'] and word list contain all the words in english.
output list: ['delegating', 'derogation', 'dishwasher']
this is the code for more context:
def filter_words_list(words, pattern, wrong_guess_lst):
lst_return = []
lst_return_2 = []
lst_return_3 = []
new_word = ''
for i in range(len(words)):
if len(words[i]) == len(pattern):
lst_return.append(words[i])
pattern = list(pattern)
for i in range(len(lst_return)):
count = 0
word_to_check = list(lst_return[i])
for j in range(len(pattern)):
if pattern[j] == word_to_check[j] or (pattern[j] == '_' and
(not (word_to_check[j] in
pattern))):
count += 1
if count == len(pattern):
lst_return_2.append(new_word.join(word_to_check))
for i in range(len(lst_return_2)):
word_to_check = lst_return_2[i]
for j in range(len(wrong_guess_lst)):
if word_to_check.find(wrong_guess_lst[j]) == -1:
lst_return_3.append(word_to_check)
return lst_return_3
The easiest, and likely quite efficient, way to do this would be to translate your pattern into a regular expression, if regular expressions are in your "toolbox". (The re module is in the standard library.)
In a regular expression, . matches any single character. So, we replace all _s with .s and add "^" and "$" to anchor the regular expression to the whole string.
import re
def filter_words(words, pattern, wrong_guesses):
re_pattern = re.compile("^" + re.escape(pattern).replace("_", ".") + "$")
# get words that
# (a) are the correct length
# (b) aren't in the wrong guesses
# (c) match the pattern
return [
word
for word in words
if (
len(word) == len(pattern) and
word not in wrong_guesses and
re_pattern.match(word)
)
]
all_words = [
"cat",
"dog",
"mouse",
"horse",
"cow",
]
print(filter_words(all_words, "c_t", []))
print(filter_words(all_words, "c__", []))
print(filter_words(all_words, "c__", ["cat"]))
prints out
['cat']
['cat', 'cow']
['cow']
If you don't care for using regexps, you can instead translate the pattern to a dict mapping each defined position to the character that should be found there:
def filter_words_without_regex(words, pattern, wrong_guesses):
# get a map of the pattern's defined letters to their positions
letter_map = {i: letter for i, letter in enumerate(pattern) if letter != "_"}
# get words that
# (a) are the correct length
# (b) aren't in the wrong guesses
# (c) have the correct letters in the correct positions
return [
word
for word in words
if (
len(word) == len(pattern) and
word not in wrong_guesses and
all(word[i] == ch for i, ch in letter_map.items())
)
]
The result is the same.
Probably not the most efficient, but this should work:
def filter_words_list(words, pattern, wrong_guess_lst):
fewer_words = [w for w in words if not any([wgl in w for wgl in wrong_guess_lst])]
equal_len_words = [w for w in fewer_words if len(w) == len(pattern)]
pattern_indices = [idl for idl, ltr in enumerate(pattern) if ltr != '_']
word_indices = [[idl for idl, ltr in enumerate(w) if ((ltr in pattern) and (ltr != '_'))] for w in equal_len_words]
out = [w for wid, w in zip(word_indices, equal_len_words) if ((wid == pattern_indices) and (w[pid] == pattern[pid] for pid in pattern_indices))]
return out
The idea is to first remove all words that have letters in your wrong_guess_lst.
Then, remove everything which does not have the same length (you could also merge this condition in the first one..).
Next, for both pattern and your remaining words, you create a pattern mask, which indicates the positions of non '_' letters.
To be a candidate, the masks have to be identical AND the letters in these positions have to be identical as well.
Note, that I replaced a lot of for loops in you code by list comprehension snippets. List comprehension is a very useful construct which helps a lot especially if you don't want to use other libraries.
Edit: I cannot really tell you, where your code went wrong as it was a little too long for me..
The regex rule is explicitely constructed, in particular no check on the word's length is needed. To achieve this the groupby function from the itertools package of the standard library is used:
'_ b _ _ _' -- regex-- > r'^.{1}b.{3}$'
Here how to filter the dictionary by a guess string:
import itertools as it
import re
# sample dictionary
dictionary = "a ability able about above accept according account across act action activity actually add address"
dictionary = dictionary.split()
guess = '_ b _ _ _'
guess = guess.replace(' ', '') # remove white spaces
# construction of the regex rule
regex = r'^'
for _, i in it.groupby(guess, key=lambda x: x == '_'):
if '_' in (l:=list(i)):
regex += ''.join(f'.{{{len(l)}}}') # escape the curly brackets
else:
regex += ''.join(l)
regex += '$'
# processing the regex rule
pattern = re.compile(regex)
# filter the dictionary by the rule
l = [word for word in dictionary if pattern.match(word)]
print(l)
Output
['about', 'above']

Search string in string with wildcard char

I was looking around regex and fnmatch threads but couldn't find similar problem.
User enters some string and I need to find it in string that is in col in dataframe. Strings have N char as a wildcard so N can be one of 3 other letters B W C
'BBBB' in 'BBNBAQWE' = True
becouse N transformed into B
'QWER' in 'QNERVFRZ' = True
becouse N transformed into W
strings can be diffrent sizes and from my understanding only one N letter can be morphed in that string to fit user request
What im planning is to add True/False value to new col based on output
df['is_present'] = df['strings'].map(lambda x: get_strings(x, user_val))
One way is to replace every letter of searched pattern allowing 'N' as alternative.
You can switch all the patterns using list comprehension:
raw_pattern = 'QWER'
pattern = ''.join(['(?:' + letter + '|N)' for letter in list(raw_pattern)])
#pattern = '(?:Q|N)(?:W|N)(?:E|N)(?:R|N)'
Then
sentence = 'QNENVFRZ'
re.findall(pattern, sentence)
>>> ['QNEN']
If the resulting list is not empty, the pattern was found in the sentence.
Edit:
The question was modified to only accept 'N' if it exchanges 'B', 'W', or 'C'.
Then we would like to create pattern like this:
pattern = ''.join(['(?:' + letter + '|N)' if letter in ('B', 'W', 'C') else letter for letter in list(raw_pattern)])
# pattern = 'Q(?:W|N)ER'
Of course then the original example does not match, as R was not able to replace N.
We get:
re.findall(pattern, sentence)
>>> []
We can check whether something was matched comparing to an empty list.
re.findall(pattern, sentence) == []
>>> True

Selecting randomly from a list with a specific character

So, I have a text file (full of words) I put into a list. I want Python 2.7 to select a word from the list randomly, but for it to start in a specific character.
list code:
d=[]
with open("dic.txt", "r") as x:
d=[line.strip() for line in x]
It's for a game called Shiritori. The user starts with saying any word in the English language, ie dog. The program then has to pick another word starting with the last character, in this case, 'g'.
code for the game:
game_user='-1'
game_user=raw_input("em, lets go with ")
a1=len(game_user)
I need a program that will randomly select a word beginning with that character.
Because your game relies specifically upon a random word with a fixed starting letter, I suggest first sorting all your words into a dictionary with the starting letter as the key. Then, you can randomly lookup any word starting with a given letter:
d=[]
lookup = {}
with open("dic.txt", "r") as x:
d=[line.strip() for line in x]
for word in d:
if word[0] in lookup:
lookup[word[0]].append(word)
else:
lookup[word[0]] = [ word ]
now you have a dict 'lookup' that has all your words sorted by letter.
When you need a word that starts with the last letter of the previous word, you can randomly pick an element in your list:
import random
random_word = random.choice(lookup[ game_user[-1] ])
In order to get a new list of all the values that start with the last letter of the user input:
choices = [x in d if x[0] == game_user[-1]]
Then, you can select a word by:
newWord = random.choice(choices)
>>> import random
>>> with open('/usr/share/dict/words') as f:
... words = f.read().splitlines()
...
>>> c = 'a'
>>> random.choice([w for w in words if w.startswith(c)])
'anthologizing'
Obviously, you need to replace c = 'a' with raw_input("em, lets go with ")
You might get better use out of more advanced data structures, but here's a shot:
words_dict = {}
for row in d:
# Gives us the first letter
myletter = row[0]
if myletter not in words_dict:
words_dict[myletter] = []
words_dict[myletter].append(row)
After creating a dictionary of all letters and their corresponding words, you can then access any particular set of words like so:
words_dict['a']
Which will give you all the words that start with a in a list. Then you can take:
# This could be any letter here..
someletter = 'a'
newword = words_dict[someletter][random.randint(0,len(words_dict[someletter]-1))]
Let me know if that makes sense?

Removing words containing digits from a given string

I'm trying to write a simple program that removes all words containing digits from a received string.
Here is my current implementation:
import re
def checkio(text):
text = text.replace(",", " ").replace(".", " ") .replace("!", " ").replace("?", " ").lower()
counter = 0
words = text.split()
print words
for each in words:
if bool(re.search(r'\d', each)):
words.remove(each)
print words
checkio("1a4 4ad, d89dfsfaj.")
However, when I execute this program, I get the following output:
['1a4', '4ad', 'd89dfsfaj']
['4ad']
I can't figure out why '4ad' is printed in the second line as it contains digits and should have been removed from the list. Any ideas?
Assuming that your regular expression does what you want, you can do this to avoid removing while iterating.
import re
def checkio(text):
text = re.sub('[,\.\?\!]', ' ', text).lower()
words = [w for w in text.split() if not re.search(r'\d', w)]
print words ## prints [] in this case
Also, note that I simplified your text = text.replace(...) line.
Additionally, if you do not need to reuse your text variable, you can use regex to split it directly.
import re
def checkio(text):
words = [w for w in re.split('[,.?!]', text.lower()) if w and not re.search(r'\d', w)]
print words ## prints [] in this case
If you are testing for alpha numeric strings why not use isalnum() instead of regex ?
In [1695]: x = ['1a4', '4ad', 'd89dfsfaj']
In [1696]: [word for word in x if not word.isalnum()]
Out[1696]: []
This would be possible through using re.sub, re.search and list_comprehension.
>>> import re
>>> def checkio(s):
print([i for i in re.sub(r'[.,!?]', '', s.lower()).split() if not re.search(r'\d', i)])
>>> checkio("1a4 4ad, d89dfsfaj.")
[]
>>> checkio("1a4 ?ad, d89dfsfaj.")
['ad']
So apparently what happens is a concurrent access error. Namely - you are deleting an element while traversing the array.
At the first iteration we have words = ['1a4', '4ad', 'd89dfsfaj']. Since '1a4' has a number, we remove it.
Now, words = ['4ad','d89dfsfaj']. However, at the second iteration, the current word is now 'd89dfsfaj' and we remove it. What happens is that we skip '4ad', because it is now at index 0 and the current pointer for the for cycle is at 1.

How do I remove 1 instance of x characters in a string and find the word it makes in Python3?

This is what I have so far, but I'm stuck. I'm using nltk for the word list and trying to find all the words with the letters in "sand". From this list I want to find all the words I can make from the remaining letters.
import nltk.corpus.words.words()
pwordlist = []
for w in wordlist:
if 's' in w:
if 'a' in w:
if 'n' in w:
if 'd' in w:
pwordlist.append(w)
In this case I have to use all the letters to find the words possible.
I think this will work for finding the possible words with the remaining letters, but I can't figure out how to remove only 1 instance of the letters in 'sand'.
puzzle_letters = nltk.FreqDist(x)
[w for w in pwordlist if len(w) = len(pwordlist) and nltk.FreqDist(w) = puzzle_letters]
I would separate the logic into four sections:
A function contains(word, letters), which we'll use to detect whether a word contains "sand"
A function subtract(word, letters), which we'll use to remove "sand" from the word.
A function get_anagrams(word), which finds all of the anagrams of a word.
The main algorithm that combines all of the above to find words that are anagrams of other words once you remove "sand".
 
from collections import Counter
words = ??? #todo: somehow get a list of every English word.
def contains(word, letters):
return not Counter(letters) - Counter(word)
def subtract(word, letters):
remaining = Counter(word) - Counter(letters)
return "".join(remaining.elements())
anagrams = {}
for word in words:
base = "".join(sorted(word))
anagrams.setdefault(base, []).append(word)
def get_anagrams(word):
return anagrams.get("".join(sorted(word)), [])
for word in words:
if contains(word, "sand"):
reduced_word = subtract(word, "sand")
matches = get_anagrams(reduced_word)
if matches:
print word, matches
Running the above code on the Words With Friends dictionary, I get a lot of results, including:
...
cowhands ['chow']
credentials ['reticle', 'tiercel']
cyanids ['icy']
daftness ['efts', 'fest', 'fets']
dahoons ['oho', 'ooh']
daikons ['koi']
daintiness ['seniti']
daintinesses ['sienites']
dalapons ['opal']
dalesman ['alme', 'lame', 'male', 'meal']
...
Program:
from nltk.corpus import words
from collections import defaultdict
def norm(word):
return ''.join(sorted(word))
completers = defaultdict(list)
for word in words.words():
completers[norm(word + 'sand')].append(word)
for word in words.words():
comps = completers[norm(word)]
if comps:
print(word, comps)
Output:
...
admirableness ['miserable']
adnascent ['enact']
adroitness ['sorite', 'sortie', 'triose']
adscendent ['cedent', 'decent']
adsorption ['portio']
adventuress ['vesture']
adversant ['avert', 'tarve', 'taver', 'trave']
...
Let's answer your question instead of spoiling the fun by doing the whole exercise for you: To remove just one instance of the letter, specify a replacement and give a limit to how many times it should apply:
>>> "Frodo".replace("o", "", 1)
'Frdo'
Or if you need to apply a regexp just once (though in this case you don't need a regexp):
>>> import re
>>> re.sub(r"[od]", "", "Frodo", 1)
'Frdo'
Now if you have a string whose letters (s, a, n, d) you want to remove from a word word, you can simply loop over the string:
>>> for letter in "sand":
word = word.replace(letter, "", word)
I'll leave it to you to embed this in a loop that goes over all words in your wordlist, and to utilize the remaining letters.

Categories

Resources