Search string in string with wildcard char

I looked through the regex and fnmatch threads but couldn't find a similar problem.
The user enters a string, and I need to find it in the strings stored in a column of a DataFrame. Those strings use the character N as a wildcard: N can stand for one of three other letters, B, W or C.
'BBBB' in 'BBNBAQWE' = True
because N is read as B
'QWER' in 'QNERVFRZ' = True
because N is read as W
The strings can be different sizes and, from my understanding, only one N in a string can be morphed to fit the user's request.
What I'm planning is to add a True/False value to a new column based on the result:
df['is_present'] = df['strings'].map(lambda x: get_strings(x, user_val))

One way is to replace every letter of the search pattern with a group that also allows 'N' as an alternative.
You can build the whole pattern with a list comprehension:
import re
raw_pattern = 'QWER'
pattern = ''.join(['(?:' + letter + '|N)' for letter in raw_pattern])
# pattern == '(?:Q|N)(?:W|N)(?:E|N)(?:R|N)'
Then
sentence = 'QNENVFRZ'
re.findall(pattern, sentence)
>>> ['QNEN']
If the resulting list is not empty, the pattern was found in the sentence.
Edit:
The question was modified so that 'N' is only accepted in place of 'B', 'W', or 'C'.
In that case we build the pattern like this:
pattern = ''.join(['(?:' + letter + '|N)' if letter in ('B', 'W', 'C') else letter for letter in raw_pattern])
# pattern == 'Q(?:W|N)ER'
Of course, the example sentence above no longer matches, because 'N' can no longer stand in for 'R'.
We get:
re.findall(pattern, sentence)
>>> []
We can check whether anything was matched by comparing the result with an empty list; True here means there was no match.
re.findall(pattern, sentence) == []
>>> True
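To connect this back to the DataFrame column in the question, the pattern building can be wrapped in a small helper. This is only a sketch: the names build_pattern and is_present are made up for illustration, character classes such as [BN] are used instead of (?:B|N) purely for brevity, and it assumes the search string contains only plain letters. It does not enforce the "only one N may be morphed" reading mentioned in the question.

```python
import re

def build_pattern(user_val, wildcard="N", replaceable="BWC"):
    # Hypothetical helper: letters that the wildcard may stand in for become
    # a two-character class such as [BN]; all other letters match literally.
    return re.compile("".join(
        f"[{re.escape(ch)}{wildcard}]" if ch in replaceable else re.escape(ch)
        for ch in user_val
    ))

def is_present(text, user_val):
    # True if user_val occurs anywhere in text, letting N replace B, W or C.
    return build_pattern(user_val).search(text) is not None

print(is_present("BBNBAQWE", "BBBB"))  # True
print(is_present("QNERVFRZ", "QWER"))  # True
print(is_present("QNENVFRZ", "QWER"))  # False
```

With pandas, something like df['strings'].map(lambda x: is_present(x, user_val)) should then fill the new column, as planned in the question.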

Related

Filter a list of strings by a char in same position

I am trying to write a simple function that takes three inputs: a list of words, a list of guessed letters and a pattern. The pattern is a word with some letters hidden behind underscores (for example, the word apple and the pattern '_pp_e').
For context, it's part of a hangman game, where you try to guess a word, and this function gives a hint.
I want the function to return a filtered list of words from the input: words that do not contain any of the guessed letters and that have the same letters in the same positions as the given pattern.
I tried making this work with three loops.
The first loop keeps only the words that have the same length as the pattern.
The second loop checks each word against the pattern. If a word contains one of the pattern's letters, but not in the same position, I filter it out.
The final loop checks that the remaining words do not contain any letters from the guessed list.
I haven't had much success, so I would love some help. Any tips for making the code shorter (without using third-party libraries) would also be very much appreciated.
Thanks in advance!
Example: pattern "d _ _ _ _ a _ _ _ _", guessed letter list ['b','c'], and a word list containing all the words in English.
Output list: ['delegating', 'derogation', 'dishwasher']
This is the code, for more context:
def filter_words_list(words, pattern, wrong_guess_lst):
    lst_return = []
    lst_return_2 = []
    lst_return_3 = []
    new_word = ''
    for i in range(len(words)):
        if len(words[i]) == len(pattern):
            lst_return.append(words[i])
    pattern = list(pattern)
    for i in range(len(lst_return)):
        count = 0
        word_to_check = list(lst_return[i])
        for j in range(len(pattern)):
            if pattern[j] == word_to_check[j] or (pattern[j] == '_' and
                                                  (not (word_to_check[j] in
                                                        pattern))):
                count += 1
        if count == len(pattern):
            lst_return_2.append(new_word.join(word_to_check))
    for i in range(len(lst_return_2)):
        word_to_check = lst_return_2[i]
        for j in range(len(wrong_guess_lst)):
            if word_to_check.find(wrong_guess_lst[j]) == -1:
                lst_return_3.append(word_to_check)
    return lst_return_3

The easiest, and likely quite efficient, way to do this would be to translate your pattern into a regular expression, if regular expressions are in your "toolbox". (The re module is in the standard library.)
In a regular expression, . matches any single character. So, we replace all _s with .s and add "^" and "$" to anchor the regular expression to the whole string.
import re
def filter_words(words, pattern, wrong_guesses):
    re_pattern = re.compile("^" + re.escape(pattern).replace("_", ".") + "$")
    # get words that
    # (a) are the correct length
    # (b) aren't in the wrong guesses
    # (c) match the pattern
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            re_pattern.match(word)
        )
    ]
all_words = [
    "cat",
    "dog",
    "mouse",
    "horse",
    "cow",
]
print(filter_words(all_words, "c_t", []))
print(filter_words(all_words, "c__", []))
print(filter_words(all_words, "c__", ["cat"]))
prints out
['cat']
['cat', 'cow']
['cow']
If you don't care for using regexps, you can instead translate the pattern to a dict mapping each defined position to the character that should be found there:
def filter_words_without_regex(words, pattern, wrong_guesses):
    # get a map of the pattern's defined positions to their letters
    letter_map = {i: letter for i, letter in enumerate(pattern) if letter != "_"}
    # get words that
    # (a) are the correct length
    # (b) aren't in the wrong guesses
    # (c) have the correct letters in the correct positions
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            all(word[i] == ch for i, ch in letter_map.items())
        )
    ]
The result is the same.
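Note that both functions above compare whole words against wrong_guesses, whereas the question describes a list of guessed letters. A possible letter-level variant is sketched below. The name filter_hangman_words is made up for illustration, and the sketch assumes the usual hangman rule that a hidden position cannot hold a letter that is already revealed elsewhere in the pattern, and that the pattern contains no spaces.

```python
def filter_hangman_words(words, pattern, wrong_letters):
    # Hypothetical helper, not taken from either answer above.
    revealed = set(pattern) - {"_"}   # letters already shown in the pattern
    banned = set(wrong_letters)       # letters guessed and known to be wrong
    return [
        word
        for word in words
        if len(word) == len(pattern)
        and not banned & set(word)    # contains no wrongly guessed letter
        and all(
            # revealed positions must match exactly; hidden positions
            # must not contain a letter that is already revealed
            (p == w) if p != "_" else (w not in revealed)
            for p, w in zip(pattern, word)
        )
    ]

words = ["apple", "ample", "apply", "maple", "angle"]
print(filter_hangman_words(words, "_pp_e", ["m"]))  # ['apple']
print(filter_hangman_words(words, "a__le", ["m"]))  # ['apple', 'angle']
```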

Probably not the most efficient, but this should work:
def filter_words_list(words, pattern, wrong_guess_lst):
    fewer_words = [w for w in words if not any([wgl in w for wgl in wrong_guess_lst])]
    equal_len_words = [w for w in fewer_words if len(w) == len(pattern)]
    pattern_indices = [idl for idl, ltr in enumerate(pattern) if ltr != '_']
    word_indices = [[idl for idl, ltr in enumerate(w) if ((ltr in pattern) and (ltr != '_'))] for w in equal_len_words]
    # all(...) is required here: a bare generator expression is always truthy
    out = [w for wid, w in zip(word_indices, equal_len_words) if ((wid == pattern_indices) and all(w[pid] == pattern[pid] for pid in pattern_indices))]
    return out
The idea is to first remove all words that contain letters from your wrong_guess_lst.
Then, remove everything that does not have the same length as the pattern (you could also merge this condition into the first one).
Next, for both the pattern and the remaining words, you create a mask that indicates the positions of the non-'_' letters.
To be a candidate, a word must have a mask identical to the pattern's AND the letters in these positions must be identical as well.
Note that I replaced a lot of the for loops in your code with list comprehensions. A list comprehension is a very useful construct, especially if you don't want to use other libraries.
Edit: I cannot really tell you where your code went wrong, as it was a little too long for me.

The regex rule is constructed explicitly, so no separate check on the word's length is needed. To achieve this, the groupby function from the itertools module of the standard library is used:
'_ b _ _ _' --regex--> r'^.{1}b.{3}$'
Here is how to filter the dictionary by a guess string:
import itertools as it
import re
# sample dictionary
dictionary = "a ability able about above accept according account across act action activity actually add address"
dictionary = dictionary.split()
guess = '_ b _ _ _'
guess = guess.replace(' ', '') # remove white spaces
# construction of the regex rule
regex = r'^'
for _, i in it.groupby(guess, key=lambda x: x == '_'):
    if '_' in (l:=list(i)):
        regex += ''.join(f'.{{{len(l)}}}') # escape the curly brackets
    else:
        regex += ''.join(l)
regex += '$'
# processing the regex rule
pattern = re.compile(regex)
# filter the dictionary by the rule
l = [word for word in dictionary if pattern.match(word)]
print(l)
Output
['about', 'above']

split a string to have chunks containing the maximum number of possible characters

e.g. string = 'bananaban'
=> ['ban', 'anab', 'an']
My attempt:
def apart(string):
    letters = []
    for i in string:
        while i not in letters:
            letters.append(i)
    print("The letters are:" +str(letters))
    x = []
    result = []
    return result
string = str(input("Enter string: "))
print(apart(string))
Basically, if I know all the letters that are in the word/string, I want to add characters to x until x contains all of those letters. Then I want to add x to result.
In my example "bananaban", that would mean [ban] is one x, because "ban" contains the letters "b", "a" and "n". The same goes for [anab]. [an] only contains "a" and "n" because it is the end of the word.
Would be cool if somebody could help me ^^

IIUC, you want to split after all characters are in the current chunk.
You could use a set to keep track of the seen characters:
s = 'bananaban'
seen = set()
letters = set(s)
out = ['']
for c in s:
    if seen != letters:
        out[-1] += c
        seen.add(c)
    else:
        seen = set(c)
        out.append(c)
output: ['ban', 'anab', 'an']

The logical way seems to be to first create a set with all the letters in your string, then go over the original string, collecting each character and starting a new collection each time the set of letters in the current collection matches the original set.
def apart(string):
    target = set(string)
    result = []
    component = ""
    for char in string:
        component += char
        if set(component) == target:
            result.append(component)
            component = ""
    if component:
        result.append(component)
    return result

Using a set of the characters in the string, you can loop through the string and add or extend the last group in your resulting list:
S = "bananaban"
chars = set(S) # distinct characters of string
groups = [""] # start with an empty group
for c in S:
    if chars.issubset(groups[-1]): # group contains all characters
        groups.append(c) # start a new group
    else:
        groups[-1] += c # append character to last group
print(groups)
['ban', 'anab', 'an']

How to pick out first letter from words in dot-separated string

How do I pick out the first letters of the words in a string like "first.last"?
I want to pick out the f in first and the l in last.
But it has to work for any input: when someone enters something like "quartz.block", I need to get the 'q' and 'b'.
Does anyone have any ideas how?

Use the split(".") method to separate the words wherever a dot (.) is present, then print the first letter of each word:
string = input("Enter= ")
words = string.split(".")
for word in words:
    print(word[0])

You can try the following:
word = "quartz.block"
first_last = f"{word[0]}{word[word.index('.') + 1]}"
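Both snippets above assume well-formed input: indexing word[0] raises IndexError on an empty segment (for example "a..b"), and the index('.') version only handles a single dot. A slightly more defensive sketch follows; the function name first_letters is made up for illustration.

```python
def first_letters(dotted):
    # Skip empty segments so inputs like "a..b" or ".a" don't raise IndexError.
    return [part[0] for part in dotted.split(".") if part]

print(first_letters("quartz.block"))  # ['q', 'b']
print(first_letters("first.last"))    # ['f', 'l']
print(first_letters("a..b.c"))        # ['a', 'b', 'c']
```

Use "".join(first_letters(...)) if a single string such as "qb" is wanted instead of a list.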

Shakespeare iterator solution:
it = iter('quartz.block')
print([c for c in it
       if '.' in it
       or '.' not in it])
Output:
['q', 'b']

Append last letter in a string to another string

I am building a chatbot that rhymes, in Python. Is it possible to identify the last vowel (and all the letters after it) in an arbitrary word and then append those letters to another string, without having to go through all the possible letters one by one (as in the following example)?
lastLetters = '' # String we want to append the letters to
if user_answer.endswith("a"):
    lastLetters += "a"
elif user_answer.endswith("b"):
    lastLetters += "b"
For example, if the word was "right", we'd want to get "ight".

You need to find the index of the last vowel. For that, you could do something like this (a bit fancy):
s = input("Enter the word: ") # You can do this to get user input
last_index = len(s) - next((i for i, e in enumerate(reversed(s), 1) if e in "aeiou"), -1)
result = s[last_index:]
print(result)
Output
ight
An alternative using regex:
import re
s = "right"
last_index = -1
match = re.search("[aeiou][^aeiou]*$", s)
if match:
last_index = match.start()
result = s[last_index:]
print(result)
The pattern [aeiou][^aeiou]*$ matches a vowel followed by any number of non-vowel characters up to the end of the string ([^aeiou] means "not a vowel"; a ^ at the start of a bracket expression negates it in regex). In other words, it matches from the last vowel onward. Note that this assumes the string is composed only of consonants and vowels.
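The regex approach can be wrapped in a small function that returns the matched tail directly. This is only a sketch: the name rhyme_tail and its vowels parameter are made up for illustration, vowels is assumed to contain plain letters only, and returning an empty string for words without a vowel is a design choice rather than something the question specifies.

```python
import re

def rhyme_tail(word, vowels="aeiou"):
    # Match the last vowel plus everything after it, ignoring case.
    match = re.search(f"[{vowels}][^{vowels}]*$", word, flags=re.IGNORECASE)
    # Assumption: a word without any vowel yields an empty string.
    return match.group(0) if match else ""

print(rhyme_tail("right"))   # ight
print(rhyme_tail("banana"))  # a
print(rhyme_tail("RIGHT"))   # IGHT
```

The result can then be appended to another string with ordinary concatenation, for example lastLetters += rhyme_tail(user_answer).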

compare specific string to a word python

Say I have a certain string (a pattern) and a list of strings.
I would like to append to a new list all the words from the list of strings
that exactly match the pattern.
For example:
list of strings = ['string1','string2'...]
pattern = __letter__letter_ ('_c__ye_' for instance)
I need to add all strings that have the same letters in the same places as the pattern and have the same length.
So for instance:
new_list = ['aczxyep','zcisyef'...]
I have tried this:
def pattern_word_equality(words,pattern):
    list1 = []
    for word in words:
        for letter in word:
            if letter in pattern:
                list1.append(word)
    return list1
help will be much appreciated :)

If your pattern is as simple as _c__ye_, then you can look for the characters in the specific positions:
words = ['aczxyep', 'cxxye', 'zcisyef', 'abcdefg']
result1 = list(filter(lambda w: w[1] == 'c' and w[4:6] == 'ye', words))
If your pattern gets more complex, you can start using regular expressions:
import re
pat = re.compile("^.c..ye.$")
result2 = list(filter(lambda w: pat.match(w), words))
Output:
print(result1) # ['aczxyep', 'zcisyef']
print(result2) # ['aczxyep', 'zcisyef']
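Rather than writing the regex by hand, it can be derived from the underscore pattern. A minimal sketch, assuming '_' marks an unknown position; the name words_matching and the blank parameter are made up for illustration.

```python
import re

def words_matching(words, pattern, blank="_"):
    # Each blank becomes "." (any single character); other characters are
    # escaped so they match literally. fullmatch also enforces equal length.
    regex = re.compile(
        "".join("." if ch == blank else re.escape(ch) for ch in pattern)
    )
    return [w for w in words if regex.fullmatch(w)]

words = ['aczxyep', 'cxxye', 'zcisyef', 'abcdefg']
print(words_matching(words, '_c__ye_'))  # ['aczxyep', 'zcisyef']
```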

This works:
words = ['aczxyep', 'cxxye', 'zcisyef', 'abcdefg']
pattern = []
for i in range(len(words)):
    if (words[i])[1].lower() == 'c' and (words[i])[4:6].lower() == 'ye':
        pattern.append(words[i])
print(pattern)
You start by defining the words and pattern lists. Then you loop over the indices of words using len(words). For each index i, you check whether that item follows the pattern by seeing if the second letter is c and the 5th and 6th letters are y and e. If so, the word is appended to pattern, and the list is printed at the end.
