How to match 2 string lists?


I want to write Python code that checks each item in "word" against the "letter" list.
I created two lists, word and letter, as follows:
word = ['hello', 'how', 'are', 'you', 'potato']
letter = ['how', 'ell', 'aaa', 'bbb', 'tat', 're']
What I want is the following output as a list, showing the matching result between the "word" and "letter" lists.
If a complete string is matched, the result is "True".
If a partial string is matched, the result is "True".
If no part of the string is matched, the result is "False".
word_result = ['True', 'True', 'True', 'False', 'True']
I tried on my own using a for loop, if...else, and import re, but could not get the result I want.
Can anyone give me a hand?
Thank you so much!
I tested the code below, but it does not work:
word = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat', 're']
def check_match():
    for l in letters:
        if l in word:
            print(l)
print(check_match())
Expect result:
word_result = ['True', 'True', 'True', 'False', 'True']

Use list comprehension with any:
word = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat']
def check_match():
    return [any(x in i for x in letters) for i in word]
print(check_match())
Output:
[True, True, False, False, True]
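The comprehension above reads both lists from module-level names. Here is a minimal sketch of the same idea with the lists passed in as arguments. The function name `match_words` and its parameter names are illustrative and do not come from the question. With 're' included in the fragments, as in the question's first listing, it reproduces the expected output.

```python
def match_words(words, fragments):
    """Return one bool per word: True if any fragment is a substring of it."""
    return [any(fragment in word for fragment in fragments) for word in words]


words = ['hello', 'how', 'are', 'you', 'potato']
fragments = ['how', 'ell', 'aaa', 'bbb', 'tat', 're']
print(match_words(words, fragments))  # [True, True, True, False, True]
```

Passing the lists as parameters also makes it easy to compare the two inputs shown in the question without editing globals.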

I am not sure if your expected output is right.
Here is code that should work:
def check_match():
    res = []
    for l in letters:
        for w in word:
            if l in w:
                res.append(True)
                break
        else:
            res.append(False)
    return res
output:
[True, True, False, False, True]
EDIT:
Now I get your question. First of all, you showed one input:
word = ['hello', 'how', 'are', 'you', 'potato']
letter = ['how', 'ell', 'aaa', 'bbb', 'tat', 're']
Then you used a different one:
word = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat']
Second, you want to check whether each word in word contains some corresponding item from letters. All you need to do is switch the loops:
def check_match():
    res = []
    for w in word:
        for l in letters:
            if l in w:
                res.append(True)
                break
        else:
            res.append(False)
    return res
output (with 're' included in letters, as in the first input):
[True, True, True, False, True]

You do not have a precise definition of partial match, which makes it difficult to answer your question in full.
However, we can confine this (missing) piece of wisdom into a function:
def partial_match(word, letter):
    ...
The rest of the logic can be easily written with either a nested loop:
words = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat']
results = []
for word in words:
    result = False
    for letter in letters:
        if word == letter or partial_match(word, letter):
            result = True
            break
    results.append(result)
The inner part of the loop is a common design pattern in programming, and Python offers a shortcut for this using the any() primitive and rewriting the inner loop as a comprehension:
results = []
for word in words:
    result = any(
        word == letter or partial_match(word, letter)
        for letter in letters)
    results.append(result)
or, even more compactly, rewriting both loops as comprehensions:
results = [
    any(
        word == letter or partial_match(word, letter)
        for letter in letters)
    for word in words]
Now let us focus on partial_match(). If all you want is to make sure that letter is contained in word, e.g.:
partial_match('how', 'ow') == True
partial_match('how', 'ho') == True
partial_match('how', 'o') == True
partial_match('how', 'oww') == False
partial_match('how', 'wow') == False
partial_match('how', 'hoe') == False
partial_match('how', 'xxx') == False
Then you can simply use:
def partial_match(word, letter):
    return letter in word
and, noticing that an exact match (described by word == letter) also satisfies partial_match(), you end up with @U10-Forward's answer by omitting the == check, inlining partial_match(), and renaming a couple of variables.
If your partial_match() should be different, all of the above is still valid, and you just need to refine that function.
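As an illustration of refining only that function, here is a sketch that assumes a hypothetical rule the question does not state: the substring test should ignore case. The sample inputs are altered from the question to show the effect.

```python
def partial_match(word, letter):
    """Hypothetical rule (an assumption): substring test that ignores case."""
    return letter.casefold() in word.casefold()


words = ['Hello', 'how', 'are', 'you', 'POTATO']
letters = ['ell', 'HOW', 'aaa', 'bbb', 'tat']
results = [
    any(partial_match(word, letter) for letter in letters)
    for word in words]
print(results)  # [True, True, False, False, True]
```

The surrounding comprehension is unchanged; only the body of partial_match() differs.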

Related

Stemmer function that takes a string and returns the stems of each word in a list

I am trying to create this function which takes a string as input and returns a list containing the stem of each word in the string. The problem is, that using a nested for loop, the words in the string are appended multiple times in the list. Is there a way to avoid this?
def stemmer(text):
    stemmed_string = []
    res = text.split()
    suffixes = ('ed', 'ly', 'ing')
    for word in res:
        for i in range(len(suffixes)):
            if word.endswith(suffixes[i]):
                stemmed_string.append(word[:-len(suffixes[i])])
            elif len(word) > 8:
                stemmed_string.append(word[:8])
            else:
                stemmed_string.append(word)
    return stemmed_string
If I call the function on this text ('I have a dog that is barking'), this is the output:
['I',
'I',
'I',
'have',
'have',
'have',
'a',
'a',
'a',
'dog',
'dog',
'dog',
'that',
'that',
'that',
'is',
'is',
'is',
'barking',
'barking',
'bark']

You are appending something in each round of the loop over suffixes. To avoid the problem, don't do that.
It's not clear if you want to add the shortest possible string out of a set of candidates, or how to handle stacked suffixes. Here's a version which always strips as much as possible.
def stemmer(text):
    stemmed_string = []
    suffixes = ('ed', 'ly', 'ing')
    for word in text.split():
        for suffix in suffixes:
            if word.endswith(suffix):
                word = word[:-len(suffix)]
        stemmed_string.append(word)
    return stemmed_string
Notice the fixed syntax for looping over a list, too.
This will reduce "sparingly" to "spar", etc.
Like every naïve stemmer, this will also do stupid things with words like "sly" and "thing".
Demo: https://ideone.com/a7FqBp
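If stacked suffixes should not be stripped, one alternative is to remove at most one suffix per word, preferring the longest match. This is a sketch under that assumption, not the asker's stated requirement. `stem_once` is an illustrative name, and the sketch keeps the question's rule of truncating words longer than 8 characters when no suffix matches.

```python
def stem_once(text, suffixes=('ed', 'ly', 'ing')):
    """Strip at most one suffix per word, trying longer suffixes first."""
    stems = []
    for word in text.split():
        for suffix in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suffix):
                stems.append(word[:-len(suffix)])
                break
        else:
            # No suffix matched: truncate long words, as in the question.
            stems.append(word[:8] if len(word) > 8 else word)
    return stems


print(stem_once('I have a dog that is barking'))
# ['I', 'have', 'a', 'dog', 'that', 'is', 'bark']
print(stem_once('sparingly'))  # ['sparing']
```

Because exactly one append happens per word (either in the loop before break or in the else branch), each word appears once in the result.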

Replacing character in string doesn't do anything

I have a list like this,
['Therefore', 'allowance' ,'(#)', 't(o)o', 'perfectly', 'gentleman', '(##)' ,'su(p)posing', 'man', 'his', 'now']
Expected output:
['Therefore', 'allowance' ,'(#)', 'too', 'perfectly', 'gentleman', '(##)' ,'supposing', 'man', 'his', 'now']
Removing the brackets is easy by using .replace(), but I don't want to remove the brackets from strings (#) and (##).
my code:
ch = "()"
for w in li:
    if w in ["(#)", "(##)"]:
        print(w)
    else:
        for c in ch:
            w.replace(c, "")
        print(w)
but this doesn't remove the brackets from the words.

You can use re.sub. In particular, note that it can take a function as repl parameter. The function takes a match object, and returns the desired replacement based on the information the match object has (e.g., m.group(1)).
import re
lst = ['Therefore', 'allowance', '(#)', 't(o)o', 'perfectly', 'gentleman', '(##)', 'su(p)posing', 'man', 'his', 'now']
def remove_paren(m):
    return m.group(0) if m.group(1) in ('#', '##') else m.group(1)
output = [re.sub(r"\((.*?)\)", remove_paren, word) for word in lst]
print(output) # ['Therefore', 'allowance', '(#)', 'too', 'perfectly', 'gentleman', '(##)', 'supposing', 'man', 'his', 'now']
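As for why the loop in the question has no visible effect: Python strings are immutable, so `str.replace` returns a new string and leaves `w` unchanged. Its result has to be assigned to something. Here is a minimal sketch of the question's loop with that one fix, collecting the results in a new list (`cleaned` and `KEEP` are illustrative names):

```python
li = ['Therefore', 'allowance', '(#)', 't(o)o', 'perfectly', 'gentleman',
      '(##)', 'su(p)posing', 'man', 'his', 'now']
KEEP = {'(#)', '(##)'}  # tokens whose parentheses must survive

cleaned = []
for w in li:
    if w not in KEEP:
        for c in '()':
            w = w.replace(c, '')  # rebind w to the string replace() returns
    cleaned.append(w)

print(cleaned)
```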

def removeparanthesis(s):
    a = ''
    for i in s:
        if i not in '()':
            a += i
    return a
a = ['Therefore', 'allowance', '(#)', 't(o)o', 'perfectly', 'gentleman', '(##)', 'su(p)posing', 'man', 'his', 'now']
b = []
for i in a:
    if i == '(#)' or i == '(##)':
        b.append(i)
    else:
        b.append(removeparanthesis(i))
print(b)
# The function strips parentheses; it is applied to every item except '(#)' and '(##)'

Give this a try!
Here, I define another empty list and loop over the original list, appending each word again without the characters we don't need.
As you can see, there are two loops. In the second one, we loop through each character, and whenever we encounter a ( or ) we skip it and continue building the new word.
Note that to keep (#) and (##) we skip the second loop, but we must not forget to add them to the new list.
li = ["Therefore", "allowance", "(#)", "t(o)o" , "perfectly", "gentleman", "(##)", "su(p)posing", "man", "his", "now"]
new_li = []
for index, w in enumerate(li):
    if w in ["(#)", "(##)"]:
        new_li.append(w)
        continue
    new_word = ""
    for c in w:
        if c == "(" or c == ")":
            continue
        new_word = new_word + c
    new_li.append(new_word)
print(new_li)

Return a list of words that contain a letter

I want to return a list of the words containing a letter, disregarding case.
Say I have sentence = "Anyone who has never made a mistake has never tried anything new", then f(sentence, 'a') would return
['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
This is what I have:
import re
def f(string, match):
    string_list = string.split()
    match_list = []
    for word in string_list:
        if match in word:
            match_list.append(word)
    return match_list

You don't need re. Use str.casefold:
[w for w in sentence.split() if "a" in w.casefold()]
Output:
['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
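The one-liner above hard-codes the letter. Here is a sketch that wraps it in the question's `f(string, match)` signature and casefolds both sides, so the letter itself may be given in either case (`needle` is an illustrative name):

```python
def f(string, match):
    """Return the words of `string` that contain `match`, ignoring case."""
    needle = match.casefold()
    return [word for word in string.split() if needle in word.casefold()]


sentence = "Anyone who has never made a mistake has never tried anything new"
print(f(sentence, 'A'))
# ['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
```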

You can use string splitting for this if there is no punctuation.
match_list = [s for s in sentence.split(' ') if 'a' in s.lower()]

Here's another variation :
sentence = 'Anyone who has never made a mistake has never tried anything new'
def f(string, match):
    match_list = []
    for word in string.split():
        if match in word.lower():
            match_list.append(word)
    return match_list
print(f(sentence, 'a'))

Words in a list with consecutively repeated letters

Right now I have a list, for example
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
I want to remove the words with repeated letters, that is, the words
'aa','aac','bbb','bcca','ffffff'
Maybe import re?

Thanks to this thread: Regex to determine if string is a single repeating character
Here is the re version, but I would stick to PM2 ring and Tameem's solutions if the task was as simple as this:
import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
[i for i in data if not re.search(r'^(.)\1+$', i)]
Output
['dog', 'cat', 'a', 'aac', 'bcca']
And the other:
import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
[i for i in data if not re.search(r'((\w)\2{1,})', i)]
Output
['dog', 'cat', 'a']

A loop is the way to go. Forget about sets, since they cannot tell consecutive repeats from non-consecutive ones.
Here is a function you can use to determine whether a word is valid in a single loop:
def is_valid(word):
    last_char = None
    for i in word:
        if i == last_char:
            return False
        last_char = i
    return True
Example
In [28]: is_valid('dogo')
Out[28]: True
In [29]: is_valid('doo')
Out[29]: False

The original version of this question wanted to drop words that consist entirely of repetitions of a single character. An efficient way to do this is to use sets. We convert each word to a set, and if it consists of only a single character the length of that set will be 1. If that's the case, we can drop that word, unless the original word consisted of a single character.
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
newdata = [s for s in data if len(s) == 1 or len(set(s)) != 1]
print(newdata)
output
['dog', 'cat', 'a', 'aac', 'bcca']
Here's code for the new version of your question, where you want to drop words that contain any repeated characters. This one's simpler, because we don't need to make a special test for one-character words.
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
newdata = [s for s in data if len(set(s)) == len(s)]
print(newdata)
output
['dog', 'cat', 'a']
If the repetitions have to be consecutive, we can handle that using groupby.
from itertools import groupby
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow']
newdata = [s for s in data if max(len(list(g)) for _, g in groupby(s)) == 1]
print(newdata)
output
['dog', 'cat', 'a', 'abab', 'wow']
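To see why the groupby test works: `itertools.groupby` splits a string into runs of equal adjacent characters, so a word has no consecutive repeats exactly when every run has length 1. Here is a small sketch that shows the runs (`run_lengths` is an illustrative helper name, not part of the answer):

```python
from itertools import groupby


def run_lengths(word):
    """Return (character, run length) pairs for runs of equal adjacent characters."""
    return [(char, len(list(group))) for char, group in groupby(word)]


print(run_lengths('bcca'))  # [('b', 1), ('c', 2), ('a', 1)]
print(run_lengths('abab'))  # [('a', 1), ('b', 1), ('a', 1), ('b', 1)]
```

Note that the `max(...)` form in the answer raises ValueError for an empty string, because there are no runs to take the maximum of.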

Here's a way to check if there are consecutive repeated characters:
def has_consecutive_repeated_letters(word):
    return any(c1 == c2 for c1, c2 in zip(word, word[1:]))
You can then use a list comprehension to filter your list:
words = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow']
[word for word in words if not has_consecutive_repeated_letters(word)]
# ['dog', 'cat', 'a', 'abab', 'wow']

One line is all it takes :)
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
data = [value for value in data if len(set(value)) != 1 or len(value) == 1]
print(data)
Output
['dog', 'cat', 'a', 'aac', 'bcca']

Alter the letter in list of strings

An example:
word_list = ["a", "is", "bus", "on", "the"]
alter_the_list("A bus station is where a bus stops A train station is where a train stops On my desk I have a work station", word_list)
print("1.", word_list)
word_list = ["a", 'up', "you", "it", "on", "the", 'is']
alter_the_list("It is up to YOU", word_list)
print("2.", word_list)
word_list = ["easy", "come", "go"]
alter_the_list("Easy come easy go go go", word_list)
print("3.", word_list)
word_list = ["a", "is", "i", "on"]
alter_the_list("", word_list)
print("4.", word_list)
word_list = ["a", "is", "i", "on", "the"]
alter_the_list("May your coffee be strong and your Monday be short", word_list)
print("5.", word_list)
def alter_the_list(text, word_list):
    return [text for text in word_list if text in word_list]
I'm trying to remove from the list of words any word that appears as a separate word in the string of text. The text should be converted to lower case before checking; the elements of the word list are all lower case. There is no punctuation in the text, and each word in the word list is unique. I don't know how to fix it.
output:
1. ['a', 'is', 'bus', 'on', 'the']
2. ['a', 'up', 'you', 'it', 'on', 'the', 'is']
3. ['easy', 'come', 'go']
4. ['a', 'is', 'i', 'on']
5. ['a', 'is', 'i', 'on', 'the']
expected:
1. ['the']
2. ['a', 'on', 'the']
3. []
4. ['a', 'is', 'i', 'on']
5. ['a', 'is', 'i', 'on', 'the']

I've done it like this:
def alter_the_list(text, word_list):
    for word in text.lower().split():
        if word in word_list:
            word_list.remove(word)
text.lower().split() returns a list of all space-separated tokens in text.
The key is that you're required to alter word_list. It is not enough to return a new list; you have to use Python 3's list methods to modify the list in-place.

If the order of the resulting list does not matter you can use sets:
def alter_the_list(text, word_list):
    word_list[:] = set(word_list).difference(text.lower().split())
This function will update word_list in place due to the assignment to the list slice with word_list[:] = ...
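If the order does matter, the same slice assignment can be combined with a list comprehension. This is a sketch rather than part of the answer above; `seen` is an illustrative name for the set of lower-cased words in the text.

```python
def alter_the_list(text, word_list):
    """Remove from word_list, in place, every word that occurs in text."""
    seen = set(text.lower().split())
    # Slice assignment mutates the caller's list; the comprehension keeps order.
    word_list[:] = [word for word in word_list if word not in seen]


word_list = ["a", "up", "you", "it", "on", "the", "is"]
alter_the_list("It is up to YOU", word_list)
print("2.", word_list)  # 2. ['a', 'on', 'the']
```

Building the set once keeps each membership test cheap, and the surviving words stay in their original order.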

1
Your main problem is that you return a value from your function, but then ignore it. You have to save it in some way to print out, such as:
word_list = ["easy", "come", "go"]
word_out = alter_the_list("Easy come easy go go go", word_list)
print("3.", word_out)
What you printed is the original word list, not the function result.
2
You ignore the text parameter to the function. You reuse the variable name as a loop index in your list comprehension. Get a different variable name, such as
    return [word for word in word_list if word in word_list]
3
You still have to involve text in the logic of the list you build. Remember that you're looking for words that are not in the given text.
Most of all, learn basic debugging.
See this lovely debug blog for help.
If nothing else, learn to use simple print statements to display the values of your variables, and to trace program execution.
Does that get you moving toward a solution?

I like @Simon's answer better, but if you want to do it in two list comprehensions:
def alter_the_list(text, word_list):
    # Pull out all words from the word list that occur in the lower-cased text
    c = [w for w in word_list for t in text.lower().split() if t == w]
    # Keep only the words that were not found (returns a new list)
    return [w for w in word_list if w not in c]
