Python: Match string in different conditions

I have 4 lists:
string1=['I love apple', 'Banana is yellow', "I have no school today", "Baking pies at home", "I bought 3 melons today"]
no=['strawberry','apple','melon', 'Banana', "cherry"]
school=['school', 'class']
home=['dinner', 'Baking', 'home']
I want to know which group every string in string1 belongs to. If the string is about fruit, ignore it; if the string is about school or home, print it.
The result I expected:
I have no school today
school
Baking pies at home
Baking #find the first match
Here's my code, it did print out something I want, but with many duplicate values:
for i in string1:
    for j in no:
        if j in i:
            #print(j)
            #print(i)
            continue
    for k in school:
        if k in i:
            print(i)
            print(k)
    for l in home:
        if l in i:
            print(i)
            print(l)
I know this is not an efficient way to find the match. If you have any suggestion please let me know. Thank you!

You can do this with a combination of any and filter. We use any to ignore strings that have any occurrence of a word in no. Otherwise, we find the match using filter:
string1 = ['I love apple', 'Banana is yellow', "I have no school today", "Baking pies at home", "I bought 3 melons today"]
no = ['strawberry', 'apple', 'melon', 'Banana', "cherry"]
school = ['school', 'class']
home = ['dinner', 'Baking', 'home']
for s in string1:
    if not any(x in s for x in no):
        first_match = list(filter(lambda x: x in s, school + home))[0]
        print(s)
        print(first_match)
Output
I have no school today
school
Baking pies at home
Baking
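One caveat: if a string contains no word from either list, list(filter(...))[0] raises IndexError. A minimal sketch of the same idea using next() with a default, so unmatched strings are simply skipped. The helper name first_match and the extra test string 'Nothing relevant here' are assumptions added for illustration, not part of the original answer:

```python
string1 = ['I love apple', 'Banana is yellow', 'I have no school today',
           'Baking pies at home', 'I bought 3 melons today', 'Nothing relevant here']
no = ['strawberry', 'apple', 'melon', 'Banana', 'cherry']
school = ['school', 'class']
home = ['dinner', 'Baking', 'home']


def first_match(text, words):
    """Return the first word from `words` that occurs in `text`, or None."""
    return next((word for word in words if word in text), None)


results = []
for s in string1:
    if any(word in s for word in no):
        continue  # the string is about fruit: ignore it
    match = first_match(s, school + home)
    if match is not None:  # skip strings that match neither list
        results.append((s, match))

for s, match in results:
    print(s)
    print(match)
```

Because next() stops at the first hit, the generator does not scan the remaining words once a match is found.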

Assuming you are trying to see whether any of the lists no, school, and home contains a word that appears in any of the strings in string1,
I would just concatenate the no, school and home lists and then do:
all3lists = no + school + home
for string in string1:
    for word in all3lists:
        if word in string:
            print("{0}\n{1}".format(string, word))
Hope that is of some help, I'm not in a position to test it but that's my best bet without doing a test to see if that works :)

Related

Multiple Search Terms in List

I'm looking to count whether eth or btc appear in listy
searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
cnt = round((sum(len(re.findall('eth', x.lower())) for x in listy)/(len(listy)))*100)
print(f'{cnt}%')
The solution only looks for eth. How do I look for multiple search terms?
Bottom line: I have a list of search terms in searchterms. I'd like to see if any of those appear in listy. From there, I can perform a percentage of how many of those terms appear in the list.
You need to use the pipe "|" between the values you want to search for. In your code, change re.findall('eth', x.lower()) to re.findall(r"(eth)|(btc)", x.lower()):
import re
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
cnt = round((sum(len(re.findall(r"(eth)|(btc)", x.lower())) for x in listy)/(len(listy)))*100)
print(f'{cnt}%')
67%
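To avoid hard-coding the alternatives, the pattern can also be built from searchterms itself. A sketch, assuming the goal is the same occurrences-per-item percentage as in the question; re.escape guards against terms that contain regex metacharacters, and re.IGNORECASE stands in for the x.lower() call. The variable names pattern, occurrences and items_with_match are illustrative:

```python
import re

searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth',
         '#eth is great', 'barbaric tsar', 'nothing']

# Join the escaped terms with "|" so that any one of them matches
pattern = re.compile('|'.join(re.escape(term) for term in searchterms), re.IGNORECASE)

# Same metric as the question: total occurrences divided by the number of items
occurrences = sum(len(pattern.findall(item)) for item in listy)
cnt = round(occurrences / len(listy) * 100)
print(f'{cnt}%')

# Alternative metric: number of items that contain at least one term
items_with_match = sum(1 for item in listy if pattern.search(item))
```

The two metrics differ as soon as one item mentions a term more than once, so it is worth deciding which one the percentage is supposed to express.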
I would say instead of complicating the problem and using re, use a simple classic list comprehension.
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
print(len([i for i in listy if 'btc' in i.lower() or 'eth' in i.lower()]) * 100 / len(listy))
It improves the readability and the simplicity of the code.
Let me know if it helps!
A bit more code gives better readability. To search for additional words, just add them to the list search_for. The counting uses a defaultdict.
from collections import defaultdict
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
my_dict = defaultdict(int)
search_for = ['btc', 'eth']
for word in listy:
    for sub in search_for:
        # Compare in lowercase so that 'ETH' is counted as well
        if sub in word.lower():
            my_dict[sub] += 1
print(my_dict.items())

how to format a list by number and type of item in python

How to format a list in python? The code for the list would be something like this.
import random
nums = range(1, 11)
foods1 = ['pancake', 'pancake']
foods2 = ['pineapple pizza', 'pineapple pizza']
for num in nums:
    for food1 in foods1:
        list1 = [f"{num} {food1}"]
    for food2 in foods2:
        list2 = [f"{num} {food2}"]
alist = list1.append(list2)
blist = random.sample(alist, 12)
print (blist)
I want blist to be listed with the same foods grouped together and ordered by the number in front of them. The repeated foods in the two lists are intentional. The output I would expect looks like this: ['1 pancake', '3 pancake', '3 pancake', '5 pancake'... '3 pineapple pizza', '8 pineapple pizza', '9 pineapple pizza'], ordered by the numbers in front of them and separated by the kind of food they are.
Based on my understanding of your problem, here is a sleek solution, but please let me know if this is not what you intended to ask.
from random import sample
foods1 = ['pancake', 'pancake']
foods2 = ['pineapple pizza', 'pineapple pizza']
population = [f"{i} {food}" for i in range(1,11) for food in foods1]+[f"{i} {food}" for i in range(1,11) for food in foods2]
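# Sort by food name first, then by the leading number as an int (so 10 sorts after 9)
ans = sorted(sample(population, 12), key=lambda x: (x.split(' ', 1)[1], int(x.split(' ', 1)[0])))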

how to store target words in Python

I have a question in regard to how to store target words in the list.
I have a text file:
apple tree apple_tree
banana juice banana_juice
dinner time dinner_time
divorce lawyer divorce_lawyer
breakfast table breakfast_table
I would like read this file and store only nouns...but I am struggling with the code in Python.
file = open("text.txt","r")
for f in file.readlines():
    words.append(f.split(" "))
I don't know how to split lines by white space and eliminate compounds with "_"...
list = [apple, tree, banana, juice, dinner, time...]
Try this code. It works fine.
Split the whole string and keep only the values that are not compound words (i.e. words that do not contain _).
Code :
temp = """apple tree apple_tree
banana juice banana_juice
dinner time dinner_time
divorce lawyer divorce_lawyer
breakfast table breakfast_table"""
new_arr = [i for i in temp.split() if not '_' in i]
print(new_arr)
Output :
['apple', 'tree', 'banana', 'juice', 'dinner', 'time', 'divorce', 'lawyer', 'breakfast', 'table']
This code stores only words without the underscore, and all in one list instead of a nested list:
words = []
file = open("text.txt","r")
for f in file.readlines():
    words += [i for i in f.split(" ") if not '_' in i]
print(words)
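The same filter can be written with a context manager and a bare split(), which splits on any whitespace and also drops the trailing newline. A sketch, assuming the file layout from the question; io.StringIO stands in for open("text.txt") so the example is self-contained, and the function name read_simple_words is hypothetical:

```python
import io

# Stand-in for the real file; the contents follow the question's layout
sample = io.StringIO(
    "apple tree apple_tree\n"
    "banana juice banana_juice\n"
    "dinner time dinner_time\n"
)


def read_simple_words(lines):
    """Collect whitespace-separated words that contain no underscore."""
    words = []
    for line in lines:
        words.extend(word for word in line.split() if '_' not in word)
    return words


# With a real file this would be: with open("text.txt") as handle: ...
with sample as handle:
    words = read_simple_words(handle)
print(words)
```

Iterating over the file object directly reads it line by line, so readlines() is not needed.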
import re
file = ["apple tree apple_tree apple_tree_tree apple_tree_ _",
        "banana juice banana_juice",
        "dinner time dinner_time",
        "divorce lawyer divorce_lawyer",
        "breakfast table breakfast_table"]
# Approach 1 - list comprehensions
words = []
for f in file:
    words += [x for x in f.split(" ") if '_' not in x]
print(words)
# Approach 2 - regular expressions
words = []
for f in file:
    f = re.sub(r"\s*\w*_[\w_]*\s*", "", f)
    words += f.split(" ")
print(words)
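Both of the above approaches work on this input. Note that approach 2 removes the whitespace on both sides of a compound, so a compound in the middle of a line would glue its two neighbours together.
IMO the first is better (regular expressions can be costly) and also more Pythonic.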

How to map the differences between two strings?

I came across the following question and was wondering what would be an elegant way to solve it.
Let's say we have two strings:
string1 = "I love to eat $(fruit)"
string2 = "I love to eat apples"
The only difference between those strings is $(fruit) and apples.
So, I can find the fruit is apples, and a dict{fruit:apples} could be returned.
Another example would be:
string1 = "I have $(food1), $(food2), $(food3) for lunch"
string2 = "I have rice, soup, vegetables for lunch"
I would like to have a dict{food1:rice, food2:soup, food3:vegetables} as the result.
Anyone have a good idea about how to implement it?
Edit:
I think I need the function to be more powerful.
ex.
string1 = "I want to go to $(place)"
string2 = "I want to go to North America"
result: {place : North America}
ex.
string1 = "I won $(index)place in the competition"
string2 = "I won firstplace in the competition"
result: {index : first}
The Rule would be: map the different parts of the string and make them a dict
So I guess all answers using str.split() or trying to split the string will not work. There is no rule that says what characters would be used as a separator in the string.
I think this can be cleanly done with regex-based splitting. This should also handle punctuation and other special characters (where a split on space is not enough).
import re
p = re.compile(r'[^\w$()]+')
mapping = {
    x[2:-1]: y for x, y in zip(p.split(string1), p.split(string2)) if x != y}
For your examples, this returns
{'fruit': 'apples'}
and
{'food1': 'rice', 'food2': 'soup', 'food3': 'vegetables'}
One solution is to replace $(name) with (?P<name>.*) and use that as a regex:
import re
def make_regex(text):
    replaced = re.sub(r'\$\((\w+)\)', r'(?P<\1>.*)', text)
    return re.compile(replaced)
def find_mappings(mapper, text):
    return make_regex(mapper).match(text).groupdict()
Sample usage:
>>> string1 = "I have $(food1), $(food2), $(food3) for lunch"
>>> string2 = "I have rice, soup, vegetable for lunch"
>>> string3 = "I have rice rice rice, soup, vegetable for lunch"
>>> make_regex(string1).pattern
'I have (?P<food1>.*), (?P<food2>.*), (?P<food3>.*) for lunch'
>>> find_mappings(string1, string2)
{'food1': 'rice', 'food3': 'vegetable', 'food2': 'soup'}
>>> find_mappings(string1, string3)
{'food1': 'rice rice rice', 'food3': 'vegetable', 'food2': 'soup'}
Note that this can handle non alpha numeric tokens (see food1 and rice rice rice). Obviously this will probably do an awful lot of backtracking and might be slow. You can tweak the .* regex to try and make it faster depending on your expectations on "tokens".
For production ready code you'd want to re.escape the parts outside the (?P<name>.*) groups. A bit of pain in the ass to do because you have to "split" that string and call re.escape on each piece, put them together and call re.compile.
Since my answer got accepted I wanted to include a more robust version of the regex:
def make_regex(text):
    regex = ''.join(map(extract_and_escape, re.split(r'\$\(', text)))
    return re.compile(regex)
def extract_and_escape(partial_text):
    m = re.match(r'(\w+)\)', partial_text)
    if m:
        group_name = m.group(1)
        return ('(?P<%s>.*)' % group_name) + re.escape(partial_text[len(group_name)+1:])
    return re.escape(partial_text)
This avoids issues when the text contains special regex characters (e.g. I have $(food1) and it costs $$$). The first solution would treat $$$ as three $ anchors (and so fail to match), whereas this robust solution escapes them.
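Another way to do the escaping is to let re.split separate the literal text from the placeholder names in a single pass. This is a sketch rather than the answer's exact code: the names template_to_regex and find_mappings_safe are hypothetical, it uses a lazy .+? with fullmatch instead of the greedy .*, and it assumes placeholder names are valid Python identifiers (a name such as $(1) would not be accepted as a group name):

```python
import re


def template_to_regex(template):
    """Turn 'text $(name) text' into a compiled regex with named groups."""
    # With a capturing group, re.split returns literal text at even indexes
    # and the captured placeholder names at odd indexes.
    parts = re.split(r'\$\((\w+)\)', template)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))  # literal text, escaped
        else:
            pieces.append('(?P<%s>.+?)' % part)  # placeholder name
    return re.compile(''.join(pieces))


def find_mappings_safe(template, text):
    """Return the placeholder mapping, or an empty dict if text does not fit."""
    match = template_to_regex(template).fullmatch(text)
    return match.groupdict() if match else {}


print(find_mappings_safe("I want to go to $(place)", "I want to go to North America"))
print(find_mappings_safe("I won $(index)place in the competition",
                         "I won firstplace in the competition"))
```

Returning an empty dict on a failed match is a design choice of this sketch; the answer's find_mappings would raise AttributeError in that case because match() returns None.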
I suppose this does the trick.
s_1 = 'I had $(food_1), $(food_2) and $(food_3) for lunch'
s_2 = 'I had rice, meat and vegetable for lunch'
result = {}
for elem1, elem2 in zip(s_1.split(), s_2.split()):
    if elem1.startswith('$'):
        # Strip the trailing comma from both the placeholder and the value
        result[elem1.strip(',')[2:-1]] = elem2.strip(',')
print(result)
# {'food_1': 'rice', 'food_2': 'meat', 'food_3': 'vegetable'}
If you'd rather not use regex:
string1 = "I have $(food1), $(food2), $(food3) for lunch"
string2 = "I have rice, soup, vegetable for lunch"
trans_table = str.maketrans({'$': '', '(': '', ')': '', ',': ''})
{
    substr1.translate(trans_table): substr2.translate(trans_table)
    for substr1, substr2 in zip(string1.split(), string2.split())
    if substr1 != substr2
}
Output:
{'food1': 'rice', 'food2': 'soup', 'food3': 'vegetable'}
Alternatively, something a bit more flexible:
def substr_parser(substr, chars_to_ignore='$(),'):
    trans_table = str.maketrans({char: '' for char in chars_to_ignore})
    substr = substr.translate(trans_table)
    # More handling here
    return substr
{
    substr_parser(substr1): substr_parser(substr2)
    for substr1, substr2 in zip(string1.split(), string2.split())
    if substr1 != substr2
}
Same output as above.
You can use re:
import re
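def get_dict(a, b):
    keys = re.findall(r'(?<=\$\().*?(?=\))', a)
    # Replace each $(name) with a capture group; a callable avoids escape processing of the replacement
    values = re.findall(re.sub(r'\$\(.*?\)', lambda m: r'(\w+)', a), b)
    # With several groups, findall returns a list holding one tuple
    return dict(zip(keys, values if not isinstance(values[0], tuple) else values[0]))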
d = [["I love to eat $(fruit)", "I love to eat apple"], ["I have $(food1), $(food2), $(food3) for lunch", "I have rice, soup, vegetable for lunch"]]
results = [get_dict(*i) for i in d]
Output:
[{'fruit': 'apple'}, {'food3': 'vegetable', 'food2': 'soup', 'food1': 'rice'}]
You can do:
>>> dict((x.strip('$(),'),y.strip(',')) for x,y in zip(string1.split(), string2.split()) if x!=y)
{'food1': 'rice', 'food2': 'soup', 'food3': 'vegetable'}
Or with a regex:
>>> import re
>>> dict((x, y) for x,y in zip(re.findall(r'\w+', string1), re.findall(r'\w+', string2)) if x!=y)
{'food1': 'rice', 'food2': 'soup', 'food3': 'vegetable'}
zip in combination with a dictionary comprehension works well here: zip the two split strings (s1 and s2 below) and keep only the pairs that are not equal.
l = [*zip(s1.split(), s2.split())]
d = {i[0].strip('$(),'): i[1].strip(',') for i in l if i[0] != i[1]}

How to separate a single list into multiple list in python

I'm not sure if this has already been asked before, but here is what I want to do:
I have a list:
foods = ['I_want_ten_orange_cookies', 'I_want_four_orange_juices', 'I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
And I want to separate them into each individual list using the flavor, in this case which is 'orange' and 'lemon':
orange = ['I_want_ten_orange_cookies', 'I_want_four_orange_juices']
lemon = ['I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
I'm a beginner in Python, is this difficult to do? Thank you!
How about this:
foods = ['I_want_ten_orange_cookies', 'I_want_four_orange_juices', 'I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
orange=[]
lemon=[]
for food in foods:
    if 'orange' in food.split('_'):
        orange.append(food)
    elif 'lemon' in food.split('_'):
        lemon.append(food)
This would output:
>>> orange
['I_want_ten_orange_cookies', 'I_want_four_orange_juices']
>>> lemon
['I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
This works if the items in the list are always separated by underscores.
The if 'orange' in food.split('_') splits the sentence into a list of words and then checks whether the flavor is in that list.
You could, in theory, just do if 'orange' in food but that would fail if the substring is found in another word. For example:
>>> s = 'I_appeared_there'
>>> if 'pear' in s:
...     print("yes")
...
yes
>>> if 'pear' in s.split('_'):
...     print("yes")
...
>>>
foods = ['I_want_ten_orange_cookies', 'I_want_four_orange_juices', 'I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
foodlists = {'orange':[], 'lemon':[]}
for food in foods:
    for name, L in foodlists.items():
        if name in food:
            L.append(food)
Now, foodlists['orange'] and foodlists['lemon'] are the lists you are after.
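The two answers can be combined: keep the lists in a dictionary, but match on whole underscore-separated tokens so that a flavor hidden inside another word is not picked up. A minimal sketch, assuming the flavors are known up front; the names flavors and groups are illustrative:

```python
from collections import defaultdict

foods = ['I_want_ten_orange_cookies', 'I_want_four_orange_juices',
         'I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
flavors = ['orange', 'lemon']  # assumed list of flavors to group by

groups = defaultdict(list)
for food in foods:
    tokens = food.split('_')
    for flavor in flavors:
        if flavor in tokens:  # whole-token match, so 'pear' would not hit 'appeared'
            groups[flavor].append(food)

print(groups['orange'])
print(groups['lemon'])
```

Adding a new flavor then only requires extending the flavors list.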
