Replacing text in tags - python

I have been having problems trying to find a way to replace tags in my strings in Python.
What I have at the moment is the text:
you should buy a {{cat_breed + dog_breed}} or a {{cat_breed + dog_breed}}
Where cat_breed and dog_breed are lists of cat and dog breeds.
What I want to end up with is:
you should buy a Scottish short hair or a golden retriever
I want the tag to be replaced by a random entry in one of the two lists.
I have been looking at re.sub(), but I do not know how to avoid ending up with the same replacement in both tags.

Use random.sample to get two unique elements from the population.
import random
cats = 'lazy cat', 'cuddly cat', 'angry cat'
dogs = 'dirty dog', 'happy dog', 'shaggy dog'
print("you should buy a {} or a {}".format(*random.sample(dogs + cats, 2)))
There's no reason to use regular expressions here. Just use str.format instead.

I hope the idea below gives you some idea of how to complete your task:
import random
list1 = ['cat_breed1', 'cat_breed2']
list2 = ['dog_breed1', 'dog_breed2']
a = random.choice(list1)
b = random.choice(list2)
sentence = "you should buy a %s or a %s" % (a, b)
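If you want to keep the {{...}} tags in the template itself, re.sub accepts a replacement function that is called once per match, so each tag can get its own pick. A minimal sketch, assuming drawing both picks from the combined pool is acceptable (the breed values here are illustrative):

```python
import random
import re

cats = ['Scottish short hair', 'angry cat']
dogs = ['golden retriever', 'happy dog']
template = 'you should buy a {{cat_breed + dog_breed}} or a {{cat_breed + dog_breed}}'

# two distinct picks from the combined pool, consumed one per tag
picks = iter(random.sample(cats + dogs, 2))
result = re.sub(r'\{\{.*?\}\}', lambda m: next(picks), template)
print(result)
```

Because the lambda is invoked once per matched tag, the two tags are guaranteed to receive different entries.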


Most common n words in a text

I am currently learning to work with NLP. One of the problems I am facing is finding the most common n words in a text. Consider the following:
text=['Lion Monkey Elephant Weed','Tiger Elephant Lion Water Grass','Lion Weed Markov Elephant Monkey Fine','Guard Elephant Weed Fortune Wolf']
Suppose n = 2. I am not looking for the most common bigrams; I am looking for the n words that co-occur in the same sentence most often. The output for the above should give:
'Lion' & 'Elephant': 3
'Elephant' & 'Weed': 3
'Lion' & 'Monkey': 2
'Elephant' & 'Monkey': 2
and so on.
Could anyone suggest a suitable way to tackle this?
I would suggest using Counter and combinations as follows.
from collections import Counter
from itertools import combinations, chain
text = ['Lion Monkey Elephant Weed', 'Tiger Elephant Lion Water Grass', 'Lion Weed Markov Elephant Monkey Fine', 'Guard Elephant Weed Fortune Wolf']
def count_combinations(text, n_words, n_most_common=None):
    count = []
    for t in text:
        words = t.split()
        combos = combinations(words, n_words)
        count.append([" & ".join(sorted(c)) for c in combos])
    return dict(Counter(sorted(list(chain(*count)))).most_common(n_most_common))
count_combinations(text, 2)
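The same counts can also be built up directly with a Counter of unordered pairs; a sketch that dedupes words within a sentence so a repeated word is not double-counted:

```python
from collections import Counter
from itertools import combinations

text = ['Lion Monkey Elephant Weed', 'Tiger Elephant Lion Water Grass',
        'Lion Weed Markov Elephant Monkey Fine', 'Guard Elephant Weed Fortune Wolf']

pair_counts = Counter()
for sentence in text:
    # dedupe and sort so each unordered pair is counted once per sentence
    words = sorted(set(sentence.split()))
    pair_counts.update(combinations(words, 2))

print(pair_counts.most_common(4))
```

This reproduces the counts in the question, e.g. ('Elephant', 'Lion') and ('Elephant', 'Weed') both occur 3 times.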
It was tricky, but I solved it by counting spaces: an element with exactly two words contains exactly one space, so only the two-word elements are printed.
l = ["hello world", "good night world", "good morning sunshine", "wassap babe"]
for elem in l:
    if elem.count(" ") == 1:
        print(elem)
output
hello world
wassap babe

Suitable data structure for fast search on sets. Input: tags, output: sentence

I have the following problem.
I get 1-10 tags related to an image, each with a probability of existing in the image.
inputs: beach, woman, dog, tree ...
I would like to retrieve from database an already composed sentence which is most related to the tags.
e.g:
beach -> "fun at the beach" / "chilling on the beach" ....
beach, woman -> "woman at the beach"
beach, woman, dog -> none found!
If no exact match exists, take the closest one, but consider the probabilities.
Let's say: woman 0.95, beach 0.85, dog 0.7.
If they exist, try woman+beach (0.95, 0.85) first, then woman+dog, and last beach+dog; higher probabilities are better, but we are not summing them.
I thought of using python sets but I am not really sure how.
Another option will be defaultdict:
db['beach']['woman']['dog'], but I want to get the same result also from:
db['woman']['beach']['dog']
I would like to get a nice solution.
Thanks.
EDIT: Working solution
from collections import OrderedDict

sentences = OrderedDict()
sentences[('dogs',)] = ['I like dogs', 'dogs are man best friends!']
sentences[('dogs', 'beach')] = ['the dog is at the beach']
sentences[('woman', 'cafe')] = ['The woman sat at the cafe.']
sentences[('woman', 'beach')] = ['The woman was at the beach']
sentences[('dress',)] = ['hi nice dress', 'what a nice dress !']

def keys_to_list_of_sets(dict_):
    list_of_keys = []
    for key in dict_:
        list_of_keys.append(set(key))
    return list_of_keys

def match_best_sentence(image_tags):
    for i, tags in enumerate(list_of_keys):
        if (tags & image_tags) == tags:
            print(list(sentences.keys())[i])

list_of_keys = keys_to_list_of_sets(sentences)
tags = set(['beach', 'dogs', 'woman'])
match_best_sentence(tags)
results:
('dogs',)
('dogs', 'beach')
('woman', 'beach')
This solution runs over all keys of the ordered dictionary, O(n); I would like to see any performance improvement.
What seems to be the simplest way of doing this without using a DB would be to keep a set of sentences for each word and take intersections.
More explicitly:
If a sentence contains the word "woman", then you put it into the "woman" set; similarly for "dog", "beach", etc., for each sentence. This means your space complexity is O(sentences * average_tags), as each sentence is repeated in the data structure once per tag.
You may have:
>>> dogs = set(["I like dogs", "the dog is at the beach"])
>>> woman = set(["The woman sat at the cafe.", "The woman was at the beach"])
>>> beach = set(["the dog is at the beach", "The woman was at the beach", "I do not like the beach"])
>>> dogs.intersection(beach)
{'the dog is at the beach'}
Which you can build into an object which is on top of defaultdict so that you can take a list of tags and you can intersect only those lists and return results.
Rough implementation idea:
from collections import defaultdict

class myObj(object):
    def __init__(self):
        self.sets = defaultdict(set)

    def add_sentence(self, sentence, tags):
        # how you process tags is up to you; they could also be
        # parsed from the input string
        for t in tags:
            self.sets[t].add(sentence)

    def get_match(self, tags):
        result = self.sets[tags[0]]
        for t in tags[1:]:
            result = result.intersection(self.sets[t])
        return result  # this function can stand to be improved, but the idea is there
Maybe this will make it more clear how the default dict and sets will end up looking in the object.
>>> a = defaultdict(set)
>>> a['woman']
set()
>>> a['woman'].add(1)
>>> a
defaultdict(<class 'set'>, {'woman': {1}})
>>> a['beach'].update([1, 2, 3, 4])
>>> a['woman'].intersection(a['beach'])
{1}
>>> a
defaultdict(<class 'set'>, {'woman': {1}, 'beach': {1, 2, 3, 4}})
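Putting the pieces together, the whole idea can be sketched end to end (the class name, tags, and sentences here are illustrative, not from the question):

```python
from collections import defaultdict

class TagIndex:
    """Maps each tag to the set of sentences carrying that tag."""
    def __init__(self):
        self.sets = defaultdict(set)

    def add_sentence(self, sentence, tags):
        for t in tags:
            self.sets[t].add(sentence)

    def get_match(self, tags):
        # copy so the intersection does not mutate the stored set
        result = set(self.sets[tags[0]])
        for t in tags[1:]:
            result &= self.sets[t]
        return result

idx = TagIndex()
idx.add_sentence('the dog is at the beach', ['dogs', 'beach'])
idx.add_sentence('The woman was at the beach', ['woman', 'beach'])
idx.add_sentence('I like dogs', ['dogs'])
print(idx.get_match(['dogs', 'beach']))  # {'the dog is at the beach'}
```

Lookups are order-independent by construction, since set intersection is commutative.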
It mainly depends on the size of the database, on the number of combinations between keywords, and also on which operation you do most.
If it's small and you need a fast find operation, one possibility is to use a dictionary whose keys are frozensets containing the tags and whose values are lists of all the associated sentences.
For instance,
from collections import defaultdict
d = defaultdict(list)
# preprocessing
d[frozenset(["bob","car","red"])].append("Bob owns a red car")
# searching
d[frozenset(["bob","car","red"])] #['Bob owns a red car']
d[frozenset(["red","car","bob"])] #['Bob owns a red car']
For combinations of words like "bob", "car" you have different possibilities, depending on the number of keywords and on what matters more. For example:
you could add an additional entry for each combination, or
you could iterate over the keys and check which ones contain both "car" and "bob".
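The second option, iterating over the keys, amounts to a subset test on each frozenset key. A sketch, with illustrative data:

```python
from collections import defaultdict

d = defaultdict(list)
d[frozenset(['bob', 'car', 'red'])].append('Bob owns a red car')
d[frozenset(['alice', 'bike'])].append('Alice rides a bike')

query = {'bob', 'car'}
# a key matches if it contains every queried word (query <= key is the subset test)
matches = [s for key, sents in d.items() if query <= key for s in sents]
print(matches)  # ['Bob owns a red car']
```

This is O(n) in the number of keys, which is the trade-off against storing an entry per combination.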

Modify each element of a Python list and combine results into a string

Let's say there is a list of objects of type "dog", and each dog has a parameter "name". For example:
dogs = [dog1, dog2, dog3]
is a list consisting of three dogs with names Rocky, Spot, and Daisy.
I am trying to access "name" of each dog and generate a string like "Rocky, Spot, Daisy". I understand that I need to use List Comprehensions, but the specifics turned out to be trickier than I thought.
I tried using
result = (dog.name+", " for dog in dogs)
but result becomes a generator rather than a string.
I also tried
result = ",".join(layers.name)
but I didn't find a way to access the "name" field of each dog.
I know how to solve the problem using brute force, but I would really like an elegant, "Pythonic" solution instead.
Any help would be appreciated!
You have to combine the generator with join:
result = ', '.join(dog.name for dog in dogs)
You can plug the generator into join.
>>> ", ".join(dog.name for dog in dogs)
'Rocky, Spot, Daisy'
This:
result = (dog.name+", " for dog in dogs)
is a generator comprehension/expression not a list comprehension. You can use that like so:
>>> dogs = ['Rocky', 'Spot', 'Daisy']
>>> result = (dog for dog in dogs)
>>> for dog in result:
... print(dog)
...
Rocky
Spot
Daisy
Or for your specific case:
>>> result = (dog for dog in dogs)
>>> ', '.join(result)
'Rocky, Spot, Daisy'
If you combine list comprehension with ", ".join you get:
result = ", ".join( [dog.name for dog in dogs] )
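Putting it together with objects that actually have a name attribute (the Dog class here is a stand-in for whatever type the list holds):

```python
class Dog:
    def __init__(self, name):
        self.name = name

dogs = [Dog('Rocky'), Dog('Spot'), Dog('Daisy')]

# join consumes the generator of names directly; no intermediate list needed
result = ', '.join(dog.name for dog in dogs)
print(result)  # Rocky, Spot, Daisy
```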

Python, find words from array in string

I just want to ask: how can I find words from an array in my string?
I need to write a filter that will find the words I saved in my array in the text that the user types into a text box on my web page.
I need to have 30+ words in an array or list or something.
Then the user types text into the text box.
Then the script should find all the words.
Something like a spam filter, I guess.
Thanks
import re
words = ['word1', 'word2', 'word4']
s = 'Word1 qwerty word2, word3 word44'
r = re.compile('|'.join([r'\b%s\b' % w for w in words]), flags=re.I)
r.findall(s)
>> ['Word1', 'word2']
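One caveat: if the word list may contain regex metacharacters, escaping each word with re.escape keeps the pattern valid. A sketch building on the approach above:

```python
import re

words = ['word1', 'word2', 'word4']
s = 'Word1 qwerty word2, word3 word44'

# re.escape guards against metacharacters such as '.' or '+' in the word list
pattern = re.compile('|'.join(r'\b%s\b' % re.escape(w) for w in words), flags=re.I)
print(pattern.findall(s))  # ['Word1', 'word2']
```

The \b word boundaries ensure 'word44' is not matched by 'word4'.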
Solution 1 uses the regex approach which will return all instances of the keyword found in the data. Solution 2 will return the indexes of all instances of the keyword found in the data
import re
dataString = '''Life morning don't were in multiply yielding multiply gathered from it. She'd of evening kind creature lesser years us every, without Abundantly fly land there there sixth creature it. All form every for a signs without very grass. Behold our bring can't one So itself fill bring together their rule from, let, given winged our. Creepeth Sixth earth saying also unto to his kind midst of. Living male without for fruitful earth open fruit for. Lesser beast replenish evening gathering.
Behold own, don't place, winged. After said without of divide female signs blessed subdue wherein all were meat shall that living his tree morning cattle divide cattle creeping rule morning. Light he which he sea from fill. Of shall shall. Creature blessed.
Our. Days under form stars so over shall which seed doesn't lesser rule waters. Saying whose. Seasons, place may brought over. All she'd thing male Stars their won't firmament above make earth to blessed set man shall two it abundantly in bring living green creepeth all air make stars under for let a great divided Void Wherein night light image fish one. Fowl, thing. Moved fruit i fill saw likeness seas Tree won't Don't moving days seed darkness.
'''
keyWords = ['Life', 'stars', 'seed', 'rule']
#---------------------- SOLUTION 1
print('Solution 1 output:')
for keyWord in keyWords:
    print(re.findall(keyWord, dataString))

#---------------------- SOLUTION 2
print('\nSolution 2 output:')
for keyWord in keyWords:
    index = 0
    indexes = []
    indexFound = 0
    while indexFound != -1:
        indexFound = dataString.find(keyWord, index)
        if indexFound not in indexes:
            indexes.append(indexFound)
        index += 1
    indexes.pop(-1)  # drop the trailing -1 from the final failed find
    print(indexes)
Output:
Solution 1 output:
['Life']
['stars', 'stars']
['seed', 'seed']
['rule', 'rule', 'rule']
Solution 2 output:
[0]
[765, 1024]
[791, 1180]
[295, 663, 811]
Try
words = ['word1', 'word2', 'word4']
s = 'word1 qwerty word2, word3 word44'
i = 0
for x in s.split():
    w = x.strip(',.')  # drop trailing punctuation so 'word2,' matches
    if w in words:
        print(w)
        i += 1
print("count is", i)
output
word1
word2
count is 2

how to find longest match of a string including a focus word in python

I'm new to Python/programming, so I'm not quite sure how to phrase this...
What I want to do is this: input a sentence, find all matches between the input sentence and a set of stored sentences/strings, and return the longest combination of matched strings.
I think the answer will have something to do with regex, but I haven't started on those yet and didn't want to if I didn't need to.
My question: is regex the way to go about this? Or is there a way to do this without importing anything?
If it helps you understand my question/idea, here's pseudocode for what I'm trying to do:
input = 'i play soccer and eat pizza on the weekends'
focus_word = 'and'
ss = [
'i play soccer and baseball',
'i eat pizza and apples',
'every day i walk to school and eat pizza for lunch',
'i play soccer but eat pizza on the weekend',
]
match = MatchingFunction(input, focus_word, ss)
# input should match with all except ss[3]
ss[0] match = 'i play soccer and'
ss[1] match = 'and'
ss[2] match = 'and eat pizza'
#the returned value match should be 'i play soccer and eat pizza'
It sounds like you want to find the longest common substring between your input string and each string in your database. Assuming you have a function LCS that will find the longest common substring of two strings, you could do something like:
> [LCS(input, s) for s in ss]
['i play soccer and ',
' eat pizza ',
' and eat pizza ',
' eat pizza on the weekend']
Then, it sounds like you're looking for the most-repeated substring within your list of strings. (Correct me if I'm wrong, but I'm not quite sure what you're looking for in the general case!) From the array output above, what combination of strings would you use to create your output string?
Based on your comments, I think this should do the trick:
> parts = [s for s in [LCS(input, s) for s in ss] if s.find(focus_word) > -1]
> parts
['i play soccer and ', ' and eat pizza ']
Then, to get rid of the duplicate words in this example:
> "".join([parts[0]] + [p.replace(focus_word, "").strip() for p in parts[1:]])
'i play soccer and eat pizza'
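The answer above assumes an LCS helper; one way to get the longest common substring from the standard library is difflib.SequenceMatcher (a sketch):

```python
from difflib import SequenceMatcher

def LCS(a, b):
    # longest common contiguous substring of a and b
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]

sentence = 'i play soccer and eat pizza on the weekends'
print(LCS(sentence, 'i play soccer and baseball'))
```

find_longest_match returns a Match(a, b, size) triple giving the start positions and length of the longest matching block, which is exactly the substring needed here.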
