Compare list with string in Python and highlight matches

I'm trying to compare a Python list with a string and wrap every match in <mark> tags inside a new string, but I can't get it to work. Example:
my_string = 'This is my string where i would find matches of my List'
my_list = ['THIS IS', 'WOULD FIND', 'BLA', 'OF MY LIST']
result_that_i_need = '<mark>This is</mark> my string where i <mark>would find</mark> matches <mark>of my List</mark>'
Does anybody have an idea how to solve this? Can somebody help me, please?
I tried the following:
my_string = 'This is my string where i would find matches of my List'
my_string_split = my_string.split()
my_list = ['This is', 'would find', 'bla', 'of my List']
input_list = []
for my_li in my_list:
    if my_li in my_string:
        input_list.append(my_li)
input_list_join = " ".join(input_list)
new_list = []
for my_string_spl in my_string_split:
    if my_string_spl in input_list_join:
        new_list.append('<mark>' + my_string_spl + '</mark>')
    else:
        new_list.append(my_string_spl)
result = " ".join(new_list)
print(result)

Maybe something like this:
my_string = 'This is my string where i would find matches of my List'
my_list = ['This is', 'would find', 'bla', 'of my List']
result = my_string
for match in my_list:
    if match in my_string:
        result = result.replace(match, '<mark>' + match + '</mark>')
print(result)
Output:
<mark>This is</mark> my string where i <mark>would find</mark> matches <mark>of my List</mark>
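Note that the snippet above only works because the list was rewritten to match the case of the string. The list in the question is upper case, so a case-insensitive variant may be closer to what was asked. The following is a minimal sketch using re.sub with re.IGNORECASE; the function name highlight_matches and its tag parameter are made up for illustration and are not from the original posts.

```python
import re

def highlight_matches(text, phrases, tag="mark"):
    """Wrap every case-insensitive occurrence of a phrase in <tag>...</tag>."""
    if not phrases:
        return text
    # Try longer phrases first so an overlapping shorter phrase cannot win.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered),
                         flags=re.IGNORECASE)
    # m.group(0) is the text as it appears in the string, so its case is kept.
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)

my_string = 'This is my string where i would find matches of my List'
my_list = ['THIS IS', 'WOULD FIND', 'BLA', 'OF MY LIST']
result = highlight_matches(my_string, my_list)
print(result)
```

Because the replacement reuses the matched text rather than the list entry, the original capitalisation of the string is preserved inside the tags.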

Related

Replace a string with corresponding value from a list of strings

I have a list of strings and some text strings as follows:
my_list = ['en','di','fi','ope']
test_strings = ['you cannot enter', 'the wound is open', 'the house is clean']
I would like to replace the last word of each entry in test_strings with the corresponding string from the list above, if the word starts with one of them. I have written a for loop to capture the pattern, but I do not know how to proceed with the replacement (?).
for entry in [entry for entry in test_strings]:
    if entry.split(' ', 1)[-1].startswith(tuple(my_list)):
        output = entry.replace(entry,?)
As output I would like to have:
output = ['you cannot en', 'the wound is ope', 'the house is clean']
for i, v in enumerate(test_strings):
    last_term = v.split(' ')[-1]
    for k in my_list:
        if last_term.startswith(k):
            test_strings[i] = v.replace(last_term, k)
            break
print(test_strings)
['you cannot en', 'the wound is ope', 'the house is clean']
my_list = ['en', 'di', 'fi', 'ope']
test_strings = ['you cannot enter', 'the wound is open', 'the house is clean']
for i in range(len(test_strings)):
    temp = test_strings[i].split(" ")
    last_word = temp[-1]
    for j in my_list:
        if last_word.startswith(j):
            break
    else:
        # No prefix matched, so keep the last word unchanged.
        j = last_word
    temp[-1] = j
    test_strings[i] = " ".join(temp)
After this, test_strings holds the required output.
The replacement is done by assigning the new text back to the same index of the list.
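Both answers above modify test_strings in place. If you would rather build a new list, here is a small sketch using str.rpartition, which splits off only the last word. The helper name shorten_last_word is hypothetical, and it assumes words are separated by single spaces.

```python
def shorten_last_word(sentence, prefixes):
    """Replace the last word with the first prefix it starts with, if any."""
    # rpartition splits on the LAST space: (everything before, " ", last word).
    head, sep, last = sentence.rpartition(" ")
    for prefix in prefixes:
        if last.startswith(prefix):
            return head + sep + prefix
    return sentence  # no prefix matched, leave the sentence unchanged

my_list = ['en', 'di', 'fi', 'ope']
test_strings = ['you cannot enter', 'the wound is open', 'the house is clean']
output = [shorten_last_word(s, my_list) for s in test_strings]
print(output)
```

Unlike v.replace(last_term, k), this only touches the final word, even if the same word also appears earlier in the sentence.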

Check the list and split the sentence in python

I have a list as follows.
mylist = ['test copy', 'test project', 'test', 'project']
I want to check whether my sentence contains any of the elements of mylist, split the sentence at the first match, and keep the part before it.
For example:
mystring1 = 'it was a nice test project and I enjoyed it a lot'
output should be: it was a nice
mystring2 = 'the example test was difficult'
output should be: the example
My current code is as follows.
for sentence in L:
    if mylist in sentence:
        splits = sentence.split(mylist)
        sentence= splits[0]
However, I get an error saying TypeError: 'in <string>' requires string as left operand, not list. Is there a way to fix this?
You need another for loop to iterate over every string in mylist.
mylist = ['test copy', 'test project', 'test', 'project']
mystring1 = 'it was a nice test project and I enjoyed it a lot'
mystring2 = 'the example test was difficult'
L = [mystring1, mystring2]
for sentence in L:
    for word in mylist:
        if word in sentence:
            splits = sentence.split(word)
            sentence = splits[0]
    print(sentence)
# it was a nice
# the example
Probably the most efficient way to do this is to first build a regex that tests for all the strings at once:
import re
split_regex = re.compile('|'.join(re.escape(s) for s in mylist))
for sentence in L:
    first_part = split_regex.split(sentence, 1)[0]
This yields:
>>> split_regex.split(mystring1, 1)[0]
'it was a nice '
>>> mystring2 = 'the example test was difficult'
>>> split_regex.split(mystring2, 1)[0]
'the example '
If the number of possible strings is large, a regex can typically outperform searching each string individually.
You probably also want to .strip() the string (remove spaces in the front and end of the string):
import re
split_regex = re.compile('|'.join(re.escape(s) for s in mylist))
for sentence in L:
    first_part = split_regex.split(sentence, 1)[0].strip()
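For reuse, the regex approach can be wrapped in a function. This is only a sketch: the name text_before_first_match is made up, and sorting the phrases longest first is an extra precaution I added (the alternation tries its branches in order, so a short entry listed before a longer one would otherwise win at the same position). It matches plain substrings, so 'test' would also match inside 'contest'.

```python
import re

def text_before_first_match(sentence, phrases):
    """Return the part of sentence before the earliest phrase, stripped."""
    phrases = [p for p in phrases if p]  # an empty phrase would match everywhere
    if not phrases:
        return sentence.strip()
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    # maxsplit=1 stops at the first match; element 0 is the text before it.
    return pattern.split(sentence, maxsplit=1)[0].strip()

mylist = ['test copy', 'test project', 'test', 'project']
print(text_before_first_match('it was a nice test project and I enjoyed it a lot', mylist))
print(text_before_first_match('the example test was difficult', mylist))
```

If no phrase occurs in the sentence, the whole sentence comes back (stripped), which may or may not be what you want.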
mylist = ['test copy', 'test project', 'test', 'project']
L = ['it was a nice test project and I enjoyed it a lot','a test copy']
for sentence in L:
    for x in mylist:
        if x in sentence:
            splits = sentence.split(x)
            sentence = splits[0]
    print(sentence)
The error says you are trying to check whether a list is in the sentence, so you must iterate over the elements of the list instead.

Retrieve first word in list index python

I want to know how to retrieve the first word of each element in a list.
For example, if the list is:
['hello world', 'how are you']
Is there a way to get x = "hello how"?
Here is what I've tried so far (newfriend is the list):
x=""
for values in newfriend:
    values = values.split()
    values = ''.join(values.split(' ', 1)[0])
    x+=" ".join(values)
    x+="\n"
A simple generator expression would do, I guess, e.g.
>>> l = ["hello world", "how are you"]
>>> ' '.join(x.split()[0] for x in l)
'hello how'
You're not far off. Here is how I would do it.
# Python 3
newfriend = ['hello world', 'how are you']
x = [] # Create x as an empty list, rather than an empty string.
for v in newfriend:
    x.append(v.split(' ')[0]) # Append first word of each phrase to the list.
y = ' '.join(x) # Join the list.
print(y)
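One caveat with x.split()[0]: it raises IndexError if the list contains an empty or whitespace-only string. A small sketch that skips such entries (the function name first_words is made up for this example):

```python
def first_words(phrases):
    """Return the first word of each non-blank phrase, joined by spaces."""
    firsts = []
    for phrase in phrases:
        words = phrase.split()  # no argument: splits on any run of whitespace
        if words:               # skip empty or whitespace-only entries
            firsts.append(words[0])
    return " ".join(firsts)

print(first_words(["hello world", "how are you"]))
```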
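import re
l = ["Hello world", "hi world"]
g = []
for i in range(len(l)):
    # Collect all words of the i-th string.
    x = re.findall(r'\w+', l[i])
    g.append(x)
# Print the first word of each of the two strings.
print(g[0][0] + " " + g[1][0])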

Python dictionary replacement with space in key

I have a string and a dictionary, and I have to replace every occurrence of each dict key in that text.
text = 'I have a smartphone and a Smart TV'
dict = {
    'smartphone': 'toy',
    'smart tv': 'junk'
}
If there were no spaces in the keys, I would break the text into words and compare them one by one with the dict, which looks like O(n). But now the keys contain spaces, so things are more complicated. Please suggest a good way to do this, and please note that the keys may not match the case of the text.
Update
I have thought of this solution, but it is not efficient: O(m*n) or more...
for k, v in dict.items():
    text = text.replace(k, v)  # or regex...
If the keywords in the text are not adjacent to each other (keyword other keyword), we may do this. It looks like O(n) to me >"<
def dict_replace(dictionary, text, strip_chars=None, replace_func=None):
    """
    Replace words or phrases in text with their values in dictionary.

    Arguments:
        dictionary: dict with key:value, keys should be in lower case
        text: string to replace in
        strip_chars: string of characters to strip from each word
        replace_func: if given, a function that transforms the final
                      replacement. Must take 2 params: key and value

    Return:
        string

    Example:
        my_dict = {
            "hello": "hallo",
            "hallo": "hello",  # Only one pass, don't worry
            "smart tv": "http://google.com?q=smart+tv"
        }
        dict_replace(my_dict, "hello google smart tv",
                     replace_func=lambda k, v: '[%s](%s)' % (k, v))
    """
    # First break multi-word keys in the dictionary into single words
    dictionary = dictionary.copy()
    # Iterate over a snapshot of the keys because the dict is modified below
    for key in list(dictionary.keys()):
        if ' ' in key:
            key_parts = key.split()
            for part in key_parts:
                # Mark single word with False
                if part not in dictionary:
                    dictionary[part] = False
    # Break text into words and compare one by one
    result = []
    words = text.split()
    words.append('')  # Sentinel that flushes a match at the end of the text
    last_match = ''  # Last keyword (lower) match
    original = ''  # Last match in original
    for word in words:
        key_word = word.lower().strip(strip_chars) if \
            strip_chars is not None else word.lower()
        if key_word in dictionary:
            last_match = last_match + ' ' + key_word if \
                last_match != '' else key_word
            original = original + ' ' + word if \
                original != '' else word
        else:
            if last_match != '':
                # If match whole word
                if last_match in dictionary and dictionary[last_match] is not False:
                    if replace_func is not None:
                        result.append(replace_func(original, dictionary[last_match]))
                    else:
                        result.append(dictionary[last_match])
                else:
                    # Only match partial of keyword
                    match_parts = last_match.split(' ')
                    match_original = original.split(' ')
                    for i in range(0, len(match_parts)):
                        if match_parts[i] in dictionary and \
                                dictionary[match_parts[i]] is not False:
                            if replace_func is not None:
                                result.append(replace_func(match_original[i], dictionary[match_parts[i]]))
                            else:
                                result.append(dictionary[match_parts[i]])
                        else:
                            # Part of a phrase key on its own: keep the original word
                            result.append(match_original[i])
            result.append(word)
            last_match = ''
            original = ''
    # Drop the sentinel so the result has no trailing space
    return ' '.join(result[:-1])
If your keys have no spaces:
output = [dct[i] if i in dct else i for i in text.split()]
' '.join(output)
You should use dct instead of dict so the name doesn't shadow the built-in dict type.
This makes use of a list comprehension and a conditional expression (Python's ternary operator) to filter the data.
If your keys do have spaces, you are correct:
for k, v in dct.items():
    text = text.replace(k, v)
And yes, the time complexity will be O(m*n), as you have to scan the whole string once for each key in dct.
Drop all dictionary keys and the input text to lower case, so the comparisons are easy. Now ...
for entry in my_dict:
    if entry in text:
        # process the match
This assumes that the dictionary is small enough to warrant the match. If, instead, the dictionary is large and the text is small, you'll need to take each word, then each 2-word phrase, and see whether they're in the dictionary.
Is that enough to get you going?
You need to test all the neighbor permutations from 1 (each individual word) to len(text) (the entire string). You can generate the neighbor permutations this way:
text = 'I have a smartphone and a Smart TV'
array = text.lower().split()
key_permutations = [" ".join(array[j:j + i]) for i in range(1, len(array) + 1) for j in range(0, len(array) - (i - 1))]
>>> key_permutations
['i', 'have', 'a', 'smartphone', 'and', 'a', 'smart', 'tv', 'i have', 'have a', 'a smartphone', 'smartphone and', 'and a', 'a smart', 'smart tv', 'i have a', 'have a smartphone', 'a smartphone and', 'smartphone and a', 'and a smart', 'a smart tv', 'i have a smartphone', 'have a smartphone and', 'a smartphone and a', 'smartphone and a smart', 'and a smart tv', 'i have a smartphone and', 'have a smartphone and a', 'a smartphone and a smart', 'smartphone and a smart tv', 'i have a smartphone and a', 'have a smartphone and a smart', 'a smartphone and a smart tv', 'i have a smartphone and a smart', 'have a smartphone and a smart tv', 'i have a smartphone and a smart tv']
Now we substitute through the dictionary:
import re
for permutation in key_permutations:
    if permutation in dict:
        text = re.sub(re.escape(permutation), dict[permutation], text, flags=re.IGNORECASE)
>>> text
'I have a toy and a junk'
Though you'll likely want to try the permutations in the reverse order, longest first, so more specific phrases have precedence over individual words.
You can do this pretty easily with regular expressions.
import re
text = 'I have a smartphone and a Smart TV'
dict = {
    'smartphone': 'toy',
    'smart tv': 'junk'
}
for k, v in dict.items():
    regex = re.compile(re.escape(k), flags=re.I)
    text = regex.sub(v, text)
It still suffers from the problem of depending on processing order of the dict keys, if the replacement value for one item is part of the search term for another item.
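One way to sidestep the ordering problem is to do all the replacements in a single pass: combine every key into one alternation and look up the replacement in a callback. The sketch below is not from the answers above. The name replace_phrases is made up, it assumes no two keys differ only by case, and the \b word boundaries assume keys start and end with word characters.

```python
import re

def replace_phrases(text, mapping):
    """Replace every key of mapping found in text, ignoring case, in one pass."""
    # Lower-case the keys so the callback can look up any-case matches.
    lowered = {k.lower(): v for k, v in mapping.items()}
    if not lowered:
        return text
    # Longest keys first, so 'smart tv' is tried before a shorter 'smart'.
    ordered = sorted(lowered, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    # Fall back to the matched text if the lowered match is somehow not a key.
    return pattern.sub(
        lambda m: lowered.get(m.group(0).lower(), m.group(0)), text
    )

text = 'I have a smartphone and a Smart TV'
dct = {'smartphone': 'toy', 'smart tv': 'junk'}
print(replace_phrases(text, dct))
```

Since each position in the text is replaced at most once, a replacement value that happens to equal another key (for example swapping 'hello' and 'hallo') is not replaced again.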

Python: replace nth word in string

What is the easiest way in Python to replace the nth word in a string, assuming each word is separated by a space?
For example, if I want to replace the tenth word of a string and get the resulting string.
I guess you may do something like this:
nreplace=1
my_string="hello my friend"
words=my_string.split(" ")
words[nreplace]="your"
" ".join(words)
Here is another way of doing the replacement:
nreplace=1
words=my_string.split(" ")
" ".join([words[word_index] if word_index != nreplace else "your" for word_index in range(len(words))])
Let's say your string is:
my_string = "This is my test string."
You can split the string up using split()
my_list = my_string.split()
Which will set my_list to
['This', 'is', 'my', 'test', 'string.']
You can replace the 4th list item using
my_list[3] = "new"
And then put it back together with
my_new_string = " ".join(my_list)
Giving you
"This is my new string."
A solution involving list comprehension:
text = "To be or not to be, that is the question"
replace = 6
replacement = 'it'
print(' '.join([x if index != replace else replacement for index, x in enumerate(text.split())]))
The above produces:
To be or not to be, it is the question
You could use a generator expression and the string join() method:
my_string = "hello my friend"
nth = 0
new_word = 'goodbye'
print(' '.join(word if i != nth else new_word
               for i, word in enumerate(my_string.split(' '))))
Output:
goodbye my friend
Through re.sub.
>>> import re
>>> my_string = "hello my friend"
>>> new_word = 'goodbye'
>>> re.sub(r'^(\s*(?:\S+\s+){0})\S+', r'\1'+new_word, my_string)
'goodbye my friend'
>>> re.sub(r'^(\s*(?:\S+\s+){1})\S+', r'\1'+new_word, my_string)
'hello goodbye friend'
>>> re.sub(r'^(\s*(?:\S+\s+){2})\S+', r'\1'+new_word, my_string)
'hello my goodbye'
Just replace the number within the curly braces with the position of the word you want to replace minus 1. That is, to replace the first word the number would be 0, for the second word it would be 1, and so on.
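The split/assign/join approach from the earlier answers can be wrapped in a small function with a bounds check. This is just a sketch: the name replace_nth_word is made up, n is zero-based, and words are assumed to be separated by single spaces, as stated in the question.

```python
def replace_nth_word(text, n, new_word):
    """Return text with its n-th (zero-based) space-separated word replaced."""
    words = text.split(" ")
    if not 0 <= n < len(words):
        raise IndexError(f"text has {len(words)} words, cannot replace word {n}")
    words[n] = new_word
    return " ".join(words)

print(replace_nth_word("hello my friend", 1, "your"))
```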
