I have a list as follows.
mylist = ['test copy', 'test project', 'test', 'project']
I want to check whether my sentence contains any of the elements of mylist, split the sentence at the first match, and keep the first part.
For example:
mystring1 = 'it was a nice test project and I enjoyed it a lot'
output should be: it was a nice
mystring2 = 'the example test was difficult'
output should be: the example
My current code is as follows.
for sentence in L:
    if mylist in sentence:
        splits = sentence.split(mylist)
        sentence = splits[0]
However, I get an error saying TypeError: 'in <string>' requires string as left operand, not list. Is there a way to fix this?
You need another for loop to iterate over every string in mylist.
mylist = ['test copy', 'test project', 'test', 'project']
mystring1 = 'it was a nice test project and I enjoyed it a lot'
mystring2 = 'the example test was difficult'
L = [mystring1, mystring2]
for sentence in L:
    for word in mylist:
        if word in sentence:
            splits = sentence.split(word)
            sentence = splits[0]
    print(sentence)
# it was a nice
# the example
Probably the most efficient way to do this is to first build a single regex that tests all the strings at once:
import re
split_regex = re.compile('|'.join(re.escape(s) for s in mylist))
for sentence in L:
    first_part = split_regex.split(sentence, 1)[0]
This yields:
>>> split_regex.split(mystring1, 1)[0]
'it was a nice '
>>> mystring2 = 'the example test was difficult'
>>> split_regex.split(mystring2, 1)[0]
'the example '
If the number of possible strings is large, a regex can typically outperform searching each string individually.
You probably also want to .strip() the result (remove leading and trailing whitespace):
import re
split_regex = re.compile('|'.join(re.escape(s) for s in mylist))
for sentence in L:
    first_part = split_regex.split(sentence, 1)[0].strip()
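If you also need to know which element matched, re.search returns a match object with its position and text. A minimal sketch, assuming a hypothetical helper named split_at_first and assuming the longest element should win when several start at the same position. Sorting by length matters for that, because regex alternation takes the first alternative that matches, not the longest:

```python
import re

mylist = ['test copy', 'test project', 'test', 'project']

# Longest first, so 'test project' is tried before its prefix 'test'
pattern = re.compile('|'.join(re.escape(s) for s in sorted(mylist, key=len, reverse=True)))

def split_at_first(sentence):
    """Hypothetical helper: return (before, matched, after) around the
    earliest match, or (sentence, None, '') if nothing in mylist occurs."""
    m = pattern.search(sentence)
    if m is None:
        return sentence, None, ''
    return sentence[:m.start()].strip(), m.group(), sentence[m.end():].strip()

print(split_at_first('it was a nice test project and I enjoyed it a lot'))
print(split_at_first('the example test was difficult'))
```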
mylist = ['test copy', 'test project', 'test', 'project']
L = ['it was a nice test project and I enjoyed it a lot','a test copy']
for sentence in L:
    for x in mylist:
        if x in sentence:
            splits = sentence.split(x)
            sentence = splits[0]
    print(sentence)
The error says you are checking whether a list is in the sentence, but the in operator on a string needs a string on its left. So you must iterate over the elements of the list.
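For comparison, a minimal sketch without regex that cuts the sentence at the earliest occurrence (by position in the sentence) of any element of mylist. The function name first_part is hypothetical, and it assumes you want the result stripped of surrounding whitespace:

```python
def first_part(sentence, words):
    """Hypothetical helper: return the text before the earliest occurrence
    (by position) of any string in `words`, stripped of surrounding
    whitespace. If none of the words occur, return the whole sentence, stripped."""
    # str.find gives the index of the first occurrence; only keep words that occur
    positions = [sentence.find(w) for w in words if w in sentence]
    if not positions:
        return sentence.strip()
    return sentence[:min(positions)].strip()

mylist = ['test copy', 'test project', 'test', 'project']
print(first_part('it was a nice test project and I enjoyed it a lot', mylist))  # it was a nice
print(first_part('the example test was difficult', mylist))  # the example
```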
Related
I have a list of strings in Python where I need to preserve the order and split some of the strings.
The condition for splitting a string is that the first : is followed by at least one character that is not a space, newline, or tab.
For example, this must be split:
'example: Test' becomes ['example:', 'Test']
while these stay the same: 'example: ' and 'IGNORE_ME_EXAMPLE'
Given an input like this:
['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']
I'm expecting:
['example:', 'Test', 'example: ', 'IGNORE_ME_EXAMPLE']
Please note that the split parts stay next to each other and keep the original order.
Also, once I split a string I don't want to check the split parts again. In other words, I don't want to check 'Test' after I split it off.
To make it clearer, given an input like this:
['example: Test::YES']
I'm expecting:
['example:', 'Test::YES']
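You can use regular expressions for that. The pattern below splits at the first colon only ([^:]* cannot skip over a colon) and only when a non-whitespace character follows it:
import re
lines = ['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']
# group 1: everything up to and including the first ':'
# group 2: the rest, starting at the first non-whitespace character
pattern = re.compile(r"([^:]*:)\s*(\S.*)", re.DOTALL)
result = []
for line in lines:
    match = pattern.match(line)
    if match:
        result.append(match.group(1))
        result.append(match.group(2))
    else:
        result.append(line)
print(result)
# ['example:', 'Test', 'example: ', 'IGNORE_ME_EXAMPLE']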
You can use a nested list comprehension over the input list (note that this drops the ':' itself):
l = ['example: Test::YES']
l1 = [j.strip() for i in l for j in i.split(":", 1) if j.strip() != '']
print(l1)
Output:
['example', 'Test::YES']
You need to iterate over your list of words and, for each word, check whether ':' is present. If it is, split the word into two parts, the part up to and including ':' and the part after it, and append both to the result list. If there is no ':' in the word, add the word to the result list unchanged and skip the other steps for that word.
wordlist = ['example:', 'Test', 'example: ', 'IGNORE_ME_EXAMPLE']
result = []
for word in wordlist:
    index = -1
    part1, part2 = None, None
    if ':' in word:
        index = word.index(':')
    else:
        result.append(word)
        continue
    part1, part2 = word[:index+1], word[index+1:]
    if part1 is not None and len(part1) > 0:
        result.append(part1)
    if part2 is not None and len(part2) > 0:
        result.append(part2)
print(result)
output
['example:', 'Test', 'example:', ' ', 'IGNORE_ME_EXAMPLE']
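Note that the loop above also splits 'example: ' into 'example:' and ' ', which the question wants left unchanged. A minimal sketch using str.partition that only splits when something other than whitespace follows the first colon (split_first_colon is a hypothetical name):

```python
def split_first_colon(items):
    """Hypothetical helper: split each string at its first ':' if at least
    one non-whitespace character follows it; leave other strings unchanged.
    Output keeps the input order, and split-off parts are never re-examined."""
    result = []
    for item in items:
        # partition splits at the first ':' only: (before, ':', after)
        head, sep, tail = item.partition(':')
        if sep and tail.strip():
            result.append(head + sep)    # e.g. 'example:'
            result.append(tail.strip())  # e.g. 'Test::YES'
        else:
            result.append(item)          # no ':' or only whitespace after it
    return result

print(split_first_colon(['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']))
print(split_first_colon(['example: Test::YES']))
```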
I am trying to split a string such as the one below at each of the delimiters below, but only once per delimiter.
string = 'it; seems; like\ta good\tday to watch\va\vmovie.'
delimiters = '\t \v ;'
The output, in this case, would be:
['it', ' seems; like', 'a good\tday to watch', 'a\vmovie.']
Obviously the example above is a nonsense example, but I am trying to learn whether or not this is possible. Would a fairly involved regex be in order?
Apologies if this question has been asked before. I did a fair bit of searching and could not find anything quite like my example. Thanks for your time!
This should do the trick:
import re

def split_once_by(s, delims):
    delims = set(delims)
    parts = []
    while delims:
        delim_re = '({})'.format('|'.join(re.escape(d) for d in delims))
        result = re.split(delim_re, s, maxsplit=1)
        if len(result) == 3:
            first, delim, s = result
            parts.append(first)
            delims.remove(delim)
        else:
            break
    parts.append(s)
    return parts
Example:
>>> split_once_by('it; seems; like\ta good\tday to watch\va\vmovie.', '\t\v;')
['it', ' seems; like', 'a good\tday to watch', 'a\x0bmovie.']
Burning Alcohol's answer inspired me to write this (IMO) better function:
def split_once_by(s, delims):
    split_points = sorted((s.find(d), -len(d), d) for d in delims)
    start = 0
    for stop, _longest_first, d in split_points:
        if stop < start:
            continue
        yield s[start:stop]
        start = stop + len(d)
    yield s[start:]
with usage:
>>> list(split_once_by('it; seems; like\ta good\tday to watch\va\vmovie.', '\t\v;'))
['it', ' seems; like', 'a good\tday to watch', 'a\x0bmovie.']
A simple algorithm would do:
test_string = 'it; seems; like\ta good\tday to watch\va\vmovie.'
delimiters = ['\t', '\v', ';']
# sort the delimiters by the index of their first occurrence
delimiters = sorted(delimiters, key=lambda delimiter: test_string.find(delimiter))
splitted_string = [test_string]
# split the last piece once at each delimiter, in order of first occurrence
for delimiter in delimiters:
    if delimiter in splitted_string[-1]:
        last = splitted_string.pop()
        splitted_string += last.split(delimiter, maxsplit=1)
print(splitted_string)
# ['it', ' seems; like', 'a good\tday to watch', 'a\x0bmovie.']
Just create a list of patterns and apply them once:
string = 'it; seems; like\ta good\tday to watch\va\vmovie.'
patterns = ['\t', '\v', ';']
for pattern in patterns:
    string = '*****'.join(string.split(pattern, maxsplit=1))
print(string.split('*****'))
Output:
['it', ' seems; like', 'a good\tday to watch', 'a\x0bmovie.']
So, what is "*****"?
On each iteration, split() returns a list, and you can't call .split() on a list in the next iteration. So you join the pieces of that list back together with some unusual marker such as "*****", "###" or "^^^^^^^" (anything that does not occur in the text), so that split() can be applied again in the next iteration.
Finally, each "*****" in the string marks the split point of one pattern from the list, so a final split on "*****" gives the result.
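The placeholder only works if it never occurs in the text itself. A sketch that avoids it by keeping the pieces in a list and splitting the first piece that contains each pattern (split_once_each is a hypothetical name; like the placeholder code, patterns are applied in list order):

```python
def split_once_each(text, patterns):
    """Hypothetical helper: split `text` once at the first occurrence of
    each pattern (applied in list order), keeping the pieces in a list
    instead of re-joining them with a placeholder."""
    pieces = [text]
    for pattern in patterns:
        # the first piece that contains the pattern holds its earliest remaining occurrence
        for i, piece in enumerate(pieces):
            if pattern in piece:
                # replace that one piece with its two halves
                pieces[i:i + 1] = piece.split(pattern, 1)
                break
    return pieces

print(split_once_each('it; seems; like\ta good\tday to watch\va\vmovie.', ['\t', '\v', ';']))
```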
I'm trying to compare a Python list with a string and wrap the matches in <mark> tags in a new string.
But it doesn't work. Here is an example:
my_string = 'This is my string where i would find matches of my List'
my_list = ['THIS IS', 'WOULD FIND', 'BLA', 'OF MY LIST']
result_that_i_need = '<mark>This is</mark> my string where i <mark>would find</mark> matches <mark>of my List</mark>'
Does anybody have an idea how to solve this? Can somebody please help me?
I tried the following:
my_string = 'This is my string where i would find matches of my List'
my_string_split = my_string.split()
my_list = ['This is', 'would find', 'bla', 'of my List']
input_list = []
for my_li in my_list:
    if my_li in my_string:
        input_list.append(my_li)
input_list_join = " ".join(input_list)
new_list = []
for my_string_spl in my_string_split:
    if my_string_spl in input_list_join:
        new_list.append('<mark>' + my_string_spl + '</mark>')
    else:
        new_list.append(my_string_spl)
result = " ".join(new_list)
print(result)
Maybe something like this:
my_string = 'This is my string where i would find matches of my List'
my_list = ['This is', 'would find', 'bla', 'of my List']
result = my_string
for match in my_list:
    if match in my_string:
        result = result.replace(match, '<mark>' + match + '</mark>')
print(result)
Output:
<mark>This is</mark> my string where i <mark>would find</mark> matches <mark>of my List</mark>
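The question's my_list is upper-case ('THIS IS', ...), while str.replace is case-sensitive. A sketch, assuming case-insensitive matching is wanted, that does all replacements in a single re.sub pass and keeps the casing found in the string (highlight is a hypothetical name):

```python
import re

def highlight(text, phrases):
    """Hypothetical helper: wrap each case-insensitive occurrence of any
    phrase in <mark> tags, keeping the casing that appears in `text`."""
    if not phrases:
        return text  # an empty alternation would match everywhere
    # Longest phrases first, so a longer phrase beats a shorter one starting
    # at the same position; re.escape makes the phrases match literally.
    alternatives = '|'.join(re.escape(p) for p in sorted(phrases, key=len, reverse=True))
    pattern = re.compile(alternatives, re.IGNORECASE)
    # One pass, so tags that were already inserted are never re-scanned
    return pattern.sub(lambda m: '<mark>' + m.group(0) + '</mark>', text)

my_string = 'This is my string where i would find matches of my List'
my_list = ['THIS IS', 'WOULD FIND', 'BLA', 'OF MY LIST']
print(highlight(my_string, my_list))
```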
I have two scenarios for splitting this string:
"##$hello?? getting good.<li>hii"
I want it to be split as 'hello', 'getting', 'good.<li>hii' (Scenario 1)
or as 'hello', 'getting', 'good', 'li', 'hi' (Scenario 2)
Any ideas, please?
Something like this should work:
>>> import re
>>> s = "##$hello?? getting good.<li>hii"
>>> re.split(r"[^\w<>.]+", s)  # or re.split(r"[##$? ]+", s)
['', 'hello', 'getting', 'good.<li>hii']
>>> re.split(r"[^\w]+", s)
['', 'hello', 'getting', 'good', 'li', 'hii']
This might be what you're looking for: \w+ matches one or more word characters (letters, digits, and underscore), as many as possible. Here is a working JavaScript example:
var value = "##$hello?? getting good.<li>hii";
var matches = value.match(
new RegExp("\\w+", "gi")
);
console.log(matches)
It works by using \w+, which matches word characters as many times as possible. You could also use [A-Za-z] to match only letters, not digits, as shown here:
var value = "##$hello?? getting good.<li>hii777bloop";
var matches = value.match(
new RegExp("[A-Za-z]+", "gi")
);
console.log(matches)
It matches one or more of the characters inside the brackets, as many as possible: in this case the range a-z of lowercase characters and the range A-Z of uppercase characters. Hope this is what you want.
For the first scenario, just use a regex that finds all runs of word characters and the characters <, > and .:
In [60]: re.findall(r'[\w<>.]+', s)
Out[60]: ['hello', 'getting', 'good.<li>hii']
For the second one, you need to collapse repeated characters only in words that are not valid English words. You can do this using the nltk words corpus and re.sub:
In [61]: import nltk
In [62]: english_vocab = set(w.lower() for w in nltk.corpus.words.words())
In [63]: repeat_regexp = re.compile(r'(\w*)(\w)\2(\w*)')
In [64]: [repeat_regexp.sub(r'\1\2\3', word) if word not in english_vocab else word for word in re.findall(r'[^\W]+', s)]
Out[64]: ['hello', 'getting', 'good', 'li', 'hi']
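To see why the vocabulary check is needed, here is the repeat_regexp substitution on its own (no nltk needed for this part). It removes one doubled letter per word, which fixes 'hii' but would also turn valid words like 'good' into 'god'. The wrapper name collapse_repeat is hypothetical:

```python
import re

repeat_regexp = re.compile(r'(\w*)(\w)\2(\w*)')

def collapse_repeat(word):
    """Hypothetical wrapper: drop one copy of a doubled character."""
    # \1 = text before the pair, \2 = the repeated character, \3 = text after
    return repeat_regexp.sub(r'\1\2\3', word)

for w in ['hii', 'good', 'hello']:
    print(w, '->', collapse_repeat(w))
```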
In case you are looking for a solution without regex: string.punctuation gives you a string of all ASCII punctuation characters.
Use it with a list comprehension to get your desired result:
>>> import string
>>> my_string = '##$hello?? getting good.<li>hii'
>>> ''.join([(' ' if s in string.punctuation else s) for s in my_string]).split()
['hello', 'getting', 'good', 'li', 'hii'] # desired output
Explanation: here is how it works, step by step:
import string # Importing the 'string' module
special_char_string = string.punctuation
# Value of 'special_char_string': '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
my_string = '##$hello?? getting good.<li>hii'
# Generating list of character in sample string with
# special character replaced with whitespace
my_list = [(' ' if item in special_char_string else item) for item in my_string]
# Join the list to form string
my_string = ''.join(my_list)
# Split it based on space
my_desired_list = my_string.strip().split()
The value of my_desired_list will be:
['hello', 'getting', 'good', 'li', 'hii']
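A variation on the same idea that swaps the list comprehension for str.translate: build a table that maps every character in string.punctuation to a space, then split on whitespace. A sketch:

```python
import string

my_string = '##$hello?? getting good.<li>hii'
# map each punctuation character to a single space (both arguments must be equal length)
table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
my_desired_list = my_string.translate(table).split()
print(my_desired_list)
```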
What is the easiest way in Python to replace the nth word in a string, assuming each word is separated by a space?
For example, I want to replace the tenth word of a string and get the resulting string.
I guess you may do something like this:
nreplace=1
my_string="hello my friend"
words=my_string.split(" ")
words[nreplace]="your"
" ".join(words)
Here is another way of doing the replacement:
nreplace=1
words=my_string.split(" ")
" ".join([words[word_index] if word_index != nreplace else "your" for word_index in range(len(words))])
Let's say your string is:
my_string = "This is my test string."
You can split the string up using split():
my_list = my_string.split()
Which will set my_list to
['This', 'is', 'my', 'test', 'string.']
You can replace the 4th list item using
my_list[3] = "new"
And then put it back together with
my_new_string = " ".join(my_list)
Giving you
"This is my new string."
A solution involving list comprehension:
text = "To be or not to be, that is the question"
replace = 6
replacement = 'it'
print(' '.join([x if index != replace else replacement for index, x in enumerate(text.split())]))
The above produces:
To be or not to be, it is the question
You could use a generator expression and the string join() method:
my_string = "hello my friend"
nth = 0
new_word = 'goodbye'
print(' '.join(word if i != nth else new_word
for i, word in enumerate(my_string.split(' '))))
Output:
goodbye my friend
Using re.sub:
>>> import re
>>> my_string = "hello my friend"
>>> new_word = 'goodbye'
>>> re.sub(r'^(\s*(?:\S+\s+){0})\S+', r'\1'+new_word, my_string)
'goodbye my friend'
>>> re.sub(r'^(\s*(?:\S+\s+){1})\S+', r'\1'+new_word, my_string)
'hello goodbye friend'
>>> re.sub(r'^(\s*(?:\S+\s+){2})\S+', r'\1'+new_word, my_string)
'hello my goodbye'
Just replace the number within the curly braces with the position of the word you want to replace minus 1: to replace the first word, use 0; for the second word, use 1; and so on.
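The count can also be computed rather than edited by hand. A sketch wrapping the same pattern in a function (replace_nth_word is a hypothetical name; n is 1-based and assumed to be at least 1). It passes a function as the replacement, so a new word that starts with a digit, like '2nd', is not misread as part of a group reference such as \12:

```python
import re

def replace_nth_word(text, n, new_word):
    """Hypothetical helper: replace the n-th (1-based) whitespace-separated
    word in `text`. Returns `text` unchanged if it has fewer than n words."""
    # skip n - 1 words (each followed by whitespace), then match the n-th word
    pattern = r'^(\s*(?:\S+\s+){%d})\S+' % (n - 1)
    return re.sub(pattern, lambda m: m.group(1) + new_word, text)

print(replace_nth_word('hello my friend', 2, 'goodbye'))  # hello goodbye friend
```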