Python RegEx to replace string - python

How do I use RegEx (or something else in Python) for the following requirement?
I need to:
Remove the word "dream" (including all its stems)
Remove All previous words (i.e. all words behind the word "dream")
Remove the Word next to it (in front of it/to the right of "dream")
Remove the word "to" from all phrases.
Input:
text = ["Dream of a car",
"Dream to live in a world",
"Dream about 8am every morning",
"stopped dreaming today",
"still dreaming of a car",
"One more dream to come late tomorrow",
"Dream coming to hope tomorrow"]
Required Output:
["a car",
"live in a world",
"8am every morning",
" ",
"a car",
"come late tomorrow",
"hope tomorrow"]
I tried:
result = [re.sub('Dream', '', a) for a in text]
# MyOutput
[' of a car', ' to live in a world', ' about 8am every morning', 'stopped dreaming today', 'still dreaming of a car', 'One more dream to come late tomorrow', ' coming to hope tomorrow']

This gives your required output, except that the fourth item comes out as an empty string rather than " ":
result = [re.sub(r'\bto\b *', '', re.sub(r'^.*Dream[^ ]* *[^ ]* *', '', a, flags=re.I)) for a in text]
If you only want to remove a "to" at the front of what remains:
result = [re.sub(r'^.*Dream[^ ]* *[^ ]* *(\bto\b)? *', '', a, flags=re.I) for a in text]
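As a rough check of the first one-liner against the sample input, the sketch below splits it into two named steps. The helper name `strip_dream` is made up for illustration; the two patterns are the ones from the answer.

```python
import re

text = ["Dream of a car",
        "Dream to live in a world",
        "Dream about 8am every morning",
        "stopped dreaming today",
        "still dreaming of a car",
        "One more dream to come late tomorrow",
        "Dream coming to hope tomorrow"]


def strip_dream(phrase):
    # Step 1: drop everything up to the last "dream*" word plus the word after it.
    rest = re.sub(r'^.*Dream[^ ]* *[^ ]* *', '', phrase, flags=re.I)
    # Step 2: drop any remaining standalone "to" and the spaces after it.
    return re.sub(r'\bto\b *', '', rest)


result = [strip_dream(a) for a in text]
print(result)
# ['a car', 'live in a world', '8am every morning', '', 'a car', 'come late tomorrow', 'hope tomorrow']
```

Because `.*` is greedy, the match runs up to the last "dream" in the string, which is what "remove all previous words" asks for.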

Related

Eliminating a white spaces from a string except for end of the string

I want to eliminate white spaces in a string except for end of the string
code:
sentence = ['He must be having a great time/n ', 'It is fun to play chess ', 'Sometimes TT is better than Badminton ']
pattern = "\s+^[\s+$]"
res = [re.sub(pattern,', ', line) for line in sentence]
print(res)
But the output is the same as the input list:
['He must be having a great time/n ', 'It is fun to play chess ', 'Sometimes TT is better than Badminton ']
Can anyone suggest the right solution?
expected output:
['He,must,be,having,a,great,time', 'It,is,fun,to,play,chess', 'Sometimes,TT,is,better,than,Badminton ']
We can first strip off leading/trailing whitespace, then do a basic replacement of space to comma:
import re
sentence = ['He must be having a great time\n ', 'It is fun to play chess ', 'Sometimes TT is better than Badminton ']
output = [re.sub(r'\s+', ',', x.strip()) for x in sentence]
print(output)
This prints:
['He,must,be,having,a,great,time',
'It,is,fun,to,play,chess',
'Sometimes,TT,is,better,than,Badminton']
You can use a simpler split/join method (timeit: 1.48 µs ± 74 ns).
str.split() will split on groups of whitespace characters (space or newline for instance).
str.join(iter) will join the elements of iter with the str it is used on.
Demo:
sentence = [
"He must be having a great time\n ",
"It is fun to play chess ",
"Sometimes TT is better than Badminton ",
]
[",".join(s.split()) for s in sentence]
gives
['He,must,be,having,a,great,time',
'It,is,fun,to,play,chess',
'Sometimes,TT,is,better,than,Badminton']
Second method, strip/replace (timeit: 1.56 µs ± 107 ns).
str.strip() removes all whitespace characters at the beginning and at the end of str.
str.replace(old, new) replaces all occurrences of old in str with new (this works because you have single spaces between words in your strings).
Demo:
sentence = [
"He must be having a great time\n ",
"It is fun to play chess ",
"Sometimes TT is better than Badminton ",
]
[s.strip().replace(" ", ",") for s in sentence]
gives
['He,must,be,having,a,great,time',
'It,is,fun,to,play,chess',
'Sometimes,TT,is,better,than,Badminton']
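The timings quoted above depend on the machine and on how the measurement was set up, which the answer does not say. A minimal sketch for repeating the comparison with the standard `timeit` module might look like this; the function names and the repeat count of 10,000 are arbitrary choices, not taken from the answers.

```python
import re
import timeit

sentence = [
    "He must be having a great time\n ",
    "It is fun to play chess ",
    "Sometimes TT is better than Badminton ",
]


def with_regex(items):
    return [re.sub(r'\s+', ',', s.strip()) for s in items]


def with_split_join(items):
    return [",".join(s.split()) for s in items]


def with_strip_replace(items):
    # Only equivalent to the others when words are separated by single spaces.
    return [s.strip().replace(" ", ",") for s in items]


candidates = (with_regex, with_split_join, with_strip_replace)

# All three should agree on this input before the timings mean anything.
results = [func(sentence) for func in candidates]
assert results[0] == results[1] == results[2]

for func in candidates:
    seconds = timeit.timeit(lambda: func(sentence), number=10_000)
    print(f"{func.__name__}: {seconds / 10_000 * 1e6:.2f} µs per call")
```

The printed numbers will differ from run to run, so treat them as a rough ranking only.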
def eliminating_white_spaces(strings):
    # Drop leading/trailing whitespace, then turn each remaining space into a comma
    for index in range(len(strings)):
        strings[index] = str(strings[index]).strip().replace(' ', ',')
    return strings

Python remove sentence if it is at start of string and starts with specific words?

I have strings that look like:
docs = ['Hi, my name is Eric. Are you blue?',
"Hi, I'm ! What is your name?",
'This is a great idea. I would love to go.',
'Hello, I am Jane Brown. What is your name?',
"Hello, I am a doctor! Let's go to the mall.",
'I am ready to go. Mom says hello.']
I want to remove the first sentence of a string if it starts with 'Hi' or 'Hello'.
Desired Output:
docs = ['Are you blue?',
'What is your name?',
'This is a great idea. I would love to go.',
'What is your name?',
"Let's go to the mall.",
'I am ready to go. Mom says hello.']
The regex I have is:
re.match('.*?[a-z0-9][.?!](?= )', x)
But this only gives me the first sentence, in a weird format like:
<re.Match object; span=(0, 41), match='Hi, my name is Eric.'>
What can I do to get my desired output?
You can use
docs = [re.sub(r'^H(?:ello|i)\b.*?[.?!]\s+', '', doc) for doc in docs]
Pattern details:
^ - start of string
H(?:ello|i)\b - Hello or Hi word (\b is a word boundary)
.*? - any zero or more chars other than line break chars as few as possible
[.?!] - a ., ? or !
\s+ - one or more whitespaces.
See the Python demo:
import re
docs = ['Hi, my name is Eric. Are you blue?',
"Hi, I'm ! What is your name?",
'This is a great idea. I would love to go.',
'Hello, I am Jane Brown. What is your name?',
"Hello, I am a doctor! Let's go to the mall.",
'I am ready to go. Mom says hello.']
docs = [re.sub(r'^H(?:ello|i)\b.*?[.?!]\s+', '', doc) for doc in docs]
print(docs)
Output:
[
'Are you blue?',
'What is your name?',
'This is a great idea. I would love to go.',
'What is your name?',
"Let's go to the mall.",
'I am ready to go. Mom says hello.'
]
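The pattern breakdown in the answer above can also be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. This is only a restatement of the same regex; the name `GREETING_SENTENCE` is made up for this sketch.

```python
import re

GREETING_SENTENCE = re.compile(r"""
    ^               # start of string
    H(?:ello|i)\b   # the whole word Hello or Hi
    .*?             # as few characters as possible, no line breaks
    [.?!]           # sentence-ending punctuation
    \s+             # one or more whitespace characters
""", re.VERBOSE)

docs = ['Hi, my name is Eric. Are you blue?',
        "Hi, I'm ! What is your name?",
        'This is a great idea. I would love to go.',
        'Hello, I am Jane Brown. What is your name?',
        "Hello, I am a doctor! Let's go to the mall.",
        'I am ready to go. Mom says hello.']

cleaned = [GREETING_SENTENCE.sub('', doc) for doc in docs]
print(cleaned)
```

Since the pattern requires whitespace after the punctuation, a string that consists of a single greeting sentence and nothing else is left unchanged.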
You would first have to split each string into sentences:
splitted_docs = []
for doc in docs:
    splitted_docs.append(doc.split('.'))
Then check each sentence for Hi or Hello with your regex and add the sentences you want to keep to the final array:
final_docs = []
for sentences in splitted_docs:
    final_sentence = []
    for sentence in sentences:
        if not re.match('.*?[a-z0-9][.?!](?= )', sentence):
            final_sentence.append(sentence)
    final_docs.append('.'.join(final_sentence))
Actually, your regex does not work for this, so I changed the code to use a plain substring check instead. It goes as follows:
for sentences in splitted_docs:
    final_sentence = []
    for sentence in sentences:
        if 'Hello' not in sentence and 'Hi' not in sentence:
            final_sentence.append(sentence)
    final_docs.append('.'.join(final_sentence))
Finally, filter your array to remove all the empty strings that may have been created in the process of joining:
final_docs = list(filter(lambda x: x != '', final_docs))
print(final_docs)
Output:
[' Are you blue?', 'This is a great idea. I would love to go.', ' What is your name?', 'I am ready to go. Mom says hello.']
I'll leave the full code here. Any suggestions are welcome; I am sure this can be solved with a more functional approach that may be easier to understand, but I am not familiar enough with that style.
import re
docs = ['Hi, my name is Eric. Are you blue?',
"Hi, I'm ! What is your name?",
'This is a great idea. I would love to go.',
'Hello, I am Jane Brown. What is your name?',
"Hello, I am a doctor! Let's go to the mall.",
'I am ready to go. Mom says hello.']
splitted_docs = []
for doc in docs:
    splitted_docs.append(doc.split('.'))
final_docs = []
for sentences in splitted_docs:
    final_sentence = []
    for sentence in sentences:
        if 'Hello' not in sentence and 'Hi' not in sentence:
            final_sentence.append(sentence)
    final_docs.append('.'.join(final_sentence))
final_docs = list(filter(lambda x: x != '', final_docs))
print(final_docs)
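The split-based output above loses two of the six strings and keeps leading spaces, because splitting on '.' does not see sentences that end in ! or ?. A possible variant, sketched here under the assumption that a sentence ends at the first ., ? or ! followed by whitespace, only looks at the first sentence of each string. The helper name `drop_greeting` is hypothetical.

```python
import re

docs = ['Hi, my name is Eric. Are you blue?',
        "Hi, I'm ! What is your name?",
        'This is a great idea. I would love to go.',
        'Hello, I am Jane Brown. What is your name?',
        "Hello, I am a doctor! Let's go to the mall.",
        'I am ready to go. Mom says hello.']


def drop_greeting(doc):
    # Split off the first sentence only: break at the first whitespace after ., ? or !
    parts = re.split(r'(?<=[.?!])\s+', doc, maxsplit=1)
    # Remove that sentence only if it starts with the whole word Hi or Hello.
    if len(parts) == 2 and re.match(r'(?:Hi|Hello)\b', parts[0]):
        return parts[1]
    return doc


print([drop_greeting(doc) for doc in docs])
```

Every input string produces exactly one output string, so no filtering of empty results is needed afterwards.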

Merging consecutive elements recursively in a list of strings in Python

I have a list of strings which needs to be transformed into a smaller list of strings, depending on whether two consecutive elements belong to the same phrase.
At the moment, this is the case if the last character of the i-th string is lowercase and the first character of the (i+1)-th string is also lowercase, but more complex conditions should be checked in the future.
For example this very profound text:
['I am a boy',
'and like to play'
'My friends also'
'like to play'
'Cats and dogs are '
'nice pets, and'
'we like to play with them'
]
should become:
['I am a boy and like to play',
'My friends also like to play',
'Cats and dogs are nice pets, and we like to play with them'
]
My python solution
I think the data you have posted is meant to be comma-separated. If so, here is a simple loop solution:
data=['I am a boy',
'and like to play',
'My friends also',
'like to play',
'Cats and dogs are ',
'nice pets, and',
'we like to play with them'
]
required_list = []
for j, i in enumerate(data):
    print(i, j)
    if j == 0:
        req = i
    else:
        if i[0].isupper():
            required_list.append(req)
            req = i
        else:
            req = req + " " + i
required_list.append(req)
print(required_list)
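Here is another approach. Join the strings, then split the result again at every capital letter:
import re
data = ['I am a boy',
'and like to play',
'My friends also',
'like to play',
'Cats and dogs are ',
'nice pets, and',
'we like to play with them'
]
# Join with single spaces so that commas inside a phrase are preserved
joined_string = " ".join(data)
# Each match starts at a capital letter and runs up to the next capital letter
values = re.findall('[A-Z][^A-Z]*', joined_string)
print(values)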
Since you want to do it recursively, you can try something like this:
def join_text(text, new_text):
    if not text:
        return
    if not new_text:
        new_text.append(text.pop(0))
        return join_text(text, new_text)
    phrase = text.pop(0)
    if phrase[0].islower():  # you can add more complicated logic here
        new_text[-1] += ' ' + phrase
    else:
        new_text.append(phrase)
    return join_text(text, new_text)
phrases = [
'I am a boy',
'and like to play',
'My friends also',
'like to play',
'Cats and dogs are ',
'nice pets, and',
'we like to play with them'
]
joined_phrases = []
join_text(phrases, joined_phrases)
print(joined_phrases)
My solution has some problems with whitespace, but I hope you got the idea.
Hope it helps!
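Since the question expects more complex conditions later, one option is to pass the merge rule in as a function. The sketch below is iterative rather than recursive and is not taken from any of the answers; `merge_phrases` and `both_lowercase` are hypothetical names, and the rule shown is the one stated in the question (last character of the previous string and first character of the next are both lowercase).

```python
def both_lowercase(previous, current):
    # The question's rule, ignoring whitespace at the seam between the two strings.
    previous, current = previous.rstrip(), current.lstrip()
    return bool(previous) and bool(current) and previous[-1].islower() and current[0].islower()


def merge_phrases(phrases, should_merge):
    merged = []
    for phrase in phrases:
        if merged and should_merge(merged[-1], phrase):
            # Trim at the seam so that exactly one space separates the two parts.
            merged[-1] = merged[-1].rstrip() + ' ' + phrase.lstrip()
        else:
            merged.append(phrase)
    return merged


phrases = ['I am a boy',
           'and like to play',
           'My friends also',
           'like to play',
           'Cats and dogs are ',
           'nice pets, and',
           'we like to play with them']

merged = merge_phrases(phrases, both_lowercase)
print(merged)
# ['I am a boy and like to play', 'My friends also like to play',
#  'Cats and dogs are nice pets, and we like to play with them']
```

A different rule can be swapped in by passing another two-argument function, and the input list is left unmodified.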

Split string based on delimiter and word using re

I am using Python for natural language processing. I am trying to split my input string using re. I want to split on the characters ;,. as well as on the word but.
import re
print (re.split("[;,.]", 'i am; working here but you are. working here, as well'))
['i am', ' working here but you are', ' working here', ' as well']
How do I do that? When I put the word but into the regex, every one of its characters is treated as a splitting criterion. How do I get the following output?
['i am', ' working here', 'you are', ' working here', ' as well']
You can use an alternation: but |[;,.]
It splits on any of the characters ; , . and also on the word but (followed by a space).
import re
print (re.split("but |[;,.]", 'i am; working here but you are. working here, as well'))
Hope this helps.
This one works too:
import re
print (re.split(r'; |, |\. | but', 'i am; working here but you are. working here, as well'))
Output:
['i am', 'working here', ' you are', 'working here', 'as well']
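Both answers above leave some stray spaces in the pieces. A small variation, offered as a sketch rather than as the answerers' method, lets the delimiter absorb the surrounding whitespace and matches but only as a whole word.

```python
import re

sentence = 'i am; working here but you are. working here, as well'

# \bbut\b matches "but" only as a whole word; \s* on both sides absorbs the spaces.
parts = re.split(r'\s*(?:\bbut\b|[;,.])\s*', sentence)
print(parts)
# ['i am', 'working here', 'you are', 'working here', 'as well']
```

If the input ends with a delimiter such as a full stop, the result will contain a trailing empty string that may need to be filtered out.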

Python regular expression search

I was trying to search for all those phrases with the key word 'car':
e.g. text = 'alice: speed car, my red car, new car', I would like to find 'speed car', 'my red car', 'new car'.
import re
text = 'alice: speed car, my red car, new car'
regex = r'([a-zA-Z]+\s)+car'
match = re.findall(regex, text)
if match:
    print(match)
but the above code yields:
["speed ", "red ", "new "]
instead of
["speed car", "my red car", "new car"]
which is expected?
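The problem is that you're not capturing 'car' in your regex. When a pattern contains a capturing group, re.findall returns the contents of the group rather than the whole match. Put the whole regex inside () and use ?: on the inner group to make it non-capturing.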
>>> regex = r'((?:[a-zA-Z]+\s)+car)'
>>> text = 'alice: speed car, my red car, new car'
>>> re.findall(regex, text)
['speed car', 'my red car', 'new car']
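An alternative that leaves the question's original pattern untouched is `re.finditer`, which yields match objects; `group(0)` is always the whole match, regardless of any capturing groups. A minimal sketch:

```python
import re

text = 'alice: speed car, my red car, new car'
regex = r'([a-zA-Z]+\s)+car'  # the pattern from the question, unchanged

# group(0) is the full match; the capturing group only holds the last repeated word.
phrases = [m.group(0) for m in re.finditer(regex, text)]
print(phrases)
# ['speed car', 'my red car', 'new car']
```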
