How to print a text without substring in Python

I want to search for a word in the text and then print the text without that word. For example, given the text "I was with my friend", I want the result to be "I with my friend". I have done the following so far:
import re
text = re.compile("[^was]")
val = "I was with my friend"
if text.search(val):
    print(text.search(val))  # this line is obviously wrong
else:
    print('no')

val = "I was with my friend"
print(val.replace("was ", ""))
Output:
I with my friend

If you want to remove what you've found using a regular expression:
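match = text.search(val)
if match is not None:
    print(val.replace(match.group(0), ""))
(However, if you were searching for the word was, your pattern is wrong: [^was] is a character class that matches any single character other than w, a or s, so here it matches the "I". To match the word itself, use a pattern such as r"\bwas\b".)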

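Substitute an empty string if it matches:
import re
# \b keeps "was" from matching inside other words; \s* also removes the space after it
text = re.compile(r"\bwas\b\s*")
val = "I was with my friend"
if text.search(val):
    print(text.sub('', val))
else:
    print('no')
Or you can split on the match and join the pieces again:
if text.search(val):
    print(''.join(text.split(val)))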
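A minimal sketch that wraps the substitution idea into a reusable helper. The name remove_word is hypothetical, and it assumes the word consists of word characters (letters, digits, underscore) so that the \b word boundaries behave as expected; re.escape keeps any regex metacharacters in the word literal. Unlike a single str.replace, it removes every whole-word occurrence and then collapses the leftover double spaces.

```python
import re


def remove_word(text, word):
    """Remove every whole-word occurrence of `word` from `text` (hypothetical helper)."""
    # \b on both sides keeps "was" from matching inside a longer word such as "washing"
    pattern = r"\b{}\b".format(re.escape(word))
    without = re.sub(pattern, "", text)
    # collapse the double spaces left behind and trim the ends
    return " ".join(without.split())


print(remove_word("I was with my friend", "was"))  # I with my friend
```

Because of the final split/join, any tabs or newlines in the text are also turned into single spaces; drop that step if the original spacing matters.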

Maybe something like this:
print(val[:val.index('was')] + val[val.index('was ') + 4:])
This example assumes that the word is was, but you can define a variable and use it instead (the + 1 skips the space that follows the word):
search_word = 'was'
print(val[:val.index(search_word)] + val[val.index(search_word) + len(search_word) + 1:])
Also, this only works for the first occurrence of search_word, and it doesn't check whether the text contains the word at all: str.index raises ValueError if it is not found.
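If you would rather avoid both regexes and index arithmetic, here is a sketch that assumes words are separated by whitespace: split the text into tokens, drop the ones equal to the search word, and join the rest. It removes every occurrence and, when the word is absent, simply returns the text (with normalized spacing) instead of raising ValueError. The name drop_word is hypothetical, and a token with punctuation attached, such as "was,", is not removed.

```python
def drop_word(text, search_word):
    """Return text without any whitespace-separated token equal to search_word (hypothetical helper)."""
    return " ".join(token for token in text.split() if token != search_word)


print(drop_word("I was with my friend", "was"))  # I with my friend
print(drop_word("I was what I was", "was"))      # I what I
print(drop_word("no match here", "was"))         # no match here
```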

To check for a substring, simply do:
if 'was' in 'i was with my friend':
    print(...)

Related

How to remove whitespaces in a string except from between certain elements

I have a string similar to (the below one is simplified):
" word= {his or her} whatever "
I want to delete every whitespace except between {}, so that my modified string will be:
"word={his or her}whatever"
lstrip or rstrip doesn't work, of course. If I delete all whitespace, the whitespace between {} is deleted as well. I tried to look up solutions for limiting the replace function to certain areas, but even when I found some I wasn't able to implement them. There is some stuff with regex (I am not sure if it is relevant here), but I haven't been able to understand it.
EDIT: If I wanted to exclude the area between, say, {} and "" as well, that is,
if I wanted to turn this string:
" word= {his or her} and "his or her" whatever "
into this:
"word={his or her}and"his or her"whatever"
What would I change
re.sub(r'\s+(?![^{]*})', '', list_name) into?
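Instead of going around with re, you can use str.replace, which is much easier and less complex when you are playing around with strings, especially since multiple substitutions end up as a bigger regex. Note that this only removes the spaces next to =, } and ", so it relies on the layout of these examples.
def squeeze(s):
    # collapse every run of whitespace to a single space and trim the ends
    new = " ".join(s.split())
    # then drop the single spaces next to '=', '}' and '"'
    return new.replace("= ", "=").replace("} ", "}").replace('" ', '"').replace(' "', '"')
st = " word= {his or her} whatever "
st2 = """ word= {his or her} and "his or her" whatever """
print(squeeze(st))
print(squeeze(st2))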
Some outputs
Example 1 output
word={his or her}whatever
Example 2 output
word={his or her}and"his or her"whatever
You can use replace:
def remove(string):
    return string.replace(" ", "")
string = 'hell o whatever'
print(remove(string))  # Output: hellowhatever
Note that this removes every space, including the ones between the braces.
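For the EDIT, here is a regex sketch that takes a different route from the lookahead in the question: match either a protected region ({...} or "...") or a run of whitespace, and let a replacement function keep the protected regions while dropping the whitespace. It assumes the braces and quotes are balanced and not nested; PROTECTED_OR_SPACE and squeeze_outside are hypothetical names.

```python
import re

# Group 1 captures a protected region; otherwise the pattern matches a whitespace run.
PROTECTED_OR_SPACE = re.compile(r'(\{[^}]*\}|"[^"]*")|\s+')


def squeeze_outside(s):
    """Remove whitespace everywhere except inside {...} and "..." (hypothetical helper)."""
    # keep a protected region as-is; replace a whitespace run with nothing
    return PROTECTED_OR_SPACE.sub(lambda m: m.group(1) or "", s)


print(squeeze_outside(' word= {his or her} whatever '))
print(squeeze_outside(' word= {his or her} and "his or her" whatever '))
```

Because a match starting at { or " consumes the whole region, the whitespace inside it is never matched by the \s+ branch. Supporting another delimiter pair means adding one more alternative inside group 1.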

Python - If a "full" word is inside a text, print out else don't

I am having an issue where I want to check whether a word occurs inside a text: if the text contains the word, it should print it out.
The problem: say my word is "lo" and my text is "hello guys, my name is Stackoverflow". It prints out the whole text, because "lo" appears inside "hello" and "Stackoverflow".
My question is: how can I make a search for a word such as "lo" treat it as a whole word, so that it does not print when "lo" is only contained inside a word such as "hello" or "stackoverflow", and only prints if the text contains the word "lo" itself?
keywords = ["Lo"]
for text in keywords:
    if text in text_roman():  # text_roman() is the asker's own function (not shown) returning the text
        print("Yay found word")
Split up the string into words, then compare each word with the search term (here s is the text and q is the word you are looking for):
for word in s.split():
    if word == q:
        print(word)
You could do this, but there are plenty of edge cases (punctuation attached to words, different capitalization, ...) that will make it a problem.
text = "This is my text"
keywords = ["Lo"]
if len(set(text.split()).intersection(set(keywords))) > 0:
    print("Yes")
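A sketch that handles two of those edge cases, capitalization and punctuation stuck to the ends of a word, still assuming words are whitespace-separated. The name contains_word is hypothetical. Punctuation inside a token is left alone, so "lo-fi" is not treated as containing the word "lo".

```python
import string


def contains_word(text, word):
    """True if `word` appears as a whole word in `text`, ignoring case and edge punctuation."""
    # strip punctuation from both ends of each token, then lowercase it
    words = {token.strip(string.punctuation).lower() for token in text.split()}
    return word.lower() in words


print(contains_word("hello guys, my name is Stackoverflow", "lo"))  # False
print(contains_word("Hello, lo!", "LO"))                            # True
```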
Use str.find(). It returns the index of the substring you're looking for, or -1 if it is not found, so you can use an if statement to check whether it is a substring or not. Note that, like the in operator, it also matches inside longer words.
s = 'Hello there Stack!'
if s.find('llo') != -1:
    print('String found')
Hope this helped!
The most straightforward way is probably to use a regex. You can play around with regexes here and figure out how to implement them in python here.
import re
target_strings = ["lo", "stack", "hell", "cow", "hello", "overf"]
for target in target_strings:
    # \b marks a word boundary; re.escape keeps regex metacharacters in target literal
    re_target = re.compile(r"\b({})\b".format(re.escape(target)), flags=re.IGNORECASE)
    if re_target.search("Hello stack overflow lo"):
        print(target)
Output:
lo
stack
hello
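If you have a whole list of keywords, as in the question, a sketch that checks them in one pass by joining them into a single alternation. The name find_keywords is hypothetical. findall returns the matched text as it appears in the input (so "Hello", not "hello"), in the order the matches occur.

```python
import re


def find_keywords(text, keywords):
    """Return the whole-word keyword matches in text, in order of appearance (hypothetical helper)."""
    # one alternation for all keywords; re.escape keeps metacharacters literal
    alternation = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(?:{})\b".format(alternation), flags=re.IGNORECASE)
    return pattern.findall(text)


print(find_keywords("Hello stack overflow lo", ["lo", "stack", "hell", "cow", "hello", "overf"]))
# ['Hello', 'stack', 'lo']
```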

Find first word in string Python

I have to write a single function that should return the first word in the following strings:
("Hello world") -> return "Hello"
(" a word ") -> return "a"
("don't touch it") -> return "don't"
("greetings, friends") -> return "greetings"
("... and so on ...") -> return "and"
("hi") -> return "hi"
All have to return the first word and as you can see some start with a whitespace, have apostrophes or end with commas.
I've used the following options:
return text.split()[0]
return re.split(r'\w*', text)[0]
Both fail on some of the strings, so who can help me?
Try the code below. I tested it with all your inputs and it works fine.
import re
text = ["Hello world", " a word ", "don't touch it", "greetings, friends", "... and so on ...", "hi"]
# a word: either a single word character, or word characters with apostrophes allowed in between
rgx = re.compile(r"(\w[\w']*\w|\w)")
for i in text:
    out = rgx.findall(i)
    print(out[0])
Output:
Hello
a
don't
greetings
and
hi
It is tricky to distinguish apostrophes that are part of a word from single quotes used as punctuation. But since your input examples do not show single quotes, I can go with this:
re.match(r'\W*(\w[^,. !?"]*)', text).group(1)
For all your examples, this works. It won't work for atypical stuff like "'tis all in vain!", though. It assumes that words end at commas, dots, spaces, bangs, question marks, and double quotes. This list can be extended on demand (in the brackets).
Try this one:
>>> import re
>>> def pm(s):
...     p = r"[a-zA-Z][\w']*"
...     m = re.search(p, s)
...     print(m.group(0))
...
Test results:
>>> pm("don't touch it")
don't
>>> pm("Hello w")
Hello
>>> pm("greetings, friends")
greetings
>>> pm("... and so on...")
and
>>> pm("hi")
hi
A non-regex solution: strip off leading punctuation/whitespace characters, split the string to get the first word, then remove trailing punctuation/whitespace:
from string import punctuation, whitespace
def first_word(s):
    to_strip = punctuation + whitespace
    return s.lstrip(to_strip).split(' ', 1)[0].rstrip(to_strip)
tests = [
    "Hello world",
    "a word",
    "don't touch it",
    "greetings, friends",
    "... and so on ...",
    "hi"]
for test in tests:
    print('#{}#'.format(first_word(test)))
Outputs:
#Hello#
#a#
#don't#
#greetings#
#and#
#hi#
You can try something like this:
import re
pattern = r"[a-zA-Z']+"
def first_word(text):
    match = re.findall(pattern, text)
    # skip tokens that start with an apostrophe
    for i in match:
        if i[0].isalnum():
            return i
print(first_word("don't touch it"))
output:
don't
I've done this by using the first occurrence of whitespace to stop the "getting" of the first word. Something like this:
stringVariable = "Hello world"  # whatever sentence you want to process
firstWord = ""
stringVariableLength = len(stringVariable)
for i in range(0, stringVariableLength):
    if stringVariable[i] != " ":
        firstWord = firstWord + stringVariable[i]
    else:
        break
This code goes through the string variable that you want to get the first word of, adding each character to a new variable called firstWord, until it reaches the first occurrence of whitespace. I'm not exactly sure how you would put that into a function as I'm pretty new to this whole thing, but I'm sure it could be done!
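The loop above can be wrapped in a function. Here is a sketch that also skips leading whitespace (so " a word " gives "a" rather than an empty string) and stops at any whitespace character, not only a space. It keeps punctuation, so "greetings, friends" gives "greetings," and "... and so on ..." gives "...", which is why the regex answers handle those inputs better.

```python
def first_word(sentence):
    """Collect characters up to the first whitespace, after skipping leading whitespace."""
    word = ""
    for ch in sentence.lstrip():
        if ch.isspace():
            break
        word += ch
    return word


for s in ["Hello world", " a word ", "don't touch it", "hi"]:
    print(first_word(s))
```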

Python - Print Each Sentence On New Line

Per the subject, I'm trying to print each sentence in a string on a new line. Given the code and output shown below, what's the syntax to get the "Correct Output" shown below?
Code
sentence = 'I am sorry Dave. I cannot let you do that.'
def format_sentence(sentence):
    sentenceSplit = sentence.split(".")
    for s in sentenceSplit:
        print(s + ".")
Output
I am sorry Dave.
I cannot let you do that.
.
None
Correct Output
I am sorry Dave.
I cannot let you do that.
You can do this:
def format_sentence(sentence):
    # filter(None, ...) drops the empty strings produced by split
    sentenceSplit = filter(None, sentence.split("."))
    for s in sentenceSplit:
        print(s.strip() + ".")
There are some issues with your implementation. First, as Jarvis points out in his answer, if your delimiter is the first or last character in your string, or if two delimiter characters are right next to each other, split puts an empty string into the resulting list. To fix this, you need to filter out the empty strings. Also, instead of using the + operator, use string formatting.
def format_sentence(sentences):
    sentences_split = filter(None, sentences.split('.'))
    for s in sentences_split:
        print('{0}.'.format(s.strip()))
You can split the string on ". " instead of ".", then print each piece with an added "." except the last one, which already ends with a ".".
def format_sentence(sentence):
    sentenceSplit = sentence.split(". ")
    for s in sentenceSplit[:-1]:
        print(s + ".")
    print(sentenceSplit[-1])
Try:
def format_sentence(sentence):
    print(sentence.replace('. ', '.\n'))
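A sketch that splits after ".", "!" or "?" followed by whitespace and returns a list instead of printing. The stray None in the question's output most likely comes from printing the return value of format_sentence, which returns None; returning a list and printing in the caller avoids that. The name split_sentences is hypothetical, and abbreviations such as "Mr. Smith" would still be split in the wrong place.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace (hypothetical helper)."""
    # the lookbehind keeps the punctuation attached to the sentence it ends
    return re.split(r"(?<=[.!?])\s+", text.strip())


for line in split_sentences('I am sorry Dave. I cannot let you do that.'):
    print(line)
```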

Find term with multiple words from dictionary

So I'm doing a project where I'm finding words from a file and then checking whether they are in the dictionary. I don't know if I'm using the proper syntax, because it prints the else branch saying that it doesn't find "does not work" in the dictionary.
Does it have anything to do with the spaces in between?
# test for term with multiple words -- does not work: -3
if 'does not work' in dictionary:
    expected_value3 = str(-3)
    actual_value3 = dictionary['does not work']
    if actual_value3 == expected_value3:
        print("---------------------------------")
        print("words with spaces passes| word: does not work")
    else:
        print("---------------------------------")
        print("words with spaces FALSE| word: does not work")
else:
    print("---------------------------------")
    print("does not work not in dictionary")
To deal with phrases, you can't split a line of a dictionary file in the format word def on spaces, because word might be made up of several words with spaces in between. You need a character that won't appear in word or def to separate them, for instance a tab \t or a pipe |, and then build your dictionary like so:
d = {}
with open('dict.txt') as df:
    for line in df:
        # drop the trailing newline, then split only on the first tab
        word, definition = line.rstrip('\n').split('\t', 1)
        d[word] = definition
Otherwise you end up with
sentiments = ['does','not','work','the','operation',...]
in your loop, and you end up setting
dictionary['does'] = 'not'
with the code:
for line in scores_file:
    sentiments = line.split()
    dictionary[sentiments[0]] = sentiments[1]
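If the score is always the last field on a line (an assumption about the file format, which the question does not show), you could keep a whitespace-separated file and split once from the right instead of switching to a tab separator. In this sketch load_scores is a hypothetical name, io.StringIO stands in for the real scores_file, and every non-blank line is assumed to have at least two fields. Scores stay strings, matching the question's comparison against str(-3).

```python
import io


def load_scores(lines):
    """Map phrase -> score, assuming the score is the last whitespace-separated field."""
    scores = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # split once from the right, so the phrase keeps its inner spaces
        phrase, score = line.rsplit(None, 1)
        scores[phrase] = score
    return scores


scores_file = io.StringIO("does not work\t-3\ngood 3\n")
dictionary = load_scores(scores_file)
print(dictionary['does not work'])  # -3
```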
