How to print specific words in colour in Python?

I want to print a specific word a different color every time it appears in the text. In the existing code, I've printed the lines that contain the relevant word "one".
import json
from colorama import Fore
fh = open(r"fle.json")
corpus = json.loads(fh.read())
for m in corpus['smsCorpus']['message']:
    identity = m['#id']
    text = m['text']['$']
    strtext = str(text)
    utterances = strtext.split()
    if 'one' in utterances:
        print(identity,text, sep ='\t')
I imported Fore but I don't know where to use it. I want to use it to have the word "one" in a different color.
output (section of)
44814 Ohhh that's the one Johnson told us about...can you send it to me?
44870 Kinda... I went but no one else did, I so just went with Sarah to get lunch xP
44951 No, it was directed in one place loudly and stopped when I stoppedmore or less
44961 Because it raised awareness but no one acted on their new awareness, I guess
44984 We need to do a fob analysis like our mcs onec
Thank you

You could also just use the ANSI color codes in your strings:
# define aliases to the color-codes
red = "\033[31m"
green = "\033[32m"
blue = "\033[34m"
reset = "\033[39m"
t = "That was one hell of a show for a one man band!"
utterances = t.split()
if "one" in utterances:
    # figure out the list-indices of occurrences of "one"
    idxs = [i for i, x in enumerate(utterances) if x == "one"]
    # modify the occurrences by wrapping them in ANSI sequences
    for i in idxs:
        utterances[i] = red + utterances[i] + reset
# join the list back into a string and print
utterances = " ".join(utterances)
print(utterances)
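A possible variation, as a minimal sketch: the split-based version above misses "one" when it is followed by punctuation (for example "one,") and collapses the original whitespace when joining. Using re.sub with word boundaries avoids both. The function name colour_word and its parameters are made up for this illustration.

```python
import re

RED = "\033[31m"    # ANSI code for red foreground
RESET = "\033[39m"  # ANSI code for default foreground

def colour_word(text, word, colour=RED, reset=RESET):
    """Wrap every whole-word occurrence of `word` in ANSI colour codes."""
    # re.escape guards against regex metacharacters in the word;
    # \b keeps "onec" or "someone" from being coloured.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda m: colour + m.group(0) + reset, text)

print(colour_word("Kinda... I went but no one else did, not even one, onec", "one"))
```

The same function should work with colorama constants (for example Fore.RED as colour), since those are plain ANSI strings as well.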

If you only have 1 coloured word you can use this I think, you can expand the logic for n coloured words:
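from colorama import Fore, Style

our_str = "Ohhh that's the one Johnson told us about...can you send it to me?"
def colour_one(our_str):
    if "one" in our_str:
        # partition splits at the first "one" only, so a second
        # occurrence does not break the unpacking
        str1, _, str2 = our_str.partition("one")
        new_str = str1 + Fore.RED + 'one' + Style.RESET_ALL + str2
    else:
        new_str = our_str
    return new_str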
I think this is an ugly solution, not even sure if it works. But it's a solution if you can't find anything else.

I use the colour module from this link or the colored module from that link.
Furthermore, if you don't want to use a module for colouring, you can refer to this link or that link.

Related

Is there a way in python to count sentences having quotation marks, question mark and full stop?

I have been searching for the solution to this problem. I am writing a custom function to count number of sentences. I tried nltk and textstat for this problem but both are giving me different counts.
An Example of a sentence is something like this.
Annie said, "Are you sure? How is it possible? you are joking, right?"
NLTK is giving me --> count=3.
['Annie said, "Are you sure?', 'How is it possible?', 'you are joking, right?"']
another example:
Annie said, "It will work like this! you need to go and confront your friend. Okay!"
NLTK is giving me --> count=3.
Please suggest. The expected count is 1 as it is a single direct sentence.
I have written a simple function that does what you want:
def sentences_counter(text: str):
    end_of_sentence = ".?!…"
    # complete with whatever end of a sentence punctuation mark I might have forgotten
    # you might for instance want to add '\n'.
    sentences_count = 0
    sentences = []
    inside_a_quote = False
    start_of_sentence = 0
    last_end_of_sentence = -2
    for i, char in enumerate(text):
        # quote management, to solve your issue
        if char == '"':
            inside_a_quote = not inside_a_quote
            if not inside_a_quote and text[i-1] in end_of_sentence: # 🚩
                last_end_of_sentence = i # 🚩
        elif inside_a_quote:
            continue
        # basic management of sentences with the punctuation marks in `end_of_sentence`
        if char in end_of_sentence:
            last_end_of_sentence = i
        elif last_end_of_sentence == i-1:
            sentences.append(text[start_of_sentence:i].strip())
            sentences_count += 1
            start_of_sentence = i
    # same as the last block in case there is no end punctuation mark in the text
    # (stripped first, so trailing whitespace is not counted as a sentence)
    last_sentence = text[start_of_sentence:].strip()
    if last_sentence:
        sentences.append(last_sentence)
        sentences_count += 1
    return sentences_count, sentences
Consider the following:
text = '''Annie said, "Are you sure? How is it possible? you are joking, right?" No, I'm not... I thought you were'''
To generalize your problem a bit, I added 2 more sentences, one with ellipsis and the last one without even any end punctuation mark. Now, if I execute this:
sentences_count, sentences = sentences_counter(text)
print(f'{sentences_count} sentences detected.')
print(f'The detected sentences are: {sentences}')
I obtain this:
3 sentences detected.
The detected sentences are: ['Annie said, "Are you sure? How is it possible? you are joking, right?"', "No, I'm not...", 'I thought you were']
I think it works fine.
Note: the quote handling in my solution assumes American-style quotes, where the end punctuation mark of the sentence can sit inside the quote. Remove the two lines marked with the flag emoji 🚩 to disable this.
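An alternative idea, only as a rough sketch and not the method used in the function above: mask every double-quoted span first, then split on whitespace that follows an end-of-sentence mark. The function name count_sentences_masking_quotes and the placeholder "Q" are invented here, and the sketch assumes straight double quotes that are always balanced.

```python
import re

END_MARKS = ".?!…"

def count_sentences_masking_quotes(text):
    """Count sentences, treating each "quoted span" as a single unit."""
    def mask(match):
        inner = match.group(1)
        # Keep the quote's final punctuation mark (American style) so the
        # sentence containing the quote can still end there.
        tail = inner[-1] if inner and inner[-1] in END_MARKS else ""
        return "Q" + tail

    masked = re.sub(r'"([^"]*)"', mask, text)
    # Split on whitespace preceded by an end-of-sentence mark.
    parts = re.split(r'(?<=[.?!…])\s+', masked)
    return len([p for p in parts if p.strip()])

print(count_sentences_masking_quotes(
    'Annie said, "Are you sure? How is it possible? you are joking, right?"'
))
```

Because the quoted text is replaced before splitting, punctuation inside the quote can no longer produce extra sentences.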

Python: print condition

I realised that I asked the wrong question a few minutes ago, sorry about that. I am running code that needs to check whether the word at a certain location matches my condition.
The original text is not in English; I am using a simplified example to show the problem I had. There are actually no spaces between words in my language, so using split or re does not work.
I need to find the word before "car" to know whether someone loves the car or not, so I used the location as the condition to identify it.
For example (but this gets too long):
message="I do not like cars."
#print(message[14:18]) #cars starts from location 14
location = 14
if message[int(location)-5:int(location)-1]=="like":
    print("like")
elif message[int(location)-8:int(location)-1]=="dislike":
    print("dislike")
elif message[int(location)-5:int(location)-1]=="hate":
    print("hate")
elif message[int(location)-5:int(location)-1]=="cool":
    print("cool")
I actually used this one in my code, but found that I could not print the word:
if (
    message[int(location) - 5:int(location) - 1] == "like" or
    message[int(location) - 8:int(location) - 1] == "dislike" or
    message[int(location) - 5:int(location) - 1] == "hate" or
    message[int(location) - 5:int(location) - 1] == "cool"
):
    #print "like"
    #unable to do it
Is there any way I can solve it by printing the matching word?
Looks like you need Regex:
import re
message="I do not dislike cars."
check_list = {"like", "dislike", "hate", "cool"}
# \b has to wrap the whole alternation; inside the group it would only bind to the first and last word
pattern = re.compile(r"\b({})\b".format("|".join(check_list))) #or re.compile(r"({})".format("|".join(check_list)))
m = pattern.search(message)
if m:
    print(m.group(1)) # --> dislike
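Since the question mentions that the real text has no spaces between words, word boundaries may not help there. A minimal sketch of a position-based alternative, assuming the location of "cars" is already known: check which candidate word the text ends with just before that location. The helper name word_before is hypothetical.

```python
def word_before(message, location, candidates):
    """Return the candidate that ends right before `location`, or None."""
    # Text up to the target word; rstrip only matters for languages
    # that separate words with spaces.
    prefix = message[:location].rstrip()
    # Try longer candidates first so "dislike" wins over "like".
    for word in sorted(candidates, key=len, reverse=True):
        if prefix.endswith(word):
            return word
    return None

check_list = ["like", "dislike", "hate", "cool"]
print(word_before("I do not like cars.", 14, check_list))
print(word_before("I do not dislike cars.", 17, check_list))
```

This replaces the chain of slice comparisons with a single loop and returns the matching word, so it can be printed directly.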

How can I get words after and before a specific token?

I am currently working on a project that builds basic corpus databases and tokenizes texts, but I am stuck on one point. Assume that we have the following:
import os, re
texts = []
for i in os.listdir(somedir): # Somedir contains text files which contain very large plain texts.
    # os.listdir returns bare file names, so join them with the directory
    with open(os.path.join(somedir, i), 'r') as f:
        texts.append(f.read())
Now I want to find the word before and after a token.
myToken = 'blue'
found = []
for i in texts:
    fnd = re.findall(r'[a-zA-Z0-9]+ %s [a-zA-Z0-9]+|\. %s [a-zA-Z0-9]+|[a-zA-Z0-9]+ %s\.' %(myToken, myToken, myToken), i, re.IGNORECASE|re.UNICODE)
    found.extend(fnd)
print(myToken)
for i in found:
    print('\t\t%s' %(i))
I thought there would be three possibilities: The token might start sentence, the token might end sentence or the token might appear somewhere in the sentence, so I used the regex rule above. When I run, I come across those things:
blue
My blue car # What I exactly want.
he blue jac # That's not what I want. That must be "the blue jacket."
eir blue phone # Wrong! > their
a blue ali # Wrong! > alien
. Blue is # Okay.
is blue. # Okay.
...
I also tried \b\w\b or \b\W\b things, but unfortunately those returned no results at all rather than wrong results. I tried:
'\b\w\b%s\b[a-zA-Z0-9]+|\.\b%s\b\w\b|\b\w\b%s\.'
'\b\W\b%s\b[a-zA-Z0-9]+|\.\b%s\b\W\b|\b\W\b%s\.'
I hope the question is not too unclear.
I think what you want is:
(Optionally) a word and a space;
(Always) 'blue';
(Optionally) a space and a word.
Therefore one appropriate regex would be:
r'(?i)((?:\w+\s)?blue(?:\s\w+)?)'
For example:
>>> import re
>>> text = """My blue car
the blue jacket
their blue phone
a blue alien
End sentence. Blue is
is blue."""
>>> re.findall(r'(?i)((?:\w+\s)?{0}(?:\s\w+)?)'.format('blue'), text)
['My blue car', 'the blue jacket', 'their blue phone', 'a blue alien', 'Blue is', 'is blue']
See demo and token-by-token explanation here.
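A small variation on the same idea, as a sketch: escape the token, require word boundaries around it, and capture the neighbouring words in groups so they can be used directly. The function name neighbours_regex is made up for this example.

```python
import re

def neighbours_regex(token, text):
    """Return (word_before, word_after) pairs for each whole-word match of token."""
    pattern = re.compile(
        r'(?:(\w+)\s+)?'                  # optionally: a word and whitespace
        r'\b' + re.escape(token) + r'\b'  # always: the token as a whole word
        r'(?:\s+(\w+))?',                 # optionally: whitespace and a word
        re.IGNORECASE,
    )
    # A missing neighbour is reported as an empty string.
    return [(m.group(1) or '', m.group(2) or '') for m in pattern.finditer(text)]

print(neighbours_regex('blue', 'My blue car. Blue is nice'))
```

Note that matches do not overlap, so a word that is the neighbour of two adjacent tokens is only reported once.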
Let's say token is test.
(?=^test\s+.*|.*?\s+test\s+.*?|.*?\s+test$).*
You can use a lookahead. It will not consume anything, and it validates at the same time.
http://regex101.com/r/wK1nZ1/2
Regex can sometimes be slow (if not implemented correctly), and moreover the accepted answer did not work for me in several cases.
So I went for a brute-force solution (not saying it is the best one), where the keyword can consist of several words:
#staticmethod
def find_neighbours(word, sentence):
    prepost_map = []
    if word not in sentence:
        return prepost_map
    split_sentence = sentence.split(word)
    for i in range(0, len(split_sentence) - 1):
        prefix = ""
        postfix = ""
        prefix_list = split_sentence[i].split()
        postfix_list = split_sentence[i + 1].split()
        if len(prefix_list) > 0:
            prefix = prefix_list[-1]
        if len(postfix_list) > 0:
            postfix = postfix_list[0]
        prepost_map.append([prefix, word, postfix])
    return prepost_map
Empty string before or after the keyword indicates that keyword was the first or the last word in the sentence, respectively.

Python search for multiple values and show with boundaries

I am trying to allow the user to do this:
Let's say initially the text says:
"hello world hello earth"
when the user searches for "hello" it should display:
|hello| world |hello| earth
here's what I have:
m = re.compile(pattern)
i =0
match = False
while i < len(self.fcontent):
    content = " ".join(self.fcontent[i])
    i = i + 1;
    for find in m.finditer(content):
        print i,"\t"+content[:find.start()]+"|"+content[find.start():find.end()]+"|"+content[find.end():]
        match = True
        pr = raw_input( "(n)ext, (p)revious, (q)uit or (r)estart? ")
        if (pr == 'q'):
            break
        elif (pr == 'p'):
            i = i - 2
        elif (pr == 'r'):
            i = 0
if match is False:
    print "No matches in the file!"
where :
pattern = user specified pattern
fcontent = contents of a file read in and stored as array of words and lines e.g:
[['line','1'],['line','2','here'],['line','3']]
however it prints
|hello| world hello earth
hello world |hello| earth
How can I merge the two lines so they are displayed as one?
Thanks
Edit:
This is part of a larger search function where the pattern (in this case the word "hello") is passed in by the user, so I have to use regex search/match/finditer to find the pattern. The replace and other methods sadly won't work because the user can choose to search for "[0-9]$", and that would mean putting the ending number between |'s.
If you're just doing that, use str.replace.
print self.content.replace(m.find, "|%s|" % m.find)
you can use regexp as follows:
import re
src = "hello world hello earth"
dst = re.sub('hello', '|hello|', src)
print dst
or use string replace:
dst = src.replace('hello', '|hello|')
Ok, going back to original solution since OP confirmed that word would stand on its own (ie not be a substring of another word).
target = 'hello'
line = 'hello world hello earth'
rep_target = '|{}|'.format(target)
line = line.replace(target, rep_target)
yields:
|hello| world |hello| earth
As has been pointed out based on your example, using str.replace is the easiest. If more complex criteria are required, then you can adapt the following...
import re
def highlight(string, words, boundary='|'):
    if isinstance(words, str):
        words = [words]
    # the alternation must always be joined with '|', whatever the boundary marker is
    rs = '({})'.format('|'.join(sorted(map(re.escape, words), key=len, reverse=True)))
    return re.sub(rs, lambda L: '{0}{1}{0}'.format(boundary, L.group(1)), string)
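For the case described in the question's edit, where the user supplies an arbitrary regex such as "[0-9]$", one option is to pass a function to re.sub so every match is wrapped in a single pass over the line. This is only a sketch; the name mark_matches is hypothetical, and the pattern is deliberately not escaped because it is meant to be a regex.

```python
import re

def mark_matches(pattern, content, boundary='|'):
    """Wrap every match of a user-supplied regex in boundary markers."""
    # The replacement function receives each match object in turn,
    # so all matches on the line end up in one output string.
    return re.sub(pattern, lambda m: boundary + m.group(0) + boundary, content)

print(mark_matches('hello', 'hello world hello earth'))
print(mark_matches('[0-9]$', 'line 3'))
```

Inside the question's loop, the result of mark_matches(pattern, content) could be printed once per line instead of once per match.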

Identify substrings and return responses based on order of substrings in Python

I am a beginner in Python, I am teaching myself off of Google Code University online. One of the exercises in string manipulation is as follows:
# E. not_bad
# Given a string, find the first appearance of the
# substring 'not' and 'bad'. If the 'bad' follows
# the 'not', replace the whole 'not'...'bad' substring
# with 'good'.
# Return the resulting string.
# So 'This dinner is not that bad!' yields:
# This dinner is good!
def not_bad(s):
    # +++your code here+++
    return
I'm stuck. I know it could be put into a list using ls = s.split(' ') and then sorted with various elements removed, but I think that is probably just creating extra work for myself. The lesson hasn't covered RegEx yet so the solution doesn't involve re. Help?
Here's what I tried, but it doesn't quite give the output correctly in all cases:
def not_bad(s):
    if s.find('not') != -1:
        notindex = s.find('not')
        if s.find('bad') != -1:
            badindex = s.find('bad') + 3
            if notindex > badindex:
                removetext = s[notindex:badindex]
                ns = s.replace(removetext, 'good')
            else:
                ns = s
        else:
            ns = s
    else:
        ns = s
    return ns
Here is the output, it worked in 1/4 of the test cases:
not_bad
X got: 'This movie is not so bad' expected: 'This movie is good'
X got: 'This dinner is not that bad!' expected: 'This dinner is good!'
OK got: 'This tea is not hot' expected: 'This tea is not hot'
X got: "goodIgoodtgood'goodsgood goodbgoodagooddgood goodygoodegoodtgood goodngoodogoodtgood" expected: "It's bad yet not"
Test Cases:
print 'not_bad'
test(not_bad('This movie is not so bad'), 'This movie is good')
test(not_bad('This dinner is not that bad!'), 'This dinner is good!')
test(not_bad('This tea is not hot'), 'This tea is not hot')
test(not_bad("It's bad yet not"), "It's bad yet not")
UPDATE: This code solved the problem:
def not_bad(s):
    notindex = s.find('not')
    if notindex != -1:
        if s.find('bad') != -1:
            badindex = s.find('bad') + 3
            if notindex < badindex:
                removetext = s[notindex:badindex]
                return s.replace(removetext, 'good')
    return s
Thanks everyone for helping me discover the solution (and not just giving me the answer)! I appreciate it!
Well, I think that it is time to make a small review ;-)
There is an error in your code: notindex > badindex should be changed into notindex < badindex. The changed code seems to work fine.
Also I have some remarks about your code:
It is usual practice to compute a value once, assign it to a variable, and use that variable in the code that follows. That rule applies in this particular case:
For example, the head of your function could be replaced by
notindex = s.find('not')
if notindex == -1:
You can use return several times inside your function.
As a result, the tail of your code could be significantly reduced:
if (*all right*):
    return s.replace(removetext, 'good')
return s
Finally, I want to point out that you can solve this problem using split, although it does not seem to be a better solution.
def not_bad( s ):
    q = s.split( "bad" )
    w = q[0].split( "not" )
    if len(q) > 1 < len(w):
        return w[0] + "good" + "bad".join(q[1:])
    return s
Break it down like this:
1. How would you figure out if the word "not" is in a string?
2. How would you figure out where the word "not" is in a string, if it is?
3. How would you combine #1 and #2 in a single operation?
4. Same as #1-3 except for the word "bad"?
5. Given that you know the words "not" and "bad" are both in a string, how would you determine whether the word "bad" came after the word "not"?
6. Given that you know "bad" comes after "not", how would you get every part of the string that comes before the word "not"?
7. And how would you get every part of the string that comes after the word "bad"?
8. How would you combine the answers to #6 and #7 to replace everything from the start of the word "not" and the end of the word "bad" with "good"?
Since you are trying to learn, I don't want to hand you the answer, but I would start by looking in the python documentation for some of the string functions including replace and index.
Also, if you have a good IDE it can help by showing you what methods are attached to an object and even automatically displaying the help string for those methods. I tend to use Eclipse for large projects and the lighter weight Spyder for small projects
http://docs.python.org/library/stdtypes.html#string-methods
I suspect that they're wanting you to use string.find to locate the various substrings:
>>> mystr = "abcd"
>>> mystr.find("bc")
1
>>> mystr.find("bce")
-1
Since you're trying to teach yourself (kudos, BTW :) I won't post a complete solution, but also note that you can use indexing to get substrings:
>>> mystr[0:mystr.find("bc")]
'a'
Hope that's enough to get you started! If not, just comment here and I can post more. :)
def not_bad(s):
    snot = s.find("not")
    sbad = s.find("bad")
    # find returns -1 when "not" is missing, so rule that out explicitly
    if snot != -1 and snot < sbad:
        s = s.replace(s[snot:(sbad+3)], "good")
        return s
    else:
        return s
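For comparison, the step-by-step breakdown earlier in this thread (find both indices, check their order, then join the part before "not" and the part after "bad") can be sketched with slicing instead of replace. The name not_bad_sliced is only used here to keep it apart from the versions above.

```python
def not_bad_sliced(s):
    """Replace the first 'not'...'bad' span with 'good' when 'bad' follows 'not'."""
    not_index = s.find('not')  # -1 if 'not' is absent
    bad_index = s.find('bad')  # -1 if 'bad' is absent
    # bad_index > not_index also fails when 'bad' is missing (-1),
    # so only 'not' needs an explicit check.
    if not_index != -1 and bad_index > not_index:
        return s[:not_index] + 'good' + s[bad_index + len('bad'):]
    return s

print(not_bad_sliced('This dinner is not that bad!'))
```

Slicing only touches the one span that was located, whereas replace would also change any later identical substring.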
