a. I have a line as given below:
HELLO CMD-LINE: hello how are you -color blue how is life going -color red,green life is pretty -color orange,violet,red
b. I want to print the strings that follow each -color.
c. I tried the regular expression below:
for i in range(len(tar_read_sp)):
    print tar_read_sp[i]
    wordy = re.findall(r'-color.(\w+)', tar_read_sp[i], re.M|re.I|re.U)
    # print "%s"%(wordy.group(0))
    if wordy:
        print "Matched"
        print "Full match: %s" % (wordy)
        print "Full match: %s" % (wordy[0])
        # wordy_ls = wordy.group(0).split('=')
        # print wordy_ls[1]
        # break
    else:
        print "Not Matched"
but it prints only the first word after each match:
['blue', 'red', 'orange']
d. How can I print every word that follows -color, like
['blue', 'red', 'green', 'orange', 'violet'], with the repeated values removed?
Please share your comments and suggestions.
Agree with depperm: fix your indentation.
Using his regex suggestion and combining it with the necessary split and de-duping:
wordy = re.findall(r'(?:-color.((?:\w+,?)+))', test_string, re.M|re.I|re.U)
flat = [new_word for word in wordy for new_word in word.split(',')]
# a set would drop duplicates but lose the order, so keep first occurrences instead
wordy = [w for i, w in enumerate(flat) if w not in flat[:i]]
That should give you a flattened, unique list like you asked for (at least I assume that's what you mean by "remove the repeating variable").
My personal preference would be to do something like this:
import re
tar_read_sp = "hello how are you -color blue how is life going -color red,green life is pretty -color orange,violet,red"
wordy = re.findall(r'-color.([^\s]+)', tar_read_sp, re.I)
big_list = []
for match in wordy:
    small_list = match.split(',')
    big_list.extend(small_list)
big_set = list(set(big_list))
print (big_set)
I find this approach a little easier to read and update down the road. The idea is to get all those color matches, build a big list of them, and then use set to dedupe it (note that a set does not preserve the original order). The regex I'm using:
-color.([^\s]+)
will capture the 'small_list' of colors up to the next space.
I have a solution not using regex.
test_string = 'hello how are you -color blue how is life going -color red,green life is pretty -color orange,violet,red'
result = []
for colors in [after_color.split(' ')[1] for after_color in test_string.split('-color')[1:]]:
    result = result+colors.split(',')
print result
The result is:
['blue', 'red', 'green', 'orange', 'violet', 'red']
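A minimal Python 3 sketch of the same idea that also removes the repeats while keeping first-seen order. It assumes the colors always follow -color as a comma-separated run without spaces. The helper name extract_colors is made up for illustration, and dict.fromkeys is used for de-duplication because dictionaries keep insertion order in Python 3.7+.

```python
import re

def extract_colors(line):
    # Grab each comma-separated run of word characters that follows "-color".
    groups = re.findall(r'-color\s+(\w+(?:,\w+)*)', line, re.I)
    # Flatten the runs into individual color names.
    flat = [name for group in groups for name in group.split(',')]
    # dict.fromkeys drops repeats and keeps the first-seen order.
    return list(dict.fromkeys(flat))

line = ("hello how are you -color blue how is life going "
        "-color red,green life is pretty -color orange,violet,red")
print(extract_colors(line))  # ['blue', 'red', 'green', 'orange', 'violet']
```

If the order does not matter, replacing the last step with sorted(set(flat)) would work as well.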
I use this answer to print colored text in Python, but when I print a variable it comes out as a tuple, as shown below:
from termcolor import colored
val = 'fruit'
print(colored(('Banana is', val), 'yellow'))
>>> ('Banana is', 'fruit')
But this is the output I want:
>>> Banana is fruit
(and without the quotes)
print(colored(f"Banana is {val}", 'yellow'))
The f prefix in front of the string makes it an f-string (Python 3.6+), so you can insert variables directly into the string.
Let's say I have two strings.
a = 'I am Sam. I love cooking.'
b = 'I am sam. I used to drink a lot.'
I am calculating their similarity score using:
from difflib import SequenceMatcher
s = SequenceMatcher(lambda x: x == " ",a,b)
print s.ratio()
Now I want to print the non-matching sentences from both strings, like this:
a = 'I love cooking.'
b = 'I used to drink a lot.'
Any suggestions on which module or approach I can use for that? I saw the difflib module at https://pymotw.com/2/difflib/, but it prints its results with markers (+, -, !, ...), and I don't want the output in that format.
This is a very simple script, but I hope it gives you an idea of how to do it:
a = 'I am Sam. I love cooking.'
b = 'I am sam. I used to drink a lot.'
a = a.split('.')
b = b.split('.')
ca = len(a)
cb = len(b)
if ca > cb: l = cb
else: l = ca
c = 0
while c < l:
    if a[c].upper() == b[c].upper(): pass
    else: print b[c]+'.'
    c = c+1
Use difflib. You can easily post-process the output of difflib.Differ, to strip off the first two characters of each unit and convert them to any format you want. Or you can work with the alignments returned by SequenceMatcher.get_matching_blocks, and generate your own output.
Here's how you might do it. If that's not what you want, edit your question to provide a less simplistic example of comparison and the output format you need.
import difflib
# list1 and list2 are the two texts, each already split into a list of sentences
differ = difflib.Differ()
for line in differ.compare(list1, list2):
    if line.startswith("-"):
        print("a="+line[2:])
    elif line.startswith("+"):
        print("b="+line[2:])
    # else just ignore the line
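The SequenceMatcher route mentioned above can be sketched as follows in Python 3. This is only an illustration under a few assumptions: sentences end with '.', '!' or '?' followed by whitespace, and case differences such as "Sam" vs "sam" should count as a match. The helper names split_sentences and unmatched_sentences are hypothetical.

```python
import re
from difflib import SequenceMatcher

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def unmatched_sentences(a, b):
    sa, sb = split_sentences(a), split_sentences(b)
    # Compare lower-cased copies so 'Sam' and 'sam' are treated as equal.
    matcher = SequenceMatcher(None, [s.lower() for s in sa], [s.lower() for s in sb])
    only_a, only_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            # Collect the original (not lower-cased) sentences that differ.
            only_a.extend(sa[i1:i2])
            only_b.extend(sb[j1:j2])
    return only_a, only_b

a = 'I am Sam. I love cooking.'
b = 'I am sam. I used to drink a lot.'
only_a, only_b = unmatched_sentences(a, b)
print('a =', ' '.join(only_a))  # a = I love cooking.
print('b =', ' '.join(only_b))  # b = I used to drink a lot.
```

Working on whole sentences rather than characters avoids the +/-/? markers entirely, at the cost of treating a sentence with a single changed word as completely different.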
I am writing a program with a simple form of self-learning, and now I want to extract "information" from input such as:
>>>#ff0000 is the hexcode for the color red
I want to use regular expressions to check that the user entered a sentence of the form "... is the hexcode for the color ...", and to retrieve both the name of the color and the hexcode. The short snippet below shows how I want it to work:
#main.py
strInput = raw_input("Please give a fact:")
if "{0} is the hexcode for the color {1}" in strInput:
    # {0} is the hexcode of the color
    # {1} is the name of the color
    print "You give me an color"
if "{0} is an vehicle" in strInput:
    # {0} is an vehicle
    print "You give me an vehicle"
Is this possible with regular expressions, and what is the best way to do it?
You can read about regular expressions in Python in the standard library documentation. Here, I'm using named groups to store the matched value into a dictionary structure with a key that you choose.
>>> import re
>>> s = '#ff0000 is the hexcode for the color red'
>>> m = re.match(r'(?P<hexcode>.+) is the hexcode for the color (?P<color>.+)', s)
>>> m.groupdict()
{'color': 'red', 'hexcode': '#ff0000'}
Note that if there's no match using your regular expression, the m object here will be None.
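To handle several sentence templates and the None case in one place, something like the following Python 3 sketch could work. The pattern table, the labels 'color' and 'vehicle', and the function name classify_fact are all assumptions made for illustration. The hexcode pattern is deliberately stricter than the `.+` used above.

```python
import re

# Hypothetical table of (label, compiled pattern) pairs; extend as needed.
FACT_PATTERNS = [
    ('color', re.compile(
        r'^(?P<hexcode>#[0-9a-fA-F]{6}) is the hexcode for the color (?P<color>\w+)$')),
    ('vehicle', re.compile(r'^(?P<vehicle>.+) is an? vehicle$')),
]

def classify_fact(sentence):
    # Try each pattern in turn; match() returns None when it does not apply.
    for kind, pattern in FACT_PATTERNS:
        m = pattern.match(sentence.strip())
        if m:
            return kind, m.groupdict()
    return None, {}

print(classify_fact('#ff0000 is the hexcode for the color red'))
print(classify_fact('A truck is a vehicle'))
print(classify_fact('This matches nothing'))
```

Keeping the patterns in a list means a new kind of fact only needs one more entry rather than another if block.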
Based on the given input:
I can do waaaaaaaaaaaaay better :DDDD!!!! I am sooooooooo exicted about it :))) Good !!
Desired output:
I can do way/LNG better :D/LNG !/LNG I am so/LNG exicted about it :)/LNG Good !/LNG
--- Challenges:
better vs. soooooooooo >> we need to keep the first one as is but shorten the second
for the second we need to add a tag (LNG) as it might have some importance for intensification for subjectivity and sentiment
---- Problem: error message "unbalanced parentheses"
Any ideas?
My code is:
import re
lengWords = {} # a dictionary of lengthened words
def removeDuplicates(corpus):
    data = (open(corpus, 'r').read()).split()
    myString = " ".join(data)
    for word in data:
        for chr in word:
            countChr = word.count(chr)
            if countChr >= 3:
                lengWords[word] = word+"/LNG"
                lengWords[word] = re.sub(r'([A-Za-z])\1+', r'\1', lengWords[word])
                lengWords[word] = re.sub(r'([\'\!\~\.\?\,\.,\),\(])\1+', r'\1', lengWords[word])
    for k, v in lengWords.items():
        if k == word:
            re.sub(word, v, myString)
    return myString
It's not a perfect solution, and I don't have time to refine it now; I just wanted to get you started with an easy approach:
s = "I can do waaaaaaaaaaaaay better :DDDD!!!! I am sooooooooo exicted about it :))) Good !!"
re.sub(r'(.)(\1{2,})',r'\1/LNG',s)
>> 'I can do wa/LNGy better :D/LNG!/LNG I am so/LNG exicted about it :)/LNG Good !!'
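A token-level variant in Python 3 that puts the tag at the end of the word (way/LNG rather than wa/LNGy). This is a sketch under assumptions: tokens are split on whitespace, and only runs of three or more identical characters count as lengthening, so "better" and "!!" are left alone and ":DDDD!!!!" stays a single token. It therefore does not reproduce the desired output exactly. The name tag_lengthened is made up. As an aside, the "unbalanced parenthesis" error most likely comes from passing a raw token such as ':)))' to re.sub as a pattern; re.escape(word) would avoid that.

```python
import re

# Any single character followed by two or more copies of itself.
RUN = re.compile(r'(.)\1{2,}')

def tag_lengthened(text):
    out = []
    for token in text.split():
        # Collapse each long run to a single character.
        shortened = RUN.sub(r'\1', token)
        # Tag the token only if something was actually shortened.
        out.append(shortened + '/LNG' if shortened != token else token)
    return ' '.join(out)

s = "I can do waaaaaaaaaaaaay better :DDDD!!!! I am sooooooooo exicted about it :))) Good !!"
print(tag_lengthened(s))
```

Building the result token by token also sidesteps the need to substitute back into the original string with re.sub.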
I want to match a list of words against a string and find out how many of the words matched.
Now I have this:
import re
words = ["red", "blue"]
exactMatch = re.compile(r'\b%s\b' % '\\b|\\b'.join(words), flags=re.IGNORECASE)
print exactMatch.search("my blue cat")
print exactMatch.search("my red car")
print exactMatch.search("my red and blue monkey")
print exactMatch.search("my yellow dog")
My current regex matches the first three strings, but I would like to find out how many of the words in the list words match the string passed to search. Is this possible without calling re.compile for each word in the list?
Or is there another way to achieve the same thing?
The reason I want to keep the number of re.compile to a minimum is speed, since in my application I have multiple word lists and about 3500 strings to search against.
If you use findall instead of search, you get a list containing all the matched words.
print exactMatch.findall("my blue cat")
print exactMatch.findall("my red car")
print exactMatch.findall("my red and blue monkey")
print exactMatch.findall("my yellow dog")
will result in
['blue']
['red']
['red', 'blue']
[]
If you need the number of matches, call len() on the result:
print len(exactMatch.findall("my blue cat"))
print len(exactMatch.findall("my red car"))
print len(exactMatch.findall("my red and blue monkey"))
print len(exactMatch.findall("my yellow dog"))
will result in
1
1
2
0
If I understood the question correctly, you only want to know the number of matches of blue or red in a sentence.
>>> exactMatch = re.compile(r'%s' % '|'.join(words), flags=re.IGNORECASE)
>>> print exactMatch.findall("my blue blue cat")
['blue', 'blue']
>>> print len(exactMatch.findall("my blue blue cat"))
2
You need more code if you want a separate count for each color.
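One way to get both the total number of hits and the number of distinct list words with a single compiled pattern is sketched below in Python 3. The function name count_matches is hypothetical, and re.escape is added on the assumption that a word list might contain regex metacharacters.

```python
import re

words = ["red", "blue"]
# One alternation for the whole list, wrapped in word boundaries.
pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, words)), re.IGNORECASE)

def count_matches(text):
    hits = pattern.findall(text)
    total = len(hits)                          # every occurrence
    distinct = len({h.lower() for h in hits})  # each list word counted once
    return total, distinct

for text in ["my blue cat", "my red and blue monkey", "my Blue blue cat", "my yellow dog"]:
    print(text, count_matches(text))
```

The pattern is compiled once per word list, so scanning a few thousand strings only costs one findall call per string.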
Why not store all the words in a hash (a dict) and look up every word of the sentence in it, iterating with finditer?
words = { "red": 1 .... }
word = re.compile(r'\b(\w+)\b')
for i in word.finditer(sentence):
    if words.get(i.group(1)):
        ....
for w in words:
    if w in searchterm:
        print "found"