Python search for multiple values and show with boundaries - python

I am trying to allow the user to do this:
Let's say the text initially reads:
"hello world hello earth"
When the user searches for "hello", it should display:
|hello| world |hello| earth
Here's what I have:
m = re.compile(pattern)
i = 0
match = False
while i < len(self.fcontent):
    content = " ".join(self.fcontent[i])
    i = i + 1
    for find in m.finditer(content):
        print i,"\t"+content[:find.start()]+"|"+content[find.start():find.end()]+"|"+content[find.end():]
        match = True
    pr = raw_input( "(n)ext, (p)revious, (q)uit or (r)estart? ")
    if (pr == 'q'):
        break
    elif (pr == 'p'):
        i = i - 2
    elif (pr == 'r'):
        i = 0
if match is False:
    print "No matches in the file!"
where :
pattern = user specified pattern
fcontent = contents of a file read in and stored as array of words and lines e.g:
[['line','1'],['line','2','here'],['line','3']]
However, it prints:
|hello| world hello earth
hello world |hello| earth
How can I merge the two lines so they are displayed as one?
Thanks
Edit:
This is part of a larger search function where the pattern (in this case the word "hello") is passed in by the user, so I have to use regex search/match/finditer to find it. The replace and other string methods sadly won't work, because the user can choose to search for "[0-9]$", which would mean putting the ending digit between |'s.

If you're just doing that, use str.replace.
print content.replace(pattern, "|%s|" % pattern)

You can use a regexp as follows:
import re
src = "hello world hello earth"
dst = re.sub('hello', '|hello|', src)
print dst
Or use string replace:
dst = src.replace('hello', '|hello|')

Ok, going back to original solution since OP confirmed that word would stand on its own (ie not be a substring of another word).
target = 'hello'
line = 'hello world hello earth'
rep_target = '|{}|'.format(target)
line = line.replace(target, rep_target)
yields:
|hello| world |hello| earth

As has been pointed out, based on your example, using str.replace is the easiest. If more complex criteria are required, you can adapt the following:
import re
def highlight(string, words, boundary='|'):
    if isinstance(words, basestring):
        words = [words]
    # join the escaped words with the regex alternation operator, longest first
    rs = '({})'.format('|'.join(sorted(map(re.escape, words), key=len, reverse=True)))
    return re.sub(rs, lambda L: '{0}{1}{0}'.format(boundary, L.group(1)), string)
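Because the question's edit says the pattern is an arbitrary user-supplied regex, one way to get all matches marked on a single line is to let re.sub wrap each match through a callback. This is a minimal Python 3 sketch, not the asker's code; the helper name mark_matches is hypothetical.

```python
import re

def mark_matches(pattern, text, boundary="|"):
    """Return text with every regex match wrapped in the boundary string."""
    regex = re.compile(pattern)
    # The callback receives each match object, so whatever the
    # user's pattern matched is exactly what gets wrapped.
    return regex.sub(lambda m: boundary + m.group(0) + boundary, text)

print(mark_matches("hello", "hello world hello earth"))
print(mark_matches("[0-9]$", "line 42"))
```

In the question's loop, printing mark_matches(pattern, content) once per line would replace the inner finditer loop. Patterns that can match the empty string would insert empty boundary pairs, so they may need to be filtered out first.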

Pull several substrings from an input using specific characters to find them

I need to make a user created madlib where the user would input a madlib for someone else to use. The input would be something like this:
The (^noun^) and the (^adj^) (^noun^)
I need to pull out anything between (^ and ^) so I can use that word in an input prompt like the one below, which asks for a word to complete the madlib.
input('Enter "word in-between the characters":')
This is my code right now
madlib = input("Enter (^madlib^):")
a = "(^"
b = "^)"
start = madlib.find(a) + len(a)
end = madlib.find(b)
substring = madlib[start:end]
def mad():
    if "(^" in madlib:
        substring = madlib[start:end]
        m = input("Enter " + substring + ":")
        mad = madlib.replace(madlib[start:end],m)
        return mad
print(mad())
What am I missing?
You can use re.finditer() to do this fairly cleanly by collecting the .span() of each match!
import re
# collect starting madlib
madlib_base = input('Enter madlib base with (^...^) around words like (^adj^)): ')
# list to put the collected blocks of spans and user inputs into
replacements = []
# yield every block like (^something^) by matching each end and `not ^` inbetween
for match in re.finditer(r"\(\^([^\^]+)\^\)", madlib_base):
    replacements.append({
        "span": match.span(),  # position of the match in madlib_base
        "sub_str": input(f"enter a {match.group(1)}: "),  # replacement str
    })
# replacements mapping and madlib_base can be saved for later!
def iter_replace(base_str, replacements_mapping):
    # yield alternating blocks of text and replacement
    # skip the replacement span from the text when yielding
    base_index = 0  # index in base str to begin from
    for block in replacements_mapping:
        head, tail = block["span"]  # unpack span
        yield base_str[base_index:head]  # next value up to span
        yield block["sub_str"]  # string the user gave us
        base_index = tail  # start from the end of the span
    yield base_str[base_index:]  # any text left after the last span
# collect the iterable into a single result string
# this can be done at the same time as the earlier loop if the input is known
result = "".join(iter_replace(madlib_base, replacements))
Demonstration
...
enter a noun: Madlibs
enter a adj: rapidly
enter a noun: house
...
>>> result
'The Madlibs and the rapidly house'
>>> replacements
[{'span': (4, 12), 'sub_str': 'Madlibs'}, {'span': (21, 28), 'sub_str': 'rapidly'}, {'span': (29, 37), 'sub_str': 'house'}]
>>> madlib_base
'The (^noun^) and the (^adj^) (^noun^)'
Your mad() function only does one substitution, and it's only called once. For your sample input with three required substitutions, you'll only ever get the first noun. In addition, mad() depends on values that are initialized outside the function, so calling it multiple times won't work (it'll keep trying to operate on the same substring, etc).
To fix it, you need to make it so that mad() does one substitution on whatever text you give it, regardless of any other state outside of the function; then you need to call it until it's substituted all the words. You can make this easier by having mad return a flag indicating whether it found anything to substitute.
def mad(text):
    start = text.find("(^")
    end = text.find("^)")
    substring = text[start+2:end] if start > -1 and end > start else ""
    if substring:
        m = input(f"Enter {substring}: ")
        return text.replace(f"(^{substring}^)", m, 1), True
    return text, False
madlib, do_mad = input("Enter (^madlib^):"), True
while do_mad:
    madlib, do_mad = mad(madlib)
print(madlib)
Enter (^madlib^):The (^noun^) and the (^adj^) (^noun^)
Enter noun: cat
Enter adj: lazy
Enter noun: dog
The cat and the lazy dog
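Both answers above can also be condensed into a single re.sub call whose replacement is a function that asks for each word. The following is only a sketch under that assumption; fill_madlib and its ask parameter are hypothetical names, and ask defaults to input so that canned answers can stand in for the user when trying it out.

```python
import re

# Matches one placeholder such as (^noun^) and captures the word inside.
PLACEHOLDER = re.compile(r"\(\^([^\^]+)\^\)")

def fill_madlib(template, ask=input):
    """Replace each (^word^) placeholder with the answer returned by ask()."""
    # re.sub calls the function once per placeholder, left to right.
    return PLACEHOLDER.sub(lambda m: ask(f"Enter {m.group(1)}: "), template)

# Non-interactive demonstration: canned answers stand in for input().
answers = iter(["cat", "lazy", "dog"])
print(fill_madlib("The (^noun^) and the (^adj^) (^noun^)",
                  ask=lambda prompt: next(answers)))
```

Calling fill_madlib(template) without the ask argument prompts the user interactively for each placeholder.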

How to print specific words in colour on python?

I want to print a specific word a different color every time it appears in the text. In the existing code, I've printed the lines that contain the relevant word "one".
import json
from colorama import Fore
fh = open(r"fle.json")
corpus = json.loads(fh.read())
for m in corpus['smsCorpus']['message']:
    identity = m['#id']
    text = m['text']['$']
    strtext = str(text)
    utterances = strtext.split()
    if 'one' in utterances:
        print(identity,text, sep ='\t')
I imported Fore but I don't know where to use it. I want to use it to have the word "one" in a different color.
output (section of)
44814 Ohhh that's the one Johnson told us about...can you send it to me?
44870 Kinda... I went but no one else did, I so just went with Sarah to get lunch xP
44951 No, it was directed in one place loudly and stopped when I stoppedmore or less
44961 Because it raised awareness but no one acted on their new awareness, I guess
44984 We need to do a fob analysis like our mcs onec
Thank you
You could also just use the ANSI color codes in your strings:
# define aliases to the color-codes
red = "\033[31m"
green = "\033[32m"
blue = "\033[34m"
reset = "\033[39m"
t = "That was one hell of a show for a one man band!"
utterances = t.split()
if "one" in utterances:
    # figure out the list-indices of occurrences of "one"
    idxs = [i for i, x in enumerate(utterances) if x == "one"]
    # modify the occurrences by wrapping them in ANSI sequences
    for i in idxs:
        utterances[i] = red + utterances[i] + reset
# join the list back into a string and print
utterances = " ".join(utterances)
print(utterances)
If you only have one coloured word, I think you can use this; you can expand the logic for n coloured words:
from colorama import Fore, Style
our_str = "Ohhh that's the one Johnson told us about...can you send it to me?"
def colour_one(our_str):
    if "one" in our_str:
        # split on the first occurrence only, so unpacking always gets two parts
        str1, str2 = our_str.split("one", 1)
        new_str = str1 + Fore.RED + 'one' + Style.RESET_ALL + str2
    else:
        new_str = our_str
    return new_str
I think this is an ugly solution, not even sure if it works. But it's a solution if you can't find anything else.
I use the colour module or the colored module.
If you don't want to use a module for colouring, there are other ways to do it as well.
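To colour every standalone occurrence of a word while leaving longer words such as "onec" untouched, the ANSI approach can be combined with a word-boundary regex. This is a rough Python 3 sketch rather than any answerer's method; colour_word is a hypothetical helper, and it assumes a terminal that interprets ANSI escape sequences.

```python
import re

RED = "\033[31m"    # ANSI code: red foreground
RESET = "\033[0m"   # ANSI code: reset all attributes

def colour_word(text, word, colour=RED):
    """Wrap every whole-word occurrence of word in an ANSI colour sequence."""
    # \b anchors the match at word boundaries; re.escape keeps the
    # word literal even if it contains regex metacharacters.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda m: colour + m.group(0) + RESET, text)

print(colour_word("Kinda... I went but no one else did", "one"))
print(colour_word("We need to do a fob analysis like our mcs onec", "one"))
```

In the question's loop, print(identity, colour_word(text, "one"), sep='\t') would take the place of the plain print.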

Replace a word in a String by indexing without "string replace function" -python

Is there a way to replace a word within a string without using a "string replace function," e.g., string.replace(string,word,replacement)?
[out] = forecast('This snowy weather is so cold.','cold','awesome')
out => 'This snowy weather is so awesome.'
Here the word cold is replaced with awesome.
This is from my MATLAB homework, which I am trying to do in Python. When doing this in MATLAB we were not allowed to use strrep().
In MATLAB, I can use strfind to find the index and work from there. However, I noticed that there is a big difference between lists and strings. Strings are immutable in Python, so I will likely have to import some module to convert the string to a different data type that I can work with the way I want, without using a string replace function.
Just for fun :)
st = 'This snowy weather is so cold .'.split()
given_word = 'awesome'
for i, word in enumerate(st):
    if word == 'cold':
        st[i] = given_word  # overwrite the matched word in place
        break # break if we found first word
print(' '.join(st))
Here's another answer that might be closer to the solution you described using MATLAB:
st = 'This snow weather is so cold.'
given_word = 'awesome'
word_to_replace = 'cold'
n = len(word_to_replace)
index_of_word_to_replace = st.find(word_to_replace)
print st[:index_of_word_to_replace]+given_word+st[index_of_word_to_replace+n:]
You can convert your string into a list object, find the index of the word you want to replace and then replace the word.
sentence = "This snowy weather is so cold"
# Split the sentence into a list of the words
words = sentence.split(" ")
# Get the index of the word you want to replace
word_to_replace_index = words.index("cold")
# Replace the target word with the new word based on the index
words[word_to_replace_index] = "awesome"
# Generate a new sentence
new_sentence = ' '.join(words)
Using Regex and a list comprehension.
import re
def strReplace(sentence, toReplace, toReplaceWith):
    return " ".join([re.sub(toReplace, toReplaceWith, i) if re.search(toReplace, i) else i for i in sentence.split()])
print(strReplace('This snowy weather is so cold.', 'cold', 'awesome'))
Output:
This snowy weather is so awesome.
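Closer to the strfind approach described in the question, the replacement can be done with str.find and slicing alone, with no imports. This is a sketch under that assumption: the function reuses the name forecast from the question, and the loop that handles repeated occurrences is an addition not asked for in the homework.

```python
def forecast(sentence, old, new):
    """Replace every occurrence of old with new using only find() and slicing."""
    if not old:
        return sentence  # nothing to search for
    pieces = []
    start = 0
    while True:
        index = sentence.find(old, start)
        if index == -1:
            pieces.append(sentence[start:])   # remainder after the last match
            break
        pieces.append(sentence[start:index])  # text before the match
        pieces.append(new)
        start = index + len(old)              # continue after the match
    return "".join(pieces)

print(forecast("This snowy weather is so cold.", "cold", "awesome"))
```

Strings being immutable is not an obstacle here, because the result is built as a new string from slices of the old one.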

Python Regular Expression to find all combinations of a Letter Number Letter Designation

I need to implement a Python regular expression to search for all occurrences of A1a or A_1_a or A-1-a or _A_1_a_ or _A1a, where:
A can be A to Z.
1 can be 1 to 9.
a can be a to z.
Where there are only three characters letter number letter, separated by Underscores, Dashes or nothing. The case in the search string needs to be matched exactly.
The main problem I am having is that sometimes these three letter combinations are connected to other text by dashes and underscores. Also creating the same regular expression to search for A1a, A-1-a and A_1_a.
Also I forgot to mention this is an XML file.
Thanks, this found every occurrence of what I was looking for with a slight modification, [-]?[A][-]?[1][-]?[a][-]?, but I need these to be variables, something like
[-]?[var_A][-]?[var_3][-]?[Var_a][-]?
Would that be done like this:
regex = r"[-]?[%s][-]?[%s][-]?[%s][-]?"
print re.findall(regex,var_A,var_Num,Var_a)
Or more like:
regex = ''.join(['r','\"','[-]?[',Var_X,'][-]?[',Var_Num,'][-]?[',Var_x,'][-]?','\"'])
print regex
for sstr in searchstrs:
    matches = re.findall(regex, sstr, re.I)
But this isn't working.
Sample Lines of the File:
Before Running Script
<t:ION t:SA="BoolObj" t:H="2098947" t:P="2098944" t:N="AN7 Result" t:CI="Boolean_Register" t:L="A_3_a Fdr2" t:VS="true">
<t:ION t:SA="RegisterObj" t:H="20971785" t:P="20971776" t:N="ART1 Result 1" t:CI="NumericVariable_Register" t:L="A3a1 Status" t:VS="1">
<t:ION t:SA="ModuleObj" t:H="2100736" t:P="2097152" t:N="AND/OR 14" t:CI="AndOr_Module" t:L="A_3_a_2 Energized from Norm" t:S="0" t:SC="5">
After Running Script
What I am getting: (It's deleting the entire line and leaving only what is below)
B_1_c
B1c1
B_1_c_2
What I Want to get:
<t:ION t:SA="BoolObj" t:H="2098947" t:P="2098944" t:N="AN7 Result" t:CI="Boolean_Register" t:L="B_1_c Fdr2" t:VS="true">
<t:ION t:SA="RegisterObj" t:H="20971785" t:P="20971776" t:N="ART1 Result 1" t:CI="NumericVariable_Register" t:L="B1c1 Status" t:VS="1">
<t:ION t:SA="ModuleObj" t:H="2100736" t:P="2097152" t:N="AND/OR 14" t:CI="AndOr_Module" t:L="B_1_c_2 Energized from Norm" t:S="0" t:SC="5">
import re
import os
search_file_name = 'Alarms Test.fwn'
pattern = 'A3a'
fileName, fileExtension = os.path.splitext(search_file_name)
newfilename = fileName + '_' + pattern + fileExtension
outfile = open(newfilename, 'wb')
def find_ext(text):
    matches = re.findall(r'([_-]?[A{1}][_-]?[3{1}][_-]?[a{1}][_-]?)', text)
    records = [m.replace('3', '1').replace('A', 'B').replace('a', 'c') for m in matches]
    if matches:
        outfile.writelines(records)
        return 1
    else:
        outfile.writelines(text)
        return 0
def main():
    success = 0
    count = 0
    with open(search_file_name, 'rb') as searchfile:
        try:
            searchstrs = searchfile.readlines()
            for s in searchstrs:
                success = find_ext(s)
                count = count + success
        finally:
            searchfile.close()
    print count
if __name__ == "__main__":
    main()
You want to use the following to find your matches.
matches = re.findall(r'([_-]?[a-z][_-]?[1-9][_-]?[a-z][_-]?)', s, re.I)
See regex101 demo
If you are looking to find the matches and then strip all of the - and _ characters, you could do:
import re
s = '''
A1a _A_1 A_ A_1_a A-1-a _A_1_a_ _A1a _A-1-A_ a1_a A-_-5-a
_A-_-5-A a1_-1 XMDC_A1a or XMDC-A1a or XMDC_A1-a XMDC_A_1_a_ _A-1-A_
'''
def find_this(text):
    matches = re.findall(r'([_-]?[a-z][_-]?[1-9][_-]?[a-z][_-]?)', text, re.I)
    records = [m.replace('-', '').replace('_', '') for m in matches]
    print records
find_this(s)
Output
['A1a', 'A1a', 'A1a', 'A1a', 'A1a', 'A1A', 'a1a', 'A1a', 'A1a', 'A1a', 'A1a', 'A1A']
See working demo
To quickly get the A1as out without the punctuation, and not having to reconstruct the string from captured parts...
t = '''A1a _B_2_z_
A_1_a
A-1-a
_A_1_a_
_C1c '''
re.findall("[A-Z][0-9][a-z]",t.replace("-","").replace("_",""))
Output:
['A1a', 'B2z', 'A1a', 'A1a', 'A1a', 'C1c']
(But if you don't want to capture from FILE.TXT-2b, then you would have to be careful about most of these solutions...)
If the string can be separated by multiple underscores or dashes (e.g. A__1a):
[_-]*[A-Z][_-]*[1-9][_-]*[a-z]
If there can only be one or zero underscores or dashes:
[_-]?[A-Z][_-]?[1-9][_-]?[a-z]
regex = r"[A-Z][-_]?[1-9][-_]?[a-z]"
print re.findall(regex,some_string_variable)
should work.
To capture just the parts you're interested in, wrap them in parens:
regex = r"([A-Z])[-_]?([1-9])[-_]?([a-z])"
print re.findall(regex,some_string_variable)
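If the underscores or dashes (or lack thereof) must be consistent within a match, these patterns will return bad results, because each separator is matched independently. You would need extra logic for that, for example a backreference to the first separator.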
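The question also asks how to build the pattern from variables and why the output file contains only the matches. One possible approach, sketched here in Python 3, is to assemble the pattern with re.escape and substitute inside each line, so the rest of the line is returned unchanged. The names build_pattern and rename are hypothetical, and the sketch assumes each designation is exactly three single characters.

```python
import re

def build_pattern(upper, digit, lower):
    """Build a regex for e.g. A3a, A_3_a or A-3-a from three single characters."""
    sep = r"([_-]?)"  # optional underscore or dash, captured so it can be kept
    return re.compile(re.escape(upper) + sep + re.escape(digit) + sep + re.escape(lower))

def rename(text, old, new):
    """Replace the old designation with the new one, leaving the rest of the line intact."""
    pattern = build_pattern(*old)
    # Groups 1 and 2 hold the separators found in the text;
    # they are reused around the new characters.
    return pattern.sub(
        lambda m: new[0] + m.group(1) + new[1] + m.group(2) + new[2], text)

for line in ['t:L="A_3_a Fdr2"', 't:L="A3a1 Status"', 't:L="A_3_a_2 Energized from Norm"']:
    print(rename(line, "A3a", "B1c"))
```

Writing the string returned by rename for every line, instead of writing only the list of matches, keeps the surrounding XML. The pattern has no boundary checks, so it would also match the same three characters inside a longer token.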

Python: matching OR of two variables containing regex code

What I am trying to do is take user input text that contains wildcards (so I need to keep them that way) and search for the specified input. In the working example below I use the pipe |.
I figured out how to make this work:
dual = 'a bunch of stuff and a bunch more stuff!'
reobj = re.compile('b(.*?)f|\s[a](.*?)u', re.IGNORECASE)
result = reobj.findall(dual)
for link in result:
    print link[0] +' ' + link[1]
which returns:
unch o
nd a b
As well
dual2 = 'a bunch of stuff and a bunch more stuff!'
#So I want to now send in the regex codes of my own.
userin1 = 'b(.*?)f'
userin2 = '\s[a](.*?)u'
reobj = re.compile(userin1, re.IGNORECASE)
result = reobj.findall(dual2)
for link in result:
    print link[0] +' ' + link[1]
Which returns:
u n
u n
I don't understand what it is doing; if I get rid of everything except link[0] in the print, I get:
u
u
I however can pass in a user input regex string:
dual = 'a bunch of stuff and a bunch more stuff!'
userinput = 'b(.*?)f'
reobj = re.compile(userinput, re.IGNORECASE)
result = reobj.findall(dual)
print(result)
but when I try to update this to two user strings with the pipe:
dual = 'a bunch of stuff and a bunch more stuff!'
userin1 = 'b(.*?)f'
userin2 = '\s[a](.*?)u'
reobj = re.compile(userin1|userin2, re.IGNORECASE)
result = reobj.findall(dual)
print(result)
I get the error:
reobj = re.compile(userin1|userin2, re.IGNORECASE)
TypeError: unsupported operand type(s) for |: 'str' and 'str'
I get this error in other cases too, such as when I put brackets () or [] around userin1|userin2.
I have found the following:
Python regular expressions OR
but cannot get it to work.
What I would like is to understand how to pass in these regex variables combined with OR and get back all the matches of both, and likewise with something such as AND. In the end this will operate on files and tell me which files contain particular words under the various logical relations (OR, AND, etc.).
Thanks much for your thoughts,
Brian
Although I couldn't get the answer from A. Rodas to work, it gave me the idea of using .join. The example I worked out, although slightly different, returns the desired results (in link[0] and link[1]).
userin1 = '(T.*?n)'
userin2 = '(G.*?p)'
list_patterns = [userin1,userin2]
swaplogic = '|'
string = 'What is a Torsion Abelian Group (TAB)?'
theresult = re.findall(swaplogic.join(list_patterns), string)
print theresult
for link in theresult:
    print link[0]+' '+link[1]
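Two details from the question are worth spelling out. The TypeError arises because | was applied to two Python strings; the alternation has to be part of the pattern text itself. The "u n" output arises because findall returns plain strings when the pattern has a single group, so link[0] and link[1] are the first two characters of each match. The sketch below, in Python 3, shows one way to express OR and AND over a list of user patterns; matches_any and matches_all are hypothetical helper names.

```python
import re

def matches_any(patterns, text, flags=0):
    """OR: return every substring matched by at least one of the patterns."""
    # Wrap each pattern in a non-capturing group so "|" applies to whole patterns.
    combined = "|".join(f"(?:{p})" for p in patterns)
    return [m.group(0) for m in re.finditer(combined, text, flags)]

def matches_all(patterns, text, flags=0):
    """AND: True only if every pattern matches somewhere in the text."""
    return all(re.search(p, text, flags) for p in patterns)

text = "a bunch of stuff and a bunch more stuff!"
print(matches_any([r"b(.*?)f", r"\s[a](.*?)u"], text, re.IGNORECASE))
print(matches_all([r"bunch", r"stuff"], text))
print(matches_all([r"bunch", r"zebra"], text))
```

Using finditer with group(0) returns the full matched text regardless of how many groups the user patterns contain, which avoids the string-versus-tuple difference in findall. For files, matches_all could be applied to each file's contents to decide whether the file qualifies.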
