python string replacement, all possible combinations #2 - python

I have sentences like the following:
((wouldyou)) give me something ((please))
and a bunch of keywords, stored in arrays / lists:
keywords["wouldyou"] = ["can you", "would you", "please"]
keywords["please"] = ["please", "ASAP"]
I want to replace every variable in double parentheses with each of the strings stored in its list, and get every possible combination back. The number of variables and keywords is not fixed.
James helped me with the following code:
from itertools import product

def filler(word, from_char, to_char):
    # Every token keeps its own value; from_char may also become to_char
    options = [(c,) if c != from_char else (from_char, to_char) for c in word.split(" ")]
    return (' '.join(o) for o in product(*options))

list(filler('((?please)) tell me something ((?please))', '((?please))', ''))
It works great, but it only replaces one specific variable with an empty string. Now I want to go through several variables, each with a different set of keywords. The desired result should look something like this:
can you give me something please
would you give me something please
please give me something please
can you give me something ASAP
would you give me something ASAP
please give me something ASAP
I guess it has something to do with to_char, but I have no idea how to loop over the list items at this point.

The following would work. It uses itertools.product to construct all of the possible pairings (or more) of your keywords.
import re, itertools

text = "((wouldyou)) give me something ((please))"
keywords = {}
keywords["wouldyou"] = ["can you", "would you", "please"]
keywords["please"] = ["please", "ASAP"]

# Get a list of bracketed terms
lsources = re.findall(r"\(\((.*?)\)\)", text)

# Build a list of the possible substitutions
ldests = []
for source in lsources:
    ldests.append(keywords[source])

# Generate the various pairings
for lproduct in itertools.product(*ldests):
    output = text
    for src, dest in itertools.izip(lsources, lproduct):
        # Replace each term (you could optimise this using a single re.sub)
        output = output.replace("((%s))" % src, dest)
    print output
You could further improve it by replacing the repeated replace() calls and assignments with a single re.sub() call.
This script gives the following output:
can you give me something please
can you give me something ASAP
would you give me something please
would you give me something ASAP
please give me something please
please give me something ASAP
It was tested using Python 2.7. Note that if the same placeholder appears more than once in the text, replace() substitutes every occurrence at once, so you will need to think about how to handle that case. Hopefully you find this useful.
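For readers on Python 3, here is a minimal sketch of the single re.sub() idea mentioned above. The function name expand and the PLACEHOLDER constant are made up for illustration; they are not part of the original answer. It assumes every placeholder name exists as a key in keywords. Because re.sub() calls the replacement function once per match, from left to right, repeated placeholders are each filled independently.

```python
import itertools
import re

# Hypothetical helper names; the pattern is the one used in the answer above.
PLACEHOLDER = re.compile(r"\(\((.*?)\)\)")

def expand(text, keywords):
    """Return every combination of substitutions for the ((name)) placeholders."""
    names = PLACEHOLDER.findall(text)            # placeholder names, in order
    choices = [keywords[name] for name in names]
    results = []
    for combo in itertools.product(*choices):
        picks = iter(combo)
        # re.sub visits matches left to right, so next(picks) lines up with names
        results.append(PLACEHOLDER.sub(lambda m: next(picks), text))
    return results

keywords = {
    "wouldyou": ["can you", "would you", "please"],
    "please": ["please", "ASAP"],
}
for line in expand("((wouldyou)) give me something ((please))", keywords):
    print(line)
```

The output order matches the itertools.product order of the original script: the last placeholder varies fastest.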

This is a job for Captain Regex!
Partial, pseudo-codey, solution...
One direct, albeit inefficient (like O(n*m) where n is number of words to replace and m is average number of replacements per word), way to do this would be to use the regex functionality in the re module to match the words, then use the re.sub() method to swap them out. Then you could just embed that in nested loops. So (assuming you get your replacements into a dict or something first), it would look something like this:
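import re
for key in repldict:
    # Build the pattern on the fly: literal "((", the key, literal "))"
    regexpattern = re.escape("((" + key + "))")
    for item in repldict[key]:
        # text is the sentence that contains the placeholders
        newstring = re.sub(regexpattern, item, text)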
And so forth. Only, you know, like with correct syntax and stuff. And then just append the newstring to a list, or print it, or whatever.
For creating the regex patterns on the fly, string concatenation should do it: a regex that matches the left parens, plus the string to match, plus a regex that matches the right parens.
If you do it that way, then you can handle the optional features just by looping over a second version of the regex pattern which appends a question mark to the end of the left parens, then does whatever you want to do with that.

Related

how to find a set amount of letters in between symbols

Hey, I was wondering if it is possible to collect the letters or words between brackets. For example, I want to collect the words between the brackets in:
I like food (because it tastes) good
How can I collect the word(s), letter(s) or number(s) between those brackets?
Thank you.
You can do it using regex in python. I took the liberty of generalizing your example!!
import re
re.findall(r"\((.*?)\)", "I like food (because it tastes) good also (hello world)")
Output:
['because it tastes', 'hello world']
Explanation:
The regex picks out whatever text appears inside each pair of parentheses.
Why did I use \? Because ( and ) are special characters in regex. So if you want to match them as literal parentheses, you need to tell the regex engine by putting \ before them.
Why did I use ??
? is used for two purposes:
a. To mark something as optional. E.g. apples? would match both apple and apples.
b. To make a quantifier match as little as possible. E.g. had I not used ? here, the output would have been ['because it tastes) good also (hello world'], because without it .* grabs the longest possible text between an opening and a closing parenthesis.
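To make point b concrete, here is a small sketch that runs the same pattern with and without the ? after the *. The variable names lazy and greedy are only for illustration.

```python
import re

text = "I like food (because it tastes) good also (hello world)"

# Lazy: .*? stops at the first closing parenthesis it can
lazy = re.findall(r"\((.*?)\)", text)

# Greedy: .* runs on to the last closing parenthesis in the string
greedy = re.findall(r"\((.*)\)", text)

print(lazy)    # ['because it tastes', 'hello world']
print(greedy)  # ['because it tastes) good also (hello world']
```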
You can do something like:
x = "I like food (because it tastes) good"
print(len(x.split("(")[-1].split(")")[0]))
I believe there are better ways, but this works.

Python split with regular expression to divide string

I have a need to recover 2 results of a regular expression in Python: what is searched and all else.
For example, in:
"boofums",3,4
I'd like to find what is in the quotes and what isn't:
boofums
,3,4
What I have so far is:
bobbles = '"boofums",3,4'
pickles = re.split(r'\".*\"', bobbles)
morton = re.match(r'\".*\"', bobbles)
print(pickles[1])
print(morton[0])
,3,4
"boofums"
This seems to me insanely inefficient and not Python-esque. Is there a better way to do this? (Sorry for the "is there a better way" construct on StackOverflow, but... I need to do this better! 😂)
...and if you can help me extract just what's in the quotes, something that I'd easily do in Perl or Ruby, all the better!
You're probably best off with regex groupings:
So for your example I'd use something like
regex = re.compile(r'"(.*)"(.*)')
bobble_groups = regex.match(bobbles)
You can then use bobble_groups.group(1) to get just the text inside the quotation marks, and bobble_groups.group(2) to get the rest.
See named groups if you don't want to depend on an index number.
a, b = re.match('"(.*)"(.*)', bobbles).groups()
Parentheses define groups whose matched text is saved on the match object.
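As a sketch of the named-groups suggestion, the same pattern can label its groups with (?P<name>...). The group names quoted and rest are chosen here for illustration.

```python
import re

bobbles = '"boofums",3,4'

# (?P<name>...) gives each group a name instead of relying on its index
match = re.match(r'"(?P<quoted>.*)"(?P<rest>.*)', bobbles)
if match:
    quoted = match.group("quoted")
    rest = match.group("rest")
    print(quoted)  # boofums
    print(rest)    # ,3,4
```

Note that .* is greedy, so with more than one quoted field on the line the first group would run to the last quotation mark; "(.*?)" would stop at the first closing one.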

Exact keyword match in string

I know this question has been asked almost a hundred times on Stack Overflow, but after a lot of searching without finding my answer, I am asking it again.
I want to search for an exact word in strings like the ones below.
'svm_midrangedb_nonprod:svm_midrangedb_nonprod_root'
'svm_midrangedb_prod:svm_midrangedb_prod_root'
I want to match only 'prod', but I am getting both 'prod' and 'nonprod' in the output.
Here is the code I am using:
re.search(r"\wprod\w", in_volumes.json()[i]['name'].split(":")[2].lower())
You have to write a rule that matches prod but does not match nonprod.
For example, you could exclude any match where an n comes directly in front of prod, like this: [^n]prod\w.
Or maybe some of your data legitimately has an n in front of prod and you want to keep it. In that case, exclude only matches where non comes in front of prod, like this: \w*(?<!non)prod\w*.
It really depends on the rest of your data and on what kind of rules you can apply to it to get the results you want.
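Here is a minimal sketch of the negative lookbehind rule, run against the two sample strings from the question. The names list and PROD_ONLY are assumptions for illustration; the real data comes from in_volumes.json() in the question.

```python
import re

# Sample names copied from the question
names = [
    "svm_midrangedb_nonprod:svm_midrangedb_nonprod_root",
    "svm_midrangedb_prod:svm_midrangedb_prod_root",
]

# (?<!non) rejects any "prod" that is immediately preceded by "non"
PROD_ONLY = re.compile(r"(?<!non)prod")

matches = [name for name in names if PROD_ONLY.search(name)]
print(matches)  # ['svm_midrangedb_prod:svm_midrangedb_prod_root']
```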
That's expected, because your regular expression matches any string that contains "prod". An easy fix is to keep what you did and change the pattern like this:
re.search(r"\w_prod\w", in_volumes.json()[i]['name'].split(":")[2].lower())
I just added the _ character that comes before prod in your target string.

in python find index in list if combination of strings exist

I'm writing my first script and trying to learn python.
But I'm stuck and can't get out of this one.
I'm writing a script to change file names.
Let's say I have string = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
I want the result to be string = "This Is Test3 E00"
this is what I have so far:
l = list(string)
# Transform the string into a list
for i in l:
    if "E" in l:
        p = l.index("E")
        if isinstance((p+1), int) is True:
            if isinstance((p+2), int) is True:
                delp = p+3
                a = p-3
                del l[delp:]
                new = "".join(l)
                new = new.replace("."," ")
                print (new)
The idea is to get the index of "E" and check whether there are 2 integers after it.
Then delete everything after the second integer.
However, this will not work if there is an "E" anywhere else.
at the moment the result I get is:
this is tEst
because it is finding index for the first "E" on the list and deleting everything after index+3
I guess my question is how to get the index in the list when a combination of strings exists, but I can't seem to find out how.
Thanks for everyone's answers.
I was going in another direction, but it is also not working.
If someone could see why, it would be awesome. It is much better to learn by doing than by just copying what others write :)
this is what I came up with:
for i in l:
    if i=="E" and isinstance((i+1), int ) is True:
        p = l.index(i)
        print (p)
Can anyone tell me why this isn't working? I get an error.
Thank you so much
Have you ever heard of a Regular Expression?
Check out python's re module. Link to the Docs.
Basically, you can define a "regex" that would match "E and then two integers" and give you the index of it.
After that, I'd just use python's "Slice Notation" to choose the piece of the string that you want to keep.
Then, check out the string methods for str.replace to swap the periods for spaces, and str.title to put them in Title Case
An easy way is to use a regex to find up until the E followed by 2 digits criteria, with s as your string:
import re
up_until = re.match(r'(.*?E\d{2})', s).group(1)
# this.is.tEst3.E00
Then, we replace the . with a space and then title case it:
output = up_until.replace('.', ' ').title()
# This Is Test3 E00
The technique to consider using is Regular Expressions. They allow you to search for a pattern of text in a string, rather than a specific character or substring. Regular Expressions have a bit of a tough learning curve, but are invaluable to learn and you can use them in many languages, not just in Python. Here is the Python resource for how Regular Expressions are implemented:
http://docs.python.org/2/library/re.html
The pattern you are looking to match in your case is an "E" followed by two digits. In Regular Expressions (usually shortened to "regex" or "regexp"), that pattern looks like this:
E\d\d # ('\d' is the specifier for any digit 0-9)
In Python, you create a string containing the regex pattern you want to match, and pass that and your file name string into the search() function of the re module. Regex patterns tend to use a lot of backslashes, so it's common in Python to prefix the pattern string with 'r' (a raw string), which tells the Python interpreter not to treat backslashes as the start of escape sequences. All of this together looks like this:
import re
filename = 'this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv'
match_object = re.search(r'E\d\d', filename)
if match_object:
    # Group 0 is the whole match; end(0) is the index just past it
    index_of_Exx = match_object.end(0)
    truncated_filename = filename[:index_of_Exx]
    # Now take care of any more processing
Regular expressions can get very detailed (and complex). In fact, you can probably accomplish your entire task of fully changing the file name using a single regex that's correctly put together. But since I don't know the full details about what sorts of weird file names might come into your program, I can't go any further than this. I will add one more piece of information: if the 'E' could possibly be lower-case, then you want to add a flag as a third argument to your pattern search which indicates case-insensitive matching. That flag is 're.I' and your search() method would look like this:
match_object = re.search(r'E\d\d', filename, re.I)
Read the documentation on Python's 're' module for more information, and you can find many great tutorials online, such as this one:
http://www.zytrax.com/tech/web/regex.htm
And before you know it you'll be a superhero. :-)
The reason why this isn't working:
for i in l:
    if i=="E" and isinstance((i+1), int ) is True:
        p = l.index(i)
        print (p)
...is because 'i' contains a character from the list 'l', not an integer. You compare it with 'E' (which works), but then try to add 1 to it, which raises a TypeError because a string and an integer cannot be added.
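If you want to stay with the index-based approach instead of a regex, one possible sketch is to loop with enumerate(), which gives you the position and the character together, and test the next two characters with str.isdigit(). The function name index_after_exx is made up for this example.

```python
def index_after_exx(name):
    """Return the index just past the first 'E' followed by two digits, or None."""
    for i, ch in enumerate(name):       # i is the position, ch the character
        pair = name[i + 1:i + 3]        # the two characters after position i
        if ch == "E" and len(pair) == 2 and pair.isdigit():
            return i + 3
    return None

string = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
end = index_after_exx(string)
if end is not None:
    new = string[:end].replace(".", " ").title()
    print(new)  # This Is Test3 E00
```

The "E" in "tEst3" is skipped because the two characters after it are not digits.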

Using variable in re.match in python

I am trying to create an array of things to match in a description line. So I can ignore them later on in my script. Below is a sample script that I have been working on, on the side.
Basically I am trying to take a bunch of strings and match it against a bunch of other strings.
AKA:
asdf or asfs or wrtw in string = true continue with script
if not print this.
import re
ignorelist = ['^test', '(.*)set']
def guess(a):
    for ignore in ignorelist:
        if re.match(ignore, a):
            return('LOSE!')
        else:
            return('WIN!')
a = raw_input('Take a guess: ')
print guess(a)
Thanks
You have a bit of logic/flow problem.
You test the first term in the list. If it doesn't match, you go to the else and return "WIN!" without testing any of the other terms in the list.
(Also, ignorelist is outside the function.)
[EDIT: I see you edited the question to include regular expressions, so I will edit the answer back to a re context...] Note that you should use re.search instead of re.match if you want to give it actual regexes, since re.match only matches at the beginning of the string.
There are innumerable ways to change this, depending on how you want your program to work.
I would re-write guess along these lines. (You can also put ignorelist inside the function instead of passing it.):
ignorelist = [r'^test', r'[abc]set']
def guess(a,il):
    for reg in il:
        if re.search(reg,a):
            return "LOSE"
    return "WIN"
a = raw_input()
print guess(a,ignorelist)
In this case, it will loop through each word, exiting if it finds a match, but if it doesn't (completes the loop without returning anything) then it will finally return "WIN".
I think it would be far better to use a single regex, or a set of them if a single one would be too big to compile. Something like:
GUESSER = re.compile('|'.join(ignorelist))
def guess(a):
    if GUESSER.search(a):
        return('LOSE!')
    else:
        return('WIN!')
Note: Each pattern in ignorelist should be enclosed in a pair of parentheses if it uses the "|" (or) operator.
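One way to apply that note automatically is to wrap every pattern in a non-capturing group before joining. This is a Python 3 sketch, and it assumes a match against the ignore list should return 'LOSE!', as in the question.

```python
import re

ignorelist = [r"^test", r"[abc]set"]

# (?:...) keeps each pattern self-contained, even if it contains its own "|"
GUESSER = re.compile("|".join("(?:%s)" % pattern for pattern in ignorelist))

def guess(a):
    """Return 'LOSE!' if any ignore pattern is found in a, otherwise 'WIN!'."""
    return "LOSE!" if GUESSER.search(a) else "WIN!"

print(guess("testing"))  # LOSE!
print(guess("subset"))   # LOSE!
print(guess("hello"))    # WIN!
```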
