Exact keyword match in string - python

I know this question has been asked almost a hundred times on Stack Overflow, but after a lot of searching I have not found an answer, so I am asking it again.
I want to search for an exact word in strings like the ones below.
'svm_midrangedb_nonprod:svm_midrangedb_nonprod_root'
'svm_midrangedb_prod:svm_midrangedb_prod_root'
I want to match only 'prod', but I am getting both 'prod' and 'nonprod' in the output.
Here is the code I am using:
re.search(r"\wprod\w", in_volumes.json()[i]['name'].split(":")[2].lower())

You have to write rules that match prod but not nonprod.
For example, you could exclude any match where an n comes directly in front of prod, like this: [^n]prod\w.
Or maybe some of your data legitimately has an n in front of prod and you want to keep it. Then exclude only the cases where non comes in front of prod, like this: \w*(?<!non)prod\w*.
It really depends on the rest of your data and on what rules you can apply to it to get the result you want.
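As a minimal sketch of the lookbehind idea, the snippet below runs two patterns over the two sample strings from the question. The variable names are made up for illustration, and the stricter second pattern assumes that prod is always delimited by non-letters such as _ or :, which may not hold for the rest of the data.

```python
import re

names = [
    "svm_midrangedb_nonprod:svm_midrangedb_nonprod_root",
    "svm_midrangedb_prod:svm_midrangedb_prod_root",
]

# Negative lookbehind: "prod" must not be directly preceded by "non".
not_nonprod = re.compile(r"(?<!non)prod")

# Stricter variant (an assumption about the data): "prod" must not touch
# a letter on either side, so "_prod_" and "_prod:" match but "nonprod" does not.
whole_token = re.compile(r"(?<![a-z])prod(?![a-z])")

prod_only = [n for n in names if not_nonprod.search(n.lower())]
prod_only_strict = [n for n in names if whole_token.search(n.lower())]

print(prod_only)
print(prod_only_strict)
```

Both lists contain only the second string. The lookbehind version is the more permissive of the two, so it is the safer starting point if other prefixes such as preprod should still count as matches.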

This is expected: your regular expression only asks for "prod" with a word character on each side, and "nonprod" satisfies that too. An easy fix is to keep what you have and require the underscore in front of prod, as follows:
re.search(r"\w_prod\w", in_volumes.json()[i]['name'].split(":")[2].lower())
I just added the _ character that appears in your target string.

Related

I wish to take the middle pattern of the sentence in chinese character using regex

I tried to take the middle words based on my pattern. Below are my codes:
text = "東京都田中区9-7−4"
import re
#Sorry due to the edit problem and stackoverflow doesnt allow me to include long sentences here, please check my comment below for the compile function of re.
city = re.findall(r,text)
print("getCity: {}".format(city))
My current output:
getCity: ['都田中区']
My expected output:
getCity: ['田中区']
I do not want to capture the [都道府県] part, so I used "?!" at the beginning of my pattern, as in (?!...??[都道府県]). However, when I run my program, "都" is still included, as shown in my current output. Could anyone point me in the right direction?
The problem with your regex is that it is too permissive.
If you look at this visualisation here (I have removed all the hardcoded city (市) names because they are irrelevant):
you can see a lot of "any character" repeated x times, or just "not 市" and "not 町" repeated x times. These are what matches the 都道府県 in your string. Therefore, these are the places where you should disallow 都道府県:
The corresponding regex would be:
(?:余市|高市|[^都道府県市]{2,3}?)郡(?:玉村|大町|[^都道府県]{1,5}?)[町村]|(?:[^都道府県]{1,4}市)?[^都道府県町]{1,4}?区|[^都道府県]{1,7}?[市町村]
Remember to add the hardcoded cities when you put this in your code!
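As a quick check, here is a small sketch that runs the pattern above against the string from the question. It uses the pattern exactly as posted, without the hardcoded city names, so it is only meant to confirm the behaviour on this one input.

```python
import re

# Pattern copied from the answer above (hardcoded city names omitted).
pattern = re.compile(
    r"(?:余市|高市|[^都道府県市]{2,3}?)郡(?:玉村|大町|[^都道府県]{1,5}?)[町村]"
    r"|(?:[^都道府県]{1,4}市)?[^都道府県町]{1,4}?区"
    r"|[^都道府県]{1,7}?[市町村]"
)

text = "東京都田中区9-7−4"

# All groups are non-capturing, so findall returns the whole matches.
city = pattern.findall(text)
print("getCity: {}".format(city))
```

Because every character class now excludes 都道府県, no match can start on or run across 都, and the first match begins at 田.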

Regex lookbehind and lookahead doesn't find any match

I have a lot of data that I need to parse and output in different format. The data looks something like this:
tag="001">utb20181009818<
tag="003">CZ PrNK<
...
Now I want to extract 'utb20181009818', which comes after 'tag="001">' and before the last '<'.
This is my code in Python:
regex_pattern = re.compile(r'''(?=(tag="001(.*?)">)).*?(?<=[<])''')
ID = regex_pattern.match(one_line)
print(ID)
My variable one_line already contains the necessary data and I just need to extract the value, but the pattern doesn't match no matter what I do. I have looked at it for hours and can't figure out what I'm doing wrong.
Try the regex tag=\"001\">(.*?)< and read the first capture group with ID.group(1).
The issue is that lookaheads don't move the match position to the right because they don't match anything - they only look.
Obviously, utilizing a match group as suggested would be the simplest way to go here, as you wouldn't have to take pains to avoid matching the parts you don't want.
But if your "001" isn't variable length, I think what you want is actually a lookbehind/lookahead (not lookahead/lookbehind):
(?<=tag="001">).*(?=<)
https://regex101.com/r/rMQnna/3/
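Both suggestions can be compared side by side with a short sketch. The value of one_line below is taken from the sample data in the question; re.search is used instead of re.match on the assumption that the tag may not sit at the very start of the line.

```python
import re

one_line = 'tag="001">utb20181009818<'

# Option 1: capture group. The surrounding text is matched, the group keeps the value.
m = re.search(r'tag="001">(.*?)<', one_line)
id_from_group = m.group(1) if m else None

# Option 2: lookbehind / lookahead. Only the value itself is part of the match.
m = re.search(r'(?<=tag="001">).*(?=<)', one_line)
id_from_lookaround = m.group(0) if m else None

print(id_from_group, id_from_lookaround)
```

Note that the lookbehind must be fixed width in Python's re module, which is why the second option only works when the tag number has a fixed length.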

python string replacement, all possible combinations #2

I have sentences like the following:
((wouldyou)) give me something ((please))
and a bunch of keywords, stored in arrays / lists:
keywords["wouldyou"] = ["can you", "would you", "please"]
keywords["please"] = ["please", "ASAP"]
I want to replace every occurrence of variables in parentheses with a suitable set of strings stored in an array and get every possible combination back. The amount of variables and keywords is undefined.
James helped me with the following code:
from itertools import product
def filler(word, from_char, to_char):
    # Each token keeps itself, or offers (original, replacement) if it is the placeholder
    options = [(c,) if c != from_char else (from_char, to_char) for c in word.split(" ")]
    return (' '.join(o) for o in product(*options))
list(filler('((?please)) tell me something ((?please))', '((?please))', ''))
It works great but only replaces one specific variable with empty strings. Now I want to go through various variables with different set of keywords. The desired result should look something like this:
can you give me something please
would you give me something please
please give me something please
can you give me something ASAP
would you give me something ASAP
please give me something ASAP
I guess it has something to do with to_char, but I have no idea how to iterate over list items at this point.
The following would work. It uses itertools.product to construct all of the possible pairings (or more) of your keywords.
import re, itertools
text = "((wouldyou)) give me something ((please))"
keywords = {}
keywords["wouldyou"] = ["can you", "would you", "please"]
keywords["please"] = ["please", "ASAP"]
# Get a list of bracketed terms
lsources = re.findall(r"\(\((.*?)\)\)", text)
# Build a list of the possible substitutions
ldests = []
for source in lsources:
    ldests.append(keywords[source])
# Generate the various pairings
for lproduct in itertools.product(*ldests):
    output = text
    for src, dest in itertools.izip(lsources, lproduct):
        # Replace each term (you could optimise this using a single re.sub)
        output = output.replace("((%s))" % src, dest)
    print output
You could further improve it by avoiding the need to do multiple replace() and assignment calls with one re.sub() call.
This scripts gives the following output:
can you give me something please
can you give me something ASAP
would you give me something please
would you give me something ASAP
please give me something please
please give me something ASAP
It was tested using Python 2.7. You will need to think about how to handle the case where the same keyword is used more than once. Hopefully you find this useful.
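For Python 3, the same idea can be sketched with a single re.sub() call per combination. The function name expand is made up for illustration, and the handling of repeated placeholders (every occurrence of the same name gets the same replacement within one combination) is one possible design choice, not the only one.

```python
import itertools
import re

PLACEHOLDER = re.compile(r"\(\((.*?)\)\)")

def expand(text, keywords):
    """Return every sentence obtained by filling the ((name)) placeholders."""
    # Unique placeholder names, in order of first appearance.
    names = list(dict.fromkeys(PLACEHOLDER.findall(text)))
    results = []
    for combo in itertools.product(*(keywords[name] for name in names)):
        mapping = dict(zip(names, combo))
        # One re.sub call: the callback looks up the replacement for each name.
        results.append(PLACEHOLDER.sub(lambda m: mapping[m.group(1)], text))
    return results

keywords = {
    "wouldyou": ["can you", "would you", "please"],
    "please": ["please", "ASAP"],
}
sentences = expand("((wouldyou)) give me something ((please))", keywords)
for sentence in sentences:
    print(sentence)
```

Using a callback also avoids a subtle issue with chained replace() calls, where text inserted by an earlier replacement could itself be picked up by a later one.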
This is a job for Captain Regex!
Partial, pseudo-codey, solution...
One direct, albeit inefficient (like O(n*m) where n is number of words to replace and m is average number of replacements per word), way to do this would be to use the regex functionality in the re module to match the words, then use the re.sub() method to swap them out. Then you could just embed that in nested loops. So (assuming you get your replacements into a dict or something first), it would look something like this:
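for key in repldict:
    # Construct a pattern on the fly for key: literal "((", the key, literal "))"
    regexpattern = re.escape("((" + key + "))")
    for item in repldict[key]:
        # sentence is a placeholder name for the input string
        newstring = re.sub(regexpattern, item, sentence)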
And so forth. Only, you know, like with correct syntax and stuff. And then just append the newstring to a list, or print it, or whatever.
For creating the regex patterns on the fly, plain string concatenation should do it: a regex to match the left parens, plus the string to match, plus a regex to match the right parens.
If you do it that way, then you can handle the optional features just by looping over a second version of the regex pattern which appends a question mark to the end of the left parens, then does whatever you want to do with that.

How to parse a list in Django urlparser?

On Stack Overflow, you can view a list of questions with multiple tags at a URL such as http://stackoverflow.com/questions/tagged/django+python.
I'd like to do something similar in a project I am working on, where one of the url parameters would be a list of tags, but I'm not sure how to write a regex urlparser that can parse it out. I'm fond of SO's way of using the + sign, but it's not a dealbreaker. I also imagine that the urlparser may have to take the whole string (foo+bar+baz) as a single variable to give to the view, which is also fine as I can just split it in the view itself- that is, I'm not expecting the URL parser to give the view an already split list, but if it can, even better!
Right now all I have is:
url(r'^documents/tag/(?P<tag>\w+)/$', ListDocuments.as_view(), name="list_documents"),
Which just pulls out one single tag since \w+ just gets me those [A-Za-z0-9_], but not +. I tried something like:
url(r'^documents/tag/(?P<tag>[\w+\+*])/$', ListDocuments.as_view(), name="list_documents"),
But this no longer matched documents/tag/foo nor documents/tag/foo+bar.
Please assist, I'm not so great with regex, thanks!
It's not possible to do this automatically. From the documentation: "Each captured argument is sent to the view as a plain Python string, regardless of what sort of match the regular expression makes." Splitting it in the view is the way to go.
The second regex in your answer is OK, but it does allow some things you might not want (e.g. 'django+++python+'). A stricter version might be something like: (?P<tag>\w+(?:\+\w+)*). Then you can just do a simple tag.split('+') in the view without worrying about any edge cases.
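The stricter pattern can be tried outside Django with the plain re module. This is only a sketch: parse_tags is a hypothetical helper standing in for what the view would do with the captured string.

```python
import re

# Same pattern as the URL conf, with the stricter tag group.
TAG_URL = re.compile(r"^documents/tag/(?P<tag>\w+(?:\+\w+)*)/$")

def parse_tags(path):
    """Return the list of tags in path, or None if the path does not match."""
    m = TAG_URL.match(path)
    return m.group("tag").split("+") if m else None

print(parse_tags("documents/tag/foo/"))
print(parse_tags("documents/tag/django+python/"))
print(parse_tags("documents/tag/django+++python+/"))
```

The last call returns None because the pattern requires at least one word character between and after the + signs, so the split in the view never produces empty tags.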
This works for now:
url(r'^documents/tag/(?P<tag>[A-Za-z0-9_\+]+)/$', ListDocuments.as_view(), name="list_documents"),
But I'd like to be able to get that \w back in there instead of spelling out the full list of characters like that.
[Edit]
Here we go:
url(r'^documents/tag/(?P<tag>[\w\+]+)/$', ListDocuments.as_view(), name="list_documents"),
I will still select a better answer if there is a way for the Django urlparser to give the view an actual list instead of just one big long string, but if that's not possible, this solution does work.

A simple regexp in python

My program is a simple calculator, so I need to parse the expression the user types in order to make the input more user-friendly. I know I can do it with regular expressions, but I'm not familiar enough with them.
So I need to transform an input like this:
import re
input_user = "23.40*1200*(12.00-0.01)*MM(H2O)/(8.314 *func(2*x+273.15,x))"
re.some_stuff( ,input_user) # ????
into this:
"23.40*1200*(12.00-0.01)*MM('H2O')/(8.314 *func('2*x+273.15',x))"
just adding these single quotes inside the parentheses. How can I do that?
UPDATE:
To be clearer, I want to add single quotes after every sequence of characters "MM(" and before the ")" which comes after it, and after every sequence of characters "func(" and before the "," which comes after it.
This is the sort of thing where regexes can work, but they can potentially result in major problems unless you consider exactly what your input will be like. For example, can whatever is inside MM(...) contain parentheses of its own? Can the first expression in func( contain a comma? If the answers to both questions are no, then the following could work:
input_user2 = re.sub(r'MM\(([^\)]*)\)', r"MM('\1')", input_user)
# Apply the second substitution to the result of the first one, not to the original input
output = re.sub(r'func\(([^,]*),', r"func('\1',", input_user2)
However, this will not work if the answer to either question is yes, and even otherwise it could cause problems depending on what sort of input you expect to receive. Essentially, the first re.sub looks for MM( (the MM\( part), followed by any number (including 0) of characters that aren't a close-parenthesis, stored as a group by the extra parentheses (the ([^\)]*) part), and then a close-parenthesis. It replaces that section with the string in the second argument, where \1 is replaced by the first and only group from the pattern. The second re.sub works similarly, looking for any number of characters that aren't a comma.
If the answer to either question is yes, then regexps aren't appropriate for the parsing, as your language would not be regular. The answer to this question, while discussing a different application, may give more insight into that matter.
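To make the caveat concrete, here is a small sketch that chains the two substitutions and then feeds them an input with nested parentheses. The function name quote_args and the Ca(OH)2 example are made up for illustration.

```python
import re

def quote_args(expr):
    """Quote the argument of MM(...) and the first argument of func(...)."""
    # Second substitution runs on the result of the first.
    expr = re.sub(r"MM\(([^)]*)\)", r"MM('\1')", expr)
    return re.sub(r"func\(([^,]*),", r"func('\1',", expr)

input_user = "23.40*1200*(12.00-0.01)*MM(H2O)/(8.314 *func(2*x+273.15,x))"
output = quote_args(input_user)
print(output)

# Nested parentheses inside MM(...) defeat the pattern: the group stops
# at the first ")" and the closing quote lands in the wrong place.
broken = quote_args("MM(Ca(OH)2)")
print(broken)
```

The second call shows why a real parser is the safer choice once arguments can contain their own parentheses or commas.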
