Python regex to find either one or the other - python

I have the following regex that checks for the word "Read", but I'm looking for it to check for either "Read" or "Deleted":
len(re.findall("Read", phrase))
How can I make the regex so it's looking for either Read or Deleted?

You can use alternatives (separated by a pipe |) to search for either "Read" or "Deleted":
len(re.findall("Read|Deleted", phrase))
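As a minimal sketch of the alternation above (the sample phrase is made up for illustration), re.findall returns every non-overlapping match of either word, so len gives the combined count. The \b word boundaries are an optional refinement, not part of the answer above, that keeps "Read" from also matching inside a longer word such as "Reading":

```python
import re

# Hypothetical sample text, not taken from the question.
phrase = "Read the mail, Deleted the spam, Reading the news, Read again"

# Plain alternation: also matches the "Read" inside "Reading".
loose = re.findall(r"Read|Deleted", phrase)

# Word boundaries restrict the match to the whole words.
strict = re.findall(r"\b(?:Read|Deleted)\b", phrase)

print(len(loose))   # 4
print(len(strict))  # 3
```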

The pattern match can also be made independent of the order of the search words:
import re
phrase = " hello world Deleted,Undeleted,Read in a sentence"
results = re.search("(Read|Deleted|Undeleted).*(Read|Deleted|Undeleted).*(Read|Deleted|Undeleted).*", phrase).groups()
for group in results:
    print(group)
Output:
Deleted
Undeleted
Read
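If the goal is only to collect whichever of the words occur, in whatever order, a single alternation with re.findall may be simpler than repeating the group three times. A sketch, where the second phrase is an invented example:

```python
import re

pattern = re.compile(r"Read|Deleted|Undeleted")

phrase = " hello world Deleted,Undeleted,Read in a sentence"
reordered = "Read first, then Undeleted, then Deleted"  # hypothetical input

# findall reports matches in the order they appear in the text.
print(pattern.findall(phrase))     # ['Deleted', 'Undeleted', 'Read']
print(pattern.findall(reordered))  # ['Read', 'Undeleted', 'Deleted']
```

Unlike the re.search version, this does not fail when fewer than three of the words are present; it simply returns a shorter list.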

Related

How to substitute a regex with another regex in a string

This question showed how to replace a regex with another regex like this
$string = '"SIP/1037-00000014","SIP/CL-00000015","Dial","SIP/CL/61436523277,45"';
$pattern = '["SIP/CL/(\d*),(\d*)",]';
$replacement = '"SIP/CL/\1|\2",';
$string = preg_replace($pattern, $replacement, $string);
print($string);
However, I couldn't adapt that pattern to solve my case where I want to remove the full stop that lies between 2 words but not between a word and a number:
text = 'this . is bad. Not . 820'
regex1 = r'(\w+)(\s\.\s)(\D+)'
regex2 = r'(\w+)(\s)(\D+)'
re.sub(regex1, regex2, text)
# Desired outcome:
'this is bad. Not . 820'
Basically, I'd like to remove the . between the two alphabetic words. Could someone please help me with this problem? Thank you in advance.
These expressions might be close to what you might have in mind:
\s[.](?=\s\D)
or
(?<=\s)[.](?=\s\D)
Test
import re
regex = r"\s[.](?=\s\D)"
test_str = "this . is bad. Not . 820"
print(re.sub(regex, "", test_str))
Output
this is bad. Not . 820
If you wish to explore/simplify/modify the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
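The two expressions differ in what they consume, which affects the spacing of the result. A quick comparison using the sample string from the question: the first pattern removes the whitespace before the full stop together with it, while the lookbehind version removes only the full stop and so leaves two adjacent spaces.

```python
import re

text = "this . is bad. Not . 820"

# Consumes the preceding whitespace along with the full stop.
with_space = re.sub(r"\s[.](?=\s\D)", "", text)

# The lookbehind only asserts the whitespace, so both spaces remain.
dot_only = re.sub(r"(?<=\s)[.](?=\s\D)", "", text)

print(repr(with_space))  # 'this is bad. Not . 820'
print(repr(dot_only))    # 'this  is bad. Not . 820'
```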
Firstly, you can't really take PHP and apply it directly to Python, for obvious reasons.
Secondly, it always helps to specify which version of Python you're using as APIs change. Luckily in this instance, the API of re.sub has remained the same between Python 2.x and Python 3.
Onto your issue.
The second argument to re.sub is either a string or a function. If you pass in regex2, it'll just replace each match of regex1 with the string contents of regex2; it won't apply regex2 as a regex.
If you want to use groups derived from the first regex (similar to your example, which is using \1 and \2 to extract the first and second matching group from the first regex), then you'd want to use a function, which takes a match object as its sole argument, which you could then use to extract matching groups and return them as part of the replacement string.
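A minimal sketch of the function form described above, using the sample text from the question. The pattern and the helper name drop_full_stop are illustrative choices, not the asker's code. For a replacement this simple, a replacement string with backreferences such as r"\1 \2" should give the same result, so both are shown.

```python
import re

text = "this . is bad. Not . 820"

# Group 1: the word before the full stop.
# Group 2: the first non-digit character after it.
pattern = re.compile(r"(\w+)\s\.\s(\D)")

def drop_full_stop(match):
    # Rebuild the matched text from its groups, without the full stop.
    return match.group(1) + " " + match.group(2)

with_function = pattern.sub(drop_full_stop, text)
with_backrefs = pattern.sub(r"\1 \2", text)

print(with_function)  # this is bad. Not . 820
print(with_backrefs)  # this is bad. Not . 820
```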

Multiline regex python

I'm trying to do some text file parsing where this pattern is repeated throughout the file:
VERSION.PROGRAM:program_name
VERSION.SUBPROGRAM:sub_program_name
My intent is to, given a program_name, retrieve the sub_program_name for each block of text I mentioned above.
I have the following function that finds if the text actually exists, but doesn't print the sub_program_name:
def find_subprogram(program_name):
    regex_string = r'VERSION.PROGRAM:%s\nVERSION.SUBPROGRAM:.' % program_name
    with open('file.txt', 'r') as f:
        match = re.search(regex_string, f.read(), re.DOTALL|re.MULTILINE)
    if match:
        print(match.group())
I will appreciate some help or tips.
Thanks
Your regex has a typo, it's looking for PRGRAM.
If you want to search across multiple lines, you don't need the MULTILINE modifier. All it does is make ^ and $ match at the beginning and end of each line, instead of only at the beginning and end of the whole string.
You also are not using valid regex matching techniques. You should look up how to properly use regex.
To match any characters, use (.*), not %s.
Here is an example
Using VERSION\.PROGRAM:YOURSTRING\nVERSION\.SUBPROGRAM:(.*) will match the groups properly
re.compile(r'VERSION\.PROGRAM:%s\nVERSION\.SUBPROGRAM:(.*)' % re.escape(yourstr))
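Putting the pieces together, one way the lookup might look. This is a sketch under assumptions: the function takes the text directly rather than reading file.txt, the sample block contents are invented, and lines are assumed to end in a plain \n.

```python
import re

def find_subprograms(text, program_name):
    """Return the sub-program name of every block for program_name."""
    pattern = re.compile(
        r"VERSION\.PROGRAM:%s\nVERSION\.SUBPROGRAM:(.*)" % re.escape(program_name)
    )
    # With one capturing group, findall returns just the captured names.
    # DOTALL is not set, so '.' stops at the end of the line.
    return pattern.findall(text)

# Hypothetical file contents for illustration.
sample = (
    "VERSION.PROGRAM:alpha\n"
    "VERSION.SUBPROGRAM:alpha_sub\n"
    "some other line\n"
    "VERSION.PROGRAM:beta\n"
    "VERSION.SUBPROGRAM:beta_sub\n"
    "VERSION.PROGRAM:alpha\n"
    "VERSION.SUBPROGRAM:alpha_sub2\n"
)

print(find_subprograms(sample, "alpha"))  # ['alpha_sub', 'alpha_sub2']
print(find_subprograms(sample, "gamma"))  # []
```

Leaving out re.DOTALL matters here: with it, (.*) would run past the end of the line and capture the rest of the text.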

Using regex to find multiple matches on the same line

I need to build a program that can read multiple lines of code, and extract the right information from each line.
Example text:
no matches
one match <'found'>
<'one'> match <found>
<'three'><'matches'><'found'>
For this case, the program should detect <'found'>, <'one'>, <'three'>, <'matches'> and <'found'> as matches because they all have "<" and "'".
However, I cannot work out a system using regex to account for multiple matches on the same line. I was using something like:
re.search('^<.*>$')
But if there are multiple matches on one line, the extra "'<" and ">'" are taken as part of the .*, without counting them as separate matches. How do I fix this?
This works -
>>> r = re.compile(r"\<\'.*?\'\>")
>>> r.findall(s)
["<'found'>", "<'one'>", "<'three'>", "<'matches'>", "<'found'>"]
Use findall instead of search:
re.findall( r"<'.*?'>", str )
You can use re.findall and match on non > characters inside of the angle brackets:
>>> re.findall('<[^>]*>', "<'three'><'matches'><'found'>")
["<'three'>", "<'matches'>", "<'found'>"]
Non-greedy quantifier '?' as suggested by anubhava is also an option.
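To see why the quantifier matters, here is a short comparison on the example text from the question. The greedy .* runs to the last '> on the line, while the non-greedy .*? and a negated character class both stop at the first one. The variable names are illustrative.

```python
import re

text = """no matches
one match <'found'>
<'one'> match <found>
<'three'><'matches'><'found'>"""

# Greedy: the last line is swallowed as a single match.
greedy = re.findall(r"<'.*'>", text)

# Non-greedy: stops at the first closing '> after each opening <'.
lazy = re.findall(r"<'.*?'>", text)

# Negated class: anything except a quote between the delimiters.
negated = re.findall(r"<'[^']*'>", text)

print(greedy)
# ["<'found'>", "<'one'>", "<'three'><'matches'><'found'>"]
print(lazy)
# ["<'found'>", "<'one'>", "<'three'>", "<'matches'>", "<'found'>"]
print(negated == lazy)  # True
```

Note that <found> without quotes is skipped by all three patterns, as the question requires.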

Python: match a single word (with spaces)

The problem is that I am trying to match a word (spaces on either side) if it exists.
The code I have working (at least mostly) is:
import re, os
str1 = "the host offered $ rec*ting advice"
str1 = re.sub('[*]', '(.*?)', str1)
str1 = re.sub('[$]', '(.*?)', str1)
str1 = str1.lower()
print(str1)
previous_dir = os.getcwd()
os.chdir('testfilefolder')
for filename in os.listdir('.'):
    with open(filename) as f:
        file_contents = f.read().lower()
        output = re.search("%s" % str1, file_contents)
        if output:
            print(" Match found in " + filename)
So, for example, if I have the string "the host has offered some recruiting advice" and search with "the host offered some $ rec*ting advice", it does not work, because of the dollar sign (which is replaced by (.*?)). The interesting thing is that "the host offered $ rec*ting advice" (note that "some" is gone) does work, so I can match one word if it exists. It looks like (.*?) is supposed to match one character, and every word has at least one character in it, so I suppose that is why it works. I am not sure whether (.*?) is even the right thing to use, but it is the best I have gotten working so far after my research. Any advice on that would be much appreciated. Note that where I have (.*?) in the text above, it seems to be treated as some sort of tag that formats the string between the (.*?)'s.
I however want to match 0 or 1 word. I had found something before similar to \bs+\b (I can't quite remember and I can't find it again), but couldn't get it to work anyways. I know that \b is supposed to match an empty string on either side of the possible existence of a word.
I apologize if this question is asked elsewhere, but it seems that everything I have found (that I can still find and was able to get working) is looking for a particular word. I, however, am looking to see if only 0 or 1 exists:
How do I match a word in a text file using python?
Your question is very hard to understand so this is probably not exactly what you are looking for but it may help you in the right direction.
If you want to find all words in the text this is how it could be done:
import re
str1 = "the host offered $ rec*ting advice"
re.findall(r'\b\S+\b',str1)
This will produce:
['the', 'host', 'offered', 'rec*ting', 'advice']
The \b-thing in the pattern is not actually matching a character, but a place in the string where a word starts or ends (see http://docs.python.org/2/library/re for more info on this).
The dollar sign is not considered a word since it's not a word character according to the \b-definition used.
If you want to get the first word in a string, if there is one, you could use:
re.findall(r'\b\S+\b',str1)[:1]
You will then get a list of zero or one element!
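The question also asks how to match zero or one word in place of the $. One possible reading is sketched below with invented test sentences: a word plus its trailing space is made optional with (?:\w+ )?, and * stands for any run of word characters via \w*. This is an assumption about the intended wildcard semantics, not something the answer above states.

```python
import re

# "$" -> an optional single word; "*" -> any run of word characters.
pattern = re.compile(r"\bthe host offered (?:\w+ )?rec\w*ting advice\b")

# Hypothetical sentences for illustration.
one_word = "the host offered some recruiting advice"
zero_words = "the host offered recruiting advice"
two_words = "the host offered some more recruiting advice"

print(bool(pattern.search(one_word)))    # True
print(bool(pattern.search(zero_words)))  # True
print(bool(pattern.search(two_words)))   # False
```

Unlike (.*?), which can expand to cover any amount of text, the optional group admits at most one extra word.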

Python regular expression - match string containing #expr1 and not #expr2 and not #expr3

I would like to match a string that has "subscribe" and does not have "did not" or "unsub.*" in it.
For example,
"please subscribe me" would match
but "I did not subscribe this email" or "please unsubscribe me" would fail to match.
what I have is
".*subscribe(?!.*did\\s+not)(?!.*unsub.*)"
which apparently doesn't work.
So again, the expression I want is (A and !B and !C)
Any help would be appreciated.
Thank you,
Eric
Your lookaheads should be at the start of the regular expression:
re.match(r"(?!.*did\s+not)(?!.*unsub).*subscribe", text)
Regex:
^(?!.*unsub)(?!.*did not).*subscribe
Python:
re.match(r"^(?!.*unsub)(?!.*did not).*subscribe", str)
You can do this with positive and negative lookahead, but a far better approach is to have one regexp for search terms, and another for the stopwords.
if re.search(r"\bsubscribe", text) and not re.search(r"did\s+not|\bunsub", text):
    subscribe(sender)
Lookaheads are only worth using if you need to include/exclude text at specific positions.
Also note that the \b (word boundary) will keep "subscribe" from matching inside the word "unsubscribe".
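For comparison, here are both approaches run against the three example sentences from the question. This is a sketch; the names LOOKAHEAD, WANTED, UNWANTED and wants_subscription are made up for illustration.

```python
import re

# Single pattern with negative lookaheads anchored at the start.
LOOKAHEAD = re.compile(r"^(?!.*did\s+not)(?!.*unsub).*subscribe")

# Two separate patterns: one for the search term, one for the stopwords.
WANTED = re.compile(r"\bsubscribe")
UNWANTED = re.compile(r"did\s+not|\bunsub")

def wants_subscription(text):
    # A and not (B or C)
    return bool(WANTED.search(text)) and not UNWANTED.search(text)

samples = [
    "please subscribe me",
    "I did not subscribe this email",
    "please unsubscribe me",
]

for text in samples:
    print(bool(LOOKAHEAD.match(text)), wants_subscription(text), text)
# True True please subscribe me
# False False I did not subscribe this email
# False False please unsubscribe me
```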
