Regex match " character - python

I am trying to get data using regex but just don't know how to match the character " in the regex. I have tried the following:
text = "value=1.211.1.1"
regex ='''w+\=(\d+\.\d+\.\d+\.\d+)'''
match_result = 1.211.1.1
However when my text is:
text = value=""value=1.211.1.1""
I am not able to get the match. I tried the following but it doesn't work. How can I determine whether the " character is in a given string?
regex = '''w+\=\"(\d+\.\d+\.\d+\.\d+)\"'''

Your question is a little confusing but is this perhaps what you're after?
import re
s = '"value="1.211.1.1"'
m = re.match(r'''['"]*\w+=['"]?(\d+\.\d+\.\d+\.\d+)['"]*''', s)
print(m.group(1))
Output
1.211.1.1
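A double quote has no special meaning in a regex; it only has to get past Python's string-literal syntax, which a single-quoted raw string takes care of. The sketch below assumes the input looks like value="1.211.1.1" (the exact input in the question is unclear), and the names QUOTED_IP and extract_quoted_ip are made up for illustration:

```python
import re

# Hypothetical sample; the question's exact input is unclear.
text = 'value="1.211.1.1"'

# Plain membership test: is there a double quote anywhere in the string?
has_quote = '"' in text

# Single-quoted raw string, so the double quotes need no escaping.
QUOTED_IP = re.compile(r'\w+="(\d+\.\d+\.\d+\.\d+)"')

def extract_quoted_ip(s):
    """Return the dotted value between double quotes, or None if absent."""
    m = QUOTED_IP.search(s)
    return m.group(1) if m else None

print(has_quote)                # True
print(extract_quoted_ip(text))  # 1.211.1.1
```

If you only need to know whether the character is present, the `in` test is enough and no regex is required.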

Related

How can I add a string inside a string?

The problem is simple: I'm given a random string and a random pattern, and I'm told to get all the possible combinations of that pattern that occur in the string and mark them with [target] and [endtarget] at the beginning and end.
For example:
given the following text: "XuyZB8we4"
and the following pattern: "XYZAB"
The expected output would be: "[target]X[endtarget]uy[target]ZB[endtarget]8we4".
I already have the part that identifies all the matches, but I can't find a way of placing the [target] and [endtarget] strings before and after each one (called match in the code).
import re
def tagger(text, search):
    place_s = "[target]"
    place_f = "[endtarget]"
    pattern = re.compile(rf"[{search}]+")
    matches = pattern.finditer(text)
    for match in matches:
        print(match)
    return test_string
test_string = "alsikjuyZB8we4 aBBe8XAZ piarBq8 Bq84Z "
pattern = "XYZAB"
print(tagger(test_string, pattern))
I also tried the for with the sub method, but I couldn't get it to work.
    for match in matches:
        re.sub(match.group(0), place_s + match.group(0) + place_f, text)
    return text
re.sub allows you to pass backreferences to matched groups within your pattern, so you need to enclose your pattern in parentheses (or create a named group). It then replaces all matches in the entire string at once with your desired replacement:
In [10]: re.sub(r'([XYZAB]+)', r'[target]\1[endtarget]', test_string)
Out[10]: 'alsikjuy[target]ZB[endtarget]8we4 a[target]BB[endtarget]e8[target]XAZ[endtarget] piar[target]B[endtarget]q8 [target]B[endtarget]q84[target]Z[endtarget] '
With this approach, re.finditer is not needed at all.
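Folding that back into the tagger function from the question might look like the sketch below. The default marker arguments and the use of re.escape are additions for illustration, and it assumes search is a non-empty string of literal characters:

```python
import re

def tagger(text, search, start="[target]", end="[endtarget]"):
    """Wrap every run of characters from `search` in start/end markers."""
    # re.escape keeps characters such as ']' or '^' literal inside the class.
    pattern = re.compile(f"[{re.escape(search)}]+")
    # A function replacement avoids backslash handling in the replacement text.
    return pattern.sub(lambda m: f"{start}{m.group(0)}{end}", text)

print(tagger("XuyZB8we4", "XYZAB"))
# [target]X[endtarget]uy[target]ZB[endtarget]8we4
```

The original loop had no effect because re.sub returns a new string rather than modifying text in place, and the return value was discarded.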

Inconsistency between regex and python search

I'm writing a small regex that captures all the text before the numbers.
https://regex101.com/r/JhIiG9/2
import re
regex = r"^(.*?)(\d*([-.]\d*)*)$"
message = "Myteeeeext 0.366- 0.3700"
result = re.search(regex, message)
print(result.group(1))
https://www.online-python.com/a7smOJHBwp
When I run this regex, instead of just showing the first group, which is Myteeeeext, I'm getting Myteeeeext 0.366-, but in regex101 it shows only
Try this regex: [^\d.-]+
It captures all the text before the numbers:
import re
regex = r"[^\d.-]+"
message = "Myteeeeext 0.366- 0.3700 notMyteeeeext"
result = re.search(regex, message)
print(f"'{result.group()}'")
Outputs:
'Myteeeeext '
Let me know if that works for you.
Your regex:
regex = r"^(.*?)(\d*([-.]\d*)*)$"
doesn't allow for the numbers part to have any spaces, but your search string:
message = "Myteeeeext 0.366- 0.3700"
does have a space after the dash, so this part of your regex:
(.*?)
matches up to the second number.
It doesn't look like your test string in the regex101.com example you gave has a space, so that's why your results are different.
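The effect of that space can be checked directly. In this sketch, without_space is only a guess at what the regex101 test string looked like, and relaxed is a hypothetical variant that also accepts whitespace between the number parts:

```python
import re

# The pattern from the question, written as a raw string.
original = re.compile(r"^(.*?)(\d*([-.]\d*)*)$")

with_space = "Myteeeeext 0.366- 0.3700"
without_space = "Myteeeeext 0.366-0.3700"  # assumed regex101 input (no space after the dash)

print(repr(original.search(with_space).group(1)))     # 'Myteeeeext 0.366- '
print(repr(original.search(without_space).group(1)))  # 'Myteeeeext '

# Hypothetical variant: also allow whitespace inside the numeric tail.
relaxed = re.compile(r"^(.*?)(\d*(?:[-.\s]\d*)*)$")
print(repr(relaxed.search(with_space).group(1)))      # 'Myteeeeext'
```

Because group 1 is lazy, it grows only until the rest of the string can be consumed by the numeric part, so anything the numeric part cannot match (here, the space) ends up in group 1.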

How can I "divide" words with regular expressions?

I have a sentence in which every token has a / in it. I want to just print what I have before the slash.
What I have now is basic:
text = less/RBR.....
return re.findall(r'\b(\S+)\b', text)
This obviously just prints the text, how do I cut off the words before the /?
I'm assuming you want all the characters before the slash from every word that contains a slash. For example, for the input string match/this but nothing here but another/one you would want the results match and another.
With regex:
import re
my_string = "match/this but nothing here but another/one"
result = re.findall(r"\b(\w*?)/\w*?\b", my_string)
print(result)
Without regex:
result = [word.split("/")[0] for word in my_string.split() if "/" in word]
print(result)
Simple and straight-forward:
rx = r'^[^/]+'
# anchor it to the beginning
# the class says: match everything not a forward slash as many times as possible
In Python this would be:
import re
text = "less/RBR....."
print(re.match(r'[^/]+', text))
As this is a match object, you'd probably like to print the matched text instead, like so:
print(re.match(r'[^/]+', text).group(0))
# less
This should also work
\b([^\s/]+)(?=/)\b
Python Code
import re
p = re.compile(r'\b([^\s/]+)(?=/)\b')
test_str = "less/RBR/...."
print(re.findall(p, test_str))
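For the case described in the question, where every token in a sentence carries a slash, the approaches above can be applied to the whole sentence at once. A minimal sketch, assuming tokens of the form word/TAG separated by whitespace (the sample sentence is made up):

```python
import re

# Hypothetical tagged sentence; every token has the form word/TAG.
sentence = "less/RBR than/IN two/CD"

# Capture the run of non-slash, non-space characters that precedes each slash.
words_regex = re.findall(r"([^\s/]+)/\S*", sentence)

# Without regex: split on whitespace, keep the part before the first slash.
words_split = [tok.partition("/")[0] for tok in sentence.split() if "/" in tok]

print(words_regex)  # ['less', 'than', 'two']
print(words_split)  # ['less', 'than', 'two']
```

The trailing \S* consumes the rest of each token, so a token with several slashes still contributes only its first part.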

regular expression to extract part of email address

I am trying to use a regular expression to extract the part of an email address between the "#" sign and the "." character. This is how I am currently doing it, but can't get the right results.
company = re.findall('^From:.+#(.*).',line)
Gives me:
['#iupui.edu']
I want to get rid of the .edu
To match a literal . in your regex, you need to use \., so your code should look like this:
company = re.findall(r'^From:.+#(.*)\.',line)
#                                   ^ the dot at this position must be escaped
Note that this will always match up to the last occurrence of . in your string, because (.*) is greedy. If you want to stop at the first occurrence, you need to exclude any . from your capturing group:
company = re.findall(r'^From:.+#([^.]*)\.',line)
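The difference between the two patterns only shows up when more than one dot follows the separator. A small sketch with a made-up header line:

```python
import re

# Hypothetical header line with two dots after the '#', to expose the difference.
line = "From: someone#mail.iupui.edu"

# Greedy (.*) runs to the end of the line, then backtracks to the last dot.
greedy = re.findall(r'^From:.+#(.*)\.', line)

# [^.]* cannot cross a dot, so the capture stops at the first one.
first_dot = re.findall(r'^From:.+#([^.]*)\.', line)

print(greedy)     # ['mail.iupui']
print(first_dot)  # ['mail']
```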
You can try this:
(?<=\#)(.*?)(?=\.)
A simple example would be:
>>> import re
>>> re.findall(r".*(?<=\#)(.*?)(?=\.)", "From: atc#moo.com")
['moo']
>>> re.findall(r".*(?<=\#)(.*?)(?=\.)", "From: atc#moo-hihihi.com")
['moo-hihihi']
This matches the hostname regardless of the beginning of the line, i.e. it's greedy.
You could just split and find:
s = " abc.def#ghi.mn I"
s = s.split("#", 1)[-1]
print(s[:s.find(".")])
Or just split if it is not always going to match your string:
s = s.split("#", 1)[-1].split(".", 1)[0]
If the string is always going to match, then find will be the fastest:
i = s.find("#")
s = s[i+1:s.find(".", i)]
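The string-method variants above can be wrapped in a helper that also copes with input that lacks the separator or the dot. This is only a sketch; the function name company_from_address and its sep parameter are hypothetical:

```python
def company_from_address(address, sep="#"):
    """Return the text between `sep` and the next '.', or None if either is missing."""
    _, found, rest = address.partition(sep)
    if not found:
        return None  # separator not present
    name, dot, _ = rest.partition(".")
    return name if dot else None  # None when no dot follows the separator

print(company_from_address("abc.def#ghi.mn"))  # ghi
print(company_from_address("no separator"))    # None
```

str.partition splits on the first occurrence only and always returns three parts, which makes the missing-character cases easy to detect.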

Regex in Python - Substring with single "re.sub" call

I am looking into the regex functions in Python.
As part of this, I am trying to extract a substring from a string.
For instance, assume I have the string:
<place of birth="Stockholm">
Is there a way to extract Stockholm with a single regex call?
So far, I have:
location_info = '<place of birth="Stockholm">'
#Remove before
location_name1 = re.sub(r"<place of birth=\"", r"", location_info)
#location_name1 --> Stockholm">
#Remove after
location_name2 = re.sub(r"\">", r"", location_name1)
#location_name2 --> Stockholm
Any advice on how to extract the string Stockholm, without using two "re.sub" calls is highly appreciated.
Sure, you can match the beginning up to the double quotes, and match and capture all the characters other than double quotes after that:
import re
p = re.compile(r'<place of birth="([^"]*)')
location_info = "<place of birth=\"Stockholm\">"
match = p.search(location_info)
if match:
    print(match.group(1))
The <place of birth=" is matched as a literal, and ([^"]*) is a capture group 1 matching 0 or more characters other than ". The value is accessed with .group(1).
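The capture-group approach can be packaged as a small function that returns None when the tag does not match. The names BIRTHPLACE and birthplace are hypothetical, and this sketch assumes the attribute value never contains a double quote:

```python
import re

# Compile once; ([^"]*) captures everything up to the closing quote.
BIRTHPLACE = re.compile(r'<place of birth="([^"]*)"')

def birthplace(tag):
    """Return the quoted attribute value, or None when the tag does not match."""
    match = BIRTHPLACE.search(tag)
    return match.group(1) if match else None

print(birthplace('<place of birth="Stockholm">'))  # Stockholm
print(birthplace('<place of birth="New York">'))   # New York
print(birthplace('<place>'))                       # None
```

Because the class is [^"]* rather than \w+, values containing spaces or hyphens are captured as well.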
print(re.sub(r'^[^"]*"|"[^"]*$', "", location_info))
This should do it for you. See the demo:
https://regex101.com/r/vV1wW6/30#python
Is there a specific reason why you are removing the rest of the string, instead of selecting the part you want with something like this?
location_info = '<place of birth="Stockholm">'
location_info = re.search('<.*="(.*)".*>', location_info, re.IGNORECASE).group(1)
This code was tested under Python 3.6:
import re
test = '<place of birth="Stockholm">'
resp = re.sub(r'.*="(\w+)">', r'\1', test)
print(resp)
Stockholm
