I want to search for some text in a line. As an example, text is:
{'id: 'id-::blabla1::blabal2-A'}
or
{'id: 'id-::blabla3::blabal4-B'}
or
{'id: 'id-::blabla5::blabal6-c'}
I want to find this text: A or B or C. How do I build a regular expression in python to do this?
I think you mean a dictionary, although you are missing a ' after id in each case.
I guess something like this is what you're looking for:
import re
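# Named data rather than dict so the built-in dict type is not shadowed.
data = {'id': 'id-::blabla1::blabal2-A'}
# Replace the whole value with the single word character after the last '-'.
test = re.sub(r'.+?::.+?::.+?-(\w)', r'\1', data['id'])
The regex could be simplified, but this is all I can do for you based on this info.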
You can start with this one :
{'id: 'id-(?::.*?){2}-([a-zA-Z])'}
see : https://regex101.com/r/XnlUTi/2
([a-zA-Z])
This capture group is what returns A, B, or c.
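As a rough illustration, that pattern could be used from Python along these lines. The sample list and variable names are assumptions made for the sketch, and the braces are escaped only to make it explicit that they are literal characters:

```python
import re

# Sample lines from the question (the missing quote after id is kept as posted).
samples = [
    "{'id: 'id-::blabla1::blabal2-A'}",
    "{'id: 'id-::blabla3::blabal4-B'}",
    "{'id: 'id-::blabla5::blabal6-c'}",
]

# Same pattern as above, with the literal braces escaped.
pattern = re.compile(r"\{'id: 'id-(?::.*?){2}-([a-zA-Z])'\}")

letters = []
for sample in samples:
    match = pattern.search(sample)
    if match:  # search() returns None when the line does not match
        letters.append(match.group(1))

print(letters)  # ['A', 'B', 'c']
```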
import re
content = "{'id: 'id-::blabla1::blabal2-A'}"
pattern = re.compile('{\'id: \'id-::blabla.*?::blabal.*?-(.*?)\'}', re.S)
print(re.findall(pattern, content))
.*? lazily matches any run of characters; (.*?) is a capture group around the part you want to extract.
re.findall(pattern, content) returns a list of the captured strings for every match of the regular expression.
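To make the difference between .* and .*? concrete, here is a small sketch on a made-up string (the sample text is an assumption, not from the question):

```python
import re

text = "<a><b>"  # hypothetical sample

# Greedy: .* grabs as much as possible, so the match runs to the last '>'.
greedy = re.findall(r"<(.*)>", text)
# Lazy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r"<(.*?)>", text)

print(greedy)  # ['a><b']
print(lazy)    # ['a', 'b']
```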
I need a Python regular expression to extract all the occurrences of a string from a line.
So for example,
line = 'TokenRange(start_token:5835456583056758754, end_token:5867789857766669245, rack:brikbrik0),EndpointDetails(host:192.168.210.183, datacenter:DC1, rack:brikbrikadfdas), EndpointDetails(host:192.168.210.182, datacenter:DC1, rack:brikbrik1adf)])'
I want to extract all the strings that contain the rack ID. I am not good with regex, and when I looked at the Python docs I could not find the correct use of re.findall or a similar function.
Can someone help me with the regular expression?
Here is the output I need: [brikbrik0, brikbrikadfdas, brikbrik1adf]
You can capture the word characters (letters, digits, and underscores) that come after rack:
>>> re.findall(r"rack:(\w+)", line)
['brikbrik0', 'brikbrikadfdas', 'brikbrik1adf']
Add a word boundary to rack:
\brack:(\w+)
See a demo on regex101.com.
In Python (demo on ideone.com):
import re
string = """TokenRange(start_token:5835456583056758754, end_token:5867789857766669245, rack:brikbrik0),EndpointDetails(host:192.168.210.183, datacenter:DC1, rack:brikbrikadfdas), EndpointDetails(host:192.168.210.182, datacenter:DC1, rack:brikbrik1adf)])"""
rx = re.compile(r'\brack:(\w+)')
matches = [match.group(1) for match in rx.finditer(string)]
print(matches)
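To see what the word boundary buys you, compare both patterns on a string where rack: also appears inside a longer word. The sample string is made up for illustration:

```python
import re

sample = "track:abc, rack:def"  # hypothetical input, not from the question

# Without \b the pattern also matches the 'rack:' inside 'track:'.
without_boundary = re.findall(r"rack:(\w+)", sample)
# With \b the 'r' must start a word, so 'track:' is skipped.
with_boundary = re.findall(r"\brack:(\w+)", sample)

print(without_boundary)  # ['abc', 'def']
print(with_boundary)     # ['def']
```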
I have a more challenging task, but first I am faced with this issue. Given a string s, I want to extract all the groups of characters marked by some delimiter, e.g. parentheses. How can I accomplish this using regular expressions (or any Pythonic way)?
>>> import re
>>> s = '(3,1)-[(7,2),1,(a,b)]-8a'
>>> pattern = r'(\(.+\))'
>>> re.findall(pattern, s)  # EDITED: findall vs. search
['(3,1)-[(7,2),1,(a,b)']
# Desired result
['(3,1)', '(7,2)', '(a,b)']
Use findall() instead of search(). The former finds all occurrences; the latter finds only the first.
Use the non-greedy quantifier (+? instead of +). Otherwise, the match starts at the first ( and ends at the final ).
Note that regular expressions aren't a good tool for finding nested expressions like: ((1,2),(3,4)).
import re
s = '(3,1)-[(7,2),1,(a,b)]-8a'
pattern = r'(\(.+?\))'
print(re.findall(pattern, s))
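For the nested case mentioned above, one option is to drop the regex and track the nesting depth by hand. This is only a sketch: the function name and its parameters are made up for illustration, and it returns just the outermost groups.

```python
def top_level_groups(text, open_ch="(", close_ch=")"):
    """Return the outermost delimited groups in text (hypothetical helper)."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i  # remember where the outermost group begins
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

flat = top_level_groups('(3,1)-[(7,2),1,(a,b)]-8a')
nested = top_level_groups('((1,2),(3,4))')
print(flat)    # ['(3,1)', '(7,2)', '(a,b)']
print(nested)  # ['((1,2),(3,4))']
```

Unbalanced closing delimiters are ignored here. Whether that is the right behaviour depends on the input you expect.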
Use re.findall()
import re
data = '(3,1)-[(7,2),1,(a,b)]-8a'
found = re.findall(r'(\(\w,\w\))', data)
print(found)
Output:
['(3,1)', '(7,2)', '(a,b)']
I am looking into the Regex function in Python.
As part of this, I am trying to extract a substring from a string.
For instance, assume I have the string:
<place of birth="Stockholm">
Is there a way to extract Stockholm with a single regex call?
So far, I have:
location_info = '<place of birth="Stockholm">'
#Remove before
location_name1 = re.sub(r"<place of birth=\"", r"", location_info)
#location_name1 --> Stockholm">
#Remove after
location_name2 = re.sub(r"\">", r"", location_name1)
#location_name2 --> Stockholm
Any advice on how to extract the string Stockholm, without using two "re.sub" calls is highly appreciated.
Sure, you can match the beginning up to the double quotes, and match and capture all the characters other than double quotes after that:
import re
p = re.compile(r'<place of birth="([^"]*)')
location_info = "<place of birth=\"Stockholm\">"
match = p.search(location_info)
if match:
    print(match.group(1))
See IDEONE demo
The <place of birth=" is matched as a literal, and ([^"]*) is a capture group 1 matching 0 or more characters other than ". The value is accessed with .group(1).
Here is a REGEX demo.
print(re.sub(r'^[^"]*"|"[^"]*$', "", location_info))
This should do it for you. See demo.
https://regex101.com/r/vV1wW6/30#python
Is there a specific reason why you are removing the rest of the string, instead of selecting the part you want with something like:
location_info = '<place of birth="Stockholm">'
location_info = re.search('<.*="(.*)".*>', location_info, re.IGNORECASE).group(1)
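One thing to watch with this style: re.search returns None when nothing matches, so chaining .group(1) directly raises AttributeError on unexpected input. A guarded variant might look like the sketch below. The helper name is made up, and the pattern is a slightly stricter assumption, not the one above.

```python
import re

def extract_quoted_value(text):
    """Return the first double-quoted attribute value, or None (hypothetical helper)."""
    match = re.search(r'<[^>]*="([^"]*)"', text)
    return match.group(1) if match else None

found = extract_quoted_value('<place of birth="Stockholm">')
missing = extract_quoted_value('<place>')
print(found)    # Stockholm
print(missing)  # None
```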
This code was tested under Python 3.6:
import re
test = '<place of birth="Stockholm">'
resp = re.sub(r'.*="(\w+)">',r'\1',test)
print (resp)
Stockholm
I've come up with a regex expression that works well enough for my purposes for finding phone numbers.
I would like to take it a step further and use it in large text blocks to identify matching strings that follow the words 'cell' or 'mobile' by at most 10 characters. I would like it to return the number in Cell Phone: (954) 555-4444 as well as Mobile 555-777-9999 but not Fax: (555) 444-6666
something like (in pseudocode)
regex = re.compile(r'(\+?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4})')
bigstring = # Some giant string added together from many globbed files
matches = regex.finditer(bigstring)
for match in matches:
    if match follows 'cell' or match follows 'mobile':
        print(match.group(0))
You can do:
txt='''\
Call me on my mobile anytime: 555-666-1212
The office is best at 555-222-3333
Dont ever call me at 555-666-2345 '''
import re
print(re.findall(r'(?:(mobile|office).{0,15}(\+?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4}))', txt))
Prints:
[('mobile', '555-666-1212'), ('office', '555-222-3333')]
You can do that with your regular expression. In the re documentation, you will find that the pattern r'(?<=abc)def' matches 'def' only if it is preceded by 'abc'.
Similarly r'Hello (?=World)' matches 'Hello ' if followed by 'World'
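Here is a small sketch of both lookarounds, followed by one way to handle the "within 10 characters" requirement. Python's re module only accepts fixed-width lookbehind, so a variable gap does not fit inside (?<=...). The second part instead matches the keyword and a bounded gap, then captures the number. The sample text is assembled from the question's examples; the variable names and the \(? for an optional opening parenthesis are assumptions of this sketch.

```python
import re

# Lookbehind: 'def' is matched only where 'abc' comes directly before it.
lookbehind_hits = re.findall(r"(?<=abc)def", "abcdef xyzdef")
# Lookahead: 'Hello ' is matched only where 'World' comes directly after it.
lookahead_hits = re.findall(r"Hello (?=World)", "Hello World, Hello there")
print(lookbehind_hits)  # ['def']
print(lookahead_hits)   # ['Hello ']

# Sample text assembled from the examples in the question.
text = "Cell Phone: (954) 555-4444\nMobile 555-777-9999\nFax: (555) 444-6666"

# Keyword, then at most 10 characters, then the number in a capture group.
# Assumption: \(? is used here for the optional opening parenthesis.
phone_after_keyword = re.compile(
    r"(?:cell|mobile).{0,10}?(\(?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4})",
    re.IGNORECASE,
)
numbers = phone_after_keyword.findall(text)
print(numbers)  # ['(954) 555-4444', '555-777-9999']
```

Because . does not match a newline by default, the gap cannot reach across lines, so the Fax number is not picked up.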
Is there a way to see if a line contains words that matches a set of regex pattern?
If I have [regex1, regex2, regex3], and I want to see if a line matches any of those, how would I do this?
Right now, I am using re.findall(regex1, line), but it only matches 1 regex at a time.
You can use the built-in functions any (or all, if all regexes have to match) and a generator expression to cycle through all the regex objects.
any(regex.match(line) for regex in [regex1, regex2, regex3])
(or any(re.match(regex_str, line) for regex_str in [regex_str1, regex_str2, regex_str3]) if the regexes are not pre-compiled regex objects, of course)
However, that will be inefficient compared to combining your regexes in a single expression. If this code is time- or CPU-critical, you should try instead to compose a single regular expression that encompasses all your needs, using the special | regex operator to separate the original expressions.
A simple way to combine all the regexes is to use the string join method:
re.match("|".join([regex_str1, regex_str2, regex_str3]), line)
A warning about combining the regexes in this way: it can result in wrong expressions if the original ones already make use of the | operator.
Try this new regex: (regex1)|(regex2)|(regex3). This will match a line with any of the 3 regexes in it.
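A minimal sketch of both approaches side by side, assuming three made-up patterns. re.search is used instead of re.match because match only succeeds at the start of the line, while the question asks whether the line contains a match. Wrapping each pattern in a non-capturing group (?:...) before joining keeps any | inside an individual pattern from interfering with the others and does not add capture groups.

```python
import re

# Hypothetical patterns, chosen only for illustration.
patterns = [r"\bcat\b", r"\d{3}-\d{4}", r"^ERROR"]

# Approach 1: try each compiled pattern in turn.
compiled = [re.compile(p) for p in patterns]

def matches_any(line):
    return any(rx.search(line) for rx in compiled)

# Approach 2: one combined pattern, each part wrapped in (?:...).
combined = re.compile("|".join(f"(?:{p})" for p in patterns))

def matches_any_combined(line):
    return combined.search(line) is not None

for line in ["the cat sat", "call 555-1234", "ERROR: disk full", "nothing here"]:
    print(line, matches_any(line), matches_any_combined(line))
```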
You could loop through the regex items and do a search.
regexList = [regex1, regex2, regex3]
line = 'line of data'
gotMatch = False
for regex in regexList:
    s = re.search(regex, line)
    if s:
        gotMatch = True
        break
if gotMatch:
    doSomething()
# Quite new to Python, but had the same problem. Made this to find all matches
# for multiple regular expressions.
import re

regex1 = r"your regex here"
regex2 = r"your regex here"
regex3 = r"your regex here"
regexList = [regex1, regex2, regex3]
your_string = "your string here"
found_regex_list = []  # make a list to add the matches to
for x in regexList:
    some_list = re.findall(x, your_string)  # empty list when nothing matches
    for y in some_list:
        found_regex_list.append(y)