I need a python regular expression to extract all the occurrences of a string from the line .
So for example,
line = 'TokenRange(start_token:5835456583056758754, end_token:5867789857766669245, rack:brikbrik0),EndpointDetails(host:192.168.210.183, datacenter:DC1, rack:brikbrikadfdas), EndpointDetails(host:192.168.210.182, datacenter:DC1, rack:brikbrik1adf)])'
I want to extract all the string which contains the rack ID. I am crappy with reg ex, so when I looked at the python docs but could not find the correct use of re.findAll or some similar regex expression.
Can someone help me with the regular expression?
Here is the output i need : [brikbrik0,brikbrikadfdas, brikbrik1adf]
You can capture alphanumerics coming after the rack::
>>> re.findall(r"rack:(\w+)", line)
['brikbrik0', 'brikbrikadfdas', 'brikbrik1adf']
Add a word boundary to rack:
\brack:(\w+)
See a demo on regex101.com.
In Python (demo on ideone.com):
import re
string = """TokenRange(start_token:5835456583056758754, end_token:5867789857766669245, rack:brikbrik0),EndpointDetails(host:192.168.210.183, datacenter:DC1, rack:brikbrikadfdas), EndpointDetails(host:192.168.210.182, datacenter:DC1, rack:brikbrik1adf)])"""
rx = re.compile(r'\brack:(\w+)')
matches = [match.group(1) for match in rx.finditer(string)]
print(matches)
Related
I am trying to create a Python regex to capture a file name, but only if the text "external=true" appears within the square brackets after the alleged file name.
I believe I am nearly there, but am missing a specific use-case. Essentially, I want to capture the text between qrcode: and the first [, but only if the text external=true appears between the two square brackets.
I have created the regex qrcode:([^:].*?)\[.*?external=true.*?\], which does not work for the second line below: it incorrectly returns vcard3.txt and does not return vcard4.txt.
qrcode:vcard1.txt[external=true] qrcode:vcard2.txt[xdim=2,ydim=2]
qrcode:vcard3.txt[xdim=2,ydim=2] qrcode:vcard4.txt[xdim=2,ydim=2,external=true]
qrcode:vcard5.txt[xdim=2,ydim=2,external=true,foreground=red,background=white]
qrcode:https://www.github.com[foreground=blue]
https://regex101.com/r/bh3IMb/3
As an alternative you can use
qrcode:([\w\.]+)(?=\[[\w\=,]*external=true[^\]]*)
See the regex demo.
Python demo:
import re
regex = re.compile(r"qrcode:([\w\.]+)(?=\[[\w\=,]*external=true[^\]]*)")
sample = """
qrcode:vcard1.txt[external=true] qrcode:vcard2.txt[xdim=2,ydim=2]
qrcode:vcard3.txt[xdim=2,ydim=2] qrcode:vcard4.txt[xdim=2,ydim=2,external=true]
qrcode:vcard5.txt[xdim=2,ydim=2,external=true,foreground=red,background=white]
qrcode:https://www.github.com[foreground=blue]
"""
print(regex.findall(sample))
Output:
['vcard1.txt', 'vcard4.txt', 'vcard5.txt']
Using positive look-ahead (for qrcode:) and positive look-behind (for [*external=true with lazy matching to capture the smallest of such groups.
Regex101 explanation: https://regex101.com/r/bOezIm/1
A complete python example:
import re
pattern = r"(?<=qrcode:)[^:]*?(?=\[[^\]]*?external=true)"
string = """
qrcode:vcard1.txt[external=true] qrcode:vcard2.txt[xdim=2,ydim=2]
qrcode:vcard3.txt[xdim=2,ydim=2] qrcode:vcard4.txt[xdim=2,ydim=2,external=true]
qrcode:vcard5.txt[xdim=2,ydim=2,external=true,foreground=red,background=white]
qrcode:https://www.github.com[foreground=blue]
"""
print(re.findall(pattern, string))
I have been working on the python code to extract document Ids from text documents where IDs can be at the random line in the text using regex.
This document ID is comprised of four letters followed by a hyphen, followed by three numbers and optionally ending in a letter. For example, each of the following is valid document IDs:
ABCD-123
ABCD-123V
XKCD-999
COMP-200
I have tried following regular expression for finding all ids:
re = re.findall(r"([A-Z]{4})(-)([0-9]{3})([A-Z]{0,1})", text.read())
These expressions work correctly but I have a problem when Ids are connected to words like:
XKCD-999James
The regular expression should return XKCD-999 but it is returning XKCD-999J which is incorrect.
What changes should I do in RE to get the correct?
Use a negative lookahead assertion to ignore patterns that have trailing letters:
exp = re.findall(r"([A-Z]{4})(-)([0-9]{3})([A-Z](?![A-Za-z]))?", text.read())
# ^^^^^^^^^^^^^^^^^^^^
As you are using word characters, you can optionally match a char A-Z followed by a word boundary.
\b[A-Z]{4}-[0-9]{3}(?:[A-Z]\b)?
Regex demo
Note that using re.findall will return the captured groups, so if you want to return just the whole match, you can omit the groups.
With the capture groups, the pattern can be:
\b([A-Z]{4})(-)([0-9]{3}(?:[A-Z]\b)?)
Regex demo
How about you use a boundary operation \b ?
[A-Z]{4}-\d{3}(?:[A-Z]\b)?
Regex101 Sample - https://regex101.com/r/DhC5Vd/4
text = "XKCD-999James"
exp = re.findall(r"[A-Z]{4}-\d{3}(?:[A-Z]\b)?", text)
#OUTPUT: ['XKCD-999']
I want to extract a value if exist from an url using regex ,
My string :
string = "utm_source=google&utm_campaign=replay&utm_medium=display&ctm_account=4&ctm_country=fr&ctm_bu=b2c&ctm_adchannel=im&esl-k=gdn|nd|c427558773026|m|k|pwww.ldpeople.com|t|dm|a100313514420|g9711440090"
From this string, I want to extract : c427558773026 , the value to extract will start always by c and have this pattern |c*|
import re
pattern = re.compile('|c\w|')
pattern.findall(string)
The result is none in my case, I am using python 2.7
You could assert a pipe (not that it is escaped) \| on the left and right using lookarounds, and match a c char followed by 1+ digits \d+
(?<=\|)c\d+(?=\|)
Regex demo
import re
string = "utm_source=google&utm_campaign=replay&utm_medium=display&ctm_account=4&ctm_country=fr&ctm_bu=b2c&ctm_adchannel=im&esl-k=gdn|nd|c427558773026|m|k|pwww.ldpeople.com|t|dm|a100313514420|g9711440090"
print(re.findall(r"(?<=\|)c\d+(?=\|)", string))
Or use a capturing group leaving out the lookbehind as #Wiktor Stribiżew suggest:
\|(c\d+)(?=\|)
Regex demo
The problem with your approach is that | is the or, which must be escaped to match the literal character. Additionally, you could use look-ahead/look-behind to ensure that | is encapsulating the string, and not capture it with findall
Here is a code snippet that should solve the problem:
>>> import re
>>> string = "utm_source=google&...&esl-k=gdn|nd|c427558773026|m|k|..."
>>> pattern = re.compile('(?<=\|)c\d+(?=\|)')
>>> pattern.findall(string)
['c427558773026']
I have a string of the below pattern.
string = "'L3-OPS-AB-1499', 'L3-RBP_C-449', 'L2-COM-310', 'L2-PLT-16796'"
My requirement is for a regular expression to find all the occurrences of following patterns as below
a string starting with L
followed by a number
a hyphen
then special keywords like "OPS-AB" or "PLT" or "COM"
then a hyphen
followed by a number
ex: L3-OPS-AB-1499
I tried the below regex but it gives a partial result
regex = re.search("L\d-(OPS|RBP_|-AB|C)|(COM|PLT)-\d+",string)
my output
'COM-310'
expected output
'L3-OPS-AB-1499', 'L3-RBP_C-449', 'L2-COM-310', 'L2-PLT-16796'
Any help will be appreciated, thanks
Use re.findall
>>> re.findall(r'L\d+-(?:OPS-AB|PLT|COM|RBP_C)-\d+', string)
['L3-OPS-AB-1499', 'L3-RBP_C-449', 'L2-COM-310', 'L2-PLT-16796']
I want to search for some text in a line. As an example, text is:
{'id: 'id-::blabla1::blabal2-A'}
or
{'id: 'id-::blabla3::blabal4-B'}
or
{'id: 'id-::blabla5::blabal6-c'}
I want to find this text: A or B or C. How do I build a regular expression in python to do this?
I think you mean a dictionary although you miss a ' in both cases.
I guess something like this is what you're looking for:
import re
dict = {'id': 'id-::blabla1::blabal2-A'}
test = re.sub(r'.+?::.+?::.+?-(\w)',r'\1',dict['id'])
regex could be simplified but this is all I can do you for based on this info
You can start with this one :
{'id: 'id-(?::.*?){2}-([a-zA-Z])'}
see : https://regex101.com/r/XnlUTi/2
([a-zA-Z])
This will be the group match who return A or B or c
import re
content = "{'id: 'id-::blabla1::blabal2-A'}"
pattern = re.compile('{\'id: \'id-::blabla.*?::blabal.*?-(.*?)\'}', re.S)
print re.findall(pattern, content)
.*? represents anything, (.*?) represents things that you want.
re.findall(pattern, content) will return a list that meet the regular expression.