Python: extract all occurrences of a string with regex

I need a Python regular expression to extract all occurrences of a string from a line.
For example:
line = 'TokenRange(start_token:5835456583056758754, end_token:5867789857766669245, rack:brikbrik0),EndpointDetails(host:192.168.210.183, datacenter:DC1, rack:brikbrikadfdas), EndpointDetails(host:192.168.210.182, datacenter:DC1, rack:brikbrik1adf)])'
I want to extract every string that holds a rack ID. I am not good with regular expressions, and when I looked at the Python docs I could not work out the correct use of re.findall or a similar function.
Can someone help me with the regular expression?
Here is the output I need: [brikbrik0, brikbrikadfdas, brikbrik1adf]

You can capture the word characters (letters, digits, underscore) that follow rack: with a capturing group:
>>> re.findall(r"rack:(\w+)", line)
['brikbrik0', 'brikbrikadfdas', 'brikbrik1adf']

Add a word boundary (\b) before rack: so that it does not match inside a longer word:
\brack:(\w+)
See a demo on regex101.com.
In Python (demo on ideone.com):
import re
string = """TokenRange(start_token:5835456583056758754, end_token:5867789857766669245, rack:brikbrik0),EndpointDetails(host:192.168.210.183, datacenter:DC1, rack:brikbrikadfdas), EndpointDetails(host:192.168.210.182, datacenter:DC1, rack:brikbrik1adf)])"""
rx = re.compile(r'\brack:(\w+)')
matches = [match.group(1) for match in rx.finditer(string)]
print(matches)
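
To see what the word boundary changes, here is a small sketch. The input is made up for illustration (the track: field does not appear in the question), so treat it as an assumption rather than real data.

```python
import re

# Hypothetical input: "track:lane7" is invented to provoke a false positive.
sample = "track:lane7, rack:brikbrik0, rack:brikbrik1adf"

# Without \b, "rack:" also matches inside "track:".
without_boundary = re.findall(r"rack:(\w+)", sample)

# With \b, "rack" must start at a word boundary, so "track:" is skipped.
with_boundary = re.findall(r"\brack:(\w+)", sample)

print(without_boundary)  # ['lane7', 'brikbrik0', 'brikbrik1adf']
print(with_boundary)     # ['brikbrik0', 'brikbrik1adf']
```

On the line from the question both patterns give the same result, since rack: never appears inside a longer word there.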

Related

Regex to capture string if other string present within brackets

I am trying to create a Python regex to capture a file name, but only if the text "external=true" appears within the square brackets after the alleged file name.
I believe I am nearly there, but am missing a specific use-case. Essentially, I want to capture the text between qrcode: and the first [, but only if the text external=true appears between the two square brackets.
I have created the regex qrcode:([^:].*?)\[.*?external=true.*?\], which does not work for the second line below: it incorrectly returns vcard3.txt and does not return vcard4.txt.
qrcode:vcard1.txt[external=true] qrcode:vcard2.txt[xdim=2,ydim=2]
qrcode:vcard3.txt[xdim=2,ydim=2] qrcode:vcard4.txt[xdim=2,ydim=2,external=true]
qrcode:vcard5.txt[xdim=2,ydim=2,external=true,foreground=red,background=white]
qrcode:https://www.github.com[foreground=blue]
https://regex101.com/r/bh3IMb/3

As an alternative you can use
qrcode:([\w\.]+)(?=\[[\w\=,]*external=true[^\]]*)
See the regex demo.
Python demo:
import re
regex = re.compile(r"qrcode:([\w\.]+)(?=\[[\w\=,]*external=true[^\]]*)")
sample = """
qrcode:vcard1.txt[external=true] qrcode:vcard2.txt[xdim=2,ydim=2]
qrcode:vcard3.txt[xdim=2,ydim=2] qrcode:vcard4.txt[xdim=2,ydim=2,external=true]
qrcode:vcard5.txt[xdim=2,ydim=2,external=true,foreground=red,background=white]
qrcode:https://www.github.com[foreground=blue]
"""
print(regex.findall(sample))
Output:
['vcard1.txt', 'vcard4.txt', 'vcard5.txt']

Use a positive look-behind (for qrcode:) and a positive look-ahead (for [ ... external=true), with lazy matching so that the smallest such group is captured.
Regex101 explanation: https://regex101.com/r/bOezIm/1
A complete Python example:
import re
pattern = r"(?<=qrcode:)[^:]*?(?=\[[^\]]*?external=true)"
string = """
qrcode:vcard1.txt[external=true] qrcode:vcard2.txt[xdim=2,ydim=2]
qrcode:vcard3.txt[xdim=2,ydim=2] qrcode:vcard4.txt[xdim=2,ydim=2,external=true]
qrcode:vcard5.txt[xdim=2,ydim=2,external=true,foreground=red,background=white]
qrcode:https://www.github.com[foreground=blue]
"""
print(re.findall(pattern, string))
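
If a single pattern with lookarounds feels hard to maintain, a two-step approach is another option: capture each name together with its bracketed options, then filter in plain Python. This is only a sketch and not what either answer above does; the names entry_rx and external_files are made up for the example, and unlike the patterns above it does not exclude names that contain a colon.

```python
import re

sample = """
qrcode:vcard1.txt[external=true] qrcode:vcard2.txt[xdim=2,ydim=2]
qrcode:vcard3.txt[xdim=2,ydim=2] qrcode:vcard4.txt[xdim=2,ydim=2,external=true]
qrcode:vcard5.txt[xdim=2,ydim=2,external=true,foreground=red,background=white]
qrcode:https://www.github.com[foreground=blue]
"""

# Step 1: capture the name (no "[" or whitespace) and the raw option string.
entry_rx = re.compile(r"qrcode:([^\[\s]+)\[([^\]]*)\]")


def external_files(text):
    """Return the names whose bracketed options include external=true."""
    names = []
    for name, options in entry_rx.findall(text):
        # Step 2: compare whole options, so "external=truex" would not count.
        if "external=true" in options.split(","):
            names.append(name)
    return names


print(external_files(sample))  # ['vcard1.txt', 'vcard4.txt', 'vcard5.txt']
```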

Regular expression with condition

I have been working on Python code that uses regex to extract document IDs from text documents, where an ID can appear on any line of the text.
A document ID consists of four letters followed by a hyphen, followed by three digits, optionally ending in a letter. For example, each of the following is a valid document ID:
ABCD-123
ABCD-123V
XKCD-999
COMP-200
I have tried the following regular expression for finding all IDs:
ids = re.findall(r"([A-Z]{4})(-)([0-9]{3})([A-Z]{0,1})", text.read())
This expression works correctly, but I have a problem when an ID is directly followed by a word, like:
XKCD-999James
The regular expression should return XKCD-999, but it returns XKCD-999J, which is incorrect.
What should I change in the regex to get the correct result?

Use a negative lookahead assertion to ignore patterns that have trailing letters:
exp = re.findall(r"([A-Z]{4})(-)([0-9]{3})([A-Z](?![A-Za-z]))?", text.read())
#                                         ^^^^^^^^^^^^^^^^^^^^

Because the characters around the ID are word characters, you can optionally match a letter A-Z followed by a word boundary:
\b[A-Z]{4}-[0-9]{3}(?:[A-Z]\b)?
Regex demo
Note that using re.findall will return the captured groups, so if you want to return just the whole match, you can omit the groups.
With the capture groups, the pattern can be:
\b([A-Z]{4})(-)([0-9]{3}(?:[A-Z]\b)?)
Regex demo
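
The difference in what re.findall returns is easy to miss, so here is a short sketch of both variants. The sample text is made up from the IDs in the question.

```python
import re

# Sample text assembled from the IDs listed in the question.
text = "ABCD-123 ABCD-123V XKCD-999James COMP-200"

# Only a non-capturing group: findall returns each whole match as a string.
whole = re.findall(r"\b[A-Z]{4}-[0-9]{3}(?:[A-Z]\b)?", text)

# Capturing groups: findall returns one tuple of groups per match.
grouped = re.findall(r"\b([A-Z]{4})(-)([0-9]{3}(?:[A-Z]\b)?)", text)

print(whole)    # ['ABCD-123', 'ABCD-123V', 'XKCD-999', 'COMP-200']
print(grouped)  # [('ABCD', '-', '123'), ('ABCD', '-', '123V'), ...]
```

In both cases the J of James is left out, because no word boundary follows it.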

How about using a word boundary, \b?
[A-Z]{4}-\d{3}(?:[A-Z]\b)?
Regex101 Sample - https://regex101.com/r/DhC5Vd/4
import re
text = "XKCD-999James"
exp = re.findall(r"[A-Z]{4}-\d{3}(?:[A-Z]\b)?", text)
#OUTPUT: ['XKCD-999']

Python: extract a substring, if it exists, from another string using regex

I want to extract a value, if it exists, from a URL using regex.
My string:
string = "utm_source=google&utm_campaign=replay&utm_medium=display&ctm_account=4&ctm_country=fr&ctm_bu=b2c&ctm_adchannel=im&esl-k=gdn|nd|c427558773026|m|k|pwww.ldpeople.com|t|dm|a100313514420|g9711440090"
From this string I want to extract c427558773026. The value to extract always starts with c and has the pattern |c*|.
import re
pattern = re.compile('|c\w|')
pattern.findall(string)
In my case I get no result. I am using Python 2.7.

You could assert a pipe \| (note that it is escaped) on the left and right using lookarounds, and match a c followed by one or more digits, \d+:
(?<=\|)c\d+(?=\|)
Regex demo
import re
string = "utm_source=google&utm_campaign=replay&utm_medium=display&ctm_account=4&ctm_country=fr&ctm_bu=b2c&ctm_adchannel=im&esl-k=gdn|nd|c427558773026|m|k|pwww.ldpeople.com|t|dm|a100313514420|g9711440090"
print(re.findall(r"(?<=\|)c\d+(?=\|)", string))
Or use a capturing group and leave out the lookbehind, as @Wiktor Stribiżew suggests:
\|(c\d+)(?=\|)
Regex demo

The problem with your approach is that | is the alternation (OR) operator, so it must be escaped to match the literal character. Additionally, you can use a look-behind and a look-ahead to make sure the value is enclosed in | characters without including them in what findall returns.
Here is a code snippet that should solve the problem:
>>> import re
>>> string = "utm_source=google&...&esl-k=gdn|nd|c427558773026|m|k|..."
>>> pattern = re.compile(r'(?<=\|)c\d+(?=\|)')
>>> pattern.findall(string)
['c427558773026']

python find all matches of multiple strings in a given string using regex?

I have a string of the below pattern.
string = "'L3-OPS-AB-1499', 'L3-RBP_C-449', 'L2-COM-310', 'L2-PLT-16796'"
My requirement is for a regular expression to find all the occurrences of following patterns as below
a string starting with L
followed by a number
a hyphen
then special keywords like "OPS-AB" or "PLT" or "COM"
then a hyphen
followed by a number
ex: L3-OPS-AB-1499
I tried the below regex but it gives a partial result
regex = re.search("L\d-(OPS|RBP_|-AB|C)|(COM|PLT)-\d+",string)
my output
'COM-310'
expected output
'L3-OPS-AB-1499', 'L3-RBP_C-449', 'L2-COM-310', 'L2-PLT-16796'
Any help will be appreciated, thanks

Use re.findall, with the keywords in a non-capturing group so that the whole match is returned:
>>> re.findall(r'L\d+-(?:OPS-AB|PLT|COM|RBP_C)-\d+', string)
['L3-OPS-AB-1499', 'L3-RBP_C-449', 'L2-COM-310', 'L2-PLT-16796']
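
If the keyword list is likely to grow, the alternation can be built from a Python list instead of being typed by hand. This is just a sketch of that idea; the variable names keywords and alternation are made up, and re.escape is used so that any special characters in a keyword are matched literally.

```python
import re

string = "'L3-OPS-AB-1499', 'L3-RBP_C-449', 'L2-COM-310', 'L2-PLT-16796'"

# Keywords taken from the question and the pattern above.
keywords = ["OPS-AB", "RBP_C", "COM", "PLT"]

# Escape each keyword and join them into a single alternation.
alternation = "|".join(re.escape(k) for k in keywords)
pattern = re.compile(rf"L\d+-(?:{alternation})-\d+")

matches = pattern.findall(string)
print(matches)  # ['L3-OPS-AB-1499', 'L3-RBP_C-449', 'L2-COM-310', 'L2-PLT-16796']
```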

How to search for text in line using python regular expression?

I want to search for some text in a line. As an example, text is:
{'id: 'id-::blabla1::blabal2-A'}
or
{'id: 'id-::blabla3::blabal4-B'}
or
{'id: 'id-::blabla5::blabal6-c'}
I want to find this text: A or B or C. How do I build a regular expression in python to do this?

I think you mean a dictionary, although a ' is missing after id in each example.
I guess something like this is what you're looking for:
import re
data = {'id': 'id-::blabla1::blabal2-A'}  # named data to avoid shadowing the built-in dict
test = re.sub(r'.+?::.+?::.+?-(\w)', r'\1', data['id'])
The regex could be simplified, but this is the best I can do based on the information given.

You can start with this one:
{'id: 'id-(?::.*?){2}-([a-zA-Z])'}
See: https://regex101.com/r/XnlUTi/2
([a-zA-Z])
This capture group returns A, B, or c.

import re
content = "{'id: 'id-::blabla1::blabal2-A'}"
pattern = re.compile('{\'id: \'id-::blabla.*?::blabal.*?-(.*?)\'}', re.S)
print(re.findall(pattern, content))
.*? lazily matches any run of characters, and (.*?) captures the part you want.
re.findall(pattern, content) returns a list with the captured string for every match.
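
If only the trailing letter matters, the pattern can be anchored to the end of the line instead of spelling out the whole prefix. A minimal sketch, assuming every line ends with a hyphen, one letter, a quote and a closing brace, as in the three examples from the question (the name suffix_rx is made up):

```python
import re

# The three example lines from the question, kept exactly as posted.
lines = [
    "{'id: 'id-::blabla1::blabal2-A'}",
    "{'id: 'id-::blabla3::blabal4-B'}",
    "{'id: 'id-::blabla5::blabal6-c'}",
]

# One letter after a hyphen, right before the closing quote and brace.
suffix_rx = re.compile(r"-([A-Za-z])'\}\s*$")

suffixes = []
for line in lines:
    match = suffix_rx.search(line)
    if match:  # lines without the expected ending are skipped
        suffixes.append(match.group(1))

print(suffixes)  # ['A', 'B', 'c']
```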
