Python Regular Expression Lookahead with multiple conditions - python

My string looks like this:
string = "*[EQ](#[Type],'A,B,C',#[Type],*[EQ](#[Type],D,E,F))"
The ideal output list is:
['#[Type]', 'A,B,C', '#[Type]', '*[EQ](#[Type],D,E,F)']
So I can parse the string as:
if #[Type] in ('A,B,C') then #[Type] else *[EQ](#[Type],D,E,F)
The challenge is to find all the commas that are followed by #, ' or *. I've tried the following code, but it doesn't work:
interM = re.search(r"\*\[EQ\]\((.+)(?=,#|,\*|,\')+,(.+)\)", string)
print(interM.groups())
Edit:
The ultimate goal is to parse out the 4 components of the input string:
*[EQ](Value, Target, ifTrue, ifFalse)

>>> import re
>>> string = "*[EQ](#[Type],'A,B,C',#[Type],*[EQ](#[Type],D,E,F))"
>>> re.split(r"^\*\[EQ\]\(|\)$|,(?=[#'*])", string)[1:-1]
['#[Type]', "'A,B,C'", '#[Type]', '*[EQ](#[Type],D,E,F)']
That said, if you are looking for a more robust solution, I'd highly recommend a lexical analyzer such as flex.
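As a middle ground between a single regex and a full lexer, here is a minimal hand-written sketch that splits only on commas at parenthesis depth 0 and outside single quotes, then recurses into a nested *[EQ](...) call. It assumes quotes are never escaped and parentheses are always balanced; `split_top_level` and `parse_eq` are hypothetical helper names, not part of any library.

```python
def split_top_level(args):
    """Split on commas that are not inside parentheses or single quotes."""
    parts, current = [], []
    depth, in_quote = 0, False
    for ch in args:
        if ch == "'":
            in_quote = not in_quote          # toggle quoted-literal mode
        elif not in_quote and ch == "(":
            depth += 1
        elif not in_quote and ch == ")":
            depth -= 1
        elif ch == "," and depth == 0 and not in_quote:
            parts.append("".join(current))   # a top-level separator ends a part
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def parse_eq(expr):
    """Parse *[EQ](Value, Target, ifTrue, ifFalse), recursing into nested *[EQ] calls."""
    prefix = "*[EQ]("
    if not (expr.startswith(prefix) and expr.endswith(")")):
        return expr  # a plain operand such as #[Type] or 'A,B,C'
    # Unpacking raises ValueError if there are not exactly four top-level arguments.
    value, target, if_true, if_false = split_top_level(expr[len(prefix):-1])
    return (value, target, parse_eq(if_true), parse_eq(if_false))


string = "*[EQ](#[Type],'A,B,C',#[Type],*[EQ](#[Type],D,E,F))"
print(parse_eq(string))
# ('#[Type]', "'A,B,C'", '#[Type]', ('#[Type]', 'D', 'E', 'F'))
```

Because the nested call is returned as a tuple rather than a string, the if/else structure from the question can be walked directly without re-parsing.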

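import re
x = "*[EQ](#[Type],'A,B,C',#[Type],*[EQ](#[Type],D,E,F))"
# Strip the outer *[EQ](...) wrapper, then pick out #-references, quoted lists and nested *...(...) calls
print(re.findall(r"#[^,]+|'[^']+'|\*.*?\([^\)]*\)", re.findall(r"\*\[EQ\]\((.*?)\)$", x)[0]))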
Output:
['#[Type]', "'A,B,C'", '#[Type]', '*[EQ](#[Type],D,E,F)']
You can try something like this. You haven't described the full logic of the format, so I'm not sure whether this will scale.

Related

Complex regex in Python

I am trying to write a generic regex pattern that fetches only particular parts of a string. Let's say we have strings like GigabitEthernet0/0/0/0, FastEthernet0/4 or Ethernet0/0.222. The regex should fetch the first 2 characters and all the digits, so the result should be Gi0000, Fa04 or Et00222 respectively.
x = 'GigabitEthernet0/0/0/2'
m = re.search('([\w+]{2}?)[\\\.(\d+)]{0,}',x)
I can't work out how to write the regular expression. Returning the values as a list would also be fine. I've tried a few more patterns, but none of them work.
You can use the re.findall function.
>>> import re
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join(re.findall(r'\d', s))
'Gi0000'
OR
>>> ''.join(re.findall(r'^..|\d', s))
'Gi0000'
>>> ''.join(re.findall(r'^..|\d', 'Ethernet0/0.222'))
'Et00222'
OR
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join([i for i in s if i.isdigit()])
'Gi0000'
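Wrapping the first approach in a small helper makes it easy to check all three names from the question at once. This is a sketch only; `abbreviate` is a hypothetical name, and it uses re.sub(r"\D", "", ...) (delete every non-digit) as an equivalent of joining the re.findall(r'\d', ...) results.

```python
import re

def abbreviate(name):
    """Keep the first two characters, then every digit in the name."""
    return name[:2] + re.sub(r"\D", "", name)

for name in ["GigabitEthernet0/0/0/0", "FastEthernet0/4", "Ethernet0/0.222"]:
    print(name, "->", abbreviate(name))
# GigabitEthernet0/0/0/0 -> Gi0000
# FastEthernet0/4 -> Fa04
# Ethernet0/0.222 -> Et00222
```

Note that if either of the first two characters were a digit it would appear twice in the result; that does not happen with these interface names.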
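import re
z = "Ethernet0/0.222."
print(z[:2] + "".join(re.findall(r"(\d+)(?=[\d\W]*$)", z)))
You can try this. The lookahead (?=[\d\W]*$) only accepts a run of digits if everything after it is digits or punctuation, so only the digits at the end of the name are collected.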
Here is another option:
import re
s = 'Ethernet0/0.222'
print("".join(re.findall(r'^\w{2}|\d+', s)))  # Et00222

Extracting text in the middle of a string - python

I was wondering if anyone has a simpler solution for extracting a few letters from the middle of a string. I want to retrieve the 3 letters (in this case, GMB), and all the entries follow the same pattern. I'm struggling to find a simpler way of doing this.
Here is an example of what I've been using:
entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"
symbol = entry.strip('entries-alphabetical.jsp?raceid13=')
symbol = symbol[0:3]
print(symbol)
Thanks!
First of all, the argument passed to str.strip is not a prefix or suffix; it is a set of characters, and any of them are stripped from both ends of the string.
Since the string looks like a URL, you can use urlparse.parse_qsl (this is Python 2; in Python 3 the same function lives in urllib.parse):
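A short illustration of the difference, assuming Python 3.9 or later for str.removeprefix. The www.example.com case is the classic character-set example; the question's own strip call only gives the right answer because 'G' and the trailing uppercase 'A' happen not to be in the stripped set.

```python
entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"
prefix = "entries-alphabetical.jsp?raceid13="

# strip() treats its argument as a set of characters, removed from both ends.
print("www.example.com".strip("cmowz."))   # example
print(entry.strip(prefix))                 # GMB$20140313A (correct here only by luck)

# removeprefix() (Python 3.9+) removes the exact leading substring, once.
symbol = entry.removeprefix(prefix)[:3]
print(symbol)                              # GMB
```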
>>> import urlparse
>>> urlparse.parse_qsl(entry)
[('entries-alphabetical.jsp?raceid13', 'GMB$20140313A')]
>>> urlparse.parse_qsl(entry)[0][1][:3]
'GMB'
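In Python 3 the same idea can be written with urllib.parse. This sketch splits off the query string with urlsplit first and then uses parse_qs to get a name-to-values mapping; it assumes every entry carries the code in a parameter named raceid13, which may not hold if that name varies between entries.

```python
from urllib.parse import parse_qs, urlsplit

entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"

query = urlsplit(entry).query       # 'raceid13=GMB$20140313A'
params = parse_qs(query)            # {'raceid13': ['GMB$20140313A']}
symbol = params["raceid13"][0][:3]  # parse_qs always returns a list per name
print(symbol)                       # GMB
```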
This is what regular expressions are for. http://docs.python.org/2/library/re.html
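import re
entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"
# Capture the three word characters that follow the '=' instead of hard-coding GMB
val = re.search(r'=(\w{3})', entry)
print(val.group(1))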

Using regular expression to extract string

I need to extract the IP address from the following string.
>>> mydns='ec2-54-196-170-182.compute-1.amazonaws.com'
The text to the left of the first dot needs to be returned. The following works as expected:
>>> mydns[:18]
'ec2-54-196-170-182'
But it does not work in all cases. For example:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns[:18]
'ec2-666-777-888-99'
How do I use regular expressions in Python to do this?
No need for a regex; just use str.split:
mydns.split('.', 1)[0]
Demo:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns.split('.', 1)[0]
'ec2-666-777-888-999'
If you want to use a regex for this:
Regex String
ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Alternative (EC2 Agnostic):
.*\b([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Replacement String
Regular: \1.\2.\3.\4
Reverse: \4.\3.\2.\1
Python code
import re
subject = 'ec2-54-196-170-182.compute-1.amazonaws.com'
result = re.sub("ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*", r"\1.\2.\3.\4", subject)
print(result)  # 54.196.170.182
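The regex above accepts any 1-3 digit groups, so the question's ec2-666-777-888-999 example would produce 666.777.888.999, which is not a real address. A possible refinement, sketched here with the standard-library ipaddress module, is to validate the captured octets; `ec2_host_to_ip` is a hypothetical helper name.

```python
import ipaddress
import re

def ec2_host_to_ip(hostname):
    """Return the IP embedded in an ec2-a-b-c-d hostname, or None if there is no valid one."""
    m = re.match(r"ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.", hostname)
    if m is None:
        return None
    try:
        return ipaddress.ip_address(".".join(m.groups()))
    except ValueError:
        return None  # e.g. an octet above 255

print(ec2_host_to_ip("ec2-54-196-170-182.compute-1.amazonaws.com"))  # 54.196.170.182
print(ec2_host_to_ip("ec2-666-777-888-999.compute-1.amazonaws.com"))  # None
```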
This regex matches everything up to the first dot: ^[^.]+
So try this:
import re
string = "ec2-54-196-170-182.compute-1.amazonaws.com"
ip = re.findall('^[^.]+',string)[0]
print(ip)
Output:
ec2-54-196-170-182
The best thing is that this will match even if the prefix is ec2, ec3 and so on, so this regex is essentially equivalent to @mgilson's split-based code.

Regular expression split

I have inputs similar to the following:
TV-12VX
TV-14JW
TV-2JIS
VC-224X
I need to remove everything after the digits that follow the dash. The result would be:
TV-12
TV-14
TV-2
VC-224
How would I do this split via regular expressions?
The following code shows how to match strings of the form "TV-" + (some number):
>>> re.match('TV-[0-9]+','TV-12VX').group(0)
'TV-12'
(Note that, because I'm using match, this only works if the string starts with the bit you want to extract.)
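To illustrate that note: re.match only tries position 0, while re.search scans the whole string. The input with leading text below is hypothetical, chosen only to show the difference.

```python
import re

pattern = r"TV-[0-9]+"
text = "Part TV-12VX"  # hypothetical input with leading text

print(re.match(pattern, text))             # None: match() only tries position 0
print(re.search(pattern, text).group(0))   # TV-12: search() scans the whole string
```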
I think this regex is appropriate for you: (.+?-\d+?)[a-zA-Z]. You can use it with re.findall, or re.match.
import re
p = re.match(r'(\w{2}-\d+)', 'TV-12VX')
print(p.group(0))
Outputs
TV-12
You can remove everything after the digits with this:
re.sub(r"^(\w+-\d+).*", r"\1", s)  # s is one input line, e.g. "TV-12VX"
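Applied to all four inputs from the question, the re.sub approach might look like this. Because \w+ matches any letters before the dash, it also handles the VC- line, not just TV-.

```python
import re

inputs = ["TV-12VX", "TV-14JW", "TV-2JIS", "VC-224X"]

# Keep "<letters>-<digits>" from the start and drop whatever follows.
trimmed = [re.sub(r"^(\w+-\d+).*", r"\1", s) for s in inputs]
print(trimmed)  # ['TV-12', 'TV-14', 'TV-2', 'VC-224']
```

One design difference from the match-based answers: if a line does not fit the pattern, re.sub returns it unchanged, whereas calling .group() on a failed re.match raises AttributeError because match returns None.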

How to convert specific character sequences in a string to upper case using Python?

I am looking to accomplish the following and am wondering if anyone has a suggestion as to how best go about it.
I have a string, say 'this-is,-toronto.-and-this-is,-boston', and I would like to convert all occurrences of ',-[a-z]' to ',-[A-Z]'. In this case the result of the conversion would be 'this-is,-Toronto.-and-this-is,-Boston'.
I've been trying to get something working with re.sub(), but haven't yet figured out how:
testString = 'this-is,-toronto.-and-this-is,-boston'
re.sub(r',-([a-z])', r',-??', testString)
Thanks!
re.sub can take a function that receives each match object and returns the replacement string:
import re
s = 'this-is,-toronto.-and-this-is,-boston'
t = re.sub(',-[a-z]', lambda x: x.group(0).upper(), s)
print(t)
prints
this-is,-Toronto.-and-this-is,-Boston
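A variation on the same idea, sketched with a lookbehind: (?<=,-) requires ",-" before the letter without consuming it, so only the letter itself reaches the replacement function. The "-and" word stays lowercase because it is preceded by ".-", not ",-".

```python
import re

s = "this-is,-toronto.-and-this-is,-boston"

# Only the single lowercase letter after ",-" is matched and uppercased.
t = re.sub(r"(?<=,-)[a-z]", lambda m: m.group(0).upper(), s)
print(t)  # this-is,-Toronto.-and-this-is,-Boston
```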
