Using regular expressions to extract a string in Python

I need to extract the IP address from the following string.
>>> mydns='ec2-54-196-170-182.compute-1.amazonaws.com'
The text to the left of the first dot needs to be returned. The following works as expected.
>>> mydns[:18]
'ec2-54-196-170-182'
But it does not work in all cases. For example:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns[:18]
'ec2-666-777-888-99'
How do I use regular expressions in Python to do this?

No need for regex... Just use str.split
mydns.split('.', 1)[0]
Demo:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns.split('.', 1)[0]
'ec2-666-777-888-999'
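
The same idea can be wrapped in a small function. This is only a sketch: the name first_label is made up for illustration, and it uses str.partition, which gives the same result as split('.', 1)[0] here and also copes with a string that has no dot.

```python
def first_label(hostname):
    """Return everything before the first dot (hypothetical helper name)."""
    # partition returns (head, separator, tail); head is the whole
    # string when no dot is present
    head, _sep, _tail = hostname.partition('.')
    return head


print(first_label('ec2-54-196-170-182.compute-1.amazonaws.com'))
print(first_label('ec2-666-777-888-999.compute-1.amazonaws.com'))
```

Unlike the fixed slice mydns[:18], this does not depend on how many digits each number has.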

If you wanted to use regex for this:
Regex String
ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Alternative (EC2 Agnostic):
.*\b([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Replacement String
Regular: \1.\2.\3.\4
Reverse: \4.\3.\2.\1
Python code
import re
subject = 'ec2-54-196-170-182.compute-1.amazonaws.com'
result = re.sub(r"ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*", r"\1.\2.\3.\4", subject)
print(result)  # 54.196.170.182
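
If the input might not be an EC2-style name at all, re.sub silently returns the string unchanged. A possible variant, sketched here with a made-up helper name ec2_ip, uses re.match and returns None when the pattern does not apply:

```python
import re

# Hypothetical helper: the pattern name and function name are illustrative only
EC2_PATTERN = re.compile(r'ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.')


def ec2_ip(hostname):
    """Return the dotted IP encoded in an EC2-style hostname, or None."""
    match = EC2_PATTERN.match(hostname)
    if match is None:
        return None
    # groups() holds the four captured numbers as strings
    return '.'.join(match.groups())


print(ec2_ip('ec2-54-196-170-182.compute-1.amazonaws.com'))
print(ec2_ip('www.example.com'))
```

Note that {1,3} only limits the number of digits; it does not check that each number is in the range 0 to 255.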

The regex ^[^.]+ matches everything from the start of the string up to the first dot.
So try this:
import re
string = "ec2-54-196-170-182.compute-1.amazonaws.com"
ip = re.findall(r'^[^.]+', string)[0]
print(ip)
Output:
ec2-54-196-170-182
The best thing is that this matches whether the prefix is ec2, ec3, or anything else, so this regex behaves much like the code in @mgilson's answer.

Related

How do I print matches from a regex given a string value in Python?

I have the string "/browse/advanced-computer-science-modules?title=machine-learning" in Python. I want to print the part between the second "/" and the "?", which is "advanced-computer-science-modules".
I wrote the regular expression ^([a-z]*[\-]*[a-z])*?$ and imported the re module, but the .findall() function finds nothing when I run it. Below is the snippet of my code that returned nothing.
import re
regex = re.compile(r'^([a-z]*[\-]*[a-z])*?$')
str = '/browse/advanced-computer-science-modules?title=machine-learning'
print(regex.findall(str))

Since this appears to be a URL, I'd suggest you use URL-parsing tools instead:
>>> from urllib.parse import urlsplit
>>> url = '/browse/advanced-computer-science-modules?title=machine-learning'
>>> s = urlsplit(url)
>>> s
SplitResult(scheme='', netloc='', path='/browse/advanced-computer-science-modules', query='title=machine-learning', fragment='')
>>> s.path
'/browse/advanced-computer-science-modules'
>>> s.path.split('/')[-1]
'advanced-computer-science-modules'
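
As a rough sketch, the steps above could be collected into one function. The name last_path_segment is hypothetical, and stripping a trailing slash is an assumption about what is wanted for URLs such as /a/b/.

```python
from urllib.parse import urlsplit


def last_path_segment(url):
    """Return the last non-empty path component of a URL (illustrative helper)."""
    path = urlsplit(url).path
    # rstrip('/') so a trailing slash does not produce an empty segment
    return path.rstrip('/').rsplit('/', 1)[-1]


print(last_path_segment('/browse/advanced-computer-science-modules?title=machine-learning'))
```

Because urlsplit separates the query string first, a "?" or "/" inside the query cannot confuse the path handling.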

The regex is as follows:
\/[a-zA-Z\-]+\?
Then take the first match and strip the leading "/" and the trailing "?":
regex.findall(str)[0][1:-1]
Very specific to this problem, but it should work.
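
If a regex is still preferred, a capture group can pick out the segment directly, so no slicing is needed afterwards. This is a sketch under the assumption that the wanted part is the last path component immediately before the "?":

```python
import re

url = '/browse/advanced-computer-science-modules?title=machine-learning'

# [^/?]+ cannot cross a "/" or "?", so the group is exactly the
# path component that is directly followed by the query string
match = re.search(r'/([^/?]+)\?', url)
segment = match.group(1) if match else None
print(segment)
```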

Alternatively, you can use the split method of a string:
str = '/browse/advanced-computer-science-modules?title=machine-learning'
result = str.split('/')[-1].split('?')[0]
print(result)
# advanced-computer-science-modules

How to use regex to find the middle of a string

I'm trying to get certain results out of the response from Blogger. I want to get my blog names. How would I go about something like that with a regex? I've tried Googling my issue, but unfortunately none of the answers helped in my case.
So my response looks something like this:
\\x22http://emyblog.blogspot.com/
So it always starts with \\x22http:// and ends with .blogspot.com/
I've tried the following re:
regEx = re.findall(b"""\x22http://(.*)\.blogspot\.com""", r)
But unfortunately it returned an empty list. Any ideas on how to solve this problem?
Thanks,

Use a raw string; otherwise \x22 is interpreted as the character " instead of the literal text \x22. I am also not sure that re.findall is the right method here; re.search should suffice.
Assuming your byte-string is:
>>> r = rb'\\x22http://emyblog.blogspot.com/'
With byte-strings:
>>> res = re.search(rb'\\x22http://(.*)\.blogspot\.com/', r)
>>> res.group(1)
b'emyblog'
With normal strings:
>>> res = re.search(r'\\\\x22http://(.*)\.blogspot\.com/', r.decode('utf-8'))
>>> res.group(1)
'emyblog'
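
If the response contains several blogs, a greedy (.*) would run from the first URL to the last one. A minimal sketch of collecting every name with re.findall follows; the sample response and the second blog name "other" are made up for illustration, and the response is assumed to contain the literal characters \\x22 as in the question.

```python
import re

# Assumed sample data: only "emyblog" comes from the question
response = rb'\\x22http://emyblog.blogspot.com/ \\x22http://other.blogspot.com/'

# [^./]+ stops at the first dot or slash, so each blog name is
# captured on its own instead of spanning several URLs
names = re.findall(rb'http://([^./]+)\.blogspot\.com/', response)
blog_names = [name.decode('ascii') for name in names]
print(blog_names)
```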

Use r'' (a raw string literal) instead of b'':
import re
pattern = re.compile(r'\x22http://(.*)\.blogspot\.com')
match = pattern.match('\x22http://emyblog.blogspot.com/')
match.group(1)
# 'emyblog'

This seems to be working!
import re
text = "\x22http://emyblog.blogspot.com/"
regex = re.compile(r'\x22http://(.*)\.blogspot\.com')
print(regex.findall(text))

Complex regex in Python

I am trying to write a generic pattern using regex so that it fetches only particular things from the string. Let's say we have strings like GigabitEthernet0/0/0/0 or FastEthernet0/4 or Ethernet0/0.222. The regex should fetch the first 2 characters and all the numerals. Therefore, the fetched result should be something like Gi0000 or Fa04 or Et00222 depending on the above cases.
x = 'GigabitEthernet0/0/0/2'
m = re.search('([\w+]{2}?)[\\\.(\d+)]{0,}',x)
I do not understand how I should write the regular expression. The values can also be fetched in the form of a list. I tried a few more patterns, but they did not help.

With regex, you can use the re.findall function.
>>> import re
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join(re.findall(r'\d', s))
'Gi0000'
OR
>>> ''.join(re.findall(r'^..|\d', s))
'Gi0000'
>>> ''.join(re.findall(r'^..|\d', 'Ethernet0/0.222'))
'Et00222'
OR
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join([i for i in s if i.isdigit()])
'Gi0000'
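
For reuse, the first variant can be put into a function and run against all three names from the question. The function name abbreviate is hypothetical; this is just a sketch of the same technique.

```python
import re


def abbreviate(name):
    """First two characters plus every digit in the name (illustrative helper)."""
    return name[:2] + ''.join(re.findall(r'\d', name))


for name in ('GigabitEthernet0/0/0/0', 'FastEthernet0/4', 'Ethernet0/0.222'):
    print(name, '->', abbreviate(name))
```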

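import re
z = "Ethernet0/0.222."
print(z[:2] + "".join(re.findall(r"(\d+)(?=[\d\W]*$)", z)))
You can try this. The lookahead makes sure that only digit runs followed by nothing but digits and non-word characters up to the end of the string are picked up.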

Here is another option:
s = 'Ethernet0/0.222'
"".join(re.findall(r'^\w{2}|\d+', s))

How to remove tokens that contain a number followed by characters using a regular expression in Python?

As the question says, how do I replace a token like '23abc' with '' using a regular expression in Python? It shouldn't affect purely alphabetic tokens like 'hello', 'jimmy', 'trip', 'travel', etc.
my code:
import re
str="23abcd"
print(re.sub(r"[0-9a-z]","",str))
But the code doesn't work if a string like 'hello' is passed: it still replaces it with ''. Please help. Thanks.

Try this pattern:
re.sub(r"[0-9]+[a-z]+","",str)
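
When the tokens sit inside a longer sentence, word boundaries (\b) keep the pattern from touching digits in the middle of a word. The sample sentence below is made up for illustration, and collapsing the leftover spaces is an assumption about the desired output.

```python
import re

text = "hello 23abcd jimmy trip 7up travel"

# \b on both sides: the whole token must be digits followed by letters
cleaned = re.sub(r'\b[0-9]+[a-z]+\b', '', text)

# Removing a token leaves two adjacent spaces; normalise them
cleaned = ' '.join(cleaned.split())
print(cleaned)
```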

It should be:
>>> import re
>>> pattern="23abcd"
>>> _str = "a mlmsm 23abcd smo jimmy"
>>> re.sub(pattern, "", _str)
'a mlmsm  smo jimmy'

regex and replace on string using python

I am rather new to Python Regex (regex in general) and I have been encountering a problem. So, I have a few strings like so:
str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
str2 = r'''/bkyhi/oiukj/game/?mytag=a_17014b_82c&'''
str3 = r'''lkjsd/image/game/mytag=a_17014b_82c$'''
The & and the $ could be any symbol.
I would like to have a single regex (and replace) which replaces:
mytag=a_17014b_82c
to:
mytag=myvalue
from any of the above 3 strings. Would appreciate any guidance on how I can achieve this.
UPDATE: the string to be replaced is not always the same. So, a_17014b_82c could be anything in reality.

If the string to be replaced is constant you don't need a regex. Simply use replace:
>>> str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
>>> str1.replace('a_17014b_82c','myvalue')
'hfo/gfbi/mytag=myvalue'

Use re.sub:
>>> import re
>>> r = re.compile(r'(mytag=)(\w+)')
>>> r.sub(r'\1myvalue', str1)
'hfo/gfbi/mytag=myvalue'
>>> r.sub(r'\1myvalue', str2)
'/bkyhi/oiukj/game/?mytag=myvalue&'
>>> r.sub(r'\1myvalue', str3)
'lkjsd/image/game/mytag=myvalue$'
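
One thing to watch with r'\1myvalue': if the new value starts with a digit, for example 42, the template becomes \142 and is no longer read as "group 1 followed by 42". A sketch that avoids this by passing a function as the replacement is shown below; the helper name set_tag is hypothetical.

```python
import re

TAG = re.compile(r'(mytag=)\w+')


def set_tag(text, value):
    """Replace the value after mytag= with value (illustrative helper)."""
    # A function replacement is used literally, so digits or
    # backslashes in value are never parsed as group references
    return TAG.sub(lambda m: m.group(1) + value, text)


print(set_tag('hfo/gfbi/mytag=a_17014b_82c', 'myvalue'))
print(set_tag('/bkyhi/oiukj/game/?mytag=a_17014b_82c&', '42'))
```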

import re
r = re.compile(r'(mytag=)\w+(?=\W*$)')
r.sub(r'\1myvalue', str1)
This is based on @Ashwini's answer, with two small changes. First, the mytag=a_17014b_82c part must be at the end of the input, followed at most by trailing symbols such as & or $ (a bare $ right after \w+ would fail to match when such a symbol is present, hence the lookahead). That way even inputs such as
str1 = r'''/bkyhi/mytag=blah/game/?mytag=a_17014b_82c&'''
will work fine, substituting the last mytag instead of the first.
Second, the \w+ is not captured unnecessarily, since it isn't used anyway. This is just for a bit of code clarity.
