How to delete copies of character from string using regex? - python

I want to delete copies of 'i' in this example, tried using groups but it's not working. Where am I doing this wrong?
import re
a = '123iiii'
b = re.match('.*i(i+)', a)
print(b.group(1))
>>> i
a = re.sub(b.group(1), '', a)
print(a)
>>> 123
Desired result is '123i'.
Thanks for the answer.

Maybe,
([^i]*i)i*([^\r\n]*)
and a replacement of,
\1\2
might be OK to look into.
Test
import re
string = '''
123iiii
123iiiiabc
123i
'''
expression = r'([^i]*i)i*([^\r\n]*)'
print(re.sub(expression, r'\1\2', string))
Output
123i
123iabc
123i
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
RegEx Circuit
jex.im visualizes regular expressions:

It seems that what you need is:
import re
a = '123iiii'
a = re.sub(r"i+", "i", a)
print(a)
>>> 123i

You can achieve your goal by simply using the sub function to replace a sequence of i's with a single i
import re
a = '123iiii'
a = re.sub(r'i+', 'i', a)
print(a)

The following will work even if you have the i's in multiple places in the string.
import re
s = '123iiii657iii'
re.sub('i+','i',s)
Output:
'123i675i'

Related

Complex regex in Python

I am trying to write a generic pattern using regex so that it fetches only particular things from the string. Let's say we have strings like GigabitEthernet0/0/0/0 or FastEthernet0/4 or Ethernet0/0.222. The regex should fetch the first 2 characters and all the numerals. Therefore, the fetched result should be something like Gi0000 or Fa04 or Et00222 depending on the above cases.
x = 'GigabitEthernet0/0/0/2
m = re.search('([\w+]{2}?)[\\\.(\d+)]{0,}',x)
I am not able to understand how shall I write the regular expression. The values can be fetched in the form of a list also. I write few more patterns but it isn't helping.
In regex, you may use re.findall function.
>>> import re
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join(re.findall(r'\d', s))
'Gi0000'
OR
>>> ''.join(re.findall(r'^..|\d', s))
'Gi0000'
>>> ''.join(re.findall(r'^..|\d', 'Ethernet0/0.222'))
'Et00222'
OR
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join([i for i in s if i.isdigit()])
'Gi0000'
z="Ethernet0/0.222."
print z[:2]+"".join(re.findall(r"(\d+)(?=[\d\W]*$)",z))
You can try this.This will make sure only digits from end come into play .
Here is another option:
s = 'Ethernet0/0.222'
"".join(re.findall('^\w{2}|[\d]+', s))

Using regular expression to extract string

I need to extract the IP address from the following string.
>>> mydns='ec2-54-196-170-182.compute-1.amazonaws.com'
The text to the left of the dot needs to be returned. The following works as expected.
>>> mydns[:18]
'ec2-54-196-170-182'
But it does not work in all cases. For e.g.
mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns[:18]
'ec2-666-777-888-99'
How to I use regular expressions in python?
No need for regex... Just use str.split
mydns.split('.', 1)[0]
Demo:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns.split('.', 1)[0]
'ec2-666-777-888-999'
If you wanted to use regex for this:
Regex String
ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Alternative (EC2 Agnostic):
.*\b([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Replacement String
Regular: \1.\2.\3.\4
Reverse: \4.\3.\2.\1
Python code
import re
subject = 'ec2-54-196-170-182.compute-1.amazonaws.com'
result = re.sub("ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*", r"\1.\2.\3.\4", subject)
print result
This regex will match (^[^.]+:
So Try this:
import re
string = "ec2-54-196-170-182.compute-1.amazonaws.com"
ip = re.findall('^[^.]+',string)[0]
print ip
Output:
ec2-54-196-170-182
Best thing is this will match even if the instance was ec2,ec3 so this regex is actually very much similar to the code of #mgilson

search patterns with variable gaps in python

I am looking for patterns in a list containing different strings as:
names = ['TAATGH', 'GHHKLL', 'TGTHA', 'ATGTTKKKK', 'KLPPNF']
I would like to select the string that has the pattern 'T--T' (no matter how the string starts), so those elements would be selected and appended to a new list as:
namesSelected = ['TAATGH', 'ATGTTKKKK']
Using grep I could:
grep "T[[:alpha:]]\{2\}T"
Is there a similar mode in re python?
Thanks for any help!
I think this is most likely what you want:
re.search(r'T[A-Z]{2}T', inputString)
The equivalent in Python for [[:alpha:]] would be [a-zA-Z]. You may replace [A-Z] with [a-zA-Z] in the code snippet above if you wish to allow lowercase alphabet.
Documentation for re.search.
Yep, you can use re.search:
>>> names = ['TAATGH', 'GHHKLL', 'TGTHA', 'ATGTTKKKK', 'KLPPNF']
>>> reslist = []
>>> for i in names:
... res = re.search(r'T[A-Z]{2}T', i)
... if res:
... reslist.append(i)
...
>>>
>>> print(reslist)
['TAATGH', 'ATGTTKKKK']
import re
def grep(l, pattern):
r = re.compile(pattern)
return [_ for _ in l if r.search(pattern)]
nameSelected = grep(names, "T\w{2}T")
Note the use of \w instead of [[:alpha:]]

regex and replace on string using python

I am rather new to Python Regex (regex in general) and I have been encountering a problem. So, I have a few strings like so:
str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
str2 = r'''/bkyhi/oiukj/game/?mytag=a_17014b_82c&'''
str3 = r'''lkjsd/image/game/mytag=a_17014b_82c$'''
the & and the $ could be any symbol.
I would like to have a single regex (and replace) which replaces:
mytag=a_17014b_82c
to:
mytag=myvalue
from any of the above 3 strings. Would appreciate any guidance on how I can achieve this.
UPDATE: the string to be replaced is always not the same. So, a_17014b_82c could be anything in reality.
If the string to be replaced is constant you don't need a regex. Simply use replace:
>>> str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
>>> str1.replace('a_17014b_82c','myvalue')
'hfo/gfbi/mytag=myvalue'
Use re.sub:
>>> import re
>>> r = re.compile(r'(mytag=)(\w+)')
>>> r.sub(r'\1myvalue', str1)
'hfo/gfbi/mytag=myvalue'
>>> r.sub(r'\1myvalue', str2)
'/bkyhi/oiukj/game/?mytag=myvalue&'
>>> r.sub(r'\1myvalue', str3)
'lkjsd/image/game/mytag=myvalue$'
import re
r = re.compile(r'(mytag=)\w+$')
r.sub(r'\1myvalue', str1)
This is based on #Ashwini's answer, two small changes are we are saying the mytag=a_17014b part should be at the end of input, so that even inputs such as
str1 = r'''/bkyhi/mytag=blah/game/?mytag=a_17014b_82c&'''
will work fine, substituting the last mytag instead of the the first.
Another small change is we are not unnecessarily capturing the \w+, since we aren't using it anyway. This is just for a bit of code clarity.

python: how to remove '$'?

All I want to do is remove the dollar sign '$'. This seems simple, but I really don't know why my code isn't working.
import re
input = '$5'
if '$' in input:
input = re.sub(re.compile('$'), '', input)
print input
Input still is '$5' instead of just '5'! Can anyone help?
Try using replace instead:
input = input.replace('$', '')
As Madbreaks has stated, $ means match the end of the line in a regular expression.
Here is a handy link to regular expressions: http://docs.python.org/2/library/re.html
In this case, I'd use str.translate
>>> '$$foo$$'.translate(None,'$')
'foo'
And for benchmarking purposes:
>>> def repl(s):
... return s.replace('$','')
...
>>> def trans(s):
... return s.translate(None,'$')
...
>>> import timeit
>>> s = '$$foo bar baz $ qux'
>>> print timeit.timeit('repl(s)','from __main__ import repl,s')
0.969965934753
>>> print timeit.timeit('trans(s)','from __main__ import trans,s')
0.796354055405
There are a number of differences between str.replace and str.translate. The most notable is that str.translate is useful for switching 1 character with another whereas str.replace replaces 1 substring with another. So, for problems like, I want to delete all characters a,b,c, or I want to change a to d, I suggest str.translate. Conversely, problems like "I want to replace the substring abc with def" are well suited for str.replace.
Note that your example doesn't work because $ has special meaning in regex (it matches at the end of a string). To get it to work with regex you need to escape the $:
>>> re.sub('\$','',s)
'foo bar baz qux'
works OK.
$ is a special character in regular expressions that translates to 'end of the string'
you need to escape it if you want to use it literally
try this:
import re
input = "$5"
if "$" in input:
input = re.sub(re.compile('\$'), '', input)
print input
You need to escape the dollar sign - otherwise python thinks it is an anchor http://docs.python.org/2/library/re.html
import re
fred = "$hdkhsd%$"
print re.sub ("\$","!", fred)
>> !hdkhsd%!
Aside from the other answers, you can also use strip():
input = input.strip('$')

Categories

Resources