Python regex string matching with varying search string


Is there any way in Python to make a pattern like "DDx" match "01x", "10x", "11x" and "00x" (where each D stands for a digit) in an elegant way?
The easiest way I can see is a regex, which in this case would be:
re.search(r'\d\dx', line)
Is there any way to build this regex dynamically? For example, if the input is "D0x", the regex should be \d0x.
Please help.
Using Python 2.7
EDIT
In simpler terms, my question is:
>>> s = "DDx"
>>> s = s.replace('D', r'\d')
>>> re.search(<use s here>, line)
Or any alternative approach.

I think I found the answer:
>>> import re
>>> s = "DDx"
>>> s = s.replace('D', r'\d')
>>> p = "01x"
>>> c = re.search(s, p)
>>> print(c.group(0))
01x
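The replace trick works for "DDx", but every other character of the template is then handed to the regex engine unescaped (a "." in the template would match any character), and re.search also accepts a match inside a longer string such as "011x". A minimal sketch that escapes the literal characters and anchors the whole pattern; template_to_regex and its wildcard parameter are hypothetical names, not part of the question:

```python
import re


def template_to_regex(template, wildcard='D'):
    """Compile a template such as "DDx" or "D0x" into an anchored regex.

    Each wildcard character becomes a single-digit class; every other
    character is passed through re.escape so it is matched literally.
    """
    parts = [r'\d' if ch == wildcard else re.escape(ch) for ch in template]
    # \A and \Z require the whole string to match, unlike a bare re.search.
    return re.compile(r'\A' + ''.join(parts) + r'\Z')


pattern = template_to_regex("DDx")
for candidate in ["01x", "10x", "11x", "00x", "0ax", "011x"]:
    print("{} {}".format(candidate, bool(pattern.match(candidate))))
# prints True for the first four candidates and False for "0ax" and "011x"
```

With "D0x" the same helper builds the pattern \A\d0x\Z, so the question's second example needs no separate code.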

Related

retrieve subset of string with regex - python [duplicate]

This question was closed as a duplicate of "Reference - What does this regex mean?".
p = r"\home\gef\Documents\abc_this_word_dfg.gz.tar"
I'm looking for a way to retrieve this_word.
import os
base = os.path.basename(p)
base1 = base.replace("abc_", "")
name = base1.replace("_dfg.gz.tar", "")
This works, but it's not ideal because I would need to know in advance which strings to remove. Maybe a regex would be appropriate here?

You don't give much information, but from what is shown can't you just use string slicing?
Maybe like this:
>>> import os
>>> p = os.path.join('home', 'gef', 'Documents', 'abc_this_word_dfg.gz.tar')
>>> p
'home/gef/Documents/abc_this_word_dfg.gz.tar'
>>> os.path.dirname(p)
'home/gef/Documents'
>>> os.path.basename(p)
'abc_this_word_dfg.gz.tar'
>>> os.path.basename(p)[4:-11]
'this_word'

You don't give much information, but from what is shown can't you just split on _ chars?
Maybe like this:
>>> import os
>>> p = os.path.join('home', 'gef', 'Documents', 'abc_this_word_dfg.gz.tar')
>>> p
'home/gef/Documents/abc_this_word_dfg.gz.tar'
>>> os.path.dirname(p)
'home/gef/Documents'
>>> os.path.basename(p)
'abc_this_word_dfg.gz.tar'
>>> '_'.join(
... os.path.basename(p).split('_')[1:-1])
'this_word'
It splits on underscores, discards the first and last parts, and joins the remaining parts back together with underscores (if this_word had no underscores, only one part would be left and join would simply return it).
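Since the question asks whether a regex would fit: a minimal sketch of the same first-and-last-underscore rule as a regex, assuming (as in the example) that neither the prefix nor the suffix contains an underscore. NAME_RE and middle_part are hypothetical names:

```python
import os
import re

# Assumption: the prefix before the first "_" and the suffix after the
# last "_" contain no underscores (true for "abc" and "dfg.gz.tar").
# The greedy (.*) runs up to the last "_", because [^_]+\Z must follow it.
NAME_RE = re.compile(r'\A[^_]+_(.*)_[^_]+\Z')


def middle_part(path):
    """Return the text between the first and last underscore of the file name."""
    match = NAME_RE.match(os.path.basename(path))
    return match.group(1) if match else None


print(middle_part(os.path.join('home', 'gef', 'Documents', 'abc_this_word_dfg.gz.tar')))
# prints: this_word
```

Unlike the split version, a name with fewer than two underscores returns None instead of an empty or partial string.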

how to split the text using python?

f_output.write('\n{}, {}\n'.format(filename, summary))
I am printing the file name in the output, and I get values such as VCALogParser_output_ARW.log, VCALogParser_output_CZC.log and so on, but I am only interested in printing ARW, CZC and so on. Can someone tell me how to split this text?

filename.split('_')[-1].split('.')[0]
Applied to each file name, this gives 'ARW' for VCALogParser_output_ARW.log and 'CZC' for VCALogParser_output_CZC.log.

If you only want CZC and ARW without the .log, you can use the re.search method:
>>> import re
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> re.search(r'.*_(.*)\.log', s1).group(1)
'ARW'
>>> re.search(r'.*_(.*)\.log', s2).group(1)
'CZC'
Or, better, compile your pattern p once and call its search method when formatting your string:
>>> p = re.compile(r'.*_(.*)\.log')
>>> '\n{}, {}\n'.format(p.search(s1).group(1), p.search(s2).group(1))
'\nARW, CZC\n'
It might also help to use re.sub with a positive lookbehind and a named group:
>>> p = re.compile(r'.*(?<=_)(?P<mystr>[a-zA-Z0-9]+)\.log$')
>>> p.sub(r'\g<mystr>', s1)
'ARW'
>>> p.sub(r'\g<mystr>', s2)
'CZC'
>>> '\n{}, {}\n'.format(p.sub(r'\g<mystr>', s1), p.sub(r'\g<mystr>', s2))
'\nARW, CZC\n'
If you can't or don't want to use the re module, you can compute the lengths of the parts you don't need and slice your strings with them:
>>> i1 = len('VCALogParser_output_')
>>> i2 = len('.log')
>>> '\n{}, {}\n'.format(s1[i1:-i2], s2[i1:-i2])
'\nARW, CZC\n'
Keep in mind that this only works as long as all of your strings share those common parts.

fname.split('_')[-1]
is rough, but it will give you 'CZC.log', 'ARW.log' and so on, assuming all files follow the same underscore-delimited format.

If the file name always ends with _ARW.log, _CZC.log and so on, this is easy to do with the standard string split() method and two consecutive splits:
shortname = filename.split("_")[-1].split('.')[0]
Or, to make it (arguably) a bit more readable, we can use the os module:
shortname = os.path.splitext(filename)[0].split("_")[-1]
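Putting the splitext version back into the original write call, a sketch might look like this. short_name is a hypothetical helper (it also strips any directory part with os.path.basename), io.StringIO stands in for the real f_output, and the summary strings are placeholders:

```python
import io
import os


def short_name(filename):
    """Return the part after the last '_' with the extension removed,
    e.g. 'VCALogParser_output_ARW.log' -> 'ARW'."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.split('_')[-1]


# io.StringIO stands in for the real f_output; the summaries are placeholders.
f_output = io.StringIO()
for filename, summary in [('VCALogParser_output_ARW.log', 'summary 1'),
                          ('VCALogParser_output_CZC.log', 'summary 2')]:
    f_output.write(u'\n{}, {}\n'.format(short_name(filename), summary))

print(f_output.getvalue())
```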

You can also try:
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> s1.split('_')[2].split('.')[0]
'ARW'
>>> s2.split('_')[2].split('.')[0]
'CZC'

To parse the file name correctly: my guess is that you want to strip the .log extension and the VCALogParser_output_ prefix, and for that str.replace is enough; there is no need for str.split.
Use os.linesep for cross-platform line endings, but only if f_output is opened in binary mode; a file opened in text mode (the default) already translates '\n', and writing os.linesep there would produce '\r\r\n' on Windows.
The code below produces the desired result:
import os
filename = 'VCALogParser_output_SOME_NAME.log'
summary = 'some summary'
fname = filename.replace('VCALogParser_output_', '').replace('.log', '')
linesep = os.linesep
f_output.write('{linesep}{fname}, {summary}{linesep}'
.format(fname=fname, summary=summary, linesep=linesep))
# or, if the variables in scope are strictly controlled, pass locals() to format
f_output.write('{linesep}{fname}, {summary}{linesep}'.format(**locals()))

Complex regex in Python

I am trying to write a generic regex pattern that fetches only particular parts of a string. Say we have strings like GigabitEthernet0/0/0/0, FastEthernet0/4 or Ethernet0/0.222. The regex should fetch the first 2 characters and all the digits, so the result should be Gi0000, Fa04 or Et00222 respectively.
x = 'GigabitEthernet0/0/0/2'
m = re.search('([\w+]{2}?)[\\\.(\d+)]{0,}',x)
I can't figure out how to write the regular expression. The values can also be fetched as a list. I have tried a few more patterns, but none of them work.

With regex, you can use the re.findall function:
>>> import re
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join(re.findall(r'\d', s))
'Gi0000'
OR
>>> ''.join(re.findall(r'^..|\d', s))
'Gi0000'
>>> ''.join(re.findall(r'^..|\d', 'Ethernet0/0.222'))
'Et00222'
OR
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join([i for i in s if i.isdigit()])
'Gi0000'

import re
z = "Ethernet0/0.222."
print(z[:2] + "".join(re.findall(r"(\d+)(?=[\d\W]*$)", z)))
You can try this. The lookahead makes sure only the digits at the end of the string (after the last letter) are used.

Here is another option:
import re
s = 'Ethernet0/0.222'
"".join(re.findall(r'^\w{2}|\d+', s))
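Since the question mentions that a list is also acceptable, here is a small sketch that returns the two-character prefix and the digits separately and joins them only when a string is needed. It follows the s[:2] + re.findall(r'\d', ...) idea from the first answer; abbreviate and prefix_len are hypothetical names:

```python
import re


def abbreviate(interface, prefix_len=2):
    """Split an interface name into (prefix, digits).

    prefix is the first `prefix_len` characters; digits is a list of
    every digit that appears after that prefix.
    """
    prefix = interface[:prefix_len]
    digits = re.findall(r'\d', interface[prefix_len:])
    return prefix, digits


for name in ['GigabitEthernet0/0/0/0', 'FastEthernet0/4', 'Ethernet0/0.222']:
    prefix, digits = abbreviate(name)
    print(prefix + ''.join(digits))
# prints Gi0000, Fa04 and Et00222 on separate lines
```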

regex and replace on string using python

I am rather new to Python regex (and regex in general) and have run into a problem. I have a few strings like these:
str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
str2 = r'''/bkyhi/oiukj/game/?mytag=a_17014b_82c&'''
str3 = r'''lkjsd/image/game/mytag=a_17014b_82c$'''
The & and the $ could be any symbol.
I would like to have a single regex (and replace) which replaces:
mytag=a_17014b_82c
to:
mytag=myvalue
from any of the above 3 strings. Would appreciate any guidance on how I can achieve this.
UPDATE: the string to be replaced is not always the same, so a_17014b_82c could be anything in reality.

If the string to be replaced is constant, you don't need a regex. Simply use replace:
>>> str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
>>> str1.replace('a_17014b_82c','myvalue')
'hfo/gfbi/mytag=myvalue'

Use re.sub:
>>> import re
>>> r = re.compile(r'(mytag=)(\w+)')
>>> r.sub(r'\1myvalue', str1)
'hfo/gfbi/mytag=myvalue'
>>> r.sub(r'\1myvalue', str2)
'/bkyhi/oiukj/game/?mytag=myvalue&'
>>> r.sub(r'\1myvalue', str3)
'lkjsd/image/game/mytag=myvalue$'

import re
r = re.compile(r'(mytag=)\w+(?=\W?$)')
r.sub(r'\1myvalue', str1)
This is based on @Ashwini's answer, with two small changes. First, the mytag=... part must be at the end of the input (optionally followed by a single symbol such as & or $), so that even inputs such as
str1 = r'''/bkyhi/mytag=blah/game/?mytag=a_17014b_82c&'''
work fine, substituting the last mytag instead of the first.
Second, the \w+ is not captured, since we aren't using it anyway. This is just for a bit of code clarity.

Python regular words cut

I have string: './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
I need string: '27-10-2011 17:07:02'
How can I do this in Python?

There are many ways to do this; one is to use str.partition:
text='./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
before,_,after = text.partition('[')
print(after[:-1])
# 27-10-2011 17:07:02
Another is to use str.split:
before,after = text.split('[',1)
print(after[:-1])
# 27-10-2011 17:07:02
or str.find and str.rfind:
ind1 = text.find('[')+1
ind2 = text.rfind(']')
print(text[ind1:ind2])
All these methods rely on the desired substring immediately following the first left-bracket [.
The first two methods also rely on the desired substring ending at the next-to-last character in text. The last method (using rfind) searches from the right for the index of the right-bracket, so it is a little more general, and does not depend on quite so many (potential off-by-one) constants.

If your string always has the same structure, this is probably the simplest solution:
s = r'./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
s[s.find("[")+1:s.find("]")]
Update:
After seeing some of the other answers this is a slight improvement:
s[s.find("[")+1:-1]
Exploiting the fact that the closing square bracket is the last character in your string.

If the format is fixed, you can also use this:
>>> s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
>>> s[-20:-1]
'27-10-2011 17:07:02'

You can also use regular expression:
import re
s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
print(re.search(r'\[(.*?)\]', s).group(1))

Try a regex:
>>> import re
>>> re.findall(r".*\[(.*)\]", './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]')
['27-10-2011 17:07:02']

Probably the easiest way (if you know the string will always be in this format):
>>> s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
>>> s[s.index('[') + 1:-1]
'27-10-2011 17:07:02'
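The slicing variants in these answers depend on where the brackets sit in the string. If the timestamp always has the shape DD-MM-YYYY HH:MM:SS (an assumption based on the single example), matching that shape directly avoids the position constants. A sketch; STAMP_RE and bracketed_timestamp are hypothetical names:

```python
import re

# Assumption: the wanted value is a DD-MM-YYYY HH:MM:SS timestamp in square brackets.
STAMP_RE = re.compile(r'\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\]')


def bracketed_timestamp(line):
    """Return the first bracketed timestamp in line, or None if there is none."""
    match = STAMP_RE.search(line)
    return match.group(1) if match else None


print(bracketed_timestamp('./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'))
# prints 27-10-2011 17:07:02
```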
