I want to check if a string ends with a "_INT".
Here is my code
nOther = "c1_1"
tail = re.compile('_\d*$')
if tail.search(nOther):
    nOther = nOther.replace("_","0")
print nOther
output:
c101
c102
c103
c104
but there may be two underscores in the string, I am only interested in the last one.
How can I edit my code to handle this?
There is no need for two steps (check whether the pattern matches, then make the replacement), because re.sub does both in one step:
txt = re.sub(r'_(?=\d+$)', '0', txt)
The pattern uses a lookahead (?=...) (i.e. "followed by"), which is only a check: the content inside it is not part of the match result. (In other words, \d+$ is not replaced.)
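As a quick check of the lookahead approach, here is a minimal sketch. The helper name replace_trailing_underscore and the sample strings are made up for illustration; only the pattern comes from the answer above.

```python
import re

def replace_trailing_underscore(text):
    # Replace an underscore only when digits, and nothing else,
    # follow it up to the end of the string.
    return re.sub(r'_(?=\d+$)', '0', text)

for sample in ["c1_1", "a_b_12", "no_digits_"]:
    print(sample, "->", replace_trailing_underscore(sample))
# c1_1 -> c101
# a_b_12 -> a_b012
# no_digits_ -> no_digits_
```

Only the last underscore in "a_b_12" is touched, because the first one is not followed by digits running to the end of the string.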
One way to do it would be to capture everything that is not the last underscore and rebuild the string.
import re
nOther = "c1_1"
# The greedy (.*) runs up to the last underscore; group 2 holds the trailing digits.
tail = re.compile(r'(.*)_(\d*$)')
m = tail.search(nOther)
if m:
    nOther = m.group(1) + '0' + m.group(2)
print(nOther)
Related
I have this string
cmd = "show run IP(k1) new Y(y1) add IP(dev.maintserial):Y(dev.maintkeys)"
What is a regex to first match exactly "IP(dev.maintserial):Y(dev.maintkeys)"
There might be a different path inside the parenthesis, like (name.dev.serial), so it is not like there will always be one dot there.
I thought of something like this:
re.search('(IP\(.*?\):Y\(.*?\))', cmd) but this will also match the single IP(k1) and Y(y1)
My usage will be:
If "IP(*):Y(*)" in cmd:
do substitution of IP(dev.maintserial):Y(dev.maintkeys) to Y(dev.maintkeys.IP(dev.maintserial))
How can I then do the above substitution? In the if condition I want to do this change in order: from IP(path_to_IP_key):Y(path_to_Y_key) to Y(path_to_Y_key.IP(path_to_IP_key)) , so IP is inside Y at the end after the dot.
This should work as it is more restrictive.
(IP\([^\)]+\):Y\(.*?\))
[^\)]+ means at least one character that isn't a closing parenthesis.
.*? in yours is too open-ended: it allows almost anything, including closing parentheses, until it reaches "):Y(".
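To see the difference concretely, here is a small sketch that runs both patterns on the string from the question. The variable names loose and strict are just labels for this example.

```python
import re

cmd = "show run IP(k1) new Y(y1) add IP(dev.maintserial):Y(dev.maintkeys)"

# .*? may cross closing parentheses, so the match starts at IP(k1)
loose = re.search(r'IP\(.*?\):Y\(.*?\)', cmd).group(0)

# [^\)]+ cannot cross a closing parenthesis, so IP(k1) is rejected
strict = re.search(r'IP\([^\)]+\):Y\(.*?\)', cmd).group(0)

print(loose)   # IP(k1) new Y(y1) add IP(dev.maintserial):Y(dev.maintkeys)
print(strict)  # IP(dev.maintserial):Y(dev.maintkeys)
```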
Something like this?
r"IP\(([^)]*\..+)\):Y\(([^)]*\..+)\)"
You can try it with your string. It matches the entire string IP(dev.maintserial):Y(dev.maintkeys) with groups dev.maintserial and dev.maintkeys.
The RE matches IP(, zero or more characters that are not a closing parenthesis ([^)]*), a period . (\.), one or more of any characters (.+), then ):Y(, ... (between the parentheses -- same as above), ).
Example Usage
import re
cmd = "show run IP(k1) new Y(y1) add IP(dev.maintserial):Y(dev.maintkeys)"
# compile regular expression
p = re.compile(r"IP\(([^)]*\..+)\):Y\(([^)]*\..+)\)")
s = p.search(cmd)
# if there is a match, s is not None
if s:
    print(f"{s[0]}\n{s[1]}\n{s[2]}")
    a = "Y(" + s[2] + ".IP(" + s[1] + "))"
    print(f"\n{a}")
Above p.search(cmd) "[s]can[s] through [cmd] looking for the first location where this regular expression [p] produces a match, and return[s] a corresponding match object" (docs). None is the return value if there is no match. If there is a match, s[0] gives the entire match, s[1] gives the first parenthesized subgroup, and s[2] gives the second parenthesized subgroup (docs).
Output
IP(dev.maintserial):Y(dev.maintkeys)
dev.maintserial
dev.maintkeys
Y(dev.maintkeys.IP(dev.maintserial))
You can use two negated character classes [^()]* to match any character except parentheses, and omit the outer capture group if you only need the match.
To prevent a partial word match, you might start matching IP with a word boundary \b:
\bIP\([^()]*\):Y\([^()]*\)
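For the substitution the question asks about, one possible sketch is to add two capture groups to this pattern and let re.sub swap them with backreferences. The groups, the function name nest_ip_in_y and the replacement template are my additions, not part of the pattern above.

```python
import re

# Same pattern as above, with a capture group around each path
PAIR = re.compile(r'\bIP\(([^()]*)\):Y\(([^()]*)\)')

def nest_ip_in_y(cmd):
    # \2 is the Y path, \1 is the IP path; IP(...) ends up inside Y(...)
    return PAIR.sub(r'Y(\2.IP(\1))', cmd)

cmd = "show run IP(k1) new Y(y1) add IP(dev.maintserial):Y(dev.maintkeys)"
print(nest_ip_in_y(cmd))
# show run IP(k1) new Y(y1) add Y(dev.maintkeys.IP(dev.maintserial))
```

Because re.sub returns the string unchanged when nothing matches, no separate "if" check is needed before the substitution.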
I have a string "person:x:1319:nobody,jram,dapp,test1,app1,lasp\r\n" (for example) and need to split the string and get output only as
"nobody,jram,dapp,test1,app1,lasp\r\n"
How can I do that?
You can use str.rsplit(), which splits the string on the delimiter starting from the right side. rsplit() returns the result as a list, so you can then access the values by index.
s = "person:x:1319:nobody,jram,dapp,test1,app1,lasp\r\n"
res = s.rsplit(':', 1)[-1]
print(res)
This solution uses regex to find an occurrence of a digit followed by a colon. Then returns the part afterwards as the match.
import re
s1 = "person:x:1319:nobody,jram,dapp,test1,app1,lasp\r\n"
m = re.search(r'(?<=\d:).*', s1)
match1 = m.group(0)
print(match1)
Output: nobody,jram,dapp,test1,app1,lasp
Note that this solution will still work (according to what was requested in the title) even if you have another colon in the text which is not preceded by a number.
s2 = "person:x:1319:test:nobody,jram,dapp,test1,app1,lasp\r\n"
m = re.search(r'(?<=\d:).*', s2)
match2 = m.group(0)
print(match2)
Output: test:nobody,jram,dapp,test1,app1,lasp
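If the line always has exactly three colon-separated fields before the list (an assumption based on the example, not something the question states), plain str.split with a maxsplit is another option. The helper last_field and its leading_fields parameter are hypothetical names for this sketch.

```python
def last_field(line, leading_fields=3):
    # Split at most `leading_fields` times from the left; everything
    # after that stays in one piece, extra colons included.
    return line.split(':', leading_fields)[-1]

s1 = "person:x:1319:nobody,jram,dapp,test1,app1,lasp\r\n"
s2 = "person:x:1319:test:nobody,jram,dapp,test1,app1,lasp\r\n"

print(repr(last_field(s1)))
print(repr(last_field(s2)))
# rsplit from the right, by contrast, drops "test:" from s2
print(repr(s2.rsplit(':', 1)[-1]))
```

Note that both split variants keep the trailing "\r\n"; call .rstrip() on the result if you don't want it.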
What is the clean way in python to do this simple text fixing - checking if every full stop (except the last one) is followed by space. Assume that having a dot not followed by an empty space is the only possible error we can get in the input string.
I am doing this:
def textFix(text):
    result = re.sub(r'\.(?!\s)', '. ', text)
    if (result[len(result) - 1]) == ' ':
        return result[:-1]
    return result
You may check it with
\.(?!\s|$)
See the regex demo. It matches a dot not followed by whitespace or the end of the string, that is, any non-final dot that has no whitespace after it.
Or, you may also consider
\.(?=\S)
to match any dot followed by a non-whitespace char.
Python demo:
import re
rx = r"\.(?=\S)"
s = "Text1. Text2.Text3."
result = re.sub(rx, ". ", s)
print(result)
# => "Text1. Text2. Text3."
Your technique looks fine, but also include a check so that no space is added after the last dot (.):
\.(?!\s)(?!$)
Here (?!$) makes sure that a . followed by the end of the string $ is not matched, so no space is added after it.
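With the end-of-string check inside the pattern, the trimming step from the question is no longer needed. A minimal sketch, assuming (as the question does) that a missing space after a dot is the only error in the input; the name text_fix is mine:

```python
import re

def text_fix(text):
    # Add a space after every dot that is followed by neither
    # whitespace nor the end of the string.
    return re.sub(r'\.(?!\s|$)', '. ', text)

print(text_fix("Text1. Text2.Text3."))  # Text1. Text2. Text3.
print(text_fix("a.b."))                 # a. b.
```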
I have a string like this:
data='WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
I need to remove everything up to and including the first underscore, using a regex.
I've tried this:
re.sub("(^.*\_)", "", data)
but this gets rid of everything up to the last underscore:
ProcessCpuUsage
I need it to be:
jvmRuntimeModule_ProcessCpuUsage
Use this instead:
data='WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
# str.find replaces the old string.find function, which no longer exists in Python 3
result = data[data.find("_")+1:]
print(result)
re.sub("(^.*\_)", "", data)
Here .* is greedy: it first matches every character to the end of the line. With nothing left for the next token, the underscore, the engine backtracks one character at a time until an underscore can match. That happens just before ProcessCpuUsage, at the last underscore, so the match covers everything up to and including it.
You should ask the * quantifier to be less greedy by adding ?. You also do not need to capture the contents, so drop the parens. The backslash before the underscore does nothing; drop it. Keep the leading ^ anchor, though (or pass count=1): without it, re.sub replaces every lazy match in turn and you are back to ProcessCpuUsage.
re.sub("^.*?_", "", data)
You have become a victim of greedy matching. The expression matches the longest sequence that it possibly can.
I know there's a way to turn off greedy matching, but I never remember it. Instead there's a trick I use when there's a character I want to stop at. Instead of matching on every character with . I match on every character except the one I want to stop at.
re.sub("^[^_]*_", "", data)
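Here is a side-by-side sketch of the variants discussed in these answers, run on the string from the question. The variable names are only labels for this comparison.

```python
import re

data = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'

greedy = re.sub(r'^.*_', '', data)               # stops at the LAST underscore
lazy = re.sub(r'^.*?_', '', data)                # stops at the FIRST underscore
negated = re.sub(r'^[^_]*_', '', data)           # same result, no laziness needed
unanchored = re.sub(r'.*?_', '', data)           # replaces every match in turn
limited = re.sub(r'.*?_', '', data, count=1)     # only the first match

print(greedy)      # ProcessCpuUsage
print(lazy)        # jvmRuntimeModule_ProcessCpuUsage
print(negated)     # jvmRuntimeModule_ProcessCpuUsage
print(unanchored)  # ProcessCpuUsage
print(limited)     # jvmRuntimeModule_ProcessCpuUsage
```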
This should do:
import re
def get_last_part(d):
    m = re.match('[^_]*_(.*)', d)
    if m:
        return m.group(1)
    else:
        return None
print(get_last_part('WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'))
you can use str.index:
>>> data = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
>>> data[data.index('_')+1:]
'jvmRuntimeModule_ProcessCpuUsage'
Using str.split
>>> data.split('_',1)[1]
'jvmRuntimeModule_ProcessCpuUsage'
Using str.find:
>>> data[data.find('_')+1:]
'jvmRuntimeModule_ProcessCpuUsage'
Take a look at the string methods documentation.
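One thing worth knowing is that these three differ when the string has no underscore at all. A small sketch (the sample string is made up for illustration):

```python
data = 'NoUnderscoreHere'

# find returns -1 when nothing is found, so the slice starts at 0
# and the whole string comes back unchanged
via_find = data[data.find('_') + 1:]

# index raises ValueError instead of returning -1
try:
    via_index = data[data.index('_') + 1:]
except ValueError:
    via_index = None

# split returns a one-element list, so [1] raises IndexError
try:
    via_split = data.split('_', 1)[1]
except IndexError:
    via_split = None

print(via_find, via_index, via_split)
# NoUnderscoreHere None None
```

Which behavior you want depends on whether a missing underscore should be treated as an error.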
Try this regex:
result = re.sub("^.*?_", "", text)
What the regex ^.*?_ does:
^   .. Assert that the position is at the beginning of the string.
.*? .. Match any character that is not a line break character, between zero and unlimited times, as few times as possible.
_   .. Match the character _
Try using split():
s = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
print(s.split('_',1)[1])
Result:
jvmRuntimeModule_ProcessCpuUsage
I'm working with a search&replace programming assignment. I'm a student and I'm finding the regex documentation a bit overwhelming (e.g. https://docs.python.org/2/library/re.html), so I'm hoping someone here could explain to me how to accomplish what I'm looking for.
I've used regex to get a list of strings from my document. They all look like this:
%#import fileName (regexStatement)
An actual example:
%#import script_example.py ( *out =(.|\n)*?return out)
Now, I'm wondering how I can split these up so I get the fileName and regexStatement as separate strings. I'd assume using a regex or a string split function, but I'm not sure how to make it work on all kinds of variations of %#import fileName (regexStatement). Splitting on parentheses could hit the middle of the regex statement, or go wrong if a parenthesis is part of the fileName, for instance. The assignment doesn't specify whether it should only be able to import from Python files, so I don't believe I can use ".py (" as a splitting point before the regex statement either.
I'm thinking of something like a regex "%#import " to hit the gap after import, and "\..* " to hit the gap after fileName. But I'm not sure how to get rid of the parentheses that enclose the regex statement, or how to use all of it to actually split the string correctly so that I have one variable storing fileName and one storing regexStatement for each entry in my list.
Thanks a lot for your attention!
If the filename can't contain spaces, just split your string on spaces with maxsplit 2:
>>> line.split(' ', 2)
['%#import', 'script_example.py', '( *out =(.|\n)*?return out)']
The maxsplit of 2 makes it split only at the first two spaces and leave any spaces within the regex intact. Now you have the filename as the second element and the regex as the third. It's not clear from your statement whether the parentheses are part of the regex or not (i.e., as a capturing group). If not, you can easily remove them by trimming the first and last characters from that part.
If you assign the values like this:
filename, regex = line.split(' ', 2)[1:]
then you can strip the parentheses with:
regex = regex[1:-1]
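Put together, the split-based approach might look like the sketch below. It assumes the filename contains no spaces and that the outer parentheses are not part of the regex; the function name parse_import is hypothetical.

```python
def parse_import(line):
    # Split at the first two spaces only; the regex part keeps its own spaces
    _, filename, regex = line.split(' ', 2)
    # Strip the enclosing parentheses if both are present
    if regex.startswith('(') and regex.endswith(')'):
        regex = regex[1:-1]
    return filename, regex

line = "%#import script_example.py ( *out =(.|\\n)*?return out)"
filename, regex = parse_import(line)
print(filename)
print(regex)
```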
That should do it nicely
^%#import (\S+) \((.*)\)
or, if the filename may have spaces:
^%#import ((?:(?! \().)+) \((.*)\)
Both expressions contain two groups, one for the file name and one for the contents of the parentheses. Run in multiline mode on the entire file or in normal mode if you work with single lines anyway.
This: ((?:(?! \().)+) breaks down as:
(          # group start
  (?:      # non-capturing group
    (?!    # negative look-ahead: a position NOT followed by
       \(  # " (" (a space, then an opening parenthesis)
    )      # end look-ahead
    .      # match any char (this is part of the filename)
  )+       # end non-capturing group, repeat
)          # end group
The other bits of the expression should be self-explanatory.
import re
line = "%#import script_example.py ( *out =(.|\\n)*?return out)"
pattern = r'^%#import (\S+) \((.*)\)'
match = re.match(pattern, line)
if match:
    print("match.group(1) '" + match.group(1) + "'")
    print("match.group(2) '" + match.group(2) + "'")
else:
    print("No match.")
prints
match.group(1) 'script_example.py'
match.group(2) ' *out =(.|\n)*?return out'
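The second expression, for filenames that may contain spaces, can be tried the same way. This is only a sketch: the filename "my script.py" is invented for the test, and parse_import_line is a hypothetical helper name.

```python
import re

# Group 1: the filename, consumed char by char as long as " (" does not follow.
# Group 2: everything between the first " (" and the last ")".
IMPORT_LINE = re.compile(r'^%#import ((?:(?! \().)+) \((.*)\)')

def parse_import_line(line):
    m = IMPORT_LINE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(parse_import_line("%#import my script.py ( *out = x)"))
print(parse_import_line("%#import script_example.py ( *out =(.|\\n)*?return out)"))
print(parse_import_line("nothing to import here"))
```

The filename stops at the first " (" it meets, so this still breaks if the filename itself contains a space followed by an opening parenthesis.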
For matching something like %#import script_example.py ( *out =(.|\n)*?return out) I suggest:
r'%#impor[\w\W ]+'
Note that:
\w matches any word character [a-zA-Z0-9_]
\W matches any non-word character [^a-zA-Z0-9_]
So you can use re.findall() to find all the matches:
import re
re.findall(r'%#impor[\w\W ]+', your_string)