Replacing string between two characters in the file - python

I have a string in my properties file, as below:
line = "variables=ORACLE_BASE_HOME=/u02/test/oracle/landscape/1/db_50,DB_UNIQUE_NAME=cdms,ORACLE_BASE=//u02/test,PDB_NAME=,DB_NAME=cdms,ORACLE_HOME=/u02/test/product/19/db_21,SID=ss"
I would like to replace the following string with a different value:
DB_NAME=cdms -> DB_NAME=abc
I have the code below, but it does not behave as expected:
f = fileinput.FileInput(rsp_file_path)
for line in f:
    re.sub(",DB_NAME=(.*?),", "abc", line, flags=re.DOTALL)
f.close()

It should be:
re.sub("(,DB_NAME=)(.*?),", "\\g<1>abc,", line, flags=re.DOTALL)
or, using raw strings:
re.sub(r"(,DB_NAME=)(.*?),", r"\1abc,", line, flags=re.DOTALL)
That's because the documentation for re.sub() states:
In string-type repl arguments, in addition to the character escapes
and backreferences described above, \g<name> will use the substring
matched by the group named name, as defined by the (?P<name>...)
syntax. \g<number> uses the corresponding group number; \g<2> is
therefore equivalent to \2, but isn’t ambiguous in a replacement such
as \g<2>0. \20 would be interpreted as a reference to group 20, not a
reference to group 2 followed by the literal character '0'. The
backreference \g<0> substitutes in the entire substring matched by the
RE.
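In your case (,DB_NAME=) is the first captured group, which you refer to with \g<1>. Note also that re.sub() returns the new string rather than modifying line, so the result has to be assigned or written out.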
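Putting this together, a minimal sketch might look like the following. The helper name set_db_name is hypothetical, and [^,\n]* is swapped in for (.*?), so that the value can also be the last field on the line. It assumes new_value contains no backslashes, since those would be treated as escapes in the replacement string.

```python
import re

def set_db_name(line, new_value):
    """Return a copy of `line` with the DB_NAME value replaced."""
    # re.sub returns a new string; `line` itself is not modified.
    # \g<1> re-inserts the captured ",DB_NAME=" prefix unambiguously.
    return re.sub(r"(,DB_NAME=)[^,\n]*", r"\g<1>" + new_value, line)

line = ("variables=ORACLE_BASE_HOME=/u02/test/oracle/landscape/1/db_50,"
        "DB_UNIQUE_NAME=cdms,ORACLE_BASE=//u02/test,PDB_NAME=,DB_NAME=cdms,"
        "ORACLE_HOME=/u02/test/product/19/db_21,SID=ss")
updated = set_db_name(line, "abc")
print(updated)
```

Because the pattern starts with a comma, it does not touch DB_UNIQUE_NAME or PDB_NAME.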

You can also use str.replace() on the exact key=value pair:
s.replace(',DB_NAME=cdms,', ',DB_NAME=abc,', 1)
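One thing to watch with plain substring replacement is that the text DB_NAME also occurs inside PDB_NAME. A small sketch of two regex-free ways around that, using a shortened version of the line from the question (the split-and-rebuild variant is an alternative I am adding, not something from the answers above):

```python
line = "ORACLE_BASE=//u02/test,PDB_NAME=,DB_NAME=cdms,SID=ss"

# Replacing the exact key=value pair, with its surrounding commas,
# avoids touching PDB_NAME, which also contains the text "DB_NAME".
by_replace = line.replace(",DB_NAME=cdms,", ",DB_NAME=abc,", 1)

# Alternative when the old value is unknown: split into fields and rebuild.
fields = line.split(",")
by_split = ",".join(
    "DB_NAME=abc" if field.startswith("DB_NAME=") else field
    for field in fields
)

print(by_replace)
print(by_split)
```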

Related

How to match anything in a regular expression up to a character and not including it?

For example if I have a string abc%12341%%c%9876 I would like to substitute from the last % in the string to the end with an empty string, the final output that I'm trying to get is abc%12341%%c.
I created a regular expression '.*%' to match up to the last %, meaning abc%12341%%c%, and then got the index of the last % and replaced everything from there on with an empty string.
I was wondering if it can be done in one line using re.sub(..)
Use the following regex pattern, and then replace with empty string:
%[^%]*$
Sample script:
inp = "abc%12341%%c%9876"
output = re.sub(r'%[^%]*$', '', inp)
print(output) # abc%12341%%c
The regex pattern says to match the final % sign, followed by zero or more non % characters, up to the end of the string. We then replace with empty string, to effectively remove this content from the input.
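To see how that pattern behaves at the edges, here is a small sketch that wraps it in a function (the name strip_after_last_percent is just for illustration) and tries a string with no % and one that ends in %:

```python
import re

def strip_after_last_percent(text):
    # Remove the last '%' and everything after it; if there is no '%',
    # the pattern does not match and the text is returned unchanged.
    return re.sub(r"%[^%]*$", "", text)

print(strip_after_last_percent("abc%12341%%c%9876"))  # abc%12341%%c
print(strip_after_last_percent("no percent here"))    # no percent here
print(strip_after_last_percent("trailing%"))          # trailing
```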
I think it is called lookahead matching - I will look it up if I am not too slow :-)
(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'
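As a rough illustration of the lookahead described above, the sketch below first reproduces the Isaac/Asimov example and then uses a lookahead to locate the last % in the string from the question. The pattern %(?=[^%]*$) is my own guess at how a lookahead would be applied here, not something taken from the question.

```python
import re

# The lookahead checks for 'Asimov' but does not consume it.
m = re.search(r"Isaac (?=Asimov)", "Isaac Asimov")
print(repr(m.group(0)))                                 # 'Isaac '
print(re.search(r"Isaac (?=Asimov)", "Isaac Newton"))   # None

# A '%' that is followed only by non-'%' characters up to the end
# of the string is the last '%'.
inp = "abc%12341%%c%9876"
last = re.search(r"%(?=[^%]*$)", inp)
print(last.start())          # 12
print(inp[:last.start()])    # abc%12341%%c
```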

how to make a list in python from a string and using regular expression [duplicate]

I have a sample string <alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] ...>, created=1324336085, description='Customer for My Test App', livemode=False>
I only want the value cus_Y4o9qMEZAugtnW and NOT card (which is inside another [])
How could I do it in the easiest possible way in Python?
Maybe by using RegEx (which I am not good at)?
How about:
import re
s = "alpha.Customer[cus_Y4o9qMEZAugtnW] ..."
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print(m.group(1))
For me this prints:
cus_Y4o9qMEZAugtnW
Note that the call to re.search(...) finds the first match to the regular expression, so it doesn't find the [card] unless you repeat the search a second time.
Edit: The regular expression here is a python raw string literal, which basically means the backslashes are not treated as special characters and are passed through to the re.search() method unchanged. The parts of the regular expression are:
\[ matches a literal [ character
( begins a new group
[A-Za-z0-9_] is a character set matching any letter (capital or lower case), digit or underscore
+ matches the preceding element (the character set) one or more times.
) ends the group
\] matches a literal ] character
Edit: As D K has pointed out, the regular expression could be simplified to:
m = re.search(r"\[(\w+)\]", s)
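since \w is a special sequence that matches the same characters as [a-zA-Z0-9_] when ASCII matching is in effect. In Python 3, \w in a str pattern also matches Unicode word characters unless re.ASCII is given; in Python 2 this depended on the re.LOCALE and re.UNICODE settings.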
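A quick way to check how \w behaves on your own Python version is a sketch like this one (the sample text is made up for the demonstration):

```python
import re

# "caf\u00e9" is "café" and "na\u00efve" is "naïve", written with escapes.
text = "caf\u00e9_1 na\u00efve"

# Default for str patterns in Python 3: Unicode-aware \w.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

For identifiers like cus_Y4o9qMEZAugtnW the two forms give the same result; the difference only shows up with non-ASCII input.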
You could use str.split to do this.
s = "<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card]\
...>, created=1324336085, description='Customer for My Test App',\
livemode=False>"
val = s.split('[', 1)[1].split(']')[0]
Then we have:
>>> val
'cus_Y4o9qMEZAugtnW'
This should do the job:
re.match(r"[^[]*\[([^]]*)\]", yourstring).groups()[0]
your_string = "lnfgbdgfi343456dsfidf[my data] ljfbgns47647jfbgfjbgskj"
your_string[your_string.find("[")+1 : your_string.find("]")]
courtesy: Regular expression to return text between parenthesis
You can also use
re.findall(r"\[([A-Za-z0-9_]+)\]", string)
if there are many occurrences that you would like to find.
See also for more info:
How can I find all matches to a regular expression in Python?
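For the sample string from the question, a findall-based sketch would return both bracketed values, and the first one is the customer id:

```python
import re

s = ("<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] ...>, "
     "created=1324336085, description='Customer for My Test App', livemode=False>")

# findall returns the contents of the capture group for every match.
ids = re.findall(r"\[([A-Za-z0-9_]+)\]", s)
print(ids)     # ['cus_Y4o9qMEZAugtnW', 'card']
print(ids[0])  # cus_Y4o9qMEZAugtnW
```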
You can use
import re
s = re.search(r"\[.*?]", string)
if s:
    print(s.group(0))
How about this? Example illustrated using a file:
import re
with open('abc.log', 'r') as f:
    for line in f:
        m = re.search(r"\[(.*?)\]", line)
        if m:  # skip lines without a bracketed value
            print(m.group(1))
Hope this helps:
Magic regex : \[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.
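To show why the non-greedy ? matters in that pattern, here is a small comparison on a made-up line with two bracketed values:

```python
import re

line = "x[first] y[second]"

# Non-greedy: each match stops at the nearest closing bracket.
non_greedy = re.findall(r"\[(.*?)\]", line)

# Greedy: .* runs to the last closing bracket on the line.
greedy = re.findall(r"\[(.*)\]", line)

print(non_greedy)  # ['first', 'second']
print(greedy)      # ['first] y[second']
```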
This snippet should work too, but it will return any text enclosed within "[]"
re.findall(r"\[([a-zA-Z0-9 ._]*)\]", your_text)

Numeric value directly after backreference [duplicate]

I'm trying to use re.sub on a string with a numeric value directly after a numeric backreference. That is, if my replacement value is 15.00 and my backreference is \1, my replacement string would look like:
\115.00, which as expected will throw an error: invalid group reference because it thinks my backreference group is 115.
Example:
import re
r = re.compile(r"(Value:)([-+]?[0-9]*\.?[0-9]+)")
to_replace = "Value:0.99" # Want Value:1.00
# Won't work:
print(re.sub(r, r'\11.00', to_replace))
# Will work, but don't want the space
print(re.sub(r, r'\1 1.00', to_replace))
Is there a solution that doesn't involve more than re.sub?
Use an unambiguous backreference syntax \g<1>. See re.sub reference:
\g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'.
See this regex demo.
Python demo:
import re
r = re.compile(r"(Value:)([-+]?[0-9]*\.?[0-9]+)")
to_replace = "Value:0.99" # Want Value:1.00
print(re.sub(r, r'\g<1>1.00', to_replace))
# => Value:1.00
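If you prefer to avoid backreference syntax in the replacement altogether, a function replacement is another option. This is a sketch of that alternative (swapped in for \g<1>, not part of the answer above); it also shows the error raised by the ambiguous form:

```python
import re

r = re.compile(r"(Value:)([-+]?[0-9]*\.?[0-9]+)")
to_replace = "Value:0.99"

# r'\11.00' is parsed as a reference to group 11, which does not exist.
try:
    r.sub(r"\11.00", to_replace)
    ambiguous_failed = False
except re.error as exc:
    ambiguous_failed = True
    print("ambiguous reference:", exc)

# A function receives the match object, so no escape parsing is involved.
fixed = r.sub(lambda m: m.group(1) + "1.00", to_replace)
print(fixed)  # Value:1.00
```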

Replace String in python with matched pattern

I have to remove any punctuation marks from the start and at the end of the word.
I am using re.sub to do it.
re.sub(r'(\w.+)(?=[^\w]$)','\1',text)
Grouping not working out - all I get is ☺. for Mihir4. in command line
If you have a string with multiple words, such as
text = ".adfdf. 'df' !3423? ld! :sdsd"
this will do the trick (it will also work for single words, of course):
>>> re.sub(r'[^\w\s]*(\w+)[^\w\s]*', r'\1', text)
'adfdf df 3423 ld sdsd'
Notice the r in r'\1'. This is equivalent to '\\1'.
>>> re.sub(r'[^\w\s]*(\w+)[^\w\s]*', '\\1', text)
'adfdf df 3423 ld sdsd'
Further reading: the backslash plague
The string literal '\1' is equivalent to '\x01'. You need to escape it or use raw string literal to mean backreference group 1.
BTW, you don't need to use the capturing group.
>>> re.sub(r'^[^-\w]+|[^-\w]+$', '', 'Mihir4.')
'Mihir4'
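The difference between '\1' and r'\1' described above can be checked directly. This sketch reuses the pattern from the question to reproduce the stray control character, then the raw-string version from the first answer:

```python
import re

text = "Mihir4."

# In a normal string literal, \1 is an octal escape for chr(1).
print('\1' == '\x01')   # True
# The raw literal keeps the backslash, which re.sub reads as group 1.
print(r'\1' == '\\1')   # True

# Question's pattern with a non-raw replacement: inserts chr(1).
broken = re.sub(r'(\w.+)(?=[^\w]$)', '\1', text)
# Answer's pattern with a raw replacement: inserts the captured word.
working = re.sub(r'[^\w\s]*(\w+)[^\w\s]*', r'\1', text)

print(repr(broken))   # '\x01.'
print(repr(working))  # 'Mihir4'
```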

Use Regular expression with fileinput

I am trying to replace a variable stored in another file using regular expression. The code I have tried is:
r = re.compile(r"self\.uid\s*=\s*('\w{12})'")
for line in fileinput.input(['file.py'], inplace=True):
    print line.replace(r.match(line), sys.argv[1]),
The format of the variable in the file is:
self.uid = '027FC8EBC2D1'
I am trying to pass in a parameter in this format, use a regular expression to verify that sys.argv[1] is in the correct format, and then find the variable stored in this file and replace its value with the new one.
Can anyone help. Thanks for the help.
You can use re.sub, which matches the regular expression and does the substitution in one go:
import fileinput
import re
import sys
r = re.compile(r"(self\.uid\s*=\s*)'\w{12}'")
for line in fileinput.input(['file.py'], inplace=True):
    print(r.sub(r"\1'%s'" % sys.argv[1], line), end="")
You need to use re.sub(), not str.replace():
re.sub(pattern, repl, string[, count])
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. ... Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern.
...
In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number;
Quick test, using \g<number> for backreference:
>>> r = re.compile(r"(self\.uid\s*=\s*)'\w{12}'")
>>> line = "self.uid = '027FC8EBC2D1'"
>>> newv = "AAAABBBBCCCC"
>>> r.sub(r"\g<1>'%s'" % newv, line)
"self.uid = 'AAAABBBBCCCC'"
>>>
str.replace(old, new[, count]):
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
re.match returns either a MatchObject or (most likely in your case) None; neither is the string that str.replace requires.
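For the validation part of the question, one possible sketch is below. The names UID_VALUE, UID_ASSIGNMENT and replace_uid are hypothetical, and the "12 word characters" rule is simply taken from the \w{12} in the question's pattern; adjust it if the real format is stricter.

```python
import re

UID_VALUE = re.compile(r"\w{12}")
UID_ASSIGNMENT = re.compile(r"(self\.uid\s*=\s*)'\w{12}'")

def replace_uid(line, new_uid):
    # fullmatch returns a match object or None, never a string,
    # so it is used here only as a yes/no format check.
    if UID_VALUE.fullmatch(new_uid) is None:
        raise ValueError("expected 12 word characters, got %r" % new_uid)
    # sub searches anywhere in the line, so leading indentation is fine.
    return UID_ASSIGNMENT.sub(lambda m: m.group(1) + "'" + new_uid + "'", line)

print(replace_uid("    self.uid = '027FC8EBC2D1'", "AAAABBBBCCCC"))
print(re.match(r"self", "uid"))  # None
```

In a script, new_uid would come from sys.argv[1], and replace_uid would be called on each line inside the fileinput loop.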
