Use Regular expression with fileinput - python

I am trying to replace a variable stored in another file using a regular expression. The code I have tried is:
r = re.compile(r"self\.uid\s*=\s*('\w{12})'")
for line in fileinput.input(['file.py'], inplace=True):
    print line.replace(r.match(line), sys.argv[1]),
The format of the variable in the file is:
self.uid = '027FC8EBC2D1'
I am trying to pass in a parameter in this format, use a regular expression to verify that sys.argv[1] has the correct format, find the variable stored in this file, and replace its value with the new one.
Can anyone help? Thanks.

You can use re.sub which will match the regular expression and do the substitution in one go:
r = re.compile(r"(self\.uid\s*=\s*)'\w{12}'")
for line in fileinput.input(['file.py'], inplace=True):
    print r.sub(r"\1'%s'" % sys.argv[1], line),

You need to use re.sub(), not str.replace():
re.sub(pattern, repl, string[, count])
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. ... Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern.
...
In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number;
Quick test, using \g<number> for backreference:
>>> r = re.compile(r"(self\.uid\s*=\s*)'\w{12}'")
>>> line = "self.uid = '027FC8EBC2D1'"
>>> newv = "AAAABBBBCCCC"
>>> r.sub(r"\g<1>'%s'" % newv, line)
"self.uid = 'AAAABBBBCCCC'"
>>>

str.replace(old, new[, count]):
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
re.match returns either a MatchObject or (most likely in your case) None; neither is the string that str.replace requires.
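Putting the two answers together, here is a minimal Python 3 sketch that also validates the new value before touching the file, which the question asked for. The names replace_uid, UID_FORMAT and UID_LINE are made up for this example, and the 12-word-character rule is simply the \w{12} from the patterns above, so adjust it if your ids are stricter (for example hex only).

```python
import fileinput
import re
import sys

# Hypothetical helper names; the patterns mirror the ones in the answers above.
UID_FORMAT = re.compile(r"\w{12}")
UID_LINE = re.compile(r"(self\.uid\s*=\s*)'\w{12}'")


def replace_uid(path, new_uid):
    """Rewrite every self.uid = '...' assignment in path to use new_uid."""
    # fullmatch rejects values that are too short, too long, or contain other characters.
    if not UID_FORMAT.fullmatch(new_uid):
        raise ValueError("uid must be exactly 12 word characters: %r" % new_uid)
    # With inplace=True, whatever is written to stdout ends up in the file.
    with fileinput.input([path], inplace=True) as f:
        for line in f:
            # A function as the replacement avoids backslash processing of new_uid.
            new_line = UID_LINE.sub(
                lambda m: m.group(1) + "'" + new_uid + "'", line)
            sys.stdout.write(new_line)
```

From a script you would call it as replace_uid('file.py', sys.argv[1]). Lines that do not contain the assignment are written back unchanged, because sub returns the input string when nothing matches.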

Related

Regular Expression replacement in Python

I have a regular expression to match all instances of 1 followed by a letter. I would like to remove all these instances.
EXPRESSION = re.compile(r"1([A-Z])")
I can use re.split.
result = EXPRESSION.split(input)
This would return a list. So we could do
result = ''.join(EXPRESSION.split(input))
to convert it back to a string.
or
result = EXPRESSION.sub('', input)
Are there any differences to the end result?
Yes, the results are different. Here is a simple example:
import re
EXPRESSION = re.compile(r"1([A-Z])")
s = 'hello1Aworld'
result_split = ''.join(EXPRESSION.split(s))
result_sub = EXPRESSION.sub('', s)
print('split:', result_split)
print('sub: ', result_sub)
Output:
split: helloAworld
sub: helloworld
The reason is that because of the capture group, EXPRESSION.split(s) includes the A, as noted in the documentation:
re.split = split(pattern, string, maxsplit=0, flags=0)
Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings. If
capturing parentheses are used in pattern, then the text of all
groups in the pattern are also returned as part of the resulting
list. If maxsplit is nonzero, at most maxsplit splits occur,
and the remainder of the string is returned as the final element
of the list.
When removing the capturing parentheses, i.e., using
EXPRESSION = re.compile(r"1[A-Z]")
then so far I have not found a case where result_split and result_sub are different, even after reading this answer to a similar question about regular expressions in JavaScript, and changing the replacement string from '' to '-'.
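To make the effect of the capture group visible, here is a small sketch that prints the intermediate lists. The variable names are only for illustration; the non-capturing form (?:...) is one way to keep the parentheses without changing what split returns.

```python
import re

s = 'hello1Aworld'

capturing = re.compile(r"1([A-Z])")
non_capturing = re.compile(r"1(?:[A-Z])")

# With a capture group, the captured text is kept in the result list.
with_group = capturing.split(s)
# Without one, only the text between matches is returned.
without_group = non_capturing.split(s)

print(with_group)      # ['hello', 'A', 'world']
print(without_group)   # ['hello', 'world']

# For this input, joining the non-capturing split equals sub with ''.
same = ''.join(without_group) == non_capturing.sub('', s)
print(same)            # True
```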

how to make a list in python from a string and using regular expression [duplicate]

I have a sample string <alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] ...>, created=1324336085, description='Customer for My Test App', livemode=False>
I only want the value cus_Y4o9qMEZAugtnW and NOT card (which is inside another [])
How could I do it in easiest possible way in Python?
Maybe by using RegEx (which I am not good at)?
How about:
import re
s = "alpha.Customer[cus_Y4o9qMEZAugtnW] ..."
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print m.group(1)
For me this prints:
cus_Y4o9qMEZAugtnW
Note that the call to re.search(...) finds the first match to the regular expression, so it doesn't find the [card] unless you repeat the search a second time.
Edit: The regular expression here is a python raw string literal, which basically means the backslashes are not treated as special characters and are passed through to the re.search() method unchanged. The parts of the regular expression are:
\[ matches a literal [ character
( begins a new group
[A-Za-z0-9_] is a character set matching any letter (capital or lower case), digit or underscore
+ matches the preceding element (the character set) one or more times.
) ends the group
\] matches a literal ] character
Edit: As D K has pointed out, the regular expression could be simplified to:
m = re.search(r"\[(\w+)\]", s)
since the \w is a special sequence which means the same thing as [a-zA-Z0-9_] depending on the re.LOCALE and re.UNICODE settings.
You could use str.split to do this.
s = "<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card]\
...>, created=1324336085, description='Customer for My Test App',\
livemode=False>"
val = s.split('[', 1)[1].split(']')[0]
Then we have:
>>> val
'cus_Y4o9qMEZAugtnW'
This should do the job:
re.match(r"[^[]*\[([^]]*)\]", yourstring).groups()[0]
your_string = "lnfgbdgfi343456dsfidf[my data] ljfbgns47647jfbgfjbgskj"
your_string[your_string.find("[")+1 : your_string.find("]")]
courtesy: Regular expression to return text between parenthesis
You can also use
re.findall(r"\[([A-Za-z0-9_]+)\]", string)
if there are many occurrences that you would like to find.
See also for more info:
How can I find all matches to a regular expression in Python?
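As a quick comparison of the two calls, the sketch below runs search and findall on a shortened version of the sample string from the question. The variable names are just for this example.

```python
import re

# Shortened version of the sample string from the question.
s = ("<alpha.Customer[cus_Y4o9qMEZAugtnW] "
     "active_card=<alpha.AlphaObject[card] ...>, created=1324336085>")

pattern = re.compile(r"\[(\w+)\]")

# search stops at the first match, which is the customer id.
first = pattern.search(s).group(1)
# findall returns the captured group of every match, in order.
all_values = pattern.findall(s)

print(first)       # cus_Y4o9qMEZAugtnW
print(all_values)  # ['cus_Y4o9qMEZAugtnW', 'card']
```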
You can use
import re
s = re.search(r"\[.*?]", string)
if s:
    print(s.group(0))
How about this? Example illustrated using a file:
import re
with open('abc.log') as f:
    for line in f:
        m = re.search(r"\[(.*?)\]", line)
        if m:  # skip lines that contain no [...] at all
            print(m.group(1))
Hope this helps:
Magic regex : \[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.
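The non-greedy part matters as soon as a line has more than one pair of brackets. A small sketch (the sample text is made up) comparing the greedy form, the non-greedy form, and a negated character class that gives the same result here:

```python
import re

text = "a[one] b[two]"

# Greedy: .* runs to the LAST ], so both pairs collapse into one match.
greedy = re.findall(r"\[(.*)\]", text)
# Non-greedy: .*? stops at the first ] it can.
lazy = re.findall(r"\[(.*?)\]", text)
# Alternative: match anything except ] inside the brackets.
negated = re.findall(r"\[([^\]]*)\]", text)

print(greedy)   # ['one] b[two']
print(lazy)     # ['one', 'two']
print(negated)  # ['one', 'two']
```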
This snippet should work too, but it will return any text enclosed within "[]"
re.findall(r"\[([a-zA-Z0-9 ._]*)\]", your_text)

Replacing string between two characters in the file

I have a string in my properties file as below:
line = "variables=ORACLE_BASE_HOME=/u02/test/oracle/landscape/1/db_50,DB_UNIQUE_NAME=cdms,ORACLE_BASE=//u02/test,PDB_NAME=,DB_NAME=cdms,ORACLE_HOME=/u02/test/product/19/db_21,SID=ss"
I would like to replace the following string with a different value:
DB_NAME=cdms -> DB_NAME=abc
I have the code below; however, it does not seem to work as expected:
f = fileinput.FileInput(rsp_file_path)
for line in f:
    re.sub(",DB_NAME=(.*?),", "abc", line, flags=re.DOTALL)
f.close()
It should be:
re.sub("(,DB_NAME=)(.*?),", "\g<1>abc,", line, flags=re.DOTALL)
or using raw string:
re.sub(r"(,DB_NAME=)(.*?),", r"\1abc,", line, flags=re.DOTALL)
That's because the documentation for re.sub() states:
In string-type repl arguments, in addition to the character escapes
and backreferences described above, \g<name> will use the substring
matched by the group named name, as defined by the (?P<name>...)
syntax. \g<number> uses the corresponding group number; \g<2> is
therefore equivalent to \2, but isn’t ambiguous in a replacement such
as \g<2>0. \20 would be interpreted as a reference to group 20, not a
reference to group 2 followed by the literal character '0'. The
backreference \g<0> substitutes in the entire substring matched by the
RE.
In your case (,DB_NAME=) is the first captured group which you refer to with \g<1>.
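Two details are easy to miss here: re.sub returns a new string rather than changing line, and the \g<1> form only becomes necessary when the replacement text starts with a digit. The sketch below shows both on a shortened, made-up version of the properties line; the value 9db is hypothetical.

```python
import re

# Shortened, hypothetical version of the properties line from the question.
line = "variables=DB_UNIQUE_NAME=cdms,PDB_NAME=,DB_NAME=cdms,SID=ss"
pattern = re.compile(r"(,DB_NAME=)(.*?),")

# re.sub returns a new string; line itself is left untouched.
updated = pattern.sub(r"\g<1>abc,", line)
print(updated)  # variables=DB_UNIQUE_NAME=cdms,PDB_NAME=,DB_NAME=abc,SID=ss

# A replacement value that starts with a digit needs the \g<1> form.
numeric = pattern.sub(r"\g<1>9db,", line)
print(numeric)  # variables=DB_UNIQUE_NAME=cdms,PDB_NAME=,DB_NAME=9db,SID=ss

try:
    pattern.sub(r"\19db,", line)  # parsed as a reference to group 19
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True
print(ambiguous_failed)  # True
```

Note that the pattern requires a comma after the value, so it would not match if DB_NAME were the last entry on the line.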
You can also use str.replace():
s.replace(',DB_NAME=cdms,', ',DB_NAME=abc,', 1)

Python - why doesn't this simple regex work?

This code below should be self explanatory. The regular expression is simple. Why doesn't it match?
>>> import re
>>> digit_regex = re.compile('\d')
>>> string = 'this is a string with a 4 digit in it'
>>> result = digit_regex.match(string)
>>> print result
None
Alternatively, this works:
>>> char_regex = re.compile('\w')
>>> result = char_regex.match(string)
>>> print result
<_sre.SRE_Match object at 0x10044e780>
Why does the second regex work, but not the first?
Here is what the documentation for re.match() says: "If zero or more characters at the beginning of string match the regular expression pattern ..."
In your case the string doesn't have a digit (\d) at the beginning. For \w, however, it has t at the beginning of the string.
If you want to check for a digit in your string using the same mechanism, prepend .* to your regex:
digit_regex = re.compile('.*\d')
The second finds a match because string starts with a word character. If you want to find matches within the string, use the search or findall methods (I see this was suggested in a comment too). Or change your regex (e.g. .*(\d).*) and use the .groups() method on the result.
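For reference, here is a short sketch of the three calls side by side on the string from the question. The variable names at_start, anywhere and all_digits are just for this example.

```python
import re

digit_regex = re.compile(r'\d')
string = 'this is a string with a 4 digit in it'

# match only tries at position 0, where the string has a 't'.
at_start = digit_regex.match(string)
# search scans forward until it finds the first match.
anywhere = digit_regex.search(string)
# findall collects every match in the string.
all_digits = digit_regex.findall(string)

print(at_start)          # None
print(anywhere.group())  # 4
print(all_digits)        # ['4']
```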

Check what number a string ends with in Python

Such as "example123" would be 123, "ex123ample" would be None, and "123example" would be None.
You can use regular expressions from the re module:
import re
def get_trailing_number(s):
    m = re.search(r'\d+$', s)
    return int(m.group()) if m else None
The r'\d+$' string specifies the expression to be matched and consists of these special symbols:
\d: a digit (0-9)
+: one or more of the previous item (i.e. \d)
$: the end of the input string
In other words, it tries to find one or more digits at the end of a string. The search() function returns a Match object containing various information about the match or None if it couldn't match what was requested. The group() method, for example, returns the whole substring that matched the regular expression (in this case, some digits).
The ternary if at the last line returns either the matched digits converted to a number or None, depending on whether the Match object is None or not.
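Running the function on the three examples from the question gives the expected results. This is only a usage sketch; the loop and the results dictionary are not part of the original answer.

```python
import re


def get_trailing_number(s):
    m = re.search(r'\d+$', s)
    return int(m.group()) if m else None


# The three inputs come from the question.
results = {text: get_trailing_number(text)
           for text in ("example123", "ex123ample", "123example")}

for text, number in results.items():
    print(text, number)
# example123 123
# ex123ample None
# 123example None
```

One caveat: $ also matches just before a trailing newline, so "example123\n" still yields 123. If that matters for your input, \Z instead of $ matches only at the very end of the string.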
 
I'd use a regular expression, something like /(\d+)$/. This will match and capture one or more digits, anchored at the end of the string.
Read about regular expressions in Python.
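Oops, correcting (sorry, I missed the point).
You should do something like this.
Import the re module:
import re
Then write a regular expression and search for it:
s = re.search(r"[a-zA-Z]+(\d{3})$", "string123")
This returns a match object, or None if there is no match. Note that this pattern expects exactly three digits after a letter at the end of the string. When it matches, the captured digits ("123" here) are available as:
s.group(1)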
