How to convert variable to a regex string? - python

I am working in Python. I am looping through a large group of strings, and I want to check whether each one appears in a second list of strings.
for line in dictionary:
    line = line.replace('\r\n','').replace('\n','')
    for each in complex8list:
        txt = re.compile(.*line.*)
        if re.search(each, txt):
I need to be able to check if the string with anything before it, and anything after it is in the second list.
What is the correct syntax to do this?

If line isn't a regex, you don't even need regex for this.
if line in each:
If line is a regex, then you don't need to do anything since a leading .* is implied with re.search and a trailing .* is unnecessary.
if re.search(line, each):
BTW you seem to have the arguments to re.search backwards.
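A minimal sketch of both cases, using made-up sample values in place of the asker's `dictionary` and `complex8list`. If `line` is plain text but still has to go through a regex function, `re.escape` turns it into a pattern that matches it literally, which is probably what "convert a variable to a regex string" means here.

```python
import re

# Hypothetical sample data standing in for the asker's two collections.
dictionary = ["cat\r\n", "a.b\n"]
complex8list = ["concatenate", "xa.by", "aXb"]

matches = []
for line in dictionary:
    line = line.rstrip("\r\n")  # drop the trailing line ending
    for each in complex8list:
        # Plain substring test: no regex needed.
        if line in each:
            matches.append((line, each))

# The same check through re.search (pattern first, then the string to scan).
# re.escape makes '.' match a literal dot instead of any character.
regex_matches = [
    (line.rstrip("\r\n"), each)
    for line in dictionary
    for each in complex8list
    if re.search(re.escape(line.rstrip("\r\n")), each)
]
```

Without `re.escape`, the pattern `a.b` would also match `aXb`, because an unescaped `.` matches any character.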

Related

Python - replace multiline string in a file

I'm writing a script that finds a few lines of text in a file. I wonder how to replace exactly that text with another given string (the new string might be shorter or longer). I'm using re.compile() to create a multi-line pattern, then looking for matches in the file like this:
for match in pattern.finditer(text_in_file):
    # if possible, I would like to change the text in the file here,
    # (probably) by replacing match.group(0)
Is it possible to do it this way (and if so, what is the easiest way)? Or is my approach wrong or hard to get right (and if so, how should I do it)?
The simple solution:
Read the whole text into a variable as a string.
Use a multi-line regexp to match what you want to replace
Use output = pattern.sub('replacement', fileContent)
The complex solution:
Read the file line by line
Print any line which doesn't match the start of the pattern
If you find a match for the start, stop printing until you see the end pattern.
If you saw the end pattern, print the replacement
Use pattern.sub('replacement text', text_in_file) to replace matches.
You can use back references in the replacement pattern as needed. It doesn't matter if the string is shorter or longer; the method returns a new string value with the replacements made. If the text came from a file, you'll need to write back the text to that file to replace the contents.
You could use the fileinput module if you need to make the replacement in place; the module takes care of moving the original file aside and writing a new file in its place.
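The simple solution can be sketched roughly as follows. The helper name `replace_in_file`, the BEGIN/END pattern, and the file contents are all assumptions made for illustration; only `re.compile`, `subn`, and the read/write steps come from the approach described above.

```python
import re
import tempfile
from pathlib import Path

def replace_in_file(path, pattern, replacement):
    """Replace every match of `pattern` in the file; return the match count."""
    text = Path(path).read_text()
    new_text, count = pattern.subn(replacement, text)
    if count:
        # Write the whole text back; the length may differ from the original.
        Path(path).write_text(new_text)
    return count

# Hypothetical multi-line block to replace; DOTALL lets '.' cross newlines.
pattern = re.compile(r"BEGIN\n.*?END\n", re.DOTALL)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    target.write_text("keep\nBEGIN\nold 1\nold 2\nEND\ntail\n")
    replaced = replace_in_file(target, pattern, "new line\n")
    result = target.read_text()
```

Reading the whole file into memory keeps the code short; for very large files the line-by-line approach may be the better fit.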

How to use ^ and $ to parse simple expression?

How do I use the ^ and $ symbols to parse only /blog/articles in the following?
I've created ex3.txt that contains this:
/blog/article/1
/blog/articles
/blog
/admin/blog/articles
and the regex:
^/blog/articles$
doesn't appear to work, as in when I type it using 'regetron' (see learning regex the hard way) there is no output on the next line.
This is my exact procedure:
At command line in the correct directory, I type: regetron ex3.txt. ex3.txt contains one line with the following:
/blog/article/1 /blog/articles /blog /admin/blog/articles
although I have tried it with newlines between entries.
I type in ^/blog/article/[0-9]$ and nothing is returned on the next line.
I try the first solution posted, ^\/blog\/articles$, and nothing is returned.
Thanks in advance SOers!
Change your regex to:
^\/blog\/articles$
You need to escape your slashes.
Also, ensure there are no trailing spaces on the end of each line in your ex3.txt file.
Based on your update, it sounds like ^ and $ might not be the right operators for you. Those match the beginning and end of a line respectively. If you have multiple strings that you want to match on the same line, then you'll need something more like this:
(?:^|\s)(\/blog\/articles)(?:$|\s)
What this does:
(?:^|\s) Matches, but does not capture (?:), a line start (^) OR (|) a whitespace (\s)
(\/blog\/articles) Matches and captures /blog/articles.
(?:$|\s) Matches, but does not capture (?:), a line end ($) OR (|) a whitespace (\s)
This will work for both cases, but be aware that it will match (but will not capture) up to a single whitespace before and after /blog/articles.
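For comparison, here is a small sketch of both layouts using Python's re module directly rather than regetron, whose exact behaviour is not assumed here. In Python's re, forward slashes do not need escaping (escaping them is harmless), and ^ and $ anchor at each line only when re.MULTILINE is set.

```python
import re

lines_text = "/blog/article/1\n/blog/articles\n/blog\n/admin/blog/articles\n"
one_line = "/blog/article/1 /blog/articles /blog /admin/blog/articles"

# One entry per line: with re.MULTILINE, ^ and $ anchor at every line.
per_line = re.findall(r"^/blog/articles$", lines_text, re.MULTILINE)

# Without the flag, ^ only matches at the very start of the whole string.
no_flag = re.search(r"^/blog/articles$", lines_text)

# All entries on one line: delimit by line start/end or whitespace.
delimited = re.findall(r"(?:^|\s)(/blog/articles)(?:$|\s)", one_line)
```

In both layouts `/admin/blog/articles` is rejected, because the text before `/blog/articles` is neither a line start nor whitespace.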

match pattern between symbols, after given pattern

I would like to find the first occurrence of a string, after a certain other string, and between a certain pattern. The documents that I am parsing are not XML, but have similar start/end rules. An example of what I would be looking at:
...b.filext", "{xxxxx-xxx-xxx-xxx-xxxxxxxxxxx}"
and I am trying to get the string between { and }, right after an occurrence of a vcxproj, every time it happens in the document.
I tried the following, but I get a list of None:
my_list=[]
for line in text.split('.vcxproj'):
    if '{' in line:
        my_list.append(re.match( r"(?<=\{)(.*?)(?=\})", line))
I have tried to alter my expression but no success. Help ? Thank you.
re.match matches only at the beginning of the string. Try re.search. Alternatively, replace the loop with re.findall.
How about:
re.findall(r"vcxproj.*?\{(.*?)\}", text)
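A quick sketch of both suggestions on made-up input shaped like the example in the question (the project lines and identifiers below are invented). It shows why re.match returned None and what the findall version returns.

```python
import re

# Made-up sample in the shape described in the question.
text = (
    'Project("{AAAA}") = "a", "src\\a.vcxproj", "{1111-aaa}"\n'
    'Project("{AAAA}") = "b", "src\\b.vcxproj", "{2222-bbb}"\n'
)

# One pass over the whole text: the first {...} after each 'vcxproj'.
guids = re.findall(r"vcxproj.*?\{(.*?)\}", text)

# What the original loop saw after splitting on '.vcxproj'.
fragment = '", "{1111-aaa}"'
anchored = re.match(r"(?<=\{)(.*?)(?=\})", fragment)  # only tries position 0
found = re.search(r"(?<=\{)(.*?)(?=\})", fragment)    # scans the whole string
```

Because `.` does not match a newline by default, each match stays within a single line, so the `{AAAA}` that appears before `vcxproj` is never picked up.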

dealing with \n characters at end of multiline string in python

I have been using python with regex to clean up a text file. I have been using the following method and it has generally been working:
mystring = compiledRegex.sub("replacement",mystring)
The string in question is an entire text file that includes many embedded newlines. Some of the compiled regexes cover multiple lines using the re.DOTALL option. If the last character in the compiled regex is a \n, the above command substitutes all matches of the regex except the one that ends with the final newline at the end of the string. In fact, I have had several other, no doubt related, problems dealing with newlines and multiple newlines when they appear at the very end of the string. Can anyone give me a pointer as to what is going on here? Thanks in advance.
If I understood you correctly, all you need is the text without a newline at the end of each line, so that you can iterate over it to find a required word. In that case you can try the following:
data = (line for line in text.split('\n') if line.strip())  # gives you all non-empty lines without '\n' at the end
Now you can search or replace any text you need by iterating over data and using string or regex functionality.
Or you can use replace to replace every '\n' with whatever you want:
text.replace('\n', '')
My bet is that your file does not end with a newline...
>>> content = open('foo').read()
>>> print(content)
TOTAL:.?C2
abcTOTAL:AC2
defTOTAL:C2
>>> content
'TOTAL:.?C2\nabcTOTAL:AC2\ndefTOTAL:C2'
...so the last line does not match the regex:
>>> regex = re.compile('TOTAL:.*?C2\n', re.DOTALL)
>>> regex.sub("XXX", content)
'XXXabcXXXdefTOTAL:C2'
If that is the case, the solution is simple: just match either a newline or the end of the file (with $):
>>> regex = re.compile('TOTAL:.*?C2(\n|$)', re.DOTALL)
>>> regex.sub("XXX", content)
'XXXabcXXXdefXXX'
I can't get a good handle on what is going on from your explanation, but you may be able to fix it by replacing all multiple newlines with a single newline as you read in the file. Another option might be to drop the \n at the end of the regex, unless you need it for something.
Is the question mark there to prevent the regex from matching more than one line at a time? If so, you probably want the MULTILINE flag instead of the DOTALL flag. The ^ sign will then match at the beginning of the string or just after a newline, and the $ sign will match at the end of the string or just before a newline.
E.g.:
regex = re.compile('^TOTAL:.*$', re.MULTILINE)
content = regex.sub('', content)
However, this still leaves you with the problem of empty lines. But why not just run one additional regex at the end that removes blank lines?
regex = re.compile('\n{2,}')
content = regex.sub('\n', content)
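Put together, the MULTILINE approach might look like the sketch below. The sample content is made up, and the last pattern (which also consumes the optional trailing newline) is my own variation rather than something from the answer.

```python
import re

# Made-up content; note that the last line has no trailing newline.
content = "TOTAL:.?C2\nabc\nTOTAL:AC2\ndef\nTOTAL:C2"

# Blank out every TOTAL line; with MULTILINE, ^ and $ anchor per line.
blanked = re.compile(r"^TOTAL:.*$", re.MULTILINE).sub("", content)

# Collapse the runs of newlines that the first step leaves behind.
collapsed = re.compile(r"\n{2,}").sub("\n", blanked)

# Variation (an assumption, not from the answer): consume the optional
# newline as well, so that no blank lines are left in the first place.
removed = re.compile(r"^TOTAL:.*\n?", re.MULTILINE).sub("", content)
```

The two-step version still leaves a leading newline where the first TOTAL line used to be, which the single-pattern variation avoids.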

Python: pattern matching for a string

I'm trying to check a file line by line for any_string=any_string. It must be in that format, with no spaces or anything else. The line must contain a string, then a "=", and then another string, and nothing else. Could someone help me with the Python syntax for this, please? =]
pattern='*\S\=\S*'
I have this, but I'm pretty sure it's wrong haha.
Don't know if you are looking for lines with the same value on both = sides. If so then use:
the_same_re = re.compile(r'^(\S+)=(\1)$')
if values can differ then use
the_same_re = re.compile(r'^(\S+)=(\S+)$')
In these regexps:
^ is the beginning of line
$ is the end of line
\S+ is one or more non space character
\1 is first group
r before the regex string means a "raw" string, so you need not escape backslashes in it.
pattern = r'\S+=\S+'
If you want to be able to grab the left and right-hand sides, you could add capture groups:
pattern = r'(\S+)=(\S+)'
If you don't want to allow multiple equals signs in the line (which would do weird things), you could use this:
pattern = r'[^\s=]+=[^\s=]+'
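One way to apply the last pattern to whole lines is re.fullmatch (available since Python 3.4), which only succeeds when the entire string fits the pattern. The names `KEY_VALUE` and `is_key_value` below are hypothetical, chosen for this sketch.

```python
import re

# One run of non-space, non-'=' characters on each side of a single '='.
KEY_VALUE = re.compile(r"[^\s=]+=[^\s=]+")

def is_key_value(line):
    """True if the line is exactly 'something=something', nothing else."""
    # Strip the line ending first; fullmatch must cover the whole string.
    return KEY_VALUE.fullmatch(line.rstrip("\r\n")) is not None
```

For example, `is_key_value("name=value\n")` accepts the line, while lines with spaces, an empty side, or a second `=` are rejected.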
I don't know what task you want to use this pattern for. Maybe you want to parse a configuration file.
If so, you can use the ConfigParser module (named configparser in Python 3).
OK, so you want to find anystring=anystring and nothing else. Then there is no need for a regex.
>>> s="anystring=anystring"
>>> sp=s.split("=")
>>> if len(sp)==2:
...     print("ok")
...
ok
Since Python 2.5 I prefer this to split. If you don't like spaces, just check.
left, _, right = any_string.partition("=")
if right and " " not in any_string:
    # proceed
Also it never hurts to learn regular expressions.
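A slightly stricter variant of the partition check, wrapped in a hypothetical helper named `split_assignment`. The extra conditions (non-empty left side, a single `=`, no whitespace of any kind) are my assumptions about what "nothing else" should mean, not part of the answer above.

```python
def split_assignment(line):
    """Return (left, right) for a well-formed 'left=right' line, else None."""
    left, sep, right = line.partition("=")
    # Assumption: both sides must be non-empty and '=' must be present.
    if not (sep and left and right):
        return None
    # Assumption: reject a second '=' and any whitespace character.
    if "=" in right or any(ch.isspace() for ch in line):
        return None
    return left, right
```

Returning the two halves instead of a boolean lets the caller use them right away without splitting the line a second time.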
