Python regex search for string at beginning of line in file - python

Here's my code:
#!/usr/bin/python
import io
import re
f = open('/etc/ssh/sshd_config','r')
strings = re.search(r'.*IgnoreR.*', f.read())
print(strings)
That returns data, but I need specific regex matching: e.g.:
^\s*[^#]*IgnoreRhosts\s+yes
If I change my code to simply:
strings = re.search(r'^IgnoreR.*', f.read())
or even
strings = re.search(r'^.*IgnoreR.*', f.read())
I don't get anything back. I need to be able to use real regexes, as in Perl.

You can use multiline mode, in which ^ matches at the beginning of every line:
#!/usr/bin/python
import io
import re
f = open('/etc/ssh/sshd_config','r')
strings = re.search(r"^\s*[^#]*IgnoreRhosts\s+yes", f.read(), flags=re.MULTILINE)
if strings:  # re.search returns None when nothing matches
    print(strings.group(0))
Note that without this mode you can replace ^ with \n, although that version cannot match on the very first line of the file.
Note too that this file has a very regular layout, thus:
^IgnoreRhosts\s+yes
is good enough for checking the parameter
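To make the effect of the flag concrete, here is a minimal sketch that runs the simplified pattern against an in-memory string instead of the real file. The SAMPLE text is a made-up excerpt, not the contents of any actual sshd_config.

```python
import re

# Hypothetical excerpt standing in for /etc/ssh/sshd_config.
SAMPLE = "Port 22\n#IgnoreRhosts no\nIgnoreRhosts yes\n"

PATTERN = r"^IgnoreRhosts\s+yes"

# Without re.MULTILINE, ^ only matches at the very start of the string.
no_flag = re.search(PATTERN, SAMPLE)

# With re.MULTILINE, ^ also matches right after every newline.
with_flag = re.search(PATTERN, SAMPLE, flags=re.MULTILINE)

print(no_flag)             # None
print(with_flag.group(0))  # IgnoreRhosts yes
```

This is why the original attempts returned nothing: f.read() yields one long string, and the setting is not on its first line.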
EDIT: a better way
with open('/etc/ssh/sshd_config') as f:
    for line in f:
        if line.startswith('IgnoreRhosts yes'):
            print(line)
Once again, there is no reason for the line to have leading spaces. However, if you want to be sure, you can always use lstrip().
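A small sketch of the lstrip() variant, wrapped in a function so it can be tried on a list of lines rather than the real file. The name find_setting and the sample lines are illustrative assumptions, not part of the original answer.

```python
def find_setting(lines, prefix="IgnoreRhosts yes"):
    """Return the first line that starts with prefix after leading whitespace, or None."""
    for line in lines:
        # lstrip() drops leading spaces and tabs; a commented-out line still
        # starts with '#', so it does not pass the startswith() test.
        if line.lstrip().startswith(prefix):
            return line.rstrip("\n")
    return None


# Hypothetical file content; a file object opened with open() works the same way.
sample = ["Port 22\n", "#IgnoreRhosts yes\n", "   IgnoreRhosts yes\n"]
print(find_setting(sample))
```

Like the startswith() version above, this expects exactly one space between the keyword and the value.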

Related

Python Exclude Comments with re.search

I am searching for a string in a line using:
import re
myfile = "myfile.txt"
files = open(myfile, 'r').read().splitlines()
for line in files:
    if re.search("`this", line):
        print("bingo")
This works fine. However, I want to exclude matches that are inside comments. Comments in the file I am reading take the form //, and I'm not sure how to exclude them. A comment might start anywhere in the line, not necessarily at the beginning.
Example:
I want to exclude lines like first_last = "name" //`this THAT since "`this" is in a comment
This can be done with a variable-length negative lookbehind assertion, but for that you need the regex package, installable with pip from the PyPI repository. The regex is:
(?<!//.*) # negative lookbehind assertion stating that the following must not be preceded by // followed by 0 or more arbitrary characters
`this # matches `this
The code:
import regex as re
regex = re.compile(r'(?<!//.*)`this')
myfile = "myfile.txt"
with open(myfile, 'r') as f:
    for line in f: # line has a newline character at the end; call rstrip on line to get rid of it if you want
        if regex.search(line):
            print(line, end='')
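If installing the third-party regex package is not an option, a rougher alternative with only the standard library is to cut each line at the first // and search what is left. This is a sketch of a different technique from the lookbehind above; the function name matches_outside_comment is made up for illustration, and it assumes // never appears inside a string literal (a URL in quotes would be treated as a comment start).

```python
import re


def matches_outside_comment(line, pattern=r"`this"):
    """Return True if pattern occurs in line before any // comment marker."""
    # partition() returns the whole line as the first element when "//" is absent.
    code_part, _, _ = line.partition("//")
    return re.search(pattern, code_part) is not None


print(matches_outside_comment('first_last = "name" //`this THAT'))  # False
print(matches_outside_comment('x = `this // trailing note'))        # True
```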

Script to find and replace text in a .csv doesn't work with "=$"

My simple find and replace python script should find the "find_str" text and replace it with empty. It seems to work for any text I enter except the string "=$" for some reason. Can anyone help with why this might be.
import re
# open your csv and read as a text string
with open('new.csv', 'r') as f:
    my_csv_text = f.read()
find_str = '=$'
replace_str = ' '
# substitute
new_csv_str = re.sub(find_str, replace_str, my_csv_text)
# open new file and save
new_csv_path = './my_new_csv.csv'
with open(new_csv_path, 'w') as f:
    f.write(new_csv_str)
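$ is a special character in regular expressions: it anchors the match at the end of the string.
You have different choices:
Escape the $ (in a raw string, so the backslash reaches the regex engine unchanged):
find_str = r'=\$'
Use plain string methods, since your pattern has no variation (no re module needed, really). Keep the original, unescaped find_str = '=$' for this:
new_csv_str = my_csv_text.replace(find_str, replace_str)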
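A third option, when the search text comes from a variable and you still want re.sub, is to let re.escape do the escaping. The sketch below compares the three behaviours on a made-up string rather than a real CSV file.

```python
import re

text = "a,=$b,=$c"  # hypothetical stand-in for the CSV contents
find_str = "=$"

# re.escape backslash-escapes regex metacharacters, so the pattern
# matches the literal two characters "=$".
escaped = re.sub(re.escape(find_str), " ", text)

# str.replace never interprets its argument as a pattern.
plain = text.replace(find_str, " ")

# Unescaped, "=$" means "an '=' at the end of the string", which does not occur here.
unescaped = re.sub(find_str, " ", text)

print(escaped)    # a, b, c
print(plain)      # a, b, c
print(unescaped)  # a,=$b,=$c
```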

Matching a simple string with regex not working?

I have a large txt-file and want to extract all strings with these patterns:
/m/meet_the_crr
/m/commune
/m/hann_2
Here is what I tried:
import re
with open("testfile.txt", "r") as text_file:
    contents = text_file.read().replace("\n", "")
    print(re.match(r'^\/m\/[a-zA-Z0-9_-]+$', contents))
The result I get is a simple "None". What am I doing wrong here?
You need to keep the line endings and use the re.MULTILINE flag, so that you get multiple results back from the larger text:
# write a demo file
with open("t.txt","w") as f:
    f.write("""
/m/meet_the_crr\n
/m/commune\n
/m/hann_2\n\n
# your text looks like this after .read().replace(\"\\n\",\"\")\n
/m/meet_the_crr/m/commune/m/hann_2""")
Program:
import re
regex = r"^\/m\/[a-zA-Z0-9_-]+$"
with open("t.txt","r") as f:
    contents = f.read()
found_all = re.findall(regex,contents,re.M)
print(found_all)
print("-")
print(open("t.txt").read())
Output:
['/m/meet_the_crr', '/m/commune', '/m/hann_2']
Filecontent:
/m/meet_the_crr
/m/commune
/m/hann_2
# your text looks like this after .read().replace("\n","")
/m/meet_the_crr/m/commune/m/hann_2
This is essentially what Wiktor Stribiżew told you in his comment, although he also suggested a better pattern: r'^/m/[\w-]+$'
There is nothing logically wrong with your code, and in fact your pattern will match the inputs you describe:
result = re.match(r'^\/m\/[a-zA-Z0-9_-]+$', '/m/meet_the_crr')
if result:
    print(result.groups()) # this line is reached, as there is a match
Since you did not specify any capture groups, you will see () being printed to the console. You could capture the entire input, and then it would be available, e.g.
result = re.match(r'(^\/m\/[a-zA-Z0-9_-]+$)', '/m/meet_the_crr')
if result:
    print(result.group(1)) # group(1) is the first capture group
/m/meet_the_crr
You are reading the whole file into memory using .read(). With .replace("\n", ""), you remove all newlines from the string. re.match(r'^\/m\/[a-zA-Z0-9_-]+$', contents) then tries to match the entire string against the \/m\/[a-zA-Z0-9_-]+ pattern, which is impossible after that manipulation: the paths are now run together, and the / that starts the second path is not in the character class.
There are at least two ways out. Either remove .replace("\n", "") (to prevent newline removal) and use re.findall(r'^/m/[\w-]+$', contents, re.M) (re.M option will enable matching whole lines rather than the whole text), or read the file line by line and use your re.match version to check each line for a match, and if it matches add to the final list.
Example:
import re
with open("testfile.txt", "r") as text_file:
    contents = text_file.read()
    print(re.findall(r'^/m/[\w-]+$', contents, re.M))
Or
import re
with open("testfile.txt", "r") as text_file:
    for line in text_file:
        if re.match(r'/m/[\w-]+\s*$', line):
            print(line.rstrip())
Note I used \w to make the pattern somewhat shorter, but if you are working in Python 3 and only want to match ASCII letters and digits, also pass the re.ASCII option.
Also, / is not a special character in Python regex patterns, so there is no need to escape it.
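The points above can be checked on an in-memory string. This is only a sketch: the text below is invented, and the fourth path (with an accented letter, written as \u00e9) is added purely to show what re.ASCII changes.

```python
import re

# Hypothetical file contents; "\u00e9" is an accented e.
text = "/m/meet_the_crr\n/m/commune\n/m/hann_2\n/m/caf\u00e9\n"

# After stripping newlines the paths run together, and the "/" that starts
# the second path is not in [\w-], so the anchored pattern cannot match.
joined = text.replace("\n", "")
joined_match = re.match(r'^/m/[\w-]+$', joined)

# With re.M, ^ and $ anchor at every line, so each path matches separately.
unicode_hits = re.findall(r'^/m/[\w-]+$', text, re.M)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the accented path is skipped.
ascii_hits = re.findall(r'^/m/[\w-]+$', text, re.M | re.ASCII)

print(joined_match)  # None
print(unicode_hits)
print(ascii_hits)    # ['/m/meet_the_crr', '/m/commune', '/m/hann_2']
```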

Deleting generalised strings from a text file

I am trying to delete strings from a text file. The strings consist of different types of characters and are all different, but they all start with the same three letters and finish at the end of the line.
So far I am using this code, which I know works when I want to delete all occurrences of a specific string:
import sys
import fileinput
for i, line in enumerate(fileinput.input('duck_test.txt', inplace=1)):
    sys.stdout.write(line.replace('pep.*', ''))
I have tried to adapt it to delete a generalised string using '.*' but it doesn't work. Does anyone know where I am going wrong? Thanks.
Try the re module for that purpose:
import re
import sys
import fileinput
for line in fileinput.input('duck_test.txt', inplace=True):
    # note: this drops every line that contains 'pep' entirely
    if not re.search(r'pep.*', line):
        sys.stdout.write(line)
The following tested code will replace all strings that begin with the letters 'pep' and end with a newline in the file 'duck_test' with an empty string:
import sys
import fileinput
import re
for i, line in enumerate(fileinput.input('duck_test', inplace=1)):
    sys.stdout.write(re.sub(r'pep.*', '', line))
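The difference between the original attempt and the re.sub version can be seen without touching any file. In this sketch the lines list is invented sample data.

```python
import re

# Hypothetical lines, as they would be read from the text file.
lines = ["keep this pep123 xyz\n", "no match here\n", "pepper\n"]

# str.replace looks for the literal characters 'pep.*', which never occur,
# so every line comes back unchanged.
literal = [line.replace("pep.*", "") for line in lines]

# re.sub treats 'pep.*' as a pattern; '.' does not match a newline by default,
# so the text from 'pep' to the end of the line goes and the newline stays.
cleaned = [re.sub(r"pep.*", "", line) for line in lines]

print(literal == lines)  # True
print(cleaned)           # ['keep this \n', 'no match here\n', '\n']
```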

Replace Mathematical Symbols in File

I have a file which contains some mathematical expressions in latex form. For example, I have the following which appears in my file:
{\frac{d^{2}}{d^{2}{r}}}\zeta
I would like to write a python code which will scan the file and output a new file where all the instances of the above expression are replaced with
\zeta''
I have tried the following code:
import sys
import fileinput
for line in fileinput.input():
    l = line.replace(r"{\frac{d^{2}}{d^{2}{r}}}\zeta","\zeta'")
    sys.stdout = open('output.txt','a')
    sys.stdout.write(l)
I know that the r which appears just before the first string to be replaced tells the code to ignore any escape characters. But it appears to have difficulty dealing with the d^{2} part. This "^" symbol is not correctly interpreted by the code, so it doesn't make the replacement.
I know that {\frac{d^{2}}{d^{2}{r}}}\zeta is not technically a string, but I'm not sure how else to treat it. Any help would be great. Thanks.
An equivalent of your code (regex.py):
#!/usr/bin/python
import sys
import fileinput
x = open("output.txt", "a")
for line in fileinput.input():
    # raw strings keep the backslashes in \frac and \zeta as they are
    l = line.replace(r"{\frac{d^{2}}{d^{2}{r}}}\zeta", r"\zeta''")
    x.write(l)
x.close()
Seems to run just fine: $ echo '{\frac{d^{2}}{d^{2}{r}}}\zeta' | ./regex.py appends this to output.txt:
\zeta''
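The reason this works is that str.replace compares characters literally, so ^, { and } have no special meaning, unlike in a regex. A minimal sketch on an in-memory line (the sentence around the expression is made up):

```python
# Raw strings so the backslashes are kept exactly as typed.
old = r"{\frac{d^{2}}{d^{2}{r}}}\zeta"
new = r"\zeta''"

# Hypothetical line from the LaTeX file.
line = r"Let {\frac{d^{2}}{d^{2}{r}}}\zeta = 0" + "\n"

# str.replace does a plain substring search; no character is treated as a metacharacter.
replaced = line.replace(old, new)
print(replaced, end="")  # Let \zeta'' = 0
```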
