Deleting generalised strings from a text file - python

I am trying to delete strings from a text file. The strings consist of different types of characters and are all different, but they all start with the same three letters and finish at the end of the line.
So far I am using this code, which I know works when I want to delete all occurrences of a specific string:
import sys
import fileinput
for i, line in enumerate(fileinput.input('duck_test.txt', inplace=1)):
    sys.stdout.write(line.replace('pep.*', ''))
I have tried to adapt it to delete a generalised string using '.*' but it doesn't work. Does anyone know where I am going wrong? Thanks.

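Try using the re module for that purpose. Note that this version drops every line that contains 'pep' entirely:
import re
import sys
import fileinput
for line in fileinput.input('duck_test.txt', inplace=True):
    # only lines without a match are written back to the file
    if not re.search(r'pep.*', line):
        sys.stdout.write(line)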

The following tested code replaces, in the file 'duck_test', every string that begins with the letters 'pep' and runs to the end of the line with an empty string (the newline itself is kept, because '.' does not match it):
import sys
import fileinput
import re
for i, line in enumerate(fileinput.input('duck_test', inplace=1)):
    sys.stdout.write(re.sub(r'pep.*', '', line))
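To see why the original attempt fails while the regular-expression version works, here is a minimal sketch that operates on an in-memory string instead of a file. The helper name strip_from_marker and the sample line are illustrative assumptions, not part of the question.

```python
import re


def strip_from_marker(line, marker="pep"):
    """Delete everything from the first `marker` up to (not including) the newline."""
    # re.escape keeps the marker literal; '.' never matches '\n' by default,
    # so the line ending survives the substitution.
    return re.sub(re.escape(marker) + r".*", "", line)


sample = "keep this pep-123!@#\n"  # made-up test line

# str.replace looks for the literal five characters 'pep.*' and finds nothing.
unchanged = sample.replace("pep.*", "")

# re.sub treats '.*' as "any characters up to the end of the line".
stripped = strip_from_marker(sample)
```

The same function could be passed each line inside the fileinput loop shown in the answers.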

Related

Grep a string in python

Friends,
I have a situation where I need to grep a word from a string:
[MBeanServerInvocationHandler]com.bea:Name=itms2md01,Location=hello,Type=ServerRuntime
What I want to grep is the value assigned to Name in the above string, which is itms2md01.
In my case I have to grep whichever string is assigned to Name=, so there is no particular string I can search for.
Tried:
import re
import sys
file = open(sys.argv[2], "r")
for line in file:
    if re.search(sys.argv[1], line):
        print line,
Deak is right. I don't have enough reputation to comment, so I am showing it below. I am not going to the file level; just take this as an example:
import re
str1 = "[MBeanServerInvocationHandler]com.bea:Name=itms2md01,Location=hello,Type=ServerRuntime"
pat = r'(?<=Name=)\w+(?=,)'
print re.search(pat, str1).group()
You can then apply the same pattern to each line of the file content.
I'm not certain that I fully understand the question, but if you are saying that the user can pass a key whose value should be looked up, and also a file in which to search, you can do that with a named group.
I like to use named groups, because I'm often searching for more than one thing. But even for one item in the search, it still works nicely, and I can remember very easily what I was searching for.
So, for this case, I might do:
import re
import sys
file = open(sys.argv[2], "r")
for line in file:
    match = re.search(r"%s=(?P<item>[^,]+)" % sys.argv[1], line)
    if match is not None:
        print match.group('item')
I am assuming that is the purpose, as you have included sys.argv[1] in the search, though you didn't mention why you did so in your question.
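For readers on Python 3, the named-group idea above might be sketched as follows. The function name find_value is a hypothetical helper, and escaping the key with re.escape is an added assumption (it treats the key as plain text rather than as a pattern).

```python
import re


def find_value(key, line):
    """Return the text assigned to `key=` in `line`, or None if the key is absent."""
    # re.escape guards against regex metacharacters in the key;
    # [^,\n]+ stops at the next comma or at the end of the line.
    match = re.search(r"%s=(?P<item>[^,\n]+)" % re.escape(key), line)
    return match.group("item") if match else None


line = "[MBeanServerInvocationHandler]com.bea:Name=itms2md01,Location=hello,Type=ServerRuntime"
name = find_value("Name", line)
server_type = find_value("Type", line)
```

Because the last field has no trailing comma, the character class approach also finds Type, which the lookahead pattern (?=,) would miss.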

How can I replace '&' with '&amp;' in Python?

I'm having issues with .replace(). My XML parser does not like '&', but will accept '&amp;'. I'd like to use .replace('&','&amp;') but this does not seem to be working. I keep getting the error:
lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 51, column 41
So far I have tried just a straightforward file=file.replace('&','&amp;'), but this doesn't work. I've also tried:
xml_file = infile
file=xml_file.readlines()
for line in file:
    for char in line:
        char.replace('&','&amp;')
infile=open('a','w')
file='\n'.join(file)
infile.write(file)
infile.close()
infile=open('a','r')
xml_file=infile
What would be the best way to fix my issue?
str.replace creates and returns a new string. It can't alter strings in-place - they're immutable. Try replacing:
file=xml_file.readlines()
with
file = [line.replace('&','&amp;') for line in xml_file]
This uses a list comprehension to build a list equivalent to .readlines() but with the replacement already made.
As pointed out in the comments, if there were already &amp;s in the string, they'd be turned into &amp;amp;, likely not what you want. To avoid that, you could use a negative lookahead in a regular expression to replace only the ampersands not already followed by amp;:
import re
file = [re.sub("&(?!amp;)", "&amp;", line) ...]
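The negative-lookahead approach can be tried on a couple of in-memory lines, as in this minimal sketch. The names BARE_AMPERSAND and escape_bare_ampersands, and the sample lines, are illustrative assumptions.

```python
import re

# Matches '&' only when it is NOT already the start of '&amp;'.
BARE_AMPERSAND = re.compile(r"&(?!amp;)")


def escape_bare_ampersands(text):
    """Replace each bare '&' with '&amp;', leaving existing '&amp;' untouched."""
    return BARE_AMPERSAND.sub("&amp;", text)


lines = ["Tom & Jerry\n", "already &amp; escaped\n"]
fixed = [escape_bare_ampersands(line) for line in lines]
```

One limitation of this pattern: it only recognises &amp;, so other entities such as &lt; would still have their ampersand escaped a second time.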
str.replace() returns a new string object with the change made. It does not change data in-place. You are ignoring the return value.
You want to apply it to each line instead:
file = [line.replace('&', '&amp;') for line in file]
You could use the fileinput module to do the transformation and have it handle replacing the original file (a temporary backup is made while the file is rewritten; pass backup='.bak' to keep it):
import fileinput
import sys
for line in fileinput.input('filename', inplace=True):
    sys.stdout.write(line.replace('&', '&amp;'))
You need to decode the HTML notation for special symbols. Python has a module for that, HTMLParser (Python 2; see its documentation).
Here is an example:
import HTMLParser
htmlparser = HTMLParser.HTMLParser()  # Python 2; in Python 3 use html.unescape() instead
out_file = ....
file = xml_file.readlines()
parsed_lines = []
for line in file:
    parsed_lines.append(htmlparser.unescape(line))
Slightly off topic, but it might be good to use some escaping?
I often use urllib's quote and unquote, which apply and reverse URL percent-encoding (Python 2 shown; in Python 3 both live in urllib.parse):
>>> import urllib
>>> result = urllib.quote("filename&fileextension")
>>> result
'filename%26fileextension'
>>> urllib.unquote(result)
'filename&fileextension'
Might help for consistency?
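The last two answers use Python 2 modules. A rough Python 3 equivalent, using only the standard library, might look like the sketch below; the sample strings are made up for illustration.

```python
import html
from urllib.parse import quote, unquote

# html.unescape turns character references back into plain characters.
decoded = html.unescape("Tom &amp; Jerry")

# html.escape goes the other way, producing the '&amp;' form an XML parser accepts.
encoded = html.escape("Tom & Jerry")

# quote/unquote apply URL percent-encoding, a different scheme from XML entities.
quoted = quote("filename&fileextension")
restored = unquote(quoted)
```

Note that percent-encoding and entity escaping are not interchangeable: an XML parser expects &amp;, not %26.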

Replace Mathematical Symbols in File

I have a file which contains some mathematical expressions in LaTeX form. For example, I have the following which appears in my file:
{\frac{d^{2}}{d^{2}{r}}}\zeta
I would like to write Python code which will scan the file and output a new file where all the instances of the above expression are replaced with
\zeta''
I have tried the following code:
import sys
import fileinput
for line in fileinput.input():
    l = line.replace(r"{\frac{d^{2}}{d^{2}{r}}}\zeta","\zeta'")
    sys.stdout = open('output.txt','a')
    sys.stdout.write(l)
I know that the r which appears just before the first string to be replaced tells the code to ignore any escape characters. But it appears to have difficulty dealing with the d^{2} part. This "^" symbol is not correctly interpreted by the code, so it doesn't make the replacement.
I know that {\frac{d^{2}}{d^{2}{r}}}\zeta is not technically a string, but I'm not sure how else to treat it. Any help would be great. Thanks.
An equivalent of your code (regex.py):
#!/usr/bin/python
import sys
import fileinput
x = open("output.txt", "a")
for line in fileinput.input():
    # raw strings keep the backslashes in both arguments literal
    l = line.replace(r"{\frac{d^{2}}{d^{2}{r}}}\zeta", r"\zeta''")
    x.write(l)
x.close()
Seems to run just fine. Running
$ echo '{\frac{d^{2}}{d^{2}{r}}}\zeta' | ./regex.py
appends this to output.txt:
\zeta''
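The reason the "^" causes no trouble is that str.replace compares characters literally; only the re module gives "^", "{" and "}" a special meaning. A minimal sketch contrasting the two follows. The constant and function names and the sample line are illustrative assumptions.

```python
import re

OLD = r"{\frac{d^{2}}{d^{2}{r}}}\zeta"
NEW = r"\zeta''"


def replace_literal(text):
    """str.replace compares characters literally, so '^', '{' and '}' need no escaping."""
    return text.replace(OLD, NEW)


def replace_with_regex(text):
    """Same substitution through re, where the pattern must be escaped first."""
    # A function replacement is used so that the backslash in NEW is not
    # interpreted as a regex escape sequence.
    return re.sub(re.escape(OLD), lambda match: NEW, text)


sample = r"x = {\frac{d^{2}}{d^{2}{r}}}\zeta + 1"
```

For a fixed expression like this one, the plain str.replace version is the simpler choice.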

I can't make a search-and-replace script for Python

I've been trying to do this, but I'm pretty new at Python, and can't figure out how to make it work.
I have this:
import fileinput
for line in fileinput.input(['tooltips.txt'], inplace=True, backup="bak.txt"):
    line.replace("oldString1", "newString1")
    line.replace("oldString2", "newString2")
But it just deletes everything from the txt.
What am I doing wrong?
I have tried with print(line.replace("oldString1", "newString1"))
but it doesn't remove the existing words.
As I said, I'm pretty new at this.
Thanks!
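line.replace() doesn't modify line; it returns the modified string. With inplace=True, standard output is redirected into the file, so the file ends up empty unless you write each line back:
import fileinput, sys
for line in fileinput.input(['tooltips.txt'], inplace=True, backup="bak.txt"):
    sys.stdout.write(line.replace("oldString1", "newString1"))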
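One simple way to do this is with the open function and the os module:
import os
my_file = 'tooltips.txt'
tmp_file = my_file + '.tmp'  # hypothetical name for the temporary file
with open(tmp_file, 'w') as tmp:  # must be opened for writing
    with open(my_file) as f:
        for line in f:
            # each line already ends with its newline, so none is added
            tmp.write(line.replace("oldString1", "newString1").replace("oldString2", "newString2"))
os.remove(my_file)
os.rename(tmp_file, my_file)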
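Both replacements from the question can be handled in one pass with fileinput. The following is a minimal Python 3 sketch; the function name replace_in_file and its (old, new) pair list are assumptions made for illustration.

```python
import fileinput
import sys


def replace_in_file(path, replacements):
    """Rewrite `path` in place, applying each (old, new) pair to every line."""
    # With inplace=True, fileinput redirects sys.stdout into the file,
    # so whatever is written here becomes the new file content.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            for old, new in replacements:
                line = line.replace(old, new)
            sys.stdout.write(line)
```

It could be called as replace_in_file("tooltips.txt", [("oldString1", "newString1"), ("oldString2", "newString2")]). Reassigning line inside the loop is what the original attempt was missing.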

Python regex search for string at beginning of line in file

Here's my code:
#!/usr/bin/python
import io
import re
f = open('/etc/ssh/sshd_config','r')
strings = re.search(r'.*IgnoreR.*', f.read())
print(strings)
That returns data, but I need specific regex matching: e.g.:
^\s*[^#]*IgnoreRhosts\s+yes
If I change my code to simply:
strings = re.search(r'^IgnoreR.*', f.read())
or even
strings = re.search(r'^.*IgnoreR.*', f.read())
I don't get anything back. I need to be able to use real regexes like in Perl.
You can use multiline mode; then ^ matches at the beginning of each line:
#!/usr/bin/python
import io
import re
f = open('/etc/ssh/sshd_config','r')
strings = re.search(r"^\s*[^#]*IgnoreRhosts\s+yes", f.read(), flags=re.MULTILINE)
print(strings.group(0))
Note that without this mode you can always replace ^ with \n (which, however, cannot match on the first line of the file).
Note too that this file has a very regular layout, so:
^IgnoreRhosts\s+yes
is good enough for checking the parameter.
EDIT: a better way
with open('/etc/ssh/sshd_config') as f:
    for line in f:
        if line.startswith('IgnoreRhosts yes'):
            print(line)
Once again, there is no reason to have leading spaces. However, if you want to be sure, you can always use lstrip().
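The effect of the multiline flag can be checked without touching /etc/ssh/sshd_config. Below is a minimal sketch on a made-up configuration snippet; SAMPLE_CONFIG, PATTERN and find_setting are illustrative names, and the pattern uses [ \t]* instead of \s* as an assumption, so that the match cannot start on an earlier line.

```python
import re

# Made-up stand-in for the contents of a config file.
SAMPLE_CONFIG = """\
# IgnoreRhosts no
Port 22
  IgnoreRhosts yes
"""

# re.MULTILINE lets '^' match at the start of every line, not only at position 0.
PATTERN = re.compile(r"^[ \t]*IgnoreRhosts\s+yes", re.MULTILINE)


def find_setting(text):
    """Return the matching setting without leading whitespace, or None."""
    match = PATTERN.search(text)
    return match.group(0).strip() if match else None


found = find_setting(SAMPLE_CONFIG)
```

The commented-out first line is skipped because only spaces and tabs are allowed between the line start and the keyword.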
