Refactoring use of Python open() function and read() method - python

In the following Python script, I'm utilising the open() function to create a file object and then reading from it using the read() method. It's a simple script that will copy one file to another.
I have 2 questions:
1) Can the lines where we assign in_file and indata be combined? I understand that we create a file object and then read from it, but can this be done in a single line of code? I'm guessing, for example, that one option is to chain the open() function and the read() method together?
2) Can the two lines of code that assign the out_file variable be refactored in a similar fashion? We again create a separate file object using open(), then write to it using the write() method.
Please keep answers as simple as possible and explain what is happening in the code.
from sys import argv
from os.path import exists
script, from_file, to_file = argv
# we could combine these two lines of code into one line, how?
in_file = open(from_file)
indata = in_file.read()
# could these two lines be combined in a similar way?
out_file = open(to_file, 'w')
out_file.write(indata)
in_file.close()
out_file.close()

You can use the with open statement.
One way to do it is shown below:
with open(from_file) as in_file, open(to_file, 'w') as out_file:
    indata = in_file.read()
    out_file.write(indata)
Apart from combining the lines, one benefit of this is that you don't have to explicitly close the files; they are closed automatically when you exit the with block.
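If you want each step to literally fit on one line (questions 1 and 2), pathlib on Python 3.5+ offers read_text() and write_text(), which open, read or write, and close the file in a single call. A minimal sketch; copy_text is a hypothetical helper name, and from_file/to_file would come from argv as in the question:

```python
from pathlib import Path


def copy_text(from_file, to_file):
    # read_text() opens the source, reads it all, and closes it in one call
    indata = Path(from_file).read_text()
    # write_text() opens the destination (truncating it), writes, and closes it
    Path(to_file).write_text(indata)
```

For example, call copy_text(from_file, to_file) after script, from_file, to_file = argv. Both methods use the platform's default text encoding unless you pass encoding=.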

Your script could be more Pythonic using with, and you can also avoid loading the whole file into memory by letting writelines() pull one line at a time (a file object is itself an iterator over its lines):
if __name__ == "__main__":
    from sys import argv
    from os.path import exists
    script, from_file, to_file = argv
    with open(from_file, "r") as inf, open(to_file, "w") as outf:
        outf.writelines(inf)
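A file object hands out one line per iteration step, which is why writelines(inf) never needs the whole file in memory. A quick check, using io.StringIO as a stand-in for a real text file (demo_file_iteration is just an illustrative name):

```python
import io


def demo_file_iteration():
    # io.StringIO stands in for a real text file opened with open(..., "r")
    f = io.StringIO("first\nsecond\n")
    same = iter(f) is f   # a file object is its own iterator
    first = next(f)       # pulls exactly one line, newline included
    rest = list(f)        # iteration resumes where next() stopped
    return same, first, rest


print(demo_file_iteration())  # (True, 'first\n', ['second\n'])
```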

Related

How to print a file after writing to it in Python

I am following Learn Python the Hard Way and have tried to modify exercise 17, where you copy one file (Doc1.txt) to another (Doc2.txt), but it is not working with the code below. If I omit the print out_file.read() line, the file copying works fine; however, when I include that line to print out the contents of the "new" Doc2, I get the error "IOError: File not open for reading". I feel like I am missing something very basic here and am getting a bit frustrated. I know a similar question has been asked before, but that answer didn't help. Many thanks in advance.
from sys import argv
script, from_file, to_file = argv
in_file = open(from_file)
indata = in_file.read()
out_file = open(to_file, 'w')
out_file.write(indata)
print out_file.read()
out_file.close()
in_file.close()
You are opening out_file with the 'w' flag, which is for writing only. You either need to close it and reopen it with 'r', or open it with 'r+' (read and write) from the start.
Change
out_file = open(to_file, 'w')
to
out_file = open(to_file, 'r+')
And then add the following to go back to the start of the file
out_file.seek(0)
The file is open for writing only. Set the "w" parameter to "r+" to read and write.
As well as this, after writing to the file, the out_file position will be at the end of the file. To read the contents, you must first add the line out_file.seek(0) to get to the start of the file.
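One caveat with 'r+': it raises an error if to_file does not exist yet, and it does not truncate an existing file that is longer than the new content. Mode 'w+' creates or truncates the file exactly like 'w' while still allowing reads. A Python 3 sketch of the copy-then-print flow using 'w+' (copy_and_show is a hypothetical name):

```python
def copy_and_show(from_file, to_file):
    with open(from_file) as in_file:
        indata = in_file.read()
    # 'w+' creates/truncates like 'w', but the file can also be read
    with open(to_file, "w+") as out_file:
        out_file.write(indata)
        # After write() the position is at the end of the file, so rewind first
        out_file.seek(0)
        copied = out_file.read()
    print(copied)
    return copied
```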

How to write the output of an os.walk to a file

I have a simple two-line script and I need to write its output to a file. The code is as follows:
import os,sys
print next(os.walk('/var/lib/tomcat7/webapps/'))[1]
How can I do it?
Use open() to open the file and the file object's write() method to write to it; the with statement closes the file for you when the block ends, as in the lines below:
import os,sys
with open('myfile','w') as f:
    # note that I've applied str before writing next(...)[1] to file
    f.write(str(next(os.walk('/var/lib/tomcat7/webapps/'))[1]))
See the Reading and Writing Files tutorial for more information on how to deal with files in Python, and the SO question What is the python "with" statement designed for? for a better understanding of the with statement.
Good luck!
In Python 3 you can use the file parameter to the print() function:
import os
with open('outfile', 'w') as outfile:
    print(next(os.walk('/var/lib/tomcat7/webapps/'))[1], file=outfile)
which saves you the bother of converting to a string, and also adds a new line after the output.
The same works in Python 2 if you add this import at the top of your python file:
from __future__ import print_function
Also in Python 2 you can use the "print chevron" syntax (that is if you do not add the above import):
with open('outfile', 'w') as outfile:
    print >>outfile, next(os.walk('/var/lib/tomcat7/webapps/'))[1]
Using print >> also adds a new line at the end of each print.
In either Python version you can use file.write():
with open('outfile', 'w') as outfile:
    outfile.write('{!r}\n'.format(next(os.walk('/var/lib/tomcat7/webapps/'))[1]))
which requires you to explicitly convert to a string and explicitly add a new line.
I think the first option is best.
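next(os.walk(path)) returns only the first (dirpath, dirnames, filenames) tuple, for the top directory itself, so [1] is the list of its immediate subdirectory names. If you would rather have one name per line than the list's repr, here is a sketch; write_subdirs is a hypothetical helper, and sorting is an added choice because os.walk does not promise any particular order:

```python
import os


def write_subdirs(top, out_path):
    # Only the first tuple from os.walk is needed: (dirpath, dirnames, filenames)
    _, dirnames, _ = next(os.walk(top))
    names = sorted(dirnames)
    with open(out_path, "w") as outfile:
        for name in names:
            # print(..., file=...) writes the name followed by a newline
            print(name, file=outfile)
    return names
```

For the question's case this would be write_subdirs('/var/lib/tomcat7/webapps/', 'outfile').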

Print file passed in as argument

I have a very simple python script that should print the contents of a file that is passed like this: python script.py stuff.txt. I don't get any output.
Here is the code:
import sys
fname = sys.argv[1]
f = open(fname, 'r')
f.read()
From what I have read, this is supposed to work. Why not?
You read the file, but you don't do anything with the data.
print(f.read())
Or, for better style:
import sys
fname = sys.argv[1]
with open(fname, 'r') as f:
    print(f.read())
This is the recommended way to use files. It guarantees the file is closed when you exit the with block. It does not really matter for your small script, but it's a good habit to get into.
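For large files you may not want f.read() to load everything at once. A sketch that prints the file line by line instead (print_file is a hypothetical name; in the script you would call print_file(sys.argv[1])):

```python
def print_file(fname):
    with open(fname) as f:
        # Iterating over the file holds only one line in memory at a time
        for line in f:
            # Each line already ends in "\n", so suppress print's own newline
            print(line, end="")
```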

Remove first char from each line in a text file

I'm new to Python, and to programming in general.
I want to remove the first char from each line in a text file and write the changes back to the file. For example, I have a file with 36 lines, the first char in each line is a symbol or a number, and I want it removed.
I wrote a little code here, but it doesn't work as expected; it only duplicates whole lines. Any help would be appreciated!
from sys import argv
run, filename = argv
f = open(filename, 'a+')
f.seek(0)
lines = f.readlines()
for line in lines:
    f.write(line[1:])
f.close()
Your code already does remove the first character. I saved exactly your code as both dupy.py and dupy.txt, then ran python dupy.py dupy.txt, and the result is:
from sys import argv
run, filename = argv
f = open(filename, 'a+')
f.seek(0)
lines = f.readlines()
for line in lines:
    f.write(line[1:])
f.close()
rom sys import argv
un, filename = argv
= open(filename, 'a+')
.seek(0)
ines = f.readlines()
or line in lines:
   f.write(line[1:])
.close()
It's not copying entire lines; it's copying lines with their first character stripped.
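The stripped copies land after the originals because of the 'a+' mode: on Linux and other POSIX systems, every write to a file opened in append mode goes to the end of the file, whatever position seek() set. A small demonstration on a scratch file (append_mode_demo is a hypothetical name):

```python
def append_mode_demo(path):
    # Create a small file to start from
    with open(path, "w") as f:
        f.write("abc\n")
    with open(path, "a+") as f:
        f.seek(0)     # the position is now the start of the file...
        f.write("X")  # ...but append mode still sends this write to the end
    with open(path) as f:
        return f.read()
```

On such a system, append_mode_demo returns "abc\nX" rather than "Xbc\n".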
But from the initial statement of your problem, it sounds like you want to overwrite the lines, not append new copies. To do that, don't use append mode. Read the file, then write it:
from sys import argv
run, filename = argv
f = open(filename)
lines = f.readlines()
f.close()
f = open(filename, 'w')
for line in lines:
    f.write(line[1:])
f.close()
Or, alternatively, write a new file, then move it on top of the original when you're done:
import os
from sys import argv
run, filename = argv
fin = open(filename)
fout = open(filename + '.tmp', 'w')
lines = fin.readlines()
for line in lines:
    fout.write(line[1:])
fout.close()
fin.close()
os.rename(filename + '.tmp', filename)
(Note that this version will not work as-is on Windows, but it's simpler than the actual cross-platform version; if you need Windows, I can explain how to do this.)
You can make the code a lot simpler, more robust, and more efficient by using with statements, looping directly over the file instead of calling readlines, and using tempfile:
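import os
import tempfile
from sys import argv
run, filename = argv
# 'w' = text mode: NamedTemporaryFile defaults to binary 'w+b', but line[1:] is a str.
# dir= keeps the temp file on the same filesystem as the target, so os.rename
# does not fail with a cross-device error.
with open(filename) as fin, tempfile.NamedTemporaryFile(
        'w', dir=os.path.dirname(os.path.abspath(filename)), delete=False) as fout:
    for line in fin:
        fout.write(line[1:])
os.rename(fout.name, filename)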
On most platforms, this guarantees an "atomic write": when your script finishes, or even if someone pulls the plug in the middle of it running, the file will end up either replaced by the new version or untouched; there's no way it can end up half-way overwritten into unrecoverable garbage.
Again this version won't work on Windows. Without a whole lot of work, there is no way to implement this "write-temp-and-rename" algorithm on Windows. But you can come close with only a bit of extra work:
# 'w' = text mode, because the default 'w+b' would reject str lines
with open(filename) as fin, tempfile.NamedTemporaryFile('w', delete=False) as fout:
    for line in fin:
        fout.write(line[1:])
    outname = fout.name
os.remove(filename)
os.rename(outname, filename)
This does prevent you from half-overwriting the file, but it leaves a window where you may have deleted the original file and left the new file in a temporary location that you'll have to search for. You can make this a little nicer by putting the temporary file somewhere easier to find (see the dir and prefix parameters in the NamedTemporaryFile docs). Or you can rename the original file to a temporary name, write to the original filename, then delete the renamed original. There are various other possibilities, but getting exactly the same behavior as on other platforms is very difficult.
You can either read all lines into memory and then recreate the file:
from sys import argv
run, filename = argv
with open(filename, 'r') as f:
    data = [line[1:] for line in f]
with open(filename, 'w') as f:
    # each line keeps its own '\n'; text mode converts it to the platform's
    # line ending (e.g. '\r\n' on Windows) automatically
    f.writelines(data)
Or you can create another file and copy the data from the first file to the second line by line, then rename it if you'd like:
import os
from sys import argv
run, filename = argv
new_name = filename + '.tmp'
with open(filename, 'r') as f_in, open(new_name, 'w') as f_out:
    for line in f_in:
        f_out.write(line[1:])
os.rename(new_name, filename)
At its most basic, your problem is that you need to seek back to the beginning of the file after you read its complete contents into the list lines. Since you are making the file shorter, you also need to use truncate to adjust the official length of the file after you're done. Furthermore, open mode a+ (a is for append) overrides seek and forces all writes to go to the end of the file. So your code should look something like this:
import sys
def main(argv):
    filename = argv[1]
    with open(filename, 'r+') as f:
        lines = f.readlines()
        f.seek(0)
        for line in lines:
            f.write(line[1:])
        f.truncate()
if __name__ == '__main__': main(sys.argv)
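To see why the truncate() call matters, here is the same r+ approach with truncation made optional (strip_first_char and its truncate parameter are hypothetical, added only for the comparison). Without it, the tail of the old, longer content survives after the rewritten lines:

```python
def strip_first_char(path, truncate=True):
    with open(path, "r+") as f:
        lines = f.readlines()
        f.seek(0)             # rewind so the writes overwrite from the start
        for line in lines:
            f.write(line[1:])
        if truncate:
            f.truncate()      # cut the file off at the current position
    with open(path) as f:
        return f.read()
```

Starting from "#one\n#two\n", truncate=False leaves "one\ntwo\no\n" (the last two characters of the old content), while the default gives "one\ntwo\n".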
It is better, when doing something like this, to write the changes to a new file and then rename it over the old file when you're done. This causes the update to happen "atomically" - a concurrent reader sees either the old file or the new one, not some mangled combination of the two. That looks like this:
import os
import sys
import tempfile
def main(argv):
    filename = argv[1]
    with open(filename, 'r') as inf:
        # 'w' = text mode; the default 'w+b' would reject str lines
        with tempfile.NamedTemporaryFile('w', dir=".", delete=False) as outf:
            tname = outf.name
            for line in inf:
                outf.write(line[1:])
    os.rename(tname, filename)
if __name__ == '__main__': main(sys.argv)
(Note: Atomically replacing a file via rename does not work on Windows; you have to os.remove the old name first. This unfortunately does mean there is a brief window (no pun intended) where a concurrent reader will find that the file does not exist. As far as I know there is no way to avoid this.)
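On Python 3.3 and later, os.replace() overwrites an existing destination on Windows as well as on POSIX, so the separate os.remove() step (and the window where the file is missing) is no longer needed. POSIX guarantees that the rename is atomic; the Python docs do not make that promise for Windows. A sketch under those assumptions (replace_with_stripped is a hypothetical name), creating the temporary file next to the target so the rename stays on one filesystem:

```python
import os
import tempfile


def replace_with_stripped(filename):
    # Same directory as the target, so the final rename never crosses filesystems
    dirname = os.path.dirname(os.path.abspath(filename))
    # 'w' = text mode; NamedTemporaryFile defaults to binary 'w+b'
    with open(filename) as inf, tempfile.NamedTemporaryFile(
            "w", dir=dirname, delete=False) as outf:
        for line in inf:
            outf.write(line[1:])
    # Both files are closed here; os.replace overwrites filename if it exists
    os.replace(outf.name, filename)
```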
import re
with open(filename, 'r+') as f:
    # Delete the first character of every line (explained below)
    modified = re.sub(r'^.', '', f.read(), flags=re.MULTILINE)
    f.seek(0, 0)
    f.write(modified)
    f.truncate()
In the regex pattern:
^ means 'start of string'
^ with flag re.MULTILINE means 'start of line'
^. means 'the single character at the start of a line'
The start of a line is the start of the string or any position right after a newline (a newline is \n).
So we might fear that some newlines in sequences like \n\n\n\n\n\n\n could match the regex pattern.
But the dot matches any character EXCEPT a newline, so the newlines themselves never match this pattern and are never removed.
Reading the file with f.read() moves the file's pointer to the end of the file.
f.seek(0,0) moves the file's pointer back to the beginning of the file.
f.truncate() puts a new EOF (end of file) at the point where the writing stopped. It's necessary because the modified text is shorter than the original one.
Compare the result with a version of the code that omits this line.
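The claim about blank lines can be checked on a plain string before touching any file. For contrast, the sketch also applies the line[1:] slicing used in the other answers, which removes a blank line entirely because "\n"[1:] is an empty string:

```python
import re

text = "#one\n\n2two\n"

# With re.MULTILINE, ^ matches at the start of every line, but "." never
# matches "\n", so the blank line between the two newlines is left alone.
result = re.sub(r"^.", "", text, flags=re.MULTILINE)
print(repr(result))  # 'one\n\ntwo\n'

# Slicing each line instead drops the blank line's only character, its newline
by_slicing = "".join(line[1:] for line in text.splitlines(keepends=True))
print(repr(by_slicing))  # 'one\ntwo\n'
```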
To be honest, I'm not sure how good or bad an idea it is to nest with open() statements, but you can do something like this:
with open(filename_you_reading_lines_FROM, 'r') as f0:
    with open(filename_you_appending_modified_lines_TO, 'a') as f1:
        for line in f0:
            f1.write(line[1:])
While there seemed to be some discussion of best practice and of whether it would run on Windows, being new to Python I was able to take the first example that worked and run it in my Windows environment (which has Cygwin binaries on the PATH environment variable), modified to remove the first 3 characters of each line (which were line numbers in a sample file):
import os
from sys import argv
run, filename = argv
fin = open(filename)
fout = open(filename + '.tmp', 'w')
lines = fin.readlines()
for line in lines:
    fout.write(line[3:])
fout.close()
fin.close()
I chose not to automatically overwrite since I wanted to be able to eyeball the output.
python c:\bin\remove1st3.py sampleCode.txt

Python read/write file without closing

Sometimes when I open a file for reading or writing in Python:
f = open('workfile', 'r')
or
f = open('workfile', 'w')
I read/write the file, and then at the end I forget to do f.close(). Is there a way to automatically close after all the reading/writing is done, or after the code finishes processing?
with open('file.txt','r') as f:
    # file is opened and accessible via f
    pass
# file will be closed before here
You could always use the with...as statement:
with open('workfile') as f:
    """Do something with file"""
or you could use a try...finally block:
f = open('workfile', 'r')
try:
    """Do something with file"""
finally:
    f.close()
Although, since you say that you forget to add f.close(), I guess the with...as statement will be best for you, and given its simplicity, it's hard to see a reason not to use it!
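Either way, you can confirm that the file really is closed by checking its closed attribute afterwards. A sketch using a scratch copy of the question's workfile in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workfile")

with open(path, "w") as f:
    f.write("data\n")
    inside = f.closed      # False: still inside the with block

after_with = f.closed      # True: closed on leaving the block

g = open(path)
try:
    content = g.read()
finally:
    g.close()              # runs whether or not read() raised
after_finally = g.closed   # True

print(inside, after_with, after_finally, repr(content))  # False True True 'data\n'
```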
Whatever you do with your file after reading it in, this is how you can read it and write it back. Suppose you run your script like this:
$ python myscript.py sample.txt sample1.txt
Then the first argument (sample.txt) is our "oldfile" and the second argument (sample1.txt) is our "newfile". You can then put the following code into a file called "myscript.py":
from sys import argv
script_name, oldfile, newfile = argv
with open(oldfile, "r") as f:
    content = f.read()
# now, you can rearrange your content here
with open(newfile, "w") as t:
    t.write(content)
