Print file passed in as argument - python

I have a very simple python script that should print the contents of a file that is passed like this: python script.py stuff.txt. I don't get any output.
Here is the code:
import sys
fname = sys.argv[1]
f = open(fname, 'r')
f.read()
From what I have read, this is supposed to work. Why not?

You read the file, but you don't do anything with the data.
print(f.read())
Or, for better style:
import sys
fname = sys.argv[1]
with open(fname, 'r') as f:
    print(f.read())
This is the recommended way to use files. It guarantees the file is closed when you exit the with block. It doesn't really matter for your small script, but it's a good habit to develop.
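As a quick self-contained check (a sketch using a temporary file rather than a command-line argument), you can confirm that the with block really closes the file by inspecting the file object's closed attribute:

```python
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('hello\n')
    path = tmp.name

with open(path, 'r') as f:
    contents = f.read()
    was_open = not f.closed   # True while inside the with block

is_closed = f.closed          # True: closed automatically on exit
print(contents)
```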

Related

Refactoring use of Python open() function and read() method

In the following Python script, I'm utilising the open() function to create a file object and then reading from it using the read() method. It's a simple script that will copy one file to another.
I have 2 questions:
1) Can the lines where we assign in_file and indata be combined? I understand that we create a file object, then we read from it; however, can this be done in a single line of code? I'm guessing, for example, that one option is to chain the open() function and read() method together?
2) Can the two lines of code that assign the out_file variable be refactored in a similar fashion? We again create a separate file object using open(), then write to it using the write() method.
Please keep answers as simple as possible and explain what is happening in the code.
from sys import argv
from os.path import exists
script, from_file, to_file = argv
# we could combine these two lines of code into one line, how?
in_file = open(from_file)
indata = in_file.read()
# could these two lines be combined in a similar way?
out_file = open(to_file, 'w')
out_file.write(indata)
in_file.close()
out_file.close()
You can use the "with open"
One way of doing it is like below:
with open(from_file) as in_file, open(to_file, 'w') as out_file:
    indata = in_file.read()
    out_file.write(indata)
Apart from combining the lines, one benefit of this is that you don't have to explicitly close the files; they will be closed automatically when you exit the with block.
Your script could be more Pythonic using with; you can also avoid loading the whole file into memory, because a file object is itself an iterator over lines:
if __name__ == "__main__":
    from sys import argv
    from os.path import exists
    script, from_file, to_file = argv
    with open(from_file, "r") as inf, open(to_file, "w") as outf:
        outf.writelines(inf)
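To answer the question's first point directly: yes, open() and read() can be chained on one line, and the write side can be chained the same way. A minimal sketch (the file names here are stand-ins, not from the question); note that nothing keeps a reference to either file object, so you are relying on the interpreter to close them, which is exactly why the with forms above are preferred:

```python
import tempfile

# Hypothetical stand-ins for from_file and to_file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('example data\n')
    from_file = tmp.name
to_file = from_file + '.copy'

indata = open(from_file).read()   # question 1: open() and read() chained
open(to_file, 'w').write(indata)  # question 2: same idea for the write side
```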

How to take input file from terminal for python script?

I have a python script which uses a text file and manipulate the data from the file and output to another file. Basically I want it to work for any text file input. Right now I readline from the file and then print the output to screen. I want the output in a file.
So user can type the following and test for any file:
cat input_file.txt | python script.py > output_file.txt.
How can I implement this in my script? Thank You.
cat is a command in Linux; I don't know how it works.
The best way to do this is probably to call the input and output files as arguments for the python script:
import sys
inFile = sys.argv[1]
outFile = sys.argv[2]
Then you can read in all your data, do your manipulations, and write out the results:
with open(inFile, 'r') as i:
    lines = i.readlines()
processedLines = manipulateData(lines)
with open(outFile, 'w') as o:
    for line in processedLines:
        o.write(line)
You can call this program by running python script.py input_file.txt output_file.txt
If you absolutely must pipe the data to python (which is really not recommended), use sys.stdin.readlines()
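For completeness, a sketch of that stdin variant (manipulateData is the same hypothetical function as above, stubbed out here as a pass-through so the example runs):

```python
import io

def manipulateData(lines):
    # Stand-in for the real processing; just passes lines through.
    return lines

def run(stream):
    # Equivalent of sys.stdin.readlines() on any file-like object.
    return manipulateData(stream.readlines())

# In the real script you would call run(sys.stdin) and write the
# result to sys.stdout or to an output file.
demo = run(io.StringIO('first\nsecond\n'))
```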
This method (your question) describes reading data from STDIN:
cat input_file.txt | python script.py
Solution: script.py:
import sys
for line in sys.stdin:
    print(line, end='')
The method in the above solutions describes taking argument parameters with your Python call:
python script.py input_file.txt
Solution: script.py:
import sys
with open(sys.argv[1], 'r') as file:
    for line in file:
        print(line, end='')
Hope this helps!
cat input_file.txt | python script.py > output_file.txt
You can pass one big string containing all the data from input_file.txt instead of an actual file. To implement this in your script, take that string as an argument and split it on newline characters, for example using "\n" as the delimiter. To write to an output file, just do it the normal way,
i.e. open the file, write to the file, and close the file.
Sending output to a file is very similar to taking input from a file.
You open a file for writing the same way you do for reading, except with a 'w' mode instead of an 'r' mode.
You write to a file by calling write on it the same way you read by calling read or readline.
This is all explained in the Reading and Writing Files section of the tutorial.
So, if your existing code looks like this:
with open('input.txt', 'r') as f:
    while True:
        line = f.readline()
        if not line:
            break
        print(line)
You just need to do this:
with open('input.txt', 'r') as fin, open('output.txt', 'w') as fout:
    while True:
        line = fin.readline()
        if not line:
            break
        fout.write(line)
If you're looking to allow the user to pass the filenames on the command line, use sys.argv to get the filenames, or use argparse for more complicated command-line argument parsing.
For example, you can change the first line to this:
import sys
with open(sys.argv[1], 'r') as fin, open(sys.argv[2], 'w') as fout:
Now, you can run the program like this:
python script.py input_file.txt outputfile.txt
cat input_file.txt | python script.py > output_file.txt
Basically, the Python script needs to read the input file and write to standard output.
import sys
with open('input_file.txt', 'r') as f:
    while True:
        line = f.readline()
        if not line:
            break
        sys.stdout.write(line)
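Strictly speaking, with the cat form the script never sees a filename at all, so a version that copies standard input to standard output matches that invocation more closely; a minimal sketch (demonstrated on in-memory streams so it is self-contained):

```python
import io

def copy_stream(src, dst):
    # Stream src to dst line by line, without loading the whole file.
    for line in src:
        dst.write(line)

# In the real script this would be: copy_stream(sys.stdin, sys.stdout)
out = io.StringIO()
copy_stream(io.StringIO('a\nb\n'), out)
```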

remove first char from each line in a text file

I'm new to Python, and to programming in general.
I want to remove the first character from each line in a text file and write the changes back to the file. For example, I have a file with 36 lines, and the first character of each line is a symbol or a number that I want removed.
I made a little code here, but it doesn't work as expected; it only duplicates whole lines. Any help would be appreciated!
from sys import argv
run, filename = argv
f = open(filename, 'a+')
f.seek(0)
lines = f.readlines()
for line in lines:
    f.write(line[1:])
f.close()
Your code already does remove the first character. I saved exactly your code as both dupy.py and dupy.txt, then ran python dupy.py dupy.txt, and the result is:
from sys import argv
run, filename = argv
f = open(filename, 'a+')
f.seek(0)
lines = f.readlines()
for line in lines:
    f.write(line[1:])
f.close()
rom sys import argv
un, filename = argv
= open(filename, 'a+')
.seek(0)
ines = f.readlines()
or line in lines:
   f.write(line[1:])
.close()
It's not copying entire lines; it's copying lines with their first character stripped.
But from the initial statement of your problem, it sounds like you want to overwrite the lines, not append new copies. To do that, don't use append mode. Read the file, then write it:
from sys import argv
run, filename = argv
f = open(filename)
lines = f.readlines()
f.close()
f = open(filename, 'w')
for line in lines:
    f.write(line[1:])
f.close()
Or, alternatively, write a new file, then move it on top of the original when you're done:
import os
from sys import argv
run, filename = argv
fin = open(filename)
fout = open(filename + '.tmp', 'w')
lines = fin.readlines()
for line in lines:
    fout.write(line[1:])
fout.close()
fin.close()
os.rename(filename + '.tmp', filename)
(Note that this version will not work as-is on Windows, but it's simpler than the actual cross-platform version; if you need Windows, I can explain how to do this.)
You can make the code a lot simpler, more robust, and more efficient by using with statements, looping directly over the file instead of calling readlines, and using tempfile:
import os
import tempfile
from sys import argv
run, filename = argv
with open(filename) as fin, tempfile.NamedTemporaryFile('w', delete=False) as fout:
    for line in fin:
        fout.write(line[1:])
os.rename(fout.name, filename)
On most platforms, this guarantees an "atomic write"—when your script finishes, or even if someone pulls the plug in the middle of it running, the file will end up either replaced by the new version, or untouched; there's no way it can end up half-way overwritten into unrecoverable garbage.
Again this version won't work on Windows. Without a whole lot of work, there is no way to implement this "write-temp-and-rename" algorithm on Windows. But you can come close with only a bit of extra work:
with open(filename) as fin, tempfile.NamedTemporaryFile('w', delete=False) as fout:
    for line in fin:
        fout.write(line[1:])
outname = fout.name
os.remove(filename)
os.rename(outname, filename)
This does prevent you from half-overwriting the file, but it leaves a hole where you may have deleted the original file, and left the new file in a temporary location that you'll have to search for. You can make this a little nicer by putting the file somewhere easier to find (see the NamedTemporaryFile docs to see how). Or renaming the original file to a temporary name, then writing to the original filename, then deleting the original file. Or various other possibilities. But to actually get the same behavior as on other platforms is very difficult.
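One hedge worth knowing: since Python 3.3, os.replace() renames atomically on POSIX and also silently overwrites an existing destination on Windows, which removes most of the pain described above (as long as the temporary file is on the same filesystem as the target). A sketch:

```python
import os
import tempfile

def strip_first_char(filename):
    # Write the stripped lines to a temp file in the same directory,
    # then atomically swap it into place with os.replace().
    dirname = os.path.dirname(os.path.abspath(filename))
    with open(filename) as fin, tempfile.NamedTemporaryFile(
            'w', dir=dirname, delete=False) as fout:
        for line in fin:
            fout.write(line[1:])
        tmpname = fout.name
    os.replace(tmpname, filename)  # works on POSIX and Windows
```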
You can either read all the lines into memory and then recreate the file:
from sys import argv
run, filename = argv
with open(filename, 'r') as f:
    data = [line[1:] for line in f]
with open(filename, 'w') as f:
    f.writelines(data)  # each line keeps its trailing newline, so nothing to add
Or you can create another file and move the data from the first file to the second line by line, then rename it if you'd like:
import os
from sys import argv
run, filename = argv
new_name = filename + '.tmp'
with open(filename, 'r') as f_in, open(new_name, 'w') as f_out:
    for line in f_in:
        f_out.write(line[1:])
os.rename(new_name, filename)
At its most basic, your problem is that you need to seek back to the beginning of the file after you read its complete contents into the list lines. Since you are making the file shorter, you also need to use truncate to adjust the official length of the file after you're done. Furthermore, open mode a+ (a is for append) overrides seek and forces all writes to go to the end of the file. So your code should look something like this:
import sys

def main(argv):
    filename = argv[1]
    with open(filename, 'r+') as f:
        lines = f.readlines()
        f.seek(0)
        for line in lines:
            f.write(line[1:])
        f.truncate()

if __name__ == '__main__': main(sys.argv)
It is better, when doing something like this, to write the changes to a new file and then rename it over the old file when you're done. This causes the update to happen "atomically" - a concurrent reader sees either the old file or the new one, not some mangled combination of the two. That looks like this:
import os
import sys
import tempfile

def main(argv):
    filename = argv[1]
    with open(filename, 'r') as inf:
        with tempfile.NamedTemporaryFile('w', dir=".", delete=False) as outf:
            tname = outf.name
            for line in inf:
                outf.write(line[1:])
    os.rename(tname, filename)

if __name__ == '__main__': main(sys.argv)
(Note: Atomically replacing a file via rename does not work on Windows; you have to os.remove the old name first. This unfortunately does mean there is a brief window (no pun intended) where a concurrent reader will find that the file does not exist. As far as I know there is no way to avoid this.)
import re
with open(filename, 'r+') as f:
    modified = re.sub('^.', '', f.read(), flags=re.MULTILINE)
    f.seek(0, 0)
    f.write(modified)
    f.truncate()
In the regex pattern:
^ means 'start of string'
^ with flag re.MULTILINE means 'start of line'
^. means 'exactly one character at the start of a line'
The start of a line is the start of the string or any position after a newline (a newline is \n)
So, we may fear that some newlines in sequences like \n\n\n\n\n\n\n could match with the regex pattern.
But the dot symbolizes any character EXCEPT a newline, then all the newlines don't match with this regex pattern.
During the reading of the file triggered by f.read(), the file's pointer moves to the end of the file.
f.seek(0,0) moves the file's pointer back to the beginning of the file
f.truncate() puts a new EOF = end of file at the point where the writing has stopped. It's necessary since the modified text is shorter than the original one.
Compare the result with what the code produces without this line.
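To see why that truncate() matters, here is a self-contained sketch (with hypothetical two-line contents) where the rewritten text is two characters shorter than the original; without the truncate() call, the last two bytes of the old text would survive at the end of the file:

```python
import re
import tempfile

# Build a sample file whose lines each start with an unwanted digit.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1abc\n2def\n')
    path = tmp.name

with open(path, 'r+') as f:
    modified = re.sub('^.', '', f.read(), flags=re.MULTILINE)
    f.seek(0, 0)
    f.write(modified)
    f.truncate()  # without this, stale bytes would remain past the new text

result = open(path).read()
```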
To be honest, I'm really not sure how good or bad an idea it is to nest with open(), but you can do something like this.
with open(filename_you_reading_lines_FROM, 'r') as f0:
    with open(filename_you_appending_modified_lines_TO, 'a') as f1:
        for line in f0:
            f1.write(line[1:])
While there was some discussion above of best practice and whether the code would run on Windows, being new to Python I was able to take the first example that worked and run it in my Windows environment (which has cygwin binaries in my Path environment variable) to remove the first 3 characters of each line (which were line numbers from a sample file):
import os
from sys import argv
run, filename = argv
fin = open(filename)
fout = open(filename + '.tmp', 'w')
lines = fin.readlines()
for line in lines:
    fout.write(line[3:])
fout.close()
fin.close()
I chose not to automatically overwrite since I wanted to be able to eyeball the output.
python c:\bin\remove1st3.py sampleCode.txt

Python read/write file without closing

Sometimes when I open a file for reading or writing in Python
f = open('workfile', 'r')
or
f = open('workfile', 'w')
I read/write the file, and then at the end I forget to do f.close(). Is there a way to automatically close after all the reading/writing is done, or after the code finishes processing?
with open('file.txt', 'r') as f:
    # file is opened and accessible via f
    pass
# file will be closed before here
You could always use the with...as statement
with open('workfile') as f:
    """Do something with file"""
or you could also use a try...finally block
f = open('workfile', 'r')
try:
    """Do something with file"""
finally:
    f.close()
Although, since you say that you forget to add f.close(), I guess the with...as statement will be best for you, and given its simplicity it's hard to see a reason not to use it!
Whatever you do with your file after you read it in, this is how you can read it and write it back:
$ python myscript.py sample.txt sample1.txt
The first argument (sample.txt) is our "oldfile" and the second argument (sample1.txt) is our "newfile". You can then put the following code into a file called "myscript.py":
from sys import argv
script_name,oldfile,newfile = argv
content = open(oldfile,"r").read()
# now, you can rearrange your content here
t = open(newfile,"w")
t.write(content)
t.close()
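Since the question is about forgetting f.close(), it is worth noting that the same copy can be written so both files close themselves; a with-based sketch of the snippet above:

```python
def copy_file(oldfile, newfile):
    # Both files are closed automatically when their with blocks exit.
    with open(oldfile) as src:
        content = src.read()
    # ...rearrange content here if you need to...
    with open(newfile, 'w') as dst:
        dst.write(content)

# Invoked the same way, e.g. copy_file(sys.argv[1], sys.argv[2]).
```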

How to read a whole file in Python? To work universally in command line

How to read a whole file in Python? I would like my script to work however it is called
script.py log.txt
script.py < log2.txt
python script.py < log2.txt
python -i script.py logs/yesterday.txt
You get the idea.
I tried
import fileinput
from bs4 import BeautifulSoup
f = fileinput.input()
soup = BeautifulSoup(f.read())
But I get
Traceback (most recent call last):
File "visual-studio-extension-load-times.py", line 5, in <module>
soup = BeautifulSoup(f.read())
AttributeError: FileInput instance has no attribute 'read'
Instead of using fileinput, open the file directly yourself:
import sys
try:
    fileobj = open(sys.argv[1], 'r')
except IndexError:
    fileobj = sys.stdin
with fileobj:
    data = fileobj.read()
f = open('file.txt', 'r')
data = f.read()
f.close()
Furthermore, to open a file passed from the command line you can do the following (this is also a smarter way to open files: instead of f = open(...) you can use with ...):
import sys
with open(sys.argv[1], 'r') as f:
    data = f.read()
The reason with is a smarter way to open files is that it will automatically close the file after you leave the indented with block.
This means you don't have to worry about files being left open or forgotten for too long (which can cause a "too many open file handles" error from your OS).
Then, on to sys.argv:
sys.argv[1] will be the first parameter on the command line after your Python file.
sys.argv[0] will be your script's name. For instance:
python myscript.py heeyooo will be:
sys.argv[0] == "myscript.py"
sys.argv[1] == "heeyooo" :)
Then there's all sorts of modules that will be interesting to you when working with files.
For one, os.path is a good start, because you will most likely want your code to be as cross-platform as possible, and it handles the difference between \ on Windows and / on Linux for you.
A few good ones are:
os.path.abspath
os.path.isfile
os.path.isdir
You also have os.getcwd() which might be good :)
argparse to the rescue!:
>>> import sys
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('infile', nargs='?', type=argparse.FileType('r'),
... default=sys.stdin)
>>> args = parser.parse_args()
>>> file_data = args.infile.read()
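As a standalone script rather than a REPL transcript, the same idea can be wrapped in a small helper (a sketch; the function name is mine): the optional positional argument falls back to sys.stdin, which is what makes both script.py log.txt and script.py < log2.txt work:

```python
import argparse
import sys

def build_parser():
    parser = argparse.ArgumentParser()
    # Optional positional file; when omitted, read standard input,
    # which covers the "script.py < log2.txt" invocations.
    parser.add_argument('infile', nargs='?', type=argparse.FileType('r'),
                        default=sys.stdin)
    return parser

# In the script itself:
#     args = build_parser().parse_args()
#     file_data = args.infile.read()
```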
script.py log.txt
script.py < log2.txt
These two are very different invocations of your script! In the first, the shell passes the filename log.txt to the script; in the second, the shell connects the script's standard input to the file log2.txt, and the script never actually sees the filename.
It is possible to handle both of these in the same script. One way to do it is to read from standard input if no files are passed on the command line. Another way is to read from standard input if it's not a terminal and then also read files listed on the command line, if any (I do like fileinput for this if you are interested in reading the lines but don't care what file they come from). You can use sys.stdin.isatty() which returns True if the standard input is a terminal. So something like this:
import sys, fileinput
if not sys.stdin.isatty():
    for line in sys.stdin:
        process(line)
for line in fileinput.input():
    process(line)
But if you are looking to process each file as a whole, as it appears, then fileinput won't do. Instead, read each filename from the command line individually, read the indicated file, and process it:
import sys
if not sys.stdin.isatty():
    stdin = sys.stdin.read()
    if stdin:
        process(stdin)
for filename in sys.argv[1:]:
    with open(filename) as f:
        process(f.read())
Now with regard to these invocations:
python script.py < log2.txt
python -i script.py logs/yesterday.txt
These are the same as though you had just invoked script.py directly as far as the script can tell, so you don't need to handle them specially. Using the -i option with input indirection (<) could cause some unexpected behavior, but I haven't tried it (and there wouldn't be any way to work around it anyway).
It doesn't sound like you really wanted fileinput in the first place, since you're not trying to concatenate multiple files, handle the name - as "put stdin here", etc.
But if you do want fileinput, instead of trying to reproduce all of its behavior, just wrap it up.
You want to read all of the input into one string, but all it provides is functions that give you one line or one file at a time. So, what can you do? Join them together:
soup = BeautifulSoup(''.join(fileinput.input()))
That's it.
I went with this:
import sys
from bs4 import BeautifulSoup
f = open(sys.argv[1]) if sys.argv[1:] else sys.stdin
soup = BeautifulSoup(f)
