I'm working on a PDF generator project. The goal is to have a program that takes document files and generates a PDF file. I'm having trouble finding a way to input a file into the program to be converted.
I started out by using the input function, where I enter the file path in the terminal. As a test, I wanted to input, open, read, and print a CSV file containing US zipcode data. The rest of the program opens, reads, and prints out some of the data. Here is the code:
import csv

file = input("Drop file here: ")

with open(file, 'r', encoding='utf8') as zf:
    rf = csv.reader(zf, delimiter=',')
    header = next(rf)
    data = [row for row in rf]

print(header)
print(data[1])
print(data[10])
print(data[100])
print(data[1000])
When I ran it and entered the file in the terminal, this error appeared: TypeError: 'encoding' is an invalid keyword argument for this function.
Is there a better way I can code the program to input a file so it can be opened and converted into a PDF?
There is more going on here and, as was mentioned in the comments, it is very relevant which version of Python you are using. A bit more of the back story:
The input built-in has a different meaning in Python 2 (https://docs.python.org/2.7/library/functions.html#input) and Python 3 (https://docs.python.org/3.6/library/functions.html#input). In Python 2 it reads the user input and tries to execute it as Python code, which is unlikely to be what you actually wanted.
Then, as pointed out, the arguments of open are different as well (https://docs.python.org/2.7/library/functions.html#open and https://docs.python.org/3.6/library/functions.html#open); the Python 2 built-in open has no encoding parameter, which is what the TypeError is complaining about.
In short, as suggested by @idlehands, if you have both versions installed, try calling python3 instead of python and this code should actually run.
Recommendation: I would suggest not using interactive input like this at all (unless there is a good reason to) and instead letting the desired filename be passed in from outside. I'd opt for argparse (https://docs.python.org/3.6/library/argparse.html#module-argparse) in this case, which very comfortably gives you great flexibility, for instance myscript.py:
#!/usr/bin/env python3
import argparse
import sys

parser = argparse.ArgumentParser(description='My script to do stuff.')
parser.add_argument('-o', '--output', metavar='OUTFILE', dest='out_file',
                    type=argparse.FileType('w'), default=sys.stdout,
                    help='Resulting file.')
parser.add_argument('in_file', metavar='INFILE', nargs="?",
                    type=argparse.FileType('r'), default=sys.stdin,
                    help='File to be processed.')
args = parser.parse_args()

args.out_file.write(args.in_file.read())  # replace with actual action
This gives you the ability to run the script as a pass-through (piping stuff in and out), to work on specified file(s), as well as to explicitly use - to denote that stdin/stdout are to be used. argparse also gives you command line usage/help for free.
You may want to tweak the specifics for different behavior, but bottom line, I'd still go with a command line argument.
EDIT: I should add one more comment for consideration. I'd write the actual code (a function or a more complex object) performing the wanted action so that it exposes its ins/outs through its interface, and write the command line part only to gather these bits and call my action code with them. That way you can reuse it from another Python script easily, or write a GUI for it should you need/want to.
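To illustrate, a minimal sketch of that split, reusing the argparse setup above (the convert function is a hypothetical stand-in for the real PDF-generation logic):

#!/usr/bin/env python3
import argparse
import sys

def convert(in_file, out_file):
    # hypothetical action: read the input and write the result
    out_file.write(in_file.read())  # replace with the real conversion

def main():
    parser = argparse.ArgumentParser(description='My script to do stuff.')
    parser.add_argument('-o', '--output', metavar='OUTFILE', dest='out_file',
                        type=argparse.FileType('w'), default=sys.stdout,
                        help='Resulting file.')
    parser.add_argument('in_file', metavar='INFILE', nargs='?',
                        type=argparse.FileType('r'), default=sys.stdin,
                        help='File to be processed.')
    args = parser.parse_args()
    convert(args.in_file, args.out_file)

if __name__ == '__main__':
    main()

Another script (or a GUI) can then import convert and call it directly, without going through the command line parsing.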
I am trying to convert a .pdf file into several .png files using Ghostscript in Python. The other answers on here were pretty old, hence this new thread.
The following code was given as an example on pypi.org of the 'high level' interface, and I am trying to model my code after the example code below.
import sys
import locale
import ghostscript

args = [
    "ps2pdf",  # actual value doesn't matter
    "-dNOPAUSE", "-dBATCH", "-dSAFER",
    "-sDEVICE=pdfwrite",
    "-sOutputFile=" + sys.argv[1],
    "-c", ".setpdfwrite",
    "-f", sys.argv[2]
]

# arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]

ghostscript.Ghostscript(*args)
Can someone explain what this code is doing? And can it be used somehow to convert a .pdf into .png files?
I am new to this and am truly confused. Thanks so much!
That's calling Ghostscript, obviously. From the arguments, it's not spawning a process; it's linked (either dynamically or statically) to the Ghostscript library.
The args are Ghostscript arguments. These are documented in the Ghostscript documentation, you can find it online here. Because it mimics the command line interface, where the first argument is the calling program, the first argument here is meaningless and can be anything you want (as the comment says).
The next three arguments turn on SAFER (which prevents some potentially dangerous operations and is, now, the default anyway), set NOPAUSE so the entire input is processed without pausing between pages, and set BATCH so that on completion Ghostscript exits instead of returning to the interactive prompt.
Then it selects a device. In Ghostscript (due to the PostScript language) devices are what actually output stuff. In this case the device selected is the pdfwrite device, which outputs PDF.
Then there's the OutputFile, you can probably guess that this is the name (and path) of the file where the output is to be written.
The next three arguments, -c .setpdfwrite -f, are, frankly, archaic and pointless. They were once recommended when using the pdfwrite device (and only the pdfwrite device), but they have no useful effect these days.
The very last argument is, of course, the input file.
Certainly you can use Ghostscript to render PDF files to PNG. You want to use one of the PNG devices, there are several depending on what colour depth you want to support. Unless you have some stranger requirement, just use png16m. If your input file contains more than one page you'll want to set the OutputFile to use %d so that it writes one file per page.
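Put together, a sketch of how the example above might be adapted for PDF-to-PNG rendering (the resolution value -r300 and the use of sys.argv for the file names are assumptions; the device and the %d OutputFile handling follow the explanation above):

import sys
import locale
import ghostscript

# e.g. run as: python pdf2png.py output-%d.png input.pdf
args = [
    "pdf2png",                       # actual value doesn't matter
    "-dNOPAUSE", "-dBATCH", "-dSAFER",
    "-sDEVICE=png16m",               # 24-bit colour PNG output
    "-r300",                         # rendering resolution in dpi (assumed value)
    "-sOutputFile=" + sys.argv[1],   # use %d in the name to get one file per page
    sys.argv[2]                      # the input PDF
]

# arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]

ghostscript.Ghostscript(*args)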
More details on all of this can, of course, be found in the documentation.
I recently made a Twitter bot that takes a specified .txt file and tweets it out, line by line. A lot of the features I built into the program to troubleshoot some formatting issues actually allow the program to work with pretty much any text file.
I would like to build in a feature where I can "import" a .txt file to use.
I put that in quotes because the program runs in the command line at the moment.
I figured there are two ways I can tackle this problem but need some guidance on each:
A) I begin the program with a prompt asking which file the user wants to use. This is stored as a string (let's say variable string) and the code looks like this:
file = open(string,'r')
There are two main issues with this. The first is that I'm unsure how to keep the program from crashing if the file specified is misspelled or does not exist. The second is that it won't mesh with future development (eventually I'd like to build app functionality around this program).
B) Somehow specify the desired file in the command line. While the program will still occasionally crash, it isn't as inconvenient to the user. Also, this would lend itself to future development, as it'll be easier to pass a value in through the command line than through an internal prompt.
Any ideas?
For the first part of the question, exception handling is the way to go. For the second part you can also use a module called argparse.
import argparse

# creating a command line argument with the name file_name
parser = argparse.ArgumentParser()
parser.add_argument("file_name", help="Enter file name")
args = parser.parse_args()

# reading the file
with open(args.file_name, 'r') as f:
    file = f.read()
You can read more about the argparse module on its documentation page.
Regarding A), you may want to investigate
try:
    with open(fname) as f:
        blah = f.read()
except Exception as ex:
    pass  # handle error
and for B) you can, e.g.
import sys
fname = sys.argv[1]
You could also combine both to make sure that the user has passed an argument:
#!/usr/bin/env python
# encoding: utf-8
import sys

def tweet_me(text):
    # your bot goes here
    print text

if __name__ == '__main__':
    try:
        fname = sys.argv[1]
        with open(fname) as f:
            blah = f.read()
        tweet_me(blah)
    except Exception as ex:
        print ex
        print "Please call this as %s <name-of-textfile>" % sys.argv[0]
Just in case someone wonders about the # encoding: utf-8 line: this allows the source code to contain UTF-8 characters. Otherwise only ASCII is allowed, which would be ok for this script, so the line is not strictly necessary. I was, however, testing the script on itself (python x.py x.py) and, as a little test, added a UTF-8 comment (# ä). In real life, you will have to care a lot more about the character encoding of your input...
Beware, however, that just catching any Exception that may arise from the whole program is not considered good coding style. While Python encourages you to assume the best and just try it, it might be wise to catch expected errors right where they happen. For example, accessing a file which does not exist will raise an IOError. You may end up with something like:
except IndexError as ex:
    print "Syntax: %s <text-file>" % sys.argv[0]
except IOError as ex:
    print "Please provide an existing (and accessible) text-file."
except Exception as ex:
    print "uncaught Exception:", type(ex)
I'm trying to run an external program from a Python script.
After searching and reading multiple posts here, I came to what seemed to be the solution.
First, I used the subprocess.call function.
If I build the command this way:
hmmer1=subprocess.call("D:\Python_Scripts\HMMer3\hmmsearch.exe --tblout hmmTestTab.out SDHA.hmm Test.fasta")
The external program D:\Python_Scripts\HMMer3\hmmsearch.exe is run, taking hmmTestTab.out as the file name for the output and SDHA.hmm and Test.fasta as input files.
Nevertheless, if I try to replace the file names with the variables outfile, hmmprofile and fastafile (I intend to receive those variables as arguments for the Python script and use them to build the external program call),
hmmer2=subprocess.call("D:\Python_Scripts\HMMer3\hmmsearch.exe --tblout outfile hmmprofile fastafile")
Python prints an error about being unable to open the input files.
I also used "Popen" function with analogous results:
This call works
hmmer3=Popen(['D:\Python_Scripts\HMMer3\hmmsearch.exe', '--tblout','hmmTestTab.out', 'SDHA.hmm','Test.fasta'])
and this one doesn't
hmmer4=Popen(['D:\Python_Scripts\HMMer3\hmmsearch.exe', '--tblout','outfile', 'hmmprofile','fastafile'])
As a result of this, I presume I need to understand what process to follow to interpolate the variables into the call, because it seems that the problem is there.
Would any of you help me with this issue?
Thanks in advance
You have:
hmmer4=Popen(['D:\Python_Scripts\HMMer3\hmmsearch.exe', '--tblout','outfile', 'hmmprofile','fastafile'])
But that's not passing the variable outfile. It's passing a string, 'outfile'.
You want:
hmmer4=Popen(['D:\Python_Scripts\HMMer3\hmmsearch.exe', '--tblout', outfile, hmmprofile, fastafile])
And the other answer is correct, though it addresses a different problem; you should double the backslashes, or use r'' raw strings.
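Combining both points, a sketch of the call with the variables interpolated and a raw-string path (outfile, hmmprofile and fastafile are the asker's variables; the values below are hypothetical placeholders):

import subprocess

# hypothetical values, normally received as arguments to the script
outfile = 'hmmTestTab.out'
hmmprofile = 'SDHA.hmm'
fastafile = 'Test.fasta'

hmmer = subprocess.call([r'D:\Python_Scripts\HMMer3\hmmsearch.exe',
                         '--tblout', outfile,
                         hmmprofile, fastafile])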
Try to change this:
hmmer1=subprocess.call("D:\Python_Scripts\HMMer3\hmmsearch.exe"
to
hmmer1=subprocess.call('D:\\Python_Scripts\\HMMer3\\hmmsearch.exe'
Edit
argv = ['--tblout', outfile, hmmprofile, fastafile]  # your arguments, as separate list items
program = [r'D:\Python_Scripts\HMMer3\hmmsearch.exe'] + argv
subprocess.call(program)
I am using argparse to take a list of input files:
import argparse
p = argparse.ArgumentParser()
p.add_argument("infile", nargs='+', type=argparse.FileType('r'), help="copy from")
p.add_argument("outfile", help="copy to")
args = p.parse_args()
However, this opens the door for the user to pass in prog /path/to/* outfile, where the source directory could potentially have millions of files; the shell expansion can overrun the parser. My questions are:
is there a way to disable the shell expansion (*) within?
if not, is there a way to put a cap on the number of input files before they are assembled into a list?
(1) No, the shell expansion is done by the shell. By the time Python is run, the command line has already been expanded. Quoting the pattern as "*" or '*' will deactivate the expansion, but that also happens in the shell.
(2) Yes, get the length of sys.argv early in your code and exit if it is too long.
Also most shells have a built-in limit to the expansion.
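For (2), a minimal sketch of such a cap (the limit of 1000 is an arbitrary assumption):

import sys
import argparse

MAX_INFILES = 1000  # arbitrary cap, adjust to taste

# check early, before argparse (and FileType) touches the arguments
if len(sys.argv) - 2 > MAX_INFILES:  # minus the program name and the outfile
    sys.exit("too many input files (max %d)" % MAX_INFILES)

p = argparse.ArgumentParser()
p.add_argument("infile", nargs='+', type=argparse.FileType('r'), help="copy from")
p.add_argument("outfile", help="copy to")
args = p.parse_args()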
If you are concerned about too many infile values, don't use FileType.
p.add_argument("infile", nargs='+', help="copy from")
Just accept a list of file names. That's not going to cost you much. Then you can open and process just as many of the files as you want.
FileType opens the file when the name is parsed. That is ok for a few files that you will use right away in a small script. But usually you don't want, or need, to have all those files open at once. In modern Python you are encouraged to open files in a with context, so they get closed right away (instead of hanging around till the script is done).
FileType handles the '-' (stdin) value. And it will issue a nice error report if it fails to open a file. But is that what you want? Or would you rather process each file, skipping over the bad names?
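For instance, a sketch of processing the plain name list in a with context and skipping names that cannot be opened (this assumes the parser above with the plain-name infile argument; process is a hypothetical placeholder for the per-file work):

for name in args.infile:
    try:
        with open(name) as f:
            process(f)  # hypothetical per-file work
    except OSError as e:
        print("skipping %s: %s" % (name, e))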
Overall FileType is a convenience, but generally a poor choice in serious applications.
Something else to be worried about: outfile is the last of a (potentially) long list of files, the '+' input ones plus one more. argparse accepts that, but it could give problems. For example, what if the user forgets to provide an outfile? Then the last of the input files will be used as the outfile. That error could result in unintentionally overwriting a file. It may be safer to use '-o', '--outfile', making the user explicitly mark the outfile. And the user could give it first, so they don't forget.
In general '+' and '*' positionals are safest when used last.
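A sketch of that safer layout under the same assumptions (plain names for the inputs, an explicit -o/--outfile for the destination; the naive concatenating copy at the end is just for illustration):

import argparse

p = argparse.ArgumentParser()
p.add_argument("infile", nargs='+', help="copy from")
p.add_argument("-o", "--outfile", required=True, help="copy to")
args = p.parse_args()

with open(args.outfile, 'w') as out:
    for name in args.infile:
        with open(name) as f:
            out.write(f.read())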
I'd like to use argparse to read from either stdin or an input file. In other words:
If an input file is given, read that.
If not, read from stdin only if it's not the terminal. (i.e. a file is being piped in)
If neither of these criteria is satisfied, signal to argparse that the inputs aren't correct.
I'm asking for behavior similar to what's described in this question, but I want argparse to recognize no file as a failed input.
Using the information from the question you linked to, what about using sys.stdin.isatty() to check whether your program is being run as part of a pipeline: if not, read from the input file, otherwise read from stdin. If the input file does not exist or stdin is empty, throw an error.
Hope that helped.
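A sketch of that idea (the shape of the parser is an assumption; only the isatty() check comes from the suggestion above):

import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('infile', nargs='?', help='input file (optional if data is piped in)')
args = parser.parse_args()

if args.infile:
    with open(args.infile) as f:
        data = f.read()
elif not sys.stdin.isatty():
    # stdin is not the terminal, so something is being piped in
    data = sys.stdin.read()
else:
    parser.error('no input file given and nothing piped to stdin')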
I would recommend just setting nargs='?' and then handling the case of a None value separately. According to the official documentation, "FileType objects understand the pseudo-argument '-' and automatically convert this into sys.stdin for readable FileType objects and sys.stdout for writable FileType objects". So just give it a dash if you want stdin.
Example
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('inputfile', nargs='?', type=argparse.FileType('r'))
args = parser.parse_args()

if not args.inputfile:
    sys.exit("Please provide an input file, or pipe it via stdin")