My argparse const=open() option always creates empty files? - python

I am a student and for practice am creating a little program that converts .fastq to .fasta files (so it deletes some lines, basically).
I am trying to implement the typical user input of an input file and an output file with the argparse library. For the output, I am trying to have three scenarios:
user puts -o outputfilename.fasta to create an outfile with custom name
user puts no argument, then it prints output in stdout
user puts -o with no followup, then it should create a file by itself with the name from input .fasta.
#!/usr/bin/python3
import argparse
import re
import sys
c=1
parser = argparse.ArgumentParser()
parser.add_argument("--input", "-i", required=True, dest="inputfile", type=argparse.FileType("r"))
parser.add_argument("--output", "-o", dest="outfilename", type=argparse.FileType("w"), nargs="?", default=sys.stdout, const=open('{}.fasta'.format(sys.argv[2]), "w" ))
args = parser.parse_args()
for line in args.inputfile:
    if c==1:
        line=re.sub ("[#]", ">", line)
        args.outfilename.write (line)
        c=c+1
    elif c==2:
        args.outfilename.write (line)
        c=c+1
    elif c==3:
        c=c+1
    else:
        c=1
I am struggling with the third option, because the way my code is now, it always creates the extra file, but empty. So basically, it always runs my const= option, even though according to the manual, it shouldn't.
(just to be clear: I type -o outfilename.fasta and it produces the file plus an empty one from the input name. I type no argument and it prints it in my commandline and produces the empty inputname file. I type -o and it produces the inputfilename.fasta file with the correct lines in it)
nargs='?'. One argument will be consumed from the command line if possible, and produced as a single item. If no command-line argument is present, the value from default will be produced. Note that for optional arguments, there is an additional case - the option string is present but not followed by a command-line argument. In this case the value from const will be produced.
Because I thought the open command might be problematic, I tried it with
parser.add_argument("--output", "-o", dest="outfilename", type=argparse.FileType("w"), nargs="?", default=sys.stdout, const=argparse.FileType('{}.fasta'.format(sys.argv[2]), "w" ))
(I just wanted another way to write a file without using open)
and weirdly enough it only gave me this error message:
Traceback (most recent call last):
File "./fastqtofastaEXPANDED.py", line 19, in
args.outfilename.write (line)
AttributeError: 'FileType' object has no attribute 'write'
when I used -o argument. So that would tell me the opposite, that it does indeed only use the const option when I type -o, and not in the other cases (since the other ones worked fine, without extra files and without error messages).
I am confused as to why with the open parameter it seems to use const all the time....
I feel like a solution to my problem might be in the action classes, but I couldn't wrap my head around those yet. It would be no problem if const just worked the way the manual says :D Or is the error in the open call after all?
Thanks for your help!
EDIT: Since the const= probably won't work the way I wanted, I've created this work-around.
Basically just said that if the value is None, it will open a new file with name from the first input, minus suffix, plus new suffix.
If someone has a better solution, I am still open to change it :)
parser.add_argument("--output", "-o", dest="outfilename", type=argparse.FileType("w"), nargs="?", default=sys.stdout)
args = parser.parse_args()
if args.outfilename is None:
    # -o was given without a name: reuse the input name with a new suffix
    i=sys.argv[2][:sys.argv[2].rfind(".")]
    args.outfilename=open("{}.fasta".format(i), "w")
#then all the line reading jazz...
if args.outfilename is not sys.stdout:
    args.outfilename.close()
#to close the file, if one was opened.

With this script:
import argparse, sys
# testing the use of `sys.argv` to create a filename (so-so idea)
if sys.argv[1:]:
    constname = f'foobar{sys.argv[1]}.txt'
else:
    constname = 'foobar1.txt'
parser = argparse.ArgumentParser()
a2 = parser.add_argument('number', type=int)
a1 = parser.add_argument('-o','--output', nargs='?', default='foobar0.txt',
    const=constname, type=argparse.FileType('w'))
args = parser.parse_args()
print(args)
Sample runs:
1253:~/mypy$ rm foobar*
1253:~/mypy$ python3 stack63357111.py
usage: stack63357111.py [-h] [-o [OUTPUT]] number
stack63357111.py: error: the following arguments are required: number
1253:~/mypy$ ls foobar*
foobar0.txt
Even though argparse issues an error and quits, it creates the default file. That's because the output default is processed before the required check. Trying to create a const value based on some number in sys.argv is clumsy, and error prone.
1253:~/mypy$ python3 stack63357111.py 2
Namespace(number=2, output=<_io.TextIOWrapper name='foobar0.txt' mode='w' encoding='UTF-8'>)
1254:~/mypy$ ls foobar*
foobar0.txt
argparse creates the default and leaves it open for you to use.
1254:~/mypy$ python3 stack63357111.py 2 -o
Namespace(number=2, output=<_io.TextIOWrapper name='foobar2.txt' mode='w' encoding='UTF-8'>)
1254:~/mypy$ ls foobar*
foobar0.txt foobar2.txt
argparse creates the const based on that number positional
1254:~/mypy$ python3 stack63357111.py 2 -o foobar3.txt
Namespace(number=2, output=<_io.TextIOWrapper name='foobar3.txt' mode='w' encoding='UTF-8'>)
1254:~/mypy$ ls foobar*
foobar0.txt foobar2.txt foobar3.txt
Making a file with the user provided name.
Overall I think it's better to keep the use of FileType simple, and handle special cases after parsing. There's no virtue in doing everything in the parser. Its primary job is to determine what the user wants; your own code does the execution.
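One detail worth spelling out: in the question's code, open(...) is an ordinary expression inside the add_argument call, so Python runs it (and creates the file) as soon as that line executes, whether or not -o is ever given. Below is a minimal sketch of the "handle it after parsing" approach. The names DERIVE, build_parser and resolve_output are made up for illustration, and it assumes the output name should be the input path with its suffix swapped for .fasta.

```python
import argparse
import os
import sys

# Hypothetical sentinel meaning "-o was given without a filename".
DERIVE = object()


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", dest="inputfile", required=True)
    # Only plain values are stored here; nothing is opened while parsing.
    parser.add_argument("-o", "--output", dest="outfilename",
                        nargs="?", default=None, const=DERIVE)
    return parser


def resolve_output(args):
    """Return (file object, should_close) for the three -o cases."""
    if args.outfilename is None:       # no -o at all: write to stdout
        return sys.stdout, False
    if args.outfilename is DERIVE:     # bare -o: derive the name from the input
        stem, _ = os.path.splitext(args.inputfile)
        return open(stem + ".fasta", "w"), True
    return open(args.outfilename, "w"), True   # -o NAME


def main(argv=None):
    args = build_parser().parse_args(argv)
    out, should_close = resolve_output(args)
    try:
        with open(args.inputfile) as infile:
            for line in infile:
                out.write(line)        # placeholder for the real conversion
    finally:
        if should_close:
            out.close()
```

Because a file is opened only inside resolve_output, a parse error or a run without -o leaves no stray empty file behind.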

Related

Setting optional system arguments via command prompt

I am doing a project in which I want to specify one system argument on the command line right after script.py. My problem is that I also want to accept a second argument that is optional, which the user may or may not give. I am struggling with how to handle the fact that this argument might not be present and how to read it. If this sounds confusing, the following might clarify:
The user types the following on the command prompt to run the program:
python script.py file.txt
I want to add an argument which may or may not be given, like:
python script.py file.txt file_added.txt
As I read these arguments in my main script, I thought that this would solve the problem:
if sys.argv[2] is not None:
    file2 = f"\{sys.argv[2]}"
However, I am still getting an IndexError when doing that. So, is there a simple way to get around this problem?
If sys.argv holds fewer than 3 items, sys.argv[2] raises an IndexError. So wrap the statement in a try block:
try:
    filename = sys.argv[2]
except IndexError:
    filename = None
if filename:
    pass  # ... do something
A way to accomplish this would be to check the length of sys.argv. If the length is 3 you'll know that a second argument was passed (3 because the first argument is script.py). So something along the lines:
if len(sys.argv) == 3:
    file2 = f"\{sys.argv[2]}"
The problem is the check sys.argv[2] is not None itself: it reads the third element before testing it.
When only one argument is given, that index is outside the argv list, which raises the IndexError.
If you expect at most two inputs, check the length first with if len(sys.argv) == 3; that means both inputs were given, and you can then read them via sys.argv[1] and sys.argv[2].
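The length check described above can be wrapped in a small helper. This is only a sketch; the function name optional_second_file is made up for illustration, and it assumes the layout from the question (argv[1] is required, argv[2] is optional).

```python
import sys


def optional_second_file(argv):
    """Return the optional second filename, or None if it was not given.

    argv[0] is the script name, argv[1] the required file and
    argv[2] the optional one, as in the question.
    """
    return argv[2] if len(argv) >= 3 else None


def main():
    file2 = optional_second_file(sys.argv)
    if file2 is None:
        print("no second file given")
    else:
        print("second file:", file2)
```

Testing the length before indexing avoids the IndexError without needing a try block.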
You can use argparse, which is a built-in library in Python that makes it easy to handle command-line arguments. See https://docs.python.org/3/library/argparse.html to learn more; a basic implementation for your use case looks like this:
import argparse
parser = argparse.ArgumentParser(description='Enter filenames')
parser.add_argument('-file', type=str,help='enter the file name',dest='filename1')
parser.add_argument('--optional','--op',type=str, dest='filename2',help='enter optional filename')
args = parser.parse_args()
file1=args.filename1
file2=args.filename2
Then in the cmd you can invoke it as
python script.py -file="file1.txt"
or
python script.py -file="file1.txt" --optional="file2.txt"
or
python script.py -file="file1.txt" --op="file2.txt"
You are looking for argv[1], argv[2], and so on.
This should work:
for filename in sys.argv[1:]:
    readfile(filename)

Argparse Arg with > flag

I have a commandline script that works perfectly fine. Now I want to make my tool more intuitive.
I have:
parser.add_argument("-s",help = "'*.sam','*.fasta','*.fastq'", required=True)
Right now, python script.py -s savefile.sam works, but I would like it to be python script.py > savefile.sam
parser.add_argument("->",help = "'*.sam','*.fasta','*.fastq'", required=True)
does not work, as it gives: error: unrecognized arguments: -
Can I do this with argparse, or should I settle for -s?
> savefile.sam is shell syntax and means "send output to the file savefile.sam". Argparse won't even see this part of the command because the shell will interpret it first (assuming you issue this command from a suitable shell).
While your command does make sense, you shouldn't try to use argparse to implement it. Instead, if an -s isn't detected, simply send the script's output to stdout. You can achieve this by setting the default for -s:
parser.add_argument("-s",
                    type=argparse.FileType("w"),
                    help="'*.sam','*.fasta','*.fastq'",
                    default=sys.stdout)
This way, you can run python script.py > savefile.sam, and the following will happen:
The shell will evaluate python script.py.
argparse will see no additional arguments, and will use the default sys.stdout.
Your script will send output to stdout.
The shell will redirect the script's output from stdout to savefile.sam.
Of course, you can also send the script's stdout into the stdin of another process using a pipe.
Note that, with FileType, it's also legal to use -s - to specify stdout. See the argparse FileType documentation for details.
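As a quick check of the two stdout cases described above (no -s at all, and -s -), here is a small sketch. The helper name parse is made up for illustration; the argument definition is the one from the answer, minus the help text.

```python
import argparse
import sys


def parse(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", type=argparse.FileType("w"), default=sys.stdout)
    return parser.parse_args(argv)


def main():
    # No -s: the default (sys.stdout) is used, so shell redirection works.
    args = parse([])
    args.s.write("written via the default\n")

    # "-s -": FileType maps "-" to sys.stdout for write modes.
    args = parse(["-s", "-"])
    args.s.write("written via -s -\n")
```

In both cases args.s is sys.stdout, so python script.py > savefile.sam captures the output without argparse being involved in the redirection.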
In a sense your argparse works
import argparse
import sys
print sys.argv
parser=argparse.ArgumentParser()
parser.add_argument('->')
print parser.parse_args('-> test'.split())
print parser.parse_args()
with no arguments, it just assigns None to the > attribute. Note though that you can't access this attribute as args.>. You'd have to use getattr(args,'>') (which is what argparse uses internally). Better yet, assign this argument a proper long name or dest.
1401:~/mypy$ python stack29233375.py
['stack29233375.py']
Namespace(>='test')
Namespace(>=None)
But if I give a -> test argument, the shell redirection consumes the >, as shown below:
1405:~/mypy$ python stack29233375.py -> test
usage: stack29233375.py [-h] [-> >]
stack29233375.py: error: unrecognized arguments: -
1408:~/mypy$ cat test
['stack29233375.py', '-']
Namespace(>='test')
Only the - passes through in argv, and on to the parser. So it complains about unrecognized arguments. An optional positional argument could have consumed this string, resulting in no errors.
Notice that I had to look at the test file to see the rest of the output - because the shell redirected stdout to test. The error message goes to stderr, so it doesn't get redirected.
You can change the prefix char from - (or in addition to it) with an ArgumentParser parameter. But you can't use such a character alone. The flag must be prefix + char (i.e. 2 characters).
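The ArgumentParser parameter in question is prefix_chars. A minimal sketch, assuming a hypothetical +s flag purely for illustration (a > prefix would still not help, since the shell consumes > before the script runs):

```python
import argparse


def build_parser():
    # Accept "+" in addition to the usual "-" as an option prefix.
    parser = argparse.ArgumentParser(prefix_chars="-+")
    # The flag is prefix + char, i.e. two characters.
    parser.add_argument("+s", dest="savefile")
    return parser


def main():
    args = build_parser().parse_args(["+s", "savefile.sam"])
    print(args.savefile)
```

Giving the argument an explicit dest also sidesteps the getattr workaround mentioned earlier.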
Finally, since this argument is required, do you even need a flag? Just make it a positional argument. It's quite common for scripts to take input and/or output file names as positional arguments.

Multiple arguments with stdin in Python

I have a burning question that concerns passing multiple stdin arguments when running a Python script from a Unix terminal.
Consider the following command:
$ cat file.txt | python3.1 pythonfile.py
Then the contents of file.txt (accessed through the "cat" command) will be passed to the python script as standard input. That works fine (although a more elegant way would be nice). But now I have to pass another argument, which is simply a word which will be used as a query (and later two words). But I cannot find out how to do that properly, as the cat pipe will yield errors. And you can't use the standard input() in Python because it will result in an EOF-error (you cannot combine stdin and input() in Python).
I am reasonably sure that the stdin marker will do the trick:
cat file.txt | python3.1 prearg - postarg
The more elegant way is probably to pass file.txt as an argument then open and read it.
The argparse module would give you a lot more flexibility to play with command line arguments.
import argparse
parser = argparse.ArgumentParser(prog='uppercase')
parser.add_argument('-f','--filename',
                    help='Any text file will do.')                  # filename arg
parser.add_argument('-u','--uppercase', action='store_true',
                    help='If set, all letters become uppercase.')   # boolean arg
args = parser.parse_args()
if args.filename:                       # if a filename is supplied...
    print('reading file...')
    f = open(args.filename).read()
    if args.uppercase:                  # and if boolean argument is given...
        print(f.upper())                # do your thing
    else:
        print(f)                        # or do nothing
else:
    parser.print_help()                 # or print help
So when you run without arguments you get:
/home/myuser$ python test.py
usage: uppercase [-h] [-f FILENAME] [-u]

optional arguments:
  -h, --help            show this help message and exit
  -f FILENAME, --filename FILENAME
                        Any text file will do.
  -u, --uppercase       If set, all letters become uppercase.
Let's say there is an absolute need for one to pass content as stdin, not filepath because your script resides in a docker container or something, but you also have other arguments that you are required to pass...so do something like this
import sys
import argparse
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-dothis', '--DoThis', help='True or False', required=True)
    # add as many such arguments as u want
    args = vars(parser.parse_args())
    if args['DoThis']=="True":
        content = ""
        for line in sys.stdin:
            content = content + line
        print("stdin - " + content)
To run this script do
$ cat abc.txt | script.py -dothis True
$ echo "hello" | script.py -dothis True
The variable content would store in it whatever was printed out on the left side of the pipe, '|', and you would also be able to provide script arguments.
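Applied to the original question (piped text plus a query word), the same idea could look like the sketch below. The function names and the positional query argument are assumptions made for illustration, not part of any answer above.

```python
import argparse
import sys


def matching_lines(stream, query):
    """Return the lines of `stream` that contain `query`."""
    return [line for line in stream if query in line]


def main(argv=None, stream=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("query", help="word to search for in the piped text")
    args = parser.parse_args(argv)
    # Arguments come from argv; the piped content comes from stdin.
    stream = sys.stdin if stream is None else stream
    for line in matching_lines(stream, args.query):
        sys.stdout.write(line)
```

It would be run as cat file.txt | python3 pythonfile.py word, with main() called at the bottom of the script.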
While Steve Barnes' answer will work, it isn't really the most "pythonic" way of doing things. A more elegant way is to use sys.argv and open and read the file in the script itself. That way you don't have to pipe the contents of the file and figure out a workaround; you can just pass the file name as another parameter.
Something like (in the python script):
import sys
with open(sys.argv[1].strip()) as f:
    file_contents = f.readlines()
    # Do basic transformations on file contents here
    transformed_file_contents = format(file_contents)
# Do the rest of your actions outside the with block,
# this will allow the file to close and is the idiomatic
# way to do this in python
So (in the command line):
python3.1 pythonfile.py file.txt postarg1 postarg2

Python stdin filename

I'm trying to get the filename that's given on the command line. For example:
python3 ritwc.py < DarkAndStormyNight.txt
I'm trying to get DarkAndStormyNight.txt
When I try fileinput.filename() I get back the same thing as with sys.stdin. Is this possible? I'm not looking for sys.argv[0], which returns the current script name.
Thanks!
In general it is not possible to obtain the filename in a platform-agnostic way. The other answers cover sensible alternatives like passing the name on the command-line.
On Linux, and some related systems, you can obtain the name of the file through the following trick:
import os
print(os.readlink('/proc/self/fd/0'))
/proc/ is a special filesystem on Linux that gives information about processes on the machine. self means the current running process (the one that opens the file). fd is a directory containing symbolic links for each open file descriptor in the process. 0 is the file descriptor number for stdin.
You can use ArgumentParser, which automatically gives you an interface to the command-line arguments, and even provides help, etc.
from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument('fname', metavar='FILE', help='file to process')
args = parser.parse_args()
with open(args.fname) as f:
    pass  # do stuff with f
Now you call python3 ritwc.py DarkAndStormyNight.txt. If you call python3 ritwc.py with no argument, it'll give an error saying it expected an argument for FILE. You can also now call python3 ritwc.py -h and it will explain that a file to process is required.
PS here's a great intro in how to use it: http://docs.python.org/3.3/howto/argparse.html
In fact, as it seems that Python cannot see the filename when stdin is redirected from the console, you have an alternative:
Call your program like this:
python3 ritwc.py -i your_file.txt
and then add the following code to redirect the stdin from inside python, so that you have access to the filename through the variable "filename_in":
import sys
flag=0
for arg in sys.argv:
    if flag:
        filename_in = arg
        break
    if arg=="-i":
        flag=1
sys.stdin = open(filename_in, 'r')
#the rest of your code...
If now you use the command:
print(sys.stdin.name)
you get your filename. However, when you run the same print command after redirecting stdin from the console, the result is <stdin>, which is evidence that Python can't see the filename that way.
I don't think it's possible. As far as your python script is concerned it's writing to stdout. The fact that you are capturing what is written to stdout and writing it to file in your shell has nothing to do with the python script.

Python, optparse and file mask

from optparse import OptionParser
if __name__=='__main__':
    parser = OptionParser()
    parser.add_option("-i", "--input_file",
                      dest="input_filename",
                      help="Read input from FILE", metavar="FILE")
    (options, args) = parser.parse_args()
    print(options)
result is
$ python convert.py -i video_*
{'input_filename': 'video_1.wmv'}
there are video_[1-6].wmv in the current folder.
The question is why video_* becomes video_1.wmv. What am I doing wrong?
Python has nothing to do with this -- it's the shell.
Call
$ python convert.py -i 'video_*'
and it will pass in that wildcard.
The other five values were passed in as args, not attached to the -i, exactly as if you'd run python convert.py -i video_1 video_2 video_3 video_4 video_5 video_6, and the -i only attaches to the immediately following parameter.
That said, your best bet might be to just read your input filenames from args, rather than using options.input_filename.
Print out args and you'll see where the other files are going...
They are being converted to separate arguments in argv, and optparse only takes the first one as the value for the input_filename option.
To clarify:
aprogram -e *.wmv
On a Linux shell, all wildcards (*.wmv) are expanded by the shell. So aprogram actually receives the arguments:
sys.argv == ['aprogram', '-e', '1.wmv', '2.wmv', '3.wmv']
Like Charles said, you can quote the argument to get it to pass in literally:
aprogram -e "*.wmv"
This will pass in:
sys.argv == ['aprogram', '-e', '*.wmv']
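When the pattern arrives literally like this, the script has to expand it itself. A minimal sketch using the standard glob module; the helper name expand_patterns is made up for illustration, and keeping an unmatched pattern unchanged is an assumption that mirrors common shell behaviour.

```python
import glob


def expand_patterns(patterns):
    """Expand each pattern with glob; keep a pattern as-is if nothing matches."""
    filenames = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        filenames.extend(matches if matches else [pattern])
    return filenames


def main():
    # '*.wmv' here stands for the literal string received from the command line.
    for name in expand_patterns(["*.wmv"]):
        print(name)
```

Names that the shell has already expanded pass through unchanged, so the same helper works whether or not the argument was quoted.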
It isn't obvious, even if you read some of the standards (like this or this).
The args part of a command line are -- almost universally -- the input files.
There are only very rare odd-ball cases where an input file is specified as an option. It does happen, but it's very rare.
Also, the output files are never named as args. They almost always are provided as named options.
The idea is that
Most programs can (and should) read from stdin. The command-line argument of - is a code for "stdin". If no arguments are given, stdin is the fallback plan.
If your program opens any files, it may as well open an unlimited number of files specified on the command line. The shell facilitates this by expanding wild-cards for you. [Windows doesn't do this for you, however.]
Your program should never overwrite a file without an explicit command-line option, like '-o somefile' to write to a file.
Note that cp, mv, rm are the big examples of programs that don't follow these standards.
