Python, optparse and file mask


from optparse import OptionParser

if __name__ == '__main__':
    parser = OptionParser()
    parser.add_option("-i", "--input_file",
                      dest="input_filename",
                      help="Read input from FILE", metavar="FILE")
    (options, args) = parser.parse_args()
    print(options)
The result is:
$ python convert.py -i video_*
{'input_filename': 'video_1.wmv'}
The current folder contains video_1.wmv through video_6.wmv.
Why does video_* become video_1.wmv? What am I doing wrong?

Python has nothing to do with this -- it's the shell.
Call
$ python convert.py -i 'video_*'
and it will pass in that wildcard.
The other six values were passed in as args, not attached to the -i, exactly as if you'd run python convert.py -i video_1 video_2 video_3 video_4 video_5 video_6, and the -i only attaches to the immediate next parameter.
That said, your best bet might be to just read your input filenames from args, rather than using options.input_filename.

Print out args and you'll see where the other files are going...
They are being converted to separate arguments in argv, and optparse only takes the first one as the value for the input_filename option.
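To see this without involving the shell at all, you can hand parse_args an explicit list standing in for the already-expanded sys.argv[1:]. A minimal sketch; the parse helper and the three file names are illustrative, not from the question:

```python
from optparse import OptionParser


def parse(argv):
    """Parse argv the same way the question's script does."""
    parser = OptionParser()
    parser.add_option("-i", "--input_file",
                      dest="input_filename",
                      help="Read input from FILE", metavar="FILE")
    return parser.parse_args(argv)


# Roughly what the shell passes after expanding video_* (three files for brevity).
expanded = ["-i", "video_1.wmv", "video_2.wmv", "video_3.wmv"]
options, args = parse(expanded)
print(options.input_filename)  # video_1.wmv
print(args)                    # ['video_2.wmv', 'video_3.wmv']

# What it passes when the pattern is quoted.
options, args = parse(["-i", "video_*"])
print(options.input_filename)  # video_*
print(args)                    # []
```

Only the first name attaches to -i; the rest end up in args.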

To clarify:
aprogram -e *.wmv
In a Linux shell, all wildcards (*.wmv) are expanded by the shell, so aprogram actually receives the arguments:
sys.argv == ['aprogram', '-e', '1.wmv', '2.wmv', '3.wmv']
Like Charles said, you can quote the argument to get it to pass in literally:
aprogram -e "*.wmv"
This will pass in:
sys.argv == ['aprogram', '-e', '*.wmv']

It isn't obvious, even if you read some of the standards (like this or this).
The args part of a command line are -- almost universally -- the input files.
There are only very rare odd-ball cases where an input file is specified as an option. It does happen, but it's very rare.
Also, the output files are never named as args. They almost always are provided as named options.
The idea is that
Most programs can (and should) read from stdin. The command-line argument of - is a code for "stdin". If no arguments are given, stdin is the fallback plan.
If your program opens any files, it may as well open an unlimited number of files specified on the command line. The shell facilitates this by expanding wild-cards for you. [Windows doesn't do this for you, however.]
Your program should never overwrite a file without an explicit command-line option, like '-o somefile', to write to a file.
Note that cp, mv, rm are the big examples of programs that don't follow these standards.
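Those conventions map fairly directly onto argparse plus the standard fileinput module, which already treats - (and an empty file list) as stdin. A minimal sketch; the line-counting task, the -o/--output name and the helper functions are illustrative assumptions:

```python
import argparse
import fileinput


def build_parser():
    parser = argparse.ArgumentParser(description="Count lines in the input files.")
    # Input files are positional; the shell expands wildcards into this list.
    # "-" (or no files at all) means stdin.
    parser.add_argument("files", nargs="*", metavar="FILE")
    # The output file must be named explicitly; "-" (the default) means stdout.
    parser.add_argument("-o", "--output", default="-", metavar="OUTFILE")
    return parser


def count_lines(files):
    # FileInput reads each file in turn and falls back to stdin for "-" or [].
    with fileinput.FileInput(files=files) as lines:
        return sum(1 for _ in lines)


def main(argv=None):
    args = build_parser().parse_args(argv)
    total = count_lines(args.files)
    if args.output == "-":
        print(total)
    else:
        # Only write to a file when the user asked for one with -o.
        with open(args.output, "w") as out:
            print(total, file=out)
```

Called from an `if __name__ == "__main__":` block, `main()` handles `*.txt -o total.txt`, `cat a.txt | python count.py` and `a.txt - b.txt` the way the list above describes (count.py being whatever the script is called).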

Related

Prevent expansion of wildcards in non-quoted python script argument when running in UNIX environment

I have a python script that I'd like to supply with an argument (usually) containing wildcards, referring to a series of files that I'd like to do stuff with. Example here:
#!/usr/bin/env python
import argparse
import glob
parser = argparse.ArgumentParser()
parser.add_argument('-i', action="store", dest="i")
results = parser.parse_args()
print 'argument i is: ', results.i
list_of_matched_files = glob.glob(results.i)
In this case, everything works great if the user adds quotes to the passed argument like so:
./test_script.py -i "foo*.txt"
...but oftentimes users forget to add the quotes and are stumped when the list contains only the first match, because UNIX has already expanded the pattern and argparse only gets the first list element.
Is there a way (within the script) to prevent UNIX from expanding the list before passing it to python? Or maybe even just to test if the argument doesn't contain quotes and then warn the user?

No. Wildcards are expanded by the shell (Bash, zsh, csh, fish, whatever) before the script even runs, and the script can't do anything about them. Testing whether the argument contains quotes also won't work, as the shell similarly strips the quotes from "foo*.txt" before passing the argument to the script, so all Python sees is foo*.txt.
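Since the script can't stop the expansion, one workaround (not from the answers here) is to accept both forms: let -i take one or more values, and glob any value that still contains wildcard characters, which means it arrived quoted. A sketch; parse_inputs and expand_inputs are hypothetical names, and a real file name containing *, ? or [ would be misread as a pattern:

```python
import argparse
import glob

# Characters that make glob treat a string as a pattern.
WILDCARD_CHARS = "*?["


def expand_inputs(values):
    """Flatten -i values into a list of file names.

    Unquoted foo*.txt: the shell already expanded it, so each value is a
    plain name and is kept as-is. Quoted "foo*.txt": a single value that
    still contains wildcards, so it is expanded here with glob.
    """
    files = []
    for value in values:
        if any(ch in value for ch in WILDCARD_CHARS):
            files.extend(sorted(glob.glob(value)))
        else:
            files.append(value)
    return files


def parse_inputs(argv=None):
    parser = argparse.ArgumentParser()
    # nargs="+" lets -i collect every name the shell produced, not just the first.
    parser.add_argument("-i", dest="i", nargs="+", required=True)
    results = parser.parse_args(argv)
    return expand_inputs(results.i)
```

With this, `./test_script.py -i foo*.txt` and `./test_script.py -i "foo*.txt"` should give the same list (apart from ordering differences between the shell and sorted()).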

It's not UNIX that is doing the expansion, it is the shell.
Bash has an option set -o noglob (or -f) which turns off globbing (filename expansion), but that is non-standard.
If you give an end-user access to the command-line then they really should know about quoting. For example, the commonly used find command has a -name parameter which can take glob constructs but they have to be quoted in a similar manner. Your program is no different to any other.
If users can't handle that then maybe you should give them a different interface. You could go to the extreme of writing a GUI or a web/HTML front-end, but that's probably over the top.
Or why not prompt for the filename pattern? You could, for example, use a -p option to indicate prompting, e.g.:
import argparse
import glob
parser = argparse.ArgumentParser()
parser.add_argument('-i', action="store", dest="i")
parser.add_argument('-p', action="store_true", default=False)
results = parser.parse_args()
if results.p:
    pattern = raw_input("Enter filename pattern: ")
else:
    pattern = results.i
list_of_matched_files = glob.glob(pattern)
print list_of_matched_files
(I have assumed Python 2 because of your print statement)
Here the input is not read by the shell but by python, which will not expand glob constructs unless you ask it to.

You can disable the expansion using set -f from the command line. (re-enable with set +f).
As jwodder correctly says though, this happens before the script is run, so the only way I can think of to do this is to wrap it with a shell script that disables expansion temporarily, runs the python script, and re-enables expansion. Preventing UNIX from expanding the list before passing it to python is not possible.

Here is an example for the Bash shell that shows what @Tom Wyllie is talking about:
alias sea='set -f; search_function'
search_function() { perl /home/scripts/search.pl "$@" ; set +f; }
This defines an alias called "sea" that:
Turns off expansion ("set -f")
Runs search_function, which calls the Perl script
Turns expansion back on ("set +f")
The problem with this is that if a user stops execution with ^C or some such then the expansion may not be turned back on leaving the user puzzling why "ls *" is not working. So I'm not necessarily advocating using this. :).

This worked for me:
import sys
files = sys.argv[1:]
Even though only one string is on the command line, the shell expands the wildcards and fills sys.argv[] with the list.

Argparse optional stdin read and/or stdout out

A non-Python program will call a Python program with both input and output arguments. Input may be a file reference or a string redirected to stdin in the non-Python program. Output may be a file or stdout.
argparse.FileType seems ready to handle this, since it already treats the special value - as stdin/stdout. In fact, using - to direct output to stdout works, but I don't know the implementation/syntax for stdin.
Examples calls in the non-Python code:
python mycode.py - output.txt
python mycode.py - -
What does the non-Python code do after that? Print/stdout an input string?
What does the Python code do after that?
I will always need to distinguish where both args are going (i.e. input and output), so neither default="-" nor default=sys.stdin in add_argument will work, because defaults only apply when the argument is absent.
Here's what I have so far:
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument('read_fref', type=argparse.FileType('r'))
parser.add_argument('write_fref', type=argparse.FileType('w'))
parser_ns = parser.parse_args()
with parser_ns.read_fref as f_r:
    read_f = json.load(f_r)
output = {'k': 'v'}
with parser_ns.write_fref as f_w:
    json.dump(output, f_w)

I'm having trouble understanding what you are asking. I understand what Python and argparse are doing, but I don't quite understand what you are trying to do.
Your sample looks like it would run fine when called from a Linux shell. With the - arguments, it should accept input from the keyboard and display output on the screen. But those arguments are most often used with the shell redirection controls >, <, | (details vary with the shell: sh, bash, etc.).
But if you are using the shell to redirect stdin or stdout to/from files, you could just as well give those files as commandline arguments.
If you are bothered by required/default issue, consider making these arguments flagged (also called optionals):
parser.add_argument('-r','--readfile', type=argparse.FileType('r'), default='-')
parser.add_argument('-w','--writefile', type=argparse.FileType('w'), default='-')
With this change, these calls are the same
python mycode.py -r - <test.json
python mycode.py <test.json
python mycode.py -r test.json
all writing to the screen (stdout). That could be redirected in similar ways.
To take typed input:
python mycode.py
{...}
^D
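Both styles from this answer can be checked without a terminal by giving parse_args a list. A sketch of the flagged version applied to the question's JSON code; close_unless_std is a hypothetical helper that avoids closing stdin/stdout, which the question's with blocks would otherwise do:

```python
import argparse
import json
import sys


def build_parser():
    parser = argparse.ArgumentParser()
    # FileType turns "-" into sys.stdin (mode "r") or sys.stdout (mode "w").
    # argparse also runs string defaults through the type, so "-" works as a default.
    parser.add_argument("-r", "--readfile", type=argparse.FileType("r"), default="-")
    parser.add_argument("-w", "--writefile", type=argparse.FileType("w"), default="-")
    return parser


def close_unless_std(f):
    # Closing sys.stdin/sys.stdout would break later input()/print() calls.
    if f not in (sys.stdin, sys.stdout):
        f.close()


def main(argv=None):
    ns = build_parser().parse_args(argv)
    read_f = json.load(ns.readfile)
    close_unless_std(ns.readfile)
    output = {"k": "v"}
    json.dump(output, ns.writefile)
    close_unless_std(ns.writefile)
    return read_f
```

The positional version in the question also accepts - for either argument (`python mycode.py - -`), since FileType handles - the same way there.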

Python stdin filename

I'm trying to get the filename that's given on the command line. For example:
python3 ritwc.py < DarkAndStormyNight.txt
I'm trying to get DarkAndStormyNight.txt
When I try fileinput.filename(), I get the same result as with sys.stdin. Is this possible? I'm not looking for sys.argv[0], which returns the current script name.

In general it is not possible to obtain the filename in a platform-agnostic way. The other answers cover sensible alternatives like passing the name on the command-line.
On Linux, and some related systems, you can obtain the name of the file through the following trick:
import os
print(os.readlink('/proc/self/fd/0'))
/proc/ is a special filesystem on Linux that gives information about processes on the machine. self means the current running process (the one that opens the file). fd is a directory containing symbolic links for each open file descriptor in the process. 0 is the file descriptor number for stdin.
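The same /proc trick works for any descriptor number, not only 0. A sketch wrapped in a hypothetical fd_path helper that returns None where /proc isn't available (non-Linux systems) instead of raising; note that when stdin is a pipe or a terminal, the link target is not a regular file name:

```python
import os


def fd_path(fd=0):
    """Return the target of /proc/self/fd/<fd> (Linux-specific), or None.

    For a redirected file this is its absolute path; for a pipe or a
    terminal it looks like 'pipe:[12345]' or '/dev/pts/0' instead.
    """
    try:
        return os.readlink(f"/proc/self/fd/{fd}")
    except OSError:
        # No /proc (e.g. macOS, Windows) or no such descriptor.
        return None


if __name__ == "__main__":
    # python3 ritwc.py < DarkAndStormyNight.txt  ->  absolute path of that file
    print(fd_path(0))
```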

You can use ArgumentParser, which automatically gives you an interface to command-line arguments, and even provides help, etc.
from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument('fname', metavar='FILE', help='file to process')
args = parser.parse_args()
with open(args.fname) as f:
    pass  # do stuff with f
Now you call python3 ritwc.py DarkAndStormyNight.txt. If you call python3 ritwc.py with no argument, it'll give an error saying it expected an argument for FILE. You can also now call python3 ritwc.py -h and it will explain that a file to process is required.
PS here's a great intro in how to use it: http://docs.python.org/3.3/howto/argparse.html

In fact, since Python apparently cannot see the filename when stdin is redirected from the console, you have an alternative:
Call your program like this:
python3 ritwc.py -i your_file.txt
and then add the following code to redirect the stdin from inside python, so that you have access to the filename through the variable "filename_in":
import sys
flag=0
for arg in sys.argv:
    if flag:
        filename_in = arg
        break
    if arg=="-i":
        flag=1
sys.stdin = open(filename_in, 'r')
#the rest of your code...
If now you use the command:
print(sys.stdin.name)
you get your filename; however, when you run the same print command after redirecting stdin from the console, you get the result <stdin>, which is evidence that Python can't see the filename that way.
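argparse can replace the manual sys.argv scan above while keeping the fallback to the console. A sketch, with open_input as a hypothetical helper: with -i it opens the named file, without it returns sys.stdin, and the stream's .name tells you which one you got.

```python
import argparse
import sys


def open_input(argv=None):
    """Return the file given with -i, or sys.stdin when -i is absent."""
    parser = argparse.ArgumentParser()
    # FileType opens the named file; the "-" default maps to sys.stdin.
    parser.add_argument("-i", dest="infile", type=argparse.FileType("r"), default="-")
    return parser.parse_args(argv).infile
```

In the script, `sys.stdin = open_input()` keeps the rest of the code reading from sys.stdin as in the answer. `python3 ritwc.py -i DarkAndStormyNight.txt` then gives `sys.stdin.name == 'DarkAndStormyNight.txt'`, while `python3 ritwc.py < DarkAndStormyNight.txt` still shows `<stdin>`.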

I don't think it's possible. As far as your Python script is concerned, it's reading from stdin. The fact that your shell opens the file and connects it to stdin has nothing to do with the Python script.

How to use python in windows to open javascript, have it interpreted by WScript, and pass it the command line arguments

I have a format holding paths to files and command line arguments to pass to those files when they are opened in Windows.
For example, I might have a path to a JavaScript file and a list of command-line arguments to pass to it. In that case I want to open the JavaScript file the same way os.startfile would and pass it the command-line arguments. Since the arguments are saved as a string, I would like to pass them as a string, but I can also pass them as a list if need be.
I am not quite sure what I should be using for this, since a .js file is not an executable and therefore raises errors in Popen, while startfile only takes a verb (operation) as its second argument.
This problem can be extended to an arbitrary number of file extensions that need to be opened, and passed command line arguments, but will be interpreted by a true executable when opening.

If Windows has registered the .js extension to open with wscript, you can do this by leaving that decision up to the Windows shell.
You can just use os.system() to do the same thing as you would do when you type it at the command prompt, for example:
import os
os.system('example.js arg1 arg2')
You can also use the start command:
os.system('start example.js arg1 arg2')
If you need more power, for example to get results, you can use subprocess.Popen(), but make sure to use shell=True (so that the shell can call the right application):
from subprocess import Popen
p = Popen('example.js arg1 arg2', shell=True)
# you can also pass the filename and arguments separately:
# p = Popen(['example.js', 'arg1', 'arg2'], shell=True)
stdoutdata, stderrdata = p.communicate()
(Although this would probably require cscript instead of wscript)
If Windows doesn't have any default application to open the file with (or if it's not the one you want), well, you're on your own of course...
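If the association isn't the one you want, one option is to name the script host yourself; the answer already hints that cscript may be needed. A sketch, assuming cscript.exe is on PATH as on a standard Windows install; wsh_command is a hypothetical helper name:

```python
import subprocess
import sys


def wsh_command(script, args, host="cscript"):
    """Build an argument list that runs a .js/.vbs file under Windows Script Host.

    cscript is the console host (its output can be captured); wscript is the
    windowed host. //nologo suppresses the banner cscript prints first.
    """
    return [host, "//nologo", script, *args]


if __name__ == "__main__" and sys.platform == "win32":
    # cscript.exe is a real executable, so no shell=True is needed here.
    result = subprocess.run(wsh_command("example.js", ["arg1", "arg2"]),
                            capture_output=True, text=True)
    print(result.stdout)
```

If the arguments are stored as one string, on Windows you can pass a single command string instead, e.g. `subprocess.run('cscript //nologo example.js ' + arg_string)`. On Python 3.10 and later, os.startfile also accepts an `arguments` parameter, which covers the original startfile approach.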

Why does subprocess.Popen() with shell=True work differently on Linux vs Windows?

When using subprocess.Popen(args, shell=True) to run "gcc --version" (just as an example), on Windows we get this:
>>> from subprocess import Popen
>>> Popen(['gcc', '--version'], shell=True)
gcc (GCC) 3.4.5 (mingw-vista special r3) ...
So it's nicely printing out the version as I expect. But on Linux we get this:
>>> from subprocess import Popen
>>> Popen(['gcc', '--version'], shell=True)
gcc: no input files
That's because gcc hasn't received the --version option.
The docs don't specify exactly what should happen to the args under Windows, but it does say, on Unix, "If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional shell arguments." IMHO the Windows way is better, because it allows you to treat Popen(arglist) calls the same as Popen(arglist, shell=True) ones.
Why the difference between Windows and Linux here?

Actually on Windows, it does use cmd.exe when shell=True - it prepends cmd.exe /c (it actually looks up the COMSPEC environment variable but defaults to cmd.exe if not present) to the shell arguments. (On Windows 95/98 it uses the intermediate w9xpopen program to actually launch the command).
So the strange implementation is actually the UNIX one, which does the following (where each space separates a different argument):
/bin/sh -c gcc --version
It looks like the correct implementation (at least on Linux) would be:
/bin/sh -c "gcc --version" gcc --version
Since this would set the command string from the quoted parameters, and pass the other parameters successfully.
From the sh man page section for -c:
Read commands from the command_string operand instead of from the standard input. Special parameter 0 will be set from the command_name operand and the positional parameters ($1, $2, etc.) set from the remaining argument operands.
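This is easy to confirm from Python on a POSIX system: with shell=True, only the first list item is the command string, and the rest become $0, $1, ... A small sketch (run_sh is just an illustrative helper):

```python
import os
import subprocess


def run_sh(args):
    """Run a list through subprocess with shell=True and return its stdout."""
    return subprocess.run(args, shell=True, capture_output=True, text=True).stdout


if os.name == "posix":
    # Same as: /bin/sh -c echo hello  -> echo gets no arguments; "hello" becomes $0.
    print(repr(run_sh(["echo", "hello"])))               # '\n'
    # Referencing $0 and $1 in the command string shows where the extra items went.
    print(repr(run_sh(["echo $0 $1", "zero", "one"])))   # 'zero one\n'
```

That is exactly why `Popen(['gcc', '--version'], shell=True)` runs gcc with no arguments on Linux.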
This patch seems to fairly simply do the trick:
--- subprocess.py.orig 2009-04-19 04:43:42.000000000 +0200
+++ subprocess.py 2009-08-10 13:08:48.000000000 +0200
@@ -990,7 +990,7 @@
args = list(args)
if shell:
- args = ["/bin/sh", "-c"] + args
+ args = ["/bin/sh", "-c"] + [" ".join(args)] + args
if executable is None:
executable = args[0]

From the subprocess.py source:
On UNIX, with shell=True: If args is a string, it specifies the
command string to execute through the shell. If args is a sequence,
the first item specifies the command string, and any additional items
will be treated as additional shell arguments.
On Windows: the Popen class uses CreateProcess() to execute the child
program, which operates on strings. If args is a sequence, it will be
converted to a string using the list2cmdline method. Please note that
not all MS Windows applications interpret the command line the same
way: The list2cmdline is designed for applications using the same
rules as the MS C runtime.
That doesn't answer why, just clarifies that you are seeing the expected behavior.
The "why" is probably that on UNIX-like systems, command arguments are actually passed through to applications (using the exec* family of calls) as an array of strings. In other words, the calling process decides what goes into EACH command line argument. Whereas when you tell it to use a shell, the calling process actually only gets the chance to pass a single command line argument to the shell to execute: The entire command line that you want executed, executable name and arguments, as a single string.
But on Windows, the entire command line (according to the above documentation) is passed as a single string to the child process. If you look at the CreateProcess API documentation, you will notice that it expects all of the command line arguments to be concatenated together into a big string (hence the call to list2cmdline).
Plus there is the fact that on UNIX-like systems there actually is a shell that can do useful things, so I suspect that the other reason for the difference is that on Windows, shell=True does nothing, which is why it is working the way you are seeing. The only way to make the two systems act identically would be for it to simply drop all of the command line arguments when shell=True on Windows.
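The list2cmdline call mentioned in the docstring can be tried directly to see what string Windows would receive. It is not part of subprocess's documented API (it is defined on every platform, though), so treat this as a way to inspect the behaviour rather than something to build on:

```python
# list2cmdline is an undocumented helper inside subprocess; it joins a list
# into one command line using MS C runtime quoting rules.
from subprocess import list2cmdline

print(list2cmdline(["gcc", "--version"]))               # gcc --version
print(list2cmdline(["cp", "My File", "New Location"]))  # cp "My File" "New Location"
```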

The reason for the UNIX behaviour of shell=True is to do with quoting. When we write a shell command, it will be split at spaces, so we have to quote some arguments:
cp "My File" "New Location"
This leads to problems when our arguments contain quotes, which requires escaping:
grep -r "\"hello\"" .
Sometimes we can get awful situations where \ must be escaped too!
Of course, the real problem is that we're trying to use one string to specify multiple strings. When calling system commands, most programming languages avoid this by allowing us to send multiple strings in the first place, hence:
Popen(['cp', 'My File', 'New Location'])
Popen(['grep', '-r', '"hello"', '.'])
Sometimes it can be nice to run "raw" shell commands; for example, if we're copy-pasting something from a shell script or a Web site, and we don't want to convert all of the horrible escaping manually. That's why the shell=True option exists:
Popen(['cp "My File" "New Location"'], shell=True)
Popen([r'grep -r "\"hello\"" .'], shell=True)
I'm not familiar with Windows so I don't know how or why it behaves differently.
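For the copy-paste case described above, Python's shlex module can convert between the two forms using POSIX shell quoting rules (not cmd.exe's), so the escaping doesn't have to be done by hand. A minimal sketch, assuming Python 3.8+ for shlex.join:

```python
import shlex

# Turn a copy-pasted shell command into the list form that needs no shell=True.
raw = r'grep -r "\"hello\"" .'
args = shlex.split(raw)
print(args)  # ['grep', '-r', '"hello"', '.']

# Going the other way: quote each item so the string is safe to hand to /bin/sh.
print(shlex.join(["cp", "My File", "New Location"]))  # cp 'My File' 'New Location'
```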
