Order of the arguments matters in getopt - python

My application parses the command line arguments:
import sys
import getopt
arguments = sys.argv[1:]
options, remainder = getopt.getopt(arguments, "aa:bb:cc:dd:h", ["aaaa=", "bbbb=", "cccc=", "dddd=", "help"])
print dict(options)
This works, but oddly, if I put the arguments in a different order, they don't get parsed:
python my_app.py --aaaa=value1 --bbbb=value2 --cccc=value3 --dddd=value4 #ok
python my_app.py --dddd=value4 --bbbb=value2 --cccc=value3 --aaaa=value1 # empty
That's disappointing because the order of the arguments shouldn't matter, should it? Is there any way to solve that?
UPDATE:
python my_app.py -aa value1 # odd, empty { "-a" : "" }
python my_app.py -a value1 # even this empty { "-a" : "" }

As stated in the first comment to your question, your main example regarding failed parsing of arguments in a different order works just fine:
~/tmp/so$ python my_app.py --aaaa=value1 --bbbb=value2 --cccc=value3 --dddd=value4
{'--aaaa': 'value1', '--cccc': 'value3', '--dddd': 'value4', '--bbbb': 'value2'}
~/tmp/so$ python my_app.py --dddd=value4 --bbbb=value2 --cccc=value3 --aaaa=value1
{'--cccc': 'value3', '--bbbb': 'value2', '--aaaa': 'value1', '--dddd': 'value4'}
If that's not the case for you, please update the script to print the remainder as well, and show its output.
However, you have still misused the getopt library, and that's why the latest examples you provided don't work as expected. You can't specify more than a single character per short option, since the second character counts as a new, separate option. getopt provides no way to distinguish between two consecutive characters that are two separate options (the first one taking no argument value, as it is not followed by a colon) and a single option composed of two characters. From getopt.getopt's documentation:
options is the string of option letters that the script wants to recognize, with options that require an argument followed by a colon.
Therefore, each time getopt encounters -a, it looks up the first a in the option string, which in your case is not followed by a colon. So -a is treated as a flag that takes no value: it is recorded with an empty string and parsing moves on. With -a value1, value1 is then the first non-option argument, so getopt stops there and leaves it (and everything after it) in remainder. With -aa value1, the second a is parsed as another -a flag, which is why dict(options) shows only { "-a" : "" }.
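A small sketch to make this concrete, reusing the option strings from the question. The corrected string "a:b:c:d:h" is an assumption about what was intended (one letter per option, a colon after each letter that takes a value):

```python
import getopt

short_bad = "aa:bb:cc:dd:h"   # the option string from the question
short_good = "a:b:c:d:h"      # assumed intent: one letter per option
longopts = ["aaaa=", "bbbb=", "cccc=", "dddd=", "help"]

# The first 'a' in short_bad has no colon, so -a takes no value.
# "-aa" becomes two -a flags and 'value1' stops the parsing.
bad_opts, bad_rest = getopt.getopt(["-aa", "value1"], short_bad, longopts)
print(bad_opts, bad_rest)    # [('-a', ''), ('-a', '')] ['value1']

# With one letter per option, -a consumes the following argument.
good_opts, good_rest = getopt.getopt(["-a", "value1", "-d", "value4"], short_good, longopts)
print(good_opts, good_rest)  # [('-a', 'value1'), ('-d', 'value4')] []

# Long options are matched by name, so their order does not matter.
long_opts, long_rest = getopt.getopt(["--dddd=value4", "--aaaa=value1"], short_good, longopts)
print(dict(long_opts))       # {'--dddd': 'value4', '--aaaa': 'value1'}
```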
Finally, regarding getopt and argparse. The documentation clearly advocates argparse:
The getopt module is a parser for command line options whose API is designed to be familiar to users of the C getopt() function. Users who are unfamiliar with the C getopt() function or who would like to write less code and get better help and error messages should consider using the argparse module instead.
More about why argparse is better than both getopt and the deprecated optparse can be read in this PEP and in the answers to this question.
The only functionality that I've found to be supported in getopt while requiring a bit of work in argparse is argument order permutation like that of GNU getopt (available in Python as getopt.gnu_getopt). However, this question explains how this can be achieved via argparse.
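For comparison, here is a minimal argparse sketch of the same four options. The pairing of each short letter with a long name (-a with --aaaa, and so on) is an assumption about what the original option string was meant to express; argparse adds -h/--help automatically:

```python
import argparse

parser = argparse.ArgumentParser()
# One short and one long spelling per option; both store into the same dest.
parser.add_argument("-a", "--aaaa")
parser.add_argument("-b", "--bbbb")
parser.add_argument("-c", "--cccc")
parser.add_argument("-d", "--dddd")

# Order does not matter, and "-a value1" works as expected.
ns1 = parser.parse_args(["--dddd=value4", "--bbbb=value2", "--cccc=value3", "--aaaa=value1"])
ns2 = parser.parse_args(["-a", "value1"])
print(vars(ns1))  # {'aaaa': 'value1', 'bbbb': 'value2', 'cccc': 'value3', 'dddd': 'value4'}
print(vars(ns2))  # {'aaaa': 'value1', 'bbbb': None, 'cccc': None, 'dddd': None}
```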

Related

ConfigArgParse ignores abbreviated options?

My config.ini:
banana=original_banana
If I run with the full argument name, I get the expected result:
python test_configargparse.py --banana new_banana
new_banana
If I run with an abbreviated argument name (--ban instead of --banana), I get unexpected behaviour:
python test_configargparse.py --ban new_banana
original_banana
Code for test_configargparse.py:
import os, configargparse as ap
parser = ap.ArgumentParser(default_config_files=["config.ini"])
parser.add_argument('--banana',dest='banana')
options = parser.parse_args()
print(options.banana)
Versions: ConfigArgParse==0.13.0, Python 2.7.10
Is this a bug, or am I missing something obvious? It's a very basic feature in a very established module.
NOTE: this feature is explicitly documented in https://docs.python.org/3/library/argparse.html
allows long options to be abbreviated to a prefix, if the abbreviation is unambiguous (the prefix matches a unique option)
It looks like a bug in ConfigArgParse. When it loads options from the config file, it discards any option that is already on the command line.
discard_this_key = already_on_command_line(
args, action.option_strings)
The bug is that already_on_command_line() only checks for complete argument names, not prefixes.
def already_on_command_line(existing_args_list, potential_command_line_args):
    """Utility method for checking if any of the potential_command_line_args is
    already present in existing_args.
    """
    return any(potential_arg in existing_args_list
               for potential_arg in potential_command_line_args)
That leaves two copies of the argument in the list, with the config file's value second. ArgumentParser keeps the last value it sees, so the config file's value wins.
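The effect can be reproduced with plain argparse. The argument list below is only an illustration of the "two copies" situation described above; the exact form in which ConfigArgParse appends config values is not shown here. The second function is a hypothetical prefix-aware check, not ConfigArgParse's actual code, and it ignores ambiguity between several options sharing a prefix:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--banana", dest="banana")

# Illustration: user passed --ban, then the config value was appended as well.
ns = parser.parse_args(["--ban", "new_banana", "--banana", "original_banana"])
print(ns.banana)  # original_banana  (with action='store', the last occurrence wins)


def already_on_command_line_with_prefixes(existing_args_list, potential_command_line_args):
    """Hypothetical variant that also treats unique long-option prefixes as present."""
    for arg in existing_args_list:
        name = arg.split("=", 1)[0]  # drop an attached "=value"
        for potential in potential_command_line_args:
            is_prefix = (
                potential.startswith("--")
                and name.startswith("--")
                and len(name) > 2          # a bare "--" is not an abbreviation
                and potential.startswith(name)
            )
            if name == potential or is_prefix:
                return True
    return False
```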

Why isn't getopt working on Python 2.7.5 on Mac OS X?

For some reason, when I pass in an option using the long form, the getopt function isn't recognizing it properly. Any ideas? I've read the documentation here http://docs.python.org/2/library/getopt.html, and it seems it should not do what it's doing.
I'm running python 2.7.5 on mac os x for the record.
[user#macbookpro:~] python Script.py test --condition=foo --output-file abc.def
['test', '--condition=foo', '--output-file', 'abc.def']
[]
<type 'list'>
import getopt
import logging
import sys

def main(argv):
    try:
        optlist, args = getopt.getopt(argv[1:], '', ['condition=', 'output-file=', 'testing'])
    except getopt.GetoptError as msg:
        logging.warning(msg)
        return 1
    print args
    print optlist
    print type(optlist)

if __name__ == '__main__':
    sys.exit(main(sys.argv))
I should be getting the following as stated in the documentation:
optlist
[('--condition', 'foo'), ('--testing', ''), ('--output-file', 'abc.def')]
The documentation doesn't say that you should be getting that. In fact, it explicitly says you shouldn't:
Note: Unlike GNU getopt(), after a non-option argument, all further arguments are considered also non-options. This is similar to the way non-GNU Unix systems work.
And if you look at the examples, the non-option arguments come after the options on the command line. If you do that, it gives you what you were hoping for:
$ python Script.py --condition=foo --output-file abc.def test
['test']
[('--condition', 'foo'), ('--output-file', 'abc.def')]
<type 'list'>
But if you do something different from the examples, you get different results from the examples. And the results you get match what the docs say you should.
But really, if you don't understand why putting test after the options is different from putting it before the options, you shouldn't be using getopt in the first place. As the docs say in a big box right at the top:
Note: The getopt module is a parser for command line options whose API is designed to be familiar to users of the C getopt() function. Users who are unfamiliar with the C getopt() function or who would like to write less code and get better help and error messages should consider using the argparse module instead.
If you really want to learn getopt, then read the POSIX definition. That's what Python is trying to emulate. It does add GNU-style -- long arguments, but that doesn't mean it includes all GNU extensions.
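If GNU-style scanning is what you actually want, the getopt module also provides getopt.gnu_getopt, which keeps looking for options after a non-option argument. A short sketch using the arguments from the question (note that gnu_getopt falls back to the stop-at-first-non-option behaviour if the POSIXLY_CORRECT environment variable is set, or if the short-option string starts with '+'):

```python
import getopt

argv = ['test', '--condition=foo', '--output-file', 'abc.def']
longopts = ['condition=', 'output-file=', 'testing']

# getopt.getopt stops at 'test', the first non-option argument.
posix_opts, posix_args = getopt.getopt(argv, '', longopts)
print(posix_opts)  # []
print(posix_args)  # ['test', '--condition=foo', '--output-file', 'abc.def']

# getopt.gnu_getopt keeps scanning past non-option arguments.
gnu_opts, gnu_args = getopt.gnu_getopt(argv, '', longopts)
print(gnu_opts)    # [('--condition', 'foo'), ('--output-file', 'abc.def')]
print(gnu_args)    # ['test']
```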

Using OptionParser vs sys.argv

For a script, I'm currently using OptionParser to add variables to an input. However, all of my current options are booleans, and it seems it would just be easier to parse using argv instead. For example:
$ script.py option1 option4 option6
And then do something like:
import sys

if 'option1' in sys.argv:
    pass  # do this
if 'option2' in sys.argv:
    pass  # do this
# etc...
Would it be advisable to use argv over OptionParser when the options are all booleans?
"However, all of my current options are booleans, and it seems it would just be easier to parse using argv instead."
There's nothing wrong with using argv, and if it's simpler to use argv, there's no reason not to.
OptionParser has been deprecated, and unless you're stuck on an older version of Python, you should use the argparse module.
For one-off scripts, there's nothing wrong with parsing sys.argv yourself. There are some advantages to using an argument parsing module instead of writing your own.
Standardization. Do you allow options like "-test"? The convention is usually two dashes for multi-character options (e.g. "--test"). With a module, you don't have to worry about defining conventions because they're already defined.
Do you need error-catching and help messages? You get a lot of that for free with argparse.
Will someone else be maintaining your code? There's already lots of documentation and examples of argparse. Plus, it's somewhat self-documenting, because you have to specify the type and number of arguments, which isn't always apparent from looking at a sys.argv parser.
Basically, if you ever expect your command line options to change over time, or expect that your code will have to be modified by someone else, the overhead of argparse isn't that bad and would probably save you time in the future.
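As a rough sketch of what the argparse version of all-boolean options might look like: the flag names below are taken from the question but turned into dashed long options, which is an assumption (the question uses bare words). With action="store_true" each attribute is False unless its flag is given, and unknown or misspelled flags produce an error with a usage message:

```python
import argparse

parser = argparse.ArgumentParser(description="Boolean switches example")
# Hypothetical flag names based on the question's option1..option6.
for name in ("option1", "option2", "option3", "option4", "option5", "option6"):
    parser.add_argument("--" + name, action="store_true", help="enable " + name)

args = parser.parse_args(["--option1", "--option4", "--option6"])
if args.option1:
    print("option1 is on")
if not args.option2:
    print("option2 is off")
```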

Stop parsing on first unknown argument

Using argparse, is it possible to stop parsing arguments at the first unknown argument?
I've found two near-solutions:
parse_known_args, but this allows for known parameters to be detected after the first unknown argument.
nargs=argparse.REMAINDER, but this won't stop parsing until the first non-option argument. Any options preceding this that aren't recognised generate an error.
Have I overlooked something? Should I be using argparse at all?
I haven't used argparse myself (need to keep my code 2.6-compatible), but looking through the docs, I don't think you've missed anything.
So I have to wonder why you want argparse to stop parsing arguments, and why the -- pseudo-argument won't do the job. From the docs:
If you have positional arguments that must begin with '-' and don’t look like negative numbers, you can insert the pseudo-argument '--' which tells parse_args() that everything after that is a positional argument:
>>> parser.parse_args(['--', '-f'])
Namespace(foo='-f', one=None)
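A self-contained version of that documentation snippet. The parser definition (a '-1' option stored as one, plus an optional positional foo) is a reconstruction chosen to match the shown output, not necessarily the exact parser used in the docs:

```python
import argparse

parser = argparse.ArgumentParser(prog="PROG")
parser.add_argument("-1", dest="one")
parser.add_argument("foo", nargs="?")

# Without "--", "-f" looks like an unknown option and parse_args() reports an error.
# After "--", everything is treated as positional.
ns = parser.parse_args(["--", "-f"])
print(ns.foo)  # -f
print(ns.one)  # None
```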
One way to do it, although it may not be perfect in all situations, is to use getopt instead.
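For example:
import sys
from getopt import getopt

USAGE = "usage: script.py [-h] [-k KEY] [args...]"  # placeholder usage text

# 'k:' (with a colon) so that -k takes a value, like --key=
flags, args = getopt(sys.argv[1:], 'hk:', ['help', 'key='])
key = None
for flag, v in flags:
    if flag in ['-h', '--help']:
        print(USAGE, file=sys.stderr)
        sys.exit()
    elif flag in ['-k', '--key']:
        key = v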
Once getopt encounters a non-option argument, it stops processing arguments.
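A quick check of that behaviour, using a made-up argument list. Note the limitation: getopt stops at the first non-option *argument*, but an unrecognized *option* (something starting with '-') still raises GetoptError rather than ending the parse:

```python
from getopt import getopt, GetoptError

argv = ["-k", "secret", "input.txt", "-h"]
flags, args = getopt(argv, "hk:", ["help", "key="])
print(flags)  # [('-k', 'secret')]
print(args)   # ['input.txt', '-h']  (parsing stopped at 'input.txt')

# An unknown option is an error, not a stopping point.
rejected = False
try:
    getopt(["-z", "input.txt"], "hk:", ["help", "key="])
except GetoptError as err:
    rejected = True
    print("error:", err)
```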

Why isn't getopt working if sys.argv is passed fully?

If I'm using this with getopt:
import getopt
import sys
opts,args = getopt.getopt(sys.argv,"a:bc")
print opts
print args
opts will be empty. No tuples will be created. If, however, I use sys.argv[1:], everything works as expected. I don't understand why that is. Anyone care to explain?
The first element of sys.argv (sys.argv[0]) is the name of the script currently being executed. Because this script name is (likely) not a valid argument (and probably doesn't begin with a - or -- anyway), getopt does not recognize it as an argument. Due to the nature of how getopt works, when it sees something that is not a command-line flag (something that does not begin with - or --), it stops processing command-line options (and puts the rest of the arguments into args), because it assumes the rest of the arguments are items that will be handled by the program (such as filenames or other "required" arguments).
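The difference can be seen without running a script by passing a hand-built list in place of sys.argv (the script name and arguments here are made up):

```python
import getopt

argv = ["my_script.py", "-a", "1", "-b", "file.txt"]  # stand-in for sys.argv

# Whole list: "my_script.py" is a non-option, so parsing stops immediately.
opts_full, args_full = getopt.getopt(argv, "a:bc")
print(opts_full)  # []
print(args_full)  # ['my_script.py', '-a', '1', '-b', 'file.txt']

# Skipping the program name, as the docs recommend.
opts, args = getopt.getopt(argv[1:], "a:bc")
print(opts)       # [('-a', '1'), ('-b', '')]
print(args)       # ['file.txt']
```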
It's by design. Recall that sys.argv[0] is the running program name, and getopt doesn't want it.
From the docs:
Parses command line options and parameter list. args is the argument list to be parsed, without the leading reference to the running program. Typically, this means sys.argv[1:]. options is the string of option letters that the script wants to recognize, with options that require an argument followed by a colon (':'; i.e., the same format that Unix getopt() uses).
http://docs.python.org/library/getopt.html