String Formatting Confusion - python

O'Reilly's Learning Python: Powerful Object-Oriented Programming by Mark Lutz teaches different ways to format strings.
The following code has me confused. I am interpreting 'ham' as filling the format placeholder at index zero, and yet it still shows up at position one of the output string. Please help me understand what is actually going on.
Here is the code:
template = '{motto}, {0} and {food}'
template.format('ham', motto='spam', food='eggs')
And here is the output:
'spam, ham and eggs'
I expected:
'ham, spam and eggs'

The only thing you have to understand is that {0} refers to the first (zeroth) unnamed argument sent to format(). We can see this by removing all unnamed references and trying to use a linear fill-in:
>>> "{motto}".format("boom")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'motto'
If linear fill-in were how it worked, you would expect 'boom' to fill in 'motto'. Instead, format() looks for a keyword argument named 'motto' - the KeyError is the key hint here. Similarly, if format() simply worked through the sequence of parameters it was passed, the following wouldn't error either:
>>> "{0} {1}".format('ham', motto='eggs')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
Here, format() is looking for the second unnamed argument in the parameter list - but that doesn't exist, so it gets a 'tuple index out of range' error. This is just the difference between unnamed (position-sensitive) and named arguments in Python.
See this post to understand the difference between these types of arguments, known as 'args' and 'kwargs'.
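To get the output you expected, put the positional placeholder first; the keyword placeholders are matched by name no matter where they appear in the template:
>>> template = '{0}, {motto} and {food}'
>>> template.format('ham', motto='spam', food='eggs')
'ham, spam and eggs'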

Related

Is it possible to write a function signature that behaves like getattr()'s does?

According to help(getattr), two or three arguments are accepted:
getattr(...)
getattr(object, name[, default]) -> value
Doing some simple tests, we can confirm this:
>>> obj = {}
>>> getattr(obj, 'get')
<built-in method get of dict object at 0x7f6d4beaf168>
>>> getattr(obj, 'bad', 'with default')
'with default'
Too few/too many arguments also behave as expected:
>>> getattr()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: getattr expected at least 2 arguments, got 0
>>> getattr(obj, 'get', 'with default', 'extra')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: getattr expected at most 3 arguments, got 4
The argument names specified in the help text do not seem to be accepted as keyword arguments:
>>> getattr(object=obj, name='get')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: getattr() takes no keyword arguments
The inspect module is no help here:
>>> import inspect
>>> inspect.getargspec(getattr)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/inspect.py", line 816, in getargspec
raise TypeError('{!r} is not a Python function'.format(func))
TypeError: <built-in function getattr> is not a Python function
(the message is a little different in Python 3, but the gist is the same)
Now, the question: Is there a straightforward way to write my own Python function with a signature that behaves exactly like getattr's? That is, keyword arguments are not allowed, and a minimum/maximum number of arguments is enforced? The closest I've come is the following:
def myfunc(*args):
    len_args = len(args)
    if len_args < 2:
        raise TypeError('expected at least 2 arguments, got %d' % len_args)
    elif len_args > 3:
        raise TypeError('expected at most 3 arguments, got %d' % len_args)
    ...
But now, instead of meaningful argument names like object and name, we get args[0] and args[1]. It's also a lot of boilerplate, and feels downright unpleasant. I know that, being a builtin, getattr must have a vastly different implementation than typical Python code, and perhaps there's no way to perfectly emulate the way it behaves. But it's a curiosity I've had for a while.
This code ticks most of your requirements:
import functools

def anonymise_args(fn):
    @functools.wraps(fn)
    def wrap(*args):
        return fn(*args)
    return wrap

@anonymise_args
def myfunc(obj, name, default=None):
    print obj, name, default
Keyword arguments are not allowed:
>>> x.myfunc(obj=1, name=2)
TypeError: wrap() got an unexpected keyword argument 'obj'
A minimum/maximum number of arguments is enforced:
>>> x.myfunc(1, 2, 3, 4)
TypeError: myfunc() takes at most 3 arguments (4 given)
Meaningful argument names: preserved (obj, name, default).
Not a lot of boilerplate: the decorator is written once and reused.
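For completeness, a normal positional call still goes straight through the wrapper (a quick check, assuming the decorated myfunc above is in scope; output shown for Python 2 to match the print statement):
>>> myfunc(1, 2)
1 2 None
>>> myfunc(1, 2, 3)
1 2 3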
As of Python 3.8, there is now syntax-level support for this:
def f(a, b, c=None, /):
...
Note the slash. Any parameters before the slash are positional-only; they cannot be specified by keyword. This syntax had been proposed for quite a while - PEP 457 dates back to 2013 - but it only became an actual language feature in Python 3.8.
Regardless of whether any parameters are made positional-only, default argument values still have the limitation that there is no way to distinguish the no-value-passed case from the case where the default is passed explicitly. To do that, you have to process *args manually.
Prior to Python 3.8, these kinds of function signatures are particular to functions written in C, using the C-level PyArg_Parse* family of functions and the Argument Clinic preprocessor. There's no built-in way to write that kind of signature in Python before 3.8. The closest you can get is what you already came up with, using *args.
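As a rough sketch of what this looks like in 3.8 (my_getattr is a hypothetical name with a simplified body, not a faithful re-implementation of getattr, and the exact TypeError wording may vary between releases):

def my_getattr(obj, name, default=None, /):
    # all three parameters are positional-only thanks to the trailing slash;
    # unlike the real getattr, this cannot tell "no default given" apart from
    # an explicit default of None (the limitation discussed above)
    try:
        return getattr(obj, name)
    except AttributeError:
        return default

my_getattr('abc', 'upper')            # fine: returns the bound method
my_getattr('abc', 'missing', 42)      # fine: returns 42
my_getattr(obj='abc', name='upper')   # TypeError: got some positional-only
                                      # arguments passed as keyword arguments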

Python not raising an exception on % without conversion specifier

The % operator for string formatting is described here.
Usually, when given a format string without a conversion specifier, it will raise TypeError: not all arguments converted during string formatting. For instance, "" % 1 will fail. So far, so good.
Sometimes, it won't fail, though, if the argument on the right of the % operator is something empty: "" % [], or "" % {} or "" % () will silently return the empty string, and it looks fair enough.
Doing the same with "%s" instead of the empty string will convert the empty object into a string, except for the last one (the empty tuple), which will fail; but I think that's just one of the known problems of the % operator, which are solved by the format method.
There is also the case of a non-empty dictionary, like "" % {"a": 1}, which works because it's really supposed to be used with named type specifiers, like in "%(a)d" % {"a": 1}.
However, there is one case I don't understand: "" % b"x" will return the empty string, no exception raised. Why?
I'm not 100% sure, but after a quick look in the sources, I guess the reason is the following:
when the right-hand operand of % is not a tuple, Python checks whether it has a __getitem__ method; if it does, it assumes the operand is a mapping and expects us to use named formats like %(name)s. Otherwise, Python creates a single-element tuple from the argument and performs positional formatting. The argument count is not checked with mappings; therefore, since bytes and lists do have __getitem__, they won't fail:
>>> "xxx" % b'a'
'xxx'
>>> "xxx" % ['a']
'xxx'
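For contrast, the intended mapping usage really does consume the right-hand operand, by key rather than by position (the same %(a)d idea mentioned in the question, shown as a REPL session):
>>> "%(a)d" % {"a": 1}
'1'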
Consider also:
>>> class X: pass
...
>>> "xxx" % X()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
>>> class X:
... def __getitem__(self,x): pass
...
>>> "xxx" % X()
'xxx'
Strings are an exception to this rule - they have __getitem__, but are still "tuplified" for positional formatting:
>>> "xxx" % 'a'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
Of course, this "sequences as mappings" logic doesn't make much sense, because formatting keys are always strings:
>>> "xxx %(0)s" % ['a']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
but I doubt anyone is going to fix that, given that % formatting is abandoned anyway.
The offending line is at unicodeobject.c. It considers all objects that are "mappings", and explicitly are not tuples or strings (or subclasses thereof), as "dictionaries", and for those it is not an error if not all arguments are converted.
The PyMapping_Check is defined as:
int
PyMapping_Check(PyObject *o)
{
    return o && o->ob_type->tp_as_mapping &&
        o->ob_type->tp_as_mapping->mp_subscript;
}
That is, any type that defines tp_as_mapping with an mp_subscript slot counts as a mapping.
And bytes does define that, as does any other object with __getitem__. Thus, in Python 3.4 at least, no object with __getitem__ will fail as the right-hand argument to the % formatting operator.
This is a change from Python 2.7. The reason is that there is no way to detect every type that could be used for %(name)s formatting, other than accepting any type that implements __getitem__, although the most obvious mistakes (tuples and strings, and their subclasses) are excluded explicitly. When Python 3 was published, no one added bytes to those exclusions, even though it clearly shouldn't support strings as arguments to __getitem__; nor is list excluded there.
Another oversight is that a list cannot be used to supply positional formatting parameters; only a tuple is unpacked that way, as the example below shows.
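A quick demonstration of that last point (Python 3 shown; Python 2.7 behaves the same way for these three cases):
>>> "%s %s" % (1, 2)      # a tuple is unpacked into positional arguments
'1 2'
>>> "%s %s" % [1, 2]      # a list is treated as one single value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> "%s" % [1, 2]         # ...which is fine when there is exactly one %s
'[1, 2]'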

What is "module" in the docstring of len?

>>> print(len.__doc__)
len(module, object)
Return the number of items of a sequence or mapping.
>>> len(os, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: len() takes exactly one argument (2 given)
Notice the two parameters in the first line of the docstring.
When would you pass two arguments to len? Is the docstring incorrect? I'm using Python 3.4.0.
This was a bug submitted on 2014-04-18 here. It has since been fixed in 3.4.1.
Quoting Vedran Čačić, the original author of the bug report:
From recently, help(len) gives the wrong signature of len.
Help on built-in function len in module builtins:

len(...)
    len(module, object)
        ^^^^^^^^
    Return the number of items of a sequence or mapping.

Misleading ValueError on bad formatting in Python 2.7

When I try the following wrong code:
not_float = [1, 2, 3]
"{:.6f}".format(not_float)
I get the following misleading ValueError:
ValueError: Unknown format code 'f' for object of type 'str'
It is misleading, since it might make me think not_float is a string. The same message occurs for other non-float types, such as NoneType, tuple, etc. Do you have any idea why? And should I expect this error message no matter what the type of not_float is, as long as it does not provide some formatting method for f?
On the other hand, trying:
non_date = 3
"{:%Y}".format(non_date)
brings
ValueError: Invalid conversion specification
which is less informative but also less misleading.
The str.format() method, and the format() function, call the .__format__() method of the objects that are being passed in, passing along everything after the : colon (.6f in this case).
The default object.__format__() implementation of that method is to call str(self) then apply format() on that result. This is implemented in C, but in Python that'd look like:
def __format__(self, fmt):
    return format(str(self), fmt)
It is this call that throws the exception. For a list object, for example:
>>> not_float.__format__('.6f')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
because this is functionally the same as:
>>> format(str(not_float), '.6f')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
Integers have a custom .__format__ implementation instead; it does not use str() internally because there are integer-specific formatting options. It turns the value into a string differently. As a result, it throws a different exception because it doesn't recognize %Y as a valid formatting string.
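A quick way to see the two code paths side by side (run under Python 2.7 to match the question; the first call succeeds because int's own __format__ understands float-style presentation types):
>>> format(3, '.6f')
'3.000000'
>>> format(3, '%Y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid conversion specification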
The error message could certainly be improved; an open Python bug discusses this issue. Because of changes in how all this works, the problem is no longer an issue in Python 3.4, though: object.__format__() now raises a TypeError for any non-empty format string instead of deferring to str().

Python Date() Function

I have what may be a dopey question about Python's date function.
Let's say I want to pass the script a date, July 2nd 2013. This code works fine:
from datetime import date
july_2nd = date(2013,7,2)
print july_2nd
Output:
2013-07-02
So now, what if I want to pass the date() function a value stored in a variable, which I can set with a function rather than hard-coding 7/2/13? I try this and get an error:
from datetime import date
july_2nd = (2013,7,2)
print date(july_2nd)
Error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: an integer is required
Can anyone explain what's going on here?
You want to pass in the tuple as separate arguments, using the *args splat syntax:
date(*july_2nd)
By prefixing july_2nd with an asterisk, you are telling Python to call date() with all values from that variable as separate parameters.
See the call expression documentation for details; there is a **kwargs form as well to expand a mapping into keyword arguments.
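Putting that together with the question's code (Python 2 print statements kept; the **kwargs variant assumes the documented keyword names year, month and day):

from datetime import date

july_2nd = (2013, 7, 2)
print date(*july_2nd)        # the tuple is unpacked into date(2013, 7, 2)

as_kwargs = {'year': 2013, 'month': 7, 'day': 2}
print date(**as_kwargs)      # the mapping is expanded into keyword arguments

Both lines print 2013-07-02.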
