>>> print(len.__doc__)
len(module, object)
Return the number of items of a sequence or mapping.
>>> len(os, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: len() takes exactly one argument (2 given)
Notice the two parameters in the first line of the docstring.
When would you pass two arguments to len? Is the docstring incorrect? I'm using Python 3.4.0.
This was a bug submitted on 2014-04-18 here. It has since been fixed in 3.4.1.
Quoting Vedran Čačić, the original author of the bug report:
From recently, help(len) gives the wrong signature of len.
Help on built-in function len in module builtins:
len(...)
    len(module, object)
        ^^^^^^^^
    Return the number of items of a sequence or mapping.
In Luciano Ramalho's Fluent Python, an iterable is defined as an object in which the __iter__ method is implemented, with no additional characteristics.
I am currently working out a tutorial for laymen in which I am trying to chunk the core concepts of Python to make programming more manageable for newcomers.
I find it easier to explain iterables and their utility for these people when I associate these objects with the concept of "size" (thus also length). By saying that "iterables are objects that have length" and thus tying in with the len function, I am able to naturally evolve the concept of loops and iteration with commonly used types such as the Standard Library list, dict, tuple, str, as well as numpy.ndarray, pandas.Series and pandas.DataFrame.
However, since now I know about the sole necessity for the __iter__ method, there can be cases where the analogy with len fails. Ramalho even provides an impromptu example in his book:
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()
As expected, any instance of Sentence is an iterable (I can use for loops), but len(Sentence('an example')) will raise a TypeError.
Since all the aforementioned objects are iterables and have a __len__ method implemented, I want to know whether there are relevant objects in Python that are iterable (__iter__) but have no length (__len__), so that I can decide whether to just add a footnote to my tutorial or work out a different analogy.
A file has no length:
>>> with open("test") as f:
...     print(len(f))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type '_io.TextIOWrapper' has no len()
Iterating over a file object returned by open() yields lines, i.e. chunks of text delimited by newline characters. To know how many lines there are, the whole file would have to be read first; depending on the size of the file, this could take a long time or the computer could run out of RAM.
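As a rough sketch of what counting lines involves, the snippet below consumes the iterator one line at a time. io.StringIO stands in for a real file (an assumption, so that the example is self-contained), and count_lines is a hypothetical helper name. Note that it still has to read every line, and that the iterator is exhausted afterwards.

```python
import io


def count_lines(lines):
    """Count the items of any iterable of lines, one line at a time."""
    return sum(1 for _ in lines)


# io.StringIO stands in for an open text file; it also iterates line by line.
fake_file = io.StringIO("first\nsecond\nthird\n")
n_lines = count_lines(fake_file)
print(n_lines)
```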
Iterators are ubiquitous iterables that usually don't offer a length:
>>> len(iter('foo'))
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    len(iter('foo'))
TypeError: object of type 'str_iterator' has no len()
>>> len(iter((1, 2, 3)))
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    len(iter((1, 2, 3)))
TypeError: object of type 'tuple_iterator' has no len()
>>> len(iter([1, 2, 3]))
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    len(iter([1, 2, 3]))
TypeError: object of type 'list_iterator' has no len()
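Generators are another everyday case. The sketch below (squares is a hypothetical example function) uses the collections.abc ABCs to check for __iter__ and __len__ separately, and shows that the only way to count a lengthless iterable is to consume it.

```python
from collections.abc import Iterable, Sized


def squares(n):
    """Generator function: values are produced lazily, so no length is known."""
    for i in range(n):
        yield i * i


gen = squares(4)
is_iterable = isinstance(gen, Iterable)       # True: it has __iter__
is_sized = isinstance(gen, Sized)             # False: it has no __len__
list_is_sized = isinstance([1, 2, 3], Sized)  # True: lists have __len__

# Counting a lengthless iterable means consuming it.
count = sum(1 for _ in gen)
leftover = list(gen)  # the generator is exhausted now
print(is_iterable, is_sized, list_is_sized, count, leftover)
```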
According to help(getattr), two or three arguments are accepted:
getattr(...)
getattr(object, name[, default]) -> value
Doing some simple tests, we can confirm this:
>>> obj = {}
>>> getattr(obj, 'get')
<built-in method get of dict object at 0x7f6d4beaf168>
>>> getattr(obj, 'bad', 'with default')
'with default'
Too few/too many arguments also behave as expected:
>>> getattr()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getattr expected at least 2 arguments, got 0
>>> getattr(obj, 'get', 'with default', 'extra')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getattr expected at most 3 arguments, got 4
The argument names specified in the help text do not seem to be accepted as keyword arguments:
>>> getattr(object=obj, name='get')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getattr() takes no keyword arguments
The inspect module is no help here:
>>> import inspect
>>> inspect.getargspec(getattr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/inspect.py", line 816, in getargspec
    raise TypeError('{!r} is not a Python function'.format(func))
TypeError: <built-in function getattr> is not a Python function
(The message is a little different in Python 3, but the gist is the same.)
Now, the question: Is there a straightforward way to write my own Python function with a signature that behaves exactly like getattr's signature? That is, keyword arguments are not allowed, and a minimum/maximum number of arguments is enforced? The closest I've come is the following:
def myfunc(*args):
    len_args = len(args)
    if len_args < 2:
        raise TypeError('expected at least 2 arguments, got %d' % len_args)
    elif len_args > 3:
        raise TypeError('expected at most 3 arguments, got %d' % len_args)
    ...
But now instead of meaningful argument names like object and name we get args[0] and args[1]. It's also a lot of boilerplate, and feels downright unpleasant. I know that, being a builtin, getattr must have a vastly different implementation than typical Python code, and perhaps there's no way to perfectly emulate the way it behaves. But it's a curiosity I've had for a while.
This code ticks most of your requirements:
import functools

def anonymise_args(fn):
    @functools.wraps(fn)
    def wrap(*args):  # positional arguments only, so keywords are rejected
        return fn(*args)
    return wrap

@anonymise_args
def myfunc(obj, name, default=None):
    print obj, name, default  # Python 2 print statement
keyword arguments are not allowed
x.myfunc(obj=1, name=2)
TypeError: wrap() got an unexpected keyword argument 'obj'
A minimum/maximum number of arguments are enforced
x.myfunc(1,2,3,4)
TypeError: myfunc() takes at most 3 arguments (4 given)
meaningful argument names
not a lot of boilerplate
As of Python 3.8, there is now syntax-level support for this:
def f(a, b, c=None, /):
    ...
Note the slash. Any parameters before the slash are positional-only; they cannot be specified by keyword. This syntax has been picked out for quite a while - PEP 457 dates back to 2013 - but it was only made an actual language feature in Python 3.8.
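A short sketch of how such a function behaves on Python 3.8 or later. The body of f is made up for illustration, and the exact wording of the error messages varies between versions, so only the exception type is captured here.

```python
def f(a, b, c=None, /):
    """All three parameters are positional-only (requires Python 3.8+)."""
    return (a, b, c)


result = f(1, 2)  # c falls back to its default

try:
    f(1, b=2)  # b cannot be passed by keyword
except TypeError as exc:
    keyword_error = exc
else:
    keyword_error = None

try:
    f(1, 2, 3, 4)  # too many positional arguments
except TypeError as exc:
    arity_error = exc
else:
    arity_error = None

print(result, type(keyword_error).__name__, type(arity_error).__name__)
```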
Regardless of whether any parameters are made positional-only, default argument values still have the limitation that there is no way to distinguish the no-value-passed case from the case where the default is passed explicitly. To do that, you have to process *args manually.
Prior to Python 3.8, these kinds of function signatures are particular to functions written in C, using the C-level PyArg_Parse* family of functions and the Argument Clinic preprocessor. There's no built-in way to write that kind of signature in Python before 3.8. The closest you can get is what you already came up with, using *args.
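For the case where "no default passed" has to be told apart from "default passed explicitly", here is a minimal sketch of processing *args by hand. The lookup function and its mapping-based behaviour are hypothetical, chosen only to mirror the getattr(object, name[, default]) shape.

```python
def lookup(*args):
    """Hypothetical getattr-like signature: lookup(mapping, key[, default])."""
    if len(args) < 2:
        raise TypeError('lookup expected at least 2 arguments, got %d' % len(args))
    if len(args) > 3:
        raise TypeError('lookup expected at most 3 arguments, got %d' % len(args))
    mapping, key = args[0], args[1]
    try:
        return mapping[key]
    except KeyError:
        if len(args) == 3:
            # A default was passed explicitly, even if that default is None.
            return args[2]
        raise  # no default given: let the KeyError propagate


found = lookup({'a': 1}, 'a')
explicit_none = lookup({}, 'missing', None)
print(found, explicit_none)
```

Because the function only takes *args, calling it with keyword arguments raises a TypeError as well.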
O'Reilly's Learning Python: Powerful Object-Oriented Programming by Mark Lutz teaches different ways to format strings.
The following code has me confused. I am interpreting 'ham' as filling the format placeholder at index zero, and yet it still shows up at index one of the output string. Please help me understand what is actually going on.
Here is the code:
template = '{motto}, {0} and {food}'
template.format('ham', motto='spam', food='eggs')
And here is the output:
'spam, ham and eggs'
I expected:
'ham, spam and eggs'
The only thing you have to understand is that {0} refers to the first (zeroth) unnamed argument sent to format(). We can see this to be the case by removing all unnamed references and trying to use a linear fill-in:
>>> "{motto}".format("boom")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'motto'
You would expect that 'boom' would fill in 'motto' if this is how it works. But, instead, format() looks for a parameter named 'motto'. The key hint here is the KeyError. Similarly, if it were just taking the sequence of parameters passed to format(), then this wouldn't error, either:
>>> "{0} {1}".format('ham', motto='eggs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
Here, format() is looking for the second unnamed argument in the parameter list - but that doesn't exist so it gets a 'tuple index out of range' error. This is just the difference between the unnamed (which are positionally sensitive) and named arguments passed in Python.
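To make the distinction concrete, a small sketch: numbered fields index into the positional arguments, named fields look up keyword arguments, and the position of a field inside the template plays no role.

```python
template = '{motto}, {0} and {food}'

# Positional arguments fill numbered fields; keyword arguments fill named ones.
mixed = template.format('ham', motto='spam', food='eggs')

# The order of fields in the template is independent of the argument order.
reordered = '{1} before {0}'.format('a', 'b')

# Auto-numbered fields ({}) take the positional arguments in order.
auto = '{}, {} and {food}'.format('spam', 'ham', food='eggs')

print(mixed, reordered, auto, sep='\n')
```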
See this post to understand the difference between these types of arguments, known as 'args' and 'kwargs'.
When I try the following wrong code:
not_float = [1, 2, 3]
"{:.6f}".format(not_float)
I get the following misleading ValueError:
ValueError: Unknown format code 'f' for object of type 'str'
It is misleading, since it might make me think not_float is a string. The same message occurs for other non-float types, such as NoneType, tuple, etc. Do you have any idea why? And: should I expect this error message no matter what the type of not_float is, as long as it does not provide some formatting method for f?
On the other hand, trying:
non_date = 3
"{:%Y}".format(non_date)
brings
ValueError: Invalid conversion specification
which is less informative but also less misleading.
The str.format() method, and the format() function, call the .__format__() method of the objects that are being passed in, passing along everything after the : colon (.6f in this case).
The default object.__format__() implementation of that method is to call str(self) then apply format() on that result. This is implemented in C, but in Python that'd look like:
def __format__(self, fmt):
    return format(str(self), fmt)
It is this call that throws the exception. For a list object, for example:
>>> not_float.__format__('.6f')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
because this is functionally the same as:
>>> format(str(not_float), '.6f')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
Integers have a custom .__format__ implementation instead; it does not use str() internally because there are integer-specific formatting options. It turns the value into a string differently. As a result, it throws a different exception because it doesn't recognize %Y as a valid formatting string.
The error message could certainly be improved; an open Python bug discusses this issue. Because of changes in how all this works the problem is no longer an issue in Python 3.4 though; if the format string is empty .__format__() will no longer be called.
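To illustrate the dispatch, here is a sketch with a hypothetical Celsius class that implements its own __format__ and receives everything after the colon. The last part relies on the behaviour of Python 3.4 and later, where object.__format__ raises a TypeError for a non-empty format spec instead of formatting str(self).

```python
class Celsius:
    """Hypothetical type that interprets format specs itself."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # spec is the text after the colon; fall back to '.1f' when it is empty.
        return format(self.degrees, spec or '.1f') + ' C'


custom = '{:.2f}'.format(Celsius(21.456))  # spec '.2f' reaches __format__
default = '{}'.format(Celsius(21.456))     # empty spec

# A list has no __format__ of its own, so object.__format__ handles the spec.
try:
    '{:.6f}'.format([1, 2, 3])
except (TypeError, ValueError) as exc:
    list_error = exc

print(custom, default, type(list_error).__name__)
```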
Let's have a look:
print([object, ...], *, sep=' ', end='\n', file=sys.stdout)
http://docs.python.org/py3k/library/functions.html?highlight=print#print
How can we interpret that '*'?
Usually an asterisk ('*') means numerous objects. But herein it is a mystery to me. Between two commas... I'm even afraid to think it may be a typo.
That's an error in the documentation, inserted by someone applying a new Python 3 feature to places where it shouldn't be used. It has since been fixed (see issue 15831).
The function signatures in the documentation are given in a pseudo-formal-grammar form, but adding the * marker only makes sense if you use actual Python syntax. In that case, the [object, ...], * part of the signature should have been listed as *objects instead.
The corrected version now reads:
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
The online development version of the documentation has not yet been updated, but the documentation source has been corrected; I'll see if we can request a regeneration of the docs.
To be clear: the * syntax is valid in Python 3 and means that the following arguments can only ever be used as keyword arguments, not positional arguments. This does however not apply to the print() function, as all positional arguments are to be printed anyway and could never be mistaken for the keyword arguments.
It means that the following arguments are keyword-only i.e., you can't supply them as positional arguments, you must use their names e.g.:
>>> def f(*, a): pass
...
>>> f(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes exactly 0 positional arguments (1 given)
>>> f(a=1)
>>> # ok
Another example:
>>> def g(*a, b): pass
...
>>> g(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: g() needs keyword-only argument b
>>> g(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: g() needs keyword-only argument b
>>> g(1, b=2)
>>> # ok
>>> g(1, 2, b=3)
>>> # ok
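A slightly larger sketch of both forms, runnable as a script. The names connect and show are hypothetical. The second function mimics the shape of print(): because *objects collects every positional argument, sep can only ever be supplied by keyword. The exact TypeError wording differs between Python versions, so only the exception type is captured.

```python
def connect(host, *, timeout=10, retries=3):
    """Hypothetical function: timeout and retries are keyword-only."""
    return (host, timeout, retries)


ok = connect('example.org', timeout=5)

try:
    connect('example.org', 5)  # 5 cannot be bound to timeout positionally
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None


def show(*objects, sep=' '):
    """Shaped like print(): all positional arguments land in objects."""
    return sep.join(str(o) for o in objects)


joined = show(1, 2, 3, sep='-')
all_positional = show(1, 2, '-')  # '-' is just another object here

print(ok, type(positional_error).__name__, joined, all_positional)
```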