In Luciano Ramalho's Fluent Python, an iterable is defined simply as an object that implements the __iter__ method, with no additional requirements.
I am currently working on a tutorial for laypeople in which I try to chunk the core concepts of Python to make programming more manageable for newcomers.
I find it easier to explain iterables and their utility to this audience when I associate these objects with the concept of "size" (and thus length). By saying that "iterables are objects that have length" and tying that to the len function, I can naturally develop the concepts of loops and iteration with commonly used types such as the Standard Library's list, dict, tuple and str, as well as numpy.ndarray, pandas.Series and pandas.DataFrame.
However, now that I know that the __iter__ method alone is sufficient, there can be cases where the analogy with len fails. Ramalho even provides an impromptu example in his book:
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()
As expected, any instance of Sentence is an iterable (I can use for loops), but len(Sentence('an example')) will raise a TypeError.
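Concretely, a quick interactive check of the class above:
>>> s = Sentence('an example')
>>> for word in s:
...     print(word)
...
an
example
>>> len(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'Sentence' has no len()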
Since all the aforementioned objects are iterables and implement a __len__ method, I want to know whether there are relevant objects in Python that are iterables (__iter__) but do not have a length (__len__), so that I can decide whether to just add a footnote to my tutorial or to work out a different analogy.
A file has no length:
>>> with open("test") as f:
...     print(len(f))
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type '_io.TextIOWrapper' has no len()
Iterating over a file object like the one returned by open yields lines, i.e. chunks of text delimited by newline characters. To know how many lines there are, the file would have to be read entirely and iterated through; depending on the size of the file, this could take a long time, or the computer could run out of RAM.
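If you do need the number of lines, you have to consume the iterator yourself. A minimal sketch (it reads the whole file once and assumes the same "test" file as above):
>>> with open("test") as f:
...     num_lines = sum(1 for _ in f)  # consumes the file line by line
...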
Iterators are ubiquitous iterables that usually don't offer a length:
>>> len(iter('foo'))
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    len(iter('foo'))
TypeError: object of type 'str_iterator' has no len()
>>> len(iter((1, 2, 3)))
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    len(iter((1, 2, 3)))
TypeError: object of type 'tuple_iterator' has no len()
>>> len(iter([1, 2, 3]))
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    len(iter([1, 2, 3]))
TypeError: object of type 'list_iterator' has no len()
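If you want to make the distinction explicit for your readers, the abstract base classes in collections.abc let you test "is iterable" and "has a length" separately:
>>> from collections.abc import Iterable, Sized
>>> isinstance([1, 2, 3], Iterable), isinstance([1, 2, 3], Sized)
(True, True)
>>> it = iter([1, 2, 3])
>>> isinstance(it, Iterable), isinstance(it, Sized)
(True, False)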
Related
I have been studying the TensorFlow Python API. I have two questions.
1. Can we always use a list type as a function parameter when a tuple is expected?
If we look at the official API documentation for tf.placeholder and its examples (https://www.tensorflow.org/api_docs/python/tf/placeholder), we see that the second parameter of this function is the shape. In the example code, a tuple is used to provide the shape information, as shown below.
x = tf.placeholder(tf.float32, shape=(1024, 1024))
However, in the official tutorial page (https://www.tensorflow.org/get_started/mnist/beginners), the example uses a list as the shape rather than a tuple, as shown below.
y_ = tf.placeholder(tf.float32, [None, 10])
I know that there are some differences between lists and tuples, such as one being mutable and the other immutable.
If a list supports all the functionality of a tuple, could we always use a list instead of a tuple safely as a function parameter? And is it recommended?
2. What's the meaning of [None, 10] in the above example code?
In the above example code, [None, 10] is used. Are such expressions normally used? If so, is None also considered a kind of number?
Almost everything you can do with a tuple you can do with a list too. The reverse, however, is not true, because tuples are immutable whereas lists are mutable.
There are exceptions, though. Since a tuple is immutable (and therefore hashable):
it can be used as a key in a dictionary;
it can be stored in a set.
Lists are intended to be homogeneous sequences, while tuples are heterogeneous data structures. Also, tuples are slightly better in terms of performance.
From Python's Tuples and Sequences documentation:
Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples).
So the answer to your question:
Can we always use a list type as a function parameter when a tuple is expected?
You may use a list instead of a tuple in most cases, but not always. You need not worry much about this, though, as Python will remind you when your use of a list goes wrong. Below is the error you'll receive when it does:
TypeError: unhashable type: 'list'
For example:
>>> set([1, [1, 2]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> {[1, 2]: 1}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
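For comparison, the immutable (and hence hashable) tuple is accepted in both places:
>>> {(1, 2): 1}
{(1, 2): 1}
>>> set([1, (1, 2)]) == {1, (1, 2)}
True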
As MoinuddinQuadri noted in https://stackoverflow.com/a/48038899/7505395, for this usage you can use a two-element list and a tuple interchangeably.
To answer your 2nd question:
According to the documentation you linked, [None, 784] in the context of a shape means that one dimension can be of any length while the other is fixed to 784:
https://www.tensorflow.org/get_started/mnist/beginners#implementing_the_regression
x = tf.placeholder(tf.float32, [None, 784])
x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers, with a shape [None, 784].
(Here None means that a dimension can be of any length.)
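In practice this means the same placeholder accepts batches of any size along the None dimension. A minimal sketch against the TF 1.x API used in the linked tutorial (zero-filled numpy arrays stand in for MNIST images):
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
row_sums = tf.reduce_sum(x, axis=1)

with tf.Session() as sess:
    # The same graph runs with 1 or 50 rows fed into the None dimension.
    print(sess.run(row_sums, {x: np.zeros((1, 784))}).shape)   # (1,)
    print(sess.run(row_sums, {x: np.zeros((50, 784))}).shape)  # (50,)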
"Can we always use a list type as a function parameter when a tuple is expected?"
No. Aside from other reasons, the function may check the type.
>>> issubclass(ZeroDivisionError, (Exception,))
True
>>> issubclass(ZeroDivisionError, [Exception,])
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    issubclass(ZeroDivisionError, [Exception,])
TypeError: issubclass() arg 2 must be a class or tuple of classes
There are also a couple of places where Python syntax requires a tuple, as in except clauses.
>>> try: 1/0
except (Exception,) as e: print(e)
division by zero
>>> try: 1/0
except [Exception] as e: print(e)
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    try: 1/0
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#2>", line 2, in <module>
    except [Exception] as e: print(e)
TypeError: catching classes that do not inherit from BaseException is not allowed
According to help(getattr), two or three arguments are accepted:
getattr(...)
    getattr(object, name[, default]) -> value
Doing some simple tests, we can confirm this:
>>> obj = {}
>>> getattr(obj, 'get')
<built-in method get of dict object at 0x7f6d4beaf168>
>>> getattr(obj, 'bad', 'with default')
'with default'
Too few/too many arguments also behave as expected:
>>> getattr()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getattr expected at least 2 arguments, got 0
>>> getattr(obj, 'get', 'with default', 'extra')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getattr expected at most 3 arguments, got 4
The argument names specified in the help text do not seem to be accepted as keyword arguments:
>>> getattr(object=obj, name='get')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getattr() takes no keyword arguments
The inspect module is no help here:
>>> import inspect
>>> inspect.getargspec(getattr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/inspect.py", line 816, in getargspec
    raise TypeError('{!r} is not a Python function'.format(func))
TypeError: <built-in function getattr> is not a Python function
(The messaging is a little different in Python 3, but the gist is the same.)
Now, the question: is there a straightforward way to write my own Python function with a signature that behaves exactly like getattr's signature? That is, keyword arguments are not allowed, and a minimum/maximum number of arguments is enforced? The closest I've come is the following:
def myfunc(*args):
    len_args = len(args)
    if len_args < 2:
        raise TypeError('expected at least 2 arguments, got %d' % len_args)
    elif len_args > 3:
        raise TypeError('expected at most 3 arguments, got %d' % len_args)
    ...
But now, instead of meaningful argument names like object and name, we get args[0] and args[1]. It's also a lot of boilerplate, and feels downright unpleasant. I know that, being a builtin, getattr must have a vastly different implementation than typical Python code, and perhaps there's no way to perfectly emulate the way it behaves. But it's a curiosity I've had for a while.
This code ticks most of your requirements:
import functools

def anonymise_args(fn):
    @functools.wraps(fn)
    def wrap(*args):
        return fn(*args)
    return wrap

@anonymise_args
def myfunc(obj, name, default=None):
    print obj, name, default
keyword arguments are not allowed:
>>> myfunc(obj=1, name=2)
TypeError: wrap() got an unexpected keyword argument 'obj'
a minimum/maximum number of arguments is enforced:
>>> myfunc(1, 2, 3, 4)
TypeError: myfunc() takes at most 3 arguments (4 given)
meaningful argument names
not a lot of boilerplate
As of Python 3.8, there is now syntax-level support for this:
def f(a, b, c=None, /):
    ...
Note the slash. Any parameters before the slash are positional-only; they cannot be specified by keyword. This syntax had been proposed for quite a while - PEP 457 dates back to 2013 - but it was only made an actual language feature in Python 3.8.
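With that signature, any attempt to pass these parameters by keyword fails at call time; the exact error wording varies slightly between 3.x versions:
>>> def f(a, b, c=None, /):
...     return (a, b, c)
...
>>> f(1, 2)
(1, 2, None)
>>> f(a=1, b=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() got some positional-only arguments passed as keyword arguments: 'a, b'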
Regardless of whether any parameters are made positional-only, default argument values still have the limitation that there is no way to distinguish the no-value-passed case from the case where the default is passed explicitly. To do that, you have to process *args manually.
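A sketch of that manual handling for a getattr-like function (my_getattr is an illustrative name, not how the builtin is actually implemented, and the / requires Python 3.8):
def my_getattr(obj, name, /, *args):
    # Emulate getattr(object, name[, default]): 2 or 3 positional arguments.
    if len(args) > 1:
        raise TypeError('my_getattr expected at most 3 arguments, got %d'
                        % (len(args) + 2))
    try:
        return getattr(obj, name)
    except AttributeError:
        if not args:  # no default was supplied: re-raise
            raise
        return args[0]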
Prior to Python 3.8, these kinds of function signatures are particular to functions written in C, using the C-level PyArg_Parse* family of functions and the Argument Clinic preprocessor. There's no built-in way to write that kind of signature in Python before 3.8. The closest you can get is what you already came up with, using *args.
I am trying to add a new class/type to Python via C/C++. This particular class has a vector-like attribute called x; it is a list, but one made only of positive doubles. I used tp_getset to have finer control over setting x when it is assigned like this:
>>> obj.x = [1.0, 2.5, 3.5]
I have total control over the setting of the attribute, and it works as expected; I can prevent the user from doing:
>>> obj.x = [1.0, 2.5, -4]
Traceback (most recent call last):
  File "input.py", line 21, in <module>
    obj.x = [1.0, 2.5, -4]
ValueError: 'x[2]' should be greater than or equal to 0.
or
>>> obj.x = [1.0, 2.5, 'str']
Traceback (most recent call last):
  File "input.py", line 21, in <module>
    obj.x = [1.0, 2.5, 'str']
TypeError: failure to deduce C++ type <double*>
and throw an exception. However, when tp_getset.get is defined to return a list object of doubles, I do not have any control over the user doing the following:
>>> obj.x = [1.0, 2.5, 3.5]
>>> obj.x[2] = -3.6
I know that, as a last resort, I can generate a list object with only one reference every time tp_getset.get is called, and not keep a reference inside the class, so that obj.x[2] = ... only modifies a temporary and nothing sticks:
>>> obj.x = [1.0, 2.5, 3.5]
>>> obj.x[2] = 'str'
>>> print obj.x
[1.0, 2.5, 3.5]
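In pure-Python terms, that last-resort approach corresponds to a property whose getter hands out a copy of the internal storage. A sketch for illustration only (the real class lives in C, and Obj is a made-up name):
class Obj(object):
    def __init__(self):
        self._x = []

    @property
    def x(self):
        # hand out a fresh copy: mutating it cannot touch self._x
        return list(self._x)

    @x.setter
    def x(self, values):
        values = [float(v) for v in values]  # rejects non-numeric entries
        for i, v in enumerate(values):
            if v < 0:
                raise ValueError("'x[%d]' should be greater than or equal to 0." % i)
        self._x = values
Note that with this approach obj.x[2] = -3.6 silently mutates a throwaway copy; raising an error instead, as in the desired behaviour below, would require the getter to return a custom sequence proxy rather than a plain list.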
My question: is there a way to have this specific kind of control over sequence attributes of a class? For example:
>>> obj.x = [1.0, 2.5, 4]
>>> obj.x[2] = -3
Traceback (most recent call last):
  File "input.py", line 21, in <module>
    obj.x[2] = -3
ValueError: 'x[2]' should be greater than or equal to 0.
EDIT: I just added more details for clarification
>>> print(len.__doc__)
len(module, object)
Return the number of items of a sequence or mapping.
>>> import os
>>> len(os, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: len() takes exactly one argument (2 given)
Notice the two parameters in the first line of the docstring.
When would you pass two arguments to len? Is the docstring incorrect? I'm using Python 3.4.0.
This was a bug in the docstring, reported on 2014-04-18 on the CPython issue tracker. It has since been fixed in 3.4.1.
Quoting Vedran Čačić, the original author of the bug report:
From recently, help(len) gives the wrong signature of len.
Help on built-in function len in module builtins:
len(...)
len(module, object)
^^^^^^^^
Return the number of items of a sequence or mapping.
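For comparison, after the fix the docstring advertises a single argument again (output from a recent CPython 3; the exact wording has changed slightly across versions):
>>> print(len.__doc__)
Return the number of items in a container.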
When I try the following wrong code:
not_float = [1, 2, 3]
"{:.6f}".format(not_float)
I get the following misleading ValueError:
ValueError: Unknown format code 'f' for object of type 'str'
It is misleading, since it might make me think that not_float is a string. The same message occurs for other non-float types, such as NoneType, tuple, etc. Do you have any idea why? And should I expect this error message no matter what the type of not_float is, as long as it does not provide its own formatting method for f?
On the other hand, trying:
non_date = 3
"{:%Y}".format(non_date)
gives
ValueError: Invalid conversion specification
which is less informative but also less misleading.
The str.format() method, and the format() function, call the .__format__() method of the objects that are being passed in, passing along everything after the : colon (.6f in this case).
The default object.__format__() implementation of that method is to call str(self) then apply format() on that result. This is implemented in C, but in Python that'd look like:
def __format__(self, fmt):
    return format(str(self), fmt)
It is this call that throws the exception. For a list object, for example:
>>> not_float.__format__('.6f')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
because this is functionally the same as:
>>> format(str(not_float), '.6f')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
Integers have a custom .__format__ implementation instead; it does not use str() internally because there are integer-specific formatting options. It turns the value into a string differently. As a result, it throws a different exception because it doesn't recognize %Y as a valid formatting string.
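You can see the two code paths side by side; the output below is from Python 2.7, matching the question (the wording differs in later versions):
>>> format([1, 2, 3], '.6f')  # list: delegates to str() first
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
>>> format(3, '%Y')  # int: custom __format__, different error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid conversion specification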
The error message could certainly be improved; an open Python bug discusses this issue. The problem is moot in Python 3.4, though: object.__format__() now raises a TypeError whenever it is passed a non-empty format string, instead of formatting str(self).