Python object.__repr__(self) should be an expression?

I was looking at the builtin object methods in the Python documentation, and I was interested in the documentation for object.__repr__(self). Here's what it says:
Called by the repr() built-in function and by string conversions (reverse quotes) to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned. The return value must be a string object. If a class defines __repr__() but not __str__(), then __repr__() is also used when an “informal” string representation of instances of that class is required.
This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.
The most interesting part to me was:
If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value
... but I'm not sure exactly what this means. It says it should look like an expression which can be used to recreate the object, but does that mean it should just be an example of the sort of expression you could use, or should it be an actual expression that can be executed (with eval, etc.) to recreate the object? Or should it just be a rehashing of the actual expression which was used, purely for information purposes?
In general I'm a bit confused as to exactly what I should be putting here.

>>> import datetime
>>>
>>> repr(datetime.date.today())  # calls datetime.date.today().__repr__()
'datetime.date(2009, 1, 16)'
>>> eval(_)  # _ is the output of the last command
datetime.date(2009, 1, 16)
The output is a string that can be parsed by the Python interpreter and evaluates to an equal object. Note that the name datetime must be importable in the evaluating namespace, which is why the module itself is imported here.
If that's not possible, it should return a string of the form <...some useful description...>.

It should be a Python expression that, when eval'd, creates an object with the exact same properties as this one. For example, if you have a Fraction class that contains two integers, a numerator and denominator, your __repr__() method would look like this:
# in the definition of the Fraction class
def __repr__(self):
    return "Fraction(%d, %d)" % (self.numerator, self.denominator)
Assuming that the constructor takes those two values.
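As a minimal sketch of that round trip, the class below is a hypothetical stand-in (it is not the standard library's fractions.Fraction), and the __eq__ method is an addition made only so the result can be compared by value:

```python
class Fraction:
    """Hypothetical two-integer fraction, used only to illustrate __repr__."""

    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __eq__(self, other):
        # Value equality, added so the round trip below can be checked.
        return (isinstance(other, Fraction)
                and self.numerator == other.numerator
                and self.denominator == other.denominator)

    def __repr__(self):
        return "Fraction(%d, %d)" % (self.numerator, self.denominator)


half = Fraction(1, 2)
text = repr(half)    # 'Fraction(1, 2)'
clone = eval(text)   # evaluated in a namespace where Fraction is defined
```

Here clone compares equal to half but is a separate object, which is what "recreate an object with the same value" amounts to.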

Guideline: If you can succinctly provide an exact representation, format it as a Python expression (which implies that it can be both eval'd and copied directly into source code, in the right context). If providing an inexact representation, use <...> format.
There are many possible representations for any value, but the one that's most interesting for Python programmers is an expression that recreates the value. Remember that those who understand Python are the target audience—and that's also why inexact representations should include relevant context. Even the default <XXX object at 0xNNN>, while almost entirely useless, still provides type, id() (to distinguish different objects), and indication that no better representation is available.
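For the inexact case, something like the following may help. Connection is a made-up class whose live state could not be rebuilt from a string, so its __repr__ falls back to the <...> form while still carrying the type name and the details a reader is likely to need:

```python
class Connection:
    """Hypothetical connection object; its live state cannot be rebuilt from text."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.is_open = False

    def __repr__(self):
        state = "open" if self.is_open else "closed"
        return "<%s %s:%d (%s)>" % (type(self).__name__, self.host, self.port, state)


conn = Connection("example.org", 443)
description = repr(conn)   # '<Connection example.org:443 (closed)>'
```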

"but does that mean it should just be an example of the sort of expression you could use, or should it be an actual expression, that can be executed (eval etc..) to recreate the object? Or... should it be just a rehasing of the actual expression which was used, for pure information purposes?"
Wow, that's a lot of hand-wringing.
"An example of the sort of expression you could use" would not be a representation of a specific object. That can't be useful or meaningful.
What is the difference between "an actual expression, that can ... recreate the object" and "a rehashing of the actual expression which was used [to create the object]"? Both are expressions that create the object. There's no practical distinction between these. A repr call could produce either a new expression or the original expression. In many cases, they're the same.
Note that this isn't always possible, practical or desirable.
In some cases, you'll notice that repr() presents a string which is clearly not an expression of any kind. The default repr() for any class you define isn't useful as an expression.
In some cases, you might have mutual (or circular) references between objects. The repr() of that tangled hierarchy can't make sense.
In many cases, an object is built incrementally via a parser. For example, from XML or JSON or something. What would the repr be? The original XML or JSON? Clearly not, since they're not Python. It could be some Python expression that generated the XML. However, for a gigantic XML document, it might not be possible to write a single Python expression that was the functional equivalent of parsing XML.
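The circular-reference case can be seen with a built-in list in Python 3. This is only an illustration of the limitation, not a recommendation for how to handle it:

```python
nested = [1, 2]
nested.append(nested)      # the list now contains itself

text = repr(nested)        # '[1, 2, [...]]'

# The text happens to be valid Python 3, but it does not rebuild the cycle:
# the inner [...] is a list holding the Ellipsis object.
rebuilt = eval(text)
```

The repr stays readable, but rebuilt[2] is [Ellipsis] rather than a reference back to the outer list.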

'repr' means representation.
First, we create an instance of the class Coordinate.
x = Coordinate(3, 4)
If Coordinate does not define __repr__, entering x at the console prints the default representation:
<__main__.Coordinate at 0x7fcd40ab27b8>
If Coordinate defines a __repr__ that returns the constructor call, repr() gives:
>>> repr(x)
'Coordinate(3, 4)'
The output is the text Coordinate(3, 4), as a string. You can use it to recreate an instance of Coordinate.
In conclusion, repr() returns a string that is the representation of the object.

To see how the repr works within a class, run the following code, first with and then without the repr method.
class Coordinate(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def getX(self):
        # Getter method for a Coordinate object's x coordinate.
        # Getter methods are better practice than just accessing an attribute directly
        return self.x

    def getY(self):
        # Getter method for a Coordinate object's y coordinate
        return self.y

    def __repr__(self):  # remove this and the next line and re-run
        return 'Coordinate(' + str(self.getX()) + ',' + str(self.getY()) + ')'

>>> c = Coordinate(2, -8)
>>> print(c)

I think the confusion here stems from the English. __repr__() is short for 'representation' of the value, I'm guessing. As @S.Lott said:
"What is the difference between "an actual expression, that can ... recreate the object" and "a rehashing of the actual expression which was used [to create the object]"? Both are an expression that creates the object. There's no practical distinction between these. A repr call could produce either a new expression or the original expression. In many cases, they're the same."
But in some cases they might be different. For example, with coordinate points, you might want c.coordinate to return 3,5 but c.__repr__() to return Coordinate(3, 5). Hope that makes more sense.
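One way to get both forms, sketched here with __str__ standing in for the c.coordinate attribute mentioned above (that substitution is an assumption of this example), is to define the two methods separately:

```python
class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous form that looks like the constructor call.
        return "Coordinate(%r, %r)" % (self.x, self.y)

    def __str__(self):
        # Short, human-oriented form.
        return "%s,%s" % (self.x, self.y)


c = Coordinate(3, 5)
formal = repr(c)         # 'Coordinate(3, 5)'
informal = str(c)        # '3,5'
in_a_list = str([c])     # containers use repr() for their elements
```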

Related

Why we can call `mod` function with string in Python?

I am reading the Python source code:
https://hg.python.org/cpython/file/2.7/Lib/collections.py#l621
def __repr__(self):
    if not self:
        return '%s()' % self.__class__.__name__
    items = ', '.join(map('%r: %r'.__mod__, self.most_common()))
    return '%s({%s})' % (self.__class__.__name__, items)
From the docs:
operator.mod(a, b)
operator.__mod__(a, b)
Return a % b.
That makes sense to me.
But why does '%r: %r'.__mod__ work?

Why Strings Have __mod__
__mod__ implements the behaviour of the % operator in Python. For strings, the % operator is overloaded to give us string formatting options. Where usually a % b would force the evaluation of a mod b if a and b are numbers, for strings, we can change the behaviour of % so that a % b actually inserts the elements of b into a if a is a string.
The way operator overloading works in Python is that each infix operator symbol - +,-,*,/, etc. (and, as of Python 3.5, the matrix multiplication operator @) - corresponds to a specific method in the base definition of the class it's being called on. For +, it is __add__(), for example. For %, it is __mod__(). We can overload these methods by simply redefining them within a class.
If I have class Foo, and Foo implements a member function __add__(self, other), I can potentially make Foo() + bar behave very differently than what the usual definition of + is.
In other words, the string formatting technique
'%s: %s' % (5,2)
in Python actually calls
'%s: %s'.__mod__((5,2))
under the hood, where __mod__ is defined for objects of class str. The way __mod__() is implemented for strings yields, in this case, just 5: 2, rather than the nonsensical interpretation of '%s: %s' mod (5,2).
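The overloading described here can be sketched with the Foo class mentioned above. The behaviour given to + and % is invented purely for illustration:

```python
class Foo:
    """Hypothetical class whose + and % do something unusual."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        # Foo + x appends x instead of doing arithmetic.
        return Foo(self.items + [other])

    def __mod__(self, other):
        # Foo % n keeps only the items divisible by n.
        return Foo([i for i in self.items if i % other == 0])


combined = Foo([1, 2]) + 3        # dispatches to Foo.__add__
evens = Foo([1, 2, 3, 4]) % 2     # dispatches to Foo.__mod__

formatted = '%s: %s' % (5, 2)            # operator form
explicit = '%s: %s'.__mod__((5, 2))      # the method the operator dispatches to
```

Both string expressions produce the same text, because the % operator on a str ends up in str.__mod__.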
Why __mod__ in map and not __mod__()
In the specific case of map('%r: %r'.__mod__, self.most_common()), what's happening is that the function pointer (for want of a better word - note that Python doesn't have pointers, but it doesn't hurt to think in that way for a moment) __mod__ is being applied to each of the elements in self.most_common(), rather than the function __mod__().
This is no different from doing, say, map(int, "52"). We don't pass the function invocation int(); we pass a reference to the function as int and expect map to invoke it with the elements of map's second argument, i.e. int() will be invoked on each element of "52".
We can't do map('%r: %r'.__mod__(), self.most_common()) for exactly this reason. The function '%r: %r'.__mod__() would be invoked without the appropriate parameters passed in and raise an error. What we want instead is a reference to the function __mod__() that we can dereference and invoke whenever we like, which is accomplished by writing __mod__ without parentheses.
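A short sketch of the difference between referring to the method and calling it (Python 3, where map returns an iterator, hence the list() call):

```python
fmt = '%r: %r'.__mod__            # a reference to the bound method; nothing runs yet
one = fmt(('a', 1))               # now it runs: "'a': 1"

digits = list(map(int, "52"))     # map calls int on each character

try:
    '%r: %r'.__mod__()            # calling with no argument fails immediately
except TypeError as exc:
    error = exc
else:
    error = None
```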
A C++ Analogy
The behaviour of __mod__ versus __mod__() is really no different than how function pointers work in C++: a function pointer for foo() is denoted by just foo i.e. without the parentheses. Something analogous - but not quite the same - happens here. I introduce this here because it may make the distinction clearer, because on the surface pointers look very similar to what is happening and introducing pointers leads to a fairly familiar mode of thinking which is good enough for this specific purpose.
In C++, we can pass function pointers to other functions and introduce a form of currying: you can then invoke the function pointer on elements through regular foo() syntax inside another function, for example. In Python, we don't have pointers; we have wrapper objects that can reference the underlying memory location (but prevent raw access to it). For our purposes, though, the net effect is the same. @Bukuriu explores the difference in the comments.
Basically, __mod__() forces an evaluation with no parameters; __mod__ returns a pointer to __mod__() that can then be invoked by another function on suitable parameters. Internally, that is what map does: take a function pointer (again, this is an analogy), and then dereference and evaluate it on each element.
You can see this yourself: calling '%s'.__mod__ returns
<method-wrapper '__mod__' of str object at 0x7f92ed464690>
i.e. a wrapper object with a reference to the memory address to the function. Meanwhile, calling '%s'.__mod__() returns an error:
TypeError: expected 1 arguments, got 0
because the extra parentheses invoked an evaluation of __mod__ and found there were no arguments.

As http://rafekettler.com/magicmethods.html says:
__mod__(self, other)
Implements modulo using the % operator.
This means that when you do string formatting with '%s' % '123', you are calling '%s'.__mod__('123').

Let's break this case down.
Essential parts of this complex line are:
seq = self.most_common()
string_representations = map('%r: %r'.__mod__, seq)
items = ', '.join(string_representations)
The first line calls the Counter's method to retrieve the top counts from the dictionary. The third line joins the string representations into a single comma-separated string. Both are fairly trivial.
The second line:
- calls map, which tells us it calls some function for each element in seq
- passes '%r: %r'.__mod__ as the first argument of map, which defines that function
We know that operator overloading is done by redefining __magic_methods__ in the class declaration. Strings happen to define __mod__ as the interpolation operation.
Also, we know that most operations are just syntactic sugar around those magic methods.
What happens here is that the magic method is referred to explicitly instead of via the a % b syntactic sugar. Or, from a different perspective, the underlying implementation detail is used to perform the operation instead of the standard form.
The second line is roughly equivalent to:
string_representations = ['%r: %r' % o for o in seq]
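The equivalence can be checked on a small Counter. The input 'aab' is an arbitrary choice for this sketch:

```python
from collections import Counter

counts = Counter('aab')
seq = counts.most_common()                        # [('a', 2), ('b', 1)]

via_map = list(map('%r: %r'.__mod__, seq))        # explicit magic method
via_comprehension = ['%r: %r' % o for o in seq]   # operator form

items = ', '.join(via_map)                        # "'a': 2, 'b': 1"
```

Each element of seq is a (key, count) tuple, so it supplies exactly the two values that '%r: %r' expects.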

What does "Everything" mean when someone says "Everything in Python is an object."?

I constantly see people state that "Everything in Python is an object.", but I haven't seen "thing" actually defined. This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case? Is there a more concise way of stating what a Python object actually is?
Thanks

Anything that can be assigned to a variable is an object.
That includes functions, classes, and modules, and of course int's, str's, float's, list's, and everything else. It does not include whitespace, punctuation, or operators.
Just to mention it, there is the operator module in the standard library which includes functions that implement operators; those functions are objects. That doesn't mean + or * are objects.
I could go on and on, but this is simple and pretty complete.
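A rough way to test that rule of thumb: anything that can be bound to a name passes isinstance(..., object), while a bare operator token cannot even be compiled as a value. The greet function and the "plus = +" snippet are made up for this sketch:

```python
import math
import operator


def greet():
    return "hello"


# Each of these can be bound to a name, so each is an object.
things = [3, "text", 2.5, [1, 2], greet, int, math, operator.add, type]
all_objects = all(isinstance(thing, object) for thing in things)

# A bare operator token is not a value, so trying to assign it is a syntax error.
try:
    compile("plus = +", "<example>", "exec")
    operator_is_value = True
except SyntaxError:
    operator_is_value = False
```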

Some values are obviously objects; they are instances of a class, have attributes, etc.
>>> i = 3
>>> type(i)
<type 'int'>
>>> i.denominator
1
Other values are less obviously objects. Types are objects:
>>> type(int)
<type 'type'>
>>> int.__mul__(3, 5)
15
Even type is an object (of type type, oddly enough):
>>> type(type)
<type 'type'>
Modules are objects:
>>> import sys
>>> type(sys)
<type 'module'>
Built-in functions are objects:
>>> type(sum)
<type 'builtin_function_or_method'>
In short, if you can reference it by name, it's an object.

What is generally meant is that most things, for example functions and methods, are objects. Modules too. Classes themselves (not just their instances) are objects, and ints, floats, and strings are objects. So, yes, things generally tend to be objects in Python. Cyphase is correct; I just wanted to give some examples of things that might not be immediately obvious as objects.
Because they are objects, a number of properties are observable on things that you would consider special-case, baked-in stuff in other languages. However, __dict__, which allows arbitrary attribute assignment in Python, is often missing on things intended for large-volume instantiation, like int.
Therefore, at least on pure-Python objects, a lot of magic can happen, from introspection to things like creating a new class on the fly.
Kinda like turtles all the way down.

You're not going to find a rigorous definition like C++11's, because Python does not have a formal specification like C++11, it has a reference manual like pre-ISO C++. The Data model chapter is as rigorous as it gets:
Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann’s model of a “stored program computer,” code is also represented by objects.)
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. …
The glossary also has a shorter definition:
Any data with state (attributes or value) and defined behavior (methods).
And it's true that everything in Python has methods and (other) attributes. Even if there are no public methods, there's a set of special methods and values inherited from the object base class, like the __str__ method.
This wasn't true in versions of Python before 2.2, which is part of the reason we have multiple words for nearly the same thing—object, data, value; type, class… But from then on, the following kinds of things are identical:
Objects.
Things that can be returned or yielded by a function.
Things that can be stored in a variable (including a parameter).
Things that are instances of type object (usually indirectly, through a subclass or two).
Things that can be the value resulting from an expression.
Things represented by pointers to PyObject structs in CPython.
… and so on.
That's what "everything is an object" means.
It also means that Python doesn't have "native types" and "class types" like Java, or "value types" and "reference types" like C#; there's only one kind of thing, objects.
This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case?
No. Those things don't have values, so they're not objects.[1]
Also, variables are not objects. Unlike C-style variables, Python variables are not memory locations with a type containing a value; they're just names bound to a value in some namespace.[2] And that's why you can't pass around references to variables; there is no "thing" to reference.[3]
Assignment targets are also not objects. They sometimes look a lot like values, and even the core devs sometimes refer to things like the a, b in a, b = 1, 2 loosely as a tuple object, but there is no tuple there.[4]
There's also a bit of apparent vagueness with things like elements of a numpy.array (or an array.array or ctypes.Structure). When you write a[0] = 3, the 3 object doesn't get stored in the array the way it would with a list. Instead, numpy stores some bytes that Python doesn't even understand, but that it can use to do "the same thing a 3 would do" in array-wide operations, or to make a new copy of the 3 object if you later ask for a[0].
But if you go back to the definition, it's pretty clear that this "virtual 3" is not an object—while it has a type and value, it does not have an identity.
1. At the meta level, you can write an import hook that can act on imported code as a byte string, a decoded Unicode string, a list of token tuples, an AST node, a code object, or a module, and all of those are objects… But at the "normal" level, from within the code being imported, tokens, etc. are not objects.
2. Under the covers, there's almost always a string object to represent that name, stored in a dict or tuple that represents the namespace, as you can see by calling globals() or dir(self). But that's not what the variable is.
3. A closure cell is sort of a way of representing a reference to a variable, but really, it's the cell itself that's an object, and the variables at different scopes are just a slightly special kind of name for that cell.
4. However, in a[0] = 3, although a[0] isn't a value, a and 0 are, because that assignment is equivalent to the expression a.__setitem__(0, 3), except that it's not an expression.
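The point about array elements can be sketched with the standard library's array.array (used here instead of numpy so the example has no third-party dependency). The value 3.5 is arbitrary:

```python
import array
import struct

values = array.array('d', [3.5])   # one C double, kept as raw bytes
raw = values.tobytes()             # the bytes the array actually stores

first = values[0]                  # indexing builds a float object from those bytes
```

The array holds only the machine representation of the number; a Python float object exists only once an element is read back out.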

Custom Python Data Structure

I have a class that wraps Python's deque from collections. When I create a deque with x = deque(), I get an empty deque object. If I fill it with x.append(0) and simply type x on the console, I get:
In[78]: x
Out[78]: deque([0])
My question is how I can get the same output when I have a wrapper class around deque. For example:
class deque_wrapper:
    def __init__(self):
        self.data_structure = deque()

    def newCustomAddon(self):
        return len(self.data_structure)
That is:
In[74]: x = deque_wrapper()
In[75]: x
Out[75]: <__main__.deque_wrapper at 0x7e3d0f0>
I want to customize what gets printed out, as opposed to just a memory location. What can I do?

I want to customize what gets printed out, as opposed to just a memory location. What can I do?
This is exactly what __repr__ is for:
Called by the repr() built-in function to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned.
Because you didn't define a __repr__, you're getting the default implementation from object (assuming Python 3… otherwise, you've written a classic class, which is a bad idea, and you don't want to learn how they get their defaults when you can just stop using them…), which just returns that string with the object's type name and address.
Note the __str__ method below __repr__ in the docs. If the most human-readable representation and the valid-Python-expression representation are not the same, define both methods. Otherwise, just define __repr__, and __str__ will use it by default.
So, if you want to print the exact same thing as deque, just delegate __repr__:
def __repr__(self):
    return repr(self.data_structure)
If you want to wrap it in something:
def __repr__(self):
    return '{}({!r})'.format(type(self).__name__, self.data_structure)
Note that I didn't call repr in the second version, because that's exactly what !r means in a format string. But really, in this case, you don't need either; a deque has the same str and repr.
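Put together, the wrapped version might look like this. It is a minimal sketch of the class from the question with only the __repr__ method added:

```python
from collections import deque


class deque_wrapper:
    def __init__(self):
        self.data_structure = deque()

    def __repr__(self):
        # {!r} applies repr() to the wrapped deque.
        return '{}({!r})'.format(type(self).__name__, self.data_structure)


x = deque_wrapper()
x.data_structure.append(0)
shown = repr(x)    # 'deque_wrapper(deque([0]))'
```

Using type(self).__name__ rather than a hard-coded name means a subclass shows its own name without overriding __repr__.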

Delegate the generation of the representation.
class deque_wrapper:
    ...
    def __repr__(self):
        return repr(self.data_structure)

When is the output of repr useful?

I have been reading about repr in Python. I was wondering what the application of the output of repr is. e.g.
>>> class A:
...     pass
...
>>> repr(A)
'<class __main__.A at 0x6f570>'
>>> b = A()
>>> repr(b)
'<__main__.A instance at 0x74d78>'
When would one be interested in '<class __main__.A at 0x6f570>' or '<__main__.A instance at 0x74d78>'?

Theoretically, repr(obj) should spit out a string such that it can be fed into eval to recreate the object. In other words,
obj2 = eval(repr(obj1))
should reproduce the object.
In practice, repr is often a "lite" version of str. str might print a human-readable form of the object, whereas repr prints out information like the object's class, usually for debugging purposes. But the usefulness depends a lot on your situation and how the object in question handles repr.
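Both halves of that statement can be checked quickly. The dictionary contents below are arbitrary, and A is an empty class like the one in the question:

```python
obj1 = {'name': 'John', 'scores': [1, 2.5], 'pair': (None, True)}
obj2 = eval(repr(obj1))        # built-in containers and literals round-trip


class A:
    pass


try:
    eval(repr(A()))            # the default '<... object at 0x...>' is not an expression
    round_trips = True
except SyntaxError:
    round_trips = False
```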

Sometimes you have to deal with or present a byte string such as
bob2='bob\xf0\xa4\xad\xa2'
If you print this out (in Ubuntu) you get
In [62]: print(bob2)
bob𤭢
which is not very helpful to others trying to understand your byte string. In the comments, John points out that on Windows, print(bob2) results in something like bobð¤¢. The problem is that Python detects the default encoding of your terminal/console and tries to decode the byte string according to that encoding. Since Ubuntu and Windows use different default encodings (possibly utf-8 and cp1252 respectively), different results ensue.
In contrast, the repr of a string is unambiguous:
In [63]: print(repr(bob2))
'bob\xf0\xa4\xad\xa2'
When people post questions here on SO about Python strings, they are often asked to show the repr of the string so we know for sure what string they are dealing with.
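The same idea applies to ordinary Python 3 strings: str() can make different values look identical, while repr() keeps them apart. The sample values are made up for this sketch:

```python
number = 1
text = '1'
padded = 'tab\there '            # contains a tab and a trailing space

same_when_printed = str(number) == str(text)     # both print as 1
distinct_reprs = (repr(number), repr(text))      # ('1', "'1'")
visible = repr(padded)                           # the tab and the space become visible
```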
In general, the repr should be an unambiguous string representation of the object. repr(obj) calls the object obj's __repr__ method. Since in your example the class A does not have its own __repr__ method, repr(b) resorts to indicating the class and memory address.
You can override the __repr__ method to give more relevant information.
In your example, '<__main__.A instance at 0x74d78>' tells us two useful things:
that b is an instance of class A in the __main__ namespace,
and that the object resides in memory at address 0x74d78.
You might, for instance, have two names that refer to instances of class A. If their reprs show the same memory address, then you'd know they are "pointing" to the same underlying object. (Note that this information can also be obtained using id.)
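A small sketch of that check, using the is operator and id() alongside the default repr. The variable names are arbitrary:

```python
class A:
    pass


first = A()
alias = first      # a second name for the same object
other = A()        # a distinct object

same_object = first is alias                   # equivalent to id(first) == id(alias)
different_objects = first is not other
reprs_match = repr(first) == repr(alias)       # same type name and same address
```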

The main purpose of repr() is that it is used in the interactive interpreter and in the debugger to format objects in human-readable form. The example you gave is mainly useful for debugging purposes.

How does Python differentiate between the different data types?

Sorry if this is quite noobish to you, but I'm just starting out to learn Python after learning C++ & Java, and I am wondering how in the world I could just declare variables like id = 0 and name = 'John' without any int's or string's in front! I figured out that perhaps it's because there are no ''s in a number, but how would Python figure that out in something like def increase(first, second) instead of something like int increase(int first, int second) in C++?!

The literal objects you mention carry (pointers to;-) their own types with them of course, so when a name's bound to that object the problem of type doesn't arise -- the object always has a type, the name doesn't -- just delegates that to the object it's bound to.
There's no "figuring out" in def increase(first, second): -- name increase gets bound to a function object, names first and second are recorded as parameters-names and will get bound (quite possibly to objects of different types at various points) as increase gets called.
So say the body is return first + second -- a call to increase('foo', 'bar') will then happily return 'foobar' (delegating the addition to the objects, which in this case are strings), and maybe later a call to increase(23, 45) will just as happily return 68 -- again by delegating the addition to the objects bound to those names at the point of call, which in this case are ints. And if you call with incompatible types you'll get an exception as the delegated addition operation can't make sense of the situation -- no big deal!
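The calls described above can be run directly. The failing call at the end is an added illustration of the "incompatible types" case:

```python
def increase(first, second):
    # No types are declared; + is delegated to whatever objects arrive.
    return first + second


joined = increase('foo', 'bar')    # 'foobar'
total = increase(23, 45)           # 68

try:
    increase('foo', 45)            # str + int is not defined
    failure = None
except TypeError as exc:
    failure = exc
```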

Python is dynamically typed: all variables can refer to an object of any type. id and name can be anything, but the actual objects are of types like int and str. 0 is a literal that is parsed to make an int object, and 'John' a literal that makes a str object. Many object types do not have literals and are returned by a callable (like frozenset—there's no way to make a literal frozenset, you must call frozenset.)
Consequently, there is no such thing as declaration of variables, since you aren't defining anything about the variable. id = 0 and name = 'John' are just assignment.
increase returns an int because that's what you return in it; nothing in Python forces it not to be any other object. first and second are only ints if you make them so.
Objects, to a certain extent, share a common interface. You can use the same operators and functions on them all, and if they support that particular operation, it works. It is a common, recommended technique to use different types that behave similarly interchangeably; this is called duck typing. For example, if something takes a file object you can instead pass a cStringIO.StringIO object, which supports the same methods as a file (like read and write) but is a completely different type. This is sort of like Java interfaces, but does not require any formal declaration; you just define the appropriate methods.
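A duck-typing sketch in Python 3, where io.StringIO plays the role that cStringIO.StringIO played in Python 2. UpperCaseSource and count_words are hypothetical names invented for this example:

```python
import io


class UpperCaseSource:
    """Hypothetical reader: not a file, but it has a read() method."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text.upper()


def count_words(source):
    # Works with anything that provides read(); the type is never checked.
    return len(source.read().split())


from_buffer = count_words(io.StringIO("a b c"))       # 3
from_custom = count_words(UpperCaseSource("d e"))     # 2
```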

Python uses the duck-typing method - if it walks, looks and quacks like a duck, then it's a duck. If you pass in a string, and try to do something numerical on it, then it will fail.
Have a look at: http://en.wikipedia.org/wiki/Python_%28programming_language%29#Typing and http://en.wikipedia.org/wiki/Duck_typing

When it comes to assigning literal values to variables, the type of the literal value can be inferred at the time of lexical analysis. For example, anything matching the regular expression (-)?[1-9][0-9]* can be inferred to be an integer literal. If you want to convert it to a float, there needs to be an explicit cast. Similarly, a string literal is any sequence of characters enclosed in single or double quotes.
In a method call, the parameters are not type-checked. You only need to pass in the correct number of them to be able to call the method. So long as the body of the method does not cause any errors with respect to the arguments, you can call the same method with lots of different types of arguments.

In Python, unlike in C++ and Java, numbers and strings are both objects. So this:
id = 0
name = 'John'
is equivalent to:
id = int(0)
name = str('John')
Since variables id and name are references that may address any Python object, they don't need to be declared with a particular type.
