getattr(dir,"__name__") is dir.__name__ evaluates to False - is there an alternative to getattr that would yield True ?
The __name__ attribute of built-in functions is implemented (on the CPython reference interpreter) as a property (technically, a get-set descriptor), not stored as an attribute in the form of a Python object.
Properties act like attributes, but call a function when the value is requested; in this case, the function converts the C-style string name of the function to a Python str on demand. So each time you look up dir.__name__, you get a freshly constructed str representing the data. As noted in the comments, this means there is no way to have an is check pass; even dir.__name__ is dir.__name__ returns False, because each lookup of __name__ returns a new str.
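You can see the underlying descriptor by looking it up on the type rather than through the instance; the exact repr below is what CPython prints:

>>> type(dir).__dict__['__name__']
<attribute '__name__' of 'builtin_function_or_method' objects>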
The language gives no guarantees about how __name__ is implemented, so you shouldn't assume it returns the same object each time. There are very few language-guaranteed singletons (None, True, False, Ellipsis and NotImplemented are the biggies, and all classes have unique identities); assuming is will work with anything outside that set, when it's not an object whose creation you controlled, is a bad idea. If you want to check whether the values are the same, test with ==, not is.
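For example (the identity result is what CPython happens to do; only the == result is guaranteed by the language):

>>> dir.__name__ == dir.__name__
True
>>> dir.__name__ is dir.__name__
False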
Update to address traversing an arbitrary graph of Python objects without getting hung up by descriptors and other machinery (like __getattr__) that dynamically generates objects (and therefore shouldn't be invoked when describing the static graph):
The inspect.getattr_static function should let you "traverse an arbitrary graph of python objects reachable from a starting one while assuming as little possible about the types of objects and the implementation of their attributes" (as your comment requested). When the attribute is actually an attribute, it returns the value, but it doesn't trigger dynamic lookup for descriptors (like @property), __getattr__ or __getattribute__. So inspect.getattr_static(dir, '__name__') will return the getset_descriptor that CPython uses to implement __name__ without actually retrieving the string. On a different object where __name__ is a real attribute (e.g. the inspect module itself), it will return the attribute (inspect.getattr_static(inspect, '__name__') returns 'inspect').
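A quick demonstration (the descriptor repr is CPython-specific):

>>> import inspect
>>> inspect.getattr_static(dir, '__name__')
<attribute '__name__' of 'builtin_function_or_method' objects>
>>> inspect.getattr_static(inspect, '__name__')
'inspect'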
While it's not perfect (some properties may actually be backed by real Python objects, not dynamically generated ones, which you then can't access), it's at least a workable solution; you won't end up creating new objects by accident, and you won't end up in infinite chains of attribute lookup (e.g. every callable can have __call__ looked up on it forever, wrapping itself over and over as it goes), so you can at least arrive at a result that mostly reflects the object graph accurately and doesn't recurse to death.
Notably, it will preserve identity semantics properly. If two objects have the same attribute (by identity), the results will match as expected. If two objects share a descriptor (e.g. __name__ for all built-in functions, such as bin and dir), then it returns the descriptor itself, which will match on identity. And it does it all without needing to know up front whether what you have is an attribute or a descriptor.
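For instance, since all built-in functions share a single type, they share the descriptor, and that shows up on an identity check (CPython behavior):

>>> inspect.getattr_static(bin, '__name__') is inspect.getattr_static(dir, '__name__')
True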
Related
I understand that the dot operator accesses the method specific to an object that is an instance of the class containing that method/function. However, in which cases do you instead call the function directly on an object, in the form func(obj), as opposed to obj.func()?
Can both techniques always be implemented (at least in custom code) or are there certain cases in which the former should be used over the latter, and vice versa?
I had previously read that the form func(obj) is for processing data that the object holds, but why would this not be possible by doing obj.dataMember.func()? Is there an advantage to passing just the object, such as some change in mutability?
If the function exists exclusively to serve that object type, then you should probably make it a method of the class; that requires the obj.func() syntax.
If the function will also work on objects not of that one class, then you should make it a regular function, performing the generalization and discrimination within the function. This requires the syntax func(obj).
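A minimal sketch of the distinction (the class and function names here are made up for illustration):

import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        # Behavior that only makes sense for a Circle: a method.
        return math.pi * self.radius ** 2

def describe(obj):
    # Works on instances of many classes: a plain function.
    return "%s instance: %r" % (type(obj).__name__, vars(obj))

c = Circle(2.0)
print(c.area())        # method syntax: obj.func()
print(describe(c))     # function syntax: func(obj)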
I constantly see people state that "Everything in Python is an object.", but I haven't seen "thing" actually defined. This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case? Is there a more concise way of stating what a Python object actually is?
Anything that can be assigned to a variable is an object.
That includes functions, classes, and modules, and of course ints, strs, floats, lists, and everything else. It does not include whitespace, punctuation, or operators.
Just to mention it, there is the operator module in the standard library which includes functions that implement operators; those functions are objects. That doesn't mean + or * are objects.
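For example:

>>> import operator
>>> operator.add
<built-in function add>
>>> operator.add(2, 3)
5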
I could go on and on, but this is simple and pretty complete.
Some values are obviously objects; they are instances of a class, have attributes, etc.
>>> i = 3
>>> type(i)
<type 'int'>
>>> i.denominator
1
Other values are less obviously objects. Types are objects:
>>> type(int)
<type 'type'>
>>> int.__mul__(3, 5)
15
Even type is an object (of type type, oddly enough):
>>> type(type)
<type 'type'>
Modules are objects:
>>> import sys
>>> type(sys)
<type 'module'>
Built-in functions are objects:
>>> type(sum)
<type 'builtin_function_or_method'>
In short, if you can reference it by name, it's an object.
What is generally meant is that most things, for example functions and methods, are objects. Modules too. Classes (not just their instances) themselves are objects. And ints/floats/strings are objects. So, yes, things generally tend to be objects in Python. Cyphase is correct; I just wanted to give some examples of things that might not be immediately obvious as objects.
Because they are objects, a number of properties are observable on things that you would consider special-case, baked-in stuff in other languages. Though __dict__, which allows arbitrary attribute assignment in Python, is often missing on things intended for large-volume instantiation, like int.
Therefore, at least on pure-Python objects, a lot of magic can happen, from introspection to things like creating a new class on the fly.
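For example (the class name is made up; the int result illustrates the missing __dict__):

>>> Point = type('Point', (object,), {'x': 0})   # a class created on the fly
>>> Point().x
0
>>> hasattr(1, '__dict__')
False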
Kinda like turtles all the way down.
You're not going to find a rigorous definition like C++11's, because Python does not have a formal specification like C++11; it has a reference manual, like pre-ISO C++. The Data model chapter is as rigorous as it gets:
Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann’s model of a “stored program computer,” code is also represented by objects.)
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. …
The glossary also has a shorter definition:
Any data with state (attributes or value) and defined behavior (methods).
And it's true that everything in Python has methods and (other) attributes. Even if there are no public methods, there's a set of special methods and values inherited from the object base class, like the __str__ method.
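For example:

>>> hasattr(object(), '__str__')
True
>>> (42).__str__()
'42'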
This wasn't true in versions of Python before 2.2, which is part of the reason we have multiple words for nearly the same thing—object, data, value; type, class… But from then on, the following kinds of things are identical:
Objects.
Things that can be returned or yielded by a function.
Things that can be stored in a variable (including a parameter).
Things that are instances of type object (usually indirectly, through a subclass or two).
Things that can be the value resulting from an expression.
Things represented by pointers to PyObject structs in CPython.
… and so on.
That's what "everything is an object" means.
It also means that Python doesn't have "native types" and "class types" like Java, or "value types" and "reference types" like C#; there's only one kind of thing, objects.
This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case?
No. Those things don't have values, so they're not objects.[1]
Also, variables are not objects. Unlike C-style variables, Python variables are not memory locations with a type containing a value, they're just names bound to a value in some namespace.[2] And that's why you can't pass around references to variables; there is no "thing" to reference.[3]
Assignment targets are also not objects. They sometimes look a lot like values, and even the core devs sometimes refer to things like the a, b in a, b = 1, 2 loosely as a tuple object—but there is no tuple there.[4]
There's also a bit of apparent vagueness with things like elements of a numpy.array (or an array.array or ctypes.Structure). When you write a[0] = 3, the 3 object doesn't get stored in the array the way it would with a list. Instead, numpy stores some bytes that Python doesn't even understand, but that it can use to do "the same thing a 3 would do" in array-wide operations, or to make a new copy of the 3 object if you later ask for a[0].
But if you go back to the definition, it's pretty clear that this "virtual 3" is not an object—while it has a type and value, it does not have an identity.
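You can see the fresh-wrapper behavior with numpy (results observed on CPython; each read constructs a new scalar object):

>>> import numpy as np
>>> a = np.zeros(4, dtype=np.int32)
>>> a[0] = 3
>>> type(a[0])
<class 'numpy.int32'>
>>> a[0] is a[0]
False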
[1] At the meta level, you can write an import hook that can act on imported code as a byte string, a decoded Unicode string, a list of token tuples, an AST node, a code object, or a module, and all of those are objects… But at the "normal" level, from within the code being imported, tokens, etc. are not objects.
[2] Under the covers, there's almost always a string object to represent that name, stored in a dict or tuple that represents the namespace, as you can see by calling globals() or dir(self). But that's not what the variable is.
[3] A closure cell is sort of a way of representing a reference to a variable, but really, it's the cell itself that's an object, and the variables at different scopes are just a slightly special kind of name for that cell.
[4] However, in a[0] = 3, although a[0] isn't a value, a and 0 are, because that assignment is equivalent to the expression a.__setitem__(0, 3), except that it's not an expression.
Below is the program that returns a function object defined in function f, whose stack frame (f1) is still alive until the program exits.
Below is the program that returns an int object whose value is 1024; here the stack frame does not exist after we return the int object.
As per the above two diagrams, why is there this difference in return mechanisms, where the frame is not alive when you return an int object?
What is the idea behind the stack frame staying alive when a function object is returned?
Python never makes copies unless explicitly asked to (for example, slicing a list does ask Python to copy that part of the list, shallowly).
"Does add_three refer to the same int object that n is pointing to?" -- yes, only references to that int are being passed around and held in frames. In this case this applies whatever the value of n.
Any Python implementation is allowed to keep a single copy, or multiple copies, of immutable objects like ints -- whatever's most convenient to that implementation, given the semantics are not affected anyway.
So in a given implementation it could happen that every mention of literal 3 refers to the same int object but mentions of literal 333 need not. E.g.:
>>> a=333; b=333; print(id(a), id(b))
(4298804944, 4298804944)
>>> a=333
>>> b=333
>>> print(id(a), id(b))
(4298753600, 4298753336)
The semantics of the two cases are absolutely identical; in the first case the compiler (intrinsically called on the whole line at once) finds it handy to instantiate and use a single int worth 333, in the second case it prefers to make and use two such instances -- either is completely fine, given int's immutability (same goes for other number types, strings, tuples, frozen sets -- but not for mutable types).
Note that the Python specification's notion of "same semantics" explicitly excludes introspection, which may well be able to pinpoint implementation differences between semantically equivalent states.
id (normally returning the memory address of an object, in current popular implementations of Python, but in any case an id that's unique per object as long as the object lives, per language specs) is introspection, as consequently is the is operator. So you can if you wish use it to understand some optimizations a given implementation may perform, or not.
So on to your other Qs: "Is my understanding correct?" -- no.
"Why this difference" -- def builds a function object, which is mutable, so any def even with identical function definitions must return a new object, just like e.g [] builds a list object, mutable, so any [] must return a new object. 3 build an int object, which is immutable, so any 3 is allowed (per language rules) to return either the same or a new object.
One more question was added in an edit: "What is the idea for stack frame being alive when function type object is returned?"
Answer: every object stays alive as long as it's reachable. An outer function's frame, in particular, stays alive as long as inner (nested) functions that refer to names in the outer frame are themselves alive (for example, because they were returned).
(A Python implementation doesn't have to garbage-collect objects that no longer need to be alive -- it may delay that garbage collection as long as it pleases, or perform it at once -- implementation details!-).
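A minimal closure sketch illustrating the point (Python 3 syntax; the names are made up):

def make_counter():
    count = 0                  # lives in make_counter's frame
    def increment():
        nonlocal count         # refers to a name in the outer frame
        count += 1
        return count
    return increment           # returning the inner function keeps count reachable

counter = make_counter()
print(counter())   # 1
print(counter())   # 2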
The results from the code below in Python 2.7 struck me as a contradiction. The is operator is supposed to work with object identity and so is id. But their results diverge when I'm looking at a user-defined method. Why is that?
>>> class Hello(object):
...     def hello():
...         pass
...
>>> Hello.hello is Hello.hello
False
>>> id(Hello.hello) - id(Hello.hello)
0
I found the following excerpt from the description of the Python data model somewhat useful. But it didn't really make everything clear. Why does the id function return the same integer if the user-defined method objects are constructed anew each time?
User-defined method objects may be created when getting an attribute of a class (perhaps via an instance of that class), if that attribute is a user-defined function object, an unbound user-defined method object, or a class method object. When the attribute is a user-defined method object, a new method object is only created if the class from which it is being retrieved is the same as, or a derived class of, the class stored in the original method object; otherwise, the original method object is used as it is.
The Python documentation for the id function states:
Return the "identity" of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
(emphasis mine)
When you do id(Hello.hello) == id(Hello.hello), the method object is created only briefly and is considered "dead" after the first call to 'id'. Because of the call to id, you only need Hello.hello to be alive for a short period of time -- enough to obtain the id. Once you get that id, the object is dead and the second Hello.hello can reuse that address, which makes it appear as if the two objects have the same id.
This is in contrast to doing Hello.hello is Hello.hello -- both instances have to live long enough to be compared to each other, so you end up having two live instances.
If you instead tried:
>>> a = Hello.hello
>>> b = Hello.hello
>>> id(a) == id(b)
False
...you'd get the expected value of False.
This is a "simple" consequence of how the memory allocator works. It is very similar to the case:
>>> id([]) == id([])
True
Basically, Python doesn't guarantee that ids don't get reused -- it only guarantees that the id is unique as long as the object is alive. In this case, the first object being passed to id is dead after the call to id, and (C)Python re-uses that id when creating the second object.
Never rely on this behavior as it is allowed by the language reference, but certainly not required.
In Python:
len(a) can be replaced by a.__len__()
str(a) or repr(a) can be replaced by a.__str__() or a.__repr__()
== is __eq__, + is __add__, etc.
Is there a similar method to get id(a)? If not, is there any workaround to get a unique id of a Python object without using id()?
Edit: additional question: if not, is there any reason not to define an __id__()?
No, this behavior cannot be changed. id() is used to get "an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime" (source). No other special meaning is given to this integer (in CPython it is the address of the memory location where the object is stored, but this cannot be relied upon in portable Python).
Since there is no special meaning for the return value of id(), it makes no sense to allow you to return a different value instead.
Further, while you could guarantee that id() would return unique integers for your own objects, you could not possibly satisfy the global uniqueness constraint, since your object cannot possibly have knowledge of all other living objects. It would be possible (and likely) that one of your special values would clash with the identity of another object alive in the runtime. This would not be an acceptable scenario.
If you need a return value that has some special meaning then you should define a method where appropriate and return a useful value from it.
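If what you actually want is a stable identifier whose meaning you control, one common workaround is to mint your own (a sketch; the class and attribute names are made up):

import itertools

class Tracked(object):
    _ids = itertools.count()

    def __init__(self):
        self.uid = next(self._ids)   # unique among Tracked instances only

a = Tracked()
b = Tracked()
print(a.uid, b.uid)   # 0 1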
An object isn't aware of its own name (it can have many), let alone of any unique ID associated with it. So -- in short -- no. The reason that __len__ and co. work is that they are bound to the object already; an object is not bound to its ID.