Slightly surprising behaviour: two classes with the same name in different modules share the same name string object. (Not just equality, which is expected, but string object identity!)
I don't suppose it really matters much, but does anyone know why, and whether there's any potential here for further surprises?
Demo (with a/a.py and b/b.py, each defining a class Test, and an empty __init__.py in a/ and b/):
>>> from a import a
>>> from b import b
>>> ta = a.Test()
>>> tb = b.Test()
>>> ta.__class__.__name__
'Test'
>>> tb.__class__.__name__
'Test'
>>> tb.__class__.__name__ is ta.__class__.__name__ # unexpected
True
>>> ta.__class__
<class 'a.a.Test'>
>>> tb.__class__
<class 'b.b.Test'>
This is an implementation detail of the CPython interpreter.
It interns identifier-like strings: only one copy is created for such source-code strings, and it is re-used everywhere the exact same string value appears. This makes dictionary containment tests faster (by trying pointer comparisons first).
You can do this for your own strings with the sys.intern() function, which helpfully notes:
Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.
Also see About the changing id of a Python immutable string
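A quick sketch of sys.intern() in action (the identity results are CPython behaviour, so treat them as illustrative rather than guaranteed):
import sys

# Build two equal strings at runtime so the compiler can't fold them
# into a single constant.
a = "".join(["hello", " ", "world"])
b = "".join(["hello", " ", "world"])
print(a == b)   # True: equal values
print(a is b)   # False: two distinct objects

a = sys.intern(a)
b = sys.intern(b)
print(a is b)   # True: both names now reference the single interned copy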
As you've noted, you can't rely on string objects being the same, nor on them being different objects (unless you explicitly make references to the same strings, of course).
If you want to compare the names, just use the == operator. If you want to know whether the classes are the same, use the is operator with the classes themselves, not with their names.
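A minimal sketch of the difference (TestA and TestB are hypothetical stand-ins for a.Test and b.Test):
# Two distinct classes that happen to share the name "Test".
TestA = type("Test", (), {})
TestB = type("Test", (), {})

print(TestA.__name__ == TestB.__name__)  # True: compare names with ==
print(TestA is TestB)                    # False: distinct class objects
# TestA.__name__ is TestB.__name__ depends on interning: don't rely on it.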
In Python everything is an object, classes included: the type of any class you write is type.
In the question there are two different classes, both named Test, in modules a and b. These two lines create an instance of each class:
>>> ta = a.Test()
>>> tb = b.Test()
Calling Test() on each module creates an instance of that module's Test class. The __class__ attribute returns the instance's type, and __name__ returns the class's name as a string. You can confirm this by checking type(tb.__class__.__name__). So
>>> tb.__class__.__name__ is ta.__class__.__name__
this returns True because both __name__ strings are the same interned string object.
The Python documentation for the id() function states the following:
This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
However, the snippet below shows that ids are repeated. Since I didn't explicitly del the objects, I presume they are all alive and unique (I do not know what non-overlapping means).
>>> g = [0, 1, 0]
>>> for h in g:
... print(h, id(h))
...
0 10915712
1 10915744
0 10915712
>>> a=0
>>> b=1
>>> c=0
>>> d = [a, b, c]
>>> for e in d:
... print(e, id(e))
...
0 10915712
1 10915744
0 10915712
>>> id(a)
10915712
>>> id(b)
10915744
>>> id(c)
10915712
>>>
How can the id values for different objects be the same? Is it because the value 0 (an object of class int) is a constant that the interpreter/C compiler optimizes?
If I were to do a = c, then I understand c to have the same id as a, since c would just be a reference (alias) to a. Otherwise I expected the objects a and c to have different id values, but, as shown above, they have the same values.
What's happening? Or am I looking at this the wrong way?
I would expect the ids for a user-defined class's objects to ALWAYS be unique even if they have the exact same member values.
Could someone explain this behavior? (I looked at the other questions that ask uses of id(), but they steer in other directions)
EDIT (09/30/2019):
To extend what I already wrote: I ran Python interpreters in separate terminals and checked the id of 0 in each. They were exactly the same across multiple instances of the same interpreter; Python 2 vs Python 3 gave different values, but the same Python 2 interpreter always gave the same id.
My question arises because id()'s documentation doesn't mention any such optimizations, which seems misleading (I don't expect every quirk to be noted, but some note alongside the CPython remark would be nice)...
EDIT 2 (09/30/2019):
The question stems from wanting to understand this behavior and to know whether there are any hooks to optimize user-defined classes in a similar way (by overriding the __eq__ method to identify when two objects are the same; perhaps they would then point to the same address in memory, i.e. share an id? Or by using some metaclass properties).
Ids are guaranteed to be unique for the lifetime of the object. If an object gets deleted, a new object can acquire the same id. CPython will delete items immediately when their refcount drops to zero. The garbage collector is only needed to break up reference cycles.
CPython may also cache and re-use certain immutable objects like small integers and strings defined by literals that are valid identifiers. This is an implementation detail that you should not rely upon. It is generally considered improper to use is checks on such objects.
There are certain exceptions to this rule, for example, using an is check on possibly-interned strings as an optimization before comparing them with the normal == operator is fine. The dict builtin uses this strategy for lookups to make them faster for identifiers.
a is b or a == b # This is OK
If the string happens to be interned, then the above can return true with a simple id comparison instead of a slower character-by-character comparison, but it still returns true if and only if a == b (because if a is b then a == b must also be true). However, a good implementation of .__eq__() would already do an is check internally, so at best you would only avoid the overhead of calling the .__eq__().
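Both points are easy to see in CPython (the exact results are implementation-dependent and shown here only as illustration):
# 1) An id can be recycled once its object dies. CPython frees an object
#    the moment its refcount hits zero, and often reuses the memory slot.
class Thing:
    pass

old = id(Thing())   # the temporary instance dies right after id() returns
new = id(Thing())
print(old == new)   # often True in CPython; never guaranteed

# 2) Small ints are cached, larger ones generally are not. Build them at
#    runtime so compile-time constant folding doesn't get in the way.
a = int("7")
b = int("7")
print(a is b)       # True in CPython: 7 lives in the small-int cache

c = int("100000")
d = int("100000")
print(c is d)       # typically False: two distinct objects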
Thanks for the answer. Would you elaborate on the uniqueness of user-defined objects; are they always unique?
The id of any object (be it user-defined or not) is unique for the lifetime of the object. It's important to distinguish objects from variables. It's possible to have two or more variables refer to the same object.
>>> a = object()
>>> b = a
>>> c = object()
>>> a is b
True
>>> a is c
False
Caching optimizations mean that you are not always guaranteed to get a new object in cases where one might naively think one should, but this does not in any way violate the uniqueness guarantee of IDs. Builtin types like int and str may have some caching optimizations, but they follow exactly the same rules: if they are live at the same time and their IDs are the same, then they are the same object.
Caching is not unique to builtin types. You can implement caching for your own objects.
>>> def the_one(it=object()):
... return it
...
>>> the_one() is the_one()
True
Even user-defined classes can cache instances. For example, this class only makes one instance of itself.
>>> class TheOne:
... _the_one = None
... def __new__(cls):
... if not cls._the_one:
... cls._the_one = super().__new__(cls)
... return cls._the_one
...
>>> TheOne() is TheOne() # There can be only one TheOne.
True
>>> id(TheOne()) == id(TheOne()) # This is what an is-check does.
True
Note that each construction expression evaluates to an object with the same id as the other. But this id is unique to the object. Both expressions reference the same object, so of course they have the same id.
The above class only keeps one instance, but you could also cache some other number. Perhaps recently used instances, or those configured in a way you expect to be common (as ints do), etc.
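For example, here is a sketch of per-argument caching (a simple flyweight, with a hypothetical Point class), similar in spirit to the small-int cache:
class Point:
    _cache = {}

    def __new__(cls, x, y):
        key = (x, y)
        if key not in cls._cache:
            cls._cache[key] = super().__new__(cls)
        return cls._cache[key]

    def __init__(self, x, y):
        # Runs on every call, even for a cached instance, but it just
        # re-assigns the same values here.
        self.x, self.y = x, y

print(Point(1, 2) is Point(1, 2))  # True: the cached instance is reused
print(Point(1, 2) is Point(3, 4))  # False: different key, different object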
So, I was playing around with Python while answering this question, and I discovered that this is not valid:
o = object()
o.attr = 'hello'
due to an AttributeError: 'object' object has no attribute 'attr'. However, with any class inherited from object, it is valid:
class Sub(object):
pass
s = Sub()
s.attr = 'hello'
Printing s.attr displays 'hello' as expected. Why is this the case? What in the Python language specification specifies that you can't assign attributes to vanilla objects?
For other workarounds, see How can I create an object and add attributes to it?.
To support arbitrary attribute assignment, an object needs a __dict__: a dict associated with the object, where arbitrary attributes can be stored. Otherwise, there's nowhere to put new attributes.
An instance of object does not carry around a __dict__ -- if it did, besides the horrible circular dependence problem (since dict, like most everything else, inherits from object;-), this would saddle every object in Python with a dict, which would mean an overhead of many bytes per object that currently doesn't have or need a dict (essentially, all objects that don't have arbitrarily assignable attributes don't have or need a dict).
For example, using the excellent pympler project (available on PyPI), we can do some measurements...:
>>> from pympler import asizeof
>>> asizeof.asizeof({})
144
>>> asizeof.asizeof(23)
16
You wouldn't want every int to take up 144 bytes instead of just 16, right?-)
Now, when you make a class (inheriting from whatever), things change...:
>>> class dint(int): pass
...
>>> asizeof.asizeof(dint(23))
184
...the __dict__ is now added (plus, a little more overhead) -- so a dint instance can have arbitrary attributes, but you pay quite a space cost for that flexibility.
So what if you wanted ints with just one extra attribute foobar...? It's a rare need, but Python does offer a special mechanism for the purpose...
>>> class fint(int):
... __slots__ = 'foobar',
... def __init__(self, x): self.foobar=x+100
...
>>> asizeof.asizeof(fint(23))
80
...not quite as tiny as an int, mind you! (or even the two ints, one the self and one the self.foobar -- the second one can be reassigned), but surely much better than a dint.
When the class has the __slots__ special attribute (a sequence of strings), then the class statement (more precisely, the default metaclass, type) does not equip every instance of that class with a __dict__ (and therefore the ability to have arbitrary attributes), just a finite, rigid set of "slots" (basically places which can each hold one reference to some object) with the given names.
In exchange for the lost flexibility, you gain a lot of bytes per instance (probably meaningful only if you have zillions of instances gallivanting around, but, there are use cases for that).
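A small demonstration of that rigidity (with a hypothetical Slotted class):
class Slotted:
    __slots__ = ('x',)   # instances get exactly one slot, no __dict__

s = Slotted()
s.x = 1                  # fine: 'x' is a declared slot
try:
    s.y = 2              # no __dict__ and no 'y' slot
except AttributeError as e:
    print(e)             # 'Slotted' object has no attribute 'y'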
As other answerers have said, an object does not have a __dict__. object is the base class of all types, including int or str. Thus whatever is provided by object will be a burden to them as well. Even something as simple as an optional __dict__ would need an extra pointer for each value; this would waste an additional 4-8 bytes of memory for every object in the system, for very limited utility.
Instead of creating an instance of a dummy class, in Python 3.3+ you can (and should) use types.SimpleNamespace for this.
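A minimal sketch:
from types import SimpleNamespace

ns = SimpleNamespace()
ns.attr = 'hello'   # works: SimpleNamespace instances carry a __dict__
print(ns.attr)      # hello
ns2 = SimpleNamespace(x=1, y=2)   # attributes can also be set at creation
print(ns2.x + ns2.y)              # 3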
It is simply due to optimization.
Dicts are relatively large.
>>> import sys
>>> sys.getsizeof((lambda:1).__dict__)
140
Most (maybe all) classes that are defined in C do not give their instances a dict, as an optimization.
If you look at the source code you will see that there are many checks to see if the object has a dict or not.
So, investigating my own question, I discovered this about the Python language: you can inherit from things like int, and you see the same behaviour:
>>> class MyInt(int):
...     pass
...
>>> x = MyInt()
>>> print x
0
>>> x.hello = 4
>>> print x.hello
4
>>> x = x + 1
>>> print x
1
>>> print x.hello
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: 'int' object has no attribute 'hello'
I assume the error at the end is because the add function returns an int, so I'd have to override functions like __add__ and such in order to retain my custom attributes. But this all now makes sense to me (I think), when I think of "object" like "int".
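For what it's worth, here is a sketch (Python 3 syntax) of carrying the attributes across addition by overriding __add__; every other operator you care about would need the same treatment:
class MyInt(int):
    def __add__(self, other):
        result = MyInt(int(self) + int(other))
        result.__dict__.update(self.__dict__)   # copy custom attributes over
        return result

x = MyInt()
x.hello = 4
x = x + 1
print(x, x.hello)   # 1 4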
https://docs.python.org/3/library/functions.html#object :
Note: object does not have a __dict__, so you can’t assign arbitrary attributes to an instance of the object class.
It's because object is a "type", not a class. In general, all classes that are defined in C extensions (like all the built-in datatypes, and stuff like numpy arrays) do not allow addition of arbitrary attributes.
This is (IMO) one of the fundamental limitations with Python - you can't re-open classes. I believe the actual problem, though, is caused by the fact that classes implemented in C can't be modified at runtime... subclasses can, but not the base classes.
I constantly see people state that "Everything in Python is an object.", but I haven't seen "thing" actually defined. This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case? Is there a more concise way of stating what a Python object actually is?
Anything that can be assigned to a variable is an object.
That includes functions, classes, and modules, and of course int's, str's, float's, list's, and everything else. It does not include whitespace, punctuation, or operators.
Just to mention it, there is the operator module in the standard library which includes functions that implement operators; those functions are objects. That doesn't mean + or * are objects.
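For example:
import operator

# The functions in the operator module are ordinary objects...
print(operator.add(2, 3))   # 5, the same result as 2 + 3
print(type(operator.add))   # <class 'builtin_function_or_method'>
# ...even though the + token itself is not an object.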
I could go on and on, but this is simple and pretty complete.
Some values are obviously objects; they are instances of a class, have attributes, etc.
>>> i = 3
>>> type(i)
<type 'int'>
>>> i.denominator
1
Other values are less obviously objects. Types are objects:
>>> type(int)
<type 'type'>
>>> int.__mul__(3, 5)
15
Even type is an object (of type type, oddly enough):
>>> type(type)
<type 'type'>
Modules are objects:
>>> import sys
>>> type(sys)
<type 'module'>
Built-in functions are objects:
>>> type(sum)
<type 'builtin_function_or_method'>
In short, if you can reference it by name, it's an object.
What is generally meant is that most things, for example functions and methods, are objects. Modules too. Classes themselves (not just their instances) are objects. And ints/floats/strings are objects. So, yes, things generally tend to be objects in Python. Cyphase is correct; I just wanted to give some examples of things that might not be immediately obvious as objects.
Because they are objects, a number of properties are observable on things that you would consider special-case, baked-in stuff in other languages. Though __dict__, which allows arbitrary attribute assignment in Python, is often missing on things intended for large-volume instantiation, like int.
Therefore, at least on pure-Python objects, a lot of magic can happen, from introspection to things like creating a new class on the fly.
Kinda like turtles all the way down.
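For example, instances of a plain Python class accept new attributes, while int instances do not:
class C:
    pass

print(hasattr(C(), '__dict__'))   # True: new attributes can be attached
print(hasattr(1, '__dict__'))     # False: int instances carry no __dict__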
You're not going to find a rigorous definition like C++11's, because Python does not have a formal specification like C++11, it has a reference manual like pre-ISO C++. The Data model chapter is as rigorous as it gets:
Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann’s model of a “stored program computer,” code is also represented by objects.)
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. …
The glossary also has a shorter definition:
Any data with state (attributes or value) and defined behavior (methods).
And it's true that everything in Python has methods and (other) attributes. Even if there are no public methods, there's a set of special methods and values inherited from the object base class, like the __str__ method.
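For instance, even a plain integer carries the machinery inherited from object:
print((42).__str__())            # 42, via the inherited __str__
print(isinstance(42, object))    # True
print(hasattr(42, '__class__'))  # True: attributes everywhere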
This wasn't true in versions of Python before 2.2, which is part of the reason we have multiple words for nearly the same thing—object, data, value; type, class… But from then on, the following kinds of things are identical:
Objects.
Things that can be returned or yielded by a function.
Things that can be stored in a variable (including a parameter).
Things that are instances of type object (usually indirectly, through a subclass or two).
Things that can be the value resulting from an expression.
Things represented by pointers to PyObject structs in CPython.
… and so on.
That's what "everything is an object" means.
It also means that Python doesn't have "native types" and "class types" like Java, or "value types" and "reference types" like C#; there's only one kind of thing, objects.
This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case?
No. Those things don't have values, so they're not objects.1
Also, variables are not objects. Unlike C-style variables, Python variables are not memory locations with a type containing a value, they're just names bound to a value in some namespace.2 And that's why you can't pass around references to variables; there is no "thing" to reference.3
Assignment targets are also not objects. They sometimes look a lot like values, and even the core devs sometimes refer to things like the a, b in a, b = 1, 2 loosely as a tuple object—but there is no tuple there.4
There's also a bit of apparent vagueness with things like elements of a numpy.array (or an array.array or ctypes.Structure). When you write a[0] = 3, the 3 object doesn't get stored in the array the way it would with a list. Instead, numpy stores some bytes that Python doesn't even understand, but that it can use to do "the same thing a 3 would do" in array-wide operations, or to make a new copy of the 3 object if you later ask for a[0].
But if you go back to the definition, it's pretty clear that this "virtual 3" is not an object—while it has a type and value, it does not have an identity.
1. At the meta level, you can write an import hook that can act on imported code as a byte string, a decoded Unicode string, a list of token tuples, an AST node, a code object, or a module, and all of those are objects… But at the "normal" level, from within the code being imported, tokens, etc. are not objects.
2. Under the covers, there's almost always a string object to represent that name, stored in a dict or tuple that represents the namespace, as you can see by calling globals() or dir(self). But that's not what the variable is.
3. A closure cell is sort of a way of representing a reference to a variable, but really, it's the cell itself that's an object, and the variables at different scopes are just a slightly special kind of name for that cell.
4. However, in a[0] = 3, although a[0] isn't a value, a and 0 are, because that assignment is equivalent to the expression a.__setitem__(0, 3), except that it's not an expression.
When Python gives me the location of an object in memory, what is that for, other than distinguishing between instances in the interactive prompt?
Example:
>>> inspect.walktree
<function walktree at 0x2a97410>
This is the default string representation that is returned if you call repr(obj) on an object which doesn't define the magic __repr__ method (or didn't override the default implementation inherited from object, in the case of new-style objects).
That default string has the purpose of giving the programmer useful information about the type and identity of the underlying object.
Additional information
Internally, the id function is called to get the number included in the string:
>>> o = object()
>>> o
<object object at 0x7fafd75d10a0>
>>> id(o)
140393209204896
>>> "%x" % id(o)
'7fafd75d10a0'
Note that id does NOT represent a unique ID. It can happen that during the lifetime of a program several objects will have the same ID (although never at the same time).
It also does not have to correlate with the location of the object in memory (although it does in CPython).
You can easily override the representation string for your own classes, by the way:
class MyClass(object):
def __repr__(self):
return "meaningful representation (or is it?)"
This is just a default representation for objects that don't have the __repr__ magic method.
Indeed, the address has no other purpose than "distinguishing between instances".
This is an implementation detail and you shouldn't rely on it:
$ python
>>> import inspect
>>> inspect.walktree
<function walktree at 0x7f07899c9230>
>>> id(inspect.walktree)
139670350238256
$ jython
>>> import inspect
>>> inspect.walktree
<function walktree 1>
>>> id(inspect.walktree)
1
The number being displayed is just an identity that can be used with the is operator to check whether two objects are the same one. As already said, whether that number is a memory location or not is an implementation detail.