I have a question that has been on my mind, and I would be grateful for an answer. When we create an instance of a class, are the methods created anew for that particular instance (object)? Or, when a method is run, is the method looked up at its address in the class and called with the object as a parameter? If the former, wouldn't that consume a lot of memory? I did a lot of research on this subject but didn't find much, so I wrote and ran this code:
class a:
    def func1(self,name):
        print("hello")
b=a()
c=a()
print(id(a.func1))
print(id(b.func1))
print(id(c.func1))
The address I got from the last two lines is exactly the same. The output was something like this:
76767678900
87665677888
87665677888
Why are the last two addresses the same?
Thanks a lot
The first address corresponds to the original function (you accessed it on the class, so it didn't bind it, you just saw the address where the raw function itself was allocated).
The other two (identical) addresses are bound method objects. You immediately released the bound method it allocated, and CPython makes use of both per-type freelists (not sure if any involved here) and a small object allocator that will frequently return the same memory just freed if you ask for the same amount of memory immediately thereafter. If you extracted the underlying function from the bound method, e.g.:
print(id(b.func1.__func__))
you'd see it is the same as a.func1 (and that value will be stable, where the address of the bound methods could differ every time you bind them).
In short, ids are only unique within the current set of objects in the program; if you release one of those objects, its id could appear attached to some other newly allocated object immediately thereafter.
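A small sketch of the point above, using a class named A as a stand-in for the asker's class a. It compares objects with `is` while they are all still alive, so reused addresses cannot confuse the result:

```python
class A:
    def func1(self, name):
        print("hello")

b = A()
c = A()

# Accessing the method through the class yields the plain function itself.
same_function = A.func1 is A.func1

# Each access through an instance builds a fresh bound method object;
# both objects are alive during the comparison, so they cannot be identical.
fresh_each_time = b.func1 is not b.func1

# Every bound method wraps the single function stored on the class.
shares_function = (b.func1.__func__ is A.func1) and (c.func1.__func__ is A.func1)

# The bound method also remembers the instance it was bound to.
bound_to_b = b.func1.__self__ is b

print(same_function, fresh_each_time, shares_function, bound_to_b)
```

So the function body exists once, on the class; an instance only pays for a short-lived bound method object each time the method is looked up.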
A class
class Test:
    def __init__(self, model, type, version):
        self.model = model
        self.type = type
        self.version = version
        ...
test = Test("something", "something", "something")
Functions
def get_type_1(test):
    if test.model == "something" and test.type == "something" and test.version == "something":
        return "value"
def get_type_2(model, type, version):
    if model == "something" and type == "something" and version == "something":
        return "value"
From the perspective of "clean code", which type of function should I use? I catch myself using type_1 when there are more arguments and type_2 when there are only one or two of them, which is making a logical mess in my program. Do I need to worry in Python about speed and memory when passing a class instance all the time?
Prefer the 1st form, for three reasons.
You're not shadowing the type builtin. (Trivial, could use alternate spelling type_)
More convenient for the caller, and for folks reading the calling code.
Those three things go together. Better to show that, with the representation.
When we speak of (model, type, version),
they could be nearly anything.
There's no clear relationship among them,
and no name to hang documentation upon.
OTOH the object may have well-understood constraints,
perhaps "model is never Edsel when version > 3".
We can consult the documentation,
and the implementation,
to understand the class invariants.
Sometimes mutability is a concern.
That is, a caller might have passed in an
object with foo(test), and then we're
worried that library routine foo might possibly have
changed model "Colt" to "Bronco".
Often the docs, implicit or explicit,
will make clear that such mutations
are out of bounds, they will not happen.
To make things very obvious with
minimal documentation burden, consider
using a named tuple
for those three fields in the example.
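A minimal sketch of the named tuple suggestion, assuming the three fields from the question. The names Spec and get_type, the field name type_ and the sample values are made up for illustration:

```python
from typing import NamedTuple


class Spec(NamedTuple):
    """Hypothetical immutable record for the three fields in the question."""
    model: str
    type_: str  # trailing underscore avoids shadowing the builtin `type`
    version: int


def get_type(spec: Spec):
    # The callee reads the fields by name but cannot rebind them.
    if spec.model == "something" and spec.type_ == "something" and spec.version == 1:
        return "value"
    return None


spec = Spec(model="something", type_="something", version=1)
result = get_type(spec)

# Attempting to mutate a field fails, so callers need not worry about
# a library routine changing the record behind their back.
try:
    spec.version = 2
    mutated = True
except AttributeError:
    mutated = False

print(result, mutated)
```

The caller still passes a single, documented object, and the immutability answers the mutation worry without extra prose in the docs.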
Do I need to worry in Python about speed and memory passing class all the time?
No.
Python is about clarity of communicating a technical
idea to other humans. It is not about speed.
Recall Knuth's advice. If speed were a principal
concern, you would have already used
cProfile
to identify the hot spots that should be
implemented in e.g. Rust, Cython, or C++.
Usually that only becomes important when you
notice you're often looping more than a thousand
or a million times.
Use dis.dis()
to disassemble your two functions.
Notice that caller1 pushed a single reference
to test, while caller2 spent more time and
more stack memory pushing three references.
Down in the target code, we still need to
chase three references, so that's mostly a wash.
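One way to try the disassembly yourself is sketched below. The wrappers caller1 and caller2 are assumed here to be one-line functions around the two styles from the question, and type_ replaces type to avoid shadowing the builtin. The exact bytecode differs between Python versions, so no output is shown:

```python
import dis


class Test:
    def __init__(self, model, type_, version):
        self.model = model
        self.type_ = type_
        self.version = version


def get_type_1(test):
    if test.model == "something" and test.type_ == "something" and test.version == "something":
        return "value"


def get_type_2(model, type_, version):
    if model == "something" and type_ == "something" and version == "something":
        return "value"


def caller1(test):
    # Pushes one reference: the object itself.
    return get_type_1(test)


def caller2(test):
    # Looks up three attributes and pushes three references.
    return get_type_2(test.model, test.type_, test.version)


dis.dis(caller1)
dis.dis(caller2)

# A rough size comparison of the two call sites.
n1 = len(list(dis.get_instructions(caller1)))
n2 = len(list(dis.get_instructions(caller2)))
print(n1, n2)
```

The attribute lookups that caller2 performs up front are the same ones get_type_1 performs internally, which is why the overall cost comes out roughly even.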
If you pass an object with a dozen attributes,
of which just three will be used, that's no
burden on the bytecode interpreter, the other
nine are simply never touched.
It can be an intellectual burden on an engineer
maintaining the code, who might need to reason
about those nine and dismiss them as not a concern.
Another concern that a paranoid caller might have
about called library code relates to references.
Typically we expect the called routine will not
permanently hold a reference (or weakref) on
the passed test object, nor on attributes
such as test.version or test.version.history_dict.
If the library routine will store a reference for a
long time, or pass a reference to someone that will
store it, well, that's worth documenting.
Caller will want to understand memory consumption,
leaks, and object lifetime.
Should you declare private instance variables of a class in the init function? My code works perfectly fine without doing this, but PyCharm tells me to do this when highlighting warnings.
It's generally considered good practice to assign to all instance variables in __init__, even if some of them are lazily given real values and all you can do in __init__ is give them a sentinel that means "No value here yet" (e.g. None). There are two reasons for this:
Maintainer benefit: If you don't follow this guideline, determining the complete set of attributes the class may have involves reading the entire class to look for lazily added attributes. It's a lot easier if maintainers can count on __init__ to provide the complete set of attributes, even if some of them are given real values elsewhere.
(On modern CPython, as an implementation detail) Reduced memory usage: When all instances of a class are given the same set of attributes, in the same order, and the set of attributes is not modified unpredictably after __init__ (it's okay to reassign an attribute, just not to add or delete attributes), CPython uses a key-sharing dictionary to hold the attributes for each instance. The hash table itself that stores the keys ends up shared, tied to the class itself, and only the cheap array containing the values for the instance's attributes ends up costing memory. For the case of a class with a single attribute, this reduces the per-instance __dict__ size from 232 bytes to 104, and the ratio remains similar as the number of attributes grows (the key-sharing __dict__ costs less than half as much memory as a non-key-sharing __dict__).
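A short sketch of the guideline, using a made-up Connection class with one lazily filled attribute. The class name, attribute names and the fake socket string are illustrative only:

```python
class Connection:
    """Hypothetical class: every attribute is assigned in __init__."""

    def __init__(self, host):
        self.host = host
        self._socket = None  # sentinel: no real value yet

    def open(self):
        # Reassigning an existing attribute keeps the attribute set stable.
        if self._socket is None:
            self._socket = f"socket to {self.host}"
        return self._socket


conn = Connection("example.org")
attrs_before = sorted(vars(conn))
conn.open()
attrs_after = sorted(vars(conn))

# The set of attribute names is the same before and after the lazy fill.
print(attrs_before, attrs_after)
```

A maintainer can read __init__ alone to learn the full attribute set, and because open() only reassigns, the instance keeps the attribute layout it was created with.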
The official Python documentation specifies that a docstring is a string literal that occurs as the first statement in a function, and that it can be accessed through the __doc__ attribute.
If I have a function that will be called many many times, does that mean the docstring will be declared every time the function is called?
If this is the case, would it be more efficient to design docstring in such a way that it is stored in __doc__ but not being declared every time the function is called?
Every time you start a Python program, your source files are imported into memory only once and parsed, so that every object's properties are determined; the objects are then placed in their own memory locations and linked together to form the whole running system (remember the object nature of Python).
A second behavior applies when you don't restrict the Python interpreter: if you import your files, then, in addition to the above steps, it writes the compiled objects to more durable .pyc files in a __pycache__ folder at the same level as the file.
In this process, objects get a __doc__ attribute when certain keywords are parsed, mainly class and def. Each __doc__ is then either left empty, filled with a default through inheritance, or, if the object has a docstring, set to that docstring.
You can see this behavior on different objects, created with or without a docstring, simply by using dir(objectname). To answer your question, you can use this command throughout your program.
However, this is true only for statically written code. If you create objects on the fly, especially within loops, the function objects themselves are actively created and destroyed on every pass. Even then, the compiled code object and the docstring constant are produced once at compile time and reused, so only the small function wrapper is rebuilt each time.
consider these two:
def staticMethod():
    pass
for i in range(5):
    def activeMethod():
        pass
    print(staticMethod,"s")
    print(activeMethod,"a")
While staticMethod is served from the same memory location, the memory address of activeMethod changes. You will see it alternate between a few values, because in a simple example like this Python can reuse memory that was just freed.
So keep yourself aware of this distinct behavior, especially of loops.
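To check what is and is not rebuilt, here is a small sketch with made-up function names. It only relies on the fact that a nested def compiles to a single code object that each new function wraps:

```python
def make_function():
    # Each call runs the def statement again and builds a new function object.
    def inner():
        """Docstring of inner."""
    return inner


first = make_function()
second = make_function()

distinct_functions = first is not second           # two separate function objects
shared_code = first.__code__ is second.__code__    # one compiled code object
same_doc = first.__doc__ == second.__doc__         # identical docstring text


def called_many_times():
    """This text is attached when the def statement runs, not on each call."""
    return 1


doc_before = called_many_times.__doc__
for _ in range(1000):
    called_many_times()
# Calling the function does not touch or recreate its docstring.
doc_unchanged = called_many_times.__doc__ is doc_before

print(distinct_functions, shared_code, same_doc, doc_unchanged)
```

Calling a function never re-declares its docstring. Re-running a def statement in a loop does build a new function object, but that is a cheap wrapper around the code that was compiled once.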
I'm doing some things in Python (3.3.3), and I came across something that is confusing me since to my understanding classes get a new id each time they are called.
Let's say you have this in some .py file:
class someClass: pass
print(someClass())
print(someClass())
The above returns the same id, which is confusing me since I'm calling it twice, so it shouldn't be the same, right? Is this how Python works when the same class is called twice in a row or not? It gives a different id when I wait a few seconds, but if I do it at the same time, like the example above, it doesn't seem to work that way, which is confusing me.
>>> print(someClass());print(someClass())
<__main__.someClass object at 0x0000000002D96F98>
<__main__.someClass object at 0x0000000002D96F98>
It returns the same thing, but why? I also notice it with ranges for example
for i in range(10):
    print(someClass())
Is there any particular reason for Python doing this when the class is called quickly? I didn't even know Python did this, or is it possibly a bug? If it is not a bug can someone explain to me how to fix it or a method so it generates a different id each time the method/class is called? I'm pretty puzzled on how that is doing it because if I wait, it does change but not if I try to call the same class two or more times.
The id of an object is only guaranteed to be unique during that object's lifetime, not over the entire lifetime of a program. The two someClass objects you create only exist for the duration of the call to print - after that, they are available for garbage collection (and, in CPython, deallocated immediately). Since their lifetimes don't overlap, it is valid for them to share an id.
It is also unsurprising in this case, because of a combination of two CPython implementation details: first, it does garbage collection by reference counting (with some extra magic to avoid problems with circular references), and second, the id of an object is related to the value of the underlying pointer for the variable (i.e., its memory location). So, the first object, which was the most recent object allocated, is immediately freed - it isn't too surprising that the next object allocated will end up in the same spot (although this potentially also depends on details of how the interpreter was compiled).
If you are relying on several objects having distinct ids, you might keep them around - say, in a list - so that their lifetimes overlap. Otherwise, you might implement a class-specific id that has different guarantees - eg:
class SomeClass:
    next_id = 0
    def __init__(self):
        self.id = SomeClass.next_id
        SomeClass.next_id += 1
If you read the documentation for id, it says:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
And that's exactly what's happening: you have two objects with non-overlapping lifetimes, because the first one is already out of scope before the second one is ever created.
But don't trust that this will always happen, either. Especially if you need to deal with other Python implementations, or with more complicated classes. All that the language says is that these two objects may have the same id() value, not that they will. And the fact that they do depends on two implementation details:
The garbage collector has to clean up the first object before your code even starts to allocate the second object—which is guaranteed to happen with CPython or any other ref-counting implementation (when there are no circular references), but pretty unlikely with a generational garbage collector as in Jython or IronPython.
The allocator under the covers has to have a very strong preference for reusing recently-freed objects of the same type. This is true in CPython, which has multiple layers of fancy allocators on top of basic C malloc, but most of the other implementations leave a lot more to the underlying virtual machine.
One last thing: The fact that the object.__repr__ happens to contain a substring that happens to be the same as the id as a hexadecimal number is just an implementation artifact of CPython that isn't guaranteed anywhere. According to the docs:
If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description…> should be returned.
The fact that CPython's object happens to put hex(id(self)) (actually, I believe it's doing the equivalent of sprintf-ing its pointer through %p, but since CPython's id just returns the same pointer cast to a long that ends up being the same) isn't guaranteed anywhere. Even if it has been true since… before object even existed in the early 2.x days. You're safe to rely on it for this kind of simple "what's going on here" debugging at the interactive prompt, but don't try to use it beyond that.
I sense a deeper problem here. You should not be relying on id to track unique instances over the lifetime of your program. You should simply see it as a non-guaranteed memory location indicator for the duration of each object instance. If you immediately create and release instances then you may very well create consecutive instances in the same memory location.
Perhaps what you need to do is track a class static counter that assigns each new instance with a unique id, and increments the class static counter for the next instance.
It's releasing the first instance since it wasn't retained, then since nothing has happened to the memory in the meantime, it instantiates a second time to the same location.
Try calling the following:
a = someClass()
for i in range(0,44):
    print(someClass())
print(a)
You'll see something different. Why? Because the memory that was released by the first object in the "for" loop was reused. On the other hand, the memory of a is not reused since it's retained.
An example where the memory location (and id) is not released is:
print([someClass() for i in range(10)])
Now the ids are all unique.
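The guarantee can be checked directly with a short sketch. Only the count for the retained objects is guaranteed by the language; the count for the temporaries depends on the implementation and allocator, so it is printed without an expected value:

```python
class SomeClass:
    pass


# Retained objects: their lifetimes overlap, so every id must be distinct.
kept = [SomeClass() for _ in range(10)]
unique_kept_ids = len({id(obj) for obj in kept})

# Temporaries: each object may be freed before the next one is created,
# so ids may repeat. How often they repeat is an implementation detail.
temp_ids = [id(SomeClass()) for _ in range(10)]
unique_temp_ids = len(set(temp_ids))

print(unique_kept_ids, unique_temp_ids)
```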