When you call the object.__repr__() method in Python you get something like this back:
<__main__.Test object at 0x2aba1c0cf890>
Is there any way to get hold of the memory address if you overload __repr__(), other than calling super(Class, obj).__repr__() and regexing it out?
The Python manual has this to say about id():
Return the "identity" of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value. (Implementation note: this is the address of the object.)
So in CPython, this will be the address of the object. No such guarantee for any other Python interpreter, though.
Note that if you're writing a C extension, you have full access to the internals of the Python interpreter, including access to the addresses of objects directly.
You could reimplement the default repr this way:
def __repr__(self):
    return '<%s.%s object at %s>' % (
        self.__class__.__module__,
        self.__class__.__name__,
        hex(id(self))
    )
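As a minimal sketch of how that might sit inside a complete class (the Test class here is only a placeholder, and the number is an actual address only on CPython):

```python
class Test:
    """Illustrative class; the name and body are placeholders."""

    def __repr__(self):
        # Rebuild a default-looking repr by hand from id().
        # On CPython id() is the object's address; elsewhere it need not be.
        cls = type(self)
        return '<%s.%s object at %s>' % (
            cls.__module__,
            cls.__name__,
            hex(id(self)),
        )


t = Test()
text = repr(t)
```

The exact digits differ on every run, so the only thing worth relying on is that the hex part of text matches hex(id(t)).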
Just use
id(object)
There are a few issues here that aren't covered by any of the other answers.
First, id only returns:
the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
In CPython, this happens to be the pointer to the PyObject that represents the object in the interpreter, which is the same thing that object.__repr__ displays. But this is just an implementation detail of CPython, not something that's true of Python in general. Jython doesn't deal in pointers, it deals in Java references (which the JVM of course probably represents as pointers, but you can't see those—and wouldn't want to, because the GC is allowed to move them around). PyPy lets different types have different kinds of id, but the most general is just an index into a table of objects you've called id on, which is obviously not going to be a pointer. I'm not sure about IronPython, but I'd suspect it's more like Jython than like CPython in this regard. So, in most Python implementations, there's no way to get whatever showed up in that repr, and no use if you did.
But what if you only care about CPython? That's a pretty common case, after all.
Well, first, you may notice that id is an integer;* if you want that 0x2aba1c0cf890 string instead of the number 46978822895760, you're going to have to format it yourself. Under the covers, I believe object.__repr__ is ultimately using printf's %p format, which you don't have from Python… but you can always do this:
format(id(spam), '#010x' if sys.maxsize.bit_length() <= 32 else '#018x')
* In 3.x, it's an int. In 2.x, it's an int if that's big enough to hold a pointer (which it may not be, because of signed number issues on some platforms) and a long otherwise.
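Wrapped up as a small helper, the formatting step might look like the sketch below. The name format_address is made up for illustration, and the zero-padding is a choice rather than something object.__repr__ guarantees:

```python
import sys


def format_address(obj):
    # Zero-padded hex: 8 digits on 32-bit builds, 16 on 64-bit ones.
    digits = 8 if sys.maxsize.bit_length() <= 32 else 16
    # The width passed to format() includes the two characters of the '0x' prefix.
    return format(id(obj), '#0%dx' % (digits + 2))


spam = object()
address = format_address(spam)
```

Converting back with int(address, 16) gives id(spam) again, whatever the padding.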
Is there anything you can do with these pointers besides print them out? Sure (again, assuming you only care about CPython).
All of the C API functions take a pointer to a PyObject or a related type. For those related types, you can just call PyFoo_Check to make sure it really is a Foo object, then cast with (PyFoo *)p. So, if you're writing a C extension, the id is exactly what you need.
What if you're writing pure Python code? You can call the exact same functions with pythonapi from ctypes.
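A rough sketch of that idea, assuming CPython (where id() is the PyObject pointer). PyList_Size is a real C API function; the argtypes/restype declarations are my own choice for passing the raw address:

```python
import ctypes

# CPython only: id() is the PyObject* address, so it can be handed to C API functions.
PyList_Size = ctypes.pythonapi.PyList_Size
PyList_Size.argtypes = [ctypes.c_void_p]   # accept the raw address as a pointer
PyList_Size.restype = ctypes.c_ssize_t

items = ['a', 'b', 'c']
size = PyList_Size(id(items))              # same answer as len(items)
```

In real code you would more likely declare the argument as ctypes.py_object and pass the object itself, which avoids handing around a bare address that might outlive the object.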
Finally, a few of the other answers have brought up ctypes.addressof. That isn't relevant here. This only works for ctypes objects like c_int32 (and maybe a few memory-buffer-like objects, like those provided by numpy). And, even there, it isn't giving you the address of the c_int32 value, it's giving you the address of the C-level int32 that the c_int32 wraps up.
That being said, more often than not, if you really think you need the address of something, you didn't want a native Python object in the first place, you wanted a ctypes object.
Just in response to Torsten, I wasn't able to call addressof() on a regular python object. Furthermore, id(a) != addressof(a). This is in CPython, don't know about anything else.
>>> from ctypes import c_int, addressof
>>> a = 69
>>> addressof(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: invalid type
>>> b = c_int(69)
>>> addressof(b)
4300673472
>>> id(b)
4300673392
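To make the difference concrete, here is a small sketch (CPython with ctypes available is assumed). The address from addressof points at the wrapped C int, so you can view and modify that storage through it:

```python
import ctypes

b = ctypes.c_int(69)
buffer_address = ctypes.addressof(b)            # address of the wrapped C int
same_storage = ctypes.c_int.from_address(buffer_address)
same_storage.value = 70                         # writes through to b's buffer

try:
    ctypes.addressof(69)                        # a plain int is not a ctypes instance
except TypeError as exc:
    error = exc
```

After this, b.value is 70, because both ctypes objects share the same underlying buffer.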
You can get something suitable for that purpose with:
id(self)
With ctypes, you can achieve the same thing with
>>> import ctypes
>>> a = (1,2,3)
>>> ctypes.addressof(a)
3077760748L
Documentation:
addressof(C instance) -> integer
Return the address of the C instance internal buffer
Note that in CPython, currently id(a) == ctypes.addressof(a), but ctypes.addressof should return the real address for each Python implementation, if ctypes is supported and memory pointers are a valid notion.
Edit: added information about interpreter-independence of ctypes
I know this is an old question but if you're still programming, in python 3 these days... I have actually found that if it is a string, then there is a really easy way to do this:
>>> spam.upper
<built-in method upper of str object at 0x1042e4830>
>>> spam.upper()
'YO I NEED HELP!'
>>> id(spam)
4365109296
string conversion does not affect location in memory either:
>>> spam = {437 : 'passphrase'}
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'
>>> str(spam)
"{437: 'passphrase'}"
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'
You can get the memory address/location of any object by using the 'partition' method of the built-in 'str' type.
Here is an example of using it to get the memory address of an object:
Python 3.8.3 (default, May 27 2020, 02:08:17)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> object.__repr__(1)
'<int object at 0x7ca70923f0>'
>>> hex(int(object.__repr__(1).partition('object at ')[2].strip('>'), 16))
'0x7ca70923f0'
>>>
Here, I call the built-in 'object' class's '__repr__' method with an object such as 1 as the argument to get the default string. I then partition that string on 'object at ', which returns a tuple of three parts: the text before the separator, the separator itself, and the text after it. Since the memory address comes after 'object at ', it ends up in the last part.
That last part is the third item of the tuple, so I access it with index 2. It still has a right angle bracket as a suffix, so I use the 'strip' method to remove it. I then convert the resulting string into an integer with base 16 and turn it back into a hex string.
While it's true that id(object) gets the object's address in the default CPython implementation, this is generally useless... you can't do anything with the address from pure Python code.
The only time you would actually be able to use the address is from a C extension library... in which case it is trivial to get the object's address since Python objects are always passed around as C pointers.
If the __repr__ is overloaded, you may consider __str__ to see the memory address of the variable.
Here are the details of __repr__ versus __str__ by Moshe Zadka on Stack Overflow.
There is a way to recover the object from the value returned by id. Here is the TL;DR:
ctypes.cast(memory_address,ctypes.py_object).value
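A minimal sketch of that round trip, assuming CPython. The helper name object_from_id is made up for illustration:

```python
import ctypes


def object_from_id(address):
    # CPython only, and only safe while the object is still alive:
    # a stale address can crash the interpreter instead of raising.
    return ctypes.cast(address, ctypes.py_object).value


message = ['still', 'alive']
recovered = object_from_id(id(message))   # the very same list, not a copy
```

This works here only because message keeps the list alive; nothing checks that the address still refers to a valid object.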
I constantly see people state that "Everything in Python is an object.", but I haven't seen "thing" actually defined. This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case? Is there a more concise way of stating what a Python object actually is?
Thanks
Anything that can be assigned to a variable is an object.
That includes functions, classes, and modules, and of course ints, strs, floats, lists, and everything else. It does not include whitespace, punctuation, or operators.
Just to mention it, there is the operator module in the standard library which includes functions that implement operators; those functions are objects. That doesn't mean + or * are objects.
I could go on and on, but this is simple and pretty complete.
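To illustrate the point about the operator module, a short sketch: the functions can be bound to names and stored in containers, which is exactly what the + and * tokens cannot do.

```python
import operator

add = operator.add                                # a function object bound to a name
total = add(2, 3)

table = {'+': operator.add, '*': operator.mul}    # function objects stored in a dict
product = table['*'](4, 5)
```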
Some values are obviously objects; they are instances of a class, have attributes, etc.
>>> i = 3
>>> type(i)
<type 'int'>
>>> i.denominator
1
Other values are less obviously objects. Types are objects:
>>> type(int)
<type 'type'>
>>> int.__mul__(3, 5)
15
Even type is an object (of type type, oddly enough):
>>> type(type)
<type 'type'>
Modules are objects:
>>> import sys
>>> type(sys)
<type 'module'>
Built-in functions are objects:
>>> type(sum)
<type 'builtin_function_or_method'>
In short, if you can reference it by name, it's an object.
What is generally meant is that most things, for example functions and methods, are objects. Modules too. Classes themselves (not just their instances) are objects, and ints, floats, and strings are objects. So, yes, things generally tend to be objects in Python. Cyphase is correct, I just wanted to give some examples of things that might not be immediately obvious as objects.
Because they are objects, a number of properties are observable on things that you would consider special-case, baked-in stuff in other languages. That said, __dict__, which allows arbitrary attribute assignment in Python, is often missing on things intended for large-volume instantiation, like int.
Therefore, at least on pure-Python objects, a lot of magic can happen, from introspection to things like creating a new class on the fly.
Kinda like turtles all the way down.
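A small sketch of both points, creating a class on the fly and the missing __dict__ on int. The Point class and its attributes are just placeholders:

```python
# Build a class at runtime with type(name, bases, namespace).
Point = type('Point', (object,), {'x': 0, 'y': 0})

p = Point()
p.label = 'origin'            # instances of ordinary classes have a __dict__

try:
    (3).label = 'three'       # int instances have no __dict__ to store this in
except AttributeError as exc:
    int_error = exc
```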
You're not going to find a rigorous definition like C++11's, because Python does not have a formal specification like C++11, it has a reference manual like pre-ISO C++. The Data model chapter is as rigorous as it gets:
Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann’s model of a “stored program computer,” code is also represented by objects.)
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. …
The glossary also has a shorter definition:
Any data with state (attributes or value) and defined behavior (methods).
And it's true that everything in Python has methods and (other) attributes. Even if there are no public methods, there's a set of special methods and values inherited from the object base class, like the __str__ method.
This wasn't true in versions of Python before 2.2, which is part of the reason we have multiple words for nearly the same thing—object, data, value; type, class… But from then on, the following kinds of things are identical:
Objects.
Things that can be returned or yielded by a function.
Things that can be stored in a variable (including a parameter).
Things that are instances of type object (usually indirectly, through a subclass or two).
Things that can be the value resulting from an expression.
Things represented by pointers to PyObject structs in CPython.
… and so on.
That's what "everything is an object" means.
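A quick way to convince yourself, sketched for Python 3 (the square function is just a placeholder): values, types, modules, and functions all pass the same isinstance check.

```python
import sys


def square(x):
    return x * x


things = [3, 'text', int, type, sys, sum, square, None]

# Every one of these is an instance of object, directly or through a subclass.
all_objects = all(isinstance(thing, object) for thing in things)
type_names = [type(thing).__name__ for thing in things]
```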
It also means that Python doesn't have "native types" and "class types" like Java, or "value types" and "reference types" like C#; there's only one kind of thing, objects.
This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case?
No. Those things don't have values, so they're not objects.[1]
Also, variables are not objects. Unlike C-style variables, Python variables are not memory locations with a type containing a value, they're just names bound to a value in some namespace.[2] And that's why you can't pass around references to variables; there is no "thing" to reference.[3]
Assignment targets are also not objects. They sometimes look a lot like values, and even the core devs sometimes refer to things like the a, b in a, b = 1, 2 loosely as a tuple object, but there is no tuple there.[4]
There's also a bit of apparent vagueness with things like elements of a numpy.array (or an array.array or ctypes.Structure). When you write a[0] = 3, the 3 object doesn't get stored in the array the way it would with a list. Instead, numpy stores some bytes that Python doesn't even understand, but that it can use to do "the same thing a 3 would do" in array-wide operations, or to make a new copy of the 3 object if you later ask for a[0].
But if you go back to the definition, it's pretty clear that this "virtual 3" is not an object—while it has a type and value, it does not have an identity.
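You can see the missing identity with array.array from the standard library. This is a sketch that assumes CPython; the value is chosen large so the small-integer cache doesn't muddy the picture:

```python
import array

big = 10 ** 12                        # large enough to avoid CPython's small-int cache
as_list = [big]
as_array = array.array('q', [big])    # 'q' stores raw signed 64-bit integers

list_same = as_list[0] is as_list[0]      # the list hands back the stored object
array_same = as_array[0] is as_array[0]   # the array builds a fresh int on each access
array_equal = as_array[0] == big          # the value is preserved either way
```

The list element is one object with one identity; the array element is only bytes, and each read manufactures a new int with the same value.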
1. At the meta level, you can write an import hook that can act on imported code as a byte string, a decoded Unicode string, a list of token tuples, an AST node, a code object, or a module, and all of those are objects… But at the "normal" level, from within the code being imported, tokens, etc. are not objects.
2. Under the covers, there's almost always a string object to represent that name, stored in a dict or tuple that represents the namespace, as you can see by calling globals() or dir(self). But that's not what the variable is.
3. A closure cell is sort of a way of representing a reference to a variable, but really, it's the cell itself that's an object, and the variables at different scopes are just a slightly special kind of name for that cell.
4. However, in a[0] = 3, although a[0] isn't a value, a and 0 are, because that assignment is equivalent to the expression a.__setitem__(0, 3), except that it's not an expression.
Background
I came to know recently that this is because the garbage collection would clear the contents of the location anytime, so relying on it would be a bad idea. There could be some other reason too, but I don't know.
I also came to know we could access an object given its location using C, because in CPython the address is the id of the object. (I should thank the IRC guys for this.) But I haven't tried it.
I am talking about this address (id):
address = id(object_name)
or maybe this one (if that helps):
hex_address = hex(id(object))
Anyway, I still think it would have been better if they gave some method that could do that for me.
I wouldn't want to use such a method in practice, but it bothers me that we have an object and something that would give its address, but nothing that does the reverse.
Question
Why was this decision made?
Can we do this using crazy introspection/hack at the Python level? I have been told we can't do that at Python level, but I just wanted to be sure.
The simplest answer would be: "because it is not needed, and it is easier to maintain the code without low level access to variables".
A slightly more elaborate answer is that everything you could do with such a pointer, you can also do with basic references in Python, or with weak references (if you want to refer to some object without preventing its garbage collection).
regarding "hacking":
You can iterate through garbage collector and take out the object
import gc

def objects_by_id(id_):
    for obj in gc.get_objects():
        if id(obj) == id_:
            return obj
You can use mxtools
mx.Tools.makeref(id_)
You can use ctypes
ctypes.cast(id_, ctypes.py_object).value
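One caveat about the garbage-collector approach, sketched below under the assumption of CPython: gc.get_objects() only lists objects tracked by the cyclic collector (containers, roughly), so atomic values like ints are never found. The function name object_by_id and the sample values are made up for illustration:

```python
import gc


def object_by_id(id_):
    # Only objects tracked by the cyclic garbage collector are visible here.
    for obj in gc.get_objects():
        if id(obj) == id_:
            return obj
    return None


tracked = [[1], [2]]                  # lists are tracked by the collector
untracked = 12345678901234567890      # ints are not

found_tracked = object_by_id(id(tracked))
found_untracked = object_by_id(id(untracked))
```

Unlike the ctypes cast, this version cannot crash on a stale id; it simply returns None when nothing matches.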
As I wrote elsewhere:
id is only defined as a number unique to the element among currently existing elements. Some Python implementations (in fact, all main ones but CPython) do not return the memory address.
%~> pypy
Python 2.7.3 (480845e6b1dd219d0944e30f62b01da378437c6c, Aug 08 2013, 17:02:19)
[PyPy 2.1.0 with GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``arguably, everything is a niche''
>>>> a = 1
>>>> b = 2
>>>> c = 3
>>>> id(a)
9L
>>>> id(b)
17L
>>>> id(c)
25L
So you have to guarantee that it is the memory address. Furthermore, because of this Python provides no id → object mapping, especially as the object that an id maps to can be changed if the original is deleted.
You have to ask why you're holding the id. If it's for space reasons, bear in mind that containers actually hold references to items, so [a, a, a, a, a] actually takes less space than [id(a), id(a), id(a), id(a), id(a)]; a.
You can consider also making a dict of {id: val} for all the relevant items and storing that. This will keep val alive, so you can use weakrefs to allow the vals to be garbage collected. Remember, use weakref if you want a weakref.
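A minimal sketch of the {id: val} idea with weak references, using weakref.WeakValueDictionary from the standard library. The Item class is a placeholder:

```python
import gc
import weakref


class Item:
    """Placeholder class; plain ints and strs cannot be weakly referenced."""


registry = weakref.WeakValueDictionary()

item = Item()
key = id(item)
registry[key] = item               # the registry does not keep item alive
alive_before = key in registry

del item                           # drop the only strong reference
gc.collect()                       # not needed on CPython, but harmless elsewhere
alive_after = key in registry
```

The entry disappears along with the object, so a stale id can never be looked up and mistaken for a different, newer object.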
So basically it's because there's no reliable solution that's platform-independent.
it bothers me that we have an object and something that would give its address
Then just remember that we do not. CPython only optimises id under the (correct) assumption that the address is unique. You should never treat it as an address because it's not defined to be.
Why was this decision made?
Because if we were to access things from their id we'd be able to do all sorts of stupid stuff like accessing uninitialised stuff. It also prevents interpreters from optimising things by moving addresses around (JIT compilers like PyPy could not exist as easily if items had to have memory addresses). Furthermore, there is no guarantee that the item is either alive or even the same item at any point.
When references take less space than an integer (which is a reference plus a numeric object), there is no point in not just using a reference (or a weakref if preferred), which will always do the right thing.
I recently stumbled on this kickass python extension package, Brian Hears that will solve all my coding issues. Problem is, some of the functions return memory addresses instead of expected results. For example:
>>> Parameterize(source, 256, 128)
Out[1]: <Parameterize.Parameterize at 0xda445f8>
I've never seen this before (and don't know its proper name); however, the internet tells me that it's a representation of the memory address of where my result is stored.
I'm really just interested in the result itself. How does one usually go about extracting the actual data from the address in Python, or rather the numpy array that the function should (or at least I think it should) return?
Thanks in advance.
EDIT: Added name and link of package
It is returning an object. You should do
p = Parameterize(source, 256, 128)
res = p.usefull_attribute
and then get your results from the object attributes/properties. You can use python's self-documentation (dir(p), help(p) (as pointed out in other answers + comments)) to get python to tell you what attributes/methods your object has.
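As a sketch of that workflow, here is a hypothetical stand-in class (Result, samples, and mean are all invented for illustration, not part of any real package), showing how dir can be narrowed down to the public names:

```python
class Result:
    """Hypothetical stand-in for an object a library hands back."""

    def __init__(self):
        self.samples = [0.1, 0.2, 0.3]

    def mean(self):
        return sum(self.samples) / len(self.samples)


p = Result()

# Skip the underscore-prefixed names to see what the object is meant to expose.
public_names = [name for name in dir(p) if not name.startswith('_')]
data = getattr(p, 'samples')       # fetch an attribute once you know its name
```

With a real library object the names will of course differ; help(p) is usually the quickest way to find out which attribute holds the data.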
What it is printing out is the default string representation of your object: its type and location.
Although it is possible that a very thin Python wrapper around some library, written with ctypes, actually returns memory pointers to you, that does not seem to be the case here.
A representation like <Parameterize.Parameterize at 0xda445f8> is the standard string representation for Python objects.
Even though it actually means a memory address, that number has no use in Python other than to work as an id for your object. (You get hold of it with "id(object)".)
To find out how to use the module you are using, since you are on the interactive prompt, make use of the help and dir introspection builtins to find out what attributes and methods are available on your Parameterize object:
>>> p = Parameterize(source, 256, 128)
>>> p
Out[1]: <Parameterize.Parameterize at 0xda445f8>
>>> dir(p)
>>> help(p)
If the library you are using is written in C (or C++) and its functions return "popular" types (int, str, etc.), you may be interested in the ctypes module (or Boost.Python or SWIG) to wrap the C calls with Python types. Then you can use the library as if it were a Python one. Of course, you have to do the conversion (you have to define the types) with ctypes. For complex structures you will probably have to do it on your own, like tcaswell said.
When python gives me the location of an object in memory, what is that for, other than distinguishing between instances in the interactive prompt?
Example:
>>> inspect.walktree
<function walktree at 0x2a97410>
This is the default string representation that is returned if you call repr(obj) on an object which doesn't define the magic __repr__ method (or didn't override the default implementation inherited from object, in the case of new-style objects).
That default string has the purpose of giving the programmer useful information about the type and identity of the underlying object.
Additional information
Internally, the id function is called to get the number included in the string:
>>> o = object()
>>> o
<object object at 0x7fafd75d10a0>
>>> id(o)
140393209204896
>>> "%x" % id(o)
'7fafd75d10a0'
Note that id does NOT represent a unique ID. It can happen that during the lifetime of a program several objects will have the same ID (although never at the same time).
It also does not have to correlate with the location of the object in memory (although it does in CPython).
You can easily override the representation string for your own classes, by the way:
class MyClass(object):
    def __repr__(self):
        return "meaningful representation (or is it?)"
This is just a default representation for objects that don't have the __repr__ magic method.
Indeed, the address has no other purpose than "distinguishing between instances".
This is an implementation detail and you shouldn't rely on it:
$ python
>>> import inspect
>>> inspect.walktree
<function walktree at 0x7f07899c9230>
>>> id(inspect.walktree)
139670350238256
$ jython
>>> import inspect
>>> inspect.walktree
<function walktree 1>
>>> id(inspect.walktree)
1
The number being displayed is just an identity that can be used for testing with the is operator to check whether two objects are the same one. As already said, whether that number is a memory location or not is an implementation detail.
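A short sketch of that relationship between is and id, for objects that are alive at the same time:

```python
a = [1, 2, 3]
b = a              # a second name for the same object
c = [1, 2, 3]      # equal contents, but a different object

same_ab = a is b
same_ac = a is c
ids_agree = (id(a) == id(b)) and (id(a) != id(c))
```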
I have been reading about repr in Python. I was wondering what the application of the output of repr is. e.g.
class A:
    pass

repr(A)  # '<class __main__.A at 0x6f570>'
b = A()
repr(b)  # '<__main__.A instance at 0x74d78>'
When would one be interested in '<class __main__.A at 0x6f570>' or '<__main__.A instance at 0x74d78>'?
Theoretically, repr(obj) should spit out a string such that it can be fed into eval to recreate the object. In other words,
obj2 = eval(repr(obj1))
should reproduce the object.
In practice, repr is often a "lite" version of str. str might print a human-readable form of the object, whereas repr prints out information like the object's class, usually for debugging purposes. But the usefulness depends a lot on your situation and how the object in question handles repr.
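For a class of your own, the eval round trip only works if you write a __repr__ that produces valid Python source. A minimal sketch, with Point as an illustrative class:

```python
class Point:
    """Illustrative class whose repr is valid Python source."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Point(%r, %r)' % (self.x, self.y)

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


obj1 = Point(1, 2)
obj2 = eval(repr(obj1))    # works because Point is in scope and the repr is an expression
```

obj2 is a new, equal object rather than the same one. The default '<... at 0x...>' form cannot be fed to eval at all, which is why it is mainly a debugging aid.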
Sometimes you have to deal with or present a byte string such as
bob2='bob\xf0\xa4\xad\xa2'
If you print this out (in Ubuntu) you get
In [62]: print(bob2)
bob𤭢
which is not very helpful to others trying to understand your byte string. In the comments, John points out that in Windows, print(bob2) results in something like bobð¤¢. The problem is that Python detects the default encoding of your terminal/console and tries to decode the byte string according to that encoding. Since Ubuntu and Windows use different default encodings (possibly utf-8 and cp1252 respectively), different results ensue.
In contrast, the repr of a string is unambiguous:
In [63]: print(repr(bob2))
'bob\xf0\xa4\xad\xa2'
When people post questions here on SO about Python strings, they are often asked to show the repr of the string so we know for sure what string they are dealing with.
In general, the repr should be an unambiguous string representation of the object. repr(obj) calls the object obj's __repr__ method. Since in your example the class A does not have its own __repr__ method, repr(b) resorts to indicating the class and memory address.
You can override the __repr__ method to give more relevant information.
In your example, '<__main__.A instance at 0x74d78>' tells us two useful things:
that b is an instance of class A in the __main__ namespace,
and that the object resides in memory at address 0x74d78.
You might, for instance, have two instances of class A. If they have the same memory address then you'd know they are "pointing" to the same underlying object. (Note that this information can also be obtained using id.)
The main purpose of repr() is that it is used in the interactive interpreter and in the debugger to format objects in human-readable form. The example you gave is mainly useful for debugging purposes.