Python: Extract data from memory location


I recently stumbled on this kickass Python extension package, Brian Hears, that will solve all my coding issues. The problem is that some of the functions return memory addresses instead of the expected results. For example:
>>> Parameterize(source, 256, 128)
Out[1]: <Parameterize.Parameterize at 0xda445f8>
I've never seen this before (and don't know its proper name); however, the internet tells me that it's a representation of the memory address of where my result is stored.
I'm really just interested in the result itself. How does one usually go about extracting the actual data from the address in Python, or rather the numpy array that the function should (or at least I think it should) return?
Thanks in advance.
EDIT: Added name and link of package

It is returning an object. You should do
p = Parameterize(source, 256, 128)
res = p.useful_attribute
and then get your results from the object's attributes/properties. You can use Python's self-documentation (dir(p), help(p), as pointed out in other answers and comments) to get Python to tell you what attributes/methods your object has.
What it is printing out is the default string representation of your object: its type and its location in memory.
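As a minimal sketch of that workflow: the real Parameterize class and its attribute names are not shown in the question, so the `Result` class and its `values` attribute below are hypothetical stand-ins, not the package's actual API.

```python
class Result:
    """Hypothetical stand-in for an object such as Parameterize."""

    def __init__(self, values):
        self.values = values  # assumed attribute name, for illustration only


r = Result([1, 2, 3])

# Without a custom __repr__, Python shows the type and an address.
print(repr(r))

# List the public attribute names, then read the data from one of them.
public_names = [name for name in dir(r) if not name.startswith("_")]
print(public_names)
print(r.values)
```

With the real object, the names printed by dir(p) tell you which attribute to read instead of `values`.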

Although it is possible that a very thin Python wrapper, written with ctypes, around some library does actually return raw memory pointers, that does not seem to be the case here.
A representation like <Parameterize.Parameterize at 0xda445f8> is the standard string representation for Python objects.
Even though it actually is a memory address, that number has no use in Python other than as an id for your object. (You can get hold of it with id(object).)
To find out how to use the module, since you are at the interactive prompt, use the help and dir introspection builtins to see which attributes and methods are available on your Parameterize object:
>>> p = Parameterize(source, 256, 128)
>>> p
Out[1]: <Parameterize.Parameterize at 0xda445f8>
>>> dir(p)
>>> help(p)

If the library you are using is written in C (or C++) and its functions return common types (int, str, etc.), you may be interested in the ctypes module (or Boost.Python or SWIG) to wrap the C calls with Python types. Then you can use the library as if it were a Python one. Of course, you have to handle the conversion (define the types) with ctypes. For complex structures you will probably have to do it on your own, like tcaswell said.

Related

Understanding ctx in Python's ast

What is the ctx argument in the Python AST representation? For example:
>>> print(ast.dump(ast.parse('-a')))
Module(body=[Expr(value=UnaryOp(op=USub(), operand=Name(id='a', ctx=Load())))])
In other words, what does ctx=Load() mean or do? The only information I'm able to see from the docs is that the ctx may be one of:
expr_context = Load | Store | Del | AugLoad | AugStore | Param
https://docs.python.org/3.7/library/ast.html. Could someone explain the various expr_context and what those do? I suppose lhs and rhs are the store/load:
>>> print(ast.dump(ast.parse('b=-a')))
Module(body=[Assign(targets=[Name(id='b', ctx=Store())], value=UnaryOp(op=USub(), operand=Name(id='a', ctx=Load())))])
But beyond that, what are all the other options?
Update: Also, yes, there is another question similar to this, Python AST: several semantics unclear, e.g. expr_context, but the accepted answer starts with "After some more testing and guessing:..." and is pretty light (to say the least) on details. I'm hoping that someone who actually understands the ast module a bit more can provide a more thorough answer.

An expression in Load context is having its value computed. Store means an expression is being assigned to (including in ways like being used as a with or for target), and Del means that an expression is being deleted (with del). This is described in the Python 3.9 ast docs, which are much better than the 3.7 docs.
Param, AugLoad, and AugStore can be safely ignored. As of Python 3.7, they never appear in an actual AST, and as of 3.9, they are completely gone, even at the implementation level. In 3.7, AugLoad and AugStore sometimes appeared in temporary objects created internally by the compiler, but never in an actual AST. As far as I can tell, Param hasn't been used ever since the introduction of function annotations forced a redesign of the AST representation for function parameters in Python 3.0.
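To see the three contexts side by side, here is a small sketch that parses a few statements and records the ctx of every Name node. The sample source and the `contexts` dictionary are my own illustration, not something from the docs.

```python
import ast

source = "b = -a\ndel c\nfor i in items:\n    pass\n"
tree = ast.parse(source)

# Map each variable name to the class name of its expression context.
contexts = {
    node.id: type(node.ctx).__name__
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
}

for name in sorted(contexts):
    print(name, contexts[name])
```

Names that are read (a, items) come out as Load, assignment and for targets (b, i) as Store, and the del target (c) as Del.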

Address Python prints for an identifier [duplicate]

When you call the object.__repr__() method in Python you get something like this back:
<__main__.Test object at 0x2aba1c0cf890>
Is there any way to get hold of the memory address if you overload __repr__(), other than calling super(Class, obj).__repr__() and regexing it out?

The Python manual has this to say about id():
Return the "identity" of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value. (Implementation note: this is the address of the object.)
So in CPython, this will be the address of the object. No such guarantee for any other Python interpreter, though.
Note that if you're writing a C extension, you have full access to the internals of the Python interpreter, including access to the addresses of objects directly.

You could reimplement the default repr this way:
def __repr__(self):
    return '<%s.%s object at %s>' % (
        self.__class__.__module__,
        self.__class__.__name__,
        hex(id(self))
    )
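A quick way to check this, assuming CPython (where id() is the address), is to compare it with the number that object.__repr__ prints. The `Test` class below is just a throwaway example, and the exact hex formatting of the default repr can differ between platforms, so the sketch compares the parsed integers rather than the strings.

```python
class Test:
    def __repr__(self):
        # Same shape as the default repr, built from id() instead of a regex.
        return "<%s.%s object at %s>" % (
            self.__class__.__module__,
            self.__class__.__name__,
            hex(id(self)),
        )


t = Test()
custom = repr(t)
default = object.__repr__(t)

# Parse the trailing hex number out of the default repr for comparison.
default_address = int(default.rsplit(" ", 1)[1].rstrip(">"), 16)
print(custom)
print(default_address == id(t))
```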

Just use
id(object)

There are a few issues here that aren't covered by any of the other answers.
First, id only returns:
the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
In CPython, this happens to be the pointer to the PyObject that represents the object in the interpreter, which is the same thing that object.__repr__ displays. But this is just an implementation detail of CPython, not something that's true of Python in general. Jython doesn't deal in pointers, it deals in Java references (which the JVM of course probably represents as pointers, but you can't see those—and wouldn't want to, because the GC is allowed to move them around). PyPy lets different types have different kinds of id, but the most general is just an index into a table of objects you've called id on, which is obviously not going to be a pointer. I'm not sure about IronPython, but I'd suspect it's more like Jython than like CPython in this regard. So, in most Python implementations, there's no way to get whatever showed up in that repr, and no use if you did.
But what if you only care about CPython? That's a pretty common case, after all.
Well, first, you may notice that id is an integer;* if you want that 0x2aba1c0cf890 string instead of the number 46978822895760, you're going to have to format it yourself. Under the covers, I believe object.__repr__ is ultimately using printf's %p format, which you don't have from Python… but you can always do this:
format(id(spam), '#010x' if sys.maxsize.bit_length() <= 32 else '#18x')
* In 3.x, it's an int. In 2.x, it's an int if that's big enough to hold a pointer (which it may not be, because of signed number issues on some platforms) and a long otherwise.
Is there anything you can do with these pointers besides print them out? Sure (again, assuming you only care about CPython).
All of the C API functions take a pointer to a PyObject or a related type. For those related types, you can just call PyFoo_Check to make sure it really is a Foo object, then cast with (PyFoo *)p. So, if you're writing a C extension, the id is exactly what you need.
What if you're writing pure Python code? You can call the exact same functions with pythonapi from ctypes.
Finally, a few of the other answers have brought up ctypes.addressof. That isn't relevant here. This only works for ctypes objects like c_int32 (and maybe a few memory-buffer-like objects, like those provided by numpy). And, even there, it isn't giving you the address of the c_int32 value, it's giving you the address of the C-level int32 that the c_int32 wraps up.
That being said, more often than not, if you really think you need the address of something, you didn't want a native Python object in the first place, you wanted a ctypes object.
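As a rough sketch of the pythonapi route, assuming CPython: the snippet below passes the id of a list to the C API function PyList_Size as a raw pointer. The variable names are mine, and the argtypes/restype declarations are what I believe matches the C signature, so treat this as an illustration rather than a recommended pattern.

```python
import ctypes

items = [10, 20, 30]

# Declare the C signature: Py_ssize_t PyList_Size(PyObject *list).
# The raw address is passed as a void pointer.
list_size = ctypes.pythonapi.PyList_Size
list_size.argtypes = [ctypes.c_void_p]
list_size.restype = ctypes.c_ssize_t

address = id(items)  # CPython only: the PyObject address
size = list_size(address)
print(format(address, "#x"), size)
```

This is only safe while `items` is still alive; handing a stale address to the C API will crash the interpreter or worse.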

Just in response to Torsten, I wasn't able to call addressof() on a regular python object. Furthermore, id(a) != addressof(a). This is in CPython, don't know about anything else.
>>> from ctypes import c_int, addressof
>>> a = 69
>>> addressof(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid type
>>> b = c_int(69)
>>> addressof(b)
4300673472
>>> id(b)
4300673392
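To make the difference concrete, here is a small sketch (CPython assumed, names are mine) showing that addressof() gives the location of the wrapped C int, which you can read and write through a second view, while a plain Python int is rejected.

```python
from ctypes import addressof, c_int

b = c_int(69)

# addressof() points at the C int stored inside the ctypes object.
buffer_address = addressof(b)
same_buffer = c_int.from_address(buffer_address)
print(same_buffer.value)

# Writing through the second view changes the original as well.
same_buffer.value = 70
print(b.value)

# A plain Python int has no ctypes buffer, so addressof() rejects it.
try:
    addressof(69)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```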

You can get something suitable for that purpose with:
id(self)

With ctypes, you can achieve the same thing with
>>> import ctypes
>>> a = (1,2,3)
>>> ctypes.addressof(a)
3077760748L
Documentation:
addressof(C instance) -> integer
Return the address of the C instance internal buffer
Note that in CPython, currently id(a) == ctypes.addressof(a), but ctypes.addressof should return the real address for each Python implementation, if ctypes is supported and memory pointers are a valid notion.
Edit: added information about interpreter-independence of ctypes

I know this is an old question, but if you're still programming in Python 3 these days... I have actually found that if it is a string, then there is a really easy way to do this:
>>> spam.upper
<built-in method upper of str object at 0x1042e4830>
>>> spam.upper()
'YO I NEED HELP!'
>>> id(spam)
4365109296
String conversion does not affect the location in memory either:
>>> spam = {437 : 'passphrase'}
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'
>>> str(spam)
"{437: 'passphrase'}"
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'

You can get the memory address/location of any object by using the 'partition' method of the built-in 'str' type.
Here is an example of using it to get the memory address of an object:
Python 3.8.3 (default, May 27 2020, 02:08:17)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> object.__repr__(1)
'<int object at 0x7ca70923f0>'
>>> hex(int(object.__repr__(1).partition('object at ')[2].strip('>'), 16))
'0x7ca70923f0'
>>>
Here, I call the built-in object class's __repr__ method with an object such as 1 as the argument to get the default string. I then partition that string on 'object at ', which returns a tuple of the text before the separator, the separator itself, and the text after it. The memory address comes after 'object at ', so it is the third item of the tuple, at index 2.
That item still has a closing angle bracket as a suffix, so I use strip to remove it. Finally, I convert the resulting string to an integer with base 16 and turn it back into a hex string with hex.

While it's true that id(object) gets the object's address in the default CPython implementation, this is generally useless... you can't do anything with the address from pure Python code.
The only time you would actually be able to use the address is from a C extension library... in which case it is trivial to get the object's address since Python objects are always passed around as C pointers.

If __repr__ is overloaded, you may consider __str__ to see the memory address of the variable.
Here are the details of __repr__ versus __str__ by Moshe Zadka on StackOverflow.

There is a way to recover the object from the value returned by id. Here is the TL;DR:
ctypes.cast(memory_address,ctypes.py_object).value
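A minimal sketch of that one-liner in use, assuming CPython (where id() is the address) and an object that is still alive; `data` and `recovered` are just example names.

```python
import ctypes

data = {"answer": 42}
address = id(data)  # CPython only: id() is the object's address

# Reinterpret the address as a PyObject pointer and fetch the object.
recovered = ctypes.cast(address, ctypes.py_object).value
print(recovered is data)
```

If the object has already been garbage collected, the address may point at freed or reused memory, so this can crash the interpreter instead of raising an exception.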

Is there any lower layer object manipulation function than "marshal" or "cPickle" in Python?

For example, Marshal is still parsing the input data, according to Python source.
.....
case TYPE_FALSE:
Py_INCREF(Py_False);
retval = Py_False;
break;
case TYPE_TRUE:
Py_INCREF(Py_True);
retval = Py_True;
break;
case TYPE_INT:
retval = PyInt_FromLong(r_long(p));
break;
case TYPE_INT64:
retval = r_long64(p);
break;
case TYPE_LONG:
retval = r_PyLong(p);
break;
case TYPE_FLOAT:
.......
Is there any lower layer object manipulation function than "marshal" or "cPickle" in Python?
For example, I already loaded the dumped data to memory, which I just want to type casting like we can do in C/C++, (PyObject *) data_loaded_in_memory;
Edit: If this cannot be done in python directly, any hints about C functions to write that ability would be great.

Don't think you could just take the memory image of a Python object, store it, load it back in a different interpreter (or even the same one later) and expect it to make sense. It will not. Integers and floats may be fully contained in the object structure, but a string already has a separately allocated buffer for its data, and so does even a long.
In other words, cPickle is the lowest possible layer (cPickle is lower level than marshal, because the latter maintains compatibility between versions and platforms, which cPickle does not) that allows you to store objects and load them in another interpreter, or in the same interpreter if they were released from memory in between.

If you do not really need to serialize Python objects in general, only encode and decode certain specific things, then you might look at the struct module. This module would be used to work directly with byte-strings that represent C structs (example: DNS protocol packets). Similar idea to pack and unpack in Perl.
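For instance, the fixed 12-byte DNS header could be handled roughly like this. The field values are made-up sample data, and the `HEADER` and `fields` names are mine.

```python
import struct

# DNS header: six unsigned 16-bit fields in network (big-endian) order.
HEADER = struct.Struct("!6H")

# id, flags, qdcount, ancount, nscount, arcount (sample values)
fields = (0x1234, 0x0100, 1, 0, 0, 0)
packet = HEADER.pack(*fields)
print(packet.hex())

# unpack() turns the raw bytes back into a tuple of integers.
decoded = HEADER.unpack(packet)
print(decoded)
```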

Which is a better repr for a custom Python class?

It seems there are different ways the __repr__ function can return.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is protected in python and they could just dive in and set it anyway, but seems defining it in __init__ makes it more likely someone might see it and assume it's fine to just pass it in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to not happily take all the stuff I'd set 'private' in C++ as arguments to the constructor, and make teammates using the class set it up using the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, and a separate function that's slightly harder to notice, for the purposes of __repr__
If it makes any difference, I am planning to store these python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.

The __repr__ method is designed to produce the most useful output for the developer, not the enduser, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose -- you don't know how your module is imported anyway. Others may prefer option C.
However, if you want to store Python objects in a database, use pickle.
>>> import pickle
>>> bob = InfoObj(Name="Bob")
>>> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
>>> pickle.loads(pickle.dumps(bob))
InfoObj(...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.
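For what option B might look like in practice, here is a sketch with a cut-down, hypothetical InfoObj (only Name, A and B; the real class in the question has more fields). It shows a repr that reads like a constructor call and survives a round trip through eval().

```python
class InfoObj:
    """Hypothetical, reduced version of the class from the question."""

    def __init__(self, Name, A=None, B=None):
        self.Name = Name
        self.A = A
        self.B = B

    def __repr__(self):
        # Option B: looks like a constructor call with the stored values.
        return "InfoObj(Name=%r, A=%r, B=%r)" % (self.Name, self.A, self.B)

    def __eq__(self, other):
        return isinstance(other, InfoObj) and vars(self) == vars(other)


bob = InfoObj(Name="Bob", A=7, B=5)
text = repr(bob)
print(text)

# eval() needs the class name in scope; only use it on trusted input.
clone = eval(text, {"InfoObj": InfoObj})
print(clone == bob)
```

Using %r for each field means strings are quoted properly in the output, which is what makes the eval() round trip work.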

What is the Python equivalent of Ruby's "inspect"?

I just want to quickly see the properties and values of an object in Python, how do I do that in the terminal on a mac (very basic stuff, never used python)?
Specifically, I want to see what message.attachments are in this Google App Engine MailHandler example (images, videos, docs, etc.).

If you want to dump the entire object, you can use the pprint module to get a pretty-printed version of it.
from pprint import pprint
pprint(my_object)
# If there are many levels of recursion, and you don't want to see them all
# you can use the depth parameter to limit how many levels it goes down
pprint(my_object, depth=2)
Edit: I may have misread what you meant by 'object' - if you're wanting to look at class instances, as opposed to basic data structures like dicts, you may want to look at the inspect module instead.
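A small sketch of the depth parameter on a made-up nested dict; pformat() is the same as pprint() except that it returns the text instead of printing it.

```python
from pprint import pformat, pprint

nested = {"a": {"b": {"c": {"d": 1}}}}

# Containers nested deeper than `depth` are replaced by an ellipsis marker.
pprint(nested, depth=2)

# pformat() returns the same text as a string instead of printing it.
shallow = pformat(nested, depth=2)
```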

Use the getmembers function of the inspect module.
It will return a list of (key, value) tuples. It gets the value from obj.__dict__ if available and uses getattr if there is no corresponding entry in obj.__dict__. It can save you from writing a few lines of code for this purpose.
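For example, with a hypothetical `Message` class standing in for whatever object you want to inspect, you can filter the getmembers output down to the public data attributes:

```python
import inspect


class Message:
    """Hypothetical object to inspect."""

    def __init__(self):
        self.sender = "a@example.com"
        self.subject = "hello"

    def send(self):
        pass


msg = Message()

# Keep only public, non-callable members: the data attributes.
data_members = [
    (name, value)
    for name, value in inspect.getmembers(msg)
    if not name.startswith("_") and not callable(value)
]
print(data_members)
```

Without the filter, the list also includes methods and all the double-underscore members, sorted by name.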

Update
There are better ways to do this than dir. See other answers.
Original Answer
Use the built in function dir(fp) to see the attributes of fp.

I'm surprised no one else has mentioned Python's __str__ method, which provides a string representation of an object. Unfortunately, it doesn't seem to print automatically in pdb.
One can also use __repr__ for that, but __repr__ has other requirements: for one thing, you are (at least in theory) supposed to be able to eval() the output of __repr__, though that requirement seems to be enforced only rarely.

Try
repr(obj) # returns a printable representation of the given object
or
dir(obj) # the list of object methods
or
obj.__dict__ # object variables

Or unify Abrer and Mazur answers and get:
from pprint import pprint
pprint(my_object.__dict__ )
