What Python variables look like in memory - python

I want to know what a Python variable (int, list, tuple) looks like in memory. This is where I am right now:
from ctypes import string_at
from sys import getsizeof
from binascii import hexlify
string_at(id(a), getsizeof(a))
I expected this to return the raw bytes of the variable as they sit in memory.
However, here is the output when I assign the values 1, 2, 3 and 4 to variable 'a':
1 - '\xd6\x05\x00\x00\x00\x00\x00\x00\xc0\x92\x17\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00'
2 - '\x17\x02\x00\x00\x00\x00\x00\x00\xc0\x92\x17\x00\x01\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00'
3 - '\xdc\x00\x00\x00\x00\x00\x00\x00\xc0\x92\x17\x00\x01\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
4 - '\x06\x01\x00\x00\x00\x00\x00\x00\xc0\x92\x17\x00\x01\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00'
Somewhere close to the middle, I can see \x01, \x02...etc. However, here are my other questions:
At the beginning, I can see two other bytes changing, what are those values?
Apart from the \x00 bytes, I can see a few other bytes like ...\xc0\x92\x17\x00\x01... How should I interpret those values?
Is there any resource available for me to learn how Python stores variables in memory?

Download the Python C sources and study them. You'll see that "almost everything" is a PyObject* -- a pointer to a PyObject struct. The ...\xc0\x92\x17\x00\x01... bytes you see are just bytes within one of those pointers, nothing to do with the 1, 2, 3 values you assigned, not directly at least!
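As a rough illustration of how such a dump could be read: assuming a 64-bit little-endian build of CPython 2, where an int object is laid out as a reference count, a pointer to its type object, and then the C long holding the value (an assumption about that particular build, not something guaranteed across versions), the 24 bytes split into three 8-byte fields. The name dump_for_1 is only a label for the first dump in the question.

```python
import struct

# The 24 bytes the question printed for a = 1 (copied from the dump above).
dump_for_1 = (b'\xd6\x05\x00\x00\x00\x00\x00\x00'
              b'\xc0\x92\x17\x00\x01\x00\x00\x00'
              b'\x01\x00\x00\x00\x00\x00\x00\x00')

# Assumed layout (64-bit CPython 2 int): refcount, type pointer, value.
refcount, type_ptr, value = struct.unpack('<qQq', dump_for_1)

print(refcount)       # 1494
print(hex(type_ptr))  # 0x1001792c0
print(value)          # 1
```

Under that reading, the bytes that change at the start would be the reference count (small integers are shared, so the count is high), the \xc0\x92\x17\x00\x01 run would belong to the type pointer, which is identical in all four dumps, and the final field would hold the integer itself.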

Related

python lightweight solution for typecasting float(64) to bytes

I'd like a very simple way to display the raw bytes of a float value (or of several consecutive floats in memory). As I understand it, this is called typecasting (reading the bytes in memory as they are), not to be confused with casting (reading the value and converting it to another type).
The simplest test seems to be:
import numpy
a=3.14159265
print(a.hex())
# Returns 0x1.921fb53c8d4f1p+1
b=numpy.array(a)
print(b.tobytes())
# returns b'\xf1\xd4\xc8S\xfb!\t@'
# expected is something like 'F1' 'D4' 'C8' '53' 'FB' '21' '09' '40'
But hex() returns an interpretation of the IEEE float representation in hex. The second method shows only four \x escapes, although a float64 should be 8 bytes, and I don't understand the other characters.
In good old simple C I would have implemented this by pointing an unsigned char pointer at the memory address of the float and printing 8 values from that "array".
I know that I can use C from within Python, but are there other simple solutions?
In case it is relevant: I need this functionality to save many large float vectors as BLOBs in a database.
I think a similar problem is described in "Correct interpretation of hex byte, convert it to float": bytes that correspond to displayable characters are not printed in \x form. How can I change that?
You can use the built-in module struct for this. If you have Python 3.5 or later:
import struct
struct.pack('d', a).hex()
It gives:
'f1d4c853fb210940'
If you have Python older than 3.5:
import binascii
binascii.hexlify(struct.pack('d', a))
Or:
hex(struct.unpack('>Q', struct.pack('d', a))[0])
If you have an array of floats and want to use NumPy:
import numpy as np
np.set_printoptions(formatter={'int':hex})
np.array([a]).view('u8')
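For the per-byte display and the BLOB use case mentioned in the question, a minimal standard-library sketch might look like the following. It assumes Python 3; the names hex_bytes, vector, blob and restored are illustrative, and array('d') uses the machine's native byte order, so the BLOB is only portable between machines that share it.

```python
import struct
from array import array

a = 3.14159265

# Pack as a little-endian IEEE 754 double and show each byte as two hex digits.
raw = struct.pack('<d', a)
hex_bytes = ['%02X' % b for b in bytearray(raw)]
print(hex_bytes)  # ['F1', 'D4', 'C8', '53', 'FB', '21', '09', '40']

# A whole vector of doubles as one bytes object, e.g. for a BLOB column.
vector = array('d', [1.0, 2.5, a])
blob = vector.tobytes()
print(len(blob))  # 24

# Reading the BLOB back into floats.
restored = array('d')
restored.frombytes(blob)
```

The printable characters in the NumPy output (S, !, \t, @) are the same bytes 0x53, 0x21, 0x09 and 0x40; Python simply shows displayable bytes as characters in a bytes repr.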

How to convert a gdb Value to a python numeral object while debugging C program

I'm using python2.6's gdb module while debugging a C program, and would like to convert a gdb.Value instance into a python numeral object (variable) based off the instance's '.Type'.
E.g. turn my C program's SomeStruct->some_float_val = 1./6; to a Python gdb.Value via sfv=gdb.parse_and_eval('SomeStruct->some_double_val'), but THEN turn this into a double precision floating point python variable -- knowing that str(sfv.type.strip_typedefs())=='double' and its size is 8B -- WITHOUT just converting through a string using dbl=float(str(sfv)) or Value.string() but rather something like unpacking the bytes using struct to get the correct double value.
Every link returned from my searches points to https://sourceware.org/gdb/onlinedocs/gdb/Values-From-Inferior.html#Values-From-Inferior, but I can't see how to convert a Value instance into a Python variable cleanly. Say the Value wasn't even in C memory but represented a gdb.Value.address (so Inferior.read_memory() can't be used): how would one turn this into a Python int without converting through a string?
You can convert it directly from the Value using int or float:
(gdb) python print int(gdb.Value(0))
0
(gdb) python print float(gdb.Value(0.0))
0.0
There seems to be at least one glitch in the system, though, as float(gdb.Value(0)) does not work.
I stumbled upon this while trying to figure out how to do bitwise operations on a pointer. In my particular use case, I needed to calculate a page alignment offset. Python did not want to cast a pointer Value to int, however, the following worked:
int(ptr.cast(gdb.lookup_type("unsigned long long")))
We first make gdb cast our pointer to unsigned long long, and then the resulting gdb.Value can be cast to Python int.
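Once the pointer is a plain Python int, the page alignment arithmetic is ordinary bit masking. The sketch below runs outside gdb; PAGE_SIZE, page_offset, page_base and the sample address are all hypothetical stand-ins, and the real page size depends on the target.

```python
PAGE_SIZE = 4096  # assumed page size; the real value depends on the target

def page_offset(address, page_size=PAGE_SIZE):
    """Return how far `address` lies past the start of its page."""
    return address & (page_size - 1)

def page_base(address, page_size=PAGE_SIZE):
    """Return the address of the start of the page containing `address`."""
    return address & ~(page_size - 1)

addr = 0x7FFFF7DD1B78  # hypothetical value, as if from int(ptr.cast(...))
print(hex(page_offset(addr)))  # 0xb78
print(hex(page_base(addr)))    # 0x7ffff7dd1000
```

Both helpers rely on the page size being a power of two, which is what makes the mask trick valid.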

Python Struct Packing Unpacking

I have two bytes, \x22\x38 (read from a process's memory, so preferably little-endian).
I am pretty sure these bytes get converted to 0x588, but I don't know how.
I want to know how python struct module can be used to convert \x22\x38 to 0x588.
There's something else going on if somehow 0x22/0x38 maps to anything other than 0x2238 or 0x3822, but in any event:
>>> import struct
>>> data = b'\x22\x38'
>>> struct.unpack('<h', data)
(14370,)
>>> struct.unpack('>h', data)
(8760,)
Note that unpack() returns a tuple. h is for short (as in a C short), the < or > sets the endianness. See the struct package docs for full info.
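To make the point about the two possible readings concrete, this small sketch prints the signed and unsigned 16-bit interpretations in both byte orders. The variable names little and big are illustrative only.

```python
import struct

data = b'\x22\x38'

# Signed and unsigned 16-bit readings in both byte orders.
for fmt in ('<h', '>h', '<H', '>H'):
    (value,) = struct.unpack(fmt, data)
    print(fmt, value, hex(value))

little = struct.unpack('<H', data)[0]  # 0x3822
big = struct.unpack('>H', data)[0]     # 0x2238
```

Neither reading equals 0x588, so if that value really is expected, some additional transformation beyond a plain 16-bit unpack would have to be involved.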

python 32 bit float conversion

Python 2.6 on Redhat 6.3
I have a device that saves 32 bit floating point value across 2 memory registers, split into most significant word and least significant word.
I need to convert this to a float.
I have been using the following code found on SO and it is similar to code I have seen elsewhere
#!/usr/bin/env python
import sys
from ctypes import *
first = sys.argv[1]
second = sys.argv[2]
reading_1 = str(hex(int(first)).lstrip("0x"))
reading_2 = str(hex(int(second)).lstrip("0x"))
sample = reading_1 + reading_2
def convert(s):
    i = int(s, 16) # convert from hex to a Python int
    cp = pointer(c_int(i)) # make this into a c integer
    fp = cast(cp, POINTER(c_float)) # cast the int pointer to a float pointer
    return fp.contents.value # dereference the pointer, get the float
print convert(sample)
an example of the register values would be ;
register-1;16282 register-2;60597
this produces the resulting float of
1.21034872532
A perfectly cromulent number, however sometimes the memory values are something like;
register-1;16282 register-2;1147
which, using this function results in a float of;
1.46726675314e-36
which is a fantastically small number and not a number that seems to be correct. This device should be producing readings around the 1.2, 1.3 range.
What I am trying to work out is if the device is throwing bogus values or whether the values I am getting are correct but the function I am using is not properly able to convert them.
Also is there a better way to do this, like with numpy or something of that nature?
I will hold my hand up and say that I have just copied this code from examples on line and I have very little understanding of how it works, however it seemed to work in the test cases that I had available to me at the time.
Thank you.
If you have the raw bytes (e.g. read from memory, from file, over the network, ...) you can use struct for this:
>>> import struct
>>> struct.unpack('>f', '\x3f\x9a\xec\xb5')[0]
1.2103487253189087
Here, \x3f\x9a\xec\xb5 are your input registers, 16282 (hex 0x3f9a) and 60597 (hex 0xecb5), expressed as bytes in a string. The > is the byte-order character and selects big-endian.
So depending how you get the register values, you may be able to use this method (e.g. by converting your input integers to byte strings). You can use struct for this, too; this is your second example:
>>> raw = struct.pack('>HH', 16282, 1147) # from two unsigned shorts
>>> struct.unpack('>f', raw)[0] # to one float
1.2032617330551147
The way you're converting the two ints makes implicit assumptions about endianness that I believe are wrong.
So, let's back up a step. You know that the first argument is the most significant word, and the second is the least significant word. So, rather than try to figure out how to combine them into a hex string in the appropriate way, let's just do this:
import struct
import sys
first = sys.argv[1]
second = sys.argv[2]
sample = int(first) << 16 | int(second)
Now we can just convert like this:
def convert(i):
    s = struct.pack('=i', i)
    return struct.unpack('=f', s)[0]
And if I try it on your inputs:
$ python floatify.py 16282 60597
1.21034872532
$ python floatify.py 16282 1147
1.20326173306
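One possible explanation for the tiny second reading, offered as a sketch rather than a confirmed diagnosis: hex(1147) is '0x47b', only three digits, so joining the stripped hex strings yields seven digits instead of eight and every bit of the high word shifts down by four places. The helpers words_to_float and concat_hex_unpadded are hypothetical names written for this illustration.

```python
import struct

def words_to_float(msw, lsw):
    """Combine two 16-bit register values (most significant first) into a float."""
    return struct.unpack('>f', struct.pack('>HH', msw, lsw))[0]

def concat_hex_unpadded(msw, lsw):
    """Mirror the string concatenation from the question (no zero padding)."""
    return '%x' % msw + '%x' % lsw

print(concat_hex_unpadded(16282, 1147))  # 3f9a47b  (7 digits, a zero was lost)
print('%04x%04x' % (16282, 1147))        # 3f9a047b (8 digits)

# Interpreting the 7-digit string as a 32-bit pattern gives the tiny value.
bad = struct.unpack('>f', struct.pack('>I', int('3f9a47b', 16)))[0]
good = words_to_float(16282, 1147)
print(bad, good)
```

Under this reading the device values are fine, and either the shift-and-or approach or the struct.pack('>HH', ...) approach avoids the problem because neither goes through a variable-width hex string.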

Convert binary information to regular data type without outside modules in python

I'm tasked with reading a poorly formatted binary file and taking in the variables. Although I need to do it in C++ (ROOT, specifically), I've decided to do it in Python first because Python makes sense to me. My plan is to get it working in Python and then tackle rewriting it in C++, so relying on convenient Python modules won't get me far later down the road.
Basically, I do this:
In [5]: some_value
Out[5]: '\x00I'
In [6]: ''.join([str(ord(i)) for i in some_value])
Out[6]: '073'
In [7]: int(''.join([str(ord(i)) for i in some_value]))
Out[7]: 73
And I know there has to be a better way. What do you think?
EDIT:
A bit of info on the binary format.
[Three images describing the binary format: http://grab.by/3njm, http://grab.by/3njv, http://grab.by/3nkL]
This is the endian test I am using:
# Read a uint32 for endianness
endian_test = rq1_file.read(uint32)
if endian_test == '\x04\x03\x02\x01':
    print "Endian test: \\x04\\x03\\x02\\x01"
    swapbits = True
elif endian_test == '\x01\x02\x03\x04':
    print "Endian test: \\x01\\x02\\x03\\x04"
    swapbits = False
Your int(''.join([str(ord(i)) for i in some_value])) works ONLY when all bytes except the last byte are zero.
Examples:
'\x01I' should be 1 * 256 + 73 == 329; you get 173
'\x01\x02' should be 1 * 256 + 2 == 258; you get 12
'\x01\x00' should be 1 * 256 + 0 == 256; you get 10
It also relies on an assumption that integers are stored in big-endian fashion; have you verified this assumption? Are you sure that '\x00I' represents the integer 73, and not the integer 73 * 256 + 0 == 18688 (or something else)? Please let us help you verify this assumption by telling us what brand and model of computer and what operating system were used to create the data.
How are negative integers represented?
Do you need to deal with floating-point numbers?
Is the requirement to write it in C++ immutable? What does "(ROOT, specifically)" mean?
If the only dictate is common sense, the preferred order would be:
1. Write it in Python using the struct module.
2. Write it in C++ but use C++ library routines (especially if floating-point is involved). Don't re-invent the wheel.
3. Roll your own conversion routines in C++. You could snarf a copy of the C source for the Python struct module.
Update
Comments after the file format details were posted:
The endianness marker is evidently optional, except at the start of a file. This is dodgy; it relies on the fact that if it is not there, the 3rd and 4th bytes of the block are the 1st 2 bytes of the header string, and neither '\x03\x04' nor '\x02\x01' can validly start a header string. The smart thing to do would be to read SIX bytes -- if first 4 are the endian marker, the next two are the header length, and your next read is for the header string; otherwise seek backwards 4 bytes then read the header string.
The above is in the nuisance category. The negative sizes are a real worry, in that they specify a MAXIMUM length, and there is no mention of how the ACTUAL length is determined. It says "The actual size of the entry is then given line by line". How? There is no documentation of what a "line of data" looks like. The description mentions "lines" many times; are these lines terminated by carriage return and/or line feed? If so, how does one tell the difference between say a line feed byte and the first byte of say a uint16 that belongs to the current "line" of data? If no linefeed or whatever, how does one know when the current line of data is finished? Is there a uintNN size in front of every variable or slice thereof?
Then it says that (2) above (negative size) also applies to the header string. The mind boggles. Do you have any examples (in documentation of the file layout, or in actual files) of "negative size" of (a) header string (b) data "line"?
Is this "decided format" publicly available, e.g. documentation on the web? Does the format have a searchable name? Are you sure you are the first person in the world to want to read that format?
Reading that file format, even with a full specification, is no trivial exercise, even for a binary-format-experienced person who's also experienced with Python (which BTW doesn't have a float128). How many person-hours have you been allocated for the task? What are the penalties for (a) delay (b) failure?
Your original question involved fixing your interesting way of trying to parse a uint16 -- doing much more is way outside the scope/intention of what SO questions are all about.
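The six-byte read suggested above for the optional endianness marker could be sketched roughly as follows. Several things here are assumptions, since the format documentation is not reproduced in this thread: the mapping of each marker to a struct byte-order prefix, the header length being an unsigned 16-bit field, and the names MARKERS and read_header.

```python
import io
import struct

# Assumed mapping from marker bytes to struct byte-order prefixes.
MARKERS = {b'\x01\x02\x03\x04': '>', b'\x04\x03\x02\x01': '<'}

def read_header(stream, byte_order):
    """Read an optional endian marker, then a length-prefixed header string.

    Returns (byte_order, header_bytes). The unsigned 16-bit length field
    is an assumption made for this sketch.
    """
    first = stream.read(6)
    if first[:4] in MARKERS:
        byte_order = MARKERS[first[:4]]
        length_bytes = first[4:6]
    else:
        # No marker: the first two bytes were the length, so give back
        # the four bytes that already belong to the header string.
        length_bytes = first[:2]
        stream.seek(-4, io.SEEK_CUR)
    (length,) = struct.unpack(byte_order + 'H', length_bytes)
    return byte_order, stream.read(length)

with_marker = io.BytesIO(b'\x04\x03\x02\x01' + b'\x05\x00' + b'hello')
without_marker = io.BytesIO(b'\x05\x00' + b'hello')
result_with = read_header(with_marker, '>')
result_without = read_header(without_marker, '<')
print(result_with)     # ('<', b'hello')
print(result_without)  # ('<', b'hello')
```

The sketch does not handle blocks shorter than six bytes or the negative sizes discussed above; those would need the actual specification.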
You're basically computing a "number-in-base-256", which is a polynomial, so, by Horner's method:
>>> v = 0
>>> for c in someval: v = v * 256 + ord(c)
More typical would be to use equivalent bit-operations rather than arithmetic -- the following's equivalent:
>>> v = 0
>>> for c in someval: v = v << 8 | ord(c)
import struct
result, = struct.unpack('>H', some_value)
The equivalent to the Python struct module is a C struct and/or union, so being afraid to use it is silly.
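As a quick check that the base-256 loop and the struct call agree, here is a small sketch. The helper name from_base256 is illustrative; it iterates over a bytearray so the same code yields integers on both Python 2 and Python 3.

```python
import struct

def from_base256(data):
    """Big-endian unsigned integer from bytes, via Horner's method."""
    value = 0
    for byte in bytearray(data):  # bytearray yields ints on Python 2 and 3
        value = (value << 8) | byte
    return value

some_value = b'\x00I'
print(from_base256(some_value))            # 73
print(struct.unpack('>H', some_value)[0])  # 73
print(from_base256(b'\x01I'))              # 329
```

The loop has the advantage of working for any number of bytes and of translating almost line for line into C++, while struct requires the width to be stated in the format string.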
I'm not exactly sure what the format of the data you want to extract is, but maybe you had better just write a couple of generic utility functions to extract the different data types you need:
def int1b(data, i):
    return ord(data[i])
def int2b(data, i):
    return (int1b(data, i) << 8) + int1b(data, i+1)
def int4b(data, i):
    return (int2b(data, i) << 16) + int2b(data, i+2)
With such functions you can easily extract values from the data and they also can be translated rather easily to C.
