Why is memory space allocation different for the same objects? - python

I was experimenting with how Python allocates memory and ran into the same issue as
Size of list in memory, which Eli describes in a much better way. His answer led me to a new doubt: I checked the sizes of 1 and [] separately, and then the size of [1], but they don't add up, as you can see in the code snippet. If I'm not wrong, the memory allocation should be the same, but it isn't. Can anyone help me understand this?
>>> import sys
>>> sys.getsizeof(1)
28
>>> sys.getsizeof([])
64
>>> 28 + 64
92
>>> sys.getsizeof([1])
72

What's the minimum information a list needs to function?
- some kind of top-level list object, containing a reference to the class information (methods, type info, etc.) and the list's own instance data
- the actual objects stored in the list
... that gets you the size you expected. But is it enough?
A fixed-size list object can only track a fixed number of list entries: traditionally just one (head) or two (head and tail).
Adding more entries to the list doesn't change the size of the list object itself, so there must be some extra information: the relationship between list elements.
It's possible to store this information in every Object (this is called an intrusive list), but it's very restrictive: each Object can only be stored in one list at a time.
Since Python lists clearly don't behave like that, we know this extra information isn't already in the list element, and it can't be inside the list object, so it must be stored elsewhere. Which increases the total size of the list.
NB. I've kept this argument fairly abstract deliberately. You could implement list a few different ways, but none of them avoid some extra storage for element relationships, even if the representation differs.
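To make the abstract argument above concrete: in CPython, that "extra information" is a separately allocated array of pointers, so each element costs one pointer slot on top of the element object itself. A quick check of this (the exact numbers are CPython- and platform-specific):

```python
import struct
import sys

# On CPython, a list keeps one pointer per element; the element objects
# live elsewhere. Comparing sizes as the list grows reveals the per-slot
# cost, which equals the platform's pointer size.
pointer_size = struct.calcsize("P")  # 8 on a 64-bit build

print(sys.getsizeof([1]) - sys.getsizeof([]))      # one extra pointer slot
print(sys.getsizeof([1, 2]) - sys.getsizeof([1]))  # one more
print(pointer_size)                                # same number both times
```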

Related

Why doesn't the id of a list change even if the list is moved in memory?

As I know, dynamic arrays (list in Python) move in memory when their size reaches capacity. And as far as I know, the id of an object corresponds to its memory address.
But when appending values to a list many times, its id doesn't change (so it stays in the same place in memory).
Why?
a = []
print(id(a))  # 2539296050560
for i in range(1_000_000):
    a.append(i)
print(id(a))  # 2539296050560
You are confusing the address of the list (which is what the id is in CPython) with the address of its data. Under the hood, in CPython, a list is an object that contains a pointer to the beginning of its data. So when you extend the list, the data may be moved in memory, but the list object itself will not be, allowing it to keep a fixed id - which the language requires.
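You can observe this indirectly from pure Python: over a million appends the pointer array is reallocated (and possibly moved) many times, which shows up as the reported size growing, yet the id never changes. This relies on CPython's id-is-address behavior described above:

```python
import sys

a = []
original_id = id(a)
original_size = sys.getsizeof(a)

for i in range(1_000_000):
    a.append(i)

# The pointer array was reallocated repeatedly as the list grew ...
print(sys.getsizeof(a) > original_size)  # True
# ... but the list object itself never moved, so its id is stable.
print(id(a) == original_id)              # True
```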

How is memory handled in Python's Lists?

See the code below. When a=[1,2], a homogeneous list, the addresses of the 1st and 2nd elements differ by 32 bytes,
but in the second case, when a=[1,'a',3], there is no relation between the addresses of the 1st and 2nd elements, while the addresses of the 1st and 3rd elements differ by 64 bytes.
So I want to know how memory is handled, how indexing takes place, and how this is linked to lists being non-hashable (that is, mutable).
>>> a=[1,2]
>>> print(id(a[0]))
4318513456
>>> print(id(a[1]))
4318513488
>>> a=[1,'a',3]
>>> print(id(a[0]))
4318513456
>>> print(id(a[1]))
4319642992
>>> print(id(a[2]))
4318513520
>>>
In general, ids don't matter. Don't worry about ids. Don't look at ids. In fact, it's a CPython implementation detail that ids are memory addresses, because it's convenient. Another Python implementation might do something else.
In any case, you're seeing CPython's small integer cache (see e.g. this question), where certain integer objects are preallocated as "singleton" objects since they're expected to be used often. However, that too is an implementation detail.
The string 'a', on the other hand, is not cached (it might have been interned if your code had been loaded from a .py file on disk), so it's allocated from somewhere else.
As for your question about indexing, CPython (again, another implementation might do things differently) lists are, under the hood, arrays of pointers to PyObjects, so it's just an O(1) operation.
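The small-integer cache mentioned above can be observed directly. In CPython, the integers -5 through 256 are preallocated, so independently created values in that range are the same object; this is an implementation detail, not a language guarantee. int("...") is used below so the compiler cannot constant-fold the two literals into one object:

```python
# CPython preallocates the integers -5..256; any computation producing a
# value in that range hands back the cached object (implementation detail).
x = int("256")
y = int("256")
print(x is y)  # True: both names refer to the cached 256 object

x = int("257")
y = int("257")
print(x is y)  # False: 257 is outside the cache, so two separate objects
```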

getsizeof returns the same value for seemingly different lists

I have the following two dimensional bitmap:
num = 521
arr = [i == '1' for i in bin(num)[2:].zfill(n*n)]
board = [arr[n*i:n*i+n] for i in xrange(n)]
Just out of curiosity, I wanted to check how much more space it would take if it held integers instead of booleans. So I checked the current size with sys.getsizeof(board) and got 104.
After that I modified
arr = [int(i) for i in bin(num)[2:].zfill(n*n)], but still got 104.
Then I decided to see how much will I get with just strings:
arr = [i for i in bin(num)[2:].zfill(n*n)], which still shows 104
This looks strange, because I expected a list of lists of strings to use far more memory than booleans.
Apparently I am missing something about how getsizeof calculates the size. Can anyone explain why I get these results?
P.S. Thanks to zehnpard's answer, I see that I can use sum(sys.getsizeof(i) for line in board for i in line) to approximately count the memory (it most probably will not count the inner lists, which is not that important for me). Now I see the difference in numbers for strings versus int/bool (and no difference between int and boolean).
The docs for the sys module have been pretty explicit since Python 3.4:
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
Given that Python lists are effectively arrays of pointers to other Python objects, the number of elements a Python list contains will influence its size in memory (more pointers) but the type of objects contained will not (memory-wise, they aren't contained in the list, just pointed at).
To get the size of all items in a container, you need a recursive solution, and the docs helpfully provide a link to an activestate recipe.
http://code.activestate.com/recipes/577504/
Given that this recipe is for Python 2.x, I'm sure this behavior was always standard, and got explicitly mentioned in the docs since 3.4 onwards.
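The linked recipe targets Python 2. A minimal Python 3 sketch of the same idea is below; total_size is a hypothetical helper handling only the common built-in containers, not the full recipe:

```python
import sys

def total_size(obj, seen=None):
    """Recursively sum sys.getsizeof over a container and its contents."""
    if seen is None:
        seen = set()  # guards against shared and self-referential objects
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

board = [[True, False], [False, True]]
print(total_size(board) > sys.getsizeof(board))  # True: contents counted too
```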

how much space is assigned for an empty dictionary in python? [duplicate]

This question already has answers here:
How are Python's Built In Dictionaries Implemented?
(3 answers)
Closed 7 years ago.
If we create an empty dictionary, like idict = {}, how much space is assigned for this dictionary? I know that for a list, if we initialize it like ilist = [], it will over-allocate as it grows: first 4 slots, then 8.
What about a dictionary?
Well, dictionaries don't store the actual objects inside them; it works a bit like C/C++ pointers, so you only get a constant per-element overhead in the dictionary.
Testing against
import sys
x = {}
sys.getsizeof(x)
The dictionary itself consists of a number of buckets, each containing:
- the hash code of the object currently stored (not predictable from the bucket's position, due to the collision-resolution strategy used)
- a pointer to the key object
- a pointer to the value object
In total, that is at least 12 bytes on 32-bit builds and 24 bytes on 64-bit builds.
The dictionary starts out with 8 empty buckets and is resized by growing the bucket count whenever it becomes about two-thirds full.
To be honest, it works much like an associative container in C++, if you have ever used that. If you look at the source code of the CPython interpreter, you will see that it stores the data on the heap and uses pointers to link keys to values, much the way a C++ map does. On my system an empty dictionary is 280 bytes. As @Navneet said, you can use sys.getsizeof to calculate the size, but remember that the result is system-specific, so your system might not report 280 bytes.
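You can check the numbers on your own build; the exact values vary across CPython versions and platforms, so treat them as illustrative:

```python
import sys

d = {}
empty_size = sys.getsizeof(d)
print(empty_size)  # e.g. 64 on a recent 64-bit CPython; varies by build

# The table resizes in occasional jumps as it fills, not per insertion:
for i in range(20):
    d[i] = i
print(sys.getsizeof(d) > empty_size)  # True: the bucket table has grown
```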

How to create a memoryview for a non-contiguous memory location?

I have a fragmented structure in memory and I'd like to access it as a contiguous-looking memoryview. Is there an easy way to do this or should I implement my own solution?
For example, consider a file format that consists of records. Each record has a fixed-length header that specifies the length of the record's content. A higher-level logical structure may spread over several records. Implementing the higher-level structure would be easier if it could see its own fragmented memory locations as a simple contiguous array of bytes.
Update:
It seems that Python supports this 'segmented' buffer type internally, at least based on this part of the documentation. But this is only the C API.
Update2:
As far as I can see, the referenced C API - called old-style buffers - does what I need, but it is now deprecated and unavailable in newer versions of Python (3.x). The new buffer protocol - specified in PEP 3118 - offers a new way to represent buffers. This API is more usable in most use cases (among them, cases where the represented buffer is not contiguous in memory), but it does not support this specific one, where a one-dimensional array may be laid out completely freely (as multiple differently sized chunks) in memory.
First - I am assuming you are trying to do this in pure Python rather than in a C extension. So I am assuming you have loaded the different records you are interested in into a set of Python objects, and your problem is that you want to see the higher-level structure that is spread, in bits and pieces, across those objects.
So can you not simply load each of the records into a bytearray? You can then use Python's slicing to create a new array that holds just the data for the high-level structure you are interested in. You will then have a single byte array containing only the data you care about, and you can print it or manipulate it however you want.
So something like:
a = bytearray(b"Hello World")    # put your records into byte arrays like this
b = bytearray(b"Stack Overflow")
# Slice and join arrays to form a new array with just the data
# from your high-level entity
complexStructure = bytearray(a[0:6] + b[0:])
print(complexStructure)
Of course, you will still need to know where within the records your high-level structure lives in order to slice the arrays correctly, but you would need to know that anyway.
EDIT:
Note: taking a slice of a list does not copy the objects in the list; it just creates a new set of references to them, so:
>>> a = [1,2,3]
>>> b = a[1:3]
>>> id(a[1])
140268972083088
>>> id(b[0])
140268972083088
However, changes to the list b will not change a, since b is a new list. To have changes propagate to the original lists automatically, you would need a more complicated object that holds the original records and hides them behind methods that decide which list, and which element of that list, to read or write when the user views or modifies the complex structure. Something like:
class ComplexStructure:
    def __init__(self):
        self.listofrecords = []

    def add_records(self, record):
        self.listofrecords.append(record)

    def get_value(self, position):
        listnum, posinlist = ...  # formula to figure out which list, and
                                  # where in it, the element lives
        return self.listofrecords[listnum][posinlist]

    def set_value(self, position, value):
        listnum, posinlist = ...  # same mapping as above
        self.listofrecords[listnum][posinlist] = value
Granted this is not the simple way of doing things you were hoping for but it should do what you need.
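For concreteness, here is a minimal runnable sketch of that idea: several separate bytearrays ("records") presented as one logical, writable sequence. The class name FragmentedView and the simple linear-scan index mapping are illustrative choices, not a standard API:

```python
class FragmentedView:
    """Present a list of bytearrays as one logical byte sequence."""

    def __init__(self, records):
        self.records = records  # list of bytearray fragments

    def _locate(self, position):
        # Walk the fragments until the logical position falls inside one.
        for rec in self.records:
            if position < len(rec):
                return rec, position
            position -= len(rec)
        raise IndexError(position)

    def __getitem__(self, position):
        rec, i = self._locate(position)
        return rec[i]

    def __setitem__(self, position, value):
        # Writes go straight through to the underlying fragment.
        rec, i = self._locate(position)
        rec[i] = value

    def __len__(self):
        return sum(len(rec) for rec in self.records)

view = FragmentedView([bytearray(b"Hello "), bytearray(b"World")])
print(bytes(view[i] for i in range(len(view))))  # b'Hello World'
view[6] = ord("w")  # modifies the second fragment in place
```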
