How much space is assigned to an empty dictionary in Python? [duplicate]

If we create an empty dictionary, like idict = {}, how much space is assigned for this dictionary? I know that for a list, if we initialize one like ilist = [], appending to it will over-allocate capacity as it grows: first 4 slots, then 8.
What about a dictionary?

Well, dictionaries don't store the actual key and value objects inside the table itself; much like C/C++ pointers, each slot only holds references, so you only get a constant overhead in the dictionary for every element.
You can test this with:
import sys
x = {}
sys.getsizeof(x)
The dictionary itself consists of a number of buckets, each containing:
the hash code of the key currently stored (which is not predictable from the bucket's position, due to the collision resolution strategy used)
a pointer to the key object
a pointer to the value object
In total that is at least 12 bytes per bucket on 32-bit builds and 24 bytes on 64-bit builds.
The dictionary starts out with 8 empty buckets and is resized by doubling the number of buckets whenever it becomes about two-thirds full.
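As a quick illustration (the exact numbers are build- and version-specific, so treat the comments as examples rather than guarantees), you can watch the reported size stay flat and then jump whenever the table is resized:
import sys

d = {}
print(sys.getsizeof(d))               # e.g. 64 on a 64-bit CPython 3.x build

for i in range(20):
    d[i] = i
    print(len(d), sys.getsizeof(d))   # size stays flat, then jumps at each resize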

To be honest, it actually works like an associative map in C++, if you have ever used that. If you look at the source code of the Python interpreter, you will see that it stores the data on the heap and uses pointers to link keys to values, much like std::map does in C++. On my system an empty dictionary is 280 bytes. As @Navneet said, you can use sys.getsizeof to calculate the size, but remember that it is system specific, so your system might not report 280 bytes. If it does, it means the structure is held together by a web of pointers into the underlying data.

Related

How does Python know to use the same object in memory? [duplicate]

If I use the below:
a = 1000
print(id(a))
myList = [a,2000,3000,4000]
print(id(myList[0]))
# prints the same IDs
I get the same id. This makes sense to me. I can understand how the memory manager could assign the same object to these variables, because I am directly referencing a in the list.
However, if I do this instead:
a = 1000
print(id(a))
myList = [1000,2000,3000,4000]
print(id(myList[0]))
# prints the same IDs
I STILL get the same id being output for both prints. How does Python know to use the same object for these assignments? Searching for pre-existence would surely be hugely inefficient so I am presuming something more clever is going on here.
My first thought was something to do with the integer itself being used to calculate the object's address, but the behaviour also holds true for strings:
a = "car"
print(id(a))
myList = ["car",2000,3000,4000]
print(id(myList[0]))
# prints the same IDs
The behaviour does NOT however, hold true for list elements:
a = [1,2,3]
print(id(a))
myList = [[1,2,3],2000,3000,4000]
print(id(myList[0]))
# prints different IDs
Can someone explain the behaviour I am seeing?
EDIT - I have learned that for small values between -5 and 256 the same object may be reused. The thing is that I am seeing the same object still being used even for huge values, and even for strings:
a = 1000000000000
myList = [1000000000000,1000,2000]
print(a is myList[0])
# outputs True!
My question is: how can Python work out that it is the same object in these cases without searching for pre-existence? Let's say CPython specifically.
EDIT - I am using Python V3.8.10
In Python, small immutable values like numbers and short strings are often stored only once in memory to save space and speed up the program (unless the value is produced by an operation at run time, in which case a new object may be created). This process is called "interning". It means that when you write the same value multiple times, the occurrences can refer to the same object and therefore report the same memory address (id). However, lists and other more complex data types are not interned, so every time you write a list a new memory block is allocated for it, giving it a different id.
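As a rough illustration (the exact behaviour depends on the CPython version and on whether the statements are compiled together as one block, since the compiler also deduplicates equal constants within a single compiled block):
a = 1000
myList = [1000, 2000, 3000]
print(a is myList[0])   # often True when these lines are compiled together:
                        # equal literals can be folded into one constant object

x = 300
y = int("300")          # built at run time; 300 is outside the small-int cache
print(x is y)           # False: a distinct int object is created

s = 5
t = 5
print(s is t)           # True: ints from -5 to 256 are cached by CPython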

Why memory space allocation is different for the same objects?

I was experimenting with how Python allocates memory and ran into the same issue as
Size of list in memory, which Eli describes in a much better way. His answer led me to a new doubt: I checked the size of 1 plus the size of [] against the size of [1], but they are different, as you can see in the code snippet. If I'm not wrong, the memory allocation should be the same, but it isn't. Can anyone help me with the understanding?
>>> import sys
>>> sys.getsizeof(1)
28
>>> sys.getsizeof([])
64
>>> 28 + 64
92
>>> sys.getsizeof([1])
72
What's the minimum information a list needs to function?
some kind of top-level list object, containing a reference to the class information (methods, type info, etc), and the list's own instance data
the actual objects stored in the list
... that gets you the size you expected. But is it enough?
A fixed-size list object can only track a fixed number of list entries: traditionally just one (head) or two (head and tail).
Adding more entries to the list doesn't change the size of the list object itself, so there must be some extra information: the relationship between list elements.
It's possible to store this information in every Object (this is called an intrusive list), but it's very restrictive: each Object can only be stored in one list at a time.
Since Python lists clearly don't behave like that, we know this extra information isn't already in the list element, and it can't be inside the list object, so it must be stored elsewhere, which increases the total size of the list.
NB. I've kept this argument deliberately abstract. You could implement a list in a few different ways, but none of them avoid some extra storage for the element relationships, even if the representation differs.
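As a rough check (the byte counts differ between CPython versions and 32-/64-bit builds, so the comments are only indicative), you can see that each extra element adds one pointer's worth of space to the list itself, while the element objects are accounted for separately:
import sys

print(sys.getsizeof([]))        # the bare list object, e.g. 56 or 64 bytes
print(sys.getsizeof([1]))       # previous size + 8: one extra pointer on a 64-bit build
print(sys.getsizeof([1, 2]))    # + 8 again
print(sys.getsizeof(1))         # the int object itself is counted separately, e.g. 28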

How does Python manage to loop through values in a dictionary?

I want to know how Python loops through values in a dictionary. I know how to do it in code, and all of the answers I have read just explain how to do it.
I want to understand how python finds the values, as I thought that dictionary values were associated with keys. Do dictionary items also have an index value or something?
Thanks for the answers or references to a relevant source in advance :)
I've Googled, stackoverflowed, and read.
edit: I'm interested in how Python 3.7 achieves this
According to the source code (dict_items(PyDictObject *mp)), a list of n tuples is allocated, where n is the number of key/value pairs in the dictionary, and each non-null value (line 2278: if (value != NULL)) is set at the corresponding index of that list.
The dict object itself is basically a chunk of memory that knows the size of each entry (offset), where the values start (value_ptr), and where the key entries are (ep). So when you get the keys/values (for k, v in ...), it essentially traverses the used-up portion of the memory allocated for the object.
Btw, it may help to know that PyList_SET_ITEM is just a macro that sets a value in an array at the desired index: #define PyList_SET_ITEM(op, i, v) (((PyListObject *)(op))->ob_item[i] = (v)). Since arrays are just values stored sequentially in memory, the index operator places the value at memory location start + (sizeof(element) * index).
Disclaimer: This is the first time I've tried reading the python source code so my interpretation may be a bit off, or oversimplified.
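For the behaviour you can observe from Python itself (this is only an illustration, not the C code): iterating a dict in CPython 3.7+ walks its internal entry table in insertion order, and the keys(), values() and items() views all traverse that same table:
d = {"a": 1, "b": 2, "c": 3}

pairs = []
for k in d:                 # iterating the dict yields its keys in insertion order
    pairs.append((k, d[k]))

print(pairs)                # [('a', 1), ('b', 2), ('c', 3)]
print(list(d.items()))      # the items() view walks the same table and gives the same pairs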

What is a "Physically Stored Sequence" in Python?

I am currently reading Learning Python, 5th Edition - by Mark Lutz and have come across the phrase "Physically Stored Sequence".
From what I've learnt so far, a sequence is an object that contains items that can be indexed in sequential order from left to right e.g. Strings, Tuples and Lists.
So in regards to a "Physically Stored Sequence", would that be a Sequence that is referenced by a variable for use later on in a program? Or am not getting it?
Thank you in advance for your answers.
A Physically Stored Sequence is best explained by contrast. It is one type of "iterable" with the main example of the other type being a "generator."
A generator is an iterable, meaning you can iterate over it as in a "for" loop, but it does not actually store anything--it merely spits out values when requested. Examples of this would be a pseudo-random number generator, the whole itertools package, or any function you write yourself using yield. Those sorts of things can be the subject of a "for" loop but do not actually "contain" any data.
A physically stored sequence then is an iterable which does contain its data. Examples include most data structures in Python, like lists. It doesn't matter in the Python parlance if the items in the sequence have any particular reference count or anything like that (e.g. the None object exists only once in Python, so [None, None] does not exactly "store" it twice).
A key feature of physically stored sequences is that you can usually iterate over them multiple times, and often get at items other than the "first" one (the one an iterator over any iterable gives you when you call next() on it).
All that said, this phrase is not very common--certainly not something you'd expect to see or use as a workaday Python programmer.
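A small example of the contrast (both objects below are iterables, but only the list physically stores its elements):
nums = [1, 2, 3]                 # a physically stored sequence: the data lives in the list
squares = (n * n for n in nums)  # a generator: values are produced on demand, not stored

print(list(squares))             # [1, 4, 9]
print(list(squares))             # []  -- the generator is exhausted after one pass

print(nums[1])                   # 2  -- a stored sequence supports indexing
print(list(nums))                # [1, 2, 3] -- and it can be iterated again and again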

How to create a memoryview for a non-contiguous memory location?

I have a fragmented structure in memory and I'd like to access it as a contiguous-looking memoryview. Is there an easy way to do this or should I implement my own solution?
For example, consider a file format that consists of records. Each record has a fixed-length header that specifies the length of the record's content. A higher-level logical structure may spread over several records. Implementing the higher-level structure would be easier if it could see its own fragmented memory as a simple contiguous array of bytes.
Update:
It seems that Python supports this 'segmented' buffer type internally, at least based on this part of the documentation. But this is only the C API.
Update2:
As far as I can see, the referenced C API - called old-style buffers - does what I need, but it is now deprecated and unavailable in newer versions of Python (3.x). The new buffer protocol - specified in PEP 3118 - offers a new way to represent buffers. This API covers most use cases (among them ones where the represented buffer is not contiguous in memory), but not this specific one, where a one-dimensional array may be laid out completely freely in memory (multiple chunks of different sizes).
First - I am assuming you are just trying to do this in pure Python rather than in a C extension. So I am assuming you have loaded the different records you are interested in into a set of Python objects, and your problem is that you want to see the higher-level structure that is spread in bits and pieces across those objects.
So can you not simply load each of the records into a bytearray? You can then use Python slicing to create a new array that holds just the data for the higher-level structure you are interested in. You will then have a single bytearray with only the data you care about, and you can print it or manipulate it any way you want.
So something like:
a = bytearray(b"Hello World") # put your records into byte arrays like this
b = bytearray(b"Stack Overflow")
complexStructure = bytearray(a[0:6]+b[0:]) # Slice and join arrays to form
# new array with just data from your
# high level entity
print complexStructure
Of course you will still need to know where within the records your higher-level structure sits in order to slice the arrays correctly, but you would need to know this anyway.
EDIT:
Note that taking a slice of a list does not copy the objects in the list; it just creates a new set of references to them, so:
>>> a = [1,2,3]
>>> b = a[1:3]
>>> id(a[1])
140268972083088
>>> id(b[0])
140268972083088
However, changes to the list b will not change a, as b is a new list. To have changes propagate automatically to the original list, you would need to build a more complicated object that holds the lists of the original records and hides them in such a way that it can decide which list, and which element of that list, to change or view when a user modifies or views the complex structure. So something like:
class ComplexStructure():
    def __init__(self):
        self.listofrecords = []
    def add_records(self, record):
        self.listofrecords.append(record)
    def get_value(self, position):
        listnum, posinlist = ...  # formula to figure out which list and where in that
                                  # list the element of the complex structure is
        return self.listofrecords[listnum][posinlist]
    def set_value(self, position, value):
        listnum, posinlist = ...  # formula to figure out which list and where in that
                                  # list the element of the complex structure is
        self.listofrecords[listnum][posinlist] = value
Granted, this is not the simple way of doing things you were hoping for, but it should do what you need.
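If copying is not acceptable, here is a minimal sketch along the lines of the ComplexStructure idea above, using memoryview so reads and writes go straight through to the original fragments without copying. Note that this does not give you a real contiguous memoryview object, only something that behaves like a flat, writable sequence; the name FragmentView and its methods are illustrative, not a standard API:
class FragmentView:
    """Present several bytes-like fragments as one indexable, writable sequence."""
    def __init__(self, chunks):
        self.chunks = [memoryview(c) for c in chunks]   # no data is copied here

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def _locate(self, index):
        # Translate a flat index into (chunk, offset within that chunk)
        for chunk in self.chunks:
            if index < len(chunk):
                return chunk, index
            index -= len(chunk)
        raise IndexError("index out of range")

    def __getitem__(self, index):
        chunk, offset = self._locate(index)
        return chunk[offset]

    def __setitem__(self, index, value):
        chunk, offset = self._locate(index)
        chunk[offset] = value           # writes through to the original bytearray

view = FragmentView([bytearray(b"Hello "), bytearray(b"World")])
print(len(view), chr(view[6]))   # 11 W
view[0] = ord("J")               # modifies the first underlying bytearray in place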
