How does SharedMemory in Python define its size? - python

I have a problem with SharedMemory in Python 3.8; any help would be appreciated.
Question 1.
SharedMemory has one parameter, size, and the docs say the unit is bytes. I created an instance with a size of 1 byte, then wrote shm.buf[:5] = bytearray([1, 2, 3, 4, 5]), and it worked without raising any exception. Why?
Question 2.
Why does printing buffer show a memory address?
Why did I set the size to 1 byte, yet the result shows 4096 bytes allocated?
Why are the addresses of buffer and buffer[3:4] 3×16×16 bytes apart?
Why is the address of buffer[3:4] the same as the address of buffer[1:3]?
from multiprocessing import shared_memory, Process

def writer(buffer):
    print(buffer)        # output: <memory at 0x7f982b6234c0>
    buffer[:5] = bytearray([1, 2, 3, 4, 5])
    buffer[4] = 12
    print(buffer[3:4])   # output: <memory at 0x7f982b6237c0>
    print(buffer[1:3])   # output: <memory at 0x7f982b6237c0>

if __name__ == '__main__':
    shm = shared_memory.SharedMemory('my_memory', create=True, size=1)
    print(shm.size)      # output: 4096
    writer(shm.buf)
    print('shm is :', shm)  # output: shm is : SharedMemory('my_memory', size=4096)

In answer to question 2:
buffer[3:4] is not, as you seem to suppose, a reference into an array. It is an expression that creates a new memoryview object covering a slice of buffer; your function prints its repr (which happens to include its address), then throws it away. Then buffer[1:3] does something similar, and the new temporary object coincidentally gets allocated at the same memory location as the now-discarded buffer[3:4], because Python's memory management knew that location was free.
If you don't throw away the slices after creating them, they will be allocated to different locations. Try this:
>>> b34 = buffer[3:4]
>>> b13 = buffer[1:3]
>>> b34
<memory at 0x0000024E65662940>
>>> b13
<memory at 0x0000024E65662A00>
In this case they are at different locations because there are variables referring to both of them.
And both are at different locations from buffer, because all three are distinct objects, related to one another only by the way they were created.
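Note that although each slice is a new Python object, no data is copied: a slice of a memoryview still views the same underlying bytes as the original. A small sketch:

```python
# A slice of a memoryview is a new object, but it views the same underlying
# buffer -- writing through the slice is visible in the original bytearray.
data = bytearray(8)
view = memoryview(data)
part = view[3:5]          # new memoryview object, same underlying memory
part[0] = 99
assert data[3] == 99      # the write went straight into `data`
assert part is not view   # distinct Python objects nonetheless
```

This is why writes through shm.buf[3:4] affect the shared memory even though the slice object itself lives somewhere else.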
Python variables are not blocks of raw memory that you can index into with a pointer, and thinking that you can access them directly as if they were C byte arrays is not helpful. What a particular Python interpreter does with your data deep down inside is generally not the business of a Python programmer. At the application level, why should anyone care where and how buffer[3:4] is stored?
There is one good reason: if you have huge amounts of data, you may need to understand the details of the implementation because you have a performance problem. But generally the solution at the application level is to use modules like array or pandas where very clever people have already thought about those problems and come up with solutions.
What you are asking is not a question about Python, but about the details of a particular CPython implementation. If you are interested in that sort of thing, and there is no reason why you should not be, you can always go and read the source. But that might be a little overambitious for the moment.

Related

Possible to get "value of address" in python?

In the following, I can see that when adding an integer in python, it adds the integer, assigns that resultant value to a new memory address and then sets the variable to point to that memory address:
>>> a=100
>>> id(a)
4304852448
>>> a+=5
>>> id(a)
4304852608
Is there a way to see what the value at the (old) memory address 4304852448 (0x10096d5e0) is? For example: value_of(0x10096d5e0)
it adds the integer, assigns that resultant value to a new memory address
No; the object that represents the resulting value has a different memory address. Typically this object is created on the fly, but specifically for integers (which are immutable, and for some other types applying a similar optimization) the object may already exist and be reused.
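To illustrate that reuse, here is a sketch of CPython's small-int cache (the cached range, -5 to 256, is an implementation detail, and int('…') is used only to force the values to be built at runtime rather than shared as literals):

```python
# CPython keeps a cache of small ints (-5..256), so a "new" result in that
# range is actually a reused object; larger ints are created on the fly.
a = 100
b = int('100')       # computed at runtime, not a compile-time literal
assert a is b        # same cached object in CPython

x = int('99999')
y = int('99999')
assert x == y        # equal values...
# ...but x and y are typically two distinct objects here (id(x) != id(y)),
# because 99999 is outside the cached range.
```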
Is there a way to see what the value at the (old) memory address 4304852448 (0x10096d5e0) is?
This question is not well posed. First off, "memory addresses" in this context are virtualized, possibly more than once. Second, in the Python model, addresses do not "have values". They are potentially the location of objects, which in turn represent values.
That said: at the location of the previous id(a), the old object will still be present if and only if it has not been garbage-collected. For this, it is sufficient that some other reference is held to the object. (The timing of garbage collection is implementation-defined, but the reference CPython implementation implements garbage collection by reference counting.)
You are not, in general, entitled to examine the underlying memory directly (except perhaps with some kind of process spy that has the appropriate permissions), because Python is just not that low-level of a language. (As in #xprilion's answer, the CPython implementation provides a lower-level memory interface via ctypes; however, the code there is effectively doing an unsafe cast.)
If you did (and assuming CPython), you would not see a binary representation of the integer 100 in the memory location indicated by calling id(a) the first time - you would see instead the first word of the PyObject struct used to implement objects in the C code, which would usually be the refcount, but could be a pointer to the next live heap object in a reference-tracing Python build (if I'm reading it correctly).
Once it's garbage-collected, the contents of that memory are again undefined. They depend on the implementation, and even specifically for the C implementation, they depend on what the standard library free() does with the memory. (Normally it will be left untouched, because there is no reason to zero it out and it takes time to do so. A debug build might write special values into that memory, because it helps detect memory corruption.)
You are trying to dereference the memory address, much like the pointers used in C/C++. The following code might help you -
import ctypes
a = 100
b = id(a)
print(b, ": ", a)
a+=5
print(id(a), ": ", a)
print(ctypes.cast(b, ctypes.py_object).value)
OUTPUT:
140489243334080 : 100
140489243334240 : 105
100
The above example establishes that the value stored in the previous address of a remains the same.
Hope this answers the question. Read #Karl's answer for more information about what's happening in the background - https://stackoverflow.com/a/58468810/2650341

How to restore a variable value from memory in an ipython session?

How would you restore a python variable from memory once it's been overwritten?
Maybe this makes things a little easier. I'm currently in an ipython session like:
In [1]: var = method_that_cant_be_reproduced()
In [2]: var = [4, 5, 6]
Would it be possible to restore the values assigned to var in step 1, somehow? I'm assuming it hasn't been garbage collected.
I also know some details about the previous value. It was a list and I know its size.
When you used var = [4, 5, 6] the name var was rebound to a new object (the list) and the original object was lost (as there aren't any names referencing it anymore). Because there's nothing left referencing your original object the answer to your question is almost certainly no.
While this may not be the answer that you wanted, there are several things to learn from this:
In the event that a function can only be run once, or a variable can only be generated once, don't overwrite it; keep track of it somewhere, be it in another variable or a file
Try not to be in a situation where a variable can only be generated once (what if your generation procedure has a problem and the source is lost?); make a copy of your source before consuming it
Anticipate the unexpected: what if your computer had crashed and your IPython session was lost? Had you stored your result anywhere?
So you've made a mistake; you can still learn from this and move on, knowing you won't make the same mistake again.
According to the documentation "the result of input line 4 is available either as Out[4] or as _4".
More details here.
I like to pickle them for retrieval later - especially when I have a class instance or something that takes a long time to recreate when I'm using it for testing.
import pickle

with open('pickled_var', 'wb') as picklefile:
    pickle.dump('asdf', picklefile)

with open('pickled_var', 'rb') as picklefile:
    unpickled_var = pickle.load(picklefile)

print(unpickled_var)
asdf
In your case, you could pickle the variable while it holds the values you want, and then just load it over the "bad" variable to get back to a "good" state.

Update ctypes pointer in place

For various reasons I would like to update the value of a ctypes pointer in place. In other words, what I want is to modify the internal buffer the pointer object wraps. Here is one possible approach:
from ctypes import *
a = pointer(c_int(123))
b = pointer(c_int(456))
memmove(addressof(a), addressof(b), sizeof(c_void_p))
a._objects.clear()
a._objects.update(b._objects)
Now a.contents will return c_long(456). However, playing around with the _objects attribute seems like it's too concerned with the implementation details (will this even behave correctly?). Is there a more idiomatic way to do this?
Since eryksun hasn't posted his answer, I'll add it here myself. This is how it should be done:
from ctypes import *
a = pointer(c_int(123))
b = pointer(c_int(456))
tmp = pointer(a)[0]
tmp.contents = b.contents
Now a.contents is c_int(456). The key is that tmp shares a buffer with a (which is why you'll find tmp._b_needsfree_ == 0).
A pointer is a variable that holds a memory address, so the call memmove(addressof(a), addressof(b),...) actually copies the address held by b into a so a now points at the same memory location that b points to. If that is what you desire, you're done.
If what you want is to set the value of the integer pointed to by a to the same value as that of the integer pointed to by b, then what you need is to copy the contents of the memory at the address pointed to by b into the memory pointed to by a. Like so...
memmove(cast(a, c_void_p).value, cast(b, c_void_p).value, sizeof(c_int))
Now the pointer a points to a memory address that holds the same value as the memory pointed to by b.
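Re-creating the a and b from the question, a quick check confirms that this second form copies the pointed-to value rather than the pointer itself:

```python
from ctypes import c_int, c_void_p, cast, memmove, pointer, sizeof

a = pointer(c_int(123))
b = pointer(c_int(456))

# Copy the int that b points to into the memory that a points to.
# cast(p, c_void_p).value yields the raw address the pointer holds.
memmove(cast(a, c_void_p).value, cast(b, c_void_p).value, sizeof(c_int))

assert a.contents.value == 456
b.contents.value = 789
assert a.contents.value == 456   # a and b still point at different ints
```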
The _objects attribute is not necessary in either case (IMHO).
Sorry for the boring post, but that, I'm afraid, is the (only?) way to do pointers :)

References in Python

I have a multicasting network that needs to continuously send data to all other users. This data will be changing constantly, so I do not want the programmer to have to deal with sending the packets. Because of this, I am trying to find out how I can make a reference to any object or variable in Python (I am new to Python) so that the user can modify it and thereby change what is sent in the multicast packets.
Here is an example of what I want:
>>> test = "test"
>>> mdc = MulticastDataClient()
>>> mdc.add(test) # added into an internal list that is sent to all users
# here we can see that we are successfully receiving the data
>>> print mdc.receive()
{'192.168.1.10_0': 'test'}
# now we try to change the value of test
>>> test = "this should change"
>>> print mdc.receive()
{'192.168.1.10_0': 'test'} # want 'test' to change to -> 'this should change'
Any help on how I can fix this would be very much appreciated.
UPDATE:
I have tried it this way as well:
>>> test = [1, "test"]
>>> mdc = MulticastDataClient()
>>> mdc.add(test)
>>> mdc.receive()
{'192.168.1.10_1': 'test'}
>>> test[1] = "change!"
>>> mdc.receive()
{'192.168.1.10_1': 'change!'}
This did work.
However,
>>> val = "ftw!"
>>> nextTest = [4, val]
>>> mdc.add(nextTest)
>>> mdc.receive()
{'192.168.1.10_1': 'change!', '192.168.1.10_4': 'ftw!'}
>>> val = "different."
>>> mdc.receive()
{'192.168.1.10_1': 'change!', '192.168.1.10_4': 'ftw!'}
This does not work. I need 'ftw!' to become 'different.' in this case.
I am using strings for testing and am used to strings being mutable objects in other languages. I will only be editing the contents inside of an object, so would this end up working?
In Python everything is a reference, but strings are not mutable. So test is holding a reference to "test". When you assign "this should change" to test, you merely rebind test to a different object. But your clients still hold the reference to "test". Or shorter: it does not work that way in Python! ;-)
A solution might be to put the data into an object:
data = {'someKey':"test"}
mdc.add(data)
Now your clients hold a reference to the dictionary. If you update the dictionary like this, your clients will see the changes:
data['someKey'] = "this should change"
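A minimal stand-in for the client shows why this works (the internal list here is hypothetical, playing the role of what mdc.add() would store):

```python
# `internal` stands in for the list MulticastDataClient keeps internally.
internal = []

data = {'someKey': "test"}
internal.append(data)            # the client stores a reference, not a copy

data['someKey'] = "this should change"
# Mutating the dict in place is visible through the client's reference:
assert internal[0]['someKey'] == "this should change"
```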
You can't, not easily. A name (variable) in Python is just a location for a pointer. Overwrite it and you just replace the pointer with another pointer, i.e. the change is only visible to code that uses the same variable. Object members are basically the same, but since their state is seen by everyone holding a pointer to the object, you can propagate changes that way. You just have to use obj.var every single time. Of course, strings (along with integers, tuples, and a few other built-in types) are immutable, i.e. you can't change anything about them for others to see, since you can't change them at all.
However, the mutability of objects opens another possibility: you could, if you cared to see it through, write a wrapper class that contains an arbitrary object, allows changing that object through a set() method, and delegates everything important to that object. You'd probably run into nasty little troubles sooner or later, though. For example, I can't imagine this would play well with metaprogramming that walks over all members, or with anything else that pokes at the object's internals. It's also incredibly hacky (i.e. unreliable). There's probably a much easier solution.
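A bare-bones version of such a wrapper (just the holder with a set() method, without any of the delegation; the name Ref is made up) might look like this:

```python
class Ref:
    """Hypothetical minimal holder: share the Ref, mutate its value in place."""
    def __init__(self, value):
        self.value = value

    def set(self, value):
        self.value = value

shared = Ref("ftw!")
alias = shared                 # both names refer to the same Ref object
shared.set("different.")
assert alias.value == "different."   # the change is visible via every reference
```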
(On a side note, PyPy has a become function in one of its non-default object spaces that really and truly replaces one object with another, visible to everyone with a reference to that object. It doesn't work with any other implementation, though, and I think the incredible potential for confusion and misuse, as well as the fact that most of us have rarely ever needed this, makes it nearly unacceptable in real code.)

Can this Python code be written more efficiently?

So I have this code in Python that writes some values to a dictionary where each key is a student ID number and each value is a class instance (of type Student), and each instance has some variables associated with it.
Code
try:
    if (str(i) in row_num_id.iterkeys()) and (row_num_id[str(i)] == varschosen[1]):
        valuetowrite = str(row[i])
        if students[str(variablekey)].var2 != []:
            students[str(variablekey)].var2.append(valuetowrite)
        else:
            students[str(variablekey)].var2 = [valuetowrite]
except:
    two=1 #This is just a dummy assignment because I can't leave it empty...
          #I don't need my program to do anything if the "try" doesn't work.
          #I just want to prevent a crash.

#Assign var3
try:
    if (str(i) in row_num_id.iterkeys()) and (row_num_id[str(i)] == varschosen[2]):
        valuetowrite = str(row[i])
        if students[str(variablekey)].var3 != []:
            students[str(variablekey)].var3.append(valuetowrite)
        else:
            students[str(variablekey)].var3 = [valuetowrite]
except:
    two=1

#Assign var4
try:
    if (str(i) in row_num_id.iterkeys()) and (row_num_id[str(i)] == varschosen[3]):
        valuetowrite = str(row[i])
        if students[str(variablekey)].var4 != []:
            students[str(variablekey)].var4.append(valuetowrite)
        else:
            students[str(variablekey)].var4 = [valuetowrite]
except:
    two=1
The same code repeats many, many times, once for each variable the student has (var5, var6, ..., varX). However, the RAM spike in my program occurs when I execute the function that performs this series of variable assignments.
I wish to find out a way to make this more efficient in speed or more memory efficient because running this part of my program takes up around half a gig of memory. :(
Thanks for your help!
EDIT:
Okay let me simplify my question:
In my case, I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables, all of type string or list of strings. I don't really care about the number of lines in my code or the speed at which it runs (right now, my code is at almost 20,000 lines and is about a 1 MB .py file!). What I am concerned about is the amount of memory it is taking up because this is the culprit in throttling my CPU. The ultimate question is: does the number of code lines by which I build up this massive dictionary matter in terms of RAM usage?
My original code functions fine, but the RAM usage is high. I'm not sure if that is "normal" for the amount of data I am collecting. Does writing the code in a condensed fashion (as shown by the people who helped me below) actually make a noticeable difference in the amount of RAM I am going to eat up? Sure, there are X ways to build a dictionary, but does the choice even affect RAM usage in this case?
Edit: The suggested code refactoring below won't reduce the memory consumption very much. 6000 instances, each with 1000 attributes, may very well consume half a gig of memory.
You might be better off storing the data in a database and pulling out the data only as you need it via SQL queries. Or you might use shelve or marshal to dump some or all of the data to disk, where it can be read back in only when needed. A third option would be to use a numpy array of strings. The numpy array will hold the strings more compactly. (Python strings are objects with lots of methods which make them bulkier memory-wise. A numpy array of strings loses all those methods but requires relatively little memory overhead.) A fourth option might be to use PyTables.
And lastly (but not least), there might be ways to re-design your algorithm to be less memory intensive. We'd have to know more about your program and the problem it's trying to solve to give more concrete advice.
Original suggestion:
for idx, v in enumerate(('var2', 'var3', 'var4'), start=1):
    try:
        if row_num_id.get(str(i)) == varschosen[idx]:
            valuetowrite = str(row[i])
            value = getattr(students[str(variablekey)], v)
            if value != []:
                value.append(valuetowrite)
            else:
                setattr(students[str(variablekey)], v, [valuetowrite])
    except PUT_AN_EXPLICIT_EXCEPTION_HERE:
        pass
PUT_AN_EXPLICIT_EXCEPTION_HERE should be replaced with something like AttributeError, TypeError, or ValueError, or maybe something else.
It's hard to guess what to put here because I don't know what kinds of values the variables might have.
If you run the code without the try...except block and your program crashes, take note of the traceback message you receive. The last line will say something like
TypeError: ...
In that case, replace PUT_AN_EXPLICIT_EXCEPTION_HERE with TypeError.
If your code can fail in a number of ways, say with TypeError or ValueError, then you can replace PUT_AN_EXPLICIT_EXCEPTION_HERE with
(TypeError, ValueError) to catch both kinds of error.
Note: There is a little technical caveat that should be mentioned regarding
row_num_id.get(str(i))==varschosen[1]. The expression row_num_id.get(str(i)) returns None if str(i) is not in row_num_id.
But what if varschosen[1] is None and str(i) is not in row_num_id? Then the condition is True, when the longer original condition returned False.
If that is a possibility, then the solution is to use a sentinel default value, like row_num_id.get(str(i), object()) == varschosen[1]. Now row_num_id.get(str(i), object()) returns a fresh object() when str(i) is not in row_num_id. Since object() is a new instance of object, there is no way it could equal varschosen[1].
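To see the difference concretely (d and target here are made-up stand-ins for row_num_id and varschosen[1]):

```python
d = {'a': 1}
target = None                      # suppose varschosen[1] happens to be None

# Naive .get(): a missing key comes back as None and wrongly matches target.
assert (d.get('b') == target) is True

# Sentinel default: a fresh object() can never compare equal to anything else,
# so a missing key can no longer produce a false match.
assert (d.get('b', object()) == target) is False
```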
You've spelled this wrong
two=1#This is just a dummy assignment because I
#can't leave it empty... I don't need my program to do anything if the "try" doesn't work. I just want to prevent a crash.
It's spelled
pass
You should read a tutorial on Python.
Also,
except:
Is a bad policy. Your program will fail to crash when it's supposed to crash.
Names like var2 and var3 are evil. They are intentionally misleading.
Don't repeat str(variablekey) over and over again.
I wish to find out a way to make this more efficient in speed or more memory efficient because running this part of my program takes up around half a gig of memory. :(
This request is unanswerable because we don't know what it's supposed to do. Intentionally obscure names like var1 and var2 make it impossible to understand.
"6000 instantiated classes, where each class has 1000 attributed variables"
So. 6 million objects? That's a lot of memory. A real lot of memory.
What I am concerned about is the amount of memory it is taking up because this is the culprit in throttling my CPU
Really? Any evidence?
but the RAM usage is high
Compared with what? What's your basis for this claim?
Python dicts use a surprisingly large amount of memory. Try:
import sys
for i in range(30):
    d = dict((j, j) for j in range(i))
    print "dict with", i, "elements is", sys.getsizeof(d), "bytes"
for an illustration of just how expensive they are. Note that this is just the size of the dict itself: it doesn't include the size of the keys or values stored in the dict.
By default, an instance of a Python class stores its attributes in a dict. Therefore, each of your 6000 instances is using a lot of memory just for that dict.
One way that you could save a lot of memory, provided that your instances all have the same set of attributes, is to use __slots__ (see http://docs.python.org/reference/datamodel.html#slots). For example:
class Foo(object):
    __slots__ = ('a', 'b', 'c')
Now, instances of class Foo have space allocated for precisely three attributes, a, b, and c, but no instance dict in which to store any other attributes. This uses only 4 bytes (on a 32-bit system) per attribute, as opposed to perhaps 15-20 bytes per attribute using a dict.
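A quick way to see the effect (the class names here are made up for illustration):

```python
class Plain(object):
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

class Slotted(object):
    __slots__ = ('a', 'b', 'c')
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

# A Plain instance drags a per-instance __dict__ along; a Slotted one doesn't,
# which is where the per-instance memory saving comes from.
assert hasattr(Plain(), '__dict__')
assert not hasattr(Slotted(), '__dict__')
```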
Another way in which you could be wasting memory, given that you have a lot of strings, is if you're storing multiple identical copies of the same string. Using the intern function (see http://docs.python.org/library/functions.html#intern) could help if this turns out to be a problem.
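A sketch of interning (in Python 3 the function moved to sys.intern; in Python 2 it was the intern builtin). The strings are built at runtime so the compiler can't merge them as shared literals:

```python
import sys

part = 'string'
a = sys.intern('repeated ' + part)   # built at runtime, then interned
b = sys.intern('repeated ' + part)

assert a == b
assert a is b    # interning guarantees one shared object per distinct value
```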
