I'm quite puzzled by this simple piece of python code:
import random
import numpy as np
import matplotlib.pyplot as plt

data = np.arange(2500, 8000, 100)
logdata = np.zeros(len(data)) + np.nan
randata = logdata
for i in range(len(data)):
    logdata[i] = np.log(data[i])
    randata[i] = np.log(random.randint(2500, 8000))
plt.plot(logdata, randata, 'bo')
OK, I don't need a for loop in this specific instance (I'm just making a simple example), but what really confuses me is the role played by the initialisation of randata. I would expect that, by virtue of the for loop, randata would become a totally different array from logdata, but the two arrays turn out to be identical. I see from older discussions that the only way to prevent this from happening is to initialise randata on its own, randata = np.zeros(len(data)) + np.nan, or to make a copy, randata = logdata.copy(), but I don't understand why randata is so deeply linked to logdata despite the for loop.
If I were to give the following commands
logdata = np.zeros((len(data)))+np.nan
randata = logdata
logdata = np.array([1,2,3])
print(randata)
then randata would still be an array of nan, unlike logdata. Why is that so?
Blckknght explains numpy assignment behavior in this post: Numpy array assignment with copy
B = A
This binds a new name B to the existing object already named A.
Afterwards they refer to the same object, so if you modify one in
place, you'll see the change through the other one too.
But to answer why they're "deeply linked" (or rather, point to the same location in memory), it's mostly because copying large arrays is computationally expensive. So in numpy, the assignment = operator references the same block of memory instead of creating a copy at every assignment. If different arrays are desired, we can allocate new memory explicitly using the copy() method. This gives us the efficiency of C/C++ (where avoiding copies is very common by passing around pointers and references) along with the ease-of-use of python (where pointers and references are not available).
I'd say this is a feature, not a bug.
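The difference between in-place modification (visible through every name) and rebinding (which only changes one name) can be demonstrated in a few lines; this is a minimal sketch with arbitrary array contents:

```python
import numpy as np

a = np.zeros(3)
b = a          # b is another name for the SAME array object
b[0] = 7.0     # in-place modification is visible through both names
print(a[0])    # 7.0

c = a.copy()   # allocates a new buffer and copies the data
c[1] = 9.0
print(a[1])    # 0.0 -- a is unaffected by changes to the copy

a = np.array([1.0, 2.0])  # rebinding: the name a now points to a NEW object
print(b)       # [7. 0. 0.] -- b still refers to the original array
```

This is exactly why the loop in the question fills both logdata and randata (they are one object), while rebinding logdata with a fresh np.array leaves randata untouched.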
Related
I have some problems with SharedMemory in Python 3.8; any help would be appreciated.
Question 1.
SharedMemory has a size parameter, and the docs say the unit is bytes. I created an instance with a size of 1 byte, then wrote a multi-byte bytearray into shm.buf, and it worked without any exception. Why?
Question 2.
Why does printing the buffer show a memory address?
Why did I set size to 1 byte, but the result shows it allocated 4096 bytes?
Why are the addresses of buffer and buffer[3:4] 768 (3×16×16) bytes apart?
Why is the address of buffer[3:4] the same as the address of buffer[1:3]?
from multiprocessing import shared_memory, Process

def writer(buffer):
    print(buffer)           # output: <memory at 0x7f982b6234c0>
    buffer[:5] = bytearray([1, 2, 3, 4, 5])
    buffer[4] = 12
    print(buffer[3:4])      # output: <memory at 0x7f982b6237c0>
    print(buffer[1:3])      # output: <memory at 0x7f982b6237c0>

if __name__ == '__main__':
    shm = shared_memory.SharedMemory('my_memory', create=True, size=1)
    print(shm.size)         # output: 4096
    writer(shm.buf)
    print('shm is :', shm)  # output: shm is : SharedMemory('my_memory', size=4096)
In answer to question 2:
buffer[3:4] is not, as you seem to suppose, an array reference. It is an expression that creates a new, unnamed memoryview object, which your function prints and then throws away. Then buffer[1:3] does something similar, and the new unnamed object coincidentally gets allocated at the same memory location as the now-freed buffer[3:4], because Python's memory allocator knew that location was free.
If you don't throw away the slices after creating them, they will be allocated to different locations. Try this:
>>> b34 = buffer[3:4]
>>> b13 = buffer[1:3]
>>> b34
<memory at 0x0000024E65662940>
>>> b13
<memory at 0x0000024E65662A00>
In this case they are at different locations because there are variables that refer to them both.
And both are at different locations from buffer, because all three are distinct objects, related to one another only by the way they were created.
Python variables are not blocks of raw memory that you can index into with a pointer, and thinking that you can access them directly as if they were C byte arrays is not helpful. What a particular Python interpreter does with your data deep down inside is generally not the business of a Python programmer. At the application level, why should anyone care where and how buffer[3:4] is stored?
There is one good reason: if you have huge amounts of data, you may need to understand the details of the implementation because you have a performance problem. But generally the solution at the application level is to use modules like array or pandas where very clever people have already thought about those problems and come up with solutions.
What you are asking is not a question about Python, but about the details of a particular CPython implementation. If you are interested in that sort of thing, and there is no reason why you should not be, you can always go and read the source. But that might be a little overambitious for the moment.
I have a function takes_ownership() that performs an operation on a Numpy array a in place and effectively relies on becoming the owner of the data. Is there a way to alter the passed array such that it no longer points to its original buffer?
The motivation is a code that processes huge chunks of data in place (due to performance constraints) and a common error is unaware users recycling their input arrays.
Note that the question is very specifically not "why is a different if I change b = a in my function". The question is "how can I make using the given array in place safer for unsuspecting users" when I must not make a copy.
def takes_ownership(a):
    b = a
    # looking for this step
    a.set_internal_buffer([])
    # if this were C++ std::vectors, I would be looking for
    #     b = np.array([])
    #     a.swap(b)
    # an expensive operation that invalidates a
    b.resize((6, 6), refcheck=False)
    # outside references to a no longer valid
    return b

a = np.random.randn(5, 5)
b = takes_ownership(a)
# array no longer has data so that users cannot mess up
assert a.shape == ()
NumPy has a copy function that will clone an array (although if this is an object array, i.e. not primitive, there might still be nested object references after cloning). That being said, this is a questionable design pattern and it would probably be better practice to rewrite your code in a way that does not rely on this condition.
Edit: If you can't copy the array, you will probably need to make sure that none of your other code modifies it in place (instead running immutable operations to produce new arrays). Doing things this way may cause more issues down the road, so I'd recommend refactoring so that it is not necessary.
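The "immutable operations" suggestion above can be illustrated with a short sketch: most numpy expressions and functions return a new array rather than modifying their operands, so the original data survives untouched.

```python
import numpy as np

a = np.arange(4.0)   # [0. 1. 2. 3.]
b = a * 2.0          # arithmetic produces a NEW array; a is untouched
a_plus = a + 1.0     # also a new array
print(a)             # [0. 1. 2. 3.] -- unchanged by either operation
```

By contrast, in-place variants such as a *= 2.0 or a.sort() mutate the shared buffer and would be visible through every reference to a.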
You can use numpy's copy, which will do exactly what you want.
Use the code b = np.copy(a) and no further changes are needed.
If you want to make a copy of an object, in general, so that you can call methods on that object then you can use the copy module.
From the linked page:
A shallow copy constructs a new compound object and then (to the
extent possible) inserts references into it to the objects found in
the original.
A deep copy constructs a new compound object and then, recursively,
inserts copies into it of the objects found in the original.
In this case, in your code import copy and then use b = copy.copy(a) and you will get a shallow copy (Which I think should be good enough for numpy arrays, but you'll want to check that yourself).
The hanging question here is why this is needed. Python passes references to objects when calling a function (sometimes described as "pass by object reference"). The assignment operator = does not call any constructor for the name on its left-hand side; rather, it binds that name to the object referenced on the right-hand side. So when you call a method (using the dot operator .), either a or b will operate on the same object in memory unless you make a new object with an explicit copy command.
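A short sketch of the distinction described above, using copy.copy on a numeric numpy array (for plain numeric data, a shallow copy is sufficient because there are no nested objects to share):

```python
import copy
import numpy as np

a = np.arange(5, dtype=float)
b = a               # same object: modifying b is visible through a
c = copy.copy(a)    # shallow copy: a new array with its own numeric buffer

b[0] = -1.0         # changes a as well, since a and b are one object
c[2] = 99.0         # does NOT change a

print(a)            # [-1.  1.  2.  3.  4.]
```

For arrays with dtype=object, copy.copy still shares the contained objects, which is where copy.deepcopy differs.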
I have a class in python3 that contains a few variables and represents a state.
During the program (simulation) I need to make a big amount of copies of this state so that I can change it and still have the previous information.
The problem is that deepcopy from the copy module is too slow. Would I be better off creating a method in that class to copy an object, which would create and return a new object and copy the values of each variable? Note: inside the object there is a 3D list as well.
Is there any other solution to this? The deepcopy is really too slow, it takes more than 99% of the execution time according to cProfile.
Edit: Would representing the 3D list and other lists as numpy arrays/matrix and copying them with numpy inside a custom copy function be the better way?
For people from the future having the same problem:
What I did was creating a method inside the class that would manually copy the information. I did not override deepcopy, maybe that would be cleaner, maybe not.
I tried with and without numpy for 2D and 3D lists, but appending 2 numpy arrays later in the code was much more time consuming than making a sum of 2 lists (which I did need to do for my specific program).
So I used:
my_list = list(map(list, my_list)) # for 2D list
my_list = [list(map(list, x)) for x in my_list] # for 3D list
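A minimal sketch of the "manual copy method" approach described above; the class name, attributes, and fast_copy method are hypothetical stand-ins for the asker's state class:

```python
class State:
    def __init__(self, score, grid):
        self.score = score   # immutable scalar: safe to share between copies
        self.grid = grid     # 3D nested list: must be copied level by level

    def fast_copy(self):
        # Rebuild each nesting level so that mutations to the copy
        # do not leak back into the original (equivalent to the
        # list(map(list, ...)) idiom above, one level deeper).
        new_grid = [[list(row) for row in plane] for plane in self.grid]
        return State(self.score, new_grid)

s1 = State(10, [[[0, 0], [0, 0]], [[0, 0], [0, 0]]])
s2 = s1.fast_copy()
s2.grid[0][0][0] = 5
print(s1.grid[0][0][0])  # 0 -- the original state is untouched
```

Because the method only copies what actually needs copying, it avoids the per-object dispatch and memo bookkeeping that makes generic copy.deepcopy slow.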
I have a data acquisition class that takes data and saves it into a numpy array.
voltage = float((data.decode())[0:-2]) * 1000
print(voltage)
self.xxList = np.append(self.xxList, [voltage])
Those lines are in a while loop which is managed by a thread. I return self.xxList with a simple getter:
def get_xlist(self):
return self.xxList
Then I try to have a reference to the same list in another class, which of course has an instance of my data acquisition class.
self.mySerial = SerialFirat(self)
self.xaList = self.mySerial.get_xlist()
This doesn't work with numpy: self.xaList always stays the same (empty) and doesn't update with any acquired data. It does work with a regular Python list that uses a simple .append(data).
I guess this might be due to the way an element is appended to a numpy array: each append creates a new array and returns a reference to it. The list I referenced is the first array, and the newly created arrays have different addresses, so the referenced list always stays the same.
I couldn't find a workaround to make it function like a normal Python list. I would appreciate any help, and clarification on whether my conclusion about why it doesn't work is correct.
PS: I use the data to plot a live graph, but the list I feed to the plotting function, xaList, is always empty, so nothing is plotted. If I directly feed xxList (the list I write the serial data into) it works, but that really leads to a crappy object-oriented design.
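The asker's conclusion is correct and easy to demonstrate: np.append always returns a brand-new array, so a reference taken earlier never sees later data, whereas list.append mutates the list in place. A minimal sketch:

```python
import numpy as np

arr = np.array([])
view = arr                  # reference to the ORIGINAL array object
arr = np.append(arr, 1.0)   # np.append returns a NEW array; arr is rebound
print(len(view))            # 0 -- the old reference never sees the new data

lst = []
ref = lst                   # reference to the same list object
lst.append(1.0)             # list.append mutates the list in place
print(len(ref))             # 1 -- the reference sees the update
```

One common fix is to keep the container object stable: append into a plain list (or a preallocated array filled by index) and convert to a numpy array only at plot time.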
I am trying to find out how to apply two functions to a numpy array, each one on only half the values.
Here is the code I have been trying
def hybrid_array(xs, height, center, fwhh):
    xs[xs <= center] = height * np.exp((-(xs[xs <= center] - center) ** 2) / (2 * (fwhh / (2 * np.sqrt(2 * np.log(2)))) ** 2))
    xs[xs > center] = height * 1 / (np.abs(1 + ((center - xs[xs > center]) / (fwhh / 2)) ** 2))
    return xs
However, I am overwriting the initial array that is passed as the argument. The usual trick of copying it with a slice, i.e. the following, still changes b.
a = b[:]
c = hybrid_array(a,args)
If there is a better way of doing any part of what I am trying, I would be very grateful if you could let me know as I am still new to numpy arrays.
Thank you
Try copy.deepcopy to copy the array b onto a before calling your function.
import copy
a = copy.deepcopy(b)
c = hybrid_array(a,args)
Alternatively, you can use the copy method of the array
a = b.copy()
c = hybrid_array(a,args)
Note:
You may be wondering why, despite the easier copy method of numpy arrays, I introduced copy.deepcopy. Others may disagree, but here is my reasoning:
Using deepcopy makes it clear that you intend a deep copy rather than a reference copy.
Not all Python data types support a copy method. Numpy has one, and that is good, but when programming with numpy and Python you may end up using various numpy and non-numpy data types, not all of which support a copy method. To remain consistent, I would prefer the first approach.
Copying a NumPy array a is done with a.copy(). In your application, however, there is no need to copy the old data. All you need is a new array of the same shape and dtype as the old one. You can use
result = numpy.empty_like(xs)
to create such an array. If you generally don't want your function to modify its parameter, you should do this inside the function, rather than requiring the caller to take care of this.
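Applying that advice to the question's function might look like the following sketch (parameter names taken from the question; the formulas are the asker's own, written to fill a fresh array instead of mutating xs):

```python
import numpy as np

def hybrid_array(xs, height, center, fwhh):
    # Gaussian left of center, Lorentzian right of center,
    # written into a new array so the input xs is never modified.
    sigma = fwhh / (2 * np.sqrt(2 * np.log(2)))
    result = np.empty_like(xs)          # new array, same shape and dtype as xs
    left = xs <= center
    right = ~left
    result[left] = height * np.exp(-(xs[left] - center) ** 2 / (2 * sigma ** 2))
    result[right] = height / np.abs(1 + ((center - xs[right]) / (fwhh / 2)) ** 2)
    return result

xs = np.linspace(0.0, 10.0, 11)
ys = hybrid_array(xs, 1.0, 5.0, 2.0)
print(np.array_equal(xs, np.linspace(0.0, 10.0, 11)))  # True -- input untouched
```

Because the function now returns a new array, callers no longer need the b.copy() workaround at all.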