function taking ownership of Numpy array - python

I have a function takes_ownership() that performs an operation on a Numpy array a in place and effectively relies on becoming the owner of the data. Is there a way to alter the passed array such that it no longer points to its original buffer?
The motivation is code that processes huge chunks of data in place (due to performance constraints), and a common error is unaware users recycling their input arrays.
Note that the question is very specifically not "why is a different if I change b = a in my function". The question is "how can I make using the given array in place safer for unsuspecting users" when I must not make a copy.
def takes_ownership(a):
    b = a
    # looking for this step
    a.set_internal_buffer([])
    # if this were C++ std::vectors, I would be looking for
    # b = np.array([])
    # a.swap(b)
    # an expensive operation that invalidates a
    b.resize((6, 6), refcheck=False)
    # outside references to a no longer valid
    return b

a = np.random.randn(5, 5)
b = takes_ownership(a)
# array no longer has data so that users cannot mess up
assert a.shape == ()

NumPy has a copy function that will clone an array (although if this is an object array, i.e. not primitive, there might still be nested object references after cloning). That being said, this is a questionable design pattern and it would probably be better practice to rewrite your code in a way that does not rely on this condition.
Edit: If you can't copy the array, you will probably need to make sure that none of your other code modifies it in place (instead running immutable operations to produce new arrays). Doing things this way may cause more issues down the road, so I'd recommend refactoring so that it is not necessary.
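A minimal sketch of the copy-based approach the answer describes (the function name and the doubling step are mine, purely for illustration):

```python
import numpy as np

def takes_ownership_safely(a):
    # Work on a private copy so callers keep their original data intact.
    b = np.copy(a)
    b *= 2  # some in-place operation, applied only to the private copy
    return b

a = np.ones((3, 3))
b = takes_ownership_safely(a)
print(a[0, 0])  # 1.0 -- the caller's array is untouched
print(b[0, 0])  # 2.0
```

Of course, as noted above, this trades the memory cost of a full copy for the safety guarantee, which the original question explicitly cannot afford.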

You can use NumPy's copy, which will do exactly what you want.
Use the code b = np.copy(a) and no further changes are needed.
If you want to make a copy of an object, in general, so that you can call methods on that object then you can use the copy module.
From the linked page:
A shallow copy constructs a new compound object and then (to the
extent possible) inserts references into it to the objects found in
the original.
A deep copy constructs a new compound object and then, recursively,
inserts copies into it of the objects found in the original.
In this case, in your code, import copy and then use b = copy.copy(a) to get a shallow copy (which I think should be good enough for numpy arrays, but you'll want to check that yourself).
The hanging question here is why this is needed. Python passes object references when calling a function. The assignment operator = does not call any constructor for the object on the left-hand side; rather, it binds the left-hand name to the object referenced on the right-hand side. So when you call a method on a reference (using the dot operator .), a and b are going to operate on the same object in memory unless you make a new object with an explicit copy command.
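The binding behavior described above can be checked directly; a small sketch:

```python
import copy
import numpy as np

a = np.zeros(3)
b = a             # binds a second name to the same object
c = copy.copy(b)  # shallow copy: a new array object

b[0] = 7.0
print(a[0])  # 7.0 -- a and b are the same object
print(c[0])  # 0.0 -- the copy is unaffected
print(b is a, c is a)  # True False
```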


Array changes after assignment

I'm quite puzzled by this simple piece of python code:
import random
import numpy as np
import matplotlib.pyplot as plt

data = np.arange(2500, 8000, 100)
logdata = np.zeros((len(data))) + np.nan
randata = logdata
for i in range(len(data)):
    logdata[i] = np.log(data[i])
    randata[i] = np.log(random.randint(2500, 8000))
plt.plot(logdata, randata, 'bo')
OK, I don't need a for cycle in this specific instance (I'm just making a simple example), but what really confuses me is the role played by the initialisation of randata. I would expect that, in virtue of the for cycle, randata would become a totally different array from logdata, but the two arrays turn out to be the same. I see from older discussions that the only way to prevent this from happening is to initialize randata on its own, randata = np.zeros((len(data)))+np.nan, or to make a copy, randata = logdata.copy(). But I don't understand why randata is so deeply linked to logdata in virtue of the for cycle.
If I were to give the following commands
logdata = np.zeros((len(data)))+np.nan
randata = logdata
logdata = np.array([1,2,3])
print(randata)
then randata would still be an array of nan, differently from logdata. Why is so?
Blckknght explains numpy assignment behavior in this post: Numpy array assignment with copy
B = A
This binds a new name B to the existing object already named A.
Afterwards they refer to the same object, so if you modify one in
place, you'll see the change through the other one too.
But to answer why they're "deeply linked" (or rather, point to the same location in memory), it's mostly because copying large arrays is computationally expensive. So in numpy, the assignment = operator references the same block of memory instead of creating a copy at every assignment. If different arrays are desired, we can allocate new memory explicitly using the copy() method. This gives us the efficiency of C/C++ (where avoiding copies is very common by passing around pointers and references) along with the ease-of-use of python (where pointers and references are not available).
I'd say this is a feature, not a bug.
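A short sketch of the difference, using the copy() method mentioned above (variable names are mine):

```python
import numpy as np

logdata = np.zeros(5)
alias = logdata                # same block of memory, second name
independent = logdata.copy()   # newly allocated memory

logdata[0] = 1.5
print(alias[0])        # 1.5 -- the alias sees the in-place change
print(independent[0])  # 0.0 -- the copy has its own buffer

# Rebinding a name, by contrast, never touches the old object:
logdata = np.array([9.0])
print(alias[0])  # still 1.5
```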

Really simple question about reference of list in python

I have a really simple question about references in python.
I assume you are familiar with this:
aa = [1,2]
bb = aa
aa[0] = 100
print(bb)
As you might guess, the output will be
[100, 2]
and it's totally OK ✔
Let's do another example:
l = [[],[],[]]
a = l[0]
l[0] = [1,2]
print(a)
But here the output is:
[]
I know why that happened.
It's because, on line 3, we made an entirely different list and replaced l[0] with it (not changing the original list, but replacing it).
Now my question is "Can I somehow replace l[0] with [1,2] and also keep a as a reference?
P.S not like l[0].append(1,2)
Short answer: Not really, but there might be something close enough.
The problem is, under the covers, a is a pointer-to-a-list, and l is a pointer-to-a-list-of-pointers-to-lists. When you do a = l[0], what that actually translates to at the CPU is "dereference the pointer l, treat the resulting region of memory as a list object, get the first object (which will be the address of another list), and set the value of pointer a to that address". Once you've done that, a and l[0] are only related by coincidence; they are two separate pointers that happen, for the moment, to point at the same object. If you assign to either variable, you're changing the value of a pointer, not the contents of the pointed-to object.
Broadly speaking, there are a few ways the computer could practically do what you ask.
1. Modify the pointed-to object (list) without modifying either pointer. That's what the append function does, along with the many other mutators of python lists. If you want to do this in a way that perhaps more clearly expresses your intent, you could do l[0][:] = [1,2]. That's a list copy operation, copying into the object pointed to by both l[0] and a. This is your best bet as a developer, though note that copy operations are O(n).
2. Implement a as a pointer-to-a-pointer-to-a-list that is automatically dereferenced (to merely a pointer-to-list) when accessed. This is not, AFAIK, something Python provides any support for; almost no language does. In C you could say list ** a = &(l[0]); but then any time you want to actually do anything with a you'd have to use *a instead.
3. Tell the interpreter to observe that a is an alias to l[0], rather than its own, separate variable. As far as I know, Python doesn't support this either. In C, you could do it as #define a (l[0]) though you'd want to #undef a when it went out of scope.
4. Rather than making a a list variable (which is implemented as a pointer-to-list), make it a function: a = lambda: l[0]. This means you have to use a() instead of a anywhere you want to get the actual content of l[0], and you can't assign to l[0] through a() (or through a directly). But it does work, in Python. You could even go so far as to use properties, which would let you skip the parentheses and assign through a, but at the cost of writing a bunch more code to wrap the lists (I'm not aware of a way to attach properties to lists directly, though one might exist, so you'd instead have to create a new object wrapping the list).
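A minimal sketch of the first option, slice assignment, which keeps a in sync because only the contents of the shared list object change:

```python
l = [[], [], []]
a = l[0]

# Replace the *contents* of the list object, not the pointer to it.
l[0][:] = [1, 2]

print(a)          # [1, 2] -- a still points at the same (now filled) list
print(a is l[0])  # True
```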
If you had made a = l, then calling a[0] would yield [1,2]. As it stands, however, the assignment l[0] = [1,2] discards the list that "a" was referencing, so that reference is effectively destroyed as well. You need to either make a = l and call a[0], or reset the reference by calling a = l[0] again.
The original list at l[0] and the [1,2] list are two different objects. There is no way to do that short of re-assigning the a variable to l[0] again once the change has been made.

Why copy.deepcopy doesn't modify the id of an object?

I don't understand why copy.deepcopy does not modify the id of an object:
import copy
a = 'hello world'
print a is copy.deepcopy(a) # => True ???
Simeon's answer is perfectly correct, but I wanted to provide a more general perspective.
The copy module is primarily intended for use with mutable objects. The idea is to make a copy of an object so you can modify it without affecting the original. Since there's no point in making copies of immutable objects, the module declines to do so. Strings are immutable in Python, so this optimization can never affect real-world code.
Python interns these strings, so they're the same object (and thus compare the same with is); behind the scenes only one copy of the string is stored. More to the point, deepcopy simply returns immutable objects such as strings unchanged.
The result of copy.deepcopy(a) is not a new object, and as such it isn't meaningful to perform this call on a string.
Look again:
import copy
a = ['hello world']
print a is copy.deepcopy(a) # => False
Since the value of an immutable object (such as a string) is incapable of changing without also changing its identity, there would be no point in creating additional instances. It's only in the case of a mutable object (such as a list) that there's any point in creating a second identity with the same value.
For a thorough introduction to separating the concepts of value, identity and state, I suggest Rich Hickey's talk on the subject.

How to make a variable (truly) local to a procedure or function

i.e. we have the global declaration, but no local.
"Normally" arguments are local, I think, or they certainly behave that way.
However if an argument is, say, a list and a method is applied which modifies the list, some surprising (to me) results can ensue.
I have 2 questions: what is the proper way to ensure that a variable is truly local?
I wound up using the following, which works, but it can hardly be the proper way of doing it:
def AexclB(a, b):
    z = a + []  # yuk
    for k in range(0, len(b)):
        try: z.remove(b[k])
        except: continue
    return z
Absent the +[], "a" in the calling scope gets modified, which is not desired.
(The issue here is using a list method that modifies the list in place.)
The supplementary question is, why is there no "local" declaration?
Finally, in trying to pin this down, I made various mickey mouse functions which all behaved as expected except the last one:
def fun4(a):
    z = a
    z = z.append(["!!"])
    return z

a = ["hello"]
print "a=", a
print "fun4(a)=", fun4(a)
print "a=", a
which produced the following on the console:
a= ['hello']
fun4(a)= None
a= ['hello', ['!!']]
...
>>>
The 'None' result was not expected (by me).
Python 2.7 btw in case that matters.
PS: I've tried searching here and elsewhere but not succeeded in finding anything corresponding exactly - there's lots about making variables global, sadly.
It's not that z isn't a local variable in your function. Rather when you have the line z = a, you are making z refer to the same list in memory that a already points to. If you want z to be a copy of a, then you should write z = a[:] or z = list(a).
See this link for some illustrations and a bit more explanation http://henry.precheur.org/python/copy_list
Python will not copy objects unless you explicitly ask it to. Integers and strings are not modifiable, so every operation on them returns a new instance of the type. Lists, dictionaries, and many other objects in Python are mutable, so operations like list.append happen in-place (and therefore return None).
If you want the variable to be a copy, you must explicitly copy it. In the case of lists, you slice them:
z = a[:]
There is a great answer that will cover most of your question here, which explains mutable and immutable types, how they are kept in memory, and how they are referenced. The first section of the answer is for you (before the "How do we get around this?" header).
In the following line
z = z.append(["!!"])
Lists are mutable objects, so when you call append, it updates the referenced object; it does not create a new one and return it. If a method or function does not return anything, it means it returns None.
The above link also gives an immutable example so you can see the real difference.
You cannot make a mutable object act like it is immutable. But instead of passing the reference around, you can create a new object from the existing mutable one:
a = [1,2,3]
b = a[:]
For more options you can check here
What you're missing is that all variable assignment in python is by reference (or by pointer, if you like). Passing arguments to a function literally assigns values from the caller to the arguments of the function, by reference. If you dig into the reference, and change something inside it, the caller will see that change.
If you want to ensure that callers will not have their values changed, you can either try to use immutable values more often (tuple, frozenset, str, int, bool, NoneType), or be certain to take copies of your data before mutating it in place.
In summary, scoping isn't involved in your problem here. Mutability is.
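To make the mutation-versus-rebinding point concrete, a small sketch (both helper names are mine):

```python
def mutates(lst):
    lst.append("!!")    # changes the caller's object in place

def rebinds(lst):
    lst = lst + ["!!"]  # builds a new list; the caller never sees it

a = ["hello"]
mutates(a)
print(a)  # ['hello', '!!']

b = ["hello"]
rebinds(b)
print(b)  # ['hello']
```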
Is that clear now?
Still not sure what's the 'correct' way to force the copy, there are various suggestions here.
It differs by data type, but generally <type>(obj) will do the trick. For example list([1, 2]) and dict({1:2}) both return (shallow!) copies of their argument.
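Worth noting that these constructor copies are shallow; a quick sketch of what that means in practice:

```python
inner = [1, 2]
outer = [inner, 3]

shallow = list(outer)  # new outer list, same inner objects
shallow[1] = 99
print(outer[1])        # 3 -- the top-level change doesn't propagate

shallow[0].append(7)
print(inner)           # [1, 2, 7] -- nested objects are still shared
```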
If, however, you have a tree of mutable objects and you don't know a priori which level of the tree you might modify, you need the copy module. That said, I've only needed this a handful of times (in 8 years of full-time python), and most of those ended up causing bugs. If you need this, it's a code smell, in my opinion.
The complexity of maintaining copies of mutable objects is the reason why there is a growing trend of using immutable objects by default. In the Clojure language, all data types are immutable by default, and mutability is treated as a special case to be minimized.
If you need to work on a list or other object in a truly local context you need to explicitly make a copy or a deep copy of it.
from copy import copy

def fn(x):
    y = copy(x)

apply two functions to the two halves of a numpy array

I am trying to find out how to apply two functions to a numpy array, each one on only half the values.
Here is the code I have been trying
def hybrid_array(xs, height, center, fwhh):
    xs[xs<=center] = height*np.exp((-(xs[xs<=center]-center)**2)/(2*(fwhh/(2*np.sqrt(2*np.log(2))))**2))
    xs[xs>center] = height*1/(np.abs(1+((center-xs[xs>center])/(fwhh/2))**2))
    return xs
However, I am overwriting the initial array that is passed to the argument. The usual trick of copying it with a slice, i.e. the following, still changes b.
a = b[:]
c = hybrid_array(a,args)
If there is a better way of doing any part of what I am trying, I would be very grateful if you could let me know as I am still new to numpy arrays.
Thank you
Try copy.deepcopy to copy the array b onto a before calling your function.
import copy
a = copy.deepcopy(b)
c = hybrid_array(a,args)
Alternatively, you can use the copy method of the array
a = b.copy()
c = hybrid_array(a,args)
Note:
You may be wondering why, despite the easier copy method of NumPy arrays, I introduced copy.deepcopy. Others may disagree, but here is my reasoning:
1. Using deepcopy makes it clear that you intend a deep copy rather than a reference copy.
2. Not all of Python's data types support the copy method. NumPy has it, and good that it does, but when you are programming with NumPy and Python you may end up using various NumPy and non-NumPy data types, not all of which support the copy method. To remain consistent, I would prefer the first.
Copying a NumPy array a is done with a.copy(). In your application, however, there is no need to copy the old data. All you need is a new array of the same shape and dtype as the old one. You can use
result = numpy.empty_like(xs)
to create such an array. If you generally don't want your function to modify its parameter, you should do this inside the function, rather than requiring the caller to take care of this.
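A sketch of how the question's function might be rewritten along these lines, assuming numeric input (the mask names and the sigma helper variable are mine):

```python
import numpy as np

def hybrid_array(xs, height, center, fwhh):
    result = np.empty_like(xs)  # fresh buffer; xs is left untouched
    sigma = fwhh / (2 * np.sqrt(2 * np.log(2)))
    lo = xs <= center           # boolean masks for the two halves
    hi = ~lo
    result[lo] = height * np.exp(-(xs[lo] - center) ** 2 / (2 * sigma ** 2))
    result[hi] = height / np.abs(1 + ((center - xs[hi]) / (fwhh / 2)) ** 2)
    return result

xs = np.linspace(-5.0, 5.0, 11)
ys = hybrid_array(xs, height=1.0, center=0.0, fwhh=2.0)
print(xs[0])  # -5.0 -- the input array is unchanged
```

Because the function writes only into its own freshly allocated buffer, the caller no longer needs to remember to copy first.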
