python object changes the value of an input variable - python

So I don't know if this is a well-formed question, and I'm sorry if it isn't, but I'm pretty stumped. Furthermore, I don't know how to submit a minimal working example because I can't reproduce the behavior without the whole code, which is a little big for stackexchange.
So here's the problem: I have an object which takes as one of its arguments a numpy array. (If it helps, this array represents the initial conditions for a differential equation which a method in my object solves numerically.) After using this array to solve the differential equation, it outputs the answer just fine, BUT the original variable in which I had stored the array has now changed value. Here is what happens:
import numpy as np
import mycode as mc
input_arr = np.ndarray(some_shape)
foo = mc.MyClass(input_arr)
foo.numerical_solve()
some_output
Fine and dandy. But then, when I check on input_arr, its value has changed. Sometimes it's the same as some_output (which is to say, the final value of the numerical solution), but sometimes it's some intermediate step.
As I said, I'm totally stumped and any advice would be much appreciated!

If you have a mutable object (list, set, numpy.array, ...) and you do not want it mutated, then you need to make a copy and pass that instead:
l1 = [1, 2, 3]
l2 = l1[:]
s1 = set([1, 2, 3])
s2 = s1.copy()
arr1 = np.ndarray(some_shape)
arr2 = np.copy(arr1)
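To make the aliasing concrete, here is a minimal sketch. The Solver class is a hypothetical stand-in for the asker's MyClass (the real class is not shown), and it assumes the solver updates its state array in place, which would explain the behavior described in the question.

```python
import numpy as np

class Solver:
    """Hypothetical stand-in for MyClass: keeps a reference to the array it is given."""

    def __init__(self, initial_state):
        # No copy is made: self.state and the caller's array are the same object.
        self.state = initial_state

    def numerical_solve(self, steps=3):
        for _ in range(steps):
            self.state += 1.0  # in-place update, also visible to the caller
        return self.state

# Passing the array directly: the caller's variable is modified.
input_arr = np.zeros(3)
Solver(input_arr).numerical_solve()
print(input_arr)  # [3. 3. 3.]

# Passing a copy: the caller's variable is left alone.
safe_arr = np.zeros(3)
Solver(safe_arr.copy()).numerical_solve()
print(safe_arr)  # [0. 0. 0.]
```

If you control the class, copying inside __init__ (for example self.state = np.array(initial_state, copy=True)) is another option, so callers do not have to remember to do it.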

Related

Python function that acts on provided array

Some NumPy functions (e.g. argmax or cumsum) can take an array as an optional out parameter and store the result in that array. Please excuse my less than perfect grasp of the terminology here (which is what prevents me from googling for an answer), but it seems that these functions somehow act on variables that are beyond their scope.
How would I transform this simple function so that it can take an out parameter as the functions mentioned?
import numpy as np
def add_two(a):
    return a + 2
a = np.arange(5)
a = add_two(a)
From my understanding, a rewritten version of add_two() would allow for the last line above to be replaced with
add_two(a, out=a)
In my opinion, the best and most explicit approach is to do as you're currently doing. Python passes object references by value, so rebinding a parameter name inside a function does not affect the caller; a function can only change the caller's data by modifying a mutable object in place.
One way would be to do:
import numpy as np
def add_two(a, out):
    out[:] = a+2
a = np.arange(5)
add_two(a, out=a)
a
Output:
array([2, 3, 4, 5, 6])
NB. Unlike your current solution, this requires that the object passed as parameter out exists and is an array
The naive solution would be to fill in the buffer of the output array with the result of your computation:
def add_two(a, out=None):
    result = a + 2
    if out is None:
        out = result
    else:
        out[:] = result
    return out
The problem (if you could call it that), is that you are still generating the intermediate array, and effectively bypassing the benefits of pre-allocating the result in the first place. A more nuanced approach would be to use the out parameters of the functions in your numpy pipeline:
def add_two(a, out=None):
    return np.add(a, 2, out=out)
Unfortunately, as with general vectorization, this can only be done on a case-by-case basis depending on what the desired set of operations is.
As an aside, this has nothing to do with scope. Python objects are specifically available to all namespaces (though their names might not be). If a mutable argument is modified in a function, the changes will always be visible outside the function. See for example "Least Astonishment" and the Mutable Default Argument.
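The difference between rebinding a name and mutating an object can be seen in a small sketch. The function names rebind and mutate are made up for illustration.

```python
import numpy as np

def rebind(arr):
    # a + 2 builds a new array; only the local name arr is rebound to it.
    arr = arr + 2

def mutate(arr):
    # += on an ndarray writes into the existing buffer.
    arr += 2

a = np.arange(5)

rebind(a)
after_rebind = a.copy()
print(after_rebind)  # [0 1 2 3 4]

mutate(a)
print(a)  # [2 3 4 5 6]
```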

Why can the datatype of an array not be changed inside a loop?

I've got a structure of a list of arrays with complex entries and I want to convert them into floats. The imaginary part can be discarded, that's fine.
import numpy as np
a=np.array([2+3j,3+4j])
b=np.array([1+2j,4+3j])
arrays=[a,b]
for i,y in enumerate(arrays):
    y=y.astype('float64')
print(arrays)
I am wondering, why this doesn't work, while on the other hand, changing the type to float before creating the list 'arrays' does work.
import numpy as np
a=np.array([2+3j,3+4j])
b=np.array([1+2j,4+3j])
a=a.astype('float64')
b=b.astype('float64')
arrays=[a,b]
print(arrays)
This is a very basic question, but I would be very happy, if someone could share his or her thoughts about this.
Thanks in advance :)
I am wondering, why this doesn't work ...
y.astype('float64') creates a new object. The assignment in the loop binds that new object to the name y. After the assignment, y no longer refers to the list item, so the list item is not changed.
If you were performing an operation in the loop that modified the item in-place it would work like you expected. ...
for i,y in enumerate(arrays):
    np.add(y,4,out=y)
Whereas for i,y in enumerate(arrays): y = y + 4 will behave like the example in your question.
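Since the dtype of an existing array cannot be changed in place this way, one way to get the intended result is to store the new array back into the list slot. This is a minimal sketch; taking .real first is my addition, to discard the imaginary part explicitly rather than rely on the cast.

```python
import numpy as np

a = np.array([2+3j, 3+4j])
b = np.array([1+2j, 4+3j])
arrays = [a, b]

for i, y in enumerate(arrays):
    # Assign to the list slot, not to the loop variable.
    arrays[i] = y.real.astype('float64')

print(arrays)

# The same thing as a list comprehension, building a new list.
converted = [y.real.astype('float64') for y in (a, b)]
```

Note that the names a and b still refer to the original complex arrays; only the list entries are replaced.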

Numpy fill_diagonal return None

I want to generate symmetric matrices with a zero diagonal. The symmetric part works, but when I use fill_diagonal from numpy, the result I get is "None". My code is below. Thank you for reading.
import numpy as np
matrix_size = int(input("Size of the matrix \n"))
random_matrix = np.random.randint(-4, 5, size=(matrix_size, matrix_size))  # random_integers is deprecated; randint excludes the upper bound
symmetric_matrix = (random_matrix + random_matrix.T)/2
print(symmetric_matrix)
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
print(zero_diogonal_matrix)
np.fill_diagonal(), like many other functions and methods across Python and NumPy, works in-place: it directly alters the object in memory and does not create a new object. The return value of such functions is None (see, for example, Why does “return list.sort()” return None, not the list?). Therefore, change:
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
To just:
np.fill_diagonal(symmetric_matrix,0)
You will then see the change reflected in symmetric_matrix.
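A quick check of both points, using a small fixed matrix instead of the random one from the question:

```python
import numpy as np

m = np.array([[1.0, 2.0],
              [2.0, 3.0]])

result = np.fill_diagonal(m, 0)  # modifies m in place

print(result)  # None
print(m)       # the diagonal of m is now zero
```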
It's probably overkill, but in case you want to preserve the tenet of minimising surprise, you could wrap this (and other functions like it) in a function that takes care of preserving the original array:
def fill_diagonal(source_array, diagonal):
    copy = source_array.copy()
    np.fill_diagonal(copy, diagonal)
    return copy
But the question then becomes "who exactly is going to be least surprised by doing it this way?"

Numpy array.resize() - zeros 'first'

I can use array.resize(shape) to resize my array and have zeros added to those indices without any value. If my array is [1,2,3,4] and I use array.resize(5) I get [1,2,3,4,0]. How can I append / pad the zeros to the front, yielding [0,1,2,3,4]?
I am doing this dynamically - trying to use:
array.resize(arrayb.shape)
I would like to avoid (at all costs) making an in-memory copy of the array. That is to reverse the array, resize, and reverse again. Working with a view would be ideal.
You could try working on an array with negative strides (though you can never be sure that resize will not have to make a copy):
_a = np.empty(0) # original array
a = _a[::-1] # the array you work with...
# now instead of a, resize the original _a:
del a # You need to delete it first. Or resize will want refcheck=False, but that
# will be dangerous!
_a.resize(5)
# And update a to the new array:
a = _a[::-1]
But I would really suggest you make the array large enough from the start if at all possible. This does not seem very beautiful, but I think it is the only way short of copying data around. Your array will also have a negative stride, so it won't be contiguous; if some function you use on it must therefore make a copy, you are out of luck.
Also if you slice your a or _a you have to either make a copy, or make sure you delete them before resizing. While you can give refcheck=False this seems to invalidate the data.
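The suggestion to make the array large enough up front could look roughly like the sketch below. It assumes an upper bound on the final length is known (the capacity value here is made up) and keeps the data right-aligned in a zero-filled buffer, so that "padding at the front" is just a longer view of the same memory.

```python
import numpy as np

capacity = 8  # assumed upper bound on the final length
buffer = np.zeros(capacity, dtype=np.int64)

data = np.array([1, 2, 3, 4])
start = capacity - data.size
buffer[start:] = data   # one-time copy into the right-aligned slot
a = buffer[start:]      # view of the current contents

# "Resize" to length 5: take a view that starts one slot earlier.
a = buffer[capacity - 5:]
print(a)                            # [0 1 2 3 4]
print(np.shares_memory(a, buffer))  # True
```

The resulting view is contiguous with a positive stride, at the cost of allocating the full capacity in advance.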
I believe you can use slice assignment to do this. I see no reason why numpy would need to make a copy for an operation like this, as long as it does the necessary checks for overlaps (though of course as others have noted, resize may itself have to allocate a new block of memory). I tested this method with a very large array, and I saw no jump in memory usage.
>>> a = numpy.arange(10)
>>> a.resize(15)
>>> a[5:] = a[:10]
>>> a[0:5] = 0
>>> a
array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
The following showed no jump in memory usage for the assignment operation:
>>> a = numpy.arange(100000000)
>>> a.resize(150000000)
>>> a[50000000:] = a[:100000000]
I don't know of a better way, and this is just a conjecture. Let me know if it doesn't work.

How can I tell if NumPy creates a view or a copy?

For a minimal working example, let's digitize a 2D array. numpy.digitize requires a 1D array:
import numpy as np
N = 200
A = np.random.random((N, N))
X = np.linspace(0, 1, 20)
print(np.digitize(A.ravel(), X).reshape((N, N)))
Now the documentation says:
... A copy is made only if needed.
How do I know if the ravel copy it is "needed" in this case? In general - is there a way I can determine if a particular operation creates a copy or a view?
This question is very similar to a question that I asked a while back:
You can check the base attribute.
a = np.arange(50)
b = a.reshape((5, 10))
print (b.base is a)
However, that's not perfect. You can also check to see if they share memory using np.may_share_memory.
print (np.may_share_memory(a, b))
There's also the flags attribute that you can check:
print (b.flags['OWNDATA']) #False -- apparently this is a view
e = np.ravel(b[:, 2])
print (e.flags['OWNDATA']) #True -- Apparently this is a new numpy object.
But this last one seems a little fishy to me, although I can't quite put my finger on why...
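For the ravel case from the question specifically, np.shares_memory gives a direct check. A small sketch (the 4x4 shape is arbitrary):

```python
import numpy as np

A = np.random.random((4, 4))

flat = A.ravel()      # A is C-contiguous, so ravel can return a view
flat_t = A.T.ravel()  # A.T is not C-contiguous, so ravel has to copy

print(np.shares_memory(A, flat))    # True
print(np.shares_memory(A, flat_t))  # False
```

Unlike np.may_share_memory, which only compares memory bounds and can report false positives, np.shares_memory performs an exact check.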
In the documentation for reshape there is some information about how to ensure an exception if a view cannot be made:
It is not always possible to change the shape of an array without copying the data. If you want an error to be raised if the data is copied, you should assign the new shape to the shape attribute of the array:
>>> a = np.zeros((10, 2))
# A transpose makes the array non-contiguous
>>> b = a.T
# Taking a view makes it possible to modify the shape without modifying the
# initial object.
>>> c = b.view()
>>> c.shape = (20)
AttributeError: incompatible shape for a non-contiguous array
This is not exactly an answer to your question, but in certain cases it may be just as useful.
