Some NumPy functions (e.g. argmax or cumsum) can take an array as an optional out parameter and store the result in that array. Please excuse my less than perfect grasp of the terminology here (which is what prevents me from googling for an answer), but it seems that these functions somehow act on variables that are beyond their scope.
How would I transform this simple function so that it can take an out parameter as the functions mentioned?
import numpy as np
def add_two(a):
    return a + 2
a = np.arange(5)
a = add_two(a)
From my understanding, a rewritten version of add_two() would allow for the last line above to be replaced with
add_two(a, out=a)
In my opinion, the best and most explicit approach is what you're already doing. Python passes object references by value, so the only way a function can make a change visible to the caller is by mutating a mutable object (such as a NumPy array) in place.
One way would be to do:
import numpy as np
def add_two(a, out):
    out[:] = a + 2
a = np.arange(5)
add_two(a, out=a)
a
Output:
array([2, 3, 4, 5, 6])
NB: unlike your current solution, this requires that the object passed as the out parameter already exists and is an array.
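If you want to keep the input untouched, you can preallocate a separate output buffer and write into that instead:
a = np.arange(5)
out = np.empty_like(a)   # preallocated result buffer
add_two(a, out=out)
out                      # array([2, 3, 4, 5, 6]); a is unchanged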
The naive solution would be to fill in the buffer of the output array with the result of your computation:
def add_two(a, out=None):
    result = a + 2
    if out is None:
        out = result
    else:
        out[:] = result
    return out
The problem (if you can call it that) is that you are still generating the intermediate array, effectively bypassing the benefit of pre-allocating the result in the first place. A more nuanced approach is to use the out parameters of the functions in your numpy pipeline:
def add_two(a, out=None):
    return np.add(a, 2, out=out)
Unfortunately, as with general vectorization, this can only be done on a case-by-case basis depending on what the desired set of operations is.
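For instance (the function name and the particular operations here are made up), a multi-step pipeline can thread a single preallocated buffer through each numpy call:
def add_two_then_square(a, out=None):
    if out is None:
        out = np.empty_like(a)
    np.add(a, 2, out=out)            # first step writes into the buffer
    np.multiply(out, out, out=out)   # second step reuses the same buffer
    return out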
As an aside, this has nothing to do with scope. Python objects can be reached from any namespace (even if their names cannot), and if a mutable argument is modified inside a function, the change is visible outside the function as well. See for example "Least Astonishment" and the Mutable Default Argument.
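A minimal illustration of that point:
def mutate_first(x):
    x[0] = 99          # in-place modification of the caller's array
b = np.arange(3)
mutate_first(b)
b                      # array([99,  1,  2])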
I want to implement a function that can compute basic math operations on large arrays (that won't fit in RAM whole). Therefore I wanted to create a function that will process a given operation block by block over a selected axis. The main idea of the function is like this:
def process_operation(inputs, output, operation, axis=0):
    shape = inputs[0].shape
    for index in range(shape[axis]):
        # hard-coded slicing: this only works along axis 0
        output[index, :] = inputs[0][index, :] + inputs[1][index, :]
but I want to be able to change the axis along which the blocks are sliced/indexed.
Is it possible to do the indexing in some dynamic way, without using the ':' syntactic sugar?
I found some help here, but so far it hasn't been much help:
thanks
I think you could achieve what you want using python's builtin slice type.
Under the hood, :-expressions used inside square brackets are transformed into instances of slice, but you can also use a slice to begin with. To iterate over different axes of your input you can use a tuple of slices of the correct length.
This might look something like:
def process_operation(inputs, output, axis=0):
    shape = inputs[0].shape
    for index in range(shape[axis]):
        my_slice = (slice(None),) * axis + (index,)
        output[my_slice] = inputs[0][my_slice] + inputs[1][my_slice]
I believe this should work with h5py datasets or memory-mapped arrays without any modifications.
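For example, with ordinary in-memory arrays (a memory-mapped array or h5py dataset would be used the same way):
x = np.arange(12).reshape(3, 4)
y = np.ones((3, 4))
out = np.empty((3, 4))
process_operation([x, y], out, axis=1)   # out now equals x + y, built one column at a time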
Background on slice and __getitem__
slice works in conjunction with the __getitem__ method to evaluate the x[key] syntax. x[key] is evaluated in two steps:
1) If key contains any expressions such as :, i:j or i:j:k, then these are de-sugared into slice instances.
2) key is passed to the __getitem__ method of the object x. This method is responsible for returning the correct value of x[key].
For example, the expressions:
x[2]
y[:, ::2]
are equivalent to:
x.__getitem__(2)
y.__getitem__((slice(None), slice(None, None, 2)))
You can explore how values are converted to slices using a class like the following:
class Sliceable:
    def __getitem__(self, key):
        print(key)
x = Sliceable()
x[::2] # prints "slice(None, None, 2)"
I want to generate symmetric zero-diagonal matrices. My symmetric part works, but when I use fill_diagonal from numpy, the result I get is "None". My code is below. Thank you for reading.
import numpy as np
matrix_size = int(input("Size of the matrix \n"))
random_matrix = np.random.randint(-4, 5, size=(matrix_size, matrix_size))  # upper bound is exclusive
symmetric_matrix = (random_matrix + random_matrix.T)/2
print(symmetric_matrix)
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
print(zero_diogonal_matrix)
np.fill_diagonal(), like many other functions across Python/NumPy, works in place (see, for example, Why does "return list.sort()" return None, not the list?). That is, it directly alters the object in memory and does not create a new object. The return value of such functions is None. Therefore, change:
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
To just:
np.fill_diagonal(symmetric_matrix,0)
You will then see the change reflected in symmetric_matrix.
It's probably overkill, but in case you want to preserve the tenet of minimising surprise, you could wrap this (and other functions like it) in a function that takes care of preserving the original array:
def fill_diagonal(source_array, diagonal):
    copy = source_array.copy()
    np.fill_diagonal(copy, diagonal)
    return copy
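Usage would then look like:
m = np.arange(9).reshape(3, 3)
m_zeroed = fill_diagonal(m, 0)   # m_zeroed has a zero diagonal, m is untouched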
But the question then becomes "who exactly is going to be least surprised by doing it this way?"
So I don't know if this is a well-formed question, and I'm sorry if it isn't, but I'm pretty stumped. Furthermore, I don't know how to submit a minimal working example because I can't reproduce the behavior without the whole code, which is a little big for stackexchange.
So here's the problem: I have an object which takes as one of its arguments a numpy array. (If it helps, this array represents the initial conditions for a differential equation which a method in my object solves numerically.) After using this array to solve the differential equation, it outputs the answer just fine, BUT the original variable in which I had stored the array has now changed value. Here is what happens:
import numpy as np
import mycode as mc
input_arr = np.ndarray(some_shape)
foo = mc.MyClass(input_arr)
foo.numerical_solve()
some_output
Fine and dandy. But then, when I check on input_arr, it's changed value. Sometimes it's the same as some_output (which is to say, the final value of the numerical solution), but sometimes it's some interstitial step.
As I said, I'm totally stumped and any advice would be much appreciated!
If you have a mutable object (list, set, numpy.array, ...) and you do not want it mutated, then you need to make a copy and pass that instead:
l1 = [1, 2, 3]
l2 = l1[:]
s1 = set([1, 2, 3])
s2 = s1.copy()
arr1 = np.ndarray(some_shape)
arr2 = np.copy(arr1)
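Alternatively, if you control MyClass, you can make the defensive copy inside the class itself. A minimal sketch (the real class isn't shown in the question, so this is only illustrative):
class MyClass:
    def __init__(self, initial_conditions):
        # copy so that later in-place updates never touch the caller's array
        self.state = np.array(initial_conditions, copy=True)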
I wonder if anyone has an elegant solution for being able to pass a Python list, a numpy vector of shape (n,), or a numpy vector of shape (n, 1) to a function. The idea would be to generalize a function so that any of the three would be valid without adding complexity.
Initial thoughts:
1) Use a type checking decorator function and cast to a standard representation.
2) Add type checking logic inline (significantly less ideal than #1).
3) ?
I do not generally use python builtin array types, but suspect a solution to this question would also support those.
I think the simplest thing to do is to start off your function with numpy.atleast_2d. A list or a shape (n,) vector then comes through with shape (1, n), while a shape (n, 1) column passes through unchanged, so a single transpose is enough to bring all three cases to the same column form and simplify your function.
For example,
def sum(x):
    x = np.atleast_2d(x)
    if x.shape[0] == 1:   # a list or (n,) input arrived as (1, n)
        x = x.T           # normalize to a column of shape (n, 1)
    return np.dot(np.ones((1, x.shape[0])), x)[0, 0]
atleast_2d returns a view on that array, so there won't be much overhead if you pass in something that's already an ndarray. However, if you plan to modify x and therefore want to make a copy instead, you can do x = np.atleast_2d(np.array(x)).
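With that normalization, all three input forms give the same result:
sum([1.0, 2.0, 3.0])                    # plain list   -> 6.0
sum(np.array([1.0, 2.0, 3.0]))          # shape (3,)   -> 6.0
sum(np.array([[1.0], [2.0], [3.0]]))    # shape (3, 1) -> 6.0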
You can convert the three types to a "canonical" type, which is a 1dim array, using:
arr = np.asarray(arr).ravel()
Put in a decorator:
import numpy as np
import functools

def takes_1dim_array(func):
    @functools.wraps(func)
    def f(arr, *a, **kw):
        arr = np.asarray(arr).ravel()
        return func(arr, *a, **kw)
    return f
Then:
@takes_1dim_array
def func(arr):
    print(arr.shape)
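Any of the three input forms then arrives inside func as a flat 1-D array:
func([1, 2, 3])         # prints (3,)
func(np.zeros(4))       # prints (4,)
func(np.zeros((4, 1)))  # prints (4,)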
Can I get a numpy vectorized function to use a buffer object as the result as opposed to creating a new array that is returned by that object?
I'd like to do something like this:
fun = numpy.vectorize(lambda x: x + 1)
a = numpy.zeros((1, 10))
buf = numpy.zeros((1, 10))
fun(a, buf_obj = buf)
as opposed to
fun = numpy.vectorize(lambda x: x + 1)
a = numpy.zeros((1, 10))
buf = fun(a)
Not for vectorize, but most numpy functions take an out argument that does exactly what you want.
What function are you trying to use numpy.vectorize with? vectorize is almost always the wrong solution when you're trying to "vectorize" a calculation.
In your example above, if you wanted to do the operation in-place, you could accomplish it with:
a = numpy.zeros((1, 10))
a += 1
Or, if you wanted to be a bit verbose, but do exactly what your example would do:
a = numpy.zeros((1, 10))
buf = numpy.empty_like(a)
numpy.add(a, 1, out=buf)
numpy.vectorize has to call a python function for every element in the array. Therefore, it has additional overhead when compared to numpy functions that operate on the entire array. Usually, when people refer to "vectorizing" an expression to get a speedup, they're referring to building the expression out of building-blocks of basic numpy functions, rather than using vectorize (which is certainly confusing...).
Edit: Based on your comment, vectorize really does fit your use case! (Writing a "raster calculator" is a pretty perfect use case for it, beyond security/sandboxing issues.)
On the other hand, numexpr is probably an even better fit if you don't mind an additional dependency.
It's faster and takes an out parameter.
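For example, a small sketch assuming numexpr is installed (numexpr.evaluate accepts an out argument):
import numpy
import numexpr

a = numpy.zeros((1, 10))
buf = numpy.empty_like(a)
numexpr.evaluate("a + 1", out=buf)   # evaluates the expression and writes the result into buf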