I'm confused about how numpy methods are applied to nd-arrays. For example:
import numpy as np
a = np.array([[1,2,2],[5,2,3]])
b = a.transpose()
a.sort()
Here the transpose() method does not change a at all, but returns the transposed version of a, while the sort() method sorts a in place and returns None. Does anybody have an idea why this is, and what the purpose of this different behaviour is?
Because the numpy authors decided that some methods would operate in place and some wouldn't. Why? I don't know if anyone but them can answer that question.
In-place operations have the potential to be faster, especially when dealing with large arrays, since there is no need to re-allocate and copy the entire array; see the answers to this question.
BTW, most if not all arr methods have a free-function counterpart that returns a new array. For example, arr.sort has the counterpart numpy.sort(arr), which accepts an array and returns a new, sorted array (much like the relationship between the built-in sorted function and list.sort()).
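To make the contrast concrete, here is a minimal sketch:
import numpy as np

arr = np.array([3, 1, 2])
new = np.sort(arr)        # function: returns a new sorted array, arr is untouched
print(arr)                # [3 1 2]
print(new)                # [1 2 3]
ret = arr.sort()          # method: sorts arr in place and returns None
print(arr)                # [1 2 3]
print(ret)                # None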
In a Python class (OOP), methods which operate in place (modify self or its attributes) are acceptable, and if anything more common than ones that return a new object. That is also true for built-in classes like dict or list.
For example, in numpy we often recommend the list-append approach to building a new array:
In [296]: alist = []
In [297]: for i in range(3):
...: alist.append(i)
...:
In [298]: alist
Out[298]: [0, 1, 2]
This is common enough that we can readily write it as a list comprehension:
In [299]: [i for i in range(3)]
Out[299]: [0, 1, 2]
alist.sort operates in-place, sorted(alist) returns a new list.
In numpy, methods that return a new array are much more common. In fact sort is about the only in-place method I can think of offhand. That, and direct modification of the shape: arr.shape = (...).
A number of basic numpy operations return a view. A view shares data memory with the source array, but the array object wrapper is new. In fact even indexing an element returns a new object.
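A quick sketch of that view behaviour; transpose returns a new array object that shares memory with the original:
import numpy as np

a = np.array([[1, 2, 2], [5, 2, 3]])
b = a.transpose()
print(b is a)                   # False - a new wrapper object
print(np.shares_memory(a, b))   # True - same underlying data
b[0, 0] = 99
print(a[0, 0])                  # 99 - writing through the view changes a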
So while you ultimately need to check the documentation, it's usually safe to assume a numpy function or method returns a new object, as opposed to operating in-place.
More often, users are confused by the numpy functions that have the same name as a method. In most of those cases the function makes sure the argument(s) is an array, and then delegates the action to the method. Also keep in mind that in Python operators are translated into method calls - + to __add__, [index] to __getitem__(), etc. += is a kind of in-place operation.
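For instance, the augmented-assignment operator modifies an existing array in place (a minimal sketch):
import numpy as np

arr = np.arange(3)
view = arr            # second reference to the same array object
arr += 10             # in place: calls arr.__iadd__(10)
print(view)           # [10 11 12] - the original object was modified
arr = arr + 10        # not in place: arr + 10 creates a new array
print(view)           # [10 11 12] - unchanged; arr now names a new object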
Related
I'm learning numpy; however, I don't understand why, for example:
import numpy as np
ints = np.array([3,3,3,2,2,1,1,4,4])
ints.unique() # this won't work
np.unique(ints) # this works
however, some functions work both ways:
ints.sum()
np.sum(ints)
And I was reading the numpy documentation: what's the difference between attributes and methods? Attributes return something, just as methods do.
unique, unlike sum, is a free function only and not a class (instance, to be precise) method. The difference between the two is:
obj.foo() # instance method, obj is implicitly passed to foo()
foo(obj) # free function, obj is explicitly passed to foo()
Have a look here for some explanation of the different variants of methods. In NumPy this is mainly a design decision, I believe; however, there are certain reasons for some functions to be free functions. One reason that comes to mind is that, unlike other technical languages (such as MATLAB), numpy arrays can be structured or unstructured and can be flexible in terms of containing objects of different types, for example:
a = np.array([[1,2],[3,4]])           # structured array
b = np.array([[1,2],[3,4,5]])         # unstructured (ragged) array; note that newer NumPy versions require an explicit dtype=object here
c = np.array([[1,2],["abc",True]])    # unstructured array with flexible data type
In such scenarios, having to make every function an instance method would lead to confusing behaviour. Even the sum function behaves differently with structured and unstructured arrays:
In [18]: a.sum() # sums all elements of the array
Out[18]: 10
In [19]: b.sum() # concatenates all elements of the array
Out[19]: [1, 2, 3, 4, 5]
In contrast, some functions like unique have a much narrower scope in terms of their applications. For example, unique only works for structured arrays/buffers of uniform data type, and operates on the flattened (1-D) version of the array.
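A quick sketch of that flattening behaviour:
d = np.array([[3, 1], [1, 2]])
np.unique(d)          # flattens first, then returns the sorted unique values
# -> array([1, 2, 3])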
The attributes of numpy arrays typically tell you about the underlying data type, shape, dimensionality, memory layout/strides and data ownership of the array, for instance:
In [20]: a=np.random.rand(3,4)
In [21]: a.flags
Out[21]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [22]: a.shape
Out[22]: (3, 4)
In [23]: a.dtype
Out[23]: dtype('float64')
These are all attributes, not array methods per se; in other words, they are properties.
np.sum is a function that takes an array, or anything that can be turned into an array, and applies its sum method. See np.source(np.sum) for details.
arr.sum is a method of the arr array. For an ndarray it is compiled code. A subclassed array may have a different sum method.
In most of the cases where there are like-named functions and methods, a relationship like this holds.
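A small sketch of that delegation; because the function converts its argument to an array first, it also accepts plain lists:
import numpy as np

print(np.sum([[1, 2], [3, 4]]))   # 10 - the nested list is converted to an array first
# [[1, 2], [3, 4]].sum()          # AttributeError: 'list' object has no attribute 'sum'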
Look at the source of np.unique to see a different design. One difference that comes to mind is that unique only works with 1d arrays, or with a flattened array. It's not as general-purpose a method as sum or mean.
Some of these differences follow a pattern, or are explained, others are probably more the result of a development history. Often it is easier to add new functionality by writing a 'stand-alone' function, rather than adding a method to an existing class. The method is more closely integrated with the class.
To get into more detail you'll have to spend time reading the development archives. For roughly the last 5 years, much of that can be found by searching the respective github repository and its issues.
Alright, so I apologize ahead of time if I'm asking something silly, but I really thought I understood how apply_along_axis worked. I just ran into something that might be an edge case I didn't consider, and it's baffling me. In short, this is the code that is confusing me:
class Leaf(object):
    def __init__(self, location):
        self.location = location

    def __len__(self):
        return self.location.shape[0]

def bulk_leaves(child_array, axis=0):
    test = np.array([Leaf(location) for location in child_array])  # This is what I want
    check = np.apply_along_axis(Leaf, 0, child_array)  # This returns an array of individual leafs with the same shape as child_array
    return test, check

if __name__ == "__main__":
    test, check = bulk_leaves(np.random.rand(100, 50))
    test == check  # False
I always feel silly using a list comprehension with numpy and then casting back to an array, but I'm just not sure of another way to do this. Am I just missing something obvious?
apply_along_axis is pure Python code that you can look at and decode yourself. In this case it essentially does:
check = np.empty(child_array.shape, dtype=object)
for i in range(child_array.shape[1]):
    check[:, i] = Leaf(child_array[:, i])
In other words, it preallocates the container array, and then fills in the values with an iteration. That certainly is better than appending to the array, but rarely better than appending values to a list (which is what the comprehension is doing).
You could take the above template and adjust it to produce the array that you really want:
check = np.empty(child_array.shape[0], dtype=object)   # 1-D result, one Leaf per row
for i in range(check.shape[0]):
    check[i] = Leaf(child_array[i, :])
In quick tests this iteration times about the same as the comprehension. apply_along_axis, besides giving the wrong result here, is slower.
The problem seems to be that apply_along_axis uses isscalar to determine whether the returned object is a scalar, but isscalar returns False for user-defined classes. The documentation for apply_along_axis says:
The shape of outarr is identical to the shape of arr, except along the axis dimension, where the length of outarr is equal to the size of the return value of func1d.
Since your class's __len__ returns the length of the array it wraps, numpy "expands" the resulting array into the original shape. If you don't define a __len__, you'll get an error, because numpy doesn't think user-defined types are scalars, so it will still try to call len on it.
As far as I can see, there is no way to make this work with a user-defined class. You can return 1 from __len__, but then you'll still get an Nx1 2D result, not a 1D array of length N. I don't see any way to make Numpy see a user-defined instance as a scalar.
There is a numpy bug report about the apply_along_axis behavior, but surprisingly I can't find any discussion of the underlying issue that isscalar returns False for non-numpy objects. It may be that numpy just decided to punt and not guess whether user-defined types are vectors or scalars. Still, it might be worth asking about this on the numpy list, as it seems odd to me that something like isscalar(object()) returns False.
However, if as you say you don't care about performance anyway, it doesn't really matter. Just use your first way with the list comprehension, which already does what you want.
I am new to cython and I am just looking for an easy way of casting a numpy array to a tuple that can then be added to and/or looked up in a dictionary.
In CPython, I can use PyTuple_New and iterate over the values of the array (adding each one to the tuple as though I were appending them to a list).
Cython does not seem to come with the usual CPython functions. How might I turn an array:
array([1,2,3])
into a tuple:
(1, 2, 3)
Cython is a superset of Python, so any valid Python code is valid Cython code. In this case, if you have a NumPy array, just passing it to the tuple constructor should work just fine (just as you would do in regular Python):
a = np.array([1, 2, 3])
t = tuple(a)
Cython will take care of converting these constructs to appropriate C function calls.
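A minimal sketch of the dictionary use case; the tuple is hashable while the array is not:
import numpy as np

a = np.array([1, 2, 3])
d = {tuple(a): "value"}   # a numpy array itself cannot be a dict key
print(d[(1, 2, 3)])       # "value" - numpy integers hash like the equal Python ints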
I'm kind of new (1 day) to Python so maybe my question is stupid. I've already looked here but I can't find my answer.
I need to modify the content of an array at a random offset with a random size.
I have a Python API to interface with a DLL for a USB device which I can't modify. There is a function just like this one:
def read_device(in_array):
    # The DLL accesses the USB device, reads it into in_array of type array('B').
    # in_array can be an int (the function will create an array of that
    # size and return it), or an array, or a tuple (array, int).
In MY code, I create an array of, let's say, 64 bytes and I want to read 16 bytes starting from the 32nd byte. In C, I'd give &my_64_array[31] to the read_device function.
In Python, if I give:
read_device(my_64_array[31:31+16])
it seems that in_array receives a copy of the given subset, therefore my_64_array is not modified.
What can I do? Do I have to split my_64_array and recombine it afterwards?
Seeing as how you are not able to update and/or change the API code, the best method is to pass the function a small temporary array that you then assign to your existing 64-byte array after the function call.
So that would be something like the following, not knowing the exact specifics of your API call.
the_64_array[31:31+16] = read_device(16)
It's precisely as you say: if you pass a slice of a list into a function, the function receives a copy of that slice.
Two possible methods to add it later (assuming read_device returns the relevant slice):
my_64_array = my_64_array[:32] + read_device(my_64_array[31:31+16]) + my_64_array[31+16:]
# equivalently, but around 33% faster for even small arrays (length 10), 3 times faster for (length 50)...
my_64_array[31:31+16] = read_device(my_64_array[31:31+16])
So I think you should be using the latter.
If it were a modifiable function (but it's not in this case!) you could change your function's arguments (one being the entire array):
def read_device(the_64_array, start=31, end=47):
    # some code
    the_64_array[start:end] = ...  # modifies the_64_array in place
and call read_device(my_64_array) or read_device(my_64_array, 31, 31+16).
When reading a list subset you're calling that list's __getitem__ with a slice(x, y) argument. In your case these statements are equal:
my_64_array[31:31+16]
my_64_array.__getitem__(slice(31, 31+16))
This means that the __getitem__ function can be overridden in a subclass to obtain different behaviour.
You can also set the same subset using a[1:3] = [1,2,3], in which case it calls a.__setitem__(slice(1, 3), [1,2,3]).
So I'd suggest either of these:
pass the list (my_64_array) and a slice object to read_device instead of passing the result of __getitem__, after which you could read the necessary data and set the corresponding offsets. No subclassing. This is probably the best solution in terms of readability and ease of development (a minimal sketch follows below).
subclassing list, overriding __getitem__ and __setitem__ to return instances of that subclass with a parent reference, and then change all modifying or reading methods of a list to reference a parent list instead. This might be a little tricky if you're new to python, but basically, you'd exploit that python list properties are largely defined by the methods inside a list instance. This is probably better in terms of performance as you can create references.
If read_device returns the resulting list, and that list is of equal size, you can do this: a[x:y] = read_device(a[x:y])
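Here is a minimal sketch of the first suggestion, with a hypothetical wrapper around the API call (read_device_at is an invented name, and we assume read_device, given an int, returns a new array('B') of that size, as the API comment describes):
def read_device_at(buf, sl):
    # buf: the full array('B'); sl: a slice object marking the region to update
    chunk = read_device(sl.stop - sl.start)   # assumed: reads that many bytes from the device
    buf[sl] = chunk                           # slice assignment writes back into the original buffer
    return buf

# usage: read_device_at(my_64_array, slice(31, 31 + 16))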
I am trying to find out how to apply two functions to a numpy array, each one on only half the values.
Here is the code I have been trying:
def hybrid_array(xs, height, center, fwhh):
    xs[xs <= center] = height*np.exp((-(xs[xs <= center]-center)**2)/(2*(fwhh/(2*np.sqrt(2*np.log(2))))**2))
    xs[xs > center] = height*1/(np.abs(1+((center-xs[xs > center])/(fwhh/2))**2))
    return xs
However, I am overwriting the initial array that is passed as the argument. The usual trick of copying it with a slice, i.e. the following, still changes b:
a = b[:]
c = hybrid_array(a,args)
If there is a better way of doing any part of what I am trying, I would be very grateful if you could let me know as I am still new to numpy arrays.
Thank you
Try copy.deepcopy to copy the array b onto a before calling your function.
import copy
a = copy.deepcopy(b)
c = hybrid_array(a,args)
Alternatively, you can use the copy method of the array
a = b.copy()
c = hybrid_array(a,args)
Note:
You may be wondering why, despite the easier way to copy an array with the numpy array's own copy method, I introduced copy.deepcopy. Others may disagree, but here is my reasoning:
Using copy.deepcopy makes it clear that you intend a deep copy rather than a reference copy.
Not all of Python's data types support the copy method. Numpy has it, and good that it does, but when programming with numpy and python you may end up using various numpy and non-numpy data types, not all of which support a copy method. To remain consistent, I would prefer the first.
Copying a NumPy array a is done with a.copy(). (Note that, unlike with lists, b[:] does not copy a NumPy array; for arrays, a slice is a view of the same data, which is why your b still changed.) In your application, however, there is no need to copy the old data. All you need is a new array of the same shape and dtype as the old one. You can use
result = numpy.empty_like(xs)
to create such an array. If you generally don't want your function to modify its parameter, you should do this inside the function, rather than requiring the caller to take care of this.
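Putting that together, a minimal sketch of the function writing into a fresh array instead of mutating its input (same formulas as in the question):
import numpy as np

def hybrid_array(xs, height, center, fwhh):
    sigma = fwhh / (2*np.sqrt(2*np.log(2)))
    result = np.empty_like(xs)    # new array, same shape and dtype; xs is untouched
    left = xs <= center           # boolean masks for the two halves
    right = ~left
    result[left] = height * np.exp(-(xs[left] - center)**2 / (2*sigma**2))
    result[right] = height / np.abs(1 + ((center - xs[right]) / (fwhh/2))**2)
    return result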