I am looking for some examples which show the difference between numpy.asanyarray() and numpy.asarray(). Under which conditions should I specifically use asanyarray()?
Code for asanyarray:
return array(a, dtype, copy=False, order=order, subok=True)
for asarray:
return array(a, dtype, copy=False, order=order)
The only difference is in specifying the subok parameter. If you are working with subclasses of ndarray you might want to use it. If you don't know what that means, it probably doesn't matter.
The defaults for np.array are:
array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
If you are fine-tuning a function that is supposed to work with all kinds of numpy arrays (and lists that can be made into arrays), and it shouldn't make unnecessary copies, you can use one of these functions. Otherwise np.array, with or without the extra parameters, works just fine. As a beginner, don't put much effort into understanding these differences.
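As a minimal sketch of the subok difference (the masked array is just an illustrative input, not something from the question), asarray strips the subclass while asanyarray passes it through:

```python
import numpy as np

# Illustrative input: MaskedArray is an ndarray subclass.
masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

as_base = np.asarray(masked)     # forced to a base-class ndarray
as_any = np.asanyarray(masked)   # the subclass is passed through

print(type(as_base).__name__)    # ndarray
print(type(as_any).__name__)     # MaskedArray
```

So asanyarray is the one to reach for when the function should hand back whatever array subclass the caller passed in.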
===
expand_dims uses both:
if isinstance(a, matrix):
    a = asarray(a)
else:
    a = asanyarray(a)
An np.matrix subclass array can only have 2 dimensions, but expand_dims has to change that, so it uses asarray to turn the input into a regular ndarray. Otherwise it uses asanyarray. That way a subclass like MaskedArray remains that class.
In [158]: np.expand_dims(np.eye(2),1)
Out[158]:
array([[[1., 0.]],
[[0., 1.]]])
In [159]: np.expand_dims(np.matrix(np.eye(2)),1)
Out[159]:
array([[[1., 0.]],
[[0., 1.]]])
In [160]: np.expand_dims(np.ma.masked_array(np.eye(2)),1)
Out[160]:
masked_array(
data=[[[1., 0.]],
[[0., 1.]]],
mask=False,
fill_value=1e+20)
I have a record array with a fixed-size 2×2 item and 10 rows, so the column is 10×2×2. I would like to assign a constant to the whole column. A NumPy array will broadcast the scalar value correctly, but this does not work in h5py.
import numpy as np
import h5py
dt=np.dtype([('a',('f4',(2,2)))])
# h5py array
h5a=h5py.File('/tmp/t1.h5','w')['/'].require_dataset('test',dtype=dt,shape=(10,))
# numpy for comparison
npa=np.zeros((10,),dtype=dt)
h5a['a']=np.nan
# ValueError: changing the dtype of a 0d array is only supported if the itemsize is unchanged
npa['a']=np.nan
# numpy: broadcasts, OK
In fact, I can't find a way to assign the column without broadcasting:
h5a['a']=np.full((10,2,2),np.nan)
# ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array
Not even one element row:
h5a['a',0]=np.full((2,2),np.nan)
# ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array
What is the problem here?
In [69]: d = f.create_dataset('test', dtype=dt, shape=(3,))
We can set a like sized array:
In [90]: x=np.ones(3,dt)
In [91]: x[:]=2
In [92]: x
Out[92]:
array([([[2., 2.], [2., 2.]],), ([[2., 2.], [2., 2.]],),
([[2., 2.], [2., 2.]],)], dtype=[('a', '<f4', (2, 2))])
and assign it to the dataset:
In [93]: d[:]=x
In [94]: d
Out[94]: <HDF5 dataset "test": shape (3,), type "|V16">
In [95]: d[:]
Out[95]:
array([([[2., 2.], [2., 2.]],), ([[2., 2.], [2., 2.]],),
([[2., 2.], [2., 2.]],)], dtype=[('a', '<f4', (2, 2))])
We can also make a single element array with the correct dtype, and assign that:
In [116]: x=np.array((np.arange(4).reshape(2,2),),dt)
In [117]: x
Out[117]: array(([[0., 1.], [2., 3.]],), dtype=[('a', '<f4', (2, 2))])
In [118]: d[0]=x
With h5py we can index with record and field as:
In [119]: d[0,'a']
Out[119]:
array([[0., 1.],
[2., 3.]], dtype=float32)
Whereas an ndarray requires a double index, as in d[0]['a'].
h5py tries to imitate ndarray indexing, but is not exactly the same. We just have to accept that.
edit
The [118] assignment can also be
In [207]: d[1,'a']=x
The dt here has just one field, but I think this should work with multiple fields. The key is that the value has to be a structured array that matches the d field specification.
I just noticed in the docs that they are trying to move away from the d[1,'a'] indexing in favor of d[1]['a']. But for assignment that doesn't seem to work: no error, just no effect. I think d[1] or d['a'] returns a copy, the equivalent of advanced indexing for arrays. For a structured ndarray those are views.
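One way to get the broadcasting back, sketched here with NumPy only: build the column in an in-memory structured array (where scalar broadcasting works), then write the whole buffer to the dataset in one assignment, as in [93]. The name staging is hypothetical, and the final h5py line is left as a comment because it assumes an open dataset d with the same dtype and shape.

```python
import numpy as np

dt = np.dtype([('a', ('f4', (2, 2)))])

# Hypothetical staging buffer with the same dtype and shape as the dataset.
staging = np.zeros((10,), dtype=dt)

# Field access on a structured ndarray is a view, so the scalar broadcasts
# into all 10 x 2 x 2 elements of the buffer.
staging['a'] = np.nan

print(staging['a'].shape)   # (10, 2, 2)

# Assumption: d is an h5py dataset created with dtype=dt, shape=(10,).
# d[:] = staging
```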
I want to take the row sums of one array and place the output into the diagonals of another array. For performance reasons, I want to use the out argument of the np.sum function.
mat1 = np.array([[0.5, 0.5],[0.6, 0.4]])
mat2 = np.zeros([2,2])
mat3 = np.zeros([2,2])
If I want to place the row sums of mat1 into the first row of mat2, I can do it like this:
np.sum(mat1, axis=1, out = mat2[0])
mat2
#array([[ 1., 1.],
# [ 0., 0.]])
However, if I want to place the sums into the diagonal indices of mat3, I can't seem to do so.
np.sum(mat1, axis=1, out = mat3[np.diag_indices(2)])
mat3
#array([[ 0., 0.],
# [ 0., 0.]])
Of course, the following works, but I would like to use the out argument of np.sum
mat3[np.diag_indices(2)] = np.sum(mat1, axis=1)
mat3
#array([[ 1., 0.],
# [ 0., 1.]])
Can someone explain this behavior of the out argument not accepting the diagonal indices of an array as a valid output?
NumPy has two types of indexing: basic indexing and advanced indexing.
Basic indexing is what happens when your index expression uses only integers, slices, ..., and None (a.k.a. np.newaxis). This can be implemented entirely through simple manipulation of offsets and strides, so when basic indexing returns an array, the resulting array is always a view of the original data. Writing to the view writes to the original array.
When you index with an array, as in mat3[np.diag_indices(2)], you get advanced indexing. Advanced indexing cannot be done in a way that returns a view of the original data; it always copies data from the original array. That means that when you try to use the copy as an out parameter:
np.sum(mat1, axis=1, out = mat3[np.diag_indices(2)])
The data is placed into the copy, but the original array is unaffected.
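A small sketch of that distinction, using np.shares_memory to check whether the indexed result still points at the original buffer (the variable names are only illustrative):

```python
import numpy as np

mat3 = np.zeros((2, 2))

row_view = mat3[0]                      # basic indexing: a view
diag_copy = mat3[np.diag_indices(2)]    # advanced indexing: a copy

print(np.shares_memory(mat3, row_view))   # True
print(np.shares_memory(mat3, diag_copy))  # False

diag_copy[:] = 1.0   # only the temporary copy changes
print(mat3.sum())    # 0.0
```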
We were supposed to have the ability to use np.diagonal for this by now, but even though the documentation says np.diagonal's output is writeable in NumPy 1.10, the relevant feature for making it writable is still in limbo. It's probably best to just not use the out parameter for this:
mat3[np.diag_indices(2)] = np.sum(mat1, axis=1)
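If the out argument really matters, one possible workaround (an assumption on my part, not something proposed above) is to hand np.sum a basic-indexing view of the diagonal. For a C-contiguous n×n array, reshape(-1) is a view, and every (n+1)-th element of it lies on the diagonal:

```python
import numpy as np

mat1 = np.array([[0.5, 0.5], [0.6, 0.4]])
mat3 = np.zeros((2, 2))

n = mat3.shape[0]
# Relies on mat3 being C-contiguous so that reshape(-1) returns a view.
diag_view = mat3.reshape(-1)[::n + 1]

# The row sums are written straight into the diagonal of mat3.
np.sum(mat1, axis=1, out=diag_view)
print(mat3)
```

This only works because the strided slice is basic indexing; whether it is actually faster than the plain assignment would need measuring.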
I can't figure out how to use the CArray trait. Why does this class
from traits.api import HasTraits, CArray, Float,Int
import numpy as np
class Coordinate3D(HasTraits):
    coordinate = CArray(Float(), shape=(1,3) )
    def _coordinate_default(self):
        return np.array([1.,2.,3.])
apparently not use my _name_default() method?
In [152]: c=Coordinate3D()
In [153]: c.coordinate
Out[153]: np.array([[ 0., 0., 0.]])
I would have expected np.array([1,2,3])! The _name_default() seems to work with Int
class A(HasTraits):
    a=Int
    def _a_default(self):
        return 2
In [163]: a=A()
In [164]: a.a
Out[164]: 2
So what am I doing wrong here? Also, I can't assign values:
In [181]: c.coordinate=[1,2,3]
TraitError: The 'coordinate' trait of a Coordinate3D instance must be an array of
float64 values with shape (1, 3), but a value of array([ 1., 2., 3.]) <type
'numpy.ndarray'> was specified.
Same error message with
In [182]: c.coordinate=np.array([1,2,3])
There is a difference between one-dimensional arrays and two-dimensional arrays in which one of the dimensions has size 1. You are trying to set a 1-D array into a CArray trait expecting two dimensions. For example, your default method should be:
def _coordinate_default(self):
    return np.array([[1., 2., 3.]])
(note the extra square brackets). The array you were setting is of shape (3,), not the desired (1, 3).
Similarly, it will not coerce a flat list into a 2-D array. Try assigning a nested list like
c.coordinate=[[1, 2, 3]]
instead.
(Alternatively, if you actually want 1-D arrays, you should use shape=(3,) in your traits assignment and the other parts should work correctly.)
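The shape mismatch can be seen with plain NumPy, independent of traits. A minimal sketch (variable names are illustrative):

```python
import numpy as np

flat = np.array([1., 2., 3.])      # one dimension
row = np.array([[1., 2., 3.]])     # two dimensions, the first of size 1

print(flat.shape)   # (3,)
print(row.shape)    # (1, 3)

# Adding an axis turns the flat array into the (1, 3) layout.
promoted = flat[np.newaxis, :]
print(promoted.shape == row.shape)   # True
```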
Dummy me. While copy-pasting from Eclipse to iPython, I didn't use the magic %paste function and messed up the class definition there. The other actual error was the shape of the CArray which must be (3,).
This code
class Coordinate3D(HasTraits):
    coordinate = CArray(Float(),shape=(3,))
    def __init__(self,iv=None):
        super(Coordinate3D,self).__init__()
        if iv:
            self.coordinate=iv
    def _coordinate_default(self):
        return np.array([1,2,3])
    def __getitem__(self,index):
        return self.coordinate[index]
works like intended:
In [3]: c=Coordinate3D()
In [6]: c.coordinate
Out[6]: array([ 1., 2., 3.])
In [7]: c=Coordinate3D([1,2,5])
In [8]: c.coordinate
Out[8]: array([ 1., 2., 5.])
In [11]: c[0]
Out[11]: 1.0
In extension to the previous answers, I experimented further:
import types
RealNumberType = (types.IntType, types.LongType, types.FloatType)
class ScaleFactor3D(Coordinate3D):
    '''Demonstrate subclassing a HasTraits class
    and overriding __init__ and a _default method'''
    def _coordinate_default(self):
        return np.array([1,1,1])
    def __init__(self,iv=None):
        if isinstance(iv,RealNumberType):
            iv=[iv,iv,iv]
        super(ScaleFactor3D,self).__init__(iv)
This works well too:
In [35]: s=ScaleFactor3D()
In [36]: s.coordinate
Out[36]: array([ 1., 1., 1.])
In [37]: s=ScaleFactor3D(3)
In [38]: s.coordinate
Out[38]: array([ 3., 3., 3.])
I thought I'd put this here since I couldn't find much useful information on CArray on the web.
What is the difference between NumPy's np.array and np.asarray? When should I use one rather than the other? They seem to generate identical output.
The definition of asarray is:
def asarray(a, dtype=None, order=None):
    return array(a, dtype, copy=False, order=order)
So it is like array, except it has fewer options, and copy=False. array has copy=True by default.
The main difference is that array (by default) will make a copy of the object, while asarray will not unless necessary.
Since other questions are being redirected to this one which ask about asanyarray or other array creation routines, it's probably worth having a brief summary of what each of them does.
The differences are mainly about when to return the input unchanged, as opposed to making a new array as a copy.
array offers a wide variety of options (most of the other functions are thin wrappers around it), including flags to determine when to copy. A full explanation would take just as long as the docs (see Array Creation), but briefly, here are some examples:
Assume a is an ndarray, and m is a matrix, and they both have a dtype of float32:
np.array(a) and np.array(m) will copy both, because that's the default behavior.
np.array(a, copy=False) and np.array(m, copy=False) will return a itself but a new object for m, because m is not a base-class ndarray. The new object is a plain ndarray that still shares m's data.
np.array(a, copy=False, subok=True) and np.array(m, copy=False, subok=True) will copy neither, because m is a matrix, which is a subclass of ndarray.
np.array(a, dtype=int, copy=False, subok=True) and np.array(m, dtype=int, copy=False, subok=True) will copy both, because the dtype is not compatible.
Most of the other functions are thin wrappers around array that control when copying happens:
asarray: The input will be returned uncopied iff it's a compatible ndarray (copy=False).
asanyarray: The input will be returned uncopied iff it's a compatible ndarray or subclass like matrix (copy=False, subok=True).
ascontiguousarray: The input will be returned uncopied iff it's a compatible ndarray in contiguous C order (copy=False, order='C').
asfortranarray: The input will be returned uncopied iff it's a compatible ndarray in contiguous Fortran order (copy=False, order='F').
require: The input will be returned uncopied iff it's compatible with the specified requirements string.
copy: The input is always copied.
fromiter: The input is treated as an iterable (so, e.g., you can construct an array from an iterator's elements, instead of an object array with the iterator); always copied.
There are also convenience functions, like asarray_chkfinite (same copying rules as asarray, but raises ValueError if there are any nan or inf values), and constructors for subclasses like matrix or for special cases like record arrays, and of course the actual ndarray constructor (which lets you create an array directly out of strides over a buffer).
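A few of these rules can be checked directly. The sketch below uses only the as* wrappers and inspects identity and memory sharing; the array a is just an illustrative input:

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)   # C-ordered ndarray

print(np.asarray(a) is a)    # True: already a compatible ndarray
print(np.array(a) is a)      # False: array copies by default

# An incompatible dtype forces a copy even with asarray.
as_int = np.asarray(a, dtype=np.int64)
print(np.shares_memory(a, as_int))    # False

# a is not in Fortran order, so asfortranarray has to copy as well.
fortran = np.asfortranarray(a)
print(fortran.flags['F_CONTIGUOUS'])  # True
print(np.shares_memory(a, fortran))   # False
```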
The difference can be demonstrated by this example:
Generate a matrix.
>>> A = numpy.matrix(numpy.ones((3, 3)))
>>> A
matrix([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
Use numpy.array to modify A. Doesn't work because you are modifying a copy.
>>> numpy.array(A)[2] = 2
>>> A
matrix([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
Use numpy.asarray to modify A. It worked because you are modifying A itself.
>>> numpy.asarray(A)[2] = 2
>>> A
matrix([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
The differences are mentioned quite clearly in the documentation of array and asarray. The differences lie in the argument list and hence the action of the function depending on those parameters.
The function definitions are :
numpy.array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)
and
numpy.asarray(a, dtype=None, order=None)
The following arguments are those that may be passed to array and not asarray as mentioned in the documentation :
copy : bool, optional If true (default), then the object is copied.
Otherwise, a copy will only be made if __array__ returns a copy, if
obj is a nested sequence, or if a copy is needed to satisfy any of the
other requirements (dtype, order, etc.).
subok : bool, optional If True, then sub-classes will be
passed-through, otherwise the returned array will be forced to be a
base-class array (default).
ndmin : int, optional Specifies the minimum number of dimensions that
the resulting array should have. Ones will be pre-pended to the shape
as needed to meet this requirement.
asarray(x) is like array(x, copy=False)
Use asarray(x) when you want to ensure that x will be an array before any other operations are done. If x is already an array then no copy would be done. It would not cause a redundant performance hit.
Here is an example of a function that ensure x is converted into an array first.
def mysum(x):
    return np.asarray(x).sum()
Here's a simple example that can demonstrate the difference.
The main difference is that array makes a copy of the original data by default. asarray also has to copy when a dtype conversion is needed, as below, so the new object can be modified without touching the original array.
import numpy as np
a = np.arange(0.0, 10.2, 0.12)
int_cvr = np.asarray(a, dtype = np.int64)
The contents of a remain untouched, and we can still perform any operation on the converted data through int_cvr without modifying the original array.
Let's understand the difference between np.array() and np.asarray() with an example:
np.array(): Convert input data (list, tuple, array, or other sequence type) to an ndarray and copies the input data by default.
np.asarray(): Convert input data to an ndarray but do not copy if the input is already an ndarray.
# Create an array...
arr = np.ones(5)  # array([1., 1., 1., 1., 1.])
# Now try to modify `arr` through np.array()...
np.array(arr)[3] = 200  # arr is still array([1., 1., 1., 1., 1.])
No change in the array, because we are modifying a copy of arr.
Now modify arr through np.asarray().
np.asarray(arr)[3] = 200  # arr is now array([  1.,   1.,   1., 200.,   1.])
The change occurs in this array because we are working with the original array now.
I am having a hard time creating a numpy 2D array on the fly.
So basically I have a for loop something like this.
for ele in huge_list_of_lists:
    instance = np.array(ele)
This creates a 1D numpy array from each list. Now I want to append it to a numpy array, so basically converting a list of lists to an array of arrays.
I have checked the manual, and np.append() doesn't work here because it needs two arguments to append together.
Any clues?
Create the 2D array up front, and fill the rows while looping:
my_array = numpy.empty((len(huge_list_of_lists), row_length))
for i, x in enumerate(huge_list_of_lists):
    my_array[i] = create_row(x)
where create_row() returns a list or 1D NumPy array of length row_length.
Depending on what create_row() does, there might be even better approaches that avoid the Python loop altogether.
Just pass the list of lists to numpy.array. Keep in mind that numpy arrays are ndarrays, so a list of lists doesn't translate to an array of arrays; it translates to a 2D array.
>>> import numpy as np
>>> a = [[1., 2., 3.], [4., 5., 6.]]
>>> b = np.array(a)
>>> b
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
>>> b.shape
(2, 3)
Also ndarrays have nd-indexing so [1][1] becomes [1, 1] in numpy:
>>> a[1][1]
5.0
>>> b[1, 1]
5.0
Did I misunderstand your question?
You definitely don't want to use numpy.append for something like this. Keep in mind that numpy.append has O(n) run time, so if you call it n times, once for each row of your array, you end up with an O(n^2) algorithm. If you need to create the array before you know what all the content is going to be, but you know the final size, it's best to create an array using numpy.zeros(shape, dtype) and fill it in later, similar to Sven's answer.
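A minimal sketch of the preallocate-and-fill approach, checked against the direct np.array conversion. The rows list is a hypothetical stand-in for huge_list_of_lists, and equal-length rows are assumed:

```python
import numpy as np

# Hypothetical stand-in for huge_list_of_lists; all rows have equal length.
rows = [[1., 2., 3.], [4., 5., 6.]]

# Allocate the full 2D array once, then fill it row by row.
filled = np.zeros((len(rows), len(rows[0])), dtype=float)
for i, row in enumerate(rows):
    filled[i] = row

# When the rows are already available, a single call does the same thing.
direct = np.array(rows)

print(filled.shape)                     # (2, 3)
print(np.array_equal(filled, direct))   # True
```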
import numpy as np
ss = np.ndarray(shape=(3,3), dtype=int);
array([[ 0, 139911262763080, 139911320845424],
[ 10771584, 10771584, 139911271110728],
[139911320994680, 139911206874808, 80]]) #random
The numpy.ndarray constructor achieves this. The array is allocated without being initialized, which is why the values shown above are arbitrary.