What is the difference between ndarray and array in NumPy? Where is their implementation in the NumPy source code?
numpy.array is just a convenience function to create an ndarray; it is not a class itself.
You can also create an array using numpy.ndarray, but it is not the recommended way. From the docstring of numpy.ndarray:
Arrays should be constructed using array, zeros or empty ... The parameters given here refer to a
low-level method (ndarray(...)) for instantiating an array.
Most of the meat of the implementation is in C code, in the multiarray module, but you can start looking at the ndarray interfaces here:
https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py
numpy.array is a function that returns a numpy.ndarray object.
There is no object of type numpy.array.
Just a few lines of example code to show the difference between numpy.array and numpy.ndarray
Warm up step: Construct a list
a = [1,2,3]
Check the type
print(type(a))
You will get
<class 'list'>
Construct an array (from a list) using np.array
a = np.array(a)
Or, you can skip the warm up step, directly have
a = np.array([1,2,3])
Check the type
print(type(a))
You will get
<class 'numpy.ndarray'>
which tells you the type of the numpy array is numpy.ndarray
You can also check the type by
isinstance(a, (np.ndarray))
and you will get
True
Both of the following lines are mistakes, though only the second actually raises an error:
np.ndarray(a)             # wrong: interprets a as the shape argument and returns an uninitialized array; should be np.array(a)
isinstance(a, (np.array)) # TypeError: np.array is a function, not a type; should be isinstance(a, np.ndarray)
numpy.ndarray is a class, while numpy.array() is a function that creates an ndarray.
According to the NumPy docs, there are two ways to create an array of the ndarray class, as quoted:
1- using the array(), zeros() or empty() functions:
Arrays should be constructed using array, zeros or empty (refer to the See Also section below). The parameters given here refer to a low-level method (ndarray(…)) for instantiating an array.
2- from ndarray class directly:
There are two modes of creating an array using __new__:
If buffer is None, then only shape, dtype, and order are used.
If buffer is an object exposing the buffer interface, then all keywords are interpreted.
The example below gives an array of arbitrary (uninitialized) values because we didn't supply a buffer:
np.ndarray(shape=(2,2), dtype=float, order='F', buffer=None)
array([[ -1.13698227e+002, 4.25087011e-303],
[ 2.88528414e-306, 3.27025015e-309]]) #random
Another example assigns an array object to buffer:
>>> np.ndarray((2,), buffer=np.array([1,2,3]),
... offset=np.int_().itemsize,
... dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])
From the above example we see that we can't pass a list as buffer; we had to use numpy.array() to get an ndarray object that exposes the buffer interface.
Conclusion: use numpy.array() if you want to make a numpy.ndarray object.
A note on order: np.array() does accept an order argument, and np.isfortran() returns True for a 2-D array created with np.array(..., order='F'). For a 1-D array, however, np.isfortran() always returns False, because a 1-D array is both C- and F-contiguous, which can make it look as if the order argument was ignored.
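A quick check of how order interacts with np.array() (the key subtlety: np.isfortran() reports False for any 1-D array, because 1-D arrays are both C- and F-contiguous):

```python
import numpy as np

# For 2-D input, np.array() does honor order='F'
f = np.array([[1, 2], [3, 4]], order='F')
print(np.isfortran(f))  # True

# A 1-D array is both C- and F-contiguous, so isfortran is always False
g = np.array([1, 2, 3], order='F')
print(np.isfortran(g))  # False
```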
Related
I am creating a ndarray using:
import numpy as np
arr=np.array({1,2})
print(arr, type(arr))
which outputs
{1, 2} <class 'numpy.ndarray'>
If its type is numpy.ndarray, shouldn't the output be in square brackets, like [1 2]?
Thanks
Yes, but that's because you passed np.array a set, not a list.
if you try this:
import numpy as np
arr=np.array([1,2])
print(arr, type(arr))
you get:
[1 2] <class 'numpy.ndarray'>
This does something slightly different than you might imagine. Instead of constructing an array with the data you specify, the numbers 1 and 2, you're actually building an array of type object. See below:
>>> np.array({1, 2}).dtype
dtype('O')
This is because sets are not "array-like", in NumPy's terminology, in particular they are not ordered. Thus the array construction does not build an array with the contents of the set, but with the set itself as a single object.
If you really want to build an array from the set's contents you could do the following:
>>> x = np.fromiter(iter({1, 2}), dtype=int)
>>> x.dtype
dtype('int64')
Edit: This answer helps explain how various types are used to build an array in NumPy.
It returns a numpy array object with no dimensions. A set is an object. It is similar to passing numpy.array a number (without brackets). See the difference here:
arr=np.array([1])
arr.shape: (1,)
arr=np.array(1)
arr.shape: ()
arr=np.array({1,2})
arr.shape: ()
Therefore, it treats your entire set as a single object and creates a 0-dimensional numpy array whose single element is the set. Sets are not array-like and have no order, so according to the numpy.array docs they are not converted to arrays the way you expect. If you wish to create a numpy array from a set and you do not care about element order, use:
arr=np.fromiter({1,2},int)
arr.shape: (2,)
The repr display of ipython may make this clearer:
In [162]: arr=np.array({1,2})
In [163]: arr
Out[163]: array({1, 2}, dtype=object)
arr is a 0-d array with object dtype, containing one item: the set.
But if we first turn the set into a list:
In [164]: arr=np.array(list({1,2}))
In [165]: arr
Out[165]: array([1, 2])
now we have a 1d (2,) integer dtype array.
np.array(...) converts list (and list-like) arguments into a multidimensional array. A set is not sufficiently list-like.
I often write functions/methods that take some variable which can come in many forms, i.e., lists of lists, lists of tuples, tuples of tuples, etc. all containing numbers, that I want to convert into a numpy array, kinda like the following:
import numpy as np
def my_func(var: 'what-freaking-type-here') -> np.ndarray:
a = np.asarray(var, dtype=np.float64) # type: np.array[np.float] maybe?
return a
Basically my question is how to type this appropriately, given that I can pass all kinds of values to this function to finally create a 2D float array (note that this is just an example and the dimensionality and type should be interchangeable):
my_func([[0], [0]])
my_func([(0,), (2.3,)])
my_func(((0,), [2.3,]))
my_func(np.arange(100).reshape(10, 10))
I have this practice of taking all kinds of values and turning them into numpy arrays in a lot of places, to make working with the functions easy and intuitive. However, I have no idea how to properly type this to verify with mypy. Any hints?
Try the numpy-stubs: experimental typing stubs for NumPy.
It defines the type of the np.array() function like this:
def array(
object: object,
dtype: _DtypeLike = ...,
copy: bool = ...,
subok: bool = ...,
ndmin: int = ...,
) -> ndarray: ...
Which takes any object for the contents and returns an ndarray type.
It's a work in progress. Do report back if it's effective at this stage.
There's also an older project numpy-mypy. As it points out,
Quite a few numpy methods are incredibly flexible and they do their best to accommodate to any possible argument combination. ... Although this is great for users, it caused us a lot of problems when trying to describe the type signature for those methods.
It defines the type of the np.array() function like this:
def array(object: Any, dtype: Any=None, copy: bool=True,
order: str=None, subok: bool=False,
ndmin: int=0) -> ndarray[Any]: ...
Which takes Any for the contents (no type checking there) and returns an ndarray type that's parameterized (generic) by the element type.
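For what it's worth, those stubs have since been merged into NumPy itself: as of NumPy 1.20 the package ships its own type annotations, and numpy.typing.ArrayLike covers exactly this "anything coercible to an array" parameter. A minimal sketch, assuming NumPy >= 1.20:

```python
import numpy as np
import numpy.typing as npt

def my_func(var: npt.ArrayLike) -> np.ndarray:
    # np.asarray accepts lists, tuples, nested mixes of them,
    # scalars, and existing arrays
    return np.asarray(var, dtype=np.float64)

print(my_func([(0,), [2.3]]))
```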
I would like to pass the following array of lists of integers (i.e., it's not a two-dimensional array) to a Cython method from Python code.
Python Sample Code:
import numpy as np
import result
a = np.array([[1], [2,3]])
process_result(a)
The output of a is array([list([1]), list([2, 3])], dtype=object)
Cython Sample Code:
def process_result(int[:,:] a):
pass
The above code gives the following error:
ValueError: Buffer has wrong number of dimensions (expected 2, got 1)
I tried to pass a simple array instead of numpy I got the following error
a = [[1], [2,3]]
process_result(a)
TypeError: a bytes-like object is required, not 'list'
Kindly assist me how to pass the value of a into the Cython method process_result and whats the exact datatype needs to use to receive this value in Cython method.
I think you're using the wrong data type. Instead of a numpy array of lists, you should use a list of numpy arrays. There is very little benefit to numpy arrays of Python objects (such as lists): unlike numeric types they aren't stored particularly efficiently, they aren't quick to do calculations on, and you can't accelerate them in Cython. The outermost level may as well be a normal Python list.
However, the inner levels all look to be homogenous arrays of integers, and so would be ideal candidates for Numpy arrays (especially if you want to process them in Cython).
Therefore, build your list as:
a = [ np.array([1], dtype=np.intc), np.array([2,3], dtype=np.intc) ]
(np.intc matches Cython's int; the old np.int alias was removed in NumPy 1.24.)
(Or use tolist on a numpy array)
For your function you can define it like:
def process_result(list a):
cdef int[:] item
for item in a:
#operations on the inner arrays are fast!
pass
Here I've assumed that you most likely want to iterate over the list. Note that there's little benefit in typing a as list, so you could just leave it untyped (to accept any Python object) and then you could pass other iterables too, like your original numpy array.
Convert the array of lists of integers to a list of objects (i.e., a list of lists of integers; it's not a two-dimensional array):
Python Code:
import numpy as np
import result
a = np.array([[1], [2,3]]).tolist()
process_result(a)
The output of a is [[1], [2,3]]
Cython Sample Code:
def process_result(list a):
pass
Change the int[:, :] to list. It works fine.
Note: if anyone knows a better answer, kindly post it; it would be helpful.
I am pulling data from a .CSV into an array, as follows:
my_data = genfromtxt('nice.csv', delimiter='')
a = np.array(my_data)
I then attempt to establish the size and shape of the array, thus:
size_array=np.size(a)
shape_array=np.shape(a)
Now, I want to generate an array of identical shape and size, and then carry out some multiplications. The trouble I am having is generating the correctly sized array. I have tried this:
D = np.empty([shape_array,])
I receive the error:
"tuple' object cannot be interpreted as an index".
After investigation my array has a shape of (248L,). Please...how do I get this array in a sensible format?
Thanks.
The line shape_array=np.shape(a) creates a shape tuple, which is the expected input to np.empty.
The expression [shape_array,] is that tuple, wrapped in a list, which seems superfluous. Use shape_array directly:
d = np.empty(shape_array)
On a related note, you can use the function np.empty_like to get an array of the same shape and type as the original more concisely:
d = np.empty_like(a)
If you want to use just the shape and size, there is really no need to store them in separate variables after calling np.size and np.shape. It is more idiomatic to use the corresponding properties of np.ndarray directly:
d = np.empty(a.shape)
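Putting this together: since the goal is to multiply against an array of the same shape, keep in mind that np.empty/np.empty_like leave the contents uninitialized; np.ones_like or np.zeros_like give you defined contents to work with. A short sketch, using stand-in data of the same (248,) shape:

```python
import numpy as np

a = np.arange(248, dtype=float)  # stand-in for the (248,) CSV data
d = np.empty_like(a)             # same shape/dtype, contents uninitialized
print(d.shape)                   # (248,)

e = np.ones_like(a)              # same shape/dtype, filled with ones
print((a * e).shape)             # (248,) -- elementwise multiply works
```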
I came across the following oddity in numpy which may or may not be a bug:
import numpy as np
dt = np.dtype([('tuple', (int, 2))])
a = np.zeros(3, dt)
type(a['tuple'][0]) # ndarray
type(a[0]['tuple']) # ndarray
a['tuple'][0] = (1,2) # ok
a[0]['tuple'] = (1,2) # ValueError: shape-mismatch on array construction
I would have expected both of the options above to work.
Opinions?
I asked that on the numpy-discussion list. Travis Oliphant answered here.
Citing his answer:
The short answer is that this is not really a "normal" bug, but it could be considered a "design" bug (although the issues may not be straightforward to resolve). What that means is that it may not be changed in the short term --- and you should just use the first spelling.
Structured arrays can be a confusing area of NumPy for several of reasons. You've constructed an example that touches on several of them. You have a data-type that is a "structure" array with one member ("tuple"). That member contains a 2-vector of integers.
First of all, it is important to remember that with Python, doing
a['tuple'][0] = (1,2)
is equivalent to
b = a['tuple']; b[0] = (1,2)
In like manner,
a[0]['tuple'] = (1,2)
is equivalent to
b = a[0]; b['tuple'] = (1,2)
To understand the behavior, we need to dissect both code paths and what happens. You built a (3,) array of those elements in 'a'. When you write b = a['tuple'] you should probably be getting a (3,) array of (2,)-integers, but as there is currently no formal dtype support for (n,)-integers as a general dtype in NumPy, you get back a (3,2) array of integers which is the closest thing that NumPy can give you. Setting the [0] row of this object via
a['tuple'][0] = (1,2)
works just fine and does what you would expect.
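A quick sketch of that first code path (a['tuple'] is a view, so writes go through to a):

```python
import numpy as np

dt = np.dtype([('tuple', (int, 2))])
a = np.zeros(3, dt)

b = a['tuple']        # field access returns a view
print(b.shape)        # (3, 2) -- the closest NumPy can offer
b[0] = (1, 2)         # writing through the view modifies a
print(a['tuple'][0])  # [1 2]
```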
On the other hand, when you type:
b = a[0]
you are getting back an array-scalar which is a particularly interesting kind of array scalar that can hold records. This new object is formally of type numpy.void and it holds a "scalar representation" of anything that fits under the "VOID" basic dtype.
For some reason:
b['tuple'] = [1,2]
is not working. On my system I'm getting a different error: TypeError: object of type 'int' has no len()
I think this should be filed as a bug on the issue tracker which is for the time being here: http://projects.scipy.org/numpy
The problem is ultimately the void->copyswap function being called in voidtype_setfields if someone wants to investigate. I think this behavior should work.
An explanation for this is given in a numpy bug report.
I get a different error than you do (using numpy 1.7.0.dev):
ValueError: setting an array element with a sequence.
so the explanation below may not be correct for your system (or it could even be the wrong explanation for what I see).
First, notice that indexing a row of a structured array gives you a numpy.void object (see data type docs)
import numpy as np
dt = np.dtype([('tuple', (int, 2))])
a = np.zeros(3, dt)
print(type(a[0]))  # = numpy.void
From what I understand, void is sort of like a Python list since it can hold objects of different data types, which makes sense since the columns in a structured array can be different data types.
If, instead of indexing, you slice out the first row, you get an ndarray:
print(type(a[:1]))  # = numpy.ndarray
This is analogous to how Python lists work:
b = [1, 2, 3]
print(b[0])   # 1
print(b[:1])  # [1]
Slicing returns a shortened version of the original sequence, but indexing returns an element (here, an int; above, a void type).
So when you slice into the rows of the structured array, you should expect it to behave just like your original array (only with fewer rows). Continuing with your example, you can now assign to the 'tuple' columns of the first row:
a[:1]['tuple'] = (1, 2)
So,... why doesn't a[0]['tuple'] = (1, 2) work?
Well, recall that a[0] returns a void object. So, when you call
a[0]['tuple'] = (1, 2) # this line fails
you're assigning a tuple to the 'tuple' element of that void object. Note: despite the fact you've called this index 'tuple', it was stored as an ndarray:
print type(a[0]['tuple']) # = numpy.ndarray
So, this means the tuple needs to be cast into an ndarray. But, the void object can't cast assignments (this is just a guess) because it can contain arbitrary data types so it doesn't know what type to cast to. To get around this you can cast the input yourself:
a[0]['tuple'] = np.array((1, 2))
The fact that we get different errors suggests that the above line might not work for you since casting addresses the error I received---not the one you received.
Addendum:
So why does the following work?
a[0]['tuple'][:] = (1, 2)
Here, you're indexing into the array when you add [:], but without that, you're indexing into the void object. In other words, a[0]['tuple'][:] says "replace the elements of the stored array" (which is handled by the array), a[0]['tuple'] says "replace the stored array" (which is handled by void).
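The distinction in a runnable form (the direct assignment was later fixed upstream, so on a recent NumPy both spellings work; the [:] form worked all along):

```python
import numpy as np

dt = np.dtype([('tuple', (int, 2))])
a = np.zeros(3, dt)

# [:] indexes into the stored array, so ndarray handles the assignment
a[0]['tuple'][:] = (1, 2)
print(a['tuple'][0])  # [1 2]
```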
Epilogue:
Strangely enough, accessing the row (i.e. indexing with 0) seems to drop the base array, but it still allows you to assign to the base array.
print(a['tuple'].base is a)  # = True
print(a[0].base is a)        # = False
a[0] = ((1, 2),)             # `a` is changed
Maybe void is not really an array so it doesn't have a base array,... but then why does it have a base attribute?
This was an upstream bug, fixed as of NumPy PR #5947, with a fix in 1.9.3.