Is a set converted to numpy array? - python

I am creating an ndarray using:
import numpy as np
arr=np.array({1,2})
print(arr, type(arr))
which outputs
{1, 2} <class 'numpy.ndarray'>
If its type is numpy.ndarray, shouldn't the output be in square brackets, like [1,2]?

Yes, but that's because you passed a set to np.array rather than a list. If you try this:
import numpy as np
arr=np.array([1,2])
print(arr, type(arr))
you get:
[1 2] <class 'numpy.ndarray'>

This does something slightly different from what you might imagine. Instead of constructing an array with the data you specify, the numbers 1 and 2, you're actually building an array of dtype object. See below:
>>> np.array({1, 2}).dtype
dtype('O')
This is because sets are not "array-like" in NumPy's terminology; in particular, they are not ordered. Thus the array construction does not build an array with the contents of the set, but with the set itself as a single object.
If you really want to build an array from the set's contents you could do the following:
>>> x = np.fromiter(iter({1, 2}), dtype=int)
>>> x.dtype
dtype('int64')
Edit: This answer helps explain how various types are used to build an array in NumPy.
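A minimal sketch of both behaviours described above, assuming a recent NumPy: the set passed directly ends up as a single object inside a 0d array, while np.fromiter consumes its elements. Because a set's iteration order is not guaranteed, the sketch sorts the result afterwards; the count argument is optional and only lets NumPy preallocate.

```python
import numpy as np

s = {3, 1, 2}

# Passing the set directly wraps it as one Python object in a 0d object array.
wrapped = np.array(s)
print(wrapped.shape, wrapped.dtype)  # () object
print(wrapped.item() is s)           # True: the array holds the set itself

# np.fromiter iterates over the set and stores each element.
values = np.fromiter(s, dtype=int, count=len(s))
values.sort()  # set order is arbitrary, so sort if order matters
print(values.tolist())               # [1, 2, 3]
```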

It returns a numpy array object with no dimensions. A set is an object. It is similar to passing numpy.array a number (without brackets). See the difference here:
arr = np.array([1])
arr.shape  # (1,)
arr = np.array(1)
arr.shape  # ()
arr = np.array({1,2})
arr.shape  # ()
Therefore, it treats your entire set as a single object and creates a numpy array with no dimensions that contains only the set object. Sets are not array-like and have no order, so, according to the numpy.array documentation, they are not converted to arrays the way you expect. If you want to create a numpy array from a set and you don't care about its order, use:
arr = np.fromiter({1,2}, int)
arr.shape  # (2,)
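The same comparison can be run in one loop. This sketch only prints shapes and dimension counts, which do not depend on the platform's default integer size:

```python
import numpy as np

# Compare how np.array treats a list, a bare number and a set.
shapes = {}
for obj in ([1], 1, {1, 2}):
    arr = np.array(obj)
    shapes[repr(obj)] = arr.shape
    print(repr(obj), "-> shape", arr.shape, "ndim", arr.ndim)
# The list gives a 1d array; the number and the set both give 0d arrays.
```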

The repr display of ipython may make this clearer:
In [162]: arr=np.array({1,2})
In [163]: arr
Out[163]: array({1, 2}, dtype=object)
arr is a 0d array of object dtype, containing 1 item: the set.
But if we first turn the set into a list:
In [164]: arr=np.array(list({1,2}))
In [165]: arr
Out[165]: array([1, 2])
Now we have a 1d (2,) integer dtype array.
np.array(...) converts list (and list-like) arguments into a multidimensional array. A set is not sufficiently list-like.
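Building on the list conversion above, a small helper can make the conversion explicit. set_to_array is a hypothetical name, and sorting is an assumption (a set has no inherent order). Passing dtype also matters for the empty-set case, since np.array([]) would otherwise default to float64.

```python
import numpy as np

def set_to_array(s, dtype=int):
    """Hypothetical helper: convert a set to a sorted 1d array of the given dtype."""
    # sorted() returns a list, which np.array treats as a sequence of elements.
    return np.array(sorted(s), dtype=dtype)

print(set_to_array({3, 1, 2}).tolist())  # [1, 2, 3]
print(set_to_array(set()).shape)         # (0,), with the requested dtype, not float64
```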

Related

Numpy empty list type inference

Why is the empty list [] being inferred as float type when using np.append?
np.append([1,2,3], [0])
# output: array([1, 2, 3, 0]), dtype = np.int64
np.append([1,2,3], [])
# output: array([1., 2., 3.]), dtype = np.float64
This persists even when arr is np.array([1,2,3], dtype=np.int32).
It's not possible to specify a dtype for append, so I'm just curious why this happens. NumPy's concatenate does the same thing, but when I try to specify the dtype I get an error:
np.concatenate([[1,2,3], []], dtype=np.int64)
Error:
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'same_kind'
But finally if I set the unsafe casting rule it works:
np.concatenate([[1,2,3], []], dtype=np.int64, casting='unsafe')
Why is [] considered a float?
np.append follows well-defined semantic rules, like other NumPy operations that combine several operands. It first converts the inputs to NumPy arrays if they are not already (typically with np.array), then applies the type-promotion rules to find the type of the resulting array and checks that the operation is valid before performing it (here, the concatenation). According to the documentation, the dtype chosen by np.array is "determined as the minimum type required to hold the objects in the sequence". When the list is empty, as in your case, there are no objects to inspect, so the default type numpy.float64 is used, as also stated in the documentation of np.empty. This arbitrary choice was made long ago and has not been changed since, so as not to break old code. Note that not all NumPy developers seem to agree with the current choice, so this is a matter of debate. For more information, you can read this open issue.
The rule of thumb is to use either existing Numpy arrays or to perform an explicit conversion to a Numpy array using np.array with a fixed dtype parameter (as described in the above comments).
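To see the promotion at work, the sketch below (assuming the question's int32 array) compares appending a plain empty list with appending an empty array that already has the target dtype:

```python
import numpy as np

base = np.array([1, 2, 3], dtype=np.int32)

# An empty list becomes an empty float64 array, which wins the type promotion.
promoted = np.append(base, [])
# An empty array with a matching dtype leaves the result as int32.
preserved = np.append(base, np.array([], dtype=base.dtype))

print(np.array([]).dtype)  # float64
print(promoted.dtype)      # float64
print(preserved.dtype)     # int32
```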
Look at the code for np.append (via docs link or ipython):
def append(arr, values, axis=None):
    arr = asanyarray(arr)
    if axis is None:
        if arr.ndim != 1:
            arr = arr.ravel()
        values = ravel(values)
        axis = arr.ndim-1
    return concatenate((arr, values), axis=axis)
The first argument is turned into an array, if it isn't one already.
You don't specify the axis, so both arr and values are ravelled, i.e. turned into 1d arrays. np.ravel is also Python code, and does asanyarray(a).ravel(order=order).
So the dtype inference is done by np.asanyarray.
The rest of the action is np.concatenate. It too will convert the inputs to arrays if necessary. The result dtype is the "highest" of the inputs.
np.append is a poorly conceived (IMO) alternative way of using np.concatenate. It is not a list append clone.
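The source above boils down to: ravel both inputs when no axis is given, then concatenate. The sketch below checks that equivalence on a small 2d array (the arrays are illustrative), and shows that passing axis skips the flattening.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
vals = [[10, 11]]

# Without axis, np.append flattens both inputs before concatenating.
appended = np.append(arr, vals)
manual = np.concatenate((np.ravel(arr), np.ravel(vals)))
print(appended.tolist())                 # [0, 1, 2, 3, 4, 5, 10, 11]
print(np.array_equal(appended, manual))  # True

# With axis=0 the shapes must line up, and no flattening happens.
rows = np.append(arr, [[10, 11, 12]], axis=0)
print(rows.shape)                        # (3, 3)
```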
Also be careful about "empty" arrays:
In [73]: np.array([])
Out[73]: array([], dtype=float64)
In [74]: np.empty((0))
Out[74]: array([], dtype=float64)
In [75]: np.empty((0),int)
Out[75]: array([], dtype=int64)
The common list idiom
alist = []
for i in range(10):
    alist.append(i)
does not translate well into numpy. Build a list of arrays, and do one concatenate/vstack at the end. Don't iterate over "empty" arrays, however created.
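A sketch of the "list of arrays, one concatenate at the end" pattern; the chunk contents (np.full(3, i)) are arbitrary placeholders:

```python
import numpy as np

chunks = []
for i in range(10):
    # List append is cheap and modifies the list in place.
    chunks.append(np.full(3, i))

flat = np.concatenate(chunks)  # join along the existing axis
table = np.vstack(chunks)      # stack as rows of a 2d array
print(flat.shape)   # (30,)
print(table.shape)  # (10, 3)
```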

Why the length of the array appended in loop is more than the number of iteration?

I ran this code and expected an array of size 10000, since time is a numpy array of length 10000.
freq=np.empty([])
for i,t in enumerate(time):
    freq=np.append(freq,np.sin(t))
print(time.shape)
print(freq.shape)
But this is the output I got
(10000,)
(10001,)
Can someone explain why I am getting this disparity?
It turns out that np.empty() returns an uninitialized array of the given shape. Hence, np.empty([]) returns a 0d array holding a single uninitialized value, for example array(0.14112001). It's like having a value "ready to be used", but without an actual value ever having been set. You can check this by printing freq before the loop starts.
So on the first pass through freq = np.append(freq,np.sin(t)), np.append flattens that 0d array into a one-element 1d array and appends the first sine value to it. After the loop you have the leftover uninitialized value plus 10000 sine values, i.e. 10001 elements.
If you just need an empty array, use x = np.array([]) or x = [].
You can read more about this numpy.empty function here:
https://numpy.org/doc/1.18/reference/generated/numpy.empty.html
And more about initializing arrays here:
https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/language_ref/aryin.html
I'm not sure if I was clear enough. It's not a straightforward concept, so please let me know.
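The off-by-one can be reproduced on a small scale. The sketch assumes a 5-element stand-in for the question's time array; whatever value np.empty([]) happens to contain is meaningless, only the shapes matter.

```python
import numpy as np

time = np.linspace(0, 1, 5)  # stand-in for the question's 10000-element array

freq = np.empty([])           # shape (): a 0d array holding ONE uninitialized value
print(freq.shape, freq.size)  # () 1

for t in time:
    freq = np.append(freq, np.sin(t))

print(freq.shape)  # (6,): the leftover value plus 5 sine values
```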
You should use np.empty(0) instead.
I looked at the numpy source for the empty function in numpy/matlib.py:
def empty(shape, dtype=None, order='C'):
    """Return a new matrix of given shape and type, without initializing entries.
    Parameters
    ----------
    shape : int or tuple of int
        Shape of the empty matrix.
    dtype : data-type, optional
        Desired output data-type.
    order : {'C', 'F'}, optional
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.
    See Also
    --------
    empty_like, zeros
    Notes
    -----
    `empty`, unlike `zeros`, does not set the matrix values to zero,
    and may therefore be marginally faster. On the other hand, it requires
    the user to manually set all the values in the array, and should be
    used with caution.
    Examples
    --------
    >>> import numpy.matlib
    >>> np.matlib.empty((2, 2))    # filled with random data
    matrix([[ 6.76425276e-320, 9.79033856e-307], # random
            [ 7.39337286e-309, 3.22135945e-309]])
    >>> np.matlib.empty((2, 2), dtype=int)
    matrix([[ 6600475, 0], # random
            [ 6586976, 22740995]])
    """
    return ndarray.__new__(matrix, shape, dtype, order=order)
It passes the first argument, shape, straight to ndarray. For a plain ndarray, a shape of [] means shape (), i.e. a 0d array holding one uninitialized value, not an empty array.
You can print np.empty(0) and np.empty([]) to see the difference.
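Comparing the two shapes directly, and rerunning the loop from a zero-length start, shows why np.empty(0) is the right starting point. The 5-element time array is again an assumed stand-in.

```python
import numpy as np

print(np.empty(0).shape)   # (0,): zero elements
print(np.empty([]).shape)  # (): zero dimensions, but one element

time = np.linspace(0, 1, 5)
freq = np.empty(0)  # start with no elements at all
for t in time:
    freq = np.append(freq, np.sin(t))
print(freq.shape)  # (5,)
```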
I think you are trying to replicate a list operation:
freq=[]
for i,t in enumerate(time):
    freq.append(np.sin(t))
But neither np.empty nor np.append is an exact clone; the names are similar, but the differences are significant.
First:
In [75]: np.empty([])
Out[75]: array(1.)
In [77]: np.empty([]).shape
Out[77]: ()
This is a 1 element, 0d array.
If you look at the code for np.append you'll see that if the 1st argument is not 1d (and axis is not provided) it flattens it (that's documented as well):
In [78]: np.append??
In [82]: np.empty([]).ravel()
Out[82]: array([1.])
In [83]: np.empty([]).ravel().shape
Out[83]: (1,)
It is now a 1d, 1 element array. Append that with another array:
In [84]: np.append(np.empty([]), np.sin(2))
Out[84]: array([1. , 0.90929743])
The result has 2 elements. Repeat that 10000 times and you end up with 10001 values.
np.empty, despite its name, does not produce an equivalent of the [] list. As others show, np.array([]) sort of does, as would np.empty(0).
np.append is not a list append clone. It is just a cover function to np.concatenate. It's ok for adding an element to a longer array, but beyond that it has too many pitfalls to be useful. It's especially bad in a loop like this. Getting a correct start array is tricky. And it is slow (compared to list append). Actually these problems apply to all uses of concatenate and stack... in a loop.
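Two loop-friendly alternatives, sketched under the assumption that time is a 1d float array: the list-append version from this answer converts once at the end, and, since np.sin is a ufunc, the loop can also be dropped entirely. The vectorized form is an addition, not part of the original answer.

```python
import numpy as np

time = np.linspace(0, 2 * np.pi, 10000)  # assumed stand-in for the question's data

# 1) Append to a Python list, convert to an array once.
freq_list = []
for t in time:
    freq_list.append(np.sin(t))
freq = np.array(freq_list)

# 2) Vectorized: np.sin works element-wise on the whole array.
freq_vec = np.sin(time)

print(freq.shape, freq_vec.shape)   # (10000,) (10000,)
print(np.allclose(freq, freq_vec))  # True
```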

How to append to a ndarray

I'm new to the NumPy library in Python and I'm not sure what I'm doing wrong here. Could you please help me with this?
So, I initialize my ndarray like this.
A = np.array([])
And then I'm trying to append to this array A a new array X, which has shape (1000,32,32), if that matters.
np.insert(A, X)
The problem here is that if I'm checking the ndarray A after that it's empty, even though the ndarray X has elements inside.
Could you explain what exactly I'm doing wrong, please?
Make sure to write back to A if you use np.append, as in A = np.append(A,X). Top-level numpy functions like np.insert and np.append do not modify their input in place; they return a new array, and it's your job to store it. np.append also flattens its inputs when no axis is given, so honestly, I think you just want a regular list for A. The list append method modifies the list in place, so there is no need to write it back.
>>> A = []
>>> X = np.ndarray((1000,32,32))
>>> A.append(X)
>>> print(A)
[array([[[1.43351171e-316, 4.32573840e-317, 4.58492919e-320, ...,
1.14551501e-259, 6.01347002e-154, 1.39804329e-076],
[1.39803697e-076, 1.39804328e-076, 1.39642638e-076, ...,
1.18295070e-076, 7.06474122e-096, 6.01347002e-154],
[1.39804328e-076, 1.39642638e-076, 1.39804065e-076, ...,
1.05118732e-153, 6.01334510e-154, 3.24245662e-086],
...
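To illustrate the write-back point, and the flattening np.append does without an axis, here is a sketch using the question's X shape. Starting from np.empty((0, 32, 32)) instead of np.array([]) is an assumption about what the asker wants (a growing stack of 32x32 frames).

```python
import numpy as np

X = np.ones((1000, 32, 32))

A = np.array([])
B = np.append(A, X)      # returns a new array; A itself is not modified
print(A.shape, B.shape)  # (0,) (1024000,): flattened, because no axis was given

# A zero-length start with matching trailing dimensions keeps the 3d structure.
A = np.empty((0, 32, 32))
A = np.append(A, X, axis=0)  # the result must be stored back into A
A = np.append(A, X, axis=0)
print(A.shape)  # (2000, 32, 32)
```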
In [10]: A = np.array([])
In [11]: A.shape
Out[11]: (0,)
In [13]: np.concatenate([A, np.ones((2,3))])
---------------------------------------------------------------------------
...
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)
So one of the first things you need to learn about numpy arrays is that they have a shape and a number of dimensions. Hopefully that error message is clear.
Concatenate with another 1d array does work:
In [14]: np.concatenate([A, np.arange(3)])
Out[14]: array([0., 1., 2.])
But that is just np.arange(3). The concatenate does nothing for us. OK, you might imagine starting a loop like this. But don't. This is not efficient.
You could easily concatenate a list of arrays, as long as the dimensions obey the rules specified in the docs. Those rules are logical, as long as you take the dimensions of the arrays seriously.
In [15]: X = np.ones((1000,32,32))
In [16]: np.concatenate([X,X,X], axis=1).shape
Out[16]: (1000, 96, 32)
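Following that advice, a sketch of combining a list of arrays in one call: np.concatenate joins along an existing axis, while np.stack adds a new leading axis. Which one fits depends on whether the asker wants 3000 frames or 3 batches of 1000, which the question does not say.

```python
import numpy as np

X = np.ones((1000, 32, 32))
parts = [X, X, X]  # e.g. collected with list.append inside a loop

joined = np.concatenate(parts)  # along existing axis 0
stacked = np.stack(parts)       # along a new axis 0
print(joined.shape)   # (3000, 32, 32)
print(stacked.shape)  # (3, 1000, 32, 32)
```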

Funky behavior with numpy arrays

I'm hoping someone can explain the following behavior I observe with a numpy array:
>>> import numpy as np
>>> data_block=np.zeros((26,480,1000))
>>> indices=np.arange(1000)
>>> indices.shape
(1000,)
>>> data_block[0,:,:].shape
(480, 1000) #fine and dandy
>>> data_block[0,:,indices].shape
(1000, 480) #what happened???? why the transpose????
>>> ind_slice=np.arange(300) # this is more what I really want.
>>> data_block[0,:,ind_slice].shape
(300, 480) # transpose again! arghhh!
I don't understand this transposing behavior and it is very inconvenient for what I want to do. Could anyone explain it to me? An alternative method for getting that subset of data_block would be a great bonus.
You can achieve your desired result this way:
>>> data_block[0,:,:][:,ind_slice].shape
(480L, 300L)
I confess I don't have a complete understanding of how complicated numpy indexing works, but the documentation seems to hint at the trouble you're having:
Basic slicing with more than one non-: entry in the slicing tuple, acts like repeated application of slicing using a single non-: entry, where the non-: entries are successively taken (with all other non-: entries replaced by :). Thus, x[ind1,...,ind2,:] acts like x[ind1][...,ind2,:] under basic slicing.
Warning: The above is not true for advanced slicing.
and. . .
Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool).
Thus you are triggering that behavior by indexing with your ind_slice array instead of a regular slice.
The documentation itself says that this kind of indexing "can be somewhat mind-boggling to understand", so it's not surprising we both have trouble with this :-).
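A scaled-down check of the behaviour discussed above (the (26, 48, 100) shape and the index values are arbitrary stand-ins for the question's arrays). The scalar 0 and the index array are both advanced indices separated by a slice, so their broadcast dimension moves to the front; indexing with the scalar first, or using np.take, keeps the (rows, selected columns) order.

```python
import numpy as np

data_block = np.zeros((26, 48, 100))
ind = np.array([0, 5, 7])

print(data_block[0, :, ind].shape)                # (3, 48): advanced dims come first
print(data_block[0][:, ind].shape)                # (48, 3): index the scalar first
print(np.take(data_block[0], ind, axis=1).shape)  # (48, 3)
```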
There really is not much to be surprised about once you understand how fancy indexing works. If you have lists or arrays as indices, they must all be of the same shape, or be broadcastable to a common shape. That shape will be the base shape of the return array. If there are indices which are slices, then every entry in the base shape array will be multidimensional, so the base shape gets extended with extra entries. While this may seem a weird choice, it really is the only one consistent with multidimensional fancy indexing. As an example, try to figure out what return shape you would expect from the following:
>>> ind_slice=np.arange(16).reshape(4, 4)
>>> data_block[ind_slice, :, ind_slice].shape
(4, 4, 480) # No, (4, 4, 480, 4, 4) is not a better option
There are several ways to get what you are after. For the particular case in your question, the most obvious would be to not use fancy indexing, as you can get what you ask with slices:
>>> data_block[0, :, :300].shape
(480, 300)
If you do need fancy indexing, you can replace slices with broadcastable arrays:
>>> data_block[0, np.arange(480)[:, None], np.arange(300)].shape
(480, 300)
You may want to take a look at np.ogrid and np.mgrid if you need to replace more complicated slices with arrays.
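np.ix_ builds the same kind of open-mesh index arrays that the manual [:, None] trick produces. The sketch below (smaller stand-in shapes again) shows that both give the (rows, columns) layout.

```python
import numpy as np

data_block = np.zeros((26, 48, 100))
rows = np.arange(48)
cols = np.array([0, 5, 7])

# np.ix_ returns index arrays of shape (48, 1) and (1, 3), which broadcast to (48, 3).
sub_ix = data_block[0][np.ix_(rows, cols)]
# The same broadcasting done by hand.
sub_manual = data_block[0, rows[:, None], cols]
print(sub_ix.shape, sub_manual.shape)  # (48, 3) (48, 3)
```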

What is the difference between ndarray and array in NumPy?

What is the difference between ndarray and array in NumPy? Where is their implementation in the NumPy source code?
numpy.array is just a convenience function to create an ndarray; it is not a class itself.
You can also create an array using numpy.ndarray, but it is not the recommended way. From the docstring of numpy.ndarray:
Arrays should be constructed using array, zeros or empty ... The parameters given here refer to a
low-level method (ndarray(...)) for instantiating an array.
Most of the meat of the implementation is in C code, here in multiarray, but you can start looking at the ndarray interfaces here:
https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py
numpy.array is a function that returns a numpy.ndarray object.
There is no object of type numpy.array.
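A quick interpreter check of this class/function distinction:

```python
import numpy as np

print(isinstance(np.ndarray, type))  # True: ndarray is a class
print(isinstance(np.array, type))    # False: array is a function
print(callable(np.array))            # True

a = np.array([1, 2, 3])
print(type(a) is np.ndarray)         # True: np.array returns ndarray instances
```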
Just a few lines of example code to show the difference between numpy.array and numpy.ndarray
Warm up step: Construct a list
a = [1,2,3]
Check the type
print(type(a))
You will get
<class 'list'>
Construct an array (from a list) using np.array
a = np.array(a)
Or, you can skip the warm up step, directly have
a = np.array([1,2,3])
Check the type
print(type(a))
You will get
<class 'numpy.ndarray'>
which tells you the type of the numpy array is numpy.ndarray
You can also check the type by
isinstance(a, (np.ndarray))
and you will get
True
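Neither of the following two lines does what you intend: np.ndarray treats its first argument as a shape rather than as data, and isinstance raises a TypeError because np.array is a function, not a type.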
np.ndarray(a) # should be np.array(a)
isinstance(a, (np.array)) # should be isinstance(a, (np.ndarray))
numpy.ndarray is a class, while numpy.array() is a function that creates an ndarray.
According to the numpy docs, there are two ways to create an array of the ndarray class:
1. Using the array(), zeros() or empty() functions:
Arrays should be constructed using array, zeros or empty (refer to the See Also section below). The parameters given here refer to a low-level method (ndarray(…)) for instantiating an array.
2. Directly from the ndarray class:
There are two modes of creating an array using __new__:
If buffer is None, then only shape, dtype, and order are used.
If buffer is an object exposing the buffer interface, then all keywords are interpreted.
The example below gives an array of arbitrary values because no buffer is supplied, so the memory is left uninitialized:
np.ndarray(shape=(2,2), dtype=float, order='F', buffer=None)
array([[ -1.13698227e+002, 4.25087011e-303],
[ 2.88528414e-306, 3.27025015e-309]]) #random
Another example passes an array object as the buffer:
>>> np.ndarray((2,), buffer=np.array([1,2,3]),
... offset=np.int_().itemsize,
... dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])
From the example above, note that buffer must be an object exposing the buffer interface: a plain list does not qualify, so numpy.array() is used to produce an ndarray for the buffer.
Conclusion: use numpy.array() if you want to make a numpy.ndarray object.
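The buffer argument accepts any object that exposes the buffer protocol, not only ndarrays. A sketch using a bytearray (chosen here purely for illustration) also shows that the resulting array shares memory with the buffer rather than copying it:

```python
import numpy as np

buf = bytearray([1, 2, 3, 4])  # a writable object exposing the buffer protocol
view = np.ndarray(shape=(4,), dtype=np.uint8, buffer=buf)
print(view.tolist())  # [1, 2, 3, 4]

buf[0] = 99           # modify the underlying bytes...
print(view[0])        # 99: ...and the array sees the change (no copy was made)

copy = np.array(buf, dtype=np.uint8)  # np.array copies into a fresh ndarray
buf[1] = 50
print(copy[1])        # 2: the copy is unaffected
```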
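I thought that with np.array() you can only create C-ordered arrays even when you pass order, because np.isfortran() returned False, whereas np.ndarray() creates the array in whatever order you specify. Note, however, that np.isfortran() returns True only for arrays that are Fortran-contiguous but not C-contiguous, so it is always False for 1d arrays; for 2d input, np.array(..., order='F') does produce a Fortran-ordered array.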
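np.isfortran() reports True only for arrays that are Fortran-contiguous and not C-contiguous, so the dimensionality of the test array matters when checking order. A small check of both constructors:

```python
import numpy as np

one_d = np.array([1, 2, 3], order='F')
two_d = np.array([[1, 2], [3, 4]], order='F')
via_ndarray = np.ndarray(shape=(2, 2), dtype=float, order='F')  # uninitialized values

print(np.isfortran(one_d))        # False: a 1d array is both C- and F-contiguous
print(np.isfortran(two_d))        # True
print(np.isfortran(via_ndarray))  # True
print(two_d.flags['F_CONTIGUOUS'], two_d.flags['C_CONTIGUOUS'])  # True False
```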
