How to append to an ndarray - python

I'm new to the Numpy library for Python and I'm not sure what I'm doing wrong here. Could you please help me with this?
So, I initialize my ndarray like this.
A = np.array([])
And then I'm trying to append to this array A a new array X, which has a shape like (1000,32,32), if that matters.
np.insert(A, X)
The problem here is that if I check the ndarray A after that, it's empty, even though the ndarray X has elements in it.
Could you explain to me what exactly I'm doing wrong, please?

Make sure to write back to A if you use np.append, as in A = np.append(A, X) -- top-level numpy functions like np.insert and np.append don't modify their inputs in place; they return a new array, so it's your job to store the result. Also note that np.append flattens its inputs when you don't pass an axis, so the (1000,32,32) structure of X would be lost. Honestly, I think you just want a regular list for A: the list append method mutates in place, so there's no need to write it back.
>>> A = []
>>> X = np.ndarray((1000,32,32))
>>> A.append(X)
>>> print(A)
[array([[[1.43351171e-316, 4.32573840e-317, 4.58492919e-320, ...,
1.14551501e-259, 6.01347002e-154, 1.39804329e-076],
[1.39803697e-076, 1.39804328e-076, 1.39642638e-076, ...,
1.18295070e-076, 7.06474122e-096, 6.01347002e-154],
[1.39804328e-076, 1.39642638e-076, 1.39804065e-076, ...,
1.05118732e-153, 6.01334510e-154, 3.24245662e-086],
...
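For contrast, here is a minimal sketch of the write-back pattern with np.append, using the shapes from the question; it also shows the flattening mentioned above:
import numpy as np

A = np.array([])
X = np.ones((1000, 32, 32))
A = np.append(A, X)  # must assign back: np.append returns a new array
print(A.shape)       # (1024000,) -- X was flattened to 1d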

In [10]: A = np.array([])
In [11]: A.shape
Out[11]: (0,)
In [13]: np.concatenate([A, np.ones((2,3))])
---------------------------------------------------------------------------
...
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)
So one of the first things you need to learn about numpy arrays is that they have a shape and a number of dimensions. Hopefully that error message is clear.
Concatenate with another 1d array does work:
In [14]: np.concatenate([A, np.arange(3)])
Out[14]: array([0., 1., 2.])
But that is just np.arange(3). The concatenate does nothing for us. OK, you might imagine starting a loop like this. But don't. This is not efficient.
You could easily concatenate a list of arrays, as long as the dimensions obey the rules specified in the docs. Those rules are logical, as long as you take the dimensions of the arrays seriously.
In [15]: X = np.ones((1000,32,32))
In [16]: np.concatenate([X,X,X], axis=1).shape
Out[16]: (1000, 96, 32)
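If the goal really is to "append" X to an initially empty array, one sketch that obeys the dimension rules (using the (1000,32,32) shape from the first question) is to start with an empty array whose trailing dimensions already match:
import numpy as np

A = np.empty((0, 32, 32))           # zero rows, but the right number of dimensions
X = np.ones((1000, 32, 32))
A = np.concatenate([A, X], axis=0)  # shapes agree on all axes except axis 0
print(A.shape)                      # (1000, 32, 32)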

Numpy empty list type inference

Why is the empty list [] being inferred as float type when using np.append?
np.append([1,2,3], [0])
# output: array([1, 2, 3, 0]), dtype = np.int64
np.append([1,2,3], [])
# output: array([1., 2., 3.]), dtype = np.float64
This persists even when using np.array([1,2,3], dtype=np.int32) as arr.
It's not possible to specify a dtype for append, so I am just curious why this happens. Numpy's concatenate does the same thing, but when I try to specify the dtype I get an error:
np.concatenate([[1,2,3], []], dtype=np.int64)
Error:
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'same_kind'
But finally if I set the unsafe casting rule it works:
np.concatenate([[1,2,3], []], dtype=np.int64, casting='unsafe')
Why is [] considered a float?
np.append is subject to well-defined semantic rules like any Numpy binary operation. As a result, it first converts the input operands to Numpy arrays if they are not already (typically with np.array), and then applies the semantic rules to find the type of the resulting array and to check that the operation is valid before performing it (here the concatenation). The array type returned by np.array is "determined as the minimum type required to hold the objects in the sequence" according to the documentation. When the list is empty, as in your case, the default type is numpy.float64, as stated in the documentation of np.empty. This arbitrary choice was made long ago and has not been changed since, so as not to break old code. Note that it seems not all Numpy developers agree with the current choice, so this is a matter of debate. For more information, you can read this open issue.
The rule of thumb is to use either existing Numpy arrays or to perform an explicit conversion to a Numpy array using np.array with a fixed dtype parameter (as described in the above comments).
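As a minimal sketch of that rule of thumb, converting the empty operand explicitly with a fixed dtype sidesteps the float64 default:
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
b = np.array([], dtype=np.int64)     # empty, but with an explicit integer dtype
print(np.concatenate([a, b]).dtype)  # int64 -- no casting rule needed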
Look at the code for np.append (via docs link or ipython):
def append(arr, values, axis=None):
    arr = asanyarray(arr)
    if axis is None:
        if arr.ndim != 1:
            arr = arr.ravel()
        values = ravel(values)
        axis = arr.ndim-1
    return concatenate((arr, values), axis=axis)
The first argument is turned into an array, if it isn't one already.
You don't specify the axis, so both arr and values are ravelled - turned into 1d arrays. np.ravel is also Python code, and does asanyarray(a).ravel(order=order).
So the dtype inference is done by np.asanyarray.
The rest of the action is np.concatenate. It too will convert the inputs to arrays if necessary. The result dtype is the "highest" of the inputs.
np.append is a poorly conceived (IMO) alternative way of using np.concatenate. It is not a list append clone.
Also be careful about "empty" arrays:
In [73]: np.array([])
Out[73]: array([], dtype=float64)
In [74]: np.empty((0))
Out[74]: array([], dtype=float64)
In [75]: np.empty((0),int)
Out[75]: array([], dtype=int64)
The common list idiom
alist = []
for i in range(10):
    alist.append(i)
does not translate well into numpy. Build a list of arrays, and do one concatenate/vstack at the end. Don't iterate over "empty" arrays, however created.
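A minimal sketch of that pattern (the chunk shape here is made up for illustration):
import numpy as np

chunks = []                            # a plain Python list as the accumulator
for _ in range(10):
    chunks.append(np.ones((100, 32)))  # list.append is cheap; no copying yet
result = np.vstack(chunks)             # one combine at the end
print(result.shape)                    # (1000, 32)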

Why the length of the array appended in loop is more than the number of iteration?

I ran this code and expected an array of size 10000, since time is a numpy array of length 10000.
freq=np.empty([])
for i,t in enumerate(time):
    freq=np.append(freq,np.sin(t))
print(time.shape)
print(freq.shape)
But this is the output I got
(10000,)
(10001,)
Can someone explain why I am getting this disparity?
It turns out that the function np.empty() returns an uninitialized array of the given shape. Hence, when you do np.empty([]), it returns a 0-d array holding a single arbitrary, uninitialized value, e.g. array(0.14112001). It's like having a slot "ready to be used", but without a meaningful value in it yet. You can check this out by printing the variable freq before the loop starts.
So, when you run freq = np.append(freq, np.sin(t)) in the loop, the first iteration keeps that stray value and appends np.sin(t) after it, which is why you end up with one extra element.
Also, if you need to create an empty array, just do x = np.array([]) or x = [].
You can read more about this numpy.empty function here:
https://numpy.org/doc/1.18/reference/generated/numpy.empty.html
And more about initializing arrays here:
https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/language_ref/aryin.html
I'm not sure if I was clear enough; it's not a straightforward concept. So please let me know.
You should use np.empty(0) instead.
I looked at the source code of numpy (this is np.matlib.empty, in numpy/matlib.py):
def empty(shape, dtype=None, order='C'):
    """Return a new matrix of given shape and type, without initializing entries.

    Parameters
    ----------
    shape : int or tuple of int
        Shape of the empty matrix.
    dtype : data-type, optional
        Desired output data-type.
    order : {'C', 'F'}, optional
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in memory.

    See Also
    --------
    empty_like, zeros

    Notes
    -----
    `empty`, unlike `zeros`, does not set the matrix values to zero,
    and may therefore be marginally faster. On the other hand, it requires
    the user to manually set all the values in the array, and should be
    used with caution.

    Examples
    --------
    >>> import numpy.matlib
    >>> np.matlib.empty((2, 2))    # filled with random data
    matrix([[ 6.76425276e-320,  9.79033856e-307], # random
            [ 7.39337286e-309,  3.22135945e-309]])
    >>> np.matlib.empty((2, 2), dtype=int)
    matrix([[ 6600475,        0], # random
            [ 6586976, 22740995]])
    """
    return ndarray.__new__(matrix, shape, dtype, order=order)
It passes the first arg shape straight into ndarray.__new__, so np.empty(0) inits a new empty array like [].
You can print np.empty(0) and np.empty([]) to see the difference.
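For instance, a quick sketch of that comparison:
import numpy as np

print(np.empty(0), np.empty(0).shape)    # [] (0,)    -- truly empty, 0 elements
print(np.empty([]), np.empty([]).shape)  # e.g. 1.0 () -- 0d, one uninitialized value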
I think you are trying to replicate a list operation:
freq=[]
for i,t in enumerate(time):
    freq.append(np.sin(t))
But neither np.empty nor np.append is an exact clone; the names are similar but the differences are significant.
First:
In [75]: np.empty([])
Out[75]: array(1.)
In [77]: np.empty([]).shape
Out[77]: ()
This is a 1 element, 0d array.
If you look at the code for np.append you'll see that if the 1st argument is not 1d (and axis is not provided) it flattens it (that's documented as well):
In [78]: np.append??
In [82]: np.empty([]).ravel()
Out[82]: array([1.])
In [83]: np.empty([]).ravel().shape
Out[83]: (1,)
It is now a 1d, 1 element array. Append that with another value:
In [84]: np.append(np.empty([]), np.sin(2))
Out[84]: array([1. , 0.90929743])
The result has 2 elements. Repeat that 10000 times and you end up with 10001 values.
np.empty, despite its name, does not produce a [] list equivalent. As others show, np.array([]) sort of does, as would np.empty(0).
np.append is not a list append clone. It is just a cover function to np.concatenate. It's ok for adding an element to a longer array, but beyond that it has too many pitfalls to be useful. It's especially bad in a loop like this. Getting a correct start array is tricky. And it is slow (compared to list append). Actually these problems apply to all uses of concatenate and stack... in a loop.
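Applied to this question, a sketch of the fix (time here is a made-up stand-in for the OP's 10000-element array):
import numpy as np

time = np.linspace(0, 1, 10000)  # stand-in for the question's time array

# Best: skip the loop entirely, since np.sin is vectorized.
freq = np.sin(time)

# If a loop were unavoidable: accumulate in a list, convert once at the end.
freq = np.array([np.sin(t) for t in time])
print(freq.shape)                # (10000,) -- no stray extra element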

Is a set converted to numpy array?

I am creating a ndarray using:
import numpy as np
arr=np.array({1,2})
print(arr, type(arr))
which outputs
{1, 2} <class 'numpy.ndarray'>
If its type is numpy.ndarray, then shouldn't the output be in square brackets, like [1,2]?
Thanks
Yes, but that's because you passed the function np.array a set and not a list.
if you try this:
import numpy as np
arr=np.array([1,2])
print(arr, type(arr))
you get:
[1 2] <class 'numpy.ndarray'>
This does something slightly different than you might imagine. Instead of constructing an array with the data you specify, the numbers 1 and 2, you're actually building an array of type object. See below:
>>> np.array({1, 2}).dtype
dtype('O')
This is because sets are not "array-like", in NumPy's terminology, in particular they are not ordered. Thus the array construction does not build an array with the contents of the set, but with the set itself as a single object.
If you really want to build an array from the set's contents you could do the following:
>>> x = np.fromiter(iter({1, 2}), dtype=int)
>>> x.dtype
dtype('int64')
Edit: This answer helps explain how various types are used to build an array in NumPy.
It returns a numpy array object with no dimensions. A set is an object. It is similar to passing numpy.array a number (without brackets). See the difference here:
arr=np.array([1])
arr.shape: (1,)
arr=np.array(1)
arr.shape: ()
arr=np.array({1,2})
arr.shape: ()
Therefore, it treats your entire set as a single object and creates a numpy array with no dimensions whose only element is the set itself. Sets are not array-like and do not have order, hence according to the numpy array doc they are not converted to arrays the way you expect. If you wish to create a numpy array from a set and you do not care about its order, use:
arr=np.fromiter({1,2},int)
arr.shape: (2,)
The repr display of ipython may make this clearer:
In [162]: arr=np.array({1,2})
In [163]: arr
Out[163]: array({1, 2}, dtype=object)
arr is a 0d array, object dtype, containing 1 item, the set.
But if we first turn the set into a list:
In [164]: arr=np.array(list({1,2}))
In [165]: arr
Out[165]: array([1, 2])
now we have a 1d (2,) integer dtype array.
np.array(...) converts list (and list-like) arguments into a multidimensional array. A set is not sufficiently list-like.
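If you do need a real 1d array from a set, a small sketch of the options discussed above (sorted() is an extra step added here for a deterministic element order, since sets are unordered):
import numpy as np

s = {1, 2}
a = np.array(list(s))          # convert to a list first
b = np.fromiter(s, dtype=int)  # or build from an iterator with a fixed dtype
c = np.array(sorted(s))        # sorted() pins down the element order
print(a, b, c)                 # all are 1d integer arrays of shape (2,)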

Recursively defining an N-dimensional numpy array

I am trying to recursively define a numpy array of N dimensions. After researching for several hours, I have come across a couple of ways this might work (np.append and np.concatenate), however neither of these has given me the desired output. I've been getting either:
[-0.6778734 -0.73517866 -0.73517866 0.6778734 ] (1-d array) or
[array([-0.6778734 , -0.73517866]), array([-0.73517866, 0.6778734 ])] (a list of arrays)
My Input:
[(1.2840277121727839, array([-0.6778734, -0.73517866])),
(0.049083398938327472, array([-0.73517866, 0.6778734 ]))]
Desired output:
array([-0.6778734, -0.73517866], [-0.73517866, 0.6778734])
Is it possible to create a numpy array directly from arrays? Converting them to lists and back to arrays seems computationally inefficient.
Thanks in advance!
Your input is a list of tuples, each tuple consisting of a number and an array. For some reason you want to throw away the number, and just combine the arrays into a larger array - is that right?
In [1067]: x=[(1.2840277121727839, np.array([-0.6778734, -0.73517866])),
(0.049083398938327472, np.array([-0.73517866, 0.6778734 ]))]
In [1068]: x
Out[1068]:
[(1.2840277121727839, array([-0.6778734 , -0.73517866])),
(0.04908339893832747, array([-0.73517866, 0.6778734 ]))]
A list comprehension does a nice job of extracting the desired elements from the tuples:
In [1069]: [y[1] for y in x]
Out[1069]: [array([-0.6778734 , -0.73517866]), array([-0.73517866, 0.6778734 ])]
and vstack is great for combining arrays into a larger one.
In [1070]: np.vstack([y[1] for y in x])
Out[1070]:
array([[-0.6778734 , -0.73517866],
[-0.73517866, 0.6778734 ]])
vstack is just concatenate with an added step that ensures the inputs are 2d.
np.array([y[1] for y in x]) also works, since you are adding a dimension.
I'm assuming that array([-0.6778734, -0.73517866], [-0.73517866, 0.6778734]) has a typo - that it is missing a set of []. The 2nd parameter to np.array is the dtype, not another list.
Note that both np.array and np.concatenate take a list. It can be a list of lists, or a list of arrays. It doesn't make much difference. And at this stage don't worry about computational efficiency. Any time you combine the data from 2 or more arrays there will be copying. Arrays have a fixed size, and can't 'grow' without making a new copy.
In [1074]: np.concatenate([y[1] for y in x]).reshape(2,2)
Out[1074]:
array([[-0.6778734 , -0.73517866],
[-0.73517866, 0.6778734 ]])
The input arrays are 1d, so np.concatenate joins them on that dimension, producing a 4 element 1d array. reshape corrects that. vstack makes them both (1,2) and does a concatenate on the 1st dimension.
Another expression that joins the arrays on a new dimension:
np.concatenate([y[1][None,...] for y in x], axis=0)
The [None,...] adds a new dimension at the start.
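A quick sketch of what [None,...] does, reusing a value from the example:
import numpy as np

a = np.array([-0.6778734, -0.73517866])
print(a.shape)             # (2,)
print(a[None, ...].shape)  # (1, 2) -- None (np.newaxis) adds a leading axis
print(np.concatenate([a[None, ...], a[None, ...]], axis=0).shape)  # (2, 2)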
Try this:
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(repr(np.vstack((a, b))))
Gives:
array([[1, 2],
[3, 4]])
You can form the desired 2D array given a list input_data of the form
input_data = [(1.2840277121727839, np.array([-0.6778734, -0.73517866])),
(0.049083398938327472, np.array([-0.73517866, 0.6778734 ]))]
via
nparr = np.array(list(row[1] for row in input_data))

Numpy matrix row stacking

I have 4 arrays (all the same length) which I am trying to stack together to create a new array, with each of the 4 arrays being a row.
My first thought was this:
B = -np.array([[x1[i]],[x2[j]],[y1[i]],[y2[j]]])
However the shape of that is (4,1,20).
To get the 2D output I expected I resorted to this:
B = -np.vstack((np.vstack((np.vstack(([x1[i]],[x2[j]])),[y1[i]])),[y2[j]]))
Where the shape is (4,20).
Is there a better way to do this? And why would the first method not work?
Edit
For clarity, the shapes of x1[i], x2[j], y1[i], y2[j] are all (20,).
The problem is with the extra brackets:
B = -np.array([[x1[i]],[x2[j]],[y1[i]],[y2[j]]]) # (4,1,20)
B = -np.array([x1[i],x2[j],y1[i],y2[j]]) # (4,20)
[x1[i]] is (1,20) in shape.
In [26]: np.array([np.ones((20,)),np.zeros((20,))]).shape
Out[26]: (2, 20)
vstack works, but np.array does just as well. It's concatenate that needs the extra brackets:
In [27]: np.vstack([np.ones((20,)),np.zeros((20,))]).shape
Out[27]: (2, 20)
In [28]: np.concatenate([np.ones((20,)),np.zeros((20,))]).shape
Out[28]: (40,)
In [29]: np.concatenate([[np.ones((20,))],[np.zeros((20,))]]).shape
Out[29]: (2, 20)
vstack doesn't need the extra dimensions because it first passes the arrays through [atleast_2d(_m) for _m in tup].
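A quick check of that atleast_2d step:
import numpy as np

print(np.atleast_2d(np.ones(20)).shape)  # (1, 20) -- a 1d input becomes a row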
np.vstack takes a sequence of arrays to stack, one on top of the other, as long as they have compatible shapes. So in your case, a tuple of the one-dimensional arrays:
np.vstack((x1[i], x2[j], y1[i], y2[j]))
would do what you want. If this statement is part of a loop building many such 4x20 arrays, however, that may be a different matter.
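For reference, a minimal sketch comparing the options from the answers (rows is a hypothetical stand-in for x1[i], x2[j], y1[i], y2[j], each of shape (20,)):
import numpy as np

rows = [np.ones(20), np.zeros(20), np.ones(20), np.zeros(20)]  # four (20,) arrays
B1 = -np.array(rows)   # (4, 20) -- np.array stacks the 1d inputs into 2d
B2 = -np.vstack(rows)  # (4, 20) -- same result via vstack
print(B1.shape, B2.shape)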
