I am testing some edge cases of my program and observed a strange fact. When I create a scalar numpy array, it has size==1 and ndim==0.
>>> A=np.array(1.0)
>>> A.ndim # returns 0
>>> A.size # returns 1
But when I create an empty array with no elements, it has size==0 but ndim==1.
>>> A=np.array([])
>>> A.ndim # returns 1
>>> A.size # returns 0
Why is that? I would expect ndim to be 0 as well. Or is there another way to create a 'really' empty array with both size and ndim equal to 0?
UPDATE: even A=np.empty(shape=None) does not create a dimensionless array of size 0...
I believe the answer is that "No, you can't create an ndarray with both ndim and size of zero". As you've already found out yourself, the (ndim,size) pairs of (1,0) and (0,1) are as low as you can go.
This very nice answer explains a lot about numpy scalar types, and why they're a bit odd to have around. This explanation makes it clear that scalar numpy arrays like array(1) are a very special kind of beast. They only have a single value (causing size==1), but by definition they don't have a sense of dimensionality, hence ndim==0. Non-scalar numpy arrays, on the other hand, can be empty, but they contain at least a pair of square brackets, leading to a minimal ndim of 1, even if their size can be 0 if they are made up of empty lists. (This is how I think about the situation: ndarrays are in a way lists of lists of lists of ..., on as many levels as there are dimensions. 1d arrays are compatible with lists, so an empty list, being still a list, also has a defining dimension.)
The only way to come up with an empty scalar would be to call np.array() with no arguments at all, but that raises an error, because arrays can only be initialized from some actual object. So I believe your program is safe from this edge case.
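As a quick check of the two cases, here is a minimal sketch. It relies on the fact that size is the product of the entries of shape, so a 0-d array (empty shape, empty product) always has size 1. The variable names are only illustrative.

```python
import math
import numpy as np

scalar = np.array(1.0)        # 0-d array: shape ()
empty = np.array([])          # 1-d array with no elements: shape (0,)
empty_2d = np.empty((0, 3))   # 2-d array with no elements

for a in (scalar, empty, empty_2d):
    # size is always the product of the shape entries;
    # math.prod(()) is 1, which is why ndim == 0 forces size == 1
    assert a.size == math.prod(a.shape)
    print(a.ndim, a.shape, a.size)
```

A size of 0 therefore needs at least one axis of length 0, which in turn needs ndim >= 1.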
I'm having some trouble understanding the rules for array broadcasting in Numpy.
Obviously, if you perform element-wise multiplication on two arrays of the same dimensions and shape, everything is fine. Also, if you multiply a multi-dimensional array by a scalar it works. This I understand.
But if you have two N-dimensional arrays of different shapes, it's unclear to me exactly what the broadcasting rules are. This documentation/tutorial explains that: In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.
Okay, so I assume by trailing axis they are referring to the N in an M x N array. So, that means if I attempt to multiply two 2D arrays (matrices) with an equal number of columns, it should work? Except it doesn't...
>>> from numpy import *
>>> A = array([[1,2],[3,4]])
>>> B = array([[2,3],[4,6],[6,9],[8,12]])
>>> print(A)
[[1 2]
 [3 4]]
>>> print(B)
[[ 2  3]
 [ 4  6]
 [ 6  9]
 [ 8 12]]
>>>
>>> A * B
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Since both A and B have two columns, I would have thought this would work. So, I'm probably misunderstanding something here about the term "trailing axis", and how it applies to N-dimensional arrays.
Can someone explain why my example doesn't work, and what is meant by "trailing axis"?
Well, the meaning of trailing axes is explained on the linked documentation page.
If you have two arrays with a different number of dimensions, say one 1x2x3 and the other 2x3, then you compare only the trailing common dimensions, in this case 2x3. But if both your arrays are two-dimensional, then their corresponding sizes have to be either equal or one of them has to be 1. Dimensions along which the array has size 1 are called singular, and the array can be broadcast along them.
In your case you have a 2x2 and a 4x2 array; 4 != 2 and neither 4 nor 2 equals 1, so this doesn't work.
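To make this concrete, here is a small sketch using the arrays from the question. The names failed, row_product and stacked are only illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])                     # shape (2, 2)
B = np.array([[2, 3], [4, 6], [6, 9], [8, 12]])    # shape (4, 2)

# (2, 2) against (4, 2): the leading sizes 2 and 4 differ and neither is 1
try:
    A * B
    failed = False
except ValueError:
    failed = True
print(failed)

# (4, 2) against (2,): the single row is treated as shape (1, 2) and
# stretched along the first axis
row_product = B * A[0]
print(row_product.shape)

# (1, 2, 3) against (2, 3): only the trailing 2x3 part has to match
stacked = np.ones((1, 2, 3)) + np.ones((2, 3))
print(stacked.shape)
```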
From http://cs231n.github.io/python-numpy-tutorial/#numpy-broadcasting:
Broadcasting two arrays together follows these rules:
1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
If this explanation does not make sense, try reading the explanation from the documentation or this explanation.
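The rules above can be sketched in a few lines of plain Python. The helper broadcast_shape is hypothetical (it is not a NumPy function) and only mirrors the rules as quoted; np.broadcast is used at the end to compare against what NumPy itself reports.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or None if incompatible."""
    ndim = max(len(shape_a), len(shape_b))
    # Prepend 1s to the shorter shape until both have the same length
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for m, n in zip(a, b):
        # Compatible only if the sizes are equal or one of them is 1
        if m != n and m != 1 and n != 1:
            return None
        # Keep the size that is not 1 (the maximum for non-empty axes)
        result.append(n if m == 1 else m)
    return tuple(result)

print(broadcast_shape((2, 2), (4, 2)))
print(broadcast_shape((1, 2, 3), (2, 3)))
print(broadcast_shape((4, 1), (3,)))
print(np.broadcast(np.empty((4, 1)), np.empty((3,))).shape)
```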
We should consider two points about broadcasting. First: what is possible. Second: how much of what is possible is actually done by numpy.
I know it might look a bit confusing, but I will make it clear with an example.
Let's start from the beginning.
Suppose we have two arrays: A has three dimensions and B has five. numpy tries to match the last (trailing) dimensions, so it does not care about the first two dimensions of B. numpy then compares the trailing dimensions pairwise, and if and only if each pair is equal or one of the two is 1, numpy says "O.K., you two match". If these conditions are not satisfied, numpy says "sorry... it's not my job!".
You may say that the comparison should also handle sizes that are divisible (4 and 2, or 9 and 3), so that the smaller array is replicated a whole number of times (2 or 3 in our example). I agree with you, and this is the reason I started my discussion with a distinction between what is possible and what numpy is capable of.
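If you do want the "replicate a whole number of times" behaviour, you have to ask for it explicitly. A minimal sketch with np.tile, reusing the 2x2 and 4x2 arrays from the question (the names A_tiled and product are only illustrative):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])                     # shape (2, 2)
B = np.array([[2, 3], [4, 6], [6, 9], [8, 12]])    # shape (4, 2)

# Repeat A twice along the first axis so that it matches B's 4 rows
A_tiled = np.tile(A, (2, 1))                       # shape (4, 2)
product = A_tiled * B
print(product)
```

Broadcasting never does this on its own: it only stretches axes of length 1.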
Recently I used the numpy argmax function which gives the index of the maximum value in a numpy array.
Due to some circumstances I found out that when used with a scalar it just gives out 0, so like this:
np.argmax(3) # equals 0
np.argmax(1000) #equals 0
which makes sense of course, since there is only one index - but is there an actual application where one needs to find the maximum index of a scalar?
I think this is just for consistency as explained in the documentation on scalars:
Array scalars have the same attributes and methods as ndarrays. This
allows one to treat items of an array partly on the same footing as
arrays, smoothing out rough edges that result when mixing scalar and
array operations.
When you don't specify axis in argmax it returns the index into the flattened array, so even in this case the scalar is internally viewed as a 0D array.
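A short sketch of the flattened-index behaviour. The array grid is just an example; np.unravel_index converts the flat index back into a row/column position.

```python
import numpy as np

grid = np.array([[1, 9],
                 [4, 2]])

flat_index = np.argmax(grid)                      # index into the flattened array
position = np.unravel_index(flat_index, grid.shape)
print(flat_index, position)

# A scalar is treated as a 0-d array with a single element at flat index 0
scalar_index = np.argmax(3)
print(np.ndim(3), scalar_index)
```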
I ran this code and expected an array of size 10000, as time is a numpy array of length 10000.
freq=np.empty([])
for i,t in enumerate(time):
    freq=np.append(freq,np.sin(t))
print(time.shape)
print(freq.shape)
But this is the output I got
(10000,)
(10001,)
Can someone explain why I am getting this disparity?
It turns out that np.empty() returns an uninitialized array of the given shape. np.empty([]) asks for the shape (), so it returns an uninitialized 0-d array that still holds one (arbitrary) value, for example array(0.14112001). You can check this by printing the variable freq before the loop starts.
So the first call to freq = np.append(freq, np.sin(t)) flattens that 0-d array into a 1-element array and appends a second value to it, which is why you end up with one element more than expected.
Also, if you just need to create an empty array just do x = np.array([]) or x = [].
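Here is a small sketch of the difference, using a 5-element stand-in for time (an assumption, to keep the output short). The names bad and good are only illustrative.

```python
import numpy as np

time = np.linspace(0.0, 1.0, 5)   # stand-in for the 10000-element array

bad = np.empty([])                # shape (): holds one uninitialized value
good = np.empty(0)                # shape (0,): holds no values at all

for t in time:
    bad = np.append(bad, np.sin(t))
    good = np.append(good, np.sin(t))

print(bad.shape)    # one element more than time
print(good.shape)   # same length as time
```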
You can read more about this numpy.empty function here:
https://numpy.org/doc/1.18/reference/generated/numpy.empty.html
And more about initializing arrays here:
https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/language_ref/aryin.html
I'm not sure if I was clear enough. It's not a straightforward concept, so please let me know.
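You should use np.empty(0) instead.
I looked at the numpy source code for empty (the docstring below is from the numpy.matlib variant, which returns a matrix rather than a plain ndarray):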
def empty(shape, dtype=None, order='C'):
    """Return a new matrix of given shape and type, without initializing entries.

    Parameters
    ----------
    shape : int or tuple of int
        Shape of the empty matrix.
    dtype : data-type, optional
        Desired output data-type.
    order : {'C', 'F'}, optional
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.

    See Also
    --------
    empty_like, zeros

    Notes
    -----
    `empty`, unlike `zeros`, does not set the matrix values to zero,
    and may therefore be marginally faster. On the other hand, it requires
    the user to manually set all the values in the array, and should be
    used with caution.

    Examples
    --------
    >>> import numpy.matlib
    >>> np.matlib.empty((2, 2)) # filled with random data
    matrix([[ 6.76425276e-320, 9.79033856e-307], # random
            [ 7.39337286e-309, 3.22135945e-309]])
    >>> np.matlib.empty((2, 2), dtype=int)
    matrix([[ 6600475, 0], # random
            [ 6586976, 22740995]])
    """
    return ndarray.__new__(matrix, shape, dtype, order=order)
It passes its first argument, shape, straight to ndarray, so np.empty(0) creates a new array with zero elements, which prints as [].
You can print np.empty(0) and np.empty([]) to see how they differ.
I think you are trying to replicate a list operation:
freq=[]
for i,t in enumerate(time):
    freq.append(np.sin(t))
But neither np.empty nor np.append is an exact clone; the names are similar but the differences are significant.
First:
In [75]: np.empty([])
Out[75]: array(1.)
In [77]: np.empty([]).shape
Out[77]: ()
This is a 1 element, 0d array.
If you look at the code for np.append you'll see that if the 1st argument is not 1d (and axis is not provided) it flattens it (that's documented as well):
In [78]: np.append??
In [82]: np.empty([]).ravel()
Out[82]: array([1.])
In [83]: np.empty([]).ravel().shape
Out[83]: (1,)
It is now a 1d, 1 element array. Append another array to that:
In [84]: np.append(np.empty([]), np.sin(2))
Out[84]: array([1. , 0.90929743])
The result is 1d with 2 elements. Repeat that 10000 times and you end up with 10001 values.
np.empty despite its name does not produce a [] list equivalent. As others show np.array([]) sort of does, as would np.empty(0).
np.append is not a list append clone. It is just a cover function to np.concatenate. It's ok for adding an element to a longer array, but beyond that it has too many pitfalls to be useful. It's especially bad in a loop like this. Getting a correct start array is tricky. And it is slow (compared to list append). Actually these problems apply to all uses of concatenate and stack... in a loop.
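For comparison, here is a sketch of the two usual alternatives to np.append in a loop: collect the values in a Python list and convert once, or skip the loop and apply np.sin to the whole array. The linspace call is only an assumed stand-in for the question's time array.

```python
import numpy as np

time = np.linspace(0.0, 1.0, 10000)   # assumed stand-in for the real data

# Option 1: list append in the loop, one conversion at the end
values = []
for t in time:
    values.append(np.sin(t))
freq_from_list = np.array(values)

# Option 2: no loop at all; np.sin works element-wise on arrays
freq = np.sin(time)

print(freq_from_list.shape, freq.shape)
```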
Hello, I have the following question. I create zero arrays of shape (40,30,80). Now I need 7*7*7 of these zero arrays in one array. How can I do this?
One of my matrices is created like this:
import numpy as np
zeroMatrix = np.zeros((40,30,80))
My first method was to put the zero matrices in a 7*7*7 list, but I want to have it all in a numpy array. I think there is a way with structured arrays, but I don't know how. If I copy my 7*7*7 list with np.copy() it creates a numpy array with the given shape, but there must be a way to do this directly, isn't there?
EDIT
Maybe I have to make my question clearer. I have a 7*7 list of my zero matrices. In a for loop all of those arrays will be modified. In another step, this temporary list is appended to an empty list, which will have a length of 7 in the end (so I append the 7*7 list 7 times to the empty list). In the end I have a 7*7*7 list of those matrices. But I think it would be better to have a numpy array of these zero matrices from the beginning.
Building an array of same-shaped arrays is not well supported by numpy which prefers to create a maximum depth array of minimum depth elements instead.
It turns out that numpy.frompyfunc is quite useful in circumventing this tendency where it is unwanted.
In your specific case one could do:
result = np.frompyfunc(zeroMatrix.copy, 0, 1)(np.empty((7, 7, 7), object))
Indeed:
>>> result.shape
(7, 7, 7)
>>> result.dtype
dtype('O')
>>> result[0, 0, 0].shape
(40, 30, 80)
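A scaled-down sketch of the same idea, with smaller shapes assumed so that it runs quickly. It checks that every cell holds its own independent copy, and contrasts that with a single dense array that has the extra axes built in, which may be simpler if all blocks always share one shape. The names zero_block, grid and dense are only illustrative.

```python
import numpy as np

zero_block = np.zeros((4, 3, 8))   # small stand-in for the (40, 30, 80) array

# Object array: each cell receives a separate copy of zero_block
grid = np.frompyfunc(zero_block.copy, 0, 1)(np.empty((2, 2, 2), object))

grid[0, 0, 0][0, 0, 0] = 5.0       # modify one block only
print(grid[0, 0, 0][0, 0, 0], grid[0, 0, 1][0, 0, 0])

# Alternative: one ordinary float array with six axes
dense = np.zeros((2, 2, 2) + zero_block.shape)
print(dense.shape, dense[0, 0, 0].shape)
```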
I'm messing around with 2-dimensional slicing and don't understand why leaving out some defaults grabs the same values from the original array but produces different output. What's going on with the double brackets and shape changing?
import numpy as np
x = np.arange(9).reshape(3,3)
y = x[2]
z = x[2:,:]
print(y)
print(z)
print(np.shape(y))
print(np.shape(z))
[6 7 8]
[[6 7 8]]
(3,)
(1, 3)
x is a two dimensional array, an instance of NumPy's ndarray object. You can index/slice these objects in essentially two ways: basic and advanced.
x[2] fetches the row at index 2 of the array, returning the array [6 7 8]. You're doing basic indexing because you've specified only an integer. You can also specify a tuple of slice objects and integers for basic indexing, e.g. x[:,2] to select the right-hand column.
With an integer index, you're also reducing the number of dimensions of the returned object (in this case from two to just one):
An integer, i, returns the same values as i:i+1 except the dimensionality of the returned object is reduced by 1.
So when you ask for the shape of y, this is why you only get back one dimension (from your two-dimensional x).
x[2:,:] is also basic indexing: 2: and : are slice objects, and a slice keeps the dimension it is applied to, even when it selects only a single row. Advanced indexing occurs only when the index contains an ndarray of integers or booleans, or a non-tuple sequence such as a list, e.g. x[[2]].
You get back an ndarray with both dimensions, which is why the shape is (1, 3) and the output is printed with double brackets. For advanced indexing, by contrast, the documentation says:
The shape of the output (or the needed shape of the object to be used for setting) is the broadcasted shape.
In a nutshell, an integer index removes the dimension it is applied to, whereas a slice (anything written with :) keeps that dimension, even if the slice has length 1.
One brief point worth mentioning: basic indexing and slicing return a view onto the original array (changes made to y or z will be reflected in x). Advanced indexing returns a brand new copy of the array.
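A quick way to check shapes and view-versus-copy behaviour for yourself is np.shares_memory. In this sketch w is an extra example (not from the question) that uses a list index, which triggers advanced indexing.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

y = x[2]        # integer index: that dimension is dropped
z = x[2:, :]    # slices only: both dimensions are kept
w = x[[2]]      # list index: advanced indexing

print(y.shape, z.shape, w.shape)

# Basic indexing gives views on x, advanced indexing gives a copy
print(np.shares_memory(x, y))
print(np.shares_memory(x, z))
print(np.shares_memory(x, w))
```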
You can read about array indexing and slicing in much more detail here.