How to interpret the shape of the array in Python?


I am using a package and it is returning me an array. When I print the shape it is (38845,). Just wondering why this ','.
I am wondering how to interpret this.
Thanks.

Python has tuples, which are like lists but of fixed size. A two-element tuple is (a, b); a three-element one is (a, b, c). However, (a) is just a in parentheses. To represent a one-element tuple, Python uses a slightly odd syntax of (a,). So there is only one dimension, and you have a bunch of elements in that one dimension.
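As a small illustration of the syntax described above (plain Python, no NumPy needed; the variable names are only for this sketch):

```python
# Parentheses alone only group an expression; the comma is what makes a tuple.
grouped = (38845)      # just the int 38845
one_tuple = (38845,)   # a tuple holding one element

print(type(grouped).__name__)    # int
print(type(one_tuple).__name__)  # tuple
print(len(one_tuple))            # 1
```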

It sounds like you're using Numpy. If so, the shape (38845,) means you have a 1-dimensional array, of size 38845.

It seems you're talking of a Numpy array.
shape returns a tuple with as many elements as the array has dimensions. Each value of the tuple is the size of the array along the corresponding dimension, or, as the tutorial says:
An array has a shape given by the number of elements along each axis.
Here you have a 1D array (as indicated by the 1-element tuple notation with the comma, as @Amadan said), and the size of the first (and only) dimension is 38845.
For example (3,4) would be a 2D-array of size 3 for the 1st dimension and 4 for the second.
You can check the documentation for shape here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html
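A quick way to see this for yourself, assuming NumPy is imported as np (the arrays here are made up with np.zeros just to get the shapes from the text):

```python
import numpy as np

a = np.zeros(38845)     # 1-D array with 38845 elements
b = np.zeros((3, 4))    # 2-D array: size 3 along axis 0, size 4 along axis 1

print(a.shape)               # (38845,)
print(b.shape)               # (3, 4)
print(a.ndim, len(a.shape))  # 1 1
print(a.shape[0])            # 38845
```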

Just wondering why this ','.
Because (38845) is the same thing as 38845, but a tuple is expected here, not an int (since in general, your array could have multiple dimensions). (38845,) is a 1-tuple.

Related

Numpy slice of first arbitrary dimensions

There is this great Question/Answer about slicing the last dimension:
Numpy slice of arbitrary dimensions: for slicing a numpy array to obtain the i-th index in the last dimension, one can use ... or Ellipsis,
slice = myarray[...,i]
What if the first N dimensions are needed ?
For 3D myarray, N=2:
slice = myarray[:,:,0]
For 4D myarray, N=2:
slice = myarray[:,:,0,0]
Can this be generalized to an arbitrary number of dimensions?

I don't think there's any built-in syntactic sugar for that, but slices are just objects like anything else. The slice(None) object is what is created from :, and otherwise just picking the index 0 works fine.
myarray[(slice(None),)*N+(0,)*(myarray.ndim-N)]
Note the comma in (slice(None),). Python doesn't create tuples from parentheses by default unless the parentheses are empty. The comma signifies that you want a tuple and don't just want to evaluate whatever is inside the parentheses.
Slices are nice because they give you a view into the object instead of a copy of the object. You can use the same idea to, e.g., iterate over everything except the N-th dimension on the N-th dimension. There have been some stackoverflow questions about that, and they've almost unanimously resorted to rolling the indices and other things that I think are hard to reason about in high-dimensional spaces. Slice tuples are your friend.
From the comments, @PaulPanzer points out another technique that I rather like.
myarray.T[(myarray.ndim-N)*(0,)].T
First, transposes in numpy are view-operations instead of copy-operations. This isn't inefficient in the slightest. Here's how it works:
Start with myarray with dimensions (0,...,k)
The transpose myarray.T reorders those to (k,...,0)
The whole goal is to fix the last myarray.ndim-N dimensions from the original array, so we select those with [(myarray.ndim-N)*(0,)], which grabs the first myarray.ndim-N dimensions from this array.
They're in the wrong order. We have dimensions (N-1,...,0). Use another transpose with .T to get the ordering (0,...,N-1) instead.
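Both techniques can be compared on a small 4D array. This is only a sketch: first_n_dims is a hypothetical helper name, and the array contents are arbitrary.

```python
import numpy as np

def first_n_dims(arr, n):
    """Keep the first n axes and take index 0 along every remaining axis."""
    index = (slice(None),) * n + (0,) * (arr.ndim - n)
    return arr[index]

myarray = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
N = 2

a = first_n_dims(myarray, N)                 # slice-tuple approach
b = myarray.T[(myarray.ndim - N) * (0,)].T   # double-transpose approach

print(a.shape)                                 # (2, 3)
print(np.array_equal(a, myarray[:, :, 0, 0]))  # True
print(np.array_equal(a, b))                    # True
print(np.shares_memory(a, myarray))            # True: a view, not a copy
```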

Get range/slice from numpy array the size of another array

I have two numpy arrays, one bigger than another, but both with the same number of dimensions.
I want to get a slice from the bigger array that matches the size of the smaller array. (Starting from 0,0,0....)
So, imagine the big array has shape (10,5,7).
And the small array has shape (10,4,6).
I want to get from the bigger array this slice:
biggerArray[:10,:4,:6]
The length of the shape tuple may vary, and I want to do it for any number of dimensions (Both will always have the same number of dimensions).
How to do that? Is there a way to use tuples as ranges in slices?

Construct the tuple of slice objects manually. biggerArray[:10, :4, :6] is syntactic sugar for biggerArray[(slice(10), slice(4), slice(6))], so:
biggerArray[tuple(map(slice, smallerArray.shape))]
or
biggerArray[tuple(slice(0, n) for n in smallerArray.shape)]
You may want to assert result.shape == smallerArray.shape afterwards, just in case the input shapes weren't what you thought they were.
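Put together with the shapes from the question, this might look like the following sketch (the array contents are arbitrary; only the shapes matter):

```python
import numpy as np

biggerArray = np.arange(10 * 5 * 7).reshape(10, 5, 7)
smallerArray = np.zeros((10, 4, 6))

# One slice(n) per axis, equivalent to biggerArray[:10, :4, :6] here.
index = tuple(slice(n) for n in smallerArray.shape)
result = biggerArray[index]

print(result.shape)  # (10, 4, 6)
assert result.shape == smallerArray.shape
```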

About Numpy shape

I'm new to numpy & have a question about it :
according to docs.scipy.org, the "shape" method is "the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m)"
Suppose I am to create a simple array as below:
np.array([[0,2,4],[1,3,5]])
Using the "shape" method, it returns (2,3) (i.e. the array has 2 rows & 3 columns)
However, for an array ([0,2,4]), the shape method would return (3,) (which means it has 3 rows according to the definition above)
I'm confused : the array ([0,2,4]) should have 3 columns not 3 rows so I expect it to return (,3) instead.
Can anyone help to clarify ? Thanks a lot.

This is just notation. In Python, tuples are distinguished from expression grouping (order of operations) by the use of commas: (1,2,3) is a tuple, while (2*x + 4) ** 5 merely groups the expression 2*x + 4. A single element in parentheses would otherwise be ambiguous, since (1) could be read either as a one-element tuple or as an expression that evaluates to 1. Python treats (1) as the expression, and a trailing comma, as in (1,), denotes a tuple.
What you're getting is a single dimension response, since there's only one dimension to measure, packed into a tuple type.

NumPy supports not only 2-dimensional arrays but arrays of any number of dimensions: 1-D, 2-D, 3-D, ..., n-D, and the shape tuple has one entry per dimension. The len of array.shape gives you the number of dimensions of that array. If the array is 1-D, there is no need to represent its shape as (m, n), and if the array is 3-D, then (m, n) would not be sufficient to represent its dimensions.
So the output of array.shape would not always be in (m, n) format, it would depend upon the array itself and you will get different outputs for different dimensions.
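A short sketch using the arrays from the question. The reshape calls are an addition for illustration: they show what an explicit row or column would look like as a 2-D array, in contrast to the 1-D case, which has neither rows nor columns.

```python
import numpy as np

one_d = np.array([0, 2, 4])
two_d = np.array([[0, 2, 4], [1, 3, 5]])

print(one_d.shape, one_d.ndim)  # (3,) 1
print(two_d.shape, two_d.ndim)  # (2, 3) 2

# An explicit "row" or "column" needs two dimensions:
row = one_d.reshape(1, 3)
col = one_d.reshape(3, 1)
print(row.shape)  # (1, 3)
print(col.shape)  # (3, 1)
```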

same numbers but different shape when slicing 2 dimensional arrays in python with numpy

I'm messing around with 2-dimensional slicing and don't understand why leaving out some defaults grabs the same values from the original array but produces different output. What's going on with the double brackets and shape changing?
x = np.arange(9).reshape(3,3)
y = x[2]
z = x[2:,:]
print y
print z
print shape(y)
print shape(z)
[6 7 8]
[[6 7 8]]
(3L,)
(1L, 3L)

x is a two dimensional array, an instance of NumPy's ndarray object. You can index/slice these objects in essentially two ways: basic and advanced.
x[2] fetches the row at index 2 of the array, returning the array [6 7 8]. You're doing basic slicing because you've specified only an integer. You can also specify a tuple of slice objects and integers for basic slicing, e.g. x[:,2] to select the right-hand column.
With an integer index, you're also reducing the number of dimensions of the returned object (in this case from two to just one):
An integer, i, returns the same values as i:i+1 except the dimensionality of the returned object is reduced by 1.
So when you ask for the shape of y, this is why you only get back one dimension (from your two-dimensional x).
x[2:,:] is also basic slicing: 2: is a slice object, not a sequence, so no advanced indexing is involved. Advanced indexing is triggered only when the index is an ndarray, a non-tuple sequence such as a list, or a tuple containing at least one sequence or ndarray, e.g. x[[2]].
The difference is that a slice keeps the dimension it is applied to, even when it selects a single row. 2: selects the rows from index 2 onward (here just one row), so you get back a two-dimensional ndarray of shape (1, 3), which prints with double brackets.
In a nutshell: an integer index removes a dimension, while a slice keeps it, even if its length is 1.
One brief point worth mentioning: basic slicing returns a view onto the original array (changes made to y or z will be reflected in x). Advanced indexing returns a brand new copy of the data.
You can read about array indexing and slicing in much more detail here.
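The shapes and the view/copy behaviour can be checked directly. In this sketch, w is an extra variable (not in the question) that uses a list index, which is one way to trigger advanced indexing:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

y = x[2]       # integer index: axis 0 is dropped
z = x[2:, :]   # slice: axis 0 is kept, with length 1
w = x[[2]]     # list index: advanced indexing

print(y.shape)  # (3,)
print(z.shape)  # (1, 3)
print(w.shape)  # (1, 3)

print(np.shares_memory(y, x))  # True: view
print(np.shares_memory(z, x))  # True: view
print(np.shares_memory(w, x))  # False: copy

z[0, 0] = 60    # writing through the view changes x as well
print(x[2, 0])  # 60
```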

Operator + to add a tuple to another tuple stored inside a multidimensional array of tuples

I recently found out how to use tuples thanks to great contributions from SO users(see here). However I encounter the problem that I can't add a tuple to another tuple stored inside an array of tuples. For instance if I define:
arrtup=empty((2,2),dtype=('int,int'))
arrtup[0,1]=(3,4)
Then if I try to add another tuple to the existing tuple to come up with a multidimensional index:
arrtup[0,1]+(4,4)
I obtain this error:
TypeError: unsupported operand type(s) for +: 'numpy.void' and 'tuple'
Instead of the expected (3,4,4,4) tuple, which I can obtain by:
(3,4)+(4,4)
Any ideas? Thanks!

You are mixing different concepts, I'm afraid.
Your arrtup array is not an array of tuples, it's a structured ndarray, that is, an array of elements that look like tuples but in fact are records (numpy.void objects, to be exact). In your case, you defined these records to consist of 2 integers. Internally, NumPy creates your array as a 2x2 array of blocks, each block taking a given space defined by your dtype: here, a block consists of 2 consecutive sub-blocks of size int (that is, each sub-block takes the space an int takes on your machine).
When you retrieve an element with arrtup[0,1], you get the corresponding block. Because this block is structured as two-subblocks, NumPy returns a numpy.void (the generic object representing structured blocks), which has the same dtype as your array.
Because you set the size of those blocks at the creation of the array, you're no longer able to modify it. That means that you cannot transform your 2-int records into 4-int ones as you want.
However, you can transform your structured array into an array of objects:
new_arr = arrtup.astype(object)
Lo and behold, your elements are no longer np.void but tuples, that you can modify as you want:
new_arr[0,1] = (3,4) # That's a tuple
new_arr[0,1] += (4,4) # Adding another tuple to the element
Your new_arr is a different beast from your arrtup: it has the same size, true, but it's no longer a structured array, it's an array of objects, as illustrated by
>>> new_arr.dtype
dtype("object")
In practice, the memory layout is quite different between arrtup and new_arr. new_arr doesn't have the same constraints as arrtup, as the individual elements can have different sizes, but object arrays are not as efficient as structured arrays.
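The steps above, collected into one runnable sketch. Two assumptions are made here: np.zeros is used instead of empty so the values are deterministic, and the dtype string 'i8,i8' (two 64-bit integer fields) stands in for 'int,int'.

```python
import numpy as np

# Structured array: every element is a fixed-size record of two integers.
arrtup = np.zeros((2, 2), dtype='i8,i8')
arrtup[0, 1] = (3, 4)
print(isinstance(arrtup[0, 1], np.void))  # True: a record, not a tuple

# Object array: every element is a reference to an arbitrary Python object.
new_arr = arrtup.astype(object)
new_arr[0, 1] = (3, 4)    # store a real tuple
new_arr[0, 1] += (4, 4)   # tuple concatenation now works

print(new_arr.dtype)      # object
print(new_arr[0, 1])      # (3, 4, 4, 4)
```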

The traceback is pretty clear here. arrtup[0,1] is not a tuple. It's of type numpy.void.
You can convert it to a tuple quite easily however:
tuple(arrtup[0,1])
which can be concatenated with other tuples:
tuple(arrtup[0,1]) + (4,4)
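A minimal sketch of that conversion, again assuming np.zeros and the dtype string 'i8,i8' in place of the question's empty and 'int,int'. The item() call is an addition not mentioned above: it returns the record as a tuple of plain Python ints.

```python
import numpy as np

arrtup = np.zeros((2, 2), dtype='i8,i8')
arrtup[0, 1] = (3, 4)

record = arrtup[0, 1]               # a numpy.void record
combined = tuple(record) + (4, 4)   # convert first, then concatenate
print(len(combined))                # 4

# item() converts the record to a tuple of plain Python ints.
print(record.item() + (4, 4))       # (3, 4, 4, 4)
```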
