What does "dimensionality" mean for a numpy array? - python

I am still new to scikit-learn and numpy.
I read the tutorial, but I can't understand how they define array dimensions.
In the following example:
>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
The array has five variables in each row, so I expect it to have 5 dimensions.
Why is a.ndim equal to 2?

Given that you're using scikit learn, I'll explain this in the context of machine learning as it may make more sense...
Your feature matrix (which I assume is what you're talking about here), is going to be 2 dimensional typically (hence why ndim = 2) because you have rows (which occupy a 1 dimension) and columns (which occupy a second dimension)
In machine learning cases, I typically think of the rows as the samples and columns as the features.
Note, however, that each dimension can have multiple entries (e.g. you will have multiple samples/rows, and multiple columns/features). This tells you the size along that dimension.
So in your case:
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
You have one dimension that has a length/size of 3. And a second dimension that has 5 entries. You can think of this as a feature matrix containing 3 samples and 5 features/variables, for example.
All in all, you have 2 dimensions (ndim = 2), but the specific size of the array is represented by the shape tuple, which tells you how large each of the 2 dimensions are.
Furthermore, (3,5,2) will be a matrix with 3 dimensions, where the 3rd dimension has 2 values
I think the key here, at least in the 2 dimensional case, is to not think of it as nested lists or nested vectors (which is what it looks like when you consider the []) but to just think of it as a table with rows and columns. The shape tuple and ndim will make more sense when you think of the data structure that way

Dimension means there the length of a.shape tuple.
Shape of the ndarray is (3, 5) since it has 3 rows and 5 columns in it. This is exactly what you try to find, aren't you?

I mistake the Frame and the Array, array is judged by how to find the number, how many steps to locate the number, in this case, you need locate the row and then locate the line. but in Frame, for one instance, this instance own a row, and all the lines in it described its attributes, or called it variables, or dimensions to define a instance. in Frame, you can use a two dimensions array to define a multi-dimension instance.
Array
1 4 5
2 3 6
so it's two steps you will find the number, like you can you a[][] to locate
but in Frame
Length Height Weight
1 23 34 56
2 89 87 63
This is a frame and actually is a 3-dimensional "array", but it's not an array.

Related

Weighted resampling a numpy array

I have a 50 x 4 numpy array and I'd like to repeat the rows to make it a 500 x 4 array. But the catch is, I cannot just repeat the rows along 0th axis. I'd like to have more smaller rows and lesser bigger rows in the expanded array.
The input array has data that looks like this:
[1, 1, 16, 5]
[8, 10, 512, 10]
...
[448, 8192, 409600, 150]
Here, the initial rows are small and the last rows are large. But the scales for each column are different. i.e., 16 might be a very low value for column 2 but a high value for column 1
How can I achieve this using numpy or even lists?
Expected output would be a vector of shape 500 x 4 where each row is taken from the input vector, and repeated for some number of times.
[1, 1, 16, 5] # repeated thrice
[1, 1, 16, 5]
[1, 1, 16, 5]
[8, 10, 512, 10] # repeated twice
[8, 10, 512, 10]
...
[448, 8192, 409600, 150]
You can using np.repeat so to repeat an arbitrary number of time a givent value in an array and then use that as an index for the input array (since np.repeat do not work directly on 2D arrays). Here is an example:
# Example of random input
inputArr = np.random.randint(0, 1000, (50, 4))
# Example with [2, 3, ..., 52] repeated lines
counts = np.arange(2, 52)
# Actual computation
outputArr = inputArr[np.repeat(np.arange(inputArr.shape[0]), counts)]

Efficient ways to aggregate and replicate values in a numpy matrix

In my work I often need to aggregate and expand matrices of various quantities, and I am looking for the most efficient ways to do these actions. E.g. I'll have an NxN matrix that I want to aggregate from NxN into PxP where P < N. This is done using a correspondence between the larger dimensions and the smaller dimensions. Usually, P will be around 100 or so.
For example, I'll have a hypothetical 4x4 matrix like this (though in practice, my matrices will be much larger, around 1000x1000)
m=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> m
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
and a correspondence like this (schematically):
0 -> 0
1 -> 1
2 -> 0
3 -> 1
that I usually store in a dictionary. This means that indices 0 and 2 (for rows and columns) both get allocated to new index 0 and indices 1 and 3 (for rows and columns) both get allocated to new index 1. The matrix could be anything at all, but the correspondence is always many-to-one when I want to compress.
If the input matrix is A and the output matrix is B, then cell B[0, 0] would be the sum of A[0, 0] + A[0, 2] + A[2, 0] + A[2, 2] because new index 0 is made up of original indices 0 and 2.
The aggregation process here would lead to:
array([[ 1+3+9+11, 2+4+10+12 ],
[ 5+7+13+15, 6+8+14+16 ]])
= array([[ 24, 28 ],
[ 40, 44 ]])
I can do this by making an empty matrix of the right size and looping over all 4x4=16 cells of the initial matrix and accumulating in nested loops, but this seems to be inefficient and the vectorised nature of numpy is always emphasised by people. I have also done it by using np.ix_ to make sets of indices and use m[row_indices, col_indices].sum(), but I am wondering what the most efficient numpy-like way to do it is.
Conversely, what is the sensible and efficient way to expand a matrix using the correspondence the other way? For example with the same correspondence but in reverse I would go from:
array([[ 1, 2 ],
[ 3, 4 ]])
to
array([[ 1, 2, 1, 2 ],
[ 3, 4, 3, 4 ],
[ 1, 2, 1, 2 ],
[ 3, 4, 3, 4 ]])
where the values simply get replicated into the new cells.
In my attempts so far for the aggregation, I have used approaches with pandas methods with groupby on index and columns and then extracting the final matrix with, e.g. df.values. However, I don't know the equivalent way to expand a matrix, without using a lot of things like unstack and join and so on. And I see people often say that using pandas is not time-efficient.
Edit 1: I was asked in a comment about exactly how the aggregation should be done. This is how it would be done if I were using nested loops and a dictionary lookup between the original dimensions and the new dimensions:
>>> m=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
>>> mnew=np.zeros((2,2))
>>> big2small={0:0, 1:1, 2:0, 3:1}
>>> for i in range(4):
... inew = big2small[i]
... for j in range(4):
... jnew = big2small[j]
... mnew[inew, jnew] += m[i, j]
...
>>> mnew
array([[24., 28.],
[40., 44.]])
Edit 2: Another comment asked for the aggregation example towards the start to be made more explicit, so I have done so.
Assuming you don't your indices don't have a regular structure I would do it try sparse matrices.
import scipy.sparse as ss
import numpy as np
# your current array of indices
g=np.array([[0,0],[1,1],[2,0],[3,1]])
# a sparse matrix of (data=ones, (row_ind=g[:,0], col_ind=g[:,1]))
# it is one for every pair (g[i,0], g[i,1]), zero elsewhere
u=ss.csr_matrix((np.ones(len(g)), (g[:,0], g[:,1])))
Aggregate
m=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
u.T # m # u
Expand
m2 = np.array([[1,2],[3,4]])
u # m2 # u.T

numpy - column-wise and row-wise sums of a given 2d matrix

I have this numpy matrix (ndarray).
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
I want to calculate the column-wise and row-wise sums.
I know this is done by calling respectively
np.sum(mat, axis=0) ### column-wise sums
np.sum(mat, axis=1) ### row-wise sums
but I cannot understand these two calls.
Why is axis 0 giving me the sums column-by-column?!
Shouldn't it be the other way around?
I thought the rows are axis 0, and the columns are axis 1.
What I am seeing as a behavior here looks counter-intuitive
(but I am sure it's OK, I guess I am just missing something important).
I am just looking for some intuitive explanation here.
Thanks in advance.
Intuition around arrays and axes
I want to offer 3 types of intuitions here.
Graphical (How to imagine them visually)
Physical (How they are physically stored)
Logical (How to work with them logically)
Graphical intuition
Consider a numpy array as a n-dimensional object. This n-dimensional object contains elements in each of the directions as below.
Axes in this representation are the direction of the tensor. So, a 2D matrix has only 2 axes, while a 4D tensor has 4 axes.
Sum in a given axis can be essentially considered as a reduction in that direction. Imagine a 3D tensor being squashed in such a way that it becomes flat (a 2D tensor). The axis tells us which direction to squash or reduce it in.
Physical intuition
Numpy stores its ndarrays as contiguous blocks of memory. Each element is stored in a sequential manner every n bytes after the previous.
(images referenced from this excellent SO post)
So if your 3D array looks like this -
Then in memory its stores as -
When retrieving an element (or a block of elements), NumPy calculates how many strides (bytes) it needs to traverse to get the next element in that direction/axis. So, for the above example, for axis=2 it has to traverse 8 bytes (depending on the datatype) but for axis=1 it has to traverse 8*4 bytes, and axis=0 it needs 8*8 bytes.
Axes in this representation is basically the series of next elements after a given stride. Consider the following array -
print(X)
print(X.strides)
[[0 2 1 4 0 0 0]
[5 0 0 0 0 0 0]
[8 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 1 0 0 0 0]
[0 0 0 1 0 0 0]]
#Strides (bytes) required to traverse in each axis.
(56, 8)
In the above array, every element after 56 bytes from any element is the next element in axis=0 and every element after 8 bytes from any element is in axis=1. (except from the last element)
Sum or reduction in this regards means taking a sum of every element in that strided series. So, sum over axis=0 means that I need to sum [0,5,8,0,0,0], [2,0,0,0,0,0], ... and sum over axis=1 means just summing [0 2 1 4 0 0 0] , [5 0 0 0 0 0 0], ...
Logical intuition
This interpretation has to do with element groupings. A numpy stores its ndarrays as groups of groups of groups ... of elements. Elements are grouped together and contain the last axis (axis=-1). Then another grouping over them creates another axis before it (axis=-2). The final outermost group is the axis=0.
These are 3 groups of 2 groups of 5 elements.
Similarly, the shape of a NumPy array is also determined by the same.
1D_array = [1,2,3]
2D_array = [[1,2,3]]
3D_array = [[[1,2,3]]]
...
Axes in this representation are the group in which elements are stored. The outermost group is axis=0 and the innermost group is axis=-1.
Sum or reduction in this regard means that I reducing elements across that specific group or axis. So, sum over axis=-1 means I sum over the innermost groups. Consider a (6, 5, 8) dimensional tensor. When I say I want a sum over some axis, I want to reduce the elements lying in that grouping / direction to a single value that is equal to their sum.
So,
np.sum(arr, axis=-1) will reduce the inner most groups (of length 8) into a single value and return (6,5,1) or (6,5).
np.sum(arr, axis=-2) will reduce the elements that lie in the 1st axis (or -2nd axis) direction and reduce those to a single value returning (6,1,8) or (6,8)
np.sum(arr, axis=0) will similarly reduce the tensor to (1,5,8) or (5,8).
Hope these 3 intuitions are beneficial to anyone trying to understand how axes and NumPy tensors work in general and how to build an intuitive understanding to work better with them.
Let's start with a one dimensional example:
a, b, c, d, e = 0, 1, 2, 3, 4
arr = np.array([a, b, c, d, e])
If you do,
arr.sum(0)
Output
10
That is the sum of the elements of the array
a + b + c + d + e
Now before moving on a 2 dimensional example. Let's clarify that in numpy the sum of two 1 dimensional arrays is done element wise, for example:
a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])
print(a + b)
Output
[ 7 9 11 13 15]
Now if we change our initial variables to arrays, instead of scalars, to create a two dimensional array and do the sum
a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])
c = np.array([11, 12, 13, 14, 15])
d = np.array([16, 17, 18, 19, 20])
e = np.array([21, 22, 23, 24, 25])
arr = np.array([a, b, c, d, e])
print(arr.sum(0))
Output
[55 60 65 70 75]
The output is the same as for the 1 dimensional example, i.e. the sum of the elements of the array:
a + b + c + d + e
Just that now the elements of the arrays are 1 dimensional arrays and the sum of those elements is applied. Now before explaining the results, for axis = 1, let's consider an alternative notation to the notation across axis = 0, basically:
np.array([arr[0, :], arr[1, :], arr[2, :], arr[3, :], arr[4, :]]).sum(0) # [55 60 65 70 75]
That is we took full slices in all other indices that were not the first dimension. If we swap to:
res = np.array([arr[:, 0], arr[:, 1], arr[:, 2], arr[:, 3], arr[:, 4]]).sum(0)
print(res)
Output
[ 15 40 65 90 115]
We get the result of the sum along axis=1. So to sum it up you are always summing elements of the array. The axis will indicate how this elements are constructed.
Intuitively, 'axis 0' goes from top to bottom and 'axis 1' goes from left to right. Therefore, when you sum along 'axis 0' you get the column sum, and along 'axis 1' you get the row sum.
As you go along 'axis 0', the row number increases. As you go along 'axis 1' the column number increases.
Think of a 1-dimension array:
mat=array([ 1, 2, 3, 4, 5])
Its items are called by mat[0], mat[1], etc
If you do:
np.sum(mat, axis=0)
it will return 15
In the background, it sums all items with mat[0], mat[1], mat[2], mat[3], mat[4]
meaning the first index (axis=0)
Now consider a 2-D array:
mat=array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
When you ask for
np.sum(mat, axis=0)
it will again sum all items based on the first index (axis=0) keeping all the rest same. This means that
mat[0][1], mat[1][1], mat[2][1], mat[3][1], mat[4][1]
will give one sum
mat[0][2], mat[1][2], mat[2][2], mat[3][2], mat[4][2]
will give another one, etc
If you consider a 3-D array, the logic will be the same. Every sum will be calculated on the same axis (index) keeping all the rest same. Sums on axis=0 will be produced by:
mat[0][1][1],mat[1][1][1],mat[2][1][1],mat[3][1][1],mat[4][1][1]
etc
Sums on axis=2 will be produced by:
mat[2][3][0], mat[2][3][1], mat[2][3][2], mat[2][3][3], mat[2][3][4]
etc
I hope you understand the logic. To keep things simple in your mind, consider axis=position of index in a chain index, eg axis=3 on a 7-mensional array will be:
mat[0][0][0][this is our axis][0][0][0]

can u please explain how this tuple (2,0,1) transposing ,i am not able to find logic of this transpose [duplicate]

In [28]: arr = np.arange(16).reshape((2, 2, 4))
In [29]: arr
Out[29]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]]])
In [32]: arr.transpose((1, 0, 2))
Out[32]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[ 4, 5, 6, 7],
[12, 13, 14, 15]]])
When we pass a tuple of integers to the transpose() function, what happens?
To be specific, this is a 3D array: how does NumPy transform the array when I pass the tuple of axes (1, 0 ,2)? Can you explain which row or column these integers refer to? And what are axis numbers in the context of NumPy?
To transpose an array, NumPy just swaps the shape and stride information for each axis. Here are the strides:
>>> arr.strides
(64, 32, 8)
>>> arr.transpose(1, 0, 2).strides
(32, 64, 8)
Notice that the transpose operation swapped the strides for axis 0 and axis 1. The lengths of these axes were also swapped (both lengths are 2 in this example).
No data needs to be copied for this to happen; NumPy can simply change how it looks at the underlying memory to construct the new array.
Visualising strides
The stride value represents the number of bytes that must be travelled in memory in order to reach the next value of an axis of an array.
Now, our 3D array arr looks this (with labelled axes):
This array is stored in a contiguous block of memory; essentially it is one-dimensional. To interpret it as a 3D object, NumPy must jump over a certain constant number of bytes in order to move along one of the three axes:
Since each integer takes up 8 bytes of memory (we're using the int64 dtype), the stride value for each dimension is 8 times the number of values that we need to jump. For instance, to move along axis 1, four values (32 bytes) are jumped, and to move along axis 0, eight values (64 bytes) need to be jumped.
When we write arr.transpose(1, 0, 2) we are swapping axes 0 and 1. The transposed array looks like this:
All that NumPy needs to do is to swap the stride information for axis 0 and axis 1 (axis 2 is unchanged). Now we must jump further to move along axis 1 than axis 0:
This basic concept works for any permutation of an array's axes. The actual code that handles the transpose is written in C and can be found here.
As explained in the documentation:
By default, reverse the dimensions, otherwise permute the axes according to the values given.
So you can pass an optional parameter axes defining the new order of dimensions.
E.g. transposing the first two dimensions of an RGB VGA pixel array:
>>> x = np.ones((480, 640, 3))
>>> np.transpose(x, (1, 0, 2)).shape
(640, 480, 3)
In C notation, your array would be:
int arr[2][2][4]
which is an 3D array having 2 2D arrays. Each of those 2D arrays has 2 1D array, each of those 1D arrays has 4 elements.
So you have three dimensions. The axes are 0, 1, 2, with sizes 2, 2, 4. This is exactly how numpy treats the axes of an N-dimensional array.
So, arr.transpose((1, 0, 2)) would take axis 1 and put it in position 0, axis 0 and put it in position 1, and axis 2 and leave it in position 2. You are effectively permuting the axes:
0 -\/-> 0
1 -/\-> 1
2 ----> 2
In other words, 1 -> 0, 0 -> 1, 2 -> 2. The destination axes are always in order, so all you need is to specify the source axes. Read off the tuple in that order: (1, 0, 2).
In this case your new array dimensions are again [2][2][4], only because axes 0 and 1 had the same size (2).
More interesting is a transpose by (2, 1, 0) which gives you an array of [4][2][2].
0 -\ /--> 0
1 --X---> 1
2 -/ \--> 2
In other words, 2 -> 0, 1 -> 1, 0 -> 2. Read off the tuple in that order: (2, 1, 0).
>>> arr.transpose((2,1,0))
array([[[ 0, 8],
[ 4, 12]],
[[ 1, 9],
[ 5, 13]],
[[ 2, 10],
[ 6, 14]],
[[ 3, 11],
[ 7, 15]]])
You ended up with an int[4][2][2].
You'll probably get better understanding if all dimensions were of different size, so you could see where each axis went.
Why is the first inner element [0, 8]? Because if you visualize your 3D array as two sheets of paper, 0 and 8 are lined up, one on one paper and one on the other paper, both in the upper left. By transposing (2, 1, 0) you're saying that you want the direction of paper-to-paper to now march along the paper from left to right, and the direction of left to right to now go from paper to paper. You had 4 elements going from left to right, so now you have four pieces of paper instead. And you had 2 papers, so now you have 2 elements going from left to right.
Sorry for the terrible ASCII art. ¯\_(ツ)_/¯
It seems the question and the example originates from the book Python for Data Analysis by Wes McKinney. This feature of transpose is mentioned in Chapter 4.1. Transposing Arrays and Swapping Axes.
For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes (for extra mind bending).
Here "permute" means "rearrange", so rearranging the order of axes.
The numbers in .transpose(1, 0, 2) determines how the order of axes are changed compared to the original. By using .transpose(1, 0, 2), we mean, "Change the 1st axis with the 2nd." If we use .transpose(0, 1, 2), the array will stay the same because there is nothing to change; it is the default order.
The example in the book with a (2, 2, 4) sized array is not very clear since 1st and 2nd axes has the same size. So the end result doesn't seem to change except the reordering of rows arr[0, 1] and arr[1, 0].
If we try a different example with a 3 dimensional array with each dimension having a different size, the rearrangement part becomes more clear.
In [2]: x = np.arange(24).reshape(2, 3, 4)
In [3]: x
Out[3]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [4]: x.transpose(1, 0, 2)
Out[4]:
array([[[ 0, 1, 2, 3],
[12, 13, 14, 15]],
[[ 4, 5, 6, 7],
[16, 17, 18, 19]],
[[ 8, 9, 10, 11],
[20, 21, 22, 23]]])
Here, original array sizes are (2, 3, 4). We changed the 1st and 2nd, so it becomes (3, 2, 4) in size. If we look closer to see how the rearrangement exactly happened; arrays of numbers seems to have changed in a particular pattern. Using the paper analogy of #RobertB, if we were to take the 2 chunks of numbers, and write each one on sheets, then take one row from each sheet to construct one dimension of the array, we would now have a 3x2x4-sized array, counting from the outermost to the innermost layer.
[ 0, 1, 2, 3] \ [12, 13, 14, 15]
[ 4, 5, 6, 7] \ [16, 17, 18, 19]
[ 8, 9, 10, 11] \ [20, 21, 22, 23]
It could be a good idea to play with different sized arrays, and change different axes to gain a better intuition of how it works.
I ran across this in Python for Data Analysis by Wes McKinney as well.
I will show the simplest way of solving this for a 3-dimensional tensor, then describe the general approach that can be used for n-dimensional tensors.
Simple 3-dimensional tensor example
Suppose you have the (2,2,4)-tensor
[[[ 0 1 2 3]
[ 4 5 6 7]]
[[ 8 9 10 11]
[12 13 14 15]]]
If we look at the coordinates of each point, they are as follows:
[[[ (0,0,0) (0,0,1) (0,0,2) (0,0,3)]
[ (0,1,0) (0,1,1) (0,1,2) (0,1,3)]]
[[ (1,0,0) (1,0,1) (1,0,2) (0,0,3)]
[ (1,1,0) (1,1,1) (1,1,2) (0,1,3)]]
Now suppose that the array above is example_array and we want to perform the operation: example_array.transpose(1,2,0)
For the (1,2,0)-transformation, we shuffle the coordinates as follows (note that this particular transformation amounts to a "left-shift":
(0,0,0) -> (0,0,0)
(0,0,1) -> (0,1,0)
(0,0,2) -> (0,2,0)
(0,0,3) -> (0,3,0)
(0,1,0) -> (1,0,0)
(0,1,1) -> (1,1,0)
(0,1,2) -> (1,2,0)
(0,1,3) -> (1,3,0)
(1,0,0) -> (0,0,1)
(1,0,1) -> (0,1,1)
(1,0,2) -> (0,2,1)
(0,0,3) -> (0,3,0)
(1,1,0) -> (1,0,1)
(1,1,1) -> (1,1,1)
(1,1,2) -> (1,2,1)
(0,1,3) -> (1,3,0)
Now, for each original value, place it into the shifted coordinates in the result matrix.
For instance, the value 10 has coordinates (1, 0, 2) in the original matrix and will have coordinates (0, 2, 1) in the result matrix. It is placed into the first 2d tensor submatrix in the third row of that submatrix, in the second column of that row.
Hence, the resulting matrix is:
array([[[ 0, 8],
[ 1, 9],
[ 2, 10],
[ 3, 11]],
[[ 4, 12],
[ 5, 13],
[ 6, 14],
[ 7, 15]]])
General n-dimensional tensor approach
For n-dimensional tensors, the algorithm is the same. Consider all of the coordinates of a single value in the original matrix. Shuffle the axes for that individual coordinate. Place the value into the resulting, shuffled coordinates in the result matrix. Repeat for all of the remaining values.
To summarise a.transpose()[i,j,k] = a[k,j,i]
a = np.array( range(24), int).reshape((2,3,4))
a.shape gives (2,3,4)
a.transpose().shape gives (4,3,2) shape tuple is reversed.
when is a tuple parameter is passed axes are permuted according to the tuple.
For example
a = np.array( range(24), int).reshape((2,3,4))
a[i,j,k] equals a.transpose((2,0,1))[k,i,j]
axis 0 takes 2nd place
axis 1 takes 3rd place
axis 2 tales 1st place
of course we need to take care that values in tuple parameter passed to transpose are unique and in range(number of axis)

How to save a Python list of arrays of varied dimensions to mat file [duplicate]

This question already has an answer here:
creating Matlab cell arrays in python
(1 answer)
Closed 2 years ago.
I am going to save a list of arrays into a file that can be read in Matlab. The arrays in the list are of 3-dimensional but varied shapes, so I cannot put them into a single large array.
Originally I thought I could save the list in to a pickle file and then read the file in Matlab. Later I found Matlab does not support reading the pickle file. I also tried using scipy.io.savemat to save the list to a mat file. The inconsistencies of array dimensions causes saving problems.
Anyone has ideas of how to solve the problem? It should be noted that the list file is very large in memory (>4 G).
If you don't need a single file, you can iterate through the list and savemat each entry into a separate file. Then iterate through that directory and load each file into a cell array element.
You can also zip the directory you store these in, to get one file to pass around.
savemat is the right tool for saving numpy arrays in a MATLAB format. I'd suggest making a desired structure in MATLAB, save it, and loadmat. Duplicate that layout when going the other way.
Also loadmat the savemat file to get a better idea how it maps python objects onto MATLAB ones.
Arrays may become order F 2d arrays. Cells may become object dtype arrays, structs may become structured arrays.
Create a .mat` from a list:
In [180]: io.savemat('test.mat', {'x':[np.arange(12).reshape(3,4), np.arange(3), 4]})
reload it:
In [181]: data = io.loadmat('test.mat')
In [182]: data
Out[182]:
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Sat Mar 21 20:02:02 2020',
'__version__': '1.0',
'__globals__': [],
'x': array([[array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]),
array([[0, 1, 2]]), array([[4]])]], dtype=object)}
It has one named variable
In [183]: data['x']
Out[183]:
array([[array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]),
array([[0, 1, 2]]), array([[4]])]], dtype=object)
The shape is 2d (like all MATLAB) and object dtype (to hold a mix of items):
In [185]: data['x'].shape
Out[185]: (1, 3)
Within that is a 2d array:
In [186]: data['x'][0,0].shape
Out[186]: (3, 4)
If I check the flags I see data['x'] is order F, F contiguous (but the source, used in the savemat was default C contiguous).
Note also the change in shape of the 1d array and scalar - again following MATLAB conventions.
In Octave:
>> data = load('test.mat')
data =
scalar structure containing the fields:
x =
{
[1,1] =
0 1 2 3
4 5 6 7
8 9 10 11
[1,2] =
0 1 2
[1,3] = 4
}
we get a cell variable.
If I wanted a numpy matrix that didn't get changed in this transfer, I'd have to start with something like:
In [188]: np.arange(12).reshape(3,4,order='F')
Out[188]:
array([[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]])
numpy is, by default C order, row-major, with first dimension being the outermost. MATLAB is the opposite - F (fortran), column major, last dimension outermost.

Categories

Resources