How to reshape in Python by preserving structure of matrix - python

I am trying to make a transformation that I thought to be trivial but I am not able to find any solution to this. I have an array of dimensions (3, 3, 2), such as the following one:
array([[1,2], [3,4], [5,6]],
[[7,8], [9,0], [1,2]],
[[3,4], [5,6], [7,8]])
I would like to transform it to a (2, 3, 3) array of the following form
array([[1, 3, 5],
[7, 9, 1],
[3, 5, 7]],
[[2, 4, 6],
[8, 0, 2],
[4, 6, 8]])
such that each matrix of this array contains all the elements of the respective indices for each tuple in the first matrix. Is there any way to do this, either with NumPy or preprocessing tools of ML libraries? In a sense this operation corresponds to "decoupling" the channels of an image in two separate matrix.

As suggested by user hpaulj, the operations needed to do this were np.swapaxes(A, 0, 2).transpose(0,2,1) if A was the original array.
As explained in the documentation, swapaxes interchanges the two axes of a multidimensional array. This leaves the submatrices in a transposed form with respect to the one we want, so we should transpose the array in such a way to preserve the order of the array containing the submatrices and transposing only the submatrices themselves.

Related

Transforming a 2D array to 1D array to create Data Frame

I have three 2D np.array that mathematically are [8:1550] matrices, and I want to express them into 1D np.array of 12400 numbers (8 x 1550 = 12400...) so that I could create a DataFrame later with this code:
Exported_Data = pd.DataFrame({"UD": UD_Data, "NS": NS_Data, "EW": EW_Data})
Exported_Data.to_csv("EXCEL.csv")
To put a simpler example, if I have this:
A = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
And I want to obtain this from that:
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
What is the best way to do it?
I would suggest use reshape. It most likely creates a view and is more efficient whereas np.flatten creates a copy:
B = A.reshape(-1)
-1 implicitly takes care of required dimension size.
You can use A.flatten() to convert a 2D array to a 1D array.

Python Numpy syntax: what does array index as two arrays separated by comma mean?

I don't understand array as index in Python Numpy.
For example, I have a 2d array A in Numpy
[[1,2,3]
[4,5,6]
[7,8,9]
[10,11,12]]
What does A[[1,3], [0,1]] mean?
Just test it for yourself!
A = np.arange(12).reshape(4,3)
print(A)
>>> array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
By slicing the array the way you did (docs to slicing), you'll get the first row, zero-th column element and the third row, first column element.
A[[1,3], [0,1]]
>>> array([ 3, 10])
I'd highly encourage you to play around with that a bit and have a look at the documentation and the examples.
Your are creating a new array:
import numpy as np
A = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]
A = np.array(A)
print(A[[1, 3], [0, 1]])
# [ 4 11]
See Indexing, Slicing and Iterating in the tutorial.
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas
Quoting the doc:
def f(x,y):
return 10*x+y
b = np.fromfunction(f, (5, 4), dtype=int)
print(b[2, 3])
# -> 23
You can also use a NumPy array as index of an array. See Index arrays in the doc.
NumPy arrays may be indexed with other arrays (or any other sequence- like object that can be converted to an array, such as lists, with the exception of tuples; see the end of this document for why this is). The use of index arrays ranges from simple, straightforward cases to complex, hard-to-understand cases. For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices.

How to split an 2D array, creating arrays from "row to row" values

I want to split an 2D array this way:
Example.
From this 4x4 2D array:
np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
Create these four 2x2 2D arrays:
np.array([[1,2],[3,4]])
np.array([[5,6],[7,8]])
np.array([[9,10],[11,12]])
np.array([[13,14],[15,16]])
In a general case, from a NxN 2D array (square arrays) create 2D arrays of KxK shape, as many as possible.
Just to be more precise: to create the output array, not necessarily it will be made of all values from the row.
Example:
From a 2D 8x8 array, with values from 1 to 64, if I want to split this array in 2D 2x2 arrays, the first row from 8x8 array is a row from 1 to 8, and the first output 2D 2x2 array will be np.array([[1,2],[3,4]]), and the second output 2D 2x2 array will be np.array([[5,6],[7,8]])... It continues until the last output 2D array, that will be np.array([[61,62],[63,64]]). Look that each 2D 2x2 array was not filled with all the values from the row (CORRECT).
There is a Numpy method that do this?
You're probably looking for something like numpy.reshape.
In your example:
numpy.array([[1,2,3,4], [5,6,7,8]]).reshape(2,4)
>>>array([[1,2], [3,4], [5,6], [7,8]])
Or, as suggested by #MSeifert, using -1 as final dimension will let numpy do the division by itself:
numpy.array([[1,2,3,4], [5,6,7,8]]).reshape(2,-1)
>>>array([[1,2], [3,4], [5,6], [7,8]])
To get your desired output, you need to reshape to a 3D array and then unpack the first dimension:
>>> inp = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
>>> list(inp.reshape(-1, 2, 2))
[array([[1, 2],
[3, 4]]),
array([[5, 6],
[7, 8]]),
array([[ 9, 10],
[11, 12]]),
array([[13, 14],
[15, 16]])]
You can also unpack using = if you want to store the arrays in different variables instead of in one list of arrays:
>>> out1, out2, out3, out4 = inp.reshape(-1, 2, 2)
>>> out1
array([[1, 2],
[3, 4]])
If you're okay with a 3D array containing your 2D 2x2 arrays you don't need unpacking or the list() call:
>>> inp.reshape(-1, 2, 2)
array([[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7, 8]],
[[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16]]])
The -1 is a special value for reshape. As the documentation states:
One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
If you want it more general, just take the square root of the row-length and use that as argument for reshape:
>>> inp = np.ones((8, 8)) # 8x8 array
>>> square_shape = 2
>>> inp.reshape(-1, square_shape, square_shape) # 16 2x2 arrays
>>> square_shape = 4
>>> inp.reshape(-1, square_shape, square_shape) # 4 4x4 arrays
If you want to split it row wise, you may do np.reshape(arr,(2,2), order='C')
If you want to split it column wise, you may do not.reshape(arr,(2,2), order='F')

How to sum 2d and 1d arrays in numpy?

Supposing I have 2d and 1d numpy array. I want to add the second array to each subarray of the first one and to get a new 2d array as the result.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> b = np.array([2, 3])
>>> c = ... # <-- What should be here?
>>> c
array([[3, 5],
[5, 7],
[7, 9],
[9, 22]])
I could use a loop but I think there're standard ways to do it within numpy.
What is the best and quickest way to do it? Performance matters.
Thanks.
I think the comments are missing the explanation of why a+b works. It's called broadcasting
Basically if you have a NxM matrix and a Nx1 vector, you can directly use the + operator to "add the vector to each row of the matrix.
This also works if you have a 1xM vector and want to add it columnwise.
Broadcasting also works with other operators and other Matrix dimensions.
Take a look at the documentation to fully understand broadcasting

Numpy 3d array indexing

I have a 3d numpy array (n_samples x num_components x 2) in the example below n_samples = 5 and num_components = 7.
I have another array (indices) which is the selected component for each sample which is of shape (n_samples,).
I want to select from the data array given the indices so that the resulting array is n_samples x 2.
The code is below:
import numpy as np
np.random.seed(77)
data=np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])
#how can I select indices from the data array?
For example for data 0, the selected component should be the 0th and for data 1 the selected component should be 1.
Note that I can't use any for loops because I'm using it in Theano and the solution should be solely based on numpy.
Is this what you are looking for?
In [36]: data[np.arange(data.shape[0]),indices,:]
Out[36]:
array([[7, 4],
[7, 3],
[4, 5],
[8, 2],
[5, 8]])
To get component #0, use
data[:, 0]
i.e. we get every entry on axis 0 (samples), and only entry #0 on axis 1 (components), and implicitly everything on the remaining axes.
This can be easily generalized to
data[:, indices]
to select all relevant components.
But what OP really wants is just the diagonal of this array, i.e. (data[0, indices[0]], (data[1, indices[1]]), ...) The diagonal of a high-dimensional array can be extracted using the diagonal function:
>>> np.diagonal(data[:, indices])
array([[7, 7, 4, 8, 5],
[4, 3, 5, 2, 8]])
(You may need to transpose the result.)
You have a variety of ways to do so, but this is my loop recommendation:
selection = np.array([ datum[indices[k]] for k,datum in enumerate(data)])
The resulting array, selection, has the desired shape.

Categories

Resources