Trouble reshaping 3-d NumPy array into 2-d NumPy array - python

I'm working on an image-processing problem, and my data is a 3-dimensional NumPy array in which the (x, y, z) entry is the (x, y) pixel (numerical intensity value) of image z. There are 10000 images and each image is 25x25, so the data array has shape 25x25x10000. I am trying to convert this into a 2-dimensional array of shape 10000x625, where each row is the flattened pixels of one image. For example, if the images were 3x3 instead, we would have the following:
1 2 3
4 5 6 ------> [1, 2, 3, 4, 5, 6, 7, 8, 9]
7 8 9
I am attempting to do this by calling data.reshape((10000, 625)), but the data is no longer aligned properly afterwards. I have tried transposing the matrix at various stages of the reshaping, but that does not seem to fix it.
Does anyone know how to fix this?

If you want the data to stay aligned, you need to merge the two pixel axes, which are adjacent: data.reshape((625, 10000)).
If you want a different layout, try np.rollaxis:
data_rolled = np.rollaxis(data, 2, 0)  # shape (10000, 25, 25)
data_reshaped = data_rolled.reshape(10000, 625)  # now the reshape keeps each image's pixels together
NumPy's reshape reads and refills elements in index order (by default the last axis varies fastest), so you need to know which elements belong together and only "merge" dimensions that are adjacent and belong together.
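A small sketch of the same idea on toy data, assuming the question's layout where data[x, y, z] is pixel (x, y) of image z (the 4 images of 3x3 pixels are made up for illustration). It also uses np.moveaxis, which the NumPy documentation recommends over np.rollaxis for new code; both give the same layout here.

```python
import numpy as np

# Toy stand-in for the question's data: 4 images of 3x3 pixels,
# stored so that data[x, y, z] is pixel (x, y) of image z.
n_images, h, w = 4, 3, 3
data = np.arange(h * w * n_images).reshape(h, w, n_images)

# Move the image axis to the front: shape (n_images, h, w).
rolled = np.rollaxis(data, 2, 0)
moved = np.moveaxis(data, 2, 0)  # same result with the newer API

# The two pixel axes are now adjacent and last, so they can be merged.
flat = moved.reshape(n_images, h * w)

# Row z is image z flattened row by row.
for z in range(n_images):
    assert np.array_equal(flat[z], data[:, :, z].ravel())
print(flat.shape)  # (4, 9)
```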

The problem is that your reshape call does not respect NumPy's default (C, row-major) index order. The data will only stay aligned if the two dimensions you want to combine are adjacent and in the same position in the new array ((25, 25, 10000) -> (625, 10000)).
Then, to get the shape you want, you can transpose. It's easier to visualize with a smaller example; when you run into problems like this, always try one out in the REPL if you can.
>>> import numpy
>>> a = numpy.arange(12)
>>> a = a.reshape(2, 2, 3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5]],
       [[ 6,  7,  8],
        [ 9, 10, 11]]])
>>> a.reshape(4, 3)
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
>>> a.reshape(4, 3).T
array([[ 0,  3,  6,  9],
       [ 1,  4,  7, 10],
       [ 2,  5,  8, 11]])
No need to rollaxis!
Notice how NumPy's print layout makes this kind of reasoning easier. Between the first and second steps only the bracket positions change; the numbers all stay in the same place, which often helps when you think through shape issues.
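The same reshape-then-transpose recipe at the question's image size, as a sketch with only 10 random images instead of 10000 to keep it cheap. It checks that each row is one flattened image and that no data is copied, since reshaping a contiguous array and taking .T both return views:

```python
import numpy as np

# Same layout as the question, but with 10 images instead of 10000.
n_images = 10
data = np.random.default_rng(0).random((25, 25, n_images))

# Merge the two adjacent pixel axes first, then transpose.
flat = data.reshape(625, n_images).T   # shape (n_images, 625)

assert flat.shape == (n_images, 625)
# Each row is one image, flattened in row-major (x, y) order.
assert all(np.array_equal(flat[z], data[:, :, z].ravel()) for z in range(n_images))
# Both steps return views of the original buffer.
assert np.shares_memory(flat, data)
# The result is not C-contiguous; use np.ascontiguousarray(flat) if that matters.
print(flat.flags["C_CONTIGUOUS"])  # False
```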

Related

opencv / numpy how to get the sum of each pixel

I've got an image as a 2d array with each pixel containing an RGB value (BGR in OpenCV, I guess), and I'm trying to get a new 2d array which has the sum of each pixel instead.
e.g.
start image:
shape: (1080,1920,3)
[[[255,255,255], [0,0,0]],
[[0,120,255], [0,255,0]]]
result:
shape: (1080,1920,1)
[[[765],[0]],
[[375],[255]]]
I'm sure there's a simple Numpy solution that I just do not know yet...
Any help would be greatly appreciated!
mono = rgb.sum(axis=2)
That produces a shape (1080,1920). If you really need it to have a third dimension, you can use reshape.
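A small sketch on the 2x2 example from the question, assuming a uint8 image like the ones OpenCV returns. It shows keepdims=True, a standard argument of ndarray.sum, as an alternative to reshaping afterwards:

```python
import numpy as np

# The 2x2 example from the question, stored as uint8 like an OpenCV image.
img = np.array([[[255, 255, 255], [0, 0, 0]],
                [[0, 120, 255], [0, 255, 0]]], dtype=np.uint8)

total = img.sum(axis=2)                     # shape (2, 2)
total_3d = img.sum(axis=2, keepdims=True)   # shape (2, 2, 1), no reshape needed

print(total)
# For integer types smaller than the platform integer, NumPy accumulates
# in the platform integer, so 255 * 3 = 765 does not wrap around as in uint8.
```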
By the way, if you're really trying to produce a monochrome image, this is not the way to do it. Grayscale conversion uses a weighted sum of the channels, and OpenCV provides it via cv2.cvtColor with cv2.COLOR_BGR2GRAY.
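A NumPy-only sketch of such a weighted conversion, assuming BGR channel order and the ITU-R BT.601 luma weights (0.299 R + 0.587 G + 0.114 B). The helper name bgr_to_gray is hypothetical; OpenCV's own 8-bit implementation may round differently by one gray level.

```python
import numpy as np

def bgr_to_gray(img):
    """Hypothetical helper: weighted grayscale of an (H, W, 3) BGR uint8 image."""
    weights = np.array([0.114, 0.587, 0.299])   # B, G, R order (BT.601 weights assumed)
    gray = img.astype(np.float64) @ weights     # contracts the channel axis -> (H, W)
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

img = np.array([[[255, 255, 255], [0, 0, 0]],
                [[0, 120, 255], [0, 255, 0]]], dtype=np.uint8)
print(bgr_to_gray(img))
```

Unlike the plain sum, the result stays in the 0..255 range, so it can be stored back as uint8.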
Are you sure it's a 2-d array? Usually image arrays are 3-d, with shape (height, width, n_channels). If you have an array like that, you can use the array's sum method and sum across the channel axis.
For example:
In [1]: a = np.random.randint(0, 10, (2, 3, 4))
In [2]: a
Out[2]:
array([[[5, 1, 7, 0],
        [7, 3, 1, 5],
        [5, 7, 0, 2]],
       [[5, 2, 0, 9],
        [4, 7, 4, 4],
        [0, 7, 1, 3]]])
In [3]: a.sum(axis=-1)
Out[3]:
array([[13, 16, 14],
       [16, 19, 11]])

If I have an ndarray, say n = np.arange(16).reshape(4, 4), which is the second-to-last axis?

I was going through the NumPy documentation (of np.dot) and came across something like: "If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b". I'm a beginner with NumPy and thought there were only 2 axes, 0 (rows) and 1 (columns). Could someone please explain what this means? If I have an N-D array, say n = np.arange(16).reshape(4, 4), which is the second-to-last axis?
When you first meet them, it is natural to picture a 2-dimensional array as rows and columns: axis 0 indexes the rows and axis 1 indexes the columns. An N-D array simply has N axes, numbered 0 to N-1, and negative numbers count from the end, so axis -1 is the last axis and axis -2 is the second-to-last.
In other words, think in terms of axes (dimensions) rather than rows and columns; a 2-D array just happens to have exactly two of them.
np.arange(16).reshape(4,4)
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Here we get an array of shape (4, 4): it has 2 dimensions (axes) and 16 elements.
Below, we obtain another 2-dimensional array containing the same 16 elements, this time with shape (8, 2).
np.arange(16).reshape(8,2)
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])
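As for your question: for a 2-D array such as n = np.arange(16).reshape(4, 4), the last axis is axis 1 (it runs across the columns) and the second-to-last axis is axis 0 (it runs down the rows). So for 2-D inputs, np.dot(a, b) pairs each row of a with each column of b, which is ordinary matrix multiplication.
Be careful not to confuse an axis with a position along an axis:
a = np.arange(16).reshape(4, 4)
print(a[:, -2])
[ 2  6 10 14]
The above expression returns the second-to-last column, i.e. the elements at index -2 along the last axis; it does not select the second-to-last axis.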
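A short sketch of what "last" and "second-to-last axis" mean in practice, including the np.dot rule quoted in the question. The 3-D shapes (2, 3, 4) and (5, 4, 6) are arbitrary choices for illustration; the only requirement is that a's last axis and b's second-to-last axis have the same length:

```python
import numpy as np

a2 = np.arange(16).reshape(4, 4)
# For a 2-D array, axis -1 is axis 1 and axis -2 is axis 0.
assert np.array_equal(a2.sum(axis=-2), a2.sum(axis=0))  # sums down each column
assert np.array_equal(a2.sum(axis=-1), a2.sum(axis=1))  # sums along each row

# np.dot on N-D inputs: sum over the last axis of a and the second-to-last axis of b.
a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
b = np.arange(5 * 4 * 6).reshape(5, 4, 6)
d = np.dot(a, b)
print(d.shape)  # (2, 3, 5, 6)

# The same contraction written out: dot(a, b)[i, j, k, m] = sum(a[i, j, :] * b[k, :, m])
assert np.array_equal(d, np.einsum("ijn,knm->ijkm", a, b))
```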

1D vector from lower part of diagonal in matrix [Python]

I am struggling with a pretty easy thing, but unfortunately I cannot solve it. I have a 64x64 matrix, as you can see in the image, where reds are zeros and greens are the values I am interested in.
I would like to end up with only the lower triangular part below the diagonal (the green values) in one array.
I use Python 2.7
Thank you a lot,
Michael
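Assuming you can pull your data into a NumPy array, use the tril_indices function. It looks like your data doesn't include the main diagonal, so you can shift the offset by -1:
import numpy as np
data = np.arange(4096).reshape(64, 64)
inds = np.tril_indices(64, -1)  # row/column indices strictly below the diagonal
vals = data[inds]               # 1-D array of the 2016 values below the diagonal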
You can use np.tril_indices, which returns the indices of the lower triangular part of a matrix with a given shape. Those indices can then be used to extract values from the matrix; supposing your matrix is called arr:
arr[np.tril_indices(n=64,m=64)]
You can provide an extra offset parameter if you want to exclude the diagonal:
arr[np.tril_indices(n = 64, m = 64, k = -1)]
An example:
arr = np.array([list(range(i, 5+i)) for i in range(5)])
arr
#array([[0, 1, 2, 3, 4],
#       [1, 2, 3, 4, 5],
#       [2, 3, 4, 5, 6],
#       [3, 4, 5, 6, 7],
#       [4, 5, 6, 7, 8]])
arr[np.tril_indices(n = 5, m = 5)]
# array([0, 1, 2, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8])
Twice as fast as tril_indices on this example (note that it excludes the diagonal, like k=-1):
n = arr.shape[0]
np.concatenate([arr[i, :i] for i in range(1, n)])
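A quick check, on the 5x5 example above, that the per-row slicing gives the same values in the same order as tril_indices with k=-1 (both walk the strictly lower triangle row by row). No timing is reproduced here:

```python
import numpy as np

n = 5
arr = np.array([list(range(i, n + i)) for i in range(n)])

# Strictly lower triangle (diagonal excluded) via index arrays ...
via_indices = arr[np.tril_indices(n, k=-1)]
# ... and via per-row slices: row i contributes its first i entries.
via_slices = np.concatenate([arr[i, :i] for i in range(1, n)])

assert np.array_equal(via_indices, via_slices)
print(via_indices)
```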

How can I find the dimensions of a matrix in Python?

How can I find the dimensions of a matrix in Python? len(A) returns only one value.
Edit:
close = dataobj.get_data(timestamps, symbols, closefield)
Is (I assume) generating a matrix of integers (less likely strings). I need to find the size of that matrix, so I can run some tests without having to iterate through all of the elements. As far as the data type goes, I assume it's an array of arrays (or list of lists).
The number of rows of a list of lists is len(A) and the number of columns is len(A[0]), provided all rows have the same number of columns, i.e. all the inner lists have the same length.
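Because len(A[0]) silently assumes every row has the same length, a small helper can check that assumption. This is a sketch; the name list_dims is hypothetical:

```python
def list_dims(rows):
    """Hypothetical helper: (n_rows, n_cols) of a list of lists; raises if rows differ in length."""
    n_rows = len(rows)
    if n_rows == 0:
        return (0, 0)
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("ragged list: rows have different lengths")
    return (n_rows, n_cols)

print(list_dims([[1, 5, 6, 8], [1, 2, 5, 9], [7, 5, 6, 2]]))  # (3, 4)
```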
If you are using NumPy arrays, shape can be used.
For example
>>> a = numpy.array([[[1,2,3],[1,2,3]],[[12,3,4],[2,1,3]]])
>>> a
array([[[ 1,  2,  3],
        [ 1,  2,  3]],
       [[12,  3,  4],
        [ 2,  1,  3]]])
>>> a.shape
(2, 2, 3)
As Ayman farhat mentioned, you can use len(matrix) to get the number of rows, and the length of the first row, len(matrix[0]), to get the number of columns:
>>> a=[[1,5,6,8],[1,2,5,9],[7,5,6,2]]
>>> len(a)
3
>>> len(a[0])
4
You can also use NumPy, a library that helps you work with matrices:
>>> import numpy
>>> numpy.shape(a)
(3, 4)
To get just a correct number of dimensions in NumPy:
len(a.shape)
In the first case:
import numpy as np
a = np.array([[[1,2,3],[1,2,3]],[[12,3,4],[2,1,3]]])
print("shape = ",np.shape(a))
print("dimensions = ",len(a.shape))
The output will be:
shape = (2, 2, 3)
dimensions = 3
m = [[1, 1, 1, 0],[0, 5, 0, 1],[2, 1, 3, 10]]
print(len(m),len(m[0]))
Output
3 4
The correct answer is the following:
import numpy
numpy.shape(a)
Suppose you have an array a. To get its dimensions, use the shape attribute:
import numpy as np
a = np.array([[3,20,99],[-13,4.5,26],[0,-1,20],[5,78,-19]])
a.shape
The output of this will be
(4, 3)
You can get the height and width of a NumPy array as follows:
height = arr.shape[0]
width = arr.shape[1]
If your array has more dimensions, increase the index to access them.
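A sketch of the same idea using tuple unpacking, assuming a hypothetical height x width x channels image array (the 480x640x3 size is made up):

```python
import numpy as np

arr = np.zeros((480, 640, 3), dtype=np.uint8)  # hypothetical H x W x channels image

height, width = arr.shape[:2]     # first two dimensions
*leading, channels = arr.shape    # last dimension, whatever the rank
print(height, width, channels)    # 480 640 3
print(arr.ndim, len(arr.shape))   # 3 3
```

Slicing shape[:2] keeps working if the array later gains extra trailing dimensions, whereas unpacking the whole tuple into a fixed number of names would raise.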
You can simply find the dimensions of a matrix using NumPy:
import numpy as np
x = np.arange(24).reshape((6, 4))
x.ndim
output will be:
2
It means this matrix is a 2 dimensional matrix.
x.shape
Will show you the size of each dimension. The shape for x is equal to:
(6, 4)
A simple way I look at it:
example:
h=np.array([[[[1,2,3],[3,4,5]],[[5,6,7],[7,8,9]],[[9,10,11],[12,13,14]]]])
h.ndim
4
h
array([[[[ 1,  2,  3],
         [ 3,  4,  5]],
        [[ 5,  6,  7],
         [ 7,  8,  9]],
        [[ 9, 10, 11],
         [12, 13, 14]]]])
If you look closely, the number of opening square brackets at the beginning equals the number of dimensions of the array.
In the above array, the following indexing is used to access 7:
h[0,1,1,0]
However if we change the array to 3 dimensions as below,
h=np.array([[[1,2,3],[3,4,5]],[[5,6,7],[7,8,9]],[[9,10,11],[12,13,14]]])
h.ndim
3
h
array([[[ 1,  2,  3],
        [ 3,  4,  5]],
       [[ 5,  6,  7],
        [ 7,  8,  9]],
       [[ 9, 10, 11],
        [12, 13, 14]]])
To access element 7 in the above array, the index is h[1,1,0]

numpy: efficiently reading a large array

I have a binary file that contains a dense n*m matrix of 32-bit floats. What's the most efficient way to read it into a Fortran-ordered numpy array?
The file is multi-gigabyte in size. I get to control the format, but it must be compact (i.e. about 4*n*m bytes in length) and must be easy to produce from non-Python code.
edit: It is imperative that the method produces a Fortran-ordered matrix directly (due to the size of the data, I can't afford to create a C-ordered matrix and then transform it into a separate Fortran-ordered copy.)
NumPy provides fromfile() to read binary data.
a = numpy.fromfile("filename", dtype=numpy.float32)
will create a one-dimensional array containing your data. To access it as a two-dimensional Fortran-ordered n x m matrix, you can reshape it:
a = a.reshape((n, m), order="F")
[EDIT: The reshape() actually copies the data in this case (see the comments). To do it without copying, use
a = a.reshape((m, n)).T
Thanks to Joe Kington for pointing this out.]
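A small round-trip sketch of the no-copy approach, assuming the external writer dumps raw float32 values column after column (simulated here with ravel(order="F").tofile on a 3x4 matrix in a temporary file). It also shows np.memmap with order="F", an alternative not mentioned in the answer above, which maps the file instead of reading it all into memory:

```python
import os
import tempfile
import numpy as np

n, m = 3, 4
original = np.arange(n * m, dtype=np.float32).reshape(n, m)

# Simulated writer: raw float32 values, column after column (Fortran order).
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
original.ravel(order="F").tofile(path)

# Read the flat data, then reinterpret it as Fortran-ordered (n, m) without copying.
flat = np.fromfile(path, dtype=np.float32)
a = flat.reshape((m, n)).T
assert a.flags["F_CONTIGUOUS"] and np.shares_memory(a, flat)
assert np.array_equal(a, original)

# Alternative: memory-map the file; pages are loaded lazily as they are accessed.
mm = np.memmap(path, dtype=np.float32, mode="r", shape=(n, m), order="F")
assert np.array_equal(mm, original)
```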
But to be honest, if your matrix is several gigabytes, I would go for an HDF5 tool like h5py or PyTables. Both projects have FAQ entries comparing themselves with the other. I generally prefer h5py, though PyTables seems to be more commonly used (and the scopes of the two projects are slightly different).
HDF5 files can be written from most programming languages used in data analysis. The list of interfaces in the linked Wikipedia article is not complete; for example, there is also an R interface. But I don't actually know which language you want to use to write the data...
Basically, NumPy stores an array's data as a flat block of memory. The multiple dimensions are just an illusion created by the shape and strides that NumPy uses to index into that block.
For a thorough but easy-to-follow explanation of how NumPy works internally, see the excellent chapter 19 of the book Beautiful Code.
At least NumPy's array() and reshape() take an order argument: C ('C'), Fortran ('F') or preserved order ('A').
Also see the question How to force numpy array order to fortran style?
An example with the default C indexing (row-major order):
>>> a = np.arange(12).reshape(3,4) # <- C order by default
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a[1]
array([4, 5, 6, 7])
>>> a.strides
(32, 8)
Indexing using Fortran order (column-major order):
>>> a = np.arange(12).reshape(3,4, order='F')
>>> a
array([[ 0,  3,  6,  9],
       [ 1,  4,  7, 10],
       [ 2,  5,  8, 11]])
>>> a[1]
array([ 1, 4, 7, 10])
>>> a.strides
(8, 24)
The other view
Also, you can always get the other kind of view through the T attribute of an array:
>>> a = np.arange(12).reshape(3,4, order='C')
>>> a.T
array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])
>>> a = np.arange(12).reshape(3,4, order='F')
>>> a.T
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
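A sketch confirming what .T does under the hood: it returns a view with the strides reversed, so a C-ordered array's transpose is Fortran-contiguous and no data moves. The arrays are the same small 3x4 examples as above:

```python
import numpy as np

c = np.arange(12).reshape(3, 4)             # C order
f = np.arange(12).reshape(3, 4, order="F")  # Fortran order

print(c.flags["C_CONTIGUOUS"], c.T.flags["F_CONTIGUOUS"])  # True True
# .T is a view: it swaps the strides instead of copying data.
assert np.shares_memory(c, c.T)
assert c.T.strides == c.strides[::-1]
# The transpose of the Fortran-ordered array reads back in plain row-major order.
assert np.array_equal(f.T, np.arange(12).reshape(4, 3))
```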
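You can also set the strides manually (the NumPy documentation discourages assigning to strides and suggests numpy.lib.stride_tricks.as_strided instead):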
>>> a = np.arange(12).reshape(3,4, order='C')
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a.strides
(32, 8)
>>> a.strides = (8, 24)
>>> a
array([[ 0,  3,  6,  9],
       [ 1,  4,  7, 10],
       [ 2,  5,  8, 11]])
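The same Fortran-style view built with numpy.lib.stride_tricks.as_strided, as a sketch. It computes the strides from itemsize instead of hard-coding 8 bytes, and it returns a new view, leaving a itself unchanged. as_strided performs no bounds checking, so wrong strides can read outside the buffer:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(12).reshape(3, 4)
item = a.itemsize  # bytes per element; platform dependent for the default int dtype

# Fortran-style strides for shape (3, 4): one element down a column,
# three elements across a row.
b = as_strided(a, shape=(3, 4), strides=(item, 3 * item))

assert np.array_equal(b, np.arange(12).reshape(3, 4, order="F"))
assert np.shares_memory(a, b)  # a view of the same buffer; a keeps its own strides
print(b)
```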
