I have a numpy ndarray object with the following shape:
(3, 256, 170, 256).
So, basically, this represents an array of 3-dimensional vectors. The vector component is the first axis, which lets one write something like array[0] to get the relevant vector component.
Now, I am trying to use the scipy pdist function, which computes the distances between the entries. So, I need to modify this array so that it can be represented as a two-dimensional matrix, where the number of rows is 256*170*256 and the number of columns is 3. pdist should then return the matrix where each element is the squared distance between the corresponding 3-dimensional vectors (if I have interpreted the documentation correctly).
Can someone tell me how I can get a view into this numpy array so that I can generate this matrix? I do not want to copy the data again (as these matrices can be quite large), so I am looking for an efficient solution.
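A minimal sketch of one way to get such a view, assuming the array is C-contiguous (the small shape and the names field and points are made up for illustration). Note that pdist returns a condensed vector of n*(n-1)/2 distances rather than a square matrix, and that its default metric is the plain Euclidean distance; squared distances need metric="sqeuclidean".

```python
import numpy as np
from scipy.spatial.distance import pdist

# Small stand-in for the (3, 256, 170, 256) array (illustrative shape only).
field = np.arange(3 * 4 * 5 * 6, dtype=float).reshape(3, 4, 5, 6)

# Collapse the three spatial axes, then transpose so that
# each row is one point and each column is one vector component.
points = field.reshape(3, -1).T          # shape (4*5*6, 3)

# For a C-contiguous input, reshape and .T both return views,
# so the two arrays share the same memory.
shares = np.shares_memory(field, points)

# Condensed distance vectors of length n*(n-1)/2.
condensed = pdist(points)                        # Euclidean (default)
squared = pdist(points, metric="sqeuclidean")    # squared Euclidean
```

The transposed view is not contiguous, so pdist may still make its own internal copy. More importantly, with n = 256*170*256 (about 11 million) points the result would hold roughly 6e13 distances, which is hundreds of terabytes in float64, so all-pairs distances are only practical on a subset of the points.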
Related
Suppose I have a numpy array A with shape (j,d,d) and I want to obtain an array with shape j, in which each entry corresponds to the determinant of each (d,d) array.
I tried using np.apply_along_axis(np.linalg.det(A), axis=0), but np.apply_along_axis only seems to work for 1D slices.
Is there an efficient way of doing that using only numpy?
np.linalg.det can already do this for an array of arbitrary shape as long as the last two dimensions are square. You can see the documentation here.
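As a small illustration of that behaviour (the random test array and the names dets and looped are just for the example):

```python
import numpy as np

# A stack of j = 5 square matrices with d = 3.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3, 3))

# det broadcasts over the leading axes and returns one value per matrix.
dets = np.linalg.det(A)                  # shape (5,)

# Same result computed one matrix at a time, for comparison.
looped = np.array([np.linalg.det(m) for m in A])
```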
I have a 2D array of values and I'm trying to analyze spatial correlations. To calculate a 2D autocorrelation like Moran's I in python, pysal provides an implementation.
1) How do I transform my 2D data into a 1D array expected by pysal?
2) How do I construct a weight array w that is based on distance (what does the input array of points mean in the Kernel distance function?)?
1) The weights array should be flattened in the same way as you flatten the data array. The order doesn't matter, as long as the indices agree.
2) The input array can be spatial coordinates (e.g. x and y, or lat and long). By far the easiest are the indices of your original matrix (e.g. 1 to n times 1 to m).
In the end, your data will be a list with 3 elements: x, y and value. Your weights will be a list with 5 elements: x_from, y_from, x_to, y_to and weight.
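A rough sketch of the flattening step using only numpy; the pysal calls themselves are not shown here, and the variable names are made up for the example. The point is that the coordinates and the values are flattened in the same order, so index k refers to the same cell in both.

```python
import numpy as np

# Example 2D data (3 rows, 4 columns).
values = np.arange(12, dtype=float).reshape(3, 4)

# Row and column index of every cell, same shape as the data.
rows, cols = np.indices(values.shape)

# Flatten everything in the same (row-major) order.
coords = np.column_stack([rows.ravel(), cols.ravel()])   # shape (12, 2)
flat_values = values.ravel()                             # shape (12,)
```

Here coords[k] gives the (row, column) position of flat_values[k], which is the kind of point array a distance-based weights function would take as input.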
I have a set of numpy.arrays of NXM (two dimensions: Range and Azimuth).
I need to form a stack of three dimensions and extract a single dimension vector to compute a covariance matrix (the red vectors in the picture).
How can I do this efficiently and easily in Python?
You can make a 3D numpy array pretty easily and then just use the indexing to pull out the bits that you're interested in:
stackOfImages = np.array((image1, image2)) #iterate over these if many more
redData = stackOfImages[:, N-1, M-1]
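A slightly fuller sketch of the same idea with made-up random images. How the covariance samples are chosen is an assumption here: each image is treated as one variable and each pixel as one observation, which may or may not match what the picture in the question shows.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 5                                   # range and azimuth sizes (example values)
images = [rng.standard_normal((N, M)) for _ in range(3)]

# Stack along a new first axis: shape (number of images, N, M).
stack = np.stack(images)

# The vector through the stack at one pixel (row 1, column 2).
pixel_vector = stack[:, 1, 2]                 # shape (3,)

# Assumption: images are variables, pixels are observations.
samples = stack.reshape(len(images), -1)      # shape (3, N*M)
cov = np.cov(samples)                         # shape (3, 3)
```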
I read something about NumPy and its Matrix class. In the documentation the authors write that we can only create a 2-dimensional Matrix. So I think they mean you can only write something like this:
input = numpy.matrix( ((1,2), (3,4)) )
Is this right?
But when I write code like this:
input = numpy.matrix( ((1,2), (3,4), (4,5)) )
it also works ...
Normally I would say OK, why not, I'm not interested in why it works. But I have to take an exam at my university, so I need to know whether I've understood it correctly, or whether they mean something else by a 2D Matrix.
Thanks for your help
They are both 2D matrices. The first one is a 2x2 matrix and the second one is a 3x2 matrix. This is very similar to 2D arrays in programming. In C, for example, the second matrix would be declared as int matrix[3][2].
A 3D matrix, by contrast, would have a declaration like int array_3d[3][2][3].
In numpy, if I try this with a 3D matrix:
>>> input = numpy.matrix((((2, 3), (4, 5)), ((6, 7), (8, 9))))
ValueError: matrix must be 2-dimensional
emre.'s answer is correct, but I would still like to address the use of numpy matrices, which might be the root of your confusion.
When in doubt about using numpy.matrix, go for ndarrays:
Matrix is actually an ndarray subclass: everything a matrix can do, an ndarray can do too (the reverse is not exactly true).
Matrix overrides the * and ** operators, and any operation between a matrix and an ndarray will return a matrix, which is problematic for some algorithms.
More on the ndarray vs matrix debate on this SO post, and specifically this short answer
From the NumPy documentation:
matrix objects inherit from the ndarray and therefore, they have the same attributes and methods of ndarrays. There are six important differences of matrix objects, however, that may lead to unexpected results when you use matrices but expect them to act like arrays:
Matrix objects can be created using a string notation to allow Matlab-style syntax where spaces separate columns and semicolons (‘;’) separate rows.
Matrix objects are always two-dimensional. This has far-reaching implications, in that m.ravel() is still two-dimensional (with a 1 in the first dimension) and item selection returns two-dimensional objects so that sequence behavior is fundamentally different than arrays.
Matrix objects over-ride multiplication to be matrix-multiplication. Make sure you understand this for functions that you may want to receive matrices. Especially in light of the fact that asanyarray(m) returns a matrix when m is a matrix.
Matrix objects over-ride power to be matrix raised to a power. The same warning about using power inside a function that uses asanyarray(...) to get an array object holds for this fact.
The default __array_priority__ of matrix objects is 10.0, and therefore mixed operations with ndarrays always produce matrices.
Matrices have special attributes which make calculations easier. [...]
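A short sketch of a few of these differences in practice (the variable names are only for the example):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)

# On an ndarray, * is element-wise and @ is the matrix product.
elementwise = a * a
product_arr = a @ a

# On a matrix, * is already the matrix product.
product_mat = m * m

# Mixing the two types yields a matrix, not an ndarray.
mixed = m * a

# ravel() keeps a matrix two-dimensional; an ndarray becomes 1D.
raveled_mat = m.ravel()      # shape (1, 4)
raveled_arr = a.ravel()      # shape (4,)
```

With plain ndarrays, the @ operator gives the matrix product without any of these side effects.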
I have two M x N matrices which I construct after extracting data from images. Both have long first rows, and after the 3rd row each row contains only a single value (the first column).
For example, the raw data looks like this:
1,23,2,5,6,2,2,6,2,
12,4,5,5,
1,2,4,
1,
2,
2
:
Both have a similar pattern, where the first three rows are long and the rows then thin out further down. To compute cosine similarity, I was thinking of padding with zeros to make the two rectangular. I looked at Python options for cosine similarity, but some examples used a package called numpy. I couldn't figure out how exactly numpy can do this type of padding and carry out a cosine similarity. Any guidance would be greatly appreciated.
If both arrays have the same dimensions, I would flatten them using NumPy. NumPy and SciPy are powerful scientific computing tools that make matrix manipulation much easier.
Here is an example of how I would do it with NumPy and SciPy:
import numpy as np
from scipy.spatial import distance
A = np.array([[1,23,2,5,6,2,2,6,2],[12,4,5,5],[1,2,4],[1],[2],[2]], dtype=object )
B = np.array([[1,23,2,5,6,2,2,6,2],[12,4,5,5],[1,2,4],[1],[2],[2]], dtype=object )
Aflat = np.hstack(A)
Bflat = np.hstack(B)
dist = distance.cosine(Aflat, Bflat)
The result here is dist = 1.10e-16 (i.e., 0).
Note that I used dtype=object here because that's the only way I know of to store rows of different lengths in a NumPy array. That's why I later used hstack() to flatten the array (instead of the more common flatten() function).
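For the zero-padding approach asked about in the question, a minimal sketch with plain NumPy could look like the following. The helper names pad_rows and cosine_similarity and the second data set are made up for illustration; both inputs are padded to the same width so that they end up with the same shape.

```python
import numpy as np

def pad_rows(rows, width):
    """Copy jagged rows into a zero-filled 2D array with the given width."""
    out = np.zeros((len(rows), width))
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out

def cosine_similarity(a, b):
    """Cosine similarity of two equally shaped arrays, treated as flat vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

raw_a = [[1, 23, 2, 5, 6, 2, 2, 6, 2], [12, 4, 5, 5], [1, 2, 4], [1], [2], [2]]
raw_b = [[1, 20, 2, 5, 6, 2, 2, 6, 2], [10, 4, 5, 5], [1, 2, 4], [1], [2], [3]]

# Pad both to the longest row found in either input.
width = max(len(r) for r in raw_a + raw_b)
A = pad_rows(raw_a, width)
B = pad_rows(raw_b, width)

similarity = cosine_similarity(A, B)
```

The padded zeros contribute nothing to the dot product or to the norms, so when the row lengths match this gives the same value as flattening the rows directly.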
I would make them into a scipy sparse matrix (http://docs.scipy.org/doc/scipy/reference/sparse.html) and then compute the cosine distance with the scikit-learn module (cosine similarity is 1 minus this distance).
from scipy import sparse
from sklearn.metrics import pairwise_distances
# your_np_array must be a regular 2D array (e.g. zero-padded) before conversion
sparse_matrix = sparse.csr_matrix(your_np_array)
# pairwise cosine distance between the rows of the matrix
distance_matrix = pairwise_distances(sparse_matrix, metric="cosine")
Why can't you just run a nested loop over both jagged lists (presumably), summing the dot product of each pair of rows and using the result as a similarity measure? This assumes that the jagged dimensions are identical.
That said, I'm not quite sure how you are getting a jagged array from a bitmap image (I would have assumed it would be a proper dense matrix of MxN form), how the jagged array of arrays above is meant to represent an MxN matrix or image data, or, therefore, how padding the data with zeros would make sense. If this were a sparse matrix representation, one would expect row/column information alongside the values.
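The row-by-row dot product suggested above could be sketched like this in plain Python. The function name jagged_cosine is made up, and normalising by the two vector lengths (to turn the summed dot product into a cosine similarity) is an addition to what the answer describes.

```python
import math

def jagged_cosine(a_rows, b_rows):
    """Cosine similarity of two jagged lists with identical row lengths."""
    if len(a_rows) != len(b_rows):
        raise ValueError("inputs must have the same number of rows")
    dot = norm_a = norm_b = 0.0
    for row_a, row_b in zip(a_rows, b_rows):
        if len(row_a) != len(row_b):
            raise ValueError("corresponding rows must have the same length")
        for x, y in zip(row_a, row_b):
            dot += x * y          # accumulate the dot product
            norm_a += x * x       # accumulate squared lengths
            norm_b += y * y
    return dot / math.sqrt(norm_a * norm_b)

a = [[1, 23, 2, 5, 6, 2, 2, 6, 2], [12, 4, 5, 5], [1, 2, 4], [1], [2], [2]]
b = [[1, 23, 2, 5, 6, 2, 2, 6, 2], [12, 4, 5, 5], [1, 2, 4], [1], [2], [2]]
similarity = jagged_cosine(a, b)
```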