In the python computer graphics kit, there is a vec3 type for the representation of three-component vectors, but how can I do the following multiplication:
A three-component vector multiply by its transpose result in a 3*3 matrix, like the following example:
a = vec3(1,1,1)
matrix_m = a * a.transpose()
Anyone knows such a library that can handle multiplying a matrix of dimension 1*3 by another one of dimension 3*1 and result in a matrix of 3*3.
Sorry, I have to clarify a bit more about this. I am talking about matrix math.
It is like:
[a0, a1, a2]*[a0, a1, a2]T = [a0*a0, a0*a1, a0*a2; a1*a0, a1*a1, a1*a2;a2*a0, a2*a1, a2*a2]
Maybe I can try write a function myself, it is so straightforward.....
Some vector math software, such as MATLAB, happily keep track of column vectors and row vectors as separate types of things. Python's Numpy doesn't, but does offer numpy.outer(A,B). Unfortunately, the Graphics Kit (I assume you refer to http://cgkit.sourceforge.net/) doesn't track rows vs columns, use numpy (which would be huge overkill), or provide a vector x vector --> matrix outer product. It looks like you'll have to write your own function to do that.
Related
I have a matrix of 63695 row vectors of dim 384.
I would like to compute the cosine similarity model for this matrix.
I was thinking of vectorizing it.
How would one proceed to that objective?
If you look in scikit-learns source code you will see that X and Y are first normalized and then X_norm # Y_norm.T (dot product) is returned. Or if as in your case no Y exists it is X_norm # X_norm.T.
Normalizing and transposing can be discarded when looking at the runtime, but the matrix multiplaction of a (63695 x 384) matrix should take somewhere in the neighbourhood of 63695*63695 (elements in result matrix) times 384*384 (element-wise multiplactions and additions to calculate one element) calculations, so something like 63695*63695*384*384 = 598,236,810,854,400 operations. (Or strictly, that number of multiplications plus that same number of additions.)
And as you already mentioned it requires 4 (Bytes for float32) * 63695 * 63695 = ~16.2 GB of memory to handle that result matrix.
Do you really need that enormous matrix? What type of data are you handling and what are you trying to do? If we are talking about e.g. vector represenations of text data then you should look at removing duplicates, processing it in chunks or reducing the dimensionality before analysing similarity. If you are looking for something like ranking these cosine similarities and finding then k most similar ones you'd be much better of using algorithms for finding similar data points instead of doing it all by hand yourself.
I'm working on a calculation for a within matrix scatter where i have a 50x20 vector and something that occured to me is that multiplying transposed vectors by the original vector, gives me a dimensional error, saying the following:
operands could not be broadcast together with shapes (50,20) (20,50)
What i tried is: array = my_array * my_array_transposed and got the aforementioned error.
The alternative was to do, then:
new_array = np.dot(my_array, np.transpose(my_array))
In Octave for instance, this would've been a lot easier, but due to the size of the vector, it's kinda hard for me to confirm for ground truth if this is the way to do the following calculation:
Because as far as i know, there is something related as to whether the multiplication is element wise.
My question is, am i applying that formula the right way? If not, whats the right way to multiply a transposed vector by the non-tranposed vector?
Yes, the np.dot formula is the correct one. If you write array = my_array * my_array_transposed you are asking Python to perform component-wise multiplication. Instead you need a row-by-column multiplication which is achieved in numpy with np.dot.
In python I have 2 three dimensional arrays:
T with size (n,n,n)
U with size (k,n,n)
T and U can be seen as many 2-D arrays one next to the other. I need to multiply all those matrices, ie I have to perform the following operation:
for i in range(n):
H[:,:,i] = U[:,:,i].dot(T[:,:,i]).dot(U[:,:,i].T)
As n might be very big I am wondering if this operation could be in some way speed up with numpy.
Carefully looking into the iterators and how they are involved in those dot product reductions, we could translate all of those into one np.einsum implementation like so -
H = np.einsum('ijk,jlk,mlk->imk',U,T,U)
I read something about NumPy and it's Matrix class. In the documentation the authors write, that we can create only a 2 dimensional Matrix. So I think they mean you can only write something like this:
input = numpy.matrix( ((1,2), (3,4))
Is this right?
But when I write code like this:
input = numpy.matrix( ((1,2), (3,4), (4,5)) )
it also works ...
Normally I would say ok, why not, I'm not intrrested why it works. But I must write an exam for my univerity and so I must know if I've understood it right or do they mean something else with 2D Matrix?
Thanks for your help
They both are 2D matrixes. The first one is 2x2 2D matrix and the second one is 3x2 2D matrix. It is very similar to 2D arrays in programming. The second matrix is defined as int matrix[3][2] in C for example.
Then, a 3D matrix means that it has the following definition: int 3d_array[3][2][3].
In numpy, if i try this with a 3d matrix:
>>> input = numpy.matrix((((2, 3), (4, 5)), ((6, 7), (8, 9))))
ValueError: matrix must be 2-dimensional
emre.'s answer is correct, but I would still like to address the use of numpy matrices, which might be the root of your confusion.
When in doubt about using numpy.matrix, go for ndarrays :
Matrix is actually a ndarray subclass : Everything a matrix can do, ndarray can do it (reverse is not exactly true).
Matrix overrides * and ** operators, and any operation between a Matrix and a ndarray will return a matrix, which is problematic for some algorithms.
More on the ndarray vs matrix debate on this SO post, and specifically this short answer
From Numpy documentation
matrix objects inherit from the ndarray and therefore, they have the same attributes and methods of ndarrays. There are six important differences of matrix objects, however, that may lead to unexpected results when you use matrices but expect them to act like arrays:
Matrix objects can be created using a string notation to allow Matlab-style syntax where spaces separate columns and semicolons (‘;’) separate rows.
Matrix objects are always two-dimensional. This has far-reaching implications, in that m.ravel() is still two-dimensional (with a 1 in the first dimension) and item selection returns two-dimensional objects so that sequence behavior is fundamentally different than arrays.
Matrix objects over-ride multiplication to be matrix-multiplication. Make sure you understand this for functions that you may want to receive matrices. Especially in light of the fact that asanyarray(m) returns a matrix when m is a matrix.
Matrix objects over-ride power to be matrix raised to a power. The same warning about using power inside a function that uses asanyarray(...) to get an array object holds for this fact.
The default __array_priority__ of matrix objects is 10.0, and therefore mixed operations with ndarrays always produce matrices.
Matrices have special attributes which make calculations easier. [...]
I have two M X N matrices which I construct after extracting data from images. Both the vectors have lengthy first row and after the 3rd row they all become only first column.
for example raw vector looks like this
1,23,2,5,6,2,2,6,2,
12,4,5,5,
1,2,4,
1,
2,
2
:
Both vectors have a similar pattern where first three rows have lengthy row and then thin out as it progress. Do do cosine similarity I was thinking to use a padding technique to add zeros and make these two vectors N X N. I looked at Python options of cosine similarity but some examples were using a package call numpy. I couldn't figure out how exactly numpy can do this type of padding and carry out a cosine similarity. Any guidance would be greatly appreciated.
If both arrays have the same dimension, I would flatten them using NumPy. NumPy (and SciPy) is a powerful scientific computational tool that makes matrix manipulations way easier.
Here an example of how I would do it with NumPy and SciPy:
import numpy as np
from scipy.spatial import distance
A = np.array([[1,23,2,5,6,2,2,6,2],[12,4,5,5],[1,2,4],[1],[2],[2]], dtype=object )
B = np.array([[1,23,2,5,6,2,2,6,2],[12,4,5,5],[1,2,4],[1],[2],[2]], dtype=object )
Aflat = np.hstack(A)
Bflat = np.hstack(B)
dist = distance.cosine(Aflat, Bflat)
The result here is dist = 1.10e-16 (i.e., 0).
Note that I've used here the dtype=object because that's the only way I know to be able to store different shapes into an array in NumPy. That's why later I used hstack() in order to flatten the array (instead of using the more common flatten() function).
I would make them into a scipy sparse matrix (http://docs.scipy.org/doc/scipy/reference/sparse.html) and then run cosine similarity from the scikit learn module.
from scipy import sparse
sparse_matrix= scipy.sparse.csr_matrix(your_np_array)
from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import cosine
distance_matrix= pairwise_distances(sparse_matrix, metric="cosine")
Why cant you just run a nested loop over both jagged lists (presumably), summating each row using Euclidian/vector dot product and using the result as a similarity measure. This assumes that the jagged dimensions are identical.
Although I'm not quite sure how you are getting a jagged array from a bitmap image (I would of assumed it would be a proper dense matrix of MxN form) or how the jagged array of arrays above is meant to represent an MxN matrix/image data, and therefore, how padding the data with zeros would make sense? If this was a sparse matrix representation, one would expect row/col information annotated with the values.