Detecting Diagonals in a matrix - python

I have outputs from a process that produces a data trend as seen below:
The data output seems to have a trend along the diagonals, but I am unsure how I can track this. Ultimately, I know the first 15 numbers in each 16-number sample and want to predict the 16th. It seems like you should be able to do this with some type of approximation that involves matrix math, or possibly a phase shift in a Fourier series. Is there a method that could achieve this? If there is a solution that can be implemented in Python, that would be preferred.

You can use my diagonal detection matrix; it was developed for a similar problem and is sometimes referred to as the Omran matrix. All you need to do is multiply the image (your matrix) by my matrix and sum the first row of the output, which gives you the number of diagonals in the image. The matrix is also very flexible and can be a vertical rectangular matrix; I used some tricks in the physical meaning to invert it. I developed it in 2010 in Zurich, while doing my PhD, to detect diagonal lines or overtones in sweeps in visual sound images. The matrix is published in "Detecting diagonal activity to quantify harmonic structure preservation with cochlear implant mapping" (formal link). The PhD thesis is called "Mechanism of music perception using cochlear implants", University of Zurich, 2011, by Sherif Omran. If you write a paper, please cite me, and good luck.
Here are similar images with overtones; I used my matrix to detect these diagonal activities, which look very similar to yours.
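The construction of the detection matrix itself is given in the cited paper, so the snippet below only illustrates the mechanics described above (multiply your matrix by a detection matrix, then sum the first row of the product); detection_matrix is a placeholder you would replace with the matrix from the publication.

import numpy as np

# image: your 2-D data (rows x cols)
image = np.random.rand(16, 16)

# Placeholder only -- build this from the construction in the cited paper.
# It must have as many rows as `image` has columns for the product below.
detection_matrix = np.eye(image.shape[1])

# Multiply the image by the detection matrix, as the answer describes...
product = image @ detection_matrix

# ...and sum the first row of the result to obtain the diagonal count.
diagonal_count = product[0].sum()
print(diagonal_count)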

Here is an example of how to check whether opposite diagonals contain only 1s, like in your case:
In [51]: import numpy as np
In [52]: from scipy.sparse import eye
Let's create a matrix with an opposite diagonal:
In [53]: a = np.fliplr(eye(5, 8, k=1).toarray())
In [54]: a
Out[54]:
array([[ 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0.]])
Flip array in the left/right direction
In [55]: f = np.fliplr(a)
In [56]: f
Out[56]:
array([[ 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.]])
The same can be done with:
In [71]: a[::-1,:]
Out[71]:
array([[ 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1., 0.]])
Get a given diagonal:
In [57]: np.diag(f, k=1)
Out[57]: array([ 1., 1., 1., 1., 1.])
In [58]: np.diag(f, k=-1)
Out[58]: array([ 0., 0., 0., 0.])
In [111]: a[::-1].diagonal(2)
Out[111]: array([ 1., 1., 1., 1., 1.])
check whether the whole diagonal contains 1s
In [61]: np.all(np.diag(f, k=1) == 1)
Out[61]: True
or
In [64]: (np.diag(f, k=1) == 1).all()
Out[64]: True
In [65]: (np.diag(f, k=0) == 1).all()
Out[65]: False
This answer will help you to find all diagonals; a loop over all offsets is sketched below.
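A minimal sketch using the same 5x8 example matrix: flip the matrix vertically, then walk over every diagonal offset and report the anti-diagonals that consist entirely of 1s.

import numpy as np

a = np.fliplr(np.eye(5, 8, k=1))   # the example matrix from above

rows, cols = a.shape
flipped = a[::-1, :]               # anti-diagonals of `a` become diagonals of `flipped`

# diagonal offsets run from -(rows - 1) to cols - 1
for k in range(-(rows - 1), cols):
    d = np.diag(flipped, k=k)
    if d.size and np.all(d == 1):
        print("anti-diagonal with offset", k, "is all ones:", d)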
P.S. I'm a newbie in NumPy, so I'm pretty sure there are faster and more elegant solutions.

Related

Is there an efficient way of representing a 2D numpy array for the purpose of fitting a GMM to it?

I have been using Gaussian Mixture Models (GMM) to model a set of peaks in a 2D numpy array (a).
a = np.array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 100., 1000., 100., 2., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 100., 100., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 2., 1., 2., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
The problem is that in order to fit a GMM to my data with sklearn, I first have to generate a density_array, which holds a huge number of data points depending on the height of the peaks in a.
import numpy as np
from sklearn import mixture

def convert_to_density_array(array):
    """
    Convert an array to a density array
    """
    density_list = []
    # iterate over each i,j coordinate in the array
    for (i, j), value in np.ndenumerate(array):
        for x in range(int(value)):
            density_list.append((i, j))
    return np.array(density_list)

density_array = convert_to_density_array(a)
gmm = mixture.GaussianMixture(n_components=2, covariance_type='full').fit(density_array)
You can store the data at lower precision by adding dtype=np.float32 to your np.array call. That is fine as long as you can live with 8 digits of precision instead of 15 (which is totally acceptable in your case), but it is the only way to keep the same data in memory with a smaller footprint and still pass it to the GMM.
What you are trying to do is curve fitting, not data modelling, so you can use SciPy's curve_fit on your original data without building density_array in the first place. You just have to pass it a function of two Gaussians and, in a loop, change the initial estimate randomly until you get the least error. Since writing that code will take some time, consider this approach only if you cannot get your data into memory any other way.
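As a rough illustration of that curve-fitting route (a sketch only, assuming two isotropic Gaussians are an adequate model for your peaks and using a single hand-picked initial estimate rather than the random restarts suggested above), you can fit the grid values directly with scipy.optimize.curve_fit:

import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(coords, a1, r1, c1, s1, a2, r2, c2, s2):
    """Sum of two isotropic 2-D Gaussians evaluated at (row, col) coordinates."""
    r, c = coords
    g1 = a1 * np.exp(-((r - r1) ** 2 + (c - c1) ** 2) / (2 * s1 ** 2))
    g2 = a2 * np.exp(-((r - r2) ** 2 + (c - c2) ** 2) / (2 * s2 ** 2))
    return g1 + g2

rows, cols = np.indices(a.shape)          # `a` is the 2-D array from the question
coords = (rows.ravel(), cols.ravel())
values = a.ravel()

# initial guess: amplitude, row, col and width for each of the two peaks
p0 = [1000.0, 0.0, 24.0, 1.0, 100.0, 1.0, 25.0, 1.0]
popt, pcov = curve_fit(two_gaussians, coords, values, p0=p0, maxfev=10000)
print(popt)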

Comparing Arrays for Accuracy

I have 2 arrays:
np.array(y_pred_list).shape
# returns (5, 47151, 10)
np.array(y_val_lst).shape
# returns (5, 47151, 10)
np.array(y_pred_list)[:, 2, :]
# returns
array([[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
np.array(y_val_lst)[:, 2, :]
# returns
array([[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
I would like to go through all 47151 examples and calculate the "accuracy", meaning the number of entries in y_pred_list that match y_val_lst, divided by 47151. What's the comparison function for this?
You can find a lot of useful classification scores in sklearn.metrics, particularly accuracy_score(). See the doc here; you would use it as:
import sklearn.metrics

acc = sklearn.metrics.accuracy_score(np.array(y_val_lst)[:, 2, :],
                                     np.array(y_pred_list)[:, 2, :])
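If you want a single score over all 47151 examples rather than one slice, one option (a sketch, assuming your entries stay 0/1-valued so sklearn treats each length-10 row as a multilabel indicator) is to flatten the first two axes before scoring:

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array(y_val_lst).reshape(-1, 10)    # (5 * 47151, 10)
y_pred = np.array(y_pred_list).reshape(-1, 10)

# counts a sample as correct only when its whole 10-element row matches exactly
acc = accuracy_score(y_true, y_pred)
print(acc)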
Sounds like you want something like this:
accuracy = (np.asarray(y_pred_list) == np.asarray(y_val_lst)).all(axis=(0,2)).mean()
...though since your arrays are clearly floating-point arrays, you might want to allow for numerical-precision errors rather than insisting on exact equality:
accuracy = (np.abs(np.asarray(y_pred_list) - np.asarray(y_val_lst)) < tolerance).all(axis=(0,2)).mean()
(where, for example, tolerance = 1e-10)
The .all(axis=(0,2)) call records cases in which everything in its input is True (i.e. everything matches) when working along the dimension 0 (i.e. the one that has extent 5) and dimension 2 (the one that has extent 10). It outputs a one-dimensional array of length 47151. The .mean() call then gives you the proportion of matches in that sequence, which is my best guess as to what you mean by "over 47151".

Sparse Construct: Repeating Identity

Say I have, with ij being large (e.g. 5000), the two following matrices:
E = np.identity((ij))
oneVector = np.ones((1, ij))
and I need to compute
np.kron(E, oneVector)
This is quite slow and inefficient. Basically, the Kronecker product of identity and a row vector of ones is repeating the identity matrix horizontally oneVector.size times.
I believe that creating a sparse product would make more sense. scipy.sparse.kron would allow me to create that product if I had both A, B as sparse. But I don't know how to create the vector of ones as a "sparse type" matrix.
Is there a simple way to generate the sparse equivalent of np.ones() or is there another way I should proceed?
The arguments to scipy.sparse.kron do not have to be sparse.
In [31]: import numpy as np
In [32]: import scipy.sparse as sp
In [33]: ij = 4
In [34]: E = sp.identity(ij) # Sparse identity matrix
In [35]: oneVector = np.ones((1, ij)) # Dense
In [36]: m = sp.kron(E, oneVector) # m is sparse.
In [37]: m
Out[37]:
<4x16 sparse matrix of type '<type 'numpy.float64'>'
with 16 stored elements (blocksize = 1x4) in Block Sparse Row format>
In [38]: m.A
Out[38]:
array([[ 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1.]])
P.S. Based on this comment:
Basically, the Kronecker product of identity and a row vector of ones is repeating the identity matrix horizontally oneVector.size times.
I wonder if you meant kron(oneVector, E):
In [39]: m = sp.kron(oneVector, E)
In [40]: m.A
Out[40]:
array([[ 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1.]])
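If that horizontal repetition of the identity is literally what you want, an equivalent construction (a small sketch, not from the answer above) is to hstack sparse copies of the identity, which produces the same matrix as kron(oneVector, E):

import scipy.sparse as sp

ij = 4
E = sp.identity(ij, format='csr')

# repeat the sparse identity horizontally ij times; same result as sp.kron(oneVector, E)
m = sp.hstack([E] * ij, format='csr')
print(m.shape)       # (4, 16)
print(m.toarray())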

Memory Error in Python when trying to sum through Numpy Ndarray Object

I have a huge NumPy ndarray (called mat, of shape 700000 x 6000) whose columns I want to sum over so that I can find the nonzero indices.
I want to sum through it like so:
x = np.sum(mat[:,y], axis=1)
indices = np.nonzero(x)
But the first line immediately gives me a MemoryError. Is there a way to get around using np.sum and do this another way that makes the calculation possible?
You have two problems:
See Sven Marnach's comment: it is possible that your data set is too large for your hardware.
See ajcr's comment: what you want to do is not feasible the way you are trying to do it, because the notation mat[:,an_index] gives you back an array of dimensionality one, whose only axis is axis=0.
Another problem is the nature of your array: if it is an array of floating-point numbers, the probability that the sum of 700,000 entries is exactly zero is close to zero. It's not impossible, of course, but it is certainly unlikely.
That said, if you can reduce your data set or improve your hardware, you can do it like this:
In [39]: a = np.zeros((10,5))
In [40]: for i in range(5): a[3,i]=1+2*i if i != 3 else 0.0
In [41]: a
Out[41]:
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 1., 3., 5., 0., 9.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
In [42]: np.sum(a,axis=0)
Out[42]: array([ 1., 3., 5., 0., 9.])
In [43]: np.nonzero(np.sum(a,axis=0))
Out[43]: (array([0, 1, 2, 4]),)
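If the MemoryError comes from the temporary array created by the fancy indexing mat[:, y], one workaround (a sketch, assuming mat itself fits in memory but the extra copy does not) is to accumulate the row sums in chunks:

import numpy as np

def chunked_row_sums(mat, cols, chunk=50000):
    """Sum the selected columns of `mat` along axis=1, `chunk` rows at a time,
    so that the temporary copy made by the fancy indexing stays small."""
    out = np.empty(mat.shape[0], dtype=mat.dtype)
    for start in range(0, mat.shape[0], chunk):
        stop = min(start + chunk, mat.shape[0])
        out[start:stop] = mat[start:stop][:, cols].sum(axis=1)
    return out

# x = chunked_row_sums(mat, y)
# indices = np.nonzero(x)             # or np.nonzero(~np.isclose(x, 0)) for float data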

How can I create a matrix, or convert a 2D array into matrix in Python?

I wish to be able to extract a row or a column from a 2D array in Python such that it preserves the 2D shape and can be used for matrix multiplication. However, I cannot find in the documentation how this can best be done. For example, I can use
a = np.zeros(shape=(6,6))
to create an array, but a[:,0] will have the shape of (6,), and I cannot multiply this by a matrix of shape (6,1). Do I need to reshape a row or a column of an array into a matrix for every matrix multiplication, or are there other ways to do matrix multiplication?
You could use np.matrix directly:
>>> a = np.zeros(shape=(6,6))
>>> ma = np.matrix(a)
>>> ma
matrix([[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]])
>>> ma[0,:]
matrix([[ 0., 0., 0., 0., 0., 0.]])
or you could add the dimension with np.newaxis
>>> a[0,:][np.newaxis, :]
array([[ 0., 0., 0., 0., 0., 0.]])
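If you would rather stay with plain ndarrays (the NumPy docs now discourage new uses of np.matrix), a sketch of the same idea is to index with a slice or a list so that the result keeps two dimensions:

import numpy as np

a = np.zeros(shape=(6, 6))

col = a[:, 0:1]        # shape (6, 1): slicing keeps the dimension
row = a[0:1, :]        # shape (1, 6)
col2 = a[:, [0]]       # indexing with a list also keeps the 2-D shape

b = np.ones((6, 1))
print(col.T @ b)       # (1, 6) @ (6, 1) -> (1, 1), ordinary matrix product on ndarrays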
