Cross correlation of subarrays - python

I want to get the cross-correlation of each subarray. I have only found code for the correlation of whole, given arrays, not of subarrays.
I have one array, split into subarrays:
z = [sensor_z[pre:nxt] for pre, nxt in zip(peak_pos, peak_pos[1:])]
That's my array, divided into subarrays.
What I want is something like:
k = []
for elem in z:
    k.append(np.correlate(elem, elem + 1))  # intended: correlate elem with the *next* subarray
print(k)
I want to compare all the subarrays and calculate the cross-correlation for each one against the others.
Does anyone have an idea how to do this?
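One approach (a sketch, assuming you want every pair of subarrays compared; sensor_z and peak_pos below are dummy stand-ins for your data):

import numpy as np
from itertools import combinations

# Dummy data in place of the question's sensor_z / peak_pos.
sensor_z = np.random.randn(100)
peak_pos = [0, 25, 50, 75, 100]

# Split the signal into subarrays between consecutive peaks, as above.
z = [sensor_z[pre:nxt] for pre, nxt in zip(peak_pos, peak_pos[1:])]

# Cross-correlate every pair of subarrays; np.correlate takes two 1-D arrays.
k = {(i, j): np.correlate(z[i], z[j], mode='full')
     for i, j in combinations(range(len(z)), 2)}
for (i, j), corr in k.items():
    print(i, j, corr.max())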

Related

Iteratively add values to a zeros matrix from another matrix with non-zero values

I have a 100x100 matrix "A" and I want to create a new 100x100 matrix "B" that contains only the highest values of "A", with zeros everywhere else.
How can I do it iteratively, adding one entry at a time?
In fact, I want to do the following:
I have a matrix "A" from which I've computed the minimum spanning tree (matrix "B"), and now I want to add the highest values of "A" to this MST matrix, one at a time, until the matrix reaches a desired density.
So I need to add one value at a time, compute the matrix density, and keep adding the highest remaining values of "A" until "B" reaches the desired density.
Assuming that density is proportional to the number of elements, you can use np.argpartition:
A = ...  # 100 x 100 array
n = ...  # number of entries you want to keep
M = np.zeros_like(A)
ind = np.argpartition(A, -n, axis=None)    # flat indices; the n largest come last
M.ravel()[ind[-n:]] = A.ravel()[ind[-n:]]  # copy the n largest entries of A into M
Even if your density estimate is not as clear cut as this, this allows you to build an effective binary search or other iterative algorithm to find the right number of elements.
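A sketch of that binary search, assuming density() is a hypothetical callable that returns the density of a candidate matrix and grows with the number of added entries:

import numpy as np

def fill_to_density(A, B, density, target):
    order = np.argsort(A, axis=None)   # flat indices of A, ascending

    def candidate(n):
        M = B.copy()
        if n:                          # overlay the n largest entries of A
            M.ravel()[order[-n:]] = A.ravel()[order[-n:]]
        return M

    lo, hi = 0, A.size
    while lo < hi:                     # find the smallest n with density >= target
        mid = (lo + hi) // 2
        if density(candidate(mid)) < target:
            lo = mid + 1
        else:
            hi = mid
    return candidate(lo)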

subsetting numpy array to rows within a d-dimensional hypercube

I have a numpy array of shape n x d. Each row represents a point in R^d. I want to filter this array down to the rows that lie within a given distance, on each axis, of a single point: a d-dimensional hypercube, as it were.
In one dimension, this could be:
array[(array > lmin) & (array < lmax)]
where lmax and lmin are the max and min relevant to the point ± the distance. But I want to do this in d dimensions, and d is not fixed, so hard-coding it out doesn't work. I checked whether the above works when lmax and lmin are d-length vectors, but it just flattens the array.
I know I could plug the matrix and the point into a distance calculator like scipy.spatial.distance and get some distance metric, but that's likely slower than simple filtering (if it exists) would be.
Since I have to do this calculation potentially millions of times, ideally I'd like a fast solution.
You can try this:
def test(array, lmin, lmax):
    large = array > lmin   # elementwise; broadcasts if lmin/lmax are d-length vectors
    small = array < lmax
    return array[[i for i in range(array.shape[0])
                  if np.all(large[i]) and np.all(small[i])]]
For every i, array[i] is a point: all of its coordinates should lie in the range (lmin, lmax), and this check can be vectorized.
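A vectorized version of the same check (a sketch; point and dist are the query point and per-axis half-width from the question):

import numpy as np

def in_hypercube(array, point, dist):
    lmin, lmax = point - dist, point + dist
    mask = np.all((array > lmin) & (array < lmax), axis=1)  # one bool per row
    return array[mask]

points = np.random.randn(1000, 3)
subset = in_hypercube(points, np.zeros(3), 0.5)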

How to separate multiple matrices from a for loop?

I have a function that gives a matrix as a result; since I'm using a for loop and append, the result is 20 matrices in a list. I would like to add up the lower and the upper values of every matrix. np.sum(np.tril(matrix, -1)) adds up the values of all the matrices at once. Is it possible to do it per matrix? Or can I get 20 separate matrices to do this?
matrix = []
for i in clubs:
    matrix.append(simulate_match(poisson_model, 'ARSENAL', i, max_goals=10))
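Keeping the results in a list means you can reduce each matrix separately. A sketch (the random matrices stand in for simulate_match output):

import numpy as np

matrix = [np.random.rand(11, 11) for _ in range(20)]  # placeholder results

lower = [np.sum(np.tril(m, -1)) for m in matrix]  # strictly lower triangle, per matrix
upper = [np.sum(np.triu(m, 1)) for m in matrix]   # strictly upper triangle, per matrix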

How do I organize an uneven matrix of many calculations in pandas / numpy

I am calculating a model that requires a large number of calculations in a big matrix representing interactions between households (numbering N, roughly 10E4) and firms (numbering M, roughly 10E4). In particular, I want to perform the following steps:
1. X2 is an N x M matrix of pairwise distances between each household and each firm. Multiply every entry by a parameter gamma.
2. delta is a vector of length M. Broadcast-multiply delta into the rows of the matrix from step 1.
3. Exponentiate the matrix from step 2.
4. Calculate the row sums of the matrix from step 3.
5. Broadcast-divide the matrix from step 3 by the row-sum vector from step 4, across rows.
6. w is a vector of length N. Broadcast-multiply w into the columns of the matrix from step 5.
7. Take the column sums of the matrix from step 6.
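For reference, steps 1-7 map to plain numpy like this (a sketch with dummy sizes and values; the sign of gamma is an assumption):

import numpy as np

N, M = 1000, 1000
rng = np.random.default_rng(0)
X2 = rng.random((N, M))                  # pairwise distances
gamma = -2.0                             # scalar parameter (assumed value)
delta = rng.random(M)                    # length-M vector
w = rng.random(N)                        # length-N vector

A = np.exp(gamma * X2 * delta)           # steps 1-3: scale, broadcast delta, exponentiate
A /= A.sum(axis=1, keepdims=True)        # steps 4-5: row sums, divide into rows
result = (w[:, None] * A).sum(axis=0)    # steps 6-7: broadcast w, column sums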
These steps have to be performed thousands of times when matching the model simulation to data. At the moment, I have an implementation that uses a big N x M numpy array and matrix-algebra operations to perform the steps described above.
I would like to be able to reduce the number of calculations by eliminating all the "cells" where the distance is greater than some critical value r.
How can I organize my data to do this, while performing all the operations I need to do (exponentiation, row/column sums, broadcasting across rows and columns)?
The solution I have in mind is to store the distance matrix in "long form", with each row representing a household / firm pair, rather than as an N x M matrix; delete all the invalid rows, leaving an array whose length is somewhat less than NM; and then perform all the calculations in that format. I am wondering whether pandas dataframes can make the "broadcasts" and "row sums" work properly (and quickly) in this layout. How can I make that work?
(Or alternately, if there is a better way I should be exploring, I'd love to know!)
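One alternative worth exploring (a sketch using scipy.sparse rather than pandas; r and all the inputs below are dummies): keep only the pairs within the cutoff and run the same seven steps on the surviving entries.

import numpy as np
from scipy import sparse

N, M = 1000, 1000
rng = np.random.default_rng(0)
X2, delta, w = rng.random((N, M)), rng.random(M), rng.random(N)
gamma, r = -2.0, 0.1                                 # hypothetical parameter and cutoff

rows, cols = np.nonzero(X2 <= r)                     # surviving household/firm pairs
vals = np.exp(gamma * X2[rows, cols] * delta[cols])  # steps 1-3 on kept entries only
S = sparse.csr_matrix((vals, (rows, cols)), shape=(N, M))

row_sums = np.asarray(S.sum(axis=1)).ravel()         # step 4; guard against empty rows
S = sparse.diags(1.0 / row_sums) @ S                 # step 5
result = np.asarray((sparse.diags(w) @ S).sum(axis=0)).ravel()  # steps 6-7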

Quick comparison of numpy array elements, greater or less than each other

I'm currently implementing clustering algorithms in python. As the end product will be working with thousands of array elements, I'm attempting to minimize looping and optimize as much as possible from the start.
I'm using scipy's cdist to create a 2D array of distances from a chosen number of random clusters. So 3 clusters would produce an array of distances, say for x points:
distances = array([[5.5, 2.5, 7.3],
                   [1.0, 4.6, 2.2],
                   [6.0, 2.8, 7.1],
                   [5.3, 4.6, 1.5],
                   ...........]])
where each column is the distance from a cluster and each row is a point. I wish to quickly create an array of labels 0, 1 or 2 (with a possible resolution for identical distances occurring), like so:
label = array([1,0,1,2,.......])
A quick solution, other than looping, would be appreciated.
Use
distances.argmin(axis=1)
which returns
array([1, 0, 1, 2])
for your example array.
For identical distances, it returns the index of the first occurrence.
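The example end to end, with an optional check for rows where two clusters are exactly equidistant:

import numpy as np

distances = np.array([[5.5, 2.5, 7.3],
                      [1.0, 4.6, 2.2],
                      [6.0, 2.8, 7.1],
                      [5.3, 4.6, 1.5]])

label = distances.argmin(axis=1)   # -> array([1, 0, 1, 2])

# Rows with more than one minimum, if ties need special handling:
ties = (distances == distances.min(axis=1, keepdims=True)).sum(axis=1) > 1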
