I have a numpy array of shape n x d. Each row represents a point in R^d. I want to filter this array to only rows within a given distance on each axis of a single point--a d-dimensional hypercube, as it were.
In 1 dimension, this could be:
array[(array > lmin) & (array < lmax)]
where lmax and lmin are the max and min relevant to the point ± distance. But I want to do this in d dimensions. d is not fixed, so hard-coding the comparisons doesn't work. I checked whether the above works when lmax and lmin are d-length vectors, but it just flattens the array.
I know I could plug the matrix and the point into a distance calculator like scipy.spatial.distance and get some sort of distance metric, but that's likely slower than some simple filtering (if it exists) would be.
Since I have to do this calculation potentially millions of times, I'd ideally like a fast solution.
You can try this.
def test(array):
    # lmin and lmax are taken from the enclosing scope
    large = array > lmin   # elementwise: coordinate above the lower bound
    small = array < lmax   # elementwise: coordinate below the upper bound
    # keep row i only if every coordinate of array[i] is in range
    return array[[i for i in range(array.shape[0])
                  if np.all(large[i]) and np.all(small[i])]]
For every i, array[i] is a d-dimensional point, and it is kept only if all of its coordinates lie between lmin and lmax. The row selection itself can be vectorized as well.
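For reference, a fully vectorized variant (a sketch; it assumes lmin and lmax are scalars or length-d arrays that broadcast against the rows, and filter_box is my name, not from the answer) replaces the Python-level loop with a single boolean mask:

import numpy as np

def filter_box(array, lmin, lmax):
    # Row i survives only if every coordinate is strictly inside (lmin, lmax)
    mask = np.all((array > lmin) & (array < lmax), axis=1)
    return array[mask]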
I am computing a model that requires a large number of calculations on a big matrix representing interactions between households (N of them, roughly 10E4) and firms (M of them, roughly 10E4). In particular, I want to perform the following steps:
1. X2 is an N x M matrix giving the pairwise distance between each household and each firm. Multiply every entry by a parameter gamma.
2. delta is a vector of length M. Broadcast-multiply delta into the rows of the matrix from step 1.
3. Exponentiate the matrix from step 2.
4. Calculate the row sums of the matrix from step 3.
5. Broadcast-divide the matrix from step 3 by the row-sum vector from step 4, across its rows.
6. w is a vector of length N. Broadcast-multiply w into the columns of the matrix from step 5.
7. Take the column sums of the matrix from step 6.
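For concreteness, a dense NumPy sketch of these seven steps (assuming X2, gamma, delta, and w as defined above; the variable names are illustrative) could be:

import numpy as np

# X2: N x M distances, gamma: scalar, delta: length M, w: length N
T = np.exp(gamma * X2 * delta[None, :])    # steps 1-3
T /= T.sum(axis=1, keepdims=True)          # steps 4-5: divide each row by its sum
result = (w[:, None] * T).sum(axis=0)      # steps 6-7: weight rows by w, take column sums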
These steps have to be performed thousands of times in the course of matching the model simulation to data. At the moment I have an implementation that uses a big N x M NumPy array and matrix-algebra operations to perform the steps as described above.
I would like to be able to reduce the number of calculations by eliminating all the "cells" where the distance is greater than some critical value r.
How can I organize my data to do this, while performing all the operations I need to do (exponentiation, row/column sums, broadcasting across rows and columns)?
The solution I have in mind is to store the distance matrix in "long form", with one row per household/firm pair, rather than as an N x M matrix; delete all the invalid rows to get an array whose length is something less than NM; and then perform all the calculations in this format. I am wondering whether I can use pandas dataframes to make the "broadcasts" and "row sums" work properly (and quickly) in this layout. How can I make that work?
(Or alternately, if there is a better way I should be exploring, I'd love to know!)
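To illustrate the long-form idea without pandas, here is a minimal sketch using index arrays and np.bincount for the grouped sums (all variable names here are mine; N, M, X2, gamma, delta, w, and the cutoff r are as described above):

import numpy as np

# Long form: keep only household/firm pairs with distance <= r
keep = X2 <= r
hh_idx, firm_idx = np.nonzero(keep)   # row/column index of each surviving pair
dist = X2[keep]                       # distances in "long form", length K <= N*M

vals = np.exp(gamma * dist * delta[firm_idx])                  # steps 1-3
row_sums = np.bincount(hh_idx, weights=vals, minlength=N)      # step 4
vals /= row_sums[hh_idx]                                       # step 5
vals *= w[hh_idx]                                              # step 6
col_sums = np.bincount(firm_idx, weights=vals, minlength=M)    # step 7

If the distances do not change between iterations, keep, hh_idx, firm_idx, and dist can be computed once and reused across the thousands of evaluations.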
Given a 2D M x N NumPy array and a list of rotation distances, I want to rotate all M rows over the distances in the list. This is what I currently have:
import numpy as np
M = 6
N = 8
dists = [2,0,2,1,4,2] # for example
matrix = np.random.randint(0,2,(M,N))
for i in range(M):
    matrix[i] = np.roll(matrix[i], -dists[i])
The last two lines are actually part of an inner loop that gets executed hundreds of thousands of times and it is bottlenecking my performance as measured by cProfile. Is it possible to, for instance, avoid the for-loop and to do this more efficiently?
We can simulate the rolling behaviour with a modulus operation: adding dists to an arange(N) array gives, for each row, the column indices from which elements are to be picked within that same row. Broadcasting lets us vectorize this across all rows. Thus, we would have an implementation like so -
dists = np.asarray(dists)  # list -> array so that broadcasting below works
M,N = matrix.shape         # store matrix shape
# Get column indices for all elements of a rolled version with modulus operation
col_idx = np.mod(np.arange(N) + dists[:,None], N)
# Index into matrix with ranged row indices and col indices to get final output
out = matrix[np.arange(M)[:,None], col_idx]
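As a quick sanity check (my addition, not part of the original answer), the vectorized result can be compared with the loop version:

expected = np.array([np.roll(row, -d) for row, d in zip(matrix, dists)])
assert np.array_equal(out, expected)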
I have 9 different numpy arrays that denote the same quantity, in our case xi. Each has length 19, i.e. the quantity has been binned into 19 bins.
The difference between these 9 arrays is that they have been calculated using jackknife resampling, i.e. by omitting some elements each time, with the procedure repeated 9 times.
I would now like to calculate the covariance matrix, which should be of size 19x19. The square root of the diagonal elements of this covariance matrix should give me the error on this quantity (xi) for each bin (19 bins overall).
The equation for the covariance matrix is the standard jackknife estimate:

C_{ij} = \frac{n-1}{n} \sum_{k=1}^{n} \left(\xi_i^{(k)} - \bar{\xi}_i\right)\left(\xi_j^{(k)} - \bar{\xi}_j\right)

Here xi is the quantity, i and j index the 19 bins, k runs over the n = 9 jackknife resamplings, and \bar{\xi}_i is the mean over resamplings.
I didn't want to write the code by hand, so I tried numpy.cov:
vstack = np.vstack((array1,array2,....,array9))
cov = np.cov(vstack)
This is giving me a matrix of size 9x9 instead of 19x19.
What is the mistake here? Each array, i.e. array1, array2...etc all are of length 19.
As you can see in the example in the docs, np.cov treats each row as one variable, so the output is an m x m matrix where m is the number of rows. With 9 rows you therefore get a 9x9 matrix.
If you expect a 19x19 matrix, you have mixed up your rows and columns and should transpose the stacked array:
vst = np.vstack((array1,array2,....,array9))
cov_matrix = np.cov(vst.T)
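Equivalently, np.cov takes a rowvar flag, so the explicit transpose can be avoided:

cov_matrix = np.cov(vst, rowvar=False)  # treat each of the 19 columns as a variable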
I have a 2-d array for which I want to detect all locally maximal array indices. That is, given an index (i, j), its maximum gradient is the largest absolute change from any of its 8 neighboring values:
Index: (i, j)
Neighbors:
(i-1,j+1) (i,j+1) (i+1,j+1)
(i-1,j) [index] (i+1,j)
(i-1,j-1) (i,j-1) (i+1,j-1)
Neighbor angles:
315 0 45
270 [index] 90
225 180 135
MaxGradient(i,j) = Max over the 8 neighbors of |Val(index) - Val(neighbor)|
The index is said to be locally maximal if its MaxGradient is at least as large as any of its neighbors' own MaxGradients.
The output of the algorithm should be a 2-d array of tuples, or a 3-d array, where for each index in the original array, the output array contains a value indicating if that index was locally maximal and, if so, the angle of the gradient.
My initial implementation simply passed over the array twice, once to calculate the max gradients (stored in a temporary array) and then once over the temp array to determine the locally maximal indices. Each time, I did this via for loops, looking at each index individually.
Is there some more efficient way to do this in numpy?
Consider these 8 relative indexes:
X1 X2 X3
X4 X X5
X6 X7 X8
You can compute for every pixel X the differences D1=Val(X)-Val(X1), D2=Val(X)-Val(X2), D3=Val(X)-Val(X3), D4=Val(X)-Val(X4). You don't need to compute the other differences because they are mirrors of the first four.
To compute the differences, you can pad the image with a row and a column of zeros and subtract.
As Cyborg pointed out, there are only four differences which need to be computed to complete your calculation (note that there really should be a factor of 1/sqrt(2) for the diagonal and antidiagonal calculations if this really is a spatial gradient calculation on a uniform grid). If I have understood your question, the implementation with numpy could be something like this:
A=np.random.random(100).reshape(10,10)
# Padded copy of A
B=np.empty((12,12))
B[1:-1,1:-1]=A
B[0,1:-1]=A[0,:]
B[-1,1:-1]=A[-1,:]
B[1:-1,0]=A[:,0]
B[1:-1,-1]=A[:,-1]
# Corners replicate the nearest corner values of A
# (the whole padding block is equivalent to B = np.pad(A, 1, mode='edge'))
B[0,0]=A[0,0]
B[0,-1]=A[0,-1]
B[-1,0]=A[-1,0]
B[-1,-1]=A[-1,-1]
# Compute 4 absolute differences
D1=np.abs(B[1:,1:-1]-B[:-1,1:-1]) # first dimension
D2=np.abs(B[1:-1,1:]-B[1:-1,:-1]) # second dimension
D3=np.abs(B[1:,1:]-B[:-1,:-1]) # Diagonal
D4=np.abs(B[1:,:-1]-B[:-1,1:]) # Antidiagonal
# Compute maxima in each direction
M1=np.maximum(D1[1:,:],D1[:-1,:])
M2=np.maximum(D2[:,1:],D2[:,:-1])
M3=np.maximum(D3[1:,1:],D3[:-1,:-1])
M4=np.maximum(D4[1:,:-1],D4[:-1,1:])
# Compute local maximum for each entry
M=np.max(np.dstack([M1,M2,M3,M4]),axis=2)
That will leave you with M containing, for each entry of the input A, the maximum absolute difference to any of its 8 neighbours (the MaxGradient). A similar idea can be used for labelling the locally maximal values, culminating in something like
T = np.where(M >= np.max(np.dstack([Ma,Mb,Mc,Md,Me,Mf,Mg,Mh]), axis=2))
where Ma, ..., Mh are the 8 neighbour-shifted copies of M (built with the same padding trick). That would give you the coordinates of the locally maximal values in M.
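To make that last step concrete, here is a self-contained sketch of the labelling (the padding trick and all names here are my own, not from the answer above):

import numpy as np

# Pad M by edge replication, gather the 8 neighbour-shifted copies,
# and mark entries of M that are >= all of their neighbours' values.
P = np.pad(M, 1, mode='edge')
shifts = [P[:-2, 1:-1], P[2:, 1:-1],    # up, down
          P[1:-1, :-2], P[1:-1, 2:],    # left, right
          P[:-2, :-2], P[2:, 2:],       # diagonal
          P[:-2, 2:],  P[2:, :-2]]      # antidiagonal
neighbour_max = np.max(np.dstack(shifts), axis=2)
locally_maximal = M >= neighbour_max     # boolean mask, same shape as M
coords = np.argwhere(locally_maximal)    # coordinates of locally maximal entries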