Numpy Uniform Distribution With Decay - python

I'm trying to construct a matrix of uniform distributions decaying to 0 at the same rate in each row. The distributions should be between -1 and 1. What I want to construct is something that resembles:
[[0.454/exp(0) -0.032/exp(1) 0.641/exp(2)...]
[-0.234/exp(0) 0.921/exp(1) 0.049/exp(2)...]
...
[0.910/exp(0) 0.003/exp(1) -0.908/exp(2)...]]
I can build a matrix of uniform distributions using:
w = np.array([np.random.uniform(-1, 1, 10) for i in range(10)])
and can achieve the desired result using a for loop with:
for k in range(len(w)):
    for l in range(len(w[0])):
        w[k][l] = w[k][l] / np.exp(l)
but wanted to know if there was a better way of accomplishing this.

You can use numpy's broadcasting feature to do this:
w = np.random.uniform(-1, 1, size=(10, 10))
weights = np.exp(np.arange(10))
w /= weights

Alok Singhal's answer is best, but as another (perhaps more explicit) way to do this,
you can duplicate the vector [exp(0), ..., exp(9)] and stack the copies into a matrix by taking an outer product with a vector of ones. Then divide the w matrix by the new decay matrix.
n = 10
w = np.array([np.random.uniform(-1, 1, n) for i in range(n)])
decay = np.outer(np.ones(n), np.exp(np.arange(n)))
result = w / decay
You could also use np.tile for creating a matrix out of several copies of a vector. It accomplishes the same thing as the outer product trick.
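A minimal sketch of that np.tile variant (not from the original answer; same n as above):
import numpy as np

n = 10
w = np.random.uniform(-1, 1, size=(n, n))
# np.tile stacks n copies of the decay vector as rows, giving an (n, n) matrix
decay = np.tile(np.exp(np.arange(n)), (n, 1))
result = w / decay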

Related

Rolling/Increasing dimensionality of a NumPy array

I'm currently trying to find an easy way to do the following operation to an N dimensional array in Python. For simplicity let's start with a 1 dimensional array of size 4.
X = np.array([1,2,3,4])
What I want to do is create a new array, call it Y, such that:
Y = np.array([[1,2,3,4],[2,3,4,1],[3,4,1,2],[4,1,2,3]])
So what I'm trying to do is create an array Y such that:
Y[:,i] = np.roll(X[:],-i, axis = 0)
I know how to do this using for loops, but I'm looking for a faster method of doing so. The actual array I'm trying to do this to is a 3 dimensional array, call it X. What I'm looking for is a way to find an array Y, such that:
Y[:,:,:,i,j,k] = np.roll(X[:,:,:],(-i,-j,-k),axis = (0,1,2))
I can do this using itertools.product with for loops, but it is quite slow. If anyone has a better way of doing this, please let me know. I also have CuPy installed with a GTX 970, so if there's a way of using CUDA to do this faster, please let me know. If anyone wants more context, please let me know.
Here is my original code for computing the position-space two-point correlation function. The array x0 is an n by n by n real-valued array representing a real scalar field. The function iterate(j, s) runs j iterations. Each iteration generates a random float between -s and s for each lattice site, computes the change in the action dS, and accepts the change with probability min(1, exp(-dS)).
def momentum(k,j,s):
    global Gxa
    Gx = numpy.zeros((n,n,t))
    for i1 in range(0,k):
        iterate(j,s)
        for i2,i3,i4 in itertools.product(range(0,n),range(0,n),range(0,n)):
            x1 = numpy.roll(numpy.roll(numpy.roll(x0, -i2, axis = 0),-i3, axis = 1),-i4,axis = 2)
            x2 = numpy.mean(numpy.multiply(x0,x1))
            Gx[i2,i3,i4] = x2
        Gxa = Gxa + Gx
    Gxa = Gxa/k
Approach #1
We can extend this idea to our 3D array case here. Simply concatenate with sliced versions along the three dims, then use scikit-image's view_as_windows (which is based on np.lib.stride_tricks.as_strided) to efficiently get the final output as a strided view of the concatenated version, like so -
from skimage.util.shape import view_as_windows
X1 = np.concatenate((X,X[:,:,:-1]),axis=2)
X2 = np.concatenate((X1,X1[:,:-1,:]),axis=1)
X3 = np.concatenate((X2,X2[:-1,:,:]),axis=0)
out = view_as_windows(X3,X.shape)
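A quick sanity check (a sketch with a small random X, not part of the original answer) confirms that the window indexed by (i, j, k) equals the rolled array; the window indices sit in the first three axes of out, so transpose if you want the layout from the question:
import numpy as np
from skimage.util.shape import view_as_windows

X = np.random.rand(4, 5, 6)
X1 = np.concatenate((X, X[:, :, :-1]), axis=2)
X2 = np.concatenate((X1, X1[:, :-1, :]), axis=1)
X3 = np.concatenate((X2, X2[:-1, :, :]), axis=0)
out = view_as_windows(X3, X.shape)

i, j, k = 2, 1, 3
assert np.allclose(out[i, j, k], np.roll(X, (-i, -j, -k), axis=(0, 1, 2)))

# to get the Y[:,:,:,i,j,k] layout from the question, move the window axes last
Y = out.transpose(3, 4, 5, 0, 1, 2)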
Approach #2
For really large arrays, we might want to initialize the output array and then re-use X3 from the earlier approach, assigning into the output by slicing X3. This slicing would be faster than the original rolling. The implementation would be -
m,n,r = X.shape
Yout = np.empty((m,n,r,m,n,r),dtype=X.dtype)
for i in range(m):
    for j in range(n):
        for k in range(r):
            Yout[:,:,:,i,j,k] = X3[i:i+m,j:j+n,k:k+r]

Performant creation of sparse (stiffness) matrix

I'm currently implementing a small finite element simulation using Python/NumPy, and I am looking for an efficient way to create the global stiffness matrix:
1) I think that the creation of a sparse matrix from smaller element stiffness matrices should be done using coo_matrix(). However, can I extend an existing coo_matrix, or should I create it from the final i, j and v lists?
2) Currently, I am creating the i and j lists from the smaller element stiffness matrix using list comprehensions and concatenating them. Is there a better way to create these lists?
3) Creation of the data vector: same question, are Python lists preferred over NumPy vectors due to the easy extension possibilities?
4) Of course I am open to any advice :). Thank you!
Here is a small example of my current plan for the global assembly to make clear what I intend:
import numpy as np
from scipy.sparse import coo_matrix
#2 nodes, 3 dof per node
locations = [0, 6]
nNodes = 2
dof =3
totSize = nNodes * dof
Ke = np.array([[1,1,1, 2,2,2],
[1,1,1, 2,2,2],
[1,1,1, 2,2,2],
[2,2,2, 3,3,3],
[2,2,2, 3,3,3],
[2,2,2, 3,3,3]])
I = []
J = []
#generate rowwise i and j lists:
i = [ idx + u for i in range(totSize) for idx in locations for u in range(dof) ]
j = [ idx + u for idx in locations for u in range(dof) for i in range(totSize) ]
I += i
J += j
Data = Ke.flatten()
cMatrix = coo_matrix( (Data, (i,j)), )
In this post, I will focus on the performance issues specific to the creation of the lists i and j, and finally the matrix cMatrix.
Under those loop/list comprehensions, you are basically performing element-wise additions of locations and range(dof). Porting over to NumPy, we can leverage broadcasting there. Finally, to simulate the for i in range(totSize) repetition in those comprehensions, we can tile the addition result with np.tile. We use its flattened version for indexing into the columns of the sparse matrix and its transposed flattened version for the rows.
Thus, the implementation would look something like this -
idx0 = (np.asarray(locations)[:,None] + np.arange(dof)).ravel()
J = np.tile(idx0[:,None],totSize)
cMatrix = coo_matrix( (Data, (J.ravel('F'),J.ravel())), )
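As a quick sanity check (a sketch, not part of the original post, using a hypothetical non-symmetric Ke), the broadcast/tile indices reproduce the same sparse matrix as the comprehension-based i and j from the question:
import numpy as np
from scipy.sparse import coo_matrix

locations = [0, 6]
nNodes, dof = 2, 3
totSize = nNodes * dof
Ke = np.arange(totSize * totSize, dtype=float).reshape(totSize, totSize)
Data = Ke.flatten()

# comprehension-based indices from the question
i = [idx + u for _ in range(totSize) for idx in locations for u in range(dof)]
j = [idx + u for idx in locations for u in range(dof) for _ in range(totSize)]
ref = coo_matrix((Data, (i, j)))

# broadcast + tile version
idx0 = (np.asarray(locations)[:, None] + np.arange(dof)).ravel()
J = np.tile(idx0[:, None], totSize)
out = coo_matrix((Data, (J.ravel('F'), J.ravel())))

assert np.allclose(ref.toarray(), out.toarray())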

Cosine distance of vector to matrix

In python, is there a vectorized efficient way to calculate the cosine distance of a sparse array u to a sparse matrix v, resulting in an array of elements [1, 2, ..., n] corresponding to cosine(u,v[0]), cosine(u,v[1]), ..., cosine(u, v[n])?
Not natively. You can, however, use the scipy library, which can compute the cosine distance between two vectors for you: http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.spatial.distance.cosine.html. You can build a version that takes a matrix using this as a stepping stone.
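A minimal sketch of that stepping-stone approach (the helper name is hypothetical, and it assumes dense 1-D inputs; sparse rows would need to be densified first):
import numpy as np
from scipy.spatial.distance import cosine

def cosine_to_rows(u, v):
    # cosine distance of vector u to every row of matrix v
    return np.array([cosine(u, row) for row in v])

u = np.random.random(100)
v = np.random.random((5, 100))
print(cosine_to_rows(u, v))   # array of 5 distances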
Add the vector onto the end of the matrix, calculate a pairwise distance matrix using sklearn.metrics.pairwise_distances() and then extract the relevant column/row.
So for vector v (with shape (D,)) and matrix m (with shape (N,D)) do:
import numpy as np
from sklearn.metrics import pairwise_distances

new_m = np.concatenate([m, v[None,:]], axis=0)
distance_matrix = pairwise_distances(new_m, metric="cosine")
distances = distance_matrix[-1,:-1]
Not ideal, but better than iterating!
This method can be extended if you are querying more than one vector. To do this, a list of vectors can be concatenated instead.
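A rough sketch of that multi-query variant (shapes and names are hypothetical, not from the original answer):
import numpy as np
from sklearn.metrics import pairwise_distances

m = np.random.random((1000, 64))      # original matrix
queries = np.random.random((5, 64))   # several query vectors stacked as rows

new_m = np.concatenate([m, queries], axis=0)
distance_matrix = pairwise_distances(new_m, metric="cosine")
# distances of each query to each original row of m
distances = distance_matrix[-len(queries):, :-len(queries)]   # shape (5, 1000)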
I think there is a way using the definition and the numpy library:
Definition:
import numpy as np
#just creating random data
u = np.random.random(100)
v = np.random.random((100,100))
#dot product: for every row in v, multiply u and sum the elements
u_dot_v = np.sum(u*v,axis = 1)
#find the norm of u and each row of v
mod_u = np.sqrt(np.sum(u*u))
mod_v = np.sqrt(np.sum(v*v,axis = 1))
#just apply the definition
final = 1 - u_dot_v/(mod_u*mod_v)
#verify with the cosine function from scipy
from scipy.spatial.distance import cosine
final2 = np.array([cosine(u,i) for i in v])
I found the definition of cosine distance here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cosine.html#scipy.spatial.distance.cosine
In scipy.spatial.distance.cosine()
http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.spatial.distance.cosine.html
The following worked for me; you have to provide the correct signature:
import numpy as np
from scipy.spatial.distance import cosine

def cosine_distances(embedding_matrix, extracted_embedding):
    return cosine(embedding_matrix, extracted_embedding)

cosine_distances = np.vectorize(cosine_distances, signature='(m),(d)->()')
cosine_distances(corpus_embeddings, extracted_embedding)
In my case
corpus_embeddings is a (10000,128) matrix
extracted_embedding is a 128-dimensional vector

How to perform a rolling sum along a matrix axis?

Given a matrix X with T rows and k columns:
T = 50
H = 10
k = 5
X = np.arange(T).reshape(T,1)*np.ones((T,k))
How can I perform a rolling cumulative sum of X along the rows axis with lag H?
Xcum = np.zeros((T-H,k))
for t in range(H,T):
    Xcum[t-H,:] = np.sum( X[t-H:t,:], axis=0 )
Note: I would prefer to avoid strides and convolution, and follow broadcasting/vectorization best practices.
Sounds like you want the following:
import scipy.signal
scipy.signal.convolve2d(X, np.ones((H,1)), mode='valid')
This of course uses convolve, but the question, as stated, is a convolution operation. Broadcasting would result in a much slower and more memory-intensive algorithm.
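For reference, a quick check (a sketch, not part of the original answer, using the arrays from the question) that the valid-mode convolution matches an explicit rolling sum over all T-H+1 windows:
import numpy as np
import scipy.signal

T, H, k = 50, 10, 5
X = np.arange(T).reshape(T, 1) * np.ones((T, k))

conv = scipy.signal.convolve2d(X, np.ones((H, 1)), mode='valid')   # shape (T-H+1, k)

# explicit rolling sum for comparison
loop_sum = np.array([X[t-H:t].sum(axis=0) for t in range(H, T+1)])
assert np.allclose(conv, loop_sum)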
You are actually missing the last row in your rolling sum; this would be the correct output:
Xcum = np.zeros((T-H+1, k))
for t in range(H, T+1):
    Xcum[t-H, :] = np.sum(X[t-H:t, :], axis=0)
If you need to do this over an arbitrary axis with numpy only, the simplest will be to do a np.cumsum along that axis, then compute your results as a difference of two slices of that. With your sample array and axis:
temp = np.cumsum(X, axis=0)
Xcum = np.empty((T-H+1, k))
Xcum[0] = temp[H-1]
Xcum[1:] = temp[H:] - temp[:-H]
Another option is to use pandas and its rolling_sum function, which against all odds apparently works on 2D arrays just as you need it to:
import pandas as pd
Xcum = pd.rolling_sum(X, 10)[9:] # first 9 entries are NaN
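Note that pd.rolling_sum was removed in later pandas releases; a rough equivalent there (a sketch, not part of the original answer) goes through DataFrame.rolling:
import numpy as np
import pandas as pd

T, H, k = 50, 10, 5
X = np.arange(T).reshape(T, 1) * np.ones((T, k))

# DataFrame.rolling replaces the old pd.rolling_sum; drop the leading NaN rows
Xcum = pd.DataFrame(X).rolling(H).sum().values[H-1:]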
Here's a strided solution. I realize it's not what you want, but I wondered how it compares.
def foo2(X):
    temp = np.lib.stride_tricks.as_strided(X, shape=(H,T-H+1,k),
                                           strides=(k*8,)+X.strides)
    # return temp.sum(0)
    return np.einsum('ijk->jk', temp)
This times at 35 us, compared to 22 us for Jaime's cumsum solution. einsum is a bit faster than sum(0). temp uses X's data, so there's no memory penalty. But it is harder to understand.

Normalize Numpy Upper-triangular subarray

I have a 4-dimensional array whose first two axes form an upper-triangular structure. It is initialized as
N, Q = (99, 23)
bivariate = np.zeros((N,N,Q,Q))
and then populated by something like
for i in range(N):
    for j in range(i+1,N):
        bivariate[i,j] = num
I want the upper-triangular elements to be normalized (Q,Q) matrices. I am currently doing this by just doing a
bivariate /= bivariate.sum(axis=3).sum(axis=2)[:,:,np.newaxis,np.newaxis]
but I get RuntimeWarnings because the all-zero arrays of the lower-triangular portion get divided by zero. Is there a better way to do this other than the following?
for i in range(N):
    for j in range(i+1,N):
        bivariate[i,j] /= bivariate[i,j].sum()
Thanks.
If you're concerned about getting np.nan, you could try replacing the zero entries of your normalization factor with 1:
norm_factor = bivariate.sum(axis=3).sum(axis=2)[:,:,None,None]
bivariate /= np.where(norm_factor, norm_factor, 1)
At least you'll avoid the for loops...
FWIW, I've found that it's much easier to work on the upper-triangular portion separately and then insert it back in.
triu = np.triu_indices(N, 1)
upper_tri = bivariate[triu].reshape(-1, Q*Q)
upper_tri /= upper_tri.sum(axis=1, keepdims=True)
bivariate[triu] = upper_tri.reshape(-1, Q, Q)
