Can't get same values as numpy elementwise matrix multiplication using numba

I have been playing around with numba and trying to implement a simple element-wise matrix multiplication. When using 'vectorize' I get the same result as the numpy multiplication, but when I'm using 'cuda.jit' the results are not the same: many of the values are zeros. I'm providing a minimal working example for this purpose. Any help with the problem will be appreciated. I'm using numba 0.35.0 and python 2.7.
from __future__ import division
from __future__ import print_function
import numpy as np
from numba import vectorize, cuda, jit
M = 80
N = 40
P = 40
# Set the number of threads in a block
threadsperblock = 32
# Calculate the number of thread blocks in the grid
blockspergrid = (M*N*P + (threadsperblock - 1)) // threadsperblock
@vectorize(['float32(float32,float32)'], target='cuda')
def VectorMult3d(a, b):
    return a*b
@cuda.jit('void(float32[:, :, :], float32[:, :, :], float32[:, :, :])')
def mult_gpu_3d(a, b, c):
    [x, y, z] = cuda.grid(3)
    if x < c.shape[0] and y < c.shape[1] and z < c.shape[2]:
        c[x, y, z] = a[x, y, z] * b[x, y, z]
if __name__ == '__main__':
    A = np.random.normal(size=(M, N, P)).astype(np.float32)
    B = np.random.normal(size=(M, N, P)).astype(np.float32)
    numpy_C = A*B
    A_gpu = cuda.to_device(A)
    B_gpu = cuda.to_device(B)
    C_gpu = cuda.device_array((M,N,P), dtype=np.float32) # cuda.device_array_like(A_gpu)
    mult_gpu_3d[blockspergrid,threadsperblock](A_gpu,B_gpu,C_gpu)
    cudajit_C = C_gpu.copy_to_host()
    print('------- using cuda.jit -------')
    print('Is close?: {}'.format(np.allclose(numpy_C,cudajit_C)))
    print('{} of {} elements are close'.format(np.sum(np.isclose(numpy_C,cudajit_C)), M*N*P))
    print('------- using cuda.jit -------\n')
    vectorize_C_gpu = VectorMult3d(A_gpu, B_gpu)
    vectorize_C = vectorize_C_gpu.copy_to_host()
    print('------- using vectorize -------')
    print('Is close?: {}'.format(np.allclose(numpy_C,vectorize_C)))
    print('{} of {} elements are close'.format(np.sum(np.isclose(numpy_C,vectorize_C)), M*N*P))
    print('------- using vectorize -------\n')
import numba; print("numba version: "+numba.__version__)

Here is how you could debug this.
Consider a smaller and simplified example with:
reduced array sizes, e.g. (2, 3, 1) (so you could actually print the values and be able to read them)
simple and deterministic contents, e.g. "all ones" (to compare across runs)
additional kernel arguments for debugging
from __future__ import (division, print_function)
import numpy as np
from numba import cuda
M = 2
N = 3
P = 1
threadsperblock = 1
blockspergrid = (M * N * P + (threadsperblock - 1)) // threadsperblock
@cuda.jit
def mult_gpu_3d(a, b, c, grid_ran, grid_multed):
    grid = cuda.grid(3)
    x, y, z = grid
    grid_ran[x] = 1
    if (x < c.shape[0]) and (y < c.shape[1]) and (z < c.shape[2]):
        grid_multed[x] = 1
        c[grid] = a[grid] * b[grid]
if __name__ == '__main__':
    A = np.ones((M, N, P), np.int32)
    B = np.ones((M, N, P), np.int32)
    A_gpu = cuda.to_device(A)
    B_gpu = cuda.to_device(B)
    C_gpu = cuda.to_device(np.zeros_like(A))
    # Tells whether the thread at index i has run
    grid_ran = cuda.to_device(np.zeros([blockspergrid], np.int32))
    # Tells whether the thread at index i has performed the multiplication
    grid_multed = cuda.to_device(np.zeros(blockspergrid, np.int32))
    mult_gpu_3d[blockspergrid, threadsperblock](
        A_gpu, B_gpu, C_gpu, grid_ran, grid_multed)
    print("grid_ran.shape : ", grid_ran.shape)
    print("grid_multed.shape : ", grid_multed.shape)
    print("C_gpu.shape : ", C_gpu.shape)
    print("grid_ran : ", grid_ran.copy_to_host())
    print("grid_multed : ", grid_multed.copy_to_host())
    C = C_gpu.copy_to_host()
    print("C transpose flat : ", C.T.flatten())
    print("C : \n", C)
Output:
grid_ran.shape : (6,)
grid_multed.shape : (6,)
C_gpu.shape : (2, 3, 1)
grid_ran : [1 1 1 1 1 1]
grid_multed : [1 1 0 0 0 0]
C transpose flat : [1 1 0 0 0 0]
C :
 [[[1]
  [0]
  [0]]
 [[1]
  [0]
  [0]]]
You can see that the device grid shape does not correspond to the shape of the arrays: the grid is flat (M*N*P), while the arrays are all 3-dimensional (M, N, P). That is, the first dimension of the grid has indices in range 0..M*N*P-1 (0..5, totaling 6 values in my example), while the first dimension of the array is only in 0..M-1 (0..1, totaling 2 values in my example). On a 1D launch the y and z components returned by cuda.grid(3) are always 0, so only elements of the form c[x, 0, 0] can ever be reached. This mistake typically leads to out-of-bounds access, but you have protected your kernel with a conditional which cuts down the offending threads:
if (x < c.shape[0])
This line does not allow threads with indices above M-1 (1 in my example) to run (well, sort of [1]); that is why most values are never written and you get many zeros in the resulting array.
Possible solutions:
In general, you could use multidimensional kernel grid configuration, i.e. a 3D vector for blockspergrid instead of a scalar [2].
In particular, as elementwise multiplication is a map operation and does not depend on array shapes, you could flatten all 3 arrays to 1D arrays, run the kernel on a 1D grid (indexing with a single index from cuda.grid(1)), then reshape the result back [3], [4].
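To see the effect of the launch configuration without a GPU, here is a minimal CPU-only sketch. It does not run any CUDA code; the helper names as_3d and written_mask are hypothetical and only emulate which elements a guarded, 3D-indexed kernel would be able to write for a given blockspergrid and threadsperblock. The (8, 8, 8) block shape is an arbitrary choice for illustration.

```python
import numpy as np

def as_3d(cfg):
    """Pad an int or tuple launch parameter to 3 dimensions (missing ones are 1)."""
    cfg = (cfg,) if isinstance(cfg, int) else tuple(cfg)
    return cfg + (1,) * (3 - len(cfg))

def written_mask(shape, blockspergrid, threadsperblock):
    """Emulate which elements a bounds-guarded kernel using cuda.grid(3) can write."""
    blocks, threads = as_3d(blockspergrid), as_3d(threadsperblock)
    # Number of distinct thread indices along each axis
    extent = [b * t for b, t in zip(blocks, threads)]
    mask = np.zeros(shape, dtype=bool)
    # Slicing clips to the array bounds, which plays the role of the `if` guard
    mask[:extent[0], :extent[1], :extent[2]] = True
    return mask

shape = (80, 40, 40)
M, N, P = shape

# 1D launch, as in the question: y and z never move away from 0
tpb_1d = 32
bpg_1d = (M * N * P + tpb_1d - 1) // tpb_1d
flat = written_mask(shape, bpg_1d, tpb_1d)
print("1D launch writes", int(flat.sum()), "of", flat.size, "elements")

# 3D launch: one ceil-division per axis
tpb_3d = (8, 8, 8)
bpg_3d = tuple((n + t - 1) // t for n, t in zip(shape, tpb_3d))
full = written_mask(shape, bpg_3d, tpb_3d)
print("3D launch covers everything:", bool(full.all()))
```

With numba, the same tuples would be passed in the launch brackets, e.g. mult_gpu_3d[bpg_3d, tpb_3d](A_gpu, B_gpu, C_gpu).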
References:
[1] How to understand “All threads in a warp execute the same instruction at the same time.” in GPU?
[2] Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation)
[3] numpy.ndarray.flatten
[4] numpy.ravel

Related

How to efficiently apply function over each row of ndarray with value from list of args?

I would like to apply a function func over each row of a 2D ndarray arr shaped n x m, with a provided list of arguments args (of length n). That is, for each row i the function is executed as func(arr[i, :], args[i]).
This task can be accomplished with np.fromiter (using a for loop):
iterable = (func(row, arg) for row, arg in zip(arr, args))
results = np.fromiter(iterable, dtype=int)
However, this can take some time in the case of large arrays. According to unutbu's answer, using numpy's python utility functions (e.g. np.apply_along_axis) does not provide a significant speedup. Is there a way to optimize this process?
To avoid falling into the XY problem trap, below is my original problem statement:
I have an ndarray representing an image, shaped n x m. This image undergoes processing during which a specific index i is calculated for each row. I want to compose an image of the original shape (n x m) using the data to the right of index i in each row. That is, I want to resample each row[i:] of length m - i to m samples. Note that I want to use my own implementation of the resampling function (I don't want to use scipy.signal.resample etc.).
EDIT:
Test code with func example (added count argument to fromiter as suggested by LudvigH):
import numpy as np
import matplotlib.pyplot as plt
def simple_slant_range_correction(
    row, height, n_samples, max_ground_range, max_slant_range, slant_range_resolution
):
    ground_ranges = np.linspace(height, max_ground_range, n_samples)
    slant_ranges = np.sqrt(ground_ranges ** 2 + height ** 2)
    slant_ranges_indicies = slant_ranges / slant_range_resolution - 1
    slant_ranges_indicies_floor = np.floor(slant_ranges_indicies).astype(np.int16)
    slant_ranges_indicies_ceil = np.clip(
        0, n_samples - 1, slant_ranges_indicies_floor + 1
    )
    weight = slant_ranges_indicies - slant_ranges_indicies_floor
    return (
        weight * row[slant_ranges_indicies_ceil]
        + (1 - weight) * row[slant_ranges_indicies_floor]
    ).astype(np.float32)
if __name__ == "__main__":
    # Test parameters
    n, m = 100, 100
    max_slant_range = 50
    slant_range_resolution = max_slant_range / m
    # Create some dummy data
    data = np.zeros((n, m))
    h_indicies = np.ones((n), dtype=int)
    for i in np.arange(0, n, 5):
        data[:i, :i] += i
        h_indicies[:i] += 1
    heights = h_indicies * slant_range_resolution
    max_ground_ranges = np.sqrt(max_slant_range ** 2 - heights ** 2)
    # Perform resampling based on h_index
    iters = (
        simple_slant_range_correction(
            row, height, m, max_ground_range, max_slant_range, slant_range_resolution
        )
        for row, height, max_ground_range in zip(data, heights, max_ground_ranges)
    )
    data_sampled = np.fromiter(iters, dtype=np.dtype((np.float32, m)), count=n)
    # Plot data
    fig, axs = plt.subplots(1, 2)
    axs[0].plot(h_indicies + 0.5, np.arange(n) + 0.5, c="red")
    axs[0].imshow(data, vmin=0, vmax=data.max())
    axs[1].imshow(data_sampled, vmin=0, vmax=data.max())
    axs[0].set_axis_off()
    axs[1].set_axis_off()
    plt.tight_layout()
    plt.show()

It is typically faster to take advantage of vectorization by using numpy operations to manipulate the data, as compared to using python functions and objects to manipulate the data. Below is an example of a way to solve the problem described at the end of your question using numpy vectorization.
import numpy as np
Choosing some array and column indices as an example:
#     1 2 3 3                 1
# A = 4 5 6 6   row_indices = 3
#     7 8 9 9                 2
A = np.array([[1,2,3,3],[4,5,6,6],[7,8,9,9]])
row_indices = np.array([1,3,2])
Use vector operations to build a boolean masking array and then multiply the original array by the mask:
NM = np.shape(A)
N = NM[0]
M = NM[1]
col = np.arange(M,dtype=np.uint32)
B = np.outer(np.ones([1,N],dtype=np.uint32),col)
C = np.outer(row_indices,np.ones([1,M],dtype=np.uint32))
A_sampled = (B>=C)*A
print(A_sampled)
# output:
# 0 2 3 3
# 0 0 0 6
# 0 0 9 9
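The same mask can also be built with plain broadcasting, without the np.outer calls. This is only a sketch of an equivalent formulation, reusing the example array and indices from above; the names cols and mask are introduced here for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3, 3], [4, 5, 6, 6], [7, 8, 9, 9]])
row_indices = np.array([1, 3, 2])

# Column numbers as a row vector, start index of each row as a column vector;
# comparing them broadcasts to the full (n_rows, n_cols) shape.
cols = np.arange(A.shape[1])
mask = cols[None, :] >= row_indices[:, None]

# Keep entries at or to the right of each row's index, zero out the rest
A_sampled = np.where(mask, A, 0)
print(A_sampled)
```

This gives the mask in a single comparison and avoids allocating the two intermediate index matrices.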

How do I convert this Matlab code with meshgrid and arrays to Python code?

I am attempting to write a program which constructs a matrix and performs a singular value decomposition on it. I am evaluating the function ax^2 +bx + 1 on a grid. I then make a uniform meshgrid of a and b. The rows of the matrix correspond to different quadratic coefficients, while each column corresponds to a grid point at which the function is evaluated.
The matlab code is here:
% Collect data
x = linspace(-1,1,100);
[a,b] = meshgrid(0:0.1:1,0:0.1:1);
D=zeros(numel(x),numel(a));
sz = size(D)
% Build “Dose” matrix
for i=1:numel(a)
    D(:,i) = a(i)*x.^2+b(i)*x+1;
end
% Do the SVD:
[U,S,V]=svd(D,'econ');
D_reconstructed = U*S*V';
plot(diag(S))
scatter3(a(:),b(:),V(:,1))
This is my attempt at a solution:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-1, 1, 100)
def f(x, a, b):
    return a*x*x + b*x + 1
a, b = np.mgrid[0:1:0.1,0:1:0.1]
#a = b = np.arange(0,1,0.01)
D = np.zeros((x.size, a.size))
for i in range(a.size):
    D[i] = a[i]*x*x +b[i]*x +1
U, S, V = np.linalg.svd(D)
plt.plot(np.diag(S))
fig = plt.figure()
ax = plt.axes(projection="3d")
ax.scatter(a, b, V[0])
but I always get broadcasting errors which I am not sure how to fix.

Firstly, in MATLAB you're assigning to D(:,i), but in python you're assigning to D[i]. The latter is equivalent to D[i, ...] which is in your case D[i, :]. Instead you seem to need D[:, i].
Secondly, in MATLAB using a linear index into a 2d array (namely a and b) will give you flattened views. If you do that with numpy you get slices of an array instead, just as I mentioned with D[i].
You can do away with the loop with broadcasting and getting your desired 2d array by .ravelling (or reshaping) your a and b arrays:
x = np.linspace(-1, 1, 100)[:, None] # inject trailing singleton for broadcasting
a, b = np.mgrid[0:1:0.1, 0:1:0.1]
D = a.ravel() * x**2 + b.ravel() * x + 1
The way this works is that x has shape (100, 1) after we inject a trailing singleton (in MATLAB trailing singletons are implied, in numpy leading ones), and both a.ravel() and b.ravel() have shape (10*10,) which is compatible with (1, 10*10), making broadcasting possible into shape (100, 10*10). You could also replace the calls to ravel with
a, b = np.mgrid[...].reshape(2, -1)
which is a trick I sometimes use, but this is harder to read if you're unfamiliar with the pattern.
Side note: it's better to use example data where dimensions end up being of different size so that you notice if something ends up being transposed.
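Putting the pieces together, here is a minimal sketch of the whole computation including the SVD. Two assumptions are made here that are not in the original code: the grid uses 11j steps so that the endpoint 1 is included, as MATLAB's 0:0.1:1 does, and the economy-size SVD is requested with full_matrices=False. Note that np.linalg.svd returns the singular values as a 1D array and the transposed right singular vectors (Vh), and that the column order of D may differ from MATLAB's because meshgrid and mgrid order their points differently.

```python
import numpy as np

x = np.linspace(-1, 1, 100)[:, None]              # shape (100, 1)

# 11j asks mgrid for 11 points including the endpoint
a, b = np.mgrid[0:1:11j, 0:1:11j].reshape(2, -1)  # each of shape (121,)

# Broadcasting: (100, 1) with (121,) gives (100, 121)
D = a * x**2 + b * x + 1

# Economy-size SVD; s is 1D, rows of Vh are the right singular vectors
U, s, Vh = np.linalg.svd(D, full_matrices=False)

# Scale the columns of U by s, then multiply by Vh
D_reconstructed = (U * s) @ Vh

print(D.shape, U.shape, s.shape, Vh.shape)
print("reconstruction ok:", np.allclose(D, D_reconstructed))
```

Because s is already a vector, it can be plotted directly with plt.plot(s), and MATLAB's V(:,1) corresponds to Vh[0] here.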

Converting `for` loop that can't be vectorized to sparse matrix

There are 2 boxes and a small gap that allows 1 particle per second from one box to enter the other box. Whether a particle will go from A to B, or B to A depends on the ratio Pa/Ptot (Pa: number of particles in box A, Ptot: total particles in both boxes).
To make it faster, I need to get rid of the for loops, however I can't find a way to either vectorize them or turn them into a sparse matrix that represents my for loop:
What about for loops you can't vectorize? The ones where the result at iteration n depends on what you calculated in iteration n-1, n-2, etc. You can define a sparse matrix that represents your for loop and then do a sparse matrix solve.
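But I can't figure out how to define a sparse matrix out of this. The simulation boils down to calculating (with r_n a uniform random number, as in the code below):
$P_a(n) = P_a(n-1) + 2\,\bigl(r_n P_{tot} > P_a(n-1)\bigr) - 1$
where
$\bigl(r_n P_{tot} > P_a(n-1)\bigr)$
is the piece that gives me trouble when trying to express my problem as described here. (Note: the contents in the parenthesis are a bool operation)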
Questions:
Can I vectorize the for loop?
If not, how can I define a sparse matrix?
(bonus question) Why is the execution 27x faster in Python (0.027s) than in Octave (0.75s)?
Note: I implemented the simulation in both Python and Octave and will soon do it in Matlab, therefore the tags are correct.
Octave code
1; % starting with `function` causes errors
function arr = Px_simulation (Pa_init, Ptot, t_arr)
  t_size = size(t_arr);
  arr = zeros(t_size); % fixed size array is better than arr = []
  rand_arr = rand(t_size); % create all rand values at once
  _Pa = Pa_init;
  for _j=t_arr()
    if (rand_arr(_j) * Ptot > _Pa)
      _Pa += 1;
    else
      _Pa -= 1;
    endif
    arr(_j) = _Pa;
  endfor
endfunction
t = 1:10^5;
for _i=1:3
  Ptot = 100*10^_i;
  tic()
  Pa_simulation = Px_simulation(Ptot, Ptot, t);
  toc()
  subplot(2,2,_i);
  plot(t, Pa_simulation, "-2;simulation;")
  title(strcat("{P}_{a0}=", num2str(Ptot), ',P=', num2str(Ptot)))
endfor
Python
import numpy
import matplotlib.pyplot as plt
import timeit
import cpuinfo
from random import random
print('\nCPU: {}'.format(cpuinfo.get_cpu_info()['brand']))
PARTICLES_COUNT_LST = [1000, 10000, 100000]
DURATION = 10**5
t_vals = numpy.linspace(0, DURATION, DURATION)
def simulation(na_initial, ntotal, tvals):
    shape = numpy.shape(tvals)
    arr = numpy.zeros(shape)
    na_current = na_initial
    for i in range(len(tvals)):
        if random() > (na_current/ntotal):
            na_current += 1
        else:
            na_current -= 1
        arr[i] = na_current
    return arr
plot_lst = []
for i in PARTICLES_COUNT_LST:
    start_t = timeit.default_timer()
    n_a_simulation = simulation(na_initial=i, ntotal=i, tvals=t_vals)
    execution_time = (timeit.default_timer() - start_t)
    print('Execution time: {:.6}'.format(execution_time))
    plot_lst.append(n_a_simulation)
for i in range(len(PARTICLES_COUNT_LST)):
    plt.subplot(2, 2, i + 1)  # subplot indices start at 1
    plt.plot(t_vals, plot_lst[i], 'r')
    plt.grid(linestyle='dotted')
    plt.xlabel("time [s]")
    plt.ylabel("Particles in box A")
plt.show()

IIUC you can use cumsum() in both Octave and Numpy:
Octave:
>> p = rand(1, 5);
>> r = rand(1, 5);
>> p
p =
0.43804 0.37906 0.18445 0.88555 0.58913
>> r
r =
0.70735 0.41619 0.37457 0.72841 0.27605
>> cumsum (2*(p<(r+0.03)) - 1)
ans =
1 2 3 2 1
>> (2*(p<(r+0.03)) - 1)
ans =
1 1 1 -1 -1
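For comparison, the NumPy counterpart of the Octave session above could look like the following sketch. It reuses the printed values of p and r and the same 0.03 offset. It only shows the cumsum idiom; with fixed thresholds like these the steps do not depend on the running total, so it is not by itself a replacement for the state-dependent loop in the question.

```python
import numpy as np

# Values copied from the Octave session above
p = np.array([0.43804, 0.37906, 0.18445, 0.88555, 0.58913])
r = np.array([0.70735, 0.41619, 0.37457, 0.72841, 0.27605])

# Boolean comparison mapped to steps of +1 (True) or -1 (False)
steps = 2 * (p < (r + 0.03)) - 1

# Running total of the steps
walk = np.cumsum(steps)

print(steps)
print(walk)
```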
Also note that the following function will return values ([-1, 1]):

Element-wise maximum of two sparse matrices

Is there an easy/built-in way to get the element-wise maximum of two (or ideally more) sparse matrices? I.e. a sparse equivalent of np.maximum.

This did the trick:
def maximum (A, B):
    BisBigger = A-B
    BisBigger.data = np.where(BisBigger.data < 0, 1, 0)
    return A - A.multiply(BisBigger) + B.multiply(BisBigger)

No, there's no built-in way to do this in scipy.sparse. The easy solution is
np.maximum(X.A, Y.A)
but this is obviously going to be very memory-intensive when the matrices have large dimensions and it might crash your machine. A memory-efficient (but by no means fast) solution is
from scipy.sparse import coo_matrix
# convert to COO, if necessary
X = X.tocoo()
Y = Y.tocoo()
Xdict = dict(((i, j), v) for i, j, v in zip(X.row, X.col, X.data))
Ydict = dict(((i, j), v) for i, j, v in zip(Y.row, Y.col, Y.data))
# union of the stored (row, col) positions; works on Python 2 and 3
keys = list(set(Xdict).union(Ydict))
XmaxY = [max(Xdict.get((i, j), 0), Ydict.get((i, j), 0)) for i, j in keys]
# pass the shape explicitly so trailing empty rows/columns are kept
XmaxY = coo_matrix((XmaxY, tuple(zip(*keys))), shape=X.shape)
Note that this uses pure Python instead of vectorized idioms. You can try shaving some of the running time off by vectorizing parts of it.

Here's another memory-efficient solution that should be a bit quicker than larsmans'. It's based on finding the set of unique indices for the nonzero elements in the two arrays using code from Jaime's excellent answer here.
import numpy as np
from scipy import sparse
def sparsemax(X, Y):
    # the indices of all non-zero elements in both arrays
    idx = np.hstack((X.nonzero(), Y.nonzero()))
    # find the set of unique non-zero indices
    idx = tuple(unique_rows(idx.T).T)
    # take the element-wise max over only these indices
    X[idx] = np.maximum(X[idx].A, Y[idx].A)
    return X
def unique_rows(a):
    void_type = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    b = np.ascontiguousarray(a).view(void_type)
    idx = np.unique(b, return_index=True)[1]
    return a[idx]
Testing:
def setup(n=1000, fmt='csr'):
    return sparse.rand(n, n, format=fmt), sparse.rand(n, n, format=fmt)
X, Y = setup()
Z = sparsemax(X, Y)
print(np.all(Z.A == np.maximum(X.A, Y.A)))
# True
%%timeit X, Y = setup()
sparsemax(X, Y)
# 100 loops, best of 3: 4.92 ms per loop

The latest scipy (0.13.0) defines element-wise booleans for sparse matrices. So:
BisBigger = B>A
A - A.multiply(BisBigger) + B.multiply(BisBigger)
np.maximum does not (yet) work because it uses np.where, which is still trying to get the truth value of an array.
Curiously B>A returns a boolean dtype, while B>=A is float64.

Here is a function that returns a sparse matrix that is element-wise maximum of two sparse matrices. It implements the answer by hpaulj:
def sparse_max(A, B):
    """
    Return the element-wise maximum of sparse matrices `A` and `B`.
    """
    AgtB = (A > B).astype(int)
    M = AgtB.multiply(A - B) + B
    return M
Testing:
A = sparse.csr_matrix(np.random.randint(-9,10, 25).reshape((5,5)))
B = sparse.csr_matrix(np.random.randint(-9,10, 25).reshape((5,5)))
M = sparse_max(A, B)
M2 = sparse_max(B, A)
# Test symmetry:
print((M.A == M2.A).all())
# Test that M is larger or equal to A and B, element-wise:
print((M.A >= A.A).all())
print((M.A >= B.A).all())
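Newer SciPy releases also ship a maximum method on sparse matrices, which was not available when the earlier answers were written. The sketch below compares it with the sparse_max formula above and with the dense np.maximum on small random integer matrices; the seed and the 5 x 5 size are arbitrary choices for illustration.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
A = sparse.csr_matrix(rng.integers(-9, 10, size=(5, 5)))
B = sparse.csr_matrix(rng.integers(-9, 10, size=(5, 5)))

def sparse_max(A, B):
    """Element-wise maximum built from the comparison A > B."""
    AgtB = (A > B).astype(int)
    return AgtB.multiply(A - B) + B

# Built-in element-wise maximum of two sparse matrices
M_builtin = A.maximum(B)
M_manual = sparse_max(A, B)

# Dense reference, only feasible because the matrices are tiny
expected = np.maximum(A.toarray(), B.toarray())

print((M_builtin.toarray() == expected).all())
print((M_manual.toarray() == expected).all())
```

For more than two matrices, the method can be chained or folded with functools.reduce over the list.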

from scipy import sparse
from numpy import array
I = array([0,3,1,0])
J = array([0,3,1,2])
V = array([4,5,7,9])
A = sparse.coo_matrix((V,(I,J)),shape=(4,4))
A.data.max()
9
If you haven't already, you should try out ipython. You could have saved yourself time by making your sparse matrix A, then simply typing A. followed by tab, which prints a list of methods that you can call on A. From this you would see that A.data gives you the non-zero entries as an array, and hence you just want the maximum of this.

How to run a .py module?

I've got zero experience with Python. I have looked at some tutorial materials, but it is difficult for me to understand advanced code, so I came here for a more specific answer.
My task is to reproduce the code on my computer.
Here is the scenario:
I'm a graduate student studying tensor factorization in relational learning. A paper [1] provides code to run this algorithm, as follows:
import logging, time
from numpy import dot, zeros, kron, array, eye, argmax
from numpy.linalg import qr, pinv, norm, inv
from scipy.linalg import eigh
from numpy.random import rand
__version__ = "0.1"
__all__ = ['rescal', 'rescal_with_random_restarts']
__DEF_MAXITER = 500
__DEF_INIT = 'nvecs'
__DEF_PROJ = True
__DEF_CONV = 1e-5
__DEF_LMBDA = 0
_log = logging.getLogger('RESCAL')
def rescal_with_random_restarts(X, rank, restarts=10, **kwargs):
    """
    Restarts RESCAL multiple time from random starting point and
    returns factorization with best fit.
    """
    models = []
    fits = []
    for i in range(restarts):
        res = rescal(X, rank, init='random', **kwargs)
        models.append(res)
        fits.append(res[2])
    return models[argmax(fits)]
def rescal(X, rank, **kwargs):
    """
    RESCAL
    Factors a three-way tensor X such that each frontal slice
    X_k = A * R_k * A.T. The frontal slices of a tensor are
    N x N matrices that correspond to the adjecency matrices
    of the relational graph for a particular relation.
    For a full description of the algorithm see:
    Maximilian Nickel, Volker Tresp, Hans-Peter-Kriegel,
    "A Three-Way Model for Collective Learning on Multi-Relational Data",
    ICML 2011, Bellevue, WA, USA
    Parameters
    ----------
    X : list
        List of frontal slices X_k of the tensor X. The shape of each X_k is ('N', 'N')
    rank : int
        Rank of the factorization
    lmbda : float, optional
        Regularization parameter for A and R_k factor matrices. 0 by default
    init : string, optional
        Initialization method of the factor matrices. 'nvecs' (default)
        initializes A based on the eigenvectors of X. 'random' initializes
        the factor matrices randomly.
    proj : boolean, optional
        Whether or not to use the QR decomposition when computing R_k.
        True by default
    maxIter : int, optional
        Maximium number of iterations of the ALS algorithm. 500 by default.
    conv : float, optional
        Stop when residual of factorization is less than conv. 1e-5 by default
    Returns
    -------
    A : ndarray
        array of shape ('N', 'rank') corresponding to the factor matrix A
    R : list
        list of 'M' arrays of shape ('rank', 'rank') corresponding to the factor matrices R_k
    f : float
        function value of the factorization
    iter : int
        number of iterations until convergence
    exectimes : ndarray
        execution times to compute the updates in each iteration
    """
    # init options
    ainit = kwargs.pop('init', __DEF_INIT)
    proj = kwargs.pop('proj', __DEF_PROJ)
    maxIter = kwargs.pop('maxIter', __DEF_MAXITER)
    conv = kwargs.pop('conv', __DEF_CONV)
    lmbda = kwargs.pop('lmbda', __DEF_LMBDA)
    if not len(kwargs) == 0:
        raise ValueError( 'Unknown keywords (%s)' % (kwargs.keys()) )
    sz = X[0].shape
    dtype = X[0].dtype
    n = sz[0]
    k = len(X)
    _log.debug('[Config] rank: %d | maxIter: %d | conv: %7.1e | lmbda: %7.1e' % (rank,
        maxIter, conv, lmbda))
    _log.debug('[Config] dtype: %s' % dtype)
    # precompute norms of X
    normX = [norm(M)**2 for M in X]
    Xflat = [M.flatten() for M in X]
    sumNormX = sum(normX)
    # initialize A
    if ainit == 'random':
        A = array(rand(n, rank), dtype=dtype)
    elif ainit == 'nvecs':
        S = zeros((n, n), dtype=dtype)
        T = zeros((n, n), dtype=dtype)
        for i in range(k):
            T = X[i]
            S = S + T + T.T
        evals, A = eigh(S,eigvals=(n-rank,n-1))
    else :
        raise 'Unknown init option ("%s")' % ainit
    # initialize R
    if proj:
        Q, A2 = qr(A)
        X2 = __projectSlices(X, Q)
        R = __updateR(X2, A2, lmbda)
    else :
        R = __updateR(X, A, lmbda)
    # compute factorization
    fit = fitchange = fitold = f = 0
    exectimes = []
    ARAt = zeros((n,n), dtype=dtype)
    for iter in xrange(maxIter):
        tic = time.clock()
        fitold = fit
        A = __updateA(X, A, R, lmbda)
        if proj:
            Q, A2 = qr(A)
            X2 = __projectSlices(X, Q)
            R = __updateR(X2, A2, lmbda)
        else :
            R = __updateR(X, A, lmbda)
        # compute fit value
        f = lmbda*(norm(A)**2)
        for i in range(k):
            ARAt = dot(A, dot(R[i], A.T))
            f += normX[i] + norm(ARAt)**2 - 2*dot(Xflat[i], ARAt.flatten()) + lmbda*(R[i].flatten()**2).sum()
        f *= 0.5
        fit = 1 - f / sumNormX
        fitchange = abs(fitold - fit)
        toc = time.clock()
        exectimes.append( toc - tic )
        _log.debug('[%3d] fit: %.5f | delta: %7.1e | secs: %.5f' % (iter,
            fit, fitchange, exectimes[-1]))
        if iter > 1 and fitchange < conv:
            break
    return A, R, f, iter+1, array(exectimes)
def __updateA(X, A, R, lmbda):
    n, rank = A.shape
    F = zeros((n, rank), dtype=X[0].dtype)
    E = zeros((rank, rank), dtype=X[0].dtype)
    AtA = dot(A.T,A)
    for i in range(len(X)):
        F += dot(X[i], dot(A, R[i].T)) + dot(X[i].T, dot(A, R[i]))
        E += dot(R[i], dot(AtA, R[i].T)) + dot(R[i].T, dot(AtA, R[i]))
    A = dot(F, inv(lmbda * eye(rank) + E))
    return A
def __updateR(X, A, lmbda):
    r = A.shape[1]
    R = []
    At = A.T
    if lmbda == 0:
        ainv = dot(pinv(dot(At, A)), At)
        for i in range(len(X)):
            R.append( dot(ainv, dot(X[i], ainv.T)) )
    else :
        AtA = dot(At, A)
        tmp = inv(kron(AtA, AtA) + lmbda * eye(r**2))
        for i in range(len(X)):
            AtXA = dot(At, dot(X[i], A))
            R.append( dot(AtXA.flatten(), tmp).reshape(r, r) )
    return R
def __projectSlices(X, Q):
    q = Q.shape[1]
    X2 = []
    for i in range(len(X)):
        X2.append( dot(Q.T, dot(X[i], Q)) )
    return X2
Sorry for pasting such long code, but I see no other way to explain my problem.
I import this module and pass it arguments according to the author's website:
import pickle, sys
from rescal import rescal
rank = sys.argv[1]
X = pickle.load('us-presidents.pickle')
A, R, f, iter, exectimes = rescal(X, rank, lmbda=1.0)
The dataset us-presidents.rdf can be found here.
My questions are:
According to the code note, the tensor X is a list. I don't quite understand this, how do I relate a list to a tensor in Python? Can I understand tensor = list in Python?
Should I convert the RDF format to a triple (subject, predicate, object) format first? I'm not sure of the data structure of X. How do I assign values to X by hand?
Then, how do I run it?
I pasted the author's code without his authorization. Is that an act of infringement? If so, I am sorry and I will delete it soon.
These questions may be a little boring, but they are important to me. Any help would be greatly appreciated.
[1] Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel,
A Three-Way Model for Collective Learning on Multi-Relational Data,
in Proceedings of the 28th International Conference on Machine Learning, 2011 , Bellevue, WA, USA

To answer Q2: you need to transform the RDF and save it before you can load it from the file 'us-presidents.pickle'. The author of that code probably did that once because the Python native pickle format loads faster. As the pickle format includes the datatype of the data, it is possible that X is some numpy class instance, and you would need either an example pickle file as used by this code, or some code doing the pickle.dump, to figure out how to convert from RDF to the particular pickle file that rescal expects.
So this might answer Q1: the tensor consists of a list of elements. From the code you can see that the X parameter to rescal has a length (k = len(X)) and can be indexed (T = X[i]). So its elements are used as in a list (even if it might be some other datatype that just behaves as such).
As an aside: if you are not familiar with Python and are just interested in the result of the computation, you might get more help by contacting the author of the software.
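As a small illustration of the pickle round trip described above, here is a sketch using stand-in data. The file name demo.pickle and the two 3 x 3 slices are hypothetical and are not the us-presidents data. It also shows two details that are easy to trip over: pickle.load expects an open file object rather than a file name, and command-line arguments such as sys.argv[1] arrive as strings, so the rank has to be converted to an int.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in tensor: a list of two 3 x 3 frontal slices (hypothetical data)
X_demo = [np.eye(3), np.zeros((3, 3))]

# Write the list to a pickle file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.pickle")
with open(path, "wb") as fh:
    pickle.dump(X_demo, fh)

# pickle.load takes an open file object, not a file name
with open(path, "rb") as fh:
    X = pickle.load(fh)

# A rank read from the command line would be a string, e.g. "2"
rank = int("2")

print(type(X), len(X), X[0].shape, rank)
```

With data loaded like this, the call from the question would take the form rescal(X, rank, lmbda=1.0).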

According to the code note, the tensor X is a list. I don't quite understand this, how do I relate a list to a tensor in Python? Can I
understand tensor = list in Python?
Not necessarily, but the author of the code has decided to represent the tensor data as a list data structure. As the comments indicate, the list X contains:
List of frontal slices X_k of the tensor X. The shape of each X_k is ('N', 'N')
That means the tensor is represented as a list of N x N matrices, one per relation: [X_0, ..., X_(k-1)]. The code reads X[0].shape and X[0].dtype, so each element has to be an array-like object such as a numpy array, not a plain tuple.
I'm not sure of the data structure of X. How do I assign values to X by hand?
Now that we know the data structure of X, we can assign values to it using assignment. The following will assign a 2 x 2 matrix to the first position in the list X (as the first position is at index 0, the second at index 1, et cetera):
X[0] = numpy.array([[0, 1], [0, 0]])
Similarly, the following will assign another 2 x 2 matrix to the second position:
X[1] = numpy.array([[0, 0], [1, 0]])
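One reading of the docstring is that each slice X_k is an N x N array (its shape is (N, N)) holding the adjacency matrix of one relation. Under that assumption, a list X could be built from (subject, predicate, object) triples roughly as follows. The triples and all names in this sketch are made up for illustration, and the actual preprocessing used by the RESCAL author may differ.

```python
import numpy as np

# Hypothetical triples (subject, predicate, object)
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksWith", "carol"),
]

# Assign an integer index to every entity and every relation
entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
relations = sorted({p for _, p, _ in triples})
e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {r: i for i, r in enumerate(relations)}

# One N x N adjacency matrix per relation
N = len(entities)
X = [np.zeros((N, N)) for _ in relations]
for s, p, o in triples:
    X[r_idx[p]][e_idx[s], e_idx[o]] = 1.0

print(len(X), X[0].shape)
print(X[r_idx["knows"]])
```

A list built this way supports len(X), X[i], X[0].shape and X[0].dtype, which are the operations the rescal code performs on its input.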
