Problem
Say I have two arrays with the following shapes:
y.shape is (z, b). Picture this as a collection of z (b,) y vectors.
x.shape is (z, b, c). Picture this as a collection of z (b, c) multivariate x matrices.
My intent is to find the z independent vectors of least-squares coefficient solutions. I.e. the first solution is from regressing y[0] on x[0], where those inputs have shape (b, ) and (b, c) respectively. (b observations, c features.) The result would be shape (z, c).
Some example data
np.random.seed(123)
x = np.random.randn(190, 20, 3)
y = np.random.randn(190, 20) # Assumes no intercept term
# First vector of coefficients
np.linalg.lstsq(x[0], y[0])[0]
# array([-0.12823781, -0.3055392 , 0.11602805])
# Last vector of coefficients
np.linalg.lstsq(x[-1], y[-1])[0]
# array([-0.02777503, -0.20425779, 0.22874169])
NumPy's least-squares solver lstsq can't operate on these. (With my intended result being shape (190, 3), or 190 vectors of 3 coefficients each. Each (3,) vector is one coefficient set from regressions with n=20.)
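For concreteness, here is what happens if the stacked arrays are passed in directly (a tiny illustration of my own):
np.linalg.lstsq(x, y)
# raises LinAlgError, because x is 3-D and lstsq only accepts arrays with at most two dimensions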
Is there a workaround to get to the coefficient matrices wrapped into one result array? I'm thinking possibly of the matrix formulation:
For a 1d y and 2d x this would just be:
def coefs(y, x):
    return np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))
but I'm having trouble getting this to accept a 2d y and 3d x as above.
Lastly, I'm curious as to why lstsq has trouble here. Is there a simple answer as to why the inputs must be at most 2d?
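For illustration, a sketch of the kind of batched formulation I'm after (batched_coefs is just a placeholder name; it assumes every x[i] has full column rank and leans on np.matmul and np.linalg.solve broadcasting over the leading axis):
# Sketch only: per slice z, solve (x[z].T @ x[z]) beta = x[z].T @ y[z]
import numpy as np

def batched_coefs(y, x):                        # y: (z, b), x: (z, b, c)
    xt = x.swapaxes(1, 2)                       # (z, c, b)
    gram = xt @ x                               # (z, c, c) normal-equation matrices
    rhs = xt @ y[..., None]                     # (z, c, 1)
    return np.linalg.solve(gram, rhs)[..., 0]   # (z, c) coefficient vectors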
Here is a demo illustrating:
- the problems mentioned in my comments
- a mostly empirical analysis of looped lstsq vs. one-step embedded lstsq
(with a surprising result at the end which should be taken with a grain of salt):
Code
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import make_regression
from time import perf_counter as pc
np.set_printoptions(edgeitems=3, infstr='inf',
                    linewidth=160, nanstr='nan', precision=1,
                    suppress=False, threshold=1000, formatter=None)
""" Create task """
Z, B, C = 4, 3, 2
Zs = []
Bs = []
for i in range(Z):
    X, y = make_regression(n_samples=B, n_features=C, random_state=i)
    Zs.append(X)
    Bs.append(y)
Zs = np.array(Zs)
Bs = np.array(Bs)
""" Independent looping """
print('LOOPED CALLS')
start = pc()
result = np.empty((Z, C))
for z in range(Z):
    result[z] = np.linalg.lstsq(Zs[z], Bs[z])[0]
end = pc()
print('lhs-shape: ', Zs.shape)
print('lhs-dense-fill-ratio: ', np.count_nonzero(Zs) / np.product(Zs.shape))
print('used time: ', end-start)
print(result)
""" Embedding in one """
print('EMBEDDING INTO ONE CALL')
Zs_ = sp.block_diag([Zs[i] for i in range(Z)]).todense() # convenient to use scipy.sparse
# oops: there is a dense-one too:
# -> scipy.linalg.block_diag
Bs_ = Bs.flatten()
start = pc() # one could argue if transform above should be timed too!
result_ = np.linalg.lstsq(Zs_, Bs_)[0]
end = pc()
print('lhs-shape: ', Zs_.shape)
print('lhs-dense-fill-ratio: ', np.count_nonzero(Zs_) / np.product(Zs_.shape))
print('used time: ', end-start)
print(result_)
Output
LOOPED CALLS
lhs-shape: (4, 3, 2)
lhs-dense-fill-ratio: 1.0
used time: 0.0005415275241778155
[[ 89.2 43.8]
[ 68.5 41.9]
[ 61.9 20.5]
[ 5.1 44.1]]
EMBEDDING INTO ONE CALL
lhs-shape: (12, 8)
lhs-dense-fill-ratio: 0.25
used time: 0.00015907748341232328
[ 89.2 43.8 68.5 41.9 61.9 20.5 5.1 44.1]
lstsq problem-dimensions for each case
While the original data looks like:
[[[ 2.2 1. ]
[-1. 1.9]
[ 0.4 1.8]]
[[-1.1 -0.5]
[-2.3 0.9]
[-0.6 1.6]]
[[ 1.6 -2.1]
[-0.1 -0.4]
[-0.8 -1.8]]
[[-0.3 -0.4]
[ 0.1 -1.9]
[ 1.8 0.4]]]
[[ 242.7 -5.4 112.9]
[ -95.7 -121.4 26.2]
[ 57.9 -12. -88.8]
[ -17.1 -81.6 28.4]]
and each solve looks like:
LHS
[[ 2.2 1. ]
[-1. 1.9]
[ 0.4 1.8]]
RHS
[ 242.7 -5.4 112.9]
the embedded problem (one solving-step) looks like:
LHS
[[ 2.2 1. 0. 0. 0. 0. 0. 0. ]
[-1. 1.9 0. 0. 0. 0. 0. 0. ]
[ 0.4 1.8 0. 0. 0. 0. 0. 0. ]
[ 0. 0. -1.1 -0.5 0. 0. 0. 0. ]
[ 0. 0. -2.3 0.9 0. 0. 0. 0. ]
[ 0. 0. -0.6 1.6 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 1.6 -2.1 0. 0. ]
[ 0. 0. 0. 0. -0.1 -0.4 0. 0. ]
[ 0. 0. 0. 0. -0.8 -1.8 0. 0. ]
[ 0. 0. 0. 0. 0. 0. -0.3 -0.4]
[ 0. 0. 0. 0. 0. 0. 0.1 -1.9]
[ 0. 0. 0. 0. 0. 0. 1.8 0.4]]
RHS
[ 242.7 -5.4 112.9 -95.7 -121.4 26.2 57.9 -12. -88.8 -17.1 -81.6 28.4]
There is no way, given the assumptions / standard form of lstsq, to embed this independence assumption without introducing a lot of zeros!
lstsq is:
- not able to exploit sparsity, as the core algorithm is a dense one
  - take a look at the transformed shape: this will be heavy in terms of memory and computation!
- not able to use information from fit 0 to speed up something in fit 1
  - they are independent after all; no information gain in theory
- able to vectorize a lot (but that's not helping in general)
Your example shapes
Trimmed output for your specific shapes; this time a sparse solver is tested too.
Added code (at the end)
print('EMBEDDING -> sparse-solver')
Zs_ = sp.csc_matrix(Zs_) # sparse!
start = pc()
result__ = sp.linalg.lsmr(Zs_, Bs_)[0]
end = pc()
print('lhs-shape: ', Zs_.shape)
print('lhs-dense-fill-ratio: ', Zs_.nnz / np.product(Zs_.shape))
print('used time: ', end-start)
print(result__)
Output
LOOPED CALLS
lhs-shape: (190, 20, 3)
lhs-dense-fill-ratio: 1.0
used time: 0.01716980329027777
[[ 11.9 31.8 29.6]
...
[ 44.8 28.2 62.3]]
EMBEDDING INTO ONE CALL
lhs-shape: (3800, 570)
lhs-dense-fill-ratio: 0.00526315789474
used time: 0.6774500271820254
[ 11.9 31.8 29.6 ... 44.8 28.2 62.3]
EMBEDDING -> sparse-solver
lhs-shape: (3800, 570)
lhs-dense-fill-ratio: 0.00526315789474
used time: 0.0038423098412817547 # a bit of a surprise
[ 11.9 31.8 29.6 ... 44.8 28.2 62.3]
Conclusion
In general: solve independently!
In some cases, the task above will be solved faster when using the sparse-solver approach, but analysis here is hard as we are comparing two completely different algorithms (direct vs. iterative), and the results might change in some dramatic way for other data.
Here is the linear algebra solution, with the speed right on par with @sascha's looped version for smaller arrays.
print('Matrix formulation')
start = pc()
result = np.squeeze(np.matmul(np.linalg.inv(np.matmul(Zs.swapaxes(1, 2), Zs)),
                              np.matmul(Zs.swapaxes(1, 2), np.atleast_3d(Bs))))
end = pc()
print('used time: ', end-start)
print(result)
Output:
Matrix formulation
used time: 0.00015713176480858237
[[ 89.2 43.8]
[ 68.5 41.9]
[ 61.9 20.5]
[ 5.1 44.1]]
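As an aside (a sketch of my own, not part of the timings above): the explicit inverse can be avoided by handing the batched normal equations to np.linalg.solve, which also broadcasts over the leading axis and is generally more numerically stable than inv:
# Same normal-equations idea, but solve instead of inv (sketch, not benchmarked here)
XtX = np.matmul(Zs.swapaxes(1, 2), Zs)                   # (Z, C, C)
XtY = np.matmul(Zs.swapaxes(1, 2), np.atleast_3d(Bs))    # (Z, C, 1)
result_solve = np.linalg.solve(XtX, XtY)[..., 0]         # (Z, C)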
However, @sascha's answer wins out easily for much larger inputs, especially as the size of the third dimension grows (the number of exogenous variables/features).
Z, B, C = 400, 300, 20
Zs = []
Bs = []
for i in range(Z):
    X, y = make_regression(n_samples=B, n_features=C, random_state=i)
    Zs.append(X)
    Bs.append(y)
Zs = np.array(Zs)
Bs = np.array(Bs)
# --------
print('Matrix formulation')
start = pc()
result = np.squeeze(np.matmul(np.linalg.inv(np.matmul(Zs.swapaxes(1, 2), Zs)),
                              np.matmul(Zs.swapaxes(1, 2), np.atleast_3d(Bs))))
end = pc()
print('used time: ', end-start)
print(result)
# --------
print('Looped calls')
start = pc()
result = np.empty((Z, C))
for z in range(Z):
    result[z] = np.linalg.lstsq(Zs[z], Bs[z])[0]
end = pc()
print('used time: ', end-start)
print(result)
Output:
Matrix formulation
used time: 0.24000779996413257
[[ 1.2e+01 1.3e-15 6.3e+01 ..., -8.9e-15 5.3e-15 -1.1e-14]
[ 5.8e+01 2.7e-14 -4.8e-15 ..., 8.5e+01 -1.5e-14 1.8e-14]
[ 1.2e+01 -1.2e-14 4.4e-16 ..., 6.0e-15 8.6e+01 6.0e+01]
...,
[ 2.9e-15 6.6e+01 1.1e-15 ..., 9.8e+01 -2.9e-14 8.4e+01]
[ 2.8e+01 6.1e+01 -1.2e-14 ..., -2.5e-14 6.3e+01 5.9e+01]
[ 7.0e+01 3.3e-16 8.4e+00 ..., 4.1e+01 -6.2e-15 5.8e+01]]
Looped calls
used time: 0.17400113389658145
[[ 1.2e+01 7.1e-15 6.3e+01 ..., -2.8e-14 1.1e-14 -4.8e-14]
[ 5.8e+01 -5.7e-14 -4.9e-14 ..., 8.5e+01 -5.3e-15 6.8e-14]
[ 1.2e+01 3.6e-14 4.5e-14 ..., -3.6e-15 8.6e+01 6.0e+01]
...,
[ 6.3e-14 6.6e+01 -1.4e-13 ..., 9.8e+01 2.8e-14 8.4e+01]
[ 2.8e+01 6.1e+01 -2.1e-14 ..., -1.4e-14 6.3e+01 5.9e+01]
[ 7.0e+01 -1.1e-13 8.4e+00 ..., 4.1e+01 -9.4e-14 5.8e+01]]
Related
How can I use numpy to apply a level of diminishing returns across 2 axes? I'm working with temperature model data for a fixed (x, y) location, so the axes I'm working with are t_axis (time) and z_axis (vertical atmosphere).
The values below don't really apply to what would make sense for the real atmosphere, but let's pretend.
a1=np.arange(16).reshape(4,4)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
Assume the information above is the current forecast model data for my location, and it is predicting a temperature of 12°C at the surface right now. But when I walk outside it's actually 10°C, so I want to adjust the model data and make that temperature 10°C.
z_axis=3
t_axis=0
a1[z_axis,t_axis]=10
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[10 13 14 15]]
But really what I want to do is apply a level of correction based on two variables: t_mod (diminishing returns over time) and z_mod (diminishing returns through the vertical atmosphere).
correction = -2
t_mod = .05  # 50%
z_mod = 0.25  # 25%
# how can i generate this array from modifiers
a2 = np.array([
    [0, 0, 0, 0],      # 6k feet above ground level (agl)
    [0, 0, 0, 0],      # 4k feet agl
    [.25, .13, 0, 0],  # 2k feet agl
    [1, .5, .25, 0]    # surface
    #  ^    ^    ^   ^__ +3 hour
    #  |    |    L__ +2 hour
    #  |    L__ +1 hour
    #  L__ zero hour
])
a1+(a2*correction )
[[ 0. 1. 2. 3. ]
[ 4. 5. 6. 7. ]
[ 7.5 8.74 8.8 11. ]
[10. 12. 13.5 15. ]]
Is this the approach I should be using? If so, how can I generate a2 from the z and t axis modifiers?
How about this: we use linear stepping in the t and z directions and take the outer product of the t and z decay profiles for the points inside the matrix:
import numpy as np

def shock_2d(t_mod, z_mod, n=4):
    ts = np.maximum(1 - np.arange(n) * t_mod, 0)
    zs = np.maximum(1 - np.arange(n) * z_mod, 0)
    shock = zs.reshape(-1, 1) @ ts.reshape(1, -1)  # outer product of the two decay profiles
    return np.flipud(shock)  # flip so the surface (full correction) is the last row
eg
shock_2d(t_mod = 0.5, z_mod = 0.25)
Out:
array([[0.25 , 0.125, 0. , 0. ],
[0.5 , 0.25 , 0. , 0. ],
[0.75 , 0.375, 0. , 0. ],
[1. , 0.5 , 0. , 0. ]])
and
shock_2d(t_mod = 0.05, z_mod = 0.25)
Out:
array([[0.25 , 0.2375, 0.225 , 0.2125],
[0.5 , 0.475 , 0.45 , 0.425 ],
[0.75 , 0.7125, 0.675 , 0.6375],
[1. , 0.95 , 0.9 , 0.85 ]])
The last argument, n, is the size of the matrix.
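To tie it back to the question, a usage sketch of my own (assuming a1, correction, t_mod and z_mod as defined there):
a2 = shock_2d(t_mod=t_mod, z_mod=z_mod)  # generate the modifier array from the two decay rates
adjusted = a1 + a2 * correction          # apply the diminishing correction to the forecast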
I'm working on a project where I need to calibrate two cameras. As you know, one needs to define planar grid points in the 3D world and find their correspondences on the image plane. Therefore, the first camera has the following 3D grid points:
import cv2 as cv
import numpy as np
WPoints_cam1 = np.zeros((9*3,3), np.float64)
WPoints_cam1[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
print(WPoints_cam1)
[[0. 0. 0. ]# world coordinate center
[0.4 0. 0. ]
[0.8 0. 0. ]
[1.2 0. 0. ]
[1.6 0. 0. ]
[2. 0. 0. ]
[2.4 0. 0. ]
[2.8 0. 0. ]
[3.2 0. 0. ]
[0. 0.4 0. ]
[0.4 0.4 0. ]
[0.8 0.4 0. ]
[1.2 0.4 0. ]
[1.6 0.4 0. ]
[2. 0.4 0. ]
[2.4 0.4 0. ]
[2.8 0.4 0. ]
[3.2 0.4 0. ]
[0. 0.8 0. ]
[0.4 0.8 0. ]
[0.8 0.8 0. ]
[1.2 0.8 0. ]
[1.6 0.8 0. ]
[2. 0.8 0. ]
[2.4 0.8 0. ]
[2.8 0.8 0. ]
[3.2 0.8 0. ]]
As seen above, the first grid (for the first camera) starts from the defined reference 3D point (0, 0, 0) and ends at the point (3.2, 0.8, 0), with a constant offset of 0.4 and a 9x3 dimension.
Note that all Z coordinates were set to Z=0 (Zhengyou Zhang calibration).
Now my question is: since I need to define a second grid (for the second camera) that also refers to the defined 3D coordinate center (0, 0, 0), I need a grid that starts from (3.6, 0, 0) and ends at (6.8, 0.8, 0) with the same offset of 0.4 and a 9x3 dimension.
I believe this is easy to do; however, I can't think outside the box due to my beginner level of experience.
I would appreciate some help, and thanks in advance.
You can scale each column like this:
np.mgrid[0:8, 0:3].T.reshape(-1,2) * np.array([(7.8 - 3.6) / 7, 0.4]) + np.array([3.6, 0])
or combine it into a scaling matrix like this (and then add a vector for the translation):
np.mgrid[0:8, 0:3].T.reshape(-1,2) @ np.array([[(7.8 - 3.6) / 7, 0], [0, 0.4]]).T + np.array([3.6, 0])
Regarding where (7.8 - 3.6) / 7 comes from: the numerator should be self-evident. The denominator is the same but for your original dimensions. With 0:8 the max is 7 and the min is 0, so the denominator becomes 7 - 0.
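If the second grid should instead keep the original 0.4 spacing from the question (so it runs from (3.6, 0, 0) to (3.6 + 8*0.4, 0.8, 0) = (6.8, 0.8, 0)), a minimal sketch is to reuse the same mgrid expression and simply translate it along X; WPoints_cam2 is a name I made up:
# Sketch: same 9x3 layout and 0.4 spacing as WPoints_cam1, shifted by 3.6 along X
WPoints_cam2 = np.zeros((9*3, 3), np.float64)
WPoints_cam2[:, :2] = np.mgrid[0:9, 0:3].T.reshape(-1, 2) * 0.4 + np.array([3.6, 0.0])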
I am fairly new to programming and I have never used numpy before.
I have a matrix with 19001 x 19001 dimensions. It contains a lot of zeros, so it is relatively sparse. I wrote some code to compute the pairwise cosine similarity of the columns if the item in the row is non-zero. I add all the pairwise similarity values of one row and do some mathematical operations on them to obtain one value for each row of the matrix in the end (see code below). It does what it is supposed to; however, since it deals with a great number of dimensions, it is really slow. Is there any way to modify my code to make it more efficient?
import numpy as np
from scipy.spatial.distance import cosine
row_number = 0
out_file = open('outfile.txt', 'w')
for row in my_matrix:
    non_zeros = np.nonzero(my_matrix[row_number])[0]
    non_zeros = list(non_zeros)
    cosine_sim = []
    for item in non_zeros:
        if len(non_zeros) <= 1:
            break
        x = non_zeros[0]
        y = non_zeros[1]
        similarity = 1 - cosine(my_matrix[:, x], my_matrix[:, y])
        cosine_sim.append(similarity)
        non_zeros.pop(0)
    summing = np.sum(cosine_sim)
    mean = summing / len(cosine_sim)
    log = np.log(mean)
    out_file_value = log * -1
    out_file.write(str(row_number) + " " + str(out_file_value) + "\n")
    if row_number <= 19000:
        row_number += 1
    else:
        break
I know that there are functions to compute the cosine similarity between columns (from sklearn.metrics.pairwise import cosine_similarity), so I tried it. However, the output is kind of the same but at the same time really confusing to me, even though I read the documentation and the posts on this page referring to the issue.
For instance:
my_matrix =[[0. 0. 7. 0. 5.]
[0. 0. 11. 0. 0.]
[0. 2. 0. 0. 0.]
[0. 0. 2. 11. 5.]
[0. 0. 5. 0. 0.]]
transposed = np.transpose(my_matrix)
sim_matrix = cosine_similarity(transposed)
# resulting similarity matrix
sim_matrix =[[0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0.14177624 0.45112924]
[0. 0. 0.14177624 1. 0.70710678]
[0. 0. 0.45112924 0.70710678 1.]]
If I compute the cosine similarity with my code above, it returns 0.45112924 for the 1st row ([0]) and 0.14177624 and 0.70710678 for row 4 ([3]).
out_file.txt
0 0.796001425306
1 nan
2 nan
3 0.856981065776
4 nan
I greatly appreciate any help or suggestions to my question!
You can consider using scipy instead. However, it doesn't take sparse matrix input; you have to provide a numpy array.
import scipy.sparse as sp
from scipy.spatial.distance import cdist
X = np.random.randn(10000, 10000)
D = cdist(X, X.T, metric='cosine') # cosine distance matrix between 2 columns
Here is the speed that I got for 10000 x 10000 random array.
%timeit cdist(X, X.T, metric='cosine')
16.4 s ± 325 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Try on small array
X = np.array([[1,0,1], [0, 3, 2], [1,0,1]])
D = cdist(X, X.T, metric='cosine')
This will give
[[ 1.11022302e-16 1.00000000e+00 4.22649731e-01]
[ 6.07767730e-01 1.67949706e-01 9.41783727e-02]
[ 1.11022302e-16 1.00000000e+00 4.22649731e-01]]
For example D[0, 2] is the cosine distance between column 0 and 2
from numpy.linalg import norm
1 - np.dot(X[:, 0], X[:,2])/(norm(X[:, 0]) * norm(X[:,2])) # give 0.422649
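If you want to keep your original per-row quantity but speed it up, one option (a sketch of my own, assuming the intended pairs are consecutive nonzero columns, which is what your small example reproduces) is to compute the column-by-column similarity matrix once and then just index into it for every row:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

sim = cosine_similarity(my_matrix.T)        # (n_cols, n_cols), computed once
values = []
for row in my_matrix:
    nz = np.flatnonzero(row)
    if len(nz) <= 1:
        values.append(np.nan)               # mirrors the nan rows in out_file.txt
        continue
    pair_sims = sim[nz[:-1], nz[1:]]        # similarities of consecutive nonzero columns
    values.append(-np.log(pair_sims.mean()))
For the 5x5 example above this reproduces the 0.796... (row 0) and 0.857... (row 3) values from out_file.txt.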
I'm using Theano 0.7 to create a convolutional neural net which uses max-pooling (i.e. shrinking a matrix down by keeping only the local maxima).
In order to "undo" or "reverse" the max-pooling step, one method is to store the locations of the maxima as auxiliary data, then simply recreate the un-pooled data by making a big array of zeros and using those auxiliary locations to place the maxima in their appropriate locations.
Here's how I'm currently doing it:
import numpy as np
import theano
import theano.tensor as T
minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5
# HERE is the function that I hope could be improved
def upsamplecode(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    for whichitem in range(minibatchsize):
        for whichfilt in range(numfilters):
            upsampled = T.set_subtensor(
                upsampled[whichitem, whichfilt, auxpos[whichitem, whichfilt, :]],
                encoded[whichitem, whichfilt, :])
    return upsampled
totalitems = minibatchsize * numfilters * numsamples
code = theano.shared(np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor # arbitrary positions within a bin
auxpos += (np.arange(4) * 5).reshape((1,1,-1)) # shifted to the actual temporal bin location
auxpos = theano.shared(auxpos.astype(np.int))
print "code:"
print code.get_value()
print "locations:"
print auxpos.get_value()
get_upsampled = theano.function([], upsamplecode(code, auxpos))
print "the un-pooled data:"
print get_upsampled()
(By the way, in this case I have a 3D tensor, and it's only the third axis that gets max-pooled. People who work with image data might expect to see two dimensions getting max-pooled.)
The output is:
code:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
locations:
[[[ 0 6 12 18]
[ 4 5 11 17]
[ 3 9 10 16]]
[[ 2 8 14 15]
[ 1 7 13 19]
[ 0 6 12 18]]]
the un-pooled data:
[[[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 2. 0.
0. 0. 0. 0. 3. 0.]
[ 0. 0. 0. 0. 4. 5. 0. 0. 0. 0. 0. 6. 0. 0.
0. 0. 0. 7. 0. 0.]
[ 0. 0. 0. 8. 0. 0. 0. 0. 0. 9. 10. 0. 0. 0.
0. 0. 11. 0. 0. 0.]]
[[ 0. 0. 12. 0. 0. 0. 0. 0. 13. 0. 0. 0. 0. 0.
14. 15. 0. 0. 0. 0.]
[ 0. 16. 0. 0. 0. 0. 0. 17. 0. 0. 0. 0. 0. 18.
0. 0. 0. 0. 0. 19.]
[ 20. 0. 0. 0. 0. 0. 21. 0. 0. 0. 0. 0. 22. 0.
0. 0. 0. 0. 23. 0.]]]
This method works but it's a bottleneck, taking most of my computer's time (I think the set_subtensor calls might imply cpu<->gpu data copying). So: can this be implemented more efficiently?
I suspect there's a way to express this as a single set_subtensor() call which may be faster, but I don't see how to get the tensor indexing to broadcast properly.
UPDATE: I thought of a way of doing it in one call, by working on the flattened tensors:
def upsamplecode2(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    add_to_flattened_indices = theano.shared(
        np.array([[[(y + z * numfilters) * numsamples * upsampfactor for x in range(numsamples)]
                   for y in range(numfilters)] for z in range(minibatchsize)],
                 dtype=theano.config.floatX).flatten(), name="add_to_flattened_indices")
    upsampled = T.set_subtensor(
        upsampled.flatten()[T.cast(auxpos.flatten() + add_to_flattened_indices, 'int32')],
        encoded.flatten()).reshape(upsampled.shape)
    return upsampled
get_upsampled2 = theano.function([], upsamplecode2(code, auxpos))
print "the un-pooled data v2:"
ups2 = get_upsampled2()
print ups2
However, this is still not good efficiency-wise because when I run this (added on to the end of the above script) I find out that the Cuda libraries can't currently do the integer index manipulation efficiently:
ERROR (theano.gof.opt): Optimization failure due to: local_gpu_advanced_incsubtensor1
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 1493, in process_node
replacements = lopt.transform(node)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/opt.py", line 952, in local_gpu_advanced_incsubtensor1
gpu_y = gpu_from_host(y)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 507, in __call__
node = self.make_node(*inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/basic_ops.py", line 133, in make_node
dtype=x.dtype)()])
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/type.py", line 69, in __init__
(self.__class__.__name__, dtype, name))
TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype int64 for variable None
I don't know whether this is faster, but it may be a little more concise. See if it is useful for your case.
import numpy as np
import theano
import theano.tensor as T
minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5
totalitems = minibatchsize * numfilters * numsamples
code = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor
auxpos += (np.arange(4) * 5).reshape((1,1,-1))
# first in numpy
shp = code.shape
upsampled_np = np.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled_np[np.arange(shp[0]).reshape(-1, 1, 1), np.arange(shp[1]).reshape(1, -1, 1), auxpos] = code
print "numpy output:"
print upsampled_np
# now the same idea in theano
encoded = T.tensor3()
positions = T.tensor3(dtype='int64')
shp = encoded.shape
upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled = T.set_subtensor(upsampled[T.arange(shp[0]).reshape((-1, 1, 1)), T.arange(shp[1]).reshape((1, -1, 1)), positions], encoded)
print "theano output:"
print upsampled.eval({encoded: code, positions: auxpos})
Using Python (mostly numpy), I am trying to compute an upper-triangular matrix where each row j is the first j terms of a geometric series, all rows using the same parameter.
For example, if my parameter is B (where abs(B) <= 1, i.e. B in [-1, 1]), then row 1 would be [1 B B^2 B^3 ... B^(N-1)], row 2 would be [0 1 B B^2 ... B^(N-2)], ..., and row N would be [0 0 0 ... 1].
This computation is key to a Bayesian Metropolis-Gibbs sampler, and so needs to be done thousands of times for new values of "B".
I have currently tried this two ways:
Method 1 - Mostly Vectorized:
B_Matrix = np.triu(np.dot(np.reshape(B**(-1*np.array(range(N))),(N,1)),np.reshape(B**(np.array(range(N))),(1,N))))
Essentially, this is the upper triangle part of a product of an Nx1 and 1xN set of matrices:
upper triangle ([1 B^(-1) B^(-2) ... B^(-(N-1))]' * [1 B B^2 B^3 ... B^(N-1)])
This works great for small N (algebraically it is correct), but for large N it errors out. It also errors out for B=0 (which should be allowed). I believe this stems from B^(-N) ~ inf for small B and large N.
Method 2:
B_Matrix = np.zeros((N,N))
B_Row_1 = B**(np.array(range(N)))
for n in range(N):
    B_Matrix[n, n:] = B_Row_1[0:N-n]
So that just fills in the matrix row by row, but uses a loop which slows things down.
I was wondering if anyone had run into this before, or had any better ideas on how to compute this matrix in a faster way.
I've never posted on stackoverflow before, but didn't see this question anywhere, and thought I'd ask.
Let me know if there's a better place to ask this, and if I should provide any more detail.
You could use scipy.linalg.toeplitz:
In [12]: n = 5
In [13]: b = 0.5
In [14]: toeplitz(b**np.arange(n), np.zeros(n)).T
Out[14]:
array([[ 1. , 0.5 , 0.25 , 0.125 , 0.0625],
[ 0. , 1. , 0.5 , 0.25 , 0.125 ],
[ 0. , 0. , 1. , 0.5 , 0.25 ],
[ 0. , 0. , 0. , 1. , 0.5 ],
[ 0. , 0. , 0. , 0. , 1. ]])
If your use of the array is strictly "read only", you can play tricks with numpy strides to quickly create an array that uses only 2*n-1 elements (instead of n^2):
In [55]: from numpy.lib.stride_tricks import as_strided
In [56]: def make_array(b, n):
....: vals = np.zeros(2*n - 1)
....: vals[n-1:] = b**np.arange(n)
....: a = as_strided(vals[n-1:], shape=(n, n), strides=(-vals.strides[0], vals.strides[0]))
....: return a
....:
In [57]: make_array(0.5, 4)
Out[57]:
array([[ 1. , 0.5 , 0.25 , 0.125],
[ 0. , 1. , 0.5 , 0.25 ],
[ 0. , 0. , 1. , 0.5 ],
[ 0. , 0. , 0. , 1. ]])
If you will modify the array in-place, make a copy of the result returned by make_array(b, n). That is, arr = make_array(b, n).copy().
The function make_array2 incorporates the suggestion @Jaime made in the comments:
In [30]: def make_array2(b, n):
....: vals = np.zeros(2*n-1)
....: vals[n-1] = 1
....: vals[n:] = b
....: np.cumproduct(vals[n:], out=vals[n:])
....: a = as_strided(vals[n-1:], shape=(n, n), strides=(-vals.strides[0], vals.strides[0]))
....: return a
....:
In [31]: make_array2(0.5, 4)
Out[31]:
array([[ 1. , 0.5 , 0.25 , 0.125],
[ 0. , 1. , 0.5 , 0.25 ],
[ 0. , 0. , 1. , 0.5 ],
[ 0. , 0. , 0. , 1. ]])
make_array2 is more than twice as fast as make_array:
In [35]: %timeit make_array(0.99, 600)
10000 loops, best of 3: 23.4 µs per loop
In [36]: %timeit make_array2(0.99, 600)
100000 loops, best of 3: 10.7 µs per loop
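A broadcasting-based alternative is also possible (a sketch of my own, not benchmarked against the stride versions; geom_upper is a made-up name). Clipping the exponents before exponentiating avoids the B**(-N) overflow from Method 1 and also handles B = 0:
import numpy as np

def geom_upper(b, n):
    # exponent at position (i, j) is j - i; it is negative below the diagonal
    exps = np.arange(n) - np.arange(n)[:, None]
    # clip negative exponents to 0 before exponentiating, then zero out the lower triangle
    return np.where(exps >= 0, b ** np.clip(exps, 0, None), 0.0)
For instance, geom_upper(0.5, 5) matches the toeplitz output in Out[14] above.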