In TensorFlow I have a rank-2 tensor M (a matrix) of shape [D, D] and a rank-3 tensor T of shape [D, D, D].
I need to combine them to form a new matrix R as follows: the element R[a, b+c-a] is given by the sum of all the elements T[a, b, c]*M[b, c] where b+c-a is constant (where b+c-a has to be between 0 and D-1).
An inefficient way to create R is with nested for loops over the indices and a check that b+c-a does not exceed D-1 (e.g. in numpy):
R = np.zeros([D,D])
for a in range(D):
    for b in range(D):
        for c in range(D):
            if 0 <= b+c-a < D:
                R[a, b+c-a] += T[a, b, c]*M[b, c]
but I would like to use broadcasting and/or other more efficient methods.
How can I achieve this?
You can vectorize that calculation as follows:
import numpy as np
np.random.seed(0)
D = 10
M = np.random.rand(D, D)
T = np.random.rand(D, D, D)
# Original calculation
R = np.zeros([D, D])
for a in range(D):
    for b in range(D):
        for c in range(D):
            if 0 <= b + c - a < D:
                R[a, b + c - a] += T[a, b, c] * M[b, c]
# Vectorized calculation
tm = T * M
a = np.arange(D)[:, np.newaxis, np.newaxis]
b, c = np.ogrid[:D, :D]
col_idx = b + c - a
m = (col_idx >= 0) & (col_idx < D)
row_idx = np.tile(a, [1, D, D])
R2 = np.zeros([D, D])
np.add.at(R2, (row_idx[m], col_idx[m]), tm[m])
# Check result
print(np.allclose(R, R2))
# True
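The vectorized version relies on np.add.at rather than a plain R2[rows, cols] += values because many (a, b, c) triples land on the same output cell. Here is a minimal sketch of the difference on a toy 1-D case (the names idx, vals, buffered and unbuffered are made up for the illustration):

```python
import numpy as np

idx = np.array([0, 0, 1])          # index 0 appears twice
vals = np.array([1.0, 2.0, 3.0])

# Fancy-index += is buffered: for a repeated index only one write survives.
buffered = np.zeros(2)
buffered[idx] += vals

# np.add.at is unbuffered: every contribution is accumulated.
unbuffered = np.zeros(2)
np.add.at(unbuffered, idx, vals)

print(buffered, unbuffered)
```

With repeated indices only the unbuffered form gives the sums the nested loops would produce.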
Alternatively, you could consider using Numba to accelerate the loops:
import numpy as np
import numba as nb
@nb.njit
def calculation_nb(T, M, D):
    tm = T * M
    R = np.zeros((D, D), dtype=tm.dtype)
    for a in nb.prange(D):
        for b in range(D):
            # Only visit values of c for which 0 <= b + c - a < D
            for c in range(max(a - b, 0), min(D + a - b, D)):
                R[a, b + c - a] += tm[a, b, c]
    return R
print(np.allclose(R, calculation_nb(T, M, D)))
# True
In a couple of quick tests, even without parallelization, this is considerably faster than the NumPy version.
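The Numba loop drops the if test by restricting the range of c instead. As a quick sanity check in plain Python (no Numba needed; the helper name bounds_match is invented for this sketch), the bounded range visits exactly the values of c that pass the original condition:

```python
def bounds_match(D):
    """Compare the filtered loop with the bounded range for every (a, b)."""
    for a in range(D):
        for b in range(D):
            filtered = [c for c in range(D) if 0 <= b + c - a < D]
            bounded = list(range(max(a - b, 0), min(D + a - b, D)))
            if filtered != bounded:
                return False
    return True

print(all(bounds_match(D) for D in range(1, 12)))
```

The lower bound comes from b + c - a >= 0 (so c >= a - b) and the upper bound from b + c - a < D (so c < D + a - b), each clipped to the valid range of c.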
Here is python code in cvxpy:
import numpy as np
import time
import cvxpy as cp
n = 10
a = np.random.randint(1, 10, size=n)
b = np.random.randint(1, 10, size=n)
c = np.random.randint(1, 10, size=n)
d = np.random.randint(1, 10, size=n)
x = cp.Variable(shape=n, boolean=True)
# objective function
objective = cp.Maximize(cp.sum(cp.multiply(x,a)))
# constraints
constraints = []
constraints.append(cp.sum(cp.multiply(x, b)) <= 50) # constraint 1
constraints.append(cp.sum_largest(cp.hstack([
    cp.sum(cp.multiply(x, b)),
    cp.sum(cp.multiply(x, c)),
    cp.sum(cp.multiply(x, d))]), 2) <= 100) # constraint 2
prob = cp.Problem(objective, constraints)
# solve model
prob.solve(solver=cp.CBC, verbose=False, maximumSeconds=100)
print("status:", prob.status)
a, b, c, d and x are all binary. The objective is max(sum(x*a)) and the constraints are:
sum(x*b) <= 50
sum of the largest 2 values in [sum(x*b), sum(x*c), sum(x*d)] <= 100; this is implemented via sum_largest: sum_largest([sum(x*b), sum(x*c), sum(x*d)], 2) <= 100
define others=[b, c, d] - b - (largest 2 value in [b, c, d])
For example:
case1: [sum(x*b), sum(x*c), sum(x*d)] = [1,2,3], so (largest 2 value in [b, c, d]) = [c, d] and others=[b, c, d] - b - [c, d] = None
case2: [sum(x*b), sum(x*c), sum(x*d)] = [3,2,1], so (largest 2 value in [b, c, d]) = [b, c] and others=[b, c, d] - b - [b, c] = d
constraints:
for i in others:
    constraints.append(cp.sum(cp.multiply(x, i)) <= 10)
Constraints 1, 2 have already been implemented. How can I implement constraint 3? Is it even possible in cvxpy?
Note: the question has changed, so this is no longer a valid answer.
The original problem was:
(1) sum(x*b) <= 5
(2) max([sum(x*b), sum(x*c), sum(x*d)]) <= 10
(3) define others=[b, c, d] - b - maxBCD([b, c, d]) (others is a set of symbols, maxBCD returns a symbol)
for i in others:
    constraints.append(cp.sum(cp.multiply(x, i)) <= 1)
This is an answer to that original problem statement. I have not deleted the answer as it is a good starting point (also because the answer states the problem a bit more precisely: in mathematical terms).
(1) and (2) can be written as:
xb = sum(x*b)
xc = sum(x*c)
xd = sum(x*d)
xb <= 5
xc <= 10
xd <= 10
(3) needs to know which one is the maximum. I assume two or all three can be maximum.
bIsMax, cIsMax, dIsMax: binary variables
# bIsMax=1 => xb is largest
xb >= xc - (1-bIsMax)*10
xb >= xd - (1-bIsMax)*10
# cIsMax=1 => xc is largest
xc >= xb - (1-cIsMax)*5
xc >= xd - (1-cIsMax)*10
# dIsMax=1 => xd is largest
xd >= xb - (1-dIsMax)*5
xd >= xc - (1-dIsMax)*10
# at least one of them is largest
bIsMax + cIsMax + dIsMax >= 1
Note that others=[b, c, d] - b - maxBCD can be restated as: others=[c, d] - maxBCD.
# if c is not max then xc<=1
xc <= 1 + 9*cIsMax
# if d is not max then xd<=1
xd <= 1 + 9*dIsMax
Left to the reader:
check the math and implement in CVXPY.
update the answer to the revised question.
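As a starting point for checking the math, here is a brute-force sketch in plain Python (not CVXPY). It assumes the three sums take integer values within the bounds from (1) and (2), and it reads constraint (3) as "xc and xd must each either be a maximum or be at most 1". The function names are invented for this check.

```python
from itertools import product

def big_m_feasible(xb, xc, xd):
    """True if some choice of the binary flags satisfies every big-M constraint."""
    for b_is_max, c_is_max, d_is_max in product((0, 1), repeat=3):
        ok = (
            xb >= xc - (1 - b_is_max) * 10
            and xb >= xd - (1 - b_is_max) * 10
            and xc >= xb - (1 - c_is_max) * 5
            and xc >= xd - (1 - c_is_max) * 10
            and xd >= xb - (1 - d_is_max) * 5
            and xd >= xc - (1 - d_is_max) * 10
            and b_is_max + c_is_max + d_is_max >= 1
            and xc <= 1 + 9 * c_is_max
            and xd <= 1 + 9 * d_is_max
        )
        if ok:
            return True
    return False

def intended_rule(xb, xc, xd):
    """Constraint (3) stated directly: xc and xd are each a maximum or at most 1."""
    largest = max(xb, xc, xd)
    return (xc == largest or xc <= 1) and (xd == largest or xd <= 1)

# Enumerate every integer point allowed by xb <= 5, xc <= 10, xd <= 10.
mismatches = [
    (xb, xc, xd)
    for xb in range(6)
    for xc in range(11)
    for xd in range(11)
    if big_m_feasible(xb, xc, xd) != intended_rule(xb, xc, xd)
]
print(len(mismatches))
```

An empty mismatch list suggests the big-M constants are large enough for these bounds; different right-hand sides would need different constants.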
I have a 3D numpy array A of shape (2133, 3, 3). Basically this is a list of 2133 lists, each containing three 3D points. Furthermore, I have a function which takes three 3D points and returns one 3D point, x = f(a, b, c), with a, b, c, x numpy arrays of length 3. Now I want to apply f to A, so that the output is an array of shape (2133, 3). So something like numpy.array([f(*A[0]), ..., f(*A[2132])]).
I tried numpy.apply_along_axis and numpy.vectorize without success.
To be more precise the function f I consider is given by:
def f(a, b, c, r1, r2=None, r3=None):
    a = np.asarray(a)
    b = np.asarray(b)
    c = np.asarray(c)
    if np.linalg.matrix_rank(np.array([a, b, c])) != 3:
        # raise ValueError('The points are not collinear.')
        return None
    a, b, c, = sort_triple(a, b, c)
    if any(r is None for r in (r2, r3)):
        r2, r3 = (r1, r1)
    ex = (b - a) / (np.linalg.norm(b - a))
    i = np.dot(ex, c - a)
    ey = (c - a - i*ex) / (np.linalg.norm(c - a - i*ex))
    ez = np.cross(ex, ey)
    d = np.linalg.norm(b - a)
    j = np.dot(ey, c - a)
    x = (pow(r1, 2) - pow(r2, 2) + pow(d, 2)) / (2 * d)
    y = ((pow(r1, 2) - pow(r3, 2) + pow(i, 2) + pow(j, 2)) / (2*j)) - ((i/j)*x)
    z_square = pow(r1, 2) - pow(x, 2) - pow(y, 2)
    if z_square >= 0:
        z = np.sqrt(z_square)
        intersection = a + x * ex + y*ey + z*ez
        return intersection
A = np.array([[[131.83, 25.2, 0.52], [131.51, 22.54, 0.52],[133.65, 23.65, 0.52]], [[13.02, 86.98, 0.52], [61.02, 87.12, 0.52],[129.05, 87.32, 0.52]]])
r1 = 1.7115
Thanks to the great help of @jdehesa I was able to produce an alternative solution to the one given by @hpaulj. I am not sure if this solution is the most elegant one, but it has worked so far. Comments are appreciated.
def sort_triple(a, b, c):
    pts = np.stack((a, b, c), axis=1)
    xSorted = pts[np.arange(pts.shape[0])[:, None], np.argsort(pts[:, :, 0])]
    orientation = np.cross(xSorted[:, 1] - xSorted[:, 0], xSorted[:, 2] -
                           xSorted[:, 0])[:, 2] >= 0
    xSorted_flipped = np.stack((xSorted[:, 0], xSorted[:, 2], xSorted[:, 1]),
                               axis=1)
    xSorted = np.where(orientation[:, np.newaxis, np.newaxis], xSorted,
                       xSorted_flipped)
    return map(np.squeeze, np.split(xSorted, 3, axis=1))
def inner1d(u, v):
    # Row-wise dot product. Stand-in for the inner1d helper the code below
    # uses but never imports (presumably numpy.core.umath_tests.inner1d).
    return np.einsum('ij,ij->i', u, v)

def f(A, r1, r2=None, r3=None):
    a, b, c = map(np.squeeze, np.split(A, 3, axis=1))
    a, b, c = sort_triple(a, b, c)
    if any(r is None for r in (r2, r3)):
        r2, r3 = (r1, r1)
    ex = (b - a) / (np.linalg.norm(b - a, axis=1))[:, np.newaxis]
    i = inner1d(ex, (c - a))
    ey = ((c - a - i[:, np.newaxis]*ex) /
          (np.linalg.norm(c - a - i[:, np.newaxis]*ex, axis=1))[:, np.newaxis])
    ez = np.cross(ex, ey)
    d = np.linalg.norm(b - a, axis=1)
    j = inner1d(ey, c - a)
    x = (np.square(r1) - np.square(r2) + np.square(d)) / (2 * d)
    y = ((np.square(r1) - np.square(r3) + np.square(i) + np.square(j)) / (2*j) -
         i/j*x)
    z_square = np.square(r1) - np.square(x) - np.square(y)
    # Rows with no real intersection: avoid sqrt of a negative, then mark as NaN
    mask = z_square < 0
    z_square[mask] *= 0
    z = np.sqrt(z_square)
    z[mask] = np.nan
    intersection = (a + x[:, np.newaxis] * ex + y[:, np.newaxis] * ey +
                    z[:, np.newaxis] * ez)
    return intersection
The map calls in each function could probably be done better, and the same may be true of the heavy use of np.newaxis.
This works fine (after commenting out sort_triple):
res = [f(*row,r1) for row in A]
print(res)
producing:
[array([ 132.21182324, 23.80481826, 1.43482849]), None]
That looks like one row produced a (3,) array, the other had some sort of problem and produced None. I don't know if that None was due to removing the sort or not. But in any case, turning a mix of arrays and None back into an array would be a problem. If all items of res were matching arrays, we could stack them back into a 2d array.
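One way around the mixed list, sketched here under the assumption that a row of NaNs is an acceptable marker for "no result" (the helper name stack_results is made up for the example):

```python
import numpy as np

def stack_results(results, width=3):
    """Stack per-row outputs into a 2d array, using a NaN row where f returned None."""
    filler = np.full(width, np.nan)
    rows = [filler if r is None else np.asarray(r, dtype=float) for r in results]
    return np.stack(rows)

# Toy stand-in for the list produced by the comprehension above
res = [np.array([1.0, 2.0, 3.0]), None]
stacked = stack_results(res)
print(stacked.shape)
```

Rows that failed can then be found with np.isnan(stacked).any(axis=1).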
There are ways of getting modest speed improvements (compared to this list comprehension). But with a complex function like this, the time spent in the function (called 2000 times) dominates the time spent by the iteration mechanism.
And since you are iterating on the 1st dimension, and passing the other 2 (as 3 arrays), this explicit loop is a lot easier to use than vectorize, frompyfunc or apply_along/over...
To get significant time savings you have to write f() to work with the 3d array directly.
Basically, I have 2 tensors: A, where A.shape = (N, H, D), and B, where B.shape = (K, H, D). What I would like to do is to get a tensor, C, with shape (N, K, D, H) such that :
C[i, j, :, :] = A[i, :, :] * B[j, :, :].
Can this be done efficiently in Theano?
Side note: The actual end result that I would like to achieve is to have a tensor, E, of shape (N, K, D) such that :
E[i, j, :] = (A[i, :, :]*B[j, :, :]).sum(0)
So, if there is a way to get this directly, I would prefer it (saves on space hopefully).
One approach uses broadcasting:
(A[:,None]*B).sum(2)
Please note that the intermediate array being created would be of shape (N, K, H, D) before sum-reduction on axis=2 reduces it to (N,K,D).
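The shapes can be checked quickly in NumPy, which follows the same broadcasting rules. This is only a sketch with small made-up sizes, not Theano code; np.einsum is used as an independent reference for the sum over H:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, H, D = 2, 3, 4, 5            # arbitrary small sizes for the check
A = rng.random((N, H, D))
B = rng.random((K, H, D))

# (N, 1, H, D) * (K, H, D) -> (N, K, H, D), then sum over the H axis
E_broadcast = (A[:, None] * B).sum(2)

# Same contraction written explicitly: E[n, k, d] = sum_h A[n, h, d] * B[k, h, d]
E_einsum = np.einsum('nhd,khd->nkd', A, B)

print(E_broadcast.shape, np.allclose(E_broadcast, E_einsum))
```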
You can get the final three dimensional result E without creating the large intermediate array using batched_dot:
import theano.tensor as tt
A = tt.tensor3('A') # A.shape = (D, N, H)
B = tt.tensor3('B') # B.shape = (D, H, K)
E = tt.batched_dot(A, B) # E.shape = (D, N, K)
Unfortunately, this requires you to permute the dimensions of your input and output arrays. Although this can be done with dimshuffle in Theano, it seems batched_dot can't cope with arbitrarily strided arrays, so the following raises a ValueError (Some matrix has no unit stride) when E is evaluated:
import theano.tensor as tt
A = tt.tensor3('A') # A.shape = (N, H, D)
B = tt.tensor3('B') # B.shape = (K, H, D)
A_perm = A.dimshuffle((2, 0, 1)) # A_perm.shape = (D, N, H)
B_perm = B.dimshuffle((2, 1, 0)) # B_perm.shape = (D, H, K)
E_perm = tt.batched_dot(A_perm, B_perm) # E_perm.shape = (D, N, K)
E = E_perm.dimshuffle((1, 2, 0)) # E.shape = (N, K, D)
batched_dot uses scan along the first (size D) dimension. As scan is performed sequentially this could be computationally less efficient than computing all the products in parallel if running on a GPU.
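To see that the permutations line up, here is a NumPy analogue of the permuted batched product. It is a sketch only: np.matmul and transpose stand in for batched_dot and dimshuffle, and the sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, H, D = 2, 3, 4, 5
A = rng.random((N, H, D))
B = rng.random((K, H, D))

A_perm = A.transpose(2, 0, 1)        # (D, N, H)
B_perm = B.transpose(2, 1, 0)        # (D, H, K)
E_perm = np.matmul(A_perm, B_perm)   # batched over D -> (D, N, K)
E = E_perm.transpose(1, 2, 0)        # (N, K, D)

# Reference: the broadcasting approach
E_ref = (A[:, None] * B).sum(2)
print(E.shape, np.allclose(E, E_ref))
```

NumPy handles the non-contiguous transposed views itself; in Theano the stride error described above may need an explicit copy of the permuted arrays, which is not verified here.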
You can trade off the memory efficiency of the batched_dot approach against the parallelism of the broadcasting approach by using scan explicitly. The idea is to calculate the full product C for batches of size M in parallel (assuming M is an exact factor of D), iterating over the batches with scan:
import theano as th
import theano.tensor as tt
A = tt.tensor3('A') # A.shape = (N, H, D)
B = tt.tensor3('B') # B.shape = (K, H, D)
A_batched = A.reshape((N, H, M, D // M))  # integer division so the shape stays integral
B_batched = B.reshape((K, H, M, D // M))
E_batched, _ = th.scan(
    lambda a, b: (a[:, :, None, :] * b[:, :, :, None]).sum(1),
    sequences=[A_batched.T, B_batched.T]
)
E = E_batched.reshape((D, K, N)).T # E.shape = (N, K, D)
I tried to copy one array, say A (2-D), into another array, say B (3-D), with the following shapes:
A is an m * n array and B is an m * n * p array.
I tried the following code but it is very slow, like 1 sec/frame
for r in range (0, h):
    for c in range (0, w):
        x = random.randint(0, 20)
        B[r, c, x] = A[r, c]
I also read some websites about fancy indexing but I still don't know how to apply it in mine.
I propose a solution using array indices. M,N,P are each (m,n) index arrays, specifying the m*n elements of B that will receive data from A.
def indexing(A, p):
    m,n = A.shape
    B = np.zeros((m,n,p), dtype=int)
    P = np.random.randint(0, p, (m,n))
    M, N = np.indices(A.shape)
    B[M,N,P] = A
    return B
For comparison, here are the original loop and the solution using shuffle:
def looping(A, p):
    m, n = A.shape
    B = np.zeros((m,n,p), dtype=int)
    for r in range (m):
        for c in range (n):
            x = np.random.randint(0, p)
            B[r, c, x] = A[r, c]
    return B
def shuffling(A, p):
    m, n = A.shape
    B = np.zeros((m,n,p), dtype=int)
    B[:,:,0] = A
    # Shuffle each length-p row in place; map() is lazy in Python 3,
    # so an explicit loop is needed for the shuffles to actually run.
    for row in B.reshape(m*n,p):
        np.random.shuffle(row)
    return B
For m,n,p = 1000,1000,20, timings are:
looping: 1.16 s
shuffling: 10 s
indexing: 271 ms
For small m,n, looping is fastest. My indexing solution takes more time to set up, but the actual assignment is fast. The shuffling solution has as many iterations as the original.
The M,N arrays don't have to be full. They can be column and row arrays, respectively
M = np.arange(m)[:,None]
N = np.arange(n)[None,:]
or
M,N = np.ogrid[:m,:n]
This shaves off some time, more so for small test cases than a large one.
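Putting the open-grid variant together, a minimal sketch might look like this. The function name indexing_open_grid and the rng argument are additions for the example, and A is filled with nonzero values so the result is easy to check.

```python
import numpy as np

def indexing_open_grid(A, p, rng=None):
    """Scatter A[r, c] into B[r, c, x] with a random x per cell, using open grids."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    B = np.zeros((m, n, p), dtype=A.dtype)
    P = rng.integers(0, p, size=(m, n))   # random depth index for every (r, c)
    M, N = np.ogrid[:m, :n]               # shapes (m, 1) and (1, n), broadcast with P
    B[M, N, P] = A
    return B

A = np.arange(1, 13).reshape(3, 4)
B = indexing_open_grid(A, p=5, rng=np.random.default_rng(0))
print(B.shape)
```

Each (r, c) column of B then holds exactly one nonzero entry, equal to A[r, c].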
A repeatable version:
def indexing(A, p, B=None):
    m, n = A.shape
    if B is None:
        B = np.zeros((m,n,p), dtype=int)
    for r in range (m):
        for c in range (n):
            x = np.random.randint(0, p)
            B[r, c, x] = A[r, c]
    return B
indexing(A,p,indexing(A,p))
If A isn't the same size as the first two dimensions of B, the index ranges will have to be changed. A doesn't have to be a 2D array either:
B[[0,0,2],[1,1,0],[3,4,5]] = [10,11,12]
Assuming that h=m, w=n and x=p, this should give you the same as you have in your example:
B[:,:,0]=A
# Explicit loop, because map() is lazy in Python 3 and would not shuffle anything
for row in B.reshape(h*w,p):
    np.random.shuffle(row)
Note also, I'm assuming the answer to NPE's question in comments is 'yes'
I have a plane, plane A, defined by its orthogonal vector, say (a, b, c).
(i.e. the vector (a, b, c) is orthogonal to plane A)
I wish to project a vector (d, e, f) onto plane A.
How can I do it in Python? I think there must be some easy ways.
Take (d, e, f) and subtract off the projection of it onto the normalized normal to the plane (in your case (a, b, c)). So:
v = (d, e, f)
- sum((d, e, f) *. (a, b, c)) * (a, b, c) / sum((a, b, c) *. (a, b, c))
Here, by *. I mean the component-wise product. So this would mean:
sum([x * y for x, y in zip([d, e, f], [a, b, c])])
or
d * a + e * b + f * c
if you just want to be clear but pedantic
and similarly for (a, b, c) *. (a, b, c). Thus, in Python:
from math import sqrt
def dot_product(x, y):
    return sum([x[i] * y[i] for i in range(len(x))])
def norm(x):
    return sqrt(dot_product(x, x))
def normalize(x):
    return [x[i] / norm(x) for i in range(len(x))]
def project_onto_plane(x, n):
    d = dot_product(x, n) / norm(n)
    p = [d * normalize(n)[i] for i in range(len(n))]
    return [x[i] - p[i] for i in range(len(x))]
Then you can say:
p = project_onto_plane([3, 4, 5], [1, 2, 3])
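If NumPy is available, the same formula fits in a couple of lines. This is a sketch, with project_onto_plane_np as a made-up name to keep it apart from the pure-Python function above:

```python
import numpy as np

def project_onto_plane_np(x, n):
    """Remove from x its component along the plane normal n."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    return x - np.dot(x, n) / np.dot(n, n) * n

p_np = project_onto_plane_np([3, 4, 5], [1, 2, 3])
# The projected vector lies in the plane, so its dot product with the normal is ~0
print(p_np, np.dot(p_np, [1, 2, 3]))
```

Dividing by np.dot(n, n) avoids normalizing n separately, which matches the sum((a, b, c) *. (a, b, c)) denominator in the formula above.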