I think this issue boils down to my lack of understanding of how Theano works. I'm in a situation where I want to create a variable that is the result of a subtraction between a distribution and a NumPy array. This works fine when I specify the shape parameter as 1:
import pymc3 as pm
import numpy as np
import theano.tensor as T
X = np.random.randint(low = -10, high = 10, size = 100)
with pm.Model() as model:
    nl = pm.Normal('nl', shape = 1)
    det = pm.Deterministic('det', nl - X)
nl.dshape
(1,)
However, this breaks when I specify shape > 1
with pm.Model() as model:
    nl = pm.Normal('nl', shape = 2)
    det = pm.Deterministic('det', nl - X)
ValueError: Input dimension mis-match. (input[0].shape[0] = 2, input[1].shape[0] = 100)
nl.dshape
(2,)
X.shape
(100,)
I tried transposing X to make it broadcastable
X2 = X.reshape(-1, 1).transpose()
X2.shape
(1, 100)
But now it declares a mismatch at .shape[1] instead of .shape[0]
with pm.Model() as model:
    nl = pm.Normal('nl', shape = 2)
    det = pm.Deterministic('det', nl - X2)
ValueError: Input dimension mis-match. (input[0].shape[1] = 2, input[1].shape[1] = 100)
I can make this work if I loop over the elements of the distribution
distShape = 2
with pm.Model() as model:
    nl = pm.Normal('nl', shape = distShape)
    det = {}
    for i in range(distShape):
        det[i] = pm.Deterministic('det' + str(i), nl[i] - X)
det
{0: det0, 1: det1}
However, this feels inelegant and constrains me to use loops for the rest of the model. I was wondering if there is a way to specify this operation so that it works the same as it does between two distributions:
distShape = 2
with pm.Model() as model:
    nl0 = pm.Normal('nl1', shape = distShape)
    nl1 = pm.Normal('nl2', shape = 1)
    det = pm.Deterministic('det', nl0 - nl1)
You can do
X = np.random.randint(low = -10, high = 10, size = 100)
X = X[:,None] # or X.reshape(-1, 1)
and then
with pm.Model() as model:
    nl = pm.Normal('nl', shape = 2)
    det = pm.Deterministic('det', nl - X)
In this case the shapes of nl and X will be (2,) and (100, 1), respectively, which broadcast together to (100, 2).
Notice that we get the same behavior with two NumPy arrays (not only with one Theano tensor and one NumPy array):
a0 = np.array([1,2])
b0 = np.array([1,2,3,5])
a0 = a0[:,None] # comment/uncomment this line
print(a0.shape, b0.shape)
b0-a0
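With the reshape line active, a0 has shape (2, 1) and b0 - a0 broadcasts to shape (2, 4); with it commented out, the subtraction raises ValueError: operands could not be broadcast together with shapes (4,) (2,).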
Related
I am trying to convert a tensor of random values from 0 to 255 into a histogram and apply a smoothing filter to the histogram.
When I append the result of each filter operation to a new tensor, I get an error about shapes.
TensorFlow version: 2.0.0
x = tf.random.uniform(shape=[32,32], minval=0, maxval=255, dtype=tf.float32)
x = tf.reshape(x, [1024])
print("x",x)
#H = get2dHistogram(x, y, value_range=[[0.0,1.0], [0.0,1.0]], nbins=100, dtype=tf.dtypes.int32)
H = tf.histogram_fixed_width(x, value_range=[0, 255], nbins=256)
H = tf.cast(H, tf.float32)
print(H)
print("shape: ",np.shape(H))
filter_size = 7
zero_n = int(filter_size/2)
zeros = tf.constant([0.0]*zero_n)
print(zeros)
new = tf.concat([zeros, H], 0)
print(new)
print("shape: ",np.shape(new))
new = tf.concat([new, zeros], 0)
print(new)
print("shape: ",np.shape(new))
filter_size = 7
filter_list = []
for i in range(filter_size): # make filter array
    filter_list.append(float(1/filter_size))
filter_array = np.array(filter_list, dtype = np.float32)
filter_array_tf = tf.constant(filter_array, dtype=tf.float32)
print("filter_array_tf:", filter_array_tf)
sm_hist = []
sm_hist = np.array(sm_hist, dtype=np.float32)
sm_hist_tf = tf.constant(sm_hist, dtype=tf.float32)
for i in range(0, 256):
    alist = new[i:i+filter_size]
    alist = tf.multiply(alist, filter_array_tf)
    alist = tf.reduce_sum(alist)
    print("alist:", alist)
    print("sm_hist_tf:", sm_hist_tf)
    sm_hist_tf = tf.concat([sm_hist_tf, alist], 0)
print(sm_hist_tf)
The error that I get:
InvalidArgumentError: ConcatOp : Ranks of all input tensors should match: shape[0] = [0] vs. shape[1] = [] [Op:ConcatV2] name: concat
Change the last line of your for loop to:
sm_hist_tf = tf.concat([sm_hist_tf, tf.expand_dims(alist, 0)], 0)
tf.reduce_sum collapses alist to a rank-0 (scalar) tensor, while sm_hist_tf has rank 1, and tf.concat requires all of its inputs to have the same rank; tf.expand_dims gives the scalar the extra dimension it needs.
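To see why this is needed, here is a minimal sketch of the rank mismatch (the tensors are illustrative, not taken from the original code):
import tensorflow as tf
vec = tf.constant([1.0, 2.0])   # rank 1, shape (2,)
scalar = tf.reduce_sum(vec)     # rank 0, shape ()
# tf.concat([vec, scalar], 0) would raise the same InvalidArgumentError
fixed = tf.concat([vec, tf.expand_dims(scalar, 0)], 0)  # shape (3,)
print(fixed.numpy())  # [1. 2. 3.]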
First things first: I'm relatively new to TensorFlow.
I'm trying to implement a custom layer in tensorflow.keras and I'm having a relatively hard time when I try to achieve the following:
I've got 3 Tensors (x,y,z) of shape (?,49,3,3,32) [where ? is the batch size]
On each Tensor I compute the sum over the 3rd and 4th axes [thus I end up with 3 Tensors of shape (?,49,32)]
By doing an argmax (A) on the above 3 Tensors (?,49,32) I get a single (?,49,32) Tensor
Now I want to use this tensor to select slices from the initial x,y,z Tensors in the following form:
Each element in the last dimension of A indicates which Tensor is selected
(aka: 0 = X, 1 = Y, 2 = Z)
The position along the last dimension of A indicates which slice to take from the last dimension of that Tensor.
I've tried to achieve the above using tf.gather, but I had no luck. Then I tried using a series of tf.map_fn, which is ugly and computationally costly.
To simplify the above: let's say we've got an array A of shape (3,3,3,32). Then the numpy equivalent of what I try to achieve is this:
import numpy as np
x = np.random.rand(3,3,32)
y = np.random.rand(3,3,32)
z = np.random.rand(3,3,32)
x_sums = np.sum(np.sum(x, axis=0), 0)
y_sums = np.sum(np.sum(y, axis=0), 0)
z_sums = np.sum(np.sum(z, axis=0), 0)
max_sums = np.argmax([x_sums,y_sums,z_sums],0)
A = np.array([x,y,z])
tmp = []
for i in range(0, len(max_sums)):
    tmp.append(A[max_sums[i], :, :, i])
output = np.transpose(np.stack(tmp))
Any suggestions?
PS: I tried tf.gather_nd but had no luck.
This is how you can do something like that with tf.gather_nd:
import tensorflow as tf
# Make example data
tf.random.set_seed(0)
b = 10 # Batch size
x = tf.random.uniform((b, 49, 3, 3, 32))
y = tf.random.uniform((b, 49, 3, 3, 32))
z = tf.random.uniform((b, 49, 3, 3, 32))
# Stack tensors together
data = tf.stack([x, y, z], axis=2)
# Put reduction axes last
data_t = tf.transpose(data, (0, 1, 5, 2, 3, 4))
# Reduce
s = tf.reduce_sum(data_t, axis=(4, 5))
# Find largest sums
idx = tf.argmax(s, 3)
# Make gather indices
data_shape = tf.shape(data_t, out_type=idx.dtype)
bb, ii, jj = tf.meshgrid(*(tf.range(data_shape[i]) for i in range(3)), indexing='ij')
# Gather result
output_t = tf.gather_nd(data_t, tf.stack([bb, ii, jj, idx], axis=-1))
# Reorder axes
output = tf.transpose(output_t, (0, 1, 3, 4, 2))
print(output.shape)
# TensorShape([10, 49, 3, 3, 32])
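To see what the stacked indices are doing, here is a tiny sketch of the same tf.gather_nd pattern (the shapes and values are illustrative only):
import tensorflow as tf
t = tf.reshape(tf.range(8), (2, 2, 2))
# For each (i, j), pick the element along the last axis chosen by idx[i, j]
idx = tf.constant([[0, 1], [1, 0]], dtype=tf.int64)
ii, jj = tf.meshgrid(tf.range(2, dtype=tf.int64), tf.range(2, dtype=tf.int64), indexing='ij')
picked = tf.gather_nd(t, tf.stack([ii, jj, idx], axis=-1))
print(picked.numpy())  # [[0 3] [5 6]]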
I want to vectorize the following code:
def style_noise(self, y, style):
n = torch.randn(y.shape)
for i in range(n.shape[0]):
n[i] = (n[i] - n.mean(dim=(1, 2, 3))[i]) * style.std(dim=(1, 2, 3))[i] / n.std(dim=(1, 2, 3))[i] + style.mean(dim=(1, 2, 3))[i]
noise = Variable(n, requires_grad=False).to(y.device)
return noise
I didn't find a nice way of doing so.
y and style are 4d tensors, say style.shape = y.shape = [64, 3, 128, 128].
I want to return the noise tensor, noise.shape = [64, 3, 128, 128].
Please let me know in the comments if the question is not clear.
Your use case is exactly why the .mean and .std methods come with a keepdim parameter. You can make use of this to enable broadcasting semantics to vectorize things for you:
def style_noise(self, y, style):
    n = torch.randn(y.shape)
    n_mean = n.mean(dim=(1, 2, 3), keepdim=True)
    n_std = n.std(dim=(1, 2, 3), keepdim=True)
    style_mean = style.mean(dim=(1, 2, 3), keepdim=True)
    style_std = style.std(dim=(1, 2, 3), keepdim=True)
    n = (n - n_mean) * style_std / n_std + style_mean
    noise = Variable(n, requires_grad=False).to(y.device)
    return noise
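A quick usage sketch for the vectorized version (since self is unused, None is passed for it here; the shapes are the ones from the question):
import torch
from torch.autograd import Variable  # required by style_noise above
y = torch.randn(64, 3, 128, 128)
style = torch.randn(64, 3, 128, 128)
noise = style_noise(None, y, style)
print(noise.shape)  # torch.Size([64, 3, 128, 128])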
To calculate the mean and std over the whole tensor, pass no arguments:
import torch
t = torch.rand(2, 2, 2)  # example tensor
m = t.mean(); print(m)  # mean over the whole tensor
s = t.std(); print(s)   # std over the whole tensor
Then, if your shape is (2, 2, 2) for instance, you can create tensors of that shape for broadcasting the subtraction and division:
ss = torch.empty(2,2,2).fill_(s)
print(ss)
mm = torch.empty(2,2,2).fill_(m)
print(mm)
At the moment, keepdim does not work as expected when you don't also set dim:
m = t.mean(); print(m)  # for the whole tensor
s = t.std(); print(s)   # for the whole tensor
m = t.mean(dim=0); print(m)  # mean along axis 0
s = t.std(dim=0); print(s)   # std along axis 0
m = t.mean(dim=1); print(m)  # mean along axis 1
s = t.std(dim=1); print(s)   # std along axis 1
m = t.mean(keepdim=True); print(m)  # will not work
s = t.std(keepdim=True); print(s)   # will not work
If you pass dim as a tuple, it will return the mean over just those axes, not over the whole tensor.
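For instance, a small sketch combining a dim tuple with keepdim (example shapes only):
import torch
t = torch.rand(2, 2, 2)
m = t.mean(dim=(1, 2), keepdim=True)
print(m.shape)        # torch.Size([2, 1, 1]), ready to broadcast against t
print((t - m).shape)  # torch.Size([2, 2, 2])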
In Tensorflow, I'm trying to create the following matrix:
A = [[a, 0], [0,b]]
Where a and b are the parameters I'm trying to solve for.
Here's what I have so far:
a = tf.Variable((1,), name="a", dtype = tf.float64)
b = tf.Variable((1,), name="b", dtype = tf.float64)
const = tf.constant(0,dtype = tf.float64, shape = (1,))
A0 = tf.transpose(tf.stack([a,const]))
A1 = tf.transpose(tf.stack([const,b]))
A = tf.stack([A0,A1])
However, the shape of A ends up being (2, 1, 2), which is wrong (since A0 and A1 both have shape (1, 2)).
Is there an easier way to create the matrix object A in Tensorflow, or does anyone know why the shape is getting messed up with what I'm doing?
Well you can create a single variable vector params = tf.Variable((2,), name="ab") and then multiply with the identity matrix tf.eye(2):
A = tf.matmul(tf.expand_dims(params,0), tf.eye(2))
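As an aside (not part of the original answer), tf.linalg.diag builds the same diagonal matrix directly from the parameter vector:
A = tf.linalg.diag(params)  # [[a, 0], [0, b]]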
tf.stack increases the rank of the tensor (it creates a new axis) and combines the inputs along that new axis. If you want to combine tensors along an existing axis, you should use tf.concat.
a = tf.Variable((1,), name="a", dtype = tf.float64)
b = tf.Variable((1,), name="b", dtype = tf.float64)
const = tf.constant(0,dtype = tf.float64, shape = (1,))
A0 = tf.stack([a, const], axis=1)
A1 = tf.stack([const, b], axis=1) # more clear than tf.transpose
A = tf.concat((A0, A1), axis=0)
A is now shape (2, 2).
To explain, each object is a rank-1 tensor with one element:
a = [1]
const = [0]
Stacking gives:
tf.stack((a, const), axis=0) = [[1], [0]]  # 2x1 matrix
Concatenating gives:
tf.concat((a, const), axis=0) = [1, 0]  # vector of length 2
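A runnable sketch of the difference, using the values above:
import tensorflow as tf
a = tf.constant([1.0])
const = tf.constant([0.0])
print(tf.stack([a, const], axis=0).shape)   # (2, 1): a new axis is added
print(tf.concat([a, const], axis=0).shape)  # (2,): the existing axis is extended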
In short: I am looking for a simple NumPy (maybe a one-liner) implementation of MaxPool: the maximum over a window on a numpy.ndarray, for all locations of the window across the dimensions.
In more detail: I am implementing a convolutional neural network ("CNN"); one of the typical layers in such a network is a MaxPool layer (look for example here). Writing
y = MaxPool(x, S), where x is an input ndarray and S is a parameter, the output of the MaxPool is given in pseudocode by:
y[b, h, w, c] = max(x[b, h + i, w + j, c]) over i = 0, ..., S-1; j = 0, ..., S-1.
That is, y is an ndarray whose value at indexes (b, h, w, c) equals the maximum taken over an S x S window along the second and third dimensions of the input x, with the window's "corner" placed at indexes (h, w).
Some additional details: the network is implemented using NumPy. The CNN has many "layers", where the output of one layer is the input to the next. The inputs to the layers are numpy.ndarrays called "tensors". In my case the tensors are 4-dimensional numpy.ndarrays, x; that is, x.shape is a tuple (B, H, W, C). The sizes of the dimensions change after a tensor is processed by a layer; for example, the input to layer i = 4 can have size B = 10, H = 24, W = 24, C = 3, while the output, i.e. the input to layer i + 1, has B = 10, H = 12, W = 12, C = 5. As indicated in the comments, the size after application of MaxPool is (B, H - S + 1, W - S + 1, C).
For concreteness: if I use
import numpy as np
y = np.amax(x, axis = (1,2))
where x.shape is, say, (2, 3, 3, 4), this gives me what I want, but only for the degenerate case where the window I am maximizing over is of size 3 x 3, i.e. the full extent of the second and third dimensions of x, which is not exactly what I want.
Here's a solution using np.lib.stride_tricks.as_strided to create sliding windows, resulting in a 6D array of shape (B, H-S+1, W-S+1, S, S, C), and then simply performing max along the fourth and fifth axes, resulting in an output array of shape (B, H-S+1, W-S+1, C). The intermediate 6D array is a view into the input array and as such won't occupy any more memory. The subsequent max, being a reduction, efficiently utilizes the sliding views.
Thus, an implementation would be -
# Based on http://stackoverflow.com/a/41850409/3293881
def patchify(img, patch_shape):
    a, X, Y, b = img.shape
    x, y = patch_shape
    shape = (a, X - x + 1, Y - y + 1, x, y, b)
    a_str, X_str, Y_str, b_str = img.strides
    strides = (a_str, X_str, Y_str, X_str, Y_str, b_str)
    return np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)
out = patchify(x, (S,S)).max(axis=(3,4))
Sample run -
In [224]: x = np.random.randint(0,9,(10,24,24,3))
In [225]: S = 5
In [226]: np.may_share_memory(patchify(x, (S,S)), x)
Out[226]: True
In [227]: patchify(x, (S,S)).shape
Out[227]: (10, 20, 20, 5, 5, 3)
In [228]: patchify(x, (S,S)).max(axis=(3,4)).shape
Out[228]: (10, 20, 20, 3)
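As a sanity check, the strided result can be compared against a naive loop implementation of the stride-1 MaxPool defined in the question (a sketch for verification only):
def maxpool_naive(x, S):
    B, H, W, C = x.shape
    y = np.empty((B, H - S + 1, W - S + 1, C), dtype=x.dtype)
    for h in range(H - S + 1):
        for w in range(W - S + 1):
            y[:, h, w, :] = x[:, h:h+S, w:w+S, :].max(axis=(1, 2))
    return y

np.array_equal(maxpool_naive(x, S), patchify(x, (S, S)).max(axis=(3, 4)))  # True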