Is it somehow possible in Keras (neural network library) to do a multiplication with a fixed / given numpy array?
I would like to multiply the output of a 2D convolution by a matrix. I tried Backend.dot, but it does not seem to work (I always get errors such as "numpy.ndarray object has no attribute get_shape").
Thank you
-edit-
I found a solution (yesterday already; thanks for the comments anyway). I wanted to convert a 2D spectrogram into a 2D mel spectrogram inside the network (not trainable). The 2D spectrogram can be seen as a large matrix, so the mel spectrogram can be computed with a matrix multiplication.
Long story short: I used a TimeDistributed Dense layer without a bias. I just load the weights (the constant matrix) and that's it:
# Change time / frequency axis (required for TimeDistributed)
nw = input_layer
nw = Permute((2, 1))(nw)
# Create a layer that computes the Mel spectrogram
mel_basis = librosa.filters.mel(sr=22050, n_fft=N_FFT)[:, :spectrogram_freqs] # The spectrogram has only spectrogram_freqs frequencies
# Keras 2 and later spell the argument use_bias; Keras 1 used bias=False
mel_layer = TimeDistributed(Dense(mel_basis.shape[0], use_bias=False, trainable=False))
nw = mel_layer(nw)
mel_layer.set_weights([np.transpose(mel_basis)])
# Change the time / frequency axis
nw = Permute((2, 1))(nw)
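As a sanity check of the idea, here is a minimal NumPy sketch (not Keras code) of what the layers above compute. It assumes a random stand-in for mel_basis and a small made-up spectrogram; the names n_mels, n_freqs and n_frames are illustrative only. A bias-free Dense layer applied per time step multiplies each frame by its weight matrix, so the permute / dense / permute chain should match mel_basis @ spectrogram.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mels, n_freqs, n_frames = 4, 9, 6            # illustrative sizes only

mel_basis = rng.random((n_mels, n_freqs))      # stand-in for librosa.filters.mel(...)
spectrogram = rng.random((n_freqs, n_frames))  # (frequency, time)

# Permute((2, 1)): put time first so each row is one frame
frames = spectrogram.T                         # (time, frequency)

# Bias-free Dense per time step, with kernel = mel_basis transposed
dense_kernel = mel_basis.T                     # (n_freqs, n_mels)
mel_frames = frames @ dense_kernel             # (time, mel)

# Permute((2, 1)) again: back to (mel, time)
mel_spectrogram = mel_frames.T

# Compare with a direct matrix multiplication
direct = mel_basis @ spectrogram
print(np.allclose(mel_spectrogram, direct))
```

This is also why the weights are loaded as np.transpose(mel_basis): the Dense kernel has shape (inputs, outputs), which is the transpose of the (n_mels, n_freqs) filter bank.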
Regarding the answer posted here, I run into problems when I use the equations to obtain the parameters of a transposed convolution. For example, I have a tensor of size [16, 256, 16, 160, 160] and I want to upsample it to [16, 256, 16, 224, 224]. Using the transposed-convolution equation for the height, when I select stride=2 and solve for k (the kernel size), I get the following equation, which gives a large negative kernel size:
224 = (160 - 1) x 2 + 1 x (k - 1) + 1
What is wrong with my calculations, and how can I find the parameters?
I don't think you applied the formula incorrectly. I think the main issue is that the input and output dimensions you want are not possible with stride=2.
Transposed or dilated convolutions scale the output really quickly. Say, for example, you were using these parameters for your transposed convolution (I'm simplifying the values to 1D here just to make the calculations clear):
Input Size = 160
Stride = 2
Kernel = 1
Padding = 0
Output Padding = 0
Now we apply the formula from the official docs for calculating output shape:
H_out =(H_in − 1)×stride[0]−2×padding[0]+dilation[0]×(kernel_size[0]−1)+output_padding[0]+1
OR we can simplify the formula a bit:
Output_Size = ((Input_Size - 1) * Stride) - (2 * Padding) + Filter_Size + Output_Padding
Here, Filter_Size = dilation_factor * (kernel_size - 1) + 1, to make the formula seem less scary.
Now let's take our example and plug the values in to see what transposed output size we get with stride=2 and the smallest possible kernel size, that is, kernel=1:
Output_Size = ((160 - 1) * 2) - (2 * 0) + (1 * (1 - 1) + 1) + 0
Output_Size = 318 - 0 + 1 + 0
Output_Size = 319
So, with the stride you want and padding 0 as above, the output size will be at least 319, while you want 224; hence the negative kernel_size.
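The arithmetic can be checked with a small helper that evaluates the official output-shape formula quoted above, including its trailing +1. This is only a sketch: conv_transpose_out is a hypothetical name, not a PyTorch function, and the stride-1 case is shown purely to illustrate solving the formula for k, not as a recommended layer.

```python
def conv_transpose_out(size_in, kernel, stride=1, padding=0,
                       output_padding=0, dilation=1):
    """Output length along one dimension, per the formula quoted above."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

# Smallest kernel with stride 2 and no padding already overshoots 224
print(conv_transpose_out(160, kernel=1, stride=2))   # 319

# With stride 1 the target is reachable: 159 + (k - 1) + 1 = 224 gives k = 65
print(conv_transpose_out(160, kernel=65, stride=1))  # 224
```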
I hope that answers your question.
Ref Links to understand Transposed Convolution calculations better with an example:
Paperspace: Transpose Convolution Explained for Up-Sampling Images
Calculating the Output Size of Convolutions and Transpose Convolutions
There is no good constructive answer to this question.
Being in some sense the inverse of conv2d, which downsamples an image by a factor of stride, transposed_conv2d upsamples by a factor of stride. One cannot use it for arbitrary resizing and get uniformly good results; torchvision.transforms.Resize or adaptive pooling exist for that.
torchvision.transforms.Resize is the default choice. It is simple and flexible, and it accepts either a PIL image or a torch.Tensor: use the former if input sizes vary dynamically, and the latter if not.
Adaptive pooling, usually AdaptiveAvgPool2d, is more sophisticated and is meant to be part of the architecture. Inserted at the beginning of a network, it works as a (batched) image resize. There is no magic: it is usually implemented on the CPU, and one will have a hard time implementing it on tensor hardware. In embedded solutions it is typical to have a dedicated image processor for such work.
Well, you could still formally solve the task with transposed_conv2d by playing with the padding, but that would just cut off part of the image, probably losing information, or insert a lot of useless spacing.
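To make the adaptive pooling idea concrete, here is a rough 1D NumPy sketch that maps 160 samples to 224. The window rule (floor and ceil of proportional boundaries) is my understanding of how adaptive average pooling picks its windows; treat it as an assumption, not as a reference implementation of AdaptiveAvgPool2d. The name adaptive_avg_pool_1d is hypothetical.

```python
import numpy as np

def adaptive_avg_pool_1d(x, out_size):
    """Average x over out_size windows that together cover the whole input."""
    in_size = len(x)
    out = np.empty(out_size, dtype=float)
    for i in range(out_size):
        start = (i * in_size) // out_size            # floor(i * in / out)
        end = -(-(i + 1) * in_size // out_size)      # ceil((i + 1) * in / out)
        out[i] = x[start:end].mean()
    return out

row = np.arange(160, dtype=float)        # one row of a 160-pixel-wide image
resized = adaptive_avg_pool_1d(row, 224)
print(resized.shape)                     # (224,)
```

Unlike a transposed convolution, the output size here is a free parameter, which is why this kind of operation suits arbitrary resizing.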
I have been struggling with this for quite some time. All I want is a torch.diff() function. However, many matrix operations do not appear to be easily compatible with tensor operations.
I have tried an enormous amount of various pytorch operation combinations, yet none of them work.
Due to the fact that pytorch hasn't implemented this basic feature, I started by simply trying to subtract the element i+1 from element i along a specific axis.
However, you can't simply do this element-wise (due to the tensor limitations), so I tried to construct another tensor, with the elements shifted along one axis:
ix_plus_one = [0]+list(range(0,prediction.size(1)-1))
ix_differential_tensor = torch.LongTensor(ix_plus_one)
diff_one_tensor = prediction[:,ix_differential_tensor]
But now we have a different problem: indexing in pytorch doesn't really mimic numpy as advertised, so you can't index with a "list-like" tensor like this. I also tried using the tensor scatter functions.
So I'm still stuck with this simple problem of trying to get a gradient on a pytorch tensor.
All of my searching leads to the marvelous capabilities of pytorch's "autograd" function, which has nothing to do with this problem.
A 1D convolution with a fixed filter should do the trick:
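import numpy as np
import torch
# padding=1 pads both ends, so the output has one more element than the input
diff_filter = torch.nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=1, padding=1, groups=1, bias=False)
kernel = np.array([-1.0, 1.0], dtype=np.float32) # float32 to match Conv1d's default dtype
kernel = torch.from_numpy(kernel).view(1, 1, 2)
diff_filter.weight.data = kernel
diff_filter.weight.requires_grad = False
Then use diff_filter like you would any other layer in torch.nn (the input needs the shape (batch, 1, length)).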
Also, you might want to change padding to suit your specific needs.
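To see why the kernel [-1, 1] yields a first difference, here is a small NumPy sketch that uses np.correlate as a stand-in for the convolution layer. Conv1d computes a cross-correlation, so the kernel is not flipped. With no padding (mode="valid" here) the result should match np.diff exactly; padding, as far as I can tell, only adds extra boundary elements.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
kernel = np.array([-1.0, 1.0])

# Cross-correlation without padding: out[i] = -x[i] + x[i + 1]
filtered = np.correlate(x, kernel, mode="valid")
print(filtered)      # [3. 5. 7. 9.]
print(np.diff(x))    # [3. 5. 7. 9.]
```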
There appears to be a simpler solution to this (I needed something similar), referenced here: https://discuss.pytorch.org/t/equivalent-function-like-numpy-diff-in-pytorch/35327/2
diff = x[1:] - x[:-1]
which can be done along different dimensions such as
diff = polygon[:, 1:] - polygon[:, :-1]
I would recommend writing a unit test that verifies identical behavior though.
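Such a test could look roughly like the sketch below. It uses NumPy arrays as a stand-in for tensors (the slicing syntax is the same) and np.diff as the reference; the function name and the array shape are arbitrary choices.

```python
import numpy as np

def test_slicing_matches_diff():
    rng = np.random.default_rng(0)
    polygon = rng.random((5, 7))

    # Difference along the first dimension
    assert np.allclose(polygon[1:] - polygon[:-1], np.diff(polygon, axis=0))
    # Difference along the second dimension
    assert np.allclose(polygon[:, 1:] - polygon[:, :-1], np.diff(polygon, axis=1))

test_slicing_matches_diff()
```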
For all those running into the question after March 2021
As of torch 1.8 there's torch.diff that works exactly as expected by the OP
I'm trying to write a function that performs Convolution, and I'm getting a little challenged trying to create the output volume using numpy. Specifically, I have an input image that is represented as an array of dimensions (150,150,3). Now, I want to convolve over this image with a set of kernels num_kernels, which are arrays of dimension (4,4,3), and I want these kernels to move over the image with a stride of 2. My thought process has been:
(1) I'll create an output array which is comprised of taking (4,4,3) size chunks out of the input array and stretching these out into rows, and ultimately making a large matrix of these.
(2) Then, I'll create a parameter array composed of all of my (4,4,3) kernels stretched out into rows, which will also make a large matrix.
(3) Then I can dot product these matrices together and reshape the output matrix into the proper dimensions.
My rough pseudo-code start to number (1) is as follows.
def Convolution(input, filter_size, num_filters, stride):
    X = input
    output_Volume = np.zeros(#dimensions)
    weights = np.zeros(#dimensions)
    #get weights from other function
    for width in range(0,150,2):
        for height in range(0,150,2):
            row = X(#indexes here to take out chunk).flatten
            output_Volume.append(row) #something of this sort
    return #dot product output volume and weights
If someone could provide a specific code example of how to implement this (most helpful would be answers to (1) and (2)) in Python (I'm using numpy), it would be much appreciated. Thank you!
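One way the three steps could be sketched in NumPy is shown below (this approach is often called im2col). It assumes no padding and no bias, uses random data in place of the real image and weights, and picks 8 kernels arbitrarily; im2col and convolve are illustrative names. Like most deep learning libraries it computes a cross-correlation, meaning the kernels are not flipped.

```python
import numpy as np

def im2col(image, k, stride):
    """Stack every k x k x C patch of `image` as one row of a matrix."""
    h, w, c = image.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    rows = np.empty((out_h * out_w, k * k * c), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + k, left:left + k, :]
            rows[i * out_w + j] = patch.ravel()
    return rows, out_h, out_w

def convolve(image, kernels, stride):
    """kernels has shape (num_kernels, k, k, C); no padding, no bias."""
    num_kernels, k = kernels.shape[0], kernels.shape[1]
    patches, out_h, out_w = im2col(image, k, stride)     # step (1)
    weights = kernels.reshape(num_kernels, -1)           # step (2)
    out = patches @ weights.T                            # step (3)
    return out.reshape(out_h, out_w, num_kernels)

rng = np.random.default_rng(0)
image = rng.random((150, 150, 3))
kernels = rng.random((8, 4, 4, 3))        # 8 kernels, chosen arbitrarily
output = convolve(image, kernels, stride=2)
print(output.shape)                       # (74, 74, 8)
```

The spatial output size is (150 - 4) // 2 + 1 = 74, so the patch loop must stop at offset 146 rather than run over range(0, 150, 2).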
The tutorial on MNIST for ML Beginners, in Implementing the Regression, shows how to make the regression on a single line, followed by an explanation that mentions the use of a trick (emphasis mine):
y = tf.nn.softmax(tf.matmul(x, W) + b)
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs.
What is the trick here, and why are we using it?
Well, there's no trick here. That line simply refers to the multiplication order used in the earlier equation:
# With this order of W and x, the equation holds for a single example
y = Wx + b
# For a batch of examples (one per row of x), swap the order of multiplication instead of adding another transpose op
y = xW + b
# hence
y = tf.matmul(x, W) + b
Ok, I think the main point is that if you train in batches (i.e. train with several instances of the training set at once), TensorFlow always assumes that the zeroth dimension of x indicates the number of events per batch.
Suppose you want to map a training instance of dimension M to a target instance of dimension N. You would typically do this by multiplying x (a column vector) with a NxM matrix (and, optionally, add a bias with dimension N (also a column vector)), i.e.
y = W*x + b, where y is also a column vector.
This is perfectly alright seen from the perspective of linear algebra. But now comes the point with the training in batches, i.e. training with several training instances at once.
To get to understand this, it might be helpful to not view x (and y) as vectors of dimension M (and N), but as matrices with the dimensions Mx1 (and Nx1 for y).
Since TensorFlow assumes that the different training instances constituting a batch are aligned along the zeroth dimension, we get into trouble here since the zeroth dimension is occupied by the different elements of one single instance.
The trick is then to transpose the above equation (remember that transposition of a product also switches the order of the two transposed objects):
y^T = x^T * W^T + b^T
This is pretty much what has been described in short within the tutorial.
Note that y^T is now a matrix of dimension 1xN (practically a row vector), while x^T is a matrix of dimension 1xM (also a row vector). W^T is a matrix of dimension MxN. In the tutorial, they did not write x^T or y^T, but simply defined the placeholders according to this transposed equation. The only point that is not clear to me is why they did not define b the "transposed way". I assume that the + operator automatically transposes b if it is necessary in order to get the correct dimensions.
The rest is now pretty easy: if you have batches larger than 1 instance, you just "stack" several of the x (1xM) matrices, say into a matrix of dimensions (AxM), where A is the batch size. b will hopefully be broadcast automatically across the batch (that is, to a matrix of dimension (AxN)). If you then use
y^T = x^T * W^T + b^T,
you will get a (AxN) matrix of the targets for each element of the batch.
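The equivalence of the two conventions can be checked numerically. The NumPy sketch below assumes small made-up dimensions and a 1-D bias of shape (N,), and compares the column-vector form applied to one instance at a time with the transposed form applied to the whole batch. With a 1-D bias, broadcasting adds b to every row, so no transpose of b is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, A = 5, 3, 4                      # input dim, output dim, batch size

W = rng.random((N, M))                 # column-vector convention: y = W x + b
b = rng.random(N)                      # 1-D bias, shape (N,)
batch = rng.random((A, M))             # one instance per row

# Column convention, one instance at a time
one_by_one = np.stack([W @ x + b for x in batch])   # (A, N)

# Transposed (row) convention, whole batch at once;
# b is broadcast across the A rows
all_at_once = batch @ W.T + b                        # (A, N)

print(np.allclose(one_by_one, all_at_once))
```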
In the Breaking Linear Classifiers on ImageNet blog post, the author presented a very simple example on how to modify an image to fool a classifier. The technique given is pretty simple: xad = x + 0.5w where x is the 1d vector and w is the 1d weight. This is all good and clear. However, I am trying to implement this with the MNIST dataset and got stuck, with no idea how to turn this simple idea into actual results. I'd like to know how to use the known w matrix to modify a given x matrix (or simply a flattened 1d image vector).
My image matrix x has the shape (1032, 784) (each image is a flattened vector of 784 numbers), and my weight matrix w has the shape (784, 10). So the question is: how do I implement the idea introduced in the article mentioned above? In particular, how do I add a bit of weight to all images? Something like this:
x + 0.5 * w
My code can be found on GitHub. Solution with numpy is preferred, but using TensorFlow would be fine as well. Thanks!
Figured out how:
So, if we're trying to create adversarial images to be falsely classified as "6", we need to grab the weights for "6" only from the weight matrix:
w_six = w[:, 6]
Then we can simply add it to every image (numpy broadcasts w_six, of shape (784,), across the rows of x):
images_fool = x + 1.5 * w_six
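A short NumPy sketch of why this works, assuming a linear classifier whose class scores are x @ w (plus a bias, which cancels out below). The data and weights are random stand-ins with the shapes from the question, and eps is a step size you would tune yourself. Adding eps * w_six raises the class-6 score of every image by eps times the squared norm of w_six.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1032, 784))            # stand-in for the flattened images
w = rng.normal(size=(784, 10))         # stand-in for the trained weights
eps = 0.5                              # step size, as in x + 0.5 * w

w_six = w[:, 6]                        # weights for class "6", shape (784,)
images_fool = x + eps * w_six          # broadcast over all 1032 rows

# The class-6 score of every image rises by eps * ||w_six||^2
gain = (images_fool @ w)[:, 6] - (x @ w)[:, 6]
print(np.allclose(gain, eps * w_six @ w_six))
```

The sketch does not clip the result to the valid pixel range, which a real image pipeline would probably need.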