I'm working on a machine learning project and I need to merge (concatenate) two tensors that have different shapes.
For more details:
We're trying to concatenate a matrix of tokens with a one-hot matrix. The tokens pass through an embedding layer, so we get a weights matrix with a shape like (100, 10, 300).
Finally, we need to merge the one-hot matrix and the weights matrix like this:
(100, 300) and (100, 10, 300) should become (100, 11, 300).
That is, each one-hot vector (1, 300) is prepended to the corresponding weight matrix (1, 10, 300), so each merged sample has shape (1, 11, 300).
I actually achieved this manually with a loop, but it takes too much time, so I wanted to know whether it's possible to do this with Keras or something similar.
This is the function I wrote; it does what I want, but a faster approach would be ideal:
import numpy as np
from tqdm import tqdm
from keras.preprocessing.sequence import pad_sequences

def join_demo_sentence(X, Demo, embedding, max_length):
    X = pad_sequences(X, maxlen=max_length, padding='post')
    Demo = pad_sequences(Demo, maxlen=300, padding='post')
    joined = []
    for i, sequence in tqdm(enumerate(X), desc='Joining'):
        demo = Demo[i]
        # Look up the embedding vectors for the token sequence: (max_length, 300)
        sequence = embedding.get_weights()[0][sequence]
        # Insert the one-hot vector as the first row: (max_length + 1, 300)
        join = np.insert(sequence.T, 0, demo, axis=1)
        joined.append(join.T)
    X = np.asarray(joined)
    return X
That function loops through the matrix to join the demographic one-hot values and the sentence tokens, so the final result is each sentence with its demographic one-hot vector in the first position.
I'm still learning Keras, so I think there may be a way to do this with keras.layers.Concatenate.
Is this what you need?
import tensorflow as tf

a = tf.random.uniform((100, 10, 300))  # embedded sequences
b = tf.random.uniform((100, 300))      # one-hot matrix
b = b[:, tf.newaxis, :]                # add the second axis: (100, 1, 300)
res = tf.concat((a, b), -2)            # (100, 11, 300); use tf.concat((b, a), -2) to put the one-hot row first
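If you prefer to express this with Keras layers (you mentioned keras.layers.Concatenate), here is a minimal sketch under the shapes above; the layer graph and names are illustrative, not your exact model:
import tensorflow as tf
from tensorflow.keras import layers

# Assumed per-sample shapes: embedded sequence (10, 300), one-hot vector (300,)
emb_in = layers.Input(shape=(10, 300))
onehot_in = layers.Input(shape=(300,))

onehot_3d = layers.Reshape((1, 300))(onehot_in)            # (batch, 1, 300)
merged = layers.Concatenate(axis=1)([onehot_3d, emb_in])   # (batch, 11, 300), one-hot row first

model = tf.keras.Model(inputs=[emb_in, onehot_in], outputs=merged)
print(model.output_shape)  # (None, 11, 300)
This keeps the one-hot vector in the first position of every sample, as your loop does.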
I have a Keras LSTM model that contains multiple outputs.
The model is defined as follows:
outputs = []
main_input = Input(shape=(seq_length, feature_cnt), name='main_input')
lstm = LSTM(32, return_sequences=True)(main_input)
for _ in range(output_branches):  # output_branches is the number of output branches of the model
    prediction = LSTM(8, return_sequences=False)(lstm)
    out = Dense(1)(prediction)
    outputs.append(out)
model = Model(inputs=main_input, outputs=outputs)
model.compile(optimizer='rmsprop', loss='mse')
I have a problem when reshaping the output data.
The code for reshaping the output data is:
y=y.reshape((len(y),output_branches,1))
I got the following error:
ValueError: Error when checking model target: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 5 array(s), but instead got the following list of 1
arrays: [array([[[0.29670931],
[0.16652206],
[0.25114482],
[0.36952324],
[0.09429612]],
[[0.16652206],
[0.25114482],
[0.36952324],
[0.09429612],...
How can I correctly reshape the output data?
It depends on how y is structured initially. Here I assume that y is a single-valued label for each sequence in the batch.
When there are multiple inputs/outputs, model.fit() expects a corresponding list of inputs/outputs. np.split(y, output_branches, axis=-1) in the following fully reproducible example does exactly this: for each batch it splits a single array of outputs into a list of separate outputs, where each output (in this case) is a 1-element array:
import tensorflow as tf
import numpy as np
tf.enable_eager_execution()
batch_size = 100
seq_length = 10
feature_cnt = 5
output_branches = 3
# Say we've got:
# - 100-element batch
# - of 10-element sequences
# - where each element of a sequence is a vector describing 5 features.
X = np.random.random_sample([batch_size, seq_length, feature_cnt])
# Every sequence of a batch is labelled with `output_branches` labels.
y = np.random.random_sample([batch_size, output_branches])
# Here y.shape == (100, 3)
# Here we split the last axis of y (output_branches) into `output_branches` separate lists.
y = np.split(y, output_branches, axis=-1)
# Here y is not a numpy matrix anymore, but a list of matrices.
# E.g. y[0].shape == (100, 1); y[1].shape == (100, 1), etc.
outputs = []
main_input = tf.keras.layers.Input(shape=(seq_length, feature_cnt), name='main_input')
lstm = tf.keras.layers.LSTM(32, return_sequences=True)(main_input)
for _ in range(output_branches):
    prediction = tf.keras.layers.LSTM(8, return_sequences=False)(lstm)
    out = tf.keras.layers.Dense(1)(prediction)
    outputs.append(out)
model = tf.keras.models.Model(inputs=main_input, outputs=outputs)
model.compile(optimizer='rmsprop', loss='mse')
model.fit(X, y)
You might need to play around with the axes, as you didn't specify exactly what your data looks like.
EDIT:
As the author is looking for an answer drawn from official sources, it's mentioned here (not explicitly though; it only describes what the Dataset should yield, and hence what kind of input structure model.fit() expects):
When calling fit with a Dataset object, it should yield either a tuple of lists like ([title_data, body_data, tags_data], [priority_targets, dept_targets]) or a tuple of dictionaries like ({'title': title_data, 'body': body_data, 'tags': tags_data}, {'priority': priority_targets, 'department': dept_targets}).
Since you have a number of outputs equal to output_branches, your output data must be a list with the same number of arrays.
Basically, if the output data is in the middle dimension as your reshape suggests:
y = [y[:, i] for i in range(output_branches)]
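A quick shape check of that option (a sketch assuming y has already been reshaped to (len(y), output_branches, 1) as above):
import numpy as np

output_branches = 5
y = np.random.random_sample((100, output_branches, 1))  # shape after your reshape
y_list = [y[:, i] for i in range(output_branches)]
print(len(y_list), y_list[0].shape)  # 5 (100, 1) -- one (100, 1) array per output head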
I have a bunch of images that are grouped into a tensor of the following shape:
> images.shape
produces (2000, 1440, 1, 16), which has the following meaning: (rows, cols, channels, images_count).
Now, for the sake of simplicity, I need to perform a weighted sum of those images that results in one image, i.e. (2000, 1440, 1).
Actually there are multiple groups of weights (over 128), which means that out of the 16 input images I get 128 merged images instead of just one; judging by the image size, this is a pretty heavy operation.
So I'm looking for ways/ideas to perform the operation quickly and efficiently, with a minimal number of temporaries and minimal memory consumption.
Are there any mechanisms in TF that would allow me to perform this operation efficiently and fast?
Thank you in advance!
Suppose for simplicity that you have 8 different groups of weights and the data is in the format you specified.
First we convert the images to a conventional batch-size-first form. Then we expand the dimensions of the images by adding a second axis, to support broadcasting when we do the element-wise multiplication between images and weights. Finally, we reduce the first dimension by computing the (already weighted) sum of images for each weight group.
import tensorflow as tf
import numpy as np
x = tf.placeholder(tf.float32, shape=(2000, 1440, 1, None))
w = tf.placeholder(tf.float32, shape=(None, 2000, 1440, 1))
xtransposed = tf.transpose(x, perm=[3, 0, 1, 2]) # n_samples first
xexpanded = tf.expand_dims(xtransposed, 1) # expand for broadcasting
multiplied = xexpanded * w
reduced = tf.reduce_sum(multiplied, axis=0) # weighted sum over all images
images = np.random.normal(size=(2000, 1440, 1, 16))
weights = np.random.normal(size=(8, 2000, 1440, 1))
with tf.Session() as sess:
    res = sess.run(reduced, feed_dict={x: images, w: weights})
    print(res.shape)  # (8, 2000, 1440, 1)
res now stores the weighted sums for the 8 different weight groups as a NumPy array.
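If each group is really just one scalar weight per input image (i.e. the weights have shape (n_groups, n_images) rather than full per-pixel weight maps; the question doesn't say, so this is an assumption), the whole reduction collapses into a single tf.einsum call, which avoids materializing the large broadcast intermediate:
import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32, shape=(2000, 1440, 1, 16))  # (rows, cols, channels, n_images)
w = tf.placeholder(tf.float32, shape=(128, 16))             # (n_groups, n_images) -- assumed layout

# For every group g, sum x[..., i] * w[g, i] over the image axis i.
merged = tf.einsum('rcki,gi->grck', x, w)

images = np.random.normal(size=(2000, 1440, 1, 16))
weights = np.random.normal(size=(128, 16))

with tf.Session() as sess:
    res = sess.run(merged, feed_dict={x: images, w: weights})
    print(res.shape)  # (128, 2000, 1440, 1)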
I have the following neural network in Keras:
inp = layers.Input((3,))
#Middle layers omitted
out_prop = layers.Dense(units=3, activation='softmax')(inp)
out_value = layers.Dense(units=1, activation = 'linear')(inp)
Then I prepared a pseudo-input to test my network:
inpu = np.array([[1,2,3],[4,5,6],[7,8,9]])
When I try to predict, this happens:
In [45]:nn.network.predict(inpu)
Out[45]:
[array([[0.257513 , 0.41672954, 0.32575747],
[0.20175152, 0.4763418 , 0.32190666],
[0.15986516, 0.53449154, 0.30564335]], dtype=float32),
array([[-0.24281949],
[-0.10461146],
[ 0.11201331]], dtype=float32)]
So, as you can see above, I wanted two outputs: one should have been an array of size 3, the other a single value. Instead, I get a 3x3 matrix and an array with 3 elements. What am I doing wrong?
You are passing three input samples to the network:
>>> inpu.shape
(3,3) # three samples of size 3
And you have two output layers: one of them outputs a vector of size 3 for each sample and the other outputs a vector of size one (i.e. scalar), again for each sample. As a result the output shapes would be (3, 3) and (3, 1).
Update: If you want your network to accept an input sample of shape (3,3) and output vectors of size 3 and 1, and you only want to use Dense layers in your network, then you must use a Flatten layer somewhere in the model. One possible option is to use it right after the input layer:
inp = layers.Input((3,3)) # don't forget to set the correct input shape
x = layers.Flatten()(inp)
# pass x to other Dense layers
Alternatively, you could flatten your data to have a shape of (num_samples, 9) and then pass it to your network without using a Flatten layer.
Update 2: As @Mete correctly pointed out in the comments, make sure the input array has a shape of (num_samples, 3, 3) if each input sample has a shape of (3,3).
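A minimal sketch of the Flatten option, assuming tf.keras and that each input sample is a 3x3 matrix:
import numpy as np
from tensorflow.keras import layers, models

inp = layers.Input((3, 3))            # each sample is a 3x3 matrix
x = layers.Flatten()(inp)             # (None, 9)
out_prop = layers.Dense(3, activation='softmax')(x)
out_value = layers.Dense(1, activation='linear')(x)
model = models.Model(inputs=inp, outputs=[out_prop, out_value])

# A batch containing one 3x3 sample -> shape (1, 3, 3)
sample = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=np.float32)
props, value = model.predict(sample)
print(props.shape, value.shape)       # (1, 3) (1, 1)
For the alternative mentioned above, reshaping the data itself with inpu.reshape(len(inpu), -1) and declaring Input((9,)) achieves the same thing without a Flatten layer.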
I am using dynamic_rnn to process MNIST data:
# LSTM Cell
lstm = rnn_cell.LSTMCell(num_units=200,
                         forget_bias=1.0,
                         initializer=tf.random_normal)
# Initial state
istate = lstm.zero_state(batch_size, "float")
# Get lstm cell output
output, states = rnn.dynamic_rnn(lstm, X, initial_state=istate)
# Output at last time point T
output_at_T = output[:, 27, :]
Full code: http://pastebin.com/bhf9MgMe
The input to the lstm is (batch_size, sequence_length, input_size)
As a result, the dimensions of output are (batch_size, sequence_length, num_units), where num_units=200.
I need to get the last output along the sequence_length dimension. In the code above, this is hardcoded as 27. However, I do not know the sequence_length in advance as it can change from batch to batch in my application.
I tried:
output_at_T = output[:, -1, :]
but it says negative indexing is not implemented yet, and I tried using a placeholder variable as well as a constant (into which I could ideally feed the sequence_length for a particular batch); neither worked.
Is there any way to implement something like this in TensorFlow at the moment?
Have you noticed that there are two outputs from dynamic_rnn?
Output 1, let's call it h, has all the outputs at each time step (i.e. h_1, h_2, etc.).
Output 2, final_state, has two elements: the cell_state, and the last output for each element of the batch (as long as you pass the sequence lengths to dynamic_rnn).
So from:
h, final_state = tf.nn.dynamic_rnn(..., sequence_length=sequence_lengths, ...)  # sequence_lengths: a vector of length batch_size
the last state for each element in the batch is:
final_state.h
Note that this includes the case when the length of the sequence is different for each element of the batch, as we are using the sequence_length argument.
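A minimal TF 1.x sketch of that (shapes and variable names are illustrative):
import numpy as np
import tensorflow as tf

# 2 sequences, max length 10, 8 features per step; the second sequence is only 6 steps long
X = tf.placeholder(tf.float32, shape=(None, 10, 8))
seq_len = tf.placeholder(tf.int32, shape=(None,))

cell = tf.nn.rnn_cell.LSTMCell(num_units=64)
h, final_state = tf.nn.dynamic_rnn(cell, X, sequence_length=seq_len, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    last_h = sess.run(final_state.h,
                      feed_dict={X: np.random.randn(2, 10, 8), seq_len: [10, 6]})
    print(last_h.shape)  # (2, 64): the last valid output of every sequence in the batch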
This is what gather_nd is for!
def extract_axis_1(data, ind):
    """
    Get specified elements along the first axis of tensor.
    :param data: Tensorflow tensor that will be subsetted.
    :param ind: Indices to take (one for each element along axis 0 of data).
    :return: Subsetted tensor.
    """
    batch_range = tf.range(tf.shape(data)[0])
    indices = tf.stack([batch_range, ind], axis=1)
    res = tf.gather_nd(data, indices)
    return res
In your case (assuming sequence_length is a 1-D tensor with the length of each axis 0 element):
output = extract_axis_1(output, sequence_length - 1)
Now output is a tensor of dimension [batch_size, num_cells].
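A quick check of extract_axis_1 (a sketch with illustrative shapes, run in a TF 1.x session):
import numpy as np
import tensorflow as tf

data = tf.constant(np.arange(24, dtype=np.float32).reshape(2, 4, 3))  # (batch, time, units)
seq_len = tf.constant([4, 2])  # actual length of each sequence

last = extract_axis_1(data, seq_len - 1)

with tf.Session() as sess:
    print(sess.run(last).shape)  # (2, 3): the last valid output of each sequence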
output[:, -1, :]
works with TensorFlow 1.x now!
Most answers cover it thoroughly, but this code snippet might help you understand what's really being returned by the dynamic_rnn layer
=> Tuple of (outputs, final_output_state).
So for an input with a max sequence length of T time steps, outputs has shape [batch_size, T, num_units] (given time_major=False, the default) and contains the output state at each timestep: h1, h2, ..., hT.
And final_output_state contains the final cell state cT and the final output state hT of each batch sequence, each of shape [batch_size, num_units].
But since dynamic_rnn is being used, my guess is that your sequence lengths vary from sequence to sequence within a batch.
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
tf.reset_default_graph()
# Create input data
X = np.random.randn(2, 10, 8)
# The second example is of length 6
X[1,6:] = 0
X_lengths = [10, 6]
cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
outputs, states = tf.nn.dynamic_rnn(cell=cell,
                                    dtype=tf.float64,
                                    sequence_length=X_lengths,
                                    inputs=X)
result = tf.contrib.learn.run_n({"outputs": outputs, "states": states},
                                n=1,
                                feed_dict=None)
assert result[0]["outputs"].shape == (2, 10, 64)
print(result[0]["outputs"].shape)
print(result[0]["states"].h.shape)
# the final outputs state and states returned must be equal for each
# sequence
assert (result[0]["outputs"][0][-1] == result[0]["states"].h[0]).all()
assert (result[0]["outputs"][-1][5] == result[0]["states"].h[-1]).all()
assert (result[0]["outputs"][-1][-1] == result[0]["states"].h[-1]).all()
The final assertion will fail, because the final state of the 2nd sequence is at the 6th time step, i.e. index 5, and the outputs from index 6 to 9 are all 0s for the 2nd sequence.
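In other words, for the second sequence (length 6) the correct comparison is against index 5; a quick check using result from above (assuming the same setup):
# The 2nd sequence has length 6, so its last valid output is at index 5:
assert (result[0]["outputs"][1][5] == result[0]["states"].h[1]).all()
# Everything after that index is zero padding:
assert (result[0]["outputs"][1][6:] == 0).all()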
I am new to Stack Overflow and cannot comment yet, so I am writing this as a new answer. @VM_AI, the last index is tf.shape(output)[1] - 1.
So, reusing your answer:
# Let's first fetch the last index of seq length
# last_index would have a scalar value
last_index = tf.shape(output)[1] - 1
# Then let's transpose the output to [sequence_length, batch_size, num_units]
# for convenience
output_rs = tf.transpose(output,[1,0,2])
# Last state of all batches
last_state = tf.nn.embedding_lookup(output_rs,last_index)
This works for me.
You should be able to access the shape of your output tensor using tf.shape(output). The tf.shape() function will return a 1-D tensor containing the sizes of the output tensor. In your example, this would be (batch_size, sequence_length, num_units).
You should then be able to extract the value of output_at_T as output[:, tf.shape(output)[1], :]
There is a function in TensorFlow, tf.shape, that lets you get the symbolic value of the shape rather than the None returned by output._shape[1]. After fetching the last index, you can look it up using tf.nn.embedding_lookup, which is recommended especially when the amount of data to be fetched is large, since it performs lookups in parallel (32 in parallel by default).
# Let's first fetch the last index of seq length
# last_index would have a scalar value
last_index = tf.shape(output)[1]
# Then let's transpose the output to [sequence_length, batch_size, num_units]
# for convenience
output_rs = tf.transpose(output,[1,0,2])
# Last state of all batches
last_state = tf.nn.embedding_lookup(output_rs,last_index)
This should work.
Just to clarify what @Benoit Steiner said: his solution would not work, as tf.shape returns a symbolic value of the shape, and such a value cannot be used for slicing tensors, i.e. direct indexing.