multiple image input training using dataset object - python
How to use a dataset object as input in the model.fit() training loop, for a model with multiple inputs?
Trying to pass the dataset itself gives me the following error:
Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'tensorflow.python.data.ops.dataset_ops.MapDataset'>"}), <class 'NoneType'>
My case here:
I have a multiple input model built with keras
The inputs are named 'First', 'Second' and 'Third'
I have an image dataset in keras-style:
main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg
I create the dataset object using tf.keras.utils.image_dataset_from_directory:
train_dataset = image_dataset_from_directory(train_dir,
                                             shuffle=False,
                                             label_mode='categorical',
                                             batch_size=hyperparameters["BATCH_SIZE"],
                                             image_size=IMG_SIZE)
Now, each image is divided into 3 parts, and each part serves as the input to one of the model's inputs. I take care of that using some map functions. This is not relevant to the problem, so I will not include it. I cannot use the cropping layers included in TF for unrelated reasons.
I then try to start the training loop:
history = model.fit([train_dataset1,
                     train_dataset2,
                     train_dataset3,
                     ],
                    epochs=epochs,
                    callbacks=callbacks,
                    validation_data=validation_dataset,
                    validation_steps=steps
                    )
And here is where I get the error.
I have tried some other approaches, like using a dict instead of a list.
The problem seems to be that when training a model with multiple inputs, the fit() loop expects data to come as a list for x-values and a list for y-values, but I haven't been able to split the dataset object into the required formats.
I have read many topics on this, but all of them use datasets created with the tf.data.Dataset.from_tensor_slices() method, which is not applicable in my case.
Additionally, there is no indication of how the validation dataset has to be structured (at least according to the model.fit() documentation).
I have found some guidance saying that the validation dataset must have the same number of inputs/outputs as the training dataset (which makes sense), but again, no indication of how to build or feed the validation dataset for a multiple-input model.
As I stated in a comment above, I managed to solve this issue, but there seems to be a bug in the way the Keras fit() training loop handles the input from a zipped dataset, so if you need to do this, you'll have to wait until it's fixed or write your own training loop.
How to approach this
I ended up solving this issue in the following way:
I created 3 separate dataset objects. Then to feed it to the model.fit() training loop, I used the tf.data.Dataset.zip() (as Almog David pointed out in the comments) method to create a single dataset containing the 3 separate datasets:
train_dataset = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
                                                                    shuffle=False,
                                                                    label_mode='categorical',
                                                                    batch_size=32,
                                                                    image_size=IMG_SIZE)
validation_dataset = tf.keras.preprocessing.image_dataset_from_directory(validation_dir,
                                                                         shuffle=False,
                                                                         label_mode='categorical',
                                                                         batch_size=32,
                                                                         image_size=IMG_SIZE)
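# tfimgcrop is not defined in the original post; it is assumed here to be an
# alias of tf.image.crop_to_bounding_box, whose argument names match the calls below.
tfimgcrop = tf.image.crop_to_bounding_box
def resize_data1(images, classes):
    # left 64x64 tile
    return (tfimgcrop(images,
                      offset_height=0,
                      offset_width=0,
                      target_height=64,
                      target_width=64),
            classes)
def resize_data2(images, classes):
    # middle 64x64 tile
    return (tfimgcrop(images,
                      offset_height=0,
                      offset_width=64,
                      target_height=64,
                      target_width=64),
            classes)
def resize_data3(images, classes):
    # right 64x64 tile
    return (tfimgcrop(images,
                      offset_height=0,
                      offset_width=128,
                      target_height=64,
                      target_width=64),
            classes)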
train_dataset_unb = train_dataset.unbatch()
train_dataset1 = train_dataset_unb.map(resize_data1)
train_dataset2 = train_dataset_unb.map(resize_data2)
train_dataset3 = train_dataset_unb.map(resize_data3)
train_dataset_zip = tf.data.Dataset.zip((train_dataset1, train_dataset2, train_dataset3))
validation_dataset_unb = validation_dataset.unbatch()
validation_dataset1 = validation_dataset_unb.map(resize_data1)
validation_dataset2 = validation_dataset_unb.map(resize_data2)
validation_dataset3 = validation_dataset_unb.map(resize_data3)
validation_dataset_zip = tf.data.Dataset.zip((validation_dataset1, validation_dataset2, validation_dataset3))
Validating the approach by testing what the zipped datasets return in each call:
Printing elements in a for() loop using the tf.data.Dataset.as_numpy_iterator() method:
for element in train_dataset_zip.as_numpy_iterator():
    print("element", element)
Outputs:
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]]
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]]
[...]
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]]
Printing elements in a for() loop using the Python built-in enumerate() function:
for idx, (ds1, ds2, ds3) in enumerate(train_dataset_zip):
    print("ds1: ", ds1)
    print("ds2: ", ds2)
    print("ds3: ", ds3)
Outputs:
ds1: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds1: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
[...]
ds1: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
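Note that each element printed above is a tuple of three (image, label) pairs. Keras's fit() generally expects a dataset to yield (inputs, targets) pairs, where the inputs of a multi-input model can be a tuple or a dict keyed by input-layer name. The following is a minimal sketch of that alternative structure, not the approach used above. It assumes 64x192 images, two classes, and inputs named 'First', 'Second' and 'Third'; the synthetic tensors stand in for the output of image_dataset_from_directory, and split_into_three is a hypothetical helper name.

```python
import tensorflow as tf

# Synthetic stand-in for image_dataset_from_directory output (assumed sizes):
# 8 images of 64x192 pixels with one-hot labels for 2 classes.
images = tf.random.uniform((8, 64, 192, 3))
labels = tf.one_hot([0, 1, 0, 1, 0, 1, 0, 1], depth=2)
batched = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)


def split_into_three(batch_images, batch_labels):
    """Hypothetical helper: crop three 64x64 tiles and key them by input name."""
    crops = {
        name: tf.image.crop_to_bounding_box(
            batch_images, offset_height=0, offset_width=offset,
            target_height=64, target_width=64)
        for name, offset in (("First", 0), ("Second", 64), ("Third", 128))
    }
    # One (inputs, targets) pair per element: a dict of inputs, a single label tensor.
    return crops, batch_labels


multi_input_dataset = batched.map(split_into_three)
first_inputs, first_labels = next(iter(multi_input_dataset))
input_shapes = {name: tuple(t.shape) for name, t in first_inputs.items()}
print(input_shapes, tuple(first_labels.shape))
```

A dataset with this structure can be passed directly as model.fit(multi_input_dataset, ...), and a validation dataset built with the same map function can be passed as validation_data. Whether this sidesteps the behaviour described above has not been verified here.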
Related
difference results of continuous call tf.keras.layers.DenseFeatures
Here is my code:
import tensorflow as tf
age_buck = tf.feature_column.numeric_column('col1')
age_buck_column = tf.feature_column.bucketized_column(age_buck, [2, 4, 6, 8, 10])
age_embedding = tf.feature_column.embedding_column(age_buck_column, dimension=3, trainable=False)
input_layer2 = tf.keras.layers.DenseFeatures([age_embedding])
input_layer2({"col1":[[2]]})
The problem is that when I run the following cell twice, I get different results:
input_layer2 = tf.keras.layers.DenseFeatures([age_embedding])
input_layer2({"col1":[[2]]})
Here are the results.
First result:
<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[-0.31282678, 0.6201095 , 0.98769945]], dtype=float32)>
Second result:
<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[ 0.7076622 , 0.62326956, -0.5604797 ]], dtype=float32)>
I am very confused about what happens when we call tf.keras.layers.DenseFeatures([age_embedding]).
How to convert ragged tensor to list of tensors inside a graph
I have a ragged tensor with a variable shape in the 2nd dimension (N x ? x 4), and I'd like to convert it to a list of tensors. Below is a function that works, but only when it's not decorated with tf.function. I need this function to run inside a tf graph.
import tensorflow as tf
raggedTensor = tf.ragged.constant([[[0.7688891291618347, 0.3979208469390869, 0.9807137250900269, 0.5825483798980713], [0.69159334897995, 0.48753976821899414, 0.7804230451583862, 0.5539296865463257]], [[0.5818965435028076, 0.343869686126709, 0.8541288375854492, 0.6288187503814697], [0.636405348777771, 0.6720571517944336, 0.7466434240341187, 0.7985518574714661]], [[0.65436190366745, 0.47322067618370056, 0.9061073660850525, 0.6343377828598022]], [[0.7395644187927246, 0.6922436356544495, 0.9913792610168457, 1.0], [0.7860392928123474, 0.44102346897125244, 0.8941574096679688, 0.637432873249054]]])
def convertGT(x):
    out = []
    for i in range(x.nrows()):
        out.append(x[i].to_tensor())
    return out
#runs fine
convertGT(raggedTensor)
[<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[0.7688891 , 0.39792085, 0.9807137 , 0.5825484 ],
       [0.69159335, 0.48753977, 0.78042305, 0.5539297 ]], dtype=float32)>,
 <tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[0.58189654, 0.3438697 , 0.85412884, 0.62881875],
       [0.63640535, 0.67205715, 0.7466434 , 0.79855186]], dtype=float32)>,
 <tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[0.6543619 , 0.47322068, 0.90610737, 0.6343378 ]], dtype=float32)>,
 <tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[0.7395644 , 0.69224364, 0.99137926, 1.        ],
       [0.7860393 , 0.44102347, 0.8941574 , 0.6374329 ]], dtype=float32)>]
@tf.function
def convertGT(x):
    out = []
    for i in range(x.nrows()):
        out.append(x[i].to_tensor())
    return out
#this will throw the error
convertGT(raggedTensor)
InaccessibleTensorError: The tensor 'Tensor("while/RaggedToTensor/RaggedTensorToTensor:0", shape=(None, None), dtype=float32)' cannot be accessed here: it is defined in another function or code block. Use return values, explicit Python locals or TensorFlow collections to access it. Defined in: FuncGraph(name=while_body_4893, id=140025401503696); accessed from: FuncGraph(name=convertGT, id=140025402568040).
I don't think it is possible to work with lists within tf.function: python side effects (like appending to lists) only happen the first time you call a Function with a set of inputs. Afterwards, the traced tf.Graph is reexecuted, without executing the Python code. See here: https://www.tensorflow.org/guide/function#python_side_effects
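The side-effect behaviour described in that guide can be seen in a small sketch. The names trace_log and double are made up for illustration; the Python list is appended to only while the function is being traced, not on later calls that reuse the traced graph.

```python
import tensorflow as tf

trace_log = []  # plain Python list, mutated as a Python side effect


@tf.function
def double(x):
    # This append runs while TensorFlow traces the function,
    # not each time the traced graph is executed.
    trace_log.append("traced")
    return x * 2


first = double(tf.constant(1))
second = double(tf.constant(2))  # same dtype and shape, so the existing trace is reused
print(int(first), int(second), len(trace_log))
```

Both calls return the doubled value, but the list gains only one entry, because the second call reuses the graph traced by the first.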
I managed to come up with a solution; here it is:
@tf.function
def unstackRaggedBBox(ragged):
    #convert the ragged to a zero padded tensor
    _tensor = ragged.to_tensor()
    ret = []
    for i in range(_tensor.shape[0]):
        #create a mask/remove the 0 values
        mask = tf.cast(_tensor[i], dtype=tf.bool)
        row = tf.boolean_mask(_tensor[i], mask, axis=0)
        #reshape the tensor to Nx4
        size = tf.cast(tf.shape(row, out_type=tf.dtypes.int32) / 4, tf.int32)
        row = tf.reshape(row, [size[0], 4])
        ret.append(row)
    return ret
I think the ragged_tensor slice operation caused the problem; lists work fine in graphs.
How to convert a list of tensorflow EagerTensors to a numpy array
I have data that looks like this when I print it:
data = [<tf.Tensor: shape=(), dtype=float32, numpy=-0.0034351824>,
        <tf.Tensor: shape=(), dtype=float32, numpy=0.0003163157>,
        <tf.Tensor: shape=(), dtype=float32, numpy=0.00060091465>,
        <tf.Tensor: shape=(), dtype=float32, numpy=0.0012879161>,
        <tf.Tensor: shape=(), dtype=float32, numpy=0.0002799925>]
So this is a list whose elements are of type <class 'tensorflow.python.framework.ops.EagerTensor'>. I would like to convert it to a standard numpy array. In this case it would look like:
array([-0.0034351824, 0.0003163157, 0.00060091465, 0.0012879161, 0.0002799925])
How can you do that?
You can use:
import numpy as np
x = np.array(data)
You can use np.ravel, since you want a 1D list:
np.ravel(data)
array([0.22184575, 0.3621379 , 0.5509906 , 0.20388651, 0.94017696], dtype=float32)
Full example:
import tensorflow as tf
import numpy as np
data = [tf.random.uniform((1,)) for i in range(5)]
np.ravel(data)
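As a further option not mentioned in the answers above, the list can be stacked inside TensorFlow first and converted once. The sketch below rebuilds three of the scalar values from the question as eager tensors and compares that route with np.array.

```python
import numpy as np
import tensorflow as tf

# Scalar eager tensors rebuilt from three of the values shown in the question.
data = [tf.constant(v, dtype=tf.float32)
        for v in (-0.0034351824, 0.0003163157, 0.00060091465)]

as_array = np.array(data)         # NumPy converts each eager tensor
stacked = tf.stack(data).numpy()  # stack in TensorFlow, then convert once

print(as_array.shape, stacked.shape, stacked.dtype)
```

Both routes give a 1D float32 array here; for tensors of shape (1,), as in the np.ravel example, tf.stack would give shape (n, 1), so a reshape or np.ravel would still be needed.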
How to create mini-batches using tensorflow.data.experimental.CsvDataset compatible with model's input shape?
I want to train on mini-batches using tensorflow.data.experimental.CsvDataset in TensorFlow 2, but the tensor shapes don't fit my model's input shape. Please let me know the best way to do mini-batch training with a TensorFlow dataset. I tried the following:
# I have a dataset with 4 features and 1 label
feature = tf.data.experimental.CsvDataset(['C:/data/iris_0.csv'], record_defaults=[.0] * 4, header=True, select_cols=[0,1,2,3])
label = tf.data.experimental.CsvDataset(['C:/data/iris_0.csv'], record_defaults=[.0] * 1, header=True, select_cols=[4])
dataset = tf.data.Dataset.zip((feature, label))
# and I try to minibatch training:
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(loss='mse', optimizer='sgd')
model.fit(dataset.repeat(1).batch(3), epochs=1)
I got an error:
ValueError: Error when checking input: expected dense_6_input to have shape (4,) but got array with shape (1,)
This is because CsvDataset() returns a tensor of shape (features, batch), but I need it to be of shape (batch, features). Reference code:
for feature, label in dataset.repeat(1).batch(3).take(1):
    print(feature)
# (<tf.Tensor: id=487, shape=(3,), dtype=float32, numpy=array([5.1, 4.9, 4.7], dtype=float32)>, <tf.Tensor: id=488, shape=(3,), dtype=float32, numpy=array([3.5, 3. , 3.2], dtype=float32)>, <tf.Tensor: id=489, shape=(3,), dtype=float32, numpy=array([1.4, 1.4, 1.3], dtype=float32)>, <tf.Tensor: id=490, shape=(3,), dtype=float32, numpy=array([0.2, 0.2, 0.2], dtype=float32)>)
tf.data.experimental.CsvDataset creates a dataset where each element corresponds to a row in the CSV file and consists of multiple tensors, i.e. a separate tensor for each column. Therefore, you first need to use the dataset's map method to stack these tensors into a single tensor that is compatible with the input shape expected by the model. Because the function below stacks along axis=1, it has to be applied after batching, when each column tensor has shape (batch,); on unbatched scalar columns, axis=1 is out of range:
def map_func(features, label):
    return tf.stack(features, axis=1), tf.stack(label, axis=1)
dataset = dataset.batch(BATCH_SIZE).map(map_func)
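The stacking step can be checked without a CSV file. The sketch below is an illustration under assumptions: it mimics the per-column structure of CsvDataset with from_tensor_slices, using the three rows printed in the question and made-up label values, and it applies the map function to batched elements.

```python
import tensorflow as tf

# Stand-in for the CsvDataset structure: each element is a tuple of per-column scalars.
columns = ([5.1, 4.9, 4.7], [3.5, 3.0, 3.2], [1.4, 1.4, 1.3], [0.2, 0.2, 0.2])
feature = tf.data.Dataset.from_tensor_slices(columns)
label = tf.data.Dataset.from_tensor_slices(([0.0, 0.0, 0.0],))  # made-up labels
dataset = tf.data.Dataset.zip((feature, label))


def map_func(features, label):
    # After batching, each column has shape (batch,);
    # stacking on axis=1 yields (batch, n_columns).
    return tf.stack(features, axis=1), tf.stack(label, axis=1)


batched = dataset.batch(3).map(map_func)
x, y = next(iter(batched))
print(x.shape, y.shape)
```

The features come out with shape (batch, 4), which matches a model declared with input_shape=(4,).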
Concurrent Tensorflow String Split
I need to do some preprocessing before feeding the input to my network. I am working on my data iterator and writing some functions to prepare the data. My training set is a large file, and each line has the following scheme. Each line is semicolon-separated (and could also be empty). Each part then contains different events, which are comma-separated, and finally each observation has 5 signals, which are colon-separated.
1:1560629595635:183.94:z1:Happy,2:1560629505635:100:z1:Sad;5:1561929595635:1:z1:Happy,13:1561629595635:12:j1:Sad;50:15616295956351:10:f1:Sad
My objective is to split each line in the following order:
1. ';' split
2. ',' split
3. ':' split
*Note: length after the last split = 5
With the following function, if I specify indx, it works fine, but only for one fragment of the ';' split.
def __semicolon_coma_colon_split(line, table, indx):
    sources = tf.strings.split(line, ';')
    pairs = tf.strings.split(sources[indx], ',')
    items = tf.strings.split(pairs, ':').to_tensor()
    return (tf.strings.to_number(items[:, 0], out_type=tf.int32),
            tf.strings.to_number(items[:, 1], out_type=tf.int64),
            tf.strings.to_number(items[:, 2], out_type=tf.float32),
            tf.cast(table.lookup(items[:, 3]), tf.int32),
            tf.cast(table.lookup(items[:, 4]), tf.int32))
And the current output shape is
(<tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>)
However, I need to run this concurrently on all the fragments of the ';' split, and the expected output would be a tuple of tuples, each containing 5 tensors:
((<tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>),
 (<tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>),
 (<tf.Tensor: shape=(1,)>, <tf.Tensor: shape=(1,)>, <tf.Tensor: shape=(1,)>, <tf.Tensor: shape=(1,)>, <tf.Tensor: shape=(1,)>))
Can't you do this with the built-in split method for strings?
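A minimal sketch of that suggestion, using only Python's str.split on the sample line from the question. The function name split_line is made up for illustration, and the code runs as ordinary Python rather than as TensorFlow graph ops, so it would not work unchanged inside a tf.data pipeline.

```python
def split_line(line):
    """Split on ';', then ',', then ':'; returns a list of sources, each a list of events."""
    sources = []
    for source in line.split(";"):
        # Skip empty fragments, then break each event into its colon-separated fields.
        events = [event.split(":") for event in source.split(",") if event]
        sources.append(events)
    return sources


line = ("1:1560629595635:183.94:z1:Happy,2:1560629505635:100:z1:Sad;"
        "5:1561929595635:1:z1:Happy,13:1561629595635:12:j1:Sad;"
        "50:15616295956351:10:f1:Sad")
parsed = split_line(line)
events_per_source = [len(events) for events in parsed]
print(events_per_source)  # → [2, 2, 1]
```

Each event comes back as a list of 5 strings, so the numeric conversion and table lookup from the question would still have to be applied afterwards.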