I'm trying to figure out the implementation of SGD in TensorFlow, especially how Keras/TensorFlow actually initializes and dispatches the dataset.
In the constructor (__init__ method) of class TensorLikeDataAdapter, self._dataset is initialized by this line
https://github.com/tensorflow/tensorflow/blob/r2.5/tensorflow/python/keras/engine/data_adapter.py#L346
self._dataset = dataset
I tried to print the value out there with this line
print('__init__ self._dataset', self._dataset)
and I got
<_OptionsDataset shapes: ((None, 2), (None,)), types: (tf.float32, tf.float32)>
which seems to indicate that the dataset hasn't yet been actually loaded.
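This is consistent with how tf.data works: a Dataset object is a lazy pipeline description, and elements are only materialized when you iterate it. A minimal standalone sketch (toy data of my own, not the adapter's):

import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0])
print(ds)        # prints only the pipeline description, e.g. <TensorSliceDataset ...>
print(list(ds))  # iterating the dataset is what actually produces the elements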
At the very beginning of the enumerate_epochs method
https://github.com/tensorflow/tensorflow/blob/r2.5/tensorflow/python/keras/engine/data_adapter.py#L1196
I added this line
def enumerate_epochs(self):
    print('enumerate_epochs self._dataset', list(self._dataset))
and I got the actual dataset printed 3 times (I set epochs=3), which means the dataset has been initialized and randomized somewhere before this point.
I went through the whole data_adapter.py but failed to locate where the dataset is actually initialized.
Highlight:
I also tried adding this print just before the loop:
print('data_handler._dataset', data_handler._dataset)
for epoch, iterator in data_handler.enumerate_epochs():
and I got
data_handler._dataset <_OptionsDataset shapes: ((None, 2), (None,)), types: (tf.float32, tf.float32)>
However, the same print at the top of this method
def _truncate_execution_to_epoch(self):
    print('_truncate_execution_to_epoch self._dataset', list(self._dataset))
prints the actual dataset 3 times (epochs=3), which means the dataset is actually initialized somewhere in between, though I can't see where that could be!
I also tried this in class DataHandler, adding the print before the call to _configure_dataset_and_inferred_steps:
print('DataHandler self._dataset', list(self._dataset))
self._configure_dataset_and_inferred_steps(strategy, x, steps_per_epoch,
                                           class_weight, distribute)
and I got this error
AttributeError: 'DataHandler' object has no attribute '_dataset'
Could someone help me see the light at the end of the tunnel?
Related
I've been looking over a few TensorFlow and Keras guides, and I am about as much of a beginner as you can get when it comes to Python. Any help with the problem below would be much appreciated. I'm struggling to figure out what the problem with the line of code below is. I'm getting read_train_sets from a separate file, where it is defined as:
def read_train_sets(self, train_path, image_width, image_height, classes, validation_size):
I then called this in a separate file with the following code:
data = read_train_sets(train_path, img_width, img_height, classes, validation_size=0.2)
But then I got an error message that says:
<ipython-input-22-e2aa446e36dd> in <module>
----> 1 data = read_train_sets(train_path, img_width, img_height, classes, validation_size=0.2)
TypeError: read_train_sets() missing 1 required positional argument: 'classes'
Any idea what this means? I thought I was already passing classes, but then again, I could be wrong.
That read_train_sets is a method that belongs to the class DataSet.
In your code:
data = read_train_sets(train_path, img_width, img_height, classes, validation_size=0.2)
You are calling this function as if you had created it as a standalone function, and the error tells you that one argument is missing: because it is an instance method, the self argument is not supplied when you call it that way.
The correct way to call that function should be something like:
data = dataset.read_train_sets(train_path, img_width, img_height, classes, validation_size=0.2)
You will first need to create a DataSet object (I named it dataset in the example).
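A minimal sketch of the difference, using a stripped-down stand-in for the DataSet class (the real one does the actual loading):

class DataSet:
    def read_train_sets(self, train_path, image_width, image_height,
                        classes, validation_size):
        pass  # the real method loads images and splits train/validation

train_path, img_width, img_height, classes = 'train/', 64, 64, ['cat', 'dog']

# read_train_sets(train_path, img_width, img_height, classes, validation_size=0.2)
# -> TypeError: the positional values shift up to fill `self`,
#    so `classes` appears to be missing

dataset = DataSet()  # create an instance first
data = dataset.read_train_sets(train_path, img_width, img_height,
                               classes, validation_size=0.2)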
Source: https://github.com/rdcolema/tensorflow-image-classification/blob/master/dataset.py
I wanted to do a simple classification task but when I try to run it, I am getting this error:
ValueError: `y` argument is not supported when using `keras.utils.Sequence` as input.
Use the keyword validation_data. The signature of fit begins fit(x=None, y=None, ...), so if you pass your validation dataset positionally it binds to y, and Keras thinks validation_dataset is your labels.
model.fit(train_dataset, validation_data=validation_dataset, epochs=1)
Let's imagine that you want to predict whether a picture shows a dog or a cat, so you have two labels, Dog and Cat. When it comes to a CNN dataset, the way most people structure it (which is what image_generator.flow_from_directory expects) looks like this:
Train/
.........Cat/
...............Image1.jpg
...............Image2.jpg
.........Dog/
...............Image3.jpg
...............Image4.jpg
The same structure applies to your label directory. By the way, that directory should be renamed Validation (it's not really important, but it makes more sense).
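For reference, a minimal sketch of how flow_from_directory consumes that layout (the paths and image size here are my own placeholders):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Class labels are inferred from the subdirectory names (Cat/, Dog/)
train_gen = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    'Train/', target_size=(150, 150), batch_size=32, class_mode='binary')
val_gen = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    'Validation/', target_size=(150, 150), batch_size=32, class_mode='binary')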
I am new to Keras, and after going through a few tutorials I started building a model and found these two styles of implementation. However, I am getting an error in the first one, while the second one works fine. Can someone explain the difference between the two?
First Method:
visible = Embedding(QsVocabSize, 1024, input_length=max_length_inp, mask_zero=True)
encoder = LSTM(100, activation='relu')(visible)
Second Method:
model = Sequential()
model.add(Embedding(QsVocabSize, 1024, input_length=max_length_inp, mask_zero=True))
model.add(LSTM(100, activation='relu'))
This is the error I get:
ValueError: Layer lstm_59 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.layers.embeddings.Embedding'>. Full input: [<keras.layers.embeddings.Embedding object at 0x00000207BC7DBCC0>]. All inputs to the layer should be tensors.
They're two ways of creating DL models in Keras. The first code snippet follows the functional style, which is used for creating complex models: multiple inputs/outputs, shared layers, etc.
https://keras.io/getting-started/functional-api-guide/
The second code snippet is the Sequential style, which creates simple models by just stacking layers.
https://keras.io/getting-started/sequential-model-guide/
If you read the functional API guide, you'll notice the following point:
'A layer instance is callable (on a tensor), and it returns a tensor'
Now the error you're seeing makes sense. This line only creates the layer and doesn't invoke it by passing a tensor:
visible = Embedding(QsVocabSize, 1024, input_length=max_length_inp, mask_zero=True)
Subsequently, passing this Embedding object to the LSTM layer throws an error, as the layer is expecting a tensor.
This is an example from the functional API guide. Notice the output tensors getting passed from one layer to another.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
lstm_out = LSTM(32)(x)
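Applied to the question, a sketch of the fixed first method would create an Input tensor and call each layer on the previous output (the sizes here are placeholders for QsVocabSize and max_length_inp):

from keras.layers import Input, Embedding, LSTM

QsVocabSize, max_length_inp = 10000, 20  # placeholder values

visible = Input(shape=(max_length_inp,), dtype='int32')   # symbolic input tensor
embedded = Embedding(QsVocabSize, 1024, input_length=max_length_inp,
                     mask_zero=True)(visible)             # layer called on a tensor
encoder = LSTM(100, activation='relu')(embedded)          # LSTM now receives a tensor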
I've just trained my first sequence-to-sequence model with Keras, and now I want to save it so I can load the model later and use it (without having to train it every time). When saving it I do:
model.save_weights('models/model_weights.h5')
with open('models/model_architecture.json', 'w') as f:
    f.write(model.to_json())
But doing this produces a bunch of user warnings (more or less one for every layer) of this type:
path/to/site-packages/keras/engine/topology.py:2379: UserWarning: Layer lstm_15 was
passed non-serializable keyword arguments: {'initial_state': [<tf.Tensor 's0_7:0'
shape=(?, 64) dtype=float32>, <tf.Tensor 'c0_7:0' shape=(?, 64) dtype=float32>]}.
They will not be included in the serialized model (and thus will be missing at
deserialization time).
str(node.arguments) + '. They will not be included '
Even though these are just warnings, they really throw off the model's predictions and accuracy after loading it.
Everything works perfectly fine right after training (good predictions, etc.); it's only the saving part that fails. What can I do about this? Has anyone else experienced the same thing and solved it somehow? Is there a workaround? Might it be an issue with the names I have given the different layers?
You can save the model "code": keep a .py file that creates the model exactly as it was.
Then you load weights: model.load_weights('models/model_weights.h5').
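A minimal sketch of that workflow, assuming a hypothetical build_model() in your saved .py file that defines the same architecture used for training:

from model_code import build_model  # hypothetical module containing the model "code"

model = build_model()                          # rebuild the exact architecture
model.load_weights('models/model_weights.h5')  # then restore the trained weights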
I'm trying to generate image summaries to be displayed in TensorBoard. This worked in an eager execution environment.
Now, I'm trying to use eval_metric_ops, returning a dict of operations that compute metrics during execution of the computation graph. For this, I rely on tf.py_func to do my metric computations and plots. Its signature is:
tf.py_func(
func,
inp,
Tout,
stateful=True,
name=None
)
where Tout is the return type of the function. I managed to make it work for simple metrics (float values). As far as I understand, I need to declare a string return type for my summaries, which will be parsed afterwards to rebuild my images.
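For reference, the working float-value case looked roughly like this (compute_metric here is a placeholder for the actual computation):

import numpy as np
import tensorflow as tf

def compute_metric():
    # placeholder: the real function computes the metric from stored predictions
    return np.float32(0.5)

# Tout=tf.float32 matches the numpy float32 returned by the Python function
metric_value = tf.py_func(compute_metric, [], tf.float32)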
Here is the blocking point.
I build my Summary with:
summ = tf.Summary(value=[
tf.Summary.Value(
tag=metric_name,
image=tf.Summary.Image(
encoded_image_string=encode_image_array_as_png_str(
self._last_metrics[metric_name])))])
Returning it as is, I get: W tensorflow/core/framework/op_kernel.cc:1306] Unimplemented: Unsupported object type Summary
Returning str(summ) gives: WARNING:tensorflow:Skipping summary for ..., cannot parse string to Summary.
I also tried to build it with:
tf.summary.image(
name,
tensor,
max_outputs=3,
collections=None,
family=None
)
But this gives: W tensorflow/core/framework/op_kernel.cc:1306] Unimplemented: Unsupported object type Tensor
Do you know how to serialize a Summary to a string, a bytes iterable, or anything else that can be interpreted as a string Tensor, in such a way that it can be parsed back into an image Summary afterwards?
Thanks.
Shame on me.
Like many other classes in TensorFlow, Summary is defined by a Protocol Buffer message and thus implements SerializeToString().
Hence, just returning summ.SerializeToString() works!
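A minimal sketch of the full round trip (encoded_png stands in for the output of encode_image_array_as_png_str(...) from the question; TF 1.x API):

import tensorflow as tf

def _make_summary_bytes():
    encoded_png = b''  # stand-in for encode_image_array_as_png_str(...)
    summ = tf.Summary(value=[
        tf.Summary.Value(
            tag='metric_name',
            image=tf.Summary.Image(encoded_image_string=encoded_png))])
    # Summary is a protobuf message, so it serializes to a byte string
    return summ.SerializeToString()

summary_str = tf.py_func(_make_summary_bytes, [], tf.string)
# The bytes can later be parsed back with tf.Summary.FromString(...)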