TensorFlow - batch size error thrown - Python

I am following along with this tutorial (https://colab.research.google.com/github/khanhlvg/tflite_raspberry_pi/blob/main/object_detection/Train_custom_model_tutorial.ipynb) from Colab and running it on my own Windows machine.
When I debug my script, it throws this error:
The size of the train_data (0) couldn't be smaller than batch_size (4). To solve this problem, set the batch_size smaller or increase the size of the train_data.
It is thrown on this snippet of my code:
model = object_detector.create(train_data, model_spec=spec, batch_size=4, train_whole_model=True, epochs=20, validation_data=val_data)
My own training data contains 101 images, while the example from Colab only contains 62 in its training folder.
I understand it's complaining that the training data can't be smaller than the batch size, but I don't understand why the error is thrown in the first place, since my training data is not empty.
On my own machine I have TensorFlow version 2.8.0, just like in the Colab.
I've tried batch sizes all the way from 0 to 100+, but it still gives me the same error.
I've tried dropping one sample so there are 100 images and setting the batch size to 2, 4, etc., but it still throws the error.
I'm coming to the conclusion that it is not loading the data correctly, but why?

For anybody running into the same issue as I was, here is my solution.
The reason this happens is a difference in Python versions.
I was trying to run this locally with Python 3.8.10, while Colab runs 3.7.12.
I reran everything on Colab using Python 3.7.12 and trained my model with no further issues.
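A quick way to confirm this kind of failure is to check the loader's size before calling create; if it reports 0, the images or annotations were never parsed and create() will raise the batch_size error no matter what batch size you pass. This is a minimal sketch assuming the Pascal VOC loader from the tutorial, with hypothetical folder paths and label names:

from tflite_model_maker import object_detector

# Hypothetical dataset layout; adjust to your own folders and labels.
train_data = object_detector.DataLoader.from_pascal_voc(
    images_dir='train/images',
    annotations_dir='train/annotations',
    label_map=['my_label'],
)

# If this prints 0, no samples were actually loaded.
print('train samples:', train_data.size)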

Related

Tflite Model Maker using Pascal-VOC

I have 31,216 labeled images for object detection. I used the LabelImg program to label the images, so they are in Pascal-VOC format. I want to create a TFLite model for my Kotlin project. However, I have serious problems.
First, in my local environment I tried to install the tflite-model-maker library in PyCharm using pip install tflite-model-maker. It downloaded ~30 GB and PyCharm still says "unresolved reference". Then I tried to add the library from here, but that also didn't work. I couldn't manage to import the library.
Second, I used Google Colab, following this tutorial from TensorFlow. I mounted my Google Drive in Colab and edited all the code for my dataset path. I ran the line model.export(export_dir='.', tflite_filename='AslModel.tflite') last, and it created the model file in the Colab directory. I continued with the next line, model.evaluate_tflite('AslModel.tflite', val_data), which showed a 16-hour ETA; after 14 hours the Google Colab runtime gave an error and the whole runtime was reset. Now I have a tflite file and I have tested it, but since there was no evaluation step it makes bad predictions. I started all over again, but Google Colab gave an error again. I guess ~7 hours of training + ~16 hours of evaluation is impossible with Google Colab because of the 24h limit. Thus, my question is: how can I run the evaluation step only?
The model is defined in this line, which takes 7 hours: model = object_detector.create(train_data, model_spec=spec, batch_size=4, train_whole_model=True, epochs=20, validation_data=val_data). Instead of this line, I want to initialize my tflite file as a model, something like model = LoadModel(PATH_OF_MY_TFLITE). I couldn't find any load method, so I'm stuck there.
To sum up, the objective is to train on the Pascal-VOC formatted dataset. I couldn't import the libraries locally, and with Google Colab I have a raw tflite model, but it needs evaluation and I can't re-run the previous steps due to the time limit. Lastly, I bought Colab Pro, but I spent all my compute units; I don't even know what a compute unit is for. I'm waiting for suggestions. Thank you.
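One way to exercise the exported model without re-running the 7-hour create() step is to load the .tflite file directly with the TensorFlow Lite interpreter. This is only a minimal inference sketch, not a replacement for evaluate_tflite's metrics; the file name comes from the question, and the dummy input shape is taken from the interpreter itself:

import numpy as np
import tensorflow as tf

# Load the exported model (path assumed to be in the current directory).
interpreter = tf.lite.Interpreter(model_path='AslModel.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input standing in for one preprocessed validation image.
image = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])

interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()

# Object detection models usually output boxes, classes, scores and a count.
for detail in output_details:
    print(detail['name'], interpreter.get_tensor(detail['index']).shape)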

Fine-tuning GPT-2 on Colab gives error: Your session crashed after using all available RAM

I'm new to ML and trying to create an ML model by fine-tuning GPT-2.
I got the dataset and preprocessed it (file_name). But when I actually try to run the code below to fine-tune GPT-2, Colab always says 'Your session crashed after using all available RAM.'
gpt2.finetune(sess,
              dataset=file_name,
              model_name='124M',
              steps=50,
              restore_from='fresh',
              run_name='run1',
              print_every=10,
              sample_every=10,
              save_every=10,
              batch_size=16
              )
I'm already on Colab Pro, the RAM size is 25 GB, and the file size is only about 500 MB. I tried lowering the training steps and the batch size, but this error keeps happening.
Any idea how I can stop this behavior?
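For comparison, here is a reduced-memory variant of the same call. It is only a sketch, assuming the gpt_2_simple package that the gpt2.finetune call comes from; batch_size=1 is the main lever, and whether that is enough for a 500 MB dataset on 25 GB of RAM is not guaranteed:

import gpt_2_simple as gpt2

file_name = 'dataset.txt'          # path to the preprocessed text file (assumed)

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset=file_name,
              model_name='124M',
              steps=50,
              restore_from='fresh',
              run_name='run1',
              print_every=10,
              sample_every=50,     # sample and checkpoint less often
              save_every=50,
              batch_size=1)        # smallest batch size, to cut peak memory use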

Using tf.data.Dataset with Keras on a TPU

I am training a model with Keras which consists of a Hugging Face RoBERTa model as a backbone with downstream tasks of span prediction and binary prediction for text.
I have been training the model regularly with datasets under 2 GB in size, which has worked fine. The dataset has grown in recent weeks and is now around 2.3 GB, which puts it over the 2 GB Google protobuf hard limit. This makes it impossible to train the model with Keras on TPUs from NumPy tensors without a generator, as TensorFlow uses protobuf to buffer the tensors for the TPUs, and trying to serve all the data at once fails. If I use a dataset under 2 GB, everything works fine. TPUs don't support Keras generators yet, so I was looking into using the tf.data.Dataset API instead.
After seeing this question I adapted code from this gist to try to get this to work, resulting in the following code:
def tfdata_generator(x, y, is_training, batch_size=384):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    if is_training:
        dataset = dataset.shuffle(1000)
    dataset = dataset.map(map_fn)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
The model is created and compiled for TPU use as before, which has never caused any problems, and then I create the generator and call the fit function:
train_gen = tfdata_generator(x_train, y_train, is_training=True)
model.fit(
    train_gen,
    steps_per_epoch=10000,
    epochs=1,
)
This results in the following error:
FetchOutputs node : not found [Op:AutoShardDataset]
Edit: Colab with bare minimum code and a dummy dataset - unfortunately, because of Colab RAM restrictions, building a dummy dataset exceeding 2 GB in size crashes the notebook, but it still shows code that runs and works on CPU/TPU with a smaller dataset.
This code does, however, work on a CPU. I can't find any further information on this error online, and I haven't been able to find more details on how to feed training data to TPUs with Keras using generators. I have looked into TFRecords a bit, but find the documentation on using them with TPUs lacking too. All help appreciated!
For NumPy tensors, 2 GB seems to be a hard limit for TPU training (as of now).
I see two workarounds that you could use:
Write your tf.data to a GCS bucket as TFRecord/CSV using TFRecordWriter and let the TPU read the training data from that bucket (see the sketch below).
Use the tf.data service for your input pipeline. It's a relatively new service that lets you run your data pipeline on separate workers. For details on how to run it, please see running_the_tfdata_service.
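A minimal sketch of the first workaround, under the assumption that each example is a fixed-length sequence of token IDs plus one label; the bucket path, feature names, and the 512 sequence length are placeholders, and a real pipeline would serialize all of the model's inputs and both label heads:

import tensorflow as tf

def serialize_example(input_ids, label):
    # Pack one training example into a tf.train.Example proto.
    feature = {
        'input_ids': tf.train.Feature(int64_list=tf.train.Int64List(value=input_ids)),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Write the data to a GCS bucket the TPU can read from (hypothetical path).
with tf.io.TFRecordWriter('gs://my-bucket/train.tfrecord') as writer:
    for ids, label in zip(x_train, y_train):
        writer.write(serialize_example(ids, label))

def parse_example(example_proto):
    # Shapes must match what was written above.
    features = {
        'input_ids': tf.io.FixedLenFeature([512], tf.int64),
        'label': tf.io.FixedLenFeature([1], tf.int64),
    }
    parsed = tf.io.parse_single_example(example_proto, features)
    return parsed['input_ids'], parsed['label']

# Read it back as a tf.data pipeline; this never hits the 2 GB protobuf path.
train_dataset = (tf.data.TFRecordDataset('gs://my-bucket/train.tfrecord')
                 .map(parse_example)
                 .shuffle(1000)
                 .batch(384)
                 .repeat()
                 .prefetch(tf.data.experimental.AUTOTUNE))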

Resource Exhausted in Tensorflow with any architecture

I tried to train an image classifier using TensorFlow. I used the tf.data API to load the dataset and dataset caching to speed up the training process. While trying to train the model I got a ResourceExhausted error. I tried different batch sizes like 32, 64, and 128, but I could not overcome the problem.
I have also tried removing some layers, but I could not fix this error.
Check your batch_size and decrease it further. It seems to be overwhelming the available memory.
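For illustration, a sketch of the kind of change this suggests, with a dummy stand-in dataset (shapes and sizes are assumptions): lower the batch size, and if the cached data itself no longer fits in RAM, cache to a file instead of memory:

import tensorflow as tf

# Dummy stand-in for the real image dataset (shapes are assumptions).
images = tf.random.uniform((1000, 224, 224, 3))
labels = tf.random.uniform((1000,), maxval=10, dtype=tf.int32)

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .cache('/tmp/train_cache')   # cache to disk rather than RAM
           .shuffle(1000)
           .batch(16)                   # smaller than the 32/64/128 already tried
           .prefetch(tf.data.AUTOTUNE))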

Error while training: tensorflow:Your input ran out of data; interrupting training

I'm trying to execute the Colab notebook associated with the following link, which trains Keras RetinaNet to find objects inside images:
https://www.freecodecamp.org/news/object-detection-in-colab-with-fizyr-retinanet-efed36ac4af3/
However, even though I follow the guide entirely, when I start the training with the line:
!keras_retinanet/bin/train.py --freeze-backbone --random-transform --weights {PRETRAINED_MODEL} --batch-size 8 --steps 500 --epochs 10 csv annotations.csv classes.csv
I get this error at the first epoch:
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset.
This happens even though, I repeat, I am following the notebook exactly.
I also tried to train using Pascal VOC as specified in the official GitHub repo (by fizyr), but I get this error again.
Can someone help me? Thanks
EDIT: I managed to solve it by removing the default number of training steps in the train.py file, letting Keras calculate the proper number of steps automatically.
I found the solution posted by hansoli68 in the following thread:
https://github.com/fizyr/keras-retinanet/issues/1449
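The arithmetic behind the fix: a non-repeating generator can only serve as many batches as the annotation file allows, so a hard-coded --steps value larger than that budget triggers the warning. A small illustrative check (the annotation count is an assumed placeholder; the other numbers come from the command above):

import math

num_annotations = 3500        # assumed number of rows in annotations.csv
batch_size = 8                # from --batch-size 8
steps_per_epoch = 500         # from --steps 500

# Batches the generator can actually produce from one pass over the data.
batches_available = math.ceil(num_annotations / batch_size)

# Keras expects steps_per_epoch batches every epoch; if the generator
# cannot supply that many, training is interrupted with the warning above.
print('needed per epoch:', steps_per_epoch)
print('available per pass:', batches_available)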
