I'm trying to understand how to read local images, use them as a TensorFlow Dataset, and train a Keras model with that Dataset. I'm following the TF Keras MNIST TPU tutorial. The only difference is that I want to read my own set of images and train on them.
Let's say I have a list of images (file names) and a corresponding list of labels.
files = [...] # list of file names
labels = [...] # list of labels (integers)
images = tf.constant(files) # or tf.convert_to_tensor(files)
labels = tf.constant(labels) # or tf.convert_to_tensor(labels)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset = dataset.shuffle(len(files))
dataset = dataset.repeat()
dataset = dataset.map(parse_function).batch(batch_size)
The parse_function is a simple function which reads the input file name and returns the image data and the corresponding label, e.g.
def parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string)
    image = tf.cast(image_decoded, tf.float32)
    return image, label
At this point I have a dataset of type tf.data.Dataset (more precisely, tf.data.BatchDataset), and I pass it to the Keras model trained_model from the tutorial, e.g.
history = trained_model.fit(dataset, ...)
But at this point the code breaks with the following error:
AttributeError: 'BatchDataset' object has no attribute 'ndim'
The error comes from Keras, which checks the given input like this:
from keras import backend as K
K.is_tensor(dataset) # returns False
Keras tries to determine the type of the input and, since it is not a tensor, assumes it is a NumPy array and tries to read its ndim attribute. That's why the error occurs.
My questions are the following:
Am I reading the TF dataset correctly? I looked up plenty of examples on the internet, and it seems I'm reading it the way people suggest.
Why is my dataset not a tensor? Maybe I need to perform an additional conversion, but that is not the case in the TF tutorial.
Why does everything work with tf datasets in the TF tutorial? I really don't see any difference between the way they read the MNIST data (which is in a different data format, but eventually they get images) and what I'm doing here.
Any suggestion would be greatly appreciated.
Please note that even though the TF tutorial is about TPUs, it is structured such that it works on both TPUs and CPUs/GPUs.
It turns out the problem was in the Keras model I was using. The example in the TF tutorial relies on a Keras model built with the tf.keras module (all layers, the model, etc. come from tf.keras), while the model I was using (DenseNet) relies on the standalone keras module, i.e. all of its layers come from keras and not from tf.keras. This causes the fit method of the keras model to check the tf.data.Dataset for ndim. Once I adjusted my DenseNet to use tf.keras layers, everything worked again.
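A minimal sketch of the working setup, assuming TensorFlow 2.x: the model is built only from tf.keras layers and is fit directly on a tf.data.Dataset. The image size, the random PNG files, and the helper names (make_dataset, write_dummy_images) are placeholders for illustration and are not taken from the tutorial. parse_function uses tf.io.decode_png here so that the decoded image has a known rank.

```python
import os
import tempfile

import tensorflow as tf

IMG_SIZE = 28  # hypothetical image size for this sketch


def parse_function(filename, label):
    """Read one PNG file and return a float image in [0, 1] plus its label."""
    image_bytes = tf.io.read_file(filename)
    image = tf.io.decode_png(image_bytes, channels=1)
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])  # float32 output
    return image / 255.0, label


def make_dataset(files, labels, batch_size):
    """Same pipeline as in the question: shuffle, repeat, map, batch."""
    dataset = tf.data.Dataset.from_tensor_slices((files, labels))
    dataset = dataset.shuffle(len(files)).repeat()
    return dataset.map(parse_function).batch(batch_size)


def write_dummy_images(directory, count):
    """Write random grayscale PNGs so the sketch runs without real data."""
    files = []
    for i in range(count):
        pixels = tf.random.uniform(
            [IMG_SIZE, IMG_SIZE, 1], maxval=256, dtype=tf.int32
        )
        path = os.path.join(directory, f"img_{i}.png")
        tf.io.write_file(path, tf.io.encode_png(tf.cast(pixels, tf.uint8)))
        files.append(path)
    return files


# Every layer comes from tf.keras, not from the standalone keras package.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

with tempfile.TemporaryDirectory() as tmp:
    files = write_dummy_images(tmp, count=8)
    labels = [i % 10 for i in range(8)]
    dataset = make_dataset(files, labels, batch_size=4)
    # steps_per_epoch is needed because the dataset repeats indefinitely.
    history = model.fit(dataset, steps_per_epoch=2, epochs=1, verbose=0)
```

Because the dataset calls repeat(), fit has no natural end of epoch, so steps_per_epoch has to be passed explicitly.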
TL;DR: How can I access the PyTorch pre-trained Swin Transformer model so that I can extract features from it and train it on a segmentation task, using a DeepLabv3+ head, on a custom dataset with an image size of 512?
I am testing a Swin Transformer backbone with DeepLabv3+ as the head for semantic segmentation.
I already have the code for the head: it gets the features from the backbone and then processes them in different ways. It works fine for ResNet and Xception. The thing is that I want to work with a pre-trained Swin. There are many ways to do this, but each of them has its own set of problems.
1. Using the timm library for pre-trained models
! pip install timm
import timm
import torch
all_swins = timm.list_models('*swin*')
print(all_swins)
model = timm.create_model('swin_large_patch4_window12_384_in22k', in_chans = 3, pretrained = True,)
print(model.default_cfg)
dummy_image = torch.randn(1,3,512,512) # create an image of size 512
model.forward_features(dummy_image) # extract features : Will result in error
new_model = timm.create_model('swin_large_patch4_window12_384_in22k', in_chans = 3, pretrained = True, img_size = 512) # This won't work I'll explain below
The problem is that none of the above code works, because the model here accepts images of size 384 but my images are 512. You can change the img_size argument for some other models, but as the author has clarified in this question:
It should work with the vit, vit_deit, vit_deit_distilled. Has not been implemented for pit, swin, and tnt yet.
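One way to see why a 512 input does not fit this checkpoint, assuming the standard four-stage Swin layout in which the feature map is split into non-overlapping windows and halves in side length after each stage. The helper names below are hypothetical, and this only illustrates the shape constraint; it is not how timm validates img_size.

```python
def stage_resolutions(img_size, patch_size=4, num_stages=4):
    """Feature-map side length at each stage, halving after every stage."""
    side = img_size // patch_size
    return [side // (2 ** stage) for stage in range(num_stages)]


def fits_windows(img_size, window_size, patch_size=4, num_stages=4):
    """True if every stage splits into whole window_size x window_size windows."""
    if img_size % patch_size != 0:
        return False
    return all(
        side % window_size == 0
        for side in stage_resolutions(img_size, patch_size, num_stages)
    )


for size in (384, 512):
    print(size, stage_resolutions(size), fits_windows(size, window_size=12))
# 384 [96, 48, 24, 12] True
# 512 [128, 64, 32, 16] False
```

Under these assumptions, only image sizes that are multiples of 4 * 2^3 * 12 = 384 pass the check for a window size of 12.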
2. Using MMcv / MMSeg library:
Please open this colab notebook. I have commented and documented the part
Problem: The pre-trained weights are only for the specific setup that produced the SOTA results, i.e. the ADE dataset with UperNet. I cannot use them with DeepLabv3+ on a custom dataset.
3. Segmentation Models Pytorch library, which uses timm encoders
Problem: Again, because it uses timm, the image resolution can't be changed.
4. PaddleSeg Library
It has a Swin Transformer, but DeepLabv3+ works only with ResNet50 and ResNet101.
Last resort: In the end, I pulled up the official code from Microsoft, where I found a couple of useful things:
configuration yml file
Code which they use for model building def build_model()
Code inside def main() which parses the arguments and builds the whole model
I don't really know how to build the model with a different image size configuration. If anyone has any idea, please help.
I am trying to implement object detection using the MobileNetV2 model in Flutter. Since most of the examples and implementations available online for Flutter apps do not use MobileNetV2, I took a long route to get there.
The way I achieved this is as follows:
1) Created a python script where I am using MobileNetV2 model (pre-trained on ImageNet for 1000 classes) of Keras (backend Tensorflow) and tested it with images to see if it is returning the correct labels after detecting objects correctly. [Python script provided below for reference]
2) Converted the same MobileNetV2 keras model (MobileNetV2.h5) to Tensorflow Lite model (MobileNetV2.tflite)
3) Followed the existing example of creating Flutter app to use Tensorflow Lite (https://itnext.io/working-with-tensorflow-lite-in-flutter-f00d733a09c3). Replaced the TFLite model shown in the example with the MobileNetV2.tflite model and used the ImageNet classes/labels in https://gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57 as the labels.txt.
[GitHub project of the Flutter example is provided here: https://github.com/umair13adil/tensorflow_lite_flutter]
When I now run the Flutter app, it runs without any error, but the predicted labels are not correct. For example, it classifies an orange (object id: n07747607) as a poncho (object id: n03980874) and a pomegranate (object id: n07768694) as a banded_gecko (object id: n01675722).
However, if I test the same pictures with my Python script, it returns the correct labels. So I was wondering whether the issue is actually with the labels.txt (which holds the labels) used in the Flutter app, where the order of the labels does not match the output of the model.
Can anyone tell me how to resolve this issue so that objects are classified correctly? How can I get the ImageNet labels that are used by MobileNetV2 (Keras) so that I can use them in the Flutter app?
My Flutter App to detect object using MobileNetv2 can be downloaded from here: https://github.com/somdipdey/Tensorflow_Lite_Object_Detection_Flutter_App
My Python script, which converts the MobileNetV2 model (Keras) to TFLite and tests it on an image for classification, is as follows:
import tensorflow as tf
from tensorflow import keras
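# import the helpers from tensorflow.keras so that they match the tf.keras model below
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions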
import numpy as np
import PIL
from PIL import Image
import requests
from io import BytesIO
# load the model
model = tf.keras.applications.MobileNetV2(weights='imagenet', include_top=True)
#model = tf.keras.models.load_model('MobileNetV2.h5')
# To save model
model.save('MobileNetV2.h5')
# chose the URL image that you want
URL = "https://images.unsplash.com/photo-1557800636-894a64c1696f?ixlib=rb-1.2.1&w=1000&q=80"
# get the image
response = requests.get(URL)
img = Image.open(BytesIO(response.content))
# resize the image according to each model (see documentation of each model)
img = img.resize((224, 224))
##############################################
# if you want to read the image from your PC
#############################################
# img_path = 'myimage.jpg'
# img = image.load_img(img_path, target_size=(224, 224))
#############################################
# convert to numpy array
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
# return the top 10 detected objects
num_top = 10
labels = decode_predictions(features, top=num_top)
print(labels)
#load keras model
new_model= tf.keras.models.load_model(filepath="MobileNetV2.h5")
# Create a converter # I could also directly use keras model instead of loading it again
converter = tf.lite.TFLiteConverter.from_keras_model(new_model)
# Convert the model
tflite_model = converter.convert()
# Create the tflite model file
tflite_model_name = "MobileNetV2.tflite"
with open(tflite_model_name, "wb") as f:
    f.write(tflite_model)
Let me start by sharing the ImageNet labels in two formats, JSON and txt. Given that MobileNetV2 is trained on ImageNet, it should return results based on these labels.
My initial thought is that there must be an error in the 2nd step of your pipeline. I assume you are trying to convert the trained Keras weights to TensorFlow Lite weights (is that the same format as in pure TensorFlow?). A good option is to try to find weights already saved in the TensorFlow Lite format (but I guess they might not be available, and that's why you are doing the conversion). I had similar problems converting TF weights to Keras, so you should make sure the conversion succeeded before even going to step 3, the creation of the Flutter app that uses TensorFlow Lite. A good way to do this is to print all the available classes of your classifier and compare them with the original ImageNet labels given above.
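The comparison suggested above can be sketched as follows. Keras' decode_predictions maps output index i to a label through a class index JSON of the form {"<index>": ["<wordnet id>", "<label>"]}. The three entries below are a small excerpt used as sample data, and the "wnid label" line format is only an assumption about what the app's labels.txt should look like. The helper names are hypothetical.

```python
import json

# Small excerpt in the format of the Keras ImageNet class index file.
SAMPLE_CLASS_INDEX = json.loads("""
{
  "2": ["n01484850", "great_white_shark"],
  "0": ["n01440764", "tench"],
  "1": ["n01443537", "goldfish"]
}
""")


def labels_in_model_order(class_index):
    """Return 'wnid label' lines sorted by the integer class index."""
    ordered = sorted(class_index.items(), key=lambda item: int(item[0]))
    return [f"{wnid} {label}" for _, (wnid, label) in ordered]


def first_mismatch(expected, actual):
    """Index of the first differing line, or None if both lists agree."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


labels = labels_in_model_order(SAMPLE_CLASS_INDEX)
print("\n".join(labels))
```

Running first_mismatch(labels, lines_from_labels_txt) against the lines of the file shipped with the app would show whether, and where, the two orderings diverge.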
I have built an image classifier using Inception V3 and saved the model in the "SavedModel" format to deploy it to production. I am wondering how I can bundle the pre-processing steps into the final model so that the model ingests data in its natural form.
The pre-processing steps that I have are:
- resizing the image to a target_size of (299, 299) using Keras load_img
- converting the image to a NumPy array
- expanding the dimensions
- preprocessing the input with preprocess_input imported from inception_v3
When a model is deployed, as per my understanding, what is actually deployed is the Python code for inference that uses the model. In this Python code you can write all your preprocessing using OpenCV or any other Python library and pass the image as an argument to the script,
e.g. inferenceFile.py imageToInfer.png
An out-of-the-box idea would be to write a separate deep learning model that takes your non-preprocessed image as input and outputs the preprocessed image that you feed to the model. I'm not sure whether this is achievable.
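Another option, sketched here assuming TensorFlow 2.6 or later, is to put the resizing and scaling steps in front of the classifier as tf.keras layers, so that the saved model itself accepts raw pixel values. The tiny base_model below is a hypothetical stand-in for the trained InceptionV3 classifier, and the 5 output classes are arbitrary.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the trained InceptionV3 classifier (hypothetical, 5 classes).
base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(299, 299, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# Wrapper that accepts raw RGB images of any size with values in [0, 255].
raw_images = tf.keras.Input(shape=(None, None, 3), name="raw_image")
x = tf.keras.layers.Resizing(299, 299)(raw_images)
# Same scaling as inception_v3.preprocess_input: [0, 255] -> [-1, 1].
x = tf.keras.layers.Rescaling(scale=1.0 / 127.5, offset=-1.0)(x)
serving_model = tf.keras.Model(raw_images, base_model(x))

# A fake 120 x 160 image batch stands in for a decoded input image.
batch = np.random.randint(0, 256, size=(1, 120, 160, 3)).astype("float32")
probabilities = serving_model.predict(batch, verbose=0)
print(probabilities.shape)
```

Note that Resizing interpolates bilinearly by default while load_img defaults to nearest-neighbour, so predictions may differ slightly from the original pipeline unless the interpolation settings are aligned.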
What is the right way to preprocess the data in Keras while fine-tuning the pre-trained models in keras.applications for our own data?
Keras provides the following preprocess_input functions
keras.applications.imagenet_utils.preprocess_input
keras.applications.inception_v3.preprocess_input
keras.applications.xception.preprocess_input
keras.applications.inception_resnet_v2.preprocess_input
Looking inside, it seems that for inception_v3, xception, and inception_resnet_v2 it calls keras.applications.imagenet_utils.preprocess_input with mode='tf', while for other models it sets mode='caffe'. Each mode performs a different transformation.
In the blog post about transfer learning by François Chollet -- https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html -- the input is normalized to [0, 1] through a division by 255. Shouldn't the preprocess_input functions in Keras be used instead?
Also, it is not clear whether the input images should be in RGB or BGR. Is there any consistency regarding this, or is it specific to the pre-trained model being used?
Always use the preprocess_input function in the corresponding model-level module. That is, use keras.applications.inception_v3.preprocess_input for InceptionV3 and keras.applications.resnet50.preprocess_input for ResNet50.
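The mode argument specifies the preprocessing method used when training the original model. mode='tf' means that the pre-trained weights were converted from TF, where the authors trained the model with inputs in the [-1, 1] range. The same goes for mode='caffe' and mode='torch': the weights were converted from Caffe and PyTorch models, and preprocess_input reproduces the preprocessing those models were trained with.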
The input to applications.*.preprocess_input is always RGB. If a model expects BGR input, the channels will be permuted inside preprocess_input.
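To make the three modes concrete, here is a NumPy sketch of what each one does to an RGB image in channels-last layout. It is an illustrative reimplementation, not the Keras source; the function name preprocess_sketch is hypothetical, and the constants are the usual ImageNet channel statistics.

```python
import numpy as np

# ImageNet channel means in BGR order (caffe mode) and RGB mean/std (torch mode).
CAFFE_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)
TORCH_MEAN_RGB = np.array([0.485, 0.456, 0.406], dtype=np.float32)
TORCH_STD_RGB = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_sketch(rgb, mode):
    """Apply one preprocessing mode to RGB pixels with values in [0, 255]."""
    x = np.asarray(rgb, dtype=np.float32)
    if mode == "tf":
        # Scale to [-1, 1].
        return x / 127.5 - 1.0
    if mode == "torch":
        # Scale to [0, 1], then normalize each channel.
        return (x / 255.0 - TORCH_MEAN_RGB) / TORCH_STD_RGB
    if mode == "caffe":
        # Flip RGB to BGR, then subtract the per-channel mean (no scaling).
        return x[..., ::-1] - CAFFE_MEAN_BGR
    raise ValueError(f"unknown mode: {mode}")


red = np.array([[[255.0, 0.0, 0.0]]])  # a single pure-red RGB pixel
for mode in ("tf", "torch", "caffe"):
    print(mode, preprocess_sketch(red, mode)[0, 0])
```

In caffe mode the red value ends up in the last channel, which shows the RGB-to-BGR permutation happening inside the preprocessing step rather than in the caller's code.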
The blog post you've mentioned was posted before the keras.applications module was introduced. I wouldn't recommend using it as a reference for transfer learning with keras.applications. Maybe it'll be better to try the examples in the docs instead.
I'm very new to TensorFlow and Python. I have a dataset very similar to the MNIST dataset (28 x 28 images). I have been following a lot of online tutorials on how to implement a basic neural network with TensorFlow and found that most of them just use:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)
Is there a way for me to use my own MNIST-like data instead of importing it from tensorflow? Furthermore, will I still be able to use mnist.train.next_batch with the MNIST-like data? Thank you.
The MNIST dataset used in the TensorFlow tutorial includes 4 files:
train-images-idx3-ubyte
train-labels-idx1-ubyte
t10k-images-idx3-ubyte
t10k-labels-idx1-ubyte
The first two are the training images and training labels; the last two are the test images and test labels. The pixel values and labels are stored as byte streams in the files. If your dataset has exactly the same format as the MNIST dataset above, you can definitely use the same approach. The image and label parts are read using the extract_images and extract_labels functions defined here.
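For reference, the byte layout of those files can be read with a few lines of NumPy. This is a sketch of the IDX format (big-endian header followed by unsigned bytes), not the tutorial's own reader; the function names are hypothetical. It takes raw bytes, so gzip-compressed files would need to be decompressed first.

```python
import struct

import numpy as np

IMAGE_MAGIC = 2051  # magic number of idx3-ubyte image files
LABEL_MAGIC = 2049  # magic number of idx1-ubyte label files


def parse_idx_images(data):
    """Decode idx3-ubyte bytes into a (count, rows, cols) uint8 array."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected image magic number: {magic}")
    pixels = np.frombuffer(
        data, dtype=np.uint8, count=count * rows * cols, offset=16
    )
    return pixels.reshape(count, rows, cols)


def parse_idx_labels(data):
    """Decode idx1-ubyte bytes into a (count,) uint8 array."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected label magic number: {magic}")
    return np.frombuffer(data, dtype=np.uint8, count=count, offset=8)


# Build a tiny in-memory example: two 2 x 2 images with labels 7 and 3.
image_bytes = struct.pack(">IIII", IMAGE_MAGIC, 2, 2, 2) + bytes(range(8))
label_bytes = struct.pack(">II", LABEL_MAGIC, 2) + bytes([7, 3])
images = parse_idx_images(image_bytes)
labels = parse_idx_labels(label_bytes)
print(images.shape, labels.tolist())
```

Writing your own data in the same layout is the reverse operation: pack the header with struct.pack and append the uint8 pixel or label bytes.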
In fact, you are free to store your data in any other format (a TFRecord file of tf.Example records may actually be easier). Take a look at the new API too.
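As for next_batch: once your images and labels are in NumPy arrays, a small helper can provide the same kind of call. The class below is a hypothetical sketch, not the tutorial's DataSet class; it reshuffles at the start of each epoch and, as a simplification, drops the leftover samples that do not fill a whole batch.

```python
import numpy as np


class ArrayDataSet:
    """Serve shuffled mini-batches from in-memory image and label arrays."""

    def __init__(self, images, labels, seed=0):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = np.asarray(images)
        self.labels = np.asarray(labels)
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(len(self.images))
        self._position = 0

    def next_batch(self, batch_size):
        """Return the next (images, labels) batch, reshuffling each epoch."""
        if batch_size > len(self._order):
            raise ValueError("batch_size exceeds the dataset size")
        if self._position + batch_size > len(self._order):
            # Epoch finished: reshuffle and start again from the beginning.
            self._order = self._rng.permutation(len(self._order))
            self._position = 0
        end = self._position + batch_size
        indices = self._order[self._position:end]
        self._position = end
        return self.images[indices], self.labels[indices]


# Ten blank 28 x 28 "images" with labels 0..9 stand in for real data.
train = ArrayDataSet(np.zeros((10, 28, 28), dtype=np.float32), np.arange(10))
batch_x, batch_y = train.next_batch(4)
print(batch_x.shape, batch_y.shape)
```

A training loop can then call train.next_batch(batch_size) in the same place where the tutorials call mnist.train.next_batch.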