Is there a way to save the results of a particular block of code in a notebook so that I don't have to run it again, and can continue with the rest of the code after reloading?
For example,
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
(train_images1, train_labels), (test_images1, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images1 / 255.0, test_images1 / 255.0
# My CNN model, up to the training
# Save up to here.
Can I save the execution up to this point for later use, that is, including the downloaded files and the trained model?
Save Model:
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model .......")
Load saved Model:
from tensorflow.keras.models import model_from_json

json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
loaded_model.load_weights("model.h5")
print("Loaded model...........")
For more details, you can find my implementation here. This will save both the dataset and the trained model.
There is! You can save your NumPy data using numpy.save("train_images.npy", train_images) and load them using train_images = numpy.load("train_images.npy"). When working with notebooks, just put save and load in two different cells and run whatever cell you need.
Documentation: numpy.save and numpy.load.
There are many variations like savez to save multiple arrays in an uncompressed file or savez_compressed for compressed files.
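For example, all four CIFAR-10 arrays from the question can go into one compressed file (a minimal sketch using the variable names from above):
import numpy as np
np.savez_compressed('cifar10.npz', train_images=train_images, train_labels=train_labels,
                    test_images=test_images, test_labels=test_labels)
# Later, in a fresh session:
data = np.load('cifar10.npz')
train_images, train_labels = data['train_images'], data['train_labels']
test_images, test_labels = data['test_images'], data['test_labels']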
I have a trained TF-Lite model (model.tflite) for image classification with several labels. The output of the model provides an array of probabilities, but I don't know the order of the labels.
Can I extract the labels from the TF model?
I think this might extract the metadata:
pip install tflite_support
import os
from tflite_support import metadata as _metadata
from tflite_support import metadata_schema_py_generated as _metadata_fb
model_file = <model_path>  # path to your .tflite file
displayer = _metadata.MetadataDisplayer.with_model_file(model_file)
export_json_file = os.path.splitext(model_file)[0] + ".json"
json_file = displayer.get_metadata_json()
with open(export_json_file, "w") as f:
    f.write(json_file)
The simplest thing to do is to dump the labels file from the TF Lite model file. That file is a zip archive, so just do this:
unzip mobilenet_v1_0.75_160_quantized_1_metadata_1.tflite
Archive: mobilenet_v1_0.75_160_quantized_1_metadata_1.tflite
extracting: labels.txt
The "labels.txt" file (or something similarly named) contains the list of labels for the model.
Reference (and more info on how to read TF Lite model metadata): https://www.tensorflow.org/lite/models/convert/metadata#read_the_associated_files_from_models
Note: A TF Lite model is not guaranteed to contain a labels file like this, but most publicly published models, such as ones on tfhub.dev, should have this metadata included.
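If you'd rather stay in Python, the standard zipfile module works too, since a .tflite file with metadata is itself a valid zip archive (a minimal sketch; the member name may differ per model):
import zipfile
with zipfile.ZipFile('mobilenet_v1_0.75_160_quantized_1_metadata_1.tflite') as z:
    print(z.namelist())      # list the associated files packed into the model
    z.extract('labels.txt')  # assumes the label file is named labels.txt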
I'm learning ML model training following this tutorial from TensorFlow. I have uploaded my own dataset from my computer to a folder named "sample_arrow" in Google Colab and specified the path to it:
image_path = 'sample_arrow'
The folder contains images, and its size is not 0. But I get an error when executing this line of code:
data = DataLoader.from_folder(image_path)
train_data, test_data = data.split(0.9)
ValueError: Image size is zero
What is wrong here? Maybe the folder path is not specified correctly? I'm completely new to the topic, unfamiliar with Python (I have Java skills), and would appreciate a detailed answer.
At last, I've found the solution.
The import of os and the correct path definition were missing:
import os
root_path = "/content/"
image_path = os.path.join(os.path.dirname(root_path), 'sample_arrow')
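If the error comes back, a quick sanity check is to confirm the path resolves before building the DataLoader (assuming the Colab layout above):
print(os.path.isdir(image_path))    # should print True
print(len(os.listdir(image_path)))  # should be greater than 0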
Use the following code in Google Colab:
import tensorflow as tf
data_path = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
from tflite_model_maker import image_classifier
from tflite_model_maker.image_classifier import DataLoader
# Load input data specific to an on-device ML app.
data = DataLoader.from_folder(data_path)
train_data, test_data = data.split(0.9)
# Customize the TensorFlow model.
model = image_classifier.create(train_data)
# Evaluate the model.
loss, accuracy = model.evaluate(test_data)
# Export to Tensorflow Lite model and label file in `export_dir`.
model.export(export_dir='/tmp/')
So you need to download the data first! It took me a while to find the solution!
I have a standard directory structure of train, validation, and test, and each contains class subdirectories.
...
|train
|class A
|1
|1_1.raw
|1_2.raw
...
|2
...
|class B
...
|test
...
I want to use the flow_from_directory API, but all I can find is an ImageDataGenerator, and the files I have are raw numpy arrays (generated with arr.tofile(...)).
Is there an easy way to use ImageDataGenerator with a custom file loader?
I'm aware of flow_from_dataframe, but that doesn't seem to accomplish what I want either; it's for reading images with more custom organization. I want a simple way to load raw binary files instead of having to re-encode 100,000s of files into jpgs with some precision loss along the way (and wasted time, etc.).
TensorFlow is an entire ecosystem with I/O capabilities, and ImageDataGenerator is one of its least flexible approaches. Read here on how to load NumPy data in TensorFlow.
import tensorflow as tf
import numpy as np
DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'
path = tf.keras.utils.get_file('mnist.npz', DATA_URL)
with np.load(path) as data:
    train_examples = data['x_train']
    train_labels = data['y_train']
    test_examples = data['x_test']
    test_labels = data['y_test']
train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
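If you'd rather keep the .raw files as-is instead of converting them, a tf.data pipeline can read them directly with NumPy inside a py_function (a sketch, not ImageDataGenerator; the shape, dtype, and glob pattern are assumptions to replace with your own):
import numpy as np
import tensorflow as tf

IMG_SHAPE = (64, 64, 3)  # assumption: the shape the arrays were saved with
IMG_DTYPE = np.float32   # assumption: the dtype passed to arr.tofile(...)

def parse_raw(path):
    # Read the flat binary written by arr.tofile(...) and restore its shape.
    def _read(p):
        return np.fromfile(p.numpy().decode(), dtype=IMG_DTYPE).reshape(IMG_SHAPE)
    image = tf.py_function(_read, [path], tf.float32)
    image.set_shape(IMG_SHAPE)
    # The class name is the directory two levels above the file in the tree above;
    # map it to an integer id (e.g. with tf.lookup.StaticHashTable) before training.
    label = tf.strings.split(path, '/')[-3]
    return image, label

dataset = (tf.data.Dataset.list_files('train/*/*/*.raw')
           .map(parse_raw, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32))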
I have a set of image files in a directory train_images = './data/images' and train_labels = './data/labels.csv'
For example, there are 1000 images in train_images, named 377.jpg, 17814.jpg, and so on, and the classes they correspond to are saved in a different CSV file.
EDIT - Here are a few rows from the CSV file:
   ID         Class
0  377.jpg    MIDDLE
1  17814.jpg  YOUNG
2  21283.jpg  MIDDLE
3  16496.jpg  YOUNG
4  4487.jpg   MIDDLE
Here ID is the image file name and Class is the class it is associated with.
I could have used the usual
ImageDataGenerator().flow_from_directory(train_images, class_mode='binary', batch_size=64)
but the problem is that the labels are in a CSV file. What I could do is rename all the files using os, put them into per-class directories, and then load them, but that seems clumsy and wasteful.
How can I load data in Keras for CNN where each image is of dimension (h,w,c)?
Here's my example using ImageDataGenerator with its flow_from_dataframe function, and pandas to read the CSV. The CSV I was using had two columns:
x_col="Image"
y_col="Id"
So the first column is the filename, e.g. xxxx.jpg, and the second column is the class. In this case, since it is from the Kaggle humpback whale challenge, the class is what kind of whale it is. The image files are in the directory "../input/humpback-whale-identification/train/"
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Conv2D, Flatten, Dropout, MaxPooling2D, BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import regularizers, optimizers
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
So read the CSV using pandas:
traindf = pd.read_csv('../input/humpback-whale-identification/train.csv', dtype=str)
Now using ImageDataGenerator:
datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)
train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="../input/humpback-whale-identification/train/",
    x_col="Image",
    y_col="Id",
    subset="training",
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode="categorical",
    target_size=(100, 100))
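The generator then plugs straight into training (a minimal sketch, assuming a compiled Keras model named model):
model.fit(train_generator,
          steps_per_epoch=train_generator.n // train_generator.batch_size,
          epochs=10)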
Sometimes the filename/ID in the CSV doesn't have an extension. In that case I used the following to add extensions to them:
def append_ext(fn):
    return fn + ".jpg"

traindf["Image"] = traindf["Image"].apply(append_ext)
Well hope that is helpful! It's my first try at answering a Q here :-)
The Kaggle dataset/challenge is here https://www.kaggle.com/c/humpback-whale-identification
Note: I've seen people doing this in all kinds of ways on kaggle! But this seems the easiest!
You can use pandas to read the CSV file as a DataFrame using the read_csv function:
import pandas as pd
df = pd.read_csv('csvfilename', delimiter=',')
Then use the flow_from_dataframe function of the ImageDataGenerator class.
There is a tutorial at this link
flow_from_dataframe(dataframe, directory=None, x_col='filename', y_col='class', weight_col=None, target_size=(256, 256), color_mode='rgb', classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None, save_prefix='', save_format='png', subset=None, interpolation='nearest', validate_filenames=True)
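Applied to the question's setup, the call would look something like this (a sketch; the ID/Class column names come from the CSV shown above):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)
train_generator = datagen.flow_from_dataframe(
    dataframe=df,
    directory='./data/images',
    x_col='ID',
    y_col='Class',
    class_mode='categorical',
    target_size=(256, 256),
    batch_size=64)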
I have training datasets xtrain, ytrain, xtest and ytest; they are all NumPy arrays. I want to save them together into a single file, so that I can load them into the workspace as done in keras for mnist.load_data:
(xtrain, ytrain), (xtest, ytest) = mnist.load_data(filepath)
In Python, is there any way to save my training datasets into such a single file? Or are there other appropriate methods to save them?
You have a number of options:
npz
hdf5
pickle
Keras provides the option to save models to hdf5. Also, note that out of the three, it's the only interoperable format.
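Since hdf5 is the interoperable choice, here is a minimal h5py sketch (assuming the four arrays from the question):
import h5py

# Save all four arrays into one file.
with h5py.File('datasets.h5', 'w') as f:
    for name, arr in [('xtrain', xtrain), ('ytrain', ytrain),
                      ('xtest', xtest), ('ytest', ytest)]:
        f.create_dataset(name, data=arr)

# Load them back.
with h5py.File('datasets.h5', 'r') as f:
    xtrain, ytrain = f['xtrain'][:], f['ytrain'][:]
    xtest, ytest = f['xtest'][:], f['ytest'][:]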
Pickle is a good way to go:
import pickle as pkl
# to save it (pickle requires binary mode)
with open("train.pkl", "wb") as f:
    pkl.dump([train_x, train_y], f)

# to load it
with open("train.pkl", "rb") as f:
    train_x, train_y = pkl.load(f)
If your dataset is huge, I would recommend checking out hdf5, as @Lukasz Tracewski mentioned.
I find hickle is a very nice way to save them all together into a dict:
import hickle as hkl
data = {'xtrain': xtrain, 'xtest': xtest, 'ytrain': ytrain, 'ytest': ytest}
hkl.dump(data, 'data.hkl')
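Loading is symmetric (a minimal sketch):
data = hkl.load('data.hkl')
xtrain, ytrain = data['xtrain'], data['ytrain']
xtest, ytest = data['xtest'], data['ytest']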
You could simply use numpy.save:
np.save('xtrain.npy', xtrain)
or, in a human-readable format:
np.savetxt('xtrain.txt', xtrain)
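Note that np.savetxt only handles 1-D and 2-D arrays, so a higher-dimensional image tensor would need flattening first (a caveat, with a hypothetical reshape):
np.savetxt('xtrain.txt', xtrain.reshape(len(xtrain), -1))  # flatten each sample to one row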