I have been going through Google's Machine Learning Crash Course, and am at the "First steps with TensorFlow" section. I wanted to run the examples on my machine, and keep getting an error that says:
ValueError: Could not find trained model in model_dir: C:\Users\Username\AppData
The folder at the end is different every time I run the script. So it's creating a directory for model_dir, but then puts nothing there, or puts my model there and it is deleted by the time the predict() method is called.
If I try to define model_dir in the estimator.LinearRegressor init method and set the checkpoint_path of the predict() method to the same directory, it tells me access is denied no matter where I point, in C or to C:\Users, etc.
I should also mention I am executing inside an Anaconda environment.
Any help is greatly appreciated!
import math
from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format
#LOAD Dataset
california_housing_dataframe = pd.read_csv("california_housing_train.csv", sep=",")
#Randomize data (to avoid ordering bias) and divide a column by 1000 to get to a learning rate we usually work with
california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))
california_housing_dataframe["median_house_value"] /= 1000.0
print(california_housing_dataframe) #print top and bottom 5 rows (see max rows 10 above)
#examine data briefly
# Define the input feature: total_rooms.
my_feature = california_housing_dataframe[["total_rooms"]]
# Configure a numeric feature column for total_rooms.
feature_columns = [tf.feature_column.numeric_column("total_rooms")]
# Define the label.
targets = california_housing_dataframe["median_house_value"]
# Use gradient descent as the optimizer for training the model.
# Set a learning rate of 0.0000001 for Gradient Descent.
my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0000001)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
# Configure the linear regression model with our feature columns and optimizer.
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)
def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    """Trains a linear regression model of one feature.
    Args:
      features: pandas DataFrame of features
      targets: pandas DataFrame of targets
      batch_size: Size of batches to be passed to the model
      shuffle: True or False. Whether to shuffle the data.
      num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely
    Returns:
      Tuple of (features, labels) for next data batch
    """
    # Convert pandas data into a dict of np arrays.
    features = {key:np.array(value) for key,value in dict(features).items()}
    # Construct a dataset, and configure batching/repeating
    ds = Dataset.from_tensor_slices((features,targets)) # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)
    # Shuffle the data, if specified
    if shuffle:
        ds = ds.shuffle(buffer_size=10000)
    # Return the next batch of data
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
_ = linear_regressor.train(
input_fn = lambda:my_input_fn(my_feature, targets),
# Create an input function for predictions.
# Note: Since we're making just one prediction for each example, we don't
# need to repeat or shuffle the data here.
prediction_input_fn =lambda: my_input_fn(my_feature, targets, num_epochs=1, shuffle=False)
# Call predict() on the linear_regressor to make predictions.
predictions = linear_regressor.predict(input_fn=prediction_input_fn)
# Format predictions as a NumPy array, so we can calculate error metrics.
predictions = np.array([item['predictions'][0] for item in predictions])
Full traceback:
WARNING:tensorflow:Using temporary folder as model directory: C:\Users\Username\
Traceback (most recent call last):
File "fstf.py", line 104, in <module>
predictions = np.array([item['predictions'][0] for item in predictions])
File "fstf.py", line 104, in <listcomp>
predictions = np.array([item['predictions'][0] for item in predictions])
File "C:\Users\Username\AppData\Local\conda\conda\envs\tensorflow\lib\site-pac
kages\tensorflow\python\estimator\estimator.py", line 471, in predict
ValueError: Could not find trained model in model_dir: C:\Users\Username\AppData
Because you didn't specify a model_dir parameter for the LinearRegressor, your trained model is saved in a system temporary directory, which is deleted or cleaned up by the system when your program completes.
So you should specify a model_dir parameter for LinearRegressor.
The __init__ function of LinearRegressor is that:
You can read the doc here
In terms of your code, you should change these code
linear_regressor = tf.estimator.LinearRegressor(
linear_regressor = tf.estimator.LinearRegressor(
Your program will then run successfully. Good luck!
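To see why the folder name changes on every run, here is a minimal sketch using only the standard library. It assumes the estimator falls back to something like tempfile.mkdtemp() when no model_dir is given; the helper stable_model_dir and the folder name linear_regressor_model are hypothetical names made up for illustration, not part of TensorFlow.

```python
import os
import tempfile


def stable_model_dir(base_dir, name="linear_regressor_model"):
    """Return a fixed directory under base_dir, creating it if needed."""
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)
    return path


# Each call to mkdtemp() creates a new, uniquely named folder, which would
# explain why the path in the warning differs on every run.
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
print(first != second)

# A fixed location gives the same path every time it is requested.
base = tempfile.mkdtemp()  # stand-in for a folder you are allowed to write to
print(stable_model_dir(base) == stable_model_dir(base))
```

The path returned by a helper like this is what you would pass as model_dir, provided the base folder is one your user account can write to.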
I also hit this problem, and I solved it by adding this code:
input_fn = lambda:my_input_fn(my_feature, targets),
because you missed Step 5: Train the Model
You should set eval_steps to 1 (or fewer),
and set eval_batch_size to the size of the whole evaluation set or larger.
If evaluation runs for many steps, the checkpoint life cycle gets in the way:
by default only the last 5 .ckpt files are kept (you can customize this),
so the checkpoint may already be gone when the next step's batch is evaluated,
and it will raise an error:
ValueError: Could not find trained model in model_dir: {your_model_dir}.
more detail:
- https://www.tensorflow.org/api_docs/python/tf/estimator/RunConfig
- https://github.com/colinwke/wide_deep_demo
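As a rough illustration of the "keep only the last 5" behaviour described above, here is a toy simulation in plain Python. It is not TensorFlow code; the function simulate_checkpoints and the step values are made up, and keep_max only mimics what the keep_checkpoint_max setting of RunConfig is described as doing.

```python
from collections import deque


def simulate_checkpoints(save_steps, keep_max=5):
    """Return the checkpoint names that survive when only keep_max are kept."""
    kept = deque(maxlen=keep_max)  # the oldest entry is dropped automatically
    for step in save_steps:
        kept.append("model.ckpt-%d" % step)
    return list(kept)


# Ten checkpoints are written, one every 100 steps.
saved = simulate_checkpoints(range(100, 1100, 100))
print(saved)
# An evaluation that still expects the first checkpoint would not find it.
print("model.ckpt-100" in saved)
```

If a long evaluation is still holding on to an early checkpoint while training keeps saving new ones, that early checkpoint can be among those already removed.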
I'm trying to train a TF estimator using the RNNEstimator() class, but I'm having trouble with defining the estimator. My goal is the following:
Create a tf.data.Dataset.
Feed it into the RNN estimator.
The first part seems to be working correctly. I define the input function as follows:
def _parse_func(record):
    # takes tf record as input and returns the following tensors
    # numeric_tensor.shape = (5,170) and y.shape=()
    return {'numerical': numeric_tensor,}, y
def input_fn(filenames=['data.tfrecord']):
    # Returns parsed tf record i.e. the tf.data.Dataset
    dataset = tf.data.TFRecordDataset(filenames=filenames)
    dataset = dataset.map(map_func=_parse_func)
    dataset = dataset.repeat()
    dataset = dataset.batch(batch_size=BATCH_SIZE)
    return dataset
Now let's move onto the meaty part.
Estimators take care of creating the session and graph. So I simply create the estimator in the following format:
# create the column
column = tf.contrib.feature_column.sequence_numeric_column('numerical')
# create the estimator
estimator = RNNEstimator(
num_units=[32, 16], cell_type='lstm')
# train the estimator
estimator.train(input_fn=input_fn, steps=100)
However, this doesn't work. It gives me a variety of errors! In particular, at the moment I get:
TypeError: Input must be a SparseTensor.
Additionally, I seem to be unable to change the loss to log-loss. I tried setting it by passing it to the head parameter using:
head = tf.contrib.estimator.regression_head(loss_fn=tf.losses.log_loss)
I am trying to make a classifier that learns whether a movie review was positive or negative from its contents. I am using a couple of relevant files: a file of the total vocabulary (one word per line) across every document, two CSVs (one for the training set, one for the testing set) containing the score each document got, in a specific order, and two CSVs (same split as above) where each line holds the indices of the words that appear in that review, treating the vocab as a list. So a review like "I liked this movie" would have a score line of 1 (0: dislike, 1: like) and a word line of [2,13,64,33]. I use the DNNClassifier and currently use one feature, which is an embedding column wrapped around a categorical_column_with_identity. My code runs, but it gives absolutely terrible results and I'm not sure why. Perhaps someone with more knowledge about TensorFlow could help me out. Also, I don't post here much, but I honestly searched and couldn't find a post that directly helps me.
import tensorflow as tf
import pandas as pd
import numpy as np
import os
embedding_d = 18
label_name = ['Label']
col_name = ["Words"]
hidden_unit = [10]*5
BATCH = 50
STEP = 5000
#Ignore some warning messages but an optional compiler
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
##Function to feed into training
def train_input_fn(features, labels, batch_size):
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    # Return the dataset.
    return dataset
##Orignal Eval. Untouched so far. Mostly likely will need to be changed.
def eval_input_fn(features, labels, batch_size):
    """An input function for evaluation or prediction"""
    if labels is None:
        # No labels, use only features.
        inputs = features
    else:
        inputs = (features, labels)
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    # Batch the examples
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)
    # Return the dataset.
    return dataset
## Produces dataframe for labels and features(words) using pandas
def loadData():
    train_label = pd.read_csv("aclImdb/train/yaynay.csv", names=label_name)
    test_label = pd.read_csv("aclImdb/test/yaynay.csv", names=label_name)
    train_feat = pd.read_csv("aclImdb/train/set.csv", names=col_name)
    test_feat = pd.read_csv("aclImdb/test/set.csv", names=col_name)
    train_feat[col_name] = train_feat[col_name].astype(np.int64)
    test_feat[col_name] = test_feat[col_name].astype(np.int64)
    return (train_feat, train_label), (test_feat, test_label)
## Stuff that I believe is somewhat working
# Get labels for test and training data
(train_x,train_y), (test_x,test_y) = loadData()
## Get the features for each document
train_feature = []
#Currently only one key but this could change in the future
for key in train_x.keys():
#Create a categorical_column column
idCol = tf.feature_column.categorical_column_with_identity(
key= key,
embedding_column = tf.feature_column.embedding_column(
categorical_column= idCol,
##Create the neural network
classifier = tf.estimator.DNNClassifier(
# Species no. of layers and no. of neurons in each layer
# Number of output options(here there are 11 for scores 0-10 inclusive)
n_classes= 2)
# Train the Model
#First numerical value is batch size, second is total steps to take.
classifier.train(input_fn= lambda: train_input_fn(train_x, train_y, BATCH),steps=STEP)
#Evaluate the model
eval_result = classifier.evaluate(
input_fn=lambda:eval_input_fn(test_x, test_y,
BATCH), steps = STEP)
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
I am trying to fine-tune inceptionv3 model using slim tensorflow library.
I am unable to understand certain things while writing the code for it. I tried to read the source code (there is no proper documentation) and figured out a few things, and I am able to fine-tune the model and save the checkpoint. Here are the steps I followed:
1. I created a tf.record for my training data which is fine, now I am reading the data using the below code.
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
import tensorflow.contrib.slim as slim
import matplotlib.pyplot as plt
import numpy as np
# get the data and labels here
data_path = '/home/sfarkya/nvidia_challenge/datasets/detrac/train1.tfrecords'
# Training setting
num_epochs = 100
initial_learning_rate = 0.0002
learning_rate_decay_factor = 0.7
num_epochs_before_decay = 5
num_classes = 5980
# load the checkpoint
model_path = '/home/sfarkya/nvidia_challenge/datasets/detrac/inception_v3.ckpt'
# log directory
log_dir = '/home/sfarkya/nvidia_challenge/datasets/detrac/fine_tuned_model'
with tf.Session() as sess:
feature = {'train/image': tf.FixedLenFeature([], tf.string),
'train/label': tf.FixedLenFeature([], tf.int64)}
# Create a list of filenames and pass it to a queue
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
# Define a reader and read the next record
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
# Decode the record read by the reader
features = tf.parse_single_example(serialized_example, features=feature)
# Convert the image data from string back to the numbers
image = tf.decode_raw(features['train/image'], tf.float32)
# Cast label data into int32
label = tf.cast(features['train/label'], tf.int32)
# Reshape image data into the original shape
image = tf.reshape(image, [128, 128, 3])
# Creates batches by randomly shuffling tensors
images, labels = tf.train.shuffle_batch([image, label], batch_size=64, capacity=128, num_threads=2,
Now I am finetuning the model using slim and this is the code.
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
# Create a coordinator and run all QueueRunner objects
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
# load model
# load the inception model from the slim library - we are using inception v3
#inputL = tf.placeholder(tf.float32, (64, 128, 128, 3))
img, lbl = sess.run([images, labels])
one_hot_labels = slim.one_hot_encoding(lbl, num_classes)
with slim.arg_scope(slim.nets.inception.inception_v3_arg_scope()):
logits, inceptionv3 = nets.inception.inception_v3(inputs=img, num_classes=5980, is_training=True,
# Restore convolutional layers:
variables_to_restore = slim.get_variables_to_restore(exclude=['InceptionV3/Logits', 'InceptionV3/AuxLogits'])
init_fn = slim.assign_from_checkpoint_fn(model_path, variables_to_restore)
# loss function
loss = tf.losses.softmax_cross_entropy(onehot_labels=one_hot_labels, logits = logits)
total_loss = tf.losses.get_total_loss()
# train operation
train_op = slim.learning.create_train_op(total_loss + loss, optimizer= tf.train.AdamOptimizer(learning_rate=1e-4))
print('Im here')
# Start training.
slim.learning.train(train_op, log_dir, init_fn=init_fn, save_interval_secs=20, number_of_steps= 10)
Now I have a few questions about the code that I am unable to figure out. Once the code reaches slim.learning.train I don't see anything being printed; however, it is training, as I can see in the log. Now,
1. How do I give the number of epochs to the code? Right now it's running step by step, with each step using batch_size = 64.
2. How do I make sure that tf.train.shuffle_batch is not repeating my images and that I am training over the whole dataset?
3. How can I print the loss values while it's training?
Here are answers to your questions.
You cannot give epochs directly to slim.learning.train. Instead, you give the number of batches as the argument. It is called number_of_steps. It is used to set an operation called should_stop_op on line 709. I assume you know how to convert number of epochs to batches.
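For completeness, the conversion from epochs to steps could look like the sketch below. The helper name epochs_to_steps and the dataset size of 10000 examples are assumptions for illustration; only the batch size of 64 and the 100 epochs come from the question. It rounds each epoch up to a whole number of batches, which slightly overcounts when batches are allowed to span epoch boundaries.

```python
import math


def epochs_to_steps(num_examples, batch_size, num_epochs):
    """Number of training steps needed to see the data num_epochs times."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


# Hypothetical dataset of 10000 examples, batch_size=64, 100 epochs.
number_of_steps = epochs_to_steps(10000, 64, 100)
print(number_of_steps)  # 157 steps per epoch * 100 epochs = 15700
```

The result is the value you would pass as number_of_steps.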
I don't think the shuffle_batch function will repeat images because internally it uses the RandomShuffleQueue. According to this answer, the RandomShuffleQueue enqueues elements using a background thread as:
While size(queue) < capacity:
    Add an element to the queue
It dequeues elements as:
While the number of elements dequeued < batch_size:
    Wait until the size(queue) >= min_after_dequeue + 1 elements.
    Select an element from the queue uniformly at random, remove it from the queue, and add it to the output batch.
So in my opinion, there is very little chance that the elements would be repeated, because in the dequeuing operation, the chosen element is removed from the queue. So it is sampling without replacement.
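The enqueue/dequeue behaviour described above can be imitated with a small single-threaded toy in plain Python. This is only a sketch of the idea, not how TensorFlow implements RandomShuffleQueue: the function shuffle_batches is made up, there are no background threads, min_after_dequeue is left out (the queue is always refilled before each pick), and the final partial batch is kept rather than dropped.

```python
import random


def shuffle_batches(stream, batch_size, capacity, seed=0):
    """Yield shuffled batches from stream using a bounded random queue."""
    rng = random.Random(seed)
    source = iter(stream)
    queue = []
    batch = []
    while True:
        # Enqueue side: top the queue up to capacity while input remains.
        for item in source:
            queue.append(item)
            if len(queue) >= capacity:
                break
        if not queue:
            break  # input exhausted and queue drained
        # Dequeue side: pick uniformly at random and remove the element,
        # so the same element cannot be picked twice.
        batch.append(queue.pop(rng.randrange(len(queue))))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch


batches = list(shuffle_batches(range(10), batch_size=4, capacity=5))
flat = [x for b in batches for x in b]
print(batches)
print(sorted(flat) == list(range(10)))  # every element appears exactly once
```

Because each chosen element is removed from the queue, one pass over the input yields every example exactly once, just in a locally shuffled order.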
Will a new queue be created for every epoch?
The tensors being inputted to tf.train.shuffle_batch are image and label which ultimately come from the filename_queue. If that queue is producing TFRecord filenames indefinitely, then I don't think a new queue will be created by shuffle_batch. You can also create a toy code like this to understand how shuffle_batch works.
Coming to the next point, how to train over the whole dataset? In your code, the following line gets the list of TFRecord filenames.
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
If filename_queue covers all TFRecords that you have, then you are surely training over the entire dataset. Now, how to shuffle the entire dataset is another question. As mentioned here by @mrry, there is no support (yet, AFAIK) to shuffle out-of-memory datasets. So the best way is to prepare many shards of your dataset such that each shard contains about 1024 examples. Shuffle the list of TFRecord filenames as:
filename_queue = tf.train.string_input_producer([data_path], shuffle=True, capacity=1000)
Note that I removed the num_epochs = 1 argument and set shuffle=True. This way it will produce the shuffled list of TFRecord filenames indefinitely. Now on each file, if you use tf.train.shuffle_batch, you will get a near-to-uniform shuffling. Basically, as the number of examples in each shard tends to 1, your shuffling will get more and more uniform. I like to not set num_epochs and instead terminate the training using the number_of_steps argument mentioned earlier.
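A rough sketch of the sharding idea in plain Python follows. The helper make_shards and the total of 5000 examples are hypothetical; only the shard size of about 1024 comes from the suggestion above. It computes which example index ranges would go into each shard and then shuffles the shard order, which is the role string_input_producer with shuffle=True plays for the filenames.

```python
import random


def make_shards(num_examples, shard_size=1024):
    """Split example indices into consecutive (start, end) shard ranges."""
    return [(start, min(start + shard_size, num_examples))
            for start in range(0, num_examples, shard_size)]


shards = make_shards(5000)  # hypothetical dataset of 5000 examples
print(shards)

# Shuffle the order in which shards are read; examples inside each shard
# would then be mixed further by a shuffle queue.
order = list(range(len(shards)))
random.Random(0).shuffle(order)
print(order)
```

Smaller shards give the shard-level shuffle more influence, which is why the shuffling gets closer to uniform as shards shrink.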
To print the loss values, you could probably just edit the training.py and introduce logging.info('total loss = %f', total_loss). I don't know if there is any simpler way. Another way without changing the code is to view summaries in Tensorboard.
There are very helpful articles on how to view summaries in Tensorboard, including the link at the end of this answer. Generally, you need to do the following things.
1. Create summary object.
2. Write variables of interest into summary.
3. Merge all individual summaries.
4. Create a summary op.
5. Create a summary file writer.
6. Write the summaries throughout the training at a desired frequency.
Now steps 5 and 6 are already done automatically for you if you use slim.learning.train.
For first 4 steps, you could check the file train_image_classifier.py. Line 472 shows you how to create a summaries object. Lines 490, 512 and 536 write the relevant variables into summaries. Line 549 merges all summaries and the line 553 creates an op. You can pass this op to slim.learning.train and you can also specify how frequently you want to write summaries. In my opinion, do not write anything apart from loss, total_loss, accuracy and learning rate into the summaries, unless you want to do specific debugging. If you write histograms, then the tensorboard file could take tens of hours to load for networks like ResNet-50 (my tensorboard file once was 28 GB, which took 12 hours to load the progress of 6 days!). By the way, you could actually use train_image_classifier.py file to finetune and you will skip most of the steps above. However, I prefer this as you get to learn a lot of things.
See the launching tensorboard section on how to view the progress in a browser.
Additional remarks:
Instead of minimizing total_loss + loss, you could do the following:
loss = tf.losses.softmax_cross_entropy(onehot_labels=one_hot_labels, logits = logits)
total_loss = tf.losses.get_total_loss()
train_op = slim.learning.create_train_op(total_loss, optimizer=tf.train.AdamOptimizer(learning_rate=1e-4))
I found this post to be very useful when I was learning Tensorflow.
I have generated some input data in a CSV where 'awesomeness' is 'age * 10'. It looks like this:
age, awesomeness
67, 670
38, 380
32, 320
69, 690
40, 400
It should be trivial to write a tensorflow model that can predict 'awesomeness' from 'age', but I can't make it work.
When I run training, the output I get is:
accuracy: 0.0 <----------------------------------- What!??
accuracy/baseline_label_mean: 443.8
accuracy/threshold_0.500000_mean: 0.0
auc: 0.0
global_step: 6000
labels/actual_label_mean: 443.8
labels/prediction_mean: 1.0
loss: -2.88475e+09
precision/positive_threshold_0.500000_mean: 1.0
recall/positive_threshold_0.500000_mean: 1.0
Please note that this is obviously a completely contrived example, but that is because I was getting the same result with a more complex meaningful model with a much larger data set; 0% accuracy.
This is my attempt at the most minimal possible reproducible test case that I can make which exhibits the same behaviour.
Here's what I'm doing, based on the census example for the DNNClassifier from tflearn:
COLUMNS = ["age", "awesomeness"]
OUTPUT_COLUMN = "awesomeness"
def build_estimator(model_dir):
    """Build an estimator."""
    age = tf.contrib.layers.real_valued_column("age")
    deep_columns = [age]
    m = tf.contrib.learn.DNNClassifier(model_dir=model_dir,
                                       feature_columns=deep_columns,
                                       hidden_units=[50, 10])
    return m
def input_fn(df):
    """Input builder function."""
    feature_cols = {k: tf.constant(df[k].values, shape=[df[k].size, 1]) for k in CONTINUOUS_COLUMNS}
    output = tf.constant(df[OUTPUT_COLUMN].values, shape=[df[OUTPUT_COLUMN].size, 1])
    return feature_cols, output
def train_and_eval(model_dir, train_steps):
    """Train and evaluate the model."""
    train_file_name, test_file_name = training_data()
    df_train = pd.read_csv(...) # omitted for clarity
    df_test = pd.read_csv(...)
    m = build_estimator(model_dir)
    m.fit(input_fn=lambda: input_fn(df_train), steps=train_steps)
    results = m.evaluate(input_fn=lambda: input_fn(df_test), steps=1)
    for key in sorted(results):
        print("%s: %s" % (key, results[key]))
def training_data():
    """Return path to the training and test data"""
    training_datafile = path.join(path.dirname(__file__), 'data', 'data.training')
    test_datafile = path.join(path.dirname(__file__), 'data', 'data.test')
    return training_datafile, test_datafile
model_folder = 'scripts/model' # Where to store the model
train_steps = 2000 # How many iterations to run while training
train_and_eval(model_folder, train_steps)
A couple of notes:
The original example tutorial this is based on is here https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py
Notice I am using the DNNClassifier, not the LinearClassifier as I want specifically to deal with continuous input variables.
A lot of examples just use 'premade' data sets which are known to work with examples; my data set has been manually generated and is absolutely not random.
I have verified the csv loader is loading the data correctly as int64 values.
Training and test data are generated identically, but have different values in them; however, using data.training as the test data still returns a 0% accuracy, so there's no question that something isn't working, this isn't just over-fitting.
First of all, you are describing a regression task, not a classification task. Therefore, both DNNClassifier and LinearClassifier would be the wrong thing to use. That also makes accuracy the wrong quantity for telling whether your model works. I suggest you read up on these two different settings, e.g. in the book "The Elements of Statistical Learning".
But here is a short answer to your problem. Say you have a linear model
awesomeness_predicted = slope * age
where slope is the parameter you want to learn from data. Let's say you have data age[0], ..., age[N-1] and the corresponding awesomeness values a_data[0], ..., a_data[N-1]. To measure how well your model works, we are going to use the mean squared error, that is
error = sum((a_data[i] - a_predicted[i])**2 for i in range(N)) / N
What you want to do now is start with a random guess for slope and gradually improve it using gradient descent. Here is a full working example in pure TensorFlow
import tensorflow as tf
import numpy as np
DTYPE = tf.float32
## Generate Data
age = np.array([67, 38, 32, 69, 40])
awesomeness = 10 * age
## Generate model
# define the parameter of the model
slope = tf.Variable(initial_value=tf.random_normal(shape=(1,), dtype=DTYPE))
# define the data inputs to the model as variable size tensors
x = tf.placeholder(DTYPE, shape=(None,))
y_data = tf.placeholder(DTYPE, shape=(None,))
# specify the model
y_pred = slope * x
# use mean squared error as loss function
loss = tf.reduce_mean(tf.square(y_data - y_pred))
target = tf.train.AdamOptimizer().minimize(loss)
## Train Model
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # initialize the variables before the first training step
    sess.run(init)
    for epoch in range(100000):
        _, training_loss = sess.run([target, loss],
                                    feed_dict={x: age, y_data: awesomeness})
    print("Training loss: ", training_loss)
    print("Found slope=", sess.run(slope))
There are a few things I would like to say.
Assuming you load the data correctly:
-This looks like a regression task and you are using a classifier. I'm not saying it doesn't work at all, but this way you are giving a label to each entry of age, and training on the whole batch each epoch is very unstable.
-You are getting a huge value for the loss; your gradients are exploding. With this toy dataset you probably need to tune hyperparameters like the number of hidden neurons, the learning rate and the number of epochs. Try to log the loss value for each epoch and see if that may be the problem.
-Last suggestion: make your data work with a simpler model suited to your task, like a regression model, and then scale up.
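To make the point about logging the loss and tuning the learning rate concrete, here is a small sketch in plain Python (no TensorFlow) that fits the single-parameter model awesomeness = slope * age with gradient descent on the mean squared error. The function fit_slope and the two learning rates are choices made for this illustration; the five data points are the ones from the question.

```python
def fit_slope(xs, ys, learning_rate, steps):
    """Gradient descent on mean squared error for y = slope * x."""
    slope = 0.0
    n = len(xs)
    losses = []  # loss recorded before each update, for inspection
    for _ in range(steps):
        errors = [slope * x - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        grad = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        slope -= learning_rate * grad
    return slope, losses


age = [67, 38, 32, 69, 40]
awesomeness = [10 * a for a in age]

# A small learning rate converges towards slope = 10.
good_slope, good_losses = fit_slope(age, awesomeness, 1e-4, 50)
print(good_slope, good_losses[-1])

# A learning rate that is too large makes the loss grow at every step.
bad_slope, bad_losses = fit_slope(age, awesomeness, 1e-3, 20)
print(bad_losses[0], bad_losses[-1])
```

With unscaled inputs of this size, the range of usable learning rates is narrow, so watching the logged loss is a quick way to tell convergence from divergence.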
See also https://github.com/tflearn/tflearn/blob/master/examples/basics/multiple_regression.py for using tflearn to solve this.
""" Multiple Regression/Multi target Regression Example
The input features have 10 dimensions, and target features are 2 dimensions.
"""
from __future__ import absolute_import, division, print_function
import tflearn
import numpy as np
# Regression data- 10 training instances
#10 input features per instance.
#2 output features per instance
# Multiple Regression graph, 10-d input layer
input_ = tflearn.input_data(shape=[None,10])
#10-d fully connected layer
r1 = tflearn.fully_connected(input_,10)
#2-d fully connected layer for output
r1 = tflearn.fully_connected(r1,2)
r1 = tflearn.regression(r1, optimizer='sgd', loss='mean_square',
metric='R2', learning_rate=0.01)
m = tflearn.DNN(r1)
m.fit(X,Y, n_epoch=100, show_metric=True, snapshot_epoch=False)
#Predict for 1 instance
print("\nInput features: ",testinstance)
print("\n Predicted output: ")
I am looking at the TF-Slim introductory document and, from what I understand, it only takes in one batch of image data at each run (32 images). Obviously, one wants to loop through this and train on many different batches. The intro does not cover this. How can this be done properly? I imagine there should be some way to specify a load-batch function that is called automatically when starting a batch training event, but I can't seem to find a simple example of this in the intro.
# Note that this may take several minutes.
import os
from datasets import flowers
from nets import inception
from preprocessing import inception_preprocessing
slim = tf.contrib.slim
image_size = inception.inception_v1.default_image_size
def get_init_fn():
    """Returns a function run by the chief worker to warm-start the training."""
    checkpoint_exclude_scopes=["InceptionV1/Logits", "InceptionV1/AuxLogits"]
    exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]
    variables_to_restore = []
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
        if not excluded:
            variables_to_restore.append(var)
    return slim.assign_from_checkpoint_fn(
        os.path.join(checkpoints_dir, 'inception_v1.ckpt'),
        variables_to_restore)
train_dir = '/tmp/inception_finetuned/'
with tf.Graph().as_default():
dataset = flowers.get_split('train', flowers_data_dir)
images, _, labels = load_batch(dataset, height=image_size, width=image_size)
# Create the model, use the default arg scope to configure the batch norm parameters.
with slim.arg_scope(inception.inception_v1_arg_scope()):
logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)
# Specify the loss function:
one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)
slim.losses.softmax_cross_entropy(logits, one_hot_labels)
total_loss = slim.losses.get_total_loss()
# Create some summaries to visualize the training process:
tf.scalar_summary('losses/Total Loss', total_loss)
# Specify the optimizer and create the train op:
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = slim.learning.create_train_op(total_loss, optimizer)
# Run the training:
final_loss = slim.learning.train(
print('Finished training. Last batch loss %f' % final_loss)
The slim.learning.train function contains a training loop, so the code you've given does in fact train on multiple batches of images.
See here in the source code, where train_step_fn is called within a while loop. train_step (the default value of train_step_fn) contains the line sess.run([train_op, global_step]...), which actually runs the training operation on a single batch of images.
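Schematically, the structure described above looks something like the following plain-Python sketch. It is not the actual slim source: the functions train and toy_train_step are made up, and the real loop also handles sessions, summaries and checkpoint saving. It only shows that a single call runs the step function once per batch until the step limit is reached.

```python
def train(train_step_fn, number_of_steps):
    """Call train_step_fn once per step and return the last loss."""
    global_step = 0
    total_loss = None
    while global_step < number_of_steps:
        # In slim, this is where the train op is run on one batch.
        total_loss = train_step_fn(global_step)
        global_step += 1
    return total_loss


def toy_train_step(step):
    """Stand-in for one training step; returns a made-up decreasing loss."""
    return 1.0 / (step + 1)


final_loss = train(toy_train_step, number_of_steps=4)
print('Finished training. Last batch loss %f' % final_loss)
```

Each pass through the while loop corresponds to one batch, so the number of batches trained on is controlled by number_of_steps rather than by an outer loop in your own code.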