I am using tf.keras.preprocessing.image_dataset_from_directory to get a BatchDataset, where the dataset has 10 classes.
I am trying to integrate this BatchDataset with a Keras VGG16 (docs) network. From the docs:
Note: each Keras Application expects a specific kind of input preprocessing. For VGG16, call tf.keras.applications.vgg16.preprocess_input on your inputs before passing them to the model.
However, I am struggling to get this preprocess_input working with a BatchDataset. Can you please help me figure out how to connect the two?
Please see the code below:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(train_data_dir, image_size=(224, 224))
train_ds = tf.keras.applications.vgg16.preprocess_input(train_ds)
This will throw TypeError: 'BatchDataset' object is not subscriptable:
Traceback (most recent call last):
...
File "/path/to/venv/lib/python3.10/site-packages/keras/applications/vgg16.py", line 232, in preprocess_input
return imagenet_utils.preprocess_input(
File "/path/to/venv/lib/python3.10/site-packages/keras/applications/imagenet_utils.py", line 117, in preprocess_input
return _preprocess_symbolic_input(
File "/path/to/venv/lib/python3.10/site-packages/keras/applications/imagenet_utils.py", line 278, in _preprocess_symbolic_input
x = x[..., ::-1]
TypeError: 'BatchDataset' object is not subscriptable
Based on TypeError: 'DatasetV1Adapter' object is not subscriptable (from the question "BatchDataset not subscriptable when trying to format Python dictionary as table"), the suggestion was to use:
train_ds = tf.keras.applications.vgg16.preprocess_input(
list(train_ds.as_numpy_iterator())
)
However, this also fails:
Traceback (most recent call last):
...
File "/path/to/venv/lib/python3.10/site-packages/keras/applications/vgg16.py", line 232, in preprocess_input
return imagenet_utils.preprocess_input(
File "/path/to/venv/lib/python3.10/site-packages/keras/applications/imagenet_utils.py", line 117, in preprocess_input
return _preprocess_symbolic_input(
File "/path/to/venv/lib/python3.10/site-packages/keras/applications/imagenet_utils.py", line 278, in _preprocess_symbolic_input
x = x[..., ::-1]
TypeError: list indices must be integers or slices, not tuple
This is all using Python==3.10.3 with tensorflow==2.8.0.
How can I get this working? Thank you in advance.
Okay I figured it out. I needed to pass a tf.Tensor, not a tf.data.Dataset. One can get a Tensor out by iterating over the Dataset.
This can be done in a few ways:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(...)
# Option 1: preprocess a single batch
batch_images = next(iter(train_ds))[0]
preprocessed_images = tf.keras.applications.vgg16.preprocess_input(batch_images)

# Option 2: preprocess every batch in a loop
for batch_images, batch_labels in train_ds:
    preprocessed_images = tf.keras.applications.vgg16.preprocess_input(batch_images)
If you convert option 2 into a generator, it can be passed directly into the downstream model.fit. Cheers!
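A related pattern worth mentioning (not part of the original answer, just a sketch of a common alternative): map the preprocessing over the Dataset itself, so the resulting dataset can be fed straight to model.fit:
import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(train_data_dir, image_size=(224, 224))

# apply VGG16 preprocessing to each batch of images, leaving the labels untouched
train_ds = train_ds.map(
    lambda images, labels: (tf.keras.applications.vgg16.preprocess_input(images), labels)
)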
Related
My model is designed to train on pairs of images. Since the dataset is very large, I used the tf.data.Dataset API to get the data in batches, as suggested here. However, I am having difficulty properly feeding a batch of images for training. I looked up some possible solutions, to no avail. Here is the code after those modifications:
ds_train = tf.data.Dataset.zip((tr_inputs, tr_labels)).batch(64)
iterator = ds_train.make_one_shot_iterator()
next_batch = iterator.get_next()
result = list()
with tf.Session() as sess:
    try:
        while True:
            result.append(sess.run(next_batch))
    except tf.errors.OutOfRangeError:
        pass
train_examples = np.array(list(zip(*result))[0]) # tr_examples[0][0].shape (64, 224, 224, 3)
val_examples = np.array(list(zip(*val_result))[0]) # val_examples[0][0].shape (64, 224, 224, 3)
The training code snippet is as follows:
hist = base_model.fit((tr_examples[0][0], tr_examples[0][1]), epochs=epochs, verbose=1,
                      validation_data=(val_examples[0][0], val_examples[0][1]), shuffle=True)
And the error trace:
Traceback (most recent call last):
File "/home/user/00_files/project/DOUBLE_INPUT/dual_input.py", line 177, in <module>
validation_data=(val_examples[0][0], val_examples[0][1]), shuffle=True)
File "/home/user/.local/lib/python3.5/site-packages/keras/engine/training.py", line 955, in fit
batch_size=batch_size)
File "/home/user/.local/lib/python3.5/site-packages/keras/engine/training.py", line 754, in _standardize_user_data
exception_prefix='input')
File "/home/user/.local/lib/python3.5/site-packages/keras/engine/training_utils.py", line 90, in standardize_input_data
data = [standardize_single_array(x) for x in data]
File "/home/user/.local/lib/python3.5/site-packages/keras/engine/training_utils.py", line 90, in <listcomp>
data = [standardize_single_array(x) for x in data]
File "/home/user/.local/lib/python3.5/site-packages/keras/engine/training_utils.py", line 25, in standardize_single_array
elif x.ndim == 1:
AttributeError: 'tuple' object has no attribute 'ndim'
Looking at the shapes of inputs (in the code snippets' comments), it should work. I guess there is only one step left, but I am not sure what is missing.
I am using python 3.5, keras 2.2.0, tensorflow-gpu 1.9.0 on Ubuntu 16.04.
Help is much appreciated.
EDIT: after correcting the parentheses, it threw this error:
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays: [array([[[[0.9607844 , 0.9607844 , 0.9607844 ],
[0.9987745 , 0.9987745 , 0.9987745 ],
[0.9960785 , 0.9960785 , 0.9960785 ],
...,
[0.9609069 , 0.9609069 , 0.96017164...
Process finished with exit code 1
hist = base_model.fit((tr_examples[0][0], tr_examples[0][1]), epochs=epochs, verbose=1,
                      validation_data=(val_examples[0][0], val_examples[0][1]), shuffle=True)
should be:
hist = base_model.fit(tr_examples[0][0], tr_examples[0][1], epochs=epochs, verbose=1,
                      validation_data=(val_examples[0][0], val_examples[0][1]), shuffle=True)
Note that while the validation_data parameter expects a tuple, the training input/label pair should not be passed as a single tuple (i.e., remove the parentheses).
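Separately, if base_model really takes two image inputs (which the later "Expected to see 2 array(s)" error suggests), Keras expects those inputs as a list of arrays rather than a tuple. A minimal sketch with hypothetical arrays x1, x2, y, val_x1, val_x2, val_y:
# all names here are illustrative: x1/x2 are the two image batches, y the labels
hist = base_model.fit([x1, x2], y, epochs=epochs, verbose=1,
                      validation_data=([val_x1, val_x2], val_y), shuffle=True)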
This is the error message I get. In the first line, I print the shapes of predicted and target. From my understanding, the error should arise from those shapes not being the same, but here they clearly are.
torch.Size([6890, 3]) torch.Size([6890, 3])
Traceback (most recent call last):
File "train.py", line 251, in <module>
main()
File "train.py", line 230, in main
train(net, training_dataset, targets, device, criterion, optimizer, epoch, args.epochs)
File "train.py", line 101, in train
loss = criterion(predicted, target.detach().cpu().numpy())
File "/home/hb119056/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/hb119056/.local/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 443, in forward
return F.mse_loss(input, target, reduction=self.reduction)
File "/home/hb119056/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 2244, in mse_loss
if not (target.size() == input.size()):
TypeError: 'int' object is not callable
I hope all the relevant context information is provided and if not, please let me know. Thanks for any suggestions!
EDIT: This is the part of the code where this error occurs:
target = torch.from_numpy(np.load(file_dir + '/points/points{:03}.npy'.format(i))).to(device)
rv = torch.zeros(12 * outputs.shape[0])
for j in [x for x in range(10) if x != i]:
    source = torch.from_numpy(np.load(file_dir + '/points/points{:03}.npy'.format(j))).to(device)
    rv = factor.ransac(source, target, prob, n_iter, tol, device)  # some self-written RANSAC-like method
    predicted = factor.predict(source, rv, outputs)
print(target.shape, predicted.shape)
loss = criterion(predicted, target.detach().cpu().numpy())  ## error occurs here
criterion is nn.MSELoss().
A little bit late, but maybe it will help someone else. I just solved the same problem myself.
As Alpha said in his answer, we cannot call .size() on a NumPy array.
But we can call .size() on a tensor.
Therefore, we need to make our target a tensor. You can do it like this:
target = torch.from_numpy(target)
I'm using a GPU, so I also needed to send my target to the GPU. You can do it like this:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
target = target.to(device)
After that, the loss function should work correctly.
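Putting it together, a minimal runnable sketch of the corrected loss computation (with random stand-in data matching the (6890, 3) shapes from the question):
import numpy as np
import torch
import torch.nn as nn

criterion = nn.MSELoss()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# stand-ins for the real data
predicted = torch.rand(6890, 3, device=device)
target_np = np.random.rand(6890, 3).astype(np.float32)

# keep the target as a tensor on the same device; do not pass a numpy array to the loss
target = torch.from_numpy(target_np).to(device)
loss = criterion(predicted, target)
print(loss.item())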
It probably means that you are trying to call a method when a property with the same name is available. If this is indeed the problem, the solution is easy. Simply change the method call into a property access.
If you are comparing in the following way:
compare = (X.method() == Y.method())
Change it to:
compare = (X.method == Y.method)
If this does not answer your question, kindly share the code which you have used to compare the shapes.
That's because your target is a NumPy object:
File "train.py", line 101, in train:
target.detach().cpu().numpy()
In your code, change the target so it is no longer NumPy; keep it as a tensor instead.
TL;DR: try changing
loss = criterion(predicted, target.detach().cpu().numpy()) ## error occurs here
to
loss = criterion(predicted, target) ## error occurs here
for example:
In [6]: b = np.ones(3)
In [7]: b.size
Out[7]: 3
In [8]: b.size()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-365705555409> in <module>
----> 1 b.size()
TypeError: 'int' object is not callable
I want to use pretrained models' convolutional feature maps as input features for a master model.
inputs = layers.Input(shape=(100, 100, 12))
sub_models = get_model_ensemble(inputs)
sub_models_outputs = [m.layers[-1] for m in sub_models]
inputs_augmented = layers.concatenate([inputs] + sub_models_outputs, axis=-1)
Here is the key part of what I do in get_model_ensemble():
for i in range(len(models)):
    model = models[i]
    for lay in model.layers:
        lay.name = lay.name + "_" + str(i)
    # Remove the last classification layer to rather get the underlying convolutional embeddings
    model.layers.pop()
    # while "conv2d" not in model.layers[-1].name.lower():
    #     model.layers.pop()
    model.layers[0] = new_input_layer
return models
All this gives:
Traceback (most recent call last):
File "model_ensemble.py", line 151, in <module>
model = get_mini_ensemble_net()
File "model_ensemble.py", line 116, in get_mini_ensemble_net
inputs_augmented = layers.concatenate([inputs] + sub_models_outputs, axis=-1)
File "/usr/local/lib/python3.4/dist-packages/keras/layers/merge.py", line 508, in concatenate
return Concatenate(axis=axis, **kwargs)(inputs)
File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 549, in __call__
input_shapes.append(K.int_shape(x_elem))
File "/usr/local/lib/python3.4/dist-packages/keras/backend/tensorflow_backend.py", line 451, in int_shape
shape = x.get_shape()
AttributeError: 'BatchNormalization' object has no attribute 'get_shape'
Here is type info:
print(type(inputs))
print(type(sub_models[0]))
print(type(sub_models_outputs[0]))
<class 'tensorflow.python.framework.ops.Tensor'>
<class 'keras.engine.training.Model'>
<class 'keras.layers.normalization.BatchNormalization'>
Note: the models I get from get_model_ensemble() have already had their compile() function called. So, how should I concatenate my models properly? Why won't it work? I guess it may have something to do with how the inputs would be fed to the sub-models and how I hot-swapped their input layers.
Thanks for the help!
The thing works if we do:
sub_models_outputs = [m(inputs) for m in sub_models]
rather than:
sub_models_outputs = [m.layers[-1] for m in sub_models]
TL;DR: models need to be called as layers.
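For reference, a minimal sketch of how that fix slots into the question's setup (get_model_ensemble is assumed to return the already-built sub-models):
from keras import layers

inputs = layers.Input(shape=(100, 100, 12))
sub_models = get_model_ensemble(inputs)

# calling each sub-model on the input tensor yields output tensors with known shapes,
# which Concatenate can handle (unlike the raw layer objects from m.layers[-1])
sub_models_outputs = [m(inputs) for m in sub_models]
inputs_augmented = layers.concatenate([inputs] + sub_models_outputs, axis=-1)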
I have a trained model which I am loading using the CNTK.load_model() function. I was looking at the MNIST tutorial in the CNTK git repo as a reference for the model evaluation code. I created a data reader (which is a MinibatchSource object) and am trying to run model.eval(mb), where mb = minibatch_source.next_minibatch(...) (similar to this answer).
But I'm getting the following error message:
Traceback (most recent call last):
File "LID_test.py", line 162, in <module>
test_and_evaluate()
File "LID_test.py", line 159, in test_and_evaluate
predictions = model.eval(mb)
File "/home/t-asbahe/anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/ops/functions.py", line 228, in eval
_, output_map = self.forward(arguments, self.outputs, device=device, as_numpy=as_numpy)
File "/home/t-asbahe/anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/utils/swig_helper.py", line 62, in wrapper
result = f(*args, **kwds)
File "/home/t-asbahe/anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/ops/functions.py", line 354, in forward
None, device)
File "/home/t-asbahe/anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/utils/__init__.py", line 393, in sanitize_var_map
if len(arguments) < len(op_arguments):
TypeError: object of type 'Variable' has no len()
I have no input_variable named 'Variable' in my model and I don't see any reason to get this error.
P.S.: My inputs are sparse inputs (one-hots)
You have a few options:
Pass the data as a numpy array (there is an instance of this in the CNTK 202 tutorial), where one-hot data is passed in as a numpy array.
pred = model.eval({model.arguments[0]:[onehot]})
Read the minibatch data and pass it to the eval function
eval_input_map = {input: reader_eval.streams.features}
eval_data = reader_eval.next_minibatch(eval_minibatch_size,
                                       input_map=eval_input_map)
mydata = eval_data[input].value
predicted = model.eval(mydata)
I'm trying to implement the code from this page, but I can't work out how to format the data (training set / testing set) correctly. My code:
numpy_rng = numpy.random.RandomState(123)
dbn = DBN(numpy_rng=numpy_rng, n_ins=2,hidden_layers_sizes=[50, 50, 50],n_outs=1)
train_set_x = [
    ([1,2],[2,]), # first element in the tuple is the input, the second is the output
    ([4,5],[5,])
]
testing_set_x = [
    ([6,1],[3,]), # same format as the training set
]
#when I looked at the load_data function found elsewhere in the tutorial (I'll show the code they used at the bottom for ease) I found it rather confusing, but this was my first attempt at recreating what they did
train_set_xPrime = [theano.shared(numpy.asarray(train_set_x[0][0],dtype=theano.config.floatX),borrow=True),theano.shared(numpy.asarray(train_set_x[0][1],dtype=theano.config.floatX),borrow=True)]
pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_xPrime,batch_size=10,k=1)
which produced this error:
Traceback (most recent call last):
File "/Users/spudzee1111/Desktop/Code/NNChatbot/DeepBeliefScratch.command", line 837, in <module>
pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_xPrime,batch_size=10,k=1)
File "/Users/spudzee1111/Desktop/Code/NNChatbot/DeepBeliefScratch.command", line 532, in pretraining_functions
n_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
AttributeError: 'list' object has no attribute 'get_value'
I can't work out how the input is supposed to be formatted. I tried using theano.shared on the list, so that it would be:
train_set_xPrime = theano.shared([theano.shared(numpy.asarray(train_set_x[0][0],dtype=theano.config.floatX),borrow=True),theano.shared(numpy.asarray(train_set_x[0][1],dtype=theano.config.floatX),borrow=True)],borrow=True)
but then it said:
Traceback (most recent call last):
File "/Users/spudzee1111/Desktop/Code/NNChatbot/DeepBeliefScratch.command", line 834, in <module>
train_set_xPrime = theano.shared([theano.shared(numpy.asarray(train_set_x[0][0],dtype=theano.config.floatX),borrow=True),theano.shared(numpy.asarray(train_set_x[0][1],dtype=theano.config.floatX),borrow=True)],borrow=True) #,borrow=True),numpy.asarray(train_set_x[0][1],dtype=theano.config.floatX),borrow=True))
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/theano/compile/sharedvalue.py", line 228, in shared
(value, kwargs))
TypeError: No suitable SharedVariable constructor could be found. Are you sure all kwargs are supported? We do not support the parameter dtype or type. value="[<TensorType(float64, vector)>, <TensorType(float64, vector)>]". parameters="{'borrow': True}"
I tried other combinations but none of them worked.
This should work:
numpy_rng = numpy.random.RandomState(123)
dbn = DBN(numpy_rng=numpy_rng, n_ins=2, hidden_layers_sizes=[50, 50, 50], n_outs=1)
train_set = [
    ([1,2],[2,]),
    ([4,5],[5,])
]
train_set_x = [train_set[i][0] for i in range(len(train_set))]
nparray = numpy.asarray(train_set_x, dtype=theano.config.floatX)
train_set_x = theano.shared(nparray, borrow=True)
pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_x, batch_size=10, k=1)
The pretraining_functions method expects as input a shared variable of shape (number of samples, input dimension). You can check this by looking at the shape of the MNIST dataset, the standard input for this example.
It doesn't take a list as input because this method only builds the pre-training functions. DBNs are pre-trained with an unsupervised learning algorithm, so it doesn't make sense to use the labels.
Furthermore, the list you use to build your numpy array doesn't make sense: train_set_x[0][0] yields only the first training example, whereas you want train_set_xPrime to contain all training examples. Even if you used train_set_x[0], you would have the first training example, but together with its label.
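To make the expected format concrete, here is a small sketch of what the pre-training input should look like for the toy data above: a single shared variable wrapping an array of shape (n_samples, n_ins), here (2, 2):
import numpy
import theano

# all training inputs stacked into one (n_samples, n_ins) array
train_set_x = theano.shared(
    numpy.asarray([[1, 2], [4, 5]], dtype=theano.config.floatX),
    borrow=True,
)
print(train_set_x.get_value(borrow=True).shape)  # (2, 2)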