I have installed TensorFlow and TFLearn on my Jetson TX1. TensorFlow works, and the program I'm trying to run works on my Mac, but I get this error when I run it on the Jetson:
Traceback (most recent call last):
File "net.py", line 164, in <module>
net = tflearn.regression(net, optimizer='adam', learning_rate=0.00001)
File "/usr/local/lib/python3.5/dist-packages/tflearn/layers/estimator.py", line 174, in regression
loss = objectives.get(loss)(incoming, placeholder)
File "/usr/local/lib/python3.5/dist-packages/tflearn/objectives.py", line 66, in categorical_crossentropy
keepdims=True)
TypeError: reduce_sum() got an unexpected keyword argument 'keepdims'
The code for the neural net:
# Network building
net = tflearn.input_data([None, 25])
net = tflearn.embedding(net, input_dim=len(words), output_dim=256) #Embedding instead of one hot encoding.
net = tflearn.lstm(net, 256, dropout=0.9) #0.9, 0.00001, 30 was good -->63%
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy', learning_rate=0.00001)
# Training
model = tflearn.DNN(net, tensorboard_verbose=0)
model.fit(x_train, y_train, n_epoch=15, validation_set=(x_test, y_test), show_metric=True, batch_size=30)
model.save('mod.model')
In TensorFlow v1.4 and below, the parameter that preserves the reduced dimensions is spelled keep_dims (with an underscore). It was renamed to keepdims in v1.5, which still accepts the old spelling for backward compatibility.
Your TFLearn version is therefore probably too recent for your TensorFlow, since it passes keepdims to reduce_sum(). Upgrading TensorFlow to v1.5 or later may solve your problem.
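If upgrading is not an option, the mismatch can be worked around by checking which spelling a function accepts before calling it. The sketch below is only an illustration of that idea: call_with_keepdims, old_reduce_sum and new_reduce_sum are hypothetical names, and the two reduce functions are plain-Python stand-ins, not TensorFlow's real reduce_sum.

```python
import inspect


def call_with_keepdims(reduce_fn, *args, keepdims=False, **kwargs):
    """Call reduce_fn with whichever spelling of the flag it accepts."""
    params = inspect.signature(reduce_fn).parameters
    if "keepdims" in params:
        return reduce_fn(*args, keepdims=keepdims, **kwargs)
    if "keep_dims" in params:
        return reduce_fn(*args, keep_dims=keepdims, **kwargs)
    raise TypeError("reduce_fn accepts neither keepdims nor keep_dims")


# Hypothetical stand-in for the pre-1.5 signature (keep_dims).
def old_reduce_sum(values, axis=None, keep_dims=False):
    total = sum(values)
    return [total] if keep_dims else total


# Hypothetical stand-in for the 1.5+ signature (keepdims).
def new_reduce_sum(values, axis=None, keepdims=False):
    total = sum(values)
    return [total] if keepdims else total


print(call_with_keepdims(old_reduce_sum, [1, 2, 3], keepdims=True))
print(call_with_keepdims(new_reduce_sum, [1, 2, 3], keepdims=True))
```

Both calls go through the same wrapper, so the calling code does not need to know which version is installed.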
I use Keras (Theano backend) to train a model. I created a dataset from Google images, downloaded with an extension, but training fails with an error.
The error appears some time after the script starts. If target_size is smaller than the width and height of the pictures, the first error appears; if target_size equals their width and height, the second one does.
import numpy as np
from PIL import Image
from matplotlib import pyplot as plt
import os
import keras
import joblib
from keras.preprocessing.image import ImageDataGenerator
train_images = 'C:\\Users\\Администратор\\AppData\\Local\\Programs\\Python\\Python36-32\\train_images'
model = keras.Sequential([
keras.layers.Flatten(input_shape=(36, 36, 3)),
keras.layers.Dense(64, activation='relu'),
keras.layers.Dropout(0.5),
keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
datagen = ImageDataGenerator(rescale = 1. /255)
train_generator = datagen.flow_from_directory(
train_images,
target_size = (36,36),
batch_size = 4,
class_mode = 'binary')
model.fit(np.array(train_generator), epochs=10, validation_split = 0.1)
The error is below. If you need more details, let me know.
C:\Users\Администратор>C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\image_guess.py
Using Theano backend.
WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Found 336 images belonging to 2 classes.
Traceback (most recent call last):
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\image_guess.py", line 32, in <module>
model.fit(np.array(train_generator), epochs=10, validation_split = 0.1)
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras_preprocessing\image\iterator.py", line 104, in __next__
return self.next(*args, **kwargs)
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras_preprocessing\image\iterator.py", line 116, in next
return self._get_batches_of_transformed_samples(index_array)
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras_preprocessing\image\iterator.py", line 231, in _get_batches_of_transformed_samples
x = img_to_array(img, data_format=self.data_format)
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras_preprocessing\image\utils.py", line 309, in img_to_array
x = np.asarray(img, dtype=dtype)
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'JpegImageFile'
I am building a prediction model for sequence data using the Conv1D layer provided by Keras. This is what I did:
input_layer = Input(shape=(500,))
layer = Conv1D(128,5,activation="relu")(input_layer)
layer = MaxPooling1D(pool_size=2)(layer)
layer = Flatten()(layer)
layer = Dense(128, activation='relu')(layer)
output_layer = Dense(10, activation='softmax')(layer)
classifier = Model(input_layer, output_layer)
classifier.summary()
classifier.compile(optimizer=optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
return classifier
However, I am facing the following error:
Traceback (most recent call last):
File "train.py", line 71, in <module>
classifier = create_cnn_model()
File "train.py", line 60, in create_cnn_model
layer = Conv1D(128,5, activation="relu")(input_layer)
File "C:\Python368\lib\site-packages\keras\backend\tensorflow_backend.py", line 75, in symbolic_fn_wrapper
return func(*args, **kwargs)
File "C:\Python368\lib\site-packages\keras\engine\base_layer.py", line 446, in __call__
self.assert_input_compatibility(inputs)
File "C:\Python368\lib\site-packages\keras\engine\base_layer.py", line 342, in assert_input_compatibility
str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer conv1d_1: expected ndim=3, found ndim=2
I think the input shape of the first layer is not set up correctly. How should I set it up?
Right, Conv1D layers need 3-dimensional input: (samples, timesteps, features).
I am assuming you have a univariate time series with 500 samples.
You need to write a function that splits the time series into windows of n steps.
For example:
x y
[t-n,...,t-2,t-1] t
So you are basically using the last n values to predict the next value in your series.
The shape of your input array will then be [len(x), n, 1], so each individual sample has shape (n, 1).
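The windowing described above could be sketched with NumPy as follows. This is a minimal illustration under the answer's assumption of a univariate series; split_series and n_steps are hypothetical names, and the series here is dummy data.

```python
import numpy as np


def split_series(series, n_steps):
    """Turn a 1-D series into (windows, targets) for next-value prediction."""
    series = np.asarray(series, dtype="float32")
    x, y = [], []
    for start in range(len(series) - n_steps):
        x.append(series[start:start + n_steps])  # the last n values
        y.append(series[start + n_steps])        # the value that follows them
    x = np.array(x)[..., np.newaxis]             # add the trailing feature axis
    return x, np.array(y)


series = np.arange(500)          # dummy univariate series with 500 samples
x, y = split_series(series, n_steps=10)
print(x.shape, y.shape)
```

With data shaped like this, the first layer of the model would take Input(shape=(n_steps, 1)) rather than Input(shape=(500,)).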
I have an issue running a Keras model on a Google Cloud Platform instance.
The model is the following:
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
verbose, epochs, batch_size = 1, 1, 64 # low number of epochs just for testing purpose
with tf.device('/cpu:0'):
m = Sequential()
m.add(CuDNNLSTM(20, input_shape=(n_timesteps, n_features)))
m.add(LeakyReLU(alpha=0.1))
m.add(RepeatVector(n_outputs))
m.add(CuDNNLSTM(20, return_sequences=True))
m.add(LeakyReLU(alpha=0.1))
m.add(TimeDistributed(Dense(20)))
m.add(LeakyReLU(alpha=0.1))
m.add(TimeDistributed(Dense(1)))
self.model = multi_gpu_model(m, gpus=8)
self.model.compile(loss='mse', optimizer='adam')
self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
As you can see from the code above, I run the model on a machine with 8 GPUs (Nvidia Tesla K80).
Training works well, without any errors. However, prediction fails with the following error:
W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cudnn_rnn_ops.cc:1336 : Unknown: CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1285): 'cudnnSetTensorNdDescriptor( tensor_desc.get(), data_type, sizeof(dims) / sizeof(dims[0]), dims, strides)'
Here is the code that runs the prediction:
self.model.predict(input_x)
What I've noticed is that if I remove the multi-GPU data parallelism, the code works well on a single GPU.
To be more precise, if I comment out this line, the code works without error:
self.model = multi_gpu_model(m, gpus=8)
What am I missing?
virtualenv information
cudatoolkit - 10.0.130
cudnn - 7.6.4
keras - 2.2.4
keras-applications - 1.0.8
keras-base - 2.2.4
keras-gpu - 2.2.4
python - 3.6
UPDATE
train_x.shape = (1441, 288, 1)
train_y.shape = (1441, 288, 1)
input_x.shape = (1, 288, 1)
After Olivier Dehaene's reply I tried his suggestion and it worked.
I changed input_x so that its shape is (8, 288, 1).
To do that I also changed the shapes of train_x and train_y.
Here is a recap:
train_x.shape = (8065, 288, 1)
train_y.shape = (8065, 288, 1)
input_x.shape = (8, 288, 1)
But now I get the same error in the training phase, on this line:
self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
From the tf.keras.utils.multi_gpu_model documentation we can see that it works in the following way:
Divide the model's input(s) into multiple sub-batches.
Apply a model copy on each sub-batch. Every model copy is executed on a dedicated GPU.
Concatenate the results (on CPU) into one big batch.
You are triggering the error because the input of the CuDNNLSTM layer is empty for at least one of the model copies. Every copy receives data only if the number of samples in a batch satisfies: n_samples // n_gpus > 0. With input_x.shape = (1, 288, 1) and 8 GPUs, 1 // 8 = 0.
Try this code out:
input_x = np.random.randn(8, n_timesteps, n_features)
model.predict(input_x)
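The same arithmetic may also explain the failure during training that the update describes. The sketch below assumes that each model copy receives n_samples // n_gpus samples and that the last copy takes the remainder; the helper names sub_batch_sizes and batch_sizes are hypothetical, and nothing here calls Keras itself.

```python
def sub_batch_sizes(n_samples, n_gpus):
    """Samples each model copy receives when one batch is split n_gpus ways.

    Assumption: every copy gets n_samples // n_gpus, the last one the rest.
    """
    step = n_samples // n_gpus
    sizes = [step] * (n_gpus - 1)
    sizes.append(n_samples - step * (n_gpus - 1))
    return sizes


def batch_sizes(n_samples, batch_size):
    """Sizes of the consecutive batches drawn from n_samples rows."""
    full, rest = divmod(n_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


for n_samples in (1441, 8065):
    last = batch_sizes(n_samples, 64)[-1]
    print(n_samples, "last batch:", last, "per GPU:", sub_batch_sizes(last, 8))
```

Under that assumption, 1441 samples with batch_size=64 leave a final batch of 33 samples, which still gives every GPU some data, whereas 8065 samples leave a final batch of a single sample, so seven of the eight copies would receive nothing. Picking a sample count or batch size that avoids such a small final batch would be one thing to try.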
When I try to fit a Keras model written with the tensorflow.keras API, using an iterator created from a tf.data.Dataset, the model complains about the steps_per_epoch argument, even though I have set it to a concrete value.
Below is my model class:
import tensorflow as tf
import numpy as np
from typing import Union, List
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras import layers
from tftools import TFTools
class TestServe():
def __init__(self, tfrecords: Union[List[tf.train.Example], tf.train.Example], batch_size: int = 10, input_shape: tuple = (64, 23)) -> None:
self.tfrecords = tfrecords
self.batch_size = batch_size
self.input_shape = input_shape
def get_model(self):
ins = layers.Input(shape=(64, 23))
l = layers.Reshape((*self.input_shape, 1))(ins)
l = layers.Conv2D(8, (30, 23), padding='same', activation='relu')(l)
l = layers.MaxPool2D((4, 5), strides=(4, 5))(l)
l = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(l)
l = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(l)
l = layers.MaxPool2D((2, 2), strides=(2, 2))(l)
l = layers.Flatten()(l)
out = layers.Dense(1, activation='softmax')(l)
return tf.keras.models.Model(ins, out)
def train(self):
# Create Dataset
dataset = TFTools.create_dataset(self.tfrecords)
dataset = dataset.repeat(6).batch(self.batch_size)
val_iterator = dataset.take(300).make_one_shot_iterator()
train_iterator = dataset.skip(300).make_one_shot_iterator()
model = self.get_model()
model.summary()
model.compile(optimizer='rmsprop',
loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_iterator, validation_data=val_iterator,
epochs=10, verbose=1, steps_per_epoch=20)
def predict(self, X: np.array) -> np.array:
pass
ts = TestServe(['./ok.tfrecord', './nok.tfrecord'])
ts.train()
But as soon as I start the training, before the first batch is finished, I get an exception from TensorFlow:
2019-06-13 14:22:25.393398: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1995445000 Hz
2019-06-13 14:22:25.393681: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2f7d120 executing computations on platform Host. Devices:
2019-06-13 14:22:25.393708: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
Epoch 1/2
19/20 [===========================>..] - ETA: 0s - loss: 1.1921e-07 - acc: 1.0000Traceback (most recent call last):
File "TestServe.py", line 62, in <module>
ts.train()
File "TestServe.py", line 56, in train
epochs=2, verbose=1, callbacks=callbacks, steps_per_epoch=20) #The steps_per_epoch is typically samples_per_epoch / batch_size
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
validation_steps=validation_steps)
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 364, in model_iteration
validation_in_fit=True)
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 202, in model_iteration
steps_per_epoch)
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 76, in _get_num_samples_or_steps
'steps_per_epoch')
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 230, in check_num_samples
if check_steps_argument(ins, steps, steps_name):
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 960, in check_steps_argument
input_type=input_type_str, steps_name=steps_name))
ValueError: When using data tensors as input to a model, you should specify the `steps_per_epoch` argument.
The original dataset contains around 1500 samples, but I want to join multiple tfrecord files into one TFRecordDataset, so I won't know its length.
Has anyone seen something similar before? I don't know where to go for help, since the tf.keras API is relatively new. The create_dataset function just returns the dataset mapped with the right parse function.
Found the solution.
Besides steps_per_epoch there is also a validation_steps parameter, which you have to specify as well when validation_data is an iterator.
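Both values can be derived from the sample counts and the batch size, following the rule of thumb quoted in the traceback comment (steps are typically samples divided by batch size). The sketch below is an illustration only: steps_for is a hypothetical helper, and the 1200/300 split is an assumed example based on the roughly 1500 samples mentioned in the question.

```python
import math


def steps_for(n_samples, batch_size):
    """Number of batches needed to see n_samples once."""
    return math.ceil(n_samples / batch_size)


# Assumed figures: ~1500 samples in total, 300 of them held out for validation.
batch_size = 10
n_train, n_val = 1200, 300

fit_kwargs = {
    "steps_per_epoch": steps_for(n_train, batch_size),
    "validation_steps": steps_for(n_val, batch_size),
}
print(fit_kwargs)

# The values would then be passed alongside the iterators, for example:
# model.fit(train_iterator, validation_data=val_iterator, epochs=10, **fit_kwargs)
```

When the dataset length is unknown, as with several joined tfrecord files, the counts would have to be estimated or obtained by iterating over the dataset once.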
I got this error when I tried to run a model written for TensorFlow 2.0 while an older version (TensorFlow 1.14) was installed locally.
To upgrade to the latest TensorFlow version, run:
python -m pip install --upgrade pip
python -m pip install --upgrade tensorflow
I'm trying to get into machine learning and I've decided on using tflearn for a start.
I used tflearn's quickstart guide to get the basics and tried using that neural network for a task I've set myself:
Predicting the age of abalones from their dimensions. For this I downloaded the corresponding dataset as a .csv file from the UCI repository. The table is in this format:
SEX|LENGTH|DIAMETER|HEIGHT|WHOLE WEIGHT|SHUCKED WEIGHT|VISCERA WEIGHT|SHELL WEIGHT|RINGS
Since the age is the same as the number of rings, I imported the .csv like this:
data, labels = load_csv("abalone.csv", categorical_labels=False, has_header=False)
The task is to predict the number of rings based on the data, so I set up my input layer like this:
net = tflearn.input_data(shape=[None, 8])
I added four hidden layers with the default linear activation function:
net = tflearn.fully_connected(net, 320)
net = tflearn.fully_connected(net, 200)
net = tflearn.fully_connected(net, 200)
net = tflearn.fully_connected(net, 320)
And an output layer with one node since there is only one result (no. of rings):
net = tflearn.fully_connected(net, 1, activation="sigmoid")
net = tflearn.regression(net)
Now I initialize the model, but during training an error occurs:
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=1000, show_metric=True, batch_size=1600)
The entire exception:
Traceback (most recent call last):
File "D:\OneDrive\tensornet.py", line 34, in <module>
model.fit(data, labels, n_epoch=1000, show_metric=True, batch_size=1600)
File "C:\Python3\lib\site-packages\tflearn\models\dnn.py", line 215, in fit
callbacks=callbacks)
File "C:\Python3\lib\site-packages\tflearn\helpers\trainer.py", line 333, in fit
show_metric)
File "C:\Python3\lib\site-packages\tflearn\helpers\trainer.py", line 774, in _train
feed_batch)
File "C:\Python3\lib\site-packages\tensorflow\python\client\session.py", line 767, in run
run_metadata_ptr)
File "C:\Python3\lib\site-packages\tensorflow\python\client\session.py", line 944, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1600,) for Tensor 'TargetsData/Y:0', which has shape '(?, 1)'
From what I understand, the exception occurs when trying to fit my labels (which are a 1600x1 Tensor) with my output layer. But I don't know how to fix this.
You need to add another axis to the labels so that they have shape (1600, 1) instead of (1600,).
The simplest way to do it is like this:
labels = labels[:, np.newaxis]
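A small self-contained check of what that indexing does, using dummy values in place of the abalone ring counts. The np.asarray call is an added precaution in this sketch so that the indexing also works if labels happens to be a plain Python list.

```python
import numpy as np

# Dummy stand-in for the labels: one ring count per row.
labels = [float(i % 30) for i in range(1600)]

labels = np.asarray(labels, dtype="float32")
print(labels.shape)                 # 1-D: one axis only

labels_2d = labels[:, np.newaxis]   # same data, extra trailing axis
print(labels_2d.shape)              # 2-D: matches a target of shape (?, 1)

# reshape(-1, 1) is an equivalent spelling.
same = np.array_equal(labels_2d, labels.reshape(-1, 1))
print(same)
```

Either form gives each label its own row, which is the layout the 'TargetsData/Y:0' placeholder in the error message expects.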