I built my own sklearn-like estimator using pytorch training inside GPU (cuda) and it works fine with RandomizedSearchCV when n_jobs==1. When n_jobs > 1, I get the following error:
PicklingError: Can't pickle <class 'main.LSTM'>: attribute lookup LSTM on main failed
This is the piece of code giving me the error:
model = my_model(input_size=1, hidden_layer_size=80, n_lstm_units=3, bidirectional=False,
output_size=1, training_batch_size=60, epochs=7500, device=device)
model.to(device)
hidden_layer_size = random.uniform(40, 200, 20).astype("int")
n_lstm_units = arange(1, 4)
parametros = {'hidden_layer_size': hidden_layer_size, 'n_lstm_units': n_lstm_units}
splitter = ShuffleSplit()
regressor = model
cv_search = \
RandomizedSearchCV(estimator=regressor, cv=splitter,
search_spaces=parametros,
refit=True,
n_iter=4,
verbose=1,
n_jobs=2,
scoring=make_scorer(mean_squared_error,
greater_is_better=False,
needs_proba=False))
cv_search = MetaSKLearnWrapper(cv_search)
cv_search.fit(X, y)
Using Neuraxle wrapper leads to exactly same error, changes nothing.
I found closest solution here, but still don't know how to use RandomizedSearchCV within Neuraxle. It is a brand new project, so I couldn't find an answer on their docs or community examples. If anyone can give me an example or a good indication it will save my life. Thank you
Ps: Any way to run RandomizedSearchCV with my pytorch model on the gpu without Neuraxle also helps, I just need n_jobs>1.
Ps2: My model has a fit() method that creates and moves tensors to the gpu and works already tested.
There are multiple criteria that must be respected here for your code to work:
You need to use Neuraxle's RandomSearch instead of sklearn's random search for this to work. Use Neuraxle's base classes when possible.
Make sure that you use a Neuraxle BaseStep for your pytorch model, instead of a sklearn base classe.
Also, you should create your PyTorch code only in the setup() method or later. You can't create a PyTorch model in the __init__ of the BaseStep that contains pytorch code. You will want to read this page.
You will probably have to create a Saver for your BaseStep that contains PyTorch code if you want to serialize and then load your trained pipeline again. You can see how we created our TensorFlow Saver for our TensorFlow BaseStep and do something similar. Your saver will probably be much simpler than ours due to the more eager nature of PyTorch. For instance, you could have self.model inside your extension of the BaseStep class. The role of the saver would be to save and strip away this simple variable from self, and to be able to reload it whenever needed.
To sum up: you'd need to create two classes, and your two classes should look very similar our two TensorFlow step and saver classes here, to the exception that you PyTorch model is in a self.model variable of your step.
I'd be glad to see your implementation of your PyTorch base step and of your PyTorch saver!
You could then also even use the AutoML class (see AutoML example here) to save experiments in a Hyperparameter Repository as seen in the example.
All of the examples I have seen with tensorflow transform export the models after training using an estimator with the export_saved_model function. I am working with existing training code that does not use estimators and saves with saved_model.simple_save.
What is the best practice method for saving a transform_fn with a model for serving without using estimators?
I'm trying to modify a program that uses the Estimator class in TensorFlow (v1.10) and I would like to access the evaluation metric results every time evaluation occurs so that I can copy the checkpoint files only when a new maximum has been achieved.
One idea I had was to create a class inheriting from SessionRunHook, doing the work I want in the after_run method. According to the documentation I can specify what is passed to after_run using before_run. However I cannot find a way to access the evaluation metrics results I want from the information passed in to before_run.
I looked into the Estimator code and it appears that it is writing the results to a summary file so another idea I had was to read this back in the after_run method, but the summary api doesn't seem to provide any read operations.
Are there any other ways I can achieve what I want to do? Not using the Estimator class is not an option as that would involve drastic changes to the code I'm working with.
Checkpoints are not the same as exporting. Checkpoints are about fault-recovery and involve saving the complete training state (weights, global step number, etc.).
In your case I would recommend exporting. The exported model will written to a directory called “exporter” and the serving input function specifies what the end-user will be expected to provide to the prediction service.
You can use the class "Best Exporter" to just export the models that are perfoming best:
https://www.tensorflow.org/api_docs/python/tf/estimator/BestExporter
This class exports the serving graph and checkpoints of the best models.
Also, it performs a model export everytime when the new model is better than any exsiting model.
Posting here as I couldn't find an explicit answer from tensorflow's documentation. I am curious about the actual purpose of the summary files in tensorflow. After training, I call the tensor flow model (the model file and the meta file) that were saved by:
tf.train.saver()
There seems to be no need for me to actually keep the summary files apart from logging training information;I can use my models to predict without referencing the summaries.
Is the summary file merely just log files of the training runs (accuracy and loss). Is there any other purpose that these files serve?
I would like to be use GridSearchCV to determine the parameters of a classifier, and using pipelines seems like a good option.
The application will be for image classification using Bag-of-Word features, but the issue is that there is a different logical pipeline depending on whether training or test examples are used.
For each training set, KMeans must run to produce a vocabulary that will be used for testing, but for test data no KMeans process is run.
I cannot see how it is possible to specify this difference in behavior for a pipeline.
You probably need to derive from the KMeans class and override the following methods to use your vocabulary logic:
fit_transform will only be called on the train data
transform will be called on the test data
Maybe class derivation is not alway the best option. You can also write your own transformer class that wraps calls to an embedded KMeans model and provides the fit / fit_transform / transform API that is expected by the Pipeline class for the first stages.