I have one CSV with around 10k rows and around 370 mostly numerical columns (int or float), plus a unique ID column. I know the target column (an integer column), which needs to be used for inference with the What-If Tool in TensorBoard. I'm not very experienced with TensorFlow, and I could not find documentation that fits my purpose.
Initially, I built my model using this documentation:
https://www.tensorflow.org/tutorials/load_data/pandas_dataframe
To serve the model I went through this documentation:
https://www.tensorflow.org/tensorboard/what_if_tool
Where it said in the requirements:
The model(s) you wish to explore must be served using TensorFlow Serving using the classify, regress, or predict API.
This leads to this link:
https://github.com/tensorflow/serving
I was able to build the saved_model.pb file and serve it with Docker successfully, but when I use it in the TensorBoard What-If Tool I get an error saying "Expected one input Tensor".
I then went through these links to change the model for serving by adding inputs and outputs:
https://www.tensorflow.org/tfx/tutorials/serving/rest_simple
https://www.tensorflow.org/guide/saved_model
But I still can't understand what to provide as inputs and outputs, since the only column I know about from my CSV is the integer target. Nor do I understand how to add signatures properly for all three APIs.
I checked the UCI Census demo model, loaded it, and in its signatures I could see classification, regression, and so on; all of them are pruned ConcreteFunctions, which I know nothing about.
My client requires the CSV to be loaded with model-understanding and prediction features enabled for both classification and regression.
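From what I can tell so far, the What-If Tool sends the model a single 1-D tensor of serialized tf.Example protos, which would explain the "Expected one input Tensor" error with a multi-input Keras model. Here is a minimal sketch of the kind of serving signature I believe is expected (TF 2.x); model and FEATURE_NAMES are hypothetical stand-ins for the tutorial model and my CSV's numeric columns:

import tensorflow as tf

# Hypothetical stand-ins: FEATURE_NAMES would be my ~370 numeric CSV columns
# and `model` the Keras model from the pandas tutorial.
FEATURE_NAMES = ["feat_%d" % i for i in range(370)]
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(len(FEATURE_NAMES),)),
    tf.keras.layers.Dense(1),
])

feature_spec = {name: tf.io.FixedLenFeature([], tf.float32) for name in FEATURE_NAMES}

@tf.function(input_signature=[tf.TensorSpec([None], tf.string, name="examples")])
def serve_examples(serialized):
    # TF Serving (and the What-If Tool) hand the model one 1-D string tensor
    # of serialized tf.Example protos, not one tensor per column.
    parsed = tf.io.parse_example(serialized, feature_spec)
    features = tf.stack([parsed[n] for n in FEATURE_NAMES], axis=1)
    return {"scores": model(features)}

tf.saved_model.save(model, "export/1", signatures={"serving_default": serve_examples})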
I've been looking to train my own ELMo model for the past week and came across two implementations: allenai/bilm-tf and allenai/allennlp. I've hit a few roadblocks with the techniques I've tried and would like to clarify my findings so I can get a clearer direction.
As my project revolves around healthcare, I would like to train the embeddings from scratch for better results. The dataset I am working with is MIMIC-III, and the entire dataset is stored in one .csv, unlike the 1 Billion Word Language Model Benchmark (the data used in the tutorials), where the files are stored as separate .txt files.
I was following the "Using ELMo as a PyTorch Module to train a new model" tutorial, but I found that one of the requirements is an .hdf5 weights_file.
(Question) Does this mean that I have to train a bilm model first to get the .hdf5 weights as input? Can I train an ELMo model from scratch using allennlp.modules.elmo.Elmo? Is there some other way to train a model like this with an empty .hdf5, given that I was able to run it successfully with the tutorial data?
(Question) What would be the best method for me to train my embeddings? (Some methods I've tried are documented below.) In my case I will probably need a custom DatasetReader rather than converting the .csv to .txt files, which wastes memory; a sketch of such a reader follows.
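For concreteness, here is a minimal sketch of what I imagine such a reader could look like (AllenNLP 2.x); the registered name "mimic_csv", the "TEXT" column, and the "source" field name are my assumptions and would have to match the real MIMIC-III schema and the model's config:

import csv
from typing import Iterable

from allennlp.data import DatasetReader, Instance
from allennlp.data.fields import TextField
from allennlp.data.token_indexers import ELMoTokenCharactersIndexer
from allennlp.data.tokenizers import WhitespaceTokenizer

@DatasetReader.register("mimic_csv")
class MimicCsvReader(DatasetReader):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.tokenizer = WhitespaceTokenizer()
        self.token_indexers = {"elmo": ELMoTokenCharactersIndexer()}

    def _read(self, file_path: str) -> Iterable[Instance]:
        # Stream rows straight from the single MIMIC-III .csv instead of
        # materializing intermediate .txt files.
        with open(file_path, newline="") as f:
            for row in csv.DictReader(f):
                yield self.text_to_instance(row["TEXT"])  # "TEXT" column is assumed

    def text_to_instance(self, text: str) -> Instance:
        tokens = self.tokenizer.tokenize(text)
        return Instance({"source": TextField(tokens, self.token_indexers)})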
Here are the details of the other methods I have tried so far; they serve as backstory to the main question of which technique is best. Please let me know if you know of any other methods for training my own ELMo model, or whether one of the following methods is preferred over the others.
I've tried training a model using the allennlp train ... command by following this tutorial. However, I was unable to run it even with the tutorial data, due to the following error, which I am still unable to solve.
allennlp.common.checks.ConfigurationError: Experiment specified GPU device 1 but there are only 1 devices available.
Secondly, there is a technique I found but have not tried: similar to the one above, it uses the allennlp train ... command, but with allenai/allennlp-template-config-files as a template, modifying the Model and DatasetReader.
Lastly, I tried the TensorFlow implementation allenai/bilm-tf, following tutorials like this one. However, I would like to avoid this method, as TF1 is quite outdated. Besides receiving tons of warnings, I also hit a CUDA error:
2021-09-14 17:31:36.222624: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 18.45M (19346432 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
I want to use TensorFlow to detect cars in an embedded system, so I tried ssd_mobilenet_v2, and it actually did pretty well for me, except for some specific car types that are not very common, which I think is why the model does not recognize them. I have a dataset of these cases and want to improve the model by fine-tuning it. I should also note that I need a .tflite file, because I'm using tflite_runtime in Python.
I followed these instructions https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10 and was able to train the model to a reasonable loss value. I then used export_tflite_ssd_graph.py from the Object Detection API to build the inference graph from the trained model, and afterwards used the toco tool to build a .tflite file from it.
But here is the problem: after doing all that, not only did the model not improve, it now does not detect any cars at all. I got confused and don't know what the problem is. I searched a lot and did not find any tutorial about doing exactly what I need; they just add a new object to a model and then export it, which I tried and managed successfully. I also tried building a .tflite file without training, directly from a model in the TensorFlow detection model zoo, and it worked fine. So I think the problem has something to do with the training process; maybe I am missing something there.
Another thing I did not find in the documentation is whether it is possible to "add" a class to the existing classes of an object detection model. For example, assume ssd_mobilenet_v2 detects 90 different object classes; I would like to add another class so that the model detects 91 classes instead of 90. As far as I understand and have tested, after doing transfer learning with the Object Detection API I could only detect the objects in my own dataset, and the old classes were gone. So how do I do what I described?
I found out that there is no way to 'add' a class to the previously trained classes, but by providing a small amount of data for that class you can get your model to detect it. The reason is that the last layer of the model is replaced when transfer learning is applied. In my case I labeled around 3k frames containing about 12k objects, because my frames are complicated; for simpler tasks, as I saw in tutorials, 200-300 annotated images would be enough.
As for the model not detecting anything: it had to do with the convert command I used. I should have used tflite_convert instead of toco (a sketch of the conversion follows). I explained more here.
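For reference, this is roughly the Python equivalent of the tflite_convert step (TF 1.x), assuming the tflite_graph.pb produced by export_tflite_ssd_graph.py with its standard SSD input/output node names:

import tensorflow as tf

# TF 1.x converter; the node names below are the standard ones emitted by
# export_tflite_ssd_graph.py for SSD models.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=[
        "TFLite_Detection_PostProcess",
        "TFLite_Detection_PostProcess:1",
        "TFLite_Detection_PostProcess:2",
        "TFLite_Detection_PostProcess:3",
    ],
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]},
)
converter.allow_custom_ops = True  # the detection post-process op is a custom TFLite op
with open("detect.tflite", "wb") as f:
    f.write(converter.convert())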
There are many ways to save a model and its weights, and it is confusing when there are so many of them and no single source where you can read about and compare their properties.
Some of the formats I know are:
1. YAML File - Structure only
2. JSON File - Structure only
3. H5 Complete Model - Keras
4. H5 Weights only - Keras
5. ProtoBuf - Deployment using TensorFlow serving
6. Pickle - Scikit-learn
7. Joblib - Scikit-learn - replacement for Pickle for objects containing large data (items 6 and 7 are sketched below)
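A minimal sketch of items 6 and 7, persisting a scikit-learn model with pickle and with joblib:

import pickle

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Plain pickle: works for any Python object.
with open("model.pkl", "wb") as f:
    pickle.dump(clf, f)

# Joblib: same idea, but more efficient for objects holding large numpy arrays.
joblib.dump(clf, "model.joblib")
restored = joblib.load("model.joblib")
print(restored.score(X, y))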
Discussion:
Unlike scikit-learn, Keras does not recommend you save models using pickle. Instead, models are saved as an HDF5 file. The HDF5 file contains everything you need to not only load the model to make predictions (i.e., architecture and trained parameters) but also to restart training (i.e., loss and optimizer settings and the current state).
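A minimal sketch of that HDF5 round trip (the toy architecture is just a placeholder):

from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# One file holds the architecture, weights, loss/optimizer config, and optimizer state.
model.save("model.h5")

# Ready to predict or to keep training from where it left off.
restored = keras.models.load_model("model.h5")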
What other formats are there for saving models from scikit-learn, Keras, TensorFlow, and MXNet? And what information am I missing about each of the formats discussed above?
There are also formats like ONNX, which supports most of the frameworks and removes the confusion of using different formats for different frameworks.
There is also the TFJS format, which enables you to use the model in web or Node.js environments. Additionally, you will need the TF Lite format to run inference on mobile and edge devices. Most recently, TF Lite for Microcontrollers exports the model as a byte array in a C header file.
Your question on formats for saving a model has multiple possible answers, based on why you want to save your model:
Save your model to resume training it later
Save your model to load it for inference later
These scenarios give you a couple of options:
You could save your model using the library-specific saving functions; if you want to resume training, make sure that you have saved all the information you need to really be able to resume training. Formats here will vary by library, and indeed are not aimed at being formats that you would inspect or read in any way - they are just files. If you are looking for a library that wraps all of these save functions behind a common API, you should check out the modelstore Python library.
You may also want to use a common format like ONNX; converters from Keras to ONNX and from scikit-learn to ONNX are available, although it is uncommon to use this format to resume training later. The benefit is that all models are saved in a common format, which may streamline the process of loading them later.
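As a hedged example of the scikit-learn-to-ONNX path, assuming the skl2onnx package:

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Declare the input signature (the name is up to you; 4 iris features here).
onnx_model = convert_sklearn(
    clf, initial_types=[("input", FloatTensorType([None, 4]))])
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())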
I ran into a strange problem trying to train a neural network using code from GitHub; it is the Hugging Face conversational model.
What happens: even when I use my own dataset for training, the result stays the same as with the original dataset. My hypothesis is that it is somehow a cache problem: the old dataset keeps getting loaded from the cache and replaces mine.
Then, when I launch an actual interactive session with the neural network, it works, but without my data, even if I pass the model checkpoint.
Why I suspect the cache: in this repo the author automatically downloads and caches the neural network model in /home/joo/.cache/torch/pytorch_transformers/ if no parameter is specified on the command line.
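To illustrate the hypothesis, this is roughly the caching pattern I suspect (a sketch, not the repo's actual code; load_and_tokenize is a hypothetical helper):

import os

import torch

def get_dataset(dataset_path, dataset_cache="./dataset_cache"):
    # If a cache file already exists, dataset_path is silently ignored,
    # so the old dataset keeps winning over the new one.
    if os.path.isfile(dataset_cache):
        return torch.load(dataset_cache)
    dataset = load_and_tokenize(dataset_path)  # hypothetical helper
    torch.save(dataset, dataset_cache)
    return dataset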
I have created an issue on GitHub, but I am not sure whether this is a problem specific to this repo or a common problem with retraining neural networks that I have hit for the first time.
https://github.com/huggingface/transfer-learning-conv-ai/issues/36
Some copy-paste from the issue:
I am still curious why I was not able to pass my dataset:
I added my personality to the original 200 MB JSON
trained once more with --dataset_path ./my.json
invoked interact.py with the new checkpoint and path: python ./interact.py --model_checkpoint ./runs/Oct08_18-22-53_joo-tf_openai-gpt/ --dataset_path ./my.json
and it reports "Gathered 18878 personalities" (but not 18879, which would include mine).
I changed the code in interact.py to always choose the first personality:
was: personality = random.choice(personalities)
became: personality = personalities[0]
and this first personality is not mine.
Solved: it is an issue specific to this repo; the dataset path is simply hardcoded.
But I still don't know why my data didn't load the first time.
I'm looking to run a basic fully-connected neural network for the MNIST dataset with the TensorFlow C++ API v1.2. I have trained the model and exported it using tf.train.Saver() in Python, which gave me a checkpoint file, a data file, an index file, and a meta file.
I know from using TensorBoard on a previous project that the data file contains the saved variables while the meta file contains the graph.
However, I am not sure what is the recommended way to load those files and run the trained model in a C++ environment in v1.2, since all the tutorials and questions I've found are for older versions, which differ substantially.
I've found that tensorflow::ops::Restore should be the method to do such a thing, but I know that C++ inference in TensorFlow isn't well supported, so I am not certain what parameters to give it in order to get a trained model that I can simply pass to session->Run() and receive an accuracy figure back when fed test data.
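One workaround I've seen for this era of TensorFlow is to freeze the checkpoint into a single GraphDef on the Python side, so the C++ program only has to load one .pb file instead of wiring up Restore ops. Here is a hedged sketch; the checkpoint prefix "model.ckpt" and the output node name "output" are assumptions:

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session() as sess:
    saver = tf.train.import_meta_graph("model.ckpt.meta")  # rebuild the graph
    saver.restore(sess, "model.ckpt")                      # load the variables
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["output"])

# Write the frozen graph; C++ can then load it with ReadBinaryProto and
# create a session from it, with no Restore ops involved.
tf.train.write_graph(frozen, ".", "frozen_model.pb", as_text=False)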