Python Killed after saving or loading numpy arrays


I am trying to learn deep learning, specifically how to do speaker diarization (github link).
I set up an environment on Ubuntu to run a speaker diarization project from GitHub.
However, I got a 'Killed' message while generating speaker embeddings.
I traced the error to the code below (from here):
np.savez('training_data', train_sequence=train_sequence, train_cluster_id=train_cluster_id)
After reducing the number of epochs, I found that training_data.npz is about 500MB (much smaller than my RAM), and with only a few epochs the embedding generation finishes normally.
But after this, I got the same error when loading this small training_data.npz for training:
train_data = np.load('./ghostvlad/training_data.npz')
To summarize, I don't understand why my system can't save or load this small file.
(P.S. Sorry for my poor English)
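One way to probe this kind of failure is to compare the in-memory size of each array with the size of the file on disk, and to avoid reading everything into RAM at once. The sketch below is not the project's code. It assumes the data can be held as regular (non-object) numeric arrays, uses small random stand-in data, and introduces two hypothetical helpers, save_arrays and load_memmapped. It writes each array to its own .npy file with np.save and reopens it with np.load(..., mmap_mode="r"), which maps the file instead of copying it into memory.

```python
import os
import tempfile

import numpy as np


def save_arrays(directory, **arrays):
    """Save each array to its own .npy file and return a name -> path dict."""
    paths = {}
    for name, value in arrays.items():
        path = os.path.join(directory, name + ".npy")
        np.save(path, np.asarray(value))
        paths[name] = path
    return paths


def load_memmapped(paths):
    """Open each .npy file as a read-only memory map (no full copy in RAM)."""
    return {name: np.load(path, mmap_mode="r") for name, path in paths.items()}


# Stand-in data (assumption): the real train_sequence is far larger.
rng = np.random.default_rng(0)
train_sequence = rng.random((1000, 256), dtype=np.float32)
train_cluster_id = np.arange(1000)

tmp_dir = tempfile.mkdtemp()
paths = save_arrays(
    tmp_dir,
    train_sequence=train_sequence,
    train_cluster_id=train_cluster_id,
)
data = load_memmapped(paths)

# Compare what the array occupies in memory with what was written to disk.
for name, path in paths.items():
    print(name, "in memory:", data[name].nbytes, "bytes,",
          "on disk:", os.path.getsize(path), "bytes")
```

Memory mapping does not work for object arrays, so ragged data (for example a list of sequences with different lengths) would need to be concatenated or padded into a regular array first.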

Related

Tflite Model Maker using Pascal-VOC

I have 31,216 labeled images for object detection. I used the LabelImg program to label the images, and they are in Pascal-VOC format. I want to create a tflite model for my Kotlin project. However, I have serious problems.
First, in my local environment, I tried to install the tflite-model-maker library in PyCharm using pip install tflite-model-maker. It downloaded ~30GB and Python still reports an unresolved reference. Then I tried to add the library from here, but that also didn't work. I couldn't get the library to import.
Second, I used Google Colab, following this tutorial from Tensorflow. I mounted my Google Drive in Colab and edited all the code to use my dataset path. The last line I ran was model.export(export_dir='.', tflite_filename='AslModel.tflite'), and it created the model file in the Colab directory. I then ran the next line, model.evaluate_tflite('AslModel.tflite', val_data), which showed a 16-hour ETA; after 14 hours the Google Colab runtime gave an error and the whole runtime was reset. Now I have a tflite file and I tested it. Since the evaluation step never finished, it makes bad predictions. I started all over again, but Google Colab gave an error again. I guess ~7 hours of training + ~16 hours of evaluation is impossible with Google Colab because there is a 24h limit. Thus, my question is: how can I run the evaluation step only?
The model is defined in this line, which takes 7 hours: model = object_detector.create(train_data, model_spec=spec, batch_size=4, train_whole_model=True, epochs=20, validation_data=val_data). Instead of this line, I want to initialize a model from my tflite file, something like model = LoadModel(PATH_OF_MY_TFLITE). I couldn't find any load method, so I'm stuck there.
To sum up, the objective is to train on the Pascal-VOC formatted dataset. I couldn't import the libraries in my local Python environment, and with Google Colab I have a raw tflite model that still needs evaluation, but I can't rerun the previous steps due to the time limit. Lastly, I bought Colab Pro but I have spent all my compute units. I don't even know what the purpose of a compute unit is. I'm waiting for suggestions. Thank you.

Can I train an ELMo model from scratch using allennlp.modules.elmo.Elmo?

I've been looking to train my own ELMo model for the past week and came across these two implementations allenai/bilm-tf & allenai/allennlp. I've been facing a few roadblocks for a few techniques I've tried and would like to clarify my findings, so that I can get a clearer direction.
As my project revolves around healthcare, I would like to train the embeddings from scratch for better results. The dataset I am working on is MIMIC-III and the entire dataset is stored in one .csv, unlike 1 Billion Word Language Model Benchmark (data used in tutorials) where files are stored in separate .txt files.
I was following this "Using ELMo as a PyTorch Module to train a new model" tutorial but I figured out that one of the requirements is a .hdf5 weights_file.
(Question) Does this mean that I will have to train a bilm model first to get .hdf5 weights to input? Can I train an ELMo model from scratch using allennlp.modules.elmo.Elmo? Is there any other way to train a model this way with an empty .hdf5? I was able to run this successfully with the tutorial data.
(Question) What will be the best method for me to train my embeddings? (PS: some methods I've tried are documented below). In my case I will probably need a custom DatasetReader rather than converting the csv to txt files, which would waste memory.
Here, let me go into the details of the other methods I have tried so far. This serves as a backstory to the main question of what may be the best technique. Please let me know if you know of any other methods to train my own ELMo model, or if one of the following methods is preferred over the others.
I've tried training a model using the allennlp train ... command by following this tutorial. However, I was unable to run with tutorial data due to the following error which I am still unable to solve.
allennlp.common.checks.ConfigurationError: Experiment specified GPU device 1 but there are only 1 devices available.
Secondly, this is a technique that I found but have not tried. Similar to the technique above it uses the allennlp train ... command but instead I use allenai/allennlp-template-config-files as a template and modify the Model and DatasetReader.
Lastly, I tried using the TensorFlow implementation allenai/bilm-tf following tutorials like this. However, I would like to avoid this method as TF1 is quite outdated. Besides receiving tons of warnings, I faced an error for CUDA as well.
2021-09-14 17:31:36.222624: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 18.45M (19346432 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Python unable to access multiple GB of my ram?

I'm writing a machine learning project for some fun but I have run into an interesting error that I can't seem to fix. I'm using Sklearn (LinearSVC, train_test_split), numpy, and a few other small libraries like collections.
The project is a comment classifier - You put in a comment, it spits out a classification. The problem I'm running into is a Memory Error (Unable to allocate 673. MiB for an array with shape (7384, 11947) and data type float64) that occurs when I do a train_test_split to check the classifier accuracy, specifically when I call model.fit.
There are 11947 unique words that my program finds, and I have a large training sample (14,769), but I've never had an issue where I run out of RAM. The problem is, I'm not running out of RAM. I have 32 GB, but the program ends up using less than 1 GB before it gives up.
Is there something obvious I'm missing?
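The figure in the error message can be reproduced by hand: a dense array takes rows × columns × bytes per element. The sketch below checks that arithmetic with a hypothetical helper, dense_size_mib (not part of numpy or sklearn), and also prints whether the interpreter is a 32-bit or 64-bit build. A 32-bit Python process is limited to a few GB of address space regardless of installed RAM, so that is one thing worth ruling out. Whether it is the cause here is an assumption, not something the question confirms.

```python
import struct

import numpy as np


def dense_size_mib(shape, dtype):
    """Return the size in MiB of a dense array with this shape and dtype."""
    count = 1
    for dim in shape:
        count *= dim
    return count * np.dtype(dtype).itemsize / 2**20


# Shape and dtype taken from the error message in the question.
size_f64 = dense_size_mib((7384, 11947), np.float64)
# Same shape with float32 elements, for comparison.
size_f32 = dense_size_mib((7384, 11947), np.float32)
print(f"float64: {size_f64:.1f} MiB, float32: {size_f32:.1f} MiB")

# Pointer width of the running interpreter: 32 or 64 bits.
pointer_bits = struct.calcsize("P") * 8
print(f"Python build: {pointer_bits}-bit")
```

The float64 result matches the 673 MiB in the error message, which suggests the failing allocation is a single dense copy of the training matrix rather than total memory use.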

machine learning model upon persisting performance reduces

I have built a preliminary ML (PySpark) model with sample data on my PC (Windows) and the accuracy is around 70%. After persisting the model binary to disk, I read it from a different Jupyter notebook and the accuracy is still close to 70%. Now if I do the same thing on our cluster (MapR/Unix), after reading the model binary from disk, accuracy goes down to 10-11% (the dataset is exactly the same). I get the same issue even with the full dataset (just for information).
As the cluster has Unix OS, I tried training-persisting-testing the model in a docker container (Unix), but no issue there. The issue is only with the cluster.
I have been scratching my head since then about what might be causing this and how to resolve it. Please help.
Edit:
It's a classification problem and I have used pyspark.ml.classification.RandomForestClassifier.
To persist the models I am simply using the standard setup:
model.write().overwrite().save(model_path)
And to load the model:
model = pyspark.ml.classification.RandomForestClassificationModel().load(model_path)
I have used StringIndexer, OneHotEncoder etc. in the model and have also persisted them to disk in order to use them in the other Jupyter notebook (same way as the main model).
Edit:
Python: 3.x
Spark: 2.3.1

How can I train dlib shape predictor using a very large training set

I'm trying to use the python dlib.train_shape_predictor function to train using a very large set of images (~50,000).
I've created an xml file containing the necessary data, but it seems like train_shape_predictor loads all the referenced images into RAM before it starts training. This leads to the process getting terminated because it uses over 100gb of RAM. Even trimming down the data set uses over 20gb (machine only has 16gb physical memory).
Is there some way to get train_shape_predictor to load images on demand, instead of all at once?
I'm using python 3.7.2 and dlib 19.16.0 installed via pip on macOS.

I posted this as an issue on the dlib github and got this response from the author:
It's not reasonable to change the code to cycle back and forth between disk and ram like that. It will make training very slow. You should instead buy more RAM, or use smaller images.
As designed, large training sets need tons of RAM.
