How to update data_dir and data_path in a TF DatasetInfo object? - python

I'm trying to run a script that builds and loads a TF dataset. The dataset is cityscapes, and it is already downloaded and stored in /fs/datasets/cityscapes/; I can't move the data. The directory contains the following files: ['tfrecord', 'gtFine', 'tfrecord_instances_old', 'README', 'leftImg8bit', 'cityscapesScripts', 'tfrecord_instances', 'license.txt']. An error arises when I try to run dataset = self._dataset_builder.as_dataset(split=self._split, decoders=self._decoders). This error is
AssertionError: Dataset cityscapes: could not find data in /fs/datasets/cityscapes. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.
I believe the issue relates to the line Constructing tf.data.Dataset cityscapes for split train, from /fs/datasets/cityscapes/cityscapes/semantic_segmentation/1.0.0, which is printed before the error. The extra path components come from the Cityscapes TFDS DatasetInfo object. If I try to edit data_dir or data_path on that object with self._dataset_builder.info.data_dir='/fs/datasets/cityscapes', I receive the error message AttributeError: can't set attribute. So if anyone has a fix, I'd appreciate it.
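For reference, a minimal reproduction of what I'm attempting (the builder construction is my assumption; the real script builds it elsewhere):

import tensorflow_datasets as tfds

# hypothetical reconstruction of the builder setup
builder = tfds.builder('cityscapes/semantic_segmentation')

# raises AttributeError: can't set attribute,
# since DatasetInfo exposes data_dir as a read-only property
builder.info.data_dir = '/fs/datasets/cityscapes'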

Related

Error: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram:// path when trying to unpickle a file

I am running into this error; I can't unpickle a file in my Jupyter notebook:
import os
import pickle
import joblib
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# open the file handle, load the pickled model, then close the handle
file = open("loan_model3.pkl", "rb")
mdl = pickle.load(file)
file.close()
and it always shows the error message below, even though I've upgraded all my libraries.
Error Message:
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram://89506590-ec42-44a9-b67c-3ee4cc8e884e/variables/variables. You may be trying to load on a different device from the computational device. Consider setting the experimental_io_device option in tf.saved_model.LoadOptions to the io_device such as '/job:localhost'.
I tried upgrading my libraries, but it still didn't work.
I got the same error when I was trying to store my Sequential model in a .pkl file. Since a Sequential model is a TensorFlow Keras model, we have to store it in a .h5 file instead; Keras saves models in this format because it can easily store the weights and model configuration in a single file.
Code:
from keras.models import load_model

# save the trained Keras model as HDF5 instead of pickling it
model.save('model.h5')

# load it back later
model_final = load_model('model.h5')
I don't know if you are still here, but I found the solution: you should not save a TensorFlow model into a pickle file, but into a .h5 file instead.
from tensorflow import keras

## save model
save_path = './model.h5'
model.save(save_path)

## load tensorflow model
model = keras.models.load_model(save_path)
This worked for me. Hope this helps you too.
This worked for me:
import tensorflow as tf

path = './model.h5'
model.save(path)
loaded_model = tf.keras.models.load_model(path)
I faced the same issue, but saving the model as a .h5 file worked for me. Now I'm able to load the .h5 model.

'str' object has no attribute 'call' when converting a .model to a .tflite file

I have been following this tutorial to perform voice command recognition for a couple of words on my ESP32: https://github.com/atomic14/voice-controlled-robot
I was able to train my model and produce the "fully_trained.model" file.
Currently I am trying to convert the .model file into a .tflite file, however I am getting the "'str' object has no attribute 'call'" error: Code, Code, Errors.
My TensorFlow version is 2.6.2 and my Python version is 3.10.
Unfortunately, I do not have 10 reputation points yet, so I couldn't embed the images.
If you use tf.lite.TFLiteConverter.from_keras_model you need to pass the tf.keras.Model instance, not the path to the saved_model folder.
Use tf.lite.TFLiteConverter.from_saved_model() instead and pass the path to the "fully_trained.model" folder.
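For example, a minimal sketch of the SavedModel route (assuming "fully_trained.model" is a SavedModel directory; the output file name is my own choice):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("fully_trained.model")
tflite_model = converter.convert()

# write the converted flatbuffer to disk
with open("model.tflite", "wb") as f:
    f.write(tflite_model)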
You've passed "fully_trained.model", with quotation marks, as an argument to TFLiteConverter. That's a string. Give the model a name and pass that name as an argument to the converter, without quotation marks.
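Roughly like this (a sketch, assuming the folder can be loaded back as a Keras model):

import tensorflow as tf

# load the trained model object first
model = tf.keras.models.load_model("fully_trained.model")

# pass the model instance, not the path string
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()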

Where are the `tfds.load` datasets saved?

I downloaded the cats vs dogs dataset using tfds.load('cats_vs_dogs'), and I want to find where it has been saved on my computer. After reading a bit, I came across someone who claims the dataset can be found at ~/tensorflow_datasets/cats_vs_dogs/, but I can't find a folder called cats_vs_dogs at that path. How can I get the path to the files?
By default, assuming TFDS_DATA_DIR has not been set, datasets are stored under ~/tensorflow_datasets.
However, since this depends on your system and setup: if you want to check the dataset and see it, I would suggest manually setting data_dir when calling tfds.load. Then you know for sure where it is stored.
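For instance, a minimal sketch of both options (the environment-variable fallback is my assumption of how the default is resolved):

import os
import tensorflow_datasets as tfds

# default location unless the TFDS_DATA_DIR environment variable overrides it
default_dir = os.environ.get('TFDS_DATA_DIR', os.path.expanduser('~/tensorflow_datasets'))
print(default_dir)

# pin the location explicitly so you know for sure where the files go
ds = tfds.load('cats_vs_dogs', data_dir=default_dir)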
You can use this:
import tensorflow_datasets as tfds

tfds.core.get_tfds_path('cats_vs_dogs')
# 'C:/Users/user/anaconda3/envs/env/lib/site-packages/tensorflow_datasets/cats_vs_dogs'
You can also set the download folder explicitly:
import tensorflow_datasets as tfds

data_dir = 'D:\\Sandbox\\Github\\DATA_TFDS'
ds, info = tfds.load(name='mnist',
                     split=['train', 'test'],
                     shuffle_files=True,
                     data_dir=data_dir,
                     with_info=True,
                     download=True)

TypeError: Input 'filename' of 'ReadFile' Op has type float32 that does not match expected type of string

I am running this code from the tutorial here: https://keras.io/examples/vision/image_classification_from_scratch/
with a custom dataset that is divided into 2 datasets as in the tutorial. However, I got this error:
TypeError: Input 'filename' of 'ReadFile' Op has type float32 that does not match expected type of string.
I tried this cast:
is_jfif = str(tf.compat.as_bytes("JFIF")) in fobj.peek(10)
but nothing changed as far as the error is concerned.
I have been trying all day to figure out how to solve it, without any success. Can someone help me? Thank you...
The simplest way I found is to create a subfolder and copy the files into it.
E.g., let's assume your files are 0.jpg, 1.jpg, 2.jpg, ..., 2000.jpg and sit in a directory named "patterns".
It seems the Keras API does not accept them because the files are named by numbers, which Keras reads as float32.
To overcome this issue, either rename the files as another answer suggests, or simply create a subfolder under "patterns" (e.g. "patterndir"). So now your image files are under ...\patterns\patterndir.
Keras is possibly using the subdirectory name internally and may be attaching it in front of the image file name, thus making it a string (something like patterndir_01.jpg, patterndir_02.jpg). [Note: this is my interpretation; it does not mean it is true.]
When you run it this time, you will see that it works and you will get a message like:
Found 2001 files belonging to 1 classes.
Using 1601 files for training.
Found 2001 files belonging to 1 classes.
Using 400 files for validation.
My code looks like this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Generate a dataset
image_size = (28, 28)
batch_size = 32

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "patterns",
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "patterns",
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
)
In my case, I simply did not have enough samples in the training directories. There was one per category and I got the error.
Just make a subdirectory and move your files there.
So if the files are here:
'/home/dataset_28/'
Put them here:
'/home/dataset_28/files/'
And then do this:
from tensorflow.keras.preprocessing import image_dataset_from_directory
image_dataset_from_directory('/home/dataset_28/', batch_size=1, image_size=(28, 28))
The names of the files are purely numeric, so they get parsed as float32.
Renaming all the images in the dataset solves the problem.
Loop over all the files with os.rename(), as in the sketch below.
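A minimal sketch of that loop (the directory name and the new naming scheme are my assumptions):

import os

src_dir = "patterns"  # hypothetical directory holding the numerically named images

# prefix each file name so it is no longer purely numeric
for name in os.listdir(src_dir):
    root, ext = os.path.splitext(name)
    os.rename(os.path.join(src_dir, name),
              os.path.join(src_dir, "img_" + root + ext))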
I was just hitting this TypeError: Input 'filename' of 'ReadFile' Op has type float32 that does not match expected type of string error too, with tensorflow==2.4.4.
I played around with validation_split:
Error happens: validation_split=0.001. I did this in an effort to have 0 images in the validation set.
Error doesn't happen: validation_split=0.2. This results in 1 image used for validation.
Conclusion: a known root cause of this error is having 0 images inside the validation set; a quick sanity check is sketched after the failed fixes below.
Failed fixes:
Per this answer, I renamed my files via os.rename to 1.jpg, 2.jpg, 3.jpg, ... It didn't work :/
Per this answer talking about one image per category: that's wrong, it's fine to have just one image inside a category.
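A quick sanity check along those lines (a sketch; the file count is hypothetical, and the point is that a tiny validation_split can floor to zero validation images):

num_files = 800  # hypothetical; replace with the number of images in your directory

for validation_split in (0.001, 0.2):
    n_val = int(num_files * validation_split)  # assuming the split count is floored
    print(f"validation_split={validation_split}: {n_val} validation images")
# here 0.001 yields 0 validation images, which reproduces the error condition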
One of the issues is related to image downloading: if a designated image file was not downloaded completely, it also shows the same error.
You have to check several things when this exception appears:
Do you have enough data for training? If you only have limited data in your training set, this exception can appear. I guess if you want to split the data, the amount of data should be divisible by 10 (take validation_split=0.1 for example).
Are your images in a valid format? This method only allows the formats ('.bmp', '.gif', '.jpeg', '.jpg', '.png'); an invalid format raises this exception. A pre-flight check is sketched below.
Honestly, the exception doesn't give much information about what's happening exactly. Hopefully it will be updated in the near future.
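A minimal sketch of that format check (the "patterns" directory name is a placeholder):

import os

VALID_EXTS = ('.bmp', '.gif', '.jpeg', '.jpg', '.png')

# walk the dataset directory and flag any file Keras will not accept
for root, _, files in os.walk("patterns"):
    for name in files:
        if not name.lower().endswith(VALID_EXTS):
            print("invalid format:", os.path.join(root, name))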

Issues in Gensim WordRank Embeddings

I am using the Gensim wrapper to obtain WordRank embeddings (I am following their tutorial) as follows.
from gensim.models.wrappers import Wordrank

model = Wordrank.train(wr_path="models", corpus_file="proc_brown_corp.txt",
                       out_name="wr_model")
model.save("wordrank")
model.save_word2vec_format("wordrank_in_word2vec.vec")
However, I am getting the following error: FileNotFoundError: [WinError 2] The system cannot find the file specified. I am just wondering what I have done wrong, as everything looks correct to me. Please help me.
Moreover, I want to know whether the way I am saving the model is correct. I saw that Gensim offers the method save_word2vec_format. What is the advantage of using it over directly using the original WordRank model?
FileNotFoundError: [WinError 2] The system cannot find the file specified.
So, I am going to assume here that you got the traceback on
model = Wordrank.train(wr_path="models", corpus_file="proc_brown_corp.txt",
                       out_name="wr_model")
See, wr_path is supposed to point to where you have WordRank installed; to be more specific, the path to the folder where your wordrank binary is saved.
So mine was path_to_wordrank_binary = '/home/ubuntu/wordrank', where wordrank is the folder that contains wordrank.cpp.
Then ensure that your corpus file is in the current directory, since that's the path you have given. A corrected call is sketched below.
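A minimal sketch of the corrected call (the install path is my example from above; the corpus file sits in the current directory):

from gensim.models.wrappers import Wordrank

# wr_path points at the WordRank install directory (where the binary lives),
# not at an arbitrary output folder
path_to_wordrank_binary = '/home/ubuntu/wordrank'
model = Wordrank.train(wr_path=path_to_wordrank_binary,
                       corpus_file='proc_brown_corp.txt',
                       out_name='wr_model')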
This is the tutorial you should be looking into.
