Using dataset.map function passes Tensor without values - python

Hello, I'm using TensorFlow 2.10 and struggling with the TensorFlow data API. I wrote the following simple piece of code:
import numpy as np
import tensorflow as tf

def loaddata(filepath, label):
    data = np.load(filepath.numpy().decode())
    return data, label

filenames = []
labels = []
# append some data to filenames and labels

dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(loaddata)
However, I receive an "AttributeError: 'Tensor' object has no attribute 'numpy'" error. I have searched for similar errors but could not find an effective way to pass tuples via dataset.map().
Any help will be appreciated.
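A common fix (a minimal sketch, not part of the original question) is to wrap the NumPy-based loader in tf.py_function: functions passed to dataset.map are traced in graph mode, where tensors are symbolic and have no .numpy() method, while tf.py_function runs its target eagerly. The label dtype and output shapes below are assumptions:

def loaddata(filepath, label):
    data = np.load(filepath.numpy().decode())
    return data.astype(np.float32), label

def loaddata_wrapper(filepath, label):
    # tf.py_function executes loaddata eagerly, so .numpy() works inside it
    data, label = tf.py_function(
        loaddata, inp=[filepath, label], Tout=[tf.float32, tf.int32])
    # py_function loses static shape information; restore it if known
    data.set_shape([None])  # assumption: 1-D arrays of unknown length
    label.set_shape([])
    return data, label

dataset = dataset.map(loaddata_wrapper)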

Related

Store original data (e.g., text, image) along with tensor data in Pytorch Dataloader

Currently, I am using TensorDataset followed by DataLoader to load my dataset like below:
tensor_loader = TensorDataset(x_input_ids, x_seg_ids, x_atten_masks, y)
data_loader = DataLoader(tensor_loader, shuffle=True, batch_size=batch_size)
I now want to also store original (text) data along with the tensor data in the data_loader like below:
tensor_loader = TensorDataset(x_input_ids, x_seg_ids, x_atten_masks, y, x_input_strs)
Note: x_input_strs is the text data corresponding to x_input_ids, but this fails since TensorDataset allows only tensors. I also tried something like this:
tensor_loader = Dataset(x_input_ids, x_seg_ids, x_atten_masks, y, x_input_strs)
But it gives the following error:
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
Any suggestions are appreciated.
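One workaround (a sketch, not from the original post) is a small custom torch.utils.data.Dataset that returns the strings alongside the tensors; PyTorch's default collate function gathers strings into a list per batch, so no custom collate_fn is needed:

from torch.utils.data import Dataset, DataLoader

class TextTensorDataset(Dataset):
    """Pairs tensor rows with their original text strings."""
    def __init__(self, input_ids, seg_ids, atten_masks, y, input_strs):
        assert len(input_ids) == len(input_strs)
        self.input_ids = input_ids
        self.seg_ids = seg_ids
        self.atten_masks = atten_masks
        self.y = y
        self.input_strs = input_strs

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return (self.input_ids[idx], self.seg_ids[idx],
                self.atten_masks[idx], self.y[idx], self.input_strs[idx])

tensor_loader = TextTensorDataset(x_input_ids, x_seg_ids, x_atten_masks, y, x_input_strs)
data_loader = DataLoader(tensor_loader, shuffle=True, batch_size=batch_size)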

Understanding which Keras and TensorFlow APIs to use for text classification

I was trying to classify my text with TensorFlow and Keras, and every time I tried using Keras to read my files from my directory, it would throw an error that the feature for reading the text was not available, even though it is included in the documentation.
I made my own file reader (see: how to read text files in keras using os.walk and converting to batched dataset), and when I then tried to vectorize my text using Keras preprocessing, again the module was not available.
As asked in the comments: I was trying to follow the Keras guide at https://keras.io/api/preprocessing/text/ and the example at https://keras.io/examples/nlp/text_classification_from_scratch/, which uses vectorization. I started digging into using tf to make tokens from the text, and I found how to vectorize text here: https://www.tensorflow.org/tutorials/text/word2vec
But the problem was that I could not use the functions because I could not understand them very well. My code was as follows:
train_dataset = get_files_from_dir(train_path, batch_size=batch_size, seed=seed)  # calls the text fetcher, which returns a batched dataset of text and labels
text_ds = train_dataset.map(lambda x, y: x)  # get features only (text with no labels)

def vec_maker(text):
    tokens = text.lower().split()
    vocab, index = {}, 1  # start indexing from 1
    vocab['<pad>'] = 0    # add a padding token
    for token in tokens:
        if token not in vocab:
            vocab[token] = index
            index += 1
    self.vocab = vocab
    return text
Now my problem is: how do I map text_ds through the function to make the vectors? If I pass the dataset directly as a function argument, it says:
File"/home/kim/Desktop/programs/python/text_processing/prog/text_process.py", line 78, in vec_maker
tokens = text.lower().split()
AttributeError: 'MapDataset' object has no attribute 'lower'
Help and an explanation will be much appreciated.
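For reference (a sketch, not part of the original question): in TF 2.x the usual graph-friendly route is tf.keras.layers.TextVectorization, which learns the vocabulary from the dataset with adapt() and can then be mapped over it; the max_tokens and sequence length below are assumptions:

import tensorflow as tf

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=10000,            # assumption: vocabulary cap
    output_sequence_length=100)  # assumption: pad/truncate length

# Learn the vocabulary from the features-only dataset.
vectorizer.adapt(text_ds)

# Map the full (text, label) dataset to (token_ids, label).
vectorized_ds = train_dataset.map(lambda x, y: (vectorizer(x), y))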

How to load the Fashion MNIST dataset in TensorFlow Federated?

I am working on a project with TensorFlow Federated. I have managed to use the libraries provided by the TensorFlow Federated simulations to load, train, and test some datasets.
For example, I load the EMNIST dataset:
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()
The datasets returned by load_data() are instances of tff.simulation.ClientData, an interface that allows me to iterate over client IDs and select subsets of the data for simulations.
len(emnist_train.client_ids)
3383
emnist_train.element_type_structure
OrderedDict([('pixels', TensorSpec(shape=(28, 28), dtype=tf.float32, name=None)), ('label', TensorSpec(shape=(), dtype=tf.int32, name=None))])
example_dataset = emnist_train.create_tf_dataset_for_client(
    emnist_train.client_ids[0])
I am trying to load the fashion_mnist dataset with Keras to perform some federated operations:
fashion_train, fashion_test = tf.keras.datasets.fashion_mnist.load_data()
but I get this error:
AttributeError: 'tuple' object has no attribute 'element_spec'
because Keras returns a tuple of NumPy arrays instead of a tff.simulation.ClientData like before:
def tff_model_fn() -> tff.learning.Model:
    return tff.learning.from_keras_model(
        keras_model=factory.retrieve_model(True),
        input_spec=fashion_test.element_spec,
        loss=loss_builder(),
        metrics=metrics_builder())

iterative_process = tff.learning.build_federated_averaging_process(
    tff_model_fn, Parameters.server_adam_optimizer_fn,
    Parameters.client_adam_optimizer_fn)
server_state = iterative_process.initialize()
To sum up:
Is there any way to create tff.simulation.ClientData elements from the Keras tuple of NumPy arrays?
Another solution that comes to mind is to use tff.simulation.HDF5ClientData and manually load the appropriate files in HDF5 format (train.h5, test.h5) in order to get the tff.simulation.ClientData. But my problem is that I can't find a URL for fashion_mnist in HDF5 format, I mean something like this for both train and test:
fileprefix = 'fed_emnist_digitsonly'
sha256 = '55333deb8546765427c385710ca5e7301e16f4ed8b60c1dc5ae224b42bd5b14b'
filename = fileprefix + '.tar.bz2'

path = tf.keras.utils.get_file(
    filename,
    origin='https://storage.googleapis.com/tff-datasets-public/' + filename,
    file_hash=sha256,
    hash_algorithm='sha256',
    extract=True,
    archive_format='tar',
    cache_dir=cache_dir)

dir_path = os.path.dirname(path)
train_client_data = hdf5_client_data.HDF5ClientData(
    os.path.join(dir_path, fileprefix + '_train.h5'))
test_client_data = hdf5_client_data.HDF5ClientData(
    os.path.join(dir_path, fileprefix + '_test.h5'))

return train_client_data, test_client_data
My final goal is to make the fashion_mnist dataset work with TensorFlow Federated learning.
You're on the right track. To recap: the datasets returned by tff.simulation.dataset APIs are tff.simulation.ClientData objects. The object returned by tf.keras.datasets.fashion_mnist.load_data is a tuple of numpy arrays.
So what is needed is to implement a tff.simulation.ClientData to wrap the dataset returned by tf.keras.datasets.fashion_mnist.load_data. Some previous questions about implementing ClientData objects:
Federated learning : convert my own image dataset into tff simulation Clientdata
How define tff.simulation.ClientData.from_clients_and_fn Function?
Is there a reasonable way to create tff client data sets?
This does require answering an important question: how should the Fashion MNIST data be split into individual users? The dataset doesn't include features that could be used for partitioning. Researchers have come up with a few ways to synthetically partition the data, e.g. randomly sampling some labels for each participant, but this has a great effect on model training, so it is worth investing some thought here; see the sketch below.
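As an illustration (a sketch under stated assumptions, not the answerer's code), here is one way to wrap the Keras arrays with from_clients_and_fn using a random index-shard partition. Note that this constructor has been renamed in newer TFF releases (e.g. tff.simulation.datasets.ClientData.from_clients_and_tf_fn), so check your version:

import collections
import numpy as np
import tensorflow as tf
import tensorflow_federated as tff

NUM_CLIENTS = 10  # assumption: number of synthetic clients
(x_train, y_train), _ = tf.keras.datasets.fashion_mnist.load_data()

# Randomly shard the 60,000 training examples across the clients.
shards = np.array_split(np.random.permutation(len(x_train)), NUM_CLIENTS)
client_ids = [str(i) for i in range(NUM_CLIENTS)]

def create_dataset_for_client(client_id):
    idx = shards[int(client_id)]
    return tf.data.Dataset.from_tensor_slices(
        collections.OrderedDict(
            pixels=x_train[idx].astype(np.float32) / 255.0,
            label=y_train[idx].astype(np.int32)))

fashion_train = tff.simulation.ClientData.from_clients_and_fn(
    client_ids, create_dataset_for_client)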

Extract image dataset from tensorflow record dataset in batches

I recently started studying CNNs with TensorFlow and found TFRecords very helpful in speeding up training; however, I'm struggling with the data API.
After parsing, my dataset is composed of (image, label) tuples. This is fine for training, but now I'm trying to extract the images into another dataset to call keras.predict() on.
I've tried this solution:
test_set = get_set_tfrecord(test_path, _parse_function, num_parallel_calls=4)
lab = []
f = True
for image, label in test_set.take(600):
    if f:
        img = tf.data.Dataset.from_tensors(image)
        f = False
    else:
        img = img.concatenate(tf.data.Dataset.from_tensors(image))
    lab.append(label.numpy())
Naive, not great code, but it works, EXCEPT that in order to perform the concatenation (i.e. stacking) it loads every image into RAM.
What's the proper way of doing this?
You can use the map API of tf.data.Dataset and write the following code:
result = test_set.map(lambda image, label: image)

# You can iterate and check what you have received at the end.
# I expect only the images.
for image in result.take(1):
    print(image)
I hope the above code resolves your issue and that this answer serves you well.
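As a follow-up note (not in the original answer): Keras model.predict accepts a tf.data.Dataset directly and consumes it batch by batch, so the extracted images never need to be stacked in RAM; the model variable and batch size here are assumptions:

images = test_set.map(lambda image, label: image)

# predict() streams the dataset lazily, one batch at a time
predictions = model.predict(images.batch(32))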

Pre-processing a Keras dataset using the Keras tokenizer

I am trying to do some pre-processing with the Keras tokenizer on data I read using the following code:
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.interleave(
    lambda x: tf.data.TFRecordDataset(x).prefetch(params.num_parallel_readers),
    cycle_length=params.num_parallel_readers,
    block_length=1)
dataset = dataset.map(_parse_example,
                      num_parallel_calls=params.num_parallel_calls)
Now that I have the parsed examples (the output of the _parse_example map function), I want to do some pre-processing on the text using the tf.keras.preprocessing.text.Tokenizer method texts_to_sequences.
However, texts_to_sequences expects Python strings as input, and what I get in the parsed examples are Tensors.
I can work around this by wrapping my code in py_func (see 'emb': tf.py_func... in the code below), but then I will not be able to serialize my model (according to the py_func documentation).
dataset = dataset.map(lambda features, labels: (
    {'window': features['window'],
     'winSize': features['winSize'],
     'LandingPage': features['LandingPage'],
     'emb': tf.py_func(getEmb, [features['window']], tf.int32)},
    tf.one_hot(labels, hparams.numClasses)))
I am looking for a way to do that (or a link to a similar example).
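One graph-native alternative (a sketch, not from the original question) is to build the token-to-id mapping with tf.lookup, which serializes with the graph, unlike py_func. The vocabulary list and the assumption that features['window'] is a scalar, space-separated string per example are mine:

import tensorflow as tf

vocab_list = ['the', 'quick', 'brown', 'fox']  # assumption: known vocabulary
table = tf.lookup.StaticVocabularyTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=vocab_list,
        values=tf.range(1, len(vocab_list) + 1, dtype=tf.int64)),
    num_oov_buckets=1)

def embed(features, labels):
    # For a scalar string, tf.strings.split returns a 1-D tensor of tokens.
    tokens = tf.strings.split(features['window'])
    features['emb'] = tf.cast(table.lookup(tokens), tf.int32)
    return features, tf.one_hot(labels, hparams.numClasses)

dataset = dataset.map(embed, num_parallel_calls=params.num_parallel_calls)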
