On each epoch over a TensorFlow dataset, it appears that functions applied to each element are re-evaluated separately each time. I am unable to find in the docs or source how or why this happens, which is my question. It can be reproduced like this:
import tensorflow as tf
import random

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset = dataset.repeat(3)

def p(ds):
    for ex in ds:
        ex = ex + random.randint(3, 9)
        print(ex)
    return ds

nds = p(dataset)
Output:
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(10, shape=(), dtype=int32)
tf.Tensor(10, shape=(), dtype=int32)
tf.Tensor(9, shape=(), dtype=int32)
tf.Tensor(8, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(8, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(11, shape=(), dtype=int32)
Can anyone explain how this happens? As a note, this can also happen when repeat() is infinite and the iteration is done in a generator.
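For contrast, here is a small sketch of what I believe is going on (my own reading, not something I found in the docs): the loop above runs eagerly, so random.randint is plain Python that executes again on every iteration. If the same randomness goes through map instead, the function is traced once and the randint result is frozen into the graph:

import tensorflow as tf
import random

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3]).repeat(3)
# random.randint runs once, at trace time, so its result is baked into
# the graph as a constant rather than re-sampled per element.
ds = ds.map(lambda x: x + random.randint(3, 9))
for ex in ds:
    print(ex)  # all nine values share a single random offset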
I am learning TensorFlow (version 2.8.0) on my MacBook M1.
To debug the code in a dataset's map function, I want to print tensor values inside my function.
def function(i):
    print("in: ", i)
    if i < 2:
        i = i - 1
    return i

dataset = tf.data.Dataset.range(1, 6)
dataset = dataset.map(lambda x: tf.py_function(function, inp=[x], Tout=tf.int64))
for x in dataset:
    print("out:", x)
I got the output below:
in: tf.Tensor(1, shape=(), dtype=int64)
out: tf.Tensor(0, shape=(), dtype=int64)
in: tf.Tensor(2, shape=(), dtype=int64)
out: tf.Tensor(2, shape=(), dtype=int64)
in: tf.Tensor(3, shape=(), dtype=int64)
out: tf.Tensor(3, shape=(), dtype=int64)
in: tf.Tensor(4, shape=(), dtype=int64)
out: tf.Tensor(4, shape=(), dtype=int64)
in: tf.Tensor(5, shape=(), dtype=int64)
out: tf.Tensor(5, shape=(), dtype=int64)
After I deleted the print outside, I did not get any output at all.
What is the difference between the print inside and the print outside?
I don't understand why the inner print only takes effect when the outer one is present as well.
Besides that, what is the difference between print and tf.print?
I believe it is because TensorFlow datasets are lazily evaluated, which means they aren't computed until you actually try to iterate over the result.
When you removed the for loop, you were no longer iterating over the result, so it was never evaluated.
See https://stackoverflow.com/a/54679387/494134
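To make both points concrete, here is a minimal sketch (assuming a plain graph-traced map function rather than py_function): Python's print fires once, at trace time, with a symbolic tensor; tf.print is compiled into the graph and fires per element; and neither fires before you iterate.

import tensorflow as tf

def trace_vs_runtime(x):
    print("traced:", x)            # plain print: runs once, during tracing
    tf.print("runtime value:", x)  # tf.print: becomes a graph op, runs per element
    return x

ds = tf.data.Dataset.range(3).map(trace_vs_runtime)
# Nothing has printed yet -- the pipeline is only a definition.
for _ in ds:
    pass  # now "traced:" appears once and "runtime value:" three times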
I have data that looks like this when I print it:
data = [<tf.Tensor: shape=(), dtype=float32, numpy=-0.0034351824>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0003163157>, <tf.Tensor: shape=(), dtype=float32, numpy=0.00060091465>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0012879161>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0002799925>]
So this is a list where the elements are of type <class 'tensorflow.python.framework.ops.EagerTensor'>.
I would like to convert it to a standard numpy array. So in this case it would look like:
array([-0.0034351824, 0.0003163157, 0.00060091465, 0.0012879161, 0.0002799925])
How can you do that?
You can use
import numpy as np
x = np.array(data)
You can use np.ravel since you want a 1D list:
np.ravel(data)
array([0.22184575, 0.3621379 , 0.5509906 , 0.20388651, 0.94017696], dtype=float32)
Full example:
import tensorflow as tf
import numpy as np
data = [tf.random.uniform((1,)) for i in range(5)]
np.ravel(data)
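If you would rather stay in TensorFlow until the final step, an alternative sketch (assuming scalar tensors, as in the question) is to stack first and convert once:

import tensorflow as tf

data = [tf.random.uniform(()) for i in range(5)]
arr = tf.stack(data).numpy()  # stacks the scalars into one (5,) float32 numpy array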
I need to do some preprocessing before feeding the input to my network. I am working on my data iterator and writing some functions to get the data ready. My train set is a large file and each line has the following scheme. Each line is semicolon-separated (a split may also be empty). Each split then contains different events which are comma-separated, and finally each observation has 5 signals which are colon-separated.
1:1560629595635:183.94:z1:Happy,2:1560629505635:100:z1:Sad;5:1561929595635:1:z1:Happy,13:1561629595635:12:j1:Sad;50:15616295956351:10:f1:Sad
My objective is to split each line in following order:
1. ';' Split
2. ',' Split
3. ':' Split
*Note: length after the last split = 5
In the following function that I wrote, if I specify indx it works fine, but only for the one fragment of the ';' split at that index.
def __semicolon_coma_colon_split(line, table, indx):
    sources = tf.strings.split(line, ';')
    pairs = tf.strings.split(sources[indx], ',')
    items = tf.strings.split(pairs, ':').to_tensor()
    return (tf.strings.to_number(items[:, 0], out_type=tf.int32),
            tf.strings.to_number(items[:, 1], out_type=tf.int64),
            tf.strings.to_number(items[:, 2], out_type=tf.float32),
            tf.cast(table.lookup(items[:, 3]), tf.int32),
            tf.cast(table.lookup(items[:, 4]), tf.int32))
And the current output shape is
(<tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>)
However, I need to run this concurrently on all the splits after ';', and the expected output would be a tuple of tuples, each containing 5 tensors:
((<tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>),
(<tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>, <tf.Tensor: shape=(2,)>),
(<tf.Tensor: shape=(1,)>, <tf.Tensor: shape=(1,)>, <tf.Tensor: shape=(1,)>, <tf.Tensor: shape=(1,)>, <tf.Tensor: shape=(1,)>))
Can't you do this with the built-in split method for strings?
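You can get most of the way there with it. Here is a minimal sketch (eager-mode only, with the last two columns left as raw strings since table isn't shown) that applies the same logic to every ';' fragment by looping in Python:

import tensorflow as tf

line = tf.constant(
    "1:1560629595635:183.94:z1:Happy,2:1560629505635:100:z1:Sad;"
    "5:1561929595635:1:z1:Happy,13:1561629595635:12:j1:Sad;"
    "50:15616295956351:10:f1:Sad")

def split_all_fragments(line):
    results = []
    for fragment in tf.strings.split(line, ';'):  # one pass per ';' fragment
        items = tf.strings.split(tf.strings.split(fragment, ','), ':').to_tensor()
        results.append((
            tf.strings.to_number(items[:, 0], out_type=tf.int32),
            tf.strings.to_number(items[:, 1], out_type=tf.int64),
            tf.strings.to_number(items[:, 2], out_type=tf.float32),
            items[:, 3],  # kept as strings here; the real code maps them via table
            items[:, 4],
        ))
    return tuple(results)

print(split_all_fragments(line))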
What is the equivalent of Pandas' df.head() for tf datasets?
Following the documentation here I've constructed the following toy examples:
dset = tf.data.Dataset.from_tensor_slices((tf.constant([1.,2.,3.]), tf.constant([4.,4.,4.]), tf.constant([5.,6.,7.])))
print(dset)
outputs
<TensorSliceDataset shapes: ((), (), ()), types: (tf.float32, tf.float32, tf.float32)>
I would prefer to get back something resembling a tensor, so to get some values I'll make an iterator.
dset_iter = dset.__iter__()
print(dset_iter.next())
outputs
(<tf.Tensor: id=122, shape=(), dtype=float32, numpy=1.0>,
<tf.Tensor: id=123, shape=(), dtype=float32, numpy=4.0>,
<tf.Tensor: id=124, shape=(), dtype=float32, numpy=5.0>)
So far so good. Let's try some windowing...
windowed = dset.window(2)
print(windowed)
outputs
<WindowDataset shapes: (<tensorflow.python.data.ops.dataset_ops.DatasetStructure object at 0x1349b25c0>, <tensorflow.python.data.ops.dataset_ops.DatasetStructure object at 0x1349b27b8>, <tensorflow.python.data.ops.dataset_ops.DatasetStructure object at 0x1349b29b0>), types: (<tensorflow.python.data.ops.dataset_ops.DatasetStructure object at 0x1349b25c0>, <tensorflow.python.data.ops.dataset_ops.DatasetStructure object at 0x1349b27b8>, <tensorflow.python.data.ops.dataset_ops.DatasetStructure object at 0x1349b29b0>)>
Ok, use the iterator trick again:
windowed_iter = windowed.__iter__()
windowed_iter.next()
outputs
(<_VariantDataset shapes: (), types: tf.float32>,
<_VariantDataset shapes: (), types: tf.float32>,
<_VariantDataset shapes: (), types: tf.float32>)
What? A WindowDataset's iterator gives back a tuple of other dataset objects?
I would expect the first item in this WindowDataset to be the tensor with values [[1.,4.,5.],[2.,4.,6.]]. Maybe this is still true, but it isn't readily apparent to me from this 3-tuple of datasets.
Ok. Let's get their iterators...
vd = windowed_iter.get_next()
vd0, vd1, vd2 = vd[0], vd[1], vd[2]
vd0i, vd1i, vd2i = vd0.__iter__(), vd1.__iter__(), vd2.__iter__()
print(vd0i.next(), vd1i.next(), vd2i.next())
outputs
(<tf.Tensor: id=357, shape=(), dtype=float32, numpy=1.0>,
<tf.Tensor: id=358, shape=(), dtype=float32, numpy=4.0>,
<tf.Tensor: id=359, shape=(), dtype=float32, numpy=5.0>)
As you can see, this workflow quickly becomes a mess. I like how TF 2.0 is attempting to make the framework more interactive and Pythonic. Are there good examples of the datasets API conforming to this vision too?
I was in a similar situation. I eventually ended up using zip:
train_dataset = train_dataset.window(10, shift=5)
for step_dataset in train_dataset:
    for (images, labels, paths) in zip(*step_dataset):
        train_step(images, labels)
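For completeness, here is a sketch of what I understand to be the more idiomatic route: flat_map each window back into dense tensors by batching and zipping its component datasets. Incidentally, dataset.take(n) is probably the closest analogue of df.head(n).

import tensorflow as tf

dset = tf.data.Dataset.from_tensor_slices(
    (tf.constant([1., 2., 3.]), tf.constant([4., 4., 4.]), tf.constant([5., 6., 7.])))

# Each window is a tuple of three tiny datasets; batch each one up to the
# window size, then zip them back together into plain tensors.
windowed = dset.window(2)
stacked = windowed.flat_map(
    lambda a, b, c: tf.data.Dataset.zip((a.batch(2), b.batch(2), c.batch(2))))

for x, y, z in stacked.take(1):  # take(1) ~ df.head(1)
    print(x.numpy(), y.numpy(), z.numpy())  # [1. 2.] [4. 4.] [5. 6.]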
I have tried the following:
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> tf.math.pow(3,3)
2019-05-06 16:05:51.508296: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
<tf.Tensor: id=3, shape=(), dtype=int32, numpy=27>
>>> tf.math.pow(9,(1/3))
<tf.Tensor: id=7, shape=(), dtype=int32, numpy=1>
>>> tf.math.pow(27,(1/3))
<tf.Tensor: id=11, shape=(), dtype=int32, numpy=1>
>>> tf.math.pow(27,0.3)
<tf.Tensor: id=15, shape=(), dtype=int32, numpy=1>
>>> tf.math.pow(4,0.5)
<tf.Tensor: id=19, shape=(), dtype=int32, numpy=1>
>>> tf.math.pow(4,1/2)
<tf.Tensor: id=23, shape=(), dtype=int32, numpy=1>
I know that sqrt is used for the square root. But if I need the cube root of a number, how can I calculate it with TensorFlow?
Try this please:
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> tf.math.pow(27.0,1.0/3.0)
2019-05-06 16:22:39.403646: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
<tf.Tensor: id=3, shape=(), dtype=float32, numpy=3.0>
I guess the issue is with the data type: with an integer base, both arguments are converted to int32, so the exponent 1/3 is truncated to 0 and 27**0 gives 1. Passing floats avoids this.
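One caveat worth adding (my own note, not part of the answer above): pow with a fractional exponent returns nan for negative floats, so a sign-safe cube root is sometimes written like this:

import tensorflow as tf

x = tf.constant(-27.0)
# Take the cube root of |x| and restore the sign, since
# tf.pow(-27.0, 1/3) would be nan.
cbrt = tf.sign(x) * tf.pow(tf.abs(x), 1.0 / 3.0)
print(cbrt)  # -> approximately -3.0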
If you mean seeing the result, you can use InteractiveSession and eval:
import tensorflow as tf
tf.InteractiveSession()
tf.pow(27.0,1/3).eval()