I succeeded in executing all the steps of the online tutorial for Google Cloud ML.
But since the dataset used in this tutorial is already a TFRecord, I didn't understand how to transform my numpy dataset into a TFRecord one.
Then, I tried to create my TFRecord using a slightly modified version of the official convert_to_records.py. What I understand is that we can only convert primitive variables to TFRecord, and that is why the trick of converting a list of floats to bytes is used.
Then I have to convert my string back to a list of floats somewhere. Thus, I tried to perform this task with either line 97 or line 98 of my modified script model.py.
Unfortunately, none of these attempts works. I always get the following error message:
ValueError: rank of shape must be at least 2 not: 1
This is because the shape of my variable features is (batch_size,) and not (batch_size, IMAGE_PIXELS). But I don't understand why.
Am I trying to launch google-cloud-ml the wrong way, or are there some more parameters to tweak?
The error indicates a rank 2 (matrix) is expected but the value is actually rank 1 (a vector). I suspect this is because np.tostring() returns a single string rather than a list of strings.
I think that is somewhat tangential, though, because your float-to-string and string-to-float conversions are not consistent. You convert float-to-string using numpy's builtin tostring() method. That returns the byte representation of the data, i.e.
import numpy as np
x = np.array([1.0, 2.0])
print(x.tostring())
This prints the raw byte representation of the two floats (mostly unprintable binary), not
['1.0', '2.0']
The latter is what tf.string_to_number expects.
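To illustrate the difference, here is a minimal sketch (assuming TF 1.x session semantics, matching the era of this code):

import tensorflow as tf

with tf.Session() as sess:
    # decimal text representations convert cleanly to floats
    print(sess.run(tf.string_to_number(['1.0', '2.0'])))   # [1. 2.]
    # feeding the raw bytes from tostring() here would raise an error instead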
You could make the float-to-string and string-to-float conversions consistent but I think a better solution is to just represent the data as floats. For example:
def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def _float_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=value))

e = tf.train.Example(features=tf.train.Features(feature={
    'labels': _int64_feature([10]),
    'features': _float_feature([100.0, 200, ....])}))

feature_map = {
    'labels': tf.FixedLenFeature(
        shape=[1], dtype=tf.int64, default_value=[-1]),
    'features': tf.FixedLenFeature(
        shape=[NUM_PIXELS], dtype=tf.float32),
}
result = tf.parse_example([e.SerializeToString()], features=feature_map)
A Feature proto allows float32 values to be stored inside a float_list. You only need to convert floats to bytes if you are using float64. Your data is float32, so that's unnecessary.
It might help to analyze both the output of read_data_sets and the output of the parse_example operation in your model.py.
What read_data_sets produces
read_data_sets, as you point out, creates numpy arrays for each image. They have shape [28, 28, 1] (height x width x channels; the images are monochrome), and in your original call to read_data_sets you specified that you wanted the image data as uint8 arrays. When you call tostring on a uint8 numpy array, the shape information is discarded, and since each uint8 is a single byte, you end up with a byte string of length 784, with one entry for each pixel of the original 28x28x1 array in row-major order. This is then stored as a bytes_list in the resulting tf.train.Example.
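As a quick illustration with a dummy image (hypothetical values, just to show the flattening):

import numpy as np

image = np.zeros((28, 28, 1), dtype=np.uint8)   # a dummy monochrome MNIST-sized image
raw = image.tostring()
print(len(raw))   # 784 -- one byte per pixel; the shape information is gone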
To recap, each entry in the feature map under the features key has a list of bytes with exactly one entry. That entry is a string of length 784 where each 'character' in the string is a value between 0-255 representing the monochrome pixel value for a point in the original 28x28 image. The following is a sample instance of tf.train.Example as printed by Python:
features {
  feature {
    key: "features"
    value {
      bytes_list {
        value: "\000\000\257..."
      }
    }
  }
  feature {
    key: "labels"
    value {
      int64_list {
        value: 10
      }
    }
  }
}
What parse_example expects and returns
tf.parse_example accepts a vector of tf.string objects as input. These objects are serialized tf.train.Example objects. In your code, util.read_examples produces exactly that.
The other argument to tf.parse_example is the schema of the examples. As mentioned before, the features entry in your Example is a tf.string. For reference, your code has:
def parse_examples(examples):
  feature_map = {
      'labels': tf.FixedLenFeature(
          shape=[], dtype=tf.int64, default_value=[-1]),
      'features': tf.FixedLenFeature(
          shape=[], dtype=tf.string),
  }
  return tf.parse_example(examples, features=feature_map)
The interesting thing, related to the error message you received, is the shape parameter. The shape parameter specifies the shape of a single instance; in this case, by specifying shape=[] you are saying that each image is a rank-0 string, which is to say, a plain old string (i.e., not a vector, not a matrix, etc.). This requires that the bytes_list have exactly one element. That's exactly what you are storing in each features field of your tf.train.Example.
Even though the shape property refers to the shape of a single instance, the output of tf.parse_example for the features field will be the whole batch of examples. This can be a bit confusing. So while each individual example has a single string (shape=[]), the batch is a vector of strings (shape=[batch_size]).
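A minimal sketch of that behaviour, building a couple of serialized Examples by hand (hypothetical pixel bytes) and running them through the parse_examples function above:

import tensorflow as tf

# a hand-built Example in the bytes_list form described above
e = tf.train.Example(features=tf.train.Features(feature={
    'labels': tf.train.Feature(int64_list=tf.train.Int64List(value=[10])),
    'features': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b'\x00' * 784]))}))

parsed = parse_examples([e.SerializeToString(), e.SerializeToString()])
print(parsed['features'].shape)   # (2,) -- one string per example in the batch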
Using the image
Having the image data in a string is not very useful; we need to convert it back to numerical data. The TensorFlow op to do this is tf.decode_raw (Jeremy Lewi explained why tf.string_to_number won't work here):
image_bytes = tf.decode_raw(parsed['features'], out_type=tf.uint8)
image_data = tf.cast(image_bytes, tf.float32)
(Be sure to set out_type=tf.uint8 since that was the data type output by read_data_sets.) Typically, you're going to want to cast the result to tf.float32. Sometimes it's even useful to reshape the tensor to recover the original shape, e.g.,
# New shape is [batch_size, height, width, channels]. We use
# -1 as the first dimension in case batches are variable size.
image_data = tf.reshape(image_data, [-1, 28, 28, 1])
(NB: you probably don't need that in your code).
Alternatively, you could store the data as tf.float32 by calling read_data_sets with dtype=tf.float32 (the default). Then you can construct your tf.train.Example as explained by Jeremy Lewi, who also gave the code to parse such examples. However, the shapes will be different in that case. The shape of each instance (as indicated by the shape in FixedLenFeature) is now [IMAGE_PIXELS], and the shape of the features entry in the output of tf.parse_example is [batch_size, IMAGE_PIXELS].
The tradeoff between uint8 and float32, of course, is that the data on disk will be approximately four times as large for the latter, but you avoid the extra cast needed for the former. In the case of MNIST where there isn't much data, the added clarity of directly dealing with float data is probably worth the extra space.
Related
I am using the Scikit-Image imread function to read images for a PyTorch data loader.
I get errors from the function ToTensor(), saying that the strides of the numpy array are negative.
I read about it and using somearray.copy() solves it.
Yet, I'd like to solve it from the root. How can I force Scikit-Image to read the image into a contiguous array with regular strides?
I looked for solutions to this case, and they are mostly about creating a new copy of the data, which I want to avoid.
These are the properties of the array:
print(f'shape: {img.shape}')
print(f'dtype: {img.dtype}')
print(f'strides: {img.strides}')
The output:
shape: (4032, 3024, 3)
dtype: uint8
strides: (3, -12096, 1)
When I run img.base I get the values of the data, though the dimensions are (3024, 4032, 3).
I don't know a lot about image file formats, but I can make some deductions from the data you provided:
shape: (4032, 3024, 3)
dtype: uint8
strides: (3, -12096, 1)
img.base shape: (3024, 4032, 3)
img is a view of its base. The negative strides[1] means that dimension has been reversed, e.g. with ::-1 indexing. The fact that the largest stride is in the middle means the first two dimensions have been swapped (transpose(1,0,2)). I expect img.base.strides is (12096, 3, 1); 12096 is 3*4032.
jpg is a compressed format, but I assume the base is close in layout to the file, and this view is needed to conform to our normal numpy expectations for an array.
img.copy() will have the same shape, but strides will be (9072,3,1).
If plt.imread produces an array with that shape and strides, it may well have returned that copy rather than the view. It's not necessarily being any more "efficient".
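A small sketch that reconstructs this kind of view with dummy data (hypothetical values, just to show the strides arithmetic):

import numpy as np

base = np.zeros((3024, 4032, 3), dtype=np.uint8)   # strides (12096, 3, 1)
img = base.transpose(1, 0, 2)[:, ::-1, :]          # swap the first two dims, reverse dim 1
print(img.shape, img.strides)                      # (4032, 3024, 3) (3, -12096, 1)

# a C-contiguous copy restores "regular" strides, at the cost of copying the data
print(np.ascontiguousarray(img).strides)           # (9072, 3, 1)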
Think about how we print a 2d array - 1st dimension, rows, going down, 2nd, columns, going across, left to right. But think about a common xy plot - x goes left to right, and y goes from bottom up. Or look at what np.meshgrid says about indexing, 'ij' versus 'xy'.
Having the size 3 dimension last is just another convention. That's the color 'channel', 3 for RGB, 4 adds a transparency value, and 1 for b/w. Sometimes arrays have that dimension first.
I am building a dataset with two tensors of shape [batch, width, height, 3] and [batch, class] for each element. For simplicity, let's say class = 5.
What shape do you feed to dataset.padded_batch(1000, shape) such that the image is padded along the width/height dimensions?
I have tried the following:
tf.TensorShape([[None,None,None,3],[None,5]])
[tf.TensorShape([None,None,None,3]),tf.TensorShape([None,5])]
[[None,None,None,3],[None,5]]
([None,None,None,3],[None,5])
(tf.TensorShape([None,None,None,3]),tf.TensorShape([None,5]))
Each of them raises a TypeError.
The docs state:
padded_shapes: A nested structure of tf.TensorShape or tf.int64 vector
tensor-like objects representing the shape to which the respective
component of each input element should be padded prior to batching.
Any unknown dimensions (e.g. tf.Dimension(None) in a tf.TensorShape or
-1 in a tensor-like object) will be padded to the maximum size of that dimension in each batch.
The relevant code:
dataset = tf.data.Dataset.from_generator(generator,tf.float32)
shapes = (tf.TensorShape([None,None,None,3]),tf.TensorShape([None,5]))
batch = dataset.padded_batch(1,shapes)
Thanks to mrry for finding the solution. It turns out that the types passed to from_generator have to match the number of tensors in each entry.
new code:
dataset = tf.data.Dataset.from_generator(generator,(tf.float32,tf.float32))
shapes = (tf.TensorShape([None,None,None,3]),tf.TensorShape([None,5]))
batch = dataset.padded_batch(1,shapes)
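For completeness, a minimal runnable sketch under the same assumptions (the generator here is a hypothetical stand-in that yields one variable-sized image and one 5-class label per element):

import numpy as np
import tensorflow as tf

def generator():
    # hypothetical data: an image of shape [1, h, w, 3] and a label of shape [1, 5]
    for h, w in [(4, 6), (3, 5)]:
        yield (np.zeros((1, h, w, 3), np.float32), np.zeros((1, 5), np.float32))

dataset = tf.data.Dataset.from_generator(generator, (tf.float32, tf.float32))
shapes = (tf.TensorShape([None, None, None, 3]), tf.TensorShape([None, 5]))
batch = dataset.padded_batch(2, shapes)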
TensorShape doesn't accept nested lists. tf.TensorShape([None, None, None, 3, None, 5]) and TensorShape(None) (note no []) are legal.
Combining these two tensors sounds odd to me, though. I'm not sure what you're trying to accomplish, but I'd recommend trying to do it without combining tensors of different dimensions.
I'm looking to see if there is a way to feed sequence data as Numpy arrays to a text LSTM model defined in CNTK. Each instance in my dataset is a sequence of integers mapping back to words, and the length of each sequence is different. It seems like one can convert raw text data to the CTF format and feed this data to a model by creating a reader function which generates mini-batches, as in this example. However, I'm wondering if there is a way to feed Numpy arrays to this same model.
Further down in this example, there is a discussion of feeding sequences with Numpy, which I was hoping would solve my problem. However, the example deals with sequences of images instead of variable-length sequences of words. In the case of the example, we'll end up with a tensor of n elements that are each 3 x 32 x 32, and we can set up an input variable expecting these dimensions. However, in the case of sequences of words where each sequence has a different length, this example breaks down.
Any help on interop between CNTK and Numpy for text-based LSTMs/RNNs would be greatly appreciated.
You are probably looking for:
x = cntk.sequence.input_variable(shape=())
Here is a small sample program that demonstrates how it works with variable sequence lengths:
import numpy as np
import cntk
# define the model
x = cntk.sequence.input_variable(shape=())
z = cntk.sequence.last(x)
# define the data
a = [[1,2,3], [4,5], [6,7,8,9], [0]]
b = [np.array(i, dtype=np.float32) for i in a]
# evaluate
res = z.eval({x: b})
print(res)
I want to know why this error occurred.
The input is image files (24*375*3 (width, height, channels) images, *.png) and the output is a label file (.csv) which has Boolean (0 or 1) labels.
Here is my github
https://github.com/dldudwo0805/DeepLearningPractice
Please give me advice.
The error message is:
The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, numpy ndarrays, or TensorHandles.
y_data = tf.reshape(y_data, [50, 1])
y_data is a tensor. Try np.reshape rather than tf.reshape
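For example, a sketch assuming y_data is a numpy array of the 50 labels loaded from the CSV, and y is the corresponding placeholder:

import numpy as np

# hypothetical stand-in for the labels read from the CSV file
y_data = np.array([0, 1] * 25, dtype=np.float32)
y_data = np.reshape(y_data, (50, 1))   # numpy reshape keeps it a feedable ndarray
# sess.run(train_op, feed_dict={y: y_data})   # y being the tf.placeholder for labels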
There is a function; I got the following error message while calling it.
The function resizes a given image set and puts the transformed images into a new set, imgs_p.
For instance, the input imgs has shape (5635, 1, 420, 580). I want to transform it to (5635, 64, 80, 1). This is what I did, but I got the error message ValueError: could not broadcast input array from shape (80,64) into shape (80,1).
How can I solve this problem? Thanks.
def preprocess(imgs):
    imgs_p = np.ndarray((imgs.shape[0], img_rows, img_cols, imgs.shape[1]), dtype=np.uint8)
    print('imgs_p: ', imgs_p.shape)
    for i in range(imgs.shape[0]):
        print('imgs[i,0]: ', imgs[i,0].shape)
        imgs_p[i,0] = resize(imgs[i,0], (img_rows, img_cols))
    return imgs_p
I presume you want to roll the "1" dimension to the correct position:
z = np.moveaxis(z, 1, -1)   # shape becomes (5635, 420, 580, 1)
whereafter you can run a for-loop over each image and resize it, using either skimage or scipy.ndimage.
Be careful with downsampling! You probably want to apply a Gaussian blur first to make sure all the data is taken into account.
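A minimal sketch of that approach, assuming the shapes from the question, img_rows=64, img_cols=80, and skimage.transform.resize (preserve_range keeps the uint8 value range; anti_aliasing, available in newer scikit-image releases, applies the Gaussian smoothing mentioned above):

import numpy as np
from skimage.transform import resize

img_rows, img_cols = 64, 80

def preprocess(imgs):
    # imgs is assumed to have shape (N, 1, 420, 580); move the channel axis to the end
    imgs = np.moveaxis(imgs, 1, -1)                              # (N, 420, 580, 1)
    imgs_p = np.ndarray((imgs.shape[0], img_rows, img_cols, imgs.shape[-1]),
                        dtype=np.uint8)
    for i in range(imgs.shape[0]):
        imgs_p[i, :, :, 0] = resize(imgs[i, :, :, 0], (img_rows, img_cols),
                                    preserve_range=True,
                                    anti_aliasing=True).astype(np.uint8)
    return imgs_p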