I have created a deep convolutional neural network to classify individual pixels in an image. My training data will always be the same size (32x32x7), but my testing data can be any size.
Currently, my model only works on images that are all the same size. I have used the TensorFlow MNIST tutorial extensively to help me construct my model. That tutorial only uses 28x28 images. How would the following MNIST model be changed to accept images of any size?
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
x_image = tf.reshape(x, [-1, 28, 28, 1])
To make things a little more complicated, my model has transpose convolutions where the output shape needs to be specified. How would I adjust the following line of code so that the transpose convolution outputs a shape that is the same size as the input?
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [1,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')
Unfortunately, there's no way to build dynamic graphs in TensorFlow (you could try TensorFlow Fold, but that's outside the scope of the question). This leaves you with two options:
1. Bucketing: you create multiple input tensors in a few hand-picked sizes and then choose the right bucket at runtime (see the "Seq2seq with bucketing" example). Either way, you'll probably need the second option.
2. Resize the input and output images.
Assuming the images all maintain the same aspect ratio, you can try resizing the image before inference. I'm not sure why you care about the output, since MNIST is a classification task.
Either way you can use the same approach:
from PIL import Image
basewidth = 28 # MNIST image width
img = Image.open('your_input_img.jpg')
wpercent = (basewidth/float(img.size[0]))
hsize = int((float(img.size[1])*float(wpercent)))
img = img.resize((basewidth, hsize), Image.LANCZOS)  # Image.ANTIALIAS was an alias of LANCZOS and is gone in Pillow 10
# Save image or feed directly to tensorflow
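The bucketing option above can be sketched in a few lines. This is only an illustration: the bucket sizes and the helper names `choose_bucket` and `padding_for` are hypothetical, and a real setup would build one input tensor per bucket and pad each image to the chosen bucket before feeding it.

```python
def choose_bucket(height, width, buckets):
    """Return the smallest (h, w) bucket that fits the image, or None."""
    fitting = [b for b in buckets if b[0] >= height and b[1] >= width]
    if not fitting:
        return None  # too large for every bucket: fall back to resizing
    return min(fitting, key=lambda b: b[0] * b[1])


def padding_for(height, width, bucket):
    """Rows and columns of padding needed to fill the bucket."""
    return bucket[0] - height, bucket[1] - width


BUCKETS = [(32, 32), (64, 64), (128, 128)]  # hypothetical hand-picked sizes

print(choose_bucket(28, 28, BUCKETS))    # (32, 32)
print(choose_bucket(40, 60, BUCKETS))    # (64, 64)
print(choose_bucket(300, 200, BUCKETS))  # None
print(padding_for(40, 60, (64, 64)))     # (24, 4)
```

Images that fit no bucket still need the resize route, which is why the second option tends to be required either way.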
The MNIST model code you mentioned is an example of a fully connected (FC) network, not a convolutional network. The input shape of [None, 784] corresponds to the flattened MNIST image size (28 x 28), so that network has a fixed input size.
What you are asking for is not possible with FC networks, because the number of weights and biases depends on the input shape. It is possible with a fully convolutional architecture, so my suggestion is to use one: its weights and biases do not depend on the input shape.
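A rough way to see the difference is to count parameters. The sketch below uses hypothetical helper names and plain arithmetic rather than any TensorFlow API. The last helper touches on the transpose-convolution part of the question: assuming 'SAME' padding, a forward convolution maps a size n to ceil(n / stride), so the transpose can legally return any size that maps back to its input, which is why `output_shape` has to be supplied.

```python
def dense_param_count(height, width, channels, units):
    """Weights + biases of a dense layer applied to a flattened image."""
    return height * width * channels * units + units


def conv_param_count(kernel_h, kernel_w, in_channels, filters):
    """Weights + biases of a 2-D convolution; the image size does not appear."""
    return kernel_h * kernel_w * in_channels * filters + filters


def same_transpose_output_sizes(input_size, stride):
    """Output sizes a 'SAME' transposed convolution can produce.

    Every n in the returned list satisfies ceil(n / stride) == input_size.
    """
    return list(range(input_size * stride - stride + 1, input_size * stride + 1))


print(dense_param_count(28, 28, 1, 10))    # 7850
print(dense_param_count(64, 48, 1, 10))    # 30730
print(conv_param_count(3, 3, 1, 10))       # 100
print(same_transpose_output_sizes(16, 2))  # [31, 32]
print(same_transpose_output_sizes(4, 2))   # [7, 8]
```

With strides of 2, the depth of 7 in the question would shrink to 4, and the transpose could return either 7 or 8. One common approach, offered here as a suggestion rather than something stated in the answers, is to take the target sizes from the shape of the original input at run time instead of hard-coding them.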
Adding to @gidim's answer, here is how you can resize the images in TensorFlow and feed the results directly to your inference. Note: this method scales and distorts the image, which might increase your loss.
All credit goes to Prasad Pai's article on Data Augmentation.
import tensorflow as tf
import numpy as np
from PIL import Image

IMAGE_SIZE = 28  # assumed target side length; set this to your model's input size
CHANNELS = 1     # assumed channel count; set this to match your images

def tf_resize_images(X_img_file_paths):
    X_data = []
    X = tf.placeholder(tf.float32, (None, None, CHANNELS))
    tf_img = tf.image.resize_images(X, (IMAGE_SIZE, IMAGE_SIZE))
    with tf.Session() as sess:
        # Each image is resized individually, as the images may differ in size.
        for index, file_path in enumerate(X_img_file_paths):
            img = np.asarray(Image.open(file_path), dtype=np.float32)
            img = img.reshape(img.shape[0], img.shape[1], CHANNELS)
            resized_img = sess.run(tf_img, feed_dict={X: img})
            X_data.append(resized_img)
    X_data = np.array(X_data, dtype=np.float32)  # Convert to numpy
    return X_data
I have a TensorFlow Keras model trained with TensorFlow 2.3. The model takes an image as input; however, it was trained with scaled inputs, so we have to divide the image by 255 before passing it to the model.
As we use this model across a variety of platforms, I am trying to simplify this by inserting a rescaling layer at the start of the Keras model (i.e. immediately after the input). Any future consumer of this model could then simply pass an image without having to scale it.
I am having a lot of trouble getting this to work. I understand I need to use the following function to create a rescaling layer:
tf.keras.layers.experimental.preprocessing.Rescaling(255, 0.0, "rescaling")
But I am unsure how to insert this to the start of the model.
You can insert this layer at the top of your trained model. In the example below, we first train a model with manually scaled input, and then reuse the same trained model with a Rescaling layer added at the top. Note that the scale must be 1/255 (not 255) to reproduce dividing the input by 255.
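import numpy as np
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.layers.experimental.preprocessing import Rescaling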
# generate dummy data
input_dim = (28,28,3)
n_sample = 10
X = np.random.randint(0,255, (n_sample,)+input_dim)
y = np.random.uniform(0,1, (n_sample,))
# create base model
inp = Input(input_dim)
x = Conv2D(8, (3,3))(inp)
x = Flatten()(x)
out = Dense(1)(x)
# fit base model with manual scaling
model = Model(inp, out)
model.compile('adam', 'mse')
model.fit(X/255, y, epochs=3)
# create new model with pretrained weight + rescaling at the top
inp = Input(input_dim)
scaled_input = Rescaling(1/255, 0.0, "rescaling")(inp)
out = model(scaled_input)
scaled_model = Model(inp, out)
# compare prediction with manual scaling vs layer scaling
pred = model.predict(X/255)
pred_scaled = scaled_model.predict(X)
(pred.round(5) == pred_scaled.round(5)).all() # True
Rescaling images is part of data preprocessing and is also called image normalization. It puts the dataset's numerical values on a uniform scale before they reach your model. In Keras you can do this in several ways, depending on your goal:
If you are training an artificial neural network, you can use:
a "Batch normalization" layer, a "Layer Normalization" layer, or the rescale method of Keras you mentioned. You can look at this resource for more information about normalization.
To use the rescale method you mentioned:
# Import your libraries first
import tensorflow as tf
from tensorflow.keras.layers import BatchNormalization
# If you are loading the dataset from a directory
import pathlib
Then import your dataset:
Dataset_Dir = '/Dataset/ path'
image_size = (256, 256)  # the image size in your dataset
image_shape = (96, 96, 3)  # the shape you wish for your images in your network
Then divide your dataset into training and test sets. I use a 70/30 split:
Training_set = tf.keras.preprocessing.image_dataset_from_directory(
    Dataset_Dir, batch_size=32, image_size=image_size,
    validation_split=0.3, subset="training", seed=123)
Test set:
Testing_set = tf.keras.preprocessing.image_dataset_from_directory(
    Dataset_Dir, image_size=image_size,
    validation_split=0.3, seed=123, subset="validation")
Normalization layer:
normalization_layer = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)
normalized_training_set = Training_set.map(lambda x, y: (normalization_layer(x), y))
training_image_batch, training_labels_batch = next(iter(normalized_training_set))
For more about this method, look at the TensorFlow tutorial.
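For reference, the arithmetic behind the Rescaling layer is just `inputs * scale + offset`. The NumPy sketch below mirrors that formula with a hypothetical `rescale` helper (it is not the Keras implementation) and shows the two common targets: [0, 1] and [-1, 1].

```python
import numpy as np


def rescale(x, scale, offset=0.0):
    """Same arithmetic as a rescaling layer: x * scale + offset."""
    return np.asarray(x, dtype=np.float32) * scale + offset


pixels = np.array([0, 51, 255], dtype=np.uint8)

unit_range = rescale(pixels, 1.0 / 255)            # maps [0, 255] to [0, 1]
signed_range = rescale(pixels, 1.0 / 127.5, -1.0)  # maps [0, 255] to [-1, 1]

print(unit_range)
print(signed_range)
```

Whichever range is used, it has to match the scaling the model saw during training.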
I'm working on a problem that involves computationally evaluating three-dimensional data of the shape (32, 16, 5) and providing a corrected form of this data also in the shape of (32, 16, 5). The problem is relatively specific to my field, but it can be viewed as analogous to processing color images (just with five color channels instead of three). If it helps, this could be thought of as a color correction model.
In my initial efforts, I created a random forest model using XGBoost for each of these output parameters. I had good results, but found that the sheer number of output parameters (32*16*5 = 2560) made the runtime of this approach too long, so I am looking for an alternative.
I'm looking at using Keras to solve this, using a convolutional neural network approach, since the adjacent 'pixels' in my data should have some useful information about their neighbors. Note that 'adjacency' here is both spatial and in the color channels. So far, I am doing alright in creating a simple model that I believe has inputs/outputs of the correct shape, but I am running into an issue when I try to train the model on some dummy images:
#!/usr/bin/env python3
import tensorflow as tf
import pandas as pd
import numpy as np

def create_model(image_shape, batch_size = 10):
    width, height, channels = image_shape
    conv_shape = (batch_size, width, height, channels)
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv3D(filters = channels, kernel_size = 3, input_shape = conv_shape, padding = "same"))
    model.add(tf.keras.layers.Dense(channels, activation = "relu"))
    return model

if __name__ == "__main__":
    image_shape = (32, 16, 5)
    # Create test input/output data sets:
    input_img = np.random.rand(*image_shape) # Create one dummy input image
    output_img = np.random.rand(*image_shape) # Create one dummy output image
    # Create a bogus 'training set' by copying the input/output images into lists many times
    inputs = [input_img]*500
    outputs = [output_img]*500
    # Create the model and fit it to the dummy data
    model = create_model(image_shape)
    model.compile(loss = "mean_squared_error", optimizer = "adam", metrics = ["accuracy"])
    model.fit(input_img, output_img)
However, when I run this code, I get the following error:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=5, found ndim=3. Full shape received: [32, 16, 5]
I am not really sure what the other two expected dimensions are for the data passed into model.fit(). I suspect this is a problem with the way that I am formatting my input data. Even if I have a list of input/output images, that will only bring the ndim of my data to 4, not 5.
I have been trying to find similar examples in the documentation and around the web to see what I'm doing incorrectly, but 3D convolution on a non-classifier network seems a bit off the beaten path, and I'm not having much luck (or just don't know the name of what I should search for).
I have tried passing the dummy training set to model.fit instead of two individual images. Fitting with model.fit(inputs, outputs) instead, I get:
ValueError: Layer sequential expects 1 inputs, but it received 500 input tensors.
It seems that passing a list of tensors isn't correct here. If I convert the list of input images to numpy arrays with:
inputs = np.array(inputs)
outputs = np.array(outputs)
This does bring up the number of dimensions in my input data to 4, but Keras is still expecting 5. The error I get in this case is very similar to the first:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=5, found ndim=4. Full shape received: [None, 32, 16, 5]
I'm definitely not understanding something here, and any help would be appreciated.
I think you made two mistakes in your code:
Instead of using Conv3D, you need to use Conv2D.
model.fit(input_img, output_img) should be model.fit(inputs, outputs).
You need Conv2D because the shape of your data is (length, width, channel); it does not have the extra dimension that Conv3D expects.
Try the script below:
#!/usr/bin/env python3
import tensorflow as tf
import pandas as pd
import numpy as np

def create_model(image_shape, batch_size = 10):
    width, height, channels = image_shape
    conv_shape = (width, height, channels)
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv2D(filters = channels, kernel_size = 3, input_shape = conv_shape, padding = "same"))
    model.add(tf.keras.layers.Dense(channels, activation = "relu"))
    return model

if __name__ == "__main__":
    image_shape = (32, 16, 5)
    # Create test input/output data sets:
    input_img = np.random.rand(*image_shape) # Create one dummy input image
    output_img = np.random.rand(*image_shape) # Create one dummy output image
    # Create a bogus 'training set' by copying the input/output images into arrays many times
    inputs = np.array([input_img]*500)
    outputs = np.array([output_img]*500)
    # Create the model and fit it to the dummy data
    model = create_model(image_shape)
    model.compile(loss = "mean_squared_error", optimizer = "adam", metrics = ["accuracy"])
    model.fit(inputs, outputs)
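The two "missing" dimensions in the original error message are the batch axis and the channel axis. The NumPy sketch below only illustrates the array shapes involved. The Conv3D variant at the end is an assumption about how one might keep the adjacency between colour bands that the question mentions, not something the answer above proposes.

```python
import numpy as np

image_shape = (32, 16, 5)
images = [np.random.rand(*image_shape) for _ in range(500)]

# Conv2D expects (batch, height, width, channels): stacking adds the batch axis.
batch_2d = np.stack(images)
print(batch_2d.shape)  # (500, 32, 16, 5)

# Conv3D expects (batch, dim1, dim2, dim3, channels). Treating the five
# colour bands as a third spatial axis needs an explicit channel axis of size 1.
batch_3d = batch_2d[..., np.newaxis]
print(batch_3d.shape)  # (500, 32, 16, 5, 1)
```

In both cases the `input_shape` passed to the first Keras layer leaves out the batch axis, so it would be (32, 16, 5) for Conv2D and (32, 16, 5, 1) for the Conv3D variant.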
I have trained a classifier with this: https://teachablemachine.withgoogle.com/
Then I set up a Python environment where I can run the model. I heard that with some tweaks such a model could be turned into a Deep Dream-like model.
Does anyone know how I could tweak the model with Keras to generate pictures of what it learned to classify? Is it even possible?
Here is my current code:
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Disable scientific notation for clarity
np.set_printoptions(suppress=True)
# Load the model
model = tensorflow.keras.models.load_model('C:/Users/me/Downloads/keras_model.h5')
# Create the array of the right shape to feed into the keras model
# The 'length' or number of images you can put into the array is
# determined by the first position in the shape tuple, in this case 1.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
# Replace this with the path to your image
image = Image.open('C:/Users/me/Downloads/0a8d8fa2c09ed00a54b6590f2fa01436.jpg')
#resize the image to a 224x224 with the same strategy as in TM2:
#resizing the image to be at least 224x224 and then cropping from the center
size = (224, 224)
image = ImageOps.fit(image, size, Image.LANCZOS)  # Image.ANTIALIAS was an alias of LANCZOS and is gone in Pillow 10
#turn the image into a numpy array
image_array = np.asarray(image)
# display the resized image
image.show()
# Normalize the image
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
# Load the image into the array
data[0] = normalized_image_array
# run the inference
prediction = model.predict(data)
The idea is quite simple: you feed the image to the model and then maximize the activation of certain layers with respect to the image itself, not the weights of the model (choosing different layers will change the result).
TensorFlow has an excellent notebook on this; check it out for more information and detailed examples.
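To make the mechanics concrete, here is a toy NumPy sketch of gradient ascent on the input while the weights stay frozen. Everything in it is a stand-in: `W` plays the role of a trained layer, `x` plays the role of the image, and the gradient is written out by hand. With a real Keras model the gradient of the chosen activation with respect to the image would come from automatic differentiation (for example `tf.GradientTape`) instead.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))      # frozen "layer" weights (hypothetical)
x = rng.normal(size=16) * 0.01    # the "image" being optimised
W_before = W.copy()


def activation(x):
    """Mean activation of the toy layer."""
    return np.mean(np.tanh(W @ x))


def activation_grad(x):
    """Gradient of the mean activation with respect to the input x."""
    h = np.tanh(W @ x)
    return W.T @ (1.0 - h ** 2) / W.shape[0]


before = activation(x)
for _ in range(200):
    x = x + 0.05 * activation_grad(x)  # ascend: update the input, never W
after = activation(x)

print(before, after)
```

The loop only ever changes `x`; the pattern that emerges in the input is what the layer responds to most strongly.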
This is the first time I am working with LSTM networks. I have a video with a frame rate of 30 fps and an AlexNet-based CNN, and I want to feed the last layer of the CNN into a recurrent network (I am using TensorFlow). Suppose my batch_size is 30, equal to the fps, and I want a timestep of 1 second (so, every 30 frames). The output of the last layer of my network will be [batch_size, 1000], so in my case [30, 1000]. Do I now have to reshape this output to [batch_size, time_steps, features] (in my case [30, 30, 1000])? Is that correct, or am I wrong?
Consider building your CNN model with Conv2D and MaxPool2D layers up to a Flatten layer, because the vectorized output of the Flatten layer will be the input data for the LSTM part of your structure.
So, build your CNN model like this:
model_cnn = Sequential()
Now, an interesting point: the current version of Keras has some incompatibility with some TensorFlow structures that will not let you stack all of your layers in a single Sequential object.
So it's time to use the Keras Model object to complete your neural network with a trick:
input_lay = Input(shape=(None, ?, ?, ?)) #dimensions of your data
time_distribute = TimeDistributed(Lambda(lambda x: model_cnn(x)))(input_lay) # keras.layers.Lambda is essential to make our trick work :)
lstm_lay = LSTM(?)(time_distribute)
output_lay = Dense(?, activation='?')(lstm_lay)
And finally, it's time to put our 2 separate models together:
model = Model(inputs=[input_lay], outputs=[output_lay])
Now, for the OpenCV part, use an algorithm like the one shown below to preprocess your videos directly and build a big tensor of frames to feed into your network:
import os
import cv2
import numpy as np

video_folder = '/path.../'
X_data = []
y_data = []
list_of_videos = os.listdir(video_folder)
for i in list_of_videos:
    # Video path
    vid = str(video_folder + i)  # path to each video from the os.listdir result
    # Reading the video
    cap = cv2.VideoCapture(vid)
    # To store frames
    frames = []
    for j in range(40):  # here we get 40 frames, for example
        ret, frame = cap.read()
        if ret:
            print('Class 1 - Success!')
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # converting to gray
            frame = cv2.resize(frame, (30, 30), interpolation=cv2.INTER_AREA)
            frames.append(frame)  # keep the preprocessed frame
    cap.release()
    X_data.append(frames)  # appending each tensor of 40 frames resized to 30x30
    y_data.append(1)  # appending a class label to the set of 40 frames
X_data = np.array(X_data)
y_data = np.array(y_data)  # ready to split! :)
If you merge several small sequences from different videos to form a batch, the output of the last layer of your model (the RNN) should already be [batch_size, window_size, num_classes]. Basically, you want to wrap your CNN with reshape layers which will concatenate the frames from each batch:
input -> [batch_size, window_size, nchannels, height, width]
reshape -> [batch_size * window_size, nchannels, height, width]
CNN -> [batch_size * window_size, feat_size]
reshape -> [batch_size, window_size, feat_size]
RNN -> [batch_size, window_size, num_outputs] (assuming frame-wise predictions)
But this will take a lot of memory, so you can set batch size to 1, which is what you seem to be doing if I understood correctly. In this case you can spare the first reshape.
I'm not sure about the order of the axes above, but the general logic remains the same.
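The reshape pipeline above can be checked with plain NumPy. This is a shape-only sketch under stated assumptions: `fake_cnn` is a hypothetical stand-in for the real CNN, the frame size is arbitrary, and the axes are ordered channels-last (the TensorFlow default) rather than channels-first as in the listing.

```python
import numpy as np

batch_size, window_size = 2, 30      # 2 clips of 30 frames (1 second at 30 fps)
height, width, channels = 8, 8, 3    # small hypothetical frame size
feat_size = 1000


def fake_cnn(frames):
    """Stand-in for the CNN: maps (n, H, W, C) frames to (n, feat_size)."""
    flat = frames.reshape(frames.shape[0], -1)
    projection = np.ones((flat.shape[1], feat_size))
    return flat @ projection


clips = np.random.rand(batch_size, window_size, height, width, channels)

# Merge the batch and time axes so the CNN sees one frame per row.
frames = clips.reshape(batch_size * window_size, height, width, channels)
features = fake_cnn(frames)
# Split them again so the RNN sees one sequence per clip.
sequences = features.reshape(batch_size, window_size, feat_size)

print(frames.shape, features.shape, sequences.shape)
```

Applied to the question, 30 frames with a window of 30 would give a single sequence of shape [1, 30, 1000] rather than [30, 30, 1000].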
As a side note: if you plan on using Batch Normalization at some point, you may want to raise the batch size, because consecutive frames from a single segment might not contain a lot of variety by themselves. Also double-check the batch normalization axes, which should cover both the time and batch axes.
I want to make tensorflow's inception v3 to give out tags for an image. My goal is to convert a JPEG image to input that is accepted by inception neural network. I don't know how to process the images first so that it can run with Google Inception's v3 model. The original tensorflow project is here:
Originally, all the images are in a dataset and the entire dataset is first passed to input() or distorted_inputs() in ImageProcessing.py . The images in dataset are processed and passed to the train() or eval() methods (both of these work). The problem is I want a function to print out tags for one specific image (not dataset).
Below is the code for inference function that is used to generate tag with google inception. inceptionv4 function is a convolutional neural network implemented in tensorflow.
def inference(images, num_classes, for_training=False, restore_logits=True,
"""Build Inception v3 model architecture.
See here for reference: http://arxiv.org/abs/1512.00567
images: Images returned from inputs() or distorted_inputs().
num_classes: number of classes
for_training: If set to `True`, build the inference model for training.
Kernels that operate differently for inference during training
e.g. dropout, are appropriately configured.
restore_logits: whether or not the logits layers should be restored.
Useful for fine-tuning a model with different num_classes.
scope: optional prefix string identifying the ImageNet tower.
Logits. 2-D float Tensor.
Auxiliary Logits. 2-D float Tensor of side-head. Used for training only.
# Parameters for BatchNorm.
batch_norm_params = {
# Decay for the moving averages.
# epsilon to prevent 0s in variance.
'epsilon': 0.001,
# Set weight_decay for weights in Conv and FC layers.
with slim.arg_scope([slim.ops.conv2d, slim.ops.fc], weight_decay=0.00004):
with slim.arg_scope([slim.ops.conv2d],
logits, endpoints = inception_v4(
# Add summaries for viewing model statistics on TensorBoard.
# Grab the logits associated with the side head. Employed during training.
auxiliary_logits = endpoints['AuxLogits']
return logits, auxiliary_logits
This is my attempt to process the image before it is passed to inference function.
def process_image(self, image_path):
    filename_queue = tf.train.string_input_producer(image_path)
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    img = tf.image.decode_jpeg(value)
    height = self.image_size
    width = self.image_size
    image_data = tf.cast(img, tf.float32)
    image_data = tf.reshape(image_data, shape=[1, height, width, 3])
    return image_data
I simply want to process an image file so that I can pass it to the inference function, which should then print out the tags. The above code didn't work and printed this error:
ValueError: Shape () must have rank at least 1
I would appreciate it if anyone could provide any insight into this problem.
Inception just needs (299, 299, 3) images with inputs scaled between -1 and 1. See the code below. I just convert the images using this and put them in a TFRecord (and then a queue) to run my stuff.
from PIL import Image
import numpy as np

def load_image(self, image_path):
    img = Image.open(image_path)
    # Convert to RGB first so that every image ends up with 3 channels
    newImg = img.convert("RGB").resize((299, 299), Image.BILINEAR)
    data = np.asarray(newImg, dtype=np.float32)  # shape (299, 299, 3)
    return 2 * (data / 255) - 1  # scale from [0, 255] to [-1, 1]