I am trying to build a reinforcement learning agent to learn off a custom environment, built to openai's gym specifications.
I have np arrays of size (20, 7) which I want to pass to the network, and output one of 7 actions.
I am having trouble building the actual network, as I want to include LSTM layers. My code is as follows:
def build_model():
model = Sequential()
model.add(LSTM(60, return_sequences = True, input_shape=(20, 7), activation = 'relu'))
model.add(Dense(21, activation = "relu"))
model.add(Flatten())
model.add(Dense(7, activation="linear"))
model.compile(loss="mse", optimizer=Adam(lr=0.0002), metrics=['accuracy'])
return model
However, when I build the agent, there is suddenly an extra dimension added on which the network does not expect:
def build_agent(model, actions):
policy = BoltzmannQPolicy()
memory = SequentialMemory(limit=50000, window_length = 1)
dqn = DQNAgent(model=model, memory=memory, policy=policy,
nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
return dqn
dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)
ValueError: Error when checking input: expected lstm_input to have 3 dimensions, but got array with shape (1, 1, 20, 7)
Im not exactly sure why the agent is reshaping the data to add an extra dimension, (or two?) but if anyone had an idea on how to stop this from happening so I can train my network I would be very grateful. My solution runs when I code it myself however I want to make use of the keras rl-2 library.
Thanks in advance!
For anyone looking for an answer, I fixed this by adding layer:
model.add(Reshape((20, 7), input_shape=(1, 20, 7)))
as the first layer
As I understand it your agent is able to compute your environment but is unable to compute other keras rl-2 environments because they add another dimension to the feature vector (the input). I believe this is because the environments you are trying to run returns a feature vector that includes channels. Channels simply mean how many values do you need to describe one pixel. For instance, RGB needs 3 channels while your environment returns a simplification of only one channel.
Since you are only interested in one channel you could either squeeze away that channel since you do not need it with:
state = state.squeeze(axis=1)
Before passing it into the net. Or you could define your model to include channels by setting your input to be (1,20,7) which could be useful if you in the future want to apply convolutional layers which require a defined count of channels.
try this at first layer
window_length = 1
model.add(Flatten(input_shape=(window_lenght ,) + env.observation_space.shape))
Related
Ng's Convolutional Neural Network class's Week 2 Lab on using Transfer Learning with MobileNetV2 (summary: https://github.com/EhabR98/Transfer-Learning-with-MobileNetV2) and an additional tutorial (https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/) both begin like this:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
base_model.trainable = False
They then proceed to add a pooling layer(s), a Dropout layer and a Dense 1-unit layer to the end, apply a BinaryCrossentropy loss and some kind of optimizer, then train it on some custom data that has been inputted. Lets call this custom model "model2" as Ng's lab does
Here's what my the Coursera class model looks like, its important to include here because the variable base_model is called in two different closures throughout the Coursera lab (previous to this it was called outside of a method, as base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=True, weights='imagenet'); base_model.trainable= False)
def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter())
input_shape = image_shape + (3,0)
base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')
base_model.trainable = False
inputs = tf.keras.Input(shape=input_shape)
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=False)
x = tfl.GlobalAveragePooling2D()(x)
x = tfl.Dropout(0.2)(x)
prediction_layer = tfl.Dense(1)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)
return model
model2 = alpaca_model()
base_learning_rate = 0.001
initial_epochs = 5
model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate), loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), metrics=["accuracy"])
history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs)
This performs OK, getting as much as 80% accuracy
Fine tuning -- Now in both the course lab and the tutorial, they then proceed to "unfreeze" some of the last layers of the internal network so that they can be trained, like so:
fine_tune_at = 120
base_model = model2.layers[4] #totally separate question, but I would love to hear in comments, what this does exactly. It is difficult to Google this.
base_model.trainable = True
print("#/layers in base model: ", len(base_model.layers))
for layer in base_model.layers[:fine_tune_at]:
layer.trainable = False
loss_function = tf.keras.losses.BinaryCrossentrop(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=base_learning_rate*0.1)
metrics = ['accuracy']
fine_tune_epochs = 5
total_epochs = initial_epochs + fine_tune_epochs
Up until this point, I'm satisfied, I can clearly see what is going on, but then:
model2.compile(loss=loss_function,optimizer=optimizer,metrics=metrics)
history_fine = model2.fit(train_dataset, epochs=total_epochs, initial_epoch=history.epoch[-1], validation_data = validation_dataset)
This leads to a marked improvement in results. Which confused me, I was very much expecting base_model to get passed in somehow. I didn’t imagine that altering some other variable that hasn’t been passed into or been initially called would come into play.
So given all of that context, the question is: How is altering the base_model affecting model2?
If the above example from the Coursera lab is as confusing to you as it is to me, the example shown on https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/ as mentioned above is much simpler and contains much less ambiguity as base_model is defined only once. Regardless, the same dynamic applies and I'm equally confused on both. Thanks again for your time
Your
totally separate question, but I would love to hear in comments, what this does exactly.
The following list get the MobileNetV2 model:
base_model = model2.layers[4]
Why 4? Because the first layer is the input, the second layer is the data augmentation (a Sequential model), the third and fourth layers are for image preprocessing (divide by 127.5 and subtract -1 to have values between -1 and 1), the fifth layer is MobileNetV2 (index 4). The other layers are your top-net.
How is altering the base_model affecting model2
During the first pass (transfer learning), all layers of MNv2 are frozen, so weights and biases remain intact. Whereas for the second pass (fine tuning), the last convolution layers (block 13 to 16 and last Conv2D) are now unfrozen so that the model can modify the weights and bias of the base model. Therefore, the following layers will be changed during training.
To view the full model summary with nested models, use:
>>> model2.summary(line_length=125, expand_nested=True, show_trainable=True)
I'll go ahead and post an answer that speaks to (the problem with) Professor Ng's Convolution Neural Network, Week 2 Assignment, "Transfer Learning with MobileNet", with the hope that other students might find this answer and realize that they are not crazy and that the Lab was poorly coded.
I'm not sure how it (appears to work) on Jupyter, but the main reason I was having problems with this lab was that base_model was defined several times within the lab. It should have only been defined once. Even worse, the base_model was redefined inside the alpaca_model() function, but that's not accessible outside the closure of the function. I'm not in the industry, but that is just plain terrible coding to redefine a variable inside a method that's already been defined then call it again outside of the method.
Once I took base_model out of the function, defining it beforehand, everything works perfectly not just on the computer, but in my head.
I'm working in a neural network using keras (from tensorflow) for a college project.
I'm pretty new to the library, so I don't really know how should I feed the data into the model in order for the training to work. I've been searching the internet for hours
and I can't find a proper tutorial / documentation on how to do it.
Here's the model I'm using, one of the simplest possible ones:
model = keras.Sequential([
keras.layers.Dense(20, input_dim=1,activation = activations.relu),
keras.layers.Dense(10, activation= activations.relu),
keras.layers.Dense(8, activation= activations.sigmoid)
])
model.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy",
metrics = ["accuracy"])
The input to the network is a list of 20 floats, and the output a list of 8 floats ranged from 0 to 1 (confidence level), so I think this model is OK, please let me know if I'm wrong.
Here's a diagram of the model i'm trying to build and train:
Let's say I have:
10 input examples (10 lists of 20 floats) for the expected output [1,0,0,0,0,0,0,0]
10 input examples for the expect output [0,1,0,0,0,0,0,0,0,0]
...
10 input examples for the expected output [0,0,0,0,0,0,0,1]
How should I prepare this data in order to use it with
model.fit(training_inputs,expected_outputs,epochs = NUM_EPOCHS) ?
What should training_inputs exactly be? and expected_outputs?
Any help will be appreciated. Thank you for your time!
First of all, you have two issues in your model. According to your description, your input data is 20-dimensional, so in the first layer you should have input_dim=20. Then, you have a cross-entropy loss, so I'm assuming that you are training a 8-class classifier. If that's the case, then instead of keras.layers.Dense(8, activation= activations.sigmoid) you should use
keras.layers.Dense(8, activation=None),
keras.layers.Softmax()
as that ensures that you get a distribution over classes for each input data point.
Now regarding your input data question, training_inputs should a tensor (or numpy array, which will be readily converted) with shape (n_points, 20) in your case. Accordingly, expected_outputs should have shape (n_points, 8). So, just concatenate/stack your input data along the first dimension (axis=0), such that each row corresponds to your 20-dimensional data points. You do the same for expected_outputs, maybe something like,
expected_outputs = np.r_[
np.tile([[1,0,0,0,0,0,0,0]], (10, 1)),
np.tile([[0,1,0,0,0,0,0,0]], (10, 1)),
...
np.tile([[0,0,0,0,0,0,0,1]], (10, 1)),
]
Remember to set batch_size and shuffle!
I am using pretrained models to classify image. My question is what kind of layers do I have to add after using the pretrained model structure in my model, resp. why these two implementations differ. To be specific:
Consider two examples, one using the cats and dogs dataset:
One implementation can be found here. The crucial point is that the base model:
# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
base_model.trainable = False
is frozen and a GlobalAveragePooling2D() is added, before a final tf.keras.layers.Dense(1) is added. So the model structure looks like:
model = tf.keras.Sequential([
base_model,
global_average_layer,
prediction_layer
])
which is equivalent to:
model = tf.keras.Sequential([
base_model,
tf.keras.layers.GlobalAveragePooling2D()
tf.keras.layers.Dense(1)
])
So they added not only a final dense(1) layer, but also a GlobalAveragePooling2D() layer before.
The other using the tf flowers dataset:
In this implementation it is different. A GlobalAveragePooling2D() is not added.
feature_extractor_url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2"
feature_extractor_layer = hub.KerasLayer(feature_extractor_url,
input_shape=(224,224,3))
feature_extractor_layer.trainable = False
model = tf.keras.Sequential([
feature_extractor_layer,
layers.Dense(image_data.num_classes)
])
Where image_data.num_classes is 5 representing the different flower classification. So in this example a GlobalAveragePooling2D() layer is not added.
I do not understand this. Why is this different? When to add a GlobalAveragePooling2D() or not? And what is better / should I do?
I am not sure if the reason is that in one case the dataset cats and dogs is binary classification and in the other it is a multiclass classifcation problem. Or the difference is that in one case tf.keras.applications.MobileNetV2 was used to load MobileNetV2 and in the other implementation hub.KerasLayer was used to get the feature_extractor. When I check the model in the first implementation:
I can see that the last layer is a relu activation layer.
When I check the feature_extractor:
model = tf.keras.Sequential([
feature_extractor,
tf.keras.layers.Dense(1)
])
model.summary()
I get the output:
So maybe reason is also that I do not understand the difference between tf.keras.applications.MobileNetV2 vs hub.KerasLayer. The hub.KerasLayer just gives me the feature extractor. I know this, but still I think I did not get the difference between these two methods.
I cannot check the layers of the feature_extractor itself. So feature_extractor.summary() or feature_extractor.layers does not work. How can I inspect the layers here? And how can I know I should add GlobalAveragePooling2D or not?
Summary
Why is this different? When to add a GlobalAveragePooling2D() or not? And what is better / should I do?
The first case it outputs 4 dimensional tensors that are raw outputs of the last convolutional layer. So, you need to flatten them somehow, and in this example you are using GlobalAveragePooling2D (but you could use any other strategy). I can't tell which is better: it depends on your problem, and depending on how hub.KerasLayer version implemented the flatten, they could be exactly the same. That said, I'd just pickup one of them and go on: I don't see huge differences among them,
Long answer: understanding Keras implementation
The difference is in the output of both base models: in your keras examples, outputs are of shape (bz, hh, ww, nf) where bz is batch size, hh and ww are height and weight of the last convolutional layer in the model and nf is the number of filters (or convolutions) applied in this last layer.
So: this outputs the output of the last convolutions (or filters) of the base model.
Hence, you need to convert those outputs (which you can think them as images) to vectors of shape (bz, n_feats), where n_feats is the number of features the base model is computing. Once this conversion is done, you can stack your classification layer (or as many layers as you want) because at this point you have vectors.
How to compute this conversion? Some common alternatives are taking the average or maximum among the convolutional output (which reduces the size), or you could just reshape them as a single row, or add more convolutional layers until you get a vector as an output (I strongly suggest to follow usual practices like average or maximum).
In your first example, when calling tf.keras.applications.MobileNetV2, you are using the default police with respect to this last year, and hence, the last convolutional layer is let "as is": some convolutions. You can modify this behavior with the param pooling, as documented here:
pooling: Optional pooling mode for feature extraction when include_top is False.
None (default) means that the output of the model will be the 4D tensor output of the last convolutional block.
avg means that global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor.
max means that global max pooling will be applied.
In summary, in your first example, you are building the base model without telling explicitly what to do with the last layer, the model keeps returning 4 dimensional tensors that you immediately convert to vectors with the usage of average pooling, so you can avoid this explicit average pooling if you tell Keras to do it:
# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
pooling='avg', # Tell keras to average last layer
weights='imagenet')
base_model.trainable = False
model = tf.keras.Sequential([
base_model,
# global_average_layer, -> not needed any more
prediction_layer
])
TFHub implementation
Finally, when you use the TensorFlow Hub implementation, as you picked up the feature_vector version of the model, it already implements some kind of pooling (which I didn't found yet how) to make sure the model outputs vectors rather than 4 dimensional tensors. So, you don't need to add explicitly the layer to convert them because it is already done.
In my opinion, I prefer Keras implementation since it gives you more freedom to pick the strategy you want (in fact you could keep stacking whatever you want).
Lets say there is a model taking [1, 208, 208, 3] images and has 6 pooling layers with kernels [2, 2, 2, 2, 2, 7] which would result in a feature column for image [1, 1, 1, 2048] for 2048 filters in the last conv layer. Note, how the last pooling layer accepts [1, 7, 7, 2048] inputs
If we relax the constrains for the input image (which is typically the case for object deteciton models) than after same set of pooling layers image of size [1, 104, 208, 3] would produce pre-last-pooling output of [1, 4, 7, 2024] and [1, 256, 408, 3] would yeild [1, 8, 13, 2048]. This maps would have about the same amount information as original [1, 7, 7, 2048] but the original pooling layer would not produce a feature column wiht [1, 1, 1, N]. That is why we switch to global pooling layer.
In short, global pooling layer is important if we don't have strict restriction on the input image size (and don't resize the image as the first op in the model).
I think difference in output of models
"https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2" has output is 1d vector * batch_size, you just can't apply Conv2D to it.
Output of tf.keras.applications.MobileNetV2 probably more complex, thus you have more capability to transform one.
I am training an autoencoder constructed using the Sequential API in Keras. I'd like to create separate models that implement the encoding and decoding functions. I know from examples how to do this with the functional API, but I can't find an example of how it's done with the Sequential API. The following sample code is my starting point:
input_dim = 2904
encoding_dim = 4
hidden_dim = 128
# instantiate model
autoencoder = Sequential()
# 1st hidden layer
autoencoder.add(Dense(hidden_dim, input_dim=input_dim, use_bias=False))
autoencoder.add(BatchNormalization())
autoencoder.add(Activation('elu'))
autoencoder.add(Dropout(0.5))
# encoding layer
autoencoder.add(Dense(encoding_dim, use_bias=False))
autoencoder.add(BatchNormalization())
autoencoder.add(Activation('elu'))
# autoencoder.add(Dropout(0.5))
# 2nd hidden layer
autoencoder.add(Dense(hidden_dim, use_bias=False))
autoencoder.add(BatchNormalization())
autoencoder.add(Activation('elu'))
autoencoder.add(Dropout(0.5))
# output layer
autoencoder.add(Dense(input_dim))
I realize I can select individual layers using autoencoder.layer[i], but I don't know how to associate a new model with a range of such layers. I naively tried the following:
encoder = Sequential()
for i in range(0,7):
encoder.add(autoencoder.layers[i])
decoder = Sequential()
for i in range(7,12):
decoder.add(autoencoder.layers[i])
print(encoder.summary())
print(decoder.summary())
which seemingly worked for the encoder part (a valid summary was shown), but the decoder part generated an error:
This model has not yet been built. Build the model first by calling build() or calling fit() with some data. Or specify input_shape or batch_input_shape in the first layer for automatic build.
Since the input shape for a middle layer (i.e. here I am referring to autoencoder.layers[7]) is not explicitly set, when you add it to another model as the first layer, that model would not be built automatically (i.e. building process involves constructing weight tensor for the layers in the model). Therefore, you need to call build method explicitly and set the input shape:
decoder.build(input_shape=(None, encoding_dim)) # note that batch axis must be included
As a side note, there is no need to call print on model.summary(), since it would print the result by itself.
Another way which also works.
input_img = Input(shape=(encoding_dim,))
previous_layer = input_img
for i in range(bottleneck_layer,len(autoencoder.layers)): # bottleneck_layer = index of bottleneck_layer + 1!
next_layer = autoencoder.layers[i](previous_layer)
previous_layer = next_layer
decoder = Model(input_img, next_layer)
I am trying to perform the usual classification on the MNIST database but with randomly cropped digits.
Images are cropped the following way : removed randomly first/last and/or row/column.
I would like to use a Convolutional Neural Network using Keras (and Tensorflow backend) to perform convolution and then the usual classification.
Inputs are of variable size and i can't manage to get it to work.
Here is how I cropped digits
import numpy as np
from keras.utils import to_categorical
from sklearn.datasets import load_digits
digits = load_digits()
X = digits.images
X = np.expand_dims(X, axis=3)
X_crop = list()
for index in range(len(X)):
X_crop.append(X[index, np.random.randint(0,2):np.random.randint(7,9), np.random.randint(0,2):np.random.randint(7,9), :])
X_crop = np.array(X_crop)
y = to_categorical(digits.target)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_crop, y, train_size=0.8, test_size=0.2)
And here is the architecture of the model I want to use
from keras.layers import Dense, Dropout
from keras.layers.convolutional import Conv2D
from keras.models import Sequential
model = Sequential()
model.add(Conv2D(filters=10,
kernel_size=(3,3),
input_shape=(None, None, 1),
data_format='channels_last'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, epochs=100, batch_size=16, validation_data=(X_test, y_test))
Does someone have an idea on how to handle variable sized input in my neural network?
And how to perform classification?
TL/DR - go to point 4
So - before we get to the point - let's fix some problems with your network:
Your network will not work because of activation: with categorical_crossentropy you need to have a softmax activation:
model.add(Dense(10, activation='softmax'))
Vectorize spatial tensors: as Daniel mentioned - you need to, at some stage, switch your vectors from spatial (images) to vectorized (vectors). Currently - applying Dense to output from a Conv2D is equivalent to (1, 1) convolution. So basically - output from your network is spatial - not vectorized what causes dimensionality mismatch (you can check that by running your network or checking the model.summary(). In order to change that you need to use either GlobalMaxPooling2D or GlobalAveragePooling2D. E.g.:
model.add(Conv2D(filters=10,
kernel_size=(3, 3),
input_shape=(None, None, 1),
padding="same",
data_format='channels_last'))
model.add(GlobalMaxPooling2D())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
Concatenated numpy arrays need to have the same shape: if you check the shape of X_crop you'll see that it's not a spatial matrix. It's because you concatenated matrices with different shapes. Sadly - it's impossible to overcome this issue as numpy.array need to have a fixed shape.
How to make your network train on examples of different shape: The most important thing in doing this is to understand two things. First - is that in a single batch every image should have the same size. Second - is that calling fit multiple times is a bad idea - as you reset inner model states. So here is what needs to be done:
a. Write a function which crops a single batch - e.g. a get_cropped_batches_generator which given a matrix cuts a batch out of it and crops it randomly.
b. Use train_on_batch method. Here is an example code:
from six import next
batches_generator = get_cropped_batches_generator(X, batch_size=16)
losses = list()
for epoch_nb in range(nb_of_epochs):
epoch_losses = list()
for batch_nb in range(nb_of_batches):
# cropped_x has a different shape for different batches (in general)
cropped_x, cropped_y = next(batches_generator)
current_loss = model.train_on_batch(cropped_x, cropped_y)
epoch_losses.append(current_loss)
losses.append(epoch_losses.sum() / (1.0 * len(epoch_losses))
final_loss = losses.sum() / (1.0 * len(losses))
So - a few comments to code above: First, train_on_batch doesn't use nice keras progress bar. It returns a single loss value (for a given batch) - that's why I added logic to compute loss. You could use Progbar callback for that also. Second - you need to implement get_cropped_batches_generator - I haven't written a code to keep my answer a little bit more clear. You could ask another question on how to implement it. Last thing - I use six to keep compatibility between Python 2 and Python 3.
Usually, a model containing Dense layers cannot have variable size inputs, unless the outputs are also variable. But see the workaround and also the other answer using GlobalMaxPooling2D - The workaround is equivalent to GlobalAveragePooling2D. These are layers that can eliminiate the variable size before a Dense layer and suppress the spatial dimensions.
For an image classification case, you may want to resize the images outside the model.
When my images are in numpy format, I resize them like this:
from PIL import Image
im = Image.fromarray(imgNumpy)
im = im.resize(newSize,Image.LANCZOS) #you can use options other than LANCZOS as well
imgNumpy = np.asarray(im)
Why?
A convolutional layer has its weights as filters. There is a static filter size, and the same filter is applied to the image over and over.
But a dense layer has its weights based on the input. If there is 1 input, there is a set of weights. If there are 2 inputs, you've got twice as much weights. But weights must be trained, and changing the amount of weights will definitely change the result of the model.
As #Marcin commented, what I've said is true when your input shape for Dense layers has two dimensions: (batchSize,inputFeatures).
But actually keras dense layers can accept inputs with more dimensions. These additional dimensions (which come out of the convolutional layers) can vary in size. But this would make the output of these dense layers also variable in size.
Nonetheless, at the end you will need a fixed size for classification: 10 classes and that's it. For reducing the dimensions, people often use Flatten layers, and the error will appear here.
A possible fishy workaround (not tested):
At the end of the convolutional part of the model, use a lambda layer to condense all the values in a fixed size tensor, probably taking a mean of the side dimensions and keeping the channels (channels are not variable)
Suppose the last convolutional layer is:
model.add(Conv2D(filters,kernel_size,...))
#so its output shape is (None,None,None,filters) = (batchSize,side1,side2,filters)
Let's add a lambda layer to condense the spatial dimensions and keep only the filters dimension:
import keras.backend as K
def collapseSides(x):
axis=1 #if you're using the channels_last format (default)
axis=-1 #if you're using the channels_first format
#x has shape (batchSize, side1, side2, filters)
step1 = K.mean(x,axis=axis) #mean of side1
return K.mean(step1,axis=axis) #mean of side2
#this will result in a tensor shape of (batchSize,filters)
Since the amount of filters is fixed (you have kicked out the None dimensions), the dense layers should probably work:
model.add(Lambda(collapseSides,output_shape=(filters,)))
model.add(Dense.......)
.....
In order for this to possibly work, I suggest that the number of filters in the last convolutional layer be at least 10.
With this, you can make input_shape=(None,None,1)
If you're doing this, remember that you can only pass input data with a fixed size per batch. So you have to separate your entire data in smaller batches, each batch having images all of the same size. See here: Keras misinterprets training data shape