I have the following code snippet:
img = cv2.imread('1.jpg')
When I print img, I get the result shown below. How can I get back just the '1.jpg' part (the file name)?
[[[140 149 139]
[153 162 152]
[155 165 153]
...,
[ 44 20 8]
[ 46 22 10]
[ 46 22 10]]
[[151 160 150]
[156 165 155]
[152 162 150]
...,
[ 47 23 11]
[ 48 24 12]
[ 45 21 9]]
[[155 164 154]
[152 161 151]
[146 156 144]
...,
[ 47 23 11]
[ 49 25 13]
[ 49 25 13]]
...,
[[ 28 16 6]
[ 33 21 11]
[ 32 20 10]
...,
[144 131 105]
[150 137 111]
[151 138 112]]
[[ 33 18 9]
[ 34 19 10]
[ 34 20 8]
...,
[144 135 108]
[143 134 107]
[148 139 112]]
[[ 31 16 7]
[ 31 16 7]
[ 35 21 9]
...,
[145 141 112]
[137 133 105]
[143 139 111]]]
Thanks.
cv2.imread returns a numpy array, and a numpy array does not store the name of the file it was loaded from. So there is no way to get the name back from img unless you keep track of it yourself, e.g. with a custom class.
import cv2

class MyImage:
    def __init__(self, img_name):
        self.img = cv2.imread(img_name)
        self.__name = img_name

    def __str__(self):
        return self.__name
Then, you can use this as:
>>> x = MyImage('1.jpg')
>>> str(x)
'1.jpg'
>>> x.img # the numpy array
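If you do not need a dedicated class, a plain dict keyed by file name gives you the same association. A minimal sketch (the paths are placeholders):

import cv2

paths = ['1.jpg', '2.jpg']
images = {p: cv2.imread(p) for p in paths}  # file name -> pixel array
for name, img in images.items():
    print(name, img.shape)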
I ran into an error while trying to train a TensorFlow model. In my code below I can choose a folder with pictures of a person together with that person's name, and the model should be trained on those pictures. With the first dataset everything works fine, but when I try to train on a second set I get an error saying the shapes are incompatible.
I am thankful for any help!
import os
import cv2
import numpy as np
import tensorflow as tf

# User input name and filepath
name = "Tony Stark"
folder_path = "C:/Users/johan/Pictures/tony"
img_size = (160, 160)

# Read labels from the label.txt file
label_dict = {}
if os.path.isfile("label.txt"):
    with open("label.txt", "r") as file:
        label_dict = eval(file.read())

# Add new label if it does not exist
if name not in label_dict:
    new_label = len(label_dict)
    label_dict[name] = new_label
    with open("label.txt", "w") as file:
        file.write(str(label_dict))
else:
    new_label = label_dict[name]

# Load the model if it exists, create a new one if it doesn't
if os.path.exists("tony_face_model.h5"):
    model = tf.keras.models.load_model("tony_face_model.h5")
    print("Loaded model from disk")
else:
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(img_size[0], img_size[1], 3)),
        tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(len(label_dict), activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    print("Created a new model")

face_images = []
labels = []
for i, filename in enumerate(os.listdir(folder_path)):
    img_path = os.path.join(folder_path, filename)
    img = cv2.imread(img_path)
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    for (x, y, w, h) in faces:
        if w == h:
            face_image = img[y:y + h, x:x + w]
            face_image = cv2.resize(face_image, img_size)
            face_images.append(face_image)
            label = new_label
            labels.append(label)

if len(face_images) > 0:
    face_images = np.array(face_images)
    labels = np.array(labels)
    # convert labels to one-hot encoding
    labels = tf.keras.utils.to_categorical(labels)
    # Split the data into training and validation sets
    split = int(len(face_images) * 0.8)
    train_images = face_images[:split]
    train_labels = labels[:split]
    val_images = face_images[split:]
    val_labels = labels[split:]
    # train the model on the new data
    model.fit(x=train_images, y=train_labels, validation_data=(val_images, val_labels), epochs=30)
    model.save("tony_face_model.h5")
    print("Model saved to disk")
else:
    print("No new faces found")
And here is the error output:
Loaded model from disk
Epoch 1/30
Traceback (most recent call last):
  File "C:\Users\johan\PycharmProjects\recognitionSoft\main.py", line 78, in <module>
    model.fit(x=train_images, y=train_labels, validation_data=(val_images, val_labels), epochs=30)
  File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\johan\AppData\Local\Temp\__autograph_generated_file0ma0lsv9.py", line 15, in tf__train_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:

    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\engine\training.py", line 1249, in train_function *
        return step_function(self, iterator)
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\engine\training.py", line 1233, in step_function **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\engine\training.py", line 1222, in run_step **
        outputs = model.train_step(data)
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\engine\training.py", line 1024, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\engine\training.py", line 1082, in compute_loss
        return self.compiled_loss(
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\engine\compile_utils.py", line 265, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\losses.py", line 152, in __call__
        losses = call_fn(y_true, y_pred)
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\losses.py", line 284, in call **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\losses.py", line 2004, in categorical_crossentropy
        return backend.categorical_crossentropy(
    File "C:\Users\johan\PycharmProjects\recognitionSoft\venv\lib\site-packages\keras\backend.py", line 5532, in categorical_crossentropy
        target.shape.assert_is_compatible_with(output.shape)

    ValueError: Shapes (None, 1) and (None, 2) are incompatible
Process finished with exit code 1
And here are the shape values for the two datasets:
Set1:
train_images
[[[[132 160 174]
[127 159 172]
[123 157 170]
...
[ 23 32 38]
[ 15 23 29]
[ 17 25 31]]
[[131 159 174]
[127 159 172]
[127 161 174]
...
[ 42 48 60]
[ 37 43 56]
[ 21 28 40]]
[[126 156 173]
[123 151 168]
[122 149 166]
...
[ 36 45 55]
[ 54 62 76]
[ 41 47 63]]
...
[[143 172 181]
[144 173 182]
[145 174 183]
...
[ 36 38 40]
[ 27 32 31]
[ 34 41 37]]
[[150 178 188]
[145 174 183]
[134 164 172]
...
[ 31 33 34]
[ 21 26 24]
[ 30 37 31]]
[[142 171 180]
[134 164 172]
[140 169 178]
...
[ 46 48 48]
[ 44 50 45]
[ 42 51 42]]]
[[[244 249 248]
[250 254 253]
[238 244 243]
...
[237 246 242]
[235 244 236]
[241 250 240]]
[[235 240 239]
[244 250 249]
[246 254 253]
...
[235 241 239]
[238 244 240]
[239 244 238]]
[[239 244 243]
[240 245 244]
[240 248 247]
...
[231 237 236]
[237 242 240]
[240 244 239]]
...
[[248 245 240]
[248 245 240]
[243 240 235]
...
[243 246 248]
[241 243 237]
[241 241 232]]
[[248 245 240]
[248 245 240]
[243 240 235]
...
[238 241 247]
[241 244 242]
[239 241 234]]
[[247 244 239]
[246 243 238]
[243 240 235]
...
[226 229 242]
[239 243 244]
[233 237 230]]]
[[[141 150 110]
[141 150 110]
[143 152 112]
...
[ 11 36 53]
[ 5 30 46]
[ 5 30 46]]
[[141 150 110]
[141 150 110]
[143 152 112]
...
[ 11 36 53]
[ 5 30 46]
[ 5 30 46]]
[[147 155 111]
[147 155 111]
[151 159 115]
...
[ 16 39 55]
[ 10 33 47]
[ 10 33 47]]
...
[[192 161 52]
[192 161 52]
[190 159 50]
...
[124 128 132]
[128 132 137]
[128 132 137]]
[[191 160 51]
[191 160 51]
[189 157 49]
...
[122 125 130]
[127 131 136]
[127 131 136]]
[[191 160 51]
[191 160 51]
[189 157 49]
...
[122 125 130]
[127 131 136]
[127 131 136]]]]
train_labels
[[1.]
[1.]
[1.]]
val_images
[[[[ 6 11 16]
[ 6 8 18]
[ 4 5 16]
...
[218 196 161]
[221 196 162]
[222 197 163]]
[[ 7 13 22]
[ 5 8 22]
[ 2 5 19]
...
[230 208 176]
[235 210 178]
[232 209 177]]
[[ 0 8 16]
[ 2 5 19]
[ 1 4 18]
...
[232 210 178]
[234 211 179]
[233 210 178]]
...
[[225 192 148]
[226 192 149]
[225 191 148]
...
[204 170 134]
[204 171 132]
[203 170 131]]
[[225 192 148]
[226 192 149]
[225 191 148]
...
[204 170 135]
[204 171 132]
[202 169 130]]
[[225 192 148]
[226 192 149]
[225 191 148]
...
[205 170 135]
[203 170 131]
[202 169 130]]]]
val_labels
[[1.]]
Set2:
train_images
[[[[213 203 198]
[198 187 183]
[208 198 197]
...
[208 183 163]
[207 183 162]
[206 182 161]]
[[216 206 201]
[206 195 192]
[198 188 187]
...
[208 183 162]
[208 183 162]
[209 182 162]]
[[221 211 205]
[210 199 196]
[177 167 166]
...
[209 183 162]
[208 183 162]
[209 181 161]]
...
[[ 33 4 2]
[ 33 4 2]
[ 34 5 2]
...
[ 64 37 29]
[ 63 36 28]
[ 55 28 20]]
[[ 35 4 2]
[ 35 4 2]
[ 37 4 2]
...
[ 56 29 22]
[ 59 31 25]
[ 53 26 19]]
[[ 35 4 2]
[ 36 4 2]
[ 39 4 2]
...
[ 46 19 12]
[ 54 27 20]
[ 55 28 21]]]
[[[126 50 7]
[127 54 6]
[129 55 8]
...
[126 136 145]
[126 132 135]
[ 89 101 107]]
[[127 52 6]
[125 53 6]
[126 54 6]
...
[122 134 151]
[132 137 143]
[ 59 73 85]]
[[132 58 10]
[130 55 9]
[128 56 8]
...
[100 119 140]
[115 122 136]
[ 93 105 116]]
...
[[159 70 18]
[163 73 18]
[170 80 23]
...
[ 27 22 21]
[ 30 22 19]
[ 31 19 18]]
[[166 77 27]
[162 74 20]
[167 80 24]
...
[ 28 20 20]
[ 27 19 19]
[ 27 19 15]]
[[165 78 22]
[169 82 26]
[168 80 26]
...
[ 29 21 21]
[ 29 22 19]
[ 28 21 18]]]
[[[180 179 175]
[ 56 55 51]
[ 96 95 94]
...
[252 252 247]
[252 252 246]
[252 252 246]]
[[178 177 172]
[ 60 59 55]
[ 83 82 80]
...
[251 250 246]
[251 251 244]
[251 251 244]]
[[227 226 221]
[ 74 73 68]
[ 66 66 64]
...
[251 250 246]
[251 250 246]
[251 250 246]]
...
[[111 122 155]
[108 120 152]
[110 121 154]
...
[203 205 205]
[201 205 207]
[201 205 207]]
[[113 126 158]
[110 123 155]
[109 121 153]
...
[201 206 205]
[200 204 205]
[198 204 205]]
[[116 128 160]
[114 126 158]
[112 124 157]
...
[201 206 205]
[198 204 205]
[198 204 205]]]]
train_labels
[[0. 1.]
[0. 1.]
[0. 1.]]
val_images
[[[[249 248 250]
[249 248 251]
[249 249 249]
...
[249 249 249]
[249 249 249]
[249 249 249]]
[[249 248 250]
[249 248 250]
[251 250 248]
...
[249 249 249]
[249 249 249]
[249 249 249]]
[[249 249 249]
[250 249 249]
[251 251 246]
...
[249 249 249]
[249 249 249]
[249 249 249]]
...
[[240 231 219]
[243 233 221]
[245 235 223]
...
[244 244 230]
[237 236 221]
[241 238 223]]
[[240 231 219]
[243 233 221]
[245 235 223]
...
[242 242 229]
[241 241 226]
[236 234 221]]
[[241 233 218]
[243 234 220]
[244 235 221]
...
[118 118 110]
[240 240 231]
[239 237 226]]]]
val_labels
[[0. 1.]]
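For reference, tf.keras.utils.to_categorical infers the number of classes from the largest label value it sees, which is why the two runs produce differently shaped label arrays while the loaded model's output layer keeps its old size. A minimal illustration, with label values mirroring the two runs above:

import tensorflow as tf

# First run: only label 0 exists, so the one-hot vectors have width 1
print(tf.keras.utils.to_categorical([0, 0, 0]).shape)  # (3, 1)

# Second run: label 1 appears, so the one-hot vectors have width 2,
# while the model loaded from disk still has its original output size
print(tf.keras.utils.to_categorical([1, 1, 1]).shape)  # (3, 2)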
I would like to store 25 arrays in a 5x5 array in Python.
Currently, I am trying to slice an image into 25 pieces using OpenCV and nested for loops.
I am having difficulty storing the cropped images in the slices array:
board = cv.imread("King Domino dataset/Cropped and perspective corrected boards/1.jpg", 1)
tileDimW = int(board.shape[0]/5)
tileDimH = int(board.shape[1]/5)
slices = np.array([5,5])
slice = np.array([tileDimH,tileDimW])
for h in range(5):
    for w in range(5):
        slice = board[tileDimH*h:tileDimH*(h+1), tileDimW*w:tileDimW*(w+1)]
        slices[h,w] = slice
I get the error message:
"IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed" in the final line
Update:
To address what I am guessing is your actual question (see my comment above).
Assuming your board is an array with the shape (X, Y, ...), and you want to split it up into 25 tiles shaped (X/5, Y/5, ...), you can simply do the following:
Split it into 5 "vertical" tiles along axis 1 (horizontally or column-wise), giving you an array with the shape (5, X, Y/5, ...), i.e. each tile has the shape (X, Y/5, ...).
Split that array into 5 again along axis 1, which effectively splits each of the 5 tiles from step 1 along its own axis 0 (vertically or row-wise). The result has the shape (5, 5, X/5, Y/5, ...), i.e. each of the tiles from step 1 is split into 5 sub-tiles of shape (X/5, Y/5, ...).
Say we have array a with the shape (10, 15):
a = np.arange(150).reshape(10, 15)
print(a)
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
[ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]
[ 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44]
[ 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
[ 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74]
[ 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89]
[ 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104]
[105 106 107 108 109 110 111 112 113 114 115 116 117 118 119]
[120 121 122 123 124 125 126 127 128 129 130 131 132 133 134]
[135 136 137 138 139 140 141 142 143 144 145 146 147 148 149]]
Step 1, using numpy.hsplit, which is equivalent to numpy.split with axis=1:
a1 = np.array(np.hsplit(a, 5))
print(a1)
[[[ 0 1 2]
[ 15 16 17]
...
[120 121 122]
[135 136 137]]
[[ 3 4 5]
[ 18 19 20]
...
[123 124 125]
[138 139 140]]
[[ 6 7 8]
[ 21 22 23]
...
[126 127 128]
[141 142 143]]
[[ 9 10 11]
[ 24 25 26]
...
[129 130 131]
[144 145 146]]
[[ 12 13 14]
[ 27 28 29]
...
[132 133 134]
[147 148 149]]]
Step 2:
a2 = np.array(np.hsplit(a1, 5))
print(a2)
[[[[ 0 1 2]
[ 15 16 17]]
...
[[ 12 13 14]
[ 27 28 29]]]
[[[ 30 31 32]
[ 45 46 47]]
...
[[ 42 43 44]
[ 57 58 59]]]
...
...
...
[[[120 121 122]
[135 136 137]]
...
[[132 133 134]
[147 148 149]]]]
Thus you can achieve the final result in one line like so:
b = np.array(np.hsplit(np.array(np.hsplit(a, 5)), 5))
print(b.shape)
(5, 5, 2, 3)
Then you have 5 x 5 tiles with the shape (2, 3).
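As a quick sanity check (using the a and b from above), tile b[i, j] equals the corresponding block of a, with i indexing the row blocks and j the column blocks:

assert np.array_equal(b[1, 2], a[2:4, 6:9])  # row block 1, column block 2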
Thus, you should be able to achieve what you want, by doing this:
slices = np.array(np.hsplit(np.array(np.hsplit(board, 5)), 5))
Avoid for-loops as much as possible when you are already working with numpy arrays; there is almost always a numpy solution that is orders of magnitude faster (and usually more concise, too).
Hope this helps.
Original answer:
Given an array a with the shape (25, X, Y, Z, ...) you can simply reshape it to (5, 5, X, Y, Z, ...) like this:
a.reshape((5, 5) + a.shape[1:])
For example given a = np.arange(25*2).reshape((25, 2)), the array looks like this:
array([[ 0, 1],
[ 2, 3],
...
[46, 47],
[48, 49]])
And after the reshaping it looks like this:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9]],
...
[[40, 41],
[42, 43],
[44, 45],
[46, 47],
[48, 49]]])
I have a numpy array whose shape is (512, 512, 3).
It is a 512 x 512 image.
I want to show the image and save it as a PNG with matplotlib.
How can I do this? How should I convert the array?
[[[ 87 48 39]
[107 43 29]
[101 40 28]
...
[115 107 100]
[115 106 100]
[115 107 102]]
[[ 94 44 30]
[106 38 20]
[ 97 38 23]
...
[114 109 103]
[113 108 103]
[114 106 98]]
[[ 87 41 30]
[ 96 40 32]
[ 92 38 37]
...
[114 110 105]
[114 110 105]
[116 109 98]]
...
[[123 112 112]
[140 120 121]
[135 120 119]
...
[215 191 218]
[221 195 223]
[217 196 214]]
[[127 116 119]
[134 115 115]
[138 123 124]
...
[217 195 220]
[220 199 221]
[215 193 208]]
[[125 118 117]
[127 115 116]
[131 121 123]
...
[215 199 220]
[216 198 217]
[202 179 198]]]
You can try the snippet below, which uses PIL:
from PIL import Image
import numpy as np

# data is your array
img = Image.fromarray(data, 'RGB')
img.save('my.png')
img.show()
Or do it with matplotlib directly:
import matplotlib.pyplot as plt

plt.imshow(array)  # array is your numpy array
plt.savefig('filename.png')
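Note that plt.savefig saves the whole figure, including axes and margins. If you want a PNG containing just the 512 x 512 pixels, matplotlib's imsave writes the array directly. A minimal sketch, with a random stand-in for your data:

import numpy as np
import matplotlib.pyplot as plt

array = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)  # stand-in for your data

plt.imshow(array)
plt.show()                         # display the image
plt.imsave('filename.png', array)  # write only the raw pixels, no axes or margins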
"Im2col" has already been implemented, Implement MATLAB's im2col 'sliding' in Python, efficiently for 2-D images in Python. I was wondering whether it is possible to extend this to arbitrary N-D images? Many applications involve high-dimensional data (e.g. convolutions, filtering, max pooling, etc.).
So the purpose of this question was really just to post my solution to this problem publicly. I could not find such a solution on Google, so I decided to take a stab at it myself. It turns out the implementation is actually quite simple to extend from "Approach #2" in the post referenced in my question!
Efficient Implementation of N-D "im2col"
import numpy as np

def im2col(im, win, strides=1):
    # Dimensions
    ext_shp = tuple(np.subtract(im.shape, win) + 1)
    shp = tuple(win) + ext_shp
    strd = im.strides*2
    win_len = np.prod(win)
    # Allow a scalar stride as shorthand for the same stride in every dimension
    try:
        len(strides)
    except TypeError:
        strides = [strides]*im.ndim
    strides = [min(i, s) for i, s in zip(im.shape, strides)]
    # Stack all possible patches as an N-D array using a strided view followed by reshaping
    col = np.lib.stride_tricks.as_strided(im, shape=shp, strides=strd).reshape(win_len, -1).reshape(-1, *ext_shp)
    # Extract patches with stride and reshape into columns
    slcs = tuple([slice(None, None, None)] + [slice(None, None, s) for s in strides])
    col = col[slcs].reshape(win_len, -1)
    return col
Efficient Implementation of N-D "col2im"
def col2im(col, im_shp, win, strides=1):
    # Dimensions
    # Allow a scalar stride as shorthand for the same stride in every dimension
    try:
        len(strides)
    except TypeError:
        strides = [strides]*len(im_shp)
    strides = [min(i, s) for i, s in zip(im_shp, strides)]
    # Reshape columns into image
    if col.ndim > 1:
        im = col.reshape((-1,) + tuple(np.subtract(im_shp, win)//np.array(strides) + 1))[0]
    else:
        im = col.reshape(tuple(np.subtract(im_shp, win)//np.array(strides) + 1))
    return im
Verification That It Works
Let's define an arbitrary 3-D input:
x = np.arange(216).reshape(6, 6, 6)
print(x)
[[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[ 12 13 14 15 16 17]
[ 18 19 20 21 22 23]
[ 24 25 26 27 28 29]
[ 30 31 32 33 34 35]]
[[ 36 37 38 39 40 41]
[ 42 43 44 45 46 47]
[ 48 49 50 51 52 53]
[ 54 55 56 57 58 59]
[ 60 61 62 63 64 65]
[ 66 67 68 69 70 71]]
[[ 72 73 74 75 76 77]
[ 78 79 80 81 82 83]
[ 84 85 86 87 88 89]
[ 90 91 92 93 94 95]
[ 96 97 98 99 100 101]
[102 103 104 105 106 107]]
[[108 109 110 111 112 113]
[114 115 116 117 118 119]
[120 121 122 123 124 125]
[126 127 128 129 130 131]
[132 133 134 135 136 137]
[138 139 140 141 142 143]]
[[144 145 146 147 148 149]
[150 151 152 153 154 155]
[156 157 158 159 160 161]
[162 163 164 165 166 167]
[168 169 170 171 172 173]
[174 175 176 177 178 179]]
[[180 181 182 183 184 185]
[186 187 188 189 190 191]
[192 193 194 195 196 197]
[198 199 200 201 202 203]
[204 205 206 207 208 209]
[210 211 212 213 214 215]]]
Let's extract all the patches, using a non-uniform window with a stride equal to the window size:
y = im2col(x, [1, 3, 2], strides = [1, 3, 2])
print(y.T) # transposed for ease of visualization
[[ 0 1 6 7 12 13]
[ 2 3 8 9 14 15]
[ 4 5 10 11 16 17]
[ 18 19 24 25 30 31]
[ 20 21 26 27 32 33]
[ 22 23 28 29 34 35]
[ 36 37 42 43 48 49]
[ 38 39 44 45 50 51]
[ 40 41 46 47 52 53]
[ 54 55 60 61 66 67]
[ 56 57 62 63 68 69]
[ 58 59 64 65 70 71]
[ 72 73 78 79 84 85]
[ 74 75 80 81 86 87]
[ 76 77 82 83 88 89]
[ 90 91 96 97 102 103]
[ 92 93 98 99 104 105]
[ 94 95 100 101 106 107]
[108 109 114 115 120 121]
[110 111 116 117 122 123]
[112 113 118 119 124 125]
[126 127 132 133 138 139]
[128 129 134 135 140 141]
[130 131 136 137 142 143]
[144 145 150 151 156 157]
[146 147 152 153 158 159]
[148 149 154 155 160 161]
[162 163 168 169 174 175]
[164 165 170 171 176 177]
[166 167 172 173 178 179]
[180 181 186 187 192 193]
[182 183 188 189 194 195]
[184 185 190 191 196 197]
[198 199 204 205 210 211]
[200 201 206 207 212 213]
[202 203 208 209 214 215]]
Let's convert this back to a (downsampled) image:
z = col2im(y, x.shape, [1, 3, 2], strides = [1, 3, 2])
print(z)
[[[ 0 2 4]
[ 18 20 22]]
[[ 36 38 40]
[ 54 56 58]]
[[ 72 74 76]
[ 90 92 94]]
[[108 110 112]
[126 128 130]]
[[144 146 148]
[162 164 166]]
[[180 182 184]
[198 200 202]]]
As you can see, the final output is indeed the downsampled image that we expect (you can easily check this by going value by value). The dimensionality and strides I chose were purely illustrative. There's no reason why the window size has to be the same as your stride or that you can't go higher than 3 dimensions.
Applications
If you want to use this practically, all you have to do is intercept the output of im2col before turning it back into an image. For example, if you want to do pooling, you could take the mean or the maximum across the 0th axis. If you want to do a convolution, you just need to multiply this by your flattened convolutional filter.
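For instance, here is a minimal max-pooling sketch built on the two functions above (the 4x4 input and 2x2 window are just illustrative):

x = np.arange(16).reshape(4, 4)
col = im2col(x, [2, 2], strides=[2, 2])  # shape (4, 4): 4 values per patch, 4 patches
pooled = col2im(col.max(axis=0), x.shape, [2, 2], strides=[2, 2])
print(pooled)
# [[ 5  7]
#  [13 15]]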
There may be more efficient alternatives already implemented under the hood of TensorFlow, etc. that are faster than "im2col". This is not meant to be the most efficient implementation, and of course you could probably optimize my code further by eliminating the intermediate reshaping step in "im2col", but it wasn't immediately obvious to me, so I just left it at that. If you have a better solution, let me know. Anyway, hope this helps someone else looking for the same answer!
I am experimenting with the different dimensions one can have in an array, using ndim.
x=np.arange(0,100,1).reshape(1,20,5)
The array looks like this:
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]
[45 46 47 48 49]
[50 51 52 53 54]
[55 56 57 58 59]
[60 61 62 63 64]
[65 66 67 68 69]
[70 71 72 73 74]
[75 76 77 78 79]
[80 81 82 83 84]
[85 86 87 88 89]
[90 91 92 93 94]
[95 96 97 98 99]]]
After that, print(x.ndim) shows the array dimension is 3.
I cannot visualize why the dimension is 3.
What do the shapes of arrays with dimensions 0, 1, 2, 3, 4, 5, ... look like?
A simple way to count dimensions is to count the [ brackets at the start of the output: one [ per dimension. Here you have three [s, therefore you have 3 dimensions. Since one of the dimensions is 1, it is easy to be misled. Here is another example:
x=np.arange(0,24,1).reshape(2,2,6)
Then, x is
array([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]],
[[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]]])
Now it is clear that x is a 3-dimensional array.
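More generally, ndim is just the length of the shape tuple, so each entry in shape adds one dimension (one level of brackets). A few quick examples:

import numpy as np

print(np.array(7).ndim)               # 0: a scalar, shape ()
print(np.array([1, 2, 3]).ndim)       # 1: a vector, shape (3,)
print(np.ones((2, 3)).ndim)           # 2: a matrix, shape (2, 3)
print(np.ones((1, 20, 5)).ndim)       # 3: like your x, shape (1, 20, 5)
print(np.ones((2, 1, 3, 1, 4)).ndim)  # 5: five entries in shape, five dimensions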