Why couldn't I feed a 4-tuple to nn.ReplicationPad2d()? - python

I'm applying YOLOv5 to KITTI raw images with [C, H, W] = [3, 375, 1242], so I need to pad each image so that H and W are divisible by 32. I'm using nn.ReplicationPad2d to do the padding: [3, 375, 1242] -> [3, 384, 1248].
The official documentation of nn.ReplicationPad2d says we can give a 4-tuple to indicate the padding sizes for left, right, top and bottom.
The problem is:
When I give a 4-tuple (0, pad1, 0, pad2), it complains: 3D tensors expect 2 values for padding.
When I give a 2-tuple (pad1, pad2), the padding runs, but it seems that only W is padded by pad1+pad2 while H stays unchanged: I get a tensor of size [3, 375, 1257].
1257 - 1242 = 15 = 9 + 6, where 9 was supposed to pad H and 6 was supposed to pad W.
I can't figure out what the problem is here...
Thanks in advance.
Here is my code:
def paddingImage(img, divider=32):
    if img.shape[1] % divider != 0 or img.shape[2] % divider != 0:
        padding1_mult = int(img.shape[1] / divider) + 1
        padding2_mult = int(img.shape[2] / divider) + 1
        pad1 = (divider * padding1_mult) - img.shape[1]
        pad2 = (divider * padding2_mult) - img.shape[2]
        # pad1 = 32 - (img.shape[1] % 32)
        # pad2 = 32 - (img.shape[2] % 32)
        # pad1 = 384 - 375  # 9
        # pad2 = 1248 - 1242  # 6
        #################### PROBLEM ####################
        padding = nn.ReplicationPad2d((pad1, pad2))
        #################### PROBLEM ####################
        return padding(img)
    else:
        return img
Where img was given as a torch.Tensor in the main function:
# ...
image_tensor = torch.from_numpy(image_np).type(torch.float32)
image_tensor = paddingImage(image_tensor)
image_np = image_tensor.numpy()
# ...

PyTorch expects the input to ReplicationPad2d to be batched image tensors. Therefore, we can unsqueeze to add a 'batch dimension'.
def paddingImage(img, divider=32):
    if img.shape[1] % divider != 0 or img.shape[2] % divider != 0:
        padding1_mult = int(img.shape[1] / divider) + 1
        padding2_mult = int(img.shape[2] / divider) + 1
        pad1 = (divider * padding1_mult) - img.shape[1]
        pad2 = (divider * padding2_mult) - img.shape[2]
        # pad1 = 32 - (img.shape[1] % 32)
        # pad2 = 32 - (img.shape[2] % 32)
        # pad1 = 384 - 375  # 9
        # pad2 = 1248 - 1242  # 6
        padding = nn.ReplicationPad2d((0, pad2, 0, pad1))
        # Add an extra batch dimension, pad, and then remove the batch dimension
        return torch.squeeze(padding(torch.unsqueeze(img, 0)), 0)
    else:
        return img
Hope this helps!
EDIT
As GoodDeeds mentions, this is resolved with later versions of PyTorch. Either upgrade PyTorch, or, if that's not an option, use the code above.
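As a quick sanity check (a usage sketch, not part of the original answer), calling the function above on a tensor with the question's KITTI dimensions gives the expected padded shape:

import torch
import torch.nn as nn

img = torch.rand(3, 375, 1242)   # a KITTI-sized CHW image, as in the question
padded = paddingImage(img)
print(padded.shape)              # torch.Size([3, 384, 1248])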

The issue is likely that your PyTorch version is too old. The documentation corresponds to the latest version, and support for padding tensors without a batch dimension was added in v1.10.0. To resolve this, either upgrade your PyTorch version or add a batch dimension to your image, for example, by using unsqueeze(0).
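If you go the unsqueeze(0) route, the same replication padding can also be written with the functional API (a sketch, not from the original answers; the pad tuple order (left, right, top, bottom) applies to the last two dimensions):

import torch
import torch.nn.functional as F

img = torch.rand(3, 375, 1242)
pad_h = (32 - img.shape[1] % 32) % 32   # 9
pad_w = (32 - img.shape[2] % 32) % 32   # 6

# unsqueeze to [1, C, H, W], replication-pad, then drop the batch dimension again
padded = F.pad(img.unsqueeze(0), (0, pad_w, 0, pad_h), mode='replicate').squeeze(0)
print(padded.shape)                      # torch.Size([3, 384, 1248])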

Related

convert a python list to tensor pytorch

I want to convert a list of pixel values to a tensor, but I get an error. My code calculates the pixel values (RGB) for each detected object in the image. How can we convert the list to a tensor?
My code:
cropped_images = []
imgs = PIL.Image.open(img_path).convert('RGB')
# print(img_path)
image_width, image_height = imgs.size
imgArrays = np.array(imgs)

X = (xCenter * image_width)
Y = (yCenter * image_height)
W = (Width * image_width)
H = (Height * image_height)

cropped_image = np.zeros((image_height, image_width))
for i in range(len(X)):
    x1, y1, w, h = X[i], Y[i], W[i], H[i]
    x_start = int(x1 - (w / 2))
    y_start = int(y1 - (h / 2))
    x_end = int(x_start + w)
    y_end = int(y_start + h)
    temp = imgArrays[y_start:y_end, x_start:x_end]
    cropped_image_pixels = torch.as_tensor(temp)
    cropped_images.append(cropped_image_pixels)

stacked_tensor = torch.stack(cropped_images)
print(stacked_tensor)
the error:
RuntimeError Traceback (most recent call last)
<ipython-input-82-653a155c3b71> in <module>()
130
131 if __name__=="__main__":
--> 132 main()
2 frames
<ipython-input-80-670335a0656c> in __getitem__(self, idx)
76 cropped_image_pixels = torch.as_tensor(temp)
77 cropped_images.append(cropped_image_pixels)
---> 78 stacked_tensor = torch.stack(cropped_images)
79
80 print(stacked_tensor)
RuntimeError: stack expects each tensor to be equal size, but got [506, 343, 3] at entry 0 and [520, 334, 3] at entry 1
Your list of tensors contains two tensors, and it's clear that they don't have the same size:
torch.stack(tensors, dim=0, *, out=None) → Tensor
Concatenates a sequence of tensors along a new dimension.
All tensors need to be of the same size.
You can use this pseudocode to resize the images to a common size before converting them to tensors:

import torchvision.transforms as transforms

# ...

temp = []
for img_name in LIST:
    img = cv2.imread(img_name)     # load the image
    img = cv2.resize(img, (W, H))  # cv2.resize expects (width, height)
    temp.append(img)
train_x = np.asarray(temp)

transform = transforms.Compose(
    [transforms.ToTensor()])

Check the docs for details.
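If you would rather stay in PyTorch instead of resizing with OpenCV, a hedged alternative is to resize each crop with F.interpolate before stacking (the target size below is an arbitrary assumption; pick whatever fits your model):

import torch
import torch.nn.functional as F

# hypothetical example: two crops of different sizes, as in the error message
crops = [torch.rand(506, 343, 3), torch.rand(520, 334, 3)]

target_h, target_w = 256, 256
resized = []
for crop in crops:
    # interpolate expects (N, C, H, W), so permute HWC -> CHW and add a batch dimension
    t = crop.permute(2, 0, 1).unsqueeze(0).float()
    t = F.interpolate(t, size=(target_h, target_w), mode='bilinear', align_corners=False)
    resized.append(t.squeeze(0))

stacked = torch.stack(resized)   # shape: [2, 3, 256, 256]
print(stacked.shape)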

PyTorch: Vectorizing patch selection from a batch of images

Suppose I have a batch of images as a tensor, for example:
images = torch.zeros(64, 3, 1024, 1024)
Now, I want to select a patch from each of those images. All the patches are of the same size, but have different starting positions for each image in the batch.
size_x = 100
size_y = 100
start_x = torch.zeros(64)
start_y = torch.zeros(64)
I can achieve the desired result like that:
result = []
for i in range(images.shape[0]):
    result.append(images[i, :, start_x[i]:start_x[i]+size_x, start_y[i]:start_y[i]+size_y])
result = torch.stack(result, dim=0)
The question is -- is it possible to do the same thing faster, without a loop? Perhaps there is some form of advanced indexing, or a PyTorch function that can do this?
You can use torch.take to get rid of a for loop. But first, an array of indices should be created with this function
def convert_inds(img_a, img_b, patch_a, patch_b, start_x, start_y):
    all_patches = np.zeros((len(start_x), 3, patch_a, patch_b))
    patch_src = np.zeros((patch_a, patch_b))
    inds_src = np.arange(patch_b)
    patch_src[:] = inds_src
    for ind, info in enumerate(zip(start_x, start_y)):
        x, y = info
        if x + patch_a + 1 > img_a: return False
        if y + patch_b + 1 > img_b: return False
        start_ind = img_b * x + y
        end_ind = img_b * (x + patch_a - 1) + y
        # one flattened start index per patch row
        col_src = np.linspace(start_ind, end_ind, patch_a)[:, None]
        all_patches[ind, :] = patch_src + col_src
    return all_patches.astype(int)
As you can see, this function essentially creates the indices for each patch you want to slice. With this function, the problem can be easily solved by
size_x = 100
size_y = 100
start_x = torch.zeros(64)
start_y = torch.zeros(64)
images = torch.zeros(64, 3, 1024, 1024)
selected_inds = convert_inds(1024,1024,100,100,start_x,start_y)
selected_inds = torch.tensor(selected_inds)
res = torch.take(images,selected_inds)
UPDATE
The OP's observation is correct: the approach above is not faster than the naive approach. To avoid building the indices every time, here is another solution based on unfold.
First, build a tensor of all the possible patches
# create all possible patches
all_patches = images.unfold(2,size_x,1).unfold(3,size_y,1)
Then, slice the desired patches from all_patches
img_ind = torch.arange(images.shape[0])
selected_patches = all_patches[img_ind,:,start_x,start_y,:,:]
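A quick sanity check (with smaller, hypothetical sizes, not from the original answer) that the unfold-based indexing matches the original loop; note that start_x and start_y must be integer (long) tensors to be usable as indices, unlike the torch.zeros(64) floats in the question:

import torch

images = torch.rand(8, 3, 64, 64)
size_x, size_y = 16, 16
start_x = torch.randint(0, 64 - size_x + 1, (8,))   # integer start positions
start_y = torch.randint(0, 64 - size_y + 1, (8,))

# vectorised version from the answer
all_patches = images.unfold(2, size_x, 1).unfold(3, size_y, 1)
img_ind = torch.arange(images.shape[0])
fast = all_patches[img_ind, :, start_x, start_y, :, :]

# reference loop from the question
slow = torch.stack([images[i, :, start_x[i]:start_x[i] + size_x,
                              start_y[i]:start_y[i] + size_y]
                    for i in range(images.shape[0])])

print(torch.equal(fast, slow))   # True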

How can I keep the association between image and coordinate (z-axis) after a preprocess function?

I'm working on a preprocessing function that takes DICOM files as input and returns a 3D np.array (image stack). The problem is that I need to keep the association between ImagePositionPatient[2] and the relative position of the processed images in the output array.
For example, if a slice with ImagePositionPatient[2] == 5 is mapped to a processed slice at position 3 in the returned stack, I need to return another array that has 5 in the third position, and the same for all original slices. For slices created during processing by interpolation or padding, the array shall contain a placeholder value like -99999 instead.
I paste my code here.
EDIT: new simplified version
def lung_segmentation(patient_dir):
    """
    Load the dicom files of a patient, build a 3D image of the scan, normalize it to (1mm x 1mm x 1mm) and segment
    the lungs
    :param patient_dir: directory of dcm files
    :return: a numpy array of size (384, 288, 384)
    """

    """ LOAD THE IMAGE """
    # Initialize image and get dcm files
    dcm_list = glob(patient_dir + '/*.dcm')
    img = np.zeros((len(dcm_list), 512, 512), dtype='float32')  # initialize a stack of
    # len(dcm_list) zero matrices of size 512x512
    z = []
    # For each dcm file, get the corresponding slice, normalize HU values, and store the Z position of the slice
    for i, f in enumerate(dcm_list):
        dcm = dicom.read_file(f)
        img[i] = float(dcm.RescaleSlope) * dcm.pixel_array.astype('float32') + float(dcm.RescaleIntercept)
        z.append(dcm.ImagePositionPatient[-1])
    # Get spacing and reorder slices
    spacing = list(map(float, dcm.PixelSpacing)) + [np.median(np.diff(np.sort(z)))]
    print("The spacing is: " + str(spacing))
    # spacing = list(map(lambda dcm, z: dcm.PixelSpacing + [np.median(np.diff(np.sort(z)))]))
    img = img[np.argsort(z)]

    """ NORMALIZE HU AND RESOLUTION """
    # Clip and normalize
    img = np.clip(img, -1024, 4000)  # clip with minimum at -1024 and maximum at 4000
    img = (img + 1024.) / (4000 + 1024.)
    # Rescale to 1mm x 1mm x 1mm
    new_shape = map(lambda x, y: int(x * y), img.shape, spacing[::-1])
    old_shape = img.shape
    img = resize(img, new_shape, preserve_range=True)
    print('new shape computed: ' + str(img.shape) + ', based on img_shape: ' + str(old_shape) + ' * ' + str(spacing[::-1]))

    lungmask = np.zeros(img.shape)  # WE NEED LUNGMASK FOR CODE BELOW
    lungmask[int(img.shape[0]/2 - img.shape[0]/4) : int(img.shape[0]/2 + img.shape[0]/4),
             int(img.shape[1]/2 - img.shape[1]/4) : int(img.shape[1]/2 + img.shape[1]/4),
             int(img.shape[2]/2 - img.shape[2]/4) : int(img.shape[2]/2 + img.shape[2]/4)] = 1
    # I set some pixels to value = 1 so that the code below runs; feel free to change this

    """ CENTER AND PAD TO GET SHAPE (384, 288, 384) """
    # Center the image
    sum_x = np.sum(lungmask, axis=(0, 1))
    sum_y = np.sum(lungmask, axis=(0, 2))
    sum_z = np.sum(lungmask, axis=(1, 2))
    mx = np.nonzero(sum_x)[0][0]
    Mx = len(sum_x) - np.nonzero(sum_x[::-1])[0][0]
    my = np.nonzero(sum_y)[0][0]
    My = len(sum_y) - np.nonzero(sum_y[::-1])[0][0]
    mz = np.nonzero(sum_z)[0][0]
    Mz = len(sum_z) - np.nonzero(sum_z[::-1])[0][0]
    img = img * lungmask
    img = img[mz:Mz, my:My, mx:Mx]

    # Pad the image to (384, 288, 384)
    nz, nr, nc = img.shape
    pad1 = int((384 - nz) / 2)
    pad2 = 384 - nz - pad1
    pad3 = int((288 - nr) / 2)
    pad4 = 288 - nr - pad3
    pad5 = int((384 - nc) / 2)
    pad6 = 384 - nc - pad5
    # Crop images that are too big
    if pad1 < 0:
        img = img[-pad1:384 - pad2]        # crop along axis 0 (z)
        pad1 = pad2 = 0
        if img.shape[0] == 383:
            pad1 = 1
    if pad3 < 0:
        img = img[:, -pad3:288 - pad4]     # crop along axis 1 (rows)
        pad3 = pad4 = 0
        if img.shape[1] == 287:
            pad3 = 1
    if pad5 < 0:
        img = img[:, :, -pad5:384 - pad6]  # crop along axis 2 (columns)
        pad5 = pad6 = 0
        if img.shape[2] == 383:
            pad5 = 1
    # Pad
    img = np.pad(img, pad_width=((pad1 - 4, pad2 + 4), (pad3, pad4), (pad5, pad6)), mode='constant')
    # The -4 / +4 is here for "historical" reasons, but it can be removed
    return img
The reference library for resize methods etc. is skimage.
I will try to give at least some hints towards the answer. As has been discussed in the comments, resizing may remove the processed data at the original positions due to the needed interpolation - so in the end you have to come up with a solution for that, either by changing the resizing target to a multiple of the actual resolution, or by returning the interpolated positions instead.
The basic idea is to have your positions array z transformed the same way the images are in the z direction. So for each processing operation that changes the z location of the processed image, a similar operation has to be done on z.
Let's say you have 5 slices with a slice distance of 3mm:
>>> z
[0, 6, 3, 12, 9]
We can make a numpy array from it for easier handling:
z_out = np.array(z)
This corresponds to the unprocessed img list.
Now you sort the image list, so you have to also sort z_out:
img = img[np.argsort(z)]
z_out = np.sort(z_out)
>>> z_out
[0, 3, 6, 9, 12]
Next, the image is resized, introducing interpolated slices.
I will assume here that the resizing is done so that the slice distance is a multiple of the target resolution during resizing. In this case you have to calculate the number of interpolated slices, and fill the new position array with corresponding placeholder values:
slice_distance = int((max(z) - min(z)) / (len(z) - 1))
nr_interpolated = slice_distance - 1  # you may adapt this to your algorithm
index_diff = np.arange(len(z) - 1)  # used to adapt the insertion index
for i in range(nr_interpolated):
    index = index_diff * (i + 1) + 1  # insertion index for placeholders
    z_out = np.insert(z_out, index, -99999)  # insert placeholder for interpolated positions
This gives you the z array filled with the placeholder value where interpolated slices occur in the image array:
>>> z_out
[0, -99999, -99999, 3, -99999, -99999, 6, -99999, -99999, 9, -99999, -99999, 12]
Then you have to do the same padding as for the image in the z direction:
img = np.pad(img, pad_width=((pad1 - 4, pad2 + 4), (pad3, pad4), (pad5, pad6)), mode='constant')
# use 'minimum' so that the placeholder is used
z_out = np.pad(z_out, pad_width=(pad1 - 4, pad2 + 4), mode='minimum')
Assuming padding values 1 and 3 for simplicity this gives you:
>>> z_out
[-99999, 0, -99999, -99999, 3, -99999, -99999, 6, -99999, -99999, 9, -99999, -99999, 12, -99999, -99999, -99999]
If you have more transformations in the z direction, you have to make the corresponding changes to z_out. When you are done, you can return your position list together with the image list:
return img, z_out
As an aside: your code will only work as intended if your image has a transverse (axial) orientation; otherwise you have to calculate the z position array from Image Position Patient and Image Orientation Patient, instead of just using the z component of the image position.
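Putting the steps above together, here is a condensed sketch of the z bookkeeping (the placeholder value matches the example above; pad_before/pad_after stand in for the pad1 - 4 / pad2 + 4 values used when padding the image):

import numpy as np

PLACEHOLDER = -99999

def track_z_positions(z, nr_interpolated, pad_before, pad_after):
    # Sort the positions the same way the slices are sorted
    z_out = np.sort(np.asarray(z, dtype=float))
    # Insert placeholders for the interpolated slices between each pair of original slices
    index_diff = np.arange(len(z) - 1)
    for i in range(nr_interpolated):
        index = index_diff * (i + 1) + 1
        z_out = np.insert(z_out, index, PLACEHOLDER)
    # Pad with the placeholder on both ends, mirroring the z padding of the image
    return np.pad(z_out, pad_width=(pad_before, pad_after),
                  mode='constant', constant_values=PLACEHOLDER)

# Example from above: 5 slices 3 mm apart, 2 interpolated slices per gap, padding (1, 3)
print(track_z_positions([0, 6, 3, 12, 9], nr_interpolated=2, pad_before=1, pad_after=3))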

Function AddBackward0 returned an invalid gradient at index 1 - expected type torch.FloatTensor but got torch.cuda.FloatTensor

My Model:
class myNet(nn.Module):
    def __init__(self):
        super(myNet, self).__init__()
        self.act1 = Dynamic_relu_b(64)
        self.conv1 = nn.Conv2d(3, 64, 3)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, 20)

    def forward(self, x):
        x = self.conv1(x)
        x = self.act1(x)
        x = self.pool(x)
        x = x.view(x.shape[0], -1)
        x = self.fc(x)
        return x
Code that replicates the experiment is provided below:
def one_hot_smooth_label(x, num_class, smooth=0.1):
    num = x.shape[0]
    labels = torch.zeros((num, 20))
    for i in range(num):
        labels[i][x[i]] = 1
    labels = (1 - (num_class - 1) / num_class * smooth) * labels + smooth / num_class
    return labels

images = torch.rand((4, 3, 300, 300))
images = images.cuda()
labels = torch.from_numpy(np.array([1, 0, 0, 1]))

model = myNet()
model = model.cuda()

output = model(images)
labels = one_hot_smooth_label(labels, 20)
labels = labels.cuda()
criterion = nn.BCEWithLogitsLoss()

loss = criterion(output, labels)
loss.backward()
The error:
RuntimeError Traceback (most recent call last)
<ipython-input-42-1268777e87e6> in <module>()
21
22 loss=criterion(output,labels)
---> 23 loss.backward()
1 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
98 Variable._execution_engine.run_backward(
99 tensors, grad_tensors, retain_graph, create_graph,
--> 100 allow_unreachable=True) # allow_unreachable flag
101
102
RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected type TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false) but got TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false) (validate_outputs at /pytorch/torch/csrc/autograd/engine.cpp:484)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7fcf7711b536 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2d84224 (0x7fcfb1bad224 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x548 (0x7fcfb1baed58 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7fcfb1bb0ce2 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fcfb1ba9359 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7fcfbe2e8378 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0xbd6df (0x7fcfe23416df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #7: <unknown function> + 0x76db (0x7fcfe34236db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #8: clone + 0x3f (0x7fcfe375c88f in /lib/x86_64-linux-gnu/libc.so.6)
After many experiments, I found that act1 in the model was the problem. If you delete act1, the error will not appear!
But I don't know why act1 has this problem.
The part of the error that seems wrong is requires_grad=False, and I don't know which part set this.
This is the code for act1 (Dynamic_relu_b):
class Residual(nn.Module):
    def __init__(self, in_channel, R=8, k=2):
        super(Residual, self).__init__()
        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        self.relu = nn.ReLU(inplace=True)
        self.R = R
        self.k = k
        out_channel = int(in_channel / R)
        self.fc1 = nn.Linear(in_channel, out_channel)
        fc_list = []
        for i in range(k):
            fc_list.append(nn.Linear(out_channel, 2 * in_channel))
        self.fc2 = nn.ModuleList(fc_list)

    def forward(self, x):
        x = self.avg(x)
        x = torch.squeeze(x)
        x = self.fc1(x)
        x = self.relu(x)
        result_list = []
        for i in range(self.k):
            result = self.fc2[i](x)
            result = 2 * torch.sigmoid(result) - 1
            result_list.append(result)
        return result_list


class Dynamic_relu_b(nn.Module):
    def __init__(self, inchannel, R=8, k=2):
        super(Dynamic_relu_b, self).__init__()
        self.lambda_alpha = 1
        self.lambda_beta = 0.5
        self.R = R
        self.k = k
        self.init_alpha = torch.zeros(self.k)
        self.init_beta = torch.zeros(self.k)
        self.init_alpha[0] = 1
        self.init_beta[0] = 1
        for i in range(1, k):
            self.init_alpha[i] = 0
            self.init_beta[i] = 0
        self.residual = Residual(inchannel)

    def forward(self, input):
        delta = self.residual(input)
        in_channel = input.shape[1]
        bs = input.shape[0]
        alpha = torch.zeros((self.k, bs, in_channel))
        beta = torch.zeros((self.k, bs, in_channel))
        for i in range(self.k):
            for j, c in enumerate(range(0, in_channel * 2, 2)):
                alpha[i, :, j] = delta[i][:, c]
                beta[i, :, j] = delta[i][:, c + 1]
        alpha1 = alpha[0]
        beta1 = beta[0]
        max_result = self.dynamic_function(alpha1, beta1, input, 0)
        for i in range(1, self.k):
            alphai = alpha[i]
            betai = beta[i]
            result = self.dynamic_function(alphai, betai, input, i)
            max_result = torch.max(max_result, result)
        return max_result

    def dynamic_function(self, alpha, beta, x, k):
        init_alpha = self.init_alpha[k]
        init_beta = self.init_beta[k]
        alpha = init_alpha + self.lambda_alpha * alpha
        beta = init_beta + self.lambda_beta * beta
        bs = x.shape[0]
        channel = x.shape[1]
        results = torch.zeros_like(x)
        for i in range(bs):
            for c in range(channel):
                results[i, c, :, :] = x[i, c] * alpha[i, c] + beta[i, c]
        return results
How should I solve this problem?
In PyTorch two tensors need to be on the same device to perform any mathematical operation between them. But in your case one is on the CPU and the other on the GPU. The error is not as clear as it normally is, because it happened in the backwards pass. You were (un)lucky that your forward pass did not fail. That's because there is an exception to the same device restriction, namely when using scalar values in the mathematical operation, e.g. tensor * 2, and it even occurs when the scalar is a tensor: cpu_tensor * tensor(2, device='cuda:0'). You are using a lot of loops and accessing individual scalars to calculate further results.
While the forward pass works like that, in the backward pass when the gradients are calculated, the gradients are multiplied with the previous ones (application of the chain rule). At that point, the two are on different devices.
You have identified that it's in the Dynamic_relu_b. In there you need to make sure that every tensor you create is on the same device as the input. The two tensors you create in the forward method are:
alpha = torch.zeros((self.k, bs, in_channel))
beta = torch.zeros((self.k, bs, in_channel))
These are created on the CPU, but your input is on the GPU, so you need to put them on the GPU as well. To be generic, they should be put onto the device where the input is located.
alpha = torch.zeros((self.k, bs, in_channel), device=input.device)
beta = torch.zeros((self.k, bs, in_channel), device=input.device)
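On a related note (an assumption about the rest of the module, not something reported in the question): init_alpha and init_beta are created in __init__ as plain tensor attributes, so they also stay on the CPU. Registering them as buffers would make them move together with the module on model.cuda() / model.to(device):

# hypothetical change in Dynamic_relu_b.__init__
self.register_buffer('init_alpha', torch.zeros(k))
self.register_buffer('init_beta', torch.zeros(k))
# buffers are moved by .cuda() / .to(device) along with the parameters

In this particular code it happens to work anyway, because indexing them yields 0-dimensional tensors, which fall under the scalar exception mentioned above, but buffers are the more robust pattern.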
The biggest problem in your code is the loops. Not only did they obfuscate a bug, they are very harmful for performance, since they can neither be parallelised nor vectorised, and those are the reasons why GPUs are so fast. I'm certain that these loops can be replaced with more efficient operations, but you'll have to get out of the mindset of creating an empty tensor and then filling it one by one.
I'll give you one example from dynamic_function:
results = torch.zeros_like(x)
for i in range(bs):
    for c in range(channel):
        results[i, c, :, :] = x[i, c] * alpha[i, c] + beta[i, c]
You're multiplying x (size: [bs, channel, height, width]) with alpha (size: [bs, channel]), where every plane (height, width) of x is multiplied by a different element of alpha (a scalar). That would be the same as doing an element-wise multiplication with a tensor of the same size as the plane [height, width], but where all elements are the same scalar.
Thankfully, you don't need to repeat them yourself, since singular dimensions (dimensions with size 1) are automatically expanded to match the size of the other tensor, see PyTorch - Broadcasting Semantics for details. That means you only need to reshape alpha to have size [bs, channel, 1, 1].
The loop can therefore be replaced with:
results = x * alpha.view(bs, channel, 1, 1) + beta.view(bs, channel, 1, 1)
By eliminating that loop, you gain a lot of performance, and your initial error just got much clearer, because the forward pass would fail with the following message:
File "main.py", line 78, in dynamic_function
results = x * alpha.view(bs, channel, 1, 1) + beta.view(bs, channel, 1, 1)
RuntimeError: expected device cuda:0 but got device cpu
Now you would know that one of these is on the CPU and the other on the GPU.
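In the same spirit (a sketch of mine, not part of the original answer), the double loop in forward that scatters delta into alpha and beta can be replaced with strided slicing, which also keeps the result on the same device as delta:

# delta is a list of k tensors of shape [bs, 2 * in_channel];
# even columns hold the alpha values, odd columns the beta values
alpha = torch.stack([d[:, 0::2] for d in delta])   # [k, bs, in_channel], same device as delta
beta = torch.stack([d[:, 1::2] for d in delta])    # [k, bs, in_channel]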

Python upscaling an image (no external library help)

I'm trying to upscale an image by 200%, but there are some weird bars over the output image. I figure it has something to do with the center pixel. I'm trying to do it without library functions such as resize(). Here is my attempt:
import cv2
import numpy as np

img = cv2.imread('C:\\Users\\usama\\Downloads\\lena.tiff', 0)  # Open image in grayscale
origImg = np.asarray(img)               # Convert image to 2D array
upscaledImg = np.zeros((1024, 1024))    # Empty array for upscaled image

rowOld = 0  # Original image row
rowNew = 0  # Upscaled image row
colOld = 0  # Original image column
colNew = 0  # Upscaled image column

def pixeltop():
    return int(origImg[rowOld][colOld]) / 2 + int(origImg[rowOld][colOld + 1]) / 2

def pixelcenter():
    return (int(origImg[rowOld+1][colOld]) + int(origImg[rowOld+1][colOld + 1]) + int(origImg[rowOld+1][colOld]) + int(origImg[rowOld][colOld + 1])) / 5

def pixelleft():
    return int(origImg[rowOld][colOld]) / 2 + int(origImg[rowOld + 1][colOld]) / 2

def pixelright():
    return int(origImg[rowOld][colOld + 1]) / 2 + int(origImg[rowOld + 1][colOld + 1]) / 2

def pixelbottom():
    return int(origImg[rowOld + 1][colOld]) / 2 + int(origImg[rowOld + 1][colOld + 1]) / 2

while rowOld < (len(origImg)):      # Outer loop for traversing rows
    colOld = 0
    colNew = 0
    while colOld < (len(origImg)):  # Inner loop for traversing columns
        upscaledImg[rowNew][colNew] = origImg[rowOld][colOld]
        upscaledImg[rowNew][colNew+1] = pixeltop()
        upscaledImg[rowNew][colNew+2] = origImg[rowOld][colOld+1]
        upscaledImg[rowNew+1][colNew] = pixelleft()
        upscaledImg[rowNew+1][colNew+1] = pixelcenter()
        upscaledImg[rowNew+1][colNew+2] = pixelright()
        upscaledImg[rowNew+2][colNew] = origImg[rowOld+1][colOld]
        upscaledImg[rowNew+2][colNew+1] = pixelbottom()
        upscaledImg[rowNew+2][colNew+2] = origImg[rowOld+1][colOld+1]
        colOld += 2
        colNew += 4
    if rowOld == 511:
        break
    rowOld += 2
    rowNew += 4

cv2.imwrite('upscaled.png', upscaledImg)
Output: (the resulting image, showing the dark bars, is not included here)
The new image is constructed by writing windows of 3x3 pixels, but your window is moving 4 pixels at a time, leaving a gap of one pixel, hence the black bars.
Example, focusing just on the rows:
We start with rowNew = 0:
-> Img[0] is set
-> Img[0+1] is set
-> Img[0+2] is set
Now rowNew += 4:
-> Img[4+0] is set
-> Img[4+1] is set
-> Img[4+2] is set
leaving Img[3] blank.
You could either change the step of your window to 3, or implement the assignments so that a full 4x4 window is filled.
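As an alternative to the sliding-window bookkeeping (a sketch of my own, not the OP's approach), you can compute each output pixel directly from its fractional source coordinate and blend the four surrounding pixels; that way there is no stride to get wrong:

import numpy as np

def upscale2x(orig):
    h, w = orig.shape
    out = np.zeros((2 * h, 2 * w))
    for r in range(2 * h):
        for c in range(2 * w):
            y, x = r / 2.0, c / 2.0            # fractional source position
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            # bilinear blend of the four neighbouring source pixels
            out[r, c] = (float(orig[y0, x0]) * (1 - dy) * (1 - dx)
                         + float(orig[y0, x1]) * (1 - dy) * dx
                         + float(orig[y1, x0]) * dy * (1 - dx)
                         + float(orig[y1, x1]) * dy * dx)
    return out

# usage: upscaledImg = upscale2x(origImg)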
