How to use pose heatmap for GAN conditioning? - python

I would like to ask a question about building a pose-conditioned StyleGAN in PyTorch. My intent is to generate images of human models only in conditioned poses (based on 17x64x64 pose heatmaps). Assuming that the generator adjustments are already more or less finished, how can I include pose conditioning in the discriminator?
We can use the Discriminator class from https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/training/networks.py as an example: there, the forward() method of DiscriminatorEpilogue applies simple label-based conditioning.
def forward(self, x, img, cmap, force_fp32=False):
    misc.assert_shape(x, [None, self.in_channels, self.resolution, self.resolution]) # [NCHW]
    # Here, cmap is just a simple class label mapping. In my case, cmap would include
    # a 17-channel pose heatmap from a certain source image.
    _ = force_fp32 # unused
    dtype = torch.float32
    memory_format = torch.contiguous_format
    # FromRGB.
    x = x.to(dtype=dtype, memory_format=memory_format)
    if self.architecture == 'skip':
        misc.assert_shape(img, [None, self.img_channels, self.resolution, self.resolution])
        img = img.to(dtype=dtype, memory_format=memory_format)
        x = x + self.fromrgb(img)
    # Main layers.
    if self.mbstd is not None:
        x = self.mbstd(x)
    x = self.conv(x)
    x = self.fc(x.flatten(1))
    x = self.out(x)
    # Conditioning.
    if self.cmap_dim > 0:
        misc.assert_shape(cmap, [None, self.cmap_dim])
        x = (x * cmap).sum(dim=1, keepdim=True) * (1 / np.sqrt(self.cmap_dim))
    assert x.dtype == dtype
    return x
How could I adjust this code to accommodate my problem, with heatmap dimensions being [batch_size, 17, 64, 64]? I thought about flattening the heatmap, but that would lose the spatial information. Another option would be to extract a heatmap xmap from the image and calculate some form of distance between xmap and gmap (some form of pixel-wise MAE?). However, I struggle to imagine how to combine such a result with the base output x for the purpose of conditioning.
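One possible direction, sketched under assumptions rather than taken from stylegan2-ada-pytorch: leave the projection step in the epilogue unchanged and feed it a learned embedding of the heatmap instead of a label embedding. The `PoseEncoder` class, its layer sizes, and the `project` helper below are hypothetical names introduced only for illustration. The strided convolutions look at the spatial layout before anything is pooled, so the heatmap is not simply flattened. An alternative would be to resize the heatmap and concatenate it with the image as extra discriminator input channels.

```python
import torch
import torch.nn as nn


class PoseEncoder(nn.Module):
    """Hypothetical encoder: [N, 17, 64, 64] heatmaps -> [N, cmap_dim] embedding."""

    def __init__(self, in_channels=17, cmap_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),  # 64 -> 32
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # 32 -> 16
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),  # 16 -> 8
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),  # [N, 256, 1, 1]
            nn.Flatten(),             # [N, 256]
            nn.Linear(256, cmap_dim),
        )

    def forward(self, heatmap):
        return self.net(heatmap)


def project(x, cmap):
    """Same dot-product conditioning as the epilogue: [N, D] x [N, D] -> [N, 1]."""
    return (x * cmap).sum(dim=1, keepdim=True) / cmap.shape[1] ** 0.5
```

With this in place, `cmap = PoseEncoder(cmap_dim=self.cmap_dim)(heatmap)` would be passed to the epilogue where the label-derived cmap is passed today, and the encoder would be trained together with the discriminator.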

Related

Apply a kernel to selected pixels only within an image

I'm trying to build a customized DCGAN in PyTorch for a project. The custom part is my own Gaussian blur filter at the end of the generator that blurs only pixel values over a certain threshold. The issue is that when I add the filter to the forward pass of the generator, the backward() call takes much longer. Without the filter, it's pretty much instant; with the filter, it takes over a minute.
I'm assuming it's the big loop iterating through the image that's causing the problem. Is there a way around this?
The forward pass of the generator:
def forward(self, x):
    x = self.gen(x)
    x = convolve2D(x)
    return x
The convolve2D function takes a batch and iterates through it selectively applying the filter:
def convolve2D(batch, padding=0, strides=1):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    kernel = torch.tensor(([1, 2, 1], [2, 4, 2], [1, 2, 1])).to(device)
    kernel_sum = kernel.sum()
    # Gather Shapes of Kernel + Image + Padding
    xKernShape = kernel.shape[0]
    yKernShape = kernel.shape[1]
    xImgShape = batch.shape[2]
    yImgShape = batch.shape[3]
    copy = batch.clone()
    for i, image in tqdm(enumerate(copy)):
        for j, channel in enumerate(image):
            # Apply Equal Padding to All Sides
            if padding != 0:
                channelPadded = torch.zeros((channel.shape[0] + padding*2, channel.shape[1] + padding*2))
                channelPadded[int(padding):int(-1 * padding), int(padding):int(-1 * padding)] = channel
                print(channelPadded)
            else:
                channelPadded = channel
            # Iterate through image
            for y in range(yImgShape):
                # Exit Convolution
                if y > yImgShape - yKernShape:
                    break
                # Only Convolve if y has gone down by the specified Strides
                if y % strides == 0:
                    for x in range(xImgShape):
                        # Go to next row once kernel is out of bounds
                        if x > xImgShape - xKernShape:
                            break
                        # Ignore if pixel is an edge
                        if channel[x + 1, y + 1] < 0:
                            continue
                        else:
                            # Only Convolve if x has moved by the specified Strides
                            if x % strides == 0:
                                batch[i][j][x + 1, y + 1] = torch.mul(kernel, channelPadded[x: x + xKernShape, y: y + yKernShape]).sum() / kernel_sum
    return batch
Yes, for backprop in PyTorch a loop over the x and y coordinates of the image will be unbearably slow. Let original be batch[i][j] and blurred be the result of applying the kernel non-selectively over original in its entirety. For simplicity I'm omitting the code for this, as it is trivial as long as the padding results in original and blurred having the same size.
Now that you have convolved the kernel over the entire image, simply choose elements from either blurred or original based on the value in original.
output = torch.where(original > threshold, blurred, original)
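For completeness, here is a minimal sketch of the omitted part, assuming the 3x3 kernel from the question, zero padding of one pixel so that blurred and original have the same size, and a hypothetical function name `selective_blur`. It blurs the whole batch with one grouped convolution and then selects per pixel with torch.where, so autograd never sees a Python loop.

```python
import torch
import torch.nn.functional as F


def selective_blur(batch: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    """Blur only the pixels of an [N, C, H, W] batch whose value exceeds `threshold`."""
    channels = batch.shape[1]
    kernel = torch.tensor(
        [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]],
        dtype=batch.dtype,
        device=batch.device,
    )
    kernel = kernel / kernel.sum()
    # One copy of the kernel per channel; groups=channels blurs each channel on its own.
    weight = kernel.expand(channels, 1, 3, 3).contiguous()
    # padding=1 (zero padding) keeps the output the same size as the input.
    blurred = F.conv2d(batch, weight, padding=1, groups=channels)
    # Take the blurred value where the original pixel is above the threshold.
    return torch.where(batch > threshold, blurred, batch)
```

In the generator this would replace the convolve2D call, e.g. `x = selective_blur(self.gen(x), threshold=0.0)`; the threshold value is an assumption and should match whatever cutoff the original loop was meant to apply.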

How to correctly replace the real part of the FFT?

I am trying to reproduce a data augmentation method from the paper:
Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang and Qi Tian "A Fourier-based Framework for Domain Generalization" (CVPR 2021).
It is mentioned in the paper that they set the real part to a constant (the constant in the paper is 20000) to eliminate the amplitude and realize the reconstruction of the image relying only on the phase.
Below is my code:
img = process_img("./data/house.jpg", 128)
img_fft = torch.fft.fft2(img, dim=(-2, -1))
amp = torch.full(img_fft.shape, 200000)
img_fft.real = amp
img_ifft = torch.fft.ifft2(img_fft, dim=(-2, -1))
img_ifft = img_ifft.squeeze(0)
img_ifft = img_ifft.transpose(2, 0)
img_ifft = np.array(img_ifft)
cv2.imshow("", img_ifft.real)
Among them, the process_img function is only used to convert ndarray to tensor, as shown below:
loader = transforms.Compose([transforms.ToTensor()])

def process_img(img_path, img_size):
    img = cv2.imread(img_path)
    img = cv2.resize(img, (img_size, img_size))
    img = img.astype(np.float32) / 255.0
    img = loader(img)
    img = img.unsqueeze(0)
    return img
The first is the original image, the second is the image provided by the paper, and the third is the image generated by my code:
It can be seen that the images generated by my method are very different from those provided in the paper, and there are some artifacts. Why is there such a result?
You are confusing "real"/"imaginary" parts of complex numbers with "amplitude"/"phase" representation.
Here's the quick guide:
A complex number z can be expressed by either a sum of its real part x and its imaginary part y:
z = x + j y
Alternatively, one can express the same complex number z as a rotated vector with amplitude r and angle phi:
z = r exp(j phi)
where r = sqrt(x^2 + y^2) and phi = atan2(y, x).
This image (from Wikipedia) explains this visually:
In your code, you replace the "real" part, but in the paper, they suggest replacing the "amplitude".
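The difference can be checked on a single complex number with the standard library alone. This is a small sketch; `to_polar` and `replace_amplitude` are hypothetical helper names, not part of any FFT library.

```python
import cmath


def to_polar(z: complex):
    """Return (amplitude r, phase phi) such that z = r * exp(j * phi)."""
    return abs(z), cmath.phase(z)


def replace_amplitude(z: complex, amp: float) -> complex:
    """Keep the phase of z but force its amplitude to `amp`."""
    return amp * cmath.exp(1j * cmath.phase(z))


z = 3 + 4j
r, phi = to_polar(z)                  # r = 5.0, phi = atan2(4, 3)

z_amp = replace_amplitude(z, 2.0)     # amplitude becomes 2, phase unchanged
z_real = complex(2.0, z.imag)         # real part becomes 2, phase changes

print(cmath.phase(z_amp) - phi)       # approximately 0
print(cmath.phase(z_real) - phi)      # clearly non-zero
```

Overwriting the real part changes the phase, which is why the reconstructed image no longer looks like the one in the paper.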
If you want to replace the amplitude:
const_amp = ... # whatever the constant amplitude you want
new_fft = const_amp * torch.exp(1j * img_fft.angle())
# reconstruct the new image from the modulated Fourier:
img_ifft = torch.fft.ifft2(new_fft, dim=(-2, -1))
This results in the following image:

Is it possible to convert this numpy function to tensorflow?

I have a function that takes a [32, 32, 3] tensor, and outputs a [256,256,3] tensor.
Specifically, the function interprets the smaller array as if it was a .svg file, and 'renders' it to a 256x256 array as a canvas using this algorithm
For an explanation of WHY I would want to do this, see This question
The function behaves exactly as intended, until I try to include it in the training loop of a GAN. The current error I'm seeing is:
NotImplementedError: Cannot convert a symbolic Tensor (mul:0) to a numpy array.
A lot of other answers to similar errors seem to boil down to "You need to re-write the function using tensorflow, not numpy"
Here's the working code using numpy - is it possible to re-write it to exclusively use tensorflow functions?
def convert_to_bitmap(input_tensor, target, j):
    #implied conversion to nparray - the tensorflow docs seem to indicate this is okay, but the error is thrown here when training
    array = input_tensor
    outputArray = target
    output = target
    for i in range(32):
        col = float(array[i,0,j])
        if ((float(array[i,0,0]))+(float(array[i,0,1]))+(float(array[i,0,2]))/3)< 0:
            continue
        #slice only the red channel from the i line, multiply by 255
        red_array = array[i,:,0]*255
        #slice only the green channel, multiply by 255
        green_array = array[i,:,1]*255
        #combine and flatten them
        combined_array = np.dstack((red_array, green_array)).flatten()
        #remove the first two and last two indices of the combined array
        index = [0,1,62,63]
        clipped_array = np.delete(combined_array,index)
        #filter array to remove values less than 0
        filtered = clipped_array > 0
        filtered_array = clipped_array[filtered]
        #check array has an even number of values, delete the last index if it doesn't
        if len(filtered_array) % 2 == 0:
            pass
        else:
            filtered_array = np.delete(filtered_array,-1)
        #convert into a set of tuples
        l = filtered_array.tolist()
        t = list(zip(l, l[1:] + l[:1]))
        if not t:
            continue
        output = fill_polygon(t, outputArray, col)
    return(output)
The 'fill polygon' function is copied from the 'mahotas' library:
def fill_polygon(polygon, canvas, color):
    if not len(polygon):
        return
    min_y = min(int(y) for y,x in polygon)
    max_y = max(int(y) for y,x in polygon)
    polygon = [(float(y),float(x)) for y,x in polygon]
    if max_y < canvas.shape[0]:
        max_y += 1
    for y in range(min_y, max_y):
        nodes = []
        j = -1
        for i,p in enumerate(polygon):
            pj = polygon[j]
            if p[0] < y and pj[0] >= y or pj[0] < y and p[0] >= y:
                dy = pj[0] - p[0]
                if dy:
                    nodes.append( (p[1] + (y-p[0])/(pj[0]-p[0])*(pj[1]-p[1])) )
                elif p[0] == y:
                    nodes.append(p[1])
            j = i
        nodes.sort()
        for n,nn in zip(nodes[::2],nodes[1::2]):
            nn += 1
            canvas[y, int(n):int(nn)] = color
    return(canvas)
NOTE: I'm not trying to get someone to convert the whole thing for me! There are some functions that are pretty obvious (tf.stack instead of np.dstack), but others that I don't even know how to start, like the last few lines of the fill_polygon function above.
Yes, you can do this: you can wrap a Python function with tf.py_function (tf.py_func in TensorFlow 1.x). It is a Python wrapper, though, and it is extremely slow compared to plain TensorFlow. TensorFlow and CUDA are fast because they rely on vectorization, which means many of these loops can be rewritten as tensor operations.
In general:
If you want to use custom code as a custom layer, I would recommend rethinking the algebra behind those loops and trying to express them differently. If it is just preprocessing before training starts, you can use TensorFlow, but doing the same with NumPy and other libraries is easier.
To your main question: yes, it is possible, but it is better to avoid Python loops. TensorFlow has a built-in loop construct, but then you have to use tf.while_loop, which is annoying (maybe just for me). I only skimmed your code, but it looks like you should be able to vectorize it quite well with standard TensorFlow operations. If you want it really fast with GPU support, write everything in TensorFlow rather than a 50/50 mix with tf.convert_to_tensor(), because then it will be slow again: you keep switching between GPU, CPU, the plain Python interpreter, and the TensorFlow low-level API. I hope this helps at least a bit.
This code 'works', in that it only uses tensorflow functions, and does allow the model to train when used in a training loop:
def convert_image(x):
    #split off the first column of the generator output, and store it for later (remove the 'colours' column)
    colours_column = tf.slice(img_to_convert, tf.constant([0,0,0], dtype=tf.int32), tf.constant([32,1,3], dtype=tf.int32))
    #split off the rest of the data, only keeping R + G, and discarding B
    image_data_red = tf.slice(img_to_convert, tf.constant([0,1,0], dtype=tf.int32), tf.constant([32,31,1], dtype=tf.int32))
    image_data_green = tf.slice(img_to_convert, tf.constant([0,1,1], dtype=tf.int32), tf.constant([32, 31,1], dtype=tf.int32))
    #roll each row by 1 position, and make two more 2D tensors
    rolled_red = tf.roll(image_data_red, shift=-1, axis=0)
    rolled_green = tf.roll(image_data_green, shift=-1, axis=0)
    #remove all values where either the red OR green channels are 0
    zeroes = tf.constant(0, dtype=tf.float32)
    #this is for the 'count_nonzero' command
    boolean_red_data = tf.not_equal(image_data_red, zeroes)
    boolean_green_data = tf.not_equal(image_data_green, zeroes)
    initial_data_mask = tf.logical_and(boolean_red_data, boolean_green_data)
    #count non-zero values per row and flatten it
    count = tf.math.count_nonzero(initial_data_mask, 1)
    count_flat = tf.reshape(count, [-1])
    flat_red = tf.reshape(image_data_red, [-1])
    flat_green = tf.reshape(image_data_green, [-1])
    boolean_red = tf.math.logical_not(tf.equal(flat_red, tf.zeros_like(flat_red)))
    boolean_green = tf.math.logical_not(tf.equal(flat_green, tf.zeros_like(flat_red)))
    mask = tf.logical_and(boolean_red, boolean_green)
    flat_red_without_zero = tf.boolean_mask(flat_red, mask)
    flat_green_without_zero = tf.boolean_mask(flat_green, mask)
    # create a ragged tensor
    X0_ragged = tf.RaggedTensor.from_row_lengths(values=flat_red_without_zero, row_lengths=count_flat)
    Y0_ragged = tf.RaggedTensor.from_row_lengths(values=flat_green_without_zero, row_lengths=count_flat)
    #do the same for the rolled version
    rolled_data_mask = tf.roll(initial_data_mask, shift=-1, axis=1)
    flat_rolled_red = tf.reshape(rolled_red, [-1])
    flat_rolled_green = tf.reshape(rolled_green, [-1])
    #from SO "shift zeros to the end"
    boolean_rolled_red = tf.math.logical_not(tf.equal(flat_rolled_red, tf.zeros_like(flat_rolled_red)))
    boolean_rolled_green = tf.math.logical_not(tf.equal(flat_rolled_green, tf.zeros_like(flat_rolled_red)))
    rolled_mask = tf.logical_and(boolean_rolled_red, boolean_rolled_green)
    flat_rolled_red_without_zero = tf.boolean_mask(flat_rolled_red, rolled_mask)
    flat_rolled_green_without_zero = tf.boolean_mask(flat_rolled_green, rolled_mask)
    # create a ragged tensor
    X1_ragged = tf.RaggedTensor.from_row_lengths(values=flat_rolled_red_without_zero, row_lengths=count_flat)
    Y1_ragged = tf.RaggedTensor.from_row_lengths(values=flat_rolled_green_without_zero, row_lengths=count_flat)
    #available outputs for future use are:
    X0 = X0_ragged.to_tensor(default_value=0.)
    Y0 = Y0_ragged.to_tensor(default_value=0.)
    X1 = X1_ragged.to_tensor(default_value=0.)
    Y1 = Y1_ragged.to_tensor(default_value=0.)
    #Example tensor cel (replace with (x))
    P = tf.cast(x, dtype=tf.float32)
    #split out P.x and P.y, and fill a ragged tensor to the same shape as Rx
    Px_value = tf.cast(x, dtype=tf.float32) - tf.cast((tf.math.floor(x/255)*255), dtype=tf.float32)
    Py_value = tf.cast(tf.math.floor(x/255), dtype=tf.float32)
    Px = tf.squeeze(tf.ones_like(X0)*Px_value)
    Py = tf.squeeze(tf.ones_like(Y0)*Py_value)
    #for each pair of values (Y0, Y1, make a vector, and check to see if it crosses the y-value (Py) either up or down
    a = tf.math.less(Y0, Py)
    b = tf.math.greater_equal(Y1, Py)
    c = tf.logical_and(a, b)
    d = tf.math.greater_equal(Y0, Py)
    e = tf.math.less(Y1, Py)
    f = tf.logical_and(d, e)
    g = tf.logical_or(c, f)
    #Makes boolean bitwise mask
    #calculate the intersection of the line with the y-value, assuming it intersects
    #P.x <= (G.x - R.x) * (P.y - R.y) / (G.y - R.y + R.x) - use tf.divide_no_nan for safe divide
    h = tf.math.less(Px,(tf.math.divide_no_nan(((X1-X0)*(Py-Y0)),(Y1-Y0+X0))))
    #combine using AND with the mask above
    i = tf.logical_and(g,h)
    #tf.count_nonzero
    #reshape to make a column tensor with the same dimensions as the colours
    #divide by 2 using tf.floor_mod (returns remainder of division - any remainder means the value is odd, and hence the point is IN the polygon)
    final_count = tf.cast((tf.math.count_nonzero(i, 1)), dtype=tf.int32)
    twos = tf.ones_like(final_count, dtype=tf.int32)*tf.constant([2], dtype=tf.int32)
    divide = tf.cast(tf.math.floormod(final_count, twos), dtype=tf.int32)
    index = tf.cast(tf.range(0,32, delta=1), dtype=tf.int32)
    clipped_index = divide*index
    sort = tf.sort(clipped_index)
    reverse = tf.reverse(sort, [-1])
    value = tf.slice(reverse, [0], [1])
    pair = tf.constant([0], dtype=tf.int32)
    slice_tensor = tf.reshape(tf.stack([value, pair, pair], axis=0),[-1])
    output_colour = tf.slice(colours_column, slice_tensor, [1,1,3])
    return output_colour
This is where the 'convert image' function is applied using tf.vectorized_map:
def convert_images(image_to_convert):
    global img_to_convert
    img_to_convert = image_to_convert
    process_list = tf.reshape((tf.range(0,65536, delta=1, dtype=tf.int32)), [65536, 1])
    output_line = tf.vectorized_map(convert_image, process_list)
    output_line_squeezed = tf.squeeze(output_line)
    output_reshape = (tf.reshape(output_line_squeezed, [256,256,3])/127.5)-1
    output = tf.expand_dims(output_reshape, axis=0)
    return output
It is PAINFULLY slow, though: it does not appear to be using the GPU, and looks to be single threaded as well.
I'm adding it as an answer to my own question because it clearly IS possible to do this numpy function entirely in tensorflow - it just probably shouldn't be done like this.
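For readers trying to follow the per-pixel logic above: the idea is the even-odd rule, i.e. count how many polygon edges a horizontal ray from the pixel crosses, and treat an odd count as "inside". A plain-Python sketch of that test is shown below. `point_in_polygon` is a hypothetical helper written for illustration, with the standard crossing formula; it is not a line-by-line translation of the TensorFlow code.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd rule. `polygon` is a list of (x, y) vertices; returns True if inside."""
    inside = False
    j = len(polygon) - 1  # index of the previous vertex, starting with the last one
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does the edge (j -> i) cross the horizontal line y = py?
        if (yi < py <= yj) or (yj < py <= yi):
            # x coordinate at which the edge crosses that line (yj != yi here).
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            # Count only crossings to the right of the point.
            if px < x_cross:
                inside = not inside
        j = i
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))   # True
print(point_in_polygon(5, 2, square))   # False
```

Each edge test is independent of the others, which is what makes the rule a reasonable candidate for vectorizing over all edges and all pixels at once.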

Color Correction Matrix in XYZ/RGB not working

I am aiming to perform a color correction based on a reference image, using color charts. As a personal goal, I'm trying to correct the colors of an image I previously modified. The chart has been affected by the same layers, of course:
Originals:
Manually modified:
I'm using the following function that I've written myself to get the matrix:
def _get_matrix_transformation(self,
                               observed_colors: np.ndarray,
                               reference_colors: np.ndarray):
    """
    Args:
        observed_colors: colors found in target chart
        reference_colors: colors found on source/reference image
    Returns:
        Nothing.
    """
    # case 1
    observed_m = [observed_colors[..., i].mean() for i in range(observed_colors.shape[-1])]
    observed_colors = (observed_colors - observed_m).astype(np.float32)
    reference_m = [reference_colors[..., i].mean() for i in range(reference_colors.shape[-1])]
    reference_colors = (reference_colors - reference_m).astype(np.float32)
    # XYZ color conversion
    observed_XYZ = cv.cvtColor(observed_colors, cv.COLOR_BGR2XYZ)
    observed_XYZ = np.reshape(observed_colors, (observed_XYZ.shape[0] * observed_XYZ.shape[1],
                                                observed_XYZ.shape[2]))
    reference_XYZ = cv.cvtColor(reference_colors, cv.COLOR_BGR2XYZ)
    reference_XYZ = np.reshape(reference_colors, (reference_XYZ.shape[0] * reference_XYZ.shape[1],
                                                  reference_XYZ.shape[2]))
    # case 2
    # mean subtraction in order to use the covariance matrix
    # observed_m = [observed_XYZ[..., i].mean() for i in range(observed_XYZ.shape[-1])]
    # observed_XYZ = observed_XYZ - observed_m
    # reference_m = [reference_XYZ[..., i].mean() for i in range(reference_XYZ.shape[-1])]
    # reference_XYZ = reference_XYZ - reference_m
    # apply SVD
    H = np.dot(reference_XYZ.T, observed_XYZ)
    U, S, Vt = np.linalg.svd(H)
    # get transformation
    self._M = Vt.T * U.T
    # consider reflection case
    if np.linalg.det(self._M) < 0:
        Vt[2, :] *= -1
        self._M = Vt.T * U.T
    return
I'm applying the correction like this:
def _apply_profile(self, img: np.ndarray) -> np.ndarray:
    """
    Args:
        img: image to be corrected.
    Returns:
        Corrected image.
    """
    # Revert gamma compression
    img = adjust_gamma(img, gamma=1/2.2)
    # Apply color correction
    corrected_img = cv.cvtColor(img.astype(np.float32), cv.COLOR_BGR2XYZ)
    corrected_img = corrected_img.reshape((corrected_img.shape[0]*corrected_img.shape[1], corrected_img.shape[2]))
    corrected_img = np.dot(self._M, corrected_img.T).T.reshape(img.shape)
    corrected_img = cv.cvtColor(corrected_img.astype(np.float32), cv.COLOR_XYZ2BGR)
    corrected_img = np.clip(corrected_img, 0, 255)
    # Apply gamma
    corrected_img = adjust_gamma(corrected_img.astype(np.uint8), gamma=2.2)
    return corrected_img
The result I'm currently getting if the transformation is done in BGR (just commented color conversion functions):
In XYZ (don't pay attention to the resizing, that's because of me):
Now, I'm asking these questions:
Is inverting gamma necessary in this case? If so, am I doing it correctly? Should I implement a LUT that works with other data types such as np.float32?
Should the subtraction of the mean be done in XYZ or in BGR color space (case 1 vs case 2)?
Is considering the reflection case (as in a rigid body rotation problem) necessary?
Is clipping necessary? And if so, are those the correct values and data types?
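Two details of the code above may be worth checking before anything else. For NumPy arrays `*` is element-wise, so `Vt.T * U.T` is not a matrix product (`Vt.T @ U.T` is). Also, the SVD construction used for rigid-body alignment can only produce an orthogonal matrix, whereas a color correction usually needs per-channel scaling and mixing as well. As a point of comparison, here is a minimal least-squares sketch. It assumes the chart colors are already linear (gamma removed), floating point, and arranged as N x 3 arrays of patch colors; `fit_color_matrix` and `apply_color_matrix` are hypothetical helper names.

```python
import numpy as np


def fit_color_matrix(observed: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Least-squares 3x3 matrix M such that observed @ M approximates reference.

    Both inputs are (N, 3) arrays with one chart patch per row.
    """
    M, *_ = np.linalg.lstsq(observed, reference, rcond=None)
    return M


def apply_color_matrix(img: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Apply M to every pixel of an (H, W, 3) floating point image."""
    flat = img.reshape(-1, 3).astype(np.float64)
    return (flat @ M).reshape(img.shape)


# Synthetic check: recover a known matrix from 24 simulated patches.
rng = np.random.default_rng(0)
observed = rng.uniform(0.0, 1.0, size=(24, 3))
M_true = np.array([[0.9, 0.1, 0.0],
                   [0.05, 1.1, -0.05],
                   [0.0, 0.2, 0.8]])
reference = observed @ M_true
M = fit_color_matrix(observed, reference)
print(np.allclose(M, M_true))  # True
```

With real data the corrected values can fall outside the valid range, so clipping to the working range before converting back to an integer type is still needed in this variant.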

Iterate over a TensorFlow Placeholder

docArray is a placeholder used to build the TensorFlow graph. The graph is built properly, but when the data is fed using feed_dict in a session, the variable length does not get adjusted dynamically. The following is the code snippet.
lContext = tf.zeros((100,1), dtype=tf.float64)
rContext = tf.zeros((100,1), dtype=tf.float64)
for i in range(1, docArray.shape[1].value):
    j = docArrayShape - 1 - i
    lContext = tf.concat([lContext,somefun1()], 1)
    rContext = tf.concat([somefun2(), rContext], 1)
X = tf.concat([lContext, docArray, rContext], axis= 0)
When this code is used as the forward pass, an error comes up when docArray is initialised as
docArray = tf.placeholder(tf.float64, [100, None])
If I instead initialise docArray with an arbitrary fixed shape and then feed the real docArray data of shape (100 x N), where N is the number of words in a document, I get an error during training when concatenating, because lContext and docArray have different shapes.
The size of sample document is not fixed.
Thanks in advance, for the help.
Since you have not mentioned the sizes of the variables at the time of concatenation, it is difficult to tell where it is going wrong. In general, though, for concatenation to work, the tensors being concatenated must have the same dtype and the same dimensions along every axis except the one being concatenated.
For example,
Not allowed: (different dtype)
x = tf.placeholder(tf.float32, (100, None))
y = tf.placeholder(tf.float64, (100, None))
z = tf.concat((x,y), axis = 0)
Not allowed: (shape of 100 and 200 mismatch)
x = tf.placeholder(tf.float32, (100, None))
y = tf.placeholder(tf.float32, (200, None))
z = tf.concat((x,y), axis = 1)
Allowed: (Same dtype and axis match)
x = tf.placeholder(tf.float32, (100, 300))
y = tf.placeholder(tf.float32, (200, 300))
z = tf.concat((x,y), axis = 0)
In the examples above, if None is used for a dimension, the graph will build, but at run time the None dimensions have to resolve to the same size.
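The shape part of this rule can be written down as a small checker. This is only a plain-Python sketch of the rule, not TensorFlow API; `concat_shape` is a hypothetical helper, and None stands for a dimension that is unknown until run time.

```python
def concat_shape(shapes, axis):
    """Return the static shape of concatenating `shapes` along `axis`.

    Raises ValueError if any other axis has mismatching known sizes.
    """
    rank = len(shapes[0])
    if any(len(shape) != rank for shape in shapes):
        raise ValueError("all inputs must have the same rank")
    result = list(shapes[0])
    for shape in shapes[1:]:
        for dim in range(rank):
            a, b = result[dim], shape[dim]
            if dim == axis:
                # Sizes add up along the concatenation axis.
                result[dim] = None if a is None or b is None else a + b
            elif a is None:
                # Unknown so far: adopt whatever the other input says.
                result[dim] = b
            elif b is not None and a != b:
                raise ValueError(f"axis {dim}: size {a} does not match {b}")
    return tuple(result)


print(concat_shape([(100, 300), (200, 300)], axis=0))    # (300, 300)
print(concat_shape([(100, None), (100, None)], axis=0))  # (200, None)
```

The second call passes the static check, but the two None dimensions still have to be equal once real data is fed in.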
