How to use if statement PyTorch using torch.FloatTensor - python

I am trying to use the if statement in my PyTorch code using torch.FloatTensor as data type, to speed it up into the GPU.
This is my code:
import torch
import time
def fitness(x):
return torch.pow(x, 2)
def velocity(v, gxbest, pxbest, pybest, x, pop):
return torch.rand(pop).type(dtype)*v + \
torch.rand(pop).type(dtype)*(pxbest - x) + \
torch.rand(pop).type(dtype)*(gxbest.expand(x.size(0)) - x)
dtype = torch.cuda.FloatTensor
def main():
pop, xmax, xmin, niter = 300000, 50, -50, 100
v = torch.rand(pop).type(dtype)
x = (xmax-xmin)*torch.rand(pop).type(dtype)+xmin
y = fitness(x)
[miny, indexminy] = y.min(0)
gxbest = x[indexminy]
pxbest = x
pybest = y
for K in range(niter):
vnext = velocity(v, gxbest, pxbest, pybest, x, pop)
xnext = x + vnext
ynext = fitness(x)
[minynext, indexminynext] = ynext.min(0)
if (minynext < miny):
miny = minynext
gxbest = xnext[indexminynext]
indexpbest = (ynext < pybest)
pxbest[indexpbest] = xnext[indexpbest]
pybest[indexpbest] = ynext[indexpbest]
x = xnext
v = vnext
main()
Unfortanally it is not working. It is giving me a error message and I can not figure it out what is the problem.
RuntimeError: bool value of non-empty torch.cuda.ByteTensor objects is ambiguous
How can I use the if in PyTorch? I tried to convert the cuda.Tensor into a numpy array but it did not work also.
minynext = minynext.cpu().numpy()
miny = miny.cpu().numpy()
PS: Am I doing the code the efficient/faster way possible ? Or should I change something to achieve faster results?

If you look into the following simple example:
import torch
a = torch.LongTensor([1])
b = torch.LongTensor([5])
print(a > b)
Output:
0
[torch.ByteTensor of size 1]
Comparing tensors a and b results in a torch.ByteTensor which is obviously not equivalent to boolean. So, you can do the following.
print(a[0] > b[0]) # False
So, you should change your if condition as follows.
if (minynext[0] < miny[0])

When you compare pyTorch tensors, the output is usually a ByteTensor. This data type is not suitable for if statements.
Change the condition inside the if:
if (minynext[0] < miny[0])

Related

Problems with hurdle and truncated count model from statsmodels

I want to use a hurdle model from statsmodels (https://www.statsmodels.org/dev/examples/notebooks/generated/count_hurdle.html).
Unfortunately, when following the manual, I am running into several problems:
I am only able to load the discrete_model and count_model from discrete:
Hence, the following work
import statsmodels.discrete.discrete_model
import statsmodels.discrete.count_model
However, there is no module named 'statsmodels.discrete.truncated_model'
Furthermore, I am not able to run
get_diagnostic()
with
a poisson regression. I am running the same code as on the statsmodels website (see below), but it throughs the following error:
'PoissonResults' object has no attribute 'get_diagnostic'
Thanks for your help!
np.random.seed(987456348)
# large sample to get strong results
nobs = 5000
x = np.column_stack((np.ones(nobs), np.linspace(0, 1, nobs)))
mu0 = np.exp(0.5 *2 * x.sum(1))
y = np.random.poisson(mu0, size=nobs)
print(np.bincount(y))
y_ = y
indices = np.arange(len(y))
mask = mask0 = y > 0
for _ in range(10):
print( mask.sum())
indices = mask #indices[mask]
if not np.any(mask):
break
mu_ = np.exp(0.5 * x[indices].sum(1))
y[indices] = y_ = np.random.poisson(mu_, size=len(mu_))
np.place(y, mask, y_)
mask = np.logical_and(mask0, y == 0)
np.bincount(y)
mod_p = Poisson(y, x)
res_p = mod_p.fit()
print(res_p.summary())
dia_p = res_p.get_diagnostic()
dia_p.plot_probs();

Speeding up a pytorch tensor operation

I am trying to speed up the below operation by doing some sort of matrix/vector-multiplication, can anyone see a nice quick solution?
It should also work for a special case where a tensor has shape 0 (torch.Size([])) but i am not able to initialize such a tensor.
See the image below for the type of tensor i am referring to:
tensor to add to test
def adstock_geometric(x: torch.Tensor, theta: float):
x_decayed = torch.zeros_like(x)
x_decayed[0] = x[0]
for xi in range(1, len(x_decayed)):
x_decayed[xi] = x[xi] + theta * x_decayed[xi - 1]
return x_decayed
def adstock_multiple_samples(x: torch.Tensor, theta: torch.Tensor):
listtheta = theta.tolist()
if isinstance(listtheta, float):
return adstock_geometric(x=x,
theta=theta)
x_decayed = torch.zeros((100, 112, 1))
for idx, theta_ in enumerate(listtheta):
x_decayed_one_entry = adstock_geometric(x=x,
theta=theta_)
x_decayed[idx] = x_decayed_one_entry
return x_decayed
if __name__ == '__main__':
ones = torch.tensor([1])
hundreds = torch.tensor([idx for idx in range(100)])
x = torch.tensor([[idx] for idx in range(112)])
ones = adstock_multiple_samples(x=x,
theta=ones)
hundreds = adstock_multiple_samples(x=x,
theta=hundreds)
print(ones)
print(hundreds)
I came up with the following, which is 40 times faster on your example:
import torch
def adstock_multiple_samples(x: torch.Tensor, theta: torch.Tensor):
arange = torch.arange(len(x))
powers = (arange[:, None] - arange).clip(0)
return ((theta[:, None, None] ** powers[None, :, :]).tril() * x).sum(-1)
It behaves as expected:
>>> x = torch.arange(112)
>>> theta = torch.arange(100)
>>> adstock_multiple_samples(x, theta)
... # the same output
Note that I considered that x was a 1D-tensor, as for your example the second dimension was not needed.
It also works with theta = torch.empty((0,)), and it returns an empty tensor.

PyTorch: Vectorizing patch selection from a batch of images

Suppose I have a batch of images as a tensor, for example:
images = torch.zeros(64, 3, 1024, 1024)
Now, I want to select a patch from each of those images. All the patches are of the same size, but have different starting positions for each image in the batch.
size_x = 100
size_y = 100
start_x = torch.zeros(64)
start_y = torch.zeros(64)
I can achieve the desired result like that:
result = []
for i in range(arr.shape[0]):
result.append(arr[i, :, start_x[i]:start_x[i]+size_x, start_y[i]:start_y[i]+size_y])
result = torch.stack(result, dim=0)
The question is -- is it possible to do the same thing faster, without a loop? Perhaps there is some form of advanced indexing, or a PyTorch function that can do this?
You can use torch.take to get rid of a for loop. But first, an array of indices should be created with this function
def convert_inds(img_a,img_b,patch_a,patch_b,start_x,start_y):
all_patches = np.zeros((len(start_x),3,patch_a,patch_b))
patch_src = np.zeros((patch_a,patch_b))
inds_src = np.arange(patch_b)
patch_src[:] = inds_src
for ind,info in enumerate(zip(start_x,start_y)):
x,y = info
if x + patch_a + 1 > img_a: return False
if y + patch_b + 1 > img_b: return False
start_ind = img_b * x + y
end_ind = img_b * (x + patch_a -1) + y
col_src = np.linspace(start_ind,end_ind,patch_b)[:,None]
all_patches[ind,:] = patch_src + col_src
return all_patches.astype(np.int)
As you can see, this function essentially creates the indices for each patch you want to slice. With this function, the problem can be easily solved by
size_x = 100
size_y = 100
start_x = torch.zeros(64)
start_y = torch.zeros(64)
images = torch.zeros(64, 3, 1024, 1024)
selected_inds = convert_inds(1024,1024,100,100,start_x,start_y)
selected_inds = torch.tensor(selected_inds)
res = torch.take(images,selected_inds)
UPDATE
OP's observation is correct, the approach above is not faster than a naive approach. In order to avoid building indices every time, here is another solution based on unfold
First, build a tensor of all the possible patches
# create all possible patches
all_patches = images.unfold(2,size_x,1).unfold(3,size_y,1)
Then, slice the desired patches from all_patches
img_ind = torch.arange(images.shape[0])
selected_patches = all_patches[img_ind,:,start_x,start_y,:,:]

Is it possible to convert this numpy function to tensorflow?

I have a function that takes a [32, 32, 3] tensor, and outputs a [256,256,3] tensor.
Specifically, the function interprets the smaller array as if it was a .svg file, and 'renders' it to a 256x256 array as a canvas using this algorithm
For an explanation of WHY I would want to do this, see This question
The function behaves exactly as intended, until I try to include it in the training loop of a GAN. The current error I'm seeing is:
NotImplementedError: Cannot convert a symbolic Tensor (mul:0) to a numpy array.
A lot of other answers to similar errors seem to boil down to "You need to re-write the function using tensorflow, not numpy"
Here's the working code using numpy - is it possible to re-write it to exclusively use tensorflow functions?
def convert_to_bitmap(input_tensor, target, j):
#implied conversion to nparray - the tensorflow docs seem to indicate this is okay, but the error is thrown here when training
array = input_tensor
outputArray = target
output = target
for i in range(32):
col = float(array[i,0,j])
if ((float(array[i,0,0]))+(float(array[i,0,1]))+(float(array[i,0,2]))/3)< 0:
continue
#slice only the red channel from the i line, multiply by 255
red_array = array[i,:,0]*255
#slice only the green channel, multiply by 255
green_array = array[i,:,1]*255
#combine and flatten them
combined_array = np.dstack((red_array, green_array)).flatten()
#remove the first two and last two indices of the combined array
index = [0,1,62,63]
clipped_array = np.delete(combined_array,index)
#filter array to remove values less than 0
filtered = clipped_array > 0
filtered_array = clipped_array[filtered]
#check array has an even number of values, delete the last index if it doesn't
if len(filtered_array) % 2 == 0:
pass
else:
filtered_array = np.delete(filtered_array,-1)
#convert into a set of tuples
l = filtered_array.tolist()
t = list(zip(l, l[1:] + l[:1]))
if not t:
continue
output = fill_polygon(t, outputArray, col)
return(output)
The 'fill polygon' function is copied from the 'mahotas' library:
def fill_polygon(polygon, canvas, color):
if not len(polygon):
return
min_y = min(int(y) for y,x in polygon)
max_y = max(int(y) for y,x in polygon)
polygon = [(float(y),float(x)) for y,x in polygon]
if max_y < canvas.shape[0]:
max_y += 1
for y in range(min_y, max_y):
nodes = []
j = -1
for i,p in enumerate(polygon):
pj = polygon[j]
if p[0] < y and pj[0] >= y or pj[0] < y and p[0] >= y:
dy = pj[0] - p[0]
if dy:
nodes.append( (p[1] + (y-p[0])/(pj[0]-p[0])*(pj[1]-p[1])) )
elif p[0] == y:
nodes.append(p[1])
j = i
nodes.sort()
for n,nn in zip(nodes[::2],nodes[1::2]):
nn += 1
canvas[y, int(n):int(nn)] = color
return(canvas)
NOTE: I'm not trying to get someone to convert the whole thing for me! There are some functions that are pretty obvious (tf.stack instead of np.dstack), but others that I don't even know how to start, like the last few lines of the fill_polygon function above.
Yes you can actually do this, you can use a python function in sth called tf.pyfunc. Its a python wrapper but its extremely slow in comparison to plain tensorflow. However, tensorflow and Cuda for example are so damn fast because they use stuff like vectorization, meaning you can rewrite a lot , really many of the loops in terms of mathematical tensor operations which are very fast.
In general:
If you want to use custom code as a custom layer, i would recommend you to rethink the algebra behind those loops and try to express them somehow different. If its just preprocessing before the training is going to start, you can use tensorflow but doing the same with numpy and other libraries is easier.
To your main question: Yes its possible, but better dont use loops. Tensorflow has a build-in loop optimizer but then you have to use tf.while() and thats anyoing (maybe just for me). I just blinked over your code, but it looks like you should be able to vectorize it quite good using the standard tensorflow vocabulary. If you want it fast, i mean really fast with GPU support write all in tensorflow, but nothing like 50/50 with tf.convert_to_tensor(), because than its going to be slow again. because than you switch between GPU and CPU and plain Python interpreter and the tensorflow low level API. Hope i could help you at least a bit
This code 'works', in that it only uses tensorflow functions, and does allow the model to train when used in a training loop:
def convert_image (x):
#split off the first column of the generator output, and store it for later (remove the 'colours' column)
colours_column = tf.slice(img_to_convert, tf.constant([0,0,0], dtype=tf.int32), tf.constant([32,1,3], dtype=tf.int32))
#split off the rest of the data, only keeping R + G, and discarding B
image_data_red = tf.slice(img_to_convert, tf.constant([0,1,0], dtype=tf.int32), tf.constant([32,31,1], dtype=tf.int32))
image_data_green = tf.slice(img_to_convert, tf.constant([0,1,1], dtype=tf.int32), tf.constant([32, 31,1], dtype=tf.int32))
#roll each row by 1 position, and make two more 2D tensors
rolled_red = tf.roll(image_data_red, shift=-1, axis=0)
rolled_green = tf.roll(image_data_green, shift=-1, axis=0)
#remove all values where either the red OR green channels are 0
zeroes = tf.constant(0, dtype=tf.float32)
#this is for the 'count_nonzero' command
boolean_red_data = tf.not_equal(image_data_red, zeroes)
boolean_green_data = tf.not_equal(image_data_green, zeroes)
initial_data_mask = tf.logical_and(boolean_red_data, boolean_green_data)
#count non-zero values per row and flatten it
count = tf.math.count_nonzero(initial_data_mask, 1)
count_flat = tf.reshape(count, [-1])
flat_red = tf.reshape(image_data_red, [-1])
flat_green = tf.reshape(image_data_green, [-1])
boolean_red = tf.math.logical_not(tf.equal(flat_red, tf.zeros_like(flat_red)))
boolean_green = tf.math.logical_not(tf.equal(flat_green, tf.zeros_like(flat_red)))
mask = tf.logical_and(boolean_red, boolean_green)
flat_red_without_zero = tf.boolean_mask(flat_red, mask)
flat_green_without_zero = tf.boolean_mask(flat_green, mask)
# create a ragged tensor
X0_ragged = tf.RaggedTensor.from_row_lengths(values=flat_red_without_zero, row_lengths=count_flat)
Y0_ragged = tf.RaggedTensor.from_row_lengths(values=flat_green_without_zero, row_lengths=count_flat)
#do the same for the rolled version
rolled_data_mask = tf.roll(initial_data_mask, shift=-1, axis=1)
flat_rolled_red = tf.reshape(rolled_red, [-1])
flat_rolled_green = tf.reshape(rolled_green, [-1])
#from SO "shift zeros to the end"
boolean_rolled_red = tf.math.logical_not(tf.equal(flat_rolled_red, tf.zeros_like(flat_rolled_red)))
boolean_rolled_green = tf.math.logical_not(tf.equal(flat_rolled_green, tf.zeros_like(flat_rolled_red)))
rolled_mask = tf.logical_and(boolean_rolled_red, boolean_rolled_green)
flat_rolled_red_without_zero = tf.boolean_mask(flat_rolled_red, rolled_mask)
flat_rolled_green_without_zero = tf.boolean_mask(flat_rolled_green, rolled_mask)
# create a ragged tensor
X1_ragged = tf.RaggedTensor.from_row_lengths(values=flat_rolled_red_without_zero, row_lengths=count_flat)
Y1_ragged = tf.RaggedTensor.from_row_lengths(values=flat_rolled_green_without_zero, row_lengths=count_flat)
#available outputs for future use are:
X0 = X0_ragged.to_tensor(default_value=0.)
Y0 = Y0_ragged.to_tensor(default_value=0.)
X1 = X1_ragged.to_tensor(default_value=0.)
Y1 = Y1_ragged.to_tensor(default_value=0.)
#Example tensor cel (replace with (x))
P = tf.cast(x, dtype=tf.float32)
#split out P.x and P.y, and fill a ragged tensor to the same shape as Rx
Px_value = tf.cast(x, dtype=tf.float32) - tf.cast((tf.math.floor(x/255)*255), dtype=tf.float32)
Py_value = tf.cast(tf.math.floor(x/255), dtype=tf.float32)
Px = tf.squeeze(tf.ones_like(X0)*Px_value)
Py = tf.squeeze(tf.ones_like(Y0)*Py_value)
#for each pair of values (Y0, Y1, make a vector, and check to see if it crosses the y-value (Py) either up or down
a = tf.math.less(Y0, Py)
b = tf.math.greater_equal(Y1, Py)
c = tf.logical_and(a, b)
d = tf.math.greater_equal(Y0, Py)
e = tf.math.less(Y1, Py)
f = tf.logical_and(d, e)
g = tf.logical_or(c, f)
#Makes boolean bitwise mask
#calculate the intersection of the line with the y-value, assuming it intersects
#P.x <= (G.x - R.x) * (P.y - R.y) / (G.y - R.y + R.x) - use tf.divide_no_nan for safe divide
h = tf.math.less(Px,(tf.math.divide_no_nan(((X1-X0)*(Py-Y0)),(Y1-Y0+X0))))
#combine using AND with the mask above
i = tf.logical_and(g,h)
#tf.count_nonzero
#reshape to make a column tensor with the same dimensions as the colours
#divide by 2 using tf.floor_mod (returns remainder of division - any remainder means the value is odd, and hence the point is IN the polygon)
final_count = tf.cast((tf.math.count_nonzero(i, 1)), dtype=tf.int32)
twos = tf.ones_like(final_count, dtype=tf.int32)*tf.constant([2], dtype=tf.int32)
divide = tf.cast(tf.math.floormod(final_count, twos), dtype=tf.int32)
index = tf.cast(tf.range(0,32, delta=1), dtype=tf.int32)
clipped_index = divide*index
sort = tf.sort(clipped_index)
reverse = tf.reverse(sort, [-1])
value = tf.slice(reverse, [0], [1])
pair = tf.constant([0], dtype=tf.int32)
slice_tensor = tf.reshape(tf.stack([value, pair, pair], axis=0),[-1])
output_colour = tf.slice(colours_column, slice_tensor, [1,1,3])
return output_colour
This is where the 'convert image' function is applied using tf.vectorize_map:
def convert_images(image_to_convert):
global img_to_convert
img_to_convert = image_to_convert
process_list = tf.reshape((tf.range(0,65536, delta=1, dtype=tf.int32)), [65536, 1])
output_line = tf.vectorized_map(convert_image, process_list)
output_line_squeezed = tf.squeeze(output_line)
output_reshape = (tf.reshape(output_line_squeezed, [256,256,3])/127.5)-1
output = tf.expand_dims(output_reshape, axis=0)
return output
It is PAINFULLY slow, though - It does not appear to be using the GPU, and looks to be single threaded as well.
I'm adding it as an answer to my own question because is clearly IS possible to do this numpy function entirely in tensorflow - it just probably shouldn't be done like this.

Numpy convert list of 3D variable size volumes to 4D array

I'm working on a neural network where I am augmenting data via rotation and varying the size of each input volume.
Let me back up, the input to the network is a 3D volume. I generate variable size 3D volumes, and then pad each volume with zero's such that the input volume is constant. Check here for an issue I was having with padding (now resolved).
I generate a variable size 3D volume, append it to a list, and then convert the list into a numpy array. At this point, padding hasn't occured so converting it into a 4D tuple makes no sense...
input_augmented_matrix = []
label_augmented_matrix = []
for i in range(n_volumes):
if i % 50 == 0:
print ("Augmenting step #" + str(i))
slice_index = randint(0,n_input)
z_max = randint(5,n_input)
z_rand = randint(3,5)
z_min = z_max - z_rand
x_max = randint(75, n_input_x)
x_rand = randint(60, 75)
x_min = x_max - x_rand
y_max = randint(75, n_input_y)
y_rand = randint(60, 75)
y_min = y_max - y_rand
random_rotation = randint(1,4) * 90
for j in range(2):
temp_volume = np.empty((z_rand, x_rand, y_rand))
k = 0
for z in range(z_min, z_max):
l = 0
for x in range(x_min, x_max):
m = 0
for y in range(y_min, y_max):
if j == 0:
#input volume
try:
temp_volume[k][l][m] = input_matrix[z][x][y]
except:
pdb.set_trace()
else:
#ground truth volume
temp_volume[k][l][m] = label_matrix[z][x][y]
m = m + 1
l = l + 1
k = k + 1
temp_volume = np.asarray(temp_volume)
temp_volume = np.rot90(temp_volume,random_rotation)
if j == 0:
input_augmented_matrix.append(temp_volume)
else:
label_augmented_matrix.append(temp_volume)
input_augmented_matrix = np.asarray(input_augmented_matrix)
label_augmented_matrix = np.asarray(label_augmented_matrix)
The dimensions of input_augmented_matrix at this point is (N,)
Then I pad with the following code...
for i in range(n_volumes):
print("Padding volume #" + str(i))
input_augmented_matrix[i] = np.lib.pad(input_augmented_matrix[i], ((0,n_input_z - int(input_augmented_matrix[i][:,0,0].shape[0])),
(0,n_input_x - int(input_augmented_matrix[i][0,:,0].shape[0])),
(0,n_input_y - int(input_augmented_matrix[i][0,0,:].shape[0]))),
'constant', constant_values=0)
label_augmented_matrix[i] = np.lib.pad(label_augmented_matrix[i], ((0,n_input_z - int(label_augmented_matrix[i][:,0,0].shape[0])),
(0,n_input_x - int(label_augmented_matrix[i][0,:,0].shape[0])),
(0,n_input_y - int(label_augmented_matrix[i][0,0,:].shape[0]))),
'constant', constant_values=0)
At this point, the dimensions are still (N,) even though every element of the list is constant. For example input_augmented_matrix[0] = input_augmented_matrix[1]
Currently I just loop through and create a new array, but it takes too long and I would prefer some sort of method that automates this. I do it with the following code...
input_4d = np.empty((n_volumes, n_input_z, n_input_x, n_input_y))
label_4d = np.empty((n_volumes, n_input_z, n_input_x, n_input_y))
for i in range(n_volumes):
print("Converting to 4D tuple #" + str(i))
for j in range(n_input_z):
for k in range(n_input_x):
for l in range(n_input_y):
input_4d[i][j][k][l] = input_augmented_matrix[i][j][k][l]
label_4d[i][j][k][l] = label_augmented_matrix[i][j][k][l]
Is there a cleaner and faster way to do this?
As I understood this part
k = 0
for z in range(z_min, z_max):
l = 0
for x in range(x_min, x_max):
m = 0
for y in range(y_min, y_max):
if j == 0:
#input volume
try:
temp_volume[k][l][m] = input_matrix[z][x][y]
except:
pdb.set_trace()
else:
#ground truth volume
temp_volume[k][l][m] = label_matrix[z][x][y]
m = m + 1
l = l + 1
k = k + 1
You just want to do this
temp_input = input_matrix[z_min:z_max, x_min:x_max, y_min:y_max]
temp_label = label_matrix[z_min:z_max, x_min:x_max, y_min:y_max]
and then
temp_input = np.rot90(temp_input, random_rotation)
temp_label = np.rot90(temp_label, random_rotation)
input_augmented_matrix.append(temp_input)
label_augmented_matrix.append(temp_label)
Here
input_augmented_matrix[i] = np.lib.pad(
input_augmented_matrix[i],
((0,n_input_z - int(input_augmented_matrix[i][:,0,0].shape[0])),
(0,n_input_x - int(input_augmented_matrix[i][0,:,0].shape[0])),
(0,n_input_y - int(input_augmented_matrix[i][0,0,:].shape[0]))),
'constant', constant_values=0)
Better to do this, because shape property gives you size of array by all dimensions
ia_shape = input_augmented_matrix[i].shape
input_augmented_matrix[i] = np.lib.pad(
input_augmented_matrix[i],
((0, n_input_z - ia_shape[0]),
(0, n_input_x - ia_shape[1])),
(0, n_input_y - ia_shape[2]))),
'constant',
constant_values=0)
I guess now you're ready to refactor the last part of your code with magic indexing of NumPy.
My common suggestions:
use functions for repeated parts of code to avoid such indents like in your cascade of loops;
if you need so lot of nested loops, think about recursion, if you can't deal without them;
explore abilities of NumPy in official documentation: they're really exciting ;) For example, indexing is helpful for this task;
use PyLint and Flake8 packages to inspect quality of your code.
Do you want to write neural network by yourself, or you just want to solve some patterns recognition task? SciPy library may contain what you need and it's based on NumPy.

Categories

Resources