I want to perform element-wise multiplication between two tensors, where most of the elements are zero.
For two example tensors:
test1 = np.zeros((2, 3, 5, 6))
test1[0, 0, :, 2] = 4
test1[0, 1, [2, 4], 1] = 7
test1[0, 2, 2, :] = 2
test1[1, 0, 4, 1:3] = 5
test1[1, :, 0, 1] = 3
and,
test2 = np.zeros((5, 6, 4, 7))
test2[2, 2, 2, 4] = 4
test2[0, 1, :, 1] = 3
test2[4, 3, 2, :] = 6
test2[1, 0, 3, 1:3] = 1
test2[3, :, 0, 1] = 2
The calculation I need is:
result = test1[..., None, None] * test2[None, None, ...]
In the actual use case I am coding for, the tensors can have more dimensions and much longer lengths in some of the dimensions, so while the multiplication is reasonably quick, I would like to utilise the fact that most of the elements are zero.
My first thought was to make a sparse representation of each tensor.
coords1 = np.nonzero(test1)
shape1 = test1.shape
test1_squished = test1[coords1]
coords1 = np.array(coords1)
coords2 = np.nonzero(test2)
shape2 = test2.shape
test2_squished = test2[coords2]
coords2 = np.array(coords2)
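As a quick sanity check of this representation (a minimal sketch using the variables above), the coordinates and squished values reconstruct the original dense tensor:
check1 = np.zeros(shape1)
check1[tuple(coords1)] = test1_squished
assert np.array_equal(check1, test1)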
Here there is enough information to perform the multiplication, by comparing the coordinates along the equal axes and multiplying if they are the same.
I have a function for adding a new axis,
def new_axis(coords, shape, axis):
    new_coords = np.zeros((len(coords)+1, len(coords[0])))
    new_index = np.delete(np.arange(0, len(coords)+1), axis)
    new_coords[new_index] = coords
    coords = new_coords
    new_shape = np.zeros(len(new_coords), dtype=int)
    new_shape[new_index] = shape
    new_shape[axis] = 1
    new_shape = np.array(new_shape)
    return coords, new_shape
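For instance, to mirror test1[..., None, None] in this representation, it can be applied twice (a small sketch using the variables defined earlier):
coords1_b, shape1_b = new_axis(coords1, shape1, 4)
coords1_b, shape1_b = new_axis(coords1_b, shape1_b, 5)
# shape1_b is now array([2, 3, 5, 6, 1, 1])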
And here is the function for performing the multiplication:
def multiply(coords1, shape1, array1, coords2, shape2, array2):  # all inputs should be numpy arrays
    if np.array_equal(shape1, shape2):
        index1 = np.nonzero((coords1.T[:, None, :] == coords2.T).all(-1).any(-1))[0]
        index2 = np.nonzero((coords2.T[:, None, :] == coords1.T).all(-1).any(-1))[0]
        array = array1[index1] * array2[index2]
        coords = (coords1.T[index1]).T
        shape = shape1
    else:
        if len(shape1) == len(shape2):
            equal_index = np.nonzero((shape1 == shape2))[0]
            not_equal_index = np.nonzero(~(shape1 == shape2))[0]
            if np.logical_or((shape1[not_equal_index] == 1), (shape2[not_equal_index] == 1)).all():
                # if where not equal, one of them = 1 -> can broadcast
                # compare dimensions with same length, if equal then multiply corresponding elements
                multiply_index1 = np.nonzero(
                    (coords1[equal_index].T[:, None, :] == coords2[equal_index].T).all(-1).any(-1)
                )[0]
                # would like a vectorised version of the loop below
                array = []
                coords = []
                for index in multiply_index1:
                    multiply_index2 = np.nonzero(((coords2[equal_index]).T == (coords1[equal_index]).T[index]).all(-1))[0]
                    array.append(array1[index] * array2[multiply_index2])
                    temp = np.zeros((6, len(multiply_index2)))
                    temp[not_equal_index] = ((coords1[not_equal_index].T[index]).T + (coords2[not_equal_index].T[multiply_index2])).T
                    if len(multiply_index2) == 1:
                        temp[equal_index] = coords1[equal_index].T[index].T[:, None]
                    else:
                        temp[equal_index] = np.repeat(coords1[equal_index].T[index].T[:, None], len(multiply_index2), axis=-1)
                    coords.append(temp)
                array = np.concatenate(array)
                coords = np.concatenate(coords, axis=-1)
                shape = shape1
                shape[np.where(shape == 1)] = shape2[np.where(shape == 1)]
            else:
                print("error")
        else:
            print("error")
    return array, coords, shape
However, the multiply function is very inefficient, so I lose any gain from going to the sparse representation.
Is there an elegant vectorised approach to the multiply function? Or is there a better solution than this sparse tensor idea?
Thanks in advance.
I have a 100 by 100 2D numpy array, and I also have the indexes of that array. Is there any way I can extract the "unique indexes" along each side of the array (North_Bound, East_Bound, West_Bound, South_Bound)? Below is my code attempt, but something is wrong: each side's index list should have 99 entries but doesn't, and on my actual, larger data it sometimes generates erroneous indexes! Is there a better, more reliable way to do this job that would not generate wrong results?
import numpy as np
my_array = np.random.rand(100, 100)
indexes = np.argwhere(my_array[:, :] == my_array[:, :])
indexes = list(indexes)
NBound_indexes = np.argwhere(my_array[:, :] == my_array[0, :])
NBound_indexes = list(NBound_indexes)
SBound_indexes = np.argwhere(my_array[:, :] == my_array[99, :])
SBound_indexes = list(SBound_indexes)
WBound_indexes = []
for element in range(0, 100):
    #print(element)
    WB_index = np.argwhere(my_array[:, :] == my_array[element, 0])
    WB_index = WB_index[0]
    WBound_indexes.append(WB_index)
EBound_indexes = []
for element in range(0, 100):
    #print(element)
    EB_index = np.argwhere(my_array[:, :] == my_array[element, 99])
    EB_index = EB_index[0]
    EBound_indexes.append(EB_index)
outer_belt_ind = NBound_indexes
NBound_indexes.extend(EBound_indexes) #, SBound_index, WBound_index)
NBound_indexes.extend(SBound_indexes)
NBound_indexes.extend(WBound_indexes)
outer_bound = []
for i in NBound_indexes:
    i_list = list(i)
    outer_bound.append(i_list)
outer_bound = [outer_bound[i] for i in range(len(outer_bound)) if i == outer_bound.index(outer_bound[i]) ]
This wouldn't extract them directly from my_array but you could use list comprehensions to generate similar results to your code above:
y, x = my_array.shape
WBound_indexes = [np.array((i, 0)) for i in range(1, y-1)]
EBound_indexes = [np.array((i, x-1)) for i in range(1, y-1)]
NBound_indexes = [np.array((0, i)) for i in range(x)]
SBound_indexes = [np.array((y-1, i)) for i in range(x)]
outer_bound = NBound_indexes + WBound_indexes + SBound_indexes + EBound_indexes
For example, WBound_indexes would look like:
[array([1, 0]), array([2, 0]), ..., array([97, 0]), array([98, 0])]
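As a quick sanity check (a small addition reusing y and x from above), the outer boundary of the grid should contain 2*(y + x) - 4 distinct cells, i.e. 396 for a 100 by 100 array:
assert len(outer_bound) == 2 * (y + x) - 4  # 396 for a 100 x 100 array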
I created this function, which takes in a dataframe and returns ndarrays of inputs and labels.
def transform_to_array(dataframe, chunk_size=100):
    grouped = dataframe.groupby('id')
    # initialize accumulators
    X, y = np.zeros([0, 1, chunk_size, 4]), np.zeros([0,])  # original input shape: [0, 1, chunk_size, 4]
    # loop over each group (df[df.id==1] and df[df.id==2])
    for _, group in grouped:
        inputs = group.loc[:, 'A':'D'].values
        label = group.loc[:, 'label'].values[0]
        # calculate number of splits
        N = (len(inputs)-1) // chunk_size
        if N > 0:
            inputs = np.array_split(
                inputs, [chunk_size + (chunk_size*i) for i in range(N)])
        else:
            inputs = [inputs]
        # loop over splits
        for inpt in inputs:
            inpt = np.pad(
                inpt, [(0, chunk_size-len(inpt)), (0, 0)],
                mode='constant')
            # add each input split to the accumulators
            X = np.concatenate([X, inpt[np.newaxis, np.newaxis]], axis=0)
            y = np.concatenate([y, label[np.newaxis]], axis=0)
    return X, y
The function returns X of shape (n_samples, 1, chunk_size, 4) and y of shape (n_samples,).
For example:
N = 10_000
id = np.arange(N)
labels = np.random.randint(5, size=N)
df = pd.DataFrame(data = np.random.randn(N, 4), columns=list('ABCD'))
df['label'] = labels
df.insert(0, 'id', id)
df = df.loc[df.id.repeat(157)]
df.head()
   id         A         B         C         D  label
0   0 -0.571676 -0.337737 -0.019276 -1.377253      1
0   0 -0.571676 -0.337737 -0.019276 -1.377253      1
0   0 -0.571676 -0.337737 -0.019276 -1.377253      1
0   0 -0.571676 -0.337737 -0.019276 -1.377253      1
0   0 -0.571676 -0.337737 -0.019276 -1.377253      1
This generates the following:
X, y = transform_to_array(df)
X.shape # shape of input
(20000, 1, 100, 4)
y.shape # shape of label
(20000,)
This function works fine as intended; however, it takes a long time to finish execution:
start_time = time.time()
X, y = transform_to_array(df)
end_time = time.time()
print(f'Time taken: {end_time - start_time} seconds.')
Time taken: 227.83956217765808 seconds.
In an attempt to improve the performance of the function (minimise execution time), I created the following modified function:
def modified_transform_to_array(dataframe, chunk_size=100):
    # group data by 'id'
    grouped = dataframe.groupby('id')
    # initialize lists to store transformed data
    X, y = [], []
    # loop over each group (df[df.id==1] and df[df.id==2])
    for _, group in grouped:
        # get input and label data for group
        inputs = group.loc[:, 'A':'D'].values
        label = group.loc[:, 'label'].values[0]
        # calculate number of splits
        N = (len(inputs)-1) // chunk_size
        if N > 0:
            # split input data into chunks
            inputs = np.array_split(
                inputs, [chunk_size + (chunk_size*i) for i in range(N)])
        else:
            inputs = [inputs]
        # loop over splits
        for inpt in inputs:
            # pad input data to have a chunk size of chunk_size
            inpt = np.pad(
                inpt, [(0, chunk_size-len(inpt)), (0, 0)],
                mode='constant')
            # add each input split and corresponding label to lists
            X.append(inpt)
            y.append(label)
    # convert lists to numpy arrays
    X = np.array(X)
    y = np.array(y)
    return X, y
At first, it seems like I succeeded in reducing the time taken:
start_time = time.time()
X2, y2 = modified_transform_to_array(df)
end_time = time.time()
print(f'Time taken: {end_time - start_time} seconds.')
Time taken: 5.842168092727661 seconds.
However, it changes the shape of the returned value from what was intended.
X2.shape # this should be (20000, 1, 100, 4)
(20000, 100, 4)
y2.shape # this is fine
(20000, )
Question
How do I modify modified_transform_to_array() to return the intended array shape (n_samples, 1, chunk_size, 4) since it is much faster?
You can simply reshape the X just before returning it at the end of modified_transform_to_array(), e.g.:
def modified_transform_to_array( ... ):
    ...
    # convert lists to numpy arrays
    X = np.array(X)
    y = np.array(y)
    X = X.reshape((X.shape[0], 1, *X.shape[1:]))  # <-- THIS LINE
    return X, y
or, equivalently:
X = X.reshape((X.shape[0], 1, X.shape[1], X.shape[2]))
As pointed out in #MSS's answer, you can achieve the same reshaping result with slicing, by starting from a slice that selects the whole array (i.e. X[:, :, :]) and inserting a None (or its more explicit alias np.newaxis) in the position where you want to add a dimension:
X = X[:, None, :, :]
X = X[:, np.newaxis, :, :]
The trailing full slices can be replaced by an Ellipsis (...), which essentially produces enough full-axis slices (i.e. : or slice(None)) to fill out the remaining array dimensions.
X = X[:, None, ...]
X = X[:, np.newaxis, ...]
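For completeness (my addition, not part of the answers above), np.expand_dims inserts the same singleton axis:
X = np.expand_dims(X, axis=1)  # equivalent to X[:, None, ...]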
You may want to read the relevant section of NumPy's user guide for further explanations on the use of None and Ellipsis in NumPy's slicing.
Add a new axis to your X just before returning it in modified_transform_to_array, e.g.:
def modified_transform_to_array( ... ):
    ...
    # convert lists to numpy arrays
    X = np.array(X)
    y = np.array(y)
    X = X[:, np.newaxis, ...]  # <-- in this place
    # X = X[:, None, :, :]
    return X, y
I am trying to convert numpy code into tensorflow graph format. But somewhere I am missing an understanding of dimensionality.
Here is numpy code:
def delta_to_boxes3d(deltas, anchors, coordinate='lidar'):
    # Input:
    #   deltas: (N, w, l, 14), e.g. (200, 240, 14)
    #   feature_map_shape: (w, l)
    #   anchors: (w, l, 2, 7), e.g. (200, 240, 2, 7)
    # Output:
    #   boxes3d: (N, w*l*2, 7)
    anchors_reshaped = anchors.reshape(-1, 7)  # (96000, 7)
    deltas = deltas.reshape(-1, 7)             # (96000, 7)
    anchors_d = np.sqrt(anchors_reshaped[:, 4]**2 + anchors_reshaped[:, 5]**2)
    boxes3d = np.zeros_like(deltas)
    boxes3d[..., [0, 1]] = deltas[..., [0, 1]] * \
        anchors_d[:, np.newaxis] + anchors_reshaped[..., [0, 1]]  # in this line I have the problem
    boxes3d[..., [2]] = deltas[..., [2]] * \
        1.73 + anchors_reshaped[..., [2]]  # ANCHOR_H = 1.73
    boxes3d[..., [3, 4, 5]] = np.exp(
        deltas[..., [3, 4, 5]]) * anchors_reshaped[..., [3, 4, 5]]
    boxes3d[..., 6] = deltas[..., 6] + anchors_reshaped[..., 6]
    return boxes3d
Here is the code which I have been trying:
def delta_boxes3d():
    anchors = tf.placeholder(tf.float32, shape=[None, None, 2, 7], name="anchor")  # check the anchor type later
    anchors_reshaped = tf.reshape(anchors, shape=[96000, 7])
    delta = tf.placeholder(tf.float32, shape=[None, None, 14], name="delta")
    anchors_d = tf.sqrt(tf.add(tf.pow(anchors_reshaped[:, 4], 2), tf.pow(anchors_reshaped[:, 5], 2)))  # 96000
    deltas = tf.reshape(delta, [96000, 7])
    x_shape = tf.shape(deltas)
    boxes3d_ = tf.multiply(deltas[:, 0:2], tf.add(tf.expand_dims(anchors_d, -1), anchors_reshaped[:, 0:2]))
    boxes3d = tf.ones(x_shape[:-1]) + boxes3d_

    delta_ = np.random.rand(200, 240, 14)
    anchor_ = np.random.rand(200, 240, 2, 7)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    result = sess.run(boxes3d, feed_dict={anchors: anchor_, delta: delta_})  # (96000, 7), need to get boxes3d
    print(result.shape)
I am getting the error below:
ValueError: Dimensions must be equal, but are 96000 and 2 for '{{node add_2}} = AddV2[T=DT_FLOAT](ones, Mul)' with input shapes: [96000], [96000,2].
Could someone help me with this?
Thanks in advance
The error comes from the line boxes3d = tf.ones(x_shape[:-1]) + boxes3d_.
You are trying to add shapes (96000,) and (96000,2), which you can't without expanding dims. If you want to add a scalar, you can do boxes3d = 1 + boxes3d.
In the example above, the multiplication needs to happen first, followed by the addition.
Note that in the NumPy version of the highlighted line you multiply first and then add, whereas in your TensorFlow code you did it the other way around (possibly by mistake).
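Concretely, the highlighted line needs the operations grouped like this (a sketch of the intended order, matching the rewrite below):
boxes3d_01 = tf.add(tf.multiply(deltas[:, 0:2], tf.expand_dims(anchors_d, -1)),
                    anchors_reshaped[:, 0:2])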
I rewrote your NumPy example to Tensorflow 2 so that both functions return the same output.
def delta_boxes3d(deltas, anchors):
    deltas = tf.constant(deltas)
    anchors = tf.constant(anchors)
    anchors_reshaped = tf.reshape(anchors, shape=[96000, 7])
    anchors_d = tf.sqrt(tf.add(tf.pow(anchors_reshaped[:, 4], 2), tf.pow(anchors_reshaped[:, 5], 2)))  # 96000
    deltas = tf.reshape(deltas, [96000, 7])
    boxes3d_01 = tf.add(tf.multiply(deltas[:, 0:2], tf.expand_dims(anchors_d, -1)), anchors_reshaped[:, 0:2])
    boxes3d_2 = deltas[..., 2:3] * 1.73 + anchors_reshaped[..., 2:3]
    boxes3d_345 = tf.exp(deltas[..., 3:6]) * anchors_reshaped[..., 3:6]
    boxes3d_6 = deltas[..., 6:7] + anchors_reshaped[..., 6:7]
    boxes3d = tf.concat([boxes3d_01, boxes3d_2, boxes3d_345, boxes3d_6], axis=-1)
    return boxes3d
deltas = np.random.rand(200, 240, 14)
anchors = np.random.rand(200, 240, 2, 7)
print(delta_to_boxes3d(deltas, anchors))
print(delta_boxes3d(deltas, anchors))
You can notice I created smaller arrays first, and then I concatenated them. This is because Tensorflow won't allow me to modify EagerTensors.
Notice the difference between deltas[..., 2] and deltas[..., 2:3]. The second one doesn't reduce the last dimension. They return shapes (96000,), and (96000,1) respectively.
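A quick illustration of that difference (a small sketch):
x = tf.zeros([96000, 7])
print(x[..., 2].shape)    # (96000,)
print(x[..., 2:3].shape)  # (96000, 1)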
My data looks like this:
sample1 = [[1, 0, 3, 5, 0, 9], 0, 1.5, 0]
sample2 = [[0, 4, 0, 6, 2, 0], 2, 1.9, 1]
sample3 = [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]
paul = pd.DataFrame(data=[sample1, sample2, sample3], columns=['list', 'cat', 'metr', 'target'])
On this data, a scikit-learn kNN regression with a specific distance function should be performed.
The distance function is:
def my_distance(X, Y, **kwargs):
    if len(X) > 1:
        x = X
        y = Y
        all_minima = []
        for k in range(len(x)):
            one_minimum = min(x[k], y[k])
            all_minima.append(one_minimum)
        sum_all_minima = sum(all_minima)
        distance = (sum(x) + sum(y) - sum_all_minima) * kwargs["Para_list"]
    elif X.dtype == 'int64':
        x = X
        y = Y
        if x == y and x != -1:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = kwargs["Para_nominal"] * 1
    else:
        x = X
        y = Y
        if x == y:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = abs(x - y) * kwargs["Para_metrisch"]
    return distance
It should be registered as a valid distance function via
DistanceMetric.get_metric('pyfunc',func=my_distance)
If I'm right, the scikit-learn code should look like this:
train, test = train_test_split(paul, test_size=0.3)
# x_train should contain only the independent variables; the others are dropped:
x_train = train.drop('target', axis=1)
y_train = train['target']
x_test = test.drop('target', axis=1)
y_test = test['target']
knn = KNeighborsRegressor(n_neighbors=2,
                          algorithm='ball_tree',
                          metric=my_distance,
                          metric_params={"Para_list": 2,
                                         "Para_minus1": 3,
                                         "Para_metrisch": 2,
                                         "Para_nominal": 4})
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
I get
ValueError: setting an array element with a sequence.
I guess scikit-learn cannot handle a single feature item that is a list? Is there a way to make that work?
I guess scikit-learn cannot handle a single feature item that is a list? Is there a way to make that work?
No, there is no way I know of to make this happen. You need to expand this feature into a 2D matrix and concatenate it with the other 1D features to form the data appropriately. This is standard sklearn behavior.
Unless you have some very narrow use case, making a 2D array from the list feature is totally fine. I assume all the lists have the same length.
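A minimal sketch of what that looks like (the list_0 ... list_5 column names are just an illustration, not a fixed convention):
import pandas as pd

# expand the 'list' feature into its own columns and keep the other features
list_cols = pd.DataFrame(paul['list'].tolist(),
                         columns=[f'list_{i}' for i in range(len(paul['list'][0]))])
paul_flat = pd.concat([list_cols, paul[['cat', 'metr', 'target']]], axis=1)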
I have R, a batch of 2D rotation matrices of shape (N, 2, 2). Now I wish to extend each matrix to a (3, 3) 3D rotation matrix, i.e. keep each original matrix in [:, :2, :2], fill the new row and column with zeros, and put 1 at [:, 2, 2].
How to do this in tensorflow?
UPDATE
I tried it this way:
R = tf.get_variable(name='R', shape=np.shape(R_value), dtype=tf.float64,
                    initializer=tf.constant_initializer(R_value))
eye = tf.eye(np.shape(R_value)[1]+1)
right_column = eye[:2,2]
bottom_row = eye[2,:]
R = tf.concat([R, right_column], 3)
R = tf.concat([R, bottom_row], 2)
but failed, because concat doesn't do broadcasting...
UPDATE 2
I made the broadcasting explicit and also fixed the wrong axis indices in the concat calls:
R = tf.get_variable(name='R', shape=np.shape(R_value), dtype=tf.float64,
                    initializer=tf.constant_initializer(R_value))
eye = tf.eye(np.shape(R_value)[1]+1, dtype=tf.float64)
right_column = eye[:2,2]
right_column = tf.expand_dims(right_column, 0)
right_column = tf.expand_dims(right_column, 2)
right_column = tf.tile(right_column, (np.shape(R_value)[0], 1, 1))
bottom_row = eye[2,:]
bottom_row = tf.expand_dims(bottom_row, 0)
bottom_row = tf.expand_dims(bottom_row, 0)
bottom_row = tf.tile(bottom_row, (np.shape(R_value)[0], 1, 1))
R = tf.concat([R, right_column], 2)
R = tf.concat([R, bottom_row], 1)
The solution looks rather complex. Is there a simpler one?
First pad the [N, 2, 2] tensor with zeros to get [N, 3, 3] with padded = tf.pad(R, [[0, 0], [0, 1], [0, 1]]).
Then set padded[:, 2, 2] to 1:
since tf.Tensor does not support item assignment, you can do this by initializing a np.array and adding the two together.
arr = np.zeros((3, 3))
arr[2, 2] = 1
R = padded + arr # broadcast used here
Now the variable R is what you need.
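Putting the pieces together, a consolidated sketch (assuming R still holds the original (N, 2, 2) matrices as a tensor):
padded = tf.pad(R, [[0, 0], [0, 1], [0, 1]])  # (N, 3, 3); the new row and column are zeros
arr = np.zeros((3, 3))
arr[2, 2] = 1
R3 = padded + arr  # broadcasts over the batch dimension, so R3[:, 2, 2] == 1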