numpy: working on a subset of matrix


I have a 4x4 identity matrix in numpy and I want to scale the first 3 dimensions by a factor. Currently, I am doing it as follows:
import numpy as np
# Some scaling factors passed as a parameter by the user
scale = (2, 3, 4)
scale += (1,)  # extend the tuple
my_mat = scale * np.eye(4)
Out of curiosity, I was wondering if there is some way to do this without extending the tuple.

This is quickly done with numpy broadcasting rules and indexing:
A = np.eye(4)
scale = [2, 3, 4]
A[:3, :3] *= scale
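A small check of why the in-place broadcast works, as a sketch that assumes the matrix starts out as an identity (the name `expected` is introduced only for illustration). `A[:3, :3] *= scale` multiplies column j of the top-left block by `scale[j]`, and on an identity matrix only the diagonal entries are nonzero, so only they change.

```python
import numpy as np

scale = [2, 3, 4]

# In-place scaling of the top-left 3x3 block. The list broadcasts across rows,
# so column j of the block is multiplied by scale[j].
A = np.eye(4)
A[:3, :3] *= scale

# Reference result: a diagonal matrix with the scale factors followed by 1.
expected = np.diag([2.0, 3.0, 4.0, 1.0])

print(np.array_equal(A, expected))
```

For a matrix that is not diagonal, the same statement scales whole columns of the block; scaling rows instead would need `np.array(scale)[:, None]` on the right-hand side.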

Related

Python - matrix multiplication

I have an array y with shape (n,). I want to compute the inner product matrix, which is an n x n matrix.
However, when I tried to do it in Python with
np.dot(y, y)
I got the answer n, which is not what I am looking for.
I have also tried:
np.dot(np.transpose(y), y)
np.dot(y, np.transpose(y))
I always get the same answer n.

I think you are looking for:
np.multiply.outer(y, y)
or equally:
y = y[None, :]
y.T @ y
example:
y = np.array([1, 2, 3])[None, :]
y.T @ y
output:
#[[1 2 3]
# [2 4 6]
# [3 6 9]]

You can try to reshape y from shape (70,) to (70,1) before multiplying the 2 matrices.
# Reshape
y = y.reshape(70,1)
# Either line below would work
y*y.T
np.matmul(y,y.T)

One-liner?
np.dot(a[:, None], a[None, :])
transpose doesn't work on 1-D arrays, because you need at least two axes to 'swap' them. This solution adds a new axis to the array: in the first argument it becomes a column vector with two axes, and in the second argument it still looks like a row vector but now has two axes.
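The approaches from these answers can be compared side by side. This is a minimal sketch on a three-element vector; the variable names `outer_a` to `outer_d` are made up for the comparison.

```python
import numpy as np

y = np.array([1, 2, 3])

outer_a = np.outer(y, y)            # dedicated outer-product function
outer_b = np.multiply.outer(y, y)   # outer method of the multiply ufunc
outer_c = y[:, None] @ y[None, :]   # (3, 1) @ (1, 3) matrix product
outer_d = y[:, None] * y[None, :]   # plain broadcasting, no matrix product

# np.dot on two 1-D arrays is the inner product, a single number.
inner = np.dot(y, y)

print(outer_a)
print(inner)
```

All four `outer_*` arrays hold the same 3x3 matrix, while `inner` is the scalar sum of squares, which is why `np.dot(y, y)` never produced a matrix for a 1-D `y`.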

Looks like what you need is the @ matrix multiplication operator. The dot method only computes the dot product between vectors; what you want is matrix multiplication.
>>> a = np.random.rand(70, 1)
>>> (a @ a.T).shape
(70, 70)
UPDATE:
The answer above is incorrect. dot does the same thing if the arrays are 2D. See the docs here.
np.dot computes the dot product of two arrays. Specifically,
If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
The simplest way to do what you want is to convert the vector to a matrix first using np.matrix and then use @. Although dot can also be used, @ is better because conventionally dot is used for vectors and @ for matrices.
>>> a = np.random.rand(70)
>>> a.shape
(70,)
>>> a = np.matrix(a).T
>>> a.shape
(70, 1)
>>> (a @ a.T).shape
(70, 70)

Numpy matrix 4x4 @ Nx4

I'm trying to apply a world transformation to a numpy matrix. However, I can't seem to find a numpy way to multiply a 4x4 matrix by an Nx4 array of vectors, where N is the number of vertices.
I have tried both Nx4x4 @ Nx4 and 4x4 @ Nx4 multiplications. Sure, I could do this element-wise, but I'm hoping there's a smarter way to do this.
vertices = np.ones([VERTEX_COUNT, 4])
vertices[:, 0:3] = vertex_map[element.path_vertices]
matrix = np.full([VERTEX_COUNT, 4, 4], np.reshape(element.matrix, [4, 4]))
transformed = matrix @ vertices  # dimension mismatch
# i would rather not do this
# matrix = np.reshape(element.matrix, [4, 4])
# transformed = np.array([matrix @ vertex for vertex in vertices])
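One common way to write this product, offered as a sketch and not as the asker's eventual solution: store each vertex as a row and multiply by the transposed matrix, since applying `M @ v` to every row `v` gives the same result as `vertices @ M.T`. The names `M`, `N`, and the translation values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5  # hypothetical vertex count

# Homogeneous vertices: xyz in the first three columns, w = 1 in the last.
vertices = np.ones((N, 4))
vertices[:, :3] = rng.random((N, 3))

# Hypothetical world transform: translate by (1, 2, 3).
M = np.eye(4)
M[:3, 3] = [1.0, 2.0, 3.0]

# Single 4x4 matrix applied to all vertices at once, shape (N, 4).
transformed = vertices @ M.T

# The per-vertex loop from the question, kept only as a reference.
looped = np.array([M @ v for v in vertices])

# If each vertex really has its own matrix, stack them as (N, 4, 4) and
# give the vertices a trailing axis so matmul pairs them up one by one.
stacked = np.broadcast_to(M, (N, 4, 4))
batched = (stacked @ vertices[:, :, None])[:, :, 0]

print(np.allclose(transformed, looped), np.allclose(batched, looped))
```

The dimension mismatch in the question comes from `(N, 4, 4) @ (N, 4)`: matmul treats the second operand as a single N x 4 matrix, so the inner dimensions 4 and N do not line up unless the vertices get the extra trailing axis.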

Alternative to for loops for boolean / nonzero indexing of numpy array

I need to select only the non-zero 3d portions of a 3d binary array (or alternatively the true values of a boolean array). Currently I am able to do so with a series of 'for' loops that use np.any. This works but seems awkward and slow, so I am investigating a more direct way to accomplish the task.
I am rather new to numpy, so the approaches that I have tried include a) using np.nonzero, which returns indices that I am at a loss to know what to do with for my purposes, b) boolean array indexing, and c) boolean masks. I can generally understand each of those approaches for simple 2d arrays, but am struggling to understand the differences between the approaches, and cannot get them to return the right values for a 3d array.
Here is my current function that returns a 3D array with nonzero values:
def real_size(arr3):
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')
    for zero_ in range(0, arr3.shape[0]):
        if arr3[zero_].any():
            true_0.append(zero_)
    for one_ in range(0, arr3.shape[1]):
        if arr3[:, one_, :].any():
            true_1.append(one_)
    for two_ in range(0, arr3.shape[2]):
        if arr3[:, :, two_].any():
            true_2.append(two_)
    arr4 = arr3[min(true_0):max(true_0) + 1, min(true_1):max(true_1) + 1, min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4
# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1
# The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4)
>> The nonzero area is: (2, 2, 2)
# So, the array is correct, but likely not the best way to get there:
non_zero
>> array([[[1, 1],
           [1, 1]],
          [[1, 1],
           [1, 1]]])
The code works appropriately, but I am using this on much larger and more complex arrays, and don't think this is an appropriate approach. Any thoughts on a more direct method to make this work would be greatly appreciated. I am also concerned about errors and the results if the input array has for example two separate non-zero 3d areas within the original array.
To clarify the problem, I need to return one or more 3D portions as one or more 3d arrays beginning with an original larger array. The returned arrays should not include extraneous zeros (or false values) in any given exterior plane in three dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.

Assuming you want to eliminate all rows, columns, etc. that contain only zeros, you could do the following:
nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
An alternative solution using np.nonzero:
i = [np.unique(_) for _ in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):
def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        zeros = nz.any(axis=tuple(np.delete(axes, axis)))
        result = result[(slice(None),)*axis + (zeros,)]
    return result
non_zero = real_size(test_array)
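The per-axis masks can also be combined in one indexing step with `np.ix_`, which accepts 1-D boolean masks. This is a sketch of that variant; the function name `crop_nonzero` is made up here and is not part of the answer above.

```python
import numpy as np

def crop_nonzero(arr):
    """Drop every slice, along every axis, that contains only zeros."""
    nz = arr != 0
    # For axis i, reduce over all the other axes to get a 1-D keep-mask.
    masks = [
        nz.any(axis=tuple(j for j in range(arr.ndim) if j != i))
        for i in range(arr.ndim)
    ]
    # np.ix_ turns the 1-D masks into an open mesh that selects the
    # cross product of the kept positions.
    return arr[np.ix_(*masks)]

test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1

cropped = crop_nonzero(test_array)
print(cropped.shape)  # (2, 2, 2)
```

Like the answer's version, this removes all-zero planes wherever they occur, including planes that lie between two separate nonzero regions, so it returns a single compacted array and not one array per region.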

Scaling set of rows in a tensor by constant factor

TL;DR How to scale part of tensor by 2 (row-indices present in a tf list)
Details:
indices_of_scaling_ids: Stores list of row_ids
Tensor("Squeeze:0", dtype=int64, device=/device:GPU:0)
[1, 4, 5, 6, 12]
emb_inputs = tf.nn.embedding_lookup(embedding, self.all_rows)
#tensor with shape (batch_size=4, all_row_len, emb_size=128)
So, for every self.all_rows, the emb_inputs is evaluated.
Question / Challenge faced: I need to scale the emb_inputs by 2.0 for every row_ids mentioned in indices_of_scaling_ids.
I have tried various splicing things, but can't seem to get to a nice solution. Can someone suggest? Thanks
N.B. Beginner at Tensorflow

Try with something like this:
SCALE = 2
emb_inputs = ...
indices_of_scaling_ids = ...
emb_shape = tf.shape(emb_inputs)
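# Select indices in boolean array
r = tf.range(emb_shape[1])
# Cast so both operands of tf.equal share a dtype (the ids in the question are int64)
ids = tf.cast(indices_of_scaling_ids, r.dtype)
mask = tf.reduce_any(tf.equal(r[:, tf.newaxis], ids), axis=1)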
# Tile the mask
mask = tf.tile(mask[tf.newaxis, :, tf.newaxis], (emb_shape[0], 1, emb_shape[2]))
# Choose scaled or not depending on indices
result = tf.where(mask, SCALE * emb_inputs, emb_inputs)
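The mask logic may be easier to follow in NumPy first. The sketch below is an analogue and not TensorFlow code; the array sizes and the all-ones input are assumptions chosen so the effect is easy to see.

```python
import numpy as np

SCALE = 2.0

# Stand-in for emb_inputs with shape (batch, rows, emb_size); sizes assumed.
emb_inputs = np.ones((4, 16, 8))
indices_of_scaling_ids = np.array([1, 4, 5, 6, 12])

# mask[r] is True when row r appears in the id list.
r = np.arange(emb_inputs.shape[1])
mask = (r[:, None] == indices_of_scaling_ids).any(axis=1)

# Broadcasting the mask over batch and embedding axes plays the role
# of the explicit tile step.
result = np.where(mask[None, :, None], SCALE * emb_inputs, emb_inputs)

print(result[0, :, 0])
```

Rows 1, 4, 5, 6 and 12 come out doubled in every batch element, and all other rows are unchanged.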

Numpy - Speed up iteration comparison?

The following use case:
I have a Numpy matrix/array with a few thousand 2d points. Call it A.
Eg:
[1 2]
[300 400]
..
[123 242]
I also have another Numpy matrix with a few 2d points as above. Call it B.
Basically, I want to iterate through A, then iterate through B and compute the distance between A[i] and B[j]. Then assign that back to another array. I could do it like this:
for i, (x0, x1) in enumerate(zip(A[:, 0], A[:, 1])):
    weight_distance = 0
    # Inner loop runs over B, as described above
    for j, (p0, p1) in enumerate(zip(B[:, 0], B[:, 1])):
        weight_distance = weight_distance + distance((p0, p1), (x0, x1))
    weight_array[i] = weight_distance
But this is too slow. What might be a Numpy way to approach this?

What you're probably looking for is the code in scipy.spatial.distance, particularly the cdist function. This can efficiently compute the pairwise distances between arrays of points for a wide variety of metrics.
import numpy as np
from scipy.spatial.distance import cdist
A = np.random.random((1000, 2))
B = np.random.random((100, 2))
D = cdist(A, B, metric='euclidean')
print(D.shape) # (1000, 100)
weights = D.sum(1)
print(weights.shape) # (1000,)
Here euclidean is the standard root-sum-square distance that you're probably used to, and D[i, j] holds the distance between A[i] and B[j], and so summing along axis 1 gives the desired weights.
There are ways to do this via broadcasting directly in numpy, but that approach would use several large temporary arrays, and will in general be slower than the scipy cdist approach.
Edit:
I thought I may as well add a note on the NumPy-only approach. It looks like this:
D2 = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
weights2 = D2.sum(1)
np.allclose(weights, weights2) # True
Let's break it down:
A[:, None, :] adds a new dimension to A, so its shape is now [1000, 1, 2]. Similarly, B[None, :, :] becomes [1, 100, 2].
A[:, None, :] - B[None, :, :] is a broadcasting operation which results in an array of differences, with shape [1000, 100, 2].
We square every element of this result.
The sum(-1) method on this result sums across the last dimension, resulting in an array of shape [1000, 100].
We take the square root of the result, which gives the distance matrix.
We sum along axis 1 to get the weights.
Notice that this broadcasting approach creates not one, but two temporary arrays of size 1000 * 100 * 2 along the way, which is why it is less efficient than a purpose-built compiled function like cdist.
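If SciPy is not available, the large temporaries can be avoided in plain NumPy by expanding the squared distance as |a|^2 + |b|^2 - 2 a·b. This is a sketch of that well-known identity and not part of the original answer; the names `D3` and `weights3` are made up, and the expansion can lose some precision for points that are very close together.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((1000, 2))
B = rng.random((100, 2))

# Squared norms of each point, shaped so they broadcast to (1000, 100).
a2 = (A ** 2).sum(axis=1)[:, None]
b2 = (B ** 2).sum(axis=1)[None, :]

# Squared distances without building a (1000, 100, 2) array.
sq = a2 + b2 - 2.0 * (A @ B.T)
np.maximum(sq, 0.0, out=sq)  # rounding can leave tiny negative values

D3 = np.sqrt(sq)
weights3 = D3.sum(axis=1)

# Reference: the broadcasting version from the text above.
D2 = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
print(np.allclose(D3, D2))
```

The largest intermediate here has shape (1000, 100), so memory use scales with the number of point pairs and not with pairs times coordinates.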
