How do I use Numpy matrix operations to calculate over multiple vector samples at once?
Please see below the code I came up with; 'd' is the outcome I'm trying to get. But this is only one sample. How do I calculate the output for every sample without repeating the code for each one or looping through them?
a = np.array([[1, 2, 3]])
b = np.array([[1, 2, 3]])
c = np.array([[1, 2, 3]])
d = ((a.T * b).flatten() * c.T)
a1 = np.array([[2, 3, 4]])
b1 = np.array([[2, 3, 4]])
c1 = np.array([[2, 3, 4]])
d1 = ((a1.T * b1).flatten() * c1.T)
a2 = np.array([[3, 4, 5]])
b2 = np.array([[3, 4, 5]])
c2 = np.array([[3, 4, 5]])
d2 = ((a2.T * b2).flatten() * c2.T)
The way broadcasting works is to repeat your data along an axis of size one as many times as necessary to make your element-wise operation work. That is what is happening to axis 1 of a.T and axis 0 of b, and similarly for the product with the result. My recommendation would be to concatenate all your inputs along another dimension, to allow broadcasting to happen along the existing two.
Before showing how to do that, let me just mention that you would be much better off using ravel instead of flatten in your example. flatten always makes a copy of the data, while ravel returns a view whenever it can (as it can here). Since a.T * b is a temporary array anyway, there is really no reason to make the copy.
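If you want to check this yourself, np.shares_memory makes the copy/view distinction easy to see; a quick sketch:
import numpy as np

x = np.arange(6).reshape(2, 3)
print(np.shares_memory(x, x.ravel()))    # True: ravel returned a view here
print(np.shares_memory(x, x.flatten()))  # False: flatten always copies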
The easiest way to combine some arrays along a new dimension is np.stack. I would recommend combining along the first dimension, for a couple of reasons. It's the default for stack, and your result can be indexed more easily: d[0] will be your original d, d[1] will be d1, etc. Also, if you ever add matrix multiplication to your pipeline, np.matmul (the @ operator) will work out of the box, since it operates on the last two dimensions.
a = np.stack((a0, a1, a2, ..., aN))
b = np.stack((b0, b1, b2, ..., bN))
c = np.stack((c0, c1, c2, ..., cN))
Now a, b and c are all 3D arrays whose first dimension is the measurement index; the second and third correspond to the two dimensions of the original arrays.
With this structure, what you called transpose before is just swapping the last two dimensions (since one of them is 1), and raveling/flattening is just multiplying out the last two dimensions, e.g. with reshape:
d = (a.reshape(N, -1, 1) * b).reshape(N, 1, -1) * c.reshape(N, -1, 1)
If you set one of the dimensions to -1 in a reshape, it absorbs the remaining size. Here the -1 resolves to 3 in the reshapes of a and c, and to 9 in the reshape of the intermediate product.
You have to be a little careful when you convert the ravel operation to 3D. In 2D, x.ravel() * c.T implicitly treats x as a 1xN array before broadcasting. In 3D, x.reshape(3, -1) creates a 2D 3x9 array, which you multiply by c.reshape(3, -1, 1), which is 3x3x1. Broadcasting rules mean you would effectively be multiplying a 1x3x9 array by a 3x3x1 one, but what you really want is a 3x1x9 array times the 3x3x1, so you need to specify all three axes of the 3D "ravel" explicitly.
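To make this concrete, here's a minimal sketch using the three samples from the question (so N = 3):
import numpy as np

a = np.stack((np.array([[1, 2, 3]]), np.array([[2, 3, 4]]), np.array([[3, 4, 5]])))
b = a.copy()
c = a.copy()
N = a.shape[0]
d = (a.reshape(N, -1, 1) * b).reshape(N, 1, -1) * c.reshape(N, -1, 1)
print(d.shape)  # (3, 3, 9); d[0] matches the original 2D result for the first sample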
Here is an IDEOne link with your sample data for you to play with: https://ideone.com/p8vTlx
I am trying to get the dot product of two arrays in Python using the numpy package. I get as output an array of shape (n,). It says that my array has no columns, yet I do see the results when I print it. Why does my array have no columns and how do I fix this?
My goal is to calculate y - np.dot(x,b). The issue is that y is (124, 1) while np.dot(x,b) is (124,)
Thanks
It seems that you are trying to subtract two arrays with different shapes. Fortunately, they are off by only a single extra axis, so there are two ways of handling it.
(1) You slice the y array to match the shape of the dot(x,b) array:
y = y[:,0]
print(y-np.dot(x,b))
(2) You add an additional axis on the np.dot(x,b) array:
dot = np.dot(x,b)
dot = dot[:,None]
print(y-dot)
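As a self-contained sketch with stand-in data of the question's shapes (the values here are made up, only the shapes matter):
import numpy as np

y = np.ones((124, 1))  # hypothetical stand-ins with the question's shapes
x = np.ones((124, 3))
b = np.ones(3)

print(np.dot(x, b).shape)                  # (124,)
print((y[:, 0] - np.dot(x, b)).shape)      # (124,)  option (1)
print((y - np.dot(x, b)[:, None]).shape)   # (124, 1) option (2)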
Hope this helps
It may depend on the dimensions of your arrays.
For example:
a = [1, 0]
b = [[4, 1], [2, 2]]
c = np.dot(a,b)
gives
array([4, 1])
and its shape is (2,)
but if you change a to:
a = [[1, 0],[1,1]]
then the result is:
array([[4, 1],
       [6, 3]])
and its shape is (2,2)
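Putting the two cases side by side, a small sketch of the shape rule:
import numpy as np

b = np.array([[4, 1], [2, 2]])
print(np.dot(np.array([1, 0]), b).shape)            # (2,)   1D dot 2D -> 1D
print(np.dot(np.array([[1, 0], [1, 1]]), b).shape)  # (2, 2) 2D dot 2D -> 2D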
Let's suppose I have two arrays that represent pixels in pictures.
I want to build an array of tensordot products of pixels of a smaller picture with a bigger picture as it "scans" the latter. By "scanning" I mean iteration over rows and columns while creating overlays with the original picture.
For instance, a 2x2 picture can be overlaid on top of 3x3 in four different ways, so I want to produce a four-element array that contains tensordot products of matching pixels.
Tensordot is calculated by multiplying a[i,j] with b[i,j] element-wise and summing the terms.
Please examine this code:
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])
b = np.array([[0, 1],
              [2, 3]])

shape_diff = (a.shape[0] - b.shape[0] + 1,
              a.shape[1] - b.shape[1] + 1)

def compute_pixel(x, y):
    sub_matrix = a[x : x + b.shape[0],
                   y : y + b.shape[1]]
    return np.tensordot(sub_matrix, b, axes=2)

def process():
    arr = np.zeros(shape_diff)
    for i in range(shape_diff[0]):
        for j in range(shape_diff[1]):
            arr[i, j] = compute_pixel(i, j)
    return arr

print(process())
Computing a single pixel is very easy: all I need are the starting coordinates within a. From there, I match the size of b and take the tensordot product.
However, because I need to do this all over again for each x and y location as I'm iterating over rows and columns I've had to use a loop, which is of course suboptimal.
In the next piece of code I have tried to utilize a handy feature of tensordot, which also accepts tensors as arguments. In other words, I can feed it an array of sub-arrays for different offsets within a, while keeping b the same.
Although in order to create an array of said combination, I couldn't think of anything better than using another loop, which kind of sounds silly in this case.
def try_vector():
    tensor = np.zeros(shape_diff + b.shape)
    for i in range(shape_diff[0]):
        for j in range(shape_diff[1]):
            tensor[i, j] = a[i : i + b.shape[0],
                             j : j + b.shape[1]]
    return np.tensordot(tensor, b, axes=2)

print(try_vector())
Note: the tensor's shape is the concatenation of the two tuples, which in this case gives (2, 2, 2, 2)
Yet even if I produced such an array, it would be prohibitively large to be of any practical use. Doing this for a 1000x1000 picture would probably consume all the available memory.
So, are there any other ways to avoid loops in this problem?
In [111]: process()
Out[111]:
array([[19., 25.],
       [37., 43.]])
tensordot with axes=2 is the same as an element-wise multiply followed by a sum:
In [116]: np.tensordot(a[0:2,0:2],b, axes=2)
Out[116]: array(19)
In [126]: (a[0:2,0:2]*b).sum()
Out[126]: 19
A lower-memory way of generating your tensor is:
In [121]: np.lib.stride_tricks.sliding_window_view(a, (2, 2))
Out[121]:
array([[[[0, 1],
         [3, 4]],

        [[1, 2],
         [4, 5]]],


       [[[3, 4],
         [6, 7]],

        [[4, 5],
         [7, 8]]]])
We can do a broadcasted multiply, and sum on the last 2 axes:
In [129]: (Out[121]*b).sum((2,3))
Out[129]:
array([[19, 25],
       [37, 43]])
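Putting it together as one self-contained snippet (sliding_window_view needs NumPy 1.20 or newer, and it returns a view, so no huge intermediate copy is made):
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])
b = np.array([[0, 1],
              [2, 3]])

windows = np.lib.stride_tricks.sliding_window_view(a, b.shape)
print((windows * b).sum(axis=(2, 3)))
# [[19 25]
#  [37 43]]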
I sometimes work with 1D arrays:
A = np.array([1, 2, 3, 4])
or 2D arrays (mono or stereo signals read with scipy.io.wavfile):
A = np.array([[1, 2], [3, 4], [5, 6], [7,8]])
Without having to distinguish these two cases with an if A.ndim == 2:..., is there a simple one-line solution to multiply this array A by a 1D array B = np.linspace(0., 1., 4)?
If A is 1D, then it's just A * B; if A is 2D, what I mean is: multiply each row of A by the corresponding element of B.
Note: this question arises naturally when working with both mono and stereo sounds read with scipy.io.wavfile.
Approach #1
We can use einsum to cover generic ndarrays -
np.einsum('i...,i->i...',A,B)
The trick here is the ellipsis, which broadcasts over the trailing dimensions after the first axis, keeps them as they are in A, and carries them into the output, while keeping the first axes of the two inputs aligned, which is the intended multiplication. With A as a 1D array there's no broadcasting, and it essentially reduces to np.einsum('i,i->i', A, B) under the hood.
Schematically put:
A : i x ....
B : i
out : i x ....
Hence, this covers A with any number of dimensions.
More info on the use of ellipsis from the docs:
To enable and control broadcasting, use an ellipsis. Default
NumPy-style broadcasting is done by adding an ellipsis to the left of
each term, like np.einsum('...ii->...i', a). To take the trace along
the first and last axes, you can do np.einsum('i...i', a), or to do a
matrix-matrix product with the left-most indices instead of rightmost,
you can do np.einsum('ij...,jk...->ik...', a, b).
Approach #2
Using the fact that we are trying to align the first axis of A with the only axis of 1D array B, we can simply transpose A, multiply with B and finally transpose back -
(A.T*B).T
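As a quick sanity check, a small sketch verifying that both approaches agree on the mono and stereo cases from the question:
import numpy as np

B = np.linspace(0., 1., 4)
A1 = np.array([1, 2, 3, 4])                      # 1D (mono)
A2 = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # 2D (stereo)

for A in (A1, A2):
    # row i (or element i) of A gets scaled by B[i] either way
    assert np.allclose(np.einsum('i...,i->i...', A, B), (A.T * B).T)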
I'm currently trying to append multiple NumPy arrays together. Basically, what I want is to start from a (1 x m) matrix (technically a vector) and end up with an (n x m) matrix, i.e. to go from n (1 x m) matrices (vectors) to one (n x m) matrix (if that makes any sense). The ultimate goal is to write the matrix to a csv file with the numpy.savetxt() function, so I'll end up with a csv file of n columns of length m.
The problem is that numpy.append() joins the vectors into a (1 x 2m) vector. So let's say a1 and a2 are NumPy arrays with 10000 elements each. I append a2 to a1 with the append function, simultaneously creating a new array called a that contains both:
a=np.append(a1, a2, axis=0)
a.shape
>>(20000,)
What I want instead is for the shape to be of the form
>>(2, 10000)
or more generally
>>(n, m)
What should I do? Please note, that I want to continue adding the vectors into the array. Thanks for your time!
You can transpose the result of numpy.column_stack.
For example:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([9, 8, 7, 6, 5])
c = np.column_stack((a, b)).T
print(c)
>>> [[1 2 3 4 5]
     [9 8 7 6 5]]
print(a.shape, b.shape, c.shape)
>>> (5,) (5,) (2, 5)
EDIT:
you can keep adding rows like so:
d = np.array([2, 2, 2, 2, 2])
c = np.column_stack((c.T, d)).T
print(c)
>>> [[1 2 3 4 5]
     [9 8 7 6 5]
     [2 2 2 2 2]]
print(c.shape)
>>> (3, 5)
This should work
a=np.append(a1, a2, axis=0).reshape(2,10000)
a.shape
>>(2,10000)
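A small variation on this, in case m isn't fixed: -1 lets NumPy infer the second dimension (a sketch with zero-filled stand-ins):
import numpy as np

a1 = np.zeros(10000)
a2 = np.zeros(10000)
a = np.append(a1, a2, axis=0).reshape(2, -1)  # -1 infers the 10000
print(a.shape)  # (2, 10000)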
In order to merge arrays vertically I would use np.vstack
import numpy as np
np.vstack((a1,a2))
However, from my point of view, a numpy.array shouldn't be built by appending new arrays to an old one in a for loop. Instead, either create the whole (n x m) numpy.array first and write the data into it from the loop:
data = np.zeros((n, m))
for i in range(n):
    data[i] = ...
or first build your data as an ordinary Python list using append, and transform it into a numpy.array at the end:
data = []
for i in range(n):
    data.append(...)
data = np.asarray(data)
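For instance, the preallocation pattern with a hypothetical per-row computation filled in:
import numpy as np

n, m = 3, 5
data = np.zeros((n, m))
for i in range(n):
    data[i] = np.arange(m) + i  # hypothetical per-row values
print(data.shape)  # (3, 5)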
This is my goal, using Python Numpy:
I would like to create a (1000,1000) dimensional array/matrix of dot product values. That means each array/matrix entry is the dot product of vectors 1 through 1000. Constructing this is theoretically simple: one defines a (1,1000) dimensional matrix of vectors v1, v2, ..., v1000
import numpy as np
vectorvalue = np.matrix([v1, v2, v3, ..., v1000])
and takes the dot product with the transpose, i.e.
matrix_of_dotproducts = np.tensordot(vectorvalue.T, vectorvalue)
And the shape of the array/matrix will be (1000, 1000). The (1,1) entry will be the dot product of vectors (v1,v1), the (1,2) entry will be the dot product of vectors (v1,v2), etc. In order to calculate the dot product with numpy for a three-dimensional vector, it's wise to use numpy.tensordot() instead of numpy.dot()
Here's my problem: I'm not beginning with an array of vector values. I'm beginning with three 1000-element arrays of coordinate values, i.e. an array of x-coordinates, one of y-coordinates, and one of z-coordinates.
xvalues = np.array([x1, x2, x3, ..., x1000])
yvalues = np.array([y1, y2, y3, ..., y1000])
zvalues = np.array([z1, z2, z3, ..., z1000])
Is the easiest thing to do to construct a (3, 1000) numpy array/matrix and then take the tensor dot product for each pair?
v1 = np.array([x1,y1,z1])
v2 = np.array([x2,y2,z2])
...
I'm sure there's a more tractable and efficient way to do this...
PS: To be clear, I would like to take a 3D dot product. That is, for vectors
A = (a1, a2, a3)
and B = (b1, b2, b3),
the dot product should be
dotproduct(A,B) = a1b1 + a2b2 + a3b3.
IIUC, you can build the intermediate array as you suggested:
>>> arr = np.vstack([xvalues, yvalues, zvalues]).T
>>> out = arr.dot(arr.T)
Which seems to be what you want:
>>> out.shape
(1000, 1000)
>>> out[3,4]
1.193097281209083
>>> arr[3].dot(arr[4])
1.193097281209083
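For completeness, here's a sketch of the whole pipeline with random stand-in coordinates (the data here is hypothetical):
import numpy as np

rng = np.random.default_rng(0)
xvalues, yvalues, zvalues = rng.random((3, 1000))  # hypothetical coordinate data

arr = np.vstack([xvalues, yvalues, zvalues]).T  # (1000, 3): one row per vector
out = arr @ arr.T                               # (1000, 1000) matrix of dot products
print(out.shape)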
So, you're not far off with your initial thought. There's very little overhead involved in concatenating the arrays, but if you're interested in doing it within numpy, there's a built-in set of functions, vstack, hstack, and dstack, that should perform exactly as you wish (vertically, horizontally, and depth-wise, respectively).
I'll leave it up to you to determine which to use where, but here's an example shamelessly stolen from the docs to help get you started:
>>> a = np.array([1, 2, 3])
>>> b = np.array([2, 3, 4])
>>> np.vstack((a,b))
array([[1, 2, 3],
       [2, 3, 4]])
For reference: vstack docs, hstack docs, and dstack docs
If it feels a little over-the-top to have three separate functions here then you're right! That's why numpy also has the concatenate function. It's just a generalization of vstack, hstack, and dstack that takes an axis argument.
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
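And, also from the docs, joining along the other axis:
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])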
Concatenate docs