Numpy subtraction of every pair of elements of vector


Given a numpy array, say
x = np.array(
[[0],
[1],
[2]]
)
I would like to find the matrix containing a-b for every possible pair a,b in x. I.e.
[[0-0, 0-1, 0-2]
[1-0, 1-1, 1-2]
[2-0, 2-1, 2-2]]
==
[[ 0, -1, -2]
[+1, 0, -1]
[+2, +1, 0]]
I am avoiding using a for loop for the sake of efficiency.

As Michael wrote, numpy broadcasting can help you with this. If you perform an operation between a vector a with shape (3,1) and a vector b with shape (1,3), numpy under the hood treats it as if the rows of a were repeated across the columns and the columns of b were repeated across the rows, which results in the operation you described.
That's why Michael told you to subtract the transpose of the vector from the vector itself to get the result you asked for: x-x.T
Broadcasting is good because it achieves this with striding rather than by actually copying the inputs, so most of the time it uses less memory.
In the more general case, a and b do not need to have the same length for this to work. There are more details here:
https://numpy.org/doc/stable/user/basics.broadcasting.html
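To make the broadcasting concrete, here is a minimal sketch. The first part uses the x from the question; the arrays a and b in the second part are made-up values, chosen only to illustrate the case where the two vectors have different lengths.

```python
import numpy as np

# Column vector from the question, shape (3, 1)
x = np.array([[0], [1], [2]])

# (3, 1) - (1, 3) broadcasts to (3, 3): entry [i, j] is x[i] - x[j]
diff = x - x.T
print(diff)

# Illustrative 1-D inputs of different lengths (not from the question)
a = np.array([10, 20, 30])   # shape (3,)
b = np.array([1, 2])         # shape (2,)

# Add length-1 axes so the shapes are (3, 1) and (1, 2) -> result (3, 2)
pairwise = a[:, None] - b[None, :]
print(pairwise)
```

The second form does not need the inputs to be 2-D to begin with, since the length-1 axes are added on the fly.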

Related

How to slice a numpy array using index arrays with different shapes?

Let's say that we have the following 2d numpy array:
arr = np.array([[1,1,0,1,1],
[0,0,0,1,0],
[1,0,0,0,0],
[0,0,1,0,0],
[0,1,0,0,0]])
and the following indices for rows and columns:
rows = np.array([0,2,4])
cols = np.array([1,2])
The objective is to slice arr using rows and cols to take the following expected result:
arr_sliced = np.array([[1,0],
[0,0],
[1,0]])
Using directly the arrays as indices like arr[rows, cols] leads to:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
So what is the straightforward way to achieve this kind of slicing?
Update: useful information about the solution
So the solution was simple enough, and it requires a basic understanding of numpy's broadcasting. One can read these nice but not so representative examples from numpy. Also, the general broadcasting rules explain why there is no shape mismatch in:
arr[rows[:, np.newaxis], cols]
# rows[:, np.newaxis].shape == (3,1)
# cols.shape == (2,)

You can use:
arr[rows[:,None], cols[None]]
Output:
array([[1, 0],
[0, 0],
[1, 0]])

Another option is np.ix_. It looks like it is much quicker than the manual index-array approach for large arrays.
arr[np.ix_([0,2,4],[1,2])]
array([[1, 0],
[0, 0],
[1, 0]])
Documentation: https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.ix_.html
This function takes N 1-D sequences and returns N outputs with N dimensions each, such that the shape is 1 in all but one dimension and the dimension with the non-unit shape value cycles through all N dimensions.
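As a small sketch of what that description means in practice, the snippet below inspects the arrays that np.ix_ returns for the rows and cols from the question and checks that indexing with them matches the explicit newaxis form. The variable names row_idx, col_idx, via_ix and via_newaxis are only for illustration.

```python
import numpy as np

arr = np.array([[1, 1, 0, 1, 1],
                [0, 0, 0, 1, 0],
                [1, 0, 0, 0, 0],
                [0, 0, 1, 0, 0],
                [0, 1, 0, 0, 0]])
rows = np.array([0, 2, 4])
cols = np.array([1, 2])

# np.ix_ reshapes the 1-D index arrays so that they broadcast against each other
row_idx, col_idx = np.ix_(rows, cols)
print(row_idx.shape, col_idx.shape)   # (3, 1) (1, 2)

# Both forms select the same 3x2 block
via_ix = arr[row_idx, col_idx]
via_newaxis = arr[rows[:, None], cols]
print(via_ix)
```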

Randomly choose index based on condition in numpy

Let's say I have a 2D numpy array with 0 and 1 as values. I want to randomly pick an index that contains 1. Is there an efficient way to do this using numpy?
I achieved it in pure python, but it's too slow.
Example input:
[[0, 1], [1, 0]]
output:
(0, 1)
EDIT:
For clarification: I want my function to get 2D numpy array with values belonging to {0, 1}. I want the output to be a tuple (2D index) of randomly (uniformly) picked value from the given array that is equal to 1.
EDIT2:
Using Paul H's suggestion, I came up with this:
nonzero = np.nonzero(a)
return random.choice(list(zip(*nonzero)))
But it doesn't work with numpy's random choice, only with python's. Is there a way to optimise it better?

It's easier to get all the non-zero coordinates and sample from there:
xs,ys = np.where([[0, 1], [1, 0]])
# randomly pick a number:
idx = np.random.choice(np.arange(len(xs)) )
# output:
out = xs[idx], ys[idx]

You may try argwhere and permutation
a = np.array([[0, 1], [1, 0]])
b = np.argwhere(a)
tuple(np.random.permutation(b)[0])
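The two answers can be combined into one small helper. This is only a sketch: the function name random_index_of_one and the optional rng argument are made up for illustration. It draws a single random row of the argwhere result instead of permuting all of them, which should avoid shuffling the whole coordinate list.

```python
import numpy as np

def random_index_of_one(a, rng=None):
    """Return a uniformly chosen index tuple of a nonzero entry of `a`."""
    rng = np.random.default_rng() if rng is None else rng
    coords = np.argwhere(a)            # shape (k, a.ndim), one row per nonzero entry
    if len(coords) == 0:
        raise ValueError("array contains no nonzero entries")
    # Pick one row position uniformly from 0 .. k-1
    chosen = coords[rng.integers(len(coords))]
    return tuple(int(v) for v in chosen)

a = np.array([[0, 1], [1, 0]])
picked = random_index_of_one(a, np.random.default_rng(0))
print(picked)   # either (0, 1) or (1, 0)
```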

vectorized addition of parameterized arrays

I'd like to sum a series of vectors defined by a (complicated) function of parameters. Is there a way to use vectorization in place of the nested for loop to get the same value for ans in the code below? Memory usage is not a constraint. Note that for the actual problem the nested for loop is known to limit performance, despite this not being the case here.
import numpy as np
a1_vec = np.array([0.3, 1])
a2_vec = np.array([3.3, 10])
b1_vec = np.array([0.5, 0.7])
b2_vec = np.array([1.5, 1.3])
x = np.arange(0, 10000)
ans = 0
for a1, b1 in zip(a1_vec, b1_vec):
for a2, b2 in zip(a2_vec, b2_vec):
ans += x*np.exp(- a1 - b2) + x**(1 / 2)*np.cos(b1) + x**(1 / 3)*np.sin(a2)

So as @anon pointed out in his comment, you can use numpy's array broadcasting:
using X[:,None,None] or the more explicit X[:,np.newaxis,np.newaxis] (np.newaxis is just a more explicit alias for None). With np.newaxis you can introduce a new dimension of length 1 into your array. Your X.shape (the shape of the array) will no longer be (10000,) but (10000,1,1).
An elementwise operation on two arrays normally needs the same shape on both arrays (e.g. (2,5) and (2,5) will work). One of the cool features of numpy is broadcasting, which happens when an operation is done between two arrays where a dimension that is larger than 1 in one array is combined with a dimension of 1 in the other (e.g. (2,5) and (2,1), where the second dimension is 5 in the first array and 1 in the second). In this case numpy will broadcast the array with the dimension of 1 by simply looping over it (and it will do so in a fast, C-compiled way). In the example, when it combines the (2,5) array with the (2,1) array, it will pretend that the second array is also a (2,5) array, just with the same two numbers repeated 5 times.
so as an easy example look at
a = np.arange(3)
print(a*a)
#[0 1 4]
this is just normal element by element multiplication
now if you introduce empty dimensions using broadcasting rules:
print(a[:,None]*a[None,:])
#[[0 0 0]
# [0 1 2]
# [0 2 4]]
It's really cool and the very key to understanding the power of numpy, but admittedly it also took me some time to get familiar with it.
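Applied to the code in the question, one possible vectorization looks like the sketch below. It puts the parameter vectors on their own axes instead of reshaping x, which is a slightly different arrangement from the X[:,None,None] suggestion but relies on the same broadcasting rules. The names A1, B1, A2, B2, terms, ans_vec and ans_loop are introduced here only for illustration, and the loop is kept purely to check the result.

```python
import numpy as np

a1_vec = np.array([0.3, 1])
a2_vec = np.array([3.3, 10])
b1_vec = np.array([0.5, 0.7])
b2_vec = np.array([1.5, 1.3])
x = np.arange(0, 10000)

# Outer loop parameters go on axis 0, inner loop parameters on axis 1,
# and x stays on the last axis.
A1 = a1_vec[:, None, None]   # shape (2, 1, 1)
B1 = b1_vec[:, None, None]   # shape (2, 1, 1)
A2 = a2_vec[None, :, None]   # shape (1, 2, 1)
B2 = b2_vec[None, :, None]   # shape (1, 2, 1)

# Every (outer, inner) combination at once: shape (2, 2, 10000)
terms = x*np.exp(-A1 - B2) + x**(1 / 2)*np.cos(B1) + x**(1 / 3)*np.sin(A2)
ans_vec = terms.sum(axis=(0, 1))   # sum over both parameter axes

# Reference result from the original nested loop
ans_loop = 0
for a1, b1 in zip(a1_vec, b1_vec):
    for a2, b2 in zip(a2_vec, b2_vec):
        ans_loop += x*np.exp(- a1 - b2) + x**(1 / 2)*np.cos(b1) + x**(1 / 3)*np.sin(a2)

print(np.allclose(ans_vec, ans_loop))
```

The intermediate array terms holds all combinations at once, so memory grows with the product of the parameter counts and len(x). The question says memory is not a constraint, so that trade-off seems acceptable here.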

Applying matrix functions like scipy.linalg.eigh to higher dimensional arrays

I am new to numpy but have been using python for quite a while as an engineer.
I am writing a program that currently stores stress tensors as 3x3 numpy arrays within another NxM array which represents values through time and through the thickness of a wall, so overall it is an NxMx3x3 numpy array. I want to efficiently calculate the eigenvalues and eigenvectors of each 3x3 array within this larger array. So far I have tried using "fromiter", but this doesn't seem to work because the function returns 2 arrays. I have also tried apply_along_axis, which also doesn't work because it says the inner 3x3 is not a square matrix. I can do it with a list comprehension, but it doesn't seem ideal to resort to using lists.
Example just calculating eigenvals using list comprehension
import numpy as np
from scipy import linalg
a=np.random.random((2,2,3,3))
f=linalg.eigvalsh
ans=np.asarray([f(x) for x in a.reshape((4,3,3))])
ans.shape=(2,2,3)
I thought something like this would work but I have played around with it and can't get it working:
np.apply_along_axis(f,0,a)
BTW the 2x2 bit could be up to 5000x100 and this code is repeated ~50x50x200 times, hence the need for efficiency. Any help would be greatly appreciated.

You can use numpy.linalg.eigh. It accepts an array like your example a.
Here's an example. First, create an array of 3x3 symmetric arrays:
In [96]: a = np.random.random((2, 2, 3, 3))
In [97]: a = a + np.transpose(a, axes=(0, 1, 3, 2))
In [98]: a[0, 0]
Out[98]:
array([[0.61145048, 0.85209618, 0.03909677],
[0.85209618, 1.79309413, 1.61209077],
[0.03909677, 1.61209077, 1.55432465]])
Compute the eigenvalues and eigenvectors of all the 3x3 arrays:
In [99]: evals, evecs = np.linalg.eigh(a)
In [100]: evals.shape
Out[100]: (2, 2, 3)
In [101]: evecs.shape
Out[101]: (2, 2, 3, 3)
Take a look at the result for a[0, 0]:
In [102]: evals[0, 0]
Out[102]: array([-0.31729364, 0.83148477, 3.44467813])
In [103]: evecs[0, 0]
Out[103]:
array([[-0.55911658, 0.79634401, 0.23070516],
[ 0.63392772, 0.23128064, 0.73800062],
[-0.53434473, -0.55887877, 0.63413738]])
Verify that it is the same as computing the eigenvalues and eigenvectors for a[0, 0] separately:
In [104]: np.linalg.eigh(a[0, 0])
Out[104]:
(array([-0.31729364, 0.83148477, 3.44467813]),
array([[-0.55911658, 0.79634401, 0.23070516],
[ 0.63392772, 0.23128064, 0.73800062],
[-0.53434473, -0.55887877, 0.63413738]]))
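For a check that does not depend on particular random values, here is a sketch that compares the batched call with the list-comprehension approach from the question and rebuilds the input from the decomposition. The seed and the names evals_loop and recon are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((2, 2, 3, 3))
a = a + np.swapaxes(a, -1, -2)      # make every 3x3 block symmetric

# Batched call: operates on the last two axes of the array
evals, evecs = np.linalg.eigh(a)

# Same eigenvalues computed one matrix at a time, as in the question
evals_loop = np.array(
    [np.linalg.eigvalsh(m) for m in a.reshape(-1, 3, 3)]
).reshape(2, 2, 3)

# Rebuild each matrix as V @ diag(w) @ V.T to check the eigenvectors too
recon = (evecs * evals[..., None, :]) @ np.swapaxes(evecs, -1, -2)

print(np.allclose(evals, evals_loop), np.allclose(recon, a))
```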

Selecting a column of a numpy array

I am somewhat confused about selecting a column of a NumPy array, because the result is different from Matlab and even from a NumPy matrix. Please see the following cases.
In Matlab, we use the following command to select a column vector out of a matrix.
x = [0, 1; 2 3]
out = x(:, 1)
Then out becomes [0; 2], which is a column vector.
To do the same thing with a NumPy Matrix
x = np.matrix([[0, 1], [2, 3]])
out = x[:, 0]
Then the output is np.matrix([[0], [2]]) which is expected, and it is a column vector.
However, in case of NumPy array
x = np.array([[0, 1], [2, 3]])
out = x[:, 0]
The output is np.array([0, 2]) which is 1 dimensional, so it is not a column vector. My expectation is it should have been np.array([[0], [2]]).
I have two questions.
1. Why is the output from the NumPy array case different from the NumPy matrix case? This is causing a lot of confusion to me, but I think there might be some reason for this.
2. To get a column vector from a 2-D NumPy array, should I do additional things like expand_dims?
x = np.array([[0, 1], [2, 3]])
out = np.expand_dims(x[:, 0], axis = 1)

In MATLAB everything has at least 2 dimensions. In older MATLABs, 2d was it; now they can have more. np.matrix is modeled on that old MATLAB.
What does MATLAB do when you index a 3d matrix?
np.array is more general. It can have 0, 1, 2 or more dimensions.
x[:, 0]
x[0, :]
both select one column or row, and return an array with one less dimension.
x[:, [0]]
x[[0], :]
would return 2d arrays, with a singleton dimension.
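A quick sketch of the shapes involved, using the x from the question. The slice and newaxis variants are additional ways to keep the second dimension that are not mentioned above, and the variable names are only for illustration.

```python
import numpy as np

x = np.array([[0, 1], [2, 3]])

col_1d = x[:, 0]                 # scalar index drops the axis -> shape (2,)
col_list = x[:, [0]]             # list index keeps the axis   -> shape (2, 1)
col_slice = x[:, 0:1]            # slice also keeps the axis   -> shape (2, 1)
col_newaxis = x[:, 0][:, None]   # re-add the axis afterwards  -> shape (2, 1)

print(col_1d.shape, col_list.shape, col_slice.shape, col_newaxis.shape)
```

All three 2-D results hold the same values as np.expand_dims(x[:, 0], axis=1) from the question.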
In Octave (a MATLAB clone) indexing produces inconsistent results, depending on which side of the matrix I select:
octave:7> x=ones(2,3,4);
octave:8> size(x)
ans =
2 3 4
octave:9> size(x(1,:,:))
ans =
1 3 4
octave:10> size(x(:,:,1))
ans =
2 3
MATLAB/Octave adds dimensions at the end, and apparently readily squeezes them down on that side as well.
numpy orders the dimensions in the other direction, and can add dimensions at the start as needed. But it is consistent in squeezing out singleton dimensions when indexing.
The fact that numpy can have any number of dimensions, while MATLAB has a minimum of 2, is a crucial difference that often trips up MATLAB users. But one isn't any more logical than the other. MATLAB's practice is more a matter of history than general principles.
