Broadcast through numpy array with list of arrays - python

Given an adjacency list:
adj_list = [array([0,1]),array([0,1,2]),array([0,2])]
And an array of indices,
ind_arr = array([0,1,2])
Goal:
A = np.zeros((3,3))
for i in ind_arr:
    A[i,list(adj_list[i])] = 1.0/float(adj_list[i].shape[0])
Currently, I have written:
A[ind_arr[:],adj_list[:]] = 1. / len(adj_list[:])
And tried various configurations of indexing within this scaffold.

Here's one approach -
lens = np.array([len(i) for i in adj_list])
col_idx = np.concatenate(adj_list)
out = np.zeros((len(lens), col_idx.max()+1))
row_idx = np.repeat(np.arange(len(lens)), lens)
vals = np.repeat(1.0/lens, lens)
out[row_idx, col_idx] = vals
Sample input, output -
In [494]: adj_list = [np.array([0,2]),np.array([0,1,4])]
In [496]: out
Out[496]:
array([[ 0.5 , 0. , 0.5 , 0. , 0. ],
[ 0.33333333, 0.33333333, 0. , 0. , 0.33333333]])
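As a quick sanity check, the vectorized approach can be compared against the explicit loop from the question. This is only a sketch: the function name adjacency_to_weights and the n_cols parameter are made up for illustration, and it assumes every array in adj_list is non-empty (an empty one would cause a division by zero).

```python
import numpy as np

def adjacency_to_weights(adj_list, n_cols=None):
    """Row i gets 1/len(adj_list[i]) at the columns listed in adj_list[i]."""
    lens = np.array([len(a) for a in adj_list])
    col_idx = np.concatenate(adj_list)
    if n_cols is None:
        n_cols = col_idx.max() + 1
    # Repeat each row number once per neighbour so rows and columns line up.
    row_idx = np.repeat(np.arange(len(lens)), lens)
    out = np.zeros((len(lens), n_cols))
    out[row_idx, col_idx] = np.repeat(1.0 / lens, lens)
    return out

adj_list = [np.array([0, 1]), np.array([0, 1, 2]), np.array([0, 2])]
vectorized = adjacency_to_weights(adj_list)

# Reference result: the explicit loop from the question.
looped = np.zeros((3, 3))
for i, cols in enumerate(adj_list):
    looped[i, cols] = 1.0 / len(cols)

print(np.allclose(vectorized, looped))  # True
```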
Sparse matrix as output
Additionally, if you want to save memory and create a sparse matrix instead, that's an easy extension -
In [506]: from scipy.sparse import csr_matrix
In [507]: csr_matrix((vals, (row_idx, col_idx)), shape=(len(lens), col_idx.max()+1))
Out[507]:
<2x5 sparse matrix of type '<type 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [508]: _.toarray()
Out[508]:
array([[ 0.5 , 0. , 0.5 , 0. , 0. ],
[ 0.33333333, 0.33333333, 0. , 0. , 0.33333333]])
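Since an adjacency list is already grouped by row, it may also be possible to skip row_idx and hand csr_matrix its native (data, indices, indptr) triple directly. A minimal sketch, assuming the same sample input as above; the variable names are illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix

adj_list = [np.array([0, 2]), np.array([0, 1, 4])]
lens = np.array([len(a) for a in adj_list])

indices = np.concatenate(adj_list)                # column index of each stored value
indptr = np.concatenate(([0], np.cumsum(lens)))   # row i occupies indptr[i]:indptr[i+1]
data = np.repeat(1.0 / lens, lens)                # 1/len for every entry of a row

S = csr_matrix((data, indices, indptr), shape=(len(lens), indices.max() + 1))
dense = S.toarray()
print(dense)
```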

I don't think you can completely eliminate loops due to the mixed data types, but you can reduce the nested double for loops to a single one:
A = np.zeros((2, 3))
for i, arr in enumerate(adj_list):
    arr_size = len(arr)
    A[i, :arr_size] = 1./arr_size
A
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])
Or if the numbers in the arrays are actually column positions:
A = np.zeros((2, 3))
for i, arr in enumerate(adj_list):
    A[i, arr] = 1./len(arr)
A
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])
Another option uses MultiLabelBinarizer from sklearn (though it may not be as efficient):
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
adj_list = [np.array([0,1]),np.array([0,1,2])]
sizes = np.fromiter(map(len, adj_list), dtype=int)
mlb.fit_transform(adj_list)/sizes[:,None]
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])

Related

Efficiently inverting a `csr_matrix` in `scipy` element-wise

Let $A$ be a csr_matrix representing the connectivity matrix of a graph, where $A_{ij}$ is the weight of an edge. I need to invert each non-zero element of the matrix efficiently. The way I'm doing this right now is
B = 1.0 / A.toarray()
B[B == np.inf] = 0
This has two downsides:
memory usage increases because the csr_matrix is converted to a dense array.
a division by zero happens for every zero entry.
Are there any suggestions for doing this more efficiently?
One way you could do this is to create a new matrix from the data, indices and indptr of A: B = csr_matrix((1/A.data, A.indices, A.indptr)).
(This assumes that there are no explicitly stored zeros in A, so 1/A.data doesn't result in some values being inf.)
For example,
In [108]: A
Out[108]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [109]: A.A
Out[109]:
array([[0. , 1. , 2.5, 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 4. ],
[2. , 0. , 0. , 0. ]])
In [110]: B = csr_matrix((1/A.data, A.indices, A.indptr))
In [111]: B
Out[111]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [112]: B.A
Out[112]:
array([[0. , 1. , 0.4 , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.25],
[0.5 , 0. , 0. , 0. ]])
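If A might contain explicitly stored zeros, one way to make the assumption above safe is to drop them before inverting. The following is a sketch rather than a definitive recipe; the sample values (including the deliberately stored 0.0) are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Sample matrix with one explicitly stored zero at position (1, 0).
vals = np.array([1.0, 2.5, 0.0, 4.0, 2.0])
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([1, 2, 0, 3, 0])
A = csr_matrix((vals, (rows, cols)), shape=(4, 4))

B = A.copy()            # leave A untouched
B.eliminate_zeros()     # remove explicitly stored zeros, if any
B.data = 1.0 / B.data   # invert only the remaining stored entries

print(B.toarray())
```

Working on a copy keeps A intact; if that is not needed, the same two steps can be applied to A directly.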
csr_matrix has a power method, which raises each stored element to the given power:
In [598]: M = sparse.csr_matrix([[0,3,2],[.5,0,10]])
In [599]: M
Out[599]:
<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [600]: M.A
Out[600]:
array([[ 0. , 3. , 2. ],
[ 0.5, 0. , 10. ]])
In [601]: x = M.power(-1)
In [602]: x
Out[602]:
<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [603]: x.A
Out[603]:
array([[0. , 0.33333333, 0.5 ],
[2. , 0. , 0.1 ]])

Numpy covariance command returning matrix with more dimensions than input

I have an arbitrary row vector "u" and an arbitrary matrix "e" as follows:
u = np.resize(np.array([8,3]),[1,2])
e = np.resize(np.array([[2,2,5,5],[1, 6, 7, 4]]),[4,2])
np.cov(u,e)
array([[ 12.5, 0. , 0. , -12.5, 7.5],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[-12.5, 0. , 0. , 12.5, -7.5],
[ 7.5, 0. , 0. , -7.5, 4.5]])
The matrix that this returns is 5x5. This is confusing to me because the largest dimension of the inputs is only 4.
Thus, this may be less of a numpy question and more of a math question...not sure...
Please refer to the official numpy documentation (https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.cov.html) and check whether your usage of numpy.cov is consistent with what you are trying to achieve.
The signature and parameter descriptions are:
numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)
m : array_like
A 1-D or 2-D array containing multiple variables and observations.
Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.
y : array_like, optional
An additional set of variables and observations. y has the same form as that of m.
Note how m and y are combined: the rows of y are appended to the rows of m as additional variables. In your case u (shape 1x2) supplies one variable and e (shape 4x2) supplies four, hence the 5x5 result. The last example on the page shows the same behaviour:
>>> x = [-2.1, -1, 4.3]
>>> y = [3, 1.1, 0.12]
>>> X = np.stack((x, y), axis=0)
>>> print(np.cov(X))
[[ 11.71 -4.286 ]
[ -4.286 2.14413333]]
>>> print(np.cov(x, y))
[[ 11.71 -4.286 ]
[ -4.286 2.14413333]]
>>> print(np.cov(x))
11.71
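The stacking behaviour can be checked on the arrays from the question. This is a small sketch; the rowvar=False call at the end is only a guess at what might have been intended (treating the columns of e as the variables).

```python
import numpy as np

u = np.resize(np.array([8, 3]), [1, 2])                        # 1 variable, 2 observations
e = np.resize(np.array([[2, 2, 5, 5], [1, 6, 7, 4]]), [4, 2])  # 4 variables, 2 observations

stacked = np.vstack((u, e))      # 5 variables x 2 observations
c_two_args = np.cov(u, e)
c_stacked = np.cov(stacked)

print(c_two_args.shape)                     # (5, 5)
print(np.allclose(c_two_args, c_stacked))   # True

# With rowvar=False each column is a variable, so e alone gives a 2x2 matrix.
print(np.cov(e, rowvar=False).shape)        # (2, 2)
```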

How to nullify all entries except for argmax?

Assuming I have a matrix / array / list like a=[1,2,3,4,5] and I want to nullify all entries except for the max so it would be a=[0,0,0,0,5].
I'm using b = [val if idx == np.argmax(a) else 0 for idx,val in enumerate(a)] but is there a better (and faster) way (especially for more than 1-dim arrays...)
You can use numpy for an in-place solution. Note that the method below keeps every occurrence of the max value and sets all other entries to 0.
import numpy as np
a = np.array([1,2,3,4,5])
a[np.where(a != a.max())] = 0
# array([0, 0, 0, 0, 5])
For unique maxima, see @cᴏʟᴅsᴘᴇᴇᴅ's solution.
Rather than masking, you can create an array of zeros and set the right index appropriately?
1-D (optimised) Solution
(Setup) Convert a to a 1D array: a = np.array([1,2,3,4,5]).
To keep just one instance of the max
b = np.zeros_like(a)
i = np.argmax(a)
b[i] = a[i]
To keep all instances of the max
b = np.zeros_like(a)
m = a == a.max()
b[m] = a[m]
N-D solution
np.random.seed(0)
a = np.random.randn(5, 5)
b = np.zeros_like(a)
m = a == a.max(1, keepdims=True)
b[m] = a[m]
b
array([[0. , 0. , 0. , 2.2408932 , 0. ],
[0. , 0.95008842, 0. , 0. , 0. ],
[0. , 1.45427351, 0. , 0. , 0. ],
[0. , 1.49407907, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 2.26975462]])
Works for all instances of max per row.
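If only the first maximum of each row should survive (mirroring the 1-D argmax variant), integer indexing with argmax along the axis is one possible sketch. The sample array with tied maxima is made up for illustration.

```python
import numpy as np

a = np.array([[1, 5, 5],
              [7, 2, 7],
              [0, 3, 1]])

rows = np.arange(a.shape[0])
cols = a.argmax(axis=1)          # index of the first maximum in each row

b = np.zeros_like(a)
b[rows, cols] = a[rows, cols]    # copy over exactly one value per row
print(b)
# [[0 5 0]
#  [7 0 0]
#  [0 3 0]]
```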

How to create a numpy matrix/2d array from multiple 2d arrays?

In python I would like to build a matrix from four 2d numpy arrays
m = np.eye(3, 3)
c = np.random.rand(2, 3)
cT = c.T
z = np.zeros([min(np.shape(c)), min(np.shape(c))])
and the new matrix shape is defined as:
[[m, cT],
[c, z]]
or like this (with numerical data):
1. 0. 0. 0.0109 0.5339
0. 1. 0. 0.4991 0.9854
0. 0. 1. 0.5942 0.7565
0.0109 0.4991 0.5942 0. 0.
0.5339 0.9854 0.7565 0. 0.
I would like to ask what the easiest way to do this would be, and also the quickest (CPU-wise), in Python using numpy.
The most straightforward way is to copy each piece of data across to the appropriate slice
>>> m = np.eye(3, 3)
>>> c = np.random.rand(2, 3)
>>> cT = c.T
>>> z = np.zeros([min(np.shape(c)), min(np.shape(c))])
>>> X = np.eye(5, 5)
>>> X[:3, :3] = m
>>> X[:3, -2:] = c.T
>>> X[-2:, :3] = c
>>> X[-2:, -2:] = z
>>> X
array([[ 1. , 0. , 0. , 0.98834141, 0.69806125],
[ 0. , 1. , 0. , 0.97342311, 0.97368278],
[ 0. , 0. , 1. , 0.28701318, 0.08705423],
[ 0.98834141, 0.97342311, 0.28701318, 0. , 0. ],
[ 0.69806125, 0.97368278, 0.08705423, 0. , 0. ]])
Combining vstack and hstack can do this:
from numpy import ones, hstack, vstack
a, b, c, d = ones((3,3)), 2*ones((3,2)), 3*ones((2,3)), 4*ones((2,2))
x = hstack(( vstack((a, c)), vstack((b, d)) ))
[[ 1. 1. 1. 2. 2.]
[ 1. 1. 1. 2. 2.]
[ 1. 1. 1. 2. 2.]
[ 3. 3. 3. 4. 4.]
[ 3. 3. 3. 4. 4.]]
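Newer NumPy versions (1.13 and later) also provide np.block, which takes the nested-list layout from the question almost verbatim. A minimal sketch, with the random generator and its seed chosen only to make the example reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is arbitrary, for reproducibility only
m = np.eye(3)
c = rng.random((2, 3))
z = np.zeros((2, 2))

# Blocks are laid out exactly as in [[m, cT], [c, z]].
X = np.block([[m, c.T],
              [c, z]])
print(X.shape)  # (5, 5)
```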

Build diagonal matrix without using for loop

I am trying to build the following matrix in Python without using a for loop:
A
[[ 0.1 0.2 0. 0. 0. ]
[ 1. 2. 3. 0. 0. ]
[ 0. 1. 2. 3. 0. ]
[ 0. 0. 1. 2. 3. ]
[ 0. 0. 0. 4. 5. ]]
I tried the fill_diagonal method in NumPy (see matrix B below) but it does not give me the same matrix as shown in matrix A:
B
[[ 1. 0.2 0. 0. 0. ]
[ 0. 2. 0. 0. 0. ]
[ 0. 0. 3. 0. 0. ]
[ 0. 0. 0. 1. 0. ]
[ 0. 0. 0. 4. 5. ]]
Here is the Python code that I used to construct the matrices:
import numpy as np
import scipy.linalg as sp # maybe use scipy to build diagonal matrix?
#---- build diagonal square array using "for" loop
m = 5
A = np.zeros((m, m))
A[0, 0] = 0.1
A[0, 1] = 0.2
for i in range(1, m-1):
    A[i, i-1] = 1 # m-1
    A[i, i] = 2 # m
    A[i, i+1] = 3 # m+1
A[m-1, m-2] = 4
A[m-1, m-1] = 5
print('A \n', A)
#---- build diagonal square array without loop
B = np.zeros((m, m))
B[0, 0] = 0.1
B[0, 1] = 0.2
np.fill_diagonal(B, [1, 2, 3])
B[m-1, m-2] = 4
B[m-1, m-1] = 5
print('B \n', B)
So is there a way to construct a diagonal matrix like the one shown by matrix A without using a for loop?
There are functions for this in scipy.sparse, e.g.:
from scipy.sparse import diags
C = diags([1,2,3], [-1,0,1], shape=(5,5), dtype=float)
C = C.toarray()
C[0, 0] = 0.1
C[0, 1] = 0.2
C[-1, -2] = 4
C[-1, -1] = 5
Diagonal matrices are generally very sparse, so you could also keep it as a sparse matrix. This could even have large efficiency benefits, depending on the application.
The efficiency gains that sparse matrices give you depend very much on matrix size. For a 5x5 array the difference is negligible. But for larger matrices, creating the array can be a lot faster with sparse matrices, as illustrated by the following example with an identity matrix:
from scipy import sparse
%timeit np.eye(3000)
# 100 loops, best of 3: 3.12 ms per loop
%timeit sparse.eye(3000)
# 10000 loops, best of 3: 79.5 µs per loop
But the real strength of the sparse matrix data type is shown when you need to do mathematical operations on arrays that are sparse:
%timeit np.eye(3000).dot(np.eye(3000))
# 1 loops, best of 3: 2.8 s per loop
%timeit sparse.eye(3000).dot(sparse.eye(3000))
# 1000 loops, best of 3: 1.11 ms per loop
Or when you need to work with some very large but sparse array:
np.eye(1E6)
# ValueError: array is too big.
sparse.eye(1E6)
# <1000000x1000000 sparse matrix of type '<type 'numpy.float64'>'
# with 1000000 stored elements (1 diagonals) in DIAgonal format>
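For a small dense matrix such as A from the question, a NumPy-only alternative to scipy.sparse.diags is to sum three np.diag calls with different offsets and then overwrite the first and last rows. This is a sketch of that idea, not a claim that it is faster:

```python
import numpy as np

m = 5
# Sub-diagonal of 1s, main diagonal of 2s, super-diagonal of 3s.
A = (np.diag(np.full(m - 1, 1.0), k=-1)
     + np.diag(np.full(m, 2.0), k=0)
     + np.diag(np.full(m - 1, 3.0), k=1))

# The first and last rows differ from the interior pattern, so overwrite them.
A[0, :2] = [0.1, 0.2]
A[-1, -2:] = [4, 5]
print(A)
```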
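Notice that the number of zeros between consecutive groups of nonzero values is always 3 (or some other constant, whenever you want a banded matrix like this), so you can lay the values out in a flat array and reshape it: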
In [10]:
import numpy as np
A1=[0.1, 0.2]
A2=[1,2,3]
A3=[4,5]
SPC=[0,0,0] # spacing zeros (or use np.zeros(3))
np.hstack((A1,SPC,A2,SPC,A2,SPC,A2,SPC,A3)).reshape(5,5)
Out[10]:
array([[ 0.1, 0.2, 0. , 0. , 0. ],
[ 1. , 2. , 3. , 0. , 0. ],
[ 0. , 1. , 2. , 3. , 0. ],
[ 0. , 0. , 1. , 2. , 3. ],
[ 0. , 0. , 0. , 4. , 5. ]])
In [11]:
import itertools #A more general way of doing it
np.hstack(list(itertools.chain(*[(item, SPC) for item in [A1, A2, A2, A2, A3]]))[:-1]).reshape(5,5)
Out[11]:
array([[ 0.1, 0.2, 0. , 0. , 0. ],
[ 1. , 2. , 3. , 0. , 0. ],
[ 0. , 1. , 2. , 3. , 0. ],
[ 0. , 0. , 1. , 2. , 3. ],
[ 0. , 0. , 0. , 4. , 5. ]])
