I am trying to calculate the rational basis for the null space of a matrix. There are quite a few posts about how the null space is calculated using Python/NumPy, but they compute an orthonormal basis rather than the rational basis. Here is how this is done in MATLAB:
ns = null(A,'r')
When I looked at the source code, I saw that it is calculated like this:
function Z = null(A,how)
[m,n] = size(A);
%...
[R,pivcol] = rref(A);
r = length(pivcol);
nopiv = 1:n;
nopiv(pivcol) = [];
Z = zeros(n,n-r,class(A));
if n > r
    Z(nopiv,:) = eye(n-r,n-r,class(A));
    if r > 0
        Z(pivcol,:) = -R(1:r,nopiv);
    end
end
%...
function [A,jb] = rref(A,tol)
%...
[m,n] = size(A);
[num, den] = rat(A);
rats = isequal(A,num./den);
if (nargin < 2), tol = max(m,n)*eps(class(A))*norm(A,'inf'); end
i = 1;
j = 1;
jb = [];
while (i <= m) && (j <= n)
    [p,k] = max(abs(A(i:m,j))); k = k+i-1;
    if (p <= tol)
        A(i:m,j) = zeros(m-i+1,1);
        j = j + 1;
    else
        jb = [jb j];
        A([i k],j:n) = A([k i],j:n);
        A(i,j:n) = A(i,j:n)/A(i,j);
        for k = [1:i-1 i+1:m]
            A(k,j:n) = A(k,j:n) - A(k,j)*A(i,j:n);
        end
        i = i + 1;
        j = j + 1;
    end
end
if rats
    [num,den] = rat(A);
    A = num./den;
end
Here rref is the reduced row echelon form. Thus, by looking at this source code, I tried to recreate it with the following code:
from fractions import Fraction
import numpy as np

def fract(x):
    return Fraction(x)

def dnm(x):
    return x.denominator

def nmr(x):
    return x.numerator

fractionize = np.vectorize(fract)
denom = np.vectorize(dnm)
numer = np.vectorize(nmr)

def rref(A,tol=1e-12):
    m,n = A.shape
    Ar = A.copy()
    i,j = 0,0
    jb = []
    while i < m and j < n:
        p = np.max(np.abs(Ar[i:m,j]))
        k = np.where(np.abs(Ar[i:m,j]) == p)[0][0]
        k = k + i - 1
        if (p <= tol):
            Ar[i:m,j] = np.zeros((m-i,))
            j += 1
        else:
            jb.append(j)
            Ar[(i,k),j:n] = Ar[(k,i),j:n]
            Ar[i,j:n] = Ar[i,j:n]/Ar[i,j]
            for k in np.hstack((np.arange(0,i),np.arange(i+1,m))):
                Ar[k,j:n] = Ar[k,j:n] - Ar[k,j]*A[i,j:n]
            i += 1
            j += 1
    print(len(jb))
    return Ar,jb

def null(A,tol=1e-5):
    m,n = A.shape
    R,pivcol = rref(A,tol=tol)
    print(pivcol)
    r = len(pivcol)
    nopiv = np.ones(n).astype(bool)
    nopiv[pivcol] = np.zeros(r).astype(bool)
    Z = np.zeros((n,n-r))
    if n > r:
        Z[nopiv,:] = np.eye(n-r,n-r)
        if r > 0:
            Z[pivcol,:] = -R[:r,nopiv]
    return Z
There are two things that I don't know. First, I do not know how to add the ratios part into the rref function. Second, I am not sure if my indexes are correct, since MATLAB's indices start from 1 and slicing includes the last element (i.e. 1:5 includes both 1 and 5).
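To illustrate the two concerns (a small sketch of general facts about the two languages, not a fix for the code above): MATLAB slices are 1-based and end-inclusive, while NumPy slices are 0-based and end-exclusive, and MATLAB's rat has a rough analogue in fractions.Fraction(...).limit_denominator():

from fractions import Fraction
import numpy as np

A = np.arange(1.0, 26.0).reshape(5, 5)

# MATLAB's A(2:4, 3) (1-based, end-inclusive) corresponds to the
# 0-based, end-exclusive slice below:
col = A[1:4, 2]

# MATLAB's rat(x) finds a nearby rational approximation; for a single
# Python float, Fraction(...).limit_denominator() plays a similar role:
print(Fraction(0.1).limit_denominator())  # 1/10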
SymPy does that out of the box, although (being symbolic, and in Python) not as fast as NumPy or SciPy would be. An example with floating point input:
from sympy import Matrix, S, nsimplify
M = Matrix([[2.75, -1.2, 0, 3.2], [8.29, -4.8, 7, 0.01]])
print(nsimplify(M, rational=True).nullspace())
Prints a list of two column vectors, represented as one-column matrices.
[Matrix([
[ 700/271],
[9625/1626],
[ 1],
[ 0]]), Matrix([
[ -1279/271],
[-17667/2168],
[ 0],
[ 1]])]
The use of nsimplify was necessary to convert floats to the rationals that they were meant to represent. If the matrix is created as a matrix of integer/rational entries, that would not be necessary.
M = Matrix([[1, 2, 3, 5, 9], [9, -3, 0, 2, 4], [S(3)/2, 0, -1, 2, 0]])
print(M.nullspace())
[Matrix([
[ -74/69],
[-176/69],
[ 9/23],
[ 1],
[ 0]]), Matrix([
[ -70/69],
[-118/69],
[ -35/23],
[ 0],
[ 1]])]
Here, S(3)/2 is used instead of 3/2 in order to force SymPy object creation instead of floating point evaluation.
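If a single matrix like MATLAB's null(A,'r') output is preferred, the column vectors returned by nullspace() can be stacked; a small sketch (using Rational(3, 2) as an equivalent alternative to S(3)/2):

from sympy import Matrix, Rational

M = Matrix([[1, 2, 3, 5, 9], [9, -3, 0, 2, 4], [Rational(3, 2), 0, -1, 2, 0]])
# one matrix whose columns form the rational basis of the null space
Z = Matrix.hstack(*M.nullspace())
print(Z)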
Related
I'm trying to solve a minimization problem where an initial solution is already present and the objective function is based on this initial solution.
I have some sort of line y_line which is an initial mapping of resources and stations:
y_line = np.array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
Additionally, I have a savings array S for selling from the line, an array EC for buying new resources, and an array P for processing costs:
S = np.array([[-260., -260., -260.],
[-30., -30., -30.],
[360., 360., 360.]], dtype=int)
EC = np.array([[1000, 1000, 1000],
[2000, 2000, 2000],
[5000, 5000, 5000]], dtype=int)
P = np.array([[720., 720., 720.],
[1440., 1440., 1440.],
[3600., 3600., 3600.]], dtype=int)
Using just a simplified constraint: every workstation i must have at least one resource j -> sum(y[i, j] for j in j_idx) == 1 for all i in i_idx.
My objective is that every resource sold from the initial y_line brings us savings, every newly bought one costs us, and the solution (the new line) y has a processing cost for operating. I have defined the objective as follows:
y_delta = y - y_line # delta between new line (y) and old line (y_line)
y_delta_plus = np.zeros(y.shape, dtype=object) # 1
y_delta_minus = np.zeros(y.shape, dtype=object) # 2
# I -> new bought resources
y_delta_plus[y_delta >= 0] = y_delta[y_delta >= 0]
# II -> sold resources
y_delta_minus[y_delta <= 0] = y_delta[y_delta <= 0]
c_i = y_delta_plus * EC # invest
c_s = y_delta_minus * S # savings
c_p = y * P # processing cost
c_y = np.sum(c_s + c_i + c_p)
However, if I solve this model (full code below), the objective value (5760) doesn't match my sanity-check calculation (12430). Would it be possible to set initial values for y[i, j]? Or is there another function to achieve this?
from ortools.linear_solver import pywraplp
import numpy as np
y_line = np.array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
S = np.array([[-260., -260., -260.],
[-30., -30., -30.],
[360., 360., 360.]], dtype=int)
EC = np.array([[1000, 1000, 1000],
[2000, 2000, 2000],
[5000, 5000, 5000]], dtype=int)
P = np.array([[720., 720., 720.],
[1440., 1440., 1440.],
[3600., 3600., 3600.]], dtype=int)
solver = pywraplp.Solver('stack', pywraplp.Solver.SAT_INTEGER_PROGRAMMING)
y = np.zeros_like(y_line, dtype=object)
i_idx = range(y_line.shape[0])
j_idx = range(y_line.shape[1])
for i in i_idx:
    for j in j_idx:
        y[i, j] = solver.IntVar(0, 1, 'y[%i_%i]' % (i, j))
for i in i_idx:
    solver.Add(
        sum(y[i, j] for j in j_idx) == 1
    )
def objective(y, y_line):
    y_delta = y - y_line  # delta between new line (y) and old line (y_line)
    y_delta_plus = np.zeros(y.shape, dtype=object)   # 1
    y_delta_minus = np.zeros(y.shape, dtype=object)  # 2
    # I -> new bought resources
    y_delta_plus[y_delta >= 0] = y_delta[y_delta >= 0]
    # II -> sold resources
    y_delta_minus[y_delta <= 0] = y_delta[y_delta <= 0]
    c_i = y_delta_plus * EC   # invest
    c_s = y_delta_minus * S   # savings
    c_p = y * P               # processing
    return np.sum(c_s + c_i + c_p)
c_y = objective(y=y, y_line=y_line)
solver.Minimize(
c_y
)
# [START solve]
print("Number of constraints:", solver.NumConstraints())
print("Number of variables:", solver.NumVariables())
status = solver.Solve()
# [END solve]
y_new = np.zeros_like(y)
for i in range(y_line.shape[0]):
    for j in range(y_line.shape[1]):
        if y[i, j].solution_value() > 0:
            y_new[i, j] = y[i, j].solution_value()
print(f"Objective sat: {solver.Objective().Value()}")
print(y_new)
# Number of constraints: 3
# Number of variables: 9
# Objective sat: 5760.0
# [[1.0 0 0]
# [1.0 0 0]
# [1.0 0 0]]
# %%
c_y_test = objective(y=y_new, y_line=y_line)
c_y_test # -> 12430.0
The model can be solved, however not with the approach I chose in the first place. Using a pywraplp model it didn't work, yet with a cp_model it can be solved using predefined variables (as mentioned by #sascha). The arrays y_line, S, EC and P are the same as above, and so is the single constraint. The "filtering" I could solve using:
for i in range(len(y_cp.flatten())):
    model.AddElement(i, y_delta.flatten().tolist(), y_cp.flatten().tolist()[i] - y_line.flatten().tolist()[i])
for i in i_idx:
    for j in j_idx:
        model.AddMaxEquality(y_delta_plus[i, j], [y_delta[i, j], model.NewConstant(0)])
        model.AddMinEquality(y_delta_minus[i, j], [y_delta[i, j], model.NewConstant(0)])
model.Minimize(
    np.sum(y_delta_plus * EC) + np.sum(y_delta_minus * S) + np.sum(y_cp * P)
)
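The snippet above references y_cp, y_delta, y_delta_plus and y_delta_minus without showing their declarations. A minimal sketch of how they might be set up (my assumption, not the original code), with y_line as defined above:

from ortools.sat.python import cp_model
import numpy as np

y_line = np.eye(3, dtype=int)  # same initial mapping as above (assumption)

model = cp_model.CpModel()
i_idx = range(y_line.shape[0])
j_idx = range(y_line.shape[1])

y_cp = np.zeros_like(y_line, dtype=object)           # new assignment (0/1)
y_delta = np.zeros_like(y_line, dtype=object)        # y_cp - y_line, in [-1, 1]
y_delta_plus = np.zeros_like(y_line, dtype=object)   # newly bought resources
y_delta_minus = np.zeros_like(y_line, dtype=object)  # sold resources

for i in i_idx:
    for j in j_idx:
        y_cp[i, j] = model.NewBoolVar(f'y_cp[{i}_{j}]')
        y_delta[i, j] = model.NewIntVar(-1, 1, f'y_delta[{i}_{j}]')
        y_delta_plus[i, j] = model.NewIntVar(0, 1, f'y_delta_plus[{i}_{j}]')
        y_delta_minus[i, j] = model.NewIntVar(-1, 0, f'y_delta_minus[{i}_{j}]')

for i in i_idx:
    model.Add(sum(y_cp[i, j] for j in j_idx) == 1)  # same single constraint as above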
Solving and running the sanity check yields:
solver_cp = cp_model.CpSolver()
solver_cp.Solve(model)
y_new_cp = np.zeros_like(y_cp)
for i in i_idx:
    for j in j_idx:
        if solver_cp.Value(y_cp[i, j]) > 0:
            y_new_cp[i, j] = solver_cp.Value(y_cp[i, j])
print(f"Objective cp: {solver_cp.ObjectiveValue()}")
print(y_new_cp)
# Objective cp: 5760.0
# [[1 0 0]
# [0 1 0]
# [1 0 0]]
c_y_test = objective(y=y_new_cp, y_line=y_line)
c_y_test # -> 5760 -> Correct
The cp_model could solve it and match the sanity check.
With the pywraplp model I couldn't figure out how to solve it.
I have two sorted NumPy arrays similar to these:
x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
Elements never repeat in the same array. I want a Pythonic way of finding a list of indexes that contain the locations in the arrays at which the same element exists.
For instance, 1 exists in x and y at index 0. Element 2 in x doesn't exist in y, so I don't care about that item. However, 8 does exist in both arrays: at index 2 in x but index 1 in y. Similarly, 15 exists in both, at index 4 in x but index 2 in y. So the outcome of my function would be a list that in this case returns [[0, 0], [2, 1], [4, 2]].
So far what I'm doing is:
def get_indexes(x, y):
    indexes = []
    for i in range(len(x)):
        # Find index where item x[i] is in y:
        j = np.where(x[i] == y)[0]
        # If it exists, save it:
        if len(j) != 0:
            indexes.append([i, j[0]])
    return indexes
But the problem is that arrays x and y are very large (millions of items), so it takes quite a while. Is there a better pythonic way of doing this?
Without Python loops
Code
def get_indexes_darrylg(x, y):
    ' darrylg answer '
    # Use intersect1d to find common elements between the two arrays
    overlap = np.intersect1d(x, y)
    # Indexes of the common elements in each array
    loc1 = np.searchsorted(x, overlap)
    loc2 = np.searchsorted(y, overlap)
    # Zip the two 1d arrays into a 2d result
    return np.dstack((loc1, loc2))[0]
Usage
x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
result = get_indexes_darrylg(x, y)
# result: array([[0, 0],
#                [2, 1],
#                [4, 2]], dtype=int64)
Timing Posted Solutions
Results show that darrylg's code has the fastest run time.
Code Adjustment
Each posted solution is wrapped as a function.
A slight modification so that each solution outputs a NumPy array.
Each curve is named after the poster.
Code
import numpy as np
import perfplot
def create_arr(n):
    ' Creates pair of 1d numpy arrays with half the elements equal '
    max_val = 100000  # One more than largest value in output arrays
    arr1 = np.random.randint(0, max_val, (n,))
    arr2 = arr1.copy()
    # Change half the elements in arr2
    all_indexes = np.arange(0, n, dtype=int)
    indexes = np.random.choice(all_indexes, size=n//2, replace=False)  # locations to make changes
    np.put(arr2, indexes, np.random.randint(0, max_val, (n//2,)))  # assign new random values at change locations
    arr1 = np.sort(arr1)
    arr2 = np.sort(arr2)
    return (arr1, arr2)
def get_indexes_lllrnr101(x, y):
    ' lllrnr101 answer '
    ans = []
    i = 0
    j = 0
    while (i < len(x) and j < len(y)):
        if x[i] == y[j]:
            ans.append([i, j])
            i += 1
            j += 1
        elif (x[i] < y[j]):
            i += 1
        else:
            j += 1
    return np.array(ans)
def get_indexes_joostblack(x, y):
    'joostblack'
    indexes = []
    for idx, val in enumerate(x):
        idy = np.searchsorted(y, val)
        try:
            if y[idy] == val:
                indexes.append([idx, idy])
        except IndexError:
            continue  # ignore index errors
    return np.array(indexes)
def get_indexes_mustafa(x, y):
    indices_in_x = np.flatnonzero(np.isin(x, y))                # array([0, 2, 4])
    indices_in_y = np.flatnonzero(np.isin(y, x[indices_in_x]))  # array([0, 1, 2])
    return np.array(list(zip(indices_in_x, indices_in_y)))
def get_indexes_darrylg(x, y):
    ' darrylg answer '
    # Use intersect1d to find common elements between the two arrays
    overlap = np.intersect1d(x, y)
    # Indexes of the common elements in each array
    loc1 = np.searchsorted(x, overlap)
    loc2 = np.searchsorted(y, overlap)
    # Zip the two 1d arrays into a 2d result
    return np.dstack((loc1, loc2))[0]
def get_indexes_akopcz(x, y):
    ' akopcz answer '
    return np.array([
        [i, j]
        for i, nr in enumerate(x)
        for j in np.where(nr == y)[0]
    ])
perfplot.show(
setup = create_arr, # tuple of two 1D random arrays
kernels=[
lambda a: get_indexes_lllrnr101(*a),
lambda a: get_indexes_joostblack(*a),
lambda a: get_indexes_mustafa(*a),
lambda a: get_indexes_darrylg(*a),
lambda a: get_indexes_akopcz(*a),
],
labels=["lllrnr101", "joostblack", "mustafa", "darrylg", "akopcz"],
n_range=[2 ** k for k in range(5, 21)],
xlabel="Array Length",
# More optional arguments with their default values:
# logx="auto", # set to True or False to force scaling
# logy="auto",
equality_check=None, #np.allclose, # set to None to disable "correctness" assertion
# show_progress=True,
# target_time_per_measurement=1.0,
# time_unit="s", # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
# relative_to=1, # plot the timings relative to one of the measurements
# flops=lambda n: 3*n, # FLOPS plots
)
What you are doing is O(n log n), which is decent enough.
If you want, you can do it in O(n) by iterating over both arrays with two pointers: since they are sorted, always advance the pointer of the array with the smaller element.
See below:
x = [1, 2, 8, 11, 15]
y = [1, 8, 15, 17, 20, 21]
def get_indexes(x, y):
    ans = []
    i = 0
    j = 0
    while (i < len(x) and j < len(y)):
        if x[i] == y[j]:
            ans.append([i, j])
            i += 1
            j += 1
        elif (x[i] < y[j]):
            i += 1
        else:
            j += 1
    return ans
print(get_indexes(x,y))
which gives me:
[[0, 0], [2, 1], [4, 2]]
This function will search for all occurrences of x[i] in the y array; since duplicates are not allowed in y, it will find each x[i] at most once.
def get_indexes(x, y):
    return [
        [i, j]
        for i, nr in enumerate(x)
        for j in np.where(nr == y)[0]
    ]
You can use numpy.searchsorted:
def get_indexes(x, y):
    indexes = []
    for idx, val in enumerate(x):
        idy = np.searchsorted(y, val)
        # guard against idy == len(y), which happens when val is larger than every element of y
        if idy < len(y) and y[idy] == val:
            indexes.append([idx, idy])
    return indexes
One solution is to first look from x's side to see which values are included in y, by getting their indices through np.isin and np.flatnonzero, and then do the same from the other side; but instead of passing x entirely, we pass only the (already found) intersected elements, to save time:
indices_in_x = np.flatnonzero(np.isin(x, y)) # array([0, 2, 4])
indices_in_y = np.flatnonzero(np.isin(y, x[indices_in_x])) # array([0, 1, 2])
Now you can zip them to get the result:
result = list(zip(indices_in_x, indices_in_y)) # [(0, 0), (2, 1), (4, 2)]
I'm attempting to convert a double summation formula into code, but can't figure out the correct matrix/vector representation of it.
The first summation is over i from 1 to n, and the second is over j > i up to n; that is, the term weights[i] * weights[j] * vols[i] * vols[j] is summed over all pairs with j > i.
I'm guessing there is a much more efficient and Pythonic way of writing this?
I resorted to nested for loops just to get it working but, as expected, it runs very slowly with a large dataset:
def wapc_denom(weights, vols):
    x = []
    y = []
    for i, wi in enumerate(weights):
        for j, wj in enumerate(weights):
            if j > i:
                x.append(wi * wj * vols[i] * vols[j])
        y.append(np.sum(x))
    return np.sum(y)
Edit:
Using guidance from smci's answer I think I have a potential solution:
def wapc_denom2(weights, vols):
    return np.sum(np.tril(np.outer(weights, vols.T)**2, k=-1))
Assuming you want to count every term only once (for that you have to move the x = [] into the outer loop), one cheap way of computing the sum would be:
Create mock data
weights = np.random.random(10)
vols = np.random.random(10)
Do the calculation
wv = weights * vols
result = (wv.sum()**2 - wv @ wv) / 2
Check that it's the same
def wapc_denom(weights, vols):
    y = []
    for i, wi in enumerate(weights):
        x = []
        for j, wj in enumerate(weights):
            if j > i:
                x.append(wi * wj * vols[i] * vols[j])
        y.append(np.sum(x))
    return np.sum(y)
assert np.allclose(result, wapc_denom(weights, vols))
Why does it work?
What we are doing is computing the sum of the full outer-product matrix, subtracting the diagonal, and dividing by two. This is cheap because it is easy to verify that the sum of an outer product is just the product of the summed factors.
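A quick self-contained check of that identity (my addition, not part of the original answer):

import numpy as np

a = np.random.random(5)
b = np.random.random(7)
# the sum of the outer product equals the product of the two sums
assert np.isclose(np.outer(a, b).sum(), a.sum() * b.sum())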
wi * wj * vols[i] * vols[j] is a telltale. vols is another vector, so first you want to compute the vector wv = w * vols.
Then (wj * vols[j]) * (wi * vols[i]) = wv^T * wv is your (matrix outer product) expression; that's a column vector times a row vector. But actually you only want the sum, so I don't see a need to construct a vector with y.append(np.sum(x)); you're only going to sum it anyway with np.sum(y).
Also, the if j > i part means you only want the sum of the lower triangular part (or equivalently the upper, since the matrix is symmetric) and exclude the diagonal.
EDIT: the result is fully determined just from wv; I didn't think we needed the matrix to get the sum, and we didn't need the diagonal. #PaulPanzer found the most compact expression.
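One way to spell out this reasoning directly and check it against the compact expression (a sketch of my own, not from either answer):

import numpy as np

weights = np.random.random(10)
vols = np.random.random(10)
wv = weights * vols

# strict triangle of the outer product of wv with itself ...
tri = np.sum(np.tril(np.outer(wv, wv), k=-1))
# ... equals the compact closed form from above
compact = (wv.sum()**2 - wv @ wv) / 2
assert np.isclose(tri, compact)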
You can use the triangular-matrix helpers in NumPy; check np.triu and np.meshgrid. Do:
np.product(np.triu(np.meshgrid(weights,weights), 1) * np.triu(np.meshgrid(vols,vols), 1),0).sum(1).cumsum().sum()
Example:
w = np.arange(4) +1
v = np.array([1,3,2,2])
print(np.triu(np.meshgrid(w,w), k=1))
>>array([[[0, 2, 3, 4],
[0, 0, 3, 4],
[0, 0, 0, 4],
[0, 0, 0, 0]],
[[0, 1, 1, 1],
[0, 0, 2, 2],
[0, 0, 0, 3],
[0, 0, 0, 0]]])
# example of product + triu + meshgrid (your x values):
print(np.product(np.triu(np.meshgrid(w,w), 1) * np.triu(np.meshgrid(v,v), 1),0))
>>array([[ 0, 6, 6, 8],
[ 0, 0, 36, 48],
[ 0, 0, 0, 48],
[ 0, 0, 0, 0]])
print(np.product(np.triu(np.meshgrid(w,w), 1) * np.triu(np.meshgrid(v,v), 1),0).sum(1).cumsum().sum())
>> 428
print(wapc_denom(w, v))
>> 428
Pandas has a widely-used groupby facility to split up a DataFrame based on a corresponding mapping, from which you can apply a calculation on each subgroup and recombine the results.
Can this be done flexibly in NumPy without a native Python for-loop? With a Python loop, this would look like:
>>> import numpy as np
>>> X = np.arange(10).reshape(5, 2)
>>> groups = np.array([0, 0, 0, 1, 1])
# Split up elements (rows) of `X` based on their element wise group
>>> np.array([X[groups==i].sum() for i in np.unique(groups)])
array([15, 30])
Above 15 is the sum of the first three rows of X, and 30 is the sum of the remaining two.
By "flexibly,” I just mean that we aren't focusing on one particular computation such as sum, count, maximum, etc, but rather passing any computation to the grouped arrays.
If not, is there a faster approach than the above?
How about using a SciPy sparse matrix?
import numpy as np
from scipy import sparse
import time
x_len = 500000
g_len = 100
X = np.arange(x_len * 2).reshape(x_len, 2)
groups = np.random.randint(0, g_len, x_len)
# original
s = time.time()
a = np.array([X[groups==i].sum() for i in np.unique(groups)])
print(time.time() - s)
# using scipy sparse matrix
s = time.time()
x_sum = X.sum(axis=1)
b = np.array(sparse.coo_matrix(
(
x_sum,
(groups, np.arange(len(x_sum)))
),
shape=(g_len, x_len)
).sum(axis=1)).ravel()
print(time.time() - s)
#compare
print(np.abs((a-b)).sum())
result on my PC
0.15915322303771973
0.012875080108642578
0
More than 10 times faster.
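Equivalently (a sketch, reusing x_len, g_len, groups and x_sum from the snippet above): the coo trick builds a sparse one-hot "group indicator" matrix of shape (g_len, x_len), so the per-group sums are just a sparse matrix-vector product.

# sparse group-indicator matrix: row g has a 1 in every column i with groups[i] == g
G = sparse.coo_matrix((np.ones(x_len), (groups, np.arange(x_len))),
                      shape=(g_len, x_len))
b2 = G @ x_sum  # per-group sums, same values as b above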
Update!
Let's benchmark the answers of #Paul Panzer and #Daniel F. It is a summation-only benchmark.
import numpy as np
from scipy import sparse
import time
# by #Daniel F
def groupby_np(X, groups, axis=0, uf=np.add, out=None, minlength=0, identity=None):
    if minlength < groups.max() + 1:
        minlength = groups.max() + 1
    if identity is None:
        identity = uf.identity
    i = list(range(X.ndim))
    del i[axis]
    i = tuple(i)
    n = out is None
    if n:
        if identity is None:  # fallback to loops over 0-index for identity
            assert np.all(np.in1d(np.arange(minlength), groups)), "No valid identity for unassigned groups"
            s = [slice(None)] * X.ndim
            for i_ in i:
                s[i_] = 0
            out = np.array([uf.reduce(X[tuple(s)][groups == i]) for i in range(minlength)])
        else:
            out = np.full((minlength,), identity, dtype=X.dtype)
    uf.at(out, groups, uf.reduce(X, i))
    if n:
        return out
x_len = 500000
g_len = 200
X = np.arange(x_len * 2).reshape(x_len, 2)
groups = np.random.randint(0, g_len, x_len)
print("original")
s = time.time()
a = np.array([X[groups==i].sum() for i in np.unique(groups)])
print(time.time() - s)
print("use scipy coo matrix")
s = time.time()
x_sum = X.sum(axis=1)
b = np.array(sparse.coo_matrix(
(
x_sum,
(groups, np.arange(len(x_sum)))
),
shape=(g_len, x_len)
).sum(axis=1)).ravel()
print(time.time() - s)
#compare
print(np.abs((a-b)).sum())
print("use scipy csr matrix #Daniel F")
s = time.time()
x_sum = X.sum(axis=1)
c = np.array(sparse.csr_matrix(
(
x_sum,
groups,
np.arange(len(groups)+1)
),
shape=(len(groups), g_len)
).sum(axis=0)).ravel()
print(time.time() - s)
#compare
print(np.abs((a-c)).sum())
print("use bincount #Paul Panzer #Daniel F")
s = time.time()
d = np.bincount(groups, X.sum(axis=1), g_len)
print(time.time() - s)
#compare
print(np.abs((a-d)).sum())
print("use ufunc #Daniel F")
s = time.time()
e = groupby_np(X, groups)
print(time.time() - s)
#compare
print(np.abs((a-e)).sum())
STDOUT
original
0.2882847785949707
use scipy coo matrix
0.012301445007324219
0
use scipy csr matrix #Daniel F
0.01046299934387207
0
use bincount #Paul Panzer #Daniel F
0.007468223571777344
0.0
use ufunc #Daniel F
0.04431319236755371
0
The winner is the bincount solution. But the csr matrix solution is also very interesting.
#klim's sparse matrix solution would at first sight appear to be tied to summation. We can, however, use it in the general case by converting between the csr and csc formats:
Let's look at a small example:
>>> m, n = 3, 8
>>> idx = np.random.randint(0, m, (n,))
>>> data = np.arange(n)
>>>
>>> M = sparse.csr_matrix((data, idx, np.arange(n+1)), (n, m))
>>>
>>> idx
array([0, 2, 2, 1, 1, 2, 2, 0])
>>>
>>> M = M.tocsc()
>>>
>>> M.indptr, M.indices
(array([0, 2, 4, 8], dtype=int32), array([0, 7, 3, 4, 1, 2, 5, 6], dtype=int32))
As we can see after conversion the internal representation of the sparse matrix yields the indices grouped and sorted:
>>> groups = np.split(M.indices, M.indptr[1:-1])
>>> groups
[array([0, 7], dtype=int32), array([3, 4], dtype=int32), array([1, 2, 5, 6], dtype=int32)]
>>>
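With the indices grouped like this, any per-group reduction can be applied; a small sketch using the data vector defined above:

>>> np.array([data[g].sum() for g in groups])
array([ 7,  7, 14])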
We could have obtained the same using a stable argsort:
>>> np.argsort(idx, kind='mergesort')
array([0, 7, 3, 4, 1, 2, 5, 6])
>>>
But sparse matrices are actually faster, even when we allow argsort to use a faster non-stable algorithm:
>>> m, n = 1000, 100000
>>> idx = np.random.randint(0, m, (n,))
>>> data = np.arange(n)
>>>
>>> timeit('sparse.csr_matrix((data, idx, np.arange(n+1)), (n, m)).tocsc()', **kwds)
2.250748165184632
>>> timeit('np.argsort(idx)', **kwds)
5.783584725111723
If we require argsort to keep groups sorted, the difference is even larger:
>>> timeit('np.argsort(idx, kind="mergesort")', **kwds)
10.507467685034499
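The kwds dictionary used in these timeit calls is not shown; a plausible setup for reproducing them (an assumption on my part, not the author's actual settings) would be something like:

from timeit import timeit
# hypothetical settings: 'globals' exposes idx, data, n and m to the timed statements
kwds = dict(globals=globals(), number=1000)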
If you want a more flexible implementation of groupby that can group using any of numpy's ufuncs:
def groupby_np(X, groups, axis=0, uf=np.add, out=None, minlength=0, identity=None):
    if minlength < groups.max() + 1:
        minlength = groups.max() + 1
    if identity is None:
        identity = uf.identity
    i = list(range(X.ndim))
    del i[axis]
    i = tuple(i)
    n = out is None
    if n:
        if identity is None:  # fallback to loops over 0-index for identity
            assert np.all(np.in1d(np.arange(minlength), groups)), "No valid identity for unassigned groups"
            s = [slice(None)] * X.ndim
            for i_ in i:
                s[i_] = 0
            out = np.array([uf.reduce(X[tuple(s)][groups == i]) for i in range(minlength)])
        else:
            out = np.full((minlength,), identity, dtype=X.dtype)
    uf.at(out, groups, uf.reduce(X, i))
    if n:
        return out
groupby_np(X, groups)
array([15, 30])
groupby_np(X, groups, uf = np.multiply)
array([ 0, 3024])
groupby_np(X, groups, uf = np.maximum)
array([5, 9])
groupby_np(X, groups, uf = np.minimum)
array([0, 6])
There's probably a faster way than this (both of the operands are making copies right now), but:
np.bincount(np.broadcast_to(groups, X.T.shape).ravel(), X.T.ravel())
array([ 15., 30.])
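For this particular "sum everything in the group" case, the copies can be avoided by reducing the columns first (a sketch equivalent to the line above, for the X and groups from the question):

np.bincount(groups, weights=X.sum(axis=1))
# array([15., 30.])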
If you want to extend the answer to an ndarray and still have a fast computation, you could extend Daniel's solution:
x_len = 500000
g_len = 200
y_len = 2
X = np.arange(x_len * y_len).reshape(x_len, y_len)
groups = np.random.randint(0, g_len, x_len)
# original
a = np.array([X[groups==i].sum(axis=0) for i in np.unique(groups)])
# alternative
bins = [0] + list(np.bincount(groups, minlength=g_len).cumsum())
Z = np.argsort(groups)
d = np.array([X.take(Z[bins[i]:bins[i+1]],0).sum(axis=0) for i in range(g_len)])
It took about 30 ms (15 ms for creating bins + 15 ms for summing) instead of 280 ms with the original approach in this example.
d.shape
>>> (200, 2)
Let's say I have a 2D array of (N, N) shape:
import numpy as np
my_array = np.random.random((N, N))
Now I want to do some computations only on some "cells" of this array, for instance the ones inside the central part of the array. To avoid doing computations on cells I'm not interested in, what I usually do here is create a Boolean mask, in this spirit:
my_mask = np.zeros_like(my_array, bool)
my_mask[40:61,40:61] = True
my_array[my_mask] = some_twisted_computations(my_array[my_mask])
But what if some_twisted_computations() involves values of the neighboring cells when they are inside the mask? Performance-wise, would it be a good idea to create an "adjacency array" with a (len(my_mask), 4) shape, storing the indices of the 4-connected neighbor cells in the flat my_array[my_mask] array, which I would then use in some_twisted_computations()? If yes, what are the efficient options for computing such an adjacency array? Should I switch to a lower-level language or other data structures?
My real-world arrays' shapes are around (1000, 1000, 1000); the mask concerns only a small subset (~100000) of these values and has a rather complex geometry. I hope my questions make sense...
EDIT: the very dirty and slow solution I've worked out:
wall = mask
i = 0
top_neighbors = []
down_neighbors = []
left_neighbors = []
right_neighbors = []
indices = []
for index, val in np.ndenumerate(wall):
    if not val:
        continue
    indices += [index]
    if wall[index[0] + 1, index[1]]:
        down_neighbors += [(index[0] + 1, index[1])]
    else:
        down_neighbors += [i]
    if wall[index[0] - 1, index[1]]:
        top_neighbors += [(index[0] - 1, index[1])]
    else:
        top_neighbors += [i]
    if wall[index[0], index[1] - 1]:
        left_neighbors += [(index[0], index[1] - 1)]
    else:
        left_neighbors += [i]
    if wall[index[0], index[1] + 1]:
        right_neighbors += [(index[0], index[1] + 1)]
    else:
        right_neighbors += [i]
    i += 1
top_neighbors = [i if type(i) is int else indices.index(i) for i in top_neighbors]
down_neighbors = [i if type(i) is int else indices.index(i) for i in down_neighbors]
left_neighbors = [i if type(i) is int else indices.index(i) for i in left_neighbors]
right_neighbors = [i if type(i) is int else indices.index(i) for i in right_neighbors]
The best answer will probably depend on the nature of the computations you want to do. For example, if they can be expressed as summations over neighboring pixels, then something like np.convolve or scipy.signal.fftconvolve can be a really nice solution.
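For example, a minimal sketch of the convolution idea (an illustration of the suggestion above, not code from the original answer), computing the sum of the 4-connected neighbors of every cell:

import numpy as np
from scipy.signal import fftconvolve

x = np.random.rand(100, 100)
kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]], dtype=float)
neighbor_sums = fftconvolve(x, kernel, mode='same')  # same shape as x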
For your specific question of efficiently generating arrays of neighbor indices, you might try something like this:
x = np.random.rand(100, 100)
mask = x > 0.9
i, j = np.where(mask)
i_neighbors = i[:, np.newaxis] + [0, 0, -1, 1]
j_neighbors = j[:, np.newaxis] + [-1, 1, 0, 0]
# need to do something with the edge cases
# the best choice will depend on your application
# here we'll change out-of-bounds neighbors to the
# central point itself.
i_neighbors = np.clip(i_neighbors, 0, 99)
j_neighbors = np.clip(j_neighbors, 0, 99)
# compute some vectorized result over the neighbors
# as a concrete example, here we'll do a standard deviation
result = x[i_neighbors, j_neighbors].std(axis=1)
The result is an array of values corresponding to the masked region, containing the standard deviation of neighboring values.
Hopefully that approach will work for whatever specific problem you have in mind!
Edit: given the edited question above, here's how my response can be adapted to generate arrays of indices in a vectorized manner:
x = np.random.rand(100, 100)
mask = x > -0.9
i, j = np.where(mask)
i_neighbors = i[:, np.newaxis] + [0, 0, -1, 1]
j_neighbors = j[:, np.newaxis] + [-1, 1, 0, 0]
i_neighbors = np.clip(i_neighbors, 0, 99)
j_neighbors = np.clip(j_neighbors, 0, 99)
indices = np.zeros(x.shape, dtype=int)
indices[mask] = np.arange(len(i))
neighbor_in_mask = mask[i_neighbors, j_neighbors]
neighbors = np.where(neighbor_in_mask,
indices[i_neighbors, j_neighbors],
np.arange(len(i))[:, None])
left_indices, right_indices, top_indices, bottom_indices = neighbors.T
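As a usage sketch (my addition, reusing the arrays built above): the flat values of the masked region can be combined with their in-mask neighbors through these index arrays, with out-of-mask neighbors falling back to the cell itself:

vals = x[mask]
# average each masked cell with its 4-connected neighbors
neighbor_mean = (vals
                 + vals[left_indices] + vals[right_indices]
                 + vals[top_indices] + vals[bottom_indices]) / 5.0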