I've figured out how to write a function that row reduces and solves linear algebra problems. The only issue I'm running into is setting up the if condition that does the partial pivoting when the constant used to row reduce is equal to 0. I've attempted it below; the logic makes sense to me, but I'm struggling to understand why my solution doesn't work.
import numpy as np
def gaussElim(A,B):
    M = np.concatenate((A,B), axis=1) # Combines the two matrices (assuming they have the same number of rows and matrix B has been reshaped into a column vector)
    nr, nc = M.shape
    for r in range (nr):
        const = M[r][r]
        if const == 0: # **This is the condition that is tripping me up**
            for i in range (nr-1):
                M[r][i]=M[r+1][i]
                M[r+1][i] = M[r][i]
                const = M[r][r]
        for c in range (r,nc):
            M[r][c] = M[r][c]/const
        for rr in range(nr):
            if rr != r:
                const = M[rr][r]
                for c in range(r,nc):
                    M[rr][c] = M[rr][c] - const * M[r][c]
    return M[:, nc-1]
Mrx = np.array([ [1.0,3,2,4,3,1], [-4,0,3,2,3,4], [3,-1,3,2,2,5], [3,3,12,2,-6,-4], [-1,-2,-3,7,6,4], [7,5,0,0,4,2] ])
Rhs = np.array([[ 4, 5, 6, 10, 6, -8 ]]) # This is a row vector
RhsT = Rhs.T # Rhs.T is transpose of Rhs and a column vector
S = gaussElim(Mrx,RhsT)
print(S)
A1 = np.array([[2.0,1,-1],[2,1,-2],[1,-1,1]])
b1 = np.array([[1.0],[-2],[2]])
S1 = gaussElim(A1,b1)
print (S1)
x = np.linalg.solve(A1,b1)
print(x)
Yes, I have looked at other people's solutions to Gaussian elimination, but I want to understand specifically why my approach to partial pivoting isn't working. Thank you!
Inputting A1, b1 into my function gives me
[1, 0.8, 1.8]
The correct answer is
[1, 2, 3]
The print statements were so I could see if my function was working as intended.
One issue is this code:
for i in range (nr-1):
    M[r][i]=M[r+1][i]    # line 1
    M[r+1][i] = M[r][i]  # line 2
    const = M[r][r]      # line 3
The line I've commented as line 2 simply undoes the work of line 1. If you're attempting to swap values, try replacing lines 1 and 2 with this:
M[r][i], M[r+1][i] = M[r+1][i], M[r][i]
... or, if you prefer to really be explicit about the swap:
temp = M[r][i]
M[r][i]=M[r+1][i] # line 1
M[r+1][i] = temp # line 2
A second (probably benign) issue in your code is that line 3 above (const = M[r][r]) does not need to be inside the loop as it currently is, and I believe you can outdent one level with no change in the ultimate result.
A third (benign) issue is that M[r][c] = M[r][c]/const can be simplified to M[r][c] /= const, assuming the original arrays A and B (and thus M) are floats.
Similarly, M[rr][c] = M[rr][c] - const * M[r][c] can be simplified to M[rr][c] -= const * M[r][c].
Putting it all together (most importantly, correcting the first issue above):
import numpy as np
def gaussElim(A,B):
    M = np.concatenate((A,B), axis=1) # Combines the two matrices (assuming they have the same number of rows and matrix B has been reshaped into a column vector)
    nr, nc = M.shape
    for r in range (nr):
        const = M[r][r]
        if const == 0: # **This is the condition that is tripping me up**
            for i in range (nr-1):
                M[r][i], M[r+1][i] = M[r+1][i], M[r][i]
                const = M[r][r]
        for c in range (r,nc):
            M[r][c] /= const
        for rr in range(nr):
            if rr != r:
                const = M[rr][r]
                for c in range(r,nc):
                    M[rr][c] -= const * M[r][c]
    return M[:, nc-1]
Mrx = np.array([ [1.0,3,2,4,3,1], [-4,0,3,2,3,4], [3,-1,3,2,2,5], [3,3,12,2,-6,-4], [-1,-2,-3,7,6,4], [7,5,0,0,4,2] ])
Rhs = np.array([[ 4, 5, 6, 10, 6, -8 ]]) # This is a row vector
RhsT = Rhs.T # Rhs.T is transpose of Rhs and a column vector
S = gaussElim(Mrx,RhsT)
print(S)
A1 = np.array([[2.0,1,-1],[2,1,-2],[1,-1,1]])
b1 = np.array([[1.0],[-2],[2]])
S1 = gaussElim(A1,b1)
print (S1)
x = np.linalg.solve(A1,b1)
print(x)
Output:
[-0.97124946 2.88607869 -1.80198876 3.47492434 -6.16688284 4.51794207]
[0.33333333 1.33333333 1. ]
[[1.]
[2.]
[3.]]
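For reference, full partial pivoting is usually done by swapping the entire current row, including the augmented column, with the row below it whose entry in the pivot column has the largest absolute value. Here is a minimal sketch of such a pivot step (the helper name and the singularity check are illustrative, not part of the code above):

import numpy as np

def pivot(M, r):
    """Swap row r of the augmented matrix M with the best pivot row below it."""
    # Offset of the largest absolute entry in column r, among rows r..end
    k = r + np.argmax(np.abs(M[r:, r]))
    if M[k, r] == 0:
        raise ValueError("matrix is singular to working precision")
    if k != r:
        M[[r, k], :] = M[[k, r], :]   # swap the whole rows, right-hand side included

Calling something like pivot(M, r) at the top of the outer loop would also pick a better-conditioned pivot even when the diagonal entry happens to be nonzero.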
Related
There are 2 boxes and a small gap that allows 1 particle per second from one box to enter the other box. Whether a particle will go from A to B, or B to A depends on the ratio Pa/Ptot (Pa: number of particles in box A, Ptot: total particles in both boxes).
To make it faster, I need to get rid of the for loops, but I can't find a way to either vectorize them or turn them into a sparse matrix that represents my for loop:
What about for loops you can't vectorize? The ones where the result at iteration n depends on what you calculated in iteration n-1, n-2, etc. You can define a sparse matrix that represents your for loop and then do a sparse matrix solve.
But I can't figure out how to define a sparse matrix out of this. The simulation boils down to calculating:
Pa(t) = Pa(t-1) + (2*(rand(t)*Ptot > Pa(t-1)) - 1)
where the comparison (rand(t)*Ptot > Pa(t-1)) is the piece that gives me trouble when trying to express my problem as described here. (Note: the contents of the parentheses are a boolean operation.)
Questions:
Can I vectorize the for loop?
If not, how can I define a sparse matrix?
(bonus question) Why is execution about 27x faster in Python (0.027 s) than in Octave (0.75 s)?
Note: I implemented the simulation in both Python and Octave and will soon do it in Matlab, therefore the tags are correct.
Octave code
1; % starting with `function` causes errors
function arr = Px_simulation (Pa_init, Ptot, t_arr)
t_size = size(t_arr);
arr = zeros(t_size); % fixed size array is better than arr = []
rand_arr = rand(t_size); % create all rand values at once
_Pa = Pa_init;
for _j=t_arr()
if (rand_arr(_j) * Ptot > _Pa)
_Pa += 1;
else
_Pa -= 1;
endif
arr(_j) = _Pa;
endfor
endfunction
t = 1:10^5;
for _i=1:3
Ptot = 100*10^_i;
tic()
Pa_simulation = Px_simulation(Ptot, Ptot, t);
toc()
subplot(2,2,_i);
plot(t, Pa_simulation, "-2;simulation;")
title(strcat("{P}_{a0}=", num2str(Ptot), ',P=', num2str(Ptot)))
endfor
Python
import numpy
import matplotlib.pyplot as plt
import timeit
import cpuinfo
from random import random
print('\nCPU: {}'.format(cpuinfo.get_cpu_info()['brand']))
PARTICLES_COUNT_LST = [1000, 10000, 100000]
DURATION = 10**5
t_vals = numpy.linspace(0, DURATION, DURATION)
def simulation(na_initial, ntotal, tvals):
    shape = numpy.shape(tvals)
    arr = numpy.zeros(shape)
    na_current = na_initial
    for i in range(len(tvals)):
        if random() > (na_current/ntotal):
            na_current += 1
        else:
            na_current -= 1
        arr[i] = na_current
    return arr
plot_lst = []
for i in PARTICLES_COUNT_LST:
    start_t = timeit.default_timer()
    n_a_simulation = simulation(na_initial=i, ntotal=i, tvals=t_vals)
    execution_time = (timeit.default_timer() - start_t)
    print('Execution time: {:.6}'.format(execution_time))
    plot_lst.append(n_a_simulation)
for i in range(len(PARTICLES_COUNT_LST)):
    plt.subplot(2, 2, i + 1)  # subplot positions are 1-based
    plt.plot(t_vals, plot_lst[i], 'r')
    plt.grid(linestyle='dotted')
    plt.xlabel("time [s]")
    plt.ylabel("Particles in box A")
plt.show()
IIUC you can use cumsum() in both Octave and Numpy:
Octave:
>> p = rand(1, 5);
>> r = rand(1, 5);
>> p
p =
0.43804 0.37906 0.18445 0.88555 0.58913
>> r
r =
0.70735 0.41619 0.37457 0.72841 0.27605
>> cumsum (2*(p<(r+0.03)) - 1)
ans =
1 2 3 2 1
>> (2*(p<(r+0.03)) - 1)
ans =
1 1 1 -1 -1
Note that the expression 2*(p<(r+0.03)) - 1 above maps each boolean comparison onto the values [-1, 1], which is exactly the per-step change that cumsum accumulates.
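For the NumPy side, here is a minimal sketch of the same cumsum idea (the function name and parameters are illustrative, not from the original post). Note that the real simulation's threshold depends on the running value of Pa, so a fixed-threshold cumsum like this only demonstrates the vectorization pattern:

import numpy as np

def simulation_cumsum(na_initial, ntotal, nsteps):
    # One random draw per time step, all generated up front
    r = np.random.random(nsteps)
    # Each comparison becomes a +1 or -1 step, exactly like 2*(p<r) - 1 above
    steps = 2 * (r > na_initial / ntotal) - 1
    # Accumulate the steps on top of the initial particle count
    return na_initial + np.cumsum(steps)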
I want to obtain a list (or array, doesn't matter) of A from the following formula:
A_i = X_(k!=i) * S_(k!=i) * X'_(k!=i)
where:
X is a vector (and X' is the transpose of X), S is a matrix, and the subscript k is defined as {k=1,2,3,...n| k!=i}.
X = [x1, x2, ..., xn]
S = [[s11, s12, ..., s1n],
     [s21, s22, ..., s2n],
     [...  ...  ...  ...],
     [sn1, sn2, ..., snn]]
I take the following as an example:
X = [0.1,0.2,0.3,0.5]
S = [[0.4, 0.1, 0.3, 0.5],
     [2, 1.5, 2.4, 0.6],
     [0.4, 0.1, 0.3, 0.5],
     [2, 1.5, 2.4, 0.6]]
So, eventually, I would get a list of four values for A.
I did this:
import numpy as np
x = np.array([0.1,0.2,0.3,0.5])
s = np.matrix([[0.4,0.1,0.3,0.5],[2,1.5,2.4,0.6],[0.4,0.1,0.3,0.5],[2,1.5,2.4,0.6]])
for k in range(x) if k!=i
A = (x.dot(s)).dot(np.transpose(x))
print (A)
I am confused about how to use a conditional 'for' loop. Could you please help me solve it? Thanks.
EDIT:
Just to explain more. If you take i=1, then the formula will be:
A_1 = X_(k!=1) * S_(k!=1) * X'_(k!=1)
So any array (or value) associated with subscript 1 will be deleted from X and S, like:
X = [0.2,0.3,0.5]
S = [[1.5, 2.4, 0.6],
     [0.1, 0.3, 0.5],
     [1.5, 2.4, 0.6]]
Step 1: correctly calculate A_i
Step 2: collect them into A
I assume what you want to calculate is the scalar
A_i = sum over all k != i and l != i of x_k * S_kl * x_l
An easy way to do so is to mask away the entries using masked arrays. This way we don't need to delete or copy any matrices.
# sample
x = np.array([1,2,3,4])
s = np.diag([4,5,6,7])
# we will use masked arrays to remove k=i
vec_mask = np.zeros_like(x)
matrix_mask = np.zeros_like(s)
i = 0 # start
# set masks
vec_mask[i] = 1
matrix_mask[i] = matrix_mask[:,i] = 1
s_mask = np.ma.array(s, mask=matrix_mask)
x_mask = np.ma.array(x, mask=vec_mask)
# reduced product, remember to use np.ma.inner instead of np.inner
Ai = np.ma.inner(np.ma.inner(x_mask, s_mask), x_mask.T)
vec_mask[i] = 0
matrix_mask[i] = matrix_mask[:,i] = 0
Since terms of 0 don't add to the sum, we can actually skip masking the matrix and just mask the vector:
# we will use masked arrays to remove k=i
mask = np.zeros_like(x)
i = 0 # start
# set masks
mask[i] = 1
x_mask = np.ma.array(x, mask=mask)
# reduced product
Ai = np.ma.inner(np.ma.inner(x_mask, s), x_mask.T)
# unset mask
mask[i] = 0
The final step is to assemble A out of the A_is, so in total we get
x = np.array([1,2,3,4])
s = np.diag([4,5,6,7])
mask = np.zeros_like(x)
x_mask = np.ma.array(x, mask=mask)

A = []
for i in range(len(x)):
    x_mask.mask[i] = 1
    Ai = np.ma.inner(np.ma.inner(x_mask, s), x_mask.T)
    A.append(Ai)
    x_mask.mask[i] = 0
A_vec = np.array(A)
Implementing a matrix/vector product using loops will be rather slow in Python. Therefore, I suggest actually deleting the rows/columns/elements at the given index and performing the fast built-in dot product without any explicit loops:
i = 0 # don't forget Python's indices are zero-based
x_ = np.delete(X, i) # remove element
s_ = np.delete(S, i, axis=0) # remove row
s_ = np.delete(s_, i, axis=1) # remove column
result = x_.dot(s_).dot(x_) # no need to transpose a 1-D array
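A small usage sketch that applies this np.delete approach to every i, using the X and S from the question (the loop and variable names are just illustrative):

import numpy as np

X = np.array([0.1, 0.2, 0.3, 0.5])
S = np.array([[0.4, 0.1, 0.3, 0.5],
              [2.0, 1.5, 2.4, 0.6],
              [0.4, 0.1, 0.3, 0.5],
              [2.0, 1.5, 2.4, 0.6]])

A = []
for i in range(len(X)):
    x_ = np.delete(X, i)                                # remove element i
    s_ = np.delete(np.delete(S, i, axis=0), i, axis=1)  # remove row and column i
    A.append(x_.dot(s_).dot(x_))
print(A)  # one scalar A_i per i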
The code is below:
import numpy as np
X = np.array(range(15)).reshape(5,3) # X's element value is meaningless
flag = np.random.randn(5,4)
y = np.array([0, 1, 2, 3, 0]) # y's elements are in range(flag.shape[1]) and y.shape[0] equals X.shape[0]
dW = np.zeros((3, 4)) # dW.shape equals (X.shape[1], flag.shape[1])
for i in range(5):
    for j in range(4):
        if flag[i,j] > 0:
            dW[:,j] += X[i,:].T
            dW[:,y[i]] -= X[i,:].T
To compute dW more efficiently, how can I vectorize this for loop?
Here's how I'd do it:
# has shape (x.shape[1],) + flag.shape
masked = np.where(flag > 0, X.T[...,np.newaxis], 0)
# sum over the i index
dW = masked.sum(axis=1)
# sum over the j index
np.subtract.at(dW, np.s_[:,y], masked.sum(axis=2))
# dW[:,y] -= masked.sum(axis=2) does not work here
See the documentation of ufunc.at for an explanation of that last comment.
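To see concretely why the unbuffered ufunc.at form is needed, here is a tiny demonstration (illustrative values, not from the original answer): with repeated indices, the fancy-indexed -= applies the update only once per unique index, while np.subtract.at applies it once per occurrence.

import numpy as np

idx = np.array([0, 0, 2])   # index 0 appears twice

a = np.zeros(3)
a[idx] -= 1                 # buffered: index 0 is decremented only once
print(a)                    # [-1.  0. -1.]

b = np.zeros(3)
np.subtract.at(b, idx, 1)   # unbuffered: index 0 is decremented twice
print(b)                    # [-2.  0. -1.]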
Here's a vectorized approach based upon np.add.reduceat -
# --------------------- Setup output array ----------------------------------
dWOut = np.zeros((X.shape[1], flag.shape[1]))
# ------ STAGE #1 : Vectorize calculations for "dW[:,j] += X[i,:].T" --------
# Get indices where flag's transposed version has > 0
idx1 = np.argwhere(flag.T > 0)
# Row-extended version of X using idx1's col2 that corresponds to i-iterator
X_ext1 = X[idx1[:,1]]
# Get the indices at which the columns change
shift_idx1 = np.append(0,np.where(np.diff(idx1[:,0])>0)[0]+1)
# Use the changing indices as boundaries for add.reduceat to add
# groups of rows from extended version of X
dWOut[:,np.unique(idx1[:,0])] += np.add.reduceat(X_ext1,shift_idx1,axis=0).T
# ------ STAGE #2 : Vectorize calculations for "dW[:,y[i]] -= X[i,:].T" -------
# Repeat the same philosophy for this second stage, except we need to index into y.
# So, that would involve sorting and also the iterator involved is just "i".
idx2 = idx1[idx1[:,1].argsort()]
cols_idx1 = y[idx2[:,1]]
X_ext2 = X[idx2[:,1]]
sort_idx = (y[idx2[:,1]]).argsort()
X_ext2 = X_ext2[sort_idx]
shift_idx2 = np.append(0,np.where(np.diff(cols_idx1[sort_idx])>0)[0]+1)
dWOut[:,np.unique(cols_idx1)] -= np.add.reduceat(X_ext2,shift_idx2,axis=0).T
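One way to sanity-check a vectorized version like this is to compare it against the original double loop from the question (a sketch reusing X, flag and y as defined above):

# Reference result computed with the straightforward loops
dW_loop = np.zeros((X.shape[1], flag.shape[1]))
for i in range(X.shape[0]):
    for j in range(flag.shape[1]):
        if flag[i, j] > 0:
            dW_loop[:, j] += X[i, :]
            dW_loop[:, y[i]] -= X[i, :]

print(np.allclose(dWOut, dW_loop))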
You can do this:
ff = (flag > 0) * 1
ff = ff.reshape((5, 4, 1, 1))
XX = ff * X
[ii, jj] = np.meshgrid(np.arange(5), np.arange(4))
dW[:, jj] += XX[ii, jj, ii, :].transpose((2, 0, 1))
dW[:, y[ii]] -= XX[ii, jj, ii, :].transpose((2, 0, 1))
You can further merge and fold these expressions to get a one-liner but it won't add any more performance.
Update #1: Yep, sorry this is not giving correct results, I had a typo in my check
I am aware that numpy arrays are pointer arrays. And I know that it is possible to define pointers in Python. But I am wondering: if I make a variable equal to an element of a numpy vector, is it still a pointer, or is it de-referenced? Is there a way I can find out or test this?
Example
import scipy
vec = scipy.randn(10)
vecptr = vec # vecptr is a pointer to vec
vecval = scipy.copy(vec) # vecval is not a pointer.
var = vec[3] # is var a pointer, or is it copied by value???
print(type(var)) # returns numpy.float64. Does this mean it's a 1x1 numpy vector and therefore a pointer?
The reason I ask is that what I really want to know is: will the code below double my memory usage? I am trying to give more meaningful variable names to the elements of a vector that is returned.
v = self.viewCoefs[sz][sv][sa]
gw = v[0]
G0 = v[1]
G1 = v[2]
G2 = v[3]
alpha0 = v[4]
alpha1 = v[5]
alpha2 = v[6]
beta0 = v[7]
beta1 = v[8]
beta2 = v[9]
beta3 = v[10]
gamma0 = v[11]
gamma1 = v[12]
gamma2 = v[12]
gamma3 = v[12]
gamma4 = v[13]
delta0 = v[14]
delta1 = v[15]
delta2 = v[16]
delta3 = v[17]
delta4 = v[18]
delta5 = v[19]
zeta_prime_0 = v[20]
zeta_prime_1 = v[21]
zeta_prime_2 = v[22]
Gamma_prime_0 = v[23]
Gamma_prime_1 = v[24]
Gamma_prime_2 = v[25]
Gamma_prime_3 = v[26]
Because I have lots of these to follow:
p0 = alpha0 + alpha1*scipy.log(bfrac) + alpha2*scipy.log(bfrac)**2
p1 = beta0 + beta1*scipy.log(bfrac) + beta2*scipy.log(bfrac)**2 + beta3*scipy.log(bfrac)**3
p2 = gamma0 + gamma1*scipy.log(bfrac) + gamma2*scipy.log(bfrac)**2 + gamma3*scipy.log(bfrac)**3 + gamma4*scipy.log(bfrac)**4
p3 = delta0 + delta1*scipy.log(bfrac) + delta2*scipy.log(bfrac)**2 + delta3*scipy.log(bfrac)**3 + delta4*scipy.log(bfrac)**4 + delta5*scipy.log(bfrac)**5
subSurfRrs = g*(p0*u + p1*u**2 + p2*u**3 + p3*u**4)
## and lots more
So I would like meaningful variable names without doubling my memory footprint.
Okay, if I've got it right, the solution to NOT doubling my memory is:
v = self.viewCoefs[sz][sv][sa]
gw = v[0:1]
G0 = v[1:2]
G1 = v[2:3]
alpha0 = v[3:4]
alpha1 = v[4:5]
alpha2 = v[5:6]
beta0 = v[6:7]
beta1 = v[7:8]
beta2 = v[8:9]
beta3 = v[9:10]
## etc
p0 = alpha0[0] + alpha1[0]*scipy.log(bfrac) + alpha2[0]*scipy.log(bfrac)**2
p1 = beta0[0] + beta1[0]*scipy.log(bfrac) + beta2[0]*scipy.log(bfrac)**2 + beta3[0]*scipy.log(bfrac)**3
## etc
You almost have it, but here is how to create a view of a single element:
In [1]: import numpy as np
In [23]: v = np.arange(10)
In [24]: a = v[3:4]
In [25]: a[0] = 100
In [26]: v
Out[26]: array([ 0, 1, 2, 100, 4, 5, 6, 7, 8, 9])
Here a is a view of the fourth element of v, so when you change a you change the corresponding position in v.
Views are very useful, and using them well can help save quite a bit of memory, but in your case I don't think views are appropriate. While a view does reuse the underlying data, I would not call it a pointer. Each view is a unique ndarray object, meaning it has its own properties, for example shape:
In [4]: a = np.arange(7)
In [5]: b = a[1:5]
In [6]: b.shape = (2,2)
In [7]: b
Out[7]:
array([[1, 2],
[3, 4]])
In [8]: a.shape
Out[8]: (7,)
so when you do b = a[0:1], you're creating a brand new ndarray object to hold one int/float/... or whatever. If you want to have meaningful names for each element of your array, you're probably not going to get much more efficient than:
v = self.viewCoefs[sz][sv][sa]
gw = v[0]
G0 = v[1]
G1 = v[2]
G2 = v[3]
alpha0 = v[4]
## etc
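If you want to check for yourself which assignments share memory with v, np.shares_memory makes this easy (a small sketch with illustrative variable names):

import numpy as np

v = np.arange(5.0)
view = v[0:1]        # a one-element view into v's buffer
elem = v[0]          # a numpy scalar holding a copy of the value

print(np.shares_memory(v, view))   # True: same underlying data
view[0] = 99.0
print(v[0])                        # 99.0 -- writing through the view changed v
elem = 7.0
print(v[0])                        # still 99.0 -- rebinding elem never touches v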
That being said, you should try and see if there is a better way to vectorize your code, meaning try to write your code as operations on arrays instead of operations on elements of arrays. For example you might write:
coefs = np.zeros((5,5))
lt = np.tril_indices(5)
coefs[lt] = self.viewCoefs[sz][sv][sa]
p = (coefs * scipy.log(bfrac)**[1, 2, 3, 4, 5]).sum(-1)
subSurfRrs = g*(p*u**[1, 2, 3, 4]).sum()
Vectorized code can be much faster when using numpy. In this case we also exploit numpy's broadcasting, which I thought was very confusing until I got to know it a little better and realized how useful it could be.
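As a tiny illustration of the broadcasting being exploited here (the numbers are made up purely to show the shapes): a (5, 5) array multiplied by a length-5 vector broadcasts the vector across each row, so all the per-row sums come out of one expression.

import numpy as np

coefs = np.arange(25.0).reshape(5, 5)        # shape (5, 5)
powers = 0.5 ** np.array([1, 2, 3, 4, 5])    # shape (5,), like log(bfrac)**[1..5]
p = (coefs * powers).sum(-1)                 # powers broadcast across each row
print(p.shape)                               # (5,)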
Assume you have an array of values that will need to be summed together
d = [1,1,1,1,1]
and a second array specifying which elements need to be summed together
i = [0,0,1,2,2]
The result will be stored in a new array of size max(i)+1. So for example i=[0,0,0,0,0] would be equivalent to summing all the elements of d and storing the result at position 0 of a new array of size 1.
I tried to implement this using
c = zeros(max(i)+1)
c[i] += d
However, the += operation adds each element only once, thus giving the unexpected result of
[1,1,1]
instead of
[2,1,2]
How would one correctly implement this kind of summation?
If I understand the question correctly, there is a fast function for this (as long as the data array is 1d)
>>> i = np.array([0,0,1,2,2])
>>> d = np.array([0,1,2,3,4])
>>> np.bincount(i, weights=d)
array([ 1., 2., 7.])
np.bincount returns an array with an entry for every integer in range(max(i)+1), even if some counts are zero
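Another option worth knowing (not from the original answer) is np.add.at, which performs exactly the unbuffered, per-occurrence addition that the plain c[i] += d cannot:

import numpy as np

d = np.array([1, 1, 1, 1, 1])
i = np.array([0, 0, 1, 2, 2])
c = np.zeros(i.max() + 1)
np.add.at(c, i, d)   # adds d[j] to c[i[j]] once per occurrence of each index
print(c)             # [2. 1. 2.]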
Juh_'s comment is the most efficient solution. Here's working code:
import numpy as np
import scipy.ndimage as ni
i = np.array([0,0,1,2,2])
d = np.array([0,1,2,3,4])
n_indices = i.max() + 1
print(ni.sum(d, i, np.arange(n_indices)))
This solution should be more efficient for large arrays (it iterates over the possible index values instead of the individual entries of i):
import numpy as np
i = np.array([0,0,1,2,2])
d = np.array([0,1,2,3,4])
i_max = i.max()
c = np.empty(i_max+1)
for j in range(i_max+1):
    c[j] = d[i==j].sum()
print(c)
[1. 2. 7.]
def zeros(ilen):
    r = []
    for i in range(0, ilen):
        r.append(0)
    return r

i_list = [0,0,1,2,2]
d = [1,1,1,1,1]
result = zeros(max(i_list)+1)
for pos, index in enumerate(i_list):
    result[index] += d[pos]   # add the value at position pos to its label's slot
print(result)
In the general case, when you want to sum sub-matrices by label, you can use the following code:
import numpy as np
from scipy.sparse import coo_matrix
def labeled_sum1(x, labels):
    P = coo_matrix((np.ones(x.shape[0]), (labels, np.arange(len(labels)))))
    res = P.dot(x.reshape((x.shape[0], np.prod(x.shape[1:]))))
    return res.reshape((res.shape[0],) + x.shape[1:])

def labeled_sum2(x, labels):
    res = np.empty((np.max(labels) + 1,) + x.shape[1:], x.dtype)
    for i in np.ndindex(x.shape[1:]):
        res[(...,)+i] = np.bincount(labels, x[(...,)+i])
    return res
The first method uses sparse matrix multiplication. The second one is a generalization of user333700's answer. Both methods have comparable speed:
x = np.random.randn(100000, 10, 10)
labels = np.random.randint(0, 1000, 100000)
%time res1 = labeled_sum1(x, labels)
%time res2 = labeled_sum2(x, labels)
np.all(res1 == res2)
Output:
Wall time: 73.2 ms
Wall time: 68.9 ms
True