I have two lists: one with indexes (unique, obviously) and one with values. Is it possible to convert them into a fixed-size numpy array efficiently?
indexes = [1,2,6,7]
values = [0.2,0.5,0.6,0.2]
size = 10
What I want to output is:
print(magic_func(indexes,values,size))
array([0, 0.2, 0.5, 0, 0, 0, 0.6, 0.2, 0, 0])
It's easy in two lines, if you want:
In [1]: import numpy as np
In [2]: arr = np.zeros(size)
In [3]: arr[indexes] = values
In [4]: arr
Out[4]: array([ 0. , 0.2, 0.5, 0. , 0. , 0. , 0.6, 0.2, 0. , 0. ])
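If you want it under the name used in the question, the two lines can be wrapped in a function. This is only a minimal sketch: magic_func is the asker's placeholder name, and it assumes every index is smaller than size.

```python
import numpy as np

def magic_func(indexes, values, size):
    """Return a zero-filled array of length `size` with `values` placed at `indexes`."""
    arr = np.zeros(size)
    arr[indexes] = values  # fancy-index assignment, no Python loop
    return arr

result = magic_func([1, 2, 6, 7], [0.2, 0.5, 0.6, 0.2], 10)
print(result)
```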
I have this delta function, which has 3 cases: mask1, mask2, and, if neither of them is satisfied, delta = 0, since res starts out as np.zeros.
def delta(r, dr):
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    res[mask1] = (5-3*np.abs(r[mask1])/dr \
        - np.sqrt(-3*(1-np.abs(r[mask1])/dr)**2+1)) \
        /(6*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    res[mask2] = (1+np.sqrt(-3*(r[mask2]/dr)**2+1))/(3*dr)
    return res
Then I have this other function where I call the former and I construct an array, E
def matrix_E(nk,X,Y,xhi,eta,dx,dy):
    rx = abs(X[np.newaxis,:] - xhi[:,np.newaxis])
    ry = abs(Y[np.newaxis,:] - eta[:,np.newaxis])
    deltx = delta(rx,dx)
    delty = delta(ry,dy)
    E = deltx*delty
    return E
The thing is that most of the elements of E (about 99%) fall into the third case of delta, i.e. they are 0.
So I would like to have a sparse matrix instead of a dense one, and not store the 0 elements, in order to save memory.
Any ideas on how I could do it?
The normal way to create a sparse matrix is to construct three 1d arrays, with the nonzero values, and their i and j indexes. Then pass them to the coo_matrix function.
The coordinates don't have to be in order, so you could construct the arrays for the 2 nonzero mask cases and concatenate them.
Here's a sample construction using 2 masks
In [107]: x=np.arange(5)
In [108]: i,j,data=[],[],[]
In [110]: mask1=x%2==0
In [111]: mask2=x%2!=0
In [112]: i.append(x[mask1])
In [113]: j.append((x*2)[mask1])
In [114]: i.append(x[mask2])
In [115]: j.append(x[mask2])
In [116]: i=np.concatenate(i)
In [117]: j=np.concatenate(j)
In [118]: i
Out[118]: array([0, 2, 4, 1, 3])
In [119]: j
Out[119]: array([0, 4, 8, 1, 3])
In [120]: M=sparse.coo_matrix((x,(i,j)))
In [121]: print(M)
(0, 0) 0
(2, 4) 1
(4, 8) 2
(1, 1) 3
(3, 3) 4
In [122]: M.A
Out[122]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 4, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 2]])
A coo format stores those 3 arrays as is, but they get sorted and cleaned up when converted to other formats and printed.
I can work on adapting this to your case, but this may be enough to get you started.
It looks like X, Y, xhi and eta are 1d arrays; rx and ry are then 2d. delta returns a result with the same shape as its input. E = deltx*delty suggests that deltx and delty are the same shape (or at least broadcastable).
Since a sparse matrix has a .multiply method for element-wise multiplication, we can focus on producing sparse delta matrices.
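As a quick illustration of .multiply, here is a minimal sketch with two small made-up matrices (the values are arbitrary, not taken from the question). Only positions that are nonzero in both operands survive the element-wise product.

```python
import numpy as np
from scipy import sparse

# Two small sparse matrices with made-up values
a = sparse.csr_matrix(np.array([[0., 2., 0.],
                                [3., 0., 4.]]))
b = sparse.csr_matrix(np.array([[5., 6., 0.],
                                [0., 0., 7.]]))

prod = a.multiply(b)    # element-wise product, stays sparse
dense = prod.toarray()  # densify only to inspect the result
print(dense)
```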
If you can afford the memory to make rx and a couple of masks, then you can also afford to make deltx (all the same size). Even though deltx has lots of zeros, it is probably fastest to make it dense.
But let's try to cast the delta calculation as a sparse build.
This looks like the essence of what you are doing in delta, at least with one mask:
Start with a 2d array:
In [138]: r = np.arange(24).reshape(4,6)
In [139]: mask1 = (r>=8) & (r<=16)
In [140]: res1 = r[mask1]*0.2
In [141]: I,J = np.where(mask1)
the resulting vectors are:
In [142]: I
Out[142]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [143]: J
Out[143]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [144]: res1
Out[144]: array([ 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2])
Make a sparse matrix:
In [145]: M=sparse.coo_matrix((res1,(I,J)), r.shape)
In [146]: M.A
Out[146]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
I could make another sparse matrix with mask2, and add the two.
In [147]: mask2 = (r>=17) & (r<=22)
In [148]: res2 = r[mask2]*-0.4
In [149]: I,J = np.where(mask2)
In [150]: M2=sparse.coo_matrix((res2,(I,J)), r.shape)
In [151]: M2.A
Out[151]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
...
In [153]: (M1+M2).A
Out[153]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Or I could concatenate the res1 and res2, etc and make one sparse matrix:
In [156]: I1,J1 = np.where(mask1)
In [157]: I2,J2 = np.where(mask2)
In [158]: res12=np.concatenate((res1,res2))
In [159]: I12=np.concatenate((I1,I2))
In [160]: J12=np.concatenate((J1,J2))
In [161]: M12=sparse.coo_matrix((res12,(I12,J12)), r.shape)
In [162]: M12.A
Out[162]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Here I chose the masks so the nonzero values don't overlap, but both methods would still work if they did. It's a deliberate design feature of the coo format that values for repeated indices are summed. It's a very handy feature when creating sparse matrices for finite element problems.
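To see the summing of repeated indices in isolation, here is a minimal sketch; the index and value arrays are made up for illustration.

```python
import numpy as np
from scipy import sparse

# The coordinate (0, 1) appears twice on purpose
rows = np.array([0, 0, 1])
cols = np.array([1, 1, 2])
data = np.array([1.0, 2.0, 5.0])

M = sparse.coo_matrix((data, (rows, cols)), shape=(2, 3))
dense = M.toarray()  # the duplicates are summed during conversion
print(dense)
```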
I can also get index arrays by creating a sparse matrix from the mask:
In [179]: rmask1=sparse.coo_matrix(mask1)
In [180]: rmask1.row
Out[180]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [181]: rmask1.col
Out[181]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [184]: sparse.coo_matrix((res1, (rmask1.row, rmask1.col)),rmask1.shape).A
Out[184]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
I can't, though, create a mask such as (r>=8) & (r<=16) from a sparse version of r. That kind of inequality test has not been implemented for sparse matrices. But that might not matter, since r is probably not sparse.
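Putting the pieces together, the delta from the question could be built sparse along these lines. Treat it as a sketch under assumptions: delta_sparse is a hypothetical name, the grid values are made up, and r is assumed non-negative (it comes from abs in matrix_E). It still creates the dense r and the two masks, so only the result is stored sparsely.

```python
import numpy as np
from scipy import sparse

def delta_sparse(r, dr):
    """Sparse version of the question's delta; assumes r >= 0 and dr > 0."""
    mask1 = (r >= 0.5 * dr) & (r <= 1.5 * dr)
    mask2 = ~mask1 & (r <= 0.5 * dr)

    # Values for each case, computed only where the mask is True
    x1 = r[mask1] / dr
    v1 = (5 - 3 * x1 - np.sqrt(-3 * (1 - x1) ** 2 + 1)) / (6 * dr)
    x2 = r[mask2] / dr
    v2 = (1 + np.sqrt(-3 * x2 ** 2 + 1)) / (3 * dr)

    # Coordinates come out in the same row-major order as r[mask]
    i1, j1 = np.nonzero(mask1)
    i2, j2 = np.nonzero(mask2)

    data = np.concatenate((v1, v2))
    rows = np.concatenate((i1, i2))
    cols = np.concatenate((j1, j2))
    return sparse.coo_matrix((data, (rows, cols)), shape=r.shape).tocsr()

# Made-up 1d grids, only to exercise the function
X = np.linspace(0.0, 1.0, 50)
xhi = np.linspace(0.2, 0.8, 7)
dx = X[1] - X[0]
rx = np.abs(X[np.newaxis, :] - xhi[:, np.newaxis])
deltx = delta_sparse(rx, dx)
print(deltx.shape, deltx.nnz)
```

With a second matrix delty built the same way from ry and dy, E would then be deltx.multiply(delty).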
I have several sparse vectors represented as lists of tuples eg.
[[(22357, 0.6265631775164965),
(31265, 0.3900572375543419),
(44744, 0.4075397480094991),
(47751, 0.5377595092643747)],
[(22354, 0.6265631775164965),
(31261, 0.3900572375543419),
(42344, 0.4075397480094991),
(47751, 0.5377595092643747)],
...
]
My goal is to build a scipy.sparse.csr_matrix from several million vectors like this.
Is there a simple, elegant way to do this kind of conversion without trying to hold everything in memory at once?
EDIT:
Just a clarification: my goal is to build the 2d matrix in which each of my sparse vectors represents one row.
Collecting the indices and data into a structured array avoids the integer-to-double conversion issue. In limited testing it is also a bit faster than the vstack approach (with list data like this, np.array is faster than np.vstack).
indptr = np.cumsum([0]+[len(i) for i in vectors])
aa = np.array(vectors,dtype='i,f').flatten()
A = sparse.csr_matrix((aa['f1'], aa['f0'], indptr))
I substituted the list comprehension for map since I'm using Python3.
Indices in the coo format (data, (i,j)) might be more intuitive:
ii = [[i]*len(v) for i,v in enumerate(vectors)]
ii = np.array(ii).flatten()
aa = np.array(vectors,dtype='i,f').flatten()
A2 = sparse.coo_matrix((aa['f1'],(np.array(ii), aa['f0'])))
# A2.tocsr()
Here, ii from the 1st step is the row numbers for each sublist.
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
...]
This construction method is slower than the csr direct indptr.
For a case where the number of entries differs from row to row, this approach works (using itertools.chain to flatten the lists):
A sample list (no empty rows for now):
In [779]: vectors=[[(1, .12),(3, .234),(6,1.23)],
[(2,.222)],
[(2,.23),(1,.34)]]
row indexes:
In [780]: ii=[[i]*len(v) for i,v in enumerate(vectors)]
In [781]: ii=list(chain(*ii))
column and data values pulled from tuples and flattened
In [782]: jj=[j for j,_ in chain(*vectors)]
In [783]: data=[d for _,d in chain(*vectors)]
In [784]: ii
Out[784]: [0, 0, 0, 1, 2, 2]
In [785]: jj
Out[785]: [1, 3, 6, 2, 2, 1]
In [786]: data
Out[786]: [0.12, 0.234, 1.23, 0.222, 0.23, 0.34]
In [787]: A=sparse.csr_matrix((data,(ii,jj))) # coo style input
In [788]: A.A
Out[788]:
array([[ 0. , 0.12 , 0. , 0.234, 0. , 0. , 1.23 ],
[ 0. , 0. , 0.222, 0. , 0. , 0. , 0. ],
[ 0. , 0.34 , 0.23 , 0. , 0. , 0. , 0. ]])
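For several million rows it may be preferable to consume the vectors one at a time and grow the three CSR arrays directly, so the nested list never has to be converted in one go. The following is only a sketch: csr_from_rows and its n_cols parameter are hypothetical names, and it assumes the rows come from any iterable (a list or a generator). Empty rows are handled, because indptr simply repeats its previous value.

```python
import numpy as np
from scipy import sparse

def csr_from_rows(rows, n_cols=None):
    """Build a CSR matrix from an iterable of [(column, value), ...] rows."""
    indptr = [0]
    indices = []
    data = []
    for row in rows:
        for j, v in row:
            indices.append(j)
            data.append(v)
        indptr.append(len(indices))  # end of this row in the flat arrays
    shape = None if n_cols is None else (len(indptr) - 1, n_cols)
    return sparse.csr_matrix(
        (np.array(data, dtype=float),
         np.array(indices, dtype=np.int64),
         np.array(indptr, dtype=np.int64)),
        shape=shape)

vectors = [[(1, .12), (3, .234), (6, 1.23)],
           [],                       # an empty row
           [(2, .23), (1, .34)]]
A = csr_from_rows(iter(vectors), n_cols=7)
print(A.toarray())
```

The column indices stay integers throughout, so there is no integer-to-double round trip.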
Consider the following:
import numpy as np
from scipy.sparse import csr_matrix
vectors = [[(22357, 0.6265631775164965),
(31265, 0.3900572375543419),
(44744, 0.4075397480094991),
(47751, 0.5377595092643747)],
[(22354, 0.6265631775164965),
(31261, 0.3900572375543419),
(42344, 0.4075397480094991),
(47751, 0.5377595092643747)]]
indptr = np.cumsum([0] + list(map(len, vectors)))  # list() is needed on Python 3, where map returns an iterator
indices, data = np.vstack(vectors).T
A = csr_matrix((data, indices.astype(int), indptr))
Unfortunately, this way the column indices are converted from integers to doubles and back. This works correctly for up to very large matrices, but is not ideal.
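The "very large matrices" limit can be made concrete: a double represents every integer up to 2**53 exactly, so the round trip is lossless for any realistic column index. A small sketch (the variable names are just for illustration):

```python
import numpy as np

exact_limit = 2 ** 53  # largest range in which every integer fits in a float64

# Column indices like those in the question survive the round trip unchanged
cols = np.array([22357, 31265, 44744, 47751], dtype=np.float64).astype(np.int64)

ok_below = float(exact_limit) == exact_limit
lossy_above = int(float(exact_limit + 1)) != exact_limit + 1
print(cols, ok_below, lossy_above)
```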
I searched the net for a guide to SciPy sparse matrices and did not find one. I would be happy if anybody could share a source, but now to the question:
I have an array of tuples. I want to turn it into a sparse matrix in which the tuples appear on the main diagonal and the diagonal just next to it, as the following example shows. What is the fancy (efficient) way of doing it?
import numpy as np
A=np.asarray([[1,2],[3,4],[5,6],[7,8]])
B=np.zeros((A.shape[0],A.shape[0]+1))
for i in range(A.shape[0]):
    B[i,i]=A[i,0]
    B[i,i+1]=A[i,1]
print(B)
Output being:
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]
You can build those really fast as a CSR matrix:
>>> A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
>>> rows = len(A)
>>> cols = rows + 1
>>> data = A.flatten() # we want a copy
>>> indptr = np.arange(0, len(data)+1, 2) # 2 non-zero entries per row
>>> indices = np.repeat(np.arange(cols), [1] + [2] * (cols-2) + [1])
>>> import scipy.sparse as sps
>>> a_sps = sps.csr_matrix((data, indices, indptr), shape=(rows, cols))
>>> a_sps.A
array([[1, 2, 0, 0, 0],
[0, 3, 4, 0, 0],
[0, 0, 5, 6, 0],
[0, 0, 0, 7, 8]])
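To check how the three arrays describe the matrix, one can decode them row by row: the entries of row i live in the slice indptr[i]:indptr[i+1] of both indices and data. A minimal sketch using the same arrays as above (rows_decoded is just an illustrative name):

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
cols = 5
indptr = np.arange(0, len(data) + 1, 2)  # 2 stored entries per row
indices = np.repeat(np.arange(cols), [1] + [2] * (cols - 2) + [1])

# For each row, pair up its column indices with its values
rows_decoded = [
    list(zip(indices[start:end].tolist(), data[start:end].tolist()))
    for start, end in zip(indptr[:-1], indptr[1:])
]
for i, row in enumerate(rows_decoded):
    print(i, row)
```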
Try diags from scipy
import numpy as np
import scipy.sparse
A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
B = scipy.sparse.diags([A[:,0], A[:,1]], [0, 1], [4, 5])
When I print B.todense(), it gives me
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]
Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices, say array([0, 2, 1]). The expected output is the values placed into a bigger array, "addressed" by the indices, so we would get
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
Is there a way to do this without loops, in a nice, fast way?
With
a = zeros((3,3))
b = array([0, 2, 1])
vals = array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[r_[:len(b)], b] = vals
>>> a
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
How do we modify this for higher dimensions? For example, a is a 5x4x3 array and b and vals are 5x4 arrays.
How would the statement a[r_[:len(b)],b] = vals have to change?
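One possible way, offered as a sketch rather than a definitive answer: let broadcasting supply the leading indices with np.ogrid, or use np.put_along_axis. The random b and vals below are made up for illustration, and b is assumed to index the last axis of a.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.zeros((5, 4, 3))
b = rng.integers(0, 3, size=(5, 4))  # index into the last axis
vals = rng.random((5, 4)) + 1.0      # strictly positive, so never 0

# Open grids of shape (5, 1) and (1, 4) broadcast against b's shape (5, 4)
i, j = np.ogrid[:5, :4]
a[i, j, b] = vals

# Same result with put_along_axis, which needs a trailing axis of length 1
a2 = np.zeros((5, 4, 3))
np.put_along_axis(a2, b[..., np.newaxis], vals[..., np.newaxis], axis=2)
print(np.array_equal(a, a2))
```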