How to make a diagonal matrix from a 1D array with a large shape - Python

I have a 1D array with shape (777599,). I want to turn my 1D data array into a 2D diagonal matrix, but I have a problem.
This is my code:
import numpy as np
a = np.linspace(0, 2000, 777599)
b = np.diag(a)
print(b.shape)
and the response is:
Traceback (most recent call last):
File "/home/willi/PycharmProjects/006_TA/017_gravkorCG5.py", line 29, in <module>
b = np.diag(a)
File "<__array_function__ internals>", line 6, in diag
File "/home/willi/PycharmProjects/venv/lib/python3.5/site-packages/numpy/lib/twodim_base.py", line 275, in diag
res = zeros((n, n), v.dtype)
MemoryError: Unable to allocate 4.40 TiB for an array with shape (777599, 777599) and data type float64

An array with 777599x777599 (i.e. 604660204801) elements is huge. Sparse matrices to the rescue (requires pip install scipy):
import numpy as np
from scipy import sparse
a = np.linspace(0, 2000, 777599)
# Place a[i] at position (i, i), storing only the nonzero diagonal entries.
b = sparse.csc_matrix((a, (range(a.shape[0]), range(a.shape[0]))))
It will be slower than a dense matrix... if a dense matrix could fit into memory. :)
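If all you need is the diagonal structure, scipy also provides a dedicated constructor, scipy.sparse.diags, which stores only the diagonal values; a minimal sketch of the same idea:
import numpy as np
from scipy import sparse
a = np.linspace(0, 2000, 777599)
# diags keeps just the 777599 diagonal values (DIA format),
# never materializing the 777599 x 777599 dense grid.
b = sparse.diags(a)
print(b.shape)  # (777599, 777599)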

Related

Can you check why an error comes up when concatenating numpy arrays

I tried concatenating 2 numpy arrays, but I got an error.
The error is:
Traceback (most recent call last):
File "C:\Users\hp\Desktop\Python\Numpy\OperationsOnArrays1.py", line 28, in <module>
array3 = np.concatenate((array,array2))
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
import numpy as np
array = np.array([2, 43, 2, 4, 1, 3])
# Sorting an array in ascending order
array = np.sort(array)
# Sorting by specifying the axis
array = np.array([[2, 5, 4], [3, 2, 1]])
# array5 = np.array([[2,5,4],[3,2,1]])
array = np.sort(array, axis=1)
# Concatenate (adding one array after another)
array2 = np.zeros((4, 2))
print(array2)
array3 = np.concatenate((array, array2))
print(array)
print(array2)
print(array3)
print(array.shape, array2.shape) will print (2, 3) (4, 2).
For the default axis=0, np.concatenate requires every dimension except the first to match exactly in all arrays; here the second dimensions (3 and 2) differ.
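For illustration, here is a shape that does concatenate (the (4, 3) zeros are just a hypothetical replacement for array2):
import numpy as np
array = np.array([[2, 5, 4], [3, 2, 1]])  # shape (2, 3)
array2 = np.zeros((4, 3))                 # second dimension now matches
array3 = np.concatenate((array, array2))  # stacks along axis 0
print(array3.shape)  # (6, 3)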

Initialize high dimensional sparse matrix

I want to initialize a 300,000 x 300,000 sparse matrix using scipy, but it requires memory as if it were not sparse:
>>> from scipy import sparse
>>> sparse.rand(300000,300000,.1)
it gives the error:
MemoryError: Unable to allocate 671. GiB for an array with shape (300000, 300000) and data type float64
which is the same error as if I initialize using numpy:
np.random.normal(size=[300000, 300000])
Even when I go to a very low density, it reproduces the error:
>>> from scipy import sparse
>>> sparse.rand(300000,300000,.000000000001)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../python3.8/site-packages/scipy/sparse/construct.py", line 842, in rand
return random(m, n, density, format, dtype, random_state)
File ".../lib/python3.8/site-packages/scipy/sparse/construct.py", line 788, in random
ind = random_state.choice(mn, size=k, replace=False)
File "mtrand.pyx", line 980, in numpy.random.mtrand.RandomState.choice
File "mtrand.pyx", line 4528, in numpy.random.mtrand.RandomState.permutation
MemoryError: Unable to allocate 671. GiB for an array with shape (90000000000,) and data type int64
Is there a more memory-efficient way to create such a sparse matrix?
Just generate only what you need.
from scipy import sparse
import numpy as np
n, m = 300000, 300000
density = 0.00000001
size = int(n * m * density)  # number of nonzero entries to generate (900 here)
rows = np.random.randint(0, n, size=size)
cols = np.random.randint(0, m, size=size)
data = np.random.rand(size)
arr = sparse.csr_matrix((data, (rows, cols)), shape=(n, m))
This lets you build monster sparse arrays provided they're sparse enough to fit into memory.
>>> arr
<300000x300000 sparse matrix of type '<class 'numpy.float64'>'
with 900 stored elements in Compressed Sparse Row format>
This is probably how the sparse.rand constructor should be working anyway. If any (row, col) pairs collide, their data values get added together, which is probably fine for most applications I can think of.
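A tiny check of that summing behavior (the duplicated (0, 0) entry is contrived for illustration):
from scipy import sparse
# Two entries aimed at the same (row, col) position are summed on conversion.
m = sparse.csr_matrix(([1.0, 2.0], ([0, 0], [0, 0])), shape=(2, 2))
print(m[0, 0])  # 3.0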
Try passing a reasonable density argument, as shown in the docs... with around 90 billion cells, something like 0.00000001 keeps the number of nonzeros manageable...
https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.rand.html#scipy.sparse.rand
@hpaulj's comment is spot on. There is a clue in the error message also.
MemoryError: Unable to allocate 671. GiB for an array with shape (90000000000,) and data type int64
Note the reference to int64 rather than float64, and to a flat array with 90,000,000,000 (i.e. 300,000 x 300,000) entries. This comes from an intermediate random-sampling step in the creation of the sparse matrix, which allocates a huge dense index array regardless of the density.
Note that when creating any sparse matrix (irrespective of the format), you have to account for the memory of the non-zero values as well as of the indices that record their positions in the matrix.
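As a rough sketch of that accounting, reusing the construction above: a CSR matrix keeps three arrays (values, column indices, row pointers), and their sizes can be totalled directly:
from scipy import sparse
import numpy as np
n, m, size = 300000, 300000, 900
rows = np.random.randint(0, n, size=size)
cols = np.random.randint(0, m, size=size)
data = np.random.rand(size)
arr = sparse.csr_matrix((data, (rows, cols)), shape=(n, m))
# data holds the nonzero values, indices their columns,
# and indptr one entry per row plus one.
print(arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes)
# Only a few MB, dominated by the (n + 1)-element indptr.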

ValueError: setting an array element with a sequence

def get_column_normalized_matrix(A):
    d = sp.csr_matrix.get_shape(A)[0]
    Q = mat.zeros((d, d))
    V = mat.zeros((1, d))
    sp.csr_matrix.sum(A, axis=0, dtype='int', out=V)
    for i in range(0, d):
        if V[0, i] != 0:
            Q[:, i] = sc.divide(A[:, i], V[0, i])
    return Q
Input A is an adjacency matrix in sparse format. I am getting the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in get_column_normalized_matrix
ValueError: setting an array element with a sequence.
The problem you have is that you are trying to assign a sparse matrix into a dense one. This is not done automatically. It is rather simple to fix, though, by turning the sparse matrix into a dense one, using .todense():
import scipy.sparse as sp
import numpy.matlib as mat
import scipy as sc

def get_column_normalized_matrix(A):
    d = sp.csr_matrix.get_shape(A)[0]
    Q = mat.zeros((d, d))
    V = mat.zeros((1, d))
    sp.csr_matrix.sum(A, axis=0, dtype='int', out=V)
    for i in range(0, d):
        if V[0, i] != 0:
            # Explicitly turn the sparse column into a dense one:
            Q[:, i] = sc.divide(A[:, i], V[0, i]).todense()
    return Q
If you instead want the output to be sparse, then you have to ensure that your output matrix Q is sparse to begin with. That can be achieved as follows:
def get_column_normalized_matrix(A):
    d = sp.csr_matrix.get_shape(A)[0]
    Q = sp.csr_matrix(A)  # Create sparse output matrix as a copy of A
    V = mat.zeros((1, d))
    sp.csr_matrix.sum(A, axis=0, dtype='int', out=V)
    for i in range(0, d):
        if V[0, i] != 0:
            # Update the sparse matrix in place
            Q[:, i] = sc.divide(A[:, i], V[0, i])
    return Q
As can be seen, Q is created as a copy of A. This makes the same elements non-zero in both matrices, which ensures efficient updating, since no new elements need to be inserted into the sparsity structure.
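For what it's worth, the loop can be avoided entirely with a vectorized version; a sketch under the assumption that A is a csr_matrix (column_normalize is just an illustrative name):
import numpy as np
import scipy.sparse as sp

def column_normalize(A):
    # Column sums as a flat 1D array (sparse .sum() returns a matrix).
    col_sums = np.asarray(A.sum(axis=0)).ravel().astype(float)
    # Reciprocal of each column sum, leaving all-zero columns at zero.
    scale = np.divide(1.0, col_sums, out=np.zeros_like(col_sums),
                      where=col_sums != 0)
    # Right-multiplying by a sparse diagonal matrix scales each column.
    return A @ sp.diags(scale)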

Numpy memory error

I'm running into a memory error with numpy. The following line of code seems to be the culprit:
self.D_r = numpy.diag(1/numpy.sqrt(self.r))
Where self.r is a relatively small numpy array.
The interesting thing is that I monitored the memory usage, and the process took up at most 3% of the machine's RAM. So I'm thinking something kills the script before all the RAM is taken up, because the allocation is known in advance to exceed it. If anybody has any ideas I would be very grateful.
Edit 1:
Here's the traceback:
Traceback (most recent call last):
File "/path_to_file/my_script.py", line 82, in <module>
mca_X = mca.mca(X)
File "/path_to_file/mca.py", line 54, in __init__
self.D_r = numpy.diag(1/numpy.sqrt(self.r.values))
File "/path_to_file/numpy/lib/twodim_base.py", line 302, in diag
res = zeros((n, n), v.dtype)
MemoryError
Running the script on KDD Cup 99 data (with one-hot-encoded nominal variables).
If the argument to np.diag() is a 1D array, it creates a 2D array using the 1D array as the diagonal:
Signature: np.diag(v, k=0)
Parameters
v : array_like
If `v` is a 2-D array, return a copy of its `k`-th diagonal.
If `v` is a 1-D array, return a 2-D array with `v` on the `k`-th
diagonal.
This squares the number of elements, so memory usage grows quadratically with the length of the array.
If self.r is a small 1D array of little more than 50,000 elements, it can already produce a memory error:
In [85]: a = np.diag(np.arange(5e4))
In [86]: a.shape
Out[86]: (50000, 50000)
In [88]: a.size * a.itemsize
Out[88]: 20000000000  # i.e. 20 GB
In [87]: a = np.diag(np.arange(5.1e4))
---------------------------------------------------------------------------
MemoryError
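If D_r is only ever used to multiply another matrix, a common workaround is to keep the diagonal as a 1D array and let broadcasting do the scaling instead of np.diag; a sketch with hypothetical stand-ins for self.r and the matrix being multiplied:
import numpy as np
r = np.random.rand(60000) + 0.5   # stand-in for self.r
X = np.random.rand(60000, 10)     # stand-in for whatever D_r multiplies
d = 1 / np.sqrt(r)                # keep only the diagonal, shape (60000,)
# diag(d) @ X just scales the rows of X, which broadcasting does directly,
# without ever allocating the (60000, 60000) dense diagonal:
Y = d[:, None] * X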

Getting around a numpy objects mismatch error in Python

I'm having a problem multiplying two big matrices in Python using numpy.
I have a (15,7) matrix and I want to multiply it by its transpose, i.e. AT(7,15)*A(15,7), and mathematically this should work, but I get an error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
I'm using numpy in Python. How can I get around this? Anyone, please help!
You've probably represented the matrices as arrays. You can either convert them to matrices with np.asmatrix, or use np.dot to do the matrix multiplication:
>>> X = np.random.rand(15 * 7).reshape((15, 7))
>>> X.T * X
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (7,15) (15,7)
>>> np.dot(X.T, X).shape
(7, 7)
>>> X = np.asmatrix(X)
>>> (X.T * X).shape
(7, 7)
One difference between arrays and matrices is that * on a matrix is matrix product, while on an array it's an element-wise product.
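On Python 3.5+ (NumPy 1.10+) you can also keep plain arrays and use the @ operator, which is the matrix product for ndarrays; a minimal sketch:
import numpy as np
X = np.random.rand(15, 7)
print((X.T @ X).shape)  # (7, 7), same result as np.dot(X.T, X)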
