Basically, I am getting a memory error in Python when trying to perform an algebraic operation on a numpy matrix. The variable u is a large matrix of doubles (in the failing case it is a 288x288x156 matrix; I only get this error in this huge case, and the same operation works on other large matrices, just not ones this big). Here is the Python error:
Traceback (most recent call last):
  File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pct perim erosion flattop\SwSim.py", line 121, in __init__
    self.mainSimLoop()
  File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pct perim erosion flattop\SwSim.py", line 309, in mainSimLoop
    u = solver.solve_cg(u,b,tensors,param,fdHold,resid) # Solve the left hand side of the equation Au=b with conjugate gradient method to approximate u
  File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pct perim erosion flattop\conjugate_getb.py", line 47, in solve_cg
    u = u + alpha*p
MemoryError
u = u + alpha*p is the line of code that fails.
alpha is just a double, while u and p are the large matrices described above (both the same size).
I don't know much about memory errors, especially in Python. Any insight/tips for solving this would be much appreciated!
Thanks
Rewrite to
p *= alpha
u += p
and this will use much less memory. Whereas p = p*alpha allocates a whole new matrix for the result of p*alpha and then discards the old p, p *= alpha does the same thing in place.
In general, with big matrices, try to use op= assignment.
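As a further illustration (my own sketch, not from the original solver), the same update can also be written with numpy's explicit out= arguments, which likewise avoids allocating temporaries:

import numpy as np

u = np.zeros((288, 288, 156))
p = np.ones((288, 288, 156))
alpha = 0.5

np.multiply(p, alpha, out=p)  # p *= alpha, written explicitly
np.add(u, p, out=u)           # u += p, no temporary array allocated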
Another tip I have found to avoid memory errors is to manually control garbage collection. When objects are deleted or go out of scope, the memory used for these variables isn't freed up until a garbage collection is performed. I have found with some of my code using large numpy arrays that I get a MemoryError, but that I can avoid this if I insert calls to gc.collect() at appropriate places.
You should only look into this option if using "op=" style operators etc. doesn't solve your problem, as it's probably not the best coding practice to have gc.collect() calls everywhere.
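As a rough sketch of what I mean (the loop and variable names here are hypothetical, not from the original solver), you can drop references to large temporaries and collect before the next iteration:

import gc
import numpy as np

def iterate(u, b, n_iters=100):
    for k in range(n_iters):
        residual = b - u      # large temporary allocated each iteration
        u += 0.1 * residual   # in-place update of u
        del residual          # drop the reference to the temporary...
        gc.collect()          # ...and force a collection before the next pass
    return u

u = iterate(np.zeros((64, 64, 64)), np.ones((64, 64, 64)))  # small arrays, just to run the sketch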
Your matrix has 288x288x156 = 12,939,264 entries, which for doubles comes out to roughly 100MB in memory, and the expression u + alpha*p needs temporaries of the same size on top of that. numpy throwing a MemoryError at you just means that in the function you called, the memory needed to perform the operation wasn't available from the OS.
If you can work with sparse matrices this might save you a lot of memory.
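For example (a toy illustration; whether your data is actually sparse is something only you can judge, and scipy.sparse only handles 2-D matrices), the storage difference can be dramatic when most entries are zero:

import numpy as np
from scipy import sparse

dense = np.zeros((5000, 5000))
dense[::100, ::100] = 1.0            # only 2,500 non-zero entries

sp = sparse.csr_matrix(dense)        # stores just the non-zeros plus their indices
print(dense.nbytes // 1024**2, 'MiB')                                          # ~190 MiB
print((sp.data.nbytes + sp.indices.nbytes + sp.indptr.nbytes) // 1024, 'KiB')  # ~48 KiB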
Related
I have been trying to debug a program using vast amounts of memory and have distilled it into the following example:
# Caution, use carefully, this can utilise all available memory on your computer
# and render it effectively unresponsive, to the point where you cannot access
# the shell to kill the process; thus requiring reboot.
import numpy as np
import collections
import torch
# q = collections.deque(maxlen=1500) # Uses around 6.4GB
# q = collections.deque(maxlen=3000) # Uses around 12GB
q = collections.deque(maxlen=5000) # Uses around 18GB
def f():
    nparray = np.zeros([4,84,84], dtype=np.uint8)
    q.append(nparray)
    nparray1 = np.zeros([32,4,84,84], dtype=np.float32)
    tens = torch.tensor(nparray1, dtype=torch.float32)

while True:
    f()
Please note the cautionary message in the 1st line of this program. If you set maxlen to a level where it uses too much of your available RAM, it can crash your computer.
I measured the memory using top (the VIRT column), and its memory use seems wildly excessive (details in the comments above). From previous experience with my original program, if maxlen is high enough it will crash my computer.
Why is it using so much memory?
I calculate the expected increase in memory from maxlen=1500 to maxlen=3000 (1,500 more uint8 arrays of shape [4,84,84]) to be:
4 * 84 * 84 * 1500 / (1024**2) ≈ 40MB.
But we see an increase of roughly 6GB.
There seems to be some sort of interaction between using collections and the tensor allocation, as commenting out either one brings memory use back to what I expect; e.g. commenting out the tensor line leads to total memory use of 2GB, which seems much more reasonable.
Thanks for any help or insight,
Julian.
I think PyTorch stores and updates the computational graph each time you call f(), and thus the graph just keeps getting bigger and bigger.
Can you try to free the memory by using del tens (deleting the reference to the variable after use), and let me know how it works? (This suggestion is from the PyTorch docs here: https://pytorch.org/docs/stable/notes/faq.html)
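A minimal sketch of what I'm suggesting (just your f() with the added del; I haven't verified that this fixes your particular case):

import collections
import numpy as np
import torch

q = collections.deque(maxlen=5000)

def f():
    nparray = np.zeros([4, 84, 84], dtype=np.uint8)
    q.append(nparray)
    nparray1 = np.zeros([32, 4, 84, 84], dtype=np.float32)
    tens = torch.tensor(nparray1, dtype=torch.float32)
    del tens  # drop the reference so the tensor's storage can be released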
I'm confused by how numpy's memmap handles changes to data when using copy-on-write (mmap_mode=c). Since nothing is written to the original array on disk, I'm expecting that it has to store all changes in memory, and thus could run out of memory if you modify every single element. To my surprise, it didn't.
I am trying to reduce the memory usage of my machine learning scripts, which I run on a shared cluster (the less memory each instance takes, the more instances I can run at the same time). My data are very large numpy arrays (each > 8 GB). My hope is to use np.memmap to work with these arrays with little memory (< 4 GB available).
However, each instance might modify the data differently (e.g. might choose to normalize the input data differently each time). This has implications for storage space. If I use the r+ mode, then normalizing the array in my script will permanently change the stored array.
Since I don't want redundant copies of the data, and just want to store the original data on disk, I thought I should use the 'c' mode (copy-on-write) to open the arrays. But then where do your changes go? Are the changes kept just in memory? If so, if I change the whole array won't I run out of memory on a small-memory system?
Here's an example of a test which I expected to fail:
On a large memory system, create the array:
import numpy as np
GB = 1000**3
GiB = 1024**3
a = np.zeros((50000, 20000), dtype='float32')
bytes = a.size * a.itemsize
print('{} GB'.format(bytes / GB))
print('{} GiB'.format(bytes / GiB))
np.save('a.npy', a)
# Output:
# 4.0 GB
# 3.725290298461914 GiB
Now, on a machine with just 2 GB of memory, this fails as expected:
a = np.load('a.npy')
But these two will succeed, as expected:
a = np.load('a.npy', mmap_mode='r+')
a = np.load('a.npy', mmap_mode='c')
Issue 1: I run out of memory running this code, trying to modify the memmapped array (fails regardless of r+/c mode):
for i in range(a.shape[0]):
    print('row {}'.format(i))
    a[i,:] = i*np.arange(a.shape[1])
Why does this fail (especially, why does it fail even in r+ mode, where it can write to the disk)? I thought memmap would only load pieces of the array into memory?
Issue 2: When I force numpy to flush the changes every once in a while, both r+ and c modes successfully finish the loop. But how can c mode do this? I didn't think flush() would do anything in c mode. The changes aren't written to disk, so they are kept in memory, and yet somehow all the changes, which must be over 3 GB, don't cause out-of-memory errors?
for i in range(a.shape[0]):
    if i % 100 == 0:
        print('row {}'.format(i))
        a.flush()
    a[i,:] = i*np.arange(a.shape[1])
Numpy isn't doing anything clever here, it's just deferring to the built-in mmap module, which has an access argument that:
accepts one of three values: ACCESS_READ, ACCESS_WRITE, or ACCESS_COPY to specify read-only, write-through or copy-on-write memory respectively.
On linux, this works by calling the mmap system call with
MAP_PRIVATE
Create a private copy-on-write mapping. Updates to the mapping are not visible to other processes mapping the same file, and are not carried through to the underlying file.
Regarding your question
The changes aren't written to disk, so they are kept in memory, and yet somehow all the changes, which must be over 3Gb, don't cause out-of-memory errors?
The changes likely do get written to disk eventually, just not to the file you opened. The modified pages are private to your process, so when memory runs low they get paged out to swap space rather than back to the original file.
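To make the copy-on-write behaviour concrete, here is a small check (my own sketch with a deliberately small array, separate from the 4 GB example above) that modifications in 'c' mode never reach the file on disk:

import numpy as np

np.save('small.npy', np.zeros((1000, 1000), dtype='float32'))

m = np.load('small.npy', mmap_mode='c')  # copy-on-write mapping
m[:] = 1.0                               # modify every element; pages are copied privately

on_disk = np.load('small.npy')           # re-read the file itself
print(m.max(), on_disk.max())            # 1.0 0.0 -- the file was never written to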
I have a huge dataset on which I wish to perform PCA. I am limited by RAM and the computational efficiency of PCA.
Therefore, I shifted to using Iterative PCA.
Dataset size: (140000, 3504)
The documentation states that "This algorithm has constant memory complexity, on the order of batch_size, enabling use of np.memmap files without loading the entire file into memory."
This is really good, but I am unsure how to take advantage of it.
I tried loading the data as one memmap, hoping it would be accessed in chunks, but my RAM blew up.
My code below ends up using a lot of RAM:
import numpy as np
from sklearn.decomposition import IncrementalPCA

ut = np.memmap('my_array.mmap', dtype=np.float16, mode='w+', shape=(140000,3504))
clf = IncrementalPCA(copy=False)
X_train = clf.fit_transform(ut)
When I say "my RAM blew", the Traceback I see is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\base.py", line 433, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\incremental_pca.py", line 171, in fit
    X = check_array(X, dtype=np.float)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 347, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
MemoryError
How can I improve this without compromising accuracy by reducing the batch size?
My ideas to diagnose:
I looked at the sklearn source code, and in the fit() function (source code) I can see the following. This makes sense to me, but I am still unsure about what is wrong in my case.
for batch in gen_batches(n_samples, self.batch_size_):
    self.partial_fit(X[batch])
return self
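For reference, gen_batches just yields slice objects that cover the rows in chunks (a quick check, independent of the PCA code):

from sklearn.utils import gen_batches

print(list(gen_batches(10, 4)))
# [slice(0, 4, None), slice(4, 8, None), slice(8, 10, None)]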
Edit:
Worst case, I will have to write my own code for iterative PCA which batch-processes by reading and closing .npy files. But that would defeat the purpose of taking advantage of the already-available implementation.
Edit2:
If I could somehow delete each batch of the memmapped file once it has been processed, that would make much more sense.
Edit3:
Ideally, if IncrementalPCA.fit() is just using batches, it should not blow up my RAM. I am posting the whole code, just to make sure I am not making a mistake in flushing the memmap completely to disk beforehand.
temp_train_data=X_train[1000:]
temp_labels=y[1000:]
out = np.empty((200001, 3504), np.int64)
for index,row in enumerate(temp_train_data):
    actual_index=index+1000
    data=X_train[actual_index-1000:actual_index+1].ravel()
    __,cd_i=pywt.dwt(data,'haar')
    out[index] = cd_i
out.flush()
pca_obj=IncrementalPCA()
clf = pca_obj.fit(out)
Surprisingly, I note that out.flush doesn't free my memory. Is there a way to use del out to free my memory completely, and then pass just a reference to the file to IncrementalPCA.fit()?
You have hit a problem with sklearn in a 32-bit environment. I presume you are using np.float16 because you're in a 32-bit environment and you need that to allow you to create the memmap object without numpy throwing errors.
In a 64-bit environment (tested with 64-bit Python 3.3 on Windows), your code just works out of the box. So, if you have a 64-bit computer available, install 64-bit Python along with 64-bit numpy, scipy, and scikit-learn, and you are good to go.
Unfortunately, if you cannot do this, there is no easy fix. I have raised an issue on github here, but it is not easy to patch. The fundamental problem is that, within the library, if your dtype is float16, a copy of the array into memory is triggered. The details are below.
So, I hope you have access to a 64-bit environment with plenty of RAM. If not, you will have to split up your array yourself and batch-process it, a rather larger task; a rough sketch of that approach follows.
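Something along these lines (untested; the batch size, n_components and file mode here are placeholders), using partial_fit so that only one slice at a time is cast up from float16 and copied:

import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.utils import gen_batches

n_samples, n_features, batch_size = 140000, 3504, 1000
ut = np.memmap('my_array.mmap', dtype=np.float16, mode='r',
               shape=(n_samples, n_features))  # assumes the data has already been written

ipca = IncrementalPCA(n_components=50)
for batch in gen_batches(n_samples, batch_size):
    # Only this slice is cast up and copied into memory, not the whole memmap.
    ipca.partial_fit(np.asarray(ut[batch], dtype=np.float64))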
N.B. It's really good to see you going to the source to diagnose your problem :) However, if you look at the line where the code fails (from the Traceback), you will see that the for batch in gen_batches loop that you found is never reached.
Detailed diagnosis:
The actual error generated by the OP's code:
import numpy as np
from sklearn.decomposition import IncrementalPCA
ut = np.memmap('my_array.mmap', dtype=np.float16, mode='w+', shape=(140000,3504))
clf=IncrementalPCA(copy=False)
X_train=clf.fit_transform(ut)
is
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\base.py", line 433, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\incremental_pca.py", line 171, in fit
    X = check_array(X, dtype=np.float)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 347, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
MemoryError
The call to check_array (code link) uses dtype=np.float, but the original array has dtype=np.float16. Even though the check_array() function defaults to copy=False and passes this on to np.array(), the flag is ignored (as per the docs) because the requested dtype is different; therefore a copy is made by np.array.
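A small demonstration of that behaviour, independent of sklearn (the array here is tiny and purely illustrative):

import numpy as np

x16 = np.zeros(5, dtype=np.float16)
same = np.asarray(x16, dtype=np.float16)  # same dtype: no copy, the original object comes back
cast = np.asarray(x16, dtype=np.float64)  # different dtype: a new array is allocated
print(same is x16)                        # True
print(np.shares_memory(cast, x16))        # False -- the cast forced a full copy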
This could be solved in the IncrementalPCA code by ensuring that the dtype was preserved for arrays with dtype in (np.float16, np.float32, np.float64). However, when I tried that patch, it only pushed the MemoryError further along the chain of execution.
The same copying problem occurs when the code calls linalg.svd() from the main scipy code, and this time the error occurs during a call to gesdd(), a wrapped native function from LAPACK. Thus, I do not think there is a way to patch this (at least not an easy way; at minimum it would require altering code in core scipy).
Does the following alone trigger the crash?
X_train_mmap = np.memmap('my_array.mmap', dtype=np.float16,
                         mode='w+', shape=(n_samples, n_features))
clf = IncrementalPCA(n_components=50).fit(X_train_mmap)
If not, then you can use that model to transform (project) your data iteratively into a smaller array, using batches:
from sklearn.utils import gen_batches

X_projected_mmap = np.memmap('my_result_array.mmap', dtype=np.float16,
                             mode='w+', shape=(n_samples, clf.n_components))
for batch in gen_batches(n_samples, clf.batch_size_):
    X_batch_projected = clf.transform(X_train_mmap[batch])
    X_projected_mmap[batch] = X_batch_projected
I have not tested that code but I hope that you get the idea.
I'm having trouble using the pinv function from the numpy.linalg module. I want to compute the pseudo-inverse of a rectangular matrix A:
try:
    Binv = np.linalg.pinv(A)
except:
    print("an error occurs")
When I run the code no exception is raised, but in my Python prompt the following red text appears: init_dgesdd failed init.
However, when I use my code with other matrices in other contexts (different shapes, different condition numbers...), it works fine.
After investigation of the error, it seems to come from memory issues. When I use a matrix with a (105 x 177144) shape, it works. But when I use a matrix with a (105 x 178668) shape, it does not work.
Moreover, a quick look at the numpy/linalg/umath_linalg.c.src code shows that the error mentioned above is raised when the allocation of a memory buffer fails. This buffer is used to store U, S, VT and all the intermediate arrays needed during the SVD computation.
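To put a rough number on that (my own back-of-the-envelope estimate, assuming float64 and the economy-size SVD that pinv computes; LAPACK's dgesdd needs workspace on top of these factors):

import numpy as np

m, n = 105, 178668                     # the failing shape
itemsize = np.dtype(np.float64).itemsize

a_bytes  = m * n * itemsize            # the input matrix A
u_bytes  = m * m * itemsize            # U  (m x m)
s_bytes  = m * itemsize                # singular values
vt_bytes = m * n * itemsize            # V^T (m x n) in the economy decomposition

print((a_bytes + u_bytes + s_bytes + vt_bytes) / 1024**2, 'MiB')  # ~286 MiB before workspace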
I'm experiencing a very weird problem when using a large numpy array. Here's the basic context. I have about 15 lists of paired objects for which I am constructing adjacency matrices. Each adjacency matrix is about 4000 x 4000 (a square matrix where the diagonal means the object is paired with itself), so it's big but not too big. Here's the basic setup of my code:
import numpy

def createAdjacencyMatrix(pairedObjectList, objectIndexList):
    N = len(objectIndexList)
    adj = numpy.zeros((N,N))
    for i in range(0, len(pairedObjectList)):
        # put a 1 in the correct row/column position for the pair, etc.
        pass
    return adj
In my script I call this function about 15 times, one for each paired object list. However every time I run it I get this error:
adj = np.zeros((N,N))
MemoryError
I really don't understand where the memory error is coming from. Even though I'm making this big matrix, it only exists within the scope of that function, so shouldn't it be cleared from memory every time the function is finished? Not to mention, if the same variable is hanging around in memory, then shouldn't it just overwrite those memory positions?
Any help understanding this much appreciated.
EDIT : Here's the full output of the traceback
Traceback (most recent call last):
File "create_biogrid_adjacencies.py", line 119, in <module>
adjMat = dsu.createAdjacencyMatrix(proteinList,idxRefDict)
File "E:\Matt\Documents\Research\NU\networks_project\data_setup_utils.py", line 18, in createAdjacencyMatrix
adj = np.zeros((N,N))
MemoryError