I have a list that I .append() to in a for loop; by the end, its length is around 180,000. Each item in the list is a numpy array of 7680 float32 values.
Then I convert the list to a numpy array, i.e. I expect an array of shape (180000, 7680):
d = numpy.asarray( dlist, dtype = 'float32' )
That caused the script to crash with the message Killed.
Is memory the problem? Assuming a float32 takes 4 bytes, 180000 x 7680 x 4 bytes ≈ 5.5 GB.
I am using 64-bit Ubuntu with 12 GB of RAM.
Yes, memory is the problem.
Your estimate also needs to account for the memory already allocated for the list representation of the 180000 x 7680 float32 values, so without other details on dynamic memory releases / garbage collection, the numpy.asarray() call needs a bit more than another block of 180000 x 7680 x numpy.float32 bytes on top of that.
If you test with, say, a third of the list, you can inspect the effective overhead of the numpy.array data representation and get exact numbers for a memory-feasible design.
Memory profiling can help pinpoint the bottleneck and understand the code's requirements; sometimes that saves half of the allocation space needed for the data, compared to the original data-flow and operations:
(Figure: memory-profiling plot, courtesy of scikit-learn, showing the impact of a numpy-based vs. a direct BLAS calling method on the memory-allocation envelopes.)
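If you don't already have a profiler wired in, one possible option (an assumption on my side, not part of the original setup) is the third-party memory_profiler package; a minimal sketch on a scaled-down version of the list-to-array conversion:

# pip install memory_profiler   (third-party tool, used here only as one possible profiler)
from memory_profiler import profile
import numpy as np

@profile
def build_array(n_rows=1000, n_cols=7680):
    # scaled-down stand-in for the dlist built in the question's for-loop
    dlist = [np.zeros(n_cols, dtype=np.float32) for _ in range(n_rows)]
    # the conversion whose per-line memory cost we want to see
    return np.asarray(dlist, dtype=np.float32)

if __name__ == '__main__':
    build_array()

Running this prints a line-by-line table of memory usage, which makes it easy to see how much the list and the converted array each cost.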
You should also take into account that you need roughly twice the size of the data in memory during the conversion.
Also, other software may take some of your RAM, and if you have no additional swap space configured, using 11 GB of your 12 GB of memory will probably get your system into trouble.
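If that doubled peak is what kills the process, one workaround is to preallocate the final array and fill it row by row, so the float32 data never has to exist twice. A minimal sketch, assuming the rows can be produced one at a time (source_of_rows() is a hypothetical placeholder for wherever the 7680-value rows come from):

import numpy as np

n_rows, n_cols = 180000, 7680                       # sizes from the question

d = np.empty((n_rows, n_cols), dtype=np.float32)    # one ~5.5 GB allocation
for i, row in enumerate(source_of_rows()):          # hypothetical row generator
    d[i, :] = row                                   # copy each row in place; no second 5.5 GB block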
UPDATED question:
I have a sparse 120000x14000 matrix, and I want to do some matrix algebra on it:
c = np.sum(indM, axis=1).T
w = np.diag(1 / np.array(c)[0]) # Fails with memory error
w = sparse.eye(len(indM), dtype=np.float)/np.array(c)[0] # Fails with memory error
w = np.nan_to_num(w)
u = w @ indM # Fails with 'Object types not supported'
u_avg = np.array(np.sum(u, axis=0) / np.sum(indM, axis=0))[0]
So the problem is that the above first fails with a memory error when creating a diagonal matrix with non-integers on the diagonal. And if I manage to proceed, the kernel somehow doesn't recognize "Objects" as supported types, meaning I can't do sparse matrices, I think?
What do you recommend I do?
Try using numpy's sum. In my experience, it tends to blow other stuff out of the water when it comes to performance.
import numpy as np
c = np.sum(indM,axis=1)
It sounds like you don't have enough RAM to handle such a large array. The obvious choice here is to use methods from scipy.sparse but you say you've tried that and still encounter a memory problem. Fortunately, there are still a few other options:
Change your dataframe to a numpy array (this may reduce memory overhead)
You could use numpy.memmap to map your array to a location stored in binary on disk (see the sketch after this list).
At the expense of precision, you could change the dtype of any floats from float64 (the default) to float32.
If you are loading your data from a .csv file, pd.read_csv has an option chunksize which allows you to read in your data in chunks.
Try using a cloud-based resource like Kaggle. There may be more processing power available there than on your machine.
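For the numpy.memmap option above, a minimal sketch (the file name, shape, and chunk size are made up for illustration):

import numpy as np

shape = (120000, 14000)                 # the matrix size from the question
mm = np.memmap('big_matrix.dat', dtype=np.float32, mode='w+', shape=shape)

# Work on the array in row chunks so only one chunk is resident in RAM at a time.
chunk = 1000
for start in range(0, shape[0], chunk):
    stop = min(start + chunk, shape[0])
    mm[start:stop, :] = 0.0             # replace with the real per-chunk computation

mm.flush()                              # push pending changes out to disk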
I am trying to understand why this Python code results in a process that requires 236 MB of memory, considering that the list itself is only 76 MB.
import sys
import psutil
initial = psutil.virtual_memory().available / 1024 / 1024
available_memory = psutil.virtual_memory().available
vector_memory = sys.getsizeof([])
vector_position_memory = sys.getsizeof([1]) - vector_memory
positions = 10000000
print "vector with %d positions should use %d MB of memory " % (positions, (vector_memory + positions * vector_position_memory) / 1024 / 1024)
print "it used %d MB of memory " % (sys.getsizeof(range(0, positions)) / 1024 / 1024)
final = psutil.virtual_memory().available / 1024 / 1024
print "however, this process used in total %d MB" % (initial - final)
The output is:
vector with 10000000 positions should use 76 MB of memory
it used 76 MB of memory
however, this process used in total 236 MB
Adding x10 more positions (i.e. positions = 100000000) results in x10 more memory.
vector with 100000000 positions should use 762 MB of memory
it used 762 MB of memory
however, this process used in total 2330 MB
My ultimate goal is to suck up as much memory as I can to create a very long list. To do this, I wrote this code to understand/predict how big my list could be based on the available memory. To my surprise, Python seems to need a ton of memory just to manage my list.
Why does Python use so much memory? What is it doing with it? And is there any way to predict Python's memory requirements, so I can create a list that uses pretty much all the available memory without pushing the OS into swap?
The getsizeof function only includes the space used by the list itself.
But the list is effectively just an array of pointers to int objects, and you created 10000000 of those, and each one of those takes memory as well—typically 24 bytes.
The first few numbers (usually up to 255) are pre-created and cached by the interpreter, so they're effectively free, but the rest are not. So, you want to add something like this:
int_memory = sys.getsizeof(10000)
print "%d int objects should use another %d MB of memory " % (positions - 256, (positions - 256) * int_memory / 1024 / 1024)
And then the results will make more sense.
But notice that if you aren't creating a range with 10M unique ints, but instead, say, 10M random ints from 0-10000, or 10M copies of 0, that calculation will no longer be correct. So if you want to handle those cases, you need to do something like stash the id of every object you've seen so far and skip any additional references to the same id.
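A minimal sketch of that id-stashing idea for a flat list (it deliberately ignores nested containers, which the recursive recipes mentioned below handle):

import sys

def deep_list_size(lst):
    # Size of the list object itself plus each distinct element, counted once.
    seen = set()
    total = sys.getsizeof(lst)
    for obj in lst:
        if id(obj) not in seen:
            seen.add(id(obj))
            total += sys.getsizeof(obj)
    return total

# deep_list_size([0] * 10000000) counts the single cached int 0 only once,
# unlike the positions * int_memory estimate above.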
The Python 2.x docs used to have a link to an old recursive getsizeof function that does that, and more… but that link went dead, so it was removed.
The 3.x docs have a link to a newer one, which may or may not work in Python 2.7. (I notice from a quick glance that it uses a __future__ statement for print, and falls back from reprlib.repr to repr, so it probably does.)
If you're wondering why every int is 24 bytes long (in 64-bit CPython; it's different for different platforms and implementations, of course):
CPython represents every builtin type as a C struct that contains, at least, space for a refcount and a pointer to the type. Any actual value the object needs to represent is in addition to that.1 So, the smallest non-singleton type is going to take 24 bytes per instance.
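For concreteness, you can poke at this from Python itself (a quick sketch; the numbers in the comments are what a 64-bit CPython 2.7 reports and will differ on other builds):

import sys

sys.getsizeof(0)       # 24: refcount + type pointer + the C long holding the value
sys.getsizeof(10**30)  # larger: a long stores extra digits beyond the fixed header
sys.getsizeof(None)    # 16: essentially just the bare object header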
If you're wondering how you can avoid using up 24 bytes per integer, the answer is to use NumPy's ndarray—or, if for some reason you can't, the stdlib's array.array.
Either one lets you specify a "native type", like np.int32 for NumPy or i for array.array, and create an array that holds 100M of those native-type values directly. That takes exactly 4 bytes per value, plus a few dozen constant bytes of header overhead, which is a lot smaller than a list's 8 bytes of pointer per value, plus a bit of over-allocation slack at the end that scales with the length, plus a 24-byte int object wrapping up each value.
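A rough side-by-side sketch of those footprints (approximate numbers; note that getsizeof on the list does not even include the int objects it points to, which is the whole point above):

import sys
import array
import numpy as np

n = 10000000

lst = list(range(n))                  # n pointers to int objects
arr_np = np.arange(n, dtype=np.int32)
arr_std = array.array('i', range(n))

sys.getsizeof(lst)        # ~80 MB of pointers alone, int objects not included
arr_np.nbytes             # 40000000: exactly 4 bytes per value
sys.getsizeof(arr_std)    # also roughly 4 bytes per value, plus a small header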
Using array.array, you're sacrificing speed for space,2 because every time you want to access one of those values, Python has to pull it out and "box" it as an int object.
Using NumPy, you're gaining both speed and space, because NumPy will let you perform vectorized operations over the whole array in a tightly-optimized C loop.
1. What about non-builtin types, that you create in Python with class? They have a pointer to a dict—which you can see from Python-land as __dict__—that holds all the attributes you add. So they're 24 bytes according to getsizeof, but of course you have to also add the size of that dict.
2. Unless you aren't. Preventing your system from going into swap hell is likely to speed things up a lot more than the boxing and unboxing slows things down. And, even if you aren't avoiding that massive cliff, you may still be avoiding smaller cliffs involving VM paging or cache locality.
I am reading data from a UDP socket in a while loop. I need the most efficient way to
1) Read the data (*) (that's kind of solved, but comments are appreciated)
2) Dump the (manipulated) data periodically in a file (**) (The Question)
I am anticipating a bottleneck in numpy's "tostring" method. Let's consider the following piece of (incomplete) code:
import socket
import numpy
nbuf=4096
buf=numpy.zeros(nbuf,dtype=numpy.uint8) # i.e., an array of bytes
f=open('dump.data','w')
datasocket=socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# ETC.. (code missing here) .. the datasocket is, of course, non-blocking
while True:
    gotsome=True
    try:
        N=datasocket.recv_into(buf) # no memory-allocation here .. (*)
    except(socket.error):
        # do nothing ..
        gotsome=False
    if (gotsome):
        # the bytes in "buf" will be manipulated in various ways ..
        # the following write is done frequently (not necessarily in each pass of the while loop):
        f.write(buf[:N].tostring()) # (**) The question: what is the most efficient way to do this?
f.close()
Now, at (**), as I understand it:
1) buf[:N] allocates memory for a new array object, having the length N+1, right? (maybe not)
.. and after that:
2) buf[:N].tostring() allocates memory for a new string, and the bytes from buf are copied into this string
That seems like a lot of memory allocation and copying. In this same loop, in the future, I will read from several sockets and write into several files.
Is there a way to just tell f.write to access directly the memory address of "buf" from 0 to N bytes and write them onto the disk?
I.e., to do this in the spirit of the buffer interface and avoid those two extra memory allocations?
P. S. f.write(buf[:N].tostring()) is equivalent to buf[:N].tofile(f)
Basically, it sounds like you want to use the array's tofile method or directly use the ndarray.data buffer object.
For your exact use-case, using the array's data buffer is the most efficient, but there are a lot of caveats that you need to be aware of for general use. I'll elaborate in a bit.
However, first let me answer a couple of your questions and provide a bit of clarification:
buf[:N] allocates memory for a new array object, having the length N+1, right?
It depends on what you mean by "new array object". Very little additional memory is allocated, regardless of the size of the arrays involved.
It does allocate memory for a new array object (a few bytes), but it does not allocate additional memory for the array's data. Instead, it creates a "view" that shares the original array's data buffer. Any changes you make to y = buf[:N] will affect buf as well.
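A tiny sketch of that view behaviour (np.shares_memory requires a reasonably recent NumPy; on older versions np.may_share_memory gives a looser check):

import numpy as np

buf = np.zeros(4096, dtype=np.uint8)
view = buf[:10]                  # a view: new array object, same data buffer

np.shares_memory(buf, view)      # True
view[0] = 255
buf[0]                           # 255 -- the write through the view shows up in buf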
buf[:N].tostring() allocates memory for a new string, and the bytes from buf are copied into this string
Yes, that's correct.
On a side note, you can actually go the opposite way (string to array) without allocating any additional memory:
somestring = 'This could be a big string'
arr = np.frombuffer(buffer(somestring), dtype=np.uint8)
However, because python strings are immutable, arr will be read-only.
Is there a way to just tell f.write to access directly the memory address of "buf" from 0 to N bytes and write them onto the disk?
Yep!
Basically, you'd want:
f.write(buf[:N].data)
This is very efficient and will work for any file-like object. It's almost definitely what you want in this exact case. However, there are several caveats!
First off, note that N will be in items in the array, not in bytes directly. They're equivalent in your example code (due to dtype=np.uint8, or any other 8-bit datatype).
If you did want to write a number of bytes, you could do
f.write(buf.data[:N])
...but slicing the arr.data buffer will allocate a new string, so it's functionally similar to buf[:N].tostring(). At any rate, be aware that doing f.write(buf[:N].tostring()) is different than doing f.write(buf.data[:N]) for most dtypes, but both will allocate a new string.
Next, numpy arrays can share data buffers. In your example case, you don't need to worry about this, but in general, using somearr.data can lead to surprises for this reason.
As an example:
x = np.arange(10, dtype=np.uint8)
y = x[::2]
Now, y shares the same memory buffer as x, but it's not contiguous in memory (have a look at x.flags vs y.flags). Instead it references every other item in x's memory buffer (compare x.strides to y.strides).
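For concreteness, this is what the flags and strides look like for that example (repeating the setup from above so the snippet is self-contained):

import numpy as np

x = np.arange(10, dtype=np.uint8)
y = x[::2]

x.flags['C_CONTIGUOUS']   # True
y.flags['C_CONTIGUOUS']   # False: y's elements are not adjacent in memory
x.strides                 # (1,): one byte from one element to the next
y.strides                 # (2,): y skips every other byte of x's buffer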
If we try to access y.data, we'll get an error telling us that this is not a contiguous array in memory, and we can't get a single-segment buffer for it:
In [5]: y.data
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-54-364eeabf8187> in <module>()
----> 1 y.data
AttributeError: cannot get single-segment buffer for discontiguous array
This is a large part of the reason that numpy arrays have a tofile method (it also pre-dates Python's buffers, but that's another story).
tofile will write the data in the array to a file without allocating additional memory. However, because it's implemented at the C-level it only works for real file objects, not file-like objects (e.g. a socket, StringIO, etc).
For example:
buf[:N].tofile(f)
Again, because this is implemented at the C level, it will only work for actual file objects, not for sockets, StringIO, and other file-like objects.
This does allow you to use arbitrary array indexing, however.
buf[someslice].tofile(f)
Will make a new view (same memory buffer), and efficiently write it to disk. In your exact case, it will be slightly slower than slicing the arr.data buffer and directly writing it to disk.
If you'd prefer to use array indexing (and not number of bytes) then the ndarray.tofile method will be more efficient than f.write(arr.tostring()).
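To recap the options discussed above in one place, a summary sketch (it assumes an already-open real file object f and the uint8 buf and item count N from the question):

f.write(buf[:N].data)        # no data copy; works for file-like objects, needs a contiguous slice
f.write(buf[:N].tostring())  # copies the bytes into a temporary string first
buf[:N].tofile(f)            # no extra copy either, but only works for real file objects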
I read 32-bit integer audio data (given as a string by previous commands) into a numpy.int32 array with:
myarray = numpy.fromstring(data, dtype=numpy.int32)
But then I want to store it in memory as int16 (I know this will decrease the bit depth / resolution / sound quality):
myarray = myarray >> 16
my_16bit_array = myarray.astype('int16')
It works very well, but is there a faster solution? (Here I use a string buffer, one array in int32, and one array in int16; I want to know whether it's possible to save one step.)
How about this?
np.fromstring(data, dtype=np.uint16)[0::2]
Note however, that overhead of the kind you describe here is common when working with numpy, and cannot always be avoided. If this kind of overhead isn't acceptable for your application, make sure that you plan ahead to write extension modules for the performance critical parts.
Note: it should be 0::2 or 1::2 depending on the endianness of your platform
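A small sketch of how one might pick that offset at runtime (it keeps the uint16 dtype from the answer above, and data is the same string as in the question):

import sys
import numpy as np

# On a little-endian machine the high 16 bits of each int32 sit at the odd
# uint16 positions; on a big-endian machine they sit at the even ones.
offset = 1 if sys.byteorder == 'little' else 0
my_16bit_view = np.fromstring(data, dtype=np.uint16)[offset::2]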
I am looking for a solution to store about 10 million floating point (double precision) numbers of a sparse matrix. The matrix is actually a two-dimensional triangular matrix of 1 million by 1 million elements. Element (i,j) holds the score measure score(i,j) between element i and element j. The storage method must allow very fast access to this information, perhaps by memory-mapping the file containing the matrix. I certainly don't want to load the whole file into memory.
class Score(IsDescription):
    grid_i = UInt32Col()
    grid_j = UInt32Col()
    score = FloatCol()
I've tried PyTables, using the Score class shown above, but I cannot directly access element (i,j) without scanning all the rows. Any suggestions?
10 million double precision floats take up 80 MB of memory. If you store them in a 1 million x 1 million sparse matrix, in CSR or CSC formats, you will need an additional 11 million int32s, for a total of around 125 MB. That's probably less than 7% of the physical memory in your system. And in my experience, on a system with 4GB running a 32-bit version of python, you rarely start having trouble allocating arrays until you try to get a hold of ten times that.
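The arithmetic behind that estimate, spelled out for the CSR format with 32-bit indices:

data_bytes    = 10 * 10**6 * 8     # 10 million float64 values          = 80 MB
indices_bytes = 10 * 10**6 * 4     # one int32 column index per value   = 40 MB
indptr_bytes  = (10**6 + 1) * 4    # one int32 row pointer per row      ~  4 MB
total_mb = (data_bytes + indices_bytes + indptr_bytes) / 1e6   # ~ 124 MB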
Run the following code on your computer:
import itertools
import numpy as np

for j in itertools.count(100):
    try:
        a = np.empty((j * 10**6,), dtype='uint8')
        print 'Allocated {0} MB of memory!'.format(j)
        del a
    except MemoryError:
        print 'Failed to allocate {0} MB of memory!'.format(j)
        break
And unless it fails to get you at least 4 times the amount calculated above, don't hesitate to stick the whole thing in memory using a scipy.sparse format.
I have no experience with pytables, nor much with numpy's memmap arrays. But it seems to me that either one of those will involve you coding the logic to handle the sparsity yourself, something I would try to avoid unless it's impossible.
You should use scipy.sparse. Here's some more info about the formats and usage.
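A minimal sketch of what that could look like for the score matrix (the indices and the value are made up for illustration; COO is convenient for building from (i, j, score) triplets, CSR is fast for lookups and arithmetic):

import numpy as np
from scipy import sparse

n = 1000000                                   # 1 million x 1 million

# Hypothetical triplets; in practice these come from wherever score(i, j) is computed.
rows = np.array([12, 999999])
cols = np.array([345678, 0])
vals = np.array([0.25, 0.75])

scores = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

scores[12, 345678]                            # fast (i, j) lookup
# With the full 10 million scores, this CSR matrix needs roughly the ~125 MB estimated above.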