When I load an array using numpy.loadtxt, it seems to take too much memory. E.g.
a = numpy.zeros(int(1e6))
causes an increase of about 8 MB in memory (observed with htop, and consistent with 8 bytes × 1 million ≈ 8 MB). On the other hand, if I save and then load this array
numpy.savetxt('a.csv', a)
b = numpy.loadtxt('a.csv')
my memory usage increases by about 100 MB! Again, I observed this with htop, both in the IPython shell and while stepping through code using Pdb++.
Any idea what's going on here?
After reading jozzas's answer, I realized that if I know the array size ahead of time, there is a much more memory-efficient way to do things. Say 'a' was an mxn array:
import csv
b = numpy.zeros((m, n))
with open('a.csv', 'r') as f:
    reader = csv.reader(f)
    for i, row in enumerate(reader):
        b[i, :] = numpy.array(row)
Saving this array of floats to a text file creates a 24M text file. When you re-load this, numpy goes through the file line-by-line, parsing the text and recreating the objects.
I would expect memory usage to spike during this time, as numpy doesn't know how big the resultant array needs to be until it gets to the end of the file, so I'd expect there to be at least 24M + 8M + other temporary memory used.
Here's the relevant bit of the numpy code, from /lib/npyio.py:
# Parse each line, including the first
for i, line in enumerate(itertools.chain([first_line], fh)):
    vals = split_line(line)
    if len(vals) == 0:
        continue
    if usecols:
        vals = [vals[i] for i in usecols]
    # Convert each value according to its column and store
    items = [conv(val) for (conv, val) in zip(converters, vals)]
    # Then pack it according to the dtype's nesting
    items = pack_items(items, packing)
    X.append(items)
# ...a bit further on
X = np.array(X, dtype)
This additional memory usage shouldn't be a concern: this is just how Python works. While your Python process appears to be using 100 MB of memory, internally it keeps track of which items are no longer used and will reuse that memory. For example, if you were to re-run this save-load procedure in one program (save, load, save, load), your memory usage would not increase to 200 MB.
Here is what I ended up doing to solve this problem. It works even if you don't know the shape ahead of time. It performs the conversion to float first and then combines the arrays (as opposed to #JohnLyon's answer, which combines the arrays of strings and then converts to float). This used an order of magnitude less memory for me, although it was perhaps a bit slower. However, I literally did not have the requisite memory to use np.loadtxt, so if you don't have sufficient memory, this will be better:
import numpy as np

def numpy_loadtxt_memory_friendly(the_file, max_bytes=1000000, **loadtxt_kwargs):
    numpy_arrs = []
    with open(the_file, 'rb') as f:
        i = 0
        while True:
            print(i)
            some_lines = f.readlines(max_bytes)
            if len(some_lines) == 0:
                break
            vec = np.loadtxt(some_lines, **loadtxt_kwargs)
            if len(vec.shape) < 2:
                vec = vec.reshape(1, -1)
            numpy_arrs.append(vec)
            i += len(some_lines)
    return np.concatenate(numpy_arrs, axis=0)
Related
I have some Python code for reading data from the RAM of an FPGA and writing it to disk on my computer. The code's runtime is 2.56 s, and I need to bring it down to 2 s.
mem = device.getNode("udaq.readout_mem").readBlock(16384)
device.dispatch()
ram.append(mem)
ram.reverse()
memory = ram.pop()
for j in range(16384):
    if 0 < j < 4096:
        f.write('0x%05x\t0x%08x\n' % (j, memory[j]))
    if 8192 < j < 12288:
        f.write('0x%05x\t0x%08x\n' % (j, memory[j]))
Your loop is very inefficient. You're literally iterating for nothing when values aren't in range, and you're spending a lot of time testing the indices.
Don't do one loop with two tests. Just create two loops without index tests (note that the first index is skipped if we respect your tests):
for j in range(1, 4096):
    f.write('0x%05x\t0x%08x\n' % (j, memory[j]))
for j in range(8193, 12288):
    f.write('0x%05x\t0x%08x\n' % (j, memory[j]))
Maybe more pythonic and more concise (and, since it avoids indexing memory[j], it has a chance to be faster):
import itertools

for start, end in ((1, 4096), (8193, 12288)):
    sl = itertools.islice(memory, start, end)
    for j, m in enumerate(sl, start):
        f.write('0x%05x\t0x%08x\n' % (j, m))
The outer loop replaces the two explicit loops (so if there are more offsets, just add them to the tuple list). The islice object creates a slice of memory, but no copies are made; it iterates without checking the indices against array bounds each time, so it can be faster. This has yet to be benchmarked, and the writing to disk is probably taking a lot of time as well.
Jean-François Fabre's observations on the loops are very good, but we can go further. The code is performing around 8000 write operations, of constant size, and with nearly the same content. We can prepare a buffer to do that in one operation.
# Prepare buffer with static portions
addresses = list(range(1, 4096)) + list(range(8193, 12288))
dataoffset = 2 + 5 + 1 + 2       # '0x' + 5 hex digits + tab + '0x'
linelength = dataoffset + 8 + 1  # data offset + 8 hex digits + newline
buf = bytearray(b"".join(b'0x%05x\t0x%08x\n' % (j, 0)
                         for j in addresses))

# Later on, fill in data
for line, address in enumerate(addresses):
    offset = linelength * line + dataoffset
    buf[offset:offset + 8] = b"%08x" % memory[address]
f.write(buf)
This means far fewer system calls. We could likely go further, e.g. by reading the memory as a buffer and using b2a_hex or similar, rather than one string-formatting call per word. It might also make sense to precalculate the offsets rather than using enumerate.
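A rough sketch of the b2a_hex idea, assuming the memory block can be exposed as raw big-endian bytes (the sample data here is made up; a little-endian source would need a byte swap first):

```python
import binascii

# Hypothetical stand-in for the FPGA data: three 32-bit words as raw bytes.
words = bytes.fromhex("0000000a0000000b0000000c")

# Hex-encode every word in a single call instead of one '%08x' per word.
hexed = binascii.b2a_hex(words)

# Slice out 8 hex characters per 32-bit word and prepend the address.
lines = [b'0x%05x\t0x' % j + hexed[8 * j:8 * j + 8] + b'\n'
         for j in range(len(words) // 4)]
out = b"".join(lines)
```

The per-word formatting work is reduced to a slice and a concatenation, which is where the potential speedup would come from.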
I have large binary data files that have a predefined format, originally written by a Fortran program as little endians. I would like to read these files in the fastest, most efficient manner, so using the array package seemed right up my alley as suggested in Improve speed of reading and converting from binary file?.
The problem is the pre-defined format is non-homogeneous. It looks something like this:
['<2i','<5d','<2i','<d','<i','<3d','<2i','<3d','<i','<d','<i','<3d']
with each integer i taking up 4 bytes, and each double d taking 8 bytes.
Is there a way I can still use the super efficient array package (or another suggestion) but with the right format?
Use struct. In particular, struct.unpack.
result = struct.unpack("<2i5d...", buffer)
Here buffer holds the given binary data.
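As a self-contained sanity check using just the first two pieces of that format (the full format string is elided above):

```python
import struct

fmt = "<2i5d"  # 2 little-endian 4-byte ints followed by 5 doubles
buffer = struct.pack(fmt, 1, 2, 0.5, 1.5, 2.5, 3.5, 4.5)

result = struct.unpack(fmt, buffer)
print(result)                # (1, 2, 0.5, 1.5, 2.5, 3.5, 4.5)
print(struct.calcsize(fmt))  # 48 bytes: 2*4 + 5*8
```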
It's not clear from your question whether you're concerned about the actual file reading speed (and building data structure in memory), or about later data processing speed.
If you are reading only once, and doing heavy processing later, you can read the file record by record (if your binary data is a recordset of repeated records with identical format), parse it with struct.unpack and append it to a [double] array:
import array
import struct
from functools import partial

data = array.array('d')
record_size_in_bytes = 9*4 + 16*8  # 9 ints + 16 doubles
with open('input', 'rb') as fin:
    for record in iter(partial(fin.read, record_size_in_bytes), b''):
        values = struct.unpack("<2i5d...", record)
        data.extend(values)
This is under the assumption that you are allowed to cast all your ints to doubles and are willing to accept the increase in allocated memory size (a 22% increase for the record from your question).
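The 22% figure can be checked from the field counts alone (the exact field order is elided above, but only the counts matter for size):

```python
import struct

record = struct.calcsize("<9i16d")       # 9 ints + 16 doubles: 164 bytes
as_doubles = 25 * struct.calcsize("<d")  # all 25 values upcast to double: 200 bytes
increase = (as_doubles / record - 1) * 100
print(record, as_doubles, round(increase))  # 164 200 22
```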
If you are reading the data from file many times, it could be worthwhile to convert everything to one large array of doubles (like above) and write it back to another file from which you can later read with array.fromfile():
import array
import os

data = array.array('d')
with open('preprocessed', 'rb') as fin:
    n = os.fstat(fin.fileno()).st_size // 8
    data.fromfile(fin, n)
Update. Thanks to a nice benchmark by #martineau, we now know for a fact that preprocessing the data and turning it into a homogeneous array of doubles ensures that loading such data from file (with array.fromfile()) is ~20x to ~40x faster than reading it record-by-record, unpacking, and appending to an array (as shown in the first code listing above).
A faster (and more standard) variation of record-by-record reading in #martineau's answer, which appends to a list and doesn't upcast to double, is only ~6x to ~10x slower than the array.fromfile() method and seems like a better reference benchmark.
Major Update: Modified to use proper code for reading in a preprocessed array file (function using_preprocessed_file() below), which dramatically changed the results.
To determine what method is faster in Python (using only built-ins and the standard libraries), I created a script to benchmark (via timeit) the different techniques that could be used to do this. It's a bit on the longish side, so to avoid distraction, I'm only posting the code tested and related results. (If there's sufficient interest in the methodology, I'll post the whole script.)
Here are the snippets of code that were compared:
#TESTCASE('Read and construct piecemeal with struct')
def read_file_piecemeal():
    structures = []
    with open(test_filenames[0], 'rb') as inp:
        size = fmt1.size
        while True:
            buffer = inp.read(size)
            if len(buffer) != size:  # EOF?
                break
            structures.append(fmt1.unpack(buffer))
    return structures
#TESTCASE('Read all-at-once, then slice and struct')
def read_entire_file():
    offset, unpack, size = 0, fmt1.unpack, fmt1.size
    structures = []
    with open(test_filenames[0], 'rb') as inp:
        buffer = inp.read()  # read entire file
        while True:
            chunk = buffer[offset: offset+size]
            if len(chunk) != size:  # EOF?
                break
            structures.append(unpack(chunk))
            offset += size
    return structures
#TESTCASE('Convert to array (#randomir part 1)')
def convert_to_array():
    data = array.array('d')
    record_size_in_bytes = 9*4 + 16*8  # 9 ints + 16 doubles (standard sizes)
    with open(test_filenames[0], 'rb') as fin:
        for record in iter(partial(fin.read, record_size_in_bytes), b''):
            values = struct.unpack("<2i5d2idi3d2i3didi3d", record)
            data.extend(values)
    return data
#TESTCASE('Read array file (#randomir part 2)', setup='create_preprocessed_file')
def using_preprocessed_file():
    data = array.array('d')
    with open(test_filenames[1], 'rb') as fin:
        n = os.fstat(fin.fileno()).st_size // 8
        data.fromfile(fin, n)
    return data

def create_preprocessed_file():
    """ Save array created by convert_to_array() into a separate test file. """
    test_filename = test_filenames[1]
    if not os.path.isfile(test_filename):  # doesn't already exist?
        data = convert_to_array()
        with open(test_filename, 'wb') as file:
            data.tofile(file)
And here were the results running them on my system:
Fastest to slowest execution speeds using Python 3.6.1
(10 executions, best of 3 repetitions)
Size of structure: 164
Number of structures in test file: 40,000
file size: 6,560,000 bytes
Read array file (#randomir part 2): 0.06430 secs, relative 1.00x ( 0.00% slower)
Read all-at-once, then slice and struct: 0.39634 secs, relative 6.16x ( 516.36% slower)
Read and construct piecemeal with struct: 0.43283 secs, relative 6.73x ( 573.09% slower)
Convert to array (#randomir part 1): 1.38310 secs, relative 21.51x (2050.87% slower)
Interestingly, most of the snippets are actually faster in Python 2...
Fastest to slowest execution speeds using Python 2.7.13
(10 executions, best of 3 repetitions)
Size of structure: 164
Number of structures in test file: 40,000
file size: 6,560,000 bytes
Read array file (#randomir part 2): 0.03586 secs, relative 1.00x ( 0.00% slower)
Read all-at-once, then slice and struct: 0.27871 secs, relative 7.77x ( 677.17% slower)
Read and construct piecemeal with struct: 0.40804 secs, relative 11.38x (1037.81% slower)
Convert to array (#randomir part 1): 1.45830 secs, relative 40.66x (3966.41% slower)
Take a look at the documentation for numpy's fromfile function: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.fromfile.html and https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#arrays-dtypes-constructing
Simplest example:
import numpy as np
data = np.fromfile('binary_file', dtype=np.dtype('<i8, ...'))
Read more about "Structured Arrays" in numpy and how to specify their data type(s) here: https://docs.scipy.org/doc/numpy/user/basics.rec.html#
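For instance, a structured dtype can mirror a mixed int/double record directly. This sketch uses an illustrative two-field layout (not the asker's full format) and an in-memory buffer in place of a file, via np.frombuffer, which behaves like np.fromfile:

```python
import numpy as np

# Illustrative record: 2 little-endian 4-byte ints followed by 5 doubles.
dt = np.dtype([('head', '<2i4'), ('vals', '<5f8')])

# Build a couple of records in memory instead of reading them from disk.
raw = np.zeros(2, dtype=dt)
raw['head'] = [[1, 2], [3, 4]]
raw['vals'] = [[0.5] * 5, [1.5] * 5]
buf = raw.tobytes()

# Decode the raw bytes straight into a structured array.
data = np.frombuffer(buf, dtype=dt)
print(data['head'])
```

Each field is then addressable by name, with no per-record Python loop involved.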
There are a lot of good and helpful answers here, but I think the best solution deserves more explanation. I implemented a method that reads the entire data file in one pass using the built-in read() and constructs a numpy ndarray at the same time. This is more efficient than reading the data and constructing the array separately, but it's also a bit more finicky.
line_cols = 20      # for example
line_rows = 40000   # for example
data_fmt = 15*'f8,' + 5*'f4,'  # for example (15 8-byte doubles + 5 4-byte floats)
data_bsize = 15*8 + 4*5        # for example
with open(filename, 'rb') as f:
    data = np.ndarray(shape=(1, line_rows),
                      dtype=np.dtype(data_fmt),
                      buffer=f.read(line_rows*data_bsize)
                      )[0].astype(line_cols*'f8,').view(dtype='f8').reshape(line_rows, line_cols)[:, :-1]
Here, we open the file as a binary file using the 'rb' option of open. Then we construct our ndarray with the proper shape and dtype to fit our read buffer. We then reduce the ndarray to a 1-D record array by taking its zeroth index, where all our data is hiding. Then we restructure the array using the astype, view, and reshape methods. This is because np.reshape doesn't like having data with mixed dtypes, and I'm okay with having my integers expressed as doubles.
This method is ~100x faster than looping line-for-line through the data, and could potentially be compressed down into a single line of code.
In the future, I may try to read the data in even faster using a Fortran script that essentially converts the binary file into a text file. I don't know if this will be faster, but it may be worth a try.
My problem is, I need to read around 50M lines from a file in format
x1 "\t" x2 "\t" .. x10 "\t" count
and then to compute the matrix A with components A[j][i] = Sum (over all lines) count * x_i * x_j.
I tried 2 approaches, both reading the file line per line:
1) keep A as a Python matrix and update it in a for loop:
for j in range(size):
    for i in range(size):
        A[j][i] += x[j] * x[i] * count
2) make A a numpy array, and update using numpy.add:
numpy.add(A, count * numpy.outer(x, x))
What surprised me is that the 2nd approach has been around 30% slower than the first one. And both are really slow - around 10 minutes for the whole file...
Is there some way to speed up the calculation of the matrix? Maybe there is some function that would read the data entirely from the file (or in large chunks) and not line per line? Any suggestions?
Some thoughts:
Use pandas.read_csv with the C engine to read the file. It is a lot faster than np.genfromtxt because the engine is C/Cython-optimized.
You can read the whole file into memory and then do the calculations. This is the easiest way, but from an efficiency perspective your CPU will be mostly idle waiting for input; that time could be better spent calculating.
You can try to read and process line by line (e.g. with the csv module). While IO will still be the bottleneck, by the end you will have processed your file. The problem here is that you will still have some efficiency loss due to the Python overhead.
Probably the best combination would be to read by chunks using pandas.read_csv with the iterator and chunksize parameters set, and process a chunk at a time. I bet there is an optimal chunk size that will beat the other methods.
Your matrix is symmetric; compute just the upper half using your first approach (55 computations per row instead of 100).
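The chunked pandas.read_csv suggestion above can be sketched as follows, assuming a tab-separated file with ten x-columns followed by a count column and no header (the file name, chunk size, and sample data are placeholders):

```python
import numpy as np
import pandas as pd

# Write a tiny stand-in input file: ten x-values plus a count per line.
rows = ["\t".join(["1.0"] * 10 + ["2.0"]),
        "\t".join(["2.0"] * 10 + ["1.0"])]
with open("data.tsv", "w") as fh:
    fh.write("\n".join(rows) + "\n")

size = 10
A = np.zeros((size, size))
for chunk in pd.read_csv("data.tsv", sep="\t", header=None,
                         chunksize=100000, engine="c"):
    x = chunk.iloc[:, :size].to_numpy()  # (n, 10) coordinates
    w = chunk.iloc[:, size].to_numpy()   # (n,) counts
    # Accumulate all of the chunk's weighted outer products at once:
    # sum over lines of count * outer(x, x)  ==  X^T diag(w) X
    A += (x * w[:, None]).T @ x
```

The per-line Python loop disappears entirely; each chunk contributes one matrix product.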
The second approach is slower. I don't know why for sure, but if you're instantiating 50M small ndarrays, it is possible that the allocations are the bottleneck; using a single ndarray and copying each row's data into it
x = np.zeros((11,))
for l in data.readlines():
    x[:] = l.split()
    A += np.outer(x[:-1], x[:-1]) * x[-1]
may result in a speedup.
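Separately, the symmetric-update idea from the list above can be sketched for a single (made-up) input line; only the upper triangle is accumulated, then mirrored once at the end:

```python
import numpy as np

size = 10
A = np.zeros((size, size))

# Toy line: ten x-values and a count (stand-ins for one parsed input line).
x = np.arange(1.0, 11.0)
count = 2.0

# Update only the upper triangle (including the diagonal): 55 updates, not 100.
for j in range(size):
    for i in range(j, size):
        A[j, i] += x[j] * x[i] * count

# Afterwards (or once at the very end), mirror to recover the full matrix.
full = A + np.triu(A, k=1).T
```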
Depending on how much memory you have available on your machine, you could try using a regular expression to parse the values, and numpy reshaping and slicing to apply the calculations. If you run out of memory, consider a similar approach but read the file in, say, 1M-line chunks.
import re
import numpy as np

txt = open("C:/temp/input.dat").read()
values = re.split(r"[\t\n]", txt.strip())
thefloats = [float(x) for x in values]
mat = np.reshape(thefloats, (num_rows, num_cols))  # num_cols = 11: ten x values plus the count
for i in range(num_rows):
    mat[i, :-1] *= mat[i, -1]  # scale each row's x values by its count
I have a piece of software that reads a file and transforms the first value of each line it reads using a function (derived from the numpy.polyfit and numpy.poly1d functions).
The program then has to write the transformed file out, and I wrongly (it seems) assumed that the disk I/O part was the performance bottleneck.
The reason why I claim that it is the transformation that is slowing things down is that I tested the code (listed below) after I changed transformedValue = f(float(values[0])) into transformedValue = 1000.00, and that took the time required down from 1 min to 10 seconds.
I was wondering if anyone knows of a more efficient way to perform repeated transformations like this?
Code snippet:
def transformFile(self, f):
    """ f contains the function returned by numpy.poly1d,
        inputFile is a tab-separated file containing two floats
        per line.
    """
    outputBatch = []
    with open(self.inputFile, 'r') as fr:
        for line in fr:
            line = line.rstrip('\n')
            values = line.split()
            transformedValue = f(float(values[0]))  # <-------- Bottleneck
            outputBatch.append(str(transformedValue) + " " + values[1] + "\n")
    joinedOutput = ''.join(outputBatch)
    with open(output, 'w') as fw:
        fw.write(joinedOutput)
The function f is generated by another function; it fits a 2nd-degree polynomial through a set of expected floats and a set of measured floats. A snippet from that function is:
# Perform 2nd-degree polynomial fit
z = numpy.polyfit(measuredValues, expectedValues, 2)
f = numpy.poly1d(z)
-- ANSWER --
I have revised the code to vectorize the values prior to transforming them, which significantly sped up performance. The code is now as follows:
def transformFile(self, f):
    """ f contains the function returned by numpy.poly1d,
        inputFile is a tab-separated file containing two floats
        per line.
    """
    outputBatch = []
    x_values = []
    y_values = []
    with open(self.inputFile, 'r') as fr:
        for line in fr:
            line = line.rstrip('\n')
            values = line.split()
            x_values.append(float(values[0]))
            y_values.append(int(values[1]))
    # Transform python list into numpy array
    xArray = numpy.array(x_values)
    newArray = f(xArray)
    # Prepare the outputs as a list
    for index, i in enumerate(newArray):
        outputBatch.append(str(i) + " " + str(y_values[index]) + "\n")
    # Join the output list elements
    joinedOutput = ''.join(outputBatch)
    with open(output, 'w') as fw:
        fw.write(joinedOutput)
It's difficult to suggest improvements without knowing exactly what your function f is doing. Are you able to share it?
However, in general many NumPy operations often work best (read: "fastest") on NumPy array objects rather than when they are repeated multiple times on individual values.
You might like to consider reading the number values[0] from each line into a Python list, converting this to a NumPy array, and using vectorisable NumPy operations to obtain an array of output values.
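A minimal sketch of that idea (the fit coefficients here are made up for illustration, standing in for the output of numpy.polyfit):

```python
import numpy as np

# Toy fit: a known quadratic instead of coefficients from numpy.polyfit.
f = np.poly1d([2.0, 0.0, 1.0])  # 2x^2 + 1

# Collect the first-column values first...
x_values = [0.0, 1.0, 2.0]

# ...then transform them all in one vectorised call.
transformed = f(np.array(x_values))
print(transformed)  # [1. 3. 9.]
```

One call over the whole array replaces one poly1d evaluation per line, which is exactly the change the revised code above makes.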
I have very long arrays and tables of time-value pairs in pytables. I need to be able to perform linear interpolation and zero order hold interpolation on this data.
Currently, I'm turning the columns into numpy arrays using pytables' column-wise slice notation and then feeding the numpy arrays to scipy.interpolate.interp1d to create the interpolation functions.
Is there a better way to do this?
The reason I ask is that it is my understanding that turning the columns into numpy arrays basically copies them into memory. Which means that when I start running my code full throttle I'm going to be in trouble since I will be working with data sets large enough to drown my desktop. Please correct me if I'm mistaken on this point.
Also, due to the large amounts of data I'll be working with, I suspect that writing a function that iterates over the pytables arrays/tables in order to do the interpolation myself will be incredibly slow since I need to call the interpolation function many, many times (about as many times as there are records in the data I'm trying to interpolate).
Your question is difficult to answer because there is always a trade off between memory and computation time and you are essentially asking to not have to sacrifice either of them, which is impossible. scipy.interpolate.interp1d() requires that the arrays be in memory and writing an out-of-core interpolator requires that you query the disk linearly with the number of times that you call it.
That said, there are a couple of things that you can do, none of which are perfect.
The first thing that you can try is downsampling the data. This will cut down the data that you need to have in memory by the factor that you downsample by. The disadvantage is that your interpolation is that much coarser. Luckily this is pretty easy to do: just provide a step size to the columns that you access. For a downsampling factor of 4 you would do:
import scipy.interpolate
import tables as tb

with tb.open_file('myfile.h5', 'r') as f:
    x = f.root.mytable.cols.x[::4]
    y = f.root.mytable.cols.y[::4]
f = scipy.interpolate.interp1d(x, y)
ynew = f(xnew)
You could make this step size adjustable based on the memory available if you wanted to as well.
Alternatively, if the data set that you are interpolating values for - xnew - exists only on a subset of the original domain, you can get away with reading in only portions of the original table that are in the new neighborhood. Given a fudge factor of 10%, you would do something like the following:
query = "{0} <= x & x <= {1}".format(xnew.min()*0.9, xnew.max()*1.1)
with tb.open_file('myfile.h5', 'r') as f:
    data = f.root.mytable.read_where(query)
f = scipy.interpolate.interp1d(data['x'], data['y'])
ynew = f(xnew)
Extending this idea: if xnew is sorted (monotonically increasing) but extends over the entire original domain, then you can read from the table on disk in a chunked fashion. Say we want to have 10 chunks:
newlen = len(xnew)
chunks = 10
chunklen = newlen // chunks
ynew = np.empty(newlen, dtype=float)
for i in range(chunks):
    xnew_chunk = xnew[i*chunklen:(i+1)*chunklen]
    query = "{0} <= x & x <= {1}".format(xnew_chunk.min()*0.9,
                                         xnew_chunk.max()*1.1)
    with tb.open_file('myfile.h5', 'r') as f:
        data = f.root.mytable.read_where(query)
    f = scipy.interpolate.interp1d(data['x'], data['y'])
    ynew[i*chunklen:(i+1)*chunklen] = f(xnew_chunk)
Striking the balance between memory and I/O speed is always a challenge. There are probably things that you can do to speed these strategies up depending on how regular your data is. Still, this should be enough to get you started.