In my Python program I am loading a lot of floating-point numbers for later use, about 100 million of them or more, and I am running into RAM problems. Since the numbers I am saving do not need high precision (3-4 digits would be more than enough) and are usually small (in the range -1000..1000), I do not need the precision provided by a 64-bit float.
Is there a way to store a floating-point number using less memory (maybe 8 or 16 bits)?
Thank you!
I would use the types in the numpy library, which provides the following types of interest:
float_
float16
float32
float64
So, if you wanted a 16-bit floating point number (1 sign bit, 5 exponent bits, and 10 mantissa bits), you could use the following:
import numpy as np
x = np.float16(10.0)
See also: data types in NumPy.
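A quick sketch of how this scales when the values go into a single array rather than individual scalars (the count is the 100 million from the question):

import numpy as np
# 100 million values held in a float16 array: 2 bytes each instead of 8
values = np.zeros(100_000_000, dtype=np.float16)
print(values.nbytes / 1e6)   # ~200 MB, versus ~800 MB for a float64 array

Individual np.float16 scalars still carry Python object overhead, so the saving only shows up when the values live together in one array.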
Pack them into byte arrays of 32-bit values using the struct module's f format.
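A minimal sketch of that suggestion (the sample numbers are made up):

import struct
# pack three numbers as 32-bit floats ('f' format), 4 bytes each instead of 8
numbers = [3.141, -12.5, 999.25]
packed = struct.pack(f'{len(numbers)}f', *numbers)
print(len(packed))                                  # 12 bytes
print(struct.unpack(f'{len(numbers)}f', packed))    # back as Python floats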
Related
Using Python 3.9.2, I read the beginning of a TB-sized binary file (a piece of it) as below:
file = open(filename, 'rb')
bytes = file.read(8)
print(bytes)
# prints: b'\x14\x00\x80?\xb5\x0c\xf81'
I tried the following np.fromfile calls to read the file filename:
float_data1 = np.fromfile(filename,np.float32)
float_data2 = np.fromfile(filename,np.complex64)
Since the binary file is always bigger than 500 GB, sometimes TB-sized, how can I read the complex data from such a file quickly while keeping the most accuracy?
This is related to your ham post.
samples = np.fromfile(filename, np.complex128)
and
Those bytes decode to -1.9726906072368233e-31 and +3.6405886029665884e-23.
No, they don't equal that. That's just your interpretation of bytes as float64. That interpretation is incorrect!
You assume these are 64-bit floating point numbers. They are not; you really need to stop assuming that. It's wrong, and we can't help you if you keep acting as if these were 64-bit floats forming a 128-bit complex value.
Besides the documents, I compared the byte content in the answer; that is more than just reading the docs.
As I already pointed out, that is wrong. Your computer will read anything as any type you tell it to, even if that's not the type it was originally stored as. You stored complex64 but read complex128. That's why your values are so implausible.
It's 32-bit floats forming a 64-bit complex value. The official block documentation for the file sink also points that out, and even explains the numpy dtype you need to use!
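To make the distinction concrete, here is a quick sketch that decodes the eight bytes shown above with both dtypes; the same bytes give completely different values depending on the dtype used:

import numpy as np
raw = b'\x14\x00\x80?\xb5\x0c\xf81'
print(np.frombuffer(raw, dtype=np.complex64))   # one complex64 sample (two float32s)
print(np.frombuffer(raw, dtype=np.float64))     # a meaningless float64 reinterpretation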
Anyways, you can use numpy's memmap functionality to map the file contents without reading them all to RAM. That works. Again, you need to use the right dtype, which is, to repeat this the 10th time, not complex128.
It's really easy:
import numpy
data = numpy.memmap(filename, dtype=numpy.complex64)
done.
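Only the parts you actually index get read from disk, so even a multi-TB file is fine. For example (reusing data from the snippet above):

first_chunk = data[:1_000_000]   # pulls ~8 MB from the file, not the whole thing
print(first_chunk[:4])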
I am trying to read a 2-dimensional range of values from an ".xlsb" file using xlwings. The range contains a series of formulas that return floats. When I read the values in, they come back as Decimals rather than floats, and the Decimals are truncated after 4 decimal places. For example, a value of 0.0913495 in Excel gets read in as Decimal('0.0913'). To make matters worse, when I try converting these Decimals to floats, I see that any precision beyond 4 decimal places has been completely lost; for example, calling float(Decimal('0.0913')) returns 0.0913!
So far I have tried the following to fix this problem, none have worked:
1. Set precision to 28 by calling decimal.getcontext().prec = 28. I have also tried 7, 8, etc. This seems to change nothing.
2. Use the .options method: sheet.range("myrange").options(numbers = lambda x : float(x)).value
3. Tried ".raw_value"
Ironically, (2) still returns numbers as Decimals; it is as if my options were ignored.
This is a problem because my particular application relies on more accuracy than 4 decimal places, yet xlwings refuses to read the values at any precision beyond that. How do I fix this?
For reference, I am using xlwings 0.23.0 with Python 3.8.8 and Excel version 2108 (Build 14326.20238)
I have a homebrew binary fixed-point arithmetic support library and would like to add numpy array support. Specifically I would like to be able to pass around 2D arrays of fixed-point binary numbers and do various operations on them such as addition, subtraction, multiplication, rounding, changing of fixed point format, etc.
The fixed-point support works on integers under the hood, with separate tracking of the fixed-point format data (number of integer and fractional bits) for range checking and type conversion.
I have been reading the numpy documentation on ndarray subclassing and dtype, and it seems like I might want at least a custom dtype, or a separate dtype object for every unique range/precision configuration of fixed-point numbers. I tried subclassing numpy.dtype in Python, but that is not allowed.
I'm not sure if I can write something that interoperates with numpy in the way I want without writing C-level code. Everything so far is pure Python; I have avoided looking under the covers at how to work on the C-based layer of numpy.
For anyone interested, this turned out to be too hard to do by extending NumPy from Python, or it just didn't fit the data model. I ended up writing a separate Python library of types implementing the behaviours I wanted, which uses NumPy arrays of integers under the hood for speed.
It works OK and does the strict binary range calculation and checking that I wanted, but it suffers from Python-level speed overhead, especially with small arrays. If I had time I'm sure it could be done much better/faster as a C library.
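To show the general idea, here is a minimal sketch (not the author's library) of fixed-point values stored as scaled NumPy integers, with the number of fractional bits tracked alongside the array:

import numpy as np

class FixedArray:
    """Values stored as raw integers scaled by 2**n_frac."""
    def __init__(self, values, n_frac=8):
        self.n_frac = n_frac
        self.raw = np.round(np.asarray(values, dtype=float) * (1 << n_frac)).astype(np.int64)

    def to_float(self):
        return self.raw / float(1 << self.n_frac)

    def __add__(self, other):
        assert self.n_frac == other.n_frac   # same scaling: integer addition is exact
        out = FixedArray([], self.n_frac)
        out.raw = self.raw + other.raw
        return out

    def __mul__(self, other):
        # the product has self.n_frac + other.n_frac fractional bits; shift back down
        out = FixedArray([], self.n_frac)
        out.raw = (self.raw * other.raw) >> other.n_frac
        return out

a = FixedArray([[1.5, -2.25], [0.125, 3.0]], n_frac=8)
b = FixedArray([[0.5, 0.5], [0.5, 0.5]], n_frac=8)
print((a * b).to_float())   # exact products: 0.75, -1.125, 0.0625, 1.5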
The fxpmath library for Python supports NumPy N-dimensional arrays of fixed-point numbers, with logical and arithmetic operations. You can find info at:
https://github.com/francof2a/fxpmath
an example:
import numpy as np
from fxpmath import Fxp

# ndim list as input
x = Fxp([[-1.5, 2.5], [0.125, 7.75]])

# ndim list of binary strings
y = Fxp([['0b1100', '0b0110'], ['0b0000', '0b1111']], signed=True, n_frac=2)

# numpy ndarrays as inputs
z1 = Fxp(np.random.uniform(size=(100, 20)), signed=True, n_word=8, n_frac=6)
z2 = Fxp(np.random.uniform(size=(100, 20)), signed=True, n_word=8, n_frac=6)

# some operation
z = z1 + z2
I have some Fortran code that I would like to convert to Python. The code makes extensive use of a 'type' data structure to describe data files that have long headers containing multiple variables, as well as sub-headers, and the actual data itself, which is stored in a five-dimensional array; the usable dimensions of the array are defined by other variables in the header. In Fortran I use an include file to define the type in each of the suite of programs I use.
In the below I've called the type 'SEQUENCE':
TYPE SEQUENCE
INTEGER NHEADER ! Number of items recorded from file header
CHARACTER*(256) FILE
CHARACTER*(8) UTC
CHARACTER*(10) UTDATE
CHARACTER*(24) OBJ_NAME
CHARACTER*(24) OBJ_CLASS
DOUBLE PRECISION MOD_TEMP
DOUBLE PRECISION MOD_FREQ
CHARACTER*(1) PMT
CHARACTER*(6) MOD
INTEGER FILNUM
CHARACTER*(24) FILTER
DOUBLE PRECISION MOD_AMP
INTEGER HT1
INTEGER GAIN1
INTEGER HT2
INTEGER GAIN2
DOUBLE PRECISION WP_DELTA
DOUBLE PRECISION WPA
DOUBLE PRECISION WPB
CHARACTER*(MAX_WP) WPSEQ
DOUBLE PRECISION ROTA
DOUBLE PRECISION ROTB
CHARACTER*(MAX_ROT) ROTSEQ
DOUBLE PRECISION TELESCOPE_PA
CHARACTER*(7) WAVEFORM
CHARACTER*(256) SKY_SUB
CHARACTER*(256) OS_SUB1
CHARACTER*(256) OS_SUB2
CHARACTER*(256) NOTES
INTEGER REPEATS
INTEGER WP_ROTATIONS
INTEGER ROTATIONS
INTEGER INTEGRATIONS
INTEGER CHANNELS
INTEGER POINTS
DOUBLE PRECISION ROT_POS(MAX_ROT)
DOUBLE PRECISION PRG_MOD_TEMP(MAX_REP,MAX_ROT,MAX_INT)
CHARACTER*(8) PRG_UTC(MAX_REP,MAX_ROT,MAX_INT)
DOUBLE PRECISION DAT(MAX_REP,MAX_ROT,MAX_INT,N_CH,MAX_PT)
DOUBLE PRECISION EXP_TIME
DOUBLE PRECISION INT_TIME
DOUBLE PRECISION TOTAL_SEQ_TIME
END TYPE
In case you're wondering, these files represent data acquired by an instrument that goes through a number of configurations for each data block. Some of the configurations are nominally equivalent; others measure an opposite state of the system.
As you can see there are a number of different data types in the structure, and times are currently stored as strings, which is not ideal. Operations involving time are a total pain in Fortran; being able to deal with this more easily in Python is one of the motivations for switching.
In some of the programs I make arrays of multiple SEQUENCEs. In others I need to perform mathematical operations along particular combinations of the dimensions in SEQUENCE.DAT(). Being Fortran, this is typically done with lots of loops and if statements.
I'm moderately new to python, so before I rewrite a lot of code I need to figure out what the best way to do this is. Or else do it again in a year.
Initially I hoped that pandas would provide the answer, but it seems you can't create panels of more than 3 dimensions, and the dataframes have to be all the same size. I don't really want to have to create my own classes from scratch as this seems like a lot of work to build functionality that might be more easily attained in another way. Should I be using records? Or something else?
What would you recommend? What advantages are there in terms of simplicity/ease of getting started and/or functionality?
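To make the "records" option concrete, here is a minimal sketch of a NumPy structured dtype covering a handful of the SEQUENCE fields; the MAX_* sizes and the field subset are placeholders:

import numpy as np

MAX_REP, MAX_ROT, MAX_INT, N_CH, MAX_PT = 4, 8, 16, 2, 1024   # placeholder sizes

seq_dtype = np.dtype([
    ('FILE',     'U256'),
    ('UTC',      'U8'),
    ('OBJ_NAME', 'U24'),
    ('MOD_TEMP', 'f8'),
    ('FILNUM',   'i4'),
    ('ROT_POS',  'f8', (MAX_ROT,)),
    ('DAT',      'f8', (MAX_REP, MAX_ROT, MAX_INT, N_CH, MAX_PT)),
])

# an array of sequences, analogous to a Fortran array of SEQUENCE
sequences = np.zeros(3, dtype=seq_dtype)
sequences[0]['OBJ_NAME'] = 'some object'
print(sequences['DAT'].shape)   # (3, 4, 8, 16, 2, 1024)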
I am looking for a solution to store about 10 million floating-point (double precision) numbers of a sparse matrix. The matrix is actually a two-dimensional triangular matrix of 1 million by 1 million elements. Element (i,j) holds the score measure score(i,j) between element i and element j. The storage method must allow very fast access to this information, maybe by memory-mapping the file containing the matrix. I certainly don't want to load the whole file into memory.
class Score(IsDescription):
    grid_i = UInt32Col()
    grid_j = UInt32Col()
    score = FloatCol()
I've tried pytables using the Score class shown above, but I cannot directly access element (i,j) without scanning all the rows. Any suggestions?
10 million double precision floats take up 80 MB of memory. If you store them in a 1 million x 1 million sparse matrix, in CSR or CSC formats, you will need an additional 11 million int32s, for a total of around 125 MB. That's probably less than 7% of the physical memory in your system. And in my experience, on a system with 4GB running a 32-bit version of python, you rarely start having trouble allocating arrays until you try to get a hold of ten times that.
Run the following code on your computer:
import itertools
import numpy as np

# keep allocating bigger and bigger blocks until an allocation fails
for j in itertools.count(100):
    try:
        a = np.empty((j * 10**6,), dtype='uint8')
        print('Allocated {0} MB of memory!'.format(j))
        del a
    except MemoryError:
        print('Failed to allocate {0} MB of memory!'.format(j))
        break
And unless it fails to get you at least 4 times the amount calculated above, don't hesitate to stick the whole thing in memory using a scipy.sparse format.
I have no experience with pytables, nor much with numpy's memmap arrays. But it seems to me that either of those will involve you coding the logic to handle the sparsity yourself, something I would try to avoid if at all possible.
You should use scipy.sparse. Here's some more info about the formats and usage.
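A minimal sketch of what that could look like for this use case (the sample scores are made up): build the matrix in COO form from (i, j, score) triples, then convert to CSR for fast element access.

import numpy as np
from scipy import sparse

n = 1_000_000
rows = np.array([0, 0, 5, 999_999])
cols = np.array([1, 7, 5, 0])
scores = np.array([0.25, 1.5, -3.0, 42.0])

m = sparse.coo_matrix((scores, (rows, cols)), shape=(n, n)).tocsr()
print(m[5, 5])        # direct (i, j) access without scanning rows
print(m.getrow(0))    # all stored scores for element 0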