If I don't set the precision, how many decimal places can Numpy be accurate?
I've been dealing with some scientific computing problems for a while now, usually with precision to 15 decimal places, and I don't know if numpy can do this kind of precision.
If not, how can I set the precision?
Here's a page that might help.
https://numpy.org/doc/stable/user/basics.types.html
The bottom line is that numpy uses the default double precision floating point number, which gives you approximately 16 decimal places of precision on most 64 bit systems. Unfortunately it doesn't support more precision than that.
To calculate with more precision, you'll need some Python libraries. Here's something you might find helpful.
https://zetcode.com/python/decimal/
You haven't specified what data type you're using, but numpy's data types documentation could be a good starting point. You can get maximum precision if you use numpy.longdouble, which is platform-specific and usually depends on what your C compiler considers long double.
Without setting the dtype explicitly, you will get 64 bit precision:
>>> import numpy as np
>>> x = np.array([1.0,2.0,3.0])
>>> print(x.dtype)
float64
Related
I'm trying to use python to investigate the effect of C++ truncating doubles to floats.
In C++ I have relativistic energies and momenta which are cast to floats, and I'm trying to work out whether at these energies saving them as doubles would actually result in any improved precision in the difference between energy and momentum.
I chose python because it seemed like a quick and easy way to look at this with some test energies, but now I realise that the difference between C++ 32 bit floats and 64 bit doubles isn't reflect in python.
I haven't yet found a simple way of taking a number and reducing the number of bits used to store it in python. Please can someone enlighten me?
I'd suggest using Numpy as well. It exposes various data types including C style floats and doubles.
Other useful tools are the C++17 style hex encoding and the decimal module for getting accurate decimal expansions.
For example:
import numpy as np
from decimal import Decimal
for ftype in (np.float32, np.float64):
v = np.exp(ftype(1))
pyf = float(v)
print(f"{v.dtype} {v:20} {pyf.hex()} {Decimal(pyf)}")
giving
float32 2.7182819843292236 0x1.5bf0aa0000000p+1 2.7182819843292236328125
float64 2.718281828459045 0x1.5bf0a8b145769p+1 2.718281828459045090795598298427648842334747314453125
Unfortunately the hex-encoding is a bit verbose for float32s (i.e. the zeros are redundant), and the decimal module doesn't know about Numpy floats, so you need to convert it to a Python-native float first. But given that binary32's can be directly converted to binary64's this doesn't seem too bad.
Just thought, that it sounds like you might want these written out to a file. If so, Numpy scalars (and ndarrays) support the buffer protocol, which means you can just write them out or use bytes(v) to get the underlying bytes.
Is there some type like long in Python 2 that can handle very small number like 8.5e-350?
Or is there any way around in this situation because python supresses floats after 320 decimal places to 0?
The standard library comes with the decimal module
The decimal module provides support for fast correctly-rounded decimal
floating point arithmetic. It offers several advantages over the float
datatype.
...
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as
large as needed for a given problem:
>>> from decimal import *
>>> getcontext().prec = 500
>>> Decimal(10) / Decimal(3)
Decimal('3.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333')
>>> len('3.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333')
501
Quick start tutorial
Python uses IEEE 745 so you don't get 350 decimal places, you get about 15 significant figures and a limit on the exponent to +/-350. If that number of significant figures is OK, and you don't need more dynamic range (i.e. all your values are small), express your quantities in different units so your values are in a suitable range.
I am attempting to replicate a DSP algorithm in Python that was originally written in C. The trick is I also need to retain the same behavior of the 32 bit fixed point variables from the C version, including any numerical errors that the limited precision would introduce.
The current options I think are available are:
I know the python Decimal type can be used for fixed-point arithmetic, however from what I can tell there is no way to adjust the size of a Decimal variable. To my knowledge numpy does not support doing fixed point operations.
I did a quick experiment to see how fiddling with the Decimal precision affected things:
>>> a = dc.Decimal(1.1)
>>> a
Decimal('1.100000000000000088817841970012523233890533447265625')
>>> sys.getsizeof(a)
104
>>> dc.getcontext().prec = 16
>>> a = dc.Decimal(1.1)
>>> a
Decimal('1.1999999999999999555910790149937383830547332763671875')
>>> sys.getsizeof(a)
104
There is a change before/after the precision change, however there are still a large number of decimal places. The variable is still the same size, and has quite a few decimal places after it.
How can I best go about achieving the original objective? I do know that Python ctypes has the C language float, but I do not know if that will be useful in this case. I do not know if there is even a way to accurately mimic C type fixed point math in Python.
Thanks!
I recommend fxpmath module for fixed-point operations in python. Maybe with that you can emulate the fixed point arithmetic defining precision in bits (fractional part). It supports arrays and some arithmetic operations.
Repo at: https://github.com/francof2a/fxpmath
Here an example:
from fxpmath import Fxp
x = Fxp(1.1, True, 32, 16) # (val, signed, n_word, n_frac)
print(x)
print(x.precision)
results in:
1.0999908447265625
1.52587890625e-05
I have used one python code in PyCharm in Linux and the format of number was
-91.35357. When I used the same code in PyCharm in Windows format was
-91.35356999999999. The problem is that value is consisted in the file name which I need to open (and the list of files to open is long).
Anyone knows possible explanation and hot to fix it?
Floats
Always remember that float numbers have a limited precision. If you think about it, there must be a limit to how exactly you represent a number if you limit storage to 32 or 64 bits (or any other number).
in Python
Python provides just one float type. Float numbers are usually implemented using 64 bits, but yet they might be 64 bit in one Python binary, 32 bit on another, so you can't really rely on that (however, see #Mark Dickinson comment below).
Let's test this. But note that, because Python does not provide float32 and float64 alternatives, we will use a different library, numpy, to provide us with those types and operations:
>>> n = 1.23456789012345678901234567890
>>> n
1.2345678901234567
>>> numpy.float64(n)
1.2345678901234567
>>> numpy.float32(n)
1.2345679
Here we can see that Python, in my computer, handles the variable as a float64. This already truncates the number we introduced (because a float64 can only handle so much precision).
When we use a float32, precision is further reduced and, because of truncation, the closest number we can represent is slightly different.
Conclusion
Float resolution is limited. Furthermore, some operations behave differently across different architectures.
Even if you are using a consistent float size, not all numbers can be represented, and operations will accumulate truncation errors.
Comparing a float to another float shall be done considering a possible error margin. Do not use float_a == float_b, instead use abs(float_a - float_b) < error_margin.
Relying on float representations is always a bad idea. Python sometimes uses scientific notation:
>>> a = 0.0000000001
>>> str(a)
'1e-10'
You can get consistent rounding approximation (ie, to use in file names), but remember that storage and representation are different things. This other thread may assist you: Limiting floats to two decimal points
In general, I'd advise against using float numbers in file names or as any other kind of identifier.
Latitude / Longitude
float32 numbers have not enough precision to represent the 5th and 6th decimal numbers in latitude/longitude pairs (depending on whether the integer part has one, two or three digits).
If you want to learn what's really happening, check this page and test some of your numbers: https://www.h-schmidt.net/FloatConverter/IEEE754.html
Representing
Note that Python rounds float values when representing them:
>>> lat = 123.456789
>>> "{0:.6f}".format(lat)
'123.456789'
>>> "{0:.5f}".format(lat)
'123.45679'
And as stated above, latitude/longitude cannot be correctly represented by a float32 down to the 6th decimal, and furthermore, the truncated float values are rounded when presented by Python:
>>> lat = 123.456789
>>> lat
123.456789
>>> "{0:.5f}".format(numpy.float64(lat))
'123.45679'
>>> "{0:.5f}".format(numpy.float32(lat))
'123.45679'
>>> "{0:.6f}".format(numpy.float32(lat))
'123.456787'
As you can see, the rounded version of that float32 number fails to match the original number from the 5th decimal. But also does the rounded version to the 5th decimal of the float64 number.
Your PyCharm on Linux is simply rounding of your large floating point number. Rounding it off to the nearest 6 or 7 can resolve your issue but DONT USE THESE AS FILE NAMES.
Keeping your code constant in both cases then, their can be many explanations:
1) 32-bit Processors handles float differently than 64-Bit Processors.
2) PyCharm for both Linux and Windows behaves differently for floating points which we cannot determine exactly, may be PyCharm for Windows is better optimised.
edit 1
Explanation for Point 1
on 32-Bit processors everything is really done in 80-bit precision internally. The precision really just determines how many of those bits are stored in memory. This is part of the reason why different optimisation settings can change results slightly: They change the amount of rounding from 80-bit to 32- or 64-bit.
edit 2
You can use hashmapping for saving your data in files and then mapping them onto the co-ordinates.
Example:
# variable = {(long,lat):"<random_file_name>"}
cordinates_and_file ={(-92.45453534,-87.2123123):"AxdwaWAsdAwdz"}
when performing math operations on float16 Numpy numbers, the result is also in float16 type number.
My question is how exactly the result is computed?
Say Im multiplying/adding two float16 numbers, does python generate the result in float32 and then truncate/round the result to float16? Or does the calculation performed in '16bit multiplexer/adder hardware' all the way?
another question - is there a float8 type? I couldnt find this one... if not, then why? Thank-you all!
To the first question: there's no hardware support for float16 on a typical processor (at least outside the GPU). NumPy does exactly what you suggest: convert the float16 operands to float32, perform the scalar operation on the float32 values, then round the float32 result back to float16. It can be proved that the results are still correctly-rounded: the precision of float32 is large enough (relative to that of float16) that double rounding isn't an issue here, at least for the four basic arithmetic operations and square root.
In the current NumPy source, this is what the definition of the four basic arithmetic operations looks like for float16 scalar operations.
#define half_ctype_add(a, b, outp) *(outp) = \
npy_float_to_half(npy_half_to_float(a) + npy_half_to_float(b))
#define half_ctype_subtract(a, b, outp) *(outp) = \
npy_float_to_half(npy_half_to_float(a) - npy_half_to_float(b))
#define half_ctype_multiply(a, b, outp) *(outp) = \
npy_float_to_half(npy_half_to_float(a) * npy_half_to_float(b))
#define half_ctype_divide(a, b, outp) *(outp) = \
npy_float_to_half(npy_half_to_float(a) / npy_half_to_float(b))
The code above is taken from scalarmath.c.src in the NumPy source. You can also take a look at loops.c.src for the corresponding code for array ufuncs. The supporting npy_half_to_float and npy_float_to_half functions are defined in halffloat.c, along with various other support functions for the float16 type.
For the second question: no, there's no float8 type in NumPy. float16 is a standardized type (described in the IEEE 754 standard), that's already in wide use in some contexts (notably GPUs). There's no IEEE 754 float8 type, and there doesn't appear to be an obvious candidate for a "standard" float8 type. I'd also guess that there just hasn't been that much demand for float8 support in NumPy.
This answer builds on the float8 aspect of the question. The accepted answer covers the rest pretty well.One of the main reasons there isn't a widely accepted float8 type, other than a lack of standard is that its not very useful practically.
Primer on Floating Point
In standard notation, a float[n] data type is stored using n bits in memory. That means that at most only 2^n unique values can be represented. In IEEE 754, a handful of these possible values, like nan, aren't even numbers as such. That means all floating point representations (even if you go float256) have gaps in the set of rational numbers that they are able to represent and they round to the nearest value if you try to get a representation for a number in this gap. Generally the higher the n, the smaller these gaps are.
You can see the gap in action if you use the struct package to get the binary representation of some float32 numbers. Its a bit startling to run into at first but there's a gap of 32 just in the integer space:
import struct
billion_as_float32 = struct.pack('f', 1000000000 + i)
for i in range(32):
billion_as_float32 == struct.pack('f', 1000000001 + i) // True
Generally, floating point is best at tracking only the most significant bits so that if your numbers have the same scale, the important differences are preserved. Floating point standards generally differ only in the way they distribute the available bits between a base and an exponent. For instance, IEEE 754 float32 uses 24 bits for the base and 8 bits for the exponent.
Back to float8
By the above logic, a float8 value can only ever take on 256 distinct values, no matter how clever you are in splitting the bits between base and exponent. Unless you're keen on it rounding numbers to one of 256 arbitrary numbers clustered near zero its probably more efficient to just track the 256 possibilities in a int8.
For instance, if you wanted to track a very small range with coarse precision you could divide the range you wanted into 256 points and then store which of the 256 points your number was closest to. If you wanted to get really fancy you could have a non-linear distribution of values either clustered at the centre or at the edges depending on what mattered most to you.
The likelihood of anyone else (or even yourself later on) needing this exact scheme is extremely small and most of the time the extra byte or 3 you pay as a penalty for using float16 or float32 instead is too small to make a meaningful difference. Hence...almost no one bothers to write up a float8 implementation.