I have a question about plotting. I want to plot some data between these two values:
3825229325678980.0786812569752124806963380417361932
and
3825229325678980.078681262584097479512892231994772
but I get the following error:
Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=3.82522932568e+15, top=3.82522932568e+15
How should I increase the decimal points here to solve the problem?
The difference between your min and max values is smaller than what a double can resolve at that magnitude: the machine epsilon of a double is ~2.2e-16, so at values around 3.8e15 anything below roughly 0.5 is lost.
Basically, using an 8-byte (64-bit) floating-point representation, you cannot distinguish between the two numbers.
I suggest removing the integer digits from your data and representing only the decimal part. The integer part is just a big constant that you can always add back later.
It might be easiest to scale your data to provide a range that looks less like zero.
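For example, a minimal sketch of that idea, assuming the raw values arrive as full-precision strings (the data below is the pair from the question):
import matplotlib.pyplot as plt
# Keep the common integer part as a constant and plot only the
# fractional digits, which a double can easily resolve.
raw = [
    "3825229325678980.0786812569752124806963380417361932",
    "3825229325678980.078681262584097479512892231994772",
]
offset = int(raw[0].split(".")[0])
frac = [float("0." + s.split(".")[1]) for s in raw]
plt.plot(frac, "o")
plt.ylabel(f"value - {offset}")
plt.show()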
Related
I am trying to read a dataframe from a csv, do some calculations with it and then export the results to another csv. While doing that I noticed that the value 8.1e-202 is getting changed to 8.1000000000000005e-202. But all the other numbers are represented correctly.
Example:
An example.csv looks like this:
id,e-value
ID1,1e-20
ID2,8.1e-202
ID3,9.24e-203
If I do:
import pandas as pd
df = pd.read_csv("example.csv")
df.iloc[1]["e-value"]
>>> 8.1000000000000005e-202
df.iloc[2]["e-value"]
>>> 9.24e-203
Why is 8.1e-202 being altered but 9.24e-203 isn't?
I tried to change the datatype that pandas is using from the default
df["e-value"].dtype
>>> dtype('float64')
to numpy datatypes like this:
import numpy as np
df = pd.read_csv("./temp/test", dtype={"e-value" : np.longdouble})
but this will just result in:
df.iloc[1]["e-value"]
>>> 8.100000000000000522e-202
Can someone explain to me why this is happening? I can't replicate this problem with any other number. Everything bigger or smaller than 8.1e-202 seems to work normally.
EDIT:
To clarify my problem: I am aware that floats are not perfect. My actual problem is that once I write the dataframe back to a CSV, the resulting file looks like this:
id,e-value
ID1,1e-20
ID2,8.1000000000000005e-202
ID3,9.24e-203
And I need the second row to be ID2,8.1e-202
I "fixed" this by just formatting this column before I write the csv, but I'm unhappy with this solution since the formatting will change other elements to something scientific notation where it was just a normal float.
def format_eval(e):
    return "{0:.1e}".format(e)
df["e-value"] = df["e-value"].apply(format_eval)
Floating-point number representation is not so simple. Not every real number can be represented, and almost all of them (relatively speaking) are actually approximations. It is not like integers: the precision varies with magnitude, and Python's float is an IEEE 754 double-precision value under the hood.
Each floating-point standard has its own set of real numbers that it can represent exactly. There is no way around that.
https://en.wikipedia.org/wiki/Single-precision_floating-point_format
https://en.wikipedia.org/wiki/IEEE_754-2008_revision
If the problem really is the arithmetic or a comparison, you should consider whether the error will grow or shrink. For example, multiplying by large numbers can grow the representation error.
Also, when comparing, you should use something like math.isclose, which basically compares the distance between the numbers.
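For instance (the tolerance here is illustrative):
import math
a = 8.1e-202
b = 8.1000000000000005e-202
print(a == b)                            # False: two different doubles
print(math.isclose(a, b, rel_tol=1e-9))  # True: close enough for most uses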
If you are trying to represent and operate on real numbers that aren't irrational, such as integers, fractions, or decimal numbers with finitely many digits, you can also consider casting to the proper representation: int, decimal.Decimal, or fractions.Fraction.
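For example:
from decimal import Decimal
from fractions import Fraction
# Decimal keeps exactly the digits of the string; Fraction does exact
# rational arithmetic. Neither introduces binary rounding.
print(Decimal("8.1e-202") * 10)         # 8.1E-201, still exact
print(Fraction(1, 3) + Fraction(1, 6))  # 1/2, no rounding error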
See this for further ideas:
https://davidamos.dev/the-right-way-to-compare-floats-in-python/
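Coming back to the CSV round trip in the question's edit: pandas' float_precision="round_trip" option is worth trying. It parses each field with Python's own float conversion, so the shortest repr survives a read/write cycle. A sketch, assuming the example.csv from the question:
import pandas as pd
df = pd.read_csv("example.csv", float_precision="round_trip")
print(df.iloc[1]["e-value"])  # 8.1e-202
df.to_csv("output.csv", index=False)  # row stays ID2,8.1e-202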
For a certain task, I have too many repeated calls to a complex function, call it f(x), where x is a float. I do not have very large floats, and not much precision is required, so I thought: why not use a lookup table for f(x), where x is a float16? The maximum size of the lookup table is 2**16 entries. I was planning on making a small Python demo using np.float16. I am a bit stuck on how to iterate over the range of all floats. In C/C++, I would have used a uint16_t and kept incrementing it. How do I create this table using Python?
You can generate all the possible values using arange and then reinterpret the values as float16 values using view. Here is an example:
np.arange(65536, dtype=np.uint16).view(np.float16)
This should give you all possible float16 values. Note that many are NaN values.
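Building on that, a sketch of the full table and the lookup itself; f here is a stand-in for your expensive function:
import numpy as np
def f(x):
    return np.sin(x)  # placeholder for the real computation
# Every possible float16 bit pattern, reinterpreted as float16 values.
all_f16 = np.arange(2**16, dtype=np.uint16).view(np.float16)
# Precompute once; the NaN/inf inputs simply produce NaN entries.
with np.errstate(invalid="ignore"):
    table = f(all_f16.astype(np.float64)).astype(np.float16)
def f_lookup(x):
    # Reinterpret the float16 input as its uint16 bit pattern and index.
    return table[np.asarray(x, dtype=np.float16).view(np.uint16)]
print(f_lookup(np.float16(1.5)), f(1.5))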
I used the netCDF Python library to read a netCDF variable, for which list(variable) returns the correct decimal precision as in the image (using the PyCharm IDE). However, when I try to get an element by index, e.g. variable[0], it returns the rounded value instead (e.g. 5449865.55794), while I need 5449865.55793999997.
How can I iterate this list with the correct decimal precision?
Some basic code
from netCDF4 import Dataset
nc_dataset = Dataset(self.get_file().get_filepath(), "r")
variable_name = "E"
# the netCDF file contains a few variables (axis dimensions)
variable = nc_dataset.variables[variable_name]
variable is not a list but a netCDF variable object; however, using list(variable) or variable[index] will return the element values of the axis dimension.
The decimals you are chasing are bogus. The difference is only in the way these numbers are represented on your screen, not in how they are stored in your computer.
Try the following to convince yourself
a = 5449865.55793999997
print(a)
# prints 5449865.55794
The difference between the two numbers if we were to take them literally is 3x10^-11. The smallest difference a 64 bit floating point variable at the size of a can resolve is more than an order of magnitude larger. So your computer cannot tell these two decimal numbers apart.
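You can check this directly:
print(5449865.55793999997 == 5449865.55794)  # True: the very same double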
But look at the bright side. Your data aren't corrupted by some mysterious process.
Hope this suits your needs:
import decimal
print(decimal.Decimal('5449865.55793999997'))
Just as a preamble, I am using Python 3 and the bitstring library.
ARINC 429 words are 32-bit data words.
Bits 1-8 are used to store the label. Say, for example, I want the word to set the latitude; according to the label docs, "set latitude" uses the octal label
041
I can model this in Python by doing:
from bitstring import BitArray
label = BitArray(oct='041')
print(label.bin)
>> 000100001
The next two bits can be used to send a source, or to extend the label by giving an equipment ID. Equipment IDs are given in hex; the one I wish to use is
002
So again, I add it to a new BitArray object and convert it to binary
>> 000000010
Next comes the data field, which spans bits 11-29. Say I want to set the latitude to the general area of London (51.5072). This is where I'm getting stuck, as floats can only be 32/64 bits long.
There are 2 other parts of the word, but before I go there I am just wondering: am I on the right track, or way off in how you would construct such a word?
Thanks.
I think you're on the right track, but you need to either know or decide the format for your data field.
If the 19 bits you want are documented somewhere as representing a float, then look at how that conversion is done (as that's not at all a standard number of bits for a floating-point number). If those bits are free-form and you can choose both the encode and the decode, then just pick something appropriate.
There is a standard for 16-bit floats which is occasionally used, but if you only want to represent a latitude I'd go for something simpler. As it can only go from 0 to 360, just scale that to an integer from 0 to 2**19 and store the integer.
So 51.5072 becomes int(51.5072/360*(2**19)) = 75012.
Then store this as an unsigned integer:
> latitude = BitArray(uint=75012, length=19)
This gives you a resolution of about 0.0007 degrees, which is the best you can hope for. To convert back:
> latitude.uint*360.0/2**19
51.50665283203125
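Putting the pieces together, here is a sketch of assembling a full 32-bit word with bitstring. The field layout (label 8, SDI 2, data 19, SSM 2, parity 1) is the common one, and the SDI/SSM values here are made-up examples; check both against your spec:
from bitstring import BitArray
label = BitArray(uint=0o041, length=8)  # octal label 041 as an 8-bit field
sdi = BitArray(uint=2, length=2)        # source/destination identifier (example value)
data = BitArray(uint=int(51.5072 / 360 * 2**19), length=19)  # scaled latitude
ssm = BitArray(uint=3, length=2)        # sign/status matrix (example value)
word = label + sdi + data + ssm         # 31 bits so far
word += BitArray(uint=(word.count(1) + 1) % 2, length=1)  # odd parity bit
assert len(word) == 32
print(word.bin)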
I'm working on a program with fairly complex numerics, mostly in numpy with complex datatypes. Some of the calculations are returning arrays whose complex components are almost zero. For example:
(2 + 0j, 3+0j, 4+3.9320340202e-16j)
Clearly the third component is basically zero, but for whatever reason this is the output of my calculation, and it turns out that for some of these nearly-zero values, np.iscomplex() returns True. Rather than dig through that big code, I think it's sensible to just apply a cutoff. My question is: what is a sensible cutoff below which anything should be considered zero? 0.00? 0.000000? etc...
I understand that these values are due to rounding errors in floating point math, and just want to handle them sensibly. What is the tolerance/range one allows for such precision error? I'd like to set it to a parameter:
ABOUTZERO=0.000001
As others have commented, what constitutes 'almost zero' really does depend on your particular application, and how large you expect the rounding errors to be.
If you must use a hard threshold, a sensible value might be the machine epsilon, which is defined as the upper bound on the relative error due to rounding for floating point operations. Intuitively, it is the smallest positive number that, when added to 1.0, gives a result >1.0 using a given floating point representation and rounding method.
In numpy, you can get the machine epsilon for a particular float type using np.finfo:
import numpy as np
print(np.finfo(float).eps)
# 2.22044604925e-16
print(np.finfo(np.float32).eps)
# 1.19209e-07
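If you'd rather not invent a constant yourself, numpy's np.real_if_close applies exactly this kind of cutoff, with tol expressed in multiples of the machine epsilon:
import numpy as np
a = np.array([2 + 0j, 3 + 0j, 4 + 3.9320340202e-16j])
# Drops the imaginary parts when all of them are within tol*eps of zero.
print(np.real_if_close(a, tol=1000))
# [2. 3. 4.]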