I used the netCDF4 Python library to read a netCDF variable. list(variable) shows the values with the correct decimal precision (as displayed in the PyCharm IDE). However, when I get an element by index, e.g. variable[0], it returns a rounded value instead (e.g. 5449865.55794), while I need 5449865.55793999997.
How can I iterate over this list with the correct decimal precision?
Some basic code:
from netCDF4 import Dataset
nc_dataset = Dataset(self.get_file().get_filepath(), "r")
variable_name = "E"
# The netCDF file contains a few variables (axis dimensions)
variable = nc_dataset.variables[variable_name]
variable is not a list but a netCDF object; however, list(variable) or variable[index] returns the element values of the axis dimension.
The decimals you are chasing are bogus. The difference is only in the way these numbers are represented on your screen, not in how they are stored in your computer.
Try the following to convince yourself:
a = 5449865.55793999997
a
# prints 5449865.55794
The difference between the two numbers, if we were to take them literally, is 3x10^-11. The smallest difference a 64-bit floating-point variable can resolve at the magnitude of a is about 9x10^-10, more than an order of magnitude larger. So your computer cannot tell these two decimal numbers apart.
But look at the bright side. Your data aren't corrupted by some mysterious process.
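The claim above can be checked with a minimal sketch that uses only the standard library. The helper name `ulp` is made up for illustration (Python 3.9+ also provides `math.ulp`), and the sketch assumes the usual 64-bit IEEE 754 float.

```python
import math
import sys

def ulp(x):
    """Spacing between adjacent doubles near x (hypothetical helper)."""
    _, exponent = math.frexp(x)  # x = m * 2**exponent with 0.5 <= |m| < 1
    return 2.0 ** (exponent - sys.float_info.mant_dig)

a = 5449865.55793999997
b = 5449865.55794

same_float = (a == b)  # both literals round to the same stored value
gap = ulp(a)           # 2**-30, roughly 9.3e-10

print(same_float)
print(gap)
```

The two literals differ by 3e-11, which is far below the spacing reported by `ulp`, so they cannot be stored as different values.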
Hope this is what suits your needs:
import decimal
print(decimal.Decimal('5449865.55793999997'))
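A small sketch of what `Decimal` does and does not preserve. It assumes the digits are available as text; values that have already been read into a float (as netCDF doubles are) carry only the binary value, so converting them afterwards does not bring the original digits back.

```python
from decimal import Decimal

text = "5449865.55793999997"

# Built from the string: every digit is kept.
exact = Decimal(text)

# Built from a float: the exact value of the nearest double,
# which is not the same as the original decimal digits.
via_float = Decimal(float(text))

print(exact)
print(via_float)
```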
I am trying to read a dataframe from a csv, do some calculations with it and then export the results to another csv. While doing that I noticed that the value 8.1e-202 is getting changed to 8.1000000000000005e-202. But all the other numbers are represented correctly.
Example:
An example.csv looks like this:
id,e-value
ID1,1e-20
ID2,8.1e-202
ID3,9.24e-203
If I do:
import pandas as pd
df = pd.read_csv("example.csv")
df.iloc[1]["e-value"]
>>> 8.1000000000000005e-202
df.iloc[2]["e-value"]
>>> 9.24e-203
Why is 8.1e-202 being altered but 9.24e-203 isn't?
I tried to change the datatype that pandas is using from the default
df["e-value"].dtype
>>> dtype('float64')
to numpy datatypes like this:
import numpy as np
df = pd.read_csv("./temp/test", dtype={"e-value" : np.longdouble})
but this will just result in:
df.iloc[1]["e-value"]
>>> 8.100000000000000522e-202
Can someone explain to me why this is happening? I can't replicate this problem with any other number. Everything bigger or smaller than 8.1e-202 seems to work normally.
EDIT:
To clarify my problem: I am aware that floats are not perfect. My actual problem is that once I write the dataframe back to a csv, the resulting file looks like this:
id,e-value
ID1,1e-20
ID2,8.1000000000000005e-202
ID3,9.24e-203
And I need the second row to be ID2,8.1e-202
I "fixed" this by just formatting this column before I write the csv, but I'm unhappy with this solution since the formatting converts other elements to scientific notation where they were plain floats before.
def format_eval(e):
    return "{0:.1e}".format(e)

df["e-value"] = df["e-value"].apply(format_eval)
Floating-point representation is not that simple. Not every real number can be represented, and almost all values (relatively speaking) are actually approximations. Unlike integers, the absolute precision varies with the magnitude of the number. Python's float is a C double, which on practically every platform is a 64-bit IEEE 754 value with about 15 to 17 significant decimal digits.
Each floating-point standard has its own set of real numbers that it can represent exactly. There is no workaround.
https://en.wikipedia.org/wiki/Single-precision_floating-point_format
https://en.wikipedia.org/wiki/IEEE_754-2008_revision
If the problem really is the arithmetic or the comparison, you should consider whether the error will grow or shrink. For example, multiplying by large numbers increases the absolute representation error.
Also, when comparing, you should use something like math.isclose, which compares the distance between the numbers against a tolerance.
If you are trying to represent and operate on real numbers that are not irrational, such as integers, fractions or decimal numbers with finitely many digits, you can also consider converting to the proper representation: int, decimal.Decimal or fractions.Fraction.
See this for further ideas:
https://davidamos.dev/the-right-way-to-compare-floats-in-python/#:~:text=How%20To%20Compare%20Floats%20in%20Python&text=If%20abs(a%20%2D%20b),rel_tol%20keyword%20argument%20of%20math.
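The suggestions above could look like this in practice. This is a minimal sketch using only the standard library; the tolerance `rel_tol=1e-9` is just the default of `math.isclose` written out, not a recommendation for any particular data set.

```python
import math
from decimal import Decimal
from fractions import Fraction

# Binary floats: 0.1 + 0.2 is not exactly 0.3, so compare by distance.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)                         # False
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)  # True

# Decimal and Fraction represent these values exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
fraction_sum = Fraction(1, 10) + Fraction(2, 10)

# Plain Python parses "8.1e-202" to the nearest double, and repr()
# returns the shortest string that round-trips to that double.
text = repr(float("8.1e-202"))

print(exactly_equal, close_enough)
print(decimal_sum, fraction_sum, text)
```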
I am trying to read in a 2-dimensional range of values from a ".xlsb" file using xlwings. The range contains a series of formulas that return floats. When I read in the values, they are read in as Decimals rather than floats. The problem is that Decimals get truncated beyond 4 decimal places. For example, I have a value in Excel of 0.0913495, but it gets read in as Decimal('0.0913'). To make matters worse, when I try converting these Decimals to floats, I see that any precision beyond 4 decimal places has been completely ignored. For example, calling float(Decimal('0.0913')) returns 0.0913!
So far I have tried the following to fix this problem, none have worked:
1. Set precision to 28 by calling decimal.getcontext().prec = 28. I have also tried 7, 8, etc. This seems to change nothing.
2. Use the .options method: sheet.range("myrange").options(numbers = lambda x : float(x)).value
3. Tried ".raw_value"
Ironically (2) still returns numbers as decimals, it is as if my options got ignored.
This is a problem because my particular application relies on a higher degree of accuracy than 4 decimal places, yet xlwings refuses to read in the estimated values at any precision beyond 4 decimal places. How do I fix this?
For reference, I am using xlwings 0.23.0 with Python 3.8.8 and Excel version 2108 (Build 14326.20238)
I am trying to manipulate a dataframe. One value in a list that I use to append a column to the dataframe is 161137531201111100. I created a dictionary whose keys are the unique values of this column, and I use this dictionary in further operations. This used to run perfectly.
However, after trying this code on other data I got the following error:
KeyError: 1.611375312011111e+17
which means that this value is not a key of the dictionary. I tried to trace the code and everything seemed to be okay. However, when I opened the csv file of the dataframe I built, I found that the value causing the problem is 161137531201111000, which is not in the list (and of course not a key in the dictionary) I used to create this column. This seems weird, and I don't know the reason. Is there any reason a number would be saved in a different way?
And how can I keep it as it is in all phases? Also, why did it change in the csv?
No, unfortunately, they are not equal:
print(1.611375312011111e+17 == 161137531201111000)  # False
The problem lies in the way floating numbers are handled by computers, in general, and most programming languages, including Python.
If you want exact results, always use integers (and not "too large" ones) for your computations.
See "Is floating point math broken?" for a generic explanation that you definitely must know as a programmer, even if it is not specific to Python.
(Be aware that Python does a rather good job of keeping precision on integers; unfortunately, that does not carry over to floating-point numbers.)
And just for the sake of "fun" with floating point numbers, 1.611375312011111e+17 is actually equal to the integer 161137531201111104!
print(format(1.611375312011111e+17, ".60g"))  # shows 161137531201111104
print(1.611375312011111e+17 == 161137531201111104) # True
a = dict()
a[1.611375312011111e+17] = "hello"
#print(a[161137531201111100]) # Key error, as in question
print(a[161137531201111104]) # This one shows "hello" properly!
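To make the "use integers" advice concrete, here is a minimal sketch using only built-in types. It assumes the value passed through a float somewhere along the way (in pandas, a column usually becomes float64 when it contains missing values or is mixed with floats); the name `lookup` is made up for illustration.

```python
key = 161137531201111100       # the integer from the question

as_float = float(key)          # nearest double; doubles are 32 apart at this size
round_tripped = int(as_float)  # no longer the original integer

print(round_tripped)
print(round_tripped == key)

# Above 2**53 not every integer has an exact double representation.
print(key > 2**53)

# Using the integer itself (or its string form) as the key avoids the problem.
lookup = {key: "hello"}
print(lookup[161137531201111100])
```

Keeping such identifiers as integers or strings from the moment they are read means they never pass through a float and are written back unchanged.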
I'm trying to store the first 1000 Bernoulli numbers in a dictionary in Python. At first I just stored the numbers as they are, and I got an overflow error. After going through previous answers, I thought of using the decimal module.
So here it is:
-5218507479961513801890596392421261361036935624312258325065379143295948300812040703848766095836974598734762472300638625802884257082786883956679824964010841565051175167717451747328911935282639583972372470105587187736495055501208701522099921363239317373617854217050435670713936357978555246779460902210809009009539232173 / 2291190
This is the 260th Bernoulli number. I was able to store all the previous ones in the dictionary.
This is the sample code I've written:
from decimal import *
d = Decimal
getcontext().prec = 10000
di = {260: d(-5218507479961513801890596392421261361036935624312258325065379143295948300812040703848766095836974598734762472300638625802884257082786883956679824964010841565051175167717451747328911935282639583972372470105587187736495055501208701522099921363239317373617854217050435670713936357978555246779460902210809009009539232173 / 2291190)}
This code raises an error.
Is there any better way to handle such huge numbers? Please tell me if there is something that can be done to store these numbers.
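You should convert the large number to Decimal before doing the division, so that the division is carried out in Decimal arithmetic rather than producing a float first, i.e.:
(Note where the closing parenthesis is.)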
di = {260: d(-5218507479961513801890596392421261361036935624312258325065379143295948300812040703848766095836974598734762472300638625802884257082786883956679824964010841565051175167717451747328911935282639583972372470105587187736495055501208701522099921363239317373617854217050435670713936357978555246779460902210809009009539232173) / 2291190}
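The following sketch shows why the order of operations matters, and adds `fractions.Fraction` as an alternative that is not part of the answer above. The value `-(10**400 + 1)` is only a stand-in for the long numerator; it is not a Bernoulli number.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

numerator = -(10**400 + 1)  # stand-in for the long numerator (assumption)
denominator = 2291190

# int / int produces a float, which overflows for a result this large.
try:
    numerator / denominator
    overflowed = False
except OverflowError:
    overflowed = True

# Decimal: convert first, then divide. The quotient is rounded
# to the context precision.
getcontext().prec = 50
as_decimal = Decimal(numerator) / denominator

# Fraction: keeps numerator and denominator as exact integers.
as_fraction = Fraction(numerator, denominator)

print(overflowed)
print(as_decimal)
print(as_fraction * denominator == numerator)
```

Since Bernoulli numbers are rational, storing them as `Fraction` objects keeps them exact, whereas `Decimal` rounds the quotient to whatever precision the context specifies.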
I have a question about plotting. I want to plot some data within the range
3825229325678980.0786812569752124806963380417361932
and
3825229325678980.078681262584097479512892231994772
but I get the following error:
Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=3.82522932568e+15, top=3.82522932568e+15
How can I increase the number of decimal places here to solve the problem?
The difference between your min and max values (about 5.6e-9) is smaller than what a double can resolve at this magnitude. A double has a relative precision (machine epsilon) of roughly 1e-16, so near 3.8e15 adjacent representable values are 0.5 apart.
Basically, even with an 8-byte (64-bit) floating-point representation you cannot distinguish between the two numbers.
I suggest removing the integer digits from your data and representing only the decimal part. The integer part is just a big constant that you can always add back later.
It might be easiest to scale your data to provide a range that looks less like zero.
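A minimal sketch of the offset idea, assuming the limits are available as text so that `decimal.Decimal` can hold all their digits; the variable names are made up for illustration.

```python
from decimal import Decimal

low = Decimal("3825229325678980.0786812569752124806963380417361932")
high = Decimal("3825229325678980.078681262584097479512892231994772")

# As doubles the two limits collapse to the same value.
collapsed = (float(low) == float(high))

# Subtract the common integer part while still in Decimal, then convert.
offset = Decimal(3825229325678980)
low_rel = float(low - offset)
high_rel = float(high - offset)

print(collapsed)
print(low_rel, high_rel)
print(low_rel < high_rel)
```

The relative values can then be used as axis limits (for example with matplotlib's `set_ylim`), with the constant offset mentioned in the axis label. The data must be shifted the same way before it is converted to float.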