How do you round a number to correct uncertainty? - python

In Python, I have a number: U = 0.02462631224438585 +- 3.350971888120506e-06.
How do I round it to the correct number of significant figures, given that the uncertainty should be rounded to 1 s.f.?
Is there an easy way of doing this with numpy or scipy, or are the built-in functions best for this?
I've tried using set_printoptions(precision=3), but this doesn't work.
I've also tried using round(number, significant - len(str(number))), but this seems long-winded.
I'm sure I used a simple function for this a couple of years ago without having to create my own.
The final number should be U = 2.4626e-02 +- 3e-06
or U = (2.4626 +- 3e-4)e-02

The uncertainties module can compute the appropriate number of significant digits:
import uncertainties
a = uncertainties.ufloat(0.02462631224438585, 3.350971888120506e-06)
print(a)
# 0.0246263+/-0.0000034
The default is two significant digits, but there is a format key for controlling the output:
print('{:.1u}, {:.3uf}, {:.2uL}, {:0.2ue}'.format(a,a,a,a))
# 0.024626+/-0.000003, 0.02462631+/-0.00000335, 0.0246263 \pm 0.0000034, (2.46263+/-0.00034)e-02
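If you'd rather avoid a dependency, here is a minimal pure-Python sketch of the same idea (round_to_uncertainty is a hypothetical helper name, not a library function):
import math

def round_to_uncertainty(value, uncertainty):
    # find the decade of the uncertainty, then round it to 1 significant figure
    exponent = math.floor(math.log10(abs(uncertainty)))
    return round(value, -exponent), round(uncertainty, -exponent)

v, u = round_to_uncertainty(0.02462631224438585, 3.350971888120506e-06)
print(v, '+-', u)
# 0.024626 +- 3e-06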

Cannot increase accuracy of calculation (not enough decimal spaces) [duplicate]

I have code that is supposed to return a table of values that solve a particular system of equations using the backward Euler method.
I wrote code for a timestep of 0.025, and it returned the necessary values; however, when I increase the number of calculations, at some point accuracy is lost because there aren't enough decimal places at the end of the values.
It just returns the same numbers over and over.
What can I do to increase accuracy? I have tried working with decimal, but it still returns the same values, some with less accuracy.
This is the code that works:
import numpy
from prettytable import PrettyTable

delta3 = 0.025
nfinal3 = 1 / 0.025
ic3 = numpy.array([[1], [1]])
be3 = PrettyTable(['t', 'x', 'y'])
for l in range(0, int(nfinal3)):
    x3 = ic3[0]
    y3 = ic3[1]
    firstline3 = -199 * (x3 + delta3) - 198 * (y3 + delta3)
    secondline3 = 99 * (x3 + delta3) + 98 * (y3 + delta3)
    systems3 = numpy.array([firstline3,
                            secondline3])
    step3 = delta3 * systems3
    result3 = numpy.array([ic3[0] + step3[0], ic3[1] + step3[1]])
    ic3[0] = result3[0]
    ic3[1] = result3[1]
    be3.add_row([l + 1, result3[0], result3[1]])
print(be3[0])
And this is the code that gives inaccurate numbers:
t4 = 0.01
n4 = 1 / t4
ic4 = numpy.array([[1], [1]])
be4 = PrettyTable(['t', 'x', 'y'])
for q in range(0, int(n4)):
    x4 = ic4[0]
    y4 = ic4[1]
    firstline4 = t4 * (-199 * (x4 + t4) - 198 * (y4 + t4))
    secondline4 = t4 * (99 * (x4 + t4) + 98 * (y4 + t4))
    result4 = numpy.array([ic4[0] + firstline4, ic4[1] + secondline4])
    ic4[0] = result4[0]
    ic4[1] = result4[1]
    be4.add_row([q + 1, result4[0], result4[1]])
print(be4)
I'm relatively new to Python, so I might not understand higher-end concepts, but I would appreciate if anyone could point out what I'm doing wrong or what is a good module or function to use for this.
You could make your numpy arrays store 128-bit floating-point values like so: result4 = np.array([[ic4[0]+firstline4], [ic4[1]+secondline4]], dtype=np.float128). You will have to do this for all the numpy arrays used in the calculation. You can also make your scalars 0-dimensional numpy arrays with 128-bit precision, like so: t4 = np.array(0.01, dtype=np.float128). Then you'll need to rewrite your operations using only numpy arrays.
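A minimal sketch of that idea applied to one update step. One caveat to note: np.float128 is only exposed on platforms where numpy has an extended-precision long double, and on x86 it is typically 80-bit extended precision, not true quad:
import numpy as np

t4 = np.float128('0.01')                       # pass a string so 0.01 isn't first parsed as a 64-bit float
ic4 = np.array([[1], [1]], dtype=np.float128)
firstline4 = t4 * (-199 * (ic4[0] + t4) - 198 * (ic4[1] + t4))
secondline4 = t4 * (99 * (ic4[0] + t4) + 98 * (ic4[1] + t4))
result4 = ic4 + np.array([firstline4, secondline4])  # stays float128 throughout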
Another alternative is to use the decimal library from https://docs.python.org/2/library/decimal.html
Welcome to Stack Overflow!
For higher accuracy, you could use the decimal module in Python. Simply wrap your numbers with the Decimal constructor (e.g. Decimal('0.025'), passing a string so the value is exact) instead of float. The documentation also describes the arithmetic operations Decimal supports; use those rather than mixing in ordinary floats.
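A minimal sketch of that approach on one step of the question's update (variable names mirror the question's code; 50 digits of precision is an arbitrary choice):
from decimal import Decimal, getcontext

getcontext().prec = 50                 # number of significant digits to carry

delta = Decimal('0.025')               # a string, not a float, so 0.025 is exact
x = Decimal(1)
y = Decimal(1)
firstline = -199 * (x + delta) - 198 * (y + delta)
step = delta * firstline
print(step)                            # every digit here comes from exact decimal arithmetic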

Which stage to round at?

In Python, at what stage should round be used? Take this example: 10 * math.log(x) + 5. If I want this to be rounded, which should I use?
round(10 * math.log(x) + 5)
round(10 * math.log(x)) + 5
10 * round(math.log(x)) + 5
My guess would be that rounding early would run the fastest because more arithmetic happens with integers, which seem like they should be faster than floats. Rounding seems less likely to break if some later values change.
Would the answer be the same with int()?
Don't prematurely optimize. In many cases, it's not highly optimized mathematical functions which slow down programs, but the logic, structure or data types used in the calculation.
To that end, I recommend you use cProfile to identify bottlenecks. Note that cProfile itself has an overhead, so it is mostly useful for relative comparisons.
As per @glibdud's comment, you have to understand how rounding will affect your calculation. Try a few examples, or perform a test to see how your error may vary across a large number of inputs.
The earlier you are rounding, the more your result will be affected by this rounding. In my opinion, it all depends on the expectations of your program.
As for the difference between int() and round(), this thread answers it perfectly.
To be more specific about your performance question: the round() function is a Python built-in implemented in C, so you shouldn't really worry about its speed; the difference will be very, very negligible.
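A minimal sketch of that kind of profiling (compute is just a hypothetical stand-in for your calculation):
import cProfile
import math

def compute():
    return [round(10 * math.log(x)) + 5 for x in range(1, 100000)]

cProfile.run('compute()')   # prints call counts plus per-call and cumulative times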
That entirely depends upon how you want your answer to be formatted and interpreted. I would not get hung up on the speed of the round function, though (unless the very minor performance gain is crucial to your program). Think about what you're trying to accomplish by rounding. If your goal is to produce an output rounded to the nearest integer (for simplicity), then wrap your entire arithmetic statement in the round function. If your goal is to use only rounded integers in your log calculations (maybe because you don't want to use floats), then round only the math.log(x) call. There is no technical reason to prefer one over the other, but there is a logical reason to choose one, depending on what you want.
Please note that Python's math.log() uses base e by default. From your question it's unclear which base you expect, so I'll assume base 10, like Google's calculator does. To make the code equivalent to the mathematical function provided, it would need to be:
import math

# assuming x equals 2
x = 2
function1 = round(10 * math.log(x, 10) + 5)
function2 = round(10 * math.log(x, 10)) + 5
function3 = 10 * round(math.log(x, 10)) + 5
function4 = 10 * math.log(x, 10) + 5
print(function1)
print(function2)
print(function3)
print(function4)
Now, assuming x = 2, the mathematical expression evaluates to 8.01029995664.
Looking at the printed output from the above code:
8
8
5
8.010299956639813
It clearly shows that functions 1, 2 and 4 are roughly mathematically equivalent, with function 3 being incorrect. This is because round() rounds to the nearest integer: math.log(2, 10) is about 0.301, so rounding drops it to zero, leaving 10 * 0 + 5 = 5.
As for the equivalence of int() and round(), the link referenced by IMCoins is pretty good. In summary, int() truncates the decimal part of a number, while round() rounds to the nearest integer (Python 3 rounds exact halves to the nearest even integer), so the two agree for positive values whose fractional part is below 0.5.
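For example, in Python 3:
>>> int(2.7), round(2.7)
(2, 3)
>>> int(-2.7), round(-2.7)
(-2, -3)
>>> round(0.5), round(1.5), round(2.5)   # exact halves go to the nearest even integer
(0, 2, 2)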
As for the speed question, if accuracy is non-negotiable it is best to round upon completion of the calculation, for the same reasons function 3 was wrong above. If you're fairly certain you can round safely at an earlier step, then I agree with the answer above: use cProfile and find the bottlenecks.
Hope this helps.
I have no clue, but let's see :)
import time
import math

n = 1000000
x = 5

def timeit(f):
    t_0 = time.perf_counter()
    for _ in range(n):
        f()
    t_1 = time.perf_counter()
    print((t_1 - t_0) / n)

def fun1():
    round(10 * math.log(x) + 5)

def fun2():
    round(10 * math.log(x)) + 5

def fun3():
    10 * round(math.log(x)) + 5

[timeit(_) for _ in [fun1, fun2, fun3]]
On my computer the last one is slightly faster than the others.

MATLAB matrix power algorithm

I'm looking to port an algorithm from MATLAB to Python. One step in said algorithm involves taking A^(-1/2), where A is a 9x9 square complex matrix. As I understand it, square roots of matrices (and by extension their inverses) are not unique.
I've been experimenting with scipy.linalg.fractional_matrix_power and an approximation using A^(-1/2) = exp((-1/2)*log(A)) with scipy's expm and logm functions. The former is exceptionally poor and only provides 3 decimal places of precision, whereas the latter is decently correct for elements in the top left corner but gets progressively worse as you move down and to the right. This may or may not be a perfectly valid mathematical solution to the expression; however, it doesn't suffice for this application.
As a result, I'm looking to directly implement MATLAB's matrix power algorithm in Python so that I can 100% confirm the same result each time. Does anyone have any insight or documentation on how this would work? The more parallelizable this algorithm is, the better, as eventually the goal would be to rewrite it in OpenCL for GPU acceleration.
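For reference, a minimal sketch of the two approaches described above (the last line is just a sanity check, since B = A^(-1/2) implies B @ B @ A = I):
import numpy as np
from scipy.linalg import fractional_matrix_power, expm, logm

A = np.array(M)                                  # M: the 9x9 nested list from the MCVE below
B1 = fractional_matrix_power(A, -0.5)            # direct fractional power
B2 = expm(-0.5 * logm(A))                        # exp/log approximation
print(np.max(np.abs(B1 @ B1 @ A - np.eye(9))))   # should be close to 0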
EDIT: An MCVE as requested:
[[(0.591557294607941+4.33680868994202e-19j), (-0.219707725574605-0.35810724986609j), (-0.121305654177909+0.244558388829046j), (0.155552026648172-0.0180264818714123j), (-0.0537690384136066-0.0630740244116577j), (-0.0107526931263697+0.0397896274845627j), (0.0182892503609312-0.00653264433724856j), (-0.00710188853532244-0.0050445035279044j), (-2.20414002823034e-05+0.00373184532662288j)], [(-0.219707725574605+0.35810724986609j), (0.312038814492119+2.16840434497101e-19j), (-0.109433401402399-0.174379997015402j), (-0.0503362231078033+0.108510948023091j), (0.0631826956936223-0.00992931123813742j), (-0.0219902325360141-0.0233215237172002j), (-0.00314837555001163+0.0148621558916679j), (0.00630295247506065-0.00266790359447072j), (-0.00249343102520442-0.00156160619280611j)], [(-0.121305654177909-0.244558388829046j), (-0.109433401402399+0.174379997015402j), (0.136649392858215-1.76182853028894e-19j), (-0.0434623984527311-0.0669251299161109j), (-0.0168737559719828+0.0393768358149159j), (0.0211288536117387-0.00417146769324491j), (-0.00734306979471257-0.00712443264825166j), (-0.000742681625102133+0.00455752452374196j), (0.00179068247786595-0.000862706240042082j)], [(0.155552026648172+0.0180264818714123j), (-0.0503362231078033-0.108510948023091j), (-0.0434623984527311+0.0669251299161109j), (0.0467980890488569+5.14996031930615e-19j), (-0.0140208255975664-0.0209483313237692j), (-0.00472995448413803+0.0117916398375124j), (0.00589653974090387-0.00134198920550751j), (-0.00202109265416585-0.00184021636458858j), (-0.000150793859056431+0.00116822322464066j)], [(-0.0537690384136066+0.0630740244116577j), (0.0631826956936223+0.00992931123813742j), (-0.0168737559719828-0.0393768358149159j), (-0.0140208255975664+0.0209483313237692j), (0.0136137125669776-2.03287907341032e-20j), (-0.00387854073283377-0.0056769786724813j), (-0.0011741038702424+0.00306007798625676j), (0.00144000687517355-0.000355251914809693j), (-0.000481433965262789-0.00042129815655098j)], [(-0.0107526931263697-0.0397896274845627j), (-0.0219902325360141+0.0233215237172002j), (0.0211288536117387+0.00417146769324491j), (-0.00472995448413803-0.0117916398375124j), (-0.00387854073283377+0.0056769786724813j), (0.00347771689075251+8.21621958836671e-20j), (-0.000944046302699304-0.00136521328407881j), (-0.00026318475762475+0.000704212317211994j), (0.00031422288569727-8.10033316327328e-05j)], [(0.0182892503609312+0.00653264433724856j), (-0.00314837555001163-0.0148621558916679j), (-0.00734306979471257+0.00712443264825166j), (0.00589653974090387+0.00134198920550751j), (-0.0011741038702424-0.00306007798625676j), (-0.000944046302699304+0.00136521328407881j), (0.000792908166233942-7.41153828847513e-21j), (-0.00020531962049495-0.000294952695922854j), (-5.36226164765808e-05+0.000145645628243286j)], [(-0.00710188853532244+0.00504450352790439j), (0.00630295247506065+0.00266790359447072j), (-0.000742681625102133-0.00455752452374196j), (-0.00202109265416585+0.00184021636458858j), (0.00144000687517355+0.000355251914809693j), (-0.00026318475762475-0.000704212317211994j), (-0.00020531962049495+0.000294952695922854j), (0.000162971629601464-5.39321759384574e-22j), (-4.03304806590714e-05-5.77159110863666e-05j)], [(-2.20414002823034e-05-0.00373184532662288j), (-0.00249343102520442+0.00156160619280611j), (0.00179068247786595+0.000862706240042082j), (-0.000150793859056431-0.00116822322464066j), (-0.000481433965262789+0.00042129815655098j), (0.00031422288569727+8.10033316327328e-05j), (-5.36226164765808e-05-0.000145645628243286j), (-4.03304806590714e-05+5.77159110863666e-05j), 
(3.04302590501313e-05-4.10281583826302e-22j)]]
I can think of two explanations, in both cases I accuse user error. In chronological order:
Theory #1 (the subtle one)
My suspicion is that you're copying the printed values of the input matrix from one code as input into the other. I.e. you're throwing away double precision when you switch codes, which gets amplified during the inverse-square-root calculation.
As proof, I compared MATLAB's inverse square root with the very function you're using in python. I will show a 3x3 example due to size considerations, but—spoiler warning—I did the same with a 9x9 random matrix and got two results with condition number 11.245754109790719 (MATLAB) and 11.245754109790818 (numpy). That should tell you something about the similarity of the results without having to save and load the actual matrices between the two codes. I suggest you do this though: keywords are scipy.io.loadmat and savemat.
What I did was generate the random data in python (because that's what I prefer):
>>> import numpy as np
>>> print((np.random.rand(3,3) + 1j*np.random.rand(3,3)).tolist())
[[(0.8404782758300281+0.29389006737780765j), (0.741574080512219+0.7944606900644321j), (0.12788250870304718+0.37304665786925073j)], [(0.8583402784463595+0.13952117266781894j), (0.2138809231406249+0.6233427148017449j), (0.7276466404131303+0.6480559739625379j)], [(0.1784816129006297+0.72452362541158j), (0.2870462766764591+0.8891190037142521j), (0.0980355896905617+0.03022344706473823j)]]
By copying the same truncated output into both codes, I guarantee the correspondence of the inputs.
Example in MATLAB:
>> M = [[(0.8404782758300281+0.29389006737780765j), (0.741574080512219+0.7944606900644321j), (0.12788250870304718+0.37304665786925073j)]; [(0.8583402784463595+0.13952117266781894j), (0.2138809231406249+0.6233427148017449j), (0.7276466404131303+0.6480559739625379j)]; [(0.1784816129006297+0.72452362541158j), (0.2870462766764591+0.8891190037142521j), (0.0980355896905617+0.03022344706473823j)]];
>> A = M^(-0.5);
>> format long
>> disp(A)
0.922112307438377 + 0.919346397931976i 0.108620882045523 - 0.649850434897895i -0.778737740194425 - 0.320654127149988i
-0.423384022626231 - 0.842737730824859i 0.592015668030645 + 0.661682656423866i 0.529361991464903 - 0.388343838121371i
-0.550789874427422 + 0.021129515921025i 0.472026152514446 - 0.502143106675176i 0.942976466768961 + 0.141839849623673i
>> cond(A)
ans =
3.429368520364765
Example in python:
>>> M = [[(0.8404782758300281+0.29389006737780765j), (0.741574080512219+0.7944606900644321j), (0.12788250870304718+0.37304665786925073j)], [(0.8583402784463595+0.13952117266781894j), (0.2138809231406249+0.6233427148017449j), (0.7276466404
... 131303+0.6480559739625379j)], [(0.1784816129006297+0.72452362541158j), (0.2870462766764591+0.8891190037142521j), (0.0980355896905617+0.03022344706473823j)]]
>>> A = fractional_matrix_power(M,-0.5)
>>> print(A)
[[ 0.92211231+0.9193464j 0.10862088-0.64985043j -0.77873774-0.32065413j]
[-0.42338402-0.84273773j 0.59201567+0.66168266j 0.52936199-0.38834384j]
[-0.55078987+0.02112952j 0.47202615-0.50214311j 0.94297647+0.14183985j]]
>>> np.linalg.cond(A)
3.4293685203647408
My suspicion is that if you scipy.io.loadmat the matrix into python, do the calculation, scipy.io.savemat the result and load it back in with MATLAB, you'll see less than 1e-12 absolute error (hopefully even less) between the results.
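A minimal sketch of that round trip on the Python side (file names are arbitrary):
from scipy.io import loadmat, savemat
from scipy.linalg import fractional_matrix_power

M = loadmat('input.mat')['M']        # matrix saved from MATLAB at full precision
A = fractional_matrix_power(M, -0.5)
savemat('output.mat', {'A': A})      # load this back into MATLAB and compare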
Theory #2 (the facepalm one)
My suspicion is that you're using python 2, where -1/2 is computed with integer division and equals -1, so your fractional power is a simple inverse:
>>> # python 3 below
>>> # python 3's // is python 2's /, i.e. integer division
>>> 1/2
0.5
>>> 1//2
0
>>> -1/2
-0.5
>>> -1//2
-1
So if you're using python 2, then calling
fractional_matrix_power(M,-1/2)
is actually the inverse of M. The obvious solution is to switch to python 3. The less obvious solution is to keep using python 2 (which you shouldn't, as the above exemplifies), but use
from __future__ import division
at the top of every one of your source files. This will override the behaviour of the plain / division operator so that it matches the python 3 version, and you will have one less headache.
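For example, in python 2:
>>> from __future__ import division
>>> -1/2
-0.5
>>> -1//2   # the old floor-division behaviour remains available via //
-1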

Numerical precision in python

I've been doing simple numerical experiments with python, like computing
factorials. For instance, compute the factorial of 32:
My routine:
2.6313083693369503e+35
From scipy.misc:
2.6313083693369355e+35
I want to point out that my routine calculates the logarithm of the factorial:
it computes the summation of logarithms from 1 to 32 (in this case)
and then takes exp of the result (I do it this way because of habits carried over
from Fortran 90).
It is a surprise that the correct answer is
263130836933693530167218012160000000
according to pari/gp.
I would be very happy if someone could point me to references where I can look for
correct numerical answers in Python. The documentation is fine, but only if one wants
"short" numbers.
log and exp functions operate on floating-point numbers, which have limited precision. Python's integers, on the other hand, can have arbitrary precision. So you can compute the factorial of 32 exactly, with a simple linear loop, using integers.
f = 1
for i in range(32):
    f *= i + 1
print(f)  # prints 263130836933693530167218012160000000
You can do it this way:
import operator
from functools import reduce

n = 32
print(reduce(operator.mul, range(1, n + 1)))
# 263130836933693530167218012160000000
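For comparison, the standard library computes the same exact integer directly:
import math
print(math.factorial(32))
# 263130836933693530167218012160000000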

python value becomes zero, how to prevent

I have a numerical problem while doing likelihood ratio tests in Python. I won't go into too much detail about what the statistics mean; my problem comes down to calculating this:
LR = LR_H0 / LR_H1
where LR is the number of interest and LR_H0 and LR_H1 are numbers that can be VERY close to zero. This leads to a few numerical issues; if LR_H1 is too small then Python will treat this as a division by zero:
ZeroDivisionError: float division by zero
Also, although this is not the main issue, if LR_H1 is small enough to allow the division, the fraction LR_H0 / LR_H1 might become too big (I'm assuming that Python also has an upper limit on what a float can be).
Any tips on what the best way is to circumvent this problem? I'm considering doing something like:
def small_enough(num):
    if num == 0.0:
        return other_small_number  # placeholder for some small value
    else:
        return num
But this is not ideal because it would approximate the LR value and I would like to guarantee some precision.
Work with logarithms. Take the log of all your likelihoods, and add or subtract logarithms instead of multiplying and dividing. You'll be able to work with much greater ranges of values without losing precision.
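A minimal sketch of the idea, with made-up per-observation likelihoods (each factor is representable on its own, but their product would underflow to 0.0):
import math

p_H0 = [1e-160, 2e-162, 5e-159]              # hypothetical likelihood factors
p_H1 = [3e-161, 4e-160, 1e-158]

log_L0 = sum(math.log(p) for p in p_H0)      # log of the product, no underflow
log_L1 = sum(math.log(p) for p in p_H1)

log_LR = log_L0 - log_L1                     # log(LR_H0 / LR_H1)
print(log_LR)                                # compare this to log(threshold)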
