Python Numbers mysteriously being rounded on comparison

I was having a problem with NumPy arrays when I stumbled across this, and it confused me.
I'm trying to compare two parts of arrays using array_equal:
np.array_equal(updated_image_values[j][k],np.array(initial_means[i]))
This is returning False when the numbers are
[ 0.90980393 0.8392157 0.65098041]
[ 0.90980393 0.8392157 0.65098041]
Above is my print of the two arrays.
However, when I print the individual elements, one seems to be rounded off for no reason:
print updated_image_values[j][k][0] #0.909804
print initial_means[i][0] #0.90980393
Then, obviously, when these individual elements are compared, the result is False:
print updated_image_values[j][k][0]==initial_means[i][0] #False
Can anyone explain why Python appears to be doing the comparison wrong and rounding the numbers for no apparent reason?

I assume that updated_image_values has had some operations done on it. And what classes are the numbers?
My guess is that what you're seeing isn't "rounding"; it has to do with the __str__ or __repr__ functions of the classes. The fact that you see 0.90980393 when you print the array means that the element is not really rounded to 0.909804. Try "{0:.10f}".format(updated_image_values[j][k][0]) to see more digits.
As for the comparison, array_equal requires exact equality, so floating-point operations (or a dtype difference such as float32 vs float64) can change the values just enough to make it return False. Try np.allclose (or the element-wise np.isclose) instead.
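A minimal sketch of both suggestions, assuming the mismatch comes from the two arrays having different dtypes (e.g. float32 vs float64), which is a common way this happens:
import numpy as np
a = np.array([0.90980393, 0.8392157, 0.65098041], dtype=np.float32)
b = np.array([0.90980393, 0.8392157, 0.65098041], dtype=np.float64)
print(np.array_equal(a, b))          # False: exact comparison after promotion to float64
print(np.allclose(a, b, atol=1e-7))  # True: equal within a tolerance
print("{0:.10f}".format(a[0]))       # 0.9098039269..., more digits than the default print shows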

Related

numpy.linalg.det returns very small numbers instead of 0

I calculated the determinant of a matrix using np.linalg.det(matrix), but it returns weird values. For example, it gives 1.1012323e-16 instead of 0.
Of course, I can round the result with numpy.around, but is there any option to set some "default" rounding for results of all numpy methods, including numpy.linalg.det?
The value of the determinant looking "weird" is due to floating-point arithmetic; you can look up how floating-point rounding works.
Regarding your question, I believe numpy.set_printoptions is what you are looking for; note that it only changes how values are displayed, not the values that are returned. Please see the docs.
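A short sketch of the distinction between display settings and the values themselves (the matrix here is just an arbitrary singular example):
import numpy as np
matrix = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])  # singular, so the true determinant is 0
d = np.linalg.det(matrix)
print(d)                               # a tiny value on the order of 1e-16, not exactly 0
np.set_printoptions(suppress=True)     # only affects how arrays are printed, not returned values
print(np.around(d, decimals=10) == 0)  # True: round the returned scalar yourself
print(np.isclose(d, 0.0, atol=1e-12))  # True: or compare against a tolerance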

Can I discard the complex portion of results generated with scipy.linalg.logm?

I have a matrix that looks like the one below. It is always a square matrix (up to 1000 x 1000) with values between 0 and 1:
data = np.array([[0.0308, 0.07919, 0.05694, 0.00662, 0.00927],
[0.07919, 0.00757, 0.00720, 0.00526, 0.00709],
[0.05694, 0.00720, 0.00518, 0.00707, 0.00413],
[0.00662, 0.00526, 0.00707, 0.01612, 0.00359],
[0.00927, 0.00709, 0.00413, 0.00359, 0.00870]])
When I try to take the natural log of this matrix, using scipy.linalg.logm, it gives me the following result.
print(logm(data))
>> [[-2.3492917 +1.42962407j 0.15360003-1.26717846j 0.15382223-0.91631624j 0.15673496+0.0443927j 0.20636448-0.01113953j]
[ 0.15360003-1.26717846j -3.75764578+2.16378501j 1.92614937-0.60836013j -0.13584605+0.27652444j 0.27819383-0.25190565j]
[ 0.15382223-0.91631624j 1.92614937-0.60836013j -5.08018989+2.52657239j 0.37036433-0.45966441j -0.03892575+0.36450564j]
[ 0.15673496+0.0443927j -0.13584605+0.27652444j 0.37036433-0.45966441j -4.22733838+0.09726189j 0.26291385-0.07980921j]
[ 0.20636448-0.01113953j 0.27819383-0.25190565j -0.03892575+0.36450564j 0.26291385-0.07980921j -4.91972246+0.06594195j]]
First of all, why is this happening? Based on another post I found here, pertaining to a different scipy.linalg method, this is due to truncation and rounding issues caused by floating point errors.
If that is correct, then how am I able to fix it? The second answer on that same linked post suggested this:
(2) All imaginary parts returned by numpy's linalg.eig are close to the machine precision. Thus you should consider them zero.
Is this correct? I can use numpy.real(data) to simply discard the complex portion of the values, but I don't know if that is a mathematically (or scientifically) robust thing to do.
Additionally, I attempted to use TensorFlow's linalg.logm method but got the exact same complex results, which suggests this isn't unexpected behavior?
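One way to test whether the imaginary parts are genuinely at machine-precision level (and therefore safe to drop) is numpy.real_if_close; a sketch reusing data and logm from above:
import numpy as np
from scipy.linalg import logm
log_data = logm(data)
# real_if_close drops the imaginary part only when it is within a small
# multiple of machine epsilon of zero; otherwise the result stays complex.
checked = np.real_if_close(log_data, tol=1000)
print(np.iscomplexobj(checked))       # True here: the imaginary parts are not negligible
print(np.max(np.abs(log_data.imag)))  # on the order of 1, far above machine precision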

Are the values 161137531201111100, 1.611375312011111e+17 equal?

I am trying to manipulate a dataframe. One of the values in a list which I use to append a column to the dataframe is 161137531201111100. I then created a dictionary whose keys are the unique values of this column, and I use this dictionary in further operations. This code used to run perfectly before.
However, after trying this code on another data I had the following error:
KeyError: 1.611375312011111e+17
which means that this value is not a key of the dictionary. I tried to trace the code and everything seemed to be okay. However, when I opened the CSV file of the dataframe I built, I found that the value causing the problem is 161137531201111000, which is not in the list (and of course not a key in the dictionary) I used to create this column of the dataframe. This seems weird, and I don't know the reason. Is there any reason a number would be saved in another way?
How can I keep the value as it is in all phases? Also, why did it change in the CSV?
No, unfortunately, they are not equal:
print(1.611375312011111e+17 == 161137531201111000)  # False
The problem lies in the way floating-point numbers are handled by computers in general, and by most programming languages, including Python.
Always use integers (and not "too large" ones) when doing computations if you want exact results.
See Is floating point math broken? for a generic explanation that you definitely must know as a programmer, even if it's not specific to Python.
(And be aware that Python does a rather good job of keeping precision on integers; that unfortunately won't work on floating-point numbers.)
And just for the sake of "fun" with floating point numbers, 1.611375312011111e+17 is actually equal to the integer 161137531201111104!
print(format(1.611375312011111e+17, ".60g"))  # shows 161137531201111104
print(1.611375312011111e+17 == 161137531201111104) # True
a = dict()
a[1.611375312011111e+17] = "hello"
#print(a[161137531201111100]) # Key error, as in question
print(a[161137531201111104]) # This one shows "hello" properly!
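If the goal is to keep the value as an exact integer through the CSV round trip, one option is to stop the column from ever becoming float64; a sketch with a placeholder column name:
import pandas as pd
df = pd.DataFrame({"id": [161137531201111100]})
df.to_csv("data.csv", index=False)
# Forcing int64 (or reading as str) keeps the exact value; if the column is
# ever converted to float64 (e.g. because of missing values or mixed types),
# 161137531201111100 rounds to 1.611375312011111e+17.
df2 = pd.read_csv("data.csv", dtype={"id": "int64"})
print(df2["id"].iloc[0])  # 161137531201111100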

Safety of taking `int(numpy.sqrt(N))`

Let's say I'm considering M=N**2 where N is an integer. It appears that numpy.sqrt(M) returns a float (actually numpy.float64).
I could imagine that there could be a case where it returns, say, N-10**(-16) due to numerical precision issues, in which case int(numpy.sqrt(M)) would be N-1.
Nevertheless, my tests have N==numpy.sqrt(M) returning True, so it looks like this approximation isn't happening.
Is it safe for me to assume that int(numpy.sqrt(M)) is indeed accurate when M is a perfect square? If so, for bonus, what's going on in the background that makes it work?
To avoid missing the integer by 1e-15, you could use:
int(numpy.sqrt(M)+0.5)
or
int(round(numpy.sqrt(M)))
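Also note that if M can exceed 2**53, a float64 cannot even represent it exactly, so any float-based square root may be off by more than rounding can fix; math.isqrt (Python 3.8+) computes an exact integer square root:
import math
N = 10**20 + 12345         # far beyond what float64 can represent exactly
M = N ** 2
print(math.isqrt(M) == N)  # True: exact for arbitrarily large integers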

vector magnitude for large components

I noticed that NumPy has a built-in function linalg.norm(vector), which produces the magnitude. For small values I get the desired output:
>>> import numpy as np
>>> np.linalg.norm([0,2])
2.0
However, for large values:
>>> np.linalg.norm([0,149600000000])
2063840737.6330884
This is a huge error. Making my own function seems to produce the same result. What is the problem here? Can a rounding error really be this big, and what can I do instead?
Your number is written as an integer, yet it is too big to fit into a numpy.int32. This problem seems to happen even in Python 3, where the native integers are arbitrary-precision.
In numerical work I try to make everything floating point unless it is an index. So I tried:
In [3]: np.linalg.norm([0.0,149600000000.0])
Out[3]: 149600000000.0
To elaborate: in this case, adding the .0 was an easy way of turning the integers into doubles. In more realistic code, you might have incoming data of uncertain type. The safest (but not always the right) thing to do is to coerce to a floating-point array at the top of your function.
def do_something_with_array(arr):
    arr = np.double(arr)  # or np.float32 if you prefer
    # ... do something with arr ...
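To see the fix in isolation (assuming the bad result came from the input defaulting to a 32-bit integer dtype, as on some NumPy builds):
import numpy as np
# Coercing to float64 up front sidesteps any fixed-width integer overflow.
v = np.asarray([0, 149600000000], dtype=np.float64)
print(np.linalg.norm(v))  # 149600000000.0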
