vector magnitude for large components - python

I noticed that numpy has a built-in function, linalg.norm(vector), which produces the magnitude. For small values I get the desired output:
>>> import numpy as np
>>> np.linalg.norm([0,2])
2.0
However, for large values:
>>> np.linalg.norm([0,149600000000])
2063840737.6330884
This is a huge error. Writing my own function produces the same result. What is the problem here? Can a rounding error really be this big, and what can I do instead?

Your number is written as an integer. It is too big to fit into a numpy.int32, and even in a numpy.int64 array its square overflows during the norm computation, wrapping around silently; that is where the nonsense value comes from. This happens even in Python 3, where the native integers are arbitrary precision, because numpy converts the input list to a fixed-width integer dtype.
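To see where the specific bad value comes from, the wraparound can be reproduced directly (a sketch; whether numpy also emits an overflow warning depends on the version):
import numpy as np

x = np.array([0, 149600000000])      # fixed-width integer dtype (int64 here)
print(x.dtype)

# Squaring wraps modulo 2**64 instead of raising an error...
print((x * x)[1])                    # not 149600000000**2
# ...and the square root of the wrapped value is the bad magnitude.
print(np.sqrt(float((x * x)[1])))    # ~2063840737.6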
In numerical work I try to make everything floating point unless it is an index. So I tried:
In [3]: np.linalg.norm([0.0,149600000000.0])
Out[3]: 149600000000.0
To elaborate: in this case, adding the .0 was an easy way of turning the integers into doubles. In more realistic code, you might have incoming data of uncertain type. The safest (but not always the right) thing to do is to coerce to a floating-point array at the top of your function:
def do_something_with_array(arr):
    arr = np.asarray(arr, dtype=np.double)  # or np.float32 if you prefer
    # ... do something ...
    return arr
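For instance, a minimal sketch of that pattern applied to the norm problem (safe_norm is just an illustrative name, not part of numpy):
import numpy as np

def safe_norm(vec):
    # Coerce to float64 first so integer overflow cannot occur.
    vec = np.asarray(vec, dtype=np.float64)
    return np.linalg.norm(vec)

print(safe_norm([0, 149600000000]))  # 149600000000.0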

Related

Can I discard the complex portion of results generated with scipy.linalg.logm?

I have a matrix that looks like the one below. It is always a square matrix (up to 1000 x 1000) with values between 0 and 1:
data = np.array([[0.0308, 0.07919, 0.05694, 0.00662, 0.00927],
                 [0.07919, 0.00757, 0.00720, 0.00526, 0.00709],
                 [0.05694, 0.00720, 0.00518, 0.00707, 0.00413],
                 [0.00662, 0.00526, 0.00707, 0.01612, 0.00359],
                 [0.00927, 0.00709, 0.00413, 0.00359, 0.00870]])
When I try to take the natural log of this matrix, using scipy.linalg.logm, it gives me the following result.
print(logm(data))
>> [[-2.3492917 +1.42962407j 0.15360003-1.26717846j 0.15382223-0.91631624j 0.15673496+0.0443927j 0.20636448-0.01113953j]
[ 0.15360003-1.26717846j -3.75764578+2.16378501j 1.92614937-0.60836013j -0.13584605+0.27652444j 0.27819383-0.25190565j]
[ 0.15382223-0.91631624j 1.92614937-0.60836013j -5.08018989+2.52657239j 0.37036433-0.45966441j -0.03892575+0.36450564j]
[ 0.15673496+0.0443927j -0.13584605+0.27652444j 0.37036433-0.45966441j -4.22733838+0.09726189j 0.26291385-0.07980921j]
[ 0.20636448-0.01113953j 0.27819383-0.25190565j -0.03892575+0.36450564j 0.26291385-0.07980921j -4.91972246+0.06594195j]]
First of all, why is this happening? Based on another post I found here, pertaining to a different scipy.linalg method, this is due to truncation and rounding issues caused by floating point errors.
If that is correct, then how am I able to fix it? The second answer on that same linked post suggested this:
(2) All imaginary parts returned by numpy's linalg.eig are close to the machine precision. Thus you should consider them zero.
Is this correct? I can use numpy.real(data) to simply discard the complex portion of the values, but I don't know if that is a mathematically (or scientifically) robust thing to do.
Additionally, I attempted to use tensorflow's linalg.logm method and got exactly the same complex results, which suggests this isn't unexpected behavior?
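One way to test the quoted claim before discarding anything (an illustrative sketch, not from the linked post) is to measure how large the imaginary parts actually are:
import numpy as np
from scipy.linalg import logm

log_data = logm(data)  # 'data' is the matrix defined above

# Only drop the imaginary part if it is at machine-precision scale.
if np.allclose(log_data.imag, 0, atol=1e-12):
    log_data = log_data.real
else:
    # Here the imaginary parts are of order 1 (see the output above),
    # so np.real() would discard genuine information.
    print("max |imag| =", np.abs(log_data.imag).max())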

Float precision differs between elements in pandas dataframe

I am trying to read a dataframe from a csv, do some calculations with it, and then export the results to another csv. While doing that I noticed that the value 8.1e-202 is getting changed to 8.1000000000000005e-202, but all the other numbers are represented correctly.
Example:
An example.csv looks like this:
id,e-value
ID1,1e-20
ID2,8.1e-202
ID3,9.24e-203
If I do:
import pandas as pd
df = pd.read_csv("example.csv")
df.iloc[1]["e-value"]
>>> 8.1000000000000005e-202
df.iloc[2]["e-value"]
>>> 9.24e-203
Why is 8.1e-202 being altered but 9.24e-203 isn't?
I tried to change the datatype that pandas is using from the default
df["e-value"].dtype
>>> dtype('float64')
to numpy datatypes like this:
import numpy as np
df = pd.read_csv("./temp/test", dtype={"e-value" : np.longdouble})
but this will just result in:
df.iloc[1]["e-value"]
>>> 8.100000000000000522e-202
Can someone explain to me why this is happening? I can't replicate this problem with any other number. Everything bigger or smaller than 8.1e-202 seems to work normally.
EDIT:
To clarify my problem: I am aware that floats are not perfect. My actual problem is that once I write the dataframe back to a csv, the resulting file looks like this:
id,e-value
ID1,1e-20
ID2,8.1000000000000005e-202
ID3,9.24e-203
And I need the second row to be ID2,8.1e-202
I "fixed" this by just formatting this column before I write the csv, but I'm unhappy with this solution since the formatting will change other elements to something scientific notation where it was just a normal float.
def format_eval(e):
    return "{0:.1e}".format(e)

df["e-value"] = df["e-value"].apply(format_eval)
Floating-point representation is not simple. Not every real number can be represented exactly, and almost all of them (relatively speaking) are approximations. It is not like integers: precision varies across the range, and Python's float has fixed, finite precision (an IEEE 754 double), not arbitrary precision.
Each floating-point format has its own set of real numbers that it can represent exactly, and 8.1e-202 is not one of them. There is no workaround at the representation level.
https://en.wikipedia.org/wiki/Single-precision_floating-point_format
https://en.wikipedia.org/wiki/IEEE_754-2008_revision
If the problem really is in arithmetic or comparisons, you should consider whether the error will grow or shrink; for example, multiplying by large numbers can amplify the representation error.
Also, when comparing floats you should use something like math.isclose, which essentially checks whether the distance between the numbers is within a tolerance.
If you are trying to represent and operate on real numbers that are not irrational (integers, fractions, or decimals with finitely many digits), you can also consider casting to an exact representation such as int, decimal.Decimal, or fractions.Fraction.
See this for further ideas:
https://davidamos.dev/the-right-way-to-compare-floats-in-python/
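A minimal sketch of those two suggestions (math.isclose and decimal.Decimal), using the value from the question; the tolerance is arbitrary:
import math
from decimal import Decimal

a = float("8.1e-202")
b = float("8.1000000000000005e-202")

# Compare with a relative tolerance instead of ==.
print(math.isclose(a, b, rel_tol=1e-9))   # True

# An exact decimal representation, if you parse from the original string.
print(Decimal("8.1e-202"))                # 8.1E-202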

Why is the result of numpy fft different from matlab fft?

I was using the parameters and formulas below to generate signals.
python code:
import numpy as np
fs=15e6
dt=1/fs
f0=1e6
pri=400e-6
t=np.arange(0,pri,dt)
i=64
fd=5/(i*pri)
xt=0.1*np.exp(2j*np.pi*f0*t)
xf=np.fft.fft(xt)
The matlab code is very similar to the python code:
fs=15e6
dt=1/fs
f0=1e6
pri=400e-6
t=0:dt:pri-dt
i=64
fd=5/(i*pri)
xt=0.1*exp(2j*pi*f0*t)
xf=fft(xt)
This code generates an array of length 6000 on which to perform the FFT. I then calculate the result in matlab using the same method. The results are exactly the same when the FFT length is less than 6000, but they become a little different when the FFT length is 6000.
The result of xf in python is:
xf[:5] = [4.68819428e-12-2.53650626e-12j,
6.55886345e-12+4.51937973e-13j,
5.91758655e-12+4.48215898e-12j,
2.07297400e-12+6.37992397e-12j,
-1.44454940e-12+5.60550355e-12j]
The result of xf in matlab is:
xf(1:5) = 5.165829569664382e-12+1.503743771929872e-12j
4.389776854811194e-12+5.127317569216533e-12j
1.067288620484369e-12+7.191186166371298e-12j
-3.058138112418996e-12+6.189531470616248e-12j
-5.288313073640339e-12+2.908982377132765e-12j
If I use length 5999 for the FFT, like this in python:
xf=np.fft.fft(xt, 5999)
or in matlab:
xf=fft(xt, 5999)
The result is absolutely identical.
In python:
xf[:5] = [-0.09135455+0.04067366j,
-0.09160153+0.04072616j,
-0.09184974+0.04077892j,
-0.09209917+0.04083194j,
-0.09234986+0.04088522j]
In matlab:
xf(1:5) = -9.135455e-02+4.067366e-2j
-9.160153e-02+4.072616e-2j
-9.184974e-02+4.077892e-2j
-9.209917e-02+4.083194e-2j
-9.234986e-02+4.088522e-2j
I am confused. Can anybody explain this phenomenon? Thanks for your help.
PS: python 3.8.5, numpy 1.19.2, matlab 2014
demio, I think the different values you are getting are due to MATLAB's floating-point rounding errors. For small values, of the order of 1e-15, values are rounded to 0, and that introduces an error of the same order as the value being rounded away; the same happens for really big values. You can see a related post with a pretty good explanation of this at: https://es.mathworks.com/matlabcentral/answers/475494-unexpected-results-due-to-floating-point-rounding-errors-by-performing-arithmetic-calculations-on-la.
Also, it is worth noticing that even though these floating-point rounding errors always occur, you have to determine whether they are significant, taking into account your data set and the result you are expecting. Sometimes those absolute differences do not mean anything, because the relative differences are marginal. If you wish to avoid this behavior in MATLAB, you can use the sym function, which makes MATLAB use a symbolic representation; among other things, the numbers are then represented more accurately. More on this subject can be found here: https://es.mathworks.com/help/symbolic/create-symbolic-numbers-variables-and-expressions.html#buyfu27.
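To make that concrete with the numbers from this question (a sketch using the python side only; comparing against the peak is my addition, not part of the original answer):
import numpy as np

fs = 15e6
dt = 1 / fs
f0 = 1e6
pri = 400e-6
t = np.arange(0, pri, dt)
xt = 0.1 * np.exp(2j * np.pi * f0 * t)
xf = np.fft.fft(xt)

# The bins that differ between numpy and matlab have magnitudes around 1e-12,
# while the signal peak is about 600, so the relative difference is at the
# machine-epsilon level and not significant.
peak = np.abs(xf).max()
print(peak)                    # ~600
print(np.abs(xf[:5]) / peak)   # ~1e-14: numerical noise floor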

Do numpy methods work correctly on numbers too big to fit numpy dtypes?

I would like to know whether numbers bigger than what int64 or float128 can hold are correctly processed by numpy functions.
EDIT: I mean numpy functions applied to numbers/python objects outside of any numpy array, like using an np function in a list comprehension over the contents of a list of int128-sized integers.
I can't find anything about this in the docs, and I don't know what to expect. From a few tests it seems to work, but I want to be sure, and a few trivial tests won't settle it. So I come here for knowledge:
If the numpy framework does not handle such big numbers natively, are its functions able to deal with them anyway?
EDIT: sorry, I wasn't clear. Please see the edit above.
Thanks in advance.
See the Extended Precision heading in the Numpy documentation here. For very large numbers, you can also create an array with dtype set to 'object', which will allow you essentially to use the Numpy framework on the large numbers but with lower performance than using native types. As has been pointed out, though, this will break when you try to call a function not supported by the particular object saved in the array.
import numpy as np
arr = np.array([10**105, 10**106], dtype='object')
But the short answer is that you can and will get unexpected behavior when using these large numbers unless you take special care to account for them.
When storing a number into a numpy array with a dtype not sufficient to hold it, you will get rounding, silent overflow, or an error:
arr = np.empty(1, dtype=np.int64)
arr[0] = 2**65
arr
Gives OverflowError: Python int too large to convert to C long.
arr = np.empty(1, dtype=np.float16)
arr[0] = 2**64
arr
Gives inf (and no error)
arr[0] = 2**15 + 2
arr
Gives [ 32768.] (i.e., 2**15), so the value was silently rounded to the nearest representable float16. It would be harder for this to happen with float128...
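One way to guard against this kind of silent truncation (a small sketch; the check itself is mine, not from the answer):
import numpy as np

value = 2**65
info = np.iinfo(np.int64)
if not (info.min <= value <= info.max):
    print(f"{value} does not fit in int64 (max {info.max}); use dtype=object or a float type")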
You can have numpy arrays of python objects, which could be python integers too big to fit in np.int64. Some of numpy's functionality will still work, but many functions call underlying C code that will not work on objects. Here is an example:
import numpy as np
a = np.array([123456789012345678901234567890]) # a has dtype object now
print((a*2)[0]) # Works and gives the right result
print(np.exp(a)) # Does not work, because "'int' object has no attribute 'exp'"
Generally, most functionality will probably be lost for your extremely large numbers. Also, as has been pointed out, when you have an array with a dtype of np.int64 or similar, you will have overflow problems when you increase your array elements beyond that type's limit. With numpy, you have to be careful about what your array's dtype is!
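And to address the edit about applying numpy functions to big python ints outside of any array, here is a sketch (the exact exception type varies with the numpy version):
import numpy as np

big_ints = [2**80, 2**81]   # far beyond np.int64

# Applied to a bare python int, a ufunc first converts the scalar; depending on
# the numpy version this yields an object value without the needed method
# (AttributeError) or fails outright (OverflowError).
for n in big_ints:
    try:
        print(np.log2(n))
    except (OverflowError, TypeError, AttributeError) as exc:
        print(type(exc).__name__, exc)

# Converting to float first works as long as the value fits in a float64.
print([np.log2(float(n)) for n in big_ints])   # ~[80.0, 81.0]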

Realistic float value for "about zero"

I'm working on a program with fairly complex numerics, mostly in numpy with complex datatypes. Some of the calculations are returning nearly empty arrays with a complex component that is almost zero. For example:
(2 + 0j, 3+0j, 4+3.9320340202e-16j)
Clearly the third component is basically 0, but for whatever reason this is the output of my calculation, and it turns out that for some of these nearly-zero values np.iscomplex() returns True. Rather than dig through that big code, I think it's sensible to just apply a cutoff. My question is, what is a sensible cutoff below which anything should be considered zero? 0.00? 0.000000? etc...
I understand that these values are due to rounding errors in floating-point math and just want to handle them sensibly. What tolerance/range does one allow for such precision error? I'd like to set it as a parameter:
ABOUTZERO=0.000001
As others have commented, what constitutes 'almost zero' really does depend on your particular application, and how large you expect the rounding errors to be.
If you must use a hard threshold, a sensible value might be the machine epsilon, which is defined as the upper bound on the relative error due to rounding for floating point operations. Intuitively, it is the smallest positive number that, when added to 1.0, gives a result >1.0 using a given floating point representation and rounding method.
In numpy, you can get the machine epsilon for a particular float type using np.finfo:
import numpy as np
print(np.finfo(float).eps)
# 2.22044604925e-16
print(np.finfo(np.float32).eps)
# 1.19209e-07
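As a small follow-up sketch using the question's own values (the cutoff of 100 epsilons mirrors np.real_if_close's default and is otherwise an arbitrary choice):
import numpy as np

vals = np.array([2 + 0j, 3 + 0j, 4 + 3.9320340202e-16j])

# A relative check: is each imaginary part negligible compared to machine epsilon?
eps = np.finfo(float).eps
print(np.abs(vals.imag) < 100 * eps)   # [ True  True  True]

# np.real_if_close drops the imaginary part when it is within `tol` machine
# epsilons of zero (tol defaults to 100).
print(np.real_if_close(vals))          # [2. 3. 4.]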
