I don't think this is possible, so I decided to ask here, as googling around hasn't returned any results that hint that I can do so.
Especially after reading this:
Can doubles be used to represent a 64 bit number without loss of precision
Though my numbers can be held in 32bit as the example below shows.
But is there any way in MATLAB to convert a double precision value to single without losing information?
e.g. in MATLAB
> a = 103364148
a =
103364148
> single(a)
ans =
103364144
Or maybe there is another way in another language, e.g. Python?
I'm working with GPUMat where I can only use GPUSingle, so I'm trying to find a way to work with stuff that is double to MATLAB in single to the GPU.
Thanks,
A single can hold integer numbers up to 2^24 (16,777,216) without loss of precision, because some of its 32 bits are required for the sign and the exponent.
In other words, no, there is no way that you can make an arbitrary number larger than 2^24 fit into a single without error (note that it can hold some larger numbers, as long as they can be written as the product of a number smaller than 2^24 and some power of 2).
However, are you sure you need that kind of precision for your calculations? As long as all your integers are less than 2^24, you should be fine.
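To check the 2^24 limit outside MATLAB, here is a minimal Python sketch. It assumes that the standard struct module's "f" format (IEEE 754 single precision) rounds the same way MATLAB's single() does; the helper name to_single is made up for illustration.

```python
import struct

def to_single(x):
    """Round a value to the nearest IEEE 754 single and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

a = 103364148
print(int(to_single(a)))                        # the value from the question after the round trip
print(to_single(2**24) == 2**24)                # 2^24 itself survives
print(to_single(2**24 + 1) == 2**24 + 1)        # the next integer does not
# A larger value survives when it is (integer < 2^24) * (power of 2)
print(to_single(12920518 * 8) == 12920518 * 8)
```

The last line is the case mentioned in the parentheses above: 12920518 * 8 is larger than 2^24 but still exactly representable.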
When you're doing these kinds of experiments, you should turn on
format long
so you can see more decimal values. For example,
>> pi
ans =
3.1416
>> format long
>> ans
ans =
3.141592653589793
If your only concern is integers, you could use int32 instead.
I ran the same Python code in PyCharm on Linux, where a number was formatted as
-91.35357. When I ran it in PyCharm on Windows, the number was formatted as
-91.35356999999999. The problem is that the value is part of the name of a file which I need to open (and the list of files to open is long).
Does anyone know a possible explanation and how to fix it?
Floats
Always remember that float numbers have a limited precision. If you think about it, there must be a limit to how exactly you represent a number if you limit storage to 32 or 64 bits (or any other number).
in Python
Python provides just one float type. Float numbers are usually implemented using 64 bits, but the language itself does not strictly guarantee that size, so in principle you can't rely on it for every Python binary (however, see @Mark Dickinson's comment below).
Let's test this. But note that, because Python does not provide float32 and float64 alternatives, we will use a different library, numpy, to provide us with those types and operations:
>>> n = 1.23456789012345678901234567890
>>> n
1.2345678901234567
>>> numpy.float64(n)
1.2345678901234567
>>> numpy.float32(n)
1.2345679
Here we can see that Python, on my computer, handles the variable as a float64. This already rounds the number we entered (because a float64 can only hold so much precision).
When we use a float32, precision is further reduced and, because of truncation, the closest number we can represent is slightly different.
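If numpy is not at hand, the size of the built-in float can also be inspected with the standard library. This is only a sketch of one way to check it; the values in the comments are what an IEEE 754 double gives, which is an assumption about your platform.

```python
import struct
import sys

# Properties of the built-in float on this interpreter
print(sys.float_info.mant_dig)  # significand bits (53 for an IEEE 754 double)
print(sys.float_info.dig)       # decimal digits guaranteed to survive a round trip
print(struct.calcsize("d"))     # size in bytes of a C double

n = 1.23456789012345678901234567890
print(repr(n))                  # the extra digits of the literal are already gone
```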
Conclusion
Float resolution is limited. Furthermore, some operations behave differently across different architectures.
Even if you are using a consistent float size, not all numbers can be represented, and operations will accumulate truncation errors.
Comparing a float to another float shall be done considering a possible error margin. Do not use float_a == float_b, instead use abs(float_a - float_b) < error_margin.
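A minimal sketch of such a comparison. The margin 1e-9 is an arbitrary choice for illustration; math.isclose is the standard library helper for the same idea and uses a relative margin by default.

```python
import math

a = 0.1 + 0.2
b = 0.3

print(a == b)                  # exact comparison
print(abs(a - b) < 1e-9)       # fixed (absolute) error margin
print(math.isclose(a, b))      # relative margin, default rel_tol=1e-09
# Near zero a relative margin never matches, so pass abs_tol explicitly
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))
```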
Relying on float representations is always a bad idea. Python sometimes uses scientific notation:
>>> a = 0.0000000001
>>> str(a)
'1e-10'
You can get consistent rounding approximation (ie, to use in file names), but remember that storage and representation are different things. This other thread may assist you: Limiting floats to two decimal points
In general, I'd advise against using float numbers in file names or as any other kind of identifier.
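If the value has to appear in a file name anyway, formatting it explicitly with a fixed number of decimals gives the same string on every platform. A small sketch; the helper name coord_to_name and the choice of 5 decimals are assumptions taken from the numbers in the question.

```python
def coord_to_name(value, decimals=5):
    """Format a coordinate with a fixed number of decimals for use in a name."""
    return f"{value:.{decimals}f}"

# Both spellings from the question end up as the same text
print(coord_to_name(-91.35357))
print(coord_to_name(-91.35356999999999))
```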
Latitude / Longitude
float32 numbers do not have enough precision to represent the 5th and 6th decimal places in latitude/longitude pairs (depending on whether the integer part has one, two or three digits).
If you want to learn what's really happening, check this page and test some of your numbers: https://www.h-schmidt.net/FloatConverter/IEEE754.html
Representing
Note that Python rounds float values when representing them:
>>> lat = 123.456789
>>> "{0:.6f}".format(lat)
'123.456789'
>>> "{0:.5f}".format(lat)
'123.45679'
And as stated above, latitude/longitude cannot be correctly represented by a float32 down to the 6th decimal, and furthermore, the truncated float values are rounded when presented by Python:
>>> lat = 123.456789
>>> lat
123.456789
>>> "{0:.5f}".format(numpy.float64(lat))
'123.45679'
>>> "{0:.5f}".format(numpy.float32(lat))
'123.45679'
>>> "{0:.6f}".format(numpy.float32(lat))
'123.456787'
As you can see, the float32 value formatted to six decimals ('123.456787') no longer matches the original number in the 6th decimal. Formatting to five decimals also changes the digits, for float64 as well as float32, because the value is rounded ('123.45679'), not cut off.
Your PyCharm on Linux is simply rounding off your long floating point number. Rounding it to 6 or 7 decimal places can resolve your issue, but DON'T USE THESE AS FILE NAMES.
Keeping your code the same in both cases, there can be several explanations:
1) 32-bit processors handle floats differently than 64-bit processors.
2) PyCharm behaves differently for floating point on Linux and on Windows, in a way we cannot determine exactly; maybe PyCharm for Windows is better optimised.
edit 1
Explanation for Point 1
On 32-bit processors everything is really done in 80-bit precision internally. The precision really just determines how many of those bits are stored in memory. This is part of the reason why different optimisation settings can change results slightly: they change the amount of rounding from 80-bit to 32- or 64-bit.
edit 2
You can use hashmapping for saving your data in files and then mapping them onto the co-ordinates.
Example:
# variable = {(long, lat): "<random_file_name>"}
coordinates_and_file = {(-92.45453534, -87.2123123): "AxdwaWAsdAwdz"}
So I'm trying to store a LOT of numbers, and I want to optimize storage space.
A lot of the numbers generated have pretty high precision floating points, so:
0.000000213213 or 323224.23125523 - long, high memory floats.
I want to figure out the best way, either in Python with MySQL(MariaDB) - to store the number with smallest data size.
So 2.132e-7 or 3.232e5, just to basically store it as with as little footprint as possible, with a decimal range that I can specify - but removing the information after n decimals.
I assume storing as a DOUBLE is the way to go, but can I truncate the precision and save on space too?
I'm thinking some number formatting / truncating in Python followed by just normal storage as a DOUBLE would work - but would that actually save any space as opposed to just immediately storing the double with N decimals attached?
Thanks!
All Python floats have the same precision and take the same amount of storage. If you want to reduce overall storage, numpy arrays should do the trick.
If, on the other hand, you are trying to minimize the representation of numbers for, say, transmission via JSON or XML, you could use f-strings.
>>> from math import pi
>>> pi
3.141592653589793
>>> f'{pi:3.2}.'
'3.1.'
>>> bigpi = pi*10e+100
>>> bigpi
3.141592653589793e+101
>>> f'{bigpi:3.2}'
'3.1e+101'
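To make the storage point concrete, here is a rough sketch using only the standard library. The array module stands in for numpy here (my substitution, not what the answer above used); the sample values are the ones from the question.

```python
import struct
from array import array

x = 323224.23125523

# A double is 8 bytes whether or not the value was rounded first
print(len(struct.pack("<d", x)))
print(len(struct.pack("<d", round(x, 2))))

# Packed arrays: 4 bytes per value as single precision, 8 as double
singles = array("f", [x, 0.000000213213])
doubles = array("d", [x, 0.000000213213])
print(singles.itemsize, doubles.itemsize)

# Single precision keeps only about 7 significant digits
print(singles[0], doubles[0])
```

So rounding before storing does not shrink a DOUBLE; only a narrower type (or a text format) changes the footprint, at the cost of precision.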
I am learning Python these days, and this is probably my first post on Python. I am relatively new to R as well, and have been using R for about a year. I am comparing both the languages while learning Python. I apologize if this question is too basic.
I am unsure why R outputs Inf for something Python doesn't. Let's take 2^1500 as an example.
In R:
nchar(2^1500)
[1] 3
2^1500
[1] Inf
In Python:
len(str(2**1500))
Out[7]: 452
2**1500
Out[8]: 3507466211043403874...
I have two questions:
a) Why is it that R provides Inf when Python doesn't.
b) I researched How to work with large numbers in R? thread. It seems that Brobdingnag could help us out with dealing with large numbers. However, even in such case, I am unable to compute nchar. How do I compute above expression i.e. 2^1500 in R
2^Brobdingnag::as.brob(500)
[1] +exp(346.57)
> nchar(2^Brobdingnag::as.brob(500))
Error in nchar(2^Brobdingnag::as.brob(500)) :
no method for coercing this S4 class to a vector
In answer to your questions:
a) They use different representations for numbers. Most numbers in R are represented as double precision floating point values. These are all 64 bits long, and give about 15 digit precision throughout the range, which goes from -double.xmax to double.xmax, then switches to signed infinite values. R also uses 32 bit integer values sometimes. These cover the range of roughly +/- 2 billion. R chooses these types because it is geared towards statistical and numerical methods, and those rarely need more precision than double precision gives. (They often need a bigger range, but usually taking logs solves that problem.)
Python is more of a general purpose platform, and it has types discussed in MichaelChirico's comment.
b) Besides Brobdingnag, the gmp package can handle arbitrarily large integers. For example,
> as.bigz(2)^1500
Big Integer ('bigz') :
[1] 35074662110434038747627587960280857993524015880330828824075798024790963850563322203657080886584969261653150406795437517399294548941469959754171038918004700847889956485329097264486802711583462946536682184340138629451355458264946342525383619389314960644665052551751442335509249173361130355796109709885580674313954210217657847432626760733004753275317192133674703563372783297041993227052663333668509952000175053355529058880434182538386715523683713208549376
> nchar(as.character(as.bigz(2)^1500))
[1] 452
I imagine the as.character() call would also be needed with Brobdingnag.
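The difference described in (a) can also be seen from the Python side. This is a small sketch, assuming Python 3: the int result is exact, while the same power computed with a float (the same 64-bit double R uses) does not fit.

```python
import sys

n = 2 ** 1500              # int: arbitrary precision, exact
print(len(str(n)))         # number of decimal digits

print(sys.float_info.max)  # largest finite double

try:
    2.0 ** 1500            # float: a 64-bit double, as in R
except OverflowError as exc:
    print("float overflow:", exc)

print(float("inf") > n)    # Python has an infinity too; it is just not produced here
```

Where R returns Inf, Python raises OverflowError for this particular operation.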
Apparently Python uses arbitrary precision integers by default when needed. R does not. However, there are many useful R packages to perform arbitrary precision arithmetic. Which package to pick depends on the use case.
To bring up a package that hasn't been discussed yet, consider the Rmpfr package:
> library(Rmpfr)
> a <- 2^mpfr(1500, 10000)
> a
1 'mpfr' number of precision 10000 bits
[1] 35074662110434038747627587960280857993524015880330828824075798024790963850563322203657080886584969261653150406795437517399294548941469959754171038918004700847889956485329097264486802711583462946536682184340138629451355458264946342525383619389314960644665052551751442335509249173361130355796109709885580674313954210217657847432626760733004753275317192133674703563372783297041993227052663333668509952000175053355529058880434182538386715523683713208549376
It requires you to set a precision, but if you make it large enough it can hold 2^1500 as integer.
However, it also doesn't seem to define an as.character() function:
> as.character(a)
[1] "<S4 object of class \"mpfr1\">"
So if your problem is specifically to count digits, then the gmp package as discussed in this answer is probably the way to go. On the other hand, if you're interested in arbitrary precision floating point arithmetic, Rmpfr might be a better choice.
So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats which represent values to be fit into the ranges. All of these floats are < 1 but positive, so precision matters. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question then, is why is the float changing when it is accessed. It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed fewer digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases, which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit, the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
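The effect can be reproduced on a current Python 3, where str and repr agree, by formatting with 12 significant digits. That "%.12g" matches what the old str() did is my assumption here; the variable names are illustrative.

```python
cur_range = (0.0014500000067055225, 0.0020968749796738849)
low = cur_range[0]

short = "%.12g" % low   # 12 significant digits, like the old str()
print(short)
print(repr(low))        # enough digits to round-trip the stored value

print(float(short) == low)      # the short form is a different number
print(float(repr(low)) == low)  # repr round-trips exactly
print(float(short) > low)       # so comparing by eye against the short form misleads
```

The short form looks larger than the lower bound, which is why the test "should clearly pass" when judged from the printed values alone.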
On modern PCs floats aren't that precise. Even if you enter pi as a constant to 100 decimals, only the first 15 to 17 significant digits are kept. The same is happening to you. This is because a Python float is a 64-bit double with only 53 bits of mantissa (a 32-bit float has just 24), which limits your precision (and in unexpected ways, because it's in base 2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays a few decimals of the complete stored float if you use print. If you want to see exactly how Python stores the float, use repr.
If you want better precision use the decimal module.
It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for float, so Python modifies it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision or fixed-point module like Simple Python Fixed Point or the decimal module.
Not sure it would work in this case, because I don't know if Python's limiting in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...
Problem: to see when the computer makes approximations in mathematical calculations when I use Python
Example of the problem:
My old teacher once said the following statement
You can never calculate 200! with your computer.
I am not completely sure whether it is true or not nowadays.
It seems that it is, since I get a lot of zeros for it from a Python script.
How can you see when your Python code makes approximations?
Python uses arbitrary-precision arithmetic to calculate with integers, so it can calculate 200! exactly. For real numbers (so-called floating-point), Python does not use an exact representation. It uses a binary representation called IEEE 754, which is essentially scientific notation, except in base 2 instead of base 10.
Thus, for any real number that cannot be represented exactly in base 2 with 53 bits of precision, Python cannot produce an exact result. For example, 0.1 (in base 10) is an infinitely repeating fraction in base 2, 0.0001100110011..., so it cannot be exactly represented. Hence, if you enter on a Python prompt:
>>> 0.1
0.10000000000000001
The result you get back is different, since it has been converted from decimal to binary (with 53 bits of precision), and back to decimal. As a consequence, you get things like this:
>>> 0.1 + 0.2 == 0.3
False
For a good (but long) read, see What Every Programmer Should Know About Floating-Point Arithmetic.
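To see what is actually stored for 0.1, the standard library can print the exact value. A short sketch, assuming Python 3 (where the prompt itself would now show 0.1 because repr picks the shortest string that round-trips):

```python
from decimal import Decimal
from fractions import Fraction

print(Decimal(0.1))     # exact decimal expansion of the stored double
print(Fraction(0.1))    # the same value as a ratio with a power-of-two denominator
print((0.1).hex())      # the 53-bit significand in hexadecimal

print(0.1 + 0.2 == 0.3)
print(Decimal(0.1 + 0.2) - Decimal(0.3))  # the size of the discrepancy
```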
Python has unbounded integer sizes in the form of a long type. That is to say, if it is a whole number, the limit on the size of the number is restricted by the memory available to Python.
When you compute a large number such as 200! and you see an L on the end of it, that means Python has automatically cast the int to a long, because an int was not large enough to hold that number.
See section 6.4 of this page for more information.
200! is a very large number indeed.
If the range of an IEEE 64-bit double is 1.7E +/- 308 (15 digits), you can see that the largest factorial you can get is around 170!.
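Both halves of that statement can be checked directly. A sketch assuming Python 3 and IEEE 754 doubles: the integer factorial is exact, and the conversion to float is what fails past 170!.

```python
import math
import sys

f200 = math.factorial(200)   # exact int
print(len(str(f200)))        # number of decimal digits

print(sys.float_info.max)
print(float(math.factorial(170)))   # still fits in a double

try:
    float(math.factorial(171))
except OverflowError as exc:
    print("171! does not fit in a double:", exc)
```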
Python can handle arbitrary sized numbers, as can Java with its BigInteger.
Without some sort of clarification to that statement, it's obviously false. Just from personal experience, early lessons in programming (in the late 1980s) included solving very similar, if not exactly the same, problems. In general, to know some device which does calculations isn't making approximations, you have to prove (in the math sense of a proof) that it isn't.
Python's integer types (named int and long in 2.x, both folded into just the int type in 3.x) are very good, and do not overflow like, for example, the int type in C. If you do the obvious of print 200 * 199 * 198 * ... it may be slow, but it will be exact. Similarly, addition, subtraction, and modulus are exact. Division is a mixed bag, as there are two operators, / and //, and they underwent a change in 2.x; in general you can only treat / as inexact, while // on integers stays exact.
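The division caveat, sketched under Python 3 semantics (where / on two ints always returns a float); the number n is an arbitrary example chosen to be just beyond what a double can hold exactly.

```python
n = 10**20 + 1

q, r = divmod(n, 7)        # integer quotient and remainder, both exact
print(q * 7 + r == n)

print(n // 1 == n)         # floor division stays in integers
print(int(n / 1) == n)     # true division goes through a float and loses the +1
```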
If you want more control yet don't want to limit yourself to integers, look at the decimal module.
Python handles large numbers automatically (unlike a language like C where you can overflow its datatypes and the values reset to zero, for example) - over a certain point (sys.maxint or 2147483647) it converts the integer to a "long" (denoted by the L after the number), which can be any length:
>>> def fact(x):
... return reduce(lambda x, y: x * y, range(1, x+1))
...
>>> fact(10)
3628800
>>> fact(200)
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000L
Long numbers are "easy", floating point is more complicated, and almost any computer representation of a floating point number is an approximation, for example:
>>> float(1)/3
0.33333333333333331
Obviously you can't store an infinite number of 3's in memory, so it cheats and rounds it a bit.
You may want to look at the decimal module:
Decimal numbers can be represented exactly. In contrast, numbers like 1.1 do not have an exact representation in binary floating point. End users typically would not expect 1.1 to display as 1.1000000000000001 as it does with binary floating point.
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem
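A brief sketch of both points from the quoted documentation. The precision of 50 is an arbitrary value picked for the example.

```python
from decimal import Decimal, getcontext, localcontext

print(1.1 + 2.2 == 3.3)                                   # binary floats
print(Decimal("1.1") + Decimal("2.2") == Decimal("3.3"))  # decimal arithmetic

print(getcontext().prec)   # current precision in significant digits

with localcontext() as ctx:
    ctx.prec = 50          # raise the precision for this block only
    third = Decimal(1) / Decimal(3)
print(third)
```

Note that the Decimal values are built from strings; Decimal(1.1) would inherit the binary float's error.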
See Handling very large numbers in Python.
Python has built-in arbitrary-precision (bignum) integers that can hold 200!, and it will use them automatically.
Your teacher's statement, though not exactly true here, is true in general. Computers have limitations, and it is good to know what they are. Remember that every time you add another 32-bit integer of data storage, you can store a number that is 2^32 (4 billion +) times larger. It is hard to comprehend how many more numbers that is - but maths gets slower as you add more integers to store the exact value of a very large number.
As an example (what you can store with 1000 bits)
>>> 2 << 1000
2143017214372534641896850098120003621122809623411067214887500776740702102249872244986396
7576313917162551893458351062936503742905713846280871969155149397149607869135549648461970
8421492101247422837559083643060929499671638825347975351183310878921541258291423929553730
84335320859663305248773674411336138752L
I tried to illustrate how big a number you can store with 10000 bits, or even 8,000,000 bits (a megabyte) but that number is many pages long.
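Instead of printing such a number, its size can be measured. A small sketch assuming Python 3 (no trailing L); the digit estimate uses the rule of thumb digits ≈ bits * log10(2).

```python
import math

n = 2 << 1000
print(n == 2**1001)      # shifting 2 left by 1000 gives 2^1001
print(n.bit_length())    # bits needed to store it
print(len(str(n)))       # decimal digits

# Estimate the digit count of the largest 8,000,000-bit number
# without building it: digits ~ bits * log10(2)
bits = 8_000_000
print(int(bits * math.log10(2)) + 1)
```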