We can define variables as integer values, e.g.
x = 3
y = -2
and then operate on bits with the binary operators &, |, ^ and ~. The question is whether we always get the same result on every architecture, or whether the behavior is architecture-specific.
Can we always assume a two's complement representation of integers?
Python 2.x supports two integer types: int and long. int is based on the underlying C long type and long is an arbitrary-precision type. Very early versions of Python (pre-2.2) treated them as two separate types, but they were mostly unified in Python 2.2.
Python 3.x only uses the arbitrary precision type.
Bit operations behave as if applied to arbitrary-precision 2's complement numbers. If required, an int will be automatically promoted to a long in Python 2.x.
The behavior is consistent across platforms.
From the Python 2 documentation (emphasis mine):
Plain integers: These represent numbers in the range -2147483648 through 2147483647. (The range may be larger on machines with a larger natural word size, but not smaller.) When the result of an operation would fall outside this range, the result is normally returned as a long integer (in some cases, the exception OverflowError is raised instead). For the purpose of shift and mask operations, integers are assumed to have a binary, 2’s complement notation using 32 or more bits, and hiding no bits from the user (i.e., all 4294967296 different bit patterns correspond to different values).
So yes: the integers are architecture specific for Python 2.
From the Python 3 documentation:
Integers (int): These represent numbers in an unlimited range, subject to available (virtual) memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of 2’s complement which gives the illusion of an infinite string of sign bits extending to the left.
So no: the integers are not architecture specific for Python 3.
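Either way, the bitwise results themselves are the same on every platform. A quick interactive check using the variables from the question:
>>> x = 3
>>> y = -2
>>> x & y          # ...0011 & ...1110
2
>>> x | y
-1
>>> x ^ y
-3
>>> ~x
-4
>>> y & 0xFFFFFFFF  # view -2 as an unsigned 32-bit word
4294967294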
Related
I would like to create a function which does the following:
32_bit_binary(-1) should be '11111111111111111111111111111111'
32_bit_binary(1) should be '00000000000000000000000000000001'
I wrote the following code:
def 32_bit_binary(num_bits):
return '{:032b}'.format(num_bits)
But when I ran
print(32_bit_binary(-1))
it printed -00000000000000000000000000000001
What is wrong with the code?
As @gspr said, formatting as base 2 doesn’t give you the actual representation. You can solve it by masking the negative integer, which has infinitely many leading 1s for the purposes of bitwise operations, down to 32 bits:
return f"{num_bits & 0xffff_ffff:032b}"
String formatting like {:032b} does not give you the actual representation of the number. It just writes the number in base-2. That's a purely mathematical operation. Implementation details like how the computer represents said number (like using binary, using 2's complement for negative numbers, etc.) are not in the scope of those string formatting operations.
A good way to get at the actual representations of values in Python is the struct module. For example, struct.pack(">i", -1) returns the bytestring b'\xff\xff\xff\xff'. Printing that bytestring in binary is left as an exercise to the reader.
PS: For numbers other than -1, you may be surprised by the output of struct.pack. The term you'll want to look up is endianness, and the > in my struct.pack formatting string.
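For instance, one way to finish that exercise (a sketch using big-endian byte order; iterating over a bytes object yields integers in Python 3):
>>> import struct
>>> packed = struct.pack(">i", -1)
>>> packed
b'\xff\xff\xff\xff'
>>> "".join(f"{byte:08b}" for byte in packed)
'11111111111111111111111111111111'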
Why does Python return True when I compare int and float objects which have the same value?
For example:
>>> 5*2 == 5.0*2.0
True
It's not as simple as a type conversion.
10 == 10.0 delegates to the arguments' __eq__ methods, trying (10).__eq__(10.0) first, and then (10.0).__eq__(10) if the first call returns NotImplemented. It makes no attempt to convert types. (Technically, the method lookup uses a special routine that bypasses instance __dict__ entries and __getattribute__/__getattr__ overrides, so it's not quite equivalent to calling the methods yourself.)
int.__eq__ has no idea how to handle a float:
>>> (10).__eq__(10.0)
NotImplemented
but float.__eq__ knows how to handle ints:
>>> (10.0).__eq__(10)
True
float.__eq__ isn't just performing a cast internally, either. It has over 100 lines of code to handle float/int comparison without the rounding error an unchecked cast could introduce. (Some of that could be simplified if the C-level comparison routine didn't also have to handle >, >=, <, and <=.)
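You can see the difference an unchecked cast would make right at the edge of double precision (2**53 is the first point where doubles start skipping integers):
>>> 2**53 + 1 == 2.0**53         # exact mixed-type comparison
False
>>> float(2**53 + 1) == 2.0**53  # a cast silently drops the low-order bit
True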
Objects of different types, except different numeric types, never compare equal.
And:
Python fully supports mixed arithmetic: when a binary arithmetic operator has operands of different numeric types, the operand with the “narrower” type is widened to that of the other, where integer is narrower than floating point, which is narrower than complex. Comparisons between numbers of mixed type use the same rule.
https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex
The comparison logic is implemented by each type's __eq__ method. And the standard numeric types are implemented in a way that they support comparisons (and arithmetic operations) among each other. Python as a language never does implicit type conversion (the way JavaScript's == operator does implicit type juggling).
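As a minimal sketch of that protocol (the class Meters is a hypothetical example, not part of any library):

class Meters:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        if isinstance(other, Meters):
            return self.value == other.value
        # signal "I don't know how to compare with that type";
        # Python will then try the other operand's __eq__
        return NotImplemented

If both operands return NotImplemented, == falls back to comparing object identity, so the objects compare unequal.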
The simple answer is that the language is designed this way. Here is an excerpt from the documentation supporting this:
6.10.1 Value Comparisons
Numbers of built-in numeric types (Numeric Types — int, float, complex) and of the standard library types fractions.Fraction and decimal.Decimal can be compared within and across their types, with the restriction that complex numbers do not support order comparison.
In other words, we want different numeric types with the same value to be equal.
PEP 20
Special cases aren't special enough to break the rules.
Although practicality beats purity.
What benefit is there to making numeric types not comparable, besides making life difficult in most common cases?
You can have a look at the source code for the CPython implementation.
The function is preceded by this comment explaining how the conversion is attempted:
/* Comparison is pretty much a nightmare. When comparing float to float,
* we do it as straightforwardly (and long-windedly) as conceivable, so
* that, e.g., Python x == y delivers the same result as the platform
* C x == y when x and/or y is a NaN.
* When mixing float with an integer type, there's no good *uniform* approach.
* Converting the double to an integer obviously doesn't work, since we
* may lose info from fractional bits. Converting the integer to a double
* also has two failure modes: (1) an int may trigger overflow (too
* large to fit in the dynamic range of a C double); (2) even a C long may have
* more bits than fit in a C double (e.g., on a 64-bit box long may have
* 63 bits of precision, but a C double probably has only 53), and then
* we can falsely claim equality when low-order integer bits are lost by
* coercion to double. So this part is painful too.
*/
Other implementations are not guaranteed to follow the same logic.
From the Python 2 documentation:
Python fully supports mixed arithmetic: when a binary arithmetic
operator has operands of different numeric types, the operand with the
“narrower” type is widened to that of the other, where plain integer
is narrower than long integer is narrower than floating point is
narrower than complex. Comparisons between numbers of mixed type use
the same rule.
According to this, in 5*2 == 5.0*2.0 the int result 10 is widened to 10.0, which is equal to 10.0.
If you compare mixed data types, the comparison is carried out in the type with the larger range; in your case, float has a larger range than int:
float max value --> 1.7976931348623157e+308
64-bit int max value --> 9223372036854775807
The == operator compares only the values, not the types. The is keyword is sometimes suggested as the equivalent of === in other languages, but note that it tests object identity rather than type-plus-value equality. For instance
5 is 5.0
returns
False
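Identity can be False even when types and values match; at a CPython interactive prompt, for example, two separately created but equal floats are distinct objects:
>>> a = 1000.0
>>> b = 1000.0
>>> a == b
True
>>> a is b
False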
== is a comparison operator. You are actually asking the interpreter whether both sides of your expression are equal or not.
In other words, you are asking it to return a Boolean value, not to convert data types. If you want to convert the data types, you will have to do so explicitly in your code.
I ran one Python program in PyCharm on Linux, and a number was formatted as
-91.35357. When I ran the same code in PyCharm on Windows, the format was
-91.35356999999999. The problem is that the value is contained in the name of a file I need to open (and the list of files to open is long).
Does anyone know a possible explanation and how to fix it?
Floats
Always remember that float numbers have a limited precision. If you think about it, there must be a limit to how exactly you represent a number if you limit storage to 32 or 64 bits (or any other number).
In Python
Python provides just one float type. Floats are usually implemented using 64 bits, but they might be 64-bit in one Python binary and 32-bit in another, so you can't really rely on that (however, see @Mark Dickinson's comment below).
Let's test this. But note that, because Python does not provide float32 and float64 alternatives, we will use a different library, numpy, to provide us with those types and operations:
>>> import numpy
>>> n = 1.23456789012345678901234567890
>>> n
1.2345678901234567
>>> numpy.float64(n)
1.2345678901234567
>>> numpy.float32(n)
1.2345679
Here we can see that Python, in my computer, handles the variable as a float64. This already truncates the number we introduced (because a float64 can only handle so much precision).
When we use a float32, precision is further reduced and, because of truncation, the closest number we can represent is slightly different.
Conclusion
Float resolution is limited. Furthermore, some operations behave differently across different architectures.
Even if you are using a consistent float size, not all numbers can be represented, and operations will accumulate truncation errors.
Comparing one float to another should be done with a possible error margin in mind: do not use float_a == float_b; instead, use abs(float_a - float_b) < error_margin.
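The standard library offers this comparison ready-made in math.isclose:
>>> import math
>>> 0.1 + 0.2 == 0.3
False
>>> math.isclose(0.1 + 0.2, 0.3)
True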
Relying on float representations is always a bad idea. Python sometimes uses scientific notation:
>>> a = 0.0000000001
>>> str(a)
'1e-10'
You can get consistent rounding approximation (ie, to use in file names), but remember that storage and representation are different things. This other thread may assist you: Limiting floats to two decimal points
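For instance, formatting to a fixed number of decimals yields the same string on both machines (a sketch using the number from the question):
>>> f"{-91.35356999999999:.5f}"
'-91.35357'
>>> f"{-91.35357:.5f}"
'-91.35357'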
In general, I'd advise against using float numbers in file names or as any other kind of identifier.
Latitude / Longitude
float32 numbers do not have enough precision to represent the 5th and 6th decimals of latitude/longitude pairs (depending on whether the integer part has one, two or three digits).
If you want to learn what's really happening, check this page and test some of your numbers: https://www.h-schmidt.net/FloatConverter/IEEE754.html
Representing
Note that Python rounds float values when representing them:
>>> lat = 123.456789
>>> "{0:.6f}".format(lat)
'123.456789'
>>> "{0:.5f}".format(lat)
'123.45679'
And as stated above, latitude/longitude cannot be correctly represented by a float32 down to the 6th decimal, and furthermore, the truncated float values are rounded when presented by Python:
>>> lat = 123.456789
>>> lat
123.456789
>>> "{0:.5f}".format(numpy.float64(lat))
'123.45679'
>>> "{0:.5f}".format(numpy.float32(lat))
'123.45679'
>>> "{0:.6f}".format(numpy.float32(lat))
'123.456787'
As you can see, the float32 value rendered with 6 decimals ('123.456787') already differs from the original number at the 6th decimal; and rounding to 5 decimals changes the displayed value for the float64 number as well ('123.45679' instead of '123.456789').
Your PyCharm on Linux is simply rounding off your long floating-point number. Rounding it to 6 or 7 decimal places can resolve your issue, but DON'T USE THESE AS FILE NAMES.
Keeping your code constant in both cases, there can be several explanations:
1) 32-bit processors handle floats differently than 64-bit processors.
2) PyCharm on Linux and on Windows may behave differently for floating points in ways we cannot determine exactly; maybe PyCharm for Windows is better optimised.
Edit 1
Explanation for point 1:
On 32-bit x86 processors, floating-point arithmetic has traditionally been done in 80-bit precision internally (the x87 FPU); the declared precision really just determines how many of those bits are stored in memory. This is part of the reason why different optimisation settings can change results slightly: they change the amount of rounding from 80-bit to 32- or 64-bit.
Edit 2
You can use a hash mapping to save your data in files and then map the file names onto the coordinates.
Example:
# variable = {(long, lat): "<random_file_name>"}
coordinates_and_file = {(-92.45453534, -87.2123123): "AxdwaWAsdAwdz"}
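Retrieving the (made-up) file name for a coordinate pair is then a plain dictionary lookup:
>>> coordinates_and_file[(-92.45453534, -87.2123123)]
'AxdwaWAsdAwdz'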
I'm learning Python and I have a question about the range of the data types.
This program:
print("8 bits:", pow(2, 8)-1)
print("16 bits:", pow(2, 16)-1)
print("32 bits:", pow(2, 32)-1)
print("64 bits:", pow(2, 64)-1)
print(pow(18446744073709551615 + 18446744073709551615 + 2, 9))
Produces the following output:
8 bits: 255
16 bits: 65535
32 bits: 4294967295
64 bits: 18446744073709551615
12663316555422952143897729076205936129798725073982046203600028471956337925454431
59912019973433564390346740077701202633417478988975650566195033836314121693019733
02667340133957632
My question is: how can Python calculate the result of the last call to pow()? My CPU cannot handle integers with more than 64 bits, so I expect the operation to produce an overflow.
The Python long integer type is only limited by your available memory. Until you run out of memory, the digits will just keep on coming.
Quoting from the numeric types documentation:
Long integers have unlimited precision.
Python will transparently use long integers when you need the unlimited precision. In Python 3, all integers are long integers; there is no distinction.
Python 2 knows two data types for integers: int and long. If a number is too large for an int (which is based on the platform's C long, so typically 32 or 64 bits), a long is automatically used. Likewise, if the result of a computation is too large for an int, a long is used instead.
You can explicitly declare a literal to be a long; just add an L to it (a lowercase l is also possible but discouraged, because in many fonts it is indistinguishable from, or at least very much like, a 1 (one) character). So, 5L is a long.
Typically this distinction is not important; to know of the difference will become necessary, though, if you compare the types of values (because type(5) ≠ type(5L)).
longs aren't limited to any particular number of bits. At ridiculously high values, though, memory consumption and computation times impose a practical limit.
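A quick Python 2 session illustrating the promotion (assuming a 64-bit build, where sys.maxint is 2**63 - 1):
>>> import sys
>>> type(sys.maxint)
<type 'int'>
>>> type(sys.maxint + 1)
<type 'long'>
>>> sys.maxint + 1
9223372036854775808L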
Also keep in mind that computing with these longs might be faster than printing them, because converting them to a string for printing requires converting them into the decimal system.
Problem: to see when the computer makes approximations in mathematical calculations when I use Python
Example of the problem:
My old teacher once made the following statement:
You can never calculate 200! with your computer.
I am not completely sure whether that is still true nowadays.
It seems that it is, since I get a lot of zeros for it from a Python script.
How can you see when your Python code makes approximations?
Python uses arbitrary-precision arithmetic for integers, so it can calculate 200! exactly. For real numbers (so-called floating point), Python does not use an exact representation. It uses a binary representation defined by IEEE 754, which is essentially scientific notation, except in base 2 instead of base 10.
Thus, for any real number that cannot be exactly represented in base 2 with 53 bits of precision, Python cannot produce an exact result. For example, 0.1 (in base 10) has an infinitely repeating expansion in base 2, 0.0001100110011..., so it cannot be represented exactly. Hence, if you enter at a Python prompt:
>>> 0.1
0.10000000000000001
The result you get back is different, since it has been converted from decimal to binary (with 53 bits of precision) and back to decimal. As a consequence, you get things like this:
>>> 0.1 + 0.2 == 0.3
False
For a good (but long) read, see What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Python has unbounded integer sizes in the form of a long type. That is to say, if it is a whole number, the limit on the size of the number is restricted by the memory available to Python.
When you compute a large number such as 200! and you see an L on the end of it, that means Python has automatically cast the int to a long, because an int was not large enough to hold that number.
See section 6.4 of this page for more information.
200! is a very large number indeed.
Since the range of an IEEE 64-bit double is about 1.7E±308 (with roughly 15 significant digits), you can see that the largest factorial that still fits in a double is around 170!.
Python can handle arbitrary sized numbers, as can Java with its BigInteger.
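You can check that boundary directly (math.factorial is in the standard library):
>>> import math
>>> len(str(math.factorial(170)))   # 170! has 307 digits, still within a double's range
307
>>> float(math.factorial(171))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too large to convert to float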
Without some sort of clarification to that statement, it's obviously false. Just from personal experience, early lessons in programming (in the late 1980s) included solving very similar, if not exactly the same, problems. In general, to know some device which does calculations isn't making approximations, you have to prove (in the math sense of a proof) that it isn't.
Python's integer types (named int and long in 2.x, both folded into the single int type in 3.x) are very good, and do not overflow like, for example, the int type in C. If you do the obvious of print 200 * 199 * 198 * ... it may be slow, but it will be exact. Similarly, addition, subtraction, and modulus are exact. Division is a mixed bag: there are two operators, / and //, and the meaning of / changed between 2.x and 3.x; in general you can only treat it as inexact.
If you want more control yet don't want to limit yourself to integers, look at the decimal module.
Python handles large numbers automatically (unlike a language like C, where you can overflow its data types and the values wrap around, for example) - over a certain point (sys.maxint, which is 2147483647 on 32-bit builds) it converts the integer to a "long" (denoted by the L after the number), which can be of any length:
>>> def fact(x):
... return reduce(lambda x, y: x * y, range(1, x+1))
...
>>> fact(10)
3628800
>>> fact(200)
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000L
Long numbers are "easy", floating point is more complicated, and almost any computer representation of a floating point number is an approximation, for example:
>>> float(1)/3
0.33333333333333331
Obviously you can't store an infinite number of 3's in memory, so it cheats and rounds it a bit.
You may want to look at the decimal module:
Decimal numbers can be represented exactly. In contrast, numbers like 1.1 do not have an exact representation in binary floating point. End users typically would not expect 1.1 to display as 1.1000000000000001 as it does with binary floating point.
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem
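A short example of the decimal module in action:
>>> from decimal import Decimal, getcontext
>>> Decimal("1.1") + Decimal("2.2")
Decimal('3.3')
>>> getcontext().prec = 50
>>> Decimal(1) / Decimal(3)
Decimal('0.33333333333333333333333333333333333333333333333333')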
See Handling very large numbers in Python.
Python has a built-in arbitrary-precision ("bignum") integer type for holding numbers like 200!, and it will use it automatically.
Your teacher's statement, though not exactly true here, is true in general. Computers have limitations, and it is good to know what they are. Remember that every time you add another 32-bit word of storage, you can store a number that is 2^32 (4 billion+) times larger. It is hard to comprehend how many more numbers that is - but arithmetic gets slower as you add more words to store the exact value of a very large number.
As an example (what you can store with 1000 bits)
>>> 2 << 1000
2143017214372534641896850098120003621122809623411067214887500776740702102249872244986396
7576313917162551893458351062936503742905713846280871969155149397149607869135549648461970
8421492101247422837559083643060929499671638825347975351183310878921541258291423929553730
84335320859663305248773674411336138752L
I tried to illustrate how big a number you can store with 10000 bits, or even 8,000,000 bits (a megabyte) but that number is many pages long.
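You can also ask an integer directly how many bits it needs, without printing it:
>>> (2 << 1000).bit_length()
1002
>>> (2 << 8000000).bit_length()
8000002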