Context-Free Grammar: Differentiating Integer and Floating-Point Constants - python

I am writing an LR(1) parser, and I've been basing my test grammar off of the C language. I've looked at the grammar for both C and Python:
https://www.lysator.liu.se/c/ANSI-C-grammar-y.html
https://docs.python.org/3/reference/grammar.html
C seems to use the symbol CONSTANT for integer and floating point constants, and Python uses NUMBER.
What I'm wondering is: why are these not separated into individual symbols such as INT and FLOAT, so that they can later be put into separate nodes in the Abstract Syntax Tree?
Since we already know what type of number it is once the lexer has tokenized it, why merge both into a generic NUMBER and later try to figure out which one it is again?

Being able to handle some special cases earlier does not simplify things, since you still need the same code in a different place later. For example, consider the code y + z. Python doesn't know what that is, other than that at run time it will invoke y.__add__(z). The code to generate that isn't going away. That same code can take 3 + z and just as easily generate (3).__add__(z). So it doesn't really simplify anything to distinguish between y + z and 3 + z during parsing. (The same logic holds if y is a float literal instead of an identifier.)
Now consider something like 3.0 + 5. Separate code exists to replace this with 8.0 instead of (3.0).__add__(5) prior to byte-code compilation, because 1) it's simple to do and 2) it is demonstrably better than invoking a function at run time. However, this still isn't done by the parser. It is done by an optimizer that runs over the tree looking for things like NUMBER + NUMBER. Once that is found, the optimizer can determine whether the NUMBERs are ints or floats, and produce the appropriate sum to include in the code. This is simpler than having to handle 4 different bits of parse tree: INT + FLOAT, FLOAT + INT, FLOAT + FLOAT, and INT + INT.
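To make that concrete, here is a minimal sketch of such a NUMBER + NUMBER folding pass, written against Python's public ast module (illustrative only; it is not CPython's actual optimizer, and FoldAdd is a made-up name):

import ast

# Python 3.8+ (uses ast.Constant).
class FoldAdd(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, (int, float))
                and isinstance(node.right.value, (int, float))):
            # One rule covers INT+INT, INT+FLOAT, FLOAT+INT, and FLOAT+FLOAT:
            # Python's own + operator picks the correct result type.
            return ast.copy_location(
                ast.Constant(node.left.value + node.right.value), node)
        return node

tree = ast.parse("x = 3.0 + 5")
tree = ast.fix_missing_locations(FoldAdd().visit(tree))
print(ast.dump(tree))  # the BinOp has been replaced by Constant(value=8.0)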

Related

Division in Python and types

I have some code, part of which looks like this:

A = (-1//2 + (int(math.sqrt(1 + 8*t)))//2)
if type(A) == int:
    print(t)
    print(A)
The problem arises when I use "/" to get A. Since I am using "/", I always get a decimal point: for example, 5/5 = 1.0 and 4/2 = 2.0, which Python interprets as floats (I am using 3.6.5). Hence, whatever the result is, my code gets stuck at line 2.
When I use //, the same thing happens from the other direction: I get 5//2 = 2, which is actually a float value (2.5) but appears as an integer.
Since my code depends on the type of this division's result, how can I solve this problem?
A = (-1//2 + (int(math.sqrt(1 + 8*t)))//2) is actually the formula for finding the roots of the quadratic equation (where a=1, b=1, and c=-2t in ax^2 + bx + c). I need only the integer roots with positive values.
What you're trying to do won't work. For two integers x and y, x/y is always a float, even if it happens to be integral, and x//y is always an int, even if it has to truncate (throw away) a fractional part. So testing type(A) == int doesn't test for anything except which of the two you used.
There is a method float.is_integer that you can use, and that works fine for integers divided by 2, but it doesn't work once you're using sqrt. Explaining floating-point rounding issues is a big enough job that it takes up a whole paper (Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic"), one so important that it's been included by reference in multiple language specifications. But the short version is that sqrt could very easily give you a number that's a tiny bit bigger or smaller than an integer, so is_integer will give you the wrong answer.
What you probably want to do is something like this:
if math.isclose(A, round(A)):
The round function will round a float to the nearest integer. The isclose function will then check whether the resulting integer is "close enough" to the original float. You should read the docs on isclose to understand exactly what it does, but in this case, I think the default values will be fine, unless you're dealing with huge integers.
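Putting that together for the quadratic-root problem, here is a sketch (assuming t is the input and only positive integer roots are wanted; positive_integer_root is a made-up helper name):

import math

def positive_integer_root(t):
    # Real root of x**2 + x - 2*t == 0, via the quadratic formula.
    root = (-1 + math.sqrt(1 + 8 * t)) / 2
    # Keep it only if it is positive and close enough to a whole number.
    if root > 0 and math.isclose(root, round(root)):
        return round(root)
    return None

print(positive_integer_root(3))  # 2, since 2**2 + 2 == 2*3
print(positive_integer_root(4))  # None; the root is irrational here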

Python rounding and inserting into array does not round [duplicate]

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats, which represent values to be fit into the ranges. All of these floats are positive but < 1, so precision matters. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems, I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah:
    # do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question, then, is: why does the float change when it is accessed? It shows one precision when printed as part of a tuple, and a different precision when accessed and printed on its own. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
- print range[0] uses str on the float, which (up until fairly recent versions of Python) printed fewer digits of a float.
- Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this is no longer true in recent releases, which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit: the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
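To see what's actually stored, you can also ask for more digits explicitly; a quick sketch, using the value from the question:

from decimal import Decimal

x = 0.00145000000671
print(repr(x))            # shortest string that round-trips to this exact float
print(format(x, '.20g'))  # more digits than the float really carries
print(Decimal(x))         # the exact binary value, written out in decimal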
On modern PCs, floats aren't that precise. So even if you enter pi as a constant to 100 decimals, only the first fifteen or so of them are actually kept. The same thing is happening to you. Python floats are 64-bit IEEE 754 doubles, which gives you 53 bits of mantissa; that limits your precision to roughly 15-17 significant decimal digits (and in unexpected ways, because the representation is base 2).
Please note that 0.00145000000671 isn't the exact value as stored by Python. Python only displays some of the digits of the complete stored float when you use print. If you want to see exactly how Python stores the float, use repr.
If you want better precision use the decimal module.
It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for a float, so Python rounds it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision fixed-point module like Simple Python Fixed Point, or the decimal module.
Not sure it would work in this case, because I don't know whether Python's limiting happens in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and ...
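If the range endpoints originate as decimal strings, a sketch of the decimal-module route (constructing Decimal from strings so no binary rounding ever happens; the values are the ones from the question):

from decimal import Decimal

# Endpoints and value built from strings, so no binary rounding occurs.
low = Decimal("0.0014500000067055225")
high = Decimal("0.0020968749796738849")
cur = Decimal("0.00145000000671")

if low < cur <= high:
    print("in range")  # prints: as decimals, cur and low really do differ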

In python, what exactly is going on in the background such that "x = 1j" works, but "x = 1*j" throws an error?

Specifically, if I wanted to define an object, say z, such that
x = 1z
worked but
x = 1*z
failed (threw an error), how would I define such an object?
I don't think it involves overloading the multiplying operator.
1j works because it's a literal for a complex number (you mentioned 1j in your question title), kind of like [] is a literal for a list.
Here's the relevant excerpt from the Python docs / spec:
Imaginary literals are described by the following lexical definitions:

imagnumber ::= (floatnumber | intpart) ("j" | "J")

An imaginary literal yields a complex number with a real part of 0.0. Complex numbers are represented as a pair of floating point numbers and have the same restrictions on their range. To create a complex number with a nonzero real part, add a floating point number to it, e.g., (3+4j).
In other words, 1j is a special case, and there's nothing you can do to make 1z work like 1j does. 1z is a SyntaxError, and that's it (as far as Python is concerned, that is).
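A quick interactive demonstration of the literal behavior described above (the exact wording of the SyntaxError varies by Python version):

>>> x = 1j
>>> type(x)
<class 'complex'>
>>> x * x
(-1+0j)
>>> complex(0, 1) == 1j
True
>>> x = 1z
SyntaxError: invalid syntax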

Python default behavior of str(x)

I am depending on some code that uses the Decimal class because it needs precision to a certain number of decimal places. Some of the functions allow inputs to be floats because of the way that it interfaces with other parts of the codebase. To convert them to decimal objects, it uses things like
mydec = decimal.Decimal(str(x))
where x is the float taken as input. My question is, does anyone know what the standard is for the 'str' method as applied to floats?
For example, take the number 2.1234512. It is stored internally as 2.12345119999999999 because of how floats are represented.
>>> x = 2.12345119999999999
>>> x
2.1234511999999999
>>> str(x)
'2.1234512'
Ok, str(x) in this case is doing something like '%.6f' % x. This is a problem with the way my code converts to decimals. Take the following:
>>> d = decimal.Decimal('2.12345119999999999')
>>> ds = decimal.Decimal(str(2.12345119999999999))
>>> d - ds
Decimal('-1E-17')
So if I have the float 2.12345119999999999 and I want to pass it to Decimal, converting it to a string using str() gets me the wrong answer. I need to know the rules for str(x) that determine the formatting, because I need to determine whether this code has to be rewritten to avoid this error. (Note that it might be OK; for example, the code might round to the 10th decimal place once we have a Decimal object.)
There must be some set of rules in python's docs that hopefully someone here can point me to. Thanks!
In the Python source, look in "Include/floatobject.h". The precision for the string conversion is set a few lines from the top, after a comment with some explanation of the choice:
/* The str() precision PyFloat_STR_PRECISION is chosen so that in most cases,
the rounding noise created by various operations is suppressed, while
giving plenty of precision for practical use. */
#define PyFloat_STR_PRECISION 12
You have the option of rebuilding, if you need something different. Any changes will change formatting of floats and complex numbers. See ./Objects/complexobject.c and ./Objects/floatobject.c. Also, you can compare the difference between how repr and str convert doubles in these two files.
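On the versions that produced the question's transcript, the net effect was roughly that of %g formatting with 12 significant digits for str() versus 17 for repr(); a sketch reproducing both (the %-formatting itself still behaves this way on modern Python, even though str() no longer does):

>>> x = 2.12345119999999999
>>> '%.12g' % x    # what str(x) produced on those versions
'2.1234512'
>>> '%.17g' % x    # what repr(x) produced
'2.1234511999999999'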
There are a couple of issues worth discussing here, but the summary is: you cannot extract information that isn't stored on your system already.
If you've taken a decimal number and stored it as a floating point, you'll have lost information, since most decimal (base 10) numbers with a finite number of digits cannot be stored using a finite number of digits in base 2 (binary).
As was mentioned, str(a_float) will really call a_float.__str__(). As the documentation states, the purpose of that method is to
return a string containing a nicely printable representation of an object
There's no particular definition for the float case. My opinion is that, for your purposes, you should consider __str__'s behavior to be undefined, since there's no official documentation on it - the current implementation can change anytime.
If you don't have the original strings, there's no way to extract the missing digits of the decimal representation from the float objects. All you can do is round predictably, using string formatting (which you mention):
Decimal("{0:.5f}".format(a_float))
You can also remove 0s on the right with resulting_string.rstrip("0").
Again, this method does not recover the information that has been lost.
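One more option worth knowing: on Python 2.7/3.2 and later, Decimal accepts a float directly and captures the exact binary value, which makes the loss easy to see (a short sketch; the exact value runs to dozens of digits, so it isn't reproduced in the comments):

from decimal import Decimal

x = 2.12345119999999999               # parsed to the nearest binary double
print(Decimal(x))                     # the exact stored value, dozens of digits long
print(Decimal(str(x)))                # 2.1234512 -- goes through the string form
print(Decimal("{0:.5f}".format(x)))   # 2.12345   -- explicitly rounded first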

Byte precision of value in Python?

I have a hash function in Python.
It returns a value.
How do I see the byte size of this return value? I want to know whether it is 4 bytes, 8 bytes, or something else.
Reason:
I want to make sure that the min value is 0 and the max value is 2**32 - 1, otherwise my calculations are incorrect.
I want to make sure that packing it into an "I" struct (unsigned int) is correct.
More specifically, I am calling murmur.string_hash(`x`).
I want to sanity-check that I am getting a 4-byte unsigned return value. If I get a value of a different size, then my calculations get messed up.
If it's an arbitrary function that returns a number, there are only 4 standard types of numbers in Python: small integers (C long, at least 32 bits), long integers ("unlimited" precision), floats (C double), and complex numbers.
If you are referring to the builtin hash, it returns a standard integer (C long):
>>> hash(2**31)
-2147483648
If you want different hashes, check out hashlib.
Generally, thinking of a return value as a particular byte precision in Python is not the best way to go, especially with integers. For most intents and purposes, Python "short" integers are seamlessly integrated with "long" (unlimited) integers. Variables are promoted from the smaller to the larger type as necessary to hold the required value. Functions are not required to return any particular type (the same function could return different data types depending on the input, for example).
When a function is provided by a third-party package (as this one is), you can either just trust the documentation (which for Murmur indicates 4-byte ints as far as I can tell) or test the return value yourself before using it (whether by if, assert, or try, depending on your preference).
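One direct way to run that test is to let struct itself do the verification (a sketch; fits_uint32 is a made-up helper, and you would pass it whatever murmur.string_hash actually returns):

import struct

def fits_uint32(value):
    # struct.error is raised for negatives or for values >= 2**32.
    try:
        struct.pack("<I", value)  # "<I" is a standard-size 4-byte unsigned int
        return True
    except struct.error:
        return False

print(fits_uint32(2**32 - 1))  # True
print(fits_uint32(2**32))      # False
print(fits_uint32(-1))         # False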
