PEP 3141 defines a numerical hierarchy with Complex.__add__ but no Number.__add__. This seems to be a weird choice, since the other numeric type Decimal that (virtually) derives from Number also implements an add method.
So why is it this way? If I want to add type annotations or assertions to my code, should I use x:(Complex, Decimal)? Or x:Number and ignore the fact that this declaration is practically meaningless?
I believe the answer can be found in the Rejected Alternatives:
The initial version of this PEP defined an algebraic hierarchy
inspired by a Haskell Numeric Prelude [3] including MonoidUnderPlus,
AdditiveGroup, Ring, and Field, and mentioned several other possible
algebraic types before getting to the numbers. We had expected this to
be useful to people using vectors and matrices, but the NumPy
community really wasn't interested ...
There are more complicated number systems where addition is clearly not supported. They could have gone into much more detail with their class hierarchy (and originally intended to), but there was a lack of interest in the community. Hence, it is easier just to leave Number unspecified for anyone who wants to build something more complicated.
Note that Monoids are an example where only one binary operation is defined.
In numbers.py, there is a note on Decimal and Real:
## Notes on Decimal
## ----------------
## Decimal has all of the methods specified by the Real abc, but it should
## not be registered as a Real because decimals do not interoperate with
## binary floats (i.e. Decimal('3.14') + 2.71828 is undefined). But,
## abstract reals are expected to interoperate (i.e. R1 + R2 should be
## expected to work if R1 and R2 are both Reals).
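The note above, and the question's observation about Number, can be checked from an interpreter. This is a small sketch that assumes only the standard numbers and decimal modules; the helper name decimal_plus_float_fails is made up for illustration.

```python
import numbers
from decimal import Decimal

# Number itself declares no arithmetic; Complex is the first ABC that does.
number_has_add = hasattr(numbers.Number, "__add__")    # False
complex_has_add = hasattr(numbers.Complex, "__add__")  # True

d = Decimal("3.14")
# Decimal is registered as a Number, but deliberately not as a Real.
is_number = isinstance(d, numbers.Number)  # True
is_real = isinstance(d, numbers.Real)      # False


def decimal_plus_float_fails() -> bool:
    """Hypothetical helper: True if Decimal + float raises TypeError."""
    try:
        Decimal("3.14") + 2.71828
    except TypeError:
        return True
    return False
```

So an annotation of x: Number promises very little: it says "some kind of number" without promising that any particular operation works between two such values.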
I have also put some related links here. Really a good question; it made me dig around. :P
A related github issue
PEP 3119, which is all about ABCs (Abstract Base Classes); PEP 3141 defines the Number part.
cpython/Lib/numbers.py
Related
At least in Python 3, float has attributes real and imag, and a method conjugate(). Since issubclass(float, complex) evaluates to False, what is the reason for these?
It is obviously a design choice and it is very well rooted in Python numeric types (i.e. bool, int, float, complex), as clear from the source code (e.g. for float).
This has been discussed in PEP 3141, which resulted in the numbers module of numeric abstract base classes.
As you can see, .real, .imag and .conjugate() are part of the generic Complex abstraction, which Real (and therefore float) inherits.
From a practical perspective, this means that any numeric algorithm can be safely written for complex and it will gracefully work for any type further down the numeric tower (Real, Rational, Integral).
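As a rough illustration of that practical point, here is a sketch of a function written purely against the complex-number interface. The function name squared_modulus is an assumption for this example, not something from PEP 3141.

```python
import numbers
from fractions import Fraction


def squared_modulus(z: numbers.Complex):
    """Illustrative helper: |z|^2 using only .conjugate() and .real."""
    return (z * z.conjugate()).real


# float is not a subclass of complex, but it is registered under the Complex ABC.
float_is_complex_subclass = issubclass(float, complex)        # False
float_is_complex_abc = issubclass(float, numbers.Complex)     # True

from_int = squared_modulus(3)                    # 9
from_float = squared_modulus(2.0)                # 4.0
from_fraction = squared_modulus(Fraction(1, 2))  # Fraction(1, 4)
from_complex = squared_modulus(3 + 4j)           # 25.0
```

The same code path handles int, float, Fraction and complex without any type checks, which is presumably the point of giving the real types these attributes.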
Searching for this topic I came across the following: How to represent integer infinity?
I agree with Martijn Pieters that adding a separate special infinity value for int may not be the best of ideas.
However, this makes type hinting difficult. Assume the following code:
myvar = 10 # type: int
myvar = math.inf # <-- raises a typing error because math.inf is a float
However, the code behaves everywhere just the way it should. And my type hinting is correct everywhere else.
If I write the following instead:
myvar = 10 # type: Union[int, float]
I can assign math.inf without a hitch. But now any other float is accepted as well.
Is there a way to properly constrain the type-hint? Or am I forced to use type: ignore each time I assign infinity?
The super lazy (and probably incorrect) solution:
Rather than adding a specific value, the int class can be extended via subclassing. This approach has a number of pitfalls and challenges, such as the requirement to handle the infinity value in the various __dunder__ methods (e.g. __add__, __mul__, __eq__ and the like, all of which should be tested). This would be an unacceptable amount of overhead in use cases where only a specific value is required. In such a case, wrapping the desired value with typing.cast (i.e. inf = cast(int, math.inf)) tells the type hinting system that this specific value is acceptable for assignment.
The reason why this approach is incorrect is simply this: since the value assigned looks and feels exactly like a number, other users of your API may inadvertently use it as an int, and then the program may explode on them badly when math.inf (or a variation of it) is provided.
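To make the failure mode concrete, here is a small sketch of what the cast does and does not do. The function build_range is a hypothetical stand-in for any API consumer that trusts the int annotation.

```python
import math
from typing import cast

inf = cast(int, math.inf)  # the type checker now treats `inf` as an int

# At runtime nothing was converted: the object is still a float.
runtime_type = type(inf).__name__


def build_range(limit: int) -> range:
    """Hypothetical API consumer that trusts the `int` annotation."""
    return range(limit)


try:
    build_range(inf)
    failed = False
except TypeError:
    # range() rejects floats, so the supposed int blows up here
    failed = True
```

The type checker accepts the call to build_range, yet it fails at runtime, which is exactly the kind of error the annotation was supposed to rule out.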
An analogy is this: given that lists have items that are indexed by non-negative integers, we would expect any function that returns the index of some item to return a non-negative integer, so that we may use it directly (I know this is not the case in Python, given there are semantics that allow negative index values to be used, but pretend we are working with, say, C for the moment). Say this function returns the first occurrence of the matched item, but on error it returns some negative number, which clearly falls outside the range of valid index values. This lack of guarding against naive usage of the returned value will inevitably result in the very problems that a type system is supposed to solve.
In essence, creating surrogate values and marking them as int offers zero value, and inevitably lets the program exhibit unexpected and broken API behavior, because incorrect usage is automatically allowed.
Not to mention the fact that infinity is not a number, so no int value can properly represent it (given that an int represents some finite number by its very nature).
As an aside, check out str.index vs str.find. One of these has a return value that definitely violates user expectations (i.e. it exceeds the boundaries of the type "non-negative integer"; the user is not told at compile time that the return value may be invalid for the context in which it is used, which results in potential failures at runtime).
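The difference between the two methods can be seen in a few lines. This is only a sketch of the aside above, using a made-up sample string.

```python
text = "abc"

# str.find signals "not found" with -1, which is still a valid Python index.
pos = text.find("z")
naive = text[pos]  # silently picks the last character, "c"

# str.index raises instead, so the failure cannot be used by accident.
try:
    text.index("z")
    raised = False
except ValueError:
    raised = True
```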
Framing the question/answer in more correct terms:
The problem is really about assigning some integer when a rate exists, and, when none exists, assigning some other token that represents unboundedness for the particular use case (it could be some built-in value such as NotImplemented or None). However, as those tokens are not int values either, myvar would actually need a type that encompasses them, along with a way to apply operations that do the right thing.
This unfortunately isn't directly available in Python in a very nice way; however, in strongly, statically typed languages like Haskell, the more accepted solution is to use a Maybe type to define a number type that can accept infinity. Note that while floating point infinity is also available there, it inherits all the problems of floating point numbers that make it an untenable solution (again, don't use inf for this).
Back to Python: depending on the properties of the assignment you actually want, it could be as simple as creating a class with a constructor that accepts either an int or None (or NotImplemented), and then providing a method through which users of the class can get at the actual value. Python unfortunately does not provide the advanced constructs to make this elegant, so you will inevitably end up with the code managing this splattered all over the place, or have to write a number of methods that handle the input as expected and produce the required output in the specific ways your program actually needs.
Unfortunately, type hinting is really only scratching the surface of what more advanced languages have provided and solved at a more fundamental level. I suppose if one must program in Python, it is better than not having it.
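One possible shape for such a class is sketched below. The class name Limit and its allows method are hypothetical and only illustrate the idea of wrapping an Optional[int]; a real program would add whatever operations it actually needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    """Hypothetical wrapper: an int bound, or None meaning 'unbounded'."""

    value: Optional[int] = None

    @property
    def is_unbounded(self) -> bool:
        return self.value is None

    def allows(self, n: int) -> bool:
        """True if n does not exceed the bound (always True when unbounded)."""
        return self.value is None or n <= self.value


ten = Limit(10)
unbounded = Limit()
```

Callers can no longer mistake the unbounded case for an ordinary int, because they have to go through allows (or check is_unbounded) to use it.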
Facing the same problem, I "solved" it as follows.
from typing import Union
import math

Ordinal = Union[int, float]  # int or infinity

def fun(x: Ordinal) -> Ordinal:
    if x > 0:
        return x
    return math.inf
Formally, it does exactly what you did not want. But now the intent is clearer: when users see Ordinal, they know that it is expected to be an int or math.inf.
And the linter is happy.
In the languages I have tested, - (x div y ) is not equal to -x div y; I have tested // in Python, / in Ruby, div in Perl 6; C has a similar behavior.
That behavior is usually according to spec, since div is usually defined as the rounding down of the result of the division, however it does not make a lot of sense from the arithmetic point of view, since it makes div behave in a different way depending on the sign, and it causes confusion such as this post on how it is done in Python.
Is there some specific rationale behind this design decision, or is div just defined that way from scratch? Apparently Guido van Rossum uses a coherency argument in a blog post that explains how it is done in Python, but you can have coherency also if you choose to round up.
(Inspired by this question by PMurias in the #perl6 IRC channel)
Ideally, we would like to have two operations div and mod, satisfying, for each b>0:
(1) (a div b) * b + (a mod b) = a
(2) 0 <= (a mod b) < b
(3) (-a) div b = -(a div b)
This is, however, a mathematical impossibility. If all the above were true, we would have
1 div 2 = 0
1 mod 2 = 1
since this is the unique integer solution to (1) and (2). Hence, we would also have, by (3),
0 = -0 = -(1 div 2) = (-1) div 2
which, by (1), implies
-1 = ((-1) div 2) * 2 + ((-1) mod 2) = 0 * 2 + ((-1) mod 2) = (-1) mod 2
making (-1) mod 2 < 0 which contradicts (2).
Hence, we need to give up some property among (1), (2), and (3).
Some programming languages give up (3), and make div round down (Python, Ruby).
In some (rare) cases the language offers multiple division operators. For instance, in Haskell we have div,mod satisfying only (1) and (2), similarly to Python, and we also have quot,rem satisfying only (1) and (3). The latter pair of operators rounds division towards zero, at the price of returning negative remainders, e.g., we have (-1) `quot` 2 = 0 and (-1) `rem` 2 = (-1).
C# also gives up (2), and allows % to return a negative remainder. Coherently, integer division rounds towards zero. Java, Scala, Pascal, and C, starting from C99, also adopt this strategy.
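The trade-off can be reproduced in Python, whose built-in divmod floors. The quot_rem function below is an illustrative re-implementation of truncating division, written for this example and not taken from any library; the property numbers refer to (1), (2) and (3) as used in the argument above.

```python
from typing import Tuple


def quot_rem(a: int, b: int) -> Tuple[int, int]:
    """Illustrative truncating division (rounds toward zero), like Haskell's quot/rem."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b


floor_q, floor_r = divmod(-1, 2)    # Python's built-in pair: (-1, 1)
trunc_q, trunc_r = quot_rem(-1, 2)  # truncating pair: (0, -1)

# Property (1) holds for both pairs.
floor_keeps_1 = floor_q * 2 + floor_r == -1      # True
trunc_keeps_1 = trunc_q * 2 + trunc_r == -1      # True

# Floor division keeps (2) but loses (3); truncation does the opposite.
floor_keeps_2 = 0 <= floor_r < 2                 # True
floor_keeps_3 = floor_q == -(1 // 2)             # False
trunc_keeps_2 = 0 <= trunc_r < 2                 # False
trunc_keeps_3 = trunc_q == -quot_rem(1, 2)[0]    # True
```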
Floating-point operations are defined by IEEE754 with numeric applications in mind and, by default, round to the nearest representable value in a very strictly-defined manner.
Integer operations in computers are not defined by general international standards. The operations granted by languages (especially those of the C family) tend to follow whatever the underlying computer provides. Some languages define certain operations more robustly than others, but, to avoid excessively difficult or slow implementations on the available (and popular) computers of their time, they choose a definition that follows the hardware's behaviour quite closely.
For this reason, integer operations tend to wrap around on overflow (for addition, multiplication, and shifting-left), and round towards negative infinity when producing an inexact result (for division, and shifting-right). Both of these are simple truncation at their respective end of the integer in two's-complement binary arithmetic; the simplest way to handle a corner-case.
Other answers discuss the relationship with the remainder or modulus operator that a language might provide alongside division. Unfortunately they have it backwards. Remainder depends on the definition of division, not the other way around, while modulus can be defined independently of division - if both arguments happen to be positive and division rounds down, they work out to be the same, so people rarely notice.
Most modern languages provide either a remainder operator or a modulus operator, rarely both. A library function may provide the other operation for people who care about the difference, which is that remainder retains the sign of the dividend, while modulus retains the sign of the divisor.
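Python happens to be one of the languages that offers both: the % operator is a modulus, and math.fmod in the standard library behaves like a remainder. A short sketch of the sign rule described above:

```python
import math

# % in Python is a modulus: the result takes the sign of the divisor.
mod_neg_dividend = -7 % 3    # 2
mod_neg_divisor = 7 % -3     # -2

# math.fmod follows the C library's fmod: the result takes the sign of the dividend.
rem_neg_dividend = math.fmod(-7, 3)   # -1.0
rem_neg_divisor = math.fmod(7, -3)    # 1.0
```

When both operands are positive the two agree, which is why the difference so often goes unnoticed.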
Because the implication of integer division is that the full answer includes a remainder.
Wikipedia has a great article on this, including history as well as theory.
As long as a language satisfies the Euclidean division property that (a/b) * b + (a%b) == a, both flooring division and truncating division are coherent and arithmetically sensible.
Of course people like to argue that one is obviously correct and the other is obviously wrong, but it has more the character of a holy war than a sensible discussion, and it usually has more to do with the choice of their early preferred language than anything else. They also often tend to argue primarily for their chosen %, even though it probably makes more sense to choose / first and then just pick the % that matches.
Flooring (like Python):
No less an authority than Donald Knuth suggests it.
% following the sign of the divisor is apparently what about 70% of all students guess
The operator is usually read as mod or modulo rather than remainder.
"C does it"—which isn't even true.1
Truncating (like C++):
Makes integer division more consistent with IEEE float division (in default rounding mode).
More CPUs implement it. (May not be true at different times in history.)
The operator is read modulo rather than remainder (even though this actually argues against their point).
The division property conceptually is more about remainder than modulus.
The operator is read mod rather than modulo, so it should follow Fortran's distinction. (This may sound silly, but may have been the clincher for C99. See this thread.)
"Euclidean" (like Pascal—/ floors or truncates depending on signs, so % is never negative):
Niklaus Wirth argued that nobody is ever surprised by positive mod.
Raymond T. Boute later argued that you can't implement Euclidean division naively with either of the other rules.
A number of languages provide both. Typically—as in Ada, Modula-2, some Lisps, Haskell, and Julia—they use names related to mod for the Python-style operator and rem for the C++-style operator. But not always—Fortran, for example, calls the same things modulo and mod (as mentioned above for C99).
We don't know why Python, Tcl, Perl, and the other influential scripting languages mostly chose flooring. As noted in the question, Guido van Rossum's answer only explains why he had to choose one of the three consistent answers, not why he picked the one he did.
However, I suspect the influence of C was key. Most scripting languages are (at least initially) implemented in C, and borrow their operator inventory from C. C89's implementation-defined % is obviously broken, and not suitable for a "friendly" language like Tcl or Python. And C calls the operator "mod". So they go with modulus, not remainder.
1. Despite what the question says—and many people using it as an argument—C actually doesn't have similar behavior to Python and friends. C99 requires truncating division, not flooring. C89 allowed either, and also allowed either version of mod, so there's no guarantee of the division property, and no way to write portable code doing signed integer division. That's just broken.
As Paula said, it is because of the remainder.
The algorithm is founded on Euclidean division.
In Ruby, you can rebuild the dividend consistently from the quotient and the remainder:
puts (10/3)*3 + 10%3
#=> 10
It works the same in real life: 10 apples and 3 people. OK, you can cut one apple in three, but then you are going outside the set of integers.
With negative numbers the consistency is also kept:
puts (-10/3)*3 + -10%3 #=> -10
puts (10/(-3))*(-3) + 10%(-3) #=> 10
puts (-10/(-3))*(-3) + -10%(-3) #=> -10
The quotient is always rounded down (down along the negative axis) and the remainder follows:
puts (-10/3) #=> -4
puts -10%3 #=> 2
puts (10/(-3)) #=> -4
puts 10%(-3) # => -2
puts (-10/(-3)) #=> 3
puts -10%(-3) #=> -1
This answer addresses a sub-part of the question that the other (excellent) answers didn't explicitly address. You noted:
you can have coherency also if you choose to round up.
Other answers addressed the choice between rounding down (towards -∞) and truncating (rounding towards 0) but didn't compare rounding up (towards ∞).
(The accepted answer touches on performance reasons to prefer rounding down on a two's-complement machine, which would also apply in comparison to rounding up. But there are more important semantic reasons to avoid rounding up.)
This answer directly addresses why rounding up is not a great solution.
Rounding up breaks elementary-school expectations
Building on an example from a previous answer, it's common to informally say something like this:
If I evenly divide fourteen marbles among three people, each person gets four marbles and there are two marbles left over.
Indeed, this is how many students are first taught division (before being introduced to fractions/decimals). A student might write 14 ÷ 3 = 4 remainder 2. Since this is introduced so early, we'd really like our div operator to preserve this property.
Or, put a bit more formally, of the three properties discussed in the top-voted answer, the first one ((a div b) × b + (a mod b) = a) is by far the most important.
But rounding up breaks this property. If div rounds up, then 14 div 3 returns 5. This means that the equation above simplifies to 15 + (14 mod 3) = 14, which cannot hold for any definition of mod that returns a non-negative result. Similarly, the less-formal/elementary-school approach is also out of luck, or at least requires introducing negative marbles: "Each person gets 5 marbles and there are negative one marbles left over".
(Rounding to the nearest integer also breaks the property when, as in the example above, that means rounding up.)
Thus, if we want to maintain elementary expectations, we cannot round up. And with rounding up off the table, the coherency argument that you linked in the question is sufficient to justify rounding down.
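The marble example can be worked through in Python. The ceil_div helper below is an assumption for illustration (Python has no built-in rounding-up integer division); it uses the common negate-floor-negate trick.

```python
def ceil_div(a: int, b: int) -> int:
    """Illustrative rounding-up integer division."""
    return -(-a // b)


marbles, people = 14, 3

# Rounding down: the usual "4 remainder 2".
down_q = marbles // people              # 4
down_left = marbles - down_q * people   # 2

# Rounding up: each person "gets" 5, so the leftover has to be negative.
up_q = ceil_div(marbles, people)        # 5
up_left = marbles - up_q * people       # -1
```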
I am somewhat surprised to see things like this work:
float f = 10.25f;
int i = (int)f;
// Will give you i = 10
What is the gain?
OTOH, 10.25 is quite a different thing than 10, as everyone will agree, and bad things might happen from such a soft conversion.
Which languages raise an error instead?
I would expect something like: "Error: Can't represent 10.25 as an integer".
With regard to the answers given in the meantime: yes, it might be considered reliable in the way a function like "round" is, but not with regard to the integrity of data/information one expects from a cast.
Maybe a function "truncate" which defaults to behavior of "int" would make a better choice?
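For comparison, here is how the distinction the question asks for might look in Python, where int() on a float also truncates toward zero. The strict_int function is a hypothetical helper written for this sketch; it is not a built-in of any language discussed here.

```python
import math


def strict_int(x: float) -> int:
    """Hypothetical helper: convert only when no information is lost."""
    if not math.isfinite(x) or not x.is_integer():
        raise ValueError(f"Can't represent {x!r} as an integer")
    return int(x)


truncated = int(10.25)         # 10, the fractional part is dropped silently
explicit = math.trunc(10.25)   # 10, same result but the name states the intent
exact = strict_int(10.0)       # 10

try:
    strict_int(10.25)
    rejected = False
except ValueError:
    rejected = True
```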
It is precisely the
(int)f
that signals that the programmer is aware of what he is doing, whereas silently cutting off the fractional part and storing the rest in an integer is forbidden in most programming languages.
By the way, it is not just that the fractional part is cut off. It is also that a floating point number can have a value so large that it can't possibly be represented as an int. Consider:
(int) 1e20f
The statement int i = (int)f; explicitly says "Please take my float f and make it into int". This is certainly something I quite often find a useful thing to do - why wouldn't you want to be able to convert a float value from a calculation of some sort to an integer? The cast (int) will tell the compiler that "I really want this to be an integer", just like in C you can do char *p = (char *)1234567; - a typecast is there to tell the compiler "I really know what I'm doing".
If you do int i = f; or int i = 10.25; the compiler will still do what you "asked for" - convert the float value to an integer. It will probably issue a warning to say "You are converting a float to int", if you enable the appropriate warnings.
C and C++ are languages that require you to understand what you are doing, and what the consequences are - some other languages put more "barriers" in place to prevent such things, but that often means that the compiler has to add extra code to check things at runtime - C and C++ are designed to be "fast" languages.
It's a bit like driving a car, putting the car in reverse when there is a wall right behind, and stepping on the gas, will probably cause the car to crash into the wall - if that's not what you want, then "don't do that".
Firstly, the conversion is most definitely "reliable", as in "it will always do the same thing".
Whether you want to do that or not is up to you. In general the C/C++ languages are designed to give the programmer a lot of low-level power, and that means that the programmer needs to know what they are doing. If a float-to-int conversion surprises you then you need to think harder.
In fact, GCC has an option -Wconversion that will highlight cases like this. It isn't enabled by default, and is not part of -Wall or -Wextra (presumably because the behaviour is well understood and "expected" by most programmers), but the option is there if you need it.
Except that it won't give a warning, in this case, because your code includes an explicit cast (int), so the compiler assumes you did it deliberately.
This gives a warning (with -Wconversion):
int i = f;
This does not:
int i = (int)f;
Converting to an integer is useful in cases where you are working with complex data, but ultimately need to convert this data to an int to do something with it. Think of offsets in arrays, or pixels on a screen.
Think of drawing a circle on the screen. There is no such thing as a fraction of a pixel (so the coordinates are ints), but you cannot calculate the coordinates of the pixels with just ints (sine works with pi and other floats).
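A rough sketch of the circle example, written in Python for brevity: the coordinates are computed in floating point and only converted to integers at the very end. The function name circle_pixels and its parameters are made up for this illustration, and it rounds to the nearest pixel rather than truncating.

```python
import math
from typing import List, Tuple


def circle_pixels(cx: int, cy: int, r: int, steps: int = 8) -> List[Tuple[int, int]]:
    """Illustrative helper: integer pixel coordinates on a circle outline."""
    points = []
    for k in range(steps):
        angle = 2 * math.pi * k / steps
        # The intermediate values are floats; pixels must be ints.
        x = round(cx + r * math.cos(angle))
        y = round(cy + r * math.sin(angle))
        points.append((x, y))
    return points


pixels = circle_pixels(10, 10, 5, steps=4)
```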
There's the following bit of Python code in a project I have to maintain:
# If the `factor` decimal is given, compute new price and a delta
factor = +factor.quantize(TWOPLACES)
new_price = +Decimal(old_price * factor).quantize(TWOPLACES)
delta = new_price - old_price
The question here is the purpose of + in front of a variable.
Python docs call it unary plus operator, which “yields its numeric argument unchanged”. Can it be safely removed then?
(Incidentally, the code was written by me some time ago, hopefully I've learned the lesson—it wouldn't be a question if tests existed, or if the use of unary plus on a decimal was clarified in comments.)
What that plus sign does depends on what it's defined to do by the result of that expression (that object's __pos__() method is called). In this case, it's a Decimal object, and the unary plus is equivalent to calling the plus() method. Basically, it's used to apply the current context (precision, rounding, etc.) without changing the sign of the number. Look for a setcontext() or localcontext() call elsewhere to see what the context is. For more information, see here.
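A minimal sketch of that behaviour, assuming a context precision of 4 digits chosen only for the demonstration:

```python
from decimal import Decimal, localcontext

x = Decimal("3.14159265")

with localcontext() as ctx:
    ctx.prec = 4
    unchanged = x   # merely referencing the value does not round it
    rounded = +x    # unary plus applies the current context: Decimal('3.142')
```

So for Decimal the unary plus is not a no-op, and removing it is only safe if the operand already fits within the context precision.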
The unary plus is not used very often, so it's not surprising this usage is unfamiliar. I think the decimal module is the only standard module that uses it.
I ran into this same problem when I wrongly assumed that Python must support the C increment (++) operator; it doesn't! Instead, it applies the plus-sign operator (+) twice! Which does nothing twice, I soon learned. However, because "++n" looked valid... not flagged as a syntax error... I created a terrible bug for myself.
So unless you redefine what it does, unary + actually does nothing. Unary - changes from positive to negative and vice-versa, which is why "--n" is also not flagged as a syntax error but it also does nothing.
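For plain ints this is easy to confirm; the variable names here are only for the example.

```python
n = 5

double_plus = ++n    # parsed as +(+n); n is not incremented
double_minus = --n   # parsed as -(-n); n is not decremented
unchanged = n        # still 5

n += 1               # the Python way to increment
```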