I was looking through the quicktions fractions library and I found this cython syntax I've never seen before:
an, ad = (<Fraction>a)._numerator, (<Fraction>a)._denominator
What does (<Fraction>a) represent? It seems like it's some sort of memory allocation, but I'm not sure.
It's a type cast.
It assures Cython that the object really is a Fraction so that it can access the _numerator and _denominator attributes of the cdef type. Without the cast, it can only use the generic Python attribute-lookup mechanism, which doesn't allow you to access any non-public attributes of cdef types.
It doesn't do any check that the object actually is the correct type, so if you're not 100% sure that the object really is a Fraction, you should use <Fraction?> instead, which does check.
That is just the Cython syntax for type casting. In this case, a is being cast to the Fraction type. The extra parentheses are necessary to signify that you want to cast a and then get the _numerator attribute of the cast value, rather than casting a._numerator.
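For illustration, a minimal Cython sketch of both cast forms (the cdef class below is a hypothetical stand-in for quicktions' Fraction, not its actual definition):

cdef class Fraction:
    cdef object _numerator
    cdef object _denominator

def unpack(a):
    # Unchecked cast: Cython trusts that a really is a Fraction, so it can
    # reach the cdef attributes directly (fast, but may misbehave or crash
    # if the object has a different type).
    an = (<Fraction>a)._numerator
    # Checked cast: raises TypeError at runtime if a is not a Fraction.
    ad = (<Fraction?>a)._denominator
    return an, ad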
In numpy, to check the type of an array, the code is
type(array_name)
but to check the type of the values stored in the array the code is
array_name.dtype
I would have thought it would be
dtype(array_name)
This problem keeps arising in different contexts as well.
dtype is the type of the contents of the array, and it's a numpy-specific (and pandas-specific) thing. It's easier and more convenient, both for the developers and the users of the library, to store it as an attribute of the array.
type is a Python built-in function that returns the Python type of any object. While the designers of Python could have made it a property on every object, they chose to make it a global function.
dtype and type seem very similar in this case, but in reality, they have nothing to do with each other.
In Python, type is a built-in function that returns the type of anything you pass to it in argument. You could call type(x) without any assumption on x and Python would tell you about the type of x.
On the other hand, numpy arrays are objects. As such they have a certain number of attributes and one of them is dtype. Only numpy arrays (and other objects that follow the same logic) have dtypes: it wouldn't make sense to ask for the dtype of an integer for example.
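A quick illustration of the difference:

import numpy as np

arr = np.array([1, 2, 3])
print(type(arr))    # <class 'numpy.ndarray'>: the Python type of the array object
print(arr.dtype)    # int64 (platform dependent): the type of the elements it stores
print(type(5))      # <class 'int'>: a plain Python object has a type but no dtype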
Searching for this topic I came across the following: How to represent integer infinity?
I agree with Martijn Pieters that adding a separate special infinity value for int may not be the best of ideas.
However, this makes type hinting difficult. Assume the following code:
myvar = 10 # type: int
myvar = math.inf # <-- raises a typing error because math.inf is a float
However, the code behaves just the way it should everywhere. And my type hinting is correct everywhere else.
If I write the following instead:
myvar = 10 # type: Union[int, float]
I can assign math.inf without a hitch. But now any other float is accepted as well.
Is there a way to properly constrain the type-hint? Or am I forced to use type: ignore each time I assign infinity?
The super lazy (and probably incorrect) solution:
Rather than adding a specific value, the int class can be extended via subclassing. This approach is not without a number of pitfalls and challenges, such as the requirement to handle the infinity value in the various __dunder__ methods (i.e. __add__, __mul__, __eq__ and the like, all of which should be tested). This would be an unacceptable amount of overhead in the use cases where a specific value is required. In such a case, wrapping the desired value with typing.cast would better indicate to the type-hinting system that the specific value (i.e. inf = cast(int, math.inf)) is acceptable for assignment.
The reason why this approach is incorrect is simply this: since the value assigned looks/feels exactly like some number, some other users of your API may end up inadvertently using it as an int, and then the program may explode on them badly when math.inf (or variations of such) is provided.
An analogy is this: given that lists have items that are indexed by positive integers, we would expect any function that returns an index to some item to return a positive integer, so we may use it directly (I know this is not the case in Python, given there are semantics that allow negative index values to be used, but pretend we are working with, say, C for the moment). Say this function returns the first occurrence of the matched item, but if there are any errors it returns some negative number, which clearly exceeds the range of valid values for an index to some item. This lack of guarding against naive usage of the returned value will inevitably result in problems that a type system is supposed to solve.
In essence, creating surrogate values and marking them as int will offer zero value, and will inevitably allow unexpected and broken API/behavior to be exhibited by the program because incorrect usage is automatically allowed.
Not to mention the fact that infinity is not a number, so no int value can properly represent it (given that int represents some finite number by its very nature).
As an aside, check out str.index vs str.find. One of these has a return value that definitely violates user expectations (i.e. it exceeds the boundaries of the positive integer type; you won't be told at compile time that the return value may be invalid for the context in which it is used, resulting in potential failures randomly at runtime).
Framing the question/answer in more correct terms:
Given that the problem is really about assigning some integer when a rate exists, and, if none exists, assigning some other token that represents unboundedness for the particular use case (it could be some built-in value such as NotImplemented or None), those tokens would also not be int values. This means myvar would actually need a type that encompasses them, together with a way to apply operations that do the right thing.
This unfortunately isn't directly available in Python in a very nice way; however, in strongly, statically typed languages like Haskell, the more accepted solution is to use a Maybe type to define a number type that can accept infinity. Note that while floating-point infinity is also available there, it inherits all the problems of floating-point numbers that make it an untenable solution (again, don't use inf for this).
Back to Python: depending on the property of the assignment you actually want, it could be as simple as creating a class whose constructor can accept either an int or None (or NotImplemented), and then providing a method through which the users of the class can make use of the actual value. Python unfortunately does not provide the advanced constructs to make this elegant, so you will inevitably end up with code that manages this splattered all over the place, or you will have to write a number of methods that handle whatever input is expected and produce the required output in the specific ways your program actually needs.
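As a rough sketch of that last approach (the class name Bound and the clamp method are purely hypothetical, chosen here only for illustration):

from typing import Optional

class Bound:
    # Wraps either a finite int or "unbounded", represented here by None.
    def __init__(self, value: Optional[int] = None) -> None:
        self._value = value

    def is_unbounded(self) -> bool:
        return self._value is None

    def clamp(self, n: int) -> int:
        # Apply the bound to n, treating None as "no limit at all".
        if self._value is None:
            return n
        return min(n, self._value)

limit = Bound(10)
no_limit = Bound()
print(limit.clamp(25))     # 10
print(no_limit.clamp(25))  # 25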
Unfortunately, type hinting really only scratches the surface, simply grazing over what more advanced languages have provided and solved at a more fundamental level. I suppose that if one must program in Python, it is better than not having it.
Facing the same problem, I "solved" it as follows.
from typing import Union
import math
Ordinal = Union[int, float] # int or infinity
def fun(x: Ordinal) -> Ordinal:
    if x > 0:
        return x
    return math.inf
Formally, it does exactly what you did not want. But now the intent is clearer: when the user sees Ordinal, they know that it is expected to be an int or math.inf. And the linter is happy.
import typing
type(typing.cast(int, '11'))
still returns <class 'str'> instead of int. Then, what does typing.cast do here?
From the documentation (emphasis mine):
Cast a value to a type.
This returns the value unchanged. To the type checker this signals that the return value has the designated type, but at runtime we intentionally don’t check anything (we want this to be as fast as possible).
The "casting" only takes place in the type-checking system, not at runtime.
I'm using type hints and mypy more and more. I however have some questions about when I should explicitly annotate a declaration, and when the type can be determined automatically by mypy.
Ex:
def assign_volume(self, volume: float) -> None:
self._volume = volume * 1000
Should I write
self._volume: float = volume * 1000
In this case?
Now if I have the following function:
def return_volume(self) -> float:
return self._volume
and somewhere in my code:
my_volume = return_volume()
Should I write:
my_volume: float = return_volume()
Mypy (and PEP 484 in general) is designed so that in the most ideal case, you only need to add type annotations to the "boundaries" or "interfaces" of your code.
For example, you basically must add annotations/type metadata in the following places:
The parameter and return types of functions and methods.
Any object fields (assuming the types of your fields are not inferrable just by looking at your constructor)
When you inherit a class. For example, if you specifically want to subclass a dict of ints to strs, you should do class MyClass(Dict[int, str]): ..., not class MyClass(dict): ....
These are all examples of "boundaries" of your code. Type hints on parameter/return types let the caller of the function make sure they're calling it correctly, type hints on fields let the caller know they're using the object correctly, etc...
Mypy (and other PEP 484 compliant tools) will then use that information and try to infer the types of everything else. This behavior is designed to roughly mimic how humans read code: once you know what types are being passed in, for example, it's usually pretty easy to understand what the rest of the code does.
After all, Python is a language that was designed from the start to be readable! We don't need to scatter type hints everywhere to enhance our understanding of what the code does.
Of course, mypy (and other PEP 484-compliant tools) aren't perfect, and sometimes they might not correctly infer what the type of some local variable will be. In that case, you might need to add a type hint to help mypy along. Ethan's answer gives a good overview of some common cases to watch out for. (Interestingly, these cases also tend to be examples of where a human reader might struggle to understand your code!)
So, to put everything together, the general recommendation is to:
Add type hints to all of the "boundaries" of your code, like function parameters and return types.
Default to not annotating variables. If mypy is unable to infer what type some variable should be, add an annotation to help it.
If you find yourself needing to annotate lots of variables to make mypy happy, consider refactoring your code. If mypy is getting confused easily, a human reader is also likely to get confused easily.
So, to go back to your examples, you would not add type hints in either case. Both a human reader and mypy can tell that your _volume field must be a float: it's immediately obvious that must be the case since the parameter is a float and multiplying a float by an int will always produce another float.
Similarly, you would not add an annotation to your my_volume variable. Since return_volume() has type hints, it's trivially easy to see what type it's returning and understand that my_volume is of type float. (And if you make a mistake and accidentally think it's something other than a float, then mypy will catch that for you.)
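To make that concrete, a minimal sketch (the class name Tank is purely illustrative):

class Tank:
    def assign_volume(self, volume: float) -> None:
        # No annotation needed on the field: mypy infers float from the
        # annotated parameter and the multiplication.
        self._volume = volume * 1000

    def return_volume(self) -> float:
        return self._volume

tank = Tank()
tank.assign_volume(1.5)
my_volume = tank.return_volume()  # mypy infers float; no annotation required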
Mypy does some pretty advanced type inference. Usually, you do not need to annotate variables. The mypy documentation [1] says this about inference:
Mypy considers the initial assignment as the definition of a variable. If you do not explicitly specify the type of the variable, mypy infers the type based on the static type of the value expression
The general rule of thumb then is "annotate variables whose types are not inferrable at their initial assignment".
Here are some examples:
Empty containers. If I define a as a = [], mypy will not know what types are valid in the list a.
Optional types. Oftentimes, if I define an Optional type, I will assign the variable to None. For example, if I do a = None, mypy will infer that a has type NoneType, if you want to assign a to 5 later on, you need to annotate it: a: Optional[int] = None.
Complex nested containers. For example, if you have a dictionary with both list and string values, mypy might, for example, infer Dict[str, Any]. You may need to annotate it to be more accurate.
Of course there are many more cases.
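For instance, a small sketch of the first two cases above (the variable names are illustrative):

from typing import List, Optional

names: List[str] = []        # empty container: without the annotation,
                             # mypy cannot tell what the list should hold
limit: Optional[int] = None  # initial assignment to None; the annotation
                             # allows an int to be assigned later
limit = 5                    # fine, thanks to the Optional[int] annotation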
In your examples, mypy can infer the types of the expressions.
[1] https://mypy.readthedocs.io/en/latest/type_inference_and_annotations.html
For myself, I started to write type hints everywhere it is possible. It isn't any slower, and it makes things easier if you go back to your old code in the future. So there is no negative aspect to using them as much as possible, except for the size of your Python file.
Sorry if this is quite noobish to you, but I'm just starting out to learn Python after learning C++ & Java, and I am wondering how in the world I could just declare variables like id = 0 and name = 'John' without any int's or string's in front! I figured out that perhaps it's because there are no ''s in a number, but how would Python figure that out in something like def increase(first, second) instead of something like int increase(int first, int second) in C++?!
The literal objects you mention carry (pointers to;-) their own types with them of course, so when a name's bound to that object the problem of type doesn't arise -- the object always has a type, the name doesn't -- it just delegates that to the object it's bound to.
There's no "figuring out" in def increase(first, second): -- name increase gets bound to a function object, names first and second are recorded as parameters-names and will get bound (quite possibly to objects of different types at various points) as increase gets called.
So say the body is return first + second -- a call to increase('foo', 'bar') will then happily return 'foobar' (delegating the addition to the objects, which in this case are strings), and maybe later a call to increase(23, 45) will just as happily return 68 -- again by delegating the addition to the objects bound to those names at the point of call, which in this case are ints. And if you call with incompatible types you'll get an exception as the delegated addition operation can't make sense of the situation -- no big deal!
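Spelled out as runnable code:

def increase(first, second):
    # No declared types: the + is delegated to whatever objects
    # first and second happen to be bound to at call time.
    return first + second

print(increase('foo', 'bar'))   # foobar
print(increase(23, 45))         # 68
try:
    increase(23, 'bar')         # incompatible types
except TypeError as exc:
    print(exc)                  # unsupported operand type(s) for +: 'int' and 'str'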
Python is dynamically typed: all variables can refer to an object of any type. id and name can be anything, but the actual objects are of types like int and str. 0 is a literal that is parsed to make an int object, and 'John' a literal that makes a str object. Many object types do not have literals and are returned by a callable (like frozenset—there's no way to make a literal frozenset, you must call frozenset.)
Consequently, there is no such thing as declaration of variables, since you aren't defining anything about the variable. id = 0 and name = 'John' are just assignments.
increase returns an int because that's what you return in it; nothing in Python forces it not to be any other object. first and second are only ints if you make them so.
Objects, to a certain extent, share a common interface. You can use the same operators and functions on all of them, and if they support that particular operation, it works. It is a common, recommended technique to use different types that behave similarly interchangeably; this is called duck typing. For example, if something takes a file object you can instead pass a cStringIO.StringIO object, which supports the same methods as a file (like read and write) but is a completely different type. This is sort of like Java interfaces, but it does not require any formal usage; you just define the appropriate methods.
Python uses the duck-typing method - if it walks, looks and quacks like a duck, then it's a duck. If you pass in a string, and try to do something numerical on it, then it will fail.
Have a look at: http://en.wikipedia.org/wiki/Python_%28programming_language%29#Typing and http://en.wikipedia.org/wiki/Duck_typing
When it comes to assigning literal values to variables, the type of the literal value can be inferred at the time of lexical analysis. For example, anything matching the regular expression (-)?[1-9][0-9]* can be inferred to be an integer literal. If you want to convert it to a float, there needs to be an explicit cast. Similarly, a string literal is any sequence of characters enclosed in single or double quotes.
In a method call, the parameters are not type-checked. You only need to pass in the correct number of them to be able to call the method. So long as the body of the method does not cause any errors with respect to the arguments, you can call the same method with lots of different types of arguments.
In Python, unlike in C++ and Java, numbers and strings are both objects. So this:
id = 0
name = 'John'
is equivalent to:
id = int(0)
name = str('John')
Since variables id and name are references that may address any Python object, they don't need to be declared with a particular type.
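For example (shadowing the built-in name id here, just as the question does):

id = 0
print(type(id))    # <class 'int'>
id = 'John'        # the same name can later be bound to a str object
print(type(id))    # <class 'str'>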