Disclaimer:
Unlike 99.9% of folks out there, I didn't pick up Python until very late in the progression of languages I write in. I won't harp on some of the odd behaviors of the import model, but I do find myself having trouble understanding why type checking (i.e., "what kinda thing is this random object some user has given me, hmm?") is all over the place.
Really this is just checking what class of data a thing is, but in Python it has never struck me as straightforward, and in my research on the interwebz, well, let's just say there are opinions, and the only thing anyone agrees on is using the term "Pythonic". My question boils down to type(x) == y vs. isinstance(x, y) when the type isn't one of the more straightforward list, tuple, float, int... yadda yadda.
Current Conundrum:
I need the ability to determine if an object that is being passed (either directly, or dynamically within a recursive routine) is not just an iterable, but more specifically an object created by scandir. Please don't get lost in that singular issue; I'll show I have many ways to get there, but the bigger questions are:
A) Is the method I'm using to coerce the output of type() going to bite me in the backside in some case I'm not thinking of?
B) Am I missing a simpler, language-idiomatic way of accessing the 'class|type' of an object?
C) TBD
I'll start by showing where the root of my disconnect may come from, and have a little fun with the people I know will take the time to answer this question properly, with a first example in R.
I'm going to set my own class attribute just to show what I'm talking about:
> a <- 1:3
> class(a)
[1] "integer"
> attr(a, "class")
[1] "integer"
Ok so, like in Python, we can ask if this is an int(eger), etc. Now I can re-class as I see fit, which gets to the point of where I'm going with the Python issue:
> class(a) <- "i.can.reclass.how.i.want"
> class(a)
[1] "i.can.reclass.how.i.want"
> attr(a, "class")
[1] "i.can.reclass.how.i.want"
So now in Python, let's say I have a data.frame, or as you all put it, a DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({"a":[1,2,3]})
>>> type(df)
pandas.core.frame.DataFrame
Ok, so if I want to determine if my object is a DataFrame:
>>> df = pd.DataFrame({"a":[1,2,3]})
# Get the mro of type(df)? and remove 'object' as an item in the mro tuple
>>> isinstance(df, type(df).__mro__[:-1])
True
# hmmmm
>>> isinstance(df, (pandas.core.frame.DataFrame))
NameError: name 'pandas' is not defined
# hmmm.. aight let's try..
>>> isinstance(df, (pd.core.frame.DataFrame))
True
# Lulz... alright then, I guess i get that, but why did __mro__ pass with pandas vs pd? Not the point...
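As an aside (my reading of the transcript, not part of the question): the NameError comes from name binding, not from isinstance itself, and the same behavior can be reproduced with an aliased stdlib import:

```python
import collections as col  # aliased import, like `import pandas as pd`

d = col.OrderedDict(a=1)

# type(d).__mro__ holds class *objects*, so isinstance never has to
# look up a module name -- that's why the __mro__ version worked:
print(isinstance(d, type(d).__mro__[:-1]))  # True

# The bare name `collections` was never bound, only `col` was, so
# `isinstance(d, collections.OrderedDict)` would raise NameError:
print(isinstance(d, col.OrderedDict))  # True
```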
For when you can't do that
# yes..i know.. 3.5+ os.scandir... focus on bigger picture of this question/issue
>>> import scandir
>>> a = scandir.scandir("/home")
>>> type(a)
posix.ScandirIterator
>>> str(type(scandir.scandir("/home")))
"<class 'scandir.ScandirIterator'>"
>>> isinstance(scandir.scandir("/home"), (scandir,scandir.ScandirIterator))
AttributeError: module 'scandir' has no attribute 'ScandirIterator'
# Okay fair enough.. kinda thought it could work like pandas, maybe can but I can't find it?
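For what it's worth, one pattern that sidesteps the missing attribute entirely (a sketch using the stdlib os.scandir, since the iterator class isn't exported under a public name): capture the type once from a throwaway call and reuse it with isinstance.

```python
import os

# The ScandirIterator class isn't importable by name, but type() on a
# throwaway call hands us the class object itself:
ScandirIterator = type(os.scandir("."))

it = os.scandir(".")
print(isinstance(it, ScandirIterator))  # True
print(isinstance([], ScandirIterator))  # False
it.close()
```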
Question:
Does that mean that my only way of knowing the instance/type of certain objects, like the scandir example, is essentially a type hack like the one below?
import re

def isinstance_from_type(x, class_info):
    # pull "pkg.Class" out of "<class 'pkg.Class'>"
    _chunk = re.search(r"(?<=\s['\"]).*?(?=['\"])", str(type(x)), re.DOTALL)
    try:
        return _chunk.group(0) == str(class_info)
    except AttributeError:  # re.search found no match
        return False
>>> a = scandir.scandir("/home")
>>> type(a) == "scandir.ScandirIterator"
False
>>> isinstance_from_type(a, "scandir.ScandirIterator")
True
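A regex-free variant of the same idea (my own sketch, not from the question): build the dotted name from the type's own attributes instead of parsing str(type(x)):

```python
def full_type_name(x):
    # __module__ and __qualname__ together give e.g. "os.ScandirIterator"
    # without any string parsing of the type's repr.
    t = type(x)
    return "{}.{}".format(t.__module__, t.__qualname__)

print(full_type_name(3.14))  # builtins.float
print(full_type_name([]))    # builtins.list
```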
Okay, I get why I don't get a string back from calling type(), but please let me know if there's a better, more universal and consistent method I simply don't know. As for the hot and dangerous things that come with using a regex: trust me, I get it.
Thanks for reading and any/all feedback about the mechanics of this specific to python are welcomed.
Related
On a project I have a generic function which can take different data types as input. While migrating the project to Python 3 I had an issue with odict_values. I need to convert those to a list; unfortunately, not all data types should be converted. So I decided to do something like this:
if isinstance(data, odict_values):
    data = list(data)
But I get an error: undefined variable odict_values. I don't understand what I should provide as the second argument to isinstance. I can clearly see <class 'odict_values'> if I use type(data). The best solution I've come up with so far is to use:
str(type(data)) == "<class 'odict_values'>"
but it feels wrong.
The odict_values type is not accessible in the built-in types, nor in the collections module.
That means you have to define it yourself:
from collections import OrderedDict
odict_values = type(OrderedDict().values())
You can (and probably should) use a more descriptive name for this type than odict_values.
You can then use this type as the second argument for isinstance checks:
isinstance({1: 1}.values(), odict_values) # False
isinstance(OrderedDict([(1, 1)]).values(), odict_values) # True
If you want a more general test if it's a view on the values of a mapping (like dict and OrderedDict), then you could use the abstract base class ValuesView:
from collections.abc import ValuesView
isinstance({1: 1}.values(), ValuesView) # True
isinstance(OrderedDict([(1, 1)]).values(), ValuesView) # True
I'm just starting with Python and have a question: is it a good idea to design a function to return multiple types of value? I read some information on this site and totally understand that it is better to raise an exception when an error is encountered or a precondition is unsatisfied. But what if there is no error, just multiple possible return types? It is a dummy function, but with the multi_value function I do not need to write something like fun(...)[0] when I need to access the value from the function.
refer:https://docs.quantifiedcode.com/python-anti-patterns/maintainability/returning_more_than_one_variable_type_from_function_call.html
from typing import Union

def multi_value(para: Union[list, int]):
    return para[0] if len(para) == 1 else para

def fun(para: Union[list, int]):
    return para

print(type(multi_value([1, 2, 3])))  # --> <class 'list'>
print(type(multi_value(['1'])))      # --> <class 'str'>
print(type(multi_value([1])))        # --> <class 'int'>
The idea of not having different return types is to make your functions easier to use. If the caller has to check the return type, the code gets unnecessarily complicated, hard to read, and error-prone. Even worse: the caller might not do the check at all and then get caught by surprise. You show a nice example: returning a list or a scalar value. If you do as shown, the caller has to write something like
res = multi_value(x)
try:
    for i in res:
        do_something_with_res(i)
except TypeError:
    do_something_with_res(res)
Given that your function really does not throw anything, this would all collapse to
for i in multi_value(x):
    do_something_with_res(i)
if you returned single (or no) values as lists, too. The advantage should be obvious. You may think you are doing the caller a favor, but that is just not true.
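A sketch of that alternative (names are mine): always return a list, even for a single element, so every caller can iterate unconditionally:

```python
def multi_value_list(para):
    # Always hand back a list -- even one element, even zero --
    # so the caller never has to branch on the return type.
    return list(para)

for item in multi_value_list([1]):     # single element: still iterable
    print(item)
for item in multi_value_list([1, 2]):  # many elements: same code path
    print(item)
```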
A remark on the linked article: I think they gave a sub-optimal example on the matter. The example is more about returning error codes vs. raising exceptions, which is a little different.
TLDR summary
I wrote a function navigateDict that does a safe navigation on a dict, similar to dict.get() but nested. It replaces code like
if 1 in data and 'i' in data[1] and 'a' in data[1]['i']:
    print data[1]['i']['a']
else:
    print "Not found"
with the roughly equivalent
found = navigateDict(data, 1, 'i', 'a')
if found is not None:
    print found
else:
    print "Not found"
Is anything similar to this already part of the standard library?
Is there a more idiomatic way to do the same thing?
Any response that requires typing any path component key more than once is probably a non-answer.
Additional details
The implementation is as follows:
# Allow fallback value other than None
def navigateDictEx(d, keys, fallback=None):
    for key in keys:
        if key in d:
            d = d[key]
        else:
            return fallback
    return d

def navigateDict(d, *keys):
    return navigateDictEx(d, keys)
See the summary for example usage.
Pythonic or not, this function reduces repetition in a place where redundancy is a bad idea. For example, changing one path component requires up to three distinct occurrences to be modified in the original example, but only one in the modified example. Given my regular tendency to err, this is a big win.
Ultimately I'm asking this: Is there something in the standard library that does this, or am I going to need to find a place for it in my project's library?
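For what it's worth, a compact variant of navigateDictEx can be built on functools.reduce (a sketch; the function name and the None fallback are my choices, not from the question):

```python
from functools import reduce

def navigate(d, *keys, fallback=None):
    # Fold the keys over the dict; any lookup failure yields the fallback.
    try:
        return reduce(lambda acc, key: acc[key], keys, d)
    except (KeyError, IndexError, TypeError):
        return fallback

data = {1: {'i': {'a': 'found me'}}}
print(navigate(data, 1, 'i', 'a'))  # found me
print(navigate(data, 1, 'x', 'a'))  # None
```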
If hits are expected to dominate misses
brionius correctly points out that catching KeyError will work:
try:
    print data[1]['i']['a']
except KeyError:
    print "Not found"
This might be the way I go; it's pretty terse and cuts the repetition. However, it does reflect an assumption that there will be more hits than misses. If there's a better way of assuming the opposite I'd like to know that, also.
One way to do this is as follows:
try:
    print data[1]['i']['a']
except KeyError:
    print "Not found!"
It's in line with the spirit of duck-typing. It may or may not be as fast, as I believe handling exceptions carries a certain amount of overhead, but it's certainly "safe".
a solution like this is cool
https://twitter.com/raymondh/status/343823801278140417
>>> from collections import defaultdict
>>> infinite_defaultdict = lambda: defaultdict(infinite_defaultdict)
>>> d = infinite_defaultdict()
>>> d['x']['y']['z'] = 10
>>> if d['x']['y']['z']: print d['x']['y']['z'] #better reflects that misses are common
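One caveat worth knowing (my observation, not part of the original tweet): merely *reading* a missing path on such a defaultdict creates the intermediate keys as a side effect:

```python
from collections import defaultdict

infinite_defaultdict = lambda: defaultdict(infinite_defaultdict)
d = infinite_defaultdict()
d['x']['y']['z'] = 10

# A failed lookup is not a miss here -- it mutates the dict:
_ = d['a']['b']
print('a' in d)  # True -- the read created d['a']
```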
Years late to the game, but for anyone stumbling upon this, there still does not seem to be a native, fluent way to safely navigate a Python dict.
Enter RestResponse:
"RestResponse aims to be a fluent python object for interfacing with RESTful JSON APIs"
This library includes a NoneProp object that allows for safely navigating (and building) JSON data structures.
>>> import RestResponse
>>> data = RestResponse.parse({})
>>> data.property.is_none
None
>>> bool(data.property.is_none)
False
>>> isinstance(data.property.is_none, RestResponse.NoneProp)
True
>>> data.property.is_none = None
>>> isinstance(data.property.is_none, RestResponse.NoneProp)
False
>>> print data.pretty_print()
{
    "property": {
        "is_none": null
    }
}
I'm a programming student, and my teacher is starting with C to teach us programming paradigms. He said it's OK if I deliver my homework in Python (it's easier and faster for the homework). I would like my code to be as close as possible to plain C.
Question is:
How do I declare data types for variables in Python like you do in C? e.g.:
int X,Y,Z;
I know I can do this in python:
x = 0
y = 0
z = 0
But that seems like a lot of work, and it misses the point of Python being easier/faster than C.
So, what's the shortest way to do this?
P.S. I know you don't have to declare the data type in Python most of the time, but I would still like to do it so my code looks as much as possible like my classmates'.
Starting with Python 3.6, you can declare types of variables and functions, like this:
explicit_number: type
or for a function
def function(explicit_number: type) -> type:
    pass
This example, from the post How to Use Static Type Checking in Python 3.6, is more explicit:
from typing import Dict

def get_first_name(full_name: str) -> str:
    return full_name.split(" ")[0]

fallback_name: Dict[str, str] = {
    "first_name": "UserFirstName",
    "last_name": "UserLastName"
}

raw_name: str = input("Please enter your name: ")
first_name: str = get_first_name(raw_name)

# If the user didn't type anything in, use the fallback name
if not first_name:
    first_name = fallback_name["first_name"]

print(f"Hi, {first_name}!")
See the docs for the typing module
Edit: Python 3.5 introduced type hints, which provide a way to specify the type of a variable. This answer was written before this feature became available.
There is no way to declare variables in Python, since neither "declaration" nor "variables" in the C sense exist. This will bind the three names to the same object:
x = y = z = 0
Simply said: Typing in python is useful for hinting only.
x: int = 0
y: int = 0
z: int = 0
Python isn't necessarily easier/faster than C, though it's possible that it's simpler ;)
To clarify another statement you made, "you don't have to declare the data type" - it should be restated that you can't declare the data type. When you assign a value to a variable, the type of the value becomes the type of the variable. It's a subtle difference, but different nonetheless.
I'm surprised no one has pointed out that you actually can do this:
decimalTwenty = float(20)
In a lot of cases it is meaningless to type a variable, as it can be rebound at any time. However, in the above example it could be useful. There are other type-conversion functions like this, such as int(), float(), and complex() (and long() in Python 2).
But strong types and variable definitions are actually there to make development easier. If you haven't thought these things through in advance you're not designing and developing code but merely hacking.
Loose types simply shift the complexity from "design/hack" time to run time.
Everything in Python is an object, and that includes classes, class instances, code in functions, libraries of functions called modules, as well as data values like integers, floating-point numbers, strings, or containers like lists and dictionaries. It even includes namespaces, which are dictionary-like (or mapping) containers used to keep track of the associations between identifier names (character string objects) and the objects that currently exist. An object can even have multiple names if two or more identifiers become associated with the same object.
Associating an identifier with an object is called "binding a name to the object". That's the closest thing to a variable declaration there is in Python. Names can be associated with different objects at different times, so it makes no sense to declare what type of data you're going to attach one to -- you just do it. Often it's done in one line or block of code which specifies both the name and a definition of the object's value causing it to be created, like <variable> = 0 or a function starting with a def <funcname>.
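The "multiple names, one object" point can be seen directly (a minimal sketch):

```python
x = [1, 2, 3]
y = x                 # binds a second name to the *same* list object
y.append(4)
print(x)              # [1, 2, 3, 4] -- mutation is visible via both names
print(x is y)         # True -- one object, two names
x = "something else"  # rebinding x does not touch the list y still names
print(y)              # [1, 2, 3, 4]
```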
Here is how this helps:
I use data types to assert unique values in Python 2 and 3. Otherwise I can't make them work like str or int types. However, if you need to check for a value that can have any type except a specific one, they are mighty useful and make code read better.
Inheriting from object will make a type in Python:
class unset(object):
    pass

>>> print type(unset)
<type 'type'>
Example use: you might want to conditionally filter or print a value using a condition or a function handler, so using a type as a default value will be useful.
from __future__ import print_function  # make python2/3 compatible

class unset(object):
    pass

def some_func(a, b, show_if=unset):
    result = a + b
    ## just return it
    if show_if is unset:
        return result
    ## handle show_if to conditionally output something
    if hasattr(show_if, '__call__'):
        if show_if(result):
            print("show_if %s = %s" % (show_if.__name__, result))
    elif show_if:
        print(show_if, " condition met ", result)
    return result

print("Are > 5")
for i in range(10):
    result = some_func(i, 2, show_if=i > 5)

def is_even(val):
    return not val % 2

print("Are even")
for i in range(10):
    result = some_func(i, 2, show_if=is_even)
Output
Are > 5
True condition met 8
True condition met 9
True condition met 10
True condition met 11
Are even
show_if is_even = 2
show_if is_even = 4
show_if is_even = 6
show_if is_even = 8
show_if is_even = 10
show_if=unset is a perfect use case for this because it's safer and reads well. I have also used them for enums, which are not really a thing in Python.
I have a Python function that takes a numeric argument that must be an integer in order for it behave correctly. What is the preferred way of verifying this in Python?
My first reaction is to do something like this:
def isInteger(n):
    return int(n) == n
But I can't help thinking that this is 1) expensive 2) ugly and 3) subject to the tender mercies of machine epsilon.
Does Python provide any native means of type checking variables? Or is this considered to be a violation of the language's dynamically typed design?
EDIT: since a number of people have asked - the application in question works with IPv4 prefixes, sourcing data from flat text files. If any input is parsed into a float, that record should be viewed as malformed and ignored.
isinstance(n, int)
If you need to know whether it's definitely an actual int and not a subclass of int (generally you shouldn't need to do this):
type(n) is int
this:
return int(n) == n
isn't such a good idea, as cross-type comparisons can be true - notably int(3.0)==3.0
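Both caveats are easy to verify (a quick sketch; the bool note is mine, not the answer's):

```python
# Cross-type equality: int(n) == n is True for whole-valued floats,
# so it cannot distinguish 3 from 3.0:
print(int(3.0) == 3.0)        # True

# And isinstance accepts subclasses -- bool is a subclass of int:
print(isinstance(True, int))  # True
print(type(True) is int)      # False
```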
Yeah, as Evan said, don't type check. Just try to use the value:
def myintfunction(value):
    """ Please pass an integer """
    return 2 + value
That doesn't have a typecheck. It is much better! Let's see what happens when I try it:
>>> myintfunction(5)
7
That works, because it is an integer. Hm. Let's try some text.
>>> myintfunction('text')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in myintfunction
TypeError: unsupported operand type(s) for +: 'int' and 'str'
It shows an error, a TypeError, which is what it should do anyway. If the caller wants to catch that, it is possible.
What would you do if you did a typecheck? Show an error, right? So you don't have to typecheck; the error already shows up automatically.
Plus since you didn't typecheck, you have your function working with other types:
Floats:
>>> print myintfunction(2.2)
4.2
Complex numbers:
>>> print myintfunction(5j)
(2+5j)
Decimals:
>>> import decimal
>>> myintfunction(decimal.Decimal('15'))
Decimal("17")
Even completely arbitrary objects that can add numbers!
>>> class MyAdderClass(object):
...     def __radd__(self, value):
...         print 'got some value: ', value
...         return 25
...
>>> m = MyAdderClass()
>>> print myintfunction(m)
got some value: 2
25
So you clearly get nothing by typechecking. And lose a lot.
UPDATE:
Since you've edited the question, it is now clear that your application calls some upstream routine that makes sense only with ints.
That being the case, I still think you should pass the parameter as received to the upstream function. The upstream function will deal with it correctly, e.g., raising an error if it needs to. I highly doubt that your function that deals with IPs will behave strangely if you pass it a float. If you can give us the name of the library, we can check that for you.
But... if the upstream function will behave incorrectly and kill some kids if you pass it a float (I still highly doubt it), then just call int() on it:
def myintfunction(value):
    """ Please pass an integer """
    return upstreamfunction(int(value))
You're still not typechecking, so you get most benefits of not typechecking.
If even after all that, you really want to type check, despite it reducing your application's readability and performance for absolutely no benefit, use an assert to do it.
assert isinstance(...)
assert type() is xxxx
That way we can turn off asserts and remove this <sarcasm>feature</sarcasm> from the program by calling it as
python -OO program.py
Python now supports gradual typing via the typing module and mypy. The typing module is a part of the stdlib as of Python 3.5 and can be downloaded from PyPi if you need backports for Python 2 or previous version of Python 3. You can install mypy by running pip install mypy from the command line.
In short, if you want to verify that some function takes in an int, a float, and returns a string, you would annotate your function like so:
def foo(param1: int, param2: float) -> str:
    return "testing {0} {1}".format(param1, param2)
If your file was named test.py, you could then typecheck once you've installed mypy by running mypy test.py from the command line.
If you're using an older version of Python without support for function annotations, you can use type comments to accomplish the same effect:
def foo(param1, param2):
    # type: (int, float) -> str
    return "testing {0} {1}".format(param1, param2)
You use the same command mypy test.py for Python 3 files, and mypy --py2 test.py for Python 2 files.
The type annotations are ignored entirely by the Python interpreter at runtime, so they impose minimal to no overhead -- the usual workflow is to work on your code and run mypy periodically to catch mistakes and errors. Some IDEs, such as PyCharm, will understand type hints and can alert you to problems and type mismatches in your code while you're directly editing.
If, for some reason, you need the types to be checked at runtime (perhaps you need to validate a lot of input?), you should follow the advice listed in the other answers -- e.g. use isinstance, issubclass, and the like. There are also some libraries such as enforce that attempt to perform typechecking (respecting your type annotations) at runtime, though I'm uncertain how production-ready they are as of time of writing.
For more information and details, see the mypy website, the mypy FAQ, and PEP 484.
if type(n) is int
This checks if n is a Python int, and only an int. It won't accept subclasses of int.
Type-checking, however, does not fit the "Python way". You better use n as an int, and if it throws an exception, catch it and act upon it.
Don't type check. The whole point of duck typing is that you shouldn't have to. For instance, what if someone did something like this:
class MyInt(int):
# ... extra stuff ...
Programming in Python and performing typechecking as you might in other languages does seem like choosing a screwdriver to bang a nail in with. It is more elegant to use Python's exception handling features.
From an interactive command line, you can run a statement like:
int('sometext')
That will generate an error - ipython tells me:
<type 'exceptions.ValueError'>: invalid literal for int() with base 10: 'sometext'
Now you can write some code like:
try:
    int(myvar) + 50
except ValueError:
    print "Not a number"
That can be customised to perform whatever operations are required AND to catch any errors that are expected. It looks a bit convoluted but fits the syntax and idioms of Python and results in very readable code (once you become used to speaking Python).
I would be tempted to do something like:
def check_and_convert(x):
    x = int(x)
    assert 0 <= x <= 255, "must be between 0 and 255 (inclusive)"
    return x

class IPv4(object):
    """IPv4 CIDR prefixes are A.B.C.D/E where A-D are
    integers in the range 0-255, and E is an int
    in the range 0-32."""
    def __init__(self, a, b, c, d, e=0):
        self.a = check_and_convert(a)
        self.b = check_and_convert(b)
        self.c = check_and_convert(c)
        self.d = check_and_convert(d)
        e = int(e)
        assert 0 <= e <= 32, "must be between 0 and 32 (inclusive)"
        self.e = e
That way when you are using it anything can be passed in yet you only store a valid integer.
how about:
def ip(string):
    subs = string.split('.')
    if len(subs) != 4:
        raise ValueError("incorrect input")
    out = tuple(int(v) for v in subs if 0 <= int(v) <= 255)
    if len(out) != 4:
        raise ValueError("incorrect input")
    return out
Of course, there is also the standard isinstance(3, int) function...
For those who are looking to do this with an assert statement, here is how you can efficiently place a variable type check in your code without defining any additional functions. This will stop your code from running if the assertion fails.
assert type(X) == int
If no error was raised, the code continues to run. Other than that, the unittest module is a very useful tool for this sort of thing.