Assume the following class:
import enum

class PersistenceType(enum.Enum):
    keyring = 1
    file = 2

    def __str__(self):
        type2String = {PersistenceType.keyring: "keyring", PersistenceType.file: "file"}
        return type2String[self]

    @staticmethod
    def from_string(type):
        if type == "keyring":
            return PersistenceType.keyring
        if type == "file":
            return PersistenceType.file
        raise ???
Being a Python noob, I am simply wondering: what specific kind of exception should be raised here?
The short answer is ValueError:
Raised when a built-in operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.
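Applied to the question's from_string, that could look like this (a minimal sketch; the message text is just an example):

@staticmethod
def from_string(type):
    if type == "keyring":
        return PersistenceType.keyring
    if type == "file":
        return PersistenceType.file
    raise ValueError(f"unknown persistence type: {type!r}")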
The longer answer is that almost none of that class should exist. Consider:
class PersistenceType(enum.Enum):
    keyring = 1
    file = 2
This gives you everything your customized enum does:
To get the same result as your customized __str__ method, just use the name property:
>>> PersistenceType.keyring.name
'keyring'
To get a member of the enum using its name, treat the enum as a dict:
>>> PersistenceType['keyring']
<PersistenceType.keyring: 1>
Using the built-in abilities of enum.Enum gives you several advantages:
You're writing much less code.
You aren't repeating the names of the enum members all over the place, so you aren't going to miss anything if you modify it at some point.
Users of your enum, and readers of code that uses it, don't need to remember or look up any customized methods.
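As a bonus, the built-in lookups already raise sensible exceptions, so there is nothing to hand-roll: a failed name lookup raises KeyError, and a failed value lookup raises ValueError (tracebacks abbreviated):

>>> PersistenceType['cloud']
Traceback (most recent call last):
  ...
KeyError: 'cloud'
>>> PersistenceType(3)
Traceback (most recent call last):
  ...
ValueError: 3 is not a valid PersistenceType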
If you're coming to Python from Java, it's always worth bearing in mind that:
Python Is Not Java (or, stop writing so much code)
Guido1 has a time machine (or, stop writing so much code)
1 … or in this case Ethan Furman, the author of the enum module.
What I do is detect what is unpicklable and convert it to a string (I guess I could have deleted it too, but then the code would falsely tell me the field didn't exist; I'd rather have it exist, but as a string). But I wanted to know if there is a less hacky, more official way to do this.
Current code I use:
import argparse
from argparse import Namespace
from typing import Any

import dill

def make_args_pickable(args: Namespace) -> Namespace:
    """
    Returns a copy of the args namespace but with unpicklable objects replaced by strings.
    Note: implementation not tested against deep copying.
    Ref:
        - https://stackoverflow.com/questions/70128335/what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    pickable_args = argparse.Namespace()
    # go through the fields in args; if a field is not picklable, replace it with its string form
    # (the vars() function returns the __dict__ attribute of the given object)
    for field in vars(args):
        field_val: Any = getattr(args, field)
        if not dill.pickles(field_val):
            field_val = str(field_val)
        setattr(pickable_args, field, field_val)
    return pickable_args
Context: I do it mostly to remove the annoying tensorboard object I carry around (though I don't think I will need the .tb field anymore thanks to wandb/Weights & Biases). Not that this matters a lot, but context is always nice.
Related:
What does it mean for an object to be picklable (or pickle-able)?
Python - How can I make this un-pickleable object pickleable?
Edit:
Since I decided to move away from dill (sometimes it cannot recover classes/objects, probably because it cannot save their code or something), I decided to use only pickle (which seems to be the recommended approach in PyTorch).
So what is the official (perhaps optimized) way to check for pickables without dill or with the official pickle?
Is this the best:
import pickle

def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True
Thus, my current solution:
def make_args_pickable(args: Namespace) -> Namespace:
    """
    Returns a copy of the args namespace but with unpicklable objects replaced by strings.
    Note: implementation not tested against deep copying.
    Ref:
        - https://stackoverflow.com/questions/70128335/what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    pickable_args = argparse.Namespace()
    # go through the fields in args; if a field is not picklable, replace it with its string form
    # (the vars() function returns the __dict__ attribute of the given object)
    for field in vars(args):
        field_val: Any = getattr(args, field)
        # if the current field value is not picklable, make it picklable by casting to string
        if not dill.pickles(field_val):
            field_val = str(field_val)
        elif not is_picklable(field_val):
            field_val = str(field_val)
        # after this line the invariant is that field_val is picklable, so set it on the new args object
        setattr(pickable_args, field, field_val)
    return pickable_args

def make_opts_pickable(opts):
    """Makes a namespace picklable."""
    return make_args_pickable(opts)

def is_picklable(obj: Any) -> bool:
    """
    Checks if something is picklable.
    Ref:
        - https://stackoverflow.com/questions/70128335/what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    import pickle
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True
Note: one of the reasons I want something "official"/tested is that PyCharm halts on the try/except (see How to stop PyCharm's break/stop/halt feature on handled exceptions (i.e. only break on python unhandled exceptions)?), which is not what I want. I want it to halt only on unhandled exceptions.
What is the proper way to make an object with unpickable fields pickable?
I believe the answer to this belongs in the question you linked -- Python - How can I make this un-pickleable object pickleable?. I've added a new answer to that question explaining how you can make an unpicklable object picklable the proper way, without using __reduce__.
So what is the official (perhaps optimized) way to check for pickables without dill or with the official pickle?
Objects that are picklable are defined in the docs as follows:
None, True, and False
integers, floating point numbers, complex numbers
strings, bytes, bytearrays
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module (using def, not lambda)
built-in functions defined at the top level of a module
classes that are defined at the top level of a module
instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).
The tricky parts are (1) knowing how functions/classes are defined (you can probably use the inspect module for that) and (2) recursing through objects, checking against the rules above.
There are a lot of caveats to this, such as the pickle protocol version, and whether the object is an extension type (defined in a C extension like numpy, for example) or an instance of a 'user-defined' class. Use of __slots__ can also impact whether an object is picklable (since __slots__ means there's no __dict__), though such objects can still be pickled with __getstate__. Some objects may also be registered with a custom function for pickling, so you'd need to know whether that has happened as well.
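For illustration, here is a rough, incomplete sketch of part (1), checking whether a function or class is reachable by name the way pickle requires (looks_importable is a hypothetical helper, and it ignores all the caveats above):

import sys

def looks_importable(obj) -> bool:
    # pickle stores functions and classes "by reference" (module + qualified name),
    # so the object must be reachable at the top level of its module
    module_name = getattr(obj, '__module__', None)
    qualname = getattr(obj, '__qualname__', None)
    if module_name is None or qualname is None or '.' in qualname:
        return False  # nested, anonymous, or not a function/class at all
    module = sys.modules.get(module_name)
    return module is not None and getattr(module, qualname, None) is obj

looks_importable(len)          # True: reachable as builtins.len
looks_importable(lambda x: x)  # False: a lambda has no importable name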
Technically, you can implement a function to check for all of this in Python, but it will be quite slow by comparison. The easiest (and probably most performant, as pickle is implemented in C) way to do this is to simply attempt to pickle the object you want to check.
I tested this with PyCharm pickling all kinds of things... it doesn't halt with this method. The key is that you must anticipate pretty much any kind of exception (see footnote 3 in the docs). The warnings are optional, they're mostly explanatory for the context of this question.
import pickle
import warnings
from typing import Any

def is_picklable(obj: Any) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, pickle.PickleError, AttributeError, ImportError):
        # https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
        return False
    except RecursionError:
        warnings.warn(
            f"Could not determine if object of type {type(obj)!r} is picklable "
            "due to a RecursionError that was suppressed. "
            "Setting a higher recursion limit MAY allow this object to be pickled."
        )
        return False
    except Exception as e:
        # https://docs.python.org/3/library/pickle.html#id9
        warnings.warn(
            f"An error occurred while attempting to pickle an "
            f"object of type {type(obj)!r}. Assuming it's unpicklable. The exception was: {e}"
        )
        return False
Using the example from my other answer linked above, you could make your object picklable by implementing __getstate__ and __setstate__ (or by subclassing and adding them, or by making a wrapper class), adapting your make_args_pickable...
from argparse import Namespace

class Unpicklable:
    """
    A simple marker class so we can distinguish when a deserialized object
    is a string because it was originally unpicklable
    (and not simply a string to begin with).
    """
    def __init__(self, obj_str: str):
        self.obj_str = obj_str

    def __str__(self):
        return self.obj_str

    def __repr__(self):
        return f'Unpicklable(obj_str={self.obj_str!r})'

class PicklableNamespace(Namespace):
    def __getstate__(self):
        """For serialization."""
        # always make a copy so you don't accidentally modify state
        state = self.__dict__.copy()
        # any unpicklables will be converted to an ``Unpicklable`` object
        # with its str() form stored in the object
        for key, val in state.items():
            if not is_picklable(val):
                state[key] = Unpicklable(str(val))
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)  # or leave unimplemented
In action, I'll pickle a namespace whose attributes contain a file handle (normally not picklable) and then load the pickle data.
# Normally file handles are not picklable
p = PicklableNamespace(f=open('test.txt'))
data = pickle.dumps(p)
del p
loaded_p = pickle.loads(data)
print(loaded_p)
# PicklableNamespace(f=Unpicklable(obj_str="<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>"))
Yes, a try/except is the best way to go about this.
Per the docs, pickle is capable of recursively pickling objects: if you have a list of picklable objects, pickling the list will pickle every object inside it. This means that you cannot feasibly test whether an object is picklable without actually pickling it. Because of that, your structure of:
def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True
is the simplest and easiest way to go about checking this. If you are not working with recursive structures, and/or you can safely assume that all recursive structures will only contain picklable objects, you could check the object's type() against the list of picklable types:
None, True, and False
integers, floating point numbers, complex numbers
strings, bytes, bytearrays
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module (using def, not lambda)
built-in functions defined at the top level of a module
classes that are defined at the top level of a module
instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).
This is likely faster than using a try:... except:... like you showed in your question; a rough sketch of such a check follows.
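For illustration (is_probably_picklable is a hypothetical helper; it covers only the documented built-in types and falls back to actually pickling everything else):

import pickle

PICKLABLE_SCALARS = (type(None), bool, int, float, complex, str, bytes, bytearray)

def is_probably_picklable(obj):
    if isinstance(obj, PICKLABLE_SCALARS):
        return True
    if isinstance(obj, (tuple, list, set, frozenset)):
        return all(is_probably_picklable(item) for item in obj)
    if isinstance(obj, dict):
        return all(is_probably_picklable(k) and is_probably_picklable(v)
                   for k, v in obj.items())
    # anything else (functions, classes, instances): just try it
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False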
For me, no matter the error, I want my function to tell me the object is not picklable. So it seems to work if I do this:
def is_picklable(obj: Any) -> bool:
    """
    Checks if something is picklable.
    Ref:
        - https://stackoverflow.com/questions/70128335/what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    - PyCharm halting all the time: https://stackoverflow.com/questions/70761481/how-to-stop-pycharms-break-stop-halt-feature-on-handled-exceptions-i-e-only-b
    """
    import pickle
    try:
        pickle.dumps(obj)
    except:
        return False
    return True
Plus, as an added bonus, it doesn't freak PyCharm out; see How to stop PyCharm's break/stop/halt feature on handled exceptions (i.e. only break on python unhandled exceptions)? for details.
I want to clean up each parameter before passing it to the class methods. Right now I have something like this:
from cerberus import Validator

class MyValidator(Validator):  # Validator - external lib (has its own _validate methods)
    def _validate_type_name(self, value):
        # validation code here
        # goal: clean up value before passing to each method (mine + from lib), e.g. value.strip()
        ...

schema = {"name": {"type": "name"},           # name - custom type from _validate_type_name
          "planet_type": {"type": "string"}}  # string - external lib type

my_dict = {"name": " Mars ",
           "planet_type": " terrestrial "}

v = MyValidator(schema)
print(v.validate(my_dict))  # True / False
# NOTE: I would like to do cleanup at the type-method level (not pass it to the schema)
I would like to clean up the data before it is passed to the MyValidator methods (e.g., a simple strip), but I don't want to make it a separate step (in case someone forgets to execute it before calling validation). I'd like to integrate the cleanup with the validation methods (external ones + mine).
I was considering either a decorator on the class or a metaclass, but maybe there's a better approach. I don't have much experience here, so I'm asking for your advice.
If your goal is to make sure that the caller does the cleaning (i.e. you want them to "clean" their own copy of the value rather than having you return a modified version to them, which necessitates that it happen outside your function), then a decorator can't do much more than enforcement -- i.e. you can wrap all the functions such that a runtime exception is raised if an invalid value comes through.
The way that I'd tackle this instead of a decorator would be with types (which requires that you include mypy in your testing process, but you should be doing that anyway IMO). Something like:
from typing import NewType

CleanString = NewType('CleanString', str)

def clean(value: str) -> CleanString:
    """Does cleanup on a raw string to make it a 'clean' string"""
    value = value.strip()
    # whatever else
    return CleanString(value)
class MyValidator(Validator):
    def validate_name(self, value: CleanString) -> bool:
        # this will now flag a mypy error if someone passes a plain str to it,
        # saying a 'str' was provided where a 'CleanString' was required!
        ...
Static typing has the advantage of raising an error before the code is even executed, and regardless of the actual runtime value.
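For instance, with the definitions above, the raw string is rejected before the program ever runs (mypy message approximate):

v = MyValidator(schema)
v.validate_name(" Mars ")         # mypy: Argument 1 has incompatible type "str"; expected "CleanString"
v.validate_name(clean(" Mars "))  # OK: clean() is the only way to produce a CleanString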
I want to write a number of related parse functions that take text and return objects or raise exceptions, rather like int() and float() do. I anticipate being able to supply these recursively to higher-level parsers. I want to be able to configure them at run time, and have either their docstrings, or some other attribute, settable to report how they've been configured.
Python's 'There should be one—and preferably only one—obvious way to do it' has let me down here.
I appear to be able to do exactly the same thing with either a class with a __call__ method, or a function that returns a function.
For instance, my two attempts at a toy range-constrained number parser are below.
class Parser():
    def __init__(self, nType=int, nRange=None):
        self.nType = nType
        self.nRange = nRange
        self.__doc__ = 'class - range is {}'.format(str(nRange))

    def __call__(self, inStr):
        x = self.nType(inStr)
        if self.nRange:
            if not self.nRange[0] <= x <= self.nRange[1]:
                raise ValueError('{} is out of range (class)'.format(inStr))
        return x

def parserFactory(nType=int, nRange=None):
    def parser(inStr):
        x = nType(inStr)
        if nRange:
            if not nRange[0] <= x <= nRange[1]:
                raise ValueError('{} is out of range (factory)'.format(inStr))
        return x
    parser.__doc__ = 'factory - range is {}'.format(str(nRange))
    return parser
a = Parser()
b = Parser(nRange=(3, 6), nType=float)
c = parserFactory(nType=float)
d = parserFactory(nRange=(3, 6))

for string in ['4', '14']:
    for x in [a, b, c, d, int]:
        print(x.__doc__[:35])
        try:
            print(string, x(string))
        except ValueError as error:
            print(error)
Both do what I want. Both have more or less the same complexity, and essentially the same statements, albeit in a different order. The factory is slightly shorter. I don't anticipate needing to use any other class methods. I don't see any clear way to choose which is 'better'.
Is one or the other more pythonic?
Is one or the other more likely to run me into difficulty if (when) I try to modify them in yet unanticipated ways?
What do most people do?
I'm a fairly inexperienced programmer. I've read wikipedia's entry on 'factory method pattern' and the subtleties in it go straight over my head.
(edit) Having read comments, answers and links, I think one of the problems is that neither is a good fit. You would not expect a class to have so few methods, even though it can. You would not expect a function to be carrying an attribute, even though it can. As the syntax is so similar, it probably doesn't matter which I use initially, as I can switch without a change in behaviour. (/edit)
You can think of functions as syntactic sugar for classes with only an __init__ and __call__. That would also be true for generators vs classes, context managers vs classes, ...
If you are only passing the parser around and calling it someplace (i.e. doing function things), then you should use the factory. It also allows you to migrate to the class later easily: your factory can simply return the class.
If, besides calling it, you need to inspect or change the values of the parser in other parts of your code, then you should go with classes.
All that said, in this specific case you showed here, I think I would use functools.partial:
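A minimal sketch of what that could look like, assuming the range check from the question is factored into a single plain function (parse_number is a hypothetical name):

from functools import partial

def parse_number(inStr, nType=int, nRange=None):
    x = nType(inStr)
    if nRange is not None and not nRange[0] <= x <= nRange[1]:
        raise ValueError('{} is out of range {}'.format(inStr, nRange))
    return x

b = partial(parse_number, nType=float, nRange=(3, 6))
print(b('4'))   # 4.0
b('14')         # raises ValueError: 14 is out of range (3, 6)

A partial also keeps its configuration inspectable through its .func, .args, and .keywords attributes, which covers much of what the hand-set __doc__ was doing.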
I'm starting to port some code from Python2.x to Python3.x, but before I make the jump I'm trying to modernise it to recent 2.7. I'm making good progress with the various tools (e.g. futurize), but one area they leave alone is the use of buffer(). In Python3.x buffer() has been removed and replaced with memoryview() which in general looks to be cleaner, but it's not a 1-to-1 swap.
One way in which they differ is:
In [1]: a = "abcdef"
In [2]: b = buffer(a)
In [3]: m = memoryview(a)
In [4]: print b, m
abcdef <memory at 0x101b600e8>
That is, str(<buffer object>) returns a byte-string containing the contents of the object, whereas memoryviews return their repr(). I think the new behaviour is better, but it's causing issues.
In particular I've got some code which is throwing an exception because it's receiving a byte-string containing <memory at 0x1016c95a8>. That suggests that there's a piece of code somewhere else that is relying on this behaviour to work, but I'm having real trouble finding it.
Does anybody have a good debugging trick for this type of problem?
One possible trick is to write a subclass of memoryview and temporarily change all your memoryview instances to, let's say, memoryview_debug versions:
class memoryview_debug(memoryview):
    def __init__(self, string):
        memoryview.__init__(self, string)

    def __str__(self):
        # ... place a breakpoint, log the call, print stack trace, etc.
        return memoryview.__str__(self)
EDIT:
As noted by the OP, it is apparently impossible to subclass memoryview. Fortunately, thanks to dynamic typing, that's not a big problem in Python; it will just be more inconvenient. You can change inheritance to composition:
class memoryview_debug:
    def __init__(self, string):
        self.innerMemoryView = memoryview(string)

    def tobytes(self):
        return self.innerMemoryView.tobytes()

    def tolist(self):
        return self.innerMemoryView.tolist()

    # some other methods if used by your code,
    # and if overridden in the memoryview implementation (e.g. __len__?)

    def __str__(self):
        # ... place a breakpoint, log the call, print stack trace, etc.
        return self.innerMemoryView.__str__()
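As one concrete filling for that placeholder, traceback.print_stack() logs the call chain that reached __str__, which is exactly the information being hunted here (a minimal self-contained sketch):

import traceback

class memoryview_debug:
    def __init__(self, data):
        self.innerMemoryView = memoryview(data)

    def __str__(self):
        traceback.print_stack()  # show who is stringifying the view
        return str(self.innerMemoryView)

m = memoryview_debug(b'abcdef')
s = '{}'.format(m)  # the printed stack trace ends at this line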
I have a Python function that takes a numeric argument that must be an integer in order for it behave correctly. What is the preferred way of verifying this in Python?
My first reaction is to do something like this:
def isInteger(n):
    return int(n) == n
But I can't help thinking that this is 1) expensive, 2) ugly, and 3) subject to the tender mercies of machine epsilon.
Does Python provide any native means of type checking variables? Or is this considered to be a violation of the language's dynamically typed design?
EDIT: since a number of people have asked - the application in question works with IPv4 prefixes, sourcing data from flat text files. If any input is parsed into a float, that record should be viewed as malformed and ignored.
isinstance(n, int)
If you need to know whether it's definitely an actual int and not a subclass of int (generally you shouldn't need to do this):
type(n) is int
this:
return int(n) == n
isn't such a good idea, as cross-type comparisons can be true - notably int(3.0)==3.0
Yeah, as Evan said, don't type check. Just try to use the value:
def myintfunction(value):
    """ Please pass an integer """
    return 2 + value
That doesn't have a typecheck. It is much better! Let's see what happens when I try it:
>>> myintfunction(5)
7
That works, because it is an integer. Hm. Let's try some text.
>>> myintfunction('text')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in myintfunction
TypeError: unsupported operand type(s) for +: 'int' and 'str'
It shows an error, TypeError, which is what it should do anyway. If the caller wants to catch that, it is possible.
What would you do if you did a typecheck? Show an error right? So you don't have to typecheck because the error is already showing up automatically.
Plus since you didn't typecheck, you have your function working with other types:
Floats:
>>> print myintfunction(2.2)
4.2
Complex numbers:
>>> print myintfunction(5j)
(2+5j)
Decimals:
>>> import decimal
>>> myintfunction(decimal.Decimal('15'))
Decimal("17")
Even completely arbitrary objects that can add numbers!
>>> class MyAdderClass(object):
... def __radd__(self, value):
... print 'got some value: ', value
... return 25
...
>>> m = MyAdderClass()
>>> print myintfunction(m)
got some value: 2
25
So you clearly get nothing by typechecking. And lose a lot.
UPDATE:
Since you've edited the question, it is now clear that your application calls some upstream routine that makes sense only with ints.
That being the case, I still think you should pass the parameter as received to the upstream function. The upstream function will deal with it correctly e.g. raising an error if it needs to. I highly doubt that your function that deals with IPs will behave strangely if you pass it a float. If you can give us the name of the library we can check that for you.
But... if the upstream function will behave incorrectly and kill some kids if you pass it a float (I still highly doubt it), then just call int() on it:
def myintfunction(value):
    """ Please pass an integer """
    return upstreamfunction(int(value))
You're still not typechecking, so you get most benefits of not typechecking.
If even after all that, you really want to type check, despite it reducing your application's readability and performance for absolutely no benefit, use an assert to do it.
assert isinstance(...)
assert type() is xxxx
That way we can turn off asserts and remove this <sarcasm>feature</sarcasm> from the program by calling it as
python -OO program.py
Python now supports gradual typing via the typing module and mypy. The typing module is part of the stdlib as of Python 3.5 and can be downloaded from PyPI if you need backports for Python 2 or previous versions of Python 3. You can install mypy by running pip install mypy from the command line.
In short, if you want to verify that some function takes in an int, a float, and returns a string, you would annotate your function like so:
def foo(param1: int, param2: float) -> str:
    return "testing {0} {1}".format(param1, param2)
If your file was named test.py, you could then typecheck once you've installed mypy by running mypy test.py from the command line.
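For instance, if test.py also contained a bad call, mypy would flag it at the call site (output approximate):

foo('hi', 3.2)

$ mypy test.py
test.py:3: error: Argument 1 to "foo" has incompatible type "str"; expected "int"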
If you're using an older version of Python without support for function annotations, you can use type comments to accomplish the same effect:
def foo(param1, param2):
    # type: (int, float) -> str
    return "testing {0} {1}".format(param1, param2)
You use the same command mypy test.py for Python 3 files, and mypy --py2 test.py for Python 2 files.
The type annotations are ignored entirely by the Python interpreter at runtime, so they impose minimal to no overhead -- the usual workflow is to work on your code and run mypy periodically to catch mistakes and errors. Some IDEs, such as PyCharm, will understand type hints and can alert you to problems and type mismatches in your code while you're directly editing.
If, for some reason, you need the types to be checked at runtime (perhaps you need to validate a lot of input?), you should follow the advice listed in the other answers -- e.g. use isinstance, issubclass, and the like. There are also some libraries such as enforce that attempt to perform typechecking (respecting your type annotations) at runtime, though I'm uncertain how production-ready they are as of time of writing.
For more information and details, see the mypy website, the mypy FAQ, and PEP 484.
if type(n) is int
This checks if n is a Python int, and only an int. It won't accept subclasses of int.
Type-checking, however, does not fit the "Python way". It is better to use n as an int, and if that throws an exception, catch it and act upon it.
Don't type check. The whole point of duck typing is that you shouldn't have to. For instance, what if someone did something like this:
class MyInt(int):
    # ... extra stuff ...
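An isinstance check still accepts such a subclass, while a strict type() comparison rejects it, which is usually not what the caller intended:

>>> class MyInt(int):
...     pass
...
>>> n = MyInt(7)
>>> isinstance(n, int)
True
>>> type(n) is int
False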
Programming in Python and performing typechecking as you might in other languages does seem like choosing a screwdriver to bang a nail in with. It is more elegant to use Python's exception handling features.
From an interactive command line, you can run a statement like:
int('sometext')
That will generate an error - ipython tells me:
<type 'exceptions.ValueError'>: invalid literal for int() with base 10: 'sometext'
Now you can write some code like:
try:
    int(myvar) + 50
except ValueError:
    print "Not a number"
That can be customised to perform whatever operations are required AND to catch any errors that are expected. It looks a bit convoluted but fits the syntax and idioms of Python and results in very readable code (once you become used to speaking Python).
I would be tempted to do something like:
def check_and_convert(x):
    x = int(x)
    assert 0 <= x <= 255, "must be between 0 and 255 (inclusive)"
    return x

class IPv4(object):
    """IPv4 CIDR prefixes are A.B.C.D/E where A-D are
    integers in the range 0-255, and E is an int
    in the range 0-32."""

    def __init__(self, a, b, c, d, e=0):
        self.a = check_and_convert(a)
        self.b = check_and_convert(b)
        self.c = check_and_convert(c)
        self.d = check_and_convert(d)
        e = int(e)
        assert 0 <= e <= 32, "must be between 0 and 32 (inclusive)"
        self.e = e
That way, anything can be passed in when you use it, yet you only ever store a valid integer.
How about:
def ip(string):
    subs = string.split('.')
    if len(subs) != 4:
        raise ValueError("incorrect input")
    out = tuple(int(v) for v in subs if 0 <= int(v) <= 255)
    if len(out) != 4:
        raise ValueError("incorrect input")
    return out
Of course, there is also the standard isinstance(3, int) check...
For those who are looking to do this with the assert statement, here is how you can place a variable type check in your code without defining any additional functions. This will stop your code from running past this point if the assertion fails:
assert type(X) is int
If no error is raised, the code continues to run. Other than that, the unittest module is a very useful tool for this sort of thing.
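On the unittest side, assertIsInstance performs the same kind of check inside a test (parse_port here is a hypothetical function, just for illustration):

import unittest

def parse_port(text):
    return int(text)

class TestParsePort(unittest.TestCase):
    def test_returns_int(self):
        self.assertIsInstance(parse_port("8080"), int)

if __name__ == '__main__':
    unittest.main()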