Python `UserString` seems problematic?


I need to use UserString to create my own str class, but its implementation seems problematic.
For example, in the class definition, it reads:
def __eq__(self, string):
    if isinstance(string, UserString):
        return self.data == string.data
    return self.data == string
But since an empty list ([]) is actually an instance of UserString:
isinstance([], UserString) == True
Now this code doesn't work:
from collections import UserString
s = UserString("")
if s in [None, [], {}, ()]:
    pass  # do whatever
because the in operator uses UserString's __eq__ to check membership, but [] does not have a .data attribute. This issue doesn't exist with the built-in str class.
I know this is a trivial, unrealistic example, but has anyone encountered this problem when using UserString, and what is the best way to circumvent it (maybe a method override in my own subclass)? Are there any other caveats?
Note: I am aware of this SO thread, but I don't think my question is a duplicate of it.
It seems that no one can reproduce isinstance([], UserString) == True, but this is a screenshot from my PyCharm IDE: [screenshot not preserved]
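For reference, here is a minimal sketch of the subclass override the question asks about. It assumes the standard collections.UserString: on a stock CPython 3 install, isinstance([], UserString) is False, so the membership test above should run without error. The hypothetical SafeString additionally guards the unwrap with getattr, and it restores __hash__, because defining __eq__ in a subclass otherwise sets __hash__ to None.

```python
from collections import UserString


class SafeString(UserString):
    """Hypothetical UserString subclass with a defensive __eq__."""

    def __eq__(self, other):
        if isinstance(other, UserString):
            # Unwrap only if the wrapped value really exists.
            other = getattr(other, "data", other)
        return self.data == other

    # Defining __eq__ in a subclass sets __hash__ to None, so restore it.
    __hash__ = UserString.__hash__


s = SafeString("")
print(isinstance([], UserString))    # False on a standard CPython 3 install
print(s in [None, [], {}, ()])       # False: "" equals none of these
print(SafeString("abc") == UserString("abc"), SafeString("abc") == "abc")  # True True
```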

Related

How do I specify literally a specific type (like str (not instance of str)) in Python with type hinting?

Is there a way I can specify a variable should be literally a specific type in Python?
I know that it is possible to specify that a variable should be of a specific type or a subtype of it, but how can I specify only that exact type?
var: str # instance of string
var: type[str] # subtype or type string
var: (literally str class) # part in question
An example of when this functionality may be necessary pertains to Union types. For example, what if I want the argument passed to a specific parameter of a function or the return type of a function to be literally the type int or str?
def example1(arg: (Literally) int | (Literally) str):
    ...
def example2(arg) -> (Literally) int | (Literally) str:
    ...

If you want to specify a type for your variable, you can follow this practice: before putting any values in your variable, define the variable like this:
string:
x = ''
list:
x = []
Dictionary:
x = {}
and so on ...

TL;DR: You don't.
That is not how types work in general. It's like asking "how can I check if something is just a bear, but not a polar bear, grizzly bear, or any other subtype of bear?" What does that even mean?
If something can be subtyped, there is no way to do what you are asking.
Note that I am talking about static type checking here, not about runtime logic. Of course you can do this:
assert type(x) is str
That will ensure that x is of the str class and not of a subclass, but there are good reasons, why that is usually discouraged in favor of this:
assert isinstance(x, str)
That has almost nothing to do with type annotations of variables though.
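A quick runtime illustration of the difference between the two assertions above, using a hypothetical str subclass named Sub: isinstance accepts instances of subclasses, while the exact-type check rejects them.

```python
class Sub(str):
    """Hypothetical str subclass used only for this comparison."""


plain = "abc"
derived = Sub("abc")

print(type(plain) is str, isinstance(plain, str))      # True True
print(type(derived) is str, isinstance(derived, str))  # False True
```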
By default, a type can have any number of descendant types. If you want a custom type that is final, i.e. one for which subtypes shall not exist, you can make use of PEP 591:
from typing import final
@final
class MyStr(str):
    pass
if __name__ == '__main__':
    x: MyStr
    x = MyStr("abc")
The relevant thing for the type checker here is that MyStr is final, which means that it will give you an error the moment you try to subclass MyStr without even concerning itself with whether you assign a value of that subclass to x later on. Taking mypy for example, if you do this:
from typing import final
@final
class MyStr(str):
    pass
class OtherStr(MyStr):  # this is where the error occurs
    pass
if __name__ == '__main__':
    x: MyStr
    x = OtherStr("abc")  # irrelevant at this point
It will not complain about the last line at all. Instead it will only say:
error: Cannot inherit from final class "MyStr" [misc]
In other words, the moment you try and create a subclass, the rest of the things you do with that class no longer matter because they are by definition invalid.
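Worth keeping in mind alongside this: as far as I know, typing.final is enforced only by static checkers such as mypy. At runtime the decorator does not block subclassing (from Python 3.11 it merely sets a __final__ attribute on the class), so this sketch runs without error even though mypy rejects it.

```python
from typing import final


@final
class MyStr(str):
    pass


class OtherStr(MyStr):  # mypy: Cannot inherit from final class "MyStr"
    pass


x = OtherStr("abc")
print(issubclass(OtherStr, MyStr), x)  # True abc
```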
If you really want to, you can of course shadow str, but this will come with a bunch of caveats:
from builtins import str as built_in_str
from typing import final
@final
class str(built_in_str):
    pass
if __name__ == '__main__':
    x: str
    x = str("abc")
I think this is a very bad idea. One reason is that you won't be able to just assign a string literal like "abc" to x, because a literal gives you a builtins.str object. But I think you can imagine a whole lot of other problems.
Hope this makes sense.

How to check if a variable is instance of any class [duplicate]

I need to determine if a given Python variable is an instance of a native type: str, int, float, bool, list, dict and so on. Is there an elegant way to do it?
Or is this the only way:
if myvar in (str, int, float, bool):
    # do something

This is an old question, but it seems none of the answers actually answer the specific question: "(How-to) Determine if Python variable is an instance of a built-in type". Note that it's not "[...] of a specific/given built-in type" but of "a" built-in type.
The proper way to determine if a given object is an instance of a built-in type/class is to check if the type of the object happens to be defined in the module __builtin__ (named builtins in Python 3).
def is_builtin_class_instance(obj):
    # The module is called '__builtin__' on Python 2 and 'builtins' on Python 3
    return obj.__class__.__module__ in ('__builtin__', 'builtins')
Warning: if obj is a class and not an instance, no matter if that class is built-in or not, True will be returned since a class is also an object, an instance of type (i.e. AnyClass.__class__ is type).
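A minimal Python 3 sketch of the same check, comparing against the "builtins" module name. It also demonstrates the warning above (a class object counts, because its type is the built-in type) and shows that instances of a subclass of a built-in do not count, since the subclass is defined in the user's own module. The class MyInt is hypothetical.

```python
def is_builtin_class_instance(obj):
    """Return True if type(obj) is defined in the builtins module (Python 3)."""
    return type(obj).__module__ == "builtins"


class MyInt(int):
    """Hypothetical user-defined subclass of a built-in type."""


print(is_builtin_class_instance(3))          # True
print(is_builtin_class_instance([1, 2]))     # True
print(is_builtin_class_instance(MyInt(3)))   # False: MyInt lives in this module
print(is_builtin_class_instance(MyInt))      # True: type(MyInt) is type
```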

The best way to achieve this is to collect the types in a tuple called primitiveTypes and:
if isinstance(myvar, primitiveTypes): ...
The types module defines names for many built-in types, which can help to build the tuple.
Works since Python 2.2
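A small sketch of that approach. The contents of PRIMITIVE_TYPES are an assumption (pick whatever "primitive" means for your application), and Tag is a hypothetical class showing that isinstance also accepts subclasses of the listed types.

```python
# The exact contents of the tuple are a judgment call; these are examples.
PRIMITIVE_TYPES = (str, int, float, bool, list, dict, tuple, set, type(None))


def is_primitive(value):
    # isinstance accepts a tuple and returns True if any entry matches.
    return isinstance(value, PRIMITIVE_TYPES)


class Tag(str):
    """Hypothetical str subclass: isinstance still treats it as a str."""


print(is_primitive(42), is_primitive("x"), is_primitive(None))  # True True True
print(is_primitive(Tag("x")))  # True: subclasses match too
print(is_primitive(object()))  # False
```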

Not that I know why you would want to do it, as there aren't any "simple" types in Python; it's all objects. But this works:
type(theobject).__name__ in dir(__builtins__)
But explicitly listing the types is probably better as it's clearer. Or even better: changing the application so you don't need to know the difference.
Update: The problem that needs solving is how to make a serializer for objects, even built-in ones. The best way to do this is not to make a big phat serializer that treats builtins differently, but to look up serializers based on type.
Something like this:
def IntSerializer(theint):
    return str(theint)
def StringSerializer(thestring):
    return repr(thestring)
def MyOwnSerializer(value):
    return "whatever"
serializers = {
    int: IntSerializer,
    str: StringSerializer,
    mymodel.myclass: MyOwnSerializer,  # your own class goes here
}
def serialize(ob):
    try:
        return ob.serialize()  # For objects that know they need to be serialized
    except AttributeError:
        # Look up the serializer for this exact type.
        # Default to using "repr" (works for most builtins).
        return serializers.get(type(ob), repr)(ob)
This way you can easily add new serializers, and the code is easy to maintain and clear, as each type has its own serializer. Notice how the fact that some types are builtin became completely irrelevant. :)
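The standard library's functools.singledispatch implements the same "look up a function by type" idea; it is swapped in here as an alternative to the hand-written dict, not as the answer's own code. One difference: it follows the class hierarchy, so bool (a subclass of int) uses the int serializer, whereas serializers.get(type(ob)) only matches exact types. The fallback mirrors the answer: try a serialize() method, then repr(). MyClass is hypothetical.

```python
from functools import singledispatch


@singledispatch
def serialize(ob):
    # Fallback for unregistered types: use the object's own serialize()
    # method if it has one, otherwise repr().
    method = getattr(ob, "serialize", None)
    if callable(method):
        return method()
    return repr(ob)


@serialize.register(int)
def _serialize_int(theint):
    return str(theint)


@serialize.register(str)
def _serialize_str(thestring):
    return repr(thestring)


class MyClass:
    """Hypothetical class that knows how to serialize itself."""

    def serialize(self):
        return "whatever"


print(serialize(5), serialize("hi"), serialize(True), serialize(MyClass()))
# 5 'hi' True whatever
```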

You appear to be interested in ensuring that simplejson will handle your types. This is done trivially by:
import json
try:
    json.dumps(obj)
except TypeError:
    print("Can't convert", obj)
This is more reliable than trying to guess which types your JSON implementation handles.
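The same try-it-and-see idea wrapped in a helper, as a sketch using the standard json module (the function name is hypothetical). It also catches ValueError, which json.dumps raises for problems such as a circular reference.

```python
import json


def is_json_serializable(value):
    """Return True if json.dumps can encode value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. a circular reference
        return False
    return True


print(is_json_serializable({"a": [1, 2.5, None]}))  # True
print(is_json_serializable({1, 2}))                  # False: sets are not JSON
```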

What is a "native type" in Python? Please don't base your code on types, use Duck Typing.

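You can access all these types through the types module:
`builtin_types = [ i for i in types.__dict__.values() if isinstance(i, type)]`
As a reminder, import the types module first.
def isBuiltinTypes(var):
    return type(var) in types.__dict__.values() and not isinstance(var, types.InstanceType)
Note that this is Python 2 only: types.InstanceType does not exist in Python 3, and the Python 3 types module no longer lists int, str and so on.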

It's 2020, I'm on Python 3.7, and none of the existing answers worked for me. What worked instead is the builtins module. Here's how:
import builtins
type(your_object).__name__ in dir(builtins)
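One caveat with comparing names: a user class that happens to be called list or dict also passes the name test. The sketch below instead requires the builtins module to map that name to the very same type object; the helper names and FakeList are hypothetical. Importing builtins explicitly also sidesteps the CPython detail that __builtins__ may be a plain dict outside the __main__ module. Types such as NoneType are not exposed by name in builtins, so both checks report False for None.

```python
import builtins


def name_check(obj):
    # The approach above: compare only the type's name.
    return type(obj).__name__ in dir(builtins)


def is_builtin_type_instance(obj):
    # Require the builtins module to map that name to this exact type object.
    cls = type(obj)
    return getattr(builtins, cls.__name__, None) is cls


# Hypothetical user class whose __name__ is "list", created without
# shadowing the real built-in name in this module.
FakeList = type("list", (), {})

fake = FakeList()
print(name_check(fake), is_builtin_type_instance(fake))              # True False
print(is_builtin_type_instance({}), is_builtin_type_instance(None))  # True False
```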

The built-in type function may be helpful (this is a Python 2 session; Python 3 prints <class 'int'>):
>>> a = 5
>>> type(a)
<type 'int'>

Building off of S.Lott's answer, you should have something like this:
from simplejson import JSONEncoder
class JSONEncodeAll(JSONEncoder):
    def default(self, obj):
        try:
            return JSONEncoder.default(self, obj)
        except TypeError:
            ## optionally
            # try:
            #     # you'd have to add this per object, but if an object wants to do something
            #     # special then it can do whatever it wants
            #     return obj.__json__()
            # except AttributeError:
            ##
            # ...do whatever you are doing now...
            # (which should be creating an object simplejson understands)
            raise  # placeholder: replace with your own conversion
To use:
>>> json = JSONEncodeAll()
>>> json.encode(myObject)
# whatever myObject looks like when it passes through your serialization code
These calls will use your special class, and if simplejson can take care of the object, it will. Otherwise your catch-all functionality will be triggered, and possibly (depending on whether you use the optional part) an object can define its own serialization.
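A sketch of the same encoder on the standard json module instead of simplejson. json only calls default() for objects it cannot encode natively, so this version skips the base-class call (which would just raise TypeError). The __json__ hook, the repr() catch-all, and the Point class are assumptions for illustration, not part of any library API.

```python
import json


class JSONEncodeAll(json.JSONEncoder):
    """Encoder that never gives up: hook first, repr() as the last resort."""

    def default(self, obj):
        # json only calls default() for objects it cannot encode natively.
        to_json = getattr(obj, "__json__", None)  # hypothetical opt-in hook
        if callable(to_json):
            return to_json()
        # Catch-all fallback (an assumption): encode the object's repr().
        return repr(obj)


class Point:
    """Hypothetical class that opts in via __json__()."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __json__(self):
        return {"x": self.x, "y": self.y}


encoder = JSONEncodeAll()
print(encoder.encode({"p": Point(1, 2), "tags": {"a"}}))
# {"p": {"x": 1, "y": 2}, "tags": "{'a'}"}
```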

For me the best option is:
allowed_modules = set(['numpy'])
def isprimitive(value):
    return not hasattr(value, '__dict__') or \
        value.__class__.__module__ in allowed_modules
This fixes the case where value is a module, for which the value.__class__.__module__ == '__builtin__' check gives the wrong answer.

The question asks to check for non-class types. These types don't have a __dict__ member (testing for a __repr__ member instead would not work, because every object inherits __repr__ from object). Other answers mention checking for membership in types.__dict__.values(), but some of the types in this list are classes.
def isnonclasstype(val):
    # Returns True when val HAS a __dict__, i.e. for instances of ordinary
    # user-defined classes; built-in values give False (see the output below).
    return getattr(val, "__dict__", None) is not None
a=2
print( isnonclasstype(a) )
a="aaa"
print( isnonclasstype(a) )
a=[1,2,3]
print( isnonclasstype(a) )
a={ "1": 1, "2" : 2 }
print( isnonclasstype(a) )
class Foo:
    def __init__(self):
        pass
a = Foo()
print( isnonclasstype(a) )
gives me:
> python3 t.py
False
False
False
False
True
> python t.py
False
False
False
False
True
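One caveat with the __dict__ test, sketched with hypothetical classes: a user-defined class that declares __slots__ has no per-instance __dict__, so its instances look just like built-in values to this check.

```python
class WithDict:
    pass


class Slotted:
    __slots__ = ("x", "y")  # no per-instance __dict__

    def __init__(self, x, y):
        self.x, self.y = x, y


print(hasattr(WithDict(), "__dict__"))    # True
print(hasattr(Slotted(1, 2), "__dict__"))  # False, although Slotted is user-defined
print(hasattr(3, "__dict__"))              # False
```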

Python: Checking if an object is atomically pickleable

What's an accurate way of checking whether an object can be atomically pickled? When I say "atomically pickled", I mean without considering other objects it may refer to. For example, this list:
l = [threading.Lock()]
is not a pickleable object, because it refers to a Lock, which is not pickleable. But atomically, this list itself is pickleable.
So how do you check whether an object is atomically pickleable? (I'm guessing the check should be done on the class, but I'm not sure.)
I want it to behave like this:
>>> is_atomically_pickleable(3)
True
>>> is_atomically_pickleable(3.1)
True
>>> is_atomically_pickleable([1, 2, 3])
True
>>> is_atomically_pickleable(threading.Lock())
False
>>> is_atomically_pickleable(open('whatever', 'r'))
False
Etc.

Given that you're willing to break encapsulation, I think this is the best you can do:
try:
    # Python 3: pickle.Pickler is usually the C version, which has no save()
    # method to override; the pure-Python _Pickler does.
    from pickle import _Pickler as Pickler
except ImportError:
    from pickle import Pickler  # Python 2
import os
class AtomicPickler(Pickler):
    def __init__(self, protocol):
        # You may want to replace this with a fake file object that just
        # discards writes.
        blackhole = open(os.devnull, 'wb')
        Pickler.__init__(self, blackhole, protocol)
        self.depth = 0
    def save(self, o, *args, **kwargs):
        self.depth += 1
        if self.depth == 1:
            return Pickler.save(self, o, *args, **kwargs)
        self.depth -= 1
        return
def is_atomically_pickleable(o, protocol=None):
    pickler = AtomicPickler(protocol)
    try:
        pickler.dump(o)
        return True
    except:
        # Hopefully this exception was actually caused by dump(), and not
        # something like a KeyboardInterrupt
        return False
In Python the only way you can tell if something will work is to try it. That's the nature of a language as dynamic as Python. The difficulty with your question is that you want to distinguish between failures at the "top level" and failures at deeper levels.
Pickler.save is essentially the control-center for Python's pickling logic, so the above creates a modified Pickler that ignores recursive calls to its save method. Any exception raised while in the top-level save is treated as a pickling failure. You may want to add qualifiers to the except statement. Unqualified excepts in Python are generally a bad idea as exceptions are used not just for program errors but also for things like KeyboardInterrupt and SystemExit.
This can give what are arguably false negatives for types with odd custom pickling logic. For example, if you create a custom list-like class that instead of causing Pickler.save to be recursively called it actually tried to pickle its elements on its own somehow, and then created an instance of this class that contained an element that its custom logic could not pickle, is_atomically_pickleable would return False for this instance even though removing the offending element would result in an object that was pickleable.
Also, note the protocol argument to is_atomically_pickleable. Theoretically an object could behave differently when pickled with different protocols (though that would be pretty weird) so you should make this match the protocol argument you give to dump.
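For Python 3, a minimal sketch of the same depth-counting idea. It is a variant under stated assumptions rather than the answer's exact code: it subclasses pickle._Pickler, a private pure-Python class in CPython (the public pickle.Pickler is usually the C implementation, which exposes no save() method to override), writes into an in-memory buffer instead of os.devnull, restores the depth counter with try/finally, and catches only the exception types pickling typically raises.

```python
import io
import pickle
import threading


class AtomicPickler(pickle._Pickler):
    """Run the real pickling logic for the top-level object only."""

    def __init__(self, file, protocol=None):
        super().__init__(file, protocol)
        self.depth = 0

    def save(self, obj, save_persistent_id=True):
        self.depth += 1
        try:
            if self.depth == 1:
                super().save(obj, save_persistent_id)
            # Nested objects (depth > 1) are skipped, so they cannot fail.
        finally:
            self.depth -= 1


def is_atomically_pickleable(obj, protocol=None):
    # The output is malformed on purpose (children are skipped); discard it.
    pickler = AtomicPickler(io.BytesIO(), protocol)
    try:
        pickler.dump(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


print(is_atomically_pickleable([threading.Lock()]))  # True
print(is_atomically_pickleable(threading.Lock()))    # False
```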

Given the dynamic nature of Python, I don't think there's really a well-defined way to do what you're asking aside from heuristics or a whitelist.
If I say:
x = object()
is x "atomically pickleable"? What if I say:
x.foo = threading.Lock()
Is x "atomically pickleable" now?
What if I made a separate class that always had a lock attribute? What if I deleted that attribute from an instance?

I think the persistent_id interface is a poor match for what you are attempting to do. It is designed to be used when your object should refer to equivalent objects in the new program rather than copies of the old ones. You are attempting to filter out every object that cannot be pickled, which is different. Why are you attempting to do this?
I think this is a sign of a problem in your code. The fact that you want to pickle objects which refer to GUI widgets, files, and locks suggests that you are doing something strange. The kinds of objects you typically persist shouldn't be related to, or hold references to, that sort of object.
Having said that, I think your best option is the following:
from pickle import Pickler, PicklingError
class MyPickler(Pickler):
    def save(self, obj):
        try:
            Pickler.save(self, obj)
        except PicklingError:
            # FilteredObject is your own picklable stand-in class
            Pickler.save(self, FilteredObject(obj))
This should work for the Python implementation; I make no guarantees as to what will happen in the C implementation. Every object which gets saved will be passed to the save method. This method will raise PicklingError when it cannot pickle the object. At this point, you can step in and call the method again, asking it to pickle your own object, which should pickle just fine.
EDIT
From my understanding, you essentially have a user-created dictionary of objects. Some objects are picklable and some aren't. I'd do this:
import cPickle
class saveable_dict(dict):
    def __getstate__(self):
        data = {}
        for key, value in self.items():
            try:
                encoded = cPickle.dumps(value)
            except cPickle.PicklingError:
                encoded = cPickle.dumps(Unpicklable())
            data[key] = encoded
        return data
    def __setstate__(self, state):
        for key, value in state.items():
            self[key] = cPickle.loads(value)
Then use that dictionary when you want to hold that collection of objects. The user should be able to get any picklable objects back, but everything else will come back as the Unpicklable() object. The difference between this and the previous approach is in objects which are themselves picklable but have references to unpicklable objects. But those objects are probably going to come back broken regardless.
This approach also has the benefit that it remains entirely within the defined API and thus should work in either cPickle or pickle.
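A Python 3 sketch of the same per-value idea, with two deliberate changes that are assumptions rather than the answer's code. First, it uses plain helper functions instead of a dict subclass, because pickle's default reduction of a dict subclass still pickles the dict's own items in addition to whatever __getstate__ returns, so unpicklable values could still reach the pickler. Second, it records the keys that failed instead of storing an Unpicklable placeholder, which avoids needing an importable placeholder class. The names dumps_filtered and loads_filtered are hypothetical.

```python
import pickle
import threading


def dumps_filtered(mapping):
    """Pickle each value separately; values that fail are recorded by key."""
    blobs = {}
    failed = []
    for key, value in mapping.items():
        try:
            blobs[key] = pickle.dumps(value)
        except (pickle.PicklingError, TypeError, AttributeError):
            failed.append(key)
    return pickle.dumps((blobs, failed))


def loads_filtered(data):
    """Return (restored_dict, keys_that_could_not_be_pickled)."""
    blobs, failed = pickle.loads(data)
    restored = {key: pickle.loads(blob) for key, blob in blobs.items()}
    return restored, failed


data = dumps_filtered({"n": 1, "lock": threading.Lock(), "xs": [1, 2]})
restored, failed = loads_filtered(data)
print(restored, failed)  # {'n': 1, 'xs': [1, 2]} ['lock']
```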

I ended up coding my own solution to this.
Here's the code. Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.
If you want to use it on Python 3 code, use the Python 3 fork of garlicsim.
Here is an excerpt from the module (may be outdated):
import re
import cPickle as pickle_module
import pickle # Importing just to get dispatch table, not pickling with it.
import copy_reg
import types
from garlicsim.general_misc import address_tools
from garlicsim.general_misc import misc_tools
def is_atomically_pickleable(thing):
    '''
    Return whether `thing` is an atomically pickleable object.
    "Atomically-pickleable" means that it's pickleable without considering any
    other object that it contains or refers to. For example, a `list` is
    atomically pickleable, even if it contains an unpickleable object, like a
    `threading.Lock()`.
    However, the `threading.Lock()` itself is not atomically pickleable.
    '''
    my_type = misc_tools.get_actual_type(thing)
    return _is_type_atomically_pickleable(my_type, thing)
def _is_type_atomically_pickleable(type_, thing=None):
    '''Return whether `type_` is an atomically pickleable type.'''
    try:
        return _is_type_atomically_pickleable.cache[type_]
    except KeyError:
        pass
    if thing is not None:
        assert isinstance(thing, type_)
    # Sub-function in order to do caching without crowding the main algorithm:
    def get_result():
        # We allow a flag for types to painlessly declare whether they're
        # atomically pickleable:
        if hasattr(type_, '_is_atomically_pickleable'):
            return type_._is_atomically_pickleable
        # Weird special case: `threading.Lock` objects don't have `__class__`.
        # We assume that objects that don't have `__class__` can't be pickled.
        # (With the exception of old-style classes themselves.)
        if not hasattr(thing, '__class__') and \
           (not isinstance(thing, types.ClassType)):
            return False
        if not issubclass(type_, object):
            return True
        def assert_legit_pickling_exception(exception):
            '''Assert that `exception` reports a problem in pickling.'''
            message = exception.args[0]
            segments = [
                "can't pickle",
                'should only be shared between processes through inheritance',
                'cannot be passed between processes or pickled'
            ]
            assert any((segment in message) for segment in segments)
            # todo: turn to warning
        if type_ in pickle.Pickler.dispatch:
            return True
        reduce_function = copy_reg.dispatch_table.get(type_)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True
        reduce_function = getattr(type_, '__reduce_ex__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing, 0)
                # (The `0` is the protocol argument.)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True
        reduce_function = getattr(type_, '__reduce__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True
        return False
    result = get_result()
    _is_type_atomically_pickleable.cache[type_] = result
    return result
_is_type_atomically_pickleable.cache = {}

dill has the pickles method for such a check.
>>> import threading
>>> l = [threading.Lock()]
>>>
>>> import dill
>>> dill.pickles(l)
True
>>>
>>> dill.pickles(threading.Lock())
True
>>> f = open('whatever', 'w')
>>> f.close()
>>> dill.pickles(open('whatever', 'r'))
True
Well, dill atomically pickles all of your examples, so let's try something else:
>>> l = [iter([1,2,3]), xrange(5)]
>>> dill.pickles(l)
False
Ok, this fails. Now, let's investigate:
>>> dill.detect.trace(True)
>>> dill.pickles(l)
T4: <type 'listiterator'>
False
>>> map(dill.pickles, l)
T4: <type 'listiterator'>
Si: xrange(5)
F2: <function _eval_repr at 0x106991cf8>
[False, True]
OK, we can see that the iter fails, but the xrange does pickle. So, let's replace the iter.
>>> l[0] = xrange(1,4)
>>> dill.pickles(l)
Si: xrange(1, 4)
F2: <function _eval_repr at 0x106991cf8>
Si: xrange(5)
True
Now our object atomically pickles.

How do I tell what type of data is inside python variable?

I have a list of variables. Inside the list are strings, numbers, and class objects. I need to perform logic based on each different type of data. I am having trouble detecting class objects and branching my logic at that point.
if(type(lists[listname][0]).__name__ == 'str'): # <--- this works for strings
elif(type(lists[listname][0]).__name__ == 'object'): # <--- this does not work for classes
In the second line of code above, the name variable contains "Address" as the class name. I was hoping it would contain "class" or "object" so I could branch my program. I will have many different types of objects in the future, so it's a bit impractical to perform logic on every different class name ("Address", "Person", etc.).
please let me know if my question needs clarification.
thanks!!

FYI: it also makes a difference if it's a new-style class or not (this is Python 2; in Python 3 every class is new-style):
# python
type(1).__name__
'int'
type('1').__name__
'str'
class foo(object):
    pass
type(foo()).__name__
'foo'
class bar:
    pass
type(bar()).__name__
'instance'
If you can make sure they're all new-style classes, your method will determine the real type. If you make them old-style, it'll show up as 'instance'. Not that I'm recommending making everything all old-style just for this.
However, you can take it one step further:
type(bar().__class__).__name__
'classobj'
type(foo().__class__).__name__
'type'
And always look for 'classobj' or 'type'. (Or the name of the metaclass, if it has one.)

I think you want the isinstance function.
if isinstance(o, ClassName):
However, you'll need to first verify that o is an object; you can use type for that.

It's common in Python to use exception handling to decide which code path to take; inspecting the exact type of an object (with isinstance()) to decide what to do with it is discouraged.
For example, say that if the item is a string you want to print it in "title case", and if it's an object you want to call a particular method on it. So:
try:
    # is it an object with a particular method?
    lists[listname][0].particularMethod()
except AttributeError:
    # no, it doesn't have particularMethod(),
    # so we expect it to be a string; print it in title case
    print(lists[listname][0].title())
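A runnable variant of the same idea, with a hypothetical Address class and a hypothetical describe() method standing in for particularMethod(). Looking the method up with getattr keeps the check narrow, so an AttributeError raised inside the method itself is not mistaken for "this item is a string".

```python
class Address:
    """Hypothetical stand-in for the question's class objects."""

    def __init__(self, street):
        self.street = street

    def describe(self):  # stands in for particularMethod()
        return "Address on " + self.street


def handle(item):
    method = getattr(item, "describe", None)
    if callable(method):
        return method()        # an object that knows what to do
    return item.title()        # otherwise assume it is a string


for item in ["hello world", Address("Main St")]:
    print(handle(item))
# Hello World
# Address on Main St
```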

If you are only interested in handling two types specifically, you could test for them explicitly using isinstance and then handle the leftovers:
import numbers
for item in items:
    if isinstance(item, basestring):  # (str, unicode); use str on Python 3
        do_string_thing(item)
    elif isinstance(item, numbers.Real):  # (int, float, long)
        do_number_thing(item)
    else:
        do_object_thing(item)
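A Python 3 sketch of the same branching, where basestring no longer exists and str is the single text type. The returned labels stand in for the hypothetical do_string_thing/do_number_thing/do_object_thing calls. Note that numbers.Real also matches bool and fractions.Fraction, while decimal.Decimal is registered only as numbers.Number, so it falls through to the last branch.

```python
import numbers
from decimal import Decimal
from fractions import Fraction


def classify(item):
    if isinstance(item, str):             # Python 3: one text type
        return "string"
    elif isinstance(item, numbers.Real):  # int, float, bool, Fraction, ...
        return "number"
    return "object"


print([classify(x) for x in ["a", 3, 2.5, True, Fraction(1, 3), Decimal("1.5"), object()]])
# ['string', 'number', 'number', 'number', 'number', 'object', 'object']
```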

