Why can a Python object have an attribute represented by an integer? - python

I think I might have a fundamental misunderstanding of what a python attribute actually is. Consider the following:
>>> class Test:
...     pass
...
>>> t = Test()
>>> setattr(t, '0', 0)
>>> t.0
  File "<stdin>", line 1
    t.0
      ^
SyntaxError: invalid syntax
>>> getattr(t, '0')
0
>>> setattr(t, 'one', 1)
>>> t.one
1
>>> getattr(t, 'one')
1
Why does Python allow me to set an attribute if I can't legally access it with dot notation? I understand that t.0 makes no sense, but at the same time I wonder why it behaves differently from t.one, given that I created both the same way.

Attributes are a kind of member that any Python object can have. You would usually expect the built-in syntax to dictate which attribute names are accepted. For that, the language reference is pretty clear:
attributeref ::= primary "." identifier
So whatever follows the dot must be a valid identifier, which restricts the allowed attribute names. Ignoring non-ASCII Unicode characters for now, this essentially means that the name may not start with a digit. So 0 is not a valid identifier, and t.0 is therefore not a valid attribute reference according to the specification.
However, getattr and its relatives work a bit differently. They only require the attribute name to be a string, and that string is passed directly to the internal PyObject_GetAttr function, which does not require a valid identifier.
So by using getattr, setattr and so on, you can essentially trick Python into attaching attributes with names that the language's syntax would not allow.
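A minimal sketch of the distinction described above, using the built-in str.isidentifier to check which names the dot syntax could accept. The class Test follows the question; the variable dot_accessible is only illustrative.

```python
class Test:
    pass


t = Test()

# setattr/getattr accept any string as the attribute name.
setattr(t, "0", 0)
setattr(t, "one", 1)

# Only names that are valid identifiers can be written after the dot.
dot_accessible = {name: name.isidentifier() for name in vars(t)}
print(dot_accessible)

# Both attributes live side by side in the instance dictionary;
# "0" is simply unreachable through the t.<name> syntax.
print(getattr(t, "0"), t.one)
```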

This is just a quirk of the syntax and semantics of Python. Any string can be used as an attribute name, but only identifiers can be used with dot notation. Thus the only way to access non-identifier attributes is through getattr/setattr or some other indirect function. Strangely enough, this permissiveness doesn't extend to other types: only strings can be attribute names.

Related

Cannot call __subclass__ method inside format string expansion

I have this line of python code that returns a list of all classes that currently exist:
'a'.__class__.__mro__[1].__subclasses__()
Now I'd expect that this line would work in the same way:
'{0.__class__.__mro__[1].__subclasses__()}'.format('a')
but this gives me the error:
AttributeError: type object 'object' has no attribute '__subclasses__()'
Then again this line:
'{0.__class__.__mro__[1].__subclasses__}'.format('a')
prints out
'<built-in method __subclasses__ of type object at 0x9d1260>'
so the method seems to be there, but I can't call it for some reason. Can somebody explain this behavior to me?
str.format doesn't support arbitrary expressions in format strings. You can use indexing and attribute access, but not the function call operator (and even indexing is a little weird). Rather than trying to stuff all that in the format string itself, evaluate the expression outside and pass the result as an argument to format:
'{}'.format('a'.__class__.__mro__[1].__subclasses__())
Incidentally, this __subclasses__ call doesn't give you a list of all subclasses that exist. It gives you a list of all direct subclasses of object. It doesn't include grandchild classes or further descendants.
Also, unless you're trying to perform some kind of sandbox escape or you've got some other weird constraint, you don't need to go through the whole 'a'.__class__.__mro__[1] rigmarole just to refer to object.
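To illustrate the point about direct subclasses, here is a rough sketch of how one might collect every descendant instead. The helper all_subclasses and the classes Base, Child and Grandchild are hypothetical names introduced for this example; it assumes the classes involved are hashable, which is the usual case.

```python
def all_subclasses(cls):
    """Return every descendant of cls, not only its direct subclasses."""
    found = []
    seen = set()
    # type.__subclasses__(c) also works when c is `type` itself,
    # where c.__subclasses__() would be an unbound call.
    stack = list(type.__subclasses__(cls))
    while stack:
        sub = stack.pop()
        if sub in seen:
            continue
        seen.add(sub)
        found.append(sub)
        stack.extend(type.__subclasses__(sub))
    return found


class Base:
    pass


class Child(Base):
    pass


class Grandchild(Child):
    pass


print(Base.__subclasses__())   # direct subclasses only
print(all_subclasses(Base))    # children and grandchildren
```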
You were accessing an attribute named "__subclasses__()" (a string!) instead of accessing the __subclasses__ method and then calling it. In other words, the format string does not execute the function and insert its result.
To see this for yourself, try another method instead:
>>> '{0.__class__.__mro__[1].__str__}'.format('a')
"<slot wrapper '__str__' of 'object' objects>"
>>> '{0.__class__.__mro__[1].__str__()}'.format('a')
AttributeError: type object 'object' has no attribute '__str__()'
As stated, str.format does not allow calls.
If you really want this behaviour, you could try f-strings (if you are on Python 3.6+):
>>> x = 'a'
>>> f'{x.__class__.__mro__[1].__subclasses__()}'
# output omitted
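The difference between the two mechanisms can be seen on a small, self-contained example. The class Point and its method norm1 are made up for illustration; only str.format and f-strings themselves come from the answers above.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(3, -4)

# Attribute access and indexing are supported in replacement fields.
print("{0.x} {1[0]}".format(p, ["first", "second"]))

# Parentheses are not a call here; they become part of the attribute name.
try:
    "{0.norm1()}".format(p)
    call_rejected = False
except AttributeError as exc:
    call_rejected = True
    print("AttributeError:", exc)

# Evaluate the call outside the format string, or use an f-string.
print("{}".format(p.norm1()))
print(f"{p.norm1()}")
```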

arbitrary __getattr__ fails on classes that inherit object but not on object [duplicate]

So, I was playing around with Python while answering this question, and I discovered that this is not valid:
o = object()
o.attr = 'hello'
due to an AttributeError: 'object' object has no attribute 'attr'. However, with any class inherited from object, it is valid:
class Sub(object):
    pass
s = Sub()
s.attr = 'hello'
Printing s.attr displays 'hello' as expected. Why is this the case? What in the Python language specification specifies that you can't assign attributes to vanilla objects?
For other workarounds, see How can I create an object and add attributes to it?.
To support arbitrary attribute assignment, an object needs a __dict__: a dict associated with the object, where arbitrary attributes can be stored. Otherwise, there's nowhere to put new attributes.
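A quick way to check this for yourself is to look for __dict__ on both kinds of instance. This is only a sketch; Sub is the class from the question.

```python
class Sub(object):
    pass


o = object()
s = Sub()

# A plain object instance has no per-instance dictionary; a Sub instance does.
print(hasattr(o, "__dict__"))
print(hasattr(s, "__dict__"))

# The new attribute is stored in that dictionary.
s.attr = "hello"
print(s.__dict__)

# Without a __dict__ there is nowhere to store the attribute.
try:
    o.attr = "hello"
except AttributeError as exc:
    print("AttributeError:", exc)
```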
An instance of object does not carry around a __dict__ -- if it did, then beyond the horrible circular dependence problem (since dict, like most everything else, inherits from object;-), this would saddle every object in Python with a dict, which would mean an overhead of many bytes per object that currently doesn't have or need a dict (essentially, all objects that don't have arbitrarily assignable attributes don't have or need a dict).
For example, using the excellent pympler project, we can do some measurements...:
>>> from pympler import asizeof
>>> asizeof.asizeof({})
144
>>> asizeof.asizeof(23)
16
You wouldn't want every int to take up 144 bytes instead of just 16, right?-)
Now, when you make a class (inheriting from whatever), things change...:
>>> class dint(int): pass
...
>>> asizeof.asizeof(dint(23))
184
...the __dict__ is now added (plus, a little more overhead) -- so a dint instance can have arbitrary attributes, but you pay quite a space cost for that flexibility.
So what if you wanted ints with just one extra attribute foobar...? It's a rare need, but Python does offer a special mechanism for the purpose...
>>> class fint(int):
...     __slots__ = 'foobar',
...     def __init__(self, x): self.foobar=x+100
...
>>> asizeof.asizeof(fint(23))
80
...not quite as tiny as an int, mind you! (or even the two ints, one the self and one the self.foobar -- the second one can be reassigned), but surely much better than a dint.
When the class has the __slots__ special attribute (a sequence of strings), then the class statement (more precisely, the default metaclass, type) does not equip every instance of that class with a __dict__ (and therefore the ability to have arbitrary attributes), just a finite, rigid set of "slots" (basically places which can each hold one reference to some object) with the given names.
In exchange for the lost flexibility, you save a lot of bytes per instance (probably meaningful only if you have zillions of instances gallivanting around, but there are use cases for that).
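The behavioural side of __slots__ (as opposed to the memory side measured above) can be sketched as follows. This example deliberately uses a plain class rather than an int subclass; the class name Slotted and the flag rejected are illustrative.

```python
class Slotted:
    # Only this one attribute name gets storage on each instance.
    __slots__ = ("foobar",)

    def __init__(self, x):
        self.foobar = x + 100


p = Slotted(23)
print(p.foobar)

# Instances have no __dict__, so there is no room for other attributes.
print(hasattr(p, "__dict__"))

try:
    p.other = 1
    rejected = False
except AttributeError as exc:
    rejected = True
    print("AttributeError:", exc)
```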
As other answerers have said, an instance of object does not have a __dict__. object is the base class of all types, including int and str. Thus whatever is provided by object will be a burden to them as well. Even something as simple as an optional __dict__ would need an extra pointer for each value; this would waste an additional 4-8 bytes of memory for each object in the system, for very limited utility.
Instead of creating an instance of a dummy class, in Python 3.3+ you can (and should) use types.SimpleNamespace for this.
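For instance, a minimal use of types.SimpleNamespace might look like this (the attribute names attr and count are arbitrary):

```python
from types import SimpleNamespace

# Keyword arguments become attributes; more can be added later.
ns = SimpleNamespace(attr="hello")
ns.count = 3

print(ns.attr, ns.count)

# The attributes are stored in an ordinary instance dictionary.
print(vars(ns))
```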
It is simply due to optimization.
Dicts are relatively large.
>>> import sys
>>> sys.getsizeof((lambda:1).__dict__)
140
Most (maybe all) classes that are defined in C do not have a dict for optimization.
If you look at the source code you will see that there are many checks to see if the object has a dict or not.
So, investigating my own question, I discovered this about the Python language: you can inherit from things like int, and you see the same behaviour:
>>> class MyInt(int):
...     pass
...
>>> x = MyInt()
>>> print(x)
0
>>> x.hello = 4
>>> print(x.hello)
4
>>> x = x + 1
>>> print(x)
1
>>> print(x.hello)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'int' object has no attribute 'hello'
I assume the error at the end is because the add function returns an int, so I'd have to override functions like __add__ and such in order to retain my custom attributes. But this all now makes sense to me (I think), when I think of "object" like "int".
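One possible way to do that override is sketched below. It is an assumption about what "retaining custom attributes" should mean here (copying the left operand's instance dictionary onto the result), not the only reasonable design.

```python
class MyInt(int):
    def __add__(self, other):
        # Let int do the arithmetic first.
        value = super().__add__(other)
        if value is NotImplemented:
            return value
        # Wrap the plain int result and carry the custom attributes over.
        result = MyInt(value)
        result.__dict__.update(self.__dict__)
        return result


x = MyInt()
x.hello = 4
x = x + 1

print(x, x.hello, type(x).__name__)
```

Other operators (__radd__, __sub__, __mul__ and so on) would need the same treatment, which is why this approach quickly becomes tedious.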
https://docs.python.org/3/library/functions.html#object :
Note: object does not have a __dict__, so you can’t assign arbitrary attributes to an instance of the object class.
It's because object is a built-in type implemented in C, not a class defined in Python. In general, instances of types defined in C extensions (like all the built-in datatypes, and stuff like numpy arrays) do not allow the addition of arbitrary attributes.
This is (IMO) one of the fundamental limitations with Python - you can't re-open classes. I believe the actual problem, though, is caused by the fact that classes implemented in C can't be modified at runtime... subclasses can, but not the base classes.

Attributes which aren't valid python identifiers

The usual method of attribute access requires attribute names to be valid python identifiers.
But attributes don't have to be valid python identifiers:
>>> class Thing:
...     def __init__(self):
...         setattr(self, '0potato', 123)
...
>>> t = Thing()
>>> Thing.__getattribute__(t, '0potato')
123
>>> getattr(t, '0potato')
123
Of course, t.0potato remains a SyntaxError, but the attribute is there nonetheless:
>>> vars(t)
{'0potato': 123}
What is the reason for this being permissible? Is there really any valid use case for attributes with spaces, the empty string, Python reserved keywords, etc.? I thought the reason was that attributes were just keys in the object/namespace dict, but this makes no sense because other objects which are valid dict keys are not allowed:
>>> setattr(t, ('tuple',), 321)
TypeError: attribute name must be string, not 'tuple'
The details from a comment on the post fully answer this question, so I'm posting it as an answer:
Guido says:
...it is a feature that you can use any arbitrary string
with getattr() and setattr(). However these functions should (and do!)
reject non-strings.
Possible use-cases include hiding attributes from regular dotted access, and making attributes in correspondence with external data sources (which may clash with Python keywords). So, the argument seems to be there's simply no good reason to forbid it.
As for a reason to disallow non-strings, this seems to be a sensible restriction that helps the performance of the implementation:
Although Python's dicts already have some string-only optimizations -- they just dynamically adapt to a more generic and slightly slower approach once the first non-string key shows up.
So, to answer the use case question, looking at the reasoning behind how Python works in the references from the comments above, we can infer some of the situations that might make this Pythonic quirk useful.
You want an object to have an attribute that cannot be accessed with dot notation, say, to protect it from the naive user. (Quoting Guido: "some people might use this to hide state they don't want accessible using regular attribute notation (x.foo)". Of course, he goes on to say, "but that feels like abuse of the namespace to me, and there are plenty of other ways to manage such state.")
You want an object's attribute names to correspond to external data over which you have no control. Thus, you have to be able to use whatever strings appear in the external data as an attribute name even if it matches a Python reserved word or contains embedded spaces or dashes, etc.
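The second use case can be sketched like this. The Record class, the external_row data and the dot_accessible helper are all hypothetical; the example only relies on setattr, getattr, str.isidentifier and keyword.iskeyword.

```python
import keyword


class Record:
    pass


# Keys as they might arrive from an external source we do not control.
external_row = {"class": "A", "first-name": "Ada", "age": 36}

rec = Record()
for key, value in external_row.items():
    setattr(rec, key, value)


def dot_accessible(name):
    """True if rec.<name> could be written in source code."""
    return name.isidentifier() and not keyword.iskeyword(name)


for key in external_row:
    print(key, dot_accessible(key), getattr(rec, key))

# Non-string names are still rejected, as in the question.
try:
    setattr(rec, ("tuple",), 321)
    non_string_rejected = False
except TypeError as exc:
    non_string_rejected = True
    print("TypeError:", exc)
```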

GetAttr Function Problems (Python 3)

I have the following in a Python script:
setattr(stringRESULTS, "b", b)
Which gives me the following error:
AttributeError: 'str' object has no attribute 'b'
Can anyone tell me what the problem is here?
Don't do this. To quote the inestimable Greg Hewgill,
"If you ever find yourself using quoted names to refer to variables,
there's usually a better way to do whatever you're trying to do."
[Here you're one level up and using a string variable for the name, but it's the same underlying issue.] Or as S. Lott followed up with in the same thread:
"90% of the time, you should be using a dictionary. The other 10% of
the time, you need to stop what you're doing entirely."
If you're using the contents of stringRESULTS as a pointer to some object fred which you want to setattr, then these objects you want to target must already exist somewhere, and a dictionary is the natural data structure to store them. In fact, depending on your use case, you might be able to use dictionary key/value pairs instead of attributes in the first place.
IOW, my version of what (I'm guessing) you're trying to do would probably look like
d[stringRESULTS].b = b
or
d[stringRESULTS]["b"] = b
depending on whether I wanted/needed to work with an object instance or a dictionary would suffice.
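Spelled out as a runnable sketch, the two variants might look like this. The keys "fred" and "wilma", the value 42 and the use of types.SimpleNamespace as the stored object type are assumptions made for the example; stringRESULTS and b are the names from the question.

```python
from types import SimpleNamespace

stringRESULTS = "fred"
b = 42

# Variant 1: the dictionary maps names to objects that accept attributes.
d = {"fred": SimpleNamespace(), "wilma": SimpleNamespace()}
d[stringRESULTS].b = b
print(d["fred"].b)

# Variant 2: plain nested dictionaries, no attributes involved.
records = {"fred": {}, "wilma": {}}
records[stringRESULTS]["b"] = b
print(records["fred"])
```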
(P.S. relatively few people subscribe to the python-3.x tag. You'll usually get more attention by adding the bare 'python' tag as well.)
Since str is a low-level primitive type, you can't really set any arbitrary attribute on it. You probably need either a dict or a subclass of str:
class StringResult(str):
    pass
which should behave as you expect:
my_string_result = StringResult("spam_and_eggs")
my_string_result.b = b
EDIT:
If you're trying to do what DSM suggests, i.e. modify an attribute on a variable whose name is the value of the stringRESULTS variable, then this should do the trick:
locals()[stringRESULTS].b = b
Please note that this is an extremely dangerous operation and can wreak all kinds of havoc on your app if you aren't careful.

How does Python differentiate between the different data types?

Sorry if this is quite noobish to you, but I'm just starting out to learn Python after learning C++ & Java, and I am wondering how in the world I could just declare variables like id = 0 and name = 'John' without any int's or string's in front! I figured out that perhaps it's because there are no ''s in a number, but how would Python figure that out in something like def increase(first, second) instead of something like int increase(int first, int second) in C++?!
The literal objects you mention carry (pointers to;-) their own types with them of course, so when a name's bound to that object the problem of type doesn't arise -- the object always has a type, the name doesn't -- just delegates that to the object it's bound to.
There's no "figuring out" in def increase(first, second): -- name increase gets bound to a function object, names first and second are recorded as parameters-names and will get bound (quite possibly to objects of different types at various points) as increase gets called.
So say the body is return first + second -- a call to increase('foo', 'bar') will then happily return 'foobar' (delegating the addition to the objects, which in this case are strings), and maybe later a call to increase(23, 45) will just as happily return 68 -- again by delegating the addition to the objects bound to those names at the point of call, which in this case are ints. And if you call with incompatible types you'll get an exception as the delegated addition operation can't make sense of the situation -- no big deal!
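The calls described above can be tried directly. The body return first + second is the one assumed in the answer; the variable names holding the results are illustrative.

```python
def increase(first, second):
    # The addition is delegated to whatever objects the names are bound to.
    return first + second


joined = increase("foo", "bar")
summed = increase(23, 45)
print(joined, summed)

# Incompatible types fail at call time, inside the delegated addition.
try:
    increase("foo", 23)
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)
```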
Python is dynamically typed: all variables can refer to an object of any type. id and name can be anything, but the actual objects are of types like int and str. 0 is a literal that is parsed to make an int object, and 'John' a literal that makes a str object. Many object types do not have literals and are returned by a callable (like frozenset—there's no way to make a literal frozenset, you must call frozenset.)
Consequently, there is no such thing as declaration of variables, since you aren't defining anything about the variable. id = 0 and name = 'John' are just assignment.
increase returns an int only because that is what you return from it; nothing in Python prevents it from returning any other object. first and second are only ints if you make them so.
Objects, to a certain extent, share a common interface. You can use the same operators and functions on them all, and if they support that particular operation, it works. It is a common, recommended technique to use different types that behave similarly interchangeably; this is called duck typing. For example, if something takes a file object, you can instead pass an io.StringIO object (cStringIO.StringIO in Python 2), which supports the same methods as a file (like read and write) but is a completely different type. This is sort of like Java interfaces, but it does not require any formal declaration; you just define the appropriate methods.
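A small sketch of duck typing with file-like objects follows. The function count_lines and the class FakeFile are hypothetical; io.StringIO is the standard-library in-memory text stream.

```python
import io


def count_lines(stream):
    # Works with anything that has a read() method returning text.
    return len(stream.read().splitlines())


class FakeFile:
    """Unrelated to io.StringIO, but it quacks the same way."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text


from_stringio = count_lines(io.StringIO("a\nb\nc\n"))
from_fake = count_lines(FakeFile("x\ny"))
print(from_stringio, from_fake)
```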
Python uses the duck-typing method - if it walks, looks and quacks like a duck, then it's a duck. If you pass in a string, and try to do something numerical on it, then it will fail.
Have a look at: http://en.wikipedia.org/wiki/Python_%28programming_language%29#Typing and http://en.wikipedia.org/wiki/Duck_typing
When it comes to assigning literal values to variables, the type of the literal can be inferred at the time of lexical analysis. For example, anything matching a regular expression such as 0|[1-9][0-9]* can be inferred to be an integer literal (a leading minus sign is parsed as a separate unary operator). If you want a float instead, you need an explicit conversion such as float(3). Similarly, a string literal is any sequence of characters enclosed in matching single or double quotes.
In a method call, the parameters are not type-checked. You only need to pass in the correct number of them to be able to call the method. So long as the body of the method does not cause any errors with respect to the arguments, you can call the same method with lots of different types of arguments.
In Python, unlike in C++ and Java, numbers and strings are both objects. So this:
id = 0
name = 'John'
is equivalent to:
id = int(0)
name = str('John')
Since variables id and name are references that may address any Python object, they don't need to be declared with a particular type.
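To see that the type belongs to the object rather than to the name, one can rebind a single name to several objects in turn. The name value and the list seen are illustrative.

```python
seen = []

for obj in (0, "John", 3.5, frozenset({1, 2})):
    # The same name is rebound to objects of different types.
    value = obj
    seen.append(type(value).__name__)

print(seen)

# Calling the type on a literal yields an equal object of the same type.
print(int(0) == 0, str("John") == "John")
```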
