The usual method of attribute access requires attribute names to be valid Python identifiers.
But attributes don't have to be valid Python identifiers:
>>> class Thing:
...     def __init__(self):
...         setattr(self, '0potato', 123)
...
>>> t = Thing()
>>> Thing.__getattribute__(t, '0potato')
123
>>> getattr(t, '0potato')
123
Of course, t.0potato remains a SyntaxError, but the attribute is there nonetheless:
>>> vars(t)
{'0potato': 123}
What is the reason for this being permissible? Is there really any valid use case for attributes with spaces, empty strings, Python reserved keywords, etc.? I thought the reason was that attributes were just keys in the object/namespace dict, but this makes no sense because other objects which are valid dict keys are not allowed:
>>> setattr(t, ('tuple',), 321)
TypeError: attribute name must be string, not 'tuple'
The details from a comment on the post fully answer this question, so I'm posting it as an answer:
Guido says:
...it is a feature that you can use any arbitrary string
with getattr() and setattr(). However these functions should (and do!)
reject non-strings.
Possible use-cases include hiding attributes from regular dotted access, and making attributes in correspondence with external data sources (which may clash with Python keywords). So, the argument seems to be there's simply no good reason to forbid it.
As for a reason to disallow non-strings, this seems to be a sensible restriction which is ensuring greater performance of the implementation:
Although Python's dicts already have some string-only optimizations -- they just dynamically adapt to a more generic and slightly slower approach once the first non-string key shows up.
So, to answer the use case question, looking at the reasoning behind how Python works in the references from the comments above, we can infer some of the situations that might make this Pythonic quirk useful.
You want an object to have an attribute that cannot be accessed with dot notation, say, to protect it from the naive user. (Quoting Guido: "some people might use this to hide state they don't want accessible using regular attribute notation (x.foo)". Of course, he goes on to say, "but that feels like abuse of the namespace to me, and there are plenty of other ways to manage such state.")
You want an object's attribute names to correspond to external data over which you have no control. Thus, you have to be able to use whatever strings appear in the external data as an attribute name even if it matches a Python reserved word or contains embedded spaces or dashes, etc.
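The external-data use case can be sketched roughly as follows. This is only an illustration, not a recommended pattern: the Record class and the sample row are hypothetical names invented for the example.

```python
class Record:
    """Hypothetical container whose attribute names come from external data."""

    def __init__(self, fields):
        for name, value in fields.items():
            # setattr accepts any string, whether or not it is an identifier
            setattr(self, name, value)


# Keys as they might arrive from a CSV header or a JSON document (made up here)
row = {"class": "economy", "seat-number": 12, "first name": "Ada"}
record = Record(row)

# "class" is a keyword, so record.class would be a SyntaxError
print(getattr(record, "class"))
# record.seat-number would parse as a subtraction, so getattr is needed here too
print(getattr(record, "seat-number"))
print(vars(record))
```

In practice a plain dict is often the simpler home for such data; the sketch only shows that the attribute route is possible.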
Related
Does Python have a builtin type for representing symbolic values, when strings cannot be used?
A quick implementation of my own would look like
class Symbol:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
Usecase
Such symbols are useful when a value – say a keyword argument or a list entry – needs to be initialized to a value that indicates that it hasn't been explicitly set.
If the values have constraints on allowed types, commonly None or some string would be used, but if any value is allowed, some other unique object is needed. My common method is to use an object() assigned to some private variable, but the symbol pattern is more convenient for debugging due to providing a meaningful printed representation.
As an alternative, one could use e.g. a tuple ('default value',) and compare against it with the is operator, but this wouldn't work e.g. for dictionary keys.
While the pattern is simple enough to copy/paste into each shell-script I am writing, a builtin solution with established behavior would be preferable.
Non-builtins
I know that there are packages that provide a symbol type. An obvious one would be the Symbol type of sympy, and there is https://pypi.org/project/SymbolType/. However, adding a dependency to avoid a 5-line pattern seems like overkill, hence my question about a builtin type.
You could use the enum library:
https://docs.python.org/3/library/enum.html
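A rough sketch of how enum members could stand in for symbols. The Sentinel class and the lookup helper are hypothetical names chosen for this illustration; only enum.Enum and enum.auto come from the standard library.

```python
import enum


class Sentinel(enum.Enum):
    UNSET = enum.auto()
    DEFAULT = enum.auto()


def lookup(mapping, key, fallback=Sentinel.UNSET):
    """Return mapping[key], or fallback if one was explicitly given."""
    if key in mapping:
        return mapping[key]
    if fallback is Sentinel.UNSET:
        raise KeyError(key)
    return fallback


# Enum members are hashable, so they also work as dictionary keys
settings = {"colour": "red", Sentinel.DEFAULT: "used as a dictionary key"}

print(lookup(settings, "colour"))
print(lookup(settings, "size", fallback=None))
print(lookup(settings, Sentinel.DEFAULT))
print(Sentinel.UNSET.name)
```

Members are singletons, so comparing with the is operator works, and each member carries a readable name for debugging.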
I constantly see people state that "Everything in Python is an object.", but I haven't seen "thing" actually defined. This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case? Is there a more concise way of stating what a Python object actually is?
Thanks
Anything that can be assigned to a variable is an object.
That includes functions, classes, and modules, and of course ints, strs, floats, lists, and everything else. It does not include whitespace, punctuation, or operators.
Just to mention it, there is the operator module in the standard library which includes functions that implement operators; those functions are objects. That doesn't mean + or * are objects.
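A short sketch of that distinction: the functions in the operator module can be stored and passed around like any other object, while the + token itself cannot. The ops dictionary is just an illustrative name.

```python
import operator

# The functions are objects, so they can be stored in a dict
ops = {"+": operator.add, "*": operator.mul}

print(ops["+"](2, 3))
print(ops["*"](2, 3))
print(isinstance(operator.add, object))

# By contrast, something like `f = +` is a SyntaxError: the operator token
# is not a value that can be bound to a name.
```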
I could go on and on, but this is simple and pretty complete.
Some values are obviously objects; they are instances of a class, have attributes, etc.
>>> i = 3
>>> type(i)
<type 'int'>
>>> i.denominator
1
Other values are less obviously objects. Types are objects:
>>> type(int)
<type 'type'>
>>> int.__mul__(3, 5)
15
Even type is an object (of type type, oddly enough):
>>> type(type)
<type 'type'>
Modules are objects:
>>> import sys
>>> type(sys)
<type 'module'>
Built-in functions are objects:
>>> type(sum)
<type 'builtin_function_or_method'>
In short, if you can reference it by name, it's an object.
What is generally meant is that most things, for example functions and methods, are objects. Modules too. Classes themselves (not just their instances) are objects, and ints, floats, and strings are objects. So, yes, things generally tend to be objects in Python. Cyphase is correct, I just wanted to give some examples of things that might not be immediately obvious as objects.
Because they are objects, a number of properties are observable on things that you would consider special-case, baked-in stuff in other languages. However, __dict__, which allows arbitrary attribute assignment in Python, is often missing on things intended for large-volume instantiation, like int.
Therefore, at least on pure-Python objects, a lot of magic can happen, from introspection to things like creating a new class on the fly.
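As a small sketch of both points, the three-argument form of type() can build a class at runtime, and hasattr shows which objects carry a __dict__. The Point class and its describe function are hypothetical names for the example.

```python
def describe(self):
    return f"{type(self).__name__}(x={self.x}, y={self.y})"


# Build a class on the fly: name, base classes, namespace
Point = type("Point", (object,), {"x": 0, "y": 0, "describe": describe})

p = Point()
p.x = 3  # stored in the instance's __dict__
print(p.describe())
print(hasattr(p, "__dict__"))

# int instances have no __dict__, so arbitrary attributes are rejected
print(hasattr(3, "__dict__"))
try:
    (3).colour = "red"
except AttributeError as exc:
    print("AttributeError:", exc)
```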
Kinda like turtles all the way down.
You're not going to find a rigorous definition like C++11's, because Python does not have a formal specification like C++11, it has a reference manual like pre-ISO C++. The Data model chapter is as rigorous as it gets:
Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann’s model of a “stored program computer,” code is also represented by objects.)
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. …
The glossary also has a shorter definition:
Any data with state (attributes or value) and defined behavior (methods).
And it's true that everything in Python has methods and (other) attributes. Even if there are no public methods, there's a set of special methods and values inherited from the object base class, like the __str__ method.
This wasn't true in versions of Python before 2.2, which is part of the reason we have multiple words for nearly the same thing—object, data, value; type, class… But from then on, the following kinds of things are identical:
Objects.
Things that can be returned or yielded by a function.
Things that can be stored in a variable (including a parameter).
Things that are instances of type object (usually indirectly, through a subclass or two).
Things that can be the value resulting from an expression.
Things represented by pointers to PyObject structs in CPython.
… and so on.
That's what "everything is an object" means.
It also means that Python doesn't have "native types" and "class types" like Java, or "value types" and "reference types" like C#; there's only one kind of thing, objects.
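One way to see the "instances of type object" equivalence is to check a mixed bag of values. This is only a sketch; the particular values in the list are arbitrary choices.

```python
import sys


def gen():
    yield 1


# Numbers, strings, None, classes, modules, functions, generators, Ellipsis
things = [3, "text", None, int, type, sys, sum, gen, gen(), lambda: 0, ...]

for thing in things:
    print(type(thing).__name__, isinstance(thing, object))
```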
This saying would lead me to believe that all tokens of any kind are also considered to be objects, including operators, punctuators, whitespace, etc. Is that actually the case?
No. Those things don't have values, so they're not objects.[1]
Also, variables are not objects. Unlike C-style variables, Python variables are not memory locations with a type containing a value, they're just names bound to a value in some namespace.[2] And that's why you can't pass around references to variables; there is no "thing" to reference.[3]
Assignment targets are also not objects. They sometimes look a lot like values, and even the core devs sometimes refer to things like the a, b in a, b = 1, 2 loosely as a tuple object, but there is no tuple there.[4]
There's also a bit of apparent vagueness with things like elements of a numpy.array (or an array.array or ctypes.Structure). When you write a[0] = 3, the 3 object doesn't get stored in the array the way it would with a list. Instead, numpy stores some bytes that Python doesn't even understand, but that it can use to do "the same thing a 3 would do" in array-wide operations, or to make a new copy of the 3 object if you later ask for a[0].
But if you go back to the definition, it's pretty clear that this "virtual 3" is not an object—while it has a type and value, it does not have an identity.
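The difference can be observed with the standard library's array.array, which stores raw bytes much like a numpy array does. This is a sketch that relies on CPython behaviour: the identity result for the array element is an implementation detail, and the value 10**12 is chosen only to stay clear of CPython's cache of small integers.

```python
from array import array

big = 10**12

as_list = [big]
as_array = array("q", [big])  # "q": signed 64-bit integers stored as raw bytes

# The list holds a reference to the very same object
print(as_list[0] is big)

# The array holds only the value; reading it back builds an int object
print(as_array[0] == big)
print(as_array[0] is big)  # typically False on CPython
```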
1. At the meta level, you can write an import hook that can act on imported code as a byte string, a decoded Unicode string, a list of token tuples, an AST node, a code object, or a module, and all of those are objects… But at the "normal" level, from within the code being imported, tokens, etc. are not objects.
2. Under the covers, there's almost always a string object to represent that name, stored in a dict or tuple that represents the namespace, as you can see by calling globals() or dir(self). But that's not what the variable is.
3. A closure cell is sort of a way of representing a reference to a variable, but really, it's the cell itself that's an object, and the variables at different scopes are just a slightly special kind of name for that cell.
4. However, in a[0] = 3, although a[0] isn't a value, a and 0 are, because that assignment is equivalent to the expression a.__setitem__(0, 3), except that it's not an expression.
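The equivalence in the last footnote can be checked with a small class that records calls to __setitem__. The Recorder class is a hypothetical name for this sketch.

```python
class Recorder:
    """Records every (key, value) pair passed to __setitem__."""

    def __init__(self):
        self.calls = []

    def __setitem__(self, key, value):
        self.calls.append((key, value))


a = Recorder()
a[0] = 3              # subscript assignment statement
a.__setitem__(1, 4)   # the explicit method call it is equivalent to
print(a.calls)
```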
I think I might have a fundamental misunderstanding of what a python attribute actually is. Consider the following:
>>> class Test:
...     pass
...
>>> t = Test()
>>> setattr(t, '0', 0)
>>> t.0
  File "<stdin>", line 1
    t.0
      ^
SyntaxError: invalid syntax
>>> getattr(t, '0')
0
>>> setattr(t, 'one', 1)
>>> t.one
1
>>> getattr(t, 'one')
1
Why does Python allow me to set an attribute if I can't legally access it with dot notation? I understand that t.0 makes no sense, but at the same time I wonder why it's no different than t.one because I created them the same way.
Attributes are a kind of members any Python object can have. Usually you would expect the built-in syntax to dictate what kind of attribute names are accepted. For that, the definition is pretty clear:
attributeref ::= primary "." identifier
So what follows after the dot is required to be a valid identifier which limits the allowed attribute names easily. Ignoring other Unicode areas for now, it essentially means that the attribute may not start with a number. So 0 is not a valid identifier and as such t.0 is not a valid attribute reference as per the specification.
However, getattr and the like work a bit differently. They just require the attribute name to be a string, and that string is passed directly to the internal PyObject_GetAttr functions, which don't require a valid identifier.
So by using getattr, setattr, etc., you can essentially trick Python and attach attributes to objects whose names would not be allowed according to the specification of the language.
This is just a quirk of the syntax and semantics of Python. Any string can be used as an attribute name; however, only identifiers can be used with dot notation. Thus the only way of accessing non-identifier attributes is with getattr/setattr or some other indirect function. Strangely enough, this practice doesn't extend so far as to allow any type to be an attribute name; only strings get that privilege.
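Whether a given name will be reachable with dot notation can be checked with str.isidentifier and keyword.iskeyword. The helper name dot_accessible is an assumption made for this sketch.

```python
import keyword


def dot_accessible(name):
    """True if obj.<name> is syntactically valid for this attribute name."""
    return name.isidentifier() and not keyword.iskeyword(name)


class Test:
    pass


t = Test()
for name in ("one", "0", "class", "first name", ""):
    setattr(t, name, name)  # every string is accepted here
    print(repr(name), dot_accessible(name), getattr(t, name) == name)
```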
I have the following in a Python script:
setattr(stringRESULTS, "b", b)
Which gives me the following error:
AttributeError: 'str' object has no attribute 'b'
Can anyone tell me what the problem is here?
Don't do this. To quote the inestimable Greg Hewgill,
"If you ever find yourself using quoted names to refer to variables,
there's usually a better way to do whatever you're trying to do."
[Here you're one level up and using a string variable for the name, but it's the same underlying issue.] Or as S. Lott followed up with in the same thread:
"90% of the time, you should be using a dictionary. The other 10% of
the time, you need to stop what you're doing entirely."
If you're using the contents of stringRESULTS as a pointer to some object fred which you want to setattr, then these objects you want to target must already exist somewhere, and a dictionary is the natural data structure to store them. In fact, depending on your use case, you might be able to use dictionary key/value pairs instead of attributes in the first place.
IOW, my version of what (I'm guessing) you're trying to do would probably look like
d[stringRESULTS].b = b
or
d[stringRESULTS]["b"] = b
depending on whether I wanted/needed to work with an object instance or a dictionary would suffice.
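Spelled out, the dictionary approach might look like the following sketch. The keys "fred" and "wilma", the value of b, and the use of types.SimpleNamespace as a stand-in for the real objects are all assumptions, since the question does not show them.

```python
from types import SimpleNamespace

# Registry of target objects, keyed by the name that would otherwise
# have been a variable name
d = {"fred": SimpleNamespace(), "wilma": SimpleNamespace()}

stringRESULTS = "fred"  # a string naming the target
b = 42

# Variant 1: the targets are objects with attributes
d[stringRESULTS].b = b
print(d["fred"].b)

# Variant 2: the targets are themselves dictionaries
d2 = {"fred": {}, "wilma": {}}
d2[stringRESULTS]["b"] = b
print(d2["fred"])
```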
(P.S. relatively few people subscribe to the python-3.x tag. You'll usually get more attention by adding the bare 'python' tag as well.)
Instances of the built-in str type have no per-instance __dict__, so you can't set arbitrary attributes on them. You probably need either a dict or a subclass of str:
class StringResult(str):
    pass
which should behave as you expect:
my_string_result = StringResult("spam_and_eggs")
my_string_result.b = b
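A runnable sketch contrasting the two cases. The value assigned to b is made up, since the question does not show it.

```python
class StringResult(str):
    pass


b = 42

plain = "spam_and_eggs"
try:
    setattr(plain, "b", b)  # plain str instances reject new attributes
except AttributeError as exc:
    print("AttributeError:", exc)

my_string_result = StringResult("spam_and_eggs")
my_string_result.b = b      # the subclass instance has a __dict__
print(my_string_result, my_string_result.b)
print(my_string_result == plain)
```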
EDIT:
If you're trying to do what DSM suggests, i.e., modify an attribute on the object bound to the variable whose name is stored in stringRESULTS, then this should do the trick:
locals()[stringRESULTS].b = b
Please note that this is an extremely dangerous operation and can wreak all kinds of havoc on your app if you aren't careful.
Suppose that I have a class like this
class Employee:
    pass
I create two objects for Employee as below
john = Employee()
rob = Employee()
..and create instance variables
john.age = 12
rob.age = '15'
The compiler accepts both and prints the age (john's age in int and rob's age in string). How is this logical? The same data attribute having different type in each object.
Thanks.
Be sure to understand this fundamental principle: in Python, variables don't have types. Values have types. This is the essence of Python being a dynamically typed language, like Lisp and JavaScript, but unlike C, C++ and Java.
>>> foo = 5 # foo now holds a value of type int
>>> foo = 'hello' # foo now holds a value of type str
Here's an excerpt from Wikipedia's entry on typing in Python:
Python uses duck typing and has typed
objects but untyped variable names.
Type constraints are not checked at
compile time; rather, operations on an
object may fail, signifying that the
given object is not of a suitable
type. Despite being dynamically typed,
Python is strongly typed, forbidding
operations that are not well-defined
(for example, adding a number to a
string) rather than silently
attempting to make sense of them.
Do read more on this subject (especially what Duck Typing is) if you want to learn Python.
P.S. This issue is totally orthogonal to attributes of objects. Attributes are just other "variables" in Python, which also can hold values. These values can be of different types.
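Both halves of that description, dynamic but strong typing, can be seen in a few lines. This is only an illustrative sketch.

```python
foo = 5
print(type(foo).__name__)   # the value is an int

foo = "hello"               # the same name is rebound to a str
print(type(foo).__name__)

# Strong typing: Python refuses to guess what number + string should mean
try:
    result = 1 + "1"
except TypeError as exc:
    result = None
    print("TypeError:", exc)
```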
Because by saying rob.age you are not creating a class-wide data attribute that has a specific type; you are merely creating an instance-local, instance-specific attribute that refers to a concrete entity, the string '15'. To create a class-wide attribute you would have to say Employee.age = … or set age inside the class Employee: block. By setting the attribute to a descriptor, I suppose you could check its type every time it is set and restrict it to an integer or string or whatever; but in general, either a class attribute or an instance attribute is just a name for an object, and all Python cares is that .age names an object.
And note that Python could not really guess what you mean anyway. If you say that john.age is 12, you seem to want Python to guess that all other .age attributes should also be numbers. But why shouldn't Python go even further, and guess that they are integers — or better yet, that they are positive even integers? I really do not think it would be reasonable in any case for Python to extrapolate from a single assignment to some kind of guess as to how you will treat that attribute in all other instances of the class.
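The descriptor idea mentioned above could look roughly like this. It is a sketch under assumptions: the Typed class and its storage-name convention are invented for the example and are not part of the original question.

```python
class Typed:
    """Hypothetical data descriptor that rejects values of the wrong type."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Keep the real value under a private name on the instance
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"expected {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(instance, self.storage_name, value)


class Employee:
    age = Typed(int)  # class-wide attribute that checks every assignment


john = Employee()
rob = Employee()
john.age = 12
print(john.age)

try:
    rob.age = "15"
except TypeError as exc:
    print("TypeError:", exc)
```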
It's fundamentally what you get when you have a dynamically typed language.
Type is determined at runtime, not at declaration.
It has advantages and disadvantages, but I believe the advantages outweigh its disadvantages in most development contexts.
(It's what Ruby, Python, PHP, etc. do)
Python's compiler does not care what type of value you bind to an attribute/name, nor does it have to; Python's dynamic nature means that the important type checks (which are usually actually attribute checks) are done at runtime.
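A brief sketch of what "attribute checks at runtime" means in practice. The Duck and Robot classes and the announce function are hypothetical names.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def announce(thing):
    # No declared type: the only requirement is that a speak attribute
    # exists and is callable when this line actually runs
    return thing.speak()


for candidate in (Duck(), Robot(), 42):
    try:
        print(announce(candidate))
    except AttributeError as exc:
        print("AttributeError:", exc)
```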
The term "variable" is confusing in Python.