I'm trying to generate the values that will go into a custom enum instead of using literals:
from enum import IntEnum
class Test(IntEnum):
    for i in range(3):
        locals()['ABC'[i]] = i
    del i
My desired output is three attributes named A, B, C, with values 0, 1, 2, respectively. This is based on two expectations that I've come to take for granted about Python:
The class body will run in an isolated namespace before anything else
locals during that run will refer to said isolated namespace
Once the body is done executing, I would expect the result to be not much different than calling IntEnum('Test', [('A', 0), ('B', 1), ('C', 2)]) (which works just fine BTW).
Instead, I get an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in Test
File "/usr/lib/python3.8/enum.py", line 95, in __setitem__
raise TypeError('Attempted to reuse key: %r' % key)
TypeError: Attempted to reuse key: 'i'
If I try doing the same with class Test: instead of class Test(IntEnum):, it works as expected. The traceback is showing the problem to be happening in enum.py. This contradicts my assumptions about how things work.
What is going on with this code, and how do I create attributes in the local namespace of the class body before IntEnum can get to them?
Background: The reason I'm trying to create the enum this way is that the "real" values are a more complex tuple, and there is a __new__ method defined to parse the tuple and assign some attributes to the individual enum objects. None of that seems relevant to figuring out what is happening with the error and fixing it.
First, an explanation of what is happening. Before executing the class body, the metaclass's __prepare__ method is used to create the namespace. Normally, this is just a dict. However, enum's metaclass (enum.EnumMeta, aliased to enum.EnumType in newer versions) uses an enum._EnumDict class, which specifically prevents duplicate names from being added to the namespace. While this does not alter how the code in the class body is run, it does alter the namespace into which that code places names.
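To see the mechanism in isolation, here is a minimal sketch (my own illustration, not the actual enum source) of a metaclass whose __prepare__ hands the class body a dict subclass that rejects rebinding, much like enum._EnumDict does:

class NoDupeDict(dict):
    def __setitem__(self, key, value):
        if key in self:
            raise TypeError('Attempted to reuse key: %r' % key)
        super().__setitem__(key, value)

class NoDupeMeta(type):
    @classmethod
    def __prepare__(metacls, name, bases):
        # the object returned here *is* the class body's namespace
        return NoDupeDict()

class Demo(metaclass=NoDupeMeta):
    for i in range(3):            # the second iteration rebinds i ...
        locals()['ABC'[i]] = i    # ... and __setitem__ raises TypeError

The for loop stores i into the namespace on every iteration, so the second iteration trips the duplicate check, exactly as in the traceback above.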
There are a couple of exceptions to the duplicate prevention, which offer potential solutions. First, the proper solution is to use the _ignore_ sunder attribute. If it gets set first, the variable i can be used normally, and will not appear in the final class:
class Test(IntEnum):
    _ignore_ = ['i']
    for i in range(3):
        locals()['ABC'[i]] = i
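With _ignore_ in place, the loop variable never becomes a member and the result is exactly what was wanted:

>>> list(Test)
[<Test.A: 0>, <Test.B: 1>, <Test.C: 2>]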
Another, much hackier method is to use a dunder name, which will be ignored by the metaclass:
class Test(IntEnum):
    for __i__ in range(3):
        locals()['ABC'[__i__]] = __i__
    del __i__
While this solution is functional, it relies on dunder names, which are nominally reserved by the language, and on an undocumented feature of the metaclass, both of which are bad.
Related
I know what namespaces are. But when running
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('bar')
parser.parse_args(['XXX']) # outputs: Namespace(bar='XXX')
What kind of object is Namespace(bar='XXX')? I find this totally confusing.
Reading the argparse docs, it says "Most ArgumentParser actions add some value as an attribute of the object returned by parse_args()". Shouldn't this object then appear when running globals()? Or how can I introspect it?
Samwise's answer is very good, but let me answer the other part of the question.
Or how can I introspect it?
Being able to introspect objects is a valuable skill in any language, so let's approach this as though Namespace is a completely unknown type.
>>> obj = parser.parse_args(['XXX']) # outputs: Namespace(bar='XXX')
Your first instinct is good. See if there's a Namespace in the global scope, which there isn't.
>>> Namespace
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'Namespace' is not defined
So let's see the actual type of the thing. The Namespace(bar='XXX') output is coming from a __str__ or __repr__ method somewhere, so let's see what the type actually is.
>>> type(obj)
<class 'argparse.Namespace'>
and its module
>>> type(obj).__module__
'argparse'
Now it's a pretty safe bet that we can do from argparse import Namespace and get the type. Beyond that, we can do
>>> help(argparse.Namespace)
in the interactive interpreter to get detailed documentation on the Namespace class, all with no Internet connection necessary.
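One more trick worth knowing: when the type turns out to be implemented in pure Python (as argparse.Namespace is), inspect can show you its source directly:

>>> import inspect
>>> print(inspect.getsource(type(obj)))

This raises a TypeError for types implemented in C, but for pure-Python stdlib modules like argparse it takes you straight to the definition.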
It's simply a container for the data that parse_args generates.
https://docs.python.org/3/library/argparse.html#argparse.Namespace
This class is deliberately simple, just an object subclass with a readable string representation.
Just do parser.parse_args(...).bar to get the value of your bar argument. That's all there is to that object. Per the doc, you can also convert it to a dict via vars().
The symbol Namespace doesn't appear when running globals() because you didn't import it individually. (You can access it as argparse.Namespace if you want to.) It's not necessary to touch it at all, though, because you don't need to instantiate a Namespace yourself. I've used argparse many times and until seeing this question never paid attention to the name of the object type that it returns -- it's totally unimportant to the practical applications of argparse.
Namespace is basically just a bare-bones class, on whose instances you can define attributes, with a few niceties:
A nice __repr__
Only keyword arguments can be used to instantiate it, preventing "anonymous" attributes.
A convenient membership test for checking whether an attribute exists ('foo' in Namespace(bar=3) evaluates to False)
Equality with other Namespace instances, based on having identical attributes and attribute values (e.g. Namespace(foo=3, bar=5) == Namespace(bar=5, foo=3))
Instances of Namespace are returned by parse_args:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('bar')
args = parser.parse_args(['XXX'])
assert args.bar == 'XXX'
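A quick demonstration of those niceties, standalone, instantiating Namespace directly rather than going through a parser (note the repr lists attributes in sorted order):

from argparse import Namespace

ns = Namespace(foo=3, bar=5)
print(ns)                             # Namespace(bar=5, foo=3)
print('foo' in ns)                    # True
print('baz' in ns)                    # False
print(ns == Namespace(bar=5, foo=3))  # True
print(vars(ns))                       # {'foo': 3, 'bar': 5}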
In CPython this code works:
import inspect
from types import FunctionType


def f(a, b):  # line 5
    print(a, b)

f_clone = FunctionType(
    f.__code__,
    f.__globals__,
    closure=f.__closure__,
    name=f.__name__
)
f_clone.__annotations__ = {'a': int, 'b': int}
f_clone.__defaults__ = (1, 2)

print(inspect.signature(f_clone))  # (a: int = 1, b: int = 2)
print(inspect.signature(f))        # (a, b)

f_clone()  # 1 2
f(1, 2)    # 1 2

try:
    f()
except TypeError as e:
    print(e)  # f() missing 2 required positional arguments: 'a' and 'b'
However, in Cython, when calling f_clone I get:
XXX lineno: 5, opcode: 0
Traceback (most recent call last):
...
File "test.py", line 5, in f # line of f definitio
SystemError: unknown opcode
I need this to create a copy of a class's __init__ method on each class creation and modify its signature, while keeping the original __init__ signature untouched.
Edit:
Changes made to the signature of the copied object must not affect runtime calls; they are needed only for inspection purposes.
I am relatively convinced this is never going to work well. If I were you I'd modify your code to fail elegantly for unclonable functions (maybe by just using the original __init__ and not replacing it, since this seems to be a purely cosmetic approach to generate prettier docstrings). After that you could submit an issue to the Cython issue tracker - however the maintainers of Cython know that full-introspection compatibility with Python is very challenging, so may not be hugely interested.
One of the main reasons I think you should just handle the error rather than find a workaround is that Cython is not the only method of accelerating Python. For example, Numba can generate classes containing JIT-accelerated code, and people can write their own functions in C (either as C-API functions, or wrapped with ctypes or CFFI). These are all situations where your rather fragile introspection approach is likely to break. Handling the error fixes it for all of these, while a workaround would be needed for each one individually, plus all the methods I haven't thought of, plus any that are developed in the future.
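A minimal sketch of that "fail elegantly" suggestion (my illustration; the helper name safe_clone is made up): only attempt the clone for genuine Python functions, and return everything else unchanged:

from types import FunctionType

def safe_clone(func):
    # only genuine CPython functions carry real bytecode in __code__;
    # Cython functions, builtins and C extension functions are returned as-is
    if type(func) is not FunctionType:
        return func
    return FunctionType(func.__code__, func.__globals__,
                        name=func.__name__, closure=func.__closure__)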
Some details about Cython functions: at the moment Cython has a compilation option called binding that can generate functions in two different modes:
With binding=False, functions have the type builtin_function_or_method, which has minimal introspection capability, and so no __code__, __globals__, __closure__ (or most other) attributes.
With binding=True, functions have the type cython_function_or_method. This has improved introspection capability, so it does provide most of the expected attributes. However, some of them are nonsense defaults - specifically __code__. The __code__ attribute is expected to be Python bytecode, but Cython doesn't use Python bytecode (since it's compiled to C). Therefore it just provides a dummy attribute.
It looks like Cython defaults to binding=True when compiling a .py file and when compiling a regular (non-cdef) class, giving the behaviour you report. However, when compiling a .pyx file it currently defaults to binding=False. It's possible you may also want to handle the binding=False case in some circumstances too.
Having established that trying to create a regular Python function object with the __code__ attribute of a cython_function_or_method isn't going to work, let's look at a few other options:
>>> print(f)
<cyfunction f at 0x7f08a1c63550>
>>> type(f)()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot create 'cython_function_or_method' instances
So you can't create your own cython_function_or_method and populate it from Python - the type does not have a user callable constructor.
copy.copy appears to work, but doesn't actually create a new instance:
>>> import copy
>>> copy.copy(f)
<cyfunction f at 0x7f08a1c63550>
Note, however, that this has exactly the same address - it isn't a copy:
>>> copy.copy(f) is f
True
At which point I'm out of ideas.
What I don't quite get is why you don't use functools.wraps?
import functools

@functools.wraps(f)
def wrapper(*args, **kwargs):
    return f(*args, **kwargs)
This updates wrapper with most of the relevant introspection attributes from f, works for both types of Cython function (to an extent - the binding=False case doesn't provide much useful information), and should work for most other types of function too.
It's possible I'm missing something, but it seems a whole lot less fragile than your scheme of copying code objects.
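For the stated goal (cosmetic signature changes for inspection only) there is also a documented escape hatch that avoids __code__ entirely: inspect.signature honours a __signature__ attribute on the callable. A minimal sketch combining it with the wrapper approach (my own suggestion, using the question's f):

import functools
import inspect

def f(a, b):
    print(a, b)

@functools.wraps(f)
def f_clone(*args, **kwargs):
    return f(*args, **kwargs)

# build a cosmetic signature; the runtime behaviour of f_clone is unchanged
params = [
    inspect.Parameter('a', inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      default=1, annotation=int),
    inspect.Parameter('b', inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      default=2, annotation=int),
]
f_clone.__signature__ = inspect.Signature(params)

print(inspect.signature(f_clone))  # (a: int = 1, b: int = 2)
print(inspect.signature(f))        # (a, b)

Because this never touches __code__, it sidesteps the unknown-opcode problem entirely (assuming the function object accepts attribute assignment).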
class Applicant:
    applicant_id_count = 1000
    application_dict = {
        "A": 0,
        "B": 0,
        "C": 0
    }

    def __init__(self, applicant_name):
        self.__applicant_name = applicant_name
        self.__applicant_id = None
        self.__job_band = None
I need to make the static variables in the above class, i.e. application_dict and applicant_id_count, private static variables. Is there any such thing in Python?
Python does not have access modifiers. If you want to access an instance (or class) variable from outside the instance or class, you are always allowed to do so.
That said, there's a convention using underscores (_) that most developers follow to indicate that a variable/method is private. A single leading underscore is a conventional way of saying that a variable is private, but it doesn't actually change the access privilege. Example:
class Applicant:
    _applicant_id_count = 1000

Applicant._applicant_id_count  # equals 1000
If you want to emulate private variables for some reason, you can always use the __ prefix. Python mangles the names of variables so that they're not easily visible. Example:
class Applicant:
    __applicant_id_count = 1000
You will get the following error when someone tries to directly access it:
Applicant.__applicant_id_count
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'Applicant' has no attribute '__applicant_id_count'
Someone can still hack their way around the mangling and use the variable like this:
Applicant._Applicant__applicant_id_count  # equals 1000
You can read more about it here: https://www.geeksforgeeks.org/private-variables-python/
In Python, you can always access all variables. But there is a naming convention for classes and attributes, described in PEP 8. You can use a __ prefix (two underscores): Python mangles the names of variables like __foo so that they're not easily visible to code outside the class that contains them. If you want a "protected" variable instead, use a single underscore (_) prefix.
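Applied to the class from the question, a minimal sketch of both conventions (my own illustration; the _record helper is made up):

class Applicant:
    _applicant_id_count = 1000    # private by convention; still accessible
    __application_dict = {        # name-mangled to _Applicant__application_dict
        "A": 0,
        "B": 0,
        "C": 0,
    }

    @classmethod
    def _record(cls, band):
        # inside the class, __application_dict resolves via name mangling
        cls.__application_dict[band] += 1
        return cls.__application_dict

print(Applicant._record("A"))                  # {'A': 1, 'B': 0, 'C': 0}
print(Applicant._Applicant__application_dict)  # the mangled name still works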
When creating a new class instance, I'm trying to call a method in a different class however can't get it to work. Here's what I have:
class DataProject(object):
    def __init__(self, name=None, input_file=None, datamode=None, comments=None, readnow=True):
        ..............
        # here's where I'm trying to call the method in the other class
        if type(input_file) == str:
            self.input_file_format = self.input_file.split(".")[-1]
            if readnow:
                getattr(Analysis(), 'read_' + self.input_file_format)(self, input_file)

class Analysis(object):
    def __init__(self):
        pass  # nothing happens here atm

    def read_xlsx(self, parent, input_file):
        """Method to parse xlsx files and dump them into a DataFrame"""
        xl = pd.ExcelFile(input_file)
        for s in xl.sheet_names:
            parent.data[s] = xl.parse(s)
I'm getting a NameError: global name 'read_xlsx' is not defined when I run this with afile.xlxs as input, which made me think I'd just discovered a massive hole in my Python knowledge (not that there aren't many, but they tend to be hard to see, sort of like big forests...).
I would have thought that getattr(Analysis(), ...) would access the global namespace, in which it would find the Analysis class and its methods. And in fact print(globals().keys()) shows that Analysis is part of it:
['plt', 'mlab', '__builtins__', '__file__', 'pylab', 'DataProject', 'matplotlib', '__package__', 'W32', 'Helpers', 'time', 'pd', 'pyplot', 'np', '__name__', 'dt', 'Analysis', '__doc__']
What am I missing here?
EDIT:
The full traceback is:
Traceback (most recent call last):
File "C:\MPython\dataAnalysis\dataAnalysis.py", line 101, in <module>
a=DataProject(input_file='C:\\MPython\\dataAnalysis\\EnergyAnalysis\\afile.xlxs',readnow=True)
File "C:\MPython\dataAnalysis\dataAnalysis.py", line 73, in __init__
getattr(Analysis(),'read_'+self.input_file_format)(self,input_file)
File "C:\MPython\dataAnalysis\dataAnalysis.py", line 90, in read_xls
read_xlsx(input_file)
NameError: global name 'read_xlsx' is not defined
My main call is:
if __name__=="__main__":
a=DataProject(input_file='C:\\MPython\\dataAnalysis\\EnergyAnalysis\\afile.xlx',readnow=True)
From the full traceback, it appears that your DataProject class is successfully calling the Analysis.read_xls method, which in turn is trying to call read_xlsx. However, it's calling it as a global function, not as a method.
Probably you just need to replace the code on line 90, turning read_xlsx(input_file) into self.read_xlsx(input_file), though you might need to pass an extra parameter for the parent DataProject instance too.
getattr() works as you describe in both Python 2.x and Python 3.x. The bug must be somewhere else.
This modification of your code (none of the core logic is changed) works fine, for instance:
class DataProject(object):
    def __init__(self, name="myname", input_file="xlsx", datamode=None, comments=None, readnow=True):
        if type(input_file) == str:
            self.input_file_format = input_file.split(".")[-1]
            if readnow:
                getattr(Analysis(), 'read_' + self.input_file_format)(self, input_file)

class Analysis(object):
    def __init__(self):
        pass  # nothing happens here atm

    def read_xlsx(self, parent, input_file):
        """Method to parse xlsx files and dump them into a DataFrame"""
        print("hello")

a = DataProject()
Output is:
$ python3 testfn.py
hello
Why using getattr() in this way is usually a bad idea
The way you are using getattr forces a naming convention on your methods (read_someformat). The naming of your methods should not be a core part of your program's logic: you should always be able to rename a function at every call site and definition and leave the behaviour of the program intact.
If a file format needs to be handled by a specific method, this logic should be delegated to some unit (e.g. a function) with responsibility for it. One way (there are others) of doing this is to have a function which takes the input and decides which function needs to handle it:
def read_file(self, file, format):
    if format == 'xls':
        self.read_xls(file)
    if format == 'csv':
        self.read_csv(file)
The above snippet does have its issues too (a better way to do it would be the chain-of-responsibility pattern, for example), but it will be fine for small scripts and is much nicer.
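One common alternative that avoids both the fragile getattr and the if chain (a sketch of mine, not from the original answer, with simplified handler signatures) is an explicit dispatch table:

class Analysis(object):
    def read_file(self, input_file, fmt):
        # map each supported format to its handler explicitly
        handlers = {
            'xlsx': self.read_xlsx,
            'csv': self.read_csv,
        }
        try:
            handler = handlers[fmt]
        except KeyError:
            raise ValueError('unsupported file format: %r' % fmt)
        return handler(input_file)

Unsupported formats now fail with a clear ValueError instead of an AttributeError from getattr, and renaming a handler only requires updating the table.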
I am writing a binding system that exposes classes and functions to python in a slightly unusual way.
Normally one would create a Python type and provide a list of functions that represent the methods of that type, and then allow Python to use its generic tp_getattro function to select the right one.
For reasons I won't go into here, I can't do it this way, and must provide my own tp_getattro function that selects methods from elsewhere and returns my own 'bound method' wrapper. This works fine, but means that a type's methods are not listed in its dictionary (so dir(MyType()) doesn't show anything interesting).
The problem is that I cannot seem to get __add__ methods working. see the following sample:
>>> from mymod import Vec3
>>> v=Vec3()
>>> v.__add__
<Bound Method of a mymod Class object at 0xb754e080>
>>> v.__add__(v)
<mymod.Vec3 object at 0xb751d710>
>>> v+v
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'mymod.Vec3' and 'mymod.Vec3'
As you can see, Vec3 has an __add__ method which can be called, but Python's + refuses to use it.
How can I get python to use it? How does the + operator actually work in python, and what method does it use to see if you can add two arbitrary objects?
Thanks.
(P.S. I am aware of other systems such as Boost.Python and SWIG which do this automatically, and I have good reason for not using them, however wonderful they may be.)
Do you have an nb_add in your type's number-methods structure (pointed to by the tp_as_number field of your type object)?
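For background on why your tp_getattro is never consulted: the + operator performs an implicit special-method lookup on the type (at C level, the nb_add slot), bypassing instance attributes and custom attribute hooks. The same rule is observable in pure Python (my illustration, using an ordinary Python class as a stand-in for the extension type):

class Vec3:
    pass

v = Vec3()
v.__add__ = lambda other: 'added'  # instance attribute: ignored by +
try:
    v + v
except TypeError as e:
    print(e)  # unsupported operand type(s) for +: 'Vec3' and 'Vec3'

Vec3.__add__ = lambda self, other: 'added'  # on the type: the slot is filled
print(v + v)  # added

So returning __add__ from tp_getattro makes v.__add__(v) work, but only filling nb_add (or putting the method in the type's dict) makes v + v work.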