How to clone function in cython? I getting SystemError: unknown opcode - python

In cpython this code would work:
import inspect
from types import FunctionType
def f(a, b): # line 5
print(a, b)
f_clone = FunctionType(
f.__code__,
f.__globals__,
closure=f.__closure__,
name=f.__name__
)
f_clone.__annotations__ = {'a': int, 'b': int}
f_clone.__defaults__ = (1, 2)
print(inspect.signature(f_clone)) # (a: int = 1, b: int = 2)
print(inspect.signature(f)) # (a, b)
f_clone() # 1 2
f(1, 2) # 1 2
try:
f()
except TypeError as e:
print(e) # f() missing 2 required positional arguments: 'a' and 'b'
However in cython when calling f_clone, I get:
XXX lineno: 5, opcode: 0
Traceback (most recent call last):
...
File "test.py", line 5, in f # line of f definitio
SystemError: unknown opcode
I need this to create a copy of class __init__ method on each class creation and and modify its signature, but keep original __init__ signature untouched.
Edit:
Changes made to signature of copied object must not affect runtime calls and needed only for inspection purposes.

I am relatively convinced this is never going to work well. If I were you I'd modify your code to fail elegantly for unclonable functions (maybe by just using the original __init__ and not replacing it, since this seems to be a purely cosmetic approach to generate prettier docstrings). After that you could submit an issue to the Cython issue tracker - however the maintainers of Cython know that full-introspection compatibility with Python is very challenging, so may not be hugely interested.
One of the main reasons I think you should just handle the error rather than find a workaround is that Cython is not the only method to accelerate Python. For example Numba can generate classes containing JIT accelerated code, or people can write their own functions in C (either as a C-API function, or perhaps wrapped with Ctypes or CFFI). These are all situations where your rather fragile introspection approach is likely to break. Handling the error fixes it for all of these; while you're likely to need an individual workaround for each one, plus all the methods I haven't thought of, plus any that are developed in the future.
Some details about Cython functions: at the moment a Cython has a compilation option called binding that can generate functions in two different modes:
With binding=False functions have the type builtin_function_or_method, which has minimum introspection capacities, and so no __code__, __globals__, __closure__ (or most other) attributes.
With binding=True functions have the type cython_function_or_method. This has improved introspection capacity, so does provide most of the expected annotations. However some of them are nonsense defaults - specifically __code__. The __code__ attribute is expected to be Python bytecode, however Cython doesn't use Python bytecode (since it's compiled to C). Therefore it just provides a dummy attribute.
It looks like Cython defaults to binding=True when compiling a .py file and when compiling a regular (non-cdef) class, giving the behaviour you report. However, when compiling a .pyx file it currently defaults to binding=False. It's possible you may also want to handle the binding=False case in some circumstances too.
Having established that trying to create a regular Python function object with the __code__ attribute of a cython_function_or_method isn't going to work, let's look at a few other options:
>>> print(f)
<cyfunction f at 0x7f08a1c63550>
>>> type(f)()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot create 'cython_function_or_method' instances
So you can't create your own cython_function_or_method and populate it from Python - the type does not have a user callable constructor.
copy.copy appears to work, but doesn't actually create a new instance:
>>> import copy
>>> copy.copy(f)
<cyfunction f at 0x7f08a1c63550>
Note, however, that this has exactly the same address - it isn't a copy:
>>> copy.copy(f) is f
True
At which point I'm out of ideas.
What I don't quite get is why you don't use functools.wraps?
#functools.wraps(f):
def wrapper(*args, **kwargs):
return f(*args, **kwargs)
This updates wrapper with most of the relevant introspection attributes from f, works for both types of Cython function (to an extent - the binding=False case doesn't provide much useful information), and should work for most other types of function too.
It's possible I'm missing something, but it seems a whole lot less fragile than your scheme of copying code objects.

Related

Subtype Polymorphism is broken in Cython v30.0.0a11?

Trying to pass an instance of a derived class to a function which accepts instances of the superclass gives an error in Cython v3.0.0a11:
test.pyx:
class MyInt(int):
pass
def takes_int(a: int):
pass
try.py:
from test import takes_int, MyInt
takes_int(MyInt(1))
try.py OUTPUT:
Traceback (most recent call last):
File "C:\Users\LENOVO PC\PycharmProjects\MyProject\cython_src\try.py", line 3, in <module>
takes_int(MyInt(1))
TypeError: Argument 'a' has incorrect type (expected int, got MyInt)
Changing to v0.29.32, cleaning the generated C file and the object files, and re-running, gets rid of the error.
This is (kind of) expected.
Cython has never allowed subtype polymorphism for builtin type arguments. See https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#types:
This requires an exact match of the class, it does not allow subclasses. This allows Cython to optimize code by accessing internals of the builtin class, which is the main reason for declaring builtin types in the first place.
This is a restriction which applies only to builtin types - for Cython defined cdef classes it works fine. It's also slightly different to the usual rule for annotations, but it's there because it's the only way that Cython can do much with these annotation.
What's changed is that an int annotation is interpreted as "any object" in Cython 0.29.x and a Python int in Cython 3. (Note that cdef int declares a C int though.) The reason for not using an int annotation in earlier versions of Cython is that Python 2 has two Python integer types, and it isn't easy to accept both of those and usefully use the types.
I'm not quite sure exactly what the final version of Cython 3 will end up doing with int annotations though.
If you don't want Cython to use the annotation (for example you would like your int class to be accepted) then you can turn off annotation_typing locally with the #cython.annotation_typing(False) decorator.

Python: type-hinting CTypes "pointer to X" types in a way that MyPy accepts

I have a large Python binding over a C library, with complex memory management. To help with that, I have drawn up the following aliases for strings (here, you a minimal reproducible example of my definitions and the subsequent problem).
from typing import Type, TypeAlias
from ctypes import POINTER, pointer, c_char, c_char_p
#normal python string
p_useable_p_str = str
#the result of a my_str.decode("utf-8")
c_useable_p_str = bytes
#used for string literals returned by the C, which should not be freed
p_useable_c_str = c_char_p
#used for allocated strings returned by the C, which need to be freed later
c_useable_c_str = POINTER(c_char) #the problematic line
def example(hello: c_useable_c_str): # source of the MyPy error
pass
My code runs fine (memory freed properly, consistent inheritance of the above if necessary, etc) when following the above definitions+usage conventions). POINTER(c_char) has the intended behavior in the rest of the code.
However, analyzing the above with MyPy, I get:
playground.py:10: error: Variable "playground.c_useable_c_str" is not valid as a type
playground.py:10: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
Found 1 error in 1 file (checked 1 source file)
I get this error for anything that uses the c_useable_c_str alias. I, of course, read the section in the documentation linked above, and tried using Type and TypeAlias in a bunch of different ways - to no avail.
The only syntax that seems to make MyPy happy is
c_useable_c_str = pointer[c_char]
However, when actually running the code with this definition of the type alias, I get the following error (not seen by MyPy, so I suspect a bug on MyPy's end, or a lack of typing in the standard):
Traceback (most recent call last):
File "/home/fulguritude/ProfessionalWork/LEDR/Orchestra-AvesTerra/Python_binding/playground2.py", line 92, in <module>
c_useable_c_str = pointer[c_char]
TypeError: 'builtin_function_or_method' object is not subscriptable
Any ideas as to how I'm supposed to make things consistent ?
TLDR: what's the valid way to type hint, and alias, a "pointer to X" with MyPy and CTypes ?
I think you can use something like:
if TYPE_CHECKING: # valid for mypy
c_useable_c_str = pointer[c_char] # the problematic line
else: # valid at run time
c_useable_c_str = pointer
I found an imperfect (but good enough) solution, based on what I found here: https://github.com/python/mypy/issues/7540 (comment by thijsmie, commented on May 21, 2021).
if not TYPE_CHECKING:
# Monkeypatch typed pointer from typeshed into ctypes
# NB: files that wish to import ctypes.pointer should all import `pointer` from this file instead
class pointer_fix:
#classmethod
def __class_getitem__(cls, item):
return POINTER(item)
pointer = pointer_fix
It does solve MyPy's complaints concerning c_useable_c_str, and functions that use c_useable_c_str still get proper typing errors when the argument provided isn't of a compatible type, so that's good.
Well, EXCEPT for the following case:
class t_string(c_useable_c_str):
pass
class t_string_p(pointer[t_string]):
pass
where MyPy does not seem to be able to understand that t_string_p is compatible with the type pointer[pointer[c_char]]. For these cases, I used a type union; not ideal, but it got the job done.
I'll be leaving this question unanswered for now, in case someone, someday, can find a better solution.

Add a signature, with annotations, to extension methods

When embedding Python in my application, and writing an extension type, I can add a signature to the method by using a properly crafted .tp_doc string.
static PyMethodDef Answer_methods[] = {
{ "ultimate", (PyCFunction)Answer_ultimate, METH_VARARGS,
"ultimate(self, question='Life, the universe, everything!')\n"
"--\n"
"\n"
"Return the ultimate answer to the given question." },
{ NULL }
};
When help(Answer) is executed, the following is returned (abbreviated):
class Answer(builtins.object)
|
| ultimate(self, question='Life, the universe, everything!')
| Return the ultimate answer to the given question.
This is good, but I'm using Python3.6, which has support for annotations. I'd like to annotate question to be a string, and the function to return an int. I've tried:
static PyMethodDef Answer_methods[] = {
{ "ultimate", (PyCFunction)Answer_is_ultimate, METH_VARARGS,
"ultimate(self, question:str='Life, the universe, everything!') -> int\n"
"--\n"
"\n"
"Return the ultimate answer to the given question." },
{ NULL }
};
but this reverts to the (...) notation, and the documentation becomes:
| ultimate(...)
| ultimate(self, question:str='Life, the universe, everything!') -> int
| --
|
| Return the ultimate answer to the given question.
and asking for inspect.signature(Answer.ultimate) results in an exception.
Traceback (most recent call last):
File "<string>", line 11, in <module>
File "inspect.py", line 3037, in signature
File "inspect.py", line 2787, in from_callable
File "inspect.py", line 2266, in _signature_from_callable
File "inspect.py", line 2090, in _signature_from_builtin
ValueError: no signature found for builtin <built-in method ultimate of example.Answer object at 0x000002179F3A11B0>
I've tried to add the annotations after the fact with Python code:
example.Answer.ultimate.__annotations__ = {'return': bool}
But the builtin method descriptors can't have annotations added this way.
Traceback (most recent call last):
File "<string>", line 2, in <module>
AttributeError: 'method_descriptor' object has no attribute '__annotations__'
Is there a way to add annotations to extension methods, using the C-API?
Argument Clinic looked promising and may still be very useful, but as of 3.6.5, it doesn't support annotations.
annotation
The annotation value for this parameter. Not currently supported, because PEP 8 mandates that the Python library may not use annotations.
TL;DR There is currently no way to do this.
How do signatures and C extensions work together?
In theory it works like this (for Python C extension objects):
If the C function has the "correct docstring" the signature is stored in the __text_signature__ attribute.
If you call help or inspect.signature on such an object it parses the __text_signature__ and tries to construct a signature from that.
If you use the argument clinic you don't need to write the "correct docstring" yourself. The signature line is generated based on comments in the code. However the 2 steps mentioned before still happen. They just happen to the automatically generated signature line.
That's why built-in Python functions like sum have a __text-signature__s:
>>> sum.__text_signature__
'($module, iterable, start=0, /)'
The signature in this case is generated through the argument clinic based on the comments around the sum implementation.
What are the problems with annotations?
There are several problems with annotations:
Return annotations break the contract of a "correct docstring". So the __text_signature__ will be empty when you add a return annotation. That's a major problem because a workaround would necessarily involve re-writing the part of the CPython C code that is responsible for the docstring -> __text_signature__ translation! That's not only complicated but you would also have to provide the changed CPython version so that it works for the people using your functions.
Just as example, if you use this "signature":
ultimate(self, question:str='Life, the universe, everything!') -> int
You get:
>>> ultimate.__text_signature__ is None
True
But if you remove the return annotation:
ultimate(self, question:str='Life, the universe, everything!')
It gives you a __text_signature__:
>>> ultimate.__text_signature__
"(self, question:str='Life, the universe, everything!')"
If you don't have the return annotation it still won't work because annotations are explicitly not supported (currently).
Assuming you have this signature:
ultimate(self, question:str='Life, the universe, everything!')
It doesn't work with inspect.signature (the exception message actually says it all):
>>> import inspect
>>> inspect.signature(ultimate)
Traceback (most recent call last):
...
raise ValueError("Annotations are not currently supported")
ValueError: Annotations are not currently supported
The function that is responsible for the parsing of __text_signature__ is inspect._signature_fromstr. In theory it could be possible that you maybe could make it work by monkey-patching it (return annotations still wouldn't work!). But maybe not, there are several places that make assumptions about the __text_signature__ that may not work with annotations.
Would PyFunction_SetAnnotations work?
In the comments this C API function was mentioned. However that deliberately doesn't work with C extension functions. If you try to call it on a C extension function it will raise a SystemError: bad argument to internal function call. I tested this with a small Cython Jupyter "script":
%load_ext cython
%%cython
cdef extern from "Python.h":
bint PyFunction_SetAnnotations(object func, dict annotations) except -1
cpdef call_PyFunction_SetAnnotations(object func, dict annotations):
PyFunction_SetAnnotations(func, annotations)
>>> call_PyFunction_SetAnnotations(sum, {})
---------------------------------------------------------------------------
SystemError Traceback (most recent call last)
<ipython-input-4-120260516322> in <module>()
----> 1 call_PyFunction_SetAnnotations(sum, {})
SystemError: ..\Objects\funcobject.c:211: bad argument to internal function
So that also doesn't work with C extension functions.
Summary
So return annotations are completely out of the question currently (at least without distributing your own CPython with the program). Parameter annotations could work if you monkey-patch a private function in the inspect module. It's a Python module so it could be feasible, but I haven't made a proof-of-concept so treat this as a maybe possible, but probably very complicated and almost certainly not worth the trouble.
However you can always just wrap the C extension function with a Python function (just a very thing wrapper). This Python wrapper can have function annotations. It's more maintenance and a tiny bit slower but saves you all the hassle with signatures and C extensions. I'm not exactly sure but if you use Cython to wrap your C or C++ code it might even have some automated tooling (writing the Python wrappers automatically).

How to prevent overwritting Python Built-in Function by accident?

I know that it is a bad idea to name a variable that is the same name as a Python built-in function. But say if a person doesn't know all the "taboo" variable names to avoid (e.g. list, set, etc.), is there a way to make Python at least to stop you (e.g. via error messages) from corrupting built-in functions?
For example, command line 4 below allows me to overwrite / corrupt the built-in function set() without stopping me / producing errors. (This error was left un-noticed until it gets to command line 6 below when set() is called.). Ideally I would like Python to stop me at command line 4 (instead of waiting till command line 6).
Note: following executions are performed in Python 2.7 (iPython) console. (Anaconda Spyder IDE).
In [1]: myset = set([1,2])
In [2]: print(myset)
set([1, 2])
In [3]: myset
Out[3]: {1, 2}
In [4]: set = set([3,4])
In [5]: print(set)
set([3, 4])
In [6]: set
Out[6]: {3, 4}
In [7]: myset2 = set([5,6])
Traceback (most recent call last):
File "<ipython-input-7-6f49577a7a45>", line 1, in <module>
myset2 = set([5,6])
TypeError: 'set' object is not callable
Background: I was following the tutorial at this HackerRank Python Set Challenge. The tutorial involves creating a variable valled set (which has the same name as the Python built-in function). I tried out the tutorial line-by-line exactly and got the "set object is not callable" error. The above test is driven by this exercise. (Update: I contacted HackerRank Support and they have confirmed they might have made a mistake creating a variable with built-in name.)
As others have said, in Python the philosophy is to allow users to "misuse" things rather than trying to imagine and prevent misuses, so nothing like this is built-in. But, by being so open to being messed around with, Python allows you to implement something like what you're talking about, in a limited way*. You can replace certain variable namespace dictionaries with objects that will prevent your favorite variables from being overwritten. (Of course, if this breaks any of your code in unexpected ways, you get both pieces.)
For this, you need to use use something like eval(), exec, execfile(), or code.interact(), or override __import__(). These allow you to provide objects, that should act like dictionaries, which will be used for storing variables. We can create a "safer" replacement dictionary by subclassing dict:
class SafeGlobals(dict):
def __setitem__(self, name, value):
if hasattr(__builtins__, name) or name == '__builtins__':
raise SyntaxError('nope')
return super(SafeGlobals, self).__setitem__(name, value)
my_globals = SafeGlobals(__builtins__=__builtins)
With my_globals set as the current namespace, setting a variable like this:
x = 3
Will translate to the following:
my_globals['x'] = 3
The following code will execute a Python file, using our safer dictionary for the top-level namespace:
execfile('safetyfirst.py', SafeGlobals(__builtins__=__builtins__))
An example with code.interact():
>>> code.interact(local=SafeGlobals(__builtins__=__builtins__))
Python 2.7.9 (default, Mar 1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> x = 2
>>> x
2
>>> dict(y=5)
{'y': 5}
>>> dict = "hi"
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "<stdin>", line 4, in __setitem__
SyntaxError: nope
*Unfortunately, this approach is very limited. It will only prevent overriding built-ins in the top-level namespace. You're free to override built-ins in other namespaces:
>>> def f():
... set = 1
... return set
...
>>> f()
1
This is an interesting idea; unfortunately, Python is not very restrictive and does not offer out-of-the-box solutions for such intentions. Overriding lower-level identifiers in deeper nested scopes is part of Python's philosophy and is wanted and often used, in fact. If you disabled this feature somehow, I guess a lot of library code would be broken at once.
Nevertheless you could create a check function which tests if anything in the current stack has been overridden. For this you would step through all the nested frames you are in and check if their locals also exist in their parent. This is very introspective work and probably not what you want to do but I think it could be done. With such a tool you could use the trace facility of Python to check after each executed line whether the state is still clean; that's the same functionality a debugger uses for step-by-step debugging, so this is again probably not what you want.
It could be done, but it would be like nailing glasses to a wall to make sure you never forget where they are.
Some more practical approach:
For builtins like the ones you mentioned you always can access them by writing explicitly __builtins__.set etc. For things imported, import the module and call the things by their module name (e. g. sys.exit() instead of exit()). And normally one knows when one is going to use an identifier, so just do not override it, e. g. do not create a variable named set if you are going to create a set object.

Python's c api and __add__ calls

I am writing a binding system that exposes classes and functions to python in a slightly unusual way.
Normally one would create a python type and provide a list of functions that represent the methods of that type, and then allow python to use its generic tp_getattro function to select the right one.
For reasons I wont go into here, I can't do it this way, and must provide my own tp_getattro function, that selects methods from elsewhere and returns my own 'bound method' wrapper. This works fine, but means that a types methods are not listed in its dictionary (so dir(MyType()) doesn't show anything interesting).
The problem is that I cannot seem to get __add__ methods working. see the following sample:
>>> from mymod import Vec3
>>> v=Vec3()
>>> v.__add__
<Bound Method of a mymod Class object at 0xb754e080>
>>> v.__add__(v)
<mymod.Vec3 object at 0xb751d710>
>>> v+v
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'mymod.Vec3' and 'mymod.Vec3'
As you can see, Vec3 has an __add__ method which can be called, but python's + refuses to use it.
How can I get python to use it? How does the + operator actually work in python, and what method does it use to see if you can add two arbitrary objects?
Thanks.
(P.S. I am aware of other systems such as Boost.Python and SWIG which do this automatically, and I have good reason for not using them, however wonderful they may be.)
Do you have an nb_add in your type's number methods structure (pointed by field tp_as_number of your type object)?

Categories

Resources