I have a Python extension module which creates a tuple as an attribute of another object, and sets items in the tuple. Whenever I execute this module in Python, I keep getting the error SystemError: bad argument to internal function
After reading over the docs for PyTuple, and debugging my program for a few hours, I still couldn't figure out what the hell was going on. Running my program through a debugger indicated the problem was occurring within a library call inside the Python interpreter. So, finally, I looked at the Python source code, and at long last I realized the problem. The PyTuple_SetItem function has an interesting restriction which I didn't know about, and can't find explicitly documented.
Here is the important function in the Python source (edited for clarity):
int PyTuple_SetItem(register PyObject *op, register Py_ssize_t i, PyObject *newitem)
{
.....
if (!PyTuple_Check(op) || op->ob_refcnt != 1) {
Py_XDECREF(newitem);
PyErr_BadInternalCall();
return -1;
}
.....
}
The important line here is the condition op->ob_refcnt != 1. So here's the problem: you can't even call PyTuple_SetItem unless the Tuple has a ref-count of 1. It looks like the idea here is that you're never supposed to use PyTuple_SetItem except right after you create a tuple using PyTuple_New(). I guess this makes sense, since Tuples are supposed to be immutable, after all, so this restriction helps keep your C code more in line with the abstractions of the Python type system.
However, I can't find this restriction documented anywhere. Relevant docs seem to be here and here, neither of which specify this restriction. The docs basically say that when you call PyTuple_New(X), all items in the tuple are initialized to NULL. Since NULL is not a valid Python value, it's up to the extension-module programmer to make sure that all slots in the Tuple are filled in with proper Python values before returning the Tuple to the interpreter. But it doesn't say anywhere that this must be done while the Tuple object has a ref count of 1.
So now, the problem is that I've basically coded myself into a corner because I wasn't aware of this (undocumented?) restriction on PyTuple_SetItem. My code is structured in such a way that it's very inconvenient to insert items into the tuple until after the tuple itself has become an attribute of another object. So, when it comes time to fill in the items in the tuple, the tuple already has a higher ref count.
I'll probably have to restructure my code, but I was seriously considering just temporarily setting the ref count on the Tuple to 1, inserting the items, and then restoring the original ref count. Of course, that's a horrible hack, I know, and not any kind of permanent solution. Regardless, I'd like to know if the requirement regarding the ref count on the Tuple is documented anywhere. Is it just an implementation detail of CPython, or is it something that API users can rely on as expected behavior?
I'm quite sure that you can get around the restrictions by using PyTuple_SET_ITEM instead of PyTuple_SetItem. PyTuple_SET_ITEM is a macro defined in tupleobject.h as follows:
#define PyTuple_SET_ITEM(op, i, v) (((PyTupleObject*)(op))->ob_item[i] = v)
So, if you are absolutely, definitely and utterly sure that:
op is a tuple object
you haven't initialized slot i in the tuple so far
you own a reference to v and you want to let the tuple steal it and
there is no chance of another Python object using the tuple for anything before you call PyTuple_SET_ITEM
then I guess you are safe to use PyTuple_SET_ITEM.
The Python C API is very underdocumented and I would not be surprised if this restriction wasn't mentioned anywhere.
Of course, you should never be modifying tuples once something has gotten a hold of them regardless; either pass in the elements you need to put in the tuple, or use a list instead.
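For what it's worth, here is a rough Python-level sketch of that advice (the Holder class and compute function are made up purely for illustration): gather the items first, build the finished tuple in one go, and only then attach it to the owning object.

class Holder:
    """Hypothetical container; stands in for whatever object owns the tuple."""
    pass

def compute(i):
    # placeholder for however each item is actually produced
    return i * i

obj = Holder()
items = [compute(i) for i in range(3)]  # collect the values first
obj.values = tuple(items)               # attach the finished, immutable tuple
print(obj.values)                       # (0, 1, 4)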
The quest started with a simple LeetCode problem. I am learning Python and was trying to solve a problem on LeetCode where I used len() in the condition of a while loop. I got curious whether writing len(nums) in the while condition makes my program do more computation. To find out, I started looking for the source code.
def remove_element(nums, val):  # wrapper added so the snippet runs; the question shows only the loop
    i = 0
    while i < len(nums):
        # if both the numbers are same we can pop the ith number,
        # else just increase the index and return the length in the end
        if nums[i] == val:
            nums.pop(i)
        else:
            i += 1
    return len(nums)
Now, I have 2 questions:
How to look for the source code of builtin functions without manually searching the source code on GitHub?
How does the len function work internally?
I have 2 assumptions for it:
Python treats everything as an object, and objects have a property called length (or something like that); whenever I pop an element from a list, this property gets decremented by 1.
My other assumption is that Python somehow iterates over the whole object and returns the length.
I found the source code. However, it again uses another function to calculate the length.
static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
Py_ssize_t res;
res = PyObject_Size(obj);
if (res < 0) {
assert(PyErr_Occurred());
return NULL;
}
return PyLong_FromSsize_t(res);
}
Antti Haapala did a great job in explaining the answer; however, it doesn't answer my question.
These are some of the relevant questions that I found on Stack Overflow:
How to view source code of function in python?
explanation of C implementation python's len function
Question 0
I got curious if I write len(nums) in while will my program do more computations.
One aspect of this is documented in the Python wiki's TimeComplexity page for all built-in data structures. len() for a list is O(1).
If you mean something along the lines of
will my program be faster if I do n = len(nums), then manually subtract 1 from n each time I remove from the list
then that's a whole other question, and the answer is likely (measure it!) to be (perhaps somewhat unintuitively) "no", since len() is implemented in C (fast!) and interpreting Python code (n -= 1) and executing it is slower.
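If you want to measure it yourself, a quick (and admittedly unscientific) sketch along these lines should do; the function names and data below are made up for illustration.

import timeit

def remove_with_len(nums, val):
    # re-evaluate len(nums) on every iteration (the original approach)
    i = 0
    while i < len(nums):
        if nums[i] == val:
            nums.pop(i)
        else:
            i += 1
    return len(nums)

def remove_with_counter(nums, val):
    # cache the length once and maintain it by hand with n -= 1
    i, n = 0, len(nums)
    while i < n:
        if nums[i] == val:
            nums.pop(i)
            n -= 1
        else:
            i += 1
    return n

data = [x % 5 for x in range(2000)]
print(timeit.timeit(lambda: remove_with_len(list(data), 0), number=200))
print(timeit.timeit(lambda: remove_with_counter(list(data), 0), number=200))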
Question 1
How to look for the source code of builtin functions without manually searching the source code on GitHub?
As prerequisites, you will need to
know how to read C and understand the control flow
be able to keep track of the call graph (in your head, in a text file, on a notepad)
have an intuition of where you start looking in the source code
GitHub's source code search is, well, passable, but you'll have a better time downloading the source and using a better IDE to jump around in the code.
For built-in functions in modules, I'd start searching for e.g. mathmodule.c for the math functions, etc.
For implementations of objects, there's e.g. listobject.c. It's fairly logical (most of the time).
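One quick sanity check you can do from Python itself (a small sketch, not tied to the question's code): inspect can show you the source of pure-Python functions, but for builtins implemented in C it can't, which is exactly why you end up reading the CPython sources.

import inspect

# Pure-Python code has retrievable source...
print(inspect.getsourcefile(inspect.getsource))

# ...but builtins implemented in C do not, which is why you have to go look
# in the CPython sources (Python/bltinmodule.c, Objects/listobject.c, ...).
try:
    inspect.getsource(len)
except TypeError as exc:
    print("no Python source for len():", exc)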
Question 2
You already found builtin_len.
You can see it calls PyObject_Size. That's defined here.
It does PySequenceMethods *m = Py_TYPE(o)->tp_as_sequence;, i.e. grabs a pointer to the type header of the object, and the "slot" (not to be confused with the Python userland slots) of the sequence-related methods for the object.
If that method collection contains a valid sq_length() function, it is called: Py_ssize_t len = m->sq_length(o); if that length is valid, it is returned, and len() wraps the bare Py_ssize_t into a Python long object and passes it to you.
If that fails, PyMapping_Size gets called.
It does a similar thing as the sq_length stuff, only using mapping methods, tp_as_mapping and mp_length.
If all that fails, a TypeError is raised using the type_error() helper.
Here in listobject.c, you can see how list_length() is hooked up to be sq_length for list objects.
That function only calls Py_SIZE() (https://docs.python.org/3/c-api/structures.html#c.Py_SIZE), which is a macro to access the ob_size field which all PyVarObjects have.
The documentation on how Python's list objects use ob_size is here.
As for how a custom type with __len__ hooks up into all of this, my recollection is that objects with __len__ will have their sq_length() call the Python callable, if one exists, and that value is then "trampolined" back through the C code back to your Python code.
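As a small sketch of that last point (the Bag class is invented for illustration): a pure-Python class only needs to define __len__, and len() ends up dispatching to it through the machinery described above.

class Bag:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        # this is what len() ultimately calls for a pure-Python class,
        # via the type's length slot described above
        print("__len__ called")
        return len(self._items)

b = Bag(["a", "b", "c"])
print(len(b))   # prints "__len__ called", then 3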
I'm writing a shared object in Go (c-shared) which will be loaded and run from Python. Everything is working fine, until the Go code needs to return an error. I am converting the error to a string using error.Error(), but when trying to return that to Python, cgo hits:
panic: runtime error: cgo result has Go pointer
This is very odd, since this is supposedly a string and not a pointer. I know there are no issues with returning Go strings from shared-object exported functions, as I do that in several other places without any issue.
The Go code looks like:
package main
import "C"
//export MyFunction
func MyFunction() string {
err := CallSomethingInGo()
if err != nil {
return err.Error()
}
return ""
}
func main() {}
The Go code is compiled to a .so using buildmode=c-shared, and then in the Python code I have something like this:
from ctypes import *
lib = cdll.LoadLibrary("./mygocode.so")
class GoString(Structure):
_fields_ = [("p", c_char_p),("n", c_longlong)]
theFunction = lib.MyFunction
theFunction.restype = GoString
err = theFunction()
When the last line executes and the Go code returns NO error, then everything is fine and it works! But if the Go code tries to return an error (e.g. CallSomethingInGo fails and returns err), then the Python code fails with:
panic: runtime error: cgo result has Go pointer
I've tried manually returning strings from Go to Python and it works fine, but trying to return error.Error() (which should be a string per my understanding) fails. What is the correct way to return the string representation of the error to Python?
One more piece of info - from Go, I did a printf("%T", err) and I see the type of the error is:
*os.PathError
I also did printf("%T", err.Error()) and confirmed the type returned by err.Error() was 'string', so I am still not sure why this isn't working.
Even stranger to me... I tried modifying the Go function as shown below for a test, and this code works fine and returns "test" as a string back to Python...
//export MyFunction
func MyFunction() string {
err := CallSomethingInGo()
if err != nil {
// test
x := errors.New("test")
return x.Error()
}
return ""
}
I'm so confused! How can that test work, but not err.Error() ?
As I said in a comment, you're just not allowed to do that.
The rules for calling Go code from C code are outlined in the Cgo documentation, with this particular issue described in this section, in this way (though I have bolded a few sections in particular):
Passing pointers
Go is a garbage collected language, and the garbage collector needs to know the location of every pointer to Go memory. Because of this, there are restrictions on passing pointers between Go and C.
In this section the term Go pointer means a pointer to memory allocated by Go (such as by using the & operator or calling the predefined new function) and the term C pointer means a pointer to memory allocated by C (such as by a call to C.malloc). Whether a pointer is a Go pointer or a C pointer is a dynamic property determined by how the memory was allocated; it has nothing to do with the type of the pointer.
Note that values of some Go types, other than the type's zero value, always include Go pointers. This is true of string, slice, interface, channel, map, and function types. A pointer type may hold a Go pointer or a C pointer. Array and struct types may or may not include Go pointers, depending on the element types. All the discussion below about Go pointers applies not just to pointer types, but also to other types that include Go pointers.
Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers. The C code must preserve this property: it must not store any Go pointers in Go memory, even temporarily. When passing a pointer to a field in a struct, the Go memory in question is the memory occupied by the field, not the entire struct. When passing a pointer to an element in an array or slice, the Go memory in question is the entire array or the entire backing array of the slice.
C code may not keep a copy of a Go pointer after the call returns. This includes the _GoString_ type, which, as noted above, includes a Go pointer; _GoString_ values may not be retained by C code.
A Go function called by C code may not return a Go pointer (which implies that it may not return a string, slice, channel, and so forth). A Go function called by C code may take C pointers as arguments, and it may store non-pointer or C pointer data through those pointers, but it may not store a Go pointer in memory pointed to by a C pointer. A Go function called by C code may take a Go pointer as an argument, but it must preserve the property that the Go memory to which it points does not contain any Go pointers.
Go code may not store a Go pointer in C memory. C code may store Go pointers in C memory, subject to the rule above: it must stop storing the Go pointer when the C function returns.
These rules are checked dynamically at runtime. The checking is controlled by the cgocheck setting of the GODEBUG environment variable. The default setting is GODEBUG=cgocheck=1, which implements reasonably cheap dynamic checks. These checks may be disabled entirely using GODEBUG=cgocheck=0. Complete checking of pointer handling, at some cost in run time, is available via GODEBUG=cgocheck=2.
It is possible to defeat this enforcement by using the unsafe package, and of course there is nothing stopping the C code from doing anything it likes. However, programs that break these rules are likely to fail in unexpected and unpredictable ways.
This is what you are seeing: you have a program that breaks several rules, and now it fails in unexpected and unpredictable ways. In particular, your lib.MyFunction is
a Go function called by C code
since Python's cdll handlers count as C code. You can return nil, as that's the zero-value, but you are not allowed to return Go strings. The fact that the empty-string constant (and other string constants from some other error types) is not caught at runtime is a matter of luck.1
1Whether this is good luck or bad luck depends on your point of view. If it failed consistently, perhaps you would have consulted the Cgo documentation earlier. Instead, it fails unpredictably, but not in your most common case. What's happening here is that the string constants were compiled to text (or rodata) sections and therefore are not actually dynamically allocated. However, some—not all, but some—errors' string bytes are dynamically allocated. Some os.PathErrors point into GC-able memory, and these are the cases that are caught by the
reasonably cheap dynamic checks
mentioned in the second-to-last paragraph.
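As a practical follow-up (this is not from the answer above, just a common cgo pattern): one way around the rule is to change the Go export so it returns a *C.char produced with C.CString, so only C-allocated memory crosses the boundary. Assuming MyFunction were changed that way, a sketch of the Python side might look like the code below; note that memory allocated by C.CString is never freed unless you also export a free function, so this sketch leaks the string.

from ctypes import cdll, c_char_p

lib = cdll.LoadLibrary("./mygocode.so")

# Assumes the Go side now returns a C string (char*), not a Go string.
lib.MyFunction.restype = c_char_p

err = lib.MyFunction()
if err:  # NULL / empty string means "no error"
    print("Go returned an error:", err.decode("utf-8"))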
Part of a utility system in my AcecoolLib package, which I'm writing by porting all / most of my logic to Python (and various other languages), contains a simple, but greatly useful helper... a function named ENUM.
It has many useful features, such as automatically creating maps of the enums, extended or reverse maps if you have the map assigned to more than just values, and a lot more.
It can create maps for generating function names dynamically, it can create simple maps between enumeration and text or string identifiers for language, and much more.
The function declaration is simple, too:
def ENUM( _count = None, *_maps ):
It has an extra helper... Here: https://www.dropbox.com/s/6gzi44i7dh58v61/dynamic_properties_accessorfuncs_and_more.py?dl=0
ENUM_MAP is used, but the other one isn't.
Anyway, before I start going into etc.. etc.. the question is:
How can I count the return variables outside of the function... ie:
ENUM_EXAMPLE_A, ENUM_EXAMPLE_B, ENUM_EXAMPLE_C, ENUM_LIST_EXAMPLE, MAP_ENUM_EXAMPLE = ENUM( None, [ '#example_a', '#example_b', '#example_c' ] )
Where the list is a simple list of 0 = 0, 1 = 1, 2 = 2, or something; then the map links so [ 0 = '#example_a', 1 = '#example_b', etc. ], then [ '#example_a' = 0, etc. ] for the reverse... or something along those lines.
There are other advanced use cases, not sure if I have those features in the file above, but regardless... I'm trying to simply count the return vars... and get the names.
I know it is likely possible, to read the line from which the call is executed... read the file, get the line, break it apart and do all of that... but I'm hoping something exists to do that without having to code it from scratch in the default Python system...
In short: I'd like to get rid of the first argument of ENUM( _count, *_maps ) so that only the optional *_maps is used. So if I call: ENUM_A, ENUM_B, ENUM_C, LIST_ENUMS = ENUM( ); it'll detect 4 output returns, and get the names of them so I can see if the last contains certain text different from the style of the first... ie, if they want the list, etc.... If they add a map, then an optional list, etc., and I can just count back n _maps to find the list arg, or not...
I know it probably isn't necessary, but I want it to be easy and dynamic so if I add a new enum to a giant list, I don't have to add the number ( although for those I use the maps which means I have to add an entry anyway )...
Either way - I know in Lua, this is stupid easy to do with built-in functions.. I'm hoping Python has built in functions to easily grab the data too.
Thanks!
Here is the one proposed answer, similar to what I could do in my Lua framework... The difference, though, is my framework has to load all of the files into memory ( for dynamic reloading, and dynamic changes, going to the appropriate location - and to network the data by combining everything so the file i/o cost is 'averted' - and Lua handles tables incredibly well ).
The simple answer, is that it is possible.. I'm not sure about in default Python without file i/o, however this method would easily work. This answer will be in pseudo context - but the functionality does exist.
Logic:
1) Using traces, you can determine which file / path and which line, called the ENUM function.
2) Read the calling file as text -- if you can read directly to a line without having to process the entire file - then that would be quicker. There may be some libraries out there that do this. In default Python, I haven't done a huge amount of file i/o other than the basics so I'm not up to speed on all of the most useful things as I typically use SQL for storage purposes, etc...
3) With the line in question, split the line text on '=', ie: before the function call to have the arguments, and the function itself.. call it _result
3a) IF you have no results then someone called the function without returning anything - odd..
4) split _result[ 0 ] on ',' to get each individual argument, and trim whitespace left / right --
5) Combine the clean arguments into a list..
6) Process the args -- ie: determine the method the developer uses to name their enum values, and see if that style changes from the last argument ( if no map ). If map, then go back n or n*2 elements for the list, then onward from there for the map vars. With maps, map returns are given - the only thing I need to do dynamically is the number and determine if the user has a list arg, or not..
Note: There is a very useful and simple mechanism in Python to do a lot of these functions in-line with a single line of code.
All of this is possible, and easy to create in Python. The thing I dislike about this solution is the fact that it requires file i/o -- If your program is executed from another program, and doesn't remain in memory, this means these tasks are always repeated making it less friendly, and more costly...
If the program opens, and remains open, then the cost is more up-front instead of on-going making it not as bad.
Because I use ENUMs in everything, including quick executable scripts which run then close - I don't want to use file i/o..
But, a solution does exist. I'm looking for an alternate.
The simple answer is that you can't.
In Python when you do (a, b, c) = func() it's called tuple unpacking. Essentially it's expecting func() to return a tuple of exactly 3 elements (in this example). However, you can also do a = func() and then a will contain a 3-element tuple or whatever func decided to return. Regardless of how func is called, there's nothing within the method that knows how the return value is going to be processed after it's returned.
I wanted to provide a more pythonic way of doing what you're intending, but I'm not really sure I understand the purpose of ENUM(). It seems like you're trying to create constants, but Python doesn't really have true constants.
EDIT:
Methods are only aware of what's passed in as arguments. If you want some sort of ENUM to value mapping then the best equivalent is a dict. You could then have a method that took ENUM('A', 'B', 'C') and returned {'A':0, 'B':1, 'C':2} and then you'd use dict look-ups to get the values.
enum = ENUM('A', 'B', 'C')
print(enum['A']) # prints 0
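A minimal sketch of that dict-based helper (just one possible implementation of the idea above, not the asker's original ENUM):

def ENUM(*names):
    # map each name to its position: ENUM('A', 'B', 'C') -> {'A': 0, 'B': 1, 'C': 2}
    return {name: index for index, name in enumerate(names)}

enum = ENUM('A', 'B', 'C')
print(enum['A'])  # prints 0
print(enum['C'])  # prints 2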
The following is my understanding of types and parameter passing in Java and Python:
In Java, there are primitive types and non-primitive types. The former are not objects; the latter are objects.
In Python, they are all objects.
In Java, arguments are passed by value because:
primitive types are copied and then passed, so they are passed by value for sure; non-primitive types are passed by reference, but a reference (pointer) is also a value, so they are also passed by value.
In Python, the only difference is that 'primitive types' (for example, numbers) are not copied, but are simply treated as objects.
Based on the official docs, arguments are passed by assignment. What does 'passed by assignment' mean? Do objects in Java work the same way as in Python? What causes the difference (passed by value in Java and passed by assignment in Python)?
And is there anything wrong in my understanding above?
tl;dr: You're right that Python's semantics are essentially Java's semantics, without any primitive types.
"Passed by assignment" is actually making a different distinction than the one you're asking about.1 The idea is that argument passing to functions (and other callables) works exactly the same way assignment works.
Consider:
def f(x):
pass
a = 3
b = a
f(a)
b = a means that the target b, in this case a name in the global namespace, becomes a reference to whatever value a references.
f(a) means that the target x, in this case a name in the local namespace of the frame built to execute f, becomes a reference to whatever value a references.
The semantics are identical. Whenever a value gets assigned to a target (which isn't always a simple name—e.g., think lst[0] = a or spam.eggs = a), it follows the same set of assignment rules—whether it's an assignment statement, a function call, an as clause, or a loop iteration variable, there's just one set of rules.
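A tiny demonstration of that single set of rules (the names are just for illustration):

class Spam:
    pass

a = [1, 2, 3]

b = a            # plain assignment: b references the same list
lst = [None]
lst[0] = a       # item assignment: lst[0] references the same list
spam = Spam()
spam.eggs = a    # attribute assignment: spam.eggs references the same list

a.append(4)                    # mutate the one underlying object...
print(b, lst[0], spam.eggs)    # ...and every reference sees [1, 2, 3, 4]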
But overall, your intuitive idea that Python is like Java but with only reference types is accurate: You always "pass a reference by value".
Arguing over whether that counts as "pass by reference" or "pass by value" is pointless. Trying to come up with a new unambiguous name for it that nobody will argue about is even more pointless. Liskov invented the term "call by object" three decades ago, and if that never caught on, anything someone comes up with today isn't likely to do any better.
You understand the actual semantics, and that's what matters.
And yes, this means there is no copying. In Java, only primitive values are copied, and Python doesn't have primitive values, so nothing is copied.
the only difference is that 'primitive types'(for example, numbers) are not copied, but simply taken as objects
It's much better to see this as "the only difference is that there are no 'primitive types' (not even simple numbers)", just as you said at the start.
It's also worth asking why Python has no primitive types—or why Java does.2
Making everything "boxed" can be very slow. Adding 2 + 3 in Python means dereferencing the 2 and 3 objects, getting the native values out of them, adding them together, and wrapping the result up in a new 5 object (or looking it up in a table because you already have an existing 5 object). That's a lot more work than just adding two ints.3
While a good JIT like Hotspot—or like PyPy for Python—can often automatically do those optimizations, sometimes "often" isn't good enough. That's why Java has native types: to let you manually optimize things in those cases.
Python, instead, relies on third-party libraries like Numpy, which let you pay the boxing costs just once for a whole array, instead of once per element. Which keeps the language simpler, but at the cost of needing Numpy.4
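A rough way to see that trade-off for yourself (assuming NumPy is installed; exact numbers will of course vary by machine):

import timeit

setup = """
import numpy as np
py_list = list(range(1_000_000))
np_array = np.arange(1_000_000)
"""

# pure-Python sum: every element is a boxed int that has to be unboxed
print(timeit.timeit("sum(py_list)", setup=setup, number=10))

# NumPy sum: the boxing cost is paid once for the whole array
print(timeit.timeit("np_array.sum()", setup=setup, number=10))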
1. As far as I know, "passed by assignment" appears a couple times in the FAQs, but is not actually used in the reference docs or glossary. The reference docs already lean toward intuitive over rigorous, but the FAQ, like the tutorial, goes much further in that direction. So, asking what a term in the FAQ means, beyond the intuitive idea it's trying to get across, may not be a meaningful question in the first place.
2. I'm going to ignore the issue of Java's lack of operator overloading here. There's no reason they couldn't include special language rules for a handful of core classes, even if they didn't let you do the same thing with your own classes—e.g., Go does exactly that for things like range, and people rarely complain.
3. … or even than looping over two arrays of 30-bit digits, which is what Python actually does. The cost of working on unlimited-size "bigints" is tiny compared to the cost of boxing, so Python just always pays that extra, barely-noticeable cost. Python 2 did, like Java, have separate fixed and bigint types, but a couple decades of experience showed that it wasn't getting any performance benefits out of the extra complexity.
4. The implementation of Numpy is of course far from simple. But using it is pretty simple, and a lot more people need to use Numpy than need to write Numpy, so that turns out to be a pretty decent tradeoff.
Similar to passing reference types by value in C#.
Docs: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/passing-reference-type-parameters#passing-reference-types-by-value
Code demo:
# mutable object
l = [9, 8, 7]

def createNewList(l1: list):
    # l1 + [0] creates a new list object; the local variable l1 is rebound
    # to it without affecting the caller's variable l
    l1 = l1 + [0]

def changeList(l1: list):
    # append an element to the end of the list; because l1 and l refer to
    # the same object, l will also change
    l1.append(0)

print(l)
createNewList(l)
print(l)
changeList(l)
print(l)

# immutable object
num = 9

def changeValue(val: int):
    # int is an immutable type; rebinding val makes it point to a new object 8,
    # which does not change the value of num
    val = 8

print(num)
changeValue(num)
print(num)
I have the following in a Python script:
setattr(stringRESULTS, "b", b)
Which gives me the following error:
AttributeError: 'str' object has no attribute 'b'
Can anyone tell me what the problem is here?
Don't do this. To quote the inestimable Greg Hewgill,
"If you ever find yourself using quoted names to refer to variables,
there's usually a better way to do whatever you're trying to do."
[Here you're one level up and using a string variable for the name, but it's the same underlying issue.] Or as S. Lott followed up with in the same thread:
"90% of the time, you should be using a dictionary. The other 10% of
the time, you need to stop what you're doing entirely."
If you're using the contents of stringRESULTS as a pointer to some object fred which you want to setattr, then these objects you want to target must already exist somewhere, and a dictionary is the natural data structure to store them. In fact, depending on your use case, you might be able to use dictionary key/value pairs instead of attributes in the first place.
IOW, my version of what (I'm guessing) you're trying to do would probably look like
d[stringRESULTS].b = b
or
d[stringRESULTS]["b"] = b
depending on whether I wanted/needed to work with an object instance or whether a dictionary would suffice.
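Here's a small self-contained sketch of both flavours (the Result class, the key, and the values are all made up for illustration):

class Result:
    pass

stringRESULTS = "experiment_1"   # hypothetical key
b = 42

# object-instance flavour
d = {stringRESULTS: Result()}
d[stringRESULTS].b = b

# plain-dictionary flavour
d2 = {stringRESULTS: {}}
d2[stringRESULTS]["b"] = b

print(d[stringRESULTS].b, d2[stringRESULTS]["b"])   # 42 42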
(P.S. relatively few people subscribe to the python-3.x tag. You'll usually get more attention by adding the bare 'python' tag as well.)
Since str is a low-level primitive type, you can't really set any arbitrary attribute on it. You probably need either a dict or a subclass of str:
class StringResult(str):
pass
which should behave as you expect:
my_string_result = StringResult("spam_and_eggs")
my_string_result.b = b
EDIT:
If you're trying to do what DSM suggests, ie. modify a property on a variable that has the same name as the value of the stringRESULTS variable then this should do the trick:
locals()[stringRESULTS].b = b
Please note that this is an extremely dangerous operation and can wreak all kinds of havoc on your app if you aren't careful.