How/why does set() in {frozenset()} work?

Even though sets are unhashable, a membership check against another set works:
>>> set() in {frozenset()}
True
I expected TypeError: unhashable type: 'set', consistent with other behaviours in Python:
>>> set() in {} # doesn't work when checking in dict
TypeError: unhashable type: 'set'
>>> {} in {frozenset()} # looking up some other unhashable type doesn't work
TypeError: unhashable type: 'dict'
So, how is set membership in other set implemented?

set_contains is implemented like this:
static int
set_contains(PySetObject *so, PyObject *key)
{
    PyObject *tmpkey;
    int rv;

    rv = set_contains_key(so, key);
    if (rv < 0) {
        if (!PySet_Check(key) || !PyErr_ExceptionMatches(PyExc_TypeError))
            return -1;
        PyErr_Clear();
        tmpkey = make_new_set(&PyFrozenSet_Type, key);
        if (tmpkey == NULL)
            return -1;
        rv = set_contains_key(so, tmpkey);
        Py_DECREF(tmpkey);
    }
    return rv;
}
So this will delegate directly to set_contains_key which will essentially hash the object and then look up the element using its hash.
If the object is unhashable, set_contains_key returns -1, so we enter that if block. There, we check explicitly whether the passed key object is a set (or an instance of a set subtype) and whether the previous error was a TypeError. That combination indicates that we tried a containment check with a set, and that it failed only because sets are unhashable.
In that exact situation, we now create a new frozenset from that set and attempt the containment check using set_contains_key again. And since frozensets are properly hashable, we are able to find our result that way.
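In Python terms, the fallback behaves roughly like this sketch (hypothetical code mirroring the C logic, not the actual implementation):

def set_contains(so, key):
    # Sketch of CPython's set_contains fallback, in Python terms.
    try:
        hash(key)                     # set_contains_key hashes the key first
    except TypeError:
        if not isinstance(key, set):  # PySet_Check: only sets get the fallback
            raise
        key = frozenset(key)          # temporary frozenset built from the set
    # the real code now performs a hash-table lookup with the hashable key
    return any(elem == key for elem in so)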
This explains why the following examples will work properly even though the set itself is not hashable:
>>> set() in {frozenset()}
True
>>> set(('a')) in { frozenset(('a')) }
True

The last line of the documentation for sets discusses this:
Note, the elem argument to the __contains__(), remove(), and discard()
methods may be a set. To support searching for an equivalent
frozenset, a temporary one is created from elem.
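The same fallback applies to remove() and discard(); for example, discard() accepts a plain set and matches the equivalent frozenset:

>>> s = {frozenset({1, 2})}
>>> s.discard({1, 2})  # the set argument is converted to a temporary frozenset
>>> s
set()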


Equivalent of python walrus operator (:=) in c++11?

Recently I have been using the := operator in python quite a bit, in this way:
if my_object := SomeClass.function_that_returns_object():
# do something with this object if it exists
print(my_object.some_attribute)
The question
Is there any way to do this in c++11 without the use of stdlib?
For example, in an Arduino sketch, if I wanted to use a method that may potentially return zero, such as:
if(char * data = myFile.readBytes(data, dataLen))
{
    // do something
}
else
{
    // do something else
}
Python's := assignment expression operator (aka, the "walrus" operator) returns the value of an assignment.
C++'s = assignment operator (both copy assignment and move assignment, as well as other assignment operators) does essentially the same thing, but in a different way. The result of an assignment is a reference to the object that was assigned to, allowing that object to be evaluated in further expressions.
So, the equivalent of:
if my_object := SomeClass.function_that_returns_object():
# do something with this object if it exists
print(my_object.some_attribute)
Would be just like you showed:
SomeType *my_object;
if ((my_object = SomeClass.function_that_returns_object())) {
    // do something with this object if it exists
    print(my_object->some_attribute);
}
If function_that_returns_object() returns a null pointer, the if evaluates my_object as false; otherwise it evaluates as true. The same can be done with other types, eg:
int value;
if ((value = SomeClass.function_that_returns_int()) == 12345) {
    // do something with this value if it matches
}
Not exactly, no.
As is mentioned in other answers, the C++ = operator already does most of what you want. If you have an existing variable, then assignment to that variable returns a reference to it, so you can put it into an if condition:
Foo* a_pointer;
if (a_pointer = some_function()) {
    //...
}
Here, the body of the if conditional will execute if some_function returns a non-null pointer, and a_pointer will be a copy of the pointer returned by some_function.
Unlike the walrus operator though, this has the limitation that a_pointer had to first be defined outside of the if condition.
C++17 adds something a bit closer, in that you can initialize a variable inside of the if condition with a special if-initializer syntax:
if (Foo* a_pointer = some_function(); a_pointer) {
    //...
}
Note that the initializer still doesn't directly contribute to the truthiness of the if condition. It's only the expression after the ; that determines if the body of the if statement will execute. In this case, a_pointer is initialized to be the value returned by some_function in the initializer and then the condition part checks if a_pointer is truthy.
According to the documentation, readBytes returns the number of bytes placed in the buffer (not a pointer), so I think you just need to do something like:
if(myFile.readBytes(data, dataLen))
{
    // do something with data
}
else
{
    // do something else
}

operator.index with custom class instance

I have a simple class below,
class MyClass(int):
    def __index__(self):
        return 1
According to operator.index documentation,
operator.index(a)
Return a converted to an integer. Equivalent to a.__index__()
But when I use operator.index with a MyClass instance, I get 100 instead of 1 (I get 1 if I use a.__index__()). Why is that?
>>> a = MyClass(100)
>>>
>>> import operator
>>> print(operator.index(a))
100
>>> print(a.__index__())
1
This actually appears to be a deep-rooted issue in cpython. If you look at the source code for operator.py, you can see the definition of index:
def index(a):
    "Same as a.__index__()."
    return a.__index__()
So...why is it not equivalent? It's literally calling __index__. Well, at the bottom of the source, there's the culprit:
try:
    from _operator import *
except ImportError:
    pass
else:
    from _operator import __doc__
It's overwriting the definitions with a native _operator module. In fact, if you comment this out (either by modifying the actual library or making your own fake operator.py* and importing that), it works. So, we can find the source code for the native _operator library, and look at the related part:
static PyObject *
_operator_index(PyObject *module, PyObject *a)
{
    return PyNumber_Index(a);
}
So, it's a wrapper around the PyNumber_Index function. PyNumber_Index is a wrapper around _PyNumber_Index, so we can look at that:
PyObject *
_PyNumber_Index(PyObject *item)
{
    PyObject *result = NULL;
    if (item == NULL) {
        return null_error();
    }

    if (PyLong_Check(item)) {
        Py_INCREF(item);
        return item;
    }

    if (!_PyIndex_Check(item)) {
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object cannot be interpreted "
                     "as an integer", Py_TYPE(item)->tp_name);
        return NULL;
    }

    result = Py_TYPE(item)->tp_as_number->nb_index(item);
    if (!result || PyLong_CheckExact(result))
        return result;

    if (!PyLong_Check(result)) {
        PyErr_Format(PyExc_TypeError,
                     "__index__ returned non-int (type %.200s)",
                     Py_TYPE(result)->tp_name);
        Py_DECREF(result);
        return NULL;
    }

    /* Issue #17576: warn if 'result' not of exact type int. */
    if (PyErr_WarnFormat(PyExc_DeprecationWarning, 1,
                         "__index__ returned non-int (type %.200s).  "
                         "The ability to return an instance of a strict subclass of int "
                         "is deprecated, and may be removed in a future version of Python.",
                         Py_TYPE(result)->tp_name)) {
        Py_DECREF(result);
        return NULL;
    }
    return result;
}

PyObject *
PyNumber_Index(PyObject *item)
{
    PyObject *result = _PyNumber_Index(item);
    if (result != NULL && !PyLong_CheckExact(result)) {
        Py_SETREF(result, _PyLong_Copy((PyLongObject *)result));
    }
    return result;
}
You can see before it even calls nb_index (the C name for __index__), it calls PyLong_Check on the argument, and if it's true, it just returns the item with no modification. PyLong_Check is a macro that checks for long subtyping (int in python is a PyLong):
#define PyLong_Check(op) \
    PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LONG_SUBCLASS)

#define PyLong_CheckExact(op) Py_IS_TYPE(op, &PyLong_Type)
So, basically, the takeaway is that, probably for speed, int subclasses don't get their __index__ method called; the value just gets _PyLong_Copy'd into an exact int for the return value. And this happens only in the native _operator module, not in the pure-Python operator.py. This conflict between the implementations, along with the inconsistency in the documentation, leads me to believe that this is an issue, either in the documentation or the implementation, and you may want to raise it as one.
It's likely a documentation and not an implementation issue, as cpython has a habit of sacrificing correctness for speed: (nan,) == (nan,) but nan != nan.
* You may have to name it something like fake_operator.py then import it with import fake_operator as operator
This is because your type is an int subclass. __index__ will not be used because the instance is already an integer. That much is by design, and unlikely to be considered a bug in CPython. PyPy behaves the same.
In _operator.c:
static PyObject *
_operator_index(PyObject *module, PyObject *a)
/*[clinic end generated code: output=d972b0764ac305fc input=6f54d50ea64a579c]*/
{
    return PyNumber_Index(a);
}
Note that the Python code in operator.py is generally not used; it is only a fallback for the case where the compiled _operator module is not available. That explains why the result differs from a.__index__().
In abstract.c, cropped after the relevant PyLong_Check part:
/* Return an exact Python int from the object item.
   Raise TypeError if the result is not an int
   or if the object cannot be interpreted as an index.
*/
PyObject *
PyNumber_Index(PyObject *item)
{
    PyObject *result = _PyNumber_Index(item);
    if (result != NULL && !PyLong_CheckExact(result)) {
        Py_SETREF(result, _PyLong_Copy((PyLongObject *)result));
    }
    return result;
}

...

/* Return a Python int from the object item.
   Can return an instance of int subclass.
   Raise TypeError if the result is not an int
   or if the object cannot be interpreted as an index.
*/
PyObject *
_PyNumber_Index(PyObject *item)
{
    PyObject *result = NULL;
    if (item == NULL) {
        return null_error();
    }

    if (PyLong_Check(item)) {
        Py_INCREF(item);
        return item;   /* <---- short-circuited here */
    }
    ...
}
The documentation for operator.index is inaccurate, so this may be considered a minor documentation issue:
>>> import operator
>>> operator.index.__doc__
'Same as a.__index__()'
So, why isn't __index__ considered for integers? The probable answer is found in PEP 357, under the discussion section titled Speed:
Implementation should not slow down Python because integers and long integers used as indexes will complete in the same number of instructions. The only change will be that what used to generate an error will now be acceptable.
We do not want to slow down the most common case for slicing with integers, having to check for an nb_index slot every time.
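Putting the pieces together, here is a rough Python-level sketch of what the native operator.index effectively does (simplified; the result-type checks on the __index__ branch are omitted):

def c_index(a):
    # Sketch of PyNumber_Index in Python terms, not the actual code.
    if isinstance(a, int):   # PyLong_Check: also true for int subclasses
        if type(a) is int:   # PyLong_CheckExact
            return a
        return int(a)        # _PyLong_Copy: downcast the subclass to an exact int
    return a.__index__()     # nb_index is consulted only for non-ints

For MyClass(100), the first branch wins, so __index__ is never called and the result is 100.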
Update
This answer is incorrect; I misread the documentation. See Aplet123's answer instead. Tl;dr: the problem is actually that the C implementation doesn't match the documentation and the Python implementation. The C implementation behaves more like: a if isinstance(a, int) else a.__index__().
To prove it, try defining MyClass.__int__(). The outcome will be the same.
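A quick check of that claim (illustrative snippet):

import operator

class MyClass(int):
    def __index__(self):
        return 1
    def __int__(self):
        return 1

print(operator.index(MyClass(100)))  # 100: neither __index__ nor __int__ is called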
Original answer
See the documentation for object.__index__():
object.__index__(self)
Called to implement operator.index(), and whenever Python needs to losslessly convert the numeric object to an integer object (such as in slicing, or in the built-in bin(), hex() and oct() functions). Presence of this method indicates that the numeric object is an integer type. Must return an integer.
If __int__(), __float__() and __complex__() are not defined then corresponding built-in functions int(), float() and complex() fall back to __index__().
(added bold)
a.__int__() exists, so its return value is used instead.
>>> a.__int__
<method-wrapper '__int__' of MyClass object at 0x7f2c5f0f4ec8>
>>> a.__int__()
100

Why don't python dict keys/values quack like a duck?

Python is duck typed, and generally this avoids casting faff when dealing with primitive objects.
The canonical example (and the reason behind the name) is the duck test: If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
However one notable exception is dict keys/values, which look like a duck and swim like a duck, but notably do not quack like a duck.
>>> ls = ['hello']
>>> d = {'foo': 'bar'}
>>> for key in d.keys():
...     print(key)
...
foo
>>> ls + d.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "dict_keys") to list
Can someone enlighten me as to why this is?
Dict keys actually implements the set's interface rather than the list's, so you can perform set operations with dict keys directly with other sets:
d.keys() & {'foo', 'bar'} # returns {'foo'}
But it doesn't implement the __getitem__, __setitem__, __delitem__, and insert methods, which are required to "quack" like a list, so it cannot perform any of the list operations without being explicitly converted to a list first:
ls + list(d.keys()) # returns ['hello', 'foo']
There is an explicit check for list type (or its children) in python source code (so even tuple doesn't qualify):
static PyObject *
list_concat(PyListObject *a, PyObject *bb)
{
    Py_ssize_t size;
    Py_ssize_t i;
    PyObject **src, **dest;
    PyListObject *np;
    if (!PyList_Check(bb)) {
        PyErr_Format(PyExc_TypeError,
                     "can only concatenate list (not \"%.200s\") to list",
                     bb->ob_type->tp_name);
        return NULL;
    }
so Python can compute the result's size very quickly and allocate it in one step, without having to special-case every container type or iterate over the right-hand operand to find its length, which makes list concatenation very fast:
#define b ((PyListObject *)bb)
    size = Py_SIZE(a) + Py_SIZE(b);
    if (size < 0)
        return PyErr_NoMemory();
    np = (PyListObject *) PyList_New(size);
    if (np == NULL) {
        return NULL;
    }
One way to workaround this is to use in-place extension/addition:
my_list += my_dict # adding .keys() is useless
because in that case, the in-place add iterates over the right-hand side, so any iterable qualifies.
(or, of course, force iteration of the right-hand side: + list(my_dict))
So it could accept any type, but I suspect the makers of Python didn't find it worth it and were satisfied with a simple and fast implementation that covers 99% of use cases.
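A quick REPL check of the in-place workaround:

>>> ls = ['hello']
>>> d = {'foo': 'bar'}
>>> ls += d.keys()  # += iterates the right-hand side
>>> ls
['hello', 'foo']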
If you look at the definition of d.keys(), you can see the following.
def keys(self): # real signature unknown; restored from __doc__
    """ D.keys() -> a set-like object providing a view on D's keys """
    pass
Or use this statement:
print(d.keys.__doc__)
It clearly mentions that the output is a set-like object.
So you are effectively trying to concatenate a set-like object with a list.
You need to convert it to a list first and then concatenate:
x = ls + list(d.keys())
print(x)
# ['hello', 'foo']

overloaded __iter__ is bypassed when deriving from dict

While trying to create a custom case-insensitive dictionary, I came across the following inconvenient and (from my point of view) unexpected behaviour. If you derive a class from dict, the overloaded __iter__, keys, and values functions are ignored when converting back to dict. I have condensed it to the following test case:
import collections

class Dict(dict):
    def __init__(self):
        super(Dict, self).__init__(x = 1)
    def __getitem__(self, key):
        return 2
    def values(self):
        return 3
    def __iter__(self):
        yield 'y'
    def keys(self):
        return 'z'
    if hasattr(collections.MutableMapping, 'items'):
        items = collections.MutableMapping.items
    if hasattr(collections.MutableMapping, 'iteritems'):
        iteritems = collections.MutableMapping.iteritems

d = Dict()
print(dict(d))          # {'x': 1}
print(dict(d.items()))  # {'y': 2}
The return values of keys/values and __iter__/__getitem__ are deliberately inconsistent, purely to demonstrate which methods are actually called.
The documentation for dict.__init__ says:
If a positional argument is given and it is a mapping object, a
dictionary is created with the same key-value pairs as the mapping
object. Otherwise, the positional argument must be an iterable object.
I guess it has something to do with the first sentence and maybe with optimizations for builtin dictionaries.
Why exactly does the call to dict(d) not use any of keys, __iter__?
Is it possible to overload the 'mapping' somehow to force the dict constructor to use my presentation of key-value pairs?
Why did I use this? For a case-insensitive but -preserving dictionary, I wanted to:
store (lowercase => (original_case, value)) internally, while appearing as (any_case => value).
derive from dict in order to work with some external library code that uses isinstance checks
not use 2 dictionary lookups: lower_case=>original_case, followed by original_case=>value (this is the solution which I am doing now instead)
If you are interested in the application case: here is the corresponding branch
In the file dictobject.c, you see in line 1795ff. the relevant code:
static int
dict_update_common(PyObject *self, PyObject *args, PyObject *kwds, char *methname)
{
    PyObject *arg = NULL;
    int result = 0;

    if (!PyArg_UnpackTuple(args, methname, 0, 1, &arg))
        result = -1;

    else if (arg != NULL) {
        _Py_IDENTIFIER(keys);
        if (_PyObject_HasAttrId(arg, &PyId_keys))
            result = PyDict_Merge(self, arg, 1);
        else
            result = PyDict_MergeFromSeq2(self, arg, 1);
    }
    if (result == 0 && kwds != NULL) {
        if (PyArg_ValidateKeywordArguments(kwds))
            result = PyDict_Merge(self, kwds, 1);
        else
            result = -1;
    }
    return result;
}
This tells us that if the object has an attribute keys, the code which is called is a mere merge. The code called there (l. 1915 ff.) makes a distinction between real dicts and other objects. In the case of real dicts, the items are read out with PyDict_GetItem(), which is the "most inner interface" to the object and doesn't bother using any user-defined methods.
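In rough Python terms, the constructor's update step behaves like this simplified sketch (not the real code):

def dict_update(self, arg):
    # Sketch of dict_update_common / PyDict_Merge in Python terms.
    if hasattr(arg, 'keys'):
        if isinstance(arg, dict):
            # Real dicts (including subclasses) are read via the internal
            # hash table, bypassing overridden keys/__iter__/__getitem__.
            for k, v in dict.items(arg):
                self[k] = v
        else:
            # Other mappings go through their own keys() and __getitem__().
            for k in arg.keys():
                self[k] = arg[k]
    else:
        # Anything else must be an iterable of key/value pairs.
        for k, v in arg:
            self[k] = v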
So instead of inheriting from dict, you should use collections.UserDict (the UserDict module in Python 2).
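For example, in Python 3, dict() does go through the overridden methods, because a UserDict is not a real dict:

>>> from collections import UserDict
>>> class D(UserDict):
...     def keys(self):
...         return ['y']
...     def __getitem__(self, key):
...         return 2
...
>>> dict(D(x=1))
{'y': 2}

(Note that this still fails isinstance(d, dict) checks, as discussed below.)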
Is it possible to overload the 'mapping' somehow to force the dict constructor to use my presentation of key-value pairs?
No.
dict being a built-in type, redefining its semantics would certainly cause outright breakage elsewhere.
You've got a library that you can't override the behavior of dict in, that's tough, but redefining the language primitives isn't the answer. You'd probably find it irksome if someone screwed with the commutative property of integer addition behind your back; that's why they can't.
And with regard to your comment "UserDict (correctly) gives False in isinstance(d, dict) checks", of course it does because it isn't a dict and dict has very specific invariants which UserDict can't assure.

Class that acts as mapping for **unpacking

Without subclassing dict, what would a class need in order to be considered a mapping, so that it can be passed to a function with **?
from abc import ABCMeta
class uobj:
    __metaclass__ = ABCMeta

uobj.register(dict)
def f(**k): return k
o = uobj()
f(**o)
# outputs: f() argument after ** must be a mapping, not uobj
At least to the point where it throws errors about missing mapping functionality, so I can begin implementing.
I reviewed emulating container types, but simply defining the magic methods has no effect, and using ABCMeta to register it as a virtual dict passes subclass assertions but fails isinstance(o, dict). Ideally, I don't even want to use ABCMeta.
The __getitem__() and keys() methods will suffice:
>>> class D:
...     def keys(self):
...         return ['a', 'b']
...     def __getitem__(self, key):
...         return key.upper()
...
>>> def f(**kwds):
...     print(kwds)
...
>>> f(**D())
{'a': 'A', 'b': 'B'}
If you're trying to create a Mapping — not just satisfy the requirements for passing to a function — then you really should inherit from collections.abc.Mapping. As described in the documentation, you need to implement just:
__getitem__
__len__
__iter__
The Mixin will implement everything else for you: __contains__, keys, items, values, get, __eq__, and __ne__.
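A minimal sketch of that approach (UpperMap is a made-up name for illustration):

from collections.abc import Mapping

class UpperMap(Mapping):
    # Maps each stored key to its uppercase form.
    def __init__(self, keys):
        self._keys = list(keys)
    def __getitem__(self, key):
        return key.upper()
    def __len__(self):
        return len(self._keys)
    def __iter__(self):
        return iter(self._keys)

def f(**kwds):
    return kwds

print(f(**UpperMap(['a', 'b'])))  # {'a': 'A', 'b': 'B'}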
The answer can be found by digging through the source.
When attempting to use a non-mapping object with **, the following error is given:
TypeError: 'Foo' object is not a mapping
If we search CPython's source for that error, we can find the code that causes that error to be raised:
case TARGET(DICT_UPDATE): {
    PyObject *update = POP();
    PyObject *dict = PEEK(oparg);
    if (PyDict_Update(dict, update) < 0) {
        if (_PyErr_ExceptionMatches(tstate, PyExc_AttributeError)) {
            _PyErr_Format(tstate, PyExc_TypeError,
                          "'%.200s' object is not a mapping",
                          Py_TYPE(update)->tp_name);
PyDict_Update is actually dict_merge, and the error is thrown when dict_merge returns a negative number. If we check the source for dict_merge, we can see what leads to -1 being returned:
/* We accept for the argument either a concrete dictionary object,
 * or an abstract "mapping" object.  For the former, we can do
 * things quite efficiently.  For the latter, we only require that
 * PyMapping_Keys() and PyObject_GetItem() be supported.
 */
if (a == NULL || !PyDict_Check(a) || b == NULL) {
    PyErr_BadInternalCall();
    return -1;
The key part being:
For the latter, we only require that PyMapping_Keys() and PyObject_GetItem() be supported.
