Python augmented assignment issue

I ran into something interesting about the Python augmented assignment +=.
It seems that automatic data type conversion is not always done for a += b if a is of a 'simpler' data type, whereas a = a + b always seems to work.
Cases where the conversion is done:
a = 1
b = 1j
a = 1
b = 0.5
Case where the conversion is not done:
from numpy import array
a = array([0, 0, 0])
b = array([0, 0, 1j])
After a += b, a remains an integer array instead of becoming a complex array.
I used to think a += b was the same as a = a + b. What is the difference between them in the underlying implementation?

For the + operator, Python defines three "special" methods that an object may implement:
__add__: adds two items (+ operator). When you do a + b, the __add__ method of a is called with b as an argument.
__radd__: reflected add; for a + b, the __radd__ method of b is called with a as an argument. This is only used when a doesn't know how to perform the add and the two objects are of different types.
__iadd__: in-place add; used for a += b where the result is assigned back to the left variable. This is provided separately because it might be possible to implement it in a more efficient way. For example, if a is a list, then a += b is the same as a.extend(b). However, in the case of c = a + b you have to make a copy of a before you extend it since a is not to be modified in this case. Note that if you don't implement __iadd__ then Python will just call __add__ instead.
So since these different operations are implemented with separate methods, it is possible (but generally bad practice) to implement them so they do totally different things, or perhaps in this case, only slightly different things.
Others have deduced that you're using NumPy and explained its behavior. However, you asked about the underlying implementation. Hopefully you now see why it is sometimes the case that a += b is not the same as a = a + b. By the way, a similar trio of methods may also be implemented for other operations. See this page for a list of all the supported in-place methods.
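To make the dispatch concrete, here is a minimal sketch (the Acc class is hypothetical, purely for illustration) that logs which of the three methods gets called:
class Acc:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        print('__add__ called')
        return Acc(self.val + other)
    def __radd__(self, other):
        print('__radd__ called')
        return Acc(other + self.val)
    def __iadd__(self, other):
        print('__iadd__ called')
        self.val += other   # mutate in place...
        return self         # ...and hand back the same object for rebinding
a = Acc(1)
b = a + 2    # prints '__add__ called'
c = 2 + a    # prints '__radd__ called' (int doesn't know how to add an Acc)
a += 3       # prints '__iadd__ called'; a is still the same object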

If array is numpy.array (you don't actually specify), then the issue is that these arrays cannot change their type. When you create an array without a type specifier, NumPy guesses a type. If you then attempt an operation that type doesn't support (like adding it to a type with a larger domain, such as complex), NumPy knows how to perform the calculation, but it also knows that the result can only be stored in the type with the larger domain. It complains (on my machine, anyway, the first time I do such an assignment) that the result doesn't fit. When you do a regular addition, a new array has to be made in any case, and NumPy gives it the correct type.
>>> a=numpy.array([1])
>>> a.dtype
dtype('int32')
>>> b=numpy.array([1+1j])
>>> b.dtype
dtype('complex128')
>>> a+b
array([ 2.+1.j])
>>> (a+b).dtype
dtype('complex128')
>>> a+=b
>>> a
array([2])
>>> a.dtype
dtype('int32')
>>>

The difference between a = a + b and a += b is that the latter addition will, whenever possible, be done "in place", that is, by modifying the object a. You can easily see this with lists.
a = b = [1, 2]
a += [3]
print b # [1, 2, 3]
a = b = [1, 2]
a = a + [3]
print b # [1, 2]

Rafe Kettler's answer is correct, but it seems you ended up with a = [0, 0, 0] after adding b to it (according to your post).
Well, if you're using NumPy or SciPy (I say this because I see array and wonder which array is being created here), then this is "normal", and should even raise a warning:
ComplexWarning: Casting complex values to real discards the imaginary part
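A practical fix, assuming NumPy: give a a complex dtype up front, so the in-place addition has somewhere to store the imaginary part:
import numpy as np
a = np.array([0, 0, 0], dtype=complex)   # complex dtype from the start
b = np.array([0, 0, 1j])
a += b        # no truncation now; the dtype already accommodates b
print(a)      # [0.+0.j 0.+0.j 0.+1.j]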

Related

Is the marked information really lost?

a = a[0] = [['Is this information lost?']]
print(a)
Is there any way to get the string back?
If not, how is this handled memory-wise?
This example is a bit more illustrative of what's going on:
>>> b = [1, 2]
>>> print(id(b))
22918532837512
>>> a = a[0] = b
>>> print(a)
[[...], 2]
>>> print(id(a))
22918532837512
>>> print(id(a[0]))
22918532837512
>>> print(b)
[[...], 2]
>>> print(id(b))
22918532837512
It is important to understand here that = is not formally an operator in Python, but rather a delimiter, part of the syntax for assignment statements. In Python, unlike in C or C++, assignments are not expressions. Multi-assignment statements such as x = y = z are directly accommodated by assignment-statement syntax, not as a consequence of using a single assignment as an expression. A multi-assignment specifies that the value being assigned should be assigned to each target list, so that x = y = z is equivalent to
x, y = z, z
except that z is evaluated only once. The targets are assigned from left to right (the language reference specifies this order), so the above works about the same as
x = z
y = z
except, again, for the multiple evaluation of z.
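You can observe the left-to-right target order with a hypothetical logging dict subclass:
class LoggingDict(dict):
    def __setitem__(self, key, value):
        print('assigning', key)
        super().__setitem__(key, value)
d = LoggingDict()
d['first'] = d['second'] = 0
# prints 'assigning first' then 'assigning second'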
And with that, we can understand the original statement. This:
a = a[0] = [['Is this information lost?']]
works like
temp = [['Is this information lost?']]
a = temp
a[0] = temp
del temp
except that it does not involve a temporary name binding. Indeed, it should be clear that the previous is also equivalent to this, which is how I imagine most people would write it:
a = [['Is this information lost?']]
a[0] = a
Thus, to answer the original question, the string 'Is this information lost?' was never accessible other than via the list, so the assignment to a[0] leaves no name binding through which the string can be reached. The string is indeed lost at that point, in that sense. Python will continue to track it, however, until it is garbage collected.
As far as I can tell, a is a circular structure -- a list whose one and only element is itself.
>>> len(a)
1
>>> a
[[...]]
>>> len(a[0])
1
>>> a[0]
[[...]]
There is no longer any reference to the string, and hence no way to recover it from the Python interpreter. It should also therefore have been garbage-collected and no longer be in memory (although this is not guaranteed, and it may still be cached but inaccessible).
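To watch the string actually become unreachable, here is a small sketch; NoisyStr is a hypothetical helper, and in CPython the message appears immediately thanks to reference counting (other implementations may defer collection):
class NoisyStr(str):
    def __del__(self):
        print('string collected')   # announce reclamation
a = [[NoisyStr('Is this information lost?')]]
a[0] = a   # drops the last reference to the inner list, and with it the string
# CPython prints 'string collected' at this point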

How to monkey patch python list __setitem__ method

I'd like to monkey-patch Python lists, in particular, replacing the __setitem__ method with custom code. Note that I am not trying to extend, but to overwrite the builtin types. For example:
>>> # Monkey Patch
... # Replace list.__setitem__ with a Noop
...
>>> myList = [1,2,3,4,5]
>>> myList[0] = "Nope"
>>> myList
[1, 2, 3, 4, 5]
Yes, I know that is a downright perverted thing to do to Python code. No, my use case doesn't really make sense. Nonetheless, can it be done?
Possible avenues:
Setting a read only attribute on builtins using ctypes
The forbiddenfruit module allows patching of C builtins, but does not work when trying to override the list methods
This Gist also manages to monkey-patch builtins by manipulating the object's dictionary. I've updated it to Python 3 here, but it still doesn't allow overriding of the methods.
The Pyrthon library overrides the list type in a module to make it immutable by using AST transformation. This could be worth investigating.
Demonstrative example
I actually managed to override the methods themselves, as shown below:
import ctypes
def magic_get_dict(o):
    # find the address of the dict whose offset is stored in the type
    dict_addr = id(o) + type(o).__dictoffset__
    # retrieve the dict object itself
    dict_ptr = ctypes.cast(dict_addr, ctypes.POINTER(ctypes.py_object))
    return dict_ptr.contents.value
def magic_flush_mro_cache():
    ctypes.PyDLL(None).PyType_Modified(ctypes.cast(id(object), ctypes.py_object))
print(list.__setitem__)
dct = magic_get_dict(list)
dct['__setitem__'] = lambda s, k, v: s
magic_flush_mro_cache()
print(list.__setitem__)
x = [1, 2, 3, 4, 5]
print(x.__setitem__)
x.__setitem__(0, 10)
x[1] = 20
print(x)
Which outputs the following:
➤ python3 override.py
<slot wrapper '__setitem__' of 'list' objects>
<function <lambda> at 0x10de43f28>
<bound method <lambda> of [1, 2, 3, 4, 5]>
[1, 20, 3, 4, 5]
But as shown in the output, this doesn't seem to affect the normal syntax for setting an item (x[0] = 0)
Alternative: Monkey patching an individual list instance
As a lesser alternative, if I was able to monkey patch an individual list's instance, this could work too. Perhaps by changing the class pointer of the list to a custom class.
A little late to the party, but nonetheless, here's the answer.
As user2357112 hinted in the comment above, modifying the dict won't suffice, since __getitem__ (and the other double-underscore names) are mapped to their slots and won't be updated without calling update_slot (which isn't exported, so that would be a little tricky).
Inspired by the above comment, here's a working example of making __setitem__ a no-op for specific lists:
# assuming v3.8 (tested on Windows x64 and Ubuntu x64)
# definition of PyTypeObject: https://github.com/python/cpython/blob/3.8/Include/cpython/object.h#L177
# no extensive testing was performed, and I'll let others decide if this is a good idea or not, but it's possible
import ctypes
Py_TPFLAGS_HEAPTYPE = (1 << 9)
# calculate the offset of the tp_flags field
offset = ctypes.sizeof(ctypes.c_ssize_t) * 1 # PyObject_VAR_HEAD.ob_base.ob_refcnt
offset += ctypes.sizeof(ctypes.c_void_p) * 1 # PyObject_VAR_HEAD.ob_base.ob_type
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1 # PyObject_VAR_HEAD.ob_size
offset += ctypes.sizeof(ctypes.c_void_p) * 1 # tp_name
offset += ctypes.sizeof(ctypes.c_ssize_t) * 2 # tp_basicsize+tp_itemsize
offset += ctypes.sizeof(ctypes.c_void_p) * 1 # tp_dealloc
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1 # tp_vectorcall_offset
offset += ctypes.sizeof(ctypes.c_void_p) * 7 # tp_getattr+tp_setattr+tp_as_async+tp_repr+tp_as_number+tp_as_sequence+tp_as_mapping
offset += ctypes.sizeof(ctypes.c_void_p) * 6 # tp_hash+tp_call+tp_str+tp_getattro+tp_setattro+tp_as_buffer
tp_flags = ctypes.c_ulong.from_address(id(list) + offset)
assert(tp_flags.value == list.__flags__) # should be the same
lst1 = [1,2,3]
lst2 = [1,2,3]
dont_set_me = [lst1] # these lists cannot be set
# define new method
orig = list.__setitem__
def new_setitem(self, *args):
    if [_ for _ in dont_set_me if _ is self]:  # check for an identical object in the list
        print('Nope')
    else:
        return orig(self, *args)
tp_flags.value |= Py_TPFLAGS_HEAPTYPE # add flag, to allow type_setattro to continue
list.__setitem__ = new_setitem # set method, this will already call PyType_Modified and update_slot
tp_flags.value &= (~Py_TPFLAGS_HEAPTYPE) # remove flag
print(lst1, lst2) # > [1, 2, 3] [1, 2, 3]
lst1[0],lst2[0]='x','x' # > Nope
print(lst1, lst2) # > [1, 2, 3] ['x', 2, 3]
Edit
See here for why it's not supported to begin with. Mainly, as explained by Guido van Rossum:
This is prohibited intentionally to prevent accidental fatal changes to built-in types (fatal to parts of the code that you never thought of). Also, it is done to prevent the changes from affecting different interpreters residing in the address space, since built-in types (unlike user-defined classes) are shared between all such interpreters.
I also searched for all usages of Py_TPFLAGS_HEAPTYPE in cpython and they all seem to be related to GC or some validations.
So I guess if:
You don't change the type's structure (I believe the above doesn't)
You're not using multiple interpreters in the same process
You remove the flag and immediately restore it in a single-threaded state
You don't really do anything that can affect GC when the flag is removed
You'll just be fine <generic disclaimer here>.
Can't be done. If you force it using ctypes, you will just crash the Python runtime faster than anything else, as many things internally make direct use of Python's data types.
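As for the lesser alternative in the question, changing the class pointer of an individual list through plain attribute assignment doesn't work either: CPython refuses __class__ assignment when the original type is a non-heap builtin like list. A quick check (NopList is a hypothetical subclass):
class NopList(list):
    def __setitem__(self, key, value):
        pass   # silently swallow item assignment
x = [1, 2, 3]
try:
    x.__class__ = NopList   # rejected for instances of builtin types
except TypeError as e:
    print(e)   # e.g. "__class__ assignment only supported for heap types ..."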

Function that works as append for numpy.array

How can I write a function that works like array.append() for numpy.array?
I have tried this
import numpy as np
def append_np(ar, el):
    ar = np.append(ar, el)
z = np.array([5], dtype='int32')
z = np.append(z, 6)
append_np(z, 7)
print z
but this code appends only '6':
[5 6]
"that works like array.append()"
First of all, the data structure in Python you most likely are referring to here as "array" is called "list".
Then, the append() operations for Python lists and NumPy arrays behave fundamentally differently. Say that l is a Python list: l.append() modifies the list in place and returns None. In contrast, NumPy's append() does not change the array it operates on; it returns a new array object.
See: http://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html
A copy of arr with values appended to axis. Note that append does not
occur in-place: a new array is allocated and filled.
This explains why you need to return the result of your append_np() function and assign the return value, as in new_z = append_np(z, 7).
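A minimal corrected version of append_np, then, just returns the freshly allocated array:
import numpy as np
def append_np(ar, el):
    # np.append allocates and fills a new array; hand it back to the caller
    return np.append(ar, el)
z = np.array([5], dtype='int32')
z = append_np(z, 6)
z = append_np(z, 7)
print(z)    # [5 6 7]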
You have probably used this function for a Python list:
def append(ar, el):
    ar = ar.append(el)
and called it like this:
z = [1, 2]
append(z, 7)
print z
And you saw that it did indeed modify your z. But why, what happened in this function? The object that was passed as the first argument (bound to the name ar) got modified in place. That is why z "on the outside" changed. You made use of this side effect of the function without knowing it, which is dangerous.
Within the function, the name ar got re-bound to the None singleton (the return value of the list append method). You did not return this object or use it, so the assignment served no purpose in your program. You discovered yourself that this approach is problematic: when you restructured your function into append_np(), you suddenly realized that it no longer had a "side effect" on z.
That is, for Python lists you would not outsource the append operation to another function. You would just, from the very beginning, write:
z = [1, 2]
z.append(7)
print z

In Python, why doesn't 'y = x; y += 1' also increment x?

First, get a function for displaying the reference count (note that we subtract 1 each time to get the correct value, as the function call itself INCREFs the argument):
>>> from sys import getrefcount as rc
>>> x=1.1
>>> rc(x)-1
1
Now make another reference to the same PyObject:
>>> y=x
>>> rc(x)-1
2
>>> rc(y)-1
2
>>> x is y
True
Now perform an operation on the second handle, y:
>>> y+=1
This should be invoking PyNumber_InPlaceAdd on the PyObject that y points to.
So if this were true, I would expect x to also read 2.1:
>>> x,y
(1.1, 2.1)
>>> x is y
False
>>> rc(x)-1
1
>>> rc(y)-1
1
So my question is, what is Python doing internally to provide the right behaviour, rather than the behaviour I would expect from looking at PyNumber_InPlaceAdd?
(Note: I am using 1.1; if I used 1 the initial reference count would be >300, because 1 must be used all over the place behind-the-scenes in CPython, and it is clever enough to reuse objects.)
(This also raises the question: if I have foo = 20; bar = 19; bar += 1, does this mean Python has to look through all its objects and check whether one with this value already exists, and if so reuse it? A simple test shows that the answer is no, which is good news; it would be horribly slow once the program got big. So Python must just optimise for small integers.)
You don't need getrefcount for this, you can just use id:
>>> x = 1.1
>>> id(x)
50107888
>>> y = x
>>> id(y)
50107888 # same object
>>> y += 1
>>> id(y)
40186896 # different object
>>> id(x)
50107888 # no change there
float objects (along with e.g. str and int) are immutable in Python; they cannot be changed in place. The addition operation therefore creates a new object with the new value and assigns it to the name y, effectively:
temp = y + 1
y = temp
In CPython, integers from -5 to 256 inclusive are "interned", i.e. stored for reuse, such that any operation whose result is, e.g., 1 gives a reference to the same object. This saves memory compared to creating new objects for these frequently-used values each time they're needed. You're right that it would be a pain to search all existing objects for a match every time a new object might be needed, so this is only done over a limited range. Using a contiguous range also means that the "search" is really just an offset into an array.
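You can see the cache boundary at the interactive prompt (a sketch; exact results can vary, since the compiler may also merge equal constants that are compiled together):
>>> a = 256
>>> b = 256
>>> a is b    # both names refer to the single cached 256
True
>>> a = 257
>>> b = 257
>>> a is b    # outside the cached range: typically two distinct objects
False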
Now perform an operation on the second handle, y:
>>> y+=1
This should be invoking PyNumber_InPlaceAdd on the PyObject that y
points to.
Up to here you are right.
But in-place adding of numbers returns a distinct object, not the old one.
The old one, as it is immutable, keeps its value.
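Indeed, float defines no __iadd__ at all, so += falls back to __add__ plus rebinding:
>>> hasattr(float, '__iadd__')
False
>>> y = 1.1
>>> y = y + 1    # what y += 1 effectively does for floats
>>> y
2.1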

What are the semantics of the 'is' operator in Python?

How does the is operator determine if two objects are the same? How does it work? I can't find it documented.
From the documentation:
Every object has an identity, a type and a value. An object's identity never changes once it has been created; you may think of it as the object's address in memory. The 'is' operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address).
This would seem to indicate that it compares the memory addresses of the arguments, though the fact that it says "you may think of it as the object's address in memory" might indicate that the particular implementation is not guaranteed; only the semantics are.
Comparison Operators
is works by comparing the objects referenced by the operands to see if they are the same object.
>>> a = [1, 2]
>>> b = a
>>> a is b
True
>>> c = [1, 2]
>>> a is c
False
c is not the same list object as a, so the is relation is false.
To add to the other answers, you can think of a is b as working like a call is_(a, b):
def is_(a, b):
    return id(a) == id(b)
Note that you cannot blindly replace a is b with id(a) == id(b) in general: if the operands are temporary objects, the first may already have been freed, and its address reused, by the time the second is created. Passing both operands as parameters, as above, keeps them alive for the duration of the comparison.
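The pitfall shows up with temporaries; e.g., in CPython the first list below may be freed, and its address reused, before the second is created:
>>> id([0, 1]) == id([2, 3])   # can be True: same address, different (dead) object
True
>>> [0, 1] is [2, 3]           # is keeps both operands alive, so this is reliable
False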
