How to monkey patch python list __setitem__ method - python

I'd like to monkey-patch Python lists, in particular, replacing the __setitem__ method with custom code. Note that I am not trying to extend, but to overwrite the builtin types. For example:
>>> # Monkey Patch
... # Replace list.__setitem__ with a Noop
...
>>> myList = [1,2,3,4,5]
>>> myList[0] = "Nope"
>>> myList
[1, 2, 3, 4, 5]
Yes, I know that is a downright perverted thing to do to python code. No, my usecase doesn't really make sense. Nonetheless, can it be done?
Possible avenues:
Setting a read only attribute on builtins using ctypes
The forbiddenfruit module allows patching of C builtins, but does not work when trying to override the list methods
This Gist also manages monkey patching of builtins by manipulating the object's dictionary. I've updated it for Python 3 here, but it still doesn't allow overriding the methods.
The Pyrthon library overrides the list type in a module to make it immutable by using AST transformation. This could be worth investigating.
Demonstrative example
I actually managed to override the methods themselves, as shown below:
import ctypes

def magic_get_dict(o):
    # find address of dict whose offset is stored in the type
    dict_addr = id(o) + type(o).__dictoffset__
    # retrieve the dict object itself
    dict_ptr = ctypes.cast(dict_addr, ctypes.POINTER(ctypes.py_object))
    return dict_ptr.contents.value

def magic_flush_mro_cache():
    ctypes.PyDLL(None).PyType_Modified(ctypes.cast(id(object), ctypes.py_object))

print(list.__setitem__)
dct = magic_get_dict(list)
dct['__setitem__'] = lambda s, k, v: s
magic_flush_mro_cache()
print(list.__setitem__)

x = [1, 2, 3, 4, 5]
print(x.__setitem__)
x.__setitem__(0, 10)
x[1] = 20
print(x)
Which outputs the following:
➤ python3 override.py
<slot wrapper '__setitem__' of 'list' objects>
<function <lambda> at 0x10de43f28>
<bound method <lambda> of [1, 2, 3, 4, 5]>
[1, 20, 3, 4, 5]
But as shown in the output, this doesn't affect the normal syntax for setting an item (x[1] = 20).
Alternative: Monkey patching an individual list instance
As a lesser alternative, if I were able to monkey-patch an individual list instance, that could work too. Perhaps by changing the class pointer of the list to a custom class.
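For what it's worth, here is a hedged sketch of why the instance route is also blocked in CPython, together with the subclassing fallback that does work for new instances (NopList is an illustrative name, not from the question):

class NopList(list):  # hypothetical subclass carrying the custom __setitem__
    def __setitem__(self, key, value):
        pass  # silently ignore item assignment

x = NopList([1, 2, 3, 4, 5])
x[0] = "Nope"
print(x)  # [1, 2, 3, 4, 5]

y = [1, 2, 3]
try:
    y.__class__ = NopList  # re-pointing the class of a plain list...
except TypeError as e:
    print(e)  # ...fails: __class__ assignment requires heap types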

A little late to the party, but nonetheless, here's the answer.
As user2357112 hinted in the comment above, modifying the dict won't suffice, since __getitem__ (and the other double-underscore names) are mapped to their slots and won't be updated without calling update_slot (which isn't exported, so that would be a little tricky).
Inspired by the above comment, here's a working example of making __setitem__ a no-op for specific lists:
# assuming v3.8 (tested on Windows x64 and Ubuntu x64)
# definition of PyTypeObject: https://github.com/python/cpython/blob/3.8/Include/cpython/object.h#L177
# no extensive testing was performed and I'll let others decide if this is a good idea or not, but it's possible
import ctypes

Py_TPFLAGS_HEAPTYPE = (1 << 9)

# calculate the offset of the tp_flags field
offset = ctypes.sizeof(ctypes.c_ssize_t) * 1   # PyObject_VAR_HEAD.ob_base.ob_refcnt
offset += ctypes.sizeof(ctypes.c_void_p) * 1   # PyObject_VAR_HEAD.ob_base.ob_type
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1  # PyObject_VAR_HEAD.ob_size
offset += ctypes.sizeof(ctypes.c_void_p) * 1   # tp_name
offset += ctypes.sizeof(ctypes.c_ssize_t) * 2  # tp_basicsize + tp_itemsize
offset += ctypes.sizeof(ctypes.c_void_p) * 1   # tp_dealloc
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1  # tp_vectorcall_offset
offset += ctypes.sizeof(ctypes.c_void_p) * 7   # tp_getattr + tp_setattr + tp_as_async + tp_repr + tp_as_number + tp_as_sequence + tp_as_mapping
offset += ctypes.sizeof(ctypes.c_void_p) * 6   # tp_hash + tp_call + tp_str + tp_getattro + tp_setattro + tp_as_buffer

tp_flags = ctypes.c_ulong.from_address(id(list) + offset)
assert tp_flags.value == list.__flags__  # should be the same

lst1 = [1, 2, 3]
lst2 = [1, 2, 3]
dont_set_me = [lst1]  # these lists cannot be set

# define new method
orig = list.__setitem__
def new_setitem(self, *args):
    if any(item is self for item in dont_set_me):  # check for an identical object in the list
        print('Nope')
    else:
        return orig(self, *args)

tp_flags.value |= Py_TPFLAGS_HEAPTYPE   # add flag, to allow type_setattro to continue
list.__setitem__ = new_setitem          # set the method; this already calls PyType_Modified and update_slot
tp_flags.value &= ~Py_TPFLAGS_HEAPTYPE  # remove flag

print(lst1, lst2)            # > [1, 2, 3] [1, 2, 3]
lst1[0], lst2[0] = 'x', 'x'  # > Nope
print(lst1, lst2)            # > [1, 2, 3] ['x', 2, 3]
Edit
See here why it's not supported to begin with. Mainly, as explained by Guido van Rossum:
This is prohibited intentionally to prevent accidental fatal changes to built-in types (fatal to parts of the code that you never thought of). Also, it is done to prevent the changes from affecting different interpreters residing in the address space, since built-in types (unlike user-defined classes) are shared between all such interpreters.
I also searched for all usages of Py_TPFLAGS_HEAPTYPE in cpython and they all seem to be related to GC or some validations.
So I guess if:
You don't change the type's structure (I believe the above doesn't)
You're not using multiple interpreters in the same process
You remove the flag and immediately restore it in a single-threaded state
You don't really do anything that can affect GC when the flag is removed
You'll just be fine <generic disclaimer here>.

Can't be done. If you force that using ctypes, you will just crash the Python runtime faster than anything else, as many things internally make use of Python data types.

Related

UnboundLocalError when using += operator [duplicate]

I've seen there are actually two (maybe more) ways to concatenate lists in Python:
One way is to use the extend() method:
a = [1, 2]
b = [2, 3]
b.extend(a)
the other to use the plus (+) operator:
b += a
Now I wonder: which of those two options is the 'pythonic' way to do list concatenation, and is there a difference between the two? (I've looked up the official Python tutorial but couldn't find anything about this topic.)
The only difference on a bytecode level is that the .extend way involves a function call, which is slightly more expensive in Python than the INPLACE_ADD.
It's really nothing you should be worrying about, unless you're performing this operation billions of times. It is likely, however, that the bottleneck would lie some place else.
You can't use += for non-local variable (variable which is not local for function and also not global)
def main():
    l = [1, 2, 3]

    def foo():
        l.extend([4])

    def boo():
        l += [5]

    foo()
    print l
    boo()  # this will fail

main()
It's because in the extend case the compiler loads the variable l using the LOAD_DEREF instruction, but for += it uses LOAD_FAST, and you get *UnboundLocalError: local variable 'l' referenced before assignment*.
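You can see the difference with the dis module (a small sketch; the function names mirror the example above):

import dis

def outer():
    l = [1, 2, 3]
    def foo():
        l.extend([4])  # l is only read: compiled as a free variable (LOAD_DEREF)
    def boo():
        l += [5]       # the assignment makes l local: compiled with LOAD_FAST
    return foo, boo

foo, boo = outer()
dis.dis(foo)  # shows LOAD_DEREF for l
dis.dis(boo)  # shows LOAD_FAST for l, which fails at call time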
You can chain function calls, but you can't += a function call directly:
class A:
    def __init__(self):
        self.listFoo = [1, 2]
        self.listBar = [3, 4]

    def get_list(self, which):
        if which == "Foo":
            return self.listFoo
        return self.listBar

a = A()
other_list = [5, 6]
a.get_list("Foo").extend(other_list)
a.get_list("Foo") += other_list  # SyntaxError: can't assign to function call
I would say that there is some difference when numpy is involved (the question asks about concatenating two lists, not numpy arrays, but since it might be an issue for beginners such as me, I hope this can help someone who seeks the solution to this post). For example:
import numpy as np
a = np.zeros((4,4,4))
b = []
b += a
it will fail with the error
ValueError: operands could not be broadcast together with shapes (0,) (4,4,4)
b.extend(a) works perfectly
The .extend() method on lists works with any iterable*, += works with some but can get funky.
import numpy as np
l = [2, 3, 4]
t = (5, 6, 7)
l += t
l
[2, 3, 4, 5, 6, 7]
l = [2, 3, 4]
t = np.array((5, 6, 7))
l += t
l
array([ 7, 9, 11])
l = [2, 3, 4]
t = np.array((5, 6, 7))
l.extend(t)
l
[2, 3, 4, 5, 6, 7]
Python 3.6
*pretty sure .extend() works with any iterable but please comment if I am incorrect
Edit: "extend()" changed to "The .extend() method on lists"
Note: David M. Helmuth's comment below is nice and clear.
Actually, there are differences among the three options: ADD, INPLACE_ADD and extend. The first is consistently slower, while the other two are roughly the same.
With this information, I would rather use extend, which is faster than ADD, and seems to me more explicit of what you are doing than INPLACE_ADD.
Try the following code a few times (for Python 3):
import time

def test():
    x = list(range(10000000))
    y = list(range(10000000))
    z = list(range(10000000))

    # INPLACE_ADD
    t0 = time.process_time()
    z += x
    t_inplace_add = time.process_time() - t0

    # ADD
    t0 = time.process_time()
    w = x + y
    t_add = time.process_time() - t0

    # extend
    t0 = time.process_time()
    x.extend(y)
    t_extend = time.process_time() - t0

    print('ADD {} s'.format(t_add))
    print('INPLACE_ADD {} s'.format(t_inplace_add))
    print('extend {} s'.format(t_extend))
    print()

for i in range(10):
    test()
ADD 0.3540440000000018 s
INPLACE_ADD 0.10896000000000328 s
extend 0.08370399999999734 s
ADD 0.2024550000000005 s
INPLACE_ADD 0.0972940000000051 s
extend 0.09610200000000191 s
ADD 0.1680199999999985 s
INPLACE_ADD 0.08162199999999586 s
extend 0.0815160000000077 s
ADD 0.16708400000000267 s
INPLACE_ADD 0.0797719999999913 s
extend 0.0801490000000058 s
ADD 0.1681250000000034 s
INPLACE_ADD 0.08324399999999343 s
extend 0.08062700000000689 s
ADD 0.1707760000000036 s
INPLACE_ADD 0.08071900000000198 s
extend 0.09226200000000517 s
ADD 0.1668420000000026 s
INPLACE_ADD 0.08047300000001201 s
extend 0.0848089999999928 s
ADD 0.16659500000000094 s
INPLACE_ADD 0.08019399999999166 s
extend 0.07981599999999389 s
ADD 0.1710910000000041 s
INPLACE_ADD 0.0783479999999912 s
extend 0.07987599999999873 s
ADD 0.16435900000000458 s
INPLACE_ADD 0.08131200000001115 s
extend 0.0818660000000051 s
From the CPython 3.5.2 source code: no big difference, since list_inplace_concat simply calls listextend.
static PyObject *
list_inplace_concat(PyListObject *self, PyObject *other)
{
    PyObject *result;

    result = listextend(self, other);
    if (result == NULL)
        return result;
    Py_DECREF(result);
    Py_INCREF(self);
    return (PyObject *)self;
}
ary = ary + ext creates a new list object, then copies the data from lists "ary" and "ext" into it.
ary.extend(ext) merely appends references to "ext"'s elements onto the end of the "ary" list, resulting in far fewer memory transactions.
As a result, .extend works orders of magnitude faster here and doesn't use any additional memory beyond the list being extended and the list it's being extended with.
╰─➤ time ./list_plus.py
./list_plus.py 36.03s user 6.39s system 99% cpu 42.558 total
╰─➤ time ./list_extend.py
./list_extend.py 0.03s user 0.01s system 92% cpu 0.040 total
The first script also uses over 200MB of memory, while the second one doesn't use any more memory than a 'naked' python3 process.
Having said that, the in-place addition does seem to do the same thing as .extend.
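The two scripts are not shown in the answer; presumably they contrasted repeated concatenation with repeated extension, along these lines (a reconstruction, not the original code):

# list_plus.py (reconstruction): each iteration builds a brand-new list,
# so the loop is quadratic in the final length and churns memory.
ary = []
ext = list(range(100))
for _ in range(10000):
    ary = ary + ext

# list_extend.py (reconstruction): extend appends in place,
# so each iteration costs only len(ext).
ary = []
ext = list(range(100))
for _ in range(10000):
    ary.extend(ext)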
I've looked up the official Python tutorial but couldn't find anything anything about this topic
This information happens to be buried in the Programming FAQ:
... for lists, __iadd__ [i.e. +=] is equivalent to calling extend on the list and returning the list. That's why we say that for lists, += is a "shorthand" for list.extend
You can also see this for yourself in the CPython source code: https://github.com/python/cpython/blob/v3.8.2/Objects/listobject.c#L1000-L1011
Only .extend() can be used when the list is in a tuple
This will work
t = ([],[])
t[0].extend([1,2])
while this won't
t = ([],[])
t[0] += [1,2]
The reason is that += ends with an assignment back into the tuple. If you look at the long version:
t[0] = t[0] + [1, 2]
you can see how that would rebind which object is in the tuple, which is not possible; even the in-place form still performs that final assignment. Using .extend() only mutates an object already inside the tuple, which is allowed.
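A related gotcha: because the in-place step runs before the failing tuple assignment, the list inside the tuple is modified even though the statement raises:

t = ([], [])
try:
    t[0] += [1, 2]      # list.__iadd__ extends in place, then the
except TypeError as e:  # write-back t[0] = ... fails on the tuple
    print(e)            # 'tuple' object does not support item assignment
print(t)                # ([1, 2], []) -- the list was extended anyway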
According to Python for Data Analysis:
“Note that list concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable. ”
Thus,
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
is faster than the concatenative alternative:
everything = []
for chunk in list_of_lists:
    everything = everything + chunk

Is something like Perl's lvalue or Lisp's setf possible in Python?

In Lisp you can say:
(setf (aref a 1) 5)
In Perl you can say:
substr( $string, $start, $stop ) =~ s/a/b/g
Is something like this possible in Python? I mean, is it possible to use a function result as an lvalue (as the target of an assignment)?
No. Assigning to the result of a function call is specifically prohibited at the compiler level:
>>> foo() = 3
File "<stdin>", line 1
SyntaxError: can't assign to function call
There are however two special cases in the Python syntax:
# Slice assignment
a = [1,2,3,4]
a[0:2] = 98, 99 # (a will become [98, 99, 3, 4])
# Tuple assignment
(x, y, z) = (10, 20, 30)
Note also that in Python there is a statement/function duality and an assignment or an augmented assignment (+=, *= ...) is not just a normal operator, but a statement and has special rules.
Moreover, in Python there is no general concept of a "pointer": the only way to pass a function a place where to store something is to pass a "setter" closure, because to get an assignable place you need to use explicit names or indexing, or you need to work with the instance dictionary if the place is an object instance member.
# Pass the function foo where to store the result
foo( lambda value : setattr(myObject, "member", value) )
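Spelled out a little more fully (Target, my_object and compute are illustrative names, not part of any established API):

class Target:
    member = None

my_object = Target()

def compute(setter):
    # produce a result, then store it wherever the caller designated
    setter(42)

compute(lambda value: setattr(my_object, "member", value))
print(my_object.member)  # 42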
No, there isn't any way to do this in general. The slice notation comes close in a limited case, as you can do things like this:
>>> a = [1, 2, 3]
>>> a[1:2] = [5, 6]
>>> a
[1, 5, 6, 3]
In short, no.
However, if you define __setitem__, you can assign to a subscript, e.g.
foo['subscript'] = 7
And you could return foo (and also the subscript, if you wanted) from a function.
container, subscript = whatevs()
container[subscript] = 7
or, in one line:
operator.setitem(*(whatevs()+(7,)))
See operator.
Generally, no (don't stop reading!!!!). Observe the following:
class test:
    test = 4

test().test = 5
# we can no longer refer to the created object.

x = test().test = 6
x  # 6
However, doing some searching I found this (which looks like bad practice, but usable):
globals()["varname"] = 5
varname # 5
So, mixing your Perl with my Python we get:
globals()[substr( $string, $start, $stop )] = something
substr( $string, $start, $stop ) # something
# Note: wouldn't work because the function returns a string.
# I just don't know what the function returns.
# so exec("print " +substr( $string, $start, $stop ) I guess
# similarly, and possibly a little better practice
locals()["somethingdif"] = somethingelse
somethingdif # somethingelse
To mitigate massive downvoting, I should mention you can totally screw up your program with this. But you probably know that. Just make sure you don't overwrite existing variables when using this method by checking "somevar" not in locals() or "somevar" not in globals().

How Does Calling Work In Python? [duplicate]

This question already has answers here:
Does Python make a copy of objects on assignment?
(5 answers)
How do I pass a variable by reference?
(39 answers)
Why can a function modify some arguments as perceived by the caller, but not others?
(13 answers)
Closed last month.
For a project I'm working on, I'm implementing a linked-list data-structure, which is based on the idea of a pair, which I define as:
class Pair:
    def __init__(self, name, prefs, score):
        self.name = name
        self.score = score
        self.preferences = prefs
        self.next_pair = 0
        self.prev_pair = 0
where self.next_pair and self.prev_pair are pointers to the next and previous links, respectively.
To set up the linked-list, I have an install function that looks like this.
def install(i, pair):
    flag = 0
    try:
        old_pair = pair_array[i]
        while old_pair.next_pair != 0:
            if old_pair == pair:
                #if pair in remainders: remainders.remove(pair)
                return 0
            if old_pair.score < pair.score:
                flag = 1
                if old_pair.prev_pair == 0:  # we are at the beginning
                    old_pair.prev_pair = pair
                    pair.next_pair = old_pair
                    pair_array[i] = pair
                    break
                else:  # we are not at the beginning
                    pair.prev_pair = old_pair.prev_pair
                    pair.next_pair = old_pair
                    old_pair.prev_pair = pair
                    pair.prev_pair.next_pair = pair
                    break
            else:
                old_pair = old_pair.next_pair
        if flag == 0:
            if old_pair == pair:
                #if pair in remainders: remainders.remove(pair)
                return 0
            if old_pair.score < pair.score:
                if old_pair.prev_pair == 0:
                    old_pair.prev_pair = pair
                    pair.next_pair = old_pair
                    pair_array[i] = pair
                else:
                    pair.prev_pair = old_pair.prev_pair
                    pair.next_pair = old_pair
                    old_pair.prev_pair = pair
                    pair.prev_pair.next_pair = pair
            else:
                old_pair.next_pair = pair
                pair.prev_pair = old_pair
    except KeyError:
        pair_array[i] = pair
        pair.prev_pair = 0
        pair.next_pair = 0
Over the course of the program, I am building up a dictionary of these linked-lists, and taking links off of some and adding them in others. Between being pruned and re-installed, the links are stored in an intermediate array.
Over the course of debugging this program, I have come to realize that my understanding of the way Python passes arguments to functions is flawed. Consider this test case I wrote:
def test_install():
    p = Pair(20000, [3, 1, 2, 50], 45)
    print p.next_pair
    print p.prev_pair
    parse_and_get(g)
    first_run()
    rat = len(juggler_array)/len(circuit_array)
    pref_size = get_pref_size()
    print pref_size
    print install(3, p)
    print p.next_pair.name
    print p.prev_pair
When I run this test, I get the following result.
0
0
10
None
10108
0
What I don't understand is why the second call to p.next_pair produces a different result (10108) than the first call (0). install does not return a Pair object that can overwrite the one passed in (it returns None), and it's not as though I'm passing install a pointer.
My understanding of call-by-value is that the interpreter copies the values passed into a function, leaving the caller's variables unchanged. For example, if I say
def foo(x):
    x = x+1
    return x

baz = 2
y = foo(baz)
print y
print baz
Then 3 and 2 should be printed, respectively. And indeed, when I test that out in the Python interpreter, that's what happens.
I'd really appreciate it if anyone can point me in the right direction here.
In Python, everything is an object. Simple assignment stores a reference to the assigned object in the assigned-to name. As a result, it is more straightforward to think of Python variables as names that are assigned to objects, rather than objects that are stored in named locations.
For example:
baz = 2
... stores in baz a pointer, or reference, to the integer object 2 which is stored elsewhere. (Since the type int is immutable, Python actually has a pool of small integers and reuses the same 2 object everywhere, but this is an implementation detail that need not concern us much.)
When you call foo(baz), foo()'s local variable x also points to the integer object 2 at first. That is, the foo()-local name x and the global name baz are names for the same object, 2. Then x = x + 1 is executed. This changes x to point to a different object: 3.
It is important to understand: x is not a box that holds 2, and 2 is then incremented to 3. No, x initially points to 2 and that pointer is then changed to point to 3. Naturally, since we did not change what object baz points to, it still points to 2.
Another way to explain it is that in Python, all argument passing is by value, but all values are references to objects.
A counter-intuitive result of this is that if an object is mutable, it can be modified through any reference and all references will "see" the change. For example, consider this:
baz = [1, 2, 3]

def foo(x):
    x[0] = x[0] + 1

foo(baz)
print baz
>>> [2, 2, 3]
This seems very different from our first example. But in reality, the argument is passed the same way. foo() receives a pointer to baz under the name x and then performs an operation on it that changes it (in this case, the first element of the list is pointed to a different int object). The difference is that the name x is never pointed to a new object; it is x[0] that is modified to point to a different object. x itself still points to the same object as baz. (In fact, under the hood the assignment to x[0] becomes a method call: x.__setitem__().) Therefore baz "sees" the modification to the list. How could it not?
You don't see this behavior with integers and strings because you can't change integers or strings; they are immutable types, and when you modify them (e.g. x = x + 1) you are not actually modifying them but binding your variable name to a completely different object. If you change baz to a tuple, e.g. baz = (1, 2, 3), you will find that foo() gives you an error because you can't assign to elements of a tuple; tuples are another immutable type. "Changing" a tuple requires creating a new one, and assignment then points the variable to the new object.
Objects of classes you define are mutable and so your Pair instance can be modified by any function it is passed into -- that is, attributes may be added, deleted, or reassigned to other objects. None of these things will re-bind any of the names pointing to your object, so all the names that currently point to it will "see" the changes.
Python does not copy anything when passing variables to a function. It is neither call-by-value nor call-by-reference, but of those two it is more similar to call-by-reference. You could think of it as "call-by-value, but the value is a reference".
If you pass a mutable object to a function, then modifying that object inside the function will affect the object everywhere it appears. (If you pass an immutable object to a function, like a string or an integer, then by definition you can't modify the object at all.)
The reason this isn't technically pass-by-reference is that you can rebind a name so that the name refers to something else entirely. (For names of immutable objects, this is the only thing you can do to them.) Rebinding a name that exists only inside a function doesn't affect any names that might exist outside the function.
In your first example with the Pair objects, you are modifying an object, so you see the effects outside of the function.
In your second example, you are not modifying any objects, you are just rebinding names to other objects (other integers in this case). baz is a name that points to an integer object (in Python, everything is an object, even integers) with a value of 2. When you pass baz to foo(x), the name x is created locally inside the foo function on the stack, and x is set to the pointer that was passed into the function -- the same pointer as baz. But x and baz are not the same thing, they only contain pointers to the same object. On the x = x+1 line, x is rebound to point to an integer object with a value of 3, and that pointer is what is returned from the function and used to bind the integer object to y.
If you rewrote your first example to explicitly create a new Pair object inside your function based on the information from the Pair object passed into it (whether this is a copy you then modify, or if you make a constructor that modifies the data on construction) then your function would not have the side-effect of modifying the object that was passed in.
Edit: By the way, in Python you shouldn't use 0 as a placeholder to mean "I don't have a value" -- use None. And likewise you shouldn't use 0 to mean False, like you seem to be doing in flag. But all of 0, None and False evaluate to False in boolean expressions, so no matter which of those you use, you can say things like if not flag instead of if flag == 0.
I suggest that you forget about implementing a linked list, and simply use an instance of a Python list. If you need something other than the default Python list, maybe you can use something from a Python module such as collections.
A Python loop to follow the links in a linked list will run at Python interpreter speed, which is to say, slowly. If you simply use the built-in list class, your list operations will happen in Python's C code, and you will gain speed.
If you need something like a list but with fast insertion and fast deletion, can you make a dict work? If there is some sort of ID value (string or integer or whatever) that can be used to impose an ordering on your values, you could just use that as a key value and gain lightning fast insert and delete of values. Then if you need to extract values in order, you can use the dict.keys() method function to get a list of key values and use that.
But if you really need linked lists, I suggest you find code written and debugged by someone else, and adapt it to your needs. Google search for "python linked list recipe" or "python linked list module".
I'm going to throw in a slightly complicating factor:
>>> def foo(x):
... x *= 2
... return x
...
Define a slightly different function using a method I know is supported for numbers, lists, and strings.
First, call it with strings:
>>> baz = "hello"
>>> y = foo(baz)
>>> y
'hellohello'
>>> baz
'hello'
Next, call it with lists:
>>> baz=[1,2,2]
>>> y = foo(baz)
>>> y
[1, 2, 2, 1, 2, 2]
>>> baz
[1, 2, 2, 1, 2, 2]
>>>
With strings, the argument isn't modified. With lists, the argument is modified.
If it were me, I'd avoid modifying arguments within methods.

Python augmented assignment issue

I ran into something interesting about the Python augmented assignment +=.
It seems that automatic data type conversion is not always done for a += b if a is a 'simpler' data type, while a = a + b seems to always work.
cases where the conversion is done
a = 1
b = 1j
a = 1
b = 0.5
case where the conversion is not done
from numpy import array
a = array([0, 0 ,0])
b = array([0, 0, 1j])
after a += b, a remains an integer array, instead of a complex array
I used to think a += b was the same as a = a + b; what is the difference between them in the underlying implementation?
For the + operator, Python defines three "special" methods that an object may implement:
__add__: adds two items (+ operator). When you do a + b, the __add__ method of a is called with b as an argument.
__radd__: reflected add; for a + b, the __radd__ method of b is called with a as an argument. This is only used when a doesn't know how to do the add and the two objects are of different types.
__iadd__: in-place add; used for a += b where the result is assigned back to the left variable. This is provided separately because it might be possible to implement it in a more efficient way. For example, if a is a list, then a += b is the same as a.extend(b). However, in the case of c = a + b you have to make a copy of a before you extend it since a is not to be modified in this case. Note that if you don't implement __iadd__ then Python will just call __add__ instead.
So since these different operations are implemented with separate methods, it is possible (but generally bad practice) to implement them so they do totally different things, or perhaps in this case, only slightly different things.
Others have deduced that you're using NumPy and explained its behavior. However, you asked about the underlying implementation. Hopefully you now see why it is sometimes the case that a += b is not the same as a = a + b. By the way, a similar trio of methods may also be implemented for other operations. See this page for a list of all the supported in-place methods.
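To make the trio concrete, here is a toy class sketching all three hooks (Bag is an illustrative name):

class Bag:
    def __init__(self, items):
        self.items = list(items)
    def __add__(self, other):   # Bag + other: build and return a new object
        return Bag(self.items + list(other))
    def __radd__(self, other):  # other + Bag, tried after other's __add__ gives up
        return Bag(list(other) + self.items)
    def __iadd__(self, other):  # Bag += other: mutate in place and return self
        self.items.extend(other)
        return self

b = Bag([1, 2])
b += [3]     # __iadd__: same object, now holds [1, 2, 3]
c = b + [4]  # __add__: new Bag, b is untouched
d = [0] + b  # list.__add__ returns NotImplemented, so Bag.__radd__ runs
print(b.items, c.items, d.items)  # [1, 2, 3] [1, 2, 3, 4] [0, 1, 2, 3]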
If array is numpy.array (you don't actually specify), then the issue is that these arrays cannot change their type. When you create the array without a type specifier, it guesses a type. If you then attempt an operation that type doesn't support (like adding it to a type with a larger domain, such as complex), numpy knows how to perform the calculation, but it also knows that the result can only be stored in the type with the larger domain. It complains (on my machine, anyway, the first time I do such an assignment) that the result doesn't fit. When you do a regular addition, a new array has to be made in any case, and numpy gives it the correct type.
>>> a=numpy.array([1])
>>> a.dtype
dtype('int32')
>>> b=numpy.array([1+1j])
>>> b.dtype
dtype('complex128')
>>> a+b
array([ 2.+1.j])
>>> (a+b).dtype
dtype('complex128')
>>> a+=b
>>> a
array([2])
>>> a.dtype
dtype('int32')
>>>
The difference between a = a + b and a += b is that the latter addition will, whenever possible, be done "in place", that is, by changing the object a. You can easily see this with lists.
a = b = [1, 2]
a += [3]
print b # [1, 2, 3]
a = b = [1, 2]
a = a + [3]
print b # [1, 2]
Rafe Kettler's answer is correct, but it seems you have managed to get a=[0,0,0] after adding it to b (according to your post).
Well if you're using numpy or scipy (I say this because I see array and wonder what array is being created here), then this is "normal", and should even raise a warning:
ComplexWarning: Casting complex values to real discards the imaginary part

Hidden features of Python [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
What are the lesser-known but useful features of the Python programming language?
Try to limit answers to Python core.
One feature per answer.
Give an example and short description of the feature, not just a link to documentation.
Label the feature using a title as the first line.
Quick links to answers:
Argument Unpacking
Braces
Chaining Comparison Operators
Decorators
Default Argument Gotchas / Dangers of Mutable Default arguments
Descriptors
Dictionary default .get value
Docstring Tests
Ellipsis Slicing Syntax
Enumeration
For/else
Function as iter() argument
Generator expressions
import this
In Place Value Swapping
List stepping
__missing__ items
Multi-line Regex
Named string formatting
Nested list/generator comprehensions
New types at runtime
.pth files
ROT13 Encoding
Regex Debugging
Sending to Generators
Tab Completion in Interactive Interpreter
Ternary Expression
try/except/else
Unpacking+print() function
with statement
Chaining comparison operators:
>>> x = 5
>>> 1 < x < 10
True
>>> 10 < x < 20
False
>>> x < 10 < x*10 < 100
True
>>> 10 > x <= 9
True
>>> 5 == x > 4
True
In case you're thinking it's doing 1 < x, which comes out as True, and then comparing True < 10, which is also True, then no, that's really not what happens (see the last example.) It's really translating into 1 < x and x < 10, and x < 10 and 10 < x * 10 and x*10 < 100, but with less typing and each term is only evaluated once.
Get the python regex parse tree to debug your regex.
Regular expressions are a great feature of python, but debugging them can be a pain, and it's all too easy to get a regex wrong.
Fortunately, python can print the regex parse tree, by passing the undocumented, experimental, hidden flag re.DEBUG (actually, 128) to re.compile.
>>> re.compile("^\[font(?:=(?P<size>[-+][0-9]{1,2}))?\](.*?)[/font]",
re.DEBUG)
at at_beginning
literal 91
literal 102
literal 111
literal 110
literal 116
max_repeat 0 1
  subpattern None
    literal 61
    subpattern 1
      in
        literal 45
        literal 43
      max_repeat 1 2
        in
          range (48, 57)
literal 93
subpattern 2
  min_repeat 0 65535
    any None
in
  literal 47
  literal 102
  literal 111
  literal 110
  literal 116
Once you understand the syntax, you can spot your errors. There we can see that I forgot to escape the [] in [/font].
Of course you can combine it with whatever flags you want, like commented regexes:
>>> re.compile("""
^ # start of a line
\[font # the font tag
(?:=(?P<size> # optional [font=+size]
[-+][0-9]{1,2} # size specification
))?
\] # end of tag
(.*?) # text between the tags
\[/font\] # end of the tag
""", re.DEBUG|re.VERBOSE|re.DOTALL)
enumerate
Wrap an iterable with enumerate and it will yield the item along with its index.
For example:
>>> a = ['a', 'b', 'c', 'd', 'e']
>>> for index, item in enumerate(a): print index, item
...
0 a
1 b
2 c
3 d
4 e
>>>
References:
Python tutorial—looping techniques
Python docs—built-in functions—enumerate
PEP 279
Creating generator objects
If you write
x=(n for n in foo if bar(n))
you can get out the generator and assign it to x. Now it means you can do
for n in x:
The advantage of this is that you don't need intermediate storage, which you would need if you did
x = [n for n in foo if bar(n)]
In some cases this can lead to significant speed up.
You can append more for and if clauses to the end of the generator expression, basically replicating nested for loops:
>>> n = ((a,b) for a in range(0,2) for b in range(4,6))
>>> for i in n:
... print i
(0, 4)
(0, 5)
(1, 4)
(1, 5)
iter() can take a callable argument
For instance:
def seek_next_line(f):
    for c in iter(lambda: f.read(1), '\n'):
        pass
The iter(callable, until_value) function repeatedly calls callable and yields its result until until_value is returned.
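Another common use of the same idiom is reading a file in fixed-size chunks, where the empty string returned at end-of-file serves as the sentinel (a sketch):

def read_in_chunks(f, size=1024):
    # iter() calls f.read(size) repeatedly and stops once it returns ''
    for chunk in iter(lambda: f.read(size), ''):
        yield chunk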
Be careful with mutable default arguments
>>> def foo(x=[]):
... x.append(1)
... print x
...
>>> foo()
[1]
>>> foo()
[1, 1]
>>> foo()
[1, 1, 1]
Instead, you should use a sentinel value denoting "not given" and replace with the mutable you'd like as default:
>>> def foo(x=None):
... if x is None:
... x = []
... x.append(1)
... print x
>>> foo()
[1]
>>> foo()
[1]
Sending values into generator functions. For example having this function:
def mygen():
    """Yield 5 until something else is passed back via send()"""
    a = 5
    while True:
        f = (yield a)  # yield a and possibly get f in return
        if f is not None:
            a = f  # store the new value
You can:
>>> g = mygen()
>>> g.next()
5
>>> g.next()
5
>>> g.send(7) #we send this back to the generator
7
>>> g.next() #now it will yield 7 until we send something else
7
If you don't like using whitespace to denote scopes, you can use the C-style {} by issuing:
from __future__ import braces
The step argument in slice operators. For example:
a = [1,2,3,4,5]
>>> a[::2] # iterate over the whole list in 2-increments
[1,3,5]
The special case x[::-1] is a useful idiom for 'x reversed'.
>>> a[::-1]
[5,4,3,2,1]
Decorators
Decorators allow you to wrap a function or method in another function that can add functionality, modify arguments or results, etc. You write decorators one line above the function definition, beginning with an "at" sign (@).
Example shows a print_args decorator that prints the decorated function's arguments before calling it:
>>> def print_args(function):
>>> def wrapper(*args, **kwargs):
>>> print 'Arguments:', args, kwargs
>>> return function(*args, **kwargs)
>>> return wrapper
>>> @print_args
>>> def write(text):
>>> print text
>>> write('foo')
Arguments: ('foo',) {}
foo
The for...else syntax (see http://docs.python.org/ref/for.html )
for i in foo:
if i == 0:
break
else:
print("i was never 0")
The "else" block will be normally executed at the end of the for loop, unless the break is called.
The above code could be emulated as follows:
found = False
for i in foo:
if i == 0:
found = True
break
if not found:
print("i was never 0")
From 2.5 onwards dicts have a special method __missing__ that is invoked for missing items:
>>> class MyDict(dict):
... def __missing__(self, key):
... self[key] = rv = []
... return rv
...
>>> m = MyDict()
>>> m["foo"].append(1)
>>> m["foo"].append(2)
>>> dict(m)
{'foo': [1, 2]}
There is also a dict subclass in collections called defaultdict that does pretty much the same but calls a function without arguments for not existing items:
>>> from collections import defaultdict
>>> m = defaultdict(list)
>>> m["foo"].append(1)
>>> m["foo"].append(2)
>>> dict(m)
{'foo': [1, 2]}
I recommend converting such dicts to regular dicts before passing them to functions that don't expect such subclasses. A lot of code uses d[a_key] and catches a KeyError to check whether an item exists; with a defaultdict, that lookup would silently add a new item to the dict.
In-place value swapping
>>> a = 10
>>> b = 5
>>> a, b
(10, 5)
>>> a, b = b, a
>>> a, b
(5, 10)
The right-hand side of the assignment is an expression that creates a new tuple. The left-hand side of the assignment immediately unpacks that (unreferenced) tuple to the names a and b.
After the assignment, the new tuple is unreferenced and marked for garbage collection, and the values bound to a and b have been swapped.
As noted in the Python tutorial section on data structures,
Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.
Readable regular expressions
In Python you can split a regular expression over multiple lines, name your matches and insert comments.
Example verbose syntax (from Dive into Python):
>>> pattern = """
... ^ # beginning of string
... M{0,4} # thousands - 0 to 4 M's
... (CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
... # or 500-800 (D, followed by 0 to 3 C's)
... (XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
... # or 50-80 (L, followed by 0 to 3 X's)
... (IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
... # or 5-8 (V, followed by 0 to 3 I's)
... $ # end of string
... """
>>> re.search(pattern, 'M', re.VERBOSE)
Example naming matches (from Regular Expression HOWTO)
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
You can also verbosely write a regex without using re.VERBOSE thanks to string literal concatenation.
>>> pattern = (
... "^" # beginning of string
... "M{0,4}" # thousands - 0 to 4 M's
... "(CM|CD|D?C{0,3})" # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
... # or 500-800 (D, followed by 0 to 3 C's)
... "(XC|XL|L?X{0,3})" # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
... # or 50-80 (L, followed by 0 to 3 X's)
... "(IX|IV|V?I{0,3})" # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
... # or 5-8 (V, followed by 0 to 3 I's)
... "$" # end of string
... )
>>> print pattern
"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
Function argument unpacking
You can unpack a list or a dictionary as function arguments using * and **.
For example:
def draw_point(x, y):
    # do some magic
    pass
point_foo = (3, 4)
point_bar = {'y': 3, 'x': 2}
draw_point(*point_foo)
draw_point(**point_bar)
Very useful shortcut since lists, tuples and dicts are widely used as containers.
ROT13 is a valid encoding for source code, when you use the right coding declaration at the top of the code file:
#!/usr/bin/env python
# -*- coding: rot13 -*-
cevag "Uryyb fgnpxbiresybj!".rapbqr("rot13")
Creating new types in a fully dynamic manner
>>> NewType = type("NewType", (object,), {"x": "hello"})
>>> n = NewType()
>>> n.x
"hello"
which is exactly the same as
>>> class NewType(object):
...     x = "hello"
>>> n = NewType()
>>> n.x
"hello"
Probably not the most useful thing, but nice to know.
Edit: Fixed name of new type, should be NewType to be the exact same thing as with class statement.
Edit: Adjusted the title to more accurately describe the feature.
Context managers and the "with" Statement
Introduced in PEP 343, a context manager is an object that acts as a run-time context for a suite of statements.
Since the feature makes use of new keywords, it is introduced gradually: it is available in Python 2.5 via the __future__ directive. Python 2.6 and above (including Python 3) has it available by default.
I have used the "with" statement a lot because I think it's a very useful construct; here is a quick demo:
from __future__ import with_statement
with open('foo.txt', 'w') as f:
f.write('hello!')
What's happening here behind the scenes, is that the "with" statement calls the special __enter__ and __exit__ methods on the file object. Exception details are also passed to __exit__ if any exception was raised from the with statement body, allowing for exception handling to happen there.
What this does for you in this particular case is that it guarantees that the file is closed when execution falls out of scope of the with suite, regardless if that occurs normally or whether an exception was thrown. It is basically a way of abstracting away common exception-handling code.
Other common use cases for this include locking with threads and database transactions.
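As a sketch of those other use cases, a lock is already a context manager, and writing your own takes only the two special methods (Timer is an illustrative name):

import threading
import time

lock = threading.Lock()
with lock:  # __enter__ acquires the lock, __exit__ releases it even on error
    pass    # critical section

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self
    def __exit__(self, exc_type, exc_value, tb):
        print('took %.3f s' % (time.time() - self.start))
        return False  # returning False lets exceptions propagate

with Timer():
    sum(range(1000000))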
Dictionaries have a get() method
Dictionaries have a 'get()' method. If you do d['key'] and key isn't there, you get an exception. If you do d.get('key'), you get back None if 'key' isn't there. You can add a second argument to get that item back instead of None, eg: d.get('key', 0).
It's great for things like adding up numbers:
sum[value] = sum.get(value, 0) + 1
Descriptors
They're the magic behind a whole bunch of core Python features.
When you use dotted access to look up a member (eg, x.y), Python first looks for the member in the instance dictionary. If it's not found, it looks for it in the class dictionary. If it finds it in the class dictionary, and the object implements the descriptor protocol, instead of just returning it, Python executes it. A descriptor is any class that implements the __get__, __set__, or __delete__ methods.
Here's how you'd implement your own (read-only) version of property using descriptors:
class Property(object):
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, type):
        if obj is None:
            return self
        return self.fget(obj)
and you'd use it just like the built-in property():
class MyClass(object):
    @Property
    def foo(self):
        return "Foo!"
Descriptors are used in Python to implement properties, bound methods, static methods, class methods and slots, amongst other things. Understanding them makes it easy to see why a lot of things that previously looked like Python 'quirks' are the way they are.
Raymond Hettinger has an excellent tutorial that does a much better job of describing them than I do.
Conditional Assignment
x = 3 if (y == 1) else 2
It does exactly what it sounds like: "assign 3 to x if y is 1, otherwise assign 2 to x". Note that the parens are not necessary, but I like them for readability. You can also chain it if you have something more complicated:
x = 3 if (y == 1) else 2 if (y == -1) else 1
Though at a certain point, it goes a little too far.
Note that you can use if ... else in any expression. For example:
(func1 if y == 1 else func2)(arg1, arg2)
Here func1 will be called if y is 1 and func2, otherwise. In both cases the corresponding function will be called with arguments arg1 and arg2.
Analogously, the following is also valid:
x = (class1 if y == 1 else class2)(arg1, arg2)
where class1 and class2 are two classes.
Doctest: documentation and unit-testing at the same time.
Example extracted from the Python documentation:
def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    If the result is small enough to fit in an int, return an int.
    Else return a long.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    """
    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result

def _test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _test()
Named formatting
% -formatting takes a dictionary (also applies %i/%s etc. validation).
>>> print "The %(foo)s is %(bar)i." % {'foo': 'answer', 'bar':42}
The answer is 42.
>>> foo, bar = 'question', 123
>>> print "The %(foo)s is %(bar)i." % locals()
The question is 123.
And since locals() is also a dictionary, you can simply pass that as a dict and have %-substitutions from your local variables. I think this is frowned upon, but it simplifies things.
New Style Formatting
>>> print("The {foo} is {bar}".format(foo='answer', bar=42))
To add more Python modules (especially 3rd party ones), most people seem to use the PYTHONPATH environment variable or add symlinks or directories to their site-packages directory. Another way is to use *.pth files. Here's the official Python documentation's explanation:
"The most convenient way [to modify Python's search path] is to add a path configuration file to a directory that's already on Python's path, usually to the .../site-packages/ directory. Path configuration files have an extension of .pth, and each line must contain a single path that will be appended to sys.path. (Because the new paths are appended to sys.path, modules in the added directories will not override standard modules. This means you can't use this mechanism for installing fixed versions of standard modules.)"
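For example, a file like mypaths.pth (hypothetical name) dropped into site-packages, with one directory per line, makes both directories importable:

# contents of .../site-packages/mypaths.pth (lines starting with # are skipped)
/home/me/projects/mylib
/opt/shared/python-tools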
Exception else clause:
try:
    put_4000000000_volts_through_it(parrot)
except Voom:
    print "'E's pining!"
else:
    print "This parrot is no more!"
finally:
    end_sketch()
The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn’t raised by the code being protected by the try ... except statement.
See http://docs.python.org/tut/node10.html
Re-raising exceptions:
# Python 2 syntax
try:
    some_operation()
except SomeError, e:
    if is_fatal(e):
        raise
    handle_nonfatal(e)

# Python 3 syntax
try:
    some_operation()
except SomeError as e:
    if is_fatal(e):
        raise
    handle_nonfatal(e)
The 'raise' statement with no arguments inside an error handler tells Python to re-raise the exception with the original traceback intact, allowing you to say "oh, sorry, sorry, I didn't mean to catch that, sorry, sorry."
If you wish to print, store or fiddle with the original traceback, you can get it with sys.exc_info(), and printing it like Python would is done with the 'traceback' module.
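For instance (a minimal sketch):

import sys
import traceback

try:
    1 / 0
except ZeroDivisionError:
    exc_type, exc_value, tb = sys.exc_info()
    traceback.print_exception(exc_type, exc_value, tb)  # prints it like Python would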
Main messages :)
import this
# btw look at this module's source :)
De-cyphered:
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Interactive Interpreter Tab Completion
try:
    import readline
except ImportError:
    print "Unable to load readline module."
else:
    import rlcompleter
    readline.parse_and_bind("tab: complete")
>>> class myclass:
... def function(self):
... print "my function"
...
>>> class_instance = myclass()
>>> class_instance.<TAB>
class_instance.__class__ class_instance.__module__
class_instance.__doc__ class_instance.function
>>> class_instance.f<TAB>unction()
You will also have to set a PYTHONSTARTUP environment variable pointing to a file containing the snippet above, so that it runs in every interactive session.
Nested list comprehensions and generator expressions:
[(i,j) for i in range(3) for j in range(i) ]
((i,j) for i in range(4) for j in range(i) )
These can replace huge chunks of nested-loop code.
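For example, the first comprehension above is equivalent to:

result = []
for i in range(3):
    for j in range(i):
        result.append((i, j))
# result == [(1, 0), (2, 0), (2, 1)]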
Operator overloading for the set builtin:
>>> a = set([1,2,3,4])
>>> b = set([3,4,5,6])
>>> a | b # Union
{1, 2, 3, 4, 5, 6}
>>> a & b # Intersection
{3, 4}
>>> a < b # Subset
False
>>> a - b # Difference
{1, 2}
>>> a ^ b # Symmetric Difference
{1, 2, 5, 6}
More detail from the standard library reference: Set Types
