Emulating pass-by-value behaviour in python - python

I would like to emulate the pass-by-value behaviour in python. In other words, I would like to make absolutely sure that the function I write do not modify user supplied data.
One possible way is to use deep copy:
from copy import deepcopy
def f(data):
data = deepcopy(data)
#do stuff
is there more efficient or more pythonic way to achieve this goal, making as few assumptions as possible about the object being passed (such as .clone() method)
Edit
I'm aware that technically everything in python is passed by value. I was interested in emulating the behaviour, i.e. making sure I don't mess with the data that was passed to the function. I guess the most general way is to clone the object in question either with its own clone mechanism or with deepcopy.

There is no pythonic way of doing this.
Python provides very few facilities for enforcing things such as private or read-only data. The pythonic philosophy is that "we're all consenting adults": in this case this means that "the function shouldn't change the data" is part of the spec but not enforced in the code.
If you want to make a copy of the data, the closest you can get is your solution. But copy.deepcopy, besides being inefficient, also has caveats such as:
Because deep copy copies everything it may copy too much, e.g., administrative data structures that should be shared even between copies.
[...]
This module does not copy types like module, method, stack trace, stack frame, file, socket, window, array, or any similar types.
So i'd only recommend it if you know that you're dealing with built-in Python types or your own objects (where you can customize copying behavior by defining the __copy__ / __deepcopy__ special methods, there's no need to define your own clone() method).

You can make a decorator and put the cloning behaviour in that.
>>> def passbyval(func):
def new(*args):
cargs = [deepcopy(arg) for arg in args]
return func(*cargs)
return new
>>> #passbyval
def myfunc(a):
print a
>>> myfunc(20)
20
This is not the most robust way, and doesn't handle key-value arguments or class methods (lack of self argument), but you get the picture.
Note that the following statements are equal:
#somedecorator
def func1(): pass
# ... same as ...
def func2(): pass
func2 = somedecorator(func2)
You could even have the decorator take some kind of function that does the cloning and thus allowing the user of the decorator to decide the cloning strategy. In that case the decorator is probably best implemented as a class with __call__ overridden.

There are only a couple of builtin typs that work as references, like list, for example.
So, for me the pythonic way for doing a pass-by-value, for list, in this example, would be:
list1 = [0,1,2,3,4]
list2 = list1[:]
list1[:] creates a new instance of the list1, and you can assign it to a new variable.
Maybe you could write a function that could receive one argument, then check its type, and according that resulta, perform a builtin operation that could return a new instance of the argument passed.
As I said earlier, there are only a few builtin types, that their behavior is like references, lists in this example.
Any way... hope it helps.

I can't figure out any other pythonic option. But personally I'd prefer the more OO way.
class TheData(object):
def clone(self): """return the cloned"""
def f(data):
#do stuff
def caller():
d = TheData()
f(d.clone())

usually when passing data to an external API, you can assure the integrity of your data by passing it as an immutable object, for example wrap your data into a tuple. This cannot be modified, if that is what you tried to prevent by your code.

Further to user695800's answer, pass by value for lists possible with the [:] operator
def listCopy(l):
l[1] = 5
for i in l:
print i
called with
In [12]: list1 = [1,2,3,4]
In [13]: listCopy(list1[:])
1
5
3
4
list1
Out[14]: [1, 2, 3, 4]

Though i'm sure there's no really pythonic way to do this, i expect the pickle module will give you copies of everything you have any business treating as a value.
import pickle
def f(data):
data = pickle.loads(pickle.dumps((data)))
#do stuff

Many people use the standard library copy. I prefer to defining __copy__ or __deepcopy__ in my classes. The methods in copy may have some problems.
The shallow copy will keep references to the objects in the original instead of creating new one.
The deepcopy will run recursively, sometimes may cause dead-loop. If without enough attention, memory may explode.
To avoid these out-of-control behaviors, define your own shallow/deep copy methods through overwriting __copy__ and __deepcopy__. And Alex's answer give a good example.

Related

Overriding the [] for constructing Python lists [duplicate]

Couldn't find a way to phrase the title better, feel free to correct.
I'm pretty new to Python, currently experimenting with the language.. I've noticed that all built-ins types cannot be extended with other members.. I'd like for example to add an each method to the list type, but that would be impossible. I realize that it's designed that way for efficiency reasons, and that most of the built-ins types are implemented in C.
Well, one why I found to override this behavior is be defining a new class, which extends list but otherwise does nothing. Then I can assign the variable list to that new class, and each time I would like to instantiate a new list, I'd use the list constructor, like it would have been used to create the original list type.
class MyList(list):
def each(self, func):
for item in self:
func(item)
list = MyList
my_list = list((1,2,3,4))
my_list.each(lambda x: print(x))
Output:
1
2
3
4
The idea can be generalize of course by defining a method that gets a built it type and returns a class that extends that type. Further more, the original list variable can be saved in another variable to keep an access to it.
Only problem I'm facing right now, is that when you instantiate a list by its literal form, (i.e. [1,2,3,4]), it will still use the original list constructor (or does it?). Is there a way to override this behavior? If the answer is no, do you know of some other way of enabling the user to extend the built-ins types? (just like javascript allows extending built-ins prototypes).
I find this limitation of built-ins (being unable to add members to them) one of Python's drawbacks, making it inconsistent with other user-defined types... Overall I really love the language, and I really don't understand why this limitation is REALLY necessary.
This is a conscientious choice from Python.
Firstly, with regards to patching inbuilt type, this is primarily a design decision and only secondarily an optimization. I have learnt from much lurking on the Python Mailing List that monkey patching on builtin types, although enjoyable for small scripts, serves no good purpose in anything larger.
Libraries, for one, make certain assumptions about types. If it were encouraged to extend default types, many libraries would end up fighting each other. It would also discourage making new types – a deque is a deque, an ordered set is an ordered set, a dictionary is a dictionary and that should be that.
Literal syntax is a particularly important point. If you cannot guarantee that [1, 2, 3] is a list, what can you guarantee? If people could change those behaviours it would have such global impacts as to destroy the stability of a lot of code. There is a reason goto and global variables are discouraged.
There is one particular hack that I am fond of, though. When you see r"hello", this seems to be an extended literal form.
So why not r[1, 2, 3]?
class ListPrefixer:
def __init__(self, typ):
self.typ = typ
def __getitem__(self, args):
return self.typ(args)
class MyList(list):
def each(self, func):
return MyList(func(x) for x in self)
e = ListPrefixer(MyList)
e[1, 2, 3, 4].each(lambda x: x**2)
#>>> [1, 4, 9, 16]
Finally, if you really want to do deep AST hacks, check out MacroPy.

Is it idiomatic to take advantage of python's pass-by-sharing of mutable data structures like lists?

I know that in Python, because it's pass-by-sharing, if I pass a mutable object (like a list) to a function, and then use that function to mutate it, I don't need to explicitly pass it back, because the caller can see the changes:
def add_to_list(list_of_nums):
list_of_nums.append(26)
my_list = [12]
add_to_list(my_list)
print my_list # >>>[12, 26]
So this works. But is it a good idea/good python practice? My gut says it's not (the same way global variables are almost always a bad idea), but maybe that's just because I first learned C++ in all its pass-by-value glory.
And yes, I know that I can code my way around this (say by creating a class), but the question is, should I, or is this generally seen as acceptable practice?
I think whether or not this is "acceptable" will be determined by the context and how well the function is named. keep Pep 20 in mind: "Explicit is better than implicit."
Python fully supports functional programming, and in that case, modifying objects that are passed in is often expected. If the function is named appropriately and documented well, I think it's fine. Your example illustrates this pretty well. The function is called add_to_list, which pretty explicitly says what the function does.
If your program/script takes more of an object-oriented approach, modifying passed-in objects should be replaced by creating the appropriate classes instead, like the native list class in your example - it has an append() method that modifies the list instead of passing the list into a separate function.
The key is to be consistent with your paradigm and well documented. If you cover both of those bases, I think it's acceptable.

Should I subclass list or create class with list as attribute?

I need a container that can collect a number of objects and provides some reporting functionality on the container's elements. Essentially, I'd like to be able to do:
magiclistobject = MagicList()
magiclistobject.report() ### generates all my needed info about the list content
So I thought of subclassing the normal list and adding a report() method. That way, I get to use all the built-in list functionality.
class SubClassedList(list):
def __init__(self):
list.__init__(self)
def report(self): # forgive the silly example
if 999 in self:
print "999 Alert!"
Instead, I could also create my own class that has a magiclist attribute but I would then have to create new methods for appending, extending, etc., if I want to get to the list using:
magiclistobject.append() # instead of magiclistobject.list.append()
I would need something like this (which seems redundant):
class MagicList():
def __init__(self):
self.list = []
def append(self,element):
self.list.append(element)
def extend(self,element):
self.list.extend(element)
# more list functionality as needed...
def report(self):
if 999 in self.list:
print "999 Alert!"
I thought that subclassing the list would be a no-brainer. But this post here makes it sounds like a no-no. Why?
One reason why extending list might be bad is since it ties together your 'MagicReport' object too closely to the list. For example, a Python list supports the following methods:
append
count
extend
index
insert
pop
remove
reverse
sort
It also contains a whole host of other operations (adding, comparisons using < and >, slicing, etc).
Are all of those operations things that your 'MagicReport' object actually wants to support? For example, the following is legal Python:
b = [1, 2]
b *= 3
print b # [1, 2, 1, 2, 1, 2]
This is a pretty contrived example, but if you inherit from 'list', your 'MagicReport' object will do exactly the same thing if somebody inadvertently does something like this.
As another example, what if you try slicing your MagicReport object?
m = MagicReport()
# Add stuff to m
slice = m[2:3]
print type(slice)
You'd probably expect the slice to be another MagicReport object, but it's actually a list. You'd need to override __getslice__ in order to avoid surprising behavior, which is a bit of a pain.
It also makes it harder for you to change the implementation of your MagicReport object. If you end up needing to do more sophisticated analysis, it often helps to be able to change the underlying data structure into something more suited for the problem.
If you subclass list, you could get around this problem by just providing new append, extend, etc methods so that you don't change the interface, but you won't have any clear way of determining which of the list methods are actually being used unless you read through the entire codebase. However, if you use composition and just have a list as a field and create methods for the operations you support, you know exactly what needs to be changed.
I actually ran into a scenario very similar to your at work recently. I had an object which contained a collection of 'things' which I first internally represented as a list. As the requirements of the project changed, I ended up changing the object to internally use a dict, a custom collections object, then finally an OrderedDict in rapid succession. At least in my experience, composition makes it much easier to change how something is implemented as opposed to inheritance.
That being said, I think extending list might be ok in scenarios where your 'MagicReport' object is legitimately a list in all but name. If you do want to use MagicReport as a list in every single way, and don't plan on changing its implementation, then it just might be more convenient to subclass list and just be done with it.
Though in that case, it might be better to just use a list and write a 'report' function -- I can't imagine you needing to report the contents of the list more than once, and creating a custom object with a custom method just for that purpose might be overkill (though this obviously depends on what exactly you're trying to do)
As a general rule, whenever you ask yourself "should I inherit or have a member of that type", choose not to inherit. This rule of thumb is known as "favour composition over inheritance".
The reason why this is so is: composition is appropriate where you want to use features of another class; inheritance is appropriate if other code needs to use the features of the other class with the class you are creating.

How to make a variable (truly) local to a procedure or function

ie we have the global declaration, but no local.
"Normally" arguments are local, I think, or they certainly behave that way.
However if an argument is, say, a list and a method is applied which modifies the list, some surprising (to me) results can ensue.
I have 2 questions: what is the proper way to ensure that a variable is truly local?
I wound up using the following, which works, but it can hardly be the proper way of doing it:
def AexclB(a,b):
z = a+[] # yuk
for k in range(0, len(b)):
try: z.remove(b[k])
except: continue
return z
Absent the +[], "a" in the calling scope gets modified, which is not desired.
(The issue here is using a list method,
The supplementary question is, why is there no "local" declaration?
Finally, in trying to pin this down, I made various mickey mouse functions which all behaved as expected except the last one:
def fun4(a):
z = a
z = z.append(["!!"])
return z
a = ["hello"]
print "a=",a
print "fun4(a)=",fun4(a)
print "a=",a
which produced the following on the console:
a= ['hello']
fun4(a)= None
a= ['hello', ['!!']]
...
>>>
The 'None' result was not expected (by me).
Python 2.7 btw in case that matters.
PS: I've tried searching here and elsewhere but not succeeded in finding anything corresponding exactly - there's lots about making variables global, sadly.
It's not that z isn't a local variable in your function. Rather when you have the line z = a, you are making z refer to the same list in memory that a already points to. If you want z to be a copy of a, then you should write z = a[:] or z = list(a).
See this link for some illustrations and a bit more explanation http://henry.precheur.org/python/copy_list
Python will not copy objects unless you explicitly ask it to. Integers and strings are not modifiable, so every operation on them returns a new instance of the type. Lists, dictionaries, and basically every other object in Python are mutable, so operations like list.append happen in-place (and therefore return None).
If you want the variable to be a copy, you must explicitly copy it. In the case of lists, you slice them:
z = a[:]
There is a great answer than will cover most of your question in here which explains mutable and immutable types and how they are kept in memory and how they are referenced. First section of the answer is for you. (Before How do we get around this? header)
In the following line
z = z.append(["!!"])
Lists are mutable objects, so when you call append, it will update referenced object, it will not create a new one and return it. If a method or function do not retun anything, it means it returns None.
Above link also gives an immutable examle so you can see the real difference.
You can not make a mutable object act like it is immutable. But you can create a new one instead of passing the reference when you create a new object from an existing mutable one.
a = [1,2,3]
b = a[:]
For more options you can check here
What you're missing is that all variable assignment in python is by reference (or by pointer, if you like). Passing arguments to a function literally assigns values from the caller to the arguments of the function, by reference. If you dig into the reference, and change something inside it, the caller will see that change.
If you want to ensure that callers will not have their values changed, you can either try to use immutable values more often (tuple, frozenset, str, int, bool, NoneType), or be certain to take copies of your data before mutating it in place.
In summary, scoping isn't involved in your problem here. Mutability is.
Is that clear now?
Still not sure whats the 'correct' way to force the copy, there are
various suggestions here.
It differs by data type, but generally <type>(obj) will do the trick. For example list([1, 2]) and dict({1:2}) both return (shallow!) copies of their argument.
If, however, you have a tree of mutable objects and also you don't know a-priori which level of the tree you might modify, you need the copy module. That said, I've only needed this a handful of times (in 8 years of full-time python), and most of those ended up causing bugs. If you need this, it's a code smell, in my opinion.
The complexity of maintaining copies of mutable objects is the reason why there is a growing trend of using immutable objects by default. In the clojure language, all data types are immutable by default and mutability is treated as a special cases to be minimized.
If you need to work on a list or other object in a truly local context you need to explicitly make a copy or a deep copy of it.
from copy import copy
def fn(x):
y = copy(x)

Python class function default variables are class objects? [duplicate]

This question already has answers here:
Why does using `arg=None` fix Python's mutable default argument issue?
(5 answers)
"Least Astonishment" and the Mutable Default Argument
(33 answers)
Closed 9 months ago.
I was writing some code this afternoon, and stumbled across a bug in my code. I noticed that the default values for one of my newly created objects was carrying over from another object! For example:
class One(object):
def __init__(self, my_list=[]):
self.my_list = my_list
one1 = One()
print(one1.my_list)
[] # empty list, what you'd expect.
one1.my_list.append('hi')
print(one1.my_list)
['hi'] # list with the new value in it, what you'd expect.
one2 = One()
print(one2.my_list)
['hi'] # Hey! It saved the variable from the other One!
So I know it can be solved by doing this:
class One(object):
def __init__(self, my_list=None):
self.my_list = my_list if my_list is not None else []
What I would like to know is... Why? Why are Python classes structured so that the default values are saved across instances of the class?
This is a known behaviour of the way Python default values work, which is often surprising to the unwary. The empty array object [] is created at the time of definition of the function, rather than at the time it is called.
To fix it, try:
def __init__(self, my_list=None):
if my_list is None:
my_list = []
self.my_list = my_list
Several others have pointed out that this is an instance of the "mutable default argument" issue in Python. The basic reason is that the default arguments have to exist "outside" the function in order to be passed into it.
But the real root of this as a problem has nothing to do with default arguments. Any time it would be bad if a mutable default value was modified, you really need to ask yourself: would it be bad if an explicitly provided value was modified? Unless someone is extremely familiar with the guts of your class, the following behaviour would also be very surprising (and therefore lead to bugs):
>>> class One(object):
... def __init__(self, my_list=[]):
... self.my_list = my_list
...
>>> alist = ['hello']
>>> one1 = One(alist)
>>> alist.append('world')
>>> one2 = One(alist)
>>>
>>> print(one1.my_list) # Huh? This isn't what I initialised one1 with!
['hello', 'world']
>>> print(one2.my_list) # At least this one's okay...
['hello', 'world']
>>> del alist[0]
>>> print one2.my_list # What the hell? I just modified a local variable and a class instance somewhere else got changed?
['world']
9 times out of 10, if you discover yourself reaching for the "pattern" of using None as the default value and using if value is None: value = default, you shouldn't be. You should be just not modifying your arguments! Arguments should not be treated as owned by the called code unless it is explicitly documented as taking ownership of them.
In this case (especially because you're initialising a class instance, so the mutable variable is going to live a long time and be used by other methods and potentially other code that retrieves it from the instance) I would do the following:
class One(object):
def __init__(self, my_list=[])
self.my_list = list(my_list)
Now you're initialising the data of your class from a list provided as input, rather than taking ownership of a pre-existing list. There's no danger that two separate instances end up sharing the same list, nor that the list is shared with a variable in the caller which the caller may want to continue using. It also has the nice effect that your callers can provide tuples, generators, strings, sets, dictionaries, home-brewed custom iterable classes, etc, and you know you can still count on self.my_list having an append method, because you made it yourself.
There's still a potential problem here, if the elements contained in the list are themselves mutable then the caller and this instance can still accidentally interfere with each other. I find it not to very often be a problem in practice in my code (so I don't automatically take a deep copy of everything), but you have to be aware of it.
Another issue is that if my_list can be very large, the copy can be expensive. There you have to make a trade-off. In that case, maybe it is better to just use the passed-in list after all, and use the if my_list is None: my_list = [] pattern to prevent all default instances sharing the one list. But if you do that you need to make it clear, either in documentation or the name of the class, that callers are relinquishing ownership of the lists they use to initialise the instance. Or, if you really want to be constructing a list solely for the purpose of wrapping up in an instance of One, maybe you should figure out how to encapsulate the creation of the list inside the initialisation of One, rather than constructing it first; after all, it's really part of the instance, not an initialising value. Sometimes this isn't flexible enough though.
And sometimes you really honestly do want to have aliasing going on, and have code communicating by mutating values they both have access to. I think very hard before I commit to such a design, however. And it will surprise others (and you when you come back to the code in X months), so again documentation is your friend!
In my opinion, educating new Python programmers about the "mutable default argument" gotcha is actually (slightly) harmful. We should be asking them "Why are you modifying your arguments?" (and then pointing out the way default arguments work in Python). The very fact of a function having a sensible default argument is often a good indicator that it isn't intended as something that receives ownership of a pre-existing value, so it probably shouldn't be modifying the argument whether or not it got the default value.
Basically, python function objects store a tuple of default arguments, which is fine for immutable things like integers, but lists and other mutable objects are often modified in-place, resulting in the behavior you observed.
This is standard behavior of default arguments anywhere in Python, not just in classes.
For more explanation, see Mutable defaults for function/method arguments.
Python functions are objects. Default arguments of a function are attributes of that function. So if the default value of an argument is mutable and it's modified inside your function, the changes are reflected in subsequent calls to that function.
Not an answer, but it's worth noting this is also true for class variables defined outside any class functions.
Example:
>>> class one:
... myList = []
...
>>>
>>> one1 = one()
>>> one1.myList
[]
>>> one2 = one()
>>> one2.myList.append("Hello Thar!")
>>>
>>> one1.myList
['Hello Thar!']
>>>
Note that not only does the value of myList persist, but every instance of myList points to the same list.
I ran into this bug/feature myself, and spent something like 3 hours trying to figure out what was going on. It's rather challenging to debug when you are getting valid data, but it's not from the local computations, but previous ones.
It's made worse since this is not just a default argument. You can't just put myList in the class definition, it has to be set equal to something, although whatever it is set equal to is only evaluated once.
The solution, at least for me, was to simply create all the class variable inside __init__.

Categories

Resources