python: unintentionally modifying parameters passed into a function

A few times I accidentally modified the input to a function. Since Python has no constant references, I'm wondering what coding techniques might help me avoid making this mistake too often?
Example:
class Table:
    def __init__(self, fields, raw_data):
        # fields is a dictionary with field names as keys and their types as values
        # sometimes we want to delete some of the elements
        for field_name, data_type in fields.items():
            if some_condition(field_name, raw_data):
                del fields[field_name]
        # ...
# in another module
# fields is already initialized here to some dictionary
table1 = Table(fields, raw_data1) # fields is corrupted by Table's __init__
table2 = Table(fields, raw_data2)
Of course the fix is to make a copy of the parameter before I change it:
def __init__(self, fields, raw_data):
    fields = copy.copy(fields)
    # but copy.copy is safer and more generally applicable than .copy
    # ...
But it's so easy to forget.
I'm half thinking to make a copy of each argument at the beginning of every function, unless the argument potentially refers to a large data set which may be expensive to copy or unless the argument is intended to be modified. That would nearly eliminate the problem, but it would result in a significant amount of useless code at the start of each function. In addition, it would essentially override Python's approach of passing parameters by reference, which presumably was done for a reason.

First general rule: don't modify containers: create new ones.
So don't modify your incoming dictionary, create a new dictionary with a subset of the keys.
self.fields = dict((key, value) for key, value in fields.items()
                   if accept_key(key, data))
Such methods are typically slightly more efficient than going through and deleting the bad elements anyway. More generally, it's often easier to avoid modifying objects and instead create new ones.
Second general rule: don't modify containers after passing them off.
You can't generally assume that containers to which you have passed data have made their own copies. As a result, don't try to modify the containers you've given them. Any modifications should be done before handing off the data. Once you've passed the container to somebody else, you are no longer its sole master.
Third general rule: don't modify containers you didn't create.
If you get passed some sort of container, you do not know who else might be using it. So don't modify it. Either use the unmodified version or invoke rule 1, creating a new container with the desired changes.
Fourth general rule: (stolen from Ethan Furman)
Some functions are supposed to modify the list. That is their job. If this is the case make that apparent in the function name (such as the list methods append and extend).
Putting it all together:
A piece of code should only modify a container when it is the only piece of code with access to that container.
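Applied to the Table class from the question, the rules boil down to building a new dictionary inside __init__ instead of deleting entries from the caller's one. A minimal, self-contained sketch (the some_condition helper below is a made-up stand-in for the questioner's real predicate):
def some_condition(field_name, raw_data):
    # stand-in for the questioner's helper: drop fields absent from the data
    return field_name not in raw_data

class Table:
    def __init__(self, fields, raw_data):
        # build a new dict; the caller's `fields` is left untouched
        self.fields = {name: data_type
                       for name, data_type in fields.items()
                       if not some_condition(name, raw_data)}

fields = {"name": str, "age": int}
t = Table(fields, raw_data={"name": "Simon"})
print(t.fields)   # {'name': <class 'str'>}
print(fields)     # unchanged: {'name': <class 'str'>, 'age': <class 'int'>}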

Making copies of parameters 'just in case' is a bad idea: you end up paying for it in lousy performance, or you have to keep track of the sizes of your arguments instead.
Better to get a good understanding of objects and names and how Python deals with them. A good start being this article.
The important point is that
def modi_list(alist):
    alist.append(4)

some_list = [1, 2, 3]
modi_list(some_list)
print(some_list)
has exactly the same effect as
some_list = [1, 2, 3]
same_list = some_list
same_list.append(4)
print(some_list)
because in the function call no copying of arguments takes place and no pointers are created; what takes place is Python saying alist = some_list and then executing the code in the function modi_list(). In other words, Python is binding (or assigning) another name to the same object.
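A quick way to see that binding for yourself is to compare object identities with id(); this just extends the example above:
def modi_list(alist):
    print(id(alist))   # same id as some_list below: the function sees the very same object
    alist.append(4)

some_list = [1, 2, 3]
print(id(some_list))
modi_list(some_list)
print(some_list)       # [1, 2, 3, 4]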
Finally, when you do have a function that is going to modify its arguments, and you don't want those changes visible outside the function, you can usually just do a shallow copy:
def dont_modi_list(alist):
    alist = alist[:]  # make a shallow copy
    alist.append(4)
Now some_list and alist are two different list objects that happen to contain the same objects -- so if you are just messing with the list object (inserting, deleting, rearranging) then you are fine, but if you are going to go even deeper and cause modifications to the objects in the list then you will need to do a deepcopy(). But it's up to you to keep track of such things and code appropriately.
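For instance, here is a small, made-up illustration of the nested case: with a shallow copy the inner lists would still be shared, so deepcopy() is needed before touching them:
import copy

def dont_modi_nested(alist):
    alist = copy.deepcopy(alist)   # inner lists are copied too
    alist[0].append(99)            # only the copy's inner list is touched
    return alist

outer = [[1, 2], [3]]
dont_modi_nested(outer)
print(outer)   # [[1, 2], [3]] -- unchanged; with alist[:] the 99 would have leaked out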

You can use a metaclass as follows:
# Python 2 only: the `new` module and the __metaclass__ attribute are gone in Python 3
import copy, new

class MakeACopyOfConstructorArguments(type):
    def __new__(cls, name, bases, dct):
        rv = type.__new__(cls, name, bases, dct)
        old_init = dct.get("__init__")
        if old_init is not None:
            cls.__old_init = old_init
            def new_init(self, *a, **kw):
                # deep-copy every positional and keyword argument
                # before handing them to the original __init__
                a = copy.deepcopy(a)
                kw = copy.deepcopy(kw)
                cls.__old_init(self, *a, **kw)
            rv.__init__ = new.instancemethod(new_init, rv, cls)
        return rv

class Test(object):
    __metaclass__ = MakeACopyOfConstructorArguments
    def __init__(self, li):
        li[0] = 3
        print li

li = range(3)
print li
t = Test(li)
print li

There is a best practice for this in Python, and it is called unit testing.
The main point here is that dynamic languages allow for much more rapid development, even with full unit tests; and unit tests are a much tougher safety net than static typing.
As writes Martin Fowler:
The general argument for static types is that it catches bugs that are
otherwise hard to find. But I discovered that in the presence of
SelfTestingCode, most bugs that static types would have caught were found
just as easily by the tests.
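As a rough sketch of what that safety net could look like here, the stand-in Table class below reproduces the questioner's mistake (deleting entries from the caller's dict), and a test that simply asserts the argument is unchanged exposes it immediately. The field names and the condition are made up for illustration:
import unittest

# Stand-in for the questioner's Table: deliberately buggy __init__
class Table:
    def __init__(self, fields, raw_data):
        for field_name in list(fields):
            if field_name not in raw_data:   # stand-in for some_condition()
                del fields[field_name]
        self.fields = fields

class TestTableLeavesArgumentsAlone(unittest.TestCase):
    def test_fields_not_mutated(self):
        fields = {"name": str, "age": int}
        snapshot = dict(fields)
        Table(fields, raw_data={"name": "Simon"})
        self.assertEqual(fields, snapshot)   # fails, exposing the mutation

if __name__ == "__main__":
    unittest.main()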

Related

Custom class which is a dict, but initialized without dict copy?

For legibility purposes, I would like to have a custom class that behaves exactly like a dict (but carries a meaningful type instead of the more general dict type):
class Derivatives(dict):
    "Dictionary that represents the derivatives."
Now, is there a way of building new objects of this class in a way that does not involve copies? The naive usage
derivs = Derivatives({var: 1}) # var is a Python object
in fact creates a copy of the dictionary passed as an argument, which I would like to avoid, for efficiency reasons.
I tried to bypass the copy but then the class of the dict cannot be changed, in CPython:
class Derivatives(dict):
    def __new__(cls, init_dict):
        init_dict.__class__ = cls  # Fails with "__class__ assignment: only for heap types"
        return init_dict
I would like to have both the ability to give an explicit class name to the dictionaries that the program manipulates and an efficient way of building such dictionaries (instead of being forced to copy a Python dict). Is this doable efficiently in Python?
PS: The use case is maybe 100,000 creations of single-key Derivatives, where the key is a variable (not a string, so no keyword initialization). This is actually not slow, so "efficiency reasons" here means more something like "elegance": there is ideally no need to waste time doing a copy when the copy is not needed. So, in this particular case the question is more about the elegance/clarity that Python can bring here than about running speed.
By inheriting from dict you are given three possibilities for constructor arguments (barring the {} literal):
class dict(**kwarg)
class dict(mapping, **kwarg)
class dict(iterable, **kwarg)
This means that, in order to instantiate your instance you must do one of the following:
Pass the variables as keywords D(x=1) which are then packed into an intermediate dictionary anyway.
Create a plain dictionary and pass it as a mapping.
Pass an iterable of (key,value) pairs.
So in all three of these cases you will need to create intermediate objects to satisfy the dict constructor.
With the third option, a single pair would look like D(((var, 1),)), which I highly recommend against for readability's sake.
So if you want your class to inherit from a dictionary, using Derivatives({var: 1}) is your most efficient and most readable option.
As a personal note, if you will have thousands of single-pair dictionaries, I'm not sure that building on dict is the best setup in the first place; you may want to reconsider the basis of your class.
TL;DR: There's no general-purpose way to do it unless you do it in C.
Long answer:
The dict class is implemented in C. Thus, there is no way to access its internal properties - and most importantly, its internal hash table - unless you use C.
In C, you could simply copy the pointer representing the hash table into your object without having to iterate over the dict (key, value) pairs and insert them into your object. (Of course, it's a bit more complicated than this. Note that I omit memory management details).
Longer answer:
I'm not sure why you are concerned about efficiency.
Python passes arguments as references. It rarely ever copies anything unless you explicitly tell it to.
I read in the comments that you can't use named parameters, as the keys are actual Python objects. That leads me to understand that you're worried about copying the dict keys (and maybe values). However, even the dictionary keys are not copied; they are passed by reference! Consider this code:
class Test:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __hash__(self):
        return self.x

t = Test(1, 2)
print(t.y)    # prints 2
d = {t: 1}
print(d[t])   # prints 1
keys = list(d.keys())
keys[0].y = 10
print(t.y)    # prints 10! No copying was made when inserting the object into the dictionary.
Thus, the only remaining area of concern is iterating through the dict and inserting the values in your Derivatives class. This is unavoidable, unless you can somehow set the internal hash table of your class to the dict's internal hash table. There is no way to do this in pure python, as the dict class is implemented in C (as mentioned above).
Note that others have suggested using generators. This seems like a good idea too - say, if you were reading the derivatives from a file or generating them with a simple formula. It would avoid creating the dict object in the first place. However, there will be no noticeable improvement in efficiency if the generators are just wrappers around lists (or any other data structure that can contain an arbitrary set of values).
Your best bet is to stick with your original method. Generators are great, but they can't efficiently represent an arbitrary set of values (which might be the case in your scenario). It's also not worth it to do it in C.
EDIT: It might be worth it to do it in C, after all!
I'm not too big on the details of the Python C API, but consider defining a class in C, for example, DerivativesBase (deriving from dict). All you do is define an __init__ function in C for DerivativesBase that takes a dict as a parameter and copies the hash table pointer from the dict into your DerivativesBase object. Then, in Python, your Derivatives class derives from DerivativesBase and implements the bulk of the functionality.

Should I subclass list or create class with list as attribute?

I need a container that can collect a number of objects and provides some reporting functionality on the container's elements. Essentially, I'd like to be able to do:
magiclistobject = MagicList()
magiclistobject.report() ### generates all my needed info about the list content
So I thought of subclassing the normal list and adding a report() method. That way, I get to use all the built-in list functionality.
class SubClassedList(list):
    def __init__(self):
        list.__init__(self)
    def report(self):  # forgive the silly example
        if 999 in self:
            print "999 Alert!"
Instead, I could also create my own class that has a magiclist attribute but I would then have to create new methods for appending, extending, etc., if I want to get to the list using:
magiclistobject.append() # instead of magiclistobject.list.append()
I would need something like this (which seems redundant):
class MagicList():
    def __init__(self):
        self.list = []
    def append(self, element):
        self.list.append(element)
    def extend(self, element):
        self.list.extend(element)
    # more list functionality as needed...
    def report(self):
        if 999 in self.list:
            print "999 Alert!"
I thought that subclassing the list would be a no-brainer. But this post here makes it sounds like a no-no. Why?
One reason why extending list might be bad is that it ties your 'MagicReport' object too closely to the list. For example, a Python list supports the following methods:
append
count
extend
index
insert
pop
remove
reverse
sort
It also contains a whole host of other operations (adding, comparisons using < and >, slicing, etc).
Are all of those operations things that your 'MagicReport' object actually wants to support? For example, the following is legal Python:
b = [1, 2]
b *= 3
print b # [1, 2, 1, 2, 1, 2]
This is a pretty contrived example, but if you inherit from 'list', your 'MagicReport' object will do exactly the same thing if somebody inadvertently does something like this.
As another example, what if you try slicing your MagicReport object?
m = MagicReport()
# Add stuff to m
slice = m[2:3]
print type(slice)
You'd probably expect the slice to be another MagicReport object, but it's actually a list. You'd need to override __getslice__ in order to avoid surprising behavior, which is a bit of a pain.
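(In Python 3, __getslice__ is gone and slicing goes through __getitem__ instead.) Here is a minimal sketch of what that override could look like, written for Python 3 and assuming the subclass is called MagicReport:
class MagicReport(list):
    def report(self):
        if 999 in self:
            print("999 Alert!")

    def __getitem__(self, index):
        result = list.__getitem__(self, index)
        # re-wrap slices so they keep the subclass type instead of degrading to list
        if isinstance(index, slice):
            return MagicReport(result)
        return result

m = MagicReport([1, 999, 3, 4])
print(type(m[1:3]))   # <class '__main__.MagicReport'>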
It also makes it harder for you to change the implementation of your MagicReport object. If you end up needing to do more sophisticated analysis, it often helps to be able to change the underlying data structure into something more suited for the problem.
If you subclass list, you could get around this problem by just providing new append, extend, etc methods so that you don't change the interface, but you won't have any clear way of determining which of the list methods are actually being used unless you read through the entire codebase. However, if you use composition and just have a list as a field and create methods for the operations you support, you know exactly what needs to be changed.
I actually ran into a scenario very similar to yours at work recently. I had an object which contained a collection of 'things' which I first internally represented as a list. As the requirements of the project changed, I ended up changing the object to internally use a dict, a custom collections object, and then finally an OrderedDict in rapid succession. At least in my experience, composition makes it much easier to change how something is implemented than inheritance does.
That being said, I think extending list might be ok in scenarios where your 'MagicReport' object is legitimately a list in all but name. If you do want to use MagicReport as a list in every single way, and don't plan on changing its implementation, then it just might be more convenient to subclass list and just be done with it.
Though in that case, it might be better to just use a list and write a 'report' function -- I can't imagine you needing to report the contents of the list more than once, and creating a custom object with a custom method just for that purpose might be overkill (though this obviously depends on what exactly you're trying to do).
As a general rule, whenever you ask yourself "should I inherit or have a member of that type", choose not to inherit. This rule of thumb is known as "favour composition over inheritance".
The reason why this is so is: composition is appropriate where you want to use features of another class; inheritance is appropriate if other code needs to use the features of the other class with the class you are creating.

Passing a collection argument without unpacking its contents

Question: What are the pros and cons of writing an __init__ that takes a collection directly as an argument, rather than unpacking its contents?
Context: I'm writing a class to process data from several fields in a database table. I iterate through some large (~100 million rows) query result, passing one row at a time to a class that performs the processing. Each row is retrieved from the database as a tuple (or optionally, as a dictionary).
Discussion: Assume I'm interested in exactly three fields, but what gets passed into my class depends on the query, and the query is written by the user. The most basic approach might be one of the following:
class Direct:
    def __init__(self, names):
        self.names = names

class Simple:
    def __init__(self, names):
        self.name1 = names[0]
        self.name2 = names[1]
        self.name3 = names[2]

class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names
Here are some examples of rows that might be passed to a new instance:
good = ('Simon', 'Marie', 'Kent') # Exactly what we want
bad1 = ('Simon', 'Marie', 'Kent', '10 Main St') # Extra field(s) behind
bad2 = ('15', 'Simon', 'Marie', 'Kent') # Extra field(s) in front
bad3 = ('Simon', 'Marie') # Forgot a field
When faced with the above, Direct always runs (at least to this point) but is very likely to be buggy (GIGO). It takes one argument and assigns it exactly as given, so this could be a tuple or list of any size, a Null value, a function reference, etc. This is the most quick-and-dirty way I can think of to initialize the object, but I feel like the class should complain immediately when I give it data it's clearly not designed to handle.
Simple handles bad1 correctly, is buggy when given bad2, and throws an error when given bad3. It's convenient to be able to effectively truncate the inputs from bad1 but not worth the bugs that would come from bad2. This one feels naive and inconsistent.
Unpack seems like the safest approach, because it throws an error in all three "bad" cases. The last thing we want to do is silently fill our database with bad information, right? It takes the tuple directly, but allows me to identify its contents as distinct attributes instead of forcing me to keep referring to indices, and complains if the tuple is the wrong size.
On the other hand, why pass a collection at all? Since I know I always want three fields, I can define __init__ to explicitly accept three arguments, and unpack the collection using the *-operator as I pass it to the new object:
class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3

names = ('Guy', 'Rose', 'Deb')
e = Explicit(*names)
The only differences I see are that the __init__ definition is a bit more verbose and we raise TypeError instead of ValueError when the tuple is the wrong size. Philosophically, it seems to make sense that if we are taking some group of data (a row of a query) and examining its parts (three fields), we should pass a group of data (the tuple) but store its parts (the three attributes). So Unpack would be better.
If I wanted to accept an indeterminate number of fields, rather than always three, I still have the choice to pass the tuple directly or use arbitrary argument lists (*args, **kwargs) and *-operator unpacking. So I'm left wondering, is this a completely neutral style decision?
This question is probably best answered by trying out the different approaches and seeing what makes the most sense to you and is the most easily understood by others reading your code.
Now that I have the benefit of more experience, I'd ask myself, how do I plan to access these values?
When I access any one of the values in this collection, am I likely to be using most or all of the values in that same subroutine or section of code? If so, the "Direct" approach is a good choice; it's the most compact and it lets me think about the collection as a collection until the point that I absolutely need to pay attention to what's inside.
On the other hand, if I'm using some values here, some values there, I don't want to have to constantly remember which index to access or add verbosity in the form of dictionary keys when I could just refer directly to the values using separately named attributes. I would probably avoid the "Direct" approach in this case so that I only have to think about the fact that there's a collection when the class is first initialized.
Each of the remaining approaches involves splitting the collection up into different attributes, and I think the clear winner here is the "Explicit" approach. The "Simple" and "Unpack" approaches share a hidden dependency on the order of the collection, without offering any real advantage.
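To make the error-type difference the questioner noted concrete, here is a small check using the Unpack and Explicit classes defined in the question (the exact exception messages vary by Python version, so treat them as approximate):
bad3 = ('Simon', 'Marie')   # forgot a field

try:
    Unpack(bad3)
except ValueError as exc:
    print(exc)              # e.g. "not enough values to unpack (expected 3, got 2)"

try:
    Explicit(*bad3)
except TypeError as exc:
    print(exc)              # e.g. "__init__() missing 1 required positional argument: 'name3'"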

Python class function default variables are class objects? [duplicate]

I was writing some code this afternoon, and stumbled across a bug in my code. I noticed that the default values for one of my newly created objects was carrying over from another object! For example:
class One(object):
    def __init__(self, my_list=[]):
        self.my_list = my_list

one1 = One()
print(one1.my_list)
[] # empty list, what you'd expect.
one1.my_list.append('hi')
print(one1.my_list)
['hi'] # list with the new value in it, what you'd expect.
one2 = One()
print(one2.my_list)
['hi'] # Hey! It saved the variable from the other One!
So I know it can be solved by doing this:
class One(object):
    def __init__(self, my_list=None):
        self.my_list = my_list if my_list is not None else []
What I would like to know is... Why? Why are Python classes structured so that the default values are saved across instances of the class?
This is a known behaviour of the way Python default values work, which is often surprising to the unwary. The empty list object [] is created at the time of definition of the function, rather than at the time it is called.
To fix it, try:
def __init__(self, my_list=None):
    if my_list is None:
        my_list = []
    self.my_list = my_list
Several others have pointed out that this is an instance of the "mutable default argument" issue in Python. The basic reason is that the default arguments have to exist "outside" the function in order to be passed into it.
But the real root of this as a problem has nothing to do with default arguments. Any time it would be bad if a mutable default value was modified, you really need to ask yourself: would it be bad if an explicitly provided value was modified? Unless someone is extremely familiar with the guts of your class, the following behaviour would also be very surprising (and therefore lead to bugs):
>>> class One(object):
...     def __init__(self, my_list=[]):
...         self.my_list = my_list
...
>>> alist = ['hello']
>>> one1 = One(alist)
>>> alist.append('world')
>>> one2 = One(alist)
>>>
>>> print(one1.my_list) # Huh? This isn't what I initialised one1 with!
['hello', 'world']
>>> print(one2.my_list) # At least this one's okay...
['hello', 'world']
>>> del alist[0]
>>> print(one2.my_list) # What the hell? I just modified a local variable and a class instance somewhere else got changed?
['world']
9 times out of 10, if you discover yourself reaching for the "pattern" of using None as the default value and using if value is None: value = default, you shouldn't be. You should be just not modifying your arguments! Arguments should not be treated as owned by the called code unless it is explicitly documented as taking ownership of them.
In this case (especially because you're initialising a class instance, so the mutable variable is going to live a long time and be used by other methods and potentially other code that retrieves it from the instance) I would do the following:
class One(object):
    def __init__(self, my_list=[]):
        self.my_list = list(my_list)
Now you're initialising the data of your class from a list provided as input, rather than taking ownership of a pre-existing list. There's no danger that two separate instances end up sharing the same list, nor that the list is shared with a variable in the caller which the caller may want to continue using. It also has the nice effect that your callers can provide tuples, generators, strings, sets, dictionaries, home-brewed custom iterable classes, etc, and you know you can still count on self.my_list having an append method, because you made it yourself.
There's still a potential problem here: if the elements contained in the list are themselves mutable, then the caller and this instance can still accidentally interfere with each other. I find it isn't very often a problem in practice in my code (so I don't automatically take a deep copy of everything), but you have to be aware of it.
Another issue is that if my_list can be very large, the copy can be expensive. There you have to make a trade-off. In that case, maybe it is better to just use the passed-in list after all, and use the if my_list is None: my_list = [] pattern to prevent all default instances sharing the one list. But if you do that you need to make it clear, either in documentation or the name of the class, that callers are relinquishing ownership of the lists they use to initialise the instance. Or, if you really want to be constructing a list solely for the purpose of wrapping up in an instance of One, maybe you should figure out how to encapsulate the creation of the list inside the initialisation of One, rather than constructing it first; after all, it's really part of the instance, not an initialising value. Sometimes this isn't flexible enough though.
And sometimes you really honestly do want to have aliasing going on, and have code communicating by mutating values they both have access to. I think very hard before I commit to such a design, however. And it will surprise others (and you when you come back to the code in X months), so again documentation is your friend!
In my opinion, educating new Python programmers about the "mutable default argument" gotcha is actually (slightly) harmful. We should be asking them "Why are you modifying your arguments?" (and then pointing out the way default arguments work in Python). The very fact of a function having a sensible default argument is often a good indicator that it isn't intended as something that receives ownership of a pre-existing value, so it probably shouldn't be modifying the argument whether or not it got the default value.
Basically, python function objects store a tuple of default arguments, which is fine for immutable things like integers, but lists and other mutable objects are often modified in-place, resulting in the behavior you observed.
This is standard behavior of default arguments anywhere in Python, not just in classes.
For more explanation, see Mutable defaults for function/method arguments.
Python functions are objects. Default arguments of a function are attributes of that function. So if the default value of an argument is mutable and it's modified inside your function, the changes are reflected in subsequent calls to that function.
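A short way to see that the default really is stored on the function object (the function name here is made up for illustration):
def append_one(target=[]):
    target.append(1)
    return target

print(append_one())             # [1]
print(append_one())             # [1, 1] -- the same default list is reused
print(append_one.__defaults__)  # ([1, 1],) -- the default lives on the function object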
Not an answer, but it's worth noting this is also true for class variables defined outside any class functions.
Example:
>>> class one:
...     myList = []
...
>>> one1 = one()
>>> one1.myList
[]
>>> one2 = one()
>>> one2.myList.append("Hello Thar!")
>>> one1.myList
['Hello Thar!']
Note that not only does the value of myList persist, but every instance of myList points to the same list.
I ran into this bug/feature myself and spent something like 3 hours trying to figure out what was going on. It's rather challenging to debug when you are getting valid data, but it comes not from the local computations but from previous ones.
It's made worse since this is not just a default argument. You can't just put myList in the class definition, it has to be set equal to something, although whatever it is set equal to is only evaluated once.
The solution, at least for me, was to simply create all the class variables inside __init__.
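A minimal sketch of that fix, moving the list creation into __init__ so each instance gets its own:
class One:
    def __init__(self):
        self.myList = []          # created per instance, inside __init__

one1 = One()
one2 = One()
one2.myList.append("Hello Thar!")
print(one1.myList)                # [] -- the instances no longer share a list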

Emulating pass-by-value behaviour in python

I would like to emulate pass-by-value behaviour in Python. In other words, I would like to make absolutely sure that the functions I write do not modify user-supplied data.
One possible way is to use deep copy:
from copy import deepcopy

def f(data):
    data = deepcopy(data)
    # do stuff
Is there a more efficient or more pythonic way to achieve this goal, making as few assumptions as possible about the object being passed (such as a .clone() method)?
Edit
I'm aware that technically everything in python is passed by value. I was interested in emulating the behaviour, i.e. making sure I don't mess with the data that was passed to the function. I guess the most general way is to clone the object in question either with its own clone mechanism or with deepcopy.
There is no pythonic way of doing this.
Python provides very few facilities for enforcing things such as private or read-only data. The pythonic philosophy is that "we're all consenting adults": in this case this means that "the function shouldn't change the data" is part of the spec but not enforced in the code.
If you want to make a copy of the data, the closest you can get is your solution. But copy.deepcopy, besides being inefficient, also has caveats such as:
Because deep copy copies everything it may copy too much, e.g., administrative data structures that should be shared even between copies.
[...]
This module does not copy types like module, method, stack trace, stack frame, file, socket, window, array, or any similar types.
So I'd only recommend it if you know that you're dealing with built-in Python types or your own objects (where you can customize copying behavior by defining the __copy__ / __deepcopy__ special methods; there's no need to define your own clone() method).
You can make a decorator and put the cloning behaviour in that.
>>> from copy import deepcopy
>>> def passbyval(func):
...     def new(*args):
...         cargs = [deepcopy(arg) for arg in args]
...         return func(*cargs)
...     return new
...
>>> @passbyval
... def myfunc(a):
...     print a
...
>>> myfunc(20)
20
This is not the most robust way, and it doesn't handle keyword arguments or class methods (lack of a self argument), but you get the picture.
Note that the following two definitions are equivalent:
@somedecorator
def func1(): pass

# ... same as ...
def func2(): pass
func2 = somedecorator(func2)
You could even have the decorator take some kind of function that does the cloning and thus allowing the user of the decorator to decide the cloning strategy. In that case the decorator is probably best implemented as a class with __call__ overridden.
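Here is a rough sketch of that idea (all names are made up, and deepcopy is just the default strategy): a decorator class whose constructor takes the cloning function and whose __call__ wraps the decorated function:
from copy import deepcopy

class PassByValue:
    """Decorator whose cloning strategy is chosen by the caller (deepcopy by default)."""
    def __init__(self, clone=deepcopy):
        self.clone = clone

    def __call__(self, func):
        def wrapper(*args, **kwargs):
            cargs = [self.clone(a) for a in args]
            ckwargs = {k: self.clone(v) for k, v in kwargs.items()}
            return func(*cargs, **ckwargs)
        return wrapper

@PassByValue()                    # or PassByValue(clone=copy.copy) for shallow copies
def myfunc(a):
    a.append(4)
    return a

data = [1, 2, 3]
print(myfunc(data))               # [1, 2, 3, 4]
print(data)                       # [1, 2, 3] -- the caller's list is untouched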
There are only a couple of built-in types that work as references, like list, for example.
So, for me, the pythonic way of doing pass-by-value for a list, in this example, would be:
list1 = [0,1,2,3,4]
list2 = list1[:]
list1[:] creates a new copy of list1, which you can assign to a new variable.
Maybe you could write a function that receives one argument, checks its type, and, according to that result, performs a built-in operation that returns a new instance of the argument passed.
As I said earlier, there are only a few built-in types whose behavior is like that of references, lists being one example.
Anyway... hope it helps.
I can't figure out any other pythonic option. But personally I'd prefer the more OO way.
class TheData(object):
    def clone(self):
        """return the cloned"""

def f(data):
    ...  # do stuff

def caller():
    d = TheData()
    f(d.clone())
Usually, when passing data to an external API, you can assure the integrity of your data by passing it as an immutable object, for example by wrapping your data in a tuple. A tuple cannot be modified, if that is what you are trying to prevent with your code.
Further to user695800's answer, pass-by-value for lists is possible with the [:] operator:
def listCopy(l):
    l[1] = 5
    for i in l:
        print i
called with
In [12]: list1 = [1,2,3,4]
In [13]: listCopy(list1[:])
1
5
3
4
In [14]: list1
Out[14]: [1, 2, 3, 4]
Though I'm sure there's no really pythonic way to do this, I expect the pickle module will give you copies of everything you have any business treating as a value.
import pickle

def f(data):
    data = pickle.loads(pickle.dumps(data))
    # do stuff
Many people use the standard library module copy. I prefer defining __copy__ or __deepcopy__ in my classes. The methods in copy may have some problems:
The shallow copy will keep references to the objects in the original instead of creating new ones.
The deep copy will run recursively, which may sometimes cause an infinite loop; without enough attention, memory use may explode.
To avoid these out-of-control behaviors, define your own shallow/deep copy methods by overriding __copy__ and __deepcopy__. Alex's answer gives a good example.
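For illustration, here is a minimal, made-up example of overriding both hooks so that a payload is deep-copied while an administrative structure stays shared between the copies:
import copy

class Node:
    def __init__(self, payload, registry):
        self.payload = payload
        self.registry = registry          # administrative data, meant to stay shared

    def __copy__(self):
        return Node(self.payload, self.registry)

    def __deepcopy__(self, memo):
        # deep-copy the payload but keep sharing the registry, so the copy
        # does not drag along (or duplicate) shared administrative structures
        return Node(copy.deepcopy(self.payload, memo), self.registry)

a = Node(payload=[1, 2, 3], registry={})
b = copy.deepcopy(a)
assert b.payload == a.payload and b.payload is not a.payload
assert b.registry is a.registry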
