Why doesn't copy.deepcopy modify the id of an object? - python

I don't understand why copy.deepcopy does not modify the id of an object:
import copy
a = 'hello world'
print a is copy.deepcopy(a) # => True ???

Simeon's answer is perfectly correct, but I wanted to provide a more general perspective.
The copy module is primarily intended for use with mutable objects. The idea is to make a copy of an object so you can modify it without affecting the original. Since there's no point in making copies of immutable objects, the module declines to do so. Strings are immutable in Python, so this optimization can never change the behavior of real-world code.
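To see where the optimization stops, here is a minimal sketch (CPython's copy module returns atomic immutable objects unchanged, but copies containers whose contents need copying):
import copy
a = 'hello world'
print(copy.deepcopy(a) is a)  # True: strings are returned as-is
t = (1, 2)
print(copy.deepcopy(t) is t)  # True: a tuple of immutables needs no copy
u = ([1], 2)
print(copy.deepcopy(u) is u)  # False: the nested list forces a real copy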

Python does not create a new string object here: copy.deepcopy returns immutable objects such as strings unchanged, much as string interning lets Python keep only one copy of the same string behind the scenes. The result of copy.deepcopy(a) is therefore not a new object; there is nothing meaningful to gain by deep-copying a string.

Look again:
import copy
a = ['hello world']
print a is copy.deepcopy(a) # => False
Since the value of an immutable object (such as a string) is incapable of changing without also changing its identity, there would be no point in creating additional instances. It's only in the case of a mutable object (such as a list) that there's any point in creating a second identity with the same value.
For a thorough introduction to separating the concepts of value, identity and state, I suggest Rich Hickey's talk on the subject.

Related

function taking ownership of Numpy array

I have a function takes_ownership() that performs an operation on a Numpy array a in place and effectively relies on becoming the owner of the data. Is there a way to alter the passed array such that it no longer points to its original buffer?
The motivation is code that processes huge chunks of data in place (due to performance constraints); a common error is unaware users recycling their input arrays.
Note that the question is very specifically not "why is a different if I change b = a in my function". The question is "how can I make using the given array in place safer for unsuspecting users" when I must not make a copy.
import numpy as np

def takes_ownership(a):
    b = a
    # looking for this step
    a.set_internal_buffer([])
    # if this were C++ std::vectors, I would be looking for
    # b = np.array([])
    # a.swap(b)
    # an expensive operation that invalidates a
    b.resize((6, 6), refcheck=False)
    # outside references to a no longer valid
    return b

a = np.random.randn(5, 5)
b = takes_ownership(a)
# array no longer has data so that users cannot mess up
assert a.shape == ()
NumPy has a copy function that will clone an array (although if this is an object array, i.e. not primitive, there might still be nested object references after cloning). That being said, this is a questionable design pattern and it would probably be better practice to rewrite your code in a way that does not rely on this condition.
Edit: If you can't copy the array, you will probably need to make sure that none of your other code modifies it in place (instead running immutable operations to produce new arrays). Doing things this way may cause more issues down the road, so I'd recommend refactoring so that it is not necessary.
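As a minimal sketch of that immutable style (out-of-place NumPy operations allocate a new array, while in-place ones mutate the buffer that every reference sees):
import numpy as np
a = np.zeros(3)
b = a + 1  # out-of-place: allocates a new array, a is untouched
a += 1     # in-place: mutates a's buffer, visible to every reference to a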
You can use numpy's copy, which will do exactly what you want.
Use b = np.copy(a) and no further changes are needed.
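For example (a small sketch; np.copy returns a new array backed by its own buffer):
import numpy as np
a = np.arange(5)
b = np.copy(a)
b[0] = 99
print(a[0])  # 0: modifying the copy leaves the original untouched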
If you want to make a copy of an object, in general, so that you can call methods on that object then you can use the copy module.
From the linked page:
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
In this case, import copy in your code and then use b = copy.copy(a); you will get a shallow copy (which I think should be good enough for numpy arrays, but you'll want to check that yourself).
The hanging question here is why this is needed. Python passes object references when calling a function. The assignment operator = does not call any constructor for the object on the left-hand side; rather, it binds the name on the left-hand side to the object referenced by the right-hand side. So when you call a method on a reference (using the dot operator .), a and b will operate on the same object in memory unless you make a new object with an explicit copy command.
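A short sketch of the shallow/deep distinction on a nested list:
import copy
outer = [[1, 2], [3, 4]]
shallow = copy.copy(outer)   # new outer list, same inner lists
deep = copy.deepcopy(outer)  # new outer list and new inner lists
shallow[0].append(99)
print(outer[0])  # [1, 2, 99]: the shallow copy shares the inner lists
deep[1].append(99)
print(outer[1])  # [3, 4]: the deep copy is fully independent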

python atomic data types

It was written here that Python has both atomic and reference object types. Atomic objects are: int, long, complex.
When assigning an atomic object, its value is copied; when assigning a reference object, its reference is copied.
My question is:
why then, when I run the code below, do I get True?
a = 1234
b = a
print id(a) == id(b)
It seems to me that I don't copy the value, I just copy the reference, no matter what the type is.
Assignment (binding) in Python NEVER copies data. It ALWAYS copies a reference to the value being bound.
The interpreter computes the value on the right-hand side, and the left-hand side is bound to the new value by referencing it. If the expression on the right-hand side is an existing value (in other words, if no operators are required to compute it) then the left-hand side becomes a reference to the same object.
After
a = b
is executed,
a is b
will ALWAYS be true - that's how assignment works in Python. It's also true for containers, so x[i].some_attribute = y will make x[i].some_attribute is y true.
The assertion that Python has atomic types and reference types seems unhelpful to me, if not just plain untrue. I'd say it has atomic types and container types. Containers are things like lists, tuples, dicts, and instances with private attributes (to a first approximation).
As #BallPointPen helpfully pointed out in their comment, mutable values can be altered without needing to re-bind the reference. Since immutable values cannot be altered, references must be re-bound in order to refer to a different value.
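A small sketch of that distinction, watching id() to see whether the reference gets re-bound:
lst = [1]
before = id(lst)
lst.append(2)             # alters the object in place
print(id(lst) == before)  # True: same object
s = 'ab'
before = id(s)
s += 'c'                  # cannot alter the string; re-binds s to a new object
print(id(s) == before)    # False: different object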
Edit: Recently reading the English version of the quoted page (I'm afraid I don't understand Russian) I see "Python uses dynamic typing, and a combination of reference counting and a cycle-detecting garbage collector for memory management." It's possible the Russian page has mistranslated this to give a false impression of the language, or that it was misunderstood by the OP. But Python doesn't have "reference types" except in the most particular sense for weakrefs and similar constructs.
int types are immutable.
What you see is the reference for the number 1234, and that will never change.
For mutable objects like lists and dictionaries you can use
import copy
a = copy.deepcopy(b)
Actually, as #spectras said, there are only references, but some objects are immutable: floats, ints, tuples. For immutable objects (apart from memory consumption) it just does not matter whether you pass around references or create copies.
The interpreter even does some optimizations that let numbers with the same value share one object, which makes checking numbers for identity interesting. For example, given
a = 1
b = 1
c = 2 / 2
d = 12345
e = 12345 * 1
a is b is True and a is c is also True, but d is e is False (== works normally, as expected).
Immutable objects are atomic in the sense that "changing" them is thread-safe: you do not actually change the object itself, you just bind a variable to a new reference (and rebinding a variable is thread-safe).

Is there some deep difference in Python between methods that alter the object and ones that do not?

For instance, in the case of
>>> f.read()
'This statement is false.\n'
>>> f
<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'>
>>> f.seek(0)
0
>>> f.read()
'This statement is false.\n'
f is still a file object with the same contents. Whereas with
>>> lst = []
>>> lst.append(2)
>>> lst
[2]
the list lst is altered.
Is this difference an instance of a general theme or trend? Is there a special set of methods that alter the object? (In my example, would f be considered altered by f.seek(0)?)
Whether or not the object is altered by a method does not make the method any different. So to answer your question: no.
However, of course, some methods are designed with the purpose of altering the object, and some are designed not to alter the object.
(Also, there are of course classmethods and staticmethods, which cannot alter the instance - otherwise they wouldn't be class or static methods!)
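A quick illustration of the split as it shows up in the standard library, where mutating methods conventionally return None while non-mutating operations return a new object:
lst = [3, 1, 2]
print(lst.sort())         # None: sort() mutates lst in place
print(lst)                # [1, 2, 3]
print(sorted([3, 1, 2]))  # [1, 2, 3]: sorted() returns a new list
s = 'abc'
print(s.upper())          # 'ABC': strings are immutable, a new string is returned
print(s)                  # 'abc': the original is unchanged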
I'm not quite sure what you have in mind, but in general, no, there is no fundamental distinction between methods that alter their object and those that do not. Python has nothing akin to the const keyword of C or C++.
In fact, sometimes it is intentionally not revealed whether a method alters the object it acts on. This is to allow for different implementations under the hood, some of which will alter the object's state and some of which won't. This does have implications when writing parallel code (e.g. using multiprocessing): objects might not be safe to use concurrently from multiple threads, and they might not tell you explicitly.
I don't see the objective of this question. All languages have methods that modify the object and methods that do not. If you want to append an element to an array, it is impossible to do so without modifying the object.
The answer is that it depends on what you want to do; the method either needs to alter the object or it doesn't.

inconsistent variable scope in python

It seems that strings and dicts behave fundamentally differently in Python. When I pass a string to a function it gets modified in the local function's scope only, but when I do the same with a dict, it gets modified in the scope beyond the function:
def change_str(s):
    s += " qwe"

def change_arr(a):
    a[1] = "qwe"

ss = "asd"
change_str(ss)
print ss
# prints:
# asd

aa = {0: "asd"}
change_arr(aa)
print aa
# prints:
# {0: 'asd', 1: 'qwe'}
Is this behavior intentional, and if so, why?
It is intentional behavior. Strings are immutable in Python, so essentially all string operations return a new string, and since your functions do not return anything, you cannot see the new string "asd qwe". You can change the contents of mutable containers outside of the local scope without declaring them global.
You can read more about mutable types in the official documentation of pythons data model.
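One way to make the change visible at the call site (a sketch: return the new string and re-bind the caller's name):
def change_str(s):
    return s + " qwe"

ss = "asd"
ss = change_str(ss)  # re-bind ss to the new string
print(ss)            # asd qwe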
Don't let the 'assignment' operator fool you. This is what is really going on in each of these functions:
def change_str(s):
    # operation has been split into 2 steps for clarity; str defines no
    # __iadd__, so s += " qwe" falls back to __add__
    t = s.__add__(" qwe")  # creates a new string object
    s = t  # as you know, without the `global` keyword, this `s` is local

def change_arr(a):
    a.__setitem__(1, "qwe")
As you can see, only one of these functions actually has an assignment operation. The []= is shorthand for (or equivalent to) .__setitem__().
Yes, it's intentional. Each type determines how operators work on it. The dict type is set up so that a[1] = "qwe" modifies the dict object. Such changes will be seen in any piece of code that references that object. The string type is set up so that s += "qwe" does not modify the object, but returns a new object. So other code that was referencing the original object will see no changes.
The shorthand way of saying that is that strings are immutable and dicts are mutable. However, it's worth noting that "dicts are mutable" isn't the whole reason why the behavior happens. The reason is that item assignment (someDict[item] = val) is an operation that actually mutates the dict.
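A minimal sketch that makes the difference observable with id():
d = {0: "asd"}
before = id(d)
d[1] = "qwe"            # mutates the same dict object
print(id(d) == before)  # True
s = "asd"
before = id(s)
s += " qwe"             # builds a brand-new string and re-binds s
print(id(s) == before)  # False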

How to make a variable (truly) local to a procedure or function

i.e. we have the global declaration, but no local.
"Normally" arguments are local, I think, or they certainly behave that way.
However if an argument is, say, a list and a method is applied which modifies the list, some surprising (to me) results can ensue.
I have 2 questions: what is the proper way to ensure that a variable is truly local?
I wound up using the following, which works, but it can hardly be the proper way of doing it:
def AexclB(a, b):
    z = a + []  # yuk
    for k in range(0, len(b)):
        try: z.remove(b[k])
        except: continue
    return z
Absent the +[], "a" in the calling scope gets modified, which is not desired.
(The issue here is using a list method that modifies the list in place.)
The supplementary question is, why is there no "local" declaration?
Finally, in trying to pin this down, I made various mickey mouse functions which all behaved as expected except the last one:
def fun4(a):
    z = a
    z = z.append(["!!"])
    return z

a = ["hello"]
print "a=", a
print "fun4(a)=", fun4(a)
print "a=", a
which produced the following on the console:
a= ['hello']
fun4(a)= None
a= ['hello', ['!!']]
The 'None' result was not expected (by me).
Python 2.7 btw in case that matters.
PS: I've tried searching here and elsewhere but not succeeded in finding anything corresponding exactly - there's lots about making variables global, sadly.
It's not that z isn't a local variable in your function. Rather, when you write the line z = a, you make z refer to the same list in memory that a already points to. If you want z to be a copy of a, then you should write z = a[:] or z = list(a).
See this link for some illustrations and a bit more explanation http://henry.precheur.org/python/copy_list
Python will not copy objects unless you explicitly ask it to. Integers and strings are not modifiable, so every operation on them returns a new instance of the type. Lists, dictionaries, and many other Python objects are mutable, so operations like list.append happen in-place (and therefore return None).
If you want the variable to be a copy, you must explicitly copy it. In the case of lists, you slice them:
z = a[:]
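For instance (a small sketch of the slice-copy idiom):
a = [1, 2, 3]
z = a[:]     # slice copy: a new list object
z.append(4)
print(a)     # [1, 2, 3]: the original is unchanged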
There is a great answer that covers most of your question here, explaining mutable and immutable types and how they are kept in memory and referenced. The first section of the answer is for you (before the "How do we get around this?" header).
In the following line
z = z.append(["!!"])
Lists are mutable objects, so when you call append it updates the referenced object in place; it does not create a new one and return it. If a method or function does not return anything, that means it returns None.
The link above also gives an immutable example so you can see the real difference.
You cannot make a mutable object act as if it were immutable. But instead of passing the reference around, you can create a new object from the existing mutable one:
a = [1,2,3]
b = a[:]
For more options you can check here
What you're missing is that all variable assignment in Python is by reference (or by pointer, if you like). Passing arguments to a function literally assigns values from the caller to the arguments of the function, by reference. If you dig into the reference and change something inside it, the caller will see that change.
If you want to ensure that callers will not have their values changed, you can either try to use immutable values more often (tuple, frozenset, str, int, bool, NoneType), or be certain to take copies of your data before mutating it in place.
In summary, scoping isn't involved in your problem here. Mutability is.
Is that clear now?
Still not sure what's the 'correct' way to force the copy; there are various suggestions here.
It differs by data type, but generally <type>(obj) will do the trick. For example list([1, 2]) and dict({1:2}) both return (shallow!) copies of their argument.
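For example (a quick sketch of constructor-based shallow copies):
orig = {1: 2}
cp = dict(orig)   # shallow copy via the dict constructor
cp[3] = 4
print(orig)       # {1: 2}: the original is unchanged
nums = [1, 2]
cp2 = list(nums)  # shallow copy via the list constructor
cp2.append(3)
print(nums)       # [1, 2]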
If, however, you have a tree of mutable objects and you don't know a priori which level of the tree you might modify, you need the copy module. That said, I've only needed this a handful of times (in 8 years of full-time Python), and most of those uses ended up causing bugs. If you need this, it's a code smell, in my opinion.
The complexity of maintaining copies of mutable objects is the reason why there is a growing trend of using immutable objects by default. In the Clojure language, all data types are immutable by default, and mutability is treated as a special case to be minimized.
If you need to work on a list or other object in a truly local context you need to explicitly make a copy or a deep copy of it.
from copy import copy

def fn(x):
    y = copy(x)  # work on a shallow copy so the caller's x is untouched
    return y
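Usage, given the fn sketch above:
x = [1, 2]
y = fn(x)
y.append(3)
print(x)  # [1, 2]: the caller's list is unchanged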
