This is a question about rotating a matrix 90 degrees clockwise. I don't understand why I cannot use:
matrix = zip(*matrix[::-1])
but:
class Solution:
    def rotate(self, matrix):
        """
        :type matrix: List[List[int]]
        :rtype: void Do not return anything, modify matrix in-place instead.
        """
        matrix[::] = zip(*matrix[::-1])
matrix in your method is a reference to a matrix object. Assignment to matrix will change matrix to reference your newly created object, but not change the contents of the original object. matrix[::] = invokes __setitem__ on the object referenced by matrix which changes the contents of the object accordingly.
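A minimal sketch of the difference described above. The function names rotate_rebind and rotate_in_place are made up for illustration, and the rows are converted to lists so the result stays a list of lists:

```python
def rotate_rebind(matrix):
    # Rebinds the local name only; the caller's list is untouched.
    matrix = [list(row) for row in zip(*matrix[::-1])]


def rotate_in_place(matrix):
    # Slice assignment replaces the contents of the existing list object.
    matrix[:] = [list(row) for row in zip(*matrix[::-1])]


grid = [[1, 2], [3, 4]]
rotate_rebind(grid)
print(grid)  # [[1, 2], [3, 4]]

rotate_in_place(grid)
print(grid)  # [[3, 1], [4, 2]]
```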
In Python, all assignments bind a reference to a name. Operators call a method of an existing reference [1]. In your case, the statement
matrix = ...
is purely an assignment [2]. It computes the right hand side, and binds it to the name matrix in the local function scope. Whatever object matrix referred to when you passed it in remains untouched.
This is why you don't see the changes you made. It's not that the function doesn't work this way, it's that it doesn't do anything with the rotated list. The data is discarded as soon as the function exits.
The operation
matrix[:] = ...
on the other hand is not an assignment in the semantic sense, despite the = symbol [3]. It's a call to matrix.__setitem__(...) [4]. The __setitem__ method, like any other method, operates directly on the object without changing its name bindings.
As far as indexing goes, [:] is equivalent to [::]. They are shorthand for [0:len(matrix)] and [0:len(matrix):1], respectively. In both cases, the default step size will be used. In general, any index with colons in it will be converted to a slice object. Missing elements are set to None and replaced by the sequence-specific defaults.
[1] Some operators, like +=, perform an assignment after calling a method. These are called augmented assignments. But that's not a case we're interested in right now.
[2] Besides literal assignment statements (=), some other types of assignments are def (which binds a function object to its name), class (which does the same for a class object), import (which binds a module or element of a module to a name), passing arguments to a function (which binds objects to the local argument names or kwarg dictionary keys), and for (which binds an element from an iterator to the loop variable at each iteration).
[3] It's still an assignment from the point of view of the parser, but the statement is handled completely differently. A similar statement that is not actually an assignment is using the = operator on an attribute implemented as a descriptor, such as a property.
[4] Technically, it's more of an equivalent to type(matrix).__setitem__(matrix, ...), but with some additional optimizations. For example, the metaclass of type(matrix) won't ever be searched.
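The claim that any index containing colons becomes a slice object can be checked with a small sketch. Recorder is a hypothetical helper class, not part of any library; it only stores whatever key __setitem__ receives:

```python
class Recorder:
    """Hypothetical helper that records the key passed to __setitem__."""

    def __init__(self):
        self.keys = []

    def __setitem__(self, key, value):
        self.keys.append(key)


rec = Recorder()
rec[:] = "a"
rec[::] = "b"
rec[1:5:2] = "c"
print(rec.keys)
# [slice(None, None, None), slice(None, None, None), slice(1, 5, 2)]

# slice.indices() fills in the sequence-specific defaults for a given length.
print(slice(None, None, None).indices(4))  # (0, 4, 1)
```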
My (limited) understanding of Python is that objects defined outside a function should remain the same even if modified inside a function, i.e.
def fun():
    i = 1
    return 'done'
i = 0
fun()
i == 0 ## True
However, lists (and other objects like numpy arrays) change when indexed inside a function:
def fun():
    img[0] = img[0] + 100
    return 'done'
img = [0, 1]
fun()
img == [0, 1] ## False
Obviously I am missing a core concept with respect to how global and local variables are handled inside functions but can't seem to find any information online. Could someone please explain why objects change inside functions when indexed? Also, could someone describe how to avoid this "feature" so that when I index objects (lists, arrays, etc...) within a function, I don't inadvertently change the objects defined outside that function? Thanks!
Please see my recent answer to a closely related question: https://stackoverflow.com/a/69303154/8431111
There are two concepts in play for your example:
1. behavior of an assignment versus a reference
2. mutating a single container object
For item (1.) when fun assigns i = 1,
it creates a new local variable
due to the lack of a global declaration.
For item (2.) let's consider this slightly simpler code instead,
as it doesn't change the essential problem:
img[0] = 100
The assignment operator locates img
by first consulting the function's local scope (not found)
and then finding it in the module's global scope.
With a reference to the existing img object in hand,
it then calls __setitem__
to alter what the zero-th element points to.
In this case it will point at an immutable int object.
The setter's assignment is performed by a STORE_SUBSCR bytecode instruction.
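The question also asked how to avoid changing the outer object. One common approach, sketched here with made-up function names, is to pass the list in explicitly and work on a copy:

```python
import copy

img = [0, 1]


def brighten_in_place():
    # Item assignment mutates the global list object.
    img[0] = img[0] + 100


def brighten_copy(values):
    # Work on a shallow copy so the caller's list stays unchanged.
    result = list(values)
    result[0] = result[0] + 100
    return result


brighter = brighten_copy(img)
print(img, brighter)  # [0, 1] [100, 1]

brighten_in_place()
print(img)  # [100, 1]

# For nested containers, copy.deepcopy also copies the inner objects.
nested = [[0, 1]]
safe = copy.deepcopy(nested)
safe[0][0] = 100
print(nested)  # [[0, 1]]
```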
I couldn't find a convenient way to distinguish in-place methods from assignable methods in Python.
For example, a method like my_list.sort() doesn't need an assignment because it changes the list itself (it is in-place, right?), but some other methods need their result assigned to a variable.
Am I wrong?
The reason you can't easily find such a distinction is that it really doesn't exist except by tenuous convention. "In-place" just means that a function modifies the data of a mutable argument, rather than returning an all new object. If "Not in-place" is taken to mean that a function returns a new object encapsulating updated data, leaving the input alone, then there are simply too many other possible conventions to be worth cataloging.
The standard library does its best to follow the convention of not returning values from single-argument in-place functions. So for example you have list.sort, list.append, random.shuffle and heapq.heapify, which all operate in-place, returning None. At the same time, you have functions and methods that create new objects, and must therefore return them, like sorted, list.__add__ and tuple.__add__. But you also have in-place methods that must return a value, like list.__iadd__ (compare to list.extend, which does not return a value).
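A short sketch of the three behaviours mentioned above, using only built-in list operations:

```python
data = [3, 1, 2]

result = sorted(data)      # new list; data is left alone
print(result, data)        # [1, 2, 3] [3, 1, 2]

returned = data.sort()     # sorts data itself and returns None
print(returned, data)      # None [1, 2, 3]

same = data.__iadd__([4])  # in-place, but returns the list itself
print(same is data, data)  # True [1, 2, 3, 4]
```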
__iadd__ and similar in-place operators emphasize a very important point, which is that in-place operation is not an option for immutable objects. Python has a workaround for this:
x = (1, 2)
y = (3, 4)
x += y
For objects whose type defines __iadd__, the third line is equivalent to
x = type(x).__iadd__(x, y)
If the type does not define __iadd__ (tuple, for instance, does not), Python falls back to x = type(x).__add__(x, y). Ignoring the fact that the method is called as a function, notice that the name x is reassigned either way, so even if x += y has to create a new object (e.g., because tuple is immutable), you can still see it through the name x. Mutable objects will generally just return x from __iadd__, so the method call will appear not to return a value, even when it really does.
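The rebinding can be observed with identity checks. This is only a sketch; the names x_before and a_before are introduced here to keep a second reference to the original objects:

```python
x = (1, 2)
x_before = x
x += (3, 4)               # tuple: a new object is created and bound to x
print(x, x is x_before)   # (1, 2, 3, 4) False

a = [1, 2]
a_before = a
a += [3, 4]               # list: mutated in place, a rebound to the same object
print(a, a is a_before)   # [1, 2, 3, 4] True

# tuple has no __iadd__ of its own, so += falls back to __add__.
print(hasattr(tuple, "__iadd__"), hasattr(list, "__iadd__"))  # False True
```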
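As an interesting aside, the reassignment sometimes causes an unexpected error. The list's __iadd__ succeeds, but the assignment of the result back into the tuple fails:
>>> z = ([],)
>>> z[0].extend([1, 2]) # OK
>>> z[0] += [3, 4] # Error! But list is mutable!
TypeError: 'tuple' object does not support item assignment
>>> z # the list was extended anyway
([1, 2, 3, 4],)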
Many third party libraries, such as numpy, support the convention of in-place functions without a return value, up to a point. Most numpy functions create new objects, like np.cumsum, np.add, and np.sort. However, there are also functions and methods that operate in-place and return None, like np.ndarray.sort and np.random.shuffle.
numpy can work with large memory buffers, which means that in-place operation is often desirable. Instead of having a separate in-place version of the function, some functions and methods (most notably universal functions) have an out parameter that can be set to the input, like np.cumsum, np.ndarray.cumsum, and np.add. In these cases, the function will operate in-place, but still return a reference to the out parameter, much in the same way that python's in-place operators do.
An added complication is that not all functions and methods perform a single action on a single object. You can write a class like this to illustrate:
class Test:
    def __init__(self, value):
        self.value = value
    def op(self, other):
        other.value += self.value
        return self
This class modifies another object, but returns a reference to the unmodified self. While contrived, the example serves to illustrate that the in-place/not-in-place paradigm is not all-encompassing.
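A quick usage sketch of the class above (repeated here so the snippet runs on its own) shows both effects at once:

```python
class Test:
    def __init__(self, value):
        self.value = value

    def op(self, other):
        other.value += self.value  # modifies the argument
        return self                # returns the unmodified receiver


a = Test(1)
b = Test(10)
returned = a.op(b)
print(returned is a, a.value, b.value)  # True 1 11
```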
TL;DR
In the end, the general concept of in-place is often useful, but can't replace the need for reading documentation and understanding what each function does on an individual basis. This will also save you from many common gotchas with mutable objects supporting in-place operations vs immutable ones just emulating them.
I am coding in Python trying to decide whether I should return a numpy array (the result of a diff on some other array) or return numpy.where(diff)[0], which is a smaller array but requires that little extra work to create. Let's call the method where this happens methodB.
I call methodB from methodA. The rub is that I won't necessarily always need the where() result in methodA, but I might. So is it worth doing this work inside methodB, or should I pass back the (much larger memory-wise) diff itself and then only process it further in methodA if needed? That would be the more efficient choice assuming methodA just gets a reference to the result.
So, are function results ever not copied when they are passed back to the code that called that function?
I believe that when methodB finishes, all the memory in its frame will be reclaimed by the system, so methodA has to actually copy anything returned by methodB in to its own frame in order to be able to use it. I would call this "return by value". Is this correct?
Yes, in the sense that Python always passes arguments by value and always returns values by value. However, the value being returned (or passed) is a reference to a potentially shared, potentially mutable object, so the object itself is not copied.
For some types, an implementation may pass or return the actual object itself rather than a reference (integers are the typical example). The difference can only be observed for mutable objects, which integers are not, and de-referencing an object reference is completely transparent, so you will never notice it. To simplify your mental model, you may just assume that arguments and return values are always passed by value (this is true anyhow), and that the value being passed is always a reference (this is not always true, but you cannot tell the difference, so you can treat it as a simple performance optimization).
Note that passing / returning a reference by value is in no way similar (and certainly not the same thing) as passing / returning by reference. In particular, it does not allow you to mutate the name binding in the caller / callee, as pass-by-reference would allow you to.
This particular flavor of pass-by-value, where the value is typically a reference, is the same in e.g. ECMAScript, Ruby, Smalltalk, and Java, and is sometimes called "call by object sharing" (coined by Barbara Liskov, I believe), "call by sharing", "call by object", and specifically within the Python community "call by assignment" (thanks to @timgeb) or "call by name-binding" (thanks to @Terry Jan Reedy) (not to be confused with call by name, which is again a different thing).
Assignment never copies data. If you have a function foo that returns a value, then an assignment like result = foo(arg) never copies any data. (You could, of course, have copy-operations in the function's body.) Likewise, return x does not copy the object x.
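That neither return nor assignment copies the object can be checked with id() and is. This is a small sketch; make_data is a hypothetical function invented for the demonstration:

```python
def make_data():
    data = list(range(5))
    return data, id(data)      # id() taken inside the function


result, inner_id = make_data()
print(id(result) == inner_id)  # True: same object, nothing was copied

alias = result                 # assignment binds another name
alias.append(99)
print(result)                  # [0, 1, 2, 3, 4, 99]
```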
Your question lacks a specific example, so I can't go into more detail.
edit: You should probably watch the excellent Facts and Myths about Python names and values talk.
So roughly your code is:
def methodA(arr):
    x = methodB(arr)
    ....
def methodB(arr):
    diff = somefn(arr)
    # return diff or
    # return np.where(diff)[0]
arr is a (large) array that is passed as a reference to methodA and methodB. No copies are made.
diff is a similarly sized array that is generated in methodB. If it is returned, it will be referenced in the methodA namespace by x. No copy is made in returning it.
If the where array is returned, diff disappears when methodB returns. Assuming it doesn't share a data buffer with some other array (such as arr), all the memory that it occupied is recovered.
But as long as memory isn't tight, returning diff instead of the where result won't be more expensive. Nothing is copied during the return.
A numpy array consists of a small object wrapper with attributes like shape and dtype. It also has a pointer to a potentially large data buffer. Where possible, numpy tries to share buffers, but it readily makes new ndarray objects. Thus there's an important distinction between a view and a copy.
I see what I missed now: Objects are created on the heap, but function frames are on the stack. So when methodB finishes, its frame will be reclaimed, but that object will still exist on the heap, and methodA can access it with a simple reference.
I would like to do something like the following:
class Foo(object):
    def __init__(self):
        self.member = 10
def factory(foo):
    foo = Foo()
aTestFoo = None
factory(aTestFoo)
print(aTestFoo.member)
However it crashes with AttributeError: 'NoneType' object has no attribute 'member':
the object aTestFoo has not been modified inside the call of the function factory.
What is the Pythonic way of doing that? Is it a pattern to avoid? If it is a common mistake, what is it called?
In C++, in the function prototype, I would have added a reference to the pointer to be created in the factory... but maybe this is not the kind of things I should think about in Python.
In C#, there's the keyword ref, which allows modifying the reference itself and is really close to the C++ way. I don't know about Java... and I do wonder about Python.
Python does not have pass by reference (one of the few things it shares with Java, by the way). Some people describe argument passing in Python as call by value (and define the values as references, where reference does not mean what it means in C++). Some describe it as pass by reference, with reasoning I find quite questionable (they redefine the term to mean what Python calls a "reference", and end up with something which has nothing to do with what has been known as pass by reference for decades). Others go for terms which are not as widely used and abused (popular examples are "{pass,call} by {object,sharing}"). See Call By Object on effbot.org for a rather extensive discussion of the definitions of the various terms, their history, and the flaws in some of the arguments for the terms pass by reference and pass by value.
The short story, without naming it, goes like this:
Every variable, object attribute, collection item, etc. refers to an object.
Assignment, argument passing, etc. create another variable, object attribute, collection item, etc. which refers to the same object but has no knowledge which other variables, object attributes, collection items, etc. refer to that object.
Any variable, object attribute, collection item, etc. can be used to modify an object, and any other variable, object attribute, collection item, etc. can be used to observe that modification.
No variable, object attribute, collection item, etc. refers to another variable, object attribute, collection items, etc. and thus you can't emulate pass by reference (in the C++ sense) except by treating a mutable object/collection as your "namespace". This is excessively ugly, so don't use it when there's a much easier alternative (such as a return value, or exceptions, or multiple return values via iterable unpacking).
You may consider this like using pointers, but not pointers to pointers (but sometimes pointers to structures containing pointers) in C. And then passing those pointers by value. But don't read too much into this simile. Python's data model is significantly different from C's.
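The two alternatives named in the last point can be sketched as follows. The names fill_holder and make_foo are hypothetical, and the dict stands in for the "mutable object used as a namespace":

```python
class Foo:
    def __init__(self):
        self.member = 10


def fill_holder(holder):
    # Mutates the shared dict; the caller sees the new entry.
    holder["foo"] = Foo()


def make_foo():
    # Usually simpler: just return the new object.
    return Foo()


holder = {}
fill_holder(holder)
print(holder["foo"].member)  # 10

foo = make_foo()
print(foo.member)  # 10
```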
You are making a mistake here because in Python
"We call the argument passing technique _call by sharing_, because the argument objects are shared between the caller and the called routine. This technique does not correspond to most traditional argument passing techniques (it is similar to argument passing in LISP). In particular it is not call by value because mutations of arguments performed by the called routine will be visible to the caller. And it is not call by reference because access is not given to the variables of the caller, but merely to certain objects."
In Python, the variables in the formal argument list are bound to the actual argument objects. The objects are shared between caller and callee; there are no "fresh locations" or extra "stores" involved. (Which, of course, is why the CLU folks called this mechanism "call-by-sharing".)
And by the way, Python functions don't run in an extended environment, either. Function bodies have very limited access to the surrounding environment.
The Assignment Statements section of the Python docs might be interesting.
The = statement in Python acts differently depending on the situation, but in the case you present, it just binds the new object to a new local variable:
def factory(foo):
    # This makes a new instance of Foo,
    # and binds it to a local variable `foo`,
    foo = Foo()
# This binds `None` to a top-level variable `aTestFoo`
aTestFoo = None
# Call `factory` with first argument of `None`
factory(aTestFoo)
print(aTestFoo.member)
Although it can potentially be more confusing than helpful, the dis module can show you the byte-code representation of a function, which can reveal how Python works internally. Here is the disassembly of factory:
>>> import dis
>>> dis.dis(factory)
  4           0 LOAD_GLOBAL              0 (Foo)
              3 CALL_FUNCTION            0
              6 STORE_FAST               0 (foo)
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE
What that says is, Python loads the global Foo class by name (0), and calls it (3, instantiation and calling are very similar), then stores the result in a local variable (6, see STORE_FAST). Then it loads the default return value None (9) and returns it (12).
What is the Pythonic way of doing that? Is it a pattern to avoid? If it is a common mistake, what is it called?
Factory functions are rarely necessary in Python. In the occasional case where they are necessary, you would just return the new instance from your factory (instead of trying to assign it to a passed-in variable):
class Foo(object):
    def __init__(self):
        self.member = 10
def factory():
    return Foo()
aTestFoo = factory()
print(aTestFoo.member)
Your factory method doesn't return anything - and by default it will have a return value of None. You assign aTestFoo to None, but never re-assign it - which is where your actual error is coming from.
Fixing these issues:
class Foo(object):
    def __init__(self):
        self.member = 10
def factory(obj):
    return obj()
aTestFoo = factory(Foo)
print(aTestFoo.member)
This should do what I think you are after, although such patterns (i.e., factory methods) are not that typical in Python.
I have to write a testing module and have a C++ background. That said, I am aware that there are no pointers in Python, but how do I achieve the following?
I have a test method which looks in pseudocode like this:
def check(self, obj, prop, value):
    if obj.prop <> value:  # this does not work,
        # getattr does not work either (object has no such method, per the interpreter output)
        # I am working with objects from InCyte's python interface
        # the supplied findProp method does not work either (I get
        # None for objects I can access on the shell with obj.prop)
        # and yes, I supply the method with a string 'prop'
        if self._autoadjust:
            print("Adjusting prop from x to y")
            obj.prop = value  # setattr does not work, see above
        else:
            print("Warning Value != expected value for obj")
Since I want to check many different objects in separate functions I would like to be able to keep the check method in place.
In general, how do I ensure that a function affects the passed object and does not create a copy?
myobj.size=5
resize(myobj,10)
print myobj.size #jython =python2.5 => print is not a function
I can't make resize a member method since the myobj implementation is out of reach, and I don't want to type myobj=resize(myobj, 10) everywhere
Also, how can I make it so that I can access those attributes in a function to which i pass the object and the attribute name?
getattr isn't a method, you need to call it like this
getattr(obj, prop)
similarly setattr is called like this
setattr(obj, prop, value)
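Putting the two built-ins together, the check function from the question might look roughly like this. It is a sketch that assumes the objects expose ordinary attributes; Widget is a hypothetical stand-in for the real object, and the autoadjust flag is passed as an argument instead of being read from self:

```python
class Widget:
    """Hypothetical stand-in for the object under test."""

    def __init__(self, size):
        self.size = size


def check(obj, prop, value, autoadjust=False):
    """Compare obj.<prop> with value; return True if they already match."""
    current = getattr(obj, prop)
    if current == value:
        return True
    if autoadjust:
        print("Adjusting %s from %r to %r" % (prop, current, value))
        setattr(obj, prop, value)  # changes the caller's object, no copy
    else:
        print("Warning: %s is %r, expected %r" % (prop, current, value))
    return False


w = Widget(5)
check(w, "size", 10)                   # warns, leaves w.size alone
check(w, "size", 10, autoadjust=True)  # sets w.size to 10
print(w.size)  # 10
```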
In general how do I ensure that a function affects the passed object and does not create a copy?
Python is not C++, you never create copies unless you explicitly do so.
I can't make resize a member method since the myobj implementation is out of reach, and I don't want to type myobj=resize(myobj,10) everywhere
I don't get it. Why should it be out of reach? If you have the instance, you can invoke its methods.
In general, how do I ensure that a function affects the passed object
By writing code inside the function that affects the passed-in object, instead of re-assigning to the name.
and does not create a copy?
A copy is never created unless you ask for one.
Python "variables" are names for things. They don't store objects; they refer to objects. However, unlike C++ references, they can be made to refer to something else.
When you write
def change(parameter):
    parameter = 42
x = 23
change(x)
# x is still 23
The reason x is still 23 is not because a copy was made, because a copy wasn't made. The reason is that, inside the function, parameter starts out as a name for the passed-in integer object 23, and then the line parameter = 42 causes parameter to stop being a name for 23, and start being a name for 42.
If you do
def change(parameter):
    parameter.append(42)
x = [23]
change(x)
# now x is [23, 42]
The passed-in parameter changes, because .append on a list changes the actual list object.
I can't make resize a member method since the myobj implementation is out of reach
That doesn't matter. When Python compiles, there is no type-checking step, and there is no step to look up the implementation of a method to insert the call. All of that is handled when the code actually runs. The code will get to the point myobj.resize(), look for a resize attribute of whatever object myobj currently refers to (after all, it can't know ahead of time even what kind of object it's dealing with; variables don't have types in Python but instead objects do), and attempt to call it (throwing the appropriate exceptions if (a) the object turns out not to have that attribute; (b) the attribute turns out not to actually be a method or other sort of function).
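The run-time lookup described above, including both failure cases, can be sketched like this. Box and its attributes are hypothetical; the errors list only records which exception type was raised:

```python
class Box:
    def __init__(self):
        self.size = 5
        self.label = "not callable"

    def resize(self, new_size):
        self.size = new_size


box = Box()
box.resize(10)             # attribute found at run time and called
print(box.size)            # 10

errors = []
try:
    box.shrink(1)          # (a) the object has no such attribute
except AttributeError as exc:
    errors.append(type(exc).__name__)

try:
    box.label()            # (b) the attribute exists but is not callable
except TypeError as exc:
    errors.append(type(exc).__name__)

print(errors)              # ['AttributeError', 'TypeError']
```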
Also, how can I make it so that I can access those attributes in a function to which i pass the object and the attribute name? / getattr does not work either
Certainly it works if you use it properly. It is not a method; it is a built-in top-level function. Same thing with setattr.