In pandas, the inplace parameter makes a method modify the object itself, but as far as I know, Python passes data by value, not by reference. I want to know how this is implemented, or how it works.
Python’s argument-passing model is neither “pass by value” nor “pass by reference”; it is “pass by object reference”.
When you pass a dictionary to a function and modify that dictionary inside the function, the changes will reflect on the dictionary everywhere.
However, here we are dealing with something even less ambiguous. When passing inplace=True to a method call on a pandas object (be it a Series or a DataFrame), we are simply saying: change the current object instead of getting me a new one. Method calls can modify variables of the instances on which they were called - this is independent of whether a language is "call by value" or "call by reference". The only case in which this would get tricky is if a language only had constants (think val) and no variables (think var) - think purely functional languages. Then, it's true - you can only return new objects and can't modify any old ones. In practice, though, even in purest of languages you can find ways to update records in-place.
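As a rough illustration of the idea (this is not pandas' actual implementation), a method can either mutate the instance it was called on or build a new object, depending on a flag. The `Table` class and its `drop_negatives` method are hypothetical names made up for this sketch.

```python
class Table:
    """Hypothetical stand-in for a pandas-like container (not pandas itself)."""

    def __init__(self, values):
        self.values = list(values)

    def drop_negatives(self, inplace=False):
        kept = [v for v in self.values if v >= 0]
        if inplace:
            # Mutate the instance the method was called on and return None.
            self.values = kept
            return None
        # Otherwise leave self untouched and hand back a new object.
        return Table(kept)


t = Table([1, -2, 3])
new_t = t.drop_negatives()                # t is unchanged, new_t is a new object
result = t.drop_negatives(inplace=True)   # t itself is modified, result is None
```

No special argument-passing rule is involved here: the method simply assigns to an attribute of `self`.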
Related
Python doesn't seem to have a valid const qualifier, per "How do I create a constant in Python?". What would be the most "pythonic" way of differentiating read-only and mutable function parameters? Should I just point it out in the comments?
# my_graph is READ-ONLY
# items with property X are added to my_set ...
def my_lovely_function(my_graph, my_set):
    ...
In C and several other languages, real outputs are typically communicated via inputs that are passed by pointer or by reference. The reason is that the return-value mechanism in many of these paradigms has been hijacked for error handling: the return value is a success/error code, while useful outputs are written through a pointer or reference argument. This has led to the need to mark some inputs as untouchable (const) and others as touchable, to prevent confusion when using the function.
Python typically doesn't want you to do things that way. It wants you to use exceptions and exception handling for error handling, and to use return statements for actual outputs. This is cleaner and more in line with the original idea of return values before they were hijacked by error handling.
In some cases, it is still more convenient to use a mutable input to transfer data out. Arguments in Python are always passed as references to objects. This is fine unless the calling context doesn't want you, the function, to modify the object it provided as an input.
Python's solution is to 1) expect the function writer to properly document inputs, outputs, and side-effects on mutable inputs, and 2) provide the calling context the option of passing in immutable objects if they want to ensure those objects will not be changed.
So if you have a list and you don't want some function you call to add or remove things from it, pass in the information as a tuple instead. No function will be able to add anything to or remove anything from your tuple, although it might be able to change elements of the tuple if those are mutable. Instead of a set, pass a frozenset. There is no built-in immutable dict type, but you can get around that by passing a copy or translating it to a frozenset of tuples. Strings, ints, floats, and complex numbers are all immutable. Note that mutable objects embedded in immutable containers can still be changed. If this is undesired, then make sure they are immutable as well. Alternatively, if you are paranoid, you can call copy.deepcopy() on an object to make a totally independent (recursive) copy to pass into the function. Any changes at any nested level of this deep copy will not affect the original object.
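A small sketch of these defensive options follows. The function `careless` is a hypothetical example of a callee that tries to modify its input; the rest uses only the standard library.

```python
import copy


def careless(items):
    """Hypothetical function that tries to modify whatever it is given."""
    try:
        items.append("extra")
    except AttributeError:
        # Tuples have no append(), so the attempt fails.
        return False
    return True


original = [1, 2, [3, 4]]

# Passing a tuple prevents the callee from adding or removing elements...
mutated = careless(tuple(original))

# ...but a mutable element inside the tuple can still be changed.
as_tuple = tuple(original)
as_tuple[2].append(5)          # this also changes original[2]

# A deep copy is independent at every nesting level.
independent = copy.deepcopy(original)
independent[2].append(6)       # original is not affected
```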
When writing a function, it should be clear from the documentation (preferably docstring) or the code itself what the return values and side effects on mutable objects are. Using docstrings to capture this when writing a function is best practice.
When calling a function, you should defensively make use of immutable types (or deep copying if need be) as needed for your specific circumstances.
I understand that the dot operator is accessing the method specific to an object that is an instance of the class containing that method/function. However, in which cases do you instead call the function directly on an object, in the form func(obj) as opposed to obj.func()?
Can both techniques always be implemented (at least in custom code) or are there certain cases in which the former should be used over the latter, and vice versa?
I had previously read that the form func(obj) is for processing data that the object holds, but why would this not be possible with obj.dataMember.func()? Is there an advantage to passing just the object, such as some change in mutability?
If the function exists exclusively to serve that object type, then you should probably make it a method of the class; that requires the obj.func() syntax.
If the function will also work on objects that are not of that one class, then you should make it a regular function and handle the generalization and discrimination between types within the function. This requires the syntax func(obj).
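A minimal sketch of that distinction, using made-up names (`Circle`, `area`, and `describe` are hypothetical and chosen only for illustration):

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        # Only meaningful for circles, so it lives on the class: obj.func()
        return math.pi * self.radius ** 2


def describe(obj):
    # Works for any object, so it is a plain function: func(obj)
    return f"{type(obj).__name__} instance"


circle = Circle(2)
circle_area = circle.area()
circle_text = describe(circle)
list_text = describe([1, 2, 3])
```

The built-in `len(obj)` follows the same reasoning: it is one function that works across many unrelated types.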
I am coding in Python trying to decide whether I should return a numpy array (the result of a diff on some other array) or return numpy.where(diff)[0], which is a smaller array but requires that little extra work to create. Let's call the method where this happens methodB.
I call methodB from methodA. The rub is that I won't necessarily always need the where() result in methodA, but I might. So is it worth doing this work inside methodB, or should I pass back the (much larger memory-wise) diff itself and then only process it further in methodA if needed? That would be the more efficient choice assuming methodA just gets a reference to the result.
So, are function results ever not copied when they are passed back to the code that called that function?
I believe that when methodB finishes, all the memory in its frame will be reclaimed by the system, so methodA has to actually copy anything returned by methodB into its own frame in order to be able to use it. I would call this "return by value". Is this correct?
Yes, you are correct. In Python, arguments are always passed by value, and return values are always returned by value. However, the value being returned (or passed) is a reference to a potentially shared, potentially mutable object.
There are some types for which the value being returned or passed may be the actual object itself, e.g. this is the case for integers, but the difference between the two can only be observed for mutable objects which integers aren't, and de-referencing an object reference is completely transparent, so you will never notice the difference. To simplify your mental model, you may just assume that arguments and return values are always passed by value (this is true anyhow), and that the value being passed is always a reference (this is not always true, but you cannot tell the difference, you can treat it as a simple performance optimization).
Note that passing / returning a reference by value is in no way similar (and certainly not the same thing) as passing / returning by reference. In particular, it does not allow you to mutate the name binding in the caller / callee, as pass-by-reference would allow you to.
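The difference can be seen in a short sketch (the function names `rebind` and `mutate` are made up for this example): rebinding the parameter name inside a function has no effect on the caller, while mutating the shared object does.

```python
def rebind(items):
    # Rebinds the local name only; the caller's variable is unaffected.
    items = [99]


def mutate(items):
    # Mutates the shared object; the caller sees the change.
    items.append(99)


data = [1, 2]

rebind(data)
after_rebind = list(data)    # snapshot of data after rebind()

mutate(data)
after_mutate = list(data)    # snapshot of data after mutate()
```

With true pass-by-reference, `rebind` would have replaced the caller's `data` with `[99]`; in Python it does not.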
This particular flavor of pass-by-value, where the value is typically a reference, is the same in e.g. ECMAScript, Ruby, Smalltalk, and Java, and is sometimes called "call by object sharing" (coined by Barbara Liskov, I believe), "call by sharing", "call by object", and specifically within the Python community "call by assignment" (thanks to @timgeb) or "call by name-binding" (thanks to @Terry Jan Reedy) (not to be confused with call by name, which is again a different thing).
Assignment never copies data. If you have a function foo that returns a value, then an assignment like result = foo(arg) never copies any data. (You could, of course, have copy-operations in the function's body.) Likewise, return x does not copy the object x.
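One way to check this, as a sketch: record the identity of the object inside the function and compare it with what the caller receives. The function name `make_big_list` and the `last_id` attribute are made up for the example.

```python
def make_big_list():
    result = list(range(1_000_000))
    # Remember the identity of the object just before returning it.
    make_big_list.last_id = id(result)
    return result


x = make_big_list()

# The caller holds the very same object that was built inside the function.
same_object = id(x) == make_big_list.last_id

# Plain assignment only binds another name to that object.
y = x
```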
Your question lacks a specific example, so I can't go into more detail.
Edit: You should probably watch the excellent talk "Facts and Myths about Python Names and Values".
So roughly your code is:
def methodA(arr):
    x = methodB(arr)
    ...

def methodB(arr):
    diff = somefn(arr)
    # return diff or
    # return np.where(diff)[0]
arr is a (large) array that is passed as a reference to methodA and methodB. No copies are made.
diff is a similarly sized array that is generated in methodB. If it is returned, it will be referenced in the methodA namespace by x. No copy is made in returning it.
If the where array is returned, diff disappears when methodB returns. Assuming it doesn't share a data buffer with some other array (such as arr), all the memory that it occupied is recovered.
But as long as memory isn't tight, returning diff instead of the where result won't be more expensive. Nothing is copied during the return.
A numpy array consists of a small object wrapper with attributes like shape and dtype. It also has a pointer to a potentially large data buffer. Where possible, numpy tries to share buffers, but it readily makes new ndarray objects. Thus there's an important distinction between a view and a copy.
I see what I missed now: Objects are created on the heap, but function frames are on the stack. So when methodB finishes, its frame will be reclaimed, but that object will still exist on the heap, and methodA can access it with a simple reference.
I know that in Python, because it's pass-by-sharing, if I pass a mutable object (like a list) to a function, and then use that function to mutate it, I don't need to explicitly pass it back, because the caller can see the changes:
def add_to_list(list_of_nums):
    list_of_nums.append(26)

my_list = [12]
add_to_list(my_list)
print(my_list)  # [12, 26]
So this works. But is it a good idea/good python practice? My gut says it's not (the same way global variables are almost always a bad idea), but maybe that's just because I first learned C++ in all its pass-by-value glory.
And yes, I know that I can code my way around this (say by creating a class), but the question is, should I, or is this generally seen as acceptable practice?
I think whether or not this is "acceptable" will be determined by the context and how well the function is named. Keep PEP 20 in mind: "Explicit is better than implicit."
Python fully supports a procedural style built from plain functions, and in that style, modifying objects that are passed in is often expected. If the function is named appropriately and documented well, I think it's fine. Your example illustrates this pretty well. The function is called add_to_list, which pretty explicitly says what the function does.
If your program/script takes more of an object-oriented approach, modifying passed-in objects should be replaced by creating the appropriate classes instead, like the native list class in your example - it has an append() method that modifies the list instead of passing the list into a separate function.
The key is to be consistent with your paradigm and well documented. If you cover both of those bases, I think it's acceptable.
I would like to write a Python function that mutates one of the arguments (which is a list, ie, mutable). Something like this:
def change(array):
    array.append(4)

change(array)
I'm more familiar with passing by value than Python's setup (whatever you decide to call it). So I would usually write such a function like this:
def change(array):
    array.append(4)
    return array

array = change(array)
Here's my confusion. Since I can just mutate the argument, the second method would seem redundant. But the first one feels wrong. Also, my particular function will have several parameters, only one of which will change. The second method makes it clear what argument is changing (because it is assigned to the variable). The first method gives no indication. Is there a convention? Which is 'better'? Thank you.
The first way:
def change(array):
    array.append(4)

change(array)
is the most idiomatic way to do it. Generally, in Python, we expect a function to either mutate its arguments or return something [1]. The reason is that if a function doesn't return anything, then it is abundantly clear that the function must have had some side effect in order to justify its existence (e.g. mutating the inputs).
On the flip side, if you do things the second way:
def change(array):
    array.append(4)
    return array

array = change(array)
you're vulnerable to hard-to-track-down bugs where a mutable object changes all of a sudden when you didn't expect it to -- "But I thought change made a copy"...
[1] Technically every function returns something; that _something_ just happens to be None ...
The convention in Python is that functions either mutate something, or return something, not both.
If both are useful, you conventionally write two separate functions, with the mutator named for an active verb like change, and the non-mutator named for a participle like changed.
Almost everything in builtins and the stdlib follows this pattern. The list.append method you're calling returns nothing. Same with list.sort—but sorted leaves its argument alone and instead returns a new sorted copy.
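The built-in pair mentioned above can be checked directly. This short sketch uses only `list.sort` and `sorted`; the variable names are arbitrary.

```python
nums = [3, 1, 2]

new_list = sorted(nums)          # returns a new list, nums is untouched
nums_after_sorted = list(nums)   # snapshot: still in the original order

returned = nums.sort()           # sorts nums in place and returns None
```

After this runs, `new_list` and `nums` are both sorted but are different objects, and `returned` is `None`, so accidentally writing `nums = nums.sort()` would replace the list with `None`.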
There are a handful of exceptions: some of the special methods (e.g., __iadd__ is supposed to mutate and then return self), a few cases where there clearly has to be one thing getting mutated and a different thing getting returned (like list.pop), and libraries that are attempting to use Python as a sort of domain-specific language, where being consistent with the target domain's idioms is more important than being consistent with Python's idioms (e.g., some SQL query expression libraries). Like all conventions, this one is followed unless there's a good reason not to.
So, why was Python designed this way?
Well, for one thing, it makes certain errors obvious. If you expected a function to be non-mutating and return a value, it'll be pretty obvious that you were wrong, because you'll get an error like AttributeError: 'NoneType' object has no attribute 'foo'.
It also makes conceptual sense: a function that returns nothing must have side-effects, or why would anyone have written it?
But there's also the fact that each statement in Python mutates exactly one thing—almost always the leftmost object in the statement. In other languages, assignment is an expression, mutating functions return self, and you can chain up a whole bunch of mutations into a single line of code, and that makes it harder to see the state changes at a glance, reason about them in detail, or step through them in a debugger.
Of course all of this is a tradeoff—it makes some code more verbose in Python than it would be in, say, JavaScript—but it's a tradeoff that's deeply embedded in Python's design.
It hardly ever makes sense to both mutate an argument and return it. Not only might it cause confusion for whoever's reading the code, but it leaves you susceptible to the mutable default argument problem. If the only way to get the result of the function is through the mutated argument, it won't make sense to give the argument a default.
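The mutable default argument problem mentioned here can be reproduced in a few lines. The function `append_four` is a hypothetical variant of `change` with a default added, written only to show the pitfall.

```python
def append_four(array=[]):
    # The default list is created once, when the function is defined,
    # and is shared by every call that omits the argument.
    array.append(4)
    return array


first = append_four()
second = append_four()   # appends to the same default list again
```

Both names end up bound to the same list, which now contains two 4s, even though each call looks like it starts from an empty list.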
There is a third option that you did not show in your question. Rather than mutating the object passed as the argument, make a copy of that argument and return it instead. This makes it a pure function with no side effects.
def change(array):
    array_copy = array[:]
    array_copy.append(4)
    return array_copy

array = change(array)
From the Python documentation:
Some operations (for example y.append(10) and y.sort()) mutate the
object, whereas superficially similar operations (for example y = y +
[10] and sorted(y)) create a new object. In general in Python (and in
all cases in the standard library) a method that mutates an object
will return None to help avoid getting the two types of operations
confused. So if you mistakenly write y.sort() thinking it will give
you a sorted copy of y, you’ll instead end up with None, which will
likely cause your program to generate an easily diagnosed error.
However, there is one class of operations where the same operation
sometimes has different behaviors with different types: the augmented
assignment operators. For example, += mutates lists but not tuples or
ints (a_list += [1, 2, 3] is equivalent to a_list.extend([1, 2, 3])
and mutates a_list, whereas some_tuple += (1, 2, 3) and some_int += 1
create new objects).
Basically, by convention, a function or method that mutates an object does not return the object itself.
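The augmented-assignment case described in the quoted passage can be observed by keeping a second name bound to the original object. The alias names below are made up for the sketch.

```python
a_list = [0]
list_alias = a_list
a_list += [1, 2, 3]          # mutates in place; list_alias sees the change

some_tuple = (0,)
tuple_alias = some_tuple
some_tuple += (1, 2, 3)      # builds a new tuple; tuple_alias is unchanged
```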