I am a beginner in Python and I find this aspect of mutability quite confusing and unintuitive. Given a list:
lst = [[1, 2, 3], [4, 5, 6]]
And trying to change the list within a for-loop.
for i in lst:
    i = "test"
I understand that this does not change the list. But:
for i in lst:
    i[1] = "test"
I was surprised that referring to the sublist led to the following outcome:
[[1, 'test', 3], [4, 'test', 6]]
I tried to understand with a visualizer but I don't get it. Would anybody please explain this in plain words? Thank you.
In the first case, i is simply a copied reference to the element:
i ---> lst[n]
Assigning to i only points that reference somewhere else. In the latter case, you are dereferencing the reference and changing the data itself (not a copy):
i[1] ---> lst[n][1]
Therefore assigning to i[1] changes the actual mutable object, not a copy of it.
Assignment (which is what the = operator does) assigns a value to a name.
i is a variable name in the local namespace. At the time it runs inside the for loop, it refers to a list. By assigning a string to it, you cause the name to no longer refer to a list, but instead refer to the string. The list is unaffected -- the only thing that changes is the value associated with the name.
i[1] is a name that specifies a specific location inside one of the two lists, depending on what i is set to at the time. By assigning a string to it, you cause the name to no longer refer to the object that previously occupied that space inside the list (an integer, 2 or 5 depending on the list) and instead refer to the string. The integer is unaffected -- the only thing that changes is the value associated with the name.
So in each case, assignment is doing the same thing -- it's making a name refer to a new thing instead of the old thing it referred to. The difference is that the second case is a special name in that it refers to a property of a mutable object.
for does not make copies of each element it yields. As such, the yielded object retains all the properties of the original (since it is the original), including mutability.
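A minimal sketch of the distinction described above, using the list from the question. The names `unchanged` and `same_object` are only illustrative. The `is` check suggests that the loop variable is the very same object as the list element, which is why mutating it shows up in the list while rebinding it does not.

```python
lst = [[1, 2, 3], [4, 5, 6]]

# Rebinding: the name `i` is pointed at a new string; the list is untouched.
for i in lst:
    i = "test"
unchanged = lst == [[1, 2, 3], [4, 5, 6]]

# No copy is made: each yielded item is the same object as the list element.
same_object = all(item is lst[n] for n, item in enumerate(lst))

# Mutation: item assignment changes the sublist that `i` refers to.
for i in lst:
    i[1] = "test"

print(unchanged, same_object, lst)
```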
The for loop behaves differently in the two cases. In
for i in lst:
i refers to each element lst[n] directly, one after another. That is why, in the first case, "test" was simply assigned to i two times (the length of lst is just 2) and the list itself did not change.
In the second case, i[1] means:
i ------ lst[n]
i[1] ------ lst[n][1]
The first iteration sets lst[0][1], giving [1, 'test', 3].
The second iteration sets lst[1][1], giving [4, 'test', 6].
2 part question. Given the following code, my understanding is that i is a variable created within the loop function. As such, the changes have local scope and should not carry over ‘outside’ the function.
list = [3,4,5,6,7]
for element in list:
    i = element * 100
    print(i)
print(i)
printing i within the loop will produce the correct mathematical change. However, the second print(i) outside the loop returns 700. This technically refers to the correct reassignment of element 7 from the original list. So if changes in the loop only exist within the loop, why is it that this last one carried over ‘outside the loop’?
Furthermore, why is it that print(i) outside the loop returns the change to the last element? Why not the first element? Why not all of them? Is there some function I can call outside the loop to see the changes applied to elements 3,4,5,6?
Part 2 of my question - I know for a change to apply outside a loop, you should target the element via its index itself. Eg use for ‘element’ in range(len(list)). But can one also do this with enumerate? If so, how?
It seems that enumerate returns an object in the form of a tuple (it adds an ‘index counter’ as the first element, and keeps list as the 2nd element). And since tuples are immutable it would seem there is no way to effect a change on a global scope, is that correct?
For example, when I run the following code:
my_list = [1,2,100]
for xyz in enumerate(my_list):
    xyz = 2 * xyz
print(xyz)
All it does is return the final element of my_list, with its index counter, concatenated to itself ('doubled'). E.g. (2, 100) has become (2, 100, 2, 100). So is there no way to use enumerate to change elements within the original list?
To your first question, let's go through your misunderstandings one by one:
i is a variable created within the loop function
No. There is no "loop function" here. (User #0x5453 already mentioned this briefly in a comment.) The for-loop is just a language construct to facilitate iteration. Thus, i is just a variable that happens to be created, assigned, and re-assigned multiple times throughout the iteration of the for-loop.
the changes have local scope and should not carry over ‘outside’
See above. There is no other scope here. All those lines of code, starting with the list assignment and ending with that print(i) after the loop are all in the same scope. This should answer the following of your questions:
why is it that this last [change] carried over ‘outside the loop’?
why is it that print(i) outside the loop returns the change to the last element?
The variable i was re-assigned multiple times. In the last iteration of the for-loop, the value assigned to i was 7 * 100, i.e. 700. Then the loop ends, and nothing else happens to i, so it still holds the value 700.
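To see every intermediate value after the loop, rather than only the last one, the values have to be stored somewhere. A minimal sketch, assuming a list named `results` (not in the original code) is acceptable for collecting them, and using `numbers` instead of `list` to avoid shadowing the built-in:

```python
numbers = [3, 4, 5, 6, 7]
results = []  # hypothetical container for every value assigned to i

for element in numbers:
    i = element * 100
    results.append(i)  # keep each value instead of letting it be overwritten

# Same scope as the loop: i still holds the value from the last iteration.
print(i)
print(results)
```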
To the second question:
for a change to apply outside a loop, you should target the element via its index itself
If you are referring to your list, then the loop has nothing to do with that at all. If you want to change an element of the list, then yes, re-assignment only works by targeting the index of that element in the list:
my_list = [1, 2, 100]
my_list[2] = 420
print(my_list) # output: [1, 2, 420]
can one also do this with enumerate?
No, enumerate is something else entirely. Calling enumerate on my_list returns an iterator, which can then be used in a for-loop just like the original list can, but each element that iterator produces is (in its simplest form) a 2-tuple, where the second element is an element from my_list and the first element is the index of that element.
Try this:
for tup in enumerate(my_list):
    print(tup[0], tup[1])
This will give you:
0 1
1 2
2 420
Since tuples are iterables, they support unpacking, which is why it is common to go for more readability and instead do this:
for idx, element in enumerate(my_list):
    print(idx, element)
The output is the same.
Nothing about this changes my_list. All these operations do, is iterate over it in some way, which just produces elements from it one by one.
Now, you do this:
for xyz in enumerate(my_list):
    xyz = 2 * xyz
Since xyz is just that index-element-tuple produced by enumerate, multiplying it by 2 just concatenates it with itself, as you correctly noted, creating a new tuple and re-assigning it to xyz. In the next iteration it gets overwritten again by what enumerate produces, then re-assigned again by you and so on. Again, none of that changes my_list, it just changes that xyz variable.
If I understood you correctly, you want to be able to mutate your list within a loop over its elements. While this can quickly lead to dangerous territory, a simple working example would be this:
for idx, element in enumerate(my_list):
    my_list[idx] = 2 * element
print(my_list)
The output:
[2, 4, 840]
Now we actually change/re-assign specific elements in that list. As before, assignment only works with an index. But the index is what we get as the first element of the 2-tuple provided by enumerate in this case.
However, there are sometimes (arguably) more elegant ways to accomplish the same thing. For instance, we could use list comprehension to achieve the same result:
my_list = [element * 2 for element in my_list]
Here we overwrite my_list with a new one that we created through list comprehension iterating over the original list.
Hope this helps clear up some misconceptions.
I've noticed that many operations on lists that modify the list's contents will return None, rather than returning the list itself. Examples:
>>> mylist = ['a', 'b', 'c']
>>> empty = mylist.clear()
>>> restored = mylist.extend(range(3))
>>> backwards = mylist.reverse()
>>> with_four = mylist.append(4)
>>> in_order = mylist.sort()
>>> without_one = mylist.remove(1)
>>> mylist
[0, 2, 4]
>>> [empty, restored, backwards, with_four, in_order, without_one]
[None, None, None, None, None, None]
What is the thought process behind this decision?
To me, it seems hampering, since it prevents "chaining" of list processing (e.g. mylist.reverse().append('a string')[:someLimit]). I imagine it might be that "The Powers That Be" decided that list comprehension is a better paradigm (a valid opinion), and so didn't want to encourage other methods - but it seems perverse to prevent an intuitive method, even if better alternatives exist.
This question is specifically about Python's design decision to return None from mutating list methods like .append. Novices often write incorrect code that expects .append (in particular) to return the same list that was just modified.
For the simple question of "how do I append to a list?" (or debugging questions that boil down to that problem), see Why does "x = x.append([i])" not work in a for loop?.
To get modified versions of the list, see:
For .sort: How can I get a sorted copy of a list?
For .reverse: How can I get a reversed copy of a list (avoid a separate statement when chaining a method after .reverse)?
The same issue applies to some methods of other built-in data types, e.g. set.discard (see How to remove specific element from sets inside a list using list comprehension) and dict.update (see Why doesn't a python dict.update() return the object?).
The same reasoning applies to designing your own APIs. See Is making in-place operations return the object a bad idea?.
The general design principle in Python is for functions that mutate an object in-place to return None. I'm not sure it would have been the design choice I'd have chosen, but it's basically to emphasise that a new object is not returned.
Guido van Rossum (our Python BDFL) states the design choice on the Python-Dev mailing list:
I'd like to explain once more why I'm so adamant that sort() shouldn't
return 'self'.
This comes from a coding style (popular in various other languages, I
believe especially Lisp revels in it) where a series of side effects
on a single object can be chained like this:
x.compress().chop(y).sort(z)
which would be the same as
x.compress()
x.chop(y)
x.sort(z)
I find the chaining form a threat to readability; it requires that the
reader must be intimately familiar with each of the methods. The
second form makes it clear that each of these calls acts on the same
object, and so even if you don't know the class and its methods very
well, you can understand that the second and third call are applied to
x (and that all calls are made for their side-effects), and not to
something else.
I'd like to reserve chaining for operations that return new values,
like string processing operations:
y = x.rstrip("\n").split(":").lower()
There are a few standard library modules that encourage chaining of
side-effect calls (pstat comes to mind). There shouldn't be any new
ones; pstat slipped through my filter when it was weak.
I can't speak for the developers, but I find this behavior very intuitive.
If a method works on the original object and modifies it in-place, it doesn't return anything, because there is no new information - you obviously already have a reference to the (now mutated) object, so why return it again?
If, however, a method or function creates a new object, then of course it has to return it.
So l.reverse() returns nothing (because now the list has been reversed, but the identifier l still points to that list), but reversed(l) has to return the newly created object (an iterator over the elements in reverse order) because l still points to the old, unmodified list.
EDIT: I just learned from another answer that this principle is called Command-Query separation.
One could argue that the signature itself makes it clear that the function mutates the list rather than returning a new one: if the function returned a list, its behavior would have been much less obvious.
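A short sketch of that separation with built-ins only; the variable names are illustrative. Functions that answer a query return a new object, while methods that carry out a command mutate in place and return None.

```python
letters = ['c', 'a', 'b']

# Query-style functions build and return a new object; `letters` is untouched.
ordered = sorted(letters)
backwards = list(reversed(letters))  # reversed() gives an iterator

# Command-style methods mutate in place and return None.
result = letters.sort()

print(ordered, backwards, result, letters)
```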
If you were sent here after asking for help fixing your code:
In the future, please try to look for problems in the code yourself, by carefully studying what happens when the code runs. Rather than giving up because there is an error message, check the result of each calculation, and see where the code starts working differently from what you expect.
If you had code calling a method like .append or .sort on a list, you will notice that the return value is None, while the list is modified in place. Study the example carefully:
>>> x = ['e', 'x', 'a', 'm', 'p', 'l', 'e']
>>> y = x.sort()
>>> print(y)
None
>>> print(x)
['a', 'e', 'e', 'l', 'm', 'p', 'x']
y got the special None value, because that is what was returned. x changed, because the sort happened in place.
It works this way on purpose, so that code like x.sort().reverse() breaks. See the other answers to understand why the Python developers wanted it that way.
To fix the problem
First, think carefully about the intent of the code. Should x change? Do we actually need a separate y?
Let's consider .sort first. If x should change, then call x.sort() by itself, without assigning the result anywhere.
If a sorted copy is needed instead, use y = sorted(x) (sorted is a built-in function, not a list method). See How can I get a sorted copy of a list? for details.
For other methods, we can get modified copies like so:
.clear -> there is no point to this; a "cleared copy" of the list is just an empty list. Just use y = [].
.append and .extend -> probably the simplest way is to use the + operator. To add multiple elements from a list l, use y = x + l rather than .extend. To add a single element e wrap it in a list first: y = x + [e]. Another way in 3.5 and up is to use unpacking: y = [*x, *l] for .extend, y = [*x, e] for .append. See also How to allow list append() method to return the new list for .append and How do I concatenate two lists in Python? for .extend.
.reverse -> First, consider whether an actual copy is needed. The built-in reversed gives you an iterator that can be used to loop over the elements in reverse order. To make an actual copy, simply pass that iterator to list: y = list(reversed(x)). See How can I get a reversed copy of a list (avoid a separate statement when chaining a method after .reverse)? for details.
.remove -> Figure out the index of the element that will be removed (using .index), then use slicing to find the elements before and after that point and put them together. As a function:
def without(a_list, value):
    index = a_list.index(value)
    return a_list[:index] + a_list[index+1:]
(We can translate .pop similarly to make a modified copy, though of course .pop actually returns an element from the list.)
See also A quick way to return list without a specific element in Python.
(If you plan to remove multiple elements, strongly consider using a list comprehension (or filter) instead. It will be much simpler than any of the workarounds needed for removing items from the list while iterating over it. This way also naturally gives a modified copy.)
For any of the above, of course, we can also make a modified copy by explicitly making a copy and then using the in-place method on the copy. The most elegant approach will depend on the context and on personal taste.
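The copy-then-mutate approach can be sketched as follows. The helper name `with_appended` is hypothetical and only for illustration; the same pattern should work for any of the in-place methods.

```python
def with_appended(a_list, value):
    """Return a new list with `value` added; `a_list` is left unchanged."""
    result = a_list.copy()   # explicit shallow copy
    result.append(value)     # in-place method applied to the copy only
    return result


x = [3, 1, 2]
y = with_appended(x, 4)
z = sorted(x)  # built-in that already returns a modified copy

print(x, y, z)
```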
As we know, a list in Python is a mutable object, and one characteristic of a mutable object is that its state can be modified without assigning the new state to a variable. Let us look at this topic more closely to understand the root of the issue.
An object whose internal state can be changed is mutable. An immutable object, on the other hand, does not allow any change once it has been created. Mutability is a property of the object's type; it is separate from Python being dynamically typed.
Every object in python has three attributes:
Identity – This identifies the object uniquely for its lifetime; in CPython it is the object's address in the computer's memory.
Type – This refers to the kind of object that is created. For example integer, list, string etc.
Value – This refers to the value stored by the object. For example str = "a".
While the identity and type of an object cannot be changed once it is created, the value can be changed for mutable objects.
Let us go through the code below step by step to see what this means in Python:
Creating a list which contains name of cities
cities = ['London', 'New York', 'Chicago']
Printing the location of the object created in the memory address in hexadecimal format
print(hex(id(cities)))
Output [1]: 0x1691d7de8c8
Adding a new city to the list cities
cities.append('Delhi')
Printing the elements from the list cities, separated by a comma
for city in cities:
    print(city, end=', ')
Output [2]: London, New York, Chicago, Delhi, 
Printing the location of the object created in the memory address in hexadecimal format
print(hex(id(cities)))
Output [3]: 0x1691d7de8c8
The above example shows us that we were able to change the internal state of the object cities by adding one more city 'Delhi' to it, yet, the memory address of the object did not change. This confirms that we did not create a new object, rather, the same object was changed or mutated. Hence, we can say that the object which is a type of list with reference variable name cities is a MUTABLE OBJECT.
An immutable object's internal state, by contrast, cannot be changed. For instance, consider the code below and the error message it produces when trying to change the value of a tuple at index 0:
Creating a Tuple with variable name foo
foo = (1, 2)
Changing the index 0 value from 1 to 3
foo[0] = 3
TypeError: 'tuple' object does not support item assignment
We can conclude from these examples why an operation on a mutable object need not return anything: it modifies the internal state of the object directly, so there is no point in returning a new, modified object. An operation on an immutable object, by contrast, must return a new object holding the modified state.
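The contrast can be sketched side by side. The variable names here are illustrative and not from the examples above; the tuple is "changed" by building a new one rather than by item assignment.

```python
name = "delhi"
upper_name = name.upper()   # returns a new str; `name` is unchanged

foo = (1, 2)
bar = (3,) + foo[1:]        # build a new tuple instead of assigning foo[0]

cities = ['London']
returned = cities.append('Delhi')  # mutates in place and returns None

print(name, upper_name, foo, bar, cities, returned)
```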
First of all, I should say that what I am suggesting is without a doubt bad programming practice, but if you want to use append in a lambda function and you don't care about code readability, there is a way to do just that.
Imagine you have a list of lists and you want to append an element to each inner list using map and lambda. Here is how you can do that:
my_list = [[1, 2, 3, 4],
[3, 2, 1],
[1, 1, 1]]
my_new_element = 10
new_list = list(map(lambda x: [x.append(my_new_element), x][1], my_list))
print(new_list)
How it works:
When the lambda computes its output, it first has to evaluate the expression [x.append(my_new_element), x]. Evaluating it runs append, so the expression becomes [None, x]; taking the second element, [None, x][1], gives x (the list that was just appended to).
Using a custom function is more readable and the better option:
def append_my_list(input_list, new_element):
    input_list.append(new_element)
    return input_list
my_list = [[1, 2, 3, 4],
[3, 2, 1],
[1, 1, 1]]
my_new_element = 10
new_list = list(map(lambda x: append_my_list(x, my_new_element), my_list))
print(new_list)
This problem is very simple to appreciate, here is the program -
hisc = [1,2,3,4]
print("\n", hisc)
ohisc = hisc
hisc.append(5)
print("\nPreviously...", ohisc)
print("\nAnd now...", hisc)
input("\nETE")
When I run it ohisc gets the 5. Why does ohisc change? How can I stop it from changing?
Apologies if this is something obvious.
Python variables are references. As such, the assignment copies the reference rather than the content of the variable.
In order to avoid this, all you have to do is create a new object:
ohisc = list(hisc)
This is using the list constructor which creates a new list from a given iterable.
Alternatively you can also assign from a slice (which creates a new object):
ohisc = hisc[:]
[:] is the slice operator, which is used to extract a subset from a given collection. Here we simply leave out the start and end positions (they default to the beginning and end of the collection, respectively).
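One caveat worth sketching: both list(...) and [:] make a shallow copy, so for a flat list of numbers like hisc they are enough, but nested lists are still shared. The example below uses a hypothetical nested list and the standard copy module to show the difference.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = list(nested)         # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested.append([5, 6])    # affects only the original outer list
nested[0].append(99)     # the inner list is shared with the shallow copy

print(shallow)
print(deep)
```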
You definitely need to understand everything in Konrad Rudolph's answer. And I think in your specific case, that's what you want, too. But often there's a better way: If you avoid mutating objects (that is, changing them in-place), it never matters whether two names are referring to the same object or not. For example, you can change this:
hisc.append(5)
to this:
hisc = hisc + [5]
That doesn't change hisc in-place; it creates a new list, with the 5 added on to the end of it, and then assigns that to hisc. So, the fact that ohisc was pointing to the same list as hisc doesn't matter—that list is still there, unchanged, for ohisc to point to.
Let's say you wanted to replace all the negative values of the list with 0. That's pretty easy with mutation:
for i in range(len(lst)):
    lst[i] = max(lst[i], 0)
But even easier without:
lst = [max(elem, 0) for elem in lst]
Now, what if you wanted to remove every negative list element? You can't safely change the shape of a sequence while looping over it, so you have to either make a copy of the list (so you can loop over one copy while you change the other), or come up with a more complicated algorithm (e.g., swap each negative value backward and then remove them all at the end). But it's easy to do immutably:
lst = [elem for elem in lst if elem >= 0]
So, when would you ever want to mutate? Well, often you want two references to the same object, so when you update one, the other one sees the changes. In that case, you obviously have to have actual changes for the other one to see.
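A small sketch of that situation, with hypothetical names (`inventory`, `player`): mutation is seen through every reference to the list, whereas building a new list and rebinding one name leaves the other reference pointing at the old object.

```python
inventory = ['sword']
player = {'items': inventory}   # second reference to the same list

inventory.append('shield')      # mutation: visible through both references
seen_after_mutation = list(player['items'])

inventory = inventory + ['potion']   # rebinding: builds a new list
seen_after_rebinding = list(player['items'])

print(seen_after_mutation, seen_after_rebinding, inventory)
```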
Here's a good explanation of what is happening: Python: copying a list the right way
Basically, you're making a pointer to the list but what you want to do is make a copy of the list.
Try this instead:
hisc = [1,2,3,4]
ohisc = hisc[:]
My code:
locs = [ [1], [2] ]
for loc in locs:
    loc = []
print(locs)
# prints => [ [1], [2] ]
Why is loc not a reference to the elements of locs?
Python: everything is passed as a reference unless explicitly copied (is this not true?)
Please explain: how does Python decide between referencing and copying?
Update :
How can I do this?
def compute(ob):
    if isinstance(ob, list): return process_list(ob)
    if isinstance(ob, dict): return process_dict(ob)
for loc in locs:
    loc = compute(loc)  # What to change here to make loc a reference to the actual element of locs?
locs must contain the final processed response!
I don't want to use enumerate; is it possible without it?
Effbot (aka Fredrik Lundh) has described Python's variable passing style as call-by-object: http://effbot.org/zone/call-by-object.htm
Objects are allocated on the heap and pointers to them can be passed around anywhere.
When you make an assignment such as x = 1000, a dictionary entry is created that maps the string "x" in the current namespace to a pointer to the integer object containing one thousand.
When you update "x" with x = 2000, a new integer object is created and the dictionary is updated to point at the new object. The old one thousand object is unchanged (and may or may not be alive depending on whether anything else refers to the object).
When you do a new assignment such as y = x, a new dictionary entry "y" is created that points to the same object as the entry for "x".
Objects like strings and integers are immutable. This simply means that there are no methods that can change the object after it has been created. For example, once the integer object one-thousand is created, it will never change. Math is done by creating new integer objects.
Objects like lists are mutable. This means that the contents of the object can be changed by anything pointing to the object. For example, x = []; y = x; x.append(10); print(y) will print [10]. The empty list was created. Both "x" and "y" point to the same list. The append method mutates (updates) the list object (like adding a record to a database) and the result is visible to both "x" and "y" (just as a database update would be visible to every connection to that database).
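The two behaviours described above can be checked with `is`, which tests whether two names refer to the same object. This is only a sketch; the names `same_before` and `same_list` are illustrative.

```python
# Immutable object: assignment rebinds the name to a different object.
x = 1000
y = x                # both names refer to the same int object
same_before = x is y
x = 2000             # x now refers to a new int; the object 1000 is unchanged

# Mutable object: a method call changes the one shared object.
a = []
b = a
a.append(10)
same_list = a is b

print(same_before, x, y, same_list, b)
```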
Hope that clarifies the issue for you.
Everything in Python is passed and assigned by value, in the same way that everything is passed and assigned by value in Java. Every value in Python is a reference (pointer) to an object. Objects cannot be values. Assignment always copies the value (which is a pointer); two such pointers can thus point to the same object. Objects are never copied unless you're doing something explicit to copy them.
For your case, every iteration of the loop assigns an element of the list into the variable loc. You then assign something else to the variable loc. All these values are pointers; you're assigning pointers; but you do not affect any objects in any way.
It doesn't help in Python to think in terms of references or values. Neither is correct.
In Python, variables are just names. In your for loop, loc is just a name that points to the current element in the list. Doing loc = [] simply rebinds the name loc to a different list, leaving the original version alone.
But since in your example, each element is a list, you could actually mutate that element, and that would be reflected in the original list:
for loc in locs:
    loc[0] = loc[0] * 2
When you say
loc = []
you are rebinding the loc variable to a newly created empty list.
Perhaps you want
loc[:] = []
This replaces a slice of loc (which happens to be the whole list) with the contents of the empty list, emptying loc in place.
Everything is passed by object. Rebinding and mutating are different operations.
locs = [ [1], [2] ]
for loc in locs:
    del loc[:]
print(locs)
Why is loc not reference of elements of locs ?
It is. Or at least, it is in the same sense that every other variable in Python is. Python variables are names, not storage. loc is a name that is used to refer to elements of [[1], [2]], while locs is a name that refers to the entire structure.
loc = []
This does not mean "look at the thing that loc names, and cause it to turn into []". It cannot mean that, because Python objects are not capable of such a thing.
Instead, it means "cause loc to stop being a name for the thing that it's currently a name for, and start instead being a name for []". (Of course, it means the specific [] that's provided there, since in general there may be several objects in memory that are the same.)
Naturally, the contents of locs are unchanged as a result.
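For the update in the question (storing processed results back into locs without enumerate), one possible sketch is slice assignment on the outer list. The bodies of process_list and process_dict below are hypothetical stand-ins, since the question does not show them, and the final `return ob` fallback is likewise an assumption.

```python
def process_list(ob):
    # Hypothetical stand-in: the question does not show this function.
    return [item * 2 for item in ob]


def process_dict(ob):
    # Hypothetical stand-in: the question does not show this function.
    return {key: value * 2 for key, value in ob.items()}


def compute(ob):
    if isinstance(ob, list):
        return process_list(ob)
    if isinstance(ob, dict):
        return process_dict(ob)
    return ob  # assumed fallback for other types


locs = [[1], {'a': 2}]
alias = locs   # a second name for the same outer list

# Slice assignment replaces the contents of the existing list object,
# so every name referring to it sees the processed results.
locs[:] = [compute(loc) for loc in locs]

print(locs)
```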
How do I remove a character from an element in a list?
Example:
mylist = ['12:01', '12:02']
I want to remove the colon from the time stamps in a file, so I can more easily convert them to 24-hour time. Right now I am trying to loop over the elements in the list, search for the ones containing a colon, and do a substitution.
for num in mylist:
    re.sub(':', '', num)
But that doesn't seem to work.
Help!
The list comprehension solution is the most Pythonic one, but, there's an important twist:
mylist[:] = [s.replace(':', '') for s in mylist]
If you assign to mylist, the barename, as in the other answer, rather than to mylist[:], the "whole-list slice", as I recommend, you're really doing something very different than "replacing entries in the list": you're making a new list and just rebinding the barename that you were previously using to refer to the old list.
If that old list is being referred to by multiple names (including entries in containers), this rebinding doesn't affect any of those: for example, if you have a function which takes mylist as an argument, the barename assignment has any effect only locally to the function, and doesn't alter what the caller sees as the list's contents.
Assigning to the whole-list slice, mylist[:] = ..., alters the list object rather than mucking around with switching barenames' bindings -- now that list is truly altered and, no matter how it's referred to, the new value is what's seen. For example, if you have a function which takes mylist as an argument, the whole-list slice assignment alters what the caller sees as the list's contents.
The key thing is knowing exactly what effect you're after -- most commonly you'll want to alter the list object, so, if one has to guess, whole-list slice assignment is usually the best guess to take;-). Performance-wise, it makes no difference either way (except that the barename assignment, if it keeps both old and new list objects around, will take up more memory for whatever lapse of time both objects are still around, of course).
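The function-argument case can be sketched like this. The two function names are hypothetical; they differ only in whether they assign to the barename or to the whole-list slice.

```python
def strip_rebind(mylist):
    # Rebinds the local name only; the caller's list is not changed.
    mylist = [s.replace(':', '') for s in mylist]


def strip_in_place(mylist):
    # Replaces the contents of the list object the caller passed in.
    mylist[:] = [s.replace(':', '') for s in mylist]


times = ['12:01', '12:02']

strip_rebind(times)
after_rebind = list(times)   # snapshot of what the caller sees

strip_in_place(times)
after_slice = list(times)

print(after_rebind, after_slice)
```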
Use a list comprehension to generate a new list:
>>> mylist = ['12:01', '12:02']
>>> mylist = [s.replace(':', '') for s in mylist]
>>> print(mylist)
['1201', '1202']
The reason that your solution doesn't work is that re.sub returns a new string -- strings are immutable in Python, so re.sub can't modify your existing strings.
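A small sketch of why the original loop appears to do nothing, and how the same re.sub call works once its return value is kept. The name `discarded` is illustrative.

```python
import re

mylist = ['12:01', '12:02']

for num in mylist:
    re.sub(':', '', num)   # the new string is returned and then thrown away
discarded = list(mylist)   # the list still holds the original strings

# Keeping the return value: build a new list from the substituted strings.
mylist = [re.sub(':', '', num) for num in mylist]

print(discarded, mylist)
```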
for i, num in enumerate(mylist):
    mylist[i] = num.replace(':','')
You have to put the return value of re.sub back into a list. The code below builds a new list, but you can do the same for mylist as well.
import re

mylist = ['12:01', '12:02']
tolist = []
for num in mylist:
    a = re.sub(':', '', num)
    tolist.append(a)
print(tolist)
Strings in python are immutable, meaning no function can change the contents of an existing string, only provide a new string. Here's why.
See here for a discussion on string types that can be changed. In practice though, it's better to adjust to the immutability of strings.
Instead of a list comprehension, you can also use a map call (in Python 3, wrap it in list(), because map returns an iterator):
mylist = ['12:01', '12:02']
list(map(lambda f: f.replace(':', ''), mylist))
Returns:
['1201', '1202']