Removing while iterating on sequences in python - python

Could anyone please explain how removing items from a list works and why removing items from a set doesn't work?
Though we can iterate both lists and sets, why can't changes be made to sets? Is it because sets aren't ordered?

Under the hood, the set implementation is completely different from the list implementation.
List: Python’s lists are variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
Set: a set uses a hash table as its underlying data structure, just like a dictionary but with dummy values. The keys of that table are the elements of the set.
Dictionaries and sets implement a tp_iter slot that returns an efficient iterator over the keys. During such an iteration, the dictionary or set should not be modified, except that setting the value for an existing key in a dictionary is allowed (deletions or additions are not, nor is the update() method).
So if you change the size of a set while iterating over it with a for loop, you get an error:
>>> s = set(range(5))
>>> for i in s:
...     s.pop()
...
0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Set changed size during iteration
>>>
But if you use a while loop instead, no iterator is involved, so you can delete from or update the set:
>>> s = set(range(5))
>>> while s:
...     s.pop()
...     print s
...
0
set([1, 2, 3, 4])
1
set([2, 3, 4])
2
set([3, 4])
3
set([4])
4
set([])
>>>
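For comparison, here is a minimal Python 3 sketch of the same behaviour (the variable names are mine, not from the answer above). It shows the RuntimeError being raised, and one common workaround, which is to loop over a snapshot of the set while changing the original:

```python
s = set(range(5))

# Removing from a set inside a for loop over that same set fails as soon as
# the iterator notices that the size has changed.
error_message = None
try:
    for item in s:
        s.remove(item)
except RuntimeError as exc:
    error_message = str(exc)

# Looping over a copy is safe, because the copy itself is never modified.
s = set(range(5))
for item in list(s):
    if item % 2 == 0:
        s.remove(item)

print(error_message)
print(s)
```

The copy costs extra memory, so for very large sets a filtering approach that builds a new set may be preferable.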
You can see here in the source code:

Related

Iterating and Removing From a Set - Possible or Not?

I don't have a specific piece of code that I want looked at; however, I do have a question that I can't seem to get a straight, clear answer.
Here's the question: if I have a set, I can iterate over it in a for loop. As I iterate over it can I remove specific numbers using .remove() or do I have to convert my set to a list first? If that is the case, why must I convert it first?
In both cases, you should avoid removing items from a list or set while iterating over it. It's not a good idea to modify something that you're iterating through, as you can get unexpected results. For instance, let's start with a set:
numbers_set = {1,2,3,4,5,6}
for num in numbers_set:
    numbers_set.remove(num)
print(numbers_set)
We attempt to iterate through and delete each number but we get this error.
Traceback (most recent call last):
File ".\test.py", line 2, in <module>
for num in numbers_set:
RuntimeError: Set changed size during iteration
Now, you asked: "do I have to convert my set to a list first?" Well, let's test it out.
numbers_list = [1,2,3,4,5,6]
for num in numbers_list:
    print(num)
    numbers_list.remove(num)
print(numbers_list)
The loop prints only 1, 3 and 5, and the final list is:
[2, 4, 6]
We would expect the list to be empty, but it gave us this result. Whether you're trying to iterate through a list or a set and delete items, it's generally not a good idea.
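To make the skipping visible, here is a small Python 3 sketch (names are illustrative). A for loop over a list walks it by position, so every remove() shifts the remaining items one slot to the left and the loop steps over the item that moved into the current position. Iterating over a slice copy avoids this:

```python
numbers = [1, 2, 3, 4, 5, 6]
visited = []
for num in numbers:
    visited.append(num)
    numbers.remove(num)  # later items shift left, so the next one is skipped
leftover = list(numbers)

# Iterating over a copy (numbers[:]) visits every original item.
numbers = [1, 2, 3, 4, 5, 6]
for num in numbers[:]:
    numbers.remove(num)

print(visited)   # items the first loop actually saw
print(leftover)  # items the first loop never reached
print(numbers)   # emptied by the second loop
```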
@nathancy has already given a good explanation of why deleting during iteration won't work, but I'd like to suggest an alternative: instead of deleting while you iterate, do the deletion as a second stage. So you would:
Iterate over your set to decide what you want to delete, and store the collection of things to be deleted separately.
Iterate over your to-be-deleted collection and remove each item from the original set.
For instance:
def should_be_deleted(num):
    return num % 2 == 0

my_set = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
to_delete = []
for value in my_set:
    if should_be_deleted(value):
        to_delete.append(value)
for value in to_delete:
    my_set.remove(value)
print(my_set)
Prints:
set([1, 3, 5, 7, 9])
The same pattern can be applied to delete from any collection—not just set, but also list, dict, etc.
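As a rough illustration of the same two-stage pattern on a dict (Python 3; the stock dictionary and its contents are made up for this example), collect the keys first and delete afterwards:

```python
# Hypothetical data: item name -> quantity in stock.
stock = {"apple": 0, "pear": 4, "plum": 0, "fig": 2}

# Stage 1: decide what to delete without touching the dict.
to_delete = [name for name, count in stock.items() if count == 0]

# Stage 2: delete, now that no iteration over the dict is in progress.
for name in to_delete:
    del stock[name]

print(stock)
```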
This is how I'd do this:
myset = ...
newset = set()
while myset:
    v = myset.pop()
    if not do_i_want_to_delete_this_value(v):
        newset.add(v)
myset = newset
A list comprehension will work too:
myset = set([x for x in myset if not do_i_want_to_delete_this_value(x)])
But this gets messy if you want to do other stuff while you're iterating and you don't want to wrap all that logic in a single function call. Nothing wrong with doing that though.
myset = set([x for x in myset if process_element(x)])
process_element() just has to return True/False to say whether the element should be kept in the set (the comprehension keeps the elements for which it returns True).
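On Python 2.7 and later, a set comprehension does the same job without the intermediate list. A minimal sketch, where should_keep() is a hypothetical predicate standing in for whatever test you need:

```python
def should_keep(x):
    # Hypothetical predicate: keep odd numbers only.
    return x % 2 == 1

myset = set(range(1, 11))
alias = myset  # a second name for the same set object

# Rebinding: builds a new set and leaves the old object untouched.
filtered = {x for x in myset if should_keep(x)}

# In place: the set of rejects is fully built before myset is modified,
# so other references to the same set (such as alias) see the change.
myset.difference_update({x for x in myset if not should_keep(x)})

print(filtered)
print(alias is myset)
```

Which variant fits depends on whether other parts of the program hold a reference to the original set.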

What does the list() function do in Python?

I know that the list() constructor creates a new list but what exactly are its characteristics?
What happens when you call list((1,2,3,4,[5,6,7,8],9))?
What happens when you call list([[[2,3,4]]])?
What happens when you call list([[1,2,3],[4,5,6]])?
From what I can tell, calling the constructor list removes the most outer braces (tuple or list) and replaces them with []. Is this true? What other nuances does list() have?
list() converts the iterable passed to it to a list. If the iterable is already a list, a shallow copy is returned, i.e. only the outermost container is new; the objects inside it are still the same.
>>> t = (1,2,3,4,[5,6,7,8],9)
>>> lst = list(t)
>>> lst[4] is t[4] #outermost container is now a list() but inner items are still same.
True
>>> lst1 = [[[2,3,4]]]
>>> id(lst1)
140270501696936
>>> lst2 = list(lst1)
>>> id(lst2)
140270478302096
>>> lst1[0] is lst2[0]
True
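A practical consequence of the shallow copy, sketched in Python 3 (the variable names are illustrative): changes to the outer list stay local to the copy, while changes to a shared inner list show up through both. copy.deepcopy from the standard library is one way to get fully independent inner lists.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = list(original)

shallow.append([5, 6])   # only the new outer list grows
shallow[0].append(99)    # the inner list is shared, so original sees this too

deep = copy.deepcopy(original)
deep[1].append(-1)       # inner lists were copied, so original is unaffected

print(original)
print(shallow)
print(deep)
```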
Python has a well-established documentation set for every release version, readable at https://docs.python.org/. The documentation for list() states that list() is merely a way of constructing a list object, of which these are the listed ways:
Using a pair of square brackets to denote the empty list: []
Using square brackets, separating items with commas: [a], [a, b, c]
Using a list comprehension: [x for x in iterable]
Using the type constructor: list() or list(iterable)
The list() function accepts any iterable as its argument, and the return value is a list object.
Further reading: https://docs.python.org/3.4/library/stdtypes.html#typesseq-list
Yes it is true.
It's very simple. list() takes an iterable object as input and adds its elements to a newly created list. The elements can be anything. An element can itself be another list or iterable object, and it will be added to the new list as it is,
i.e. no nested processing will happen.
You said: "From what I can tell, calling the constructor list removes the most outer braces (tuple or list) and replaces them with []. Is this true?"
IMHO, this is not a good way to think about what list() does. True, square brackets [] are used to write a list literal, and are used when you tell a list to represent itself as a string, but ultimately, that's just notation. It's better to think of a Python list as a particular kind of container object with certain properties, eg it's ordered, indexable, iterable, mutable, etc.
Thinking of the list() constructor in terms of it performing a transformation on the kind of brackets of a tuple that you pass it is a bit like saying adding 3 to 6 turns the 6 upside down to make 9. It's true that a '9' glyph looks like a '6' glyph turned upside down, but that's got nothing to do with what happens on the arithmetic level, and it's not even true of all fonts.
aTuple = (123, 'xyz', 'zara', 'abc')
aList = list(aTuple)
print "List elements : ", aList
When we run the above program (Python 2), it produces the following result:
List elements : [123, 'xyz', 'zara', 'abc']
It is another way to create a list in python. How convenient!
Your question is vague, but the outputs are shown below. list() doesn't "replace" the outer braces; it creates a new list object, which can hold values of any type in order (one after the other) and can itself contain other lists. You can add and remove elements with methods such as append, insert and pop. Tuples, on the other hand, are immutable: once created, their length and contents cannot be changed, so they are more like a fixed array of elements of any type.
WHEN:
list((1,2,3,4,[5,6,7,8],9))
RETURNS:
[1, 2, 3, 4, [5, 6, 7, 8], 9]
WHEN:
list([[[2,3,4]]])
RETURNS:
[[[2, 3, 4]]]
WHEN:
list([[1,2,3],[4,5,6]])
RETURNS:
[[1, 2, 3], [4, 5, 6]]

Append item to a specified list in a list of lists (Python) [duplicate]

I'm practicing my programming skills by solving problems from Project Euler at the moment, and now I've come across some (in my opinion) strange behavior in Python.
When I do:
list = [[1]]*20
I get a list of 20 lists containing element 1, as expected. However, when I want to append a 2 to the fourth element (index 3) of this list, I do that as follows:
list[3].append(2)
This however changes ALL the elements in the list. Even when I take a detour, like:
l = list[3]
l.append(2)
list[3] = l
All my elements get changed. Can anyone please tell me how to do this and get an output like so:
[[1], [1], [1], [1, 2], [1] .... [1]]
Thanks in advance.
Python lists are mutable objects, so when you do [[1]]*20 it creates one list object [1] and then places 20 references to it in the toplevel list.
As far as the mutability problem is concerned, this is the same as the following
a = [1,2,3]
b = a
b.append(4)
a # [1,2,3,4]
This happens because b=a merely copies the reference to the list instance from a to b. They are both referring to the same actual list.
In order to create a list of lists, like you tried above, you need to create a unique list for each entry. A list comprehension works nicely:
mainlist = [[1] for x in range(20)]
mainlist[0].append(2)
mainlist # [[1,2],[1],[1],...]
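You can check the sharing directly by counting distinct object identities. A short Python 3 sketch (three entries instead of twenty, purely to keep the output small):

```python
shared = [[1]] * 3                   # three references to one inner list
distinct = [[1] for _ in range(3)]   # three separate inner lists

shared_ids = {id(inner) for inner in shared}
distinct_ids = {id(inner) for inner in distinct}

shared[1].append(2)     # visible through every slot
distinct[1].append(2)   # affects only the second slot

print(len(shared_ids), shared)
print(len(distinct_ids), distinct)
```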
Edit
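As an aside, since type names such as list are just built-in names bound to the type objects, naming your variables after a type is a bad idea: the variable shadows the built-in.
This can cause several issues further down in the code (the examples here are Python 2, where range() returns a list):
a = range(3) # [0,1,2]
type(a) # <type 'list'>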
isinstance(a, list) # True
Now, create a variable named list
list = range(3)
list # [0,1,2]
isinstance(list, list)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types
Not to mention, now you can't use the list() constructor:
c = list((1,2,3))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

Understanding the behavior of Python's set

The documentation for the built-in type set says:
class set([iterable])
Return a new set or frozenset object
whose elements are taken from
iterable. The elements of a set must
be hashable.
That is all right but why does this work:
>>> l = range(10)
>>> s = set(l)
>>> s
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
And this doesn't:
>>> s.add([10])
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
s.add([10])
TypeError: unhashable type: 'list'
Both are lists. Is some magic happening during the initialization?
When you initialize a set, you provide a list of values that must each be hashable.
s = set()
s.add([10])
is the same as
s = set([[10]])
which throws the same error that you're seeing right now.
In [13]: (2).__hash__
Out[13]: <method-wrapper '__hash__' of int object at 0x9f61d84>
In [14]: ([2]).__hash__ # nothing.
The thing is that a set needs its items to be hashable, i.e. to implement the __hash__ magic method (the hash is used to find the item's slot in the set's underlying hash table). list does not implement that method (list.__hash__ is None), hence a list cannot be added to a set.
In this line:
s.add([10])
You are trying to add a list to the set, rather than the elements of the list. If you want to add the elements of the list, use the update method.
Think of the constructor being something like:
class Set:
    def __init__(self, l):
        for elem in l:
            self.add(elem)
So there is nothing mysterious about the constructor accepting a list while add(element) does not: the constructor adds the list's elements one by one, and each of those has to be hashable.
It behaves according to the documentation: set.add() adds a single element (and since you give it a list, it complains it is unhashable - since lists are no good as hash keys). If you want to add a list of elements, use set.update(). Example:
>>> s = set([1,2,3])
>>> s.add(5)
>>> s
set([1, 2, 3, 5])
>>> s.update([8])
>>> s
set([8, 1, 2, 3, 5])
s.add([10]) works as documented. An exception is raised because [10] is not hashable.
There is no magic happening during initialisation.
set([0,1,2,3,4,5,6,7,8,9]) has the same effect as set(range(10)) and set(xrange(10)) and set(foo()) where
def foo():
    for i in (9,8,7,6,5,4,3,2,1,0):
        yield i
In other words, the arg to set is an iterable, and each of the values obtained from the iterable must be hashable.
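To tie the answers together, here is a small Python 3 sketch (the values are arbitrary) contrasting add() with update(), and showing two hashable stand-ins for a list when you really do want a group of values as a single element: a tuple or a frozenset.

```python
s = set(range(3))

# add() treats its argument as ONE element, and a list is not hashable.
message = None
try:
    s.add([10])
except TypeError as exc:
    message = str(exc)

# Immutable containers are hashable, so they can be stored as single elements.
s.add((10, 11))
s.add(frozenset([12]))

# update() iterates over its argument and adds each item separately.
s.update([20, 21])

print(message)
print(s)
```

The exact wording of the TypeError message differs between Python versions, so it is best not to rely on it.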

Python append() vs. + operator on lists, why do these give different results?

Why do these two operations (append() resp. +) give different results?
>>> c = [1, 2, 3]
>>> c
[1, 2, 3]
>>> c += c
>>> c
[1, 2, 3, 1, 2, 3]
>>> c = [1, 2, 3]
>>> c.append(c)
>>> c
[1, 2, 3, [...]]
>>>
In the last case there's actually an infinite recursion. c[-1] and c are the same. Why is it different with the + operation?
To explain "why":
The + operation concatenates: it adds the elements of the right-hand array to those of the left-hand array. The array.append operation inserts the array (or any object) at the end of the original array, which results in a reference to itself in that spot (hence the infinite recursion in your case with lists; with arrays, you'd receive a type error).
The difference here is that the + operation acts specifically when you add an array (it's overloaded like other operators, see this chapter on sequences) by concatenating the elements. The append method, however, does literally what you ask: it appends the object that you give it (the array or any other object), instead of taking its elements.
An alternative
Use extend() if you want to use a function that acts similar to the + operator (as others have shown here as well). It's not wise to do the opposite: to try to mimic append with the + operator for lists (see my earlier link on why). More on lists below:
Lists
[edit] Several commenters have suggested that the question is about lists and not about arrays. The question has changed, though I should've included this earlier.
Most of the above about arrays also applies to lists:
The + operator concatenates two lists together. The operator will return a new list object.
list.append does not append one list to another, but appends a single object (which here is a list) at the end of your current list. Appending c to itself therefore leads to infinite recursion.
As with arrays, you can use list.extend to extend a list with another list (or iterable). This changes your current list in place, as opposed to +, which returns a new list.
Little history
For fun, a little history: the birth of the array module in Python in February 1993. It might surprise you, but arrays were added way after sequences and lists came into existence.
The concatenation operator + is a binary infix operator which, when applied to lists, returns a new list containing all the elements of each of its two operands. The list.append() method is a mutator on list which appends its single object argument (in your specific example the list c) to the subject list. In your example this results in c appending a reference to itself (hence the infinite recursion).
An alternative to '+' concatenation
The list.extend() method is also a mutator method which concatenates its sequence argument with the subject list. Specifically, it appends each of the elements of sequence in iteration order.
An aside
Being an operator, + returns the result of the expression as a new value. Being a non-chaining mutator method, list.extend() modifies the subject list in-place and returns nothing.
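The three behaviours side by side, as a minimal Python 3 sketch (names are illustrative): + builds a new list, extend() mutates in place and returns None, and append() adds its argument as one nested element.

```python
a = [1, 2, 3]

b = a + [4]                    # new list; a is left unchanged
a_after_plus = list(a)

result = a.extend([4, 5])      # mutates a in place and returns None

nested = [1, 2, 3]
nested.append([4, 5])          # the whole list becomes a single element

print(b, a_after_plus)
print(result, a)
print(nested)
```

Because extend() returns None, writing a = a.extend(...) is a common mistake that throws the list away.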
Arrays
I've added this due to the potential confusion that Abel's answer above may cause by mixing the discussion of lists, sequences and arrays.
Arrays were added to Python after sequences and lists, as a more efficient way of storing arrays of integral data types. Do not confuse arrays with lists. They are not the same.
From the array docs:
Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time by using a type code, which is a single character.
append appends a single element to a list. If you want to extend the list with the contents of another list, you need to use extend.
>>> c = [1, 2, 3]
>>> c.extend(c)
>>> c
[1, 2, 3, 1, 2, 3]
Python lists are heterogeneous, that is, the elements in the same list can be any type of object. The expression c.append(c) appends the object c, whatever it may be, to the list. In this case it makes the list itself a member of the list.
The expression c += c extends c in place with its own elements, much like c.extend(c), and leaves c bound to the same list object. The plain + operator, by contrast, is defined on lists to create a new list whose contents are the elements of the first list followed by the elements of the second list.
So these are really just different expressions used to do different things by design.
The method you're looking for is extend(). From the Python documentation:
list.append(x)
Add an item to the end of the list; equivalent to a[len(a):] = [x].
list.extend(L)
Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.
list.insert(i, x)
Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).
you should use extend()
>>> c=[1,2,3]
>>> c.extend(c)
>>> c
[1, 2, 3, 1, 2, 3]
other info: append vs. extend
See the documentation:
list.append(x)
Add an item to the end of the list; equivalent to a[len(a):] = [x].
list.extend(L)
Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.
c.append(c) "appends" c to itself as an element. Since a list is a reference type, this creates a recursive data structure.
c += c is equivalent to c.extend(c), which appends the elements of c to c.
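One way to see the difference between += and + is to keep a second reference to the list and check what it sees afterwards. A short Python 3 sketch (the alias names are only for illustration), which also confirms the self-reference created by append():

```python
c = [1, 2, 3]
alias_c = c
c += c               # in place: alias_c refers to the same, now longer, list

d = [1, 2, 3]
alias_d = d
d = d + d            # new list: alias_d still refers to the old one

r = [1, 2, 3]
r.append(r)          # the last element is the list itself

print(alias_c is c, alias_c)
print(alias_d is d, alias_d)
print(len(r), r[-1] is r)
```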
