Surprising behaviour when iterating over a list while modifying it - python

I'm aware that this is an example of code you should never write, and I'm not asking for the right approach. My concern is that I don't understand its behaviour, so I was hoping someone could shed light on it:
def foo(list_of_chars):
    '''Returns the set of words that can be obtained by removing one character from list_of_chars.'''
    result = set()
    my_copy = list(list_of_chars)
    for elem in my_copy:
        my_copy.remove(elem)
        result.add(''.join(my_copy))
        my_copy = list(list_of_chars)
    return result
I would have expected one of two behaviours from this function (let's say list_of_chars is ['h', 'e', 'l', 'l', 'o']):
the code runs flawlessly because we restore my_copy at the end of each iteration, so when the for loop assigns the next element to elem it is the same as if we had never touched the list (so we do iterate over 'h', 'e', 'l', 'l', 'o');
the code fails miserably because the assignment at the end of the loop is somehow ignored, so we first remove 'h', skip 'e', remove 'l', skip the next 'l', and then remove 'o' -- and possibly crash.
What actually happens is stranger: we iterate over 'h', 'l', 'l', 'o', so 'e' is skipped, but it's the only character that is skipped. Other examples behave in the same way: only the second element of list_of_chars is overlooked. Can someone explain this? (Python 2 and 3 yield the same result).

The for loop does not "read" my_copy again on every iteration, but my_copy.remove does.
for elem in my_copy:
    my_copy.remove(elem)
On the first iteration, my_copy in both lines refers to the same object, so you really are modifying the object that the for loop iterates over. At the end of the iteration, however, you rebind the name my_copy to a new list. The for loop keeps its reference to the original object, while my_copy.remove acts on whatever my_copy refers to now. From then on, the object the for loop iterates over and the object you remove elements from are two different objects.
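Hence remove interferes with the loop only on the first iteration: removing 'h' shifts 'e' to index 0, and the loop's next index, 1, now points at the first 'l', so 'e' is never visited.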
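A minimal way to see this is to record which elements the loop actually visits. The helper name `trace_visits` is hypothetical; its loop body is the same as in `foo`, with one extra line that logs `elem`.

```python
def trace_visits(list_of_chars):
    """Return the elements the for loop visits, using the same loop body as foo."""
    visited = []
    my_copy = list(list_of_chars)
    # The for statement calls iter(my_copy) once; that iterator keeps a reference
    # to this first list object and walks it by index, regardless of what the
    # name my_copy is bound to later.
    for elem in my_copy:
        visited.append(elem)
        my_copy.remove(elem)           # 1st pass: removes from the iterated list
        my_copy = list(list_of_chars)  # rebinds the name; the iterator is unaffected
    return visited

print(trace_visits(['h', 'e', 'l', 'l', 'o']))  # ['h', 'l', 'l', 'o']
print(trace_visits(['a', 'b', 'c']))            # ['a', 'c']
```

Every later `remove` acts on a fresh copy that the iterator never sees, which is why only the second element is skipped, whatever the input.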

Related

Rearrange a list of strings

I want to rearrange or modify the sequence of elements (strings) in a list. This is the original list:
['A', 'B', 'C', 'D', 'E', 'F', 'G']
I want to move E and F behind (or after?) B.
['A', 'B', 'E', 'F', 'C', 'D', 'G']
           ^^^  ^^^
The decision about what to move comes from the user. There is no rule behind it and no way to formulate it as an algorithm. In other words, the action "move something behind something else" is input from the user; e.g. the user marks two elements with the mouse and drags and drops them behind another element.
My code works and is able to do this, but I wonder if there is a more efficient and pythonic way to do it. Maybe I missed some of Python's nice built-in features.
#!/usr/bin/env python3
# input data
original = list('ABCDEFG')
# move "EF" behind "B" (this is user input)
to_move = 'EF'
behind = 'B'
# expected result
rearanged = list('ABEFCDG')
# index for insertion
idx_behind = original.index(behind)
# each element to move
for c in reversed(to_move):  # "reverse!"
    # remove from original position
    original.remove(c)
    # add to new position
    original.insert(idx_behind + 1, c)
# True
print(original == rearanged)
You can assume:
Elements in original are unique.
The elements of to_move always exist in original.
behind always exists in original.
The elements in to_move are always adjacent.
Other examples of possible input:
Move ['B'] behind F
Move ['A', 'B'] behind C
This is not possible:
Move ['A', 'F'] behind D
Don't use .remove when the goal is to erase from a specific position; though you may know what is at that position, .remove a) will search for it again, and b) remove the first occurrence, which is not necessarily the one you had in mind.
Don't remove elements one at a time if you want to remove several consecutive elements; that's why slices exist, and why the del operator works the way that it does. Iterating is already harder than saying directly what you want, and on top of that you have to watch out for the usual problems with modifying a list while iterating over it.
Don't add elements one at a time if you want to add several elements that will be consecutive; instead, insert them all at once by slice assignment. Same reasons apply here.
Especially don't try to interleave insertion and removal operations. That's far more complex than necessary, and could cause problems if the insertion location overlaps the source location.
Thus:
original = list('ABCDEFG')
start = original.index('E')
# grabbing two consecutive elements:
to_move = original[start:start+2]
# removing them:
del original[start:start+2]
# now figure out where to insert in that result:
insertion_point = original.index('B') + 1
# and insert:
original[insertion_point:insertion_point] = to_move
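The same steps can be wrapped in a small function so that they also cover the other inputs from the question. This is only a sketch: the name `move_behind` is hypothetical, and it relies on the question's assumptions (unique elements, `to_move` adjacent and present, `behind` present and not inside `to_move`). Looking up `behind` only after the deletion is what makes moves to the right, such as `['A', 'B']` behind `C`, come out correctly.

```python
def move_behind(lst, to_move, behind):
    """Return a copy of lst with the adjacent run to_move placed right after behind.

    Assumes the question's conditions: unique elements, to_move is a run of
    adjacent elements of lst, and behind exists in lst but not in to_move.
    """
    result = list(lst)                    # work on a copy; leave the input alone
    start = result.index(to_move[0])      # where the run begins
    end = start + len(to_move)
    block = result[start:end]             # grab the whole run with one slice
    del result[start:end]                 # remove it with one del
    insertion_point = result.index(behind) + 1       # look up behind AFTER the deletion
    result[insertion_point:insertion_point] = block  # insert the run in one step
    return result

print(move_behind(list('ABCDEFG'), 'EF', 'B'))        # ['A', 'B', 'E', 'F', 'C', 'D', 'G']
print(move_behind(list('ABCDEFG'), ['B'], 'F'))       # ['A', 'C', 'D', 'E', 'F', 'B', 'G']
print(move_behind(list('ABCDEFG'), ['A', 'B'], 'C'))  # ['C', 'A', 'B', 'D', 'E', 'F', 'G']
```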
If it is just a small number of items you want to rearrange, just swap the relevant elements (this works here because the displaced run 'C', 'D' has the same length as 'E', 'F'):
>>> lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
>>> lst[2], lst[4] = lst[4], lst[2]  # switch 'C' and 'E'
>>> lst[3], lst[5] = lst[5], lst[3]  # switch 'D' and 'F'
>>> lst
['A', 'B', 'E', 'F', 'C', 'D', 'G']

After inserting list into a set, set only contains the items in list but not list

I'm trying to filter out unique items from a list of lists and have tried the following:
Data: sent_list = [['cleverness', 'wit'],['the', 'best', 'story'],['best', 'story'],['wit']]
I have tried:
word_set = set()
for sent in sent_list:
    for word in sent:
        word_set.update(word)
word_set
Output was:
{'b', 'c', 'e', 'h', 'i', 'l', 'n', 'o', 'r', 's', 't', 'v', 'w', 'y'}
But the expected code and result were:
word_set = set()
for sent in sent_list:
    word_set.update(sent)
word_set
{'best', 'cleverness', 'story', 'the', 'wit'}
I used the first 'for' loop to access each sublist in the main list, then a second 'for' loop to access each word in the sublist, but it seems my understanding is wrong. Also, in the correct code, if a list is passed directly to update, shouldn't the set contain lists?
Please help me in understanding this concept.
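You should use the add method to add a single element to a set; the update method adds every element of the iterable you pass it. That is why update(word) treats a string as a sequence of characters and adds each character separately. For the same reason, update(sent) adds each word of the list rather than the list itself (a list could not be a set element anyway, because lists are unhashable). Correct code would be:
word_set = set()
for sent in sent_list:
    word_set.update(sent)
word_set
In your nested loop, the right method to use is set.add(word), not set.update(word).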
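A short sketch contrasting the two methods on the question's data. The variable names are illustrative; the `try` block only shows that a list cannot be added as a single element (the exact error text may vary between Python versions, so it is not quoted here).

```python
sent_list = [['cleverness', 'wit'], ['the', 'best', 'story'], ['best', 'story'], ['wit']]

# update() iterates over its argument, so a string contributes its characters.
chars = set()
chars.update('wit')
print(sorted(chars))  # ['i', 't', 'w']

# add() inserts its argument as a single element.
words = set()
for sent in sent_list:
    for word in sent:
        words.add(word)
print(sorted(words))  # ['best', 'cleverness', 'story', 'the', 'wit']

# update() with a list adds each item of the list, not the list itself.
same_words = set()
for sent in sent_list:
    same_words.update(sent)

# set.union accepts several iterables at once, giving a one-line alternative.
print(set().union(*sent_list) == words)  # True

# Adding the list itself fails, because lists are unhashable.
try:
    set().add(['best', 'story'])
except TypeError as exc:
    print('TypeError:', exc)
```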

Understanding Recursion "local" variable in Python

I'm just learning about recursion and am trying to apply it in some for-fun, come-to-understand-it ways. (Yes, this whole thing is better done by three nested for loops)
def generate_string(current_string, still_to_place):
    if still_to_place:
        potential_items = still_to_place.pop(0)
        for item in potential_items:
            generate_string(current_string + item, still_to_place)
            #print("Want to call generate_string({}, {})".format(current_string + item, still_to_place))
    else:
        print(current_string)
generate_string("", [['a','b','c'],['d','e','f'],['g','h','i']])
If I comment out the recursive call and uncomment the print, it prints exactly what I'd hope it would be calling. However, uncommenting the print while keeping the recursive call shows that the function is called with an empty still_to_place list even when, I think, it should still have [d,e,f], [g,h,i] from the "higher up" recursion.
What am I missing in my understanding? Thanks!
Right, this is the expected behavior. The reason is that still_to_place is shared between all the function calls. Python passes arguments 'by assignment': the parameter becomes a new name for the same object, so if you pass a list to a function, that function shares a reference to the SAME list. This thread has more detail.
So each time you call still_to_place.pop(0), you remove an element from the one list that every recursive call shares.
This behavior is not always desirable; often you want to treat your list as immutable. In that case, pass your recursive call a modified copy of the data structure. Here's what your code would look like using the immutable approach:
def generate_string(current_string, still_to_place):
    if still_to_place:
        potential_items = still_to_place[0]
        for item in potential_items:
            # still_to_place[1:] is a new list, so the caller's list is never modified
            print("Want to call generate_string({}, {})".format(current_string + item, still_to_place[1:]))
            generate_string(current_string + item, still_to_place[1:])
    else:
        print(current_string)
generate_string("", [['a','b','c'],['d','e','f'],['g','h','i']])
As a rule of thumb, mutating methods on a list (e.g. .pop, .append, .remove) modify it in place, whereas slicing (still_to_place[1:]) builds a new list. Also, different languages approach mutability differently; in some languages, data structures are ALWAYS immutable.
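If you want the combinations back as data instead of printed output, the same slicing approach can return a list. This is a sketch; the name `generate_strings` is hypothetical.

```python
def generate_strings(current_string, still_to_place):
    """Return every string formed by picking one item from each remaining group."""
    if not still_to_place:
        return [current_string]
    results = []
    for item in still_to_place[0]:
        # still_to_place[1:] is a fresh list, so sibling calls never see each other's changes
        results.extend(generate_strings(current_string + item, still_to_place[1:]))
    return results

groups = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
combos = generate_strings("", groups)
print(len(combos), combos[:3])  # 27 ['adg', 'adh', 'adi']
print(groups)                   # unchanged: [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
```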
I printed the arguments of each call to generate_string, and this is what I got. It's probably all confusing because nothing is behaving how you expected, but let me explain what Python is thinking.
#1st time
current_string = ""
still_to_place = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
We start out by passing in the above data. The function first pops the first list, ['a', 'b', 'c'], and begins to iterate through this popped list. Because we called .pop(0), still_to_place now holds only the remaining two lists, and that shortened list is exactly what the first recursive call to generate_string() receives.
#2nd time
current_string = "a"
still_to_place = [['d', 'e', 'f'], ['g', 'h', 'i']]
These are exactly the values current_string and still_to_place had when the first recursive call was made. Now the function executes again from the beginning. It calls pop again, removing the second list, ['d', 'e', 'f'], which leaves only the third and final list.
#3rd time
current_string = "ad"
still_to_place = [['g', 'h', 'i']]
This call pops ['g', 'h', 'i'], the last list, so still_to_place is now empty. As we iterate through ['g', 'h', 'i'], every call to generate_string goes directly to the else clause, printing the "ad" string plus the letter we are currently on.
#4th, 5th and 6th time
still_to_place = []
current_string = "adg"
still_to_place = []
current_string = "adh"
still_to_place = []
current_string = "adi"
We now continue where the last recursive call left off, back in the second call. This is where things get confusing. When we left off, current_string was "a" and still_to_place was originally [['d', 'e', 'f'], ['g', 'h', 'i']], but we have since popped everything off of that list. You see, lists behave differently from numbers or strings: they are mutable, and every call holds a reference to the same list rather than its own copy. Change the data once and the change is visible everywhere that list is used. (Dictionaries and other mutable objects also behave this way.)
So, with all that said, still_to_place = [], and it will stay empty for the remainder of the recursive calls. potential_items still holds the list it popped, ['d', 'e', 'f']. We've already handled 'd' in steps #4, #5 and #6, so we can finish where we left off.
#7th and 8th times
still_to_place = []
current_string = "ae"
still_to_place = []
current_string = "af"
After 'af', the call for "a" is finished and we return to the outermost call, where potential_items is ['a', 'b', 'c'] and 'a' has already been handled. still_to_place is still empty, so the remaining calls simply print "b" and "c". Unlike still_to_place, potential_items is a local variable: each call has its own. If you know how scopes work, it will make sense why we can have multiple potential_items while every call uses the same still_to_place. Each time we popped an item off of still_to_place, we stored the popped list in a new potential_items variable local to that call. still_to_place is a parameter, so it is also a local name in each call, but every one of those names refers to the same list object, so one change to still_to_place caused changes that were not anticipated.
Hopefully I made things less confusing, not more. Leave a comment on what you need more clarification on.
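To check the walk-through above, the question's function can be rerun with its final print replaced by appending to a list. The names `generate_with_pop` and `out` are hypothetical; the popping logic is unchanged.

```python
def generate_with_pop(current_string, still_to_place, out):
    """Same logic as the question's code, but collects results in out."""
    if still_to_place:
        potential_items = still_to_place.pop(0)  # mutates the one shared list
        for item in potential_items:
            generate_with_pop(current_string + item, still_to_place, out)
    else:
        out.append(current_string)

shared = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
results = []
generate_with_pop("", shared, results)
print(results)  # ['adg', 'adh', 'adi', 'ae', 'af', 'b', 'c']
print(shared)   # [] (the caller's list was emptied too)
```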

python function that returns all initial segments of a list

I am trying to write a function that takes a list of strings as input and returns all the initial segments of that list.
For example, for ['k', 'i', 'm', 'i'] the output should be:
([[], ['k'], ['k', 'i'], ['k', 'i', 'm'], ['k', 'i', 'm', 'i']])
I have done the following, but it is not correct because I get numbers instead of characters.
def funv(k):
    return [[i for i in range(i)] for i in range(len(k))]
Can anyone tell me what I can do to correct it?
This should work:
[list(k[:i]) for i in range(len(k) + 1)]
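I really have no idea what you are doing, but it sounds like you are just missing one small thing. Instead of returning the index i in your inner list comprehension, you want to return k[i], the character at position i. Two more fixes: the outer range must go up to len(k) inclusive so that the full list is the last segment, and giving the inner loop its own name (j) avoids confusion with the outer i:
def funv(k):
    return [[k[j] for j in range(i)] for i in range(len(k) + 1)]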
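As an alternative to slicing, the prefixes can also be built one element at a time. This is a sketch with a hypothetical function name, `initial_segments`; the important detail is the `list(current)` copy, since appending `current` itself would store the same list object several times, and later appends would show up in every stored "prefix".

```python
def initial_segments(seq):
    """Build every prefix of seq, from [] up to the full sequence."""
    prefixes = [[]]   # the empty prefix
    current = []
    for item in seq:
        current.append(item)
        prefixes.append(list(current))  # store a copy, not the shared list
    return prefixes

print(initial_segments(['k', 'i', 'm', 'i']))
# [[], ['k'], ['k', 'i'], ['k', 'i', 'm'], ['k', 'i', 'm', 'i']]
```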

Can it be done with recursion?

I'm trying to make my program run as I want it to, but I'm having some trouble; I hope someone can help.
I wrote a program that takes a list of chars and assembles them to create words. A word ends when there is a " " in the list. So it looks like this:
inp = ['r','e', 'e', 'l', ' ', 'y', 'e', 'l', 'l', 'o', 'w', ' ', 'g', 'e', 'l',' ', 'p','e','e','k']
outp = ['reel', 'yellow', 'gel', 'peek']
The code looks like this:
def mer(inp, outp=[]):
    tail = 0
    for item in inp:
        if item == (" "):
            inp[:tail] = ["".join(inp[:tail])]
            outp.append(inp.pop(0))
            inp.remove(item)
        if ((" ") in inp) == False:
            inp[:] = ["".join(inp[:])]
            outp.append(inp.pop(0))
        tail +=1
And now, to get the output (for the input above), I need to call mer two times. Is there a way to make it run until the input list is empty, or maybe use recursion?
It's just a programming exercise, so it can probably all be done better, but for now that's all I need.
You can use join and split:
>>> ''.join(inp).split()
['reel', 'yellow', 'gel', 'peek']
# recursion
from itertools import takewhile
def fun(x):
    if not x:
        return
    y = list(takewhile(lambda i: i != ' ', x))
    yield ''.join(y)
    for z in fun(x[len(y)+1:]):
        yield z
list(fun(['r','e', 'e', 'l', ' ', 'y', 'e', 'l', 'l', 'o', 'w', ' ', 'g', 'e', 'l',' ', 'p','e','e','k']))
I know you asked for a method using recursion, but the most pythonic method in this case is to join the characters together, then split them.
outp = "".join(inp).split(" ")
And now to get the output (in the case with the input like on top) i need to call mer two times.
The problem with your algorithm is that you are modifying the list while you iterate over it. This is a naughty and unsafe thing to be doing.
After "reel" is put into outp, inp is ['y', 'e', 'l', 'l', 'o', 'w', ' ', 'g', 'e', 'l',' ', 'p','e','e','k']. But the next character that will be examined by the loop is - at least in the CPython implementation - not the 'y' of 'yellow', but the 'w'. This is because the iteration internally stores an index (which happens to be in sync with the tail variable that you update manually) and uses that to grab elements. The listiterator created behind the scenes to implement the for-loop is utterly unaware of changes to the list that it's iterating over, and thus can't adjust to keep the "same position" (and who knows what you really mean by that, anyway?).
You can see this for yourself if you add a couple of "trace" print statements to the code to show the state of the variables at various points.
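A stripped-down trace makes the iterator's index visible. This is not the original mer function, only a single deletion of the first word and its space, run on CPython's list iterator: after the deletion, the next element visited is 'w', not 'y'.

```python
chars = list('reel yellow')
visited = []
for i, char in enumerate(chars):
    visited.append(char)
    if char == ' ':
        # Remove 'reel ' (indices 0..i) from the very list being iterated over.
        del chars[:i + 1]

print(chars)    # ['y', 'e', 'l', 'l', 'o', 'w']
print(visited)  # ['r', 'e', 'e', 'l', ' ', 'w']
```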
Anyway, since the iterator is at the 'w' at this point, it will find the space next and extract 'yellow' just fine; but next it will move to the 'k' of "peek", missing the space after 'gel', and it won't run any of the code in your second if-case, either, because the space between 'gel' and 'peek' is still in the buffer (you didn't really think clearly enough about the real end condition).
If you really, really want to do everything the hard way instead of just writing ''.join(inp).split(' '), you could fix the problem by tracking a beginning-of-word and end-of-word index, slicing out sublists, joining them and putting the resulting words into the output, and leaving the input alone. While we're at it:
functions should use the return value to return data; passing in an outp parameter is silly (and a default of outp=[] is created only once, so it is shared between calls) - let's just return a list of words.
We can use the built-in enumerate function to get indices that match up with the list elements as we iterate.
I have no idea what "mer" means.
You use way too many parentheses, and comparing to boolean literals (True and False) is poor style.
So, the corrected code using the original algorithm:
def words_from(chars):
    begin = 0  # index of beginning of current word
    result = []  # where we store the output
    for i, char in enumerate(chars):
        if char == ' ':
            result.append(''.join(chars[begin:i]))
            begin = i + 1
    # At the end, make one more word from the chars after the last space.
    result.append(''.join(chars[begin:]))
    return result
You should definitely use join and split for this, but since the question specifically asks for a recursive solution, here is an answer that uses one.
This is meant as an exercise in recursion only; this code should not be used in practice.
def join_split(inp, outp=None):
    if not inp:
        return outp
    if inp[0] == ' ':
        return join_split(inp[1:], (outp or ['']) + [''])
    if outp is None:
        return join_split(inp[1:], [inp[0]])
    outp[-1] += inp[0]
    return join_split(inp[1:], outp)
>>> join_split(['r','e', 'e', 'l', ' ', 'y', 'e', 'l', 'l', 'o', 'w', ' ', 'g', 'e', 'l',' ', 'p','e','e','k'])
['reel', 'yellow', 'gel', 'peek']
