Python: Adding element to list while iterating - python

I know that it is not allowed to remove elements while iterating over a list, but is it allowed to add elements to a Python list while iterating? Here is an example:
for a in myarr:
    if somecond(a):
        myarr.append(newObj())
I have tried this in my code and it seems to work fine, however I don't know if it's because I am just lucky and that it will break at some point in the future?
EDIT: I prefer not to copy the list since "myarr" is huge, and therefore it would be too slow. Also I need to check the appended objects with "somecond()".
EDIT: At some point "somecond(a)" will be false, so there can not be an infinite loop.
EDIT: Someone asked about the "somecond()" function. Each object in myarr has a size, and each time "somecond(a)" is true and a new object is appended to the list, the new object will have a size smaller than a. "somecond()" has an epsilon for how small objects can be and if they are too small it will return "false"
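As a concrete illustration of the situation described in the question, here is a minimal sketch. The halving rule, the EPSILON value, and the names sizes and visited are assumptions made up for this example; they are not the asker's real code. In CPython the for loop also visits the appended items, and the loop ends once every remaining item fails the condition.

```python
EPSILON = 1.0  # assumed threshold below which objects are "too small"

def somecond(size):
    # Hypothetical condition: the halved child must not be smaller than EPSILON.
    return size / 2 >= EPSILON

sizes = [8.0, 3.0]
visited = []
for a in sizes:
    visited.append(a)
    if somecond(a):
        sizes.append(a / 2)  # the new object is strictly smaller than a

print(sizes)    # [8.0, 3.0, 4.0, 1.5, 2.0, 1.0]
print(visited)  # same as sizes: appended items were visited too
```

Because each appended size is half of its parent, the condition eventually fails for every new item and the loop stops on its own.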

Why don't you just do it the idiomatic C way? This ought to be bullet-proof. Indexing into a Python list is O(1) (CPython lists are arrays, not linked lists), so this is not a "Shlemiel the Painter" algorithm, although an index loop is somewhat slower than a plain for loop. In any case, I tend not to worry about optimization until it becomes clear that a particular section of code is really a problem. First make it work; then worry about making it fast, if necessary.
If you want to iterate over all the elements:
i = 0
while i < len(some_list):
    more_elements = do_something_with(some_list[i])
    some_list.extend(more_elements)
    i += 1
If you only want to iterate over the elements that were originally in the list:
i = 0
original_len = len(some_list)
while i < original_len:
    more_elements = do_something_with(some_list[i])
    some_list.extend(more_elements)
    i += 1

Well, according to http://docs.python.org/tutorial/controlflow.html:
"It is not safe to modify the sequence being iterated over in the loop (this can only happen for mutable sequence types, such as lists). If you need to modify the list you are iterating over (for example, to duplicate selected items) you must iterate over a copy."

You could use islice from itertools to create an iterator over a fixed portion of the list. Then you can append entries to the list without affecting the items you're iterating over:
islice(myarr, 0, len(myarr))
Even better, you don't have to iterate over all the elements: islice also accepts a step argument, so you can move through the list in larger increments.
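A minimal sketch of the islice idea, using len(myarr) as the stop value so that every original element is visited. The even-number test and the `a * 10` value are stand-ins for somecond() and newObj(), chosen only for illustration. Note that this variant does not visit the appended items, which the asker said they need.

```python
from itertools import islice

myarr = [1, 2, 3, 4]
original_len = len(myarr)  # fixed before the loop starts

# islice stops after original_len items, so appended items are never visited.
for a in islice(myarr, 0, original_len):
    if a % 2 == 0:            # stand-in for somecond(a)
        myarr.append(a * 10)  # stand-in for newObj()

print(myarr)  # [1, 2, 3, 4, 20, 40]
```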

In short: if you are absolutely sure all new objects fail the somecond() check, then your code works fine; it just wastes some time iterating over the newly added objects.
Before giving a proper answer, you have to understand why it is considered a bad idea to change a list/dict while iterating over it. When you use a for statement, Python does not take a snapshot of the container; it fetches each item dynamically. Taking a list as an example, Python remembers an index and each time returns l[index] to you. If you are changing l, the result of l[index] can be messy.
NOTE: Here is a stackoverflow question to demonstrate this.
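The index-based behaviour described above can be observed directly. This is a small sketch with made-up data (the names letters and seen are assumptions for the example); it shows how a removal during the loop makes CPython's list iterator skip the next element.

```python
letters = ["a", "b", "c", "d"]
seen = []
for item in letters:
    seen.append(item)
    if item == "b":
        letters.remove(item)  # everything after "b" shifts one slot left

print(seen)     # ['a', 'b', 'd']  ("c" was skipped)
print(letters)  # ['a', 'c', 'd']
```

After "b" is removed, "c" moves into index 1, but the iterator has already advanced to index 2, so "c" is never seen.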
The worst case when adding elements while iterating is an infinite loop. Try (or don't, if you can already spot the bug) the following in a Python REPL:
import random
l = [0]
for item in l:
    l.append(random.randint(1, 1000))
    print(item)
It will print numbers non-stop until memory is used up or the process is killed by the system or the user.
Now that the internal reason is clear, let's discuss the solutions. Here are a few:
1. Make a copy of the original list
Iterate over the original list and modify the copy.
result = l[:]
for item in l:
    if somecond(item):
        result.append(Obj())
2. Control when the loop ends
Instead of handing control to Python, you decide how to iterate over the list:
length = len(l)
for index in range(length):
    if somecond(l[index]):
        l.append(Obj())
Before iterating, calculate the list length, and loop only length times.
3. Store added objects in a new list
Instead of modifying the original list, store the new objects in a new list and concatenate them afterward.
added = [Obj() for item in l if somecond(item)]
l.extend(added)

You can do this:
bonus_rows = []
for a in myarr:
    if somecond(a):
        bonus_rows.append(newObj())
myarr.extend(bonus_rows)

Access your list elements directly by index i. Then you can append to your list:
for i in xrange(len(myarr)):  # range(...) in Python 3
    if somecond(myarr[i]):
        myarr.append(newObj())

Make a copy of your original list and iterate over it; see the modified code below:
for a in myarr[:]:
    if somecond(a):
        myarr.append(newObj())

I had a similar problem today. I had a list of items that needed checking; if the objects passed the check, they were added to a result list. If they didn't pass, I changed them a bit and if they might still work (size > 0 after the change), I'd add them on to the back of the list for rechecking.
I went for a solution like
items = [...what I want to check...]
result = []
while items:
    recheck_items = []
    for item in items:
        if check(item):
            result.append(item)
        else:
            item = change(item)  # Note that this always lowers the integer size(),
                                 # so no danger of an infinite loop
            if item.size() > 0:
                recheck_items.append(item)
    items = recheck_items  # Let the loop restart with these, if any
My list is effectively a queue, so I should probably have used some sort of queue. But my lists are small (like 10 items) and this works too.
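For reference, the same recheck pattern can be sketched with collections.deque as the queue. The check() and change() functions below are hypothetical stand-ins operating on plain integers (the original works on objects with a size() method), so this is an illustration under those assumptions rather than the code actually used.

```python
from collections import deque

def check(n):
    # Hypothetical check: only multiples of 3 pass.
    return n % 3 == 0

def change(n):
    # Hypothetical change: always lowers the value, so the loop terminates.
    return n - 1

queue = deque([7, 3, 5])
result = []
while queue:
    item = queue.popleft()      # O(1) removal from the front
    if check(item):
        result.append(item)
    else:
        item = change(item)
        if item > 0:
            queue.append(item)  # recheck later, at the back of the queue

print(result)  # [3, 6, 3]
```

A deque avoids both mutating a list during a for loop and rebuilding a recheck list on every pass, at the cost of processing items in a slightly different order.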

You can use an index and a while loop instead of a for loop if you want the loop to also cover the elements that are added to the list during the loop:
i = 0
while i < len(myarr):
    a = myarr[i]
    i = i + 1
    if somecond(a):
        myarr.append(newObj())

Expanding S.Lott's answer so that new items are processed as well:
todo = myarr
done = []
while todo:
    added = []
    for a in todo:
        if somecond(a):
            added.append(newObj())
    done.extend(todo)
    todo = added
The final list is in done.

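Alternate solution (on Python 3, reduce must first be imported from functools). The expression returns the combined list; myarr itself is not modified:
reduce(lambda acc, a: acc + [newObj()] if somecond(a) else acc, myarr, myarr)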

Assuming you are adding at the end of the list arr, you can try this method, which I often use:
arr = [...The list I want to work with]
current_length = len(arr)
i = 0
while i < current_length:
    current_element = arr[i]
    do_something(current_element)
    # Time to insert
    insert_count = 1  # How many items you are adding at the end
    arr.append(item_to_be_inserted)
    # IMPORTANT: increase the current limit and the indexer
    i += 1
    current_length += insert_count
This is just boilerplate; if you run it as is, your program will freeze in an infinite loop, because an item is appended on every pass. DO NOT FORGET TO TERMINATE THE LOOP, unless an endless loop is what you need.

Related

Python - leave only n items in a list

There are many ways to remove n items from the list, but I couldn't find a way to keep n items.
lst = ["ele1", "ele2", "ele3", "ele4", "ele5", "ele6", "ele7", "ele8", "ele9", "ele10"]
n = 5
lst = lst[:len(lst)-(len(lst)-n)]
print(lst)
So I tried to solve it in the same way as above, but the problem is that the value of 'lst' always changes in the work I am trying to do, so that method is not valid.
I want to know how to leave only n elements in a list and remove all elements after that.
The simplest/fastest solution is:
del lst[n:]
which tells it to delete any elements index n or above (implicitly keeping 0 through n - 1, a total of n elements).
If you must preserve the original list (e.g. maybe you received it as an argument, and it's poor form to change what they passed you most of the time), you can just reverse the approach (slice out what you want to keep, rather than remove what you want to discard) and do:
truncated = lst[:n]  # lst keeps the long form; truncated holds the short form
so you have both the long and short form. Or, if you don't need the original list anymore, but it might be aliased elsewhere and you want the aliases unmodified:
lst = lst[:n] # Replaces your local alias, but leaves other aliases unchanged
Very similar to ShadowRanger's solution, but using a slice assignment on the left-hand side of the assignment operator:
lst[n:] = []
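The difference between truncating in place and rebinding the name shows up when the list has a second reference. A small sketch (the alias name and the sample data are assumptions for illustration):

```python
lst = list(range(10))
alias = lst            # second name for the same list object
n = 5

truncated = lst[:n]    # new list; lst and alias still have 10 items
lengths_after_slice = (len(truncated), len(lst), len(alias))

del lst[n:]            # in place; alias sees the change too
lengths_after_del = (len(lst), len(alias))

lst = lst[:3]          # rebinds lst only; alias keeps its 5 items
lengths_after_rebind = (len(lst), len(alias))

print(lengths_after_slice)   # (5, 10, 10)
print(lengths_after_del)     # (5, 5)
print(lengths_after_rebind)  # (3, 5)
```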

Python - Pop element from for loop and append to end, modifying list while looping

For every element in a for loop, I want to check if the element satisfies some condition. If yes, I want to do something to it; if no, I want to add it to the end of the list and do it later.
I'm aware that modifying lists while looping is bad, but I want to do it anyway, and want to know how to do it correctly.
for i in list:
    if something(i):
        do(i)
    else:
        # append to end of list, do later
        list.remove(i)
        list.append(i)
This piece of code mostly works, but causes me to skip the element after the removed i while iterating. How do I work around that?
I'm pretty sure there is no correct way to modify a list while looping over a list. In particular, removing objects from a list will cause issues (see strange result when removing item from a list).
I would consider making additional lists, one for your results, one temp list for the elements you have left to process. Note that in my example, tmp1 is your input list.
tmp1 = [values to process]
tmp2 = []
results = []
while tmp1:
    for i in tmp1:
        if something(i):
            results.append(i)
            do(i)
        else:
            tmp2.append(i)
    tmp1 = tmp2
    tmp2 = []
Realized just after asking that I don't even need to remove if I'm only ever going to iterate through the list once - if I just append, I'll be able to operate on the element once everything else is finished.
In other words, this works:
for i in list:
    if something(i):
        do(i)
    else:
        # append to end of list, do later
        list.append(i)

Python2.7 - list.remove(item) within a loop gives unexpected behaviour [duplicate]

I want to remove all even numbers in a list. But something confused me...
This is the code.
lst = [4,4,5,5]
for i in lst:
    if i % 2 == 0:
        print i
        lst.remove(i)
print lst
It prints [4, 5, 5]. Why not [5, 5]?
It should be like this:
for i in lst[:]:
    if i % 2 == 0:
        print i
        lst.remove(i)
print lst
Problem:
You are modifying the list while you iterate over it. Each removal shifts the remaining elements one position to the left, so the loop skips the element that follows the removed one and ends before every element has been checked.
Solution:
You could iterate over a copy of the list.
You could use a list comprehension:
lst = [i for i in lst if i % 2 != 0]
By using list.remove, you are modifying the list during the iteration. This breaks the iteration giving you unexpected results.
One solution is to create a new list using either filter or a list comprehension:
>>> filter(lambda i: i % 2 != 0, lst)
[5, 5]
>>> [i for i in lst if i % 2 != 0]
[5, 5]
You can assign either expression to lst if needed, but you can't avoid creating a new list object with these methods.
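If other names refer to the same list, rebinding lst leaves them pointing at the old contents. One option, sketched here as an aside rather than taken from the answers above, is slice assignment: it still builds a temporary list, but writes the result back into the existing list object. The name same_object is an assumption used only to show that the object's identity is preserved.

```python
lst = [4, 4, 5, 5]
same_object = lst  # second reference, to show the object is unchanged

# The comprehension builds a temporary list; the slice assignment then
# copies its contents into the existing list object.
lst[:] = [i for i in lst if i % 2 != 0]

print(lst)                 # [5, 5]
print(same_object is lst)  # True
```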
Other answers have already mentioned that you're modifying the list while iterating over it, and offered better ways to do it. Personally I prefer the list comprehension method:
odd_numbers = [item for item in numbers if item % 2 != 0]
For your specified case of a very small list, I would definitely go with that.
However, this does create a new list, which could be a problem if you have a very large list. In the case of integers, large probably means millions at least, but to be precise, it's however large it needs to be to start giving you issues with memory usage. In that case, here are a couple ways to do it.
One way is similar to the intent of the code in your question. You iterate over the list, removing the even numbers as you go. However, to avoid the problems that modifying a list you're iterating over can cause, you iterate over it backwards. There are ways to iterate forward, but this is simpler.
Here's one way using a while loop:
# A one hundred million item list that we don't want to copy
# even just the odd numbers from to put into a new list.
numbers = range(100000000)  # list(range(100000000)) in Python 3
index = len(numbers) - 1  # Start on the index of the last item
while index >= 0:
    if numbers[index] % 2 == 0:
        numbers.pop(index)
    index -= 1
Here's another way using a for loop:
# A one hundred million item list that we don't want to copy
# even just the odd numbers from to put into a new list.
numbers = range(100000000)  # list(range(100000000)) in Python 3
for index in xrange(len(numbers) - 1, -1, -1):  # range(...) in Python 3
    if numbers[index] % 2 == 0:
        numbers.pop(index)
Notice that in both the while loop and for loop versions, I used numbers.pop(index), not numbers.remove(numbers[index]). First of all, .pop() is more efficient because it is given the index, whereas .remove() has to search the list for the first occurrence of the value. Second, notice that I said "first occurrence of the value". That means that unless every item is unique, .remove() may remove a different item than the one the loop is currently on, leaving the current item in the list.
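The difference between the two calls is easy to see on a list with duplicate values. A minimal sketch with made-up data (values, by_remove, and by_pop are names chosen for this example):

```python
values = [1, 2, 1, 3]
index = 2  # we want to drop the second 1

by_remove = list(values)
by_remove.remove(by_remove[index])  # searches from the front, drops index 0

by_pop = list(values)
by_pop.pop(index)                   # drops exactly index 2

print(by_remove)  # [2, 1, 3]
print(by_pop)     # [1, 2, 3]
```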
There's one more solution I want to mention, for situations where you need to keep the original list, but don't want to use too much more memory to store a copy of the odd numbers. If you only want to iterate over the odd numbers once (or you're so averse to using memory that you'd rather recalculate things when you need to), you can use a generator. Doing so would let you iterate over the odd numbers in the list without needing any additional memory, apart from the inconsequential amount used by the generator mechanism.
A generator expression is defined exactly like a list comprehension, except that it's enclosed in parentheses instead of square brackets:
odd_numbers = (item for item in numbers if item % 2 != 0)
Remember that the generator expression is iterating over the original list, so changing the original list mid-iteration will give you the same problems as modifying a list while iterating over it in a for loop. In fact, the generator expression is itself using a for loop.
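A short sketch of that laziness, using a small made-up list: the generator reads from the original list only as items are requested, so an append made partway through is picked up by the remaining iteration.

```python
numbers = [1, 2, 3, 4, 5]
odd_numbers = (item for item in numbers if item % 2 != 0)

first = next(odd_numbers)  # only the first match has been read so far
numbers.append(7)          # the generator will reach this item later
rest = list(odd_numbers)   # consumes whatever is left in the list

print(first)  # 1
print(rest)   # [3, 5, 7]
```

Once exhausted, the generator cannot be restarted; a new generator expression is needed for a second pass.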
As an aside, generator expressions shouldn't be relegated only to very large lists; I use them whenever I don't need to calculate a whole list in one go.
Summary / TLDR:
The "best" way depends exactly what you're doing, but this should cover a lot of situations.
Assuming lists are either "small" or "large":
If your list is small, use the list comprehension (or even the generator expression if you can). If it's large, read on.
If you don't need the original list, use the while loop or for loop methods to remove the even numbers entirely (though using .pop(), not .remove()). If you do need the original list, read on.
If you're only iterating over the odd numbers once, use the generator expression. If you're iterating over them more than once, but you're willing to repeat computation to save memory, use the generator expression.
If you're iterating over the odd numbers too many times to recompute them each time, or you need random access, then use a list comprehension to make a new list with only the odd numbers in them. It's going to use a lot of memory, but them's the breaks.
As a general principle, you should not modify a collection while you are iterating over it. Doing so leads to skipped elements and, in some cases, an IndexError.
Instead of removing elements from the list, it is easier to build a new list and rebind the same name to it. It has lower time complexity too, since each remove() call is itself a linear scan.
lst = filter(lambda i: i % 2 != 0, lst)  # wrap in list(...) on Python 3

Python: Control Flow

An issue with my existing code. Code goes:
example_dic = {'name': 'jim','value': 4}
list_of_dic = [example_dic,dic2,dic3,...]
empty_list = [] #will be filled with multiple dictionaries all in same format/same keys
key_sum = sum(blah['value'] for blah in empty_list) #tested this with a filled in "list_of_dic", works as expected
if not empty_list or key_sum < arbitrary_value:
    for things in list_of_dic[:]:
        if case1:
            empty_list.append(things)
            list_of_dic.remove(things)
        elif case2:
            empty_list.append(things)
            list_of_dic.remove(things)
        else:
            pass
Problem is that key_sum does not get updated ever even though things are being appended onto empty_list. As I said in the comments, I know the key_sum line works because I tried it by filling in the list of dictionaries with random stuff first.
What I want is that items will keep being added onto list_of_dic only while key_sum < arbitrary. If for example I want key_sum < 20, if the next item causes key_sum >= 20, I do not want it to be added at all, not simply break and end after it's already been added. I also do not want the code to end there, if there is a list of 10 items and the 1st one has value = 22 I don't want the whole thing to stop, I want it to keep going through the rest, adding items on until it cannot add anymore that wouldn't cause key_sum >= 20.
Simpler answer would be, is there any other language which doesn't require such unnecessary complication for what seems like a very simple task?
There are a couple of issues with this. One is that your code assumes that key_sum gets automatically updated when you change empty_list, but that's not the case. It just gets calculated once. You'll need to recalculate key_sum on every iteration, or if you're really worried about efficiency, increment the key_sum every time you append to empty_list. It also seems like you want to check the value of key_sum on every iteration of your for loop, rather than only after you've iterated over the entire list_of_dic.
The second issue is that you're removing items from list_of_dic while you iterate over it. This is unsafe in Python, and generally results in certain elements of your iterable being skipped over. Instead, you need to iterate over a copy of the list.
Summarizing the changes:
for things in list_of_dic[:]:  # Iterate over a copy of list_of_dic
    do_append = False
    if case1:
        do_append = True
    elif case2:
        do_append = True
    if do_append:
        if (key_sum + things['value']) >= arbitrary_value:
            continue
        empty_list.append(things)
        list_of_dic.remove(things)
        key_sum += things['value']
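For a version that can be run as is, here is a sketch of the same idea wrapped in a function. The function name select_under_limit, the sample dictionaries, and the limit of 20 are assumptions for illustration, and the case1/case2 tests are left out because their definitions are not given in the question.

```python
def select_under_limit(candidates, limit):
    """Move items from candidates into a new list while the sum of their
    'value' fields stays below limit. Items that would reach the limit
    are skipped, and later items are still considered."""
    chosen = []
    key_sum = 0
    for item in candidates[:]:       # iterate over a copy
        if key_sum + item['value'] >= limit:
            continue                 # would reach the limit: skip it
        chosen.append(item)
        candidates.remove(item)      # safe: we are looping over the copy
        key_sum += item['value']     # keep the running total up to date
    return chosen, key_sum

list_of_dic = [
    {'name': 'jim', 'value': 22},
    {'name': 'ann', 'value': 4},
    {'name': 'bob', 'value': 10},
    {'name': 'sue', 'value': 7},
]
chosen, total = select_under_limit(list_of_dic, 20)
print([d['name'] for d in chosen], total)  # ['ann', 'bob'] 14
print([d['name'] for d in list_of_dic])    # ['jim', 'sue']
```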

What is the fastest way to add data to a list without duplication in python (2.5)

I have about half a million items that need to be placed in a list. I can't have duplicates, and if an item is already there I need to get its index. So far I have:
if Item in List:
    ItemNumber=List.index(Item)
else:
    List.append(Item)
    ItemNumber=List.index(Item)
The problem is that as the list grows it gets progressively slower until at some point it just isn't worth doing. I am limited to python 2.5 because it is an embedded system.
You can use a set (in CPython since version 2.4) to efficiently look up duplicate values. If you really need an indexed system as well, you can use both a set and list.
Doing your lookups using a set will remove the overhead of if Item in List, but not that of List.index(Item)
Please note ItemNumber=List.index(Item) will be very inefficient to do after List.append(Item). You know the length of the list, so your index can be retrieved with ItemNumber = len(List)-1.
To completely remove the overhead of List.index (because that method will search through the list - very inefficient on larger sets), you can use a dict mapping Items back to their index.
I might rewrite it as follows:
# earlier in the program, NOT inside the loop
Dup = {}
# inside your loop to add items:
if Item in Dup:
    ItemNumber = Dup[Item]
else:
    List.append(Item)
    Dup[Item] = ItemNumber = len(List)-1
If you really need to keep the data in an array, I'd use a separate dictionary to keep track of duplicates. This requires twice as much memory, but won't slow down significantly.
existing = dict()
if Item in existing:
    ItemNumber = existing[Item]
else:
    ItemNumber = existing[Item] = len(List)
    List.append(Item)
However, if you don't need to save the order of items you should just use a set instead. This will take almost as little space as a list, yet will be as fast as a dictionary.
Items = set()
# ...
Items.add(Item) # will do nothing if Item is already added
Both of these require that your object is hashable. In Python, most types are hashable unless they are a container whose contents can be modified. For example: lists are not hashable because you can modify their contents, but tuples are hashable because you cannot.
If you were trying to store values that aren't hashable, there isn't a fast general solution.
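The list-plus-dictionary approach described above can be sketched as a small helper. The function name index_of and the sample strings are assumptions for illustration. Unhashable items such as lists would first need converting to a hashable form, for example a tuple, which only works if their contents are hashable too.

```python
def index_of(item, items, positions):
    """Return the index of item in items, appending it first if it is new.
    positions maps each item to its index, so lookups avoid list.index()."""
    if item not in positions:
        positions[item] = len(items)  # index the item will get once appended
        items.append(item)
    return positions[item]

items, positions = [], {}
numbers = [index_of(x, items, positions) for x in ["a", "b", "a", "c", "b"]]

print(items)    # ['a', 'b', 'c']
print(numbers)  # [0, 1, 0, 2, 1]
```

Each call costs one dictionary lookup on average, regardless of how long the list has grown.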
You can improve the check a lot:
check = set(List)
for Item in NewList:
    if Item in check:
        ItemNumber = List.index(Item)
    else:
        ItemNumber = len(List)
        List.append(Item)
        check.add(Item)  # keep the set in sync so later duplicates are caught
Or, even better, if order is not important you can do this:
oldlist = set(List)
addlist = set(AddList)
newlist = list(oldlist | addlist)
And if you need to loop over the items that were duplicated:
for item in (oldlist & addlist):
    pass  # do stuff
