As far as I know, the reversed() function returns an iterator that works just like iter() but yields the items in reverse order. However, I ran into some strange behavior from the object returned by reversed().
By looking at:
lst = ['a', 'b', 'c', 'd']
iter_lst = iter(lst)
lst.remove('c')
print(list(iter_lst))
output : ['a', 'b', 'd']
It's just as expected. But:
lst = ['a', 'b', 'c', 'd']
rev_iter_lst = reversed(lst)
lst.remove('c')
print(list(rev_iter_lst))
output : []
Shouldn't it be : ['d', 'b', 'a'] ?
Is it something in the implementation of the __reversed__() method on the list object, or of the __next__() method on the iterator object, that prevents this? I mean, if something changes in the original list, maybe it won't produce the reversed sequence...
Update: I've posted an answer with a possible fix here. I've tested it, but I don't know whether there is a situation where this implementation would give an unexpected result.
According to the list.__reversed__ source code, the reversed iterator remembers the index of the last element and starts iterating from there.
When you remove an item, all subsequent elements shift down by one index, so the remembered index points past the end of the list; there is nothing left to iterate over, and you get an empty list.
Let me describe in more detail:
Consider the following list: lst = ['a','b','c']
The character 'c' sits at index 2, so when you create a reversed iterator, it remembers 2 as its starting point.
In the next step, we remove 'b'; now the character 'c' is at index 1.
When you ask the iterator to iterate, it starts looking at index 2. What will it find there? Literally nothing, so it obviously produces an empty list.
I hope this can be helpful :)
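As a quick check of this index-based behavior (my own illustration, not from the question): a partially consumed reversed iterator can even re-yield an element after a removal, because only the saved index matters:

```python
lst = ['a', 'b', 'c', 'd']
rev = reversed(lst)

print(next(rev))   # 'd' -- the saved index is now 2

# Removing an earlier element shifts everything left by one,
# but the iterator's saved index is unchanged.
lst.remove('a')    # lst is now ['b', 'c', 'd']

print(list(rev))   # ['d', 'c', 'b'] -- 'd' shows up a second time
```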
So after the discussion with @kkasra12, I ended up implementing my own pseudo-list object (without all the necessary checks) to mimic list's behavior, focusing only on the reversed() operation. Here is my class:
class MyList:
    def __init__(self, n):
        self.length = n
        self._seq = list(range(n))

    @property
    def seq(self):
        return self._seq

    def __len__(self):
        return self.length

    def __getitem__(self, item):
        return self._seq[item]

    def __setitem__(self, idx, value):
        self._seq[idx] = value

    def __reversed__(self):
        return ReverseIterator(self)

    def __str__(self):
        return str(self._seq)

    def append(self, v):
        self._seq.append(v)
        self.length += 1

    def remove(self, v):
        self._seq.remove(v)
        self.length -= 1
And my ReverseIterator:
class ReverseIterator:
    def __init__(self, org):
        self.org = org
        self._index = org.length

    def __iter__(self):
        return self

    def __next__(self):
        if 0 < self._index:
            try:
                item = self.org.seq[self._index - 1]
                self._index -= 1
                return item
            except IndexError:
                raise StopIteration()
        else:
            raise StopIteration()
The result:
obj = MyList(6)
iter_obj = iter(obj)
obj.remove(2)
print(list(iter_obj))
print('-----------------------')
obj = MyList(6)
rev_iter_obj = reversed(obj)
obj.remove(2)
print(list(rev_iter_obj))
output :
[0, 1, 3, 4, 5]
-----------------------
[]
By commenting out those remove statements above, we can see that it works like the original list object.
Then I created a new SmartReverseIterator which can handle an item being removed from the original object and can generate the values on the fly, just like iter() worked on the list in the OP.
The only thing to consider is that if an item has been removed (so self._index points past the end of the original object), self._index should be reset.
class SmartReverseIterator:
    def __init__(self, org):
        self.org = org
        self._index = org.length

    def __iter__(self):
        return self

    def __next__(self):
        if 0 < self._index:
            try:
                item = self.org.seq[self._index - 1]
                return item
            except IndexError:
                self._index = self.org.length
                item = self.org.seq[self._index - 1]
                return item
            finally:
                self._index -= 1
        else:
            raise StopIteration()
By changing the __reversed__ method on MyList to return this new iterator, the result is going to be:
obj = MyList(6)
iter_obj = iter(obj)
obj.remove(2)
print(list(iter_obj))
print('-----------------------')
obj = MyList(6)
rev_iter_obj = reversed(obj)
obj.remove(2)
print(list(rev_iter_obj))
Output:
[0, 1, 3, 4, 5]
-----------------------
[5, 4, 3, 1, 0]
I wanted to know whether there is any downside to this; in other words, why did Python decide not to implement the __reversed__ method on list objects like this, so that it would generate values exactly the way iter() does when an item is removed?
In which situations would we see issues?
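One situation where this "smart" behavior arguably misbehaves (my own sketch, modelling SmartReverseIterator over a plain list rather than MyList): removing an element *before* the cursor shifts the remaining items left, and an already-yielded element comes back:

```python
class SmartReverseIterator:
    """Minimal model over a plain list of the class described above."""
    def __init__(self, seq):
        self.seq = seq
        self._index = len(seq)

    def __iter__(self):
        return self

    def __next__(self):
        if self._index > 0:
            try:
                return self.seq[self._index - 1]
            except IndexError:
                # The list shrank past our cursor: restart from the new end.
                self._index = len(self.seq)
                return self.seq[self._index - 1]
            finally:
                self._index -= 1
        raise StopIteration

lst = [0, 1, 2, 3, 4, 5]
it = SmartReverseIterator(lst)
print(next(it), next(it))  # 5 4
lst.remove(0)              # removal before the cursor shifts everything left
print(list(it))            # [4, 3, 2, 1] -- 4 is yielded twice overall
```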
The list call will iterate the reverse iterator, whose index < PyList_GET_SIZE(seq) check here will fail because you shrunk seq in the meantime, and thus won't yield a value but stop:
listreviter_next(listreviterobject *it)
{
    /* (some checks) */
    index = it->it_index;
    if (index >= 0 && index < PyList_GET_SIZE(seq)) {
        /* (decrease the index and return the element) */
    }
    /* (stop the iteration) */
}
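The same bounds check also explains the opposite direction (a quick illustration of mine): growing the list does not extend a reversed iterator, because its start index was captured at creation time:

```python
lst = [1, 2, 3]
rev = reversed(lst)

lst.append(4)      # the iterator's start index is still 2
print(list(rev))   # [3, 2, 1] -- the appended 4 is never visited
```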
Related
I have no idea what the difference is between these two pieces of code. When I run them through the scoring Python code, mine is marked wrong. I would appreciate it if you could tell me the difference between using a variable to make a new empty list and appending values, versus just making a list with the values in it. The first code is mine and the second is the code from the answer sheet. Thank you very much :)
class Server:
    def __init__(self):
        self.list = []

    def makeOrder(self, orderNum, orderList):
        existance = False
        for order in self.list:
            if order[0] == orderNum:
                existance = True
        if existance == True:
            return -1
        else:
            self.list.append([orderNum, orderList])
            return [orderNum, orderList]
class Server:
    def __init__(self):
        self.q = []

    # 1
    def makeOrder(self, orderNumber, orderList):
        existAlready = False
        for order in self.q:
            if order[0] == orderNumber:
                existAlready = True
        if existAlready == True:
            return -1
        else:
            tmp = []
            tmp.append(orderNumber)
            tmp.append(orderList)
            self.q.append(tmp)
            return tmp
Functionally, these are both very similar. The only difference of substance I see (other than obvious variable names, etc) is in the return value.
In the first option, you can see that one list is appended to self.list, and a new (albeit identical in value) list is returned. This means the returned list is a different object from the stored one.
self.list.append([orderNum,orderList])
return [orderNum,orderList]
However in the second option, you can clearly see that tmp is both pushed AND returned:
tmp = []
tmp.append(orderNumber)
tmp.append(orderList)
self.q.append(tmp)
return tmp
This is a single object that gets appended and returned.
Fundamentally, this means that any modification to the returned list in the second option will be reflected inside self.q, while in the first option the returned value is wholly independent. If you wanted to more or less replicate the behavior of option 1, you can replace:
return tmp
with
return list(tmp)
Keep in mind, though, that if orderList is itself a mutable list, the same behavior will occur if you modify the returned value's [1] element. Since it is a reference to a data structure, modifying that inner list will also affect self.q (or self.list in the first option).
Example:
>>> class Foo:
... def __init__(self):
... self.q = []
... def bar(self, i, l):
... tmp = [i, l]
... self.q.append(tmp)
... return list(tmp)
...
>>> f = Foo()
>>> f.q
[]
>>> x = f.bar(1, [1,2,3])
>>> x
[1, [1, 2, 3]]
>>> x[1].append(4)
>>> x
[1, [1, 2, 3, 4]]
>>> f.q # it changed here as well
[[1, [1, 2, 3, 4]]]
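If you also wanted the nested list to be independent, one option (a sketch using the standard copy module, reusing the toy Foo class from above) is to deep-copy the returned value:

```python
import copy

class Foo:
    def __init__(self):
        self.q = []

    def bar(self, i, l):
        tmp = [i, l]
        self.q.append(tmp)
        return copy.deepcopy(tmp)  # fully independent copy, nested lists included

f = Foo()
x = f.bar(1, [1, 2, 3])
x[1].append(4)      # mutate the nested list on the returned value
print(f.q)          # [[1, [1, 2, 3]]] -- internal state is unaffected
```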
In Python I have an iterator called Sampler that returns an infinite stream of indices in a fixed range [0, N]. Actually I have a list of these, and each returns indices in the range [0, N_0], [N_0, N_1], ..., [N_{n-1}, N_n].
What I now want to do is first select one of these iterators based on the length of their range, so I have a weights list [N_0, N_1 - N_0, ...] and I select one of these with:
iterator_idx = random.choices(range(len(weights)), weights=weights/weights.sum())[0]
Next, what I want to do is create an iterator which randomly selects one of the iterators and selects a batch of M samples.
class BatchSampler:
    def __init__(self, M):
        self.M = M
        self.weights = [weight_list]
        self.samplers = [list_of_iterators]
        self._batch_samplers = [
            self.batch_sampler(sampler) for sampler in self.samplers
        ]

    def batch_sampler(self, sampler):
        batch = []
        for batch_idx in sampler:
            batch.append(batch_idx)
            if len(batch) == self.M:
                yield batch
        if len(batch) > 0:
            yield batch

    def __iter__(self):
        # First select one of the datasets.
        iterator_idx = random.choices(
            range(len(self.weights)), weights=self.weights / self.weights.sum()
        )[0]
        return self._batch_samplers[iterator_idx]
The issue with this is that __iter__() only seems to be called once, so iterator_idx is only selected the first time. Obviously this is wrong... What is the way around this?
This is a possible use case when you have multiple datasets in PyTorch but want to sample batches from only one of the datasets at a time.
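For what it's worth, one possible workaround (a sketch of mine under the question's assumptions; the sampler and weight arguments are placeholders) is to make __iter__ itself a generator that re-picks a sampler for every batch it yields:

```python
import itertools
import random

class BatchSampler:
    def __init__(self, M, samplers, weights):
        self.M = M
        self.samplers = samplers
        self.weights = weights
        # One lazily-consumed batch generator per underlying sampler.
        self._batch_samplers = [self._batches(s) for s in samplers]

    def _batches(self, sampler):
        batch = []
        for idx in sampler:
            batch.append(idx)
            if len(batch) == self.M:
                yield batch
                batch = []

    def __iter__(self):
        while True:  # the underlying samplers are infinite
            i = random.choices(range(len(self.weights)), weights=self.weights)[0]
            yield next(self._batch_samplers[i])

# Two infinite index streams standing in for the question's samplers.
bs = BatchSampler(2, [itertools.count(0), itertools.count(100)], [1, 1])
batches = list(itertools.islice(iter(bs), 3))
print(batches)  # e.g. [[0, 1], [100, 101], [2, 3]], depending on the draws
```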
Seems to me that you want to define your own container type. I'll try to provide examples of a few standard ways to do so (hopefully without missing too many details); you should be able to reuse one of these simple examples in your own class.
Using just __getitem__ (support indexing & looping):
object.__getitem__
Called to implement evaluation of self[key].
class MyContainer:
    def __init__(self, sequence):
        self.elements = sequence  # Just something to work with.

    def __getitem__(self, key):
        # If we're delegating to sequences like the built-in list,
        # invalid indices are handled automatically by them
        # (throwing IndexError, as per the documentation).
        return self.elements[key]
t = (1, 2, 'a', 'b')
c = MyContainer(t)
elems = [e for e in c]
assert elems == [1, 2, 'a', 'b']
assert c[1:-1] == t[1:-1] == (2, 'a')
Using the iterator protocol:
object.__iter__
object.__iter__(self)
This method is called when an iterator is required for a container. This method should return a new iterator object that can iterate over all the objects in the container. For mappings, it should iterate over the keys of the container.
Iterator objects also need to implement this method; they are required to return themselves. For more information on iterator objects, see Iterator Types.
Iterator Types
container.__iter__()
Return an iterator object. The object is required to support the iterator protocol described below.
The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:
iterator.__iter__()
Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements.
iterator.__next__()
Return the next item from the container. If there are no further items, raise the StopIteration exception.
Once an iterator's __next__() method raises StopIteration, it must continue to do so on subsequent calls.
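A quick demonstration of that last guarantee (my own example): once a list iterator is exhausted, every further next() call raises StopIteration again:

```python
it = iter([1])
print(next(it))        # 1

for _ in range(3):
    try:
        next(it)
    except StopIteration:
        print('StopIteration')   # raised on every subsequent call
```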
class MyContainer:
    class Iter:
        def __init__(self, container):
            self.cont = container
            self.pos = 0
            self.len = len(container.elements)

        def __iter__(self):
            return self

        def __next__(self):
            if self.pos == self.len:
                raise StopIteration
            curElem = self.cont.elements[self.pos]
            self.pos += 1
            return curElem

    def __init__(self, sequence):
        self.elements = sequence  # Just something to work with.

    def __iter__(self):
        return MyContainer.Iter(self)
t = (1, 2, 'a', 'b')
c = MyContainer(t)
elems = [e for e in c]
assert elems == [1, 2, 'a', 'b']
Using a generator:
Generator Types
Python's generators provide a convenient way to implement the iterator protocol. If a container object's __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and __next__() methods.
generator
A function which returns a generator iterator. It looks like a normal function except that it contains yield expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function.
Usually refers to a generator function, but may refer to a generator iterator in some contexts.
generator iterator
An object created by a generator function.
6.2.9. Yield expressions
Using a yield expression in a function's body causes that function to be a generator
class MyContainer:
    def __init__(self, sequence):
        self.elements = sequence  # Just something to work with.

    def __iter__(self):
        for e in self.elements:
            yield e
t = (1, 2, 'a', 'b')
c = MyContainer(t)
elems = [e for e in c]
assert elems == [1, 2, 'a', 'b']
Using iter in our for loops makes our Python programs efficient. How does it actually work?
I tried to visualize iter(iterables) at "http://www.pythontutor.com/visualize.html#mode=display"; there, iter helps to create an instance.
Doesn't it actually refer to internal numerical objects?
val = [1,2,3,4,5]
val = iter(val)
for item in val:
print(item)
val = [1,2,3,4,5]
for item in val:
print(item)
Both return the same output. But how does iter identify the values?
What you're doing is redundant. A for loop is essentially syntactic sugar for:
val = [1, 2, 3, 4, 5]
iterator = iter(val)
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break
    print(item)
All that you're doing by calling iter() on your val is replacing
iterator = iter(val)
with
iterator = iter(iter(val))
Calling iter() on an iterator is a no-op: it returns the same iterator.
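This is easy to verify (my own snippet): iter() on an iterator returns the very same object, while a fresh call on the list makes a new one:

```python
val = [1, 2, 3]
it = iter(val)

print(iter(it) is it)    # True  -- iterators return themselves
print(iter(val) is it)   # False -- each iter(list) call makes a new iterator
```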
Let's pretend that Python lists were not iterable and we wanted to create a class, MyList, that could be constructed from a list instance and iterate over it. MyList would need to implement an __iter__ method that returns an iterator object implementing the __next__ method. Each successive call to this __next__ method should return the next element of the list. When there are no more elements to return, it needs to raise a StopIteration exception. This iterator object clearly needs two pieces of information:
A reference to the list being iterated.
The index of the last element that was output.
Let's call this iterator class MyListIterator. Its implementation might be:
class MyListIterator:
    def __init__(self, l):
        self.l = l       # the list being iterated
        self.index = -1  # the index of the last element outputted

    def __next__(self):
        self.index += 1  # next index to output
        if self.index < len(self.l):
            return self.l[self.index]
        raise StopIteration
Class MyList then would be:
class MyList:
    def __init__(self, the_list):
        self.l = the_list  # we are constructed with a list

    def __iter__(self):
        return MyListIterator(self.l)  # pass our list to the iterator
An example of use:
l = MyList([0, 1, 2])
for i in l:
    for j in l:
        print(i, j)
Prints:
0 0
0 1
0 2
1 0
1 1
1 2
2 0
2 1
2 2
I have checked some ideas about the reason for this problem, investigated below...
"Too many values to unpack" Exception
(Stefano Borini's explanation)
But here I am iterating through a list in a list comprehension and moving the result into a list...!
So the number of inputs matches the number of output variables, i.e. tempList...
Then what is wrong with the process?!
def DoProcess(self, myList):
    tempList = []
    tempList = [[x, y, False] for [x, y] in myList]
    return tempList
Edit 1: myList is a list of lists, just like [[x1, y1], [x2, y2], [x3, y3], [x4, y4]].
class Agent(object):
    def __init__(self, point=None):
        self.locationX = point.x
        self.locationY = point.y

    def __iter__(self):
        return self

    def __next__(self):
        return [self.locationX, self.locationY]

    def __getItem__(self):
        return [self.locationX, self.locationY]

    def GenerateAgents(self, numberOfAgents):
        agentList = []
        while len(agentList) < numberOfAgents:
            point = Point.Point()
            point.x = random.randint(0, 99)
            point.y = random.randint(0, 99)
            agent = Agent(point)
            agentList.append(agent)
        return agentList

    def DoProcess(self, myList):
        tempList = []
        tempList = [[x[0], x[1], False] for x in myList]
        return myList
And each Point has two attributes, locationX and locationY...
Your implementation of Agent is severely flawed; you created an infinite generator:
def __iter__(self):
return self
def __next__(self):
return [self.locationX, self.locationY]
This will forever yield lists with two values. Trying to use this object in a tuple assignment will yield at least 3 such values (2 for the x and y targets, plus one more for Python to detect that there were more values to unpack than requested). What Python does is call __next__ each time it needs another value in the sequence, and your code just returns [x, y] each time. For ever and ever until eternity.
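Here's a minimal reproduction of that failure mode (my own stripped-down sketch, not the question's full class):

```python
class Forever:
    def __iter__(self):
        return self

    def __next__(self):                 # never raises StopIteration
        return [0, 0]

try:
    x, y = Forever()                    # tuple assignment keeps asking for more
except ValueError as e:
    print(e)                            # too many values to unpack (expected 2)
```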
The __iter__ method should return an actual iteration over the two values instead:
def __iter__(self):
    for value in (self.locationX, self.locationY):
        yield value
or even just
def __iter__(self):
    yield self.locationX
    yield self.locationY
dropping the __next__ altogether. The above generator will then yield two values then raise StopIteration properly, and work with tuple assignment.
The __getitem__ method is spelled all lowercase and takes an index argument:
def __getitem__(self, index):
    return (self.locationX, self.locationY)[index]
Now 0 maps to locationX and 1 to locationY.
Rewriting your code with those changes:
class Agent(object):
    def __init__(self, point):
        self.locationX = point.x
        self.locationY = point.y

    def __iter__(self):
        yield self.locationX
        yield self.locationY

    def __getitem__(self, index):
        return (self.locationX, self.locationY)[index]

    def GenerateAgents(self, numberOfAgents):
        agentList = []
        for _ in range(numberOfAgents):
            point = Point.Point()
            point.x = random.randint(0, 99)
            point.y = random.randint(0, 99)
            agent = Agent(point)
            agentList.append(agent)
        return agentList

    def DoProcess(self, myList):
        return [[x, y, False] for x, y in myList]
Your list needs to contain nested iterables of length two to which x and y are unpacked.
The proper way to write your __getitem__ method is as follows:
def __getitem__(self, index):
    if index == 0:
        return self.locationX
    if index == 1:
        return self.locationY
    raise IndexError()
Note that it takes an index as an argument and is written __getitem__, not __getItem__. Without the IndexError, Python would try to unpack as many values as possible until __getitem__ raises an IndexError.
Note that you can simplify your code by adding a clause for index 2 that returns False.
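That suggested simplification might look like this (a sketch; the field names are taken from the question, but the simplified constructor and the index-2 clause are my additions):

```python
class Agent:
    def __init__(self, x, y):
        self.locationX = x
        self.locationY = y

    def __getitem__(self, index):
        if index == 0:
            return self.locationX
        if index == 1:
            return self.locationY
        if index == 2:
            return False        # the extra clause suggested above
        raise IndexError(index)

a = Agent(3, 7)
print(list(a))       # [3, 7, False] -- iteration stops at the IndexError
x, y, flag = a
print(x, y, flag)    # 3 7 False
```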
Honestly, I don't see the point of overriding __getitem__ here. It will be easier to understand if you write:
tempList = [[x.locationX,x.locationY,False] for x in myList]
Also there is no need to write this:
tempList = []
tempList = [...]
Creating an empty list to replace it by a new list is pointless.
Here's a reworked sample of the code. Note that I changed the methods GenerateAgents and DoProcess to static methods. They can be made static as they do not really require any instance to work; you can call them directly on the Agent class. I removed the iterator and __getitem__, as they aren't really necessary here. If they are used anywhere else, then this might create trouble.
The thing is that in this case, it seems strange to unpack values from an Agent. If I ever saw such code... I wouldn't understand the need for an iterator or __getitem__. It's not obvious that agent[0] is its X location and agent[1] its Y location. Python has named attributes for a reason. If you don't use them, then you could simply store your agents in a list instead of a class. Well, that's exactly what the DoProcess method does.
class Agent(object):
    def __init__(self, point):
        self.locationX = point.x
        self.locationY = point.y

    @staticmethod
    def GenerateAgents(numberOfAgents):
        agentList = []
        for i in range(numberOfAgents):
            point = Point.Point()
            point.x = random.randint(0, 99)
            point.y = random.randint(0, 99)
            agent = Agent(point)
            agentList.append(agent)
        return agentList

    @staticmethod
    def DoProcess(myList):
        return [
            [obj.locationX, obj.locationY, False]
            for obj in myList
        ]
Maybe something like this:
def DoProcess(self, myList):
    tempList = [[x[0], x[1], False] for x in myList]
    return tempList
You could just do this
def DoProcess(self, myList):
    return [sublist + [False] for sublist in myList]
There are several possible reasons why this isn't working:
Somewhere in myList you have sublists which have fewer than two elements (or more generally, sublists whose length isn't 2). You can find them with
print([sublist for sublist in myList if len(sublist) != 2])
There are elements in myList which aren't lists. You can find them with
print([element for element in myList if not isinstance(element, list)])
Combine them together with
print([element for element in myList if not isinstance(element, list) or len(element) != 2])
I'm looking for a data structure that preserves the order of its elements (which may change over the life of the data structure, as the client may move elements around).
It should allow fast search, insertion before/after a given element, removal of a given element, lookup of the first and last elements, and bidirectional iteration starting at a given element.
What would be a good implementation?
Here's my first attempt:
A class deriving from both collections.abc.Iterable and collections.abc.MutableSet that contains a linked list and a dictionary. The dictionary's keys are elements, values are nodes in the linked list. The dictionary would handle search for a node given an element. Once an element is found, the linked list would handle insertion before/after, deletion, and iteration. The dictionary would be updated by adding or deleting the relevant key/value pair. Clearly, with this approach the elements must be hashable and unique (or else, we'll need another layer of indirection where each element is represented by an auto-assigned numeric identifier, and only those identifiers are stored as keys).
It seems to me that this would be strictly better in asymptotic complexity than either list or collections.deque, but I may be wrong. [EDIT: Wrong, as pointed out by @roliu. Unlike list or deque, I would not be able to find an element by its numeric index in O(1). As of now it is O(N), but I am sure there's some way to make it O(log N) if necessary.]
A slightly modified version of Raymond Hettinger's OrderedSet recipe seems to satisfy all my requirements. I only added support for position-based access and read/write.
# changes vs. original recipe at http://code.activestate.com/recipes/576696/:
# added a position parameter to add
# changed how pop works, and added popleft
# added find, get_start, get_end, next_pos, prev_pos, __getitem__, __setitem__

KEY, PREV, NEXT = 0, 1, 2  # field indices within a linked-list node

class OrderedSetPlus(collections.abc.MutableSet, collections.abc.Iterable):
    '''
    >>> oset = OrderedSetPlus([3, 3, 3, 2, 1, 8, 8])
    >>> oset.add(13)
    >>> p = oset.find(2)
    >>> oset.add(15, p)
    >>> oset
    OrderedSetPlus([3, 15, 2, 1, 8, 13])
    >>> p = oset.next_pos(p)
    >>> oset[p]
    1
    >>> oset.add(7, p)
    >>> oset
    OrderedSetPlus([3, 15, 2, 7, 1, 8, 13])
    >>> oset[p] = 20
    >>> oset
    OrderedSetPlus([3, 15, 2, 7, 20, 8, 13])
    '''

    class DuplicateElement(Exception):
        pass

    def __init__(self, iterable=None):
        self.end = end = []
        end += [None, end, end]  # sentinel node for doubly linked list
        self.map = {}            # key --> [key, prev, next]
        if iterable is not None:
            self |= iterable

    def __len__(self):
        return len(self.map)

    def __contains__(self, key):
        return key in self.map

    def find(self, key):
        return self.map.get(key, None)

    # inserts element before the specified position
    # if pos is None, inserts at the end
    # position can only be obtained by calling instance methods
    def add(self, key, pos=None):
        if pos is None:
            pos = self.end
        if key not in self.map:
            curr = pos[PREV]
            curr[NEXT] = pos[PREV] = self.map[key] = [key, curr, pos]

    def discard(self, key):
        if key in self.map:
            key, prev, next = self.map.pop(key)
            prev[NEXT] = next
            next[PREV] = prev

    def __iter__(self):
        end = self.end
        curr = end[NEXT]
        while curr is not end:
            yield curr[KEY]
            curr = curr[NEXT]

    def get_end(self):
        return self.end[PREV]

    def get_start(self):
        return self.end[NEXT]

    def next_pos(self, pos):
        pos = pos[NEXT]
        return None if pos is self.end else pos

    def prev_pos(self, pos):
        pos = pos[PREV]
        return None if pos is self.end else pos

    def __getitem__(self, pos):
        return pos[KEY]

    def __setitem__(self, pos, key):
        if key in self.map:
            raise self.DuplicateElement
        del self.map[pos[KEY]]  # keep the key lookup table consistent
        pos[KEY] = key
        self.map[key] = pos

    def __reversed__(self):
        end = self.end
        curr = end[PREV]
        while curr is not end:
            yield curr[KEY]
            curr = curr[PREV]

    def popleft(self):
        return self.pop(pos=self.get_start())

    def pop(self, pos=None):
        if not self:
            raise IndexError()
        if pos is None:
            pos = self.get_end()
        key = self[pos]
        self.discard(key)
        return key

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSetPlus):
            return len(self) == len(other) and list(self) == list(other)
        return set(self) == set(other)
Using doubly-linked lists in Python is a bit uncommon. However, your own proposed solution of a doubly-linked list and a dictionary has the correct complexity: all the operations you ask for are O(1).
I don't think there is a more direct implementation in the standard library. Trees might be nice theoretically, but they come with drawbacks, like O(log n) operations or (more practically) their general absence from the standard library.
I do know this isn't exactly a direct answer to your question (because this is not a Python-implemented solution), but if your data structure is going to be rather large, I'd consider a Redis db. You can use redis-py to talk to it from Python.