Iterating and Removing From a Set - Possible or Not? - python

I don't have a specific piece of code that I want looked at; however, I do have a question that I can't seem to get a straight, clear answer to.
Here's the question: if I have a set, I can iterate over it in a for loop. As I iterate over it, can I remove specific numbers using .remove(), or do I have to convert my set to a list first? If I do, why must I convert it first?

In both cases, you should avoid removing items from a list or set while iterating over it. It's not a good idea to modify something you're iterating through, as you can get unexpected results. For instance, let's start with a set:
numbers_set = {1, 2, 3, 4, 5, 6}
for num in numbers_set:
    numbers_set.remove(num)
print(numbers_set)
We attempt to iterate through and delete each number, but we get this error:
Traceback (most recent call last):
File ".\test.py", line 2, in <module>
for num in numbers_set:
RuntimeError: Set changed size during iteration
Now, you mentioned "do I have to convert my set to a list first?". Well, let's test it out.
numbers_list = [1, 2, 3, 4, 5, 6]
for num in numbers_list:
    print(num)
    numbers_list.remove(num)
print(numbers_list)
This is the result:
1
3
5
[2, 4, 6]
We would expect to visit every element and end up with an empty list, but only 1, 3 and 5 were visited and [2, 4, 6] was left behind: each removal shifts the remaining elements down while the loop index keeps advancing, so every other element gets skipped. Whether you're iterating through a list or a set and deleting items, it's generally not a good idea.
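If you do need to remove elements based on a condition, one safe pattern (a minimal sketch of my own, using "drop the even numbers" as an example condition) is to iterate over a copy of the collection, so the object you mutate is not the one driving the loop:
numbers_set = {1, 2, 3, 4, 5, 6}
# iterate over a copied list of the elements; removing from the original
# set no longer disturbs the iteration
for num in list(numbers_set):
    if num % 2 == 0:
        numbers_set.remove(num)
print(numbers_set)  # {1, 3, 5}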

@nathancy has already given a good explanation of why deleting during iteration won't work, but I'd like to suggest an alternative: instead of doing the deletion at the same time as you iterate, do it as a second stage. So, you would:
Iterate over your set to decide what you want to delete, and store the collection of things to be deleted separately.
Iterate over your to-be-deleted collection, removing each item from the original set.
For instance:
def should_be_deleted(num):
    return num % 2 == 0

my_set = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
to_delete = []
for value in my_set:
    if should_be_deleted(value):
        to_delete.append(value)
for value in to_delete:
    my_set.remove(value)
print(my_set)
Prints (this is the Python 2 repr; on Python 3 it shows as {1, 3, 5, 7, 9}):
set([1, 3, 5, 7, 9])
The same pattern can be applied to delete from any collection—not just set, but also list, dict, etc.
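As a small sketch of that generalization (the dict and the filtering condition here are my own example, not taken from the question), the same two-phase approach applied to a dict looks like this:
scores = {"alice": 10, "bob": 0, "carol": 7, "dave": 0}
# phase 1: decide what to delete while iterating
to_delete = [name for name, score in scores.items() if score == 0]
# phase 2: delete after the iteration has finished
for name in to_delete:
    del scores[name]
print(scores)  # {'alice': 10, 'carol': 7}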

This is how I'd do this:
myset = ...
newset = set()
while myset:
    v = myset.pop()
    if not do_i_want_to_delete_this_value(v):
        newset.add(v)
myset = newset
A list comprehension will work too:
myset = set([x for x in myset if not do_i_want_to_delete_this_value(x)])
But this gets messy if you want to do other stuff while you're iterating and you don't want to wrap all that logic in a single function call. Nothing wrong with doing that though.
myset = set([x for x in myset if process_element(x)])
process_element() just has to return True or False to say whether the element should be kept in the set; it can do whatever other work you need as a side effect.
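For illustration, here is a minimal sketch of that pattern; process_element here is a hypothetical function of my own, not something defined in the question:
def process_element(x):
    # stand-in for the "other stuff" you want to do while iterating
    print("processing", x)
    return x % 2 != 0  # keep odd numbers, drop even ones

myset = {1, 2, 3, 4, 5, 6}
myset = set([x for x in myset if process_element(x)])
print(myset)  # {1, 3, 5}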

Related

I want to implement bubblesort using python list comprehension but it displays blank

I want to implement bubblesort using a Python list comprehension, but it displays a blank list. I tried using assignment to swap (l[j] = l[j+1]), but it throws an error because a list comprehension does not support assignment.
l = [8, 1, 3, 5, 4, 6, 7, 2]
newlist= [ [(l[j],l[j+1]),(l[j+1],l[j])] for i in range(1,len(l)-1) for j in range(0,len(l)-1) if l[j]>l[j+1] ]
Expected output is: 1, 2, 3, 4, 5, 6, 7, 8
But I am getting the output as [].
This fails because of a conceptual problem: every filtered result must produce a value to include in the list. Your list comprehension has no capability to store intermediate results -- it's doomed to failure. You have to determine whether to emit a value, and which value to emit, given only the original list, i, and j. That information does not exist in bubble-sort logic.
For instance, consider the first nested iteration. You have this information at hand:
l = [8,1,3,5,4,6,7,2]
i = 1
j = 0
Given this, you must decide right now whether or not to put information into your final list -- and if so, which information to put there. You cannot defer this to a second pass, because you have no temporary storage within the comprehension.
See the problem?
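For contrast, here is a minimal sketch of a conventional bubble sort written with ordinary loops (my own illustration, not code from the question); the in-place swap is exactly the intermediate storage a comprehension cannot express:
l = [8, 1, 3, 5, 4, 6, 7, 2]
# classic bubble sort: repeatedly swap adjacent out-of-order pairs in place
for i in range(len(l) - 1):
    for j in range(len(l) - 1 - i):
        if l[j] > l[j + 1]:
            l[j], l[j + 1] = l[j + 1], l[j]
print(l)  # [1, 2, 3, 4, 5, 6, 7, 8]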

How can I figure out which arbitrary number occurs twice in a list of integers from input? (Python)

Say I'm receiving a list of arbitrary numbers from input, like
[1,2,3,4,5,6,7,8,8,9,10]
My code doesn't know what numbers these are going to be before it receives the list, and I want to return the number that appears twice automatically. How do I go about doing so?
Thank you.
You could do:
input = [1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]
seen = []
for i in list(input):      # iterate over a copy so removing is safe
    if i not in seen:
        seen.append(i)
        input.remove(i)    # drop the first occurrence of each value
print(input)
Now input will contain only the numbers that appeared in the list more than once.
You can use Counter from the built-in collections module, in Python 2 and 3:
from collections import Counter
lst=[1,2,3,4,5,6,7,8,8,9,10]
items=[k for k,v in Counter(lst).items() if v==2]
print(items)
Hope this helps.
input = [1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]
unique = set(input)
twice = []
for item in unique:
    if input.count(item) == 2:
        twice.append(item)
I've created something monstrous that does it in one line because my brain likes to think when it's time for bed I guess?
This will return a list of all duplicate values given a list of integers.
dupes = list(set(map(lambda x: x if inputList.count(x) >= 2 else None, inputList))-set([None]))
How does it work? The map() function applies a function to every value of a list; in your case, the input list with possible duplicates is called "inputList". The lambda it applies returns the integer being iterated over IF that value's .count() in inputList is greater than or equal to two, and returns None otherwise. With this lambda applied by map(), we get back a list containing a bunch of Nones plus the integers detected as duplicates. Since this is a list, we then use set() to de-duplicate it. We then subtract a set made from a one-item list containing None, stripping the None values out of the set built from map()'s result. Finally we convert the set left after the subtraction to a list called "dupes" for nice and easy use.
Example usage...
inputList = [1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 1001, 1002, 1002, 99999, 100000, 1000001, 1000001]
dupes = list(set(map(lambda x: x if inputList.count(x) >= 2 else None, inputList))-set([None]))
print(dupes)
[1000001, 1002, 4, 6]
I'll let someone else elaborate on potential scope concerns..... or other concerns......
This will create a list of the numbers that are duplicated.
x = [1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]
s = {}
duplicates = []
for n in x:
    try:
        if s[n]:
            duplicates.append(n)
            s[n] = False
    except KeyError:
        s[n] = True
print(duplicates)
Each duplicated value is reported only once; the booleans stored in s track whether a number has already been seen.
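For comparison, here is a sketch of the same bookkeeping idea written without try/except (the variable names are my own):
x = [1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]
state = {}       # value -> True after the first sighting, False once reported
duplicates = []
for n in x:
    if n not in state:
        state[n] = True            # first occurrence: remember it
    elif state[n]:
        duplicates.append(n)       # second occurrence: report it once
        state[n] = False
print(duplicates)  # [8]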

Extract index of Non duplicate elements in python list

I have a list:
input = ['a','b','c','a','b','d','e','d','g','g']
I want the index of every element in the list, excluding duplicates (i.e. keeping only the first occurrence of each value).
output = [0,1,2,5,6,8]
You should iterate over the enumerated list, adding each element to a set of "seen" elements and appending its index to the output list if the element hasn't already been seen (i.e. is not in the "seen" set).
Note: the name input shadows the built-in input() function, so I renamed it input_list.
output = []
seen = set()
for i, e in enumerate(input_list):
    if e not in seen:
        output.append(i)
        seen.add(e)
which gives output as [0, 1, 2, 5, 6, 8].
Why use a set?
You could be thinking, why use a set when you could do something like:
[i for i,e in enumerate(input_list) if input_list.index(e) == i]
which would work because .index returns you the index of the first element in a list with that value, so if you check the index of an element against this, you can assert that it is the first occurrence of that element and filter out those elements which aren't the first occurrences.
However, this is not as efficient as using a set, because list.index requires Python to iterate over the list until it finds the element (or doesn't). This operation is O(n) complexity and since we are calling it for every element in input_list, the whole solution would be O(n^2).
On the other hand, using a set, as in the first solution, yields an O(n) solution, because checking whether an element is in a set is O(1) on average. This is due to how sets are implemented: they are backed by a hash table, so each element is stored at a position derived from its hash, and membership can be checked by computing the hash and looking at that position instead of scanning the whole collection (this is an oversimplification, but it captures the idea).
Thus, since each check for membership is O(1), and we do this for each element, we get an O(n) solution which is much better than an O(n^2) solution.
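If you want to see the difference concretely, here is a rough benchmark sketch (my own, with an arbitrary input size) comparing the two approaches with timeit:
import random
import timeit

data = [random.randrange(500) for _ in range(2000)]

def first_indices_with_set(xs):
    seen = set()
    out = []
    for i, e in enumerate(xs):
        if e not in seen:          # O(1) membership test on average
            out.append(i)
            seen.add(e)
    return out

def first_indices_with_index(xs):
    # list.index is O(n) and is called once per element, so O(n^2) overall
    return [i for i, e in enumerate(xs) if xs.index(e) == i]

print(timeit.timeit(lambda: first_indices_with_set(data), number=10))
print(timeit.timeit(lambda: first_indices_with_index(data), number=10))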
You could do something like this, checking for counts (although this is computation-heavy):
indexes = []
for i, x in enumerate(inputlist):
    if (inputlist.count(x) == 1
            or x not in inputlist[:i]):
        indexes.append(i)
For each item, this checks whether:
the item appears only once in the list, or
the item hasn't appeared earlier in the list up to the current position. If either holds, its index is added to the results list.
In case you don't mind indexes of the last occurrences of duplicates instead and are using Python 3.6+, here's an alternative solution:
list(dict(map(reversed, enumerate(input))).values())
This returns:
[3, 4, 2, 7, 6, 9]
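To make the one-liner a bit more transparent, here is an equivalent spelled-out version (a sketch with my own variable names, relying on dicts preserving insertion order in Python 3.6+):
input_list = ['a', 'b', 'c', 'a', 'b', 'd', 'e', 'd', 'g', 'g']
last_index = {}
for i, v in enumerate(input_list):
    last_index[v] = i              # later occurrences overwrite earlier ones
print(list(last_index.values()))   # [3, 4, 2, 7, 6, 9]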
Here is a one-liner using zip and reversed
>>> input = ['a','b','c','a','b','d','e','d','g','g']
>>> sorted(dict(zip(reversed(input), range(len(input)-1, -1, -1))).values())
[0, 1, 2, 5, 6, 8]
This question is missing a pandas solution. 😉
>>> import pandas as pd
>>> inp = ['a','b','c','a','b','d','e','d','g','g']
>>>
>>> pd.DataFrame(list(enumerate(inp))).groupby(1).first()[0].tolist()
[0, 1, 2, 5, 6, 8]
Yet another version, using a side effect in a list comprehension.
>>> xs=['a','b','c','a','b','d','e','d','g','g']
>>> seen = set()
>>> [i for i, v in enumerate(xs) if v not in seen and not seen.add(v)]
[0, 1, 2, 5, 6, 8]
The list comprehension filters indices of values that have not been seen already.
The trick is that not seen.add(v) is always true because seen.add(v) returns None.
Because of short circuit evaluation, seen.add(v) is performed if and only if v is not in seen, adding new values to seen on the fly.
At the end, seen contains all the values of the input list.
>>> seen
{'a', 'c', 'g', 'b', 'd', 'e'}
Note: it is usually a bad idea to use side effects in list comprehension,
but you might see this trick sometimes.

Removing duplicates and preserving order when elements inside the list is list itself

I have the following problem while trying to do some nodal analysis.
For example:
my_list=[[1,2,3,1],[2,3,1,2],[3,2,1,3]]
I want to write a function that treats the element lists inside my_list in the following way:
- The number of occurrences of an element inside a list of my_list is not important; as long as the unique elements of two lists are the same, the lists are considered identical.
Find the identical loops based on the above premise, keep only the first one, and ignore the other identical lists of my_list while preserving the order.
Thus, in the above example the function should return just the first list, [1,2,3,1], because all the lists inside my_list are equal based on the above premise.
I wrote a function in Python to do this, but I think it can be shortened and I am not sure whether it is an efficient way to do it. Here is my code:
def _remove_duplicate_loops(duplicate_loop):
    loops = []
    for i in range(len(duplicate_loop)):
        unique_el_list = []
        for j in range(len(duplicate_loop[i])):
            if (duplicate_loop[i][j] not in unique_el_list):
                unique_el_list.append(duplicate_loop[i][j])
        loops.append(unique_el_list[:])
    loops_set = [set(x) for x in loops]
    unique_loop_dict = {}
    for k in range(len(loops_set)):
        if (loops_set[k] not in list(unique_loop_dict.values())):
            unique_loop_dict[k] = loops_set[k]
    unique_loop_pos = list(unique_loop_dict.keys())
    unique_loops = []
    for l in range(len(unique_loop_pos)):
        unique_loops.append(duplicate_loop[l])
    return unique_loops
from collections import OrderedDict

my_list = [[1, 2, 3, 1], [2, 3, 1, 2], [3, 2, 1, 3]]
seen_combos = OrderedDict()
for sublist in my_list:
    unique_elements = frozenset(sublist)
    if unique_elements not in seen_combos:
        seen_combos[unique_elements] = sublist
my_list = seen_combos.values()
You could do it in a fairly straightforward way using dictionaries, but you'll need to use frozenset instead of set, as sets are mutable and therefore not hashable.
from collections import OrderedDict

def _remove_duplicate_lists(duplicate_loop):
    dupdict = OrderedDict((frozenset(x), x) for x in reversed(duplicate_loop))
    # materialize the values before reversing so this also works on Python 3
    return reversed(list(dupdict.values()))
should do it. Note the double reversed(): normally the last item with a given key is the one that is preserved, whereas you want the first, and the two reversals accomplish that.
Edit: correction, yes, per Steven's answer, it must be an OrderedDict(), or the values will not come back in the right order. His version might be slightly faster too.
Edit again: you need an ordered dict if the order of the lists is important. Say your list is
[[1,2,3,4], [4,3,2,1], [5,6,7,8]]
The ordered dict version will ALWAYS return
[[1,2,3,4], [5,6,7,8]]
However, the regular dict version may return the above, or may return
[[5,6,7,8], [1,2,3,4]]
If you don't care, a non-ordered dict version may be faster/use less memory.
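As a side note, on Python 3.7+ plain dicts preserve insertion order as well, so a sketch like the following (my own naming, not code from either answer) behaves like the OrderedDict versions above:
def remove_duplicate_loops(loops):
    seen = {}                      # frozenset of elements -> first list seen
    for sublist in loops:
        key = frozenset(sublist)
        if key not in seen:
            seen[key] = sublist
    return list(seen.values())

print(remove_duplicate_loops([[1, 2, 3, 4], [4, 3, 2, 1], [5, 6, 7, 8]]))
# [[1, 2, 3, 4], [5, 6, 7, 8]]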

Filter duplicates from a list in Python [closed]

Closed 10 years ago.
I'm given a problem where I have to filter out dupes from a list such as
a = [1,1,4,5,6,5]
This is my code:
def unique(a):
    uni = []
    for value in a:
        if value[0] not in found:
            yield value
            found.add(value[0])
    print list(unique(a))
However, when I define the list, a, and try unique(a), I get this output:
<generator object unique at 0x0000000002891750>
Can someone tell me what I'm doing wrong? Why can't I get the list?
EDIT, NEW PROBLEM..
I was able to get it to print out the filtered list, but I lose the order of the list.
How can I prevent this?
def unique(a):
    s = set()
    for i in a:
        if i not in s:
            s.add(i)
    return s
You have to keep track of all the elements that have been seen. The best way is to use a set, as its lookup complexity is O(1).
>>> def unique(it):
...     s = set()
...     for el in it:
...         if el not in s:
...             s.add(el)
...             yield el
...
>>> list(unique(a))
[1, 4, 5, 6]
If you don't need to keep the order of the elements you can utilize the set constructor, and then convert it back to list. This will remove all the duplicates, but will destroy the order of the elements:
list(set(a))
First of all, to remove duplicates, use a set:
>>> a = [1, 1, 4, 5, 6, 5]
>>> set(a)
{1, 4, 5, 6}
>>> list(set(a)) # if you really _need_ a list, you can convert it back
[1, 4, 5, 6]
Second, the output you get, generator object unique at 0x..., means that you have a generator object instead of a simple list as the return value. And this is what you should expect after using yield in the function: yield turns any function into a generator, which only produces its results when you request them (i.e. iterate over it). If you just want the full result, you can call list() on the object to create a list from the generator object: list(unique(a)).
However, then you will notice the errors your function gives you: TypeError: 'int' object is not subscriptable. The reason for that is the value[0] you use. value is an element from the list (you iterate over the list) and as such is an integer. You cannot get the first element from the integer, so you probably meant just value there.
Next, you add elements to found although you defined the list as uni first, so you should decide on one of the names there. Also, the method is append, not add.
Finally, you should not call the function recursively with the same parameter from inside itself, as this will just fill up the stack without being of any use, so remove the print from it.
Then, you end up with this, which works just fine:
>>> def unique(a):
...     found = []  # better: use a set() here
...     for value in a:
...         if value not in found:
...             yield value
...             found.append(value)
...
>>> list(unique(a))
[1, 4, 5, 6]
But still, this is not really a good solution, and you should really just use a set instead, as it will also give you further methods to work with that set once it's created (e.g. a quick membership check).
I'm also required to get the answer just by inputting unique(a)
In that case, just remove the yield value from your function, and return the found list at the end of it.
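For example, a minimal sketch of that non-generator variant (reusing the corrected found list from above):
def unique(a):
    found = []
    for value in a:
        if value not in found:
            found.append(value)
    return found

print(unique([1, 1, 4, 5, 6, 5]))  # [1, 4, 5, 6]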
This is a well known classic:
>>> def unique(xs):
... seen = set()
... seen_add = seen.add
... return [x for x in xs if x not in seen and not seen_add(x)]
...
>>> unique([1, 2, 3, 3, 4, 1, 3, 5, 5, 4, 6])
[1, 2, 3, 4, 5, 6]
The usual way to do this is list(set(a)):
def unique(a):
    return list(set(a))
Now, coming to your question: yield returns a generator that you must iterate over, not print. So if you have a function with a yield in it, iterate over it like for return_value in function_that_yields():
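For example (a minimal sketch, not from the original question):
def function_that_yields():
    yield 1
    yield 2

print(function_that_yields())        # prints <generator object ...>, not the values
for return_value in function_that_yields():
    print(return_value)              # 1, then 2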
There are more problems with your code: you have not defined found, and you index value, which is not a container.
