List Duplicate Removal Issue? - python

I wrote some code that eliminates duplicates from a list in Python. Here it is:
List = [4, 2, 3, 1, 7, 4, 5, 6, 5]
NewList = []
for i in List:
    if List[i] not in NewList:
        NewList.append(i)
print("Original List:", List)
print("Reworked List:", NewList)
However, the output is:
Original List: [4, 2, 3, 1, 7, 4, 5, 6, 5]
Reworked List: [4, 2, 3, 7, 6]
Why is the 1 missing from the output?

Using set() kills the order. You can try this instead:
>>> from collections import OrderedDict
>>> NewList = list(OrderedDict.fromkeys(List))
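For instance, applied to the list from the question this keeps the first occurrence of each value and preserves the original order:
>>> List = [4, 2, 3, 1, 7, 4, 5, 6, 5]
>>> list(OrderedDict.fromkeys(List))
[4, 2, 3, 1, 7, 5, 6]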

You misunderstood how for loops in Python work. If you write for i in List:, then i takes the values from the list one after another, so in your case 4, 2, 3, ...
I assume you thought it would count up through the indices.
There are several ways of removing duplicates from lists in Python that you don't need to write yourself, like converting the list to a set and back to a list:
list(set(List))
Also, you should read PEP 8 and name your variables differently, but that's just an aside.
If you really want a loop with indices, you can use enumerate:
for idx, value in enumerate(myList):
    print(idx)
    print(myList[idx])

Your code is not doing what you think it does. Your problem is these two constructs:
for i in List:      # 1
    if List[i]      # 2
In # 1 you are using i to represent the elements inside the list: 4, 2, 3, ...
In # 2 you are using i to represent the indices of the list: 0, 1, 2, ...
Obviously, # 1 and # 2 are not compatible. In short, your check is performed for a different element than the one you put in your list.
You can fix this by treating i consistently at both steps:
for i in List:
    if i not in NewList:
        NewList.append(i)

Your method of iterating over the list is not correct. Your code iterates over elements, but then does not use those elements in its logic. It doesn't raise an error only because the values in your list also happen to be valid list indices.
You have a few options:
#1 Iterate over elements directly
Use the elements of a list directly as you iterate over them:
NewList = []
for el in L:
    if el not in NewList:
        NewList.append(el)
#2 Iterate over list index
This is often considered an anti-pattern, but it is not invalid. You can iterate over the range of the size of the list and then use list indexing:
NewList = []
for idx in range(len(L)):
    if L[idx] not in NewList:
        NewList.append(L[idx])
In both cases, notice how we avoid naming variables after built-ins. Don't use list or List; you can use L instead.
#3 unique_everseen
It's more efficient to use hashing for O(1) membership checks. There is a unique_everseen recipe in the itertools docs, replicated in the third-party toolz.unique. It works by keeping a seen set and checking items against it as you iterate.
from toolz import unique
NewList = list(unique(L))
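If you'd rather not add a dependency, the core idea can be sketched directly (a minimal version of the seen-set technique, not the full itertools recipe, and it assumes the elements are hashable):
def unique_everseen(iterable):
    # Yield items in first-seen order; the set gives O(1) membership checks.
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

L = [4, 2, 3, 1, 7, 4, 5, 6, 5]
NewList = list(unique_everseen(L))   # [4, 2, 3, 1, 7, 5, 6]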

Extract indices of certain values from a list in python

Suppose I have the list [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5], and I would like to extract the indices of the elements of [1.5, 2.5, 3.5, 4.5], which is a subset of that list.
You can use the built-in method list.index to find the index of each element of the second list in the first list.
With that, you can use a list comprehension to achieve what you want:
>>> list1 = [0.5,1,1.5,2,2.5,3,3.5,4,4.5]
>>> list2 = [1.5,2.5,3.5,4.5]
>>> [list1.index(elem) for elem in list2]
[2, 4, 6, 8]
One other option is to use the enumerate function:
a = [0.5,1,1.5,2,2.5,3,3.5,4,4.5]
b = [1.5,2.5,3.5,4.5]
indexes = []
for idx, val in enumerate(a):
    if val in b:
        indexes.append(idx)
print(indexes)
The output is going to be [2, 4, 6, 8].

I want to make a function that takes a list and returns the same list but without the duplicate elements, what's wrong with this program?

I want to make a function that takes a list and returns the same list but without the duplicate elements
def func(list1):
    for i in range(len(list1)):
        list2 = [x for x in list1 if x != list1[i]]
        if list1[i] in list2:
            list1.remove(i)
        return list1
mylist = [1,2,3,4]
func(mylist)
There are lots of answers already showing how to implement this task correctly, but none discuss your actual code.
Deconstructing your code
Hopefully this will help understand why your code did not work. Here's your code, with comments added:
def func(list1):
    for i in range(len(list1)):
        list2 = [x for x in list1 if x != list1[i]]
Here you construct a list, list2, that does not contain list1[i].
        if list1[i] in list2:
This if statement is always going to be false, by construction of list2.
            list1.remove(i)
Even if you fix the logic of the if statement, list1.remove(i) would cause trouble: trying to remove elements of a list while iterating over it invalidates the indices for the rest of the iteration. It's best to avoid doing this and build the results into a new list instead.
Furthermore, list1.remove(i) is looking for the value i in list1 and will remove it wherever it first finds it. It's not removing the i'th element, as this code seems to expect.
        return list1
And this return statement was probably not meant to be indented this deep; it should probably have been at the same level as the for statement at the top.
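As a quick illustration of that remove-by-value behaviour (a throwaway example, not from the question):
nums = [10, 20, 1, 30]
nums.remove(1)      # removes the first occurrence of the value 1 ...
print(nums)         # [10, 20, 30] -- not the element at index 1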
Working code closer to your logic
While I don't think this is the best solution -- the other answers provide better ones -- here is a solution inspired by your code, which works:
def func(list1):
    list2 = []                 # we'll append unique elements here
    for item in list1:         # more Pythonic way to loop over a list
        if item not in list2:
            list2.append(item)
    return list2
My preferred solution
OK, so now I'm duplicating information in other answers, but with Python 3.7(*) or more recent, where dict is guaranteed to keep the order of the elements inserted into it, I'd use #Pedro Maia's solution:
def func(list1):
    return list(dict.fromkeys(list1))
In fact, I used that technique in my code base recently.
In older versions of Python, dicts are not ordered. However, the OrderedDict class has been in the collections module since Python 2.7, so this code works with any version of Python >= 2.7:
from collections import OrderedDict
def func(list1):
    return list(OrderedDict.fromkeys(list1))
If you're not familiar with them, have a look at the collections and itertools modules; they provide immensely useful helpers for all sorts of problems.
(*) According to the manual for collections.OrderedDict, Python dicts have only been guaranteed to be ordered since Python 3.7, even though they started actually being ordered in Python 3.6, at least for the CPython implementation. See also: Are dictionaries ordered in Python 3.6+?
You can just do this:
def remove_duplicates(l):
    return list(set(l))
Now remove_duplicates([1,1,2,3,3,4,5,6,6,7]) will return [1, 2, 3, 4, 5, 6, 7].
Let's assume this input l = [1,2,3,2,7,3,4,4,7,1]
Removing the duplicates without keeping order
Use a Python set:
list(set(l))
output: [1, 2, 3, 4, 7]
Keeping order
Use a dictionary (or collections.Counter):
from collections import Counter
list(Counter(l))
output: [1, 2, 3, 7, 4]
Use a set:
mylist = [1,2,3,1,3,2,4]
mylist = set(mylist)
print(mylist)
Output:
{1, 2, 3, 4}
To keep the order, you can instead do:
mylist = list(dict.fromkeys(mylist))
which doesn't require any imports.
In Python, we can use a set to remove the duplicate elements:
def removeDuplicates(l):
    return list(set(l))
mylist = [1, 2, 3, 4, 2, 3, 4, 2, 3, 4]
newList = removeDuplicates(mylist)
print(newList)

Most pythonic way to select at random an element in a list with conditions

I have two lists:
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list2 = [1, 4, 5]
I want to select an element from list1, but it shouldn't belong to list2.
I have a solution using a while loop, but I would like a more Pythonic and elegant one-liner.
If your elements are unique, you can use the set difference (convert list1 to a set and remove the elements of list2), then draw a random choice:
random.choice(list(set(list1).difference(list2)))
[item for item in list1 if item not in list2]
To make it a bit faster (because lookup in a set is faster than in a list):
list2_items = set(list2)
[item for item in list1 if item not in list2_items]
Or with the filter function (in Python 3 this gives you a lazy iterator):
filter(lambda item: item not in list2, list1)
Converting list2 to a set will also speed up the filtering here.
To get more information, read about list comprehensions.
Update: it seems I missed the point about picking one random value. You can still use a list comprehension, combined with random.choice as mentioned before:
import random
random.choice([item for item in list1 if item not in list2_items])
It will filter the choices and then pick one at random. #zeehio's response looks like a better solution.
import random
import itertools
next(item for item in (random.choice(list1) for _ in itertools.count()) if item not in list2)
That's equivalent to:
while True:
    item = random.choice(list1)
    if item not in list2:
        break
You would probably want to utilize sets like so:
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list2 = [1, 4, 5]
import random
print(random.choice([x for x in set(list1).difference(list2)])) # kudos #chepner
That way we randomly draw from set(list1) - set(list2): elements in list1 but not in list2. This approach also scales well as lists 1 & 2 become large.
As #MSeifert noticed, converting list1 to a set will remove any duplicate elements that might be present in list1, thus altering the probabilities. If list1 might contain duplicates in the general case, you might want to do this instead:
print(random.choice([x for x in list1 if x not in list2]))
Without importing the random library:
print((list(set(list1) - set(list2))).pop())
Inside pop() you can give the index of the element you want to select; it will pop out that element.
Example: to select the element at index 1 (of the new list), use (list(set(list1) - set(list2))).pop(1).
Here, list(set(list1) - set(list2)) creates a new list containing only the items from the first list that aren't present in the second one.

Python while and for loops - can I make the code more efficient?

I wrote the following code:
initial_list = [item1, item2, item3, item4, ...]
lists = [[list1], [list2], [list3], [list4], ..., [list(n-1)], [list(n)]]
# The number of elements in both lists might change
while len(initial_list) > 0:
    for list in lists:
        if len(initial_list) == 0:
            break
        item = initial_list.pop(0)
        list.append(item)
I would like to know if there is any nicer/simpler/shorter way to write the code above. If so, please do not use difficult functions, because I am still a beginner and will not understand them.
I believe you are trying to append items from initial_list to the nested lists in lists until all values in initial_list are used, cycling over lists from the start if there are fewer nested lists than initial values to append.
Use zip() to pair up nested lists with initial_list items:
from itertools import cycle
for nested, value in zip(cycle(lists), initial_list):
    nested.append(value)
The itertools.cycle() function here ensures that all values from initial_list are used; zip() stops at the shortest iterable, which will always be initial_list here.
Demo with an initial list of 10 values (integers 0 through to 9) and a nested list with 4 empty sublists:
>>> from itertools import cycle
>>> initial_list = range(10)
>>> lists = [[] for _ in range(4)]
>>> for nested, value in zip(cycle(lists), initial_list):
...     nested.append(value)
...
>>> lists
[[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
Not using cycling would require you to keep a counter and append to lists[count % len(lists)]. The counter can be generated with the enumerate() function:
for i, value in enumerate(initial_list):
    lists[i % len(lists)].append(value)
You can use:
if container:
...instead of:
if len(container) > 0:
Also, list_.pop(0) is slow, since it shifts every remaining element. Maybe you are looking for a collections.deque (http://docs.python.org/3/library/collections.html#collections.deque)? It lets you rapidly remove items from either end, but you lose convenient access to (e.g.) the middle.
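For instance, a minimal sketch of the same distribute-the-items loop using a deque (the data here is made up for illustration):
from collections import deque

initial = deque(["a", "b", "c", "d", "e"])    # stand-in data, not from the question
lists = [[], [], []]

while initial:                                # truthiness check instead of len(...) > 0
    for sub in lists:
        if not initial:
            break
        sub.append(initial.popleft())         # O(1) pop from the left, unlike list.pop(0)

print(lists)                                  # [['a', 'd'], ['b', 'e'], ['c']]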

Why does list comprehension not filter out duplicates?

I have a workaround for the following question: a for loop with a test for inclusion in the output, like this:
#!/usr/bin/env python
def rem_dup(dup_list):
    reduced_list = []
    for val in dup_list:
        if val in reduced_list:
            continue
        else:
            reduced_list.append(val)
    return reduced_list
I am asking the following question, because I am curious to see if there is a list comprehension solution.
Given the following data:
reduced_vals = []
vals = [1, 2, 3, 3, 2, 2, 4, 5, 5, 0, 0]
Why does
reduced_vals = [x for x in vals if x not in reduced_vals]
produce the same list?
>>> reduced_vals
[1, 2, 3, 3, 2, 2, 4, 5, 5, 0, 0]
I think it has something to do with checking the output (reduced_vals) as part of an assignment to a list. I am curious, though, as to the exact reason.
Thank you.
The list comprehension creates a new list, while reduced_vals points to the empty list all the time during the evaluation of the list comprehension.
The semantics of assignments in Python are: Evaluate the right-hand side and bind the resulting object to the name on the left-hand side. An assignment to a bare name never mutates any object.
By the way, you should use set() or collections.OrderedDict.fromkeys() to remove duplicates in an efficient way (depending on whether you need to preserve order or not).
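For instance, on the data from the question:
from collections import OrderedDict

vals = [1, 2, 3, 3, 2, 2, 4, 5, 5, 0, 0]
print(list(set(vals)))                   # order not guaranteed, e.g. [0, 1, 2, 3, 4, 5]
print(list(OrderedDict.fromkeys(vals)))  # order preserved: [1, 2, 3, 4, 5, 0]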
You are testing against an empty list.
The expression is evaluated in full first before assigning it as the new value of reduced_vals, which thus remains empty until the full list expression has been evaluated.
To put it differently, the expression [x for x in vals if x not in reduced_vals] is executed in isolation. It might help if you view your code in a slightly modified fashion:
temp_var = [x for x in vals if x not in reduced_vals]
reduced_vals = temp_var
del temp_var
The above is the moral equivalent of directly assigning the result of the list expression to reduced_vals, but I have more clearly separated assigning the result by using a second variable.
In this line: [x for x in vals if x not in reduced_vals], there's not a single value that is in reduced_vals, as reduced_vals is the empty list []. In other words, nothing gets filtered and all the elements of vals get returned.
If you try this:
[x for x in vals if x in reduced_vals]
the result is the empty list [], as none of the values are in reduced_vals (which is empty). I believe you're confused about how the filtering part of a list comprehension works: the filter only selects those values which make the condition True, but it won't prevent duplicate values.
Now, if what you need is to filter out duplicates, then a list comprehension is not the right tool for the job. For that, use a set - although it won't necessarily preserve the order of the original list, it'll guarantee that the elements are unique:
vals = [1, 2, 3, 3, 2, 2, 4, 5, 5, 0, 0]
list(set(vals))
> [0, 1, 2, 3, 4, 5]
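That said, if you really want a comprehension that drops duplicates while preserving order, a common (if slightly hacky) workaround relies on the side effect of set.add; a sketch:
vals = [1, 2, 3, 3, 2, 2, 4, 5, 5, 0, 0]
seen = set()
reduced_vals = [x for x in vals if x not in seen and not seen.add(x)]
print(reduced_vals)   # [1, 2, 3, 4, 5, 0]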
Because the elements in the list comprehension are not assigned to reduced_vals until the entire list has been constructed. Use a for loop with .append() if you want to make this work.
Because reduced_vals is not changing during evaluation of the list comprehension.
