Keep duplicates in a list in Python - python

I know this is probably an easy answer but I can't figure it out. What is the best way in Python to keep the duplicates in a list:
x = [1,2,2,2,3,4,5,6,6,7]
The output should be:
[2,6]
I found this link: Find (and keep) duplicates of sublist in python, but I'm still relatively new to Python and I can't get it to work for a simple list.

I'd use a collections.Counter:
from collections import Counter
x = [1, 2, 2, 2, 3, 4, 5, 6, 6, 7]
counts = Counter(x)
output = [value for value, count in counts.items() if count > 1]
Here's another version that keeps the order in which items were first duplicated. It only assumes that the sequence passed in contains hashable items, and it will work as far back as the versions of Python that introduced set and yield (whenever that was).
def keep_dupes(iterable):
    seen = set()
    dupes = set()
    for x in iterable:
        if x in seen and x not in dupes:
            yield x
            dupes.add(x)
        else:
            seen.add(x)

print(list(keep_dupes([1,2,2,2,3,4,5,6,6,7])))

This is a short way to do it if the list is sorted already:
x = [1,2,2,2,3,4,5,6,6,7]
from itertools import groupby
print([key for key, group in groupby(x) if len(list(group)) > 1])
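If the list is not sorted, groupby only groups equal items that happen to be adjacent, so one option is to sort first. A minimal sketch, assuming the items can be ordered; the name dupes_sorted is made up for illustration:

```python
from itertools import groupby

def dupes_sorted(items):
    """Return each value that occurs more than once, in sorted order."""
    result = []
    for key, group in groupby(sorted(items)):
        # groupby yields runs of equal adjacent items; count the run length
        if sum(1 for _ in group) > 1:
            result.append(key)
    return result

print(dupes_sorted([6, 1, 2, 7, 2, 3, 6, 2, 4, 5]))  # [2, 6]
```

Sorting costs O(n log n) and loses the original order, so this is only a reasonable trade when sorted output is acceptable.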

List Comprehension in combination with set() will do exactly what you want.
list(set([i for i in x if x.count(i) >= 2]))
>>> [2,6]

keepin' it simple:
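# assumes x is sorted, so that duplicates are adjacent
array2 = []
aux = None  # previous item; None so that a leading 0 is not counted as a duplicate
aux2 = None
for i in x:
    aux2 = i
    if aux2 == aux:
        array2.append(i)
    aux = i
list(set(array2))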
That should work

Not efficient but just to get the output, you could try:
import numpy as np
def check_for_repeat(check_list):
    # note: this overwrites the entries of check_list with None
    repeated_list = []
    for idx in range(len(check_list)):
        elem = check_list[idx]
        check_list[idx] = None
        # elem is a duplicate if it still occurs later in the list
        if elem in check_list:
            repeated_list.append(elem)
    repeated_list = np.array(repeated_list)
    return list(np.unique(repeated_list))

Related

Is there a way I can make a list out of this?

So I programmed this code to print out how many times a number would be printed in the list that I provided, and the output works, but I want to put all the values that I get into a list, how can I do that?
This is my code...
i = [5,5,7,9,9,9,9,9,8,8]
def num_list(i):
    return [(i.count(x),x) for x in set(i)]
for tv in num_list(i):
    if tv[1] > 1:
        print(tv)
The output that I get is
(2, 8)
(5, 9)
(2, 5)
(1, 7)
but I want the output to be like
[2,8,5,9,2,5,1,7]
How can I do that??
Just do:
tvlist = []
for tv in num_list(i):
    if tv[1] > 1:
        tvlist.extend(tv)
print(tvlist)
Or a list comprehension:
tvlist = [x for tv in num_list(i) if tv[1] > 1 for x in tv]
Also your function could just simply be collections.Counter:
from collections import Counter
def num_list(i):
    # note: yields (value, count) pairs, not (count, value)
    return Counter(i).items()
import itertools
flattened_iter = itertools.chain.from_iterable(num_list(i))
print(list(flattened_iter))
This is how I would flatten a list.
As others have mentioned, collections.Counter is likely to perform significantly better on large lists.
If you would rather implement it yourself, you can do so fairly easily:
def myCounter(a_list):
    counter = {}
    for item in a_list:
        # in modern python versions order is preserved in dicts
        counter[item] = counter.get(item,0) + 1
    for unique_item in counter:
        # make it a generator just for ease
        # we will just yield twice to create a flat list
        yield counter[unique_item]
        yield unique_item
i = [5,5,7,9,9,9,9,9,8,8]
print(list(myCounter(i)))
Using a collections.Counter is more efficient. This paired with itertools.chain will get you your desired result:
from collections import Counter
from itertools import chain
i = [5,5,7,9,9,9,9,9,8,8]
r = list(chain(*((v, k) for k, v in Counter(i).items() if v > 1)))
print(r)
[2, 5, 5, 9, 2, 8]
Without itertools.chain
r = []
for k, v in Counter(i).items():
    if v > 1:
        r.extend((v, k))

Filter a list of strings by frequency

I have a list of strings:
a = ['book','book','cards','book','foo','foo','computer']
I want to return every item in this list that appears more than twice (count > 2).
Final output:
a = ['book','book','book']
I'm not quite sure how to approach this. But here's two methods I had in mind:
Approach One:
I've created a dictionary to count the number of times an item appears:
a = ['book','book','cards','book','foo','foo','computer']
from collections import defaultdict
def update_item_counts(item_counts, itemset):
    for a in itemset:
        item_counts[a] += 1
test = defaultdict(int)
update_item_counts(test, a)
print(test)
Out: defaultdict(<class 'int'>, {'book': 3, 'cards': 1, 'foo': 2, 'computer': 1})
I want to filter out the list with this dictionary but I'm not sure how to do that.
Approach two:
I tried to write a list comprehension but it doesn't seem to work:
res = [k for k in a if a.count > 2 in k]
A very barebone answer is that you should replace a.count by a.count(k) in your second solution.
However, do not use list.count for this, as it traverses the whole list once for each item. Instead, count the occurrences first with collections.Counter, which has the advantage of traversing the list only once.
from collections import Counter
from itertools import repeat
a = ['book','book','cards','book','foo','foo','computer']
count = Counter(a)
output = [word for item, n in count.items() if n > 2 for word in repeat(item, n)]
print(output) # ['book', 'book', 'book']
Note that the list comprehension is equivalent to the loop below.
output = []
for item, n in count.items():
    if n > 2:
        output.extend(repeat(item, n))
Try this:
a_list = ['book','book','cards','book','foo','foo','computer']
b_list = []
for a in a_list:
    if a_list.count(a) > 2:
        b_list.append(a)
print(b_list)
# ['book', 'book', 'book']
Edit: You mentioned list comprehension. You are on the right track! You can do it with list comprehension like this:
a_list = ['book','book','cards','book','foo','foo','computer']
c_list = [a for a in a_list if a_list.count(a) > 2]
Good luck!
a = ['book','book','cards','book','foo','foo','computer']
list(filter(lambda s: a.count(s) > 2, a))
Your first attempt builds a dictionary with all of the counts. You need to take this a step further to get the items that you want:
res = [k for k in test if test[k] > 2]
Now that you have built this by hand, you should check out the builtin Counter class that does all of the work for you.
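For reference, a minimal sketch of what that could look like with Counter, keeping every occurrence in its original position. The function name filter_by_frequency and its threshold parameter are made up for illustration:

```python
from collections import Counter

def filter_by_frequency(items, threshold):
    """Keep every item that occurs more than `threshold` times, in original order."""
    counts = Counter(items)  # a single pass over the list to count
    return [item for item in items if counts[item] > threshold]

a = ['book', 'book', 'cards', 'book', 'foo', 'foo', 'computer']
print(filter_by_frequency(a, 2))  # ['book', 'book', 'book']
```

Looking up counts[item] is a dictionary access, so the whole thing stays linear in the length of the list, unlike calling a.count(item) inside the comprehension.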
If you just want to print the result, there are better answers already. If you want to remove the infrequent items from the list in place, you can try this:
a = ['book','book','cards','book','foo','foo','computer']
countdict = {}
for word in a:
    if word not in countdict:
        countdict[word] = 1
    else:
        countdict[word] += 1
for x, y in countdict.items():
    # remove every occurrence of words that appear at most twice
    if (2 >= y):
        for i in range(y):
            a.remove(x)
You can try this.
def my_filter(my_list, my_freq):
    '''Filter a list of strings by frequency'''
    # use set() to unique my_list, then turn set back to list
    unique_list = list(set(my_list))
    # count frequency in unique_list
    frequencies = []
    for value in unique_list:
        frequencies.append(my_list.count(value))
    # filter frequency
    return_list = []
    for i, frequency in enumerate(frequencies):
        if frequency > my_freq:
            for _ in range(frequency):
                return_list.append(unique_list[i])
    return return_list
a = ['book','book','cards','book','foo','foo','computer']
my_filter(a, 2)
['book', 'book', 'book']

Removing some of the duplicates from a list in Python

I would like to remove a certain number of duplicates of a list without removing all of them. For example, I have a list [1,2,3,4,4,4,4,4] and I want to remove 3 of the 4's, so that I am left with [1,2,3,4,4]. A naive way to do it would probably be
def remove_n_duplicates(remove_from, what, how_many):
    for j in range(how_many):
        remove_from.remove(what)
Is there a way to remove the three 4's in one pass through the list, but keep the other two?
If you just want to remove the first n occurrences of something from a list, this is pretty easy to do with a generator:
def remove_n_dupes(remove_from, what, how_many):
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item
Usage looks like:
lst = [1,2,3,4,4,4,4,4]
print(list(remove_n_dupes(lst, 4, 3)))  # [1, 2, 3, 4, 4]
Keeping a specified number of duplicates of any item is similarly easy if we use a little extra auxiliary storage:
from collections import Counter
def keep_n_dupes(remove_from, how_many):
    counts = Counter()
    for item in remove_from:
        counts[item] += 1
        if counts[item] <= how_many:
            yield item
Usage is similar:
lst = [1,1,1,1,2,3,4,4,4,4,4]
print(list(keep_n_dupes(lst, 2)))  # [1, 1, 2, 3, 4, 4]
Here the input is the list and the max number of items that you want to keep. The caveat is that the items need to be hashable...
You can use Python's set functionality with the & operator to create a list of lists and then flatten the list. The result list will be [1, 2, 3, 4, 4].
x = [1,2,3,4,4,4,4,4]
x2 = [val for sublist in [[item]*max(1, x.count(item)-3) for item in set(x) & set(x)] for val in sublist]
As a function you would have the following.
def remove_n_duplicates(remove_from, what, how_many):
return [val for sublist in [[item]*max(1, remove_from.count(item)-how_many) if item == what else [item]*remove_from.count(item) for item in set(remove_from) & set(remove_from)] for val in sublist]
If the list is sorted, there's the fast solution:
def remove_n_duplicates(remove_from, what, how_many):
    # find the first occurrence of what
    index = 0
    for i in range(len(remove_from)):
        if remove_from[i] == what:
            index = i
            break
    if index + how_many > len(remove_from):
        #There aren't enough things to remove.
        return
    for i in range(index, index + how_many):
        if remove_from[i] != what:
            #Again, there aren't enough things to remove
            return
    endIndex = index + how_many
    # drop exactly how_many items, starting at the first occurrence
    return remove_from[:index] + remove_from[endIndex:]
Note that this returns a new list (or None if there aren't enough items to remove), so you want to do arr = remove_n_duplicates(arr, 4, 3)
Here is another trick which might be useful sometimes. Not to be taken as the recommended recipe.
def remove_n_duplicates(remove_from, what, how_many):
    exec('remove_from.remove(what);'*how_many)
I can solve it in a different way using collections. Note that this keeps only one copy of each item:
from collections import Counter
li = [1,2,3,4,4,4,4]
cntLi = Counter(li)
print(list(cntLi.keys()))

Nested lists python

Can anyone tell me how can I call for indexes in a nested list?
Generally I just write:
for i in range (list)
but what if I have a list with nested lists as below:
Nlist = [[2,2,2],[3,3,3],[4,4,4]...]
and I want to go through the indexes of each one separately?
If you really need the indices you can just do what you said again for the inner list:
l = [[2,2,2],[3,3,3],[4,4,4]]
for index1 in range(len(l)):
    for index2 in range(len(l[index1])):
        print(index1, index2, l[index1][index2])
But it is more pythonic to iterate through the list itself:
for inner_l in l:
    for item in inner_l:
        print(item)
If you really need the indices you can also use enumerate:
for index1, inner_l in enumerate(l):
    for index2, item in enumerate(inner_l):
        print(index1, index2, item, l[index1][index2])
Try this setup:
a = [["a","b","c",],["d","e"],["f","g","h"]]
To print the 2nd element in the 1st list ("b"), use print(a[0][1]). For the 2nd element in the 3rd list ("g"), use print(a[2][1]).
The first brackets reference which nested list you're accessing, the second pair references the item in that list.
You can do this. Adapt it to your situation:
for l in Nlist:
    for item in l:
        print(item)
The question title is too wide and the author's need is more specific. In my case, I needed to extract all elements from nested list like in the example below:
Example:
input -> [1,2,[3,4]]
output -> [1,2,3,4]
The code below gives me the result, but I would like to know if anyone can create a simpler answer:
def get_elements_from_nested_list(l, new_l):
    if l:  # guards against empty (sub)lists as well as None
        e = l[0]
        if isinstance(e, list):
            get_elements_from_nested_list(e, new_l)
        else:
            new_l.append(e)
        if len(l) > 1:
            return get_elements_from_nested_list(l[1:], new_l)
    return new_l
Call of the method
l = [1,2,[3,4]]
new_l = []
get_elements_from_nested_list(l, new_l)
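One possibly simpler variant is a recursive generator, which avoids passing the accumulator list around. This is only a sketch: it assumes Python 3.3+ (for yield from), treats only list instances as nested containers, and the name flatten_nested is made up for illustration:

```python
def flatten_nested(nested):
    """Yield the non-list elements of an arbitrarily nested list, left to right."""
    for element in nested:
        if isinstance(element, list):
            # recurse into the sublist and re-yield its elements
            yield from flatten_nested(element)
        else:
            yield element

print(list(flatten_nested([1, 2, [3, 4]])))  # [1, 2, 3, 4]
```

Because it iterates instead of slicing l[1:] on every call, it also avoids copying the tail of the list at each step.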
n = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
def flatten(lists):
    results = []
    for numbers in lists:
        for numbers2 in numbers:
            results.append(numbers2)
    return results
print(flatten(n))
Output: n = [1,2,3,4,5,6,7,8,9]
I think you want to access list values and their indices simultaneously and separately:
l = [[2,2,2],[3,3,3],[4,4,4],[5,5,5]]
l_len = len(l)
l_item_len = len(l[0])
for i in range(l_len):
    for j in range(l_item_len):
        print(f'List[{i}][{j}] : {l[i][j]}' )

Delete item in a list using a for-loop

I have an array of subjects, and every subject has an associated time. I want to compare every subject in the list with the others. If two subjects are the same, I want to add their times together and delete the second subject's information (subject name and time).
But if I delete an item, the list becomes shorter and I get an out-of-range error. I tried shortening the range by using subjectlength-1, but that doesn't work either.
...
subjectlength = 8
for x in range(subjectlength):
    for y in range(subjectlength):
        if subject[x] == subject[y]:
            if x != y:
                #add
                time[x] = time[x] + time[y]
                #delete
                del time[y]
                del subject[y]
                subjectlength = subjectlength - 1
Iterate backwards, if you can:
for x in range(subjectlength - 1, -1, -1):
and similarly for y.
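A minimal sketch of the backward iteration, assuming subject and time are parallel lists as in the question (the sample data is made up). Walking y from the end down to x + 1 means a deletion only shifts positions that have already been visited:

```python
subject = ['math', 'english', 'math', 'latin', 'english', 'math']
time = [1, 2, 3, 4, 5, 6]

# Walk x backwards; for each x, walk y backwards over the later positions.
for x in range(len(subject) - 1, -1, -1):
    for y in range(len(subject) - 1, x, -1):
        if subject[x] == subject[y]:
            time[x] += time[y]   # fold the later duplicate's time into the earlier entry
            del time[y]          # deleting at y only shifts indices above y
            del subject[y]

print(subject)  # ['math', 'english', 'latin']
print(time)     # [10, 7, 4]
```

The inner range is rebuilt from the current length each time x changes, so it never refers to an index that has already been deleted.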
If the elements of subject are hashable:
finalinfo = {}
for s, t in zip(subject, time):
    finalinfo[s] = finalinfo.get(s, 0) + t
This will result in a dict with subject: time key-value pairs.
The best practice is to make a new list of the entries to delete, and to delete them after walking the list:
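to_del = []
subjectlength = 8
for x in range(subjectlength):
    for y in range(x):
        # skip entries that were already merged into a later duplicate
        if subject[x] == subject[y] and y not in to_del:
            #add
            time[x] = time[x] + time[y]
            to_del.append(y)
# delete from the highest index down so earlier deletions don't shift later ones
for d in sorted(to_del, reverse=True):
    del subject[d]
    del time[d]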
An alternate way would be to create the subject and time lists anew, using a dict to sum up the times of recurring subjects (I am assuming subjects are strings i.e. hashable).
subjects=['math','english','necromancy','philosophy','english','latin','physics','latin']
time=[1,2,3,4,5,6,7,8]
tuples=zip(subjects,time)
my_dict={}
for subject,t in tuples:
    try:
        my_dict[subject]+=t
    except KeyError:
        my_dict[subject]=t
subjects,time=list(my_dict.keys()), list(my_dict.values())
print(subjects,time)
Though a while loop is certainly a better choice for this, if you insist on using a for loop, you can replace the list elements to be deleted with None (or any other distinguishable item) and rebuild the list after the for loop. The following code removes the even elements from a list of integers:
nums = [1, 1, 5, 2, 10, 4, 4, 9, 3, 9]
for i in range(len(nums)):
    # select the item that satisfies the condition
    if nums[i] % 2 == 0:
        # do_something_with_the(item)
        nums[i] = None # Not needed anymore, so set it to None
# redefine the list and exclude the None items
nums = [item for item in nums if item is not None]
# nums = [1, 1, 5, 9, 3, 9]
In the case of the question in this post:
...
for i in range(subjectlength - 1):
    if subject[i] is None:
        # already merged into an earlier entry
        continue
    for j in range(i+1, subjectlength):
        if subject[i] == subject[j]:
            #add
            time[i] += time[j]
            # set to None instead of delete
            time[j] = None
            subject[j] = None
time = [item for item in time if item is not None]
subject = [item for item in subject if item is not None]
