Removing N Elements - python

How to write a code that returns a list with all elements that occurred N times removed?
I'm trying to write a python function duplicate(elem, N) which returns a copy of the input list with all elements that appeared at least N times removed. The list can't be sorted. For example ((2, 2, 2, 1, 2, 3, 4, 4, 5), 2) would return (1,3,5).

You can use collections.Counter
import collections
def duplicate(lst, n):
counts = collections.Counter(lst)
keep = {v for v, count in counts.items() if count < n}
return [el for el in lst if el in keep]

Another way to solve your question by using defaultdict (PS: we can also use a normal dicts too if we want it):
from collections import defaultdict
def duplicate(data, n):
counter = defaultdict(int)
for elm in data:
counter[elm] += 1
return [k for k, v in counter.items() if v < n]
a = (2, 2, 2, 1, 2, 3, 4, 4, 5)
print(duplicate(a, 2))
Output:
[1, 3, 5]
PS: If you want to compare my code and the Python's collections.Counter class code, visit this collections in Python's Github repository. (Spoiler alert, Counter is using a dict not defaultdict)

I would use a list comprehension here. Assuming elem is your list of values as input.
[x for x in set(elem) if elem.count(x) < 2]

Related

How to remove all repeating elements from a list in python?

I have a list like this:-
[1,2,3,4,3,5,3,6,7,8]
I want to remove the repeating element (here:- 3) completely from the list like this:-
[1,2,4,5,6,7,8]
How do I implement this in python such that not only the first occurrence of duplicate element is removed but all the duplicating values are removed
You can use Counter from collections to count the number of occurrences and select those elements which appear exactly once using a list comprehension:
from collections import Counter
a = [1,2,3,4,3,5,3,6,7,8]
[k for k, v in Counter(a).items() if v == 1]
(Counter basically returns a dictionary where elements are stored as keys and their counts are stored as values.)
This code should work
#Finding duplicate
def dup(x):
s = len(x)
duplicate = []
for i in range(s):
k = i + 1
for j in range(k, s):
if x[i] == x[j] and x[i] not in duplicate:
duplicate.append(x[i])
return duplicate
#begining
list1 = [1, 2, 3, 4, 3, 5, 3, 6, 7, 8]
#Finding duplicate
dup_list = (dup(list1))
#removing duplicates from the list
final_list = list(set(list1) - set(dup_list))
print(final_list)
A good and efficient way will be to use pandas
import pandas as pd
sample_list = [1,2,3,4,3,5,3,6,7,8]
unique_list = list(pd.Series(sample_list).drop_duplicates())
>> unique_list
>> [1, 2, 3, 4, 5, 6, 7, 8]
Use a set. To get rid of all repeating values use this:
a = [1,2,3,4,3,5,3,6,7,8]
print(a)
a = list(set(a))
print(a)
That will output
[1,2,3,4,3,5,3,6,7,8]
[1,2,4,5,6,7,8]

Count elements in a list without using Counter

Should be returning a dictionary in the following format:
key_count([1, 3, 2, 1, 5, 3, 5, 1, 4]) ⇒ {
1: 3,
2: 1,
3: 2,
4: 1,
5: 2,
}
I know the fastest way to do it is the following:
import collections
def key_count(l):
return collections.Counter(l)
However, I would like to do it without importing collections.Counter.
So far I have:
x = []
def key_count(l):
for i in l:
if i not in x:
x.append(i)
count = []
for i in l:
if i == i:
I approached the problem by trying to extract the two sides (keys and values) of the dictionary into separate lists and then use zip to create the dictionary. As you can see, I was able to extract the keys of the eventual dictionary but I cannot figure out how to add the number of occurrences for each number from the original list in a new list. I wanted to create an empty list count that will eventually be a list of numbers that denote how many times each number in the original list appeared. Any tips? Would appreciate not giving away the full answer as I am trying to solve this! Thanks in advance
Separating the keys and values is a lot of effort when you could just build the dict directly. Here's the algorithm. I'll leave the implementation up to you, though it sort of implements itself.
Make an empty dict
Iterate through the list
If the element is not in the dict, set the value to 1. Otherwise, add to the existing value.
See the implementation here:
https://stackoverflow.com/a/8041395/4518341
Classic reduce problem. Using a loop:
a = [1, 3, 2, 1, 5, 3, 5, 1, 4]
m = {}
for n in a:
if n in m: m[n] += 1
else: m[n] = 1
print(m)
Or explicit reduce:
from functools import reduce
a = [1, 3, 2, 1, 5, 3, 5, 1, 4]
def f(m, n):
if n in m: m[n] += 1
else: m[n] = 1
return m
m2 = reduce(f, a, {})
print(m2)
use a dictionary to pair keys and values and use your x[] to track the diferrent items founded.
import collections
def keycount(l):
return collections.Counter(l)
key_count=[1, 3, 2, 1, 5, 3, 5, 1, 4]
x = []
dictionary ={}
def Collection_count(l):
for i in l:
if i not in x:
x.append(i)
dictionary[i]=1
else:
dictionary[i]=dictionary[i]+1
Collection_count(key_count)
[print(key, value) for (key, value) in sorted(dictionary.items())]

How to pair up consecutive elements in a list, without overlap?

Simple problem: I have u = [1, 2, 3, 4, 5, 6]. I want v = [(1,2), (3,4), (5,6)]. How?
Obviously, one solution is:
v = []
for i in range(len(u)):
j = 2*i
v += [(u(j), u(j+1))]
But this is so ugly and lame I hate looking at it every time I do it. Is there a better way?
zip is a pythonic solution.
list(zip(u[::2], u[1::2]))
Here's a more general solution for iterables that don't support slice notation
def n_clusters(iterable, n=2):
return zip(*[iter(iterable)]*n)
Note that any leftover values (say your input list has an odd number of elements) aren't included in the output. We an correct this by using the itertools.zip_longest function.
from itertools import zip_longest
def n_clusters(iterable, n=2, fillvalue=None):
return zip_longest(*[iter(iterable)]*n, fillvalue=fillvalue)
Both of the above will return iterators over the grouped values. You can either build a list out of them manually
grouped_values = list(n_cluster([1, 2, 3, 4]))
# [(1, 2), (3, 4)]
or iterate over them directly
for a, b in n_cluster([1, 2, 3, 4]):
...

Sorting repeated elements

What I try to do is to write a function sort_repeated(L) which returns a sorted list of the repeated elements in the list L.
For example,
>>sort_repeated([1,2,3,2,1])
[1,2]
However, my code does not work properly. What did I do wrong in my code?
def f5(nums):
count = dict()
if not nums:
for num in nums:
if count[num]:
count[num] += 1
else:
count[num] = 1
return sorted([num for num in count if count[num]>1])
return []
if count[num]: will fail if the dictionary doesn't have the key already. Take a look at the various counter recipes on this site and use one instead.
Also, not nums is true if nums is an empty sequence, which means that the loop body will never be executed. Invert the condition.
Use a counter and check for values greater than 1
from collections import Counter
def sort_repeated(_list):
cntr = Counter(_list)
print sorted([x for x in cntr.keys() if cntr[x] > 1])
sort_repeated([7, 7, 1, 2, 3, 2, 1, 4, 3, 4, 6, 5])
>> [1, 2, 3, 4, 7]

How to find duplicate elements in array using for loop in Python?

I have a list with duplicate elements:
list_a=[1,2,3,5,6,7,5,2]
tmp=[]
for i in list_a:
if tmp.__contains__(i):
print i
else:
tmp.append(i)
I have used the above code to find the duplicate elements in the list_a. I don't want to remove the elements from list.
But I want to use for loop here.
Normally C/C++ we use like this I guess:
for (int i=0;i<=list_a.length;i++)
for (int j=i+1;j<=list_a.length;j++)
if (list_a[i]==list_a[j])
print list_a[i]
how do we use like this in Python?
for i in list_a:
for j in list_a[1:]:
....
I tried the above code. But it gets solution wrong. I don't know how to increase the value for j.
Just for information, In python 2.7+, we can use Counter
import collections
x=[1, 2, 3, 5, 6, 7, 5, 2]
>>> x
[1, 2, 3, 5, 6, 7, 5, 2]
>>> y=collections.Counter(x)
>>> y
Counter({2: 2, 5: 2, 1: 1, 3: 1, 6: 1, 7: 1})
Unique List
>>> list(y)
[1, 2, 3, 5, 6, 7]
Items found more than 1 time
>>> [i for i in y if y[i]>1]
[2, 5]
Items found only one time
>>> [i for i in y if y[i]==1]
[1, 3, 6, 7]
Use the in operator instead of calling __contains__ directly.
What you have almost works (but is O(n**2)):
for i in xrange(len(list_a)):
for j in xrange(i + 1, len(list_a)):
if list_a[i] == list_a[j]:
print "duplicate:", list_a[i]
But it's far easier to use a set (roughly O(n) due to the hash table):
seen = set()
for n in list_a:
if n in seen:
print "duplicate:", n
else:
seen.add(n)
Or a dict, if you want to track locations of duplicates (also O(n)):
import collections
items = collections.defaultdict(list)
for i, item in enumerate(list_a):
items[item].append(i)
for item, locs in items.iteritems():
if len(locs) > 1:
print "duplicates of", item, "at", locs
Or even just detect a duplicate somewhere (also O(n)):
if len(set(list_a)) != len(list_a):
print "duplicate"
You could always use a list comprehension:
dups = [x for x in list_a if list_a.count(x) > 1]
Before Python 2.3, use dict() :
>>> lst = [1, 2, 3, 5, 6, 7, 5, 2]
>>> stats = {}
>>> for x in lst : # count occurrences of each letter:
... stats[x] = stats.get(x, 0) + 1
>>> print stats
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1} # filter letters appearing more than once:
>>> duplicates = [dup for (dup, i) in stats.items() if i > 1]
>>> print duplicates
So a function :
def getDuplicates(iterable):
"""
Take an iterable and return a generator yielding its duplicate items.
Items must be hashable.
e.g :
>>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
[2, 5]
"""
stats = {}
for x in iterable :
stats[x] = stats.get(x, 0) + 1
return (dup for (dup, i) in stats.items() if i > 1)
With Python 2.3 comes set(), and it's even a built-in after than :
def getDuplicates(iterable):
"""
Take an iterable and return a generator yielding its duplicate items.
Items must be hashable.
e.g :
>>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
[2, 5]
"""
try: # try using built-in set
found = set()
except NameError: # fallback on the sets module
from sets import Set
found = Set()
for x in iterable:
if x in found : # set is a collection that can't contain duplicate
yield x
found.add(x) # duplicate won't be added anyway
With Python 2.7 and above, you have the collections module providing the very same function than the dict one, and we can make it shorter (and faster, it's probably C under the hood) than solution 1 :
import collections
def getDuplicates(iterable):
"""
Take an iterable and return a generator yielding its duplicate items.
Items must be hashable.
e.g :
>>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
[2, 5]
"""
return (dup for (dup, i) in collections.counter(iterable).items() if i > 1)
I'd stick with solution 2.
You can use this function to find duplicates:
def get_duplicates(arr):
dup_arr = arr[:]
for i in set(arr):
dup_arr.remove(i)
return list(set(dup_arr))
Examples
print get_duplicates([1,2,3,5,6,7,5,2])
[2, 5]
print get_duplicates([1,2,1,3,4,5,4,4,6,7,8,2])
[1, 2, 4]
If you're looking for one-to-one mapping between your nested loops and Python, this is what you want:
n = len(list_a)
for i in range(n):
for j in range(i+1, n):
if list_a[i] == list_a[j]:
print list_a[i]
The code above is not "Pythonic". I would do it something like this:
seen = set()
for i in list_a:
if i in seen:
print i
else:
seen.add(i)
Also, don't use __contains__, rather, use in (as above).
The following requires the elements of your list to be hashable (not just implementing __eq__ ).
I find it more pythonic to use a defaultdict (and you have the number of repetitions for free):
import collections
l = [1, 2, 4, 1, 3, 3]
d = collections.defaultdict(int)
for x in l:
d[x] += 1
print [k for k, v in d.iteritems() if v > 1]
# prints [1, 3]
Using only itertools, and works fine on Python 2.5
from itertools import groupby
list_a = sorted([1, 2, 3, 5, 6, 7, 5, 2])
result = dict([(r, len(list(grp))) for r, grp in groupby(list_a)])
Result:
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
It looks like you have a list (list_a) potentially including duplicates, which you would rather keep as it is, and build a de-duplicated list tmp based on list_a. In Python 2.7, you can accomplish this with one line:
tmp = list(set(list_a))
Comparing the lengths of tmp and list_a at this point should clarify if there were indeed duplicate items in list_a. This may help simplify things if you want to go into the loop for additional processing.
You could just "translate" it line by line.
c++
for (int i=0;i<=list_a.length;i++)
for (int j=i+1;j<=list_a.length;j++)
if (list_a[i]==list_a[j])
print list_a[i]
Python
for i in range(0, len(list_a)):
for j in range(i + 1, len(list_a))
if list_a[i] == list_a[j]:
print list_a[i]
c++ for loop:
for(int x = start; x < end; ++x)
Python equivalent:
for x in range(start, end):
Just quick and dirty,
list_a=[1,2,3,5,6,7,5,2]
holding_list=[]
for x in list_a:
if x in holding_list:
pass
else:
holding_list.append(x)
print holding_list
Output [1, 2, 3, 5, 6, 7]
Using numpy:
import numpy as np
count,value = np.histogram(list_a,bins=np.hstack((np.unique(list_a),np.inf)))
print 'duplicate value(s) in list_a: ' + ', '.join([str(v) for v in value[count>1]])
In case of Python3 and if you two lists
def removedup(List1,List2):
List1_copy = List1[:]
for i in List1_copy:
if i in List2:
List1.remove(i)
List1 = [4,5,6,7]
List2 = [6,7,8,9]
removedup(List1,List2)
print (List1)
Granted, I haven't done tests, but I guess it's going to be hard to beat pandas in speed:
pd.DataFrame(list_a, columns=["x"]).groupby('x').size().to_dict()
You can use:
b=['E', 'P', 'P', 'E', 'O', 'E']
c={}
for i in b:
value=0
for j in b:
if(i == j):
value+=1
c[i]=value
print(c)
Output:
{'E': 3, 'P': 2, 'O': 1}
Find duplicates in the list using loops, conditional logic, logical operators, and list methods
some_list = ['a','b','c','d','e','b','n','n','c','c','h',]
duplicates = []
for values in some_list:
if some_list.count(values) > 1:
if values not in duplicates:
duplicates.append(values)
print("Duplicate Values are : ",duplicates)
Finding the number of repeating elements in a list:
myList = [3, 2, 2, 5, 3, 8, 3, 4, 'a', 'a', 'f', 4, 4, 1, 8, 'D']
listCleaned = set(myList)
for s in listCleaned:
count = 0
for i in myList:
if s == i :
count += 1
print(f'total {s} => {count}')
Try like this:
list_a=[1,2,3,5,6,7,5,2]
unique_values = []
duplicates = []
for i in list_a:
if i not in unique_values:
unique_values.append(i)
else:
found = False
for x in duplicates:
if x.get("key") == i:
found = True
if found:
x["occurrence"] += 1
else:
duplicates.append({
"key": i,
"occurrence": 1
})
some_string= list(input("Enter any string:\n"))
count={}
dup_count={}
for i in some_string:
if i not in count:
count[i]=1
else:
count[i]+=1
dup_count[i]=count[i]
print("Duplicates of given string are below:\n",dup_count)
A little bit more Pythonic implementation (not the most, of course), but in the spirit of your C code could be:
for i, elem in enumerate(seq):
if elem in seq[i+1:]:
print elem
Edit: yes, it prints the elements more than once if there're more than 2 repetitions, but that's what the op's C pseudo code does too.

Categories

Resources