Compare occurrences of shared items between lists - python

OK, for a fun project I'm working on in order to learn some Python, I'm hitting a wall with what should be a basic task: I need to compare lists to find how often the items shared among the lists occur in each list. Using
shared_items = set(alist).intersection(blist)
gives me the items shared between the lists, but it does not tell me how often those items occur in each list.
I tried loops like this for example:
def the_count(alist, blist):
    c = 0
    for x in alist:
        for y in blist:
            if x == y:
                c += 1
    return c
but that doesn't do the trick.
Another attempt was to use Counter:
c = Counter(alist)
b = Counter(blist)
But trying to loop over the Counter results failed too; my last try was:
a = Counter(alist)
b = Counter(blist)
for key, val in a:
    if key in b:
        val1 = b[key]
        if val < val1:
            print b[key]
        else:
            print a[key]

You almost had it with the set intersection. Since that gives you the common elements between the two lists, all you have to do now is loop over those elements and count them. One way could be:
list1 = [0, 1, 2, 3, 1, 2, 3, 4, 3, 2]
list2 = [1, 4, 3, 5, 2, 1, 0, 2, 7, 8]
shared = set(list1).intersection(list2)
# Now, loop over the elements and create a dictionary using a dict comprehension.
# The key will be the shared element, and the value will be a tuple
# containing the counts from the first list and the second list, respectively.
counts = {num: (list1.count(num), list2.count(num)) for num in shared}
counts now contains:
{
0: (1, 1),
1: (2, 2),
2: (3, 2),
3: (3, 1),
4: (1, 1)
}
This can further be abstracted into a function similar to:
def count_shared_elements(list1, list2):
    shared = set(list1).intersection(list2)
    return {num: (list1.count(num), list2.count(num)) for num in shared}
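For illustration, a quick check with the two lists above (my own addition, not part of the original answer):
print(count_shared_elements(list1, list2))
# {0: (1, 1), 1: (2, 2), 2: (3, 2), 3: (3, 1), 4: (1, 1)}  (key order may vary)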

Using a dict comprehension, as jrd1 pointed out:
>>> list1 = [0, 1, 2, 3, 1, 2, 3, 4, 3, 2]
>>> list2 = [1, 4, 3, 5, 2, 1, 0, 2, 7, 8]
>>> {i:(list1.count(i), list2.count(i)) for i in set(list1) & set(list2)}
{0: (1, 1), 1: (2, 2), 2: (3, 2), 3: (3, 1), 4: (1, 1)}

Take a look at the answers linked in the question comments; another way to do this would be:
c = 0
for a in alist:
    c += blist.count(a)

The best way is to get the unique items from the two lists and check the count of each of those distinct items in each list.
for distinct_num in set(alist + blist):
    print(alist.count(distinct_num))
    print(blist.count(distinct_num))
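If you would rather collect those counts into a dictionary instead of printing them, a small sketch along the same lines (my illustration, not the original answer) could be:
counts = {num: (alist.count(num), blist.count(num)) for num in set(alist + blist)}
Since this uses the union of the two lists, items present in only one list show up with a count of 0 for the other.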

Related

Python: How to conveniently count the frequency of lists in a collection of lists

I have a list of lists.
e.g. list_a = [[1,2,3], [2,3], [4,3,2], [2,3]]
I want to count them like
[1,2,3]: 1
[2,3]: 2
[4,3,2]: 1
There is Counter in the collections library, but it does not work for unhashable elements like lists. Currently, I just use indirect workarounds, for example converting the list [1,2,3] into the string "1_2_3". Is there any way to count the lists directly?
Not the prettiest way to do it, but this works:
list_a = [[1,2,3], [2,3], [4,3,2], [2,3]]
counts = {}
for x in list_a:
    counts.setdefault(tuple(x), list()).append(1)
for a, b in counts.items():
    counts[a] = sum(b)
print(counts)
{(2, 3): 2, (4, 3, 2): 1, (1, 2, 3): 1}
A possible approach is to use a dict:
Create an empty dict.
Iterate over the list using a for loop.
For each element (iteration), check if the dict contains it.
If it doesn't, save it in the dict as a key. The value will be the occurrence counter.
If it does, just increment its value.
Possible implementation:
occurrence_dict = {}
for lst in list_a:
    if occurrence_dict.get(str(lst), False):
        occurrence_dict[str(lst)] += 1
    else:
        occurrence_dict[str(lst)] = 1
print(occurrence_dict)
You can achieve it easily by using tuples instead of lists:
c = Counter(tuple(item) for item in list_a)
# or
c = Counter(map(tuple, list_a))
# Counter({(2, 3): 2, (1, 2, 3): 1, (4, 3, 2): 1})
# exactly what you expected
(1, 2, 3) 1
(2, 3) 2
(4, 3, 2) 1
Way 1
Via the indexes of repeated lists
list_a = [[1,2,3], [2,3], [4,3,2], [2,3], [1,2,3]] # just add some data
# step 1
dd = {i:v for i, v in enumerate(list_a)}
print(dd)
Out[1]:
{0: [1, 2, 3], 1: [2, 3], 2: [4, 3, 2], 3: [2, 3], 4: [1, 2, 3]}
# step 2
tpl = [tuple(x for x,y in dd.items() if y == b) for a,b in dd.items()]
print(tpl)
Out[2]:
[(0, 4), (1, 3), (2,), (1, 3), (0, 4)] # here is the tuple of indexes of matching lists
# step 3
result = {tuple(list_a[a[0]]):len(a) for a in set(tpl)}
print(result)
Out[3]:
{(4, 3, 2): 1, (2, 3): 2, (1, 2, 3): 2}
Way 2
Through converting nested lists to tuples
{i:[tuple(a) for a in list_a].count(i) for i in [tuple(a) for a in list_a]}
Out[1]:
{(1, 2, 3): 2, (2, 3): 2, (4, 3, 2): 1}

Python Sorting with lambda

I am new to Python and I have a question.
A = [3,2,4,1]
N = len(A)
B = sorted(range(N), key = lambda i: A[i])
print(B)
output #[3, 1, 0, 2]
input #A = [7,2,4,1]
output #[3, 1, 2, 0]
I do not understand the output. Can anyone explain it to me?
Let's talk about the specific example you have used
A = [3, 2, 4, 1]
N = len(A)  # N = 4
B = sorted(range(N), key = lambda i: A[i]) # sorted([0,1,2,3], key= lambda i:A[i])
Basically you are trying to sort [0,1,2,3] based on the values A[i] which are [3,2,4,1]
Now, A[3] < A[1] < A[0] < A[2]
And so you get the answer as [3, 1, 0, 2]
In the sorted function, the first argument is the iterable you would like to sort; here it is a range that yields the four indices 0 through 3. sorted orders those items according to the value returned by the anonymous key function.
In your case, A = [3, 2, 4, 1].
The list to sort is [0, 1, 2, 3], and the key for each element is [3, 2, 4, 1]. Basically you can imagine sorting the pairs [(0, 3), (1, 2), (2, 4), (3, 1)] by the second element and then keeping only the first element, which results in the [3, 1, 0, 2] you get.
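A minimal sketch of that "pair up, sort by the key, keep the index" picture (my own illustration, using the same A):
A = [3, 2, 4, 1]
pairs = sorted(enumerate(A), key=lambda p: p[1])  # [(3, 1), (1, 2), (0, 3), (2, 4)]
B = [i for i, value in pairs]                     # [3, 1, 0, 2]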

Find most common element

How can I print the most common element of a list without importing a library?
l=[1,2,3,4,4,4]
So I want the output to be 4.
You can get the unique values first:
l = [1, 2, 3, 4, 4, 4]
s = set(l)
then you can create list of (occurrences, value) tuples
freq = [(l.count(i), i) for i in s] # [(1, 1), (1, 2), (1, 3), (3, 4)]
get the "biggest" element (biggest number of occurrences, the biggest value if there are more than one with the same number of occurrences):
result = max(freq) # (3, 4)
and print the value:
print(result[1]) # 4
or as a "one-liner" way:
l = [1, 2, 3, 4, 4, 4]
print(max((l.count(i), i) for i in set(l))[1]) # 4
lst=[1,2,2,2,3,3,4,4,5,6]
from collections import Counter
Counter(lst).most_common(1)[0]
Counter(lst) returns a dict-like mapping of elements to their counts. most_common(n) returns the n most common elements, along with their number of occurrences.
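With the list above (a quick check of my own), that expression evaluates to:
Counter(lst).most_common(1)[0]  # (2, 3): the element 2 occurs 3 times
Index [0] of that tuple is the element itself and index [1] is its count.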

Python Random List Comprehension

I have a list similar to:
[1 2 1 4 5 2 3 2 4 5 3 1 4 2]
I want to create a list of x random elements from this list where none of the chosen elements are the same. The difficult part is that I would like to do this by using list comprehension...
So possible results if x = 3 would be:
[1 2 3]
[2 4 5]
[3 1 4]
[4 5 1]
etc...
Thanks!
I should have specified that I cannot convert the list to a set. Sorry!
I need the randomly selected numbers to be weighted. So if 1 appears 4 times in the list and 3 appears 2 times in the list, then 1 is twice as likely to be selected...
Disclaimer: the "use a list comprehension" requirement is absurd.
Moreover, if you want to use the weights, there are many excellent approaches listed at Eli Bendersky's page on weighted random sampling.
The following is inefficient, doesn't scale, etc., etc.
That said, it has not one but two (TWO!) list comprehensions, returns a list, never duplicates elements, and respects the weights in a sense:
>>> s = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[3, 1, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[5, 3, 4]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[1, 5, 2]
.. or, as simplified by FMc:
>>> [x for x in random.choice([p for p in itertools.permutations(s, 3) if len(set(p)) == 3])]
[3, 5, 2]
(I'll leave the x for x in there, even though it hurts not to simply write list(random.choice(..)) or just leave it as a tuple..)
Generally, you don't want to do this sort of thing in a list comprehension -- it'll lead to much harder-to-read code. However, if you really must, we can write a completely horrible 1 liner:
>>> values = [random.randint(0,10) for _ in xrange(12)]
>>> values
[1, 10, 6, 6, 3, 9, 0, 1, 8, 9, 1, 2]
>>> # This is the 1 liner -- The other line was just getting us a list to work with.
>>> [(lambda x=random.sample(values,3):any(values.remove(z) for z in x) or x)() for _ in xrange(4)]
[[6, 1, 8], [1, 6, 10], [1, 0, 2], [9, 3, 9]]
Please never use this code -- I only post it for fun/academic reasons.
Here's how it works:
I create a function inside the list comprehension with a default argument of 3 randomly selected elements from the input list. Inside the function, I remove the elements from values so that they aren't available to be picked again. Since list.remove returns None, I can use any(lst.remove(x) for x in ...) to remove the values and return False. Since any returns False, we hit the or clause, which just returns x (the default value with 3 randomly selected items) when the function is called. All that is left then is to call the function and let the magic happen.
The one catch here is that you need to make sure that the number of groups you request (here I chose 4) multiplied by the number of items per group (here I chose 3) is less than or equal to the number of values in your input list. It may seem obvious, but it's probably worth mentioning anyway...
Here's another version where I pull shuffle into the list comprehension:
>>> lst = [random.randint(0,10) for _ in xrange(12)]
>>> lst
[3, 5, 10, 9, 10, 1, 6, 10, 4, 3, 6, 5]
>>> [lst[i*3:i*3+3] for i in xrange(shuffle(lst) or 4)]
[[6, 10, 6], [3, 4, 10], [1, 3, 5], [9, 10, 5]]
This is significantly better than my first attempt; however, most people would still need to stop and scratch their head a bit before they figured out what this code was doing. I still assert that it would be much better to do this in multiple lines.
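For comparison, a multi-line version of the same shuffle-and-chunk idea might look like this (a sketch of my own; the helper name random_groups is made up, and like the one-liner it does not guarantee that the elements within a group are unique):
import random

def random_groups(lst, group_size, num_groups):
    # shuffle a copy, then slice it into consecutive chunks
    shuffled = lst[:]
    random.shuffle(shuffled)
    return [shuffled[i * group_size:(i + 1) * group_size] for i in range(num_groups)]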
If I'm understanding your question properly, this should work:
def weighted_sample(L, x):
    # might consider raising some kind of exception if len(set(L)) < x
    while True:
        ans = random.sample(L, x)
        if len(set(ans)) == x:
            return ans
Then if you want many such samples you can just do something like:
[weighted_sample(L, x) for _ in range(num_samples)]
I have a hard time conceiving of a comprehension for the sampling logic that isn't just obfuscated. The logic is a bit too complicated. It sounds like something randomly tacked on to a homework assignment to me.
If you don't like infinite looping, I haven't tried it but I think this will work:
def weighted_sample(L, x):
    ans = []
    c = collections.Counter(L)
    while len(ans) < x:
        r = random.randint(0, sum(c.values()))
        for k in c:
            if r < c[k]:
                ans.append(k)
                del c[k]
                break
            else:
                r -= c[k]
        else:
            # maybe throw an exception, since this should never happen on valid input
            pass
    return ans
First of all, I assume your list looks like
[1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
So if you want to print the size-3 permutations of the unique values in the given list, you can do the following.
import itertools
l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
for permutation in itertools.permutations(list(set(l)), 3):
    print permutation,
Output:
(1, 2, 3) (1, 2, 4) (1, 2, 5) (1, 3, 2) (1, 3, 4) (1, 3, 5) (1, 4, 2) (1, 4, 3) (1, 4, 5) (1, 5, 2) (1, 5, 3) (1, 5, 4) (2, 1, 3) (2, 1, 4) (2, 1, 5) (2, 3, 1) (2, 3, 4) (2, 3, 5) (2, 4, 1) (2, 4, 3) (2, 4, 5) (2, 5, 1) (2, 5, 3) (2, 5, 4) (3, 1, 2) (3, 1, 4) (3, 1, 5) (3, 2, 1) (3, 2, 4) (3, 2, 5) (3, 4, 1) (3, 4, 2) (3, 4, 5) (3, 5, 1) (3, 5, 2) (3, 5, 4) (4, 1, 2) (4, 1, 3) (4, 1, 5) (4, 2, 1) (4, 2, 3) (4, 2, 5) (4, 3, 1) (4, 3, 2) (4, 3, 5) (4, 5, 1) (4, 5, 2) (4, 5, 3) (5, 1, 2) (5, 1, 3) (5, 1, 4) (5, 2, 1) (5, 2, 3) (5, 2, 4) (5, 3, 1) (5, 3, 2) (5, 3, 4) (5, 4, 1) (5, 4, 2) (5, 4, 3)
Hope this helps. :)
>>> from random import shuffle
>>> L = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> x=3
>>> shuffle(L)
>>> zip(*[L[i::x] for i in range(x)])
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
You could also use a generator expression instead of the list comprehension
>>> zip(*(L[i::x] for i in range(x)))
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
Starting with a way to do it without list comprehensions:
import random
import itertools
alphabet = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
def alphas():
    while True:
        yield random.choice(alphabet)

def filter_unique(iterable):
    found = set()
    for a in iterable:
        if a not in found:
            found.add(a)
            yield a

def dice(x):
    while True:
        yield itertools.islice(
            filter_unique(alphas()),
            x
        )

for i, output in enumerate(dice(3)):
    print list(output)
    if i > 10:
        break
The part where list comprehensions have trouble is filter_unique(), since a list comprehension has no 'memory' of what it has already output. A possible workaround is to keep generating outputs until one of good quality is found, as #DSM suggested.
The slow, naive approach is:
import random
def pick_n_unique(l, n):
    res = set()
    while len(res) < n:
        res.add(random.choice(l))
    return list(res)
This will pick elements and only quit when it has n unique ones:
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[2, 3, 4]
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[3, 4, 5]
However, it can get slow if, for example, you have a list with thirty 1s and one 2, since once it has a 1 it'll keep spinning until it finally hits a 2. A better approach is to count the occurrences of each unique element, choose a random one weighted by its occurrence count, remove that element from the count list, and repeat until you have the desired number of elements:
def weighted_choice(item__counts):
    total_counts = sum(count for item, count in item__counts.items())
    which_count = random.random() * total_counts
    for item, count in item__counts.items():
        which_count -= count
        if which_count < 0:
            return item
    raise ValueError("Should never get here")

def pick_n_unique(items, n):
    item__counts = collections.Counter(items)
    if len(item__counts) < n:
        raise ValueError(
            "Can't pick %d values with only %d unique values" % (
                n, len(item__counts)))
    res = []
    for i in xrange(n):
        choice = weighted_choice(item__counts)
        res.append(choice)
        del item__counts[choice]
    return tuple(res)
Either way, this is a problem not well-suited to list comprehensions.
def sample(self, population, k):
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = int(random.random() * n)
            while j in selected:
                j = int(random.random() * n)
            selected_add(j)
            result[i] = population[j]
    except (TypeError, KeyError):  # handle (at least) sets
        if isinstance(population, list):
            raise
        return self.sample(tuple(population), k)
    return result
Above is a simplified version of the sample function from Lib/random.py; I only removed some optimization code for small data sets. The code shows directly how to implement a customized sample function:
get a random number
if the number have appeared before just abandon it and get a new one
repeat the above steps until you get all the sample numbers you want.
Then the real problem turns out to be how to get a random value from a list by weight. This could be done with the original random.sample(population, 1) from the Python standard library (a little overkill here, but simple).
Below is an implementation. Because duplicates represent weight in your given list, we can use int(random.random() * array_length) to get a random index into your array.
import random
arr = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
def sample_by_weight(population, k):
    n = len(population)
    if not 0 <= k <= len(set(population)):
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = population[int(random.random() * n)]
            while j in selected:
                j = population[int(random.random() * n)]
            selected_add(j)
            result[i] = j
    except (TypeError, KeyError):  # handle (at least) sets
        if isinstance(population, list):
            raise
        return sample_by_weight(tuple(population), k)
    return result
[sample_by_weight(arr,3) for i in range(10)]
With the setup:
from random import shuffle
from collections import deque
l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
This code:
def getSubLists(l, n):
    shuffle(l)            # shuffle l so the elements are in 'random' order
    l = deque(l, len(l))  # create a structure with O(1) insert/pop at both ends
    while l:              # while there are still elements to choose
        sample = set()    # use a set (O(1) lookups) to check for duplicates
        while len(sample) < n and l:  # until the sample is n long or l is exhausted
            top = l.pop()             # get the top value in l
            if top in sample:
                l.appendleft(top)     # add it to the back of l for a later sample
            else:
                sample.add(top)       # it isn't in sample already so use it
        yield sample                  # yield the sample
You end up with:
for s in getSubLists(l, 3):
    print s
>>>
set([1, 2, 5])
set([1, 2, 3])
set([2, 4, 5])
set([2, 3, 4])
set([1, 4])

How to count the frequency of the elements in an unordered list? [duplicate]

This question already has answers here:
Using a dictionary to count the items in a list
(8 answers)
Closed 7 months ago.
Given an unordered list of values like
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
How can I get the frequency of each value that appears in the list, like so?
# `a` has 4 instances of `1`, 4 of `2`, 2 of `3`, 1 of `4`, 2 of `5`
b = [4, 4, 2, 1, 2] # expected output
In Python 2.7 (or newer), you can use collections.Counter:
>>> import collections
>>> a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
>>> counter = collections.Counter(a)
>>> counter
Counter({1: 4, 2: 4, 5: 2, 3: 2, 4: 1})
>>> counter.values()
dict_values([2, 4, 4, 1, 2])
>>> counter.keys()
dict_keys([5, 1, 2, 4, 3])
>>> counter.most_common(3)
[(1, 4), (2, 4), (5, 2)]
>>> dict(counter)
{5: 2, 1: 4, 2: 4, 4: 1, 3: 2}
>>> # Get the counts in order matching the original specification,
>>> # by iterating over keys in sorted order
>>> [counter[x] for x in sorted(counter.keys())]
[4, 4, 2, 1, 2]
If you are using Python 2.6 or older, you can download an implementation here.
If the list is sorted, you can use groupby from the itertools standard library (if it isn't, you can just sort it first, although this takes O(n lg n) time):
from itertools import groupby
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
[len(list(group)) for key, group in groupby(sorted(a))]
Output:
[4, 4, 2, 1, 2]
Python 2.7+ introduces Dictionary Comprehension. Building the dictionary from the list will get you the count as well as get rid of duplicates.
>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> d = {x:a.count(x) for x in a}
>>> d
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
>>> a, b = d.keys(), d.values()
>>> a
[1, 2, 3, 4, 5]
>>> b
[4, 4, 2, 1, 2]
Count the number of appearances manually by iterating through the list and counting them up, using a collections.defaultdict to track what has been seen so far:
from collections import defaultdict
appearances = defaultdict(int)
for curr in a:
    appearances[curr] += 1
In Python 2.7+, you could use collections.Counter to count items
>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>>
>>> from collections import Counter
>>> c=Counter(a)
>>>
>>> c.values()
[4, 4, 2, 1, 2]
>>>
>>> c.keys()
[1, 2, 3, 4, 5]
Counting the frequency of elements is probably best done with a dictionary:
b = {}
for item in a:
    b[item] = b.get(item, 0) + 1
To remove the duplicates, use a set:
a = list(set(a))
You can do this:
import numpy as np
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
np.unique(a, return_counts=True)
Output:
(array([1, 2, 3, 4, 5]), array([4, 4, 2, 1, 2], dtype=int64))
The first array is values, and the second array is the number of elements with these values.
So if you want to get just the array of counts, you should use this:
np.unique(a, return_counts=True)[1]
Here's another succinct alternative using itertools.groupby, which also works for unordered input:
from itertools import groupby
items = [5, 1, 1, 2, 2, 1, 1, 2, 2, 3, 4, 3, 5]
results = {value: len(list(freq)) for value, freq in groupby(sorted(items))}
results
# format: {value: number_of_occurrences}
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
I would simply use scipy.stats.itemfreq in the following manner:
from scipy.stats import itemfreq
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
freq = itemfreq(a)
a = freq[:,0]
b = freq[:,1]
you may check the documentation here: http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.itemfreq.html
import numpy as np
import pandas as pd
from collections import Counter

a = ["E","D","C","G","B","A","B","F","D","D","C","A","G","A","C","B","F","C","B"]
counter = Counter(a)
kk = [list(counter.keys()), list(counter.values())]
pd.DataFrame(np.array(kk).T, columns=['Letter', 'Count'])
seta = set(a)
b = [a.count(el) for el in seta]
a = list(seta) #Only if you really want it.
Suppose we have a list:
fruits = ['banana', 'banana', 'apple', 'banana']
We can find out how many of each fruit we have in the list like so:
import numpy as np
(unique, counts) = np.unique(fruits, return_counts=True)
{x:y for x,y in zip(unique, counts)}
Result:
{'banana': 3, 'apple': 1}
This answer is more explicit
a = [1,1,1,1,2,2,2,2,3,3,3,4,4]
d = {}
for item in a:
    if item in d:
        d[item] = d.get(item) + 1
    else:
        d[item] = 1
for k, v in d.items():
    print(str(k) + ':' + str(v))
# output
# 1:4
# 2:4
# 3:3
# 4:2

# remove dups
d = set(a)
print(d)
# {1, 2, 3, 4}
For your first question, iterate over the list and use a dictionary to keep track of each element's count, as sketched below.
For your second question, just use set().
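A minimal sketch of that idea (my own illustration, using the list a from the question):
freq = {}
for item in a:
    freq[item] = freq.get(item, 0) + 1  # first question: count each element as it is seen
unique = list(set(a))                   # second question: drop the duplicates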
def frequencyDistribution(data):
    return {i: data.count(i) for i in data}
print frequencyDistribution([1,2,3,4])
...
{1: 1, 2: 1, 3: 1, 4: 1} # originalNumber: count
I am quite late, but this will also work, and will help others:
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
freq_list = []
a_l = list(set(a))
for x in a_l:
    freq_list.append(a.count(x))
print 'Freq',freq_list
print 'number',a_l
will produce this..
Freq [4, 4, 2, 1, 2]
number [1, 2, 3, 4, 5]
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
counts = dict.fromkeys(a, 0)
for el in a: counts[el] += 1
print(counts)
# {1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
# 1. Get counts and store in another list
output = []
for i in set(a):
    output.append(a.count(i))
print(output)
# 2. Remove duplicates using set constructor
a = list(set(a))
print(a)
A set does not allow duplicates, so passing a list to the set() constructor gives an iterable of unique objects. count() returns how many times an object occurs in a list. With that, the unique objects are counted and each count value is stored by appending it to the empty list output.
The list() constructor is then used to convert set(a) back into a list, referenced by the same variable a.
Output
[4, 4, 2, 1, 2]
[1, 2, 3, 4, 5]
Simple solution using a dictionary.
def frequency(l):
    d = {}
    for i in l:
        if i in d.keys():
            d[i] += 1
        else:
            d[i] = 1
    for k, v in d.items():
        if v == max(d.values()):
            return k, d.keys()

print(frequency([10,10,10,10,20,20,20,20,40,40,50,50,30]))
#!/usr/bin/python
def frq(words):
    freq = {}
    for w in words:
        if w in freq:
            freq[w] = freq.get(w) + 1
        else:
            freq[w] = 1
    return freq

fp = open("poem", "r")
text = fp.read()
fp.close()
words = text.split()
print words
d = frq(words)
print "frequency of input\n: "
print d
fp1 = open("output.txt", "w+")
for k, v in d.items():
    fp1.write(str(k) + ':' + str(v) + "\n")
fp1.close()
from collections import OrderedDict
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
def get_count(lists):
    dictionary = OrderedDict()
    for val in lists:
        dictionary.setdefault(val, []).append(1)
    return [sum(val) for val in dictionary.values()]
print(get_count(a))
>>>[4, 4, 2, 1, 2]
To remove duplicates and maintain order:
list(dict.fromkeys(get_count(a)))
>>>[4, 2, 1]
I'm using Counter to generate a frequency dict from the words of a text file in one line of code:
import re
from collections import Counter

def _fileIndex(fh):
    ''' create a dict using Counter of a
    flat list of words (re.findall(re.compile(r"[a-zA-Z]+"), lines)) in (lines in file -> for lines in fh)
    '''
    return Counter(
        [wrd.lower() for wrdList in
            [words for words in
                [re.findall(re.compile(r'[a-zA-Z]+'), lines) for lines in fh]]
         for wrd in wrdList])
For the record, a functional answer:
>>> L = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> import functools
>>> functools.reduce(lambda acc, e: [v+(i==e) for i, v in enumerate(acc,1)] if e<=len(acc) else acc+[0 for _ in range(e-len(acc)-1)]+[1], L, [])
[4, 4, 2, 1, 2]
It's cleaner if you count zeroes too:
>>> functools.reduce(lambda acc, e: [v+(i==e) for i, v in enumerate(acc)] if e<len(acc) else acc+[0 for _ in range(e-len(acc))]+[1], L, [])
[0, 4, 4, 2, 1, 2]
An explanation:
we start with an empty acc list;
if the next element e of L is lower than the size of acc, we just update this element: v+(i==e) means v+1 if the index i of acc is the current element e, otherwise the previous value v;
if the next element e of L is greater than or equal to the size of acc, we have to expand acc to host the new 1.
Unlike with itertools.groupby, the elements do not have to be sorted. You'll get weird results if you have negative numbers.
Another approach, albeit one using a heavier but powerful library: NLTK.
import nltk
fdist = nltk.FreqDist(a)
fdist.values()
fdist.most_common()
Found another way of doing this, using sets.
#ar is the list of elements
#convert ar to set to get unique elements
sock_set = set(ar)
#create dictionary of frequency of socks
sock_dict = {}
for sock in sock_set:
    sock_dict[sock] = ar.count(sock)
For an unordered list you should use:
[a.count(el) for el in set(a)]
The output is
[4, 4, 2, 1, 2]
Yet another solution with another algorithm without using collections:
def countFreq(A):
    n = len(A)
    count = [0] * n                 # create a new list initialized with 0
    for i in range(n):
        count[A[i]] += 1            # increase the occurrence count for value A[i]
    return [x for x in count if x]  # return the non-zero counts
num=[3,2,3,5,5,3,7,6,4,6,7,2]
print ('\nelements are:\t',num)
count_dict={}
for elements in num:
    count_dict[elements] = num.count(elements)
print ('\nfrequency:\t',count_dict)
You can use the built-in function l.count():
d = []
for i in range(len(l)):
    if l[i] not in d:
        d.append(l[i])
        print(l.count(l[i]))
The above code builds a list without duplicates and prints the frequency of each element of the original list.
Two birds for one shot ! X D
This approach can be tried if you don't want to use any library and keep it simple and short!
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
marked = []
b = [(a.count(i), marked.append(i))[0] for i in a if i not in marked]
print(b)
Output:
[4, 4, 2, 1, 2]
