I have a list of sets like the ones below. I want to write a function that returns the elements that appear in exactly one of those sets. The function I wrote kind of works, but I am wondering: is there a better way to handle this problem?
s1 = {1, 2, 3, 4}
s2 = {1, 3, 4}
s3 = {1, 4}
s4 = {3, 4}
s5 = {1, 4, 5}
s = [s1, s2, s3, s4, s5]
from collections import Counter

def unique(s):
    temp = []
    for i in s:
        temp.extend(list(i))
    c = Counter(temp)
    result = set()
    for k, v in c.items():
        if v == 1:
            result.add(k)
    return result
unique(s) # will return {2, 5}
You can use a Counter directly and then take the elements that appear only once:
from collections import Counter
import itertools
c = Counter(itertools.chain.from_iterable(s))
res = {k for k,v in c.items() if v==1}
# {2, 5}
I love the Counter-based solution by @abc. But, just in case, here is a pure set-based one:
result = set()
for _ in s:
    result |= s[0] - set.union(*s[1:])
    s = s[-1:] + s[:-1]  # shift the list of sets
# {2, 5}
This solution is about 6 times faster, but it cannot be written as a one-liner.
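If you want to verify a claim like that yourself, here is a minimal timeit sketch, assuming the list s from the question and wrapping both approaches as functions (the exact ratio will vary with machine and data):

import itertools
import timeit
from collections import Counter

def unique_counter(sets):
    # count every element across all sets, keep those seen exactly once
    c = Counter(itertools.chain.from_iterable(sets))
    return {k for k, v in c.items() if v == 1}

def unique_rotate(sets):
    # rotate the list so each set takes a turn at the front
    result = set()
    for _ in sets:
        result |= sets[0] - set.union(*sets[1:])
        sets = sets[-1:] + sets[:-1]
    return result

s = [{1, 2, 3, 4}, {1, 3, 4}, {1, 4}, {3, 4}, {1, 4, 5}]
print(timeit.timeit(lambda: unique_counter(s), number=100_000))
print(timeit.timeit(lambda: unique_rotate(s), number=100_000))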
set.union(*[i-set.union(*[j for j in s if j!=i]) for i in s])
I think the proposed solution is similar to what @Bobby Ocean suggested, but not as compressed.
The idea is to loop over the complete list of sets "s" and, for each target subset "si", compute the difference with every other subset (avoiding itself).
For example, starting with s1 we compute st = s1-s2-s3-s4-s5, and starting with s5 we have st = s5-s1-s2-s3-s4.
The logic is that, because of the difference, for each target subset "si" we only keep the elements that are unique to "si" compared to the other subsets.
Finally, the result is the union of these unique elements.
result = set()
for si in s:  # target subset
    st = si
    for sj in s:  # the other subsets
        if sj is not si:  # skip the target itself (identity check)
            st = st - sj  # compute differences
    result = result.union(st)
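Run against the list of sets from the question, this agrees with the other answers:

print(result)  # {2, 5}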
I have a list of sets constructed as below. I want to count how many times the set s1 appears in the list. My approach right now is to convert each set to a tuple and count those. Is there another solution?
s1 = {1, 2}
s2 = {1, 3, 4}
s3 = {1, 4}
s = [s1, s2, s1, s3]
from collections import Counter

# This won't work because a set is unhashable:
# c = Counter(s)
s = [tuple(i) for i in s]
c = Counter(s)
print(c)
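One caveat with the tuple trick: tuples compare element by element, so two equal sets only count together if they happen to iterate in the same order. frozenset is hashable and compares by contents, so a sketch along these lines is safer:

from collections import Counter

s1 = {1, 2}
s2 = {1, 3, 4}
s3 = {1, 4}
s = [s1, s2, s1, s3]

c = Counter(frozenset(i) for i in s)
print(c[frozenset(s1)])  # 2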
Okay, so I have to make a function called unique. This is what it should do:
If the input is: s1 = [{1,2,3,4}, {3,4,5}]
unique(s1) should return {1, 2, 5}, because 1, 2 and 5 are NOT in both sets.
And if the input is s2 = [{1,2,3,4}, {3,4,5}, {2,6}],
unique(s2) should return {1, 5, 6}, because each of those numbers appears in only one set of this collection of 3 sets.
I tried to make something like this:
unique_list = []
for x in s1:
    if x not in unique_list:
        unique_list.append(x)
    else:
        unique_list.remove(x)
print(unique_list)
But the problem with this is that it takes a whole set as "x", not each element from each set.
Anyone that can help me a bit with this?
I am not allowed to import anything.
Python set objects have a symmetric_difference() method that finds the elements in either, but not both, of two sets. You can reduce your list with it to collect the elements unique to each set:
from functools import reduce
l = [{1,2,3,4}, {3,4,5}, {2,6}]
reduce(set.symmetric_difference, l)
# {1, 5, 6}
You can, of course, do this without reduce by manually looping over the list; the ^ operator produces the symmetric difference:
l = [{1,2,3,4}, {3,4,5}, {2,6}]
final = set()
for s in l:
final = final ^ s
print(final)
# {1, 5, 6}
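One caveat worth knowing: a chained symmetric difference keeps the elements that appear in an odd number of sets, which happens to coincide with "appears exactly once" for these inputs but not in general:

from functools import reduce

l = [{1}, {1}, {1}]
print(reduce(set.symmetric_difference, l))  # {1}, even though 1 appears three times

The dict-based counting below avoids that pitfall, and since it imports nothing it also satisfies the "not allowed to import anything" constraint.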
In [13]: def f(sets):
...: c = {}
...: for s in sets:
...: for x in s:
...: c[x] = c.setdefault(x, 0) + 1
...: return {x for x, v in c.items() if v == 1}
...:
In [14]: f([{1,2}, {2, 3}, {3, 4}])
Out[14]: {1, 4}
I have multiple sets (the number is unknown) and I would like to find the commonality between them. If two sets match (an 80% overlap), I would like to merge them and then rerun the new set against all the other sets from the beginning.
for example:
A : {1,2,3,4}
B : {5,6,7}
C : {1,2,3,4,5}
D : {2,3,4,5,6,7}
Then A runs: there is no commonality between A and B, so A is run against C, which hits the commonality target. We now have a new set AC = {1,2,3,4,5}. Comparing AC to B doesn't hit the threshold, but D does, so we get a new set ACD, and running again gives a hit with B.
I'm currently using 2 loops, but that only solves the case of comparing 2 sets.
In order to calculate the commonality I'm using the following calculation (this ratio is the Jaccard index):
overlap = a_set & b_set
universe = a_set | b_set
per_overlap = (len(overlap)/len(universe))
I think the solution should be a recursive function, but I'm not sure how to write it since I'm kind of new to Python; or maybe there is a different, simpler way to do this.
I believe this does what you are looking for. The complexity is awful because it starts over each time it gets a match, but no recursion is needed.
def commonality(s1, s2):
    overlap = s1 & s2
    universe = s1 | s2
    return len(overlap) / len(universe)

def set_merge(s, threshold=0.8):
    used_keys = set()
    out = s.copy()
    incomplete = True
    while incomplete:
        incomplete = False
        restart = False
        for k1, s1 in list(out.items()):
            if restart:
                incomplete = True
                break
            if k1 in used_keys:
                continue
            for k2, s2 in s.items():
                if k1 == k2 or k2 in used_keys:
                    continue
                if commonality(s1, s2) >= threshold:
                    out.setdefault(k1 + k2, s1 | s2)  # merged set keeps both keys, e.g. 'AC'
                    out.pop(k1)
                    if k2 in out:
                        out.pop(k2)
                    used_keys.add(k1)
                    used_keys.add(k2)
                    restart = True
                    break
    out.update({k: v for k, v in s.items() if k not in used_keys})
    return out
For your particular example, it only merges A and C, as any other combination is below the threshold.
set_dict = {
'A' : {1,2,3,4},
'B' : {5,6,7},
'C' : {1,2,3,4,5},
'D' : {2,3,4,5,6,7},
}
set_merge(set_dict)
# returns:
{'B': {5, 6, 7},
'D': {2, 3, 4, 5, 6, 7},
'AC': {1, 2, 3, 4, 5}}
I have 2 lists:
1. ['a', 'b', 'c']
2. ['a', 'd', 'a', 'b']
And I want dictionary output like this:
{'a': 2, 'b': 1, 'c': 0}
I already made this:
# b = list #1
# words = list #2
c = {}
for i in b:
    c.update({i: words.count(i)})
But it is very slow; I need to process something like a 10 MB txt file.
EDIT: Here is the entire code. I'm currently testing, so there are unused imports.
import string
import os
import operator
import time
from collections import Counter

def getbookwords():
    a = open("wu.txt", encoding="utf-8")
    b = a.read().replace("\n", " ").lower()  # newline -> space so line-final words stay separate
    a.close()
    # str.translate needs a table and returns a new string; this one strips punctuation
    b = b.translate(str.maketrans("", "", string.punctuation))
    b = b.split(" ")
    return b

def wordlist(words):
    a = open("wordlist.txt")
    b = a.read().lower()
    b = b.split("\n")
    a.close()
    t = time.time()
    # c = dict((i, words.count(i)) for i in b)
    c = Counter(words)
    result = {k: v for k, v in c.items() if k in set(b)}
    print(time.time() - t)
    sorted_d = sorted(c.items(), key=operator.itemgetter(1))
    return sorted_d

print(wordlist(getbookwords()))
Since speed is currently an issue, it might be worth avoiding a full pass through the list for each thing you want to count. The set() function lets you work with only the unique keys in your list of words.
An important detail for speed in all of these is the line unique_words = set(b). Without it, an entire pass through your list is made to build a set from b at every iteration of the filtering step, whichever data structure you use.
c = {k: 0 for k in set(words)}
for w in words:
    c[w] += 1
unique_words = set(b)
c = {k: c[k] for k in c if k in unique_words}
Alternatively, a defaultdict can be used to eliminate some of the initialization.
from collections import defaultdict

c = defaultdict(int)
for w in words:
    c[w] += 1
unique_words = set(b)
c = {k: c[k] for k in c if k in unique_words}
For completeness' sake, I do like the Counter-based solutions in the other answers (like the one from Reut Sharabani). The code is cleaner, and though I haven't benchmarked it, I wouldn't be surprised if a built-in counting class beats home-rolled dictionary solutions.
from collections import Counter
c = Counter(words)
unique_words = set(b)
c = {k:v for k, v in c.items() if k in unique_words}
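A small caveat, shown with the question's lists standing in for words and b: filtering the Counter's items drops zero-count keys such as 'c', so if you need the {'a': 2, 'b': 1, 'c': 0} shape from the question, index the Counter by key instead (a Counter returns 0 for missing keys):

from collections import Counter

b = ['a', 'b', 'c']
words = ['a', 'd', 'a', 'b']
c = Counter(words)
print({k: v for k, v in c.items() if k in set(b)})  # {'a': 2, 'b': 1}
print({k: c[k] for k in b})                         # {'a': 2, 'b': 1, 'c': 0}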
Try using collections.Counter and move b to a set, not a list:
from collections import Counter
c = Counter(words)
b = set(b)
result = {k: v for k, v in c.items() if k in b}
Also, if you can read the words lazily rather than creating an intermediate list, that should be faster.
Counter provides the functionality you want (counting items), and filtering the result against a set uses hashing which should be a lot faster.
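Here is a minimal sketch of that lazy reading, assuming whitespace-separated words and the wu.txt filename from the question; the generator yields words one at a time, so the whole file never sits in memory as a list:

from collections import Counter

def words_from(path):
    # stream words one at a time instead of building a list
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield from line.lower().split()

c = Counter(words_from("wu.txt"))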
You can use collections.Counter on a generator that skips ignored keys using a set lookup.
from collections import Counter
keys = ['a', 'b', 'c']
lst = ['a', 'd', 'a', 'b']
unique_keys = set(keys)
count = Counter(x for x in lst if x in unique_keys)
print(count) # Counter({'a': 2, 'b': 1})
# count['c'] == 0
Note that count['c'] is not printed, but is still 0 by default in a Counter.
Here's an example I just coughed up in a REPL, assuming you're not counting duplicates in list two. We create a hash table using a dictionary: for each item in the list we're matching against, we create a key-value pair with the item as the key and the value set to 0.
Next we iterate through the second list; for each value, we check whether the key has already been defined. If it has, we increment its value; otherwise we ignore it.
That's the least amount of iteration possible: you hit each item in each list only once.
x = [1, 2, 3, 4, 5]
z = [1, 2, 2, 2, 1]
y = {}
for n in x:
    y[n] = 0  # set the value to zero for each item in the list
for n in z:
    if n in y:  # if the key is already in the hash, increment it by one
        y[n] += 1
print(y)  # {1: 2, 2: 3, 3: 0, 4: 0, 5: 0}
@Makalone, the answers above are appreciable. You can also try the code sample below, which uses Counter() from Python's collections module.
You can try it at http://rextester.com/OTYG56015.
Python code »
from collections import Counter
list1 = ['a', 'b', 'c']
list2 = ['a', 'd', 'a', 'b']
counter = Counter(list2)
d = {key: counter[key] for key in set(list1)}
print(d)
Output »
{'a': 2, 'c': 0, 'b': 1}
I have never used Python before. Now I have a dictionary like:
d1 = {1:2,3:3,2:2,4:2,5:2}
pair[0] in each pair is the point and pair[1] is the cluster id. So d1 means that point 1 belongs to cluster 2, point 3 belongs to cluster 3, point 2 belongs to cluster 2, point 4 belongs to cluster 2, and point 5 belongs to cluster 2. No point belongs to cluster 1.
How can I use filter (without a loop) to get a dictionary like the following:
d2 = {1:[],2:[1,2,4,5],3:[3]}
It means that no point belongs to cluster 1; points 1, 2, 4 and 5 belong to cluster 2; and point 3 belongs to cluster 3.
I tried:
d2 = dict(filter(lambda a,b: a,b if a[1] == b[1] , d1.items()))
I would use a collections.defaultdict
from collections import defaultdict
d2 = defaultdict(list)
for point, cluster in d1.items():
    d2[cluster].append(point)
Your defaultdict won't have a cluster 1 in it, but if you know which clusters you expect, all will be fine with the world (the empty list is created in that slot when you look it up; this is the "default" part of defaultdict):
expected_clusters = [1, 2, 3]
for cluster in expected_clusters:
    print(d2[cluster])
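With the d1 from the question, the loop prints [] for cluster 1 and the grouped points for the other clusters; after those lookups the defaultdict holds exactly the d2 the question asks for:

print(dict(d2))  # {2: [1, 2, 4, 5], 3: [3], 1: []}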
FWIW, doing this problem with the builtin filter is just insanity. However, if you must, something like the following works:
d2 = {}
filter(lambda (pt, cl): d2.setdefault(cl, []).append(pt), d1.items())
Note that I'm using Python 2.x's unpacking of arguments in the lambda. For Python 3.x, you'd need something like lambda item: d2.setdefault(item[1], []).append(item[0]). Or maybe we could do something like this, which is a bit nicer:
d2 = {}
filter(lambda pt: d2.setdefault(d1[pt], []).append(pt), d1)
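Note that Python 3's filter is lazy, so the side effects above never run unless the filter object is consumed; a sketch that forces it with list():

d1 = {1: 2, 3: 3, 2: 2, 4: 2, 5: 2}
d2 = {}
list(filter(lambda pt: d2.setdefault(d1[pt], []).append(pt), d1))
print(d2)  # {2: [1, 2, 4, 5], 3: [3]}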
We can do a tiny bit better with the reduce builtin (at least the reduce isn't simply a vehicle to create an implicit loop, and it actually returns the dict we want). In Python 3, reduce must first be imported from functools:
>>> d1 = {1:2,3:3,2:2,4:2,5:2}
>>> reduce(lambda d, k: d.setdefault(d1[k], []).append(k) or d, d1, {})
{2: [1, 2, 4, 5], 3: [3]}
But this is still really ugly Python.
>>> d1 = {1:2,3:3,2:2,4:2,5:2}
>>> dict(map(lambda c : (c, [k for k, v in d1.items() if v == c]), d1.values()))
{2: [1, 2, 4, 5], 3: [3]}
The lambda builds the list of points for a given cluster value, and map applies it across d1.values() (the cluster ids) to produce the (cluster, points) pairs that dict() consumes.