I have a list of sets like the one below. I want to write a function that returns the elements that appear in only one of those sets. The function I wrote kind of works, but is there a better way to handle this problem?
s1 = {1, 2, 3, 4}
s2 = {1, 3, 4}
s3 = {1, 4}
s4 = {3, 4}
s5 = {1, 4, 5}
s = [s1, s2, s3, s4, s5]
from collections import Counter
def unique(s):
    temp = []
    for i in s:
        temp.extend(list(i))
    c = Counter(temp)
    result = set()
    for k, v in c.items():
        if v == 1:
            result.add(k)
    return result
unique(s) # will return {2, 5}
You can use a Counter directly and then keep the elements that appear only once.
from collections import Counter
import itertools
c = Counter(itertools.chain.from_iterable(s))
res = {k for k,v in c.items() if v==1}
# {2, 5}
I love the Counter-based solution by @abc. But, just in case, here is a pure set-based one:
result = set()
for _ in s:
    result |= s[0] - set.union(*s[1:])
    s = s[-1:] + s[:-1] # shift the list of sets
# {2, 5}
This solution is about 6 times faster but cannot be written as a one-liner.
set.union(*[i-set.union(*[j for j in s if j!=i]) for i in s])
I think the proposed solution is similar to what @Bobby Ocean suggested, but not as compressed.
The idea is to loop over the complete list of sets "s" and, for each target subset "si", subtract every other subset (skipping "si" itself).
For example, starting with s1 we compute st = s1-s2-s3-s4-s5, and starting with s5 we have st = s5-s1-s2-s3-s4.
The logic is that, after these differences, each target subset "si" keeps only the elements that are unique to "si" (compared to the other subsets).
Finally, the result is the union of these unique elements.
result = set()
for si in s:  # target subset
    st = si
    for sj in s:  # the other subsets
        if sj is not si:  # skip the target itself (by identity, so an equal duplicate set is still subtracted)
            st = st - sj  # compute differences
    result = result.union(st)
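As a quick sanity check, the Counter-based and set-difference approaches can be wrapped in functions and compared on the example data. This is only a minimal sketch: the function names `unique_counter` and `unique_setdiff` are made up for illustration and do not come from the answers.

```python
from collections import Counter
from itertools import chain

def unique_counter(sets):
    """Elements that appear in exactly one of the sets (Counter-based)."""
    counts = Counter(chain.from_iterable(sets))
    return {item for item, n in counts.items() if n == 1}

def unique_setdiff(sets):
    """Same result using only set operations."""
    result = set()
    for i, target in enumerate(sets):
        # Union of every set except the one at position i.
        others = set().union(*(s for j, s in enumerate(sets) if j != i))
        result |= target - others
    return result

s = [{1, 2, 3, 4}, {1, 3, 4}, {1, 4}, {3, 4}, {1, 4, 5}]
print(unique_counter(s))   # {2, 5}
print(unique_setdiff(s))   # {2, 5}
```

Skipping the target by position rather than by value means two identical sets in the list are still subtracted from each other.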
Okay, so I have to make a function called unique. This is what it should do:
If the input is: s1 = [{1,2,3,4}, {3,4,5}]
unique(s1) should return: {1,2,5} because the 1, 2 and 5 are NOT in both lists.
And if the input is s2 = [{1,2,3,4}, {3,4,5}, {2,6}]
unique(s2) should return: {1,5,6} because those numbers are unique and are in only one list of this collection of 3 lists.
I tried to make something like this:
unique_list = []
for x in s1:
    if x not in unique_list:
        unique_list.append(x)
    else:
        unique_list.remove(x)
print(unique_list)
But the problem with this is that it takes a whole list as "x" and not each element from each list.
Can anyone help me a bit with this?
I am not allowed to import anything.
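Python set objects have a symmetric_difference() method that finds the elements in either set, but not in both. You can reduce your list with it to find the elements unique to one set. Note that chaining symmetric differences keeps every element that appears in an odd number of sets, so an element present in three of the sets would also be returned: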
from functools import reduce
l = [{1,2,3,4}, {3,4,5}, {2,6}]
reduce(set.symmetric_difference, l)
# {1, 5, 6}
You can, of course, do this without reduce by manually looping over the list. The ^ operator produces the symmetric difference:
l = [{1,2,3,4}, {3,4,5}, {2,6}]
final = set()
for s in l:
    final = final ^ s
print(final)
# {1, 5, 6}
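One thing worth checking before relying on this: chaining symmetric differences keeps the elements that occur in an odd number of sets, which matches "appears in exactly one set" only when no element occurs in three (or five, and so on) sets. A small sketch with made-up data, compared against a plain counting dictionary:

```python
from functools import reduce

sets = [{1, 2}, {1, 3}, {1, 4}]   # 1 appears in all three sets

# XOR-style reduction: 1 survives because it occurs an odd number of times.
odd_occurrences = reduce(set.symmetric_difference, sets)

# Counting version: keep only elements seen exactly once.
counts = {}
for s in sets:
    for x in s:
        counts[x] = counts.get(x, 0) + 1
exactly_once = {x for x, n in counts.items() if n == 1}

print(odd_occurrences)  # {1, 2, 3, 4}
print(exactly_once)     # {2, 3, 4}
```

For the inputs in the question no element occurs in three sets, so both versions agree there.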
In [13]: def f(sets):
    ...:     c = {}
    ...:     for s in sets:
    ...:         for x in s:
    ...:             c[x] = c.setdefault(x, 0) + 1
    ...:     return {x for x, v in c.items() if v == 1}
    ...:
In [14]: f([{1,2}, {2, 3}, {3, 4}])
Out[14]: {1, 4}
I have multiple sets (the number is unknown) and I would like to find the commonality between them. If two sets match (80% match), I would like to merge them and then rerun the new set against all the other sets from the beginning.
For example:
A : {1,2,3,4}
B : {5,6,7}
C : {1,2,3,4,5}
D : {2,3,4,5,6,7}
A is compared with B first, and there is no commonality. A is then compared with C, which hits the commonality target, so we now have a new set AC = {1,2,3,4,5}. Next we compare AC to B, which doesn't hit the threshold, but D does, so we have a new set ACD. We then run again and now get a hit with B.
I'm currently using 2 loops, but this only works when I compare 2 sets.
To calculate the commonality I'm using the following calculation:
overlap = a_set & b_set
universe = a_set | b_set
per_overlap = (len(overlap)/len(universe))
I think the solution should be a recursive function, but I'm not sure how to write it. I'm fairly new to Python, so maybe there is a different and simpler way to do this.
I believe this does what you are looking for. The complexity is awful because it starts over each time it gets a match. No recursion is needed.
def commonality(s1, s2):
    overlap = s1 & s2
    universe = s1 | s2
    return (len(overlap)/len(universe))
def set_merge(s, threshold=0.8):
    used_keys = set()
    out = s.copy()
    incomplete = True
    while incomplete:
        incomplete = False
        restart = False
        for k1, s1 in list(out.items()):
            if restart:
                incomplete = True
                break
            if k1 in used_keys:
                continue
            for k2, s2 in s.items():
                if k1==k2 or k2 in used_keys:
                    continue
                print(k1, k2)
                if commonality(s1, s2) >= threshold:
                    out.setdefault(k1+k2, s1 | s2)
                    out.pop(k1)
                    if k2 in out:
                        out.pop(k2)
                    used_keys.add(k1)
                    used_keys.add(k2)
                    restart = True
                    break
    out.update({k:v for k,v in s.items() if k not in used_keys})
    return out
For your particular example, it only merges A and C, as any other combination is below the threshold.
set_dict = {
'A' : {1,2,3,4},
'B' : {5,6,7},
'C' : {1,2,3,4,5},
'D' : {2,3,4,5,6,7},
}
set_merge(set_dict)
# returns:
{'B': {5, 6, 7},
'D': {2, 3, 4, 5, 6, 7},
'AC': {1, 2, 3, 4, 5}}
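To see why only A and C merge, it may help to print the commonality score for every pair in the example. The score used here (intersection size divided by union size) is the Jaccard index. The following is a small sketch; the `scores` dictionary is introduced only for illustration.

```python
from itertools import combinations

def commonality(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)

set_dict = {
    'A': {1, 2, 3, 4},
    'B': {5, 6, 7},
    'C': {1, 2, 3, 4, 5},
    'D': {2, 3, 4, 5, 6, 7},
}

# Score every unordered pair of keys.
scores = {
    k1 + k2: commonality(set_dict[k1], set_dict[k2])
    for k1, k2 in combinations(set_dict, 2)
}
for pair, score in scores.items():
    print(pair, round(score, 2))
# AB 0.0
# AC 0.8
# AD 0.43
# BC 0.14
# BD 0.5
# CD 0.57
```

Only AC reaches the 0.8 threshold. The merged set AC equals C, so its score against D is also 4/7 (about 0.57), which is why no further merge happens.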
I need to swap two random values in a dictionary.
import random
def alphabetcreator():
    letters = random.sample(range(97,123), 26)
    newalpha = []
    engalpha =['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    alphasmerged = {}
    for i in letters:
        newalpha.append(chr(i))
    alphasmerged = dict(zip(engalpha, newalpha))
    return alphasmerged
This code gives me my two different alphabets, putting them into a dictionary so I can translate between one and the other. I now need to randomly swap two of the values whilst keeping all the rest the same. How can I do this?
You can first use random.sample to randomly pick two different values from a collection.
From the doc:
Return a k length list of unique elements chosen from the population sequence or set. Used for random sampling without replacement.
Use this function on the keys of your dictionary to have two distinct keys.
In Python 3, pass a sequence such as list(d). Older Python 3 releases also accepted the dict_keys object directly, but since Python 3.11 random.sample requires a sequence.
In Python 2, you can either convert d.keys() into a list, or pass the dictionary itself to sample.
>>> import random
>>> d = {'a': 1, 'b': 2}
>>> k1, k2 = random.sample(list(d), 2) # Python 3
>>> k1, k2 = random.sample(d, 2) # Python 2
>>> k1, k2
('a', 'b')
Then you can swap the two values in place.
>>> d[k1], d[k2] = d[k2], d[k1]
>>> d
{'b': 1, 'a': 2}
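Put together, the two steps could look like the sketch below. The function name `swap_two_random_values` and its `rng` parameter are hypothetical, and returning a copy instead of mutating the input is a design choice made here, not something the answer prescribes.

```python
import random

def swap_two_random_values(d, rng=random):
    """Return a copy of d with the values of two randomly chosen keys swapped."""
    if len(d) < 2:
        raise ValueError("need at least two keys to swap")
    # list(d) gives the keys as a sequence, which random.sample accepts
    # on every Python 3 version.
    k1, k2 = rng.sample(list(d), 2)
    swapped = dict(d)
    swapped[k1], swapped[k2] = swapped[k2], swapped[k1]
    return swapped

cipher = {'a': 'q', 'b': 'w', 'c': 'e', 'd': 'r'}
result = swap_two_random_values(cipher)
# Keys whose value differs from the original mapping.
changed = [k for k in cipher if cipher[k] != result[k]]
print(changed)  # two keys, chosen at random
```

Because all values in `cipher` are distinct, exactly two keys end up with a different value; with duplicate values a swap could leave the mapping unchanged.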
d = {12: 34, 67: 89}
k, v = random.choice(list(d.items()))
d[v] = k
d.pop(k)
which, when run, gave this (random) value of d:
{12: 34, 89: 67}
You can try this:
import random
engalpha =['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
new_dict = {a:b for a, b in zip(engalpha, map(chr, random.sample(range(97,123), 26)))}
key_val = random.choice(list(new_dict.keys()))
final_dict = {b if a == key_val else a:a if a == key_val else b for a, b in new_dict.items()}
Regarding your recent comment:
import random
s = {'a': 'h', 'b': 'd', 'c': 'y'}
(k1, v1), (k2, v2) = random.sample(list(s.items()), 2)  # pick two distinct items
final_dict = {**s, k1: v2, k2: v1}  # copy of s with the two values exchanged
Output (randomly generated):
{'a': 'y', 'c': 'h', 'b': 'd'}
I am trying to create a dictionary that maps each character to the number of times it occurs in a string. Suppose the string is like the one below:
str1 = "aabbaba"
I want to create a dictionary like this
word_count = {'a':4,'b':3}
I am trying to use dictionary comprehension to do this.
I did
dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
This ends up giving an error saying
File "<stdin>", line 1
dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
^
SyntaxError: invalid syntax
Can anybody tell me what's wrong with the syntax? Also, how can I create such a dictionary using a dictionary comprehension?
As others have said, this is best done with a Counter.
You can also do:
>>> {e:str1.count(e) for e in set(str1)}
{'a': 4, 'b': 3}
But that traverses the string 1+n times, where n is the number of unique characters: once to create the set, and once per unique character to count its occurrences. In the worst case, where most characters are distinct, this is quadratic in the length of the string, which is a bad result for a long string with many unique characters. A Counter only traverses the string once.
If you want a no-import version that is more efficient than using .count, you can use .setdefault to make a counter:
>>> count={}
>>> for c in str1:
...     count[c]=count.setdefault(c, 0)+1
...
>>> count
{'a': 4, 'b': 3}
That only traverses the string once no matter how long or how many unique characters.
You can also use defaultdict if you prefer:
>>> from collections import defaultdict
>>> count=defaultdict(int)
>>> for c in str1:
...     count[c]+=1
...
>>> count
defaultdict(<type 'int'>, {'a': 4, 'b': 3})
>>> dict(count)
{'a': 4, 'b': 3}
But if you are going to import collections -- Use a Counter!
The ideal way to do this is with collections.Counter:
>>> from collections import Counter
>>> str1 = "aabbaba"
>>> Counter(str1)
Counter({'a': 4, 'b': 3})
You cannot achieve this with a simple dict comprehension, because you would need to refer to the previous count of each element. As mentioned in Dawg's answer, as a workaround you can use str1.count(e) inside the dict comprehension to count each element of set(str1). But the time complexity will be n*m, since it traverses the complete string for each unique element (where n is the length of the string and m is the number of unique elements), whereas with Counter it will be n.
This is a nice case for collections.Counter:
>>> from collections import Counter
>>> Counter(str1)
Counter({'a': 4, 'b': 3})
It's a dict subclass, so you can work with the object much like a standard dictionary:
>>> c = Counter(str1)
>>> c['a']
4
You can do this without the Counter class as well. A simple and efficient way in Python would be:
>>> d = {}
>>> for x in str1:
...     d[x] = d.get(x, 0) + 1
...
>>> d
{'a': 4, 'b': 3}
Note that this is not the correct way to count characters: the comprehension only sees the dictionary that dic referred to before the comprehension ran, so repeated characters are not counted more than once (and any other keys from the original dict are lost). It does, however, answer the original question of whether if-else is possible in comprehensions and demonstrates how it can be done.
To answer your question, yes it's possible but the approach is like this:
dic = {x: (dic[x] + 1 if x in dic else 1) for x in str1}
The condition is applied to the value only, not to the whole key:value pair.
The above can be made clearer using dict.get:
dic = {x: dic.get(x, 0) + 1 for x in str1}
0 is returned if x is not in dic.
Demo:
In [78]: s = "abcde"
In [79]: dic = {}
In [80]: dic = {x: (dic[x] + 1 if x in dic else 1) for x in s}
In [81]: dic
Out[81]: {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1}
In [82]: s = "abfg"
In [83]: dic = {x: dic.get(x, 0) + 1 for x in s}
In [84]: dic
Out[84]: {'a': 2, 'b': 2, 'f': 1, 'g': 1}
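The demo behaves this way because the name `dic` on the right-hand side is looked up while the comprehension runs, and at that point it still refers to the old dictionary (or to nothing at all); the new dictionary is only bound to `dic` once the comprehension has finished. A short sketch of both cases, with a plain loop for comparison (the names `fresh` and `counts` are illustrative):

```python
s = "aabbaba"

# Case 1: dic is empty beforehand, so every lookup falls back to 0.
dic = {}
dic = {x: dic.get(x, 0) + 1 for x in s}
print(dic)  # {'a': 1, 'b': 1}

# Case 2: the name does not exist yet, so the lookup fails outright.
try:
    fresh = {x: fresh.get(x, 0) + 1 for x in s}
except NameError as exc:
    print("NameError:", exc)

# A loop updates the dictionary as it goes, so counts accumulate.
counts = {}
for x in s:
    counts[x] = counts.get(x, 0) + 1
print(counts)  # {'a': 4, 'b': 3}
```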