Python: sets and set.intersection - random order of output?

I have a question about the un-ordered nature of sets.
This code:
# Set1 is 'a' to 'e' in alpha order
set1 = {'a', 'b', 'c', 'd', 'e'}
print('\nSet1 :', set1)
# Set2 is 'f' to 'a' (missing 'e') in reverse-alpha order
set2 = {'f', 'd', 'c', 'b', 'a'}
print('Set2 :', set2)
print('Common to both sets:', set1.intersection(set2))
...gives what looks like a random ordering of the elements in set1, set2, and the result of set.intersection:
Set1 : {'a', 'c', 'b', 'e', 'd'}
Set2 : {'a', 'c', 'b', 'd', 'f'}
Common to both sets: {'a', 'c', 'b', 'd'}
Although not a problem per se, my question is this: is there a set algorithm for this? Or could I (feasibly) use this property to generate random lists of items present in two lists (i.e. is it truly random?). BTW, I have no idea why I might want to do this - thinking out loud.

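The order in which sets are printed is based on, among other things, a hash of their contents; it is not random. (For strings, Python 3.3+ randomizes hashes per interpreter process unless PYTHONHASHSEED is set, so the order can change from one run to the next, but that still does not make it a usable source of randomness.) If you need a set to be ordered, you can always use the built-in sorted() function, which returns a list:
>>> sorted(set1.intersection(set2))
['a', 'b', 'c', 'd']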
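A short sketch of the practical takeaway, assuming "random list of common items" means either a shuffled ordering or a random subset: sort when you want a reproducible order, and use the random module when you actually want randomness. The elements are sorted into a list first because random.sample no longer accepts sets directly (Python 3.11+), and sorting keeps the result independent of the set's internal order.

```python
import random

set1 = {'a', 'b', 'c', 'd', 'e'}
set2 = {'f', 'd', 'c', 'b', 'a'}

common = set1 & set2            # same result as set1.intersection(set2)

# Deterministic, reproducible order: sort the elements.
ordered = sorted(common)
print(ordered)                  # ['a', 'b', 'c', 'd']

# Genuinely random order: shuffle a list built from the set.
shuffled = sorted(common)
random.shuffle(shuffled)
print(shuffled)

# Or draw k random common items without replacement.
pick = random.sample(sorted(common), k=2)
print(pick)
```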


Why can't we add lists or dictionaries or tuples inside a set?

How do I add a list of values to an existing set?
Adding the contents of a list
Use set.update() or the |= operator:
>>> a = set('abc')
>>> a
{'a', 'b', 'c'}
>>> xs = ['d', 'e']
>>> a.update(xs)
>>> a
{'e', 'b', 'c', 'd', 'a'}
>>> xs = ['f', 'g']
>>> a |= set(xs)
>>> a
{'e', 'b', 'f', 'c', 'd', 'g', 'a'}
Adding the list itself
It is not possible to directly add the list itself to the set, since set elements must be hashable.
Instead, one may convert the list to a tuple first:
>>> a = {('a', 'b', 'c')}
>>> xs = ['d', 'e']
>>> a.add(tuple(xs))
>>> a
{('a', 'b', 'c'), ('d', 'e')}
You can't add a list to a set because lists are mutable: if a list's contents changed after it was added, its hash (and therefore its place in the set) would no longer be valid, so lists are deliberately unhashable.
You can, however, add tuples to the set, because you cannot change the contents of a tuple:
>>> a = set('abcde')
>>> a.add(('f', 'g'))
>>> print(a)
{'a', 'c', 'b', 'e', 'd', ('f', 'g')}
Edit: some explanation: The documentation defines a set as an unordered collection of distinct hashable objects. The objects have to be hashable so that finding, adding and removing elements can be done faster than looking at each individual element every time you perform these operations. The specific algorithms used are explained in the Wikipedia article. Python's hashing algorithms are explained on effbot.org, and Python's __hash__ function in the Python reference.
Some facts:
Set elements as well as dictionary keys have to be hashable
Some unhashable datatypes:
list: use tuple instead
set: use frozenset instead
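dict: has no official counterpart, but there are some recipes
Instances of user-defined classes are hashable by default, with a hash derived from the object's identity, so every instance is a distinct element. You can override this behavior by defining __hash__ (together with __eq__), as explained in the Python reference.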
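The facts above can be checked directly. A minimal sketch; the Point class is a hypothetical example of a plain user-defined class:

```python
# Lists are unhashable, so they can't be set elements (or dict keys).
s = set()
try:
    s.add(['d', 'e'])
except TypeError as exc:
    error_message = str(exc)
print(error_message)            # unhashable type: 'list'

# The immutable counterparts are hashable and can be stored.
s.add(('d', 'e'))               # tuple instead of list
s.add(frozenset({'x', 'y'}))    # frozenset instead of set
print(len(s))                   # 2

# Instances of a plain class hash by identity, so two
# "equal-looking" objects are still distinct set members.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

pts = {Point(1, 2), Point(1, 2)}
print(len(pts))                 # 2
```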
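To add the elements of a list to a set, use update.
From https://docs.python.org/2/library/sets.html:
s.update(t): return set s with elements added from t
(update modifies s in place and itself returns None; the built-in set type in Python 3 has the same method.)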
E.g.
>>> s = set([1, 2])
>>> l = [3, 4]
>>> s.update(l)
>>> s
{1, 2, 3, 4}
If you instead want to add the entire list as a single element to the set, you can't because lists aren't hashable. You could instead add a tuple, e.g. s.add(tuple(l)). See also TypeError: unhashable type: 'list' when using built-in set function for more information on that.
Hopefully this helps:
>>> seta = set('1234')
>>> listb = ['a','b','c']
>>> seta.union(listb)
set(['a', 'c', 'b', '1', '3', '2', '4'])
>>> seta
set(['1', '3', '2', '4'])
>>> seta = seta.union(listb)
>>> seta
set(['a', 'c', 'b', '1', '3', '2', '4'])
Note also the method set.update(). The documentation says:
Update a set with the union of itself and others.
List objects are unhashable. You might want to turn them into tuples, though.
Sets can't have mutable (changeable) elements/members. A list, being mutable, cannot be a member of a set.
As sets are mutable, you cannot have a set of sets!
You can have a set of frozensets though.
(The same kind of "mutability requirement" applies to the keys of a dict.)
Other answers have already given you code, I hope this gives a bit of insight.
I'm hoping Alex Martelli will answer with even more details.
I found I needed to do something similar today. The algorithm knew when it was creating a new list that needed to be added to the set, but not when it would have finished operating on the list.
Anyway, the behaviour I wanted was for the set to use id rather than hash. So instead of myset.add(mylist), I used mydict[id(mylist)] = mylist, which gave me the behaviour I wanted.
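A sketch of that identity-keyed approach. The helper names track and is_tracked are hypothetical; the key point is that the dict value holds a reference to the list, so its id() cannot be reused by another object while the entry exists:

```python
tracked = {}

def track(lst):
    # Key by identity; storing lst as the value keeps it alive,
    # so id(lst) stays unique for as long as it is in the dict.
    tracked[id(lst)] = lst

def is_tracked(lst):
    return id(lst) in tracked

first = ['x']
second = ['x']              # equal contents, but a different object
track(first)
first.append('y')           # mutating after insertion is fine: the key is identity

print(is_tracked(first))    # True
print(is_tracked(second))   # False
print(list(tracked.values()))  # [['x', 'y']]
```

Unlike a set, this treats two lists with equal contents as different entries, which is exactly the trade-off the answer describes.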
You want to add a tuple, not a list:
>>> a=set('abcde')
>>> a
set(['a', 'c', 'b', 'e', 'd'])
>>> l=['f','g']
>>> l
['f', 'g']
>>> t = tuple(l)
>>> t
('f', 'g')
>>> a.add(t)
>>> a
set(['a', 'c', 'b', 'e', 'd', ('f', 'g')])
If you have a list, you can convert it to a tuple, as shown above. A tuple is immutable, so it can be added to the set.
You'll want to use tuples, which are hashable (you can't hash a mutable object like a list).
>>> a = set("abcde")
>>> a
set(['a', 'c', 'b', 'e', 'd'])
>>> t = ('f', 'g')
>>> a.add(t)
>>> a
set(['a', 'c', 'b', 'e', 'd', ('f', 'g')])
Here is how I usually do it:
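def add_list_to_set(my_list, my_set):
    # A plain loop instead of a list comprehension used only for side effects;
    # my_set.update(my_list) does the same thing in one call.
    for each in my_list:
        my_set.add(each)
    return my_set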
Try using * unpack, like below:
>>> a=set('abcde')
>>> a
{'a', 'd', 'e', 'b', 'c'}
>>> l=['f','g']
>>> l
['f', 'g']
>>> {*l, *a}
{'a', 'd', 'e', 'f', 'b', 'g', 'c'}
>>>
As a script:
a=set('abcde')
l=['f', 'g']
print({*l, *a})
Output:
{'a', 'd', 'e', 'f', 'b', 'g', 'c'}
Union is the easiest way:
list0 = ['a', 'b', 'c']
set0 = set()
set0.add('d')
set0.add('e')
set0.add('f')
set0 = set0.union(list0)
print(set0)
Output:
{'b', 'd', 'f', 'c', 'a', 'e'}
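To summarize the options above, a sketch contrasting the in-place methods (update, |=) with the ones that build a new set (union, unpacking). Note that update and union accept any iterable, while |= needs a set on the right-hand side:

```python
base = {'a', 'b'}
extra = ['c', 'd']

# In place: update() accepts any iterable and returns None.
s1 = set(base)
result = s1.update(extra)
print(result)               # None
print(sorted(s1))           # ['a', 'b', 'c', 'd']

# In place: |= requires a set (s2 |= extra would raise TypeError).
s2 = set(base)
s2 |= set(extra)

# New set: union() accepts any iterable; base is left unchanged.
s3 = base.union(extra)
print(sorted(base))         # ['a', 'b']

# New set via unpacking (Python 3.5+).
s4 = {*base, *extra}

print(s1 == s2 == s3 == s4) # True
```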

Index of a list item that occurs multiple times

I have the following code
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
for x in items:
    print(x, end='')
    print(items.index(x), end='')
# outputs: a0a0b2a0c4c4d6
I understand that python finds the first item in the list to index, but is it possible for me to get an output of a0a1b2a3c4c5d6 instead?
It would be optimal for me to keep using the for loop because I will be editing the list.
edit: I made a typo with the c indexes
And in case you really feel like doing it in one line:
EDIT - using .format or format-strings makes this shorter / more legible, as noted in the comments
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join("{}{}".format(e,i) for i,e in enumerate(items)))
For Python 3.6+ (f-strings) you can do
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join(f"{e}{i}" for i, e in enumerate(items)))
ORIGINAL
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join((str(e) for item_with_index in enumerate(items) for e in item_with_index[::-1])))
Note that the reversal is needed (item_with_index[::-1]) because you want the items printed before the index but enumerate gives tuples with the index first.
I think you're looking for a0a1b2a3c4c5d6 instead.
for i, x in enumerate(items):
    print("{}{}".format(x, i), end='')
Don't add or remove items from your list as you are traversing it. If you want the output specified, you can use enumerate to get the items and the indices of the list.
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
for idx, x in enumerate(items):
    print("{}{}".format(x, idx), end='')
# outputs a0a1b2a3c4c5d6
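Since the question mentions editing the list, here is a sketch of how to follow the advice not to add or remove items while traversing: either iterate over a snapshot copy, or build a new list. Removing 'c' entries is a hypothetical stand-in for whatever edit is actually needed:

```python
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']

# enumerate gives each element its own position, unlike items.index(x).
labels = "".join(f"{x}{i}" for i, x in enumerate(items))
print(labels)                   # a0a1b2a3c4c5d6

# Option 1: loop over a snapshot (edited[:]) while mutating the real list,
# so removals don't shift positions the loop is still visiting.
edited = list(items)
for x in edited[:]:
    if x == 'c':
        edited.remove(x)
print(edited)                   # ['a', 'a', 'b', 'a', 'd']

# Option 2 (often clearer): build a new list instead of mutating.
kept = [x for x in items if x != 'c']
print(kept)                     # ['a', 'a', 'b', 'a', 'd']
```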

How to sort a list alphabetically by treating same letters in different case as same in python

If input is like ['z','t','Z','a','b','A','d'],then after sorting I want to get output like ['a','A','b','d','t','z','Z'] or ['A','a','b','d','t','Z','z'].
This will sort always upper-case letter first:
lst = ['z','t','Z','a','b','A','d']
print(sorted(lst, key=lambda k: 2*ord(k.lower()) + k.islower()))
Prints:
['A', 'a', 'b', 'd', 't', 'Z', 'z']
EDIT Thanks to @MadPhysicist in the comments, another variant:
print(sorted(lst, key=lambda k: (k.lower(), k.islower())))
There are two options for how this sorting could be done. Option 1 is stable, meaning that letters which are equal ignoring case keep their original relative order:
['A', 'b', 'a', 'B'] -> ['A', 'a', 'b', 'B']
The other option is to always put uppercase before or after lowercase:
['A', 'b', 'a', 'B'] -> ['A', 'a', 'B', 'b'] or ['a', 'A', 'b', 'B']
Both are possible with the key argument to list.sort (or the builtin sorted).
A stable sort is simply:
sorted(['A', 'b', 'a', 'B'], key=str.lower)
A fully ordered sort requires you to check the original status of the letter, in addition to comparing the lowercased values:
sorted(['A', 'b', 'a', 'B'], key=lambda x: (x.lower(), x.islower()))
This uses the fact that tuples are compared lexicographically, or element-by-element. The first difference determines the order. If two letters have different values for x.lower(), they will be sorted as usual. If they have the same lowercase representation, x.islower() will be compared. Since islower() returns False for uppercase and True for lowercase letters, and False sorts before True (they compare like 0 and 1), lowercase letters will come after uppercase. To switch that, invert the sense of the comparison:
sorted(['A', 'b', 'a', 'B'], key=lambda x: (x.lower(), not x.islower()))
OR
sorted(['A', 'b', 'a', 'B'], key=lambda x: (x.lower(), x.isupper()))
OR
sorted(['A', 'b', 'a', 'B'], key=lambda x: (x.lower(), -x.islower()))
etc...
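Putting the variants side by side makes the difference concrete. Each key is taken from the answers above; only the variable names are new:

```python
letters = ['A', 'b', 'a', 'B']

# Stable, case-insensitive: ties keep their original relative order.
stable = sorted(letters, key=str.lower)
print(stable)               # ['A', 'a', 'b', 'B']

# Uppercase first: (letter, False) sorts before (letter, True).
upper_first = sorted(letters, key=lambda x: (x.lower(), x.islower()))
print(upper_first)          # ['A', 'a', 'B', 'b']

# Lowercase first: flip the second tuple element.
lower_first = sorted(letters, key=lambda x: (x.lower(), x.isupper()))
print(lower_first)          # ['a', 'A', 'b', 'B']

# The question's input, uppercase first.
lst = ['z', 't', 'Z', 'a', 'b', 'A', 'd']
question_sorted = sorted(lst, key=lambda k: (k.lower(), k.islower()))
print(question_sorted)      # ['A', 'a', 'b', 'd', 't', 'Z', 'z']
```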
You could use the key keyword argument of sorted (or list.sort). You can pass to key a function according to which the sort will be performed. So for example:
l = ['z','t','Z','a','b','A','d']
print(sorted(l, key=str.lower))
Gives:
['a', 'A', 'b', 'd', 't', 'z', 'Z']
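Note: this does not enforce a consistent upper/lower order across different letters. Letters that differ only in case keep their relative order from the original input, because the sort is stable (here 'a' comes before 'A' and 'z' before 'Z' only because that is how they appear in the input).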

Remove some duplicates from list in python

UPDATE: I believe I found the solution. I've put it at the end.
Let’s say we have this list:
a = ['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c']
I want to create another list to remove the duplicates from list a, but at the same time, keep the ratio approximately intact AND maintain order.
The output should be:
b = ['a', 'b', 'a', 'c']
EDIT: To explain better, the ratio doesn't need to be exactly intact. All that's required is the output of ONE single letter for each unique variable in the data. However, two letters might be the same but represent two different things. The counts are important to identify this, as I say later. Letters representing ONE unique variable appear in counts between 3000-3400, so when I divide the total count by 3500 and round it, I know how many times it should appear in the end, but the problem is I don't know what order they should be in.
To illustrate this I'll include one more input and desired output:
Input: ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'a', 'a', 'd', 'd', 'a', 'a']
Desired Output: ['a', 'a', 'b', 'c', 'a', 'd', 'a']
Note that 'c' has been repeated three times. The ratio need not be preserved exactly; all I need to represent is how many times that variable is represented, and because it's represented only 3 times in this example, it isn't considered enough for it to count as two.
The only difference is that here I'm assuming all letters repeating exactly twice are unique, although in the data-set, again, uniqueness is dependent on the appearance of 3000-3400 times.
Note(1): This doesn't necessarily need to be considered, but there's a possibility that not all letters will be grouped together nicely. For example, considering 4 letters for uniqueness to make it short: ['a','a','b','a','a','b','b','b','b'] should still be represented as ['a','b']. This is a minor problem in this case, however.
EDIT:
Example of what I've tried and successfully done:
full_list = ['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c']
#full_list is a list containing around 10k items, just using this as example
rep = 2 # number of estimated repetitions for unique item,
# in the real list this was set to 3500
quant = {'a': 0, "b" : 0, "c" : 0, "d" : 0, "e" : 0, "f" : 0, "g": 0}
for x in set(full_list):
    quant[x] = round(full_list.count(x)/rep)
final = []
for x in range(len(full_list)):
    if full_list[x] in final:
        lastindex = len(full_list) - 1 - full_list[::-1].index(full_list[x])
        if lastindex == x and final.count(full_list[x]) < quant[full_list[x]]:
            final.append(full_list[x])
    else:
        final.append(full_list[x])
print(final)
My problem with the above code is two-fold:
If there are more than 2 repetitions of the same data, it will not count them correctly. For example: ['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a'] should become ['a','b','a','c','a'] but instead it becomes ['a','b','c','a']
It takes a very long time to finish, as I'm sure it's a very inefficient way to do this.
Final remark: The code I've tried was more of a little hack to achieve the desired output on the most common input, however it doesn't do exactly what I intended it to. It's also important to note that the input changes over time. Repetitions of single letters aren't always the same, although I believe they're always grouped together, so I was thinking of making a flag that is True when it hits a letter and becomes false as soon as it changes to a different one, but this also has the problem of not being able to account for the fact that two letters that are the same might be put right next to each other. The count for each letter as an individual is always between 3000-3400, so I know that if the count is above that, there are more than 1.
UPDATE: Solution
Following hiro protagonist's suggestion with minor modifications, the following code seems to work:
full = ['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a']
from itertools import groupby
letters_pre = [key for key, _group in groupby(full)]
letters_post = []
for x in range(len(letters_pre)):
    if x > 0 and letters_pre[x] != letters_pre[x-1]:
        letters_post.append(letters_pre[x])
    if x == 0:
        letters_post.append(letters_pre[x])
print(letters_post)
The only problem is that it doesn't consider that sometimes letters can appear in between unique ones, as described in "Note(1)", but that's only a very minor issue. The bigger issue is that it doesn't handle two separate occurrences of the same letter that are consecutive, for example (two for uniqueness as example): ['a','a','a','a','b','b'] gets turned into ['a','b'] when the desired output should be ['a','a','b']
This is where itertools.groupby may come in handy:
from itertools import groupby
a = ["a", "a", "b", "b", "a", "a", "c", "c"]
res = [key for key, _group in groupby(a)]
print(res) # ['a', 'b', 'a', 'c']
This is a version where you can scale down the number of copies of each key (while still guaranteeing at least one per group in the result):
from itertools import groupby, repeat, chain
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'a', 'a',
     'd', 'd', 'a', 'a']
scale = 0.4
key_count = tuple((key, sum(1 for _item in group)) for key, group in groupby(a))
# (('a', 4), ('b', 2), ('c', 5), ('a', 2), ('d', 2), ('a', 2))
res = tuple(
    chain.from_iterable(
        repeat(key, round(scale * count) or 1) for key, count in key_count
    )
)
# ('a', 'a', 'b', 'c', 'c', 'a', 'd', 'a')
there may be smarter ways to determine the scale (probably based on the length of the input list a and the average group length).
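One way to pick the scale is the asker's own rule: divide each run length by the expected count for a single item (rep, 3500 in the real data) and round. A sketch under that assumption; collapse_runs is a hypothetical helper, equivalent to the version above with scale = 1 / rep, and like it, it does not handle the interleaved case from "Note(1)":

```python
from itertools import groupby

def collapse_runs(items, rep):
    """Replace each run of identical items with round(run_length / rep)
    copies, keeping at least one copy per run. rep is the expected run
    length of a single unique item."""
    result = []
    for key, group in groupby(items):
        run_length = sum(1 for _ in group)
        copies = max(1, round(run_length / rep))
        result.extend([key] * copies)
    return result

print(collapse_runs(['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a'], rep=2))
# ['a', 'b', 'a', 'c', 'a']
print(collapse_runs(['a', 'a', 'a', 'a', 'b', 'b'], rep=2))
# ['a', 'a', 'b']
```

Python's round() rounds halves to even, so a run of exactly 1.5 × rep (such as three items with rep = 2) becomes two copies; with the ranges the question gives (3000 to 3400 per item, rep = 3500), run lengths should stay well away from that boundary.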
Might be a strange one, but:
b = []
for i in a:
    if next(iter(b[::-1]), None) != i:
        b.append(i)
print(b)
Output:
['a', 'b', 'a', 'c']
