Understanding the behavior of Python's set - python

The documentation for the built-in type set says:
class set([iterable])
Return a new set or frozenset object
whose elements are taken from
iterable. The elements of a set must
be hashable.
That is all right but why does this work:
>>> l = range(10)
>>> s = set(l)
>>> s
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
And this doesn't:
>>> s.add([10])
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
s.add([10])
TypeError: unhashable type: 'list'
Both are lists. Is some magic happening during the initialization?

When you initialize a set, you provide an iterable of values, each of which must be hashable.
s = set()
s.add([10])
is the same as
s = set([[10]])
which throws the same error that you're seeing right now.

In [13]: (2).__hash__
Out[13]: <method-wrapper '__hash__' of int object at 0x9f61d84>
In [14]: ([2]).__hash__ # nothing.
The thing is that set needs its items to be hashable, i.e. to implement the __hash__ magic method (the hash is used to place and look up items in the set's underlying hash table). list does not provide a usable __hash__ (list.__hash__ is None), hence a list cannot be added to a set.
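As a rough illustration of the point above, hashability can be probed by calling hash() and catching the TypeError. This is only a sketch for Python 3; the helper name is_hashable is made up for this example and is not part of the standard library.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper, for illustration only)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable(2))          # True: ints implement __hash__
print(is_hashable((1, 2)))     # True: a tuple of hashable items is hashable
print(is_hashable([2]))        # False: lists are unhashable
print(list.__hash__ is None)   # True: list explicitly disables hashing
```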

In this line:
s.add([10])
You are trying to add a list to the set, rather than the elements of the list. If you want to add the elements of the list, use the update method.

Think of the constructor being something like:
class Set:
    def __init__(self, l):
        for elem in l:
            self.add(elem)
So there is nothing mysterious about why the constructor accepts a list while add(element) does not: the constructor iterates over the list and adds each element individually.

It behaves according to the documentation: set.add() adds a single element (and since you give it a list, it complains it is unhashable - since lists are no good as hash keys). If you want to add a list of elements, use set.update(). Example:
>>> s = set([1,2,3])
>>> s.add(5)
>>> s
set([1, 2, 3, 5])
>>> s.update([8])
>>> s
set([8, 1, 2, 3, 5])

s.add([10]) works as documented. An exception is raised because [10] is not hashable.
There is no magic happening during initialisation.
set([0,1,2,3,4,5,6,7,8,9]) has the same effect as set(range(10)) and set(xrange(10)) and set(foo()) where
def foo():
    for i in (9,8,7,6,5,4,3,2,1,0):
        yield i
In other words, the arg to set is an iterable, and each of the values obtained from the iterable must be hashable.
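A small Python 3 sketch of the same idea (xrange no longer exists there, so range is used instead). The generator name countdown is an assumption made for this example; it plays the role of foo() above.

```python
def countdown():
    """Yield 9, 8, ..., 0 one value at a time (same idea as foo() above)."""
    for i in range(9, -1, -1):
        yield i

# Three different iterables that produce the same ten hashable values.
from_list = set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
from_range = set(range(10))
from_generator = set(countdown())
all_equal = from_list == from_range == from_generator
print(all_equal)        # True

s = set(range(10))
s.update([10])          # iterates the list and adds the int 10
try:
    s.add([10])         # tries to add the list object itself
    add_failed = False
except TypeError:
    add_failed = True
print(add_failed)       # True
print(10 in s)          # True
```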

Related

Why isn't lst.sort().reverse() valid?

Per title. I do not understand why it is not valid. I understand that they mutate the object, but if you call the sort method, after it's done then you'd call the reverse method so it should be fine. Why is it then that I need to type lst.sort() then on the line below, lst.reverse()?
Edit: Well, when it's pointed out like that, it's a bit embarrassing how I didn't get it before. I literally recognize that it mutated the object and thus returns a None, but I suppose it didn't register that also meant that you can't reverse a None-type object.
When you call lst.sort(), it does not return anything, it changes the list itself.
So the result of lst.sort() is None, thus you try to reverse None which is impossible.
Put simply, lst.sort() does not return the list sorted. It modifies itself.
>>> lst = [3,1,2,0]
>>> lst
[3, 1, 2, 0]
>>> lst.sort().reverse()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'reverse'
>>>
Since lst.sort() doesn't return anything, Python automatically returns None for you. Since None doesn't have a reverse method, you get an error.
>>> lst.sort()
>>> lst.reverse()
>>> lst
[3, 2, 1, 0]
>>>
You can also reverse the list while sorting, like this:
lst.sort(reverse=True)
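If chaining is what you are after, the built-in sorted() and reversed() return new objects instead of None, so they can be composed. A short sketch in Python 3, with variable names chosen only for this example:

```python
lst = [3, 1, 2, 0]
in_place_result = lst.sort()     # sorts lst in place and returns None
print(in_place_result)           # None
print(lst)                       # [0, 1, 2, 3]

data = [3, 1, 2, 0]
descending = sorted(data, reverse=True)   # returns a new list
print(descending)                # [3, 2, 1, 0]
print(data)                      # [3, 1, 2, 0] (original left untouched)

# sorted() returns a list, so its result can be passed straight on.
chained = list(reversed(sorted(data)))
print(chained)                   # [3, 2, 1, 0]
```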

Creating a numpy array from a set

I noticed the following behaviour exhibited by numpy arrays:
>>> import numpy as np
>>> s = {1,2,3}
>>> l = [1,2,3]
>>> np.array(l)
array([1, 2, 3])
>>> np.array(s)
array({1, 2, 3}, dtype=object)
>>> np.array(l, dtype='int')
array([1, 2, 3])
>>> np.array(l, dtype='int').dtype
dtype('int64')
>>> np.array(s, dtype='int')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'
There are 2 things to notice:
Creating an array from a set results in the array dtype being object
Trying to specify dtype results in an error which suggests that the
set is being treated as a single element rather than an iterable.
What am I missing - I don't fully understand which bit of python I'm overlooking. Set is a mutable object much like a list is.
EDIT: tuples work fine:
>>> t = (1,2,3)
>>> np.array(t)
array([1, 2, 3])
>>> np.array(t).dtype
dtype('int64')
The array factory works best with sequence objects which a set is not. If you do not care about the order of elements and know they are all ints or convertible to int, then you can use np.fromiter
np.fromiter({1,2,3},int,3)
# array([1, 2, 3])
The second (dtype) argument is mandatory; the last (count) argument is optional, providing it can improve performance.
As you can see from the curly-bracket syntax, a set is more closely related to a dict than to a list. You can solve it very simply by turning the set into a list or tuple before converting to an array:
>>> import numpy as np
>>> s = {1,2,3}
>>> np.array(s)
array({1, 2, 3}, dtype=object)
>>> np.array(list(s))
array([1, 2, 3])
>>> np.array(tuple(s))
array([1, 2, 3])
However, this might be too inefficient for large sets, because list or tuple has to run through the whole set before the creation of the array even starts. A better method would be to pass the set directly to np.fromiter, which accepts any iterable:
>>> np.fromiter(s, int)
array([1, 2, 3])
The np.array documentation says that the object argument must be "an array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence" (emphasis added).
A set is not a sequence. Specifically, sets are unordered and do not support the __getitem__ method. Hence you cannot create an array from a set the way you are trying to with the list.
Numpy expects the argument to be a sequence such as a list. It doesn't understand the set type, so it creates an object array (this would be the same if you passed any other non-sequence object). You can create a numpy array from a set by first converting the set to a list: numpy.array(list(my_set)). Hope this helps.

add vs update in set operations in python - in case of String [duplicate]

What is the difference between the add and update operations in Python if I just want to add a single value to the set?
a = set()
a.update([1]) #works
a.add(1) #works
a.update([1,2])#works
a.add([1,2])#fails
Can someone explain why is this so.
set.add
set.add adds an individual element to the set. So,
>>> a = set()
>>> a.add(1)
>>> a
set([1])
works, but it cannot work with an iterable, unless it is hashable. That is the reason why a.add([1, 2]) fails.
>>> a.add([1, 2])
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: unhashable type: 'list'
Here, [1, 2] is treated as the element being added to the set and as the error message says, a list cannot be hashed but all the elements of a set are expected to be hashables. Quoting the documentation,
Return a new set or frozenset object whose elements are taken from iterable. The elements of a set must be hashable.
set.update
In case of set.update, you can pass multiple iterables to it and it will iterate all iterables and will include the individual elements in the set. Remember: It can accept only iterables. That is why you are getting an error when you try to update it with 1
>>> a.update(1)
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: 'int' object is not iterable
But, the following would work because the list [1] is iterated and the elements of the list are added to the set.
>>> a.update([1])
>>> a
set([1])
set.update is basically an equivalent of in-place set union operation. Consider the following cases
>>> set([1, 2]) | set([3, 4]) | set([1, 3])
set([1, 2, 3, 4])
>>> set([1, 2]) | set(range(3, 5)) | set(i for i in range(1, 5) if i % 2 == 1)
set([1, 2, 3, 4])
Here, we explicitly convert all the iterables to sets and then we find the union. There are multiple intermediate sets and unions. In this case, set.update serves as a good helper function. Since it accepts any iterable, you can simply do
>>> a.update([1, 2], range(3, 5), (i for i in range(1, 5) if i % 2 == 1))
>>> a
set([1, 2, 3, 4])
add is faster for a single element because it is exactly for that purpose, adding a single element:
In [5]: timeit a.update([1])
10000000 loops, best of 3: 191 ns per loop
In [6]: timeit a.add(1)
10000000 loops, best of 3: 69.9 ns per loop
update expects an iterable or iterables. So if you have a single hashable element to add, use add; if you have an iterable or iterables of hashable elements to add, use update.
s.add(x)       add element x to set s
s.update(t)    s |= t    return set s with elements added from t
add adds an element, update "adds" another iterable set, list or tuple, for example:
In [2]: my_set = {1,2,3}
In [3]: my_set.add(5)
In [4]: my_set
Out[4]: set([1, 2, 3, 5])
In [5]: my_set.update({6,7})
In [6]: my_set
Out[6]: set([1, 2, 3, 5, 6, 7])
.add() is intended for a single element, while .update() is for the introduction of other sets.
From help():
add(...)
Add an element to a set.
This has no effect if the element is already present.
update(...)
Update a set with the union of itself and others.
add only accepts a hashable type. A list is not hashable.
a.update(1) in your code won't work. add accepts an element and puts it in the set if it is not already there, but update takes an iterable and makes a union of the set with that iterable. It's kind of like append and extend for lists.
I guess no one has mentioned the good resource from Hackerrank. I'd like to paste how Hackerrank describes the difference between update and add for sets in Python.
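The append/extend analogy can be sketched side by side as follows (Python 3; the variable names are chosen only for this example). A tuple is used with add because, unlike a list, it is hashable.

```python
items = []
items.append([1, 2])    # the list itself becomes one element
items.extend([1, 2])    # each element of the list is appended
print(items)            # [[1, 2], 1, 2]

a = set()
a.add((1, 2))           # the tuple itself becomes one element
a.update([1, 2])        # each element of the list is added
print(a == {(1, 2), 1, 2})   # True

try:
    a.update(1)         # an int is not iterable
    update_failed = False
except TypeError:
    update_failed = True
print(update_failed)    # True
```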
Sets are unordered bag of unique values. A single set contains values of any immutable data type.
CREATING SET
myset = {1, 2} # Directly assigning values to a set
myset = set() # Initializing a set
myset = set(['a', 'b']) # Creating a set from a list
print(myset) ===> {'a', 'b'}
MODIFYING SET - add() and update()
myset.add('c')
myset ===>{'a', 'c', 'b'}
myset.add('a') # As 'a' already exists in the set, nothing happens
myset.add((5, 4))
print(myset) ===> {'a', 'c', 'b', (5, 4)}
myset.update([1, 2, 3, 4]) # update() only works for iterable objects
print(myset) ===> {'a', 1, 'c', 'b', 4, 2, (5, 4), 3}
myset.update({1, 7, 8})
print(myset) ===>{'a', 1, 'c', 'b', 4, 7, 8, 2, (5, 4), 3}
myset.update({1, 6}, [5, 13])
print(myset) ===> {'a', 1, 'c', 'b', 4, 5, 6, 7, 8, 2, (5, 4), 13, 3}
Hope it helps. For more details on Hackerrank, here is the link.
add method directly adds elements to the set while the update method converts first argument into set then it adds
A list is unhashable, therefore we cannot add a list to a set.
We use add() method to add single value to a set.
We use update() method to add sequence values to a set.
Here Sequences are any iterables including list,tuple,string,dict etc.
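Since a string is itself an iterable of characters, add and update treat it very differently. A minimal Python 3 sketch (the variable names are just for this example):

```python
whole = set()
whole.add("abc")            # the string is added as a single element
print(whole)                # {'abc'}

chars = set()
chars.update("abc")         # the string is iterated character by character
print(chars == {"a", "b", "c"})   # True

wrapped = set()
wrapped.update(["abc"])     # wrap the string in a list to add it whole
print(wrapped)              # {'abc'}

keys = set()
keys.update({"k1": 1, "k2": 2})   # iterating a dict yields its keys
print(keys == {"k1", "k2"})       # True
```

So if the goal is to add one string as one element, add (or update with a one-item list) is the call to reach for.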

Iterator vs Iterable?

(For python 3)
In the python docs, you can see that the list() function takes an iterable.
In the python docs, you can also see that the next() function takes an iterator.
So I did this in IDLE:
>>> var = map(lambda x: x+5, [1,2,3])
>>> var
>>> next(var)
>>> list(var)
Which gives the output:
<map object at 0x000000000375F978>
6
[7,8]
Frankly, this isn't what I expected. Is a map object an iterator or an iterable? Is there even a difference? Clearly both the list() and next() functions work on the map object, whatever it is.
Why do they both work?
An iterator is an iterable, but an iterable is not necessarily an iterator.
An iterable is anything that has an __iter__ method defined - e.g. lists and tuples, as well as iterators.
Iterators are the subset of iterables that also define a __next__ method: they produce their values one at a time, and the values are typically not all stored in memory at once. Iterators can be created using functions like map, filter and iter, as well as functions using yield.
In your example, map returns an iterator, which is also an iterable, which is why both functions work with it. However, if we take a list for instance:
>>> lst = [1, 2, 3]
>>> list(lst)
[1, 2, 3]
>>> next(lst)
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
next(lst)
TypeError: 'list' object is not an iterator
we can see that next complains, because the list, an iterable, is not an iterator.
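The distinction can also be checked directly. The following Python 3 sketch (variable names are only illustrative) shows that a list has __iter__ but not __next__, that iter() turns it into an iterator, and that an iterator is consumed as it is read, which explains the 6 followed by [7, 8] in the question:

```python
lst = [1, 2, 3]
print(hasattr(lst, "__iter__"))   # True: a list is iterable
print(hasattr(lst, "__next__"))   # False: but it is not an iterator

it = iter(lst)                    # ask the iterable for an iterator
print(iter(it) is it)             # True: an iterator is its own iterator
first = next(it)
rest = list(it)                   # consumes whatever is left
again = list(it)                  # nothing remains
print(first, rest, again)         # 1 [2, 3] []

mapped = map(lambda x: x + 5, lst)
head = next(mapped)
tail = list(mapped)
print(head, tail)                 # 6 [7, 8]
```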

Are Python sets mutable?

Are sets in Python mutable?
In other words, if I do this:
x = set([1, 2, 3])
y = x
y |= set([4, 5, 6])
Are x and y still pointing to the same object, or was a new set created and assigned to y?
>>> x = set([1, 2, 3])
>>> y = x
>>>
>>> y |= set([4, 5, 6])
>>> print x
set([1, 2, 3, 4, 5, 6])
>>> print y
set([1, 2, 3, 4, 5, 6])
Sets are unordered.
Set elements are unique. Duplicate elements are not allowed.
A set itself may be modified, but the elements contained in the set must be of an immutable type.
set1 = {1,2,3}
set2 = {1,2,[1,2]} --> unhashable type: 'list'
# Set elements should be immutable.
Conclusion: sets are mutable.
Your two questions are different.
Are Python sets mutable?
Yes: "mutable" means that you can change the object. For example, integers are not mutable: you cannot change the number 1 to mean anything else. You can, however, add elements to a set, which mutates it.
Does y = x; y |= {1,2,3} change x?
Yes. The code y = x means "bind the name y to mean the same object that the name x currently represents". The code y |= {1,2,3} calls the magic method y.__ior__({1,2,3}) under the hood, which mutates the object represented by the name y. Since this is the same object as is represented by x, you should expect the set to change.
You can check whether two names point to precisely the same object using the is operator: x is y just if the objects represented by the names x and y are the same object.
If you want to copy an object, the usual syntax is y = x.copy() or y = set(x). This is only a shallow copy, however: although it copies the set object, the members of said object are not copied. If you want a deepcopy, use copy.deepcopy(x).
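To make the difference between aliasing, in-place update and copying concrete, here is a short Python 3 sketch (the names z and w are introduced only for this example). Note that y |= ... mutates the existing set, while w = w | ... builds a new set and rebinds the name.

```python
x = {1, 2, 3}
y = x                     # y is another name for the same set object
y |= {4, 5, 6}            # in-place union: mutates that one object
print(x is y)             # True
print(x == {1, 2, 3, 4, 5, 6})   # True

z = x.copy()              # shallow copy: a distinct set object
z.add(7)
print(z is x)             # False
print(7 in x)             # False

w = x
w = w | {8}               # builds a new set and rebinds w; x is untouched
print(w is x)             # False
print(8 in x)             # False
```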
Python sets are classified into two types. Mutable and immutable. A set created with 'set' is mutable while the one created with 'frozenset' is immutable.
>>> s = set(list('hello'))
>>> type(s)
<class 'set'>
The following methods are for mutable sets.
s.add(item) -- Adds item to s. Has no effect if item is already in s.
s.clear() -- Removes all items from s.
s.difference_update(t) -- Removes all the items from s that are also in t.
s.discard(item) -- Removes item from s. If item is not a member of s, nothing happens.
All these operations modify the set s in place. The parameter t can be any object that supports iteration.
After changing the set, even their object references match. I don't know why that textbook says sets are immutable.
>>> s1 ={1,2,3}
>>> id(s1)
140061513171016
>>> s1|={5,6,7}
>>> s1
{1, 2, 3, 5, 6, 7}
>>> id(s1)
140061513171016
print x,y
and you see they both point to the same set:
set([1, 2, 3, 4, 5, 6]) set([1, 2, 3, 4, 5, 6])
Sets are mutable
s = {2,3,4,5,6}
type(s)
<class 'set'>
s.add(9)
s
{2, 3, 4, 5, 6, 9}
We are able to change elements of set
Yes, Python sets are mutable because we can add elements to and delete elements from a set, but a set can't itself contain mutable items such as lists. For example, the code below will give an error:
s = set([[1,2,3],[4,5,6]])
So sets are mutable but can't contain mutable items, because set internally uses hashtable to store its elements so for that set elements need to be hashable.
But mutable elements like list are not hashable.
Note:
Mutable elements are not hashable
Immutable elements are hashable
Just like key of a dictionary can't be a list.
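A brief Python 3 sketch of what can and cannot go into a set, and the matching rule for dictionary keys. The helper try_add is hypothetical and exists only for this example. It suggests that the deciding property is hashability: a tuple is immutable, yet a tuple that contains a list still cannot be hashed.

```python
def try_add(target, item):
    """Return True if item could be added to the set (hypothetical helper)."""
    try:
        target.add(item)
    except TypeError:
        return False
    return True

s = {frozenset([1, 2, 3]), (4, 5, 6)}     # hashable containers are fine
print(len(s))                             # 2

print(try_add(s, [7, 8]))                 # False: list is unhashable
print(try_add(s, {7, 8}))                 # False: a plain set is unhashable
print(try_add(s, (1, [2])))               # False: tuple holding a list
print(try_add(s, frozenset({7, 8})))      # True: frozenset is hashable

d = {}
d[(1, 2)] = "ok"                          # a tuple works as a dict key
try:
    d[[1, 2]] = "fails"                   # a list does not
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)                        # False
```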
Sets are mutable, you can add to them. The items they contain CAN BE MUTABLE; THEY MUST BE HASHABLE. I didn't see any correct answers in this post, so here is the code:
class MyClass:
    """
    This class is hashable, however, the hashes are
    unique per instance not the data so a set will
    have no way to determine equality
    """
    def __init__(self):
        self.my_attr = "no-unique-hash"
    def __repr__(self):
        return self.my_attr
class MyHashableClass:
    """
    This object implements __hash__ and __eq__ and will
    produce the same hash if the data is the same.
    That way a set can remove equal objects.
    """
    def __init__(self):
        self.my_attr = "unique-hash"
    def __hash__(self):
        return hash(str(self))
    def __eq__(self, other):
        return hash(self) == hash(other)
    def __repr__(self):
        return self.my_attr
myclass_instance1 = MyClass()
myclass_instance2 = MyClass()
my_hashable_instance1 = MyHashableClass()
my_hashable_instance2 = MyHashableClass()
my_set = {
    myclass_instance1,
    myclass_instance2,
    my_hashable_instance1,
    my_hashable_instance2,  # will be removed, not unique
}  # sets can contain mutable types
# The only objects a set cannot contain are objects
# with __hash__ = None, such as list, dict, and set
print(my_set)
# prints {unique-hash, no-unique-hash, no-unique-hash}
my_hashable_instance1.my_attr = "new-hash"  # mutating the object
# now that the hashes between the objects are different
# instance2 can be added
my_set.add(my_hashable_instance2)
print(my_set)
# {new-hash, no-unique-hash, no-unique-hash, unique-hash}
I don't think Python sets are mutable as mentioned clearly in book "Learning Python 5th Edition by Mark Lutz - O'Reilly Publications"
