Are Python sets mutable?

Are sets in Python mutable?
In other words, if I do this:
x = set([1, 2, 3])
y = x
y |= set([4, 5, 6])
Are x and y still pointing to the same object, or was a new set created and assigned to y?

>>> x = set([1, 2, 3])
>>> y = x
>>>
>>> y |= set([4, 5, 6])
>>> print x
set([1, 2, 3, 4, 5, 6])
>>> print y
set([1, 2, 3, 4, 5, 6])
Sets are unordered.
Set elements are unique. Duplicate elements are not allowed.
A set itself may be modified, but the elements contained in the set must be of an immutable type.
set1 = {1, 2, 3}
set2 = {1, 2, [1, 2]}  # TypeError: unhashable type: 'list'
# Set elements must be immutable (more precisely, hashable).
Conclusion: sets are mutable.

Your two questions are different.
Are Python sets mutable?
Yes: "mutable" means that you can change the object. For example, integers are not mutable: you cannot change the number 1 to mean anything else. You can, however, add elements to a set, which mutates it.
Does y = x; y |= {1,2,3} change x?
Yes. The code y = x means "bind the name y to mean the same object that the name x currently represents". The code y |= {1,2,3} calls the magic method y.__ior__({1,2,3}) under the hood, which mutates the object represented by the name y. Since this is the same object as is represented by x, you should expect the set to change.
You can check whether two names point to precisely the same object using the is operator: x is y is True exactly when the names x and y refer to the same object.
If you want to copy an object, the usual syntax is y = x.copy() or y = set(x). This is only a shallow copy, however: although it copies the set object, the members of said object are not copied. If you want a deepcopy, use copy.deepcopy(x).
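To make the aliasing point concrete, here is a minimal sketch (the names alias and shallow are purely illustrative):
x = {1, 2, 3}

alias = x            # same object: two names, one set
shallow = x.copy()   # a new set with the same elements; set(x) is equivalent

alias |= {4}         # mutates x too, since alias IS x
shallow |= {5}       # only the copy changes

print(x)             # {1, 2, 3, 4}
print(alias is x)    # True
print(shallow is x)  # False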

Python has two kinds of sets: a set created with set is mutable, while one created with frozenset is immutable.
>>> s = set(list('hello'))
>>> type(s)
<class 'set'>
The following methods are for mutable sets.
s.add(item) -- Adds item to s. Has no effect if item is already in s.
s.clear() -- Removes all items from s.
s.difference_update(t) -- Removes all the items from s that are also in t.
s.discard(item) -- Removes item from s. If item is not a member of s, nothing happens.
All these operations modify the set s in place. The parameter t can be any object that supports iteration.
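For contrast, a quick sketch of the frozenset case: the mutating methods listed above simply do not exist on it.
fs = frozenset('hello')
print(type(fs))      # <class 'frozenset'>
print('h' in fs)     # read-only operations (membership, union, ...) still work

try:
    fs.add('x')      # frozenset has no add/clear/discard/difference_update
except AttributeError as err:
    print(err)       # 'frozenset' object has no attribute 'add'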

Even after changing the set, its object id stays the same. I don't know why that textbook says sets are immutable.
>>> s1 ={1,2,3}
>>> id(s1)
140061513171016
>>> s1|={5,6,7}
>>> s1
{1, 2, 3, 5, 6, 7}
>>> id(s1)
140061513171016

Run
print x, y
and you will see that they both point to the same set:
set([1, 2, 3, 4, 5, 6]) set([1, 2, 3, 4, 5, 6])

Sets are mutable.
>>> s = {2, 3, 4, 5, 6}
>>> type(s)
<class 'set'>
>>> s.add(9)
>>> s
{2, 3, 4, 5, 6, 9}
We are able to add elements to the set, so it is mutable.

Yes, Python sets are mutable because we can add and delete elements, but a set cannot contain mutable items itself. For example, the code below will give an error:
s = set([[1,2,3],[4,5,6]])
So sets are mutable but can't contain mutable items, because a set internally uses a hash table to store its elements, and for that the elements need to be hashable.
Mutable elements like lists are not hashable.
Note:
Mutable elements are not hashable
Immutable elements are hashable
Just like a dictionary key can't be a list.
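If you do need container-like elements, the usual workaround is to use hashable stand-ins such as tuples or frozensets. A small sketch:
s = {(1, 2, 3), (4, 5, 6)}                       # tuples are hashable, so this works
nested = {frozenset({1, 2}), frozenset({3, 4})}  # a "set of sets" via frozenset
print(s)
print(nested)

d = {(1, 2): "ok"}   # the same rule applies to dictionary keys
print(d[(1, 2)])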

Sets are mutable: you can add to them. The items they contain CAN be mutable; they MUST be hashable. I didn't see any correct answers in this post, so here is the code:
class MyClass:
    """
    This class is hashable; however, the hashes are
    unique per instance, not per data, so a set will
    have no way to determine equality.
    """
    def __init__(self):
        self.my_attr = "no-unique-hash"

    def __repr__(self):
        return self.my_attr


class MyHashableClass:
    """
    This object implements __hash__ and __eq__ and will
    produce the same hash if the data is the same.
    That way a set can remove equal objects.
    """
    def __init__(self):
        self.my_attr = "unique-hash"

    def __hash__(self):
        return hash(str(self))

    def __eq__(self, other):
        return hash(self) == hash(other)

    def __repr__(self):
        return self.my_attr


myclass_instance1 = MyClass()
myclass_instance2 = MyClass()
my_hashable_instance1 = MyHashableClass()
my_hashable_instance2 = MyHashableClass()

my_set = {
    myclass_instance1,
    myclass_instance2,
    my_hashable_instance1,
    my_hashable_instance2,  # will be removed, not unique
}  # sets can contain mutable types
# The only objects a set cannot contain are objects
# whose __hash__ is None, such as list, dict, and set
print(my_set)
# prints {unique-hash, no-unique-hash, no-unique-hash}

my_hashable_instance1.my_attr = "new-hash"  # mutating the object
# now that the hashes of the two objects are different,
# instance2 can be added
my_set.add(my_hashable_instance2)
print(my_set)
# {new-hash, no-unique-hash, no-unique-hash, unique-hash}
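One caveat worth adding to the example above: changing a stored element so that its hash changes breaks membership tests, because the set remembered the bucket it used at insertion time. A minimal sketch reusing MyHashableClass:
s = {MyHashableClass()}             # stored under hash("unique-hash")
member = next(iter(s))
member.my_attr = "changed"          # the object's hash is now different
print(member in s)                  # almost certainly False: the lookup
                                    # searches the bucket for the new hash
print(any(m is member for m in s))  # True: the object is still inside the set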

I don't think Python sets are mutable, as mentioned clearly in the book "Learning Python, 5th Edition" by Mark Lutz (O'Reilly).

Related

update vs. union for sets in python [duplicate]

Python sets have these methods:
s.union(t) s | t new set with elements from both s and t
s.update(t) s |= t return set s with elements added from t
Likewise, there's also these:
s.intersection_update(t) s &= t return set s keeping only elements also found in t
s.intersection(t) s & t new set with elements common to s and t
And so on, for all the standard relational algebra operations.
What exactly is the difference here? I see that it says that the update() versions return s instead of a new set, but if I write x = s.update(t), does that mean that id(x) == id(s)? Are they references to the same object now?
Why are both sets of methods implemented? It doesn't seem to add any significant functionality.
They are very different. One changes the set in place, while the other leaves the original set alone and returns a new set instead.
>>> s = {1, 2, 3}
>>> news = s | {4}
>>> s
set([1, 2, 3])
>>> news
set([1, 2, 3, 4])
Note how s has remained unchanged.
>>> s.update({4})
>>> s
set([1, 2, 3, 4])
Now I've changed s itself. Note also that .update() didn't appear to return anything; it did not return s to the caller and the Python interpreter did not echo a value.
Methods that change objects in-place never return the original in Python. Their return value is always None instead (which is never echoed).
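The same convention applies to the other built-in containers, so a quick comparison with lists may help:
nums = [3, 1, 2]

result = nums.sort()          # in-place: mutates nums and returns None
print(result)                 # None
print(nums)                   # [1, 2, 3]

new_nums = sorted([3, 1, 2])  # non-mutating: builds and returns a new list
print(new_nums)               # [1, 2, 3]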
The _update methods modify the set in-place and return None. The methods without update return a new object. You almost certainly do not want to do x = s.update(t), since that will set x to None.
>>> x = set([1, 2])
>>> x.intersection(set([2, 3]))
set([2])
>>> x
set([1, 2])
>>> x.intersection_update(set([2, 3]))
>>> x
set([2])
>>> x = x.intersection_update(set([2, 3]))
>>> print x
None
The functionality added by the _update methods is the ability to modify existing sets. If you share a set between multiple objects, you may want to modify the existing set so the other objects sharing it will see the changes. If you just create a new set, the other objects won't know about it.
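A rough sketch of that sharing scenario (the Watcher class here is made up purely for illustration):
class Watcher:
    """Keeps a reference to a shared set instead of its own copy."""
    def __init__(self, shared):
        self.shared = shared

tags = {"a", "b"}
w1 = Watcher(tags)
w2 = Watcher(tags)

tags.update({"c"})             # in-place: both watchers see the new element
print(w1.shared == w2.shared)  # True; both contain {'a', 'b', 'c'}

tags = tags | {"d"}            # rebinds the name only; watchers keep the old set
print("d" in w1.shared)        # False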
It looks like the docs don't state it in the clearest way possible, but set.update doesn't return anything at all (which is equivalent to returning None), neither does set.intersection_update. Like list.append or list.extend or dict.update, they modify the container in place.
In [1]: set('abba')
Out[1]: set(['a', 'b'])
In [2]: set('abba').update(set('c'))
In [3]:
Edit: actually, the docs don't say what you show in the question. They say:
Update the set, adding elements from all others.
and
Update the set, keeping only elements found in it and all others.

What is the best practice way to create a list of a single repeated element

For example, I want a list that looks like this: [1, 1, 1, 1, 1].
I could either create it:
l = [1, 1, 1, 1, 1]
Or I could multiply a single-element list by 5:
l = [1] * 5
Or I could use a list comprehension:
l = [1 for i in range(5)]
Of these three ways, which is fastest and best practice?
I somehow remember the second approach being problematic because each position in the list will contain the same memory address, so updating one index will cause all elements in the list to be updated to the same value. However, after tinkering with it this doesn't seem to be the case.
Two approaches that work better when the element is a mutable object:
import copy

class A:
    def __init__(self, value):
        self.value = value

_LEN_OF_LIST = 5

# when the object can be created cheaply
l = [A(value=100500) for i in range(_LEN_OF_LIST)]

# when creating the object is an expensive operation,
# deep-copy a prepared default (mutable) object instead
default_object = A(value=100500)
l = [copy.deepcopy(default_object) for i in range(_LEN_OF_LIST)]

# print the addresses to show the elements are distinct objects
print('\r\n'.join([str(obj) for obj in l]))
Output:
<__main__.A object at 0x0000028A9BD45B00>
<__main__.A object at 0x0000028A9BC17EB8>
<__main__.A object at 0x0000028A9BC17AC8>
<__main__.A object at 0x0000028A9BC17B38>
<__main__.A object at 0x0000028A9BC17DA0>
As you can see, each object has its own address in memory.
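As a side note on the pitfall the question alludes to: repetition with * only becomes a problem when the repeated element is itself mutable, because every slot then refers to the same object. A minimal sketch:
ones = [1] * 5         # fine: ints are immutable, assigning a slot just rebinds it
ones[0] = 99
print(ones)            # [99, 1, 1, 1, 1]

rows = [[]] * 3        # three references to the SAME inner list
rows[0].append("x")
print(rows)            # [['x'], ['x'], ['x']]

rows = [[] for _ in range(3)]  # three independent lists
rows[0].append("x")
print(rows)                    # [['x'], [], []]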

Shallow copy of list[:]

According to this Official Documentation:
list[:]
creates a new list by shallow copy. I performed following experiments:
>>> squares = [1, 4, 9, 16, 25]
>>> new_squares = squares[:]
>>> squares is new_squares
False
>>> squares[0] is new_squares[0]
True
>>> id(squares)
4468706952
>>> id(new_squares)
4468425032
>>> id(squares[0])
4466081856
>>> id(new_squares[0])
4466081856
Everything here looks good! new_squares and squares are different objects (lists), but because of the shallow copy they share the same contents. However, the following results confuse me:
>>> new_squares[0] = 0
>>> new_squares
[0, 4, 9, 16, 25]
>>> squares
[1, 4, 9, 16, 25]
I update new_squares[0] but squares is not affected. I checked their ids:
>>> id(new_squares[0])
4466081824
>>> id(squares[0])
4466081856
You can see that the id of squares[0] does not change but the id of new_squares[0] does. This is quite different from how I understood shallow copy before.
Could anyone explain it? Thanks!
You have a list object that represents a container of other objects. When you do a shallow copy, you create a new list object (as you see) that contains references to the same objects that the original list contained.
new_squares[0] = 0 is an assignment. You're saying "set a new object at the 0th index of the list". Well, the lists are now separate objects and you're flatly replacing the object held at an index of the copy. It wouldn't matter if the object at the 0th index was mutable either, since you're just replacing the reference that the list object holds.
If the list instead contained a mutable object and you were to modify that object in place without completely changing what object is stored in that index, then you would see the change across both lists. Not because the lists are in any way linked, but because they hold reference to a mutable object that you have now changed.
This can be illustrated below, where I can separately make modifications to the shallow-copied list, and also cause a mutable object to change across both lists, even when that mutable object is now at difference indices between the two.
# MAKING A CHANGE TO THE LIST
a = [1, {'c': 'd'}, 3, 4]
b = a[:]
b.insert(0, 0)
print(a)
print(b)
print()
# MODIFYING A MUTABLE OBJECT INSIDE THE LIST
a[1]['c'] = 'something_else'
print(a)
print(b)
Lists are mutable; integers are immutable.
when you do:
squares = [1, 4, 9, 16, 25]
new_squares = squares[:]
squares and new_squares have different ids
if you do:
[id(squares[i]) for i in range(len(squares))]
[id(new_squares[i]) for i in range(len(new_squares))]
you'll see the same id for each integer.
If you replace an integer with another value, that element will get a new id.
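A short session along those lines (the exact id values will of course differ from machine to machine):
squares = [1, 4, 9, 16, 25]
new_squares = squares[:]

# both lists initially refer to the very same int objects
print([id(x) for x in squares] == [id(x) for x in new_squares])  # True

new_squares[0] = 0   # rebinds slot 0 of the copy to a different int object
print(id(squares[0]) == id(new_squares[0]))  # False
print(squares)       # [1, 4, 9, 16, 25] -- unaffected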

How to pass a list element as reference?

I am passing a single element of a list to a function. I want to modify that element, and therefore, the list itself.
def ModList(element):
    element = 'TWO'

l = list()
l.append('one')
l.append('two')
l.append('three')
print l
ModList(l[1])
print l
But this method does not modify the list. It's like the element is passed by value. The output is:
['one','two','three']
['one','two','three']
I want that the second element of the list after the function call to be 'TWO':
['one','TWO','three']
Is this possible?
The explanations already here are correct. However, since I have wanted to abuse python in a similar fashion, I will submit this method as a workaround.
Assigning a specific element of a list to a name just binds that name to that value; rebinding the name afterwards does not touch the list. Even copying a sublist of a list gives you a new list with no remaining link to the original. Consider this example:
>>> a = [1, 2, 3, 4]
>>> b = a[2]
>>> b
3
>>> c = a[2:3]
>>> c
[3]
>>> b=5
>>> c[0]=6
>>> a
[1, 2, 3, 4]
Neither rebinding b nor assigning into c, the sublist copied from a, changes the values in a. There is no link back, despite their common origin.
However, numpy arrays use a "raw-er" memory allocation and allow views of data to be returned. A view allows data to be represented in a different way while maintaining the association with the original data. A working example is therefore
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> a
array([1, 2, 3, 4])
>>> b = a[2]
>>> b
3
>>> b=5
>>> a
array([1, 2, 3, 4])
>>> c = a[2:3]
>>> c
array([3])
>>> c[0]=6
>>> a
array([1, 2, 6, 4])
>>>
While extracting a single element still gives you only a plain value, the array view c still refers to the original element 2 of a (although it is now element 0 of c), so the change made through c changes a as well.
Numpy ndarrays have many different types, including a generic object type. This means that you can maintain this "by-reference" behavior for almost any type of data, not only numerical values.
Python doesn't do pass by reference. Just do the assignment explicitly, with ModList returning the new value:
l[1] = ModList(l[1])
Also, since this only changes one element, I'd suggest that ModList is a confusing name.
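Spelled out, that suggestion looks roughly like this, with ModList returning the new value instead of trying to rebind its argument:
def ModList(element):
    # return the replacement value; the caller decides where to store it
    return 'TWO'

l = ['one', 'two', 'three']
l[1] = ModList(l[1])
print(l)   # ['one', 'TWO', 'three']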
Rebinding the parameter inside ModList has no effect on the caller, so you can't change the element by assignment in the function. What you can do instead is pass the list and the index into ModList and then modify the element that way:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'

ModList(l, 1)
In many cases you can also consider to let the function both modify and return the modified list. This makes the caller code more readable:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'
    return theList

l = ModList(l, 1)

Understanding the behavior of Python's set

The documentation for the built-in type set says:
class set([iterable])
Return a new set or frozenset object whose elements are taken from iterable. The elements of a set must be hashable.
That is all right but why does this work:
>>> l = range(10)
>>> s = set(l)
>>> s
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
And this doesn't:
>>> s.add([10])
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
s.add([10])
TypeError: unhashable type: 'list'
Both are lists. Is some magic happening during the initialization?
When you initialize a set, you provide a list of values that must each be hashable.
s = set()
s.add([10])
is the same as
s = set([[10]])
which throws the same error that you're seeing right now.
In [13]: (2).__hash__
Out[13]: <method-wrapper '__hash__' of int object at 0x9f61d84>
In [14]: ([2]).__hash__ # nothing.
The thing is that set needs its items to be hashable, i.e. implement the __hash__ magic method (the hash is used to place elements in the underlying hash table). list does not implement that magic method, hence it cannot be added to a set.
In this line:
s.add([10])
You are trying to add a list to the set, rather than the elements of the list. If you want to add the elements of the list, use the update method.
Think of the constructor as being something like:
class Set:
    def __init__(self, l):
        for elem in l:
            self.add(elem)
Nothing magical is going on: the constructor takes a list (or any iterable) and adds its elements one by one, whereas add(element) adds the single object you give it.
It behaves according to the documentation: set.add() adds a single element (and since you give it a list, it complains it is unhashable - since lists are no good as hash keys). If you want to add a list of elements, use set.update(). Example:
>>> s = set([1,2,3])
>>> s.add(5)
>>> s
set([1, 2, 3, 5])
>>> s.update([8])
>>> s
set([8, 1, 2, 3, 5])
s.add([10]) works as documented. An exception is raised because [10] is not hashable.
There is no magic happening during initialisation.
set([0,1,2,3,4,5,6,7,8,9]) has the same effect as set(range(10)) and set(xrange(10)) and set(foo()) where
def foo():
    for i in (9, 8, 7, 6, 5, 4, 3, 2, 1, 0):
        yield i
In other words, the arg to set is an iterable, and each of the values obtained from the iterable must be hashable.
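To round this off, any single element passed to add() just has to be hashable; tuples and frozensets are fine, lists are not. A quick sketch:
s = set(range(10))
s.add((10, 11))          # a tuple is hashable, so this works
s.add(frozenset([12]))   # so is a frozenset
print(s)

try:
    s.add([13])          # still fails: lists are unhashable
except TypeError as err:
    print(err)           # unhashable type: 'list'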
