update vs. union for sets in Python [duplicate]

Python sets have these methods:
s.union(t)                  s | t     new set with elements from both s and t
s.update(t)                 s |= t    return set s with elements added from t
Likewise, there are also these:
s.intersection_update(t)    s &= t    return set s keeping only elements also found in t
s.intersection(t)           s & t     new set with elements common to s and t
And so on, for all the standard relational algebra operations.
What exactly is the difference here? I see that it says the update() versions return s instead of a new set, but if I write x = s.update(t), does that mean that id(x) == id(s)? Are they references to the same object now?
Why are both sets of methods implemented? It doesn't seem to add any significant functionality.

They are very different. One set of methods changes the set in place, while the other leaves the original set alone and returns a new set instead.
>>> s = {1, 2, 3}
>>> news = s | {4}
>>> s
set([1, 2, 3])
>>> news
set([1, 2, 3, 4])
Note how s has remained unchanged.
>>> s.update({4})
>>> s
set([1, 2, 3, 4])
Now I've changed s itself. Note also that .update() didn't appear to return anything; it did not return s to the caller and the Python interpreter did not echo a value.
By convention, methods of Python's built-in types that change an object in place do not return that object. Mutators such as .update() return None instead (which the interpreter never echoes).
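To answer the identity question from the post directly, here is a minimal sketch (the variable names are only illustrative) that checks what each form returns and whether s is still the same object afterwards:

```python
s = {1, 2, 3}
t = {3, 4}
original_id = id(s)

u = s.union(t)            # builds a new set; s is untouched
s_after_union = set(s)    # snapshot for comparison

result = s.update(t)      # mutates s in place and returns None

s |= {5}                  # augmented assignment also mutates in place
same_object = id(s) == original_id

print(u is s)             # False
print(result)             # None
print(same_object)        # True
```

So x = s.update(t) leaves x bound to None, while s |= t keeps s bound to the very same set object.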

The _update methods modify the set in-place and return None. The methods without update return a new object. You almost certainly do not want to do x = s.update(t), since that will set x to None.
>>> x = set([1, 2])
>>> x.intersection(set([2, 3]))
set([2])
>>> x
set([1, 2])
>>> x.intersection_update(set([2, 3]))
>>> x
set([2])
>>> x = x.intersection_update(set([2, 3]))
>>> print x
None
The functionality added by the _update methods is the ability to modify existing sets. If you share a set between multiple objects, you may want to modify the existing set so the other objects sharing it will see the changes. If you just create a new set, the other objects won't know about it.
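A small sketch of that sharing scenario, assuming a hypothetical Watcher class that simply stores a reference to the set it is given:

```python
class Watcher:
    """Hypothetical object that keeps a reference to a shared set."""

    def __init__(self, tags):
        self.tags = tags  # stores a reference, not a copy


shared = {"a", "b"}
w1 = Watcher(shared)
w2 = Watcher(shared)

shared.update({"c"})       # in place: every holder sees "c"
seen_after_update = "c" in w1.tags and "c" in w2.tags

shared = shared | {"d"}    # new set: only the name `shared` is rebound
seen_after_union = "d" in w1.tags

print(seen_after_update)   # True
print(seen_after_union)    # False
```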

It looks like the docs don't state it in the clearest way possible, but set.update doesn't return anything at all (which is equivalent to returning None), and neither does set.intersection_update. Like list.append, list.extend or dict.update, they modify the container in place.
In [1]: set('abba')
Out[1]: set(['a', 'b'])
In [2]: set('abba').update(set('c'))
In [3]:
Edit: actually, the docs don't say what you show in the question. They say:
Update the set, adding elements from all others.
and
Update the set, keeping only elements found in it and all others.
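The phrase "all others" in those descriptions means both methods take any number of iterables. A brief sketch with made-up values:

```python
s = {1, 2}
result = s.update([3, 4], (5,), "ab")   # several iterables in one call

k = {1, 2, 3, 4}
k.intersection_update([2, 3, 4], {3, 4, 5})   # keep what is in k and in all others

print(result)                # None
print(sorted(s, key=str))    # [1, 2, 3, 4, 5, 'a', 'b']
print(sorted(k))             # [3, 4]
```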

Related

Appended object to a set is `NoneType` python 2.7

I have a huge array of labels which I make unique via:
unique_train_labels = set(train_property_labels)
Which prints out as set([u'A', u'B', u'C']). I want to create a new set of unique labels with a new label called "no_region", and am using:
unique_train_labels_threshold = unique_train_labels.add('no_region')
However, this prints out to be None.
My ultimate aim is to use these unique labels to later generate a random array of categorical labels via:
rng = np.random.RandomState(101)
categorical_random = rng.choice(list(unique_train_labels), len(finalTestSentences))
categorical_random_threshold = rng.choice(list(unique_train_labels_threshold), len(finalTestSentences))
From the docs it says that set.add() should generate a new set, which seems not to be the case (hence I can't later call list(unique_train_labels_threshold))
As mentioned in Moses' answer, the set.add method mutates the original set, it does not create a new set. In Python it's conventional for methods that perform in-place mutation to return None; the methods of all built-in mutable types do that, and the convention is generally observed by 3rd-party libraries.
An alternative to using the .copy method is to use the .union method, which returns a new set that is the union of the original set and the set supplied as an argument. For sets, the | (or) operator computes the same union as the .union method.
a = {1, 2, 3}
b = a.union({5})
c = a | {4}
print(a, b, c)
output
{1, 2, 3} {1, 2, 3, 5} {1, 2, 3, 4}
The .union method (like other set methods that can be invoked via operator syntax) has a slight advantage over the operator syntax: you can pass it any iterable for its argument; the operator version requires you to explicitly convert the argument to a set (or frozenset).
a = {1, 2, 3}
b = a.union([5, 6])
c = a | set([7, 8])
print(a, b, c)
output
{1, 2, 3} {1, 2, 3, 5, 6} {1, 2, 3, 7, 8}
Using the explicit .union method is slightly more efficient here because it bypasses converting the arg to a set: internally, the method just iterates over the contents of the arg, adding them to the new set, so it doesn't care if the arg is a set, list, tuple, string, or dict.
From the official Python set docs
Note, the non-operator versions of union(), intersection(),
difference(), and symmetric_difference(), issubset(), and issuperset()
methods will accept any iterable as an argument. In contrast, their
operator based counterparts require their arguments to be sets. This
precludes error-prone constructions like set('abc') & 'cbs' in favor
of the more readable set('abc').intersection('cbs').
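The difference described in that note can be checked with a short sketch (values are illustrative): the method accepts arbitrary iterables, while the operator raises TypeError for a non-set operand.

```python
a = {1, 2, 3}

from_method = a.union([3, 4], "x")   # any iterables are accepted

try:
    a | [3, 4]                       # operator needs a set or frozenset
    operator_error = None
except TypeError as exc:
    operator_error = exc

print(from_method == {1, 2, 3, 4, "x"})    # True
print(type(operator_error).__name__)       # TypeError
```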
The set add method mutates the set in place and returns None.
You should do:
unique_train_labels_threshold = unique_train_labels.copy()
unique_train_labels_threshold.add('no_region')
Using copy ensures mutations on the new set are not propagated to the old one.
I can't find anywhere in the documentation where it says that add generates a new set.
You need to copy the original, then add to the new copy.
import copy
unique_train_labels_threshold = copy.deepcopy(unique_train_labels)
unique_train_labels_threshold.add('no_region')
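Putting the suggestions from these answers side by side, here is a sketch with stand-in data (the label list below is invented for illustration, not taken from the question). For a set of strings a shallow .copy() is enough, since strings are immutable; deepcopy also works but is not required.

```python
train_property_labels = ["A", "B", "A", "C"]   # stand-in data (assumption)
unique_train_labels = set(train_property_labels)

# Option 1: union builds the new set in a single expression.
unique_train_labels_threshold = unique_train_labels | {"no_region"}

# Option 2: copy first, then mutate the copy.
threshold_via_copy = unique_train_labels.copy()
threshold_via_copy.add("no_region")

# Sorting gives a stable order before handing the labels to a sampler.
labels_for_sampling = sorted(unique_train_labels_threshold)

print(labels_for_sampling)   # ['A', 'B', 'C', 'no_region']
```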

Why use Inbuilt copy() function for sets in python when we can simply assign it? [duplicate]

Given below is the code showing use of copy() function.
s = set([1, 2, 3, 4, 5])
t = s.copy()
g = s
print s == t #Output: True
print s == g #Output: True
What is the use of the copy() function when we can simply assign the value of s to g?
Why do we have a separate function (copy) to do this task?
Continue with your example by modifying g: s will change, but t won't.
>>> g.add(6)
>>> g
set([1, 2, 3, 4, 5, 6])
>>> s
set([1, 2, 3, 4, 5, 6])
>>> t
set([1, 2, 3, 4, 5])
Because those two assignments aren't doing the same thing:
>>> t is s
False
>>> g is s
True
t may be equal to s, but .copy() has created a separate object, whereas g is a reference to the same object. Why is this difference relevant? Consider this:
>>> g.add(6)
>>> s
set([1, 2, 3, 4, 5, 6])
>>> t
set([1, 2, 3, 4, 5])
You might find this useful reading.
From the docs:
Assignment statements in Python do not copy objects, they create bindings between a target and an object.
You can visualize the difference here. Notice that s and g point to the same object, whereas t points to a shallow copy of it.
Sets are mutable objects in Python, so depending on what you are doing with it, you may want to operate over a copy instead in order to prevent the propagation of any changes you make.
If you are sure the operations you are performing have no side effects, go ahead and just assign it. In this case, be aware that any change to the set referred to by s will also affect g, because both names refer to the same set instance (it helps if you think of Python variables as C pointers).
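One place where this matters is inside functions. The sketch below uses two hypothetical helpers: one works on a copy and leaves the caller's set alone, the other mutates the set it was handed.

```python
def without_small(values, limit):
    """Hypothetical helper: returns a new set, caller's set is untouched."""
    working = values.copy()
    for v in values:
        if v < limit:
            working.discard(v)
    return working


def drop_small_in_place(values, limit):
    """Hypothetical helper: mutates the caller's set."""
    for v in list(values):    # iterate over a snapshot while mutating
        if v < limit:
            values.discard(v)


s = {1, 2, 3, 4, 5}
t = without_small(s, 3)
s_after_copy_version = set(s)   # still all five elements
drop_small_in_place(s, 3)       # now s itself has shrunk

print(sorted(s_after_copy_version))  # [1, 2, 3, 4, 5]
print(sorted(s))                     # [3, 4, 5]
```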

How to pass a list element as reference?

I am passing a single element of a list to a function. I want to modify that element, and therefore, the list itself.
def ModList(element):
    element = 'TWO'

l = list()
l.append('one')
l.append('two')
l.append('three')
print l
ModList(l[1])
print l
But this method does not modify the list. It's like the element is passed by value. The output is:
['one','two','three']
['one','two','three']
I want that the second element of the list after the function call to be 'TWO':
['one','TWO','three']
Is this possible?
The explanations already here are correct. However, since I have wanted to abuse python in a similar fashion, I will submit this method as a workaround.
Indexing a list returns a reference to the object stored at that position, not a copy; rebinding the name you assigned it to has no effect on the list. Slicing returns a new list holding references to the same objects (a shallow copy), so assigning into the slice does not change the original list either. Consider this example:
>>> a = [1, 2, 3, 4]
>>> b = a[2]
>>> b
3
>>> c = a[2:3]
>>> c
[3]
>>> b=5
>>> c[0]=6
>>> a
[1, 2, 3, 4]
Neither rebinding b nor assigning into c, a new list sliced from a, changes the values in a. Assignments made through either name leave a untouched, despite their common origin.
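With immutable integers it is hard to see what indexing and slicing really hand back. A sketch with nested lists (chosen only for illustration) makes it visible: the element you get is the same object that the list holds, and only rebinding is invisible to the original list.

```python
a = [[1], [2], [3]]

b = a[1]                  # the same inner list object that a holds
same_object = b is a[1]
b.append(99)              # mutating it is visible through a

c = a[1:2]                # new outer list, same inner object inside
c[0] = "replaced"         # assigning a slot of c does not touch a

b = "rebound"             # rebinding the name b does not touch a either

print(same_object)        # True
print(a)                  # [[1], [2, 99], [3]]
```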
However, numpy arrays use a "raw-er" memory allocation and allow views of data to be returned. A view allows data to be represented in a different way while maintaining the association with the original data. A working example is therefore
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> a
array([1, 2, 3, 4])
>>> b = a[2]
>>> b
3
>>> b=5
>>> a
array([1, 2, 3, 4])
>>> c = a[2:3]
>>> c
array([3])
>>> c[0]=6
>>> a
array([1, 2, 6, 4])
>>>
While extracting a single element still produces an independent value, the slice c is a view that refers to the original element 2 of a (it is element 0 of c), so the change made through c changes a as well.
Numpy ndarrays have many different types, including a generic object type. This means that you can maintain this "by-reference" behavior for almost any type of data, not only numerical values.
Python doesn't do pass by reference. Have the function return the new value and assign it explicitly:
l[1] = ModList(l[1])
Also, since this only changes one element, I'd suggest that ModList is a confusing name.
Python passes object references by value (sometimes called call by sharing), so rebinding the parameter inside ModList cannot change what the caller's list holds. What you can do instead is pass the list and index into ModList and modify the element that way:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'

ModList(l, 1)
In many cases you can also consider letting the function both modify and return the modified list. This makes the calling code more readable:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'
    return theList

l = ModList(l, 1)
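For comparison, here is a self-contained sketch of the two behaviours discussed in this thread. The function names rebind and set_item are hypothetical, picked to describe what each one does.

```python
def rebind(element):
    element = "TWO"           # rebinds the local name only


def set_item(items, index, value):
    items[index] = value      # mutates the list object the caller passed in


words = ["one", "two", "three"]

rebind(words[1])
after_rebind = list(words)    # unchanged

set_item(words, 1, "TWO")

print(after_rebind)           # ['one', 'two', 'three']
print(words)                  # ['one', 'TWO', 'three']
```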

Remove the last N elements of a list

Is there a better way to remove the last N elements of a list?
for i in range(0,n):
    lst.pop( )
Works for n >= 1
>>> L = [1,2,3, 4, 5]
>>> n=2
>>> del L[-n:]
>>> L
[1, 2, 3]
If you wish to remove the last n elements, in other words, keep the first len - n elements:
lst = lst[:len(lst)-n]
Note: This is not an in-place operation. It creates a new list (a shallow copy).
As Vincenzooo correctly says, the pythonic lst[:-n] does not work when n==0.
The following works for all n>=0:
lst = lst[:-n or None]
I like this solution because it is kind of readable in English too: "return a slice omitting the last n elements or none (if none needs to be omitted)".
This solution works because of the following:
x or y evaluates to x when x is logically true (e.g., when it is not 0, "", False, None, ...) and to y otherwise. So -n or None is -n when n!=0 and None when n==0.
When slicing, None is equivalent to omitting the value, so lst[:None] is the same as lst[:].
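Wrapped in a function, the idiom explained above looks like this. The name drop_last is hypothetical, and the sketch assumes n >= 0.

```python
def drop_last(lst, n):
    """Return a new list without the last n items (assumes n >= 0)."""
    return lst[:-n or None]


data = [1, 2, 3, 4, 5]

print(drop_last(data, 2))   # [1, 2, 3]
print(drop_last(data, 0))   # [1, 2, 3, 4, 5]
print(data[:-0])            # []  (the pitfall: -0 is just 0)
print(data)                 # [1, 2, 3, 4, 5]  (original is untouched)
```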
As noted by @swK, this solution creates a new list (but immediately discards the old one unless it's referenced elsewhere) rather than editing the original one. This is often not a problem in terms of performance, as creating a new list in one go is often faster than removing one element at a time (unless n<<len(lst)). It is also often not a problem in terms of space, as usually the members of the list take more space than the list itself (unless it's a list of small objects like bytes or the list has many duplicated entries). Please also note that this solution is not exactly equivalent to the OP's: if the original list is referenced by other variables, this solution will not modify (shorten) the list seen through those other references, unlike the OP's code.
A possible solution (in the same style as my original one) that works for n>=0 but: a) does not create a copy of the list; and b) also affects other references to the same list, could be the following:
lst[-n:n and None] = []
This is definitely not readable and should not be used. Actually, even my original solution requires too much understanding of the language to be quickly read and unambiguously understood by everyone. I wouldn't use either in any real code and I think the best solution is that by @wonder.mice: a[len(a)-n:] = [].
Just try to del it like this.
del list[-n:]
I see this was asked long ago, but none of the answers did it for me. What if we want to get a list without the last N elements, but keep the original one? You just do list[:-n]. If you need to handle cases where n may equal 0, you do list[:-n or None].
>>> a = [1,2,3,4,5,6,7]
>>> b = a[:-4]
>>> b
[1, 2, 3]
>>> a
[1, 2, 3, 4, 5, 6, 7]
As simple as that.
Should be using this:
a[len(a)-n:] = []
or this:
del a[len(a)-n:]
It's much faster, since it really removes items from the existing list. The alternative (a = a[:len(a)-1]) creates a new list object and is less efficient.
>>> import timeit
>>> timeit.timeit("a = a[:len(a)-1]\na.append(1)", setup="a=range(100)", number=10000000)
6.833014965057373
>>> timeit.timeit("a[len(a)-1:] = []\na.append(1)", setup="a=range(100)", number=10000000)
2.0737061500549316
>>> timeit.timeit("a[-1:] = []\na.append(1)", setup="a=range(100)", number=10000000)
1.507638931274414
>>> timeit.timeit("del a[-1:]\na.append(1)", setup="a=range(100)", number=10000000)
1.2029790878295898
If 0 < n you can use a[-n:] = [] or del a[-n:] which is even faster.
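The in-place variant can be wrapped the same way. This is a sketch with a hypothetical helper name; the max(..., 0) clamp is an addition of mine, not part of the answer. Without it, n greater than len(lst) would produce a negative start index and remove fewer items than expected.

```python
def truncate_last(lst, n):
    """Remove the last n items in place (hypothetical helper, assumes n >= 0)."""
    # Clamp so that n > len(lst) empties the list instead of wrapping around.
    del lst[max(len(lst) - n, 0):]


a = [1, 2, 3, 4, 5]
alias = a                 # second reference to the same list

truncate_last(a, 2)
truncate_last(a, 0)       # n == 0 removes nothing

b = [1, 2, 3]
truncate_last(b, 10)      # more than len(b): list becomes empty

print(a)                  # [1, 2, 3]
print(alias)              # [1, 2, 3]  (the alias sees the change too)
print(b)                  # []
```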
This is one of the cases in which being pythonic doesn't work for me and can give hidden bugs or mess.
None of the solutions above works for the case n=0.
Using l[:len(l)-n] works in the general case:
l=range(4)
for n in [2,1,0]: #test values for numbers of points to cut
    print n,l[:len(l)-n]
This is useful for example inside a function to trim edges of a vector, where you want to leave the possibility not to cut anything.

Are Python sets mutable?

Are sets in Python mutable?
In other words, if I do this:
x = set([1, 2, 3])
y = x
y |= set([4, 5, 6])
Are x and y still pointing to the same object, or was a new set created and assigned to y?
>>>> x = set([1, 2, 3])
>>>> y = x
>>>>
>>>> y |= set([4, 5, 6])
>>>> print x
set([1, 2, 3, 4, 5, 6])
>>>> print y
set([1, 2, 3, 4, 5, 6])
Sets are unordered.
Set elements are unique. Duplicate elements are not allowed.
A set itself may be modified, but the elements contained in the set must be hashable, which for built-in types means immutable.
set1 = {1,2,3}
set2 = {1,2,[1,2]}  # TypeError: unhashable type: 'list'
# Set elements must be hashable.
Conclusion: sets are mutable.
Your two questions are different.
Are Python sets mutable?
Yes: "mutable" means that you can change the object. For example, integers are not mutable: you cannot change the number 1 to mean anything else. You can, however, add elements to a set, which mutates it.
Does y = x; y |= {1,2,3} change x?
Yes. The code y = x means "bind the name y to mean the same object that the name x currently represents". The code y |= {1,2,3} calls the magic method y.__ior__({1,2,3}) under the hood, which mutates the object represented by the name y. Since this is the same object as is represented by x, you should expect the set to change.
You can check whether two names point to precisely the same object using the is operator: x is y just if the objects represented by the names x and y are the same object.
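A short sketch of that check, contrasting the augmented assignment with the plain operator. The frozenset part is included because frozenset does not define __ior__, so |= falls back to building a new object there.

```python
x = {1, 2, 3}
y = x

y |= {4}                    # calls y.__ior__: mutates the shared object
same_after_ior = x is y

y = y | {5}                 # builds a new set and rebinds y
same_after_or = x is y

f = frozenset({1})
g = f
g |= {2}                    # no __ior__ on frozenset: g is rebound
frozen_same = f is g

print(same_after_ior)       # True
print(same_after_or)        # False
print(frozen_same)          # False
```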
If you want to copy an object, the usual syntax is y = x.copy() or y = set(x). This is only a shallow copy, however: although it copies the set object, the members of said object are not copied. If you want a deepcopy, use copy.deepcopy(x).
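To see the shallow/deep distinction, the set needs a member that is hashable yet mutable. The Box class below is a hypothetical example of one: plain class instances hash by identity by default.

```python
import copy


class Box:
    """Hypothetical mutable element; hashable via the default identity hash."""

    def __init__(self, value):
        self.value = value


box = Box(1)
x = {box}

shallow = x.copy()           # new set, same Box object inside
deep = copy.deepcopy(x)      # new set, new Box object inside

box.value = 42               # mutate the original member

shallow_value = next(iter(shallow)).value
deep_value = next(iter(deep)).value

print(shallow_value)         # 42
print(deep_value)            # 1
```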
Python sets are classified into two types. Mutable and immutable. A set created with 'set' is mutable while the one created with 'frozenset' is immutable.
>>> s = set(list('hello'))
>>> type(s)
<class 'set'>
The following methods are for mutable sets.
s.add(item) -- Adds item to s. Has no effect if item is already in s.
s.clear() -- Removes all items from s.
s.difference_update(t) -- Removes all the items from s that are also in t.
s.discard(item) -- Removes item from s. If item is not a member of s, nothing happens.
All these operations modify the set s in place. The parameter t can be any object that supports iteration.
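The methods listed above can be exercised in a few lines. This sketch also checks that frozenset has no add method, which is what makes it the immutable variant.

```python
s = set("hello")             # four distinct letters: h, e, l, o
original = s                 # second name for the same object

s.add("w")
s.discard("z")               # absent item: no error
s.difference_update("lo")    # the argument can be any iterable
remaining = sorted(s)

fs = frozenset("hello")
frozen_has_add = hasattr(fs, "add")

s.clear()

print(remaining)             # ['e', 'h', 'w']
print(frozen_has_add)        # False
print(original is s)         # True
```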
After changing the set, even their object references match. I don't know why that textbook says sets are immutable.
>>> s1 ={1,2,3}
>>> id(s1)
140061513171016
>>> s1|={5,6,7}
>>> s1
{1, 2, 3, 5, 6, 7}
>>> id(s1)
140061513171016
print x,y
and you see they both point to the same set:
set([1, 2, 3, 4, 5, 6]) set([1, 2, 3, 4, 5, 6])
Sets are mutable
s = {2,3,4,5,6}
type(s)
<class 'set'>
s.add(9)
s
{2, 3, 4, 5, 6, 9}
We are able to change the elements of a set.
Yes, Python sets are mutable because we can add elements to a set and delete elements from it, but sets can't contain unhashable items such as lists. For example, the code below will give an error:
s = set([[1,2,3],[4,5,6]])
So sets are mutable but can't contain unhashable items: a set internally uses a hash table to store its elements, so they need to be hashable.
Mutable built-in containers like list are not hashable.
Note:
Mutable built-in containers (list, dict, set) are not hashable
Immutable built-in types (int, str, frozenset, tuples of hashable items) are hashable
Just like a key of a dictionary can't be a list.
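Hashability can be probed directly with hash(). The helper name is_hashable below is hypothetical; the sketch also shows that a tuple is only hashable when everything inside it is.

```python
def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


checks = {
    "tuple of ints": is_hashable((1, 2)),
    "frozenset": is_hashable(frozenset({1, 2})),
    "list": is_hashable([1, 2]),
    "tuple holding a list": is_hashable((1, [2])),
}

for name, ok in checks.items():
    print(name, ok)
```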
Sets are mutable: you can add to them. The items they contain can be mutable, but they must be hashable. I didn't see any correct answers in this post, so here is the code:
class MyClass:
    """
    This class is hashable; however, the hashes are
    unique per instance, not per data, so a set will
    have no way to determine equality
    """
    def __init__(self):
        self.my_attr = "no-unique-hash"
    def __repr__(self):
        return self.my_attr

class MyHashableClass:
    """
    This object implements __hash__ and __eq__ and will
    produce the same hash if the data is the same.
    That way a set can remove equal objects.
    """
    def __init__(self):
        self.my_attr = "unique-hash"
    def __hash__(self):
        return hash(str(self))
    def __eq__(self, other):
        return hash(self) == hash(other)
    def __repr__(self):
        return self.my_attr

myclass_instance1 = MyClass()
myclass_instance2 = MyClass()
my_hashable_instance1 = MyHashableClass()
my_hashable_instance2 = MyHashableClass()
my_set = {
    myclass_instance1,
    myclass_instance2,
    my_hashable_instance1,
    my_hashable_instance2,  # will be removed, not unique
}  # sets can contain mutable types
# The only objects a set cannot contain are objects
# with __hash__ = None, such as list, dict, and set
print(my_set)
# prints {unique-hash, no-unique-hash, no-unique-hash}
my_hashable_instance1.my_attr = "new-hash"  # mutating the object
# now that the hashes of the two objects are different
# instance2 can be added
my_set.add(my_hashable_instance2)
print(my_set)
# {new-hash, no-unique-hash, no-unique-hash, unique-hash}
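One consequence of mutating an element whose hash depends on its data is worth spelling out: the set still stores it under the old hash, so membership tests can stop finding it. A sketch with a hypothetical Label class, using integer keys so the hashes are deterministic:

```python
class Label:
    """Hypothetical class whose hash depends on a mutable attribute."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Label) and self.key == other.key


item = Label(1)
bucket = {item}
found_before = item in bucket       # looked up under hash(1)

item.key = 2                        # hash changes while the item is stored
found_after = item in bucket        # looked up under hash(2)
still_stored = any(obj is item for obj in bucket)

print(found_before)                 # True
print(found_after)                  # False
print(still_stored)                 # True
```

This is why mutable items in a set are usually safe only when their hash does not depend on the parts that change.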
I don't think Python sets are mutable as mentioned clearly in book "Learning Python 5th Edition by Mark Lutz - O'Reilly Publications"
