Say I have a list l. Under what circumstances is l.__rmul__(other) called?
I basically understood the documentation, but I would also like to see an example to clarify its usages beyond any doubt.
When Python attempts to multiply two objects, it first tries to call the left object's __mul__() method. If the left object doesn't have a __mul__() method (or the method returns NotImplemented, indicating it doesn't work with the right operand in question), then Python wants to know if the right object can do the multiplication. If the right operand is the same type as the left, Python knows it can't, because if the left object can't do it, another object of the same type certainly can't either.
If the two objects are different types, though, Python figures it's worth a shot. However, it needs some way to tell the right object that it is the right object in the operation, in case the operation is not commutative. (Multiplication is, of course, but not all operators are, and in any case * is not always used for multiplication!) So it calls __rmul__() instead of __mul__().
As an example, consider the following two statements:
print("nom" * 3)
print(3 * "nom")
In the first case, Python calls the string's __mul__() method. The string knows how to multiply itself by an integer, so all is well. In the second case, the integer does not know how to multiply itself by a string, so its __mul__() returns NotImplemented and the string's __rmul__() is called. It knows what to do, and you get the same result as the first case.
Now we can see that __rmul__() allows all of the string's special multiplication behavior to be contained in the str class, such that other types (such as integers) do not need to know anything about strings to be able to multiply by them. A hundred years from now (assuming Python is still in use) you will be able to define a new type that can be multiplied by an integer in either order, even though the int class has known nothing of it for more than a century.
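To see that symmetry at work, here is a minimal sketch of such a type (the class name Nom is invented for illustration):

```python
class Nom:
    """Hypothetical type that knows how to multiply itself by an int."""
    def __init__(self, count):
        self.count = count

    def __mul__(self, other):
        if isinstance(other, int):
            return Nom(self.count * other)
        return NotImplemented

    # '*' with an int is commutative here, so reuse the same logic
    __rmul__ = __mul__

print((Nom(2) * 3).count)  # Nom.__mul__ handles it: 6
print((3 * Nom(2)).count)  # int.__mul__ returns NotImplemented, so Nom.__rmul__ runs: 6
```

The int class knows nothing about Nom, yet multiplication works in either order.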
By the way, the string class's __mul__() has a bug in some versions of Python. If it doesn't know how to multiply itself by an object, it raises a TypeError instead of returning NotImplemented. That means you can't multiply a string by a user-defined type even if the user-defined type has an __rmul__() method, because the string never lets it have a chance. The user-defined type has to go first (e.g. Foo() * 'bar' instead of 'bar' * Foo()) so its __mul__() is called. They seem to have fixed this in Python 2.7 (I tested it in Python 3.2 also), but Python 2.6.6 has the bug.
Binary operators by their nature have two operands. Each operand may be on either the left or the right side of an operator. When you overload an operator for some type, you can specify for which side of the operator the overloading is done. This is useful when invoking the operator on two operands of different types. Here's an example:
class Foo(object):
    def __init__(self, val):
        self.val = val

    def __str__(self):
        return "Foo [%s]" % self.val

class Bar(object):
    def __init__(self, val):
        self.val = val

    def __rmul__(self, other):
        return Bar(self.val * other.val)

    def __str__(self):
        return "Bar [%s]" % self.val

f = Foo(4)
b = Bar(6)
obj = f * b   # Bar [24]
obj2 = b * f  # ERROR
Here, obj will be a Bar with val = 24, but the assignment to obj2 generates an error because Bar has no __mul__ and Foo has no __rmul__.
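As a sketch of one possible fix (my addition, not part of the original example): give Bar a __mul__ as well, so both orders work.

```python
class Foo(object):
    def __init__(self, val):
        self.val = val

class Bar(object):
    def __init__(self, val):
        self.val = val

    def __mul__(self, other):   # handles Bar * Foo
        return Bar(self.val * other.val)

    __rmul__ = __mul__          # handles Foo * Bar, as before

f = Foo(4)
b = Bar(6)
print((f * b).val)  # 24
print((b * f).val)  # 24 -- no longer an error
```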
I hope this is clear enough.
In some textbook Point-class exercises, __mul__() is defined to perform a dot product, so the result is a scalar, i.e. x1*x2 + y1*y2, while __rmul__() is defined to perform element-wise scaling, returning a point with x = x1*x2 and y = y1*y2. Note that this split is a design choice of that particular exercise, not something Python imposes: Python only decides which of the two methods gets called, as described above.
How can I pass an integer by reference in Python?
I want to modify the value of a variable that I am passing to the function. I have read that everything in Python is pass by value, but there has to be an easy trick. For example, in Java you could pass the reference types of Integer, Long, etc.
How can I pass an integer into a function by reference?
What are the best practices?
It doesn't quite work that way in Python. Python passes references to objects. Inside your function you have an object -- you're free to mutate that object (if possible). However, integers are immutable. One workaround is to pass the integer in a container which can be mutated:
def change(x):
    x[0] = 3

x = [1]
change(x)
print(x)  # [3]
This is ugly/clumsy at best, but you're not going to do any better in Python. The reason is that in Python, assignment (=) takes whatever object is the result of the right-hand side and binds it to whatever is on the left-hand side* (or passes it to the appropriate function).
Understanding this, we can see why there is no way to change the value of an immutable object inside a function -- you can't change any of its attributes because it's immutable, and you can't just assign the "variable" a new value because then you're actually creating a new object (which is distinct from the old one) and giving it the name that the old object had in the local namespace.
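A quick way to watch that rebinding happen, using id() to identify objects (the details below reflect CPython):

```python
def reassign(y):
    y = 99          # rebinds the *local* name y to a brand-new object
    return id(y)

x = 1
inner_id = reassign(x)
print(x)                  # still 1; the caller's binding never changed
print(inner_id == id(x))  # False: the function ended up naming a different object
```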
Usually the workaround is to simply return the object that you want:
def multiply_by_2(x):
    return 2 * x

x = 1
x = multiply_by_2(x)  # x is now 2
*In the first example above, 3 actually gets passed to x.__setitem__.
Most cases where you would need to pass by reference are where you need to return more than one value back to the caller. A "best practice" is to use multiple return values, which is much easier to do in Python than in languages like Java.
Here's a simple example:
import math

def RectToPolar(x, y):
    r = (x ** 2 + y ** 2) ** 0.5
    theta = math.atan2(y, x)
    return r, theta  # return 2 things at once

r, theta = RectToPolar(3, 4)  # assign 2 things at once
Not exactly passing a value directly, but using it as if it was passed.
def outer():
    x = 7
    def my_method():
        nonlocal x  # rebinds the x in the enclosing function's scope
        x += 1
    my_method()
    print(x)  # 8

outer()
Caveats:
nonlocal was introduced in Python 3
If the enclosing scope is the global one, use global instead of nonlocal.
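For completeness, here is what the module-level variant of that caveat looks like with global:

```python
x = 7

def my_method():
    global x  # x lives at module (global) scope, so global, not nonlocal
    x += 1

my_method()
print(x)  # 8
```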
Maybe it's not the Pythonic way, but you can do this:

import ctypes

def incr(a):
    a.value += 1  # mutates the underlying C int in place

x = ctypes.c_int(1)  # create a mutable C-style int
incr(x)              # the c_int object itself is passed, so the change sticks
print(x.value)       # 2
Really, the best practice is to step back and ask whether you really need to do this. Why do you want to modify the value of a variable that you're passing in to the function?
If you need to do it for a quick hack, the quickest way is to pass a list holding the integer, and stick a [0] around every use of it, as mgilson's answer demonstrates.
If you need to do it for something more significant, write a class that has an int as an attribute, so you can just set it. Of course this forces you to come up with a good name for the class, and for the attribute—if you can't think of anything, go back and read the sentence again a few times, and then use the list.
More generally, if you're trying to port some Java idiom directly to Python, you're doing it wrong. Even when there is something directly corresponding (as with static/@staticmethod), you still don't want to use it in most Python programs just because you'd use it in Java.
Maybe slightly more self-documenting than the list-of-length-1 trick is the old empty type trick:
def inc_i(v):
    v.i += 1

x = type('', (), {})()  # an instance of a freshly created, empty class
x.i = 7
inc_i(x)
print(x.i)  # 8
A numpy single-element array is mutable and yet for most purposes, it can be evaluated as if it was a numerical python variable. Therefore, it's a more convenient by-reference number container than a single-element list.
import numpy as np

def triple_var_by_ref(x):
    x[0] = x[0] * 3

a = np.array([2])
triple_var_by_ref(a)
print(a + 1)

output:

[7]
The correct answer is to use a class and put the value inside the class; this lets you pass by reference exactly as you desire.
class Thing:
    def __init__(self, a):
        self.a = a

def dosomething(ref):
    ref.a += 1

t = Thing(3)
dosomething(t)
print("T is now", t.a)  # T is now 4
In Python, every value is a reference (a pointer to an object), just like non-primitives in Java. Also, like Java, Python only has pass by value. So, semantically, they are pretty much the same.
Since you mention Java in your question, I would like to see how you achieve what you want in Java. If you can show it in Java, I can show you how to do it exactly equivalently in Python.
class PassByReference:
    def Change(self, var):
        self.a = var
        print(self.a)

s = PassByReference()
s.Change(5)
class Obj:
    def __init__(self, a):
        self.value = a

    def sum(self, a):
        self.value += a

a = Obj(1)
b = a
a.sum(1)
print(a.value, b.value)  # 2 2
In Python, everything is passed by value, but if you want to modify some state, you can change the value of an integer inside a list or object that's passed to a method.
Integers are immutable in Python: once they are created we cannot change their value. Using the assignment operator on a variable does not modify the object; it just makes the name point to some other address, not the previous one.
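You can observe this with id(), which reports an object's identity (its address in CPython):

```python
x = 10
before = id(x)
x += 1           # does not modify the int 10; rebinds x to the int 11
after = id(x)
print(x)                # 11
print(before == after)  # False: x now names a different object
```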
In Python a function can return multiple values, and we can make use of that:
def swap(a, b):
    return b, a

a, b = 22, 55
a, b = swap(a, b)
print(a, b)  # 55 22
To change the reference a variable is pointing to, we can wrap immutable data types (int, float, complex, str, bytes, tuple, frozenset) inside mutable data types (bytearray, list, set, dict).
# var is an instance of the dictionary type
def change(var, key, new_value):
    var[key] = new_value

var = dict()
var['a'] = 33
change(var, 'a', 2625)
print(var['a'])  # 2625
Reading this answer it seems that if __eq__ is defined in a custom class, __hash__ needs to be defined as well. This is understandable.
However, it is not clear why, effectively, a == b should imply self.__hash__() == other.__hash__().
Imagining a class like this:
class Foo:
    ...
    self.Name
    self.Value
    ...
    def __eq__(self, other):
        return self.Value == other.Value
    ...
    def __hash__(self):
        return id(self.Name)
This way class instances could be compared by value, which could be the only reasonable use, but considered identical by name.
This way set could not contain multiple instances with equal name, but comparison would still work.
What could be the problem with such definition?
The reason for defining __eq__, __lt__ and others by Value is to be able to sort instances by Value and to be able to use functions like max. For example, the class should represent a physical output of a device (say, a heating element). Each of these outputs has a unique Name. The Value is the power of the output device. To find the optimal combination of heating elements to turn on, it is useful to be able to compare them by power (Value). In a set or dictionary, however, it should not be possible to have multiple outputs with the same name. Of course, different outputs with different names might easily have equal power.
The problem is that it does not make sense: a hash is used to do efficient bucketing of objects. Consequently, when you have a set, which is implemented as a hash table, each hash points to a bucket, which is usually a list of elements. To check whether an element is in the set (or another hash-based container), you go to the bucket pointed to by its hash and then iterate over all elements in that list, comparing them one by one.
In other words, a hash is not supposed to be a comparator (it can, and sometimes should, give you false positives). In particular, in your example, your set will not work: it will not recognize duplicates, because they do not compare equal to each other.
class Foo:
    def __eq__(self, other):
        return self.Value == other.Value
    def __hash__(self):
        return id(self.Name)

a = set()
el = Foo()
el.Name = 'x'
el.Value = 1
el2 = Foo()
el2.Name = 'x'
el2.Value = 2
a.add(el)
a.add(el2)
print(len(a))  # should be 1, right? Well, it is 2
Actually it is even worse than that: if you have two objects with the same values but different names, they are not recognized as the same either.
class Foo:
    def __eq__(self, other):
        return self.Value == other.Value
    def __hash__(self):
        return id(self.Name)

a = set()
el = Foo()
el.Name = 'x'
el.Value = 2
el2 = Foo()
el2.Name = 'a'
el2.Value = 2
a.add(el)
a.add(el2)
print(len(a))  # should be 1, right? Well, it is 2 again
while doing it properly (thus, "if a == b, then hash(a) == hash(b)") gives:
class Foo:
    def __eq__(self, other):
        return self.Name == other.Name
    def __hash__(self):
        return id(self.Name)

a = set()
el = Foo()
el.Name = 'x'
el.Value = 1
el2 = Foo()
el2.Name = 'x'
el2.Value = 2
a.add(el)
a.add(el2)
print(len(a))  # is really 1
Update
There is also a non-deterministic part, which is hard to reproduce, but essentially a hash does not uniquely define a bucket. Usually it is something like
bucket_id = hash(object) % size_of_allocated_memory
Consequently, things that have different hashes can still end up in the same bucket. As a result, you can get two elements considered equal to each other (inside the set) due to equality of Values even though the Names differ, as well as the other way around, depending on the actual internal implementation, memory constraints, etc.
In general there are many more examples where things can go wrong, as a hash is defined as a function h : X -> Z such that x == y => h(x) == h(y), and people implementing containers, authorization protocols, and other tools are free to assume this property. If you break it, every single tool using hashes can break. Furthermore, it can break over time, meaning that you update some library and your code stops working, because a valid update to the underlying libraries (relying on the above assumption) can expose your violation of it.
Update 2
Finally, in order to solve your issue: you simply should not define your __eq__ and __lt__ operators to handle sorting. They are about actual comparison of the elements, which should be compatible with the rest of the behaviours. All you have to do is use a separate key function in your sorting routines (sorting in Python accepts an arbitrary key= function, so you do not need to rely on <, >, etc.). The other way around is to instead have valid <, >, == defined on values, but in order to keep names unique, keep a set of... well... names, and not the objects themselves. Whichever path you choose, the crucial element here is:
equality and hashing have to be compatible, that's all.
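The key-function approach can be sketched like this, using a hypothetical Output class standing in for the heating elements described in the question (the names are invented for illustration):

```python
class Output:
    def __init__(self, name, power):
        self.name = name
        self.power = power

outputs = [Output('heater1', 500), Output('heater2', 250), Output('heater3', 1000)]

# sort by power without defining __lt__ on the class at all
by_power = sorted(outputs, key=lambda o: o.power)
print([o.name for o in by_power])  # ['heater2', 'heater1', 'heater3']

# max works the same way via the key function
print(max(outputs, key=lambda o: o.power).name)  # 'heater3'

# uniqueness by name: keep a set of names, not a set of objects
names = {o.name for o in outputs}
print(len(names))  # 3
```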
It is possible to implement your class like this and not have any problems. However, you have to be 100% sure that no two different objects will ever produce the same hash. Consider the following example:
class Foo:
    def __init__(self, name, value):
        self.name = name
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
    def __hash__(self):
        return hash(self.name[0])

s = set()
s.add(Foo('a', 1))
s.add(Foo('b', 1))
print(len(s))  # output: 2
But you have a problem if a hash collision occurs:
s.add(Foo('abc', 1))
print(len(s)) # output: 2
In order to prevent this, you would have to know exactly how the hashes are generated (which, if you rely on functions like id or hash, might vary between implementations!) and also the values of the attribute(s) used to generate the hash (name in this example). That's why ruling out the possibility of a hash collision is very difficult, if not impossible. It's basically like begging for unexpected things to happen.
I am working on a class that requires multiple rules to validate against, e.g. whether a certain pattern appears more than, equal to, or less than a certain number of times. I have the output of the regular expression I am validating in a list and am checking the length of the list. Now how do I call one of these methods (__gt__, __eq__, etc.) dynamically?
My approach:
func = getattr(len(re.findall(r'\d+', 'password1')), method_name)
# method_name can be any one of the values '__eq__', '__gt__', etc.
if func(desired_length):
    print(":validated")
else:
    raise Exception("Not sufficient complexity")
For example, for method_name='__eq__' and desired_length=1, the above will result in True. For method_name='__gt__' and desired_length=1, it will result in False (it would be True only if a number appeared in the string more than once).
But I realize int objects don't really implement these methods. Is there any way I can achieve this?
Rather than using getattr on the int instance here, maybe you should consider the operator module. Then you can grab the comparison operator and pass the int instance and the desired length. A full example would be something like this:
import operator
import re

method_name = '__eq__'
desired_length = 1

func = getattr(operator, method_name)
n_ints = len(re.findall(r'\d+', 'password1'))
if func(n_ints, desired_length):
    print('Yeah Buddy!')
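An alternative sketch (the validate function and ops mapping are my additions for illustration): the operator module also exposes plainly named functions such as operator.eq and operator.gt, so you can map friendlier symbols to them instead of dunder names:

```python
import operator
import re

ops = {'==': operator.eq, '>': operator.gt, '<': operator.lt, '>=': operator.ge}

def validate(password, pattern, op_symbol, desired_length):
    """Count pattern matches in password and compare via the chosen operator."""
    n = len(re.findall(pattern, password))
    return ops[op_symbol](n, desired_length)

print(validate('password1', r'\d+', '==', 1))  # True
print(validate('password1', r'\d+', '>', 1))   # False
```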
I want instances of my custom class to be able to compare themselves to one another for similarity. This is different than the __cmp__ method, which is used for determining the sorting order of objects.
Is there a magic method that makes sense for this? Is there any standard syntax for doing this?
How I imagine this could look:
>>> x = CustomClass("abc")
>>> y = CustomClass("abd")
>>> z = CustomClass("xyz")
>>> x.__<???>__(y)
0.75
>>> x <?> y
0.75
>>> x.__<???>__(z)
0.0
>>> x <?> z
0.0
Where <???> is the magic method name and <?> is the operator.
Take a look at the numeric types emulation in the datamodel and pick an operator hook that suits you.
I don't think there is currently an operator that is an exact match, though, so you'll end up surprising some poor hapless future code maintainer (who could even be you) by overloading a standard operator.
For a Levenshtein Distance I'd just use a regular method instead. I'd find a one.similarity(other) method a lot clearer when reading the code.
Well, you could override __eq__ to mean both boolean logical equality and 'fuzzy' similarity, by returning a sufficiently weird result from __eq__:
class FuzzyBool(object):
    def __init__(self, quality, tolerance=0):
        self.quality, self._tolerance = quality, tolerance

    def __bool__(self):  # __nonzero__ in Python 2
        return self.quality <= self._tolerance
    __nonzero__ = __bool__

    def tolerance(self, tolerance):
        return FuzzyBool(self.quality, tolerance)

    def __repr__(self):
        return "sorta %s" % bool(self)

class ComparesFuzzy(object):
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return FuzzyBool(abs(self.value - other.value))

    def __hash__(self):
        return hash((ComparesFuzzy, self.value))
>>> a = ComparesFuzzy(1)
>>> b = ComparesFuzzy(2)
>>> a == b
sorta False
>>> (a == b).tolerance(3)
sorta True
The default behavior of the comparator should be that it is truthy only if the compared values are exactly equal, so that normal equality is unaffected.
No, there is not. You can make a class method, but I don't think there is any intuitive operator to overload that would do what you're looking for. And, to avoid confusion, I would avoid overloading unless it is obviously intuitive.
I would simply call it x.similarity(y).
I don't think there is a magic method (and corresponding operator) that would make sense for this in any context.
However, if, with a bit of fantasy, your instances can be seen as vectors, then checking for similarity could be analogous to calculating the scalar product. It would make sense then to use __mul__ and multiplication sign for this (unless you have already defined product for CustomClass instances).
No magic function/operator for that.
When I think of "similarity" for ints and floats, I think of the difference being lower than a certain threshold. Perhaps that's something you might use?
E.g. being able to calculate the "difference" between your objects might be suitable in the __sub__ method.
In the example you've cited, I would use difflib. This conducts spell-check like comparisons between strings. But in general, if you really are comparing objects rather than strings, then I agree with the others; you should probably create something context-specific.
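For string contents, a minimal sketch using difflib.SequenceMatcher (note the ratios it produces differ from the hypothetical 0.75 in the question):

```python
import difflib

def similarity(a, b):
    """Ratio of matching characters: 0.0 (nothing shared) to 1.0 (identical)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

print(similarity("abc", "abd"))  # 0.666...
print(similarity("abc", "xyz"))  # 0.0
```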
I've noticed that when an instance with an overloaded __str__ method is passed to the print function as an argument, it prints as intended. However, when passing a container that contains one of those instances to print, it uses the __repr__ method instead. That is to say, print(x) displays the correct string representation of x, and print(x, y) works correctly, but print([x]) or print((x, y)) prints the __repr__ representation instead.
First off, why does this happen? Secondly, is there a way to correct that behavior of print in this circumstance?
The problem with the container using the objects' __str__ would be the total ambiguity -- what would it mean, say, if print(L) showed [1, 2]? L could be ['1, 2'] (a single-item list whose string item contains a comma) or any of four 2-item lists (since each item can be a string or an int). The ambiguity of type is common for print, of course, but the total ambiguity about the number of items (since each comma could be delimiting items or be part of a string item) was the decisive consideration.
I'm not sure why exactly the __str__ method of a list returns the __repr__ of the objects contained within - so I looked it up: [Python-3000] PEP: str(container) should call str(item), not repr(item)
Arguments for it:
-- containers refuse to guess what the user wants to see on str(container) - surroundings, delimiters, and so on;
-- repr(item) usually displays type information - apostrophes around strings, class names, etc.
So it's more clear about what exactly is in the list (since the object's string representation could have commas, etc.). The behavior is not going away, per Guido "BDFL" van Rossum:
Let me just save everyone a lot of
time and say that I'm opposed to this
change, and that I believe that it
would cause way too much disturbance
to be accepted this close to beta.
Now, there are two ways to resolve this issue for your code.
The first is to subclass list and implement your own __str__ method.
class StrList(list):
    def __str__(self):
        string = "["
        for index, item in enumerate(self):
            string += str(item)
            if index != len(self) - 1:
                string += ", "
        return string + "]"

class myClass(object):
    def __str__(self):
        return "myClass"
    def __repr__(self):
        return object.__repr__(self)
And now to test it:
>>> objects = [myClass() for _ in range(10)]
>>> print(objects)
[<__main__.myClass object at 0x02880DB0>, #...
>>> objects = StrList(objects)
>>> print(objects)
[myClass, myClass, myClass #...
>>> import random
>>> sample = random.sample(objects, 4)
>>> print(sample)
[<__main__.myClass object at 0x02880F10>, ...
I personally think this is a terrible idea. Some functions - such as random.sample, as demonstrated - actually return list objects, even if you sub-classed lists. So if you take this route there may be a lot of result = StrList(function(mylist)) calls, which could be inefficient. It's also a bad idea because then you'll probably have half of your code using regular list objects, since you don't print them, and the other half using StrList objects, which can lead to your code getting messier and more confusing. Still, the option is there, and this is the only way to get the print function (or statement, for 2.x) to behave the way you want it to.
The other solution is just to write your own function strList() which returns the string the way you want it:
def strList(theList):
    string = "["
    for index, item in enumerate(theList):
        string += str(item)
        if index != len(theList) - 1:
            string += ", "
    return string + "]"
>>> mylist = [myClass() for _ in range(10)]
>>> print(strList(mylist))
[myClass, myClass, myClass #...
Both solutions require that you refactor existing code, unfortunately - but the behavior of str(container) is here to stay.
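For what it's worth, the same helper can be written more compactly with str.join; a sketch equivalent to the strList function above:

```python
def str_list(the_list):
    """Like str(list), but using each item's str() instead of its repr()."""
    return "[%s]" % ", ".join(str(item) for item in the_list)

print(str_list([1, "two", 3.0]))  # [1, two, 3.0]
```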
Because when you print the list, generally you're looking from the programmer's perspective, or debugging. If you meant to display the list, you'd process its items in a meaningful way, so repr is used.
If you want your objects to be printed while in containers, define __repr__:
class MyObject:
    def __str__(self): return ""
    __repr__ = __str__
Of course, __repr__ should ideally return a string that could be used as code to recreate your object, but you can do what you want.