I have read
What is the difference between @staticmethod and @classmethod in Python?
Python @classmethod and @staticmethod for beginner?
Since a staticmethod can't access the instance of its class, I don't see how it differs from a global function.
And when should I use a staticmethod? Can you give a good example?
Like a global function, a static method cannot access the instance of the containing class. But it conceptually belongs to the containing class. The other benefit is that it can avoid name conflicts.
When a function is designed to serve one particular class, it's advisable to make it a static method of that class. This is called cohesion. Besides, if the function is not used outside the class, you can prefix its name with an underscore to mark it as "private"; this is called information hiding (even though Python doesn't really support private methods). As a rule of thumb, exposing as few interfaces as possible makes code cleaner and less subject to change.
Even if the function is supposed to serve as a shared utility for many classes across multiple modules, making it a global function is still not the first choice. Consider making it a static method of some utility class, or a global function in some specialized module. One reason is that collecting functions with a similar purpose into a common class or module adds another level of abstraction/modularization (for small projects, some people may argue that this is overengineering). The other reason is that it may reduce namespace pollution.
A static method is contained in a class (adding a namespace, as pointed out by @MartijnPieters). A global function is not.
IMO it is more of a design question than a technical one. If you feel that the logic belongs to a class (not the instance), add it as a staticmethod; if it's unrelated, implement it as a global function.
For example:
    class Image(object):
        @staticmethod
        def to_grayscale(pixel):
            ...
IMO is better than
    def to_grayscale(pixel):
        ...
    class Image(object):
        ...
Related
I have a class that includes some auxiliary functions that do not operate on object data. Ordinarily I would leave these methods private, but I am in Python so there is no such thing. In testing, I am finding it slightly goofy to have to instantiate an instance of my class in order to be able to call these methods. Is there a solid theoretical reason to choose to keep these methods non-static or to make them static?
If a method does not need access to the current instance, you may want to make it either a classmethod, a staticmethod or a plain function.
A classmethod gets the current class as its first parameter. This enables it to access the class attributes, including other classmethods or staticmethods. This is the right choice if your method needs to call other classmethods or staticmethods.
A staticmethod only gets its explicit arguments - actually it is nothing but a function that can be resolved on the class or instance. The main points of staticmethods are specialization - you can override a staticmethod in a child class and have a method (classmethod or instance method) of the base class call the overridden version of the staticmethod - and, possibly, ease of use - you don't need to import the function separately from the class (at this point it's bordering on laziness, but I've had a couple of cases with dynamic imports etc. where it happened to be handy).
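To make the "specialization" point concrete, here is a minimal sketch. The names Report, ShoutingReport, format_line and render are made up for illustration; the only thing it relies on is that a staticmethod looked up through cls resolves on the class the call was actually made on.

```python
class Report(object):
    @staticmethod
    def format_line(text):
        # Default behaviour: leave the line untouched.
        return text

    @classmethod
    def render(cls, lines):
        # Looking format_line up on cls (not on Report) is what lets
        # a subclass substitute its own staticmethod.
        return "\n".join(cls.format_line(line) for line in lines)


class ShoutingReport(Report):
    @staticmethod
    def format_line(text):
        # Overrides the base staticmethod; render() is inherited as-is.
        return text.upper()


print(Report.render(["a", "b"]))
print(ShoutingReport.render(["a", "b"]))
```

A plain module-level function could not be swapped out this way without the base class being told about the replacement explicitly.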
A plain function is, well, just a plain function - no class-based dispatch, no inheritance, no fancy stuff. But if it's only a helper function used internally by a couple classes in the same module and not part of the classes nor module API, it's possibly just what you're looking for.
As a last note: you can have "some kind of" privacy in Python. Mostly, prefixing a name (whether an attribute / method / class or plain function) with a single leading underscore means "this is an implementation detail, it's NOT part of the API, you're not even supposed to know it exists, it might change or disappear without notice, so if you use it and your code breaks then it's your problem".
If you want to keep said methods in the class just for structural reasons, you might as well make them static, by using the @staticmethod decorator:
    class Foo():
        @staticmethod
        def my_static_method(*args, **kwargs):
            ...
Your first argument will not be interpreted as the object itself, and you can call the method on either the class or an object of that class. If you still need to access class attributes in your method though, you can make it a class method:
    class Bar():
        counter = 0
        @classmethod
        def my_class_method(cls, *args, **kwargs):
            cls.counter += 1
            ...
Your first argument of the class method will obviously be the class instead of the instance.
If you do not use any class or instance attribute, I can see no "theoretical" reason not to make them static. Some IDEs even flag this with a soft warning, prompting you to make the method static if it does not use or mutate any class/instance attribute.
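As a rough illustration of how the two decorators behave when called, consider the sketch below. The names bump, describe and Child are invented for the example. One detail worth noticing is that cls.counter += 1 rebinds the attribute on whichever class the call went through, so a subclass ends up with its own counter.

```python
class Bar(object):
    counter = 0

    @classmethod
    def bump(cls):
        # Rebinds 'counter' on the class the call was made through.
        cls.counter += 1
        return cls.counter

    @staticmethod
    def describe(value):
        # No access to cls or self; only the explicit argument.
        return "value=%s" % value


class Child(Bar):
    pass


first = Child.bump()    # creates Child.counter; Bar.counter is untouched
second = Bar().bump()   # called on an instance; cls is still Bar
third = Bar.bump()      # called on the class directly

print(first, second, third, Child.counter, Bar.counter)
print(Bar.describe(3), Bar().describe(3))
```

If a single shared counter is wanted regardless of subclass, the method would have to name Bar explicitly instead of using cls.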
In Python, I have a class that I've built.
However, there is one method where I apply a rather specific type of substring-search procedure. This procedure could be a standalone function by itself (it just requires a needle and a haystack string), but it feels odd to have the function outside the class, because my class depends on it.
What is the typical design paradigm for this? Is it typical to just have myClassName.py with the main class, as well as all the support functions outside the class itself, in the same file? Or is it better to have the support function embedded within the class at the expense of modularity?
You can create a staticmethod, like so:
    class yo:
        @staticmethod
        def say_hi():
            print("Hi there!")
Then, you can do this:
>>> yo.say_hi()
Hi there!
>>> a = yo()
>>> a.say_hi()
Hi there!
They can be used non-statically, and statically (if that makes sense).
About where to put your functions...
If a method is required by a class, and it is appropriate for the method to operate on data that is specific to the class, then make it a method. This is what you would want:
    class yo:
        def __init__(self):
            self.message = "Hello there!"
        def say_message(self):
            print(self.message)
My say_message relies on the data that is particular to the instance of a class.
If you feel the need to have a function in addition to the method, by all means go ahead. Use whichever one is more appropriate in your script. There are many examples of this, including in the Python built-ins. Take generator objects for example:
    a = my_new_generator()
    a.__next__()  # spelled a.next() in Python 2
Can also be done as:
    a = my_new_generator()
    next(a)
Use whichever is more appropriate, and obviously whichever one is more readable. :)
If you can think of any reason to override this function one day, make it a staticmethod; otherwise a plain function is just fine - FWIW, your class probably depends on much more than this simple function. And if you cannot think of any reason for anyone else to ever use this function, keep it in the same module as your class.
As a side note: "myClassName.py" is definitely unpythonic. First because module names should be all_lower, and second because the one-module-per-class approach makes no sense in Python - we group related classes and functions (and exceptions and whatnot) together.
If the search method you are talking about is really so specific and you will never need to reuse it somewhere else, I do not see any reason to make it static. The fact that it doesn't require access to instance variables doesn't make it static by definition.
If there is a possibility, that this method is going to be reused, refactor it into a helper/utility class (no static again).
ADDED:
Just wanted to add that when you consider whether something should be static or not, think about how the method name relates to the class name. Does the method name make more sense when used in a class context or in an object context?
I am writing a framework to be used by people who know some Python. I have settled on some syntax, and it makes sense to me and them to use something like this, where Base is the Base class that implements the framework.
    class A(Base):
        @decorator1
        @decorator2
        @decorator3
        def f(self):
            pass

        @decorator4
        def f(self):
            pass

        @decorator5
        def g(self):
            pass
All my framework is implemented via Base's metaclass. This setup is appropriate for my use case, because all these user-defined classes have a rich inheritance graph. I expect the user to implement some of the methods, or just leave it with pass. Much of the information that the user is giving here is in the decorators. This allows me to avoid other solutions where the user would have to monkey-patch, give less structured dictionaries, and things like that.
My problem here is that f is defined twice by the user (with good reason), and this should be handled by my framework. Unfortunately, by the time this gets to the metaclass's __new__ method, the dictionary of attributes contains only one key, f. My idea was to use yet another decorator, such as @duplicate, for the user to signal this is happening, and for the two f's to be wrapped differently so they don't overwrite each other. Can something like this work?
You should use a namespace to distinguish the different fs.
Heed the advice of the "Zen of Python":
Namespaces are one honking great idea -- let's do more of those!
Yes, you could, but only with an ugly hack, and only by storing the function with a new name.
You'd have to use the sys._getframe() function to retrieve the local namespace of the calling frame. That local namespace is the class-in-construction, and adding items to that namespace means they'll end up in the class dictionary passed to your metaclass.
The following retrieves that namespace:
    import sys
    callframe = sys._getframe(1)
    namespace = callframe.f_locals
namespace is a dict-like object, just like locals(). You can now store something in that namespace (like __function_definitions__ or similar) to add extra references to the functions.
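Put together, the hack might look roughly like the sketch below (Python 3 syntax). The decorator name duplicate comes from the question; the key __function_definitions__ is the one suggested above; Meta and the all_definitions attribute are assumptions made up for this example. It relies on sys._getframe, which is a CPython implementation detail, so treat it as a demonstration rather than a recommendation.

```python
import sys


def duplicate(func):
    # Frame 1 is the caller: the class body that is currently executing.
    namespace = sys._getframe(1).f_locals
    # Keep every definition under an extra key so that a later
    # definition with the same name does not erase the earlier one.
    registry = namespace.setdefault("__function_definitions__", {})
    registry.setdefault(func.__name__, []).append(func)
    return func


class Meta(type):
    def __new__(mcls, name, bases, attrs):
        # Pull the extra key out before the class is built.
        definitions = attrs.pop("__function_definitions__", {})
        cls = super().__new__(mcls, name, bases, attrs)
        cls.all_definitions = definitions  # hypothetical attribute name
        return cls


class Base(metaclass=Meta):
    pass


class A(Base):
    @duplicate
    def f(self):
        return "first"

    @duplicate
    def f(self):
        return "second"


print([fn(None) for fn in A.all_definitions["f"]])
print(A().f())
```

The attribute f on the class is still the last definition; the earlier one survives only through the extra registry, which is the "storing the function with a new name" caveat mentioned above.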
You might be thinking of Java, with method overloading by argument signature, but this is Python and you cannot do this. The second f() will override the first f() and you end up with only one f(). The namespace is a dictionary and you cannot have duplicate keys.
Python classes have no concept of public/private, so we are told to not touch something that starts with an underscore unless we created it. But does this not require complete knowledge of all classes from which we inherit, directly or indirectly? Witness:
    class Base(object):
        def __init__(self):
            super(Base, self).__init__()
            self._foo = 0
        def foo(self):
            return self._foo + 1
    class Sub(Base):
        def __init__(self):
            super(Sub, self).__init__()
            self._foo = None
    Sub().foo()
Expectedly, a TypeError is raised when None + 1 is evaluated. So I have to know that _foo exists in the base class. To get around this, __foo can be used instead, which solves the problem by mangling the name. This seems to be, if not elegant, an acceptable solution. However, what happens if Base inherits from a class (in a separate package) called Sub? Now __foo in my Sub overrides __foo in the grandparent Sub.
This implies that I have to know the entire inheritance chain, including all "private" objects each uses. The fact that Python is dynamically-typed makes this even harder, since there are no declarations to search for. The worst part, however, is probably the fact Base might inherit from object right now, but in some future release, it switches to inheriting from Sub. Clearly if I know Sub is inherited from, I can rename my class, however annoying that is. But I can't see into the future.
Is this not a case where a true private data type would prevent a problem? How, in Python, can I be sure that I'm not accidentally stepping on somebody's toes if those toes might spring into existence at some point in the future?
EDIT: I've apparently not made clear the primary question. I'm familiar with name mangling and the difference between a single and a double underscore. The question is: how do I deal with the fact that I might clash with classes whose existence I don't know of right now? If my parent class (which is in a package I did not write) happens to start inheriting from a class with the same name as my class, even name mangling won't help. Am I wrong in seeing this as a (corner) case that true private members would solve, but that Python has trouble with?
EDIT: As requested, the following is a full example:
File parent.py:
    class Sub(object):
        def __init__(self):
            self.__foo = 12
        def foo(self):
            return self.__foo + 1
    class Base(Sub):
        pass
File sub.py:
    import parent
    class Sub(parent.Base):
        def __init__(self):
            super(Sub, self).__init__()
            self.__foo = None
    Sub().foo()
The grandparent's foo is called, but my __foo is used.
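The clash can be reproduced in a single file. In the sketch below (Python 3 syntax), the helper make_grandparent is an invented stand-in for the third-party parent module; it exists only so that two different classes can both be named Sub.

```python
def make_grandparent():
    # Stand-in for the third-party module: its class is also called Sub.
    class Sub(object):
        def __init__(self):
            self.__foo = 12      # stored as _Sub__foo

        def foo(self):
            return self.__foo + 1

    return Sub


class Base(make_grandparent()):
    pass


class Sub(Base):
    def __init__(self):
        super().__init__()
        self.__foo = None        # also stored as _Sub__foo: overwrites 12


obj = Sub()
print(vars(obj))                 # only one attribute survives

try:
    obj.foo()
except TypeError as exc:
    print("TypeError:", exc)
```

Mangling uses only the bare class name, not the module or the class object, which is why the two attributes collapse into one.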
Obviously you wouldn't write code like this yourself, but parent could easily be provided by a third party, the details of which could change at any time.
Use private names (instead of protected ones), starting with a double underscore:
    class Sub(Base):
        def __init__(self):
            super(Sub, self).__init__()
            self.__foo = None
            #    ^^
will not conflict with _foo or __foo in Base. This is because Python mangles such names by prefixing them with a single underscore and the name of the class; the following two lines are equivalent:
    class Sub(Base):
        def x(self):
            self.__foo = None # .. is the same as ..
            self._Sub__foo = None
(In response to the edit:) The chance that two classes in a class hierarchy not only have the same name, but that they are both using the same property name, and are both using the private mangled (__) form is so minuscule that it can be safely ignored in practice (I for one haven't heard of a single case so far).
In theory, however, you are correct that in order to formally verify the correctness of a program, one must know the entire inheritance chain. Luckily, formal verification usually requires a fixed set of libraries in any case.
This is in the spirit of the Zen of Python, which includes
practicality beats purity.
Name mangling includes the class so your Base.__foo and Sub.__foo will have different names. This was the entire reason for adding the name mangling feature to Python in the first place. One will be _Base__foo, the other _Sub__foo.
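A quick way to see the two mangled names side by side is to inspect the instance dictionary. This is only a sketch (Python 3 syntax), reusing the Base and Sub names from the question:

```python
class Base(object):
    def __init__(self):
        self.__foo = 0           # stored as _Base__foo

    def foo(self):
        return self.__foo + 1    # reads _Base__foo


class Sub(Base):
    def __init__(self):
        super().__init__()
        self.__foo = None        # stored as _Sub__foo


obj = Sub()
print(sorted(vars(obj)))         # both attributes coexist on the instance
print(obj.foo())                 # Base.foo still sees its own value
```

Because each class writes to its own mangled name, Base.foo keeps working no matter what Sub stores in its __foo.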
Many people prefer to use composition (has-a) instead of inheritance (is-a) for some of these very reasons.
This implies that I have to know the entire inheritance chain. . .
Yes, you should know the entire inheritance chain, or the docs for the object you are directly sub-classing should tell you what you need to know.
Subclassing is an advanced feature, and should be treated with care.
A good example of docs specifying what should be overridden in a subclass is the threading.Thread class:
This class represents an activity that is run in a separate thread of control. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass. No other methods (except for the constructor) should be overridden in a subclass. In other words, only override the __init__() and run() methods of this class.
How often do you modify base classes in inheritance chains to introduce inheritance from a class with the same name as a subclass further down the chain???
Less flippantly, yes, you have to know the code you are working with. You certainly have to know the public names being used, after all. Python being python, discovering the public names in use by your ancestor classes takes pretty much the same effort as discovering the private ones.
In years of Python programming, I have never found this to be much of an issue in practice. When you're naming instance variables, you should have a pretty good idea whether (a) a name is generic enough that it's likely to be used in other contexts and (b) the class you're writing is likely to be involved in an inheritance hierarchy with other unknown classes. In such cases, you think a bit more carefully about the names you're using; self.value isn't a great idea for an attribute name, and neither is something like Adaptor a great class name.
In contrast, I have run into difficulties with the overuse of double-underscore names a number of times. Python being Python, even "private" names tend to be accessed by code defined outside the class. You might think that it would always be bad practice to let an external function access "private" attributes, but what about things like getattr and hasattr? The invocation of them can be in the class's own code, so the class is still controlling all access to the private attributes, but they still don't work without you doing the name-mangling manually. If Python had actually-enforced private variables you couldn't use functions like those on them at all. These days I tend to reserve double-underscore names for cases when I'm writing something very generic like a decorator, metaclass, or mixin that needs to add a "secret attribute" to the instances of the (unknown) classes it's applied to.
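The getattr/hasattr problem is easy to reproduce. In this sketch the class Counter and its method names are invented for illustration; the point is that mangling is applied to identifiers in the class body, not to strings.

```python
class Counter(object):
    def __init__(self):
        self.__count = 0                     # stored as _Counter__count

    def has_count_naive(self):
        # The string is passed through unchanged, so the lookup
        # searches for an attribute literally named "__count".
        return hasattr(self, "__count")

    def has_count_manual(self):
        # Works, but only because the mangling is spelled out by hand.
        return hasattr(self, "_Counter__count")


c = Counter()
print(c.has_count_naive(), c.has_count_manual())
```

With a single leading underscore, both forms of lookup would agree, which is part of why double-underscore names can be more trouble than they are worth.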
And of course there's the standard dynamic language argument: the reality is that you have to test your code thoroughly to have much justification in making the claim "my software works". Such testing will be very unlikely to miss the bugs caused by accidentally clashing names. If you are not doing that testing, then many more uncaught bugs will be introduced by other means than by accidental name clashes.
In summation, the lack of private variables is just not that big a deal in idiomatic Python code in practice, and the addition of true private variables would cause more frequent problems in other ways IMHO.
Mangling happens with double underscores. Single underscores are more of a "please don't".
You don't need to know all the details of all parent classes (note that deep inheritance is usually best avoided), because you can still dir() and help() and any other form of introspection you can come up with.
As noted, you can use name mangling. However, you can stick with a single underscore (or none!) if you document your code adequately - you should not have so many private variables that this proves to be a problem. Just say if a method relies on a private variable, and add either the variable, or the name of the method to the class docstring to alert users.
Further, if you create unit tests, you should create tests that check invariants on members, and accordingly these should be able to show up such name clashes.
If you really want to have "private" variables, and for whatever reason name-mangling doesn't meet your needs, you can factor your private state into another object:
    class Foo(object):
        class Stateholder(object): pass
        def __init__(self):
            self._state = self.Stateholder()
            self._state.private = 1
I have inherited code in which there are standalone functions, one per country code. E.g.
def validate_fr(param):
pass
def validate_uk(param):
pass
My idea is to create a class to group them together and consolidate the code into one method. Unfortunately that breaks cohesion. Another option is to dispatch to instance methods?
    class Validator(object):
        def validate(self, param, country_code):
            pass  # dispatch
Alas, Python does not have a switch statement.
UPDATE: I am still not convinced why I should leave them as global functions in my module. Lumping them as class methods seems cleaner.
I would keep the functions at module level -- no need for a class if you don't want to instantiate it anyway. The switch statement can easily be simulated using a dictionary:
    def validate_fr(param):
        pass
    def validate_uk(param):
        pass
    validators = {"fr": validate_fr,
                  "uk": validate_uk}
    def validate(country_code, param):
        return validators[country_code](param)
Given the naming scheme, you could also do it without the dictionary:
    def validate(country_code, param):
        return globals()["validate_" + country_code](param)
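For a version that can actually be run, the dictionary dispatch might be fleshed out as below. The validation rules are placeholders invented for the example (they are not real French or UK rules), and raising ValueError for an unknown country code is one possible design choice, not something the original code specifies.

```python
def validate_fr(param):
    # Placeholder rule for illustration only: exactly five digits.
    return len(param) == 5 and param.isdigit()


def validate_uk(param):
    # Placeholder rule for illustration only: 5 to 7 non-space characters.
    return 5 <= len(param.replace(" ", "")) <= 7


VALIDATORS = {
    "fr": validate_fr,
    "uk": validate_uk,
}


def validate(country_code, param):
    try:
        validator = VALIDATORS[country_code]
    except KeyError:
        # Turn the bare KeyError into a more descriptive error.
        raise ValueError("no validator for country code %r" % country_code)
    return validator(param)


print(validate("fr", "75001"))
print(validate("uk", "SW1A 1AA"))
```

Compared with the globals() lookup, the explicit dictionary makes the set of supported codes visible in one place and avoids calling an arbitrary module-level name derived from user input.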
You do not need a switch statement for this.
    validators = {
        'fr': Validator(...),
        'uk': Validator(...),
        ...
    }
    ...
    validators['uk'](foo)
Classes are not meant to group functions together; modules are. Functions in a class should be either methods that operate on the object itself (changing its state, emitting information about the state, etc.) or class methods that do the same, but for the class itself (classes in Python are also objects). There's not even a need for static methods in Python, since you can always have functions at module level. As they say: Flat is better than nested.
If you want to have a set of functions, place them in a separate module.