When and how are static methods supposed to be used in Python? We have already established that using a class method as a factory method to create an instance of an object should be avoided when possible. In other words, it is not best practice to use class methods as an alternate constructor (see Factory method for python object - best practice).
Let's say I have a class used to represent some entity data in a database. Imagine the data is a dict object containing field names and field values, and one of the fields is an ID number that makes the data unique.
class Entity(object):
    def __init__(self, data, db_connection):
        self._data = data
        self._db_connection = db_connection
Here my __init__ method takes the entity data dict object. Let's say I only have an ID number and I want to create an Entity instance. First I will need to find the rest of the data, then create an instance of my Entity object. From my previous question, we established that using a class method as a factory method should probably be avoided when possible.
class Entity(object):

    @classmethod
    def from_id(cls, id_number, db_connection):
        filters = [['id', 'is', id_number]]
        data = db_connection.find(filters)
        return cls(data, db_connection)

    def __init__(self, data, db_connection):
        self._data = data
        self._db_connection = db_connection
# Create entity
entity = Entity.from_id(id_number, db_connection)
Above is an example of what not to do, or at least what not to do if there is an alternative. Now I am wondering if editing my class method so that it is more of a utility method and less of a factory method is a valid solution. In other words, does the following example comply with the best practice for using static methods?
class Entity(object):

    @staticmethod
    def data_from_id(id_number, db_connection):
        filters = [['id', 'is', id_number]]
        data = db_connection.find(filters)
        return data
# Create entity
data = Entity.data_from_id(id_number, db_connection)
entity = Entity(data, db_connection)
Or does it make more sense to use a standalone function to find the entity data from an ID number?
def find_data_from_id(id_number, db_connection):
    filters = [['id', 'is', id_number]]
    data = db_connection.find(filters)
    return data
# Create entity.
data = find_data_from_id(id_number, db_connection)
entity = Entity(data, db_connection)
Note: I do not want to change my __init__ method. Previously people have suggested making my __init__ method look something like this: __init__(self, data=None, id_number=None). But there could be 101 different ways to find the entity data, so I would prefer to keep that logic separate to some extent. Make sense?
When and how are static methods supposed to be used in Python?
The glib answer is: Not very often.
The even glibber but not quite as useless answer is: When they make your code more readable.
First, let's take a detour to the docs:
Static methods in Python are similar to those found in Java or C++. Also see classmethod() for a variant that is useful for creating alternate class constructors.
So, when you need a static method in C++, you need a static method in Python, right?
Well, no.
In Java, there are no functions, just methods, so you end up creating pseudo-classes that are just bundles of static methods. The way to do the same thing in Python is to just use free functions.
That's pretty obvious. However, it's good Java style to look as hard as possible for an appropriate class to wedge a function into, so you can avoid writing those pseudo-classes, while doing the same thing is bad Python style—again, use free functions—and this is much less obvious.
C++ doesn't have the same limitation as Java, but many C++ styles are pretty similar anyway. (On the other hand, if you're a "Modern C++" programmer who's internalized the "free functions are part of a class's interface" idiom, your instincts for "where are static methods useful" are probably pretty decent for Python.)
But if you're coming at this from first principles, rather than from another language, there's a simpler way to look at things:
A @staticmethod is basically just a global function. If you have a function foo_module.bar() that would be more readable for some reason if it were spelled as foo_module.BazClass.bar(), make it a @staticmethod. If not, don't. That's really all there is to it. The only problem is building up your instincts for what's more readable to an idiomatic Python programmer.
And of course use a @classmethod when you need access to the class, but not the instance—alternate constructors are the paradigm case for that, as the docs imply. Although you often can simulate a @classmethod with a @staticmethod just by explicitly referencing the class (especially when you don't have much subclassing), you shouldn't.
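To make that concrete, here is a minimal sketch (TemperatureConverter and its methods are made-up names, not anything from the question):
class TemperatureConverter(object):
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def celsius_to_fahrenheit(celsius):
        # No self or cls needed: this is really just a function that happens
        # to read better when spelled under the class name.
        return celsius * 9 / 5 + 32

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Needs the class (to build an instance) but not an instance:
        # the classic alternate-constructor use of @classmethod.
        return cls((fahrenheit - 32) * 5 / 9)

boiling = TemperatureConverter.from_fahrenheit(212)     # instance with celsius == 100.0
print(TemperatureConverter.celsius_to_fahrenheit(100))  # 212.0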
Finally, getting to your specific question:
If the only reason clients ever need to look up data by ID is to construct an Entity, that sounds like an implementation detail you shouldn't be exposing, and it also makes client code more complex. Just use a constructor. If you don't want to modify your __init__ (and you're right that there are good reasons you might not want to), use a @classmethod as an alternate constructor: Entity.from_id(id_number, db_connection).
On the other hand, if that lookup is something that's inherently useful to clients in other cases that have nothing to do with Entity construction, it seems like this has nothing to do with the Entity class (or at least no more than anything else in the same module). So, just make it a free function.
The answer to the linked question specifically says this:
A @classmethod is the idiomatic way to do an "alternate constructor"—there are examples all over the stdlib—itertools.chain.from_iterable, datetime.datetime.fromordinal, etc.
So I don't know how you got the idea that using a classmethod is inherently bad. I actually like the idea of using a classmethod in your specific situation, as it makes following the code and using the api easy.
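For reference, those two stdlib alternate constructors behave like this in a quick interpreter session:
>>> from datetime import datetime
>>> datetime.fromordinal(730920)                    # builds a datetime from a Gregorian ordinal
datetime.datetime(2002, 3, 11, 0, 0)
>>> from itertools import chain
>>> list(chain.from_iterable([[1, 2], [3, 4]]))     # builds one chain from an iterable of iterables
[1, 2, 3, 4]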
The alternative would be to use default constructor arguments like so:
class Entity(object):
    def __init__(self, id, db_connection, data=None):
        self.id = id
        self.db_connection = db_connection
        if data is None:
            self.data = self.from_id(id, db_connection)
        else:
            self.data = data

    @classmethod
    def from_id(cls, id_number, db_connection):
        filters = [['id', 'is', id_number]]
        return db_connection.find(filters)
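With that version, construction would look roughly like this (the values are made up):
entity = Entity(42, db_connection)                     # data gets looked up from the ID
entity = Entity(42, db_connection, data={'id': 42})    # or supplied directly by the caller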
I prefer the classmethod version that you wrote originally, however, especially since data is fairly ambiguous.
Your first example makes the most sense to me: Entity.from_id is pretty succinct and clear.
It avoids the use of data in the next two examples, which does not describe what's being returned; the data is used to construct an Entity. If you wanted to be specific about the data being used to construct the Entity, then you could name your method something like Entity.with_data_for_id or the equivalent function entity_with_data_for_id.
Using a verb such as find can also be pretty confusing, as it doesn't give any indication of the return value — what is the function supposed to do when it's found the data? (Yes, I realize str has a find method; wouldn't it be better named index_of? But then there's also index...)
I always try to think what a name would indicate to someone with (a) no knowledge of the system, and (b) knowledge of other parts of the system — not to say I'm always successful!
Here is a decent use case for @staticmethod.
I have been working on a game as a side project. Part of that game includes rolling dice based on stats, and the possibility of picking up items and effects that impact your character's stats (for better or worse).
When I roll the dice in my game, I need to basically say... take the base character stats and then add any inventory and effect stats into this grand netted figure.
You can't take these abstract objects and add them without instructing the program how. I'm not doing anything at the class level or the instance level, and I didn't want to define the function in some global module. The best remaining option was to go with a static method for adding stats together. It just makes the most sense this way.
class Stats:
    attribs = ['strength', 'speed', 'intellect', 'tenacity']

    def __init__(self,
                 strength=0,
                 speed=0,
                 intellect=0,
                 tenacity=0
                 ):
        self.strength = int(strength)
        self.speed = int(speed)
        self.intellect = int(intellect)
        self.tenacity = int(tenacity)

    # combine adds stats objects together and returns a single stats object
    @staticmethod
    def combine(*args: 'Stats'):
        assert all(isinstance(arg, Stats) for arg in args)
        return_stats = Stats()
        for stat in Stats.attribs:
            for stats_obj in args:
                setattr(return_stats, stat,
                        getattr(return_stats, stat) + getattr(stats_obj, stat))
        return return_stats
Which would make the stat combination calls work like this
a = Stats(strength=3, intellect=3)
b = Stats(strength=1, intellect=-1)
c = Stats(tenacity=5)
print(Stats.combine(a, b, c).__dict__)
{'strength': 4, 'speed': 0, 'intellect': 2, 'tenacity': 5}
I have a data engineering program that is grabbing some data off of Federal government websites and transforming that data. I'm a bit confused on whether I need to use the 'self' keyword or if it's a better practice to not use a class at all. This is how it's currently organized:
class GetGovtData():
    def get_data_1(arg1=0, arg2=1):
        df = conduct_some_operations
        return df

    def get_data_2(arg1=4, arg2=5):
        df = conduct_some_operations_two
        return df
I'm mostly using a class here for organization purposes. For instance, there might be a dozen different methods from one class that I need to use. I find it more aesthetically pleasing / easier to type out this:
from data.get_govt_data import GetGovtData
df1 = GetGovtData.get_data_1()
df2 = GetGovtData.get_data_2()
Rather than:
from data import get_govt_data
df1 = get_govt_data.get_data_1()
df2 = get_govt_data.get_data_2()
Which just has a boatload of underscores. So I'm just curious if this would be considered bad code to use a class like this, without bothering with 'self'? Or should I just eliminate the classes and use a bunch of functions in my files instead?
If you develop functions within a Python class, you have two ways of defining them: one with self as the first parameter and one without.
So, what is the difference between the two?
Function with self
The first one is a method, which is able to access content within the created object. This allows you to access the internal state of an individual object, e.g., a counter of some sort. These are the methods you usually use when doing object-oriented programming. A short intro can be found here [External Link]. These methods require you to create new instances of the given class.
Function without self
These are functions that do not require initialising an instance of the class, which is why you can call them directly on the imported class.
Alternative solution
This is based on the comment of Tom K. Instead of using self, you can also use the decorator @staticmethod to indicate the role of the method within your class. Some more info can be found here [External link].
Final thought
To answer your initial question: you do not need to use self. In your case you do not need self, because you do not share the internal state of an object. Nevertheless, if you are using classes, you should think about an object-oriented design.
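A rough sketch of the difference, reusing the names from the question (the base_url attribute and the bodies are made up for illustration):
class GetGovtData:
    def __init__(self, base_url):
        self.base_url = base_url                 # per-object state lives on self

    def get_data_1(self, arg1=0, arg2=1):
        # Instance method: takes self because it reads the object's state.
        return {'url': self.base_url, 'args': (arg1, arg2)}

    @staticmethod
    def get_data_2(arg1=4, arg2=5):
        # Static method: no self, no shared state, just namespaced under the class.
        return {'args': (arg1, arg2)}
The first form requires an instance, e.g. GetGovtData(some_url).get_data_1(); the second can be called as GetGovtData.get_data_2() directly.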
I suppose you have a file called data/get_govt_data.py that contains your first code block. You can just rename that file to data/GetGovtData.py, remove the class line and not bother with classes at all, if you like. Then you can do
from data import GetGovtData
df1 = GetGovtData.get_data_1()
Depending on your setup you may need to create an empty file data/__init__.py for Python to see data as a module.
EDIT: Regarding the file naming, Python does not impose any overly tight restrictions here. Note however that many projects conventionally use camelCase or CapitalCase to distinguish function, class, and module names. Using CapitalCase for a module may confuse others for a second into assuming it's a class. You may choose not to follow this convention if you do not want to use classes in your project.
To answer the question in the title first: the exact string 'self' is a convention (one that I can see no valid reason to ignore, BTW), but the first argument of an instance method is always going to be a reference to the instance.
Whether you should use a class or flat functions depends on if the functions have shared state. From your scenario it sounds like they may have a common base URL, authentication data, database names, etc. Maybe you even need to establish a connection first? All those would be best held in the class and then used in the functions.
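A minimal sketch of that idea, with hypothetical attribute and method names:
class GovtDataClient:
    def __init__(self, base_url, api_key):
        # Shared state that every fetch method relies on.
        self.base_url = base_url
        self.api_key = api_key

    def get_data_1(self, arg1=0, arg2=1):
        return self._fetch('/dataset1', arg1=arg1, arg2=arg2)

    def get_data_2(self, arg1=4, arg2=5):
        return self._fetch('/dataset2', arg1=arg1, arg2=arg2)

    def _fetch(self, path, **params):
        # Placeholder for the real request/transform logic.
        return {'url': self.base_url + path, 'key': self.api_key, 'params': params}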
So I've used Python as a functional language for a while, but I'm trying to do things "right" and use classes now... and falling down. I'm trying to write a classmethod that can instantiate multiple members of the class (the use case is loading rows from SQLAlchemy). I'd like to just be able to call the classmethod and have it return a status code (success/failure) rather than returning a list of objects. Then, to access the objects, I'll iterate through the class. Here's my code so far (which fails to iterate when I use the classmethod, but works fine when I use the normal constructor). Am I way off-base/crazy here? What's the "pythonic" way to do this? Any help is appreciated, and thank you.
import weakref
from collections import defaultdict

class KeepRefs(object):
    __refs__ = defaultdict(list)

    def __init__(self):
        self.__refs__[self.__class__].append(weakref.ref(self))

    @classmethod
    def get_instances(cls):
        for inst_ref in cls.__refs__[cls]:
            inst = inst_ref()
            if inst is not None:
                yield inst

class Credentials(KeepRefs):
    def __init__(self, name, username, password):
        super(Credentials, self).__init__()
        self.name = name
        self.username = username
        self.password = password

    @classmethod
    def loadcreds(cls):
        Credentials('customer1', 'bob', 'password')
        return True
success = Credentials.loadcreds()
for i in Credentials.get_instances():
    print(i.name)
In your own words - yes, you are off-base and crazy :)
Status codes are a thing of C, not of languages with proper exception semantics like Python. Modifying global state is a sure recipe for disaster. So - don't do it. Return a list of objects. Throw an exception if something disastrous happens, and just return an empty list if there happen to be no objects. This allows the client code to just do
for item in Thingies.load_thingies():
    ... # this won't do anything if load_thingies gave us an empty list
without having to painstakingly check something before using it.
Functional languages have certain advantages, and you are going too far the other way in your exploration of the procedural style. Global variables and class variables have their place, but what will happen if you need to fire off two SQLAlchemy queries and consume the results in parallel? The second query will stomp over the class attributes that the first one still needs. Using an object attribute (instance attribute) solves the problem, since each result contains its own handle.
If your concern is to avoid pre-fetching the array of results, you are in luck because Python offers the perfect solution: Generators, which are basically lazy functions. They are so nicely integrated in Python, I bet you didn't know you've been using them with every for-loop you write.
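A sketch of what that could look like for the Credentials example (load_all and the hard-coded rows are made up for illustration; in practice the rows would come from the SQLAlchemy query):
class Credentials(object):
    def __init__(self, name, username, password):
        self.name = name
        self.username = username
        self.password = password

    @classmethod
    def load_all(cls, rows):
        # Lazily yield one instance per row: no global registry, no status code,
        # and callers that stop early never pay for the remaining rows.
        for row in rows:
            yield cls(*row)

rows = [('customer1', 'bob', 'password'), ('customer2', 'alice', 'hunter2')]
for cred in Credentials.load_all(rows):
    print(cred.name)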
In my Python app, I need to create a simple RequestContext object that contains the actual request to send, as well as some other metadata about the request, e.g. index, id, source, etc. Also, it's likely that more metadata will be added.
I can think of two ways to do this:
Option 1: Attributes of the RequestContext:
class RequestContext(object):
    def __init__(self, request, index=0, source=None):
        self.request = request
        self.index = index
        self.source = source
        ...
Option 2: Dictionary in the RequestContext:
class RequestContext(object):
    def __init__(self, request):
        self.request = request
        self.context_info = {}
Then users can add whatever context info they want.
I personally like accessing values through attributes, because you have a predefined set of attributes that you know are there.
On the other hand, a dictionary lets the client code (owned by me) add more metadata without having to update the RequestContext object definition.
So which one would be better in terms of ease of use and ease of adding more metadata? Are there other pitfalls or considerations that I should think about?
First, you seem to be operating under a misapprehension:
On the other hand, a dictionary lets the client code (owned by me) add more metadata without having to update the RequestContext object definition.
You don't have to update the class definition to add new attributes. (You can force the class to have a fixed set of attributes in various ways—e.g., by using __slots__. But if you don't do any of that, you can add new attributes on the fly.) For example:
>>> class C(object):
...     def __init__(self, x):
...         self.x = x
...
>>> c = C(10)
>>> c.y = 20
>>> print(c.x, c.y)
10 20
In fact, if you look under the covers, attributes are (by default) stored in a perfectly normal dictionary, named __dict__:
>>> print(c.__dict__)
{'x': 10, 'y': 20}
So, what is the difference between using attributes, vs. just adding a dictionary attribute and using members of that dictionary (or, alternatively, inheriting from or delegating to a dict)?
Really, it's the same as the difference between using separate variables vs. a single dict at the top level.
One way to look at it is whether the name-value pairs are data, or whether the names are part of your program and only the values are data. In the former case, you want a dict; in the latter case, you want separate variables.
Alternatively, you can ask how dynamic the names are. If there's an open-ended set of values whose names are only known at runtime, you want a dict. If there's a fixed set of values whose names are hardcoded into your source, you want attributes.
Finally, just ask yourself how often you'd have to use getattr and setattr if you went with attributes. If the answer is "frequently", you want a dict; if it's "never", you want attributes.
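A quick illustration of that test (the names here are made up):
class RequestContext(object):
    pass

ctx = RequestContext()
key = 'source'                               # a name only known at runtime
value = 'cache'

# With attributes you end up reaching for setattr/getattr...
setattr(ctx, key, value)
print(getattr(ctx, key))                     # cache

# ...which is usually a sign the data wanted to be a plain dict:
metadata = {key: value}
print(metadata[key])                         # cache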
In many real-life apps, it's not entirely clear, because you're somewhere between the two. Sometimes, rethinking your design can make it clearly one or the other, but sometimes things are just inherently "sort of dynamic". In that case, you have to make a judgment call: decide which of the two cases you're closest to, and code as if that were the real case.
It may be worth looking at some real-life open source apps that are similar to yours. For example, you're dealing with metadata about some kind of requests. Maybe look at how requests and pycurl deal with HTTP information that's kind of like metadata, like headers and status lines. Or maybe look at how QuodLibet and MusicBrainz Picard deal with metadata in a different domain, music files. And so on.
I want to create a class with two methods at this point (I also want to be able to alter the class, obviously).
class ogrGeo(object):
    def __init__(self):
        pass

    def CreateLine(self, o_file, xy):
        # lots of code

    def CreatePoint(self, o_file, xy):
        # lots of the same code as CreateLine(),
        # only minor differences
To keep things as clean as possible and to repeat as little code as possible, I'm asking for some advice. The two methods CreateLine() and CreatePoint() share a lot of code. To reduce redundancy:
Should I define a third method that both methods can call?
In this case you could still call
o = ogrGeo()
o.CreateLine(...)
o.CreatePoint(...)
separately.
Or should I merge them into one method? Is there another solution I haven't thought of or don't know about?
Thanks already for any suggestions.
Whether you should merge the methods into one is a matter of API design. If the functions have a different purpose, then keep them separate. I would merge them if client code is likely to follow the pattern
if some_condition:
    o.CreateLine(f, xy)
else:
    o.CreatePoint(f, xy)
But otherwise, don't merge. Instead, refactor the common code into a private method, or even a freestanding function if the common code does not touch self. Python has no notion of "private method" built into the language, but names with a leading _ will be recognized as such.
It's perfectly normal to factor out common code into a (private) helper method:
class ogrGeo(object):
    def __init__(self):
        pass

    def CreateLine(self, o_file, xy):
        # lots of code
        value = self._utility_method(xy)

    def CreatePoint(self, o_file, xy):
        # lots of the same code as CreateLine(),
        # only minor differences
        value = self._utility_method(xy)

    def _utility_method(self, xy):
        # Common code here
        return value
The method could return a value, or it could directly manipulate the attributes on self.
A word of advice: read the Python style guide and stick to its conventions. Most other Python projects do, and it'll make your code easier to comprehend for other Python developers.
For the pieces of code that will overlap, consider whether those can be their own separate functions as well. Then CreateLine would be comprised of several calls to certain functions, with parameter choices that make sense for CreateLine, meanwhile CreatePoint would be several function calls with appropriate parameters for creating a point.
Even if those new auxiliary functions aren't going to be used elsewhere, it's better to modularize them as separate functions than to copy/paste code. But, if it is the case that the auxiliary functions needed to create these structures are pretty specific, then why not break them out into their own classes?
You could make an "Object" class that involves all of the basics for creating objects, and then have "Line" and "Point" classes which derive from "Object". Within those classes, override the necessary functions so that the construction is specific, relying on auxiliary functions in the base "Object" class for the portions of code that overlap.
Then the ogrGeo class will construct instances of these other classes. Even if the ultimate consumer of "Line" or "Shape" doesn't need a full blown class object, you can still use this design, and give ogrGeo the ability to return the sub-pieces of a Line instance or a Point instance that the consumer does wish to use.
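A very rough sketch of that design, with hypothetical names and bodies:
class GeoObject(object):
    def __init__(self, o_file, xy):
        self.o_file = o_file
        self.xy = xy

    def _write_common_parts(self):
        # The overlapping construction code would live here.
        pass

class Line(GeoObject):
    def create(self):
        self._write_common_parts()
        # Line-specific steps go here.

class Point(GeoObject):
    def create(self):
        self._write_common_parts()
        # Point-specific steps go here.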
It hardly matters. You want the class methods to be as usable as possible for the calling programs, and it's slightly easier and more efficient to have two methods than to have a single method with an additional parameter for the type of object to be created:
def CreateObj(self, obj, o_file, xy):  # obj = 0 for Point, 1 for Line, ...
Recommendation: use separate API calls and factor the common code into method(s) that can be called within your class.
You could also go the other direction, especially if the following is the case:
def methA/B(...):
    lots of common code
    small difference
    lots of common code
then you could do
def _common(..., callback):
    lots of common code
    callback()
    lots of common code

def methA(...):
    def _mypart(): do what A does
    _common(..., _mypart)

def methB(...):
    def _mypart(): do what B does
    _common(..., _mypart)
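A runnable version of that pattern, with made-up bodies so the shape is visible:
def _common(values, callback):
    prepared = [v * 2 for v in values]       # lots of common code before...
    result = callback(prepared)              # ...the part that differs...
    return sorted(result)                    # ...and lots of common code after

def meth_a(values):
    def _my_part(prepared):
        return [v + 1 for v in prepared]     # what A does differently
    return _common(values, _my_part)

def meth_b(values):
    def _my_part(prepared):
        return [v - 1 for v in prepared]     # what B does differently
    return _common(values, _my_part)

print(meth_a([3, 1, 2]))    # [3, 5, 7]
print(meth_b([3, 1, 2]))    # [1, 3, 5]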
I'm getting back to programming for Google App Engine and I've found, in old, unused code, instances in which I wrote constructors for models. It seems like a good idea, but there's no mention of it online and I can't test to see if it works. Here's a contrived example, with no error-checking, etc.:
from google.appengine.ext import db

class Dog(db.Model):
    name = db.StringProperty(required=True)
    breeds = db.StringListProperty()
    age = db.IntegerProperty(default=0)

    def __init__(self, name, breed_list, **kwargs):
        db.Model.__init__(self, **kwargs)
        self.name = name
        self.breeds = breed_list.split()
rufus = Dog('Rufus', 'spaniel terrier labrador')
rufus.put()
The **kwargs are passed on to the Model constructor in case the model is constructed with a specified parent or key_name, or in case other properties (like age) are specified. This constructor differs from the default in that it requires that a name and breed_list be specified (although it can't ensure that they're strings), and it parses breed_list in a way that the default constructor could not.
Is this a legitimate form of instantiation, or should I just use functions or static/class methods? And if it works, why aren't custom constructors used more often?
In your example, why not use the default syntax instead of a custom constructor:
rufus = Dog(name='Rufus', breeds=['spaniel', 'terrier', 'labrador'])
Your version makes it less clear semantically IMHO.
As for overriding Model constructors, Google recommends against it (see for example: http://groups.google.com/group/google-appengine/browse_thread/thread/9a651f6f58875bfe/111b975da1b4b4db?lnk=gst&q=python+constructors#111b975da1b4b4db) and that's why we don't see it in Google's code.
I think it's unfortunate because constructor overriding can be useful in some cases, like creating a temporary property.
One problem I know of is with Expando, anything you define in the constructor gets auto-serialized in the protocol buffer.
But for base Models I am not sure what are the risks, and I too would be happy to learn more.
There's usually no need to do something like that; the default constructor will assign name, and when working with a list it almost always makes more sense to pass an actual list instead of a space-separated string (just imagine the fun if you passed "cocker spaniel" instead of just "spaniel" there, for one thing...).
That said, if you really need to do computation when instantiating a Model subclass instance, there's probably nothing inherently wrong with it. I think most people probably prefer to get the data into the right form and then create the entity, which is why you're not seeing a lot of examples like that.
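For example, one way to keep the default Model constructor and still centralize the parsing is a @classmethod factory (from_breed_string is a made-up name; this assumes the google.appengine.ext.db model from the question):
class Dog(db.Model):
    name = db.StringProperty(required=True)
    breeds = db.StringListProperty()
    age = db.IntegerProperty(default=0)

    @classmethod
    def from_breed_string(cls, name, breed_string, **kwargs):
        # Parse the space-separated string first, then hand everything to the
        # default constructor instead of overriding __init__.
        return cls(name=name, breeds=breed_string.split(), **kwargs)

rufus = Dog.from_breed_string('Rufus', 'spaniel terrier labrador')
rufus.put()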