In my Python app, I need to create a simple RequestContext object that contains the actual request to send, as well as some other metadata about the request, e.g. index, id, source, etc. Also, it's likely that more metadata will be added.
I can think of two ways to do this:
Option 1: Attributes of the RequestContext:
class RequestContext(object):
    def __init__(self, request, index=0, source=None):
        self.request = request
        self.index = index
        self.source = source
        ...
Option 2: Dictionary in the RequestContext:
class RequestContext(object):
    def __init__(self, request):
        self.request = request
        self.context_info = {}
Then users can add whatever context info they want.
I personally like accessing values through attributes, because you have a predefined set of attributes that you know are there.
On the other hand, a dictionary lets the client code (owned by me) add more metadata without having to update the RequestContext object definition.
So which one would be better in terms of ease of use and ease of adding more metadata? Are there other pitfalls or considerations that I should think about?
First, you seem to be operating under a misapprehension:
On the other hand, a dictionary lets the client code (owned by me) add more metadata without having to update the RequestContext object definition.
You don't have to update the class definition to add new attributes. (You can force the class to have a fixed set of attributes in various ways—e.g., by using __slots__. But if you don't do any of that, you can add new attributes on the fly.) For example:
>>> class C(object):
...     def __init__(self, x):
...         self.x = x
...
>>> c = C(10)
>>> c.y = 20
>>> print(c.x, c.y)
10 20
In fact, if you look under the covers, attributes are (by default) stored in a perfectly normal dictionary, named __dict__:
>>> print(c.__dict__)
{'x': 10, 'y': 20}
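For contrast, here is a minimal sketch of the `__slots__` restriction mentioned above. A class that declares `__slots__` gets no per-instance `__dict__`, so assigning an undeclared attribute raises `AttributeError`. The class name `Slotted` is just for illustration.

```python
class Slotted:
    # Only the names listed here may be set on instances.
    __slots__ = ('x',)

    def __init__(self, x):
        self.x = x


s = Slotted(10)
try:
    s.y = 20  # 'y' is not declared in __slots__
except AttributeError as exc:
    print('AttributeError:', exc)

# Slotted instances have no per-instance __dict__ to add names to.
print(hasattr(s, '__dict__'))
```

So the "fixed set of attributes" behavior is opt-in; without `__slots__` (or similar tricks), attributes are as open-ended as dict keys.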
So, what is the difference between using attributes, vs. just adding a dictionary attribute and using members of that dictionary (or, alternatively, inheriting from or delegating to a dict)?
Really, it's the same as the difference between using separate variables vs. a single dict at the top level.
One way to look at it is whether the name-value pairs are data, or whether the names are part of your program and only the values are data. In the former case, you want a dict; in the latter case, you want separate variables.
Alternatively, you can ask how dynamic the names are. If there's an open-ended set of values whose names are only known at runtime, you want a dict. If there's a fixed set of values whose names are hardcoded into your source, you want attributes.
Finally, just ask yourself how often you'd have to use getattr and setattr if you went with attributes. If the answer is "frequently", you want a dict; if it's "never", you want attributes.
In many real-life apps, it's not entirely clear, because you're somewhere between the two. Sometimes, rethinking your design can make it clearly one or the other, but sometimes things are just inherently "sort of dynamic". In that case, you have to make a judgment call: decide which of the two cases you're closest to, and code as if that were the real case.
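If you do land somewhere in between, one common compromise (a sketch, not the only option) is to keep the names your code refers to as attributes, and to put the metadata whose names are only known at runtime in a dict. The `extra` attribute and the sample values below are hypothetical.

```python
class RequestContext:
    """Fixed, well-known fields as attributes; open-ended metadata in a dict."""

    def __init__(self, request, index=0, source=None):
        # Names that are part of the program: plain attributes.
        self.request = request
        self.index = index
        self.source = source
        # Names that are data (only known at runtime): a dict.
        self.extra = {}


ctx = RequestContext('GET /status', index=3, source='scheduler')
print(ctx.index, ctx.source)  # attribute access for the fixed fields

key = 'x-trace-id'            # imagine this name arrives at runtime
ctx.extra[key] = 'abc123'     # no getattr/setattr needed
print(ctx.extra)
```

Applying the getattr/setattr test above: code that touches `index` or `source` never needs `getattr`, and code that handles runtime-named keys never needs `setattr`.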
It may be worth looking at some real-life open source apps that are similar to yours. For example, you're dealing with metadata about some kind of requests. Maybe look at how requests and pycurl deal with HTTP information that's kind of like metadata, like headers and status lines. Or maybe look at how QuodLibet and MusicBrainz Picard deal with metadata in a different domain, music files. And so on.
Related
I have a data engineering program that grabs some data off of federal government websites and transforms that data. I'm a bit confused about whether I need to use the 'self' keyword, or whether it's better practice not to use a class at all. This is how it's currently organized:
class GetGovtData():
    def get_data_1(arg1=0, arg2=1):
        df = conduct_some_operations
        return df

    def get_data_2(arg1=4, arg2=5):
        df = conduct_some_operations_two
        return df
I'm mostly using a class here for organization purposes. For instance, there might be a dozen different methods from one class that I need to use. I find it more aesthetically pleasing / easier to type out this:
from data.get_govt_data import GetGovtData
df1 = GetGovtData.get_data_1()
df2 = GetGovtData.get_data_2()
Rather than:
from data import get_govt_data
df1 = get_govt_data.get_data_1()
df2 = get_govt_data.get_data_2()
Which just has a boatload of underscores. So I'm just curious if this would be considered bad code to use a class like this, without bothering with 'self'? Or should I just eliminate the classes and use a bunch of functions in my files instead?
If you define functions within a Python class, there are two ways to do it: with self as the first parameter, or without self.
So, what is the difference between the two?
Function with self
The first one is a method, which can access the content of the object it is called on. This lets you work with the internal state of an individual object, e.g., a counter of some sort. These are the methods you usually use in object-oriented programming. A short intro can be found here [External Link]. These methods require you to create an instance of the class first.
Function without self
Functions without self can be called without creating an instance of the class. This is why you can call them directly on the imported class.
Alternative solution
This is based on the comment by Tom K. Instead of just leaving out self, you can also use the @staticmethod decorator to indicate the role of the method within your class. Some more info can be found here [External link].
Final thought
To answer your initial question: you do not need to use self. In your case you do not need it, because your functions do not use or share any per-object state. Nevertheless, if you are using classes, you should think about an object-oriented design.
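As a sketch of the @staticmethod alternative applied to the question's class: the returned dicts are hypothetical placeholders for the real DataFrames. One practical difference from the undecorated version is that calling the undecorated function on an instance, e.g. `GetGovtData().get_data_1()`, would silently pass the instance in as `arg1`; with @staticmethod it behaves the same either way.

```python
class GetGovtData:
    """A namespace of related functions; no per-instance state is used."""

    @staticmethod
    def get_data_1(arg1=0, arg2=1):
        # Placeholder for the real download/transform logic.
        return {'source': 'dataset_1', 'args': (arg1, arg2)}

    @staticmethod
    def get_data_2(arg1=4, arg2=5):
        # Placeholder for the real download/transform logic.
        return {'source': 'dataset_2', 'args': (arg1, arg2)}


# Called directly on the class; no instance is created.
df1 = GetGovtData.get_data_1()
df2 = GetGovtData.get_data_2(arg1=7)
```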
I suppose you have a file called data/get_govt_data.py that contains your first code block. You can just rename that file to data/GetGovtData.py, remove the class line and not bother with classes at all, if you like. Then you can do
from data import GetGovtData
df1 = GetGovtData.get_data_1()
Depending on your setup you may need to create an empty file data/__init__.py for Python to see data as a module.
EDIT: Regarding the file naming, Python does not impose any strict restrictions here. Note, however, that many projects follow the PEP 8 conventions of snake_case for function and module names and CapitalCase (CapWords) for class names. Using CapitalCase for a module may make others assume for a second that it's a class. You may choose not to follow this convention if you do not use classes in your project.
To answer the question in the title first: the exact string 'self' is a convention (one that I can see no valid reason to ignore, BTW), but the first argument of a regular (instance) method is always going to receive a reference to the class instance.
Whether you should use a class or flat functions depends on if the functions have shared state. From your scenario it sounds like they may have a common base URL, authentication data, database names, etc. Maybe you even need to establish a connection first? All those would be best held in the class and then used in the functions.
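A minimal sketch of what "shared state held in the class" could look like. `GovtDataClient`, the example.gov URL, the endpoint names, and the `api_key` parameter are all hypothetical; to keep the sketch offline, the methods return the URL they would fetch instead of downloading anything.

```python
from urllib.parse import urlencode


class GovtDataClient:
    """Hypothetical client holding state that every fetch method shares."""

    def __init__(self, base_url, api_key=None):
        self.base_url = base_url.rstrip('/')
        self.api_key = api_key

    def _url(self, endpoint, **params):
        # Every method builds its URL from the same base and credentials.
        if self.api_key is not None:
            params['api_key'] = self.api_key
        query = urlencode(params)
        return f'{self.base_url}/{endpoint}' + (f'?{query}' if query else '')

    def get_data_1(self, year=2020):
        # A real version would download and transform the data here.
        return self._url('dataset1', year=year)

    def get_data_2(self, state='CA'):
        return self._url('dataset2', state=state)


client = GovtDataClient('https://api.example.gov/', api_key='KEY')
print(client.get_data_1(2021))
```

Here `self` earns its keep: the base URL and key are set once and reused by every method, which is exactly what free functions would have to pass around explicitly.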
I am working on a project where I have a number of custom classes to interface with a varied collection of data on a user's system. These classes only have properties as user-facing attributes. Some of these properties are decently resource intensive, so I want to only run the generation code once, and store the returned value on disk (cache it, that is) for faster retrieval on subsequent runs. As it stands, this is how I am accomplishing this:
import functools


def stored_property(func):
    """This ``decorator`` adds on-disk functionality to the `property`
    decorator. This decorator is also a Method Decorator.
    Each key property of a class is stored in a settings JSON file with
    a dictionary of property names and values (e.g. :class:`MyClass`
    stores its properties in `my_class.json`).
    """
    @property
    @functools.wraps(func)
    def func_wrapper(self):
        print('running decorator...')
        try:
            var = self.properties[func.__name__]
            if var:
                # property already written to disk
                return var
            else:
                # property written to disk as `null`
                return func(self)
        except AttributeError:
            # `self.properties` does not yet exist
            return func(self)
        except KeyError:
            # `self.properties` exists, but property is not a key
            return func(self)
    return func_wrapper
class MyClass(object):
    def __init__(self, wf):
        self.wf = wf
        self.properties = self._properties()

    def _properties(self):
        # get name of class in underscore format
        class_name = convert(self.__class__.__name__)
        # this is a library used (in Alfred workflows) for interacting with data stored on disk
        properties = self.wf.stored_data(class_name)
        # if no file on disk, or one of the properties has a null value
        if properties is None or None in properties.values():
            # get names of all properties of this class
            propnames = [k for (k, v) in self.__class__.__dict__.items()
                         if isinstance(v, property)]
            properties = dict()
            for prop in propnames:
                # generate dictionary of property names and values
                properties[prop] = getattr(self, prop)
            # use the external library to save that dictionary to disk in JSON format
            self.wf.store_data(class_name, properties,
                               serializer='json')
        # return either the data read from file, or data generated in situ
        return properties

    # this decorator ensures that this generating code is only run if necessary
    @stored_property
    def only_property(self):
        # some code to get data
        return 'this is my property'
This code works precisely as I need it, but it still forces me to manually add the _properties(self) method to each class wherein I need this functionality (currently, I have 3). What I want is a way to "insert" this functionality into any class I please. I think that a Class Decorator could get this job done, but try as I might, I can't quite figure out how to wrangle it.
For the sake of clarity (and in case a decorator is not the best way to get what I want), I will try to explain the overall functionality I am after. I want to write a class that contains some properties. The values of these properties are generated via various degrees of complex code (in one instance, I'm searching for a certain app's pref file, then searching for 3 different preferences (any of which may or may not exist) and determining the best single result from those preferences). I want the body of the properties' code only to contain the algorithm for finding the data. But, I don't want to run that algorithmic code each time I access that property. Once I generate the value once, I want to write it to disk and then simply read that on all subsequent calls. However, I don't want each value written to its own file; I want a dictionary of all the values of all the properties of a single class to be written to one file (so, in the example above, my_class.json would contain a JSON dictionary with one key, value pair). When accessing the property directly, it should first check to see if it already exists in the dictionary on disk. If it does, simply read and return that value. If it exists, but has a null value, then try to run the generation code (i.e. the code actually written in the property method) and see if you can find it now (if not, the method will return None and that will once again be written to file). If the dictionary exists and that property is not a key (my current code doesn't really make this possible, but better safe than sorry), run the generation code and add the key, value pair. If the dictionary doesn't exist (i.e. on the first instantiation of the class), run all generation code for all properties and create the JSON file. Ideally, the code would be able to update one property in the JSON file without rerunning all of the generation code (i.e. running _properties() again).
I know this is a bit peculiar, but I need the speed, human-readable content, and elegant code all together. I would really rather not compromise on my goal. Hopefully, the description of what I want is clear enough. If not, let me know in a comment what doesn't make sense and I will try to clarify. But I do think that a Class Decorator could probably get me there (essentially by inserting the _properties() method into any class, running it on instantiation, and mapping its value to the properties attribute of the class).
Maybe I'm missing something, but it doesn't seem that your _properties method is specific to the properties that a given class has. I'd put that in a base class and have each of your classes with @stored_property methods subclass that. Then you don't need to duplicate the _properties method.
class PropertyBase(object):
    def __init__(self, wf):
        self.wf = wf
        self.properties = self._properties()

    def _properties(self):
        # As before...
        ...


class MyClass(PropertyBase):
    @stored_property
    def expensive_to_calculate(self):
        # Calculate it here
        ...
If for some reason you can't subclass PropertyBase directly (maybe you already need to have a different base class), you can probably use a mixin. Failing that, make _properties accept an instance/class and a workflow object and call it explicitly in __init__ for each class.
When and how are static methods supposed to be used in Python? We have already established that using a class method as a factory method to create an instance of an object should be avoided when possible. In other words, it is not best practice to use class methods as an alternate constructor (See Factory method for python object - best practice).
Let's say I have a class used to represent some entity data in a database. Imagine the data is a dict object containing field names and field values and one of the fields is an ID number that makes the data unique.
class Entity(object):
    def __init__(self, data, db_connection):
        self._data = data
        self._db_connection = db_connection
Here my __init__ method takes the entity data dict object. Let's say I only have an ID number and I want to create an Entity instance. First I will need to find the rest of the data, then create an instance of my Entity object. From my previous question, we established that using a class method as a factory method should probably be avoided when possible.
class Entity(object):
    @classmethod
    def from_id(cls, id_number, db_connection):
        filters = [['id', 'is', id_number]]
        data = db_connection.find(filters)
        return cls(data, db_connection)

    def __init__(self, data, db_connection):
        self._data = data
        self._db_connection = db_connection

# Create entity
entity = Entity.from_id(id_number, db_connection)
Above is an example of what not to do or at least what not to do if there is an alternative. Now I am wondering if editing my class method so that it is more of a utility method and less of a factory method is a valid solution. In other words, does the following example comply with the best practice for using static methods.
class Entity(object):
    @staticmethod
    def data_from_id(id_number, db_connection):
        filters = [['id', 'is', id_number]]
        data = db_connection.find(filters)
        return data

# Create entity
data = Entity.data_from_id(id_number, db_connection)
entity = Entity(data, db_connection)
Or does it make more sense to use a standalone function to find the entity data from an ID number.
def find_data_from_id(id_number, db_connection):
    filters = [['id', 'is', id_number]]
    data = db_connection.find(filters)
    return data
# Create entity.
data = find_data_from_id(id_number, db_connection)
entity = Entity(data, db_connection)
Note: I do not want to change my __init__ method. Previously people have suggested making my __init__ method look something like __init__(self, data=None, id_number=None), but there could be 101 different ways to find the entity data, so I would prefer to keep that logic separate to some extent. Make sense?
When and how are static methods suppose to be used in python?
The glib answer is: Not very often.
The even glibber but not quite as useless answer is: When they make your code more readable.
First, let's take a detour to the docs:
Static methods in Python are similar to those found in Java or C++. Also see classmethod() for a variant that is useful for creating alternate class constructors.
So, when you need a static method in C++, you need a static method in Python, right?
Well, no.
In Java, there are no functions, just methods, so you end up creating pseudo-classes that are just bundles of static methods. The way to do the same thing in Python is to just use free functions.
That's pretty obvious. However, it's good Java style to look as hard as possible for an appropriate class to wedge a function into, so you can avoid writing those pseudo-classes, while doing the same thing is bad Python style—again, use free functions—and this is much less obvious.
C++ doesn't have the same limitation as Java, but many C++ styles are pretty similar anyway. (On the other hand, if you're a "Modern C++" programmer who's internalized the "free functions are part of a class's interface" idiom, your instincts for "where are static methods useful" are probably pretty decent for Python.)
But if you're coming at this from first principles, rather than from another language, there's a simpler way to look at things:
A @staticmethod is basically just a global function. If you have a function foo_module.bar() that would be more readable for some reason if it were spelled as foo_module.BazClass.bar(), make it a @staticmethod. If not, don't. That's really all there is to it. The only problem is building up your instincts for what's more readable to an idiomatic Python programmer.
And of course use a @classmethod when you need access to the class, but not the instance. Alternate constructors are the paradigm case for that, as the docs imply. Although you often can simulate a @classmethod with a @staticmethod just by explicitly referencing the class (especially when you don't have much subclassing), you shouldn't.
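A small illustration of why the simulated version is worse once subclassing enters the picture. `Point`, `Point3D`, and both constructor names are hypothetical. The @classmethod receives whichever class it was called on, while the @staticmethod that hardcodes `Point` always builds a `Point`.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_tuple(cls, pair):
        # `cls` is whichever class the method was called on.
        return cls(*pair)

    @staticmethod
    def from_tuple_static(pair):
        # Simulated alternate constructor: the class is hardcoded.
        return Point(*pair)


class Point3D(Point):
    def __init__(self, x, y, z=0):
        super().__init__(x, y)
        self.z = z


print(type(Point3D.from_tuple((1, 2))).__name__)         # Point3D
print(type(Point3D.from_tuple_static((1, 2))).__name__)  # Point
```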
Finally, getting to your specific question:
If the only reason clients ever need to look up data by ID is to construct an Entity, that sounds like an implementation detail you shouldn't be exposing, and it also makes client code more complex. Just use a constructor. If you don't want to modify your __init__ (and you're right that there are good reasons you might not want to), use a @classmethod as an alternate constructor: Entity.from_id(id_number, db_connection).
On the other hand, if that lookup is something that's inherently useful to clients in other cases that have nothing to do with Entity construction, it seems like this has nothing to do with the Entity class (or at least no more than anything else in the same module). So, just make it a free function.
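For the first case, here is a runnable sketch of the @classmethod alternate constructor. `FakeConnection` is a hypothetical stand-in for the question's `db_connection`; it supports only the `[['id', 'is', value]]` filter shape used in the question, and the sample rows are made up.

```python
class FakeConnection:
    """Hypothetical stand-in for the question's db_connection."""

    def __init__(self, rows):
        self._rows = rows

    def find(self, filters):
        # Only the single [['id', 'is', value]] filter shape is supported.
        field, op, value = filters[0]
        if op != 'is':
            raise ValueError(f'unsupported operator: {op!r}')
        return next(row for row in self._rows if row[field] == value)


class Entity:
    def __init__(self, data, db_connection):
        self._data = data
        self._db_connection = db_connection

    @classmethod
    def from_id(cls, id_number, db_connection):
        # Alternate constructor: look up the data, then defer to __init__.
        filters = [['id', 'is', id_number]]
        data = db_connection.find(filters)
        return cls(data, db_connection)


conn = FakeConnection([{'id': 1, 'name': 'alpha'}, {'id': 2, 'name': 'beta'}])
entity = Entity.from_id(2, conn)
print(entity._data)  # {'id': 2, 'name': 'beta'}
```

The caller never sees the intermediate data dict, and `__init__` stays untouched, which is what the question asked for.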
The answer to the linked question specifically says this:
A @classmethod is the idiomatic way to do an "alternate constructor"; there are examples all over the stdlib: itertools.chain.from_iterable, datetime.datetime.fromordinal, etc.
So I don't know how you got the idea that using a classmethod is inherently bad. I actually like the idea of using a classmethod in your specific situation, as it makes following the code and using the api easy.
The alternative would be to use default constructor arguments like so:
class Entity(object):
    def __init__(self, id, db_connection, data=None):
        self.id = id
        self.db_connection = db_connection
        if data is None:
            self.data = self.from_id(id, db_connection)
        else:
            self.data = data

    @classmethod
    def from_id(cls, id_number, db_connection):
        filters = [['id', 'is', id_number]]
        return db_connection.find(filters)
However, I prefer the classmethod version that you wrote originally, especially since data is fairly ambiguous.
Your first example makes the most sense to me: Entity.from_id is pretty succinct and clear.
It avoids the use of data in the next two examples, which does not describe what's being returned; the data is used to construct an Entity. If you wanted to be specific about the data being used to construct the Entity, then you could name your method something like Entity.with_data_for_id or the equivalent function entity_with_data_for_id.
Using a verb such as find can also be pretty confusing, as it doesn't give any indication of the return value — what is the function supposed to do when it's found the data? (Yes, I realize str has a find method; wouldn't it be better named index_of? But then there's also index...) It reminds me of the classic:
I always try to think what a name would indicate to someone with (a) no knowledge of the system, and (b) knowledge of other parts of the system — not to say I'm always successful!
Here is a decent use case for @staticmethod.
I have been working on a game as a side project. Part of that game includes rolling dice based on stats, and the possibility of picking up items and effects that impact your character's stats (for better or worse).
When I roll the dice in my game, I need to basically say... take the base character stats and then add any inventory and effect stats into this grand netted figure.
You can't take these abstract objects and add them without instructing the program how. I'm not doing anything at the class level or instance level either. I didn't want to define the function in some global module. The last best option was to go with a static method for adding up stats together. It just makes the most sense this way.
class Stats:
    attribs = ['strength', 'speed', 'intellect', 'tenacity']

    def __init__(self,
                 strength=0,
                 speed=0,
                 intellect=0,
                 tenacity=0
                 ):
        self.strength = int(strength)
        self.speed = int(speed)
        self.intellect = int(intellect)
        self.tenacity = int(tenacity)

    # combine adds stats objects together and returns a single stats object
    @staticmethod
    def combine(*args: 'Stats'):
        assert all(isinstance(arg, Stats) for arg in args)
        return_stats = Stats()
        for stat in Stats.attribs:
            for other in args:
                setattr(return_stats, stat,
                        getattr(return_stats, stat) + getattr(other, stat))
        return return_stats
This makes the stat combination calls work like this:
a = Stats(strength=3, intellect=3)
b = Stats(strength=1, intellect=-1)
c = Stats(tenacity=5)
print(Stats.combine(a, b, c).__dict__)
{'strength': 4, 'speed': 0, 'intellect': 2, 'tenacity': 5}
I want to create a list of class instances that automatically updates itself following a particular condition on the instance attributes.
For example, I have a list of object of my custom class Person() and I want to be able to generate a list that always contains all the married persons, i.e. all persons having the attribute 'MAR_STATUS' equal to 'MARRIED'.
Is this possible at all in Python? I have used a C++ precompiler for microsimulations that had a very handy built-in called "actor_set" which did exactly this. But I have no idea of how it was implemented in C++.
Thank you.
List comprehension:
[person for person in people if person.MAR_STATUS == 'MARRIED']
If you need to assign it to a variable and you want that variable to automatically update on every access, you can put this same code in a lambda, a normal function, or, if your variable is a class member, in a property getter.
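A minimal sketch of the property-getter variant, assuming a hypothetical `Population` container that owns the list of people. The comprehension runs on every access, so the result always reflects the current `MAR_STATUS` values.

```python
class Person:
    def __init__(self, name, status='SINGLE'):
        self.name = name
        self.MAR_STATUS = status


class Population:
    """Hypothetical container whose `married` view is recomputed on each access."""

    def __init__(self, people):
        self.people = list(people)

    @property
    def married(self):
        return [p for p in self.people if p.MAR_STATUS == 'MARRIED']


pop = Population([Person('Ann', 'MARRIED'), Person('Bob')])
print([p.name for p in pop.married])  # ['Ann']
pop.people[1].MAR_STATUS = 'MARRIED'
print([p.name for p in pop.married])  # ['Ann', 'Bob']
```

The trade-off is a full scan per access; in exchange there is no bookkeeping to get out of sync.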
It is poor form to have "action at a distance" / mutations / side-effects unless it is very carefully controlled.
That said, an imperative language will let you do this, if you really want to, as follows. Here we use Python's [property getters and setters]:
MARRIED_SET = set()


def updateMarriedSet(changedPerson):
    if hasattr(changedPerson, 'married') and changedPerson.married == Person.MARRIED:
        MARRIED_SET.add(changedPerson)
    else:
        MARRIED_SET.discard(changedPerson)


class Person(object):
    ...

    @property
    def married(self):
        """The person is married"""
        return self._married

    @married.setter
    def married(self, newStatus):
        self._married = newStatus
        updateMarriedSet(self)

    @married.deleter
    def married(self):
        del self._married
        updateMarriedSet(self)
I can imagine this might possibly be useful to ensure that accesses to getMarriedPeople() run in O(1) time, rather than the O(n) time needed to rescan every person on each access.
The simple way is to generate the list on the fly, e.g., as shown in @sr2222's answer.
As an alternative you could call an arbitrary callback each time MAR_STATUS changes. Use __new__ if Person instances are immutable or make MAR_STATUS a property and call registered callbacks in the setter method (see notifications in traits library for a more complex implementation).
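A minimal sketch of the "call registered callbacks in the setter" idea, using only a plain property (not the traits library). The `on_status_change` registration method, the class-level `_callbacks` list, and the `track_married` callback are hypothetical names. Note that a callback only sees changes made after it is registered.

```python
class Person:
    """Hypothetical sketch: notify registered callbacks when MAR_STATUS changes."""

    _callbacks = []  # shared by all Person instances

    def __init__(self, name, status='SINGLE'):
        self.name = name
        self._status = None
        self.MAR_STATUS = status  # goes through the setter, so callbacks fire

    @classmethod
    def on_status_change(cls, callback):
        cls._callbacks.append(callback)
        return callback

    @property
    def MAR_STATUS(self):
        return self._status

    @MAR_STATUS.setter
    def MAR_STATUS(self, new_status):
        old, self._status = self._status, new_status
        for callback in self._callbacks:
            callback(self, old, new_status)


married = set()


@Person.on_status_change
def track_married(person, old, new):
    # Keep `married` in sync with every status change.
    if new == 'MARRIED':
        married.add(person)
    else:
        married.discard(person)
```

After registration, `Person('Ann', 'MARRIED')` lands in `married` immediately, and setting her `MAR_STATUS` to anything else removes her again.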
I'm getting back to programming for Google App Engine and I've found, in old, unused code, instances in which I wrote constructors for models. It seems like a good idea, but there's no mention of it online and I can't test to see if it works. Here's a contrived example, with no error-checking, etc.:
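from google.appengine.ext import db


class Dog(db.Model):
    name = db.StringProperty(required=True)
    breeds = db.StringListProperty()
    age = db.IntegerProperty(default=0)

    def __init__(self, name, breed_list, **kwargs):
        # Pass self explicitly, and hand the required name and the parsed
        # breeds to the base Model constructor along with any other kwargs.
        db.Model.__init__(self, name=name, breeds=breed_list.split(), **kwargs)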
rufus = Dog('Rufus', 'spaniel terrier labrador')
rufus.put()
The **kwargs are passed on to the Model constructor in case the model is constructed with a specified parent or key_name, or in case other properties (like age) are specified. This constructor differs from the default in that it requires that a name and breed_list be specified (although it can't ensure that they're strings), and it parses breed_list in a way that the default constructor could not.
Is this a legitimate form of instantiation, or should I just use functions or static/class methods? And if it works, why aren't custom constructors used more often?
In your example, why not use the default syntax instead of a custom constructor:
rufus = Dog( name='Rufus', breeds=['spaniel','terrier','labrador'] )
Your version makes it less clear semantically IMHO.
As for overriding Model constructors, Google recommends against it (see for example: http://groups.google.com/group/google-appengine/browse_thread/thread/9a651f6f58875bfe/111b975da1b4b4db?lnk=gst&q=python+constructors#111b975da1b4b4db) and that's why we don't see it in Google's code.
I think it's unfortunate because constructor overriding can be useful in some cases, like creating a temporary property.
One problem I know of is with Expando: anything you define in the constructor gets auto-serialized in the protocol buffer.
But for base Models I am not sure what the risks are, and I too would be happy to learn more.
There's usually no need to do something like that; the default constructor will assign name, and when working with a list it almost always makes more sense to pass an actual list instead of a space-separated string (just imagine the fun if you passed "cocker spaniel" instead of just "spaniel" there, for one thing...).
That said, if you really need to do computation when instantiating a Model subclass instance, there's probably nothing inherently wrong with it. I think most people probably prefer to get the data into the right form and then create the entity, which is why you're not seeing a lot of examples like that.