Schema agnostic model design

It is time for a question on good design and performance.
Say I have three django models:
class Student(Model):
    classroom = ForeignKey('Classroom')
    # student info
class Classroom(Model):
    teacher = ForeignKey('Teacher')
    # classroom info
class Teacher(Model):
    # teacher info
I want to make sure that a view has a nice way to access all the students that a teacher has. To do this, it might make sense to have a method defined on the Teacher model
def get_students(self):
    # code
Now, there are a few ways to do this. One of my priorities is to keep each model relatively agnostic to overall database schema. As a result, I don't really like the following solution:
def get_students(self):
    return Student.objects.filter(classroom__teacher=self)
This solution relies on students being related to teachers via a classroom; if I change this structure (maybe students will need to be related to teachers directly, rather than through a classroom), I now have to change the get_students method. If I have a bunch of models and they relate to each other through these nested relationships, changing the schema means hunting down all such filter queries. In my particular case, I have a number of models that exist in different applications and my project is getting quite big, so taking this approach means inviting all sorts of opportunities for me to miss something and create a bug. Even if my tests are good, I will have to spend a lot of time looking for queries.
A solution that seems more elegant to me is to have a Student manager that defines a for_teacher method:
class StudentManager(Manager):
    def for_teacher(self, teacher):
        return self.filter(classroom__teacher=teacher)
Now, my get_students method can look like this:
def get_students(self):
    return Student.objects.for_teacher(self)
With this approach, I've abstracted things so that a Teacher doesn't know how it is related to Student (namely, through Classroom). All it knows is it is related to students somehow. Of course, if I change the schema, I will have to change the StudentManager. However, if I again imagine a project with many models related to other models through different models, this method offers a way to concentrate all the schema-dependent calls in one place (the managers). This saves me from having to hunt down queries in all sorts of models (and views, perhaps).
The question is, is this a sane approach? If not, what is a preferred way to handle this?
A corollary:
As mentioned above, my project has a bunch of models in a bunch of apps and they need to know about each other somehow. So now we have an added issue: if Teacher contains a get_students method and Student has a get_teacher method, I now run into cyclic module dependencies. A potential solution to this new problem is this version of get_students (get_teacher):
def get_students(self):
    from student.models import Student
    return Student.objects.for_teacher(self)
I come from a world where imports are put at the top of a program, so this seems a little strange to me. Is this a reasonable approach? Are there performance considerations when doing dynamic imports like this? Will Python cache the Student import in the get_students method so it only happens once?
Thanks in advance!

Certainly it seems like a fine idea to centralize your schema-dependent operations in one place. However, I don't see any advantage to doing that in Managers instead of Models as you propose.
this method offers a way to concentrate all the schema-dependent calls in one place (the managers)
But the managers aren't in one place, there will be one for each relevant model, and they are usually put in the exact same place - models.py. In your example there's certainly no advantage, since in both cases you would have to change one method if the schema changed.
Don't get me wrong, Managers are great, and StudentManager.for_teacher() might make more sense than Teacher.get_students() based on your needs and access patterns. But I don't see an advantage from an encapsulation point of view.
As for imports, it's common and accepted to import a module inside a function if that's necessary to avoid a circular import, even though it's less Pythonic. PEP 8 advises against it, but doesn't say why. The most often cited reason (in my experience) is that it makes it harder to track the module's dependencies when the imports are spread around the file. Performance is not an important consideration here since Python will indeed cache the imported module.
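The caching claim can be checked with a minimal sketch. It uses the standard-library json module as a stand-in for student.models, and the function name load_json_module is hypothetical. After the first call, the import statement inside the function only looks the module up in sys.modules:

```python
import sys


def load_json_module():
    # Function-level import, in the same spirit as the get_students() example.
    import json
    return json


first = load_json_module()
second = load_json_module()

# Both calls return the very same module object, taken from the sys.modules cache.
print(first is second)
print(sys.modules["json"] is first)
```

So the module body runs once; later calls pay only for a dictionary lookup and a local name binding.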
It's probably also true that circular imports are often a sign that something more serious is going wrong. However, in Django it's not uncommon and not troublesome in and of itself.

Related

Python: What is wrong with subclassing a class from a parent package?

When I try to design package structures and class hierarchies within those packages in Python 3 projects, I'm constantly affected with circular import issues. This is whenever I want to implement a class in some module, subclassing a base class from a parent package (i.e. from some __init__.py from some parent directory).
Although I technically understand why that happens, I was not able to come up with a good solution so far. I found some threads here, but none of them go deeper in that particular scenario, let alone mentioning solutions.
In particular:
Putting everything in one file is maybe not great. It potentially can be quite a mass of things. Maybe 90% of the entire project code?! Maybe it wants to define a common subclass of things (whatever, e.g. widget base class for a ui lib), and then just lots of subclasses in nicely organized subpackages? There are obvious reasons why we would not write masses of stuff in one file.
Implementing your base class outside of __init__.py and just importing it there can work for a few places, but messes up the hierarchy with lots of aliases for the same thing (e.g. myproject.BaseClass vs. myproject.foo.BaseClass). Just not nice, is it? It also leads to a lot of boilerplate code that also wants to be maintained.
Implementing it outside, and even not importing it in __init__.py makes those "fully qualified" notations longer everywhere in the code, because it has to contain .foo everywhere I use BaseClass (yes, I usually do import myproject...somemodule, which is not what everybody does, but should not be bad afaik).
Some kinds of rather dirty tricks could help, e.g. defining those subclasses inside some kind of factory methods, so they are not defined at module level. Can work in a few situations, but is just terrible in general.
All of them are maybe okay in a single isolated situation, more as kind of a workaround imho, but in larger scale it ruins a lot. Also, the dirtier the tricks, the more it also breaks the services an IDE can provide.
All kinds of vague statements like 'then your class structure is probably bad and needs reworking' puzzle me a bit. I'm more or less distantly aware of some kinds of general good coding practices, although I never read "Clean Code" or similar stuff.
So, what is wrong with subclassing a class from a parent module from that abstract perspective? And if nothing is wrong with it, what is the secret workaround that people use in Python 3? Is there one, or is everybody basically dealing with one of the mentioned hacks? Sure, there are tradeoffs to be made everywhere. But here I'm really struggling, even after many years of writing Python code.

Okay, thank you! I think I was a bit on the wrong track here in understanding the actual problem. It's essentially what @tdelaney said. So it's more complicated than what I initially wrote. It's allowed to do that, but you must not import the submodule in some parent packages. This is what I do here in one place for offering some convenience things. So, it's maybe not perfect, maybe the point where you could argue that my structure is not well designed (but, well, things are always a compromise). At least it's admittedly not true what I posted initially. And the workaround to use a few function level imports is maybe kind of okay in that place.

Style guide: how to not get into a mess in a big project

Django==2.2.5
The example below has two custom filters and two auxiliary functions.
It is a fake example, not real code.
Two problems with this code:
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here? To organize a separate module for functions that can be imported? And sort them alphabetically?
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while get_salted_str is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use underline symbol to mark unimported functions? Like this: _get_salted_str. This may ease the first problem a bit.
Does Django style guide or any other pythonic style guide mention solutions to the two above mentioned problems?
My code example:
# Assumed setup for a Django template-tag module (not shown in the original post)
from django import template
register = template.Library()
def combine(str1, str2):
    return "{}_{}".format(str1, str2)
def get_salted_str(str):
    SALT = "slkdjghslkdjfghsldfghaaasd"
    return combine(str, SALT)
@register.filter
def get_salted_string(str):
    return combine(str, get_salted_str(str))
@register.filter
def get_salted_peppered_string(str):
    salted_str = get_salted_str(str)
    PEPPER = "1234128712908369735619346"
    return "{}_{}".format(PEPPER, salted_str)

When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here?
Good documentation and proper modularization.
To organize a separate module for functions that can be imported?
Technically, all functions (except of course nested ones) can be imported. Now I assume you meant: "for functions that are meant to be imported from other modules", but even then, it doesn't mean much - it often happens that a function primarily intended for "internal use" (helper function used within the same module) later becomes useful for other modules.
Also, the proper way to group functions is not based on whether they are for internal or public use (this is handled by prefixing 'internal use only' functions with a single leading underscore), but on how those functions are related.
NB: I use the term "function" because that's how you phrased your question, but this applies to all other names (classes etc).
And sort them alphabetically?
Bad idea IMHO - it doesn't make any sense from a functional POV, and can cause issues when merging diverging branches.
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while "get_salted_str" is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use underline symbol to mark unimported functions? Like this: _get_salted_str. This may ease the first problem a bit.
Why would you actually prevent get_salted_str from being imported by another module?
'protected' (single leading underscore) names are for implementation parts that the module's client code should not mess with nor even be aware of - this is called "encapsulation" -, the goal being to allow for implementation changes that won't break the client code.
In your example, get_salted_str() is a template filter, so it's obviously part of your package's public API.
OTOH, combine really looks like an implementation detail - the fact that some unrelated code in another package may need to combine two strings with the same separator seems mostly accidental, and if you expose combine as part of the module's API you cannot change its implementation anyway. This is typically an implementation function as far as I can tell from your example (and also it's so trivial that it really doesn't warrant being exposed as far as I'm concerned).
On a more general level: while avoiding duplication is a very laudable goal, you must be careful of overdoing it. Some duplication is actually "accidental" - at some point in time, two totally unrelated parts of the code have a few lines in common, but for totally different reasons, and the forces that may lead to a change in one point of the code are totally unrelated to the other part. So before factoring out seemingly duplicated code, ask yourself if this code is doing the same thing for the same reasons and whether changing this code in one part should affect the other part too.
Does Django style guide or any other pythonic style guide mention solutions to the two above mentioned problems?
This is nothing specific to Django, nor even to Python. Writing well organized code relies on the same heuristics whatever the language: you want high cohesion (all functions / classes etc in the same module should be related and provide solutions to the same problems) and low coupling (a module should depend on as few other modules as possible).
NB: I'm talking about "modules" here but the same rules hold for packages (a package is kind of a super-module) or classes (a class is a kind of mini-module too - except that you can have multiple instances of it).
Now it must be said that proper modularisation - like proper naming etc - IS hard. It takes time and experience (and a lot of reflection) to develop (no pun intended but...) a "feel" for it, and even then you often find yourself reorganizing things quite a bit during your project's lifetime. And, well, there will almost always be some messy area somewhere, because sometimes finding out where a given feature really belongs is a bit of a wild guess (hint: look for modules or packages named "util" or "utils" or "helpers" - those are usually where the dev regrouped stuff that didn't clearly belong anywhere else).

There are a lot of ways to go about this, so here is the way I always handle this:
1. Reusable functions in a project
First and foremost: Documentation. When working in a big team you definitely need to document reusable functions.
Second, packages. When creating a lot of auxiliary/helper functions, that might have a use outside the current module or app, it can be useful to bundle them all together. I often create a 'base' or 'utils' package in my Django project where I bundle all sorts of functions.
The django.contrib package is a pretty good example of all sorts of helper packages bundled into one.
My rule of thumb is, if I find that I reuse some function/piece of code, I move it to my utils package, and if it's related to something else in that package, I bundle them together. That makes it pretty easy to keep track of all the functions there are.
2. Private functions
Python doesn't really have private members, but the generally accepted way to 'mark' a member as private is to add a leading underscore, like _get_salted_str.
3. Style guide
With regards to auxiliary functions, I'm not aware of any styleguide.
'Private' members : https://docs.python.org/3/tutorial/classes.html#private-variables
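One place where the single leading underscore has a visible effect is a star import, which skips underscore-prefixed names when the module defines no __all__. The sketch below builds a small module in memory so it can run on its own. The module name text_helpers, the "salt" value and the renamed helper _combine are all made up for illustration:

```python
import sys
import types

# Source of a hypothetical helper module, built in memory for the demo.
source = '''
def _combine(left, right):
    # Internal helper: the leading underscore marks it as an implementation detail.
    return "{}_{}".format(left, right)

def get_salted_str(text):
    # Public function that relies on the internal helper.
    return _combine(text, "salt")
'''

module = types.ModuleType("text_helpers")
exec(source, module.__dict__)
sys.modules["text_helpers"] = module

# Simulate client code doing a star import.
namespace = {}
exec("from text_helpers import *", namespace)

public_names = sorted(name for name in namespace if not name.startswith("__"))
print(public_names)

# The underscore is only a convention: explicit access still works.
print(module._combine("a", "b"))
```

The underscore does not forbid anything. It tells readers and tools which names the module author considers free to change.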

How to determine main program vs class [duplicate]

I have been programming in python for about two years; mostly data stuff (pandas, mpl, numpy), but also automation scripts and small web apps. I'm trying to become a better programmer and increase my python knowledge and one of the things that bothers me is that I have never used a class (outside of copying random flask code for small web apps). I generally understand what they are, but I can't seem to wrap my head around why I would need them over a simple function.
To add specificity to my question: I write tons of automated reports which always involve pulling data from multiple data sources (mongo, sql, postgres, apis), performing a lot or a little data munging and formatting, writing the data to csv/excel/html, send it out in an email. The scripts range from ~250 lines to ~600 lines. Would there be any reason for me to use classes to do this and why?

Classes are the pillar of Object Oriented Programming. OOP is highly concerned with code organization, reusability, and encapsulation.
First, a disclaimer: OOP is partially in contrast to Functional Programming, which is a different paradigm used a lot in Python. Not everyone who programs in Python (or surely most languages) uses OOP. You can do a lot in Java 8 that isn't very Object Oriented. If you don't want to use OOP, then don't. If you're just writing one-off scripts to process data that you'll never use again, then keep writing the way you are.
However, there are a lot of reasons to use OOP.
Some reasons:
Organization:
OOP defines well known and standard ways of describing and defining both data and procedure in code. Both data and procedure can be stored at varying levels of definition (in different classes), and there are standard ways of talking about these definitions. That is, if you use OOP in a standard way, it will help your later self and others understand, edit, and use your code. Also, instead of using a complex, arbitrary data storage mechanism (dicts of dicts or lists or dicts or lists of dicts of sets, or whatever), you can name pieces of data structures and conveniently refer to them.
State: OOP helps you define and keep track of state. For instance, in a classic example, if you're creating a program that processes students (for instance, a grade program), you can keep all the info you need about them in one spot (name, age, gender, grade level, courses, grades, teachers, peers, diet, special needs, etc.), and this data is persisted as long as the object is alive, and is easily accessible. In contrast, in pure functional programming, state is never mutated in place.
Encapsulation:
With encapsulation, procedure and data are stored together. Methods (an OOP term for functions) are defined right alongside the data that they operate on and produce. In a language like Java that allows for access control, or in Python, depending upon how you describe your public API, this means that methods and data can be hidden from the user. What this means is that if you need or want to change code, you can do whatever you want to the implementation of the code, but keep the public APIs the same.
Inheritance:
Inheritance allows you to define data and procedure in one place (in one class), and then override or extend that functionality later. For instance, in Python, I often see people creating subclasses of the dict class in order to add additional functionality. A common change is overriding the method that throws an exception when a key is requested from a dictionary that doesn't exist to give a default value based on an unknown key. This allows you to extend your own code now or later, allow others to extend your code, and allows you to extend other people's code.
Reusability: All of these reasons and others allow for greater reusability of code. Object oriented code allows you to write solid (tested) code once, and then reuse over and over. If you need to tweak something for your specific use case, you can inherit from an existing class and overwrite the existing behavior. If you need to change something, you can change it all while maintaining the existing public method signatures, and no one is the wiser (hopefully).
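The dict subclass mentioned under Inheritance can be sketched as follows. The class name DefaultingDict and the sample grades are hypothetical. The hook that dict calls for a missing key in a subclass is `__missing__`:

```python
class DefaultingDict(dict):
    """A dict that returns a fixed default for unknown keys instead of raising KeyError."""

    def __init__(self, default, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default = default

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is not present.
        return self.default


grades = DefaultingDict(0.0, math=3.3)
print(grades["math"])   # stored value
print(grades["art"])    # falls back to the default
print("art" in grades)  # the missing key is not inserted
```

The standard library already covers a similar need with collections.defaultdict, which inserts the default on first access. The subclass above is only meant to show the override mechanism.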
Again, there are several reasons not to use OOP, and you don't need to. But luckily with a language like Python, you can use just a little bit or a lot, it's up to you.
An example of the student use case (no guarantee on code quality, just an example):
Object Oriented
class Student(object):
    def __init__(self, name, age, gender, level, grades=None):
        self.name = name
        self.age = age
        self.gender = gender
        self.level = level
        self.grades = grades or {}
    def setGrade(self, course, grade):
        self.grades[course] = grade
    def getGrade(self, course):
        return self.grades[course]
    def getGPA(self):
        return sum(self.grades.values())/len(self.grades)
# Define some students
john = Student("John", 12, "male", 6, {"math":3.3})
jane = Student("Jane", 12, "female", 6, {"math":3.5})
# Now we can get to the grades easily
print(john.getGPA())
print(jane.getGPA())
Standard Dict
def calculateGPA(gradeDict):
    return sum(gradeDict.values())/len(gradeDict)
students = {}
# We can set the keys to variables so we might minimize typos
name, age, gender, level, grades = "name", "age", "gender", "level", "grades"
john, jane = "john", "jane"
math = "math"
students[john] = {}
students[john][age] = 12
students[john][gender] = "male"
students[john][level] = 6
students[john][grades] = {math:3.3}
students[jane] = {}
students[jane][age] = 12
students[jane][gender] = "female"
students[jane][level] = 6
students[jane][grades] = {math:3.5}
# At this point, we need to remember who the students are and where the grades are stored. Not a huge deal, but avoided by OOP.
print(calculateGPA(students[john][grades]))
print(calculateGPA(students[jane][grades]))

Use a class whenever you need to maintain state between function calls and that cannot be accomplished with generators (functions which yield rather than return). Generators maintain their own state.
If you want to override any of the standard operators, you need a class.
Whenever you have a use for a Visitor pattern, you'll need classes. Every other design pattern can be accomplished more effectively and cleanly with generators, context managers (which are also better implemented as generators than as classes) and POD types (dictionaries, lists and tuples, etc.).
If you want to write "pythonic" code, you should prefer context managers and generators over classes. It will be cleaner.
If you want to extend functionality, you will almost always be able to accomplish it with containment rather than inheritance.
As every rule, this has an exception. If you want to encapsulate functionality quickly (ie, write test code rather than library-level reusable code), you can encapsulate the state in a class. It will be simple and won't need to be reusable.
If you need a C++ style destructor (RAII), you definitely do NOT want to use classes. You want context managers.
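A minimal sketch of the two tools this answer favors: a generator that keeps its own state between calls, and a generator-based context manager that guarantees cleanup. The names running_mean and opened_resource and the file name are hypothetical:

```python
from contextlib import contextmanager


def running_mean():
    """Generator that remembers a running total and count between send() calls."""
    total, count, mean = 0.0, 0, None
    while True:
        value = yield mean
        total += value
        count += 1
        mean = total / count


averager = running_mean()
next(averager)               # advance to the first yield
first_mean = averager.send(3.0)
second_mean = averager.send(4.0)

events = []


@contextmanager
def opened_resource(name):
    """Context manager written as a generator; the finally block plays the destructor role."""
    events.append("open " + name)
    try:
        yield name
    finally:
        events.append("close " + name)


with opened_resource("report.csv") as resource:
    events.append("use " + resource)

print(first_mean, second_mean)
print(events)
```

Whether this reads better than a small class is a matter of taste. The state lives in local variables of the generator rather than in attributes.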

I think you do it right. Classes are reasonable when you need to simulate some business logic or difficult real-life processes with difficult relations.
As example:
Several functions that share state
More than one copy of the same state variables
To extend the behavior of an existing functionality
I also suggest watching this classic video

dantiston gives a great answer on why OOP can be useful. However, it is worth noting that OOP is not necessarily the better choice in most cases where it is used. OOP has the advantage of combining data and methods together. In terms of application, I would say: use OOP only if all the functions/methods deal, and only deal, with a particular set of data and nothing else.
Consider a functional programming refactoring of dantiston's example:
def dictMean( nums ):
    return sum(nums.values())/len(nums)
# It's good to include automatic tests for production code, to ensure that updates don't break old codes
assert( dictMean({'math':3.3,'science':3.5})==3.4 )
john = {'name':'John', 'age':12, 'gender':'male', 'level':6, 'grades':{'math':3.3}}
# setGrade
john['grades']['science']=3.5
# getGrade
print(john['grades']['math'])
# getGPA
print(dictMean(john['grades']))
At a first look, it seems like all the 3 methods exclusively deal with GPA, until you realize that Student.getGPA() can be generalized as a function to compute mean of a dict, and re-used on other problems, and the other 2 methods reinvent what dict can already do.
The functional implementation gains:
Simplicity. No boilerplate class or selfs.
Easily add automatic test code right after each function for easy maintenance.
Easily split into several programs as your code scales.
Reusability for purposes other than computing GPA.
The functional implementation loses:
Typing in 'name', 'age', 'gender' as dict keys each time is not very DRY (don't repeat yourself). It's possible to avoid that by changing the dict to a list. Sure, a list is less clear than a dict, but this is a non-issue if you include automatic test code below anyway.
Issues this example doesn't cover:
OOP inheritance can be supplanted by function callback.
Calling an OOP class has to create an instance of it first. This can be boring when you don't have data in __init__(self).

A class defines a real world entity. If you are working on something that exists individually and has its own logic that is separate from others, you should create a class for it. For example, a class that encapsulates database connectivity.
If this is not the case, there is no need to create a class.

It depends on your idea and design. If you are a good designer, then OOPs will come out naturally in the form of various design patterns.
For simple script-level processing, OOPs can be overhead.
Simply consider the basic benefits of OOPs like reusability and extendability and make sure if they are needed or not.
OOPs make complex things simpler and simpler things complex.
Simply keep the things simple in either way using OOPs or not using OOPs. Whichever is simpler, use that.

Design pattern in Python with a lot of customer specific exceptions

At this moment we are working on a large project.
This project is supposed to create EDIFACT messages. This is not so hard at first but the catch is there are a lot of customers that have their own implementation of the standard.
On top of that we are working with several EDIFACT standards (D96A and D01B in our case.)
Some customer exceptions might be as small as having a divergent field length, but some have made their own implementation completely different.
At this moment we have listed the customer exceptions in a list (Just to keep them consistent) and in the code we use something like:
if NAME_LENGTH_IS_100 in customer_exceptions:
    self.max_length = 100
else:
    self.max_length = 70
For a couple of simple exceptions this works just fine, but at this moment the code starts to get really cluttered and we are thinking about refactoring the code.
I am thinking about some kind of factory pattern, but I am not sure about the implementation.
Another option would be to create a base package and make a separate implementation for every customer that is diverging from the standard.
I hope someone can help me out with some advice.
Thanks in advance.

I think your question is too broad to be answered properly (I was about to click the close button because of this but decided otherwise). The reason for this is the following:
There is nothing wrong with the code snippet you provided. If it is part of some kind of initialization routine, then it is just fine the way it is. It also doesn't hurt to have a large number of things like this.
But how to handle more complex cases depends greatly on the cases themselves.
For lots of situations it might be sufficient to have such variables which represent the customer's special choices.
For other aspects I'd propose to have a Customer base class with subclasses thereof, for each customer one (or maybe the customers can even be hierarchically grouped, then a nice inheritance tree could reflect this).
For other cases again I'd propose aspect-oriented programming by use of Python decorators to tweak the behavior of methods, functions, and classes.
Since this depends greatly on your concrete usecases, I think this question cannot be answered more concretely than this.
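As one possible reading of the base-class suggestion, here is a minimal sketch. The class names, the customer id "customer_a" and the format_name method are hypothetical. Only the 70 and 100 character limits come from the question:

```python
class MessageProfile:
    """Standard behavior; customers that follow the standard use this as-is."""

    name_max_length = 70

    def format_name(self, name):
        # Truncate to whatever limit the concrete profile defines.
        return name[: self.name_max_length]


class LongNameProfile(MessageProfile):
    """Hypothetical customer that allows 100-character names."""

    name_max_length = 100


# Registry mapping customer ids to profiles; unknown customers get the standard.
PROFILES = {"customer_a": LongNameProfile}


def profile_for(customer_id):
    return PROFILES.get(customer_id, MessageProfile)()


long_name = "x" * 150
print(len(profile_for("customer_a").format_name(long_name)))
print(len(profile_for("someone_else").format_name(long_name)))
```

The if/else on customer_exceptions then moves out of the message-building code and into the choice of profile, which happens once per customer.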

Why not put all this in a resource file, with the standard as the default and each customer exception handled as an overriding value? Then you just need to read the right key for the right customer, and your code stays clean.
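The default-plus-override idea can be sketched with collections.ChainMap. The settings are inlined as dicts here, but they could equally be loaded from a resource file. The key names and customer ids are hypothetical, while the limits and EDIFACT versions are the ones mentioned in the question:

```python
from collections import ChainMap

# Values that apply when a customer follows the standard.
STANDARD_DEFAULTS = {"name_max_length": 70, "edifact_version": "D96A"}

# Only the deviations are listed per customer.
CUSTOMER_OVERRIDES = {
    "customer_a": {"name_max_length": 100},
    "customer_b": {"edifact_version": "D01B"},
}


def settings_for(customer_id):
    # Lookups check the customer's overrides first, then fall back to the defaults.
    return ChainMap(CUSTOMER_OVERRIDES.get(customer_id, {}), STANDARD_DEFAULTS)


settings_a = settings_for("customer_a")
settings_unknown = settings_for("unknown")
print(settings_a["name_max_length"], settings_a["edifact_version"])
print(settings_unknown["name_max_length"])
```

This covers deviations that are plain values. Customers whose implementation differs in behavior would still need code, not just configuration.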

Python: add a parent class to a class after initial evaluation

General Python Question
I'm importing a Python library (call it animals.py) with the following class structure:
class Animal(object): pass
class Rat(Animal): pass
class Bat(Animal): pass
class Cat(Animal): pass
...
I want to add a parent class (Pet) to each of the species classes (Rat, Bat, Cat, ...); however, I cannot change the actual source of the library I'm importing, so it has to be a run time change.
The following seems to work:
import animals
class Pet(object): pass
for klass in (animals.Rat, animals.Bat, animals.Cat, ...):
    klass.__bases__ = (Pet,) + klass.__bases__
Is this the best way to inject a parent class into an inheritance tree in Python without making modification to the source definition of the class to be modified?
Motivating Circumstances
I'm trying to graft persistence onto a large library that controls lab equipment. Messing with it is out of the question. I want to give ZODB's Persistent a try. I don't want to write the mixin/facade wrapper library because I'm dealing with 100+ classes and lots of imports in my application code that would need to be updated. I'm testing options by hacking on my entry point only: setting up the DB, patching as shown above (but pulling the species classes w/ introspection on the animals module instead of explicit listing) then closing out the DB as I exit.
Mea Culpa / Request
This is an intentionally general question. I'm interested in different approaches to injecting a parent and comments on the pros and cons of those approaches. I agree that this sort of runtime chicanery would make for really confusing code. If I settle on ZODB I'll do something explicit. For now, as a regular user of python, I'm curious about the general case.

Your method is pretty much how to do it dynamically. The real question is: What does this new parent class add? If you are trying to insert your own methods in a method chain that exists in the classes already there, and they were not written properly, you won't be able to; if you are adding original methods (e.g. an interface layer), then you could possibly just use functions instead.
I am one who embraces Python's dynamic nature, and would have no problem using the code you have presented. Make sure you have good unit tests in place (dynamic or not ;), and that modifying the inheritance tree actually lets you do what you need, and enjoy Python!
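To see what the patch from the question actually changes, here is a self-contained sketch. Animal and Rat are stand-ins for the library classes, and the describe method on Pet is a made-up example of added behavior:

```python
# Stand-ins for the third-party classes (in the real case these cannot be edited).
class Animal(object):
    pass


class Rat(Animal):
    pass


class Pet(object):
    def describe(self):
        return "{} is a pet".format(type(self).__name__)


rat_before = Rat()  # instance created before the patch

# Inject Pet in front of the existing bases, as in the question.
Rat.__bases__ = (Pet,) + Rat.__bases__

mro_names = [cls.__name__ for cls in Rat.__mro__]
print(mro_names)
print(rat_before.describe())
print(isinstance(rat_before, Pet))
```

Attribute lookup goes through the class at call time, so instances that already exist pick up the new parent too. That is convenient, and it is also part of why this kind of patching can surprise people.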

You should try really hard not to do this. It is strange, and will likely end in tears.
As @agf mentions, you can use Pet as a mixin. If you tell us more about why you want to insert a parent class, we can help you find a nicer solution.
