splitting a class that is too large - python

I have a class BigStructure that builds a complicated data structure from some input. It also includes methods that perform operations on that data structure.
The class grew too large, so I'm trying to split it in two to help maintainability. I was thinking that it would be natural to move the operations into a new class, say class OperationsOnBigStructure.
Unfortunately, since class BigStructure is quite unique, OperationsOnBigStructure cannot be reasonably reused with any other class. In a sense, it's forever tied to BigStructure. For example, a typical operation may consist of traversing a big structure instance in a way that is only meaningful for a BigStructure object.
Now, I have two classes, but it feels like I haven't improved anything. In fact, I made things slightly more complicated, since I now need to pass the BigStructure object to the methods in OperationsOnBigStructure, and they need to store that object internally.
Should I just live with one big class?

The solution I came up with for this problem was to create a package to contain the class.
Something along the lines of:
MyClass/
__init__.py
method_a.py
method_b.py
...
in my case __init__.py contains the actual datastructure definition, but no methods. To 'attach' the methods to the class I just import them into the class' namespace.
Contents of method_a.py:
def method_a(self, msg):
print 'a: %s' % str(msg)
Contents of __init__.py:
class MyClass():
from method_a import method_a
from method_b import method_b
def method_c(self):
print 'c'
In the python console:
>>> from MyClass import MyClass
>>> a = MyClass()
>>> dir(a)
['__doc__', '__module__', 'method_a', 'method_b', 'method_c']
>>> a.method_a('hello world')
a: hello world
>>> a.method_c()
c
This has worked for me.

I was thinking that it would be natural to move the operations into a new class, say class OperationsOnBigStructure.
I would say, that's quite the opposite of what Object Oriented Design is all about. The idea behind OOD is to keep data and methods together.
Usually a (too) big class is a sign of too much responsibility: i.e. your class is simply doing too much. It seems that you first defined a data structure and then added functions to it. You could try to break the data structure into substructures and define independent classes for these (i.e. use aggregation). But without knowing more it's difficult to say...
Of course sometimes, a program just runs fine with one big class. But if you feel incomfortable with it yourself, that's a strong hint to start doing something against ...

"""Now, I have two classes, but it feels like I haven't improved anything. In fact, I made things slightly more complicated, since I now need to pass the BigStructure object to the methods in OperationsOnBigStructure, and they need to store that object internally."""
I think a natural approach there would be to have "OperationsOnBigStructure" to inherit from bigstructure - therefore you have all the relevant code in one place, without the extra parameter passing,as the data it needs to operate on will be contained in "self".

For example, a typical operation may consist of traversing a big
structure instance in a way that is only meaningful for a BigStructure
object.
Perhaps you could write some generators as methods of BigStructure that would do the grunt work of traversal. Then, OperationsOnBigStructure could just loop over an iterator when doing a task, which might improve readability of the code.
So, by having two classes instead of one, you are raising the level of abstraction at two stages.

At first make sure you have high test coverage, this will boost your refactoring experience. If there are no or not enough unittests, create them.
Then make reasonable small refactoring steps and keep the unittests working:
As a rule of thumb try to keep the core functionality together in the big class. Try not to draw a border if there is too much coupling.
At first refactor sub tasks to functions in seperate libraries. If it is possible to abstract things, move that functionality to libraries.
Then make the code more clean and reorder it, until you can see the structure it really 'wants' to have.
If in the end you feel you can still cut it in two classes and this is quite natural according to the inner structure, then consider really doing it.
Always keep the test coverage high enough and refactor always after you made some changes. AFter some time you will have much more beautiful code.

Related

How to determine main program vs class [duplicate]

I have been programming in python for about two years; mostly data stuff (pandas, mpl, numpy), but also automation scripts and small web apps. I'm trying to become a better programmer and increase my python knowledge and one of the things that bothers me is that I have never used a class (outside of copying random flask code for small web apps). I generally understand what they are, but I can't seem to wrap my head around why I would need them over a simple function.
To add specificity to my question: I write tons of automated reports which always involve pulling data from multiple data sources (mongo, sql, postgres, apis), performing a lot or a little data munging and formatting, writing the data to csv/excel/html, send it out in an email. The scripts range from ~250 lines to ~600 lines. Would there be any reason for me to use classes to do this and why?
Classes are the pillar of Object Oriented Programming. OOP is highly concerned with code organization, reusability, and encapsulation.
First, a disclaimer: OOP is partially in contrast to Functional Programming, which is a different paradigm used a lot in Python. Not everyone who programs in Python (or surely most languages) uses OOP. You can do a lot in Java 8 that isn't very Object Oriented. If you don't want to use OOP, then don't. If you're just writing one-off scripts to process data that you'll never use again, then keep writing the way you are.
However, there are a lot of reasons to use OOP.
Some reasons:
Organization:
OOP defines well known and standard ways of describing and defining both data and procedure in code. Both data and procedure can be stored at varying levels of definition (in different classes), and there are standard ways about talking about these definitions. That is, if you use OOP in a standard way, it will help your later self and others understand, edit, and use your code. Also, instead of using a complex, arbitrary data storage mechanism (dicts of dicts or lists or dicts or lists of dicts of sets, or whatever), you can name pieces of data structures and conveniently refer to them.
State: OOP helps you define and keep track of state. For instance, in a classic example, if you're creating a program that processes students (for instance, a grade program), you can keep all the info you need about them in one spot (name, age, gender, grade level, courses, grades, teachers, peers, diet, special needs, etc.), and this data is persisted as long as the object is alive, and is easily accessible. In contrast, in pure functional programming, state is never mutated in place.
Encapsulation:
With encapsulation, procedure and data are stored together. Methods (an OOP term for functions) are defined right alongside the data that they operate on and produce. In a language like Java that allows for access control, or in Python, depending upon how you describe your public API, this means that methods and data can be hidden from the user. What this means is that if you need or want to change code, you can do whatever you want to the implementation of the code, but keep the public APIs the same.
Inheritance:
Inheritance allows you to define data and procedure in one place (in one class), and then override or extend that functionality later. For instance, in Python, I often see people creating subclasses of the dict class in order to add additional functionality. A common change is overriding the method that throws an exception when a key is requested from a dictionary that doesn't exist to give a default value based on an unknown key. This allows you to extend your own code now or later, allow others to extend your code, and allows you to extend other people's code.
Reusability: All of these reasons and others allow for greater reusability of code. Object oriented code allows you to write solid (tested) code once, and then reuse over and over. If you need to tweak something for your specific use case, you can inherit from an existing class and overwrite the existing behavior. If you need to change something, you can change it all while maintaining the existing public method signatures, and no one is the wiser (hopefully).
Again, there are several reasons not to use OOP, and you don't need to. But luckily with a language like Python, you can use just a little bit or a lot, it's up to you.
An example of the student use case (no guarantee on code quality, just an example):
Object Oriented
class Student(object):
def __init__(self, name, age, gender, level, grades=None):
self.name = name
self.age = age
self.gender = gender
self.level = level
self.grades = grades or {}
def setGrade(self, course, grade):
self.grades[course] = grade
def getGrade(self, course):
return self.grades[course]
def getGPA(self):
return sum(self.grades.values())/len(self.grades)
# Define some students
john = Student("John", 12, "male", 6, {"math":3.3})
jane = Student("Jane", 12, "female", 6, {"math":3.5})
# Now we can get to the grades easily
print(john.getGPA())
print(jane.getGPA())
Standard Dict
def calculateGPA(gradeDict):
return sum(gradeDict.values())/len(gradeDict)
students = {}
# We can set the keys to variables so we might minimize typos
name, age, gender, level, grades = "name", "age", "gender", "level", "grades"
john, jane = "john", "jane"
math = "math"
students[john] = {}
students[john][age] = 12
students[john][gender] = "male"
students[john][level] = 6
students[john][grades] = {math:3.3}
students[jane] = {}
students[jane][age] = 12
students[jane][gender] = "female"
students[jane][level] = 6
students[jane][grades] = {math:3.5}
# At this point, we need to remember who the students are and where the grades are stored. Not a huge deal, but avoided by OOP.
print(calculateGPA(students[john][grades]))
print(calculateGPA(students[jane][grades]))
Whenever you need to maintain a state of your functions and it cannot be accomplished with generators (functions which yield rather than return). Generators maintain their own state.
If you want to override any of the standard operators, you need a class.
Whenever you have a use for a Visitor pattern, you'll need classes. Every other design pattern can be accomplished more effectively and cleanly with generators, context managers (which are also better implemented as generators than as classes) and POD types (dictionaries, lists and tuples, etc.).
If you want to write "pythonic" code, you should prefer context managers and generators over classes. It will be cleaner.
If you want to extend functionality, you will almost always be able to accomplish it with containment rather than inheritance.
As every rule, this has an exception. If you want to encapsulate functionality quickly (ie, write test code rather than library-level reusable code), you can encapsulate the state in a class. It will be simple and won't need to be reusable.
If you need a C++ style destructor (RIIA), you definitely do NOT want to use classes. You want context managers.
I think you do it right. Classes are reasonable when you need to simulate some business logic or difficult real-life processes with difficult relations.
As example:
Several functions with share state
More than one copy of the same state variables
To extend the behavior of an existing functionality
I also suggest you to watch this classic video
dantiston gives a great answer on why OOP can be useful. However, it is worth noting that OOP is not necessary a better choice most cases it is used. OOP has the advantage of combining data and methods together. In terms of application, I would say that use OOP only if all the functions/methods are dealing and only dealing with a particular set of data and nothing else.
Consider a functional programming refactoring of dentiston's example:
def dictMean( nums ):
return sum(nums.values())/len(nums)
# It's good to include automatic tests for production code, to ensure that updates don't break old codes
assert( dictMean({'math':3.3,'science':3.5})==3.4 )
john = {'name':'John', 'age':12, 'gender':'male', 'level':6, 'grades':{'math':3.3}}
# setGrade
john['grades']['science']=3.5
# getGrade
print(john['grades']['math'])
# getGPA
print(dictMean(john['grades']))
At a first look, it seems like all the 3 methods exclusively deal with GPA, until you realize that Student.getGPA() can be generalized as a function to compute mean of a dict, and re-used on other problems, and the other 2 methods reinvent what dict can already do.
The functional implementation gains:
Simplicity. No boilerplate class or selfs.
Easily add automatic test code right after each
function for easy maintenance.
Easily split into several programs as your code scales.
Reusability for purposes other than computing GPA.
The functional implementation loses:
Typing in 'name', 'age', 'gender' in dict key each time is not very DRY (don't repeat yourself). It's possible to avoid that by changing dict to a list. Sure, a list is less clear than a dict, but this is a none issue if you include an automatic test code below anyway.
Issues this example doesn't cover:
OOP inheritance can be supplanted by function callback.
Calling an OOP class has to create an instance of it first. This can be boring when you don't have data in __init__(self).
A class defines a real world entity. If you are working on something that exists individually and has its own logic that is separate from others, you should create a class for it. For example, a class that encapsulates database connectivity.
If this not the case, no need to create class
It depends on your idea and design. If you are a good designer, then OOPs will come out naturally in the form of various design patterns.
For simple script-level processing, OOPs can be overhead.
Simply consider the basic benefits of OOPs like reusability and extendability and make sure if they are needed or not.
OOPs make complex things simpler and simpler things complex.
Simply keep the things simple in either way using OOPs or not using OOPs. Whichever is simpler, use that.

Best way to design a class in python

So, this is more like a philosophical question for someone who is trying to understand classes.
Most of time, how i use class is actually a very bad way to use it. I think of a lot of functions and after a time just indent the code and makes it a class and replacing few stuff with self.variable if a variable is repeated a lot. (I know its bad practise)
But anyways... What i am asking is:
class FooBar:
def __init__(self,foo,bar):
self._foo = foo
self._bar = bar
self.ans = self.__execute()
def __execute(self):
return something(self._foo, self._bar)
Now there are many ways to do this:
class FooBar:
def __init__(self,foo):
self._foo = foo
def execute(self,bar):
return something(self._foo, bar)
Can you suggest which one is bad and which one is worse?
or any other way to do this.
This is just a toy example (offcourse). I mean, there is no need to have a class here if there is one function.. but lets say in __execute something() calls a whole set of other methods.. ??
Thanks
If each FooBar is responsible for bar then the first is correct. If bar is only needed for execute() but not FooBar's problem otherwise, the second is correct.
In a single phrase, the formal term you want to worry about here is Separation of Concerns.
In response to your specific example, which of the two examples you choose depends on the concerns solved by FooBar and bar. If bar is in the same problem domain as FooBar or if it otherwise makes sense for FooBar to store a reference to bar, then the second example is correct. For instance, if FooBar has multiple methods that accept a bar and if you always pass the same instance of bar to each of a particular FooBar instance's bar-related methods, then the prior example is correct. Otherwise, the latter is more correct.
Ideally, each class you create should model exactly one major concern of your program. This can get a bit tricky, because you have to decide the granularity of "major concern" for yourself. Look to your dependency tree to determine if you're doing this correctly. It should be relatively easy to pull each individual class out of your program and test it in isolation from all other classes. If this is very difficult, you haven't split your concerns correctly. More formally, good separation of concerns is accomplished by designing classes which are cohesive and loosely coupled.
While this isn't useful for the example you posted, one simple pattern that helps accomplish this on a broader scale is Inversion of Control (IoC, sometimes called Dependency Injection). Under IoC, classes are written such that they aren't aware of their dependencies directly, only the interfaces (protocols in Python-speak) which their dependencies implement. Then at run time, typically during application initialization, instances of major classes are created by factories, and they are assigned references to their actual dependencies. See this article (with this example) for an explanation of how this can be accomplished in Python.
Finally, learn the four tenets of object-oriented programming.

Python: add a parent class to a class after initial evaluation

General Python Question
I'm importing a Python library (call it animals.py) with the following class structure:
class Animal(object): pass
class Rat(Animal): pass
class Bat(Animal): pass
class Cat(Animal): pass
...
I want to add a parent class (Pet) to each of the species classes (Rat, Bat, Cat, ...); however, I cannot change the actual source of the library I'm importing, so it has to be a run time change.
The following seems to work:
import animals
class Pet(object): pass
for klass in (animals.Rat, animals.Bat, animals.Cat, ...):
klass.__bases__ = (Pet,) + klass.__bases__
Is this the best way to inject a parent class into an inheritance tree in Python without making modification to the source definition of the class to be modified?
Motivating Circumstances
I'm trying to graft persistence onto the a large library that controls lab equipment. Messing with it is out of the question. I want to give ZODB's Persistent a try. I don't want to write the mixin/facade wrapper library because I'm dealing with 100+ classes and lots of imports in my application code that would need to be updated. I'm testing options by hacking on my entry point only: setting up the DB, patching as shown above (but pulling the species classes w/ introspection on the animals module instead of explicit listing) then closing out the DB as I exit.
Mea Culpa / Request
This is an intentionally general question. I'm interested in different approaches to injecting a parent and comments on the pros and cons of those approaches. I agree that this sort of runtime chicanery would make for really confusing code. If I settle on ZODB I'll do something explicit. For now, as a regular user of python, I'm curious about the general case.
Your method is pretty much how to do it dynamically. The real question is: What does this new parent class add? If you are trying to insert your own methods in a method chain that exists in the classes already there, and they were not written properly, you won't be able to; if you are adding original methods (e.g. an interface layer), then you could possibly just use functions instead.
I am one who embraces Python's dynamic nature, and would have no problem using the code you have presented. Make sure you have good unit tests in place (dynamic or not ;), and that modifying the inheritance tree actually lets you do what you need, and enjoy Python!
You should try really hard not to do this. It is strange, and will likely end in tears.
As #agf mentions, you can use Pet as a mixin. If you tell us more about why you want to insert a parent class, we can help you find a nicer solution.

Declaring types for complex data structures in python

I am quite new to python programming (C/C++ background).
I'm writing code where I need to use complex data structures like dictionaries of dictionaries of lists.
The issue is that when I must use these objects I barely remember their structure and so how to access them.
This makes it difficult to resume working on code that was untouched for days.
A very poor solution is to use comments for each variable, but that's very inflexible.
So, given that python variables are just pointers to memory and they cannot be statically type-declared, is there any convention or rule that I could follow to ease complex data structures usage?
If you use docstrings in your classes then you can use help(vargoeshere) to see how to use it.
Whatever you do, do NOT, I repeat, do NOT use Hungarian Notation! It causes severe brain & bit rot.
So, what can you do? Python and C/C++ are quite different. In C++ you typically handle polymorphic calls like so:
void doWithFooThing(FooThing *foo) {
foo->bar();
}
Dynamic polymorphism in C++ depends on inheritance: the pointer passed to doWithFooThing may point only to instances of FooThing or one of its subclasses. Not so in Python:
def do_with_fooish(fooish):
fooish.bar()
Here, any sufficiently fooish thing (i.e. everything that has a callable bar attribute) can be used, no matter how it is releated to any other fooish thing through inheritance.
The point here is, in C++ you know what (base-)type every object has, whereas in Python you don't, and you don't care. What you try to achieve in Python is code that is reusable in as many situations as possible without having to force everthing under the rigid rule of class inheritance. Your naming should also reflect that. You dont write:
def some_action(a_list):
...
but:
def some_action(seq):
...
where seq might be not only a list, but any iterable sequence, be it list, tuple, dict, set, iterator, whatever.
In general, you put emphasis on the intent of your code, instead of its the type structure. Instead of writing:
dict_of_strings_to_dates = {}
you write:
users_birthdays = {}
It also helps to keep functions short, even more so than in C/C++. Then you'll be easily able to see what's going on.
Another thing: you shouldn't think of Python variables as pointers to memory. They're in fact dicionary entries:
assert foo.bar == getattr(foo, 'bar') == foo.__dict__['bar']
Not always exactly so, I concur, but the details can be looked up at docs.python.org.
And, BTW, in Python you don't declare stuff like you do in C/C++. You just define stuff.
I believe you should take a good look some of your complex structures, what you are doing with them, and ask... Is This Pythonic? Ask here on SO. I think you will find some cases where the complexity is an artifact of C/C++.
Include an example somewhere in your code, or in your tests.

How can I create global classes in Python (if possible)?

Let's suppose I have several functions for a RPG I'm working on...
def name_of_function():
action
and wanted to implement axe class (see below) into each function without having to rewrite each class. How would I create the class as a global class. I'm not sure if I'm using the correct terminology or not on that, but please help. This has always held me abck from creating Text based RPG games. An example of a global class would be awesome!
class axe:
attack = 5
weight = 6
description = "A lightweight battle axe."
level_required = 1
price = 10
You can't create anything that's truly global in Python - that is, something that's magically available to any module no matter what. But it hardly matters once you understand how modules and importing work.
Typically, you create classes and organize them into modules. Then you import them into whatever module needs them, which adds the class to the module's symbol table.
So for instance, you might create a module called weapons.py, and create a WeaponBase class in it, and then Axe and Broadsword classes derived from WeaponsBase. Then, in any module that needed to use weapons, you'd put
import weapons
at the top of the file. Once you do this, weapons.Axe returns the Axe class, weapons.Broadsword returns the Broadsword class, and so on. You could also use:
from weapons import Axe, Broadsword
which adds Axe and Broadsword to the module's symbol table, allowing code to do pretty much exactly what you are saying you want it to do.
You can also use
from weapons import *
but this generally is not a great idea for two reasons. First, it imports everything in the module whether you're going to use it or not - WeaponsBase, for instance. Second, you run into all kinds of confusing problems if there's a function in weapons that's got the same name as a function in the importing module.
There are a lot of subtleties in the import system. You have to be careful to make sure that modules don't try to import each other, for instance. And eventually your project gets large enough that you don't want to put all of its modules in the same directory, and you'll have to learn about things like __init__.py. But you can worry about that down the road.
i beg to differ with the view that you can't create something truly global in python. in fact, it is easy. in Python 3.1, it looks like this:
def get_builtins():
"""Due to the way Python works, ``__builtins__`` can strangely be either a module or a dictionary,
depending on whether the file is executed directly or as an import. I couldn’t care less about this
detail, so here is a method that simply returns the namespace as a dictionary."""
return getattr( __builtins__, '__dict__', __builtins__ )
like a bunch of other things, builtins are one point where Py3 differs in details from the way it used to work in Py2. read the "What's New in Python X.X" documents on python.org for details. i have no idea what the reason for the convention mentioned above might be; i just want to ignore that stuff. i think that above code should work in Py2 as well.
so the point here is there is a __builtins__ thingie which holds a lot of stuff that comes as, well, built-into Python. all the sum, max, range stuff you've come to love. well, almost everything. but you don't need the details, really. the simplest thing you could do is to say
G = get_builtins()
G[ 'G' ] = G
G[ 'axe' ] = axe
at a point in your code that is always guaranteed to execute. G stands in for the globally available namespace, and since i've registered G itself within G, G now magically transcends its existence into the background of every module. means you should use it with care. where naming collisions occur between whatever is held in G and in a module's namespace, the module's namespace should win (as it gets inspected first). also, be prepared for everybody to jump on you when you tell them you're POLLUTING THE GLOBAL NAMESPACE dammit. i'm relly surprised noone has copmplained about that as yet, here.
well, those people would be quite right, in a way. personally, however, this is one of my main application composition techniques: what you do is you take a step away from the all-purpose module (which shouldn't do such a thing) towards a fine-tuned application-specific namespace. your modules are bound not to work outside that namespace, but then they're not supposed to, either. i actually started this style of programming as an experimental rebellion against (1) established views, hah!, and (2) the desperation that befalls me whenever i want to accomplish something less than trivial using Python's import statement. these days, i only use import for standard library stuff and regularly-installed modules; for my own stuff, i use a homegrown system. what can i say, it works!
ah yes, two more points: do yourself a favor, if you like this solution, and write yourself a publish() method or the like that oversees you never publish a name that has already been taken. in most cases, you do not want that.
lastly, let me second the first commenter: i have been programming in exactly the style you show above, coz that's what you find in the textbook examples (most of the time using cars, not axes to be sure). for a rather substantial number of reasons, i've pretty much given up on that.
consider this: JSON defines but seven kinds of data: null, true, false, numbers, texts, lists, dictionaries, that's it. i claim you can model any other useful datatype with those.
there is still a lot of justification for things like sets, bags, ordered dictionaries and so on. the claim here is not that it is always convenient or appropriate to fall back on a pure, directly JSON-compatible form; the claim is only that it is possible to simulate. right now, i'm implementing a sparse list for use in a messaging system, and that data type i do implement in classical OOP. that's what it's good for.
but i never define classes that go beyond these generic datatypes. rather, i write libraries that take generic datatypes and that provide the functionality you need. all of my business data (in your case probably representations of players, scenes, implements and so on) go into generic data container (as a rule, mostly dicts). i know there are open questions with this way of architecturing things, but programming has become ever so much easier, so much more fluent since i broke thru the BS that part of OOP propaganda is (apart from the really useful and nice things that another part of OOP is).
oh yes, and did i mention that as long as you keep your business data in JSON-compatible objects you can always write them to and resurrect them from the disk? or send them over the wire so you can interact with remote players? and how incredibly twisted the serialization business can become in classical OOP if you want to do that (read this for the tip of the iceberg)? most of the technical detail you have to know in this field is completely meaningless for the rest of your life.
You can add (or change existing) Python built-in functions and classes by doing either of the following, at least in Py 2.x. Afterwards, whatever you add will available to all code by default, although not permanently.
Disclaimer: Doing this sort of thing can be dangerous due to possible name clashes and problematic due to the fact that it's extremely non-explicit. But, hey, as they say, we're all adults here, right?
class MyCLass: pass
# one way
setattr(__builtins__, 'MyCLass', MyCLass)
# another way
import __builtin__
__builtin__.MyCLass = MyCLass
Another way is to create a singleton:
class Singleton(type):
def __init__(cls, name, bases, dict):
super(Singleton, cls).__init__(name, bases, dict)
cls.instance = None
class GlobalClass(object):
__metaclass__ = Singleton
def __init__():
pinrt("I am global and whenever attributes are added in one instance, any other instance will be affected as well.")

Categories

Resources