I have a rather lengthy class for data analysis. In this class there are functions for input, output, plotting, different analysis steps and so on. I would really like to split this class into smaller, easier-to-read subclasses.
The easiest way would of course be to define a superclass and then derive multiple subclasses from it. However, this is not what I want, because functions of one subclass cannot change the variables of another subclass.
What I want to have is a splitting of the class definition into multiple files where I can group certain methods.
The structure should be something like:
master.py # contains something that puts together all the parts
io.py # contains function for data input / output
plot.py # contains functions for plotting / visualization of data
analyze1.py # contains functions to perform certain analysis steps
analyze2.py # contains functions to perform certain analysis steps
Take a look at mixins:
plot.py:
class DataPlotter(object):
    def plot(self):
        # lots of code
        my_plot_lib.plot(self.data)  # assume self.data is available in instance
io.py:
class DataIOProvider(object):
    def read(self, filename):
        # lots of code
        self.data = magic_data
master.py:
from plot import DataPlotter
from io import DataIOProvider
class GodDataProcessor(DataPlotter, DataIOProvider):
    def run(self):
        self.read('my_file.txt')
        self.plot()
Note that you should wrap your code in a package to avoid name clashes (io is also the name of a standard-library module in Python, so a local io.py can collide with it).
All base classes may reside in individual modules. When an attribute is set in one of the base classes, simply assume it is available in all the other classes, since their methods all run on the same instance.
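As a rough, single-file sketch of the same idea: the two mixins below would normally live in separate modules, and the method bodies are stand-ins (the values argument and the text "bars" are assumptions for illustration, not part of the original answer). It only shows that an attribute set by one base class is visible to the other.

```python
class DataIOProvider(object):
    """Mixin: loads data and stores it on the instance."""

    def read(self, values):
        # Stand-in for real file parsing: just keep a copy of the values.
        self.data = list(values)


class DataPlotter(object):
    """Mixin: relies on self.data having been set by another base class."""

    def plot(self):
        # Stand-in for a real plotting call: one text bar per value.
        return ["#" * value for value in self.data]


class GodDataProcessor(DataPlotter, DataIOProvider):
    def run(self, values):
        self.read(values)    # sets self.data (method from DataIOProvider)
        return self.plot()   # reads self.data (method from DataPlotter)


processor = GodDataProcessor()
bars = processor.run([1, 3, 2])
print(bars)
```

Both mixin methods receive the same self, which is why neither needs to know where data was assigned.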
Related
I need to connect three classes so that some of them can use the others' methods.
Here I show an example of how the classes work. As you can see, data is read in through the Data class and is then manipulated by the Statistics and Plotting classes.
class Data(object):  # This class reads a file and creates a DataFrame object
    def __init__(self, input_data):
        pass
    def Tool(self):
        # [df managing operations]
        return df
class Statistics:  # This class uses the Data dataframe and manipulates it
    def mean(self, df):
        return scalar
class Plotting:  # This class plots the Data dataframe as a function of the Statistics outputs
    def with_colors(self, df, scalar):
        pass
I don't think Plotting or Statistics map well to classes or instances.
They look more like libraries of functions. Otherwise you will instantiate a single Plotting and single Statistics just to call their methods on something else.
It looks like you grouped your utility methods in classes and ended up with too many methods. This is just an organization/partition problem.
If you want you can just make them modules, define functions there, import the relevant functions into the main program and pass those functions the data they need as arguments.
Also, it looks like you are just creating a dataframe-like object and adding methods to it. And reading data from somewhere looks like just another utility function.
While nothing stops you from doing those things, including inheriting from dataframe to make your own extended version, I think you are better off using df objects as-is and passing them around to utility functions.
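A minimal sketch of that function-based layout, assuming a plain list of floats as a stand-in for the dataframe. The function names are hypothetical, and in a real project they could be split across modules and imported into the main program.

```python
def load_values(text):
    """Parse whitespace-separated numbers (stand-in for reading a file)."""
    return [float(token) for token in text.split()]


def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def describe_against_mean(values, center):
    """Label each value relative to the mean (stand-in for colored plotting)."""
    return ["above" if v > center else "not above" for v in values]


# The main program owns the data and passes it to each utility function.
values = load_values("1 2 3 6")
center = mean(values)
labels = describe_against_mean(values, center)
print(center, labels)
```

No object has to be instantiated just to reach a method; the data is the only thing passed around.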
Try passing a Data object to the methods of the Statistics and Plotting classes.
I have quite a bit of confusion about how to use classes. I understand what they are, and why they should be used, just not how. For example, we're given a pre-made class (I'll call it class Class_1(object) to keep things simple) with a few functions (methods, right?) and variables in it.
class Class_1(object):
    var_1 = [a, b, c]
    var_2 = [x, y, z]
    var_3 = {n: [o, p], g: [h, i]}
    def method_1(self):
        '''here's a method'''
(As a side note, the Class_1(object) does have the __init__(self): method already done.)
Now, in a separate program, I've imported the file that contains that class at the top of the program, but how do I use methods or variables from the class? For example, if I want to check a user input against a value in var_1, how would I do that?
I've gotten better with functions in general, but calling on classes and methods is as clear as mud.
Edit: Realized I said "methods" instead of "variables" when I actually need both.
To use the class, you need to create an instance of it from the module that defines it:
import filename1
class1 = filename1.Class_1()
With the instance, you can then access the member variables and call the methods:
value1 = class1.var_1
class1.method_1()
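To make the "check a user input against var_1" case concrete, here is a small sketch. The class is defined inline so the snippet runs on its own (in the question it would come from the imported file), and the list contents, the body of method_1 and the helper is_known are assumptions for illustration.

```python
class Class_1(object):
    # Class-level variable, reachable from the class and from every instance.
    var_1 = ["a", "b", "c"]

    def method_1(self):
        """Return a short description (hypothetical behavior)."""
        return "var_1 has %d entries" % len(self.var_1)


def is_known(user_input, instance):
    """Check a user-supplied string against the values in var_1."""
    return user_input in instance.var_1


class1 = Class_1()
found = is_known("b", class1)      # attribute access: no parentheses
summary = class1.method_1()        # method call: parentheses required
print(found, summary)
```

In an interactive program the first argument to is_known would typically come from input().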
I have a class that looks something like the following:
# Class violates the Single Responsibility Principle
class Baz:
    data = [42]
    def do_foo_to_data(self):
        # call a dozen functions that do complicated stuff to data
        pass
    def do_bar_to_data(self):
        # call other functions that do different stuff to data
        pass
I want to break it into two separate classes because it violates the SRP. The functions called by do_foo_to_data() are completely distinct from those called by do_bar_to_data(). Yet they must operate on the same data.
I've come up with a bunch of solutions, but they're all ugly. Is there a way to do this cleanly, preferably in Python 3 (though 2.7 is OK too)?
The best of my "solutions" is below:
# I find this hard to read and understand
class Baz:
    data = [42]
    def create_foo(self):
        return Baz.Foo()
    def create_bar(self):
        return Baz.Bar()
    class Foo:
        def do_foo_to_data(self):
            # call foo functions
            pass
    class Bar:
        def do_bar_to_data(self):
            # call bar functions
            pass
Note: It's not essential to me that the data member be a class member.
I only expect to create one instance of Baz; but I didn't want to ask two questions in one post and start a discussion about singletons.
This is not an elegant solution. You had better pass a reference to the object you want them to operate on. So something like:
class Foo:
    def __init__(self, data):
        self.data = data
    def do_foo_to_data(self):
        #...
        self.data[0] = 14
class Bar:
    def __init__(self, data):
        self.data = data
    def do_bar_to_data(self):
        #...
        self.data.append(15)
(I added sample manipulations like self.data[0] = 14 and self.data.append(15))
And now you construct the data. For instance:
data = [42]
Next you construct a Foo and a Bar and pass a reference to data like:
foo = Foo(data)
bar = Bar(data)
__init__ is what most programming languages call the constructor and as you have seen in the first fragment, it requires an additional parameter data (in this case it is a reference to our constructed data).
and then you can for instance call:
foo.do_foo_to_data()
which will set data to [14] and
bar.do_bar_to_data()
which will result in data being equal to [14,15].
Mind that you cannot write self.data = ['a','new','list'] or something equivalent in do_foo_to_data or do_bar_to_data, because that would rebind self.data to a new object, while the other instance (and the original data variable) would still refer to the old list. Instead you could, for instance, .clear() the list and append new elements to it:
def do_foo_to_data(self):  # alternative version
    #...
    self.data.clear()
    self.data.append('a')
    self.data.append('new')
    self.data.append('list')
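The difference between mutating the shared list and rebinding the attribute can be checked directly. This is a small sketch; the method names replace_in_place and rebind are made up for the demonstration, and slice assignment is used as one way to replace the contents in place.

```python
class Foo:
    def __init__(self, data):
        self.data = data

    def replace_in_place(self, new_items):
        # Slice assignment mutates the existing list object.
        self.data[:] = new_items

    def rebind(self, new_items):
        # Rebinding points self.data at a new list; other holders keep the old one.
        self.data = list(new_items)


class Bar:
    def __init__(self, data):
        self.data = data


data = [42]
foo = Foo(data)
bar = Bar(data)

foo.replace_in_place(["a", "new", "list"])
shared_after_mutation = bar.data is foo.data   # still the same object

foo.rebind([1, 2, 3])
shared_after_rebind = bar.data is foo.data     # foo now holds a different list
print(bar.data, foo.data)
```

After the rebind, bar and the module-level data variable still see the old list, which is exactly the situation the warning above describes.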
Finally to answer your remark:
preferably in Python 3 (though 2.7 is OK too)?
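The technique demonstrated is almost universal (meaning it is available in nearly every programming language), so it will work in both python-3.x and python-2.7. The one exception in the snippets above is list.clear(), which only exists in Python 3.3 and later; in Python 2.7 use del self.data[:] instead.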
Why do you even need a class for that? All you want is two separated functions which do some job on some data.
data = [42]
def foo(data):
    data.append('sample operation foo')
def bar(data):
    data.append('sample operation bar')
Problem solved.
You can pull out the distinct groups of functionality to separate mix-in classes:
class Foo:
    """Mixin class.
    Requires self.data (must be provided by classes extending this class).
    """
    def do_foo_to_data(self):
        # call a dozen functions that do complicated stuff to data
        pass
class Bar:
    """Mixin class.
    Requires self.data (must be provided by classes extending this class).
    """
    def do_bar_to_data(self):
        # call other functions that do different stuff to data
        pass
class Baz(Foo, Bar):
    data = [42]
This relies on Python's duck-typing behavior. You should only apply the Foo and Bar mix-ins to classes that actually provide self.data, like the Baz class here does.
This might be suitable where certain classes are by convention required to provide certain attributes anyway, such as customized view classes in Django. However, when such conventions aren't already in place, you might not want to introduce new ones. It's too easy to miss the documentation and then get AttributeErrors at runtime. So let's make the dependency explicit, rather than only documenting it. How? With a mix-in for the mix-ins!
class Data:
    """Mixin class"""
    data = [42]
class Foo(Data):
    """Mixin class"""
    def do_foo_to_data(self):
        # call a dozen functions that do complicated stuff to data
        pass
class Bar(Data):
    """Mixin class"""
    def do_bar_to_data(self):
        # call other functions that do different stuff to data
        pass
class Baz(Foo, Bar):
    pass
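A runnable sketch of both variants, for comparison. The method bodies are placeholder operations, LooseFoo and Forgetful are hypothetical names for the "documented only" case, and data is set per instance in __init__ here (an assumption that differs from the class-level attribute above, chosen so instances do not share one list).

```python
class Data:
    """Mixin that provides the shared state explicitly."""

    def __init__(self):
        self.data = [42]


class Foo(Data):
    def do_foo_to_data(self):
        self.data[0] = 14        # placeholder "foo" operation


class Bar(Data):
    def do_bar_to_data(self):
        self.data.append(15)     # placeholder "bar" operation


class Baz(Foo, Bar):
    pass


class LooseFoo:
    """Mixin that only documents its need for self.data."""

    def do_foo_to_data(self):
        self.data[0] = 14


class Forgetful(LooseFoo):
    """Uses the mixin but never provides self.data."""


baz = Baz()
baz.do_foo_to_data()
baz.do_bar_to_data()
print(baz.data)

try:
    Forgetful().do_foo_to_data()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
    print("missing dependency:", exc)
```

With the Data base class in place, forgetting the attribute is no longer possible for classes built from Foo and Bar; with the documented-only mixin the mistake surfaces only when the method is called.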
Whether this is appropriate for your use-case is difficult to say at this level of abstraction. As RayLuo's answer shows, you might not need classes at all. Instead, you could put the different groups of functions into different modules or packages, to organize them.
Background:
I have been working on a game in Python, and in order to keep everything clean, organized and in a pythonic way, I have a folder containing multiple python files, each containing one big class, for example "MapEngine" or "NPCEngine".
From main.py, I am loading each class from each file and gluing everything together with a "Game" class, such as:
from folder import *
class Game:
    def __init__(self):
        self.MapEngine = MapEngine.MapEngine()
        ...
    def loop(self):
        ...
Since classes such as "CollisionEngine" require data from other classes such as "MapEngine", I usually pass the latter (i.e. MapEngine) to the former (i.e. CollisionEngine) and store it in an attribute, in order to use MapEngine's loaded map data or functions:
class CollisionEngine:
    def __init__(self, MapClass, ...):
        self.MapEngine = MapClass
Problem:
Well, since many classes have to be linked to others, it became hard after a while to figure out which class to load first in order to assign variables. Furthermore, classes like "EventEngine" need to have access to every other class. My code became hard to read, and I have trouble when 2 classes are equally important to each other.
Question:
I have heard of class inheritance, but I do not think it can be applied here because each class is very different as in its function. Therefore, is there a way to beautifully link every class together, as if it was all part of one big class? In other words, is there a way to refer to variables from other classes, from within a class?
(My thoughts: Perhaps, I can write a class called "Request", and it will act as a top level class manager. Although, I think I will have to use functions such as exec() or eval(), which are not efficient and are somewhat dangerous.)
This is my first post, I've tried to be as explicit as possible, please ask me for clarification, & thank you for your reply!
Consider separating your project into layers - that should help you keep things more organised and make the imports more natural.
The principle is that lower layers of your code "cake" shouldn't depend on (read: import) upper layers of your code.
For example you might have a foundation layer which contains common data structures, util classes and algorithms that are used in lots of your code at various layers.
Then you might have a model layer which depends on the foundation layer (i.e. data structures/utils/algorithms) but nothing else. The model layer provides models of objects within the domain.
You might then have a game layer which depends on the model layer (so it would be quite reasonable for modules in your game layer to import things from the model layer, but not vice versa).
Well, after many tries, I have figured out a (sketchy) way of solving my problem. Of course, as eddiewould suggested, I will organize my code better and split it into multiple layers, but if you would like to have multiple classes all linked together, simply give every class a reference to the main class (the one that instantiates all the others). I believe that a code snippet will explain it better:
main.py
engine_folder
----> engine_1.py
----> engine_2.py
In main.py, engine_1 and engine_2 are loaded:
from engine_folder import engine_1, engine_2
class game:
    def __init__(self):
        self.engine_1 = engine_1.engine(self, ...)
        self.engine_2 = engine_2.engine(self, ...)
        # where engine_1.engine and engine_2.engine are
        # two classes that need to transfer data between
        # each other
    def run(self):
        self.engine_1.run()
        self.engine_2.run()
Notice how engine_1.engine's first argument is self, which refers to the top level class which called this class. Now, in engine_1, if we would want to print a variable from engine_2, the class would look similar to this:
class engine:
    def __init__(self, top_level_class, ...):
        self.top_level_class = top_level_class
    def run(self):
        print self.top_level_class.engine_2.random_var
This is very beautiful (besides the fact that print self.top_level_class.engine_2.random_var is very long), but compared to something like:
class EventEngine:
    def __init__(self, Camera_Class, Map_Class, Character_Class, Skill_Class,
                 Interface_Class, Sprite_Class, Dialog_Class, Game_Class,
                 Item_Class):
        self.ItemEngine = Item_Class
        self.GameEngine = Game_Class
        self.DialogEngine = Dialog_Class
        self.SpriteEngine = Sprite_Class
        self.SkillEngine = Skill_Class
        self.CameraEngine = Camera_Class
        self.MapEngine = Map_Class
        self.CharacterEngine = Character_Class
        self.IEngine = Interface_Class
The new version:
class EventEngine:
    def __init__(self, top_level_class):
        self.top = top_level_class
        # a var from Map_Class can be called as such:
        # self.top.MapEngine.map[0][1]
        # and I can call variables from every class, not only those I
        # have loaded like before
is much better and much cleaner.
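A self-contained sketch of this pattern in Python 3. The engine names, the map data and is_blocked are hypothetical; the point is only that each engine stores the top-level object and looks up its siblings when a method runs, so the order of construction stops mattering.

```python
class MapEngine:
    def __init__(self, game):
        self.game = game
        self.map = [[0, 1], [1, 0]]   # 1 marks a wall (made-up data)


class CollisionEngine:
    def __init__(self, game):
        self.game = game

    def is_blocked(self, row, col):
        # The sibling engine is resolved through the top-level object at
        # call time, not at construction time.
        return self.game.map_engine.map[row][col] == 1


class Game:
    def __init__(self):
        # CollisionEngine is created first on purpose: it only stores the
        # reference and does not touch map_engine until is_blocked() runs.
        self.collision_engine = CollisionEngine(self)
        self.map_engine = MapEngine(self)


game = Game()
wall = game.collision_engine.is_blocked(0, 1)
free = game.collision_engine.is_blocked(0, 0)
print(wall, free)
```

The trade-off is that every engine can now reach everything, so the dependencies between engines are no longer visible in the constructor signatures.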
I want to create a class with two methods at this point (I also want to be able to alter the class, obviously).
class ogrGeo(object):
    def __init__(self):
        pass
    def CreateLine(self, o_file, xy):
        # lots of code
        pass
    def CreatePoint(self, o_file, xy):
        # lots of the same code as CreateLine(),
        # only minor differences
        pass
To keep things as clean as possible and to repeat as little code as possible, I'm asking for some advice. The two methods CreateLine() and CreatePoint() share a lot of code. To reduce redundancy:
Should I define a third method that both methods can call? In this case you could still call
o = ogrGeo()
o.CreateLine(...)
o.CreatePoint(...)
separately.
Or should I merge them into one method? Is there another solution I haven't thought about or know nothing about?
Thanks already for any suggestions.
Whether you should merge the methods into one is a matter of API design. If the functions have different purposes, then keep them separate. I would merge them if client code is likely to follow the pattern
if some_condition:
    o.CreateLine(f, xy)
else:
    o.CreatePoint(f, xy)
But otherwise, don't merge. Instead, refactor the common code into a private method, or even a freestanding function if the common code does not touch self. Python has no notion of "private method" built into the language, but names with a leading _ will be recognized as such.
It's perfectly normal to factor out common code into a (private) helper method:
class ogrGeo(object):
    def __init__(self):
        pass
    def CreateLine(self, o_file, xy):
        # lots of code
        value = self._utility_method(xy)
    def CreatePoint(self, o_file, xy):
        # lots of the same code as CreateLine(),
        # only minor differences
        value = self._utility_method(xy)
    def _utility_method(self, xy):
        # Common code here
        return value
The method could return a value, or it could directly manipulate the attributes on self.
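For a version that actually runs, here is a sketch under stated assumptions: the o_file argument is left out, no real OGR calls are made, and the dictionaries returned are just illustrative stand-ins for whatever the real methods would build. The names OgrGeo, create_line, create_point and _clean_coords are hypothetical.

```python
class OgrGeo(object):
    def create_line(self, xy):
        # A line needs at least two coordinate pairs.
        coords = self._clean_coords(xy, minimum=2)
        return {"type": "LineString", "coordinates": coords}

    def create_point(self, xy):
        # A point needs exactly the first pair; extra pairs are ignored.
        coords = self._clean_coords(xy, minimum=1)
        return {"type": "Point", "coordinates": coords[0]}

    def _clean_coords(self, xy, minimum):
        """Shared validation and conversion used by both public methods."""
        coords = [(float(x), float(y)) for x, y in xy]
        if len(coords) < minimum:
            raise ValueError("need at least %d coordinate pair(s)" % minimum)
        return coords


geo = OgrGeo()
line = geo.create_line([(0, 0), (1, 2)])
point = geo.create_point([(3, 4)])
print(line, point)
```

Callers still see two separate public methods; the leading underscore marks the helper as an implementation detail.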
A word of advice: read the Python style guide (PEP 8) and stick to its conventions. Most other Python projects do, and it'll make your code easier to comprehend for other Python developers if you do.
For the pieces of code that will overlap, consider whether those can be their own separate functions as well. Then CreateLine would be comprised of several calls to certain functions, with parameter choices that make sense for CreateLine, meanwhile CreatePoint would be several function calls with appropriate parameters for creating a point.
Even if those new auxiliary functions aren't going to be used elsewhere, it's better to modularize them as separate functions than to copy/paste code. But if the auxiliary functions needed to create these structures are pretty specific, then why not break them out into their own classes?
You could make an "Object" class that involves all of the basics for creating objects, and then have "Line" and "Point" classes which derive from "Object". Within those classes, override the necessary functions so that the construction is specific, relying on auxiliary functions in the base "Object" class for the portions of code that overlap.
Then the ogrGeo class will construct instances of these other classes. Even if the ultimate consumer of "Line" or "Point" doesn't need a full-blown class object, you can still use this design, and give ogrGeo the ability to return the sub-pieces of a Line instance or a Point instance that the consumer does wish to use.
It hardly matters. You want the class methods to be as usable as possible for the calling programs, and it's slightly easier and more efficient to have two methods than to have a single method with an additional parameter for the type of object to be created:
def CreateObj(self, obj, o_file, xy):  # obj = 0 for Point, 1 for Line, ...
Recommendation: use separate API calls and factor the common code into method(s) that can be called within your class.
You could also go in the other direction, especially if the following is the case:
def methA/B(...):
    lots of common code
    small difference
    lots of common code
then you could do
def _common(..., callback):
    lots of common code
    callback()
    lots of common code
def methA(...):
    def _mypart(): do what A does
    _common(..., _mypart)
def methB(...):
    def _mypart(): do what B does
    _common(..., _mypart)
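The same shape as runnable code, with made-up "common" steps (dropping None entries and sorting) standing in for the shared work. Here the callback takes one value and returns one, which is an assumption; the pseudocode above leaves its signature open.

```python
def _common(values, transform):
    # Shared setup: drop missing entries.
    cleaned = [v for v in values if v is not None]
    # The small, caller-specific difference.
    result = [transform(v) for v in cleaned]
    # Shared wrap-up: return the results in sorted order.
    return sorted(result)


def meth_a(values):
    def _mypart(v):
        return v * 2      # what A does
    return _common(values, _mypart)


def meth_b(values):
    def _mypart(v):
        return -v         # what B does
    return _common(values, _mypart)


doubled = meth_a([3, None, 1])
negated = meth_b([3, None, 1])
print(doubled, negated)
```

This inverts the helper-method approach: instead of two methods calling a shared helper, one shared routine calls back into the part that differs.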