Function Encapsulation Efficiency in Python

I have a large set of objects, and I need to be able to do some complex stuff with each object, so I have some fairly long functions.
In general, is it better to put the long functions in the class that they'll actually be used in (GreatObject, below) for proper encapsulation, or is it better for efficiency to put one function in the collection class (GreatSet, which will only ever have one instance)?
class GreatSet(object):
    def __init__(self):
        self.great_set = []  # Will contain a lot of GreatObjects.

    def long_method(self, great_object):  # Is this function better here?
        [Many lines of code]

class GreatObject(object):
    def __init__(self, params):
        self.params = params

    def long_method(self):  # Or here?
        [Many lines of code]
I'm using Python 2.7.

In both cases, long_method will belong to its class (there will be a single long_method function per class, shared by all instances), and in both cases looking up obj.long_method will create a new method object for each lookup, so with respect to "efficiency" (whatever that is supposed to mean here) it won't make any difference. Also, unless you need maximum time and space performance - in which case a lower-level language might be a better choice - you should be more concerned with proper design and maintainability than with raw performance.
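A quick way to check both halves of that claim for yourself (a throwaway demonstration, not from the original post):

class GreatObject(object):
    def long_method(self):
        pass

obj = GreatObject()
# the function itself lives once in the class dictionary:
print('long_method' in GreatObject.__dict__)   # True
# but each attribute lookup builds a fresh bound-method object:
print(obj.long_method is obj.long_method)      # False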
So, if long_method is supposed to work on a GreatObject it probably belongs to GreatObject, but it depends on the respective responsibilities of those classes, what long_method really does, and which application layers long_method, GreatObject and GreatSet belong to. If, for example, GreatObject and GreatSet both belong to the domain model and long_method does presentation-related work, then long_method belongs in neither GreatObject nor GreatSet.
Finally, as PartialOrder mentions in his comment, "long" functions are most often a code/design smell. Sometimes a function has to be "long enough" because it has to do something complex - and even then you can usually refactor it into smaller functions (possibly into methods of a distinct class if those functions need to share state) - but quite often a long function simply means it is doing too many things.
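For illustration only, one way such a refactor might look (the helper names below are invented, not taken from the question):

class GreatObject(object):
    def __init__(self, params):
        self.params = params

    def long_method(self):
        # the long body becomes a pipeline of smaller, testable steps
        intermediate = self._prepare_data()
        return self._compute_result(intermediate)

    def _prepare_data(self):
        # first chunk of the original long body
        return self.params

    def _compute_result(self, intermediate):
        # second chunk of the original long body
        return intermediate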

Related

Where to put a function that acts on two instances of a specific class

This is really a design question and I would like to know a bit about which design patterns to use.
I have a module, let's say curves.py that defines a Bezier class. Then I want to write a function intersection which uses a recursive algorithm to find the intersections between two instances of Bezier.
What options do I have for where to put this function? What are some best practices in this case? Currently I have written the function in the module itself (and not as a method of the class).
So currently I have something like:
def intersections(inst1, inst2): ...
class Bezier: ...
and I can call the function by passing two instances:
from curves import Bezier, intersections
a = Bezier()
b = Bezier()
result = intersections(a, b)
However, another option (that I can think of) would be to make intersection a method of the class. In this case I would instead use
a.intersections(b)
For me the first choice makes a bit more sense since it feels more natural to call intersections(a, b) than a.intersections(b). However, the other option feels more appropriate in the sense that the function intersections really only acts on Bezier instances, so it feels more encapsulated.
Do you think one of these is better than the other, and in that case, for what reasons? Are there any other design options to use here? Are there any best practices?
As an example, you can compare how the builtin set class does this:
intersection(*others)
set & other & ...
Return a new set with elements common to the set and all others.
So intersection is defined as a regular instance method on the class that takes another (or multiple) sets and returns the intersection, and it can be called as a.intersection(b).
However, due to the standard mechanics of how instance methods work, you can also spell it set.intersection(a, b) and in practice you'll see this quite often since like you say it feels more natural.
You can also override the __and__ method so this becomes available as a & b.
In terms of ease of use, putting it on the class is also friendlier, because you can just import the Bezier class and have all associated features available automatically, and they're also discoverable via help(Bezier).
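For comparison, a minimal sketch of what the method-based layout could look like for Bezier (the curve data and the intersection algorithm here are placeholders, not a real implementation):

class Bezier(object):
    def __init__(self, control_points):
        self.control_points = control_points

    def intersections(self, other):
        # the recursive subdivision of the two curves would go here
        return []

    __and__ = intersections  # lets callers write a & b

a = Bezier([(0, 0), (1, 1)])
b = Bezier([(0, 1), (1, 0)])
result = a.intersections(b)   # also spellable as Bezier.intersections(a, b) or a & b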

Best practice: how to pass many arguments to a function?

I am running some numerical simulations, in which my main function must receive lots and lots of arguments - I'm talking 10 to 30 arguments depending on the simulation to run.
What are some best practices to handle cases like this? Dividing the code into, say, 10 functions with 3 arguments each doesn't sound very feasible in my case.
What I do is create an instance of a class (with no methods), store the inputs as attributes of that instance, then pass the instance - so the function receives only one input.
I like this because the code looks clean, easy to read, and because I find it easy to define and run alternative scenarios.
I dislike it because accessing class attributes within a function is slower than accessing a local variable (see: How / why to optimise code by copying class attributes to local variables?) and because it is not an efficient use of memory - too much data stored multiple times unnecessarily.
Any thoughts or recommendations?
myinput = MyInput()
myinput.input_sql_table = that_sql_table
myinput.input_file = that_input_file
myinput.param1 = param1
myinput.param2 = param2
myoutput = calc(myinput)
Alternative scenarios:
import collections
import copy

inputs = collections.OrderedDict()
scenarios = collections.OrderedDict()
inputs['base scenario'] = copy.deepcopy(myinput)
inputs['param2 = 100'] = copy.deepcopy(myinput)
inputs['param2 = 100'].param2 = 100
# loop through all the inputs and store the outputs in the ordered dictionary scenarios
I don't think this is really a StackOverflow question, more of a Software Engineering question. For example check out this question.
As far as whether or not this is a good design pattern, this is an excellent way to handle a large number of arguments. You mentioned that this isn't very efficient in terms of memory or speed, but I think you're making an improper micro-optimization.
As far as memory is concerned, the overhead of running the Python interpreter is going to dwarf the couple of extra bytes used by instantiating your class.
Unless you have run a profiler and determined that accessing members of that options class is slowing you down, I wouldn't worry about it. This is especially the case because you're using Python. If speed is a real concern, you should be using something else.
You may not be aware of this, but most of the large scale number crunching libraries for Python aren't actually written in Python, they're just wrappers around C/C++ libraries that are much faster.
I recommend reading this article; it is well established that "premature optimization is the root of all evil".
You could pass in a dictionary like so:
def some_func_or_class(kwarg1: int = -1, kwarg2: int = 0, kwargN: str = ''):
    print(kwarg1, kwarg2, kwargN)

# keys must be strings matching the parameter names
all_the_kwargs = {'kwarg1': 0, 'kwarg2': 1, 'kwargN': 'xyz'}
some_func_or_class(**all_the_kwargs)
Or you could use several named tuples like referenced here: Type hints in namedtuple
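If you go the named-tuple route, a sketch might look like this (Python 3.6+; the field names simply mirror the attributes from your example, and the calc body is a placeholder):

from typing import NamedTuple

class SimulationInputs(NamedTuple):
    input_sql_table: str
    input_file: str
    param1: float = 1.0
    param2: float = 10.0

def calc(inputs):
    # the real simulation would use all the fields
    return inputs.param1 * inputs.param2

base = SimulationInputs(input_sql_table='that_table', input_file='that_file')
# namedtuples are immutable, so alternative scenarios are cheap to derive:
scenario = base._replace(param2=100)
myoutput = calc(scenario)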
Also note that, depending on which version of Python you are using, there may be a limit on the number of arguments you can pass in a single function call.
Or you could use just a dictionary:
def some_func(a_dictionary):
    return a_dictionary.get('argXYZ', None)  # defaults to None if 'argXYZ' doesn't exist

Memory usage: @on_trait_change vs _foo_changed()

I built an application with Enthought Traits which is using too much memory. I think the problem is caused by trait notifications:
There seems to be a fundamental difference in the memory usage of notifications hooked up with @on_trait_change versus the special naming convention (e.g. _foo_changed()). I made a little example with two classes, Foo and FooDecorator, which I assumed would show exactly the same behaviour. But they don't!
from traits.api import *

class Foo(HasTraits):
    a = List(Int)

    def _a_changed(self):
        pass

    def _a_items_changed(self):
        pass

class FooDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a[]')
    def bar(self):
        pass

if __name__ == '__main__':
    n = 100000
    c = FooDecorator
    a = [c() for i in range(n)]
When running this script with c = Foo, Windows task manager shows a memory usage for the whole python process of 70MB, which stays constant for increasing n. For c = FooDecorator, the python process is using 450MB, increasing for higher n.
Can you please explain this behaviour to me?
EDIT: Maybe I should rephrase: why would anyone choose FooDecorator over Foo?
EDIT 2: I just uninstalled python(x,y) 2.7.9 and installed the newest version of canopy with traits 4.5.0. Now the 450MB became 750MB.
EDIT 3: Compiled traits-4.6.0.dev0-py2.7-win-amd64 myself. The outcome is the same as in EDIT 2. So despite all plausibility https://github.com/enthought/traits/pull/248/files does not seem to be the cause.
I believe you are seeing the effect of a memory leak that has been fixed recently:
https://github.com/enthought/traits/pull/248/files
As for why one would use the decorator, in this particular instance the two versions are practically equivalent.
In general, the decorator is more flexible: you can give a list of traits to listen to, and you can use the extended name notation, as described here:
http://docs.enthought.com/traits/traits_user_manual/notification.html#semantics
For example, in this case:
class Bar(HasTraits):
    b = Str

class FooDecorator(HasTraits):
    a = List(Bar)

    @on_trait_change('a.b')
    def bar(self):
        print 'change'
the bar notifier is going to be called for changes to the trait a, for changes to its items, and for changes to the trait b on each of the Bar items. Extended names can be quite powerful.
What's going on here is that Traits has two distinct ways of handling notifications: static notifiers and dynamic notifiers.
Static notifiers (such as those created by the specially-named _*_changed() methods) are fairly light-weight: each trait on an instance has a list of notifiers on it, which are basically the functions or methods with a lightweight wrapper.
Dynamic notifiers (such as those created with on_trait_change() and the extended trait name conventions like a[]) are significantly more powerful and flexible, but as a result they are much more heavy-weight. In particular, in addition to the wrapper object they create, they also create a parsed representation of the extended trait name and a handler object, some of which are in turn HasTraits subclass instances.
As a result, even for a simple expression like a[] there will be a fair number of new Python objects created, and these objects have to be created for every on_trait_change listener on every instance separately to properly handle corner-cases like instance traits. The relevant code is here: https://github.com/enthought/traits/blob/master/traits/has_traits.py#L2330
Based on the reported numbers, the majority of the difference in memory usage that you are seeing is in the creation of this dynamic listener infrastructure for each instance and each on_trait_change decorator.
It's worth noting that there is a short-circuit for on_trait_change in the case where you are using a simple trait name, in which case it generates a static trait notifier instead of a dynamic notifier. So if you were to instead write something like:
class FooSimpleDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a')
    def a_updated(self):
        pass

    @on_trait_change('a_items')
    def a_items_updated(self):
        pass
you should see similar memory performance to the specially-named methods.
To answer the rephrased question about "why use on_trait_change", in FooDecorator you can write one method instead of two if your response to a change of either the list or any items in the list is the same. This makes code significantly easier to debug and maintain, and if you aren't creating thousands of these objects then the extra memory usage is negligible.
This becomes even more of a factor when you consider more sophisticated extended trait name patterns, where the dynamic listeners automatically handle changes which would otherwise require significant manual (and error-prone) code for hooking up and removing listeners from intermediate objects and traits. The power and simplicity of this approach usually outweighs the concerns about memory usage.

Explanation of importing Python classes

So I know this could be considered quite a broad question, for which I am sorry, but I'm having problems understanding the whole importing and __init__ and self. things and all that... I've tried reading through the Python documentation and a few other tutorials, but this is my first language, and I'm a little (a lot) confused.
So far through my first semester at university I have learnt the very basics of Python, functions, numeric types, sequence types, basic logic stuff. But it's moving slower than I would like, so I took it upon myself to try learn a bit more and create a basic text based, strategy, resource management sorta game inspired by Ogame.
First problem I ran into was how to define each building, for example each mine, which produces resources. I did some research and found classes were useful, so I have something like this for each building:
class metal_mine:
    level = 1
    base_production = 15
    cost_metal = 40
    cost_crystal = 10
    power_use = 10

    def calc_production():
        metal_mine.production = # A formula goes here

    def calc_cost_metal():
        # etc, same for crystal

    def calc_power_use():
        metal_mine.power_use = # blah blah

    def upgrade():
        # various things
        solar_plant.calc_available_power()
It's kinda long, I left a lot out. Anyway, the important bit is that last part: when I upgrade the mine, to determine whether it has enough power to run, I calculate the power output of the solar plant, which is in its own class (solar_plant.calc_output()) and which contains many similar things to the metal mine class. If I throw everything in the same module, this all works fantastically; however, with many buildings and research levels and the like, it gets very long and I get lost in it.
So I tried to split it into different modules: one for mines, one for storage buildings, one for research levels, etc. This makes everything very tidy, however I still need a way to call the functions in classes which are now part of a different module. My initial solution was to put, for example, from power import *, which for the most part made the solar_plant class available in the metal_mine class. I say for the most part, because depending on the order in which I try to do things, sometimes it seems this doesn't work. The solar_plant class itself calls on variables from the metal_mine class; now I know this is getting very spaghetti-ish, but I don't know of any better conventions to follow yet.
Anyway, sometimes when I call the solar_plant class, and it in turn tries to call the metal_mine class, it says that metal_mine is not defined, which leads me to think somehow the modules or classes need to be initialized? There seems to be a bit of looping between things in the code. And depending on the order in which I try to 'play the game', sometimes I am unintentionally doing this, sometimes I'm not. I haven't yet been taught the conventions and details of importing and reloading and all that, so I have no idea if I am taking the right approach or anything.
Provided everything I just said made sense, could I get some input on how I would properly go about making the contents of these various modules freely available and modifiable to others? Am I perhaps trying to split things into different modules which you wouldn't normally do, and I should just deal with the large module? Or am I importing things wrong? Or...?
And on a side note, in most tutorials and places I look for help on this, I see classes or functions full of self.something and the __init__ function... can I get an explanation of this? Or a link to a simple first-time-programmer's tutorial?
==================UPDATE=================
Ok so too broad, like I thought it might be. Based on the help I got, I think I can narrow it down.
I sorted out what I think need to be the class variables - those which don't change: name, base_cost_metal, and base_cost_crystal. All the rest would depend on the player's currently selected building of that type (supposing they could have multiple settlements).
To take a snippet of what I have now:
class metal_mine:
    name = 'Metal Mine'
    base_cost_metal = 60
    base_cost_crystal = 15

    def __init__(self):
        self.level = 0
        self.production = 30
        self.cost_metal = 60
        self.cost_crystal = 15
        self.power_use = 0
        self.efficiency = 1

    def calc_production(self):
        self.production = int(30 + (self.efficiency * int(30 * self.level * 1.1 * self.level)))

    def calc_cost_metal(self):
        self.cost_metal = int(metal_mine.base_cost_metal * 1.5 ** self.level)
So to my understanding, this is now a more correctly defined class? I define the instance variables with their starting values, which are then changed as the user plays.
In my main function where I begin the game, I would create an instance of each mine, say, player_metal_mine = metal_mine(), and then I call all the functions and variables with the likes of
>>> player_metal_mine.level
0
>>> player_metal_mine.upgrade()
>>> player_metal_mine.level
1
So if this is correctly defined, do I now just import each of my modules with these new templates for each building? And once they are imported, and an instance created, are all the new instances and their variables contained within the scope (right terminology?) of the main module, meaning no need for new importing or reloading?
Provided the answer to that is yes, I do just need to import, what method should I use? I understand there is just import mines for example, but that means I would have to use mines.player_metal_mine.upgrade() to use it, which is a tiny bit more typing than using the likes of from mines import *, or more particularly, from mines import metal_mine, though that last option means I need to individually import every building from every module. So like I said, provided, yes, I am just importing it, what method is best?
==================UPDATE 2================= (You can probably skip the wall of text and just read this)
So I went through everything, corrected all my classes, everything seems to be importing correctly using from module import *, but I am having issues with the scope of my variables representing the resource levels.
If everything was in 1 module, right at the top I would declare each variable and give it the beginning value, e.g. metal = 1000. Then, in any method of my classes which alters this, such as upgrading a building, costing resources, or in any function which alters this, like the one which periodically adds all the production to the current resource levels, I put global metal, for example, at the top. Then, within the function, I can call and alter the value of metal no problem.
However now that I am importing these classes and functions from various modules all into 1 module, functions cant find these variables. What I thought would happen was that in the process of importing I would basically be saying, take everything in this module, and pretend its now in this one, and work with it. But apparently that's not what is happening.
In my main module, I import all my modules using from mines import *, for example, and define the value of, say, metal to be 1000. Now I create an instance of a metal mine, metal_mine_one = metal_mine(), and I can call its methods and variables, e.g.
>>> metal_mine_one.production
30
But when I try to call a method like metal_mine_one.upgrade(), which contains global metal and then metal -= self.cost_metal, it gives me an error saying metal is not defined. Like I said, if this is all in 1 module, this problem doesn't happen, but if I try to import things, it does.
So how can I import these modules in a way which doesn't cause this problem, and makes variables in the global scope of my main module available to all functions and methods within all imported modules?
First, a little background on object-oriented programming, i.e. classes. You should think of a class like a blueprint: it shows how to make something. When you write a class, you describe to the program how to make an object. A simple class in Python might look like this:
class foo:
    def __init__(self, bars_starting_value):
        self.bar = bars_starting_value

    def print_bar(self):
        print(self.bar)
This tells Python how to make a foo object. The __init__ function is called a constructor; it is called when you make a new foo. self is a way of referencing the foo that is running the function. In this case every foo has its own bar, which can be accessed from within a foo by using self.bar. Note that you have to put self as the first argument of the function definition; this is what lets the function operate on a single foo rather than on all of them.
One might use this class like this:
my_foo = foo(15)
my_other_foo = foo(100)
my_foo.print_bar()
my_foo.bar = 20
print(my_foo.bar)
my_other_foo.print_bar()
This would output
15
20
100
As far as imports go, they take the things that are defined in one file and make them available in another. This is useful: if you put a class definition in a file, you can import it into your main program file and make objects from there.
As far as making variables available to other objects goes, you could pass the power that has been generated by all the generators into the mine's function, so it can determine whether it has enough power.
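As a rough sketch of that idea (the names follow the question's metal_mine and solar_plant, and the numbers are made up):

class metal_mine:
    def __init__(self):
        self.level = 0
        self.power_use = 0

    def upgrade(self, available_power):
        # illustrative check only - substitute your own formulas
        extra_power_needed = 10
        if available_power >= self.power_use + extra_power_needed:
            self.level += 1
            self.power_use += extra_power_needed
            return True
        return False

# in the main module:
# mine = metal_mine()
# mine.upgrade(solar_plant.calc_available_power())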
Hope this helps.
A lot of things to cover here. __init__ is a special method that is automatically called when an instance of a class is created. In the code you provided you've created a class; now you need to create an instance of that class. A simpler example:
class Test:
    def __init__(self):
        print "this is called when you create an instance of this class"

    def a_method(self):
        return True

>>> class_instance = Test()
this is called when you create an instance of this class
>>> class_instance.a_method()
True
The first argument of an instance method is always the instance itself. By convention we call that argument 'self'. Your methods did not accept any arguments; make sure they accept self (or have the decorator @staticmethod above them). Also, make sure you refer to attributes (in your case, methods) as self.a_method or class_instance.a_method.

Creating a hook to a frequently accessed object

I have an application which relies heavily on a Context instance that serves as the access point to the context in which a given calculation is performed.
If I want to provide access to the Context instance, I can:
rely on global
pass the Context as a parameter to all the functions that require it
I would rather not use global variables, and passing the Context instance to all the functions is cumbersome and verbose.
How would you "hide, but make accessible" the calculation Context?
For example, imagine that Context simply computes the state (position and velocity) of planets according to different data.
class Context(object):
    def state(self, planet, epoch):
        """Base class --- suppose `state` is meant
        to return a tuple of vectors."""
        raise NotImplementedError("provide an implementation!")

class DE405Context(Context):
    """Concrete context using DE405 planetary ephemeris"""
    def state(self, planet, epoch):
        """Suppose that a de405 reader exists and can provide
        the required (position, velocity) tuple."""
        return de405reader(planet, epoch)

def angular_momentum(planet, epoch, context):
    """Suppose we care about the angular momentum of the planet,
    and that `cross` exists."""
    r, v = context.state(planet, epoch)
    return cross(r, v)
# a second alternative, a "Calculator" class that contains the context
class Calculator(object):
    def __init__(self, context):
        self._ctx = context

    def angular_momentum(self, planet, epoch):
        r, v = self._ctx.state(planet, epoch)
        return cross(r, v)

# use as follows:
my_context = DE405Context()
now = now()  # assume this function returns an epoch

# first case:
print angular_momentum("Saturn", now, my_context)

# second case:
calculator = Calculator(my_context)
print calculator.angular_momentum("Saturn", now)
Of course, I could add all the operations directly into "Context", but it does not feel right.
In real life, the Context not only computes positions of planets! It computes many more things, and it serves as the access point to a lot of data.
So, to make my question more succinct: how do you deal with objects which need to be accessed by many classes?
I am currently exploring Python's context managers, but without much luck. I also thought about dynamically adding a "ctx" attribute to the functions themselves (functions are objects, so they can hold a reference to arbitrary objects), i.e.:
def angular_momentum(planet, epoch):
    r, v = angular_momentum.ctx.state(planet, epoch)
    return cross(r, v)

# somewhere before calling anything...
import angular_momentum
angular_momentum.ctx = my_context
edit
Something that would be great, is to create a "calculation context" with a with statement, for example:
with my_context:
    h = angular_momentum("Earth", now)
Of course, I can already do that if I simply write:
with my_context as ctx:
    h = angular_momentum("Earth", now, ctx)  # first implementation above
Maybe a variation of this with the Strategy pattern?
You generally don't want to "hide" anything in Python. You may want to signal human readers that they should treat it as "private", but this really just means "you should be able to understand my API even if you ignore this object", not "you can't access this".
The idiomatic way to do that in Python is to prefix it with an underscore—and, if your module might ever be used with from foo import *, add an explicit __all__ global that lists all the public exports. Again, neither of these will actually prevent anyone from seeing your variable, or even accessing it from outside after import foo.
See PEP 8 on Global Variable Names for more details.
Some style guides suggest special prefixes, all-caps-names, or other special distinguishing marks for globals, but PEP 8 specifically says that the conventions are the same, except for the __all__ and/or leading underscore.
Meanwhile, the behavior you want is clearly that of a global variable—a single object that everyone implicitly shares and references. Trying to disguise it as anything other than what it is will do you no good, except possibly for passing a lint check or a code review that you shouldn't have passed. All of the problems with global variables come from being a single object that everyone implicitly shares and references, not from being directly in the globals() dictionary or anything like that, so any decent fake global is just as bad as a real global. If that truly is the behavior you want, make it a global variable.
Putting it together:
# do not include _context here
__all__ = ['Context', 'DE405Context', 'Calculator', …]
_context = Context()
Also, of course, you may want to call it something like _global_context or even _private_global_context, instead of just _context.
But keep in mind that globals are still members of a module, not of the entire universe, so even a public context will still be scoped as foo.context when client code does an import foo. And this may be exactly what you want. If you want a way for client scripts to import your module and then control its behavior, maybe foo.context = foo.Context(…) is exactly the right way. Of course this won't work in multithreaded (or gevent/coroutine/etc.) code, and it's inappropriate in various other cases, but if that's not an issue, in some cases, this is fine.
Since you brought up multithreading in your comments: In the simple style of multithreading where you have long-running jobs, the global style actually works perfectly fine, with a trivial change—replace the global Context with a global threading.local instance that contains a Context. Even in the style where you have small jobs handled by a thread pool, it's not much more complicated. You attach a context to each job, and then when a worker pulls a job off the queue, it sets the thread-local context to that job's context.
However, I'm not sure multithreading is going to be a good fit for your app anyway. Multithreading is great in Python when your tasks occasionally have to block for IO and you want to be able to do that without stopping other tasks—but, thanks to the GIL, it's nearly useless for parallelizing CPU work, and it sounds like that's what you're looking for. Multiprocessing (whether via the multiprocessing module or otherwise) may be more of what you're after. And with separate processes, keeping separate contexts is even simpler. (Or, you can write thread-based code and switch it to multiprocessing, leaving the threading.local variables as-is and only changing the way you spawn new tasks, and everything still works just fine.)
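A bare-bones sketch of that thread-local variant (the set_context/get_context helpers are invented here for illustration, not an established API; DE405Context and cross come from the question):

import threading

_local = threading.local()

def set_context(ctx):
    # each thread (or each job, when it starts running) installs its own Context
    _local.ctx = ctx

def get_context():
    return getattr(_local, 'ctx', None)

def angular_momentum(planet, epoch):
    r, v = get_context().state(planet, epoch)
    return cross(r, v)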
It may make sense to provide a "context" in the context manager sense, as an external version of the standard library's decimal module did, so someone can write:
with foo.Context(…):
    # do stuff under custom context
# back to default context
However, nobody could really think of a good use case for that (especially since, at least in the naive implementation, it doesn't actually solve the threading/etc. problem), so it wasn't added to the standard library, and you may not need it either.
If you want to do this, it's pretty trivial. If you're using a private global, just add this to your Context class:
def __enter__(self):
    global _context
    self._stashedcontext = _context
    _context = self

def __exit__(self, *args):
    global _context
    _context = self._stashedcontext
And it should be obvious how to adjust this to public, thread-local, etc. alternatives.
Another alternative is to make everything a member of the Context object. The top-level module functions then just delegate to the global context, which has a reasonable default value. This is exactly how the standard library random module works—you can create a random.Random() and call randrange on it, or you can just call random.randrange(), which calls the same thing on a global default random.Random() object.
If creating a Context is too heavy to do at import time, especially if it might not get used (because nobody might ever call the global functions), you can use the singleton pattern to create it on first access. But that's rarely necessary. And when it's not, the code is trivial. For example, the source to random, starting at line 881, does this:
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
…
And that's all there is to it.
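Applied to your Context, the same delegation pattern might look roughly like this (the _default_context name and the module layout are assumptions, not your code):

_default_context = DE405Context()

def state(planet, epoch):
    return _default_context.state(planet, epoch)

def angular_momentum(planet, epoch, context=None):
    # callers may pass an explicit context; otherwise the module default is used
    ctx = context if context is not None else _default_context
    r, v = ctx.state(planet, epoch)
    return cross(r, v)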
And finally, as you suggested, you could make everything a member of a different Calculator object which owns a Context object. This is the traditional OOP solution; overusing it tends to make Python feel like Java, but using it when it's appropriate is not a bad thing.
You might consider using a proxy object; here's a library that helps in creating object proxies:
http://pypi.python.org/pypi/ProxyTypes
Flask uses object proxies for its "current_app", "request" and other variables; all it takes to reference them is:
from flask import request
You could create a proxy object that is a reference to your real context, and use thread locals to manage the instances (if that would work for you).
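If you roll your own instead of using the library, a very small proxy might look like this (ContextProxy is a hypothetical name, not part of ProxyTypes or Flask):

import threading

class ContextProxy(object):
    _local = threading.local()

    def use(self, real_context):
        # install the real Context for the current thread
        ContextProxy._local.target = real_context

    def __getattr__(self, name):
        # delegate every other attribute access to the thread's real Context
        return getattr(ContextProxy._local.target, name)

context = ContextProxy()
# elsewhere, after `from mymodule import context`:
# context.use(DE405Context())
# r, v = context.state("Saturn", now)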

Categories

Resources