I'm learning to work with classes on Python, I ran into this issue, I have several methods inside the class. There are some chunk of code that is very similar or exactly the same inside every method.
What would be the best practice to remove the duplicates from the code and therefore shorten it?
It looks something like this:
class BasicClass(Object):
def FirstMethod(self, some_variable):
# Chunk of code that repeats across multiple methods
...
#Unique code to this method
...
def SecondMethod(self, some_variable):
# Chunk of code that repeats across multiple methods
...
#Unique code to this method
...
def ThirdMethod(self, some_variable):
# Chunk of code that repeats across multiple methods with slight variation
...
#Unique code to this method
...
Should I just write a helper function file and import that? Or is there a better way?
It really depends on what the code looks like. Using a helper function sounds reasonable. The slight variation in your third method could probably be implemented by passing an optional parameter to your helper function which then performs the variation.
If you want more detailed advice, you'd need to show the actual code...
Your question has actually nothing to do with classes - it's just as valid with plain functions.
And the answer is of course that when you have the same exact code repeated three or more times then you might indeed want to factor it out in a dedicated function.
BUT (big but) you first want to make sure that this duplications is not accidental - that is, it's the same code because it's really doing the same thing for the same reasons. Sometimes you spot a repeated pattern in your code, rush to factor it out, and later (while implementing another feature for example) find out that this repeated pattern was just a temporary coincidence and that you know have to "unfactor" it to make it change in a given way in one place and in a different way in another place.
That depends ... if this is an utility code that you envision to be useful to some other modules , yes you can encapsulate it in a Class. Otherwise if this is a particular logic to this module just follow the other recommendations to create a 4th new method inside this Class.
Related
I need several very similar plotting functions in python that share many arguments, but differ in some and of course also differ slightly in what they do. This is what I came up with so far:
Obviously just defining them one after the other and copying the code they share is a possibility, though not a very good one, I reckon.
One could also transfer the "shared" part of the code to helper functions and call these from inside the different plotting functions. This would make it tedious though, to later add features that all functions should have.
And finally I've also thought of implementing one "big" function, making possibly not needed arguments optional and then deciding on what to do in the function body based on additional arguments. This, I believe, would make it difficult though, to find out what really happens in a specific case as one would face a forest of arguments.
I can rule out the first option, but I'm hard pressed to decide between the second and third. So I started wondering: is there another, maybe object-oriented, way? And if not, how does one decide between option two and three?
I hope this question is not too general and I guess it is not really python-specific, but since I am rather new to programming (I've never done OOP) and first thought about this now, I guess I will add the python tag.
EDIT:
As pointed out by many, this question is quite general and it was intended to be so, but I understand that this makes answering it rather difficult. So here's some info on the problem that caused me to ask:
I need to plot simulation data, so all the plotting problems have simulation parameters in common (location of files, physical parameters,...). I also want the figure design to be the same. But depending on the quantity, some plots will be 1D, some 2D, some should contain more than one figure, sometimes I need to normalize the data or take a logarithm before plotting it. The output format might also vary.
I hope this helps a bit.
How about something like this. You can create a Base class that will have a method foo that is your base shared method that performs all the similar code. Then for your different classes you can inherit from Base and super the method of interest and extend the implementation to whatever extra functionality you need.
Here is an example of how it works. Note the different example I provided between how to use super in Python 2 and Python 3.
class Base:
def foo(self, *args, **kwargs):
print("foo stuff from Base")
return "return something here"
class SomeClass(Base):
def foo(self, *args, **kwargs):
# python 2
#x = super(SomeClass, self).foo(*args, **kwargs)
# python 3
x = super().foo(*args, **kwargs)
print(x)
print("SomeClass extension of foo")
s = SomeClass()
s.foo()
Output:
foo stuff from Base
return something here
SomeClass extension of foo from Base
More information needs to be given to fully understand the context. But, in a general sense, I'd do a mix of all of them. Use helper functions for "shared" parts, and use conditional statements too. Honestly, a lot of it comes down to just what is easier for you to do?
Long story short, I need to find a shortcut to calling the same method in multiple objects which were made from multiple classes.
Now, all of the classes have the same parent class, and even though the method differs bewteen the different classes, I figured the methods be the same name would work. So I though I might just be able to do something like this:
for object in listOfObjects:
object.method()
It hasn't worked. It might very well be a misspelling by me, but I can't find it. I think I could solve it by making a list that only adds the objects I need, but that would require a lot of coding, including changing other classes.
~~ skip to last paragraph for pseudo code accurately describing what I need~~
At this point, I will begin to go more in detail as to what specifically I am doing. I hope that it will better illustrate the scope of my question, and that the answering of my question will be more broadly applicable. The more general usage of this question are above, but this might help answer the question. Please be aware that I will change the question once I get an answer to more closely represent what I need done, so that it can apply to a wide variety of problems.
I am working on a gravity simulator. Whereas most simulators make objects which interact with one another and represent full bodies where their center of gravity is the actual attraction point, I am attempting to write a program which will simulate the distribution of gravity across all given points within an object.
As such, each object(not in programming terms, in literal terms) is made up of a bunch of tiny objects (both literally and figuratively). Essentially, what I am trying to do is call the object.gravity() method, which essentially takes into account all of the gravity from all other objects in the simulation and then moves the position of this particular object based on that input.
Now, either due to a syntactical bug (which I kinda doubt) or due to Python's limitations, I am unable to get all of the particles to behave properly all at once. The code snippet I posted before doesn't seem to be working.
tl;dr:
As such, I am wondering if there is a way (save adding all objects to a list and then iterating through it) to simply call the .gravity() method on every object that has the method. basically, even though this is sort of list format, this is what I want to do:
for ALL_OBJECTS:
if OBJECT has .gravity():
OBJECT.gravity()
You want the hasattr() function here:
for obj in all_objects:
if hasattr(obj, 'gravity'):
obj.gravity()
or, if the gravity method is defined by a specific parent class, you can test for that too:
for obj in all_objects:
if isinstance(obj, Planet):
obj.gravity()
Can also do ... better pythonic way to do it
for obj in all_objects:
try:
obj.gravity()
except AttributeError:
pass
Using getattr while set default option of getattr to lambda: None.
for obj in all_objects:
getattr(obj, 'gravity', lambda: None)()
I find myself constantly having to change and adapt old code back and forth repeatedly for different purposes, but occasionally to implement the same purpose it had two versions ago.
One example of this is a function which deals with prime numbers. Sometimes what I need from it is a list of n primes. Sometimes what I need is the nth prime. Maybe I'll come across a third need from the function down the road.
Any way I do it though I have to do the same processes but just return different values. I thought there must be a better way to do this than just constantly changing the same code. The possible alternatives I have come up with are:
Return a tuple or a list, but this seems kind of messy since there will be all kinds of data types within including lists of thousands of items.
Use input statements to direct the code, though I would rather just have it do everything for me when I click run.
Figure out how to utilize class features to return class properties and access them where I need them. This seems to be the cleanest solution to me, but I am not sure since I am still new to this.
Just make five versions of every reusable function.
I don't want to be a bad programmer, so which choice is the correct choice? Or maybe there is something I could do which I have not thought of.
Modular, reusable code
Your question is indeed important. It's important in a programmers everyday life. It is the question:
Is my code reusable?
If it's not, you will run into code redundancies, having the same lines of code in more than one place. This is the best starting point for bugs. Imagine you want to change the behavior somehow, e.g., because you discovered a potential problem. Then you change it in one place, but you will forget the second location. Especially if your code reaches dimensions like 1,000, 10,0000 or 100,000 lines of code.
It is summarized in the SRP, the Single-Responsibilty-Principle. It states that every class (also applicable to functions) should only have one determination, that it "should do just one thing". If a function does more than one thing, you should break it apart into smaller chunks, smaller tasks.
Every time you come across (or write) a function with more than 10 or 20 lines of (real) code, you should be skeptical. Such functions rarely stick to this principle.
For your example, you could identify as individual tasks:
generate prime numbers, one by one (generate implies using yield for me)
collect n prime numbers. Uses 1. and puts them into a list
get nth prime number. Uses 1., but does not save every number, just waits for the nth. Does not consume as much memory as 2. does.
Find pairs of primes: Uses 1., remembers the previous number and, if the difference to the current number is two, yields this pair
collect all pairs of primes: Uses 4. and puts them into a list
...
...
The list is extensible, and you can reuse it at any level. Every function will not have more than 10 lines of code, and you will not be reinventing the wheel everytime.
Put them all into a module, and use it from every script for an Euler Problem related to primes.
In general, I started a small library for my Euler Problem scripts. You really can get used to writing reusable code in "Project Euler".
Keyword arguments
Another option you didn't mention (as far as I understand) is the use of optional keyword arguments. If you regard small, atomic functions as too complicated (though I really insist you should get used to it) you could add a keyword argument to control the return value. E.g., in some scipy functions there is a parameter full_output, that takes a bool. If it's False (default), only the most important information is returned (e.g., an optimized value), if it's True some supplementary information is returned as well, e.g., how well the optimization performed and how many iterations it took to converge.
You could define a parameter output_mode, with possible values "list", "last" ord whatever.
Recommendation
Stick to small, reusable chunks of code. Getting used to this is one of the most valuable things you can pick up at "Project Euler".
Remark
If you try to implement the pattern I propose for reusable functions, you might run into a problem immediately at point 1: How to create a generator-style function for this? E.g., if you use the sieve method. But it's not too bad.
My guess, create module that contain:
private core function (example: return list of n-th first primes or even something more generall)
several wrapper/util functions that use core one and prepare output different ways. (example: n-th prime number)
Try to reduce your functions as much as possible, and reuse them.
For example you might have a function next_prime which is called repeatedly by n_primes and n_th_prime.
This also makes your code more maintainable, as if you come up with a more efficient way to count primes, all you do is change the code in next_prime.
Furthermore you should make your output as neutral as possible. If you're function returns several values, it should return a list or a generator, not a comma separated string.
What is the most pythonic way to add several identical methods to several classes?
I could use a class decorator, but that seems to bring in a fair bit of complication and its harder to write and read than the other methods.
I could make a base class with all the methods and let the other classes inherit, but then for some of the classes I would be very tempted to allow multiple inheritance, which I have read frequently is to be avoided or minimized. Also, the "is-a" relationship does not apply.
I could also change them from being methods to make them stand-alone functions which just expect their values to supply the appropriate properties through duck-typing. This is in some ways clean, but it is less object oriented and makes it less clear when a function could be used on that type of object.
I could use delegation, but this requires all of the classes that want to call to have methods calling up to the helper classes methods. This would make the code base much longer than the other options and require adding a method to delegate every time I want to add a new function to the helper class.
I know giving one class an instance of the other as an attribute works nicely in some cases, but it does not always work cleanly and can make calls more complicated than they would be otherwise.
After playing around with it a bit, I am leaning towrds inheritance even when it leads to multiple inheritance. But I hesitate due to numerous texts warning very strongly against ever allowing multiple inheritance and some (such as the wikipedia entry) going so far as to say that inheritance just for code reuse such as this should be minimized.
This may be more clear with an example, so for a simplified example say we are dealing with numerous distinct classes which all have a location on an x, y grid. There are a lot of operations we might want to make methods of everything with an x, y location, such as a method to get the distance between two such entities, or the relative direction, midpoint between them, etc.
What would be the most pythonic way to give all such classes access to these methods that rely only on having x and y as attributes?
For your specific example, I would try to take advantage of duck-typing. Write plain simple functions that take objects which are assumed to have x and y attributes:
def distance(a, b):
"""
Returns distance between `a` and `b`.
`a` and `b` should have `x` and `y` attributes.
"""
return math.sqrt((a.x-b.x)**2 + (a.y-b.y)**2)
Very simple. To make it clear how the function can be used, just document it.
Plain old functions are best for this problem. E.g. instead of this ...
class BaseGeoObject:
def distanceFromZeroZero(self):
return math.sqrt(self.x()**2 + self.y()**2)
...
... just have functions like this one:
def distanceFromZeroZero(point):
return math.sqrt(point.x()**2 + point.y()**2)
This is a good solution because it's also easy to test - it's not necessary to subclass just to exercise a specific function.
I have a small Python program consisting of very few modules (about 4 or so). The main module creates a list of tuples, thereby representing a number of records. These tuples are available to the other modules through a simple function that returns them (say, get_records()).
I am not sure if this is good design however. The problem being that the other modules need to know the indexes of each element in the tuple. This increases coupling between the modules, and isn't very transparent to someone who wants to use the main module.
I can think of a couple of alternatives:
Make the index values of the tuple elements available as module constants (e.g., IDX_RECORD_TITLE, IDX_RECORD_STARTDATE, etc.). This avoids the need of magic numbers like title = record[3].
Don't use tuples, but create a record class, and return a list of these class objects. The advantage being that the class methods will have self-explaining names like record.get_title().
Don't use tuples, but dictionaries instead. So in this scenario, the function would return a list of dictionaries. The advantage being that the dictionary keys are also self-explanatory (though someone using the module would need to know them). But this seems like a huge overhead.
I find tuples to be one of the great strengths of Python (very easy to pass compound data around without the coding overhead of classes/objects), so I currently use (1), but still wonder what would be the best approach.
http://docs.python.org/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields
i do not see any overhead or complexity in passing objects over tuples(tuples are also objects)
IMO if tuple serves your purpose easily use it, but as you have seen the constraints just switch to a class which represent your data cleanily e.g.
class MyData(object):
def __init__(self, title, desc):
self.title = title
self.desc = desc
You need not add any getter or setter method .
In those cases, I tend to use dictionaries.
If only to have things easily understandable for myself when I come back a bit later to use the code.
I don't know if it's a "huge overhead". I guess it depends on how often you do it and what it is used for. I start off with the easiest solution and optimize when I really need to. It surprisingly seldom I need to change something like that.