I've been tasked with taking over a large, existing Python program.
The author created a MainClass(), along with many functions (technically not methods) that take 'self' as their first argument and are dynamically added to an ExternalClass() that gets imported/instantiated at runtime.
In this large file, it is very easy to confuse the methods of MainClass() with the dynamic methods of ExternalClass(), especially since they both take self as a first argument and are separated only by a few blank lines.
I know I should refactor, but for now my first priority is to keep the program working while I learn more about it.
I have put a comment between the MainClass() methods and the dynamic methods.
What is the proper term for functions that are intended to be added to another class as methods at runtime?
I've been using the term "free methods", but that doesn't seem right.
class MainClass():
    def __init__(self):...
    def method_1(self):...
    ...
    def method_n(self):...
#
# -- The following [term?] methods are added to ExternalClass() dynamically at runtime
#
def dynamic_method_1(self):...
def dynamic_method_n(self):...
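For readers unfamiliar with the pattern, here is a minimal sketch of how module-level functions like these can end up as methods of another class. The question does not show how the original program attaches them, so the `attach_methods` helper and the body of `ExternalClass` below are hypothetical stand-ins.

```python
class ExternalClass:
    """Stand-in for the class that is imported and instantiated at runtime."""

    def __init__(self, name):
        self.name = name


# Module-level functions written so they can be used as methods later.
def dynamic_method_1(self):
    return f"{self.name}: dynamic_method_1"


def dynamic_method_n(self):
    return f"{self.name}: dynamic_method_n"


def attach_methods(cls, *functions):
    """Hypothetical helper: add each function to cls under its own name."""
    for func in functions:
        setattr(cls, func.__name__, func)


attach_methods(ExternalClass, dynamic_method_1, dynamic_method_n)

obj = ExternalClass("demo")
print(obj.dynamic_method_1())
```

Assigning functions to a class after it has been defined is commonly called monkey patching; once attached, the functions behave like ordinary methods of that class.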
I have a data engineering program that is grabbing some data off of Federal government websites and transforming that data. I'm a bit confused on whether I need to use the 'self' keyword or if it's a better practice to not use a class at all. This is how it's currently organized:
class GetGovtData():
    def get_data_1(arg1=0, arg2=1):
        df = conduct_some_operations
        return df
    def get_data_2(arg1=4, arg2=5):
        df = conduct_some_operations_two
        return df
I'm mostly using a class here for organization purposes. For instance, there might be a dozen different methods from one class that I need to use. I find it more aesthetically pleasing / easier to type out this:
from data.get_govt_data import GetGovtData
df1 = GetGovtData.get_data_1()
df2 = GetGovtData.get_data_2()
Rather than:
from data import get_govt_data
df1 = get_govt_data.get_data_1()
df2 = get_govt_data.get_data_2()
Which just has a boatload of underscores. So I'm just curious if this would be considered bad code to use a class like this, without bothering with 'self'? Or should I just eliminate the classes and use a bunch of functions in my files instead?
If you develop functions within a Python class, you can define a function in two ways: one with self as the first parameter and one without self.
So, what is the difference between the two?
Function with self
The first one is a method, which is able to access content within the created object. This allows you to access the internal state of an individual object, e.g., a counter of some sort. These are the methods you usually use in object-oriented programming. A short intro can be found here [External Link]. These methods require you to create new instances of the given class.
Function without self
Functions without self can be called without initialising an instance of the class. This is why you can call them directly on the imported class.
Alternative solution
This is based on the comment of Tom K. Instead of using self, you can also use the decorator @staticmethod to indicate the role of the method within your class. Some more info can be found here [External link].
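A small sketch of the difference between the two kinds of function, using a hypothetical `Counter` class that is not part of the question:

```python
class Counter:
    def __init__(self):
        self.count = 0  # per-instance state

    def increment(self):
        # Instance method: needs an object and reads/writes its state.
        self.count += 1
        return self.count

    @staticmethod
    def describe():
        # Static method: takes no self and can be called on the class itself.
        return "counts things"


c = Counter()
print(c.increment())       # uses the instance
print(Counter.describe())  # no instance needed
```

With the decorator in place, `describe` can be called on the class or on an instance; without it, calling the function on an instance would pass the instance as an unexpected first argument.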
Final thought
To answer your initial question: You do not need to use self. In your case you do not need self, because you do not share the internal state of an object. Nevertheless, if you are using classes you should think about an object-oriented design.
I suppose you have a file called data/get_govt_data.py that contains your first code block. You can just rename that file to data/GetGovtData.py, remove the class line and not bother with classes at all, if you like. Then you can do
from data import GetGovtData
df1 = GetGovtData.get_data_1()
Depending on your setup you may need to create an empty file data/__init__.py for Python to see data as a package.
EDIT: Regarding the file naming, Python does not impose tight restrictions here. Note however that PEP 8 recommends short, all-lowercase names for modules and CapWords names for classes, and many projects follow that convention. Using CapitalCase for a module may lead others to assume for a second that it's a class. You may choose not to follow this convention if you do not want to use classes in your project.
To answer the question in the title first: The exact string 'self' is a convention (that I can see no valid reason to ignore, BTW), but the first argument of an instance method is always going to be a reference to the instance it is called on.
Whether you should use a class or flat functions depends on if the functions have shared state. From your scenario it sounds like they may have a common base URL, authentication data, database names, etc. Maybe you even need to establish a connection first? All those would be best held in the class and then used in the functions.
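As a rough illustration of what "shared state held in the class" could look like, here is a sketch. The class name `GovtDataClient`, the `build_url` helper, the `api_key` parameter, and the example.com URL are all placeholders invented for this example, and the download step is replaced by a stub.

```python
class GovtDataClient:
    """Hypothetical client: settings shared by all methods live on the instance."""

    def __init__(self, base_url, api_key=None):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def build_url(self, endpoint):
        # Every method reuses the same base URL through self.
        return f"{self.base_url}/{endpoint.lstrip('/')}"

    def get_data_1(self, arg1=0, arg2=1):
        # Stub standing in for the real download/transform step.
        return {"url": self.build_url("data_1"), "args": (arg1, arg2)}


client = GovtDataClient("https://example.com/api/")
print(client.get_data_1())
```

If the functions share nothing of this kind, plain module-level functions are likely the simpler choice.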
EDIT
Note, it was brought to my attention that Instance attribute attribute_name defined outside __init__ is a possible duplicate, which I mostly agree with (I didn't come upon this because I didn't know to search for pylint). However, I would like to keep this question open because of the fact that I want to be able to reinitialize my class using the same method. The general consensus in the previous question was to return each parameter from the loadData script and then parse it into the self object. This is fine, however, I would still have to do that again within another method to be able to reinitialize my instance of class, which still seems like extra work for only a little bit more readability. Perhaps the issue is my example. In real life there are about 30 parameters that are read in by the loadData routine, which is why I am hesitant to have to parse them in two different locations.
If the general consensus here is that returning the parameters is the way to go, then we can go ahead and close this question as a duplicate; however, in the meantime I would like to wait to see if anyone else has any ideas or a good explanation for why.
Original
This is something of a "best practices" question. I have been learning python recently (partially to learn something new and partially to move away from MATLAB). While working in python I created a class that was structured as follows:
class exampleClass:
    """
    This is an example class to demonstrate my question to stack exchange
    """
    def __init__( self, fileName ):
        exampleClass.loadData( self, fileName )
    def loadData( self, fileName ):
        """
        This function reads the data specified in the fileName into the
        current instance of exampleClass.
        :param fileName: The file that the data is to be loaded from
        """
        with open(fileName,'r') as sumFile:
            # the file method is readline (lowercase "l")
            self.name = sumFile.readline().strip(' \n\r\t')
Now this makes sense to me. I have an __init__ method that populates the current instance of the class by calling a population function. I also have the population function, which would allow me to reinitialize a given instance of this class if for some reason I need to (for instance, if the class takes up a lot of memory and instead of creating separate instances of the class I just want to have one instance that I overwrite).
However, when I put this code into my IDE (pycharm) it throws a warning that an instance attribute was defined outside of __init__. Now obviously this doesn't affect the operation of the code, everything works fine, but I am wondering if there is any reason to pay attention to the warning in this case. I could do something where I initialize all the properties to some default value in the init method before calling the loadData method but this just seems like unnecessary work to me and like it would slow down the execution (albeit only a very small amount). I could also have essentially two copies of the loadData method, one in the __init__ method and one as an actual method but again this just seems like unnecessary extra work.
Overall my question is what the best practice would be in this situation. Is there any reason that I should restructure the code in one of the ways I mentioned in the previous paragraph, or is this just an instance of an IDE with too broad a code-inspection warning? I can obviously see some instances where this warning is something to consider, but from my current experience it doesn't look like a problem in this case.
I think it's a best practice to define all of your attributes up front, even if you're going to redefine them later. When I read your code, I want to be able to see your data structures. If there's some attribute hidden in a method that only becomes defined under certain circumstances, it makes it harder to understand the code.
If it is inconvenient or impossible to give an attribute its final value, I recommend at least initializing it to None. This signals to the reader that the object includes that attribute, even if it gets redefined later.
class exampleClass:
    """
    This is an example class to demonstrate my question to stack exchange
    """
    def __init__( self, fileName ):
        # Note: this will be modified when a file is loaded
        self.name = None
        exampleClass.loadData( self, fileName )
Another choice would be for loadData to return the value rather than setting it, so your init might look like:
def __init__(self, fileName):
    self.name = self.loadData(fileName)
I tend to think this second method is better, but either method is fine. The point is, make your classes and objects as easy to understand as possible.
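One possible way to combine both suggestions while still allowing an instance to be reloaded is sketched below. The names `ExampleClass`, `load_data`, and `_read_name` are illustrative renamings, not the asker's code, and the temporary files exist only to make the example runnable.

```python
import os
import tempfile


class ExampleClass:
    def __init__(self, file_name):
        # Declare every attribute up front so readers can see them.
        self.name = None
        self.load_data(file_name)

    def load_data(self, file_name):
        """(Re)populate this instance from file_name."""
        self.name = self._read_name(file_name)

    @staticmethod
    def _read_name(file_name):
        # Hypothetical helper: parse the file and return the value.
        with open(file_name, "r") as sum_file:
            return sum_file.readline().strip()


# Usage with two throwaway files.
paths = []
for text in ("first\n", "second\n"):
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text)
        paths.append(handle.name)

example = ExampleClass(paths[0])
print(example.name)
example.load_data(paths[1])  # reinitialize the same instance
print(example.name)

for path in paths:
    os.remove(path)
```

The parsing lives in one place, so adding more parameters means extending the helper and the attribute list rather than duplicating the loading code.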
I have become stuck on a problem with a class that I am writing where I need to be able to reinitialize the parents of that class after having created an instance of the class. The problem is that the parent class has a read and a write mode that is determined by passing a string to the init function. I want to be able to switch between these modes without destroying the object and re-initialising. Here is an example of my problem:
from parent import Parent
class Child(Parent):
    def __init__(mode="w"):
        super.__init__(mode=mode)
    def switch_mode():
        # need to change the mode called in the super function here somehow
The idea is to extend a class that I have imported from a module to offer extended functionality. The problem is I still need to be able to access the original class methods from the new extended object. This has all worked smoothly so far with me simply adding and overwriting methods as needed. As far as I can see the alternative is to use composition rather than inheritance so that the object I want to extend is created as a member of the new class. The problem with this is this requires me to make methods for accessing each of the object's methods
i.e., lots of this sort of thing:
def read_frames(self):
    return self.memberObject.read_frames()
def seek(self):
    return self.memberObject.seek()
which doesn't seem all that fantastic and comes with the problem that if any new methods are added to the base class in the future I have to create new methods manually in order to access them, but is perhaps the only option?
Thanks in advance for any help!
This should work. super is a function.
super(Child, self).__init__(mode=mode)
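A fuller sketch of both options discussed in the question follows. The `Parent` class here is a hypothetical stand-in for the imported one, and re-running its initializer is only safe if the real `__init__` tolerates being called more than once. The `Wrapper` class shows the composition route without writing one forwarding method per parent method.

```python
class Parent:
    """Hypothetical stand-in for the imported parent class."""

    def __init__(self, mode="w"):
        self.mode = mode

    def read_frames(self):
        return f"read_frames in mode {self.mode!r}"


class Child(Parent):
    def __init__(self, mode="w"):
        # Python 3 form; super(Child, self).__init__(mode=mode) is equivalent.
        super().__init__(mode=mode)

    def switch_mode(self, mode):
        # Re-run the parent's initializer on the existing object.
        # Assumes Parent.__init__ is safe to call more than once.
        super().__init__(mode=mode)


class Wrapper:
    """Composition alternative: forward unknown attributes to the member."""

    def __init__(self, mode="w"):
        self._member = Parent(mode=mode)

    def switch_mode(self, mode):
        self._member = Parent(mode=mode)  # replace the wrapped object

    def __getattr__(self, name):
        # Called only when normal lookup fails, so Wrapper's own
        # attributes take precedence over the member's.
        if name == "_member":
            raise AttributeError(name)  # avoid recursion before __init__ runs
        return getattr(self._member, name)


child = Child()
child.switch_mode("r")
print(child.read_frames())

wrapped = Wrapper()
wrapped.switch_mode("r")
print(wrapped.read_frames())
```

Because `__getattr__` forwards any attribute the wrapper lacks, methods added to the parent class later become reachable without new forwarding code.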
I'm more of an engineer and less of a coder, but I know enough python and C++ to be dangerous.
I'm creating a python vector/matrix class as a helper class based upon numpy as well as cvxopt. The overall goal (which I've already obtained... the answer to this question will just make the class better) is to make dot products and other processes more unified and easier for numerical methods.
However, I'd like to make my helper class even more transparent. What I'd like to do is to redefine the cvxopt.matrix() init function based upon the current variable which was used. This is to say, if I have a custom matrix: "cstmat", I'd like the function "cvxopt.matrix(cstmat)" to be defined by my own methods instead of what is written in the cvxopt class.
In short, I'd like to "intercept" the other function call and use my own function.
The kicker, though, is that I don't want to take over cvxopt.matrix(any_other_type). I just want to redefine the function when it's called upon my own custom class. Is this possible?
Thanks,
Jon
You can do this, but it's not pretty.
You can probably do something along these lines:
import cvxopt

cvxopt._orig_matrix = cvxopt.matrix
def my_matrix(*args, **kwargs):
    if args and isinstance(args[0], cstmat):
        # do your stuff here
        pass
    else:
        # pass every other call through and return its result
        return cvxopt._orig_matrix(*args, **kwargs)
cvxopt.matrix = my_matrix
But you're probably better off finding a less weird way. And there are no guarantees that it won't forget who "self" is.
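The wrapping idea can be tried out without touching cvxopt at all. The sketch below uses a stand-in namespace called `fake_lib` and a hypothetical `CustomMatrix` type; neither is part of cvxopt, and the return values are dummy tuples chosen only to show which branch handled the call.

```python
import types

# Stand-in for the third-party module; the real cvxopt is not used here.
fake_lib = types.SimpleNamespace(
    matrix=lambda data: ("library matrix", list(data))
)


class CustomMatrix:
    """Hypothetical custom type that needs special handling."""

    def __init__(self, rows):
        self.rows = rows


_orig_matrix = fake_lib.matrix  # keep a reference to the original


def my_matrix(*args, **kwargs):
    # Intercept only calls whose first argument is our own type.
    if args and isinstance(args[0], CustomMatrix):
        return ("custom matrix", args[0].rows)
    # Everything else goes to the original, and its result is returned.
    return _orig_matrix(*args, **kwargs)


fake_lib.matrix = my_matrix

print(fake_lib.matrix(CustomMatrix([[1, 2]])))
print(fake_lib.matrix([3, 4]))
```

One caveat with this approach in general: if the replaced name is also used as a type elsewhere (for example in `isinstance` checks), swapping it for a plain function will break those uses.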
Better would be to use inheritance! Kinda like this:
class Cstmat(cvxopt.matrix):
    def __init__(self, *args, **kwargs):
        pass
    def matrix(self, arg):
        if isinstance(arg, Cstmat):
            # do your stuff here
            pass
        else:
            return cvxopt.matrix(arg)
I want to create a class with two methods at this point (I also want to be able to alter the class, obviously).
class ogrGeo(object):
    def __init__(self):
        pass
    def CreateLine(self, o_file, xy):
        # lots of code
        pass
    def CreatePoint(self, o_file, xy):
        # lots of the same code as CreateLine(),
        # only minor differences
        pass
To keep things as clean as possible and to repeat as little code as possible, I'm asking for some advice. The two methods CreateLine() and CreatePoint() share a lot of code. To reduce redundancy:
Should I define a third method that both methods can call?
In this case you could still call
o = ogrGeo()
o.CreateLine(...)
o.CreatePoint(...)
separately.
Or should I merge them into one method? Is there another solution I haven't thought about or know nothing about?
Thanks already for any suggestions.
Whether you should merge the methods into one is a matter of API design. If the functions have different purposes, then you keep them separate. I would merge them if client code is likely to follow the pattern
if some_condition:
    o.CreateLine(f, xy)
else:
    o.CreatePoint(f, xy)
But otherwise, don't merge. Instead, refactor the common code into a private method, or even a freestanding function if the common code does not touch self. Python has no notion of "private method" built into the language, but names with a leading _ will be recognized as such.
It's perfectly normal to factor out common code into a (private) helper method:
class ogrGeo(object):
    def __init__(self):
        pass
    def CreateLine(self, o_file, xy):
        # lots of code
        value = self._utility_method(xy)
    def CreatePoint(self, o_file, xy):
        # lots of the same code as CreateLine(),
        # only minor differences
        value = self._utility_method(xy)
    def _utility_method(self, xy):
        # Common code here
        return value
The method could return a value, or it could directly manipulate the attributes on self.
A word of advice: read the Python style guide and stick to its conventions. Most other Python projects do, and it'll make your code easier to comprehend for other Python developers if you do.
For the pieces of code that will overlap, consider whether those can be their own separate functions as well. Then CreateLine would be comprised of several calls to certain functions, with parameter choices that make sense for CreateLine, meanwhile CreatePoint would be several function calls with appropriate parameters for creating a point.
Even if those new auxiliary functions aren't going to be used elsewhere, it's better to modularize them as separate functions than to copy/paste code. But, if it is the case that the auxiliary functions needed to create these structures are pretty specific, then why not break them out into their own classes?
You could make an "Object" class that involves all of the basics for creating objects, and then have "Line" and "Point" classes which derive from "Object". Within those classes, override the necessary functions so that the construction is specific, relying on auxiliary functions in the base "Object" class for the portions of code that overlap.
Then the ogrGeo class will construct instances of these other classes. Even if the ultimate consumer of "Line" or "Shape" doesn't need a full blown class object, you can still use this design, and give ogrGeo the ability to return the sub-pieces of a Line instance or a Point instance that the consumer does wish to use.
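A minimal sketch of that class layout, with no real OGR calls: the base class is named `GeoObject` here (rather than "Object") to avoid confusion with Python's built-in `object`, and the method names, file names, and string results are hypothetical placeholders for the actual geometry-writing code.

```python
class GeoObject:
    """Hypothetical base class holding the code shared by all shapes."""

    geometry_type = None  # subclasses set this

    def create(self, o_file, xy):
        header = self._common_setup(o_file)   # shared code
        geometry = self._build_geometry(xy)   # the part that differs
        return f"{header}: {geometry}"

    def _common_setup(self, o_file):
        return f"{o_file} [{self.geometry_type}]"

    def _build_geometry(self, xy):
        raise NotImplementedError


class Line(GeoObject):
    geometry_type = "line"

    def _build_geometry(self, xy):
        return " -> ".join(str(point) for point in xy)


class Point(GeoObject):
    geometry_type = "point"

    def _build_geometry(self, xy):
        return str(xy)


print(Line().create("roads.shp", [(0, 0), (1, 1)]))
print(Point().create("wells.shp", (2, 3)))
```

Each subclass overrides only the step that differs, while the overall sequence stays in the base class.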
It hardly matters. You want the class methods to be as usable as possible for the calling programs, and it's slightly easier and more efficient to have two methods than to have a single method with an additional parameter for the type of object to be created:
def CreateObj(self, obj, o_file, xy):  # obj = 0 for Point, 1 for Line, ...
Recommendation: use separate API calls and factor the common code into method(s) that can be called within your class.
You could also go in the other direction, especially if the following is the case:
def methA/B(...):
    lots of common code
    small difference
    lots of common code
then you could do
def _common(..., callback):
    lots of common code
    callback()
    lots of common code
def methA(...):
    def _mypart():
        do what A does
    _common(..., _mypart)
def methB(...):
    def _mypart():
        do what B does
    _common(..., _mypart)
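A runnable version of the callback pattern above might look like this. The function names `create_line` and `create_point` and the strings standing in for "lots of common code" are placeholders chosen for the example.

```python
def _common(xy, callback):
    steps = ["open file"]        # common code before the difference
    steps.append(callback(xy))   # the small difference
    steps.append("close file")   # common code after the difference
    return steps


def create_line(xy):
    def _mypart(xy):
        return f"write line through {len(xy)} points"
    return _common(xy, _mypart)


def create_point(xy):
    def _mypart(xy):
        return f"write point at {xy}"
    return _common(xy, _mypart)


print(create_line([(0, 0), (1, 1)]))
print(create_point((2, 3)))
```

This fits best when the shared code surrounds the differing step on both sides, which a plain helper method cannot express in a single call.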