I need to connect three classes so that some of them can use the others' methods.
Here is an example of how the classes work. As you can see, data is read in through the Data class and then manipulated by the Statistics and Plotting classes.
class Data(object):  # This class reads a file and creates a DataFrame object
    def __init__(self, input_data):
        self.input_data = input_data

    def Tool(self):
        # [df managing operations]
        return df

class Statistics:  # This class uses the Data dataframe and manipulates it
    def mean(df):
        return scalar

class Plotting:  # This class plots the Data dataframe as a function of the Statistics outputs
    def with_colors(df, scalar):
        pass  # [plotting operations]
I don't think Plotting or Statistics map well to classes or instances.
They look more like libraries of functions. Otherwise you will instantiate a single Plotting and single Statistics just to call their methods on something else.
It looks like you grouped your utility methods in classes and ended up with too many methods. This is just an organization/partition problem.
If you want you can just make them modules, define functions there, import the relevant functions into the main program and pass those functions the data they need as arguments.
Also, it looks like you are just creating a dataframe-like object and adding methods to it. And reading data from somewhere looks like just another utility function.
While nothing stops you from doing those things, including inheriting from DataFrame to make your own extended version, I think you are better off using df objects as-is and passing them around to utility functions.
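A minimal sketch of the "plain functions" layout described above. The names read_data, mean and with_colors are hypothetical, and a plain list of numbers stands in for the DataFrame so the example runs without pandas. In a real project each function would probably live in its own module and be imported into the main program.

```python
def read_data(rows):
    """Stand-in for the file-reading step: return the cleaned values."""
    return [float(value) for value in rows]


def mean(data):
    """Statistics helper: receives the data as an argument, returns a scalar."""
    return sum(data) / len(data)


def with_colors(data, threshold):
    """Plotting helper: pick a color per value, based on a scalar from the statistics step."""
    return ["red" if value > threshold else "blue" for value in data]


# Main program: the data object is simply passed from one function to the next.
data = read_data(["1", "2", "6"])
average = mean(data)
colors = with_colors(data, average)
```

No Statistics or Plotting instance is ever created; the data is the only thing that travels between the functions.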
Try passing a Data object to the methods of the Statistics and Plotting classes.
I have a data engineering program that is grabbing some data off of Federal government websites and transforming that data. I'm a bit confused on whether I need to use the 'self' keyword or if it's a better practice to not use a class at all. This is how it's currently organized:
class GetGovtData():
    def get_data_1(arg1=0, arg2=1):
        df = conduct_some_operations
        return df

    def get_data_2(arg1=4, arg2=5):
        df = conduct_some_operations_two
        return df
I'm mostly using a class here for organization purposes. For instance, there might be a dozen different methods from one class that I need to use. I find it more aesthetically pleasing / easier to type out this:
from data.get_govt_data import GetGovtData
df1 = GetGovtData.get_data_1()
df2 = GetGovtData.get_data_2()
Rather than:
from data import get_govt_data
df1 = get_govt_data.get_data_1()
df2 = get_govt_data.get_data_2()
Which just has a boatload of underscores. So I'm just curious if this would be considered bad code to use a class like this, without bothering with 'self'? Or should I just eliminate the classes and use a bunch of functions in my files instead?
If you develop functions within a Python class, there are two ways of defining a function: one with self as the first parameter and one without self.
So, what is the difference between the two?
Function with self
The first one is a method, which is able to access content within the created object. This allows you to access the internal state of an individual object, e.g., a counter of some sort. These are the methods you usually use in object-oriented programming. A short intro can be found here [External Link]. These methods require you to create new instances of the given class.
Function without self
The second one is a function that can be used without initialising an instance of the class. This is why you can call it directly on the imported class.
Alternative solution
This is based on the comment of Tom K. Instead of using self, you can also use the decorator @staticmethod to indicate the role of the method within your class. Some more info can be found here [External link].
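A short sketch of the two kinds of function side by side. Counter is a hypothetical class added only to show a method that needs self; the body of get_data_1 is a placeholder, not the asker's real code.

```python
class Counter:
    """Needs self: every instance keeps its own internal state."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


class GetGovtData:
    """No shared state: the class only groups related functions."""

    @staticmethod
    def get_data_1(arg1=0, arg2=1):
        # Placeholder for the real data-fetching work.
        return [arg1, arg2]


counter = Counter()
counter.increment()
second = counter.increment()

from_class = GetGovtData.get_data_1()
from_instance = GetGovtData().get_data_1()
```

With the decorator, the call works both on the class and on an instance. Without it, calling get_data_1 on an instance would pass the instance itself in as arg1.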
Final thought
To answer your initial question: you do not need self in your case, because your functions do not share the internal state of an object. Nevertheless, if you are using classes you should think about an object-oriented design.
I suppose you have a file called data/get_govt_data.py that contains your first code block. You can just rename that file to data/GetGovtData.py, remove the class line and not bother with classes at all, if you like. Then you can do
from data import GetGovtData
df1 = GetGovtData.get_data_1()
Depending on your setup you may need to create an empty file data/__init__.py for Python to see data as a package.
EDIT: Regarding the file naming, Python does not impose any tight restrictions here. Note, however, that many projects follow the PEP 8 convention of lowercase names for modules and functions and CapitalCase for classes. Using CapitalCase for a module may briefly lead others to assume it's a class. You may choose not to follow this convention if you do not want to use classes in your project.
To answer the question in the title first: the exact string 'self' is a convention (that I can see no valid reason to ignore BTW), but the first argument of an instance method is always going to be a reference to the instance it was called on.
Whether you should use a class or flat functions depends on if the functions have shared state. From your scenario it sounds like they may have a common base URL, authentication data, database names, etc. Maybe you even need to establish a connection first? All those would be best held in the class and then used in the functions.
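A minimal sketch of what "shared state held in the class" could look like. Everything here is an assumption for illustration: the class name GovtDataClient, the example.com URL, the api_key parameter and the endpoint names are invented, and the methods only build URLs instead of making real requests.

```python
class GovtDataClient:
    """Holds the settings that every data-fetching method needs."""

    def __init__(self, base_url, api_key):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def _build_url(self, endpoint):
        # Shared state (base URL, key) is read from self, not passed around.
        return f"{self.base_url}/{endpoint}?key={self.api_key}"

    def get_data_1(self):
        return self._build_url("dataset-one")

    def get_data_2(self):
        return self._build_url("dataset-two")


client = GovtDataClient("https://example.com/api/", api_key="demo")
url_1 = client.get_data_1()
url_2 = client.get_data_2()
```

If there is no such shared configuration, the flat functions from the question are the simpler choice.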
Suppose I create a class with my own custom functions. I also want this class to inherit everything from pandas.
import pandas

class customClass(pandas.DataFrame):
    def my_func(self, x, y):
        return x + y
Instantiating:
a = customClass()
Typing "a." + Tab, I see a lot of pandas methods, but I'm missing some other things like read_csv.
Is there a way to get those as well? The objective would be to just use this custom class for everything.
See the Python tutorial
The most important thing to notice for your specific question is that read_csv is not a method of DataFrame; it is a function of the pandas module. When you use it, you call
pd.read_csv("local_file.csv")
not
my_df.read_csv("local_file.csv")
Your customClass does not include that method; it's not reasonable to suppose that your custom instance would show it as a method choice.
For your use case, you would still use pandas.read_csv in building a data frame of your custom class.
If you want to inherit the entire pandas pantheon, then you'll need to do so explicitly. I don't recommend it.
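The same distinction can be sketched without pandas, using only the standard library. CustomTable and read_table are hypothetical names: CustomTable plays the role of customClass, and read_table plays the role that pandas.read_csv plays for DataFrame, a module-level function that builds an instance rather than a method of the class.

```python
import csv
import io


class CustomTable(list):
    """A list of row dicts with one extra custom method."""

    def column(self, name):
        return [row[name] for row in self]


def read_table(text):
    """Module-level reader: parses CSV text and wraps the rows in CustomTable."""
    return CustomTable(csv.DictReader(io.StringIO(text)))


table = read_table("x,y\n1,2\n3,4\n")
values = table.column("x")

# The reader belongs to the module, so it is not an attribute of the instance.
has_reader = hasattr(table, "read_table")
```

The custom class gets its data from the module-level reader, in the same way you would still call pandas.read_csv and then build your custom frame from the result.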
I have quite a bit of confusion about how to use classes. I understand what they are, and why they should be used, just not how. For example, we're given a pre-made class (I'll call it class Class_1(object) to keep things simple) with a few functions (methods, right?) and variables in it.
class Class_1(object):
    var_1 = [a, b, c]
    var_2 = [x, y, z]
    var_3 = {n: [o, p], g: [h, i]}

    def method_1(self):
        '''here's a method'''
(As a side note, the Class_1(object) does have the __init__(self): method already done.)
Now, in a separate program, I've imported the file that contains that class at the top of the program, but how do I use methods or variables from the class? For example, if I want to check a user input against a value in var_1, how would I do that?
I've gotten better with functions in general, but calling on classes and methods is as clear as mud.
Edit: Realized I said "methods" instead of "variables" when I actually need both.
To use the class, you need to create a class instance from the separate file:
import filename1
class1 = filename1.Class_1()
With the instance, you can then access the member variables and call the methods:
value1 = class1.var_1
result1 = class1.method_1()
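For the specific case of checking user input against var_1, a minimal sketch could look like this. The string values in var_1, the body of method_1 and the helper is_allowed are assumptions made up for the example, and the class is defined inline instead of being imported so that the snippet is self-contained.

```python
class Class_1(object):
    var_1 = ["a", "b", "c"]  # placeholder values

    def method_1(self):
        """Return the number of allowed values."""
        return len(self.var_1)


def is_allowed(user_input, instance):
    """Check a piece of user input against the values stored in var_1."""
    return user_input.strip() in instance.var_1


class1 = Class_1()
found = is_allowed(" b ", class1)
missing = is_allowed("z", class1)
count = class1.method_1()
```

In an interactive program the first argument would typically come from input(). Note that var_1 is a class attribute here, so Class_1.var_1 works as well as class1.var_1.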
I have a rather lengthy class for data analysis. In this class there are functions for input, output, plotting, different analysis steps and so on. I would really like to split this class into smaller, easier-to-read subclasses.
The easiest way would of course be to define a superclass and then derive multiple subclasses from it. However, this is not what I want, because functions of one subclass cannot change the variables of another subclass.
What I want to have is a splitting of the class definition into multiple files where I can group certain methods.
The structure should be something like:
master.py # contains something that puts together all the parts
io.py # contains function for data input / output
plot.py # contains functions for plotting / visualization of data
analyze1.py # contains functions to perform certain analysis steps
analyze2.py # contains functions to perform certain analysis steps
Take a look at mixins:
plot.py:
class DataPlotter(object):
    def plot(self):
        # lots of code
        my_plot_lib.plot(self.data)  # assume self.data is available in instance
io.py:
class DataIOProvider(object):
    def read(self, filename):
        # lots of code
        self.data = magic_data
master.py:
from plot import DataPlotter
from io import DataIOProvider

class GodDataProcessor(DataPlotter, DataIOProvider):
    def run(self):
        self.read('my_file.txt')
        self.plot()
Note that you should wrap your code in some package to avoid name clashes (io is the name of a standard-library module in Python).
All base classes may reside in individual modules, and when an attribute is set in one of the base classes, you can simply assume it's available in all the other classes, because they all operate on the same instance.
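A runnable sketch of the mixin idea, kept in a single file for brevity. The class names and method bodies are placeholders: reading takes a list instead of a file name, and "plotting" returns a text bar chart instead of calling a plotting library. What it is meant to show is that self.data, set by one mixin, is visible to the others.

```python
class DataIOProvider:
    def read(self, values):
        # Stand-in for file input: store the parsed values on the instance.
        self.data = [float(v) for v in values]


class DataAnalyzer:
    def scale(self, factor):
        # Changes an attribute that was created by another mixin.
        self.data = [v * factor for v in self.data]


class DataPlotter:
    def plot(self):
        # Stand-in for real plotting: one bar of '#' characters per value.
        return " ".join("#" * int(v) for v in self.data)


class DataProcessor(DataIOProvider, DataAnalyzer, DataPlotter):
    def run(self, values):
        self.read(values)
        self.scale(2)
        return self.plot()


result = DataProcessor().run(["1", "2"])
```

Each mixin could be moved to its own module and imported into the file that defines DataProcessor.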
I want to create a class with two methods at this point (I also want to be able to alter the class obviously).
class ogrGeo(object):
    def __init__(self):
        pass

    def CreateLine(self, o_file, xy):
        # lots of code
        pass

    def CreatePoint(self, o_file, xy):
        # lots of the same code as CreateLine(),
        # only minor differences
        pass
To keep things clean and to repeat as little code as possible, I'm asking for some advice. The two methods CreateLine() and CreatePoint() share a lot of code. To reduce redundancy:
Should I define a third method that both methods can call?
In this case you could still call
o = ogrGeo()
o.CreateLine(...)
o.CreatePoint(...)
separately.
Or should I merge them into one method? Is there another solution I haven't thought about or know nothing about?
Thanks already for any suggestions.
Whether you should merge the methods into one is a matter of API design. If the functions have different purposes, then keep them separate. I would merge them if client code is likely to follow the pattern
if some_condition:
    o.CreateLine(f, xy)
else:
    o.CreatePoint(f, xy)
But otherwise, don't merge. Instead, refactor the common code into a private method, or even a freestanding function if the common code does not touch self. Python has no notion of "private method" built into the language, but names with a leading _ will be recognized as such.
It's perfectly normal to factor out common code into a (private) helper method:
class ogrGeo(object):
    def __init__(self):
        pass

    def CreateLine(self, o_file, xy):
        # lots of code
        value = self._utility_method(xy)

    def CreatePoint(self, o_file, xy):
        # lots of the same code as CreateLine(),
        # only minor differences
        value = self._utility_method(xy)

    def _utility_method(self, xy):
        # Common code here
        return value
The method could return a value, or it could directly manipulate the attributes on self.
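A runnable version of the same pattern, as a sketch only. The class is renamed OgrGeo, the o_file parameter is dropped, and the bodies are invented: the "common code" is assumed to be coordinate validation, and the methods return plain dicts instead of writing anything with OGR.

```python
class OgrGeo:
    def create_line(self, xy):
        coords = self._validate(xy)  # shared step
        if len(coords) < 2:
            raise ValueError("a line needs at least two vertices")
        return {"type": "LineString", "coordinates": coords}

    def create_point(self, xy):
        coords = self._validate(xy)  # same shared step
        if len(coords) != 1:
            raise ValueError("a point needs exactly one vertex")
        return {"type": "Point", "coordinates": coords[0]}

    def _validate(self, xy):
        # Common code: convert every pair to a tuple of floats.
        return [(float(x), float(y)) for x, y in xy]


o = OgrGeo()
line = o.create_line([(0, 0), (1, 1)])
point = o.create_point([(1, 2)])
```

Callers still use two separate public methods; only the duplicated part has moved into the helper.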
A word of advice: read the Python style guide and stick to its conventions. Most other Python projects do, and it'll make your code easier to comprehend for other Python developers if you do.
For the pieces of code that will overlap, consider whether those can be their own separate functions as well. Then CreateLine would be comprised of several calls to certain functions, with parameter choices that make sense for CreateLine, meanwhile CreatePoint would be several function calls with appropriate parameters for creating a point.
Even if those new auxiliary functions aren't going to be used elsewhere, it's better to modularize them as separate functions than to copy/paste code. But if the auxiliary functions needed to create these structures are pretty specific, then why not break them out into their own classes?
You could make an "Object" class that involves all of the basics for creating objects, and then have "Line" and "Point" classes which derive from "Object". Within those classes, override the necessary functions so that the construction is specific, relying on auxiliary functions in the base "Object" class for the portions of code that overlap.
Then the ogrGeo class will construct instances of these other classes. Even if the ultimate consumer of "Line" or "Shape" doesn't need a full blown class object, you can still use this design, and give ogrGeo the ability to return the sub-pieces of a Line instance or a Point instance that the consumer does wish to use.
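A rough sketch of that class layout. The base class is called GeoObject here rather than "Object" to avoid confusion with Python's built-in object, and the attributes and checks are invented for illustration. The shared construction code lives in the base class, and each subclass overrides only the part that differs.

```python
class GeoObject:
    geometry_type = None

    def __init__(self, xy):
        # Code shared by every geometry: normalise the coordinates.
        self.coords = [(float(x), float(y)) for x, y in xy]
        self.check()

    def check(self):
        raise NotImplementedError("subclasses define their own check")

    def describe(self):
        return f"{self.geometry_type}: {self.coords}"


class Line(GeoObject):
    geometry_type = "Line"

    def check(self):
        if len(self.coords) < 2:
            raise ValueError("a line needs at least two vertices")


class Point(GeoObject):
    geometry_type = "Point"

    def check(self):
        if len(self.coords) != 1:
            raise ValueError("a point needs exactly one vertex")


class OgrGeo:
    def create_line(self, xy):
        return Line(xy)

    def create_point(self, xy):
        return Point(xy)


point = OgrGeo().create_point([(1, 2)])
line = OgrGeo().create_line([(0, 0), (1, 1)])
```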
It hardly matters. You want the class methods to be as usable as possible for the calling programs, and it's slightly easier and more efficient to have two methods than to have a single method with an additional parameter for the type of object to be created:
def CreateObj(self, obj, o_file, xy):  # obj = 0 for Point, 1 for Line, ...
Recommendation: use separate API calls and factor the common code into method(s) that can be called within your class.
You could also go in the other direction, especially if the following is the case:
def methA/B(...):
    lots of common code
    small difference
    lots of common code
then you could do
def _common(..., callback):
    lots of common code
    callback()
    lots of common code

def methA(...):
    def _mypart(): do what A does
    _common(..., _mypart)

def methB(...):
    def _mypart(): do what B does
    _common(..., _mypart)
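A runnable sketch of this callback approach. The concrete work is made up: the common code parses and rounds numbers, and the "small difference" is whether the values are summed or the maximum is taken. Unlike the outline above, the callback here receives the intermediate data as an argument, which is one possible variation.

```python
def _common(values, callback):
    cleaned = [float(v) for v in values]  # common code before the difference
    result = callback(cleaned)            # the small difference
    return round(result, 2)               # common code after the difference


def meth_a(values):
    def _my_part(cleaned):
        return sum(cleaned)

    return _common(values, _my_part)


def meth_b(values):
    def _my_part(cleaned):
        return max(cleaned)

    return _common(values, _my_part)


total = meth_a(["1.5", "2.25"])
largest = meth_b(["1.5", "2.25"])
```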