I am using Python 3.7. I am running some analysis on the RSSI values obtained from Wireshark and therefore need to plot the various RSSI values with respect to a channel.
I have created the class below to hold different attributes such as mean, median, mode, standard deviation, etc., each stored as a list.
While parsing through the individual data files I plan to collect the data in these lists for each channel.
To be able to plot these values I need to pass the attributes as lists to the plot function, and I am having trouble designing/defining the data structures.
ch_rssiattr_list = collections.defaultdict(lambda : collections.defaultdict(list))
ch_rssiattr_list[channel]['mean'].append(mean)
The above approach would be acceptable, but it does not use class instantiation.
A more elegant way, I thought, would be to create a class with the various attributes described above and populate it as I parse the data across the various distances.
In C++ I would create a map with the key as a channel and value as a class object.
As I am parsing through the data, I need a way to update the values in the following way:
ch_rssiattr_list[channel].mean_list.append(mean)
I do not know how to declare this data structure.
Any other design/implementation approaches are welcome as well, but the two ways above seem simple enough to me, so knowing the right way to declare the data structures here would be helpful. Hopefully I have explained my question well enough. A Google search on nested dictionaries and lists did not give me exactly what I was looking for; I do understand how nested dictionaries and lists can be declared, but how to declare them for a custom class is a bit confusing to me.
Here is what my custom class looks like.
class summary_lists(object):
    def __init__(self, dist_list, mean_list, median_list, mode_list,
                 std_dev_list, mmode_list, unique_count_list,
                 total_datapoint_list, min_list, max_list):
        self.dist_list = dist_list
        self.mean_list = mean_list
        self.median_list = median_list
        self.mode_list = mode_list
        self.std_dev_list = std_dev_list
        self.mmode_list = mmode_list
        self.unique_count_list = unique_count_list
        self.total_datapoint_list = total_datapoint_list
        self.min_list = min_list
        self.max_list = max_list
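For the map-of-channel-to-object idea, one option (a minimal sketch; the no-argument constructor, the trimmed attribute set, and the example values are my own simplifications) is to give the class a default constructor that creates the empty lists, and use the class itself as the factory for a defaultdict:

import collections

class SummaryLists:
    def __init__(self):
        # one empty list per statistic; extend with the remaining attributes
        self.dist_list = []
        self.mean_list = []
        self.median_list = []

ch_rssiattr_list = collections.defaultdict(SummaryLists)

channel, mean = 6, -42.0  # example values
ch_rssiattr_list[channel].mean_list.append(mean)  # entry created on first access

defaultdict calls SummaryLists() the first time an unseen channel is indexed, which gives exactly the ch_rssiattr_list[channel].mean_list.append(mean) access pattern described above.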
To provide a bit of context, I am building a risk model that pulls data from various different sources. Initially I wrote the model as a single function that, when executed, read in the different data sources as pandas.DataFrame objects and used those objects when necessary. As the model grew in complexity, it quickly became unreadable and I found myself copying and pasting blocks of code often.
To clean up the code I decided to make a class that, when initialized, reads, cleans and parses the data. Initialization takes about a minute to run and builds my model in its entirety.
The class also has some additional functionality. There is a generate_email method that sends an email with details about high risk factors, and another method, append_history, that snapshots the risk model at a point in time and saves it so I can run comparisons over time.
The thing about these two additional methods is that I cannot imagine a scenario where I would call them without first re-calibrating my risk model. So I have considered calling them in __init__() like my other methods. I haven't, only because I am trying to justify having a class in the first place.
I am consulting this community because my project structure feels clunky and awkward. I am inclined to believe that I should not be using a class at all. Is it frowned upon to create classes merely for the purpose of organization? Also, is it bad practice to call instance methods (that take upwards of a minute to run) within __init__()?
Ultimately, I am looking for reassurance or a better code structure. Any help would be greatly appreciated.
Here is some pseudo code showing my project structure:
class RiskModel:
    def __init__(self, data_path_a, data_path_b):
        self.data_path_a = data_path_a
        self.data_path_b = data_path_b
        self.historical_data = None
        self.raw_data = None
        self.lookup_table = None
        self._read_in_data()
        self.risk_breakdown = None
        self._generate_risk_breakdown()
        self.risk_summary = None
        self._generate_risk_summary()

    def _read_in_data(self):
        # read in a .csv
        self.historical_data = pd.read_csv(self.data_path_a)
        # read an Excel file containing many sheets into an ordered dictionary
        self.raw_data = pd.read_excel(self.data_path_b, sheet_name=None)
        # store a specific sheet from the Excel file that is used by most of
        # my class's methods
        self.lookup_table = self.raw_data["Lookup"]

    def _generate_risk_breakdown(self):
        '''
        A function that creates a DataFrame from self.historical_data,
        self.raw_data, and self.lookup_table and stores it in
        self.risk_breakdown
        '''
        self.risk_breakdown = some_dataframe

    def _generate_risk_summary(self):
        '''
        A function that creates a DataFrame from self.lookup_table and
        self.risk_breakdown and stores it in self.risk_summary
        '''
        self.risk_summary = some_dataframe

    def generate_email(self, recipient):
        '''
        A function that sends an email with details about high risk factors
        '''

if __name__ == "__main__":
    risk_model = RiskModel(data_path_a, data_path_b)
    risk_model.generate_email("recipient@generic.com")
In my opinion it is a good way to organize your project, especially since you mentioned the high rate of reusability of parts of the code.
One thing, though: I wouldn't put the _read_in_data, _generate_risk_breakdown and _generate_risk_summary methods inside __init__, but would instead let the user call these methods after initializing the RiskModel class instance.
This way the user would be able to read in data from a different path, or to generate only the risk breakdown or summary, without reading in the data once again.
Something like this:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
my_risk_model.generate_risk_breakdown(parameters)
my_risk_model.generate_risk_summary(other_parameters)
If there is a risk of the user calling these methods in an order that would break the logical chain, you could raise an exception if generate_risk_breakdown or generate_risk_summary is called before read_in_data. Of course, you could also move only the generate... methods out, leaving the data import inside __init__.
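A minimal sketch of such a guard (the RuntimeError and the raw_data check are illustrative choices, not from the original code):

class RiskModel:
    def __init__(self):
        self.raw_data = None

    def read_in_data(self, path_a, path_b):
        self.raw_data = {"a": path_a, "b": path_b}  # stand-in for the real loading

    def generate_risk_breakdown(self):
        # refuse to run before the data has been loaded
        if self.raw_data is None:
            raise RuntimeError("call read_in_data() before generate_risk_breakdown()")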
To advocate further for exposing the generate... methods outside of __init__, consider a scenario where you would like to generate multiple risk summaries, changing various parameters. It would make sense not to create the RiskModel and read the same data every time, but instead to change the input to the generate_risk_summary method:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
for parameter in [50, 60, 80]:
    my_risk_model.generate_risk_summary(parameter)
    my_risk_model.generate_email('test@gmail.com')
It can sometimes be convenient to group variables under a given object.
My use case is tensorflow, where you often have to define a graph first and then feed it with actual data. To avoid getting the names of the graph variables jumbled up with those of the data variables, it's useful to group them all under an object. What I've been doing is:
g = lambda: None
g.iterator = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(minibatch_size).make_initializable_iterator()
g.x_next, g.y_next = g.iterator.get_next()
g.data_updates = g.x_data.assign(g.x_next), g.y_data.assign(g.y_next)
Except that when you use lambda: None your coworkers tend to get angry and confused.
Is there an alternative that provides equally clean syntax but uses something that is more obviously a container than lambda: None?
I first tried making them all static members of a class, but the problem is that static members cannot reference other static members. g=object() would be nice but doesn't allow you to assign attributes.
If it's not worth defining a dedicated class, you can use types.SimpleNamespace, which is a class specifically designed to do nothing but hold attributes.
import types

g = types.SimpleNamespace()
g.iterator = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(minibatch_size).make_initializable_iterator()
g.x_next, g.y_next = g.iterator.get_next()
g.data_updates = g.x_data.assign(g.x_next), g.y_data.assign(g.y_next)
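As a standalone illustration (unrelated to the TensorFlow snippet above), SimpleNamespace also accepts keyword arguments and has a readable repr:

import types

g = types.SimpleNamespace(x=1, y=2)
g.z = g.x + g.y
print(g)  # namespace(x=1, y=2, z=3)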
I'm currently on some heavy data analytics projects, and am trying to create a Python wrapper class to help streamline a lot of the mundane preprocessing steps involved in cleaning data, partitioning it into test/validation sets, standardizing it, etc. The idea, ultimately, is to transform raw data into easily consumable processed matrices for machine learning algorithms to ingest for training and testing. Ideally, I'm working towards the point where
data = DataModel(AbstractDataModel)
processed_data = data.execute_pipeline(**kwargs)
So in many cases I'll start off with self.df, which is a pandas DataFrame object for my instance. But one method may be called standardize_data(), and it will ultimately produce a standardized dataframe stored as self.std_df.
My IDE has been complaining heavily about me initializing variables outside of __init__. So to try to soothe PyCharm, I've been using the following code inside my constructor:
from abc import ABC, abstractmethod

class AbstractDataModel(ABC):
    @abstractmethod
    def __init__(self, input_path, ..., **kwargs):
        self.df_train, self.df_test, self.train_ID, self.test_ID, self.primary_key, ... (many more variables) = None, None, None, None, None, ...
Later on, these properties are being initialized and set. I'll admit that I'm coming from heavy-duty Java Spring projects, so I'm still used to verbosely declaring variables. Is there a more Pythonic way of declaring my instance properties here? I know I must be violating DRY with all the None values.
I've researched on SO, and came across this similar question, but the answer that is provided is more about setting instance variables through argv, so it isn't a direct solution in my context.
Use chained assignment:
self.df_train = self.df_test = self.train_ID = self.test_ID = self.primary_key = ... = None
Or set up abstract properties that default to None (so you don't have to set them).
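One way to read that second suggestion (a sketch under the assumption that plain class-level attributes are enough; the attribute names are trimmed from the question) is to declare the defaults once on the class, so every instance starts with None and the IDE still sees the attributes declared:

from abc import ABC

class AbstractDataModel(ABC):
    # class-level defaults; instances overwrite these as the pipeline runs
    df_train = None
    df_test = None
    train_ID = None
    test_ID = None
    primary_key = None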
OK, so I've got a database where I want to store references to other Python objects (right now I'm using it to store inventory information for personal stores of beer recipe ingredients).
Since there are about 15-20 different categories of ingredients (all represented by individual SQLObjects) I don't want to do a bunch of RelatedJoin columns since, well, I'm lazy, and it seems like it's not the "best" or "pythonic" solution as it is.
So right now I'm doing this:
class Inventory(SQLObject):
    inventory_item_id = IntCol(default=0)
    amount = DecimalCol(size=6, precision=2, default=0)
    amount_units = IntCol(default=Measure.GM)
    purchased_on = DateCol(default=datetime.now())
    purchased_from = UnicodeCol(default=None, length=256)
    price = CurrencyCol(default=0)
    notes = UnicodeCol(default=None)
    inventory_type = UnicodeCol(default=None)

    def _get_name(self):
        return eval(self.inventory_type).get(self.inventory_item_id).name

    def _set_inventory_item_id(self, value):
        self.inventory_type = value.__class__.__name__
        self._SO_set_inventory_item_id(value.id)
Please note the ICKY eval() in the _get_name() method.
How would I go about calling the SQLObject class referenced by the string I'm getting from __class__.__name__ without using eval()? Or is this an appropriate place to use eval()? (I'm sort of of the mindset that it's never appropriate to use eval(); however, since the system never uses any end-user input in the eval(), it seems "safe".)
To get the value of a global by name, use:
globals()[self.inventory_type]
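Applied to the _get_name method above, that would look something like this (assuming the ingredient classes are defined in the same module, so they show up in globals()):

def _get_name(self):
    # look up the class object by its stored name instead of eval()
    cls = globals()[self.inventory_type]
    return cls.get(self.inventory_item_id).name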
The desire is for the user to instantiate a class that represents the transient, along with automatic access to a member item for each variable being represented (up to 200 variables). The set of variable class instances would be dynamic, based on file-based input data, and the desire is to use the file-provided variable names to create a collection of these variable instances that are accessible with a natural naming scheme. Effectively, the variable class hides the details of where the data and the independent variable (i.e., time) are stored. The following pseudo code expresses random lines that the end user may write. In some cases, the post-processing may be much more extensive.
tran1 = CTransient('TranData', ...)
Padj = tran1.pressPipe1 + 10 # add 10 bar to a pressure for conservatism
Tsat = TsatRoutine( tran1.tempPipe1 )
MyPlotRoutine( tran1.tempPipe1, tran1.tempPipe2 )
where the pressPipeX and tempPipeX names are defined in the input data files, the corresponding numpy data vectors are specified in the 'TranData' input file, and each is an instance of a CVariable class.
Help on how to dynamically build the set of instances that represent the transient variables such that they can be accessed would be appreciated.
Your description of what you're trying to do isn't entirely clear, but automatically naming variables something1, something2, etc. is generally a bad idea. Use a list instead:
transientvariables = []
transientvariables.append(makenewtransientvariable())
# ...
for tv in transientvariables:
    print(tv)
Edit: OK, I think I see what you're getting at, although your explanation still isn't exactly easy to read. You have a collection of pipes, with a time series of temperature and pressure recorded for each one, right?
The easiest way would be to use a dictionary:
transients["tempPipe1"]
Or nested dictionaries:
transients["temp"]["Pipe1"]
Or you could override your class' __getattr__ method, so that it looks in a dictionary, and you can do:
transients.tempPipe1
Edit 2: Overriding __getattr__ would look a bit like this:
def __getattr__(self, name):
    if name in self.varMap:
        return self.varMap[name]
    raise AttributeError(name)
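Putting it together, a minimal runnable sketch (the CTransient name comes from the question, but the dict-of-arrays constructor is my own illustration of how the file-provided names might be wired in):

import numpy as np

class CTransient:
    def __init__(self, variables):
        # variables: dict mapping file-provided names to numpy data vectors
        self.varMap = dict(variables)

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name in self.varMap:
            return self.varMap[name]
        raise AttributeError(name)

tran1 = CTransient({"pressPipe1": np.array([100.0, 101.5, 99.8])})
Padj = tran1.pressPipe1 + 10  # natural-naming access, as in the question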