Python generic Container?

It can sometimes be convenient to group variables under a single object.
My use case is tensorflow, where you often have to define a graph first and then feed it with actual data. To avoid getting the names of the graph variables jumbled up with those of the data variables, it's useful to group them all under an object. What I've been doing is:
g = lambda: None
g.iterator = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(minibatch_size).make_initializable_iterator()
g.x_next, g.y_next = g.iterator.get_next()
g.data_updates = g.x_data.assign(g.x_next), g.y_data.assign(g.y_next)
Except that when you use lambda: None your coworkers tend to get angry and confused.
Is there an alternative that provides equally clean syntax but uses something that is more obviously a container than lambda: None?
I first tried making them all static members of a class, but the problem is that static members cannot reference other static members. g=object() would be nice but doesn't allow you to assign attributes.

If it's not worth defining a dedicated class, you can use types.SimpleNamespace, which is a class specifically designed to do nothing but hold attributes.
import types

g = types.SimpleNamespace()
g.iterator = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(minibatch_size).make_initializable_iterator()
g.x_next, g.y_next = g.iterator.get_next()
g.data_updates = g.x_data.assign(g.x_next), g.y_data.assign(g.y_next)
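As a standalone illustration without the TensorFlow parts (the attribute names here are made up), SimpleNamespace also accepts keyword arguments at construction and has a readable repr:
import types

# attributes can be set at construction time or added later
g = types.SimpleNamespace(x_data=None, y_data=None)
g.learning_rate = 0.01

print(g)          # prints something like: namespace(x_data=None, y_data=None, learning_rate=0.01)
print(g.x_data)   # None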

Related

Using constructor parameter variable names during object instantiation in Python?

When declaring a new instance of an object in Python, why would someone use the names of the parameters at instantiation time? Say you have the following object:
class Thing:
    def __init__(self, var1=None, var2=None):
        self.var1 = var1
        self.var2 = var2
The programmer from here decides to create an instance of this object at some point later and enters it in the following way:
NewObj = Thing(var1=newVar,var2=otherVar)
Is there a reason why someone would enter it that way vs. just entering the newVar/otherVar variables into the constructor parameters without using "var1=" and "var2="? Like below:
NewObj = Thing(newVar,otherVar)
I'm fairly new to Python, and I couldn't find anything about this specific sort of syntax, even though it seems like a fairly simple/straightforward question.
The reason is clarity, not for the computer, but for yourself and other humans.
class Calculation:
    def __init__(self, low=None, high=None, mean=None):
        self.low = low
        self.high = high
        self.mean = mean
        ...

# with naming (notice how ordering is not important)
calc = Calculation(mean=0.5, low=0, high=1)

# without naming (now order is important and it is less clear what the numbers are used for)
calc = Calculation(0, 1, 0.5)
Note that the same can be done for any function, not only when initializing an object.
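For instance, with a plain function (this one is made up purely for illustration), keyword arguments read the same way:
def clamp(value, low=0.0, high=1.0):
    """Restrict value to the [low, high] interval."""
    return max(low, min(high, value))

# keyword arguments make the call self-documenting and order-independent
clamp(1.7, high=2.0, low=-1.0)

# positional arguments are shorter, but the reader has to know the signature
clamp(1.7, -1.0, 2.0)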

Use function parameter to construct name of object or dataframe

I would like to use a function's parameter to create dynamic names of dataframes and/or objects in Python. I have about 40 different names so it would be really elegant to do this in a function. Is there a way to do this or do I need to do this via 'dict'? I read that 'exec' is dangerous (not that I could get this to work). SAS has this feature for their macros which is where I am coming from. Here is an example of what I am trying to do (using '#' for illustrative purposes):
def TrainModels(mtype):
    model_#mtype = ExtraTreesClassifier()
    model_#mtype.fit(X_#mtype, Y_#mtype)

TrainModels('FirstModel')
TrainModels('SecondModel')
You could use a dictionary for this:
models = {}

def TrainModels(mtype):
    models[mtype] = ExtraTreesClassifier()
    models[mtype].fit()
First of all, any name you define within your TrainModels function will be local to that function, so it won't be accessible in the rest of your program. You would have to define a global name instead.
Most namespaces in Python are backed by dictionaries, including the global namespace. You can define a new global name dynamically as follows:
my_name = 'foo'
globals()[my_name] = 'bar'
This is terrible and you should never do it. It adds too much indirection to your code. When someone else (or you yourself, three months from now when the code is no longer fresh in your mind) reads the code and sees foo used elsewhere, they'll have a hard time figuring out where it came from. Code analysis tools will not be able to help you, either.
I would use a dict as Milkboat suggested.
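Filling in that sketch a little (a hedged example: the X and Y dicts holding the per-model training data are assumptions, not part of the question):
from sklearn.ensemble import ExtraTreesClassifier

models = {}

def TrainModels(mtype):
    # assumed: X and Y are dicts of training data keyed by model type
    models[mtype] = ExtraTreesClassifier()
    models[mtype].fit(X[mtype], Y[mtype])

TrainModels('FirstModel')
TrainModels('SecondModel')

# retrieve a trained model later by name instead of by a dynamically built variable name
predictions = models['FirstModel'].predict(X['FirstModel'])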

Most Pythonic way of initializing instance variables

I'm currently on some heavy data analytics projects, and am trying to create a Python wrapper class to help streamline a lot of the mundane preprocessing steps involved when cleaning data, partitioning it into test / validation sets, standardizing it, etc. The idea ultimately is to transform raw data into easily consumable processed matrices for machine learning algorithms to input for training and testing purposes. Ideally, I'm working towards the point where
data = DataModel(AbstractDataModel)
processed_data = data.execute_pipeline(**kwargs)
So in many cases I'll start off with a self.df, which is a pandas dataframe object for my instance. But one method may be called standardize_data() and will ultimately return a standardized dataframe called self.std_df.
My IDE has been complaining heavily about me initializing variables outside of __init__. So to try to soothe PyCharm, I've been using the following code inside my constructor:
class AbstractDataModel(ABC):
    @abstractmethod
    def __init__(self, input_path, ..., **kwargs):
        self.df_train, self.df_test, self.train_ID, self.test_ID, self.primary_key, ... (many more variables) = None, None, None, None, None, ...
Later on, these properties are being initialized and set. I'll admit that I'm coming from heavy-duty Java Spring projects, so I'm still used to verbosely declaring variables. Is there a more Pythonic way of declaring my instance properties here? I know I must be violating DRY with all the None values.
I've researched on SO, and came across this similar question, but the answer that is provided is more about setting instance variables through argv, so it isn't a direct solution in my context.
Use chained assignment:
self.df_train = self.df_test = self.train_ID = self.test_ID = self.primary_key = ... = None
Or set up abstract properties that default to None (So you don't have to set them)
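One way to read that last suggestion (a sketch, not the asker's actual class; it uses class-level attributes rather than true properties, which serves the same "default to None" purpose, and the attribute names are taken from the question):
from abc import ABC, abstractmethod

class AbstractDataModel(ABC):
    # class-level defaults: every instance sees these as None until it assigns its own value,
    # and the attributes are declared in one obvious place
    df_train = None
    df_test = None
    train_ID = None
    test_ID = None
    primary_key = None

    @abstractmethod
    def execute_pipeline(self, **kwargs):
        ...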

How do I programmatically add new functions to current scope in Python?

In Python it is easy to create new functions programmatically. How would I assign this to programmatically determined names in the current scope?
This is what I'd like to do (in non-working code):
obj_types = ('cat', 'dog', 'donkey', 'camel')
for obj_type in obj_types:
    'create_' + obj_type = lambda id: id
In the above example, the assignment of lambda into a to-be-determined function name obviously does not work. In the real code, the function itself would be created by a function factory.
The background is laziness and do-not-repeat-yourself: I've got a dozen or more object types for which I'd assign a generated function. So the code currently looks like:
create_cat = make_creator('cat')
# ...
create_camel = make_creator('camel')
The functions create_cat etc. are hardcoded into a parser.
If I were creating new classes programmatically, types.new_class() as seen in the docs would seem to be the solution.
Is it my best bet to (mis)use this approach?
One way to accomplish what you are trying to do (without creating functions with dynamic names) is to store the lambdas in a dict, using the name as the key. Instead of calling create_cat() you would call create['cat'](). That would dovetail nicely with not hardcoding names in the parser logic as well.
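Concretely, a short sketch built on the make_creator factory from the question:
obj_types = ('cat', 'dog', 'donkey', 'camel')

# one dict entry per object type instead of one module-level name per type
create = {obj_type: make_creator(obj_type) for obj_type in obj_types}

# the parser then calls the creators by key
cat = create['cat'](42)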
Vaughn Cato points out that one could just assign locals()[object_type] = factory(object_type). However, the Python docs warn against this: "Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter".
D. Shawley points out that it would be wiser to use a dict object whose entries would hold the functions. Access would be simple by using create['cat']() in the parser. While this is compelling, I do not like the syntactic overhead of the brackets and quotes required.
J.F. Sebastian points to classes. And this is what I ended up with:
# Omitting code of these classes for clarity
class Entity:
    def __init__(self, file_name, line_number):
        # Store location, good for debug, messages, and general indexing
        ...

# The following classes are the real objects to be generated by a parser.
# Their constructors must consume whatever data is provided by the tokens
# as well as calling super() to forward the file_name, line_number info.
class Cat(Entity): pass
class Camel(Entity): pass

class Parser:
    def parse_file(self, fn):
        # ...

        # Function factory to wrap object constructor calls
        def create_factory(obj_type):
            def creator(text, line_number, token):
                try:
                    return obj_type(*token,
                                    file_name=fn, line_number=line_number)
                except Exception as e:
                    # For debug of constructor during development
                    print(e)
            return creator

        # Helper class, serving as a 'dictionary' of obj construction functions
        class create: pass

        for obj_type in (Cat, Camel):
            setattr(create,
                    obj_type.__name__.lower(),
                    create_factory(obj_type))

        # Parsing code now can use (again simplified for clarity):
        expression = Keyword('cat').setParseAction(create.cat)
This is helper code for deploying a pyparsing parser. D. Shawley is correct in that a dict would more easily allow the parser grammar to be generated dynamically.
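For completeness, a rough sketch of what that dict-driven variant could look like with pyparsing (it reuses create_factory and the entity classes from the code above at module scope, which is a simplification, not the original layout):
from pyparsing import Keyword, MatchFirst

# map each keyword's text to its construction function
create = {cls.__name__.lower(): create_factory(cls) for cls in (Cat, Camel)}

# build the grammar from the dict instead of hardcoding each keyword
expression = MatchFirst(
    [Keyword(name).setParseAction(creator) for name, creator in create.items()]
)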

Natural Naming Scheme for a dynamic set of members

The desire is for the user to instantiate a class that represents the transient, along with automatic access to a member for each variable being represented (up to 200 variables). The set of variable class instances would be dynamic, based on file-based input data, and the goal is to use the file-provided variable names to create a collection of these variable instances that are accessible with a natural naming scheme. Effectively, the variable class hides the details of where the data and the independent variable (i.e., time) are stored. The following pseudocode expresses random lines that the end user may write. In some cases, the post-processing may be much more extensive.
tran1 = CTransient('TranData', ...)
Padj = tran1.pressPipe1 + 10 # add 10 bar to a pressure for conservatism
Tsat = TsatRoutine( tran1.tempPipe1 )
MyPlotRoutine( tran1.tempPipe1, tran1.tempPipe2 )
where pressPipeX and tempPipeX are names defined in the input data files, the corresponding numpy data vectors are specified in the 'TranData' input file, and each is an instance of a CVariable class.
Help on how to dynamically build the set of instances that represent the transient variables such that they can be accessed would be appreciated.
Your description of what you're trying to do isn't entirely clear, but automatically naming variables something1, something2, etc. is generally a bad idea. Use a list instead:
transientvariables = []
transientvariables.append(makenewtransientvariable())
# ...
for tv in transientvariables:
    print(tv)
Edit: OK, I think I see what you're getting at, although your explanation still isn't exactly easy to read. You have a collection of pipes, with a time series of temperature and pressure recorded for each one, right?
The easiest way would be to use a dictionary:
transients["tempPipe1"]
Or nested dictionaries:
transients["temp"]["Pipe1"]
Or you could override your class' __getattr__ method, so that it looks in a dictionary, and you can do:
transients.tempPipe1
Edit 2: Overriding __getattr__ would look a bit like this:
def __getattr__(self, name):
    if name in self.varMap:
        return self.varMap[name]
    raise AttributeError
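Putting the pieces together, a minimal sketch (the class name and variable names come from the question; the hard-coded varMap stands in for whatever file parsing actually populates it, which is an assumption):
import numpy as np

class CTransient:
    def __init__(self, filename):
        # in reality the variables would be parsed from the input file;
        # here a hypothetical dict of numpy vectors stands in for that step
        self.varMap = {
            'pressPipe1': np.array([10.0, 10.5, 11.0]),
            'tempPipe1': np.array([300.0, 305.0, 310.0]),
        }

    def __getattr__(self, name):
        # called only when normal attribute lookup fails
        if name in self.varMap:
            return self.varMap[name]
        raise AttributeError(name)

tran1 = CTransient('TranData')
Padj = tran1.pressPipe1 + 10  # natural-naming access to a file-defined variable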
