Using Abseil vs. Directly calling main()? - python

I've been using the vanilla
def main():
    # Do stuff
    pass
if __name__ == '__main__':
    main()
but recently saw people doing
from absl import app
def main(_):
    # Do things
    pass
if __name__ == '__main__':
    app.run(main)
Abseil provides flags.FLAGS, but I've been using ArgumentParser, which works perfectly fine, so I see no advantage for Abseil there.
So why bother going the Abseil route?
PS: Related discussion on Reddit (which doesn't really answer this question): https://www.reddit.com/r/Python/comments/euhl81/is_using_googles_abseil_library_worth_the/

Consider a design pattern in which you pass a JSON file (containing, say, site-specific constants) on the command line as input to your Python script. Suppose the JSON file contains immutable constants and you want to keep them that way.
You want the constants from the JSON file to be available to all the modules in your project.
One way to do this is to implement a central module that deserializes the JSON into a Python object. Abseil helps here: the central module can read the input file path directly from FLAGS and store the deserialized result in a class variable, so every module in the project can use it.
Without Abseil, you would first need to parse the input file argument with argparse in the main module and then pass it to the central module.
A code example of this could look like:
main.py:
from centralmod import DeserializeClass
import centralmod
from absl import flags
from absl import app
_JSON_FILE = flags.DEFINE_string("json_file", None, "Constants", short_name='j', required=True)
def scenario():
    # Imported here, after the flags are parsed and somevar is populated,
    # because someothermodule reads DeserializeClass.somevar at import time.
    import someothermodule
    someothermodule.do_something()
def populate_globalvar():
    centralmod.populate_somevar()
    deserialized_data = DeserializeClass.somevar
def main(argv):
    populate_globalvar()
    scenario()
if __name__ == '__main__':
    app.run(main)
centralmod.py:
from absl import flags
import json
FLAGS = flags.FLAGS
class DeserializeClass:
    @classmethod
    def get_value(cls, key):
        return DeserializeClass.somevar[key]
def populate_somevar():
    # FLAGS.json_file is only available once app.run() has parsed the command line
    with open(FLAGS.json_file) as json_constants_fh:
        deserialized_data = json.load(json_constants_fh)
    setattr(DeserializeClass, 'somevar', deserialized_data)
and someothermodule.py:
from centralmod import DeserializeClass
site_specific_consts = DeserializeClass.somevar
def do_something():
    print(f"\nScenario: Testing. The site-specific constants are:\n{site_specific_consts}")
    print(f"the value of key ssh_key_file is {DeserializeClass.get_value('ssh_key_file')}")
    print(f"the value of key nodes is {DeserializeClass.get_value('nodes')}")
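For comparison, the argparse-only route mentioned above might look like the following single-file sketch. The names Constants, load, and parse_args are hypothetical and appear neither in Abseil nor in the answer's code. The only point is that the entry point has to parse the path and hand it to the central class explicitly.

```python
import argparse
import json


class Constants:
    """Hypothetical central holder for the deserialized JSON constants."""

    data = None  # populated once by load()

    @classmethod
    def load(cls, path):
        # The path must be passed in; this class cannot look it up by itself.
        with open(path) as fh:
            cls.data = json.load(fh)

    @classmethod
    def get_value(cls, key):
        return cls.data[key]


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("-j", "--json_file", required=True, help="Constants")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Explicit hand-off from the entry point to the central class.
    Constants.load(args.json_file)
    return Constants.get_value("nodes")
```

In a script, main() would be called under the usual if __name__ == '__main__': guard. Other modules can then call Constants.get_value(...) just as they call DeserializeClass.get_value(...) in the Abseil version.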

Related

Using Hydra Configuration inside Classes

I am trying to use the Hydra tool in my project and would like to use the decorator on class methods:
import hydra
from hydra.core.config_store import ConfigStore
from src.config import RecordingConfig
cs = ConfigStore.instance()
cs.store(name="recording_config", node=RecordingConfig)
class HydraClassTest:
    @hydra.main(config_path="../src/conf/", config_name="conf")
    def __init__(self, conf: RecordingConfig):
        print(conf)
def main():
    HydraClassTest()
if __name__ == "__main__":
    main()
But I get the error
TypeError: __init__() missing 1 required positional argument: 'conf'
Is this intended, and should I pass the configuration to the class from the outside? (For example, using the decorator on the main function and passing the configuration as a parameter to the initializer works.)
Or am I using the decorator the wrong way?
If it is intended, is there a design reason why one would not want to do it that way?
I have checked that I used the decorator correctly by passing the configuration through the main function, which worked:
import hydra
from hydra.core.config_store import ConfigStore
from src.config import RecordingConfig
cs = ConfigStore.instance()
cs.store(name="recording_config", node=RecordingConfig)
class HydraClassTest:
    def __init__(self, conf: RecordingConfig):
        print(conf)
@hydra.main(config_path="../src/conf/", config_name="conf")
def main(conf: RecordingConfig):
    HydraClassTest(conf)
if __name__ == "__main__":
    main()
This gives me the expected result.
@hydra.main() is not appropriate for this use case. It is designed to be used once in an application, and it has many side effects (changing the working directory, configuring logging, etc.).
Use the Compose API instead.

How can I re-use an initialized class in Python?

I'm trying to access an initialized class instance in the main application from other modules but don't know how to do it.
Background: I want to update a dataframe with data throughout the execution of the main application.
I have the following application structure (this is a simplified version of the code in my application):
constraints
- test_function.py (separate module which should be able to update the initialized class in the main app)
functions
- helper.py (the class which contains the dataframe logic)
main.py (my main application code)
main.py:
import functions.helper
gotstats = functions.helper.GotStats()
gotstats.add(solver_stat='This is a test')
gotstats.add(type='This is a test Type!')
print(gotstats.result())
import constraints.test_function
constraints.test_function.test_function()
helper.py:
class GotStats(object):
    def __init__(self):
        print('init() called')
        import pandas as pd
        self.df_got_statistieken = pd.DataFrame(columns=['SOLVER_STAT','TYPE','WAARDE','WAARDE_TEKST','LOWER_BOUND','UPPER_BOUND','OPTIMALISATIE_ID','GUROBI_ID'])
    def add(self,solver_stat=None,type=None,waarde=None,waarde_tekst=None,lower_bound=None,upper_bound=None,optimalisatie_id=None,gurobi_id=None):
        print('add() called')
        self.df_got_statistieken = self.df_got_statistieken.append({'SOLVER_STAT': solver_stat,'TYPE': type, 'WAARDE': waarde, 'OPTIMALISATIE_ID': optimalisatie_id, 'GUROBI_ID': gurobi_id}, ignore_index=True)
    def result(self):
        print('result() called')
        df_got_statistieken = self.df_got_statistieken
        return df_got_statistieken
test_function.py:
import sys, os
sys.path.append(os.getcwd())
def test_function():
    import functions.helper
    gotstats = functions.helper.GotStats()
    gotstats.add(solver_stat='This is a test from the separate module')
    gotstats.add(type='This is a test type from the separate module!')
    print(gotstats.result())
if __name__ == "__main__":
    test_function()
In main I initialize the class with "gotstats = functions.helper.GotStats()". After that I can correctly use its methods and add dataframe rows using the add function.
I would like test_function() to be able to add dataframe rows to that same object, but I don't know how to do this (in the current code, test_function.py just creates a new instance in its local namespace, which I don't want). Do I need to extend the class with a function to get the active instance (like logging.getLogger(name))?
Any help in the right direction would be appreciated.
Make your test_function accept the instance as a parameter and pass it to the function when you call it:
main.py:
import functions.helper
from constraints.test_function import test_function
gotstats = functions.helper.GotStats()
gotstats.add(solver_stat='This is a test')
gotstats.add(type='This is a test Type!')
print(gotstats.result())
test_function(gotstats)
test_function.py:
import sys, os
import functions.helper
sys.path.append(os.getcwd())
def test_function(gotstats=None):
    # Fall back to a fresh instance only when the caller does not supply one
    if gotstats is None:
        gotstats = functions.helper.GotStats()
    gotstats.add(solver_stat='This is a test from the separate module')
    gotstats.add(type='This is a test type from the separate module!')
    print(gotstats.result())
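The question also asks about an accessor in the style of logging.getLogger(name). A minimal sketch of that alternative follows. It assumes a hypothetical module-level registry and a get_stats() helper, neither of which exists in the original code, and it uses a plain list as a stand-in for the DataFrame so that the example does not depend on pandas.

```python
class GotStats:
    """Simplified stand-in for the question's class; rows are kept in a list."""

    def __init__(self):
        self.rows = []

    def add(self, **fields):
        self.rows.append(fields)

    def result(self):
        return list(self.rows)


# Hypothetical registry: one shared instance per name, created on first use.
_instances = {}


def get_stats(name="default"):
    """Return the shared GotStats for `name`, in the spirit of logging.getLogger."""
    if name not in _instances:
        _instances[name] = GotStats()
    return _instances[name]


def test_function():
    # Any module can fetch the same object without having it passed in.
    get_stats().add(solver_stat="This is a test from the separate module")


get_stats().add(solver_stat="This is a test")
test_function()
print(get_stats().result())
```

Passing the instance as a parameter keeps the dependency visible at the call site. A registry like this trades that visibility for convenience, so it is probably best kept for objects that really are application-wide.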

How to elegantly parse arguments in Python before expensive imports?

I have a script, which parses a few arguments, and has some expensive imports, but those imports are only needed if the user gives valid input arguments, otherwise the program exits. Also, when the user says python script.py --help, there is no need for those expensive imports to be executed at all.
I can think of such a script:
import argparse
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--argument', type=str)
    args = parser.parse_args()
    return args
if __name__ == "__main__":
    args = parse_args()
import gensim # expensive import
import blahblahblah
def the_rest_of_the_code(args):
    pass
if __name__ == "__main__":
    the_rest_of_the_code(args)
This does the job, but it doesn't look elegant to me. Any better suggestions for the task?
EDIT: the import is really expensive:
$ time python -c "import gensim"
Using TensorFlow backend.
real 0m12.257s
user 0m10.756s
sys 0m0.348s
You can import conditionally, or in a try block, or just about anywhere in code.
So you could do something like this:
import cheaplib
if __name__ == "__main__":
    args = parse_args()
    if expensive_arg in args:
        import expensivelib
    do_stuff(args)
Or even more clearly, only import the lib in the function that will use it.
def expensive_function():
    import expensivelib
    ...
Not sure it's better than what you already have, but you can load it lazily:
def load_gensim():
    global gensim
    import gensim
If you only want to make sure the arguments make sense, you can have a wrapper main module that checks the arguments and then loads another module and calls it.
main.py:
args = check_args()
if args is not None:
    import mymodule
    mymodule.main(args)
mymodule.py:
import gensim
def main(args):
    # do work
    pass
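Putting these suggestions together, a minimal sketch of "parse first, import later" could look like this. Here load_expensive and the loaded list are hypothetical helpers added for illustration, and the cheap json module is only a stand-in for a slow import such as gensim.

```python
import argparse
import importlib

loaded = []  # records which stand-in modules were imported, for illustration


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--argument", type=str, required=True)
    return parser.parse_args(argv)


def load_expensive(name="json"):
    # "json" is only a cheap stand-in for a slow import such as gensim.
    module = importlib.import_module(name)
    loaded.append(name)
    return module


def main(argv=None):
    args = parse_args(argv)   # argparse exits here on --help or invalid input
    heavy = load_expensive()  # reached only when the arguments are valid
    return heavy.dumps({"argument": args.argument})
```

Because argparse raises SystemExit inside parse_args() for --help and for invalid input, the expensive import is never reached in those cases. In a script, main() would be called under the usual if __name__ == "__main__": guard.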

Python OOP - Should def main() be outside of any class in a .py file?

I have a question when it comes to OOP in general, as well as Python in particular. Let's say that I have, for instance, priorities.py - a simple GUI program to manage priorities and there are three classes: Priority, Client, GuiPart:
# priorities.py
# GUI program to manage priorities
from tkinter import *
class Priority:
    pass
class GuiPart:
    def __init__(self):
        self.root = self.createWindow()
    def createWindow(self):
        root = Tk()
        root.resizable(width = False, height = False)
        root.title("Priorities")
        return root
    def display(self):
        Label(self.root,
              text = "testes").grid(row = 0, column = 1)
class Client:
    pass
def main():
    g = GuiPart()
    g.display()
    root = g.root.mainloop()
main()
Should I put def main() outside of any classes, or should I put it in Client class?
Every module (Python file) has a built-in __name__ variable. If it equals "__main__", the file was run directly; if __name__ has any other value, the file was imported by another Python file.
Whether the file is run directly or imported as a module, you can use the __name__ variable to tell which is the case, as shown below:
# Some code
if __name__ == '__main__':
    main()
Now users can run this file directly, and programmers can import this module into other code without running the main() function.
My preferred approach:
a separate main file with the if __name__ == '__main__': directive
Reasons:
- Application logic and calling logic are separate, so you can scale easily
- You can maintain and apply different environment settings effectively, so you can transition seamlessly between dev/test/stage/prod setups
- It increases code readability as well
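As a rough illustration of keeping the entry point outside the classes, here is a minimal sketch. PriorityList and its methods are hypothetical, and the GUI is left out so the example runs without a display.

```python
class Priority:
    def __init__(self, title, rank):
        self.title = title
        self.rank = rank


class PriorityList:
    """Application logic lives in classes and knows nothing about startup."""

    def __init__(self):
        self.items = []

    def add(self, title, rank):
        self.items.append(Priority(title, rank))

    def ordered_titles(self):
        return [p.title for p in sorted(self.items, key=lambda p: p.rank)]


def main():
    # Module-level entry point: it only builds the objects and runs them.
    priorities = PriorityList()
    priorities.add("reply to email", 2)
    priorities.add("fix build", 1)
    print(priorities.ordered_titles())
    return priorities


if __name__ == "__main__":
    main()
```

With main() at module level, another module can import Priority and PriorityList without starting the program, and main() could later move to a separate file without touching the classes.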

Importing values in config.py

I wanted to mix a config.py approach with ConfigParser: config.py sets some default values, which the user can override from a file in their home folder:
import ConfigParser
import os
CACHE_FOLDER = 'cache'
CSV_FOLDER = 'csv'
def main():
    cp = ConfigParser.ConfigParser()
    cp.readfp(open('defaults.cfg'))
    cp.read(os.path.expanduser('~/.python-tools.cfg'))
    CACHE_FOLDER = cp.get('folders', 'cache_folder')
    CSV_FOLDER = cp.get('folders', 'csv_folder')
if __name__ == '__main__':
    main()
When running this module I can see the value of CACHE_FOLDER being changed. However when in another module I do the following:
import config
def main():
    print config.CACHE_FOLDER
This will print the original value of the variable ('cache').
Am I doing something wrong?
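The main function in the code you show only gets run when that module is run as a script (due to the if __name__ == '__main__' block). If you want it to run any time the module is loaded, you should get rid of that restriction. Note too that assignments inside a function create local variables unless the names are declared global, so the setup code must declare CACHE_FOLDER and CSV_FOLDER as global for the module-level values to change. If there's extra code that actually does something useful in the main function, in addition to setting up the configuration, you might want to split that part out from the setup code: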
def setup():
    # the configuration stuff from main in the question
    pass
def main():
    # other stuff to be done when run as a script
    pass
setup() # called unconditionally, so it will run if you import this module
if __name__ == "__main__":
    main() # this is called only when the module is run as a script
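A Python 3 sketch of such a config.py is shown below, under a few assumptions: it uses the configparser module (the Python 3 name of ConfigParser), it takes the defaults from a hypothetical in-code _DEFAULTS dictionary instead of defaults.cfg so that the example is self-contained, and setup() gets a user_file parameter and a return value that the original code does not have.

```python
import configparser
import os

# Hypothetical in-code defaults; the question reads them from defaults.cfg.
_DEFAULTS = {"folders": {"cache_folder": "cache", "csv_folder": "csv"}}

CACHE_FOLDER = _DEFAULTS["folders"]["cache_folder"]
CSV_FOLDER = _DEFAULTS["folders"]["csv_folder"]


def setup(user_file="~/.python-tools.cfg"):
    # Without `global`, these assignments would only create local variables.
    global CACHE_FOLDER, CSV_FOLDER
    cp = configparser.ConfigParser()
    cp.read_dict(_DEFAULTS)
    # read() silently skips files that do not exist.
    cp.read(os.path.expanduser(user_file))
    CACHE_FOLDER = cp.get("folders", "cache_folder")
    CSV_FOLDER = cp.get("folders", "csv_folder")
    return CACHE_FOLDER, CSV_FOLDER


setup()  # runs on import as well as when the module is executed as a script
```

Other modules should read the values as config.CACHE_FOLDER after import config. A from config import CACHE_FOLDER would copy the value at import time and would not see later calls to setup().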
