In Python, is it bad form to write an __init__ definition like:
import os

class someFileType(object):
    def __init__(self, path):
        self.path = path
        self.filename = self.getFilename()
        self.client = self.getClient()
        self.date = self.getDate()
        self.title = self.getTitle()
        self.filetype = self.getFiletype()

    def getFilename(self):
        '''Returns entire file name without extension'''
        filename = os.path.basename(self.path)
        filename = os.path.splitext(filename)
        filename = filename[0]
        return filename

    def getClient(self):
        '''Returns client name associated with file'''
        client = self.filename.split()
        client = client[1]  # Assuming filename is formatted "date client - docTitle"
        return client
where the initialized variables are calls to methods returning strings? Or is it considered lazy coding? It's mostly to save me from writing something.getFiletype() rather than something.filetype whenever I want to reference some aspect of the file.
This code is to sort files into folders by client, then by document type, and other manipulations based on data in the file name.
Nope, I don't see why that would be bad form. Calculating those values only once when the instance is created can be a great idea, in fact.
You could also postpone the calculations until needed by using caching properties:
class SomeFileType(object):
    _filename = None
    _client = None

    def __init__(self, path):
        self.path = path

    @property
    def filename(self):
        if self._filename is None:
            filename = os.path.basename(self.path)
            self._filename = os.path.splitext(filename)[0]
        return self._filename

    @property
    def client(self):
        '''Returns client name associated with file'''
        if self._client is None:
            client = self.filename.split()
            self._client = client[1]  # Assuming filename is formatted "date client - docTitle"
        return self._client
Now, accessing somefiletypeinstance.client will trigger calculation of self.filename as needed, as well as cache the result of its own calculation.
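For example (with a hypothetical path whose name follows the "date client - docTitle" format assumed above):

f = SomeFileType('/srv/docs/20230101 acme - report.txt')
print(f.filename)  # computed and cached on first access: '20230101 acme - report'
print(f.client)    # 'acme' -- reuses the cached filename
print(f.client)    # second access returns the cached value without recomputing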
In this specific case, you may want to make .path a property as well; one with a setter that clears the cached values:
class SomeFileType(object):
    _filename = None
    _client = None

    def __init__(self, path):
        self._path = path

    @property
    def path(self):
        return self._path

    @path.setter
    def path(self, value):
        # clear all private instance attributes
        for key in [k for k in vars(self) if k[0] == '_']:
            delattr(self, key)
        self._path = value

    @property
    def filename(self):
        if self._filename is None:
            filename = os.path.basename(self.path)
            self._filename = os.path.splitext(filename)[0]
        return self._filename

    @property
    def client(self):
        '''Returns client name associated with file'''
        if self._client is None:
            client = self.filename.split()
            self._client = client[1]  # Assuming filename is formatted "date client - docTitle"
        return self._client
Because property-based caching does add some complexity overhead, you need to consider if it is really worth your while; for your specific, simple example, it probably is not. The calculation cost for your attributes is very low indeed, and unless you plan to create large quantities of these classes, the overhead of calculating the properties ahead of time is negligible, compared to the mental cost of having to maintain on-demand caching properties.
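As an aside, Python 3.8+ ships functools.cached_property, which implements this caching pattern for you; a minimal sketch:

import functools
import os

class SomeFileType:
    def __init__(self, path):
        self.path = path

    @functools.cached_property
    def filename(self):
        # Computed on first access and stored on the instance;
        # `del instance.filename` clears the cache so it is recomputed.
        return os.path.splitext(os.path.basename(self.path))[0]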
Your code is doing two different things:
a) Simplifying the class API by exposing certain computed attributes as variables, rather than functions.
b) Precomputing their values.
The first task is what properties are for; a straightforward use would make your code simpler, not more complex, and (equally important) would make the intent clearer:
class someFileType(object):
    def __init__(self, path):
        self.path = path

    @property
    def filename(self):
        return os.path.basename(self.path)
You can then write var.filename and you will dynamically compute the filename from the path.
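For example (hypothetical usage of the class above):

doc = someFileType('/home/me/document.txt')
print(doc.filename)   # 'document.txt', recomputed from path on every access
doc.path = '/home/me/other.txt'
print(doc.filename)   # 'other.txt' -- always consistent with the current path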
#Martijn's solution adds caching, which also takes care of part b (precomputation). In your example, at least, the calculations are cheap so I don't see any benefit in doing so.
On the contrary, caching or precomputation raises consistency issues. Consider the following snippet:
something = someFileType("/home/me/document.txt")
print(something.filename)  # prints `document`
...
something.path = "/home/me/document-v2.txt"
print(something.filename)  # STILL prints `document` if you cache values
What should the last statement print? If you cache your computations, you will still get document instead of document-v2! Unless you are certain that nobody will try to change the value of the basic variable, you need to either avoid caching, or take measures to ensure consistency. The easiest way is to prohibit modifications to path-- one of the things that properties are designed to do.
Conclusion: Use properties to simplify your interface. Don't cache computations, unless it's necessitated by performance reasons. If you cache, take measures to ensure consistency, e.g. by making the underlying value read-only.
PS. The issues are analogous to database normalization (non-normalized designs raise consistency issues), but in python you have more resources for keeping things in sync.
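For instance, a minimal read-only sketch:

class SomeFileType(object):
    def __init__(self, path):
        self._path = path

    @property
    def path(self):
        # No setter is defined, so `instance.path = ...` raises AttributeError.
        return self._path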
I'm working on a Python desktop app using wxPython and SQLite. The SQLite db is basically being used as a save file for my program, so I can save, back up, and reload the data being entered. I've created separate classes for parts of my UI to make them easier to manage from the "main" window. The problem I'm having is that each control needs to access the database, but the filename, and therefore the connection name, needs to be dynamic. I originally created a DBManager class that hardcoded a class variable with the connection string, which worked but didn't let me change the filename. For example:
class DBManager:
    conn = sqlite3.Connection('my_file.db')

# This could then be passed to other objects as needed
class Control1:
    file = DBManager()

class Control2:
    file = DBManager()
etc.
However, I'm running into a lot of problems trying to create this object with a dynamic filename while also using the same connection across all controls. Here are some examples of what I've tried...
class DBManager:
    conn = None

    def __init__(self):
        pass

    def __init__(self, filename):
        self.conn = sqlite3.Connection(filename)

class Control1:
    file = DBManager()

class Control2:
    file = DBManager()
The above doesn't work because Python doesn't allow overloading constructors; the second __init__ simply replaces the first, so I always have to pass a filename. I tried adding some code to the constructor to act differently based upon whether the filename passed was blank or not.
class DBManager:
    conn = None

    def __init__(self, filename):
        if filename != '':
            self.conn = sqlite3.Connection(filename)

class Control1:
    file = DBManager('')

class Control2:
    file = DBManager('')
This ran, but the controls only had an empty connection; the conn object was None. It seems like I can't change a class variable after it's been created? Or am I just doing something wrong?
I've thought about creating one instance of DBManager that I then pass into each control, but that would be a huge mess if I need to load a new DB after starting the program. Also, it's just not as elegant.
So, I'm looking for ideas on achieving the one-connection path with a dynamic filename. For what it's worth, this is entirely for personal use, so it doesn't really have to follow "good" coding convention.
Explanation of your last example
You get None in the last example because you are instantiating DBManager in Control1 and Control2 with empty strings as input, and the DBManager constructor has an if-statement saying that a connection should not be created if filename is just an empty string. This means the self.conn instance variable is never set, so any reference to conn resolves to the conn class variable, which is indeed set to None.
self.conn would create an instance variable only accessible by the specific object.
DBManager.conn would refer to the class variable and this is what you want to update.
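A minimal illustration of the difference:

class DBManager:
    conn = None  # class variable, shared by every instance

m = DBManager()
m.conn = 'instance value'  # creates an instance attribute that shadows the class variable
print(DBManager.conn)      # None -- the class variable is unchanged
print(m.conn)              # 'instance value'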
Example solution
If you only want to keep one connection, you would need to do it with e.g. a class variable, and update the class variable every time you interact with a new db.
import sqlite3
from sqlite3 import Connection

class DBManager:
    conn = None

    def __init__(self, filename):
        if filename != '':
            self.filename = filename

    def load(self) -> Connection:
        DBManager.conn = sqlite3.Connection(self.filename)  # updating class variable with new connection
        print(DBManager.conn, f" used for {self.filename}")
        return DBManager.conn

class Control1:
    db_manager = DBManager('control1.db')
    conn = db_manager.load()

class Control2:
    db_manager = DBManager('control2.db')
    conn = db_manager.load()

if __name__ == "__main__":
    control1 = Control1()
    control2 = Control2()
would output the below. Note that the class variable conn refers to different memory addresses upon instantiating each control, showing that it's updated.
<sqlite3.Connection object at 0x10dc1e1f0> used for control1.db
<sqlite3.Connection object at 0x10dc1e2d0> used for control2.db
We needed to route our database requests to either a writer master database or a set of read replicas.
We found a blog post by Mike Bayer suggesting how to do so using SQLAlchemy. We replicated the solution, but it did not work with our existing tests, for various reasons.
We went with the approach below instead. It reuses one session rather than creating new ones that stack up:
class ExplicitRoutingSession(SignallingSession):
    _name = None

    def get_bind(self, mapper=None, clause=None):
        # If reader and writer binds are not configured,
        # connect using the default SQLALCHEMY_DATABASE_URI
        if not self.binds_setup:
            return super().get_bind(mapper, clause)
        return self.load_balance(mapper, clause)

    def load_balance(self, mapper=None, clause=None):
        # Use the explicit name if present
        if self._name and not self._flushing:
            bind = self._name
            self._name = None
            self.app.logger.debug(f"Connecting -> {bind}")
            return get_state(self.app).db.get_engine(self.app, bind=bind)
        # Everything else goes to the writer engine
        else:
            self.app.logger.debug("Connecting -> writer")
            return get_state(self.app).db.get_engine(self.app, bind='writer')

    def using_bind(self, name):
        self._name = name
        return self

    @cached_property
    def binds_setup(self):
        binds = self.app.config['SQLALCHEMY_BINDS'] or {}
        return all([k in binds for k in ['reader', 'writer']])
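A hypothetical call site (assuming this class is configured as the Flask-SQLAlchemy session class, and db and User are the usual Flask-SQLAlchemy objects) might look like:

# Unwrap the scoped session to reach the custom using_bind() method,
# then route this one query to the read replica.
session = db.session()
users = session.using_bind('reader').query(User).all()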
So far it has worked well for us. We assume we might lose some functionality, such as db savepoints, by not having stacked sessions... but we'd like to know whether there are stability risks or other unforeseen problems, beyond losing features, with such an approach?
Notes:
We are also using flask-sqlalchemy.
This is from an open source notification platform and you can browse the code/branch yourself.
Let's say I have a (simplified) class as below. I am using it for a program configuration (hyperparameters).
# config.py
class Config(object):  # default configuration
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    MAP = {1: 2, 2: 3}

    def display(self):
        pass

# experiment1.py
from config import Config as Default

class Config(Default):  # some over-written configuration
    GPU_COUNT = 2
    NAME = '2'

# run.py
from experiment1 import Config

cfg = Config()
...
cfg.NAME = 'ABC'  # possible runtime over-writing
# Now I would like to save `cfg` at this moment
I'd like to save this configuration and restore it later. The member functions are not a concern when restoring.
1. When I tried pickle:
import pickle
with open('cfg.pk', 'rb') as f: cfg = pickle.load(f)
##--> AttributeError: Can't get attribute 'Config' on <module '__main__'>
I saw a solution using the class_def of Config, but I wish I could restore the configuration without knowing the class definition (e.g., export to a dict and save as JSON).
2. I tried to convert class to dict (so that I can export as JSON)
cfg.__dict__ # {'NAME': 'ABC'}
vars(cfg) # {'NAME': 'ABC'}
In both cases, only the instance attribute NAME appears; the class attributes (GPU_COUNT and so on) were difficult to access. Is it possible?
The question's title is "how to convert python class to dict", but I suspect you are really just looking for an easy way to represent (hyper)parameters.
By far the easiest solution is to not use classes for this. I've seen it done in some machine learning tutorials, but I consider it a pretty ugly hack. It breaks some semantics about classes vs. objects, and the difficulty pickling is a result of that. How about you use a simple class like this one:
class Params(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    def __getstate__(self):
        return self

    def __setstate__(self, state):
        self.update(state)

    def copy(self, **extra_params):
        return Params(**self, **extra_params)
It can do everything the class approach can. Predefined configs are then just objects you should copy before editing, as follows:
config = Params(
    GPU_COUNT=2,
    NAME='2',
)

other_config = config.copy()
other_config.GPU_COUNT = 4
Or alternatively in one step:
other_config = config.copy(
    GPU_COUNT=4,
)
Works fine with pickle (although you will need to have the Params class somewhere in your source), and you could also easily write load and save methods for the Params class if you want to use JSON.
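For example, JSON persistence might look like this (a sketch, assuming all values are JSON-serializable):

import json

def save_params(params, path):
    # Params is a dict subclass, so it serializes directly.
    with open(path, 'w') as f:
        json.dump(params, f)

def load_params(path):
    with open(path) as f:
        return Params(json.load(f))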
In short, do not use a class for something that really is just an object.
Thankfully, @evertheylen's answer was great for me. However, that code raises an error when you assign p.__class__ = Params, so I changed it slightly, as below. I think it works in the same way.
class Params(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    def __getstate__(self):
        return self

    def __setstate__(self, state):
        self.update(state)

    def copy(self, **extra_params):
        lhs = Params()
        lhs.update(self)
        lhs.update(extra_params)
        return lhs
and you can do
config = Params(
    GPU_COUNT=2,
    NAME='2',
)

other_config = config.copy()
other_config.GPU_COUNT = 4
After some reading, I found myself struggling to choose between two different approaches for passing a list of arguments to a function. Here's what I have figured out so far:
Current code:
file caller.py:
import worker

worker.version_check(iserver, login, password, proxyUser, proxyPass,
                     proxyServer, packageInfo)
worker.version_get(iserver, login, password, proxyUser, proxyPass,
                   proxyServer, packageInfo)
worker.version_send(iserver, login, password, proxyUser, proxyPass,
                    proxyServer, packageInfo)
File: worker.py:
def version_check(iserver, login, password, proxyUser, proxyPass, proxyServer, service):
    # code and more code

def version_get(iserver, login, password, proxyUser, proxyPass, proxyServer, service):
    # code and more code

def version_send(iserver, login, password, proxyUser, proxyPass, proxyServer, service):
    # code and more code
And now I have:
file caller.py:
import worker
args = (env, family, host, password, prefix, proxyServer,
        proxyUser, proxyPass, option, jokerVar)
worker.version_check(*args)
worker.version_get(*args)
worker.version_send(*args)
File: worker.py:
def version_check(*args):
    env = args[0]
    family = args[1]
    host = args[2]
    password = args[3]
    prefix = args[4]
    proxyServer = args[5]
    proxyUser = args[6]
    proxyPass = args[7]
    option = args[8]
    jokerVar = args[9]
    # code and more code

def version_get(*args):
    env = args[0]
    family = args[1]
    host = args[2]
    password = args[3]
    prefix = args[4]
    proxyServer = args[5]
    proxyUser = args[6]
    proxyPass = args[7]
    option = args[8]
    jokerVar = args[9]
    # code and more code

def version_send(*args):
    env = args[0]
    family = args[1]
    host = args[2]
    password = args[3]
    prefix = args[4]
    proxyServer = args[5]
    proxyUser = args[6]
    proxyPass = args[7]
    option = args[8]
    jokerVar = args[9]
    # code and more code
Using the old approach (the current code), I believe it is more "friendly" to call a function on one line only (as you can see in worker.py). But with the new approach, the code gets more verbose, because in each function I have to unpack all the same variables. Is this the best practice? I'm still learning Python on a slow curve, so sorry for any mistakes in the code.
One important thing: most of the variables are retrieved from a database, so they are not static.
I really don't recommend defining functions like def version_check(*args): unless you specifically need to. Quick, without reading the source: what order are the arguments in? How do you specify a default value for proxyServer? Remember, "explicit is better than implicit".
The one time I routinely deviate from that rule is when I'm wrapping another function like:
def foo(bar):
    print('Bar:', bar)

def baz(qux, *args):
    print('Qux:', qux)
    foo(*args)
I'd never do it for such a simple example, but suppose foo is a function from a 3rd-party package outside my control with lots of defaults, keyword arguments, etc. In that case, I'd rather punt the argument parsing to Python than attempt it myself.
Personally, I'd write that as a class like:
class Worker(object):
    def __init__(self, iserver, login, password, proxyUser, proxyPass, proxyServer, service):
        self.iserver = iserver
        self.login = login
        self.password = password
        self.proxyUser = proxyUser
        self.proxyPass = proxyPass
        self.proxyServer = proxyServer
        self.service = service

    def version_check(self): ...
    def version_get(self): ...
    def version_send(self): ...
And then in the client, write:
from worker import Worker
w = Worker(iserver,login,password,proxyUser,proxyPass,proxyServer,service)
w.version_check()
w.version_get()
w.version_send()
If you really need to write functions with lots of arguments instead of encapsulating that state in a class - which is a more typically Pythonic way to do it - then consider the namedtuple datatype from recent Python versions. It lets you specify a tuple where items are addressable by keyword and can make for some very clean, elegant code.
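For instance, a namedtuple sketch covering a subset of these arguments (the names are illustrative):

from collections import namedtuple

ProxySettings = namedtuple('ProxySettings', ['proxyUser', 'proxyPass', 'proxyServer'])

proxy = ProxySettings('user', 'secret', 'proxy.example.com')
print(proxy.proxyServer)        # fields are addressable by name
user, password, server = proxy  # and it still unpacks like a plain tuple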
There are many approaches, depending on what those arguments represent.
If they are just a grab-bag of arguments (especially if some are optional), use keyword arguments:
myargs = {'iserver':'server','login':'username','password':'Pa2230rd'}
version_get(**myargs)
If they represent some thing with its own state, then use classes:
If the arguments represent a single state that your functions are modifying, then accept the arguments in the object constructor and make your version_* functions methods of that class:
class Version(object):
    def __init__(self, iserver, login, password,
                 proxyUser, proxyPass, proxyServer, service):
        self.iserver = iserver
        self.login = login
        # etc

    def check(self):
        self.iserver

    def get(self):
        pass
    # etc
myversion = Version('iserver','login',...)
myversion.check()
If the arguments represent some kind of resource that your functions are merely using, then use a separate class and supply an instance as a parameter to your functions:
class Connection(object):
    def __init__(self, iserver, ...):
        self.iserver = iserver  # etc

myconn = Connection('iserver', ...)
version_check(myconn)
Most likely, these are two different resources and should be two classes. In this case you can combine these approaches:
# Connection() class as above

class Version(object):
    def __init__(self, connection):
        self.connection = connection

    def check(self):
        self.connection.iserver  # ....

myconn = Connection('iserver', ...)
conn_versioner = Version(myconn)
conn_versioner.check()
Possibly, your arguments represent more than one object (e.g., a connection and a transparent proxy object). In that case, try to create an object with the smallest public interface that methods like version_* would need, and encapsulate the state represented by the other arguments using object composition.
For example, if you have proxy connections, you can create a Connection() class which just knows about server, login and password, and a ConnectionProxy() class which has all the methods of a Connection, but forwards to another Connection object. This allows you to separate the proxy* arguments, and means that your version_* functions can be ignorant of whether they're using a proxy or not.
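A sketch of that idea (the class and attribute names are illustrative):

class Connection(object):
    def __init__(self, server, login, password):
        self.server = server
        self.login = login
        self.password = password

class ConnectionProxy(object):
    def __init__(self, connection, proxyServer, proxyUser, proxyPass):
        self._connection = connection
        self.proxyServer = proxyServer
        self.proxyUser = proxyUser
        self.proxyPass = proxyPass

    def __getattr__(self, name):
        # Anything not defined on the proxy is forwarded to the wrapped Connection.
        return getattr(self._connection, name)

Code that receives either object can read .server, .login, and so on, without caring whether a proxy sits in the middle.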
If your arguments are just state and don't have any methods proper to them, consider using a namedtuple(). This will act like a smarter tuple (including tuple unpacking, slicing, etc) and have minimal impact on your existing code while still being easier to use.
from collections import namedtuple

Connection = namedtuple('Connection', 'iserver login password')  # etc
myconn = Connection('iserver', 'loginname', 'passw3rd')
version_check(*myconn)
You can create a simple namespace object to hold the values, or define a class of your own. E.g.:
file caller.py:
import types
import worker

# Note: a bare object() cannot take new attributes, so use
# types.SimpleNamespace (or a small class of your own) instead.
info = types.SimpleNamespace()
info.env = 0
info.family = 'something'
info.host = 'something'
info.password = '***'
info.prefix = ''
info.proxyServer = ''
info.proxyUser = ''
info.proxyPass = ''
info.option = ''
info.jokerVar = ''

worker.version_check(info)
worker.version_get(info)
worker.version_send(info)
file worker.py:
def version_check(info):
    # you may access values from info, e.g. info.env
    # code and more code

def version_get(info):
    # code and more code

def version_send(info):
    # code and more code
I came across the __getattr__ special method and was wondering when it would be used. I had a hard time thinking of a practical use from the documentation (http://docs.python.org/reference/datamodel.html). What would be an actual example of how it could be used and useful in code?
One example is to use object notation with dictionaries. For example, consider a dictionary
myDict = {'value': 1}
Typically in Python one accesses the 'value' variable as
myDict['value']
which will print 1 at the Python interpreter. However, one may wish to use the myDict.value notation. This may be achieved by using the following class:
class DictAsMember(dict):
    def __getattr__(self, name):
        value = self[name]
        if isinstance(value, dict):
            value = DictAsMember(value)
        return value

my_dict = DictAsMember()
my_dict['property'] = {'sub_property': 1}
print(my_dict.property.sub_property)  # 1 will be printed
An example usage would be to create a simple wrapper around some object: for example, to log calls, or to modify the object's behavior without inheriting from it and without having to implement its whole interface.
There are several well-documented examples out there, for example http://western-skies.blogspot.fr/2008/02/complete-example-of-getattr-in-python.html.
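A minimal sketch of such a wrapper:

class LoggingWrapper(object):
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself,
        # so every delegated lookup is logged before being forwarded.
        print('accessing %s' % name)
        return getattr(self._wrapped, name)

lst = LoggingWrapper([1, 2, 3])
lst.append(4)  # prints "accessing append", then appends as usual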
Since __getattr__ is only called when an attribute is not found, it can be a useful way to define an alternate place to look up an attribute, or to give default values, similar to a defaultdict.
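For example, a minimal default-value sketch:

class WithDefaults(object):
    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        return 0

w = WithDefaults()
w.x = 5
print(w.x)  # 5 -- found normally, so __getattr__ is not called
print(w.y)  # 0 -- missing, so __getattr__ supplies the default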
You could also emulate a base class higher than all the others in an object's MRO, by delegating all failed lookups to another object (though doing this you could potentially end up with an infinite loop if the other object delegates the attribute back).
There is also __getattribute__, which is related in that it is called anytime any attribute is looked up on the object.
Edit: This is about the built-in function getattr, not the __getattr__ method.
I needed to do this for a REST client using bearer tokens. I wrapped Requests's Session object into my own interface so I could always send the auth header, and (more relevantly) make HTTP requests to the same site, just using the URL's path.
import requests

class RequestsWrapper():
    def __init__(self, base_url):
        # requests.Session() takes no constructor arguments;
        # default headers are set on the session afterwards.
        self.client = requests.Session()
        self.client.headers.update({'Authorization': 'myauthtoken'})
        self.base_url = base_url

    def _make_path_request(self, http_method, path, **kwargs):
        """
        Use the http_method string to find the requests.Session instance's
        method.
        """
        method_to_call = getattr(self.client, http_method.lower())
        return method_to_call(self.base_url + path, **kwargs)

    def path_get(self, path, **kwargs):
        """
        Sends a GET request to base_url + path.
        """
        return self._make_path_request('get', path, **kwargs)

    def path_post(self, path, **kwargs):
        """
        Sends a POST request to base_url + path.
        """
        return self._make_path_request('post', path, **kwargs)

    def path_put(self, path, **kwargs):
        """
        Sends a PUT request to base_url + path.
        """
        return self._make_path_request('put', path, **kwargs)

    def path_delete(self, path, **kwargs):
        """
        Sends a DELETE request to base_url + path.
        """
        return self._make_path_request('delete', path, **kwargs)
Then, I could just make a request based on the path:
# Initialize
myclient = RequestsWrapper("http://www.example.com")

# Make a GET request to http://www.example.com/api/spam/eggs
response = myclient.path_get("/api/spam/eggs")

# Print the response JSON data
if response.ok:
    print(response.json())