Situation
When the Django website starts up, it needs to load some data from a table in the database for computation. The data is read-only and large (e.g. 20MB).
The computation is invoked every time a certain page is opened, and a module uses the data for it. Therefore, I don't want the module to SELECT and load the data every time the page is opened.
Question
I guess a singleton may be one solution. How do I implement a singleton in Django? Or is there a better solution?
Use of Django caching will be best here. You will need a 3rd-party caching server, e.g. Redis. There is Memcached too, but as you said your data is 20MB, you will need Redis, as Memcached only allows 1MB at most per key.
Using the cache is also very easy: you just need to sudo apt-get install redis, add a CACHES setting to your Django settings, and you will be good to go.
Redis and Memcached are in-memory cache servers and hold all cached data in memory, so getting it from Redis will be about as fast as it can be.
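One way to wire this up, as a sketch: assuming Django 4.0+ (which ships a built-in Redis cache backend; older versions can use the third-party django-redis package) and a Redis server on the default port, the CACHES setting looks like this:

```python
# settings.py -- assumes Django 4.0+ and a local Redis server
# (pip install redis is needed for the client library)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}
```

After that, `from django.core.cache import cache` gives you `cache.get()` / `cache.set()` anywhere in the project.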
If by "the data is read-only" you mean it never changes, pre-calculate your data and store the result in the DB instead of the large raw data.
If not, you can use a caching system such as Memcached or Redis.
The idea:
Try to get the data you need.
If it doesn't exist, calculate it and store it in the cache.
If you are deploying on GAE, Memcache is the easiest in my experience. Otherwise, I usually use Redis. Either way, Django provides docs about it.
You could create a singleton class:
class Singleton(object):
    _instances = {}

    def __new__(cls, *args, **kwargs):
        if cls not in cls._instances:
            # note: object.__new__ takes no extra arguments in Python 3,
            # so the args are not forwarded here
            cls._instances[cls] = super(Singleton, cls).__new__(cls)
        return cls._instances[cls]
Then extend it in your class that contains the functions returning your selection:
class SampleClass(Singleton):
    data = None

    def load_data(self):
        self.data = MyModel.objects.all()

    def get_data(self):
        if self.data is None:
            self.load_data()
        return self.data
What you want is caching, I think. Did you check Django's cache framework? Database caching seems to be the relevant part for you.
It basically enables you to store the results of database queries for later use without querying again.
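As a sketch, the database cache backend only needs a CACHES setting plus one management command (the table name below is arbitrary):

```python
# settings.py -- Django's database cache backend; after adding this, run
#   python manage.py createcachetable
# once, to create the cache table named below
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "my_cache_table",
    }
}
```

Note that this still hits the database on every cache read; an in-memory backend (Memcached/Redis) avoids that.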
There is a simple way to create a singleton class/object in Python:
First, you should define a base class as follows:
class Singleton(object):
    """
    This is a base class for building a singleton class.
    Just inherit from this class!
    e.g.
        class MySingleton(Singleton)
    Note: do not implement an __init__ method for these
    classes, to avoid some mistakes.
    """
    _singletons = {}

    def __new__(cls, *args, **kwargs):
        if cls not in cls._singletons:  # dict.has_key was removed in Python 3
            cls._singletons[cls] = object.__new__(cls)
        return cls._singletons[cls]
Then just write a derived class that extends it.
That's all!
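To see the effect, here is a self-contained check of the singleton behaviour described above (DataHolder is a made-up stand-in for your own derived class):

```python
class Singleton(object):
    _singletons = {}

    def __new__(cls, *args, **kwargs):
        if cls not in cls._singletons:
            cls._singletons[cls] = object.__new__(cls)
        return cls._singletons[cls]

class DataHolder(Singleton):
    data = None

a = DataHolder()
b = DataHolder()
assert a is b                # every "construction" returns the same instance
a.data = [1, 2, 3]
assert b.data == [1, 2, 3]   # state is shared: there is only one object
```

Each subclass gets its own single instance, since the `_singletons` dict is keyed by class.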
Related
There is a Flask website which accesses some data stored in a database. The data is relevant to more than one template of the website, and I would prefer that the database not be accessed again and again for the same info as the user visits the site's templates, but rather that it be stored in variables. A session is not a good idea because of the size of the data. I'm wondering if using global variables for that purpose would be a good idea: accessing the database once and assigning the data to global variables, from where it will be available to the website's templates for as long as the session lasts. I would be grateful to know whether this is the proper way to achieve it, or whether there are drawbacks such that accessing the database several times, if needed, would be the better option. Thank you in advance.
Try an ORM tool like SQLAlchemy; it does most of the heavy lifting for you.
https://www.sqlalchemy.org/
Examples are available here:
https://realpython.com/flask-by-example-part-2-postgres-sqlalchemy-and-alembic/
If your code is not using SQLAlchemy and you do not want to do the heavy lifting of refactoring the code for various reasons, then you could wrap your database access code in a class and attach it to the Flask app instance:
app.db = DBAccessClass()
Then throughout the code you'd call on the instance attached to the Flask app.
This alone won't solve your issue of making multiple calls to the DB for the same data that could be cached.
Then you can use a decorator class that implements a caching strategy for your DBAccessClass. Here is a very simple example:
from functools import wraps, update_wrapper

class cache(object):
    def __init__(self):
        self.named_caches = {}

    def __call__(self, f):
        @wraps(f)
        def decorated(*args, **kwargs):
            key = f.__name__
            if key not in self.named_caches:
                self.named_caches[key] = f(*args, **kwargs)
            return self.named_caches[key]
        return update_wrapper(decorated, f)

cached = cache()

class MyDBAccessClass(object):
    @cached
    def get_some_data(self):
        data = query_db()
        return data
This might be a short-term solution; I would strongly encourage you to consider SQLAlchemy.
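One caveat worth knowing: the decorator above keys the cache by function name only, so arguments are ignored and all instances share one cached value. A self-contained run demonstrates the behaviour (FakeDB and the call counter are made up for the demo):

```python
from functools import wraps, update_wrapper

class cache(object):
    def __init__(self):
        self.named_caches = {}

    def __call__(self, f):
        @wraps(f)
        def decorated(*args, **kwargs):
            key = f.__name__                     # keyed by name only!
            if key not in self.named_caches:
                self.named_caches[key] = f(*args, **kwargs)
            return self.named_caches[key]
        return update_wrapper(decorated, f)

cached = cache()
calls = []

class FakeDB(object):
    @cached
    def get_some_data(self):
        calls.append(1)                          # count real executions
        return {"rows": [1, 2, 3]}

db = FakeDB()
first = db.get_some_data()
second = db.get_some_data()
assert first is second    # second call served from the cache
assert len(calls) == 1    # the body ran only once
```

If per-argument caching is needed, the key would have to include the arguments (e.g. `(f.__name__, args)`), along the lines of functools.lru_cache.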
I'm creating a library that contains a Bot class. I want it to be able to store message logs and some other info in an SQL database with the help of the Pony ORM. The idea is for the bot to accept the path to the database and create it, or connect to it if it already exists.
Sadly, it seems that with Pony I can only define entity classes after creating the database object, since I need to inherit from database.Entity to bind my class to the database.
The obvious solution is to define all the classes in the Bot class constructor after the database is created, but that seems quite ugly in terms of structure, because I plan for those classes to be quite large and wanted to store them in separate files.
Another hypothetical way would be to do the following (but I don't know whether that functionality is supported):
from pony import orm

class Message(orm.Entity):
    text = orm.Required(unicode)

class Database(orm.Database):
    def __init__(self, path):
        # super(..).__init__(..)
        # bind Message to self

class Bot(object):
    def __init__(self, path):
        self.database = Database(path)
Yet another hypothetical way would be if I could inherit from Database.Entity.
Any ideas? Maybe I can achieve this with another ORM, like SQLAlchemy?
This question isn't entirely App Engine specific, but it might help knowing the context: I have a kind of "static site generator" on App Engine that renders pages and allows them to be styled via various themes and theme settings. The themes are currently stored directly on the App Engine filesystem and uploaded with the application. A theme consists of a few templates and yaml configuration data.
To encapsulate working with themes, I have a Theme class. theme = Theme('sunshine'), for example, constructs a Theme instance that loads and parses the configuration data of the theme called 'sunshine', and allows calls like theme.render_template('index.html') that automatically load and render the correct file on the filesystem.
Problem is, loading and especially parsing a Theme's (yaml) configuration data every time a new request comes in and instantiates a Theme is expensive. So, I want to cache the data within the processes/App Engine instances and maybe later within memcached.
Until now, I've used very simple caches like so:
class Theme(object):
    _theme_variables_cache = {}

    def __init__(self, name):
        self.name = name
        if name not in Theme._theme_variables_cache:
            Theme._theme_variables_cache[name] = self.load_theme_variables()
        ...
(I'm aware that the config could be read multiple times when several requests hit the constructor at the same time. I don't think it causes problems though.)
But that kind of caching gets ugly really quickly. I have several different things I want to read from config files and all of the caches are dictionaries because every different theme 'name' also points to a different underlying configuration.
The last idea I had was creating a function like Theme._cached_func(func) that will only execute func when the function's result isn't already cached for the specific theme (remember, when the object represents a different theme, the cached value can also be different). So I could use it like self.theme_variables = Theme._cached_func(self.load_theme_variables), but I have a feeling I'm missing something obvious here, as I'm still pretty new to Python.
Is there an obvious and clean Python caching pattern that will work for such a situation without cluttering up the entire class with cache logic? I think I can't just memoize function results via decorators or something because different templates will have to have different caches. I don't even need any "stale" cache handling because the underlying configuration data doesn't change while a process runs.
Update
I ended up doing it like that:
class ThemeConfig(object):
    __instances_cache = {}

    @classmethod
    def get_for(cls, theme_name):
        return cls.__instances_cache.setdefault(
            theme_name, ThemeConfig(theme_name))

    def __init__(self, theme_name):
        self.theme_name = theme_name
        self._load_assets_urls()  # those calls load yaml files
        self._load_variables()
    ...
class Theme(object):
    def __init__(self, theme_name):
        self.theme_name = theme_name
        self.config = ThemeConfig.get_for(theme_name)
    ...
So ThemeConfig stores all the configuration stuff that's read from the filesystem for a theme and the factory method ThemeConfig.get_for will always hand out the same ThemeConfig instance for the same theme name. The only caching logic I have is the one line in the factory method, and Theme objects are still as temporary and non-shared as they always were, so I can use and abuse them however I wish.
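One subtlety in that factory method: dict.setdefault evaluates its second argument eagerly, so ThemeConfig(theme_name) is constructed, and the yaml files re-read, on every call even on a cache hit; the cached instance is still the one returned, but the loading work is repeated. An explicit membership check avoids that. A self-contained sketch (Config and the construction counter are made up for the demo):

```python
constructions = []

class Config(object):              # stand-in for ThemeConfig
    _cache = {}

    @classmethod
    def get_for(cls, name):
        # explicit check: the constructor runs only on a cache miss
        if name not in cls._cache:
            cls._cache[name] = cls(name)
        return cls._cache[name]

    def __init__(self, name):
        self.name = name
        constructions.append(name)  # count real constructions

a = Config.get_for("sunshine")
b = Config.get_for("sunshine")
assert a is b
assert constructions == ["sunshine"]   # built once, not twice
```

With setdefault, the second `get_for` call would have built (and discarded) a second Config.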
I will take a shot at this. Basically, a factory pattern can be used here to maintain a clean boundary between your Theme object and the creation of a Theme instance in a particular way.
The factory itself can also maintain a simple caching strategy by storing a mapping between the theme name and the corresponding theme data. I would go with the following implementation:
# the ThemeFactory class instantiates a Theme with a particular name if it is not present within its cache
class ThemeFactory(object):
    def __init__(self):
        self.__theme_variables_cache = {}

    def createTheme(self, theme_name):
        if theme_name not in self.__theme_variables_cache:  # dicts have no .contains() method
            theme = Theme(theme_name)
            self.__theme_variables_cache[theme_name] = theme.load_theme_variables()
        return self.__theme_variables_cache[theme_name]
The definition of the Theme class is now very clean and simple, and will not contain any caching complications:
class Theme(object):
    def __init__(self, name):
        self.__theme_name = name

    def load_theme_variables(self):
        # contains the logic for loading theme variables from theme files
        ...
The approach has the advantages of code maintainability and clear segregation of responsibilities (although not completely so: the factory class still maintains the simple cache; ideally it would hold a reference to a caching service or another class that handles caching, but you get the point).
Your Theme class does what it does best: loading theme variables. Since you have a factory pattern, you are keeping the client code (the code that consumes Theme instances) insulated from the logic of creating them. As your application grows, you can extend this factory to control the creation of various Theme objects (including classes derived from Theme).
Note that this is just one way of achieving simple caching behavior as well as instance creation encapsulation.
One more point - you could store Theme objects in the cache instead of the theme variables. This way you would read the theme variables from the theme files only on first use (lazy loading). However, in this case you would need to make sure you store the theme variables in an instance variable of the Theme class. The method load_theme_variables(self) would now be written this way:
def load_theme_variables(self):
    # let the theme variables be stored in an instance variable __theme_variables
    # (initialize self.__theme_variables = None in __init__)
    if self.__theme_variables is not None:
        return self.__theme_variables
    # __read_theme_file is a private function that reads the theme files
    self.__theme_variables = self.__read_theme_file(self.__theme_name)
    return self.__theme_variables
Hopefully, this gives you an idea on how to go about achieving your use case.
I am doing it like this:
def hydrate(self, bundle):
    # manipulate data here
Now, based on the data, I want to check whether it's already available or not, and create the object if it doesn't exist already. Simply put, I want to do get_or_create.
You have hardly any code, so I don't know what your models etc. around it are, but I think what you're looking for is:
Model.objects.get_or_create()
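get_or_create() returns a (object, created) tuple: the existing row when the lookup matches, otherwise a newly created one. Its semantics can be sketched without Django (the dict stands in for the table; real code passes model field lookups and optionally defaults=):

```python
# naive in-memory sketch of get_or_create semantics
db = {}

def get_or_create(**lookup):
    key = tuple(sorted(lookup.items()))
    if key in db:
        return db[key], False   # existing object, created=False
    obj = dict(lookup)          # "create" the object
    db[key] = obj
    return obj, True            # new object, created=True

obj1, created1 = get_or_create(name="bob")
obj2, created2 = get_or_create(name="bob")
assert created1 is True
assert created2 is False
assert obj1 is obj2
```

In real Django the call also hits the database atomically (with some caveats around race conditions under concurrency).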
I think you are looking for super, which lets you override or extend any of a parent class's methods by calling super() from the subclass.
You can do that by overriding the obj_create method and using the get_or_create function:
def obj_create(self, bundle, request=None, **kwargs):
    ....
But that doesn't sound like a good idea. Create means create, not return already existing data.
I've got some models on my GAE app, and I've overridden put() on some of them. When I call db.put() with a list of these entities, is there a guarantee that the overridden put() on each entity will be called?
What I'm seeing right now is that the entities are just getting saved without it being called. Is there any good way to make sure stuff is done before every save, even batches?
No. You need to monkeypatch db.put() too. For a good example of this, check out Nick Johnson's excellent blog post on Pre- and post- put hooks for Datastore models.
If you look at the source code for the db module, you'll see that db.put() does not call the entity's put() function.
You could try something like:
class SomeModel(db.Model):
    aprop = db.IntegerProperty()

    def _populate_internal_entity(self, *args, **kwargs):
        logging.warn('about to Put() SomeModel: %r', self)
        return super(SomeModel, self)._populate_internal_entity(*args, **kwargs)
However, there is probably a better way to do it. If you are trying to set some properties you should check out custom property classes. If you are trying to do logging or caching you should investigate datastore hooks.