Invalidate browser cache for static files in Django - python

In Django we have ManifestStaticFilesStorage for caching static files, but it works between Django and the browser, whereas I want proper cache handling between the user and the browser.
What I want: every time a static file changes, its hash is recalculated, the browser cache is invalidated, and the user sees the new file without pressing F5 and without running collectstatic --no-input.
My current code isn't working:
settings.py
STATICFILES_STORAGE = 'auth.utils.HashPathStaticFilesStorage'
CACHES = {
    'staticfiles': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'staticfiles',
        'TIMEOUT': 3600 * 24 * 7,
        'MAX_ENTRIES': 100,
    }
}
and auth.utils.py:
# -*- coding: utf-8 -*-
import time
from hashlib import sha384
from django.conf import settings
from django.core.cache import cache
from django.contrib.staticfiles.storage import ManifestStaticFilesStorage
try:
    ACCURACY = settings.STATICFILES_HASH_ACCURACY
except AttributeError:
    ACCURACY = 12
try:
    KEY_PREFIX = settings.STATICFILES_HASH_KEY_PREFIX
except AttributeError:
    KEY_PREFIX = 'staticfiles_hash'
class HashPathStaticFilesStorage(ManifestStaticFilesStorage):
    """A static file storage that returns a unique url based on the contents
    of the file. When a static file is changed the url will also change,
    forcing all browsers to download the new version of the file.
    The uniqueness of the url is a GET parameter added to the end of it. It
    contains the first 12 characters of the SHA-384 sum of the contents of the
    file.
    Example: {% static "image.jpg" %} -> /static/image.jpg?4e1243
    The accuracy of the hash (number of characters used) can be set in
    settings.py with STATICFILES_HASH_ACCURACY. Setting this value too low
    (1 or 2) can cause different files to get the same hash and is not
    recommended. SHA-384 hex digests are 96 characters long so all accuracy
    values above 96 have the same effect as 96.
    The values can be cached for faster performance. All keys in the cache have
    the prefix specified in STATICFILES_HASH_KEY_PREFIX in settings.py. This
    value defaults to 'staticfiles_hash'
    """
    @property
    def prefix_key(self):
        return "%s:%s" % (KEY_PREFIX, 'prefix')
    def invalidate_cache(self, nocache=False):
        """Invalidates the cache. Run this when one or more static files change.
        If called with nocache=True the cache will not be used.
        """
        value = int(time.time())
        if nocache:
            value = None
        cache.set(self.prefix_key, value)
    def get_cache_key(self, name):
        hash_prefix = cache.get(self.prefix_key)
        if not hash_prefix:
            return None
        key = "%s:%s:%s" % (KEY_PREFIX, hash_prefix, name)
        return key
    def set_cached_hash(self, name, the_hash):
        key = self.get_cache_key(name)
        if key:
            cache.set(key, the_hash)
    def get_cached_hash(self, name):
        key = self.get_cache_key(name)
        if not key:
            return None
        the_hash = cache.get(key)
        return the_hash
    def calculate_hash(self, name):
        path = self.path(name)
        try:
            with open(path, 'rb') as the_file:
                the_hash = sha384(the_file.read()).hexdigest()[:ACCURACY]
        except IOError:
            return ""
        return the_hash
    def get_hash(self, name):
        the_hash = self.get_cached_hash(name)
        if the_hash:
            return the_hash
        the_hash = self.calculate_hash(name)
        self.set_cached_hash(name, the_hash)
        return the_hash
    def url(self, name):
        base_url = super(HashPathStaticFilesStorage, self).url(name)
        the_hash = self.get_hash(name)
        if "?" in base_url:
            return "%s&%s" % (base_url, the_hash)
        return "%s?%s" % (base_url, the_hash)

I just use this very simple idea
<img src="{{ company.logo.url }}?v={% now 'U' %}" />
Force a version with ?v= and set it to the current timestamp with {% now 'U' %}. The value changes on every request, so the browser never reuses its cached copy of the file.

A common and simple approach to avoid users having to reload the page to get fresh static content is to append some mutable value in the inclusion of the static files in the HTML markup, something like this:
<script src="{% static 'js/library.js' %}?{{ version }}"></script>
In this way when the variable version assumes a different value, the browser is forced to download a new version of the static files from the server.
You can set version using a custom context processor, for instance reading the project version from settings. Something like this:
from django.conf import settings
def version(request):
    return {
        'version': settings.VERSION
    }
If you are using git as VCS, another approach would be writing the last commit hash of your project in a file, when you push your modifications to the server. The file should be in a format that is readable by Python. In this way you can use the git commit hash as the version variable mentioned before.
You can do this using a GIT post-receive hook:
#!/bin/bash
WORKDIR=/path/to/project/
VERSION_MODULE=${WORKDIR}django_project/project/version.py
# for every branch which has been pushed
while read oldrev newrev ref
do
    # if branch pushed is master, update version.py file in the django project
    if [[ $ref =~ .*/master$ ]]; then
        GIT_WORK_TREE=$WORKDIR git checkout -f master
        echo "GIT_REF = 'master'" > $VERSION_MODULE
        echo "GIT_REV = '$newrev'" >> $VERSION_MODULE
    fi
done
Then your context processor could be:
from project.version import GIT_REV
def version(request):
    return {
        'version': GIT_REV[:7]
    }

Super simple idea: change the STATIC_URL variable in your Django settings.
STATIC_URL = '/static/'  # before
STATIC_URL = '/static/v2/'  # after
This will change the path of all static files, forcing browsers to reload the content.

Related

I've loaded a yaml file with `!ENV SOME_VAR` and replaced the string with the value. How do I save the original string and not the changed string?

I'm using python 3.x and pyyaml. I'm not married to pyyaml if I need to replace it.
There are a number of questions (with answers) on how to replace a value in a yaml file with the value of an environment variable.
E.g. db_password: !ENV DB_PASSWORD becomes db_password: s00p3rs3kr3t.
The user and the program can make changes to other values (e.g., user sets db_table with cli option, program sets generated hash value).
I want to save those changes without saving the value of the environment variable for db_password.
A simplified example of what I have looks like the following code.
import yaml
def my_regex():
    return regex  # compiled pattern that matches the !ENV values (elided here)
def resolve_env_vars(loader, node):
    # replace string with environment variable value
    ...
loader = yaml.SafeLoader
loader.add_implicit_resolver('!ENV', my_regex(), None)
loader.add_constructor('!ENV', resolve_env_vars)
with open(yamlfile, 'r') as raw:
    cfg = yaml.load(raw, Loader=loader)
While this works fine for loading the value into the resulting dict, I need to figure out some way of noting the original value and which key it goes with.
I have stepped through the entire process with pudb and I cannot find a way to restore the original value when writing the config file. By the time the code gets to resolve_env_vars the associated key (e.g., db_password in the example above) is not accessible.
How do I save db_password: !ENV DB_PASSWORD instead of db_password: s00p3rs3kr3t when writing the data back to the config file?
You need the tag to cause the creation of an instance that behaves like a string, but has the original
environment variable tucked onto it, so it can be found at dump time:
import sys
import os
import ruamel.yaml
yaml_str = """\
db_password: !ENV DB_PASSWORD
"""
yaml = ruamel.yaml.YAML(typ='safe')
yaml.default_flow_style = False
@yaml.register_class
class EnvStr(str):
    yaml_tag = '!ENV'
    def __new__(cls, env_var):
        ret_val = str.__new__(cls, os.environ.get(env_var, f'ENV "{env_var}" NOT SET'))
        ret_val.env_var = env_var
        return ret_val
    @classmethod
    def from_yaml(cls, constructor, node):
        return cls(node.value)
    @classmethod
    def to_yaml(cls, representer, node):
        return representer.represent_scalar(cls.yaml_tag, node.env_var)
os.environ['DB_PASSWORD'] = 's00p3rs3kr3t'
data = yaml.load(yaml_str)
print(f'The password is "{data["db_password"]}" (without the double quotes). Keep it safe!')
print('\nYAML dump:')
yaml.dump(data, sys.stdout)
which gives:
The password is "s00p3rs3kr3t" (without the double quotes). Keep it safe!
YAML dump:
db_password: !ENV DB_PASSWORD

Render dynamically changing images with same filenames in Flask

I have a flask view function as below:
@app.route('/myfunc', methods = ['POST', 'GET'])
def myfunc():
    var = request.form["samplename"]
    selected_ecg=ecg.loc[ecg['Patient ID'].isin([var])]
    selected_ecg = selected_ecg.drop('Patient ID', 1)
    arr = np.array(selected_ecg)
    y = arr.T
    x=np.array(range(1,189))
    plot.plot(x,y)
    # Remove the old file
    os.remove("static\graph.png")
    # Now save the new image file
    plot.savefig("static\graph.png")
    return render_template("outputs.html")
Outputs.html:
<html>
<head>
</head>
<body>
<h1>Output page</h1>
<img src="static/graph.png" />
</body>
</html>
I use the flask view function to display an image through the outputs.html file. The catch here is that the static image file that is served keeps changing every time based on user inputs. In other words, I keep overwriting the image file based on the inputs the user has selected.
The problem is that the updated image is not served. The image rendered on the first request keeps being displayed for every new input.
I have already looked at older posts about serving dynamic content in Flask, but none of them proved useful.
thebjorn's solution is valid. I have found multiple posts on Stack Overflow that suggest identical solutions. To view them, search Google for how to not cache images.
Below is my solution to your problem. It deletes the old graph file and creates a new one with plot.savefig on every request to /myfunc. I was not sure on which request you wanted this behavior.
import os
import time
@app.route('/myfunc', methods = ['POST', 'GET'])
def myfunc():
    var = request.form["samplename"]
    selected_ecg=ecg.loc[ecg['Patient ID'].isin([var])]
    selected_ecg = selected_ecg.drop('Patient ID', 1)
    arr = np.array(selected_ecg)
    y = arr.T
    x=np.array(range(1,189))
    plot.plot(x,y)
    # The "graph_" prefix must match the cleanup check below
    new_graph_name = "graph_" + str(time.time()) + ".png"
    for filename in os.listdir('static/'):
        if filename.startswith('graph_'):  # not to remove other images
            os.remove('static/' + filename)
    plot.savefig('static/' + new_graph_name)
    return render_template("outputs.html", graph=new_graph_name)
Outputs.html
<html>
<head>
</head>
<body>
<h1>Output page</h1>
<img src="{{ url_for('static', filename=graph) }}" />
</body>
</html>
You're running into a caching issue. Static resources, like images, are cached at every point in the chain between your server and the browser. This is a good thing. Most reasonable systems are set up to cache images for at least 1 year at the server (and that's if they're not cached in the browser).
To bust through this cache issue, you'll need to either (i) give the files new names, (ii) set response headers such as Cache-Control to indicate they shouldn't be cached, or (iii) add a unique query string: instead of static/graph.png, append a timestamp, 'static/graph.png?v=' + (new Date()).valueOf(), or an MD5 hash.
update: Dinko has given you a fine answer (do read the links he provides). To add cache-busting on the server side, without creating new files, you can calculate an md5 checksum (disadvantage: you'll need to read the entire file):
from hashlib import md5
fname = 'static/graph.png'
with open(fname, 'rb') as fp:
    checksum = md5(fp.read()).hexdigest()
fname += "?v=" + checksum
or use the last-modified attribute (not always reliable):
import os
fname = 'static/graph.png'
modified_tstamp = str(int(os.stat(fname).st_mtime * 10**6))
fname += "?v=" + modified_tstamp
both of these methods will serve a cached version as long as the file doesn't change.
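To put the two variants side by side, here is a minimal sketch of a helper that builds the versioned URL either from the file content or from its modification time. The function name versioned_url and the use_mtime flag are made up for this illustration; they are not part of Flask.

```python
import hashlib
import os
import tempfile


def versioned_url(path, use_mtime=False):
    """Return `path` with a ?v=<token> suffix that changes when the file does."""
    if use_mtime:
        # Cheap: one stat call, but the timestamp can change without a content change.
        token = str(int(os.stat(path).st_mtime * 10**6))
    else:
        # Depends only on the bytes, at the cost of reading the whole file.
        with open(path, "rb") as fp:
            token = hashlib.md5(fp.read()).hexdigest()
    return f"{path}?v={token}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        graph = os.path.join(tmp, "graph.png")
        with open(graph, "wb") as fp:
            fp.write(b"first plot")
        before = versioned_url(graph)
        with open(graph, "wb") as fp:
            fp.write(b"second plot")
        after = versioned_url(graph)
        print(before != after)  # the URL changes once the content changes
```

In the view you would pass the result to the template instead of hard-coding static/graph.png, so the browser only refetches the image when the token changes.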

How to avoid re-downloading media to S3 in Scrapy?

I previously asked a similar question (How does Scrapy avoid re-downloading media that was downloaded recently?), but since I did not receive a definite answer I'll ask it again.
I've downloaded a large number of files to an AWS S3 bucket using Scrapy's Files Pipeline. According to the documentation (https://doc.scrapy.org/en/latest/topics/media-pipeline.html#downloading-and-processing-files-and-images), this pipeline avoids "re-downloading media that was downloaded recently", but it does not say how long ago "recent" is or how to set this parameter.
Looking at the implementation of the FilesPipeline class at https://github.com/scrapy/scrapy/blob/master/scrapy/pipelines/files.py, it would appear that this is obtained from the FILES_EXPIRES setting, for which the default is 90 days:
class FilesPipeline(MediaPipeline):
    """Abstract pipeline that implement the file downloading
    This pipeline tries to minimize network transfers and file processing,
    doing stat of the files and determining if file is new, uptodate or
    expired.
    `new` files are those that pipeline never processed and needs to be
    downloaded from supplier site the first time.
    `uptodate` files are the ones that the pipeline processed and are still
    valid files.
    `expired` files are those that pipeline already processed but the last
    modification was made long time ago, so a reprocessing is recommended to
    refresh it in case of change.
    """
    MEDIA_NAME = "file"
    EXPIRES = 90
    STORE_SCHEMES = {
        '': FSFilesStore,
        'file': FSFilesStore,
        's3': S3FilesStore,
    }
    DEFAULT_FILES_URLS_FIELD = 'file_urls'
    DEFAULT_FILES_RESULT_FIELD = 'files'
    def __init__(self, store_uri, download_func=None, settings=None):
        if not store_uri:
            raise NotConfigured
        if isinstance(settings, dict) or settings is None:
            settings = Settings(settings)
        cls_name = "FilesPipeline"
        self.store = self._get_store(store_uri)
        resolve = functools.partial(self._key_for_pipe,
                                    base_class_name=cls_name,
                                    settings=settings)
        self.expires = settings.getint(
            resolve('FILES_EXPIRES'), self.EXPIRES
        )
        if not hasattr(self, "FILES_URLS_FIELD"):
            self.FILES_URLS_FIELD = self.DEFAULT_FILES_URLS_FIELD
        if not hasattr(self, "FILES_RESULT_FIELD"):
            self.FILES_RESULT_FIELD = self.DEFAULT_FILES_RESULT_FIELD
        self.files_urls_field = settings.get(
            resolve('FILES_URLS_FIELD'), self.FILES_URLS_FIELD
        )
        self.files_result_field = settings.get(
            resolve('FILES_RESULT_FIELD'), self.FILES_RESULT_FIELD
        )
        super(FilesPipeline, self).__init__(download_func=download_func, settings=settings)
    @classmethod
    def from_settings(cls, settings):
        s3store = cls.STORE_SCHEMES['s3']
        s3store.AWS_ACCESS_KEY_ID = settings['AWS_ACCESS_KEY_ID']
        s3store.AWS_SECRET_ACCESS_KEY = settings['AWS_SECRET_ACCESS_KEY']
        s3store.POLICY = settings['FILES_STORE_S3_ACL']
        store_uri = settings['FILES_STORE']
        return cls(store_uri, settings=settings)
    def _get_store(self, uri):
        if os.path.isabs(uri):  # to support win32 paths like: C:\\some\dir
            scheme = 'file'
        else:
            scheme = urlparse(uri).scheme
        store_cls = self.STORE_SCHEMES[scheme]
        return store_cls(uri)
    def media_to_download(self, request, info):
        def _onsuccess(result):
            if not result:
                return  # returning None force download
            last_modified = result.get('last_modified', None)
            if not last_modified:
                return  # returning None force download
            age_seconds = time.time() - last_modified
            age_days = age_seconds / 60 / 60 / 24
            if age_days > self.expires:
                return  # returning None force download
Do I understand this correctly? Also, I do not see a similar Boolean statement with age_days in the S3FilesStore class; is the checking of age also implemented for files on S3? (I was also unable to find any tests testing this age-checking feature for S3).
FILES_EXPIRES is indeed the setting that tells the FilesPipeline how old a file can be before it is downloaded again.
The key section of the code is in media_to_download:
the _onsuccess callback checks the result of the pipeline's self.store.stat_file call and, for your question, looks specifically at the "last_modified" entry. If the last modification is older than the configured number of days, the download is triggered.
You can check how the S3store gets the "last modified" information. It depends if botocore is available or not.
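To make the decision explicit, here is a standalone sketch of the same age check, independent of Scrapy. The function name needs_download and the now parameter are introduced only for this illustration; stat_result stands for whatever dictionary the store's stat_file call produced.

```python
import time

SECONDS_PER_DAY = 60 * 60 * 24


def needs_download(stat_result, expires_days, now=None):
    """Return True when the file should be downloaded (again)."""
    if not stat_result:
        return True  # the store knows nothing about this file
    last_modified = stat_result.get("last_modified")
    if not last_modified:
        return True  # no timestamp, so the age cannot be checked
    now = time.time() if now is None else now
    age_days = (now - last_modified) / SECONDS_PER_DAY
    return age_days > expires_days


if __name__ == "__main__":
    stored = {"last_modified": time.time() - 10 * SECONDS_PER_DAY}
    print(needs_download(stored, expires_days=90))  # a 10-day-old file is still fresh
```

The check does not care which store produced the timestamp, so as long as the S3 store returns a last_modified value, the same expiry applies there. Setting FILES_EXPIRES in settings.py changes the number of days.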
One line answer to this would be - class FilesPipeline(MediaPipeline): is the only class responsible for managing, validating and downloading files in your local paths. class S3FilesStore(object): just gets the files from local paths and uploads them to S3.
class FSFilesStore is the one which manages all your local paths and FilesPipeline uses them to store your files at local.
Links:
https://github.com/scrapy/scrapy/blob/master/scrapy/pipelines/files.py#L264
https://github.com/scrapy/scrapy/blob/master/scrapy/pipelines/files.py#L397
https://github.com/scrapy/scrapy/blob/master/scrapy/pipelines/files.py#L299

django static files versioning

I'm working on a universal solution to the problem of updating static files.
Example: say a site has served /static/styles.css for a long time, so a lot of visitors have this file cached in their browsers.
Now we change this CSS file and update it on the server, but some users still get the old version (despite the modification date returned by the server).
The obvious solution is to add a version to the URL, /static/styles.css?v=1.1, but then the developer must track changes to the file and increase the version manually.
A second solution is to compute the MD5 hash of the file and add it to the URL, /static/styles.css?v={md5hashvalue}, which looks much better, but the MD5 has to be calculated automatically somehow.
The possible way I see is to create a template tag like this:
{% static_file "style.css" %}
which will render
<link rel="stylesheet" href="/static/style.css?v=md5hash">
BUT I do not want this tag to calculate the MD5 on every page load, and I do not want to store the hash in the Django cache, because then we would have to clear it after updating the file...
Any thoughts?
Django 1.4 now includes CachedStaticFilesStorage which does exactly what you need (well... almost).
Since Django 2.2 ManifestStaticFilesStorage should be used instead of CachedStaticFilesStorage.
You use it with the manage.py collectstatic task. All static files are collected from your applications, as usual, but this storage manager also creates a copy of each file with the MD5 hash appended to the name. So for example, say you have a css/styles.css file, it will also create something like css/styles.55e7cbb9ba48.css.
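As a rough illustration of the naming scheme, the sketch below inserts the first 12 hex digits of the content's MD5 before the file extension. It is a simplified stand-in, not Django's actual implementation, and the function name hashed_name is chosen only for this example.

```python
import hashlib
import posixpath


def hashed_name(name, content):
    """Insert the first 12 hex digits of the content's MD5 before the extension."""
    digest = hashlib.md5(content).hexdigest()[:12]
    root, ext = posixpath.splitext(name)
    return f"{root}.{digest}{ext}"


if __name__ == "__main__":
    print(hashed_name("css/styles.css", b"body { color: red; }"))
    print(hashed_name("css/styles.css", b"body { color: blue; }"))
```

Because the hash is part of the file name, any change to the content produces a new URL, and the old URL can be cached indefinitely.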
Of course, as you mentioned, the problem is that you don't want your views and templates calculating the MD5 hash all the time to find out the appropriate URLs to generate. The solution is caching. Ok, you asked for a solution without caching, I'm sorry, that's why I said almost. But there's no reason to reject caching, really. CachedStaticFilesStorage uses a specific cache named staticfiles. By default, it will use your existing cache system, and voilà! But if you don't want it to use your regular cache, perhaps because it's a distributed memcache and you want to avoid the overhead of network queries just to get static file names, then you can setup a specific RAM cache just for staticfiles. It's easier than it sounds: check out this excellent blog post. Here's what it would look like:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': '127.0.0.1:11211',
    },
    'staticfiles': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'staticfiles-filehashes'
    }
}
I would suggest using something like django-compressor. In addition to automatically handling this type of stuff for you, it will also automatically combine and minify your files for fast page load.
Even if you don't end up using it in entirety, you can inspect their code for guidance in setting up something similar. It's been better vetted than anything you'll ever get from a simple StackOverflow answer.
I use my own templatetag which add file modification date to url: https://bitbucket.org/ad3w/django-sstatic
Is reinventing the wheel and writing your own implementation that bad? Furthermore, I would like lower-level software (nginx, for example) to serve my static files in production instead of the Python application, even with a backend. One more thing: I'd like links to stay the same after recalculation, so the browser fetches only the files that changed. So here's my point of view:
template.html:
{% load md5url %}
<script src="{% md5url "example.js" %}"></script>
out html:
static/example.js?v=5e52bfd3
settings.py:
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(PROJECT_DIR, 'static')
appname/templatetags/md5url.py:
import hashlib
import threading
from os import path
from django import template
from django.conf import settings
register = template.Library()
class UrlCache(object):
    _md5_sum = {}
    _lock = threading.Lock()
    @classmethod
    def get_md5(cls, file):
        try:
            return cls._md5_sum[file]
        except KeyError:
            with cls._lock:
                try:
                    md5 = cls.calc_md5(path.join(settings.STATIC_ROOT, file))[:8]
                    value = '%s%s?v=%s' % (settings.STATIC_URL, file, md5)
                except IsADirectoryError:
                    value = settings.STATIC_URL + file
                cls._md5_sum[file] = value
                return value
    @classmethod
    def calc_md5(cls, file_path):
        with open(file_path, 'rb') as fh:
            m = hashlib.md5()
            while True:
                data = fh.read(8192)
                if not data:
                    break
                m.update(data)
            return m.hexdigest()
@register.simple_tag
def md5url(model_object):
    return UrlCache.get_md5(model_object)
Note: the hashes are cached in process memory, so the uWSGI application (specifically, each worker process) must be restarted for changes to take effect.
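If restarting is inconvenient, one possible variation is to store the file's modification time next to the hash and rehash only when it changes. This is a sketch under that assumption, not part of the answer's code; short_md5 and the module-level cache are hypothetical names, and it costs one stat call per lookup.

```python
import hashlib
import os
import tempfile
import threading

_cache = {}  # path -> (mtime_ns, short_hash)
_lock = threading.Lock()


def short_md5(path, length=8):
    """Return a short MD5 for `path`, rehashing only when its mtime changes."""
    mtime = os.stat(path).st_mtime_ns
    with _lock:
        cached = _cache.get(path)
        if cached and cached[0] == mtime:
            return cached[1]
        digest = hashlib.md5()
        with open(path, "rb") as fh:
            # Read in chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: fh.read(8192), b""):
                digest.update(chunk)
        value = digest.hexdigest()[:length]
        _cache[path] = (mtime, value)
        return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.js")
        with open(target, "wb") as fh:
            fh.write(b"console.log(1);")
        print(short_md5(target))  # hashed on first use, then served from the cache
```

The template tag would call short_md5(path.join(settings.STATIC_ROOT, file)) in place of the permanent dictionary lookup.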
Django 1.7 added ManifestStaticFilesStorage, a better alternative to CachedStaticFilesStorage that doesn't use the cache system and solves the problem of the hash being computed at runtime.
Here is an excerpt from the documentation:
CachedStaticFilesStorage isn’t recommended – in almost all cases ManifestStaticFilesStorage is a better choice. There are several performance penalties when using CachedStaticFilesStorage since a cache miss requires hashing files at runtime. Remote file storage require several round-trips to hash a file on a cache miss, as several file accesses are required to ensure that the file hash is correct in the case of nested file paths.
To use it, simply add the following line to settings.py:
STATICFILES_STORAGE = 'django.contrib.staticfiles.storage.ManifestStaticFilesStorage'
And then, run python manage.py collectstatic; it will append the MD5 to the name of each static file.
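The reason no hashing happens at request time is that collectstatic writes a manifest mapping each original name to its hashed name, and URL generation becomes a dictionary lookup. The sketch below imitates that lookup with a simplified manifest; the helper name static_url and its prefix argument are assumptions made for this example.

```python
import json

# Simplified stand-in for the staticfiles.json written by collectstatic.
MANIFEST_JSON = """
{
  "version": "1.0",
  "paths": {
    "css/styles.css": "css/styles.55e7cbb9ba48.css"
  }
}
"""


def static_url(name, manifest, prefix="/static/"):
    """Resolve a static file name to its hashed URL using the manifest only."""
    hashed = manifest["paths"].get(name)
    if hashed is None:
        raise ValueError("Missing manifest entry for %r" % name)
    return prefix + hashed


if __name__ == "__main__":
    manifest = json.loads(MANIFEST_JSON)
    print(static_url("css/styles.css", manifest))
```

A file that was changed but not collected again has no new manifest entry, which is why collectstatic has to run after every change with this storage.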
How about you always have a URL Parameter in your URL with a version and whenever you have a major release you change the version in your URL Parameter. Even in the DNS. So if www.yourwebsite.com loads up www.yourwebsite.com/index.html?version=1.0 then after the major release the browser should load www.yourwebsite.com/index.html?version=2.0
I guess this is similar to your solution 1. Instead of tracking files can you track whole directories? For example ratehr than /static/style/css?v=2.0 can you do /static-2/style/css or to make it even granular /static/style/cssv2/.
Here is an update to @deathangel908's code. It now also works with S3 storage (and with any other storage, I think). The difference is that it reads the file content through the static files storage; the original doesn't work on S3.
appname/templatetags/md5url.py:
import hashlib
import threading
from django import template
from django.conf import settings
from django.contrib.staticfiles.storage import staticfiles_storage
register = template.Library()
class UrlCache(object):
    _md5_sum = {}
    _lock = threading.Lock()
    @classmethod
    def get_md5(cls, file):
        try:
            return cls._md5_sum[file]
        except KeyError:
            with cls._lock:
                try:
                    md5 = cls.calc_md5(file)[:8]
                    value = '%s%s?v=%s' % (settings.STATIC_URL, file, md5)
                except OSError:
                    value = settings.STATIC_URL + file
                cls._md5_sum[file] = value
                return value
    @classmethod
    def calc_md5(cls, file_path):
        with staticfiles_storage.open(file_path, 'rb') as fh:
            m = hashlib.md5()
            while True:
                data = fh.read(8192)
                if not data:
                    break
                m.update(data)
            return m.hexdigest()
@register.simple_tag
def md5url(model_object):
    return UrlCache.get_md5(model_object)
The major advantage of this solution: you don't have to modify anything in the templates.
This adds the build version to STATIC_URL, and the web server then strips it with a rewrite rule.
settings.py
# build version, it's increased with each build
VERSION_STAMP = __versionstr__.replace(".", "")
# rewrite static url to contain the number
STATIC_URL = '%sversion%s/' % (STATIC_URL, VERSION_STAMP)
So the final url would be for example this:
/static/version010/style.css
And then Nginx has a rule to rewrite it back to /static/style.css
location /static {
    alias /var/www/website/static/;
    rewrite ^(.*)/version([\.0-9]+)/(.*)$ $1/$3;
}
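To check what the rewrite pattern does before touching the server config, the same regular expression can be tried in Python. This is only a sanity check of the pattern, on the assumption that it behaves the same in nginx's regex engine; strip_version is a name made up for the example.

```python
import re

# Same pattern as in the nginx rewrite rule above.
VERSIONED = re.compile(r"^(.*)/version([\.0-9]+)/(.*)$")


def strip_version(url_path):
    """Drop the /version<digits>/ segment, mirroring the rewrite to $1/$3."""
    return VERSIONED.sub(r"\1/\3", url_path)


if __name__ == "__main__":
    print(strip_version("/static/version010/style.css"))
    print(strip_version("/static/version010/css/site.css"))
```

Paths without a version segment do not match and pass through unchanged, so unversioned URLs keep working.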
A simple vstatic template tag that extends Django's static tag to produce versioned static file URLs:
from django import template
from django.conf import settings
# django.contrib.staticfiles.templatetags.staticfiles was removed in Django 3.0;
# on older versions, import static from there instead.
from django.templatetags.static import static
register = template.Library()
@register.simple_tag
def vstatic(path):
    url = static(path)
    static_version = getattr(settings, 'STATIC_VERSION', '')
    if static_version:
        url += '?v=' + static_version
    return url
If you want to automatically set STATIC_VERSION to the current git commit hash, you can use the following snippet (Python 3 code; adjust if necessary):
import subprocess
def get_current_commit_hash():
    try:
        return subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD']).strip().decode('utf-8')
    except (OSError, subprocess.CalledProcessError):
        # git is not installed or this is not a git checkout
        return ''
At settings.py call get_current_commit_hash(), so this will be calculated only once:
STATIC_VERSION = get_current_commit_hash()
I use a global base context in all my views, where I set the static version to be the millisecond time (that way, it will be a new version every time I restart my application):
import time
from typing import Dict
from django.conf import settings
from django.shortcuts import render
# global base context
base_context = {
    "title": settings.SITE_TITLE,
    "static_version": int(round(time.time() * 1000)),
}
# function to merge context with base context
def context(items: Dict) -> Dict:
    return {**base_context, **items}
# view
def view(request):
    cxt = context({<...>})
    return render(request, "page.html", cxt)
my page.html extends my base.html template, where I use it like this:
<link rel="stylesheet" type="text/css" href="{% static 'style.css' %}?v={{ static_version }}">
fairly simple and does the job

GAE Python optimization: Django filter for language support

I have a filter that I use for language support in my webapp. But when I publish it to GAE, it keeps telling me that the CPU usage is too high.
I think I have traced the problem to the filter I use for language support. I use this in my templates:
<h1>{{ "collection.header"|translate:lang }}</h1>
The filter code looks like this:
import re
from google.appengine.ext import webapp
from util import dictionary
register = webapp.template.create_template_register()
def translate(key, lang):
    d = dictionary.GetDictionaryKey(lang, key)
    if d == False:
        return "no key for " + key
    else:
        return d.value
register.filter(translate)
I'm too new to Python to see what's wrong with it. Or is this the entirely wrong approach?
..fredrik
A little more about what I'm trying to do: I'm trying to find a way to handle language support. A user needs to be able to update text elements via an admin page. As of now I have all text elements stored in a db.Model, and I use a filter to get the right key based on language.
After a lot of testing I still can't get it to work well enough. When published I still get error messages in the logs about too much CPU usage. A typical page has about 30-50 text elements, and according to the logs it uses about 1500ms (900ms API) for each page load. I'm starting to think this might not be the best approach.
I've tried using both memcache and indexes to get around the CPU usage. It helps a little. Should one use memcache and manually added indexes?
This is how my filter looks like:
import re
from google.appengine.ext import webapp
from google.appengine.api import memcache
from util import dictionary
register = webapp.template.create_template_register()
def translate(key, lang):
    re = "no key for " + key
    data = memcache.get("dictionary" + lang)
    if data is None:
        data = dictionary.GetDictionaryKey(lang)
        memcache.add("dictionary" + lang, data, 60)
    if key in data:
        return data[key]
    else:
        return "no key for " + key
register.filter(translate)
And util.dictionary looks like this:
from google.appengine.ext import db
class DictionaryEntries(db.Model):
    lang = db.StringProperty()
    dkey = db.StringProperty()
    value = db.TextProperty()
    params = db.StringProperty()
    @property
    def itemid(self):
        return self.key().id()
def GetDictionaryKey(lang):
    entries = DictionaryEntries.all().filter("lang = ", lang)
    if entries.count() > 0:
        langObj = {}
        for entry in entries:
            langObj[entry.dkey] = entry.value
        return langObj
    else:
        return False
Your initial question is about high CPU usage. The answer, I think, is simple: with GAE and non-relational databases like BigTable, the entries.count() call is expensive, and so is the for entry in entries loop if you have a lot of data.
I think you need to do a couple of things.
In util/dictionary.py:
def GetDictionaryKey(lang, key):
    cache_key = 'dictionary_%s_%s' % (lang, key)
    data = memcache.get(cache_key)
    if not data:
        # look the entry up by its dkey, not by its value
        entry = DictionaryEntries.all().filter("lang = ", lang).filter("dkey =", key).get()
        if entry:
            data = entry.value
            # memcache.add returns a bool, so do not assign its result to data
            memcache.add(cache_key, data, 60)
        else:
            data = 'no result for %s' % key
    return data
and in your filter:
def translate(key, lang):
    return dictionary.GetDictionaryKey(lang, key)
This approach is better because:
You don't make the expensive query of count
You respect the MVC pattern, because a filter is part of the Template (the View in the pattern) and the GetDictionaryKey method is part of the Controller.
Besides, if you are using Django, I suggest you slugify your cache_key:
from django.template.defaultfilters import slugify
def GetDictionaryKey(lang, key):
    cache_key = 'dictionary_%s_%s' % (slugify(lang), slugify(key))
    data = memcache.get(cache_key)
    if not data:
        # look the entry up by its dkey, not by its value
        entry = DictionaryEntries.all().filter("lang = ", lang).filter("dkey =", key).get()
        if entry:
            data = entry.value
            # memcache.add returns a bool, so do not assign its result to data
            memcache.add(cache_key, data, 60)
        else:
            data = 'no result for %s' % key
    return data
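Since a typical page renders 30-50 text elements, it may also help to load a whole language once and serve every key from that copy, as the second filter in the question attempts. Below is a minimal in-process sketch of that idea with a time-to-live. LanguageCache and the load_language callable are hypothetical stand-ins for the datastore query, and on App Engine each instance would hold its own copy.

```python
import time


class LanguageCache:
    """Keep one dictionary per language in process memory for ttl_seconds."""

    def __init__(self, load_language, ttl_seconds=60, clock=time.monotonic):
        self._load_language = load_language  # hypothetical: lang -> {key: text}
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # lang -> (expires_at, {key: text})

    def translate(self, key, lang):
        now = self._clock()
        cached = self._entries.get(lang)
        if cached is None or cached[0] <= now:
            # One load per language per TTL window, however many keys a page uses.
            cached = (now + self._ttl, self._load_language(lang))
            self._entries[lang] = cached
        return cached[1].get(key, "no key for " + key)


if __name__ == "__main__":
    cache = LanguageCache(lambda lang: {"collection.header": "Samling"})
    print(cache.translate("collection.header", "sv"))
```

The trade-off is that an edit made in the admin page can take up to the TTL to show up.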
Have you considered switching to standard gettext methods? Gettext is a widely spread approach for internationalization and very well embedded in the Python (and the Django) world.
Some links:
Python's gettext module
Django's support for gettext with special attention to unicode
PoEdit, an editor for .po-files produced by pygettext
Your template would then look like this:
{% load i18n %}
<h1>{% trans "Header of my Collection" %}</h1>
The files for translations can be generated by manage.py:
manage.py makemessages -l fr
for generating french (fr) messages, for example.
Gettext is quite performant, so I doubt that you will experience a significant slow-down with this approach compared to storing the translation table in memcache. What's more, it lets you work with "real" messages instead of abstract dictionary keys, which is, at least in my experience, much better if you have to read and understand the code (or if you have to find and change a certain message).
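Outside of templates, the same lookup can be done with Python's standard gettext module. The sketch below assumes a catalog domain called "messages" and a "locale" directory, both placeholder names; with fallback=True a missing catalog simply returns the original message.

```python
import gettext


def get_translator(lang, domain="messages", localedir="locale"):
    """Return a gettext function for `lang`, falling back to the source text."""
    translation = gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )
    return translation.gettext


if __name__ == "__main__":
    _ = get_translator("fr")
    # Prints the French text if locale/fr/LC_MESSAGES/messages.mo exists,
    # otherwise the original English message.
    print(_("Header of my Collection"))
```

Because the untranslated message is the fallback, a page still renders readable text before any translation has been written.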
