What options are there for localizing an app on Google App Engine? How do you do it using Webapp, Django, web2py or [insert framework here]?
1. Readable URLs and entity key names
Readable URLs are good for usability and search engine optimization (Stack Overflow is a good example on how to do it). On Google App Engine, key based queries are recommended for performance reasons. It follows that it is good practice to use the entity key name in the URL, so that the entity can be fetched from the datastore as quickly as possible.
Some characters have special meaning in URLs (&, ", ' etc). To be usable as part of a URL, key names should not contain any of these characters. Currently I use the function below to create key names:
import re
import unicodedata


def urlify(unicode_string):
    """Translates latin1 unicode strings to url friendly ASCII.

    Converts accented latin1 characters to their non-accented ASCII
    counterparts, converts to lowercase, converts spaces to hyphens
    and removes all characters that are not alphanumeric ASCII.

    Arguments
        unicode_string: Unicode encoded string.

    Returns
        String consisting of alphanumeric (ASCII) characters,
        underscores and hyphens.
    """
    # Decompose accented characters, then drop everything outside ASCII.
    ascii_text = unicodedata.normalize('NFKD', unicode_string).encode(
        'ASCII', 'ignore').decode('ASCII')
    # Keep only word characters, whitespace and hyphens.
    ascii_text = re.sub(r'[^\w\s-]', '', ascii_text).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', ascii_text)
This is basically a whitelist of approved characters. It works fine for English and Swedish; however, it will fail for non-western scripts and remove letters from some western ones (like Norwegian and Danish with their æ and ø).
Can anyone suggest a method that works with more languages? Would it be better to remove problematic characters (blacklist)?
2. Translating templates
Does Django internationalization and localization work on Google App Engine? Are there any extra steps that must be performed? Is it possible to use Django i18n and l10n for Django templates while using Webapp?
The Jinja2 template language provides integration with Babel. How well does this work, in your experience?
What options are available for your chosen template language?
3. Translated datastore content
When serving content from (or storing it to) the datastore: Is there a better way than getting the accept_language parameter from the HTTP request and matching this with a language property that you have set with each entity?
Regarding point 1, there's really no need to go to such lengths: Simply use unicode key names. They'll be encoded as UTF-8 in the datastore for you.
Regarding point 3, there are many ways to handle language detection. Certainly accept_language should be part of it, and you'll find webob's accept_language support particularly useful here (hopefully Django or your framework-of-choice has something similar). It's quite often the case, however, that a user's browser's language configuration isn't correct, so you'll want to make sure there's some way for the user to override the detected language - for example, with a link on each page to change the language, setting a preference cookie.
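A minimal sketch of the detection order described above, assuming a plain-Python setting rather than webob or Django: an explicit cookie preference wins, then the Accept-Language header, then a default. The function names, the `supported` set and the cookie value are hypothetical and only illustrate the idea.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(',')):
        pieces = part.strip().split(';')
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q value counts as q=1
        for param in pieces[1:]:
            name, _, value = param.strip().partition('=')
            if name.strip() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(accept_header, cookie_language, supported, default='en'):
    """Prefer the user's explicit cookie choice, then the browser header."""
    if cookie_language in supported:
        return cookie_language
    for tag in parse_accept_language(accept_header or ''):
        if tag in supported:
            return tag
        base = tag.split('-')[0]  # e.g. 'pt-br' falls back to 'pt'
        if base in supported:
            return base
    return default
```

The chosen language can then be matched against the language property stored with each entity; the link that changes the language only needs to set the cookie.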
Concerning point 2, I asked a similar question a few months ago. I've managed to get the application internationalized, but just the content, not the urls (wasn't planning on doing so either).
I've also added the revision I made to my code so that people can see what changes went into i18n'ing this Google App Engine app. Look at my second comment on the accepted answer.
Good luck with your other 2 points!
I have struggled with what language formats I need to pass to translation.activate and I asked a detailed question but then I dug up the solution from the source. See it in my answer.
The format is:
You can supply a plain language ('hu', 'pt', 'en', 'de' etc.).
You can supply a language and a "territory" ('pt-pt', 'pt-br', 'en-gb').
Note that Django is buggy if you use other formats (e.g. 'pt_br'), since it does not properly recognise the language-territory relation. (I haven't tried it, but looking at the code I think the 'pt-BR' format should also work. It is too bad that there is no verbose option to see Django's guesses.)
Django checks the directories in the following order (assuming you enter 'pt-br'):
language_territory.isoencoding ('pt_BR.ISO8859-1')
language_territory ('pt_BR'), but note that if a territory is longer than 2 characters, then only its first character gets capitalized (e.g. 'zh_Hans')!
language.encoding ('pt.ISO8859-1')
language ('pt')
falling back to the default language and trying its 4 possible directories (usually ['en_US.ISO8859-2', 'en_US', 'en.ISO8859-2', 'en'])
This was missing in Django documentation, now you can all find it here. Good luck.
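The lookup order listed above can be sketched in a few lines. This is not Django's own code: `to_locale_name` and `candidate_directories` are hypothetical helpers that only mirror the order described in this answer, and lowercasing the input first (so that 'pt-BR' behaves like 'pt-br') is an assumption.

```python
def to_locale_name(language_code):
    """Turn a code such as 'pt-br' into a directory name such as 'pt_BR'."""
    language, _, territory = language_code.lower().partition('-')
    if not territory:
        return language
    if len(territory) > 2:
        # Longer territories only get their first character capitalized.
        territory = territory[0].upper() + territory[1:]
    else:
        territory = territory.upper()
    return language + '_' + territory


def candidate_directories(language_code, encoding='ISO8859-1'):
    """List the locale directory names in the order they would be tried."""
    locale_name = to_locale_name(language_code)
    language = locale_name.split('_')[0]
    candidates = [locale_name + '.' + encoding, locale_name]
    if language != locale_name:
        # Fall back from language_territory to the plain language.
        candidates += [language + '.' + encoding, language]
    return candidates
```

For 'pt-br' this yields the four names from the list above; the fallback to the default language would repeat the same call with the default language code.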
One more thing:
You can put a locale directory into each app directory. I recommend breaking the translations up into smaller units; it is easier to keep track of them later.
I have some text fields in my Django model that are filled by a script, with values in English (the list of values is known).
But the app is actually made for Russian clients only. I'd like to translate those fields into Russian, and here comes a little question. These values are taken from an API response, which means I should check the value to translate it. What's faster: to check and translate fields in template or to make extra fields and translate strings in the Python script?
The problem is the overhead of compiling templates when rendering (similar to how .py files are compiled to .pyc): the more complicated a template gets (method calls etc.), the slower rendering tends to be. Django has template caching, but that is also limited (I don't know by how much). I have run into performance issues because of too much logic in templates. Besides, it's always good to have a dumb client (the template). I would prefer the Python approach, mainly to keep the client thin rather than because of the performance gap. Also, if you need to add one more language tomorrow, changing the templates will always be harder than changing the server code.
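A rough sketch of the Python-side approach, assuming the list of English values is known in advance as the question says. The dictionary contents, the `status` field and the function names are made up for illustration; the real values would come from the API in question.

```python
# Hypothetical lookup table: known English API values -> Russian text.
STATUS_TRANSLATIONS = {
    'active': 'активен',
    'pending': 'в ожидании',
    'closed': 'закрыт',
}


def translate_value(value, translations=STATUS_TRANSLATIONS):
    """Return the Russian text for a known value, else the value unchanged."""
    return translations.get(value, value)


def import_record(api_record):
    """Build the data to store: the original value plus its translation."""
    status = api_record['status']
    return {'status': status, 'status_ru': translate_value(status)}
```

The import script does the lookup once per record, so the template only has to print the already translated field.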
I've included a search form in my web2py application, in the following form:
myapp/controller/search?query=myquery
However, for security reasons web2py automatically replaces spaces and non-alphanumeric characters with underscores, which is okay for English-only sites but an impediment for languages that use accent marks. For example, searching for "áéíóú" returns five underscores.
This could be solved by using POST instead of GET for the search form, but then the users wouldn't be able to bookmark the results.
Is there any option to solve this?
Thanks in advance.
Here's an idea that I've used in the past:
1. Use POST to submit the query.
2. Generate a unique string (like the video ID in a YouTube URL, e.g. https://www.youtube.com/watch?v=jX3DuS2Ak3g).
3. Associate the query with that string and store the pair as a key/value entry in the session, app state or DB (depending on how long you want it to live).
4. Redirect the user to a URL that contains that string.
If you don't want to occupy extra memory/space as they tend to grow a lot in some cases, you can substitute steps 2-3 with encrypting the string to something you can decrypt afterwards. You can do this in a middleware class so that it's transparent to your app's logic.
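The store-and-redirect idea might look roughly like this. It is a framework-independent sketch, not web2py code: `QueryStore` and `search_redirect_url` are hypothetical names, an in-memory dict stands in for the session or database, and appending the token to the search URL as a path segment is an assumption.

```python
import secrets


class QueryStore:
    """Keeps submitted queries under short, URL-safe tokens."""

    def __init__(self):
        self._queries = {}  # stand-in for session, app state or a DB table

    def save(self, query):
        # Hex digits only, so URL sanitizing leaves the token untouched.
        token = secrets.token_hex(8)
        self._queries[token] = query
        return token

    def load(self, token):
        """Return the stored query, or None for an unknown token."""
        return self._queries.get(token)


def search_redirect_url(store, query):
    """Store the POSTed query and build the bookmarkable URL to redirect to."""
    return '/myapp/controller/search/' + store.save(query)
```

The page behind the redirect reads the token from the URL and calls `load` to recover the original query, accents included, so the result can still be bookmarked.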
This is a general problem people face while handling urls.
You can use the quote/quote_plus functions in the urllib module to normalize the strings -
For example, from the strings you suggested -
>>> import urllib
>>> print urllib.quote('éíóú')
%C3%A9%C3%AD%C3%B3%C3%BA
>>> print urllib.unquote('%C3%A9%C3%AD%C3%B3%C3%BA')
éíóú
You will have to perform the unquote on the backend when you retrieve the value from the request.
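For readers on Python 3, where these functions live in `urllib.parse`, the same round trip could be sketched as follows. The sample strings are only illustrative.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

query = 'áéíóú'

# Percent-encode the UTF-8 bytes so the value survives inside a URL.
encoded = quote(query)

# On the backend, reverse the encoding to get the original text back.
decoded = unquote(encoded)

# quote_plus also turns spaces into '+', as HTML forms do.
form_encoded = quote_plus('hola señor')
form_decoded = unquote_plus(form_encoded)
```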
There are also some other posts which might be helpful - urlencode implementation and unicode ready urls
I'm writing a Django application that stores data with descriptions in at least 2 languages (Russian and English).
By "data" I mean the information, which users will write (enter, edit) to and read from the application (not the UI). The application is sort of a documentation system - it contains documentation items (paragraphs), each of which has associated text in Russian, English and potentially other languages.
Is there a standard, established way to provide multi-language descriptions for data objects in Django? If yes, where is it described?
Django's support for localization and internationalization is limited to translating text that is part of your labels and templates, formatting of dates and numbers, and more recently proper support for timezones. In addition, automatic URL prefixing of the preferred language was added in 1.4.
For storing user-entered content in multiple languages, there is no official support. However, projects like django-multilingual-model are a good step in this direction.
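To make the problem concrete, here is a plain-Python sketch of one possible shape for such data: each item keeps one text per language and falls back to a default language. It does not show the API of django-multilingual-model or any other package; the class and method names are hypothetical.

```python
class DocumentationItem:
    """A paragraph whose text is stored separately for each language."""

    def __init__(self, slug, default_language='en'):
        self.slug = slug
        self.default_language = default_language
        self._texts = {}  # language code -> text entered by users

    def set_text(self, language, text):
        self._texts[language] = text

    def get_text(self, language):
        """Return the text in the requested language, else the default one."""
        if language in self._texts:
            return self._texts[language]
        return self._texts.get(self.default_language, '')

    def languages(self):
        return sorted(self._texts)
```

In a database this would usually become a separate table with one row per item and language, which is the kind of bookkeeping such packages aim to hide.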
I think what you need is Django Hvad.
https://github.com/kristianoellegaard/django-hvad
I'm looking for a way to translate my Django project. The built-in mechanism provided with Django is great, but it has several weak points that made me look for an alternative.
The project owner must be able to edit every translation, including English (the original text). With gettext it is possible to edit translations with tools like Pootle, but the original strings stay hardcoded in the source files or templates. There is no way for the product owner to change them.
A possible solution is to make gettext translate unique identifiers, and to translate them into all languages including English, like this:
_('form_submit_button')
But this makes tools like Pootle almost impossible for translators to use.
Question: are there any tools for Django project translation that could fit my needs?
If you use some message IDs, they would either be incomprehensible ("message_2215") or you'd be forced to synchronise the message IDs to the actual messages ("Please press any key" = "please_press_any_key" => "Any key to continue" = "any_key_to_continue"). Either way, real strings are better for the programmers and for the tools.
However, if you employ a separate proof-reader for your strings, you can do the following:
Create an English "translation" file (yes, this works)
Let your proof-reader "translate" from English to English using Pootle or any other tool
Make sure your programmers keep that translation file untranslated by folding the proof-reader's changes back into the strings in the code.
(optional) Create a way to deploy translations independently of your main code so you can fix a typo quickly.
You may be able to use Pootle with the _("message_id") approach, depending on how easy Pootle is to customise (I don't know the internals so I can't say, but IIUC it uses Django where template changes are usually straightforward).
For example, Pootle's translation screens have "Original" and "Translation" sections; you could perhaps adapt the templates to show, under the "Original" section, a "Reference" section which displays some canonical translation using a specific reference language (e.g. English).
Or you may be able to use Pootle's alternative source language functionality, without needing to customise Pootle. You could store the canonical versions of the translations using an unused language code (or a made-up one).
Using identifiers is definitely possible with Gettext, and there are tools which support this. However, it might be unusual for some translators, as they are used to downloading only a .po file for offline translation, which does not work with monolingual translations.
For example, Weblate supports monolingual Gettext files just fine (I'm the author of this tool): https://docs.weblate.org/en/latest/formats.html#monolingual-gettext