I have a medium sized Django project, (running on AppEngine if it makes any difference), and have all the strings living in .po files like they should.
I'm seeing strange behavior where certain strings just don't translate. They show up in the .po file when I run make_messages, with the correct file locations marked where my {% trans %} tags are. The translations are in place and look correct compared to other strings on either side of them. But when I display the page in question, about 1/4 of the strings simply don't translate.
Digging into the relevant generated .mo file, I don't see either the msgid or the msgstr present.
Has anybody seen anything similar to this? Any idea what might be happening?
trans tags look correct
.po files look correct
no errors during compile_messages
Ugh. Django, you're killing me.
Here's what was happening:
http://blog.e-shell.org/124
For some reason only Django knows, it decided to decorate some of my translations with the comment '# fuzzy'. It seems to have chosen which ones to mark randomly.
Anyway, #fuzzy means this: "don't translate this, even though here's the translation:"
I'll leave this here in case some other poor soul comes across it in the future.
The fuzzy marker is added to the .po file by makemessages. When you have a new string (with no translations), it looks for similar strings, and includes them as the translation, with the fuzzy marker. This means, this is a crude match, so don't display it to the user, but it could be a good start for the human translator.
It isn't a Django behavior, it comes from the gettext facility.
Related
I have a need to create reports from Python. Our existing reporting system for my Volunteer Fire Company is being deprecated, and it had some idiosyncrasies anyway. Everything I'm looking at that seems feasible uses a template for the report body. Before I get too far down a dead end path trying different methods, I'd like to know if anyone knows of anything out there that can do something like this. Specifically, conditional formatting in a row in the body. Below is what we get currently- today I do the bolding of the rows manually based on our criteria of 33% or better. In my code I'll obviously know if the row deserves to be bolded- but everything I'm looking at that uses a template wouldn't allow this. The end result will be a PDF, but if I have to go through Excel, Word, HTML, or whatever to get there, I'll check it out. Sorry for the non-specific question, but I could potentially churn a lot more wasted time looking for something when somebody may already know what to use. Thanks.
I want to use subset HTML inside of my QT widgets that also contain text. I'm using translations for my project, so I also have a source .ts which I later upload to transifex so a translation team can translate it.
However, when I use subset HTML my strings in my .ts file can have the following.
<message>
<location filename="layoutstest.py" line="1007"/>
<source><html><head/><body><p>Hello ...Much More code here...</source>
<translation type="unfinished"></translation>
</message>
This shows up for the translators as well, although not escaped, but still there. It's going to make it very hard for them, or anyone that matter, to translate these strings, so is there anyway around this, or anyway to remove the html from the strings, but still keep it in the code.
I really would like to use it, but I can't if it's going to live in the translation strings as well.
Turns out that they actually use syntax highlighting for the subset HTML now. This is a solution for my problem.
Is there a problem with using unicode (hebrew specificaly) strings including white space.
some of them also include characters such as "%" .
I'm experiencing some problems and since this is my first Django project I want to rule out this as a problem before going further into debugging.
And if there is a known Django problem with this kind of urls is there a way around it?
I know I can reformat the text to solve some of those problems but since I'm preparing a site that uses raw open government data sets (perfectly legal) I would like to stick to the original format as possible.
thanks for the help
Django shouldn't have any problems with unicode URLs, or whitespace in URLs for that matter (although you might want to take care to make sure whitespace is urlecoded (%20).
Either way, though, using white space in a URL is just bad form. It's not guaranteed to work unless it's urlencoded, and then that's just one more thing to worry about. Best to make any field that will eventually become part of a URL a SlugField (so spaces aren't allowed to begin with) or run the value through slugify before placing it in the URL:
In template:
http://domain.com/{{ some_string_with_spaces|slugify }}/
Or in python code:
from django.template.defaultfilters import slugify
u'http://domain.com/%s/' % slugify(some_string_with_spaces)
Take a look here for a fairly comprehensive discussion on what makes an invalid (or valid) URL.
I need to let users enter Markdown content to my web app, which has a Python back end. I don’t want to needlessly restrict their entries (e.g. by not allowing any HTML, which goes against the spirit and spec of Markdown), but obviously I need to prevent cross-site scripting (XSS) attacks.
I can’t be the first one with this problem, but didn’t see any SO questions with all the keywords “python,” “Markdown,” and “XSS”, so here goes.
What’s a best-practice way to process Markdown and prevent XSS attacks using Python libraries? (Bonus points for supporting PHP Markdown Extra syntax.)
I was unable to determine “best practice,” but generally you have three choices when accepting Markdown input:
Allow HTML within Markdown content (this is how Markdown originally/officially works, but if treated naïvely, this can invite XSS attacks).
Just treat any HTML as plain text, essentially letting your Markdown processor escape the user’s input. Thus <small>…</small> in input will not create small text but rather the literal text “<small>…</small>”.
Throw out all HTML tags within Markdown. This is pretty user-hostile and may choke on text like <3 depending on implementation. This is the approach taken here on Stack Overflow.
My question regards case #1, specifically.
Given that, what worked well for me is sending user input through
Markdown for Python, which optionally supports Extra syntax and then through
html5lib’s sanitizer.
I threw a bunch of XSS attack attempts at this combination, and all failed (hurray!); but using benign tags like <strong> worked flawlessly.
This way, you are in effect going with option #1 (as desired) except for potentially dangerous or malformed HTML snippets, which are treated as in option #2.
(Thanks to Y.H Wong for pointing me in the direction of that Markdown library!)
Markdown in Python is probably what you are looking for. It seems to cover a lot of your requested extensions too.
To prevent XSS attacks, the preferred way to do it is exactly the same as other languages - you escape the user output when rendered back. I just took a peek at the documentation and the source code. Markdown seems to be able to do it right out of the box with some trivial config tweaks.
My question is regarding i18n in Python. From what I understand, it involves:
Create a messages file per language (ONLY ONE?!).
in this file, each message will be of the format
English message here
Message en Francais ici (yea crappy french..)
then have this file compiled into another faster binary format
repeat for all other langs needed
in app code (Django) use the translate method with english (or default) language which will be translated correctly based on locale... tr('English message Here')
Probably I'm a bit off on the steps, but this seems to be the general sense right?
What I'm wondering is, is there a simpler way? I mean in the java webapp world, you set up message bundle files in the bundleName_locale.properties format. In each one you usually have a key to message relation like greeting = Hello World. You can have lots of different properties files for different sub sections of your site/app. All the locale files are hierarchical, and missing keys in a sub locale fall through to the parent etc. This is all done automatically by Java, no setup required.
Is there anything like this in the Django/Python world? Is this just insane to follow this route? Could I fake this using a module as a stand in for a java .properties file? Sorry for the long-winded question, and thanks for any input.
While you could do this fairly simply, I would question why.
As is:
Django's i18n is based around gettext, which has never given me any performance problems.
You don't have to create the message file, Django will do it for you.
The messages files Django creates can be sent as a text file to just about anyone for translation.
I spent about 5 minutes explaining to someone how to use them, and a day later got back a fantastic translation of my entire site.
Marking the strings in your code is very simple.
_("My String") is normal for .py files by using from django.utils.translation import ugettext as _.
{% trans "My String" %} in your templates. Again, quite simple.
Writing out bundle.getString("My String") seems like it would get old, quick.
You can easily incorporate Django's i18n into your JavaScript, if needed.
There are web-based editors for the message files.
You don't have to define a string more than once, it conglomerates all instances into one token across your entire app.
How many times do you have to define "Save" in your Java properties files?
Still in a nice, version-tracking friendly, text file.
If you hack together a .properties-like way to define your strings, you'd still want to compile them so that you don't have to parse the text file at execution time.
Example of empty .po file:
#: templates/inquiries/settings/new_field.html:12
#: templates/inquiries/settings/new_form.html:4
msgid "Save"
msgstr ""
Personally, I don't see a reason to bother hacking together a replacement for a solution which already exists and works well.