I want to use Qt's HTML subset inside my Qt widgets that contain text. I'm using translations for my project, so I also have a source .ts file, which I later upload to Transifex so a translation team can translate it.
However, when I use the HTML subset, the strings in my .ts file can look like this:
<message>
<location filename="layoutstest.py" line="1007"/>
<source><html><head/><body><p>Hello ...Much More code here...</source>
<translation type="unfinished"></translation>
</message>
This shows up for the translators as well, unescaped but still there. It's going to make it very hard for them, or anyone for that matter, to translate these strings. Is there any way around this, or any way to remove the HTML from the strings but still keep it in the code?
I would really like to use it, but I can't if it's going to live in the translation strings as well.
It turns out that Transifex now applies syntax highlighting to the HTML subset, which solves my problem.
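For anyone who still wants the markup kept out of the .ts file, one possible approach is to translate only the plain text and wrap it in HTML in code. This is a minimal sketch, not something from the thread: `tr` here is a hypothetical stand-in for the widget's own `tr()` method so the snippet runs without Qt, and `rich_paragraph` is a helper name made up for illustration.

```python
from html import escape


def tr(text: str) -> str:
    """Hypothetical stand-in for QObject.tr(); returns the text unchanged."""
    return text


def rich_paragraph(translated: str) -> str:
    """Wrap already-translated plain text in the markup the widget needs."""
    # escape() keeps characters such as '<' or '&' in a translation
    # from being interpreted as markup.
    return "<html><head/><body><p>{}</p></body></html>".format(escape(translated))


# Only the plain word "Hello" would end up in the translation source.
label_text = rich_paragraph(tr("Hello"))
```

The trade-off is that translators can no longer move inline markup (for example bold on a single word) around within a sentence, so this fits best when the markup wraps the whole string.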
Is HTML or CSS open source like PHP or Python?
For example, the source of PHP itself is available on GitHub.
You must understand the difference between the specification (documentation, idea) and the implementation (interpreter, compiler, engine). Refer to https://softwareengineering.stackexchange.com/questions/238724/what-license-is-html-released-under for more detailed answers.
HTML
HTML only provides tags around text; other than that, HTML files are practically just text files with extra markers. These tags are parsed by the browser you use and displayed in a certain way. The easiest example is probably the <strong></strong>, <i></i> and <b></b> tags. The only purpose of these tags is to make the text within them look different. But the strong tag can look different in different browsers: some make the text bold while others make other changes. This shows that the browser really is the interpreter, and there are not always strict rules on what tags must do. The current standard (HTML5) was created by a group called the WHATWG. See more information at https://www.w3.org/html/ and on their own website, https://html.spec.whatwg.org/multipage/. They have a GitHub repository, https://github.com/whatwg/html, and I believe you can make changes if they accept your pull requests.
CSS
CSS is also not really a language. CSS is specified by the CSS Working Group, whose members are publicly known: https://www.w3.org/Style/CSS/members. If you scroll through the list you will see that most of these people are engineers from big tech companies. They do have a GitHub repository for CSS, which people use to post issues: https://github.com/w3c/csswg-drafts. I believe you can create a pull request, although I am unsure.
Summary
In short, yes, they are basically open source. However, changes to either of these repositories, even when accepted and merged, will not work until browsers have implemented them.
Note
I am not an expert by any means; I googled most of my information. I could be wrong about multiple things.
I have an original text that I want to translate. I normally do it manually, but I know I could save a lot of time by automatically translating the most frequent words and expressions.
I will find out how to translate simple words; that is not the problem. I have read some books on Python and I think it can be done with string manipulation.
But I am lost about how to create the output file.
The output file will contain:
short empty forms ready to be filled wherever there is text that has not been translated
the translated words wherever they were in the original file
In the output file I will fill in the empty forms manually; after pressing Tab, the cursor should jump to the next empty form.
I am lost here. I know how to make forms in HTML, but the language I am used to is Python.
I would like to know which Python modules I could use. I need some guidance on this.
Can you recommend a book or a tool that explains how to do something similar to this?
This is what I want to do, assuming I have managed to create a simple database to translate colors from Spanish to English.
The first step contains the original file.
The second step contains the automatic translation.
In the third step I complete the manual translation.
After finishing, everything is grouped into a normal txt file ready to be used.
I think it is quite clear. I don't expect people to tell me the code to do this, I just need to know what tools could be used to achieve my goal.
Thanks for editing.
To create an interface that works with a web browser, Flask for Python is a good method for creating webforms. There are tutorials available.
One method for storing data would be an SQLite file. That may be more than you need, so I'd recommend starting with a CSV file. Libraries exist in Python for both CSVs and SQLite.
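As a rough illustration of the CSV suggestion, the sketch below loads a two-column glossary and replaces known words, leaving a blank wherever no translation exists. The function names, the `____` blank marker, and the sample Spanish/English colour pairs are assumptions made for this example; the Tab-to-next-field behaviour would still have to come from an HTML form served by something like Flask.

```python
import csv
import io
import re


def load_glossary(csv_file) -> dict:
    """Read 'source,target' rows into a lowercase lookup table."""
    reader = csv.reader(csv_file)
    return {row[0].lower(): row[1] for row in reader if len(row) >= 2}


def pretranslate(text: str, glossary: dict, blank: str = "____") -> str:
    """Swap known words for their translation; unknown words become blanks."""
    def swap(match):
        word = match.group(0)
        return glossary.get(word.lower(), blank)

    # \w+ matches whole words, so punctuation and spacing are preserved.
    return re.sub(r"\w+", swap, text)


# io.StringIO stands in for an open CSV file in this example.
glossary = load_glossary(io.StringIO("rojo,red\nazul,blue\n"))
result = pretranslate("rojo y azul", glossary)
```

In a web form, each blank could be rendered as an `<input>` element instead of a text marker, which gives the Tab navigation for free.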
I have chunks of text that may or may not contain Python code. I need a way to search the text for code and if it is there, do something. I could easily search for specific strings that match a regex, but I need something general.
One thought I had would be to run the text through ast, but that would require parsing out all possible substrings and submitting each of them to ast.
To be clear, the text comes from a Q&A forum for Python. Users frequently post code in their questions, but the code is all smushed into one long, incoherent line when it should be formatted to be displayed properly. I need to check if there is code included in the text and if there is and it isn't formatted properly, complain to the user. Checking formatting is something I can handle, I just need to check the text for any Python code.
Any help would be appreciated.
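The ast idea mentioned above could be sketched roughly as follows, checking one fragment (for example a line or a sentence) at a time. This is only a heuristic under stated assumptions: `looks_like_python` is a made-up name, it uses the grammar of whichever Python version runs it, and it will miss code that has been merged with prose on the same line.

```python
import ast

# A bare word such as "Thanks" parses as a Name, so it must not count as code.
TRIVIAL = (ast.Name, ast.Constant)


def looks_like_python(fragment: str) -> bool:
    """Return True if the fragment parses as a non-trivial Python statement."""
    try:
        tree = ast.parse(fragment.strip())
    except (SyntaxError, ValueError):
        # Ordinary prose almost always fails to parse.
        return False
    for stmt in tree.body:
        if isinstance(stmt, ast.Expr) and isinstance(stmt.value, TRIVIAL):
            continue  # a lone word or literal is too weak a signal
        return True
    return False


def contains_code(text: str) -> bool:
    """Check each line of the text separately."""
    return any(looks_like_python(line) for line in text.splitlines())
```

For instance, `looks_like_python("x = 1")` is true, while an English sentence such as "Any help would be appreciated." fails to parse and is rejected.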
I'm writing a program that requires input in the form of a document, it needs to replace a few values, insert a table, and convert it to PDF. It's written in Python + Qt (PyQt). Is there any well known document standard which can be easily used programmatically? It must be cross platform, and preferably open.
I have looked into Microsoft Doc and Docx, which are binary formats and I can't edit them. Python has bindings for it, but they're only on Windows.
Open Office's ODT/ODF is XML inside a zip archive, so I can edit that one, but there are no command-line utilities or any other way to programmatically convert the file to a PDF. Open Office provides bindings, but you need to run Open Office from the command line, start a server, etc. And my clients may not have Open Office installed.
RTF is readable from Python, but I couldn't find any way/libraries to convert RTF documents to PDF.
At the moment I'm exporting from Microsoft Word to HTML, replacing the values and using PyQt to convert it to a PDF. However, it loses formatting features and looks awful. I'm surprised there isn't a well-known library which lets you edit a variety of document formats and convert them into other formats. Am I missing something?
Update: Thanks for the advice, I'll have a look at using LaTeX.
Thanks,
Jackson
Have you looked into using LaTeX documents?
They are perfect to use programmatically (compiling documents? You gotta love that...), and there are several Python frameworks you can use, such as plasTeX and PyTex.
Exporting a LaTeX document to PDF is almost immediate.
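To illustrate the "replace a few values, insert a table" part with LaTeX, here is a minimal sketch that only uses the standard library to generate the source text. The template, the `build_latex` name, and the sample items are assumptions for this example, and values are not escaped for LaTeX special characters such as `&` or `%`.

```python
from string import Template

# A tiny LaTeX template; $name and $rows are filled in by build_latex().
DOC = Template(r"""\documentclass{article}
\begin{document}
Dear $name,

\begin{tabular}{lr}
Item & Price \\
$rows
\end{tabular}
\end{document}
""")


def build_latex(name: str, items: list) -> str:
    """Return LaTeX source with the name and one table row per item."""
    # Each row ends with a LaTeX line break (two backslashes).
    rows = "\n".join(f"{label} & {price:.2f} \\\\" for label, price in items)
    return DOC.substitute(name=name, rows=rows)


source = build_latex("customer", [("Widget", 3.5), ("Gadget", 12.0)])
```

The resulting string can be written to a .tex file and compiled with a TeX distribution's `pdflatex`, for example through `subprocess.run`, assuming one is installed on the machine.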
Since you're already using PyQt anyway, it might be worth looking at Qt's built-in rich text processing module, which looks decent. Here's the documentation on detailed content manipulation including inserting tables. Also, the QPrinter module's default print-to-file format happens to be PDF.
Without knowing more about your particular needs it's hard to say if these would do what you want, but since your application already has PyQt as a dependency, it seems silly to introduce any more without evaluating the functionality you've already got available.
The non-GUI parts of the Qt framework are often overlooked though.
edit: included more links.
You might want to try ReportLab. The open source version can write PDFs, and the commercial version has a lot of really nice abstractions to allow output to a variety of different formats from a single input.
I don't know what kind of audience your program has; TeX is good and I would go with it.
Another possible choice is the Excel format, parsing it with xlrd.
I've used it a couple of times and it's pretty straightforward.
An Excel file is a good choice for the following reasons:
It is a well-known format that is easy to edit
You could prepare a predefined template with constraints and a table
Another option is creating XML documents, transforming them to XSL-FO and rendering them with FOP or RenderX. If you use DocBook as the primary input, there are toolchains freely available for converting that to PDF, RTF, HTML and so forth.
It is rather quirky to use and not my idea of fun, but it does deliver and can be embedded in an application, AFAICT.
Creating DocBook is very straightforward, as it has a wide range of semantic tags, table support, etc. to give a "meaningful" markup which can be reliably formatted. The XSL stylesheets are modular and allow parts to be customized or replaced to generate your own look and feel.
It works well for relatively free-flowing documents with lots of text.
For fill-in-the-blanks kinds of documents, a regular reporting engine may be a better fit, or some straightforward XSL stylesheets spitting out the XSL-FO directly.
I have a medium sized Django project, (running on AppEngine if it makes any difference), and have all the strings living in .po files like they should.
I'm seeing strange behavior where certain strings just don't translate. They show up in the .po file when I run make_messages, with the correct file locations marked where my {% trans %} tags are. The translations are in place and look correct compared to other strings on either side of them. But when I display the page in question, about 1/4 of the strings simply don't translate.
Digging into the relevant generated .mo file, I don't see either the msgid or the msgstr present.
Has anybody seen anything similar to this? Any idea what might be happening?
trans tags look correct
.po files look correct
no errors during compile_messages
Ugh. Django, you're killing me.
Here's what was happening:
http://blog.e-shell.org/124
For some reason only Django knows, it decided to decorate some of my translations with the comment '#, fuzzy'. It seems to have chosen which ones to mark randomly.
Anyway, '#, fuzzy' means this: "don't translate this, even though here's the translation:"
I'll leave this here in case some other poor soul comes across it in the future.
The fuzzy marker is added to the .po file by makemessages. When you have a new string (with no translation), it looks for similar strings and includes their translation, with the fuzzy marker. The marker means the match is crude, so it should not be displayed to the user, but it could be a good start for the human translator.
It isn't a Django behavior, it comes from the gettext facility.
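To find the affected strings quickly, a small script can list every entry that carries the fuzzy flag. This is a minimal sketch with assumptions: `find_fuzzy_msgids` is a made-up helper, the sample entries and template paths are invented, and it only reads single-line msgid values (multi-line msgids would need a real .po parser).

```python
def find_fuzzy_msgids(po_text: str) -> list:
    """Return the msgid of every entry flagged '#, fuzzy'."""
    fuzzy = []
    flagged = False
    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith("#,"):
            # Flag lines look like "#, fuzzy" or "#, fuzzy, python-format".
            flags = [flag.strip() for flag in line[2:].split(",")]
            flagged = "fuzzy" in flags
        elif line.startswith("msgid "):
            if flagged:
                fuzzy.append(line[len("msgid "):].strip('"'))
            flagged = False  # the flag applies only to the next entry
    return fuzzy


# Hypothetical .po content for illustration.
sample = '''
#: templates/home.html:12
#, fuzzy
msgid "Sign in"
msgstr "Iniciar"

#: templates/home.html:15
msgid "Sign out"
msgstr "Salir"
'''

fuzzy_entries = find_fuzzy_msgids(sample)
```

Once a translator has reviewed a fuzzy entry, deleting its '#, fuzzy' line and recompiling should make the string appear in the .mo file.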