I have an original text that I want to translate. I normally do it manually, but I know I could save a lot of time by automatically translating the most frequent words and expressions.
I will work out how to translate simple words; that is not the problem. I have read some books on Python, and I think it can be done with string manipulation.
But I am lost about how to create the output file.
The output file will contain:
short empty forms ready to be filled wherever there is text that has not been translated
the translated words wherever they were in the original file
In the output file I will fill in the empty forms manually; after pressing Tab, the cursor should jump to the next empty form.
I am lost here. I know how to make forms in HTML, but the language I am used to is Python.
I would like to know which Python modules I could use. I need some guidance on this.
Can you recommend a book or a tool that explains how to do something like this?
This is what I want to do, assuming I have managed to create a simple database to translate colors from Spanish to English.
The first step contains the original file.
The second step contains the automatic translation.
In the third step I complete the manual translation.
After finishing, everything is grouped into a normal txt file ready to be used.
I think it is quite clear. I don't expect people to write the code for me; I just need to know what tools could be used to achieve my goal.
Thanks for editing.
To build an interface that runs in a web browser, Flask is a good Python option for creating web forms. There are tutorials available.
One method for storing data would be an SQLite file. That may be more than you need, so I'd recommend starting with a CSV file. Libraries exist in Python for both CSVs and SQLite.
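As a rough illustration of the CSV suggestion, here is a minimal sketch using only the standard library. The column names (spanish, english), the sample words, and the helper names load_dictionary and to_form are all made up for this example. It translates known words and emits an HTML text input for every word that is not in the dictionary.

```python
import csv
import html
import io

# Hypothetical dictionary contents; in practice this would be a CSV file on disk.
CSV_DATA = "spanish,english\nrojo,red\nazul,blue\n"


def load_dictionary(csv_file):
    """Read a two-column CSV into a {source_word: translation} dict."""
    return {row["spanish"]: row["english"] for row in csv.DictReader(csv_file)}


def to_form(text, dictionary):
    """Replace known words with translations and unknown words with inputs."""
    parts = []
    for i, word in enumerate(text.split()):
        key = word.lower()
        if key in dictionary:
            parts.append(html.escape(dictionary[key]))
        else:
            # The untranslated word is shown as a placeholder inside the field.
            parts.append(
                '<input name="w%d" placeholder="%s" size="%d">'
                % (i, html.escape(word), max(len(word), 4))
            )
    return "<form>" + " ".join(parts) + "</form>"


dictionary = load_dictionary(io.StringIO(CSV_DATA))
page = to_form("el coche rojo", dictionary)
print(page)
```

Browsers move the focus to the next input when Tab is pressed, so no extra code is needed for that part. The generated markup could be served from a Flask view or simply written to an .html file and opened locally.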
I have chunks of text that may or may not contain Python code. I need a way to search the text for code and, if any is there, do something. I could easily search for specific strings that match a regex, but I need something general.
One thought I had was to run the text through ast, but that would require extracting every possible substring and submitting each of them to ast.
To be clear, the text comes from a Q&A forum for Python. Users frequently post code in their questions, but the code is all smushed into one long, incoherent line when it should be formatted to display properly. I need to check whether the text includes code and, if it does and it isn't formatted properly, complain to the user. Checking the formatting is something I can handle; I just need to check the text for any Python code.
Any help would be appreciated.
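One possible starting point for the ast idea is a per-line heuristic rather than a search over every substring. The sketch below is an assumption about what "looks like code" might mean: a line counts only if it parses and its tree contains a statement-like or call node, because a bare word such as "hello" is also valid Python. The node list and the function name looks_like_python are illustrative choices.

```python
import ast

# Node types treated as evidence of real code (an assumption; tune as needed).
CODE_NODES = (
    ast.Assign, ast.Call, ast.Import, ast.ImportFrom,
    ast.FunctionDef, ast.ClassDef, ast.For, ast.While, ast.If,
)


def looks_like_python(line):
    """Return True if the line parses as Python and contains a 'code-like' node."""
    try:
        tree = ast.parse(line.strip())
    except (SyntaxError, ValueError):
        # Most prose fails to parse; ValueError covers odd input such as null bytes.
        return False
    return any(isinstance(node, CODE_NODES) for node in ast.walk(tree))


def lines_with_code(text):
    """Return the lines of a post that the heuristic flags as code."""
    return [line for line in text.splitlines() if looks_like_python(line)]
```

This will miss code that is mixed with prose on the same line and multi-line constructs split across lines, so it is better treated as one signal among several than as a complete detector.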
After crawling many websites, I receive data with broken encoding from some of them. I can't do anything with it; I just need to detect it. For example, text like:
·ç¼wÃdª«¦Ê³f
or
ãà³n³¾å¢
How can I recognize text like that? The text can be in any language, so searching for non-English text is not an option. The only option I can think of is the guess-language module.
There's NLTK, which has a guess_encoding function that takes a byte string and tries all of the available encodings. Would this serve your purpose?
Take a look at https://github.com/LuminosoInsight/python-ftfy
If I understand correctly, it will attempt to 'repair' incorrectly encoded/decoded text.
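For detection only, one common kind of breakage can be checked with the standard library: UTF-8 bytes that were decoded as Latin-1. The sketch below is a narrow heuristic, not a general detector. It will not catch other encoding mix-ups, which the samples in the question may well be, and the function name is made up for this example.

```python
def looks_like_utf8_mojibake(text):
    """Guess whether text is UTF-8 that was wrongly decoded as Latin-1.

    If re-encoding as Latin-1 yields bytes that are valid UTF-8 and decode
    to something different, the original decoding was probably wrong.
    """
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (so it was not
        # produced by a Latin-1 decode) or the bytes are not valid UTF-8.
        return False
    return repaired != text


# Simulate the breakage: encode as UTF-8, then decode with the wrong codec.
broken = "café".encode("utf-8").decode("latin-1")
print(looks_like_utf8_mojibake(broken))
print(looks_like_utf8_mojibake("café"))
```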
I call a Python script from crontab. The script generates colored output using ANSI escapes, but when cron sends the mail with the output I see the escapes instead of colors.
What is happening is logical, but I would like to know whether it is possible to generate an HTML message instead.
I would like a solution that does not require implementing the email notification myself.
Maybe you can try a text-to-HTML converter, for example http://txt2html.sourceforge.net/. You could also use pygments with some modifications.
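If an external converter is not convenient, the conversion step itself is small enough to sketch by hand. The example below assumes the script only uses the basic foreground color codes (30 to 37) and the reset code (0); any other SGR code is silently dropped. The color names and the function name ansi_to_html are choices made for this illustration, and getting cron to send the result as an HTML message is a separate matter that is not covered here.

```python
import html
import re

# Basic foreground colors for SGR codes 30-37 (CSS names chosen for this sketch).
COLORS = {
    30: "black", 31: "red", 32: "green", 33: "olive",
    34: "blue", 35: "purple", 36: "teal", 37: "silver",
}
SGR = re.compile(r"\x1b\[([0-9;]*)m")


def ansi_to_html(text):
    """Convert basic ANSI color escapes to <span> tags inside a <pre> block."""
    out = []
    open_span = False
    pos = 0
    for match in SGR.finditer(text):
        # Emit the plain text before this escape, HTML-escaped.
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        codes = [int(c) for c in match.group(1).split(";") if c] or [0]
        for code in codes:
            # A reset or a new color closes the currently open span.
            if open_span and (code == 0 or code in COLORS):
                out.append("</span>")
                open_span = False
            if code in COLORS:
                out.append('<span style="color:%s">' % COLORS[code])
                open_span = True
    out.append(html.escape(text[pos:]))
    if open_span:
        out.append("</span>")
    return "<pre>" + "".join(out) + "</pre>"
```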
I am working on a project (in Python) that needs formatted, editable output. Since the end-user isn't going to be technically proficient, the output needs to be in a word processor editable format. The formatting is complex (bullet points, paragraphs, bold face, etc).
Is there a way to generate such a report using Python? I feel like there should be a way to do this using Microsoft Word/OpenOffice templates and Python, but I can't find anything advanced enough to get good formatting. Any suggestions?
A little-known and slightly evil fact: if you create an HTML file and stick a .doc extension on it, Word will open it as a Word document, and most users will be none the wiser.
Except maybe a very technical person will say, "My, this is a small Word file!" :)
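A minimal sketch of that trick, using only the standard library, might look like the following. The function name, the report title, and the bullet text are placeholders, and the file goes to a temporary directory purely for demonstration.

```python
import html
import os
import tempfile


def write_html_as_doc(path, title, bullets):
    """Write a small HTML report to a file whose name ends in .doc."""
    items = "".join("<li>%s</li>" % html.escape(b) for b in bullets)
    document = (
        "<html><head><meta charset='utf-8'><title>%s</title></head>"
        "<body><h1>%s</h1><ul>%s</ul></body></html>"
        % (html.escape(title), html.escape(title), items)
    )
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(document)
    return path


# Demonstration: the output path and report contents are hypothetical.
out_path = os.path.join(tempfile.mkdtemp(), "report.doc")
write_html_as_doc(out_path, "Weekly report", ["First point", "Second <point>"])
```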
Use the Python Docx module for this - 100% Python, tables, images, document properties, headings, paragraphs, and more.
"The formatting is complex (bullet points, paragraphs, bold face, etc.)"
Use RST.
It's trivial to produce, since it's plain text.
It's trivial to edit, since it's plain text with a few extra characters to provide structural information.
It formats nicely using a bunch of tools.
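To show how little is involved in producing it, here is a small sketch that builds an RST document with a title, paragraphs, and a bullet list using plain string handling. The function name and the sample content are invented for this example; bold text is just written inline with double asterisks.

```python
def make_rst(title, paragraphs, bullets):
    """Build a minimal reStructuredText document as a string."""
    # A title is a line of text underlined with '=' of the same length.
    lines = [title, "=" * len(title), ""]
    for paragraph in paragraphs:
        lines += [paragraph, ""]
    for bullet in bullets:
        lines.append("- " + bullet)
    lines.append("")
    return "\n".join(lines)


report = make_rst(
    "Report",
    ["Status is **good**."],
    ["one", "two"],
)
print(report)
```

The resulting text can then be handed to whichever docutils-based converter suits the target format.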
I know there is an odtwriter for docutils. You could generate your output as reStructuredText and feed it to odtwriter or look into what odtwriter is using on the backend to generate the ODT and use that.
(I'd probably go with generating RST output and then hacking odtwriter to output the stuff I want, contributing the fixes back to the project, because that's probably a whole lot easier than trying to render your stuff to ODT directly.)
I've used xlwt to create Excel documents using Python, but I haven't needed to write Word files yet. I've found this package, OOoPy, but I haven't used it.
Also you might want to try outputting html files and having the users open them in Word.
You can use QTextDocument, QTextCursor and QTextDocumentWriter in PyQt4. A simple example showing how to write to an ODT file:
>>> from PyQt4 import QtGui
>>> # Create a document object
>>> doc = QtGui.QTextDocument()
>>> # Create a cursor pointing to the beginning of the document
>>> cursor = QtGui.QTextCursor(doc)
>>> # Insert some text
>>> cursor.insertText('Hello world')
>>> # Create a writer to save the document
>>> writer = QtGui.QTextDocumentWriter()
>>> writer.supportedDocumentFormats()
[PyQt4.QtCore.QByteArray(b'HTML'), PyQt4.QtCore.QByteArray(b'ODF'), PyQt4.QtCore.QByteArray(b'plaintext')]
>>> # Pick the ODF entry from the list above (index 1 in this output)
>>> odf_format = writer.supportedDocumentFormats()[1]
>>> writer.setFormat(odf_format)
>>> writer.setFileName('hello_world.odt')
>>> writer.write(doc)  # Returns True if successful
True
QTextCursor can also insert tables, frames, blocks and images. More information at:
http://qt-project.org/doc/qt-4.8/qtextcursor.html
As a bonus, you can also print to a PDF file by using QPrinter.
I think OpenOffice has some Python bindings - you should be able to write OO macros in Python.
But I would use HTML instead. Word and OO.org are rather good at editing it, and you can write it from Python easily (although Word adds a lot of mess when it saves, which could complicate parsing the file in your Python app).