Is it possible to edit doc files with Python? - python

I have a set of .doc files that I want to make some simple changes to (e.g. set the font of all the text in each file to Arial).
I don't want to do all the operations manually, so I thought I would try to automate it with a Python script. Is it a complicated task? How is it done?
I use Python 3.

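The python-docx module should be helpful. Note that it reads and writes .docx files, so legacy .doc files need to be converted to .docx first.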
(2nd time this question was asked today!)
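A minimal sketch of the font change, assuming the third-party python-docx package is installed (`pip install python-docx`) and the files have already been converted to .docx. The function name `set_font_everywhere` is hypothetical. It only visits body paragraphs, so text in tables, headers, and footers would need extra loops.

```python
def set_font_everywhere(src, dst, font_name="Arial"):
    """Set the font of every run in the body paragraphs of a .docx file."""
    # Imported here so this module still loads if python-docx is missing.
    from docx import Document

    doc = Document(str(src))
    for paragraph in doc.paragraphs:
        for run in paragraph.runs:
            run.font.name = font_name
    doc.save(str(dst))
```

Calling `set_font_everywhere("report.docx", "report_arial.docx")` (file names are placeholders) writes a modified copy and leaves the original untouched; looping over a folder with `pathlib.Path.glob("*.docx")` would cover a whole set of files.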

Related

How to create a dynamic form with python using translated text as input?

I have an original text that I want to translate. I normally do it manually, but I know I could save a lot of time by automatically translating the most frequent words and expressions.
I will find out how to translate simple words; that is not the problem. I have read some books on Python and I think it can be done with string manipulation.
But I am lost about how to create the output file.
The output file will contain:
short empty forms ready to be filled wherever there is text that has not been translated
the translated words wherever they were in the original file
I will fill in the empty forms manually in the output file; after pressing Tab, the cursor should jump to the next empty form.
I am lost here. I know how to make forms in HTML, but the language I am used to is Python.
I would like to know which Python modules I could use. I need some guidance on this.
Can you recommend a book or a tool that explains how to do something similar?
This is what I want to do, assuming I have managed to create a simple database to translate colors from Spanish to English.
The first step contains the original file.
The second step contains the automatic translation.
In the third step I complete the manual translation.
After finishing, everything is grouped into a normal txt file ready to be used.
I think it is quite clear. I don't expect people to write the code for me; I just need to know what tools could be used to achieve my goal.
Thanks for editing.
To create an interface that works in a web browser, Flask is a good Python option for building web forms, and tutorials are available.
One way to store the data would be an SQLite file. That may be more than you need, so I'd recommend starting with a CSV file. Python's standard library includes modules for both (csv and sqlite3).
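As a rough illustration of the CSV suggestion, the sketch below loads a glossary with the standard csv module, translates the words it knows, and leaves a blank for the rest. The glossary contents, column names, and the `load_glossary` and `translate` helpers are all made up for the example; a real glossary would be read from a file on disk.

```python
import csv
import io

# Hypothetical glossary; in practice this would live in a .csv file on disk.
GLOSSARY_CSV = """spanish,english
rojo,red
verde,green
azul,blue
"""


def load_glossary(csv_file):
    """Read a two-column CSV into a {source_word: translation} dict."""
    reader = csv.DictReader(csv_file)
    return {row["spanish"]: row["english"] for row in reader}


def translate(text, glossary, blank="____"):
    """Replace known words and leave a blank for every unknown word."""
    return " ".join(glossary.get(word.lower(), blank) for word in text.split())


glossary = load_glossary(io.StringIO(GLOSSARY_CSV))
print(translate("rojo amarillo azul", glossary))  # prints: red ____ blue
```

In a Flask form, each blank could become a text input instead of a placeholder string, which gives the Tab-to-next-field behaviour for free in the browser.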

Data storage for standalone python application

I want to make a Python program (with a PyQt GUI, but I don't know whether that is relevant) that has to save some information that I want to keep even after the program closes. Examples of information I want to store:
1. The user can search for a file in a file dialog window. I want to start the file dialog window in the previously used directory, even if the program is closed in between file searches.
2. The user can enter their own categories to sort items, building up on some of my predefined categories. These new categories should be available the next time the program starts.
Now I'm wondering what the proper way to store such information is. Should I use pickle? A proper database (I know a tiny bit of sqlite3, but would have to read up on that)? A simple text file that I parse myself? One thing for data like in example 1., another for data like in example 2.?
Also, whatever way to store it I use, where would I put that file?
I'm asking in the context that I might want to later make my program available to others as a standalone application (using py2app, py2exe or PyInstaller).
Right now I'm just saving a pickle file in the directory that my .py file is in, like this answer recommends, but the answer also specifically mentions:
for a personal project it might be enough.
(emphasis mine)
Is using pickle also the "proper, professional" way, if I want to make the program available to other people as a standalone application?
The choice depends on how you want the stored data to be treated. Which of these applies to you?
The user should be able to alter it without using my program.
The user should be prevented from altering it with any program other than mine.
In the first case, you might consider JSON, an open-standard file format for which Python ships a ready-made library called json. You get text (which you can save to a file) that is human-readable and can be edited in a text editor. There are also JSON viewers and editors that make viewing and editing JSON files easier.
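A minimal sketch of the JSON approach using only the standard library. The helper names `load_settings` and `save_settings` and the setting keys are assumptions made for the example, not part of any library.

```python
import json
from pathlib import Path


def load_settings(path, defaults):
    """Return saved settings merged over defaults; use defaults if unreadable."""
    try:
        saved = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        saved = {}
    if not isinstance(saved, dict):
        # A hand-edited file might hold something other than a JSON object.
        saved = {}
    return {**defaults, **saved}


def save_settings(path, settings):
    """Write settings as indented, human-editable JSON."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

On startup the program would call `load_settings` with defaults such as `{"last_dir": "", "categories": []}`, and call `save_settings` whenever the user changes directory or adds a category. Falling back to the defaults keeps the program usable even if the user breaks the file while editing it by hand.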
I think SQLite3 is the better solution in this case, as Moldovan commented.
One problem with pickle is that the pickling format can change across Python versions, and sqlite3 offers other advantages as well.
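For comparison, here is a small sketch of storing the user-defined categories with the standard sqlite3 module. The table name, column name, and helper functions are assumptions chosen for the example.

```python
import sqlite3


def open_store(db_path):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS categories (name TEXT PRIMARY KEY)")
    return conn


def add_category(conn, name):
    # INSERT OR IGNORE keeps the call harmless for names already stored.
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO categories (name) VALUES (?)", (name,))


def list_categories(conn):
    rows = conn.execute("SELECT name FROM categories ORDER BY name")
    return [row[0] for row in rows]


conn = open_store(":memory:")  # pass a file path instead to persist between runs
for name in ("music", "work", "music"):
    add_category(conn, name)
print(list_categories(conn))  # prints: ['music', 'work']
```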

Search office documents for string python

I've been looking for a fast and relatively easy way of searching (grep-ish) for user-defined strings in files of varying formats, i.e. xlsx, docx, pptx, and pdf, using Python.
My research has led me to believe that there might not be a convenient way of doing this with a single module or similar. Am I forced to use a separate module for each file type? And if so, are these appropriate?
docx
openpyxl
pptx
slate
I also looked at decompressing the files to get to the XML files containing the actual text, but that seems unwieldy. I just want to be sure that there is no simple, uniform way of handling all of these different file types.
Well, I've mostly figured it out. In the end I decided to use PowerShell combined with "itextsharp.dll" to process the files. It turned out to be simpler than using Portable Python. Thanks for the answers:-)
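For reference, the decompression route mentioned in the question can be sketched with the standard library alone, because docx, xlsx, and pptx files are ZIP archives of XML parts. This is a crude sketch, not a full parser: the function name `ooxml_contains` is made up, the tag stripping is a plain regular expression, metadata parts are searched along with the body, and PDF files are not covered at all.

```python
import re
import zipfile


def ooxml_contains(path, needle):
    """Return True if needle occurs in the text of a .docx, .xlsx, or .pptx file."""
    needle = needle.lower()
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue
            xml = archive.read(name).decode("utf-8", errors="ignore")
            # Crude tag stripping: text split across runs is rejoined,
            # but XML entities such as &amp; are left undecoded.
            text = re.sub(r"<[^>]+>", "", xml)
            if needle in text.lower():
                return True
    return False
```

Because the tags are simply deleted, text from adjacent paragraphs or cells is joined without a separator, so an occasional false positive across a boundary is possible.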

Saving files openoffice Python

I'm working on a script to create an OpenOffice document. After this I want to save the file, maybe later also as a PDF. Google doesn't give me any information on how to do this.
My question here is: what method should be used to save an OpenOffice Writer document?
Thanks in advance!
You should look at this similar question, whose answer covers both MS Word and OpenOffice Writer (by the way, creating a Word file may be the easiest route, since OpenOffice can read it).
How can I create a Word document using Python?
Alexis
You can create an RTF file with pyrtf or its variants, and for PDF you can use reportlab. These are Python libraries for generating files, not for remotely controlling OpenOffice. There are other libraries for other formats.

copy paste from a file to another file in python webdriver

I am currently working on creating a template. My requirement is to copy the contents of a text document and paste them into the template I am creating.
I want to know a method in Python WebDriver to do so. I searched the web but did not find a solution. I found a similar issue, "Copy odt file to clipboard and paste to another file with python 3.2", but it has no solutions. Any help would be appreciated, as I have spent a lot of time on this particular task.
Thanks in advance !
This has little to do with WebDriver and more to do with Python: the core of the task is how to read an ODT file using Python, so WebDriver is not related to the question.
That said, there is a third-party library for working with Word .docx files, so give it a go:
https://github.com/mikemaccana/python-docx
There is also a COM-based library here that can interact with Word & Excel:
http://python.net/crew/pirx/spam7/
For OpenOffice-based files, you can automate OpenOffice itself to do whatever you are trying to do:
http://wiki.services.openoffice.org/wiki/Python
It depends on what type of text document (you only specified it was a 'text document') - if it is a simple .txt document this is very simple and easy.
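For the plain .txt case, a minimal sketch using only the standard library might look like this. The `$content` placeholder and the `fill_template` helper are assumptions made for the example; the real template format was not given in the question.

```python
from pathlib import Path
from string import Template


def fill_template(template_text, source_path):
    """Insert the contents of a text file at the $content placeholder."""
    content = Path(source_path).read_text(encoding="utf-8")
    # safe_substitute leaves any other $placeholders in the template untouched.
    return Template(template_text).safe_substitute(content=content)
```

The returned string can then be written to a new file with `Path(...).write_text(...)`.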
