I am currently working on creating a template. My requirement is to copy the contents of a text document and paste them into the template I am creating.
I want to know a method in Python WebDriver to do so. I searched the web but did not find a solution. I found a similar question, "Copy odt file to clipboard and paste to another file with python 3.2", here, but it has no solutions. Any help would be appreciated, as I have already spent a lot of time on this particular task.
Thanks in advance !
This has little to do with WebDriver and more to do with Python. The real question is: how do you read an ODT file using Python? That is the core of what you are doing, so WebDriver is not really related to the question.
With that said, there is a third-party library for this, so give it a go. It works with Microsoft Word .docx files:
https://github.com/mikemaccana/python-docx
There is also a COM-based library here that can interact with Word & Excel:
http://python.net/crew/pirx/spam7/
If the files are OpenOffice-based, you can automate OpenOffice itself for whatever you are trying to do:
http://wiki.services.openoffice.org/wiki/Python
It depends on what type of text document you have (you only said it was a 'text document'). If it is a plain .txt file, this is very simple.
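For the plain .txt case, a minimal sketch could look like the following. The helper name read_text is made up for illustration, and UTF-8 is an assumed encoding. The returned string is what you would then hand to WebDriver, for example through an element's send_keys method.

```python
from pathlib import Path


def read_text(path, encoding="utf-8"):
    """Return the full contents of a plain-text file as one string.

    The encoding default is an assumption; adjust it to match the file.
    """
    return Path(path).read_text(encoding=encoding)


# With Selenium, the string could then be typed into a form field, e.g.:
#     element.send_keys(read_text("notes.txt"))
# where "notes.txt" and `element` are placeholders for your own file and
# an element you have already located.
```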
I'm looking for a way to convert Excel to HTML while preserving formatting.
I know this is doable on Windows thanks to some underlying win32 libraries (e.g. via xlwings; see "Python - Excel to HTML (keeping format)").
But I'm looking for a solution on Linux.
I've also come across Aspose Cells, but it requires a paid license; without one it adds a lot of extra junk to the output that needs to be scrubbed out.
Lastly, I tried the Python library xlsx2html, but it does a very poor job of preserving formatting.
Are there any suggestions for a Linux-based solution? I'd also be interested in tools written in other languages that can easily be wrapped in Python.
Thanks in advance!
Update:
Here is an example of a random Excel sheet, converted via Excel itself, that I would like to reproduce. It has some colors, some border variations, some merged cells and some font sizes, to see whether they all work.
You can use LibreOffice to convert an Excel file to a HTML file using the command line:
# --convert-to implies --headless so it's not mandatory to specify --headless
soffice --headless --convert-to html data.xlsx
You can refer to the documentation to know more about other CLI parameters.
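Since the question also asks for something that can be wrapped in Python, the same command can be driven through subprocess. This is only a sketch: the function names are made up, it assumes soffice is on the PATH, and it adds the --outdir option to control where the HTML file lands.

```python
import subprocess
from pathlib import Path


def build_convert_command(xlsx_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for an Excel-to-HTML conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(xlsx_path),
    ]


def convert_to_html(xlsx_path, out_dir="."):
    """Run the conversion and return the path of the expected HTML file."""
    subprocess.run(build_convert_command(xlsx_path, out_dir), check=True)
    # LibreOffice keeps the base name and swaps the extension.
    return Path(out_dir) / (Path(xlsx_path).stem + ".html")
```

Calling convert_to_html("data.xlsx", "out") would run the command above and point at out/data.html, provided LibreOffice is installed.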
I think you should search for Excel-to-HTML tools in the JS world rather than in Python (I am not saying it is impossible in Python, but it is more common in JS). I promise you will get better results.
In my opinion, finding a JS-based solution and writing a Python wrapper around it would be more helpful, because the JS community has had to work harder than other communities at importing and working with Excel files.
Another idea is to change your approach: look at how you can embed an Excel file in an HTML page (or in an iframe) with JS and then export it.
But again, I highly recommend checking JS libraries and GitHub repositories; some of them take care to preserve formatting.
I have looked all over for an answer and can't find what I'm looking for, although I'm sure it is simple.
I created a docx document with python-docx and tkinter. Everything works well there; now I want to open that file as soon as it is saved. The only problem is that I have no idea how to go about doing this. My first thought was...
f = open("path\\"+name_created_by_python+".docx", mode="r")
But, as some (or all) of you know, that doesn't physically open a document. Any suggestions?
UPDATE/CLARIFICATION: I want the code to save the doc (which it does) and then immediately open it up so the user can view it without having to physically go to the folder it is located. I suppose the code above "opens" it but doesn't really open it in the way that I want or need.
Thanks in advance
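So I found what I was looking for. Below is the code required on Windows.
import os
# os.startfile exists on Windows only; it opens the file with its associated application.
os.startfile(os.path.join("path", filename + ".docx"))
Here filename is the variable that holds the specific name I gave my docx document.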
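os.startfile only exists on Windows. If the script ever has to run elsewhere, a rough cross-platform sketch could fall back to the open command on macOS and xdg-open on Linux desktops. The function names here are made up, and the fallback assumes those commands are installed.

```python
import os
import subprocess
import sys


def opener_command(path, platform=sys.platform):
    """Return the external command that opens `path`, or None on Windows."""
    if platform.startswith("win"):
        return None  # Windows uses os.startfile instead of a command
    if platform == "darwin":
        return ["open", path]  # macOS
    return ["xdg-open", path]  # assumed to exist on most Linux desktops


def open_document(path):
    """Open a file with the application associated with its type."""
    command = opener_command(path)
    if command is None:
        os.startfile(path)  # only available on Windows
    else:
        subprocess.run(command, check=True)
```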
We test an application developed in house using a Python test suite that performs web navigation and interactions through Selenium WebDriver. A tricky part of our web testing is dealing with a series of pdf reports in the app. We are testing a planned upgrade of Firefox from v3.6 to v16.0.1, and it turns out that the way we captured reports before no longer works, because of changes in the directory structure of Firefox's temp folder. I didn't write the original pdf capturing code, but I will refactor it for whatever we end up using with v16.0.1, so I was wondering if there's a better way to save a pdf using Python's Selenium WebDriver bindings than what we're currently doing.
Previously, for Firefox v3.6, after clicking a link that generates a report, we would scan the "C:\Documents and Settings\\Local Settings\Temp\plugtmp" directory for a pdf file (with a specific name convention) to be generated. To be clear, we're not saving the report from the webpage itself, we're just using the one generated in firefox's Temp folder.
In Firefox 16.0.1, after clicking a link that generates a report, the file is generated in "C:\Documents and Settings\ \Local Settings\Temp\tmp*\cache*", with a random file name, not ending in ".pdf". This makes capturing this file somewhat more difficult, if using a technique similar to our previous one - each browser has a different tmp*** folder, which has a cache full of folders, inside of which the report is generated with a random file name.
The easiest solution I can see would be to directly save the pdf, but I haven't found a way to do that yet.
To use the same approach as we used in FF3.6 (finding the pdf in the Temp folder directory), I'm thinking we'll need to do the following:
Figure out which tmp*** folder belongs to this particular browser instance (which we can do by inspecting the tmp*** folders that exist before and after the browser is instantiated)
Look inside that browser's cache for a file generated immediately after the pdf report was generated (which we can do by comparing timestamps)
In cases where multiple files are generated in the cache, we could possibly sort based on size, and take the largest file, since the pdf will almost certainly be the largest temp file (although this seems flaky and will need to be tested in practice).
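The three steps above can be sketched roughly as follows. The function names are hypothetical, the listings would come from something like os.listdir on the Temp folder, and modification time stands in for "generated after", which is an assumption.

```python
import os


def new_entries(before, after):
    """Names present in the `after` listing but not in `before`."""
    return sorted(set(after) - set(before))


def largest_file_since(root, since):
    """Return the largest file under `root` modified at or after `since`.

    `since` is a POSIX timestamp, e.g. time.time() taken just before
    clicking the report link. Returns None if no file qualifies.
    """
    best, best_size = None, -1
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            if info.st_mtime >= since and info.st_size > best_size:
                best, best_size = path, info.st_size
    return best
```

Listing the Temp folder before and after starting the browser and passing both lists to new_entries gives the candidate tmp*** folder, and largest_file_since then applies the timestamp and size heuristics inside it.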
I'm not feeling great about this approach, and was wondering if there's a better way to capture pdf files. Can anyone suggest a better approach?
Note: the actual scraping of the PDF file is still working fine.
We ultimately accomplished this by clearing firefox's temporary internet files before the test, then looking for the most recently created file after the report was generated.
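A minimal sketch of the "most recent file" lookup, assuming modification time is an acceptable stand-in for creation time and using a made-up helper name:

```python
from pathlib import Path


def newest_file(root):
    """Return the most recently modified file under `root`, or None."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

Because the temporary files are cleared before the test, the newest file found after the report link is clicked should be the report itself.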
Problem
On the Mac OS X platform, I would like to write a script, either in Python or Tcl to search for text within a PDF file and extract the relevant parts. I appreciate any help.
Background
I am writing scripts to look inside a PDF to determine whether it is a bill, from which company, and for what period. Based on this information, I rename the PDF and move it to an appropriate directory. For example, a file such as Statement_03948293929384.pdf might become 2012-07-15 Water Bill.pdf and be moved to my Utilities folder.
What have I done so far?
I have searched for PDF-to-plain-text tools, but have not found anything yet
I have looked into the Tcl wiki and found an example, but could not get it to work (I searched for text in the PDF, but it was not found).
I am looking into pdf-parser.py by Didier Stevens
I heard of a Python package called pyPdf and will look at it next.
Update
I have found a command-line tool called pdftotext, written by Glyph & Cog, LLC and built and packaged by Carsten Bluem. This tool is straightforward and it solves my problem. I am still on the lookout for tools that can search a PDF directly, without having to convert it to a text file.
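For the Python side, pdftotext can be called through subprocess and the result searched with regular expressions. This is only a sketch: it assumes pdftotext is on the PATH, and the keyword table and function names are invented for illustration.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text ("-" sends it to stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Hypothetical keyword table: label -> pattern that identifies that bill.
BILL_PATTERNS = {
    "Water Bill": re.compile(r"water", re.IGNORECASE),
    "Electric Bill": re.compile(r"electric", re.IGNORECASE),
}


def classify(text):
    """Return the first label whose pattern occurs in `text`, else None."""
    for label, pattern in BILL_PATTERNS.items():
        if pattern.search(text):
            return label
    return None
```

classify(pdf_to_text("Statement_03948293929384.pdf")) would then give the label to use when renaming the file. Extracting the billing period would need a further pattern that depends on how each company prints its dates.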
I have successfully used PyODConverter to convert to/from PDFs (there is also a more powerful Java version). Once you have the PDF converted to text it should be trivial to do the searching. Also I believe iText should be capable of doing similar things, but I haven't tested it.
I'm working on a script to create an OpenOffice document. After that I want to save the file, and maybe later also export it as a PDF. Google doesn't give me any information on how to do this.
My question here is: What method should be used to save an openoffice-writer document?
Thanks in advance!
You should look at this similar question, whose answer covers both MS Word and OOWriter (by the way, creating a Word file could be the easiest route, since OpenOffice can read it).
How can I create a Word document using Python?
Alexis
You can create an RTF file with pyrtf or its variants, and for PDF you can use reportlab. These are Python libraries that generate the files directly; they do not remote-control OpenOffice. There are other libraries for other formats.