Attach a .xlsx file in PDF using python - python

Currently I am using Adobe Acrobat Pro for attaching excel file using the Comments tools present there, but since I have to work with almost hundreds of pdf therefore the manual process becomes a tedious task.
Therefore, I have been trying to automate the process wherein I can a attach a excel file in pdf using python, but as of now haven't found a single reference as how to start the same.
Is this process even possible to automate?

Related

how to get text from a downloadable .doc file without saving?

I'm trying to download a .doc file using requests.get() request (though I've heard about other methods - they all require saving too)
Is there any method I could use to extract the text from it (or even convert it into a .txt for example) straight away without saving it into a file?
I've tried passing request.raw into various conventors (docx2txt.process() for example) but I assume they all work with files, not with streams.
While the script is running the memory allocation are handled by the python interpreter but if you save the content to a file the memory allocated is different. This article can be helpful to you.
Link: article

how to use cache while working with a heavy excel file for extracting data using python

HI I have a rather huge excel file (.xlsx) and has got multiple tab that I need to access for various purpose. Every time I have to read from excel it slows the process down. is that any way I can load selected tabs to cache the first time I read the excel book? thanks

is it possible for multiple users to write data into a file (excel or text) using python

I am trying to get some info (which keeps changing) from different computers into a text file. Is it possible to write using python file by creating a text or excel file.
This is how I am planning
1. The file sits on a common location accessible by all the computers.
2. The python script runs in each computer and updates the file in the common location
3. Whenever I open, I should see the information that is updated.
Problems will arise when opening and saving files by multiple computers.
What happens when a computer opens a file, and saves data on top of another save?
The situation you described would be well suited to a database. A popular open-source and free database server is MySQL which can handle multiple connections and users reading and modifying the data using a language called SQL, and yes, python can connect to databases with this package.
And of course, you can export the database to a spreadsheet file to review on your computer.

Capturing PDF files using Python Selenium Webdriver

We test an application developed in house using a python test suite which accomplishes web navigations/interactions through Selenium WebDriver. A tricky part of our web testing is in dealing with a series of pdf reports in the app. We are testing a planned upgrade of Firefox from v3.6 to v16.0.1, and it turns out that the way we captured reports before no longer works, because of changes in the directory structure of firefox's temp folder. I didn't write the original pdf capturing code, but I will refactor it for whatever we end up using with v16.0.1, so I was wondering if there' s a better way to save a pdf using Python's selenium webdriver bindings than what we're currently doing.
Previously, for Firefox v3.6, after clicking a link that generates a report, we would scan the "C:\Documents and Settings\\Local Settings\Temp\plugtmp" directory for a pdf file (with a specific name convention) to be generated. To be clear, we're not saving the report from the webpage itself, we're just using the one generated in firefox's Temp folder.
In Firefox 16.0.1, after clicking a link that generates a report, the file is generated in "C:\Documents and Settings\ \Local Settings\Temp\tmp*\cache*", with a random file name, not ending in ".pdf". This makes capturing this file somewhat more difficult, if using a technique similar to our previous one - each browser has a different tmp*** folder, which has a cache full of folders, inside of which the report is generated with a random file name.
The easiest solution I can see would be to directly save the pdf, but I haven't found a way to do that yet.
To use the same approach as we used in FF3.6 (finding the pdf in the Temp folder directory), I'm thinking we'll need to do the following:
Figure out which tmp*** folder belongs to this particular browser instance (which we can do be inspecting the tmp*** folders that exist before and after the browser is instantiated)
Look inside that browser's cache for a file generated immedaitely after the pdf report was generated (which we can by comparing timestamps)
In cases where multiple files are generated in the cache, we could possibly sort based on size, and take the largest file, since the pdf will almost certainly be the largest temp file (although this seems flaky and will need to be tested in practice).
I'm not feeling great about this approach, and was wondering if there's a better way to capture pdf files. Can anyone suggest a better approach?
Note: the actual scraping of the PDF file is still working fine.
We ultimately accomplished this by clearing firefox's temporary internet files before the test, then looking for the most recently created file after the report was generated.

Generating correct excel xls format

I created a little script in python to generate an excel compatible xml file (saved with xls extension). The file is generated from a part database so I can place an order with the extracted data.
On the website for ordering the parts, you can import the excel file so the order fills automatically. The problem here is that each time I want to make an order, I have to open excel and save the file with xls extension of type MS Excel 97-2003 to get the import working.
The excel document then looks exactly the same, but when opened with notepad, we cannot see the xml anymore, only binary dump.
Is there a way to automate this process, by running a bat file or maybe adding some line to my python script so it is converted in the proper format?
(I know that question has been asked before, but it never has been answered)
There are two basic approaches to this.
You asked about the first: Automating Excel to open and save the file. There are in fact two ways to do that. The second is to use Python tools that can create the file directly in Python without Excel's help. So:
1a: Automating Excel through its automation interface.
Excel is designed to be controlled by external apps, through COM automation. Python has a great COM-automation interface inside of pywin32. Unfortunately, the documentation on pywin32 is not that great, and all of the documentation on Excel's COM automation interface is written for JScript, VB, .NET, or raw COM in C. Fortunately, there are a number of questions on this site about using win32com to drive Excel, such as this one, so you can probably figure it out yourself. It would look something like this:
import win32com.client
excel = win32com.client.Dispatch('Excel.Application')
spreadsheet = excel.Workbooks.Open('C:/path/to/spreadsheet.xml')
spreadsheet.SaveAs('C:/path/to/spreadsheet.xls', fileformat=excel.xlExcel8)
That isn't tested in any way, because I don't have a Windows box with Excel handy. And I vaguely remember having problems getting access to the fileformat names from win32com and just punting and looking up the equivalent numbers (a quick google for "fileformat xlExcel8" shows that the numerical equivalent is 56, and confirms that's the right format for 97-2003 binary xls).
Of course if you don't need to do it in Python, MSDN is full of great examples in JScript, VBA, etc.
The documentation you need is all on MSDN (since the Office Developer Network for Excel was merged into MSDN, and then apparently became a 404 page). The top-level page for Excel is Welcome to the Excel 2013 developer reference (if you want a different version, click on "Office client development" in the navigation thingy above and pick a different version), and what you mostly care about is the Object model reference. You can also find the same documentation (often links to the exact same webpages) in Excel's built-in help. For example, that's where you find out that the Application object has a Workbooks property, which is a Workbooks object, which has Open and Add methods that return a Workbook object, which has a SaveAs method, which takes an optional FileFormat parameter of type XlFileFormat, which has a value xlExcel8 = 56.
As I implied earlier, you may not be able to access enumeration values like xlExcel8 for some reason which I no longer remember, but you can look the value up on MSDN (or just Google it) and put the number 56 instead.
The other documentation (both here and elsewhere within MSDN) is usually either stuff you can guess yourself, or stuff that isn't relevant from win32com. Unfortunately, the already-sparse win32com documentation expects you to have read that documentation—but fortunately, the examples are enough to muddle your way through almost everything but the object model.
1b: Automating Excel via its GUI.
Automating a GUI on Windows is a huge pain, but there are a number of tools that make it a whole lot easier, such as pywinauto. You may be able to just use swapy to write the pywinauto script for you.
If you don't need to do it in Python, separate scripting systems like AutoIt have an even larger user base and even more examples to make your life easier.
2: Doing it all in Python.
xlutils, part of python-excel, may be able to do what you want, without touching Excel at all.

Categories

Resources