Capturing PDF files using Python Selenium Webdriver

Capturing PDF files using Python Selenium Webdriver - python

We test an application developed in house using a python test suite which accomplishes web navigations/interactions through Selenium WebDriver. A tricky part of our web testing is in dealing with a series of pdf reports in the app. We are testing a planned upgrade of Firefox from v3.6 to v16.0.1, and it turns out that the way we captured reports before no longer works, because of changes in the directory structure of firefox's temp folder. I didn't write the original pdf capturing code, but I will refactor it for whatever we end up using with v16.0.1, so I was wondering if there' s a better way to save a pdf using Python's selenium webdriver bindings than what we're currently doing.
Previously, for Firefox v3.6, after clicking a link that generates a report, we would scan the "C:\Documents and Settings\\Local Settings\Temp\plugtmp" directory for a pdf file (with a specific name convention) to be generated. To be clear, we're not saving the report from the webpage itself, we're just using the one generated in firefox's Temp folder.
In Firefox 16.0.1, after clicking a link that generates a report, the file is generated in "C:\Documents and Settings\ \Local Settings\Temp\tmp*\cache*", with a random file name, not ending in ".pdf". This makes capturing this file somewhat more difficult, if using a technique similar to our previous one - each browser has a different tmp*** folder, which has a cache full of folders, inside of which the report is generated with a random file name.
The easiest solution I can see would be to directly save the pdf, but I haven't found a way to do that yet.
To use the same approach as we used in FF3.6 (finding the pdf in the Temp folder directory), I'm thinking we'll need to do the following:
Figure out which tmp*** folder belongs to this particular browser instance (which we can do be inspecting the tmp*** folders that exist before and after the browser is instantiated)
Look inside that browser's cache for a file generated immedaitely after the pdf report was generated (which we can by comparing timestamps)
In cases where multiple files are generated in the cache, we could possibly sort based on size, and take the largest file, since the pdf will almost certainly be the largest temp file (although this seems flaky and will need to be tested in practice).
I'm not feeling great about this approach, and was wondering if there's a better way to capture pdf files. Can anyone suggest a better approach?
Note: the actual scraping of the PDF file is still working fine.

We ultimately accomplished this by clearing firefox's temporary internet files before the test, then looking for the most recently created file after the report was generated.

Related

Download local files off of a website using Selenium

I need to inspect the contents of this file (or download it and somehow search that way) named "(index)" (it doesn't have any extension if that matters) on my target website (reddit.com) to search through and find an access token to then use later in my script.
How would I go about doing this? Also, this seems like a roundabout way of doing it, so do you have any suggestions that would be better? For example, is there a command I can execute in the console with driver.execute_script("something") to then search the file for me?
I have tried driver.page_source, but, as expected it doesn't return anything involving the pages files.

Create direct file object in python

I have images in 100 folders and the search results are slow, so I want to access those images, so maybe I wanna do it with python(if it is faster), in the way that when we select all the files, and drag and drop them in windows. then I realized that drag and drop in windows uses Component Object Model this source.
So I want to know is there any way in python to have COMs of the image files in those 100 folders in the same place (a specific folder)? or in other words can we create COMs of other files, (equivalent of shortcuts), cause I know shortcuts for my purpose won't work.
The question in general is about how to access direct handles or COMs of files of different folders in one folder? if it's possible, please tell me how? to be simpler I want to have similar function of file shortcuts but not 'shortcuts' existing in windows, because for my purpose 'shortcuts' won't work, so I think it can be done with COMs.
tkinter equivalent question:
let me ask my question in other way, lets think I want to make a windows file search application in python with some library like tkinter, so one background part of my code finds the file paths of desired search results, and other part in gui('gui part'): wants to show the result files with ability of opening files from that gui or drag files from gui to other folder or applications, so how should I do the 'gui part'?
this tutorial suggested by #Thingamabobs is about getting external files into window(gui) of app, but I want the opposite, I mean having file handles to open, something like windows explorer
My question maybe wrong in case of misunderstanding the concept of COMs, so please provide me more relevant sources of use case of mine. finally if the title seems to be unsuitable, feel free to change it.

Based on an interpretation of the question, the following is an initial summary approach to a solution.
"""
This module will enable easy access to files spread across 100 plus
directories. A file should be as easy to open as clicking on a link.
Analysis:
Will any files be duplicated in any other directory? Do not know.
Will any file name be the same as another file in a different directory? Do
not know.
Initial design in pseudocode:
> Capture absolute path to each file in each directory.
> Store files information in python data structure
> for instance a list of tuples <path>,<filename>
> Once a data structure is determined use Tkinter, ttk.treeview to open a
file as easy as clicking on a link in the tree.
"""

How to automate SAS enterprise guide reports with Python Script?

I tried with SASpy but it's not working. I am able to open the SAS .egp file but not able to run the multiple scripts within in sequence.
import os, sys, subprocess
def OpenProject(sas_exe, egp_path):
sasExe = sas_exe
sasEGpath = egp_path
subprocess.call([sasExe, sasEGpath])
sas_exe = path\path\
egp_path = path\path\path\
OpenProject(sas_exe, egp_path)

This depends a bit on exactly what the workflow is. A few side notes, then the full solution.
First: EGP is not really intended to store production processes, in my opinion. EGP should really be used for development, then production is done with .sas (text) files. EGP can directly store the nodes as .sas files; ask a new question about that if you want to know more, but it's pretty easy to figure out. Best practice is to have EGP save the code modules as .sas files, then run those - SASPy will easily do that for you.
Second: If you use SAS's built-in Git connectivity, then you can do this a bit more easily I suspect. Consider doing that if you already use Git for your other processes. Again, then you end up with a .sas file, and can directly run that via SASPy.
So: how can you do this in Python, with the assumption you do have to use the .egp itself, without too many different moving parts? The key here is the .egp format. EGP is a container file, which is actually a .zip format container that has in it, among other things, all of the SAS code you want to run, as text. Text in xml format, but still, text.
You can write a python program that opens the .egp as a .zip file, using the zipfile library, and then use xml.etree.ElementTree to parse the project.xml file inside that project. Exactly what you do from there depends on your particular details, and is well out of scope for a Stack Overflow answer, but if you do better visually you can simply rename the .egp to .zip and then open in unzip program of your choice, then browse project.xml in your text editor, and find the nodes and code related to those nodes.
You can then extract the .sas code as text, and submit it directly via SASPy, or extract it to a .sas file and then submit that however you prefer (SASPy or something else).
I do something similar to this for a project - I don't actually run code from it, I'm just parsing it to verify that the correct programs were synced from the EGP to production - but it would be trivial to actually submit the code from what I've written, which is about 50 lines of code total. I may write a SGF paper this year or next year on this topic, in which case I'll try and remember to submit it here - or you can head over to my github page and see if it's there (in the future!).

web2py site doesn't load all images/videos (especially larger ones)

First of all, I must say I have seen something similar to this in the web2py discussion group, but I couldn't understand it very well.
I've set up a database-driven website using web2py in which the entries are just HTML text. Most of them will contain img and/or video tags that point to relative URLs; these files are stored in folders with the address pattern static/content/article/<article-name> and the document's base href is set via the controller to make these links work. So, the images are stored and referenced directly, without all the upload/download machinery.
I'm testing it locally and using Rocket server because I'm not allowed to install Apache in this PC.
The problem:
Everything works fine, except, as it seems, when there are several "large" files being requested. By "large" I mean 4Mb files were enough, which isn't really a lot (and I think slightly smaller files would produce the same result). I'm pretty sure the links aren't broken since 1) by copying/pasting their URLs in the browser they show up normally, 2) the images/videos appear well/broken randomly when I refresh the page and 3) sometimes a video loads until a certain point and then stops, and the browser inspector shows a 'fail' signal. When I replaced these files with smaller ones (each with a dozen kb), all of them loaded. Another thing to consider is that sometimes it takes a really long time until the page finishes loading (from 2 seconds to several minutes).
The questions:
Is this the simplest/optimal way of getting the job done? I'm aware that web2py has some neat features like upload fields, but I don't know how I could make these files be effortlessly referenced in the document, considering there will be some special features in such pages involving the static files. So the solution I've come up with so far was to create a directory which name equals to the entry's and store the files there, as I said before. Is it an overkill considering what web2py has to offer?
If the answer to the first question is something like "yes", then (obvious question) what may be causing the problem and how do I fix it? Does it have something to do with the fact that web2py sends static files in chunks of 1Mb? Might it be the Rocket server? Or because I'm testing it locally?
Thanks in advance!

It's hard to give you an answer without knowing some details...
Where is hosted your Web2py application?
Do you use apache? nginx?
Did you deploy using a one step-deploy script? (http://web2py.googlecode.com/hg/scripts/setup-web2py-ubuntu.sh)
But in any case, you can (should) :
Configure Apache/Nginx to serve your static files directly (files in /YourApp/static/.). See "setup-web2py-" scripts in the "scripts" folder for more informations
Use scripts/zip_static_files.py to create gzipped versions of your static files. You can create a cron to run "python web2py.py -S myapp -R scripts/zip_static_files.py"
More details about efficiency in the book : http://web2py.com/books/default/chapter/29/13/deployment-recipes?search=static+files#Efficiency-and-scalability

copy paste from a file to another file in python webdriver

Currently i am working on creating a template, my requirement is I should copy contents from a text document and paste it in the template which i am creating.
I want to know a method in python webdriver to do so, i searched in the web but ended up without finding a solution, i found a similar issue Copy odt file to clipboard and paste to another file with python 3.2> here but no solutions, any help will be grateful to me as i spent more time on this particular task.
Thanks in advance !

This is not much to do with the webdriver, but more to do with python. As in, how do you read an ODT file using Python? That is the core of what you are doing, so webdriver is not related to the question.
With that said, there is a standard library for this, so give it a go, this can interact with all MS Office and Open Office files:
https://github.com/mikemaccana/python-docx
There is also a COM-based library here that can interact with Word & Excel:
http://python.net/crew/pirx/spam7/
If it's OpenOffice based files, there is the ability to automate Open Office itself for whatever you are trying to do:
http://wiki.services.openoffice.org/wiki/Python
It depends on what type of text document (you only specified it was a 'text document') - if it is a simple .txt document this is very simple and easy.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.