How to automate SAS enterprise guide reports with Python Script?

How to automate SAS enterprise guide reports with Python Script? - python

I tried with SASpy but it's not working. I am able to open the SAS .egp file but not able to run the multiple scripts within in sequence.
import os, sys, subprocess
def OpenProject(sas_exe, egp_path):
sasExe = sas_exe
sasEGpath = egp_path
subprocess.call([sasExe, sasEGpath])
sas_exe = path\path\
egp_path = path\path\path\
OpenProject(sas_exe, egp_path)

This depends a bit on exactly what the workflow is. A few side notes, then the full solution.
First: EGP is not really intended to store production processes, in my opinion. EGP should really be used for development, then production is done with .sas (text) files. EGP can directly store the nodes as .sas files; ask a new question about that if you want to know more, but it's pretty easy to figure out. Best practice is to have EGP save the code modules as .sas files, then run those - SASPy will easily do that for you.
Second: If you use SAS's built-in Git connectivity, then you can do this a bit more easily I suspect. Consider doing that if you already use Git for your other processes. Again, then you end up with a .sas file, and can directly run that via SASPy.
So: how can you do this in Python, with the assumption you do have to use the .egp itself, without too many different moving parts? The key here is the .egp format. EGP is a container file, which is actually a .zip format container that has in it, among other things, all of the SAS code you want to run, as text. Text in xml format, but still, text.
You can write a python program that opens the .egp as a .zip file, using the zipfile library, and then use xml.etree.ElementTree to parse the project.xml file inside that project. Exactly what you do from there depends on your particular details, and is well out of scope for a Stack Overflow answer, but if you do better visually you can simply rename the .egp to .zip and then open in unzip program of your choice, then browse project.xml in your text editor, and find the nodes and code related to those nodes.
You can then extract the .sas code as text, and submit it directly via SASPy, or extract it to a .sas file and then submit that however you prefer (SASPy or something else).
I do something similar to this for a project - I don't actually run code from it, I'm just parsing it to verify that the correct programs were synced from the EGP to production - but it would be trivial to actually submit the code from what I've written, which is about 50 lines of code total. I may write a SGF paper this year or next year on this topic, in which case I'll try and remember to submit it here - or you can head over to my github page and see if it's there (in the future!).

Related

Create direct file object in python

I have images in 100 folders and the search results are slow, so I want to access those images, so maybe I wanna do it with python(if it is faster), in the way that when we select all the files, and drag and drop them in windows. then I realized that drag and drop in windows uses Component Object Model this source.
So I want to know is there any way in python to have COMs of the image files in those 100 folders in the same place (a specific folder)? or in other words can we create COMs of other files, (equivalent of shortcuts), cause I know shortcuts for my purpose won't work.
The question in general is about how to access direct handles or COMs of files of different folders in one folder? if it's possible, please tell me how? to be simpler I want to have similar function of file shortcuts but not 'shortcuts' existing in windows, because for my purpose 'shortcuts' won't work, so I think it can be done with COMs.
tkinter equivalent question:
let me ask my question in other way, lets think I want to make a windows file search application in python with some library like tkinter, so one background part of my code finds the file paths of desired search results, and other part in gui('gui part'): wants to show the result files with ability of opening files from that gui or drag files from gui to other folder or applications, so how should I do the 'gui part'?
this tutorial suggested by #Thingamabobs is about getting external files into window(gui) of app, but I want the opposite, I mean having file handles to open, something like windows explorer
My question maybe wrong in case of misunderstanding the concept of COMs, so please provide me more relevant sources of use case of mine. finally if the title seems to be unsuitable, feel free to change it.

Based on an interpretation of the question, the following is an initial summary approach to a solution.
"""
This module will enable easy access to files spread across 100 plus
directories. A file should be as easy to open as clicking on a link.
Analysis:
Will any files be duplicated in any other directory? Do not know.
Will any file name be the same as another file in a different directory? Do
not know.
Initial design in pseudocode:
> Capture absolute path to each file in each directory.
> Store files information in python data structure
> for instance a list of tuples <path>,<filename>
> Once a data structure is determined use Tkinter, ttk.treeview to open a
file as easy as clicking on a link in the tree.
"""

Data storage for standalone python application

I want to make a python program (with a PyQt GUI, but I don't know whether that is relevant) that has to save some information that I want to store even when the program closes. Example for information I want to store:
The user can search for a file in a file dialog window. I want to start the file dialog window in the previously used directory, even if the program is closed in between file searches.
The user can enter their own categories to sort items, building up on some of my predefined categories. These new categories should be available the next time the program starts.
Now I'm wondering what the proper way to store such information is. Should I use pickle? A proper database (I know a tiny bit of sqlite3, but would have to read up on that)? A simple text file that I parse myself? One thing for data like in example 1., another for data like in example 2.?
Also, whatever way to store it I use, where would I put that file?
I'm asking in the context that I might want to later make my program available to others as a standalone application (using py2app, py2exe or PyInstaller).
Right now I'm just saving a pickle file in the directory that my .py file is in, like this answer reconmends, but the answer also specifically mentions:
for a personal project it might be enough.
(emphasis mine)
Is using pickle also the "proper, professional" way, if I want to make the program available to other people as a standalone application?

Choice depends on your approach to data you store, which is yours?:
user should be able to alter it without usage of my program
user should be prevented from altering it with program other than my program
If first you might consider deploying JSON open-standard file format, for which Python has ready library called json. In effect you get text (which you can save to file) which is human-readable and can be edited in text editor. Also there exist JSON file viewers and editors which made viewing/editing of JSON files easier.

I think SQLite3 is the better solution in this case as Moldovan commented.
There is a problem in pickle, sometimes pickling format can be change across python versions and there are greater advantages of using sqlite3.

Capturing PDF files using Python Selenium Webdriver

We test an application developed in house using a python test suite which accomplishes web navigations/interactions through Selenium WebDriver. A tricky part of our web testing is in dealing with a series of pdf reports in the app. We are testing a planned upgrade of Firefox from v3.6 to v16.0.1, and it turns out that the way we captured reports before no longer works, because of changes in the directory structure of firefox's temp folder. I didn't write the original pdf capturing code, but I will refactor it for whatever we end up using with v16.0.1, so I was wondering if there' s a better way to save a pdf using Python's selenium webdriver bindings than what we're currently doing.
Previously, for Firefox v3.6, after clicking a link that generates a report, we would scan the "C:\Documents and Settings\\Local Settings\Temp\plugtmp" directory for a pdf file (with a specific name convention) to be generated. To be clear, we're not saving the report from the webpage itself, we're just using the one generated in firefox's Temp folder.
In Firefox 16.0.1, after clicking a link that generates a report, the file is generated in "C:\Documents and Settings\ \Local Settings\Temp\tmp*\cache*", with a random file name, not ending in ".pdf". This makes capturing this file somewhat more difficult, if using a technique similar to our previous one - each browser has a different tmp*** folder, which has a cache full of folders, inside of which the report is generated with a random file name.
The easiest solution I can see would be to directly save the pdf, but I haven't found a way to do that yet.
To use the same approach as we used in FF3.6 (finding the pdf in the Temp folder directory), I'm thinking we'll need to do the following:
Figure out which tmp*** folder belongs to this particular browser instance (which we can do be inspecting the tmp*** folders that exist before and after the browser is instantiated)
Look inside that browser's cache for a file generated immedaitely after the pdf report was generated (which we can by comparing timestamps)
In cases where multiple files are generated in the cache, we could possibly sort based on size, and take the largest file, since the pdf will almost certainly be the largest temp file (although this seems flaky and will need to be tested in practice).
I'm not feeling great about this approach, and was wondering if there's a better way to capture pdf files. Can anyone suggest a better approach?
Note: the actual scraping of the PDF file is still working fine.

We ultimately accomplished this by clearing firefox's temporary internet files before the test, then looking for the most recently created file after the report was generated.

Script to search for text from PDF

Problem
On the Mac OS X platform, I would like to write a script, either in Python or Tcl to search for text within a PDF file and extract the relevant parts. I appreciate any help.
Background
I am writing scripts to look inside a PDF to determine if it is a bill, from what company, and for what period. Based on these information, I rename the PDF and move it to an appropriate directory. For example, file such as Statement_03948293929384.pdf might become 2012-07-15 Water Bill.pdf and moved to my Utilities folder.
What have I done so far?
I have searched for PDF-to-plain-text tools, but not found anything yet
I have looked into the Tcl wiki and found an example, but could not get it to work (I searched for text in PDF, but not found).
I am looking into pdf-parser.py by Didier Stevens
I heard of a Python package called pyPdf and will look at it next.
Update
I have found a command-line tool called pdftotext written by Glyph & Cog, LLC; built and packaged by Carsten Bluem. This tool is straight forward and it solves my problem. I am still looking out for those tools that can search PDF directly, without having to convert to text file.

I have successfully used PyODConverter to convert to/from PDFs (there is also a more powerful Java version). Once you have the PDF converted to text it should be trivial to do the searching. Also I believe iText should be capable of doing similar things, but I haven't tested it.

Create a 'single-serving site' with python

I want to make a Python script available as a service on the net. The script, which is my first 'proper' Python program, takes a txt file as argument and writes an image into the work directory. So:
How difficult is it for somebody who is new to Python and web development?
How much work is it?
Do I need a framework (Django, cherryPy, web2py)?
Are there good tutorials?
How do I avoid the server to be compromised?
What are my next steps?
==> What is the easiest way?
In the end it is enough, if it is a white page, with some text, and a button, which when clicked, opens a file dialog. After the txt is processed, the server should just return the image, which was written on the hard drive. Already I have access to a server which has Ubuntu installed through a friend.
[update]
Thanks for all your answers. After reading them I want to stress again, that I want to have it as minimal as possible. Srikar's suggestion sounds like the easiest one:
Put it in executable directory of your OS (commonly known as CGI
path). Provide a simple HTML form & upon form submission hit this
script which executes & returns back the image you want to display.
Any objections or comments? Do you know any tutorials for that?
[udpate2]
I found this SO answer: File Sharing Site in Python Is this a sensible approach?

It's not too difficult. Actually, it sounds like a good first project.
That too subjective to answer. An hour to days.
No, you don't need one, but I'd use one if I were you. They abstract away some of the stuff you really don't care about, and you'll learn a tool you can use again in the future.
Plenty. If you want a real rundown of how Python works for the web, read the HOWTO from Python.org. If you just want to learn how to do this one project, pick a framework and do their tutorial.
This question is so broad and complex that I'm not going to try to answer it. Search this site, or Google, for questions like that.
Your next step should be to pick a framework; I've used Django successfully. Just download it, follow the installation instructions, and work your way through their tutorial; it should tell you everything you need to know to do what you want. If you still have questions once you've learned how to do the basics, come back and ask again!
Edit: The answer to that other question will certainly work for you. There, they just receive a GET request and respond with data from a Python file. You need to receive a GET request, respond with an HTML page (easy enough), then respond to a POST request that includes an uploaded file (slightly more complicated) and run your python routine on the uploaded file and then respond with the created image (or a link to it).
Take a look at this page which includes a simple Python script to do file uploads. You should easily be able to modify it to do what you want.

How difficult is it for somebody who is new to Python and web development?
Depends on your level of knowledge.
How much work is it?
Depends on which method you choose to solve the problem.
Do I need a framework (Django, cherryPy, web2py)?
Not necessarily - you could get started by using the CGI (http://docs.python.org/library/cgi.html)
Are there good tutorials?
Yes, there are plenty. The Python docs are an excellent place to start.
How do I avoid the server to be compromised?
Again, depends on the method you choose to solve the problem, although there are commonalities.
What are my next steps?
Dare I say it again, choose a method, read the docs, have a play!

If its just as simple as you have described it. Then you might not even need Django. You could simply use CGI scripting. All of these design decisions, depend on whether
You need (or foresee) a SQL storage?
or a Content-Management-System?
Will you need multiple-user support?
Do you need tight security?
Do you need different privileges for different users?
Do you need an Admin to manage your site?
If the answer to above questions is atleast 60% correct, then you might consider Django. otherwise, just write a python script. Put it in executable directory of your OS (commonly known as CGI path). Provide a simple HTML form & upon form submission hit this script which executes & returns back the image you want to display. So, it all depends on the features you need...

In the end, I created what I needed with Flask.
They have a well documented pattern / tutorial on Uploading Files. The tutorial is understandable even for people with little python and web expericence.
To get a first working version it took me 2h and the resulting code was only 50 lines. This includes, starting the webserver, having a html file/form with file upload and serving a file back to the user.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.