Print PDFs automatically with python


I am building a website that accepts PDFs from users along with print options, such as which pages to print, the number of copies, color or black and white, and the shop where they want it printed.
The PDF will be stored on the server and passed on to the shop to print. How do I get it printed automatically with those options applied? One approach I thought of was to edit the PDF and send it to the shop with the options already applied.
How do I print the PDF automatically and report back to the server that it was printed?
I chose Python because it may make this easy to implement.
BTW, I'll build the website using Node.js.

You can do the following:
import os
os.startfile("C:/Users/TestFile.txt", "print")
This opens the file in its default application with the verb 'print', which prints to your default printer. It requires only the os module, which comes with the standard library.
This only works on Windows. If you want it to work on other operating systems, you'll need a way to detect which OS the PDF is being sent to.
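The OS detection mentioned above could look roughly like the sketch below. It assumes the non-Windows machines have CUPS installed so that the `lp` command is available; the helper names `build_lp_command` and `print_pdf` are made up for this illustration. Note that the `os.startfile` branch cannot pass copies or page ranges, and that a zero exit code from `lp` only means the job was accepted by the spooler, not that the page physically came out.

```python
import os
import platform
import subprocess


def build_lp_command(pdf_path, copies=1, pages=None, printer=None):
    """Build (but do not run) a CUPS `lp` command line for one print job."""
    command = ["lp", "-n", str(copies)]
    if pages:
        command += ["-P", pages]    # page list such as "1-3,5"
    if printer:
        command += ["-d", printer]  # destination printer name
    command.append(pdf_path)
    return command


def print_pdf(pdf_path, copies=1, pages=None, printer=None):
    """Send a PDF to a printer, picking the mechanism by operating system."""
    if platform.system() == "Windows":
        # Windows: default application, default printer, options are ignored
        os.startfile(pdf_path, "print")
        return True
    # Assumption: CUPS is installed and `lp` is on the PATH
    result = subprocess.run(build_lp_command(pdf_path, copies, pages, printer))
    return result.returncode == 0  # True if the spooler accepted the job
```

The boolean returned by `print_pdf` is one thing the shop-side script could report back to the server, for example with an HTTP request to an endpoint of your own.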

Related

How to automate SAS enterprise guide reports with Python Script?

I tried with SASPy but it's not working. I am able to open the SAS .egp file but not able to run the multiple scripts within it in sequence.
import subprocess

def OpenProject(sas_exe, egp_path):
    # Launch SAS Enterprise Guide with the given project file
    subprocess.call([sas_exe, egp_path])

# Placeholders: replace with the real paths
sas_exe = "path\\path\\"
egp_path = "path\\path\\path\\"
OpenProject(sas_exe, egp_path)

This depends a bit on exactly what the workflow is. A few side notes, then the full solution.
First: EGP is not really intended to store production processes, in my opinion. EGP should really be used for development, then production is done with .sas (text) files. EGP can directly store the nodes as .sas files; ask a new question about that if you want to know more, but it's pretty easy to figure out. Best practice is to have EGP save the code modules as .sas files, then run those - SASPy will easily do that for you.
Second: If you use SAS's built-in Git connectivity, then you can do this a bit more easily I suspect. Consider doing that if you already use Git for your other processes. Again, then you end up with a .sas file, and can directly run that via SASPy.
So: how can you do this in Python, with the assumption you do have to use the .egp itself, without too many different moving parts? The key here is the .egp format. EGP is a container file, which is actually a .zip format container that has in it, among other things, all of the SAS code you want to run, as text. Text in xml format, but still, text.
You can write a Python program that opens the .egp as a .zip file, using the zipfile library, and then use xml.etree.ElementTree to parse the project.xml file inside that project. Exactly what you do from there depends on your particular details, and is well out of scope for a Stack Overflow answer, but if you do better visually you can simply rename the .egp to .zip, open it in the unzip program of your choice, browse project.xml in your text editor, and find the nodes and the code related to those nodes.
You can then extract the .sas code as text, and submit it directly via SASPy, or extract it to a .sas file and then submit that however you prefer (SASPy or something else).
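As a starting point for the zipfile plus ElementTree approach, here is a minimal sketch. It only opens the container and parses project.xml. Which elements actually hold the SAS code is not assumed here, so the second helper just counts tag names to help you explore the layout. The function names are invented for this example.

```python
import xml.etree.ElementTree as ET
import zipfile


def load_project_xml(egp_path, member="project.xml"):
    """Open an .egp file as a zip archive and parse the project XML inside it."""
    with zipfile.ZipFile(egp_path) as archive:
        with archive.open(member) as handle:
            return ET.parse(handle).getroot()


def summarize_tags(root):
    """Count how often each element tag occurs, as a first look at the layout."""
    counts = {}
    for element in root.iter():
        counts[element.tag] = counts.get(element.tag, 0) + 1
    return counts
```

Once you know which elements carry the code in your projects, you can read their text and either submit it via SASPy or write it out to a .sas file.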
I do something similar to this for a project - I don't actually run code from it, I'm just parsing it to verify that the correct programs were synced from the EGP to production - but it would be trivial to actually submit the code from what I've written, which is about 50 lines of code total. I may write a SGF paper this year or next year on this topic, in which case I'll try and remember to submit it here - or you can head over to my github page and see if it's there (in the future!).

Data storage for standalone python application

I want to make a Python program (with a PyQt GUI, but I don't know whether that is relevant) that has to save some information that should persist even after the program closes. Examples of information I want to store:
1. The user can search for a file in a file dialog window. I want to start the file dialog window in the previously used directory, even if the program is closed in between file searches.
2. The user can enter their own categories to sort items, building on some of my predefined categories. These new categories should be available the next time the program starts.
Now I'm wondering what the proper way to store such information is. Should I use pickle? A proper database (I know a tiny bit of sqlite3, but would have to read up on that)? A simple text file that I parse myself? One thing for data like in example 1., another for data like in example 2.?
Also, whatever way to store it I use, where would I put that file?
I'm asking in the context that I might want to later make my program available to others as a standalone application (using py2app, py2exe or PyInstaller).
Right now I'm just saving a pickle file in the directory that my .py file is in, like this answer recommends, but the answer also specifically mentions:
for a personal project it might be enough.
(emphasis mine)
Is using pickle also the "proper, professional" way, if I want to make the program available to other people as a standalone application?

The choice depends on how you want the stored data to be treated. Which of these describes your case?
The user should be able to alter it without using my program.
The user should be prevented from altering it with any program other than mine.
In the first case, consider using JSON, an open-standard file format for which Python has a ready-made library called json. You get text (which you can save to a file) that is human-readable and can be edited in a text editor. There are also JSON viewers and editors that make viewing and editing JSON files easier.
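For the JSON option, a minimal sketch might look like this. The file name, the keys `last_directory` and `categories`, and the helper names are assumptions made up for the example, not anything prescribed by the json module.

```python
import json
from pathlib import Path


def save_settings(path, settings):
    """Write the settings dictionary to disk as human-readable JSON."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")


def load_settings(path, defaults):
    """Read settings from disk, falling back to defaults for missing keys."""
    settings_file = Path(path)
    if not settings_file.exists():
        return dict(defaults)
    stored = json.loads(settings_file.read_text(encoding="utf-8"))
    return {**defaults, **stored}
```

On first start `load_settings` simply returns the defaults, so the program does not need to ship a settings file.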

I think SQLite3 is the better solution in this case, as Moldovan commented.
One problem with pickle is that the pickling format can change across Python versions, and sqlite3 has further advantages beyond that.
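To give an idea of how little sqlite3 code the user-defined categories would need, here is a small sketch. The table name `categories` and the helper functions are hypothetical and only illustrate the standard-library API.

```python
import sqlite3


def open_store(db_path):
    """Open (or create) the database and make sure the table exists."""
    connection = sqlite3.connect(db_path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS categories (name TEXT PRIMARY KEY)"
    )
    return connection


def add_category(connection, name):
    """Store a category; adding the same name twice is silently ignored."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT OR IGNORE INTO categories (name) VALUES (?)", (name,)
        )


def list_categories(connection):
    """Return all stored category names in alphabetical order."""
    rows = connection.execute("SELECT name FROM categories ORDER BY name")
    return [row[0] for row in rows]
```

Passing ":memory:" as `db_path` gives a throwaway database, which is handy for trying this out before deciding where the real file should live.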

How to generate a static .html with python

I'm looking for a Python solution to create a static .html file that can be sent out via email, either attached or embedded in the email (ignore the latter option if it requires a lot more work). I have no requirements regarding the layout of the .html. The focus here is on identifying the least painful way to generate an offline .html.
A potential solution could be along the lines of the following pseudo-code.
from some_unknown_pkg import StaticHTML
# Initialise instance
newsletter = StaticHTML()
# Append charts, tables and text to blank newsletter.
newsletter.append(text_here)
newsletter.append(interactive_chart_generated_with_plotly)
newsletter.append(more_text_here)
newsletter.append(a_png_file_loaded_from_local_pc)
# Save newsletter to .html, ready to be sent out.
newsletter.save_to_html('newsletter.html')
Here 'newsletter.html' can be opened in any browser. To provide a bit more context, this .html is supposed to be sent out to a few selected people inside my company and contains sensitive data. I'm using plotly to generate the interactive charts to be inserted in the .html.

Possible solution here
It seems the package in that answer is exactly what you want. Docs: http://www.yattag.org/
Another pretty nice package here.

Start your Python module by importing the sys module and redirecting stdout to newsletter.html:
import sys
sys.stdout = open('newsletter.html','w')
This redirects any output generated to the HTML file. Now just use print in Python to write HTML tags to the file. For example, try:
print("<html>")
print("<p> This is my NewsLetter </p>")
print("</html>")
This code snippet will create a basic HTML file, which you can open in any browser. For sending email, you can use Python's email and smtplib modules.
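The email step could be sketched as follows, using only the email and smtplib modules. The addresses, the SMTP host, and the function names are placeholders. Your mail server may additionally require TLS or a login, which this sketch leaves out.

```python
import smtplib
from email.message import EmailMessage


def build_newsletter_email(html, sender, recipient, subject="Newsletter"):
    """Create an email with a short text body and the HTML as an attachment."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content("The newsletter is attached as an HTML file.")
    message.add_attachment(html, subtype="html", filename="newsletter.html")
    return message


def send_email(message, host, port=25):
    """Hand the message to an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)
```

Building the message and sending it are kept separate so that the message can be inspected or saved without a mail server at hand.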

The Dominate package looks like it provides a simple and intuitive way to create HTML pages. https://www.yattag.org/

Search/Filter/Select/Manipulate data from a website using Python

I'm working on a project that basically requires me to go to a website, pick a search mode (name, year, number, etc), search a name, select amongst the results those with a specific type (filtering in other words), pick the option to save those results as opposed to emailing them, pick a format to save them then download them by clicking the save button.
My question is, is there a way to do those steps using a Python program? I am only aware of extracting data and downloading pages/images, but I was wondering if there was a way to write a script that would manipulate the data, and do what a person would manually do, only for a large number of iterations.
I've thought of looking into the URL structures, and finding a way to generate for each iteration the accurate URL, but even if that works, I'm still stuck because of the "Save" button, as I can't find a link that would automatically download the data that I want, and using a function of the urllib2 library would download the page but not the actual file that I want.
Any idea on how to approach this? Any reference/tutorial would be extremely helpful, thanks!
EDIT: When I inspect the save button here is what I get:
Search Button

This would depend a lot on the website you're targeting and how its search is implemented.
Some websites, like Reddit, have an open API where you can add a .json extension to a URL and get a JSON response instead of plain HTML.
When using a REST API or any JSON response, you can load it as a Python dictionary using the json module like this:
import json
json_response = '{"customers":[{"name":"carlos", "age":4}, {"name":"jim", "age":5}]}'
rdict = json.loads(json_response)
def print_names(data):
    for entry in data["customers"]:
        print(entry["name"])

print_names(rdict)

You should take a look at the Library of Congress docs for developers. If they have an API, you'll be able to learn about how you can do search and filter through their API. This will make everything much easier than manipulating a browser through something like Selenium. If there's an API, then you could easily scale your solution up or down.
If there's no API, then you have two options:
1. Use Selenium with a browser (I prefer Firefox).
2. Try to get as much info generated, filtered, etc. as you can without actually having to push any buttons on that page, by learning how their search engine works with GET and POST requests. For example, if you're looking for books within a range, conduct this search manually and look at how the URL changes. If you're lucky, you'll see that your search criteria are in the URL. Using this info you can conduct a search simply by visiting that URL, which means your program won't have to fill out a form or operate buttons, drop-downs, etc.
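If the search criteria do show up in the URL, generating those URLs in Python is short. The sketch below uses a made-up base URL and made-up parameter names (`q`, `year`); the real ones have to be read off the site's own URLs after a manual search.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, **criteria):
    """Append the search criteria to the base URL as a query string."""
    return base_url + "?" + urlencode(criteria)


def fetch(url):
    """Download the response body for a search URL as bytes."""
    with urlopen(url) as response:
        return response.read()


# Hypothetical site and parameter names, for illustration only
url = build_search_url("https://example.org/search", q="garfield cat", year=1990)
print(url)
```

Looping over a list of names and calling `build_search_url` for each one gives you one URL per iteration without touching a form.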
If you have to drive the browser through Selenium (for example, if you want to save the whole page with its HTML, CSS, and JS files, you have to press Ctrl+S and then click the "Save" button), then you need to find libraries that let you control the keyboard from Python. There are such libraries for Ubuntu. They let you press any key on the keyboard and even do key combinations.
An example of what's possible:
I wrote a script that logs me in to a website, navigates to some page, downloads specific links on that page, visits every link, saves every page, avoids saving duplicate pages, and avoids getting caught (i.e. it doesn't behave like a bot by, for example, visiting 100 pages per minute).
The whole thing took 3-4 hours to code, and it ran in a virtual Ubuntu machine on my Mac, which means that while it was doing all that work I could still use my machine. If you don't use a virtual machine, you'll either have to leave the script running and not interfere with it, or write a much more robust program, which IMO is not worth coding since you can just use a virtual machine.

python open web pages in batch mode

System: Dell, Windows 7
Setup: PyCharm 5.0.2 + anaconda 2.7.11-0
Limitations: My preference would be to use a Mac, but my work requires me to use a Windows machine. I am familiar with the unix/linux terminals but unfamiliar with the windows command prompt.
Problem: I would like to open multiple webpages that run a Bing (or Google) search on a list of keywords. (I have entered a new field and need to look up hundreds of unknown terms. I am too lazy to do the search manually for each term.)
My attempts:
I have not been successful in finding an application that does Bing or Google searches in batch mode. Thus, I have decided to write my own script.
I recalled seeing that webpages could be rendered directly in Jupyter with HTML() from IPython.display:
# import modules
from IPython.display import HTML
# create the list of keywords
keys = ['grumpy+cat','garfield+cat','basement+cat']
# loop through each key and display the search webpage
for k in keys:
    HTML('<iframe src=https://www.google.com/search?q='+k+' width=700 height=350></iframe>')
Alas, nothing happens. I did check this:
HTML('<iframe src=https://www.google.com/search?q='+k[0]+' width=700 height=350></iframe>')
And that seems to be fine, but I need automation.
I moved on to a combination of input file, python command generating script, and batch script (to run in command prompt, which would be equivalent to a shell script like bash or tcsh).
Contents of 'search_list.txt':
grumpy+cat
garfield+cat
basement+cat
Contents of 'batch_search.py':
# Do a batch Bing search in Firefox using a keyword search list
# Make import statements
import numpy as np
# Read in the data
keys = np.loadtxt("search_list.txt", dtype='str')
# Output one "start firefox ..." command per keyword to a batch file
with open('./batch_search.bat', 'w+') as f1:
    for k in keys:
        command = "start firefox http://www.bing.com/search?q="+k+"\r\n"
        f1.write(command)
# Why doesn't this line work?
#%%!batch_search.bat
Contents of 'run_batch_search.bat':
ipython batch_search.py
batch_search.bat
Finally, run in command prompt using:
run_batch_search.bat
What now?:
The above does open multiple tabs in Firefox pointed to the Bing search results for the input keywords, but I would like this to be a bit more streamlined.
Ideally, I would be able to (most preferably) open the web browsers pointed to the google/bing search directly from the python script. Solutions?
Otherwise, how can I properly format the last line of 'batch_search.py' to get the call to the command prompt to work?
More information for @Kris (see comments below): The main goal is to look up hundreds of unknown terms via web searches for self-education. For example, if I look up https://www.google.com/search?q=garfield+cat in Firefox, the Wikipedia results for Garfield the cat pop up in the right-hand column of the page. If, for one of my own search terms, the Bing popup results appear to be accurate, then I retain that as my reference information and move on to the results for the next keyword. If it does not appear to be a good match, then I must continue to read the search results and follow links until I find an appropriate match.
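One possible way to open the searches directly from Python, without the intermediate batch file, is the standard-library webbrowser module. This is only a sketch: it opens the system default browser rather than Firefox specifically, the helper names are invented here, and the keywords are written with spaces because `quote_plus` does the `+` encoding itself.

```python
import webbrowser
from urllib.parse import quote_plus

# Same Bing search prefix as in batch_search.py above
SEARCH_PREFIX = "http://www.bing.com/search?q="


def build_search_urls(keywords):
    """Turn plain keywords such as 'grumpy cat' into search URLs."""
    return [SEARCH_PREFIX + quote_plus(keyword) for keyword in keywords]


def open_searches(keywords):
    """Open one browser tab per keyword in the default browser."""
    for url in build_search_urls(keywords):
        webbrowser.open_new_tab(url)
```

Calling `open_searches(['grumpy cat', 'garfield cat', 'basement cat'])` would open three tabs. With hundreds of terms it is probably wise to open them in smaller batches.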
