How to generate a static .html with python - python

I'm looking for a python solution to create a static .html that can be sent out via email, either attached or embedded in the email (ignore this latter option if it requires a lot more work). I do not have requirements for what regards the layout of the .html. The focus here is in identifying the less painful solution for to generate an offline .html.
A potential solution could be along the lines of the following pseudo-code.
from some_unknown_pkg import StaticHTML
# Initialise instance
newsletter = StaticHTML()
# Append charts, tables and text to blank newsletter.
newsletter.append(text_here)
newsletter.append(interactive_chart_generated_with_plotly)
newsletter.append(more_text_here)
newsletter.append(a_png_file_loaded_from_local_pc)
# Save newsletter to .html, ready to be sent out.
newsletter.save_to_html('newsletter.html')
Where 'newsletter.html' can be opened in a whatever browser. Just to provide a bit more context, this .html is supposed to be sent out to a few selected people inside my company and contains sensible data. I'm using plotly to generate interactive charts to be inserted in the .html.

Possible solution here
Seems package in that answer is exactly you want. Docs: http://www.yattag.org/
Another pretty nice package here.

Start your python module with by importing sys module and redirect stdout to newsletter.html
import sys
sys.stdout = open('newsletter.html','w')
This will redirect any output generated to the html file. Now, just use the print command in python to transmit html tags to the file. For eg try:
print "<html>"
print "<p> This is my NewsLetter </p>"
print "</html>"`
This code snippet will create a basic HTML file. Now, you can open this file in any browser. For sending email you can use email and smtplib modules of python.

The Dominate package looks like it provides a simple and intuitive way to create HTML pages. https://www.yattag.org/

Related

Print PDFs automatically with python

I am building a website which accepts pdf from users and the options, like pages to print, copies, color or black&white and the shop from which they want to get it printed.
The pdf will be stored in server and will be passed on to the shop to print. How do i get it printed automatically with those options applied. One way i thought was to edit the pdf and sent to the store to print with the options applied.
How do i print the pdf automatically and report back to the server that the pdf was printed?
chose python as it may have easy implementation.
BTW i'll build website using NodeJS
You can do the following:
import os
os.startfile("C:/Users/TestFile.txt", "print")
This will start the file, in its default opener, with the verb 'print', which will print to your default printer.Only requires the os module which comes with the standard library
This only works on windows. So if you want it to work on other OS's you'll need a way to detect which OS the pdf is being sent to.

How to move Confluence pages along with the contents from one space to another using Python?

I'm trying to write a Python script based on the atlassian-python-api module which will copy the spaces from one space and the create them in another space hosted in a different server using the following commands:
pages = sourceConfluence.get_all_pages_from_space(space = source_Space, start=0, limit=100, status=None, expand='body.storage.content', content_type='page')
for i in pages:
status = destConfluence.create_page(space = dest_Space, title=i['title'], body=i['body'], parent_id=None, type='page', representation='storage')
This works fine until pages with content like pdf or images comes in. In that case, it creates a invalid link for the contents in the newly generated pages.
How can I move the pages with the content intact using the wrapper or Confluence REST API directly?
As per my understanding, there’s no proper way to do this programmatically as of now using Python.
The only ways we can achieve this are as follows:
Copy Paste the pages manually one by one from edit mode.
XML Export & Import the space by putting the XML file in the JIRA server.
Using Bobswift CLI tool which supports this functionality.

Convert Wikipedia/MediaWiki's code into HTML using python

I am trying to grab content from Wikipedia and use the HTML of the article. Ideally I would also like to be able to alter the content (eg, hide certain infoboxes etc).
I am able to grab page content using mwclient:
>>> import mwclient
>>> site = mwclient.Site('en.wikipedia.org')
>>> page = site.Pages['Samuel_Pepys']
>>> print page.text()
{{Redirect|Pepys}}
{{EngvarB|date=January 2014}}
{{Infobox person
...
But I can't see a relatively simple, lightweight way to translate this wikicode into HTML using python.
Pandoc is too much for my needs.
I could just scrape the original page using Beautiful Soup but that doesn't seem like a particularly elegant solution.
mwparserfromhell might help in the process, but I can't quite tell from the documentation if it gives me anything I need and don't already have.
I can't see an obvious solution on the Alternative Parsers page.
What have I missed?
UPDATE: I wrote up what I ended up doing, following the discussion below.
page="""<html>
your pretty html here
<div id="for_api_content">%s</div>
</html>"""
Now you can grab your raw content with your API and just call
generated_page = page%api_content
This way you can design any HTML you want and just insert the API content in a designed spot.
Those APIs that you are using are designed to return raw content so it's up to you to style how you want the raw content to be displayed.
UPDATE
Since you showed me the actual output you are dealing with I realize your dilemma. However luckily for you there are modules that already parse and convert to HTML for you.
There is one called mwlib that will parse the wiki and output to HTML, PDF, etc. You can install it with pip using the install instructions. This is probably one of your better options since it was created in cooperation between Wikimedia Foundation and PediaPress.
Once you have it installed you can use the writer method to do the dirty work.
def writer(env, output, status_callback, **kwargs): pass
Here are the docs for this module: http://mwlib.readthedocs.org/en/latest/index.html
And you can set attributes on the writer object to set the filetype (HTML, PDF, etc).
writer.description = 'PDF documents (using ReportLab)'
writer.content_type = 'application/pdf'
writer.file_extension = 'pdf'
writer.options = {
'coverimage': {
'param': 'FILENAME',
'help': 'filename of an image for the cover page',
}
}
I don't know what the rendered html looks like but I would imagine that it's close to the actual wiki page. But since it's rendered in code I'm sure you have control over modifications as well.
I would go with HTML parsing, page content is reasonably semantic (class="infobox" and such), and there are classes explicitly meant to demarcate content which should not be displayed in alternative views (the first rule of the print stylesheet might be interesting).
That said, if you really want to manipulate wikitext, the best way is to fetch it, use mwparserfromhell to drop the templates you don't like, and use the parse API to get the modified HTML. Or use the Parsoid API which is a partial reimplementation of the parser returning XHTML/RDFa which is richer in semantic elements.
At any rate, trying to set up a local wikitext->HTML converter is by far the hardest way you can approach this task.
The mediawiki API contains a (perhaps confusingly named) parse action that in effect renders wikitext into HTML. I find that mwclient's faithful mirroring of the API structure sometimes actually gets in the way. There's a good example of just using requests to call the API to "parse" (aka render) a page given its title.

Generate ODT/DOC(X) and convert to PDF, without OO.o/MS

I have a WSGI application that generates invoices and stores them as PDF.
So far I have solved similar problems with FPDF (or equivalents), generating the PDF from scratch like a GUI. Sadly this means the entire formatting logic (positioning headers, footers and content, styling) is in the application, where it really shouldn't be.
As the templates already exist in Office formats (ODT, DOC, DOCX), I would prefer to simply use those as a basis and fill in the actual content. I've found the Appy framework, which does pretty much that with annotated ODT files.
That still leaves the bigger problem open, tho: converting ODT (or DOC, or DOCX) to PDF. On a server. Running Linux. Without GUI libraries. And thus, without OO.o or MS Office.
Is this at all possible or am I better off keeping the styling in my code?
The actual content that would be filled in is actually quite restricted: a few paragraphs, some of which may be optional, a headline or two, always at the same place, and a few rows of a table. In HTML this would be trivial.
EDIT: Basically, I want a library that can generate ODT files from ODF files acting as templates and a library that can convert the result into PDF (which is probably the crux).
I don't know how to go about automatic ODT -> PDF conversion, but a simpler route might be to generate your invoices as HTML and convert them to PDF using http://www.xhtml2pdf.com/. I haven't tried the library myself, but it definitely seems promising.
You can use QTextDocument, QTextCursor and QTextDocumentWriter in PyQt4. A simple example to show how to write to an odt file:
>>>from pyqt4 import QtGui
# Create a document object
>>>doc = QtGui.QTextDocument()
# Create a cursor pointing to the beginning of the document
>>>cursor = QtGui.QTextCursor(doc)
# Insert some text
>>>cursor.insertText('Hello world')
# Create a writer to save the document
>>>writer = QtGui.QTextDocumentWriter()
>>>writer.supportedDocumentFormats()
[PyQt4.QtCore.QByteArray(b'HTML'), PyQt4.QtCore.QByteArray(b'ODF'), PyQt4.QtCore.QByteArray(b'plaintext')]
>>>odf_format = writer.supportedDocumentFormats()[1]
>>>writer.setFormat(odf_format)
>>>writer.setFileName('hello_world.odt')
>>>writer.write(doc) # Return True if successful
True
If not sure the difference between odt and odf in this case. I checked the file type and it said 'application/vnd.oasis.opendocument.text'. So I assume it is odt. You can print to a pdf file by using QPrinter.
More information at:
http://qt-project.org/doc/qt-4.8/

How do i output a dynamically generated web page to a .html page instead of .py cgi page?

So ive just started learning python on WAMP, ive got the results of a html form using cgi, and successfully performed a database search with mysqldb. I can return the results to a page that ends with .py by using print statements in the python cgi code, but i want to create a webpage that's .html and have that returned to the user, and/or keep them on the same webaddress when the database search results return.
thanks
paul
edit: to clarify on my local machine, i see /localhost/search.html in the address bar i submit the html form, and receive a results page at /localhost/cgi-bin/searchresults.py. i want to see the results on /localhost/results.html or /localhost/search.html. if this was on a public server im ASSUMING it would return .../cgi-bin/searchresults.py, the last time i saw /cgi-bin/ directories was in the 90s in a url. ive glanced at addhandler, as david suggested, im not sure if thats what i want.
edit: thanks all of you for your input, yep without using frameworks, mod_rewrite seems the way to go, but having looked at that, I decided to save myself the trouble and go with django with mod_wsgi, mainly because of the size of its userbase and amount of docs. i might switch to a lighter/more customisable framework, once ive got the basics
First, I'd suggest that you remember that URLs are URLs and that file extensions don't matter, and that you should just leave it.
If that isn't enough, then remember that URLs are URLs and that file extensions don't matter — and configure Apache to use a different rule to determine that is a CGI program rather than a static file to be served up as is. You can use AddHandler to add a handler for files on the hard disk with a .html extension.
Alternatively, you could use mod_rewrite to tell Apache that …/foo.html means …/foo.py
Finally, I'd suggest that if you do muck around with what URLs look like, that you remove any sign of something that looks like a file extension (so that …/foo is requested rather then …/foo.anything).
As for keeping the user on the same address for results as for the request … that is just a matter of having the program output the basic page without results if it doesn't get the query string parameters that indicate a search term had been passed.

Categories

Resources