Is it possible to embed rendered HTML output into IPython output?
One way is to use
from IPython.core.display import HTML
HTML('link')
or (using the IPython %%html cell magic)
%%html
link
This returns a formatted link, but:
1. The link doesn't open the web page in a browser from the console. IPython notebooks do render it properly, though.
2. I don't know how to render an HTML() object inside, say, a list or a printed pandas table. You can use df.to_html(), but that doesn't create links inside the cells.
3. This output isn't interactive in the PyCharm Python console (because it's not Qt).
How can I overcome these shortcomings and make IPython output a bit more interactive?
This seems to work for me:
from IPython.core.display import display, HTML
display(HTML('<h1>Hello, world!</h1>'))
The trick is to wrap it in display as well.
Source: http://python.6.x6.nabble.com/Printing-HTML-within-IPython-Notebook-IPython-specific-prettyprint-tp5016624p5016631.html
Edit: import from IPython.display instead:
from IPython.display import display, HTML
This avoids the following warning:
DeprecationWarning: Importing display from IPython.core.display is
deprecated since IPython 7.14, please import from IPython display
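For the question's second point (links inside a list or table), one option is to build the table markup yourself and display it as a whole. A minimal sketch: `links_table` and `show_html` are hypothetical helper names, the rows are assumed to be (name, URL) pairs, and every cell is escaped with `html.escape` so stray `<` or `&` characters cannot break the markup. Recent pandas versions also offer `DataFrame.to_html(render_links=True)` for URL columns.

```python
from html import escape


def links_table(rows, headers=("Name", "URL")):
    """Build an HTML <table> whose URL column is clickable.

    rows: iterable of (name, url) pairs (hypothetical data shape).
    """
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = []
    for name, url in rows:
        # escape() also escapes quotes, so the URL is safe inside href="..."
        link = f'<a href="{escape(url, quote=True)}" target="_blank">{escape(url)}</a>'
        body.append(f"<tr><td>{escape(name)}</td><td>{link}</td></tr>")
    return f"<table><tr>{head}</tr>{''.join(body)}</table>"


def show_html(markup):
    """Render in IPython/Jupyter if available, else print the raw markup."""
    try:
        from IPython.display import display, HTML
    except ImportError:
        print(markup)
        return
    display(HTML(markup))
```

Usage in a notebook: `show_html(links_table([("Google", "https://www.google.com")]))`.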
Some time ago Jupyter Notebooks started stripping JavaScript from HTML content [#3118]. Here are two solutions:
Serving Local HTML
If you want to embed an HTML page with JavaScript on your page now, the easiest thing to do is to save your HTML file to the directory with your notebook and then load the HTML as follows:
from IPython.display import IFrame
IFrame(src='./nice.html', width=700, height=600)
Serving Remote HTML
If you prefer a hosted solution, you can upload your HTML page to an Amazon Web Services "bucket" in S3, change the settings on that bucket so that it hosts a static website, then use an IFrame component in your notebook:
from IPython.display import IFrame
IFrame(src='https://s3.amazonaws.com/duhaime/blog/visualizations/isolation-forests.html', width=700, height=600)
This will render your HTML content and JavaScript in an iframe, just like you can on any other web page:
<iframe src='https://s3.amazonaws.com/duhaime/blog/visualizations/isolation-forests.html' width=700 height=600></iframe>
Related: While constructing a class, def _repr_html_(self): ... can be used to create a custom HTML representation of its instances:
class Foo:
    def _repr_html_(self):
        return "Hello <b>World</b>!"
o = Foo()
o
will render as:
Hello World!
For more info refer to IPython's docs.
An advanced example:
from html import escape # Python 3 only :-)
class Todo:
    def __init__(self):
        self.items = []

    def add(self, text, completed):
        self.items.append({'text': text, 'completed': completed})

    def _repr_html_(self):
        return "<ol>{}</ol>".format("".join("<li>{} {}</li>".format(
            "☑" if item['completed'] else "☐",
            escape(item['text'])
        ) for item in self.items))
my_todo = Todo()
my_todo.add("Buy milk", False)
my_todo.add("Do homework", False)
my_todo.add("Play video games", True)
my_todo
Will render:
☐ Buy milk
☐ Do homework
☑ Play video games
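An object can offer both representations. In this sketch (the `Link` class is hypothetical), Jupyter's rich display uses `_repr_html_`, while consoles that only show plain text, such as the IPython terminal, fall back to `__repr__`:

```python
from html import escape


class Link:
    """Hypothetical example: a URL that renders as a clickable link in Jupyter."""

    def __init__(self, url, text=None):
        self.url = url
        self.text = text or url

    def _repr_html_(self):
        # Used by Jupyter's rich display (text/html MIME type).
        return f'<a href="{escape(self.url)}" target="_blank">{escape(self.text)}</a>'

    def __repr__(self):
        # Used by plain-text consoles that cannot render HTML.
        return f"{self.text} <{self.url}>"
```

Evaluating `Link("https://www.google.com", "Google")` as the last line of a notebook cell shows a clickable "Google"; in a terminal it prints `Google <https://www.google.com>`.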
Expanding on @Harmon above, it looks like you can combine display and print calls if you need to. Or it may be easier to format your entire HTML as one string and then use display. Either way, it's a nice feature.
from IPython.display import display, HTML
display(HTML('<h1>Hello, world!</h1>'))
print("Here's a link:")
display(HTML("<a href='http://www.google.com' target='_blank'>www.google.com</a>"))
print("some more printed text ...")
display(HTML('<p>Paragraph text here ...</p>'))
Outputs something like this:
Hello, world!
Here's a link:
www.google.com
some more printed text ...
Paragraph text here ...
First, the code:
from random import choices
def random_name(length=6):
    return "".join(choices("abcdefghijklmnopqrstuvwxyz", k=length))
# ---
from IPython.display import IFrame, display, HTML
import tempfile
from os import unlink
def display_html_to_iframe(html, width=600, height=600):
    # random file name so repeated calls don't overwrite each other
    name = f"temp_{random_name()}.html"
    with open(name, "w") as f:
        print(html, file=f)
    display(IFrame(name, width, height), metadata=dict(isolated=True))
    # unlink(name)
def display_html_inline(html):
    display(HTML(html, metadata=dict(isolated=True)))
h = "<html><b>Hello</b></html>"
display_html_to_iframe(h)
display_html_inline(h)
Some quick notes:
You can generally just use inline HTML for simple items. If you are rendering a framework, like a large JavaScript visualization framework, you may need to use an IFrame. It's hard enough for Jupyter to run in a browser without random HTML embedded.
The strange parameter metadata=dict(isolated=True) does not isolate the result in an IFrame, as older documentation suggests. It appears to prevent clear-fix from resetting everything. The flag is no longer documented: I just found that using it allowed certain display: grid styles to render correctly.
This IFrame solution writes to a temporary file. You could use a data uri as described here but it makes debugging your output difficult. The Jupyter IFrame function does not take a data or srcdoc attribute.
Files created with the tempfile module cannot be shared with another process, hence random_name().
If you use the HTML class with an IFrame in it, you get a warning. This may be only once per session.
You can use HTML('Hello, <b>world</b>') at top level of cell and its return value will render. Within a function, use display(HTML(...)) as is done above. This also allows you to mix display and print calls freely.
Oddly, IFrames are indented slightly more than inline HTML.
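The data-URI alternative mentioned in the notes above can be sketched as follows. Assumptions: the browser accepts a `data:` URL as an iframe source (current major browsers generally do), and `html_to_data_uri` / `display_html_as_data_iframe` are hypothetical names. `IFrame` places `src` in an `<iframe>` tag, so no temporary file is needed, at the cost of harder debugging:

```python
import base64


def html_to_data_uri(html: str) -> str:
    """Encode an HTML document as a base64 data: URI."""
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return f"data:text/html;charset=utf-8;base64,{payload}"


def display_html_as_data_iframe(html, width=600, height=600):
    # Imported lazily so html_to_data_uri works without IPython installed.
    from IPython.display import IFrame, display
    display(IFrame(html_to_data_uri(html), width, height))
```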
To do this in a loop, you can do:
display(HTML("".join([f"<a href='{url}'>{url}</a><br>" for url in urls])))
This builds the HTML text in a loop and then uses the display(HTML()) construct to display the whole string as HTML.
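If the URLs come from untrusted or messy data, it is safer to escape them before joining. A minimal sketch, where `links_html` and the `urls` list are hypothetical; each link is followed by a `<br>` line break:

```python
from html import escape


def links_html(urls):
    """Join URLs into clickable links, one per line, escaping each URL."""
    return "".join(
        f'<a href="{escape(u)}">{escape(u)}</a><br>' for u in urls
    )


urls = ["https://www.google.com", "https://example.com/?q=a&b"]
markup = links_html(urls)
# In a notebook:
# from IPython.display import display, HTML
# display(HTML(markup))
```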
Related
I am writing code illustrations in Jupyter Notebook. My use case is to take the final code from certain code cells and put it in an HTML document. I have found a good pipeline using the pygments package, which highlights the code for me and turns it into proper HTML.
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
def PyHighlight(code):
    return highlight(code, PythonLexer(), HtmlFormatter())
PyHighlight("print('Hello world!')")
Output:
'<div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Hello world!&#39;</span><span class="p">)</span>\n</pre></div>\n'
But it's very tedious to convert each code cell into a string, pass it to the PyHighlight function, and finally get the HTML.
Is there a way to grab the content of each cell as a string? Even better, can I trigger PyHighlight to run after every cell, with the cell content as its argument, so I can just copy-paste the highlighted HTML?
I took some inspiration from the IPython documentation on custom magics.
Define and register the following magic in a cell:
from IPython.core.magic import Magics, cell_magic, magics_class
@magics_class
class MyMagics(Magics):
    @cell_magic
    def pygmented(self, line, cell):
        # cell holds the source below the %%pygmented line
        print(PyHighlight(cell))
get_ipython().register_magics(MyMagics)
After that, just add %%pygmented to the top of each cell. Running the cell then prints the highlighted HTML of its content, as asked in the question. Note that the cell's code itself is not executed; the magic only highlights it.
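The question also asked about running the highlighter automatically after every cell. IPython exposes a `post_run_cell` event for this; in IPython 7 and later its callback receives an `ExecutionResult` whose `info.raw_cell` holds the cell source. The sketch below assumes that API; `make_post_run_hook` and `register_highlight_hook` are hypothetical helper names, and `highlight_fn` would be the question's `PyHighlight`. Unlike the %%pygmented magic, this leaves each cell's code executing normally.

```python
def make_post_run_hook(highlight_fn, sink=print):
    """Return a post_run_cell callback that highlights each executed cell."""
    def hook(result):
        # result is an IPython ExecutionResult; info.raw_cell is the cell source
        source = result.info.raw_cell
        if source.strip():          # skip empty cells
            sink(highlight_fn(source))
    return hook


def register_highlight_hook(highlight_fn):
    """Register the hook in the running IPython shell; return None elsewhere."""
    try:
        from IPython import get_ipython
    except ImportError:             # IPython not installed
        return None
    ip = get_ipython()
    if ip is None:                  # plain Python, not an IPython session
        return None
    hook = make_post_run_hook(highlight_fn)
    ip.events.register("post_run_cell", hook)
    # To stop: ip.events.unregister("post_run_cell", hook)
    return hook
```

In a notebook, `register_highlight_hook(PyHighlight)` once; every later cell then prints its highlighted HTML after running.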
Using wxpython 4.1.0, Windows 10 x64, Python 3.7.7 x64...
What I want to achieve is pretty basic, but I cannot figure it out from reading the wxPython documentation and searching the internet.
I used Python's native difflib module to create an HTML difference report. If you open this HTML difference report in any popular browser (Chrome, Firefox, Edge, etc.), you get nice highlighting that distinguishes the differences that were found.
When I open the same file with wxPython's wx.html.HtmlWindow() widget, the nice visuals are not shown. The highlighting that clearly shows the differences is not displayed; instead, only plain text is displayed, making it nearly impossible to find the differences.
wxPython is very complete in terms of functionality, and I assume there is a way to achieve this using the wx.html.HtmlWindow() widget or similar widgets in the wx.html API. I suspect the only way to achieve this may be the wx.html2 API.
Minimal code to reproduce my problem (not wrapped in classes, for simplicity):
DIFFLIB Code:
import difflib
import pathlib as path
file1 = path.Path('text_file1.txt')
file2 = path.Path('text_file2.txt')
with file1.open() as file_obj1:
    contents1 = file_obj1.readlines()
with file2.open() as file_obj2:
    contents2 = file_obj2.readlines()
html = difflib.HtmlDiff().make_file(contents1, contents2, file1.name, file2.name)
with open('report.html', 'w') as file_obj:
    file_obj.write(html)
GUI Code:
import wx
import wx.html
def html_window():
    frame = wx.Frame(parent=None)
    html = wx.html.HtmlWindow(frame)
    html.LoadFile('report.html')
    frame.Show()
app = wx.App(False)
html_window()
app.MainLoop()
Your problem is that wx.html does not support CSS.
wx.html2 does, but even that may struggle with inline CSS, which is what you have in the report.html file.
Still, you could always try wx.html2.
Other than that, why not use wx.LaunchDefaultBrowser("path_to_file/report.html")?
One other option that springs to mind is to use:
difflib.HtmlDiff().make_table(contents1, contents2, file1.name, file2.name)
and then add your own CSS.
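A sketch of that last option: `make_table` returns only the `<table>` markup, so you can wrap it in a page with your own stylesheet. The class names below (`diff_header`, `diff_next`, `diff_add`, `diff_chg`, `diff_sub`) are the ones difflib emits; the colours and the `build_diff_page` name are arbitrary choices. The result still needs a CSS-capable renderer such as wx.html2 or a browser.

```python
import difflib

# Class names emitted by difflib.HtmlDiff; colours are an arbitrary choice.
DIFF_CSS = """
table.diff {font-family: monospace; border-collapse: collapse;}
.diff_header {background-color: #e0e0e0;}
.diff_next {background-color: #c0c0c0;}
.diff_add {background-color: #aaffaa;}
.diff_chg {background-color: #ffff77;}
.diff_sub {background-color: #ffaaaa;}
"""


def build_diff_page(lines1, lines2, name1="", name2="", css=DIFF_CSS):
    """Return a full HTML page: difflib's table plus a custom stylesheet."""
    table = difflib.HtmlDiff().make_table(lines1, lines2, name1, name2)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<style>{css}</style></head><body>\n{table}\n</body></html>"
    )
```

Usage with the question's variables: write `build_diff_page(contents1, contents2, file1.name, file2.name)` to report.html instead of the `make_file` output.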
I'm trying to insert a picture into a Word document using python-docx but running into errors.
The code is simply:
document.add_picture("test.jpg", width = Cm(2.0))
From looking at the python-docx documentation I can see that the following XML should be generated:
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr>
<pic:cNvPr id="1" name="python-powered.png"/>
<pic:cNvPicPr/>
</pic:nvPicPr>
<pic:blipFill>
<a:blip r:embed="rId7"/>
<a:stretch>
<a:fillRect/>
</a:stretch>
</pic:blipFill>
<pic:spPr>
<a:xfrm>
<a:off x="0" y="0"/>
<a:ext cx="859536" cy="343814"/>
</a:xfrm>
<a:prstGeom prst="rect"/>
</pic:spPr>
</pic:pic>
This does in fact get generated in my document.xml file (as I can see when unzipping the docx file). However, looking into the OOXML format, I can see that the image should also be saved under the media folder, and the relationship should be mapped in word/_rels/document.xml.rels:
<Relationship Id="rId20"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image20.png"/>
None of this happens, however, and when I open the Word document I'm met with a "The picture can't be displayed" placeholder.
Can anyone help me understand what is going on?
It looks like the image is not embedded the way it should be, and that I would need to put it in the media folder and add the mapping for it myself. However, since this is a well-documented feature, it should work as expected.
UPDATE:
Testing it with an empty docx file, the image does get added as expected, which leads me to believe it might have something to do with the python-docx-template library (https://github.com/elapouya/python-docx-template).
It uses python-docx and Jinja to provide templating, but otherwise works the same way python-docx does. I added the image to a subdoc, which then gets inserted into the full document at a given place.
A sample code can be seen below (from https://github.com/elapouya/python-docx-template/blob/master/tests/subdoc.py):
from docxtpl import DocxTemplate
from docx.shared import Inches
tpl=DocxTemplate('test_files/subdoc_tpl.docx')
sd = tpl.new_subdoc()
sd.add_paragraph('A picture :')
sd.add_picture('test_files/python_logo.png', width=Inches(1.25))
context = {
    'mysubdoc': sd,
}
tpl.render(context)
tpl.save('test_files/subdoc.docx')
I'll keep this up in case anyone else makes the same mistake I did :) I managed to debug it in the end.
The problem was in how I used the python-docx-template library. I opened up a DocxTemplate like so:
report_output = DocxTemplate(template_path)
DoThings(value,template_path)
report_output.render(dictionary)
report_output.save(output_path)
But I accidentally opened it twice. Instead of passing the template object to the function that worked with it, I passed the template's path and opened it again when creating and building the subdocs.
def DoThings(data, template_path):
    doc = DocxTemplate(template_path)
    temp_finding = doc.new_subdoc()
    # DO THINGS
Finally, after building the subdocs, I rendered the first template. That seemed to work fine for paragraphs and such, but I'm guessing the images were added to the "second" opened template and not to the first one that I was actually rendering. After I passed the template object to the function, it started working as expected!
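A minimal sketch of the corrected structure, assuming the docxtpl API used in the sample above (`DocxTemplate`, `new_subdoc`, `render`, `save`). `do_things` and `build_report` are hypothetical stand-ins for `DoThings` and its caller, and the `mysubdoc` context key is taken from the sample. The point is that the subdoc is created on the same template object that is later rendered:

```python
def do_things(data, tpl):
    """Create a subdoc on the template that will actually be rendered."""
    sd = tpl.new_subdoc()            # same object as the caller's template
    sd.add_paragraph(str(data))
    return sd


def build_report(template_path, output_path, value, image_path=None):
    # Imported here so do_things() can be exercised without docxtpl installed.
    from docxtpl import DocxTemplate
    from docx.shared import Cm

    tpl = DocxTemplate(template_path)    # open the template exactly once
    sd = do_things(value, tpl)           # pass the object, not the path
    if image_path:
        sd.add_picture(image_path, width=Cm(2.0))
    tpl.render({"mysubdoc": sd})
    tpl.save(output_path)
```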
I came across this problem, and it was solved after I removed the parameter width=(1.0) from the add_picture call.
With width=(1.0), I could not see the picture in test.docx,
so it MIGHT be caused by an inappropriate size being set for the picture.
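The likely explanation is units: python-docx stores lengths in English Metric Units (EMU), with 914,400 EMU per inch and 360,000 EMU per centimetre, so a bare `1.0` is treated as 1 EMU, far too small to see. Wrapping the value in `Cm()` or `Inches()` from `docx.shared` does the conversion. The arithmetic, sketched without python-docx (`cm_to_emu` and `emu_to_cm` are illustrative helpers):

```python
EMU_PER_INCH = 914_400
EMU_PER_CM = 360_000   # 914400 / 2.54


def cm_to_emu(cm: float) -> int:
    return int(round(cm * EMU_PER_CM))


def emu_to_cm(emu: int) -> float:
    return emu / EMU_PER_CM


# Cm(2.0) in python-docx corresponds to:
two_cm = cm_to_emu(2.0)          # 720000 EMU
# width=1.0 passed as a bare number is 1 EMU:
tiny = emu_to_cm(1)              # about 2.8e-06 cm
```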
To add pictures, headings, and paragraphs to an existing document:
from docx import Document
from docx.shared import Cm
doc = Document(full_path)  # open an existing document with existing styles
for row in tableData:  # list from the json api ...
    print('row {}'.format(row))
    level = row['level']
    levelStyle = 'Heading ' + str(level)
    title = row['title']
    heading = doc.add_heading(title, level)
    heading.style = doc.styles[levelStyle]
    p = doc.add_paragraph(row['description'])
    if row['img_http_path']:
        ip = doc.add_paragraph()
        r = ip.add_run()
        r.add_text(row['img_name'])
        r.add_text("\n")
        # add_picture() needs a local file path or a file-like object, not a URL
        r.add_picture(row['img_http_path'], width=Cm(15.0))
doc.save(full_path)
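If `img_http_path` actually holds an HTTP(S) URL, one option is to download the bytes into a `BytesIO` first, since `add_picture` accepts file-like objects. A minimal sketch using only the standard library; `fetch_image` is a hypothetical helper and error handling is kept to a minimum:

```python
import io
from urllib.request import urlopen


def fetch_image(url, timeout=10):
    """Download an image into an in-memory file object for add_picture()."""
    with urlopen(url, timeout=timeout) as resp:
        return io.BytesIO(resp.read())

# Usage inside the loop above:
# r.add_picture(fetch_image(row['img_http_path']), width=Cm(15.0))
```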
I am trying to learn Python and also create a web utility. One task I am trying to accomplish is creating a single HTML file which can be run locally but links to everything it needs to look like the original web page. (If you are going to ask why I want this: it may act as part of a utility I am creating, or if not, it's just for education.) So I have two questions, a theoretical one and a practical one:
1) Is this possible for visual (as opposed to functional) purposes? Can an HTML page work offline while linking to everything it needs online? Or is there something fundamental about the HTML file itself being served from the web server that makes this impossible? How far can I go with it?
2) I have started a Python script which de-relativises (I made that word up) linked elements on an HTML page, but I am a noob, so I have most likely missed some elements or attributes which also link to outside resources. After trying a few pages, I noticed that the one in the code below does not work properly: there appears to be a .js file which is not linking correctly (the first of many problems to come). Assuming the answer to my first question is at least a partial yes, can anyone help me fix the code for this website?
Thank you.
Update: I missed the script tag on this, but even after adding it, it still does not work correctly.
import lxml
import sys
from lxml import etree
from StringIO import StringIO
from lxml.html import fromstring, tostring
import urllib2
from urlparse import urljoin
site = "www.script-tutorials.com/advance-php-login-system-tutorial/"
output_filename = "output.html"
def download(site):
    response = urllib2.urlopen("http://"+site)
    html_input = response.read()
    return html_input
def derealitivise(site, html_input):
    active_html = lxml.html.fromstring(html_input)
    for element in tags_to_derealitivise:
        # e.g. "//img[@src]" selects every <img> that has a src attribute
        for tag in active_html.xpath(element + "[@src]"):
            tag.attrib["src"] = urljoin("http://"+site, tag.attrib.get("src"))
        for tag in active_html.xpath(element + "[@href]"):
            tag.attrib["href"] = urljoin("http://"+site, tag.attrib.get("href"))
    return lxml.html.tostring(active_html)
active_html = ""
tags_to_derealitivise = ("//img", "//a", "//link", "//embed", "//audio", "//video", "//script")
print "downloading..."
active_html = download(site)
active_html = derealitivise(site, active_html)
print "writing file..."
output_file = open (output_filename, "w")
output_file.write(active_html)
output_file.close()
Furthermore, I could make the code more thorough by checking all of the elements...
It would look something like this, but I do not know the exact way to iterate through all of the elements. This is a separate problem, and I will most likely figure it out by the time anyone responds:
def derealitivise(site, html_input):
    active_html = lxml.html.fromstring(html_input)
    # iter() walks every element in the tree, whatever its tag
    # (lxml.html also provides make_links_absolute(base_url) for this job)
    for tag in active_html.iter():
        for attr in ("src", "href"):
            if tag.get(attr):
                tag.set(attr, urljoin("http://"+site, tag.get(attr)))
    return lxml.html.tostring(active_html)
Update:
Thanks to Burhan Khalid's solution, which seemed too simple to be viable at first glance, I got it working. The code is so simple that most of you will not need it, but I will post it anyway in case it helps:
import lxml
import sys
from lxml import etree
from StringIO import StringIO
from lxml.html import fromstring, tostring
import urllib2
from urlparse import urljoin
site = "http://www.script-tutorials.com/advance-php-login-system-tutorial/"
output_filename = "output.html"
def download(site):
    response = urllib2.urlopen(site)
    html_input = response.read()
    return html_input
def derealitivise(site, html_input):
    # a <base> tag makes the browser resolve relative URLs against site
    active_html = html_input.replace('<head>', '<head> <base href="' + site + '">')
    return active_html
active_html = ""
print "downloading..."
active_html = download(site)
active_html = derealitivise(site, active_html)
print "writing file..."
output_file = open (output_filename, "w")
output_file.write(active_html)
output_file.close()
Despite all of this, and its great simplicity, the .js object running on the website listed in the script still does not load correctly. Does anyone know if this can be fixed?
You want only the HTML file to be offline, while the linked resources are loaded over the web.
This is a two-step process:
1. Copy the HTML file and save it to your local directory.
2. Add a BASE tag in the HEAD section, and point its href attribute at the absolute URL.
Since you want to learn how to do it yourself, I will leave it at that.
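The two steps above can be sketched in Python 3 with only the standard library. `add_base_tag` and `save_offline_copy` are hypothetical helpers; unlike a plain `str.replace('<head>', ...)`, the regex also matches `<HEAD>` or a `<head>` with attributes, and the URL is quoted and escaped. Assumptions: pages without a `<head>` tag are returned unchanged, and the copy is written as UTF-8 regardless of the page's original encoding.

```python
import re
from html import escape
from urllib.request import urlopen

# <head> or <head ...>, case-insensitive; \b keeps <header> from matching
_HEAD_RE = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_base_tag(html: str, base_url: str) -> str:
    """Insert <base href="..."> right after the opening <head> tag."""
    tag = f'<base href="{escape(base_url, quote=True)}">'
    return _HEAD_RE.sub(lambda m: m.group(0) + tag, html, count=1)


def save_offline_copy(url: str, filename: str = "output.html") -> None:
    """Download url and save it locally, with relative links resolved via <base>."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    with open(filename, "w", encoding="utf-8") as f:
        f.write(add_base_tag(html, url))
```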
@Burhan has an easy answer using the <base href="..."> tag in the <head>, and it works, as you have found out. I ran the script you posted, and the page downloaded fine. As you noticed, some of the JavaScript now fails. There can be multiple reasons for this.
If you are opening the HTML file as a local file:/// URL, the page may not work. Many browsers heavily sandbox local HTML files, not allowing them to perform network requests or examine local files.
The page may perform XMLHttpRequests or other network operations to the remote site, which will be denied for cross-domain security reasons. Looking in the JS console, I see the following error for the script you posted:
XMLHttpRequest cannot load http://www.script-tutorials.com/menus.php?give=menu. Origin http://localhost:8000 is not allowed by Access-Control-Allow-Origin.
Unfortunately, if you do not have control of www.script-tutorials.com, there is no easy way around this.
I need to convert markdown text to plain text to display a summary on my website. I want the code in Python.
Despite the fact that this is a very old question, I'd like to suggest a solution I came up with recently. This one neither uses BeautifulSoup nor has an overhead of converting to html and back.
The markdown module's core class Markdown has a property output_formats, which is not configurable but is patchable, like almost anything in Python. This property is a dict mapping an output format name to a rendering function. By default it has two output formats, 'html' and 'xhtml'. With a little help it can have a plain-text rendering function, which is easy to write:
from markdown import Markdown
from io import StringIO
def unmark_element(element, stream=None):
    if stream is None:
        stream = StringIO()
    if element.text:
        stream.write(element.text)
    for sub in element:
        unmark_element(sub, stream)
    if element.tail:
        stream.write(element.tail)
    return stream.getvalue()


# patching Markdown
Markdown.output_formats["plain"] = unmark_element
__md = Markdown(output_format="plain")
__md.stripTopLevelTags = False


def unmark(text):
    return __md.convert(text)
The unmark function takes markdown text as input and returns it with all the markdown characters stripped out.
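To see what `unmark_element` does on its own, you can feed it an ElementTree element directly; Python-Markdown builds its document tree with `xml.etree.ElementTree`, which is what the patched output format receives. The recursion writes each element's `text`, then its children, then its `tail`, so tags are dropped and only character data remains. A standalone sketch (no markdown package needed; the sample tree only approximates what markdown would build):

```python
from io import StringIO
import xml.etree.ElementTree as ET


def unmark_element(element, stream=None):
    """Concatenate all text in an element tree, dropping the tags."""
    if stream is None:
        stream = StringIO()
    if element.text:
        stream.write(element.text)
    for sub in element:
        unmark_element(sub, stream)
    if element.tail:
        stream.write(element.tail)
    return stream.getvalue()


# Roughly the tree markdown builds for "**A** and [B](http://example.com)"
root = ET.fromstring(
    '<div><p><strong>A</strong> and <a href="http://example.com">B</a></p></div>'
)
plain = unmark_element(root)   # "A and B"
```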
The Markdown and BeautifulSoup (now called beautifulsoup4) modules will help do what you describe.
Once you have converted the markdown to HTML, you can use an HTML parser to strip out the plain text.
Your code might look something like this:
from bs4 import BeautifulSoup
from markdown import markdown
html = markdown(some_markdown_string)
text = ''.join(BeautifulSoup(html, 'html.parser').find_all(string=True))
This is similar to Jason's answer, but handles comments correctly.
import markdown # pip install markdown
from bs4 import BeautifulSoup # pip install beautifulsoup4
def md_to_text(md):
    html = markdown.markdown(md)
    soup = BeautifulSoup(html, features='html.parser')
    return soup.get_text()


def example():
    md = '**A** [B](http://example.com) <!-- C -->'
    text = md_to_text(md)
    print(text)
    # Output: A B
I deleted my earlier comment because I think I finally see the rub here: it may be easier to convert your markdown text to HTML and then remove the HTML from the text. I'm not aware of anything that removes markdown from text effectively, but there are many HTML-to-plain-text solutions.
I came here while searching for a way to create so-called GitLab Releases via an API call. I hope this matches the original questioner's use case.
I decoded markdown to plain text (including whitespace in the form of \n etc.) this way:
def get_description(path="release_note.md"):  # function name is illustrative
    with open(path, 'r', encoding='utf-8') as file:
        release_note = file.read()
    description = bytes(release_note, 'utf-8')
    # note: encoding and then decoding returns the same str; no markdown is removed
    return description.decode("utf-8")