Saving Ipython Notebook with html figures as pdf? - python

I am using IPython Notebook and I would like to save the notebook as a PDF. When a notebook contains HTML figures in Markdown cells, I cannot export them.
For example:
<img src='http://draftingmanuals.tpub.com/14262/img/14262_140_2.jpg'>
represents the following:
However, when I download the notebook as PDF via LaTeX (.pdf), the result does not contain the figure:
Is this a bug or can I avoid this somehow?

This is not really a bug, but a known limitation. There are actually two issues in your example:
1. The raw HTML <img> tag gets stripped when pandoc converts the Markdown cells to LaTeX (see the pandoc documentation).
2. You link to a remote image, which is (currently) not downloaded prior to the conversion.
Thus, it is a bit tricky to get what you want. The first issue may be overcome with a custom filter and a custom template. For the second, you may need a custom preprocessor.
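As a rough illustration of the first issue, the text transformation that such a filter or preprocessor could apply is sketched below. This is only an assumption about one possible approach, not the nbconvert API itself: the names IMG_TAG and img_tags_to_markdown are made up for this example, and the code only rewrites a Markdown string so that raw <img> tags become ordinary Markdown images, which pandoc does carry over to LaTeX.

```python
import re

# Matches a raw HTML <img ...> tag and captures the value of its src attribute.
# Assumption: src is quoted with ' or "; all other attributes are discarded.
IMG_TAG = re.compile(
    r"""<img\b[^>]*?\bsrc\s*=\s*(['"])(?P<src>.*?)\1[^>]*?/?>""",
    re.IGNORECASE,
)


def img_tags_to_markdown(markdown_source: str) -> str:
    """Replace each raw <img> tag with the Markdown form ![](src)."""
    return IMG_TAG.sub(lambda m: f"![]({m.group('src')})", markdown_source)
```

Note that width and alignment attributes are lost in this rewrite, and a remote src still has to be downloaded separately before the conversion.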
Alternatively, you could use Python with urllib (see e.g. "Downloading a picture via urllib and python") to download the image and matplotlib to display it in a code cell. Images embedded this way are converted fine.
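A minimal sketch of this alternative, assuming matplotlib (with Pillow for JPEG support) is installed in the notebook environment. The helper names download_image and show_image are invented for this example.

```python
import urllib.request


def download_image(url: str, filename: str) -> str:
    """Save the resource at `url` to `filename` and return the local path."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        out.write(response.read())
    return filename


def show_image(filename: str) -> None:
    """Display a local image file as a matplotlib figure."""
    # Imported here so that downloading works even without matplotlib installed.
    import matplotlib.image as mpimg
    import matplotlib.pyplot as plt

    plt.imshow(mpimg.imread(filename))
    plt.axis("off")  # hide the axes so only the picture is shown
    plt.show()
```

In a code cell you would then call something like show_image(download_image(url, "figure.jpg")) with the image URL from the question. The figure becomes part of the cell output, so it does not depend on the Markdown conversion.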

Related

minimize my nbconvert html output screen size

I searched a lot for this issue but didn't come to any straight to the point answer, so I am turning to you here and hopefully someone can help direct me to the right path at least.
The issue is simple: I have a normal Jupyter notebook and I would like to share it with others by sending them an HTML file. Using the normal !jupyter nbconvert --to html mynotebook.ipynb will produce the HTML export, but recently I started getting very wide output (it uses the full width of the monitor).
How can I change the output screen size to maintain the same configurations even after exporting it to html?
In case my explanation wasn't clear I will add pictures:
notebook before exporting:
After exporting:
I don't want to create any special template, I just want to keep the same parameters as before exporting, i.e. the width of the cells (inputs and outputs). Most of the answers I found here were talking about creating my own template or running some CSS code (I have no knowledge of either...). Is there a ready-to-use template or argument that I can use to maintain the same layout?
I ran into the same issue. I think the problem might be that --to html uses the Jupyter Lab template by default. Once I added --template classic to my call to nbconvert the resulting HTML-file was much smaller and resembled the actual Jupyter notebook much more closely.
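If the export is scripted rather than typed into a cell, the same call can be assembled in Python. This is a small sketch assuming nbconvert 6 or later (where the classic template is available) and a jupyter executable on the PATH. The function names are hypothetical.

```python
import subprocess
from typing import List


def nbconvert_html_command(notebook: str, template: str = "classic") -> List[str]:
    """Build the nbconvert call that exports `notebook` to HTML with `template`."""
    return ["jupyter", "nbconvert", "--to", "html", "--template", template, notebook]


def export_html(notebook: str, template: str = "classic") -> None:
    """Run the export; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_html_command(notebook, template), check=True)
```

Calling export_html("mynotebook.ipynb") is then equivalent to running jupyter nbconvert --to html --template classic mynotebook.ipynb.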

Jupyter: nbconvert does not convert HTML image to LaTeX

I love using jupyter notebooks to document topics for my physics course, so I am having my students use CoCalc and either Markdown or Jupyter notebook to write their lab reports. I have a problem.
In a Jupyter notebook, the Markdown way to insert an image is ![Two slit](twoSlit1.png), for example. However, you cannot control the size, location and wrapping. Stack Exchange helpfully has many suggestions for controlling the size.
However, when I try to convert it to pdf, the images get lost if they are inserted using HTML. For example
![Two slit](twoSlit1.png)
converted with jupyter nbconvert Example.ipynb --to html works fine, but with no control over image size or placement. I can then print this to PDF, but it does not have a good "document" look. But
<img align="right" src="twoSlit1.png" width="200" />
converted with jupyter nbconvert Example.ipynb --to pdf gives a PDF rendering, but the images are missing. I tried many different HTML image codings, but none work with nbconvert --to pdf.
The CoCalc File -> Download as PDF via LaTeX (.pdf)... does not render the images either.
I get similar results on CoCalc if I create a Markdown document and use pandoc Example1.md -o Example1.pdf to create a pdf file.
BTW, both render the equations perfectly.
I also get the same results on my Mac running jupyter locally (Anaconda distribution, python 3.7)
I got prompt answers from William Stein and Harald Schilly at CoCalc when I posted a support request. Here is my answer.
The issue is that the Markdown (and probably the jupyter notebook Markdown cells) Preview pane is rendered in the frontend, not using pandoc in the backend, so when I used pandoc from the terminal I got a different result.
I could get HTML that looks reasonable by putting an HTML <style> tag at the beginning of the Markdown document and putting CSS code in the <style> tag. Then I put my image in an HTML <div> that used the CSS to set its size, float it right, add a caption, etc. I also reduced the size of the header fonts, set the font-family to serif, etc. in the CSS. Then the Preview looked OK, but there were still problems when I printed the Preview and saved it as PDF. So:
1. Just use Markdown, with HTML for only a few things: a) a <div class="center"> to center the title, author, ...; b) a <blockquote> to style the abstract; c) a <div class=figure width="280px"> to insert floating figures.
2. Create a file style.css and run pandoc from the command line using
pandoc Example.md -s --css="style.css" --mathjax -o Example.html
3. Open Example.html in CoCalc, use the print icon in the rendered pane, and save the PDF file.
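For anyone who wants to script these steps, here is a hedged sketch in Python. The CSS rules are placeholders, since the actual style.css is not shown here, and the function names are hypothetical. Only the pandoc command line is taken from the steps above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Hypothetical stylesheet: the original style.css is not shown, so these
# rules are placeholders for the effects described in the text.
EXAMPLE_CSS = """\
body { font-family: serif; }
h1 { font-size: 1.5em; }
div.center { text-align: center; }
div.figure { float: right; margin: 0 0 1em 1em; }
"""


def write_stylesheet(path: str = "style.css") -> str:
    """Write the placeholder CSS to `path` and return the path."""
    Path(path).write_text(EXAMPLE_CSS, encoding="utf-8")
    return path


def pandoc_html_command(source: str, css: str = "style.css") -> List[str]:
    """Build the pandoc call from the steps above for a Markdown file `source`."""
    target = str(Path(source).with_suffix(".html"))
    return ["pandoc", source, "-s", f"--css={css}", "--mathjax", "-o", target]


def build_html(source: str, css: str = "style.css") -> None:
    """Run pandoc; assumes the pandoc executable is installed and on PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_html_command(source, css), check=True)
```

Running write_stylesheet() followed by build_html("Example.md") should produce Example.html, which can then be opened and printed to PDF as in the last step.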
I am satisfied. The students can get decent reports without the extra labor of working with LaTeX and with a minimum of HTML. I give them the CSS file.

Can't get text out of PDF file with PyPDF2

I am trying to use PyPDF2 to get the text from a PDF file I downloaded.
Here is my code:
import PyPDF2
reader = PyPDF2.PdfFileReader('download.pdf')
if not reader.isEncrypted:
    reader.getPage(0).extractText()
This is the output:
'\n\n˘ˇ˘ˆ˙\n˝˛˚˜!\n\n\n\n#\nˇ˘ˆ˙ˆ˝˛˝\n˙˙˘ ˘ˆ"˝\n$!%˙(˝)˙*˜+,˝-.#/.(#0)0)/.1.+02345.\n˛˛ˇ/#.$/0/70/#.+322.32˙˘˛˘˘\n˛˘ 8˙˘9:˘ˆ;\n˛˘\n\n˝=\n˙˘˛\n.ˇ<9:˘ˇˇ%˘˛ˇ ˘˘<˘\n˝>"?˝˘$#<˘*ˆˆ˘˙˘A˘B˘˙˘˛ˇ!˛˘˙˘˛ˇ˘\n1C˙ˆ˘06˛˘8+˛9:˘D10+E˝ˆ˘8\n$˘˘9:˘˘1C˙ˆ˘+˘F˛˘D$1+FE˝˘˛˘˘<˘?˝\n////)*˘1˘˛ ?GG˜*HI\nD˘˙A˘E\nJ$\n˛\nDLE///M˛˝˛˙˘˛˘˛\n˛˘˛>"?\n˙˘˛\n˛\n/)M6;˝˛˙˘˛˘\n˛\n///˛\n\n'
When I open the file its content is fine. Also when I use another program to transform pdf into txt it works fine. It is a javascript rendered pdf on a webpage, don't know if it makes any difference.
Under Win 7, Python 3.6, I had the problem that PyPDF2 did not properly encode some PDF files. My solution was to use pdfminer.six.
pip install pdfminer.six
To extract text from a PDF, you can use functions such as the one in this post: https://stackoverflow.com/a/42154976/9524424
Worked perfect for me...
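For reference, a minimal sketch of what such a function can look like with the high-level API of pdfminer.six. This is not the code from the linked post. The wrapper name extract_pdf_text and its pages parameter are made up for this example, while extract_text and its page_numbers argument come from pdfminer.high_level.

```python
from typing import List, Optional


def extract_pdf_text(path: str, pages: Optional[List[int]] = None) -> str:
    """Return the text of the PDF at `path`; `pages` is a zero-based page list."""
    # Requires `pip install pdfminer.six`; imported lazily so that this
    # module can still be loaded when the package is missing.
    from pdfminer.high_level import extract_text

    return extract_text(path, page_numbers=pages)
```

For the file from the question, extract_pdf_text("download.pdf", pages=[0]) would request the text of the first page only.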
The following is taken from the documentation (https://pythonhosted.org/PyPDF2/PageObject.html)
extractText() Locate all text drawing commands, in the order they are
provided in the content stream, and extract the text. This works well
for some PDF files, but poorly for others, depending on the generator
used. This will be refined in the future. Do not rely on the order of
text coming out of this function, as it will change if this function
is made more sophisticated. Returns: a unicode string object.
So, it seems that the performance of this function depends on the pdf itself.

save ipython html object as pdf

I'm plotting some data using shap, which returns (in the console) an <IPython.core.display.HTML object>. I would like to convert the object to a PDF and save it.
I do understand that there is some hassle, as I probably would need to simulate a browser to open and view the HTML in.
What would be the most straight-forward way of storing the html output as PDF?
Please note that I am not inside a browser/IPython notebook, and I do not want to create and convert a whole IPython notebook.

Is it possible to redirect cell output in jupyter

I am using jupyter and jupyter-nbconvert to create a html presentation. However, I have some cells that produce an output image that I want to share on a separate slide. Is it possible to redirect the output of one cell to its own slide?
You might want to consider using Damián Avila's Jupyter extension RISE. It provides some of the control you need over how cells are displayed in slides.
It is flagged by the latest Jupyter (3.6) as possibly not compatible, but I've seen no problems using it so far.
