Get only the code out of Jupyter Notebook - python

Is there a solution to pull out all the code of the notebook?
For example, if I wanted to generate a source file of my notebook "source.py" that contained all the code in the code cells of the notebook, is that possible?
Thanks!

nbconvert
You can use the command line tool nbconvert to convert the ipynb file to various other formats.
The easiest way to convert it to a .py file is:
jupyter nbconvert --no-prompt --to script notebook_name.ipynb
It outputs only the code, turning markdown cells into comments and dropping the input and output prompts. There is also a --stdout option that writes the result to standard output instead of a file.
nbconvert documentation
jq
But you can also just parse the JSON of the notebook using jq:
jq -j '
.cells
| map( select(.cell_type == "code") | .source + ["\n\n"] )
| .[][]
' \
notebook.ipynb > source.py
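If jq isn't available, the same extraction can be done with Python's standard json module. The sketch below mirrors the jq filter above (code cells only, each followed by a blank line) and also accepts a source stored as a single string rather than a list of lines. extract_code is a hypothetical helper name.

```python
import json
from pathlib import Path


def extract_code(notebook_path: str) -> str:
    """Concatenate the source of every code cell, like the jq filter."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook JSON may store source as a list of lines or one string
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source + "\n\n")
    return "".join(chunks)
```

For example, sys.stdout.write(extract_code("notebook.ipynb")) in a small script, redirected to source.py, reproduces the jq command.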
jq homepage
Jupyter Notebook format

You can use File > Download as > Python (.py); this should export all code cells as a single .py file.

If you are using JupyterLab, the option is:
File > Export Notebook As > Executable Script

Since the notebook format is JSON it's relatively easy to extract just the text content of only the code cells. The task is made even easier when you use the Python API for working with notebook files.
The following will get you the code on standard output. You can handle it in other ways similarly easily. Bear in mind code source may not have a terminating newline.
from nbformat import read, NO_CONVERT
with open("Some Notebook.ipynb") as fp:
    notebook = read(fp, NO_CONVERT)
cells = notebook['cells']
code_cells = [c for c in cells if c['cell_type'] == 'code']
for cell in code_cells:
    print(cell['source'])
Notebook nodes are a little more flexible than dictionaries, though, and allow attribute (.name) access to fields as well as subscripting (['name']). As a typing-challenged person I find it preferable to write
cells = notebook.cells
code_cells = [c for c in cells if c.cell_type == 'code']
for cell in code_cells:
    print(cell.source)
In answering this question I became aware that the nbformat library has been unbundled, and can therefore be installed with pip without the rest of Jupyter.
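Building on the nbformat answer, here is a sketch of a helper (cells_to_script, a hypothetical name) that deals with the missing terminating newline and produces something closer to an importable source.py. Commenting out IPython magics (%...) and shell escapes (!...) is an assumption about what you want; nbconvert instead rewrites them as get_ipython() calls. The check is a crude line-prefix test, so a continuation line that happens to start with % inside parentheses would also be commented out.

```python
def cells_to_script(cells) -> str:
    """Join code cells into one script with a newline after every cell.

    `cells` can be notebook.cells from nbformat (its nodes behave like dicts)
    or the list parsed from the raw .ipynb JSON with the json module.
    """
    parts = []
    for cell in cells:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):  # raw JSON may store a list of lines
            source = "".join(source)
        lines = []
        for line in source.splitlines():
            # Assumption: comment out magics and shell escapes so the
            # resulting file is plain Python
            if line.lstrip().startswith(("%", "!")):
                line = "# " + line
            lines.append(line)
        # splitlines() drops the final newline, so add one back explicitly
        parts.append("\n".join(lines) + "\n")
    # A blank line between cells keeps them visually separate
    return "\n".join(parts)
```

With the notebook loaded through nbformat as above, writing cells_to_script(notebook.cells) to source.py gives one file with every cell terminated by a newline.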

There is an "ugly" solution. Select all the cells of your notebook. Merge them, then just copy and paste all the code.

Related

Python library for dynamic documents

I want to write a script that generates reports for each team in my unit, where each report uses the same template but the numbers specific to each team are used in each report. The report should be in a format like .pdf that non-programmers know how to open and read. This is in many ways similar to rmarkdown for R, but the reports I want to generate are based on data from code already written in Python.
The solution I am looking for does not need to export directly to pdf. It can export to markdown and then I know how to convert. I do not need any fancier formatting than what markdown provides. It does not need to be markdown, but I know how to do everything else in markdown, if I only find a way to dynamically populate numbers and text in a markdown template from python code.
What I need is something similar to the code block below, but on a bigger scale, and instead of printing output to the screen it would be saved to a file (.md or .pdf) that can then be shared with each team.
user = {'name':'John Doe', 'email':'jd#example.com'}
print('Name is {}, and email is {}'.format(user["name"], user["email"]))
So the desired functionality, heavily influenced by my previous experience using rmarkdown, would look something like the code block below, where the template is a string (or a file read as a string) with placeholders that are populated from variables (or dicts or objects) in the Python code. The output can then be saved and shared with the teams.
user = {'name':'John Doe', 'email':'jd#example.com'}
template = 'Name is `user["name"]`, and email is `user["email"]`'
output = render(template, user)
When trying to find an rmarkdown equivalent in Python, I found a lot of pointers to Jupyter Notebook, which I am familiar with and very much like, but it is not what I am looking for, as the point is not to share the code, only the rendered output.
Since this question was upvoted, I want to answer my own question, as I found a solution that was perfect for me. In the end I shared these reports in a repo, so I write the reports in markdown and do not convert them to PDF. I still think this answers my original question, because it works much like creating markdown with R Markdown, which was the core of my question, and markdown can easily be converted to PDF.
I solved this by using a templating library meant for generating HTML pages on the backend. I happened to use jinja2, but there are many other options.
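For the simplest case in the question (flat placeholders, no loops), the standard library's string.Template is one of those other options and needs no extra package. This sketch implements the question's hypothetical render() with $-style placeholders instead of backticks; it cannot do loops or nested lookups, which is where jinja2 earns its keep.

```python
from string import Template


def render(template_text: str, values: dict) -> str:
    """Fill $placeholders in template_text from values (hypothetical helper)."""
    # substitute() raises KeyError when a placeholder has no matching key,
    # which catches typos in the template early
    return Template(template_text).substitute(values)


user = {"name": "John Doe", "email": "jd@example.com"}
template = "Name is $name, and email is $email"
output = render(template, user)
```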
First you need a template file in markdown. Let's say this is template.md:
## Overview
**Name:** {{repo.name}}<br>
**URL:** {{repo.url}}
| Branch name | Days since last edit |
|---|---|
{% for branch in repo.branches %}
|{{branch[0]}}|{{branch[1]}}|
{% endfor %}
Then use it in your Python script:
from jinja2 import Template
# Create a dict with all the data that will populate the template.
# Jinja2 resolves {{repo.name}} by falling back to repo['name'] for dicts.
repo = {
    'name': 'training-kit',
    'url': 'https://github.com/github/training-kit',
    'branches': [
        ['master', 15],
        ['dev', 2],
    ],
}
# Render the template
with open('template.md', 'r', encoding='utf-8') as file:
    template = Template(file.read(), trim_blocks=True)
rendered_file = template.render(repo=repo)
# Output the file
with open('report.md', 'w', encoding='utf-8') as output_file:
    output_file.write(rendered_file)
If you are OK with your dynamic document being in markdown, you are done: the report is written to report.md. If you want a PDF, you can use pandoc to convert it.
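Since the original goal was one report per team, here is a dependency-free sketch of that loop. It builds the same table rows the {% for %} block in template.md produces and writes one markdown file per team. The teams data, the build_report/write_reports names and the <name>.md file naming are hypothetical; with jinja2 you would call template.render(repo=repo) inside the same loop instead of build_report.

```python
from pathlib import Path

# Hypothetical input: one dict per team, shaped like the repo dict above
teams = [
    {"name": "training-kit", "url": "https://github.com/github/training-kit",
     "branches": [["master", 15], ["dev", 2]]},
    {"name": "other-team", "url": "https://example.com/other-team",
     "branches": [["main", 3]]},
]


def build_report(repo: dict) -> str:
    """Return the markdown for one team, laid out like template.md."""
    rows = [f"|{branch}|{days}|" for branch, days in repo["branches"]]
    lines = [
        "## Overview",
        f"**Name:** {repo['name']}<br>",
        f"**URL:** {repo['url']}",
        "",  # blank line so markdown renderers start the table as a new block
        "| Branch name | Days since last edit |",
        "|---|---|",
        *rows,
    ]
    return "\n".join(lines) + "\n"


def write_reports(out_dir: str) -> list:
    """Write <name>.md for every team into out_dir and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for repo in teams:
        path = target / f"{repo['name']}.md"
        path.write_text(build_report(repo), encoding="utf-8")
        paths.append(path)
    return paths
```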
I would strongly recommend installing and using the pyFPDF library, which lets you write and export PDF files directly from Python. The library was ported from PHP and offers the same functionality as its PHP counterpart.
1.) Clone and install pyFPDF
Git-Bash:
git clone https://github.com/reingart/pyfpdf.git
cd pyfpdf
python setup.py install
2.) After a successful installation, you can write Python code much as you would with FPDF in PHP, for example:
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_xy(0, 0)
pdf.set_font('arial', 'B', 13.0)
pdf.cell(ln=0, h=5.0, align='L', w=0, txt="Hello", border=0)
pdf.output('myTest.pdf', 'F')
For more Information, take a look at:
https://pypi.org/project/fpdf/
To work with pyFPDF clone repo from: https://github.com/reingart/pyfpdf
pyFPDF Documentation:
https://pyfpdf.readthedocs.io/en/latest/Tutorial/index.html

Jupyter nbconvert LaTex Export Theme

I am using Jupyter Notebook nbconvert (the Save as menu) to export as PDF via LaTeX. However, the PDF file is not in good shape. For example, some wide tables are not shown well. I would prefer tables to be resized to fit the width of the page. Is there any style or template I can use to get nice reports, and how can I ask nbconvert to use that style?
Here is the Latex output:
I would like something like this:
Looks like Pandas gained a ._repr_latex_() method in version 0.23. You'll need to set pd.options.display.latex.repr=True to activate it.
Without latex repr:
With latex repr:
Check out the options to get the formatting close to what you want. In order to match your desired output exactly, you'll need to use a custom latex template.
Edited to provide more information on templates:
Start here for general information about templates. You can create a .tplx file in the same directory as your notebook and specify it as the template when running nbconvert from the command line: !jupyter nbconvert --to latex 'example.ipynb' --stdout --template=my_custom_template.tplx.
Alternatively, you can set a default template for LaTeX export via the menu by modifying the jupyter_notebook_config.py file in your ~/.jupyter directory. If this file doesn't exist yet, you can generate it by running jupyter notebook --generate-config from the command line. I keep my template in the ~/.jupyter directory as well, so I added the following to my jupyter_notebook_config.py:
# Insert this at the top of the file to allow you to reference
# a template in the ~/.jupyter directory
import os.path
import sys
sys.path.insert(0, os.path.join(os.path.expanduser("~"), ".jupyter"))
# Insert this at the bottom of the file.
# `c` is the config object Jupyter provides when it loads this file
# (these are nbconvert 5.x trait names; newer releases reorganized templates).
c.LatexExporter.template_file = 'my_template'  # no .tplx extension here
c.LatexExporter.template_path = ['.', os.path.join(os.path.expanduser("~"), ".jupyter")]  # nbconvert will look in ~/.jupyter
To understand a bit about how the templates work, start by taking a look at null.tplx. The line ((*- for cell in nb.cells -*)) loops over all the cells in the notebook. The if statements that follow check the type of each cell and call the appropriate block.
The other templates extend null.tplx. Each template defines (or redefines) some of the blocks. The hierarchy is null->display_priority->document_contents->base->style_*->article.
Your custom template should probably extend article.tplx and add some LaTeX commands to the header that set up the tables the way you want. Take a look at this blog post for an example of setting up a custom template.
Is there any setting that changes the table size to fit the width of the page?
The LaTeX code would be something like this: \resizebox*{\textwidth}{!}{%
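One way to apply that \resizebox idea without writing a custom template is to post-process the LaTeX string for a table (for instance the output of pandas' DataFrame.to_latex()) before it goes into the document. A minimal sketch, assuming the document loads the graphicx package (which provides \resizebox) and that tabular environments are not nested; fit_tabular_to_width is a hypothetical helper.

```python
import re

# Matches one tabular environment, across lines, non-greedily
_TABULAR = re.compile(r"\\begin\{tabular\}.*?\\end\{tabular\}", re.DOTALL)


def fit_tabular_to_width(latex: str) -> str:
    """Wrap each tabular environment in a resizebox scaled to the text width."""
    # A function replacement avoids re.sub interpreting backslashes
    # in the LaTeX as escape sequences
    return _TABULAR.sub(
        lambda m: "\\resizebox{\\textwidth}{!}{%\n" + m.group(0) + "\n}",
        latex,
    )
```

The ! keeps the aspect ratio. Note that \resizebox also scales narrow tables up to the full width, so you may want to apply it only to the wide ones.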

log jupyter notebook and get output and input number?

I want to write a tutorial for Python (assuming basic Python knowledge) and make a PDF version using LaTeX. I want to run a Jupyter notebook session, write code step by step, and print the steps in the PDF.
I want to get something like this:
In [12]: a = 'foo'
In [13]: type(a)
Out[13]: str
When I use magic code %logstart -o myfile.py I get something like this:
2+2
#[Out]# 4
Is there any way to log my code in the first style?
Exporting the notebook as a web page with File > Download as > HTML will save the contents as they are presented on the screen.
You can then convert that file to .tex with the pandoc command-line utility and edit it:
pandoc notebook.html -o notebook.tex
After you're done editing the file (it's bound to have a few errors, unfortunately), you can create a PDF; pandoc needs a LaTeX engine such as pdflatex installed for this step:
pandoc notebook.tex -o notebook.pdf
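If you would rather skip the HTML round trip, the .ipynb file already stores each cell's execution count and outputs, so a console-style transcript like the one in the question can be generated directly from the JSON. A sketch, assuming nbformat v4 field names (execution_count, outputs, output_type, text/plain); transcript and transcript_from_file are hypothetical helpers, and rich outputs such as images or HTML are ignored.

```python
import json


def _text(value) -> str:
    # Notebook JSON stores multi-line strings either as a list or a string
    return "".join(value) if isinstance(value, list) else value


def transcript(nb: dict) -> str:
    """Render code cells as 'In [n]:' / 'Out[n]:' lines, IPython-console style."""
    out_lines = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        n = cell.get("execution_count")
        label = " " if n is None else str(n)  # unexecuted cells show In [ ]
        src = _text(cell.get("source", "")).splitlines() or [""]
        prompt = f"In [{label}]: "
        out_lines.append(prompt + src[0])
        # Continuation lines are indented to line up under the first line
        out_lines.extend(" " * len(prompt) + line for line in src[1:])
        for output in cell.get("outputs", []):
            kind = output.get("output_type")
            if kind == "execute_result":
                text = _text(output.get("data", {}).get("text/plain", ""))
                out_lines.append(f"Out[{label}]: {text}")
            elif kind == "stream":
                # Printed output appears without a prompt, as in the console
                out_lines.append(_text(output.get("text", "")).rstrip("\n"))
    return "\n".join(out_lines)


def transcript_from_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return transcript(json.load(f))
```

The resulting text can go into a LaTeX verbatim environment in the tutorial.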

How do I save a Beaker notebook as straight python/r/...?

I just discovered Beaker Notebook. I love the concept, and am desperately keen to use it for work. To do so, I need to be sure I can share my code in other formats.
Question
Say I write pure Python in a Beaker notebook:
Can I save it as a .py file as I can in iPython Notebook/Jupyter?
Could I do the same if I wrote a pure R Beaker notebook?
If I wrote a mixed (polyglot) notebook with Python and R, can I save this to e.g. Python, with R code present but commented out?
Let's say none of the above are possible. Looking at the Beaker Notebook file as text, it seems to be saved as JSON. I can even find the cells that correspond to, e.g., Python and R. It doesn't look like it would be too challenging to write a Python script that does 1-3 above. Am I missing something?
Thanks!
PS - there's no Beaker notebook tag!? bad sign...
It's really not that hard to replicate the basics of the export:
#' Save a beaker notebook cell type to a file
#'
#' @param notebook path to the notebook file
#' @param output path to the output file (NOTE: this file will be overwritten)
#' @param cell_type which cells to export
save_bkr <- function(notebook="notebook.bkr",
                     output="saved.py",
                     cell_type="IPython") {
  nb <- jsonlite::fromJSON(notebook)
  tmp <- subset(nb$cells, evaluator == cell_type)$input
  if (length(tmp) != 0) {
    unlink(output)
    purrr::walk(tidyr::unnest(tmp, body), cat, file=output, append=TRUE, sep="\n")
  } else {
    message("No cells found matching cell type")
  }
}
I have no idea what Jupyter does with the "magic" stuff (gosh how can notebook folks take that seriously with a name like "magic").
This can be enhanced greatly, but it gets you the basics of what you asked.
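For the Python side of the question, here is a rough port of the R function above using only the standard library. It assumes the .bkr layout the R code relies on (a top-level cells list whose code cells carry evaluator and input.body); that layout is inferred from the R code, not checked against a Beaker specification. The comment_others option is a hypothetical addition aimed at question 3: code from other evaluators is kept but commented out with '# ', which happens to be the comment marker in both Python and R.

```python
import json


def save_bkr(notebook="notebook.bkr", output="saved.py",
             cell_type="IPython", comment_others=False):
    """Export cells whose evaluator is cell_type; return how many matched.

    comment_others (hypothetical option): keep code from other evaluators,
    commented out line by line with '# [evaluator] '.
    """
    with open(notebook, encoding="utf-8") as f:
        nb = json.load(f)
    chunks, matched = [], 0
    for cell in nb.get("cells", []):
        evaluator = cell.get("evaluator")
        body = cell.get("input", {}).get("body")
        if evaluator is None or body is None:
            continue  # cells without code input (e.g. section headers)
        lines = body if isinstance(body, list) else body.splitlines()
        if evaluator == cell_type:
            matched += 1
            chunks.append("\n".join(lines))
        elif comment_others:
            chunks.append("\n".join(f"# [{evaluator}] {line}" for line in lines))
    if matched == 0:
        print("No cells found matching cell type")
        return 0
    # Like unlink() followed by appending in the R version, this overwrites output
    with open(output, "w", encoding="utf-8") as f:
        f.write("\n".join(chunks) + "\n")
    return matched
```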

How to use the content of an IPython notebook Markdown cell in Python

In IPython one can get previous outputs and inputs via the Out[n] and In[n] variables. Is it possible to take the contents of a Markdown notebook cell and use it in Python?
I would like to write some text in a Markdown cell
This is Markdown I would like to manipulate with.
Then I would like to use this text in the next python cell
md_cell = ???
print md_cell.replace("Markdown", "Markup")
... # do stuff, write it to a file, be happy
to do something with it.
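One workaround, assuming the notebook has been saved to disk: the .ipynb file is JSON, so a code cell can read it and pull out the markdown cells. This is a sketch, not an IPython feature; markdown_cells is a hypothetical helper, and the notebook's filename has to be passed in explicitly.

```python
import json


def markdown_cells(notebook_path: str) -> list:
    """Return the source of every markdown cell in a saved .ipynb file.

    Reads the file on disk, so edits made since the last save are not seen.
    """
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    result = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            src = cell.get("source", "")
            # Source may be stored as a list of lines or as one string
            result.append("".join(src) if isinstance(src, list) else src)
    return result
```

For example, md_cell = markdown_cells("my_notebook.ipynb")[0] (a hypothetical filename) followed by print(md_cell.replace("Markdown", "Markup")).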
