Finding output cells causing large file size in jupyter notebook - python

I have a jupyter notebook which has ~400 cells. The total file size is 8MB so I'd like to suppress the output cells that have a large size so as to reduce the overall file size.
There are quite a few possible output cells that could be causing this (mainly matplotlib and seaborn plots) so to avoid spending time on trial and error, is there a way of finding the size of each output cell? I'd like to keep as many output plots as possible as I'll be pushing the work to github for others to see.

Here's my idea with nbformat, spelled out so it can be run in a cell of a Jupyter notebook. It lists the code cell numbers from largest to smallest output size (it fetches an example notebook first so there's something to try it on):
############### Get test notebook ########################################
import os
notebook_example = "matplotlib3d-scatter-plots.ipynb"
if not os.path.isfile(notebook_example):
    !curl -OL https://raw.githubusercontent.com/fomightez/3Dscatter_plot-binder/master/matplotlib3d-scatter-plots.ipynb

### Use nbformat to get estimate of output size from code cells. #########
import nbformat as nbf
ntbk = nbf.read(notebook_example, nbf.NO_CONVERT)
size_estimate_dict = {}
for cell in ntbk.cells:
    if cell.cell_type == 'code':
        size_estimate_dict[cell.execution_count] = len(str(cell.outputs))
out_size_info = [k for k, v in sorted(size_estimate_dict.items(), key=lambda item: item[1], reverse=True)]
out_size_info
(To have a place to easily run that code, go here and click on the launch binder button. When the session spins up, open a new notebook, paste in the code, and run it. A static form of the notebook is here.)
The example I tried didn't include Plotly, but the approach seemed to work similarly on a notebook made entirely of Plotly plots. I don't know how it will handle a mix of output types, though; the sorting may not be perfect when different kinds are combined.
Hopefully this gives you an idea of how to do what you asked. The code could be further expanded to use the retrieved size estimates to have nbformat make a copy of the input notebook with the output suppressed for, say, the ten largest code cells.
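The suggested extension can be sketched as follows. A tiny notebook is built in memory here purely for illustration (the cell contents are made up); on a real file you would use nbf.read() as above. The top-N largest code cells get their outputs blanked before writing a smaller copy:

```python
import nbformat as nbf

# Build a small demo notebook in memory (made-up content, for illustration).
nb = nbf.v4.new_notebook()
for n, text in enumerate(["small", "x" * 5000, "y" * 9000], start=1):
    cell = nbf.v4.new_code_cell(f"print({text[:1]!r})", execution_count=n)
    cell.outputs = [nbf.v4.new_output("stream", name="stdout", text=text)]
    nb.cells.append(cell)

# Rank code cells by the same size estimate used above, then strip the top N.
N = 2
sizes = {i: len(str(c.outputs)) for i, c in enumerate(nb.cells)
         if c.cell_type == "code"}
for i in sorted(sizes, key=sizes.get, reverse=True)[:N]:
    nb.cells[i].outputs = []
    nb.cells[i].execution_count = None

nbf.write(nb, "smaller_copy.ipynb")
```

The stripped copy still contains all the code, so collaborators can re-run the large cells themselves if they want the plots back.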

Related

Require Jupyter notebook to render matplotlib notebook figure before closing

Introduction and examples.
It is a widely known
best practice to
close matplotlib figures after opening them
to avoid consuming too much memory.
In a standalone Python script, for example,
I might do something like this:
fig, ax = plt.subplots()
ax.plot(x, y1);
plt.show() # stop here and wait for user to finish
plt.close(fig)
Similarly, for a Jupyter notebook
it's a common pattern to create a figure in one cell
and then close it in the next cell.
A minimal example might look something like this,
where each blank line is a new cell:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
fig, ax = plt.subplots()
ax.plot(x, y1);
plt.close(fig)
However, this has a distinct disadvantage:
the figure is blank when run with "Kernel" -> "Restart & Run All"
or "Cell" -> "Run All".
That is, the figure doesn't display until all cells finish evaluating,
but the figure doesn't have time to render
since we are using plt.close right afterward.
Here's an example screenshot based on the example above:
This is different from running each cell individually:
as long as I run each cell slowly enough,
I can get each figure to display:
Full Jupyter notebook .ipynb files are available here:
https://github.com/nbeaver/jupyter-figure-rendering-tests
Inadequate workarounds.
The simplest workaround is to start from the top
and manually step through each cell,
waiting for each one to render before moving on to the next cell.
I don't consider this an acceptable workaround,
as it becomes impractically laborious for large notebooks
and is generally contrary to the purpose of an executable notebook environment.
Another workaround is to simply never call plt.close(fig).
This is not a good option for several reasons:
It results in excess memory usage,
as evidenced by warnings about opening too many figures.
On some machines or resource-constrained environments
this may prevent the notebook execution from completing at all.
Even in environments with abundant memory,
cursor tracking and interactive functionality like zoom or pan
becomes very slow when many figures are open at once.
As mentioned above, in general none of the figures will
display until the final cell has finished executing,
which is undesirable for notebooks
where execution time may be long for certain cells further down.
Another workaround is to use %matplotlib inline
instead of %matplotlib notebook.
This is unsatisfactory also:
The inline mode does not permit interactive inspection of the figure
such as cursor position values or pan and zoom.
This functionality may be desirable or essential for analysis.
The inline and notebook settings cannot in general be toggled on a per-cell basis,
so effectively this is an all-or-nothing setting.
Even if it were possible to toggle between inline and notebook,
I would prefer to only use notebook and then
close the figure in a subsequent cell,
so that I can return to the cell later
and re-run it to get the interactive controls
without needing to edit the cell.
An analogous workaround is to call savefig for each cell with a figure
and then browse the generated images with an external image viewing program.
While this allows limited zooming and panning,
it doesn't give cursor positions
and it's really not comparable to the interactive notebook plots.
Criteria and current workaround.
Here are my requirements:
The effect of "Restart & Run All" must render all figures eventually;
no figures can be left blank.
Use %matplotlib notebook for all cells
so that I can re-run the cell later and inspect the figure interactively.
Allow plt.close(fig) after each cell so that notebook
resources can be conserved.
Essentially, I would like a way to force the kernel
to render the current figure
before proceeding on to plt.close(fig).
My hope is that this is a well-known behavior
and that I've simply missed something.
Here's what I have tried so far that didn't help at all:
plt.show() at the end of a cell or between cells.
fig.show() at the end of a cell or between cells.
plt.ioff() at the end of a cell or between cells.
time.sleep(1) at the end of a cell or between cells.
plt.pause(1) at the end of a cell or between cells.
fig.canvas.draw_idle() at the end of a cell or between cells.
Doing from IPython.display import display and then display(fig)
at the end of a cell or between cells.
calling plt.close('all') at end of notebook instead of between each cell.
Currently, the best I've been able to do
is call fig.canvas.draw() in a separate cell
between the figure and the cell with plt.close(fig).
Using the example above, here's what this looks like:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
fig, ax = plt.subplots()
ax.plot(x, y1);
fig.canvas.draw()
plt.close(fig)
This works reliably in smaller notebooks,
and for a while I thought this had solved the problem.
However, this doesn't always work;
particularly in large notebooks with many figures,
some of them still come out blank,
suggesting that fig.canvas.draw() adds some delay
but is not always sufficient,
perhaps due to a race condition.
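When it does work, the draw-then-close pattern can at least be factored into a small helper so every figure cell ends the same way (a sketch; render_and_close is my own name for it):

```python
import matplotlib.pyplot as plt

def render_and_close(fig):
    # Force a draw of the canvas before closing, mirroring the
    # fig.canvas.draw() / plt.close(fig) pattern described above.
    fig.canvas.draw()
    plt.close(fig)

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
render_and_close(fig)
```

This doesn't fix the underlying race, but it keeps the workaround in one place if a better incantation turns up.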
Since fig.canvas.draw() is not a documented method
for making Jupyter notebooks render the current figure,
I would hesitate to describe this as a bug,
although it seems to be closely related to this matplotlib issue,
which ultimately seems to be a Jupyter bug:
The simplest work around may be to put the input() call in the next cell or to add plt.gcf().canvas.draw() above the input call. This will still result in "dead" figures (which may not have caught up to the hi-dpi settings of your browser), but they will at least show.
I've observed this behavior in many combinations of matplotlib and Jupyter,
including matplotlib version 2.1.1 and 3.5.1
and Jupyter version 4.4.0 and 6.4.8.
I've also observed it in both Google Chrome 99.0.4844.51 and Firefox 97.0.2
and on both Windows 10 and Ubuntu 18.04.6.
Related questions (not duplicates):
Programmatically Stop Interaction for specific Figure in Jupyter notebook
Specify where in output matplotlib figures are rendered in jupyter using figure handles
Get Jupyter notebook to display matplotlib figures in real-time
Force matplotlib to fully render plots in an IPython/Jupyter notebook cell before advancing to next cell

How to insert python variable into Latex matrix in Jupyter notebook markdown cell

So, I am trying to make a Jupyter notebook that is slightly interactive in which I can change the number value of a variable, and then use that variable in a Markdown cell to display a Latex matrix like so:
And that cell displays this:
I don't know why that spanID thing is showing or how to get rid of it. I already have NBextensions installed and in that Markdown cell if I just type {{a_0}} then when I run that cell it just displays 1 like it should. But the moment I put it within the latex matrix, then I get the error. Any help with this is greatly appreciated.
Thanks!
The solution I applied for the problem is to create Latex objects in the code cells and display them in the markdown cells with the Python Markdown extension of the Jupyter notebook extensions. For example, a code cell can have:
from IPython.display import Latex as lt
# Room_length [m]
l_rl = 30
l_rl_lt = lt(rf"$l_\mathrm{{rl}}={l_rl}\ \mathsf{{m}}$")
and the markdown cell can have:
The room has length size {{l_rl_lt}}.
Variables in markdown cells are still not natively supported, so this cannot be done without specific extensions (which is especially a problem in JupyterLab).
It is worth mentioning that you can also print the matrix directly from the code cell.
As an example, I will use a function from my gist.
It basically looks like this:
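I can't reproduce the gist here, but the idea can be sketched with a small stand-in (the function name and the bmatrix layout are mine, not the gist's):

```python
def matrix_latex(rows, name="A"):
    """Return a LaTeX bmatrix string for a nested list (stand-in sketch)."""
    body = r" \\ ".join(" & ".join(str(v) for v in row) for row in rows)
    return rf"{name} = \begin{{bmatrix}} {body} \end{{bmatrix}}"

# In a code cell you would then render it with:
#   from IPython.display import Math, display
#   display(Math(matrix_latex([[1, 2], [3, 4]])))
print(matrix_latex([[1, 2], [3, 4]]))
```

Because the string is built from live Python values, re-running the cell after changing a variable updates the rendered matrix automatically.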

Jupyter notebook make input cell scrollable

I have an input code cell in the notebook that contains many lines of code. I can't split it into different cells, so it's one long block of code.
I would like to make it scrollable so that it's easier to navigate within the notebook. Is it possible to do in Jupyter?
Again, I need the vertical scrolling for the code block (INPUT cell), not the output.
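One possible approach for the classic notebook (an untested sketch, assuming the `div.input_area` CSS class the classic interface uses for input cells) is to cap the input area's height in `~/.jupyter/custom/custom.css`:

```css
/* ~/.jupyter/custom/custom.css — cap code (input) cell height, scroll overflow */
div.input_area {
    max-height: 20em;
    overflow-y: auto;
}
```

This applies to all input cells; adjust `max-height` to taste, and note that JupyterLab uses different CSS classes.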

How can I increase the size of Jupyter notebook cells and text?

I want to increase the size of my cells and text to improve readability. I already tried this solution: How do I increase the cell width of the Jupyter/ipython notebook in my browser?.
However, that just made the cells wider, not bigger, so all the text stays the same small size.
Screenshot of Jupyter notebook now
What I want Jupyter notebook to look like
This isn't related to Python.
I'm guessing your monitor is pretty big.
Just try zooming in your view for this particular page.
On Chrome this is Cmd and + (or Ctrl and + on Windows and Linux).

Kernel restart when setting x_lim matplotlib

I am drawing a scatter plot using Matplotlib in an IPython notebook.
When I limit my x axis from 0 to 6e7, I get a kernel restart error.
What shall I do?
Here's the line I am using to limit the x axis:
ax.set_xlim(0, 6e7)
Without the above line everything is working fine.
The reason I want to limit the x axis is that I want to have many plots from different data and able to compare them. Therefore I want my different plots to have same axis.
UPDATE
I just noticed that even if I limit my x axis from 0 to 100, I am getting the same error.
Additional information
Error message: The kernel appears to have died. It will restart automatically.
Code snippet I am using:
The line which I have commented above is causing the problem.
IPython version: 4.0.1
Matplotlib version: 1.5.0
This seems to be due to IPython Notebook limitations. For debugging:
Copy all your relevant code into a file.
Add plt.show() at the end of the file.
Save this file as myfile.py.
Run it from the console: python myfile.py.
See if it works or you get an error message.
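A minimal standalone file for those steps might look like this (a sketch: the data is made up, and the Agg backend plus savefig stand in for plt.show() so it can also run headless):

```python
# myfile.py -- minimal standalone reproduction of the set_xlim crash.
import matplotlib
matplotlib.use("Agg")          # headless stand-in; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(50) * 6e7   # made-up data spanning the problematic range
y = np.random.rand(50)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlim(0, 6e7)            # the line suspected of killing the kernel
fig.savefig("repro_scatter.png")
print("figure rendered without crashing")
```

If this script completes outside the notebook, the crash is specific to the IPython Notebook environment rather than to Matplotlib itself.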
