Jupyter notebook is extremely slow when re-running cells - python

I have a relatively large Jupyter notebook (about 40 GB of Pandas DataFrames in RAM). I'm running a Python 3.6 kernel installed with Conda.
I have about 115 cells that I'm executing. If I restart the kernel and run the cells, my whole notebook runs in about 3 minutes. If I re-run a simple cell that's not doing much work (e.g. a function definition), it takes an extremely long time to execute (~15 minutes).
I cannot find any documentation online on Jupyter notebook installation best practices. My disk usage is low, available RAM is high, and CPU load is very low.
My swap space does seem to be maxed out, but I'm not sure what would be causing this.
Any recommendations on troubleshooting a poor-performing Jupyter notebook server? This seems to be related to re-running cells only.

If the Variable Inspector nbextension is activated, it might slow down the notebook when you have large variables in memory (such as your Pandas dataframes).
See: https://github.com/ipython-contrib/jupyter_contrib_nbextensions/issues/1275
If that's the case, try disabling it in Edit -> nbextensions config.

Related

How can I measure the memory required to run a Jupyter notebook?

I am preparing a Jupyter notebook which uses large arrays (1-40 GB), and I want to give its memory requirements, or rather:
the amount of free memory (M) necessary to run the Jupyter server and then the notebook (locally),
the amount of free memory (N) necessary to run the notebook (locally) when the server is already running.
The best idea I have is to:
run /usr/bin/time -v jupyter notebook,
assume that "Maximum resident set size" is the memory used by the server alone (S),
download the notebook as a *.py file,
run /usr/bin/time -v ipython notebook.py
assume that "Maximum resident set size" is the memory used by the code itself (C).
Then assume N > C and M > S + C.
I think there must be a better way, as:
I expect Jupyter notebook to use additional memory to communicate with the client, etc.,
there is also additional memory used by the client running in a browser,
Uncollected garbage contributes to C, but should not be counted as the required memory, should it?
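A rough way to check C from inside the kernel itself, rather than wrapping the whole process in /usr/bin/time, is to ask the process for its own peak resident set size. This is only a sketch using the standard library; note that ru_maxrss is reported in kilobytes on Linux and in bytes on macOS:

    import resource
    import sys

    # Peak resident set size of the current (kernel) process so far.
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    print(f"peak RSS so far: {peak / divisor:.0f} MB")

Run in the last cell, this gives an estimate of C that excludes the server process, although it still includes any garbage that has not been collected yet.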
Your task is hard, I think.
You have no guarantee that Python is actually keeping every variable in RAM. The OS may have decided to push some of the memory out to disk using swap.
You can try to disable swap, but there may be other things that cache data.
You can force garbage collection in Python using the gc module, but the results were inconsistent when I tried it.
I don't know if M and N are actually useful to you or if you are just trying to size a server. If it's the latter, renting servers of increasing size on AWS or DigitalOcean and running a benchmark on each may give you faster and more reliable results.
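To see how much of C is just uncollected garbage, you can drop the references to the large objects and force a collection before measuring again. A minimal sketch, assuming NumPy and the optional psutil package are installed (the array is a stand-in for a real large variable):

    import gc

    import numpy as np
    import psutil

    proc = psutil.Process()

    def rss_mb():
        # Current (not peak) resident set size of this process, in MB.
        return proc.memory_info().rss / 1024 ** 2

    big_array = np.ones((2000, 2000))        # stand-in for a large variable
    print(f"with array:       {rss_mb():.0f} MB")

    del big_array                            # drop the last reference
    gc.collect()                             # force a full collection pass
    print(f"after collection: {rss_mb():.0f} MB")

Even after a collection the allocator may not hand freed pages back to the OS, which is one reason the numbers can look inconsistent.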
The jupyter-resource-usage extension comes preinstalled with the Jupyter Notebook found in Anaconda installations. It shows, in the top-right corner of the notebook, how much memory you are using and how much you have available.

Jupyter notebook executions turn grey in Visual Studio Code

I am trying to execute Python code in VS Code with Jupyter notebook execution enabled. Repeatedly, the execution screen turns grey, which makes the output and the headers invisible. The code is still executable.
Any suggestions for recovering from this issue? Copy-pasting into another notebook and re-running each time does not solve the problem.
I have the same issue with VS Code and Jupyter notebooks. In my case it only happens when the overall size of the notebook is large (more than 150 MB), which is caused by keeping the output of the cells (in my case, high-quality figures); this causes the notebook to crash and grey out all the outputs. The solution I have found so far is to clear the output, after which it won't crash again. There is also a solution suggested by the developers here, which is to remove the code cache. I would suggest breaking long notebooks into smaller ones, or clearing the output.
Update
I frequently had this issue with notebooks of any size. One of the solutions was to remove the code cache on my Windows machine (Mac users will have to find the equivalent application data on their system and remove the cache there).
The easiest way to access the cache folder is to open a Run window, enter the following path, and delete as much of the cache as you can:
%APPDATA%\Code - Insiders\Code Cache
It has helped me so far. Please let me know if it worked for you too, or if you found any other solutions.
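If clearing the outputs by hand is tedious, a small script can strip them from the notebook file before sharing or committing it. Below is a minimal sketch using the nbformat package that ships with Jupyter; the file names are placeholders:

    import nbformat

    # Read the oversized notebook, drop every code cell's outputs, and save a copy.
    nb = nbformat.read("big_notebook.ipynb", as_version=4)
    for cell in nb.cells:
        if cell.cell_type == "code":
            cell.outputs = []
            cell.execution_count = None
    nbformat.write(nb, "big_notebook_cleared.ipynb")

The same result should also be available from the command line with jupyter nbconvert --clear-output --inplace <notebook>.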

Very high memory usage when profiling python in PyCharm

I'm trying to profile a Python application in PyCharm; however, when the application terminates and the profiler results are displayed, PyCharm requires all 16 GB of RAM that I have, which makes it unusable.
The application is doing reinforcement learning, so it takes a while to run (~10 minutes or so), but while running it does not require large amounts of RAM.
I'm using the newest version of PyCharm on Ubuntu 16.04, and PyCharm uses cProfile for profiling.
I'd be very glad if one of you knows a solution.
EDIT: It seems this was an issue within PyCharm, which has since been fixed (as of 2017/11/21)
It's a defect within PyCharm: https://youtrack.jetbrains.com/issue/PY-25768
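Until the fix reaches your installation, one workaround is to bypass PyCharm's result viewer entirely: run cProfile yourself, dump the stats to a file, and inspect them with pstats. A minimal sketch, with a stand-in workload in place of the real training run:

    import cProfile
    import pstats

    def training_run():
        # Stand-in for the actual reinforcement-learning run being profiled.
        return sum(i * i for i in range(5_000_000))

    # Write the raw profiling data to disk instead of handing it to the IDE.
    cProfile.run("training_run()", "training.prof")

    # Load and summarise the dump with the standard-library viewer.
    stats = pstats.Stats("training.prof")
    stats.sort_stats("cumulative").print_stats(20)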

Kernel crashes when increasing iterations

I am running a Python script using Spyder 2.3.9. I have a fairly large script, and when running it with 300x600 iterations (a loop inside another loop), everything appears to work fine and takes approximately 40 minutes. But when I increase the number to 500x600 iterations, after 2 hours the output yields:
It seems the kernel died unexpectedly. Use 'Restart kernel' to continue using this console.
I've been trying to go through the code but don't see anything that might be causing this in particular. I am using Python 2.7.12 64-bit, Qt 4.8.7, PyQt4 (API v2) 4.11.4 (Anaconda2-4.0.0-MacOSX-x86_64).
I'm not entirely sure what additional information is pertinent, but if you have any suggestions or questions, I'd be happy to read them.
https://github.com/spyder-ide/spyder/issues/3114
It seems this issue has been opened on their GitHub repository and should be addressed soon, given the repo's track record.
Some possible solutions:
It may be helpful, if possible, to modify your script for faster convergence; very often, for most practical purposes, the incremental value of iterations after a certain point is negligible (see the sketch after this list).
An upgrade or downgrade of the Spyder environment may help.
Check your local firewall for blocked connections to 127.0.0.1 from pythonw.exe.
If nothing works, try using Spyder on Ubuntu.
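For the first point, a common pattern is to stop the outer loop once the change between iterations falls below a tolerance, instead of always running the full 500x600 schedule. A minimal sketch, with a stand-in update step in place of the real computation:

    import numpy as np

    tol = 1e-6
    state = np.random.rand(100, 100)   # stand-in for whatever the script updates

    for outer in range(500):
        previous = state.copy()
        for inner in range(600):
            # Stand-in update: smooth the state a little on each inner pass.
            state = 0.5 * (state + np.roll(state, 1, axis=0))
        if np.max(np.abs(state - previous)) < tol:
            print(f"converged after {outer + 1} outer iterations")
            break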

Does running IPython/Jupyter Notebook affect the speed of the program?

I am developing a simulation program (kind of like a numerical solver) in an IPython notebook. I am wondering whether the code running in the notebook runs at the same speed as the code run from the terminal.
Would browser memory, notebook overhead, and things like that make code run slower in the notebook compared to a native run from the terminal?
One thing that might slow things down a lot is having many print statements in your simulation.
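A quick way to test that is to time the same loop with per-iteration printing and with throttled printing. A minimal sketch with a trivial stand-in for the simulation step:

    import time

    def simulate(print_every):
        total = 0.0
        for step in range(100_000):
            total += step * 0.5              # stand-in simulation work
            if step % print_every == 0:
                print(step, end="\r")        # progress output
        return total

    for print_every in (1, 10_000):
        start = time.perf_counter()
        simulate(print_every)
        print(f"\nprint every {print_every}: {time.perf_counter() - start:.2f} s")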
If you run the kernel server and the browser on the same machine, and assuming your simulation would otherwise have used all the cores of your computer, then yes, using the notebook will slow things down. But no more than browsing Facebook or YouTube while the simulation is running. Most of the overhead of using IPython occurs when you press Shift-Enter: a pure Python prompt might react in 100 ms, and IPython in 150 ms or so. But if you are concerned about performance, the overhead of IPython is not the first thing you should be concerned about.
I have found that Jupyter is significantly slower than IPython, whether or not many print statements are used. Nearly all functions suffer decreased performance; especially if you are analyzing large dataframes or performing complex calculations, I would stick with IPython.
I tested training the same small neural net (1) under Jupyter and (2) running Python under the Anaconda prompt (either with exec(open('foo.py').read()) inside python, or with python foo.py directly at the Anaconda prompt).
It takes 107.4 or 108.2 seconds under the Anaconda prompt, and 105.7 seconds under Jupyter.
So no, there is no significant difference, and the minor difference is in favor of Jupyter.
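For anyone who wants to repeat this kind of comparison, the simplest setup is a single self-timing script that you run once with python foo.py at the prompt and once from a notebook cell (for example with %run foo.py). A minimal sketch with a stand-in workload in place of the neural net:

    import time

    def workload():
        # Stand-in for the training loop; replace with the real work to compare.
        total = 0.0
        for i in range(20_000_000):
            total += i % 7
        return total

    if __name__ == "__main__":":
        start = time.perf_counter()
        workload()
        print(f"elapsed: {time.perf_counter() - start:.1f} s")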
