How to run Python code before every jupyter notebook kernel - python

Suppose I have a code snippet that I'd like to run every time I open a jupyter notebook (in my case it's opening up a Spark connection). Let's say I save that code in a .py script:
-- startup.py --
sc = "This is a spark connection"
I want to be able to have that code snippet run every time I open a kernel. I've found some stuff about the Jupyter Configuration File, but it doesn't seem like variables defined there show up when I try to run
print(sc)
in a notebook. Is there a command-line option that I could use -- something like:
jupyter notebook --startup-script startup.py
or do I have to include something like
from startup import sc, sqlContext
in all of the notebooks where I want those variables to be defined?

I'd recommend creating a startup file as you suggested, and including it via
%load ~/.jupyter/startup.py
This will paste the content of the file into the cell, which you can then execute.
Alternatively, you can write a minimal, installable package that contains all your startup code.
Pro: Doesn't clutter your notebook
Con: More difficult to make small changes.
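For example, the package can be as small as a single module; here is a minimal sketch (the package name my_startup is hypothetical, reusing the stub connection from the question):
# my_startup/__init__.py  -- hypothetical installable package
# One-time setup lives here instead of in every notebook.
sc = "This is a spark connection"
Then any notebook only needs:
from my_startup import sc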

A custom package or explicit loading is not needed (though it might be preferable if you work with others): you can have auto-executed startup scripts.
See https://stackoverflow.com/a/47051758/2611913
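In short, the idea from the linked answer is that any .py file placed in your IPython profile's startup directory is run automatically whenever a kernel starts. A minimal sketch, again using the stub connection from the question:
# ~/.ipython/profile_default/startup/00-spark.py
# Every file in this directory is executed (in lexicographic order)
# each time an IPython kernel starts, so `sc` exists in every notebook.
sc = "This is a spark connection"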

Related

How to run a jupyter notebook programmatically (inside a Sagemaker notebook) from a local environment

I can start/stop SageMaker notebook instances with boto3, but how do I run the Jupyter notebooks or .py scripts inside them?
This is something I'll run from a local environment or a Lambda (but that part is not the problem).
Start Sagemaker notebook instance:
import boto3

client = boto3.client('sagemaker')
client.start_notebook_instance(
    NotebookInstanceName='sagemaker-notebook-name'
)
(docs)
In the UI I would just click "Open Jupyter", then run a notebook or a .py script inside it.
But I want to do it programmatically, with boto3 or something similar.
My file inside is called lemmatize-input-data.ipynb.
This must be possible, but I'm not sure how.
I also tried running the following in a "start notebook" lifecycle configuration script, after creating a simpler test file called test_script.ipynb to be certain that it wasn't something advanced in my Jupyter notebook causing the error:
set -e
jupyter nbconvert --execute test_script.ipynb
But got the error:
[NbConvertApp] WARNING | pattern 'test_script.ipynb' matched no files
I encourage you to look into papermill. It copies and runs a template notebook, using nbconvert under the hood. The main benefit I have found with papermill is that you can easily parameterize notebooks and pass the parameters in via a Python dictionary. The copies of the template then maintain a history of what was executed, along with the results.
Your code would be something like:
import papermill as pm

pm.execute_notebook(
    'lemmatize-input-data.ipynb',
    'lemmatize-input-data-####.ipynb'
)
With #### being something like datetime.now(), or whatever else you would like to use to differentiate the notebooks as they execute.
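For instance, here is a sketch of a timestamped output name together with a parameters dictionary (the parameter name is made up for illustration; for injection to work, the template notebook needs a cell tagged "parameters"):
import papermill as pm
from datetime import datetime

stamp = datetime.now().strftime('%Y%m%d-%H%M%S')
pm.execute_notebook(
    'lemmatize-input-data.ipynb',
    f'lemmatize-input-data-{stamp}.ipynb',
    parameters={'input_path': 's3://my-bucket/input.csv'}  # hypothetical parameter
)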
Since notebooks are intended to be living documents, you want to limit the number of external dependencies that would have breaking changes if the notebook changes and you need to re-run as of a point in time. Papermill addresses this by making a snapshot of what was executed at that time.
Update for a little more background:
I would update the Jupyter notebook to contain the Python code instead of the script. The notebook will execute cell by cell and act just like a script. This also allows you to print and display intermediate and final values within the notebook, if needed. When papermill copies and executes the template notebook, all of the output will be displayed and saved within the notebook. This is handy for any graphs that have been generated.
Papermill also has functionality that will then aggregate data across notebooks. See here for a good article summarizing papermill in general. Papermill was designed by Netflix and they have a good post about the philosophy behind it here, in which they reference machine learning.
All this being said, papermill can be used to easily document each step of training your machine learning model in SageMaker. Then, using the aggregation capabilities of papermill, you can graphically see how your model has changed over time.
You have the correct approach of executing the notebook inside a Lifecycle Configuration script. The issue is that the working directory of the script is "/" whereas the Jupyter server starts up from /home/ec2-user/SageMaker.
So, if you modify the script to use the absolute path to the notebook file, it should work:
jupyter nbconvert --execute /home/ec2-user/SageMaker/lemmatize-input-data.ipynb
Thanks for using Amazon SageMaker!
Have a look here. This can deploy your Jupyter Notebook as a serverless function and then you can invoke it using the REST endpoint.

How to load Python script into Jupyter Notebook without revealing code and making it available to all cells

I'm trying to import a Python script as a module into one of my notebooks, but the import and importlib commands do not seem to work in Jupyter Notebook the way they do in a standard Python file and terminal.
I've come across the %load command, but as far as I understand and have seen in my own notebook, it only loads the contents of the script into the current cell, where it is invisible to the other cells, and it also outputs the imported code for everyone to see.
I want to be able to load the script and make it available to all cells, while hiding the code for the sake of encapsulation and keeping the notebook neat and tidy - focusing only on the code relevant to the topic of the notebook. Is this possible?
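One common approach (a sketch, assuming the script is saved next to the notebook; the file and function names are hypothetical) is to import it as an ordinary module, which keeps its source out of the notebook while making everything it defines available to every cell:
# helpers.py  -- hypothetical script saved in the same directory as the notebook
def greeting():
    return "Hello from helpers"
and in a notebook cell:
import helpers                 # a regular import works in a notebook too
print(helpers.greeting())      # the module stays available to every cell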

Use a single iPython console to run files and parts of code in PyCharm

I started using PyCharm and managed to have the Python prompt running after a file is run (link), and I found out how to run pieces of code in the current iPython console (link).
For data science, however, it is very convenient to have a single iPython console/Python kernel in which I can run both files and pieces of code, just like Spyder does, and keep using the same running kernel: for example, to load a large amount of data with one script and then write another script or snippets of code for plotting and exploring the data in different ways.
Is there a way in PyCharm to do this?
I think it would imply automatically generating and running a line like:
runfile('C:/temp/my_project/src/console/load_data.py', wdir='C:/temp/my_project/src')
If the option does not exist, is it possible to make a macro or similar to do it?
The option "Single instance only" in Edit configurations doesn't help.

What's the best practice in the workflow of writing and running python files with Vim?

Currently, I write Python files in Vim and run them with jupyter qtconsole. The advantage of this approach is that I can work in Vim and get all of its benefits.
I could run the Python code directly in Vim using the pymode plugins, but that way I cannot see and manipulate the output variables, and the figures open in another window, which is quite annoying because I have to close them to make Vim responsive again. By contrast, in jupyter qtconsole I can use %matplotlib inline to display figures elegantly.
However, my current workflow has a big disadvantage: after I run my Python script in qtconsole and then edit it, it is not easy to run it again with the modifications. Since the module has already been loaded, rerunning it will not automatically reload the modified module source. I have found no easy way to overcome this drawback; currently I have to restart the kernel, reset the path, turn %matplotlib inline back on, and %run python-script.py again.
Can anyone give me a solution?
I found an answer that solves my problem: the IPython extension autoreload.
%load_ext autoreload
%autoreload 2
Then I do not have to restart the kernel any more.
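A typical qtconsole session then looks something like this (a sketch of the workflow described above; the script name is just a placeholder):
%load_ext autoreload
%autoreload 2
%matplotlib inline

# Rerun after editing in Vim; autoreload reloads any modified modules the
# script imports, so no kernel restart is needed.
%run python-script.py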

Synchronize .ipynb and .py to use ipython notebook and eclipse at the same time

I started programming some scripts with IPython Notebook, but now the project is becoming too big for a notebook. Nevertheless, I love executing my stuff in an IPython notebook (loading the data only once, online figures...).
What I want is to program everything with Eclipse but execute it in IPython. I know I can save the notebooks as .py files by adding the --script option at the beginning. But now I want to automate the process in the other direction: I want my IPython notebook to reload the code I modify with Eclipse.
Is it possible?
Should I make a program that makes it using the converter?
Thanks!!
I found a solution for manually updating the functions without rerunning the whole .ipynb file. However, I do not know how to automate it. Here is the solution:
Synchronizing code between jupyter/iPython notebook script and class methods
To cut it short, you need to call the reload function from the importlib module in the cell of interest:
import babs_visualizations
from importlib import reload
reload(babs_visualizations)
Just a little addition: make sure that you are addressing the function in the form module.function. If you previously imported the function with from module import function and then reload the module, the function will not be reloaded. You can call the function inside a notebook cell and rerun it to see how the changes in the module affect the function's output in the notebook.
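For example, a minimal sketch (plot_summary is a hypothetical function inside babs_visualizations):
from importlib import reload
import babs_visualizations

babs_visualizations.plot_summary()   # hypothetical function; runs the current code
# ... edit babs_visualizations.py ...
reload(babs_visualizations)
babs_visualizations.plot_summary()   # now re-executes the updated code

# By contrast, a name imported with `from babs_visualizations import plot_summary`
# would keep pointing at the old function object even after the reload above.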
I hope this was helpful for someone.
