I followed Semaphore's blog post on testing Jupyter notebooks using pytest and nbmake. It's a great post, and testing worked great. Summarizing how I applied it:
Run pip install pytest nbmake in a virtual env.
Run pytest --nbmake notebooks, where notebooks is a folder containing my *.ipynb files.
It's working correctly: when I add a cell with an intentional error, the test fails.
What I'd like to know is the minimal set of additional libraries and commands needed so that I can also run my notebooks interactively in the same environment. I know you can add the --overwrite flag to inspect the results, which is definitely very useful, but that's not what I'm asking for. In particular, I'd like steps (3) and (4), which:
pip install some additional libraries (or maybe we can even skip this step altogether?)
awesome-jupyter-command notebooks/foo.ipynb, so that the Jupyter kernel starts and automatically displays foo.ipynb for interactive evaluation
Most jupyter server commands (e.g. jupyter notebook and jupyter lab) accept a directory or notebook file as a positional argument, so you can do:
pip install jupyterlab
jupyter lab notebooks/foo.ipynb
which will launch the server and open the specified file.
Some other examples, for different flavors of UI:
# 'retro' single-document interface with new features
pip install retrolab
jupyter retro notebooks/foo.ipynb
# 'classic' application, which is trying to push folks toward the lab-based UI
pip install notebook
jupyter notebook notebooks/foo.ipynb
There's also nbopen, which adds an additional step of checking for already-running servers, rather than always starting a new one:
pip install nbopen
nbopen notebooks/foo.ipynb
I'm still learning how this all works, so please bear with me.
I'm running conda 4.8.5 on my Windows 10 machine. I've already installed all necessary Jupyter extensions, I think (Jupyter Lab, Jupyter Notebook, Jupyter Book, Node.js, and their dependencies).
The problem might have to do with the fact that I've installed Miniconda on a separate (D:/) drive.
I've set up a virtual environment (MyEnv) with all the packages I might need for this project. These are the steps I follow:
Launch CMD window
$ conda activate MyEnv
$ jupyter-lab --notebook-dir "Documents/Jupyter Books"
At this point a browser tab opens running Jupyter Lab
From the launcher within Jupyter Lab, open a terminal
$ cd "Documents/Jupyter Books"
$ jb create MyCoolBook
New folder with template book contents gets created in this directory (Yay!)
Without editing anything: $ jb build MyCoolBook
A folder gets added to MyCoolBook called _build, but it doesn't contain much more than a few CSS files.
The terminal throws an error traceback which wasn't very helpful to me; the issue may be obvious to an experienced user.
I am not sure how to proceed. I've reset the entire environment a few times trying to get this to work. What do you suggest? I'm considering submitting a bug report but I want to rule out the very reasonable possibility that I'm being silly.
I asked around on the GitHub page/forum for Jupyter Book. It turns out it's a matter of text encoding on Windows (I could have avoided this by reading deeper into the documentation).
If anyone runs across this issue, know that it can be worked around by reverting to Python 3.7.* and setting an environment variable (PYTHONUTF8=1), but I wouldn't recommend this, because other packages might require the default system encoding. Instead, follow the instructions in this section of the documentation.
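For reference, the workaround itself looks roughly like this in a CMD window (shown only for completeness, since the documentation route above is preferable; the conda subcommand requires conda 4.8 or later):

:: for the current CMD session only
set PYTHONUTF8=1

:: or scoped to the conda environment, so it applies whenever MyEnv is active
conda env config vars set PYTHONUTF8=1 -n MyEnv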
I am trying to open a notebook with jupyter, by using the following command:
jupyter notebook notebook.ipynb
I use Linux and Python 2, and I have installed Jupyter inside a virtual environment with pip.
When I run the command above, what happens is that I get a new screen saying:
REFRESH(1 sec): http://localhost:8889/notebooks/mylink
This page should redirect you to Jupyter Notebook. If it doesn't, click here to go to Jupyter.
However, no dashboard opens automatically. So I clicked on the link, but I am unable to connect. The weird thing is that I don't remember what I did yesterday, but I managed to get through very easily.
Am I doing something wrong? Or maybe there is a problem with the link?
EDIT: If I run
jupyter notebook.ipynb
I get
Error executing Jupyter command 'notebook.ipynb': [Errno 2] No such file or directory
which does not make sense, because the file notebook.ipynb is actually there.
If I type
jupyter notebook stop
I get
There are no running servers
Happened to me too, and their troubleshooting guide could not resolve the issue either. You may note its "This Worked An Hour Ago" section.
If you are using environments, try creating a new environment and installing Jupyter notebook from scratch there. For Anaconda it would look like this:
conda create --name jupyter_env
conda activate jupyter_env
conda install -c conda-forge jupyterlab
jupyter notebook
You may change jupyter_env to a different name of your liking.
What I usually do is run
jupyter notebook
and it will automatically open up the browser with my current directory. I then search for my notebook that way.
One of these might help you fix this:
change the URL to http://127.0.0.1:8889/notebooks/mylink or http://0.0.0.0:8889/notebooks/mylink
if a proxy or any other network settings are configured in your browser, disable them and check again
if none of these work, try entering the URL this way: http://localhost:8888/tree?
Providing logs from the console might help as well.
And one more thing, just to make sure: why are you connecting via port 8889? Did you try running Jupyter with the --port command option?
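For example, you can check which servers Jupyter believes are running and pin the port explicitly; both are standard Jupyter commands:

jupyter notebook list
# prints the URL (including the token) of each running server, if any

jupyter notebook --port 8889 notebook.ipynb
# starts the server on an explicit port, so the URL you open matches it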
I have some Python code in a Jupyter notebook and I need to run it automatically every day, so I would like to know if there is a way to set this up. I really appreciate any advice on this.
Update
Recently I came across papermill, which is a tool for executing and parameterizing notebooks.
https://github.com/nteract/papermill
papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
This seems better than nbconvert, because you can use parameters. You still have to trigger this command with a scheduler; below is an example with cron on Ubuntu.
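For instance, a crontab entry that runs the notebook every morning might look like this (a sketch; the paths and the alpha parameter are placeholders, and papermill should come from the environment that holds your notebook's dependencies):

10 5 * * * /path/to/venv/bin/papermill /path/to/input.ipynb /path/to/output.ipynb -p alpha 0.6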
Old Answer
nbconvert --execute
can execute a Jupyter notebook; embedded into a cronjob, this will do what you want.
Example setup on Ubuntu:
Create yourscript.sh with the following content:
#!/bin/sh
# execute the notebook and save the executed copy as a separate file
/opt/anaconda/envs/yourenv/bin/jupyter nbconvert \
  --execute \
  --to notebook /path/to/yournotebook.ipynb \
  --output /path/to/yournotebook-output.ipynb
There are more output options besides --to notebook. I like this one since you end up with a fully executed notebook as a "log" file afterwards.
I recommend running your notebook from a virtual environment, so that future updates don't break your script. Do not forget to install nbconvert into that environment.
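For example, a dedicated environment could be set up like this (a sketch; the path is just an example):

python3 -m venv /opt/venvs/nb-runner
/opt/venvs/nb-runner/bin/pip install jupyter nbconvert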
Now create a cronjob that runs every day, e.g. at 5:10 AM, by typing crontab -e in your terminal and adding this line:
10 5 * * * /path/to/yourscript.sh
Try the SeekWell Chrome Extension. It lets you schedule notebooks to run weekly, daily, hourly or every 5 minutes, right from Jupyter Notebooks. You can also send DataFrames directly to Sheets or Slack if you like.
Here's a demo video, and there is more info in the Chrome Web Store link above as well.
Disclosure: I'm a SeekWell co-founder.
It's better to combine this with Airflow if you want a more robust setup.
I packaged them in a Docker image: https://github.com/michaelchanwahyan/datalab.
It is done by modifying the open source package nbparameterize and integrating the passing of arguments such as execution_date. Graphs can be generated on the fly, and the output can be updated and saved within the notebook.
When it is executed:
the notebook is read and the parameters are injected
the notebook is executed and the output overwrites the original path
Besides, the image also installs and configures common tools such as Spark, Keras, TensorFlow, etc.
You can add jupyter notebook to a cronjob:
0 * * * * /home/ec2-user/anaconda3/bin/python /home/ec2-user/anaconda3/bin/jupyter-notebook
You have to replace /home/ec2-user/anaconda3 with your Anaconda install location, and you can adjust the schedule in cron based on your requirements.
Executing Jupyter notebooks with parameters is conveniently done with Papermill. I also find it convenient to share/version-control the notebook either as a Markdown file or a Python script with Jupytext. Then I convert the notebook to an HTML file with nbconvert. Typically my workflow looks like this:
cat world_facts.md \
| jupytext --from md --to ipynb --set-kernel - \
| papermill -p year 2017 \
| jupyter nbconvert --no-input --stdin --output world_facts_2017_report.html
To learn more about the above, including how to specify the Python environment in which the notebook is expected to run, and how to use continuous integration on notebooks, have a look at my article Automated reports with Jupyter Notebooks (using Jupytext and Papermill), which you can read on Medium, GitHub, or Binder. Use the Binder link if you want to interactively test the outcome of the commands in the article.
As others have mentioned, papermill is the way to go. Papermill is just nbconvert with a few extra features.
If you want to handle a workflow of multiple notebooks that depend on one another, you can try Airflow's integration with papermill. If you are looking for something simpler that does not need a scheduler to run, you can try ploomber which also integrates with papermill (Disclaimer: I'm the author).
To run your notebook manually:
jupyter nbconvert --to notebook --execute /home/username/scripts/mynotebook.ipynb
Create a simple shell script, /home/username/scripts/mynotebook.sh, and paste the command above into it.
Make the file executable:
chmod +x /home/username/scripts/mynotebook.sh
To schedule your notebook, use cron or Airflow, depending on your needs and complexity. If you want to use cron, you can simply run crontab -e and add an entry:
00 11 * * * /home/username/scripts/mynotebook.sh
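Putting it together, mynotebook.sh might look like this (a sketch, reusing the command from above):

#!/bin/sh
# execute the notebook top to bottom and save the executed copy
jupyter nbconvert --to notebook --execute /home/username/scripts/mynotebook.ipynb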
You can download the notebook as a .py file and then create a batch file to execute the .py script. Then schedule the batch file in the Task Scheduler.
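If you'd rather not download it by hand, nbconvert can do the same conversion from the command line (a sketch; the filename is an example):

jupyter nbconvert --to script mynotebook.ipynb
# writes mynotebook.py next to the notebook
python mynotebook.py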
Creating a BAT file and then running it through Task Scheduler worked for me. Below is the code:
call C:\Users\...user...\Anaconda3\condabin\conda activate
REM run the notebook previously exported as a .py script
python notebook_file.py
pause
call conda deactivate
There are several ways to execute a Jupyter Notebook daily, as described in the article.
Cron or Windows Task Scheduler
You can use your operating system scheduler to execute the notebook. There are two command line tools for executing notebooks:
nbconvert
papermill
Both are great; I personally use nbconvert, but papermill offers a handful of extras, such as input parameters for notebooks or automatic export to cloud storage.
Mercury
The open source framework Mercury is a web-based application that:
can execute notebooks in the background,
can share notebooks as websites,
can send executed notebooks as email with a PDF or HTML attachment,
can restrict access to notebooks to authenticated users.
(Screenshots in the original answer: notebooks available in the web app, a scheduled notebook, and a PDF notebook sent by email.)
Notebooker
Notebooker is an open source web app for scheduling and sharing notebooks.
(Screenshots in the original answer: the list of notebooks and an executed notebook.)
You may want to use the Google AI Platform Notebooks Scheduler service, currently in EAP (early access program).
I am trying to use Jupyter notebook in PyCharm, but it keeps using Python 2 instead of Python 3.
Any idea about this problem?
Add: (the screenshot here showed the Jupyter notebook running in Chrome.)
My problem was that I had multiple kernels, and PyCharm launches the default kernel. One approach might be to configure PyCharm to start the kernel of your choice; I didn't investigate how to do that. I simply changed the default kernel in Jupyter, and this worked for me (I have a virtualenv for TensorFlow): c.MultiKernelManager.default_kernel_name = 'tensorflow'.
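If you don't have a Jupyter config file yet, a minimal sketch of where that line goes (assuming the default config location):

jupyter notebook --generate-config
# creates ~/.jupyter/jupyter_notebook_config.py; append the line there:
echo "c.MultiKernelManager.default_kernel_name = 'tensorflow'" >> ~/.jupyter/jupyter_notebook_config.py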
The preferences image you show is indeed how you would set up your interpreter for PyCharm, but that's not what PyCharm's output/logging looks like. I'm guessing that's a jupyter-notebook display, which means you are running into the issue in jupyter-notebook and not PyCharm, so you need to change your setup for Jupyter. Based on some quick searching, pip install jupyter installs a Python 2.7 version of Jupyter. It sounds like what you want is
pip3 install jupyter
which will install the Python 3 version for you. You will likely have to uninstall your current version of Jupyter first.
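If the notebook still starts a Python 2 kernel after that, you may also need to register the Python 3 kernel explicitly; a minimal sketch, assuming the ipykernel package:

pip3 install ipykernel
python3 -m ipykernel install --user
# registers a 'python3' kernel for this interpreter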
When you kick off Jupyter-notebook from within PyCharm, a run configuration is created. If that configuration initially points at 2.7 (I think it defaults to the current interpreter) and you keep reusing it, the state of the current project interpreter won't matter, because the interpreter saved in the run configuration is used.
You can modify your run configuration via:
Run | Run...
Edit Configurations...
Select your Jupyter Notebook run configuration on the left (in this example, untitled4)
Make sure the Python interpreter on the right is the correct one
I was able to start a Jupyter notebook like this and get it to run Python 3. Hope this is what you need.
I am trying to work with 12GB of data in Python, for which I desperately need to use Spark, but I guess I'm too stupid to figure out the command line by myself or with the internet, and that is why I have to turn to SO.
So far I have downloaded Spark and unzipped the tar file or whatever that is (sorry for the language, but I am feeling stupid and out of my depth), but now I can see nowhere to go. I have seen the instructions in the Spark website documentation, and it says:
Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark. But where do I do this? Please, please help.
Edit: I am using Windows 10.
Note: I have always faced problems when trying to install something, mainly because I can't seem to understand the Command Prompt.
If you are more familiar with Jupyter notebooks, you can install Apache Toree, which integrates PySpark, Scala, SQL and SparkR kernels with Spark.
To install Toree:
pip install toree
jupyter toree install --spark_home=path/to/your/spark_directory --interpreters=PySpark
If you want to install other kernels, you can use:
jupyter toree install --interpreters=SparkR,SQL,Scala
Now run
jupyter notebook
In the UI, when creating a new notebook, you should see the following kernels available:
Apache Toree-Pyspark
Apache Toree-SparkR
Apache Toree-SQL
Apache Toree-Scala
When you unzip the file, a directory is created.
Open a terminal.
Navigate to that directory with cd.
Do an ls. You will see its contents; bin should be in there somewhere.
Execute bin/pyspark or maybe ./bin/pyspark.
Of course, in practice it's not that simple; you may need to set some paths, as described in TutorialsPoint, but there are plenty of such guides out there.
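Putting the steps together, a session might look like this (a sketch; the directory name depends on the Spark version you downloaded):

tar -xzf spark-2.1.0-bin-hadoop2.7.tgz
cd spark-2.1.0-bin-hadoop2.7
ls
# bin/ should appear in the listing
./bin/pyspark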
I understand that you have already installed Spark on Windows 10.
You will need winutils.exe available as well. If you haven't already done so, download the file from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe and install it at, say, C:\winutils\bin
Set up environment variables
HADOOP_HOME=C:\winutils
SPARK_HOME=C:\spark (or wherever you installed Spark)
PYSPARK_DRIVER_PYTHON=jupyter (use ipython instead if you want a plain shell)
PYSPARK_DRIVER_PYTHON_OPTS=notebook
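For example, from a Command Prompt these can be set persistently with setx (a sketch; adjust the paths to your installation, and open a new prompt afterwards so the values take effect):

setx HADOOP_HOME C:\winutils
setx SPARK_HOME C:\spark
setx PYSPARK_DRIVER_PYTHON jupyter
setx PYSPARK_DRIVER_PYTHON_OPTS notebook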
Now navigate to the C:\spark directory in a command prompt and type pyspark.
Jupyter notebook will launch in a browser.
Create a Spark context and run a count command to verify the setup.