Problems executing the Jupyter notebook using SageMaker Lifecycle configuration

I have set up a notebook on a SageMaker instance that collects data from a PostgreSQL database and then updates a different one with the output. When I run it manually, it works (I am using sqlalchemy's create_engine and then pandas' to_sql on the final dataframe).
I then set up a lifecycle configuration that executes the notebook when the instance starts. I wanted to get around the 5-minute timeout, so I used the nohup command when executing the notebook:
#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
# PARAMETERS
ENVIRONMENT=python3
NOTEBOOK_FILE=/home/ec2-user/SageMaker/HourlyRun.ipynb
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
pip install --upgrade psycopg2-binary
pip install gensim==3.8.3
nohup jupyter nbconvert --to html "$NOTEBOOK_FILE" --ExecutePreprocessor.kernel_name=python3 --ExecutePreprocessor.timeout=-1 --execute &
source /home/ec2-user/anaconda3/bin/deactivate
# PARAMETERS
IDLE_TIME=1800
wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py
if /usr/bin/python -c "import boto3" 2>/dev/null; then
PYTHON_DIR='/usr/bin/python'
elif /usr/bin/python3 -c "import boto3" 2>/dev/null; then
PYTHON_DIR='/usr/bin/python3'
else
exit 1
fi
(crontab -l 2>/dev/null; echo "*/5 * * * * $PYTHON_DIR $PWD/autostop.py --time $IDLE_TIME --ignore-connections >> /var/log/jupyter.log") | crontab -
EOF
The instance now starts and stops as scheduled with no errors, but I noticed the database wasn't being updated. I checked the logs, and there seems to be an operational error when querying the database.
I searched for a solution but only found references to changing hot_standby_feedback to "on", and it's not clear how I can do this from the code in Jupyter. I'm not sure how to approach this or whether there is a better solution.
Thanks in advance.

Related

Notebook Instance Lifecycle Config for Notebook Instance took longer than 5 minutes

I am running a notebook using a SageMaker lifecycle configuration, but I am running into timeout issues. I went through blogs online on how to run scripts that take more than 5 minutes.
I figured out that using nohup and & would put the process in the background to complete. However, I am still running into timeout issues and haven't been able to figure out why.
Below are the script and the error I am receiving:
set -e
ENVIRONMENT=python3
NOTEBOOK_FILE="/home/ec2-user/SageMaker/mynotebook.ipynb"
AUTO_STOP_FILE="/home/ec2-user/SageMaker/auto-stop.py"
IDLE_TIME=300 # 5minute
echo "<>Activating conda env"
echo "<>PWD: $PWD"
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
echo "<>Installing packages"
pip install cloudscraper==1.2.58 bs4
pip install pandas
pip install boto3 s3fs
echo "<>Starting notebook"
nohup jupyter nbconvert "$NOTEBOOK_FILE" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --ExecutePreprocessor.timeout=-1 --execute &
echo "<>Deactivating conda env"
source deactivate
# PARAMETERS
echo "<>Fetching the autostop script"
wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py
echo "<>Starting the SageMaker autostop script in cron"
(crontab -l 2>/dev/null; echo "*/1 * * * * /usr/bin/python $PWD/autostop.py --time $IDLE_TIME --ignore-connections") | crontab -
Error -
Failure reason Notebook Instance Lifecycle Config
'arn:aws:sagemaker:ap-south-1:588770669142:notebook-instance-lifecycle-config/web-scraping-lifecycle'
for Notebook Instance
'arn:aws:sagemaker:ap-south-1:588770669142:notebook-instance/web-scraping-routines-lambda'
took longer than 5 minutes. Please check your CloudWatch logs for more
details if your Notebook Instance has Internet access.

Try this; it works on my end:
jupyter nbconvert --to html "$NOTEBOOK_FILE" --ExecutePreprocessor.kernel_name=python3 --ExecutePreprocessor.timeout=-1 --execute
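
If you would rather drive this from Python, here is a minimal sketch that builds the same nbconvert command and starts it in the background while appending its output to a log file, which makes silent failures easier to diagnose. The helper names build_nbconvert_cmd and launch_detached and any log path you pass in are illustrative assumptions, not part of the scripts above.

```python
import subprocess


def build_nbconvert_cmd(notebook, kernel="python3", to="html"):
    """Return the nbconvert argument list used in the scripts above."""
    return [
        "jupyter", "nbconvert",
        "--to", to,
        str(notebook),
        f"--ExecutePreprocessor.kernel_name={kernel}",
        "--ExecutePreprocessor.timeout=-1",  # -1 disables the per-cell timeout
        "--execute",
    ]


def launch_detached(cmd, log_path):
    """Start cmd in its own session and return at once, similar to nohup ... &."""
    log = open(log_path, "ab")  # append stdout and stderr so errors can be read later
    try:
        return subprocess.Popen(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # POSIX only: detach from the calling shell
        )
    finally:
        log.close()  # the child process keeps its own copy of the descriptor
```

For example, launch_detached(build_nbconvert_cmd("/home/ec2-user/SageMaker/mynotebook.ipynb"), "nbconvert.log") returns immediately, and the notebook's errors end up in nbconvert.log.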

Cron, execute bash script as root, but one part (Python script) as user

I need to run a bash script periodically on a Jetson Nano (so, Ubuntu 18.04). The script should run system updates, pull some Python code from a repository, and run it as a specified user.
So, I created this script:
#! /bin/bash
## system updates
sudo apt update
sudo apt upgrade
## stop previous instances of the Python code
pkill python3
## move to python script folder
cd /home/user_name/projects/my_folder
## pull updates from repo
git stash
git pull
## create dummy folder to check bash script execution to this point
sudo -u user_name mkdir /home/user_name/projects/dummy_folder_00
## launch python script
sudo -u user_name /usr/bin/python3 python_script.py --arg01 --arg02
## create dummy folder to check bash script execution to this point
sudo -u user_name mkdir /home/user_name/projects/dummy_folder_01
I created a cron job running this script as root, by using
sudo crontab -e
and adding the entry
00 13 * * * /home/user_name/projects/my_folder/script.sh
Now, I can see that at the configured time, both the dummy folders are created, and they actually belong to user_name. However, the Python script isn't launched.
I tried creating the cron job as a non-root user (crontab -e), but then, even if the Python script gets executed, I guess I wouldn't be able to run apt update/upgrade.
How can I fix this?

Well, if the dummy folders did get created, that means the sudo statements work, so I'd say there's a 99%+ chance that Python was in fact started.
I'm guessing the problem is that you haven't specified the path to the Python file, and your working directory likely isn't what you expect it to be.
Change:
sudo -u user_name /usr/bin/python3 python_script.py --arg01 --arg02
to something like
sudo -u user_name /usr/bin/python3 /path/to/your/python_script.py --arg01 --arg02
then test.
If that doesn't solve the problem, enable some logging by changing the line to:
sudo -u user_name /usr/bin/python3 /path/to/your/python_script.py --arg01 --arg02 \
1> /home/user_name/projects/dummy_folder_00/log.txt 2>&1 ;
and test again; it should then log STDOUT and STDERR to that file.
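
If the working directory does turn out to be the issue, another option is to make the Python script itself independent of wherever cron starts it. A minimal sketch, assuming python_script.py reads files stored next to it (the helper names and the config.json file are hypothetical):

```python
from pathlib import Path


def script_dir(script_file):
    """Directory containing script_file, independent of the current working directory."""
    return Path(script_file).resolve().parent


def resource_path(script_file, name):
    """Absolute path to a file stored alongside the script."""
    return script_dir(script_file) / name
```

Inside the script you would call resource_path(__file__, "config.json") instead of opening "config.json" directly, so the result no longer depends on the directory the job was launched from.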

Python xlwings fails to run COM server

I’m stumped. xlwings is reporting a failure to activate the COM server. I am using Excel 365 on Windows 10 with xlwings 0.17.1 (Excel module version 0.20.5) and a Python 3.8 conda environment created directly through Anaconda Navigator.
In Excel, I traced the error to a shell command (below), which is kicked off by Excel XLPyCommand() in the xlwings Module (xlwings quickstart myproject --standalone).
cmd.exe /K ""C:\Users\Chris\anaconda3\condabin\conda" activate xw_cja && pythonw -B -c "import sys, os;sys.path[0:0]=os.path.normcase(os.path.expandvars(r'D:\Code\Python\AnacondaVenvs\xw_cja;D:\Code\Python\AnacondaVenvs\xw_cja\advena_xw.zip;C:\Users\Chris\anaconda3\envs\xw_cja')).split(';');import xlwings.server; xlwings.server.serve('$(CLSID)')"
I created a simple .py file to run this in PyCharm directly. I replaced “('$(CLSID)')” with “(‘{506e67c3-55b5-48c3-a035-eed5deea7d6d}’)”, which is hard-coded by the developer in both the Excel VBA and the python code.
import sys, os
os.path.expandvars(r'D:\Code\Python\AnacondaVenvs\xw_cja;D:\Code\Python\AnacondaVenvs\xw_cja\advena_xw.zip;C:\Users\Chris\anaconda3\envs\xw_cja')
sys.path[0:0]=os.path.normcase(os.path.expandvars(r'D:\Code\Python\AnacondaVenvs\xw_cja;D:\Code\Python\AnacondaVenvs\xw_cja\advena_xw.zip;C:\Users\Chris\anaconda3\envs\xw_cja')).split(';')
import xlwings.server
xlwings.server.serve('{506e67c3-55b5-48c3-a035-eed5deea7d6d}')
The code above runs just fine in a standalone ‘.py’ file (in PyCharm).
FYI: I used MKLINK /J (in a batch file) to make a directory junction linking C:\Users\Chris\anaconda3\envs to D:\Code\Python\AnacondaVenvs, which makes them synonyms. This is unrelated, but without this knowledge, you might mistakenly see path discrepancies in the code above. (As an aside, I do this to make data on my D drive appear to be on C, to manage the space on C.)
The shell opened, showing the command is run against the correct conda environment, xw_cja.
Excel hourglassed for at least a minute. Then, I got this error: "Could not activate Python COM server, hr = -2147221164 1000"
I opened a command prompt and ran the following with no errors.
C:\Users\Chris>"C:\Users\Chris\anaconda3\condabin\conda" activate xw_cja
(xw_cja) C:\Users\Chris>pythonw -B -c "import sys, os;sys.path[0:0]=os.path.normcase(os.path.expandvars(r'D:\Code\Python\AnacondaVenvs\xw_cja;D:\Code\Python\AnacondaVenvs\xw_cja\advena_xw.zip;C:\Users\Chris\anaconda3\envs\xw_cja\')).split(';')
(xw_cja) C:\Users\Chris>pythonw -B -c "import xlwings.server"
(xw_cja) C:\Users\Chris>pythonw -B -c "import xlwings.server.serve('$(CLSID)')"
(xw_cja) C:\Users\Chris>pythonw -B -c "import sys, os;sys.path[0:0]=os.path.normcase(os.path.expandvars(r'D:\Code\Python\AnacondaVenvs\xw_cja;D:\Code\Python\AnacondaVenvs\xw_cja\advena_xw.zip;C:\Users\Chris\anaconda3\envs\xw_cja\')).split(';');import xlwings.server; xlwings.server.serve('$(CLSID)')"
(xw_cja) C:\Users\Chris>
I opened a new command prompt / shell and tried the full command as a single line with no error reported.
C:\Users\Chris>"C:\Users\Chris\anaconda3\condabin\conda" activate xw_cja && pythonw -B -c "import sys, os;sys.path[0:0]=os.path.normcase(os.path.expandvars(r'D:\Code\Python\AnacondaVenvs\xw_cja;D:\Code\Python\AnacondaVenvs\xw_cja\advena_xw.zip;C:\Users\Chris\anaconda3\envs\xw_cja\')).split(';');import xlwings.server; xlwings.server.serve()
(xw_cja) C:\Users\Chris>
Here is my config data as set on the 'xlwings.conf' worksheet (yes, I removed the leading underscore).
Interpreter_Win pythonw
PYTHONPATH C:\Users\Chris\anaconda3\envs\xw_cja\
Conda Path C:\Users\Chris\anaconda3
Conda Env xw_cja
UDF Modules advena_xw;udf
Debug UDFs FALSE
Log File
Use UDF Server FALSE
Show Console FALSE

So, this was very interesting. I was using a conda environment. Even so, adding the Anaconda paths to my system environment variables (Windows 10) resolved the issue. It is my understanding from the xlwings documentation that this is not needed, but it resolved my problem. I did verify that, for most Python scripts triggered from Excel, xlwings is using the conda environment. However, I must have the same version of the xlwings dll files in both the conda environment and the main Python installation. I have not had this problem when using venv in the past (the native Python virtual environment library).
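
To compare what each installation sees, a small check like the following can be run once with the conda environment's interpreter and once with the main Python installation. It is only a sketch (the function name describe_environment is made up for illustration) and reports the interpreter path and the installed xlwings version, if any.

```python
import sys
from importlib import metadata


def describe_environment(package="xlwings"):
    """Report which interpreter is running and which version of package it sees."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed for this interpreter
    return {"executable": sys.executable, "prefix": sys.prefix, "version": version}
```

Printing describe_environment() from both interpreters shows whether the two xlwings versions actually match.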

Set Jupyter Notebook password in bash script through environment variable

I have come across the following code in a bash script for automatic deployment and setup of Jupyter Notebook. There are issues when the password contains a #, as the rest of the line is treated as a comment (and probably raises an exception in Python due to lack of closing bracket):
# Set up Jupyter Notebook with the same password
su - $USERNAME -c "$SHELL +x" << EOF
mkdir -p ~/.jupyter
python3 -c "from notebook.auth.security import set_password; set_password(password=\"$PASSWORD\")"
EOF
Obviously this is not an ideal way to set a password due to the above issues.
I am wondering what best practice would be in this situation? Using triple-quotes instead of single enclosing quotes might be better, but still runs into issues if the supplied password has triple quotes. How is this situation normally tackled?

Instead of passing the password through a bash here-document, just use python to read it from the environment:
# Set up Jupyter Notebook with the same password
su - $USERNAME -c "$SHELL +x" << EOF
mkdir -p ~/.jupyter
python3 -c "from notebook.auth.security import set_password; import os; set_password(password=os.environ['PASSWORD'])"
EOF
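
To see why this avoids the quoting problem, here is a small self-contained sketch: the password travels to the child process as an environment variable and is never spliced into source code, so characters such as # or quotes need no escaping. The roundtrip helper is purely illustrative, and it assumes PASSWORD actually reaches the Python process, which is worth checking when a login shell is started with su -.

```python
import os
import subprocess
import sys

# Child program: reads the secret from its environment, never from its source text.
CHILD_CODE = "import os, sys; sys.stdout.write(os.environ['PASSWORD'])"


def roundtrip(password):
    """Run a child Python process with PASSWORD set and return what it read."""
    env = dict(os.environ, PASSWORD=password)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling roundtrip with a password that contains #, single quotes, double quotes, or triple quotes returns the same string unchanged.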

You can pass Jupyter command-line options to the notebook server, as is done for Docker. The method is safe because it uses the SHA hash of the password rather than the password itself. The start-notebook.sh script looks like this:
#!/bin/bash
# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
set -e
wrapper=""
if [[ "${RESTARTABLE}" == "yes" ]]; then
wrapper="run-one-constantly"
fi
if [[ ! -z "${JUPYTERHUB_API_TOKEN}" ]]; then
# launched by JupyterHub, use single-user entrypoint
exec /usr/local/bin/start-singleuser.sh "$@"
elif [[ ! -z "${JUPYTER_ENABLE_LAB}" ]]; then
. /usr/local/bin/start.sh $wrapper jupyter lab "$@"
else
. /usr/local/bin/start.sh $wrapper jupyter notebook "$@"
fi
For example, to secure the Notebook server with a custom password hashed using IPython.lib.passwd() instead of the default token, you can run the following to test via docker:
#generate a password
from IPython.lib.security import passwd
passwd(passphrase=None, algorithm='sha1')
# run test in docker
docker run -d -p 8888:8888 jupyter/base-notebook start-notebook.sh --NotebookApp.password='sha1:31246c00022b:cd4a5e1eac6b621284331642a27b621948d80a92'
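
The value passed to the password option above has the form algorithm:salt:digest. As a rough illustration of how such a salted hash can be produced and checked with the standard library alone, here is a sketch that assumes the digest is the hash of the passphrase followed by the salt. That layout is my assumption about the legacy format, and the function names are hypothetical; for a real deployment, generate the hash with the IPython/Jupyter helper shown above.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase, algorithm="sha1", salt_len=12):
    """Return 'algorithm:salt:digest' for passphrase with a fresh random salt."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hexadecimal characters
    digest = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return f"{algorithm}:{salt}:{digest}"


def check_hash(hashed, passphrase):
    """Return True if passphrase matches a value produced by make_hash."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
        expected = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    except ValueError:
        return False  # malformed value or unknown algorithm
    return hmac.compare_digest(expected, digest)
```

Because only the salted digest is stored or passed on the command line, the plain-text password never appears in the script or in the docker run arguments.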

how to connect oracle database from python from unix server

How can I connect to an Oracle database server from Python on a Unix server?
I can't install any packages such as cx_Oracle or pyodbc.
Please note that even pip is not available.
It is my Unix production server, so I have a lot of restrictions.
I tried running the SQL script with the sqlplus command, and it works.

OK, so sqlplus is there and it works, which means the Oracle drivers are present.
Try to proceed as follows:
1) Create a Python virtualenv in your $HOME. In Python 3:
python -m venv $HOME/my_venv
2) Activate it:
source $HOME/my_venv/bin/activate[.csh] # use activate.csh for csh, plain activate for bash
3) Install pip using the python binary from your new virtualenv; this is well described here: https://pip.pypa.io/en/stable/installing/
TL;DR:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py (this should install pip into your virtualenv as $HOME/my_venv/bin/pip[3])
4) Install cx_Oracle:
pip install cx_Oracle
Now you should be able to import it in your Python code and connect to an Oracle DB.

I connected to the Oracle database via SQL*Plus, calling the script as follows:
import os
os.environ['ORACLE_HOME'] = '<ORACLE PATH>'
os.chdir('<DIR NAME>')
VARIABLE=os.popen('./script_to_Call_sql_script.sh select.sql').read()
My shell script, script_to_Call_sql_script.sh:
#!/bin/bash
envFile=ENV_FILE_NAME
envFilePath=<LOCATION_OF_ENV>${envFile}
ORACLE_HOME=<ORACLE PATH>
if [[ $# -eq 0 ]]
then
echo "USAGES: Please provide the positional parameter"
echo "`basename $0` <SQL SCRIPT NAME>"
fi
ECR=`$ORACLE_HOME/bin/sqlplus -s /@<server_name><<EOF
set pages 0
set head off
set feed off
@$1;
exit
EOF`
echo $ECR
The above helped me get my work done on the production server.
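
The same idea can also be written without the intermediate shell script by feeding the commands to sqlplus on standard input from Python. This is only a sketch under a few assumptions: sqlplus lives in ORACLE_HOME/bin, an @ prefix runs a script file in SQL*Plus, and the helper names build_sqlplus_input and run_sql_script are made up for illustration.

```python
import os
import subprocess


def build_sqlplus_input(sql_script):
    """SQL*Plus commands fed on stdin: quiet formatting, run the script, exit."""
    return "\n".join([
        "set pages 0",
        "set head off",
        "set feed off",
        f"@{sql_script}",
        "exit",
    ]) + "\n"


def run_sql_script(oracle_home, connect_string, sql_script):
    """Run sql_script through 'sqlplus -s' and return its standard output."""
    env = dict(os.environ, ORACLE_HOME=oracle_home)
    result = subprocess.run(
        [os.path.join(oracle_home, "bin", "sqlplus"), "-s", connect_string],
        input=build_sqlplus_input(sql_script),
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if sqlplus exits with a non-zero status
    )
    return result.stdout
```

A call would look like run_sql_script('<ORACLE PATH>', '/@<server_name>', 'select.sql'), with the placeholders filled in as in the shell version.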
