I installed Anaconda and am trying to run Spark on top of it.
When I launch Spark with IPYTHON_OPTS="notebook", the notebook uses the Python version from Anaconda's default environment.
$ conda search python
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ....
ipython 0.13 py26_0 defaults
* 4.1.2 py35_1 defaults
ipython-notebook 0.13.2 py27_0 defaults
4.0.4 py27_0 defaults
4.0.4 py34_0 defaults
4.0.4 py35_0 defaults
python 1.0.1 0 defaults
. 2.7.11 0 defaults
* 3.5.1 0 defaults
And if I start pyspark from the shell, I can specify which environment's Python I want (I want 2.7.11):
$ PYSPARK_PYTHON=/Applications/anaconda/anaconda/envs/vingt-sept/bin/python pyspark
Python 2.7.11 |Continuum Analytics, Inc.| (default, Dec 6 2015, 18:57:58)
But if I start Spark with the IPython notebook, it defaults back to the Python 3.5 version. :-(
How can I point the default IPython kernel at the same Python as my env "vingt-sept"?
Similar to how you set the PYSPARK_PYTHON environment variable for the pyspark shell, you can set it from inside your IPython/Jupyter notebook:
import os
os.environ["PYSPARK_PYTHON"] = "/Applications/anaconda/anaconda/envs/vingt-sept/bin/python"
Refer to this blog post for more information about setting PYSPARK_PYTHON and other Spark-related environment variables from your notebook.
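Note that this must run before any SparkContext is created, or the setting has no effect; putting it in the very first cell is safest. A minimal sketch (the path is the asker's "vingt-sept" env, so substitute your own; pinning PYSPARK_DRIVER_PYTHON as well is an optional extra to keep driver and workers on the same interpreter):

```python
import os

# Must run before any SparkContext is created; workers pick up this interpreter.
# The path below is the asker's "vingt-sept" env -- substitute your own.
os.environ["PYSPARK_PYTHON"] = "/Applications/anaconda/anaconda/envs/vingt-sept/bin/python"

# Optionally pin the driver to the same interpreter to avoid version mismatches.
os.environ["PYSPARK_DRIVER_PYTHON"] = os.environ["PYSPARK_PYTHON"]

print(os.environ["PYSPARK_PYTHON"])
```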
I am trying to install keras-bert as explained here. Although it is successfully installed in the environment, I cannot see keras-bert from the R side.
(bert_env) C:\Users\xxxxx\Dropbox\Rcode\ProjectBERT>pip list
Package Version
-------------------------------- ---------
certifi 2022.6.15
keras 2.6.0
keras-bert 0.89.0
keras-embed-sim 0.10.0
keras-layer-normalization 0.16.0
keras-multi-head 0.29.0
keras-pos-embd 0.13.0
keras-position-wise-feed-forward 0.8.0
keras-self-attention 0.51.0
keras-transformer 0.40.0
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
numpy 1.23.1
pip 22.1.2
setuptools 61.2.0
six 1.16.0
wheel 0.37.1
wincertstore 0.2
Further proof that keras-bert is successfully installed in the current environment (bert_env):
(bert_env) C:\Users\xxxxx\Dropbox\Rcode\ProjectBERT>pip install keras-bert
Requirement already satisfied: keras-bert in c:\anaconda3\envs\bert_env\lib\site-packages (0.89.0)
Requirement already satisfied: keras-transformer==0.40.0 in c:\anaconda3\envs\bert_env\lib\site-packages (from keras-bert) (0.40.0)
However, I cannot properly load the package on the R side:
> reticulate::conda_list()
name python
1 base C:\\Anaconda3/python.exe
2 bert_env C:\\Anaconda3\\envs\\bert_env/python.exe
3 py27 C:\\Anaconda3\\envs\\py27/python.exe
> reticulate::use_condaenv("bert_env", required=TRUE)
> reticulate::py_config()
C:\ANACON~2\envs\bert_env\lib\site-packages\numpy\__init__.py:138: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not assured. Please install mkl-service package, see http://github.com/IntelPython/mkl-service
from . import _distributor_init
python: C:/Anaconda3/envs/bert_env/python.exe
libpython: C:/Anaconda3/envs/bert_env/python310.dll
pythonhome: C:/Anaconda3/envs/bert_env
version: 3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)]
Architecture: 64bit
numpy: C:/Anaconda3/envs/bert_env/Lib/site-packages/numpy
numpy_version: 1.23.1
I have tried almost everything offered as a solution, such as:
conda update conda
conda update --all
Then I created a fully clean new environment and reinstalled keras-bert there. I also looked into the scipy and numpy package conflicts mentioned on their GitHub page. No success at all!
Of course, when I check availability, it returns FALSE:
reticulate::py_module_available('keras_bert')
[1] FALSE
I am aware of this post and applied exactly the same pattern with YAML, again with no improvement. Any suggestions on how to properly install keras-bert, or how to approach the issue, are greatly appreciated.
After a couple of days of searching, I found a way to set the reticulate path properly, which also solves the main issue. The .Renviron file is a user-controllable file for creating environment variables, so it can store settings that are accessible from every R project on your computer.
First of all, you need to find the location of your .Renviron file. To find its path, type this in the RStudio console:
usethis::edit_r_environ()
In my case, it returns:
* Modify 'C:/Users/xxxxx/Documents/.Renviron'
* Restart R for changes to take effect
Alternatively, you can navigate to the .Renviron file manually. (Note: it was empty when I opened it in my case.) Then paste the full path of python.exe into the file:
RETICULATE_PYTHON="YourEnvironmentPath/python.exe"
Tip: If you are not sure about your full python path, you can get it as:
reticulate::conda_python("keras_bert")
Remember to use your own environment name in place of mine (keras_bert).
Then I got:
[1] "C:\\Anaconda3\\envs\\keras_bert/python.exe"
In my case, I pasted:
RETICULATE_PYTHON="C:\\Anaconda3\\envs\\keras_bert/python.exe"
Alternatively (on Windows), open Windows PowerShell from the Start menu and run this command to do the same thing:
Add-Content C:\Users\xxxxx\Documents/.Renviron 'RETICULATE_PYTHON="C:\\Anaconda3\\envs\\keras_bert/python.exe"'
For the change to take effect, close and reopen RStudio so you get a clean session. Then run the following to check whether everything is fine:
Sys.getenv('RETICULATE_PYTHON')
or:
reticulate::py_config()
In my case, it returns:
[1] "C:/Anaconda3/envs/keras_bert/python.exe"
or:
python: C:/Anaconda3/envs/keras_bert/python.exe
libpython: C:/Anaconda3/envs/keras_bert/python38.dll
pythonhome: C:/Anaconda3/envs/keras_bert
version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:45) [MSC v.1929 64 bit (AMD64)]
Architecture: 64bit
numpy: C:/Anaconda3/envs/keras_bert/Lib/site-packages/numpy
numpy_version: 1.23.2
NOTE: Python version was forced by RETICULATE_PYTHON
In short, with this solution you no longer need to worry about declaring the environment at the start of each R session, or any other messy workarounds.
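The same availability check that reticulate::py_module_available performs can be sketched on the Python side, which is handy for confirming that the interpreter RETICULATE_PYTHON points at really sees the package. In the asker's setup you would check "keras_bert"; here the stdlib module "json" stands in as one that is always present:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` is importable by *this* interpreter."""
    return importlib.util.find_spec(name) is not None

print(module_available("json"))        # stdlib, so True in any interpreter
print(module_available("keras_bert"))  # True only in an env where it is installed
```

Run this with the exact python.exe you put into .Renviron; if it prints False for keras_bert there, reticulate will report FALSE as well.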
I'm trying to start using Python 3.9 and am unable to make this happen.
From the Anaconda Prompt I run this and get:
C:\Users\fname lname>python --version
Python 3.7.10
I thought I had installed 3.9 already, so I run this command to check whether it is available:
C:\Users\fname lname>conda search python
Loading channels: done
# Name Version Build Channel
python 2.7.13 h1b6d89f_16 pkgs/main
python 2.7.13 h9912b81_15 pkgs/main
...
python 3.9.1 h6244533_2 pkgs/main
python 3.9.2 h6244533_0 pkgs/main
Sure enough, it looks like 3.9.2 is there, so I run:
conda install python=3.9.2
Here I get a lot of conflict errors, and I'm not sure what, if anything, is actually happening.
The list of conflicts is long enough that it scrolls past and I can't recover all of the lines:
...
Package hdf5 conflicts for:
hdf5
opencv -> hdf5[version='>=1.10.2,<1.10.3.0a0|>=1.10.4,<1.10.5.0a0|>=1.8.20,<1.9.0a0|>=1.8.18,<1.8.19.0a0']
Package pkginfo conflicts for:
conda-build -> pkginfo
pkginfo
Package pywin32-ctypes conflicts for:
keyring -> pywin32-ctypes[version='!=0.1.0,!=0.1.1']
pywin32-ctypes
spyder -> keyring[version='>=17.0.0'] -> pywin32-ctypes[version='!=0.1.0,!=0.1.1']
And running the same version command as before, I still get Python 3.7.10.
I'm using conda and pip to manage my packages.
In my environment.yml, I have the following
- LOTS OF PACKAGES
- ...
- ...
- pip:
- pyarrow==0.16.0
So pyarrow should be pinned to a specific version, 0.16.0.
I conda activate that environment, and when I run pip freeze or pip show, the version agrees; it's supposed to be 0.16.0:
(CONDA) $ pip show pyarrow
Name: pyarrow
Version: 0.16.0
Summary: Python library for Apache Arrow
Home-page: https://arrow.apache.org/
Author: None
Author-email: None
License: Apache License, Version 2.0
Location: /home/<USER>/anaconda3/envs/CONDA/lib/python3.6/site-packages
Requires: numpy, six
But when I fire up python, import the library, and try to get the version, it's different:
(CONDA) $ python
Python 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> pyarrow.__version__
'0.12.0'
>>> pyarrow.__file__
'/home/<USER>/anaconda3/envs/CONDA/lib/python3.6/site-packages/pyarrow/__init__.py'
I don't understand how that's possible. I would expect the versions to agree, but for some reason Python insists that pyarrow is a different version.
Now I suspect my entire conda environment is bad. Shouldn't the version I get in Python agree with pip freeze?
pyarrow does some funny stuff with the __version__ variable: it is generated using setuptools-scm.
So maybe the conda release you installed is broken for some reason. Try installing it manually, by cloning the repository and installing with pip from the cloned folder without going through conda, and see if you get a different result.
I believe you can do this automatically by running:
pip install "git+https://github.com/apache/arrow#apache-arrow-0.16.0#egg=pyarrow&subdirectory=python"
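One way to see this kind of mismatch directly is to compare the version recorded in the package's installed metadata (what pip show reads) with the module's own __version__ attribute (the value setuptools-scm baked in at build time). A sketch using importlib.metadata (Python 3.8+), with pip standing in for pyarrow since any installed distribution works:

```python
import importlib.metadata

import pip  # stand-in for pyarrow; any installed distribution works

meta_version = importlib.metadata.version("pip")  # what `pip show` reports
attr_version = pip.__version__                    # what the module says about itself

# For a healthy install these agree; the asker's pyarrow reported 0.16.0 vs 0.12.0.
print(meta_version, attr_version)
```

If the two values disagree, the files on disk don't match the installed metadata, which points at a stale or broken install rather than a pip or conda reporting bug.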
I am on OS X El Capitan, doing data science.
For this I am using Anaconda with Python 2.7.
I have used various envs successfully and was very happy with Anaconda in general.
Now I wanted to create a new env (called tf, for TensorFlow) and install OpenCV 3.1, which I succeeded in doing after several attempts. So, if I open python, it prompts with:
Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:05:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
and then I do
import cv2
print(cv2.__version__)
and it prints 3.1.0.
So far so good.
All of this I do within my environment tf.
But now I launch a notebook with
jupyter notebook
and open a new notebook; when I import cv2, it fails with:
ImportError: No module named cv2
I cannot understand this and need help!
When I
conda list
I get all packages (partial paste below):
jsonschema 2.5.1 py27_0
jupyter 1.0.0 py27_3
jupyter_client 5.0.0 py27_0
jupyter_console 5.1.0 py27_0
jupyter_core 4.3.0 py27_0
libpng 1.6.28 0 conda-forge
libtiff 4.0.6 7 conda-forge
markupsafe 0.23 py27_2
mistune 0.7.4 py27_0
mkl 2017.0.1 0
nbconvert 5.1.1 py27_0
nbformat 4.3.0 py27_0
notebook 4.4.1 py27_0
numpy 1.12.0 py27_0
opencv 3.1.0 np112py27_1 conda-forge
opencv3 3.1.0 py27_0 menpo
openssl 1.0.2k
For information, I also add the system output when I run
conda info -a
which gives:
Current conda install:
platform : osx-64
conda version : 4.3.14
conda is private : False
conda-env version : 4.3.14
conda-build version : not installed
python version : 2.7.13.final.0
requests version : 2.12.4
root environment : /Users/peterhirt/anaconda (writable)
default environment : /Users/peterhirt/anaconda/envs/tf
envs directories : /Users/peterhirt/anaconda/envs
/Users/peterhirt/.conda/envs
package cache : /Users/peterhirt/anaconda/pkgs
/Users/peterhirt/.conda/pkgs
channel URLs : https://repo.continuum.io/pkgs/free/osx-64
https://repo.continuum.io/pkgs/free/noarch
https://repo.continuum.io/pkgs/r/osx-64
https://repo.continuum.io/pkgs/r/noarch
https://repo.continuum.io/pkgs/pro/osx-64
https://repo.continuum.io/pkgs/pro/noarch
config file : None
offline mode : False
user-agent : conda/4.3.14 requests/2.12.4 CPython/2.7.13 Darwin/15.6.0 OSX/10.11.6
UID:GID : 501:20
# conda environments:
#
tf * /Users/peterhirt/anaconda/envs/tf
root /Users/peterhirt/anaconda
sys.version: 2.7.13 |Anaconda 4.3.1 (x86_64)| (defaul...
sys.prefix: /Users/peterhirt/anaconda
sys.executable: /Users/peterhirt/anaconda/bin/python
conda location: /Users/peterhirt/anaconda/lib/python2.7/site-packages/conda
conda-build: None
conda-env: /Users/peterhirt/anaconda/bin/conda-env
conda-server: /Users/peterhirt/anaconda/bin/conda-server
user site dirs: ~/.local/lib/python2.7
CIO_TEST: <not set>
CONDA_DEFAULT_ENV: tf
CONDA_ENVS_PATH: <not set>
DYLD_LIBRARY_PATH: <not set>
PATH: /Users/peterhirt/anaconda/envs/tf/bin:/Users/peterhirt/anaconda/bin:/usr/local/bin:/Users/peterhirt/.npm-packages/bin:/Users/peterhirt/anaconda2/bin:/Users/peterhirt/google-cloud-sdk/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
PYTHONHOME: <not set>
PYTHONPATH: <not set>
License directories:
/Users/peterhirt/.continuum
/Users/peterhirt/Library/Application Support/Anaconda
/Users/peterhirt/anaconda/licenses
License files (license*.txt):
Package/feature end dates:
Using a Docker image helps most in such cases, since it encapsulates the environment. You can install Docker from here.
After pulling the image, you can use code like this in the shell:
docker run --rm -it -p 8888:8888 -v d:/Kaggles:/d datmo/kaggle:cpu
Then run Jupyter Notebook inside the container:
jupyter notebook --ip=0.0.0.0 --no-browser
This mounts the local directory into the container, giving the container access to it.
Then go to https://localhost:8888 in the browser; when you open a new kernel, it runs Python 3.5.
You can find more information from here.
You can also try datmo to easily set up environments and track machine-learning projects, making experiments reproducible. You can run the datmo task command as follows to set up a Jupyter notebook:
datmo task run 'jupyter notebook' --port 8888
It sets up your project and files inside the environment so you can keep track of your progress.
I created an environment called imagescraper and installed pip with it.
I then proceed to use pip to install a package called ImageScraper:
>>activate imagescraper
[imagescraper]>>pip install ImageScraper
Just to ensure that I have the package successfully installed:
>>conda list
[imagescraper] C:\Users\John>conda list
# packages in environment at C:\Anaconda2\envs\imagescrap
#
future 0.15.2 <pip>
imagescraper 2.0.7 <pip>
lxml 3.6.0 <pip>
numpy 1.11.0 <pip>
pandas 0.18.0 <pip>
pip 8.1.1 py27_1
python 2.7.11 4
python-dateutil 2.5.2 <pip>
pytz 2016.3 <pip>
requests 2.9.1 <pip>
setproctitle 1.1.9 <pip>
setuptools 20.3 py27_0
simplepool 0.1 <pip>
six 1.10.0 <pip>
vs2008_runtime 9.00.30729.1 0
wheel 0.29.0 py27_0
Before launching Jupyter Notebook, I check where the interpreter path points:
[imagescraper] C:\Users\John>python
Python 2.7.11 |Continuum Analytics, Inc.| (default, Feb 16 2016, 09:58:36) [MSC
v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import sys
>>> sys.executable
'C:\\Anaconda2\\envs\\imagescraper\\python.exe'
>>> import image_scraper
Seems OK, so I proceed to launch Jupyter Notebook with:
[imagescraper]>>jupyter notebook
Within Jupyter I created a new notebook, and when I tried the same import:
import image_scraper
I am returned with:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-6c2b65c9cdeb> in <module>()
----> 1 import image_scraper
ImportError: No module named image_scraper
Checking the paths the same way within the notebook, I get this:
import sys
sys.executable
'C:\\Anaconda2\\python.exe'
This tells me the notebook is not using the environment where I installed the modules.
Is there a way to ensure that each notebook uses the packages of its own env?
Here are two possible solutions:
You can register a new kernel based on your imagescraper environment. The kernel will start from that environment and therefore see all its packages.
source activate imagescraper
conda install ipykernel
ipython kernel install --name imagescraper
This will add a new kernel named imagescraper to your jupyter dashboard.
Another solution is to install Jupyter Notebook into the imagescraper environment and start Jupyter from that environment. This requires activating imagescraper whenever you start jupyter notebook.
source activate imagescraper
conda install notebook
jupyter notebook
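Whichever option you choose, the quickest sanity check from inside a new notebook is the same sys.executable inspection used in the question; a minimal sketch:

```python
import sys

# If the kernel was registered from (or started inside) the imagescraper env,
# this should point at ...\envs\imagescraper\python.exe, not the root Anaconda python.
print(sys.executable)

# The site-packages directories this kernel will actually import from:
print([p for p in sys.path if "site-packages" in p])
```

If sys.executable still shows the root Anaconda interpreter, the notebook is running on the wrong kernel and imports will keep failing.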