I create a virtual environment and run a PySpark script. If I do these steps on macOS, everything works fine. However, if I run them on Linux (Ubuntu 16), the wrong version of Python is picked up. Of course, I previously did export PYSPARK_PYTHON=python3 on Linux, but the issue persists. Below I explain all the steps:
1. Edit the profile: vim ~/.profile
2. Add this line to the file: export PYSPARK_PYTHON=python3
3. Execute the command: source ~/.profile
Then I do:
pip3 install --upgrade pip
pip3 install virtualenv
wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
tar -xvzf spark-2.4.0-bin-hadoop2.7.tgz && rm spark-2.4.0-bin-hadoop2.7.tgz
virtualenv test-ve
source test-ve/bin/activate && pip install -r requirements.txt
If I execute python --version inside the virtual environment, I see Python 3.5.2.
However, when I run Spark code with this command: sudo /usr/local/spark-2.4.0-bin-hadoop2.7/bin/spark-submit mySpark.py, I get Using Python version 2.7... from these lines of code:
print("Using Python version %s (%s, %s)" % (
    platform.python_version(),
    platform.python_build()[0],
    platform.python_build()[1]))
PYSPARK_PYTHON sets the command used to execute Python on the worker nodes. There's a separate environment variable, PYSPARK_DRIVER_PYTHON, that sets the command for the driver node (i.e. the node on which your script initially runs). So you need to set PYSPARK_DRIVER_PYTHON=python3 too.
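For example, assuming python3 is on your PATH, both variables can be exported before calling spark-submit:

```shell
# Point both the workers and the driver at the same interpreter
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
```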
Edit
As phd points out, you may be running into trouble with your environment because you're using sudo to call spark-submit. One thing to try is sudo -E instead of plain sudo. The -E option preserves your environment (though it isn't perfect).
If that fails, you can try setting the spark.pyspark.driver.python and spark.pyspark.python options directly. For example, you can pass the desired values into your call to spark-submit:
sudo /usr/local/spark-2.4.0-bin-hadoop2.7/bin/spark-submit --conf spark.pyspark.driver.python=python3 --conf spark.pyspark.python=python3 mySpark.py
There are several different ways to set these options (see this doc for full details). If one doesn't work or is inconvenient for you, try another.
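For instance, instead of passing --conf flags on every invocation, the same two options could go in conf/spark-defaults.conf under your Spark installation (a sketch, using the Spark 2.4 property names shown above):

```
spark.pyspark.python         python3
spark.pyspark.driver.python  python3
```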
Related
Currently I am trying, in a Python script, to:
1. create a conda venv in a temp dir with a different Python version from the one on my system
2. install some packages into this temp conda venv
3. execute another Python script using this new venv
4. kill the process (which is automatic, since it is under with .... as ..:)
import subprocess
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmpdir:
    subprocess.call([
        f"""
        conda create -p {tmpdir}/temp_venv python=3.8 <<< y;
        conda activate {tmpdir}/temp_venv && pip install <some_package>==XXX;
        {tmpdir}/temp_venv/bin/python /path/to/python/script/test.py
        """
    ],
    shell=True)
The point is that when I try this approach, I get the following error
**CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
You may need to close and restart your shell after running 'conda init'.**
I have already tried running conda init bash but the error persists.
I have also tried to use the venv package for this, but unfortunately it does not let me create a venv with a Python version that is not installed on the system.
So, the problem is that conda expects your shell to be initialized normally (an interactive shell). But when you use subprocess, you are in a non-login, non-interactive shell. So one hack would be to manually source the shell startup script. For example, on my MacBook Pro:
subprocess.run([
    f"""
    conda create -y -p {tmpdir}/temp_venv python=3.8;
    conda init;
    source ~/.bash_profile;
    conda activate {tmpdir}/temp_venv && pip install <some_package>==XXX;
    {tmpdir}/temp_venv/bin/python /path/to/python/script/test.py
    """
],
shell=True)
Of course, this is going to be a bit platform dependent. For example, on Ubuntu, you are going to want to use:
source ~/.bashrc
instead.
A more portable solution would be to get subprocess.run to use an interactive shell, which would automatically source those startup scripts according to your OS's conventions (which conda init sets up correctly).
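As a sketch of that idea: bash -i starts an interactive shell, which sources the usual startup files, and therefore the hook that conda init installed. The echoed placeholder below stands in for the conda commands from the answer:

```python
import subprocess

# In a real run, this string would hold the `conda activate ...` commands;
# here a plain echo stands in so the mechanism itself can be demonstrated.
script = "echo interactive shell ready"

# `bash -i -c` runs the command in an interactive shell, so ~/.bashrc
# (or ~/.bash_profile on macOS) is sourced before `script` executes.
result = subprocess.run(["bash", "-i", "-c", script],
                        capture_output=True, text=True)
print(result.stdout.strip())
```

Note that bash -i without a controlling terminal may print job-control warnings on stderr, but the command still runs and its stdout is captured normally.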
So, this is definitely a hack, but it should work.
BTW, if you are using conda, you might as well use:
conda create -y -p {tmpdir}/temp_venv python=3.8 <some_package>==XXX
instead of a separate:
pip install <some_package>==XXX;
A less hacky alternative is to use conda run, which will run a script in the conda environment. So something like:
subprocess.run([
    f"""
    conda create -y -p {tmpdir}/temp_venv python=3.8;
    conda run -p {tmpdir}/temp_venv --no-capture-output pip install <some_package>==XXX;
    conda run -p {tmpdir}/temp_venv --no-capture-output python /path/to/python/script/test.py
    """
],
shell=True)
I hesitate to recommend conda run because, at least a few years ago, it was considered "broken" for various subtle reasons, although it works in simple cases. I think it is still considered an "experimental feature", so use it with that caveat in mind, but it should be more portable.
I'm aware there are many similar questions but I have been through them all to no avail.
On Ubuntu 18.04, I have Python 2 and Python 3.6. I create a venv using the command below and attempt to install a package using pip. However, it attempts to install on the global system and not in the venv.
python3 -m venv v1
When I run which python, it correctly picks the Python within the venv. I have checked the v1/bin folder and pip is installed. The path within the pip script correctly points to the Python in the venv.
I have tried reinstalling python3 and venv, destroying and recreating the virtual environment, and many other things. I'm wondering whether there is some rational way to understand and solve this.
The problem in my case was that the mounted drive I was working on was not mounted as executable. So pip couldn't be executed from within the venv on the mount.
This was confirmed because I was able to get a pip install working using python -m pip install numpy, but when importing libraries, e.g. import numpy, I was then faced with the further error of:
multiarray_umath.cpython-36m-x86_64-linux-gnu.so: failed to map segment from shared object
which led back to the permissions issue, as per the GitHub issue below. The fix for that by dvdabelle in the comments then resolves both the dependent and the original issue.
https://github.com/numpy/numpy/issues/15102
In his case, he could just switch drives. I have to use this drive, so the fix was to unmount my /data disk where I was working and remount it with the exec option:
sudo umount /data
sudo mount -o exec /dev/sda4 /data
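To confirm the options actually changed, the mount flags can be inspected with findmnt (using / here just as an example target; substitute your own mount point):

```shell
# -T resolves the mount point owning the given path; OPTIONS shows exec/noexec
findmnt -T / -o TARGET,OPTIONS
```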
'which pip' now points to the pip in the venv correctly
Note: to make it permanent add the exec switch to the line for the drive in fstab as per https://download.tuxfamily.org/linuxvillage/Informatique/Fstab/fstab.html (make exec the last parameter in the options or user will override it) E.g.
UUID=1332d6c6-da31-4b0a-ac48-a87a39af7fec /data auto rw,user,auto,exec 0 0
I had this in my .bash_profile:
PATH="/Library/Frameworks/Python.framework/Versions/3.4/bin:${PATH}"
And I thought that if I just change it to this:
PATH="/Users/myusername/.pyenv/versions/3.7.2/bin:${PATH}"
Then virtualenvwrapper should simply use this as the new "source" Python to use. But that breaks it and issues a warning about the Python version not having any "virtualenvwrapper hooks".
How can I change the version mkvirtualenv installs by default? I'm looking for this to be a one-time change. I'm aware of the -p flag but don't want to have to specify it every time I create a virtualenv.
Solution 1:
alias vv="mkvirtualenv -p python3.7"
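A related one-time setting, documented by virtualenvwrapper as VIRTUALENVWRAPPER_VIRTUALENV_ARGS, passes extra flags to every virtualenv invocation from your shell startup file:

```shell
# in ~/.bash_profile, before virtualenvwrapper.sh is sourced
export VIRTUALENVWRAPPER_VIRTUALENV_ARGS='-p python3.7'
```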
Solution 2:
Set python3.7 as your default version, for example:
export py_which=`which python`
sudo rm $py_which
sudo ln -s `which python3.7` $py_which
Apparently the code in my question works; I just needed to install virtualenvwrapper for that specific Python environment.
For simplicity, I'm now prepending this python version to my path as below, so I can easily change the path in the future:
export PYTHON_PATH_LATEST="/Users/myusername/.pyenv/versions/3.7.2/bin"
PATH="${PYTHON_PATH_LATEST}:${PATH}"
As an added bonus, this is now also the python version pipenv will choose by default.
As I understand it, I have two versions of Python 2.7 installed on my machine: one in /usr/bin and another in /usr/local/bin. When I type python in the shell, it calls the one in /usr/local/bin, which doesn't have access to all the packages installed using apt-get and pip. I have tried setting up an alias, but when I type sudo python it still calls the one in /usr/local/bin. I want to always use the one in /usr/bin, since I have all the packages there. How do I do that?
From what I understood:
You have two versions of Python: one in /usr/local/bin/python and another in /usr/bin/python.
In your current configuration, the default python is /usr/local/bin/python.
You want to use the one in /usr/bin.
Update your ~/.bashrc and append this line at the end:
alias python=/usr/bin/python
Then open a new terminal, or do source ~/.bashrc in the current terminal.
Run type python to confirm; it will show that python is aliased to /usr/bin/python. (Note that which python ignores aliases, so it is not a reliable check here.)
Also, if you want to get packages into your current python (i.e. /usr/local/bin/python), you can use pip with that particular Python version.
Find the pip location using which pip.
Assuming the pip location is /usr/local/bin/pip:
/usr/local/bin/python /usr/local/bin/pip install <some_package>
You can easily have two Python versions on your machine.
But first, I recommend installing the Anaconda package.
Then you can create an environment with a Python 3 version:
conda create --name test_env python=3 numpy pandas
To activate it, you need to write in your terminal:
source activate test_env
More info here:
https://conda.io/docs/using/envs.html
I'm trying to run some Python code with the sudo command, but every time I do, it gives me an ImportError. However, if I run, say, import numpy in the terminal, it gives me no errors. Also, if I build a script with several imports and then run it without sudo, it gives me no errors and runs flawlessly. I already added Defaults env_keep += "PYTHONPATH" to the sudoers file, so that's not the problem. I installed Anaconda3, so maybe that's useful information?
I'm running GNOME Ubuntu 16.04.1 LTS. And kernel version 4.4.0-59-generic.
I'm sorry, I'm very new at this, but I'm learning.
I ran which python and then I ran sudo which python and they gave me different directories.
sudo which python gave me /usr/bin/python
which python gave me /home/user/anaconda3/bin/python
I tried running sudo ./anaconda3/envs/ml/bin/python doc.py but now it says that it can't find the file.
I'm running it with sudo because I need the permission for docker to work.
EDIT: trying sudo -E instead of sudo yields the same error.
The problem you have is that sudo does not follow the usual PATH order when looking up an executable: it searches the system directories first. This is written in man sudo:
SECURITY NOTES
sudo tries to be safe when executing external commands.
To prevent command spoofing, sudo checks "." and "" (both denoting current directory) last when searching for a command in the user's PATH (if one or both are in the PATH). Note, however, that the actual PATH environment variable is not modified and is passed unchanged to the program that sudo executes.
So, to fix this you have to make sure that the command you give to sudo cannot match a system executable, i.e. specify the absolute path:
sudo /home/user/anaconda3/bin/python
A general command that should work is:
sudo "$(which python)"
This is because which python is executed before sudo, and its output is passed as an argument to sudo. However, sudo by default does not perform any "shell-like" setup and may restrict the environment, so you may consider using the -E or -i flags to make sudo pass the environment untouched or do the proper shell setup.