Installing packages in Python and setting up the working environment

I've been coding with R for quite a while, but I want to start learning and using Python more for its machine learning applications. However, I'm quite confused about how to properly install packages and set up the whole working environment. Unlike R, where I suppose most people just use RStudio and install packages directly with install.packages(), there seems to be a variety of ways to do this in Python, including pip install and conda install, and there is also the question of whether to do it in the command prompt or in one of the IDEs. I've downloaded Python 3.8.5 and Anaconda3, and some of my most burning questions right now are:
When to use which command for installing packages? (and also should I always do it in the command prompt aka cmd on windows instead of inside jupyter notebook)
How to navigate the cmd syntax/coding (for example, the Python documentation for installing packages has this piece of code: py -m pip install "SomeProject", but I am completely unfamiliar with this syntax and how to use it. In the long run, do I also have to learn what goes on in the command prompt, or do most of the operations occur in the IDE so that I mostly don't have to touch cmd?)
How to set up a working directory of sorts (like setwd() in R) such that my .ipynb files can be saved to other directories or even better if I can just directly start my IDE from another file destination?
I've tried looking at some online resources but they mostly deal with coding basics and the python language instead of these technical aspects of the set up, so I would greatly appreciate some advice on how to navigate and set up the python working environment in general. Thanks a lot!

Python handles package installation differently from R. It has a feature called venv, short for virtual environment: an isolated directory with its own interpreter and its own set of installed packages. You install your packages into a venv, and usually you create a new venv for each project.
If you use Anaconda on Windows, everything is installed into whichever Anaconda environment is currently active.
python -m pip install "modulename" installs modulename for the Python interpreter that runs the command. With no venv activated, that is your default Python installation, and the module is then available whenever you run that interpreter outside a venv. The official Python documentation has a page on installing packages and a tutorial on how to use venv.
By default, Python resolves relative paths against the current working directory, which is the directory you started Python from; it is not necessarily the directory your code is in. For example, if you open a command prompt in C:/Users/me/home/ and run python mypythonfile.py there, the script can open files in C:/Users/me/home/ by bare name. You can use ../ to move up a directory, or you can give an absolute path to the file you want to open, e.g. with open("C:/system32/somesystemfile.sys") as file
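For the setwd() part of the question, here is a minimal sketch of the closest standard-library equivalents. The temporary folder only stands in for a project directory and is an assumption made for illustration; nothing here is specific to Jupyter or any IDE.

```python
import os
import tempfile
from pathlib import Path

# Where relative paths currently resolve (roughly R's getwd()).
start_dir = Path.cwd()
print("started in:", start_dir)

# A throwaway folder standing in for a project directory (illustrative only).
project_dir = Path(tempfile.mkdtemp())

# Change the working directory (roughly R's setwd()).
os.chdir(project_dir)

# A relative path is now resolved against project_dir.
Path("notes.txt").write_text("hello")
file_landed_in_project = (project_dir / "notes.txt").exists()
print("file written to project folder:", file_landed_in_project)

# Go back to where we started.
os.chdir(start_dir)
```

The same calls work inside a notebook cell. For notebooks, it is usually simpler to cd to the folder you want in the command prompt and start jupyter notebook from there, since the notebook server uses the folder it was launched from.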

Going over the technical differences of conda and pip:
Conda is a packaging tool and installer that aims to do more than pip does: it handles library dependencies outside of the Python packages as well as the Python packages themselves. The two overlap a lot in functionality: both install packages, and both workflows give you isolated environments (conda creates its own, while pip is typically paired with venv).
It is generally advisable to have both conda and pip installed, since some packages are available through pip but not through conda, and vice versa.
The install commands are easy enough either way, but keep in mind where the packages end up:
conda caches downloaded packages in the anaconda/pkgs directory and installs them into the folder of the active environment
pip installs into the site-packages directory of whichever Python interpreter runs it, so the exact location depends on the installation (for example, somewhere under /usr/local/lib/ on a Unix-based system, or under the Python installation folder on Windows)
You can use both pip and conda inside a Jupyter notebook and it will work just fine, but you may end up with multiple versions of the same package.
Most of the time, you will use cmd only to install a module used in your code or to create environments. py -m pip install "SomeProject" means: use the py launcher on Windows to run pip as a module (-m) and install the package "SomeProject" into the Python installation that py selects, which is the base environment if you have not set up anything else.
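Rather than relying on typical install locations, you can ask the interpreter itself. The sketch below uses only the standard library and prints where the running Python lives and where its packages are installed by default; run it from cmd, from a notebook cell, or from an IDE console to compare them.

```python
import sys
import sysconfig

# The interpreter that is running this code.
interpreter = sys.executable

# Root folder of the environment that interpreter belongs to.
environment_root = sys.prefix

# Default install directory for pure-Python packages of this interpreter.
site_packages = sysconfig.get_paths()["purelib"]

print("interpreter:     ", interpreter)
print("environment root:", environment_root)
print("site-packages:   ", site_packages)
```

If two tools print different interpreters, a package installed from one of them will not be importable from the other.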

You could think of conda as python with a variety of additional functionalities, such as certain pre-installed packages and tools, such as spyder and jupyter. Hence, you must be precise when you say:
I've downloaded python 3.8.5 and anaconda3
Does it mean you installed python in your computer and then also anaconda?
In general, or at least in my opinion, using anaconda has advantages for development, but typically you'll just use a simple python installation in production (if that applies to you).
Anaconda has its own package registry/repository. When you call conda install <package>, it searches for the package there and installs it if available. It is better to search for it first, for instance matplotlib.
pip is a package manager for the Python Package Index (PyPI). pip also ships with Anaconda. Hence, in an Anaconda environment you may install packages from either source (using either pip install or conda install). For instance, pandas from PyPI and pandas from conda. There is no guarantee that a package exists in both sources. You must either search for it first or simply try it.
For your first steps, I suggest sticking to only one dev environment (either plain Python or Anaconda; I recommend the second), because that simplifies the question "which python and which pip is executed on the command line?". That said, those commands should work as expected in any terminal, be it a plain cmd or an embedded one like in PyCharm or VS Code.
You can inspect that by running, on Linux and macOS:
which python, which pip.
In Windows cmd, the equivalents are where python and where pip.
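A cross-platform way to answer the same question from inside Python is shutil.which, which searches PATH the way the shell does. This is only a sketch; the result may be None if no python or pip command is on PATH (for example, on a Windows machine that only exposes py).

```python
import shutil
import sys

# First "python" and "pip" found on PATH, or None if there is none.
python_on_path = shutil.which("python")
pip_on_path = shutil.which("pip")

print("python on PATH:  ", python_on_path)
print("pip on PATH:     ", pip_on_path)

# The interpreter actually running this code, for comparison.
print("this interpreter:", sys.executable)
```

If the python on PATH and the interpreter running the code differ, pip install typed in the terminal and import in your IDE are talking to different installations.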
Honestly, this question/answer falls outside the scope of SO; for more info you had better check the official websites, such as Anaconda's, or search for Python vs Anaconda blog posts.

Related

Should I pip install python inside virtualenv?

I need python3.6 for tensorflow installation, so I downloaded python3.6.12.tar. And I found that I should pip install tarfile. However, in this case it is an older version of python. FYI, In my computer(laptop) I installed python3.9.
My question is: can I pip install python.tar inside a virtualenv?
This is not how virtual environments work. I suggest you do a little more research on virtual environments in Python.
Virtual Environments and Packages
Basically you need to install the necessary python version onto your machine. Then go ahead and use that specific python (which is version 3.6 in your case), to create a virtual environment with the command
~ /usr/bin/<path-to-python3.6> -m venv venv
This command creates a folder called venv. Now source the activation script inside this folder to activate your environment (on Linux/macOS: source venv/bin/activate).
Handy note: if you are dealing with different versions of Python, a more robust way of handling such situations is to use a tool called pyenv.
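To illustrate why the Python version has to be installed first: a virtual environment is always created by an existing interpreter and records that interpreter as its base. The sketch below does the same thing as the -m venv command, but through the standard-library venv module, in a throwaway temporary folder (an assumption for illustration; a real project would use a folder such as venv inside the project). with_pip=False is used only to keep the sketch fast.

```python
import tempfile
import venv
from pathlib import Path

# Throwaway location for the demonstration (illustrative only).
env_dir = Path(tempfile.mkdtemp()) / "venv"

# Create the environment with the interpreter that runs this code.
# with_pip=False skips installing pip into it; the command-line form installs pip by default.
venv.create(env_dir, with_pip=False)

# Every venv records the interpreter it was created from in pyvenv.cfg.
config_text = (env_dir / "pyvenv.cfg").read_text()
print(config_text)
```

The home entry in the printed file points at the interpreter that created the environment, so a Python 3.6 environment can only come from a Python 3.6 installation.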

Issue adding site-packages directory to PYTHONPATH in Spyder

I am using Spyder and trying to add /usr/local/lib/python3.7/site-packages to the PYTHONPATH Manager. However, I receive an error informing me "This directory cannot be added to PATH. If you want to set a different Python interpreter, please go to Preferences > Main Interpreter".
However, I have already changed my interpreter to point to /usr/bin/python3
At the moment, I am using the rather annoying work around of putting the following at the top of all my code.
import sys
sys.path.append("/usr/local/lib/python3.7/site-packages")
Printing sys.path as follows gives me the output below. Is there a way to ensure that, after running pip3 install XXX in the terminal, the packages end up in one of the locations listed?
for p in sys.path: print(p)
/Users/user
/usr/local/lib/python3.7
/Users/user/opt/anaconda3/lib/python37.zip
/Users/user/opt/anaconda3/lib/python3.7
/Users/user/opt/anaconda3/lib/python3.7/lib-dynload
/Users/user/opt/anaconda3/lib/python3.7/site-packages
/Users/user/opt/anaconda3/lib/python3.7/site-packages/aeosa
/Users/user/opt/anaconda3/lib/python3.7/site-packages/IPython/extensions
/Users/user/.ipython
Alternatively, and preferably, advice on how to add the above site-packages directory to my PATH? I feel I am missing something obvious.
(Spyder maintainer here) We forbid adding site-packages directories through our PYTHONPATH manager because it allows people to mix two different Python versions (which is what you're trying to do by adding your system site-packages to your Anaconda's Python).
And we do that because it usually generates odd errors and segfaults for binary packages such as Numpy, Pandas and Matplotlib, given that binary packages for one Python version are incompatible with packages for another one.
Finally, even though you found a workaround for that (by using sys.path), we strongly suggest you stop doing that because it'll give you nothing but headaches in the future.
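To see whether an interpreter is already mixing installations, you can list the sys.path entries that fall outside its own environment. This is a rough diagnostic sketch, not something Spyder provides: the helper name entries_outside_environment is hypothetical, and harmless entries such as the script's folder or a per-user site-packages directory will also be listed.

```python
import sys
from pathlib import Path

def entries_outside_environment(paths, prefixes):
    """Return the entries of `paths` that are not located under any of `prefixes`."""
    roots = [Path(p).resolve() for p in prefixes]
    outside = []
    for entry in paths:
        if not entry:
            continue  # an empty string means "current directory"
        resolved = Path(entry).resolve()
        if not any(resolved == root or root in resolved.parents for root in roots):
            outside.append(entry)
    return outside

# sys.prefix is the active environment; sys.base_prefix is the installation it was created from.
suspects = entries_outside_environment(sys.path, {sys.prefix, sys.base_prefix})
for entry in suspects:
    print("outside this environment:", entry)
```

A site-packages directory from a different Python installation showing up in this list is the situation described above.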
Doing what you are asking isn't the recommended path forward but you can solve the underlying problem in either of the following ways (A or B).
To "ensure pip installs packages to another location which Spyder can see", as the asker guessed in an unanswered comment on the accepted answer (Method B below), is usually not a good idea. Keeping a clean environment for Spyder ensures that you can reliably determine the requirements (including package versions) for each of your projects. Therefore, do the reverse of what you guessed: ensure Spyder uses the Python interpreter in the environment where pip installed your project's required packages.
A. Change the Python interpreter
1. Go to Tools, Preferences, and set Python interpreter to the python executable that was used to install the package (if using a virtual environment, it would be your_other_env/bin/python).
2. Close and reopen Spyder (Spyder says to restart the IPython console, but that may not work in this case and may show the error where Spyder cannot restart a kernel it didn't start).
3. Open Spyder again and run any .py file. You will get an error that says to install the spyder-kernels package. Take note of the version in the error, since that is the version you need. (For some reason pip 22.0.4 will only install spyder_kernels; see the issue "spyder-kernels should be spyder_kernels". Edit: that issue is invalid, so upgrade pip first, e.g. via pip install --upgrade pip in your virtual environment.)
4. Install that version into the environment with the commands below. If you are using conda or are on Windows, the instructions will differ, so see Common Illnesses in the Spyder documentation instead of continuing this step.
source your_other_env/bin/activate
pip install --upgrade pip setuptools
pip install spyder-kernels==...
deactivate
but change ... to the version shown in your Spyder error from step 3. If you installed Spyder with conda as recommended, use the commands from the URL above instead.
I don't recommend Method B, as I've explained. However, it may be useful if you are manually installing Spyder plugins or test suites that apply to all projects but aren't in the requirements.txt or setup.py requirements for your project(s) (and therefore don't affect determining requirements for your users).
B. To "ensure pip installs packages to another location which Spyder can see", run spyder_env/bin/python -m pip install ... to install the package, where spyder_env is the virtualenv in which Spyder is installed and ... is the package name. If Spyder was instead installed system-wide, using an installer or a Linux distro package, you may need to use your system's Python, e.g. python3 -m pip install --user ... Always use --user instead of sudo or root, to avoid mismatched files caused by mashing together the distro-packaged modules and your manually installed modules.

Does miniconda installation affect standard python installation?

I had first installed python using the standard python distribution available on their official website and I would be using pip to install all necessary packages.
However, now I wish to use miniconda, since it is a better choice for data science.
But it installs Python along with it, and I don't want to disturb my earlier setup of pip + Python.
Will installing Miniconda affect my Python installation?
Is there a way of installing it without disturbing the python installation?
I am on a Windows operating system.
You can safely install Anaconda (or Miniconda) alongside other Python installations. It goes into a completely different folder on your local disk. But leave the installation options at their defaults; in particular, don't add Anaconda's Python to your PATH.
The important thing is that you activate your environment before you use it via
conda activate
and then start Python from there (or let your IDE do that for you).
(base)> python
Without activation, conda doesn't work, and calling python from the command prompt will start your 'standard installation' again.
The advantage of Anaconda is that it guarantees maximum consistency for the 'scientific stack', and in case you are still missing some third-party packages you can always install them additionally via pip install into an activated conda environment.
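If you are ever unsure which of the two installations a prompt or IDE is using, a quick check from inside Python can help. The sketch below relies on two conda conventions: the CONDA_DEFAULT_ENV variable that conda sets on activation, and the conda-meta folder that conda keeps inside each environment. Treat the result as a hint, not a guarantee.

```python
import os
import sys
from pathlib import Path

# Name of the activated conda environment, or None if nothing is activated.
active_env_name = os.environ.get("CONDA_DEFAULT_ENV")

# conda environments keep their package metadata in a "conda-meta" folder.
interpreter_is_conda = (Path(sys.prefix) / "conda-meta").is_dir()

print("interpreter:           ", sys.executable)
print("activated conda env:   ", active_env_name)
print("interpreter from conda:", interpreter_is_conda)
```

Run from a plain command prompt without activation, this should report the standard installation; run after conda activate, it should report the conda one.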

Should Anaconda be used to manage system python? Or, is it just to create an isolated environment?

I already have Python 2.7 installed in Windows. I have normally used pip to install packages. However, Pandas recommends using Anaconda and it appears that it has many benefits so I wanted to try it.
I installed Miniconda and it just reinstalled Python under its own directory. Does Anaconda always duplicate the Python libraries, or can it be used to manage the system's Python?
I use Python for development and also want to use Pandas to analyse data. However, I would like to avoid having two copies of Python. I want one Python environment that stays constant, with all the packages that I intend to use. Otherwise, I feel that I will have to install the same packages multiple times.
I know that Anaconda is meant for separating different environments. Does this mean that I am trying to do something it was not designed for, or have I installed it incorrectly?
Anaconda has a root environment that includes a bit more than 100 of the most popular Python packages.
Yes, you can use the root Python as your system's Python executable.
The anaconda installation comes with Conda, which is a robust environment manager. If you want to keep your root environment stable, you can use Conda to create new environments for each project, and Conda handles the dependencies of each environment as well.
You can create a new environment named "analysis" that has Python, IPython, and Pandas using:
conda create --name analysis python ipython pandas
After installing all of the packages, you can use the environment by running (from the CMD prompt):
conda activate analysis

How to install Python libraries under specific environments

I have two Anaconda installations on my computer. The first one is based on Python 2.7 and the other is based on Python 3.4. The default Python version is the 3.4 though. What is more, I can start Python 3.4 either by typing /home/eualin/.bin/anaconda3/bin/python or just python. I can do the same but for Python 2.7 by typing /home/eualin/.bin/anaconda2/bin/python. My problem is that I don't know how to install new libraries under certain environments (either under Python 2.7 or Python 3.4). For example, when I do pip install seaborn the library gets installed under Python 3.4 by default when in fact I want to install it under Python 2.7. Any ideas?
EDIT
This is what I am doing so far: the ~/.bashrc file contains the following two blocks, of which only one is enabled at any given time.
# added by Anaconda 2.1.0 installer
export PATH="/home/eualin/.bin/anaconda2/bin:$PATH"
# added by Anaconda3 2.1.0 installer
#export PATH="/home/eualin/.bin/anaconda3/bin:$PATH"
Depending on which version I want to work with, I open the file, comment out the other block, and run source ~/.bashrc. Then I install the libraries I want to use one by one. But is this the recommended way?
You don't need multiple anaconda distributions for different python versions. I would suggest keeping only one.
conda basically lets you create environments for your different needs.
conda create -n myenv python=3.3 creates a new environment named myenv, which works with a python3.3 interpreter.
source activate myenv switches to the newly created environment (conda 4.4 and later use conda activate myenv instead). This basically sets the PATH such that pip, conda, python and other binaries point to the correct environment and interpreter.
conda install pip is the first thing you may want to do. Afterwards you can use pip and conda to install the packages you need.
After activating your environment pip install <mypackage> will point to the right version of pip so no need to worry too much.
You may want to create environments for different python versions or different sets of packages. Of course you can easily switch between those environments using source activate <environment name>.
For more examples and details you may want to have a look at the docs.
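When no environment is activated, or when you are unsure which pip a bare pip command refers to, you can bind pip to a specific interpreter by running it as a module of that interpreter. The sketch below only builds the command; the helper name pip_install_command is hypothetical, and seaborn is used because it is the package from the question.

```python
import sys

def pip_install_command(package):
    """Build a pip command bound to the interpreter that runs this code."""
    return [sys.executable, "-m", "pip", "install", package]

command = pip_install_command("seaborn")
print(command)

# To actually run it:
#   import subprocess
#   subprocess.run(command, check=True)
```

Because the command starts with the full path of the running interpreter, the package ends up in that interpreter's environment regardless of what PATH currently points to.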
Virtualenv seems like the obvious answer here, but I do want to suggest an alternative that we've been using to great effect lately: Fig - this is particularly effective since we use Docker in production as well, but I imagine that using Fig as a replacement for virtualenv would be quite effective regardless of your production environment.
Using virtualenv is your best option, as @Dettorer has mentioned.
I found this method of installing and using virtualenv the most useful.
Check it out:
Proper way to install virtualenv
