Anaconda - Scientific Python

What is Anaconda?

At a basic level, Anaconda is a distribution of Python and R (with an emphasis on Python) that provides access to collections of associated packages, optimized for data science and maintained in repositories. The installation and management of these packages is handled by Conda, the Anaconda package manager. While initially focused mainly on Python packages, the repositories hosted by Anaconda and others now house a large collection of non-Python packages.

Conda is more than just a package manager, however: it also creates and manages the environments that packages are installed into. Using environments to isolate software means you can have multiple versions of the same software installed in different environments and avoid conflicts or incompatibilities between software or dependencies. This is accomplished by installing packages into a separate directory, which is then prepended to your PATH when that environment is activated.

The next couple of pages will provide a brief introduction on how to use Conda to create and maintain locally administered environments on the CARC machines.

For more information on the usage and various features of Conda, please visit their website at this link, or for a quick reference guide please see the conda cheatsheet here.

Working with Conda

Using Conda to create a new environment

Let's start by logging in to Wheeler and creating an empty environment. After creating the new environment, we will install Python 2.7 in it and explore what Conda is doing. To do this we must first load the module for Anaconda; in this case we will be using Anaconda3, which uses Python 3 by default. If you know that you are going to be using a lot of code written in Python 2, you can load Anaconda instead of Anaconda3.

$ module load anaconda3
$ conda create --name py-2.7
Solving environment: done

## Package Plan ##

  environment location: /users/yourusername/.conda/envs/py-2.7


Proceed ([y]/n)?

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py-2.7
#
# To deactivate an active environment, use
#
#     $ conda deactivate
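In a job script it can be handy to check whether an environment already exists before trying to create it. The following is a minimal sketch, not an official Conda feature: env_exists is a hypothetical helper name, and it assumes that conda env list prints one environment per line with the name in the first column.

```shell
#!/usr/bin/env bash
# Sketch only: check whether a named environment already exists before creating it.
# env_exists is a hypothetical helper built on the output of "conda env list".

env_exists() {
    # Skip comment and blank lines; the first column of the rest is the
    # environment name (environments without a name are listed by path only).
    conda env list | awk '!/^#/ && NF { print $1 }' | grep -Fqx -- "$1"
}

# Example (commented out so that nothing is created by accident):
# env_exists py-2.7 || conda create --yes --name py-2.7
```

With the helper defined, the commented example creates py-2.7 only when no environment of that name is listed.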

We now have a new Conda environment that we can populate with whatever software we require for the analyses we wish to run.

Installing packages with Conda

Now that we have our empty environment, let's install Python 2.7 in it. Make sure you have the Anaconda3 module loaded and then type the following command:

$ conda install --name py-2.7 python=2.7

You should see the following printed to stdout:

## Package Plan ##

  environment location: /users/yourusername/.conda/envs/py-2.7

  added / updated specs:
    - python=2.7


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pip-10.0.1                 |           py27_0         1.7 MB
    certifi-2018.8.24          |           py27_1         139 KB
    python-2.7.15              |       h1571d57_0        12.1 MB
    setuptools-40.2.0          |           py27_0         585 KB
    wheel-0.31.1               |           py27_0          62 KB
    tk-8.6.8                   |       hbc83047_0         3.1 MB
    readline-7.0               |       h7b6447c_5         392 KB
    ------------------------------------------------------------
                                           Total:        18.1 MB

The following NEW packages will be INSTALLED:

    ca-certificates: 2018.03.07-0
    certifi:        2018.8.24-py27_1
    libedit:        3.1.20170329-h6b74fdf_2
    libffi:         3.2.1-hd88cf55_4
    libgcc-ng:      8.2.0-hdf63c60_1
    libstdcxx-ng:   8.2.0-hdf63c60_1
    ncurses:        6.1-hf484d3e_0
    openssl:        1.0.2p-h14c3975_0
    pip:            10.0.1-py27_0
    python:         2.7.15-h1571d57_0
    readline:       7.0-h7b6447c_5
    setuptools:     40.2.0-py27_0
    sqlite:         3.24.0-h84994c4_0
    tk:             8.6.8-hbc83047_0
    wheel:          0.31.1-py27_0
    zlib:           1.2.11-ha838bed_2

Proceed ([y]/n)?

Downloading and Extracting Packages
pip-10.0.1           | 1.7 MB  | ####################################### | 100%
certifi-2018.8.24    | 139 KB  | ####################################### | 100%
python-2.7.15        | 12.1 MB | ####################################### | 100%
setuptools-40.2.0    | 585 KB  | ####################################### | 100%
wheel-0.31.1         | 62 KB   | ####################################### | 100%
tk-8.6.8             | 3.1 MB  | ####################################### | 100%
readline-7.0         | 392 KB  | ####################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Now our py-2.7 environment actually has something in it that we can use. There are other ways to install packages in an environment, but they all use the conda install command. You can do what we did above and specify which environment you want to install into, or you can activate the environment first and then install the package, as shown below:

$ source activate py-2.7
(py-2.7)$ conda install python=2.7

You will notice that the name of the currently active environment now precedes your command prompt.
We can also save time by installing our packages while we create our environment by listing the packages we want installed after the name of the environment we are creating, as shown below:

$ conda create --name py-2.7 python=2.7

These all accomplish the same goal of populating an environment with software, but what exactly is a Conda environment and what is it doing?

What is a Conda environment?

When Conda creates an environment, it generates an isolated directory where software packages are installed. Then, upon activation of that environment, it prepends that directory to our PATH so that the shell searches the environment directory first. To help visualize and understand what Conda is doing, let's run a couple of Bash commands. Run which python, python --version, and echo $PATH while you have the anaconda3 module loaded but no environment activated. You should see the following printed to stdout:

$ module load anaconda3
$ which python
/opt/local/anaconda3/5.2.0/bin/python
$ python --version
Python 3.6.5 :: Anaconda, Inc.
$ echo $PATH
/opt/local/anaconda3/5.2.0/bin:/users/yourusername/bin

You can see that we are using Python version 3.6.5 distributed by Anaconda which is located in the Anaconda root directory. You can also see that at the beginning of our PATH is that root directory. Now let's activate our py-2.7 environment and see how things change.

$ source activate py-2.7
(py-2.7)$ which python
~/.conda/envs/py-2.7/bin/python
(py-2.7)$ python --version
Python 2.7.15 :: Anaconda, Inc.
(py-2.7)$ echo $PATH
/users/yourusername/.conda/envs/py-2.7/bin:/opt/local/anaconda3/5.2.0/bin:/users/yourusername/bin

As you can see, we are now accessing Python version 2.7.15 distributed by Anaconda, which is installed in the py-2.7 environment directory located in our home directory. By comparing the PATH before and after activating our environment, you can see that Conda prepends the environment's bin directory to our PATH. This is fundamentally how Conda controls and manages environments. Once you deactivate an environment, your PATH variable returns to its previous state.
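The PATH mechanism can be imitated without Conda at all. The following is a minimal sketch and not what conda activate actually runs: it builds a throwaway directory containing a stand-in python script (fake_env and the script's contents are made up for illustration), prepends that directory to PATH, and shows that the shell now finds python there first.

```shell
#!/usr/bin/env bash
# Sketch only: imitate the PATH change made when an environment is activated.
# "fake_env" and its stand-in "python" script are hypothetical, not created by Conda.

fake_env="$(mktemp -d)"
mkdir -p "$fake_env/bin"

# A stand-in executable named "python" living inside the fake environment.
printf '#!/bin/sh\necho "python from fake_env"\n' > "$fake_env/bin/python"
chmod +x "$fake_env/bin/python"

old_path="$PATH"                    # remember the PATH so it can be restored
export PATH="$fake_env/bin:$PATH"   # "activate": prepend the environment's bin directory

resolved_active="$(command -v python)"   # resolves inside fake_env while "active"
output_active="$(python)"

export PATH="$old_path"             # "deactivate": put the previous PATH back
echo "while active:  $resolved_active"
echo "after restore: $(command -v python || echo 'no python on PATH')"
rm -rf "$fake_env"
```

Because the shell searches PATH entries from left to right, whichever directory comes first wins, which is why prepending is enough to switch interpreters.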

For more information on managing Conda environments please visit the Conda help documentation at this link.

Using pip and adding channels

Installing packages with pip

Not every version of every piece of software is available as a Conda package, however, and this is especially true for some Python libraries. Pip, the Python package manager, is installed by default in all environments created by Conda, and can usually install packages alongside those installed by Conda without conflict.
For example, say you need the library psutil, but you specifically need version 5.3.0. When you search for psutil using conda you get the following:

$ conda search psutil=5.3
Loading channels: done
# Name                  Version         Build  Channel
psutil                  5.3.1       py27_0  conda-forge
psutil                  5.3.1  py27h4c169b4_0  pkgs/main
psutil                  5.3.1       py35_0  conda-forge
psutil                  5.3.1  py35h6e9e629_0  pkgs/main
psutil                  5.3.1       py36_0  conda-forge
psutil                  5.3.1  py36h0e357b8_0  pkgs/main

Unfortunately, there are no packages built for psutil version 5.3.0. However, we can use pip to install the version we want.

$ source activate py-2.7

(py-2.7)$ pip install psutil==5.3.0
Collecting psutil==5.3.0
  Downloading https://files.pythonhosted.org/packages/1c/da/555e3ad3cad30f30bcf0d539cdeae5c8e7ef9e2a6078af645c70aa81e418/psutil-5.3.0.tar.gz (397kB)
    100%|████████████████████████████████|399kB 1.3MB/s
Building wheels for collected packages: psutil
  Running setup.py bdist_wheel for psutil ... done
  Stored in directory: /users/yourusername/.cache/pip/wheels/ff/c5/4f/1ee2208203f1cfeda16e91fccd8bfce5f4840b683671729d57
Successfully built psutil

(py-2.7)$ conda list

# packages in environment at /users/yourusername/.conda/envs/py-2.7:
#
# Name                  Version                 Build
ca-certificates         2018.03.07                  0
certifi                 2018.8.24               py27_1
libedit                 3.1.20170329        h6b74fdf_2
libffi                  3.2.1               hd88cf55_4
libgcc-ng               8.2.0               hdf63c60_1
libstdcxx-ng            8.2.0               hdf63c60_1
ncurses                 6.1                 hf484d3e_0
openssl                 1.0.2p              h14c3975_0
pip                     10.0.1                  py27_0
**psutil                5.3.0                   <pip>
python                  2.7.15              h1571d57_0
readline                7.0                 h7b6447c_5
setuptools              40.2.0                  py27_0
sqlite                  3.24.0              h84994c4_0
tk                      8.6.8               hbc83047_0
wheel                   0.31.1                  py27_0
zlib                    1.2.11              ha838bed_2

When installing packages using pip, it is important to first activate the Conda environment that you want to install the package in, since pip is strictly a package manager and cannot modify Conda environments from outside that environment. You can see that our psutil package, marked with a double asterisk, is version 5.3.0, just like we wanted. Under the 'Build' column, however, you will see that Conda does not know which build it is, since it was installed with pip, as indicated by the <pip> designator.

Performance with Conda versus pip

One thing to note when installing packages is that it is preferable to first install the necessary packages with Conda, and only then use pip for those packages that are not available through the Anaconda repositories. This is because packages available through the Anaconda repository have been highly optimized and in general out-perform their pip counterparts, especially packages that rely on libraries like the Intel Math Kernel Library (MKL).
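The Conda-first, pip-second order can be wrapped in a small shell function. This is only a sketch under stated assumptions: install_with_fallback is a hypothetical helper name, it must be run inside the activated environment, and it treats any failure of conda install (including a network error) as a reason to try pip.

```shell
#!/usr/bin/env bash
# Sketch only: prefer a Conda package, fall back to pip when Conda cannot install it.
# install_with_fallback is a hypothetical helper, not part of Conda or pip.
# Usage: install_with_fallback <conda-spec> [<pip-spec>]
# Run it inside the activated environment so both tools act on that environment.

install_with_fallback() {
    local conda_spec="$1"
    local pip_spec="${2:-$1}"   # Conda pins versions with "=", pip with "=="
    if conda install --yes "$conda_spec"; then
        echo "installed $conda_spec with conda"
    else
        echo "conda could not install $conda_spec; trying pip" >&2
        pip install "$pip_spec"
    fi
}

# Example call (commented out so that nothing is installed by accident):
# install_with_fallback psutil=5.3.0 psutil==5.3.0
```

The two arguments exist because the two tools spell version pins differently; for an unpinned package a single argument is enough.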

Adding package repositories (channels)

Sometimes the default repositories, which Conda calls channels, do not have the package you are looking for, but that does not necessarily mean it is unavailable. Say you are working with some Illumina sequence data and need the Burrows-Wheeler Aligner (bwa) in your pipeline, so you activate your bioinformatics environment and type conda install bwa, which prints the following:

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - bwa

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/linux-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/pro/linux-64
  - https://repo.anaconda.com/pkgs/pro/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

We can search other channels that may have the package we are interested in with the -c flag. For example, Bioconda is a large repository that hosts several thousand bioinformatics packages. We can search for our bwa package by specifying that channel.

$ conda search -c bioconda bwa

Which yields better results:

Loading channels: done
# Name                  Version         Build  Channel
bwa                     0.5.9               0  bioconda
bwa                     0.5.9               1  bioconda
bwa                     0.6.2               0  bioconda
bwa                     0.6.2               1  bioconda
bwa                     0.7.3a              0  bioconda
bwa                     0.7.3a              1  bioconda
bwa                     0.7.3a      ha92aebf_2  bioconda
bwa                     0.7.4       ha92aebf_0  bioconda
bwa                     0.7.8               0  bioconda
bwa                     0.7.8               1  bioconda
bwa                     0.7.8       ha92aebf_2  bioconda
bwa                     0.7.12              0  bioconda
bwa                     0.7.12              1  bioconda
bwa                     0.7.13              0  bioconda
bwa                     0.7.13              1  bioconda
bwa                     0.7.15              0  bioconda
bwa                     0.7.15              1  bioconda
bwa                     0.7.16      pl5.22.0_0  bioconda
bwa                     0.7.17      ha92aebf_3  bioconda
bwa                     0.7.17      pl5.22.0_0  bioconda
bwa                     0.7.17      pl5.22.0_1  bioconda
bwa                     0.7.17      pl5.22.0_2  bioconda

We can then install our bwa package using conda install -c bioconda bwa and continue with our analyses. You can permanently add channels by editing your .condarc file, either directly in a text editor or with Conda by using the config command:

$ conda config --append channels bioconda

This will permanently add the Bioconda channel to your configuration file, meaning Conda will automatically search Bioconda as well as the default channels when looking for packages.
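Channel order matters: --append places a channel at the end of the list (lowest priority), whereas --add places it at the beginning. The sketch below shows one way to read that order back from a .condarc-style file. It is illustrative only: list_channels is a hypothetical helper that handles just the simple layout shown, and the sample file is an assumption about what the append command typically leaves behind when no channels were configured before. On a system with Conda loaded, conda config --show channels reports the real list.

```shell
#!/usr/bin/env bash
# Sketch only: list the channels recorded in a .condarc-style file, in priority order.
# list_channels is a hypothetical helper; it handles only the simple block-list layout below.

list_channels() {
    # Print the entries under the top-level "channels:" key, one per line.
    awk '
        /^channels:/            { in_list = 1; next }
        in_list && /^[ \t]*-/   { sub(/^[ \t]*-[ \t]*/, ""); print; next }
        in_list && /^[^ \t#]/   { in_list = 0 }
    ' "$1"
}

# A sample file resembling what the append command typically produces (assumed content).
sample_rc="$(mktemp)"
cat > "$sample_rc" <<'EOF'
channels:
  - defaults
  - bioconda
EOF

channels="$(list_channels "$sample_rc")"
echo "$channels"
rm -f "$sample_rc"
```

Channels nearer the top of the list are searched with higher priority, so an appended channel is consulted only after the defaults.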

For more information on managing channels and installing with pip please refer to the Conda support documentation at this link.