This page is scheduled for deletion because it is either redundant with information available on the CC wiki, or the software is no longer supported.
JUPYTER
Description: Produce documents with the Jupyter Notebook App
SHARCNET Package information: see the JUPYTER software page in the web portal
Full list of SHARCNET supported software


Introduction

Provides the Jupyter Notebook App on a SHARCNET Fedora visualization workstation.

Version

Jupyter is in the default path, so no module needs to be loaded to use it. Only one version is currently installed.
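A quick way to confirm which Jupyter the shell finds, and its version (the output will vary by workstation):

which jupyter
jupyter --version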

Usage

Graham cluster

Jupyter Notebook is provided through a Python module on Graham. Instructions for installing it can be found at https://docs.computecanada.ca/wiki/Jupyter. You can run it on the login node or on the compute nodes, but note that any notebook running on the login node will be killed after some time. To run on a compute node you have to submit a job requesting the amount of CPU, GPU, memory and runtime you need. The instructions below run the notebook on a compute node, since that is the recommended way.

Load the module

Log onto graham.sharcnet.ca (or graham.computecanada.ca) and load the Python module:

ssh user@graham.sharcnet.ca
module load python35-scipy-stack
Install Python modules
pip3.5 install pycuda --user
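To check that the install worked, you can import the package from the same Python; this is a quick sanity check, and the printed version tuple is illustrative:

python3.5 -c "import pycuda; print(pycuda.VERSION)"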
Submit the job

Create a bash script for submitting a Jupyter job to the Slurm scheduler, e.g. slurm_jupyter.sh, and add:

#!/bin/bash
#SBATCH --gres=gpu:2 #only if you need GPU
#SBATCH --time=0-01:00 #runtime d-hh:mm
#SBATCH --nodes 1 #how many nodes
#SBATCH --ntasks-per-node 32 #number of cores per node
#SBATCH --mem-per-cpu 4000 # memory in MB
#SBATCH --job-name tunnel #name of the job
#SBATCH --output jupyter-log-%J.txt #name of the log file
#SBATCH --mail-type=BEGIN #send email if job has started
#SBATCH --mail-user=<email_address> #send the email to this email_address

## load modules that you might need, in this case cuda for pycuda

module load cuda 

## get tunneling info
XDG_RUNTIME_DIR=""
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i | xargs)

## print tunneling instructions to jupyter-log-{jobid}.txt
echo -e "
        Copy/Paste this in your local terminal to ssh tunnel with remote
        -----------------------------------------------------------------
        sshuttle -r $USER@graham.sharcnet.ca -v $ipnip/24
        -----------------------------------------------------------------

        Then open a browser on your local machine to the following address
        ------------------------------------------------------------------
        http://$ipnip:$ipnport (prefix w/ https:// if using password)
        ------------------------------------------------------------------
        "
## start an ipcluster instance and launch jupyter server
jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
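Submit the script and watch the log file for the tunneling instructions; the <jobid> placeholder below is filled in by Slurm via the %J in the script:

sbatch slurm_jupyter.sh
squeue -u $USER
cat jupyter-log-<jobid>.txt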
Creating the tunnel

Open a new terminal window on your local machine, and run the sshuttle command from the log file to tunnel traffic to the Jupyter server, e.g.

 sshuttle -r jnandez@graham.sharcnet.ca -v 10.29.76.66/24
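sshuttle must be installed on your local machine first; one way to get it, assuming pip is available there, is:

pip install --user sshuttle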
Opening in a browser

Open your local browser and go to the address printed in the log file, e.g.

http://10.29.76.66:8850/

Interactive

Not applicable.

Graphical

Log into vdi-fedora23.user.sharcnet.ca with vncviewer and simply run the command:

jupyter notebook
  • More information on using vncviewer may be found here.

Notes

Getting Some Help

jupyter notebook --help

Create Jupyter Notebooks

Follow the sections below to make additional environments available to Jupyter Notebook. The conda approach is used for setting up Python 2, Python 3 and R virtual environments, as described in https://ipython.readthedocs.io/en/latest/install/kernel_install.html ...

Preparation

If you plan to install the optional pybinding package in step 3), then, due to a bug described in https://bugs.python.org/issue18378 and further in https://stackoverflow.com/questions/15526996/ipython-notebook-locale-error, add the following lines to your ~/.bashrc file, then log out and log in again before attempting to create the notebook on fedora23 (which uses the CA locale by default). Running the locale command should then show the US locale active.

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
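After logging back in, you can verify the change; the output below is what you should see once the US locale is active:

$ locale | grep LANG
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8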

Failure to change the setting will result in a fatal error message during the installation:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 394: ordinal not in range(128)

If further complications develop, you could try carefully cleaning up the config files and then reinstalling with:

rm -rf ~/.conda
rm -rf ~/.cache/pip

Python 2 Environment

Step 1)

[roberpj@vdi-fedora23:~] python -V
Python 2.7.11

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py2 python=2 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py2

(ipykernel_py2) [roberpj@vdi-fedora23:~] python -V
Python 2.7.13 :: Continuum Analytics, Inc.

Step 2)

(ipykernel_py2) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python2 in /home/roberpj/.local/share/jupyter/kernels/python2
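You can confirm that the kernel registered with jupyter kernelspec list; the listing below is illustrative:

(ipykernel_py2) [roberpj@vdi-fedora23:~] jupyter kernelspec list
Available kernels:
  python2    /home/roberpj/.local/share/jupyter/kernels/python2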

Step 3) [Optional]

 (ipykernel_py2) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn pybinding

Step 4) Deactivate the environment

 (ipykernel_py2) [jnandez@vdi-fedora23 ~]$ source deactivate

Python 2 should now appear under the "New" pulldown (on the right-hand side) when we run the app:

[roberpj@vdi-fedora23:~] jupyter notebook

To uninstall, do something like ...

[roberpj@vdi-fedora23:~] conda uninstall -n ipykernel_py2 ipykernel
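The registered kernelspec can be removed as well; recent jupyter versions provide jupyter kernelspec remove (older ones call it uninstall), or you can simply delete the kernel directory shown earlier:

[roberpj@vdi-fedora23:~] jupyter kernelspec remove python2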

Python 3 Environment

Step 1)

[roberpj@vdi-fedora23:~] python -V
Python 2.7.11

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py3 python=3 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py3

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -V
Python 3.6.0 :: Continuum Analytics, Inc.

Step 2)

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python3 in /home/roberpj/.local/share/jupyter/kernels/python3
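To see which interpreter the registered kernel will launch, you can inspect its kernel spec; the path below matches the install message above, and the argv entry in the file should point at the conda environment's Python:

cat ~/.local/share/jupyter/kernels/python3/kernel.json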

Step 3) [Optional]

(ipykernel_py3) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn numpy pybinding

Python 3 should now appear under the "New" pulldown when we run the app in the Notebooks section:

[roberpj@vdi-fedora23:~] jupyter notebook

R Environment

Step 1)

[roberpj@vdi-fedora23:~] R --version
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n r anaconda
[roberpj@vdi-fedora23:~] source activate r
(r) [roberpj@vdi-fedora23:~] conda install -c r r
(r) [roberpj@vdi-fedora23:~] conda install -c r r-essentials
(r) [roberpj@vdi-fedora23:~] conda install -c r r-irkernel

(r) [roberpj@vdi-fedora23:~] R --version
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"

Step 2)

[roberpj@vdi-fedora23:~] R
> IRkernel::installspec()
[InstallKernelSpec] Installed kernelspec ir in /home/roberpj/.local/share/jupyter/kernels/ir
> q()
Save workspace image? [y/n/c]: n
(r) [jnandez@vdi-fedora23 ~]$ source deactivate

R should appear under the "New" pulldown in the Notebooks section when jupyter is started:

[roberpj@vdi-fedora23:~] jupyter notebook

Apache Spark Environment

Apache Spark provides R, Python, Scala and SQL interactive shells, which can be accessed through Jupyter. You will need to install Apache Spark in your home directory as shown below; you can do this on another cluster, or you can use vdi-fedora23. We are currently supporting only Python 2, since Python 3 is still not well tested by the Spark community.

$ cd
$ wget https://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz
$ tar -xvzf spark-2.1.1-bin-hadoop2.7.tgz
$ mv spark-2.1.1-bin-hadoop2.7 spark211
$ export SPARK_HOME=/home/$USER/spark211
$ export PATH=$SPARK_HOME/bin/:$PATH
$ export SPARK_LOCAL_IP=127.0.0.1
$ pyspark

Note that you will always have to "export SPARK_LOCAL_IP=127.0.0.1" when you start a new session on vdi-fedora23.
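To avoid retyping the exports, you can append them to your ~/.bashrc; this is a convenience, not a requirement:

echo 'export SPARK_HOME=/home/$USER/spark211' >> ~/.bashrc
echo 'export PATH=$SPARK_HOME/bin/:$PATH' >> ~/.bashrc
echo 'export SPARK_LOCAL_IP=127.0.0.1' >> ~/.bashrc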

PySpark using Python 2

Once you have Python 2.7.13 installed in your home directory, you can proceed to use Apache Spark in Jupyter. Open a terminal window and check that pyspark runs fine.

Next, we need to tell Spark that we are going to use Jupyter as the driver:

$ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark

[W 11:09:52.465 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)

Once pyspark is running, Jupyter will automatically open in your web browser. Create a new notebook via New -> Python 2; you can then test whether Spark is running in the notebook as follows:

In [1]: print(spark.version) [SHIFT+ENTER]
Out[1]: 2.1.0

This means that Apache Spark is successfully running on your Jupyter notebook.

If the given command does not start a Spark Python notebook, try these commands instead and repeat the notebook test:

$ export PYSPARK_DRIVER_PYTHON=/usr/bin/jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=True --NotebookApp.ip=127.0.0.1 --NotebookApp.port=8888"
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
$ pyspark

PySpark with Apache Toree

Apache Toree is designed to let interactive and remote applications work with Apache Spark. Apache Toree mainly targets Scala, but you can also run PySpark through it. The downside is that matplotlib magics do not currently work, so you will need "plt.show()" in order to see your plots. We suggest you use PySpark with Python 2 as described above. If you do need PySpark with Apache Toree, do the following:

Step 1)

$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n toree anaconda
$ source activate toree

Step 2)

$ pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
$ jupyter toree install --interpreters=PySpark --spark_home=$SPARK_HOME\
   --user --python_exec=$PYSPARK_PYTHON

Now you should have an option "Apache Toree - PySpark".
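You can verify that the Toree kernel registered; the exact kernel name and path may differ from what is shown here:

$ jupyter kernelspec list | grep -i toree
  apache_toree_pyspark    /home/roberpj/.local/share/jupyter/kernels/apache_toree_pyspark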

Scala with Apache Toree

Apache Toree is designed to let interactive and remote applications work with Apache Spark. Apache Toree mainly targets Scala.

Step 1)

export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n toree anaconda
source activate toree

Step 2)

pip install toree
jupyter toree install --interpreters=Scala --spark_home=$SPARK_HOME --user

Now you should have an option "Apache Toree - Scala".

SparkR

You will need to install R first (see the R Environment section above). Once you have R, you do not need to do anything further on the command line; just start a new Jupyter session:

jupyter notebook

Create a new R notebook as you normally would. In the notebook, load SparkR and start the Spark session:

In [1]: library(SparkR,lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
In [2]: sparkR.session(master="local[*]",sparkConfig = list(spark.driver.memory = "4g"))

The first line loads the SparkR library from the installed Apache Spark home path. The second line creates the Spark session, using all local cores and 4 GB of driver memory.

Octave Environment

Step 1) Get the environments:

export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n octave anaconda
source activate octave

Step 2) Install Octave:

pip install octave_kernel
python -m octave_kernel.install

Haskell Environment

It is good practice to create an Anaconda environment:

$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n ihaskell anaconda

After you have created the environment, you need to activate it:

$ source activate ihaskell
$ cd

We need to clone the IHaskell repository from github.com:

$ git clone https://github.com/gibiansky/IHaskell.git
$ cd IHaskell

Then we install it:

$ ./build.sh ihaskell
$ ihaskell install
$ source deactivate

Now you can start jupyter notebook.

Installing optional packages

Deactivate Environment

As an example, we can deactivate the r environment by doing:

(r) [roberpj@vdi-fedora23:~] source deactivate
[roberpj@vdi-fedora23:~] 

Package Removal

Ideally, use conda, python and R to cleanly uninstall all packages if you want to remove everything and start over. Otherwise, remove the installation directories directly, with great care, by doing something like the following:

rm -rf ~/.jupyter/ ~/.local/share/jupyter /work/$USER/python/envs $XDG_RUNTIME_DIR/jupyter*
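Before removing anything, it can help to list the conda environments that exist, so you know what the cleanup will delete:

conda env list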

References

  • Homepage: http://jupyter.org/
  • Release: http://jupyter.readthedocs.io/en/latest/releases/content-releases.html
  • Forum: http://jupyter.org/community.html
  • Spark-Cloudera: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html