JUPYTER
Description: Produce documents with Jupyter Notebook App
SHARCNET Package information: see JUPYTER software page in web portal
Full list of SHARCNET supported software


Introduction

This page describes how to run the Jupyter Notebook App on the SHARCNET Fedora visualization workstation (vdi-fedora23) and on the Graham cluster.

Version

Jupyter is in the default path; no module needs to be loaded to use it. Only one version is currently installed.

Usage

Graham cluster

Jupyter Notebook is provided by a single Python module on Graham. It can be run on the login nodes and on the compute nodes, but any notebook running on a login node will be killed after some time, so using the compute nodes is recommended. To do so, you submit a job requesting the CPUs, GPUs, memory, and runtime you need. The instructions below describe running on a compute node, since that is the recommended approach.

Load the module

Log onto graham.sharcnet.ca (or graham.computecanada.ca) and load the python module

ssh user@graham.sharcnet.ca
module load python35-scipy-stack
Install Python modules
pip3.5 install pycuda --user
Submit the job
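A minimal sketch of one way to do this with Slurm (Graham's scheduler) is shown below; the account name, resource values, port, and compute-node hostname are placeholders you will need to adjust for your own case.

# on the login node: request a compute-node allocation (adjust account, time, memory)
salloc --time=2:00:00 --cpus-per-task=1 --mem=4G --account=def-someuser

# on the allocated compute node: start the notebook without opening a browser
module load python35-scipy-stack
jupyter notebook --no-browser --ip=$(hostname -f) --port=8888

# on your own computer: tunnel through the login node, then browse to localhost:8888
ssh -L 8888:<compute-node-hostname>:8888 user@graham.sharcnet.ca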

Interactive

Not applicable.

Graphical

Log into vdi-fedora23.user.sharcnet.ca with vncviewer and simply run the command:

jupyter notebook
  • More information on using vncviewer may be found here.

Notes

Getting Some Help

jupyter notebook --help
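To see which kernels are currently registered with Jupyter (useful when working through the environment sections below), list the installed kernelspecs:

jupyter kernelspec list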

Create Jupyter Notebooks

Follow the sections below to make additional environments available to Jupyter Notebook. The conda approach is used to set up Python 2, Python 3, and R virtual environments, as described in https://ipython.readthedocs.io/en/latest/install/kernel_install.html

Python 2 Environment

Step 1)

[roberpj@vdi-fedora23:~] python -V
Python 2.7.11

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py2 python=2 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py2

(ipykernel_py2) [roberpj@vdi-fedora23:~] python -V
Python 2.7.13 :: Continuum Analytics, Inc.

Step 2)

(ipykernel_py2) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python2 in /home/roberpj/.local/share/jupyter/kernels/python2

Step 3) [Optional for installing new libraries (such as Pandas, MatplotLib, SciPy, etc.)]

 (ipykernel_py2) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn

Step 4) Deactivate the environment

 (ipykernel_py2) [jnandez@vdi-fedora23 ~]$ source deactivate

Python 2 should now appear under the "New" pulldown (on the right-hand side) when we run the app:

[roberpj@vdi-fedora23:~] jupyter notebook

To uninstall, do something like:

[roberpj@vdi-fedora23:~] conda uninstall -n ipykernel_py2 ipykernel
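To confirm which kernels remain registered, and to remove the whole conda environment as well, something like the following should work:

[roberpj@vdi-fedora23:~] jupyter kernelspec list
[roberpj@vdi-fedora23:~] conda remove -n ipykernel_py2 --all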

Python 3 Environment

Step 1)

[roberpj@vdi-fedora23:~] python -V
Python 2.7.11

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py3 python=3 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py3

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -V
Python 3.6.0 :: Continuum Analytics, Inc.

Step 2)

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python3 in /home/roberpj/.local/share/jupyter/kernels/python3

Step 3) [Optional for installing new libraries (such as Pandas, MatplotLib, SciPy, etc.)]

(ipykernel_py3) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn numpy
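As an optional quick check, you can verify that the libraries import from within the activated environment:

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -c "import numpy, scipy, pandas, matplotlib, sklearn; print('all imports OK')"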

Python 3 should now appear under the "New" pulldown when we run the app in the Notebooks section:

[roberpj@vdi-fedora23:~] jupyter notebook

R Environment

Step 1)

[roberpj@vdi-fedora23:~] R --version
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n r anaconda
[roberpj@vdi-fedora23:~] source activate r
(r) [roberpj@vdi-fedora23:~] conda install -c r r
(r) [roberpj@vdi-fedora23:~] conda install -c r r-essentials
(r) [roberpj@vdi-fedora23:~] conda install -c r r-irkernel

(r) [roberpj@vdi-fedora23:~] R --version
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"

Step 2)

[roberpj@vdi-fedora23:~] R
> IRkernel::installspec()
[InstallKernelSpec] Installed kernelspec ir in /home/roberpj/.local/share/jupyter/kernels/ir
> q()
Save workspace image? [y/n/c]: n
(r) [jnandez@vdi-fedora23 ~]$ source deactivate

R should appear under the "New" pulldown in the Notebooks section when jupyter is started:

[roberpj@vdi-fedora23:~] jupyter notebook

Apache Spark Environment

Apache Spark provides interactive shells for R, Python, Scala, and SQL, and these shells can be accessed through Jupyter. You will need to install Apache Spark in your home directory; you can do this from another cluster or directly on vdi-fedora23, as shown below. We are currently supporting only Python 2, since Python 3 is still not well tested by the Spark community.

$ cd
$ wget https://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz
$ tar -xvzf spark-2.1.1-bin-hadoop2.7.tgz
$ mv spark-2.1.1-bin-hadoop2.7 spark211
$ export SPARK_HOME=/home/$USER/spark211
$ export PATH=$SPARK_HOME/bin/:$PATH
$ export SPARK_LOCAL_IP=127.0.0.1
$ pyspark

Note that you will always have to "export SPARK_LOCAL_IP=127.0.0.1" when you start a new session on vdi-fedora23.
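To avoid retyping the exports each time, one option is to add them to your ~/.bashrc; the paths below assume the spark211 directory created above:

export SPARK_HOME=/home/$USER/spark211
export PATH=$SPARK_HOME/bin:$PATH
export SPARK_LOCAL_IP=127.0.0.1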

PySpark using Python 2

Once you have Python 2.7.13 installed in your home directory (see the Python 2 Environment section above), you can use Apache Spark from Jupyter. Open a terminal window and set the Spark environment variables shown above (SPARK_HOME, PATH, and SPARK_LOCAL_IP); pyspark should then run correctly.

Next, tell Spark to use Jupyter as the driver:

$ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark

[W 11:09:52.465 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)

Once pyspark is running this way, Jupyter will open automatically in your web browser. Create a new notebook with New -> Python 2, then test whether Spark is available in the notebook as follows:

In [1]: print(spark.version) [SHIFT+ENTER]
Out[1]: 2.1.0

This means that Apache Spark is successfully running on your Jupyter notebook.
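If you want to check the Spark installation itself outside of Jupyter, the bundled command-line tools report the version directly:

$ spark-submit --version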

If the command above does not start a Spark-enabled Python notebook, try the following commands instead and repeat the notebook test:

$ export PYSPARK_DRIVER_PYTHON=/usr/bin/jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=True --NotebookApp.ip=127.0.0.1 --NotebookApp.port=8888"
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
$ pyspark

PySpark with Apache Toree

Apache Toree is designed to let interactive and remote applications work with Apache Spark. Toree is mainly aimed at Scala, but PySpark can also be run through it. The downside is that the matplotlib magic does not currently work, so you will need "plt.show()" in order to see your plots; for that reason we suggest the "PySpark using Python 2" approach above. If you nevertheless need PySpark with Apache Toree, do the following:

Step 1)

$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n toree anaconda
$ source activate toree

Step 2)

$ pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
$ jupyter toree install --interpreters=PySpark --spark_home=$SPARK_HOME\
   --user --python_exec=$PYSPARK_PYTHON

Now you should have an option "Apache Toree - PySpark".

Scala with Apache Toree

Apache Toree is designed to let interactive and remote applications work with Apache Spark; it mainly works with Scala.

Step 1)

export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n toree anaconda
source activate toree

Step 2)

pip install toree
jupyter toree install --interpreters=Scala --spark_home=$SPARK_HOME --user

Now you should have an option "Apache Toree - Scala".

SparkR

You will need to set up the R environment first (see the R Environment section above). Once R is available, no further command-line setup is needed; simply start a new Jupyter session:

jupyter notebook

Create a new R notebook as you normally would. In the notebook, load SparkR and start a Spark session:

In [1]: library(SparkR,lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
In [2]: sparkR.session(master="local[*]",sparkConfig = list(spark.driver.memory = "4g"))

The first line loads the SparkR library from the lib directory of your Apache Spark installation, located via the SPARK_HOME environment variable. The second line starts a local Spark session using all available cores and 4 GB of driver memory.
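Since the first line locates SparkR via SPARK_HOME, make sure it is exported in the shell from which you launch Jupyter; the path below assumes the spark211 installation created above:

$ export SPARK_HOME=/home/$USER/spark211
$ jupyter notebook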

Octave Environment

Step 1) Get the environments:

export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n octave anaconda
source activate octave

Step 2) Install Octave:

pip install octave_kernel
python -m octave_kernel.install
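To check that the kernel was registered, and that Octave itself is available on the workstation, something like this should suffice:

jupyter kernelspec list
octave --version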

Haskell Environment

It is a good idea to create an Anaconda environment first:

$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n ihaskell anaconda

After you have created the environment, you need to activate it:

$ source activate ihaskell
$ cd

We need to clone the iHaskell repository from github.com,

$ git clone https://github.com/gibiansky/IHaskell.git
$ cd IHaskell

Then we install it

$ ./build.sh ihaskell
$ ihaskell install
$ source deactivate

Now you can start jupyter notebook.


Deactivate Environment

As an example we can deactivate the r environment by doing:

(r) [roberpj@vdi-fedora23:~] source deactivate
[roberpj@vdi-fedora23:~] 

Package Removal

Ideally, use conda, python, and R to cleanly uninstall all packages if you want to remove everything and start over. Otherwise, remove the installation directories directly, with great care, by doing something like the following:

rm -rf ~/.jupyter/ ~/.local/share/jupyter /work/$USER/python/envs $XDG_RUNTIME_DIR/jupyter*

References

  • Homepage: http://jupyter.org/
  • Release: http://jupyter.readthedocs.io/en/latest/releases/content-releases.html
  • Forum: http://jupyter.org/community.html
  • Spark-Cloudera: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html