
JUPYTER
Description: Produce documents with Jupyter Notebook App
SHARCNET Package information: see JUPYTER software page in web portal
Full list of SHARCNET supported software


Introduction

Provides the Jupyter Notebook App on a SHARCNET Fedora visualization workstation.

Version

Jupyter is in the default path; no module needs to be loaded to use it. Only one version is currently installed.

Usage

Cluster

Jupyter is not installed on the clusters.

Interactive

Not applicable.

Graphical

Run command line:

jupyter notebook

This package is best run with noVNC or vncviewer; see https://www.sharcnet.ca/help/index.php/Remote_Graphical_Connections.

Notes

Getting Some Help

jupyter notebook --help
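
As an optional aside, a couple of other standard Jupyter commands can be useful for checking the installation:

jupyter --version    # print the installed Jupyter version
jupyter --help       # list the available Jupyter subcommands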

Create Jupyter Notebooks

Follow the sections below to make additional environments available to Jupyter Notebook. The conda approach is used to set up Python 2, Python 3 and R virtual environments, as described in https://ipython.readthedocs.io/en/latest/install/kernel_install.html ...

Python 2 Environment

Step 1)

[roberpj@vdi-fedora23:~] python -V
Python 2.7.11

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py2 python=2 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py2

(ipykernel_py2) [roberpj@vdi-fedora23:~] python -V
Python 2.7.12 :: Continuum Analytics, Inc.

Step 2)

(ipykernel_py2) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python2 in /home/roberpj/.local/share/jupyter/kernels/python2

Step 3) [Optional for installing new libraries (such as Pandas, MatplotLib, SciPy, etc.)]

 (ipykernel_py2) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn

Python 2 should now appear under the "New" pulldown (on the right-hand side) when we run the app:

[roberpj@vdi-fedora23:~] jupyter notebook
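
As an optional check, the new kernel should also show up when listing the registered kernelspecs from the command line:

[roberpj@vdi-fedora23:~] jupyter kernelspec list
# expect an entry named python2 pointing at ~/.local/share/jupyter/kernels/python2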

To uninstall, do something like ...

[roberpj@vdi-fedora23:~] conda uninstall -n ipykernel_py2 ipykernel
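
If you want to remove the whole conda environment as well, rather than just the ipykernel package, something like the following should work (ipykernel_py2 is the environment created in Step 1, and the kernelspec path is the one reported in Step 2):

[roberpj@vdi-fedora23:~] conda env remove -n ipykernel_py2
[roberpj@vdi-fedora23:~] rm -rf ~/.local/share/jupyter/kernels/python2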

Python 3 Environment

Step 1)

[roberpj@vdi-fedora23:~] python -V
Python 2.7.11

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py3 python=3 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py3

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -V
Python 3.5.2 :: Continuum Analytics, Inc.

Step 2)

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python3 in /home/roberpj/.local/share/jupyter/kernels/python3

Step 3) [Optional for installing new libraries (such as Pandas, MatplotLib, SciPy, etc.)]

(ipykernel_py3) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn

Python 3 should now appear under the "New" pulldown in the Notebooks section when we run the app:

[roberpj@vdi-fedora23:~] jupyter notebook
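
If you want to confirm which interpreter the new kernel uses, you can inspect its kernel.json file (the path is the one reported by the install command in Step 2):

[roberpj@vdi-fedora23:~] cat ~/.local/share/jupyter/kernels/python3/kernel.json
# the "argv" field should point at the python binary inside the ipykernel_py3 environment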

R Environment

Step 1)

[roberpj@vdi-fedora23:~] R --version
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n r anaconda
[roberpj@vdi-fedora23:~] source activate r
(r) [roberpj@vdi-fedora23:~] conda install -c r r
(r) [roberpj@vdi-fedora23:~] conda install -c r r-essentials
(r) [roberpj@vdi-fedora23:~] conda install -c r r-irkernel

(r) [roberpj@vdi-fedora23:~] R --version
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

Step 2)

[roberpj@vdi-fedora23:~] R
> IRkernel::installspec()
[InstallKernelSpec] Installed kernelspec ir in /home/roberpj/.local/share/jupyter/kernels/ir

R should appear under the "New" pulldown in the Notebooks section when jupyter is started:

[roberpj@vdi-fedora23:~] jupyter notebook

As an aside, note that we have so far created these three environment directories:

[roberpj@vdi-fedora23:/work/roberpj/python/envs] ls
ipykernel_py2  ipykernel_py3  r
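
The same information is available from conda itself:

[roberpj@vdi-fedora23:~] conda env list
# should list ipykernel_py2, ipykernel_py3 and r, plus the base miniconda installation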

Apache Spark Environment

Apache Spark contains R, Python, Scala and SQL interactive shells, which can be accessed through Jupyter. You will need to install Apache Spark in your home directory as shown below; this can be done from another cluster, or you can use vdi-fedora23 to do so.

$ cd
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz
$ tar -xvzf spark-1.6.2-bin-hadoop2.6.tgz
$ mv spark-1.6.2-bin-hadoop2.6 spark162
$ export PATH=/home/$USER/spark162/bin/:$PATH
$ pyspark

If for some reason the last command gives you errors (e.g. "sparkDriver could not bind on port 0"), type this

$ export SPARK_LOCAL_IP=127.0.0.1
$ pyspark

Note that you will always have to run "export SPARK_LOCAL_IP=127.0.0.1" when you start a new session on vdi-fedora23.
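
To avoid retyping these exports in every new session, you could append them to your ~/.bashrc (optional; adjust the path if you unpacked Spark elsewhere):

$ echo 'export PATH=/home/$USER/spark162/bin/:$PATH' >> ~/.bashrc
$ echo 'export SPARK_LOCAL_IP=127.0.0.1' >> ~/.bashrc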

PySpark

Once you have Python 2.7.12 installed in your home (see the Python 2 Environment section above), you can proceed to use Apache Spark from Jupyter. Open a terminal window and do the following.

At this point pyspark should run fine. Next, we need to tell Spark that we are going to use Jupyter as the driver:

$ export PYSPARK_DRIVER_PYTHON=/usr/bin/jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip=127.0.0.1 --NotebookApp.port=8880"
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python

Now, we can initiate pyspark

$ pyspark
[W 11:09:52.465 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)

Once pyspark is running, open a Firefox tab and go to http://localhost:8880/. Then create a new notebook (New -> Python 2) and test whether Spark is running in the Jupyter notebook as follows:

In [1]: print(sc.version) [SHIFT+ENTER]
Out[1]: 1.6.2

This means that Apache Spark is successfully running in your Jupyter notebook.
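
Since these environment variables must be set in every new terminal, one convenient option is to collect them in a small launcher script. The script name pyspark-jupyter.sh below is just an example, not something provided by SHARCNET; the paths match the ones used earlier on this page:

$ cat > ~/pyspark-jupyter.sh << 'EOF'
#!/bin/bash
# start pyspark with Jupyter Notebook as the driver
export PATH=/home/$USER/spark162/bin/:$PATH
export SPARK_LOCAL_IP=127.0.0.1
export PYSPARK_DRIVER_PYTHON=/usr/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip=127.0.0.1 --NotebookApp.port=8880"
export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
pyspark
EOF
$ chmod +x ~/pyspark-jupyter.sh
$ ~/pyspark-jupyter.sh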

Scala with Apache Toree

Apache Toree is designed to enable interactive and remote applications to work with Apache Spark. It mainly works with Scala.

Step 1)

export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n toree anaconda
source activate toree

Step 2)

pip install toree
jupyter toree install --interpreters=Scala --spark_home=/home/$USER/spark162 --user
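
After the install, the Toree Scala kernel should be registered alongside the other kernels; you can check with the command below (the exact kernel name may differ, e.g. apache_toree_scala):

jupyter kernelspec list    # look for a Toree/Scala entry

Start jupyter notebook as usual and the Scala kernel should appear in the "New" pulldown.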

SparkR

You will need to install R first (see the R Environment section above). Once you have R, nothing further needs to be done on the command line; simply start a new Jupyter session:

jupyter notebook

Create a new R notebook as you normally would. In the notebook you will have to load SparkR and create the Spark context:

In [1]: Sys.setenv(SPARK_HOME="/home/<FILL IN>/spark162/")
In [2]: .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
In [3]: library(SparkR)
In [4]: sc <- sparkR.init(master="local")
In [5]: sqlContext <- sparkRSQL.init(sc)

The first line sets the Apache Spark home path; note that you will have to replace "<FILL IN>" with your SHARCNET username. The second line makes the SparkR libraries available to R. The third line loads the library from the installed Apache Spark home path. The fourth line creates the Spark context and initiates the Spark kernel. The fifth line creates sqlContext from the Spark context.

Octave Environment

Step 1) Get the environments:

export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n octave anaconda
source activate octave

Step 2) Install Octave:

pip install octave_kernel
python -m octave_kernel.install
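
The octave_kernel package relies on an Octave executable being available on your PATH; you can verify this before starting the notebook:

octave --version    # should print the installed GNU Octave version
jupyter notebook    # Octave should now appear under the "New" pulldown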

Haskell Environment

Coming soon ...

Installing optional packages

Optional packages (for example pandas, matplotlib or scipy) can be installed into an activated environment with pip or conda, as shown in Step 3 of the Python environment sections above.

Deactivate Environment

As an example we can deactivate the r environment by doing:

(r) [roberpj@vdi-fedora23:~] source deactivate
[roberpj@vdi-fedora23:~] 

Package Removal

Ideally, use conda, python and r to cleanly uninstall all packages if you want to remove everything and start over. Otherwise, remove the installation directories directly, with great care, by doing something like the following:

rm -rf ~/.jupyter/ ~/.local/share/jupyter /work/$USER/python/envs $XDG_RUNTIME_DIR/jupyter*
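
A less drastic alternative is to remove things one environment at a time (the environment and kernel names below are the ones created earlier on this page):

conda env remove -n ipykernel_py2
conda env remove -n ipykernel_py3
conda env remove -n r
rm -rf ~/.local/share/jupyter/kernels/python2 ~/.local/share/jupyter/kernels/python3 ~/.local/share/jupyter/kernels/ir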

References

o Homepage: http://jupyter.org/
o Release: http://jupyter.readthedocs.io/en/latest/releases/content-releases.html
o Forum: http://jupyter.org/community.html
o Spark-Cloudera: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html