
JUPYTER
Description: Produce documents with Jupyter Notebook App
SHARCNET Package information: see JUPYTER software page in web portal
Full list of SHARCNET supported software


Introduction

Provides the Jupyter Notebook App on a SHARCNET Fedora visualization workstation.

Version

Jupyter is in the default path; no module needs to be loaded to use it. Only one version is currently installed.

Usage

Cluster

Jupyter is not installed on the clusters.

Interactive

Not applicable.

Graphical

Log into vdi-fedora23.user.sharcnet.ca with vncviewer and simply run the command:

jupyter notebook
  • More information on using vncviewer may be found here.

Notes

Getting Some Help

jupyter notebook --help

Create Jupyter Notebooks

Follow the sections below to make additional environments available to Jupyter Notebook. The conda approach is used to set up Python 2, Python 3 and R virtual environments, as described in https://ipython.readthedocs.io/en/latest/install/kernel_install.html.

Python 2 Environment

Step 1)

[roberpj@vdi-fedora23:~] python -V
Python 2.7.11

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py2 python=2 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py2

(ipykernel_py2) [roberpj@vdi-fedora23:~] python -V
Python 2.7.12 :: Continuum Analytics, Inc.

Step 2)

(ipykernel_py2) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python2 in /home/roberpj/.local/share/jupyter/kernels/python2

Step 3) [Optional: install additional libraries such as Matplotlib, pandas, SciPy, scikit-learn, etc.]

(ipykernel_py2) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn

Python 2 should now appear under the "New" pulldown (on the right-hand side) when we run the app:

[roberpj@vdi-fedora23:~] jupyter notebook
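
To check that the notebook is really using the new conda environment, open a Python 2 notebook and run something like the following cell (a minimal sketch; the exact version string and path depend on where conda created the environment):

In [1]: import sys
In [2]: print(sys.version)       # should report the conda Python 2.7.x, not the system Python
In [3]: print(sys.executable)    # should point inside the ipykernel_py2 environment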

To uninstall, do something like:

[roberpj@vdi-fedora23:~] conda uninstall -n ipykernel_py2 ipykernel

Python 3 Environment

Step 1)

[roberpj@vdi-fedora23:~] python -V
Python 2.7.11

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py3 python=3 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py3

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -V
Python 3.6.0 :: Continuum Analytics, Inc.

Step 2)

(ipykernel_py3) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python3 in /home/roberpj/.local/share/jupyter/kernels/python3

Step 3) [Optional: install additional libraries such as Matplotlib, pandas, SciPy, scikit-learn, etc.]

(ipykernel_py3) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn

Python 3 should now appear under the "New" pulldown when we run the app in the Notebooks section:

[roberpj@vdi-fedora23:~] jupyter notebook
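
If you installed the optional libraries in Step 3, a quick way to confirm they are visible from the new kernel is to import them in a notebook cell (a minimal sketch; it only checks that the packages import and reports their versions):

In [1]: import matplotlib, pandas, scipy, sklearn
In [2]: print(matplotlib.__version__, pandas.__version__, scipy.__version__, sklearn.__version__)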

R Environment

Step 1)

[roberpj@vdi-fedora23:~] R --version
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"

[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n r anaconda
[roberpj@vdi-fedora23:~] source activate r
(r) [roberpj@vdi-fedora23:~] conda install -c r r
(r) [roberpj@vdi-fedora23:~] conda install -c r r-essentials
(r) [roberpj@vdi-fedora23:~] conda install -c r r-irkernel

(r) [roberpj@vdi-fedora23:~] R --version
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

Step 2)

[roberpj@vdi-fedora23:~] R
> IRkernel::installspec()
[InstallKernelSpec] Installed kernelspec ir in /home/roberpj/.local/share/jupyter/kernels/ir

R should appear under the "New" pulldown in the Notebooks section when jupyter is started:

[roberpj@vdi-fedora23:~] jupyter notebook

As an aside, notice that we have so far created these three environment directories:

[roberpj@vdi-fedora23:/work/roberpj/python/envs] ls
ipykernel_py2  ipykernel_py3  r

Apache Spark Environment

Apache Spark provides interactive shells for R, Python, Scala and SQL, and these shells can be accessed through Jupyter. You will need to install Apache Spark in your home directory as shown below; this can be done on another cluster or directly on vdi-fedora23.

$ cd
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
$ tar -xvzf spark-2.1.0-bin-hadoop2.7.tgz
$ mv spark-2.1.0-bin-hadoop2.7 spark210
$ export PATH=/home/$USER/spark210/bin/:$PATH
$ pyspark

If for some reason the last command gives you errors (e.g. "sparkDriver could not bind on port 0"), type this

$ export SPARK_LOCAL_IP=127.0.0.1
$ pyspark

Note that you will always have to "export SPARK_LOCAL_IP=127.0.0.1" when you start a new session in vdi-fedora23.

PySpark using Python 3

Once you have Python 3 installed in your home directory (see the Python 3 Environment section above), you can use Apache Spark from Jupyter. Open a terminal window and make sure pyspark runs as described above. Next, we need to tell Spark to use Jupyter as the driver:

$ export PYSPARK_DRIVER_PYTHON=/usr/bin/jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=True --NotebookApp.ip=127.0.0.1 --NotebookApp.port=8888"
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py3/bin/python

Now we can start pyspark:

$ pyspark
[W 11:09:52.465 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)

Once pyspark is running, Jupyter will open automatically in your web browser. Create a new notebook (New -> Python 3), then test whether Spark is running in the Jupyter notebook as follows:

In [1]: print(spark.version) [SHIFT+ENTER]
Out[1]: 2.1.0

This means that Apache Spark is successfully running in your Jupyter notebook.
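
As a further smoke test, you can push a small computation through Spark from the same notebook (a minimal sketch; spark and its SparkContext are created automatically by pyspark at startup):

In [2]: sc = spark.sparkContext
In [3]: rdd = sc.parallelize(list(range(100)))
In [4]: rdd.map(lambda x: x * x).sum()
Out[4]: 328350

If this returns a number rather than an error, jobs are being executed by the local Spark instance.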


PySpark using Python 2

Once you have Python 2 installed in your home directory (see the Python 2 Environment section above), you can use Apache Spark from Jupyter. Open a terminal window and make sure pyspark runs as described above. Next, we need to tell Spark to use Jupyter as the driver:

$ export PYSPARK_DRIVER_PYTHON=/usr/bin/jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=True --NotebookApp.ip=127.0.0.1 --NotebookApp.port=8888"
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python

Now we can start pyspark:

$ pyspark
[W 11:09:52.465 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)

Once pyspark is running, Jupyter will open automatically in your web browser. Create a new notebook (New -> Python 2), then test whether Spark is running in the Jupyter notebook as follows:

In [1]: print(spark.version) [SHIFT+ENTER]
Out[1]: 2.0.0

This means that Apache Spark is successfully running in your Jupyter notebook.

PySpark with Apache Toree

Apache Toree is designed to enable interactive and remote applications to work with Apache Spark. Apache Toree mainly targets Scala, but you can also run PySpark through it. The downside is that the matplotlib inline magic does not currently work, so you will need "plt.show()" in order to see your plots (see the sketch after the setup steps below). We suggest you use PySpark with Python 2 as described above. If you need PySpark with Apache Toree, you can do the following:

Step 1)

$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n toree anaconda
$ source activate toree

Step 2)

$ pip install toree
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
$ jupyter toree install --interpreters=PySpark --spark_home=/home/$USER/spark162\
   --user --python_exec=/home/$USER/.conda/envs/ipykernel_py2/bin/python

Now you should have an option "Apache Toree - PySpark".
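
As noted above, the matplotlib inline magic does not work under Toree, so plots have to be displayed explicitly with plt.show(). A minimal sketch of a plotting cell in an "Apache Toree - PySpark" notebook (the data here is just an illustration):

import matplotlib.pyplot as plt

plt.plot([0, 1, 2, 3], [0, 1, 4, 9])   # simple example curve
plt.xlabel("x")
plt.ylabel("x squared")
plt.show()                             # required under Toree; %matplotlib inline is not available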

Scala with Apache Toree

Apache Toree is designed to enable interactive and remote applications to work with Apache Spark. Apache Toree mainly targets Scala.

Step 1)

export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n toree anaconda
source activate toree

Step 2)

pip install toree
jupyter toree install --interpreters=Scala --spark_home=/home/$USER/spark162 --user

Now you should have an option "Apache Toree - Scala".

SparkR

You will need to set up the R environment first (see the R Environment section above). Once you have R, you do not need to do anything else on the command line; simply start a new Jupyter session:

jupyter notebook

Create a new R notebook as you normally would. In the notebook you will have to initialize SparkR:

In [1]: Sys.setenv(SPARK_HOME="/home/<FILL IN>/spark162/")
In [2]: .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
In [3]: library(SparkR)
In [4]: sc <- sparkR.init(master="local")
In [5]: sqlContext <- sparkRSQL.init(sc)

The first line sets the Apache Spark home path; note that you will have to change "<FILL IN>" to your SHARCNET username. The second line makes the SparkR libraries available to R. The third line loads the library from the installed Apache Spark home path. The fourth line creates the Spark context and initiates the Spark kernel. The fifth line creates sqlContext from the created Spark context.

Octave Environment

Step 1) Create and activate the environment:

export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n octave anaconda
source activate octave

Step 2) Install the Octave kernel:

pip install octave_kernel
python -m octave_kernel.install

Haskell Environment

It is best to first create an Anaconda environment:

$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n ihaskell anaconda

After you have created the environment, you need to activate it:

$ source activate ihaskell
$ cd

Next, clone the IHaskell repository from github.com:

$ git clone https://github.com/gibiansky/IHaskell.git
$ cd IHaskell

Then we install it:

$ ./build.sh ihaskell
$ ihaskell install
$ source deactivate

Now you can start jupyter notebook.

Installing optional packages

Deactivate Environment

As an example, we can deactivate the r environment by doing:

(r) [roberpj@vdi-fedora23:~] source deactivate
[roberpj@vdi-fedora23:~] 

Package Removal

Ideally, use conda, python and r to cleanly uninstall all packages if one wants to remove everything and start over. Otherwise, remove the installation directories directly with great care by doing something like the following:

rm -rf ~/.jupyter/ ~/.local/share/jupyter /work/$USER/python/envs $XDG_RUNTIME_DIR/jupyter*

References

  • Homepage: http://jupyter.org/
  • Release: http://jupyter.readthedocs.io/en/latest/releases/content-releases.html
  • Forum: http://jupyter.org/community.html
  • Spark-Cloudera: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html