JUPYTER
Description: Produce documents with Jupyter Notebook App
SHARCNET Package information: see JUPYTER software page in web portal
Full list of SHARCNET supported software
Contents
- 1 Introduction
- 2 Version
- 3 Usage
- 4 Notes
- 5 References
Introduction
Provides the Jupyter Notebook App on a SHARCNET Fedora visualization workstation.
Version
Jupyter is in the default path; no module needs to be loaded to use it. Only one version is currently installed.
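To confirm that the system copy is the one being picked up and to see which version is installed, you can run the following (the exact version reported will depend on the current installation):
$ which jupyter
$ jupyter --version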
Usage
Cluster
Jupyter is not installed on the clusters.
Interactive
Not applicable.
Graphical
Log into vdi-fedora23.user.sharcnet.ca with vncviewer and simply run the command:
jupyter notebook
- More information on using vncviewer may be found here.
Notes
Getting Some Help
jupyter notebook --help
Create Jupyter Notebooks
Follow the sections below to make additional environments available to Jupyter Notebook. The conda approach is used for setting up Python 2, Python 3, and R virtual environments as described in https://ipython.readthedocs.io/en/latest/install/kernel_install.html ...
Python 2 Environment
Step 1)
[roberpj@vdi-fedora23:~] python -V
Python 2.7.11
[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py2 python=2 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py2
(ipykernel_py2) [roberpj@vdi-fedora23:~] python -V
Python 2.7.12 :: Continuum Analytics, Inc.
Step 2)
(ipykernel_py2) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python2 in /home/roberpj/.local/share/jupyter/kernels/python2
Step 3) [Optional for installing new libraries (such as Pandas, MatplotLib, SciPy, etc.)]
(ipykernel_py2) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn
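As an optional sanity check, you can verify that the newly installed libraries import cleanly inside the activated environment before starting the notebook (the package names below simply mirror the pip install above):
(ipykernel_py2) [roberpj@vdi-fedora23:~] python -c "import matplotlib, pandas, scipy, sklearn; print('ok')"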
Python 2 should now appear under the "New" pulldown (on the right-hand side) when we run the app:
[roberpj@vdi-fedora23:~] jupyter notebook
To uninstall, do something like ...
[roberpj@vdi-fedora23:~] conda uninstall -n ipykernel_py2 ipykernel
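If you instead want to remove the whole Python 2 environment and its kernel registration, something along these lines should work (the kernelspec path matches the one reported by the install step above):
[roberpj@vdi-fedora23:~] conda env remove -n ipykernel_py2
[roberpj@vdi-fedora23:~] rm -rf ~/.local/share/jupyter/kernels/python2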
Python 3 Environment
Step 1)
[roberpj@vdi-fedora23:~] python -V
Python 2.7.11
[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py3 python=3 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py3
(ipykernel_py3) [roberpj@vdi-fedora23:~] python -V
Python 3.6.0 :: Continuum Analytics, Inc.
Step 2)
(ipykernel_py3) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python3 in /home/roberpj/.local/share/jupyter/kernels/python3
Step 3) [Optional for installing new libraries (such as Pandas, MatplotLib, SciPy, etc.)]
(ipykernel_py3) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn numpy
Python 3 should now appear under the "New" pulldown when we run the app in the Notebooks section:
[roberpj@vdi-fedora23:~] jupyter notebook
R Environment
Step 1)
[roberpj@vdi-fedora23:~] R --version
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n r anaconda
[roberpj@vdi-fedora23:~] source activate r
(r) [roberpj@vdi-fedora23:~] conda install -c r r
(r) [roberpj@vdi-fedora23:~] conda install -c r r-essentials
(r) [roberpj@vdi-fedora23:~] conda install -c r r-irkernel
(r) [roberpj@vdi-fedora23:~] R --version
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Step 2)
[roberpj@vdi-fedora23:~] R
> IRkernel::installspec()
[InstallKernelSpec] Installed kernelspec ir in /home/roberpj/.local/share/jupyter/kernels/ir
R should appear under the "New" pulldown in the Notebooks section when jupyter is started:
[roberpj@vdi-fedora23:~] jupyter notebook
Aside: notice that we have so far created these three environment directories:
[roberpj@vdi-fedora23:/work/roberpj/python/envs] ls
ipykernel_py2  ipykernel_py3  r
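You can also ask Jupyter which kernels it knows about; after the steps above the output should look roughly like this (exact paths may differ):
[roberpj@vdi-fedora23:~] jupyter kernelspec list
Available kernels:
  python2    /home/roberpj/.local/share/jupyter/kernels/python2
  python3    /home/roberpj/.local/share/jupyter/kernels/python3
  ir         /home/roberpj/.local/share/jupyter/kernels/ir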
Apache Spark Environment
Apache Spark provides interactive shells for R, Python, Scala, and SQL. These shells can be accessed through Jupyter. You will need to install Apache Spark in your home directory as shown below; you can do this from another cluster, or from vdi-fedora23 itself.
$ cd
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
$ tar -xvzf spark-2.1.0-bin-hadoop2.7.tgz
$ mv spark-2.1.0-bin-hadoop2.7 spark210
$ export PATH=/home/$USER/spark210/bin/:$PATH
$ export SPARK_LOCAL_IP=127.0.0.1
$ pyspark
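As a quick check that the download and PATH setup worked, you can ask Spark for its version before starting any shells; it should report 2.1.0:
$ spark-submit --version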
Note that you will always have to "export SPARK_LOCAL_IP=127.0.0.1" when you start a new session in vdi-fedora23.
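If you prefer not to retype these exports every session, one option is to append them to your ~/.bashrc (a convenience suggestion only, not required by the installation):
$ echo 'export PATH=/home/$USER/spark210/bin/:$PATH' >> ~/.bashrc
$ echo 'export SPARK_LOCAL_IP=127.0.0.1' >> ~/.bashrc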
PySpark using Python 2
Once you have Python 2.7.12 installed in your home directory (see the Python 2 Environment section above), you can proceed to use Apache Spark with Jupyter. Open a terminal window and make sure the Spark environment variables from the previous section are set, so that pyspark runs fine on its own. Next, we need to tell Spark that we are going to use Jupyter as the driver:
$ export PYSPARK_DRIVER_PYTHON=/usr/bin/jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=True --NotebookApp.ip=127.0.0.1 --NotebookApp.port=8888"
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
Now we can start pyspark:
$ pyspark
[W 11:09:52.465 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)
Once pyspark is running, Jupyter will automatically open in your web browser. Create a new notebook via New -> Python 2, then test whether Spark is running in the Jupyter notebook as follows:
In [1]: print(spark.version)   [SHIFT+ENTER]
Out[1]: 2.0.0
This means that Apache Spark is successfully running in your Jupyter notebook.
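If you later want the plain command-line pyspark shell back (rather than one that launches Jupyter), unset the driver variables first; otherwise pyspark will keep starting a notebook server:
$ unset PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS
$ pyspark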
PySpark with Apache Toree
Apache Toree is designed to let interactive and remote applications work with Apache Spark. Apache Toree mainly targets Scala, but you can also run PySpark through it. The downside is that matplotlib inline magic does not currently work, so you will need to call "plt.show()" in order to see your plots. We suggest you use PySpark with Python 2 as described above. If you do need PySpark with Apache Toree, you can do the following (adjust the --spark_home path in Step 2 to match your Spark installation directory):
Step 1)
$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n toree anaconda
$ source activate toree
Step 2)
$ pip install toree
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
$ jupyter toree install --interpreters=PySpark --spark_home=/home/$USER/spark162 --user --python_exec=/home/$USER/.conda/envs/ipykernel_py2/bin/python
Now you should have an option "Apache Toree - PySpark".
Scala with Apache Toree
Apache Toree is designed to enable applications that are both interactive and remote to work with Apache Spark. Apache Toree mainly works with Scala.
Step 1)
export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n toree anaconda
source activate toree
Step 2)
pip install toree
jupyter toree install --interpreters=Scala --spark_home=/home/$USER/spark162 --user
Now you should have an option "Apache Toree - Scala".
SparkR
You will need to install R first (see the R Environment section above). Once you have R, you do not need to do anything else on the command line; simply start a new Jupyter session:
jupyter notebook
Create a new R notebook as you would normally do. In the notebook you will have to point R at your Spark installation and initialize the Spark context:
In [1]: Sys.setenv(SPARK_HOME="/home/<FILL IN>/spark162/")
In [2]: .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
In [3]: library(SparkR)
In [4]: sc <- sparkR.init(master="local")
In [5]: sqlContext <- sparkRSQL.init(sc)
The first line sets the Apache Spark home path; note that you will have to change "<FILL IN>" to your SHARCNET username (and adjust the directory name to match your Spark installation). The second line makes the SparkR libraries available to R. The third line loads the library from the installed Apache Spark home path. The fourth line creates the Spark context and initiates the Spark kernel. The fifth line creates the sqlContext from the created Spark context.
Octave Environment
Step 1) Get the environments:
export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n octave anaconda
source activate octave
Step 2) Install the Octave kernel:
pip install octave_kernel
python -m octave_kernel.install
Haskell Environment
It is a good idea to create an Anaconda environment first:
$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n ihaskell anaconda
After you have created the environment, you need to activate it:
$ source activate ihaskell
$ cd
Next, we clone the IHaskell repository from github.com:
$ git clone https://github.com/gibiansky/IHaskell.git
$ cd IHaskell
Then we install it:
$ ./build.sh ihaskell
$ ihaskell install
$ source deactivate
Now you can start jupyter notebook.
Installing optional packages
Deactivate Environment
As an example we can deactivate the r environment by doing:
(r) [roberpj@vdi-fedora23:~] source deactivate
[roberpj@vdi-fedora23:~]
Package Removal
Ideally, use conda, pip, and R to cleanly uninstall all packages if you want to remove everything and start over. Otherwise, remove the installation directories directly, with great care, by doing something like the following:
rm -rf ~/.jupyter/ ~/.local/share/jupyter /work/$USER/python/envs $XDG_RUNTIME_DIR/jupyter*
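A somewhat safer alternative, assuming the environments were created with conda as above (and that /usr/local/miniconda/3/bin is still on your PATH), is to list the environments and remove them one at a time before deleting any leftover Jupyter directories:
$ conda env list
$ conda env remove -n ipykernel_py2
$ conda env remove -n ipykernel_py3
$ conda env remove -n r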
References
- Homepage: http://jupyter.org/
- Release: http://jupyter.readthedocs.io/en/latest/releases/content-releases.html
- Forum: http://jupyter.org/community.html
- Spark-Cloudera: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html