This page is scheduled for deletion because it is either redundant with information available on the CC wiki, or the software is no longer supported.
Contents
- 1 Introduction
- 2 Version
- 3 Usage
- 4 Notes
  - 4.1 Getting Some Help
  - 4.2 Create Jupyter Notebooks
- 5 References
JUPYTER
Description: Produce documents with Jupyter Notebook App
SHARCNET Package information: see JUPYTER software page in web portal
Full list of SHARCNET supported software
Introduction
Provides the Jupyter Notebook App on a SHARCNET Fedora visualization workstation.
Version
Jupyter is in the default path, so no module needs to be loaded to use it. Only one version is currently installed.
Usage
Graham cluster
The Jupyter notebook is provided through a Python module on Graham. Instructions for installing it can be found at https://docs.computecanada.ca/wiki/Jupyter. You can run it on the login node or on the compute nodes, but note that any notebook running on the login node will be killed after some time. To run on a compute node you have to submit a job requesting the CPUs, GPUs, memory and runtime you need. The instructions below run the notebook on a compute node, since that is the recommended way.
Load the module
Log onto graham.sharcnet.ca (or graham.computecanada.ca) and load the Python module:

ssh user@graham.sharcnet.ca
module load python35-scipy-stack
Install Python modules
pip3.5 install pycuda --user
Submit the job
Create a bash script for submitting a Jupyter job to the Slurm scheduler, e.g. slurm_jupyter.sh, with the following contents. Submit it with "sbatch slurm_jupyter.sh"; the tunneling instructions will then be written to jupyter-log-<jobid>.txt.
#!/bin/bash
#SBATCH --gres=gpu:2                  #only if you need GPU
#SBATCH --time=0-01:00                #runtime d-hh:mm
#SBATCH --nodes 1                     #how many nodes
#SBATCH --ntasks-per-node 32          #number of cores per node
#SBATCH --mem-per-cpu 4000            #memory in MB
#SBATCH --job-name tunnel             #name of the job
#SBATCH --output jupyter-log-%J.txt   #name of the log file
#SBATCH --mail-type=BEGIN             #send email when the job starts
#SBATCH --mail-user=<email_address>   #send the email to this email_address

## load modules that you might need, in this case cuda for pycuda
module load cuda

## get tunneling info
XDG_RUNTIME_DIR=""
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i | xargs)

## print tunneling instructions to jupyter-log-{jobid}.txt
echo -e "
Copy/Paste this in your local terminal to ssh tunnel with remote
-----------------------------------------------------------------
sshuttle -r $USER@graham.sharcnet.ca -v $ipnip/24
-----------------------------------------------------------------

Then open a browser on your local machine to the following address
------------------------------------------------------------------
http://$ipnip:$ipnport  (prefix with https:// if using a password)
------------------------------------------------------------------
"

## launch the jupyter notebook server
jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
Setting up the tunnel
Open a new terminal window on your local machine and run the sshuttle command printed in the log file to forward the Jupyter port, e.g.
sshuttle -r jnandez@graham.sharcnet.ca -v 10.29.76.66/24
Opening in a browser
Open your local browser and type, e.g.
http://10.29.76.66:8850/
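Checking the GPUs (optional)

If your job requested GPUs and you installed pycuda as above, a quick way to confirm that the notebook can see them is to run a short cell such as the one below. This is an illustrative snippet, not part of the original instructions.

import pycuda.driver as cuda

cuda.init()                              # initialize the CUDA driver API
print("GPUs visible:", cuda.Device.count())
for i in range(cuda.Device.count()):
    print(i, cuda.Device(i).name())      # print each device's name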
Interactive
Not applicable.
Graphical
Log into vdi-fedora23.user.sharcnet.ca with vncviewer and simply run the command:
jupyter notebook
- More information on using vncviewer may be found here.
Notes
Getting Some Help
jupyter notebook --help
Create Jupyter Notebooks
Follow the sections below to make additional environments available to the Jupyter Notebook. The conda approach is used for setting up Python 2, Python 3 and R virtual environments as described in https://ipython.readthedocs.io/en/latest/install/kernel_install.html ...
Preparation
If you plan to install the optional pybinding package in step 3), note that there is a bug described in https://bugs.python.org/issue18378 and further in https://stackoverflow.com/questions/15526996/ipython-notebook-locale-error. Before attempting to create the notebook on vdi-fedora23, which uses the CA locale by default, add the following lines to your ~/.bashrc file, then log out and log in again. Running the locale command should then show the US locale active.
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
Failure to change the setting will result in a fatal error message during the installation:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 394: ordinal not in range(128)
If further complications develop, you could try carefully cleaning up the configuration files and then reinstalling with:
rm -rf ~/.conda
rm -rf ~/.cache/pip
Python 2 Environment
Step 1)
[roberpj@vdi-fedora23:~] python -V
Python 2.7.11
[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py2 python=2 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py2
(ipykernel_py2) [roberpj@vdi-fedora23:~] python -V
Python 2.7.13 :: Continuum Analytics, Inc.
Step 2)
(ipykernel_py2) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python2 in /home/roberpj/.local/share/jupyter/kernels/python2
Step 3) [Optional]
(ipykernel_py2) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn pybinding
Step 4) Deactivate the environment
(ipykernel_py2) [jnandez@vdi-fedora23 ~]$ source deactivate
Python 2 should now appear under the "New" pulldown (on the right-hand side) when we run the app:
[roberpj@vdi-fedora23:~] jupyter notebook
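To confirm that the new kernel is wired to the conda environment, you can open a Python 2 notebook and run a short check. This is an illustrative cell, not part of the original instructions.

In [1]: import sys
In [2]: print(sys.version)       # should report the 2.7.x interpreter from ipykernel_py2
In [3]: print(sys.executable)    # should point into ~/.conda/envs/ipykernel_py2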
To uninstall do something like ...
[roberpj@vdi-fedora23:~] conda uninstall -n ipykernel_py2 ipykernel
Python 3 Environment
Step 1)
[roberpj@vdi-fedora23:~] python -V
Python 2.7.11
[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n ipykernel_py3 python=3 ipykernel
Proceed ([y]/n)? y
[roberpj@vdi-fedora23:~] source activate ipykernel_py3
(ipykernel_py3) [roberpj@vdi-fedora23:~] python -V
Python 3.6.0 :: Continuum Analytics, Inc.
Step 2)
(ipykernel_py3) [roberpj@vdi-fedora23:~] python -m ipykernel install --user
Installed kernelspec python3 in /home/roberpj/.local/share/jupyter/kernels/python3
Step 3) [Optional]
(ipykernel_py3) [roberpj@vdi-fedora23:~] pip install matplotlib pandas scipy scikit-learn numpy pybinding
Python 3 should now appear under the "New" pulldown when we run the app in the Notebooks section:
[roberpj@vdi-fedora23:~] jupyter notebook
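As with Python 2, you can verify the new kernel from inside a Python 3 notebook. The cell below is illustrative only; the numpy import succeeds only if you installed the optional packages in step 3.

In [1]: import sys
In [2]: sys.version_info[0]
Out[2]: 3
In [3]: import numpy             # only if the optional packages from step 3 were installed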
R Environment
Step 1)
[roberpj@vdi-fedora23:~] R --version
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
[roberpj@vdi-fedora23:~] export PATH=/usr/local/miniconda/3/bin:$PATH
[roberpj@vdi-fedora23:~] conda create -n r anaconda
[roberpj@vdi-fedora23:~] source activate r
(r) [roberpj@vdi-fedora23:~] conda install -c r r
(r) [roberpj@vdi-fedora23:~] conda install -c r r-essentials
(r) [roberpj@vdi-fedora23:~] conda install -c r r-irkernel
(r) [roberpj@vdi-fedora23:~] R --version
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Step 2)
[roberpj@vdi-fedora23:~] R
> IRkernel::installspec()
[InstallKernelSpec] Installed kernelspec ir in /home/roberpj/.local/share/jupyter/kernels/ir
> q()
Save workspace image? [y/n/c]: n
(r) [jnandez@vdi-fedora23 ~]$ source deactivate
R should appear under the "New" pulldown in the Notebooks section when jupyter is started:
[roberpj@vdi-fedora23:~] jupyter notebook
Apache Spark Environment
Apache Spark provides R, Python, Scala and SQL interactive shells. These shells can be accessed through Jupyter. You will need to install Apache Spark in your home directory as described at https://www.sharcnet.ca/help/index.php/Apache_Spark#Different_versions_in_any_SHARCNET_cluster, either from another cluster or from vdi-fedora23. We currently support only Python 2, since Python 3 is not yet well tested by the Spark community.
$ cd
$ wget https://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz
$ tar -xvzf spark-2.1.1-bin-hadoop2.7.tgz
$ mv spark-2.1.1-bin-hadoop2.7 spark211
$ export SPARK_HOME=/home/$USER/spark211
$ export PATH=$SPARK_HOME/bin/:$PATH
$ export SPARK_LOCAL_IP=127.0.0.1
$ pyspark
Note that you will always have to "export SPARK_LOCAL_IP=127.0.0.1" when you start a new session in vdi-fedora23.
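To confirm that the pyspark shell itself works before connecting it to Jupyter, you can run a tiny job at its prompt. This is an illustrative check; sc is the SparkContext that the pyspark shell creates for you.

>>> sc.parallelize(range(100)).sum()    # a trivial Spark job
4950
>>> exit()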
PySpark using Python 2
Once you have Python 2.7.13 installed in your home directory (the ipykernel_py2 environment described above), you can use Apache Spark from Jupyter. Open a terminal window; pyspark should now run fine. Next, we need to tell Spark to use Jupyter as the driver, by launching it as follows:
$ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
You should see the Jupyter notebook server start up, with output such as:

[W 11:09:52.465 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)

Once pyspark is running, Jupyter will open automatically in your web browser. You can then create a new notebook (New -> Python 2) and test whether Spark is running in the notebook as follows:
In [1]: print(spark.version) [SHIFT+ENTER]
Out[1]: 2.1.0
This means that Apache Spark is successfully running on your Jupyter notebook.
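As a slightly larger check, you can create a small DataFrame in the same notebook. This is an illustrative example using the spark session object that pyspark provides.

In [2]: df = spark.createDataFrame([(1, "spark"), (2, "jupyter")], ["id", "name"])
In [3]: df.count()
Out[3]: 2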
If the PYSPARK_DRIVER_PYTHON=jupyter command above does not start a Spark-enabled Python notebook, try the following commands instead and repeat the notebook test:
$ export PYSPARK_DRIVER_PYTHON=/usr/bin/jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=True --NotebookApp.ip=127.0.0.1 --NotebookApp.port=8888"
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
$ pyspark
PySpark with Apache Toree
Apache Toree is designed to enable applications that are both interactive and remote to work with Apache Spark. Apache Toree mainly works with Scala, but you can also run PySpark through it. The downside is that the matplotlib magic does not currently work, and you will need "plt.show()" in order to see your plots. We suggest you use the "PySpark using Python 2" approach above. If you nevertheless need PySpark with Apache Toree, do the following:
Step 1)
$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n toree anaconda
$ source activate toree
Step 2)
$ pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
$ export PYSPARK_PYTHON=/home/$USER/.conda/envs/ipykernel_py2/bin/python
$ jupyter toree install --interpreters=PySpark --spark_home=$SPARK_HOME \
    --user --python_exec=$PYSPARK_PYTHON
Now you should have an option "Apache Toree - PySpark".
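To check the kernel, open a new "Apache Toree - PySpark" notebook and run a small job. This is an illustrative cell; Toree predefines the SparkContext as sc.

In [1]: sc.parallelize([1, 2, 3, 4]).reduce(lambda a, b: a + b)
Out[1]: 10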
Scala with Apache Toree
Apache Toree is designed to enable applications that are both interactive and remote to work with Apache Spark. Apache Toree mainly works with Scala.
Step 1)
export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n toree anaconda
source activate toree
Step 2)
pip install toree
jupyter toree install --interpreters=Scala --spark_home=$SPARK_HOME --user
Now you should have an option "Apache Toree - Scala".
SparkR
You will need to install R first (see the R Environment section above). Once you have R, you do not need to do anything else on the command line; just start a new Jupyter session:
jupyter notebook
Create a new R notebook as you normally would. In the notebook, load the SparkR library and start a Spark session:
In [1]: library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
In [2]: sparkR.session(master="local[*]", sparkConfig = list(spark.driver.memory = "4g"))
The first line loads the SparkR library from the Apache Spark installation pointed to by the SPARK_HOME environment variable (set as in the Apache Spark Environment section above). The second line creates the Spark session, running locally on all available cores with 4 GB of driver memory.
Octave Environment
Step 1) Get the environment:
export PATH=/usr/local/miniconda/3/bin:$PATH
conda create -n octave anaconda
source activate octave
Step 2) Install the Octave kernel:
pip install octave_kernel
python -m octave_kernel.install
Haskell Environment
It is a good idea to create an Anaconda environment:
$ export PATH=/usr/local/miniconda/3/bin:$PATH
$ conda create -n ihaskell anaconda
After you have created the environment, you need to activate it:
$ source activate ihaskell
$ cd
We need to clone the iHaskell repository from github.com,
$ git clone https://github.com/gibiansky/IHaskell.git
$ cd IHaskell
Then we install it
$ ./build.sh ihaskell
$ ihaskell install
$ source deactivate
Now you can start jupyter notebook.
Installing optional packages
Deactivate Environment
As an example we can deactivate the r environment by doing:
(r) [roberpj@vdi-fedora23:~] source deactivate
[roberpj@vdi-fedora23:~]
Package Removal
Ideally use conda, python and R to cleanly uninstall all packages if you want to remove everything and start over. Otherwise, remove the installation directories directly, with great care, by doing something like the following:
rm -rf ~/.jupyter/ ~/.local/share/jupyter /work/$USER/python/envs $XDG_RUNTIME_DIR/jupyter*
References
- Homepage: http://jupyter.org/
- Release: http://jupyter.readthedocs.io/en/latest/releases/content-releases.html
- Forum: http://jupyter.org/community.html
- Spark-Cloudera: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html