The first step in using Jupyter notebooks is to set things up so that we can actually run Jupyter. Broadly, there are two ways to do this:
1. Run the `jupyter-lab` executable and connect to the notebooks using your browser
2. Use VSCode to open and run the notebooks
Personally, I prefer the first approach. Part of this is familiarity, as I've been using that approach since before VSCode was available. The other part is that I find the first approach to be more reliable; I sometimes have trouble with VSCode forgetting which kernel to use with notebooks. However, VSCode is much more of an out-of-the-box approach, especially when running notebooks on a remote computer or when running on Windows.
I'll cover two use cases for now:
- Running `jupyter-lab` yourself on your local computer
- Running `jupyter-lab` on a remote computer and connecting via an SSH tunnel
In the future, I may cover using VSCode locally and remotely. But since that's not my preferred method, I don't have as many tips about doing so.
Jupyter Lab vs. Jupyter Notebook: If you've looked around online, you've probably seen references to both Jupyter Lab and Jupyter notebooks. The difference is that the notebook is the basic format of cells which can contain markdown text, raw text, or code that can run in different kernels. While the basic notebook viewer works well, Jupyter Lab is essentially the second generation notebook viewer. It lets you have multiple notebooks open at once alongside Python interpreters or terminal prompts, can open a number of simple file types, and includes a file browser. I find the Lab better in all respects, which is why I focus on using that one here.
An important caveat: these instructions work well on Linux or Mac, but I've not tried them on Windows. I suspect you might be able to replicate this setup using Windows Subsystem for Linux, but I have not tested that myself.
If you installed Anaconda Python, you probably already have `jupyter-lab` installed. Open a terminal and run:
$ jupyter-lab
If it works, you should see something like:
[I 21:53:07.116 LabApp] JupyterLab extension loaded from /home/josh/anaconda3/lib/python3.8/site-packages/jupyterlab
[I 21:53:07.117 LabApp] JupyterLab application directory is /home/josh/anaconda3/share/jupyter/lab
[I 21:53:07.119 LabApp] Serving notebooks from local directory: /home/josh
[I 21:53:07.119 LabApp] The Jupyter Notebook is running at:
[I 21:53:07.119 LabApp] http://localhost:8888/?token=ec9aa1072c7cb472270e4a015df4ed924a479a4be1b8c616
[I 21:53:07.119 LabApp] or http://127.0.0.1:8888/?token=ec9aa1072c7cb472270e4a015df4ed924a479a4be1b8c616
[I 21:53:07.119 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 21:53:07.173 LabApp]
To access the notebook, open this file in a browser:
file:///home/josh/.local/share/jupyter/runtime/nbserver-9530-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=ec9aa1072c7cb472270e4a015df4ed924a479a4be1b8c616
or http://127.0.0.1:8888/?token=ec9aa1072c7cb472270e4a015df4ed924a479a4be1b8c616
in the terminal and your browser should open a new page. If so, you're done! If not, here's what I do:
1. Use `conda` to create a new environment for Jupyter with the shell command `conda create -n jupyter jupyterlab`. This will create a new environment named "jupyter" and install the "jupyterlab" package and its dependencies. (Installing it in its own environment isolates it from other packages we might install in the future to do our regular work.)
2. Activate the new environment with `conda activate jupyter`.
3. Find where `jupyter-lab` was installed with the command `which jupyter-lab`. If that returns nothing, try `which jupyter`.
4. Link the `jupyter-lab` (or `jupyter`) program to a directory that will always be on our PATH so that we can run it without activating the `jupyter` environment:
   1. Deactivate the `jupyter` environment with the shell command `conda deactivate`.
   2. Choose a directory on your PATH and `cd` into it. (For help understanding the PATH, see the aside below.) I usually choose `~/.local/bin` as it is a standard location for user-installed programs on Linux machines, but any directory on your PATH that you can write to will work.
   3. Run `ln -s /path/to/jupyter`, where "/path/to/jupyter" is the path we got in step 3.
Now, if you `cd` to any directory and run `jupyter-lab` (or `jupyter lab`, if you linked the `jupyter` program instead), it should launch as described at the start of this section.
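
The linking step can also be wrapped in a small shell function. This is only a sketch: `link_into_path` is a hypothetical helper name, and it assumes you pass the full path found in step 3 plus a directory on your PATH that you can write to.

```shell
# Hypothetical helper: symlink a program into a directory on the PATH.
link_into_path() (
    program=$1   # full path to the program, e.g. the output of `which jupyter-lab`
    bin_dir=$2   # a writable directory on your PATH, e.g. ~/.local/bin

    # Refuse to link something that is not an executable file.
    if [ ! -f "$program" ] || [ ! -x "$program" ]; then
        echo "error: $program is not an executable file" >&2
        exit 1
    fi

    # Create the target directory if it does not exist yet.
    mkdir -p "$bin_dir" || exit 1

    # Create a symbolic link with the same name as the program inside bin_dir.
    ln -s "$program" "$bin_dir/"
)
```

With the placeholder path from step 3, you would call it as `link_into_path /path/to/jupyter ~/.local/bin`. If a link with that name already exists, `ln` reports an error and the function returns a non-zero status rather than overwriting it.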
Finding a directory on your PATH: If you're not familiar with the PATH concept, here's a quick guide to how to find a good directory to choose.
- Run `echo $PATH`. This will print a bunch of directories, with each one separated by a colon.
- Pick one of those directories that you can write to, such as `~/.local/bin`.
The idea of the PATH variable is that it lists all directories where the shell can search for programs when you invoke a plain command like `ls` or `grep` with no leading path elements. That is, running `jupyter-lab` will have the shell search all directories on the PATH for a program named `jupyter-lab`, but running `./jupyter-lab` will run `jupyter-lab` from the current directory, and running `~/bin/jupyter-lab` will run the program at that exact location.
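
To make the colon-separated output easier to scan, something like the following can help. It is a minimal sketch, and `list_path_dirs` is a hypothetical name: it prints each PATH entry on its own line and marks the ones that exist and are writable by you.

```shell
# Hypothetical helper: list PATH entries, marking writable ones with [w].
list_path_dirs() (
    IFS=:     # PATH entries are separated by colons
    set -f    # turn off globbing so entries are taken literally
    for dir in $PATH; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '[w] %s\n' "$dir"
        else
            printf '[ ] %s\n' "$dir"
        fi
    done
)

list_path_dirs
```

Any entry marked `[w]` is a candidate for the link; entries inside your home directory are usually the safest choice.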
The way that Jupyter works, the web browser that displays the notebook need not be on the same computer that executes the code.
As long as your web browser can connect to the `jupyter-lab` server, it can use it to run code.
That means that if you have SSH access to a computing cluster, or just a workstation computer, you can run notebooks on that
server from your computer.
If you are working with some kind of shared high performance computing cluster, check if they have instructions on how to run Jupyter on that system. They probably will, and those instructions will be meant to make sure you are being a good cluster citizen and sharing resources correctly.
To connect to a remote instance, we need to set up a Jupyter server on the computing cluster we want the code to run on, then
create an SSH tunnel from our computer to that one.
On the computing cluster, follow the setup steps from the previous section until you run `jupyter-lab`. When you run it, add the `--no-browser` flag to ensure that Jupyter doesn't try to start a web browser on the remote machine.
Also take note that after the first 8 or so lines of output from Jupyter, there should be a block similar to this:
To access the notebook, open this file in a browser:
file:///home/josh/.local/share/jupyter/runtime/nbserver-18237-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0
or http://127.0.0.1:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0
We'll need the line that includes "localhost" in it, so copy that.
Now, back on your computer, you need to start the SSH tunnel so that your computer can talk to the remote one. An SSH tunnel is like a regular SSH connection, except that instead of only taking input from your terminal, other applications can connect to it and send traffic back and forth. In this case, that's what we want Jupyter to do.
For this, I'll assume that you're connecting to a machine named `hpc`, that is, you type `ssh user@hpc` to connect your terminal to this machine.
We'll also need to know which port Jupyter is listening on on the cluster. Fortunately, it told us: in the "localhost" line from before, it is the number immediately after `localhost:`, i.e. in `localhost:8888` the port is `8888`.
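
If you'd rather not pick the port out by eye, a short shell function can do it. This is a sketch under the assumption that the URL has the `http://host:port/...` shape shown above; `jupyter_port` is a hypothetical name.

```shell
# Hypothetical helper: print the port from a URL like http://localhost:8888/?token=...
jupyter_port() (
    url=$1
    rest=${url#*://*:}           # drop the scheme and host, up to the colon before the port
    printf '%s\n' "${rest%%/*}"  # drop everything from the first slash onward
)

jupyter_port 'http://localhost:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0'
# prints 8888
```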
To create the SSH tunnel, assuming that the port is 8888, you would run the following command on your computer:
ssh -N -f -L 8888:localhost:8888 hpc
Breaking this down:
- `-N` tells SSH not to actually do anything on the remote system, just establish a connection.
- `-f` puts this SSH command in the background, so that we can keep using the terminal.
- `-L` is what makes this a tunnel.
- `8888` (first one) is the port on our machine through which we connect to the tunnel.
- `localhost` is the address on the other side of the tunnel that traffic should go to once it's out of the tunnel. Since we want it to go to the remote machine itself, we use "localhost".
- `8888` (second one) is the port on the remote machine to send traffic to. This must match the port in the URL that Jupyter printed when it started.
- `hpc` tells SSH what machine to connect to.
I like to add a message that confirms the tunnel started, so I run something like:
ssh -N -f -L 8888:localhost:8888 hpc && echo 'Jupyter tunnel started on port 8888'
The message after the `&&` will only print if the `ssh` command succeeds, so if we see that, we know we're good.
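
If you start tunnels often, the command and its confirmation message can be wrapped in a function. The following is only a sketch: `start_tunnel` and the `SSH_CMD` variable are hypothetical names introduced here, and the optional third argument (a different local port) is my own addition for convenience.

```shell
# Hypothetical helper: start a background SSH tunnel and confirm it started.
# Usage: start_tunnel HOST REMOTE_PORT [LOCAL_PORT]
start_tunnel() (
    host=$1
    remote_port=$2
    local_port=${3:-$remote_port}   # default: same port on both ends

    # SSH_CMD is a hypothetical override for a dry run, e.g. SSH_CMD=echo
    # prints the arguments instead of connecting. By default plain ssh is used.
    ${SSH_CMD:-ssh} -N -f -L "$local_port:localhost:$remote_port" "$host" &&
        echo "Jupyter tunnel started on port $local_port"
)
```

For the example in this section you would run `start_tunnel hpc 8888`. As with the one-liner, the message only appears if the `ssh` command itself succeeds.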
The last step is to open a web browser and paste in the link Jupyter gave us; in this example it was http://localhost:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0.
Once it connects, you should see the files on the remote machine in the Jupyter Lab file browser.
If so, success!
If not, see if there are any errors in the Jupyter output on the remote machine (in the terminal window where you ran the `jupyter-lab` command).
- Use the same port number on both ends of the tunnel when you can, e.g. `ssh -N -f -L 8900:localhost:8900 hpc`. That way you can just copy the URL Jupyter gave you and paste it into your web browser without editing it.
- The local and remote ports do not have to match, e.g. `ssh -N -f -L 9000:localhost:8888 ...`. In that case, change the port in the URL to the local one (9000 here) before pasting it.
- You can pass the `--port` argument when you launch Jupyter Lab on the cluster, e.g. `jupyter-lab --no-browser --port=9999` will have it try ports starting with 9999 instead of 8888. Since this will get you out of the port range most people use, your port number will stay the same more often.
Which ports can I use? Generally, ports in the 8000s, 9000s, and 10,000s should be okay. If you choose a port already in use on the cluster, Jupyter should try up to 50 other ports by default to find one not being used. If you choose a port in use on your computer for the SSH tunnel, you should get an error message - just try a different port.
Whatever you do, stay away from port numbers less than 1024 - such ports are generally reserved for specific applications, and you probably won't be allowed to even try to use them.
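
When the local end of the tunnel uses a different port than Jupyter does on the cluster, the URL Jupyter printed has to be edited before you paste it. Here is a minimal sketch of that edit, assuming the URL has the `http://host:port/...` shape shown earlier; `local_url` is a hypothetical name.

```shell
# Hypothetical helper: swap the port in a Jupyter URL for the local tunnel port.
# Usage: local_url URL LOCAL_PORT
local_url() (
    url=$1
    local_port=$2
    # Keep the scheme and host, replace only the digits after the colon.
    printf '%s\n' "$url" | sed -E "s#^(https?://[^:/]+):[0-9]+#\1:$local_port#"
)

local_url 'http://localhost:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0' 9000
# prints http://localhost:9000/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0
```

The token is left untouched, since it belongs to the Jupyter server and does not depend on which local port the tunnel uses.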