Setting up Jupyter

The first step in using Jupyter notebooks is to set things up so that we can actually run Jupyter. Broadly, there are two ways to do this:

  1. Run the jupyter-lab executable and connect to the notebooks using your browser
  2. Use VSCode to run Jupyter notebooks

Personally, I prefer the first approach. Part of this is familiarity, as I've been using that approach for longer than VSCode has been available. The other part is that I find the first approach more reliable; I sometimes have trouble with VSCode forgetting which kernel to use with notebooks. However, VSCode is much more of an out-of-the-box approach, especially when running notebooks on a remote computer or when running on Windows.

I'll cover two use cases for now:

  1. running jupyter-lab yourself on your local computer, and
  2. running jupyter-lab on a remote computer and connecting via an SSH tunnel.

In the future, I may cover using VSCode locally and remotely. But since that's not my preferred method, I don't have as many tips about doing so.

Jupyter Lab vs. Jupyter Notebook: If you've looked around online, you've probably seen references to both Jupyter Lab and Jupyter notebooks. The difference is that the notebook is the basic format of cells which can contain markdown text, raw text, or code that can run in different kernels. While the basic notebook viewer works well, Jupyter Lab is essentially the second-generation notebook viewer. It allows you to have multiple notebooks open at once, as well as Python interpreters or terminal prompts, can open a number of simple file types, and includes a file browser. I find the Lab better in all respects, which is why I focus on using that one here.

Running jupyter-lab yourself

An important caveat: these instructions work well on Linux or Mac, but I've not tried them on Windows. I suspect you might be able to replicate this setup using Windows Subsystem for Linux, but have not tested that myself.

Running locally

If you installed Anaconda Python, you probably already have jupyter-lab installed. Open a terminal and run:

$ jupyter-lab

If it works, you should see something like:

[I 21:53:07.116 LabApp] JupyterLab extension loaded from /home/josh/anaconda3/lib/python3.8/site-packages/jupyterlab
[I 21:53:07.117 LabApp] JupyterLab application directory is /home/josh/anaconda3/share/jupyter/lab
[I 21:53:07.119 LabApp] Serving notebooks from local directory: /home/josh
[I 21:53:07.119 LabApp] The Jupyter Notebook is running at:
[I 21:53:07.119 LabApp] http://localhost:8888/?token=ec9aa1072c7cb472270e4a015df4ed924a479a4be1b8c616
[I 21:53:07.119 LabApp]  or http://127.0.0.1:8888/?token=ec9aa1072c7cb472270e4a015df4ed924a479a4be1b8c616
[I 21:53:07.119 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 21:53:07.173 LabApp] 

    To access the notebook, open this file in a browser:
        file:///home/josh/.local/share/jupyter/runtime/nbserver-9530-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=ec9aa1072c7cb472270e4a015df4ed924a479a4be1b8c616
     or http://127.0.0.1:8888/?token=ec9aa1072c7cb472270e4a015df4ed924a479a4be1b8c616

in the terminal and your browser should open a new page. If so, you're done! If not, here's what I do:

  1. Install Anaconda or Miniconda if you don't already have it. I recommend Miniconda, and follow the installation instructions here. If you installed it, open a new terminal tab/window before proceeding (to make sure any changes to your shell are in place).
  2. Use conda to create a new environment for Jupyter with the shell command conda create -n jupyter jupyterlab. This will create a new environment named "jupyter" and install the "jupyterlab" package and its dependencies. (Installing it in its own environment isolates it from other packages we might install in the future to do our regular work.)
  3. Activate the new environment with the shell command conda activate jupyter, then get the path where jupyter-lab was installed with the command which jupyter-lab. If that returns nothing, try which jupyter.
  4. Now we will link the jupyter-lab (or jupyter) program to a directory that will always be on our PATH so that we can run it without activating the jupyter environment.
    • Deactivate the jupyter environment with the shell command conda deactivate.
    • Choose a directory you can write to that is on your PATH, and cd into it. (For help understanding the PATH, see the aside below.) I usually choose ~/.local/bin as it is a standard location for user-installed programs on Linux machines, but any directory on your PATH that you can write to will work.
    • In the chosen directory, run the command ln -s /path/to/jupyter where "/path/to/jupyter" is the path we got in step 3.

Now, if you cd to any directory and run jupyter-lab, it should launch as described at the start of this section.
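Steps 3 and 4 boil down to creating one symlink. As a minimal sketch, the helper below does that for any executable. The name link_into_bin is hypothetical (it is not part of conda or Jupyter), and the sketch assumes the directory you pass is already on your PATH.

```shell
# Hypothetical helper: symlink an executable into a directory on the PATH.
#   $1 = full path to the program (what `which jupyter-lab` printed in step 3)
#   $2 = a directory on your PATH that you can write to (e.g. ~/.local/bin)
link_into_bin() {
    target="$1"
    bin_dir="$2"
    mkdir -p "$bin_dir" || return 1
    # The link keeps the program's own name, so it is invoked the same way.
    ln -s "$target" "$bin_dir/$(basename "$target")"
}
```

While the jupyter environment is still active, you could save the path with jupyter_path=$(which jupyter-lab), then run conda deactivate followed by link_into_bin "$jupyter_path" ~/.local/bin.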

Finding a directory on your PATH: If you're not familiar with the PATH concept, here's a quick guide to how to find a good directory to choose.

  • First run the command echo $PATH. This will print a bunch of directories, with each one separated by a colon.
  • Find one of these under your home directory. Any of those will do for this guide.
  • If there are no paths listed under your home directory, you should create a directory where you want custom programs to live and add it to the PATH. This involves editing one of your shell's startup files as described here.

The idea of the PATH variable is that it lists all directories where the shell will search for programs when you invoke a plain command like ls or grep with no leading path elements. That is, running jupyter-lab will have the shell search each directory on the PATH, in order, for a program named jupyter-lab, but running ./jupyter-lab will run jupyter-lab from the current directory, and running ~/bin/jupyter-lab will run the program at that exact location.
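If reading the output of echo $PATH by eye is tedious, the search can be scripted. This is a rough sketch only; the function name list_home_path_dirs is made up for illustration. It prints every PATH entry that is an existing, writable directory under your home directory.

```shell
# List the PATH entries that are writable directories under your home directory.
list_home_path_dirs() {
    # PATH is colon-separated; put each entry on its own line, then filter.
    printf '%s\n' "$PATH" | tr ':' '\n' | while IFS= read -r dir; do
        case "$dir" in
            "$HOME"/*)
                if [ -d "$dir" ] && [ -w "$dir" ]; then
                    printf '%s\n' "$dir"
                fi
                ;;
        esac
    done
}

list_home_path_dirs
```

If this prints nothing, no suitable directory is on your PATH yet, and you will need to create one and add it as described above.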

Running remotely

The way that Jupyter works, the web browser that displays the notebook need not be on the same computer that executes the code. As long as your web browser can connect to the jupyter-lab server, it can use it to run code. That means that if you have SSH access to a computing cluster, or just a workstation computer, you can run notebooks on that server from your computer.

If you are working with some kind of shared high performance computing cluster, check if they have instructions on how to run Jupyter on that system. They probably will, and those instructions will be meant to make sure you are being a good cluster citizen and sharing resources correctly.

To connect to a remote instance, we need to set up a Jupyter server on the computing cluster we want the code to run on, then create an SSH tunnel from our computer to that one. On the computing cluster, follow the setup steps from the previous section until you run jupyter-lab. When you run it, add the --no-browser flag to ensure that Jupyter doesn't try to start a web browser on the remote machine. Also take note that after the first 8 or so lines of output from Jupyter, there should be a block similar to this:

To access the notebook, open this file in a browser:
    file:///home/josh/.local/share/jupyter/runtime/nbserver-18237-open.html
Or copy and paste one of these URLs:
    http://localhost:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0
 or http://127.0.0.1:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0

We'll need the line that includes "localhost" in it, so copy that.

Now, back on your computer, you need to start the SSH tunnel so that your computer can talk to the remote one. An SSH tunnel is like a regular SSH connection, except that instead of only taking input from your terminal, other applications can connect to it and send traffic back and forth. In this case, that's what we want Jupyter to do.

For this, I'll assume that you're connecting to a machine named hpc; that is, you type

ssh user@hpc

to connect your terminal to this machine. We'll also need to know which port Jupyter is using on the cluster. Fortunately, it told us: in the "localhost" line from before, it is the number immediately after localhost:, i.e. in localhost:8888 the port is 8888.
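If you would rather not pick the port out by eye, the sketch below extracts it from the URL using only shell parameter expansion. The function name port_from_url is hypothetical, and it assumes the URL has the http://host:port/... shape shown above.

```shell
# Extract the port from a Jupyter URL using only shell parameter expansion.
port_from_url() {
    rest=${1#*://}          # drop the scheme:         localhost:8888/?token=...
    host_port=${rest%%/*}   # drop the path and query: localhost:8888
    printf '%s\n' "${host_port##*:}"   # keep what follows the last colon
}

# With the URL from the example output above, this prints 8888.
port_from_url 'http://localhost:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0'
```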

To create the SSH tunnel, assuming that the port is 8888, you would run the following command on your computer:

ssh -N -f -L 8888:localhost:8888 hpc

Breaking this down:

  • -N tells SSH not to actually do anything on the remote system, just establish a connection.
  • -f puts this SSH command in the background, so that we can keep using the terminal.
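  • -L sets up local port forwarding, which is what makes this a tunnel. Its argument has the form local_port:destination:remote_port; the next three items cover each part.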
  • 8888 (first one) is the port on our machine through which we connect to the tunnel.
  • localhost is the address on the other side of the tunnel that traffic should go to once it's out of the tunnel. Since we want it to go to that machine, we use "localhost".
  • 8888 (second one) is the port on the remote machine to send traffic to. This must match the port in the URL that Jupyter printed when it started.
  • hpc tells SSH what machine to connect to.
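To see where each piece of the breakdown lands, here is a small sketch that assembles the command from its parts and prints it without running it. The function name tunnel_command is made up for this example, and hpc is the placeholder host name used above.

```shell
# Build (but do not run) the tunnel command, to show where each piece goes.
tunnel_command() {
    local_port="$1"    # port on our machine
    remote_port="$2"   # port Jupyter is listening on at the remote end
    host="$3"          # machine to SSH into
    printf 'ssh -N -f -L %s:localhost:%s %s\n' "$local_port" "$remote_port" "$host"
}

# Prints: ssh -N -f -L 8888:localhost:8888 hpc
tunnel_command 8888 8888 hpc
```

Once the printed command looks right, paste it into the terminal to actually open the tunnel.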

I like to add a message that confirms the tunnel started, so I run something like:

ssh -N -f -L 8888:localhost:8888 hpc && echo 'Jupyter tunnel started on port 8888'

The message after the && will only print if the ssh command succeeds, so if we see that, we know we're good.
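You can check how && behaves without involving SSH at all. In this sketch, the standard true and false commands stand in for an ssh command that succeeds or fails; run_and_report is a hypothetical wrapper written just for the demonstration.

```shell
# "$@" is the command to run; the message prints only if it exits with status 0.
run_and_report() {
    "$@" && echo 'Jupyter tunnel started on port 8888'
}

run_and_report true                                 # success: message is printed
run_and_report false || echo 'command failed, so no success message was printed'
```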

The last step is to open a web browser and paste the link Jupyter gave us in the web browser; in this example it was http://localhost:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0. Once it connects, you should see the files on the remote machine in the Jupyter Lab file browser. If so, success! If not, see if there are any errors in the Jupyter output on the remote machine (in the terminal window where you ran the jupyter-lab command).

Extra tips
  • If the remote port changes: If Jupyter gives you a port in that "localhost" URL like 8889 or 8900, it's simplest to use the same port in both parts of the SSH command, e.g. ssh -N -f -L 8900:localhost:8900 hpc. That way you can just copy the URL Jupyter gave you and paste it into your web browser without editing it.
  • Changing the local port: If you run Jupyter on your local computer as well as remote, you might want to run the tunnel out of a different local port so that it doesn't conflict with your local Jupyter instance. To do that, change the first number in the SSH command. So if you want to use port 9000 locally and 8888 remotely, the command is ssh -N -f -L 9000:localhost:8888 ....
  • Choosing a different remote port: If lots of people run Jupyter on the cluster you're connecting to, you may find that the port Jupyter gives you bounces around each time you start it on the cluster. You can tell Jupyter to start from a port other than 8888 by adding the --port argument when you launch Jupyter Lab on the cluster, e.g. jupyter-lab --no-browser --port=9999 will have it try ports starting with 9999 instead of 8888. Since this will get you out of the port range most people use, your port number should stay the same more often.
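If the local and remote ports differ, the URL Jupyter printed needs its port changed to the local one before you paste it into the browser. A minimal sketch of that edit, assuming a URL of the http://localhost:PORT/... form shown earlier; the function name with_local_port is hypothetical.

```shell
# Replace the port in a URL such as http://localhost:8888/?token=...
with_local_port() {
    url="$1"
    new_port="$2"
    # Match scheme, host, and colon, then swap the digits that follow them.
    printf '%s\n' "$url" | sed -E 's#^(https?://[^:/]+:)[0-9]+#\1'"$new_port"'#'
}

# For a tunnel started with -L 9000:localhost:8888, the browser needs port 9000.
with_local_port 'http://localhost:8888/?token=74332ccfd156bfebca2ad4fdc9c216c328c826a4640004b0' 9000
```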

Which ports can I use? Generally, ports in the 8000s, 9000s, and 10,000s should be okay. If you choose a port already in use on the cluster, Jupyter should try up to 50 other ports by default to find one not being used. If you choose a port in use on your computer for the SSH tunnel, you should get an error message - just try a different port.

Whatever you do, stay away from port numbers less than 1024 - such ports are generally reserved for specific applications, and you probably won't be allowed to even try to use them.
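A quick sanity check for a candidate port can be sketched as follows. It relies on the usual convention on Linux and other Unix-like systems that ports below 1024 are privileged and that 65535 is the highest valid port; the function name is_unprivileged_port is made up for this example. It only checks the range, not whether the port is already in use.

```shell
# Succeed only for whole numbers in the unprivileged range 1024-65535.
is_unprivileged_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}

is_unprivileged_port 8888 && echo '8888 is in the usable range'
is_unprivileged_port 80 || echo '80 is in the reserved range'
```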