Jupyter Cache#

Execute and cache multiple Jupyter Notebook-like files via an API and CLI.

🤓 Smart re-execution

Notebooks are re-executed only when code cells (or code-related metadata) have changed, not when Markdown/Raw cells change.

🧩 Pluggable execution modes

Select the executor for notebooks, including serial and parallel execution.

📈 Execution reports

Timing statistics and exception tracebacks are stored for analysis.

📖 jupytext integration

Read and execute notebooks written in multiple formats.

Why use jupyter-cache?#

Use jupyter-cache if you have a number of notebooks whose execution outputs you want to keep up to date without re-executing them every time (particularly for long-running code, or for text-based formats that do not store the outputs).

The notebooks must have deterministic execution outputs:

  • You use the same environment to run them (e.g. the same installed packages)

  • They run no non-deterministic code (e.g. unseeded random number generation)

  • They do not depend on external resources (e.g. files or network connections) that change over time

For example, jupyter-book uses jupyter-cache to allow fast document rebuilds.
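The determinism requirement listed above can be checked with a small experiment: run the same computation twice and compare the results. The sketch below is illustrative only and is not part of jupyter-cache; `simulate` and its parameters are hypothetical names. It shows that seeding the random number generator makes otherwise random code repeatable, which is the property a cached output relies on.

```python
import random


def simulate(seed=None, n=5):
    """Return n pseudo-random integers; a fixed seed makes the result repeatable."""
    # seed=None lets Python pick an unpredictable seed, so results vary per run.
    rng = random.Random(seed)
    return [rng.randint(0, 1_000_000) for _ in range(n)]


# With a fixed seed, two "executions" produce identical outputs,
# so a previously cached result would still be valid.
first = simulate(seed=42)
second = simulate(seed=42)
print(first == second)  # True
```

Without a fixed seed, the two lists would almost always differ, and a cached output would no longer reflect what a fresh run produces.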

Installation#

Install jupyter-cache using either pip or Conda:

pip install jupyter-cache
conda install jupyter-cache

Quick-start#

Add one or more source notebook files to the “project” (a folder containing a database and a cache of executed notebooks):

$ jcache notebook add tests/notebooks/basic_unrun.ipynb tests/notebooks/basic_failing.ipynb
Cache path: ../.jupyter_cache
The cache does not yet exist, do you want to create it? [y/N]: y
Adding: ../tests/notebooks/basic_unrun.ipynb
Adding: ../tests/notebooks/basic_failing.ipynb
Success!

These files are now ready for execution:

$ jcache notebook list 
  ID  URI                                  Reader    Added             Status
----  -----------------------------------  --------  ----------------  --------
   1  tests/notebooks/basic_unrun.ipynb    nbformat  2023-11-08 18:34  -
   2  tests/notebooks/basic_failing.ipynb  nbformat  2023-11-08 18:34  -

Now run the execution:

$ jcache project execute 
Executing 2 notebook(s) in serial
Executing: ../tests/notebooks/basic_unrun.ipynb
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
Execution Successful: ../tests/notebooks/basic_unrun.ipynb
Executing: ../tests/notebooks/basic_failing.ipynb
warning: Execution Excepted: ../tests/notebooks/basic_failing.ipynb
warning: CellExecutionError: An error occurred while executing the following cell:
warning: ------------------
warning: raise Exception('oopsie!')
warning: ------------------
warning: 
warning: 
warning: ---------------------------------------------------------------------------
warning: Exception                                 Traceback (most recent call last)
warning: Cell In[1], line 1
warning: ----> 1 raise Exception('oopsie!')
warning: 
warning: Exception: oopsie!
Finished! Successfully executed notebooks have been cached.
succeeded:
- ../tests/notebooks/basic_unrun.ipynb
excepted:
- ../tests/notebooks/basic_failing.ipynb
errored: []

Successfully executed files will now be associated with a record in the cache:

$ jcache notebook list 
  ID  URI                                  Reader    Added             Status
----  -----------------------------------  --------  ----------------  --------
   1  tests/notebooks/basic_unrun.ipynb    nbformat  2023-11-08 18:34  ✅ [1]
   2  tests/notebooks/basic_failing.ipynb  nbformat  2023-11-08 18:34  ❌

The cache record includes execution statistics:

$ jcache cache info 1
ID: 1
Origin URI: ../tests/notebooks/basic_unrun.ipynb
Created: 2023-11-08 18:34
Accessed: 2023-11-08 18:34
Hashkey: 94c17138f782c75df59e989fffa64e3a
Data:
  execution_seconds: 1.191696745999252

Next time we execute, jupyter-cache will check which files require re-execution:

$ jcache project execute 
Executing 1 notebook(s) in serial
Executing: ../tests/notebooks/basic_failing.ipynb
warning: Execution Excepted: ../tests/notebooks/basic_failing.ipynb
warning: CellExecutionError: An error occurred while executing the following cell:
warning: ------------------
warning: raise Exception('oopsie!')
warning: ------------------
warning: 
warning: 
warning: ---------------------------------------------------------------------------
warning: Exception                                 Traceback (most recent call last)
warning: Cell In[1], line 1
warning: ----> 1 raise Exception('oopsie!')
warning: 
warning: Exception: oopsie!
Finished! Successfully executed notebooks have been cached.
succeeded: []
excepted:
- ../tests/notebooks/basic_failing.ipynb
errored: []
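The re-execution check rests on comparing notebooks by their code cells only. The sketch below illustrates that idea on plain notebook dictionaries; it is not jupyter-cache's actual hashing scheme, and `code_hash` is a hypothetical helper. It assumes each cell has the `cell_type` and `source` keys used by the notebook JSON format.

```python
import hashlib
import json


def code_hash(notebook):
    """Hash only the code cells (in order) of a notebook given as a dict."""
    code_sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # The notebook format allows source as a string or a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            code_sources.append(source)
    payload = json.dumps(code_sources).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


original = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "a = 1\nprint(a)"},
]}
text_edit = {"cells": [
    {"cell_type": "markdown", "source": "# A better title"},
    {"cell_type": "code", "source": "a = 1\nprint(a)"},
]}
code_edit = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "a = 2\nprint(a)"},
]}

print(code_hash(original) == code_hash(text_edit))  # True: only Markdown changed
print(code_hash(original) == code_hash(code_edit))  # False: code changed
```

Because the code sources are hashed as an ordered list, rearranging code cells also changes the digest in this sketch.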

The source files themselves are not modified during or after execution. To create a new “final” notebook, with the cached outputs merged into the source notebook, run:

$ jcache notebook merge 1 final_notebook.ipynb
Merged with cache PK 1
Success!
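Conceptually, a merge copies the outputs of each cached code cell onto the matching code cell of the source notebook, while leaving the source untouched. The sketch below illustrates that idea on plain notebook dictionaries; it is not jupyter-cache's implementation, `merge_outputs` is a hypothetical helper, and matching code cells by position is an assumption made for the example.

```python
import copy


def merge_outputs(source_nb, cached_nb):
    """Return a new notebook dict: source cells plus outputs from the cached run."""
    cached_code = [c for c in cached_nb["cells"] if c["cell_type"] == "code"]
    merged = copy.deepcopy(source_nb)  # never mutate the source notebook
    code_cells = [c for c in merged["cells"] if c["cell_type"] == "code"]
    if len(code_cells) != len(cached_code):
        raise ValueError("code cell count differs; the cached run does not match")
    # Assumption: code cells correspond one-to-one, in order.
    for cell, cached in zip(code_cells, cached_code):
        cell["outputs"] = copy.deepcopy(cached.get("outputs", []))
        cell["execution_count"] = cached.get("execution_count")
    return merged


source = {"cells": [
    {"cell_type": "markdown", "source": "Some text"},
    {"cell_type": "code", "source": "print(1)", "outputs": [], "execution_count": None},
]}
cached = {"cells": [
    {"cell_type": "code", "source": "print(1)", "execution_count": 1,
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}]},
]}

final = merge_outputs(source, cached)
print(final["cells"][1]["outputs"][0]["text"])  # the cached stdout text
print(source["cells"][1]["outputs"])            # still empty: source is unchanged
```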

You can also add notebooks with custom formats, such as those read by jupytext:

$ jcache notebook add --reader jupytext tests/notebooks/basic.md
Adding: ../tests/notebooks/basic.md
Success!
$ jcache notebook list 
  ID  URI                                  Reader    Added             Status
----  -----------------------------------  --------  ----------------  --------
   1  tests/notebooks/basic_unrun.ipynb    nbformat  2023-11-08 18:34  ✅ [1]
   2  tests/notebooks/basic_failing.ipynb  nbformat  2023-11-08 18:34  ❌
   3  tests/notebooks/basic.md             jupytext  2023-11-08 18:34  ✅ [1]

Design considerations#

Although there are certainly other use cases, the principal use case this was written for is generating books and websites from multiple notebooks (and other text documents). Notebooks should be auto-executed only if they have been modified in a way that may alter their code cell outputs.

Some desired requirements (not all implemented yet):

  • A clear and robust API

  • The cache is persistent on disk

  • Notebook comparisons separate out “edits to content” from “edits to code cells”. Rearranging cells or changing code cells should require re-execution; changes to text content should not.

  • Allow parallel access to notebooks (for execution)

  • Store execution statistics/reports.

  • Store external assets: notebooks being executed often require external assets, such as imported scripts or data, which are prepared by the user.

  • Store execution artefacts (those created during execution)

  • Transparent and robust cache invalidation: for example, when the user updates an external dependency or a Python module, or checks out a different git branch.
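One possible way to approach the cache invalidation requirement is to fold a fingerprint of the environment into the cache key, so that a changed dependency yields a different key. The sketch below is purely hypothetical and does not describe how jupyter-cache works; `environment_fingerprint` and `cache_key` are invented names, and the list of packages to track is an assumption the caller would supply.

```python
import hashlib
import json
import sys
from importlib import metadata


def environment_fingerprint(packages):
    """Hash the Python version and the installed versions of the named packages."""
    versions = {}
    for name in sorted(packages):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record "not installed" explicitly
    payload = json.dumps(
        {"python": list(sys.version_info[:3]), "packages": versions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_key(code_digest, packages):
    """Combine a digest of the notebook code with the environment fingerprint."""
    combined = f"{code_digest}:{environment_fingerprint(packages)}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


# The key is stable while code and environment are unchanged,
# and differs as soon as the code digest changes.
key_a = cache_key("digest-of-code-v1", ["numpy", "pandas"])
key_b = cache_key("digest-of-code-v1", ["numpy", "pandas"])
key_c = cache_key("digest-of-code-v2", ["numpy", "pandas"])
print(key_a == key_b, key_a == key_c)  # True False
```

A fingerprint like this would not notice changes to local modules or data files; tracking those would need additional inputs, such as file content hashes.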

Contents#