nbio

Reading and writing Jupyter notebooks

Reading a notebook

A notebook is just a JSON file.


def _read_json(self, encoding=None, errors=None):
    return loads(Path(self).read_text(encoding=encoding, errors=errors))

minimal_fn = Path('../tests/minimal.ipynb')
minimal_txt = AttrDict(_read_json(minimal_fn))
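
Since the file is plain JSON, the standard library alone can read it. Here is a minimal sketch that parses an inline string instead of minimal.ipynb. Its contents are illustrative, not the actual test file:

```python
import json

# Illustrative notebook text; on disk, multiline strings are stored as lists of lines
raw = '''{"cells": [{"cell_type": "code", "execution_count": null, "id": "e2147a69",
  "metadata": {}, "outputs": [], "source": ["# Do some arithmetic\\n", "1+1"]}],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 5}'''

nb = json.loads(raw)
print(sorted(nb))  # ['cells', 'metadata', 'nbformat', 'nbformat_minor']
for cell in nb['cells']:
    # Join the list-of-lines form back into a single string
    print(cell['cell_type'], repr(''.join(cell['source'])))
```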

It contains two sections, the metadata…:

minimal_txt.metadata

{'solveit_dialog_mode': 'learning', 'solveit_ver': 2}

…and, more importantly, the cells:

minimal_txt.cells

[{'cell_type': 'markdown',
  'id': '801558df',
  'metadata': {},
  'source': ['## A minimal notebook']},
 {'cell_type': 'code',
  'execution_count': None,
  'id': 'e2147a69',
  'metadata': {'time_run': '2026-01-04T20:52:49.901559+00:00'},
  'outputs': [{'data': {'text/plain': ['2']},
    'execution_count': 0,
    'metadata': {},
    'output_type': 'execute_result'}],
  'source': ['# Do some arithmetic\n', '1+1']}]

The second cell here is a code cell. Its execution_count is None, but it does carry a saved execute_result output. To execute a notebook, we first need to convert it into a format suitable for nbclient, which expects some dict keys to be available as attrs and others as regular dict keys. Normally nbformat is used for this step, but it’s rather slow and inflexible, so we’ll write our own function based on fastcore’s handy dict2obj, which makes all keys available as both attrs and keys.
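
Here is a minimal sketch of the idea. It assumes only that both access styles should work on every nested dict. AttrDictSketch and to_attrdict are hypothetical stand-ins, not fastcore’s AttrDict or dict2obj:

```python
class AttrDictSketch(dict):
    "Hypothetical stand-in: keys are readable and writable as attributes too."
    def __getattr__(self, k):
        try: return self[k]
        except KeyError: raise AttributeError(k) from None
    def __setattr__(self, k, v): self[k] = v

def to_attrdict(o):
    "Recursively convert dicts (including those inside lists) to AttrDictSketch."
    if isinstance(o, dict): return AttrDictSketch({k: to_attrdict(v) for k, v in o.items()})
    if isinstance(o, list): return [to_attrdict(v) for v in o]
    return o

nb = to_attrdict({'cells': [{'cell_type': 'code', 'source': '1+1', 'metadata': {}}], 'metadata': {}})
print(nb.cells[0].cell_type, nb['cells'][0]['source'])  # both access styles work
```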

nb_lang

def nb_lang(
    nb
):

Return the notebook’s language from its kernelspec, defaulting to python

Each notebook language has its own comment character(s), taken from Quarto’s table; nb_lang reads the notebook’s language from its kernelspec, defaulting to Python. Cells record their language in lang_ (a per-cell metadata.language overrides the notebook default), which the directive functions below use to recognize comments.
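
The following is a minimal sketch of the lookup described above. COMMENT_CHARS, nb_lang_sketch and cell_lang_sketch are hypothetical. The table is a small illustrative subset using each language’s usual line-comment prefix, not Quarto’s actual table:

```python
# Hypothetical subset of a language -> comment-prefix table
COMMENT_CHARS = {'python': '#', 'r': '#', 'julia': '#', 'bash': '#',
                 'javascript': '//', 'typescript': '//', 'rust': '//', 'sql': '--'}

def nb_lang_sketch(nb: dict) -> str:
    "Notebook language from metadata.kernelspec.language, defaulting to python."
    return nb.get('metadata', {}).get('kernelspec', {}).get('language', 'python')

def cell_lang_sketch(cell: dict, nb_lang: str) -> str:
    "A per-cell metadata.language overrides the notebook default."
    return cell.get('metadata', {}).get('language', nb_lang)

nb = {'metadata': {'kernelspec': {'language': 'r'}},
      'cells': [{'metadata': {}}, {'metadata': {'language': 'sql'}}]}
lang = nb_lang_sketch(nb)
print([COMMENT_CHARS[cell_lang_sketch(c, lang)] for c in nb['cells']])  # ['#', '--']
```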

NbCell

def NbCell(
    idx, cell, lang:str='python'
):

dict subclass that also provides access to keys as attrs, and has a pretty markdown repr

We use an AttrDict subclass which has some basic functionality for accessing notebook cells.

Two cells are equal if they have the same id, even when their content differs. The id identifies which cell this is. That is what lets list.index/list.remove (used by Notebook’s cell-moving methods) find the right cell even when another cell elsewhere has identical source. Cells lacking an id fall back to comparing source and cell_type.

c1 = NbCell(0, dict(cell_type='code', source='a', id='x'))
c2 = NbCell(0, dict(cell_type='code', source='b', id='x'))
c3 = NbCell(0, dict(cell_type='code', source='a', id='y'))
test_eq(c1, c2)      # same id, different content -- still equal
assert c1 != c3      # same content, different id -- not equal

cells = [c1, c3]  # duplicate content ('a'), different ids
test_eq(cells.index(c3), 1)  # finds the actual cell, not the first with matching content
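
The equality rule can be sketched on a plain dict subclass. CellSketch is hypothetical, not fastcore’s NbCell. A dict subclass that overrides __eq__ should also define __ne__. Otherwise != can fall through to dict’s own content comparison:

```python
class CellSketch(dict):
    "Hypothetical cell type illustrating id-based equality."
    def __eq__(self, other):
        if not isinstance(other, dict): return NotImplemented
        a, b = self.get('id'), other.get('id')
        if a is not None and b is not None: return a == b
        # Fall back to content when either side has no id
        return (self.get('cell_type'), self.get('source')) == (other.get('cell_type'), other.get('source'))
    def __ne__(self, other):
        eq = self.__eq__(other)
        return eq if eq is NotImplemented else not eq
    __hash__ = None  # mutable and compared by id, so unhashable like dict
```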

dict2nb

def dict2nb(
    js:NoneType=None, **kwargs
):

Convert dict js to an AttrDict,

We can now convert our JSON into this nbclient-compatible format, which pretty prints the source code of cells in notebooks.

minimal = dict2nb(minimal_txt)
cell = minimal.cells[1]
cell

{ 'cell_type': 'code',
  'execution_count': None,
  'id': 'e2147a69',
  'idx_': 1,
  'lang_': 'python',
  'metadata': {'time_run': '2026-01-04T20:52:49.901559+00:00'},
  'outputs': [ { 'data': {'text/plain': '2'},
                 'execution_count': 0,
                 'metadata': {},
                 'output_type': 'execute_result'}],
  'source': '# Do some arithmetic\n1+1'}

The abstract syntax tree of a code cell’s source is available from the parsed_ method:

cell.parsed_(), cell.parsed_()[0].value.op

([1 + 1], )

read_nb

def read_nb(
    path
):

Return notebook at path

This reads the JSON for the file at path and converts it with dict2nb. For instance:

minimal = read_nb(minimal_fn)
str(minimal.cells[0])

"{'cell_type': 'markdown', 'id': '801558df', 'metadata': {}, 'source': '## A minimal notebook', 'idx_': 0, 'lang_': 'python'}"

The file name read is stored in path_:

minimal.path_

'../tests/minimal.ipynb'

Creating a notebook

mk_cell

def mk_cell(
    text, # `source` attr in cell
    cell_type:str='code', # `cell_type` attr in cell
    **kwargs
):

Create an NbCell containing text

mk_cell('print(1)', execution_count=0)

{ 'cell_type': 'code',
  'directives_': {},
  'execution_count': 0,
  'id': '37dd5d5e',
  'idx_': 0,
  'lang_': 'python',
  'metadata': {},
  'outputs': [],
  'source': 'print(1)'}

new_nb

def new_nb(
    cells:NoneType=None, meta:NoneType=None, nbformat:int=4, nbformat_minor:int=5
):

Returns an empty new notebook

Use this function to build a notebook in memory, when you don’t want to create a notebook on disk first and then read it back.

test_eq(new_nb().cells, [])
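
Here is a minimal sketch of both helpers over plain dicts. mk_cell_sketch and new_nb_sketch are hypothetical. The 8-hex-digit random id only imitates the look of the ids shown above. Only code cells get outputs and execution_count:

```python
import uuid

def mk_cell_sketch(text: str, cell_type: str = 'code', **kwargs) -> dict:
    "Hypothetical: a new cell dict with a random 8-hex-digit id (overridable via kwargs)."
    cell = dict(cell_type=cell_type, id=uuid.uuid4().hex[:8], metadata={}, source=text)
    if cell_type == 'code': cell.update(execution_count=None, outputs=[])  # code-only keys
    cell.update(kwargs)
    return cell

def new_nb_sketch(cells=None, meta=None, nbformat=4, nbformat_minor=5) -> dict:
    "An in-memory notebook dict; nothing is written to disk."
    return dict(cells=list(cells or []), metadata=meta or {}, nbformat=nbformat, nbformat_minor=nbformat_minor)

nb = new_nb_sketch([mk_cell_sketch('print(1)', execution_count=0), mk_cell_sketch('# Title', 'markdown')])
print([(c['cell_type'], c['source']) for c in nb['cells']])
```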

Directives

nbdev and Quarto put directives in comments at the top of a cell: lines like #| export or #| eval: false, ending at the first non-comment line. All spellings are equivalent: #| foo: bar, #| foo:bar, and #| foo bar parse to the same thing, and a directive’s value is the raw text after its name, so nothing is lost to tokenization. A value of true means the same as no value at all: #| hide, #| hide: true, and #| hide: are one directive (which also means no directive can take the literal word true as its value).

Directives can also be stored in cell metadata, as a dict under an nbdev key, with the same value domain: every value is a str, with "true" meaning bare (non-str values raise, so a JSON false fails loudly rather than silently differing from "false"). When the same name appears in both places, the comment wins.
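
The following is a minimal sketch of the metadata merge rules. merge_directives is a hypothetical helper, not part of fastcore. It assumes comment directives have already been parsed into a dict:

```python
def merge_directives(comment_dirs: dict, meta_dirs: dict) -> dict:
    "Hypothetical: combine metadata directives with comment directives (comments win)."
    merged = {}
    for name, val in meta_dirs.items():
        # Every metadata value must be a str; a JSON false must not pass silently
        if not isinstance(val, str):
            raise TypeError(f'nbdev directive {name!r} must be str, got {type(val).__name__}')
        merged[name] = '' if val == 'true' else val  # "true" means bare
    merged.update(comment_dirs)  # on a name clash the comment's value replaces metadata's
    return merged

print(merge_directives({'export': 'utils'}, {'export': 'other', 'eval': 'false', 'hide': 'true'}))
# {'export': 'utils', 'eval': 'false', 'hide': ''}
```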

first_code_ln

def first_code_ln(
    code_list, re_pattern:NoneType=None, lang:str='python'
):

get first line number where code occurs, where code_list is a list of code

_tst = """ 
#| default_exp
 #| export
#| hide_input
foo
"""
test_eq(first_code_ln(_tst.splitlines(True)), 4)

_directive parses one directive line into its name and value. The value is the raw text after the name (and optional colon), byte-exact except that one leading space and any trailing whitespace are removed, so the colon and space spellings parse identically; a value of true collapses to '', the same as a bare directive. Non-directive lines (including cell magics) parse to None.

test_eq(_directive('#| export: utils'), ('export','utils'))
test_eq(_directive('#| export utils'),  ('export','utils'))
test_eq(_directive('#| export:utils'),  ('export','utils'))
test_eq(_directive('#| hide'), ('hide',''))
test_eq(_directive('#| hide:'), ('hide',''))
test_eq(_directive('#| hide: true'), ('hide',''))
test_eq(_directive(' # | woo:baz'), ('woo','baz'))
test_eq(_directive('#| fig-cap: "two  spaces: kept"'), ('fig-cap','"two  spaces: kept"'))
test_eq(_directive('#| filter_stream secret apikey'), ('filter_stream','secret apikey'))
test_eq(_directive('%%timeit'), None)
test_eq(_directive('# plain comment'), None)
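
The parsing rules above fit in one regular expression. This is a minimal sketch, assuming a directive name is made of word characters and hyphens. parse_directive and _DIRECTIVE are hypothetical, not the private _directive itself:

```python
import re

# Optional indent, '#', optional spaces, '|', then a name that must be
# followed by ':', whitespace, or end of line
_DIRECTIVE = re.compile(r'^\s*#\s*\|\s*([\w-]+)(?=[:\s]|$):?(.*)$')

def parse_directive(line: str):
    "Hypothetical: (name, value) for a directive line, else None."
    m = _DIRECTIVE.match(line)
    if not m: return None
    name, val = m.group(1), m.group(2)
    if val.startswith(' '): val = val[1:]  # drop exactly one leading space
    val = val.rstrip()                      # and any trailing whitespace
    return name, ('' if val == 'true' else val)
```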

directives reads as a plain dict of name to value string: {'export': 'utils', 'hide': ''} (bare directives have value ''). The getter returns a copy, so to edit, modify the copy and assign it back; the setter regenerates the comment block in canonical colon form (cell magics stay first, untouched). Directives that came from cell metadata are tracked by name and written back to metadata rather than to comments; names added through the setter become comments.

c = mk_cell('#| export: utils\n#| hide\ndef f(): pass')
test_eq(c.directives, {'export':'utils', 'hide':''})

d = c.directives
d.pop('hide')
d['eval'] = 'false'
c.directives = d
test_eq(c.source, '#| export: utils\n#| eval: false\ndef f(): pass')
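
Regenerating the comment block can be sketched in the same way. set_directives is hypothetical and covers only the comment side. It keeps leading cell magics, drops the old directive lines, and writes each entry in canonical colon form. The real setter also routes metadata-sourced names back to metadata:

```python
import re

_DIRECTIVE = re.compile(r'^\s*#\s*\|\s*([\w-]+)(?=[:\s]|$):?(.*)$')

def set_directives(source: str, dirs: dict) -> str:
    "Hypothetical: rebuild a cell's leading directive block in canonical form."
    lines = source.split('\n')
    magics, i = [], 0
    # Cell magics (%%...) stay first, untouched
    while i < len(lines) and lines[i].startswith('%%'):
        magics.append(lines[i]); i += 1
    # Skip the existing directive lines at the top of the cell
    while i < len(lines) and _DIRECTIVE.match(lines[i]):
        i += 1
    # Bare directives (value '') are written without a colon
    block = [f'#| {k}: {v}' if v else f'#| {k}' for k, v in dirs.items()]
    return '\n'.join(magics + block + lines[i:])
```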

Metadata directives merge in (comments win on conflict), and edits route back to where each directive came from. The setter writes bare directives to metadata as "true":

c = mk_cell('#| export: utils\n1', metadata=dict(nbdev=dict(export='other', eval='false')))
test_eq(c.directives, {'export':'utils', 'eval':'false'})

d = c.directives
d['eval'] = ''
d['hide'] = ''
c.directives = d
test_eq(c.metadata['nbdev'], {'eval': 'true'})
test_eq(c.source, '#| export: utils\n#| hide\n1')
test_eq(c.directives, {'export':'utils', 'hide':'', 'eval':''})

with expect_fail(TypeError, 'must be str'): mk_cell('1', metadata=dict(nbdev=dict(eval=False))).directives

NbCell.remove_directives

def remove_directives(
    quarto:bool=False
):

Strip directives from source, keeping cell magics; with quarto, instead materialize every directive (metadata included) as a Quarto option line

NbCell.has_directive

def has_directive(
    name
):

Whether directive name is present (bare directives included)

NbCell.directive

def directive(
    name, default:NoneType=None
):

Value of directive name ('' if bare), or default if absent

directive answers “what value?” and has_directive answers “is it there?”: a bare directive’s value is '', which is falsy, so presence tests need has_directive. remove_directives serves the two pipeline consumers: the default strips every directive line (module export needs clean source), while quarto=True instead rewrites the cell with every directive (comments and metadata alike) as a Quarto option line, bare ones as name: true; Quarto consumes the options it knows and quietly turns unknown ones into data-* attributes.

c = mk_cell('%%time\n#| exports: utils\n#| hide\n#| code-fold: show\nslow()', metadata=dict(nbdev=dict(echo='false')))
test_eq(c.directive('exports'), 'utils')
assert c.has_directive('hide') and not c.has_directive('eval')

c.remove_directives(quarto=True)
test_eq(c.source, '%%time\n#| exports: utils\n#| hide: true\n#| code-fold: show\n#| echo: false\nslow()')

c = mk_cell('%%time\n#| exports: utils\n#| hide\n#| code-fold: show\nslow()', metadata=dict(nbdev=dict(echo='false')))
c.remove_directives()
test_eq(c.source, '%%time\nslow()')
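
The Quarto rendering and the presence rule can each be sketched in a few lines. quarto_options is a hypothetical helper, not part of fastcore:

```python
def quarto_options(dirs: dict) -> list:
    "Hypothetical: render directives as Quarto option lines, bare ones as `name: true`."
    return [f'#| {name}: {val or "true"}' for name, val in dirs.items()]

dirs = {'exports': 'utils', 'hide': '', 'code-fold': 'show', 'echo': 'false'}
print(quarto_options(dirs))

# A bare directive's value is '', which is falsy, so test presence with `in`, not truthiness
print(bool(dirs.get('hide')), 'hide' in dirs)  # False True
```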

Writing a notebook

nb2dict

def nb2dict(
    d, k:NoneType=None
):

Convert parsed notebook to dict

This returns the exact same dict as is read from the notebook JSON.

minimal_fn = Path('../tests/minimal.ipynb')
minimal = read_nb(minimal_fn)
minimal_dict = _read_json(minimal_fn)
assert minimal_dict==nb2dict(minimal)

nb2str

def nb2str(
    nb
):

Convert nb to a str

To save a notebook we first need to convert it to a str:

print(nb2str(minimal)[:45])

{
 "cells": [
  {
   "cell_type": "markdown",
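
Here is a minimal sketch of the serialization step. It assumes the conventions nbformat uses when Jupyter saves a notebook: one-space indent, sorted keys, non-ASCII written as-is, and a trailing newline. nb2str_sketch is a hypothetical name. It does not split multiline strings back into lists of lines, so it expects a dict already in disk form:

```python
import json

def nb2str_sketch(nb: dict) -> str:
    "Hypothetical: serialize a notebook dict with (assumed) nbformat conventions."
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + '\n'

nb = {'nbformat': 4, 'nbformat_minor': 5, 'metadata': {},
      'cells': [{'cell_type': 'markdown', 'id': 'a1', 'metadata': {}, 'source': ['## Café']}]}
s = nb2str_sketch(nb)
print(s[:45])
```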

write_nb

def write_nb(
    nb, path
):

Write nb to path

The file written contains exactly the same string as Jupyter saves:

tmp = Path('tmp.ipynb')
try:
    minimal_txt = minimal_fn.read_text()
    write_nb(minimal, tmp)
    test_eq(minimal_txt, tmp.read_text())
finally: tmp.unlink()

Cell tools

Cell tools apply fastcore.tools’ string editing primitives to one notebook cell’s source, addressed by path and cell id, mirroring that module’s file tools: the same operations and parameters, with path, cell_id in place of path. Each editor (including the structural cell_ast_replace) returns a diff of the change, and view_cell shows a cell’s source with optional line numbers or exhash addresses.

Naming and parameter conventions shared across the editing toolkit are documented in fastcore.editskill, which also re-exports this module’s editing tools.

cell_edit

def cell_edit(
    f, name:NoneType=None
):

Wrap text editor f as a cell editing function: path, cell_id addressing, diff-or-error return
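
The wrapping idea behind cell_edit can be sketched with the standard library. The idea is to lift any str-to-str editor into a function that takes path and cell_id and returns a unified diff or an "error:" string. cell_edit_sketch and the toy str_replace editor are hypothetical, since the real editors come from fastcore.tools. Unlike the real view_cell, which raises KeyError for an unknown id, this sketch reports a missing id as an error string:

```python
import difflib, json
from pathlib import Path

def cell_edit_sketch(f):
    "Hypothetical: lift a str -> str editor `f` to act on one notebook cell."
    def _inner(path, cell_id, *args, **kwargs):
        p = Path(path)
        nb = json.loads(p.read_text())
        # Exact id match first, else a unique prefix
        cells = ([c for c in nb['cells'] if c.get('id') == cell_id]
                 or [c for c in nb['cells'] if (c.get('id') or '').startswith(cell_id)])
        if len(cells) != 1: return f'error: no unique cell id {cell_id!r}'
        cell = cells[0]
        old = ''.join(cell['source']) if isinstance(cell['source'], list) else cell['source']
        try: new = f(old, *args, **kwargs)
        except ValueError as e: return f'error: {e}'
        cell['source'] = new.splitlines(True)  # back to the on-disk list-of-lines form
        p.write_text(json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + '\n')
        return ''.join(difflib.unified_diff(old.splitlines(True), new.splitlines(True), 'before', 'after'))
    return _inner

def str_replace(text, old, new):
    "Toy editor: replace exactly one occurrence of `old`."
    if text.count(old) != 1: raise ValueError(f'{old!r} must occur exactly once')
    return text.replace(old, new)

cell_str_replace_sketch = cell_edit_sketch(str_replace)
```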

view_cell

def view_cell(
    path:str, # Notebook file to read
    cell_id:str, # Id of the cell to view (exact, or unique prefix)
    start_line:int=1, # Starting line to view
    end_line:int=None, # End line (defaults to last line if None; may be past EOF, which clamps to the last line)
    nums:bool=True, # Show line numbers?
    lnhashs:bool=False, # Show exhash `lineno|hash|` addresses instead of line numbers?
):

View a cell’s source, optionally limited to 1-based line range

A round-trip over a temporary notebook exercises each editor and view mode:

tmp_nb = Path('tmp_cells.ipynb')
write_nb(new_nb([mk_cell('a=1\nprint(a)'), mk_cell('# title', 'markdown')]), tmp_nb)
cid = read_nb(tmp_nb).cells[0].id
test_eq(str(view_cell(tmp_nb, cid)), '1: a=1\n2: print(a)')
test_eq(str(view_cell(tmp_nb, cid, start_line=2, nums=False)), 'print(a)')
test_eq(str(view_cell(tmp_nb, cid, lnhashs=True)).splitlines()[0], lnhash(1,'a=1')+'a=1')
res = cell_str_replace(tmp_nb, cid, 'a=1', 'a=2')
assert '-a=1' in str(res) and '+a=2' in str(res)
test_eq(read_nb(tmp_nb).cells[0].source, 'a=2\nprint(a)')
cell_insert_line(tmp_nb, cid, 0, 'import sys')
cell_del_lines(tmp_nb, cid, 1, 1)
test_eq(read_nb(tmp_nb).cells[0].source, 'a=2\nprint(a)')
cell_replace_lines(tmp_nb, cid, new_content='b=3')
test_eq(read_nb(tmp_nb).cells[0].source, 'b=3\n')  # replace_lines yields a line block, so a trailing newline
assert str(cell_str_replace(tmp_nb, cid, 'q', 'r')).startswith('error:')
with expect_fail(KeyError, 'no cell id'): view_cell(tmp_nb, 'zzzz')
tmp_nb.unlink()

Here’s how to put all the pieces of fastcore.nbio together:

nb = new_nb([mk_cell('print(1)')])
path = Path('test.ipynb')
write_nb(nb, path)
nb2 = read_nb(path)
print(nb2.cells)
path.unlink()

[{'cell_type': 'code', 'execution_count': None, 'id': 'b00e7cbc', 'metadata': {}, 'outputs': [], 'source': 'print(1)', 'idx_': 0, 'lang_': 'python'}]

Notebook files on disk store multiline text as lists of lines (nbformat’s split_lines), for source, stream text, textual mime data, and attachments. Like nbformat’s own reader, we join these to plain strings on read, and split them back on write, so in-memory code always sees strings while files round-trip byte-identically with Jupyter’s. JSON mimes and error tracebacks are genuinely structured, so they pass through untouched.

disk = dict(nbformat=4, nbformat_minor=5, metadata={}, cells=[
    dict(cell_type='code', id='c0', metadata={}, execution_count=1, source=['a=1\n','a'],
         outputs=[dict(output_type='execute_result', metadata={}, execution_count=1,
                       data={'text/plain':['hi\n','there'], 'text/markdown':['single'],
                             'application/json':{'a':[1,2]}, 'image/png':'iVBOR\nw0KG=='}),
                  dict(output_type='stream', name='stdout', text=['s1\n','s2\n']),
                  dict(output_type='error', ename='E', evalue='e', traceback=['t1','t2'])]),
    dict(cell_type='markdown', id='m0', metadata={}, source=['# t\n','x'],
         attachments={'im.txt':{'text/plain':['a\n','b']}, 'im.png':{'image/png':'aGk='}})])
tmp = Path('tmp.ipynb')
try:
    tmp.write_text(dumps(disk))
    nb = read_nb(tmp)
    res,strm,err = nb.cells[0].outputs
    test_eq(res['data']['text/plain'], 'hi\nthere')        # textual data joined on read
    test_eq(res['data']['text/markdown'], 'single')        # one-element lists too
    test_eq(res['data']['application/json'], {'a':[1,2]})  # JSON mimes untouched
    test_eq(res['data']['image/png'], 'iVBOR\nw0KG==')
    test_eq(strm['text'], 's1\ns2\n')                      # stream text joined
    test_eq(err['traceback'], ['t1','t2'])                 # tracebacks stay lists
    test_eq(nb.cells[1].attachments['im.txt']['text/plain'], 'a\nb')
    test_eq(nb2dict(nb), disk)                             # splitting on save restores the disk form exactly
finally: tmp.unlink()
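
The read/write asymmetry can be sketched with plain functions. join_lines, split_lines, read_mimebundle and write_mimebundle are hypothetical names. This simplified version treats only application/json as structured and splits only text/* mimes on write. nbformat’s own rules may cover a few more types:

```python
def join_lines(v):
    "Join the list-of-lines form into one str; leave anything else alone."
    return ''.join(v) if isinstance(v, list) and all(isinstance(s, str) for s in v) else v

def split_lines(s):
    "Split a str back into lines that keep their newline, as stored on disk."
    return s.splitlines(True) if isinstance(s, str) else s

def read_mimebundle(data: dict) -> dict:
    # Textual mimes are joined; the JSON mime (structured values) passes through untouched
    return {k: (v if k == 'application/json' else join_lines(v)) for k, v in data.items()}

def write_mimebundle(data: dict) -> dict:
    # Only text/* values are split back; e.g. base64 image/png stays a single str
    return {k: (split_lines(v) if k.startswith('text/') else v) for k, v in data.items()}
```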

Validation

Direct edits to cell dicts can produce notebooks that Jupyter and other tools reject: an outputs key on a markdown cell, a code cell missing execution_count, duplicate ids. validate_cell and validate_nb are cheap structural checks for exactly those mistakes. They raise ValueError naming the offending cell, and never repair: fixing is the caller’s decision. This is not the full nbformat schema, just the rules whose violation breaks notebooks in practice.

validate_nb

def validate_nb(
    nb
):

Raise ValueError for structural problems in notebook nb; returns it unchanged if fine

validate_cell

def validate_cell(
    cell, idx:NoneType=None
):

Raise ValueError for structural problems in notebook cell dict cell; returns it unchanged if fine

A valid notebook passes through unchanged, and each rule fails loudly, naming the cell (the markdown-with-outputs case is a real one: stray keys from hand-edited files):

vnb = new_nb([mk_cell('1+1'), mk_cell('a note', 'markdown')])
test_eq(validate_nb(vnb), vnb)
vnb.cells[1]['outputs'] = []
with expect_fail(ValueError, 'not allowed in a markdown cell'): validate_nb(vnb)
del vnb.cells[1]['outputs'], vnb.cells[0]['execution_count']
with expect_fail(ValueError, 'requires execution_count'): validate_nb(vnb)

dup = new_nb([mk_cell('a', id='x1'), mk_cell('b', id='x1')])
with expect_fail(ValueError, 'duplicate cell id'): validate_nb(dup)
with expect_fail(ValueError, 'unknown cell_type'): validate_cell(dict(cell_type='wat', source=''))
with expect_fail(ValueError, 'str or list of str'): validate_cell(dict(cell_type='raw', source=[1,2]))
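
Here is a minimal sketch of a subset of these checks over a list of cell dicts. validate_cells_sketch is hypothetical. Its messages only echo the fragments matched in the tests above:

```python
def validate_cells_sketch(cells: list) -> list:
    "Hypothetical subset of the rules above: raise ValueError naming the cell; never repair."
    seen = set()
    for i, c in enumerate(cells):
        ct = c.get('cell_type')
        if ct not in ('code', 'markdown', 'raw'):
            raise ValueError(f'cell {i}: unknown cell_type {ct!r}')
        src = c.get('source', '')
        if not (isinstance(src, str) or (isinstance(src, list) and all(isinstance(s, str) for s in src))):
            raise ValueError(f'cell {i}: source must be str or list of str')
        if ct == 'code':
            if 'execution_count' not in c:
                raise ValueError(f'cell {i}: code cell requires execution_count')
        else:
            for k in ('outputs', 'execution_count'):
                if k in c: raise ValueError(f'cell {i}: {k} not allowed in a {ct} cell')
        cid = c.get('id')
        if cid is not None:
            if cid in seen: raise ValueError(f'cell {i}: duplicate cell id {cid!r}')
            seen.add(cid)
    return cells
```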

repair_nb

def repair_nb(
    nb
):

Fix deterministic structural problems in nb, returning a list of repairs made

repair_cell

def repair_cell(
    cell, idx:NoneType=None
):

Fix deterministic structural problems in cell, returning a list of repairs made

Each validation rule has a deterministic repair, applied by repair_nb: non-code cells lose stray outputs/execution_count, code cells gain missing ones, broken source/metadata are coerced, missing notebook fields are added, and duplicate cell ids are regenerated (the first occurrence keeps the id). Unknown cell_types are left alone, since no repair can know the intent. The returned list says what was done, so callers can report or count repairs; a valid notebook returns [] and is untouched.

bad = dict2nb(dict(cells=[
    dict(cell_type='markdown', source='hi', outputs=[], execution_count=1, id='x1'),
    dict(cell_type='code', source='1+1', id='x1'),
], metadata={}, nbformat=4, nbformat_minor=5))
with expect_fail(ValueError): validate_nb(bad)

repairs = repair_nb(bad)
validate_nb(bad)
test_eq(len(repairs), 5)
assert bad.cells[0].id != bad.cells[1].id
test_eq(repair_nb(bad), [])
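
The repair rules can be sketched as a companion to that check. repair_cells_sketch is hypothetical. Like repair_nb, it mutates in place and returns descriptions of what it changed. It omits source/metadata coercion and notebook-level fields, and it assumes new ids are 8 random hex digits:

```python
import uuid

def repair_cells_sketch(cells: list) -> list:
    "Hypothetical: apply deterministic fixes in place and return what was done."
    repairs, seen = [], set()
    for i, c in enumerate(cells):
        ct = c.get('cell_type')
        if ct in ('markdown', 'raw'):
            for k in ('outputs', 'execution_count'):
                if k in c:
                    del c[k]; repairs.append(f'cell {i}: removed {k}')
        elif ct == 'code':
            for k, default in (('outputs', []), ('execution_count', None)):
                if k not in c:
                    c[k] = default; repairs.append(f'cell {i}: added {k}')
        # Unknown cell_types are left alone: no repair can know the intent
        cid = c.get('id')
        if cid is not None:
            if cid in seen:  # the first occurrence keeps its id
                cid = c['id'] = uuid.uuid4().hex[:8]
                repairs.append(f'cell {i}: regenerated duplicate id')
            seen.add(cid)
    return repairs
```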

Output rendering

preferred_out

def preferred_out(
    data, html1st:bool=True, include_imgs:bool=False
):

Preferred mime type and content from an output’s data dict

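preferred_out selects the best MIME type from an output’s data dict, preferring HTML by default. With html1st=False, markdown is tried before HTML, so when there is no markdown (as here) HTML is still chosen: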

data = dict(text_plain=['42'], **{'text/html': ['<b>42</b>'], 'text/plain': ['42']})
preferred_out(data), preferred_out(data, html1st=False)

(('text/html', ['<b>42</b>']), ('text/html', ['<b>42</b>']))
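
Here is a minimal sketch of this selection. It assumes the preference order implied by the examples: HTML before markdown by default, markdown first when html1st=False, and plain text last. preferred_mime is hypothetical and ignores images:

```python
def preferred_mime(data: dict, html1st: bool = True):
    "Hypothetical: first available mime in an assumed preference order, else None."
    order = ['text/html', 'text/markdown'] if html1st else ['text/markdown', 'text/html']
    for mime in order + ['text/plain']:
        if mime in data: return mime, data[mime]
    return None
```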

mk_error

def mk_error(
    traceback, ename:str='', evalue:str=''
):

Helper to create an error output dict

mk_display

def mk_display(
    metadata:NoneType=None, **data
):

Helper to create a display_data output dict

mk_result

def mk_result(
    metadata:NoneType=None, **data
):

Helper to create an execute_result output dict

mk_stream

def mk_stream(
    name, text
):

Helper to create an output stream dict

concat_streams

def concat_streams(
    outputs
):

Concatenate stream outputs by name (stdout/stderr), preserving execute_result at end

concat_streams merges consecutive stream outputs by name and moves execute_results to the end, like standard Jupyter output rendering:

outs = [mk_result(text_plain=['42']),
        mk_stream('stdout', 'hello '), mk_stream('stdout', 'world\n'), mk_stream('stderr', 'warn\n')]
outs

[{'output_type': 'execute_result',
  'data': {'text/plain': ['42']},
  'metadata': {}},
 {'output_type': 'stream', 'name': 'stdout', 'text': 'hello '},
 {'output_type': 'stream', 'name': 'stdout', 'text': 'world\n'},
 {'output_type': 'stream', 'name': 'stderr', 'text': 'warn\n'}]

concat_streams(outs)

[{'output_type': 'stream', 'name': 'stdout', 'text': 'hello world\n'},
 {'output_type': 'stream', 'name': 'stderr', 'text': 'warn\n'},
 {'output_type': 'execute_result',
  'data': {'text/plain': ['42']},
  'metadata': {}}]
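
Here is a simplified sketch of the merge. concat_streams_sketch is hypothetical. It groups every chunk of a stream by name, even non-consecutive ones, and moves all non-stream outputs to the end. This may differ from fastcore’s exact handling of interleaved outputs:

```python
def concat_streams_sketch(outputs: list) -> list:
    "Hypothetical: merge stream text by name, then append all other outputs in order."
    streams, others = {}, []
    for o in outputs:
        if o.get('output_type') == 'stream':
            text = o['text'] if isinstance(o['text'], str) else ''.join(o['text'])
            if o['name'] in streams: streams[o['name']]['text'] += text
            else: streams[o['name']] = dict(o, text=text)  # copy, so the input is not mutated
        else: others.append(o)
    return list(streams.values()) + others
```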

Carriage-return overwrites apply across chunk boundaries, as they would on a live terminal. A progress line ending in \r is overwritten by the next chunk, while a final \r with nothing after it leaves the text visible:

prog = [mk_stream('stdout', 'step 1\r'), mk_stream('stdout', 'step 2\r'), mk_stream('stdout', 'done\n')]
test_eq(concat_streams(prog), [mk_stream('stdout', 'done\n')])
test_eq(concat_streams([mk_stream('stdout', 'working\r')]), [mk_stream('stdout', 'working')])
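
Here is a simplified sketch of the overwrite rule, applied to already-concatenated stream text. apply_cr is hypothetical. It replaces the whole line at each \r, whereas a real terminal overwrites character by character. So "hello\rhi" would show "hillo" on a terminal but "hi" here:

```python
def apply_cr(text: str) -> str:
    "Hypothetical: within each line, keep only what follows the last carriage return."
    out = []
    for line in text.split('\n'):
        line = line.rstrip('\r')              # a final \r, with nothing after it, leaves the text visible
        out.append(line.rsplit('\r', 1)[-1])  # otherwise the later chunk replaces the earlier one
    return '\n'.join(out)
```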

preferred_msg_out

def preferred_msg_out(
    out, html1st:bool=True, include_imgs:bool=False
):

Preferred mime type and content for any Jupyter output dict (stream, error, or data-bearing)

preferred_msg_out extends preferred_out to any output dict: streams and errors are always plain text, while data-bearing outputs go through mime preference:

test_eq(preferred_msg_out(mk_stream('stdout', ['a\n','b\n'])), ('text/plain', 'a\nb\n'))
test_eq(preferred_msg_out(mk_result(text_html=['<b>4</b>'], text_plain=['4'])), ('text/html', ['<b>4</b>']))
test_eq(preferred_msg_out(mk_result(text_markdown=['*4*'], text_html=['<b>4</b>']), html1st=False)[0], 'text/markdown')

render_output

def render_output(
    out
):

Convert a single output dict to an HTML string

print(render_output(mk_result(text_plain=['42'])))

<pre class="!border-0 !rounded-none !my-0 !p-0"><code class="nohighlight">42</code></pre>

render_outputs

def render_outputs(
    outputs
):

Render a full list of outputs, concatenating streams first.

print(render_outputs(outs))

<pre class="!border-0 !rounded-none !my-0 !p-0"><code class="nohighlight">hello world
</code></pre>
<pre class="!border-0 !rounded-none !my-0 !p-0"><code class="nohighlight">warn
</code></pre>
<pre class="!border-0 !rounded-none !my-0 !p-0"><code class="nohighlight">42</code></pre>

render_text

def render_text(
    outputs, html1st:bool=False
):

Render notebook outputs to concise text, using XML-ish tags when multiple outputs are present.

A single output renders as plain text directly:…

print(render_text([outs[0]]))

…but multiple outputs get wrapped in XML-ish tags so they stay distinguishable. For rich outputs we prefer markdown by default; pass html1st=True to prefer HTML instead. Outputs without a text representation (e.g. images alone) render as empty.

print(render_text(outs))

<stdout>
hello world
</stdout>
<stderr>
warn
</stderr>
<execute_result>
42
</execute_result>

disp = [mk_display(text_html=['<b>42</b>'], text_markdown=['**42**'], text_plain=['42'])]
test_eq(render_text(disp), '**42**')
test_eq(render_text(disp, html1st=True), '<b>42</b>')

test_eq(render_text([mk_display(image_png='abc')]), '')
test_eq(render_text([mk_error('oops')]), 'oops')
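
Here is a minimal sketch of the single-versus-multiple rule. render_text_sketch is hypothetical. It always prefers markdown over plain text, with no html1st switch. It reads text/* values whether they are stored as a str or as a list of lines:

```python
def render_text_sketch(outputs: list) -> str:
    "Hypothetical: plain text for one output, XML-ish tags when there are several."
    def text_of(o):
        if o['output_type'] == 'stream': return o['text']
        if o['output_type'] == 'error':
            tb = o['traceback']
            return tb if isinstance(tb, str) else '\n'.join(tb)
        data = o.get('data', {})
        for mime in ('text/markdown', 'text/plain'):  # assumed order: markdown before plain
            if mime in data:
                v = data[mime]
                return v if isinstance(v, str) else ''.join(v)
        return ''  # e.g. an image with no text form
    if len(outputs) == 1: return text_of(outputs[0])
    parts = []
    for o in outputs:
        tag = o['name'] if o['output_type'] == 'stream' else o['output_type']
        parts.append(f'<{tag}>\n' + text_of(o).rstrip('\n') + f'\n</{tag}>')
    return '\n'.join(parts)
```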

nb.cells[0].outputs = [mk_stream('stdout', '1\n')]

Notebook class

item2xml

def item2xml(
    typ, # Tag name: the cell or message type, e.g. 'code', 'markdown', 'raw', 'prompt'
    content:str='', # The item's source text
    out:str='', # Rendered output text
    id:NoneType=None, # Optional id attribute
    meta:NoneType=None, # Cell/message metadata: directives in its `nbdev` dict render as attrs, bare ones as bare attrs
    **attrs
):

A notebook cell or dialog message as concise XML: content, then an <out> section when out is non-empty

item2xml renders notebook cells and dialog messages to LLM-friendly XML (cell2xml below and llmsurgery’s message renderers build on it). The content sits directly inside the type tag, with no wrapper of its own, and an <out> section marks where output begins, so an item without output carries no extra tags at all. Passing meta renders the metadata’s nbdev directives as attributes, so every projection built on item2xml shows them the same way.

test_eq(to_xml(item2xml('code', 'x*2', '42', id='ab')), '<code id="ab">x*2<out>42</out></code>')
to_xml(item2xml('markdown', '# hi', id='cd'))

'<markdown id="cd"># hi</markdown>'

Falsy attrs are dropped, and attrs whose value is literally True render without a value:

test_eq(to_xml(item2xml('code', 'x', time='', kind='system', foo=True)), '<code kind="system" foo>x</code>')
test_eq(to_xml(item2xml('code', 'x', meta=dict(nbdev=dict(hide='true', eval='false')))), '<code hide eval="false">x</code>')
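
The attribute rules can be sketched with plain string formatting. item2xml_sketch is hypothetical and returns a string rather than an XML tree. Escaping content with html.escape is an assumption about how the real to_xml treats special characters:

```python
from html import escape

def item2xml_sketch(typ, content='', out='', id=None, meta=None, **attrs):
    "Hypothetical: the rendering rules above, built with plain string formatting."
    pairs = [('id', id)]
    # nbdev metadata directives become attrs; "true" means a bare attr
    pairs += [(k, True if v == 'true' else v) for k, v in ((meta or {}).get('nbdev') or {}).items()]
    pairs += list(attrs.items())
    parts = []
    for k, v in pairs:
        if not v: continue                                            # falsy attrs are dropped
        parts.append(k if v is True else f'{k}="{escape(str(v))}"')  # literal True -> bare attr
    head = ' '.join([typ] + parts)
    body = escape(content, quote=False) + (f'<out>{escape(out, quote=False)}</out>' if out else '')
    return f'<{head}>{body}</{typ}>'
```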

cells2xml

def cells2xml(
    cells, wrap:partial=functools.partial(ft, 'nb'), ids:bool=True, incl_out:bool=True,
    **kw
):

Convert notebook cells to XML format

cell2xml

def cell2xml(
    cell, ids:bool=True, incl_out:bool=True
):

Convert NbCell to concise XML format

We can view any notebook as concise XML. For instance, the notebook nb read back in the round-trip example above (cells c0 and m0):

print(cells2xml(nb.cells, incl_out=False))

<nb><code id="c0">a=1
a</code><markdown id="m0"># t
x</markdown></nb>

repr(cell2xml(nb.cells[0], incl_out=False))

'<code id="c0">a=1\na</code>'

repr(cell2xml(nb.cells[0], incl_out=True))

'<code id="c0">a=1\na<out>1\n</out></code>'

# Metadata directives render as attrs: bare as bare, others verbatim (names unhyphenated)
c = mk_cell('1+1', metadata=dict(nbdev=dict(hide='true', default_exp='core')))
test_eq(repr(cell2xml(c, incl_out=False)), f'<code id="{c.id}" hide default_exp="core">1+1</code>')

Notebook

def Notebook(
    nb, path:NoneType=None
):

Read, query, and edit Jupyter notebooks

We can now open a notebook and access its metadata and cells:

nbo = Notebook.open(minimal_fn)
list(nbo.meta), len(nbo.cells), len(nbo)

(['solveit_dialog_mode', 'solveit_ver'], 2, 2)

nbo.path.name

'minimal.ipynb'

[o.id for o in nbo]

['801558df', 'e2147a69']

'e2147a69' in nbo, 'nonexistent' in nbo

(True, False)

A Notebook’s repr is its XML:

nbo

<nb path="/Users/jhoward/aai-ws/fastcore/tests/minimal.ipynb"><markdown id="801558df">## A minimal notebook</markdown><code id="e2147a69"># Do some arithmetic
1+1<out>2</out></code></nb>

You can also get a more concise version that doesn’t include outputs or the full path:

print(nbo.concise)

<nb path="minimal.ipynb"><markdown id="801558df">## A minimal notebook</markdown><code id="e2147a69"># Do some arithmetic
1+1</code></nb>

Cells can be accessed by integer index or by their string id:

nbo[0].source

'## A minimal notebook'

nbo['e2147a69'].source

'# Do some arithmetic\n1+1'

You can directly set a cell’s source by id or index:

nbo['e2147a69'] = '2+2'
nbo['e2147a69'].source

'2+2'
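
Here is a minimal sketch of how one container can accept both addressing styles. It assumes cells are plain dicts with an id key. NotebookSketch is hypothetical and omits slicing, add, and everything else the real Notebook does:

```python
class NotebookSketch:
    "Hypothetical minimal container: cells addressed by int index or str id."
    def __init__(self, cells): self.cells = list(cells)
    def _idx(self, key):
        if isinstance(key, int): return key
        for i, c in enumerate(self.cells):
            if c['id'] == key: return i
        raise KeyError(key)
    def __getitem__(self, key): return self.cells[self._idx(key)]
    def __setitem__(self, key, source): self.cells[self._idx(key)]['source'] = source  # assigning a str sets the source
    def __delitem__(self, key): del self.cells[self._idx(key)]
    def __contains__(self, key): return any(c['id'] == key for c in self.cells)
    def __len__(self): return len(self.cells)

nb = NotebookSketch([{'id': '801558df', 'source': '## A minimal notebook'},
                     {'id': 'e2147a69', 'source': '# Do some arithmetic\n1+1'}])
nb['e2147a69'] = '2+2'
print(nb[1]['source'], 'e2147a69' in nb, len(nb))  # 2+2 True 2
```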

Cells also carry the shared edit operations as methods. These are the same operations as the cell_* functions, minus the path and cell_id arguments, since each method is called on the cell itself. These are in-memory edits: the diff comes back for verification, and nothing touches disk until the notebook is saved.

d = nbo['e2147a69'].str_replace('2+2', '3+3')
test_eq(nbo['e2147a69'].source, '3+3')
print(d)

You can also update outputs and metadata directly on a cell:

nbo['e2147a69'].outputs = [{'output_type': 'execute_result', 'data': {'text/plain': ['4']}}]
nbo['e2147a69'].outputs

[{'output_type': 'execute_result', 'data': {'text/plain': ['4']}}]

nbo['e2147a69'].metadata['custom'] = True
nbo['e2147a69'].metadata

{'custom': True, 'time_run': '2026-01-04T20:52:49.901559+00:00'}

The add method inserts a new cell at a given position (defaulting to the end):

Notebook.add

def add(
    source, cell_type:str='code', idx:NoneType=None, after:NoneType=None, before:NoneType=None, **kwargs
):

Add a new cell with source at idx (default: end), or after/before a cell id

nbo.add('print("hello")')
nbo.add('# A heading', cell_type='markdown', idx=0)
len(nbo), nbo[0].source

(4, '## A minimal notebook')

Cells can also be inserted relative to an existing cell by id:

cid = nbo[0].id
nbo.add('# After first', cell_type='markdown', after=cid)
nbo.add('# Before first', cell_type='markdown', before=cid)
[c.source for c in nbo[:3]]

['# Before first', '## A minimal notebook', '# After first']
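
Resolving the insertion point is the core of add. Here is a minimal sketch over a list of ids. insert_pos is hypothetical, and giving after and before precedence over idx is an assumption:

```python
def insert_pos(ids, idx=None, after=None, before=None):
    "Hypothetical: where a new cell goes, given an index or a neighboring cell id."
    if after is not None: return ids.index(after) + 1
    if before is not None: return ids.index(before)
    return len(ids) if idx is None else idx

ids = ['a', 'b', 'c']
ids.insert(insert_pos(ids, after='a'), 'new')
print(ids)  # ['a', 'new', 'b', 'c']
```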

Notebook.md

def md(
    source, idx:NoneType=None, after:NoneType=None, before:NoneType=None, **kwargs
):

Add a new cell with source at idx (default: end), or after/before a cell id

md is a shortcut for add(..., cell_type='markdown'):

nbo.md('A note')
len(nbo), nbo[-1].cell_type

(7, 'markdown')

You can delete by id or index:

prev_len = len(nbo)
del nbo[0]
len(nbo) == prev_len - 1

True

The find_cells method searches cell sources by regex, returning matching cells:

Notebook.find_cells

def find_cells(
    pat, cell_type:NoneType=None
):

Find cells with source matching regex pat

nbo.find_cells(r'\d\+\d', cell_type='code')

[{'cell_type': 'code',
  'execution_count': None,
  'id': 'e2147a69',
  'metadata': {'time_run': '2026-01-04T20:52:49.901559+00:00', 'custom': True},
  'outputs': [{'output_type': 'execute_result',
    'data': {'text/plain': ['4']}}],
  'source': '3+3',
  'idx_': 1,
  'lang_': 'python'}]
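
The search is a regex search anywhere in each cell’s source. Here is a minimal sketch over plain cell dicts. find_cells_sketch is hypothetical:

```python
import re

def find_cells_sketch(cells, pat, cell_type=None):
    "Hypothetical: cells whose source matches regex `pat` anywhere, optionally of one type."
    rx = re.compile(pat)
    return [c for c in cells
            if (cell_type is None or c['cell_type'] == cell_type) and rx.search(c['source'])]
```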

Notebook.move

def move(
    src_ids, after:NoneType=None, before:NoneType=None
):

Move cells with src_ids after/before a cell id, or to end

Cells can be moved by id, either relative to another cell or to the end:

nbo = Notebook.open(minimal_fn)
c0,c1 = nbo[0].id,nbo[1].id
nbo.move(c1, before=c0)
[c.id for c in nbo] == [c1, c0]

True
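
Here is a minimal sketch of moving by id over plain cell dicts. move_cells is hypothetical and returns a new list rather than editing a notebook in place:

```python
def move_cells(cells, src_ids, after=None, before=None):
    "Hypothetical: move cells (by id, in the given order) after/before another id, or to the end."
    if isinstance(src_ids, str): src_ids = [src_ids]
    moving = [c for i in src_ids for c in cells if c['id'] == i]
    rest = [c for c in cells if c['id'] not in src_ids]
    rest_ids = [c['id'] for c in rest]
    if after is not None: pos = rest_ids.index(after) + 1
    elif before is not None: pos = rest_ids.index(before)
    else: pos = len(rest)
    return rest[:pos] + moving + rest[pos:]
```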

Use save to write to disk:

nbo.save('path.ipynb')

If no path is passed, the path used in open() will be re-used.

Notebook.view_cell

def view_cell(
    id, nums:bool=True
):

Show cell source with optional line numbers

The view_cell method displays a cell’s source with optional line numbers:

print(nbo.view_cell('e2147a69'))

     1 │ # Do some arithmetic
     2 │ 1+1

The path-taking counterparts of the query methods (find_cells and summary_nb, which take a path where the methods use the held notebook) return snapshots, not live cells. A CellRow records the cell’s id, type, source, and metadata. Its summary line, id:t[directives]:source (t: c=code, m=markdown, r=raw), shows any nbdev directives, so export state is visible at a glance. Rows are data to read and addresses to act on: edits then go through the cell_* functions or cell_exhash.

CellRows

def CellRows(
    *args, **kwargs
):

A list of CellRow snapshots.

CellRow

def CellRow(
    c, maxlen:int=120
):

Snapshot of one cell, shown as id:t[directives]:source (t: c=code m=markdown r=raw)

Notebook.to_dict

def to_dict():

The plain dict form of the held notebook (nb2dict): the representation layer

find_cells

def find_cells(
    path, # Notebook file to search
    pat:str='', # Regex over cell source
    cell_type:str=None, # Optional limit by type ('code', 'markdown', or 'raw')
):

Snapshot CellRows for matching cells in the notebook at path

summary_nb

def summary_nb(
    path, # Notebook file to read
    maxlen:int=120, # Maximum source characters per line
):

One snapshot line per cell of the notebook at path

Notebook.summary

def summary(
    maxlen:int=120
):

One CellRow line per cell

rows = find_cells(minimal_fn, r'\d\+\d')
test_eq([r.cell_type for r in rows], ['code'])
test_eq(type(Notebook.open(minimal_fn).to_dict()), dict)
summary_nb(minimal_fn)