How to autogenerate documentation for a CLI with Sphinx

You've documented the CLI command options near the source code with comments, and the full range of possible values for each option is defined as an enumeration. When users run a command with the --help option, they can check how to use it from the terminal. However, you also maintain reference docs on a site that you must update every time there is a new release.

My client ScyllaDB faced this same problem with their CLI, sctool. While the commands were properly documented in the code, users frequently complained about inconsistencies between the actual CLI implementation and the latest version of the docs.

In this article, I'll review how we autogenerated reference docs from the code for Scylla's CLI using Sphinx, a static site generator for documentation similar to Jekyll, Gatsby, Docusaurus, or Antora. Even if you don't use Sphinx to document your software, you can apply the same general principles to other documentation tools.

Solution overview

There are different ways to autogenerate documentation for a CLI. For example, you could run every command with the --help option and save the output of each one to a folder that the documentation later imports.

However, we quickly discarded this option because it required building and installing the CLI for every documentation update, which substantially increased the build time. Equally important, this approach gave us little flexibility to render the documentation differently from its original output.

Solution overview diagram

Together with @michalmatczuk and @TzachL, we decided to create a new CLI command that generates the documentation for all the commands in YAML form. With all the options in a structured format, we can later transform them into other formats, such as Markdown or reStructuredText, without worrying about changes to the CLI's help output or implementation.

Then, we created a Jinja template that defines how to render reStructuredText from the YAML data. Sphinx reads this template and the YAML file and, with a bit of help from a third-party extension, embeds the rendered result in the documentation.

Do you want to know more about the implementation details? Keep reading: the next sections walk through how we implemented each part of this solution.

Step 1 - Generate structured YAML

The first thing we did was to adapt the CLI tool to generate the documentation in YAML form. In our case, we created a new CLI command that generates the documentation for the CLI in the directory of our choice:

sctool docs [options]
    Generates documentation for the CLI in YAML form.

    Options:
      --output,-o          (string) Path to write the output to (absolute, or relative to the root project directory). Defaults to the current directory.

The command above produces one YAML file per command, with the following structure:

sctool_cluster_list.yaml

name: sctool cluster list
synopsis: Show managed clusters
description: |
  This command displays a list of managed clusters.
usage: sctool cluster list [flags]
options:
- name: help
  shorthand: h
  default_value: "false"
  usage: help for list
inherited_options:
- name: api-cert-file
  usage: |
    File `path` to HTTPS client certificate used to access the Scylla Manager server when client certificate validation is enabled (envvar SCYLLA_MANAGER_API_CERT_FILE).
- name: api-key-file
  usage: |
    File `path` to HTTPS client key associated with --api-cert-file flag (envvar SCYLLA_MANAGER_API_KEY_FILE).
- name: api-url
  default_value: http://127.0.0.1:5080/api/v1
  usage: |
    Base `URL` of Scylla Manager server (envvar SCYLLA_MANAGER_API_URL).
    If running sctool on the same machine as server, it's generated based on '/etc/scylla-manager/scylla-manager.yaml' file.
see_also:
- sctool cluster - Add or delete clusters

As you can see, each YAML file defines the command name, a synopsis, a description, the usage line, and the options in a structured form. This makes the file easy to parse and lets us render the documentation in whatever format we prefer.
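Whatever language a CLI is written in, a docs command like this one boils down to walking the command tree and dumping each command to its own file. The following is a minimal, language-neutral sketch in Python, not sctool's implementation: the `Command` and `Option` classes and `write_docs` are hypothetical, PyYAML is assumed to be installed, and only a subset of the fields shown above is emitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

import yaml


@dataclass
class Option:
    name: str
    usage: str
    shorthand: str = ""
    default_value: str = ""


@dataclass
class Command:
    path: str  # full command path, e.g. "sctool cluster list"
    synopsis: str
    usage: str
    description: str = ""
    options: list[Option] = field(default_factory=list)
    subcommands: list[Command] = field(default_factory=list)


def option_to_dict(opt: Option) -> dict:
    """Map an option to the YAML shape shown above, skipping empty fields."""
    entry = {"name": opt.name}
    if opt.shorthand:
        entry["shorthand"] = opt.shorthand
    if opt.default_value:
        entry["default_value"] = opt.default_value
    entry["usage"] = opt.usage
    return entry


def write_docs(cmd: Command, out_dir: Path) -> list[Path]:
    """Write one YAML file for cmd and each of its subcommands, recursively."""
    out_dir.mkdir(parents=True, exist_ok=True)
    doc = {"name": cmd.path, "synopsis": cmd.synopsis}
    if cmd.description:
        doc["description"] = cmd.description
    doc["usage"] = cmd.usage
    if cmd.options:
        doc["options"] = [option_to_dict(o) for o in cmd.options]
    # "sctool cluster list" -> sctool_cluster_list.yaml
    target = out_dir / (cmd.path.replace(" ", "_") + ".yaml")
    target.write_text(yaml.safe_dump(doc, sort_keys=False), encoding="utf-8")
    written = [target]
    for sub in cmd.subcommands:
        written += write_docs(sub, out_dir)
    return written
```

Because the output is plain YAML, any later step can read it back with a standard parser such as `yaml.safe_load`.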

Step 2 - Convert YAML to reStructuredText (or Markdown)

Once we had the commands' docs as YAML files, it was the right time to outline how we wanted the reference documentation to be displayed.

To do so, we created a new template file that defines how to iterate over an object (in this case, the contents of a single YAML file) and format the result as reStructuredText. We divided the template for a sctool command into the following sections:

Syntax: The usage line, which shows the end user how to invoke the command.
Options: The list of command options, with a description and their default values.
Inherited options: The list of inherited options, which are common to all commands.
Example: A usage example.

command.tmpl

.. -*- mode: rst -*-

{{ data['description'] }}

{% if data['usage'] %}
Syntax
......

.. code-block:: none

   {{ data['usage'] }}
{% endif %}

{% if data['options'] %}
Command options
...............

{% for item in data['options'] %}
``{% if item.shorthand %}-{{item.shorthand}}, {% endif %}--{{item.name}}``

{{item.usage}}

{% if item.default_value %}**Default value:** ``{{ item.default_value }}``{% endif %}

{% endfor %}
{% endif %}

{% if data['inherited_options'] %}
.. collapse:: Inherited Options

   {% for item in data['inherited_options'] %}
   ``{% if item.shorthand %}-{{item.shorthand}}, {% endif %}--{{item.name}}``
   
   {% set usage = item.usage.split('\n') %}
   {% for line in usage %}
   {{line}}
   {% endfor %}
   
   {% if item.default_value %}**Default value:** ``{{ item.default_value }}``{% endif %}
   {% endfor %}
{% endif %}

{% if data['example'] %}
Example
.......

{% set example = data['example'].split('\n') %}

.. code-block::

   {% for line in example %}{{line}}
   {% endfor %}

{% endif %}

This file follows the Jinja2 syntax. Here are some of the most common Jinja2 delimiters we used:

Construct	Syntax	Description
Variable substitution	`{{ data['key'] }}`	Inserts the value stored under `key` in `data` into the document.
Conditional	`{% if condition %}...{% endif %}`	Renders the block between the tags only if `condition` evaluates to true.
Loop	`{% for o in data['key'] %}...{% endfor %}`	If `data['key']` is a list, renders the block once for each of its elements.

👉 For a complete Jinja2 reference, see the Template Designer Documentation.
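To see the three constructs work together, the following sketch renders a tiny template with the Jinja2 Python package (assumed to be installed). As in command.tmpl, the parsed YAML is passed in under the name `data`; the template string itself is made up for illustration.

```python
from jinja2 import Template

# Variable substitution, a conditional, and a loop in one template.
TEMPLATE = Template(
    "{{ data['name'] }}\n"
    "{% if data['options'] %}"
    "{% for item in data['options'] %}"
    "- --{{ item.name }}: {{ item.usage }}\n"
    "{% endfor %}"
    "{% endif %}"
)

data = {
    "name": "sctool cluster list",
    "options": [{"name": "help", "usage": "help for list"}],
}

# item.name works on a dict because Jinja falls back to item['name'].
output = TEMPLATE.render(data=data)
print(output)
```

If `options` is empty or missing, the conditional skips the whole loop and only the command name is rendered.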

Step 3 - Autogenerate the documentation

Next, we passed the YAML files to the Jinja template to generate reStructuredText.

One option would have been to create a script that, for each CLI command, runs a tool like the Jinja2 CLI.

The following command takes the YAML file sctool_cluster_list.yaml as input and renders sctool_cluster_list.rst using command.tmpl as the template:

jinja2 command.tmpl sctool_cluster_list.yaml --format=yaml > sctool_cluster_list.rst
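If you go the script route, Jinja2's Python API can render every YAML file in one pass instead of calling the CLI once per command. This is a sketch under assumptions: Jinja2 and PyYAML are installed, and `render_all` and its arguments are hypothetical. Passing the parsed file as `data` lets the same command.tmpl be reused unchanged.

```python
from __future__ import annotations

from pathlib import Path

import yaml
from jinja2 import Environment, FileSystemLoader


def render_all(template_path: Path, yaml_dir: Path, out_dir: Path) -> list[Path]:
    """Render template_path once per *.yaml file in yaml_dir; return the .rst files written."""
    env = Environment(loader=FileSystemLoader(str(template_path.parent)))
    template = env.get_template(template_path.name)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(yaml_dir.glob("*.yaml")):
        data = yaml.safe_load(source.read_text(encoding="utf-8"))
        # sctool_cluster_list.yaml -> sctool_cluster_list.rst
        target = out_dir / (source.stem + ".rst")
        # Pass the parsed file as `data`, the name command.tmpl uses.
        target.write_text(template.render(data=data), encoding="utf-8")
        written.append(target)
    return written
```

For example, `render_all(Path("command.tmpl"), Path("partials"), Path("generated"))` would write one .rst file to generated/ for each YAML file in partials/.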

Then, we could include the output in the documentation. Sphinx supports the docutils include directive, which inserts another reStructuredText file and parses it as part of the current document. (The literalinclude directive, by contrast, would display the file as a code listing.)

For example:

Cluster
=======

list
----

.. include:: sctool_cluster_list.rst

In our case, we preferred to use the sphinxcontrib-datatemplates extension. It provides a directive that takes a YAML file and a Jinja template, renders the template with the file's contents, and inlines the result in a reStructuredText file.

For example:

Cluster
=======

cluster list
------------

.. datatemplate:yaml:: partials/sctool_cluster_list.yaml
   :template: command.tmpl

After building the docs, this produces the following page:

CLI documentation preview

Preview: https://manager.docs.scylladb.com/master/sctool/cluster.html
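For the datatemplate:yaml directive to work, the extension has to be enabled in the Sphinx configuration. A minimal conf.py sketch follows. The template folder and the collapse extension are assumptions, not necessarily what the ScyllaDB docs use: command.tmpl relies on a collapse directive, and sphinx_toolbox.collapse is one extension that provides it.

```python
# conf.py (excerpt)
extensions = [
    # Provides the datatemplate:yaml directive used above.
    "sphinxcontrib.datatemplates",
    # Assumption: one extension that provides the collapse directive used in command.tmpl.
    "sphinx_toolbox.collapse",
]

# Assumption: command.tmpl is stored in a _templates folder next to conf.py.
templates_path = ["_templates"]
```

Both extensions are installed separately, for example with `pip install sphinxcontrib-datatemplates sphinx-toolbox`.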

Step 4 - Keep documentation up to date

To keep the docs up to date, we added a mechanism that checks whether the YAML files stored in the repository match the latest version of the CLI.

A CI workflow runs on every pull request that targets main and on every push to main. Here's a sample workflow implementation using GitHub Actions:

.github/workflows/verify-cli-docs.yaml

name: Verify CLI docs
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.head_ref }}
      - name: Install CLI
        run: npm install .
      - name: Run command to generate YAML files
        run: sctool docs --output docs/commands/_partials
      - name: Fail if the generated YAML files differ from the committed ones
        run: |
          git status --porcelain docs/commands/_partials
          test -z "$(git status --porcelain docs/commands/_partials)"

In short, the workflow runs the command sctool docs. If running the command leaves uncommitted changes, the YAML files in the repository are out of date, so the workflow fails.

This way, the docs and the implementation stay in sync as long as CI passes.
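The same check is handy locally, before opening a pull request. The following sketch assumes sctool and git are on the PATH and reuses the output directory from the workflow; `DOCS_DIR`, `changed_files`, and `main` are hypothetical names.

```python
from __future__ import annotations

import subprocess
import sys

# Assumed output directory, copied from the workflow above.
DOCS_DIR = "docs/commands/_partials"


def changed_files(porcelain: str) -> list[str]:
    """Return the paths listed in `git status --porcelain` output."""
    # Each line is "XY path": two status characters and a space, then the path.
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def main() -> int:
    """Regenerate the YAML docs and return 1 if they differ from the committed files."""
    subprocess.run(["sctool", "docs", "--output", DOCS_DIR], check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", DOCS_DIR],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    stale = changed_files(status)
    for path in stale:
        print(f"out of date: {path}", file=sys.stderr)
    return 1 if stale else 0
```

A pre-commit hook, for example, could run this with `sys.exit(main())`.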

Conclusion

After implementing these changes in Scylla's project, the cost of maintaining this reference documentation is close to zero, as long as developers keep the help text in the source code up to date. In addition, the docs always match the latest version of the source code, which reduces complaints and keeps a single source of truth. Finally, autogenerating the CLI reference documentation made the CLI docs more comprehensive and consistent, since every command page follows the same standard format.