RSM file processing steps

This guide explains the processing pipeline that runs when you execute rsm.make(source) or call rsm-make from the command line. Throughout, assume that a file called src.rst exists in the current directory and that the user executes rsm.make(src).
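
As a rough map of the whole pipeline, the sketch below lists the stages in the order this guide describes them and shows where each command-line tool stops. The stage names, the `LAST_STAGE` table, and the `stages_for` helper are illustrative assumptions made for this guide, not part of the RSM API.

```python
# Stage names follow the headings of this guide; the list itself is illustrative.
STAGES = ["read", "parse", "transform", "lint", "translate", "build", "write"]

# Last stage reached by each tool, according to the descriptions in this guide.
LAST_STAGE = {"rsm-lint": "lint", "rsm-render": "translate", "rsm-make": "write"}


def stages_for(tool: str) -> list:
    """Return the ordered stages a tool runs (hypothetical helper)."""
    end = STAGES.index(LAST_STAGE[tool])
    selected = STAGES[: end + 1]
    # The lint stage is optional and only runs when a linter is invoked.
    if tool != "rsm-lint":
        selected = [stage for stage in selected if stage != "lint"]
    return selected


for tool in LAST_STAGE:
    print(tool, "->", ", ".join(stages_for(tool)))
```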

Steps

Read

The first step is to find the file and read its contents as a string.

  • Input: path to file.

  • Output: string with RSM source.

  • Module responsible: reader.

  • Class responsible: Reader.
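
A minimal sketch of what this step amounts to, using only the standard library. The function name `read_source` and the UTF-8 encoding are assumptions for illustration; the actual Reader class may behave differently.

```python
from pathlib import Path
from typing import Union


def read_source(path: Union[str, Path]) -> str:
    """Locate the file and return its contents as a string (hypothetical helper)."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"source file not found: {file}")
    # Assumption: the source is encoded as UTF-8.
    return file.read_text(encoding="utf-8")
```

The rest of the pipeline only ever sees the returned string, so later stages do not need to know where the source came from.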

Parse

The RSM source is put through the parser, which outputs a syntax tree. This tree contains a node for each part of the manuscript. Some, but not all, nodes in the manuscript tree correspond one-to-one to RSM tags.

The manuscript tree is built in two parsing sub-steps.

  1. Concrete syntax tree: The source is processed by a parser written as a C extension (this parser is generated by tree-sitter). The output of this stage is a concrete syntax tree, which contains nodes that correspond to syntactic elements of the source file, such as Halmoses (::) and other delimiters ({}).

  2. Abstract syntax tree: The concrete syntax tree is parsed into an abstract syntax tree, called the manuscript tree. This tree no longer contains syntactic information about the RSM source code used to generate it; it only contains nodes that are semantically meaningful parts of the manuscript, such as sections, labels, and bibliography.

The process of converting the concrete syntax tree into the (abstract) manuscript tree is called abstractifying, and is carried out in pure Python. The concrete syntax tree is composed of nodes defined by the Python bindings of tree-sitter, namely py-tree-sitter. The manuscript tree is made up of the nodes found in nodes.

The concrete syntax tree is useful for syntax highlighting and code navigation. Editor extensions and other code-oriented tools in the RSM ecosystem may stop their processing at this stage if they do not need the abstract manuscript tree.

  • Input: string with RSM source.

  • Output: abstract syntax tree.

  • Module responsible: tsparser.

  • Class responsible: TSParser.

  • Other relevant modules: nodes.
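
The abstractifying sub-step can be illustrated with a toy example. The sketch below does not use py-tree-sitter or the real nodes module: `CSTNode`, `ManuscriptNode`, the node type names, and the `SYNTACTIC` set are all hypothetical stand-ins. It only shows the idea of dropping purely syntactic nodes while keeping the semantically meaningful ones.

```python
from dataclasses import dataclass, field


@dataclass
class CSTNode:
    """Stand-in for a concrete syntax node (not py-tree-sitter's Node class)."""
    type: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class ManuscriptNode:
    """Stand-in for a node of the abstract manuscript tree."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# Hypothetical names for node types that only carry syntax, such as :: and {}.
SYNTACTIC = {"halmos", "open_brace", "close_brace"}


def abstractify(node: CSTNode) -> ManuscriptNode:
    """Copy the tree, discarding children that are purely syntactic."""
    kept = [abstractify(c) for c in node.children if c.type not in SYNTACTIC]
    return ManuscriptNode(kind=node.type, text=node.text, children=kept)


cst = CSTNode("manuscript", children=[
    CSTNode("section", children=[
        CSTNode("open_brace", "{"),
        CSTNode("label", "sec-intro"),
        CSTNode("close_brace", "}"),
        CSTNode("text", "Hello"),
        CSTNode("halmos", "::"),
    ]),
])
tree = abstractify(cst)
print([child.kind for child in tree.children[0].children])
```

In this toy tree the section keeps its label and text children, while the braces and the Halmos disappear.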

Transform

The abstract manuscript tree output by the parser is then transformed in several ways. This is especially important for procedures that cannot take place before the entire manuscript tree is parsed and held in memory. For example, if a :ref: tag refers to a :label: that has not yet been defined (a forward reference), RSM needs to read the entire manuscript before it can link the reference to its target. For this reason, label and reference resolution (including bibliography citations) is completed at this stage.

Other examples of transformations include generating a table of contents, or adding certain adornments that are necessary in the output manuscript but not manually input by the user, such as adding a turnstile to mathematical claims.

After the transform step, the abstract manuscript tree is considered finalized and should not be modified again by any later processing. Some tools that only need an in-memory representation of the manuscript may end their processing here. For example, the CLI utility rsm-lint does not carry out any of the remaining steps in the standard processing pipeline (with the exception of the linting step itself, which is optional).

  • Input: abstract manuscript tree.

  • Output: (transformed) abstract manuscript tree.

  • Module responsible: transformer.

  • Class responsible: Transformer.
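
Forward references are the reason this step needs the whole tree in memory. The following sketch shows one common way to resolve them, with two passes over a toy tree. The `Node` class, its attribute names, and `resolve_references` are assumptions made for illustration and do not reflect the actual Transformer implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy manuscript node (hypothetical, not taken from the nodes module)."""
    kind: str
    label: Optional[str] = None         # set on nodes that carry a :label:
    target_label: Optional[str] = None  # set on reference nodes
    target: Optional["Node"] = None     # filled in by the transform
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def resolve_references(root: Node) -> list:
    """Link each reference to its target; return labels that could not be found."""
    # Pass 1: collect every label in the manuscript, wherever it appears.
    labels = {n.label: n for n in walk(root) if n.label is not None}
    # Pass 2: resolve references, including those that point forward.
    unresolved = []
    for n in walk(root):
        if n.kind == "ref":
            n.target = labels.get(n.target_label)
            if n.target is None:
                unresolved.append(n.target_label)
    return unresolved


ref = Node("ref", target_label="thm-main")   # appears before its target
theorem = Node("theorem", label="thm-main")
root = Node("manuscript", children=[ref, theorem])
missing = resolve_references(root)
```

Because all labels are collected before any reference is resolved, the order in which labels and references appear in the source does not matter.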

Lint

This is an optional step that only takes place when running a linter, for example with rsm-lint. The linter takes the finalized abstract manuscript tree and runs routines that check it for consistency and soundness. If it encounters any problems, they are flagged to the user. The linter does not modify the tree or the file it came from.

  • Input: abstract manuscript tree.

  • Output: errors, warnings, and suggestions shown to the user.

  • Module responsible: linter.

  • Class responsible: Linter.
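
The key property of this step is that it reads the tree and reports findings without changing anything. A minimal sketch of that contract follows. The two checks shown (duplicate labels and empty sections), the `Diagnostic` class, and the `lint` function are hypothetical examples, not the checks that rsm-lint actually performs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Diagnostic:
    level: str    # "error", "warning", or "suggestion"
    message: str


@dataclass
class Node:
    """Toy manuscript node (hypothetical)."""
    kind: str
    label: Optional[str] = None
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def lint(root: Node) -> list:
    """Inspect the tree and return diagnostics; the tree is never modified."""
    diagnostics = []
    seen = set()
    for node in walk(root):
        if node.label is not None:
            if node.label in seen:
                diagnostics.append(
                    Diagnostic("error", f"duplicate label: {node.label}")
                )
            seen.add(node.label)
        if node.kind == "section" and not node.children:
            diagnostics.append(Diagnostic("warning", "empty section"))
    return diagnostics
```

Returning the diagnostics as data, rather than printing them inside the checks, leaves the caller free to decide how to show them to the user.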

Translator

The abstract manuscript tree is translated into HTML. Each node of the tree is visited and corresponding HTML code is added to a string.

There are currently two kinds of translator: a basic translator that generates human-readable HTML and a more advanced translator that adds web components such as handrails, additional CSS classes, and other features necessary for the manuscript to display correctly in a browser. The basic translator is useful during automatic testing, and is the one used by the CLI utility rsm-render, which takes the HTML body and simply returns it to the user.

This step generates only the body of the final HTML document. Adding headers, scripts, static files, and other such features is the task of the next step.

  • Input: abstract manuscript tree.

  • Output: string with HTML source (only the body).

  • Module responsible: translator.

  • Class responsible: Translator.
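
The visiting pattern described above can be sketched as follows. This is a toy version in the spirit of the basic translator: the `Node` class, the `TAGS` mapping, and the `BasicTranslator` class shown here are assumptions for illustration, and the real translator emits different markup.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """Toy manuscript node (hypothetical)."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# Hypothetical mapping from node kinds to HTML tags.
TAGS = {"manuscript": "div", "section": "section", "paragraph": "p"}


class BasicTranslator:
    """Visit each node and append the corresponding HTML to a list of parts."""

    def __init__(self):
        self.parts = []

    def translate(self, root: Node) -> str:
        self.parts = []
        self.visit(root)
        # Joining once at the end avoids repeated string concatenation.
        return "".join(self.parts)

    def visit(self, node: Node) -> None:
        tag = TAGS.get(node.kind, "span")
        self.parts.append(f"<{tag}>")
        self.parts.append(escape(node.text))  # escape user text for HTML
        for child in node.children:
            self.visit(child)
        self.parts.append(f"</{tag}>")


tree = Node("manuscript", children=[Node("paragraph", "a < b")])
body = BasicTranslator().translate(tree)
print(body)  # <div><p>a &lt; b</p></div>
```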

Builder

The HTML body from the previous step is developed into a fully featured, working website by adding headers, metadata, scripts, and so on. Static files are gathered, including CSS style sheets, JS files, and any figures or data files the user has included in the manuscript.

  • Input: string with HTML body.

  • Output: in-memory representation of the final output folder.

  • Module responsible: builder.

  • Class responsible: Builder.
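
One simple way to picture the in-memory representation of the output folder is a mapping from relative paths to file contents. The sketch below takes that approach. The `build` function, the `index.html` and `static/` names, and the choice of a plain dictionary are assumptions for illustration; the actual Builder may represent the folder differently.

```python
from html import escape


def build(body: str, title: str, static_files: dict) -> dict:
    """Wrap the HTML body in a full page and gather static files (hypothetical).

    `static_files` maps file names to bytes. The result maps relative
    output paths to bytes and stands in for the final folder.
    """
    head_links = []
    for name in sorted(static_files):
        if name.endswith(".css"):
            head_links.append(f'<link rel="stylesheet" href="static/{name}">')
        elif name.endswith(".js"):
            head_links.append(f'<script src="static/{name}" defer></script>')

    page = "\n".join([
        "<!DOCTYPE html>",
        '<html lang="en">',
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{escape(title)}</title>",
        *head_links,
        "</head>",
        "<body>",
        body,
        "</body>",
        "</html>",
        "",
    ])

    output = {"index.html": page.encode("utf-8")}
    for name, data in static_files.items():
        output[f"static/{name}"] = data
    return output
```

Nothing is written to disk here, which keeps this step easy to test and leaves all file system access to the final step.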

Writer

Finally, the output folder is written to disk. This is the final step of the CLI utility rsm-make.

  • Input: in-memory representation of the final output folder.

  • Output: the final folder, written to disk.

  • Module responsible: writer.

  • Class responsible: Writer.
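
Assuming the in-memory folder is a mapping from relative paths to bytes (an assumption of this guide's examples, not a documented RSM data structure), the write step reduces to a short loop. The `write_folder` function is hypothetical.

```python
from pathlib import Path


def write_folder(files: dict, dest: Path) -> list:
    """Write each in-memory file under `dest` and return the paths written."""
    written = []
    for relative, data in files.items():
        target = dest / relative
        # Create intermediate folders such as static/ on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        written.append(target)
    return written
```

For example, `write_folder({"index.html": b"<p>hi</p>"}, Path("out"))` would create the folder `out` if needed and write `out/index.html`.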