tsparser

Input: RSM source string – Output: abstract syntax tree.

Produce a manuscript tree (a.k.a. abstract manuscript tree, a.k.a. abstract syntax tree) from the RSM source string.

The process occurs in two stages:

  1. Parsing: The first step parses the RSM source string into a concrete syntax tree using the tree-sitter parser, which is written in C. The grammar and parser definitions can be found in the tree-sitter-rsm repository. This step uses py-tree-sitter, the Python bindings for tree-sitter. The concrete syntax tree is therefore composed of nodes defined by py-tree-sitter, and the RSM package has little control over what these nodes look like. The concrete syntax tree contains a node for each syntactically relevant token in the source file, including but not limited to tags (:tag-name:), delimiters ({, }), Halmoses (::), and other special characters (/, *, $, etc.).

  2. Abstractifying: The second step takes the concrete syntax tree output by the tree-sitter parser and builds an abstract syntax tree. The abstract tree contains a node for each semantically relevant piece of the manuscript, and no longer contains nodes for delimiters or special characters. This step is carried out by this module in pure Python.
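The second stage can be illustrated with a toy example. The sketch below does not use py-tree-sitter or any RSM class: `CNode`, `ANode`, and `abstractify` are hypothetical names invented here, and the node types are made up. It only shows the general idea of dropping delimiter-like nodes while keeping the semantically relevant ones.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CNode:
    """Toy concrete-syntax node (hypothetical, not a py-tree-sitter node)."""
    type: str
    is_named: bool = True
    text: str = ""
    children: List["CNode"] = field(default_factory=list)


@dataclass
class ANode:
    """Toy abstract-syntax node (hypothetical, not an RSM node)."""
    kind: str
    text: str = ""
    children: List["ANode"] = field(default_factory=list)


def abstractify(cnode: CNode) -> ANode:
    """Build an abstract node, keeping only children that are named."""
    kept = [abstractify(child) for child in cnode.children if child.is_named]
    return ANode(cnode.type, cnode.text, kept)


# A concrete tree with delimiters and a Halmos as unnamed nodes.
cst = CNode("manuscript", children=[
    CNode("{", is_named=False),
    CNode("paragraph", children=[CNode("text", text="Hello.")]),
    CNode("}", is_named=False),
    CNode("::", is_named=False),
])
ast = abstractify(cst)
```

In this sketch the resulting `ast` has a single `paragraph` child; the braces and the Halmos are gone. The real module likely does considerably more work per node type than this filter.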

Both steps are handled by TSParser. The rest of this module contains auxiliary functions (mostly private ones) that are useful during the parsing process.

Classes

TSParser

Parse RSM source into an abstract syntax tree.

Functions

rsm.tsparser.print_cst(tree, named_only=False)

Print a tree-sitter concrete syntax tree.

This function runs by default when a manuscript is processed with logging level DEBUG. For examples, see the TSParser docstring.

Parameters:
  • tree (Tree) – A concrete syntax tree parsed by tree-sitter.

  • named_only (bool) – Whether to print only named nodes.

Notes

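Named nodes are those that correspond to named rules in the grammar. Unnamed (anonymous) nodes correspond to literal tokens, such as delimiters, that matter only for syntax.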
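To make the effect of named_only concrete, here is a minimal sketch of an indented tree printer. It is not the implementation of print_cst: `ToyNode` and `format_cst` are hypothetical, and the node type names are made up. `ToyNode` only mimics the `type`, `is_named`, and `children` attributes that py-tree-sitter nodes expose.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    is_named: bool = True
    children: List["ToyNode"] = field(default_factory=list)


def format_cst(node: ToyNode, named_only: bool = False, depth: int = 0) -> List[str]:
    """Return one indented line per node, optionally skipping unnamed nodes."""
    lines = []
    if node.is_named or not named_only:
        lines.append("  " * depth + node.type)
        depth += 1  # children of a printed node are indented one level deeper
    for child in node.children:
        lines.extend(format_cst(child, named_only, depth))
    return lines


root = ToyNode("source_file", children=[
    ToyNode("{", is_named=False),
    ToyNode("paragraph", children=[ToyNode("text")]),
    ToyNode("}", is_named=False),
])

print("\n".join(format_cst(root)))                   # all nodes, braces included
print("\n".join(format_cst(root, named_only=True)))  # named nodes only
```

With named_only=True the two brace nodes are omitted, leaving only `source_file`, `paragraph`, and `text`.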

Exceptions

RSMParserError([pos, msg])

Raised when there is an irrecoverable error during parsing.
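As a rough illustration of an exception that carries an optional position and message, consider the sketch below. It is not the definition of RSMParserError: `ToyParserError` and `check_no_error_nodes` are hypothetical, and representing the position as a (line, column) tuple is an assumption. The only tree-sitter fact relied on is that syntax errors appear in the tree as nodes of type "ERROR".

```python
from typing import Iterable, Optional, Tuple

Position = Tuple[int, int]  # assumed (line, column) representation


class ToyParserError(Exception):
    """Hypothetical stand-in for an error with an optional position and message."""

    def __init__(self, pos: Optional[Position] = None, msg: Optional[str] = None):
        self.pos = pos
        self.msg = msg or "irrecoverable parsing error"
        where = f" at line {pos[0]}, column {pos[1]}" if pos else ""
        super().__init__(f"{self.msg}{where}")


def check_no_error_nodes(nodes: Iterable[Tuple[str, Position]]) -> None:
    """Raise on the first (node_type, position) pair whose type is 'ERROR'."""
    for node_type, pos in nodes:
        if node_type == "ERROR":
            raise ToyParserError(pos, "syntax error in source")


try:
    check_no_error_nodes([("paragraph", (0, 0)), ("ERROR", (2, 5))])
except ToyParserError as exc:
    caught = exc
    print(exc)
```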