The dedoc Document Engineering Environment
dedoc is a system for writing natural-language technical documentation or technical standards which focuses on providing a powerful writing environment while refusing to compromise on quality of output.
dedoc as a whole comprises three fundamental parts:
- DEDOC, an XML schema for writing high-quality technical documentation that targets multiple output formats, including XHTML, EPUB, man pages and plain text, and that is simpler and more focused than DocBook;
- dedoc.scm, an environment for writing technical documentation in Scheme targeting the DEDOC schema, which is friendlier for writing hypertext than writing XML directly and allows procedural generation of content using arbitrary Scheme code;
- dedoc-methods, a set of Makefile-driven methods for converting DEDOC files to various output formats, such as XHTML, EPUB, plain text and man pages.
Scheme writing environment. In short, dedoc is a system for writing natural-language technical documents using Scheme. By using the power of S-expressions, including ordinary functions, macros and backquoting, and SXML (the representation of XML in S-expressions), you can ergonomically write documents which compile to a highly semantic, clean XML representation:
(define (introduction)
  (sec "Introduction"
    (p "This is an example of writing a DEDOC section with a single paragraph. We can also procedurally generate content:")
    (map (lambda (name) (p "This is "name"'s paragraph."))
         (list "Alice" "Bob" "Charlie"))
    (p "p is just a function which simply provides a convenience for writing SXML. We can also write SXML directly using quasiquotation:")
    `(p (@ (class "the-answer")) "The answer to life, the universe, and everything is " ,(* 6 7) ".")))
Internal and external transformation. You can then use the power of SXML (or, if you wish, any external XML manipulation tooling of your choice, such as an XSLT processor) to “lower” this semantic, platonic XML representation of your document into a more concrete representation, such as DocBook, XHTML or XSL-FO; or you can use a Scheme transformer which consumes your SXML and produces a non-XML representation such as LaTeX, ConTeXt or roff. You can also use Scheme or external processing tools to transform the (S)XML document prior to lowering, for example by automatically adding section numbering, tables of contents or similar, or by compiling figure images from their source representation.
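As a rough illustration of what "lowering" can look like inside Scheme, the sketch below uses pre-post-order from Guile's (sxml transform) module to rewrite a small semantic tree into XHTML-style SXML. The element names sec, title and para are placeholders chosen for this example and are not taken from the DEDOC schema; the lower-to-xhtml procedure is likewise hypothetical and is not part of dedoc.

```scheme
(use-modules (sxml transform))

;; Hypothetical lowering pass. The input element names (sec, title, para)
;; are assumptions made for this sketch, not the real DEDOC vocabulary.
(define (lower-to-xhtml tree)
  (pre-post-order
   tree
   `(;; Each handler receives the tag followed by the already-lowered children.
     (sec   . ,(lambda (tag . kids) `(div (@ (class "section")) ,@kids)))
     (title . ,(lambda (tag . kids) `(h2 ,@kids)))
     (para  . ,(lambda (tag . kids) `(p ,@kids)))
     ;; Text nodes are passed through unchanged.
     (*text* . ,(lambda (trigger text) text))
     ;; Any other element is rebuilt as-is.
     (*default* . ,(lambda (tag . kids) (cons tag kids))))))

(define lowered
  (lower-to-xhtml '(sec (title "Intro") (para "Hello."))))
```

Here lowered evaluates to (div (@ (class "section")) (h2 "Intro") (p "Hello.")). A pass that adds section numbers or a table of contents could be written in the same style and run before this one.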
High quality output. The purpose of dedoc is to allow the production of technical documents which can be consumed as XHTML and come across as documents which were designed for optimal consumption as XHTML; and which can be consumed as PDFs and come across as documents which were designed for optimal consumption as PDFs. In other words, dedoc seeks to avoid any compromise as to the quality of the output. This remains a work in progress, mainly due to the lack of any open-source PDF typesetting solution which can produce an acceptable quality of output in an unattended manner; thus PDF output is not yet available.
Machine generation and readability. Because documents are literally written in a full Scheme environment, the system is also well suited to documents with large amounts of generated content, such as register manuals. You can use Scheme to import arbitrary machine-readable data from any source and trivially transform it into SXML. Moreover, since dedoc can be used to produce high-quality, semantic XML documents, the resulting documents are easily machine-read to extract the embedded information. Why have people read through a 10,000-page PDF of registers when you can provide them with an XML file which is both viewable in a web browser and machine-intelligible, allowing automated processing of all register definitions? You can use either XHTML directly with embedded domain-specific semantic annotations, or a domain-specific XML representation which is rendered viewable in a web browser via an attached XSLT or CSS stylesheet.
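A minimal sketch of this idea, assuming register data is already available as a Scheme list: each entry is mapped to one SXML element. The data layout, the registers and register element names, and the register->sxml procedure are all invented for this example; none of them come from dedoc or DEDOC.

```scheme
(use-modules (ice-9 match))

;; Hypothetical machine-readable input: (name byte-offset description).
(define register-data
  '(("CTRL"   0 "Control register")
    ("STATUS" 4 "Status register")))

;; Turn one entry into an SXML element with semantic attributes.
;; The element and attribute names are assumptions for this sketch.
(define (register->sxml entry)
  (match entry
    ((name offset description)
     `(register (@ (name ,name)
                   (offset ,(string-append "0x" (number->string offset 16))))
                ,description))))

;; The whole register map is just the mapped list spliced into a parent element.
(define register-map
  `(registers ,@(map register->sxml register-data)))
```

Because the result is ordinary SXML, the same tree can be serialised for readers or walked again by a program that needs the register definitions.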
Generalisable to other schemas. Though DEDOC is the reference schema for dedoc, the principles used are fully generalisable, and dedoc.scm amounts to little more than a handful of utility functions and macros which make writing SXML slightly more ergonomic. Thus, the entire system could readily be adapted for use with any other schema.
Small size. dedoc is tiny. The core of it is a very small amount of ergonomic glue which simply makes writing SXML for the target schema (DEDOC) slightly easier. You can consider it a demonstration of the power of Scheme.
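To give a sense of how little glue this kind of approach needs, here is a sketch of a helper that turns any element name into a constructor function. It is not dedoc.scm's actual code; element-constructor, para and emphasis are hypothetical names, and the same pattern would apply to whichever schema you target.

```scheme
;; Hypothetical helper: return a procedure that builds an SXML element
;; named TAG from whatever children it is given.
(define (element-constructor tag)
  (lambda children
    (cons tag children)))

;; Constructors for two made-up elements of some target schema.
(define para (element-constructor 'para))
(define emphasis (element-constructor 'emphasis))

;; Ordinary function calls now produce SXML.
(define example
  (para "This is " (emphasis "very") " little code."))
```

Here example evaluates to (para "This is " (emphasis "very") " little code."), which is plain SXML that any serialiser or transformer can consume.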
Checked lexical references. Because documents are written in Scheme, you can ergonomically reference other objects inline in text via lexical references. For example, when using DEDOC, terminology (as might be found in a definitions section of a specification) is defined like so:
;; Define a term. This can be referenced lexically. All terms are collated
;; in a list inside the Scheme environment, and can be mapped over
;; programmatically to automatically populate a Definitions section in the
;; output. The use of lexical references to terms ensures integrity of
;; references and allows the output to contain hyperlinks to definitions, etc.
(dt table "table" "A table is a surface on which objects can be placed.")

;; Define a proword. This is similar to a term but it is usually used
;; to express a normative requirement. RFCs have prowords such as
;; MUST, MUST NOT, SHOULD, SHOULD NOT, MAY, etc. ISO standards have
;; prowords such as "shall", "shall not", etc.
(dpword must "MUST")

;; Paragraph referencing the term "table" and the proword "MUST" lexically.
;; Note that the quotes around "table" and "MUST" here are actually ending a
;; quoted-string then beginning a new one; Scheme does not require any spaces here.
;; Simply make your eyes forget you're in a quoted string, and accept the convention
;; of placing terms and prowords in quotes to refer to them. It's surprisingly ergonomic.
(p "A "table" "MUST" have four legs.")
Note that this is not a special parsing environment. dt, dpword and p are simply Scheme macros or functions which define items in the Scheme lexical environment.
In the example above, table and MUST are simply lexical references in the Scheme environment to objects which have been defined in that environment via the previous dt and dpword macro invocations.
Note that unlike some other languages, Scheme does not require you to place spaces between quoted strings and other tokens. This allows you to form the habit of simply putting all terminology in quotes to reference it (a style which is reminiscent of that used by some contract lawyers when referencing terminology). Although in actuality what you're doing here is ending a string constant, referencing an object and then starting a new string constant, it's easy to localise your vision to make your eyes “forget” this and act as though you're actually opening and closing a semantic construct, rather than closing and opening one.
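The mechanism can be illustrated with a small stand-alone macro. This is only a sketch of the general technique, not how dt is implemented in dedoc.scm: define-term, all-terms and the termref element are hypothetical names invented here.

```scheme
;; Hypothetical registry of every term defined so far.
(define all-terms '())

;; Hypothetical stand-in for a term-defining macro: binds ID to an SXML
;; reference node and records the term so that a definitions section
;; could later be generated by mapping over all-terms.
(define-syntax define-term
  (syntax-rules ()
    ((_ id label definition)
     (begin
       (define id
         `(termref (@ (to ,(symbol->string 'id))) ,label))
       (set! all-terms
             (append all-terms (list (list 'id label definition))))))))

(define-term table "table"
  "A table is a surface on which objects can be placed.")

;; The string ends before table and a new one starts after it, so the
;; variable's value is spliced between the two string pieces.
(define sentence
  (list 'para "A "table" has four legs."))
```

Here sentence evaluates to (para "A " (termref (@ (to "table")) "table") " has four legs."). Misspelling the term, for example writing tabel, refers to a variable which was never defined, so Guile reports an unbound variable instead of silently emitting a dangling reference.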
Scheme runtime environment. The full power of Scheme can be used to produce generated output. A dedoc document is simply a Scheme program which, when executed, outputs XML.
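In its simplest form, such a program might look like the following sketch, which builds a tree and serialises it with sxml->xml from Guile's (sxml simple) module. The element names doc, title and para are placeholders for this example rather than DEDOC elements, and a real dedoc document would normally use the helpers from dedoc.scm instead of raw quoted lists.

```scheme
(use-modules (sxml simple))

;; Hypothetical document body; element names are assumptions.
(define (document)
  '(doc (title "Example")
        (para "Hello, world.")))

;; Running the program writes the XML to standard output.
(sxml->xml (document))
(newline)
```

Saved to a file and run with guile, this prints the serialised document, which can then be redirected to a file or piped into further tooling.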
dedoc is intended to be used with Guile Scheme, though it's possible it could be ported to other Schemes.
dedoc-methods. dedoc comes with a collection of typesetting methods known as dedoc-methods. Methods are split into different tiers depending on the priority they are given in terms of support and maintenance. The highest tier is Tier 1, which includes:
- xhtml-single-xsl1, which produces a single XHTML file from DEDOC XML using an XSLT1 transform. Uses xsltproc.
- epub-xsl1, which produces an EPUB file from DEDOC XML using the xhtml-single-xsl1 method and a subsequent transform. Requires zip.
- mdoc-xsl1, which produces an mdoc-format roff-style file from DEDOC XML using an XSLT1-based transform. Requires xsltproc and XMLStarlet.
Tier 2 currently includes:
- txt-mandoc-mdoc-xsl1, an example of generating plain text (tier 1), PostScript, PDF or XHTML (tier 2†) output using the mandoc(1) program.
- txt-roff-mdoc-xsl1, an example of generating plain text (tier 1), PostScript, PDF or XHTML (tier 2†) output using the groff(1) program.
†The mandoc(1) and groff(1) workflows are primarily intended to be used for their plain text output. Since they are also capable of producing PostScript, PDF and (X)HTML output, this functionality is also exposed for demonstration purposes; however, this output will not be as high quality as that of the dedicated methods above.
Current usage. dedoc is used to produce the documentation for acmetool.
To get started, try the tutorial or examine some of the example documents. For a more elaborate example of dedoc's output, see the acmetool manual.