23 KiB
Org Glossary
Introduction
Summary
Org Glossary defines a flexible model for working with glossary-like constructs
(glossaries, acronyms, indices, etc.) within Org documents, with support for
in-buffer highlighting of defined terms and high-quality exports across all ox-*
backends.
Quickstart
To define a glossary entry, simply place a top-level heading in the document
titled Glossary
or Acronyms
and therein define terms using an Org definition
list, like so:
* Glossary - Emacs :: A lisp-based generic user-centric text manipulation environment that masquerades as a text editor. - Org mode :: A rich and versatile editing mode for the lovely Org format.
Then simply use the terms as you usually would when writing. On export Org Glossary will automatically:
- Pick up on the uses of defined terms
- Generate a Glossary/Acronym section at the end of the document
- Link uses of terms with their definitions, in a backend-appropriate manner (e.g. hyperlinks in html)
- Give the expanded version of each acronym in parenthesis when they are first used (e.g. "PICNIC (Problem In Chair, Not In Computer)")
To generate an Index for certain terms, you can almost do the same thing, just
use an Index
heading and use a plain list of terms, e.g.
* Index - org-mode
To see how this all works, try exporting the following example with Org Glossary installed:
Try using Org Glossary for all your glosses, acronyms, and more within your favourite ML with a unicorn mascot. It attempts to provide powerful functionality, in keeping with the simplicity of the Org ML we all know and love. * Glossary - glosses :: Brief notations, giving the meaning of a word or wording in a text. * Acronyms - ML :: Markup Language * Index - unicorn
If you'd like to see a visual indication of term uses while in org-mode, call
M-x org-glossary-mode
.
Design
In large or technical documents, there's often a need for an appendix clarifying terms and listing occurrences; this may take the form of a glossary, index, or something else. Org Glossary abstracts all of these glossary-like forms into tracked generated text replacements. Most common structures fit into this abstraction like so:
- Search for definitions of
$term
- Replace all uses of
$term
withf($term)
- Generate a definition section for all used terms, linking to the uses
Out of the box, four glossary-like structures are configured:
- Glossary
- The term is transformed to the same text, but linking to the definition.
- Acronyms
- The first use of the term adds the definition in parentheses, and subsequent uses simply link to the definition (behaving the same as glossary terms).
- Index
- The term is unchanged (the entire purpose of the index is achieved via step 3. alone).
- Text Substitutions
- The term is replaced with its definition.
For more details on how this works, see /tec/org-glossary/src/commit/25fcf67e6443f4edb966e2debdd77f9f13dfb64b/Structure%20of%20an%20export%20template%20set.
There is a little special-cased behaviour for indexes (usage detection) and text substitution (fontification), but it is kept to a minimum and ideally will be removed via generalisation in future.
Usage
Defining terms
Placement of definitions
Definitions must be placed under one of the specially named headings listed in
org-glossary-headings
, by default:
* Glossary * Acronyms * Index * Text Substitutions
If org-glossary-toplevel-only
is non-nil, then these headlines must also be
level one headings. If it is nil, then they are recognised wherever they occur
in the document. Note that when using subtree export and non-nil
org-glossary-toplevel-only
, only level-1 headings in the widened document will
be recognised (i.e. it behaves the same as non-subtree export).
External definition sources
Org Glossary supports searching for term definitions in other =#+include=d files,
respecting the various restrictions such as headings and line number ranges. You
may also specify include paths providing definitions that should be globally
available via org-glossary-global-terms
.
If you maintain a set of common term sources you may want to use, instead of
#+include=ing them, you can make use of the convenience keyword
=#+glossary_sources
.
The value of #+glossary_sources
is split on spaces and to form a list of
locations. Each location is appended to org-glossary-collection-root
to form the
fully qualified location. These locations are then =#+include=d.
For example, if org-glossary-collection-root
is set to a folder where a number
of individual definition files are places, one could then conveniently use a few with:
#+glossary_sources: abbrevs physics.org::*Quantum foo bar.org
This would be equivalent to:
#+include: COLLECTION-ROOT/abbrevs.org #+include: COLLECTION-ROOT/physics.org::*Quantum :only-contents t #+include: COLLECTION-ROOT/foo.org #+include: COLLECTION-ROOT/bar.org
You could also set to an individual file with the beginning of a heading
specification, say file.org::*
. This would allow you to have all the terms
defined in one file and include groups by heading.
Not that sources with heading/custom-id searches will automatically have
:only-contents t
added (as seen in the example). This allows for named headings
with glossary subheadings to work when org-glossary-toplevel-only
is set.
Basic definitions
Org already has a very natural structure for term-definition associations, description lists. Term definitions are extracted from all non-nested description lists within the glossary heading, other elements are simply ignored.
For example, to define "late pleistocene wolf" you could use a description list entry like so:
- late pleistocene wolf :: an extinct lineage of the grey wolf, thought to be the ancestor of the dog
which is an instance of the basic structure,
- TERM :: DEFINITION
Advanced definitions
When giving a simple definition like automaton :: A thing or being regarded as
having the power of spontaneous motion or action
, Org Glossary will actually
make a few assumptions.
- Your wish to refer to the term
automaton
withautomaton
- There is also a plural form, guessed by calling
org-glossary-plural-function
, in this case resulting inautomata
, and you wish to refer to the plural form withautomata
.
This is equivalent to the following "full form",
- automaton,automata = automaton,automata :: A thing or being regarded as having the power of spontaneous motion or action
which is an instance of the full structure,
- SINGULAR KEY, PLURAL KEY = SINGULAR FORM, PLURAL FORM :: DEFINITION
This may seem overly complicated, but unfortunately irregular plurals and homographs exist. Here are some examples of where this functionality comes into play:
- eveningtime=evening :: The latter part of the day, and early night. - eveninglevel=evening :: To make more even, to become balanced or level.
Here we wish to clarify different uses of the same term "evening", and so define unique keys for each usage. In writing you would use the keys like so,
In the eveningtime I take to eveninglevel out the sand pit.
Let us now consider both irregular plurals and defective nouns.
- ox, oxen :: A male bovine animal. - sheep, :: A domesticated ruminant mammal with a thick wooly coat. - glasses, :: An optical instrument worn to correct vision.
In the case of "ox, oxen" we give the irregular plural form explicitly. "Sheep" is also an irregular plural and by just putting a comma but omitting the plural form no plural form will be generated (it will be treated as a singularia tantum). The same behaviour occurs with "glasses", and while it is a plurale tantum internally it will be represented as a singularia tantum, but the behaviour is identical and so this is fine.
Alias terms
Sometimes a term may be known by multiple names. Such a situation is supported by the use of "alias terms", who's definition is simply the key of the canonical term.
This is best illustrated through an example, for which we will visit the field of molecular biology.
- beta sheet :: Common structural motif in proteins in which different sections of the polypeptide chain run alongside each other, joined together by hydrogen bonding between atoms of the polypeptide backbone.
The beta sheet may also be referred to using the greek letter β instead of "beta", or as the "beta pleated sheet". We can support these variants like so:
- \beta sheet :: beta sheet - beta pleated sheed :: beta sheet - \beta-pleated sheet :: beta sheet
Since the definition of each of these terms is an exact match for "beta sheet", they will be recognised as an alias for that term.
Categorisation
To make working with a large collection of terms easier, you might use sub-headings, e.g.
* Glossary ** Animals - late pleistocene wolf :: an extinct lineage of the grey wolf, thought to be the ancestor of the dog - ox, oxen :: A male bovine animal. - sheep, :: A domesticated ruminant mammal with a thick wooly coat. ** Technology - Emacs :: A lisp-based generic user-centric text manipulation environment that masquerades as a text editor. - glasses, :: An optical instrument worn to correct vision.
This structure will be ignored on export, allowing you to structure things
freely without worrying about how it will affect the export. Should you wish to
split up the exported entries into categories, this can be accomplished by using
subheadings with the :category:
tag. You can nest category-tagged subheadings
inside each other, but only the innermost category will be applied.
* Glossary ** Animals :category: ** Technology :category: *** Text Editors :category: *** Mechanical :category:
Using terms
Org Glossary presumes that you'll want to associate a defined term with every usage of it. As such, on export it scans the document for all instances of a defined term and transforms them into one of the four glossary link types:
gls
, singular lowercaseglspl
, plural lowercaseGls
, singular sentence caseGlspl
, plural sentence case
To switch from implicit associations to explicit, set org-glossary-automatic
to
nil
and then only gls=/=glspl=/=Gls=/=Glspl
links will be picked up. To convert
implicit associations to explicit links, you can run M-x
org-glossary-apply-terms
(if nothing happens, try running M-x
org-glossary-update-terms
first).
Note that as Org Glossary relies on links, recognised usages can only occur in
places where a link is appropriate (i.e. not inside a source block, verbatim
text, or another link, etc.). The variable org-glossary-autodetect-in-headings
determines whether terms in headings are automatically linked. By default, this
is off and headings are ignored, since this behaviour is generally seen as
undesirable.
In addition to all this, there's a bit of special behaviour for indexing. As
you can discuss a topic without explicitly stating it, we support
ox-texinfo
-style #+[cfkptv]?index
keywords. For example:
#+index: penguin The Linux operating system has a flightless, fat waterfowl (affectionately named Tux) as its mascot. * Index - penguin
Printing definition sections
When exporting a document, all identified glossary headings are unconditionally stripped from the document. If nothing else is done, based on term usage definition sections will be generated and appended to the document.
Fine grained control over the generation of definition sections is possible via
the #+print_glossary:
keyword, which disables the default "generate and append to
document" behaviour.
Simply inserting a #+print_glossary:
keyword will result in the default
generated definition sections being inserted at the location of the
#+print_glossary:
keyword. However, customisation of the behaviour is possible
via a number of babel-style :key value
options, namely:
:type
(glossary acronym index
by default), the specific glossary-like structures that definition sections should be generated for-
:level
(0
by default), both:- The scope in which term uses should be searched for, with 0 representing the whole document, 1 within the parent level-1 heading, 2 the parent level-2 heading, etc.
- One less than the minimum inserted heading level.
:consume
(nil
by default), ift
oryes
then marks terms defined here as having been defined, preventing them from being listed in any other#+print_glossary:
unless:all
is set tot
oryes
.:all
(nil
by default), behaves as just described in:consumed
.:only-contents
(nil
by default), ift
oryes
then the:heading
(from the export template) is excluded from the generated content.
Putting this all together, the default #+print_glossary:
command written out in
full is:
#+print_glossary: :type glossary acronym index :level 0 :consume no :all no :only-contents no
The minor mode
A visual indication of defined terms instances is provided by the minor mode
org-glossary-mode
. This essentially performs two actions:
- Run
org-glossary-update-terms
to update an buffer-local list of defined terms - Add some fontification rules to make term uses stand out.
The local list of defined terms and fontification allow for a few niceties, such as:
- Showing the term definition in the minibuffer when hovering over a fontified use
- Calling
M-x org-glossary-goto-term-definition
or clicking on a fontified use to go to the definition M-x org-glossary-insert-term-reference
to view the list of currently defined terms, and perhaps insert a use.- In the case of Text Substitutions, displaying the replacement text on top of
the use, when
org-glossary-display-substitute-value
is non-nil.
Export configuration
Setting export parameters
The content generated for export is governed by templates defined in
org-glossary-export-specs
. We will discuss them in detail shortly, but for now
we consider that in different situations we will want different generated
content. There are two levels on which this applies:
- By export backend
- By the type of glossary-like structure (Glossary, Acronyms, Index, etc.)
This is accounted for by creating an alist of alists of templates. This is a bit of a mouthful, so let's unpack what exactly is going on.
First, we create associations between export backends and specs, with the
special "backend" t
as the default value, i.e.
((t . DEFAULT-TEMPLATE-SET) (html . HTML-TEMPLATE-SET) (latex . LATEX-TEMPLATE-SET) ...)
When selecting the appropriate template set, we actually check each entry
against the current export backend using org-export-derived-backend-p
(in
order). This has two implications:
- You can export to derived backends (e.g. beamer) and things should just work
- If specifying a template set for a derived backend (e.g.
beamer
) be sure to put it before any parent backends (i.e.latex
, inbeamer
's case) inorg-glossary-export-specs
to ensure it is actually used.
The backend-appropriate template set is itself an alist of templates, like so:
((t . TEMPLATE) (glossary . TEMPLATE) (acronym . TEMPLATE) (index . TEMPLATE))
Once again, t
gives the default value. For each of the types listed in
org-glossary-headings
, the template is filled out, pulling first from the
backend-specific defaults template, then the global defaults. This gives a
complete template set which governs the export behaviour for each type of
glossary-like structure for the current backend.
Structure of an export template set
The export of term uses and definitions is governed by template sets. The
default template set is given by (alist-get t (alist-get t
~org-glossary-export-specs))
, the default value of which is given by the
following property list:
(:use "%t" :first-use "%u" :definition "%t" :backref "%r" :heading "" :category-heading "* %c\n" :letter-heading "*%L*\n" :definition-structure-preamble "" :definition-structure "*%d*\\emsp{}%v\\ensp{}%b\n")
Each property refers to a particular situation, and the value is either:
- A format string that represents the content that should be used
- A function with the same signature as
org-glossary--export-template
, that generated the replacement content string.
The :use
, :first-use
, :definition
, and :backref
properties are applied during
backend-specific content transcoding (i.e. using the syntax of the backend's
output), while :definition-structure
, :category-heading
, and :letter-seperator
are applied to a copy of the Org document just prior to the backend-specific
export process (and so should be written using Org syntax).
The format strings can make use of the following tokens:
%t
, the term being defined/used. This is pluralised and capitalised automatically based on the link type (gls=/=glspl=/=Gls=/=Glspl
).%v
, the term definition value.%k
, the term key.%K
, the term key buffer-local nonce (number used only once). This will only be consistent within a particular Emacs session.%l
, the first letter of the term, in lower case.%L
, the first letter of the term, in upper case.%r
, the term reference index (only applicable to:use
and:first-use
).%n
, the number of times the term is used/referenced.%c
, the term category.%u
, the result of:use
(primarily intended for convenience with:first-use
)%d
, the result of:definition
(only applicable to:definition-structure
)%b
, all the:backref
results joined with", "
(only applicable to:definition-structure
).
The :definition-structure-preamble
and :heading
parameters are literal strings
also inserted to the copy of the Org document just prior to backend-specific
export stages.
To illustrate how these properties come into play, the following example uses the property names in place of their generated content.
Here's some text and now the term :first-use, if I use the term again it is now :use. Once more, :use. Now we have the appendix with glossary-like definitions. :heading :category-heading :letter-heading :definition-structure-preamble :definition-structure(:definition def-value :backref)
To avoid superfluous letter headings (i.e. not helpful), we have
org-glossary-print-letter-minimums
. This variable specifies a threshold minimum
number of distinct initial term letters and terms with the same letter before
the :letter-heading
template should be inserted.
If :heading
, :category-heading
, or :letter-heading
start with "* "
then
asterisks will be automatically prefixed to set the headings to an appropriate
level.
Creating a new glossary type
Let's consider a few examples. To start with, say we want to be able to define
indexed terms under the heading Indices
instead of Index
. To accomplish this,
all you need to do is add an entry to org-glossary-headings
, which can be done
via the customisation interface or with the following snippet:
(customize-set-value 'org-glossary-headings (cl-remove-duplicates (append org-glossary-headings '(("Indices" . index)))))
Should we actually want to have this be reflected in the export, we could either:
- Rename the
index
heading to* Indices
, or - Create a near-copy of
index
, just changing the heading
In the first case, all we need to do is execute the following snippet.
(org-glossary-set-export-spec t 'index :heading "* Indices)
Should we actually want to have this be reflected in the export, instead of
associating Indices
with the pre-defined index term we would first add an
("Indices" . indicies)
pair to org-glossary-headings
(as before).
Then, we can copy each index
template currently in org-glossary-export-specs
and
simply update the default :heading
as we've just done for index
.
(dolist (template-set org-glossary-export-specs) (when-let ((index-template (alist-get 'index (cdr template-set)))) (push (cons 'indices index-template) (cdr template-set)))) (org-glossary-set-export-spec t 'indices :heading "* Indices)
For our final example, let's say we wanted to add support for Abbreviations
.
This works in much the same way as Acronyms, just with shortened forms of words
or phrases not constructed from the first letters. After adding an
("Abbreviations" . abbreviation)
pair to org-glossary-headings
in the same
manner as earlier, this is as simple as:
(push '(abbreviation :heading "* Abbreviations" :first-use "%v (%u)") (plist-get t org-glossary-export-specs))
Tweaking specific exports
Instead of overwriting org-glossary-export-specs
, it is recommended that you
instead make use of setcdr
or plist-put
like so:
(org-glossary-set-export-spec 'latex t :backref "gls-%k-use-%r" :backref-seperator "," :definition-structure "*%d*\\emsp{}%v\\ensp{}@@latex:\\ifnum%n>0 \\labelcpageref{@@%b@@latex:}\\fi@@\n")
In this example we could alternatively set :definition-structure
to a function
to avoid the \ifnum%n>0
LaTeX switch.
(org-glossary-set-export-spec 'latex t :definition-structure (lambda (backend info term-entry form &optional ref-index plural-p capitalized-p extra-parameters) (org-glossary--export-template (if (plist-get term-entry :uses) "*%d*\\emsp{}%v\\ensp{}@@latex:\\labelcpageref{@@%b@@latex:}@@\n" "*%d*\\emsp{}%v\n") backend info term-entry ref-index plural-p capitalized-p extra-parameters)))
This allows for any change in other backends or the defaults you're not particularly attached to from freely updating.
Adding a new export backend
Adding a new export spec is as easy as pushing a spec list to
org-glossary-export-specs
, for example should we want to add an ox-md
backend we
could do this:
(push '(md (t :use "[%t](#gls-%K)" :definition "%t {#gls-%K}" :definition-structure "%d\n\\colon{} %v [%n uses]\n")) org-glossary-export-specs)
We need to remember to use \colon{}
instead of :
to avoid it being interpreted
as Org fixed-width syntax.
Alternatively, we could use org-glossary-set-export-spec
, which has the
advantage of being idempotent, and I would argue a little clearer.
(org-glossary-set-export-spec 'md t :use "[%t](#gls-%K)" :definition "%t {#gls-%K}" :definition-structure "%d\n\\colon{} %v [%n uses]\n")