
Generate Plain Vanilla HTML from Org Mode

Introduction

Summary

This page explains

Issues

Org-mode comes with out-of-the-box HTML export functionality (alongside other supported output formats). The default export settings generate the following artefacts along with the core content:

Usually, I do not need these additional artefacts.

Furthermore, there are export settings that you might want to control:

Approach

I took the following approach when dealing with these issues

init.el Customisations

I needed to put a few customisations in init.el:

'(org-export-allow-bind-keywords t)

This setting is important, as it enables in-buffer settings beyond the standard in-buffer variables.

'(org-html-extension "htm")

For legacy reasons, I still use .htm rather than .html. At least in Emacs 25.1, this could not be set with an in-buffer setting.

'(org-html-text-markup-alist
  '((bold . "<strong>%s</strong>")
    (code . "<code class=\"c\">%s</code>")
    (italic . "<em>%s</em>")
    (strike-through . "<del>%s</del>")
    (underline . "<span class=\"underline\">%s</span>")
    (verbatim . "<code class=\"v\">%s</code>")))

This is how I choose to map the core inline text markup to HTML elements.
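The quoted entries above have the shape of entries inside a custom-set-variables form, as presumably written by the Customize interface. For readers who prefer plain assignments in init.el, a minimal sketch of the same three settings using setq might look like the following. It is an illustrative alternative and assumes the settings are evaluated before the first export.

```elisp
;; Illustrative alternative to the Customize entries above.
;; Load the HTML export back-end first so the variables are defined.
(require 'ox-html)

;; Allow #+BIND: keywords in org-mode files to take effect during export.
(setq org-export-allow-bind-keywords t)

;; Write exported files with the extension .htm instead of .html.
(setq org-html-extension "htm")

;; Map org-mode inline markup to HTML elements.
(setq org-html-text-markup-alist
      '((bold . "<strong>%s</strong>")
        (code . "<code class=\"c\">%s</code>")
        (italic . "<em>%s</em>")
        (strike-through . "<del>%s</del>")
        (underline . "<span class=\"underline\">%s</span>")
        (verbatim . "<code class=\"v\">%s</code>")))
```

Mixing setq with a Customize-managed entry for the same variable can be confusing, so it is probably best to pick one style per variable.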

In-Buffer Settings

Org Manual 13.18.1 - Exporting to minimal HTML

Section 13.18.1 of the official org-mode manual gives advice on how to achieve a "minimal HTML" export. I list here the respective lines to be put at the top of each org-mode file that is to be exported as HTML:

#+BIND: org-html-head-include-default-style nil

No CSS styles in the HTML, because I add a link to a central CSS style sheet in a later step, using XML/XSLT technology (see trgensit).

#+BIND: org-html-head-include-scripts nil

No JavaScript code. I only use static web sites that do not need JavaScript.

#+BIND: org-html-preamble nil

No preamble. I do this in a separate step using XML/XSLT technology (see trgensit).

#+BIND: org-html-postamble nil

No post-amble. I do this in a separate step using XML/XSLT technology (see trgensit).

#+BIND: org-html-use-infojs nil

This line from the manual is not needed, because the info.js JavaScript code only ends up in the export file when an #+INFOJS_OPT: line switches it on.

Note that, of the above options, setting org-html-head-include-scripts to nil is essential, because otherwise the subsequent XSLT transformation will fail. The emitted JavaScript code contains a line that xsltproc complains about (org-mode version 9.4.4):

parser error : EntityRef: expecting ';'
// @license magnet:?xt=urn:btih:e95b018ef3580986a04669f1b5879592219e2a7a&dn=publ
                                                                          ^

The parser takes the string &dn to be the start of an entity reference, which then lacks the terminating ;.

My own preferences

I put some more configuration lines at the top of the file:

#+OPTIONS: num:nil toc:nil ^:{} H:4 tags:nil
#+BIND: org-export-with-creator nil

No creator sentence in the post-amble.

#+BIND: org-html-toplevel-hlevel 1

The #+TITLE metadata becomes an HTML title element. In addition, the org-mode exporter emits a special first h1 heading based on the #+TITLE content. As a consequence, by default all top-level org-mode headings (one "*" star) are output as h2.

I want to remove the extra h1 due to the title meta data. I have not found a way to switch it off by org-mode means, hence I need to post-process the HTML.

With this setting, the one-star headings become h1, and I can decide the heading hierarchy on a case-by-case basis.

#+HTML_DOCTYPE: xhtml5

The default is xhtml-strict, which is a bit more verbose than xhtml5.

#+BIND: org-html-mathjax-template ""

This suppresses further JavaScript code, which was introduced in Emacs 26.1 and is related to LaTeX. Note that the value has to be an empty string; nil would cause an error.

Putting it all together using #+SETUPFILE:

Rather than putting all these #+... lines at the top of every .org file, I have a central file mysetup.org containing these lines. Thus, each .org file just needs to have this first line:

#+SETUPFILE: ~/mysetup.org

This allows for easy central maintenance.

If you need a special setting for a particular file, you can still put this setting after the #+SETUPFILE... line, and it will take precedence.

With these settings, I get quite close to a plain vanilla HTML file fitting my requirements. A few issues remain:

XSLT Style-Sheet Processing to "Amend" the HTML

To address the issues mentioned above, I have written a small elisp package, tr-org-html.el. It contains a function tr-org-html-export-to-html that calls the standard org-mode HTML export function and then runs an XSLT transformation to arrive at a final, "amended" HTML file.

The XSLT transformation is done using the xsltproc executable. xsltproc takes two inputs, an XSLT style sheet, which I named tr-org-html-trnsfrm.xsl, and the HTML file that org-mode has generated, and produces a new, final HTML file.
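The two-step flow described above can be sketched roughly as follows. This is a hypothetical illustration, not the code from tr-org-html.el: the names my/xsltproc-args and my/org-export-and-transform, the "-final" suffix of the output file and the *xsltproc* log buffer are all assumptions made for the example. It relies only on the standard invocation form xsltproc -o OUTPUT STYLESHEET INPUT.

```elisp
;; Hypothetical sketch of "export, then transform"; not the author's package.
(require 'ox-html)

(defun my/xsltproc-args (stylesheet input output)
  "Return the xsltproc arguments that transform INPUT with STYLESHEET into OUTPUT."
  (list "-o" output stylesheet input))

(defun my/org-export-and-transform (stylesheet)
  "Export the current org-mode buffer to HTML, then run xsltproc with STYLESHEET.
Return the name of the transformed file."
  (interactive "fXSLT style sheet: ")
  (let* (;; The standard exporter returns the name of the file it wrote.
         (exported (expand-file-name (org-html-export-to-html)))
         ;; Assumed naming scheme: page.htm becomes page-final.htm.
         (final (concat (file-name-sans-extension exported) "-final."
                        (file-name-extension exported)))
         ;; Run xsltproc synchronously; its messages go to buffer *xsltproc*.
         (status (apply #'call-process "xsltproc" nil "*xsltproc*" nil
                        (my/xsltproc-args (expand-file-name stylesheet)
                                          exported final))))
    (unless (eq status 0)
      (error "xsltproc failed with status %s; see buffer *xsltproc*" status))
    final))
```

Keeping the argument list in a separate small function makes it possible to check the command line without running xsltproc, for example (my/xsltproc-args "s.xsl" "in.htm" "out.htm") evaluates to ("-o" "out.htm" "s.xsl" "in.htm").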

I publish these files in ./trorghtml.zip under the GNU public license; please see the file COPYING.txt in the ZIP file. I accept no responsibility and give no warranty for the use of this software or of this write-up.

Addressing the namespace issue required special gymnastics in the XSLT style sheet. I needed to do some online research, as the XSLT 1.0 documentation is silent about this issue. The issue seems to be explicitly addressed in XSLT 2.0.

Installation

Extract the files in ./trorghtml.zip and copy the files tr-org-html.el and tr-org-html-trnsfrm.xsl to a directory in Emacs' load-path, e.g. /usr/share/emacs/site-lisp/. I chose (Windows 10): C:\Users\username\OneDrive\myPrograms\emacs-27.2\share\emacs\site-lisp\.

The files tr-org-html.el and tr-org-html-trnsfrm.xsl must be in the same directory.

Your init.el needs to contain the line:

(require 'tr-org-html)

Requirements

Beyond standard Emacs, you need xsltproc in your path. xsltproc seems to be available for all Linux distributions and also for MS Windows. I trust it is also available for macOS.
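Whether Emacs can actually find xsltproc can be checked from within Emacs itself. The following is a minimal sketch; the function name my/xsltproc-path is made up for this example. executable-find searches the directories in exec-path, which Emacs normally initialises from the PATH environment variable.

```elisp
;; Hypothetical helper: report where Emacs finds xsltproc, if anywhere.
(defun my/xsltproc-path ()
  "Return the absolute file name of xsltproc, or nil if Emacs cannot find it."
  (executable-find "xsltproc"))

(message "xsltproc: %s" (or (my/xsltproc-path) "not found in exec-path"))
```

If the message reports that xsltproc was not found although it is installed, the directory containing it is probably missing from exec-path.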

Bugs

If the #+TITLE metadata is missing, the XSLT transformation fails, because the exporter then emits the entity &lrm; (LEFT-TO-RIGHT MARK) as the content of the title element.

History

As I write this (2023-04), I am using Emacs v27.2 under MS Windows 10, as 27.2 seems to be the last version to support 32-bit Windows 10.

Before that, I used Emacs v25.1 and addressed the issues mentioned above by coding a "derived" org-mode HTML exporter. This served me well for some years, but the derived exporter no longer worked under Emacs v27.2.


Last change: 2023-11-24
© 2002-2023 Dr. Thomas Redelberger redethogmx.de
