
Generate Plain Vanilla HTML from Org Mode

Introduction

Summary

This page explains

Issues

Org-mode comes with out-of-the-box HTML export functionality (alongside other supported output formats). The default export settings generate the following artefacts along with the core content:

Usually, I do not need these additional artefacts.

Furthermore, there are export settings that you might want to control:

Approach

I took the following approach to deal with these issues.

init.el Customisations

I needed to put a few customisations in init.el:

'(org-export-allow-bind-keywords t)

This setting is important, as it enables #+BIND: keywords, which set export variables from within the buffer, beyond the standard in-buffer options.

'(org-html-extension "htm")

For legacy reasons, I still stick to .htm instead of .html. At least in Emacs v25.1, this could not be set by an in-buffer setting.

 '(org-html-text-markup-alist
   '((bold . "<strong>%s</strong>")
     (code . "<code class=\"c\">%s</code>")
     (italic . "<em>%s</em>")
     (strike-through . "<del>%s</del>")
     (underline . "<span class=\"underline\">%s</span>")
     (verbatim . "<code class=\"v\">%s</code>")))

This is my choice of how to map some core inline text semantics to HTML.
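A quick way to check such a mapping is to export a small string with `org-export-string-as`, a standard function of Org's export framework. The sketch below assumes Org 9.x with `ox-html`; it let-binds the alist instead of relying on init.el, so it also runs in a fresh `emacs --batch` session. The helper name `my-org-markup-demo` is hypothetical.

```elisp
;; Sketch: show what the custom `org-html-text-markup-alist' produces.
;; Assumes Org 9.x; `my-org-markup-demo' is a hypothetical helper name.
(require 'ox-html)

(defun my-org-markup-demo (org-text)
  "Export ORG-TEXT to an HTML body fragment using custom inline markup."
  (let ((org-html-text-markup-alist
         '((bold . "<strong>%s</strong>")
           (code . "<code class=\"c\">%s</code>")
           (italic . "<em>%s</em>")
           (strike-through . "<del>%s</del>")
           (underline . "<span class=\"underline\">%s</span>")
           (verbatim . "<code class=\"v\">%s</code>"))))
    ;; BODY-ONLY = t: return only the converted content, no <head> etc.
    (org-export-string-as org-text 'html t)))

;; Usage: ~code~ should map to class "c", =verbatim= to class "v".
(my-org-markup-demo "*bold* /italic/ ~code~ =verb=")
```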

In-Buffer Settings

Org Manual - Exporting to minimal HTML

The official org-mode manual gives advice on how to achieve "minimal HTML" export in section 13.18.1 (org version 9.5) or section 13.9.5 (org version 9.7). You can achieve this with the following in-buffer options line:

#+OPTIONS: html-postamble:nil html-preamble:nil html-scripts:nil html-style:nil

I need neither the preamble nor the postamble, because for my web site I add those in a separate step using XML/XSLT technology (see trgensit).

I do not need in-line CSS styles in the HTML, because I add a link to a central CSS style sheet in a later step, using XML/XSLT technology (see trgensit).

I only use static web sites that do not need JavaScript.
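The four keys of this options line correspond to `ox-html` variables (`org-html-preamble`, `org-html-postamble`, `org-html-head-include-scripts`, `org-html-head-include-default-style`), so the same effect could also be configured globally. The sketch below, assuming Org 9.x, let-binds them for a single full-document export merely to show the mapping; `my-org-minimal-html` is a hypothetical helper name.

```elisp
;; Sketch: global-variable equivalents of
;;   #+OPTIONS: html-postamble:nil html-preamble:nil html-scripts:nil html-style:nil
;; Assumes Org 9.x; `my-org-minimal-html' is a hypothetical helper name.
(require 'ox-html)

(defun my-org-minimal-html (org-text)
  "Export ORG-TEXT to a complete HTML document without the extras."
  (let ((org-html-preamble nil)                     ; html-preamble:nil
        (org-html-postamble nil)                    ; html-postamble:nil
        (org-html-head-include-scripts nil)         ; html-scripts:nil
        (org-html-head-include-default-style nil))  ; html-style:nil
    ;; BODY-ONLY = nil: produce the full document, including <head>.
    (org-export-string-as org-text 'html nil)))
```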

My own preferences

I put some more configuration lines at the top of the file:

#+OPTIONS: num:nil toc:nil ^:{} H:4 tags:nil creator:nil
#+BIND: org-html-toplevel-hlevel 1

The #+TITLE metadata becomes an HTML title element. In addition, the org-mode exporter emits a special first h1 heading based on the #+TITLE content. As a consequence, by default all top-level org-mode headings (one "*" star) are output as HTML h2 elements.

I want to remove the extra h1 generated from the title metadata. I have not found a way to switch it off by org-mode means, hence I need to post-process the HTML.
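Org 9 does document a `title` export option (global variable `org-export-with-title`). Judging from `ox-html`, `title:nil` appears to drop only the `<h1 class="title">` heading from the body while keeping the `<title>` element in the head, which may be worth checking before resorting to post-processing. The sketch below, assuming Org 9.x, tests exactly that on your own installation; `my-org-title-check` is a hypothetical helper name.

```elisp
;; Sketch: check whether `title:nil' (variable `org-export-with-title')
;; suppresses the extra <h1 class="title"> in your Org version.
;; `my-org-title-check' is a hypothetical helper name.
(require 'ox-html)

(defun my-org-title-check (with-title)
  "Export a small document; WITH-TITLE is the string \"t\" or \"nil\"."
  (org-export-string-as
   (concat "#+TITLE: Hello\n"
           "#+OPTIONS: title:" with-title "\n"
           "* First heading\n")
   'html nil))                          ; full document, including <head>
```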

With the #+BIND: org-html-toplevel-hlevel 1 setting, the one-star headings become h1, and I can decide the heading hierarchy on a case-by-case basis.
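The following sketch, assuming Org 9.x, exports a single one-star heading with the #+BIND line shown above. It let-binds `org-export-allow-bind-keywords` to show that #+BIND is ignored unless that variable is non-nil, which is why the init.el customisation matters. `my-org-heading-demo` is a hypothetical helper name.

```elisp
;; Sketch: #+BIND only takes effect when `org-export-allow-bind-keywords'
;; is non-nil; here it is let-bound instead of being set in init.el.
;; `my-org-heading-demo' is a hypothetical helper name.
(require 'ox-html)

(defun my-org-heading-demo (allow-bind)
  "Export a one-star heading; ALLOW-BIND controls whether #+BIND is honoured."
  (let ((org-export-allow-bind-keywords allow-bind))
    (org-export-string-as
     (concat "#+OPTIONS: num:nil toc:nil\n"
             "#+BIND: org-html-toplevel-hlevel 1\n"
             "* Top\n")
     'html t)))                          ; body only
```

With ALLOW-BIND non-nil the heading should come out as h1, otherwise as the default h2.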

#+BIND: org-html-mathjax-template ""

This suppresses further JavaScript code related to LaTeX (MathJax), which was introduced in Emacs 26.1. Note that this has to be an empty string; nil would cause an error.
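As a sketch, assuming Org 9.x: exporting a document that contains a LaTeX fragment once with the default MathJax template and once with the empty string shows the difference. `org-html-head-include-scripts` is switched off so that only the MathJax template can contribute a script element; `my-org-math-demo` is a hypothetical helper name.

```elisp
;; Sketch: effect of `org-html-mathjax-template' on a document with a
;; LaTeX fragment. `my-org-math-demo' is a hypothetical helper name.
(require 'ox-html)

(defun my-org-math-demo (template)
  "Export a LaTeX fragment using TEMPLATE as the MathJax template."
  (let ((org-html-mathjax-template template)
        (org-html-head-include-scripts nil)) ; no other scripts
    (org-export-string-as "Square: \\(x^2\\)\n" 'html nil)))
```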

Putting it all together using #+SETUPFILE:

Rather than putting all these #+... lines at the top of an .org file, I have a central file mysetup.org containing these lines. Thus, the .org file just needs to have a first line:

#+SETUPFILE: ~/mysetup.org

This allows for easy central maintenance.

If a particular file needs a special setting, you can still put that setting after the #+SETUPFILE: line, and it will take precedence.
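To try the precedence rule, the sketch below writes a throw-away setup file containing `toc:nil`, then exports once without and once with a later `#+OPTIONS: toc:t` line. It assumes Org 9.x and uses a temporary file in place of ~/mysetup.org; `my-org-setupfile-demo` is a hypothetical helper name.

```elisp
;; Sketch: settings from #+SETUPFILE can be overridden by later lines.
;; Uses a temporary file instead of ~/mysetup.org;
;; `my-org-setupfile-demo' is a hypothetical helper name.
(require 'ox-html)

(defun my-org-setupfile-demo (extra-line)
  "Export with a temporary setup file (toc:nil) followed by EXTRA-LINE."
  (let ((setup (make-temp-file "mysetup" nil ".org")))
    (unwind-protect
        (progn
          (with-temp-file setup
            (insert "#+OPTIONS: num:nil toc:nil\n"))
          (org-export-string-as
           (concat "#+SETUPFILE: " setup "\n"
                   extra-line            ; local override, if any
                   "* Heading\n")
           'html t))
      (delete-file setup))))             ; always clean up the temp file
```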

With these settings, I get quite close to a plain vanilla HTML file fitting my requirements. A few issues remain:

XSLT Style-Sheet Processing to "Amend" the HTML

To address the issues mentioned above, I have coded a small elisp package, tr-org-html.el. It contains a function tr-org-html-export-to-html that calls the standard org-mode HTML export function and then runs an XSLT transformation to arrive at a final, "amended" HTML file.

The XSLT transformation is done using the xsltproc command line tool. xsltproc takes two inputs, an XSLT style sheet (which I named tr-org-html-trnsfrm.xsl) and the HTML file that org-mode has generated, and produces a new, final HTML file.
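The actual code ships in trorghtml.zip. As a rough sketch of the general "export, then transform" idea only, assuming xsltproc is on the PATH and the style sheet sits next to the Lisp file, it could look like the following. All `my-...` names are hypothetical and the real package may work differently. The sketch passes xsltproc's `--nonet` option to stay offline and writes to a temporary file before replacing the raw export.

```elisp
;; Sketch of "export, then transform with xsltproc". This is NOT the
;; code in tr-org-html.el; all `my-...' names are hypothetical.
(require 'ox-html)

(defvar my-xsl-file
  (expand-file-name "tr-org-html-trnsfrm.xsl"
                    (file-name-directory
                     (or load-file-name buffer-file-name default-directory)))
  "Style sheet, assumed to live in the same directory as this Lisp file.")

(defun my-xsltproc-args (xsl in out &optional novalid)
  "Return xsltproc arguments that transform IN with XSL into OUT.
--nonet forbids network access; non-nil NOVALID adds --novalid."
  (append '("--nonet")
          (when novalid '("--novalid"))
          (list "--output" out xsl in)))

(defun my-org-html-export-and-transform ()
  "Export the current Org buffer to HTML, then run xsltproc on the result."
  (interactive)
  (let* ((raw (expand-file-name (org-html-export-to-html)))
         (tmp (concat raw ".tmp"))
         (status (apply #'call-process "xsltproc" nil "*xsltproc*" nil
                        (my-xsltproc-args my-xsl-file raw tmp))))
    (if (eql status 0)
        (rename-file tmp raw t)          ; replace the raw export
      (error "xsltproc failed (status %s), see *xsltproc*" status))
    raw))
```

Keeping the argument list in a separate pure function makes the command line easy to inspect without running xsltproc.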

I publish these files here as ./trorghtml.zip under the GNU General Public License. Please see the file COPYING.txt in the ZIP archive. I accept no responsibility and give no warranty for the use of this software or for this write-up.

Addressing the namespace issue required special gymnastics in the XSLT style sheet. I needed to do some online research, as the XSLT 1.0 documentation is silent about this issue. The issue seems to be explicitly addressed in XSLT 2.0.

Installation

Extract the files in ./trorghtml.zip and copy tr-org-html.el and tr-org-html-trnsfrm.xsl to a directory in Emacs' load-path, e.g. /usr/share/emacs/site-lisp/ under Linux. Under Windows 10, I chose C:\Users\username\OneDrive\myPrograms\emacs-27.2\share\emacs\site-lisp\.

The files tr-org-html.el and tr-org-html-trnsfrm.xsl must be in the same directory.

Your init.el needs to contain the line:

(require 'tr-org-html)
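If you prefer to keep the two files outside the system site-lisp directory, one option is to add their directory to load-path before the require. The directory name below is a hypothetical example.

```elisp
;; Sketch for init.el: "~/elisp/trorghtml/" is a hypothetical location
;; holding both tr-org-html.el and tr-org-html-trnsfrm.xsl.
(add-to-list 'load-path (expand-file-name "~/elisp/trorghtml/"))
(require 'tr-org-html)
```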

Requirements

Beyond standard Emacs, you need the xsltproc executable in your path. xsltproc seems to be available for all Linux distributions and also for MS Windows. I trust it is also available for macOS.

Bugs

In tr-org-html.el, the xsltproc command line prevents it from reaching out to the internet. However, by default xsltproc loads the DTD declared in the DOCTYPE of the exported XHTML document. Without internet access, this only works if you have set up a local copy of the DTD. I have written up how to do this here.
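One common way to provide such a local copy is an XML catalog that maps the DTD's identifier to a local file. libxml2, on which xsltproc is built, reads its default catalog from /etc/xml/catalog on most Unix-like systems, and another catalog can be named in the XML_CATALOG_FILES environment variable. Since processes started from Emacs inherit `process-environment`, a sketch for init.el could look like this; the catalog path is a hypothetical example and the catalog contents are not shown.

```elisp
;; Sketch: point libxml2 (and thus xsltproc started from Emacs) at a
;; local XML catalog. "~/xml/catalog.xml" is a hypothetical path.
(setenv "XML_CATALOG_FILES"
        (expand-file-name "~/xml/catalog.xml"))
```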

Another option seems to be to simply switch off DTD loading by adding the --novalid command line option to xsltproc. However, this sometimes makes the XSLT transformation fail, because the org-mode exporter occasionally emits named entities, and these entities are defined only in the DTD.

For example, a missing #+TITLE metadata element causes the org-mode exporter to emit a title element containing the entity &lrm; (LEFT-TO-RIGHT MARK). Setting

#+OPTIONS: e:nil

does not switch off the emission of &lrm; by the org-mode exporter. Rather, that setting switches off the replacement of entities such as

\alpha -> &alpha;

History

As I write this (2024-06), I am using both Emacs v28.2 under Debian Bookworm 64-bit and Emacs v27.2 under MS Windows 10, as 27.2 seems to be the last version to support Windows 10 32-bit.

Previously, I used Emacs v25.1 and addressed the issues mentioned above by coding a "derived" HTML org-mode exporter. This served me well for some years, but the derived exporter no longer worked under Emacs v27.2.


Last change: 2024-06-26
© 2002-2024 Dr. Thomas Redelberger redetho(a‍t)gmx.de
