
Generate Plain Vanilla HTML from Org Mode

Introduction

Summary

This page explains

Issues

Org-mode comes with out-of-the-box HTML export functionality (alongside other supported output formats). With the default export settings, the exporter generates the following artefacts along with the core content:

Usually, I do not need these additional artefacts.

Furthermore, there are export settings that you might want to control:

Approach

I took the following approach to deal with these issues:

init.el Customisations

I needed to put a few customisations in init.el:

'(org-export-allow-bind-keywords t)

This setting is important: it makes the exporter honour #+BIND: lines, which enable in-buffer settings beyond the standard in-buffer variables.

'(org-html-extension "htm")

For legacy reasons, I still use .htm instead of .html. At least in Emacs v25.1, this could not be set with an in-buffer setting.

'(org-html-text-markup-alist
  '((bold . "<strong>%s</strong>")
    (code . "<code class=\"c\">%s</code>")
    (italic . "<em>%s</em>")
    (strike-through . "<del>%s</del>")
    (underline . "<span class=\"underline\">%s</span>")
    (verbatim . "<code class=\"v\">%s</code>")))

This is how I choose to map the core in-line text markup to HTML elements.
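
The three entries above are shown as they appear inside a custom-set-variables form. For readers who prefer to write init.el by hand, a minimal sketch of the same settings as plain setq forms follows. It assumes that none of these variables needs a custom setter, which I believe is the case, but I have not verified it for every Org version:

```emacs-lisp
;; Sketch of an init.el fragment mirroring the Customize entries above.
;; Assumption: plain `setq' is sufficient for these three variables.
(setq org-export-allow-bind-keywords t)  ; honour #+BIND: lines on export
(setq org-html-extension "htm")          ; write foo.htm instead of foo.html
(setq org-html-text-markup-alist
      '((bold . "<strong>%s</strong>")
        (code . "<code class=\"c\">%s</code>")
        (italic . "<em>%s</em>")
        (strike-through . "<del>%s</del>")
        (underline . "<span class=\"underline\">%s</span>")
        (verbatim . "<code class=\"v\">%s</code>")))
```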

In-Buffer Settings

Org Manual 13.18.1 - Exporting to minimal HTML

Section 13.18.1 of the official org-mode manual gives advice on how to achieve a "minimal HTML" export. You can do so with the following in-buffer options line:

#+OPTIONS: html-postamble:nil html-preamble:nil html-scripts:nil html-style:nil

I need neither a preamble nor a postamble, because I add those in a separate step using XML/XSLT technology (see trgensit).

I do not need in-line CSS styles in the HTML, because I add a link to a central CSS style sheet in a later step, using XML/XSLT technology (see trgensit).

I still only use static web sites that do not need JavaScript.

My own preferences

I put some more configuration lines at the top of the file:

#+OPTIONS: num:nil toc:nil ^:{} H:4 tags:nil creator:nil
#+BIND: org-html-toplevel-hlevel 1

The #+TITLE meta data becomes an HTML title element. The org-mode exporter also emits a special first h1 heading based on the #+TITLE content. As a consequence, by default all top-level org-mode headings (one "*" star) are output as h2.

I want to remove the extra h1 generated from the title meta data. I have not found a way to switch it off by org-mode means, hence I need to post-process the HTML.

With org-html-toplevel-hlevel bound to 1, the one-star headings become h1 and I can decide the heading hierarchy on a case-by-case basis.

#+BIND: org-html-mathjax-template ""

This suppresses further JavaScript code, which was introduced in Emacs 26.1 and is related to LaTeX. Note that the value has to be an empty string; nil would cause an error.

Putting it all together using #+SETUPFILE:

Rather than putting all these #+... lines at the top of every .org file, I keep them in a central file, mysetup.org. Each .org file then just needs this first line:

#+SETUPFILE: ~/mysetup.org

This allows for easy central maintenance.

If you need a special setting for a particular file, you can still put it after the #+SETUPFILE line and it will take precedence.

With these settings, I get quite close to a plain vanilla HTML file fitting my requirements. A few issues remain:

XSLT Style-Sheet Processing to "Amend" the HTML

To address the above-mentioned issues, I have coded a small elisp package, tr-org-html.el. It contains a function tr-org-html-export-to-html that calls the standard org-mode HTML export function and then runs an XSLT transformation to arrive at a final, "amended" HTML file.

The XSLT transformation is done using the xsltproc executable. xsltproc takes two inputs, an XSLT style sheet, which I named tr-org-html-trnsfrm.xsl, and the HTML file that org-mode has generated, and produces a new, final HTML file.
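
The two-step flow described above can be sketched in a few lines of elisp. This is not the code shipped in tr-org-html.el; the function name my/org-html-export-and-transform and the "-final" suffix of the result file are hypothetical and chosen only for illustration. The sketch relies on org-html-export-to-html returning the name of the file it wrote, and on xsltproc's -o option to name the output file:

```emacs-lisp
(require 'ox-html)

(defun my/org-html-export-and-transform (stylesheet)
  "Export the current Org buffer to HTML, then run xsltproc with STYLESHEET.
Hypothetical sketch; not the code shipped in tr-org-html.el."
  (interactive "fXSLT style sheet: ")
  (let* ((exported (org-html-export-to-html)) ; name of the exported file
         ;; Hypothetical naming scheme for the result: foo.htm -> foo-final.htm
         (final (concat (file-name-sans-extension exported)
                        "-final."
                        (file-name-extension exported)))
         ;; Equivalent to the command line: xsltproc -o FINAL STYLESHEET EXPORTED
         (status (call-process "xsltproc" nil nil nil
                               "-o" final
                               (expand-file-name stylesheet)
                               exported)))
    (unless (eq status 0)
      (error "xsltproc failed with exit status %s" status))
    final))
```

Writing the result to a separate file keeps the untouched export around for comparison while the style sheet is still being debugged.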

I publish the mentioned files here in ./trorghtml.zip under the GNU public license. Please see file COPYING.txt in the ZIP file. I accept no responsibility and give no warranty for the use of this software or of this write-up.

Addressing the namespace issue required special gymnastics in the XSLT style sheet. I needed to do some online research, as the XSLT 1.0 documentation is silent about this issue. The issue seems to be explicitly addressed in XSLT 2.0.

Installation

Extract the files in ./trorghtml.zip and copy the files tr-org-html.el and tr-org-html-trnsfrm.xsl to a directory in Emacs' load-path, e.g. /usr/share/emacs/site-lisp/. I chose (Windows 10): C:\Users\username\OneDrive\myPrograms\emacs-27.2\share\emacs\site-lisp\.

The files tr-org-html.el and tr-org-html-trnsfrm.xsl must be in the same directory.

Your init.el needs to contain the line:

(require 'tr-org-html)

Requirements

Beyond standard Emacs, you need xsltproc in your path. xsltproc seems to be available for all Linux distributions and also for MS Windows. I trust it is also available for macOS.
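
To check from within Emacs whether xsltproc can be found, a small test such as the following could go into init.el. It is only a sketch; the warning type tr-org-html is an arbitrary label picked for this example:

```emacs-lisp
;; Warn at startup if xsltproc cannot be found on `exec-path'.
(unless (executable-find "xsltproc")
  (display-warning 'tr-org-html "xsltproc was not found on exec-path"))
```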

Bugs

A missing #+TITLE meta data element causes the XSLT transform to fail, because the exporter then emits the entity &lrm; (LEFT-TO-RIGHT-MARK) as the title element's content. To fix this, you need to set up a local "catalog" plus a local DTD file and entity definition files, which is rather involved.

History

As I write this (2023-04), I am using Emacs v27.2 under MS Windows 10, as 27.2 seems to be the last version to support 32-bit Windows 10.

Before that, I used Emacs v25.1 and addressed the above-mentioned issues by coding a "derived" org-mode HTML exporter. It served me well for some years, but the derived exporter no longer worked under Emacs v27.2.


Last change: 2024-03-22
© 2002-2024 Dr. Thomas Redelberger redethogmx.de
