Org-mode comes with out-of-the-box HTML export functionality (alongside other supported output formats). The default export settings generate the following artefacts along with the core content:
Usually, I do not need these additional artefacts.
Furthermore, there are export settings that you might want to control:
I took the following approach when dealing with these issues:
init.el Customisations
I needed to put a few customisations in init.el:
'(org-export-allow-bind-keywords t)
This setting is important, as it enables in-buffer settings beyond the standard in-buffer variables.
'(org-html-extension "htm")
For legacy reasons, I still stick to .htm instead of .html. At least in Emacs v25.1, it could not be set by an in-buffer setting.
'(org-html-text-markup-alist '((bold . "<strong>%s</strong>") (code . "<code class=\"c\">%s</code>") (italic . "<em>%s</em>") (strike-through . "<del>%s</del>") (underline . "<span class=\"underline\">%s</span>") (verbatim . "<code class=\"v\">%s</code>")))
This is my choice to map some core in-line text semantics to HTML.
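Assembled, the three entries would look roughly like the following in init.el. This is a minimal sketch, assuming the entries live inside the custom-set-variables form that Emacs' Customize interface maintains, which the leading quotes suggest; the comments are added for illustration.

```elisp
;; Sketch only: assumes the entries live in the `custom-set-variables' form
;; that Customize writes to init.el. Emacs expects a single such form.
(custom-set-variables
 ;; Allow #+BIND: lines in Org buffers to set variables during export.
 '(org-export-allow-bind-keywords t)
 ;; Write exported files as .htm rather than .html.
 '(org-html-extension "htm")
 ;; Map Org inline markup to the HTML elements of my choice.
 '(org-html-text-markup-alist
   '((bold . "<strong>%s</strong>")
     (code . "<code class=\"c\">%s</code>")
     (italic . "<em>%s</em>")
     (strike-through . "<del>%s</del>")
     (underline . "<span class=\"underline\">%s</span>")
     (verbatim . "<code class=\"v\">%s</code>"))))
```

If init.el already contains a custom-set-variables form, the entries belong inside that existing form rather than in a second one.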
The official org-mode manual gives advice on how to achieve a "minimal HTML" export in section 13.18.1 (org version 9.5) or section 13.9.5 (org version 9.7). You can achieve this with the following in-buffer options line:
#+OPTIONS: html-postamble:nil html-preamble:nil html-scripts:nil html-style:nil
I need neither preamble nor postamble, because for my web site I add those in a separate step using XML/XSLT technology (see trgensit).
I do not need in-line CSS styles in the HTML, because I add a link to a central CSS style sheet in a later step, using XML/XSLT technology (see trgensit).
I only use static web sites that do not need JavaScript.
I put some more configuration lines at the top of the file:
#+OPTIONS: num:nil toc:nil ^:{} H:4 tags:nil creator:nil
#+BIND: org-html-toplevel-hlevel 1
The #+TITLE meta data becomes an HTML title element. The org-mode exporter also emits a special first h1 heading based on the #+TITLE meta data content. As a consequence, by default all top-level org-mode headings (one "*" star) are output as h2 in the HTML.
I want to remove the extra h1 that stems from the title meta data. I have not found a way to switch it off by org-mode means, hence I need to post-process the HTML.
With the org-html-toplevel-hlevel setting above, the one-star headings become h1 and I can decide the heading hierarchy on a case-by-case basis.
#+BIND: org-html-mathjax-template ""
This suppresses further JavaScript code, which was introduced in Emacs 26.1 and is related to LaTeX. Note that this has to be an empty string; nil would cause an error.
#+SETUPFILE:
Rather than putting all these #+... lines at the top of every .org file, I have a central file mysetup.org containing these lines. Thus, each .org file just needs to have this first line:
#+SETUPFILE: ~/mysetup.org
This allows for easy central maintenance.
If you need a special setting for a particular file, you can still put it after the #+SETUPFILE... line, and it will take precedence.
With these settings, I get quite close to a plain vanilla HTML file fitting my requirements. A few issues remain:
The exported HTML contains id attributes, which start with org... . They are used to link from the table of contents. As I do not use the TOC and as these ids change, I cannot use them as link anchors myself. Hence I remove those ids.
The exported HTML contains div elements, even nested ones, where I do not see a purpose. Hence I remove those divs.
The exported HTML contains the extra h1 element that stems from #+TITLE.
Some Emacs versions change the img element to an object element for linking SVG files. I want to keep img elements. Emacs v28.1 seems to have gone back to img elements.
The exporter puts xmlns="http://www.w3.org/1999/xhtml" in the output's html element. While this is correct and reflects the intended XHTML format, this namespace declaration breaks the XSLT processing I do down the line to generate web content. I would need to modify the XSLT style sheets to cater explicitly for this namespace. I do not see a benefit in doing so. Hence I remove the declaration.
To address the above-mentioned issues, I have coded a small elisp package, tr-org-html.el. It contains a function tr-org-html-export-to-html that calls the standard org-mode HTML export function and then does an XSLT transformation to arrive at a final, "amended" HTML file.
The XSLT transformation is done using the xsltproc command line tool. xsltproc takes an XSLT style sheet, which I named tr-org-html-trnsfrm.xsl, as its first input and the HTML file that org-mode has generated as its second input, and generates a new, final HTML file.
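The two steps can be sketched in a few lines of elisp. This is a simplified illustration and not the code shipped in tr-org-html.el: the function name my-org-html-export-and-transform, the variable my-xslt-stylesheet, the style sheet location, and the .final.htm output name are all hypothetical. It only assumes that xsltproc is on the path and uses its --nonet and -o options.

```elisp
;; Hypothetical sketch of "export, then transform"; not the real tr-org-html.el.
(require 'ox-html)

(defvar my-xslt-stylesheet "~/tr-org-html-trnsfrm.xsl"
  "Assumed location of the XSLT style sheet; adjust to your setup.")

(defun my-org-html-export-and-transform ()
  "Export the current Org buffer to HTML, then post-process it with xsltproc.
Return the name of the transformed file."
  (interactive)
  (let* ((exported (org-html-export-to-html)) ; returns the exported file name
         ;; Hypothetical naming scheme for the transformed file.
         (final (concat (file-name-sans-extension exported) ".final.htm"))
         ;; Run xsltproc synchronously; its messages go to buffer *xsltproc*.
         (status (call-process "xsltproc" nil "*xsltproc*" nil
                               "--nonet" ; do not fetch anything over the network
                               "-o" (expand-file-name final)
                               (expand-file-name my-xslt-stylesheet)
                               (expand-file-name exported))))
    (unless (eq status 0)
      (error "xsltproc failed with status %s; see buffer *xsltproc*" status))
    final))
```

Writing to a separate output file keeps the untouched org-mode export around for comparison, which helps when debugging the style sheet.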
I publish the mentioned files here in ./trorghtml.zip under the GNU public license. Please see the file COPYING.txt in the ZIP file. I do not take any responsibility or give any warranty for the use of this software or for this write-up.
Addressing the namespace issue required special gymnastics in the XSLT style sheet. I needed to do some online research, as the XSLT 1.0 documentation is silent about this issue. The issue seems to be explicitly addressed in XSLT 2.0.
Extract the files in ./trorghtml.zip and copy the files tr-org-html.el and tr-org-html-trnsfrm.xsl to a directory in Emacs' load-path, e.g. /usr/share/emacs/site-lisp/ under Linux. Under Windows 10, I chose C:\Users\username\OneDrive\myPrograms\emacs-27.2\share\emacs\site-lisp\.
The files tr-org-html.el and tr-org-html-trnsfrm.xsl must be in the same directory.
Your init.el needs to contain the line:
(require 'tr-org-html)
Beyond standard Emacs, you need the executable xsltproc in your path. xsltproc seems to be available for all Linux distributions and also for MS Windows. I trust it is also available for macOS.
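To avoid surprises on a machine where xsltproc is missing, init.el can check for it before loading the package. The following is a minimal sketch; the "lisp" directory under user-emacs-directory is an assumed personal location and is only needed if the two files are not already in a directory on load-path.

```elisp
;; Assumed personal directory holding tr-org-html.el and the style sheet;
;; skip this line if the files are already in a directory on `load-path'.
(add-to-list 'load-path (expand-file-name "lisp" user-emacs-directory))

;; Load the package only when the xsltproc executable can be found.
(if (executable-find "xsltproc")
    (require 'tr-org-html)
  (message "xsltproc not found in path; tr-org-html not loaded"))
```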
In tr-org-html.el the command line for xsltproc forces it not to reach out to the internet. However, by default, xsltproc validates the exported XHTML document against the DTD specified in the document head. This will only work if you have set up a local copy of the DTD. I have written up how to do this here.
Another option seems to be to simply switch off DTD validation by adding the --novalid command line option to xsltproc. However, this sometimes causes the XSLT transform to fail, because the org-mode exporter sometimes emits "entities", and such "entities" are defined in the DTD.
For example, a missing #+TITLE meta data element causes the org-mode exporter to emit a title element containing the entity &lrm; (LEFT-TO-RIGHT MARK). Setting #+OPTIONS: e:nil does not switch off the emission of &lrm; by the org-mode exporter. Rather, that setting switches off the replacement of e.g. \alpha -> α.
As I write this (2024-06), I am using both Emacs v28.2 under Debian Bookworm 64 bit and Emacs v27.2 under MS Windows 10, as 27.2 seems to be the last version to support Windows 10 32 bit.
Before that, I used Emacs v25.1 and addressed the above-mentioned issues by coding a "derived" org-mode HTML exporter. This served me well for some years, but the derived exporter no longer worked under Emacs v27.2.