
How to Deal With XML Entities

XML Entities have been part of the XML standard from the beginning.

Entities are more widely known for their use in HTML, where they denote special characters like &szlig;. All web browsers have built-in entity support. However, since UTF-8 character encoding became widely adopted on the internet, the need to use entities for special characters has decreased.

Still, when processing XHTML (HTML which conforms to the XML standard), entities need to be taken into account.

Issue: XSLT transformations of XHTML1 documents.

For example, the HTML exporter of org-mode (v9.5.5), which is built into Emacs (v28.2), writes the following at the top of the exported XHTML document:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
...

When you process such a file with xsltproc, a standard XSLT transformation tool, it will by default reach out via the Internet and fetch the file http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd. This "Document Type Definition" (DTD) file declares, among other things, the entities the document may use.

The entity definitions can be spread over multiple files, which include each other.

You might not want to have the XSLT processor (transformation tool) reach out via the internet, for various reasons. In this case, you need to store the DTD locally.

The xmllint and xmlcatalog command line utilities list and manage a so-called "catalog" of DTDs. Their man pages explain the details. On Debian, for example, they are part of the package libxml2-utils.
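As a rough sketch of how xmlcatalog might be used, the commands below create a catalog in a scratch directory, register the XHTML 1.0 Strict DTD under its public and system identifiers, and query the result. The scratch directory and the variable names are assumptions for illustration only; the target path /usr/share/xml/xhtml1/ is the one used in this article.

```shell
#!/bin/sh
# Sketch only: builds a catalog in a scratch directory (an assumption for
# illustration) instead of touching /etc/xml/.
scratch="$(mktemp -d)"
catalog="$scratch/catalog.xml"
dtd_uri="file:///usr/share/xml/xhtml1/xhtml1-strict.dtd"
resolved=""

if command -v xmlcatalog >/dev/null 2>&1; then
    # Create an empty catalog file.
    xmlcatalog --noout --create "$catalog"
    # Map the public identifier from the DOCTYPE to the local copy.
    xmlcatalog --noout --add public \
        "-//W3C//DTD XHTML 1.0 Strict//EN" "$dtd_uri" "$catalog"
    # Map the system identifier (the URL) as well.
    xmlcatalog --noout --add system \
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" "$dtd_uri" "$catalog"
    # Query the catalog for the system identifier and show the answer.
    resolved="$(xmlcatalog "$catalog" \
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd")"
    echo "lookup result: $resolved"
else
    echo "xmlcatalog not found; it is part of libxml2-utils on Debian" >&2
fi
```

To try such a catalog without installing it system-wide, libxml2-based tools also read the XML_CATALOG_FILES environment variable, which can be set to the path of the catalog file.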

The file catalog.xml needs to go to /etc/xml/. This is the default location where tools like xsltproc look for it.

The file xhtml1-strict.dtd goes to /usr/share/xml/xhtml1/xhtml1-strict.dtd

The files

xhtml1-lat1.ent   
xhtml1-special.ent
xhtml1-symbol.ent 

also need to go to /usr/share/xml/xhtml1/. These are all included by the DTD.
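One way to check the setup is to transform a small test document that uses an entity while network access is forbidden. The sketch below assumes hypothetical fixture files test.html and test.xsl in a scratch directory. The --nonet option tells xsltproc not to fetch anything from the Internet, so the entity can only be expanded if the DTD and its entity files are found locally.

```shell
#!/bin/sh
# Sketch only: test.html and test.xsl are hypothetical fixtures.
scratch="$(mktemp -d)"

# A minimal XHTML document that uses an entity declared via the DTD.
cat > "$scratch/test.html" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>Entity test</title></head>
<body><p>Stra&szlig;e</p></body>
</html>
EOF

# A stylesheet that prints the text of every XHTML paragraph.
cat > "$scratch/test.xsl" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml">
  <xsl:output method="text" encoding="utf-8"/>
  <xsl:template match="/">
    <xsl:for-each select="//x:p">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

if command -v xsltproc >/dev/null 2>&1; then
    # Run without network access and capture output and messages.
    out="$(xsltproc --nonet "$scratch/test.xsl" "$scratch/test.html" 2>&1)"
    case "$out" in
        *Straße*) echo "entity expanded from local files" ;;
        *)        echo "entity not expanded; check catalog and DTD paths" >&2 ;;
    esac
fi
```

If the entity is not expanded, the messages printed by xsltproc usually indicate whether the DTD itself or one of the included entity files could not be loaded.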

For your convenience, I provide a ZIP archive, ./xhtml1-dtd.zip, containing these files.

Links

xsltproc - How to use and write a catalog file

Regards catalog file in xsltproc and libxml2

List of XML and HTML character entity references - Wikipedia

HTML entity in XSLT (e.g. &nbsp;) - Stack Overflow: https://stackoverflow.com/questions/31870/using-an-html-entity-in-xslt-e-g-nbsp (this works only for entities in the style sheet, not in the input XML document)


Last change: 2024-06-04
© 2002-2024 Dr. Thomas Redelberger redetho(a‍t)gmx.de
