XML Entities have been part of the XML standard from the beginning.
Entities are more widely known for their use in HTML, where they are used to denote special characters like &szlig;. All web browsers have built-in entity support. However, since UTF-8 character encoding became widely adopted on the internet, the need to use entities for special characters has been decreasing.
Still, when processing XHTML (HTML which conforms to the XML standard), entities need to be taken into account.
For example, the HTML exporter of org-mode (v9.5.5), which is built into Emacs (v28.2), emits the following at the top of the exported XHTML document:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
...
When you process such a file with xsltproc, a standard XSLT transformation tool, it will by default reach out via the Internet and fetch the file http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd.
This "Document Type Definition" (DTD) file contains the entity definitions. These definitions can be spread over multiple files, which include each other.
You might not want to have the XSLT processor (transformation tool) reach out via the internet, for various reasons. In this case, you need to store the DTD locally.
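To see what happens when network access is forbidden, a minimal sketch follows. It uses the `--nonet` option of xsltproc; the sample document, the identity style sheet and the scratch directory are hypothetical stand-ins, not files from this article.

```shell
# Work in a scratch directory so nothing on the system is touched.
WORKDIR="$(mktemp -d)"

# A tiny XHTML document (hypothetical sample) that uses the named entity &szlig;.
cat > "$WORKDIR/sample.xhtml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>Sample</title></head>
<body><p>Stra&szlig;e</p></body>
</html>
EOF

# An identity style sheet (hypothetical) that copies the input unchanged.
cat > "$WORKDIR/identity.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
EOF

# --nonet forbids fetching DTDs, entities or documents over the network.
# If the DTD cannot be resolved locally, xsltproc is expected to complain.
if command -v xsltproc >/dev/null 2>&1; then
    xsltproc --nonet "$WORKDIR/identity.xsl" "$WORKDIR/sample.xhtml" \
        || echo "xsltproc failed: the DTD is probably not available locally" >&2
else
    echo "xsltproc is not installed" >&2
fi
```

Once the DTD and a matching catalog entry are installed locally, the same command should succeed without touching the network.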
The xmllint and xmlcatalog command-line utilities are tools to list and manage a so-called "catalog" of DTDs. Their man pages explain the details. On Debian, for example, they are part of the package libxml2-utils.
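As a rough illustration of xmlcatalog, the sketch below creates a catalog in a scratch directory, adds one mapping and looks it up again. The options `--noout`, `--create` and `--add` are taken from the xmlcatalog man page; the scratch path is an assumption made so the example needs no root access, and the local DTD path is the one used later in this article.

```shell
# Scratch location (assumption): keeps the system catalog untouched.
WORKDIR="$(mktemp -d)"
CATALOG="$WORKDIR/catalog.xml"

# Public identifier from the XHTML DOCTYPE and the local copy it should map to.
PUBLIC_ID="-//W3C//DTD XHTML 1.0 Strict//EN"
LOCAL_DTD="file:///usr/share/xml/xhtml1/xhtml1-strict.dtd"

if command -v xmlcatalog >/dev/null 2>&1; then
    # Create an empty catalog file.
    xmlcatalog --noout --create "$CATALOG"
    # Add a "public" entry: public identifier -> local URI.
    xmlcatalog --noout --add public "$PUBLIC_ID" "$LOCAL_DTD" "$CATALOG"
    # Look the identifier up; this should print the URI it resolves to.
    xmlcatalog "$CATALOG" "$PUBLIC_ID"
else
    echo "xmlcatalog is not installed (Debian: package libxml2-utils)" >&2
fi
```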
The file catalog.xml needs to go to /etc/xml/. This is the default location where tools like xsltproc look it up.
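For orientation, here is a sketch of what a minimal catalog for the XHTML 1.0 Strict DTD could look like. It is an assumption about the shape of such a file, not necessarily identical to the catalog.xml in the ZIP archive below. The block writes it to a scratch directory and points libxml2-based tools at it through the `XML_CATALOG_FILES` environment variable, which is handy for testing before anything is copied to /etc/xml/.

```shell
# Scratch location (assumption): avoids writing to /etc/xml/ while experimenting.
WORKDIR="$(mktemp -d)"
CATALOG="$WORKDIR/catalog.xml"

cat > "$CATALOG" <<'EOF'
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the public identifier from the DOCTYPE to the local copy. -->
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="file:///usr/share/xml/xhtml1/xhtml1-strict.dtd"/>
  <!-- Map the system identifier (the URL) to the same local copy. -->
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="file:///usr/share/xml/xhtml1/xhtml1-strict.dtd"/>
</catalog>
EOF

# Tools built on libxml2 (xsltproc, xmllint, xmlcatalog) consult this variable.
export XML_CATALOG_FILES="$CATALOG"
```

If a tool does not seem to pick up the catalog from its installed location, setting `XML_CATALOG_FILES` explicitly is one way to check whether the catalog itself is correct.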
The file xhtml1-strict.dtd goes to /usr/share/xml/xhtml1/xhtml1-strict.dtd. The files xhtml1-lat1.ent, xhtml1-special.ent and xhtml1-symbol.ent also need to go to /usr/share/xml/xhtml1/. They are all included by the DTD.
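A small check like the following can confirm that all four files are in place. The function name `check_xhtml1_files` is made up for this sketch; the file names and the directory are the ones listed above.

```shell
# Report which of the required files are missing from a directory.
# Returns 0 if all are present, 1 otherwise.
check_xhtml1_files() {
    dir="$1"
    missing=0
    for f in xhtml1-strict.dtd xhtml1-lat1.ent xhtml1-special.ent xhtml1-symbol.ent; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_xhtml1_files /usr/share/xml/xhtml1 \
    || echo "copy the missing files before running xsltproc offline" >&2
```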
For your convenience, I provide here a ZIP archive ./xhtml1-dtd.zip containing these files.
See also https://stackoverflow.com/questions/31870/using-an-html-entity-in-xslt-e-g-nbsp. Note that the approach described there works only for entities in the style sheet, not for entities in the input XML document.