Generate PowerPoint Slides from Org Mode Using Pandoc
Introduction
- To generate HTML content I rely on org mode's HTML export functionality.
- I had used pandoc (by John MacFarlane) for converting content in other formats to org mode
- For a project, I needed Microsoft PowerPoint slides
- As I mostly use org-mode for writing, I evaluated pandoc to generate PowerPoint pptx files. Documentation:
Summary
- Using pandoc, you can generate PowerPoint slides easily
- pandoc supports a core set of slide types
  - title slide, divider slide
  - bulleted lists
  - tables
  - "stand alone" images and images with text, possibly bulleted
- You might want to use the latest pandoc version (I used 3.1), as there have been quite a few improvements and bug fixes
- More detail at https://web222.webclient5.de/doc/swdev/emacs/orgmode/pandoc
- The test files can be found in a zip file ./org2pptx.zip. Please consult the file README.txt in the ZIP file.
Telling pandoc How the Resulting pptx File Shall Look
Pandoc has two main mechanisms to influence the look of the result document:
- Templates e.g. for HTML output
- Reference documents e.g. for docx and pptx output
The difference seems to be that templates can also have content that will be copied to the output document, whereas reference documents only provide "styles" that will influence the look of the output document.
Where Pandoc looks for Template, Reference and Defaults Files 1/2
By default, pandoc looks for the files /home/user/.local/share/pandoc/reference.docx or /home/user/.local/share/pandoc/reference.pptx as reference documents.
However, if you provide a --reference-doc=FILE command line argument, pandoc looks for FILE in the "resource-path" and uses FILE as the reference document instead. By default, the resource-path is the working directory. FILE may also be an absolute path, and the resource-path itself can be set with --resource-path.
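The lookup described above can be sketched as a small helper that assembles the pandoc call. This is only an illustration under assumptions: the function name build_pptx_command and the file names in the usage note are hypothetical, while --from, --to, --output, --reference-doc and --resource-path are regular pandoc options.

```python
import os
from typing import List, Optional, Sequence


def build_pptx_command(
    source: str,
    output: str,
    reference_doc: Optional[str] = None,
    resource_path: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble a pandoc argument list for an org -> pptx conversion.

    The helper itself is hypothetical; only the flags are pandoc's own.
    """
    cmd = ["pandoc", "--from=org", "--to=pptx", f"--output={output}"]
    if reference_doc is not None:
        # Looked up in the resource path unless an absolute path is given.
        cmd.append(f"--reference-doc={reference_doc}")
    if resource_path:
        # Entries are joined with the OS path separator
        # (":" on Linux and macOS, ";" on Windows).
        cmd.append("--resource-path=" + os.pathsep.join(resource_path))
    cmd.append(source)
    return cmd
```

With pandoc on the PATH, the list can be passed to subprocess.run(cmd, check=True), e.g. for build_pptx_command("talk.org", "talk.pptx", reference_doc="corporate.pptx").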
Where Pandoc looks for Template, Reference and Defaults Files 2/2
Pandoc's default search tree for templates, reference documents and defaults documents seems to be as follows (not exhaustive):
/home/user/.local/share/pandoc/
├ reference.docx
├ reference.pptx
├ templates/
│ └ default.html
└ defaults/
  └ FILE[.yaml]
Legacy .pandoc Directory in User's Home Directory
Alternatively /home/user/.pandoc/ can be used instead of /home/user/.local/share/pandoc/.
Note that you have to create either of these directories yourself.
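Since the directory has to be created by hand, a minimal sketch for doing so could look as follows. The function names are hypothetical, and the two locations are simply the ones named above, not a complete model of pandoc's lookup.

```python
from pathlib import Path
from typing import List


def candidate_data_dirs(home: Path) -> List[Path]:
    """Return the two user data directories named in the text, in order."""
    return [home / ".local" / "share" / "pandoc", home / ".pandoc"]


def ensure_data_dir(data_dir: Path) -> Path:
    """Create the directory together with its templates/ and defaults/ subfolders."""
    for sub in ("templates", "defaults"):
        (data_dir / sub).mkdir(parents=True, exist_ok=True)
    return data_dir
```

A call such as ensure_data_dir(candidate_data_dirs(Path.home())[0]) sets up the non-legacy location. Running pandoc --version should report which user data directory pandoc actually uses.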
Options to Run the Conversion
- Sometimes, I use org-babel to run pandoc directly from the org source file and invoke PowerPoint or LibreOffice to show the result
- For more complex projects, I use a Makefile
- There is also "ox-pandoc" by Alex FENTON and Taichi KAWABATA (https://github.com/a-fent/ox-pandoc) to help with integration. ox-pandoc also executes org-babel blocks and expands org macros before the pandoc conversion
Observations
Structure of the Presentation
- Whether or not "divider" slides are generated is described in the pandoc manual under "Structuring the slide show"
- To map from org-mode headings to slides, pandoc has the concept of "slide level"
  - this is the org-mode heading level that maps to the slide headers
  - it can be specified on the command line, e.g. as --slide-level=2
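The effect of the slide level can be illustrated with a small sketch that classifies org headings the way the pandoc manual describes it: headings above the slide level become divider slides, headings at the slide level start a new slide, and headings below it stay inside a slide. The function name and the role labels are made up for this illustration, and the sketch ignores further rules such as horizontal rules starting a new slide.

```python
import re
from typing import List, Tuple

# An org heading: one or more stars, whitespace, then the title.
HEADING = re.compile(r"^(\*+)\s+(.*)$")


def classify_headings(org_text: str, slide_level: int) -> List[Tuple[str, str]]:
    """Pair each org heading with the role it would play at the given slide level."""
    result = []
    for line in org_text.splitlines():
        match = HEADING.match(line)
        if not match:
            continue  # body text, not a heading
        level = len(match.group(1))
        title = match.group(2).strip()
        if level < slide_level:
            role = "section divider slide"
        elif level == slide_level:
            role = "new slide"
        else:
            role = "heading within a slide"
        result.append((title, role))
    return result
```

With --slide-level=2, a level 1 heading "Observations" would be classified as a divider slide and a level 2 heading "Tables" as a regular slide.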
Localised Master Slide Names Need To Be Renamed 1/2
- Per slide, pandoc picks an appropriate template slide from the slide master in the reference file
- pandoc looks for the standard English name of the template slide
- When using a non-English localised version of PowerPoint, you have to rename the template slides in the slide master to:
  - Title Slide
  - Title and Content
  - Section Header
  - Two Content
  - Content with Caption
  - Blank
  - Comparison
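Whether a reference file already carries the English names can be checked without opening PowerPoint. The following is a sketch under assumptions: it relies on the usual pptx package layout, where each layout lives in ppt/slideLayouts/slideLayoutN.xml and stores its display name in the name attribute of the p:cSld element. The function names are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import List

PML_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"

# The names listed above, as pandoc expects them.
REQUIRED_LAYOUTS = [
    "Title Slide", "Title and Content", "Section Header", "Two Content",
    "Content with Caption", "Blank", "Comparison",
]


def layout_names(pptx) -> List[str]:
    """Return the layout names found in a pptx file (path or file object)."""
    names = []
    with zipfile.ZipFile(pptx) as archive:
        for member in sorted(archive.namelist()):
            if re.fullmatch(r"ppt/slideLayouts/slideLayout\d+\.xml", member):
                root = ET.fromstring(archive.read(member))
                c_sld = root.find(f"{{{PML_NS}}}cSld")
                if c_sld is not None and c_sld.get("name"):
                    names.append(c_sld.get("name"))
    return names


def missing_layouts(pptx) -> List[str]:
    """Return the required names that the reference file does not provide."""
    present = set(layout_names(pptx))
    return [name for name in REQUIRED_LAYOUTS if name not in present]
```

For a reference file saved from a localised PowerPoint, missing_layouts("reference.pptx") would list the names that still need renaming.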
Localised Master Slide Names Need To Be Renamed 2/2
Showing Source Code or Similar Content
- Code blocks (i.e. lines starting with a colon, #+begin_src and #+begin_example blocks) are indented in the output. This might be configurable on the pandoc side
Tables
- pandoc uses the default table layout that you define in the PowerPoint reference file
- org-mode automatically right-aligns numeric columns in Emacs, whereas pandoc needs a first row with <r> type hints
- Columns in HTML tables have variable width, whereas pandoc generates PowerPoint table columns that all have the same width
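The row with type hints can be generated mechanically. The sketch below is an assumption-laden illustration, not part of the original test files: the function names are hypothetical, a column counts as numeric when every non-empty body cell parses as a number, and the hint row is placed first, as described above.

```python
from typing import List, Sequence


def _is_number(cell: str) -> bool:
    """True if the cell text parses as a number."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def org_table_with_alignment(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render an org table whose first row carries <r> hints for numeric columns."""
    cookies = []
    for col in range(len(header)):
        cells = [row[col] for row in rows if row[col].strip()]
        numeric = bool(cells) and all(_is_number(cell) for cell in cells)
        cookies.append("<r>" if numeric else "")
    lines: List[List[str]] = [cookies, list(header)] + [list(row) for row in rows]
    return "\n".join("| " + " | ".join(line) + " |" for line in lines)
```

Applied to the three-column test table in the appendix, only the middle column with the numbers would receive an <r> hint.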
Images - Shape Types
- An org-mode link to a JPEG file becomes a PowerPoint shape of type msoPicture (13), not msoLinkedPicture (11), i.e. the image is embedded in the PPTX file
- Similarly, a linked SVG file becomes msoGraphic (28), not msoLinkedGraphic (29)
Images - Scaling
- PowerPoint
  - uses DPI meta information from the image file when you insert an image
  - uses a default of 96 dpi for SVG images, which do not contain DPI information
- pandoc does not seem to use the DPI meta information. It scales images so that they fully fill the content area
- The aspect ratio of the images is kept
- For some SVG files, pandoc changed the aspect ratio slightly. It turned out that these SVG files had a UTF-8 BOM (byte order mark) of 0xef 0xbb 0xbf as the first three bytes in the SVG file. This is fixed in recent versions of pandoc
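With an older pandoc version, one way around the BOM issue is to strip the mark before the conversion. A minimal sketch, assuming the SVG content is available as bytes; the function names are hypothetical, and codecs.BOM_UTF8 is the standard library constant for the three bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, unchanged otherwise."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data
```

For a file on disk this could be applied as path.write_bytes(strip_utf8_bom(path.read_bytes())), with path being a pathlib.Path to the SVG file.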
Images - Captions and Links to Images
- Image captions provided in markdown input work, but captions from org mode input do not work in V3.1. They did work in V2.9.2.1, though.
- If an image link contains a link text, the image will not be embedded. Rather a link will be embedded, with an underlined link text
Images - Limitations
- You cannot have multiple images on the same slide
- If there is text right before an image link, the image does not appear in the PPTX
- Text after an image link is put onto another slide. This might be unavoidable because there seems to be no notion of in-line images in PowerPoint
Slide Footer
The PowerPoint output contains the footer information as given in the PowerPoint reference file:
- Slide numbers are set automatically
- The date field is filled with the org-mode #+DATE: field, even if the date was set to "manual" in the template
- The footer text field is not filled by pandoc. It keeps the content provided by the PowerPoint reference file, if any. Footer information does not seem to be part of pandoc's internal data structure. There has been discussion about this on the Internet, related to docx output. This might apply to pptx output as well.
Covered Functionality
Scope
Two factors determine what PowerPoint functionality can be used from org mode via pandoc conversion:
- What PowerPoint functionality is accessible from pandoc
- What pandoc functionality is accessible from org mode
Regarding the PowerPoint output format, the pandoc documentation elaborates on
- The needed template slide layouts
- "fenced code attributes" Extension
- A markdown code example
- Speaker notes
Markdown and Org Mode
- On the Internet, examples for pandoc conversions often use markdown input
- The pandoc manual itself seems to be written in markdown
- pandoc cannot access Emacs/org mode settings in init.el
- pandoc might access in-buffer org-mode settings
Because of this, my first attempt to scope the supported functionality was to convert the provided markdown example to org mode using pandoc.
What about Generating DOCX Files?
Same Approach for docx as for pptx
Generating Microsoft Word docx files using pandoc follows the same mechanics as outlined above for pptx.
However, I noticed the following issue (pandoc version 3.1.8):
For pptx output, pandoc processed an org file referencing an SVG graphic without problems, and the graphic showed properly in the resulting pptx. For docx output of the same org file, however, pandoc showed a warning:
Could not convert image process file.svg:
check that rsvg-convert is in the path.
Issue with Scalable Vector Graphics (SVG) Files
When opened with MS Word 2016, the resulting docx file did show the graphic. But when opened with LibreOffice Writer version 7.4.7.2, the graphic did not show. Instead, there was an empty frame with a placeholder graphic.
rsvg-convert is a command line utility to convert SVG to PNG. In Debian Bookworm, it is contained in package librsvg2-bin.
On the other hand, this LibreOffice version does support inserted SVG graphics.
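Whether the warning is to be expected can be checked before running the docx conversion. The sketch below only tests if rsvg-convert is found on the PATH and shows how a manual SVG to PNG conversion could be assembled. The function names and file names are hypothetical; --format and --output are rsvg-convert options.

```python
import shutil
from typing import Callable, List, Optional


def rsvg_convert_available(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """True if rsvg-convert is found on the PATH."""
    return which("rsvg-convert") is not None


def svg_to_png_command(svg_path: str, png_path: str) -> List[str]:
    """Assemble an rsvg-convert call that renders an SVG file to PNG."""
    return ["rsvg-convert", "--format=png", f"--output={png_path}", svg_path]
```

If rsvg_convert_available() returns False on Debian Bookworm, installing the package librsvg2-bin named above should provide the tool.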
Appendix with Further Test Slides
Slide Title - Simple Text Paragraph with hard line break
Body text: Lorem ipsum dolor sit amet,
consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et
dolore magna aliqua.
Level 3 Heading within a slide
Body text after heading: strong text, emphasized text,
strike through text underline text. code with wiggle, code
with equals sign. Superscript and subscript text.
Slide Title - Source Code Blocks
Code block with colons:
line 1
line 2
Code block using #+BEGIN_EXAMPLE … #+END_EXAMPLE
line 1
line 2
Slide Title - Bullets and Lists
Bullet - Level 1
- Bullet - Level 2. Lorem ipsum dolor sit amet, consectetur
adipisici elit, sed eiusmod tempor incidunt ut labore
Continuation of Level 2
- Bullet - Level 1
- Ordered List - Level 1
- Ordered List - Level 1
Section Divider Slide
Slide Title - Tables
| Column 1                                  | Column 2 | Column 3 |
| Some very very very very long description |      123 | Comment  |
| Shrt dscr                                 |      956 |          |
| Sum                                       |     1079 |          |
Slide Title - JPEG Images
Text before picture
Slide Title - JPEG Images
Text before picture
Slide Title - SVG Graphics
Text before picture
Slide Title - SVG Graphics
Slide Title - SVG Graphics with UTF-8 BOM (who uses a BOM anyway?)