Multiple Distributions with Jekyll, Pandoc & Markdown
Some time ago, I got the idea that I should write a book about something. I wasn’t quite sure of the topic but I thought it would be an interesting exercise.
While contemplating that idea, I reasoned that, if I wrote a book, I would want to use some interesting technologies. I like using GitHub for source control and I’ve been interested in learning Markdown. It would also be nice if I could write the content once and use it in multiple places. Depending on the subject matter, I may even be interested in open sourcing the materials for the book.
With those constraints in mind, I set out to learn more about technologies that could solve my problem.
After some research, I found that Jekyll can generate a static HTML site from Markdown, and Pandoc can generate an e-book from Markdown. The only trick that remained was organizing the content in a way that makes it easy to reuse.
What follows is the content of a presentation that I gave for TahoeJS.
For the presentation, I built a website, an e-book and an HTML presentation (using Reveal.js) that are all generated from the same set of Markdown content files. The code and all of the content may be found in my GitHub repository.
The Problem
One of the big challenges that we face today is distributing content across multiple channels without duplicating that content.
Duplication of content:
- introduces the possibility that out-of-date content will be published by accident
- increases the amount of work needed for maintenance
- is unnecessary!
Markdown
“Markdown is a lightweight markup language with plain text formatting syntax designed so that it can be converted to HTML and many other formats…”
–Wikipedia
Markdown gives us the opportunity to write our content once and then output it in different formats as necessary.
Jekyll
“Jekyll is a simple, blog-aware, static site generator for personal, project, or organization sites.”
–Wikipedia
Jekyll is a static site generator. It combines templates with source files written in Markdown, HTML, CSS, Liquid, etc. to produce a static site made of HTML, CSS and JavaScript. Because it is “blog-aware”, Jekyll lets you publish a blog post simply by saving it in the correct folder (_posts, by convention).
Static sites can be served cheaply with good performance.
Pandoc
Pandoc is a tool for converting content files (HTML, Markdown, etc.) into documents suitable for publishing.
Output formats include:
- Office Open XML
- OpenDocument
- HTML
- Wiki markup
- ebooks
- and various TeX formats (through which it can produce a PDF)
A Solution
Use Markdown to create the content, and Jekyll and Pandoc to format it.
Since both Jekyll and Pandoc accept Markdown as input, we can write our content once and let each tool produce the output in the format we need.
This site and the presentation serve as an example of how this can be accomplished.
Site Structure
The following site structure is recommended:
.
├── _config.yml
├── _data
│   └── content.csv
├── _epub
│   ├── create.sh
│   └── title.txt
├── _includes
│   └── en
│       └── jekyll
│           ├── content.md
│           └── title.md
├── _layouts
├── _posts
├── _site
├── assets
├── css
├── index.md
├── js
├── other.md
├── package.json
└── plugin
Jekyll is somewhat particular about how files are arranged. It favors convention over configuration, so it expects the files that it will operate on to be in particular locations.
Pandoc is more forgiving. Generally, you are just providing a list of files to the Pandoc command line tool, so it doesn’t particularly care where those files are. But not all of the code that will be used to build our Jekyll site should make it into our Pandoc ebook.
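As a rough illustration of handing Pandoc only the files that belong in the book, here is a minimal Python sketch. The function name `build_pandoc_command` is hypothetical, and the idea that the book content lives in a single folder of `.md` files is an assumption based on the structure above; it is not necessarily what `create.sh` in the repository does. The sketch only assembles the command and does not run Pandoc.

```python
from pathlib import Path


def build_pandoc_command(content_dir, title_file, output_file):
    """Build (but do not run) a Pandoc command line for the e-book.

    Hypothetical helper: only Markdown files found directly under
    content_dir are passed along, so Jekyll-only files such as layouts
    or index.md never reach the book.
    """
    # Sorted so the order of the content files is stable between runs.
    sources = sorted(str(p) for p in Path(content_dir).glob("*.md"))
    # Pandoc infers the output format from the extension of the -o file.
    return ["pandoc", "-o", str(output_file), str(title_file), *sources]
```

Calling `build_pandoc_command("_includes/en/jekyll", "_epub/title.txt", "book.epub")` returns a list that could be passed to `subprocess.run(..., check=True)` once Pandoc is installed.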
Structure Considerations
There are a few things to consider before building your site and book:
- consider which parts of your content will appear in the book and which on the site
- break up the content into small blocks based on whether and where each will appear in different publications
  - for example, an introductory paragraph might appear on the About page of your site, in the Introduction of your book or in the speaker notes of a presentation. It should be kept in a separate block from other introductory content that might appear in other parts of your materials
- take advantage of Jekyll’s collections to group content that can be iterated over
  - for example, chapters of your book might be organized into a Jekyll collection so that you can simply iterate through them on your site
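The idea of small, reusable blocks can be sketched in a few lines of Python. This is a conceptual model only: the block names, the sample text and the publication layouts are all made up for illustration, and in practice Jekyll includes and Pandoc's file list do this job rather than a script like this.

```python
# Hypothetical content blocks; each would normally be its own Markdown file.
BLOCKS = {
    "intro": "Some time ago, I got the idea that I should write a book.",
    "problem": "Duplicated content is harder to maintain.",
    "solution": "Write once in Markdown; format with Jekyll and Pandoc.",
}

# Hypothetical publications, each listing the blocks it uses, in order.
PUBLICATIONS = {
    "site_about_page": ["intro"],
    "book_introduction": ["intro", "problem"],
    "presentation_notes": ["intro", "solution"],
}


def assemble(publication):
    """Join the blocks a publication uses, in the order it lists them."""
    return "\n\n".join(BLOCKS[name] for name in PUBLICATIONS[publication])
```

Because the introductory paragraph lives in exactly one block, editing it once updates the About page, the book introduction and the speaker notes together.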