As I mentioned in my previous post, my blog is both a place to throw ideas out into the world and a place to mess around with screwy ones. One of the ideas I’ve been messing around with is using Jekyll’s abilities as a static site generator to produce multiple outputs from a single source.

The first experiment in this area involved my cookbook, wherein I took a bunch of individual markdown files and crammed them together into something that pandoc can use to generate PDF and even EPUB outputs.

My second experiment in this area was with my CV. The challenge with something like a CV is that the layout requirements are pretty complex and don’t fit well with a basic template-and-markdown model. As a result, I ended up having to take a less orthodox approach to this project.

Of course the first challenge I faced was deciding on a layout for my CV. I spent quite a bit of time exploring various examples out there before I settled on something I liked. And I’ll be honest, I’m not convinced I’ve nailed it. I’m particularly concerned that the design I settled on doesn’t work well with the automated resume scraping tools that have taken over the HR world. As a result, I have no doubt I’ll have to revisit the design at some point.

Once I knew what I wanted my CV to look like, I needed to settle on a set of output formats, and I selected three:

  1. HTML, so I could publish it directly on my site,
  2. PDF, because I wanted a clean, beautiful version I could share via email, and
  3. Word, in order to remain compatible with all those scraping tools.

The question, then, was: how do I store the source content, and how do I generate the outputs?

The big challenge was that I knew I’d need a ton of control over the generated output, due to the complexity of the layout and the need to make compromises for the various formats I was using. So using straight markdown, as I did with my cookbook, was unlikely to work very well.

In the end I decided to take a bit of an unusual approach and use Jekyll data files combined with Liquid templates to get the result I was looking for.

In Jekyll, data files can take a few different formats (YAML, CSV, JSON, etc.) and live in the _data directory. Jekyll loads those files and makes their contents available under the site.data namespace, where they can be used by templates, plugins, or other components. For example, a template can loop over the entries in a data file while generating its output.

The purpose of the data file, in this context, is to store the source material that would populate my CV. In particular, I created a custom YAML file structure that contains all the content for my CV, including everything from my name, to contact details, to my employment timeline and my publication list.

Then, for each output format, I could create a template that used the contents of that data file to populate the contents of the document.
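To make the one-source, many-templates idea concrete, here is a minimal sketch written in Python rather than Liquid. The `cv` dictionary stands in for the parsed YAML data file, and every field name in it (`name`, `jobs`, `title`, `employer`) is invented for illustration; it is not the real schema. The two render functions play the role of two templates, and each one escapes the content for its own target format.

```python
import html

# Stand-in for the parsed YAML data file; every field name here is hypothetical.
cv = {
    "name": "Jane Doe",
    "jobs": [
        {"title": "Developer", "employer": "R&D Labs"},
        {"title": "Analyst", "employer": "Acme_Co"},
    ],
}

# Characters that need escaping in LaTeX body text (a partial list; backslash,
# braces, ~ and ^ are left out to keep the sketch short).
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}


def latex_escape(text: str) -> str:
    """Escape the common LaTeX special characters in plain text."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_html(data: dict) -> str:
    """Render the CV data as a small HTML fragment."""
    items = "\n".join(
        f"  <li>{html.escape(job['title'])}, {html.escape(job['employer'])}</li>"
        for job in data["jobs"]
    )
    return f"<h1>{html.escape(data['name'])}</h1>\n<ul>\n{items}\n</ul>"


def render_latex(data: dict) -> str:
    """Render the same data as a small LaTeX fragment."""
    items = "\n".join(
        f"  \\item {latex_escape(job['title'])}, {latex_escape(job['employer'])}"
        for job in data["jobs"]
    )
    return (
        f"\\section*{{{latex_escape(data['name'])}}}\n"
        f"\\begin{{itemize}}\n{items}\n\\end{{itemize}}"
    )
```

Calling `render_html(cv)` and `render_latex(cv)` on the same dictionary gives two differently escaped, differently structured documents without the content being written twice.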

For the HTML output, the template generates straight HTML that is combined with a bunch of CSS to ensure the output matches the overall aesthetic for my blog. To me the coolest part about this output is that the HTML itself is nearly semantic in its structure, with the CSS doing all the heavy lifting. It really is incredible what you can do with modern stylesheets, especially if you’re willing to ignore old browsers.

For the PDF output I used LaTeX, and as is always the case with LaTeX, getting a result I was happy with involved a lot of aggravation and cargo-culting to get things looking the way I wanted. Huge shout-out to the AwesomeCV package, which made this job so much easier (after some amount of customization)! Honestly, if I had to pick my favourite output, it’s gotta be this one. If you ask me it looks simply fantastic!

Finally, there’s the Word output.

And this is where things get a little nutty.

First off, you may not be aware that the DOCX file format is part of what’s known as the Office Open XML standard. And the key acronym, here, is “XML”.

If you take a DOCX file and unzip it, you’ll find it’s actually composed of a bunch of XML files representing various parts of the document. Now, these files are… a hot mess. I don’t recommend trying to build a DOCX file by hand. But they are XML, which means they can be generated by Jekyll using Liquid templates.
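If you want to poke around inside a DOCX without unzipping it to disk, a short Python sketch using only the standard library will do. The function names here are my own invention; the one fixed detail is that the main document body lives in `word/document.xml`.

```python
import zipfile


def list_docx_parts(path: str) -> list[str]:
    """Return the names of the files stored inside a DOCX (a ZIP archive)."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_docx_part(path: str, part: str = "word/document.xml") -> str:
    """Return one part of the DOCX as text; the main body is word/document.xml."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(part).decode("utf-8")
```

Running `list_docx_parts` on a file saved from Word typically turns up entries such as `[Content_Types].xml`, `_rels/.rels`, `word/document.xml`, and `word/styles.xml`.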

So, I started out by using Word to produce the document layout I wanted. This took… a lot of fiddling. One of the most critical things to do, at this stage, was to create a set of document styles representing the various parts of the layout I wanted to generate, so as to simplify the template generation later (since I could just reference those styles rather than manually copying formatting specifications all over the place).

Then I tore open the DOCX and reverse engineered what was going on in the file. I then turned the whole thing into a Liquid template, generating elements and substituting content based on the YAML definition of my CV.
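To give a sense of what gets generated, here is a rough Python sketch of a single styled paragraph in Word’s XML dialect. It is an illustration and not the actual Liquid template; the function name and the `JobTitle` style ID in the usage note are hypothetical. The point is that the paragraph only names a style, and the formatting itself stays in the document’s `word/styles.xml`.

```python
from xml.sax.saxutils import escape, quoteattr


def styled_paragraph(text: str, style_id: str) -> str:
    """Build a WordprocessingML paragraph that references a named style.

    style_id must match a style defined in the document's word/styles.xml.
    """
    return (
        "<w:p>"
        # Paragraph properties: point at the style instead of inlining formatting.
        f"<w:pPr><w:pStyle w:val={quoteattr(style_id)}/></w:pPr>"
        # A single run of text, XML-escaped so that & and < in the CV data are safe.
        f'<w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r>'
        "</w:p>"
    )
```

For example, `styled_paragraph("Senior Developer", "JobTitle")` produces a paragraph that picks up whatever formatting the `JobTitle` style defines, which is why setting up the styles in Word first pays off.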

After Jekyll spits out the resulting XML documents, I then run a post-build command to zip the contents back up into a DOCX! Voilà!
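The zip step can be sketched in Python along these lines. This is an illustration rather than the actual post-build command, and the function name and any paths you pass in are hypothetical. Writing `[Content_Types].xml` first is a precaution I’ve seen recommended, not something I can confirm Word requires.

```python
import zipfile
from pathlib import Path


def build_docx(source_dir: str, output_path: str) -> None:
    """Zip a directory of generated XML parts into a DOCX file."""
    root = Path(source_dir)
    output = Path(output_path).resolve()
    # Collect every file under the source directory, skipping a stale output
    # file in case it was written inside that same directory.
    files = sorted(
        p for p in root.rglob("*") if p.is_file() and p.resolve() != output
    )
    # Stable sort that moves [Content_Types].xml to the front (precaution only).
    files.sort(key=lambda p: p.name != "[Content_Types].xml")
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Archive names are relative to the package root, with forward slashes.
            archive.write(path, path.relative_to(root).as_posix())
```

The important detail is that the archive paths are relative to the root of the generated folder, so `word/document.xml` ends up at exactly that path inside the DOCX rather than nested under an extra directory.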

Like I said, it’s a little nutty.

So, what were some key challenges in this project?

As with my cookbook, hands down the hardest part was creating layouts for each different output I wanted to generate. The HTML design, the LaTeX document, and the DOCX output were all completely separate design efforts. So while I was able to write the content once (which was brilliant!), I had to do a ton of work to generate the target layouts.

And each one of these formats has its weird challenges: HTML and CSS can be challenging for a variety of reasons; LaTeX is simply a nightmare, particularly if you dive into it only occasionally; and you can only imagine what it took to get the Word output looking right.

This also means that changing the layout is going to be a gigantic PITA, as I’ll have to update not one layout but three. This is certainly one of the reasons I haven’t worked hard to redesign for document scrapers. In hindsight I really should’ve manually created a Word output, validated it against those types of tools, and then generalized the layout later.

Now, that being said, using this approach, I was able to create a perfect, custom output for each format. Rather than trying to goad some automated transformation system into converting HTML into Word, or Word into a PDF, I was able to use native formats and best-of-breed tools to generate exactly what I wanted.

And man, is it nice to just be able to update the contents of my resume without monkeying around with the layouts! I recently made some minor tweaks and all I had to do was edit the YAML file, rebuild, and everything updated perfectly. Nice!