Hakyll Pt. 2 – Generating a Sitemap XML File
This is part 2 of a multipart series where we will look at getting a website / blog set up with hakyll and customized a fair bit.
- Pt. 1 – Setup & Initial Customization
- Pt. 2 – Generating a Sitemap XML File
- Pt. 3 – Generating RSS and Atom XML Feeds
- Pt. 4 – Copying Static Files For Your Build
- Pt. 5 – Generating Custom Post Filenames From a Title Slug
- Pt. 6 – Pure Builds With Nix
- (wip) Pt. 7 – Customizing Markdown Compiler Options
Overview
Adding a Sitemap Template
A sitemap.xml template, just like the templates in the last
post,
receives context fields to work with (variables, essentially), and outputs the
result of applying said context to the template. Here is what our sitemap
template will look like today in our project’s templates/sitemap.xml
:
<?xml version="1.0" encoding="UTF-8"?>
urlset
< xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
>url>
<loc>$root$</loc>
<changefreq>daily</changefreq>
<priority>1.0</priority>
<url>
</
$for(pages)$url>
<loc>$root$$url$</loc>
<lastmod>$if(updated)$$updated$$else$$if(date)$$date$$endif$$endif$</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
<url>
</
$endfor$urlset> </
Apart from the normal sitemap boilerplate, you can see root
, pages
, url
,
date
and updated
context fields. While date
and updated
would come from your metadata fields defined for a post, and the url
is built
from hakyll’s defaultContext
,
the root
and pages
fields are custom defined in what will be our very own
sitemapCtx
context. In the next section, we’ll use this template to generate
our sitemap.xml file.
Generating the Sitemap XML File
If you create a hakyll project from scratch,
you will start out with a few files that we can add to our sitemap:
* index.html
* about.rst
* contact.markdown
* posts/2015-08-12-spqr.html
* posts/2015-10-07-rosa-rosa-rosam.html
* posts/2015-11-28-carpe-diem.html
* posts/2015-12-07-tu-quoque.html
You should note that your site.hs
file also has the following:
main :: IO ()
= hakyllWith config $ do
main -- ...
"about.rst", "contact.markdown"]) $ do
match (fromList [$ setExtension "html"
route $ pandocCompiler
compile >>= loadAndApplyTemplate "templates/default.html" defaultContext
"posts/*" $ do
match $ setExtension "html"
route $ pandocCompiler
compile >>= loadAndApplyTemplate "templates/post.html" postCtx
>>= loadAndApplyTemplate "templates/default.html" postCtx
It’s important that you understand that any files you want to be loaded and sent
to templates/sitemap.xml
must first be match
ed and compile
d before the
sitemap can be built. If you don’t do this, you’ll pull your hair out wondering
why the file (or folder) you’re trying to include in the sitemap never shows up.
Now, there is something that we are going to emulate to make this sitemap a reality
(this should already be in site.hs
):
main :: IO ()
= hakyllWith config $ do
main -- ...
"archive.html"] $ do
create [
route idRoute$ do
compile <- recentFirst =<< loadAll "posts/*"
posts let archiveCtx =
"posts" postCtx (return posts) `mappend`
listField "title" "Archives" `mappend`
constField
defaultContext
""
makeItem >>= loadAndApplyTemplate "templates/archive.html" archiveCtx
>>= loadAndApplyTemplate "templates/default.html" archiveCtx
Reading the code above, this essentially says
1. here’s a file we want to create that does not yet exist (how create
differs from match
)
1. when you create the route, keep the filename (what idRoute
does)
1. when you compile, load all the posts, specify what the context to send
to each template will be, then make the item (the ""
is an identifier…
see the source
for more), then pass the context to the archive template and pass that on to
the default template, ultimately building up a full webpage from the
inside-out
Let’s change this 3-step rule to suit our needs before we wrangle the code. We
want our rules to say:
1. here’s a file we want to create that does not yet exist (sitemap.xml
)
1. when you create the route, keep the filename (what idRoute
does)
1. when you compile, load all the posts, load all the other pages, specify
what the context to send to each template will be, then make the item, then
pass the context to the sitemap template, ultimately building up an XML file
This is almost the same! Let’s write it:
main :: IO ()
= hakyllWith config $ do
main -- ...
"sitemap.xml"] $ do
create [
route idRoute$ do
compile -- load and sort the posts
<- recentFirst =<< loadAll "posts/*"
posts
-- load individual pages from a list (globs DO NOT work here)
<- loadAll (fromList ["about.rst", "contact.markdown"])
singlePages
-- mappend the posts and singlePages together
let pages = posts <> singlePages
-- create the `pages` field with the postCtx
-- and return the `pages` value for it
= listField "pages" postCtx (return pages)
sitemapCtx
-- make the item and apply our sitemap template
""
makeItem >>= loadAndApplyTemplate "templates/sitemap.xml" sitemapCtx
This is starting to look good! But what’s wrong here? Remember the root
context bits? We’re going to need to define what that is, and the best way that
I’ve found right now is simply as a String
; if you want to do something fancy
with configuration or reading it in dynamically, then go nuts.
root :: String
= "https://ourblog.com" root
With that defined, we can add it to our contexts:
main :: IO ()
= hakyllWith config $ do
main -- ...
"sitemap.xml"] $ do
create [
route idRoute$ do
compile <- recentFirst =<< loadAll "posts/*"
posts <- loadAll (fromList ["about.rst", "contact.markdown"])
singlePages let pages = posts <> singlePages
=
sitemapCtx "root" root <> -- here
constField "pages" postCtx (return pages)
listField ""
makeItem >>= loadAndApplyTemplate "templates/sitemap.xml" sitemapCtx
-- ...
postCtx :: Context String
=
postCtx "root" root <> -- here
constField "date" "%Y-%m-%d" <>
dateField defaultContext
Hint: if the <>
is throwing you for a loop, it’s defined as the same as thing
as mappend
.
See how we defined constField "root" root
in two places? We’re talking about
two different contexts here: the sitemap context and the post context. While
you could have the postCtx
be combined with the sitemapCtx
, thus giving the
pages
field access to the root
field, you probably want to use root
(and
perhaps other constants) wherever you work with posts, so adding them to
postCtx
for use everywhere seems like the right thing to do.
Once you’ve got all this, run the following to build (or rebuild) your
docs/sitemap.xml
file:
1. λ stack build
1. λ stack exec site clean
1. λ stack exec site build
Your docs/sitemap.xml
should now have all your pages defined in it!
Adding Other Pages and Directories
We’ve done some epic traveling in New Zealand and now want to include a bunch of
pages we’ve written in the sitemap. Those pages are:
* new-zealand/index.md
* new-zealand/otago/index.md
* new-zealand/otago/dunedin-area.md
* new-zealand/otago/queenstown-area.md
* new-zealand/otago/wanaka-area.md
First, we make sure that our pages get compiled (we’ll use postCtx
for them):
main :: IO ()
= hakyllWith config $ do
main -- ...
"new-zealand/**" $ do
match $ setExtension "html"
route $ pandocCompiler
compile >>= loadAndApplyTemplate "templates/post.html" postCtx
>>= loadAndApplyTemplate "templates/default.html" postCtx
And then we want to make sure we add them to our create
function:
main :: IO ()
= hakyllWith config $ do
main -- ... match code up here
"sitemap.xml"] $ do
create [
route idRoute$ do
compile <- recentFirst =<< loadAll "posts/*"
posts <- loadAll (fromList ["about.rst", "contact.markdown"])
singlePages <- loadAll "new-zealand/**" -- here
nzPages let pages = posts <> singlePages <> nzPages -- here
=
sitemapCtx "root" root <>
constField "pages" postCtx (return pages)
listField ""
makeItem >>= loadAndApplyTemplate "templates/sitemap.xml" sitemapCtx
I could not figure out how to mix globs (new-zealand/**
) in with individual
file paths (included in fromList
), so I had to load them separately; if you
figure out how, let me know!
Once you’ve got all this, run the following to rebuild your docs/sitemap.xml
file:
1. λ stack build
1. λ stack exec site rebuild
Wrapping Up
In this lesson we learned how to dynamically generate a sitemap.xml file using hakyll. Next time, we’ll use these same skills to generate our own RSS and Atom XML feeds.
Next up: * Pt. 3 – Generating RSS and Atom XML Feeds * Pt. 4 – Copying Static Files For Your Build * Pt. 5 – Generating Custom Post Filenames From a Title Slug * (wip) Pt. 6 – Customizing Markdown Compiler Options
Thank you for reading!
Robert