Generate a Sitemap from an Algolia Index
Having great content and UX is only useful if people can find it. Search Engine Optimization (SEO) is a crucial traction strategy for most websites, and sitemaps play a significant role. A sitemap is a file that describes all the pages of your website, so that search engine bots can easily index your content. Sitemaps provide valuable information such as which pages to prioritize, or how often a page updates.
Sitemaps are particularly useful for sites or applications that load content asynchronously. That’s the case for most JavaScript-powered single-page applications and progressive web apps. It’s also the case when you’re using Algolia on the front end.
Thanks to the flexibility of facets, Algolia can power navigation in addition to search result pages, which lets you implement dynamic category pages based on the data in your index. These are great candidates to add to your sitemap.
Prerequisites
Familiarity with Node.js
This tutorial assumes you’re familiar with Node.js, how it works, and how to create and run Node.js scripts. Make sure to install Node.js (v6+) in your environment.
If you want to learn more about Node.js before going further, you can start with the following resources.
Have an Algolia account
This tutorial assumes you already have an Algolia account. If not, you can create an account before getting started.
Dataset
For this tutorial, you’ll use an ecommerce dataset where each result is a product. All records have a categories attribute containing one or more categories.
To follow along, you can download the dataset and import it in your Algolia application.
Install dependencies
Before starting, you need to install algolia-sitemap in your project. This open source wrapper for algoliasearch lets you dynamically generate sitemaps from your Algolia indices.
npm install algolia-sitemap
Create a sitemap of all the records in your index
First, you need to create a sitemap with all your catalog products to make sure search engines know where to find them. You need to provide your Algolia credentials: your application ID and an API key that has the browse permission. You can generate such a key from the API keys tab of your Algolia dashboard.
const algoliaSitemap = require('algolia-sitemap');

const algoliaConfig = {
  appId: 'YourApplicationID',
  apiKey: 'YourBrowseCapableAPIKey', // Must have a `browse` permission
  indexName: 'your_index_name',
};

algoliaSitemap({
  algoliaConfig,
});
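Hard-coding credentials in a script makes them easy to leak. As a minimal sketch, assuming you keep them in environment variables, you could build the same configuration object as shown here. The variable names ALGOLIA_APP_ID, ALGOLIA_BROWSE_API_KEY, and ALGOLIA_INDEX_NAME and the helper buildAlgoliaConfig are hypothetical and not part of algolia-sitemap.

```javascript
// Hypothetical helper: builds the `algoliaConfig` object from environment variables.
// The variable names are assumptions; use whatever your deployment defines.
function buildAlgoliaConfig(env = process.env) {
  const config = {
    appId: env.ALGOLIA_APP_ID,
    apiKey: env.ALGOLIA_BROWSE_API_KEY, // Must have the `browse` permission
    indexName: env.ALGOLIA_INDEX_NAME,
  };

  // Fail early with a clear message if a value is missing.
  const missing = Object.keys(config).filter((key) => !config[key]);
  if (missing.length > 0) {
    throw new Error(`Missing Algolia configuration: ${missing.join(', ')}`);
  }

  return config;
}

// Example with explicit values instead of the real environment.
const exampleConfig = buildAlgoliaConfig({
  ALGOLIA_APP_ID: 'YourApplicationID',
  ALGOLIA_BROWSE_API_KEY: 'YourBrowseCapableAPIKey',
  ALGOLIA_INDEX_NAME: 'your_index_name',
});
console.log(exampleConfig.indexName);
```

In a real script, you would call buildAlgoliaConfig() with no argument and pass the result as algoliaConfig.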
Then, you need to provide a hitToParams callback. This function is called for each record and lets you map a record to a sitemap entry. The return value of your callback must be an object whose attributes are the same as those of a <url> entry in a sitemap.xml file.
- loc (required): The URL of the detail page
- lastmod: The last modified date (ISO 8601)
- priority: The priority of this page compared to other pages in your site (between 0 and 1)
- changefreq: Describes how frequently the page is likely to change
- alternates: Alternate versions of this link
- alternates.languages: An array of enabled languages for this link
- alternates.hitToURL: A function to transform a language into a URL
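To make these attributes concrete, here is a hedged sketch of a callback that fills in all of them. The record fields slug and updatedAt, the URL scheme, and the chosen priority, changefreq, and language values are assumptions made for illustration; adapt them to your own records.

```javascript
// Hypothetical record shape: { slug: string, updatedAt: number (Unix time in ms) }.
function hitToParams({ slug, updatedAt }) {
  // Build the URL of a product page for a given language (assumed URL scheme).
  const productUrl = (lang) => `https://example.com/${lang}/products/${slug}`;

  return {
    loc: productUrl('en'),
    lastmod: new Date(updatedAt).toISOString(), // ISO 8601
    priority: 0.8,
    changefreq: 'weekly',
    alternates: {
      languages: ['fr', 'de'],
      hitToURL: (lang) => productUrl(lang),
    },
  };
}

const entry = hitToParams({
  slug: 'blue-phone',
  updatedAt: Date.UTC(2024, 0, 15),
});
console.log(entry.loc); // https://example.com/en/products/blue-phone
console.log(entry.lastmod); // 2024-01-15T00:00:00.000Z
```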
In your case, you can keep it simple and only output the loc property for each product. Make sure to modify the hitToParams function to match the content of your records. You also need to create a /sitemaps directory to output all generated sitemaps.
function hitToParams({ url }) {
  return { loc: url };
}

algoliaSitemap({
  algoliaConfig,
  hitToParams,
  sitemapLoc: 'https://example.com/sitemaps',
  outputFolder: 'sitemaps',
});
You can now run your script with Node.js to generate sitemaps in the /sitemaps directory. There are two types of sitemap files:
- the sitemap-index file, with a link to each sitemap,
- and the sitemap files, with links to your products.
To ensure the generated sitemaps are correct, you can use any online sitemap validator, such as XML Sitemap Checker. Note that Algolia doesn’t run this website and can’t provide support for it.
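Before using an online validator, a quick local sanity check can catch obvious mistakes. The sketch below is not a full validator: the helper findInvalidLocs is hypothetical and only checks that every <loc> value is an absolute http(s) URL, which is the kind of error a wrong hitToParams mapping tends to produce.

```javascript
// Returns the <loc> values that aren't absolute http(s) URLs.
function findInvalidLocs(xml) {
  const locs = [...xml.matchAll(/<loc>([^<]*)<\/loc>/g)].map((match) =>
    match[1].trim().replace(/&amp;/g, '&')
  );

  return locs.filter((loc) => {
    try {
      const { protocol } = new URL(loc);
      return protocol !== 'http:' && protocol !== 'https:';
    } catch (error) {
      return true; // Not parseable as an absolute URL
    }
  });
}

const sampleXml = `
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/1</loc></url>
  <url><loc>/products/2</loc></url>
</urlset>`;

console.log(findInvalidLocs(sampleXml)); // [ '/products/2' ]
```

To check real output, you could read each generated file with fs.readFileSync and pass its contents to the function.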
Create a sitemap for categories
Now you can generate entries for category pages. Your records have a categories attribute that looks like the following:
{
  "categories": ["Mobile Phones", "Phones & Tablets"]
}
Here, the product belongs to two categories. For this tutorial, assume that each category page is available at https://example.com/CATEGORY_NAME.
You need to modify your hitToParams function so it returns an array of all the categories that belong to the given hit. Since categories likely apply to many records, you need to make sure not to add them to your sitemaps more than once.
// Keeps track of the categories already written to the sitemap
const alreadyAdded = {};

function hitToParams({ categories }) {
  const newCategories = categories.filter(
    (category) => !alreadyAdded[category]
  );

  // Nothing new to add for this hit
  if (!newCategories.length) {
    return false;
  }

  const locs = [];
  newCategories.forEach((category) => {
    alreadyAdded[category] = category;
    locs.push({
      loc: `https://example.com/${category}`,
    });
  });

  return locs;
}
For each hit, you check whether it contains categories that you haven’t added to the sitemap yet, and you add only those. This way, every category page ends up in your sitemap exactly once.
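Category names such as "Phones & Tablets" contain spaces and an ampersand, which shouldn’t appear unencoded in a URL path. As a hedged variant of the function above, assuming your category pages really live at https://example.com/ followed by the encoded category name, you could track seen categories in a Set and encode each name. The name seenCategories is introduced here for illustration.

```javascript
// Tracks the categories already emitted (hypothetical name).
const seenCategories = new Set();

function hitToParams({ categories }) {
  const locs = [];

  categories.forEach((category) => {
    if (seenCategories.has(category)) return;
    seenCategories.add(category);
    // Encode spaces, ampersands, and other reserved characters.
    locs.push({ loc: `https://example.com/${encodeURIComponent(category)}` });
  });

  // Same convention as the tutorial's version: return false when nothing is new.
  return locs.length > 0 ? locs : false;
}

const first = hitToParams({ categories: ['Mobile Phones', 'Phones & Tablets'] });
const second = hitToParams({ categories: ['Phones & Tablets'] });

console.log(first);
console.log(second); // false
```

The second call returns false because its only category was already added by the first call.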
Create a sitemap for both products and categories
You can edit your script to generate a sitemap for both your products and categories. To do so, all you need to do is push the current product along with its categories.
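// Reuses the `alreadyAdded` object from the previous step.
function hitToParams({ categories, url }) {
  // Always emit the product page itself, exactly once per hit.
  const locs = [{ loc: url }];

  // Add each category page the first time you encounter it.
  categories.forEach((category) => {
    if (alreadyAdded[category]) {
      return;
    }
    alreadyAdded[category] = category;
    locs.push({ loc: `https://example.com/${category}` });
  });

  return locs;
}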
Notify search engines of sitemap changes
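Finally, you can let search engines know that your sitemap changed. Search engines have offered a ping mechanism to inform them of a new sitemap, so you can perform this directly from your script.
For Google and Bing, this has meant sending a GET request to a specific endpoint. Google deprecated its sitemap ping endpoint in 2023, and Bing retired anonymous sitemap pings in 2022, so these requests may no longer have any effect. Check each search engine’s current documentation before relying on them.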
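// `fetch` is global in Node.js 18+; on older versions, use a polyfill such as node-fetch.
const endpoints = [
  'http://www.google.com/webmasters/sitemaps/ping?sitemap=http://example.com/sitemap.xml',
  'http://www.bing.com/webmaster/ping.aspx?siteMap=http://example.com/sitemap.xml',
];

// Wrap `fetch` so that `map` doesn't pass the index and array as extra arguments.
Promise.all(endpoints.map((endpoint) => fetch(endpoint))).then(() => {
  console.log('Done');
});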
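Since the sitemap URL is passed as a query parameter, it’s safer to encode it, and a failed ping to one search engine shouldn’t prevent the others from running. The following sketch illustrates both ideas. The helpers buildPingUrls and pingSearchEngines and the file name sitemap-index.xml are assumptions for illustration; the endpoints are the ones listed above.

```javascript
// Hypothetical helper: builds ping URLs with the sitemap URL safely encoded.
function buildPingUrls(sitemapUrl) {
  const encoded = encodeURIComponent(sitemapUrl);
  return [
    `http://www.google.com/webmasters/sitemaps/ping?sitemap=${encoded}`,
    `http://www.bing.com/webmaster/ping.aspx?siteMap=${encoded}`,
  ];
}

// Sends every ping and reports failures instead of stopping at the first one.
// Assumes a global `fetch` (Node.js 18+).
async function pingSearchEngines(sitemapUrl) {
  const urls = buildPingUrls(sitemapUrl);
  const results = await Promise.allSettled(urls.map((url) => fetch(url)));

  return results.map((result, index) => ({
    url: urls[index],
    ok: result.status === 'fulfilled' && result.value.ok,
  }));
}

// Only builds the URLs; call pingSearchEngines(...) to actually send requests.
console.log(buildPingUrls('https://example.com/sitemaps/sitemap-index.xml'));
```

Promise.allSettled waits for every request to finish, so you can log which endpoints succeeded and which didn’t.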