Generate a Sitemap from an Algolia Index

Having great content and UX is only useful if people can find it. Search Engine Optimization (SEO) is a crucial traction strategy for most websites, and sitemaps play a significant role. A sitemap is a file that describes all the pages of your website, so that search engine bots can easily index your content. Sitemaps provide valuable information such as which pages to prioritize, or how often a page updates.

Sitemaps are particularly useful for sites or applications that load content asynchronously. That’s the case for most JavaScript-powered single-page applications and progressive web apps, and for sites that use Algolia on the front end.

Thanks to the flexibility of facets, Algolia can power navigation in addition to search result pages, which lets you implement dynamic category pages based on the data in your index. These are great candidates to add to your sitemap.

Prerequisites

Familiarity with Node.js

This tutorial assumes you’re familiar with Node.js, how it works, and how to create and run Node.js scripts. Make sure to install Node.js (v6+) in your environment.

If you want to learn more about Node.js before going further, you can start with the following resources.

Have an Algolia account

This tutorial assumes you already have an Algolia account. If not, you can create an account before getting started.

Dataset

For this tutorial, you’ll use an ecommerce dataset where each result is a product. All records have a categories attribute containing one or more categories.

To follow along, you can download the dataset and import it into your Algolia application.

Install dependencies

Before starting, you need to install algolia-sitemap in your project. This open source wrapper for algoliasearch lets you dynamically generate sitemaps from your Algolia indices.

npm install algolia-sitemap

Create a sitemap of all the records in your index

First, create a sitemap of all the products in your catalog so that search engines know where to find them. You need to provide your Algolia credentials: your application ID and an API key with the browse permission. You can generate such a key from the API keys tab of your Algolia dashboard.

const algoliaSitemap = require('algolia-sitemap');

const algoliaConfig = {
  appId: 'YourApplicationID',
  apiKey: 'YourBrowseCapableAPIKey', // Must have a `browse` permission
  indexName: 'your_index_name',
};

algoliaSitemap({
  algoliaConfig,
});
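
To keep the API key out of source control, the configuration could be read from environment variables instead of being hard-coded. The following is a minimal sketch, not part of algolia-sitemap: the buildAlgoliaConfig helper and the variable names ALGOLIA_APP_ID, ALGOLIA_BROWSE_API_KEY, and ALGOLIA_INDEX_NAME are assumptions made for illustration.

```javascript
// Hypothetical helper: builds the algoliaConfig object from environment
// variables. The variable names are assumptions, not an Algolia convention.
function buildAlgoliaConfig(env = process.env) {
  const config = {
    appId: env.ALGOLIA_APP_ID,
    apiKey: env.ALGOLIA_BROWSE_API_KEY, // Must have the `browse` permission
    indexName: env.ALGOLIA_INDEX_NAME,
  };

  // Fail early with a readable message if a value is missing
  const missing = Object.keys(config).filter((key) => !config[key]);
  if (missing.length > 0) {
    throw new Error(`Missing Algolia settings: ${missing.join(', ')}`);
  }

  return config;
}
```

The returned object can then be passed as algoliaConfig in place of the hard-coded one.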

Then, you need to provide a hitToParams callback. The library calls this function for each record, which lets you map a record to a sitemap entry. The return value of your callback must be an object whose attributes are the same as those of a <url> entry in a sitemap.xml file:

  • loc (required): The URL of the detail page
  • lastmod: The last modified date (ISO 8601)
  • priority: The priority of this page compared to other pages in your site (between 0 and 1)
  • changefreq: Describes how frequently the page is likely to change
  • alternates: Alternate versions of this link
  • alternates.languages: An array of enabled languages for this link
  • alternates.hitToURL: A function to transform a language into a URL
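
As an illustration of the attributes above, a fuller callback might look like the following sketch. It assumes records with objectID and updatedAt (a timestamp in milliseconds) attributes and product pages served at https://example.com/LANG/products/OBJECT_ID. These attribute names, the URL scheme, and the chosen priority and changefreq values are hypothetical, so adapt them to your own records.

```javascript
// Hypothetical URL scheme: one product page per language
function productUrl(lang, objectID) {
  return `https://example.com/${lang}/products/${encodeURIComponent(objectID)}`;
}

// Assumes each record has `objectID` and `updatedAt` (milliseconds since epoch)
function hitToParams({ objectID, updatedAt }) {
  return {
    loc: productUrl('en', objectID),
    lastmod: new Date(updatedAt).toISOString(), // ISO 8601
    priority: 0.8,
    changefreq: 'weekly',
    alternates: {
      languages: ['fr', 'de'],
      hitToURL: (lang) => productUrl(lang, objectID),
    },
  };
}
```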

For this tutorial, you can keep it simple and only output the loc property for each product. Make sure to modify the hitToParams function to match the content of your records. You also need to create a sitemaps directory to hold the generated sitemaps.

function hitToParams({ url }) {
  return { loc: url };
}

algoliaSitemap({
  algoliaConfig,
  hitToParams,
  sitemapLoc: 'https://example.com/sitemaps',
  outputFolder: 'sitemaps',
});
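
The output directory can also be created from the script itself before calling algoliaSitemap. This is a minimal sketch using the built-in fs module of Node.js; the ensureOutputFolder helper name is an assumption, and the recursive option requires Node.js 10.12 or later.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: creates the folder (and any missing parents)
// if it doesn't exist yet, and returns its absolute path.
function ensureOutputFolder(folder) {
  const absolutePath = path.resolve(folder);
  fs.mkdirSync(absolutePath, { recursive: true });
  return absolutePath;
}
```

Calling ensureOutputFolder('sitemaps') before algoliaSitemap would make sure the directory passed as outputFolder exists.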

You can now run your script with Node.js to generate sitemaps in the sitemaps directory. There are two types of sitemap files:

  • the sitemap index file, which links to each sitemap,
  • and the sitemap files, which link to your products.

To ensure the generated sitemaps are correct, you can use any online sitemap validator, such as XML Sitemap Checker. Note that Algolia doesn’t run this website and can’t provide support for it.
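
As a quick local sanity check before using an online validator, you could also scan a generated file for malformed URLs. The sketch below is an illustration rather than a full XML validation: it pulls <loc> values out with a regular expression, the findInvalidLocs name is hypothetical, and the global URL class requires Node.js 10 or later.

```javascript
// Hypothetical helper: returns the <loc> values that aren't absolute
// http(s) URLs. It doesn't validate the XML structure itself.
function findInvalidLocs(xml) {
  const locPattern = /<loc>([^<]*)<\/loc>/g;
  const invalid = [];
  let match;

  while ((match = locPattern.exec(xml)) !== null) {
    const loc = match[1].trim();
    try {
      const { protocol } = new URL(loc);
      if (protocol !== 'http:' && protocol !== 'https:') {
        invalid.push(loc);
      }
    } catch (error) {
      // new URL() throws on strings it can't parse as absolute URLs
      invalid.push(loc);
    }
  }

  return invalid;
}
```

To use it, pass the contents of a generated sitemap file, for example after reading it with fs.readFileSync. An empty array means every <loc> parsed as an absolute http or https URL.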

Create a sitemap for categories

Now you can generate entries for category pages. Your records have a categories attribute that looks like the following:

{
  "categories": ["Mobile Phones", "Phones & Tablets"]
}

Here, the product belongs to two categories. This tutorial assumes that each category page is available at https://example.com/CATEGORY_NAME.
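
Category names such as "Phones & Tablets" contain spaces and an ampersand, which shouldn't appear unescaped in a URL. One option, sketched below, is to percent-encode the name. The categoryToUrl helper is hypothetical, and your site may use slugs or IDs instead, in which case the mapping should mirror your own routing.

```javascript
// Hypothetical helper: builds the category page URL for a category name,
// percent-encoding characters such as spaces and "&".
function categoryToUrl(category, baseUrl = 'https://example.com') {
  return `${baseUrl}/${encodeURIComponent(category)}`;
}
```

With this helper, categoryToUrl('Phones & Tablets') returns https://example.com/Phones%20%26%20Tablets.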

You need to modify your hitToParams function so it returns an array of entries, one for each category of the given hit. Since a category likely applies to many records, you need to make sure not to add it to your sitemaps more than once.

const alreadyAdded = {};

function hitToParams({ categories }) {
  const newCategories = categories.filter(
    (category) => !alreadyAdded[category]
  );

  if (!newCategories.length) {
    return false;
  }

  const locs = [];

  newCategories.forEach((category) => {
    alreadyAdded[category] = category;

    locs.push({
      loc: `https://example.com/${category}`,
    });
  });

  return locs;
}

For each hit, you check whether it contains categories that you haven’t added to the sitemap yet, and you add them. This way, every category page ends up in your sitemap exactly once.
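
To see the deduplication at work without querying Algolia, the callback can be run against a few in-memory records. This sketch tracks seen categories with a Set instead of a plain object and also guards against a category repeated within one record; both are substitutions made for illustration rather than the approach shown above, and the sample records are made up.

```javascript
const seenCategories = new Set();

// Returns one entry per category not seen before, or false if there are none
function hitToParams({ categories }) {
  // new Set(categories) drops duplicates within a single record
  const newCategories = [...new Set(categories)].filter(
    (category) => !seenCategories.has(category)
  );

  if (newCategories.length === 0) {
    return false; // Nothing new to add for this record
  }

  return newCategories.map((category) => {
    seenCategories.add(category);
    return { loc: `https://example.com/${encodeURIComponent(category)}` };
  });
}

// Made-up records standing in for hits from the index
const sampleHits = [
  { categories: ['Mobile Phones', 'Phones & Tablets'] },
  { categories: ['Phones & Tablets', 'Accessories'] },
  { categories: ['Mobile Phones'] },
];

const results = sampleHits.map((hit) => hitToParams(hit));
```

Here the first record yields two entries, the second yields only the new "Accessories" category, and the third returns false because its category was already added.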

Create a sitemap for both products and categories

You can edit your script to generate a sitemap for both your products and categories. To do so, add an entry for the current product along with the entries for its new categories.

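function hitToParams({ categories, url }) {
  const newCategories = categories.filter(
    (category) => !alreadyAdded[category]
  );

  // Always add the product page, exactly once per record,
  // even when all of its categories were already added
  const locs = [{ loc: url }];

  newCategories.forEach((category) => {
    alreadyAdded[category] = category;

    locs.push({
      loc: `https://example.com/${category}`,
    });
  });

  return locs;
}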

Notify search engines of sitemap changes

Finally, you can let search engines know that your sitemap changed. Most search engines have a ping mechanism to inform them of a new sitemap, so you can perform this directly from your script.

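For Google and Bing, all you need to do is send a GET request to a specific endpoint. Search engines change or retire these endpoints over time (Google announced in 2023 that it was deprecating its sitemap ping endpoint), so check each search engine’s current documentation before relying on them.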

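// The global fetch() function requires Node.js 18 or later
const endpoints = [
  'http://www.google.com/webmasters/sitemaps/ping?sitemap=http://example.com/sitemap.xml',
  'http://www.bing.com/webmaster/ping.aspx?siteMap=http://example.com/sitemap.xml',
];

// Wrap fetch so that map() doesn't pass the index as the options argument
Promise.all(endpoints.map((endpoint) => fetch(endpoint)))
  .then(() => {
    console.log('Done');
  })
  .catch((error) => {
    console.error('Ping failed:', error);
  });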
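
Because the sitemap location is itself a URL passed as a query parameter, it's safer to percent-encode it. The sketch below builds the ping URLs from the two endpoints used above and returns the HTTP status of each request. The buildPingUrls and pingSearchEngines names are hypothetical, the global fetch requires Node.js 18 or later, and the endpoints are taken as-is from this guide, so check that each search engine still supports them.

```javascript
// Endpoints copied from the example above; availability isn't guaranteed
const PING_ENDPOINTS = [
  'http://www.google.com/webmasters/sitemaps/ping?sitemap=',
  'http://www.bing.com/webmaster/ping.aspx?siteMap=',
];

// Hypothetical helper: appends the percent-encoded sitemap URL to each endpoint
function buildPingUrls(sitemapUrl) {
  return PING_ENDPOINTS.map(
    (endpoint) => `${endpoint}${encodeURIComponent(sitemapUrl)}`
  );
}

// Hypothetical helper: pings every endpoint and returns each HTTP status
async function pingSearchEngines(sitemapUrl) {
  const responses = await Promise.all(
    buildPingUrls(sitemapUrl).map((url) => fetch(url))
  );

  return responses.map((response) => ({
    url: response.url,
    ok: response.ok,
    status: response.status,
  }));
}
```

Checking the ok flag of each result makes it possible to log endpoints that rejected the request instead of printing "Done" unconditionally.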