
Format and Structure Your Data

Sending data to Algolia

Before you can search your content with Algolia, you need to send your data to Algolia. Algolia doesn’t search in your original data source, but in the data you submit, which Algolia hosts on its servers.

Here’s what the data workflow looks like:

  1. You fetch data from your data source, such as a database or static files.
  2. You transform that data into JSON records.
  3. You send the records to Algolia, using one of the official API clients or the Algolia dashboard. This is the indexing step.

Fetching data from your data source

Algolia doesn’t search your data source directly. You need to send the data to Algolia’s servers so that the engine can search it. It doesn’t matter whether your data lives in a database, a collection of XML files, spreadsheets, or any other format. The first step is to extract data from one or several sources and format it in a way that Algolia recognizes.

You don’t need to extract everything. Be selective about what goes into each record, and gather only the information that’s useful for building a search experience.

Transforming the extracted data

You need to transform the extracted data into a format that Algolia recognizes—JSON records.

Formatting and structuring your data is one of the most critical aspects of creating excellent search and relevance. Along with turning your data into JSON records, you also need to refine them. This includes reworking their content, adding new or computed attributes, creating filters, restructuring record relationships, and more.
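As a rough illustration of this transformation step, the following Python sketch turns rows from a CSV export into JSON-ready records. The CSV layout, the column names (`tags`, `contains_gluten`), and the helper names `to_record` and `transform` are all assumptions made for this example. Your own source data will look different.

```python
import csv
import io
import json

# Hypothetical source data: a CSV export with one row per recipe.
RAW_CSV = """title,description,likes,tags,contains_gluten
Blackberry and blueberry pie,A delicious pie recipe.,1128,pie|dessert|sweet,yes
"""

def to_record(row):
    """Turn one CSV row (a dict of strings) into a JSON-ready record."""
    return {
        "title": row["title"].strip(),
        "description": row["description"].strip(),
        # Convert the numeric string so it can later serve as a business metric.
        "likes": int(row["likes"]),
        # Restructure the pipe-separated tags into a list usable for filtering.
        "categories": row["tags"].split("|"),
        # Computed attribute: derive a boolean from the source column.
        "gluten_free": row["contains_gluten"].strip().lower() != "yes",
    }

def transform(raw_csv):
    """Parse the CSV text and return a list of records."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [to_record(row) for row in reader]

records = transform(RAW_CSV)
print(json.dumps(records, indent=2))
```

The sketch keeps only the columns that matter for search, display, filtering, or ranking, and it converts strings into typed values (a number, a list, a boolean) along the way.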

Sending your records to Algolia

Once your records are ready, you need to send them to Algolia, using one of the Algolia API clients. Records are then stored in an Algolia index. This is all you need to do to start searching your data.

To get started, you can use the Algolia dashboard, which allows you to paste in JSON records directly. You can also write a script to send your data using the Algolia API. This script runs on your computer or server, not on Algolia’s. You can write the script in any of the 11 languages that Algolia covers with the official API clients. Check out the quick start guide to learn more.
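A sending script could be organized along the lines of the Python sketch below. It makes no real API calls: `send_batch` is a placeholder for whatever call your chosen API client provides, and the helper names `batches` and `send_all`, as well as the batch size, are assumptions made for illustration. Sending in batches is a design choice of this sketch, not a requirement stated in this guide.

```python
from typing import Callable, Iterable, Iterator

def batches(records: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch

def send_all(records: Iterable[dict],
             send_batch: Callable[[list], None],
             size: int = 1000) -> int:
    """Send records in batches and return the number of records sent."""
    sent = 0
    for batch in batches(records, size):
        send_batch(batch)  # In a real script, this would call an API client.
        sent += len(batch)
    return sent

# Stand-in sender so the sketch runs without credentials or network access.
outbox = []
total = send_all(({"title": f"Recipe {i}"} for i in range(5)), outbox.append, size=2)
print(total, [len(b) for b in outbox])
```

Passing the sender in as a function keeps the script easy to test locally before you point it at a real index.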

If the data you plan to send to Algolia lives on various websites, consider using the Algolia Crawler. With a little configuration, the Algolia Crawler directly extracts and uploads records from your sites to Algolia indices.

Algolia records

An Algolia record (or object) is a set of key-value pairs called attributes. Attributes don’t have to respect a schema and can change from one object to another.

You want your records to contain any information that facilitates search, display on the front end, filtering, or relevance. You can leave everything else out.

Here is an example record that contains all four kinds of attributes.

{
  "title": "Blackberry and blueberry pie",
  "description": "A delicious pie recipe that combines blueberries and blackberries.",
  "image": "https://yourdomain.com/blackberry-blueberry-pie.jpg",
  "likes": 1128,
  "sales": 284,
  "categories": ["pie", "dessert", "sweet"],
  "gluten_free": false
}

Attributes for searching

Attributes for searching are the ones that contain the terms that your end users look for. If you want to search for “blueberry pie recipe”, you need attributes that contain those words—in this example, title and description.

Any textual, descriptive attribute that contains searchable keywords, such as summaries, brands, or colors, can be useful for searching.

All attributes are searchable by default, which lets you search in your records right from the start. However, for better relevance and performance, you should be more selective and set only some attributes as searchable. You can do this with the searchable attributes feature. You can also use this setting to rank your searchable attributes, making some more relevant than others.
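For the recipe record, such a configuration might look like the following Python sketch. It only builds a settings payload as a plain dictionary and doesn't send it anywhere. The setting name `searchableAttributes` and the priority order are assumptions of this example, so check them against the settings reference before relying on them.

```python
import json

record = {
    "title": "Blackberry and blueberry pie",
    "description": "A delicious pie recipe that combines blueberries and blackberries.",
    "image": "https://yourdomain.com/blackberry-blueberry-pie.jpg",
    "likes": 1128,
    "sales": 284,
    "categories": ["pie", "dessert", "sweet"],
    "gluten_free": False,
}

# Assumed settings payload: the list order expresses priority,
# so a match in `title` counts for more than a match in `description`.
settings = {
    "searchableAttributes": ["title", "description"],
}

# Attributes that stay in the record but are left out of search.
not_searchable = sorted(set(record) - set(settings["searchableAttributes"]))

print(json.dumps(settings))
print(not_searchable)
```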

Attributes for displaying

If you want to display images in your results, you need an attribute with their URLs in your records. This way, Algolia can return them within search results, and you can use them directly in the front end.

Display attributes include anything that can be useful to see in the results. This can be images, titles, and descriptions, or even attributes that you would typically use for filtering and custom ranking, such as likes count or categories. Some display attributes can also be searchable, like title and description. Others shouldn’t be, like image or likes.

Attributes for filtering

If you want to search for a subset of records based on a category, such as pie recipes or gluten-free desserts, you can set some attributes as filters. In this example, these would be categories and gluten_free.

Filterable attributes include booleans (like whether an item is public), lists (categories, tags), numeric attributes (price, rounded rating), and normalized text (colors, types, or any kind of enumerated types).
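To make the idea of a filtered subset concrete, the following Python sketch applies the two filters from the recipe example to a small list held in memory. It's a local illustration only, not how the Algolia engine evaluates filters. The extra recipes and the `filter_records` helper are made up for this example.

```python
# Hypothetical records; only the first comes from the example above.
recipes = [
    {"title": "Blackberry and blueberry pie",
     "categories": ["pie", "dessert", "sweet"], "gluten_free": False},
    {"title": "Almond flour brownies",
     "categories": ["dessert", "sweet"], "gluten_free": True},
    {"title": "Chicken pot pie",
     "categories": ["pie", "savory"], "gluten_free": False},
]

def filter_records(records, category=None, gluten_free=None):
    """Keep the records that match every filter that isn't None."""
    kept = []
    for record in records:
        # List attribute: the record matches if the category is in the list.
        if category is not None and category not in record["categories"]:
            continue
        # Boolean attribute: the record matches on exact equality.
        if gluten_free is not None and record["gluten_free"] != gluten_free:
            continue
        kept.append(record)
    return kept

pies = filter_records(recipes, category="pie")
gluten_free_desserts = filter_records(recipes, category="dessert", gluten_free=True)

print([r["title"] for r in pies])
print([r["title"] for r in gluten_free_desserts])
```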

Attributes for customizing ranking

If you want the most popular recipes to appear first in your results, you can add business-metric attributes such as the number of likes, ratings, or sales. In the recipe example, this includes likes, sales, and gluten_free.

Attributes for custom ranking are either numeric or boolean.

Custom ranking strengthens and individualizes Algolia’s default ranking formula. Ranking contributes to the relevance of your search results, and you can improve on the default ranking by adding your own business metrics to the mix. To do this, you can use the custom ranking feature.
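The effect of ordering by business metrics can be sketched locally in Python. The example below sorts a few hypothetical recipes by likes, using sales as a tie-breaker. It only shows the ordering that the metrics imply among records that are otherwise equally relevant. It doesn't reproduce Algolia’s ranking formula, and the choice of likes before sales is an assumption.

```python
# Hypothetical records; the figures for the first one come from the example record.
recipes = [
    {"title": "Blackberry and blueberry pie", "likes": 1128, "sales": 284},
    {"title": "Apple pie", "likes": 1128, "sales": 310},
    {"title": "Pumpkin pie", "likes": 640, "sales": 95},
]

# Sort by likes first and by sales second, both in descending order.
ranked = sorted(recipes, key=lambda r: (r["likes"], r["sales"]), reverse=True)

print([r["title"] for r in ranked])
```

The two pies with the same number of likes are separated by their sales figures, which is why numeric attributes with many distinct values tend to make useful ranking criteria.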

Simplifying your records

When creating a searchable index, you want to simplify your record structure as much as possible.

Each record should contain enough information to be discoverable on its own. You don’t have to follow relational database principles, such as not repeating data or creating hierarchical structures with primary and foreign keys. The Algolia engine returns records as results. Each object in your index should contain enough information for users to find it, and to allow a full display of its content.

Take a book dataset. You can have one record per book, which contains everything about the book, including chapters. The problem is, a search for a common word like “boat” would retrieve too many books, most of which aren’t about boats.

If you want to get better, more relevant matches, you need to break up chapters into individual records. This way, you can search for books on boats with far more relevance by searching through their chapters.
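The split into chapter records can be sketched in Python as follows. The book, its chapters, and the attribute names (`book_title`, `chapter_text`, and so on) are hypothetical. The point is that each chapter record repeats the book-level information so that it can be found and displayed on its own.

```python
# Hypothetical book with nested chapters, as it might come from the data source.
book = {
    "title": "A Hypothetical Sailing Handbook",
    "author": "Jane Doe",
    "chapters": [
        {"number": 1, "heading": "Choosing a boat",
         "text": "How to pick a boat for coastal sailing."},
        {"number": 2, "heading": "Reading the weather",
         "text": "Clouds, wind, and barometers."},
    ],
}

def chapter_records(book):
    """Return one record per chapter, repeating the book-level attributes."""
    records = []
    for chapter in book["chapters"]:
        records.append({
            "book_title": book["title"],   # Repeated on purpose in every record.
            "author": book["author"],      # Repeated on purpose in every record.
            "chapter_number": chapter["number"],
            "chapter_heading": chapter["heading"],
            "chapter_text": chapter["text"],
        })
    return records

records = chapter_records(book)

# A naive local keyword match, only to show that the match is now per chapter.
matches = [r for r in records if "boat" in r["chapter_text"].lower()]
print(len(records), [m["chapter_heading"] for m in matches])
```

Because every record carries the book title and author, a matching chapter can be displayed with its full context without a second lookup.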

Algolia index

An index is a collection of records. You create one as soon as you send records to Algolia. You can create several indices that contain different sets of objects. All indices live on Algolia’s servers.

Once you’ve pushed your data to Algolia, you can start thinking of how to organize your indices. This includes how many indices to have and how to configure each one. You can put all your records into a single index, or spread them across several indices. How you organize your indices depends on how you want to search and display your objects.
