Case Study for an Online Clothing Company
This case study helps you understand how to configure Algolia using the dashboard. It’s equally valuable for developers and for business users with no technical background. The main goal is to help you follow the proper steps and get consistently great search results.
The real-world case study gives a better picture of the configuration process.
Selecting searchable attributes
Attributes are the key-value pairs that compose records, and the records compose an index.
You need to select the most relevant attributes a user can search for. For example, on an ecommerce search, searchable attributes could be name or description.
Initial list of searchable attributes
- name
- description
- price
- image_url
From the initial list, you wouldn’t select price or image_url. These attributes are helpful for sorting and display, but users wouldn’t search for them.
Selecting your searchable attributes
- name
- description
Add color and brand as searchable attributes. These attributes are often used for filtering and faceting, but people also use them in search queries.
Adding more attributes
- description
- name
- color
- brand
To further improve relevance, you need to move the most relevant attributes to the top of the list. In this case, users tend to search for a color or a brand as well as an item’s name or description. It then makes more sense to have these attributes higher in the list.
Users tend to search with descriptive terms instead of exact item names. To account for this, put description first on the list, before name.
Listing the most relevant attributes first
- description
- name
- brand
- color
You could find that brand and color are equally relevant to your users. You would then put them both on the same line, separated by a comma.
Listing two equal ranking attributes
- description
- name
- brand, color
Long attributes like description can be too “noisy” and generate false positives in relevance. In such cases, you can create a derived, shorter attribute and pick just the search-relevant terms. For example, create a short_description attribute by selecting the applicable search terms from description. To create this shorter attribute, your engineering team needs to configure your data so that it’s available in Algolia’s dashboard.
Replace a “noisy” attribute with a more efficient one
- short_description
- name
- brand, color
You might want to ensure that all the words in short_description have the same importance. This means that words in the beginning, middle, and end of the description are uniformly relevant. You can do this with the unordered modifier. You can do the same for any multi-word attribute, like name.
Final list of searchable attributes
- unordered(short_description)
- unordered(name)
- brand, color
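As a rough sketch, the final list above could be expressed as index settings like this. The `searchableAttributes` key, the `unordered()` modifier, and the comma syntax for equal-priority attributes are Algolia settings; the variable name is only for illustration, and sending the settings to an index (through the dashboard or an API client) is left out.

```python
# Index settings mirroring the final list of searchable attributes.
# Each string is one priority level, from most to least important.
# A comma inside a string puts several attributes on the same level.
searchable_settings = {
    "searchableAttributes": [
        "unordered(short_description)",  # word position in the attribute is ignored
        "unordered(name)",
        "brand,color",  # brand and color rank equally
    ]
}

for level, attributes in enumerate(searchable_settings["searchableAttributes"], start=1):
    print(level, attributes)
```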
Setting custom ranking and sorting
You start with Algolia’s out-of-the-box ranking formula.
Default ranking
- Default ranking formula
You shouldn’t change or remove this ranking formula. It works out of the box for 99% of use cases.
Setting custom ranking
You can now customize your ranking by adding some business metrics. It’s typical to add popularity attributes, such as the number of likes or best sellers. It’s also worth using Click and Conversion Analytics to rank the products with the most successful conversion rate.
Custom ranking (main index)
- Default ranking formula
- number_of_sells
- popularity
- conversion_rate
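The custom ranking above might translate into settings along these lines. This is a sketch: the descending direction for each attribute is an assumption (higher values rank first), and `DEFAULT_RANKING` simply spells out the default ranking criteria so that the "custom" step is visible.

```python
# The default ranking formula, unchanged. The last criterion, "custom",
# is where the custom ranking attributes are applied.
DEFAULT_RANKING = [
    "typo", "geo", "words", "filters",
    "proximity", "attribute", "exact", "custom",
]

main_index_settings = {
    "ranking": DEFAULT_RANKING,
    # Assumption: for all three business metrics, higher is better.
    "customRanking": [
        "desc(number_of_sells)",
        "desc(popularity)",
        "desc(conversion_rate)",
    ],
}
```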
Sorting by a specific attribute
You might want to allow your users to sort by a specific attribute. To do so, you can leverage Algolia’s sorting capability, which requires you to create a replica index for each sort.
For example, you can sort by price, from highest-priced items to lowest. To do this, you can create a new replica index called products_sorted_by_price_descending.
Sort by price, highest to lowest (replica one)
- price (sort-by, descending)
- Default ranking formula
To reverse the order and sort by ascending price, you need to add another replica index, products_sorted_by_price_ascending.
Sort by price, lowest to highest (replica two)
- price (sort-by, ascending)
- Default ranking formula
Do the same for the date with a third replica, and sort from newest to oldest.
Sort by date, newest to oldest (replica three)
- date (sort-by, descending)
- Default ranking formula
You now have four indices, a main index and three replicas:
- Your main index with custom ranking (by best sellers, popularity, and conversion rate)
- Your three replicas:
  - Sorted by price, descending
  - Sorted by price, ascending
  - Sorted by date, descending
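One way to picture the four indices is as a set of settings, sketched below. The first two replica names come from this page; `products_sorted_by_date_descending` is a hypothetical name for the third replica, and `sorted_replica_settings` is an illustrative helper, not part of any library.

```python
DEFAULT_RANKING = [
    "typo", "geo", "words", "filters",
    "proximity", "attribute", "exact", "custom",
]

def sorted_replica_settings(sort_criterion: str) -> dict:
    """Put the sort-by attribute first, then keep the default ranking formula."""
    return {"ranking": [sort_criterion, *DEFAULT_RANKING]}

# The main index declares its replicas.
main_index_settings = {
    "replicas": [
        "products_sorted_by_price_descending",
        "products_sorted_by_price_ascending",
        "products_sorted_by_date_descending",  # hypothetical name
    ]
}

# Each replica gets its own ranking, led by the sort-by attribute.
replica_settings = {
    "products_sorted_by_price_descending": sorted_replica_settings("desc(price)"),
    "products_sorted_by_price_ascending": sorted_replica_settings("asc(price)"),
    "products_sorted_by_date_descending": sorted_replica_settings("desc(date)"),
}
```

Keeping the replica names in one place makes it easier to check that every declared replica also has a ranking defined.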
If your plan includes it, Relevant sorting offers the best user experience for most ecommerce use cases. This sort provides the most relevant results instead of sorting on attributes like price and date. Relevant sorting also doesn’t require data duplication, keeping your application leaner.
For example, for an ascending price sort, the query “red skirt” wouldn’t return items containing “red shirt” because they’re less relevant. Relevant sorting removes noise for users.
Creating buckets to combine sorts with custom ranking
You might want to add a field like featured, a true or false value that forces all featured items to show up first.
Show featured items first
- featured (sort-by, descending)
- Default ranking formula
- number_of_sells
- popularity
- conversion_rate
By doing this, you’re creating two buckets of results, where each bucket is individually ranked. If you have 100 results, 50 of which are featured, the first bucket contains all 50 featured items, ranked by textual relevance and custom ranking. The second bucket contains the 50 non-featured items, ranked the same way.
One consequence of buckets is that the most textually relevant record can appear in the 51st position because it isn’t a featured item and therefore isn’t in the first bucket.
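The bucket effect can be reproduced with a small simulation. This is a toy model, not Algolia’s ranking engine: `text_score` is a made-up stand-in for textual relevance, `popularity` stands in for the custom ranking, and the records are synthetic.

```python
# 100 synthetic records: the first 50 are featured, the rest are not.
records = [
    {
        "id": i,
        "featured": i < 50,
        "text_score": (i * 37) % 100,  # toy textual relevance, all values distinct
        "popularity": i,               # toy custom ranking value
    }
    for i in range(100)
]

# Make a non-featured record the best textual match of all.
records[75]["text_score"] = 1000

# Sort by featured first (True before False), then relevance, then custom ranking.
ranked = sorted(
    records,
    key=lambda r: (not r["featured"], -r["text_score"], -r["popularity"]),
)

position = next(i for i, r in enumerate(ranked, start=1) if r["id"] == 75)
print(position)  # 51: first in the non-featured bucket, behind all 50 featured items
```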
Promoting content with good user engagement
You might want to prioritize content that has more likes and comments. There are two ways to approach this, depending on how much prioritization you want the content to have: Rules and custom ranking.
Sorting on likes or number of comments isn’t recommended since all other relevancy criteria are lost.
Rules
Use Rules to move items that have a higher number of likes or comments to the top of your results. You can enhance this further by using optional filters with filter scoring.
Custom ranking
Use custom ranking for likes, number of comments, or both. For example, assuming you had these attributes in your records, use the post date (postDate) and the number of comments or likes (engagement) as sortable attributes. To do this on the dashboard, go to Index > Ranking and Sorting, click the Add sort-by attribute button, and set the attribute to Descending sort order.
It’s important to consider postDate because, otherwise, old items would be just as relevant as newer ones. For example, imagine you have four items:
- Post 1, from last year, with enormous engagement (100K likes and comments)
- Post 2, from yesterday, with great engagement (50K likes and comments)
- Post 3, from today, with good engagement (10K likes and comments)
- Post 4, from today, with decent engagement (5K likes and comments)
If postDate appears before engagement in your custom ranking order, post 3 and then post 4 will appear at the top of these results.
However, post 2 is relatively recent and has much higher engagement, so it should sit before 3 and 4. The problem is that a single day is too precise for this case.
Reducing precision
You can reduce precision by creating buckets. For instance, instead of ranking by postDate, you could assign a “recency score”:
- 0 if it’s a week old or less
- 1 if it’s between one and two weeks old
- Up to a score where recency no longer matters (it may not matter at all if a post is over two years old).
You could also reduce precision for engagement: instead of ranking on a specific number of likes or comments, you could round to the nearest dozen, hundred, or thousand.
In such a case, your record would look something like this.
[
  {
    "title": "Post #1",
    "postDate": 1569226869, // last year
    "likes_and_comments": 109023, // exact computation
    "recency": 5, // a recency score based on an arbitrary scale of your choosing
    "engagement": 109000 // rounded engagement
  },
  {
    "title": "Post #2",
    "postDate": 1601281269, // yesterday
    "likes_and_comments": 57986,
    "recency": 0, // a week old or less
    "engagement": 58000
  },
  {
    "title": "Post #3",
    "postDate": 1601367669, // today
    "likes_and_comments": 11356,
    "recency": 0,
    "engagement": 11000
  },
  {
    "title": "Post #4",
    "postDate": 1601367669, // today
    "likes_and_comments": 5439,
    "recency": 0,
    "engagement": 5000
  }
]
Your custom ranking would be:
- recency
- engagement
- postDate
- likes_and_comments
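The reduced-precision attributes could be derived before indexing with something like the sketch below. Several choices here are assumptions rather than anything this page prescribes: the one-week bucket size, the cap of 5 on the recency score, rounding engagement to the nearest thousand, and the sort directions (lowest recency score first, everything else descending). The final `sorted` call only imitates the custom ranking order locally.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60
MAX_RECENCY_SCORE = 5  # assumed cap: beyond this, recency no longer matters

def recency_score(post_date: int, now: int) -> int:
    """0 for a week old or less, 1 for one to two weeks, and so on up to the cap."""
    age = max(0, now - post_date)
    weeks_started = -(-age // WEEK_SECONDS)  # ceiling division
    return min(MAX_RECENCY_SCORE, max(0, weeks_started - 1))

def rounded_engagement(likes_and_comments: int, step: int = 1000) -> int:
    """Round to the nearest multiple of `step` (halves round up)."""
    return (likes_and_comments + step // 2) // step * step

now = 1601367669  # "today" in the sample records
posts = [
    {"title": "Post #1", "postDate": 1569226869, "likes_and_comments": 109023},
    {"title": "Post #2", "postDate": 1601281269, "likes_and_comments": 57986},
    {"title": "Post #3", "postDate": 1601367669, "likes_and_comments": 11356},
    {"title": "Post #4", "postDate": 1601367669, "likes_and_comments": 5439},
]
for post in posts:
    post["recency"] = recency_score(post["postDate"], now)
    post["engagement"] = rounded_engagement(post["likes_and_comments"])

# Local imitation of the custom ranking: recency, engagement, postDate, likes_and_comments.
ranked = sorted(
    posts,
    key=lambda p: (p["recency"], -p["engagement"], -p["postDate"], -p["likes_and_comments"]),
)
order = [p["title"] for p in ranked]
print(order)  # ['Post #2', 'Post #3', 'Post #4', 'Post #1']
```

With the coarser recency score, post 2 now ranks ahead of posts 3 and 4, which is the outcome the example was aiming for.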
Self-optimizing content
You might want to ensure that items with the most clicks get boosted towards the top of search results.
To do this:
- Create a numberOfClicks attribute in your data
- Collect click events from your users
- Update the numberOfClicks values in your records with the collected data
- Use custom ranking on numberOfClicks to boost items with a higher number of clicks (after relevance has been taken into account).
Something to watch out for, though, is that this is a self-reinforcing change. The higher these items are in the results, the more likely they are to be clicked, meaning that they will gradually rank higher and higher. You may want to use Algolia’s A/B testing feature to check the impact of a change like this.
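The update step could be prepared roughly as follows. This sketch only builds the partial-update payloads. The event shape (a dictionary with an `objectID` key) and the `build_click_updates` helper are hypothetical, and how you collect events and send the updates depends on your own pipeline.

```python
from collections import Counter

def build_click_updates(click_events, current_counts):
    """Turn raw click events into partial updates for the numberOfClicks attribute."""
    new_clicks = Counter(event["objectID"] for event in click_events)
    return [
        {"objectID": object_id, "numberOfClicks": current_counts.get(object_id, 0) + count}
        for object_id, count in sorted(new_clicks.items())
    ]

# Hypothetical collected events and the counts already stored in the records.
events = [{"objectID": "12345"}, {"objectID": "67890"}, {"objectID": "12345"}]
stored_counts = {"12345": 10}

updates = build_click_updates(events, stored_counts)
print(updates)
# [{'objectID': '12345', 'numberOfClicks': 12}, {'objectID': '67890', 'numberOfClicks': 1}]

# Custom ranking setting that boosts records with more clicks.
click_ranking_settings = {"customRanking": ["desc(numberOfClicks)"]}
```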
Using unique objectIDs to preserve your data
Some of your records may have duplicate objectIDs. This causes problems when updating your data.
Duplicate objectIDs
- objectID=12345 (red t-shirt)
- objectID=12345 (Nike shoes)
- objectID=67890 (Levi jeans, slim)
- objectID=67890 (Levi jeans, slimmed)
Here, it looks like two objects share the same objectID of 12345. If you try to index them both, the index retains just one. Make sure that each item has a unique objectID.
The items with objectID 67890 look like duplicate records. You should remove one of them.
Fixed, no duplicate objectIDs
- objectID=12345 (red t-shirt)
- objectID=23456 (Nike shoes)
- objectID=67890 (Levi jeans, slim)
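Before indexing, a quick check like the one below can surface duplicate objectIDs in your source data. It’s a minimal sketch: the `name` field and the `find_duplicate_object_ids` helper are illustrative, and deciding whether a duplicate is a different item or a true duplicate record still needs a human look.

```python
from collections import defaultdict

def find_duplicate_object_ids(records):
    """Group records by objectID and return only the IDs used more than once."""
    by_id = defaultdict(list)
    for record in records:
        by_id[record["objectID"]].append(record)
    return {object_id: group for object_id, group in by_id.items() if len(group) > 1}

records = [
    {"objectID": "12345", "name": "red t-shirt"},
    {"objectID": "12345", "name": "Nike shoes"},
    {"objectID": "67890", "name": "Levi jeans, slim"},
    {"objectID": "67890", "name": "Levi jeans, slimmed"},
]

for object_id, group in find_duplicate_object_ids(records).items():
    print(object_id, [record["name"] for record in group])
```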
Language settings
If you haven’t set your index language to match the language of your users, you should do so. Also, make sure that you’ve set ignorePlurals and removeStopWords to true.
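As settings, this might look like the sketch below. The choice of English ("en") is only an assumption for the example; use the language codes that match your own users.

```python
# Assumption: an English-language store. Replace "en" with your users' language.
language_settings = {
    "indexLanguages": ["en"],
    "queryLanguages": ["en"],
    "ignorePlurals": True,    # treat singular and plural forms as equivalent
    "removeStopWords": True,  # ignore common words such as "the" or "a"
}
```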
Defining synonyms
If you’re selling coats, you may notice that some people search for “coats” and others for “jackets”. In your store, these are the same.
Your synonyms
- coat=jacket
You may also want to add a synonym for shoes and boots so that users looking for shoes can find boots as well. To do this, create a synonym for “shoes” and “boots.”
Your synonyms
- coat=jacket
- shoes=boots
Keep this list up to date with as many synonyms as necessary, but not too many. A long list of synonyms can become unmanageable and create false positives.
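If you manage synonyms as data rather than through the dashboard, the two entries above could be represented like this. The `type` and `synonyms` fields follow Algolia’s regular (two-way) synonym format; the `synonym` helper and the objectID values are made up for the example.

```python
def synonym(object_id: str, *words: str) -> dict:
    """Build a regular (two-way) synonym entry: every word matches the others."""
    return {"objectID": object_id, "type": "synonym", "synonyms": list(words)}

synonyms = [
    synonym("coat-jacket", "coat", "jacket"),
    synonym("shoes-boots", "shoes", "boots"),
]

for entry in synonyms:
    print(entry["objectID"], "=", " = ".join(entry["synonyms"]))
```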
Front-end UI concerns
Highlighting
Using highlighting lets your users instantly see why a record is present in the results.
Without highlighting
Query: nike
Results
- Nike Air is the best
- Magic Nike is built to last
With highlighting
Query: nike
Results
- Nike Air is the best
- Magic Nike is built to last
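To make the difference concrete, here is a toy highlighter that wraps the matched text in tags. It is not how Algolia computes highlights: the engine returns highlighted values with each result (wrapped in `<em>` tags by default), and your front end only needs to render them. The `highlight` function is purely illustrative.

```python
import re

def highlight(text: str, query: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every case-insensitive occurrence of `query` in the given tags."""
    if not query:
        return text
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return pattern.sub(lambda match: f"{pre}{match.group(0)}{post}", text)

results = ["Nike Air is the best", "Magic Nike is built to last"]
highlighted = [highlight(result, "nike") for result in results]
print(highlighted)
# ['<em>Nike</em> Air is the best', 'Magic <em>Nike</em> is built to last']
```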
Instant search results and as-you-type search experience
Implement instant search results. The InstantSearch libraries provide them out of the box, with as-you-type search as the default.
Setting up facets
Do you have helpful categories in your index, like colors or brands? Add them as facets and display them on your UI to let your users filter their results.
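Declaring facets is a small settings change, sketched below with the color and brand attributes used earlier on this page. The filter string shows how a user’s selection might be passed at query time; the values "Nike" and "red" are illustrative.

```python
# Declare which attributes can be used as facets.
facet_settings = {"attributesForFaceting": ["color", "brand"]}

# Example of a filter built from a user's facet selection (illustrative values).
selected = {"brand": "Nike", "color": "red"}
filters = " AND ".join(f"{attribute}:{value}" for attribute, value in selected.items())
print(filters)  # brand:Nike AND color:red
```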
Staying up to date
Keep your clients and libraries up to date. For example:
- Ensure that you update both your front-end library and search client to the latest version, including patches.
- Ensure that you update your indexing client and any other relevant libraries to the latest version.