Oct. 13, 2021

Preventing Typosquatting

Typosquatting is the practice of registering a name that's a likely misspelling of a popular one, to capture traffic from users who make typos. For example, on Twitter, the account @BarakObama has 15.8k followers, but isn’t @BarackObama (Barack Obama’s official account). Because Algolia prioritizes exact matches, typing “BarakObama” returns the “BarakObama” record first, regardless of custom ranking.

Not every use case needs to prevent typosquatting. If yours does, which is common when you index user-generated content, you need a strategy to handle it.

Dataset example

Back to the Twitter example: assume you have an index called twitter_accounts that looks like this:

[
  {
    "twitter_handle": "BarackObama",
    "nb_followers": 103500000
  },
  {
    "twitter_handle": "BarakObama",
    "nb_followers": 15800
  }
]

Even if you set descending custom ranking on nb_followers, because Algolia prioritizes exact results, the @BarakObama account would benefit from the traffic coming from users making a typo when searching for the official Barack Obama account.
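To make the problem concrete, here is a minimal sketch of a ranking that checks typos before followers. It is a toy model, not Algolia's engine: it assumes the typo count can be approximated by a plain Levenshtein distance over the whole handle, and the `edit_distance` and `rank_default` helpers are hypothetical names introduced only for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


records = [
    {"twitter_handle": "BarackObama", "nb_followers": 103500000},
    {"twitter_handle": "BarakObama", "nb_followers": 15800},
]


def rank_default(query, records):
    """Toy ranking: fewer typos first, followers only break ties."""
    return sorted(
        records,
        key=lambda r: (
            edit_distance(query.lower(), r["twitter_handle"].lower()),
            -r["nb_followers"],
        ),
    )


ranked = rank_default("BarakObama", records)
print([r["twitter_handle"] for r in ranked])
```

In this model the misspelled query matches the squatting handle with zero typos and the official handle with one, so the follower count never gets a chance to influence the order.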

You can work around this issue with Algolia’s sort-by attribute feature.

Updating the dataset

The recommended solution is to add a boolean attribute that separates popular records from the rest. For example, you could add an attribute such as is_verified_account or is_popular, set it to true for those records, and sort on it.

For this approach to work well, the number of records with is_popular or is_verified_account set to true should be a small subset of the dataset (around 1% of the dataset maximum).

You have a popularity metric (nb_followers), so you can use it to define a rule that determines if a record is popular or not. In this example, you could say that a user is popular if they have more than a million followers.
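Before updating the index, it can help to check that a candidate threshold keeps the popular subset small. The sketch below is one possible way to do that in plain Python. The constant names, the `is_popular` and `popular_share` helpers, and the sample data are all assumptions made for illustration; only the one-million cutoff and the roughly 1% ceiling come from this guide.

```python
POPULARITY_THRESHOLD = 1_000_000  # followers; the cutoff suggested in this guide
MAX_POPULAR_SHARE = 0.01          # around 1% of the dataset at most


def is_popular(record, threshold=POPULARITY_THRESHOLD):
    """A record is popular when it has strictly more followers than the threshold."""
    return record.get("nb_followers", 0) > threshold


def popular_share(records, threshold=POPULARITY_THRESHOLD):
    """Fraction of records that the rule would flag as popular."""
    if not records:
        return 0.0
    popular = sum(1 for r in records if is_popular(r, threshold))
    return popular / len(records)


# Hypothetical sample: one very popular account and 199 small ones.
sample = [{"twitter_handle": "BarackObama", "nb_followers": 103500000}]
sample += [
    {"twitter_handle": f"user_{i}", "nb_followers": 15800} for i in range(199)
]

share = popular_share(sample)
print(f"{share:.2%} of records would be flagged as popular")
if share > MAX_POPULAR_SHARE:
    print("Too many popular records: consider raising the threshold")
```

If the share comes out above the ceiling, raising the threshold is the simplest adjustment, since the attribute is only useful while it singles out a small set of records.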

You can use the browse method to update the index:

$records = [];

foreach ($index->browseObjects() as $hit) {
  $hit['is_popular'] = ($hit['nb_followers'] > 1000000);
  $records[] = $hit;
}

$index->saveObjects($records);

// Accumulates the updated records across all browsed batches
let records = [];

index
  .browseObjects({
    batch(hits) {
      records = records.concat(hits.map(hit => {
        return {
          ...hit,
          is_popular: hit.nb_followers > 1000000
        };
      }));
    }
  })
  .then(() => index.saveObjects(records))
  .then(({ objectIDs }) => {
    console.log(objectIDs);
  });

var hits = index.BrowseAll(new Query());
List<JObject> records = new List<JObject>();

foreach (var hit in hits)
{
  hit["is_popular"] = (long)hit["nb_followers"] > 1000000;
  records.Add(hit);

  if (records.Count > 1000)
  {
    index.AddObjects(records);
    records.Clear();
  }
}

index.AddObjects(records);

SearchIndex<Record> index = client.initIndex("YourIndexName", Record.class);

IndexIterable<Record> iterator = index.browseObjects(new BrowseIndexQuery());
ArrayList<Record> records = new ArrayList<>();

iterator.forEach(record -> {
    boolean isPopular = record.getNbFollowers() > 1000000;
    record.setPopular(isPopular);
    records.add(record);
});

index.saveObjects(records);

type User struct {
  ObjectID    string `json:"objectID"`
  NbFollowers int    `json:"nb_followers"`
  IsPopular   bool   `json:"is_popular"`
}

var users []User
var user User
it, _ := index.BrowseObjects()

for {
  _, err := it.Next(&user)
  if err != nil {
    if err == io.EOF {
      break
    }
    // error handling
  }

  user.IsPopular = user.NbFollowers > 1000000
  users = append(users, user)
}

res, err := index.SaveObjects(users)

case class User(objectID: String, nb_followers: Int, is_popular: Option[Boolean]) extends ObjectID

implicit val ec: ExecutionContextExecutor = ExecutionContext.global
implicit val awaitDuration = 10 seconds

val syncHelper = AlgoliaSyncHelper(client)

val futures =
  syncHelper
    .browse[User]("myIndex", Query())
    .map { hits =>
      hits
        .map { hit =>
          User(
            objectID = hit.objectID,
            nb_followers = hit.nb_followers,
            is_popular = Some(hit.nb_followers > 1000000),
          )
        }
    }
    .map { batch =>
      client.execute {
        index into "myIndex" objects batch
      }
    }

Await.ready(Future.sequence(futures), awaitDuration)
System.exit(0)

val records = index.browseObjects().flatMap { response ->
    response.hits.map {
        val map = it.toMutableMap()
        val nbFollowers = it.getValue("nb_followers").primitive.long

        map["is_popular"] = JsonLiteral(nbFollowers > 1000000)
        JsonObject(map)
    }
}

index.saveObjects(records)
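The same browse-and-update loop can be sketched in Python. This version assumes the index object exposes `browse_objects()` and `save_objects()`, as the Algolia Python client does in the versions this sketch was written against; check the reference for your client version before relying on it. The `flag_popular_records` function and the `InMemoryIndex` class are hypothetical helpers, and the in-memory class exists only so the loop can be tried without network access.

```python
def flag_popular_records(index, threshold=1_000_000, batch_size=1000):
    """Add is_popular to every record and save the records back in batches.

    `index` is assumed to provide browse_objects() and save_objects().
    Returns the number of records saved.
    """
    batch = []
    saved = 0
    for hit in index.browse_objects():
        hit["is_popular"] = hit.get("nb_followers", 0) > threshold
        batch.append(hit)
        if len(batch) >= batch_size:
            index.save_objects(batch)
            saved += len(batch)
            batch = []
    if batch:  # flush the last, partially filled batch
        index.save_objects(batch)
        saved += len(batch)
    return saved


class InMemoryIndex:
    """Hypothetical stand-in for dry runs; not part of any Algolia client."""

    def __init__(self, records):
        self.records = {r["objectID"]: dict(r) for r in records}

    def browse_objects(self):
        # Return copies so callers can modify hits without touching storage.
        return iter([dict(r) for r in self.records.values()])

    def save_objects(self, objects):
        for obj in objects:
            self.records[obj["objectID"]] = dict(obj)


index = InMemoryIndex([
    {"objectID": "1", "twitter_handle": "BarackObama", "nb_followers": 103500000},
    {"objectID": "2", "twitter_handle": "BarakObama", "nb_followers": 15800},
])
saved = flag_popular_records(index)
print(saved, index.records["1"]["is_popular"], index.records["2"]["is_popular"])
```

Saving in batches keeps memory use bounded on large indices, which the examples that collect every record into a single list don't do.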

Once updated, your dataset would look like this:

[
  {
    "twitter_handle": "BarackObama",
    "nb_followers": 103500000,
    "is_popular": true
  },
  {
    "twitter_handle": "BarakObama",
    "nb_followers": 15800,
    "is_popular": false
  }
]

By default, the first rule in Algolia’s ranking formula is typo (which, for the vast majority of use cases, is a sensible default value). To prevent typosquatting, you need to add another ranking signal that’s higher than the typo rule. This is what Algolia commonly refers to as a sort-by attribute. When your ranking has been applied, searching for “BarakObama” will first return the “BarackObama” record.
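The effect of placing a sort-by attribute above the typo rule can be illustrated with tuple sort keys. This is a simplified model under stated assumptions: only three criteria are represented (is_popular, typo count, and followers as custom ranking), the `typos` values are hard-coded for the query “BarakObama”, and the `default_key` and `sort_by_key` functions are hypothetical names, not part of any Algolia API.

```python
# Candidate records for the query "BarakObama"; typo counts are assumed.
candidates = [
    {"twitter_handle": "BarackObama", "nb_followers": 103500000,
     "is_popular": True, "typos": 1},
    {"twitter_handle": "BarakObama", "nb_followers": 15800,
     "is_popular": False, "typos": 0},
]


def default_key(record):
    """Typo rule first, then custom ranking (more followers is better)."""
    return (record["typos"], -record["nb_followers"])


def sort_by_key(record):
    """desc(is_popular) first, then the typo rule, then custom ranking."""
    # `not` maps True to False, and False sorts before True,
    # so popular records come first.
    return (not record["is_popular"], record["typos"], -record["nb_followers"])


default_order = [r["twitter_handle"] for r in sorted(candidates, key=default_key)]
sort_by_order = [r["twitter_handle"] for r in sorted(candidates, key=sort_by_key)]

print(default_order)  # ['BarakObama', 'BarackObama']
print(sort_by_order)  # ['BarackObama', 'BarakObama']
```

Because tuples compare element by element, a later criterion only matters when every earlier one ties, which is why the position of the sort-by attribute in the list decides the outcome.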

Using the API

To set a sort-by attribute, use the ranking parameter with the setSettings method.

$index->setSettings([
  'ranking' => [
    "desc(is_popular)",
    "typo",
    "geo",
    "words",
    "filters",
    "proximity",
    "attribute",
    "exact",
    "custom"
  ]
]);

index.set_settings({
  ranking: [
    'desc(is_popular)',
    'typo',
    'geo',
    'words',
    'filters',
    'proximity',
    'attribute',
    'exact',
    'custom'
  ]
})

index.setSettings({
  ranking: [
    "desc(is_popular)",
    "typo",
    "geo",
    "words",
    "filters",
    "proximity",
    "attribute",
    "exact",
    "custom"
  ]
}).then(() => {
  // done
});

index.set_settings({
    'ranking': [
        'desc(is_popular)',
        'typo',
        'geo',
        'words',
        'filters',
        'proximity',
        'attribute',
        'exact',
        'custom'
    ]
})

index.setSettings([
  "ranking": [
    "desc(is_popular)",
    "typo",
    "geo",
    "words",
    "filters",
    "proximity",
    "attribute",
    "exact",
    "custom"
  ]
])

IndexSettings settings = new IndexSettings
{
    Ranking = new List<string>
    {
        "desc(is_popular)",
        "typo",
        "geo",
        "words",
        "filters",
        "proximity",
        "attribute",
        "exact",
        "custom"
    }
};

index.SetSettings(settings);

// Asynchronous
await index.SetSettingsAsync(settings);

index.setSettings(
new IndexSettings()
  .setRanking(
    Arrays.asList(
      "desc(is_popular)",
      "typo",
      "geo",
      "words",
      "filters",
      "proximity",
      "attribute",
      "exact",
      "custom"
    )
  )
);

index.SetSettings(search.Settings{
    Ranking: opt.Ranking(
        "desc(is_popular)",
        "typo",
        "geo",
        "words",
        "filters",
        "proximity",
        "attribute",
        "exact",
        "custom",
    ),
})

client.execute {
  setSettings of "myIndex" `with` IndexSettings(
    ranking = Some(Seq(
      Ranking.desc("is_popular"),
      typo,
      geo,
      words,
      filters,
      proximity,
      attribute,
      exact,
      custom,
    )),
  )
}

val settings = settings {
    ranking {
        +Desc("is_popular")
        +Typo
        +Geo
        +Words
        +Filters
        +Proximity
        +Attribute
        +Exact
        +Custom
    }
}

index.setSettings(settings)

Using the Dashboard

You can also set a sort-by attribute in your Algolia dashboard.

1. Select the Search product icon on your dashboard and then select your index.
2. Click the Ranking tab.
3. In the Ranking Formula & Custom Ranking section, click the Add sort-by attribute button and select is_popular.
4. Save your changes.
