Whether you’re using the API or the Algolia dashboard, it’s best to send several records at a time instead of pushing them one by one. Batching reduces network calls and speeds up indexing. Its impact on performance is greatest when you have many records, but you should send indexing operations in batches whenever possible.
For example, imagine you’re fetching all data from your database and end up with a million records to index. That’s too much to send in one go, because Algolia limits you to 1 GB per request.
Besides, sending that much data in a single network call would likely fail before ever reaching the API. You might instead loop over each record and send it with the saveObjects method. The problem is that this performs a million individual network calls, which takes far too long and saturates your Algolia cluster with as many indexing jobs.
A leaner approach is to split your collection of records into smaller chunks and send each chunk one by one. For optimal indexing performance, aim for a batch size of around 10 MB, which represents between 1,000 and 10,000 records depending on the average record size.
Batching records doesn’t reduce your operations count. Algolia counts indexing operations per record, not per method call, so from a pricing perspective, batching records is no different from indexing them one by one.
$client = new \AlgoliaSearch\Client(
    'AJ0P3S7DWQ',
    '••••••••••••••••••••ce1181300d403d21311d5bca9ef1e6fb'
);
$index = $client->initIndex('actors');

$records = json_decode(file_get_contents('actors.json'), true);

// Batching is done automatically by the API client
$index->saveObjects($records, ['autoGenerateObjectIDIfNotExist' => true]);
require 'json'
require 'algolia'

client = Algolia::Search::Client.create(
  'AJ0P3S7DWQ',
  '••••••••••••••••••••ce1181300d403d21311d5bca9ef1e6fb'
)
index = client.init_index('actors')

file = File.read('actors.json')
records = JSON.parse(file)

# The API client automatically batches your records
index.save_objects(records, { autoGenerateObjectIDIfNotExist: true })
import json
from algoliasearch.search_client import SearchClient

client = SearchClient.create(
    'AJ0P3S7DWQ',
    '••••••••••••••••••••ce1181300d403d21311d5bca9ef1e6fb'
)
index = client.init_index('actors')

with open('actors.json') as f:
    records = json.load(f)

# Batching is done automatically by the API client
index.save_objects(records, {'autoGenerateObjectIDIfNotExist': True})
using System.IO;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class Actor
{
    public string Name { get; set; }
    public string ObjectId { get; set; }
    public int Rating { get; set; }
    public string ImagePath { get; set; }
    public string AlternativePath { get; set; }
}

AlgoliaClient client = new AlgoliaClient(
    "AJ0P3S7DWQ",
    "••••••••••••••••••••ce1181300d403d21311d5bca9ef1e6fb"
);
Index index = client.InitIndex("actors");

// Don't forget to set the naming strategy of the serializer to handle Pascal/Camel casing
IEnumerable<Actor> actors = JsonConvert.DeserializeObject<IEnumerable<Actor>>(
    File.ReadAllText("actors.json")
);

// Batching/Chunking is done automatically by the API client
bool autoGenerateObjectIDIfNotExist = true;
index.SaveObjects(actors, autoGenerateObjectIDIfNotExist);
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import com.fasterxml.jackson.databind.ObjectMapper;

public class Actor {
    // Getters/Setters omitted
    private String name;
    private String objectId;
    private int rating;
    private String imagePath;
    private String alternativePath;
}

// Synchronous version
SearchClient client = DefaultSearchClient.create(
    "AJ0P3S7DWQ",
    "••••••••••••••••••••ce1181300d403d21311d5bca9ef1e6fb"
);
SearchIndex<Actor> index = client.initIndex("actors", Actor.class);

ObjectMapper objectMapper = Defaults.getObjectMapper();

InputStream input = new FileInputStream("actors.json");
Actor[] actors = objectMapper.readValue(input, Actor[].class);

// Batching/Chunking is done automatically by the API client
boolean autoGenerateObjectIDIfNotExist = true;
index.saveObjects(Arrays.asList(actors), autoGenerateObjectIDIfNotExist);
package main

import (
	"encoding/json"
	"io/ioutil"

	"github.com/algolia/algoliasearch-client-go/v3/algolia/search"
)

type Actor struct {
	Name            string `json:"name"`
	Rating          int    `json:"rating"`
	ImagePath       string `json:"image_path"`
	AlternativeName string `json:"alternative_name"`
	ObjectID        string `json:"objectID"`
}

func main() {
	client := search.NewClient("AJ0P3S7DWQ", "••••••••••••••••••••ce1181300d403d21311d5bca9ef1e6fb")
	index := client.InitIndex("actors")

	var actors []Actor
	data, _ := ioutil.ReadFile("actors.json")
	_ = json.Unmarshal(data, &actors)

	// Batching is done automatically by the API client
	_, _ = index.SaveObjects(actors)
}
val client = ClientSearch(
    ApplicationID("AJ0P3S7DWQ"),
    APIKey("••••••••••••••••••••ce1181300d403d21311d5bca9ef1e6fb")
)
val index = client.initIndex(IndexName("actors"))

val string = File("actors.json").readText()
val actors = Json.plain.parse(JsonObjectSerializer.list, string)

index.apply {
    actors.chunked(1000).map { saveObjects(it) }.wait() // Wait for all indexing operations to complete.
}
With this approach, batching 10,000 records at a time, you would make 100 API calls instead of 1,000,000. Depending on the size of your records and your network speed, you could create bigger or smaller chunks.
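If you prefer to control the batching yourself, for example when relying on the client's automatic batching isn't an option, here's a minimal sketch in Python of count-based chunking. The application ID and API key are placeholders, the actors.json file is reused from the examples above, and the chunk size of 10,000 is an assumption you should tune so each batch stays around 10 MB.

import json
from algoliasearch.search_client import SearchClient

# Placeholder credentials: replace with your application ID and admin API key
client = SearchClient.create('YourApplicationID', 'YourAdminAPIKey')
index = client.init_index('actors')

with open('actors.json') as f:
    records = json.load(f)

# Send the records in chunks of 10,000 instead of one request per record.
# Tune CHUNK_SIZE so each batch stays around 10 MB.
CHUNK_SIZE = 10_000
for start in range(0, len(records), CHUNK_SIZE):
    chunk = records[start:start + CHUNK_SIZE]
    index.save_objects(chunk, {'autoGenerateObjectIDIfNotExist': True})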