Guides / Managing results / Must do / Custom ranking

Enrich Your Records with Google Analytics Data

You can use data from Google Analytics to enrich your records, and boost search results based on their popularity or any other metric available in Google Analytics. This can help to improve the relevance of your search.

Because both Algolia and GA are API-based platforms, you can create scripts that update Algolia records to include any data from Google Analytics. In this guide, we’ll create a script that fetches the ga:uniquePageViews metric from Google Analytics, and add it to a pageviews attribute in our Algolia records.

If you use our Crawler, you can easily link your Google Analytics account to it to enrich your records with analytics data faster and easier.

Google Credentials

The first step is to get the credentials to use the Google Analytics API. There are multiple ways to authenticate with Google’s APIs, but in this guide we will work with service accounts. These are specifically meant for server-to-server applications.

Create a service account

First, create a service account through Google API Console.

  1. Create a project (or select an existing one) from Google API Console.
  2. Activate the Analytics Reporting API in that project.
  3. In the Credentials section, create a new service account. You can skip the optional steps.
  4. Open the service account and click on Add key -> Create new key. Select JSON, and download the resulting JSON file.

Ga guide service account

Grant access to Google Analytics data

Now, an administrator of your Google Analytics account has to provide read access to your service account. To do this, follow these steps as an administrator:

  1. Log in to Google Analytics.
  2. Select the account, property, and view that contain the analytics of your website.
  3. Go to the Admin tab.
  4. In the View panel (on the right side of the screen), click on View User Management.
  5. Click the + button, then click Add users.
  6. Enter the email address of the service account that was generated when creating the service account.
  7. In the Permissions panel, make sure that only the “Read & Analyze” permission is enabled.
  8. Click the Add button to confirm.

Ga guide grant access

Get the view ID

To keep our script short, we’ll manually retrieve the view ID of your Google Analytics account.

  1. Return to the Admin tab of your Google Analytics View and click on View Settings.
  2. Copy the View ID number. You have to put this in the GA_PARAMETERS object of the script later on.

Ga guide view id

Prepare your records

The only constraint for this guide, is that your records must have an attribute that contains the full URL of their associated page. By default, our script expects the url attribute to hold this, but you can change that if needed.

Create the script

Now that we have our credentials and our records ready, we can create a script to fetch the Google Analytics data of our view and inject them into our existing records. For that, we use:

Fetch Google Analytics data

To fetch the Google Analytics (GA) data of our view, we will use the Google API endpoint batchGet method. In our script, we specify the following parameters for this method:

  • The viewId.
  • The dateRanges to specify the period we want to fetch the data for.
  • The dimensions: we need the ga:hostname and ga:pagePath to rebuild full URL of the page.
  • The metrics that you want to use in your custom ranking strategy. You can choose any metrics from the complete list of available metrics.
  • orderBys: In our example, we order the pages by uniquePageViews.
  • pageSize and pageToken are used for pagination. batchGet can only return a maximum of 100,000 rows. To fetch more rows, you need to use pagination.

In our example, we implement the GA data fetching in a MetricsFetcher class, which has two methods:

  • A next() method, which:
    • performs calls to batchGet,
    • keep tracks of the pagination cursor,
    • and transform the complex GA response into simple JSON objects.
  • A fetchAll() method, which:
    • iterates over the next() method, until the requested number of records are fetched,
    • and builds a big JSON object with the full URLs as keys. This will be useful in the second step of the script, where we retrieve the analytics data associated to a specific URL.

The MetricsFetcher also handles the authentication with the Service Account JSON file you’ve downloaded during the creation of your Google Service Account.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
class MetricsFetcher {
  constructor({ /* GA_PARAMETERS */ }) {
    // Setup the necessary auth scopes
    this.auth = new google.auth.GoogleAuth({ scopes: ['https://www.googleapis.com/auth/analytics.readonly'] });
    // ... variables initialization
  }

  async next() {
    const response = await analytics.reports.batchGet({
      auth: this.auth,
      requestBody: {
        // batchGet options
        // https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet
        // ...
      },
    });

    if (!response.data.reports || response.data.reports.length <= 0) {
      return { rows: [], hasMore: false };
    }

    const { rows } = response.data.reports[0].data;
    this.pageToken = response.data.reports[0].nextPageToken;
    if (this.remaining !== null && rows) {
      this.remaining -= rows.length;
    }

    const rowsClean = !rows
      ? []
      : rows.map(row => {
        return {
          hostname: row.dimensions[0],
          pagePath: row.dimensions[1],
          // append one key-value per metric, with integer value
          ...this.metrics.reduce(
            (keyVals, metric, idx) => ({
              ...keyVals,
              [metric]: parseInt(row.metrics[0].values[idx], 10),
            }),
            {}
          ),
        };
      });

    const hasMore =
      (this.remaining === null || this.remaining > 0) &&
      this.pageToken !== undefined &&
      this.pageToken !== null;

    return { rows: rowsClean, hasMore };
  }

  async fetchAll() {
    let counter = 0;
    let batch;
    const res = {};
    do {
      batch = await this.next();
      batch.rows.forEach(row => {
        ++counter;
        res[getPageUrl(row.hostname, row.pagePath)] = row;
      });
    } while (batch.hasMore);
    return res;
  }
}

Update Algolia records

To update our records, we use the browse and partialUpdateObjects methods. We browse our records one by one to see if the URL the record belongs to has any GA data. If so, we create a partial record with new fields containing the GA metrics. When we have browsed the whole index, we do a partialUpdateObjects with the partial records.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
const recordsToUpdate = [];
await index.browseObjects({
  query: '', // Empty query will match all records
  attributesToRetrieve: [URL_ATTRIBUTE],
  batch: batch => {
    batch.forEach(record => {
      if (allGAData[record[URL_ATTRIBUTE]]) {
        // Create a partial record with a new `pageviews` attribute
        recordsToUpdate.push({
          objectID: record.objectID,
          pageviews:
            allGAData[record[URL_ATTRIBUTE]][METRICS.uniquePageViews],
        });
      }
    });
  },
});

console.log(`Updating ${recordsToUpdate.length} records...`);
await index.partialUpdateObjects(recordsToUpdate, {
  createIfNotExists: false,
});

Finalize the script

We are using the Node.js clients of both API clients to build our final script. Our main function performs the two steps explained above; fetch our GA data with the MetricsFetcher, and update our Algolia records using browse and partialUpdateObjects. Note: You must update the variables in the // Script parameters section of the final script:

  • APP_ID, API_KEY and INDEX_NAME are your Algolia credentials.
  • URL_ATTRIBUTE is the name of the attribute in your Algolia records that contain the URL.
  • The GA_PARAMETERS object contains all parameters related to GA. You must include your viewId, but you can add and change other parameters as well.

With the script on your machine, you can run it by running the following command (assuming the script is named ga_connector.js). The path/to/your-service-account-file.json file has to point the JSON file downloaded when you created a Service Account:

1
GOOGLE_APPLICATION_CREDENTIALS=path/to/your-service-account-file.json node ga_connector.js
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
// ga_connector.js

const algoliasearch = require('algoliasearch');
const { google } = require('googleapis');
const analytics = google.analyticsreporting('v4');

// GA metrics. Reference doc: https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/
const METRICS = {
  pageviews: 'ga:pageviews',
  uniquePageViews: 'ga:uniquePageViews',
  entranceRate: 'ga:entranceRate',
  // ...
};

// Script parameters
const APP_ID = 'YourApplicationID';
const API_KEY = 'YourAdminAPIKey';
const INDEX_NAME = 'indexName';
const URL_ATTRIBUTE = 'url';
const GA_PARAMETERS = {
  viewId: 123456789,
  metrics: [METRICS.uniquePageViews],
  startDate: '30daysAgo', // https://developers.google.com/analytics/devguides/reporting/core/v3/reference#startDate
  endDate: 'today',
  limit: 10000, // number of rows to fetch from GA
};
const PROTOCOL = 'https://';
const MAX_PAGE_SIZE = 100000; // 100000 is the max value, according to https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet

/**
* This class allows to fetch metrics of an unpredictable number of pages of analytics from GA API.
* Instances keep track of the fetching progress, for the next() method.
*
* It also handles authentication using a service account, if GOOGLE_APPLICATION_CREDENTIALS is correctly set.
* See https://github.com/googleapis/google-api-nodejs-client/#using-the-google_application_credentials-env-var
*
* @param {object} p           - (compound parameters).
* @param {number} p.viewId    - Identifier of the Google Analytics' view from which to fetch data.
* @param {string[]} p.metrics - Array of GA metric types, as defined in the METRICS dictionary (default = ['ga:uniquePageViews']).
* @param {number} p.limit     - Maximum number of URLs to fetch.
* @param {string} p.startDate - Period from which analytics must cover until endDate. (default: '7daysAgo').
* @param {string} p.endDate   - Period until which analytics must cover. (default: 'today').
*/
class MetricsFetcher {
  constructor({
    viewId,
    metrics = [METRICS.uniquePageViews],
    limit = 100000,
    startDate = '7daysAgo',
    endDate = 'today',
  }) {
    this.auth = new google.auth.GoogleAuth({ scopes: ['https://www.googleapis.com/auth/analytics.readonly'] });
    this.viewId = viewId.toString();
    this.metrics = metrics.includes(METRICS.uniquePageViews)
      ? metrics
      : metrics.concat([METRICS.uniquePageViews]);
    this.remaining = limit;
    this.startDate = startDate;
    this.endDate = endDate;
    this.pageToken = undefined;
  }

  /**
  * Get next page.
  * Data is ordered by most 'ga:uniquePageViews' first.
  *
  * @returns {Object} An object that contains { rows: [{ hostname: string, pagePath: string, [metricName]: number }], hasMore: boolean }.
  */
  async next() {
    console.log(`[GA] batchGet viewId=${this.viewId} remaining=${this.remaining}...`);

    const response = await analytics.reports.batchGet({
      auth: this.auth,
      requestBody: {
        reportRequests: [
        {
          viewId: this.viewId,
          dateRanges: [
            {
              startDate: this.startDate,
              endDate: this.endDate,
            },
          ],
          // reference doc: https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/#ga:pagePath
          dimensions: [{ name: 'ga:hostname' }, { name: 'ga:pagePath' }],
          metrics: this.metrics.map(metric => ({ expression: metric })),
          orderBys: [
          {
            fieldName: METRICS.uniquePageViews,
            sortOrder: 'DESCENDING',
          },
          ],
          pageSize:
            this.remaining !== null && this.remaining < MAX_PAGE_SIZE
              ? this.remaining
              : MAX_PAGE_SIZE,
          pageToken: this.pageToken,
        },
        ],
      },
    });

    if (!response.data.reports || response.data.reports.length <= 0) {
      return { rows: [], hasMore: false };
    }

    const { rows } = response.data.reports[0].data;
    this.pageToken = response.data.reports[0].nextPageToken;
    if (this.remaining !== null && rows) {
      this.remaining -= rows.length;
    }

    console.log(`[GA] fetched a page of ${rows ? rows.length : 0} rows from Google Analytics (viewId=${this.viewId})`);

    const rowsClean = !rows
      ? []
      : rows.map(row => {
        return {
          hostname: row.dimensions[0],
          pagePath: row.dimensions[1],
          // append one key-value per metric, with integer value
          ...this.metrics.reduce(
            (keyVals, metric, idx) => ({
              ...keyVals,
              [metric]: parseInt(row.metrics[0].values[idx], 10),
            }),
            {}
          ),
        };
      });

    const hasMore =
      (this.remaining === null || this.remaining > 0) &&
      this.pageToken !== undefined &&
      this.pageToken !== null;

    return { rows: rowsClean, hasMore };
  }

  /**
  * Get All GA data of the view.
  *
  * @returns {Object} An object with the following structure:
  *   {
  *     `${url1}`: { hostname: string, pagePath: string, [metricName]: number },
  *     `${url2}`: { hostname: string, pagePath: string, [metricName]: number },
  *   }.
  */
  async fetchAll() {
    let counter = 0;
    let batch;
    const res = {};
    do {
      batch = await this.next();
      batch.rows.forEach(row => {
        ++counter;
        res[getPageUrl(row.hostname, row.pagePath)] = row;
      });
    } while (batch.hasMore);
    console.log(`=> fetched ${counter} rows from GA view: ${this.viewId} 📚`);
    return res;
  }
}

/**
* Helper to rebuild the complete page URL from GA data, as set in the Algolia records.
* Google Analytics doesn't store the protocol so we are re-adding it.
* Another way is to store the URLs without the protocol in your Algolia records.
*
* @param {string} hostname - The hostname returned by GA.
* @param {string} pagePath - The pagePath returned by GA.
* @returns {string} The full page URL.
*/
function getPageUrl(hostname, pagePath) {
  // When google analytics is misconfigured, the pagePath can contain a path prefixed by the hostname (www.example.com/)
  if (!pagePath.startsWith('/')) {
    // the path is prefixed by a host name => let's use it as-is
    return `${PROTOCOL}${pagePath}`;
  } else {
    // generate the full URL
    return `${PROTOCOL}${hostname}${pagePath}`;
  }
}

// Main
(async () => {
  console.log(`Fetching Google Analytics data...`);
  const metricsFetcher = new MetricsFetcher(GA_PARAMETERS);
  const allGAData = await metricsFetcher.fetchAll();

  console.log('Browsing Algolia index and creating partial records...');
  const client = algoliasearch(APP_ID, API_KEY);
  const index = client.initIndex(INDEX_NAME);
  const recordsToUpdate = [];
  await index.browseObjects({
    query: '', // Empty query will match all records
    attributesToRetrieve: [URL_ATTRIBUTE],
    batch: batch => {
      batch.forEach(record => {
        if (allGAData[record[URL_ATTRIBUTE]]) {
          // Create a partial record with a new `pageviews` attribute
          recordsToUpdate.push({
            objectID: record.objectID,
            pageviews:
              allGAData[record[URL_ATTRIBUTE]][METRICS.uniquePageViews],
          });
        }
      });
    },
  });

  console.log(`Updating ${recordsToUpdate.length} records...`);
  await index.partialUpdateObjects(recordsToUpdate, {
    createIfNotExists: false,
  });
  console.log('Records updated.');
})();
Did you find this page helpful?