Enrich your records with Google Analytics data
You can use data from Google Analytics to enrich your records, and boost search results based on their popularity or any other metric available in Google Analytics. This can help to improve the relevance of your search.
Because both Algolia and GA are API-based platforms, you can create scripts that update Algolia records to include any data from Google Analytics. In this guide, we'll create a script that fetches the ga:uniquePageViews metric from Google Analytics, and adds it to a pageviews attribute in our Algolia records.
If you use our Crawler, you can link your Google Analytics account to it and enrich your records with analytics data more quickly and easily.
Google Credentials
The first step is to get the credentials to use the Google Analytics API. There are multiple ways to authenticate with Google's APIs, but in this guide we will work with service accounts. These are specifically meant for server-to-server applications.
Create a service account
First, create a service account through Google API Console.
- Create a project (or select an existing one) from Google API Console.
- Activate the Analytics Reporting API in that project.
- In the Credentials section, create a new service account. You can skip the optional steps.
- Open the service account and click on Add key -> Create new key. Select JSON, and download the resulting JSON file.
Grant access to Google Analytics data
Now, an administrator of your Google Analytics account has to provide read access to your service account. To do this, follow these steps as an administrator:
- Log in to Google Analytics.
- Select the account, property, and view that contain the analytics of your website.
- Go to the Admin tab.
- In the View panel (on the right side of the screen), click on View User Management.
- Click the + button, then click Add users.
- Enter the email address that was generated when you created the service account.
- In the Permissions panel, make sure that only the "Read & Analyze" permission is enabled.
- Click the Add button to confirm.
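The service account email requested in the steps above is stored in the JSON key file you downloaded earlier. As a minimal sketch, assuming the key file follows the standard Google service account format (with `type` and `client_email` fields), the following Node.js helper reads it. The function name `readServiceAccountEmail` is hypothetical and not part of the guide's script.

```javascript
const fs = require('fs');

// Hypothetical helper: returns the email address stored in a service account key file.
// Assumes the standard key format, with `type` and `client_email` fields.
function readServiceAccountEmail(keyFilePath) {
  const key = JSON.parse(fs.readFileSync(keyFilePath, 'utf8'));
  if (key.type !== 'service_account' || typeof key.client_email !== 'string') {
    throw new Error(`${keyFilePath} does not look like a service account key file`);
  }
  return key.client_email;
}

// Usage (the path is a placeholder):
// console.log(readServiceAccountEmail('path/to/your-service-account-file.json'));
```

Failing early on a malformed file avoids granting access to the wrong address in Google Analytics.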
Get the view ID
To keep our script short, we'll manually retrieve the view ID of your Google Analytics account.
- Return to the Admin tab of your Google Analytics View and click on View Settings.
- Copy the View ID number. You have to put this in the GA_PARAMETERS object of the script later on.
Prepare your records
The only constraint for this guide is that your records must have an attribute that contains the full URL of their associated page. By default, our script expects the url attribute to hold this, but you can change that if needed.
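Before running the script, it can help to check that your records actually hold absolute URLs. The sketch below is one possible check, not part of the original script: `findRecordsWithoutFullUrl` is a hypothetical helper, and the sample records are made up for illustration. It relies on the WHATWG `URL` class built into Node.js.

```javascript
// Hypothetical helper: returns the objectIDs of records whose URL attribute
// is missing or is not an absolute http(s) URL.
function findRecordsWithoutFullUrl(records, urlAttribute = 'url') {
  return records
    .filter(record => {
      try {
        const { protocol } = new URL(record[urlAttribute]);
        return protocol !== 'http:' && protocol !== 'https:';
      } catch (err) {
        return true; // missing or relative URL
      }
    })
    .map(record => record.objectID);
}

// Made-up sample records
const sampleRecords = [
  { objectID: '1', url: 'https://www.example.com/docs/' },
  { objectID: '2', url: '/docs/' },
  { objectID: '3' },
];
console.log(findRecordsWithoutFullUrl(sampleRecords)); // prints [ '2', '3' ]
```

Records flagged by such a check would never match a URL rebuilt from GA data, so they would be skipped silently during the update.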
Create the script
Now that we have our credentials and our records ready, we can create a script to fetch the Google Analytics data of our view and inject it into our existing records. For that, we use the Node.js API clients of Google (googleapis) and Algolia (algoliasearch).
Fetch Google Analytics data
To fetch the Google Analytics (GA) data of our view, we use the batchGet method of the Google Analytics Reporting API. In our script, we specify the following parameters for this method:
- The viewId.
- The dateRanges, to specify the period we want to fetch the data for.
- The dimensions: we need ga:hostname and ga:pagePath to rebuild the full URL of the page.
- The metrics that you want to use in your custom ranking strategy. You can choose any metrics from the complete list of available metrics.
- orderBys: in our example, we order the pages by uniquePageViews.
- pageSize and pageToken are used for pagination. batchGet can return at most 100,000 rows per request. To fetch more rows, you need to use pagination.
In our example, we implement the GA data fetching in a MetricsFetcher class, which has two methods:
- A next() method, which:
  - performs calls to batchGet,
  - keeps track of the pagination cursor,
  - and transforms the complex GA response into simple JSON objects.
- A fetchAll() method, which:
  - iterates over the next() method until the requested number of records is fetched,
  - and builds a big JSON object with the full URLs as keys. This is useful in the second step of the script, where we retrieve the analytics data associated with a specific URL.
The MetricsFetcher also handles authentication with the service account JSON file you downloaded when you created your Google service account.
class MetricsFetcher {
  constructor({ /* GA_PARAMETERS */ }) {
    // Setup the necessary auth scopes
    this.auth = new google.auth.GoogleAuth({ scopes: ['https://www.googleapis.com/auth/analytics.readonly'] });
    // ... variables initialization
  }

  async next() {
    const response = await analytics.reports.batchGet({
      auth: this.auth,
      requestBody: {
        // batchGet options
        // https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet
        // ...
      },
    });
    if (!response.data.reports || response.data.reports.length <= 0) {
      return { rows: [], hasMore: false };
    }
    const { rows } = response.data.reports[0].data;
    this.pageToken = response.data.reports[0].nextPageToken;
    if (this.remaining !== null && rows) {
      this.remaining -= rows.length;
    }
    const rowsClean = !rows
      ? []
      : rows.map(row => {
          return {
            hostname: row.dimensions[0],
            pagePath: row.dimensions[1],
            // append one key-value per metric, with integer value
            ...this.metrics.reduce(
              (keyVals, metric, idx) => ({
                ...keyVals,
                [metric]: parseInt(row.metrics[0].values[idx], 10),
              }),
              {}
            ),
          };
        });
    const hasMore =
      (this.remaining === null || this.remaining > 0) &&
      this.pageToken !== undefined &&
      this.pageToken !== null;
    return { rows: rowsClean, hasMore };
  }

  async fetchAll() {
    let counter = 0;
    let batch;
    const res = {};
    do {
      batch = await this.next();
      batch.rows.forEach(row => {
        ++counter;
        res[getPageUrl(row.hostname, row.pagePath)] = row;
      });
    } while (batch.hasMore);
    return res;
  }
}
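To make the row transformation in next() concrete, the sketch below applies the same mapping to a hand-written response fragment. The sample values and the `flattenRows` name are illustrative assumptions. The field layout (`dimensions`, `metrics[0].values`) mirrors what the class above reads.

```javascript
// Hypothetical standalone version of the mapping done in next().
function flattenRows(rows, metricNames) {
  return (rows || []).map(row => {
    const flat = { hostname: row.dimensions[0], pagePath: row.dimensions[1] };
    metricNames.forEach((metric, idx) => {
      // Metric values are parsed with parseInt, as in the class above
      flat[metric] = parseInt(row.metrics[0].values[idx], 10);
    });
    return flat;
  });
}

// Made-up sample shaped like response.data.reports[0].data.rows
const sampleRows = [
  { dimensions: ['www.example.com', '/pricing'], metrics: [{ values: ['42'] }] },
];
console.log(flattenRows(sampleRows, ['ga:uniquePageViews']));
```

Each output object keeps the hostname and page path, plus one integer property per requested metric, keyed by the metric name.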
Update Algolia records
To update our records, we use the browse (browseObjects in the JavaScript client) and partialUpdateObjects methods.
We browse all the records in the index and check whether each record's URL has any GA data. If so, we create a partial record with a new attribute containing the GA metric. Once we have browsed the whole index, we call partialUpdateObjects with the partial records.
const recordsToUpdate = [];
await index.browseObjects({
  query: '', // Empty query will match all records
  attributesToRetrieve: [URL_ATTRIBUTE],
  batch: batch => {
    batch.forEach(record => {
      if (allGAData[record[URL_ATTRIBUTE]]) {
        // Create a partial record with a new `pageviews` attribute
        recordsToUpdate.push({
          objectID: record.objectID,
          pageviews:
            allGAData[record[URL_ATTRIBUTE]][METRICS.uniquePageViews],
        });
      }
    });
  },
});
console.log(`Updating ${recordsToUpdate.length} records...`);
await index.partialUpdateObjects(recordsToUpdate, {
  createIfNotExists: false,
});
Finalize the script
We use the Node.js API clients of both Google and Algolia to build our final script. Our main function performs the two steps explained above: it fetches our GA data with the MetricsFetcher, and updates our Algolia records using browse and partialUpdateObjects.
Note: You must update the variables in the // Script parameters section of the final script:
- APP_ID, API_KEY and INDEX_NAME are your Algolia credentials.
- URL_ATTRIBUTE is the name of the attribute in your Algolia records that contains the URL.
- The GA_PARAMETERS object contains all parameters related to GA. You must include your viewId, but you can add and change other parameters as well.
With the script on your machine, you can run it with the following command (assuming the script is named ga_connector.js). Replace path/to/your-service-account-file.json with the path to the JSON file you downloaded when you created the service account:
GOOGLE_APPLICATION_CREDENTIALS=path/to/your-service-account-file.json node ga_connector.js
// ga_connector.js
const algoliasearch = require('algoliasearch');
const { google } = require('googleapis');
const analytics = google.analyticsreporting('v4');

// GA metrics. Reference doc: https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/
const METRICS = {
  pageviews: 'ga:pageviews',
  uniquePageViews: 'ga:uniquePageViews',
  entranceRate: 'ga:entranceRate',
  // ...
};

// Script parameters
const APP_ID = 'YourApplicationID';
const API_KEY = 'YourWriteAPIKey';
const INDEX_NAME = 'indexName';
const URL_ATTRIBUTE = 'url';
const GA_PARAMETERS = {
  viewId: 123456789,
  metrics: [METRICS.uniquePageViews],
  startDate: '30daysAgo', // https://developers.google.com/analytics/devguides/reporting/core/v3/reference#startDate
  endDate: 'today',
  limit: 10000, // number of rows to fetch from GA
};
const PROTOCOL = 'https://';
const MAX_PAGE_SIZE = 100000; // 100000 is the max value, according to https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet

/**
 * This class fetches metrics for an unpredictable number of pages of analytics from the GA API.
 * Instances keep track of the fetching progress, for the next() method.
 *
 * It also handles authentication using a service account, if GOOGLE_APPLICATION_CREDENTIALS is correctly set.
 * See https://github.com/googleapis/google-api-nodejs-client/#using-the-google_application_credentials-env-var
 *
 * @param {object} p - (compound parameters).
 * @param {number} p.viewId - Identifier of the Google Analytics' view from which to fetch data.
 * @param {string[]} p.metrics - Array of GA metric types, as defined in the METRICS dictionary (default = ['ga:uniquePageViews']).
 * @param {number} p.limit - Maximum number of URLs to fetch.
 * @param {string} p.startDate - Period from which analytics must cover until endDate. (default: '7daysAgo').
 * @param {string} p.endDate - Period until which analytics must cover. (default: 'today').
 */
class MetricsFetcher {
  constructor({
    viewId,
    metrics = [METRICS.uniquePageViews],
    limit = 100000,
    startDate = '7daysAgo',
    endDate = 'today',
  }) {
    this.auth = new google.auth.GoogleAuth({ scopes: ['https://www.googleapis.com/auth/analytics.readonly'] });
    this.viewId = viewId.toString();
    // Always request uniquePageViews, because the results are ordered by it
    this.metrics = metrics.includes(METRICS.uniquePageViews)
      ? metrics
      : metrics.concat([METRICS.uniquePageViews]);
    this.remaining = limit;
    this.startDate = startDate;
    this.endDate = endDate;
    this.pageToken = undefined;
  }

  /**
   * Get next page.
   * Data is ordered by most 'ga:uniquePageViews' first.
   *
   * @returns {Object} An object that contains { rows: [{ hostname: string, pagePath: string, [metricName]: number }], hasMore: boolean }.
   */
  async next() {
    console.log(`[GA] batchGet viewId=${this.viewId} remaining=${this.remaining}...`);
    const response = await analytics.reports.batchGet({
      auth: this.auth,
      requestBody: {
        reportRequests: [
          {
            viewId: this.viewId,
            dateRanges: [
              {
                startDate: this.startDate,
                endDate: this.endDate,
              },
            ],
            // reference doc: https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/#ga:pagePath
            dimensions: [{ name: 'ga:hostname' }, { name: 'ga:pagePath' }],
            metrics: this.metrics.map(metric => ({ expression: metric })),
            orderBys: [
              {
                fieldName: METRICS.uniquePageViews,
                sortOrder: 'DESCENDING',
              },
            ],
            pageSize:
              this.remaining !== null && this.remaining < MAX_PAGE_SIZE
                ? this.remaining
                : MAX_PAGE_SIZE,
            pageToken: this.pageToken,
          },
        ],
      },
    });
    if (!response.data.reports || response.data.reports.length <= 0) {
      return { rows: [], hasMore: false };
    }
    const { rows } = response.data.reports[0].data;
    this.pageToken = response.data.reports[0].nextPageToken;
    if (this.remaining !== null && rows) {
      this.remaining -= rows.length;
    }
    console.log(`[GA] fetched a page of ${rows ? rows.length : 0} rows from Google Analytics (viewId=${this.viewId})`);
    const rowsClean = !rows
      ? []
      : rows.map(row => {
          return {
            hostname: row.dimensions[0],
            pagePath: row.dimensions[1],
            // append one key-value per metric, with integer value
            ...this.metrics.reduce(
              (keyVals, metric, idx) => ({
                ...keyVals,
                [metric]: parseInt(row.metrics[0].values[idx], 10),
              }),
              {}
            ),
          };
        });
    const hasMore =
      (this.remaining === null || this.remaining > 0) &&
      this.pageToken !== undefined &&
      this.pageToken !== null;
    return { rows: rowsClean, hasMore };
  }

  /**
   * Get all GA data of the view.
   *
   * @returns {Object} An object with the following structure:
   * {
   *   `${url1}`: { hostname: string, pagePath: string, [metricName]: number },
   *   `${url2}`: { hostname: string, pagePath: string, [metricName]: number },
   * }.
   */
  async fetchAll() {
    let counter = 0;
    let batch;
    const res = {};
    do {
      batch = await this.next();
      batch.rows.forEach(row => {
        ++counter;
        res[getPageUrl(row.hostname, row.pagePath)] = row;
      });
    } while (batch.hasMore);
    console.log(`=> fetched ${counter} rows from GA view: ${this.viewId}`);
    return res;
  }
}

/**
 * Helper to rebuild the complete page URL from GA data, as set in the Algolia records.
 * Google Analytics doesn't store the protocol so we are re-adding it.
 * Another way is to store the URLs without the protocol in your Algolia records.
 *
 * @param {string} hostname - The hostname returned by GA.
 * @param {string} pagePath - The pagePath returned by GA.
 * @returns {string} The full page URL.
 */
function getPageUrl(hostname, pagePath) {
  // When Google Analytics is misconfigured, the pagePath can contain a path prefixed by the hostname (www.example.com/)
  if (!pagePath.startsWith('/')) {
    // the path is prefixed by a host name => let's use it as-is
    return `${PROTOCOL}${pagePath}`;
  } else {
    // generate the full URL
    return `${PROTOCOL}${hostname}${pagePath}`;
  }
}

// Main
(async () => {
  console.log(`Fetching Google Analytics data...`);
  const metricsFetcher = new MetricsFetcher(GA_PARAMETERS);
  const allGAData = await metricsFetcher.fetchAll();

  console.log('Browsing Algolia index and creating partial records...');
  const client = algoliasearch(APP_ID, API_KEY);
  const index = client.initIndex(INDEX_NAME);
  const recordsToUpdate = [];
  await index.browseObjects({
    query: '', // Empty query will match all records
    attributesToRetrieve: [URL_ATTRIBUTE],
    batch: batch => {
      batch.forEach(record => {
        if (allGAData[record[URL_ATTRIBUTE]]) {
          // Create a partial record with a new `pageviews` attribute
          recordsToUpdate.push({
            objectID: record.objectID,
            pageviews:
              allGAData[record[URL_ATTRIBUTE]][METRICS.uniquePageViews],
          });
        }
      });
    },
  });

  console.log(`Updating ${recordsToUpdate.length} records...`);
  await index.partialUpdateObjects(recordsToUpdate, {
    createIfNotExists: false,
  });
  console.log('Records updated.');
})();
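Once your records carry a pageviews attribute, it only influences ranking if the index settings reference it. As a minimal sketch, assuming you want the most viewed pages to rank first, the helper below (hypothetical name `withPageviewsRanking`) adds `desc(pageviews)` to an existing `customRanking` list. Applying the result goes through the client's `getSettings` and `setSettings` methods, shown only as comments here because they need live credentials.

```javascript
// Hypothetical helper: returns a copy of the settings with `desc(pageviews)`
// as the first custom ranking criterion, unless it is already present.
function withPageviewsRanking(settings = {}) {
  const current = settings.customRanking || [];
  const entry = 'desc(pageviews)';
  return {
    ...settings,
    customRanking: current.includes(entry) ? current : [entry, ...current],
  };
}

// Made-up existing setting, for illustration
console.log(withPageviewsRanking({ customRanking: ['desc(rating)'] }));

// With a live index (same `index` object as in the script above):
// const settings = await index.getSettings();
// await index.setSettings({
//   customRanking: withPageviewsRanking(settings).customRanking,
// });
```

Custom ranking criteria are applied in order, so placing pageviews first gives it priority over the other criteria. Put it later in the list if another business metric should win.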