The raw tracking data in BigQuery is one of the most interesting and powerful features offered by GA360 (the paid version of Google Analytics). It allows you to query the tracking data without any kind of limitations or sampling. This article shows basic examples on how to use BigQuery to extract information from the GA data. 

Of course, data in BigQuery can be queried using the basic web interface on https://bigquery.cloud.google.com. Nevertheless, it's much more convenient to use the BigQuery API in conjunction with your favorite programming language. For the examples shown below, I used Google Cloud Datalab - a useful web-based tool which combines Python code, documentation and visualization. 

A full description of the GA360 BigQuery export schema can be found on https://support.google.com/analytics/answer/3437719.

 

Import necessary libraries

import datalab.bigquery as bq
import pandas as pd
import matplotlib.pyplot as plt

 

Sample 1: Get number of users, sessions and pageviews for a given timeframe

Notes:

  • in BigQuery the visitId is kept over the 0am break
  • the GA interface only counts sessions that have at least one interaction 
%%sql
SELECT 
 date,
 EXACT_COUNT_DISTINCT(fullVisitorId) AS Users,
 EXACT_COUNT_DISTINCT(CONCAT(STRING(fullVisitorId), STRING(visitStartTime))) AS GASessions,
 EXACT_COUNT_DISTINCT(CONCAT(STRING(fullVisitorId), STRING(visitId))) AS BigQuerySessions,
 SUM(IF(hits.type ='PAGE', 1, 0)) AS PageViews 
FROM TABLE_DATE_RANGE([myProject:123456789.ga_sessions_], 
TIMESTAMP('2018-06-01'),
TIMESTAMP('2018-06-05'))
GROUP BY date

dateUsersGASessionsBigQuerySessionsPageViews
20180604 23.623 28.730 28.730 74.206
20180601 15.326 19.526 19.526 58.923
20180602 14.322 18.080 18.080 52.225
20180603 16.301 21.035 21.035 61.926
20180605 19.137 23.870 23.870 65.074

 

Sample 2: Show sessions grouped by source and medium

The totals.visits field contains 1 for visits with at least one interaction hit. 

%%sql
SELECT 
    trafficSource.source,
    trafficSource.medium,
    SUM(totals.visits) as Sessions
FROM TABLE_DATE_RANGE([myProject:123456789.ga_sessions_], 
TIMESTAMP('2018-06-01'),
TIMESTAMP('2018-06-05'))
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 10

 

trafficSource_sourcetrafficSource_mediumSessions
google organic 35.731
google cpc 30.458
newsletter email 13.384
(direct) (none) 12.449
myPartner1 partner 5.278
myPartner2 partner 2.022
bing cpc 1.649
bing organic 1.382
myPartner3 partner 1.371
m.facebook.com referral 1.078

 

Sample 3: Show pageviews grouped by page

Pageviews have the hits.type field set to "PAGE". 

%%sql
SELECT   
 hits.page.pagePath,
 COUNT(*) as pageviews 
FROM TABLE_DATE_RANGE([myProject:123456789.ga_sessions_], 
TIMESTAMP('2018-06-01'),
TIMESTAMP('2018-06-05'))
WHERE hits.type = 'PAGE'
GROUP BY hits.page.pagePath
ORDER BY pageviews DESC
LIMIT 10

 

hits_page_pagePathpageviews
/shop/page1 32.030
/home/ 23.035
/shop/customer/account/login/ 19.960
/ 15.730
/news 14.756
/shop/zubehoer 12.288
/shop/ 11.801
/page3 10.084
/shop/checkout/cart/ 10.038
/shop/ 7.790

 

Sample 4: Group data by custom dimension

Custom dimension values are stored in a nested field as index, value pairs. User and session custom dimensions are stored on the top level of the session grouped data - hit level custom dimensions are stored for each hit in the hits.customDimensions object.

%%sql
SELECT pagetype, SUM(pageviews) AS pageviews
FROM 
(
 SELECT   
 1 AS pageviews,
 IF(hits.customDimensions.index = 4, hits.customDimensions.value, null) AS pagetype,
 FROM TABLE_DATE_RANGE([myProject:123456789.ga_sessions_], 
TIMESTAMP('2018-06-01'),
TIMESTAMP('2018-06-05'))
 WHERE hits.type='PAGE'
 HAVING Pagetype IS NOT NULL
)
GROUP BY pagetype
ORDER BY pageviews DESC

 

pagetypepageviews
content 70.177
listing 52.203
homepage 35.239
product 34.134

 

Sample 5: Load data into pandas dataframe and plot

BigQuery can be loaded in the widely used pandas dataframe format for further processing or plotting.

 

First Step: Define Query

%%sql --module myExampleQuery
SELECT 
 date,
 EXACT_COUNT_DISTINCT(fullVisitorId) AS Users,
 EXACT_COUNT_DISTINCT(CONCAT(STRING(fullVisitorId), STRING(visitStartTime))) AS GASessions,
 EXACT_COUNT_DISTINCT(CONCAT(STRING(fullVisitorId), STRING(visitId))) AS BigQuerySessions,
 SUM(IF(hits.type ='PAGE', 1, 0)) AS PageViews 
FROM TABLE_DATE_RANGE([myProject:123456789.ga_sessions_], 
TIMESTAMP('2018-06-01'),
TIMESTAMP('2018-06-15'))
GROUP BY date

 

Second Step: Load Data into Pandas dataframe

myDataFrame = bq.Query(myExampleQuery).to_dataframe()

 

Preview the data

myDataFrame.head()

 

 dateUsersGASessionsBigQuerySessionsPageViews
0 20180609 14544 18597 18597 48974
1 20180602 14322 18080 18080 52225
2 20180606 15657 19573 19573 54294
3 20180604 23623 28730 28730 74206
4 20180607 16474 20964 20964 57126

Plot the data

myPlot = myDataFrame.plot(x="date",y=["GASessions","Users"])
myPlot.set_xticklabels(myDataFrame["date"], rotation=25)
plt.show()

 

bigquery basic examples plot