Select your language

For GA360 clients, the BigQuery export is one of the most powerful features that allow you to work with the raw tracking data. Using this raw, session grouped tracking data on the one hand you might run deeper advanced analysis generating reports that not directly possible in the GA interface. On the other hand you can build your own automated systems to personalize and optimine your website.

The following section will describe a simple example on how you can use the GA BigQuery export data and a few lines of Python code to create a basic recommender system, based on the tracked transactional data. The code will use the transaction statistics to calculate and output "similar" products. The similarity between products will be calculated using the cosine similarity equation, applied to buying (sparse) vectors for each product with n dimensions - n, being the number of users that have bought any product.

cosine similarity

As an output, for each prouduct id you will get the assigned similar product ids with a metric for the similarity based on the equation above (larger value equals a higher similarity). If you save those values into a database (e.g. based on the transaction tracking data of the last 60 days, updated on a daily basis), you can use it for recommending products on your website:

  • Show product recommendations on your product detail pages
    "Users that have bought this product also bought these other products"
  • Show product recommendations for upselling during the checkout process
    In this case you can calculate and combine the similarities for each product in the basket - "Users that have bought the selected products, also bought the following other products"

from google.cloud import bigquery
import math
import os

project_id = 'my-project-id' # bigquery project id
output_filename = 'recommendation-data.csv' # output csv filename

# get buying userIds for each product from bigquery
print("loading data from bigquery...", end='')

buyingUserIdyByProductId = {}

bigquery_client = bigquery.Client(project='praxis-1216')

query_results = bigquery_client.run_sync_query("""
  SELECT
  hits.product.productSKU as productId,
  GROUP_CONCAT(fullVisitorId,',') AS buyerList
  FROM TABLE_DATE_RANGE([123456.ga_sessions_], DATE_ADD(CURRENT_TIMESTAMP(), -60, 'DAY'),    
  CURRENT_TIMESTAMP())

  WHERE hits.eCommerceAction.action_type = '6'
  GROUP BY productId""")

query_results.run()

page_token = None

while True:
  rows, total_rows, page_token = query_results.fetch_data(max_results=1000,page_token=page_token)

  for row in rows:
    buyingUserIdyByProductId[row[0]] = row[1].split(",")

    if not page_token:
      break

    print('.', end='')

print("done")

# Calculate cosine similarity for each product pair and write output csv
print('calculating cosine similarities and writing output file...', end='')

f = open(output_filename, 'w')

for productId in buyingUserIdyByProductId:
  for productId_compare in buyingUserIdyByProductId:
    if productId != productId_compare:
      similarity = 0
      for userId in buyingUserIdyByProductId[productId]:
        if userId in buyingUserIdyByProductId[productId_compare]:
          similarity += 1
      if similarity > 0:
        similarity /= math.sqrt(len(buyingUserIdyByProductId[productId])) * math.sqrt(len(buyingUserIdyByProductId[productId_compare]))
    f.write(productId + "\t" + productId_compare + "\t" + str(similarity)+ "\n")
    print('.', end='')
    f.flush()

f.flush()
f.close()

print("done")


Download Link: item-item-recommender.py