Big Data Infrastructure Powers our Workflow

by Ben Hawker on Monday 27 March 2017
A whistle-stop tour of how Forward3D uses internal systems to power our strategies.

At Forward3D, we have over 300 Analysts based across 8 timezones. Each client team needs daily data imports from a variety of 3rd party services. These imports inform their analysis, reporting, and decisions about where to direct resources on client campaigns.

These 3rd party services generally fall into 2 categories.


1. Services our Analysts are actively using to manage live campaigns.

  • Examples:
    • Google Adwords/Bing Ads/Yandex Direct/Naver Ads/DoubleClick
    • Google Shopping/Bing Shopping/Yandex Marketplace
    • Getstat/Google Search Console

2. Supplementary data sources.

  • Examples:
    • Custom FTP imports (e.g. Client Product/Price Feeds)
    • Exchange Rates
    • Weather
    • TV Listings

For a given client, each of these services may return multiple custom-defined reports across multiple regions and languages. For example, every day for one of our larger clients we import over 5000 separate reports from the Adwords API alone. These span multiple languages and countries, and range from simple keyword reports through to more complex reports that consider conversions across multiple devices.
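
To illustrate how report counts multiply, here is a minimal sketch that expands one report configuration into individual report requests. The countries, languages, report types and the ReportRequest struct are all hypothetical placeholders, not our actual configuration.

```ruby
# All values below are hypothetical placeholders for illustration only.
countries = %w[GB DE FR]
languages = %w[en de fr]
report_types = %w[keywords search_terms device_conversions]

# One request per (country, language, report type) combination.
ReportRequest = Struct.new(:country, :language, :report_type)

requests = countries.product(languages, report_types).map do |country, language, report_type|
  ReportRequest.new(country, language, report_type)
end

puts requests.size # 3 * 3 * 3 combinations
```

Every extra dimension (account, date range, device) multiplies the total again, which is how a single client can reach thousands of reports per day.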


In total we are currently importing over 500 million lines of data, which are split across more than 50,000 reports each day.


To achieve this we have a series of Ruby applications built in a lambda architecture, each with a single responsibility, ranging from permissions and client management to report configuration and report scheduling.


Importing Data

For processing data imports we use a persistent queue system (Resque), which enables scheduling and background processing, a well-established scalability strategy in many web applications. This allows us to send jobs, as they need to be run, to servers where we have available resources.
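
As a rough illustration of the pattern, the sketch below defines a job in the shape Resque expects: a class with a class-level @queue name and a self.perform method. The ReportImportJob class and its arguments are hypothetical, and a small in-memory queue stands in for the Redis-backed one so the example runs without the gem; it is not our production code.

```ruby
require "json"

# Job class following Resque's convention: a class-level @queue name and a
# self.perform method. ReportImportJob and its arguments are hypothetical.
class ReportImportJob
  @queue = :report_imports

  class << self
    attr_reader :queue
  end

  def self.perform(client_id, report_name)
    "imported #{report_name} for client #{client_id}"
  end
end

# Stand-in for the Redis-backed queue so the sketch runs without Resque or Redis.
# With the real gem, enqueueing would be:
#   Resque.enqueue(ReportImportJob, 42, "keywords")
class InMemoryQueue
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # Store the job as JSON (class name plus arguments), similar to Resque's payload.
  def enqueue(job_class, *args)
    @queues[job_class.queue] << JSON.generate("class" => job_class.name, "args" => args)
  end

  # Take the oldest job from a queue and run it, as a worker process would.
  def work_one(queue_name)
    payload = @queues[queue_name].shift
    return nil if payload.nil?

    job = JSON.parse(payload)
    Object.const_get(job["class"]).perform(*job["args"])
  end
end

queue = InMemoryQueue.new
queue.enqueue(ReportImportJob, 42, "keywords")
result = queue.work_one(:report_imports)
puts result
```

Because the payload is only a class name and a list of simple arguments, any worker with the same code loaded can pick the job up, which is what makes it possible to run jobs wherever resources are free.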


We use Apache Mesos, a respected open source cluster manager, together with our own Mesos scheduler implementation (which we will call Quesos), which handles all our job submission. Our scheduler is written in JRuby, allowing thread-level parallelism. Put simply, Quesos receives resource offers from slaves in the Mesos cluster and responds with the relevant jobs to be run. Mesos allows us to utilise multiple distributed resources with relative ease.
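
The offer-and-response cycle can be sketched in plain Ruby. This is a simplified first-fit matcher under assumed data shapes, not the Quesos implementation and not the Mesos API: the Offer and Job structs, their fields and the match_offers method are all hypothetical.

```ruby
# Hypothetical shapes for a resource offer and a pending job.
Offer = Struct.new(:node, :cpus, :mem_mb)
Job = Struct.new(:name, :cpus, :mem_mb)

# First-fit matching: for each offer, take queued jobs in order while they fit
# in what remains of the offer. Returns the assignments and the jobs still waiting.
def match_offers(offers, pending_jobs)
  waiting = pending_jobs.dup
  assignments = Hash.new { |hash, node| hash[node] = [] }

  offers.each do |offer|
    free_cpus = offer.cpus
    free_mem = offer.mem_mb

    # Remove each job that fits from the waiting list and record where it runs.
    waiting.reject! do |job|
      fits = job.cpus <= free_cpus && job.mem_mb <= free_mem
      if fits
        free_cpus -= job.cpus
        free_mem -= job.mem_mb
        assignments[offer.node] << job.name
      end
      fits
    end
  end

  [assignments, waiting]
end

offers = [Offer.new("node-1", 2.0, 2048), Offer.new("node-2", 1.0, 1024)]
jobs = [
  Job.new("adwords-keywords", 1.0, 1024),
  Job.new("bing-shopping", 1.0, 1024),
  Job.new("weather", 1.0, 512),
  Job.new("tv-listings", 1.0, 1024)
]

assignments, waiting = match_offers(offers, jobs)
puts assignments.inspect
puts waiting.map(&:name).inspect
```

Jobs that do not fit any current offer simply stay queued until a later offer arrives, so the scheduler never needs to know the full state of the cluster in advance.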


Since Forward3D’s inception the use of internally built technology has been integral to the agency’s ethos and approach. Our suite of internal applications automates that which can be automated, leaving our client teams to focus on delivering actual insights and campaign improvements rather than the initial stage of raw data collection.


The next evolution of our internal technology will provide tools for our client teams to create new campaigns, as well as edit and optimise existing campaigns in bulk, across the key digital advertising platforms. This will enable us to standardise inputs across different providers (e.g. Google vs. Bing) and find further efficiencies and time savings across common tasks.