The Top 8 Terms to Know When Exploring the Cross Device Tech Space

by Peter Ouzounov on Thursday 6 October 2016

How many times have you heard one of the below:


A cross device picture of your customers is what you need this year…

Cross device tracking is the new holy grail of advertising with the promise of more accurate customer insight to inform performance and creative strategy … 

Cross device is the new attribution of the industry, uncovering true campaign value that has been hidden or double counted…


All of the above are potentially true: customers are now, more than ever, engaging with brands on one device (or more than one) but purchasing on another. Therefore, when you connect your data to the services of a cross device partner, you could uncover longer, more complex customer journeys, giving you better insight to both attribute and target more accurately across the conversion funnel.


However, suppliers of cross device data use increasingly technical language. This is understandable, as the problem is a highly technical one.


In this article, we go through some of the top terms you should know when exploring the space. Next to our modest definition of each of the terms, we have also included a few learnings from our experience that should help you add some context to any discussions you might have about cross device mapping. 


Interested in a full audit of a cross device partner and an understanding of what they could potentially bring to your data? Check out our sister post.


1. Deterministic versus probabilistic cross device solutions


A supplier can either know for sure that two of your cookie IDs belong to the same customer by tracking them across the web (deterministic matching), or build a predictive algorithm on their extensive tracked customer data that predicts whether two cookies belong to the same user journey (probabilistic matching).

First of all, nobody has a complete set of deterministic customer data. No matter how large you are, there will always be holes in this kind of data. 

That said, two examples of partially deterministic cross device partners are Google and (probably) Facebook’s Atlas solution. Google has been including cross device conversions in Google AdWords for some time now and there are similar solutions across the rest of the DoubleClick tech stack.


Probabilistic device suppliers include players like Tapad, AdBrain, Drawbridge, and Crosswise-Oracle (amongst others). We’ll revisit the difference in expected performance between the two methods in the accuracy and recall sections below.


In summary, nobody is entirely deterministic, so all suppliers (including Google) exist on a continuum from partially deterministic to probabilistic.
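The contrast between the two approaches can be sketched in a few lines of Python. This is only a toy illustration, not how any named supplier works: the event fields (a hashed login, an IP address, an hour of day), the equal weights, and the threshold are all assumptions invented for the example.

```python
from itertools import combinations

# Hypothetical events: (cookie_id, hashed_login or None, ip_address, hour_of_day)
events = [
    ("cookie_a", "user_1", "10.0.0.1", 21),
    ("cookie_b", "user_1", "10.0.0.1", 22),
    ("cookie_c", None, "10.0.0.1", 21),
    ("cookie_d", None, "10.0.0.9", 9),
]

def deterministic_pairs(events):
    """Pair cookies that share an observed login: known, not predicted."""
    pairs = set()
    for (c1, login1, _, _), (c2, login2, _, _) in combinations(events, 2):
        if login1 is not None and login1 == login2:
            pairs.add(frozenset((c1, c2)))
    return pairs

def probabilistic_score(e1, e2):
    """Toy score in [0, 1] from a shared IP and a similar hour (invented weights)."""
    same_ip = 1.0 if e1[2] == e2[2] else 0.0
    close_hours = 1.0 if abs(e1[3] - e2[3]) <= 1 else 0.0
    return 0.5 * same_ip + 0.5 * close_hours

def probabilistic_pairs(events, threshold):
    """Keep every pair whose toy score reaches the threshold."""
    pairs = {}
    for e1, e2 in combinations(events, 2):
        score = probabilistic_score(e1, e2)
        if score >= threshold:
            pairs[frozenset((e1[0], e2[0]))] = score
    return pairs

known = deterministic_pairs(events)
predicted = probabilistic_pairs(events, threshold=0.75)
```

In this toy data the deterministic method finds only the one pair with a shared login, while the probabilistic method also proposes pairings for cookie_c, which never logged in. Those extra pairings are predictions and may be wrong.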


2. Training data set (important for probabilistic methods)


This is the data that your cross device partner used to build their model. In short, the device to user matching predictions are generalized based on the data that is used in the calibration stage. 


One example of where this is important is in the geography of the training data set. If the training data comes from the UK market, specifically web traffic data of 25-35 year-olds, the predictions will really only be valid for similar customers in the UK market. Cross device matches predicted for Brazil from this data will likely be inaccurate. Usually cross device partners have billions of data points so that they can make predictions about many demographics and markets. Nonetheless, it’s important to always check for potential biases like this when speaking to potential suppliers.


3. Device graph


This is the set of cookie pairings that the cross device supplier sends back after their matching algorithm does its magic. Each pairing is two cookies that the supplier believes belong to the same user. For probabilistic device graphs, it may also contain a probability measure (or confidence score) to give you an idea of how confident the supplier is that the algorithm accurately predicted the pairing. Usually, accepting pairings at a lower confidence level means a higher match rate, and vice versa.
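As a rough illustration of that trade-off, here is a minimal Python sketch of a device graph stored as scored pairings. The tuple layout, cookie names, and confidence values are hypothetical; a real supplier's file format will differ.

```python
# Hypothetical device graph: (cookie_1, cookie_2, supplier confidence)
device_graph = [
    ("cookie_a", "cookie_b", 0.95),
    ("cookie_c", "cookie_d", 0.80),
    ("cookie_e", "cookie_f", 0.55),
]

# Hypothetical full list of your own cookies (two of them are never paired).
your_cookies = {
    "cookie_a", "cookie_b", "cookie_c", "cookie_d",
    "cookie_e", "cookie_f", "cookie_g", "cookie_h",
}

def cookies_matched(graph, min_confidence):
    """Cookies that appear in at least one pairing at or above the cut-off."""
    matched = set()
    for cookie_1, cookie_2, confidence in graph:
        if confidence >= min_confidence:
            matched.update((cookie_1, cookie_2))
    return matched

def share_matched(graph, cookies, min_confidence):
    """Fraction of your cookies that end up in an accepted pairing."""
    return len(cookies_matched(graph, min_confidence) & cookies) / len(cookies)

strict = share_matched(device_graph, your_cookies, min_confidence=0.9)
loose = share_matched(device_graph, your_cookies, min_confidence=0.5)
```

With these invented numbers, the strict cut-off keeps one pairing (a quarter of the cookies matched) and the loose cut-off keeps all three (three quarters matched), at the price of accepting pairings the supplier is less sure about.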


4. Accuracy


This is the percentage of the supplier’s device graph that correctly attaches your cookies to the right user. However, this is not as straightforward as it seems, because we don’t know who the correct user is all of the time (because of the data holes I mentioned earlier). If we did, we wouldn’t need a cross device partner! Accuracy is therefore usually calculated on the supplier’s training data set, or on your own CRM and cookie data (as we recommend in our sister post). Deterministic data will usually have very high accuracy (although there will always be some error, for instance when customers use a friend’s device to make purchases). Probabilistic suppliers, on the other hand, will claim something along the lines of 90-95% accuracy (on their internal training data). That figure can be much lower on any internal test you run with your own CRM and cookie data, simply because your data will be different from, and much smaller than, the data used to train your supplier’s model.
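A check against your own CRM data might look something like the sketch below. It assumes you hold a lookup from cookie ID to customer ID for the customers who have logged in; the names `crm_truth` and `supplier_pairs` and all the values are made up for illustration. Pairings you cannot verify are left out of the calculation rather than counted as wrong.

```python
# Hypothetical ground truth from your CRM: cookie ID -> customer ID
crm_truth = {
    "cookie_a": "customer_1",
    "cookie_b": "customer_1",
    "cookie_c": "customer_2",
    "cookie_d": "customer_3",
    "cookie_e": "customer_2",
}

# Hypothetical pairings returned by the supplier
supplier_pairs = [
    ("cookie_a", "cookie_b"),  # correct: same customer
    ("cookie_c", "cookie_d"),  # wrong: different customers
    ("cookie_c", "cookie_e"),  # correct: same customer
    ("cookie_f", "cookie_g"),  # unverifiable: neither cookie is in the CRM
]

def accuracy(pairs, truth):
    """Share of verifiable pairings whose two cookies map to the same customer."""
    verifiable = [(a, b) for a, b in pairs if a in truth and b in truth]
    if not verifiable:
        return None  # nothing to check against
    correct = sum(1 for a, b in verifiable if truth[a] == truth[b])
    return correct / len(verifiable)

measured_accuracy = accuracy(supplier_pairs, crm_truth)
```

Here two of the three verifiable pairings are correct, so the measured accuracy is about 67%. With a sample this small the figure is very noisy, which is one reason an internal test can differ from a supplier's headline number.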


5. Recall


Recall is trickier to explain, but it is an important metric to consider. Accuracy is the number of correct pairings divided by the total number of predictions made. Recall is the number of correct pairings as a percentage of all the true pairings that exist across your actual users and their multiple cookie IDs, including the ones the supplier never made a prediction for. It is a measure of relevance.


To understand the difference between accuracy and recall, let’s take an example from my own algorithm, called Pete’s Pick. Pete’s Pick can tell you exactly which cookies correspond to my own phone and laptop. I will have 100% accuracy on this, since I personally know exactly what device I am using as I am using it. However, my recall is nearly 0% (1/UK population using technology to make purchases online). That is because I am not going to be able to make a correct prediction for the other millions of device owners in the UK. Even though it is highly accurate, Pete’s Pick has very little relevance as an algorithm. 


Deterministic data will have lower recall than a probabilistic model. A probabilistic model can make a prediction about any situation that has the relevant data inputs, while deterministic matching is limited to the pairs you know for certain are the same, because those users have actually been tracked across devices.
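The Pete’s Pick example can be worked through in Python. The population of 1,000 phone and laptop pairs is an invented stand-in for the real number of device owners, and accuracy is computed the way this article defines it (elsewhere this ratio is often called precision).

```python
def accuracy_and_recall(predicted_pairs, actual_pairs):
    """Accuracy = correct / predictions made; recall = correct / all true pairs."""
    correct = predicted_pairs & actual_pairs
    accuracy = len(correct) / len(predicted_pairs) if predicted_pairs else 0.0
    recall = len(correct) / len(actual_pairs) if actual_pairs else 0.0
    return accuracy, recall

# Hypothetical truth: 1,000 people, each with one phone cookie and one laptop cookie.
actual_pairs = {
    frozenset((f"phone_{i}", f"laptop_{i}")) for i in range(1000)
}

# Pete's Pick only ever predicts Pete's own two devices (person 0 here).
petes_pick = {frozenset(("phone_0", "laptop_0"))}

petes_accuracy, petes_recall = accuracy_and_recall(petes_pick, actual_pairs)
```

The single prediction is correct, so accuracy is 100%, but it covers only one of the 1,000 true pairs, so recall is 0.1%. Against the real UK population the recall would be far smaller still.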


6. Cookie sync


Device partners will want to sync cookies by placing their pixel into one of your tracking pixels (commonly done in tools like DCM). This allows them to associate their cookie ID with your cookie IDs and to collect information about your customers, which they feed into their device model. It usually takes 2-4 months to complete a cookie sync.


7. Sync rate


The percentage of your cookies that are passed to the provider. This should be 80-90%, and the remainder (10-20% in this example) shouldn’t be systematic. By systematic, I mean that the reasons why cookies don’t sync should be almost random: a technical glitch, for example, or a connection error. If the failures are systematic (non-random), you might be losing valuable information.


For example, let’s say that you lose the sync between 11 pm and midnight every day. That means that your data will not contain any late night shoppers who are buying on their phone (possibly in their pyjamas, already in bed). This is systematic error. Your resulting device mapping won’t be as accurate or insightful, as you are dropping this kind of customer behaviour from your model.
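One simple way to look for this kind of problem is to break the sync rate down by hour of day and flag any hour that falls far below the overall rate. The sketch below uses an invented sync log in which every cookie seen between 11 pm and midnight fails to sync; the log layout and the 0.5 gap used for flagging are assumptions for illustration, and a proper analysis would use a statistical test rather than a fixed cut-off.

```python
from collections import defaultdict

# Hypothetical sync log: (cookie_id, hour_of_day, synced). Ten cookies per hour.
# A few scattered failures stand in for random glitches; hour 23 always fails.
random_failures = {(3, 0), (9, 4), (15, 7)}  # (hour, index) pairs, invented
sync_log = [
    (f"cookie_{hour}_{i}", hour, hour != 23 and (hour, i) not in random_failures)
    for hour in range(24)
    for i in range(10)
]

def sync_rate(log):
    """Share of cookies that were passed to the provider."""
    return sum(synced for _, _, synced in log) / len(log)

def sync_rate_by_hour(log):
    """Sync rate for each hour of day present in the log."""
    totals = defaultdict(int)
    synced_counts = defaultdict(int)
    for _, hour, synced in log:
        totals[hour] += 1
        synced_counts[hour] += synced
    return {hour: synced_counts[hour] / totals[hour] for hour in totals}

def suspicious_hours(log, max_gap=0.5):
    """Hours whose sync rate is more than max_gap below the overall rate."""
    overall = sync_rate(log)
    by_hour = sync_rate_by_hour(log)
    return sorted(hour for hour, rate in by_hour.items() if overall - rate > max_gap)

overall_rate = sync_rate(sync_log)
flagged = suspicious_hours(sync_log)
```

The overall sync rate in this toy log is about 95%, which looks healthy on its own, yet the hourly breakdown shows that the 11 pm hour is missing entirely. The same breakdown could be repeated by device type, browser, or market.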


8. Match rate


The percentage of your cookies that the device mapping has associated with a device. This has nothing to do with accuracy. However, based on our discussions with various cross device partners, you should see a 40-60% match rate for probabilistic models and 10-40% for deterministic models. Note that matches will often be not only across multiple devices, but also across different cookie IDs on the same device. For example, cookie A and cookie B both belong to me because I cleared my history recently. Matching across cookies is an under-discussed, underappreciated benefit of a cross device graph.
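A minimal Python sketch of the calculation, including the split between cross device and same device pairings, is shown below. It assumes you can label each cookie with a device identifier; `cookie_to_device`, `matched_pairs`, and all the values are hypothetical.

```python
# Hypothetical lookup: cookie ID -> device identifier
cookie_to_device = {
    "cookie_a": "phone_1",
    "cookie_b": "phone_1",   # same phone as cookie_a, created after a history clear
    "cookie_c": "laptop_1",
    "cookie_d": "tablet_7",
    "cookie_e": "laptop_9",
}

# Hypothetical pairings returned by the device mapping
matched_pairs = [("cookie_a", "cookie_b"), ("cookie_a", "cookie_c")]

def match_rate(pairs, all_cookies):
    """Share of your cookies that appear in at least one pairing."""
    matched = {cookie for pair in pairs for cookie in pair}
    return len(matched & set(all_cookies)) / len(all_cookies)

def split_pairs(pairs, devices):
    """Separate pairings on the same device from pairings across devices."""
    same_device, cross_device = [], []
    for a, b in pairs:
        if devices[a] == devices[b]:
            same_device.append((a, b))
        else:
            cross_device.append((a, b))
    return same_device, cross_device

rate = match_rate(matched_pairs, cookie_to_device)
same_device, cross_device = split_pairs(matched_pairs, cookie_to_device)
```

Three of the five cookies are matched, giving a 60% match rate, and one of the two pairings links two cookies on the same phone rather than two different devices.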


Data science lingo has once again permeated the discussion of something simple, even as simple as describing a consumer’s journey from browsing on their phone on the couch to jumping back to their desk and working on their laptop. In this case, though, this might be a good thing: more accuracy in our ability to measure the customer journey can give us greater insights for marketing. Google’s own cross-device metrics are on average attributing 16% more conversions to paid search campaigns, and that is likely an underestimate!


Given the myriad solutions and cost structures, it’s important to do your own digging into the specifics of the space, and this glossary should help you get started.


Looking for a guide on how you or your agency can evaluate a cross-device solution? Check out our sister post here.