The Evolving World of Tracking Media - Mark Green, founder and CEO of MediaBrain, mulls the issues of the day
Tracking digital media consumption is an evolving science. As aspects like cookies crumble, other tools like pixels are being increasingly leaned upon. What remains to be seen is how far the governments of the world plan to go in restricting companies' abilities to collect and analyze personally identifiable information.
Before jumping into opinions, let's spell out what today’s primary collection methods are.
A pixel is a line of code that you embed in your digital content. It records each time that content is exposed. The code collects some data at the moment of exposure and sends it to a listening service. The pixel’s code prescribes what to collect and where to send it, typically collecting the type of device, type of app or browser, IP address, operating system, screen resolution, and date-time of the exposure.
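As a concrete sketch of those mechanics, a listening service might reduce each pixel request to a log record like the one below. The field names and query parameters (`cid`, `cr`) are illustrative assumptions, not any particular vendor's format.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

def parse_pixel_hit(url, headers, client_ip):
    """Turn one pixel request into the record a listening service might log."""
    qs = parse_qs(urlparse(url).query)
    return {
        "campaign_id": qs.get("cid", [None])[0],   # set by the embedded tag
        "creative_id": qs.get("cr", [None])[0],
        "user_agent": headers.get("User-Agent"),   # device / browser / OS hints
        "ip_address": client_ip,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

record = parse_pixel_hit(
    "https://collector.example.com/px.gif?cid=fall-launch&cr=30s-video",
    {"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)"},
    "203.0.113.7",
)
```

The record itself says nothing about who owns the device; that is the job of the graphing partners discussed next.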
Making sense of pixels requires data partners with device graphs and consumer data platforms to tell you who owns the device and what else is happening around the time of the exposure and after.
Pixels can tell you what is happening with your content in digital media. They do not tell you what is happening with other people’s content. So you can know about your ad, but not the publisher content or the journey to your ad, and of course you are blind to what your competitors are doing. Third party measurement services can fill in some of these gaps.
While linear TV is now mostly digital, linear distribution generally does not support pixels. On the other hand, most on-demand viewing is rendered through separate streaming apps, which do support pixels. As linear tuning moves into apps, we expect pixels to become the predominant way to measure your content exposures for all TV. We expect third party ACR measurement services to provide the full view of all exposures across all TV content.
While pixels collect information page-by-page, cookies collect across pages. They are downloaded to and stored on your browser so they can track the browser activity across pages.
1st party cookies only track users of the website from which the cookie comes. The user’s benefit is that 1st party cookies remember them, so they do not need to login again. The publisher's benefit is that they get to see everything each user does on their site, across all visits until the cookies are deleted or expire.
3rd party cookies track everywhere you go on the browser. 3rd parties rarely announce that they are different from 1st parties. Sometimes, like 1st parties, they ask for permission in the small print of a “terms” agreement that is almost never read. Often, 3rd parties do not even ask that much.
Cookies are the primary method used by publishers to track digital activity on websites. For activity not through browsers, such as apps on your phone, tracking is built into the device’s operating system, the two big players being Apple’s iOS and Google’s Android. The tracking information from cookies and these operating systems is equivalent and sometimes lumped together.
Before consumer privacy became a thing, the tech and marketing tech companies were all about collecting and connecting this data to use and sell it. The trick was figuring out how to connect the data to know who was using a given device and in what context: where, when, and for what purpose.
The exercise of connecting data is called graphing. Certain pieces of data are the same across measurements, such as device ID, cookie ID, IP address, GPS, and street address, and these are used to group the information. As an example, all the information from one phone (device ID) probably uses one IP address more than any other: the one at the home address of the device’s primary user. Mapping IP addresses to street addresses, with additional information like the preponderance of GPS data points, lets devices be grouped by IP address and home address, and cookie IDs be grouped via IP addresses. While there is some educated guesswork at the start, with enough information, knowing who owns and uses what becomes close to a sure thing. The graph is the backbone for connecting everything else, like your shopping habits, financial transactions, vacations, etc. When you carry a tracking device like your phone everywhere you go, tracking you becomes easy. This is what being “on the grid” means.
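The grouping step described above can be sketched with a standard union-find (connected components) pass: identifiers that co-occur in the same observation get merged into one group. The identifier labels are invented for illustration.

```python
from collections import defaultdict

def build_device_graph(observations):
    """Group identifiers (device IDs, cookie IDs, IPs, street addresses)
    that co-occur in an observation into connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for ids in observations:
        for other in ids[1:]:
            union(ids[0], other)

    groups = defaultdict(set)
    for x in parent:
        groups[find(x)].add(x)
    return list(groups.values())

# The two observations share ip:198.51.100.4, so all four identifiers
# collapse into one household-level group.
households = build_device_graph([
    ["device:phone-1", "ip:198.51.100.4"],
    ["cookie:abc123", "ip:198.51.100.4", "addr:12 Main St"],
])
```

Real graphs add the probabilistic weighting mentioned above (preponderance of IPs, GPS clusters) before declaring two identifiers the same owner; this sketch merges on any co-occurrence.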
Most people are unaware of the “grid”, that they are “on the grid”, and how many companies have access to this information.
A few people made it their mission to raise awareness, and that led Europe to pass a law (the General Data Protection Regulation) saying people have to give their permission to be tracked by a company for each use case. Interpreting what this means is now being decided in the agencies and courts that implement and adjudicate laws. Europe is in the middle of this process. Meanwhile, California, with a similar law (the California Consumer Privacy Act), is just starting to go through this process of implementation and adjudication.
The Privacy War
In concert with the European and California governments, Apple has recently championed privacy by requiring cookies to be opted in on their Safari browser and tracking to be opted in on all iOS devices. Google is taking similar actions on cookies with their Chrome browser, but has yet to follow with tracking opt-ins on Android devices.
Various companies have been preparing for a cookie-less world by leaning towards pixels and building graphs that work without cookies. For publishers who track their own content, this is straightforward. For 3rd party trackers, it means getting cooperation from publishers and sampling activity through opt-in panels to cover the rest. For bidding exchanges, it means publishers sell their unique and duplicate impressions from within their sites without knowing how those impressions duplicate with other sites. While buyers set frequency caps, they have no real information on frequency across sites until their ad exposure pixels come back and are mapped against the internal graphs they rent from data partners. Unlike exchanges, DSPs do have lists of notional people, built by graphing devices, cookies, emails, IPs, and physical addresses together, and they leverage these lists to buy reach through direct deals and bidding on exchanges.
Leveraging their respective sign-in and inventory sales partnerships, Facebook and Google track many publishers (both websites and apps) and provide bidding with frequency caps across many publishers, guaranteeing greater reach. The capability to track across sites is the strategic advantage that Facebook is starting to lose with Apple’s moves and new opt-in rules from GDPR and CCPA. While Google faces similar challenges, they retain control of tracking through their Android and Chrome distribution systems.
While publishers can continue to track consumption of their own content and collect personally identifiable information, such as IP addresses, from users with permission, there are strict rules about sharing that information with data partners, exchanges and advertisers. The informed consent of the user is required to transfer the information to other parties and the data must not be used for any purposes except for those specified at the time of collection. The full impact of how supply side “selling” platforms (SSPs), exchanges, and demand side “buying” platforms (DSPs) handle personally identifiable information outside of the European Union and the UK is yet to be determined.
Data handling is evolving fast. Some companies are avoiding personally identifiable information by generalizing their data to aggregations so individuals cannot be identified. A simple way to become GDPR compliant is to link consumer information at the post code level. Other companies are specializing in handling personally identifiable information for companies that have permission to use it.
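A minimal sketch of that aggregation approach, assuming a simple k-anonymity-style rule that suppresses postcodes with too few records (the threshold of 5 is an illustrative choice, not a legal standard):

```python
from collections import defaultdict

def aggregate_to_postcode(records, min_group_size=5):
    """Roll individual-level records up to postcode totals, suppressing
    postcodes too small to hide any one individual."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["postcode"]].append(rec["spend"])
    return {
        pc: {"households": len(vals), "total_spend": sum(vals)}
        for pc, vals in groups.items()
        if len(vals) >= min_group_size  # drop groups that could identify someone
    }

summary = aggregate_to_postcode(
    [{"postcode": "SW1A", "spend": s} for s in [10, 20, 30, 40, 50]]
    + [{"postcode": "EC1A", "spend": 99}],  # only one record: suppressed
)
```

The individual spend values never leave the aggregation step; downstream partners see only postcode-level totals.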
How CCPA is interpreted and enforced will likely be similar, though nothing is definite as activists and companies battle each other in this privacy war.
Most likely, people will be fine with being measured but not targeted, and enforcers will consequently treat these as different use cases. The winners will be the measurers, who track activity but do not need to share personally identifiable information in selling their measurements. The losers will be the targeters and re-targeters of individuals and narrow cohorts, where personally identifiable information is necessary to frame the cohort.
The ability to measure will be controlled by those who either own the content or own the distribution systems.
Facebook will be the most challenged, as they gradually get cut off from their 3rd party sign-in data. Google will find that their targeting will need to be more generalized, but will still have the advantage of controlling and tapping into Android / Chrome tracking data beyond their content tracking.
Amazon sits in a differentiated position, being a retailer with explicit permission to collect consumer information in relation to both purchases and reviews. So advertisers do not need to share personally identifiable information to buy advertising space that targets specific buyers. Amazon’s marketplace size makes this valuable to advertisers.
Beyond marketing through retailers, advertisers will need to relearn the value of cohorts that are not simply lists of people or addresses.
Everyone will return their attention to reach. For digital media, it will be the beginning of really taking reach seriously.
Vertically integrated platforms always struggle with innovation. The importance of the whole limits variation of the parts. This applies to product packaging, innovations, and profit margins. Experimentation and pivots that disrupt the platform or its margins are not allowed.
Modular architecture opens the door to innovation, allowing upgrades and replacements without disrupting workflows. Modules can insert two steps into a workflow, where there was one, or conversely converge two steps into one, depending on the needs and advantages of the innovations. The architecture allows evolutionary processes to take place, where ecosystems turn over as their parts evolve.
Television is expanding into streaming, and evolving privacy laws are driving experimentation in media measurements, analytics, planning, and activation.
In US streaming, everyone is trying to quantify whether planning reach matters and, if so, how to estimate it. On the yes side, Nielsen is linking census data from the streaming services to its projectable panel to estimate combined linear, delayed viewing, and streaming ratings and reach. Given the volume of correction notifications, Nielsen is finding this to be a bumpy road. Meanwhile, Comscore is charging in. Others are graphing together cable, smart TV, and similar datasets to crack this nut too.
The preeminence of walled gardens from the social / search companies (Facebook, Google, etc.), broadcast / cable companies (Disney, NBCU/Comcast, etc.), publishers (individuals and groups), information companies (IHS Markit, Transunion, Experian, etc.), research companies (Nielsen, Ipsos, Kantar, etc.), device OEMs (Apple, Samsung, Roku, etc.), and retailers (Amazon, Walmart, etc.) leads to gated and siloed data. Oddly, the rise of privacy laws may make synthesizing and analyzing these data easier by encouraging digital natives to see the value of probabilities.
In Europe, and coming back to the US, measurement and analytics are converging on postcodes as a straightforward way to associate granular data without requiring special permissions and handling for privacy. You can still track the journeys of your own customers through your own campaigns with cooperating publishers and pixels, as long as culled device graphs survive the privacy laws. However, the prospect of analyzing potentials and targeting beyond your own customer base will become more probabilistic, as look-a-like projections anchor on postcodes. Tracking behaviors below the postcode will soon require both first party permissions and specialized machine learning techniques, like Federated Learning, to identify and quantify behaviors and characteristics while keeping personal data unidentifiable.
In China, Procter & Gamble is testing an algorithm dubbed CAID that collects anonymous device data through apps that Apple does not currently block, such as start-up time, model, time zone, country, language, and IP address, to keep device graphs alive by creating virtual device IDs to track behavior and performance.
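CAID's actual recipe is not public; the following is only a generic illustration of how quasi-stable attributes like those listed could be hashed into a virtual device ID that persists as long as the attributes do. The attribute list and ID length are assumptions.

```python
import hashlib

def virtual_device_id(attrs):
    """Hash quasi-stable device attributes into one fingerprint.
    Illustrative only: not CAID's actual attribute set or algorithm."""
    keys = ["model", "time_zone", "country", "language", "ip_address", "boot_time"]
    canonical = "|".join(str(attrs.get(k, "")) for k in keys)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

device = {"model": "iPhone14,2", "time_zone": "Asia/Shanghai",
          "country": "CN", "language": "zh-CN",
          "ip_address": "203.0.113.7", "boot_time": "2024-05-01T08:00"}

# Two apps seeing the same attributes derive the same virtual ID,
# without any Apple-issued identifier being involved.
a = virtual_device_id(device)
b = virtual_device_id(dict(device))
```

The fragility is visible in the sketch: if any input attribute changes (a new IP address, a reboot), the virtual ID changes too, which is why fingerprinting graphs need constant refreshing.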
Census samples measure all activities of specific apps or devices. But census samples become silos when limited to deterministic matching in a privacy-aware world. The only way around this is to apply probabilities.
The traditional technique leverages panel samples, drawn from enumerated universes, to calibrate the relationships between anonymized versions of the census samples: estimating narrower segments within samples and behaviors across census samples.
Newer techniques leverage machine learning to aggregate (“learn” is the term of art) the relationships of granular behavior while keeping personally identifiable information impossible to discover.
Projecting these samples still needs to be done and is more complicated. The classic approach is to project using a randomly drawn panel.
Modular adtech allows varying data, governance rules, algorithms, and workflows to be experimented with without disrupting the whole platform.
We see an opportunity for modular initiatives that plug-and-play with mature platforms. Consultants who imagine new methods can work with companies like MediaBrain to code their method into plug-and-play modules, allowing Consultants to scale their custom work.
MediaBrain is a modular player that enhances ERPs (enterprise resource planning systems). Its OptiBrain module ingests effectiveness measures and reports out optimal plans that deliver the most effective potential for a given price, or guides decisions by framing the best-price solutions for different levels of effectiveness.
The modular solutions are natural partners for the ERPs, being innovation labs for the adtech components, enabling their mature ERP ecosystems to become nimble and evolve with the digital transformation of advertising and communications.
Ad Tech is starting to evolve again, so it is time to take a look at it, poke at it, and offer a viewpoint on where it is going. To do that, it is necessary to sound a touch clinical in framing what’s up before offering the opinion.
Performance is about effectiveness and efficiency. Effectiveness is “what works”; efficiency is “doing it for the least amount of money”.
Effectiveness entails identifying, targeting, and converting prospects into sales. To do this efficiently, you need to unify the measurements of these activities and then forecast, optimize plans (for optimal return on investment), and activate opportunities.
The key data are prospects, exposures, duplications, presale steps, and sales.
For Connected TV, we need exposure patterns for prospects to forecast where we are likely to find them. Prospects are population segments. Exposures are tracked in panels and census samples. Since panels can only report broad segments like gender and ages, it is necessary to combine information from panels and censuses to discern exposure patterns for prospect segments.
Census & Panel Data
Census samples measure all activities of specific apps or devices. Panel samples relate data across different census samples and, in the best cases, project them to universes.
Panels are used as general estimation tools, or as calibrators for census data. In the latter case, narrow segments of census data can be projected to national estimates.
Often overlapping samples work better together. As an example, a smart TV census sample can gauge the duplication between linear and streaming TV, while a “projectable” panel sample can be used to calibrate that smart TV data using geography and devices to create a converged measurement. Even without granular data from one or both of the sources, you can still leverage the aggregated patterns to inform both duplication and projections.
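One simple version of such calibration: the panel yields per-segment projection factors (for example, persons per observed device against the enumerated universe), which are then applied to the census sample's granular counts. The segments and factors below are invented for illustration.

```python
def project_census(census_counts, panel_factor):
    """Scale raw census device counts to population estimates using
    per-segment factors derived from a projectable panel."""
    return {seg: round(n * panel_factor[seg]) for seg, n in census_counts.items()}

# Census gives granular device counts; the panel says how many people
# each observed device represents in each segment.
estimates = project_census(
    {"adults_18_34": 1200, "adults_35_54": 900},
    {"adults_18_34": 2.5, "adults_35_54": 3.0},  # illustrative panel-derived factors
)
```

The census keeps its granularity; the panel anchors the level, which is the division of labor described above.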
Advertisers can implant pixels on their digital assets (ads, websites, apps, and presale steps) to track and connect exposures to outcomes.
Sellers often offer to implant pixels on a buyer's digital assets to provide an audit of their performance. If the seller also supplies which programs the ads were on and when they were viewed, this can be used with the viewing planning data sources to evaluate which opportunities have better engagement.
To connect pixels deterministically, it is necessary to have a device graph. When an activity happens like “the ad has been successfully served”, the pixel “fires”, announcing the date, time, device, and sometimes location. The device graph says which devices have the same owners. The graph connects multiple activities to specific devices which can then be mapped through similar graphs to specific homes and people.
If privacy rules do not allow deterministic connections, then proximity matching (or more nuanced probability scores to create synthetic agents) using date, time, and more generalized location data such as post code and post code profiles can be an excellent near-term substitute and long-term solution, as the emerging privacy laws will require this.
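A toy version of such proximity matching, assuming only postcode and timestamp are available. The linear decay and 24-hour window are illustrative placeholders, not tuned values.

```python
from datetime import datetime

def match_score(exposure, outcome, max_gap_hours=24):
    """Score how likely an exposure and an outcome belong to the same
    household, using only coarse signals: shared postcode + time proximity."""
    if exposure["postcode"] != outcome["postcode"]:
        return 0.0
    gap = abs((outcome["time"] - exposure["time"]).total_seconds()) / 3600
    if gap > max_gap_hours:
        return 0.0
    return round(1.0 - gap / max_gap_hours, 3)  # decays linearly with time gap

# An ad seen at 9:00 and a site visit at 15:00 in the same postcode.
score = match_score(
    {"postcode": "10001", "time": datetime(2024, 5, 1, 9, 0)},
    {"postcode": "10001", "time": datetime(2024, 5, 1, 15, 0)},
)
```

Scores like these can then be thresholded into synthetic agents, or carried forward as weights so downstream attribution stays probabilistic.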
If Advertisers are managing their own pixels, they can often connect them all the way to sales. Brand direct advertisers generally know the people who they are selling to. Brand advertisers that sell through intermediary retailers need to leverage other census and panel data sources focused on sales to make this connection. This can likewise be done either deterministically or probabilistically.
A segment represents a specific slice of the universe. In the case of prospects, they are potential purchasers. Segments are generally used to target potential behaviors with the purpose of converting potentials into action.
Advertisers often know past purchasers. They leverage this list of people to find similar people, through “look-a-like” models, to target a larger list of potential purchasers. Sometimes Advertisers buy characteristics or a list of people who purchase competitive or similar products to expand their scope of potential purchasers.
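A minimal look-a-like sketch: score prospects by their cosine similarity to the centroid of known purchasers' behavioral features. The feature vectors and names are invented; production look-a-like models use far richer features and learned weights.

```python
import math

def lookalike_scores(seed_profiles, candidates):
    """Rank candidates by similarity to the average ('centroid') profile
    of known purchasers."""
    dims = len(seed_profiles[0])
    centroid = [sum(p[i] for p in seed_profiles) / len(seed_profiles)
                for i in range(dims)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm if norm else 0.0

    return sorted(((cosine(c, centroid), name) for name, c in candidates.items()),
                  reverse=True)

ranked = lookalike_scores(
    [[5, 1, 0], [4, 2, 1]],                  # known purchasers' feature vectors
    {"alice": [5, 1, 1], "bob": [0, 0, 9]},  # prospects to score
)
```

Prospects whose behavior resembles the seed list rank highest and become the expanded target segment.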
These segments become their targets, and the advertiser then plans how to convert potentials into action through various communication investments.
The original focus of measuring direct response to digital ads conflated media planning and buying. The immediacy of direct response to gauge performance has led media investment decisions to become tactical and in some cases real-time. This makes tremendous sense for sales of known products that have no brand differentiation or message.
For advertisers that want to create brands to enhance pricing and sales opportunities, a messaging plan prior to activating messages is necessary. Branding objectives and hence their effectiveness measures vary.
What’s the best way to plan for effectiveness? First, we need ways to measure it. Reach and frequency: how many times and how often has your target segment seen the ad? Content performance, or “Was the context right?” (some messages work better than others in a video about airplane crashes!). Most importantly, does the message resonate with the people you’re trying to reach? Resonance is typically measured with branding surveys and A/B testing of messages.
These effectiveness measures then become inputs to planning future campaigns. For newer brands, these can be based on look-a-like brand benchmarks or market test results.
Wielding these inputs to plan forward involves mountains of data processing to figure out a brand’s potential and the least expensive way of getting there. This is where planning pivots from focusing on effectiveness measures to efficiency.
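A toy illustration of that pivot to efficiency: a greedy planner that buys the cheapest reach first until the goal is met. It ignores audience duplication between sources, which real planners must model, and the opportunities listed are invented.

```python
def cheapest_plan(opportunities, reach_goal):
    """Greedy sketch: buy from the cheapest cost-per-reach-point source
    first until the reach goal is met. Ignores duplication for clarity."""
    plan, cost, reach = [], 0.0, 0.0
    for name, unit_reach, unit_cost in sorted(
            opportunities, key=lambda o: o[2] / o[1]):  # cost per reach point
        if reach >= reach_goal:
            break
        plan.append(name)
        reach += unit_reach
        cost += unit_cost
    return plan, reach, cost

# (source, reach points delivered, cost) -- illustrative numbers.
plan, reach, cost = cheapest_plan(
    [("ctv", 10, 50.0), ("social", 8, 24.0), ("search", 5, 20.0)],
    reach_goal=15,
)
```

Real optimizers solve this with duplication curves and diminishing returns rather than a single greedy pass, but the objective is the same: the least expensive route to the effectiveness goal.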
Adtech is evolving from two connected quagmires: privacy and operations. In “The Rise and Fall of Deterministic Analytics”, we addressed the impact of privacy on data ownership, management, and attribution modeling.
As privacy challenges arise, Advertisers are seeing their response data becoming more integral to internal strategies and operations, with analytics and workflows becoming more automated. This inward pull and customization of adtech is disrupting large vertically integrated, external adtech solutions. Friction is increasing. Large tech companies want to control the adtech ecosystem while advertisers want to integrate the data and tech into their own enterprise resource planning (ERP) systems.
The new players chomping at the adtech pie will be the ERPs and the modular players, who can plug-and-play and reside in anyone’s cloud.
Oracle exemplifies ERP companies trying to incorporate adtech.
MediaBrain is a modular player that enhances ERPs. Its OptiBrain module ingests effectiveness measures and reports out optimal plans that deliver the most effective potential for a given price, or guides decisions by framing the best-price solutions for different levels of effectiveness.
Companies that provide modular solutions are natural partners for the ERPs. Modular components enable innovation through enhancements and replacements without disrupting mature ecosystems. Adtech modules allow ERP platforms to evolve with the digital transformation of advertising and communications.
Until recently, deterministic analytics was on the ascendency with its claim of following the journey of actual consumers from their first exposure to the product, in ads and displays, to every exposure along the way to the purchase. Tracking every touchpoint and attributing its value towards getting the sale is the objective of multi-touch attribution (MTA).
From the digital certainty that someone clicked my ad and bought my product came the religion of determinism. The digerati of the ad world, led by the goliaths Google and Facebook, who controlled large swaths of the data, argued that only events that could be directly traced counted. This posed a huge challenge to linear media, whose messages were broadcast to and received by unknown persons. The consequence was that television got squeezed and the rest of linear media (newspapers, magazines, etc.) came tumbling down as ad dollars went online.
Directly tracing what people want is a fast and powerful tool for deciding what to create and sell. Google and Facebook taught this lesson to advertisers, and direct-to-consumer sales took off. Netflix taught the TV industry this lesson, and television is now transforming.
Tracking digital media and messages is well understood and practiced with cookies, pixels, and device IDs. Since advertisers buy exposures or clicks, they only need to track their advertising message. Complexities arise when the advertiser wants to govern the acceptability of the content its messages show in, but that is another discussion entirely. Determining the journey of exposures of their advertising message requires the advertiser to connect all the touch point identifiers: the cookies, pixels, and device IDs to persons. The challenge is connecting the myriad of cookies and pixels to device IDs and then to persons and transactions.
Ecosystems, such as data management platforms (DMPs) and later consumer data platforms (CDPs), rose to connect these data. Google and Facebook built tagging and tracing systems - through their activation platforms - to enable smaller advertisers to gain journey insights without having to finance a DMP or CDP. Google with its urchin tracking module (UTM) tags takes this a step further with their Analytics, letting advertisers track their website traffic and non-Google digital ads too. Advertisers get the journey analytics leading up to the transactions, but not who they are, making the final connection to the transaction probabilistic.
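As a concrete example of the UTM mechanism: the attribution rides along as plain query parameters on the landing-page URL. The three parameters shown are the standard core set; the URL and values are hypothetical.

```python
from urllib.parse import urlencode, urlparse

def tag_url(base_url, source, medium, campaign):
    """Append the three core UTM parameters so a site's analytics can
    attribute the visit back to the placement that drove it."""
    params = urlencode({
        "utm_source": source,      # e.g. the publisher or network
        "utm_medium": medium,      # e.g. cpc, display, email
        "utm_campaign": campaign,  # the advertiser's campaign name
    })
    sep = "&" if urlparse(base_url).query else "?"
    return f"{base_url}{sep}{params}"

tagged = tag_url("https://shop.example.com/landing",
                 "newsletter", "email", "spring-sale")
```

Because the tags live in the URL rather than in a cookie, they survive even in browsers that block third party tracking, which is part of why UTM-based journey analytics remain useful.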
Linear is more complicated. Linear is one-way communication. In some cases, there are devices that say it was received, such as cable boxes or smart TVs; in these cases, there is the possibility of connecting linear exposures to journeys. In cases where no device reports reception, it is impossible to connect such exposures to journeys. Beyond this leakage of touch points, the challenge of corralling the devices that report receiving linear is immense, as they are owned and controlled by a myriad of competing cable and smart device companies. To get complete coverage on linear, you have to get all this data and then distill it down to the time and channel your advertisement aired to see who was exposed. Alternatively, you can get a large sample of this exposure data, connect it to your journeys, and then model the touch point gaps from the data that you could not get. Getting complete coverage is practically impossible. No one gets it. Consequently, everyone models the gaps.
Then there is the challenge of objective. Do you want to measure Direct Response or Brand Resonance? Last touch attributions are always focused on direct response.
Since the determinist puritans came from digital direct marketing, these challenges are waved aside as yet-to-be-transformed parts of media and marketing.
Privacy is starting to take root and is likely to kill determinism for brand marketing. Europe’s General Data Protection Regulation (GDPR) is moving through court cases that are defining its scope. The California Consumer Privacy Act (CCPA) is just starting its court cases.
Personal information is any information, including patterns, that can be identified back to a person or household.
The basic ideas of these two laws are that people must give informed consent to be tracked, that consent applies to each use case, and that collected data may be used only for the purposes specified at collection.
Google and Apple are now starting to champion privacy to their advantage. In Google's case, they are no longer allowing third party cookies, making the tracing of journeys beyond their walls probabilistic. In Apple's case, they are requiring users to opt in to tracking for each app, making journeys that include Apple devices probabilistic.
Probabilities that associate data and draw performance insights are being tapped to deal with the reality that not all data can be connected deterministically anymore.
Determinism will continue with last touch attribution. Brands will need to find probabilistic paths to handle the increasing data gaps that privacy brings if they want to maintain or grow their resonance.
MediaBrain expects ad-tech and mar-tech startups to focus on the implications of privacy and to transform the ecosystem infrastructure. We expect artificial intelligence to play a role in filling these gaps. Soon we will hear about “machine learning” all over again. This time it will come back as “privacy preserving machine learning”. Think of learning bots that go from one private data source to another to develop an aggregate view of how these sources look and behave according to different characteristics. The learnings are encoded with one-way math, so that others cannot reverse engineer them to identify any of the private information.
The first generation of learning bots will focus on predicting behaviors from common characteristics, such as age, gender, wealth, and location. The second generation will move to deep learning methods that do not presuppose which characteristics are predictive and look to discover which data are explanatory and of what. These multilayer techniques are currently being used to recognize things in photos. Of course, the sequencing that privacy requires complicates the maths. Google is starting to deploy some first generation privacy preserving learning bots across the Android ecosystem with a technique called Federated Learning. The second generation of privacy preserving deep learning is still in development.
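A toy sketch of the first-generation idea, using federated averaging on a one-parameter model: each private source computes a gradient of a shared model on its own data and reports only that update, never the raw records; a central server averages the updates. The datasets, learning rate, and round count are invented for illustration, and real federated systems add the one-way encoding (secure aggregation) described above.

```python
def federated_average(local_datasets, rounds=5, lr=0.05):
    """Toy federated learning: fit a shared slope y ~ w * x where each
    source shares only its local gradient, never its (x, y) records."""
    w = 0.0
    for _ in range(rounds):
        grads = []
        for data in local_datasets:  # each dataset stays "on device"
            g = sum(2 * x * (w * x - y) for x, y in data) / len(data)
            grads.append(g)
        w -= lr * sum(grads) / len(grads)  # server averages the updates
    return w

# Three private sources whose data all follow y = 2x; only gradients move.
w = federated_average([[(1, 2), (2, 4)], [(3, 6)], [(4, 8), (5, 10)]],
                      rounds=50)
```

The server ends up with a model that reflects behavior across all three sources even though no source ever revealed an individual record, which is exactly the trade the emerging privacy laws push toward.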
Look for these methods to start transforming the ad-tech and mar-tech ecosystems over the next five years.