Methods for integrating acquired datasets – Part II
When Zynga was acquired by Take-Two last month, much of the analysis and press revolved around Zynga’s ability to use the combined data for “improving monetization and growing revenue” across the two companies. Ben Thompson, with his always brilliant analysis, highlights another major driver behind the acquisition:
Zynga, meanwhile, was among the least prepared of the major mobile gaming companies for the changes wrought by Apple’s App Tracking Transparency (ATT) policy, which was introduced with iOS 14 and rolled out over the first half of 2021. In the pre-ATT world everyone from e-commerce sellers to app developers could effectively offload the collection and analysis of conversion data and subsequent targeting of advertising to Facebook, to the benefit of everyone involved: individual developers and retailers did not need to bear the risk or expense of collecting and analyzing data, and could instead collectively outsource that job to the Facebook data factory.
“Data” is constantly referenced as a reason for, or an advantage of, merging content and technology companies. But beyond the nebulous term “data,” there isn’t much detail on what this data is or how it is used. In our last article, we explored what these companies are actually going to do with the combined data. Today, we’ll talk about how companies are going about combining and leveraging these datasets.
Big data is hard. Really hard. Mobile apps generate a ton of data, so combining datasets is an ambitious and expensive undertaking. There are questions of data connections, storage, access, and ongoing maintenance, and that’s before considering privacy and compliance. Like most ambitious projects, the details of execution can hamstring efforts. In the words of Patton, “bad tactics will destroy even the best strategy.” We work with and talk to many large-scale app publishers, vendors, and tech providers, and have seen a variety of strategies for combining and leveraging large datasets. While outlining how to build a data platform is outside the scope of this article, we do have some excellent examples, ranging in sophistication, of approaches we’ve seen.
How companies are combining and accessing datasets
Method 1: Periodic Reporting. “Excel Jockeys”
I’m the last person to knock Excel; studies show it still powers budgeting for 89% of organizations. It’s not sexy and you won’t hear anyone bragging about it publicly, but sharing spreadsheets is still hands-down the fastest and easiest method for a company to combine data. You might be surprised at the scope and scale of the organizations whose “data sharing” initiative is still combining spreadsheets. But hey, if it’s stupid and it works, it ain’t stupid.
Pros: It’s fast, it works, it’s ubiquitous.
Cons: Scale relies on meatware, and you might find yourself relying on a single analyst for the entire company’s financial reporting. Large datasets will quickly hit the limits of Excel, which caps a worksheet at 1,048,576 rows.
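To make the “meatware” point concrete, here is a minimal sketch of what an analyst effectively does by hand when combining two studios’ exports: rename columns to a shared set of names, then total revenue by day. The studio names, column names, and figures are all invented for illustration, and the exports are inlined as CSV text so the sketch is self-contained.

```python
import csv
import io

# Hypothetical exports from two studios; names and numbers are invented.
STUDIO_A_CSV = """date,app,revenue_usd
2022-01-01,Puzzle Town,1200.50
2022-01-02,Puzzle Town,980.00
"""

STUDIO_B_CSV = """Day,Title,Rev
2022-01-01,Farm Rush,450.25
2022-01-02,Farm Rush,510.75
"""

# Map each studio's column names onto one shared set of names.
COLUMN_MAPS = {
    "studio_a": {"date": "date", "app": "app", "revenue_usd": "revenue_usd"},
    "studio_b": {"Day": "date", "Title": "app", "Rev": "revenue_usd"},
}


def load_export(csv_text, studio):
    """Parse one export and rename its columns to the shared names."""
    mapping = COLUMN_MAPS[studio]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {mapping[col]: value for col, value in raw.items()}
        row["revenue_usd"] = float(row["revenue_usd"])
        row["studio"] = studio
        rows.append(row)
    return rows


def revenue_by_date(rows):
    """Sum revenue across all studios for each date."""
    totals = {}
    for row in rows:
        totals[row["date"]] = totals.get(row["date"], 0.0) + row["revenue_usd"]
    return totals


combined = load_export(STUDIO_A_CSV, "studio_a") + load_export(STUDIO_B_CSV, "studio_b")
daily_totals = revenue_by_date(combined)
```

Every new studio adds another column mapping that someone has to maintain, which is exactly where this method starts to strain.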
Method 2: The enterprise approach. Centralized data teams – “I have a guy”
I have a friend who runs the UA budget at a top-10 app company, one that comfortably spends seven figures a month on UA. We were recently talking through preferred methods of predicting user LTVs, and he said something that sums up this approach: “Oh, I have a guy for that. Whatever questions I have, I send his way and his team of PhDs answers it.” This centralized BI team represents a more traditional enterprise approach to accessing data. It was invented back when Data Engineers were called DBAs: an internal IT team would field business user requests, set priorities, and knock out the business intelligence requests and data projects.
Pros: Once up and running, these teams are capable and have repeatable processes, and they give the organization the support of highly specialized people. Centralization means innovations can be more readily applied.
Cons: These teams often function as internal cost centers. As such, they generally receive just enough investment to work. They can be slow-moving, with the team scrambling through tickets and shifting business priorities, which may impede the ability to get answers quickly. And finally, they are expensive: data engineers and scientists are in high demand and command very impressive salaries.
Method 3: The heavy hand. Standardized Technology – “Install this SDK”
Zynga is infamous for throwing an SDK over the fence of a newly acquired company and saying: “welcome aboard. Integrate this.” A corporate SDK functions as the first level of standardization to unify a tech stack. Often there’s universal analytics, a corporate-wide agreement with a chosen ad mediator, the same pre-tested grouping of ad network SDK bundles, and the same MMP. This isn’t just for massive companies: at a smaller scale, we’ve seen even indie app developers with modest portfolios standardize internally across apps to simplify ongoing product development and operations.
Pros: Obviously the same integration makes data unification straightforward, and a standardized tech stack can simplify product development and QA across organizations.
Cons: First, while this may simplify ongoing data collection, it doesn’t solve the problem of integrating legacy data, so the challenge of getting value from pre-SDK data will still exist. Another problem with this approach is speed. An SDK footprint takes a long time to develop and deploy. Testing new partners and technology innovations is stifled by a centralized stack that quickly becomes stale and outdated. And finally, and potentially most important, organizations often aren’t using the same technology because they fundamentally operate differently. Mandating an approach could slow or impede the ability of a company or app to be effective.
Method 4: Org-level data pipelines
A novel and increasingly popular approach we’ve seen is enabling studios to create their own data pipelines. Data is stored in a central repository. Each team’s data engineers build pipelines that push the necessary reporting and metrics to the centralized system and retrieve the company-wide data they need, while the team continues to make its own technology decisions at the organization/studio level.
Pros: Allowing an organization to retain autonomy over technology decisions means it can use different solutions (the best, or at least the most familiar) that will best help it hit the KPIs it uses to measure success. A standardized method of access offers all the benefits of combining data without the heavy-handed, draconian approach of mandating vendors or SDKs.
Cons: To let every organization access, manipulate, store, and visualize the data with its preferred methodology, you still need a data team at the org level. That means staffing and resourcing those people, and since just about every company is hiring for data roles, this can present a hiring challenge. Also, fragmentation across technology vendors means you’re less likely to command the higher discounts realized through consolidation.
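As a rough illustration of the org-level pipeline idea, the sketch below has each studio own a small adapter that converts its raw data into one shared record shape before pushing to a central store. Everything here is an assumption made for the example: the `DailyMetric` shape, the in-memory `CentralRepository`, the studio names, and the raw formats. A real setup would sit on a warehouse or data lake, not a Python list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyMetric:
    """Shared record shape every studio pipeline pushes (hypothetical)."""
    studio: str
    date: str    # ISO date, e.g. "2022-01-01"
    metric: str  # e.g. "revenue_usd"
    value: float


class CentralRepository:
    """In-memory stand-in for the central data store (hypothetical)."""

    def __init__(self):
        self._records = []

    def push(self, records):
        """Accept records only if they follow the shared shape."""
        for record in records:
            if not isinstance(record, DailyMetric):
                raise TypeError("pipelines must push DailyMetric records")
            self._records.append(record)

    def company_total(self, metric):
        """Return the company-wide total of one metric per date."""
        totals = defaultdict(float)
        for record in self._records:
            if record.metric == metric:
                totals[record.date] += record.value
        return dict(totals)


def studio_a_pipeline(raw_rows):
    # Assumption: Studio A's analytics tool reports revenue in cents.
    return [
        DailyMetric("studio_a", row["day"], "revenue_usd", row["revenue_cents"] / 100)
        for row in raw_rows
    ]


def studio_b_pipeline(raw_rows):
    # Assumption: Studio B's tool already reports dollars, as (day, usd) tuples.
    return [DailyMetric("studio_b", day, "revenue_usd", usd) for day, usd in raw_rows]


repo = CentralRepository()
repo.push(studio_a_pipeline([{"day": "2022-01-01", "revenue_cents": 120050}]))
repo.push(studio_b_pipeline([("2022-01-01", 450.25)]))
totals = repo.company_total("revenue_usd")
```

The design choice to notice is that only the shared record shape is mandated. Each studio keeps its own tools and raw formats, but it also has to staff someone to write and maintain its adapter.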
Method 5: Democratizing data. Empowering everyone to answer questions
We believe the cornerstone of a data-driven business is removing the obstacles to getting insights and answers from data. This approach provides a centralized repository for consolidated and unified data and enables anyone in the business to access the combined data without technical skills: from analyst to executive, anyone can answer questions about the business. Full disclosure: we believe in this approach so much, we’ve built a platform to achieve it.
Pros: Removing the technical hurdles means fundamental business questions (how much does completing a tutorial impact LTV?) can be answered without resorting to technical teams. That helps product, marketing, management, and monetization teams set the correct priorities, measure effectiveness, and drive impact.
Cons: Without a team that is incentivized, or able, to drive change, even the best insights are wasted. A winning lottery ticket means nothing to a person who’s stuck in a burning house.
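For a sense of what sits behind a question like the tutorial one, here is a minimal sketch that compares average LTV for users who did and did not complete a tutorial. The field names and values are invented, and it assumes a unified per-user table already exists. It is not how any particular platform computes this.

```python
from statistics import mean

# Hypothetical per-user rows from a unified dataset; values are invented.
users = [
    {"user_id": 1, "completed_tutorial": True, "ltv_usd": 4.0},
    {"user_id": 2, "completed_tutorial": True, "ltv_usd": 6.0},
    {"user_id": 3, "completed_tutorial": False, "ltv_usd": 1.0},
    {"user_id": 4, "completed_tutorial": False, "ltv_usd": 2.0},
]


def average_ltv_by_tutorial(rows):
    """Group users by tutorial completion and average LTV within each group."""
    groups = {True: [], False: []}
    for row in rows:
        groups[row["completed_tutorial"]].append(row["ltv_usd"])
    # Skip empty groups so mean() is never called on an empty list.
    return {done: mean(values) for done, values in groups.items() if values}


avg = average_ltv_by_tutorial(users)
difference = avg[True] - avg[False]  # gap in average LTV between the groups
```

A gap like this is a correlation, not proof that the tutorial causes higher LTV, since users who finish a tutorial may simply be more engaged to begin with.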
Bringing data together is just the beginning
The architecture and technology to centralize data are an important step in enabling a data-driven business. However, an often-overlooked critical factor is who will be using it. Even the best weapon is useless without someone to wield it. In our next article, we’ll cover how various teams use data to drive their business forward.