Third-party data — or data acquired from outside sources not created internally — can be a critical component of any data and analytics infrastructure. Available across a wide variety of providers and industries, third-party data is used to optimize both business processes and analytical insights.
In many cases, third-party data defines the standards and governance policies required to support the operations of complex value chains spanning multiple businesses and processes.
Given the ubiquity and utility of third-party data, many businesses depend heavily on data acquired from other entities. This means the efficient management of third-party data can be a competitive differentiator for companies that do it well.
This article explores some of the common use cases for third-party data (agnostic of industry) as well as some key considerations for engaging providers of third-party data.
Some Common Use Cases for Third-Party Data
Third-party data is used widely by both business units and data and analytics teams across every conceivable industry and company. While there are far too many individual business or industry-specific use cases to list here, there are several common themes spanning both business and IT uses.
No company can build a complete picture of its customers, products, employees, or any other business-critical domain from first-party data alone. Without outside sources, the depth of insight into these entities is limited to a company’s own interactions with them.
Third-party data can enrich or augment internally sourced data with data that would be too expensive (or impossible) for any one company to capture alone. Examples are many, and include customer demographics, behavioral data, credit or risk data, and geospatial data.
Companies may also need data on entirely new markets, or insights into areas where they don’t operate today. This could include competitive intelligence or government datasets that help quantify a new product or market opportunity.
AI Training Data
Companies are increasingly likely to acquire third-party data to help train or tune AI models. This will particularly be the case for smaller or midsized companies that lack the volume and variety of data required to develop more robust AI models.
This need among smaller companies for AI training data may lead to the creation of consortia of companies that voluntarily share data, enabling them to compete more effectively with larger companies that have access to more robust datasets.
‘Pre-Mastered’ Data, Including Reference Data
Acquiring data from a company that specializes in ensuring data adheres to specific governance policies (including quality standards) means companies get to avoid all the dirty work required to make the data fit for purpose. It also means they get to benefit from greater interoperability in situations where sharing data across business entities is required for efficient operations of a complex value chain or business process.
A great example here is the data produced by standards bodies, such as the International Organization for Standardization (ISO), that defines the international quality and governance standards for data used widely by companies and consumers across the globe.
Data Quality and Verification
For many companies, paying a vendor to verify and validate data may be far more cost-effective than hiring a small army of data stewards to manage that process individually.
Key Considerations When Working with a Third-Party Data Vendor
Third-party data is provided by outside entities. This means establishing a solid relationship and operating agreement (often in the form of a contract or license agreement) with the provider of that data is critical to the success of any third-party data initiative. Here are some key considerations.
The Vendor or Source
Can the vendor be trusted? Are they willing to share their processes for how they acquire and manage data? Have they been operating for a long time and are they willing to provide customer references? Will they support some form of a proof of concept (POC) and give access to a test data set?
The Governance Policies Used to Create or Source the Data
This especially applies to data quality and lineage. Just because the vendor applies consistent governance policies doesn’t mean those policies align with the policies of the company consuming the data.
Knowing in advance any gaps between the vendor policies and the policies of the third-party data consumer is critical to the ability to operationalize this data and get value from it. Is the vendor willing to provide a dictionary and supporting definitions for the data they provide?
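One way to surface these gaps early is to compare the vendor’s data dictionary against internal definitions before any data is loaded. The sketch below is only illustrative: the `FieldSpec` structure, the field names, and the three gap categories are assumptions made for the example, not a standard format that any vendor publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical, minimal description of one field in a data dictionary."""
    dtype: str
    nullable: bool


def dictionary_gaps(vendor: dict[str, FieldSpec],
                    internal: dict[str, FieldSpec]) -> dict[str, list[str]]:
    """List internal fields the vendor dictionary does not satisfy."""
    gaps = {"missing_from_vendor": [], "type_mismatch": [], "nullability_mismatch": []}
    for name, wanted in internal.items():
        provided = vendor.get(name)
        if provided is None:
            gaps["missing_from_vendor"].append(name)
            continue
        if provided.dtype != wanted.dtype:
            gaps["type_mismatch"].append(name)
        # The vendor allows blanks where the internal standard requires a value.
        if provided.nullable and not wanted.nullable:
            gaps["nullability_mismatch"].append(name)
    return gaps


# Illustrative dictionaries; every field name here is made up.
vendor_dictionary = {
    "company_name": FieldSpec("string", nullable=False),
    "annual_revenue": FieldSpec("string", nullable=True),
    "country": FieldSpec("string", nullable=True),
}
internal_standard = {
    "company_name": FieldSpec("string", nullable=False),
    "annual_revenue": FieldSpec("decimal", nullable=True),
    "country": FieldSpec("string", nullable=False),
    "tax_id": FieldSpec("string", nullable=False),
}

gaps = dictionary_gaps(vendor_dictionary, internal_standard)
print(gaps)
```

A report like this does not settle whether definitions actually mean the same thing (two fields can share a name and type and still differ in meaning), but it gives data stewards a concrete list to raise with the vendor.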
Baseline Data Quality
How will a data consumer know the quality of the data they are acquiring is fit for a given business purpose? Will the vendor make any warranties about the quality of the data, and if yes, are they willing to share the processes used to assess quality?
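If the vendor provides a test data set, a consumer can run its own baseline checks rather than rely only on the vendor’s assurances. The following is a minimal sketch that measures completeness and duplicate keys; the field names, the sample records, and the 90% completeness threshold are all assumptions chosen for illustration, and a real assessment would likely also cover accuracy, timeliness, and validity.

```python
def profile(records, required_fields, key_field):
    """Compute simple completeness and uniqueness measures for a sample."""
    total = len(records)
    completeness = {}
    for field_name in required_fields:
        filled = sum(1 for r in records if r.get(field_name) not in (None, ""))
        completeness[field_name] = filled / total if total else 0.0
    keys = [r.get(key_field) for r in records if r.get(key_field) not in (None, "")]
    duplicate_keys = len(keys) - len(set(keys))
    return {
        "row_count": total,
        "completeness": completeness,
        "duplicate_keys": duplicate_keys,
    }


def meets_baseline(report, min_completeness):
    """Pass only if keys are unique and every field is complete enough."""
    return report["duplicate_keys"] == 0 and all(
        share >= min_completeness for share in report["completeness"].values()
    )


# Made-up sample standing in for a vendor's test data set.
sample = [
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": ""},
    {"id": "2", "email": "b@example.com"},
    {"id": "4", "email": "c@example.com"},
]

report = profile(sample, required_fields=["id", "email"], key_field="id")
print(report)
print(meets_baseline(report, min_completeness=0.9))
```

Agreeing on measures like these up front also gives both parties something specific to reference if quality warranties are written into the contract.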
Data Access and Integration
Does the vendor support industry-standard APIs, and can the data be accessed in real time? How easy is it to extract the data from the vendor and load it to an external system? Is there significant work required to normalize the third-party data so that it conforms to the structures and architecture of where it’s being consumed?
If a company is using third-party data to augment or enrich data it already has (for example, on customers or products), how will the data being enriched be matched to third-party data? If a company calls a customer “Acme Inc.” and the vendor refers to the same business as “Acme LLC,” how will it be determined if these are the same entities, and how often will data stewards need to manually review records?
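To make the matching question concrete, here is a small sketch of one possible approach: normalize company names, score their similarity with Python’s standard `difflib`, and route uncertain pairs to a data steward. The suffix list and both thresholds are assumptions picked for the example, and a name-only comparison is a simplification; a production process would typically corroborate matches with other attributes such as address or a tax identifier.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mapping of legal-form tokens to a canonical form.
LEGAL_FORMS = {
    "inc": "inc", "incorporated": "inc",
    "llc": "llc",
    "ltd": "ltd", "limited": "ltd",
    "corp": "corp", "corporation": "corp",
    "co": "co", "company": "co",
}


def split_name(name):
    """Return (core name, set of canonical legal forms) for a company name."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    core = [t for t in tokens if t not in LEGAL_FORMS]
    forms = {LEGAL_FORMS[t] for t in tokens if t in LEGAL_FORMS}
    # Fall back to all tokens if the name consists only of legal-form words.
    return " ".join(core or tokens), forms


def match_decision(internal_name, vendor_name, auto_threshold=0.92, review_threshold=0.75):
    """Classify a pair as 'match', 'review' (manual check), or 'no_match'."""
    core_a, forms_a = split_name(internal_name)
    core_b, forms_b = split_name(vendor_name)
    score = SequenceMatcher(None, core_a, core_b).ratio()
    # Same core name but different legal forms may be distinct entities.
    conflicting_form = bool(forms_a and forms_b and forms_a != forms_b)
    if score >= auto_threshold and not conflicting_form:
        return "match", score
    if score >= review_threshold:
        return "review", score
    return "no_match", score


for pair in [("Acme Inc.", "Acme LLC"),
             ("Acme Inc.", "ACME Incorporated"),
             ("Acme Inc.", "Globex Corporation")]:
    print(pair, match_decision(*pair))
```

In this sketch “Acme Inc.” and “Acme LLC” land in the review queue rather than being merged automatically, because the legal forms conflict. Where the thresholds are set largely determines how many records data stewards need to review by hand, which is why this question belongs in the business case.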
Cost and ROI
Third-party data can be expensive. Is there a well-defined business case to justify the acquisition of this data? Does the business case consider all the work that needs to be done to access the data, normalize the data, and align it to the existing definitions and governance standards?
Privacy, Data Security and Other Legal or Regulatory Requirements
Does the vendor guarantee that the data they are providing was sourced in accordance with any legal or regulatory requirements? What processes does the vendor have to ensure the data continues to conform to these regulations in the future?
The Licensing Model and Limitations
Will the consuming business “own” the data acquired from a vendor, or is it just being leased? If it’s the latter, is the consuming company prepared to purge the data from its systems in the event the vendor requests to terminate the agreement?
Will signing a lease agreement for data that’s used in mission-critical business processes effectively bind a given company to that vendor (or one like it) in perpetuity? What limitations are there on the use of the licensed data, and how exactly will a company using that data ensure compliance with those limitations?
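Being able to purge leased data is far easier if every enriched attribute records where it came from at the time it is loaded. The sketch below illustrates that idea with a hypothetical `EnrichedRecord` class and a made-up source label (`vendor_x`); it is an assumption about how provenance could be tracked, not a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedRecord:
    """Hypothetical record that tracks the source of each attribute."""
    record_id: str
    attributes: dict[str, str] = field(default_factory=dict)
    provenance: dict[str, str] = field(default_factory=dict)

    def set_value(self, name, value, source):
        self.attributes[name] = value
        self.provenance[name] = source


def purge_source(records, source):
    """Remove every attribute that came from `source`; return the count removed."""
    removed = 0
    for record in records:
        licensed = [n for n, s in record.provenance.items() if s == source]
        for name in licensed:
            del record.attributes[name]
            del record.provenance[name]
            removed += 1
    return removed


customer = EnrichedRecord("C-001")
customer.set_value("name", "Acme Inc.", source="first_party")
customer.set_value("industry_code", "5411", source="vendor_x")
customer.set_value("employee_count", "250", source="vendor_x")

removed_count = purge_source([customer], "vendor_x")
print(removed_count, customer.attributes)
```

This only covers attributes stored directly on the record. Values derived from licensed data, backups, and copies in downstream systems would need their own handling, which is worth clarifying in the license terms.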
Data Updates and Change Control
Is the use of third-party data a one-and-done (like buying a marketing list), or will there be regular updates to the data? If regular updates are expected, how does that process work, and what is the time interval for it?
When there are updates, will the vendor provide only new data, or will the entire dataset need to be updated and reloaded? If data has been previously integrated into a business process and the vendor changes it, will the existing governance and operational processes allow for the data to be easily updated?
If the data has changed, will that negatively impact any business processes using the data?
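When a vendor only delivers full refreshes, a consumer can still work out what actually changed by comparing the new snapshot with the previous one. Here is a minimal sketch, assuming each snapshot is a dictionary keyed by a stable vendor record ID; the IDs and fields shown are invented for the example.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots keyed by record ID and classify the differences."""
    added = {k: v for k, v in current.items() if k not in previous}
    removed = {k: v for k, v in previous.items() if k not in current}
    changed = {
        k: (previous[k], v)
        for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return {"added": added, "removed": removed, "changed": changed}


# Made-up snapshots from two consecutive vendor deliveries.
last_delivery = {
    "V1": {"name": "Acme LLC", "city": "Austin"},
    "V2": {"name": "Globex Corp", "city": "Denver"},
    "V3": {"name": "Initech", "city": "Dallas"},
}
new_delivery = {
    "V1": {"name": "Acme LLC", "city": "Houston"},
    "V2": {"name": "Globex Corp", "city": "Denver"},
    "V4": {"name": "Umbrella Ltd", "city": "Boston"},
}

delta = diff_snapshots(last_delivery, new_delivery)
print(delta)
```

Reviewing the changed and removed sets before applying an update is one way to assess whether downstream business processes will be affected. The approach depends on the vendor keeping record IDs stable between deliveries, which is itself a question to ask.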
Who internally will be responsible for managing the relationship with the vendor providing the third-party data? What is the post-sale support model? Will there be an established process for dealing with situations where issues or discrepancies are discovered with the data? Are there established escalation procedures or SLAs for situations when a data transfer fails?
Understanding the intended uses and benefits of third-party data, and taking the issues noted above into consideration, will help organizations maximize its value and avoid unforeseen problems.
I cover topics like this and several others important to data leaders on the CDO Matters Podcast, the only show dedicated to helping leaders become more data driven.