Data Sharing (Re)Defined

Data sharing was recently identified by Gartner as a top transformational trend in its 2022 Data and Analytics Programs and Practices Hype Cycle. This is because Gartner research conducted over the last two years shows that data sharing can provide exceptional value to the organizations, and the business leaders (especially Chief Data Officers, or CDOs), that embrace it.

As transformational as data sharing might be, it remains loosely defined. Gartner calls data sharing a “business key performance indicator (KPI)” that promotes data reuse — but provides no specifics on data sharing use cases. 

This means data sharing could encompass any situation where data moves between individuals, databases or entire organizations. And if it's truly just a KPI, then one could argue that data sharing doesn't require any discipline in managing a new business process, which surely cannot be true given its potential benefits.

I touched on this rather "squishy" definition of data sharing in my previous blog post, where data sharing spans every potential form of data distribution or exchange, from for-profit data monetization to "data for good" and everything in between.

This includes sharing data both internally and outside your company. Under this view, even a traditional "extract, transform, load" (ETL) process between two database tables could be considered data sharing. ETL is certainly not novel, and it simply can't represent the transformational nature of data sharing that Gartner describes in its recent Hype Cycle.

WHY DATA SHARING NEEDS BETTER DEFINITION 

The poor definition of data sharing creates several problems. First, if data experts can't describe or quantify it, companies will find it extremely difficult to justify investing in it. It would be like asking a company to invest in world peace. On paper, it may sound like a great idea, but if you don't really know what it involves or how to define it, you can't prove you delivered it. Something that means everything ultimately means nothing.

Another problem is that this loose definition allows software vendors to claim they can sell you a data sharing solution. I believe data sharing, much like master data management (MDM), is a software-enabled discipline.

Unfortunately, no solution built specifically to support data sharing as a discipline yet exists in the market. That's not stopping several large data and analytics vendors, many of them in the cloud data warehouse, data lake or lakehouse markets, from marketing data sharing as a key function of their solutions.

Creating a common data store in the cloud with an access/permissions layer on top of it is an extremely basic approach to sharing. It may provide some value by eliminating the need for complex integration processes, but simply having access to data is a long way from enabling a true business transformation.

This doesn't mean that software has no role to play here; it most certainly does. But if the definition of data sharing remains nebulous, it's impossible to drive any clarity on what the software market for a true sharing solution might look like. Under the current definition, any software that allows data to move from one entity to another could be considered a sharing solution, even email.

While the data-sharing landscape remains murky, its business benefits have been very real for decades. A common example is the various data consortiums that help facilitate global trade or credit, such as those behind product data standards like UPC codes. But if we are going to optimize data sharing, then we need to be more precise in how we define it.

8 KEY PRINCIPLES OF DATA SHARING 

Before providing specifics on a new definition, it’s important to highlight eight key guiding principles that I firmly believe must be supported when attempting to describe what data sharing is or isn’t:  

  1. Organizations will share data with the expectation of receiving something of value in return.

    This means that donating data without expecting any value in return is not data sharing, but rather "data for good" or data charity.
  2. Data sharing requires at least two parties that each contribute data, and all parties benefit from sharing.

    This means that a one-way flow of data, from a single data producer to an individual consumer, is not data sharing. A flow from a data source to a destination, or from a producer to a consumer, is what it's always been: a data transfer or a data integration. Data sharing requires two datasets to be combined, physically or virtually, so that the combined data can generate incremental insights and value.
  3. Value from data sharing can be accelerated by network effects, which arise from combining data from multiple contributors.

    The more participants in the network, the greater the potential network effects.
  4. At scale, data sharing creates data sharing ecosystems. 

  5. Data sharing is an IT-supported business discipline, and not simply a KPI or the “plumbing” between two data sets.

    Since data sharing is a discipline, that means there are ways to optimize the benefits by improving the underlying business and technical processes supporting it.  
  6. Data sharing results in the creation of a shared data asset. 

    In other words, data sharing creates something new that benefits all contributors. 
  7. If data sharing happens between and/or across corporate entities, legal agreements that define the ownership rights of the new shared asset may be required.

    These agreements are often brokered between individual participants, or through an intermediary.  
  8. The value of data sharing is optimized when all parties participate in a shared data governance framework. 

    This framework helps operationalize the constraints of the legal agreement (including data access, usage limits and redistribution) and enforce any data quality or governance policies. Without it, parties will be severely limited in their ability to realize value from sharing.

DATA SHARING DEFINED

With these guiding principles in mind, I propose the following definition, with the goal of encouraging the broader industry to support data sharing as an emerging data management discipline:

WHAT IS DATA SHARING?

Data sharing is a technology-enabled business discipline which involves the reciprocal exchange of data between multiple entities to facilitate the creation and management of a shared data asset.

If data sharing represents a transformational business force, then it must be defined in a way that represents something beyond the simple distribution or monetization of data. Companies have tried and failed for years to realize transformational value through data marketplaces and other data monetization schemes, the only exceptions being companies whose core business is data monetization.

One-way flows of data between tables, people, databases or companies have been supported by legacy data pipelines for as long as computers have existed, and they certainly do not represent anything worthy of the hype behind data sharing.

Focusing the definition on reciprocity that benefits multiple parties enables all participants in our industry, from vendors to data leaders, to better understand what's required to move beyond the hype and into true business value.

What do you think of my definition of data sharing? Join me at our monthly Office Hours to discuss data sharing, data governance and more in a live, interactive forum. 

Malcolm Hawker
Head of Data Strategy @ Profisee

