Data Governance – Definition, Tools, Framework [Updated 2024]
Table of Contents
- Continue reading this article to learn:
- Data Governance Definition
- Why is Data Governance Important?
- OpenStreetMap Data Governance Example
- Who’s Involved with Data Governance?
- The Data Governance Framework
- Data Governance Best Practices
- Enterprise Data Governance Tools
- How to Get Started with Data Governance
- Data Governance Resources
- Frequently Asked Questions About Data Governance
Updated: June 11, 2024
According to a 2022 Gartner report, analysts estimate that 80 percent of organizations seeking to scale digital business will fail by 2025 because they do not take a modern approach to data governance. All organizations need to plan how they use data so that it is handled consistently throughout the business to support business outcomes.
Organizations that do this successfully consider the who, what, how, when, where and why of their data, not only to ensure security and compliance but also to extract value from all the information collected and stored across the business, which improves business performance.
It’s all about how you handle the data collected within your business.
This is data governance, and most organizations are doing some form of it without even knowing it.
According to a Profisee-sponsored report from Harvard Business Review Analytics Services, 67% of respondents say data governance is important to achieving high-quality enterprise data. Since technology trends such as Machine Learning and AI rely on data quality, and with the push of digital transformation initiatives across the globe, this trend is likely not going to change any time soon.
Because of this, we wanted to raise awareness of data governance to help those who care about data quality learn more about how the role of data governance impacts today’s business environments, stakeholders and company objectives.
We set out to produce the most comprehensive free resource available on the web about data governance; this article is exactly that.
Download your copy of the guide to keep in your back pocket. If you’re ready to dive in, continue your journey below.
Continue reading this article to learn:
- Data governance definition
- Why bother
- Common business benefits associated with data governance
- Example goals of data governance programs
- OpenStreetMap Case Study
- Who’s typically involved in data governance programs
- A framework for a data governance strategy
- A look at a data governance maturity model
- The role of master data management in data governance
- The role of data governance related to data security, protection and privacy
- 15 data governance best practices; you’re welcome
Let’s get started!
Data Governance Definition
Go ahead and Google “Data Governance.” Within five seconds you will be drowning in definitions. Pick your favorite. We’ll wait.
At Profisee, we are big fans of keeping things simple, so we’ll give you one sentence:
Data governance is a set of principles and practices that ensure high quality through the complete lifecycle of your data.
According to the Data Governance Institute (DGI), it is a practical and actionable framework to help a variety of data stakeholders across any organization identify and meet their information needs.
The DGI maintains that businesses don’t just need systems for managing data. They need a whole system of rules, with processes and procedures to make sure those rules are followed, consistently, every working day. That is a tall order for any system of governance. Tools like the Profisee Platform make the work much easier.
That is enough to get us started.
Why is Data Governance Important?
Data is becoming the core corporate asset that will determine the success of your business. Digital transformation is on the agenda everywhere. You can only exploit your data assets and do a successful digital transformation if you are able to govern your data. This means that it is imperative to deploy a data governance framework that fits your organization and your future business objectives and models. That framework must control the data standards needed for this journey and delegate the required roles and responsibilities within your organization and in relation to the business ecosystem where your company operates.
A well-managed data governance framework will underpin the business transformation toward operating on a digital platform at many levels within an organization:
- Management: For top management this will ensure the oversight of corporate data assets, their value and their impact on the changing business operations and market opportunities
- Finance: For finance, this will safeguard consistent and accurate reporting
- Sales: For sales and marketing, this will enable trustworthy insight into customer preferences and behavior
- Procurement: For procurement and supply chain management, this will fortify cost reduction and operational efficiency initiatives based on exploiting data and business ecosystem collaboration
- Production: For production, this will be essential in deploying automation
- Legal: For legal and compliance, this will be the only way to meet increasing regulation requirements
Data Governance Benefits
If you’ve managed to get this far, the benefits are probably obvious. Data governance means better, leaner, cleaner data, which means better analytics, which means better business decisions, which means better business results. Better market positioning. Mindshare in your space. Reputation. Better profit margin (everybody likes this one).
It’s the GIGO principle. Garbage In, Garbage Out. Or as our friend Scott Taylor puts it, the GIGE principle.
Remember: Garbage In, Garbage Everywhere.
Data Governance Goals
Of course, definitions are important. But action is more important. Now we know what it is. What do we want to do with it?
Here are a few possibilities:
- Make consistent, confident business decisions based on trustworthy data aligned with all the various purposes for the use of the data assets within the enterprise
- Meet regulatory requirements and avoid fines by documenting the lineage of the data assets and the access controls related to the data
- Improve data security by establishing data ownership and related responsibilities
- Define and verify data distribution policies including the roles and accountabilities of involved internal and external entities
- Use data to increase profits (everybody likes this one). Data monetization starts with having data that is stored, maintained, classified and made accessible in an optimal way.
- Assign data quality responsibilities in order to measure and follow up on data quality KPIs related to the general performance KPIs within the enterprise
- Plan better by not having to cleanse and structure data for each planning purpose
- Eliminate re-work by having data assets that are trusted, standardized and capable of serving multiple purposes
- Optimize staff effectiveness by providing data assets that meet the desired data quality thresholds
- Evaluate and improve by raising the data governance maturity level phase-by-phase
- Acknowledge gains and build on forward momentum in order to secure stakeholder continuous commitment and broad organizational support
These are just a handful of things you can do with great data governance. The bottom line is that we either want to do these things to grow, or we have to do them to meet regulatory requirements. Regardless of the reason, the end result of not doing these things is the same. If we have bad data, we make bad decisions that we don’t realize are bad decisions until later.
“With bad data, we keep making bad decisions.
We just don’t realize they’re bad decisions until later.”
– Scott Taylor, MetaMeta Consulting
That’s Scott Taylor, also known as the Data Whisperer. He’s been a thought leader in the MDM world for about 20 years, so when he crashed our office party in February, we figured there was a pretty good chance he knew what he was talking about. You’ll hear more from him later.
OpenStreetMap Data Governance Example
So, what does data governance look like in the wild? One of the most challenging spaces to put these practices to work is in open source projects like OpenStreetMap. Created by British entrepreneur Steve Coast in 2004, it was a response to the proliferation of siloed, proprietary international geographical data sources—dozens of mapping software products that did not talk to each other.
OSM uses data from volunteer contributors, much like Wikipedia, and is available to anyone with an Internet connection. Since 2008, OSM has grown from 50,000 registered users and contributors to over 2 million, with all of the map data submitted and collated by those volunteers. OSM is currently used by Facebook, Foursquare, and MapQuest, to name only three of the largest among literally thousands of professional users.
In plain speak: It is a miracle that this thing works at all. Some contributors are professional cartographers using high-tech GPS systems, and some are just weekend cyclists using their cellphones to triangulate and upload trip landmarks. But it does work, and it works well enough to be the trusted source of data for a number of Fortune 500 companies, some fast-track upstarts, and more mom-and-pop ventures than you can shake a stick at. A lot of folks use OpenStreetMap for their businesses.
We’re pretty optimistic when it comes to data purity. It comes with the territory. This is a miracle we understand. This model can only function if the data governance behind it works. And it is what Mr. Coast had in mind all along, building on a single revelatory concept.
The data is the product, not the map.
As you might imagine, a crowdsourced mapping system without a way to standardize contributor data could go wonky, as the Brits say, in a hurry. Establishing data standards early in the process and ensuring contributors adhere to them is key to the platform’s continued success.
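To make the idea of a contributor data standard concrete, here is a minimal sketch of what checking a submitted map point against a standard could look like. The field names and the rules in `validate_node` are assumptions made up for illustration; this is not OpenStreetMap's actual validation logic.

```python
def validate_node(node):
    """Check a contributed map point against a hypothetical data standard.

    Returns a list of error messages; an empty list means the node passes.
    """
    errors = []
    lat, lon = node.get("lat"), node.get("lon")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        errors.append("lat must be a number between -90 and 90")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        errors.append("lon must be a number between -180 and 180")
    # Illustrative house rule: tag keys are lowercase with no surrounding
    # whitespace, and tag values are non-empty strings.
    for key, value in node.get("tags", {}).items():
        if not key or key != key.strip().lower():
            errors.append(f"tag key {key!r} must be lowercase and trimmed")
        if not isinstance(value, str) or not value.strip():
            errors.append(f"tag {key!r} must have a non-empty text value")
    return errors


good = {"lat": 51.5, "lon": -0.12, "tags": {"amenity": "cafe"}}
bad = {"lat": 123.0, "lon": -0.12, "tags": {"Highway": "residential"}}
print(validate_node(good))
print(validate_node(bad))
```

The point is less the specific rules than the fact that they are written down once and applied to every contribution in the same way.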
Who’s Involved with Data Governance?
Data governance involves the whole organization to a greater or lesser degree, but let’s break down the most commonly involved stakeholders:
Data Owners: First, you will need to appoint data owners (or data sponsors if you like) in the business. These must be people who are able to make decisions and enforce these decisions throughout the organization. Data owners can be appointed at the entity level (e.g., customer records, product records, employee records and so forth) and additionally at the attribute level (e.g., customer address, customer status, product name, product classification and so forth). Data owners are ultimately accountable for the state of the data as an asset.
Data Stewards: Next, you will need data stewards (or data champions if you like) who are the people making sure that the data policies and data standards are adhered to in daily business. These people will often be the subject matter experts for a data entity and/or a set of data attributes. Data stewards are either the ones responsible for taking care of the data as an asset or the ones consulted on how to do that.
Data Custodians: Furthermore, you may use data custodians (or data operators if you like) to make the business and technical onboarding, maintenance and end-of-life updates to your data assets.
Data Governance Committee: Typically, a data governance committee will be established as the main forum for approving data policies and data standards and handling escalated issues. Depending on the size and structure of your organization, there may be a subcategory for each data domain (e.g., customer, vendor, product, employee).
The roles above can optionally be supported by a Data Governance Office with a Data Governance Team. In a typical enterprise, here are some folks who might make up a Data Governance Team:
- Manager, Master Data Governance: Leads the design, implementation and continued maintenance of Master Data Control and governance across the corporation.
- Solution and Data Governance Architect: Provides oversight for solution designs and implementations.
- Data Analyst: Uses analytics to determine trends and review information.
- Data Strategist: Develops and executes trend-pattern analytics plans.
- Compliance Specialist: Ensures adherence to required standards (legal, defense, medical, privacy).
One of the most important aspects of assigning and fulfilling the roles is having a well-documented description of the roles, the expectations and how the roles interact. This will typically be outlined in a RACI matrix describing who is responsible, accountable, consulted and informed for a certain process or for a certain artifact such as a policy or standard.
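A RACI matrix is easy to keep as plain data and check automatically. The sketch below is only an illustration: the artifact names and assignments are invented, and the rule that each artifact has exactly one Accountable role and at least one Responsible role is a common convention assumed here, not something prescribed by this article.

```python
from collections import Counter

# Hypothetical RACI matrix: artifact -> {role: RACI letter}.
# R = Responsible, A = Accountable, C = Consulted, I = Informed.
RACI = {
    "Customer address standard": {
        "Data Owner": "A",
        "Data Steward": "R",
        "Data Custodian": "C",
        "Data Governance Committee": "I",
    },
    "Data retention policy": {
        "Data Owner": "R",
        "Data Steward": "C",
        "Data Governance Committee": "A",
    },
}


def raci_problems(matrix):
    """Return a list of human-readable problems found in the matrix."""
    problems = []
    for artifact, assignments in matrix.items():
        counts = Counter(assignments.values())
        invalid = set(counts) - set("RACI")
        if invalid:
            problems.append(f"{artifact}: unknown codes {sorted(invalid)}")
        # Assumed convention: one Accountable, one or more Responsible.
        if counts["A"] != 1:
            problems.append(
                f"{artifact}: needs exactly one Accountable role, found {counts['A']}"
            )
        if counts["R"] < 1:
            problems.append(f"{artifact}: needs at least one Responsible role")
    return problems


print(raci_problems(RACI))
```

Running a check like this whenever the matrix changes is one cheap way to keep role descriptions from drifting out of date.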
The Data Governance Framework
A data governance framework is a set of data rules, organizational role delegations and processes aimed at bringing everyone in the organization on the same page.
There are many data governance frameworks out there. As an example, we will use the one from The Data Governance Institute. This framework has 10 components; let’s look at them in detail:
WHY:
A mission and vision that states why data governance is essential within our organization. At best, this should be related to the business objectives of the enterprise. This should be endorsed by top management.
WHAT:
The short-term and long-term goals for the data governance program as well as the success criteria and their measurement. Often this should be addressing the main pain points that exist in various lines of the business. This must be aligned with the funding and other involved line management.
HOW:
Data rules and definitions in the form of data policies, data standards and data definitions, preferably as a business glossary, and how business rules transform into data rules. This should cover the data assets describing the core business entities essential to meeting the business objectives. The data governance office/team will work with data owners and data stewards to set this up.
- The decision rights that exist for managing the data assets in the day-to-day business. This will include what data stewards can decide and what must be escalated to a data governance committee or similar authority.
- The accountabilities and related responsibilities delegated within the organization. This can include a full RACI matrix with consulted and informed roles as well.
- The control mechanisms that are put into action in order to measure adherence to data rules and achievements toward the defined goals. The mechanisms can be established within business processes, in IT applications and as part of reporting.
WHO:
The Governance Office / Team should be organized to support the cross-functional data governance structures and activities. It collects metrics and success measures and reports on them to data stakeholders. It provides ongoing stakeholder care in the form of communication, access to information, record-keeping, and education/support.
Data stewards will play an essential part in enforcing data rules and resolving most issues before they become a major challenge. A typical responsibility for data stewards will be setting up the data quality measurements, following up on the trends in the data quality KPIs and performing root cause analysis where thresholds are not met.
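As a rough illustration of the kind of data quality measurement a steward might set up, the sketch below computes a completeness KPI per field and compares it with a threshold. The sample records, the field names and the threshold values are all hypothetical; real programs usually track more dimensions than completeness.

```python
# Hypothetical customer records; None or "" marks a missing value.
records = [
    {"id": 1, "name": "Acme Corp", "country": "US", "email": "ops@acme.example"},
    {"id": 2, "name": "Globex", "country": "", "email": None},
    {"id": 3, "name": "Initech", "country": "US", "email": "info@initech.example"},
    {"id": 4, "name": "Umbrella", "country": "GB", "email": ""},
]

# Illustrative thresholds a steward might agree on with the data owner.
THRESHOLDS = {"name": 1.00, "country": 0.95, "email": 0.90}


def completeness(rows, field):
    """Share of rows where the field holds a non-empty value."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def kpi_report(rows, thresholds):
    """Map each field to a (score, meets_threshold) pair."""
    report = {}
    for field, minimum in thresholds.items():
        score = completeness(rows, field)
        report[field] = (score, score >= minimum)
    return report


report = kpi_report(records, THRESHOLDS)
for field, (score, ok) in report.items():
    print(f"{field}: {score:.0%} {'ok' if ok else 'below threshold'}")
```

Fields that fall below their threshold are the natural starting point for the root cause analysis mentioned above.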
WHEN:
Last, but not least, a set of standardized, documented and repeatable processes must be deployed with the right balance of enabling technology. The orchestration of data governance processes will ultimately determine the success, or failure, of your data governance framework and the ability to rise in data governance maturity.
Grow Up, Kid: The Maturity Model
Measuring your organization against a data governance maturity model can be very useful when building the roadmap, communicating the as-is and to-be states of the data governance initiative and setting the context for deploying a data governance framework.
One example of such a maturity model is the Enterprise Information Management maturity model from Gartner, the analyst firm:
Phase 0 – Unaware: In this initial phase, organizations may find themselves unaware of the potential benefits of data governance. It is common to encounter skepticism and lack of understanding regarding how data governance can enable better business outcomes. However, even in this phase, there may be a vision for the future and a need to convince key stakeholders in the organization about the value of data governance through smaller goals and awareness-building initiatives.
Phase 1 – Aware: As organizations progress to the aware phase, they begin to recognize the lack of ownership and sponsorship of data governance, as well as the need for policies and standards. This phase presents an opportunity to launch a tailored data governance framework that addresses the immediate pain points within the organization. It involves establishing a structure to oversee data governance and improve data management practices.
Phases 2 and 3 – Reactive & Proactive: Moving into the reactive and proactive phases signifies a more comprehensive approach to data governance. The organization can establish a comprehensive data governance framework that covers all aspects of data governance, including data ownership, data stewardship, and the establishment of a Data Governance Office/Team. The focus shifts to aligning the data governance framework with the organization’s business outcomes, achieving greater efficiency and effectiveness in managing data.
Phases 4 and 5 – Managed & Effective: In the managed and effective phases, the data governance framework becomes an integrated part of the organization’s way of doing business. At this stage, the organization’s data governance policies and procedures serve as a guidebook, while the maturity model acts as a history book, documenting the journey and progress made over time. These phases represent a high level of maturity, where data governance is ingrained in the organizational culture and supports optimal data management practices.
While a “one-size-fits-all” approach may not work for every organization, the maturity model provides valuable insights into an organization’s data governance journey, helping shape future goals and strategies. By understanding where they have been, organizations can chart a path forward, leveraging the maturity model to continuously enhance their data governance practices and achieve better business outcomes.
Connection to Master Data Management
Data Governance is the strategic approach. Master Data Management (MDM) is the tactical execution. That’s it. We’re good. You can go home now.
Not convinced? Ok. Don’t take our word for it. As promised, we’re back with Scott Taylor of MetaMeta Consulting. He has forgotten more about master data than most of us will ever know, so we’re happy to give him the last word.
“All enterprise systems need master data management,” Scott said at our Profisee 2019 kickoff event. “Marketing, sales, finance, operations. There is benefit everywhere, in enterprises of any size, in every industry, across the globe, at any point in their data journey.”
“Master data is the most important data because it is the data in charge,” Scott said. It’s about the “business nouns” – the essential elements of your business. Customers, partners, products, services. Whatever your business is, that’s where master data lives and breathes. You may have the best governance plan on the planet. Well-governed bad data is still bad data. It’s not going to help your business.
“Everybody is in the data business, whether they realize it or not,” Scott said. “Everything we touch turns to data. Business is transforming from analog to digital. No matter what your product is, data is your product. Business is changing because of data, and data is power. With the right tools, you can harness that power right now.”
We could not have said it better ourselves.
Data Protection and Data Privacy
The increasing awareness around data protection and data privacy, manifested for example by the European Union General Data Protection Regulation (GDPR), has a strong impact on data governance.
Terms such as data protection by default and data privacy by default must be baked into our data policies and data standards not least when dealing with data domains such as employee data, customer data, vendor data and other party master data.
As a data controller, you must have full oversight over where your data is stored, who is updating the data and who is accessing the data for what purposes. You must know when you handle personally identifiable information and do that for legitimate purposes in the given geography both in production environments and in test and development environments.
Having well-enforced rules for the deletion of data is a must too in the compliance era.
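A deletion rule only counts as well-enforced if something checks it regularly. Here is a minimal sketch of such a check, assuming each record carries a data domain and a last-activity date. The retention periods are invented for illustration and are not legal guidance.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data domain, in days.
RETENTION_DAYS = {"customer": 365 * 7, "job_applicant": 180}


def due_for_deletion(records, today):
    """Return the ids of records whose retention period has elapsed."""
    due = []
    for rec in records:
        limit = RETENTION_DAYS.get(rec["domain"])
        if limit is None:
            continue  # no rule defined for this domain, so skip the record
        if today - rec["last_activity"] > timedelta(days=limit):
            due.append(rec["id"])
    return due


sample = [
    {"id": "a", "domain": "job_applicant", "last_activity": date(2023, 1, 1)},
    {"id": "b", "domain": "customer", "last_activity": date(2020, 1, 1)},
    {"id": "c", "domain": "vendor", "last_activity": date(2010, 1, 1)},
]
print(due_for_deletion(sample, today=date(2024, 6, 11)))
```

In this sketch a domain without a rule is silently skipped; in practice that case probably deserves its own report, since data with no retention rule is itself a governance gap.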
Data Governance Best Practices
On one hand, you can learn a lot from others who have been on a data governance journey. On the other hand, every organization is different, and you need to adapt the data governance practices to your own situation, all the way from the unaware maturity phase to the nirvana of the effective maturity phase. Nevertheless, here is a collection of 15 short best practices that apply in general:
- Start small.
As in all aspects of business, do not try to boil the ocean. Strive for quick wins and build up ambitions over time.
- Set clear, measurable, and specific goals.
You cannot control what you cannot measure. Celebrate when goals are met and use this to go for the next win.
- Define ownership.
Without business ownership a data governance framework cannot succeed.
- Identify related roles and responsibilities.
Data governance is teamwork with deliverables from all parts of the business.
- Educate stakeholders.
Wherever possible use business terms and translate the academic parts of the data governance discipline into meaningful content in the business context.
- Focus on the operating model.
A data governance framework must integrate into the way of doing business in your enterprise.
- Map infrastructure, architecture, and tools.
Your data governance framework must be a sensible part of your enterprise architecture, the IT landscape and the tools needed.
- Develop standardized data definitions.
It is essential to strike a balance between what needs to be centralized and where agility and localization work best.
- Identify data domains.
Start with the data domain with the best ratio between impact and effort for raising the data governance maturity.
- Identify critical data elements.
Focus on the most critical data elements.
- Define control measurements.
Deploy these in business processes, IT applications and/or reporting where it makes most sense.
- Build a business case.
Identify advantages of raising data governance maturity related to growth, cost savings, risk and compliance.
- Leverage metrics.
Focus on a limited set of data quality KPIs that can be related to general performance KPIs within the enterprise.
- Communicate frequently.
Data governance practitioners agree that communication is the most crucial part of the discipline.
- It is a practice, not a project.
Enterprise Data Governance Tools
Investing in enterprise-wide data governance is critical for organizations leveraging their data as a strategic asset. It is no surprise that in a Harvard Business Review survey of 343 business leaders, over 87% said data governance is critical to their enterprise data strategy.
As organizations begin to more seriously consider the impact that data governance can have on their business, they often consider tools and technological solutions to fit their needs. But that in and of itself can be an overwhelming task, especially given the number of data governance solutions available today and the varying functions each category of tool can provide.
In fact, hundreds of vendors say they ‘do’ data governance, but there’s no single solution to support every aspect of an enterprise data governance program.
An effective data governance program requires coordination across many, many different data governance tools spanning both operational and analytical uses of data.
Types of Data Governance Tools
Consider the many types of data governance tools and the different problems they solve:
Cloud infrastructure providers
The major infrastructure and service providers include Microsoft Azure, Amazon Web Services (AWS), Google Cloud and more. Along with data warehouses, data lakes and other repositories, these providers primarily focus on data access and security controls.
While this is certainly helpful functionality, it is only a small part of the data governance equation.
Data Catalogs:
Many of these vendors market themselves as governance solutions, as some allow for both data cataloging and governance policies to be defined for the data in the catalog. Keep in mind that data governance spans policy definition, management and enforcement of data quality rules.
Having policies written down is a great start, but organizations need a way to ensure downstream solutions comply with those policies for a truly ‘complete’ data governance tool.
Master Data Management Solutions:
As outlined above, master data management (MDM) is intricately tied to data governance strategies in that it encompasses the enforcement of governance policies for a significant subset of widely shared enterprise data.
Ideally, data governance tools are natively and bi-directionally integrated with MDM tools to share metadata, inform the master data model and label “certified” master data sets for consumption by analytics or other downstream systems.
Data Quality Tools:
Like MDM, data quality tools manage the enforcement of a subset of data governance policies. Data quality is a critical aspect of governance, but it is not the only aspect.
Enterprise Analytics & Business Intelligence Solutions:
Analytics tools like Power BI, Tableau, Qlik and others can play a critical role in enforcing governance policies for the consumption of analytical data, but that’s only half of the equation. Data governance programs also need to support operational uses of data.
While no data governance tool will likely address every facet of an enterprise-wide data governance strategy, many of the best data governance tools support bi-directional integration with other data management or analytics systems and cover a broad array of use cases.
Best Enterprise Data Governance Tools
Some of the most popular and common data governance tools include:
Microsoft Purview
While originally released as Azure Purview in September 2021, it was rebranded as Microsoft Purview in April 2022.
The newer release served as a step up from the previous iteration bringing together former Azure Purview and Microsoft 365 Compliance portfolio functionalities under a single brand to create a more unified platform for end users.
According to Microsoft, Microsoft Purview helps businesses understand and govern the data across their entire estate, safeguard that data wherever it lives, and improve the risk and compliance posture in a much simpler way than traditional solutions.
Microsoft Purview gives organizations a unified data governance solution. By summarizing your data estate, it delivers key insights on stewardship and governance while also pointing out the possible risks involved with your foundational data. Purview allows data leaders to govern and administer their on-premises, multi-cloud and software-as-a-service (SaaS) data while also having a better understanding of their overall data estate.
Microsoft Purview Core Features
Microsoft Purview offers several core features, including:
- Audit
Support forensic investigations and meet regulatory requirements with critical audit log events, and customized retention policies.
- Communication Compliance
Foster a safe and compliant workplace by detecting sensitive or inappropriate content shared across your organization’s communication channels.
- Compliance Manager
Reduce risk by translating complex regulatory requirements into specific improvement actions that help you raise your score and track progress.
- Data Lifecycle Management
Classify and govern data at scale to meet your legal, business, privacy, and regulatory content obligations.
- Data Loss Prevention
Automatically protect sensitive information from risky and unauthorized access across apps, services, endpoints, and on-premises files.
- Data Map and Data Catalog
Maximize the business value of data for your consumers by creating a unified map to automate and manage metadata from hybrid sources. Make data easily discoverable and understand the origin of your data with interactive data lineage visualization.
- eDiscovery
Discover and manage your data in place with end-to-end workflows for internal or legal investigations.
- Information Protection
Discover, identify, classify, and protect sensitive data that is business critical, then manage and protect it across your environment.
- Insider Risk Management
Detect, investigate, and act on critical risks in your organization, including data theft, data leaks, and security policy violations.
Microsoft Purview is natively integrated with Profisee Master Data Management for unified data governance.
Collibra
Collibra data governance software is a comprehensive solution designed to facilitate effective management and control of an organization’s data assets. This software offers a range of features and tools that aid businesses in maintaining data quality, compliance, and accessibility throughout the data lifecycle. Collibra enables users to define data ownership, establish data-related policies, and ensure consistent data definitions across the organization.
The software also supports data cataloging, allowing users to discover and understand available data assets. Through its user-friendly interface, Collibra provides dashboards, reporting, and collaboration capabilities to enhance data stewardship and collaboration among teams. With a focus on data transparency and accountability, Collibra assists enterprises in achieving better decision-making processes and improved data-driven insights.
Collibra Core Features
Collibra offers several core features, including:
- Data Governance Framework
Establishes a structured approach to managing data assets, defining ownership, responsibilities, and processes.
- Data Cataloging
Provides a centralized repository for storing metadata and information about available data assets, making data discovery easier.
- Data Lineage
Visualizes the flow of data across systems, showing how data is transformed and used, aiding in understanding data origins and dependencies.
- Data Quality Management
Monitors and measures data quality, allowing users to define data quality rules, perform assessments, and address data quality issues.
- Policy Management
Enables the creation and enforcement of data-related policies, ensuring compliance with regulations and internal standards.
- Data Security and Privacy
Manages access controls, permissions, and data masking to protect sensitive information and ensure privacy compliance.
- Data Lifecycle Management
Supports data from its creation to archival or deletion, ensuring data is managed effectively throughout its lifecycle.
- Change Management
Tracks changes made to data assets and related governance processes, maintaining a historical record for transparency.
- Audit and Compliance
Assists in meeting regulatory requirements by providing documentation and evidence of data governance practices.
Apache Atlas
Apache Atlas is an open-source data governance and metadata management platform that provides organizations with tools to effectively manage and understand their data assets. Developed under the Apache Software Foundation, Apache Atlas offers a comprehensive set of features designed to enhance data discovery, lineage tracking, and data governance practices.
The platform enables users to define, classify, and manage metadata for various data entities, fostering a better understanding of data relationships and usage. Apache Atlas supports the creation of a business glossary, allowing organizations to establish consistent data terminology. Additionally, it provides data lineage visualization, illustrating how data flows through different systems and transformations, which helps establish data provenance.
Apache Atlas Core Features
Apache Atlas offers several core features, including:
- Metadata Management
Apache Atlas provides a centralized repository for capturing and managing metadata related to data assets, including information about data sources, attributes, relationships, and usage.
- Data Classification and Tagging
Users can define and apply metadata tags and classifications to data assets, making it easier to categorize and organize data based on its type, sensitivity, or other criteria.
- Data Lineage Visualization
Apache Atlas offers the capability to visualize the lineage of data, showing how it moves and transforms across various systems and processes. This aids in understanding data flows and transformations.
- Data Provenance
The platform helps establish data provenance by tracking the origin and history of data, ensuring transparency and accountability in data usage.
- Business Glossary
Apache Atlas supports the creation and management of a business glossary, providing a consistent vocabulary for data terms and definitions across the organization.
- Data Cataloging
Apache Atlas acts as a catalog for data assets, making it easier for users to find and understand available data resources within the organization.
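To make the ideas of lineage and classification more concrete, here is a minimal Python sketch. It does not use the Apache Atlas API; the dataset names, the `LINEAGE` and `TAGS` dictionaries and both helper functions are hypothetical and exist only to illustrate how tracing a report back to its sources can reveal which sensitivity tags it inherits.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is derived from.
LINEAGE = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
}

# Hypothetical classification tags applied to source datasets.
TAGS = {
    "customers_raw": {"PII"},
    "orders_raw": {"FINANCIAL"},
}


def upstream_sources(dataset, lineage):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


def inherited_tags(dataset, lineage, tags):
    """Collect tags from the dataset itself and from all of its upstream sources."""
    collected = set(tags.get(dataset, set()))
    for source in upstream_sources(dataset, lineage):
        collected |= tags.get(source, set())
    return collected
```

Calling `inherited_tags("sales_dashboard", LINEAGE, TAGS)` walks the graph back to both raw tables, so the dashboard picks up the `PII` and `FINANCIAL` tags even though nobody tagged it directly.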
How to Get Started with Data Governance
Data governance is crucial for organizations to effectively manage and utilize the vast amount of data they possess. If your organization is looking to get started with data governance, here are some steps to follow:
- Assess your data ecosystem: Begin by understanding the current state of your data. Identify where the data is stored, who owns it and how it is used within the organization. This comprehensive assessment will provide insights into the gaps and opportunities for data governance.
- Define data governance objectives: Clearly define the goals and objectives of your data governance initiatives. Consider aspects such as data quality, data privacy, compliance and security. These objectives will serve as guiding principles throughout the data governance implementation process.
- Establish data governance policies: Develop policies and guidelines that will govern how data is collected, stored, shared and used within your organization. These policies should align with industry regulations and best practices.
- Establish data governance roles and responsibilities: Assign specific roles and responsibilities to individuals in your organization to ensure the smooth implementation and ongoing management of data governance. Identify data stewards, data owners and other key stakeholders who will be responsible for data-related tasks and decision-making.
- Implement data governance practices and processes: Utilize the capabilities of data governance solutions to implement the defined policies and establish structured processes for data management. This may include data profiling, data lineage, metadata management and data catalogs.
- Collaborate and communicate: Data governance is a collaborative effort involving various departments and stakeholders. Foster a culture of collaboration and communication to ensure everyone understands the importance of data governance and their role in maintaining data integrity.
- Monitor and measure: Regularly monitor the effectiveness of your data governance program. Measure key performance indicators (KPIs) related to data quality, compliance and security. A data governance tool with monitoring and reporting capabilities can help track progress and identify areas for improvement.
- Continuous improvement: Data governance is an ongoing process. Continuously evaluate and refine your data governance practices based on feedback, industry trends and evolving data needs. Regularly update policies, processes, and technologies to adapt to changing business requirements.
By following these steps and leveraging data governance solutions, your organization can effectively establish and strengthen its data governance practices, minimizing risks and maximizing the value of data assets.
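As a rough illustration of the "monitor and measure" step, the Python sketch below computes two common data quality KPIs and compares them with targets. The metric definitions, field names and thresholds are assumptions chosen for the example; your organization would define its own.

```python
def is_filled(value):
    """Treat None and empty strings as missing."""
    return value not in (None, "")


def completeness(records, field):
    """Share of records where `field` is filled in (0.0 to 1.0)."""
    if not records:
        return 0.0
    return sum(1 for r in records if is_filled(r.get(field))) / len(records)


def uniqueness(records, field):
    """Share of filled-in values of `field` that are distinct (0.0 to 1.0)."""
    values = [r[field] for r in records if is_filled(r.get(field))]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


METRICS = {"completeness": completeness, "uniqueness": uniqueness}


def kpi_report(records, targets):
    """Score each (metric, field) pair in `targets` against its minimum target."""
    report = []
    for (metric, field), target in targets.items():
        score = METRICS[metric](records, field)
        report.append({
            "metric": metric,
            "field": field,
            "score": score,
            "target": target,
            "passed": score >= target,
        })
    return report


# Hypothetical sample data and targets.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
]
targets = {("completeness", "email"): 0.9, ("uniqueness", "id"): 1.0}
report = kpi_report(customers, targets)
```

With this sample, email completeness is 0.75 and misses its 0.9 target, while ID uniqueness meets its target. A dedicated data governance tool would typically track scores like these over time rather than as a one-off calculation.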
Data Governance Resources
There are many resources out there where you can learn more about data governance. Read our full guide on the what, why and how of data governance.
Frequently Asked Questions About Data Governance
What is enterprise data governance?
According to Gartner, data governance is “…the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption and control of data and analytics.” Data governance varies from organization to organization, but it always entails defining a set of standards, principles and processes to make sure data is of good quality and serves the organization’s specific needs to drive business outcomes.
For smaller organizations, this could be as simple as adding data validation to spreadsheets where data is stored. On the other hand, organizations at the enterprise level typically require data governance tools, like Microsoft Purview, to help them execute the advanced data governance requirements of a large operation, including scanning data sources, building a data glossary and capturing permissions-based rules and operations. Enterprise data governance tools will usually include features for regulatory compliance, data lifecycle management, audit trails and data governance catalog management, among others.
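For a sense of what simple validation might look like once data leaves the spreadsheet, here is a small Python sketch. The column names, the allowed country codes and the email pattern are all hypothetical, and the code assumes every cell arrives as a string (as it would from a CSV export).

```python
import re

# Deliberately loose pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ALLOWED_COUNTRIES = {"US", "CA", "GB"}


def validate_row(row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    if not row.get("customer_id", "").strip():
        problems.append("customer_id is required")
    if not EMAIL_PATTERN.match(row.get("email", "")):
        problems.append("email is not a valid address")
    if row.get("country") not in ALLOWED_COUNTRIES:
        allowed = ", ".join(sorted(ALLOWED_COUNTRIES))
        problems.append("country must be one of " + allowed)
    return problems


def validate_rows(rows):
    """Map 1-based row numbers to their problems, skipping clean rows."""
    results = {}
    for number, row in enumerate(rows, start=1):
        problems = validate_row(row)
        if problems:
            results[number] = problems
    return results


rows = [
    {"customer_id": "C001", "email": "ana@example.com", "country": "US"},
    {"customer_id": "", "email": "not-an-email", "country": "FR"},
]
issues = validate_rows(rows)
```

Here the first row passes and the second is flagged for all three rules. Rules like these are the kind of thing an enterprise tool would let you define centrally instead of in one-off scripts.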
Enterprise data governance tools also improve scalability by automating or augmenting the definition and enforcement of governance policies. In addition, they improve the traceability and auditability of the change management process, which is critical for governance success.
How can I start a data governance program?
Starting a data governance program generally involves putting the following components in place:
- Foundation and Planning
  - Data Governance Strategy and Roadmap: A strategic plan outlining the goals, scope and timeline for implementing data governance.
  - Business Case and ROI Analysis: Documentation demonstrating the value and expected return on investment of data governance to gain executive support.
  - Data Governance Framework: A detailed framework that includes the policies, procedures and standards for managing data assets.
- Organizational Structure
  - Data Governance Council/Committee Charter: A charter defining the roles, responsibilities and membership of the data governance council or committee.
  - Roles and Responsibilities Matrix: A detailed matrix outlining the roles (e.g., Data Stewards, Data Owners, Data Custodians) and their responsibilities within the data governance framework.
- Policies and Procedures
  - Data Governance Policies: Policies covering data quality, data privacy, data security, data usage and data lifecycle management.
  - Standard Operating Procedures (SOPs): Procedures for implementing the data governance policies and managing data assets.
- Data Management
  - Data Catalog: A comprehensive catalog of the organization’s data assets, including metadata and data lineage information.
  - Data Quality Management Plan: A plan for assessing, monitoring, and improving data quality.
  - Data Stewardship Program: A program outlining how data stewards will manage data quality, integrity, and availability.
- Technology and Tools
  - Tool Selection and Implementation Plan: A plan for selecting and implementing data governance tools such as data cataloging, metadata management, data quality and master data management (MDM) tools.
  - Data Governance Technology Infrastructure: The infrastructure required to support data governance tools and processes.
- Change Management and Training
  - Change Management Plan: A plan for managing organizational change and ensuring stakeholder buy-in and adoption of data governance practices.
  - Training Materials and Programs: Comprehensive training programs and materials for educating stakeholders on data governance principles, policies and tools.
- Monitoring and Reporting
  - Key Performance Indicators (KPIs): Defined metrics to monitor the effectiveness and progress of the data governance program.
  - Reporting Framework: A framework for regularly reporting on data governance activities, compliance, and performance against KPIs.
- Compliance and Risk Management
  - Compliance Framework: A framework for ensuring data governance practices comply with relevant regulations and standards (e.g., GDPR, CCPA, HIPAA).
  - Risk Management Plan: A plan for identifying, assessing and mitigating risks related to data governance.
- Continuous Improvement
  - Feedback Mechanism: Processes for collecting feedback from stakeholders to continually improve data governance practices.
  - Audit and Review Schedule: A schedule for regularly auditing and reviewing data governance practices to ensure they remain effective and aligned with business goals.
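One of the items above, the roles and responsibilities matrix, lends itself to a quick sketch. The Python example below represents the matrix as a nested dictionary and flags data domains that are missing an assignee. The domains and assignees are hypothetical; the three role names are the ones mentioned in the list.

```python
REQUIRED_ROLES = ("Data Owner", "Data Steward", "Data Custodian")

# Hypothetical matrix: data domain -> role -> assigned person or team.
ROLES_MATRIX = {
    "customer": {
        "Data Owner": "VP Sales",
        "Data Steward": "CRM Analyst",
        "Data Custodian": "IT Operations",
    },
    "product": {
        "Data Owner": "Head of Product",
        "Data Steward": "Catalog Manager",
    },
}


def missing_roles(matrix, required=REQUIRED_ROLES):
    """Return {domain: [roles without an assignee]} for incomplete domains."""
    gaps = {}
    for domain, assignments in matrix.items():
        unfilled = [role for role in required if not assignments.get(role)]
        if unfilled:
            gaps[domain] = unfilled
    return gaps
```

Running `missing_roles(ROLES_MATRIX)` on this sample reports that the product domain has no Data Custodian. A check like this could be one input to the audit and review schedule.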
What are some data governance standards I can use?
Data governance standards are an essential part of any data governance program. Going beyond data quality alone, data standards define the dimensions that data must satisfy to adequately serve the organization’s data governance goals and reduce its risk profile.
According to the DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK) Data Governance Wheel, data governance standards include:
- Quality: There is no set definition of data quality — rather, organizations decide what qualities data should have for it to serve their needs. This usually means data must be accurate, timely, complete and consistent.
- Architecture: A data architecture outlines the structure, policies and standards for managing data assets. An architecture should support the organization’s goals and facilitate efficient data management and use.
- Modeling and Design: Detailed data models accurately represent the organization’s data requirements and relationships. This helps to ensure data consistency, quality and compliance with business rules.
- Storage and Operations: Efficient data storage solutions and operational practices ensure data is securely stored, easily accessible and well-maintained. Storage strategies should be reviewed and optimized regularly to handle growing data volumes.
- Security: Sensitive data should be stored securely, ideally with strong encryption and carefully managed role-based permissions. This is not only a matter of protecting proprietary information, but also protecting customer privacy and maintaining compliance with privacy regulations.
- Integration and Interoperability: Establish processes and standards for integrating data from different sources and ensuring seamless interoperability between systems. This promotes a unified view of data across the organization, reducing or eliminating data silos for better decision making.
- Document and Content Management: Document and content management systems let you manage documents and unstructured content effectively and securely. Ensure that key documents are accurately captured, stored, indexed and made accessible to the right people while maintaining compliance with regulatory requirements.
- Reference and Master Data: Reference data and master data are critical for ensuring consistency and accuracy across all business processes and systems. A master data management (MDM) solution will usually let you manage both reference and master data, since reference data is a special kind of master data used to classify or provide additional context around data.
- Data Warehousing and Business Intelligence: Data warehouses store large volumes of structured and unstructured data, which can then be retrieved through data marts for use in downstream systems such as business intelligence and analytics.
- Metadata: Metadata provides context and meaning to data assets. This includes capturing, storing and maintaining metadata to facilitate data discovery, usage and governance. Metadata policies should be aligned with business objectives and regulatory requirements.
Data governance standards will vary slightly from organization to organization, so it’s important to consider the needs and goals of your organization to define your own data standards.
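As one small example of turning a standard into something enforceable, the Python sketch below illustrates the role-based permissions idea from the Security standard by masking sensitive fields for roles that are not allowed to see them. The field names, roles and masking rule are hypothetical, and masking at display time is only an illustration; it is not a substitute for encryption or for access controls in the underlying systems.

```python
# Hypothetical policy: roles allowed to see each sensitive field unmasked.
FIELD_POLICY = {
    "email": {"data_steward", "support_agent"},
    "ssn": {"data_steward"},
}


def mask(value):
    """Hide all but the last two characters of a value."""
    text = str(value)
    if len(text) <= 2:
        return "*" * len(text)
    return "*" * (len(text) - 2) + text[-2:]


def view_record(record, role, policy=FIELD_POLICY):
    """Return a copy of `record` with restricted fields masked for `role`."""
    visible = {}
    for field, value in record.items():
        allowed_roles = policy.get(field)
        if allowed_roles is None or role in allowed_roles:
            # Field is unrestricted, or this role is permitted to see it.
            visible[field] = value
        else:
            visible[field] = mask(value)
    return visible


record = {"name": "Ana", "email": "ana@example.com", "ssn": "123-45-6789"}
agent_view = view_record(record, "support_agent")
```

In this sample, a support agent sees the name and email unchanged but only the last two characters of the SSN, while a data steward would see every field.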
Benjamin Bourgeois
Ben Bourgeois is the Director of Digital Marketing at Profisee, where he works with product, engineering and pre-sales teams to develop useful content and resources for prospective and current customers. He has experience managing content teams across the B2B SaaS, healthcare, specialty chemical and manufacturing industries.