Master Data Services (MDS) is Microsoft’s solution for managing master data. In the past, master data management (MDM) has been a highly problematic area for database administrators. To help you understand better how MDS addresses the inherent challenges of MDM, this guide begins by explaining what MDM encompasses and how it has been handled historically. It then introduces you to MDM solutions other than MDS and explains their shortcomings so that you can fully appreciate the features and functionality of MDS, which is introduced in the latter half of this chapter.
Master data management (MDM) is the management of the non-transactional data within an organization. The definition of “master” data varies by organization, but can be loosely defined as the nouns that describe all business processes. These nouns might be organization-specific data, like your organization’s list of products or list of employees. Or the nouns might be common reference data provided by an external service provider or government agency, like address information or a Dun & Bradstreet (D-U-N-S®) number. Although customer and product are the two most commonly managed domains, many businesses find value in managing additional domains, such as wells and fields at big oil companies and recipes at food manufacturers.
In most businesses, customers buy products or services. Because customer relationships are essential, each time a customer buys a product or service, the transaction is recorded. After the transaction is recorded and coded properly, the details of the transaction will never change. MDM is about managing the relationships between these static transactions, rather than the transactions themselves. For example, for a retail chain, each store has a certain group of products that it has in its inventory and a database of registered customers. The list of products available to sell and the list of customer addresses are master data. That Sally bought four shirts on Friday is a transaction.
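The distinction between master data and transactions can be sketched in a few lines of Python. This is only an illustration of the retail example above; the class names, field names, codes, and the address are assumptions made for the sketch and have nothing to do with how MDS stores data.

```python
from dataclasses import dataclass
from datetime import date

# Master data: the "nouns". These records describe a state and are
# reused by many transactions. All names here are hypothetical.
@dataclass(frozen=True)
class Product:
    code: str
    name: str

@dataclass(frozen=True)
class Customer:
    code: str
    name: str
    address: str

# Transactional data: a single event in time that refers to master data.
@dataclass(frozen=True)
class Sale:
    customer_code: str
    product_code: str
    quantity: int
    sold_on: date

products = {"SHIRT-602": Product("SHIRT-602", "Men's Shirt #602")}
customers = {"C-001": Customer("C-001", "Sally", "1 Main Street")}

# "Sally bought four shirts on Friday" is a transaction, not master data.
sale = Sale("C-001", "SHIRT-602", 4, date(2012, 6, 1))  # June 1, 2012 was a Friday

def describe(sale: Sale) -> str:
    """Resolve the codes on a sale against the master lists."""
    customer = customers[sale.customer_code]
    product = products[sale.product_code]
    return f"{customer.name} bought {sale.quantity} x {product.name}"
```

Once recorded, the sale never changes, while the product and customer records it points to continue to be maintained.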
The more often this data is required for a transaction, the greater its importance to your organization. Central management of this data helps identify data discrepancies between multiple transactional systems and helps your organization run more efficiently.
Master data management is composed of the following:
By definition, all companies must be utilizing some form of MDM, although the term generally refers to the implementation of formalized processes and specifically designed tools.
When companies began using computer applications to manage information, they had to load all pertinent data directly into each application before they could perform tasks and generate results with those applications. Over time, companies adopted more applications and had to enter the same dataset into each of them. As a response to this, developers began to employ the technique of creating master files that stored key, reusable data for use within each application. These initial master files were the first MDM systems.
Once the master data was loaded into each application, new records would be added either in the application or to the master list, with the result that the two sets of data were no longer in sync. Early data reconciliation efforts were time consuming and costly. Very few tools existed that could identify differences and manage duplicate records.
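To make the synchronization problem concrete, here is a minimal sketch of the kind of comparison a reconciliation effort has to perform. It assumes both the master file and the application copy can be reduced to a simple mapping of code to name; the function name and the sample data are hypothetical.

```python
def reconcile(master: dict[str, str], application: dict[str, str]) -> dict[str, list[str]]:
    """Compare two code -> name mappings and report where they diverge."""
    master_codes = set(master)
    app_codes = set(application)
    shared = master_codes & app_codes
    return {
        # Records added to the master list but never loaded into the application
        "missing_from_application": sorted(master_codes - app_codes),
        # Records keyed directly into the application
        "missing_from_master": sorted(app_codes - master_codes),
        # Same code, different description
        "conflicting": sorted(c for c in shared if master[c] != application[c]),
    }

master_list = {"P1": "Shirt", "P2": "Hat", "P3": "Scarf"}
app_copy = {"P1": "Shirt", "P2": "Cap", "P4": "Belt"}
report = reconcile(master_list, app_copy)
```

Even this toy version has to be repeated for every application that holds its own copy of the list, which is why early reconciliation efforts were so expensive.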
Enterprise resource planning (ERP) systems were then created to help combine the data in these systems and to solve the constant need for integration between systems. ERP systems were meant to be single software suites that managed standard business processes within an organization. Each module in an ERP system could leverage the common master data tables required for the associated business process. Many organizations today consider the master data tables within their ERP systems to be the “master” data within their organization.
As more and more organizations adopted ERP systems, they realized that using these single systems in isolation was an unrealistic solution. They then adopted multiple ERP systems, and their master data problems re-emerged. Specialized systems designed to manage the most problematic domains became major implementations in the larger organizations. Now there is a push in the industry to provide solutions that can solve MDM issues across a wide variety of domains.
All organizations must deal with their MDM issues. Whether managing domains in Excel spreadsheets or managing them with a specialized solution, a key role of the IT department in any size organization today is to provide processes for managing master data and integrating it across all systems. Various solutions other than MDS are available, but each has weaknesses that are addressed by MDS.
As explained in the following sections, ERP solutions can have several drawbacks as MDM solutions, other specialized solutions can prove to be too specialized, and MDM solutions typically cater to only the biggest organizations. As you will read later, in MDS, Microsoft hopes to provide an adaptable solution that can cater to organizations of all sizes.
While ERP systems are a significant consumer of master data, using these systems by themselves to manage master data has major drawbacks. Also, the initial vision that a single ERP system would handle all of the computing needs of an organization was shortsighted.
As you would suspect, the most common domains for management are those domains most troublesome to large organizations. Specialized systems have been created to manage these domains, with many features designed directly for them. These systems were designed to manage a single domain and don’t translate well to other master data problems. The two most common types of specialized solutions for managing master data are customer data integration (CDI) and product information management (PIM).
A central feature of CDI solutions is the ability to identify and manage duplicate customer records. Other common features include address correction and standardization and the ability to integrate with service organizations to further enrich business-specific customer knowledge. CDI implementations tend to incorporate most systems within an enterprise and focus on a relatively small subset of attributes.
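Duplicate identification can be sketched very simply. The example below is not how any particular CDI product works; it assumes a crude matching key built from a normalized name and postal code, and the record layout and sample companies are invented for illustration. Commercial tools use far more sophisticated fuzzy matching.

```python
import re
from collections import defaultdict

def match_key(name: str, postal_code: str) -> tuple[str, str]:
    """Build a crude matching key: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return (" ".join(cleaned.split()), postal_code.strip())

def find_duplicates(records: list[dict]) -> list[list[str]]:
    """Group record ids that share a matching key; return groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        key = match_key(record["name"], record["postal_code"])
        groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical customer records drawn from two different systems
customers = [
    {"id": "CRM-1", "name": "Contoso, Ltd.", "postal_code": "98101"},
    {"id": "ERP-7", "name": "contoso  ltd", "postal_code": "98101 "},
    {"id": "CRM-2", "name": "Fabrikam Inc.", "postal_code": "98052"},
]
duplicates = find_duplicates(customers)
```

Note how narrow the logic is: the matching key only makes sense for records that have a name and an address, which hints at why such features carry over poorly to other domains.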
Companies with customer management problems will certainly benefit from a CDI implementation. Unfortunately, these features do not translate well to managing other domains across the enterprise such as organization or product.
Again, these systems are highly effective at solving a narrow band of problems, but they do not translate well to additional domains. Very few providers have solutions in both the CDI and PIM spaces.
Historically, vendors of MDM solutions have catered to the needs of Fortune 500 companies. Until recently, the MDM market was dominated by complex and expensive applications that generate large amounts of consulting dollars for trained implementers. Fortune 500 companies generally have the means and the budget to pay for the consulting time and tools that comprise an MDM solution. These solutions were tied to either a specific domain or a feature set that supported one domain better than others. They are expensive and are built to solve a unique set of issues in large organizations. The size and complexity of these engagements have led many of these projects to end in failure.
The MDM market’s focus on Fortune 500 companies doesn’t mean that small and midsize companies don’t face similar issues with managing their data. Any organization that’s attempting to store critical data in multiple systems or spreadsheets and having trouble determining a true version of its master data is in need of an MDM solution.
The costs associated with MDM solutions and the high risk of failure in self-deploying such solutions leave a large portion of the small and midsize business market underserved. Out of necessity, many of these businesses are using Excel spreadsheets or internally designed systems to manage master data. These systems typically neglect the need for security, central management, and versioning.
Microsoft first shipped Master Data Services with SQL Server 2008 R2. That release was largely about Microsoft entering the MDM market and beginning to shape the discussion around MDM. It also provided an opportunity for Microsoft to begin building channels for partners that both implement the solution and create additional applications on top of MDS. In effect, the first release was about getting into the game. The second release is about changing the rules. In SQL Server 2012, MDS looks to redefine the MDM market, expanding the types of data stored and easing the barriers to managing that data. This section provides a quick overview of the new MDS features in SQL Server 2012 as well as the features carried forward from SQL Server 2008 R2.
Master Data Services (MDS) provides a number of features that facilitate central management of master data while providing greater access to the editors and consumers of this information.
The MDS system is built to be rapidly deployed for any domain within an organization. Once deployed, all models support additional customization without complex coding or reconfiguration. The intent is to make MDM software more accessible to small and midsize businesses, and to aid departments of large companies in creating solutions for themselves. While Master Data Services provides a rich web services platform for system integration, and enables you to use web services to create your own custom user interface, you do not need to use web services to take advantage of what MDS has to offer. An MDS implementation can be completed successfully by business users with no programming knowledge.
The first step in adopting any software application is to deploy the software, and Master Data Services is no exception. This chapter focuses on project size and scope, which should help you determine your needs, including whether you need any external assistance.
Before installing Master Data Services, it is important to determine your short- and medium-term goals for the application. MDS was designed as a web application to provide a simple deployment model for a wide range of organizations. This is also a good time to make sure you have the necessary information to deploy and configure the application successfully, and to determine whether your project requires external expertise. There are a number of questions to consider before installing MDS:
Based on the preliminary data that you discover from answering the preceding questions, you can determine the relative complexity of your MDM project.
The complexity of your project should affect both the scope of the implementation and the amount of ongoing effort required to maintain the MDS project. Small projects should be manageable by novice individuals with sufficient business knowledge and the aid of this guide. These projects should be functional and productive within a week’s worth of effort.
Many small projects revolve around finding a home for “homeless” data within the organization. This data is critical to regular business processes, but not important enough to be managed in any standard process system. Much of this homeless data tends to live in unmanaged Excel spreadsheets. The transition of this homeless data into MDS can provide structure and control over it. If IT personnel were previously responsible for managing changes provided by business users, they can now provide users with access to the appropriate data directly. No formal roles or duties need to be created.
A wide range of projects fall into the intermediate range. These projects can be handled internally, but require resources to be fully committed to the implementation effort. Most small and medium-sized business implementations will be intermediate-sized projects. As these projects become more complex, bringing in outside expertise for the implementation should be considered seriously. Making the decision to do so does not diminish the need for internal knowledge and education; although these consultants will aid in the implementation of the solution initially, internal staff will be needed to maintain the MDM system and processes going forward. Most of these projects can be managed in a single phase, requiring approximately 30 to 200 hours to be successfully implemented. Identification of at least one data steward within the organization is essential to long-term success.
Data steward is a common role found in MDM projects. These individuals tend to be technically savvy while still understanding the nuances of the business domains. A data steward must be a champion of data governance and must help create sustainable data maintenance processes within the organization. Many times, data stewards find themselves acting as referees in how data is maintained, caught between competing business processes and applications. Systems may maintain different rules regarding the quality and timeliness of data that must be managed by the data steward. The ability to find efficient compromises will determine how effective a data steward is for an organization.
Once projects reach a certain level of complexity, they become too large to manage in a single phase. The cost and complexity of these large projects require engaging external expertise that can provide the guidance and resources necessary to implement enterprise-wide MDM solutions. These large projects may span multiple years and locations, and breaking these projects into multiple milestones and ROI checkpoints is advisable. If these projects can be broken down into more manageable intermediate projects, lessons learned from preceding projects can be applied to later implementations.
Before you build a model in MDS or any other MDM system, it is essential to have a basic understanding of your data and current processes. In this chapter, we provide deeper insight into what is and what is not master data, and we provide some simple labels for your current business systems. Toward the end of the chapter, we discuss some tips and tricks for modeling your data, with the goal of helping you to free your enterprise from many of the data restrictions that have plagued it for so long.
Over the past year, one of the questions most commonly asked by organizations has been, “Is there a definitive line between master data and other important data in my company?” The truth is, there isn’t one and there shouldn’t be. In most companies, master data management is used only for business-critical domains like product or customer because the implementation of MDM solutions throughout these organizations would be so prohibitively expensive. If an accessible solution with a rapid time to value were available, other important domains could benefit from MDM tools as well.
We’ve already talked about a couple of the key differentiators between master data and transactional data within an enterprise, but they are worth repeating: Data that you wish to store in MDS should not relate to a single event in time because that is transactional data. In MDS, you should store information that has a state or continues to have a state for some specific period of time. This information is master data.
Before you build your model in MDS or any other MDM system, it is imperative to understand the flow of data through your enterprise. Where do new accounts, customers, or products originate? What is the process that turns a concept into a product? What is the process to onboard a new customer? How does the data flow from inception to all systems across the organization?
Business process modeling can provide decision makers with the information that they need to prioritize and plan the creation of their MDS implementation. Modeling relevant processes related to the creation and maintenance of domains you are interested in managing is the most logical place to start. Although you could employ skilled business process managers to interview employees and provide detailed designs of their findings, most projects can get by with a do-it-yourself approach.
Your first step should be a quick inventory of any existing documentation of current processes, like the example shown in Figure 3-1. Much of what you require might already exist. Be sure to review any existing documents with current employees in the modeled roles. It is astounding how quickly reality can diverge from the documentation given a little bit of time. The more manual the processes, the more easily these processes can change or be abandoned.
Figure 3-1: Diagram of sample business process model
When you conduct the interviews required to create a business process model yourself, you should complete the following two steps in sequence. Each step provides insight into the investigated domain. Make sure to record the actors, activity, and length of time required for each business process.
Systems can be categorized into three different roles. As different domains have significantly different origination points, it is common for the same system to perform separate roles for different domains within a company.
A system of entry (SOE) is a system where data is first entered in a business. These systems typically include all of the users who are able to create new data. Generally, these are the systems that are used daily by the owners of the type of data being investigated. Many systems of entry require that rules be enforced before data can fully be entered. Oftentimes these rules force much of the accumulated information about the member record to be managed outside of the system.
Think of a new product being created. Often, the inception of a new product is scribbled on a napkin or sketched on a whiteboard. Many of the details of the product will be discussed and decided via e-mail. It is important to capture and manage this process, because creating a workflow and roles around product creation can provide many benefits, like online collaboration and a tracked history of the creative process. The lack of resources to manage this process often impedes the implementation of a new product.
A system of record (SOR) is any system within the organization that is considered the source for other systems. Typically, the enterprise resource planning (ERP) system is the system of record within an organization, but oftentimes many other systems can be considered sources for downstream systems. Many data inconsistencies arise when multiple SORs exist within an organization. While MDS strives to be an adequate SOE for organizations, it is imperative that many if not all of the SORs are moved onto the MDS platform. This is where much of the value of master data management is derived.
As your MDS implementation matures, the most common type of system should be a subscribing system. Subscribing systems are those systems that consume data from another system with no direct user changes to the managed domain. Some larger companies use an intermediate store to pass data to multiple systems. As long as direct changes are not made in any of these systems, all these systems can be considered downstream subscribing systems.
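When you take inventory of your systems, it can help to record the role each one plays per domain, because the same system may play different roles for different domains. The following is a minimal sketch of such an inventory; the system names and domain assignments are invented for illustration.

```python
from enum import Enum

class SystemRole(Enum):
    SYSTEM_OF_ENTRY = "system of entry"
    SYSTEM_OF_RECORD = "system of record"
    SUBSCRIBING = "subscribing system"

# (system, domain) -> role. Hypothetical inventory: note that the ERP
# system plays a different role for Customer than it does for Product.
roles = {
    ("CRM", "Customer"): SystemRole.SYSTEM_OF_ENTRY,
    ("ERP", "Customer"): SystemRole.SYSTEM_OF_RECORD,
    ("Billing", "Customer"): SystemRole.SYSTEM_OF_RECORD,
    ("ERP", "Product"): SystemRole.SYSTEM_OF_ENTRY,
    ("Warehouse", "Product"): SystemRole.SUBSCRIBING,
}

def systems_of_record(domain: str) -> list[str]:
    """List every system that acts as a system of record for a domain."""
    return sorted(
        system
        for (system, d), role in roles.items()
        if d == domain and role is SystemRole.SYSTEM_OF_RECORD
    )

# More than one SOR for a domain is a likely source of inconsistencies.
customer_sors = systems_of_record("Customer")
```

A domain that reports more than one system of record is a good candidate for early attention in your project plan.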
The next phase in the project plan is to determine data sizes and types within the Master Data Services system. Analysis of all source systems’ main tables and their columns to determine the best data type for storing that data can provide an initial roadmap for MDS. There are two main questions that must be asked during this phase: What kinds of internal constraints are placed on this column? Are there any downstream constraints in the organization that require further data cleansing to be performed?
You should start the mapping process by identifying those columns within the source system that you will manage in MDS. For each of these columns, determine a rudimentary data type. In Chapter 4, we discuss the different data types available in MDS, but for now just identify data as date, number, or string.
If you identify a date field, be very careful that this field provides information across multiple systems and is a state of the mapped domain. Oftentimes, date fields are red flags that you have mistaken a system-specific field such as “entry date” for a more important field like “start date” or “discontinued date.” You should be extra wary of any column that requires the storage of a specific date or time. These fields generally signal that the information stored is most likely transactional in nature.
Columns that you identify as numeric should be reviewed as well. Is this column storing a valid state on a record? We have seen many projects in which the designers have been tempted to map in fields such as account balances or sales figures. These numbers are subject to daily change and are best handled as part of a business intelligence (BI) solution. Master Data Services does not support simple math functions or consolidations, in an effort to dissuade users from storing inappropriate data. This does not mean that no numbers should be stored. MSRP, safety stock levels, standard terms, and credit limits are all valid information for MDS. When identifying numeric columns, you should log the precision, or number of decimal places, each column will require. While many systems may not limit this on the backend, the actual required precision should be easy to determine for the primary system owner.
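A first pass over the source columns can be roughed out with a short script. The sketch below assumes you have exported a sample of each column as strings and that dates use the ISO yyyy-mm-dd layout; both the function names and the sample columns are hypothetical. It classifies each column as date, number, or string and logs the decimal places a numeric column needs.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

def rudimentary_type(values: list[str]) -> str:
    """Classify a column's sample values as 'number', 'date', or 'string'."""
    if not values:
        return "string"

    def all_parse(parser) -> bool:
        try:
            for value in values:
                parser(value)
            return True
        except (ValueError, InvalidOperation):
            return False

    if all_parse(Decimal):
        return "number"
    # Assumption: dates in the sample are formatted as yyyy-mm-dd
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"

def required_precision(values: list[str]) -> int:
    """Largest number of decimal places seen in the numeric sample values."""
    return max((max(0, -Decimal(v).as_tuple().exponent) for v in values), default=0)

columns = {
    "msrp": ["19.99", "5", "104.5"],
    "discontinued_date": ["2012-03-01", "2011-12-31"],
    "color": ["Red", "Blue"],
}
column_types = {name: rudimentary_type(sample) for name, sample in columns.items()}
msrp_precision = required_precision(columns["msrp"])
```

A script like this only tells you what the data looks like. Whether a date or numeric column is truly a state of the domain, rather than transactional information, is still a judgment the system owner has to make.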
When mapping a source system to MDS, it may be advisable to create additional tables to store choices for certain fields. Some of these relationships are easy to see. If there is a foreign key relationship within the source system that displays options for one field to another, you should continue to preserve this relationship. For instance, if a customer has a relationship to an address table that stores addresses available to the customer, this relationship can be modeled in MDS.
Some relationships are not quite as evident, though. For instance, a source system may only store specific attributes as text fields, yet valid values for those text fields may be constrained by the business process. If those values are better managed in a separate table, and a foreign key relationship between those tables would be advisable, you want to highlight those relationships now. Determine if any of these relationships may be reused. For instance, within many data sources, there will be a number of fields that map to a choice of either Yes or No. In these cases, you may want to create a single entity to store those valid choices.
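One way to surface these hidden relationships is to look for text columns that hold only a handful of distinct values, and then check whether several columns share the same set of values. The sketch below assumes the source rows are available as dictionaries; the threshold, function names, and sample data are assumptions chosen for illustration.

```python
from collections import defaultdict

def candidate_lookups(rows: list[dict], max_distinct: int = 5) -> dict[str, list[str]]:
    """Find columns whose values repeat and come from a small set."""
    columns = list(rows[0].keys()) if rows else []
    result = {}
    for column in columns:
        distinct = sorted({row[column] for row in rows})
        # Skip columns where every row is unique (names, codes, and so on)
        if len(distinct) <= max_distinct and len(distinct) < len(rows):
            result[column] = distinct
    return result

def shared_entities(lookups: dict[str, list[str]]) -> dict[tuple, list[str]]:
    """Group columns that draw from the same set of values."""
    by_values = defaultdict(list)
    for column, values in lookups.items():
        by_values[tuple(values)].append(column)
    return dict(by_values)

rows = [
    {"name": "Shirt", "taxable": "Yes", "backordered": "No"},
    {"name": "Hat", "taxable": "Yes", "backordered": "No"},
    {"name": "Scarf", "taxable": "No", "backordered": "Yes"},
    {"name": "Belt", "taxable": "Yes", "backordered": "No"},
]
reusable = shared_entities(candidate_lookups(rows))
```

Here both the taxable and backordered columns draw from the same Yes/No choices, so a single entity could serve both.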
Data quality is a major concern for corporations, and the identification and management of duplicate records is a central task in the effort to ensure data quality. Duplicate records exist within organizations for a variety of reasons. One of the biggest reasons employees duplicate records is that they are unaware that records already exist in a separate system, because the systems are not integrated effectively. These records should be merged wherever possible.
Some companies intentionally duplicate records within systems for a specific purpose, typically to work around limitations within current applications to support necessary business processes. In most cases, these duplicates must remain within the business application to continue to provide the workaround they were designed for. Systems that require duplicate member records should never be considered as systems of record. When connecting these subscribing systems to the MDS repository, master data attributes should be updated on all of the downstream duplicate records.
To determine whether an attribute should be managed in MDS, data stewards must decide on the nature of the attribute. Is this a state of the domain? Is this information useful for multiple systems? All numeric and date fields should be evaluated closely. Tracking information should not be managed if it is specific to an application in the organization. Calculated values or balances should not be managed in an MDS system because these values will change over time. Make sure that attributes that store a product’s age are not managed as such but that you manage the product’s date of creation instead.
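The product age example is worth a closer look, because it shows why calculated values make poor master data. In the minimal sketch below (the function name and dates are hypothetical), the creation date is the stored state and the age is derived from it whenever a consumer needs it.

```python
from datetime import date

def age_in_years(created_on: date, as_of: date) -> int:
    """Derive a product's age from its creation date instead of storing it."""
    years = as_of.year - created_on.year
    # Subtract a year if the anniversary has not yet been reached
    if (as_of.month, as_of.day) < (created_on.month, created_on.day):
        years -= 1
    return years

created = date(2009, 6, 15)          # the attribute worth managing
day_before = age_in_years(created, date(2012, 6, 14))
anniversary = age_in_years(created, date(2012, 6, 15))
```

The creation date never needs to be updated, while a stored age would silently become wrong on every anniversary.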
It is important to understand the flow of data within an organization before implementing an MDM system. Any MDM system that does not match the natural flow of data through an organization is doomed to fail, as busy users work in the easiest, most natural manner. Analysis of business processes can provide insight into the flow of related data through the enterprise. All systems within an organization can be classified as systems of entry, systems of record, or subscribing systems.
Many systems within the organization will have data-cleansing needs. It is important to determine what remediation will be acceptable if duplicate records are found. If history must be maintained for duplicate members, a mapping between the MDS source and the system should be maintained. In the next chapter, we discuss how to create our models in MDS.
In this chapter, we discuss the creation and customization of MDS models. In MDS, models are the central work surface that will be exposed to the master data editors and consumers once your project goes live. The success of your project will depend mainly on how well you design your model.
We begin with a discussion of modeling concepts. Then we show you how to deploy pre-built models.
Master Data Services is made up of a relatively simple group of concepts. These concepts are encapsulated as data containers and services exposed in MDS. While these concepts are simple to understand by themselves, it is the varied and complex data schemas that these concepts support that provide both the power and complexity of MDS. Before we delve into the implementation of the model objects, let’s review general definitions of these MDS concepts. Each of these concepts is described in more detail later in the chapter.
Master Data Services begins as a blank canvas, allowing you to create your data models within the product in any way you choose. The model structure is created in the System Administration functional area of the Master Data Manager web application, or by using the web services. As you create the structure, you can open the Explorer functional area of the UI to see the results of your work. The Explorer functional area is where users will go day to day to manage their master data.
Opening the System Administration area of the Master Data Manager web UI for the first time can be a daunting experience. In the latest release, the System Administration area defaults to the model creation screen if you have not loaded any models. When you first create an MDS database, only the Metadata model is created. Figure 4-1 below shows this first model as it appears when you open it in the Model View screen. This is the only system model within Master Data Services and has been marked for deprecation in a future release. While this model can still be accessed from System Administration or Explorer as its own model, all access to the Metadata features from your own models has been eliminated through the UI. Access to these features will need to be completed through the web services for Metadata.
Figure 4-1: The default Metadata model that is created during installation
Notice that the menu bar and page title of the System Administration functional area both contain the word “Explorer.” If you go back to the home page, you’ll see that there is an Explorer functional area, where you will manage master data. This is not the same as the System Administration area, where you use the Explorer page to manage your model structure. Most of the time, when we mention Explorer, we are referring to the functional area that you access from the home page of the Master Data Services Web UI.
If you are a new user of MDS, you have two avenues for creating your first useful model within the product. You can load a sample model deployment package into your database and modify the model from there, or you can build your model from scratch.
Starting with a predefined model can be the easiest way to get started with SQL Server MDS. MDS includes three sample models in the installation. These samples are very basic Customer, Product, and Account models, located in Program Files\Microsoft SQL Server\110\Master Data Services\Samples\Packages. You can take a look at them to get a better idea of some of the more common features of the application.
While these models provide some data and attributes and support the documentation provided in Microsoft Books Online for SQL Server 2012, they are relatively limited in their functionality and do not provide adequate coverage for the exercises listed in this book. However, these models include sample data, hierarchies, business rules, and versioning, so you can see what the end result of all your upcoming work might look like.
A model deployment package is an XML file saved with a .pkg extension. It includes the model structure, the business rules (Chapter 6), version flags (Chapter 7), and subscription views (Chapter 10). It does not include file attributes (Chapter 4) or user and group permissions (Chapter 9). Model packages can contain the data from a version of the model when created from the MDSModelDeploy.exe command-line executable. Only the model structure can be created or deployed from the web UI in the SQL Server 2012 version of Master Data Services. The new utility and the limitations on data in the UI were added in response to the inability to support all model sizes through the web services.
Model deployment in MDS was designed to provide organizations with two very important capabilities related to managing data models in MDS. The first is the capability to develop, test, and deploy a model within three separate implementations of MDS, while limiting the number of objects that need to be re-created in each environment. This enables IT management to roll out changes en masse and to ensure that the production environment remains operational throughout the process.
The second capability enables organizations to share their model schemas with others. There are a number of efforts across a host of industries to standardize object models. Previously, most organizations would develop data models in a vacuum, creating schemas that solved their current problem most efficiently. They integrated only those applications owned by the organization and central to the solution. IT departments at all of their competitors were doing the same. This led to a wide variety of data models among competitors in the same industry. Some organizations built data models that were flexible, stable, and scalable, but many did not. When consolidation within an industry occurred, acquiring companies discovered similar business processes were hard to integrate due to the vastly different data models that had been developed in isolation.
Following are a couple of advantages to employing common data models within an industry:
MDS supports these efforts in two ways. By not enforcing its own model, MDS ensures there is one less data model to be managed. Model deployment empowers industry leaders to quickly build master data–centric versions of these industry models, providing efficient deployment across their industry. Any customization that needs to be made to these standardized models is still supported in MDS.
As discussed previously, models are the highest level of organization within Master Data Services. Models are nothing but containers of related entities. Only entities within the same model can be related within MDS. This supports a simplified versioning model where all entities within a model are versioned simultaneously. Although many models revolve around a single domain such as customer or product, this is not a requirement. You can combine major entities into a single model if this makes business sense.
When determining whether or not to include an entity in a model, there are a number of factors to consider:
In MDS, all data that is managed by the system is stored in entities. Entities can be loosely thought of as tables in SQL. The data within entities are called members.
MDS’s entities can also support explicit hierarchies. Once explicit hierarchies have been enabled for an entity, an entity becomes far more than a single table. The entity can then support parent and collection members, as well as hierarchy and collection relationships. Hierarchies and collections are discussed in detail in Chapter 5.
Members are the individual records stored in Master Data Services. Members are uniquely identified by the required Code attribute. There are two types of members in MDS: leaf members and consolidated members.
Leaf members are the most granular level of records in an entity and usually represent physical objects within your business. For example, in a Product entity, a leaf member might be Men’s Shirt #602. A leaf member in an Employee entity might be John Smith. A leaf member in a Warehouse entity might be Warehouse-98101.
One notable exception to this rule of thumb applies to any entities managed within the Finance domain. When managing entities associated with Finance, leaf members typically represent those low-level identifiers that transactions can be coded to. If transactions are coded with a store, account, employee, and product, you can be assured that the available values for each of these attributes should be stored as leaf member records in the associated entities.
Consolidated members are used only in explicit hierarchies, and will be discussed in more detail in Chapter 5.
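To make the relationship between entities, members, and the Code attribute concrete, here is a minimal sketch in Python. The `Entity` and `Member` classes are hypothetical illustrations, not part of any MDS API, and the sketch assumes that a Code must be unique across the whole entity.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    code: str            # required; identifies the member within its entity
    name: str
    kind: str = "leaf"   # "leaf" or "consolidated"


@dataclass
class Entity:
    name: str
    members: dict = field(default_factory=dict)  # Code -> Member

    def add(self, member: Member) -> None:
        # Reject members without a Code, and Codes that are already in use.
        if not member.code:
            raise ValueError("Code is required")
        if member.code in self.members:
            raise ValueError(f"duplicate Code: {member.code}")
        self.members[member.code] = member


product = Entity("Product")
product.add(Member(code="602", name="Men's Shirt #602"))
```

Adding a second member with the Code `602` raises an error in this sketch, which mirrors the idea that the Code is what uniquely identifies a member.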
All entities within Master Data Services can be enriched by the creation of additional attributes. MDS supports a subset of the attributes supported in SQL Server. MDS supports four specific types of free-form attributes for leaf and consolidated entity members: text, datetime, number, and link. Text and link are essentially string fields, with link providing one-click support for hypertext links. Number supports as many as seven decimals, and datetime allows you to specify the mask for how the data will be input.
In order to relate two entities and ensure that values are constrained to specific values, MDS gives users the ability to create domain-based attributes. These attributes limit available values to the list of active members within the related entity.
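The constraint that a domain-based attribute imposes can be sketched as a simple membership check. The function names and the `active` field below are assumptions made for illustration; MDS enforces this constraint internally.

```python
def allowed_values(related_members):
    """Return the Codes of the active members in the related entity."""
    return {m["code"] for m in related_members if m["active"]}


def is_valid_domain_value(value, related_members):
    """A domain-based attribute accepts only Codes of active related members."""
    return value in allowed_values(related_members)


# Hypothetical Color entity that a Product attribute points to.
colors = [
    {"code": "RED", "active": True},
    {"code": "BLU", "active": True},
    {"code": "GRN", "active": False},  # deactivated member, no longer selectable
]
```

With this data, `RED` is accepted, while `GRN` (inactive) and any Code that does not exist in the Color entity are rejected.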
The last type of attribute available to entities is the file attribute. Within a master data entity, it may be advisable to manage some files associated with each member. For instance, there may be instruction documents, specifications, blueprints, or photos that need to be associated with a Product entity within an organization. However, the file attribute has a number of limitations that make it less attractive to use for file management than SharePoint. First, to limit database sizes of MDS, file attributes will not be versioned. Only the last loaded file will be available within the MDS system. The transaction log will display information related to new files that have been loaded, but these will not be reversible.
Master Data Services can contain every attribute associated with an entity in a business. Because of this, some entities can become unwieldy when viewed as a single table in the web UI.
In a large organization, it is not uncommon for a major entity to contain over 400 attributes. Very few, if any, system users are interested in viewing all attributes simultaneously. Most users are interested in viewing only specific, related attributes at any one time. MDS provides the ability to group attributes into multiple tabs in the Master Data Manager web application. MDS calls these groupings attribute groups. Any attribute can be added to any attribute group.
Typically attribute groups are created for each role consuming data within an entity. Different functional areas of the organization will be concerned with different groups of attributes for each entity stored in the application. Access to attribute groups can be managed by applying security for different users or groups at the attribute group level. This will be covered in more detail in Chapter 9.
Businesses are not built only on lists; they require more complex structures for data. Dollars, units, and hours must all be calculated to produce consolidated views of a business. For many years, companies have been using business intelligence (BI) applications to better understand their businesses and discover opportunities for both cost savings and revenue growth. At the heart of these applications are the hierarchies that make consolidations possible.
When managing consolidations within a business, it is imperative that all values are accounted for once and only once. To ensure this, hierarchies in Master Data Services enforce that all leaf members have one and only one parent. This limits your ability to manage many-to-many relationships within MDS and display these relationships hierarchically.
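The "one and only one parent" rule can be expressed as a check over parent-child pairs. This is a sketch of the rule itself, not of how MDS implements it; the function name and data layout are hypothetical.

```python
from collections import Counter


def parent_violations(leaf_codes, relationships):
    """Return {leaf Code: parent count} for leaves without exactly one parent.

    relationships is a list of (parent Code, child Code) pairs.
    """
    counts = Counter(child for _parent, child in relationships)
    return {code: counts[code] for code in leaf_codes if counts[code] != 1}


# "603" is placed under two parents, and "604" has none.
relationships = [("Apparel", "602"), ("Apparel", "603"), ("Sale", "603")]
```

A leaf counted under two parents would be double counted in a consolidation, and a leaf with no parent would be missed, which is why both cases are reported.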
In this chapter, we review the two primary types of hierarchies supported in MDS.
When you’re working with hierarchies, it is important to understand some common terms used across applications. Any hierarchy can be either ragged or level based. Ragged describes hierarchies that support leaf members at multiple levels. A hierarchy is level based if leaf members always exist at the same level, regardless of the number of levels within the hierarchy.
Some business applications do not support ragged hierarchies, so it is important to know the limitations of downstream systems before you create hierarchies in MDS.
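Before sending a hierarchy to a downstream system that does not support ragged hierarchies, you could test it with a check along the following lines. This sketch assumes the hierarchy is available as a child-to-parent mapping with no cycles; the names are hypothetical.

```python
def depth(code, parent_of):
    """Count the number of ancestors above a member."""
    levels = 0
    while code in parent_of:
        code = parent_of[code]
        levels += 1
    return levels


def is_level_based(leaf_codes, parent_of):
    """A hierarchy is level based when every leaf sits at the same depth."""
    return len({depth(code, parent_of) for code in leaf_codes}) <= 1


# Level based: both leaves are one step below the root.
level_based = {"602": "All Products", "603": "All Products"}

# Ragged: "602" sits two levels down, "603" only one.
ragged = {"Shirts": "All Products", "602": "Shirts", "603": "All Products"}
```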
In MDS, derived hierarchies highlight preexisting data relationships within entities and display them hierarchically. In SQL Server 2012, derived hierarchies have been expanded to allow more complex relationships to be managed effectively. It is clear that as the MDS product advances, derived hierarchies will continue to evolve to handle all data relationships, and explicit hierarchies will lose importance. Derived hierarchies are always level based. This means that every level within the hierarchy corresponds to a specific domain-based attribute within the entity–attribute chain (see Figure 6-1).
Figure 1: A derived hierarchy, created by using a domain-based attribute. In the figure, the Gender entity is a domain-based attribute of the Product entity below it; the upper node is a member in the Gender entity, and the nodes beneath it are members in the Product entity.
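The idea behind a derived hierarchy can be sketched as grouping the members of one entity by the value of a domain-based attribute. The data and function below are hypothetical and only illustrate the Gender and Product example; they do not show how MDS stores or builds hierarchies.

```python
from collections import defaultdict


def derive_hierarchy(members, attribute):
    """Group member Codes under the value of a domain-based attribute."""
    levels = defaultdict(list)
    for member in members:
        levels[member[attribute]].append(member["Code"])
    return dict(levels)


# Product members, each pointing to a member of the Gender entity.
products = [
    {"Code": "602", "Gender": "M"},
    {"Code": "603", "Gender": "M"},
    {"Code": "701", "Gender": "W"},
]

hierarchy = derive_hierarchy(products, "Gender")
```

Because each Product member holds exactly one Gender value, every product lands under exactly one parent, and all products sit at the same level.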
Explicit hierarchies are multi-level hierarchies with very few restrictions. Explicit hierarchies are managed as name–value pairs, with consolidated members containing other consolidated or leaf members. Explicit hierarchies are created for one entity at a time. Derived hierarchies, in contrast, require multiple entities.
Consolidated members are almost always theoretical items, whereas leaf members represent physical items. Like leaf members, consolidated members can have attributes assigned to them. In the MDS database, a separate table exists to manage consolidated members and their associated attributes. Consolidated members are available only if an entity is enabled for explicit hierarchies. Each consolidated member can be associated with only a single hierarchy no matter how many explicit hierarchies have been created for the entity.
Derived hierarchies are determined by the structure of the model, and changes to the structure are rare. To illustrate the value of this rigor, consider the following scenario that organizations commonly encounter:
A fictional company devises a regional hierarchy to manage sales. As with most of these hierarchies, this hierarchy begins as a level-based hierarchy, with each level within the hierarchy corresponding to a distinct type of attribute. To store the hierarchy, IT uses the parent-child format from its analysis system.
Over time, this hierarchy is modified by mid-level managers to help them manage their divisions better. Changes are not centrally managed, and new levels are added monthly. John is the manager of the Western region of the company. John has two managers who split duties managing the Southwest division for him: Bill and Margaret. Since Bill and Margaret are splitting the Southwest, John needs to split divisional data in the company’s reports to measure this divisional structure. To do this, he creates two additional nodes within the hierarchy for the Southwest division, SWB (Southwest Bill) and SWM (Southwest Margaret).
Over time, Bill and Margaret move on to other jobs, either moving up in the company or on to other opportunities. Yet the Southwest division continues to be split into SWM and SWB. This is not an isolated occurrence, as many managers make isolated changes to the hierarchy structure. Some of these changes may not be warranted, and others may have a short shelf life. Soon the initial hierarchy is unrecognizable and difficult to manage. The ability to provide managers with rigid derived hierarchies, malleable explicit hierarchies, and focused collections allows BI professionals to provide the perfect tool for each scenario.
|Hierarchy Type||Characteristics|
|Derived||Uses multiple entities. Based on domain-based attribute relationships. Level based. Hierarchy structure is designed in System Administration. Hierarchy members are updated in Explorer.|
|Explicit||Uses one entity only. Consolidated members are used to group other consolidated and leaf members. Ragged. Entity must be enabled for explicit hierarchies in System Administration. Hierarchy structure is designed in Explorer. Hierarchy members are maintained and updated in Explorer.|
Managing data is not just about storing the data in a customized entity; it is also about ensuring that the data is both accurate and complete. Master Data Services provides business rules to achieve this aim. In this chapter, we review business rules.
In many organizations, no matter what the size, the business owners responsible for managing the master data don’t have the technical knowledge needed to implement the related processes. If the business owners don’t know how to use SQL Server or how to code business rule engines, they can be left at the mercy of their IT departments. At the same time, because the IT department has the technical know-how, the burden often falls on them to learn business domains they don’t necessarily need to know. MDS strives to simplify the creation of business rules to empower business users to manage their own data quality.
In MDS, business rules are declarative expressions that govern the conduct of business processes. These expressions are compiled into stored procedures that perform the task of validating the data. The area of the Master Data Manager web application used for business rules was created to empower business users to write relatively complex business rules without knowledge of Transact-SQL.
Business rules are IF…THEN statements. IF certain conditions evaluate to true, THEN perform specific actions. Conditions can be combined using either AND or OR logical operators. These operators can be used to create extremely complex business rules. You can use as many as seven levels for complex conditioning.
Although you have the ability to create complex rules, there are some real benefits in creating multiple, more granular rules. You should consider breaking any rule that uses the logical OR operator into multiple rules. This makes rules easier for other users to read and understand. Multiple rules also allow you to exclude specific rules and to provide more granular notifications. Rules built with the AND operator must be kept together to function as a unit.
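The advice to split an OR rule into granular rules can be illustrated with a small sketch. MDS compiles business rules into stored procedures; the Python `Rule` class below is a hypothetical stand-in used only to show the IF…THEN shape and the effect of splitting.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # the IF part
    action: Callable[[dict], None]     # the THEN part


def apply_rules(member, rules):
    """Run every rule against a member and return the names of rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(member):
            rule.action(member)
            fired.append(rule.name)
    return fired


def set_pending(member):
    member["Status"] = "Pending"


# A single rule would read: IF Color is blank OR Size is blank THEN set Pending.
# Split in two, each rule can be excluded or reported on separately.
rules = [
    Rule("Color required", lambda m: not m.get("Color"), set_pending),
    Rule("Size required", lambda m: not m.get("Size"), set_pending),
]
```

When a member with a blank Color and a populated Size is checked, only the "Color required" rule fires, so the notification tells the user exactly which value is missing.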
Business rules are always applied to attribute values. For example, if an attribute value is blank, you might want to send an e-mail to notify someone or set the value to Pending. Or you might want to update the value of one attribute based on the value of another attribute. Because business rules are applied to attribute values, you should determine which attributes you’re going to work with before you start creating rules. Each time you create a rule, you must select the model, entity, and type of member that contains the attribute you’re looking for.
After you understand the workflow for adding and modifying the data stored in MDS, you must learn how to manage that data over time. For each model within MDS, a historical record of the data can be stored. These data snapshots are called versions.
Each time you create a version of a model, the data for all the entities within that model is versioned at the same time. Only the data is stored; any changes to the structure of the model affect all versions and can create unintended consequences.
Version flags are another important component of versioning in MDS. By assigning flags to versions, integration with other systems can be better managed.
There were no significant changes to versioning in the SQL Server 2012 release. Administrators should continue to use versions to manage snapshots of their models over time.
In order to manage data within your organization effectively, you may be required to create versions of the data stored within a model. In MDS, the data in all entities within each model is versioned simultaneously. A benefit of this design is that you can manage relationships between entities without worrying about time and version. A side effect of this design is that entities in different models cannot interact with one another.
There is often a natural cadence to many data domains. This cadence can help define which entities should be managed in the same model. For example, in many organizations, the accounting department manages structural changes on a monthly basis. As each month ends, everyone in the accounting department goes through standard routines to ensure that the month’s books can be closed properly. Any entities related to this process—accounts, divisions, departments, or other internal business entities that are central to this process—should be managed together, versioned in concert with the month-end process.
In another example, an organization may release products on a quarterly basis. All entities central to the product development process would follow this quarterly versioning scheme. Other entities may not require a versioning scheme at all.
Versions provide a number of benefits to the data management process:
MDS does not version metadata changes. Any changes to the model structure affect all open and committed versions. If an attribute or entity is deleted, for example, all history for that attribute or entity is lost from all versions forever. When you need to maintain historic data, we suggest that you use security permissions to hide attributes or entities instead of deleting them. You can also hide attributes by setting the display width to zero or by not adding them to attribute groups.
When you add an attribute or entity, the model structure is updated in all versions as well. You can add the corresponding data to any version of the model; if you add data to a later version, the structure exists in the earlier versions but the data does not.
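The difference between versioned data and unversioned structure can be sketched as follows. The `Model` class is a hypothetical simplification: one shared set of attributes, and a separate copy of the member data for each version.

```python
import copy


class Model:
    def __init__(self):
        self.attributes = set()  # structure: shared by every version
        self.versions = {}       # version name -> {Code: {attribute: value}}

    def create_version(self, name, copy_from=None):
        # A new version starts as a copy of the data in an existing version.
        source = self.versions.get(copy_from, {})
        self.versions[name] = copy.deepcopy(source)

    def add_attribute(self, attribute):
        self.attributes.add(attribute)

    def delete_attribute(self, attribute):
        # Structural change: the attribute's values vanish from all versions.
        self.attributes.discard(attribute)
        for data in self.versions.values():
            for member in data.values():
                member.pop(attribute, None)
```

In this sketch, editing a value in one version leaves the other versions untouched, but deleting an attribute removes its history everywhere, which is why hiding an attribute is the safer choice when history matters.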
Sometimes users and downstream systems need to be certain that all data has been validated and reviewed. Because MDS allows incomplete members to be added to the system and encourages users to manage the data creation and correction workflow from within MDS entities, it may not be reliable for external production systems to use open or locked versions. Only committed versions ensure that all members in every entity within the model have passed all business rules successfully. Once a version is committed, no additional changes can be made to the data, and the status of the version cannot be changed.
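The guarantees of a committed version can be sketched as a small state machine. The `Version` class is hypothetical; it tracks only whether each member has passed validation, and it assumes the statuses Open, Locked, and Committed mentioned above.

```python
class Version:
    STATUSES = ("Open", "Locked", "Committed")

    def __init__(self, name):
        self.name = name
        self.status = "Open"
        self.members = {}  # Code -> True if the member passed all business rules

    def update_member(self, code, is_valid):
        if self.status == "Committed":
            raise ValueError("a committed version cannot be changed")
        self.members[code] = is_valid

    def set_status(self, new_status):
        if new_status not in self.STATUSES:
            raise ValueError(f"unknown status: {new_status}")
        if self.status == "Committed":
            raise ValueError("the status of a committed version cannot change")
        if new_status == "Committed" and not all(self.members.values()):
            raise ValueError("every member must pass validation before commit")
        self.status = new_status
```

A downstream system that reads only from versions whose status is Committed can rely on every member having passed validation, which is not true of Open or Locked versions.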
Figure 9-1 shows the most common version control workflow.
More master data is stored in Excel than in any other application on the planet. Some smaller companies use this solution because it is cost effective and easy to use. Others use it to store their “homeless” data in worksheets because it is the tool they are most comfortable with. Even the largest organizations with the greatest, most well-designed MDM systems that money can buy struggle with propagation of master data in Excel.
In SQL Server 2012, the Master Data Services team has attempted to harness the power of Excel to give users all the security, audit, and management features of MDS in a package that Information Workers are most comfortable with. In this chapter we will explore working with MDS data in Excel and the benefits of doing so.
Using the combination of Excel and MDS as a data management tool can provide organizations with significant advantages. Excel is a natural platform for staging data from a variety of sources. Its built-in data functionality can be used to parse and cleanse data before loading the data into MDS. The developer interfaces, whether code, macros, or formulas, can provide additional custom automation in a rapid fashion, without voiding any support in the MDS solution.
Many of Excel’s limitations are also addressed by using it in conjunction with Master Data Services. Data sharing, security, and transaction logging have always posed problems that spreadsheet designers have struggled to solve. One of the biggest concerns with storing data in Excel is the lack of security and central management. Using the MDS Add-In for Excel can alleviate many of these concerns.
The Master Data Services Add-In for Excel allows you to load entity data into a worksheet as an Excel formatted table. An Excel table is a good environment in which to review and manage reasonable amounts of data from Master Data Services. Filtering, sorting, and formula functions can be used to review the data in a friendly format without destroying the ability to write back to the server. Data can also be taken “offline,” modified without connectivity, and then published back to MDS when connectivity has been restored.
The MDS Add-in for Excel really changes the game in MDM, providing a well-known and efficient interface for all members of the enterprise to load and manage their data. The intuitive interface can create new entities, thereby eliminating or reducing costly modeling phases of MDM projects. In Chapter 11, the new modeling process will be discussed in greater detail.
By the time you are ready to implement security, most of the functionality in MDS has been enabled. Models have been built and refined, rules have been written, and the application has been integrated into your organization.
In order to deploy MDS, everyone needs access to the application. While transaction management can provide some accountability, limiting access based on needs and roles ensures that users are unable to change data without authorization. Limiting the number of models and functions available to users can also help them focus more quickly on the data they need. The ability to provide specific data access within the MDS system is the single most important feature of the application. The focused security access provided by MDS empowers business users and frees the IT organization to manage the overall process, not maintain the individual data points.
In this chapter we provide an overview of the security framework in Master Data Services. We discuss the process of managing user and group permissions and explain the highly customized access that can be granted.
MDS security is broken into three distinct areas:
Model object permissions (which apply to attributes) and hierarchy member permissions (which apply to members) are combined to determine the exact level of security for every attribute value. Figure 11-1 shows how attribute and member permissions intersect so that security can be determined for an individual attribute value.
Figure 1: Model object and hierarchy member permissions are combined to determine permissions for every attribute value.
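One way to picture the intersection is as a lookup that keeps the more restrictive of the two permissions. The sketch below assumes that rule and assumes three permission levels (Deny, Read-only, Update); treat both as assumptions to verify against your MDS release.

```python
# Lower rank means more restrictive.
RANK = {"Deny": 0, "Read-only": 1, "Update": 2}


def effective_permission(attribute_permission, member_permission):
    """Return the more restrictive of the attribute and member permissions."""
    return min(attribute_permission, member_permission, key=RANK.__getitem__)
```

For example, a user with Update on the Color attribute but Read-only on a particular member would, under this assumption, be able to view but not change that member's Color value.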
Security in the prior release of MDS was a double-edged sword. The ability to set security on multiple hierarchies simultaneously provided significant flexibility and complexity, but with so many pieces working together, it was easy for administrators to lose sight of what the effective permissions were for each user. A key effort for the second release of Master Data Services was to simplify the security model. In SQL Server 2012, Attribute Groups are no longer securable; you must set attribute security explicitly on each attribute. Direct hierarchy security has been removed from the security model.
MDS relies on Active Directory for user and group authentication. While all security permissions are stored in the MDS database, no passwords or group memberships are managed in MDS.
To keep security as simple as possible, you should do the following:
Before you begin working with security, you should take some time to determine which groups your users might be part of, and which attributes or members those groups might need access to. The following list should give you a general idea of the groups that you might use.
All of these groups will be able to access MDS after being assigned functional area and model object permissions. You might also assign hierarchy member permissions to a few select members of the Purchasing group. These users should be able to view products for only the manufacturers they are responsible for.
Even though we recommend that you assign permissions to groups, for the rest of this chapter we’ll refer to permissions that users receive, because at the end of the day, groups wouldn’t mean anything if users weren’t in them. Users are ultimately the ones who will access the data.
In MDS there are two types of administrators:
Functional area security determines which of the five functional areas on the Master Data Manager home page a user or group can access. Security at this level is either permitted or denied. If permission to access a specific functional area is denied, the area is not displayed in the web UI and related web service operations are denied.
The Explorer functional area is where users manage data. When you assign access to Explorer, you must assign access to specific model objects, so the user gets access to a specific set of data. When you assign access to any of the other functional areas, the user must have access to the entire model (on the Models tab) in order to use those areas. Without this access, the user can open the functional areas, but no models are displayed. This is how MDS handles permission for Administrators.
Model object permissions, assigned on the Models tab, are required. Users cannot view any models or data if they do not have model object permissions.
When you give users permission to model objects, you are giving them the ability to edit attributes for members, based on the object you select. For example, if you set Update permission on the Product entity, all attributes for all Product members (leaf and consolidated) can be updated. If you set Update on the Color attribute of the Product entity, only the Color attribute can be updated.
In addition to giving a user the ability to update attribute values, if you assign Update model object permissions to a model, entity, or to the word “Leaf” or “Consolidated,” the user can also create and delete members. If permissions are assigned at a lower level, the user cannot create and delete members.
Note: Permissions automatically cascade to all child objects within the current model unless permissions are assigned at a lower level. You do not need to explicitly set permission on every object.
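The cascading behavior described in the note can be sketched as a walk from the most specific object up to the model, stopping at the first explicit assignment. The path layout (model, entity, member type, attribute) and the function name are hypothetical.

```python
def resolve_permission(path, assigned):
    """Return the nearest explicitly assigned permission for a model object.

    path is a tuple such as ("Product", "Product", "Leaf", "Color"),
    read as (model, entity, member type, attribute).
    """
    for end in range(len(path), 0, -1):
        permission = assigned.get(path[:end])
        if permission is not None:
            return permission
    return None  # nothing assigned anywhere up the chain


assigned = {
    ("Product",): "Update",                               # set on the model
    ("Product", "Product", "Leaf", "Cost"): "Read-only",  # lower-level override
}
```

Here the Color attribute inherits Update from the model, while Cost keeps its own Read-only assignment. An object in a model with no assignments resolves to no permission at all.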
If you assign Update model object permissions to the model only, the user is an administrator, which means he or she can access the model in functional areas other than just Explorer if given access to those functional areas.
Things to remember about model object permissions include the following:
There are many different model objects you can assign permission to. Giving access to specific models, entities, or attributes should fulfill most of your security needs. In SQL Server 2012, most complicated security configurations have been simplified with the removal of attribute group and hierarchy object security.
Although the ability to manage data is an important feature of Master Data Services, the ability to export data to other systems in the enterprise is equally if not more important. MDS has simplified the export process by providing subscription views that you can create on any entity or derived hierarchy object within the MDS system.
In this chapter, we describe the different types of export views that can be created and the format of each of these views. While Master Data Services provides no direct Extract, Transform, and Load (ETL) features of any kind, the Integration Management functional area of the Master Data Manager web application (or “web UI”) can facilitate your integration processes by providing a wide variety of views to assist in the loading of downstream systems.
Master Data Services can provide value for a project that keeps the data locked in its entities by providing process around the data management; however, to meet the operational or analytical needs of a master data management project, organizations need to transport stored data downstream to subscribing systems. To insulate organizations from the complex object model necessary to manage performance and the customization necessary within MDS, the developers created a subscription view layer.
You can create subscription views within the Integration Management functional area of the web UI, or by using the web service. The web service refers to these views as export views, but we’ll use “subscription views” and “export views” interchangeably.
The following table shows the available subscription view formats. When you create a subscription view, you have to choose which format you want to use. There are two major types of subscription views within MDS: attribute views and hierarchy views. Attribute views display the data stored for leaf, consolidated, or collection members in an easily consumable tabular view. Hierarchy views provide relationship data for all types of relationships in MDS, whether explicit or derived hierarchies or collection members. Collection member relationships can only be displayed in a parent-child format. An additional view is available for derived and explicit hierarchies. It contains a row for each member and the parentage all the way to the top consolidation in the hierarchy; this view is considered level-based.
|View Format||Description of View|
|Leaf attributes||Shows leaf members and their associated attribute values|
|Consolidated attributes||Shows consolidated members and their associated attribute values|
|Collection attributes||Shows collection members and their associated attribute values|
|Collections||Shows collections and their members in a parent-child format|
|Explicit parent child||Shows explicit hierarchy structures for an entity in a parent-child format|
|Explicit levels||For the entity, shows all members in all explicit hierarchies in a level-based format|
|Derived parent child||Shows all derived hierarchy members in a parent-child format|
|Derived levels||Shows all derived hierarchy members in a level-based format|
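The difference between the parent-child and level-based formats can be illustrated by converting one into the other. This sketch assumes a list of (parent, child) pairs with no cycles and produces, for each member, its parentage from the top consolidation down; the function name and sample data are hypothetical and do not reproduce the actual view columns.

```python
def to_level_rows(parent_child):
    """Return {member: [top, ..., member]} from (parent, child) pairs."""
    parent_of = {child: parent for parent, child in parent_child}
    members = {code for pair in parent_child for code in pair}
    rows = {}
    for member in members:
        path = [member]
        # Climb until a member with no parent (the top) is reached.
        while path[-1] in parent_of:
            path.append(parent_of[path[-1]])
        rows[member] = list(reversed(path))
    return rows


pairs = [("All", "West"), ("West", "Southwest"), ("All", "East")]
rows = to_level_rows(pairs)
```

A parent-child view suits systems that walk the tree themselves, while a level-based row such as `["All", "West", "Southwest"]` suits systems that expect one column per level.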
Many of the columns in Master Data Services’ subscription views are identical across view types. The columns provide either context for the data displayed in the view or additional system information for the records contained in the view. Subscribing systems can use this information to update a subset of records based on validation status or last updated statistics.
|Column||Description|
|VersionName||The version name for the current version being displayed. If this is a view based on the version name, this value will never change.|
|VersionNumber||The version number for the current version being displayed. If this is a view built on the version name, this value will never change.|
|VersionFlag||The current version flag for the displayed version. If a view is based on a version flag, then this column will remain constant and the VersionName and VersionNumber column values will change as the version flag is moved between versions.|
|EnterDateTime||The date and time the member was first entered into MDS.|
|EnterUserName||The user who initially entered the member into MDS.|
|EnterVersionNumber||The initial version this member was created in.|
|LastChgUserName||The user who last updated this member in MDS.|
|LastChgVersionNumber||The number of the version this member was last changed in.|
|ValidationStatus||The current validation status for the member. Validation status only exists for leaf, consolidated, and collection attribute views, because there is no validation stored for relationship members.|
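The behavior of the VersionName and VersionFlag columns in a flag-based view can be sketched as follows. The function and data are hypothetical; real subscription views are SQL views in the MDS database, and this sketch only mimics the way the columns change when a flag is moved.

```python
def flag_view(flag, flags, data):
    """Return the rows a flag-based view would expose.

    flags maps a flag name to the version it currently points to;
    data maps a version name to its list of members.
    """
    version = flags[flag]
    return [
        {"VersionName": version, "VersionFlag": flag, **member}
        for member in data[version]
    ]


data = {
    "VERSION_1": [{"Code": "602"}],
    "VERSION_2": [{"Code": "602"}, {"Code": "603"}],
}
flags = {"Current": "VERSION_1"}

before = flag_view("Current", flags, data)
flags["Current"] = "VERSION_2"  # the flag is moved to the newer version
after = flag_view("Current", flags, data)
```

The subscribing system keeps querying the same view; only the VersionName values and the rows behind it change when the flag moves.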
Modeling is more art than science. You must understand not only your datasets and how they interact, but also the related systems, the people who use them, and their processes.
Master Data Services is not a data warehouse or a reporting tool. It is a data management tool. This is a difficult distinction for many to understand. Master Data Services should be used to manage and store lists for the business. Security can create boundaries for multiple users. Business rules notify users of issues with the data, while the Web UI, Add-In for Excel, and Web Service give the entire organization access to work on the same datasets. All of these features are designed around managing data changes.
Once data is being managed correctly in MDS, ETL processes can send this data into data marts and warehouses to provide the reports demanded by the business. These systems can focus on the reporting structures, disseminating the information without concern for the data editing and quality processes. In this chapter we will try to provide insight into some of the more common complex problems that we have seen.
MDS is best built from the top down, not the bottom up. As we discussed in Chapter 3, the first step in managing data is to determine the major entities that need to be managed. Unfortunately, many modelers then move directly to the downstream systems to build their models, deriving all columns from these pre-existing systems. This section discusses some significant pitfalls with this approach that you should be aware of.
Whether a system is purchased or created, the data model of that system was developed through a series of compromises. If you were building a system to manage payroll, how many attributes would you add for each employee? Would you add information about certifications they hold? Each system is built to solve a business need. Some of these systems will attempt to solve multiple needs or allow for minimal customization. If you focus on a single system, you run the risk of bringing these compromises into MDS.
A couple of common compromises seen in the field are perpetuating overloaded fields into your MDM system and propagating system column names. Either of these issues can blunt the effectiveness of your MDS implementation, limiting usability for business users.
Many systems have tables that need to have a single column they use as the primary key. To ensure this column is always unique, values in this column may be a combination of three or more distinct values that have been pasted together, typically as a fixed-length combination. Do not bring this as a single column into MDS. Allow users to manage these distinct values separately. If visibility of this value is required, use business rules to concatenate the obscure codes into this overloaded value for the external systems.
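Splitting an overloaded key and rebuilding it for external systems can be sketched as follows. The field names and widths are invented for illustration. In MDS the concatenation would be done with a business rule, so the Python here only shows the logic.

```python
# Hypothetical fixed-length layout of an overloaded key: (name, width).
FIELDS = (("Store", 3), ("Department", 2), ("Item", 4))


def split_key(key):
    """Break a fixed-length composite key into separately managed values."""
    parts, position = {}, 0
    for name, width in FIELDS:
        parts[name] = key[position:position + width]
        position += width
    return parts


def build_key(parts):
    """Concatenate the separate values back into the key external systems expect."""
    return "".join(parts[name] for name, _width in FIELDS)
```

Users maintain Store, Department, and Item as separate attributes, and the overloaded value is derived only when a downstream system needs it.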
Most systems try to use standard naming conventions and technical shorthand to name tables and fields within the database. Ensure that you adapt these names to be easily readable by your end users. This might apply to allowable values within some fields as well. Create an entity and provide useful names to each of these unique values.
Each system within your enterprise comes pre-configured with some built-in nomenclature and culture. If you look at these systems at the database level, where table and column names have been abbreviated, determining meaning and use can be difficult. This can lead to confusion about how different systems or departments manage similar attributes. While architecture diagrams of these systems can help lead the conversation, working directly with the business users is the best way to ensure you accurately model your business.
When we rely on the existing structures of systems within the organization, we may find ourselves modeling scenarios the business does not require. This can be due to changes in the business structure over time or an incomplete understanding of the problem when the system was implemented. It is important to take advantage of the current mindshare of the business users and to build your MDS model as the business wants it.
With the advent of the MDS Add-In for Excel, Master Data Services has revolutionized the modeling process. In the past, there was a gulf between modelers who read and create data models and the rest of the business. The best way to bridge this gulf is to design a model, review it with the users, receive feedback, and repeat the process. With competing products, this process usually takes weeks or months for each iteration, whereas with the Add-In for Excel, a team can create a fully populated model in a couple of days.
Excel is a great intermediate location for data from any source. Many of the datasets currently managed in your organization already exist within Excel. For any data that does not exist in an Excel sheet, you can use the existing data features in Excel to load the data into a workbook. Once the data is loaded into Excel, you can create a new entity by using the Create Entity wizard.
One of the most common concerns when helping data stewards model data in MDS is how to handle history within the system. In most cases, these concerns are not drawn from operational systems, as these systems typically focus on processing the current state. Analytic systems such as data warehouses or data marts, by contrast, need to store or report information based on changes over time. Most modeling discussions quickly turn to managing these dimension changes in MDS. This is the wrong way to look at it. The role of MDS is to create the changes, not to store them for reporting purposes. We need to look at how to leverage the existing tools in MDS to send these changes to the reporting systems that require them.