Data Governance has been with us for quite some time, and for years it has regularly been cited as a corporate objective. While there have been some notable success stories, many companies have found it difficult to get Data Governance to stick. One of the main reasons may be that Data Governance is often viewed as a massive undertaking, requiring a great deal of time and effort to agree on, define and document data across the enterprise. This notion is reinforced by many vendors, who advance the idea of ‘holistic data governance’. While there is no doubt that full-on, end-to-end data governance can bring a company many advantages in revenue generation, cost and efficiency improvement, and risk management, there are techniques that can deliver many of those benefits without a full commitment to enterprise data governance.
There is no single, universally accepted definition of Data Governance. Rather, there are many very similar definitions. The Data Governance Institute uses this one:
‘Data governance (DG) refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise.’
As can be seen, the definition refers to several outcomes desired from Data Governance. Companies often take this definition, or a similar one, and define a program that will address all of these outcomes. This often leads to Data Governance being regarded as ‘all or nothing’, and being defined as managing data to a single set of definitions and rules across the enterprise. In many cases, Data Governance has come to mean:
- A single owner for each data domain across the enterprise
- A single definition of data across the enterprise
- Knowledge of end-to-end lineage across all sources of information across the enterprise
This all-encompassing Data Governance is extremely hard to implement at many companies:
- Stakeholders do not understand why they have to change definitions they have been using for years – and in many cases will not change them
- Data Owners are often hard to find and, if appointed, lack the authority or the interest to create a single definition
- Initiatives that are used to adopt Data Governance spend inordinate amounts of time:
  - Locating data stewards across the organization, and attempting to get them to change definitions they have no interest in changing
  - Trying to determine the lineage of data that takes a complex series of hops across the environment, often being altered and updated without documentation
The result is that Data Governance often comes to be seen as a large undertaking that, at best, slows down the initiatives used to adopt it and, at worst, consumes huge amounts of money and effort with little return to many of the stakeholders. This, in turn, leads to Data Governance being largely ignored and ultimately failing to achieve the goals set for it.
But it doesn’t have to be that way. An examination of the definition does not reveal anything that insists on a single definition across the enterprise, or detailed knowledge about data lineage. In fact, all the definition says is that Data Governance is the management of data to achieve certain outcomes. Literally following the definition only means that rules for availability, usability, integrity and security should be established by the business, and Governance should put controls into effect to ensure these rules are followed. In other words, Data Governance has become confused with enacting and adopting business requirements around data as opposed to ensuring these requirements are followed.
Some companies are now defining Data Governance as the controls that ensure the requirements are met, and as a result are finding Data Governance easier and less costly. Rather than pursuing the ‘all or nothing’ data governance described earlier, they are taking an ‘80-20 path’ to Data Governance, and achieving many of the desired outcomes much more quickly and easily. Let’s take a look at how they are doing this, and then at some of the benefits they gain and the issues they will still have to deal with.
The main concept these companies are using is governing data at point of use. A point of use may be a specific repository, or it may be a series of repositories that support a specific business process and have an identifiable process owner. The process owner (i.e., the point of use owner) then has the responsibility to define the rules for that point of use (e.g., to document the meaning, usage, quality and business rules for the data, along with the location the data was acquired from and any locations it is provided to), and to ensure (or certify) that the data is fit for use. Once the rules are defined, Data Governance can help ensure that they are enforced. At that point, the data can be said to be under governance for that specific usage. Obviously, there are many questions that this definition of governance will not answer, but the basics are all present: the data is documented, fit for purpose and able to be checked for quality. For many stakeholders, this is all they are really looking for.
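As a rough illustration of what a process owner might record for one data element at one point of use, the sketch below captures the items listed above (meaning, usage, quality rules, source and targets) and treats the data as fit for use when it is fully documented and passes its own rules. The class name, field names and sample values are all hypothetical; they are not taken from any particular governance tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class PointOfUseRecord:
    """What a process owner documents for one data element at one point of use."""
    element: str
    point_of_use: str
    owner: str
    meaning: str
    usage: str
    source: str  # location the data was acquired from
    targets: List[str] = field(default_factory=list)  # locations it is provided to
    quality_rules: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def is_documented(self) -> bool:
        # Every descriptive field must be filled in before certification.
        return all([self.owner, self.meaning, self.usage, self.source])

    def failed_rules(self, values: Iterable[str]) -> List[str]:
        # Names of the quality rules that at least one value violates.
        values = list(values)
        return [name for name, rule in self.quality_rules.items()
                if not all(rule(v) for v in values)]

    def fit_for_use(self, values: Iterable[str]) -> bool:
        # In this sketch, fit for use means documented and passing its own rules.
        return self.is_documented() and not self.failed_rules(values)


# Hypothetical example: one element as used by a billing process.
record = PointOfUseRecord(
    element="account_id",
    point_of_use="monthly_billing",
    owner="billing process owner",
    meaning="Identifier of the account being invoiced",
    usage="Join key between invoices and customer master",
    source="customer_master",
    targets=["invoice_archive"],
    quality_rules={"not_blank": lambda v: bool(v.strip())},
)
```

The rules belong to the point of use, not to the enterprise, so a second process using the same element would keep its own record with its own rules.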
While governing at this level does not provide all of the solutions of traditional Data Governance, it does move the enterprise towards all-or-nothing data governance should that become necessary. For example:
- If questions arise about why ‘the same’ data differs between reports, and both locations are under governance, the documented definition and usage of the data in each place will provide a path to resolving the issue.
- If there is a requirement to understand a more formal lineage (e.g., for regulatory reasons), the documentation of sources and targets at each point of use can be ‘chained’ together to show a data journey.
- A desire to reach a common language or taxonomy can be enabled by the Data Governance group documenting a common meaning for the same data used in multiple different places without changing the definitions in the sources.
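The ‘chaining’ idea can be sketched in a few lines. The example assumes each governed point of use has documented the single location its data was acquired from; the function name and the repository names are hypothetical. Following those documented sources backwards yields a data journey, and the walk simply stops where documentation runs out.

```python
from typing import Dict, List


def trace_upstream(point: str, documented_source: Dict[str, str]) -> List[str]:
    """Chain documented sources backwards from `point`, oldest location first."""
    journey = [point]
    seen = {point}
    while journey[-1] in documented_source:
        upstream = documented_source[journey[-1]]
        if upstream in seen:  # guard against circular documentation
            break
        journey.append(upstream)
        seen.add(upstream)
    return list(reversed(journey))


# Hypothetical documentation: each point of use names only its immediate source.
documented_source = {
    "regulatory_report": "finance_mart",
    "finance_mart": "general_ledger",
}

print(trace_upstream("regulatory_report", documented_source))
```

No point of use had to trace the full path; each one documented only its immediate source, and the journey falls out of joining those records.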
Adopting governance at point of use introduces several important differences from adopting data governance across the enterprise.
- The concept of the data owner as the ultimate decision maker on data meaning is changed. Now there are multiple process owners of the same data at different points of use, all of whom have their own requirements, which may or may not be the same. For example:
  - Data may be defined as ‘critical’ at some points of use, and not at others
  - Quality requirements may be different. For Financial Services companies, for example, a CUSIP may be accepted with 8 or 9 characters in some processes, but may be required to have 9 in others (see the ‘formal agreements’ bullet point below for a means of resolving this situation)
- Even with different process owners, some requirements are universal and must be inherited throughout the data supply chain. For example, if a client mandates that their data may not be moved offshore, that requirement will be effective for all uses of the data, and steps must be taken to ensure compliance
  - Under these circumstances, all consumers of the data must accept the requirement, and ensure it is passed on to downstream consumers
- It may be advisable to introduce formal agreements between providers and consumers of data. In many cases, data requirements are determined by IT deciding to ‘piggyback’ on data that is already being provided to other consumers. If a consumer is to certify that their data is fit for purpose, they must know that it will meet their requirements. If a specific provider cannot fulfill the requirements, it is up to the consumer to find a way to get their requirements met. However, once requirements are accepted by the provider, it is their responsibility to meet them as they pass along the data.
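One way to picture such an agreement is as a comparison between what a provider is prepared to deliver and what a consumer requires, using the CUSIP length and offshore restriction examples from the list above. This is only a sketch under stated assumptions: the `DataTerms` structure, the `agreement_gaps` function and the process names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DataTerms:
    """Terms one party documents for a data feed (hypothetical structure)."""
    name: str
    cusip_lengths: FrozenSet[int]  # provider: lengths it may send; consumer: lengths it accepts
    restrictions: FrozenSet[str] = frozenset()  # inherited rules, e.g. "no_offshore"


def agreement_gaps(provider: DataTerms, consumer: DataTerms) -> List[str]:
    """List the reasons the consumer could not certify data from this provider."""
    gaps = []
    unacceptable = provider.cusip_lengths - consumer.cusip_lengths
    if unacceptable:
        gaps.append(f"{provider.name} may send CUSIP lengths {sorted(unacceptable)} "
                    f"that {consumer.name} does not accept")
    dropped = provider.restrictions - consumer.restrictions
    if dropped:
        gaps.append(f"{consumer.name} has not accepted inherited restrictions "
                    f"{sorted(dropped)}")
    return gaps


# Hypothetical parties: the provider allows 8 or 9 characters, the consumer needs 9.
provider = DataTerms("trade_capture", frozenset({8, 9}), frozenset({"no_offshore"}))
strict_consumer = DataTerms("settlement", frozenset({9}))
aligned_consumer = DataTerms("risk_reporting", frozenset({8, 9}), frozenset({"no_offshore"}))

for gap in agreement_gaps(provider, strict_consumer):
    print(gap)
```

An empty list means the consumer can rely on the provider as documented. A non-empty list is the starting point for the negotiation the formal agreement is meant to settle.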
So if the full advantages of end-to-end data governance are not gained, and there are still contradictions in meaning and gaps in lineage across the data supply chain, why should a company decide to govern at points of use? The main reason, of course, is that it is simply more practical and less costly, and in many cases the advantages of full data governance are either not needed or not cost-effective. Consider the following:
- It is much easier to find process owners at point of use than data owners for the enterprise. The process owners are also much more familiar with the requirements for their own use than data owners are with all of the uses across the enterprise.
- It is much faster to define data at point of use than to reach consensus across the enterprise. It also makes more sense to the people using the data: they have been working with their definition for years, and can document it without having to agree with many other stakeholders across the enterprise.
- It is much easier to simply document where the data arrived from and where it is going than to trace it backward and forward across the enterprise.
Data Governance at point of use provides a practical path to moving forward with Data Governance in a meaningful way, without the full weight of agreeing and tracing across the entire supply chain. Obviously, it will not fulfill all needs, but it can be seen to move the ball forward. And when there are real requirements for true Enterprise Data Governance such as Regulatory Compliance, the initial steps taken by Governance at point of use can often supply a springboard to meeting them more quickly.