Data Governance: 9 Key Steps to Getting Started

The Following Concepts Summarize the Key Steps in Implementing a Data Governance Program

Step 1: Incorporate Regulatory and Legal Requirements into the Plan

All data governance programs must comply with regulatory and legal requirements. This section discusses the best practices that address general requirements you will or may be required to follow. While specific laws may involve specialized procedures and policies, the following set of guidance applies to key regulations and laws.

Data Integrity

Data integrity best practices help organizations fully understand and trust their data. Any update, addition, or deletion must be tracked and auditable. When data is moved from one system to another, or transformed, the company must ensure that the intended move or transformation is the only change made to the data. Procedures need to be put in place so data remains accurate and complete as it flows through the organization.
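One way to check that a move between systems leaves data unchanged is to compare a fingerprint of each record before and after the transfer. The sketch below is illustrative only: the function names record_fingerprint and verify_transfer, and the sample record layout, are assumptions made for this example and do not come from any particular product.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Return a SHA-256 hash of a record, independent of key order."""
    # default=str lets dates and similar values be serialized as text.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_transfer(source: list, target: list, key: str) -> list:
    """Return keys of records that are missing, extra, or altered in target."""
    src = {r[key]: record_fingerprint(r) for r in source}
    tgt = {r[key]: record_fingerprint(r) for r in target}
    return sorted(k for k in src.keys() | tgt.keys() if src.get(k) != tgt.get(k))


# Hypothetical data: record 2 was altered in transit.
source = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
target = [{"id": 1, "name": "A"}, {"id": 2, "name": "b"}]
print(verify_transfer(source, target, key="id"))  # [2]
```

An empty result means every record arrived intact. Any key returned would be a candidate for the audit trail described above.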

Data Classification

Regulatory and legal bodies understand that requiring companies to proactively manage and protect data is expensive and time consuming. Because of this, many legal requirements focus on sensitive data that presents the greatest risk to the people the company holds data for. Information that identifies individuals such as name, birth date, gender and address is particularly sensitive information and must be classified as one of the highest risks to an organization.

Data Privacy

Data privacy requirements focus on ensuring that only authorized people view data elements, and only for lawful and appropriate use. Best practices apply privacy rules to data classified at the attribute level, such as birth date, address, email, adverse events, and pre-existing conditions. When information about a subject involves attributes that are classified differently, the organization must ensure that protection is enforced at the attribute level. Storage, transformation, and security policies are then applied according to each attribute’s classification.
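Attribute-level protection can be sketched as a lookup from attribute name to classification, followed by masking anything above the viewer's clearance. The classification levels, the attribute names, and the redact function below are hypothetical and chosen only to illustrate the idea. A real program would take them from its own classification policy.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical attribute-level classification.
CLASSIFICATION = {
    "study_site": Sensitivity.INTERNAL,
    "email": Sensitivity.CONFIDENTIAL,
    "birth_date": Sensitivity.RESTRICTED,
    "adverse_event": Sensitivity.RESTRICTED,
}


def redact(record: dict, clearance: Sensitivity) -> dict:
    """Mask every attribute classified above the viewer's clearance.

    Attributes with no classification are treated as RESTRICTED (fail closed).
    """
    return {
        name: value
        if CLASSIFICATION.get(name, Sensitivity.RESTRICTED) <= clearance
        else "***"
        for name, value in record.items()
    }
```

Treating unclassified attributes as restricted is a design choice in this sketch. It means a newly added field stays hidden until someone classifies it.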

Data Security

The organization is responsible for protecting data from theft, unauthorized viewing, and loss of accuracy. Data should be classified so that it is known which data is most at risk and could cause the greatest harm. Furthermore, stringent data security is imperative across all data elements as they flow through the organization. The company will have different requirements for data reporting and sharing, so it is critical that every data sharing request is evaluated against the classification of the data and the security required for that class of data.

Data Retention

While all data is valuable, the organization must have policies and procedures that deal with data retention for information that is no longer required legally. Understanding the data subject’s desires to opt in or out is part of the decision-making process for keeping data beyond its necessary use.
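A retention policy of this kind can be reduced to a small decision rule. The sketch below assumes each record carries a date until which it is legally required and a flag for whether the data subject opted in to continued use. The function name, the parameters, and the three outcomes are hypothetical.

```python
from datetime import date


def retention_action(required_until: date, today: date, opted_in: bool) -> str:
    """Suggest what to do with a record under a simple retention rule."""
    if today <= required_until:
        return "retain: still legally required"
    if opted_in:
        return "retain: subject opted in"
    return "flag for disposal review"
```

For example, retention_action(date(2020, 12, 31), date(2024, 1, 1), opted_in=False) returns "flag for disposal review", leaving the final decision to the policy owner.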

Step 2: Take and Re-take Inventory of Your Data

Your next step is to determine your organization’s data assets. Each department needs to document every data asset, including spreadsheets, presentations, Word documents, lab notebooks and notebook-type applications, emails, small and large applications, individual file folders on employee desktops, and shared folders (including Dropbox, SharePoint, OneDrive, Google Drive, and network shares). This inventory should be updated on a regular basis.

It will become apparent how ungoverned and widespread the company’s data assets are. Not everything can be identified at first, so aim to find key data assets first. The results should be audited on an ongoing basis to ensure that the inventory is accurate and up to date. The process of inventorying all aspects of data assets is a significant undertaking that should not be taken lightly.
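For file-based assets, a first-pass inventory can be automated by walking a shared folder and recording what is found. The sketch below is a starting point only. The extension-to-type mapping and the function names are assumptions, and it covers only files on a mounted file system, not emails or applications.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to asset type.
ASSET_TYPES = {
    ".xlsx": "spreadsheet",
    ".csv": "spreadsheet",
    ".pptx": "presentation",
    ".docx": "document",
    ".ipynb": "notebook",
}


def inventory(root: str) -> list:
    """List every file under root with its asset type and size in bytes."""
    assets = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            assets.append({
                "path": str(path),
                "type": ASSET_TYPES.get(path.suffix.lower(), "other"),
                "bytes": path.stat().st_size,
            })
    return assets


def summarize(assets: list) -> Counter:
    """Count assets per type."""
    return Counter(asset["type"] for asset in assets)
```

Re-running the scan on a schedule and comparing the results with the previous run is one simple way to keep the inventory current.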

Step 3: Establish Data Ownership

One of the most critical goals of a data governance program is the establishment of ownership of key data entities as they progress through the product life cycle. Within Life Sciences companies, this involves drug discovery, research and development, clinical trials, commercialization, and end of patent life. Companies should first establish ownership at an entity/life cycle level, such as product/drug discovery. The product would be tracked throughout the life cycle, assuming the candidate makes it through trials.

Once owners are defined, they can begin to define the boundaries of entity ownership within the life cycle of their data assets (e.g., we own candidate products until they begin clinical trials). They will also authorize data sharing and updating in a controlled and auditable fashion. They will define where the corporate ‘golden record’ is stored and where reporting should be performed from. This process does not need to be onerous.

Step 4: Trace Data Movement Throughout Your Organization

Once the organization has located its data and assigned ownership, the data can be traced as it flows throughout the company and to external entities. Each of these flows can then be managed, removed, or altered based on the perceived value of the data movement. Because the company has assigned ownership to data, it can begin to govern transport. In some cases, the organization will add data movement to new departments or applications to increase the value of the data assets. Each data movement should be assigned a value or risk rating to assess whether it should be retained.
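A register of data flows with value and risk ratings can be kept in a very simple structure. The sketch below assumes a 1 to 5 scale for both ratings and a naive comparison rule. The DataFlow fields, the scale, and the review function are hypothetical, and an organization would substitute its own scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    source: str
    target: str
    owner: str
    value: int  # 1 (low) to 5 (high), assigned by the data owner
    risk: int   # 1 (low) to 5 (high)


def review(flow: DataFlow) -> str:
    """Suggest an action for a flow by comparing its value and risk ratings."""
    if flow.value > flow.risk:
        return "retain"
    if flow.value == flow.risk:
        return "review"
    return "remove or alter"
```

For instance, a flow rated value 2 and risk 4 would be suggested for removal or alteration, which the data owner can then confirm or override.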

Step 5: Track Data Usage and Downstream Business Rules

Once data has been inventoried and its programmatic movement throughout the company has been documented, the organization can track end user data usage and sharing. This allows it to perform change impact analysis when changes are made upstream. It also allows data owners to determine the reporting and analytic business rules for the data owned by a user or department. These business rules should be documented, as they can have a material effect on decision making. Furthermore, an understanding of business rules will aid discussions when disagreements on reporting arise.
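Change impact analysis amounts to finding everything downstream of the system being changed. If documented flows are stored as source and target pairs, a breadth-first traversal is one way to do this. The function name and the system names in the example are hypothetical.

```python
from collections import deque


def downstream(flows: list, changed: str) -> list:
    """Return every system reachable from `changed`, in breadth-first order."""
    graph = {}
    for src, dst in flows:
        graph.setdefault(src, []).append(dst)
    seen, queue, order = {changed}, deque([changed]), []
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:  # guards against cycles and duplicates
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical flows between systems and reports.
flows = [
    ("LIMS", "Warehouse"),
    ("Warehouse", "SalesReport"),
    ("Warehouse", "SafetyDashboard"),
    ("CRM", "SalesReport"),
]
print(downstream(flows, "LIMS"))  # ['Warehouse', 'SalesReport', 'SafetyDashboard']
```

The owners of each system in the result are the people to notify before the upstream change goes ahead.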

Step 6: Share Data Securely and Effectively

One of the best practices of data sharing comes from government entities. Because government agencies are organizationally separate, they typically house their own data solutions. When asked to share data, they will require Data Sharing Agreements. These agreements dictate how the data will be used and provide guidance (and rules of engagement) on further sharing of the data. Most data sharing agreements, for example, do not allow the receiving organization to share the data with any other group. This level of formality may be overly burdensome for an organization, but simply thinking about agreements between departments and how they can use data is a valuable exercise. Some companies will have Data Usage and Sharing guidelines that all key users must sign.

Once the system owners have been identified over the life cycle of the product (R&D vs. Clinical Trials vs. Commercialization) and they set mechanisms or processes to track data usage, the company can begin to craft the solution for sharing data. The first step is to identify a Chief Data Security Officer. This could be the CIO, one of their reports, or someone else in the business. While this role typically falls under corporate IT and acts as the data custodian for the entire organization, it can exist wherever it makes sense. It is important, though, that the position has the appropriate amount of authority in the organization to authorize changes on how data is stored, moved, and shared.

The most critical aspects to sharing data safely and effectively are:

Create data sharing agreements that provide guidance on using data

Classify the data being requested for sensitivity and certification of validity

Assign a data owner that can approve the request

Provide the controls necessary for sharing that classification of data

Keep an audit of the request

Don’t be afraid to share data. It is an enterprise asset, but it must be protected
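The points above can be sketched as a small approval routine: the request is classified, the required controls for that classification are checked, only the data owner can approve, and every decision is written to an audit log. The control names, classifications, and the decide function are hypothetical and serve only to illustrate the flow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical controls required for each classification.
REQUIRED_CONTROLS = {
    "public": set(),
    "internal": {"signed_agreement"},
    "restricted": {"signed_agreement", "encryption", "no_onward_sharing"},
}


@dataclass
class SharingRequest:
    requester: str
    dataset: str
    classification: str
    owner: str
    controls: set = field(default_factory=set)


audit_log = []


def decide(request: SharingRequest, approver: str) -> bool:
    """Approve only if the owner approves and all required controls are in place."""
    missing = REQUIRED_CONTROLS[request.classification] - request.controls
    approved = approver == request.owner and not missing
    # Every decision is recorded, whether approved or not.
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "requester": request.requester,
        "dataset": request.dataset,
        "approver": approver,
        "approved": approved,
        "missing_controls": sorted(missing),
    })
    return approved
```

Recording refused requests as well as approved ones keeps the audit complete and shows which controls most often block sharing.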

Step 7: Ensure that Data Is Accurate

One of the biggest investments the organization can make is the creation of a centralized data hub. The company may choose to implement an active data hub where systems are connected and updates in one system propagate to others as needed. Reporting is typically performed as a federated solution where all the systems contain the information, and the central hub collects and disseminates it. These systems are very complex and can be expensive to optimize. They are typically supplemented with a passive solution such as an Enterprise Data Warehouse.

A data quality framework should be part of building this architecture and will report on key entity and attribute values as they migrate throughout the company. The availability of data quality metrics is critical to business decision-making with advanced analytics. For example, if it is known that the gender field is populated only 30% of the time, the decision to use it in analysis will be very different than if it were populated 100% of the time and had been adjusted to reflect current-day thinking about gender beyond male and female.
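The fill rate in the gender example is a completeness metric, which is straightforward to compute. The sketch below treats missing keys, None, and empty strings as unpopulated. That definition, the function name, and the sample data are assumptions for illustration.

```python
def completeness(records: list, attribute: str) -> float:
    """Fraction of records with a non-empty value for the attribute."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return filled / len(records)


# Hypothetical sample: 3 of 10 records have a gender value.
patients = [{"id": i, "gender": "F" if i < 3 else None} for i in range(10)]
print(completeness(patients, "gender"))  # 0.3
```

Publishing a figure like this alongside each key attribute lets analysts judge whether a field is reliable enough to use.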

Step 8: Actively Look for Competitive Advantage from Data

Once a defined set of processes, procedures, and a data infrastructure is in place, the company will start to gain efficiencies, make more informed decisions, and be in a better position to comply with current and future laws and regulations. The organization can start to identify where its data is unique in the market and where it may have value beyond its current use. When data architecture and governance programs have matured, an organization can gain competitive advantage. Most organizations are still in the awareness phase and playing ‘catch-up’ because they are in aggressive growth phases.

A data governance program typically reveals ways to improve many core processes. If R&D is tied to Commercial, the company can create a feedback loop to better inform the potential pipeline. If the organization purchases or acquires data on current drugs in its pipeline and combines it with sales and marketing information from Commercial data, more informed decisions can be made on where to focus investments. Advancements in Artificial Intelligence can also be applied to examine current products for potential re-use.

There are many ways to take advantage of the investments in a data governance program. Assign working groups that have colleagues across the organization and focus on how to best leverage the data the company has or should acquire.

Step 9: Continue Improving Data Governance Your Way

Most data governance programs fail because they are overly bureaucratic, and the benefits are not worth the effort (at least in the eyes of the key stakeholders). It is key that you understand where your data is, who owns it and who is using it. Seek help in defining your data governance program, but make sure it’s yours and will work in your organization. Data is your most valuable asset and is well worth the time and effort.