There’s a growing understanding among smart life science organizations that their investments in scientific data can’t be based on a one-and-done approach, where research data is set up for one goal, then discarded afterward. There’s too much future value in reusing the data for other important purposes. This is especially true as machine learning and big data analytics gain momentum, since both depend on discoverable, actionable information. We’ve also seen a burst of activity driven by the COVID-19 pandemic, as companies look to past research in hopes of finding viable repurposing opportunities that can be capitalized on quickly. Others have discovered the dangers of improperly reusing past data, in some cases having to retract scientific articles and suffering damage to their reputations.
One-and-done data design has been the default for most organizations for many years. Individual leads and teams set up their databases and data collection tools with the sole purpose of completing the experiment or trial at hand. The window for finding, accessing or using the data is fixed, and there’s little compelling reason to document the data beyond the immediate team’s use. Groups are often siloed, both by project or use case and by area of focus (oncology, inflammation, etc.). The lead or team may feel that they personally “own” the data, rather than seeing it as an asset of the broader organization. Designing the data environment around the short-term use case alone means the data can never be fully unlocked by the company, and future opportunities are lost.
Fortunately, there is a data design philosophy built to prevent exactly this kind of loss.
Introducing FAIR
The antidote to one-and-done data is FAIR. FAIR is a new design philosophy that ensures future data reusability, even when those future uses are completely unknown at the time. It is a set of data environment principles, implemented at project initiation, to ensure that data being created now remains useful to the broader organization in the future. FAIR also removes the problem of a single experiment leader or team being the only people who understand the data; instead, the data is owned and understood by the organization (e.g., when John quits the company, the wisdom doesn’t leave with him).
FAIR is an acronym for the following characteristics of the data environment:
Findable: The data can be found within the organization and its contents are easily discoverable. If a data set can’t be identified or searched for, it’s essentially lost.
Accessible: Other users must be able to retrieve the data, with appropriate authentication and permissions in place. Data that can’t be accessed might as well not exist.
Interoperable: The data must be able to be integrated and used with other data, even when we don’t yet know what that future data will be.
Reusable: Most importantly, the data must be reusable. Beyond being findable, accessible and interoperable, the data must allow a new user to understand the context, caveats, provenance and conditions under which it was first created.
The cornerstone of FAIR is rich, robust metadata. Metadata (data about data) is what future users rely on to find, access, integrate and understand an archived data set. For example, a data set with an obscure file name and a series of numeric values under vague column headings may be a useless mystery once the original team has moved on; it is effectively unfindable. If the data set were password protected and the credentials lost, the data would be inaccessible. If no descriptive information accompanies the file, or if the data is stored in a proprietary format, the context is missing and the data is not reusable. Suppose the numbers showed how a compound or protein changed over time: without accompanying metadata about the environment, time constraints, instrumentation and experiment set-up, a new user would not be able to reliably use or integrate the data.
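To make that concrete, here is a minimal sketch of a metadata “sidecar” record for a time-course data set like the one just described. The field names, file names and JSON layout are illustrative assumptions for this article, not a prescribed FAIR standard; real implementations typically map such fields to a community schema (for example, schema.org/Dataset or a domain ontology).

```python
import json
from datetime import date

# Illustrative metadata "sidecar" for a time-course assay data set.
# All field names here are hypothetical; a real project would map them
# to a community schema (e.g., schema.org/Dataset or a domain ontology).
metadata = {
    "identifier": "ds-2021-0042",          # unique, persistent ID -> Findable
    "title": "Compound X stability time course, pH 7.4",
    "description": "Absorbance of compound X measured every 15 min over 8 h.",
    "creator": "Assay Development Team",
    "created": date(2021, 3, 15).isoformat(),
    "license": "internal-use",             # terms of access -> Accessible
    "format": "text/csv",                  # open, non-proprietary -> Interoperable
    "columns": {
        "time_min": "elapsed time in minutes",
        "abs_280nm": "absorbance at 280 nm (AU)",
    },
    "instrument": "UV/Vis plate reader",   # experimental context -> Reusable
    "protocol_ref": "SOP-117",             # provenance of the set-up -> Reusable
}

# Store the record alongside the raw data file, so the context travels
# with the data instead of living only in the original team's heads.
with open("compound_x_timecourse.meta.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

The point is not these particular fields but that the record travels with the data, so a future user can find, interpret and integrate it without tracking down the original team.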
Capturing robust metadata does create additional activity and cost up front, which may seem like a distraction to some. It takes time to document data. It’s worth it, though, even if the effort has to be justified outside the scope of the current project.
Being FAIR to your data design: Things to consider
Many organizations are just getting started with FAIR, and others are applying it inconsistently or not at all. Some build their data environments from scratch and bear full responsibility for implementing FAIR principles; others use off-the-shelf tools or a hybrid solution, but as of this writing, most vendors aren’t incorporating FAIR principles consistently. Leaders interested in FAIR should consider the following actions:
Build the company-wide and future perspective: Foster the idea that data should never be one-and-done for a specific study or trial, but should always be framed as a valuable asset that will be reused across the company and into the future.
Learn and advocate the FAIR philosophy: Do some reading and talk to some experts to learn the principles and philosophy of FAIR. Become an expert yourself, so that you can teach others the value of FAIR and effective data reuse.
Lead from the top: It will likely be senior leadership who will first advocate for FAIR, and not the individual research leaders setting up projects. After all, the senior leaders will best understand value for the company and value for the future. Therefore, the senior leaders must drive the change, be it through persuasion or fiat.
Know implications at the technical level: When it’s actually time to implement FAIR, the technologists and set-up team will need to know how FAIR principles translate into concrete technical specifications and requirements; a sketch of what that translation can look like follows this list. Engage experts to help.
Implement always: Don’t make FAIR a selective or occasional process. Insist that it be a requirement in every data generation activity. Institutionalize the practice.
Consult with experts and insist on vendor participation or leadership: Find experts outside of your organization who really know FAIR inside and out to make the FAIR change happen. If vendors are involved, make sure they either know their FAIR or insist that they learn and comply.
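As a hypothetical illustration of the “technical level” point above, the sketch below turns a few FAIR principles into concrete, checkable requirements. The specific rules, field names (reusing the illustrative metadata record from earlier) and approved-format list are assumptions made for this example, not an official FAIR specification.

```python
# Hypothetical translation of FAIR principles into checkable requirements.
# The rules, field names and approved-format list below are illustrative
# choices for this sketch, not an official FAIR specification.
REQUIRED_FIELDS = ["identifier", "title", "description", "license", "protocol_ref"]
OPEN_FORMATS = {"text/csv", "application/json"}  # assumed org-approved open formats

def check_fair_requirements(metadata: dict) -> list:
    """Return a list of violations for a data set's metadata record."""
    violations = []
    # Findable / Reusable: rich metadata must be present, not optional.
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            violations.append("missing required field: " + field)
    # Interoperable: only open, non-proprietary formats are accepted.
    if metadata.get("format") not in OPEN_FORMATS:
        violations.append("non-approved format: " + str(metadata.get("format")))
    return violations

# Example: gate data registration on the checks passing.
record = {
    "identifier": "ds-2021-0042",
    "title": "Compound X stability time course, pH 7.4",
    "description": "Absorbance of compound X measured every 15 min over 8 h.",
    "license": "internal-use",
    "protocol_ref": "SOP-117",
    "format": "text/csv",
}
problems = check_fair_requirements(record)
print("FAIR checks passed" if not problems else problems)
```

Gating data registration on checks like these is one way to “implement always”: the requirement is enforced by the pipeline rather than by memory or goodwill.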
Generally speaking, FAIR is about setting your company up to find unseen value in the future use of data. It’s also a good discipline—a sign that your research data set-up is thoughtful, organized and complete. Be FAIR with your data. You don’t know what the future will bring.