It is no surprise that data has seen explosive growth over recent years.
This rapid growth is driven by the billions of computers and smartphones generating data, as well as the devices, appliances and other items that make up the internet of things. As the volume of data has increased, so has its value, because analytics can turn it into critical business insights and better outcomes.
Organizations have rushed to adopt modern data management technologies to analyze massive amounts of data and derive these insights. Yet even with advanced technologies, organizations still hold vast quantities of untapped dark data (information they collect and store but never use) that could reveal additional analytical insights.
Dark data comes in a variety of forms, including information gathered by sensors and records in analog databases still awaiting processing and transformation into digital assets. Whatever its form, none of it is used to derive analytical insights or to inform business decisions.
Most dark data remains unused because it is typically unstructured. Unstructured data comes in formats (e.g., text, images, video) that are hard to categorize, difficult for data management tools to read and impossible to analyze without first converting them into the structured formats that nearly all analytic algorithms require. Many businesses are unwilling to devote the time to data governance or to invest in the resources needed to have that data analyzed.
BRINGING LIGHT TO DARK DATA
Sifting through large amounts of dark data while also digitizing analog databases and migrating them into the data environment requires a comprehensive data governance program. Data governance is about formally orchestrating an organization's people, processes and technology so that all of its data can be leveraged as an enterprise asset.
Data governance allows organizations to shine a light on their untouched cache of dark data and better understand what assets might lurk within it that could be used for business purposes and analytics. By establishing a business glossary, data catalog and data dictionaries, tracking data lineage, managing metadata, and creating uniform policies and procedures for dark data migration, organizations develop a clear-cut framework of responsibilities and standards, which in turn helps all data users better understand dark data.
In addition, data quality scoring and monitoring is a key component of a comprehensive data governance framework. As dark data is created and moves through the data supply chain, its integrity is often unknown. For dark data to become usable data that data consumers trust, it must be subject to the same data controls and quality monitoring that form a core dimension of enterprise data governance.
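The article does not prescribe a particular scoring method. As a minimal sketch, assuming quality is scored as the share of records that pass simple business-rule checks, the Python below computes a pass rate per rule plus an overall score (the share of records passing every rule), then flags any score that falls under an alert threshold as a basic monitoring step. The rule names, field names, sample records and threshold are all hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical record type: one row of data being assessed.
Record = dict[str, Any]


@dataclass
class QualityRule:
    """A named business rule that a single record either passes or fails."""
    name: str
    check: Callable[[Record], bool]


def score_dataset(records: list[Record], rules: list[QualityRule]) -> dict[str, float]:
    """Return each rule's pass rate plus an 'overall' score:
    the share of records that pass every rule."""
    if not records:
        raise ValueError("cannot score an empty dataset")
    passes = {rule.name: 0 for rule in rules}
    fully_valid = 0
    for record in records:
        results = [rule.check(record) for rule in rules]
        for rule, ok in zip(rules, results):
            if ok:
                passes[rule.name] += 1
        if all(results):
            fully_valid += 1
    total = len(records)
    scores = {name: count / total for name, count in passes.items()}
    scores["overall"] = fully_valid / total
    return scores


def below_threshold(scores: dict[str, float], threshold: float) -> list[str]:
    """Monitoring step: list (alphabetically) the scores under an alert threshold."""
    return sorted(name for name, value in scores.items() if value < threshold)


# Hypothetical rules and sample records, for illustration only.
RULES = [
    QualityRule("id_present", lambda r: bool(r.get("customer_id"))),
    QualityRule("email_has_at", lambda r: "@" in str(r.get("email", ""))),
    QualityRule("age_in_range",
                lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
]

SAMPLE_RECORDS = [
    {"customer_id": "C1", "email": "a@example.com", "age": 34},
    {"customer_id": "", "email": "b@example.com", "age": 51},
    {"customer_id": "C3", "email": "not-an-email", "age": 200},
    {"customer_id": "C4", "email": "d@example.com", "age": 27},
]

if __name__ == "__main__":
    scores = score_dataset(SAMPLE_RECORDS, RULES)
    print(scores)
    print(below_threshold(scores, threshold=0.6))
```

On the sample data, each rule passes three of four records (a score of 0.75), but only two records pass every rule, so the overall score is 0.5 and is the only one flagged at a 0.6 threshold. Re-running the same scoring as new data arrives is what turns a one-time check into ongoing monitoring.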
Data governance is not a one-off project but an ongoing initiative that evolves and improves over time. Eventually, organizations can layer in analytics to keep strengthening data quality across the organization. By running analytics and machine learning in concert with data quality business rules, organizations can substantially improve the efficiency and effectiveness of their data integrity checks. Achieving this, however, requires a modern data intelligence platform whose capabilities work in sync with one another to maximize analytical insights.
LEVERAGING ADVANCED TECHNOLOGIES TO TRANSFORM DARK DATA AND STRENGTHEN BUSINESS DECISIONS
A successful data governance program that allows businesses to leverage dark data requires a data intelligence platform with capabilities that extend well beyond data governance alone. The solution should include data quality and analytics capabilities that help deliver complete transparency into an organization’s data ecosystem.
The platform should deliver a 360-degree view of an organization’s data landscape, covering each asset’s availability, owner or steward, lineage and usage, as well as its associated definitions, synonyms and business attributes. It should allow all data users to easily define, track and manage every aspect of their data assets, giving them confidence in every business decision they make.
The solution needs to integrate data quality and data preparation on a foundation of governance, and to promote a community approach that bridges the business-to-technical divide by translating IT’s technical jargon into easy-to-understand terms. Bridging this divide fosters collaboration that builds trust among data producers, enablers and consumers by clearly defining ownership and accountability for every data asset in the organization. All types of users will then be able to answer pertinent questions about data, such as “How is it used?”, “How is it defined?” and “What is its quality?”
The solution suite should feature high-volume data quality checks that verify the quality of data and sustain trust among business users. It should also leverage analytics and apply self-learning machine learning algorithms to continuously improve data quality across the data supply chain. Lastly, it should generate the analytic insights that organizations seek for business and competitive advantage.
An all-inclusive data intelligence platform with a solid data governance framework is the best way to turn your untapped dark data into viable (and valuable) data assets, turning raw information into real insights for your business.
Emily Washington is a contributor to Grit Daily. She is senior vice president of product management at Infogix, where she is responsible for driving product strategy, product roadmaps, product marketing and vertical solution initiatives. Since joining Infogix in 2002, Emily has worked closely with product development teams and customers to drive the introduction and adoption of all new products.