How to manage dark data with intelligent automation

Recent research shows that between 50% and 75% of enterprise data is unstructured and generally inaccessible to enterprise systems. Most organizations source, collate and preserve large volumes of information assets created during business activities purely for compliance purposes. This unstructured and semi-structured data comes from sources such as emails, texts, logs, documents, audio, video, images, invoices, purchase orders and mortgage applications. These assets are referred to as dark data.

Enterprises often use a dedicated entity (for example, a private cloud service provider or an external data partner) to store and secure all their data, including dark data. In effect, an organization pays a premium to guard this data against external adversaries and internal threats without deriving any value from it.

Dark matters

In astronomy, dark matter is considered invisible, inaccessible and one of the most mysterious substances. Gartner draws a comparison between dark matter and dark data: ‘Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets.’ Deloitte, for its part, says, ‘In the context of business data, “dark” describes something that is hidden or undigested.’ In short, dark data is information that is not readily available. The following factors make it hard for enterprises to analyze this hidden data:

  • No control over data-generating devices and apps, due to a lack of coordination between internal departments and third parties
  • No tool to capture and unlock dark data hidden in system silos
  • The sheer volume of data makes analysis difficult
  • Barely 25% of the available data is structured
  • Around 66% of data is missing or incomplete

Despite these challenges, retaining dark data is worthwhile because it often provides qualitative insights into business operations, processes, data quality and more.
However, captured data is often flawed in one form or another, and for customers and partners that translates into serious problems, wasted time and lost business. Enterprises spend massive resources, often hiring in-house data scientists or relying on vendors, to convert this broad and ambiguous data into actionable insights.

From dark data to actionable insights

To turn dark data into actionable insights, it is important to understand data formats and document processing opportunities. Processes that involve readily accessible structured content can be automated using robotic process automation (RPA), which covers UI-level screen scraping and API interactions. For processes that involve semi-structured and unstructured content, automation becomes more complicated because it requires rules engines, artificial intelligence (AI) and machine learning (ML) to transform the data into accessible formats.
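
To make this split concrete, here is a minimal Python sketch that routes incoming content to an automation technique by format. Purely for illustration, it assumes that the file extension is a good enough proxy for the content format. The `Route` names and the extension sets are hypothetical; a production system would classify documents by inspecting their content.

```python
from enum import Enum
from pathlib import Path


class Route(Enum):
    """Hypothetical processing routes, one per content format."""
    RPA = "RPA (screen scraping or API calls)"
    RULES = "rules engine / template extraction"
    ML = "OCR + machine learning"


# Assumption: the file extension is used as a crude proxy for content format.
# A real pipeline would inspect the content itself.
STRUCTURED = {".csv", ".json", ".xml", ".xlsx"}
SEMI_STRUCTURED = {".eml", ".msg", ".html", ".log"}


def route_for(filename: str) -> Route:
    """Pick a processing route based on the file's (assumed) content format."""
    suffix = Path(filename).suffix.lower()
    if suffix in STRUCTURED:
        return Route.RPA
    if suffix in SEMI_STRUCTURED:
        return Route.RULES
    # Everything else (scanned documents, images, audio, video, unknown) is treated as unstructured.
    return Route.ML


for name in ["orders.csv", "support_ticket.eml", "invoice_scan.PNG", "call.mp3"]:
    print(f"{name:20} -> {route_for(name).value}")
```

Defaulting unknown formats to the ML route is a conservative choice. Anything the cheaper routes cannot vouch for goes to the most capable, and most expensive, path.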

Next, enterprises need to map and prioritize automation opportunities so that they get the most out of their data.

Automation opportunities are identified based on input attributes, task complexity, process maturity and the quality of the output data.
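
One way to turn these four criteria into a ranked backlog is a weighted score. The sketch below is an illustration, not a prescribed method. The 1-5 rating scale, the weights and the example processes are all assumptions. Task complexity is inverted so that simpler tasks rank higher.

```python
from dataclasses import dataclass

# Hypothetical weights (summing to 1.0); the criteria come from the text, the weights do not.
WEIGHTS = {
    "input_structure": 0.3,   # how structured and consistent the inputs are
    "task_simplicity": 0.3,   # inverse of task complexity
    "process_maturity": 0.2,  # how stable and documented the process is
    "output_quality": 0.2,    # how reliable the expected output data is
}


@dataclass
class Opportunity:
    name: str
    input_structure: int   # rated 1 (poor) to 5 (good)
    task_complexity: int   # rated 1 (simple) to 5 (very complex)
    process_maturity: int  # rated 1 (poor) to 5 (good)
    output_quality: int    # rated 1 (poor) to 5 (good)

    def score(self) -> float:
        """Weighted score on a 1-5 scale; higher means a better automation candidate."""
        ratings = {
            "input_structure": self.input_structure,
            "task_simplicity": 6 - self.task_complexity,  # invert: simple tasks score high
            "process_maturity": self.process_maturity,
            "output_quality": self.output_quality,
        }
        return sum(WEIGHTS[key] * value for key, value in ratings.items())


candidates = [
    Opportunity("Invoice intake", 3, 2, 4, 4),
    Opportunity("Mortgage application review", 2, 5, 3, 3),
    Opportunity("Purchase order matching", 4, 2, 4, 5),
]
for opp in sorted(candidates, key=Opportunity.score, reverse=True):
    print(f"{opp.score():.2f}  {opp.name}")
```

With these made-up ratings, purchase order matching ranks first (4.20) and the complex mortgage review ranks last (2.10). In practice, each organization would tune the weights to its own priorities.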

Intelligent automation framework

First-generation RPA focuses mainly on structured data, where data extraction is straightforward, yet it usually achieves only 30%-40% straight-through processing (STP), that is, the share of transactions completed with no human intervention. To bring structure to unstructured data, enterprises turn to cognitive automation technologies, which combine RPA with AI and ML. RPA use cases are content-dependent; in cognitive automation models, the idea is to have RPA bots learn from human behavior. To achieve this, vision technologies such as optical character recognition (OCR), document extraction tools, ML, or a combination of these capabilities are used to bring structure to unstructured and semi-structured data.
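
As a simplified illustration of bringing structure to semi-structured content, the sketch below applies rules (regular expressions) to text that an OCR step is assumed to have already produced from a scanned invoice. The regex rules stand in only for the rules-engine part of such a pipeline. The field names, patterns and sample text are hypothetical, and real cognitive automation would add ML models for layouts the rules do not cover.

```python
import re

# Hypothetical rules: one regex per field to pull out of OCR'd invoice text.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.I),
    "invoice_date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.I),
    "total": re.compile(r"Total\s*(?:Due)?\s*:?\s*\$?([\d,]+\.\d{2})", re.I),
}


def extract_fields(text: str) -> tuple[dict[str, str], list[str]]:
    """Apply each rule; return the fields found and the names of fields that were not."""
    found, missing = {}, []
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[field] = match.group(1)
        else:
            missing.append(field)
    return found, missing


# Sample text standing in for OCR output (made up for this example).
ocr_text = """ACME Supplies Ltd.
Invoice No: INV-2041
Date: 2023-03-14
Total Due: $1,249.50"""

fields, missing = extract_fields(ocr_text)
print(fields)
print(missing)
```

Fields returned in `missing` are the ones a rules-only approach could not handle. Those are the cases where ML models or a human reviewer would take over.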

The main challenge in automating data extraction is the sheer volume of unstructured dark data. Imaginea recommends a four-step process to extract insight from both structured and unstructured data. The steps are shown below:

Here is a typical automation flow that uses Intelligent Data Processing (IDP):

With this framework, a much higher STP of 60%-80% can be expected.
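
The STP rate can be measured by counting the documents that reach the end of the pipeline untouched. The sketch below shows one common way to decide this, assuming each extraction carries a per-field confidence score and a validation result. The `Extraction` type, the 0.90 threshold and the five-document batch are made up for illustration and are not taken from the framework in this article.

```python
from dataclasses import dataclass

# Assumed cut-off: any field below this confidence sends the document to a human.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class Extraction:
    doc_id: str
    field_confidence: dict[str, float]  # per-field confidence from a hypothetical OCR/ML extractor
    passed_validation: bool             # e.g. totals reconcile, vendor exists in master data


def is_straight_through(ex: Extraction) -> bool:
    """A document goes straight through only if it is valid and every field is confident."""
    return ex.passed_validation and all(
        c >= CONFIDENCE_THRESHOLD for c in ex.field_confidence.values()
    )


def stp_rate(batch: list[Extraction]) -> float:
    """STP rate = documents processed with no human touch / all documents (0.0 if empty)."""
    if not batch:
        return 0.0
    return sum(is_straight_through(ex) for ex in batch) / len(batch)


batch = [
    Extraction("doc-1", {"invoice_number": 0.98, "total": 0.95}, True),
    Extraction("doc-2", {"invoice_number": 0.97, "total": 0.62}, True),   # low-confidence total
    Extraction("doc-3", {"invoice_number": 0.99, "total": 0.96}, False),  # failed a business rule
    Extraction("doc-4", {"invoice_number": 0.93, "total": 0.91}, True),
    Extraction("doc-5", {"invoice_number": 0.96, "total": 0.94}, True),
]
review_queue = [ex.doc_id for ex in batch if not is_straight_through(ex)]
print(f"STP rate: {stp_rate(batch):.0%}")  # 3 of 5 documents
print(f"Sent to human review: {review_queue}")
```

Raising the threshold sends more documents to human review and lowers the STP rate, but it reduces the risk of bad data passing through unchecked. A tuned pipeline has to balance that trade-off.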

The success rate of dark data extraction depends on how the content and document processing funnel is designed. Here is our tried-and-tested approach:

With this approach, organizations are realizing the practical benefits of intelligent automation, which include:

  • Reduced dark data debt
  • End-to-end automation
  • Reduced operating costs
  • Better ROI for automation initiatives

Aggregating data for analytics is only the first step; the real value lies in the capabilities and expertise that data analytics and artificial intelligence bring to the business. Analytical models and AI can give companies the knowledge they need to develop specific value propositions and services for their customers. They also make it easier for companies to use their mobile and other applications to connect with customers and inform business decisions.

Today, companies are embracing automation to create a seamless digital journey and improve their overall operational efficiency. The major use cases are given below:

Illuminate dark data

Unused data may hold no value; worse, it incurs maintenance costs. Gaining visibility into organizational data is very important in today’s data-centric world, and data extraction is the first step towards insights into multiple facets of a business. While RPA is a promising approach to data extraction, it is still in shallow waters. Cognitive RPA takes it into the deep sea, powered by smart systems that promise to transform the automation landscape. To tap the hidden treasure of dark data, enterprises need to understand how to progress from the nascent level to the cognitive level with intelligent data processing, and so drive ROI.

