Unified Data Platform for Business Intelligence

Case study

Current statistics indicate that 40% of internet users worldwide have purchased goods or services online, whether through a PC, smartphone, tablet, or other internet-enabled device. That amounts to more than 1 billion online buyers, a number that will only continue to grow.

As a result, e-commerce companies experience a massive influx of data. This data can be analyzed to determine the right mix of product descriptions, product listings, and suggestions, making the e-commerce transaction process easier and smoother. For example, consider a user who wants to buy shoes from an e-commerce website. The website can present related information derived from analyzing its data, such as similar shoes bought by other users, popular items in the ‘Shoes’ category, and shoe-related items such as socks and laces.
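The case study does not describe how these related-item suggestions are computed. One common approach is co-purchase counting: items that frequently appear in the same order are suggested together. The following is a minimal Python sketch of that idea, assuming a hypothetical order history; the item IDs and the `co_purchase_counts` and `also_bought` helpers are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each order is the set of item IDs bought together.
ORDERS = [
    {"running-shoes", "socks"},
    {"running-shoes", "laces", "socks"},
    {"sandals", "socks"},
    {"running-shoes", "laces"},
]


def co_purchase_counts(orders):
    """Count how often each pair of items appears in the same order (both directions)."""
    counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_bought(item, counts, k=3):
    """Return up to k items most often bought with `item`, ties broken alphabetically."""
    related = [(other, n) for (a, other), n in counts.items() if a == item]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:k]]


counts = co_purchase_counts(ORDERS)
print(also_bought("running-shoes", counts))  # ['laces', 'socks']
```

Category popularity (for example, top sellers under ‘Shoes’) could be derived the same way by counting purchases per category instead of per item pair.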

However, e-commerce businesses find it difficult to make sense of their data and make informed decisions. The data for each department in such companies originates from various sources (for example, product knowledge is curated from the internet, manufacturer/partner databases, and so on). It is therefore imperative that they use technology to harness such vast amounts of data, present information in a unified manner, and enable both themselves and their customers to make well-informed decisions.

Our client is one of the leading e-commerce companies in the US and provides a secure, hassle-free marketplace platform for people to buy and sell items online. They needed a unified data platform that could manage all of the data requirements stemming from the different departments in the organization. The platform had to:

  • Build integrated product knowledge with unified dashboards to derive actionable product insights.
  • Enable multi-dimensional analysis (MDA) queries that enhance product listings and suggestions.
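To illustrate what a multi-dimensional analysis query computes, the sketch below aggregates a measure over every combination of dimensions, which is the same result a SQL `GROUP BY CUBE` clause or an OLAP cube produces. The dimensions (`category`, `region`), the revenue measure, and the sample rows are hypothetical, and the `cube` helper is illustrative rather than part of the client's platform.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (category, region, revenue).
FACTS = [
    ("Shoes", "West", 120.0),
    ("Shoes", "East", 80.0),
    ("Socks", "West", 15.0),
    ("Socks", "East", 25.0),
]
DIMENSIONS = ("category", "region")


def cube(rows, dims=DIMENSIONS):
    """Sum the measure over every subset of dimensions (like SQL GROUP BY CUBE).

    Dimensions that are not part of a grouping set are reported as "ALL".
    """
    totals = defaultdict(float)
    for *keys, measure in rows:
        for r in range(len(dims) + 1):
            for group in combinations(range(len(dims)), r):
                cell = tuple(keys[i] if i in group else "ALL" for i in range(len(dims)))
                totals[cell] += measure
    return dict(totals)


result = cube(FACTS)
print(result[("Shoes", "ALL")])  # 200.0
print(result[("ALL", "East")])   # 105.0
print(result[("ALL", "ALL")])    # 240.0
```

A dashboard can then answer questions such as "revenue for Shoes across all regions" with a single lookup instead of a fresh scan.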


The client faced the following challenges:

  • As the volume of online transactions kept increasing, there was no reliable way to validate the correctness of items listed by users on the website.
  • As a result, many items (user listings) with incorrect data ended up in the system. These incorrect listings do not generate any revenue for the client.
  • A lack of product knowledge made it nearly impossible to prevent incorrect product listings from being added to the platform.
  • Such ‘noisy data’ made it impossible for the client to obtain relevant insights from the data and, in turn, to make the right business decisions.
  • Multiple ad-hoc pipelines were created to serve specific data requirements from different departments, and managing them eventually became an operational bottleneck.


We designed and implemented a platform that brought together all of the existing data acquisition channels, organized by data type (batch, streaming, etc.):

  • Real-time data ingestion pipelines to ingest data from OLTP (Online Transaction Processing) databases into OLAP (Online Analytical Processing) systems.
  • Batch data ingestion pipelines to ingest data from partner data sources into OLAP systems.

The output is routed to the appropriate storage systems (graph DB, data warehouse, etc.) for downstream systems to process.
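The routing step can be pictured as a lookup from record type to storage system. The following minimal Python sketch assumes a hypothetical `Record` type whose `kind` field selects the sink, with in-memory `Sink` objects standing in for the graph DB and data warehouse; a real deployment would use a stream processor and database clients instead. Both the real-time and batch paths share the same `route` function.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Record:
    kind: str      # hypothetical: "product" (entity data) or "transaction" (fact data)
    payload: dict


@dataclass
class Sink:
    """In-memory stand-in for a real storage system such as a graph DB or warehouse."""
    name: str
    rows: List[dict] = field(default_factory=list)

    def write(self, payload: dict) -> None:
        self.rows.append(payload)


graph_db = Sink("graph_db")
warehouse = Sink("data_warehouse")

# Hypothetical routing table: product entities feed the graph, transactions the warehouse.
ROUTES: Dict[str, Sink] = {"product": graph_db, "transaction": warehouse}


def route(record: Record) -> None:
    """Send a record to the sink configured for its kind."""
    sink = ROUTES.get(record.kind)
    if sink is None:
        raise ValueError(f"no sink configured for kind {record.kind!r}")
    sink.write(record.payload)


def on_change_event(record: Record) -> None:
    """Streaming path: route a single OLTP change event as soon as it arrives."""
    route(record)


def run_batch(records: Iterable[Record]) -> int:
    """Batch path: route a complete partner extract and return the row count."""
    count = 0
    for record in records:
        route(record)
        count += 1
    return count


on_change_event(Record("transaction", {"order_id": 1, "amount": 42.0}))
loaded = run_batch([Record("product", {"sku": "A1", "title": "Trail Runner"}),
                    Record("product", {"sku": "B2", "title": "Wool Socks"})])
print(loaded, len(graph_db.rows), len(warehouse.rows))  # 2 2 1
```

Keeping the routing table separate from the pipelines is one way to let a department's new data type be supported by adding a table entry rather than another ad-hoc pipeline.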

This unified data platform provides a holistic view of the data ingested from disparate sources and subsequently helps build a knowledge base that contains 100% verified information about the listed products.

Tech stack

How our solution helped

Our ML-based product knowledge management platform helped increase user adoption of the client’s e-commerce platform by 25-40%, and adoption is poised to grow substantially in the future.

Overall approach

The following diagram provides an overview of the platform:

The platform was built to maintain and manage data and convert it into usable product knowledge. Its core components are:

  • Data ingestion pipelines:
    • Run batch jobs to ingest data from partner data sources.
    • Gather real-time user data and classify it into different categories.
    • The platform comes with predefined plugins that users can select, configure, and deploy as pipelines instantly.
  • Product knowledge base:
    • We built a web crawler and content scraper to gather information from the web, and used NLP to extract the required product knowledge.
    • Products identified across different sources are merged into one consolidated item in the knowledge base to enable better analysis.
    • We used Named Entity Recognition (NER) to extract attributes such as product title, features, and model, and grouped listings of the same product from different sellers on that basis.
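The grouping step can be sketched as: extract identifying attributes from each listing title, then merge listings that share the same attributes into one consolidated item. In the minimal Python sketch below, a simple keyword-and-regex extractor stands in for a trained NER model, and the brands, titles, sellers, and helper names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical vocabulary and seller listings; in a real pipeline a trained NER model
# would replace the keyword/regex extraction below.
KNOWN_BRANDS = {"acme", "zenith"}
MODEL_PATTERN = re.compile(r"[a-z]*\d+")  # tokens such as "x2" or "5"

LISTINGS = [
    {"seller": "s1", "title": "Acme Trail Runner X2 Men's Running Shoe", "price": 120.0},
    {"seller": "s2", "title": "ACME trail runner x2 - size 10", "price": 115.0},
    {"seller": "s3", "title": "Zenith Glide 5 running shoe", "price": 180.0},
]


def extract_entities(title):
    """Stand-in for NER: return the first (brand, model) found in a title, or None for each."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    brand = next((t for t in tokens if t in KNOWN_BRANDS), None)
    model = next((t for t in tokens if MODEL_PATTERN.fullmatch(t)), None)
    return brand, model


def consolidate(listings):
    """Merge listings with the same (brand, model) into one consolidated item."""
    groups = defaultdict(list)
    for listing in listings:
        groups[extract_entities(listing["title"])].append(listing)
    return {
        key: {
            "sellers": sorted(item["seller"] for item in items),
            "lowest_price": min(item["price"] for item in items),
        }
        for key, items in groups.items()
    }


catalog = consolidate(LISTINGS)
print(catalog[("acme", "x2")])  # {'sellers': ['s1', 's2'], 'lowest_price': 115.0}
```

Listings where no brand or model is found would all share a key containing `None`, so a production system would likely send those to review rather than merge them.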


The platform has delivered the following outcomes for the client:

  • The unified platform has enabled departmental teams to manage their data pipelines much more efficiently.
  • Departmental teams can now monitor data at a much more granular level and draw valuable business insights from the dashboard metrics.
  • Thanks to the curated product knowledge, the client can now guide users more effectively by suggesting valid product options for their listings.
  • Because product listings are tagged with these suggestions or options, buyers can now search for products more precisely.
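Suggesting valid product options can be approximated by fuzzy-matching a seller's free-text title against canonical names in the knowledge base. This minimal sketch uses Python's standard `difflib.get_close_matches`; the product names and the `suggest_products` and `is_valid_listing` helpers are hypothetical, and a production system would more likely match on extracted attributes than on raw strings.

```python
import difflib

# Hypothetical slice of the curated knowledge base: canonical product names.
CANONICAL_PRODUCTS = [
    "Acme Trail Runner X2",
    "Acme Trail Runner X3",
    "Zenith Glide 5",
]
# Lower-cased lookup so matching is case-insensitive but results keep canonical casing.
_BY_LOWER = {name.lower(): name for name in CANONICAL_PRODUCTS}


def is_valid_listing(title):
    """A listing is valid if its title exactly names a known product (case-insensitive)."""
    return title.strip().lower() in _BY_LOWER


def suggest_products(title, n=3, cutoff=0.6):
    """Return canonical product names that closely match a seller's free-text title."""
    matches = difflib.get_close_matches(title.strip().lower(), list(_BY_LOWER), n=n, cutoff=cutoff)
    return [_BY_LOWER[m] for m in matches]


print(is_valid_listing("acme trail runer x2"))  # False
print(suggest_products("acme trail runer x2"))  # ['Acme Trail Runner X2', 'Acme Trail Runner X3']
```

Returning the best matches, most similar first, gives the seller a short list of valid options to pick from instead of accepting a misspelled title.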
