Case study

Hybrid data architecture for modern enterprise

Today’s technology has come a long way from showcasing ‘Frequently bought together’ items at the moment of intent to full data-activated marketing. With data-activated marketing, modern enterprises can anticipate user expectations and proactively nudge users in the right direction with compelling offers. This kind of engaged user experience demands extensive data management: enterprises that fail to renovate or reinvent their data architectures lose customers and market share.

Modern data architecture prioritizes data consumers over data sources. Its outcome is not just a blueprint for how data flows to and from the components of a system; it also encodes the policies and governance that let the data serve the business.

Our client offers a through-channel marketing automation platform that enables smart marketing automation and partner management. This end-to-end SaaS platform helps leading brands manage all aspects of national and local marketing, including local sales partner management, marketing campaigns, and budget and fund allocation.

What started as a pilot version of the platform, written in PHP and backed by PostgreSQL (an RDBMS), carried the business through five years of growth without issues. But as the product gained traction, the technical team found it difficult to scale up.

Requirements

The old data architecture created a set of relational tables whenever a brand signed up on the marketing automation platform, so the number of tables grew in direct proportion to the number of brands and their target audience base. By the time the client had roughly 4,000 brands and 450,000 tables in the production environment, managing petabytes of data had become a serious problem. Scaling up within the existing architecture would not help; identifying the right solution was the need of the hour.

Challenges

Our client wanted to fix their scalability issues through a complete transition to a microservices architecture, an approach successfully adopted by Netflix. But such a wholesale migration would disrupt their day-to-day operations at a time when their client offerings continued to grow.

Solution

Imaginea proposed a hybrid (schema and schema-less) data architecture, delivered as a two-phase architectural change. This approach promised minimal disruption to the existing environment while offering maximum scalability. The two phases are:

  • Phase 1 – Renovation: Fix the core issues in the data architecture by moving the schema-less portion of the data to PostgreSQL’s NoSQL (JSONB) capability.
  • Phase 2 – Reinvention: Implement a microservices architecture and adopt the cloud.

Tech stack

PHP, PostgreSQL, Google Cloud SQL (GCP)

How Our Solution Helped

The number of data tables was drastically reduced from 450,000 to just 50, with improved performance and far greater headroom to scale.

Overall approach

Imaginea arrived at the next-generation data architecture by carefully evaluating and applying the principles of data architecture – policies and governance – in the context of the business. Here is the three-pillar approach that fixed the scalability issues:

1. Dynamic Columns with JSONB

Instead of creating tables and indices on the fly to support dynamic columns, we stored each brand’s custom attributes in a single JSONB column, leveraging PostgreSQL’s native JSONB capability. A GIN index created on the JSONB column was the decisive query-performance tuner (see the sketch after the table below).

Audience

ID | LIST_ID | FIRST_NAME | LAST_NAME | JSONB Data (Brand-Defined Attributes in a Single Column)
1 | 1 | Vishy | Anand | { "blitz": 2786, "rapid": 2737 }
2 | 1 | Garry | Kasparov | { "classic": …, "blitz": 2801, "rapid": 2783 }
3 | 2 | Cristiano | Ronaldo | { "nationality": "Portugal", "height": "1.87 m" }
4 | 2 | Lionel | Messi | { "nationality": "Argentina", "height": "1.7 m" }
5 | 3 | Geetha | Phogat | { "coach": "Mahavir Singh", "height": "1.62 m" }
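
As an illustration, here is a minimal sketch of this pattern; the table, column, and index names are our assumptions for the example, not the client’s actual schema:

```sql
-- One shared table replaces per-brand tables: fixed columns for common
-- fields, one JSONB column for whatever attributes each brand defines.
CREATE TABLE audience (
    id         bigserial PRIMARY KEY,
    list_id    bigint NOT NULL,
    first_name text,
    last_name  text,
    attributes jsonb  -- brand-defined attributes, no fixed schema
);

-- A GIN index makes containment lookups on the JSONB column fast.
CREATE INDEX idx_audience_attributes ON audience USING GIN (attributes);

-- Example: find recipients whose brand-defined nationality is Portugal.
-- The @> containment operator can be served by the GIN index.
SELECT id, first_name, last_name
FROM audience
WHERE attributes @> '{"nationality": "Portugal"}';
```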

2. Partitioning

For the largest domain objects – represented by the Audience, Orders, and Snapshots tables, which held petabytes of data – partitioning was an effective way to improve query performance.

The benefits of this approach are:

  • Queries scan much smaller tables and indices to find the relevant data.
  • Data maintenance such as dropping or archiving partitions is easier: instead of searching for older data in one large table, simply drop the partition where it is stored.
  • Updates affect only the smaller partition and its index, rather than the huge master table and its index.

3. PostgreSQL Server Upgrade

The PostgreSQL server used in the production environment at the time of discovery was version 9.1. PostgreSQL 10 introduced declarative partitioning, which offers a much easier way to set up and manage partitions, so upgrading the server was part of the solution.
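
As a sketch, declarative partitioning looks like this; the orders table, its columns, and the yearly ranges are illustrative assumptions, not the client’s schema:

```sql
-- Parent table: rows are routed to partitions by the created_at range.
CREATE TABLE orders (
    id         bigserial,
    brand_id   bigint NOT NULL,
    created_at date   NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per year; queries filtering on created_at scan only
-- the matching partitions (partition pruning), not the whole table.
CREATE TABLE orders_2019 PARTITION OF orders
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
CREATE TABLE orders_2020 PARTITION OF orders
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- Deleting old data becomes a cheap operation on one partition
-- rather than a slow DELETE against one huge table.
DROP TABLE orders_2019;
```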

Verification phase

The proposal was simple and flexible, and it went through a rigorous verification phase that proved its effectiveness.

Cloud SQL on GCP was launched with a modest configuration (unlike a production database server with its many cores, plentiful RAM, and powerful IOPS), and performance benchmarking was carried out for a minimal load without any partitioning.

Data Load - Audience

# | T-Shirt Size | Audience Size
1 | S | 1,000
2 | M | 10,000
3 | L | 100,000
4 | XL | 1,000,000

All reads completed in under 2 seconds, even for the 100K-recipient lists.
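
As a hypothetical illustration of how such a read can be timed, reusing the audience sketch from earlier (not the client’s actual benchmark harness):

```sql
-- EXPLAIN ANALYZE executes the query and reports the actual runtime,
-- and shows whether the GIN index on the JSONB column was used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, first_name, last_name
FROM audience
WHERE attributes @> '{"nationality": "Portugal"}';
```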

Results

  • Effective Space Management:
    The number of tables was drastically reduced from 450K to just 50 with the hybrid approach and efficient partitioning.
  • Effective Time Management:
    With partitioning and GIN indexing applied to the JSONB column, the query SLAs were met.
  • Effective Resource Management:
    No additional cost or maintenance was incurred from introducing new tools; instead, the solution made effective use of the resources already in place.
