Hybrid data architecture for modern enterprise

Case study

Hybrid data architecture for modern enterprise

Today’s technology has come a long way from just showcasing, ‘Frequently bought together’ items at the moments of intent to data-activated marketing. With data-activated marketing, modern enterprises can anticipate user expectations and proactively nudge them in the right direction by offering irresistible deals. This type of engaged user experience needs extensive data management. If enterprises fail to renovate or reinvent their data architectures, they lose customers and market share.

Modern data architecture prioritizes data consumers over the data sources. Outcome of data architecture is not just defining the blueprint of how the data needs to be managed to and from various components within a system; it’s much more than that.

Our client offers a through-channel marketing automation platform that enables smart marketing automation and partner management. This end to end SaaS platform helps leading brands manage all aspects of national and local marketing, which includes local sales partner management, marketing campaigns, budget and fund allocation.

What started as a pilot version of the platform written in PHP backed by PostgreSQL (RDBMS), helped them grow their business in five years without any issues. As the product gained traction, the technical team found it difficult to scale up.

Requirements

The old data architecture was designed to create a set of relational tables, when a brand signs up on the marketing automation platform. The number of tables created was directly proportional to the number of brands and their target audience user base. When they had roughly 4000 brands as clients and 450,000 tables in the production environment, managing petabytes of data became an issue. Further, scaling up by embracing the existing architecture would not help and identifying the right solution was the need of the hour.

Challenges

Our client wanted to fix their scalability issues through the complete transition to microservices architecture, an approach successfully adopted by Netflix. Adopting such an approach would cause disruption to their day-to-day operations, as their offerings continued to grow.

Solution

Imaginea proposed a hybrid data architecture (schema and schema-less) design, which also involves a two phased architectural change. With this approach, there would be minimal disruption to their existing environment, at the same time offer maximum scalability. The two phases are:

  • Phase 1 – Renovation: Fix the core issues in the data architecture by moving the schema-less portion of the data to NoSQL capability of PostgreSQL.
  • Phase 2 – Reinvention: Implement microservices architecture and cloud adoption.

Tech stack

How Our Solution Helped

Number of data tables drastically reduced from 450,000 to just 50 with improved performance and unprecedented data scaling.

Overall approach

Imaginea had come up with the next generation data architecture by carefully evaluating and applying all the principles of data architecture such as policies and governance for the business in the context. Here is the 3 pillar approach that fixed scalability issues:

1. Dynamic Columns with JSONB

Instead of creating tables and indices on the fly to support the dynamic columns, we managed it by leveraging the JSONB capability of PostgreSQL. The GIN index created on the JSONB column was the ultimate query performance tuner.

Audience

ID LIST_ID FIRST_NAME LAST_NAME JSONB Data (Brand Defined Attributes in Single Column)
1 1 Vishy Anand { “blitz”; 2786, “rapid”; 2737 }
2 1 Garry Kasparov { “classic”: “blitz”: 2801, “rapid”: 2783 }
3 2 Cristiano Ronaldo { “nationality”: “Portugal”, “height”: “1.87 m”}
4 2 Lionel Messi { “nationality”: “Argentina”, “height”: “1.7 m”}
5 3 Geetha Phogat { “coach”: “Mahavir Singh”, “height”: “1.62 m”}

2. Partitioning

For databases with extremely large domain objects (represented as Audience, Orders and Snapshots tables), where they had petabytes of data, partitioning them was a crafty solution to improve the query performance.

The benefits of this approach are:

  • It allows the execution of the queries to scan on much smaller tables and indices to find the relevant data.
  • Data Maintenance like dropping/archiving/deleting partitioned tables are easier. For example, to delete obsolete data using older approach, it has to be searched from a larger table and then delete specific data. In the new approach, as tables are partitioned, just delete the table which has the obsolete data.
  • The updates and indices affect only the partitioned smaller table, instead of the huge master table and the respective index.

3. PostgreSQL Server Upgrade

The PostgreSQL server used in the production environment at the time of discovery was 9.1. But in PostgreSQL 10 and later, a new partitioning feature, ‘Declarative Partitioning’ was introduced, which is an easier way to set up partitions.

Verification Phase:

This proposal was simple and fluid and underwent a tough verification phase which proved its effectiveness.

Cloud SQL from GCP was launched with a reasonable configuration (unlike a production DB server that has umpteen number of cores, GBs of RAM and powerful IOPS) and without creating any partition, performance benchmarking was carried out for a minimal load.

Data Load - Audience

# T-Shirt Audience Size
1 S 1,000
2 M 10,000
3 L 100,000
4 XL 1,000,000

All reads were < 2 seconds even for the 100K recipient lists

Results

  • Effective Space Management:
    The tables were drastically reduced from 450K to just 50 tables with the hybrid approach and efficient partitioning.
  • Effective Time management:
    With the partitioning and GIN indexing techniques applied on the JSONB column, the SLA was met.
  • Effective Resource Management:
    No additional cost and maintenance was incurred by introducing new tools, instead it leveraged the resources for the needs effectively.

Talk to us