Actionable intelligence with Hadoop

Millions of people on the Internet leave terabytes of digital footprints. Connecting these dots into a bigger picture is crucial to social media, eCommerce and other internet companies. How do you analyze such vast, unstructured information to yield actionable intelligence? SocialTwist came to us with this question. We answered it using Hadoop Map/Reduce, a software framework for writing applications that process multiple terabytes of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant way.

SocialTwist's marketing platform uses its Tell-A-Friend widget, which serves billions of impressions and records key information in the form of web activity logs. They needed a solution that not only analyzes and reports campaign efficacy data but is also easy to develop and implement on their existing infrastructure.

Hadoop Analytics and Administration

We applied Hadoop Map/Reduce and were able to analyze terabytes of web activity logs in just one-tenth of the processing time. To draw the bigger picture out of these crumbs of information, we built a front-end using Ruby on Rails. It gives detailed information on the underlying tasks and data in the database, and it also serves as a dashboard of vital metrics such as Top Sites and Top Users. SocialTwist can now relay key customer behavior and campaign efficacy information to their clients that much faster.
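To illustrate how a metric such as Top Sites can fall out of a Map/Reduce pass over activity logs, here is a minimal, single-process sketch in Python. It is not the production job: the tab-separated log layout (timestamp, user, site), the sample records and all function names are assumptions made for illustration, and a real Hadoop job would run the map and reduce steps across the cluster rather than in one process.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log layout (an assumption, not SocialTwist's real schema):
# timestamp <TAB> user_id <TAB> site
SAMPLE_LOG = [
    "2010-05-01T10:00:00\talice\texample.com",
    "2010-05-01T10:00:05\tbob\texample.com",
    "2010-05-01T10:00:09\talice\tblog.example.org",
    "2010-05-01T10:01:12\tcarol\texample.com",
    "2010-05-01T10:01:30\tbob\tblog.example.org",
    "this line is malformed",
    "2010-05-01T10:02:00\tdave\tnews.example.net",
]


def map_impression(line):
    """Map step: emit (site, 1) for every well-formed log line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return  # skip malformed records instead of failing the job
    _timestamp, _user, site = fields
    yield site, 1


def shuffle(pairs):
    """Group values by key, standing in for Hadoop's shuffle/sort phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(site, counts):
    """Reduce step: sum the impressions recorded for one site."""
    return site, sum(counts)


def top_sites(lines, n=3):
    """Run map, shuffle and reduce, then keep the n busiest sites."""
    mapped = chain.from_iterable(map_impression(line) for line in lines)
    reduced = [reduce_counts(k, v) for k, v in shuffle(mapped).items()]
    # Sort by count descending, then by site name for a stable order.
    return sorted(reduced, key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    for site, count in top_sites(SAMPLE_LOG, n=2):
        print(f"{site}\t{count}")
```

A Top Users metric would follow the same pattern with the user field as the key. Because each map call looks at a single line and each reduce call at a single key, the work can be split across many nodes, which is what makes the approach scale to terabytes of logs.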

The new HUE — Hadoop and beyond

We also made it simple to administer Hadoop Map/Reduce tasks by modifying HUE (Hadoop User Experience), a web UI for Hadoop started by Cloudera. Applying our expertise in Django, Python and MooTools, we created a fork of HUE that goes further in managing and administering Map/Reduce tasks.

  1. You can plan job execution better by deciding how many nodes you need. There is also a wizard that guides you through the history of job execution.
  2. You get the best of both worlds with connectivity to Apache Hadoop.
  3. Upload support is improved, with uploads to Amazon S3.

With richer information on job execution and an easy-to-use UI, the modified HUE takes administration of Hadoop to the next level, making it easier for developers to get more out of Hadoop in a short period of time.

  • Technology: Hadoop, RoR, Python, Django, Amazon S3, DSS
  • Services: Performance Engineering, Analytics, Reporting
  • Area: Social Marketing