SenSage Blogs
Security Intelligence: essential decision support for security, risk management and compliance operations

MapReduce Made Easy - The Future of Database Analytics

Posted: June 11, 2009 at 3:01 pm | by Jim Pflaging

I’ve been noticing a lot of discussion online about MapReduce and Hadoop recently. While MapReduce may seem new, implementations have been around for years. Let’s take a closer look.

MapReduce is a software framework introduced by Google to support distributed computing over large data sets on clusters of computers. Its objective is to get extremely fast answers from massive amounts of data. In the “Map” step, the master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. Each worker node processes its sub-problem and passes the answer back to its master node. In the “Reduce” step, the master node takes the answers to all the sub-problems and combines them to get the answer to the original problem. One widely used open-source implementation of MapReduce is the Apache project Hadoop.
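To make the two steps concrete, here is a minimal single-machine sketch in Python. It is not Google's or Hadoop's API: the function names (`split_input`, `map_chunk`, `reduce_counts`), the sample log lines, and the use of a thread pool to stand in for worker nodes are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, n_chunks):
    """Master: chop the input into smaller sub-problems."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Worker: solve one sub-problem (count the words in its lines)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Master: combine the partial answers into the final answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, n_workers=3):
    chunks = split_input(lines, n_workers)
    # Threads stand in for worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Hypothetical sample data, not taken from any real log.
LOG_LINES = [
    "login failed for admin",
    "login ok for alice",
    "login failed for bob",
    "logout ok for alice",
]
RESULT = word_count(LOG_LINES)
```

Even in a toy like this, the splitting, per-worker logic, and combining logic are three separate pieces of code that someone has to write and maintain.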

So are these really new concepts? Not really. Some database systems with a massively parallel processing (MPP) architecture have been doing this for quite a while. MapReduce is powerful, but one of its drawbacks is that each step of a MapReduce operation (filtering, grouping, and aggregation) is a separate, high-level programming abstraction that a developer has to maintain, which increases the total cost of ownership for data management.

SenSage has offered MapReduce capabilities with “in-database” analytics commercially since 2004. You might be saying, “yeah right”. Well, it’s true. We have over 400 deployed customers and patents to back it up.

We’ve simplified the promise of MapReduce. Namely, we’ve eliminated the hassle of intermediate programmatic effort to produce lightning-fast, in-memory analytics. SenSage combined a few pieces of our intellectual property with our MPP shared-nothing architecture to solve the problem:

  • First, the SenSage columnar database supports parallel transformation and partitioning of data. In SenSage SQL, Map is like the group-by clause of an aggregate query. Reduce is analogous to the aggregate function (e.g., average or sum) that is computed over all the rows with the same group-by attribute.
  • Second, since day one, SenSage has allowed users to write their own functions in SenSage SQL, which are automatically enabled for parallel execution using our MPP architecture. With Google, Hadoop, and many others, users have to write and maintain their own programs to accomplish the same thing. With SenSage, users write standard SQL and SenSage does the rest.
  • Third is “IntelliSchema”: this is where it gets really cool. IntelliSchema is a SenSage innovation, an abstraction layer between the original data and the analysis tools that enables our MapReduce engine to execute queries successfully even if the underlying data schema changes. IntelliSchema gives our customers the ability to handle a wide variety of data sources and write standardized libraries of analytics while still maintaining the fidelity of the original event data. This allows any data source to automatically appear in relevant queries and reports.
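The group-by analogy in the first point can be checked with a small sketch. This is not SenSage SQL: it uses Python's built-in `sqlite3` module as a stand-in SQL engine, and the `events` table, its `host` and `ms` columns, and the sample rows are all hypothetical. It computes the same per-host average twice, once with hand-written map and reduce steps and once with a single aggregate query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical event data: (host, response time in milliseconds).
EVENTS = [
    ("web01", 120),
    ("web02", 80),
    ("web01", 60),
    ("db01", 200),
    ("web02", 40),
]


def avg_by_host_mapreduce(events):
    """Hand-written version: group by key, then aggregate each group."""
    groups = defaultdict(list)
    for host, ms in events:
        # "Map": emit each value under its key (the group-by attribute).
        groups[host].append(ms)
    # "Reduce": compute the aggregate over all values that share a key.
    return {host: sum(vals) / len(vals) for host, vals in groups.items()}


def avg_by_host_sql(events):
    """SQL version: GROUP BY plays the map role, AVG() the reduce role."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (host TEXT, ms INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    rows = conn.execute(
        "SELECT host, AVG(ms) FROM events GROUP BY host"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both functions return the same mapping from host to average. The difference is that in the SQL version the grouping and aggregation are declared in one statement, and how the work is partitioned is left to the engine.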

It’s good to see technologies like MapReduce getting attention in the marketplace. As customers better understand the benefits, they can make more informed buying decisions.
