Learning to Love Your Logs
Posted: June 22, 2009 at 5:01 pm | by Jim Pflaging
Saw an interesting column in InfoWorld on “Learn to Love Your Log Files” - http://tinyurl.com/lvhabg
The author, Roger Grimes, highlights a theme that is getting more and more attention: the value of log files. The article gives practical ideas for implementing and running log management systems. He also offers an interesting perspective on how SIEM and log management technologies fit together.
In my opinion, SIEM originated with the vision of being the single pane of glass that separates the signal from the noise. From an architectural perspective, data management was generally an afterthought: events were normalized and the data was discarded after a few weeks. As a result, the initial wave of vendors built their solutions around familiar data management systems such as Oracle databases or flat files. Over time, the reporting requirements became more demanding and the amount of data to be analyzed increased significantly.
The pendulum has swung: data management is now a central buying criterion for a logging or SIEM solution. Compliance might have started this trend, but now security is giving it the next push. Why? Threats are more sophisticated. Insiders don’t generate failed logons, so you need to keep months of valid session detail if you want to find the low-and-slow anomalies. To keep up with these demands, many customers are extending their data retention periods and widening the scope of data analyzed to include ERP applications and credit card and ATM transactions: their most sensitive data.
The implication of these trends is massive data stores and more sophisticated data analysis, even for small firms. Log data repositories can easily reach into the tens of terabytes for small firms and hundreds of terabytes for larger firms. It’s no surprise that for many organizations, security and event data is their largest single data store. As a result, customers are looking at long-term ROI and are pulling their enterprise data warehouse architects into data governance and compliance efforts.
Today, people in diverse roles across the enterprise need immediate access to security and GRC information. Having said that, you can’t trade off accuracy and completeness for ease of use, and the data has to be tamper-proof. The implication is that you need a system that is easy to use AND provides reports and trending information that are 100% accurate. That’s why some vendors who claim to be “Google for Logs” (fast but not 100% accurate) will have difficulty addressing the reporting, forensic, and retention requirements of the log management market.
Check out the article. It is another good contribution to the conversation.
MapReduce Made Easy - The Future of Database Analytics
Posted: June 11, 2009 at 3:01 pm | by Jim Pflaging
I’ve been noticing a lot of discussion online about MapReduce and Hadoop recently. While MapReduce may seem new, implementations have been around for years. Let’s take a closer look.
MapReduce is a software framework introduced by Google to support distributed computing for large data sets on clusters of computers. The objective of MapReduce is to get extremely fast answers from massive amounts of data. In the “Map” step, the master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. Each worker node processes its smaller problem and passes the answer back to its master node. In the “Reduce” step, the master node takes the answers to all the sub-problems and combines them to get the answer to the original problem. A widely used open-source implementation of MapReduce is the Apache project Hadoop.
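To make the two steps concrete, here is a minimal single-process sketch in Python. It is only an illustration of the pattern, not how Google, Hadoop, or SenSage implement it: the "workers" are simulated with an ordinary loop, and the sample log lines and the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are made up for this example. It uses the common key/value formulation, counting log events per user.

```python
from collections import defaultdict

# Hypothetical sample data: the first field of each line is a user name.
SAMPLE_LOG = [
    "alice login ok",
    "bob login failed",
    "alice read payroll",
    "alice logout",
]


def map_phase(chunk):
    """Map: a worker turns each log line in its chunk into a (user, 1) pair."""
    return [(line.split()[0], 1) for line in chunk if line.strip()]


def shuffle(pairs):
    """Group the mapped pairs by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single answer."""
    return {key: sum(values) for key, values in grouped.items()}


def map_reduce(lines, num_workers=2):
    """Split the input into chunks, map each chunk, then reduce the results.

    A real cluster would run map_phase on separate machines in parallel;
    here the chunks are simply processed one after another.
    """
    chunks = [lines[i::num_workers] for i in range(num_workers)]
    mapped = []
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


counts = map_reduce(SAMPLE_LOG)
print(counts)
```

Because each chunk is mapped independently, the map work can be spread across as many nodes as you have; only the grouped results need to come back together for the reduce.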
So are these really new concepts? Not really. Some database systems with MPP architecture have been doing this for quite a while. While MapReduce is powerful, one of its drawbacks has been that each step of the MapReduce operation (filtering, grouping, and aggregation) is a separate, high-level programming abstraction that a developer has to maintain, which increases the total cost of ownership for data management.
SenSage has offered MapReduce capabilities with “in-database” analytics commercially since 2004. You might be saying, “yeah, right.” Well, it’s true. We have over 400 deployed customers and patents to back it up.
We’ve simplified the promise of MapReduce. Namely, we’ve eliminated the hassle of intermediate programmatic effort to produce lightning-fast, in-memory analytics. SenSage combined a few pieces of our intellectual property with our MPP shared-nothing architecture to solve the problem:
- First, the SenSage columnar database supports parallel transformation and partitioning of data. In SenSage SQL, Map is like the group-by clause of an aggregate query. Reduce is analogous to the aggregate function (e.g., average or sum) that is computed over all the rows with the same group-by attribute.
- Second, since day one, SenSage has allowed users to write their own functions in SenSage SQL, which are automatically enabled for parallel execution using our MPP architecture. With Google, Hadoop, and many others, users have to write and maintain their own programs to accomplish the same thing. With SenSage, users write standard SQL and SenSage does the rest.
- Third is “IntelliSchema”, and this is where it gets really cool. IntelliSchema is a SenSage innovation: an abstraction layer between the original data and the analysis tools that enables our MapReduce engine to execute queries successfully even if the underlying data schema changes. IntelliSchema gives our customers the ability to handle a wide variety of data sources and write standardized libraries of analytics while still maintaining the fidelity of the original event data. This allows any data source to automatically appear in relevant queries and reports.
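The group-by analogy in the first point can be sketched in a few lines of Python. This is a rough illustration only: it uses SQLite from the Python standard library and generic SQL, not SenSage SQL, and the `events` table, its columns, and the sample rows are invented for the example. The hand-written version keys each row by user (the Map side) and averages each group (the Reduce side); the SQL version asks for the same result with a single GROUP BY query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical event rows: (username, bytes transferred).
EVENTS = [("alice", 100), ("bob", 300), ("alice", 200), ("bob", 500)]


def average_by_key(rows):
    """Hand-written version: group rows by key, then aggregate each group."""
    groups = defaultdict(list)
    for username, nbytes in rows:
        # "Map": the key plays the role of the group-by attribute.
        groups[username].append(nbytes)
    # "Reduce": the aggregate function (here, average) runs once per group.
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


def average_by_sql(rows):
    """The same computation expressed as one aggregate query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (username TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = dict(
        conn.execute("SELECT username, AVG(bytes) FROM events GROUP BY username")
    )
    conn.close()
    return result


print(average_by_key(EVENTS))
print(average_by_sql(EVENTS))
```

Both functions return the same averages. In the SQL version the grouping and aggregation are declared rather than programmed, which is the kind of saving in hand-written code that the points above describe.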
It’s good to see technologies like MapReduce getting attention in the marketplace. As customers better understand the benefits, they can make more informed buying decisions.