Data volumes keep growing rapidly, and the term that gets thrown around most is Big Data. From a holistic standpoint, however, Big Data is not only about big volumes of data (terabytes and petabytes). It also covers the need for faster access to that data and the need to integrate structured data with unstructured content.
The Three V Methodology that we embrace is a comprehensive strategy for addressing all aspects of Big Data. It balances the technical demands of the three dimensions so that the target data architecture satisfies the client’s needs on each of them:
- Volume: Traditional databases usually struggle to scale to terabytes and petabytes of data, so we recommend MapReduce, a programming model that processes huge volumes of data in parallel across a distributed computing architecture. MapReduce originated at Google and is now freely available through Hadoop and other open source implementations.
- Variety: To ingest hundreds of thousands of documents (or social media feeds, or any other unstructured content), we recommend the Hadoop Distributed File System (HDFS), an open source technology that has become the de facto standard for addressing Big Data needs. With our deep experience in enterprise data tools from vendors such as Oracle and Microsoft, we know how best to make use of their support for Hadoop, and how to create a data architecture that combines the best technologies for handling both structured data (databases) and unstructured data (documents).
- Velocity: In the Big Data world, the need is not only to ingest huge volumes of data in real time, but also to extract information from that data equally fast. We use open source technologies such as MongoDB, a document-oriented database that belongs to the family known as NoSQL. Instead of running SQL over fixed tables, it stores and queries flexible documents, which for suitable workloads can make inserts and queries considerably faster than on a traditional SQL database.
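To make the MapReduce idea from the Volume point more concrete, here is a minimal single-machine sketch in plain Python. It is only an illustration of the map, shuffle, and reduce steps, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample search phrases are our own assumptions, and a real framework would run each phase in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine (illustrative only)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for a large document collection.
    docs = ["flu symptoms", "flu shot near me", "cold symptoms"]
    print(word_count(docs))
```

Because each call to `map_phase` looks at one document only, and each call to `reduce_phase` looks at one key only, the work can be split across as many machines as the data requires. That independence is what lets the approach scale to very large volumes.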
In a cool application of the three Vs of Big Data, Google scientists are working with the Centers for Disease Control and Prevention (CDC) to track the spread of flu around the world by analyzing what people type into search. Since 2010, Google has also made this information available on a portal: http://www.google.org/flutrends/.
Another pioneer in this space is IKANOW. Their open source platform, Infinit.e, helps leverage the technologies mentioned above to get business value from Big Data.
Conclusion: Simply focusing on one of the three Vs in isolation isn’t going to solve your Big Data problems. You need to be able to ingest (variety and velocity), scale (volume), query (velocity), and analyze (velocity) this enormous amount of data.