Keep hearing about Big Data and Hadoop? Having a hard time explaining what is behind the curtain?
The term “big data” comes from the computational sciences, where it describes scenarios in which the volume of data outstrips the tools available to store or process it.
Three reasons why we are generating data faster than ever: (1) Processes are increasingly automated; (2) Systems are increasingly interconnected; (3) People are increasingly “living” online.
As huge data sets have invaded the corporate world, new tools have emerged to help process big data. Corporations have to run analysis on massive data sets to separate the signal from the noise. Hadoop is an emerging framework for Web 2.0 and enterprise businesses that are dealing with data deluge challenges: they need to store, process, index, and analyze large amounts of data as part of their business requirements.
So what’s the big deal? The first phase of e-commerce was primarily about cost and enabling transactions, and everyone got really good at this. Then we saw differentiation around convenience: fulfillment excellence (e.g., Amazon Prime) or relevant recommendations (if you bought this, then you may like that: the next best offer).
Then the game shifted as new data mashups became possible, based on seeing who is talking to whom in your social network, seeing who you are transacting with via credit-card data, looking at what you are visiting via clickstreams, tracking which ads influence you via click-throughs, leveraging where you are standing via mobile GPS location data, and so on.
The differentiation is shifting to turning volumes of data into useful insights to sell more effectively. For instance, eBay apparently has 9 petabytes of data in its Hadoop and Teradata clusters. With 97 million active buyers and sellers, it has 2 billion page views and 75 billion database calls each day. eBay, like others, is racing to put in the analytics infrastructure to (1) collect real-time data; (2) process data as it flows; (3) explore and visualize.
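To get a feel for that scale, here is a quick back-of-the-envelope sketch that converts the daily figures quoted above into per-second averages. It assumes traffic is spread evenly across the day, which real traffic is not, so treat the results as rough averages rather than peak loads. The variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily figures as quoted in the text above.
page_views_per_day = 2_000_000_000
db_calls_per_day = 75_000_000_000

# Simple averages, assuming an even spread over 24 hours (an assumption).
page_views_per_second = page_views_per_day / SECONDS_PER_DAY
db_calls_per_second = db_calls_per_day / SECONDS_PER_DAY
db_calls_per_page_view = db_calls_per_day / page_views_per_day

print(f"{page_views_per_second:,.0f} page views per second")
print(f"{db_calls_per_second:,.0f} database calls per second")
print(f"{db_calls_per_page_view:.1f} database calls per page view")
```

On these numbers, that works out to roughly 23,000 page views and close to 870,000 database calls every second, or about 37.5 database calls behind each page view.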