
Posts tagged ‘MapReduce’

20 Aug

Innovation and Big Data: A Roadmap


The bleeding edge of data and insight innovation is the next-generation digital consumer experience. Consumer behaviors are rapidly evolving: always connected, always sharing, always aware. New technologies like Big Data drive and transform consumer behavior and empowerment.

With the influx of money, attention, and entrepreneurial energy, a massive amount of innovation is taking place to solve data-centric problems (such as the high cost of collecting, cleaning, curating, analyzing, maintaining, and predicting) in new ways.

There are two distinct patterns in data-centric innovation:

  • Disruptive innovation, such as predictive search, which brings a very different value proposition to tasks like discovering, engaging, exploring, and buying, and/or creates new markets.
  • Sustaining innovation, such as mobile dashboards, visualization, or data supply chain management, which improves the self-service and performance of existing products and services.

With either pattern, the managerial challenge is moving from big-picture strategy to day-to-day execution. Executing big data or data-driven decision making requires a multi-year, evolving roadmap around toolset, skillset, dataset, and mindset.

Airline loyalty programs are a great example of multi-year evolving competitive roadmaps. Let’s look at BA’s Know Me project.

British Airways “Know Me” Project

British Airways (BA) has focused on competitiveness via customer insight. It has petabytes of customer information from its Executive Club loyalty program and its website. BA decided to put this customer big data to work in its Know Me program. The goal of the program is to understand customers better than any other airline and to put the customer insight accumulated across billions of touch points to work.

BA’s Know Me program uses data at customer decision points in the following ways:

  • Personal recognition—This involves recognizing customers for being loyal to BA, and expressing appreciation with targeted benefits and recognition activities
  • Personalization—Responding to irregular disruptions, such as a customer stuck on a freeway because of an accident, with a pre-emptive text message: “We are sorry that you are missing your flight departure to Chicago. Would you like a seat on the next one at 5:15 PM? Please reply Yes or No.” (A sketch of this kind of rule follows this list.)
  • Service excellence and recovery—BA will track the service it provides to its customers and aim to keep it at a high level. Given air travel’s constant problems and disruptions, BA wants to understand what problems its customers experience, and do its best to recover to a positive overall result.
  • Offers that inspire and motivate—BA’s best customers are business travelers who don’t have time for irrelevant offers, so the Know Me program analyzes customer data to construct relevant and targeted “next best offers” for their consideration.
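
To make the personalization example concrete, here is a minimal sketch of the kind of rule that could drive such a pre-emptive message. It is illustrative only, not BA’s actual system: the schedule, the 45-minute check-in cutoff, and every name in it are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical flight schedule for one route (a stand-in for a real
# inventory system; none of this is BA's actual data or logic).
SCHEDULE = [
    {"destination": "Chicago", "departure": datetime(2013, 8, 20, 15, 30)},
    {"destination": "Chicago", "departure": datetime(2013, 8, 20, 17, 15)},
]

def suggest_rebooking(booked, traffic_eta, now):
    """Return a pre-emptive rebooking message if the traffic ETA means
    the customer will miss check-in; otherwise return None."""
    checkin_cutoff = booked["departure"] - timedelta(minutes=45)  # assumed cutoff
    if now + traffic_eta <= checkin_cutoff:
        return None  # the customer will make the flight; stay quiet

    # Find the next departure to the same destination.
    later = [f for f in SCHEDULE
             if f["destination"] == booked["destination"]
             and f["departure"] > booked["departure"]]
    if not later:
        return None
    next_flight = min(later, key=lambda f: f["departure"])
    when = next_flight["departure"].strftime("%I:%M %p").lstrip("0")
    return ("We are sorry that you are missing your flight to %s. "
            "Would you like a seat on the next one at %s? "
            "Please reply Yes or No." % (booked["destination"], when))

# The customer is 90 minutes away at 2:00 PM; the 3:30 PM flight is unmakeable.
print(suggest_rebooking(SCHEDULE[0],
                        traffic_eta=timedelta(minutes=90),
                        now=datetime(2013, 8, 20, 14, 0)))
```

The hard part in practice is not this rule but the plumbing behind it: joining live traffic, booking, and flight-inventory data quickly enough to act.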

The information to support these objectives is integrated across a variety of systems, and applied in real-time customer interactions at check-in locations and lounges. Even on BA planes, service personnel have iPads that display customer situations and authorized offers. Some aspects of the Know Me program have already been rolled out, while others are still under development.

The Need for New Data Roadmaps

New IT paradigms (cloud resident apps, mobile apps, multi-channel, always-on etc.) are creating more and more complex integration landscapes with live, “right-now” and real-time data. With data increasingly critical to business strategy, the problems of poor quality data,  fragmentation, and lack of lineage are also taking center stage.

The big change taking place in the application landscape is this: application owners of the past expected to own their data, whereas applications of the future will merely leverage data – a profound change that is driving the data-centric enterprise. The applications of the future need one “logical” place to go for the business view of the data, enabling agile assembly.

Established and startup vendors are racing to fill this new information-management void. The established vendors are expanding their current enterprise footprint by adding more features and capabilities. For example, the Oracle BI stack (hardware – databases – platform – prebuilt content) illustrates the data landscape changes taking place from hardware to mobile BI apps. A similar stack evolution is being followed by SAP AG, IBM, Teradata, and others. The startup vendors typically build around disruptive technology or niche point solutions.

To enable this future of information management, there are three clusters of “parallel” innovation waves: (1) technology/infrastructure-centric innovation; (2) business/problem-centric innovation; and (3) organizational innovation.

IBM summarized this wave of innovation in an Investor Day slide:

[Slide: IBM Investor Day – data drivers]

Data Infrastructure Innovation

  • Data sources and integration — Where does the raw data come from?
  • Data aggregation and virtualization — Where is the data stored and how is it retrieved?
  • Clean, high-quality data — How does the raw data get processed in order to be useful? (A toy cleaning sketch follows this list.)
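
The third bullet is where most of the day-to-day effort goes. As a minimal illustration of what “processing raw data to be useful” means, here is a toy cleaning step over a made-up feed (normalize, drop malformed rows, de-duplicate); real pipelines do the same things at far larger scale:

```python
import csv
import io

# Made-up raw feed: mixed case, stray spaces, a duplicate, and a bad row.
RAW = io.StringIO(
    "email,amount\n"
    "ALICE@Example.com ,120.50\n"
    "alice@example.com,120.50\n"
    "bob@example.com,not-a-number\n"
)

def clean(rows):
    """Yield normalized, de-duplicated records; drop rows that fail parsing."""
    seen = set()
    for row in rows:
        email = row["email"].strip().lower()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine these for review
        key = (email, amount)
        if key in seen:
            continue  # exact duplicate after normalization
        seen.add(key)
        yield {"email": email, "amount": amount}

for record in clean(csv.DictReader(RAW)):
    print(record)  # only the one clean, unique record survives
```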

Even on the technology/infrastructure-centric side, there are multiple paths of disruptive innovation taking shape along different technology stacks.


6 Nov

What is a “Hadoop”? Explaining Big Data to the C-Suite


Keep hearing about Big Data and Hadoop? Having a hard time explaining what is behind the curtain?

The term “big data” comes from the computational sciences, where it describes scenarios in which the volume of data outstrips the tools available to store or process it.

Three reasons why we are generating data faster than ever: (1) Processes are increasingly automated; (2) Systems are increasingly interconnected; (3) People are increasingly “living” online.

As huge data sets invaded the corporate world, new tools appeared to help process big data. Corporations have to run analysis on massive data sets to separate the signal from the noise. Hadoop is an emerging framework for Web 2.0 and enterprise businesses that are dealing with data-deluge challenges – storing, processing, indexing, and analyzing large amounts of data as part of their business requirements.
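
For executives who want one peek behind the curtain: to a programmer, Hadoop’s core abstraction is MapReduce – you write a small “mapper” and “reducer,” and the framework handles distributing them across the cluster, sorting intermediate results, and surviving machine failures. Here is the canonical word-count example as a minimal sketch for Hadoop Streaming, a standard Hadoop facility that runs any executables that read stdin and write stdout:

```python
#!/usr/bin/env python
# mapper.py -- emit one "word<TAB>1" line per word read from stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word.lower())
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop sorts mapper output by key, so all counts for a
# given word arrive together; sum them in a single streaming pass.
import sys

current, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").rsplit("\t", 1)
    if word != current and current is not None:
        print("%s\t%d" % (current, total))
        total = 0
    current = word
    total += int(count)
if current is not None:
    print("%s\t%d" % (current, total))
```

On a cluster these run via the hadoop-streaming jar (its path varies by distribution) with -input, -output, -mapper, and -reducer options; the same pair can be smoke-tested locally with cat input.txt | ./mapper.py | sort | ./reducer.py.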

So what’s the big deal? The first phase of e-commerce was primarily about cost and enabling transactions, and everyone got really good at this. Then we saw differentiation around convenience: fulfillment excellence (e.g., Amazon Prime) or relevant recommendations (“if you bought this, then you may like this” – the next best offer).

Then the game shifted as new data mashups became possible: seeing who is talking to whom in your social network, seeing whom you are transacting with via credit-card data, looking at what you are visiting via clickstreams, measuring which ads you click through, leveraging where you are standing via mobile GPS location data, and so on.

The differentiation is shifting to turning volumes of data into useful insights to sell more effectively. For instance, eBay reportedly has 9 petabytes of data in its Hadoop and Teradata clusters. With 97 million active buyers and sellers, it handles 2 billion page views and 75 billion database calls each day. eBay, like others, is racing to put in place the analytics infrastructure to (1) collect real-time data; (2) process data as it flows; and (3) explore and visualize it.
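
Step (2) – processing data as it flows rather than after it lands in a warehouse – is the unfamiliar part for most IT shops. As a toy illustration (plain Python standing in for a real stream processor, with made-up events), here is a windowed aggregation that emits per-page view counts each time a one-minute bucket closes:

```python
from collections import Counter

def rolling_counts(events, window=60):
    """Consume (timestamp, page) events in time order and emit per-page
    view counts each time a `window`-second bucket closes."""
    bucket, counts = None, Counter()
    for ts, page in events:
        b = ts - (ts % window)          # start of this event's bucket
        if bucket is not None and b != bucket:
            yield bucket, dict(counts)  # bucket closed; emit and reset
            counts = Counter()
        bucket = b
        counts[page] += 1
    if bucket is not None:
        yield bucket, dict(counts)      # flush the final bucket

# Made-up clickstream: (seconds, page) pairs.
events = [(2, "/home"), (14, "/item/42"), (59, "/home"), (61, "/cart")]
for start, counts in rolling_counts(events):
    print(start, counts)
```

The same shape – small state, updated per event, emitted per window – is what dedicated streaming systems implement at millions of events per second.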

13 Aug

Analytics-as-a-Service: Understanding how Amazon.com is changing the rules


“By 2014, 30% of analytic applications will use proactive, predictive and forecasting capabilities”  Gartner Forecast

“More firms will adopt Amazon EC2 or EMR or Google App Engine platforms for data analytics. Put in a credit card, buy an hour’s or a month’s worth of compute and storage. Charge for what you use. No sign-up period or fee. Ability to fire up complex analytic systems. Can be a small or large player.”    Ravi Kalakota’s forecast

—————————-

Big Data analytics = technologies and techniques for working productively with data, at any scale.

Analytics-as-a-Service = cloud-based analytics: elastic and highly scalable, no upfront capital expense, pay only for what you use, available on demand.

The combination of the two is the emerging new trend. Why? Many organizations are starting to think about “analytics-as-a-service” as they struggle to cope with the problem of analyzing massive amounts of data to find patterns, extract signals from background noise, and make predictions. In our discussions with CIOs and others, we are increasingly talking about leveraging private or public cloud computing to build an analytics-as-a-service model.

Analytics-as-a-Service is an umbrella term I am using to encapsulate “Data-as-a-Service” and “Hadoop-as-a-Service” strategies. It is also sexier 🙂

The strategic goal is to harness data to drive insights and better decisions faster than the competition, as a core competency. Executing this goal requires developing state-of-the-art capabilities around three facets: algorithms, platform building blocks, and infrastructure.

Analytics is moving out of the IT function and into the business – marketing, research and development, strategy. As a result of this shift, the focus is more on speed-to-insight than on common or low-cost platforms. In most IT organizations it takes anywhere from 6 weeks to 6 months to procure and configure servers, and then several more months to load, configure, and test software – not very fast for a business user who needs to churn data and test hypotheses. Hence the cloud-based analytics alternative is gaining traction with business users.
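
The forecast quoted at the top of this post – put in a credit card and fire up complex analytic systems – is already close to literal. Below is a minimal sketch using boto, the Python AWS library of this era; the bucket, script paths, and instance choices are all hypothetical, and running it requires your own AWS credentials.

```python
# Launch a pay-by-the-hour Hadoop cluster on Amazon Elastic MapReduce.
from boto.emr.connection import EmrConnection
from boto.emr.step import StreamingStep

conn = EmrConnection()  # reads AWS credentials from the environment

step = StreamingStep(
    name="clickstream word count",
    mapper="s3n://my-analytics-bucket/code/mapper.py",
    reducer="s3n://my-analytics-bucket/code/reducer.py",
    input="s3n://my-analytics-bucket/logs/2012-08-13/",
    output="s3n://my-analytics-bucket/output/2012-08-13/",
)

jobflow_id = conn.run_jobflow(
    name="analytics-as-a-service demo",
    log_uri="s3n://my-analytics-bucket/emr-logs/",
    steps=[step],
    num_instances=4,                  # pay for four machines only while they run
    master_instance_type="m1.large",
    slave_instance_type="m1.large",
)

print("Started EMR job flow: %s" % jobflow_id)
```

No procurement cycle, no data center: the cluster exists for the hours the job needs and is gone afterward.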


15 May

New Tools for New Times – Primer on Big Data, Hadoop and “In-memory” Data Clouds


Data growth curve:  Terabytes -> Petabytes -> Exabytes -> Zettabytes -> Yottabytes -> Brontobytes -> Geopbytes.  It is getting more interesting.

Analytical Infrastructure curve: Databases -> Datamarts -> Operational Data Stores (ODS) -> Enterprise Data Warehouses -> Data Appliances -> In-Memory Appliances -> NoSQL Databases -> Hadoop Clusters

———————

In most enterprises, public or private, there is typically a mountain of structured and unstructured data that contains potential insights about how to serve customers better, engage with them more effectively, and make processes run more efficiently. Consider this:

  • Online firms–including Facebook, Visa, and Zynga–use Big Data technologies like Hadoop to analyze massive amounts of business-transaction, machine-generated, and application data.
  • Wall Street investment banks, hedge funds, and algorithmic and low-latency traders are leveraging data appliances such as EMC Greenplum hardware with Hadoop software to do advanced analytics in a “massively scalable” architecture.
  • Retailers use HP Vertica or Cloudera to analyze massive amounts of data simply, quickly, and reliably, resulting in “just-in-time” business intelligence.
  • New public and private “data cloud” software startups capable of handling petascale problems are emerging to create a new category – Cloudera, Hortonworks, Northscale, Splunk, Palantir, Factual, Datameer, Aster Data, TellApart.

Data is seen as a resource that can be extracted, refined, and turned into something powerful. It takes a certain amount of computing power to analyze the data and to pull out and use those insights. That is where new tools like Hadoop, NoSQL, in-memory analytics, and other enablers come in.

What business problems are being targeted?

Why are some companies in retail, insurance, financial services, and healthcare racing to position themselves in Big Data and in-memory data clouds while others don’t seem to care?

World-class companies are targeting a new set of business problems that were hard to solve before: modeling true risk, customer churn analysis, flexible supply chains, loyalty pricing, recommendation engines, ad targeting, precision targeting, PoS transaction analysis, threat analysis, trade surveillance, search-quality fine-tuning, and mashups such as location + ad targeting.
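
To make one of these concrete: the germ of a recommendation engine fits in a few lines. This is a toy item-to-item co-occurrence counter (the classic “people who bought X also bought Y” idea) over made-up baskets, not anyone’s production system – the production problem is running this over billions of baskets, which is exactly where Hadoop-style infrastructure comes in:

```python
from collections import Counter
from itertools import combinations

# Made-up purchase baskets; a real system would mine millions of these.
baskets = [
    {"camera", "sd-card", "tripod"},
    {"camera", "sd-card"},
    {"camera", "case"},
    {"tripod", "case"},
]

# Count how often each pair of items is bought together.
co_bought = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_bought[(a, b)] += 1

def recommend(item, k=2):
    """Rank the items most often co-purchased with `item`."""
    scores = Counter()
    for (a, b), n in co_bought.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(k)]

print(recommend("camera"))  # e.g. ['sd-card', 'tripod'] on this toy data
```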

To address these petascale problems, an elastic/adaptive infrastructure for data warehousing and analytics is converging, one capable of three things:

  • the ability to analyze transactional, structured, and unstructured data on a single platform;
  • low-latency in-memory or solid-state device (SSD) storage for super-high-volume web and real-time apps;
  • scale-out on low-cost commodity hardware, distributing processing and workloads.

As a result, a new BI and analytics framework is emerging to support public and private cloud deployments.

