
Predictive Analytics 101

Insight, not hindsight, is the essence of predictive analytics. How organizations instrument, capture, create and use data to predict next steps and actions is fundamentally changing the dynamics of work, life and leisure.

Analytics is the discovery and communication of meaningful patterns in data (corporate, product, channel, and customer). It’s not the data but the signals buried in and inferred from data. There are several distinct types of analytics:

  • What happened, where and when – Descriptive Analytics
  • Why did it happen – Diagnostic Analytics
  • What is likely to happen – Predictive Analytics
  • Guided actions and steps – Prescriptive Analytics (Machine Learning, AI and Cognitive Learning)
  • Conversational AI – Chatbots

All of these need data. Data is the new raw material. Cloud is the new pipeline. Machine Learning is the new refinery. Digital use cases are the new experience frontier.

The Next Frontier for Business

Having or collecting data is not valuable in itself. Using data is. Analytical insights powered by ML, deep learning and AI are changing customer expectations and corporate strategy.

Google CEO Sundar Pichai laid out the emerging corporate mindset: “Machine learning [powered predictive analytics] is a core, transformative way by which we’re rethinking how we’re doing everything. We are thoughtfully applying it across all our products, be it search, ads, YouTube, or Play. And we’re in early days, but you will see us — in a systematic way — apply machine learning in all these areas.”

A decade ago, GE was in the mode of “the product breaks, we fix it.” Today, GE has >$100 billion in revenue tied to data-driven SLA asset maintenance contracts, whereby it gets paid based on a product being in service: a power plant turbine, a jet engine, a locomotive. The “Power by the Hour” leasing model means the maintenance costs and service outages are GE’s headaches, not the client’s. GE needs predictive analytics to help meet SLAs, avoid downtime, predict safety issues, and make those contracts profitable.

Predictive Analytics is the new differentiator. Take, for instance, Uber, which is a data-driven business model powered by a huge trove of real-world, real-time preference, usage and feedback data. Uber’s mission is to architect, develop, and deploy world-class data systems to empower multiple services. The backend data engineering group is responsible for real-time business metrics aggregation, data warehousing and querying, large-scale log processing, schema and data management, as well as a number of other analytics infrastructure systems.

It’s a new world with new rules around man+machine interactions. Once-a-day sensor readings are moving to real-time rule-based or machine learning platforms. Consumer-facing interactions are evolving from “how companies find customers” to “how customers find companies.” Strategies are shifting from serving customers through siloed channels, which used to be enough, to “seamlessly integrating multi-channels, screens, devices.” Segmentation is evolving from simple demographic and behavioral segmentation to complex 1:1 predictive personalization.

I believe that we are on the cusp of a multi-year predictive analytics revolution that will transform everything. However, change will be slower than people think, as legacy systems have to be replaced, and its impact will be greater than people envision in the long run. Analytics and AI will be highly disruptive to some industries, affecting not only revenue and cost structure but also shaking up the core business and operating models. The scope of predictive analytics is also expanding considerably as human behavior is modeled and expressed mathematically.



Using analytics to compete and innovate is a multi-dimensional issue. It ranges from simple (reporting) to complex (prediction) to learning (AI). For additional case studies and examples see the complementary post Business Analytics 101.

Reporting on what is happening in your business right now is the first step to making smart business decisions. This is the core of KPI scorecards or business intelligence (BI). The next level of analytics maturity takes this a step further: can you understand what is taking place (BI) and also anticipate what is about to take place (predictive analytics)?

By automatically delivering relevant insights to end-users, managers and even applications, predictive decision solutions aim to reduce the need for business users to understand the ‘how’ so that they can focus on the ‘why.’

The end goal of  predictive analytics = [Better outcomes, smarter decisions, actionable insights,  relevant information]. How you execute this varies by industry and information supply chain (Raw Data -> Aggregated Data ->  Contextual Intelligence -> Analytical  Insights (reporting vs. prediction) -> Decisions (Human or Automated Downstream Actions)).

There are four types of data analysis:

    • Simple summation and statistics
    • Predictive (forecasting)
    • Descriptive (business intelligence and data mining)
    • Prescriptive (optimization and simulation)

Predictive analytics leverages four core techniques to turn data into valuable, actionable information:

  1. Predictive modeling
  2. Decision Analysis and Optimization
  3. Transaction Profiling
  4. Predictive Search (supervised machine learning)

Predictive Modeling

Predictive modeling mathematically represents underlying relationships in historical data in order to explain the data and make predictions, forecasts or classifications about future events. An example of predictive modeling, next best action (or offer) in e-commerce, was covered in this post.

Predictive models analyze current and historical data on individuals to produce metrics such as scores. These scores rank-order individuals by likely future performance, e.g., their likelihood of making credit payments on time, or of responding to a particular offer for services. Predictive models can also detect the likelihood of a transaction being fraudulent (Risk Detection). Predictive models are frequently operationalized in mission-critical transaction systems and drive decisions and actions in near real time. A number of analytic methodologies underlie solutions in this area including:

  • Applications of both linear and nonlinear mathematical programming algorithms, in which one objective is optimized within a set of constraints,
  • Advanced “neural” systems, which learn complex patterns from large data sets to predict the probability that a new individual will exhibit certain behaviors of business interest. Neural networks (called deep learning when many layers are stacked) are biologically inspired machine learning models that have achieved record-breaking performance on speech recognition and visual object recognition.
  • Statistical techniques for analysis and pattern detection within large datasets.
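To make the idea of scores that rank-order individuals concrete, here is a minimal sketch in Python. It assumes a logistic scoring function with hand-set weights; the feature names (`on_time_ratio`, `utilization`), the weights and the sample applicants are hypothetical and only illustrate the mechanics. A real model would fit its weights to historical data.

```python
import math

# Hypothetical weights for an illustrative credit-payment model. In practice
# these would be fitted to historical data, not set by hand.
WEIGHTS = {"on_time_ratio": 3.0, "utilization": -2.0}
BIAS = -1.0


def score(features):
    """Return a probability-like score in (0, 1) using the logistic function."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rank_order(individuals):
    """Sort (id, features) pairs from highest to lowest score."""
    return sorted(individuals, key=lambda item: score(item[1]), reverse=True)


# Made-up applicants, used only to exercise the functions above.
applicants = [
    ("a", {"on_time_ratio": 0.95, "utilization": 0.20}),
    ("b", {"on_time_ratio": 0.60, "utilization": 0.90}),
    ("c", {"on_time_ratio": 0.80, "utilization": 0.50}),
]
ranked_ids = [person_id for person_id, _ in rank_order(applicants)]
print(ranked_ids)
```

The ranking, not the absolute score, is what drives the downstream decision here: the top of the list gets the offer or the credit line first.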

Predictive models summarize large quantities of data to amplify its value. The value chain for predictive modeling in an M2M scenario is shown below (source: Greenplum Blog). It’s all about having the right people and the right models.

Big Data Value Chain

Decision Analysis and Optimization

Decision analysis refers to the broad quantitative field that deals with modeling, analyzing and optimizing decisions made by individuals, groups and organizations. Some applications include optimizing supply chain management, tracking key performance indicators, uncovering hidden sales opportunities and identifying runaway operating costs.

While predictive models analyze multiple aspects of individual behavior to forecast future behavior, decision analysis analyzes multiple aspects of a given decision to identify the most effective action to take to reach a desired result.

Most consulting firms leverage decision analysis to provide bespoke data-driven solutions to a variety of business applications. Apart from statistical modeling and data analysis, the focus is also on understanding business challenges and delivering solutions.  Integrated approaches to decision analysis incorporate the development of a decision model that mathematically maps the entire decision structure; proprietary optimization technology that identifies the most effective strategies, given both the performance objective and constraints; the development of designed testing required for active, continuous learning; and the robust extrapolation of an optimized strategy to a wider set of scenarios than historically encountered.
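As a toy illustration of choosing the most effective strategy given an objective and constraints, the sketch below exhaustively searches combinations of actions under a budget. The action names, costs and benefits are hypothetical, and brute-force enumeration is only a stand-in for the optimization technology described above; it would not scale to real decision models.

```python
from itertools import combinations

# Hypothetical candidate actions: (name, cost, expected_benefit).
ACTIONS = [
    ("email_campaign", 2, 3),
    ("discount_offer", 4, 6),
    ("call_center_outreach", 5, 7),
]


def best_plan(actions, budget):
    """Search every subset of actions and keep the highest-benefit one within budget."""
    best, best_benefit = (), 0
    for size in range(1, len(actions) + 1):
        for subset in combinations(actions, size):
            cost = sum(action[1] for action in subset)
            benefit = sum(action[2] for action in subset)
            # Constraint: stay within budget. Objective: maximize benefit.
            if cost <= budget and benefit > best_benefit:
                best, best_benefit = subset, benefit
    return [action[0] for action in best], best_benefit


plan, benefit = best_plan(ACTIONS, budget=7)
print(plan, benefit)
```

With a budget of 7, the two cheaper actions combined cost 6 but yield less than pairing the e-mail campaign with call-center outreach, which is the kind of trade-off a decision model is meant to surface.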

Optimization capabilities also include a proprietary mathematical modeling and programming language, an easy-to-use development and visualization environment, and a state-of-the-art set of optimization algorithms.

Transaction Profiling

Transaction profiling is a technique used to extract meaningful information and reduce the complexity of transaction data used in modeling.

Many solutions operate using transactional data, such as credit card purchase transactions, or other types of data that change over time. In its raw form, this data is very difficult to use in predictive models for several reasons. First, an isolated transaction contains very little information about the behavior of the individual who generated the transaction. In addition, transaction patterns change rapidly over time. Finally, this type of data can often be highly complex.

To overcome these issues, a set of proprietary techniques is used to transform raw data into a mathematical representation that reveals latent information and makes the data more usable by predictive models. This profiling technology accumulates data across multiple transactions of many types to create and update profiles of transaction patterns. These profiles enable neural network models to efficiently and effectively make accurate assessments of, for example, fraud risk and credit risk within real-time transaction streams.
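The profiling techniques referred to here are proprietary, so the following is only a minimal sketch of the general idea: fold each raw transaction into a small running profile per account, and weight recent behavior more heavily so the profile tracks changing patterns. The `Profile` class, the `alpha` smoothing parameter and the sample transactions are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Running summary of one account's transactions."""
    count: int = 0
    avg_amount: float = 0.0

    def update(self, amount, alpha=0.5):
        """Fold one transaction in; a higher alpha weights recent behavior more."""
        if self.count == 0:
            self.avg_amount = amount
        else:
            self.avg_amount = alpha * amount + (1 - alpha) * self.avg_amount
        self.count += 1

    def deviation(self, amount):
        """Ratio of a new amount to the profiled average (0.0 if no history)."""
        return amount / self.avg_amount if self.avg_amount else 0.0


# Made-up transaction stream: (account, amount).
profiles = {}
for account, amount in [("acct-1", 20.0), ("acct-1", 40.0), ("acct-2", 500.0)]:
    profiles.setdefault(account, Profile()).update(amount)

print(profiles["acct-1"], profiles["acct-1"].deviation(300.0))
```

An isolated 300.00 purchase says little on its own, but as a multiple of the account's profiled average it becomes a feature a fraud model can use.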

Increasingly, teams are pushing the envelope of how to use information retrieval, machine learning, computational linguistics, matrix and graph algorithms, unsupervised clustering & data mining to solve profiling problems.

Deep Learning, Predictive Search (supervised Machine Learning)

The future of consumer engagement is an engaging and immersive experience across formats. To enable this, machine intelligence is increasingly being married to human insight.

Machines (software robots) are increasingly learning to teach themselves to recognize objects, text, spoken words and more. They are also able to interact with people in a natural way. Apple Siri and Google Maps are examples of this.

Machine Learning (ML) is how signals in the data are uncovered. In supervised learning, the task is to devise an algorithm A that, given a training set S of example inputs and their known outputs, finds a function F that, given a new input, returns an output.
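The A, S and F of supervised learning can be sketched in a few lines of Python. This example assumes the simplest possible learner, one-nearest-neighbor on a single numeric input; the training data (weekly visits mapped to a customer segment) is hypothetical.

```python
def learn(training_set):
    """Algorithm A: take a training set S of (input, label) pairs and return F."""
    def predict(x):
        # F labels a new input with the label of its nearest training example.
        _, nearest_label = min(training_set, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return predict


# Hypothetical training set S: weekly visits -> customer segment.
S = [(0, "lapsed"), (1, "occasional"), (5, "regular"), (7, "regular")]
F = learn(S)
print(F(6), F(2), F(0.2))
```

Any real system would swap in a stronger learner, but the shape stays the same: training data goes in, a callable predictor comes out.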

The type of target scenarios (analytics based data products) include:

  • Personalization based on customer behaviors or the absence of them (“We are sorry we missed you this week at Starbucks after twelve straight weeks of enjoying your company! Here is a free “Venti Blonde” for you”);
  • Personalization based on social media relationships (“Several of your Facebook friends have recently enjoyed visits to our Spa, so we’re offering you 20% off to try it yourself”);
  • Personalization with regard to cross-sell (“We know you’ve enjoyed our sister restaurant in the past, so if you or your family visit any of our other restaurants next week, here’s a coupon for a free appetizer”);
  • Personalization based on location (“We see you have just landed in New York JFK, and your final destination is Marriott in Times Square. Here is a $10 Uber taxi coupon to get you there in 45


A range of start-ups (Cue, reQall, Donna, Tempo AI, MindMeld and Evernote) and firms like Apple, Google, Facebook, Microsoft and GE are working on various forms of predictive search built on supervised learning: new tools that act as personal valets, anticipating what you need before you ask for it.

Google, for instance, is continuously evolving search with machine learning. Almost all of Google’s products use some machine learning, and entire categories wouldn’t be possible without it. One example is search in Google Photos, which can find things like dogs because it can recognize what a dog looks like.

The figure below captures the essence of predictive search (or data science) quite well (source: Data Science with Spark)


Larry Page, Google co-founder, described the “perfect search engine” as something that “understands exactly what you mean and gives you back exactly what you want.” The shift toward contextual or predictive search is driven by data: big data.

  • Google launched predictive search back in 2004 with Google Suggest, which was renamed Google AutoComplete in 2010.
  • In 2010, Google Instant came on the scene, generating look-ahead search results as users type.
  • Google’s Knowledge Graph, introduced in 2012, further enhances predictive search by predicting what type of information a user wants when searching for a celebrity name such as “Brad Pitt”, and generates specific related content right alongside normal search results.
  • Google Now is the next version of predictive search, serving as a valet or personalized assistant that can predict your needs, wants, and deep desires. This is basically taking multiple buckets of data and intelligently connecting them to facilitate everyday, data-supported decision making. For some, Google Now delivers information about traffic on the morning commute, an updated flight itinerary, and the results of last night’s hockey game to the phone, without being asked.
  • Singularity: Google has bought 15+ robotics and AI companies such as DeepMind and is hiring talent like Ray Kurzweil to heavily integrate ML and NLP.

How does Google Now work? In order to provide relevant contextual info that relates to you and only you, Google uses your private data (people you know, documents, images, hangouts, location, e-mail, daily calendar, and other info) to keep tabs on things like search preferences, appointments, flight reservations, payments and hotel bookings, or to auto-suggest restaurants from the Zagat guide for dinner.

Google has become a lifestyle/ambient brand. Having Android on every smartphone allows Google to do creative things, enabling more augmented reality by leveraging the hybrid data ecosystem. Google is also in a unique position to know what information people are most searching for, and when they want it, based on Web searches processed by the search engine daily. The different cloud services (YouTube etc.) that it controls create a web of clickstream data that is unsurpassed.

Facebook and Apple might be the closest to Google in terms of knowledge about your life. Embedded in iOS is iBeacon technology that can pinpoint your location to within a few feet. iBeacon is software that enhances the location-tracking services in an iPhone, an iPad Mini, or any device running iOS 8+. For retailers desperate to turn smartphones into a sales portal, it provides a quick way to target ads and other messages to engage consumers as they walk outside or through a store. For Apple, it’s a chance to collect valuable behavior data and create momentum for Apple Pay.

Facebook’s M personal assistant is a new entrant in the predictive analytics game. M appears to be the first tangible manifestation of Facebook’s machine-learning efforts. M is a combination of machine and human intelligence, trained by a huge dataset augmented by “M trainers.” With its unique access to posts, pictures etc. – that form the basis of Facebook’s Social Graph — and how they relate to one another, Facebook has the ability to index information at a scale that rivals Google’s comprehensive Web indexing capabilities.

IBM is transforming itself for a new era of “cognitive computing,” which IBM aims to drive by offering Watson supercomputing capabilities to businesses and developers via the cloud. Watson learns from user interactions and can answer complex questions.

It’s amazing how Microsoft fumbled the ball on Predictive Search enabled innovation.  They were touching every aspect of the Hybrid Data Ecosystem and failed to capitalize on this (history might call this a textbook case of Innovator’s Dilemma). More recently, Microsoft is attempting to make up ground with Cortana.


Decision Analytics: Automated Insights is the Objective

Just because firms have a lot of data doesn’t mean they’re doing a good job of acting on it.

Converting data -> information -> knowledge -> decision/action is the holy grail of analytics.  Just simply getting insight is not enough if you can’t act on it.  The closing of the loop from insight to automated action is where the big challenges lie. See The Notification Economy for some examples of how alerts and notifications are helping close the analytics loop.

The core challenge is putting it all together – Science + Art + Scale. A core question that I try to evaluate firms on:  Do they have the right toolset, dataset, skillset and mindset for closed loop decision analytics? What is their maturity on each of these dimensions?  But even more fundamental is the question:  Do you have the right foundation to handle and convert the growing volume of data?

So asking questions of your data is only effective if you know the right questions to ask. Common sense, but surprisingly uncommon. I have seen many firms invest millions in infrastructure like SAP HANA or Oracle Exadata without really having a clue how to use it. A good salesperson gave them a great deal, so they bought it.

If  you don’t know where you are going…any path will get you there. Exploratory Analytics and data discovery are new techniques aimed at helping unravel that locked information and use it to the advantage of the company. The key is a powerful merger of statistical data mining and a consultative approach which enables companies to make more effective decisions while addressing their business challenges.

Informatica illustrates the evolving challenge facing us as we migrate towards the Internet of Things.



However, delivering real-time actionable intelligence is not easy. Closed-loop performance systems that deliver continuous innovation and insight are tricky to build and maintain. Applications include marketing campaigns, customer behaviors, risk management, operations, and financial and investment management. The challenge is not insight but the evolving context. What was interesting 2 weeks ago may no longer be interesting to a target individual. So the context (rules engine) has to evolve continuously; basically, it needs the ability to learn.

Below is a figure from HP that illustrates this central tenet of predictive analytics. You are free to replace the HP products with your own vendors 🙂 But the foundational building blocks, enterprise data virtualization and aggregation, are not trivial at large firms with legacy applications and diverse platforms.

The modern business analyst needs data from all over the place:  the data warehouse, but also the Web, big data, production systems, as well as via partners and vendors. In fact, the typical analyst spends more than 50% of the time chasing data, which slows delivery of analytic insights and limits the time available for thorough analysis. Some refer to this conundrum as “the data problem.”

Who are some Predictive Analytics Providers?

Predictive Analytics as a core strategy enabled by big data is happening in regulated (Health, Wealth) and unregulated markets (Retail, Media, Publishing).

The full range of analytics includes reporting, relational and multi-dimensional OLAP, discovery, decisioning, scorecards and dashboards. Vendors who provide this capability include:

  • Marketing services market: Fair Isaac, Acxiom, Epsilon, Equifax, Experian, Harte-Hanks, InfoUSA, KnowledgeBase, Merkle and TargetBase, among others. These vendors compete with traditional advertising agencies and companies’ own internal information technology and analytics departments.
  • Origination market: Fair Isaac, Experian, Equifax, and CGI, among others.
  • Customer management market: Fair Isaac and Experian, among others.
  • Fraud solutions market: Fair Isaac, Actimize (a division of NICE Systems), ID Analytics, Experian, Detica (a division of BAE), SAS and ACI Worldwide (a division of Transaction Systems Architects) in the banking market; IBM and ViPS in the healthcare segment; and SAS, Infoglide Software Corporation, NetMap Analytics and Magnify in the property and casualty and workers’ compensation insurance market.
  • Collections and recovery solutions market: Fair Isaac, CGI, Experian, and various boutique firms for software and ASP servicing, and in-house scoring and computer science departments, along with the three major U.S. credit reporting agencies and Experian-Scorex, for scoring and optimization projects.
  • Insurance and healthcare solutions market: Fair Isaac, Emdeon, Ingenix, ViPS, MedStat, Detica (a division of BAE), SAS, Verisk Analytics and IBM.

These vendors are classified into a variety of market categories:

  • scoring model builders;
  • enterprise resource planning (“ERP”) and customer relationship management (“CRM”) packaged solutions providers;
  • business intelligence solutions providers;
  • business process management and business rules management providers;
  • providers of credit reports and credit scores;
  • providers of automated application processing services;
  • data vendors;
  • neural network developers and artificial intelligence system builders;
  • third-party professional services and consulting organizations;
  • providers of account/workflow management software; and
  • software companies supplying modeling, rules, or analytic development tools.

Behind the Covers: Analytics Techniques in Play

  • linear algebra
  • basic statistics
  • linear and logistic regression
  • data mining
  • predictive modeling
  • cluster analysis
  • association rules
  • market basket analysis
  • decision trees
  • time-series analysis
  • forecasting
  • machine learning
  • Bayesian and Monte Carlo Statistics
  • matrix operations
  • sampling
  • text analytics
  • summarization
  • classification
  • principal components analysis
  • experimental design
  • unsupervised learning
  • constrained optimization

No longer SAS or SPSS: NEW Analytics Infrastructure Techniques in Play

  • Columnar databases
  • Analytic accelerators
  • Hadoop/MapReduce
  • NoSQL engines
  • Stream computing
  • In-memory analysis
  • Workload optimization
  • Scripting and development tools
  • Complex event processing
  • Information integration
  • Scalable storage infrastructure
  • High-capacity warehouse
  • Visualization – Data Discovery
  • Cloud infrastructure for Analytics
    • Amazon Elastic MapReduce
    • Google Bigtable

No longer 1-2-3-4: Analytics on Everything

  • POS data
  • Social media
  • External feeds
  • Payments
  • Log data
  • Telephone conversations
  • RFID Scans
  • Events
  • Emails
  • Sensors
  • Free-form text
  • Geospatial
  • Audio
  • Still images/videos
  • Transactions
  • Call center notes

Predictive Analytics Project Phases

There are two distinct patterns in predictive analytics innovation:

  • Disruptive innovation like predictive search which brings a very different value proposition and/or creates new markets!!
  • Sustaining innovation like dashboards or visualization which improves performance of existing products and services.

In executing either pattern, you tend to go through the same project steps. There are four main stages of any analytics project: Educate, Explore, Engage and Execute.

  • In the Educate stage, the primary focus is on awareness and knowledge development.
  • In the Explore stage, the focus is on developing an organization’s roadmap for big data development.
  • In the Engage stage, organizations begin to prove the business value of big data, as well as perform an assessment of their technologies and skills.
  • In the Execute stage, big data and analytics capabilities are more widely operationalized and implemented within the organization.

Depending on the use case, engineering the analytics solution (Raw Data -> Aggregated Data -> Contextual Intelligence -> Analytical Insights (reporting vs. prediction) -> Decisions (Human or Automated Downstream Actions)) will require choices and decisions along various dimensions.

The range of choices you will have to make in implementing an analytics project is shown below (source: IBM DeveloperWorks). Also see the complementary posts Executing Analytics 101 and Outsourcing Analytics and Data Science – Vendors, Models and Approaches.


Putting it all together – Analytics Use Cases

“By 2014, 30% of analytic applications will use proactive, predictive and forecasting capabilities”  Gartner Forecast

While it is true that big data and predictive analytics are in their infancy, they are growing at a maddening pace. Projects vary from the expected to the unexpected, and even to the esoteric, whimsical and paranoid.


For a detailed posting on Analytics Use Cases see:

Some interesting use cases are illustrated below.

Predictive Analytics at FINRA – Stock Market Surveillance

One of the best examples of predictive analytics is seen at the Financial Industry Regulatory Authority. FINRA is an independent regulator overseen by the Securities and Exchange Commission, chartered to monitor the U.S. stock market. FINRA oversees every brokerage firm and broker doing business with the U.S. public and monitors trading on the U.S. stock markets.

FINRA finds evidence of market manipulation by assembling 75+ billion market events into a holistic picture of the U.S. securities market every day. The three activity streams are:

Collect and Create

  • Up to 75 billion events per day
  • 13 National Exchanges, 5 Reporting Facilities
  • Reconstruct the market from trillions of events spanning years

Detect & Investigate

  • Identify market manipulations, insider trading, fraud and compliance violations
  • How: apply hundreds of surveillance algorithms against massive amounts of data in multiple products (Equities, Options, etc.) and across multiple exchanges (NASDAQ, NYSE, CBOE, etc.).

Enforce & Discipline

  • Ensure rule compliance
  • Fine and bar broker dealers
  • Refer matters to the SEC and other authorities



Predictive Analytics at, Tinder and OkCupid

See the full posting on how predictive analytics and big data are now the core of matchmaking business models: Love, Sex and Predictive Analytics:, Tinder, OkCupid


Predictive Analytics: Coupons in Grocery Stores

  • Will the customer buy this product or not buy this product?
  • What offers might result in higher take rates?
  • What offers based on consumer behavior over time (longitudinal) result in more spend?
  • Browsers to Buyers – what does the “Path diagram” look like?

Retailers accumulate huge amounts of data on a day-to-day basis. Each time you head to Costco or Kroger and fill up your cart, the cashier scans your items, then hands you a coupon for $1.00 off your favorite brand of ice cream.

With hundreds of thousands of grocery items on the shelves, how does Kroger know what you’re most likely to buy? Using predictive analytics and data from loyalty cards, computers in real-time are able to crunch terabytes of your historical purchases to figure out that your favorite ice-cream was the one item missing from your shopping basket that week. Further, the algo matches your past purchase history to ongoing promotions in the store.
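A minimal sketch of the coupon logic described above: count what a shopper has bought before, drop anything already in today's basket, and keep only items with an ongoing promotion. The function name `pick_coupon` and the sample data are hypothetical; a production system would work over loyalty-card history at far larger scale and with a proper propensity model rather than raw counts.

```python
from collections import Counter


def pick_coupon(history, basket, promotions):
    """Return the most frequently bought past item that is missing from
    today's basket and currently on promotion, or None if there is none."""
    counts = Counter(item for trip in history for item in trip)
    candidates = [
        (times_bought, item)
        for item, times_bought in counts.items()
        if item not in basket and item in promotions
    ]
    # max() compares purchase counts first; ties fall back to item name.
    return max(candidates)[1] if candidates else None


# Made-up shopper data: past trips, today's basket, items on promotion.
history = [
    ["milk", "ice cream", "bread"],
    ["ice cream", "eggs"],
    ["milk", "ice cream"],
]
basket = {"milk", "bread", "eggs"}
promotions = {"ice cream", "chips"}

coupon = pick_coupon(history, basket, promotions)
print(coupon)
```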

So with your bill, you receive a coupon for the item you are most likely to buy next time. The shift toward contextual marketing and retailing is driven by data — big data. The typical objectives are:

  • Data to enable cross-channel and multi-stage marketing. 70% of buying experiences are based on how the customer feels he or she is being treated. A negative experience is extremely difficult to overcome.
  • Dynamic, personalized content across touch points
  • Social marketing, as buying cycles begin online and consumers make decisions before engaging with the company (people buying based on influencers’ comments and feedback)

Big changes are underway. Millennials believe that other consumers care more about their shopping experience than companies do, which is why they share their opinions online (OECD 2013).

Customer 360 or Profile Analytics 

Every retail and web interaction either feeds a customer profile or is influenced by one.

Advances in analytic approaches enable firms to engineer a 360 degree view of a “Customer Profile.”  Customer Profile is a way of interpreting each customer’s psychology, relationship drivers, and behavior, based on his or her relationship and interactions with all touchpoints of an organization, and it brings to light the hidden reasons behind each customer’s purchase and relationship patterns.

Customer Profile is developed by analyzing, among other things, relationship length and individual customers’ purchase behavior in all its dimensions: quantity, amount, where, what, how recent, how frequent, and how sensitive to different types of offers. Customer Profile captures the full extent of a customer’s relationship with a company in a single, predictive package that organizations can use to ensure that they are maximizing the value of each relationship and each interaction.
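Three of the dimensions listed above (how recent, how frequent, and amount) are often summarized together as recency, frequency and monetary value. The sketch below computes them from a purchase list; the function name `rfm_profile`, the field names and the sample purchases are assumptions for illustration, and a full Customer Profile would carry many more dimensions.

```python
from datetime import date


def rfm_profile(purchases, today):
    """Summarize (purchase_date, amount) pairs into recency, frequency, monetary."""
    dates = [purchase_date for purchase_date, _ in purchases]
    return {
        "recency_days": (today - max(dates)).days,  # days since last purchase
        "frequency": len(purchases),                # number of purchases
        "monetary": sum(amount for _, amount in purchases),  # total spend
    }


# Made-up purchase history for one customer.
purchases = [
    (date(2024, 1, 5), 40.0),
    (date(2024, 2, 1), 25.0),
    (date(2024, 3, 10), 35.0),
]
profile = rfm_profile(purchases, today=date(2024, 3, 20))
print(profile)
```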

Customer Profiles are constantly evolving. They allow companies to understand what their customers want, need, and respond to. They enable firms to communicate with customers through their preferred channels and at their preferred frequency, as well as to target and price offers for the products and services they value most.

Predictive Analytics in Healthcare and Wellness

There has been an explosion of data and analytics usage in Healthcare. Instead of repeating the posts, I am going to point to four articles that detail the story.

Predictive Analytics in Sports:  “MoneyBall” with Oakland A’s

Competitive sports are a heavy user of predictive analytics. In elite sports, the gap between legend and anonymity is often less than a 1% performance differential.

Analytics in baseball was refined in the 1990s by the Oakland Athletics (Oakland A’s) and depicted in Michael Lewis’s book Moneyball: The Art of Winning an Unfair Game and the Oscar-nominated movie.

The Business Problem: The New York Yankees were the most acclaimed team in Major League Baseball. Small-market teams like the Oakland Athletics (Oakland A’s) had to change the way they did business. The A’s were not a wealthy team; in fact, they were ranked 12th (out of 14) in payroll.

A core strategy question for the A’s (and in sports) is:

  • How to compete with rich teams with constraints like salary caps and small market economics?
  • How to spot and acquire low-cost undervalued talent that is a “force multiplier” and not a “money black hole”?

The Solution: In 1999 Billy Beane (general manager of the Oakland Athletics) found a novel use of data mining. Beane hired a statistics grad to analyze baseball statistics (pitchers’ records, RBI, batting average in MLB and the minors) in the way advocated by baseball guru Bill James. Beane was able to hire excellent players undervalued by the market.

A year after Beane took over, the A’s ranked 2nd! How did they do it? While the Yankees paid their star players tens of millions, the A’s managed to compete with a low payroll. When signing players, they didn’t just look at basic productivity values such as RBIs, home runs, and earned-run averages.

Instead, they analyzed hundreds of variables from every player and every game, attempting to predict future performance and production. Past performance as a predictor of the future. Some statistics were even obtained from game footage by using video recognition techniques. This allowed the team to sign players who may have been overlooked but were equally productive on the field.

Implications: The Oakland A’s started a trend, and the shift from reporting to predictive analytics began to penetrate the world of baseball.  The application of analytics to a wide variety of sports is now standard practice. It’s important to note that baseball statistics are not new. Leveraging stats to make hiring decisions is.

********* Analytical Tidbit************

Dodgers GM Branch Rickey hired the first baseball statistician in 1947, after which the use of statistical analysis in baseball grew. But the practice took a major leap forward in 1977 when Bill James began self-publishing works about a new discipline he called sabermetrics. See the MIT Sloan Sports Analytics Conference for more history and the latest trends.

Other drivers of sports analytics include fantasy football, sports betting, and point spreads.

********* Analytical Tidbit************

Social Enterprise – connect data, insights, and people in the organization


“If you work for a company that wants to be customer-centric but you never get to talk/listen to a user or end-client, you’re making shit up.” — Anonymous

Conversations at scale… Conversations amplified and relevancy increased … Conversations impacting decisions and actions.

Social is spawning a new creative vortex of co-creation, commerce, and collaboration, the likes of which we have not seen before.

Social enterprise is the desire to get work done in new ways: work that involves other people and has influencers as part of the process (vs. people watching the process).

Social applications are at the heart of the social enterprise, and we are seeing a gradual shift in how they are used as organizations gain experience.

  1. The first phase was around new and innovative collaboration capabilities such as Facebook, Twitter, Digg, Yammer or LinkedIn.  In this phase, the focus was better customer engagement through Twitter or Facebook.
  2. The second phase is enterprise social: social embedded in apps such as CRM, sales force management, marketing intelligence or data management tools to embrace a more real-time, streaming, “crowdsourcing” architecture.
  3. In the third phase we are seeing the trend of business applications taking on attributes of these consumer-facing sites to develop better predictive insight.  For example, better data management (structured + unstructured; inside the four walls + outside data) within a CRM system could allow operations staff to give greater context to sales forecasts that show steep drops in certain product category sales.

So what questions are we trying to answer in the quest for more customer/employee intimacy and engagement? Questions like:

  • How do I find out what customers are saying about my brand?
  • How do I join those conversations and proactively engage with my customers?
  • How do I make it easier and  more fun for my customers to engage with my brand?
  • How do I enable collaboration inside my organization to reshape business processes?

Leveraging social data brings in new capabilities, so problems are identified more quickly and the resulting insights can be explored.  B2C techniques are coming to B2B and B2E interactions.

********* Analytical FACT ************

“Who cares if we find out we lost a customer after she left?” The objective of predictive analytics is not just to understand why you lost a customer but to prevent the loss before it happens.

Think about the many organizations consumers interact with daily: banks, retailers, luxury goods brands, oil & gas multinationals, telephone companies, non-profits, etc. In every case, the strategic vision and objectives for the businesses differ, as do the key relationships and interactions with their customers and with their employees. Each business requires a unique blueprint for its future as a social enterprise, and a tailored implementation roadmap to reach that target future state.

********* Analytical FACT ************


Notes and References

  1. See also: A Very Short History of Data Science 
  2. Sabermetrics uses statistical analysis to analyze baseball records and make determinations about player performance. James called sabermetrics “the search for objective knowledge about baseball”.  Sabermetricians have questioned some basic assumptions about how talent and player contributions are judged and created quite a stir. But over time, many sabermetric ideas have found wide acceptance.
  3. Business value comes from consumption of data sciences or analytics, rather than the creation of analytics.  Consumption – decisions and actions – is where competitive advantage is generated.
  4. Talent shortage — According to a McKinsey report, by 2018, there will be a shortage of 140,000 to 190,000 data scientists, and about 1.5 million managers and analysts who can use Big Data effectively to make decisions.
  5. About this Blog: The Business Analytics 3.0 blog covers some of today’s thorniest business problems around data strategy, technology, process, governance, and leadership.
  6. Data Science is increasingly becoming a catch-all buzzword that encapsulates statistics, Operational Research and Management Science. The danger is that the entire field might collapse under the weight of unrealistic expectations.
  7. “Big data architecture and patterns, Part 1: Introduction to big data classification and architecture,” IBM developerWorks
  8. Different foundation components of analytics projects

9.  Mashable lists five of the more unusual of these projects. They include:

  • Homicide Watch D.C. (a precursor to the Minority Report?)
  • Falling Fruit (you can find hidden outdoor edibles in urban settings, hopefully road kill isn’t included)
  • Topography of Tweets (a Twitter visualization map to show you where most people are tweeting from in certain cities, because, I dunno why)

For more information, see the Mashable article.

10.  Data Volume Table

Prefix   Symbol   Decimal value                        Approximate binary equivalent
mega-    M        1,000,000                            2^20
giga-    G        1,000,000,000                        2^30
tera-    T        1,000,000,000,000                    2^40
peta-    P        1,000,000,000,000,000                2^50
exa-     E        1,000,000,000,000,000,000            2^60
zetta-   Z        1,000,000,000,000,000,000,000        2^70
yotta-   Y        1,000,000,000,000,000,000,000,000    2^80
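
The decimal value and the power of two in each row are close but not equal, and the gap widens with each prefix. The short sketch below computes both and the percentage by which the binary figure exceeds the decimal one. The function names are made up for illustration; the arithmetic is exact because Python integers have arbitrary precision.

```python
# Prefix names from the table above, in ascending order.
PREFIXES = ["mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def prefix_values(index: int) -> tuple:
    """Return (decimal, binary) values for PREFIXES[index].

    mega is the 2nd power of 1,000 (10**6) and of 1,024 (2**20),
    giga the 3rd, and so on.
    """
    k = index + 2
    return 10 ** (3 * k), 2 ** (10 * k)


def binary_excess_percent(index: int) -> float:
    """Percentage by which the binary value exceeds the decimal value."""
    decimal, binary = prefix_values(index)
    return (binary - decimal) / decimal * 100


if __name__ == "__main__":
    for i, name in enumerate(PREFIXES):
        decimal, binary = prefix_values(i)
        print(f"{name:>5}: 10^{3 * (i + 2)} vs 2^{10 * (i + 2)}, "
              f"binary is {binary_excess_percent(i):.2f}% larger")
```

For mega the difference is under 5%, while for yotta it is a little over 20%, so the two columns should not be treated as interchangeable when sizing storage.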

11. Great Infographic from FICO


See also:

  1. Making Money on Predictive Analytics: Tools, Consulting and Content
  2. A Very Short History of Data Science 
  3.  A Very Short History of Big Data and A Very Short History of Information Technology
  5. Dilbert on Big Data
  6. Apps That Know What You Want, Before You Do – New York Times
  7. The Data Scientist on a Quest to Turn Computers Into Doctors — Wired profiles Kaggle founder Jeremy Howard and his efforts to improve health care through data science practices such as developing deep learning algorithms.
  8. What Cars Did for Today’s World, Data May Do for Tomorrow’s: The New York Times profiles General Electric’s major push towards Internet of Things-connected machines and devices to fuel its data lake. GE is bringing analytics-derived insights to its products for the railroad, airline, hospital, and utility industries, yielding successes such as the ability to detect “possible [airline] defects 2,000 times as fast as it could before.” In related news, the Washington Post reported on how GE is using sensor-enabled devices and data-driven insights to revolutionize its manufacturing processes.
  9. For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights — For all of the promises of data science, the major hurdle for many practitioners remains the messiness of available data, the New York Times reports. “Data wrangling” accounts for 50 to 80 percent of professionals’ time according to studies cited in the article, reducing the effectiveness of data scientists’ efforts.
  10. The Notification Economy – predictive analytics plays a critical role in identifying which notification is a critical one to respond to or react to.
59 Comments
  1. Feb 10 2012

    Good overview.

    The other day I participated in a webinar by KXEN on their InfiniteInsight product. I haven’t used their product, but it seems like a strong contender with sophisticated prediction technology (Structural Risk Minimization) and large scalability.

    Regards, Thomas.

    (Note: I am not affiliated with KXEN and do not benefit from their product being used in any way.)


    • Ravi Kalakota
      Feb 11 2012

      Thanks for the comments. I will update the list. Appreciate the pointer.

      Regards, ravi


  2. Great article, particularly the write-up on various implementations. It really gives a lot of insight. How about HPCC? This looks to be a worthy competitor of Hadoop. We tried it on our laptops and it’s very fast. It processed a 140 MB file in seconds on a standard Windows 7 laptop (8 GB RAM, i5 processor). Unlike Hadoop, there is no need to change code to read name/value pairs. Regards, sreeni


  3. Mark Schouls
    Mar 29 2014

    Great overview. I really enjoyed the read.


  4. Jul 26 2014

    Great breakdown of a complex subject. Thanks for the diagrams and hard-work!


  5. Nov 11 2014

    It’s an interesting read on predictive analytics. Predictive analytics is gaining traction in multiple industries. We at analyttica offer a free course on Predictive Analytics through simulations where you are expected to learn analytics by solving real problems in analytics. Visit us at


  6. Apr 20 2015

    RetailReco Predictive Analytics solutions solve all these problems and give on-the-mark predictions of the products a customer is likely to buy as well as the customers likely to buy a product. The predictions are served directly from the web, and sold-out products are automatically replaced by new products. Consistent personalized offers on mobile, on the website and by email result in much stronger customer loyalty, a happy customer base and an increase in revenue from the same customers. Always-current cross-sell and up-sell product offers are the basis of success in online retailing.


  7. Sep 10 2016

    What a great and indepth overview! Coming from the HR / Workforce analytics domain – it’s lovely to see how much we can learn from other domains. Loved your comment “The scope of predictive analytics is also expanding considerably as human behavior is modeled and expressed mathematically.”

    Predictive human behavior has a long history – from defaulting on paying bank loans, to defaulting on a new job, to fraud – so many instances of predicting human behavior. Thank you.


  8. Jan 2 2017

    Thanks for including the related articles as well!


  9. Richard Roach
    Feb 22 2017

    Brilliant overview of BI Maturity. Very comprehensive. Well Researched


  10. Jun 23 2018

    Great post. I really enjoyed reading this blog. Keep sharing


  11. Jul 30 2018

    Great explanation on predictive analytics. Predictive Analytics uses many techniques to make predictions of future.


