Analytics-as-a-Service: Understanding how Amazon.com is changing the rules
“More firms will adopt Amazon EC2 or EMR or Google App Engine platforms for data analytics. Put in a credit card, buy an hour’s or a month’s worth of compute and storage. Charge for what you use. No sign up period or fee. Ability to fire up complex analytic systems. Can be a small or large player” Ravi Kalakota’s forecast
Big data Analytics = Technologies and techniques for working productively with data, at any scale.
Analytics-as-a-Service is cloud based… Elastic and highly scalable, No upfront capital expense. Only pay for what you use, Available on-demand
The combination of the two is the emerging new trend. Why? Many organizations are starting to think about “analytics-as-a-service” as they struggle to cope with the problem of analyzing massive amounts of data to find patterns, extract signals from background noise and make predictions. In our discussions with CIOs and others, we are increasingly talking about leveraging private or public cloud computing to build an analytics-as-a-service model.
Analytics-as-a-Service is an umbrella term I am using to encapsulate “Data-as-a-Service” and “Hadoop-as-a-Service” strategies. It is more sexy 🙂
The strategic goal is to harness data to drive insights and better decisions faster than competition as a core competency. Executing this goal requires developing state-of-the-art capabilities around three facets: algorithms, platform building blocks, and infrastructure.
Analytics is moving out of the IT function and into the business: marketing, research and development, and strategy. As a result of this shift, the focus is more on speed-to-insight than on common or low-cost platforms. In most IT organizations it takes anywhere from 6 weeks to 6 months to procure and configure servers, then another several months to load, configure and test software. That is not very fast for a business user who needs to churn through data and test hypotheses. Hence the cloud as an analytics alternative is gaining traction with business users.
The “analytics-as-a-service” operating model that corporations are thinking about is already being facilitated by Amazon, Opera Solutions, eBay and others like LiquidHub. They are anticipating the value migrating from traditional, outmoded BI to an analytics-as-a-service model. We believe that Amazon’s analytics-as-a-service model provides a directional and aspirational target for IT organizations that want to build an on-premise equivalent.
Situation/Problem Summary: The Challenges of Departmental or Functional Analytics
The dominant design of analytics today is static or dependent on specific questions or dimensions.
With the need for predictive analytics driven business insights growing at ever increasing speeds, it’s clear that current departmental stove-pipe implementations are unable to meet the demands of increasingly complex KPIs, metrics and dashboards.
What works well at a smaller department scale tends to break down when it comes to enterprise analytics. Also, enterprise-level analytic solutions stitched together from a hodge-podge of BI stacks (SAS, Cognos, SAP BusinessObjects, MicroStrategy, Hyperion, Oracle etc.) tend to get quite expensive to manage, maintain and enhance.
After years of cost cutting, organizations are striving for top-line growth again. They are finding that with the proliferation of front-end analytics tools and back-end BI tools, platforms, data marts, the burden/overhead of managing, maintaining and developing the “raw data to insights” value chain is tremendous.
It is becoming more and more necessary to restructure the islands and stovepipes of platforms/tools/information into much more centralized but flexible analytical infrastructures.
This doesn’t mean doing more of the predominant model, an enterprise data warehouse with dependent data marts (hub-and-spoke). It means a virtual data warehouse that pulls data dynamically from source systems as needed.
Keep in mind this design conundrum: centralization of analytics infrastructure conflicts with the business requirement of time-to-market, high quality and execution speed. IT depts that focus on optimizing standards, costs via offshore vendors, and release discipline miss the point that the front-line LoB teams need usable, adaptable, flexible and constantly changing insights to keep up with customers. The front-line teams care about revenue, alignment with customers and sales opportunities. So how do you bridge the two worlds and deliver the ultimate flexibility with the lowest possible cost of ownership?
The solution is Analytics-as-a-Service.
Emerging Operating Model: Analytics-as-a-Service
It’s clear that sophisticated firms are moving along a trajectory of consolidating their departmental platforms into general purpose analytical platforms (either inside or outside the firewall) and then packaging them into a shared services utility.
This model is about providing a cloud computing model for analytics to anyone within or even outside your organization. Fundamental building blocks (or enablers) such as information security, data integrity, data and storage management, and iPad and mobile capabilities are critical, but they don’t have to be designed, developed and tested again and again. More complex enablers like operations research, data mining, machine learning and statistical models are also thought of as services.
Enterprise architects are migrating to “analytics-as-a-service” because they want to address three core challenges – size, speed, type – in every organization:
- The vast amount of data that needs to be processed to produce accurate and actionable results
- The speed at which you need to analyze data to produce results
- The type of data that you analyze—structured versus unstructured
The real value of this service bureau model lies in achieving economies of scale and scope: the more virtual analytical apps you deploy, the better the overall scalability and the higher the cost savings. With growing data volumes and dozens of virtual analytical apps, chances are that more and more of them need processing at different times, with different usage patterns and frequencies. Their peaks rarely coincide, which is one of the main selling points of service pooling in the first place.
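To make the pooling argument concrete, here is a minimal Python sketch. The app names and demand numbers are hypothetical, invented purely for illustration; the point is only that a shared pool sized for the combined peak can be smaller than the sum of stacks sized for each app's own peak.

```python
from typing import Dict, List


def dedicated_capacity(demand: Dict[str, List[int]]) -> int:
    """Servers needed if each app gets its own stack sized for its own peak."""
    return sum(max(slots) for slots in demand.values())


def pooled_capacity(demand: Dict[str, List[int]]) -> int:
    """Servers needed if all apps share one pool sized for the combined peak."""
    combined = [sum(slot) for slot in zip(*demand.values())]
    return max(combined)


# Hypothetical demand (servers needed) in four time slots of a day.
DEMAND = {
    "marketing_dashboard": [10, 2, 1, 1],
    "fraud_scoring": [1, 8, 2, 1],
    "nightly_batch": [0, 1, 3, 12],
}

if __name__ == "__main__":
    print("dedicated:", dedicated_capacity(DEMAND))  # 10 + 8 + 12 = 30
    print("pooled:", pooled_capacity(DEMAND))        # busiest slot = 14
```

With these made-up numbers the dedicated approach needs 30 servers and the pooled approach 14. The gap shrinks as the apps' peaks start to overlap, so the saving depends entirely on how staggered the real workloads are.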
Amazon Analytics-as-a-Service in the Cloud
Amazon.com is becoming a market leader in supporting the analytics-as-a-service concept. They are attacking this as a cloud-enabled business model innovation opportunity rather than an incremental BI extension. This is a great example of value migration from outmoded methods to new architectural patterns that are better able to satisfy business priorities.
Amazon is aiming at firms that deal with lots and lots of data and need elastic/flexible infrastructure. This can be domain areas like Gene Sequencing, Clickstream analysis, Sensors, Instrumentation, Logs, Cyber-Security, Fraud, Geolocation, Oil Exploration modeling, HR/workforce analytics and others. The challenge is to harness data and derive insights without spending years building complex infrastructure.
Amazon is betting that traditional enterprise “hard-coded” BI infrastructure will be unable to handle the data volume growth, data structure flexibility and data dimensionality issues. Also, even if the IT organization wants to evolve from the status quo, it is hamstrung by resource constraints, talent shortages and tight budgets. Predicting infrastructure needs for emerging (and yet-to-be-defined) analytics scenarios is not trivial.
Analytics-as-a-service that supports dynamic requirements requires some serious heavy lifting and complex infrastructure. Enter the AWS cloud. The cloud offers some interesting value: 1) on demand; 2) pay-as-you-go; 3) elastic; 4) programmable; 5) abstraction; and in many cases 6) better security.
The core differentiator for Amazon is parallel efficiency: the effectiveness of distributing large amounts of workload over pools and grids of servers, coupled with techniques like MapReduce and Hadoop.
Amazon has analyzed the core requirements for general analytics-as-a-service infrastructure and is providing core building blocks that include 1) scalable persistent storage like Amazon Elastic Block Store; 2) scalable storage like Amazon S3; 3) elastic on-demand resources like Amazon Elastic Compute Cloud (Amazon EC2); and 4) tools like Amazon Elastic MapReduce. It offers choice in the database images (Amazon RDS, Oracle, MySQL, etc.)
How does Amazon Analytics-in-the-Cloud work?
Best Buy had a clickstream analysis problem: 3.5 billion records, 71 million unique cookies, 1.7 million targeted ads required per day. How to make sense of this data? They used a partner to implement an analytic solution on Amazon Web Services and Elastic MapReduce. The solution was a 100-node on-demand cluster; processing time was reduced from 2+ days to 8 hours.
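For readers who have not seen MapReduce, the pattern behind this kind of clickstream job can be sketched in a few lines of plain Python. This is not the Elastic MapReduce API and not the solution used in the case above; the tab-separated record layout and the function names are assumptions made for illustration. It counts page views per cookie on a single machine, whereas a real cluster would run the map and reduce phases in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit (cookie_id, 1) for one tab-separated clickstream record."""
    cookie_id, _url = record.split("\t", 1)
    yield cookie_id, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one cookie into a single count."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample records: cookie id, then the page requested.
RECORDS = ["c1\t/home", "c2\t/tv", "c1\t/cart"]

if __name__ == "__main__":
    print(run_job(RECORDS))
```

Because each record is mapped independently and each key is reduced independently, the work splits cleanly across servers, which is what makes adding nodes on demand effective for this class of problem.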
Predictive exploration of data, separating “signals from noise” is the base use case. This manifests in different problem spaces like targeted advertising / clickstream analysis; data warehousing applications; bioinformatics; financial modeling; file processing; web indexing; data mining and BI. Amazon analytics-as-a-service is perfect for compute intensive scenarios in financial services like Credit Ratings, Fraud Models, Portfolio analysis, and VaR calculations.
The ultimate goal for Amazon in Analytics-as-a-Service is to provide unconstrained tools for unconstrained growth. A remarkable engineering feat. What is interesting is that an architecture of mixing commercial off-the-shelf packages with the core Amazon services is also possible.
The Power of Amazon’s Analytics-as-a-Service
So what does the future hold? The market in predictive analytics is shifting. It is moving from “Data-at-Rest” to “Data-in-motion” Analytics.
The service infrastructure to do “data-in-motion” analytics is pretty complicated to set up and execute. The complexity ranges from the core (e.g., analytics and query optimization), to the practical (e.g., horizontal scaling), to the mundane (e.g., backup and recovery). Doing all these well while insulating the end-user is where Amazon.com will be most dominant.
Data “in motion” analytics is the analysis of data before it has come to rest on a hard drive or other storage medium. Due to the vast amount of data being collected today, it is often not feasible to store the data first before analyzing it. In addition, even if you have the space to store the data first, additional time is required to store and then analyze. This time delay is often not acceptable in some use cases.
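A small Python sketch may help show what analyzing data before it comes to rest looks like in practice. It computes a rolling average over a trailing time window as events arrive, keeping only the events still inside the window in memory and never writing the stream to storage. The function name, the event layout (timestamp, value) and the window length are all assumptions chosen for illustration, not part of any particular product.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rolling_average(
    events: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average of values in the trailing window) per event.

    Events are assumed to arrive in timestamp order.
    """
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


if __name__ == "__main__":
    # Hypothetical sensor readings: (seconds since start, reading).
    stream = [(0, 10.0), (1, 20.0), (2, 30.0), (10, 40.0)]
    for ts, avg in rolling_average(stream, window_seconds=5):
        print(ts, avg)
```

The input can be any iterator, including one that reads from a socket or message queue, so results are available per event rather than after a store-then-analyze cycle.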
Due to the vast amounts of data stored, technology is needed to sift through it, make sense of it, and draw conclusions from it. Much data is stored in relational or OLAP stores, but more and more data today is not stored in a structured manner. With the explosive growth of unstructured data, technology is required to provide analytics on relational, non-relational, structured, and unstructured data sources.
Now Amazon AWS is not the only show in town attempting to provide analytics-as-a-service. Competitors include Google BigQuery, a managed data analytics service in the cloud. BigQuery is aimed at analyzing big sets of data… you can run query analysis on big data sets (five to ten terabytes) and get a response back pretty quickly, in ten to twenty seconds. That’s pretty useful when you just want a standardized self-service machine learning service. How is BigQuery used? Claritic has built an application for game developers to gather real-time insights into gaming behavior. Another firm, Crystalloids, built an application to help a resort network “analyze customer reservations, optimize marketing and maximize revenue.” (source: THINKstrategies’ Cloud Analytics Summit in April, Ju-kay Kwek, product manager for Google’s cloud platform).
Example of a Web Log Analysis reference Architecture on AWS.
Bottom-line and Takeaways
Analytics is moving from the domain of departments to the enterprise level. As the demand for analytics grows rapidly the CIOs and IT organizations are going to be under increasing pressure to deliver. It will be especially interesting to watch how companies that have outsourced and offshored extensively (50+%) to Infosys, TCS, IBM, Wipro, Cognizant, Accenture, HP, CapGemini and others will adapt and leverage their partners to deliver analytics innovation.
At the enterprise level a shared utility model is the right operating model. But given the multiple BI projects already in progress and vendor stacks in place (sunk cost and effort), it is going to be extraordinarily difficult in most large corporations to rip-and-replace. They will instead take a conservative and incremental integrate-and-enhance-what-we-have approach, which puts them at a disadvantage. Users will increasingly complain that IT is not able to deliver what innovators like Amazon Web Services are providing.
Amazon’s analytics-as-a-service platform strategy shows exactly where the enterprise analytics marketplace is moving or needs to go. The types of capabilities illustrated in this posting are exactly what needs to be enabled by many IT organizations (either via construction, cloudsourcing or outsourcing). But most IT groups are going to struggle to implement this trajectory without strong leadership support, experimentation and program management.
We expect this enterprise analytics transformation trend will take a decade to play out (innovation to maturity cycle).
Additional Resources and Notes
- The dominant design of most analytics products today is static or dependent on specific questions or dimensions. This model is ineffective in a constantly changing landscape where you have to look for new signals or patterns. Consider, for example, the complexity of predicting or isolating buying trends from social media interactions, or of isolating security threats from online chatter.
- Changes in the analytics landscape: more decentralized and self-service what-if analytics; shortened time-to-market requirements; ad hoc exploration; real-time data feeds; petascale transactional data; continuous prototyping. The business can’t wait for the enterprise data warehouse.
- With the rise of Mergers and Acquisitions (M&A), carveouts, “analytics-as-a-service” models that allow faster integration and better leverage become even more attractive.
- http://aws.amazon.com/elasticmapreduce; http://aws.amazon.com/articles/Elastic-MapReduce;
- Source of Amazon Hadoop Analytics-as-a-Service figure is from Deepak Singh, Ph.D., Sr. Business Development Manager at Amazon Web Services
- A new concept that has emerged recently is the Government Private Cloud. This has analytics-as-a-service implications because a big argument about data security in the cloud is being addressed. Amazon Web Services GovCloud is a service that complies with the International Traffic in Arms Regulations (ITAR). ITAR regulates how government agencies manage and store sensitive data, including defense data. Any cloud service used by organizations that are covered by ITAR may be accessible only to U.S. citizens. Because AWS GovCloud is only accessible by U.S. citizens and complies with ITAR’s other requirements, government agencies can use the service to store and manage additional kinds of data. Amazon is stepping up the security and access features of its cloud services in an effort to attract more government agencies as customers. Amazon’s cloud services already meet key government regulations, such as the Federal Information Security Management Act and the Federal Information Processing Standard. NASA’s Jet Propulsion Lab and the U.S. Recovery Accountability and Transparency Board are two federal agencies that already use AWS.
- Opera Solutions offers “analytics-as-a-service” for specific vertical solutions. This service is delivered via Opera’s LeanForm infrastructure. The charge for the service is either per-use or monthly.
- It is surprising to us that traditional vendors IBM, Oracle and SAP despite making all these multi-billion $ analytics acquisitions (Cognos, SPSS, BusinessObjects, Hyperion, Siebel, CastIron, etc.) are not much more aggressive in providing Analytics-as-a-service capabilities. They seem more interested in version upgrades and maintenance revenue (packaging old-wine-in-a-new-bottle). Lack of business model innovation will hurt them against emerging players and disruptive technology.
- In a recent Sandhill blog, “Big Data and Insight As a Service,” Evangelos Simoudis, senior managing director of Trident Capital, outlined two types of cloud-based, big data analytics services. One type operates on data that is primarily managed behind the firewall, such as transactional applications (ERP, HCM, CRM) that can be supplemented with syndicated or open source data. Another operates on the data generated by the software.
- Oracle’s Analytics-as-a-Service Strategy: Exalytics, Exalogic and Exadata (practicalanalytics.wordpress.com)
- 5 Big Data Questions For CEOs (forbes.com)
- IDC: Analytics a $51B business by 2016 thanks to big data (gigaom.com)
- The big data boom: What, why, and how? (itproportal.com)
- IT Life: Big Data For The Travel Industry (techweekeurope.co.uk)