Exacaster client Ultra Mobile, a U.S.-based mobile carrier developing first-of-its-kind mobile phone services and unlimited international communications, has just been named Inc. Magazine’s Fastest Growing Private Company in America for 2015. Over the past three years, Ultra Mobile has recorded revenue growth of 100,849 percent, with 2014 gross revenue of $118.2M, earning the company the number one ranking on the prestigious list.
We sat down with Exacaster CEO Sarunas Chomentauskas and Chief Data Scientist Egidijus Pilypas to pick their brains about the business and technological challenges that Exacaster’s team of data scientists and engineers is tackling in its work with Ultra Mobile, as well as about general developments in the big data predictive analytics landscape.
Q: What would you say is the number one challenge superfast growing companies like Ultra Mobile face in developing their analytical capabilities?
Sarunas: When we started working with Ultra, their analytical stack was typical of what we find among rapidly growing companies: enough to get by, but not entirely up to the task. The primary reason was that their valuable data was scattered across a multitude of systems and platforms. Some of it sat in on-premise operational systems, some with various cloud providers, and some in a relational database dedicated to analytics. All of this is in fact very common.
It takes time and patience to set up an analytical stack the right way – and at an organization that is growing at an explosive pace, it is something that has to be addressed early on and done right to prevent technological and operational nightmares in the future. Very often rapidly developing businesses miss that critical moment. At Ultra this was addressed at exactly the right time – and I must admit we were delighted to see how focused and forward-thinking Ultra’s senior management were about their present and future Big Data needs – from their CIO’s clear vision of their data strategy through to their CTO’s hard push to build a next generation data warehouse employing Hadoop and the Exacaster platform.
Q: So how did you address the challenge at Ultra?
Egidijus: Over many years in the business we have refined a simple three-step formula that has proven to work well in client situations similar to Ultra’s.
First, we land data from all sources, in its original format, in a data lake on Hadoop.
Second, we transform all this data into a query-friendly data warehouse with optimized query performance and columnar storage to increase analyst productivity – again, all of it happening within Hadoop.
Finally, we deploy a continuously updated, full 360 degree customer profile for marketing automation, segmentation and various other analytical and operational purposes, with all of this once again taking place within Hadoop.
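To make the three steps more concrete, here is a minimal, in-memory Python sketch of the same land, transform and profile flow. It is an illustration under stated assumptions, not Exacaster’s implementation or its BigMatrix component: the source names (`billing`, `calls`), the JSON-lines input format and the summed per-customer columns are all hypothetical, and plain Python dicts stand in for Hadoop storage, columnar tables and Impala queries.

```python
import json
from collections import defaultdict


def land(sources):
    """Step 1: copy raw records into the 'lake' untouched, one list per source."""
    return {name: list(lines) for name, lines in sources.items()}


def to_columnar(raw_lines):
    """Step 2: parse raw JSON lines and pivot them into column lists."""
    rows = [json.loads(line) for line in raw_lines]
    keys = sorted({k for row in rows for k in row})
    # Each column becomes one list; missing fields are filled with None.
    return {k: [row.get(k) for row in rows] for k in keys}


def build_profiles(warehouse):
    """Step 3: sum every numeric column per customer into one profile dict.

    Assumes (hypothetically) that every table has a 'customer_id' column.
    """
    profiles = defaultdict(dict)
    for table, cols in warehouse.items():
        ids = cols["customer_id"]
        for col, values in cols.items():
            if col == "customer_id":
                continue
            for cid, value in zip(ids, values):
                if isinstance(value, (int, float)):
                    key = f"{table}_{col}_total"
                    profiles[cid][key] = profiles[cid].get(key, 0) + value
    return dict(profiles)


# Hypothetical raw feeds from two source systems.
sources = {
    "billing": ['{"customer_id": "c1", "amount": 10.0}',
                '{"customer_id": "c2", "amount": 4.5}',
                '{"customer_id": "c1", "amount": 2.5}'],
    "calls": ['{"customer_id": "c1", "minutes": 30}',
              '{"customer_id": "c2", "minutes": 12}'],
}

lake = land(sources)
warehouse = {name: to_columnar(lines) for name, lines in lake.items()}
profiles = build_profiles(warehouse)
print(profiles)
```

In a real deployment each step would run as a distributed job over files in the lake. The point of the sketch is only the layering: raw data is kept unchanged, the columnar copy is derived from it, and the profiles are derived from the columnar copy, so each layer can be rebuilt from the one below it.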
Sarunas: At Ultra we centralized all data in a Hadoop Data Lake, with all data transformation processes running there. A well-known axiom is that the value of data increases exponentially once it can be combined. By centralizing all the data in a multi-purpose Data Lake, we go through the roof in terms of what can be done with it. Large queries that used to take 20 hours now run in minutes on hardware that costs a fraction of what it used to. Interactive data exploration takes seconds and unlocks creativity and productivity for business users on an unprecedented scale. We used Impala to serve interactive query needs. We’re big believers in having a unified Big Data architecture. That’s why we love modern Hadoop: much of what we are doing for Ultra Mobile and our other clients at Exacaster would not be possible without it. The disruption that Hadoop is causing in the data warehousing world is very real and tangible.
Q: Building one huge centralized data lake seems to be the approach many analytics technology and services providers advocate nowadays, including Palantir, which does this for government institutions and the largest global enterprises. Is everyone basically doing the same thing, or do you see any differences in the approaches your competitors use?
Sarunas: Fundamentally, everyone is driven by the same data imperatives: to combine data, reduce costs and complexity, increase flexibility, move arbitrary computation to the data, and give end users familiar concepts, languages and instant responses. While there are many cool and expensive technologies for niche applications, the most widely adopted and accessible technologies will win in the long term. Hadoop is clearly there.
Egidijus: The QWERTY keyboard is still with us. Don’t underestimate human inertia and switching costs. SQL has made the transition to Big Data sets, and in doing so it has made analysts productive again.
Q: Data lake or data warehouse – which is the right approach?
Sarunas: For most companies operating in data-soaked environments – be it a telecom operator, a bank, an insurance company, a retailer or a SaaS provider – this is not an “either/or” choice: you need to have both. A data lake is the fastest route to results because you simply store all your data on Hadoop in its original formats and query it immediately after storing it, using sheer computing power. You have all the flexibility in the world with this kind of setup, and that is really important.
Then as your analyst team grows to a certain size, or you want to build an abstraction layer, or start optimizing the tables being queried, you build star schemas, adopt columnar storage and query those. The data lake is always there to back you up if there is something you cannot achieve with your star schemas. This is a very viable alternative to traditional expensive data warehouse platforms.
Q: So you are effectively saying that for rapidly growing companies like Ultra, centralizing data and enabling easy access to data is actually a more important challenge than fancy data science algorithms?
Egidijus: Data science is a linear process, and data is the first step. In general, predictive algorithms are only a small step in large data pipelines, and the most important algorithms are all in the public domain. So it is rare to find a purely algorithmic problem. Value is created in combination, and data is the input element with the highest degree of freedom, where creativity and know-how can work wonders.
Sarunas: In Ultra’s case we are able to bring proven, existing data pipelines and algorithms to speed up the time to market significantly in acquisition, retention, segmentation and up-sell, and to enable true personalisation of marketing activities.
Q: Gartner has defined three different kinds of analytics: descriptive, predictive and prescriptive. Descriptive analytics has been around since the 1980s and is mostly about reporting with simple tools such as frequency distributions, graphs and charts. Predictive analytics uses models that crunch past data to predict future events. Prescriptive analytics goes further by providing recommendations to front-line managers. Tom Davenport has been writing about a fourth kind, automated analytics, which extends prescriptive analytics further by eliminating the human factor entirely and letting the algorithm make decisions based on calculated predictions and take action, e.g. execute financial transactions, change prices or send personalized marketing emails.
Building on that, Blue Yonder’s CEO Uwe Weiss stated in a recent interview that “99 percent of business decisions can be automated.”
What’s your take on this? Does Exacaster’s predictive analytics platform merely help managers make better decisions, or are there situations where your software replaces human decision makers entirely? How do you see this evolving in the future?
Egidijus: We are big believers in the potential of software to replace human decision making in many situations – eventually. In fact, our vision of the end-game is perhaps having just three buttons to run your business: “increase revenue”, “increase margin” or “increase market share”, with the rest being executed automatically by systems and algorithms.
Sarunas: In the real world, of course, this is possible only in fully digitized environments where the whole business process, from data collection to decision to implementation, is controlled by software. That is another way to define a robot or a robotic system. While many routine processes can be controlled by robotic systems, there is a big risk of overselling this at the current stage of technology maturity. Full automation of decisions is hard, and typical robots are dumb. It makes sense today in areas where many decisions must be made very fast based on simple rules of thumb, as in real-time online advertising bid optimization – an area in which we do have a working product, as do quite a few other predictive analytics technology vendors. Some business processes are like that, but many are not. There is a lot of value to be created by systems that support human decision making and automate away its mundane parts.
Q: What is the practical value of having a 360 degree customer profile?
Egidijus: Where do I even begin? A 360 degree profile provides the foundation for segmentation, complex event detection, predictive behavior analytics and campaign targeting automation, as well as a multitude of other customer experience management aspects. Our clients have found that all of these generate significant added value.
I believe Yahoo used to call the 360 degree profile the “DNA of the customer.” We have yet to meet a client company that doubts the value of having access to it. But historically the problem has been the cost of building and maintaining it efficiently: it is still true that data scientists spend 80 percent of their time wrangling data.
Sarunas: The need to give our clients access to 360 degree profiles of their customers cost-effectively was what drove us to develop our proprietary BigMatrix component on top of Hadoop. With it in place, the process of generating incredibly rich customer profiles populated with data integrated from multiple sources becomes fully automated, and the profiles are kept up to date with no human involvement.
Egidijus: BigMatrix really brought the fun back for the data scientists – both on our own team and those of our clients’!
Q: Where do companies like Ultra Mobile look for return on investment in Big Data?
Sarunas: Google’s founders have famously said that some very big questions can be answered with large amounts of trivial data and simple algorithms. It is true. Data is the nerve signal in business, and evolution has shown us that brains win over muscle.
The most attractive business cases for generating returns on investments in Big Data-centric efforts tend to center around a company’s core processes – in Ultra’s case that would be customer acquisition, sales and marketing, and service.
Q: How does one use data science to scale sales and marketing?
Egidijus: This is always a multi-layered problem and we solve it by using a combination of technology and best practice business processes. We bring to our clients ready-made tools for segmentation, activity guides for each segment and micro-personalization within those. This helps create a clear structure that can be used by a marketing and sales organization very effectively.
Q: Let’s get back to your collaboration with Ultra. It is perhaps a sure sign of true globalization within tech that the fastest growing American company is relying on a European big data analytics technology and services provider. How do you make it work logistically – do you forward deploy engineers to work on-site, or do things remotely?
Sarunas: It is true that while Exacaster does have a representative office in NYC, our main teams of data scientists and deployment engineers are based in Northern Europe. After years of working with clients across three continents – we actively serve customers not only in Europe and the United States, but also in a few South American countries – we have refined an engagement model that seems to work very well both for our clients and us.
It is a necessity to forward-deploy engineers and data scientists in larger projects, as it helps create the right fusion of business-specific understanding with technological and analytical know-how. The operations, R&D and software development, on the other hand, are centralized and operate out of our Northern European HQs.
Thank you!