The growing amounts of data being generated by such trends as the Internet of Things (IoT) and cloud computing have naturally created a need for data scientists who can collect, analyze and, most importantly, interpret these massive stockpiles of complex information, helping their companies make better business decisions more quickly and accurately, gain a competitive edge, and run their operations more efficiently.
That in turn has created something of a land rush in what has become a rapidly expanding data science platform market, with more than a dozen vendors ranging from established companies like IBM, Google, Microsoft and SAS to an array of smaller, younger pure-plays.
The goal of all of these companies is to give data scientists a single place to develop and run algorithms, use machine learning to help build predictive models, and then deploy those models into their businesses' operations. IBM offers such products as SPSS Modeler and SPSS Statistics as well as its two-year-old Data Science Experience, a set of tools covering areas such as machine learning via the vendor's Watson cognitive computing technology and the R programming language via the open-source RStudio offering. SAS has its Visual Suite for data visualization, prep, analytics and model building, while Microsoft offers its Azure Machine Learning platform as part of the cloud-based Cortana Intelligence Suite, along with Microsoft R for those who want to code in R. Other names in the space include H2O, RapidMiner, Angoss, Knime and Dataiku.
Add to this list Domino Data Lab, a five-year-old company backed by $40.5 million in funding from the likes of Sequoia, Coatue Management and Zetta Venture Partners. The company was founded by three former employees of Bridgewater Associates, a massive investment firm that relied heavily on data science techniques in its operations. The three Domino founders (CEO Nick Elprin, CTO Chris Yang and board member Matthew Granade) were responsible for building and testing predictive models at Bridgewater and putting them into production so they could be used in financial trades.
“Bridgewater as a hedge fund ran all of their investment strategies on predictive models,” Elprin tells The Next Platform. “There are other hedge funds that do that, but Bridgewater is probably one of the first and the biggest and the most successful to do that. All of their investment strategies were driven by predictive model algorithms that data scientists at Bridgewater built and developed over years and years and years and continuously improved. … What we saw there was, how do you really put predictive models at the heart of the business? How do you run your business on models? At Bridgewater, that was literally what the business did. It was to generate these investments. When we started Domino, we saw more and more of these organizations that were really interested in data science but sort of viewing data science as a technical skill. ‘We’ve got to hire some PhDs and we’ve got to give them these fancy algorithms and tools to work with,’ and what we saw was a need for someone to help build technologies and platforms and help companies understand how to put models at the heart of their business as a broad, organizational capability.”
He pointed to a recent Domino survey that found that 90 percent of companies want to use data science in their businesses, but only 11 percent had 50 or more models in production and only 30 percent had more than five. By comparison, Bridgewater has thousands in production, and other major companies like Amazon Web Services (AWS) and Netflix have hundreds to thousands. Being a model-driven business means having many such predictive models at the core of the business. One challenge for companies embracing data science is that too often they view it as similar to software engineering.
“Many of these data scientists are writing code and [companies say], ‘We’ve got systems for writing code, we’ve got tools we use for engineering,’ so they try to reuse the same methods, tools and platforms as they were using in the generations past, as they were doing software development or doing big data work,” Elprin says. “Our point of view is that models are inherently different. They’re a new kind of species in the business. They’re not software, they’re not data, they’re not BI [business intelligence] and dashboards. They’re a different kind of thing. As a result, they call for a different set of capabilities and platforms and processes to effectively manage them, to get them deployed into production, to govern them and to collaborate on them.”
The Domino platform essentially comes in three parts. The workbench gives data scientists the tools to develop models using a broad array of open languages like Python or R and frameworks like TensorFlow. They can also use commercial tools like SAS or MathWorks' MATLAB with the Domino platform, he says. The workbench also gives data scientists one-click access to the environment they need: they specify how much memory, how many processor cores or which GPUs they want, and the platform spins it up.
“That makes data scientists much more productive and efficient,” the CEO says. “They can get a lot more research done. As they work in the central infrastructure that we’re providing, we automatically record and track what they do, so all the experiments they run and all the models they train and all the results they generate are automatically tracked and preserved and made reproducible and trackable by colleagues.”
The platform also enables data scientists to deploy and publish models as production-grade APIs, interactive applications or dashboards, making it easier to get models into a business's operations, and to collaborate better by making their work easy to search, find, track and discuss. According to Domino's survey, easy collaboration is a key part of a model-driven environment, with 72 percent of respondents calling it the main attribute for success.
Organizations can run the platform in a number of environments, Elprin says.
“Generally, customers run our platform in a dedicated environment that’s specific to them, because their IP is very important, and that environment can be a private cloud they use or it can be on their own servers on premises,” Elprin tells The Next Platform. “We have customers with both configurations. We have many, many customers who do run Domino and their data science in AWS, but they have a VPC [virtual private cloud] in it.”
The company is getting good feedback. It said in January that revenues tripled in 2017, and it is planning its first conference for data scientists, called Rev, for late May in San Francisco. Domino has about 100 production customers, including such major corporations as insurance company Allstate, global agriculture company Monsanto, Carnival Cruise Line and Moody's Analytics. Last year the company added Dell, T. Rowe Price, Postmates and SurveyMonkey. Allstate uses the Domino platform in a range of areas, including claims, policy pricing and connected car research, while Monsanto is building models that can tell farmers which parts of their fields to plant and which to avoid, as well as where on their farms water might accumulate and threaten crops, Elprin says. Moody's uses the technology to develop risk models for financial instruments such as bonds, and Dell uses predictive models to assess the reliability of hardware components and fraud models to analyze support requests for replacement parts.
“There’s an extremely high degree of enthusiasm and investment in data science, and the exciting thing is, it’s not all hype,” Elprin explains. “Companies are getting real returns from this kind of work. They’re seeing value. The point of friction that companies are running into now is, ‘OK, we’ve done some projects, we’ve done some proof-of-concepts, we’ve proved that there’s some real value here. How do we make that scalable and repeatable and really operationalize a business around it?’ And that’s where they’re running into friction points. ‘We have to get our model actually deployed and integrated into the business. We’ve got to have lots of them. We’re going to hire a lot of data scientists. We’ve got to make sure they can effectively collaborate.’ Things like that.”