I recently attended a meeting of government and industry HPC leaders where one topic of much discussion was why so many universities have started “Data Science” programs but, with a small number of exceptions, are not actively training students in HPC and parallel programming. In other words, why is there a demand to prepare a workforce for big data but not for HPC?
The answer can be found by understanding the corporate profit and loss (P&L) statement and how it drives corporate behavior. First, a little P&L 101. A P&L typically consists of four parts: revenues or sales, the proceeds from the sale of the company’s products or services; cost of goods sold or COGS, the costs incurred in manufacturing a product or providing a service; selling, general, and administrative expense or SG&A, the general costs of running a company such as human resources, legal expenses, and copier paper; and finally earnings before interest, taxes, depreciation, and amortization or EBITDA, a common metric for assessing a company’s profitability. Subtract COGS from revenues and you have gross profit or GP; subtract SG&A from GP and you have EBITDA. GP and EBITDA are typically expressed as a percentage of sales.
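For concreteness, here is a minimal sketch of that arithmetic in Python, with purely hypothetical figures:

    # P&L 101 arithmetic sketch -- all figures are hypothetical.
    revenue = 100_000_000   # proceeds from selling products or services
    cogs = 60_000_000       # cost of goods sold
    sga = 25_000_000        # selling, general, and administrative expense

    gross_profit = revenue - cogs   # GP = revenues - COGS
    ebitda = gross_profit - sga     # EBITDA = GP - SG&A

    # GP and EBITDA are typically quoted as a percentage of sales.
    print(f"GP:     {gross_profit:,} ({gross_profit / revenue:.0%} of sales)")
    print(f"EBITDA: {ebitda:,} ({ebitda / revenue:.0%} of sales)")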
Depending on how a company sets up its P&L, the costs of operating an HPC system could show up in COGS or in SG&A; if the company has purchased a large HPC system, the initial purchase price will most likely be treated as a capital investment and flow through the P&L as depreciation expense over an appropriate amortization schedule.
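To illustrate how such a capital investment hits the P&L, here is the same sort of sketch for straight-line depreciation (again with hypothetical numbers; real schedules vary):

    # Straight-line depreciation sketch -- price and lifetime are hypothetical.
    purchase_price = 20_000_000  # capital cost of a large HPC system
    useful_life_years = 5        # assumed depreciation schedule

    annual_depreciation = purchase_price / useful_life_years
    print(f"Annual depreciation expense: {annual_depreciation:,.0f}")
    # Prints 4,000,000 -- the expense charged to the P&L each year for five years.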
My theory as to why big data is currently viewed differently than HPC is that big data is seen as a new revenue source, whereas HPC is seen primarily as a cost of doing business. In short, big data is a business and HPC is a cost. To borrow from the format of a popular TV commercial: If you are a company, you invest in businesses and you minimize costs – that’s what you do.
So, why is big data seen as a new revenue source? Either because companies are learning they can take data collected in the course of selling other products (e.g., customer data) and “monetize” it (to use an overused buzzword) by running big data analytics on it, or because they are setting up businesses whose sole purpose is to collect and monetize data (i.e., most Internet businesses being created today).
The result is excitement about big data: it is being discussed in boardrooms and C-suites, and executives are eager to invest in a new line of business that can drive revenue and profit growth. Data scientists are viewed as the key human resource for unlocking the potential of big data, and that is what is driving the interest in data science training programs.
Unlike big data, HPC today lives somewhere down the P&L, buried in COGS or SG&A detail. HPC may be, and in many cases is, critical to developing a new product or sustaining an existing one – but it’s not the product, it’s not the source of revenues – it’s viewed as a cost, a burden. And as noted above, for-profit companies exist to generate sales and minimize costs, thus maximizing profits and creating value for their shareholders. As long as HPC is viewed as a cost, the instinct of executives will be to minimize it – rather than asking “How much do you need?” the question will be “How little can you get away with?” That attitude generally does not evoke excitement, and it tends to lead only to reluctant investment in whatever is seen as absolutely necessary to get the job done.
Of course, what we are really talking about here is the general zeitgeist. As can be easily surmised from the current Top500 list, there are many companies worldwide that run huge HPC operations and make large capital investments in HPC systems. These companies know that HPC is an essential component of their business and spend accordingly. But it’s done with little of the fanfare or visibility seen in the big data phenomenon.
And that perhaps leads to the ever-present question asked throughout the HPC community: “What is all the hubbub with this big data stuff – we’ve been doing Big HPC for years!” If this is all true, what is to be done?
Perhaps the moral of the story for those charged with promoting HPC, either within their companies or for the HPC community at large, is to work at making a clear connection to revenue generation and new business development – showing that more HPC leads to new business opportunities, revenues, and profits. As long as HPC is viewed solely as a cost, “management’s” question will always be “How little do you need?”
Ron Hawkins is Director of Industry Relations at the San Diego Supercomputer Center.
Ron – great article! I would add a subtle reference to how HPC grew up inside classical IT departments, whereas Big Data can, at least initially, be discussed without all the baggage of CapEx spending. It is quite probable that innovation will accelerate once HPC and its data-intensive computing cousins are considered OpEx expenditures that show up in Cost of Goods Sold.