Teradata, AWS, And Data Gravity
October 15, 2015 Joab Jackson
Teradata is bringing its flagship data warehouse to Amazon Web Services. Beginning next year, the company will offer the full Teradata Database on the AWS Marketplace, where it will run on EC2 (Elastic Compute Cloud) instances. It is a big jump for Teradata, which has always tightly controlled the infrastructure upon which its databases run.
Teradata is no stranger to the cloud, of course. The company has virtualized the database to run on the VMware ESXi hypervisor for development use. Teradata also offers its own private cloud service, called Teradata Cloud, a fully hosted operation based in a Las Vegas data center.
Still, the company is careful to explain where the AWS offering fits within its overall portfolio. “The dedicated platform on-premises will always have the best performance,” Chris Twogood, vice president of products and services marketing, tells The Next Platform.
After all, multitenant cloud operations, however well provisioned, probably still can’t match dedicated hardware, especially systems wired together with low-latency InfiniBand; it remains unclear whether AWS users will be able to deploy the InfiniBand-based HPC EC2 instances for Teradata jobs. Users that want the best performance should stick with the in-house products, and those that want the most flexibility should go with AWS, Twogood explains.
By flexibility, Twogood means that users can spin up an instance, run it for testing or pilot projects, and then spin it down again. They can scale up popular services without waiting for hardware to be delivered. Flexibility in the AWS case also means that users can connect more readily to other AWS services they may be running. The Teradata Cloud sits between the in-house and AWS options, Twogood further explains. It offers the guaranteed performance of the in-house setup along with the cloud’s flexibility to scale up and down.
Teradata has not nailed down the exact configurations for deploying the parallel database on EC2. The general idea is that users will add more EC2 instances to scale up a cluster should they need additional computing capacity. Data storage, through S3, will work in the same way. But the topology may not match, on a node-for-node basis, between in-house and AWS deployments.
The Teradata on AWS offering will be configured to work with parallel databases of multiple terabytes, even if they do not initially reach the hundreds of petabytes that an in-house Teradata cluster could hold. The company sees the AWS service as an ideal fit for departmental-level production work, and for companies that are new to Teradata.
Pricing hasn’t been disclosed yet, so it remains to be seen whether running Teradata software sans Teradata hardware would be less expensive over time. People will no doubt be doing the math on that one, as they do for all datacenter workloads these days. The move to the AWS cloud eliminates the upfront costs of software and hardware purchases, which is attractive to many IT organizations. Also, users can provision Teradata Database themselves, without waiting for equipment to be delivered or for administrators to incorporate it into the internal networks.
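For readers who want to do that math once prices are published, here is a minimal sketch of a break-even calculation. Every figure in it is a made-up placeholder, since Teradata has not disclosed pricing, and the function name and cost model (one upfront purchase plus flat monthly costs) are assumptions for illustration only.

```python
def break_even_months(upfront_cost, onprem_monthly, cloud_monthly):
    """Months until cumulative cloud spend catches up with on-premises spend.

    Assumes a single upfront purchase plus flat monthly running costs.
    Returns None when the cloud option never costs more per month,
    in which case it stays cheaper indefinitely under this simple model.
    """
    if cloud_monthly <= onprem_monthly:
        return None
    return upfront_cost / (cloud_monthly - onprem_monthly)


# Placeholder numbers only; these are NOT real Teradata or AWS prices.
months = break_even_months(
    upfront_cost=1_200_000,   # hypothetical hardware and software purchase
    onprem_monthly=20_000,    # hypothetical power, space, and administration
    cloud_monthly=70_000,     # hypothetical instance, storage, and license fees
)
print(f"Cloud spend overtakes on-premises spend after about {months:.0f} months")
```

Under this model, a workload expected to run for less time than the break-even point favors the cloud option, while a long-lived, steady workload favors the purchase. A fuller comparison would also need to account for hardware refresh cycles and for cloud capacity that is switched off when idle.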
With this offering, Teradata is going to compete directly with Amazon’s own Redshift service, which is also a petabyte-scale, fully managed data warehouse. If an organization is running other AWS services such as Elastic MapReduce, DynamoDB, or Kinesis, it might make more sense to use Redshift, which is more tightly integrated with these other offerings.
Then again, if the organization has considerable Teradata expertise and takes advantage of Teradata-specific features, then running the Teradata AWS instance may be the way to go, Doug Henschen, vice president and principal analyst for Constellation Research, explains to The Next Platform.
Teradata provides a wealth of connectivity tools such as JDBC, ODBC, OLE DB, CGI (for PHP) and .NET connectors, as well as support for older languages like COBOL, C and PL/I, through preprocessors.
Overall, the appeal of the service depends heavily on an organization’s data gravity, Henschen says. Where does the company data sit? On AWS or in-house? How about the other processes that manipulate that data into something resembling value? Where the company’s systems are situated will be the largest factor in deciding whether to use AWS or the in-house Teradata product. Or Redshift, for that matter.
Teradata plans to bring its eponymous database to other cloud providers in 2016. It has not yet named them, but Microsoft Azure and Google Cloud Platform are safe bets. The company is also working to ready its Aster parallel database for cloud services. Aster is already available as a service on the Teradata Cloud.
The company will be talking more about the AWS deployments during its Partners conference next week in Anaheim, California.