Optimization for Real-Time AI and Analytics Starts in the Datacenter

Blazing new trails in IT infrastructure is never easy when workload requirements are unique. With few reference architectures to draw on, hard-won optimizations matter more than ever.

Recommendation engines are notoriously resource-heavy for IT shops, especially with deep learning in the mix. Separate clusters for training and retraining, systems for near real-time inference, storage and I/O systems that need to handle mixed, high-velocity workloads, and software stacks that change with new frameworks are just part of the overall puzzle for companies doing recommendation at scale.

Content discovery giant Taboola serves up over 80 billion recommendations across tens of thousands of sites daily, all of which need to be tailored to the reader at the right time. This places considerable demands on the company’s distributed infrastructure, especially on the compute side. Its servers must be incredibly dense and efficient while still robust enough to deliver the kind of rapid, accurate turnaround its users expect.

We will be looking at the company’s IT infrastructure in some depth in different pieces (storage and training in particular) but to begin, we wanted to shed some light on key hardware decisions Taboola made as it continued to scale its capabilities to serve a growing list of publishers and content providers. For insight, we spoke to the mastermind behind Taboola’s sprawling IT infrastructure, Ariel Pisetzky, VP of IT. Pisetzky pointed out that the infrastructure has grown organically over time and, because of the uniqueness of their workloads, it was difficult to look to the outside datacenter world for guidance. Their demands did not fit the template for large general enterprise IT, even though the scale, cost, and efficiency concerns were similar, nor did they fit what the hyperscalers or HPC shops built. Because of their need for rapid deployment of massive trained (and constantly retrained) neural networks, and the need for vast integration across a largely open source-based foundation, they had to learn lessons, sometimes the hard way, but always with their partner in scaling up and out, Dell Technologies.

Taboola has over 10,000 global servers across nine datacenters, three of which are in Israel. With the exception of cloud for backup and a few other less mission-critical workloads, everything Taboola does runs in-house across a distributed platform that spans the globe for low latency and high availability. Pisetzky says that one of the foundational elements for future optimization is having flexible infrastructure. That journey begins with choosing the right server nodes with the right NVIDIA GPUs, networks, and storage tied in.

The recommendation-focused neural networks the company uses are not widely deployed outside Taboola (and probably not at the same scale or with the same mission-critical demands as Taboola). The company also has unique hardware infrastructure requirements that maximize performance and scalability while keeping power consumption in check. The time budget from a user loading a page to Taboola delivering the exact correct content for that user, location, and behavior is around 80 milliseconds, which makes those balancing acts on the infrastructure side challenging. For more on how the process plays out across backend and front-end servers, take a look at this case study of Taboola’s infrastructure.
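To give a sense of how tight an 80-millisecond budget is, the sketch below decomposes it into hypothetical serving stages. Only the 80 ms total comes from the article; the stage names and per-stage numbers are illustrative assumptions, not Taboola's actual figures.

```python
# Illustrative decomposition of an ~80 ms serving budget.
# The stage names and latencies are hypothetical; only the
# 80 ms total is taken from the article.
BUDGET_MS = 80

stage_latency_ms = {
    "request_routing": 5,
    "user_feature_lookup": 15,
    "candidate_retrieval": 20,
    "model_inference": 30,
    "ranking_and_response": 8,
}

total_ms = sum(stage_latency_ms.values())
headroom_ms = BUDGET_MS - total_ms
print(f"total={total_ms}ms, headroom={headroom_ms}ms")
assert total_ms <= BUDGET_MS, "pipeline blows the latency budget"
```

With numbers like these there are only a couple of milliseconds of headroom, which is why shaving even small percentages off any one stage matters at this scale.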

“We have had to optimize over and over again and we’re always learning,” Pisetzky says. For instance, when TensorFlow was implemented as the foundation for AI training, engineering teams had to optimize the process as fast as possible to keep up with demands coming in thousands of times per second. “We needed each server to push out as many recommendations as possible per second.”
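One common lever for squeezing more recommendations per second out of a server is batching inference calls, since each call pays a fixed framework-dispatch overhead regardless of batch size. The cost model below is a hypothetical illustration of that effect, not a description of Taboola's actual serving numbers.

```python
# Hypothetical cost model for batched inference: each call pays a
# fixed overhead (dispatch, feature marshalling) plus a per-item
# compute cost. All numbers are invented for illustration.
FIXED_OVERHEAD_MS = 4.0   # paid once per inference call
PER_ITEM_MS = 0.05        # marginal cost per recommendation in a batch

def throughput_per_sec(batch_size: int) -> float:
    """Recommendations served per second at a given batch size."""
    call_ms = FIXED_OVERHEAD_MS + PER_ITEM_MS * batch_size
    return batch_size / call_ms * 1000.0

unbatched = throughput_per_sec(1)
batched = throughput_per_sec(64)
print(f"batch=1: {unbatched:.0f}/s  batch=64: {batched:.0f}/s")
```

Under these assumed costs, moving from single-request to 64-way batched calls improves per-server throughput by more than an order of magnitude, which is the kind of gain the team was chasing.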

“We are always looking to optimize and push for more out of hardware. At our scale, with 10,000 servers and the amount of traffic coming in, any improvement, whether it’s 1% or 160%, is the moment where you see what you need to get more of. We continue to work on optimizing how we best utilize things like AVX and the clock speeds of our servers, just as a couple of examples.”
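Getting value out of AVX is largely a matter of expressing hot loops as whole-array operations so the runtime can dispatch to SIMD-optimized kernels. As a rough illustration (not Taboola's code), the NumPy sketch below contrasts an element-by-element Python loop with a vectorized dot product that can reach the CPU's wide-register paths:

```python
import numpy as np

# NumPy routes whole-array operations to SIMD-optimized kernels
# (AVX on capable x86 CPUs); a per-element Python loop cannot use
# those wide registers. Data here is random and purely illustrative.
rng = np.random.default_rng(0)
scores = rng.random(10_000).astype(np.float32)
weights = rng.random(10_000).astype(np.float32)

def weighted_sum_loop(a, b):
    """Scalar-style loop: one multiply-add per interpreted iteration."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def weighted_sum_vec(a, b):
    """Vectorized form: a single dot product on the BLAS/SIMD path."""
    return float(np.dot(a, b))

slow = weighted_sum_loop(scores, weights)
fast = weighted_sum_vec(scores, weights)
```

Both forms compute the same result (up to float32 rounding); the vectorized one is typically orders of magnitude faster, which is the sort of per-server headroom the quote is describing.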

Pisetzky and his team have worked to fine-tune everything from their Kubernetes-based container approaches to the CPUs that feed inference and the NVIDIA GPU clusters that train and retrain at a furious pace. At the core of their ability to optimize is a right-sized set of systems that are adaptable to their power budgets and performance requirements and, more importantly, are flexible enough to allow Taboola to change with demands, especially on the AI front.

The company has invested heavily in NVIDIA GPUs for training over the years, an investment Pisetzky expects to continue given their ability to train and retrain the crucial Taboola data that feeds its real-time content delivery platforms. The company has gone through a number of generations of NVIDIA GPUs, from lower-end Kepler-series parts all the way up to the recent NVIDIA Volta V100 GPUs for rapid, efficient training.

At the other end of the line, inference, Taboola has matched Dell EMC PowerEdge servers to the workload for fast, power-efficient results.

“We invested heavily in the FX server line from Dell. We didn’t want to go with blade servers or have any customized hardware. Dell was early in identifying us as not a regular customer but they put us in touch with their OEM and DSS team for specialized workloads and needs. They really helped us with storage as well as with our overall growing server infrastructure.”

He makes the point that the first step in optimization is working from a systems base that will support unique and often shifting requirements. In Taboola’s case, inference needs continued to grow as model training became more advanced. Further, all infrastructure had to be supported across their HDFS-based clusters, with Cassandra, Vertica, and other analytical systems interfacing across distributed machines. He says the personalized, tailored approach to choosing and scaling infrastructure began with Dell’s focus on understanding their uniqueness. From there on, tweaking was less about fixing than about refining, which is what true optimization should always be.

More detail about the overall infrastructure stack at Taboola can be found here.
