
Julia Still Not Grown Up Enough to Ride Exascale Train

We’ve been watching Julia, a programming language designed for technical and scientific computing, for a number of years to see if it can make inroads into supercomputing. This week we got a look at how it performs on an exascale system.

The main selling point of Julia has been that it combines the computational efficiency of languages like C and Fortran with the ease of use of languages like Python and R.

Julia is particularly strong in areas that demand heavy computation and data manipulation, such as supercomputing, and it has built-in support for parallel and distributed computing. It is open source and relies on Just-In-Time (JIT) compilation through LLVM to achieve its high performance.
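For a quick taste of what that JIT actually does: Julia compiles each method to native code through LLVM the first time it is called for a given set of argument types, and the generated IR can be inspected directly (a minimal illustration, not code from the study):

```julia
# A simple function; Julia specializes and compiles it per argument type.
axpy(a, x, y) = a * x + y

# The first call triggers JIT compilation through LLVM for Float64 arguments.
axpy(2.0, 3.0, 4.0)

# Inspect the LLVM IR the JIT generated for this specialization.
using InteractiveUtils  # exports @code_llvm (loaded automatically in the REPL)
@code_llvm axpy(2.0, 3.0, 4.0)
```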

One of the reasons it’s had some traction in HPC is that there are many libraries for various scientific needs, with syntax similar to MATLAB’s, making it easier for researchers who are not HPC experts to transition. Another fancy feature is the ability to call C and Fortran libraries directly, without special wrappers, enabling seamless integration with legacy codebases. It also has native support for a range of parallel computing paradigms, from shared-memory multi-threading to distributed-memory clusters and even GPUs, making it versatile for various kinds of computational workloads.
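To make two of those features concrete, here is a minimal sketch of wrapper-free C interop and built-in threading; the getenv call and the loop body are illustrative choices on our part, not examples from the study:

```julia
# Call libc's getenv directly through the built-in @ccall macro;
# no wrapper code or binding generator is needed.
path = @ccall getenv("HOME"::Cstring)::Cstring
println(unsafe_string(path))  # assumes HOME is set; a NULL return would need a check

# Shared-memory parallelism is just as direct: start Julia with
# `julia --threads=8` and annotate the loop.
v = rand(1_000_000)
Threads.@threads for i in eachindex(v)
    v[i] = sin(v[i])
end
```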

While this sounds ideal, the language gets pushed past some of its limits at extreme scale, although it’s hard to fault anything for showing some cracks on the first U.S. exascale system, the Frontier supercomputer at Oak Ridge National Laboratory.

A breakdown of Julia’s performance on Frontier suggests that, as its title indicates, Julia is promising but not yet a fully optimized language for end-to-end workflows, at least at that scale.

It does hold up its end of the bargain as a unifying language for end-to-end HPC workflows on Frontier. The problem is that its computational performance lags, and significantly: as the paper outlines, there is a gap of nearly 50 percent compared to native AMD HIP codes on Frontier’s GPUs.
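To ground what that comparison means, here is a rough sketch of how Julia code typically targets Frontier’s AMD GPUs, assuming the AMDGPU.jl package; the native HIP codes in the comparison do the same work in hand-written C++ kernels at a much lower level. This is an illustration, not the study’s benchmark code:

```julia
using AMDGPU  # Julia's AMD GPU stack; assumes a working ROCm installation

# Move data to the GPU; ROCArray mirrors Julia's standard Array API.
a = ROCArray(rand(Float32, 1_000_000))
b = ROCArray(rand(Float32, 1_000_000))

# Broadcasting compiles a fused GPU kernel through the same LLVM JIT.
c = a .* 2.0f0 .+ b

result = Array(c)  # copy the result back to host memory
```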

And while Julia scales well up to 4,096 MPI processes and exhibits near-zero overhead in MPI communications and parallel I/O, issues arise at larger scales and the JIT compilation introduces initial overheads.
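For reference, the kind of MPI.jl program that gets scaled up this way is about as plain as MPI gets; this is a generic sketch, not the paper’s benchmark:

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Each rank computes a local partial result over its own strided slice...
local_sum = sum(sin, (rank + 1):nranks:1_000_000)

# ...and a single collective combines the partial results across all ranks.
total = MPI.Allreduce(local_sum, +, comm)

rank == 0 && println("sum over $nranks ranks: $total")
MPI.Finalize()
```

Launched through the usual machinery, e.g. `mpiexec -n 8 julia script.jl`, it behaves like any other MPI binary, which is part of why the measured communication overhead is near zero.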

The language shows good weak-scaling properties, but the study notes increased variability in time-to-solution when moving to larger process counts.

On a positive note, its MPI and ADIOS2 bindings demonstrate near-zero overheads, indicating Julia can handle communication and I/O tasks well. And sorry for the “but” but, the study didn’t explore GPU-aware MPI, which could be an avenue for further performance improvements and lift Julia’s standing at exascale a bit.
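For flavor, a rough sketch of what parallel I/O through the ADIOS2.jl bindings looks like; the names below (adios_init_mpi, declare_io, define_variable, put!, perform_puts!) follow the package’s documented API, but treat this as an unverified illustration rather than the study’s code, and check signatures against the ADIOS2.jl docs:

```julia
using ADIOS2, MPI

MPI.Init()
comm = MPI.COMM_WORLD

# Initialize ADIOS2 on top of the MPI communicator.
adios = adios_init_mpi(comm)
io = declare_io(adios, "SimulationOutput")

data = rand(Float64, 128)
var = define_variable(io, "temperature", data)

# Write one step to a BP file; every rank participates collectively.
engine = open(io, "output.bp", mode_write)
put!(engine, var, data)
perform_puts!(engine)
close(engine)

MPI.Finalize()
```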

That other key feature, Just-In-Time (JIT) compilation, also introduces an initial overhead. That cost is amortized over longer runs, which is fine, though in an HPC setting it could still be a concern for short, time-critical jobs.
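The effect is easy to see with Julia’s own @time macro: the first call to a function pays the compilation bill, and subsequent calls run already-compiled native code.

```julia
f(x) = sum(sin, x)
x = rand(10^6)

@time f(x)  # first call: timing includes JIT compilation
@time f(x)  # second call: compiled native code only
```

On the first call, @time reports what fraction of the time went to compilation; on the second, that fraction drops to zero.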

Julia was successful in providing a seamless workflow from computational simulation to data analysis and visualization in Jupyter Notebooks. This flexibility is a strong point in favor of Julia as an end-to-end HPC solution.
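That workflow rests on the IJulia kernel, which plugs Julia into Jupyter with a single package install (assuming a working Jupyter setup):

```julia
using Pkg
Pkg.add("IJulia")   # registers a Julia kernel with Jupyter

using IJulia
notebook()          # launches Jupyter with the Julia kernel available
```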

On balance, Julia shows a lot of promise: it unifies different HPC tasks under a single language and delivers near-zero overheads in communication and I/O.

Ah, but that pesky performance gap and the scalability concerns suggest Julia may not yet be fully ready for exascale computing without further optimization and testing.
