A Path to Prominence: SMU's New High Performance Computing Cluster

SMU's ManeFrame

Southern Methodist University, a nationally ranked university in Dallas, has taken significant steps to achieve a new level of computational capability on its campus of 11,000 students. With strong programs in the humanities and sciences plus seven professional schools, SMU has recognized that the best path to advancing its research agenda runs through computationally based research rather than the traditional wet labs of the past.

After several generations of smaller clusters, SMU received a large 8,800-core cluster through a Department of Defense award several years ago. SMU held a naming contest for its powerful new machine: the winning name, "ManeFrame," reflected SMU's Mustang mascot and pony heritage. Eventually, that machine's capabilities were outgrown by the university's expanding computational research community, and a new solution was pursued.

A committee composed of representatives from the research community's Center for Scientific Computation and the Office of Information Technology proposed a configuration of worker nodes, high-memory nodes, and graphics processing unit (GPU) nodes, all backed by high-speed, multi-petabyte storage and a 100Gb/s internal network. Although the resulting system design would be costly both to purchase and to maintain, the University's administration and board supported the purchase, recognizing its importance to SMU's research prominence.

A key to the future success of the new high-performance computing (HPC) cluster would be the infrastructure needed to house it, and SMU was fortunate to have constructed a new state-of-the-art data center, upgraded its networking with fiber optics connecting all 100 campus buildings, and installed redundant high-speed fiber connections to LEARN, SMU's Internet service provider. With those needs satisfied, SMU proceeded to release an RFP, select vendors, and order the significantly improved High Performance Computing Cluster.

The new HPC cluster will dramatically increase the computational capacity and performance that SMU provides to its researchers. It features state-of-the-art CPUs, accelerators, a high-speed, low-latency network, a multi-petabyte parallel file system, larger per-node memory configurations, and advanced interactive GPU-accelerated remote desktop experiences. The cluster is also far more energy efficient than its predecessor, making it both more economical to run and more environmentally friendly.

Joe Gargiulo and Allen Hughes

SMU Data Center Corridor

The new ManeFrame II cluster will provide an interactive experience similar to that of the soon-to-be-replaced cluster. The familiar operating system, resource scheduler, and Lmod environment module system will carry over to the new cluster. Additionally, updated but familiar development tool chains will be available, making the transition to the significantly improved cluster as easy as possible for SMU's researchers.

The new and more efficient architecture, high core counts, and large memory capacities of ManeFrame II's nodes will bring significant improvements to existing compute- or memory-intensive workflows. The new cluster consists of:

  • 176 standard compute nodes with dual Intel Xeon E5-2695v4 2.1 GHz 18-core Broadwell processors, 256GB of memory, and 100Gb/s networking,
  • 35 medium-memory compute nodes with the same processors as the standard ones but with 768GB of memory,
  • Five high-memory compute nodes with 1,536GB (1.5TB) of DDR4-2400 memory,
  • Four 768GB and six 1,536GB (1.5TB) nodes recently added to the previous ManeFrame, which will also be migrated to the new cluster,
  • 36 accelerator nodes powered with dual Intel Xeon E5-2695v4 2.1 GHz 18-core Broadwell processors, 256GB of DDR4-2400 memory, and one NVIDIA P100 GPU accelerator,
  • 36 many-core nodes with Intel Xeon Phi 7230 (also known as Knights Landing or KNL) processors and 384GB of DDR4-2400 memory,
  • Five virtual desktop nodes that give researchers remote desktop access to high-performance compute capability for applications with demanding remote visualization and/or rendering requirements; these virtual desktops can be configured as either Linux or Windows for a handful of compatible applications,
  • High-speed, low-latency EDR InfiniBand networking: every node is equipped with a Mellanox ConnectX-5 InfiniBand adapter, and all nodes are connected via Mellanox Switch-IB 2 switches (the sketch after this list illustrates the kind of tightly coupled job this fabric is built for), and
  • Three new storage systems. The first will be NFS-based storage providing space for home directories, applications, libraries, compilers, etc. The second will provide a high-performance Lustre parallel file system for calculation scratch space. The third is 110TB of usable disk-based archive space that includes off-site backup for disaster recovery.
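
To make the hardware description concrete, here is a minimal sketch in C of the kind of tightly coupled MPI job the EDR InfiniBand fabric and high-core-count nodes are designed to run. It assumes an MPI implementation is among the development tool chains mentioned above and that the needed compiler and MPI modules have been loaded via Lmod; the module names, compiler wrapper, and launch details in the comments are illustrative assumptions, not details confirmed by this article.

    /* Minimal MPI sketch: each rank reports which node it landed on.
     * Assumed build and launch steps (module names are hypothetical):
     *   module load gcc openmpi    (via the Lmod module system)
     *   mpicc mf2_hello.c -o mf2_hello
     * then submit the job through the cluster's resource scheduler. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's ID     */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total ranks in job    */
        MPI_Get_processor_name(host, &len);   /* compute node hostname */

        printf("Rank %d of %d running on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }

A run spanning several of the 36-core standard nodes would print one line per rank, with the hostnames showing how the scheduler spread the job across the InfiniBand-connected nodes.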

Southern Methodist University's ManeFrame II HPC cluster, while still under construction, has generated considerable excitement at the university and demonstrates SMU's commitment to research through high-performance computing. Joe Gargiulo, CIO of SMU, is convinced that the new cluster is on a path to help achieve the university's goal of "World Changers Shaped Here!"

SMU's New Data Center (old on the left, new construction in progress on the right)