rCUDA

Introduction

rCUDA is middleware that gives every compute node in a cluster seamless access to any CUDA-compatible device in that cluster. It follows a client-server distributed architecture. GPUs can be shared between nodes, and a single node can use all of these graphics accelerators as if they were local. These properties aim to raise accelerator utilization across the system while reducing resource, space, and energy requirements. The rCUDA client middleware runs on the same cluster node as the application requesting GPGPU acceleration and provides a transparent replacement for the native CUDA libraries. The server middleware runs on the cluster nodes that host the actual GPUs and provide the requested GPGPU service.

The rCUDA client exposes the same interface as regular NVIDIA CUDA, including the runtime and driver APIs and the cuBLAS, cuFFT, cuRAND, and cuSPARSE libraries, so applications are not aware that they are executing on top of a virtualization layer.
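The client-server split described above can be illustrated with a toy model. The sketch below is not rCUDA code and does not reflect its wire protocol: `FakeGpu`, `RemoteGpu`, `serve_one_client`, `application`, and `run_demo` are hypothetical names, the "device" is plain Python, and calls are forwarded as JSON lines over a local TCP socket. It only shows the general idea of a client stub that keeps the same interface as the local library, so the application code does not change.

```python
import json
import socket
import threading


class FakeGpu:
    """Hypothetical stand-in for a real device; the 'kernel' is a Python loop."""

    def vector_add(self, a, b):
        return [x + y for x, y in zip(a, b)]


def serve_one_client(listener, device):
    """Server side: accept one connection and execute each forwarded call."""
    conn, _ = listener.accept()
    with conn, conn.makefile("r") as reader, conn.makefile("w") as writer:
        for line in reader:  # one JSON request per line, until the client closes
            request = json.loads(line)
            # Toy dispatch by method name; a real server would validate this.
            result = getattr(device, request["call"])(*request["args"])
            writer.write(json.dumps({"result": result}) + "\n")
            writer.flush()


class RemoteGpu:
    """Client side: same method names as FakeGpu, but every call is forwarded."""

    def __init__(self, host, port):
        self._sock = socket.create_connection((host, port))
        self._reader = self._sock.makefile("r")
        self._writer = self._sock.makefile("w")

    def vector_add(self, a, b):
        request = {"call": "vector_add", "args": [a, b]}
        self._writer.write(json.dumps(request) + "\n")
        self._writer.flush()
        return json.loads(self._reader.readline())["result"]

    def close(self):
        self._writer.close()
        self._reader.close()
        self._sock.close()


def application(gpu):
    """Application code: unaware of whether `gpu` is local or remote."""
    return gpu.vector_add([1, 2, 3], [10, 20, 30])


def run_demo():
    """Run the same application against a local and a forwarded device."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_one_client, args=(listener, FakeGpu()))
    server.start()
    remote = RemoteGpu("127.0.0.1", port)
    try:
        return application(FakeGpu()), application(remote)
    finally:
        remote.close()   # server sees end-of-stream and its loop ends
        server.join()
        listener.close()
```

Calling `run_demo()` runs `application` once against the local stand-in and once through the forwarding stub; because both objects expose the same method, the application function is identical in both cases, which is the property the paragraph above attributes to the rCUDA client.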

rCUDA additionally includes highly optimized TCP and low-level InfiniBand pipelined communications as well as full multi-thread and multi-node capabilities. Furthermore, an integration of rCUDA with the SLURM scheduler has been developed, allowing your scheduled jobs to use remote GPUs.

We have developed rCUDA in a joint collaboration (until January 2015) with the Parallel Architectures Group from the Technical University of Valencia.

People from HPC&A involved in the project



Development

  • Antonio J. Peña: Main Architect
  • Adrián Castelló: Features, Network and Support
  • Sergio Iserte: Resource Management Integration
  • Vicente Roca: Network
  • Sonia Cervera: Libraries

Management

Publications



PhD Dissertation

Journals

International Conferences

Spanish conferences

  • S. Iserte, A. Castelló, R. Mayo, E. S. Quintana-Ortí, J. Prades, C. Reaño, F. Silla, and J. Duato, “Comparativa de políticas de selección de GPUs remotas en clusters HPC,” (XXVI Jornadas de Paralelismo (JP2015), Córdoba (Spain)), 2015.
  • S. Iserte, A. Castelló, A. J. Peña, C. Reaño, J. Prades, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “Extendiendo SLURM con soporte para el uso de GPUs remotas,” in Actas de las XXV Jornadas de Paralelismo (ISBN:978-84-697-0329-3), (XXV Jornadas de Paralelismo (JP2014), Valladolid (Spain)), pp. 187–193, 17-19 September 2014.
  • A. Castelló, J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, V. Roca, and F. Silla, “Acelerando aplicaciones cientificas con GPUs remotas y procesadores de bajo consumo,” in Actas de las XXV Jornadas de Paralelismo (ISBN:978-84-697-0329-3), (XXV Jornadas de Paralelismo (JP2014), Valladolid (Spain)), pp. 187–193, 17-19 September 2014.
  • S. Iserte, A. Castelló, C. Reaño, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “Un planificador de GPUs remotas para clusters HPC,” in Actas de las XXIV Jornadas de Paralelismo (ISBN:978-84-695-8330-2), (XXIV Jornadas de Paralelismo (JP2013), Madrid (Spain)), pp. 193–198, 17-20 September 2013.
  • C. Reaño, A. Castelló, S. Iserte, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí and J. Duato, “Virtualización remota de GPUs: Evaluación de soluciones disponibles para CUDA,” in Actas de las XXIV Jornadas de Paralelismo (ISBN:978-84-695-8330-2), (XXIV Jornadas de Paralelismo (JP2013), Madrid (Spain)), pp. 270–275, 17-20 September 2013.

Talks

  • S. Iserte, "GPU Virtualization in AWS Virtual Machines" in Amazon AWS Day at Universitat Jaume I, Castellón de la Plana (Spain). 6 June 2015
  • S. Iserte, "Managing Virtualized Remote GPUs with Slurm" in Slurm Training '15 (HPCKP'15), Barcelona (Spain). 6 February 2015
  • S. Iserte, "Extending Slurm with Support for Remote GPU Virtualization" in Slurm User Group Meeting 2014, Lugano (Switzerland). 23 September 2014
  • R. Mayo, "Managing the GPUs of your cluster in a flexible way with rCUDA" in HPC Advisory Council Brasil Conference 2014, May 26, 2014, University of Sao Paulo, Brazil
  • A. J. Peña, R. Mayo, "rCUDA 4: GPGPU as a service in HPC clusters" in HPC Advisory Council Spain Conference 2012, September 13th, 2012, University of Malaga, Spain
  • R. Mayo, "rCUDA: a ready-to-use remote GPU virtualization framework" in HPC Advisory Council European Conference 2013, June 16, 2013, Leipzig, Germany
  • F. Silla, A. J. Peña, "Accelerations hands-on: rCUDA, an approach to provide remote access to GPU computational power" in HPC Advisory Council Switzerland Conference 2012, March 13-15, 2012, Lugano, Switzerland
  • T. Wide, R. Mayo, "Maximize the GPU Performance in Your Compute Cluster Using rCUDA Virtual GPU Technology" http://www.mellanox.com/webinars/2012/Using-rCUDA-Virtual-GPU-Technology/
  • R. Mayo, "An approach to provide remote access to GPU computational power" in HPC Advisory Council China Workshop 2011, October 25th, 2011, Jinan, China
  • R. Mayo, "rCUDA: an approach to provide remote access to GPU computational power" in HPC Advisory Council European 2011 Workshop, June 19th, 2011, Hamburg, Germany