rCUDA
Introduction
rCUDA is middleware that provides seamless access to any CUDA-compatible device in a cluster from every compute node. It follows a client–server distributed architecture: GPUs can be shared among nodes, and a single node can use all of these graphics accelerators as if they were local. These properties aim to raise overall accelerator utilization while reducing resource, space, and energy requirements. The rCUDA client middleware runs on the same cluster node as the application requesting GPGPU acceleration, acting as a transparent replacement for the native CUDA libraries. The server middleware runs on the cluster nodes that host the physical GPUs and provides the requested GPGPU service.
The rCUDA client exposes the same interface as regular NVIDIA CUDA, including the runtime and driver APIs and the cuBLAS, cuFFT, cuRAND, and cuSPARSE libraries, so applications are unaware that they are running on top of a virtualization layer.
rCUDA also includes highly optimized, pipelined communications over TCP and low-level InfiniBand, as well as full multithreading and multi-node support. In addition, rCUDA has been integrated with the SLURM scheduler so that scheduled jobs can use remote GPUs.
We developed rCUDA jointly with the Parallel Architectures Group of the Technical University of Valencia, in a collaboration that lasted until January 2015.
People from HPC&A involved in the project
Development
- Antonio J. Peña: Main Architect
- Adrián Castelló: Development, Network and Support
- Sergio Iserte: Resource Management Integration
- Vicente Roca: Network
- Sonia Cervera: Libraries
Management
- Rafael Mayo
- Enrique S. Quintana-Ortí
Publications
PhD Dissertation
- A. J. Peña, “Virtualization of accelerators in high performance clusters,” Universitat Jaume I, Castellón, Spain, Jan. 2013.
- V. Roca, “Diseño de un Sistema de Comunicaciones para Virtualización Remota de Aceleradores Gráficos sobre Sistemas Heterogéneos,” Universitat Jaume I, Castellón, Spain, Dec. 2015.
Journals
- A. Castelló, A. J. Peña, R. Mayo, J. Planas, E. S. Quintana-Ortí, and P. Balaji, “Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models,” Journal of Supercomputing, 2016. DOI: 10.1007/s11227-016-1791-y
- C. Reaño, F. Silla, A. Castelló, A. J. Peña, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “Improving the user experience of the rCUDA remote GPU virtualization framework,” Concurrency and Computation: Practice and Experience, 2014. DOI: 10.1002/cpe.3409
- A. J. Peña, C. Reaño, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “A complete and efficient CUDA-sharing solution for HPC clusters,” Parallel Computing, vol. 40, no. 10, pp. 574–588, 2014. DOI: 10.1016/j.parco.2014.09.011
International Conferences
- S. Iserte, F. J. Clemente-Castelló, A. Castelló, R. Mayo and E. S. Quintana-Ortí, “Enabling GPU Virtualization in Cloud Environments,” in Proceedings of the International Conference on Cloud Computing and Services Science (CLOSER 2016), (Rome (Italy)), Apr. 2016.
- A. Castelló, A. J. Peña, R. Mayo, P. Balaji, and E. S. Quintana-Ortí, “Exploring the suitability of remote GPGPU virtualization for the OpenACC programming model using rCUDA,” in Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER 2015), (Chicago, IL (USA)), Sept. 2015.
- A. Castelló, R. Mayo, J. Planas, and E. S. Quintana-Ortí, “Exploiting task-parallelism on GPU clusters via OmpSs and rCUDA virtualization,” in Proceedings of the 1st IEEE International Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara 2015), (Helsinki (Finland)), Aug. 2015.
- A. Castelló, J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, V. Roca, and F. Silla, “On the Use of Remote GPUs and Low-Power Processors for the Acceleration of Scientific Applications,” in The Fourth International Conference on Smart Grids, Green Communications and IT Energy-aware Technologies (ISBN:978-1-61208-332-2), (ENERGY 2014, Chamonix (France)), pp. 57–62, 20–24 April 2014.
- C. Reaño, F. Silla, A. J. Peña, G. Shainer, S. Schultz, A. Castelló, E. S. Quintana-Ortí, and J. Duato, “Boosting the performance of remote GPU virtualization using InfiniBand Connect-IB and PCIe 3.0,” in Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER 2014), (Madrid, Spain), Sept. 2014.
- S. Iserte, A. Castelló, R. Mayo, E. S. Quintana-Ortí, C. Reaño, J. Prades, F. Silla, and J. Duato, “Slurm support for remote GPU virtualization: Implementation and performance study,” in Proceedings of the International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2014), (Paris, France), Oct. 2014.
- C. Reaño, F. Silla, R. Mayo, E. S. Quintana-Ortí, J. Duato, and A. J. Peña, “Influence of InfiniBand FDR on the performance of remote GPU virtualization,” in IEEE Cluster, Indianapolis, IN, USA, Sep. 2013.
- C. Reaño, A. J. Peña, F. Silla, J. Duato, R. Mayo, and E. S. Quintana-Ortí, “CU2rCU: towards the complete rCUDA remote GPU virtualization and sharing solution,” in Proceedings of the International Conference on High Performance Computing (HiPC), Pune, India, Dec. 2012.
- J. Duato, J. C. Fernández, R. Mayo, A. J. Peña, E. S. Quintana, and F. Silla, “Enabling CUDA acceleration within virtual machines using rCUDA,” in High Performance Computing Conference (HiPC), Bangalore, India, Dec. 2011.
- J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, “Performance of CUDA virtualized remote GPUs in high performance clusters,” in International Conference on Parallel Processing (ICPP), pp. 365–374, Taipei, Taiwan, Sep. 2011.
- J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. S. Quintana-Ortí, “rCUDA: reducing the number of GPU-based accelerators in high performance clusters,” in Proceedings of the International Conference on High Performance Computing and Simulation (HPCS), Caen, France, June 2010.
- J. Duato, F. D. Igual, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, “An efficient implementation of GPU virtualization in high performance clusters,” in Euro-Par 2009, Parallel Processing – Workshops, Lecture Notes in Computer Science, vol. 6043, pp. 385–394, Springer-Verlag, 2010.
Spanish conferences
- S. Iserte, A. Castelló, R. Mayo, E. S. Quintana-Ortí, J. Prades, C. Reaño, F. Silla, and J. Duato, “Comparativa de políticas de selección de GPUs remotas en clusters HPC,” (XXVI Jornadas de Paralelismo (JP2015), Córdoba (Spain)), 2015.
- S. Iserte, A. Castelló, A. J. Peña, C. Reaño, J. Prades, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “Extendiendo SLURM con soporte para el uso de GPUs remotas,” in Actas de las XXV Jornadas de Paralelismo (ISBN:978-84-697-0329-3), (XXV Jornadas de Paralelismo (JP2014), Valladolid (Spain)), pp. 187–193, 17-19 September 2014.
- A. Castelló, J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, V. Roca, and F. Silla, “Acelerando aplicaciones científicas con GPUs remotas y procesadores de bajo consumo,” in Actas de las XXV Jornadas de Paralelismo (ISBN:978-84-697-0329-3), (XXV Jornadas de Paralelismo (JP2014), Valladolid (Spain)), pp. 187–193, 17-19 September 2014.
- S. Iserte, A. Castelló, C. Reaño, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “Un planificador de GPUs remotas para clusters HPC,” in Actas de las XXIV Jornadas de Paralelismo (ISBN:978-84-695-8330-2), (XXIV Jornadas de Paralelismo (JP2013), Madrid (Spain)), pp. 193–198, 17-20 September 2013.
- C. Reaño, A. Castelló, S. Iserte, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí and J. Duato, “Virtualización remota de GPUs: Evaluación de soluciones disponibles para CUDA,” in Actas de las XXIV Jornadas de Paralelismo (ISBN:978-84-695-8330-2), (XXIV Jornadas de Paralelismo (JP2013), Madrid (Spain)), pp. 270–275, 17-20 September 2013.
Talks
- S. Iserte, «GPU Virtualization in AWS Virtual Machines» in Amazon AWS Day at Universitat Jaume I, Castellón de la Plana (Spain). 6 June 2015
- S. Iserte, «Managing Virtualized Remote GPUs with Slurm» in Slurm Training ’15 (HPCKP’15), Barcelona (Spain). 6 February 2015
- S. Iserte, «Extending Slurm with Support for Remote GPU Virtualization» in Slurm User Group Meeting 2014, Lugano (Switzerland). 23 September 2014
- R. Mayo, «Managing the GPUs of your cluster in a flexible way with rCUDA» in HPC Advisory Council Brasil Conference 2014, University of São Paulo (Brazil). 26 May 2014
- A. J. Peña, R. Mayo, «rCUDA 4: GPGPU as a service in HPC clusters» in HPC Advisory Council Spain Conference 2012, University of Málaga (Spain). 13 September 2012
- R. Mayo, «rCUDA: a ready-to-use remote GPU virtualization framework» in HPC Advisory Council European Conference 2013, Leipzig (Germany). 16 June 2013
- F. Silla, A. J. Peña, «Accelerations hands-on: rCUDA, an approach to provide remote access to GPU computational power» in HPC Advisory Council Switzerland Conference 2012, Lugano (Switzerland). 13–15 March 2012
- T. Wide, R. Mayo, «Maximize the GPU Performance in Your Compute Cluster Using rCUDA Virtual GPU Technology», webinar, http://www.mellanox.com/webinars/2012/Using-rCUDA-Virtual-GPU-Technology/
- R. Mayo, «An approach to provide remote access to GPU computational power» in HPC Advisory Council China Workshop 2011, Jinan (China). 25 October 2011
- R. Mayo, «rCUDA: an approach to provide remote access to GPU computational power» in HPC Advisory Council European 2011 Workshop, Hamburg (Germany). 19 June 2011