Barcelona Supercomputing Center / Universitat Politecnica de Catalunya

NVIDIA GPU Center of Excellence

Since 2011, the Barcelona Supercomputing Center (BSC), in association with Universitat Politecnica de Catalunya (UPC), has been an NVIDIA GPU Center of Excellence.

BSC-GCoE at 2017 GTC, Silicon Valley, May 8-11, 2017

At GTC 2017, BSC will present:


  • Interactive HPC: Large Scale In-situ Visualization using NVIDIA IndeX in ALYA MultiPhysics - Vishal Mehta, Christopher Lux, and Marc Nienhaus
  • OmpSs+OpenACC: Multi-target Task-Based Programming Model exploiting OpenACC GPU Kernels - Guray Ozen
  • Urban Scale Crowd Data Analysis, Simulation, and Visualization - Isaac Rudomin

Instructor-led Labs:

  • Best GPU Code Practices Combining OpenACC, CUDA, and OmpSs - Pau Farre and Antonio J. Peña


  • Simulating the Behavior of the Human Brain on NVIDIA GPUs (Human Brain Project) - P. Valero-Lara, I. Martinez-Perez, Antonio J. Peña, X. Martorell, R. Sirvent, and J. Labarta

BSC-GCoE at 2017 GPU Hackathon located at Jülich Supercomputing Center, March 6-11, 2017

PUMPS 2017 Summer School at BSC/UPC, June 26-30, 2017

AsHES Workshop, May 29, 2017

SCALE Challenge, with CCGrid, May 14-17, 2017

The GCoE at SC'16, November 2016

  • Vishal Mehta, Sr. Engineer and PhD Student, presented a talk at the NVIDIA booth titled “The Human Sniff: Application of NVIDIA IndeX Advanced Rendering Solution in HPC”. A graphical demonstration was shown at both the NVIDIA and BSC booths.

1st Annual BSC/UPC HPC Hackathon, October 21, 2016

GTC Europe 2016, September 28-29, 2016, Amsterdam

At GPU Technology Conference (GTC), this GCoE presented:

  • Best Practices for OpenACC Optimizations in Large Scale Multi-Physics Applications
    • Vishal Mehta, Sr. Engineer and PhD Student. Learn best practices for finite element modelling in multi-physics applications: choosing the right approach to sparse matrix assembly for performance and portability, adding OpenACC kernels and parallel regions with proper granularity, and programming large-scale MPI simulations along with OpenACC. The workshop offers hands-on coding experience with vectorizing code and adding OpenACC pragmas in the presence of data dependences, and ends with scaling the code across MPI processes.
  • Analyzing the effect of last level cache sharing on integrated platforms with fine-grain CPU-GPU collaboration
    • Victor Garcia, PhD Candidate; Antonio J. Peña, GCoE Acting Director; and Eduard Ayguadé, CS Department Associate Director, in collaboration with Juan Gómez-Luna, Thomas Grass, and Alejandro Rico. Although on-die GPU integration seems to be the trend among the major microprocessor manufacturers, there are still many open questions regarding the architectural design of these systems. This poster is a step towards understanding the effect of on-chip resource sharing between GPU and CPU cores, and in particular the impact of last-level cache (LLC) sharing in heterogeneous computations.

PUMPS 2016 Summer School at BSC/UPC, July 11-15, 2016

PATC Course: Introduction to OpenACC, July 7-8, 2016

PATC Course: Introduction to CUDA Programming, July 4-6, 2016

GTC 2016, April 4-8, 2016, Silicon Valley

At GPU Technology Conference (GTC), this GCoE presented:

  • ALYA Multi-Physics System on GPUs: Offloading Large-Scale Computational Mechanics Problems
    • Vishal Mehta, Sr. Engineer and PhD Student. Learn to interface CUDA kernels, CUDA library APIs, and driver APIs with existing Fortran applications in HPC. This session introduces the Alya multi-physics code developed at the Barcelona Supercomputing Center. The code is based on Fortran95 and scales across thousands of cores. We describe in depth how to port computationally heavy modules from Fortran to CUDA. The session teaches in depth how to use CUDA features such as dynamic parallelism, CUDA streams, unified memory, and error handling for Fortran applications with the NVCC compiler. We also discuss future directions using next-generation programming models such as OmpSs for hybrid CPU and GPU computing. The presentation includes various example codes for improving the programming skills of the scientific community.
  • HPC Application Porting to CUDA at BSC
    • Pau Farre, Jr. Engineer, and Mar Jorda, Jr. Engineer. In this session you will learn about the main challenges that we have overcome at BSC to successfully accelerate two large applications using CUDA and NVIDIA GPUs: WARIS (a Volcanic Ash Transportation Model) and PELE (a Drug Molecule Interaction Simulator). We show that leveraging asynchronous execution is key to achieving high utilization of GPU resources (even for very small problem sizes) and to overlapping CPU and GPU execution. We also explain some techniques for introducing Unified Virtual Memory in your data structures for seamless CPU/GPU data sharing. Our results show an execution time improvement in WARIS of 8.6x for a 4-GPU node compared to a 16-core CPU node (using by-hand AVX vectorization and MPI). Preliminary experiments in PELE already show a 2x speedup.
  • Implementing Deep Learning for Video Analytics on Tegra X1
    • Carles Fernandez, UPC startup Herta Security. The performance of the Tegra X1 architecture opens the door to real-time evaluation and deployment of deep neural networks for video analytics applications. This session presents a highly optimized, low-latency pipeline to accelerate demographics estimation based on deep neural networks in videos. The proposed techniques leverage the on-die hardware video decoding engine and Maxwell GPU cores for conducting advanced video analytics such as gender or age estimation. Our results show that Tegra X1 is the right platform for developing embedded video analytics solutions.

Antonio J. Peña, new Acting Director of the GCoE - March 2016

In Memoriam - Nacho Navarro, February 2016

  • We are very sad to say goodbye to our beloved Nacho Navarro, who has been the Acting Director of this GCoE since its beginning. We will all miss you.