BSC/UPC GCoE Achievements Summary

  • Enabling Preemptive Multiprogramming on GPUs
    • PhD Candidate Ivan Tanasic successfully passed the pre-defense of his doctoral dissertation titled “Towards Multiprogrammed GPUs”. Congrats!
  • PUMPS Summer School, “Programming and Tuning Massively Parallel Systems”
    • Since 2010, UPC and BSC have organized the internationally recognized PUMPS summer school, sponsored by NVIDIA, with distinguished faculty members Dr. David B. Kirk of NVIDIA and Prof. Wen-mei Hwu of the University of Illinois. Each year, a highly competitive selection of close to 100 international students comes to UPC/BSC to learn how to exploit programming languages such as CUDA, MPI, and OmpSs, and thereby optimize their applications to run on new GPU platforms. Attendees present their projects at a poster session, and at the end of the week the best students receive the Best Poster and Best Achievement Awards. We have become the European hub for GPU computing training.
  • CUDA on ARM platforms: Supercomputing for the “Small” Masses
    • We believe that the CARMA Kit is a big step towards enabling CUDA to reach a broader range of devices. Our work helped produce the first hardware that allows executing CUDA applications on ARM platforms, and will help bring CUDA to mobile devices in the near future. Furthermore, we have shown that HPC systems can be built from mobile parts and GPUs, and that supercomputing workloads can be executed on those systems using novel programming models such as OmpSs.
    • Finalist at the NVIDIA CCOE Achievement Award 2013. Abstract describing the achievement: Achievement2013_BSC.pdf
    • We promote and support the ARM+GPU prototypes under the Mont-Blanc EU Project
    • Alex Ramirez, the project leader at BSC, is now Principal Research Scientist at NVIDIA in the Architecture group.
  • GMAC: Global Memory for Accelerators
    • The asymmetric distributed shared memory model (ADSM) for heterogeneous parallel systems was seminal in introducing a Unified Virtual Address Space (UVAS) shared by the CPU and GPU. ADSM lets programmers declare objects once and use them in both CPU and GPU functions with no explicit memory transfers at all. The ADSM runtime transparently creates the needed copies of the objects and keeps them coherent at consistency points, using acquire/release consistency at kernel-call boundaries. ADSM is implemented in the GMAC user-level library, which provides eager GPU memory updates to transparently overlap CPU computation (e.g., data initialization) with data transfers.
    • Presented at ASPLOS XV, NY, USA, 2010. Repository: https://code.google.com/p/adsm/
    • Isaac Gelado, who was leading this research, is now Research Scientist at NVIDIA Research in the Programming Models Group.
  • Accelerating Face Detection and Deep Learning on GPUs
    • UPC startup Herta Security is ahead in using GPUs to extract biometric data. In an era where security has become a major growth industry, technologies that refine facial recognition are in high demand. Herta Security is on the cutting edge of real-time face recognition and has developed a number of solutions, including Biosurveillance and BioFinder, high-performance video-surveillance products specifically designed to identify subjects simultaneously in crowded, changing environments. On the non-security side, Herta has developed BioMarketing, which can identify parameters such as gender, approximate age, use of glasses, and various facial expressions, enabling advertisers to reach an identified audience with a specific message.
    • Herta Security's CEO, Javier Rodriguez Saeta, was selected to present this work at the Emerging Companies Summit “Show & Tell” event at GTC 2015
    • Press Release (Spanish)
  • OmpSs: Leveraging CUDA for Productive Programming in Clusters of Multi-GPU Systems
    • OmpSs is a directive-based model through which a programmer defines tasks in an otherwise sequential program. Directionality annotations describe each task's data access pattern and convey to the runtime the information it uses to automatically detect potential parallelism, perform data transfers, and optimize locality. Integrating this model with CUDA allows applications to leverage the dazzling performance of GPUs, enabling the same simple and clean code that would run on an SMP to run on multi-GPU nodes and clusters.
    • Finalist at the NVIDIA CCOE Achievement Award 2012. Abstract describing the achievement: Achievement2012_BSC.pdf
    • Up to date status at Programming Models @ BSC: http://pm.bsc.es/
  • Scientific Big Data Visualization
    • The BSC Department of Computer Applications in Science and Engineering has designed and implemented a parallel visualisation system for the analysis of large-scale, time-dependent particle data. The particular challenge we address is how to analyse an HPC-scale dataset when a visual representation of the full set is not possible or useful, and one is interested only in finding and inspecting smaller subsets that fulfil certain complex criteria. The system runs on the BSC Minotauro supercomputer with 252 NVIDIA M2090 cards.
    • Among the videos that have won prizes:
      • “Alya Red: A Computational Heart”, a science dissemination video (made using scientific data visualizations) about a project for simulating a human heart. Winner of the 2012 International Science Visualization challenge organized by the National Science Foundation and the Science journal. link: http://youtu.be/tKD2hfF27rM
      • “Supercomputers”, a science dissemination video (made using scientific data visualizations) about the impact of HPC simulations in science and our daily lives. Winner, Category Exact Sciences, Engineering, and Technology, 2014 Ronda International Scientific Film Biennial. link: http://youtu.be/S9YPcPtPsuY
  • AMGE: Heterogeneous and Automatic multi-GPU Execution
    • AMGE (Automatic Multi-GPU Execution) is a programming interface, compiler support, and runtime system that automatically executes computations programmed for a single GPU across all the GPUs in the system, choosing the computation and data distribution configuration that minimizes inter-GPU communication and memory footprint.
    • Published at PACT 2014, GPGPU 2015.
    • Javier Cabezas, principal researcher and developer, was an intern at NVIDIA's Santa Clara labs during 2014.
  • Enabling Preemptive Multiprogramming on GPUs
    • Published at ISCA 41, the 41st International Symposium on Computer Architecture, June 2014
    • We are establishing a collaboration with NVIDIA Austin Lab on this topic.
  • Oil exploration: Accelerator-based HPC Reverse Time Migration
    • Oil and gas companies trust Reverse Time Migration (RTM), the most advanced seismic imaging technique, with crucial decisions on drilling investments. The economic value of the oil reserves whose localization requires RTM is on the order of 10^13 dollars. RTM's major strength, compared to previous solutions, is its capability to show the bottom of salt bodies several kilometers (6 km) beneath the earth's surface. Our multi-GPU Tesla implementation outperformed all other accelerators because of their higher parallelism and memory bandwidth, which allow a very efficient implementation of the data-parallel kernels in RTM.
    • The BSC CCOE and the Applications Department have been developing the GPU optimization and parallelization in partnership with REPSOL, the Spanish oil and gas company, running searches for oil reserves in the Gulf of Mexico.
achievements.txt · Last modified: 2016/12/15 17:55 by apena
www.bsc.es CUDA Research Center