GCoE@BSC Related Publications

  • J. Gómez-Luna, I. El Hajj, L. Chang, V. Garcia-Flores, S. Garcia de Gonzalo, T. B. Jablin, A. J. Peña, and W. Hwu. “HPB: A suite of heterogeneous benchmarks with collaborative execution patterns”, in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), San Francisco, USA, Apr. 2017.
  • R. de la Cruz, A. Folch, P. Farre, J. Cabezas, N. Navarro, J. M. Cela. “Optimization of atmospheric transport models on HPC platforms”, Computers & Geosciences, Elsevier, vol. 97, pp. 30-39, Dec. 2016.
  • A. M. Aji, A. J. Peña, P. Balaji, and W. Feng. “MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL”, Parallel Computing, Elsevier, vol. 58, pp. 37-55, Oct. 2016.
  • V. Garcia, J. Gomez-Luna, T. Grass, A. Rico, E. Ayguade, and A. J. Peña. “Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications”, in IEEE International Symposium on Workload Characterization (IISWC), Rhode Island, USA, Sep. 2016.
  • A. Castelló, A. J. Peña, R. Mayo, J. Planas, E. S. Quintana-Ortí, and P. Balaji, Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models, Journal of Supercomputing, Springer, June 2016. [online] 10.1007/s11227-016-1791-y.
  • H. Pérez, B. Hernández, I. Rudomin, and E. Ayguadé. Task-based crowd simulation for heterogeneous architectures, in Innovative Research and Applications in Next-Generation High Performance Computing. April 2016.
  • G. Ozen, E. Ayguadé, and J. Labarta. Exploring dynamic parallelism in OpenMP, in Second Workshop on Accelerator Programming using Directives (WACCPD), November 2015.
  • J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta. SSMART: Smart scheduling of multi-architecture tasks on heterogeneous systems, in Second Workshop on Accelerator Programming using Directives (WACCPD), November 2015.
  • J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta. AMA: Asynchronous management of accelerators for task-based programming models, in International Conference on Computational Science (ICCS), June 2015.
  • Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes, Javier Cabezas (BSC), Lluis Vilanova (BSC), Isaac Gelado (NVIDIA Research), Thomas B. Jablin (UIUC), Nacho Navarro (BSC), Wen-mei Hwu (UIUC). ICS 2015, CA. June 2015
  • H. Pérez, B. Hernández, I. Rudomin, and E. Ayguadé. “Scaling crowd simulations in a GPU accelerated cluster”, in 6th International Conference ISUM, Mexico City (Mexico), March 2015.
  • GPU-SM: shared memory multi-GPU programming, Javier Cabezas, Marc Jordà, Isaac Gelado, Nacho Navarro and Wen-Mei Hwu, GPGPU-8, 8th Workshop on General Purpose Processing Using GPUs, February 2015.
  • Automatic execution of single-GPU computations across multiple GPUs, Javier Cabezas, Lluís Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei Hwu, In Proceedings of the 23rd international conference on Parallel architectures and compilation (PACT '14). DOI=10.1145/2628071.2628109, August 2014
  • Random Forests of Very Fast Decision Trees on GPU for Mining Evolving Big Data Streams, Diego Marron, Albert Bifet, Gianmarco De Francisci Morales, ECAI 2014, pp. 615-620, August 2014
  • Enabling Preemptive Multiprogramming on GPUs, Ivan Tanasic, Isaac Gelado (NVIDIA Research), Javier Cabezas, Alex Ramirez, Nacho Navarro, Mateo Valero, ISCA 2014, MN, June, 2014
  • Experimental Assessment of a High Performance Back-end PCE for Flexgrid Optical Network Re-optimization, Lluís Gifre, Luis Velasco, Nacho Navarro, Gabriel Junyent, Optical Fiber Communication Conference and Exposition (OFC), March 2014. E-print UPC
  • Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications, Javier Cabezas, Isaac Gelado, John S Stone, Nacho Navarro, David B Kirk, Wen-mei W Hwu. IEEE Transactions on Parallel and Distributed Systems. April, 2014 IEEE Xplore
  • Auto-Tuning of Data Communication on Heterogeneous Systems, Marc Jorda, Ivan Tanasic, Luis Vilanova, Javier Cabezas, Isaac Gelado and Nacho Navarro, IEEE 7th International Symposium on Embedded Multicore/Many-core System-on-Chip (MCSoC’13), September 2013.
  • Architecture of a Specialized Back-End High Performance Computing-Based PCE for Flexgrid Networks, Lluís Gifre, Luis Velasco, Nacho Navarro, 15th International Conference on Transparent Optical Networks (ICTON), June 2013. 10.1109/ICTON.2013.6602716
  • Self-Adaptive OmpSs Tasks in Heterogeneous Environments, J. Planas, Badia, R. M., Ayguadé, E., and Labarta, J., 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2013). IEEE, Boston, United States, pp. 138–149, 2013.
  • Comparison Based Sorting for Systems with Multiple GPUs, Ivan Tanasic, Lluís Vilanova, Marc Jorda, Javier Cabezas, Isaac Gelado, Nacho Navarro and Wen-mei W. Hwu, GPGPU-6 - Sixth Workshop on General Purpose Processing Using GPUs, Houston, TX (United States), Mar 2013.
  • Parallelizing General Histogram Application for CUDA Architectures, Ugljesa Milic, Isaac Gelado, Nikola Puzovic, Alex Ramirez and Milo Tomasevic, 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII), July 2013, http://dx.doi.org/10.1109/SAMOS.2013.6621100
  • Productive Programming of GPU Clusters with OmpSs, J. Bueno-Hedo, Planas, J., Duran, A., Badia, R. M., Martorell, X., Ayguadé, E., and Labarta, J., 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2012). IEEE Computer Society, pp. 557-568, 2012.
  • Assessing Accelerator-Based HPC Reverse Time Migration, Mauricio Araya-Polo, Javier Cabezas, Mauricio Hanzich, Miquel Pericas, Felix Rubio, Isaac Gelado, Muhammad Shafiq, Enric Morancho, Nacho Navarro, Eduard Ayguade, Jose Maria Cela, Mateo Valero, IEEE Transactions on Parallel and Distributed Systems, pp. 147-162, January 2011 (vol. 22 no. 1), doi:10.1109/TPDS.2010.144
  • Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL, R. Ferrer, J. Planas, P. Bellens, A. Duran, M. Gonzalez, X. Martorell, R. Badia, E. Ayguade, J. Labarta, in Proceedings of the 23rd International Workshop on Languages and Compilers for Parallel Computing (LCPC2010), Lecture Notes in Computer Science, vol. 6548/2011. Springer-Verlag Berlin Heidelberg, pp. 215-229, 2011
  • An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems, I. Gelado, J.E. Stone, J. Cabezas, S. Patel, N. Navarro and W.W. Hwu, 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'10), March 2010, Pittsburgh, PA.
  • High-Performance Reverse Time Migration on GPU, Javier Cabezas, Mauricio Araya-Polo, Isaac Gelado, Nacho Navarro, Enric Morancho and José M. Cela, In XXVIII International Conference of the Chilean Computer Society - XIII Workshop on Parallel and Distributed Systems (WSDP), Santiago de Chile (Chile), Nov 2009
  • Assessing Accelerator-based HPC Reverse Time Migration, Mauricio Araya-Polo, Javier Cabezas, Mauricio Hanzich, Miquel Pericàs, Félix Rubio, Isaac Gelado, Muhammad Shafiq, Enric Morancho, Nacho Navarro, Eduard Ayguadé, Jose María Cela and Mateo Valero. IEEE Transactions on Parallel and Distributed Systems.
  • Predictive Runtime Code Scheduling for Heterogeneous Architectures, Victor Jimenez, Isaac Gelado, Luis Vilanova, Marisa Gil, Grigori Fursin and Nacho Navarro, HiPEAC 2009 Conference, January 2009
  • An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Eduard Ayguade, Rosa M. Badia, Francisco D. Igual, Jesus Labarta, Rafael Mayo and Enrique S. Quintana-Ortí, in Proceedings of the EuroPar Conference (EuroPar 2009)
  • CUBA: An Architecture for Efficient CPU/Co-processor Data Communication, Isaac Gelado, John H. Kelm, Shane Ryoo, Nacho Navarro, Steve S. Lumetta, and Wen-mei W. Hwu, Proceedings of the 22nd ACM International Conference on Supercomputing, June 2008
  • Implementing Closed-Form Expressions on FPGAs Using the NAL, with Comparison to CUDA GPU and Cell BE Implementations, Robin Bruce, Javier Setoain, Richard Chamberlain, Malachy Devlin, Rosa M. Badia, Reconfigurable Systems Summer Institute 2008 (RSSI 2008), Urbana (Illinois), 2008
  • CIGAR: Application Partitioning for a CPU/Coprocessor Architecture, John H. Kelm, Isaac Gelado, Mark Murphy, Steven Lumetta, Nacho Navarro, Wen-mei Hwu, Proceedings of the Sixteenth International Conference on Parallel Architectures and Compilation Techniques, September 2007
  • Implicit Parallel Programming Models for Thousand-Core Microprocessors, Wen-mei Hwu, Shane Ryoo, Sain-Zee Ueng, John H. Kelm, Isaac Gelado, Sam S. Stone, Robert E. Kidd, Sara S. Baghsorkhi, Aqeel A. Mahesri, Stephanie C. Tsao, Nacho Navarro, Steve S. Lumetta, Matthew I. Frank, and Sanjay J. Patel, Proceedings of the 44th Annual Design Automation Conference, June 2007