Efficient Techniques for Advanced Interconnect Technologies 2
Supercomputers and data centers are fundamental infrastructures for the digital transition of any society, as they support services that are now essential for industry, science, and public administrations, such as High-Performance Computing (HPC), Artificial Intelligence (AI), and Big Data or High-Performance Data Analytics (HPDA). To provide the enormous computing power and storage capacity these services demand, the current generation of supercomputers (generally known as the Exascale generation) and modern data centers include a huge number of computing and storage nodes, on the order of tens or hundreds of thousands. Such a number of nodes requires a network able to interconnect them efficiently, so that the information needed to complete processes collaboratively flows among nodes fast enough to avoid idle time at the receiving nodes. Hence, both industry and academia have been carrying out intense R&D activity to boost the performance of interconnection networks. This project falls squarely within this research context, as its basic objectives are to propose new techniques that improve several aspects of the high-performance interconnect technologies available on the market (InfiniBand, BXI, HPC Ethernet, etc.), as well as network designs that make the most of the features of their components. Our proposals will aim to reach the performance demanded of the network in Exascale systems, while always taking care of efficiency from several points of view, for instance by keeping cost affordable and power consumption reasonable. In addition, special effort will be devoted to exchanging ideas and opinions with companies in this sector, in order to gain first-hand knowledge of their needs and to increase the possibilities of technology transfer.
Research Team
- Pedro Javier García. Full Professor at the Universidad de Castilla-La Mancha.
- Jesús Escudero Sahuquillo. Associate Professor at the Universidad de Castilla-La Mancha.
- Francisco José Quiles. Full Professor at the Universidad de Castilla-La Mancha.
- José Luis Sánchez. Full Professor at the Universidad de Castilla-La Mancha.
- Francisco J. Alfaro. Full Professor at the Universidad de Castilla-La Mancha.
- Javier Cano Cano. Software Engineer at Red Hat.
- Antonio Morán Muñoz. PhD Student.
- Cristina Olmedilla López. PhD Student.
- Gabriel Gómez López. PhD Student.
- Miguel Sánchez De la Rosa. PhD Student.
- José Duato Marín. Qsimov CTO.
- Gaspar Mora Porta. Senior Architect at Nvidia Corporation.
- Mikel Eukeni Pozo Astigarraga. Network Engineer at CERN.
- Said Derradji. Lead Hardware Architect at Atos BDS.
- Tor Skeie. Full Professor at the University of Oslo.
- Germán Maglione Mathey. Senior Software Engineer at Red Hat.
- Jose Manuel Rocher González. Consultant at Simula Research Laboratory.
- Juan José García-Castro Crespo. Engineer at ARM Ltd.
- Francisco José Andújar Muñoz. Associate Professor at the Universidad de Valladolid.
Patents
- P. Yébenes, J. Escudero, C. Gómez, P.J. García, F.J. Quiles, J. Duato. Método para reducir los efectos negativos de la congestión en redes de interconexión de alto rendimiento con topología híbrida para supercomputadores y grandes centros de proceso de datos. Nº P201331273, CCP: ES 2529700 B1. Assignee: Universidad de Castilla-La Mancha (2016).
- J. Escudero, P.J. García, F.J. Quiles. Método para descongestionar el tráfico de datos en redes de interconexión basadas en tecnología Infiniband. Nº P201331916, CCP: ES 2539248 B1. Assignee: Universidad de Castilla-La Mancha (2016).
PhD Theses
- New Queuing Schemes to Improve the Efficiency of Hybrid and Hierarchical High-Performance Interconnection Network Topologies
Student: Pedro Yébenes Segura. Advisors: Jesús Escudero Sahuquillo and Pedro Javier García García. Reading date: 05/11/2018.
- Habilitación de calidad de servicio en arquitecturas de switch jerárquico
Student: Javier Cano Cano. Advisors: Francisco José Alfaro Cortés and Francisco José Andújar Muñoz. Reading date: 07/10/2021.
- Efficient routing, job-isolation and congestion control techniques in commercial interconnection networks
Student: German Horacio Maglione Mathey. Advisors: Jesús Escudero Sahuquillo and Pedro Javier García García. Planned reading date: 18/10/2021.
- Upstream Progressive Network Reconfiguration Schemes for High Performance Networks
Student: Juan José García-Castro Crespo. Advisors: Francisco José Alfaro Cortés, José Luis Sánchez García and José Flich Cardo. Planned reading date: /11/2021.
Journals
- Cristina Olmedilla, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Alfaro-Cortés, José L. Sánchez, Francisco J. Quiles, Wenhao Sun, Xiang Yu, Yonghui Xu, José Duato: DVL-Lossy: Isolating Congesting Flows to Optimize Packet Dropping in Lossy Data-Center Networks. IEEE Micro 41(1): 37-44 (2021). DOI: 10.1109/MM.2020.3042263.
- Javier Cano-Cano, Francisco J. Andújar, Francisco J. Alfaro-Cortés, José L. Sánchez. Enabling Quality of Service Provision in Omni-Path Switches. Computational and Mathematical Methods, p. e1147, John Wiley & Sons, Ltd, (2021). doi:10.1002/CMM4.1147.
- Javier Cano-Cano, Francisco J. Andújar, Jesús Escudero-Sahuquillo, Francisco J. Alfaro-Cortés, José L. Sánchez. A methodology to enable QoS provision on InfiniBand hardware. The Journal of Supercomputing 77(9), p. 9934-9946, Springer (2021). doi:10.1007/S11227-021-03667-X
- Javier Cano-Cano, Francisco J. Andújar, Francisco J. Alfaro-Cortés, José L. Sánchez. QoS provision in hierarchical and non-hierarchical switch architectures. Journal of Parallel and Distributed Computing 148, p. 138-150, Academic Press (2021). doi:10.1016/J.JPDC.2020.10.009
- Jose Rocher-Gonzalez, Jesus Escudero-Sahuquillo, Pedro J. García, Francisco J. Quiles, Gaspar Mora. Towards an efficient combination of adaptive routing and queuing schemes in Fat-Tree topologies. Journal of Parallel and Distributed Computing 147, p. 46-63, Academic Press (2021). doi:10.1016/J.JPDC.2020.07.009
- German Maglione-Mathey, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles, Eitan Zahavi. Leveraging InfiniBand controller to configure deadlock-free routing engines for Dragonflies. Journal of Parallel and Distributed Computing 147, p. 16-33, Academic Press (2021). doi:10.1016/J.JPDC.2020.07.010
- Juan-José Crespo, José L. Sánchez, Francisco J. Alfaro-Cortés, José Flich, José Duato. UPR: deadlock-free dynamic network reconfiguration by exploiting channel dependency graph compatibility. The Journal of Supercomputing, p. 1-31, Springer (2021). doi:10.1007/S11227-021-03791-8.
- German Maglione-Mathey, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles, Jose Duato. Path2SL: Leveraging InfiniBand resources to reduce head-of-line blocking in fat trees. IEEE Micro 40(1), p. 8-14, IEEE Computer Society, (2020), doi:10.1109/MM.2019.2949280.
- Francisco J. Andújar, Juan A. Villar, José L. Sánchez, Francisco J. Alfaro, José Duato, Holger Fröning. Constructing virtual 5-dimensional tori out of lower-dimensional network cards. Concurrency and Computation: Practice and Experience 31(2), p. e4361, John Wiley and Sons Ltd (2019). doi:10.1002/cpe.4361.
- Javier Cano-Cano, Francisco J. Andújar, Francisco J. Alfaro, José L. Sánchez. Speeding up exascale interconnection network simulations with the VEF3 trace framework. Journal of Parallel and Distributed Computing 133, p. 124-135, Academic Press Inc. (2019). doi:10.1016/j.jpdc.2019.06.013.
- Juan Jose Crespo, José L. Sánchez, Francisco J. Alfaro-Cortés. Silicon photonic networks: Signal loss and power challenges. Concurrency and Computation: Practice and Experience 31(21), p. 1-14 (2019). doi:10.1002/cpe.4777.
- Pedro Yebenes, Jose Rocher-Gonzalez, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Alfaro, Francisco J. Quiles, Crispín Gómez, Jose Duato. Combining Source-adaptive and Oblivious Routing with Congestion Control in High-performance Interconnects using Hybrid and Direct Topologies. ACM Transactions on Architecture and Code Optimization 16(2), p. 1-26, Association for Computing Machinery (2019). doi:10.1145/3319805.
- J. Escudero, P.J. García, F.J. Quiles, J. Duato, G. Maglione. Feasible enhancements to congestion control in InfiniBand-based networks. Journal of Parallel and Distributed Computing (2018). doi:10.1016/j.jpdc.2017.09.008.
- G. Maglione, P. Yébenes, J. Escudero, P.J. García, F.J. Quiles, E. Zahavi. Scalable Deadlock-free Deterministic Minimal-Path Routing Engine for InfiniBand-Based Dragonfly Networks. IEEE Transactions on Parallel and Distributed Systems (2018). doi:10.1109/TPDS.2017.2742503.
- F.J. Andújar, J.A. Villar, J.L. Sánchez, F.J. Alfaro. Applying search algorithms to obtain the optimal configuration of nDT torus nodes. Journal on Concurrency and Computation: Practice and Experience, 29(13), Wiley, July 2017.
- P. Yébenes, J. Escudero, P.J. García, F.J. Alfaro, F.J. Quiles. Providing differentiated services, congestion management, and deadlock freedom in dragonfly networks with adaptive routing. Journal on Concurrency and Computation: Practice and Experience, 29(13), Wiley, July 2017.
- F.J. Andújar, J.A. Villar, J.L. Sánchez, F.J. Alfaro, J. Duato. Adaptive routing for n-dimensional Twin Torus. IEEE Transactions on Computers, ISSN 0018-9340, 65(12), pp. 3780-3786 (2016).
- P. Yébenes, J. Escudero, P.J. García, F.J. Quiles. Straightforward solutions to reduce HoL blocking in different Dragonfly fully-connected interconnection patterns. The Journal of Supercomputing. 72(12), pp. 4497-4519 (2016).
- F.J. Andújar, J.A. Villar, F.J. Alfaro, J.L. Sánchez, J. Escudero-Sahuquillo. An open-source family of tools to reproduce MPI-based workloads in interconnection network simulators. The Journal of Supercomputing, ISSN 0920-8542, 72(12), pp. 4601-4628 (2016).
International Conferences
- Cristina Olmedilla, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Alfaro-Cortés, Francisco J. Quiles, José L. Sánchez, Wenhao Sun, Xiang Yu, Yonghui Xu, José Duato: Optimizing Packet Dropping by Efficient Congesting-Flow Isolation in Lossy Data-Center Networks. Hot Interconnects 2020: 47-54.
- J.M. Rocher, J. Escudero, P.J. García, F.J. Quiles, G. Mora. Efficient Congestion Management for High-Speed Interconnects using Adaptive Routing. CCGRID 2019: 221-230.
- Crespo, J. J., Maglione-Mathey, G., Sanchez, J. L., Alfaro-Cortes, F. J., Escudero-Sahuquillo, J., Garcia, P. J., & Quiles, F. J. (2019, July). Methodology for Decoupled Simulation of SystemVerilog HDL Designs. In 2019 International Conference on High Performance Computing & Simulation (HPCS) (pp. 741-746). IEEE.
- Luis Gonzalez-Naharro, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, José Duato, Wenhao Sun, Xiang Yu, Hewen Zheng: Modeling Traffic Workloads in Data-center Network Simulation Tools. HPCS 2019: 1036-1042.
- Luis Gonzalez-Naharro, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco José Quiles Flor, José Duato, Wenhao Sun, Li Shen, Xiang Yu, Hewen Zheng: Efficient Dynamic Isolation of Congestion in Lossless DataCenter Networks. NEAT@SIGCOMM 2019: 15-21.
- Felix Zahn, Pedro Yébenes, Jesús Escudero-Sahuquillo, Pedro Javier García, Holger Fröning: Effects of Congestion Management on Energy Saving Techniques in Interconnection Networks. HiPINEB@HPCA 2019: 9-16.
- G. Maglione-Mathey, J. Escudero-Sahuquillo, P. J. Garcia, F. J. Quiles and J. Duato, “Path2SL: Optimizing Head-of-Line Blocking Reduction in InfiniBand-Based Fat-Tree Networks,” 2019 IEEE Symposium on High-Performance Interconnects (HOTI), 2019, pp. 5-8, doi: 10.1109/HOTI.2019.00014.
- Javier Cano-Cano, Francisco J. Andújar, Francisco J. Alfaro-Cortés, José L. Sánchez. VEF3 Traces: Towards a Complete Framework for Modelling Network Workloads for Exascale Systems. IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB), p. 32-39, (2018).
- Juan A. Villar, German Maglione-Mathey, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Alfaro-Cortés, José L. Sánchez and Francisco J. Quiles, “TopGen: A Library to Provide Simulation Tools with the Modeling of Interconnection Network Topologies”, 2018 International Conference on High Performance Computing & Simulation (HPCS), 2018, pp. 452-459, https://doi.org/10.1109/HPCS.2018.00078.
- J.M. Rocher, J. Escudero, P.J. García, F.J. Quiles. On the impact of routing algorithms in the effectiveness of queuing schemes in high-performance interconnection networks. IEEE Hot Interconnects (HoTI) (2017).
- P. Yébenes, J. Escudero, P.J. García, F.J. Quiles, T. Hoefler. Improving non-minimal and adaptive routing algorithms in Slim Fly networks. IEEE Hot Interconnects (HoTI) (2017). Best student paper award.
- J.J. García-Castro, F.J. Alfaro, J.L. Sánchez. Design simulation tool for silicon integrated photonics towards exascale systems. The 10th Workshop on UnConventional High Performance Computing, in conjunction with EuroPar (2017).
- F.J. Andújar, J.A. Villar, J.L. Sánchez, F.J. Alfaro, J. Duato, H. Fröning. A case study on implementing virtual 5D torus networks using network components of lower dimensionality. HiPINEB, in conjunction with HPCA, pp. 9-16 (2017).
- P. Yébenes, J. Escudero, P.J. García, F.J. Quiles, T. Hoefler. An effective queuing scheme to provide Slim Fly topologies with HoL Blocking reduction and deadlock freedom for minimal-path routing. HiPINEB, in conjunction with HPCA, pp. 9-16 (2017).
- J. Cano, J.J. García-Castro, F.J. Alfaro, J.L. Sánchez. Optical network-on-chip signal losses. ACACES HiPEAC, pp. 10-16, (2016).
- P. Yébenes, G. Maglione, J. Escudero, P.J. García, F.J. Quiles: Modeling a switch architecture with virtual output queues and virtual channels in HPC-systems simulators. In Proceedings of the 2016 International Conference on High Performance Computing & Simulation (HPCS), pp. 380-386 (2016).
- P. Yébenes, J. Escudero, P.J. García, F.J. Alfaro, F.J. Quiles. Providing differentiated services, congestion management, and deadlock freedom in Dragonfly networks. HiPINEB, in conjunction with HPCA, pp. 33-40 (2016).
- G. Maglione, P. Yébenes, J. Escudero, P.J. García, F.J. Quiles. Combining OpenFabrics software and simulation tools for modeling InfiniBand-based interconnection networks. HiPINEB, in conjunction with HPCA, pp. 55-58 (2016).